Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-16 Thread Ekaterina Dimitrova
I second Patrick about the parties and all that… Thanks, Simon, for all your
work! I am excited to see what’s next from you, as I am sure it will be
awesome!

Cheers!

On Thu, 16 May 2024 at 14:50, Jon Haddad wrote:

> Benjamin, I’m +1 on adding BETWEEN; thanks for bringing this up.
>
> To all, my intention wasn’t to suggest we add support for BETWEEN in UPDATE
> via range writes at the same time. If it came across that way, I apologize
> for the confusion.
>
> Josh, thanks for the suggestion. If I feel inspired to discuss with the
> dev list any further I’ll be sure to start a new thread.
>
> Jon
>
>
> On Thu, May 16, 2024 at 7:57 AM Josh McKenzie wrote:
>
>> More of a "how could we technically reach Mars?" discussion than a "how do
>> we get Congress to authorize a budget to reach Mars?"
>>
>> Wow - that is genuinely a great simile. Really good point.
>>
>> To Jeff's point - want to kick off a [DISCUSS] thread referencing this
>> thread, Jon, so we can take the conversation there? Definitely think it's
>> worth continuing from a technical perspective.
>>
>> On Wed, May 15, 2024, at 2:49 PM, Jeff Jirsa wrote:
>>
>> You can remove the shadowed values at compaction time, but you can’t ever
>> fully propagate the range update to point updates, so you’d be propagating
>> all of the range-update structures throughout everything forever. It’s JUST
>> like a range tombstone - you don’t know what it’s shadowing (and can’t, in
>> many cases, because the width of the range is uncountable for some types).
>>
>> Setting aside whether or not this construct is worth adding (I suspect a
>> lot of binding votes would say it’s not), the thread focuses on the BETWEEN
>> operator, and there’s no reason we should pollute the conversation of “add
>> a missing SQL operator that basically maps to existing functionality” with
>> creation of a brand new form of update that definitely doesn’t map to any
>> existing concepts.
>>
>>
>>
>>
>>
>> On May 14, 2024, at 10:05 AM, Jon Haddad wrote:
>>
>> Personally, I don't think that something being scary at first glance is a
>> good reason not to explore an idea.  The scenario you've described here is
>> tricky, but I'm not expecting it to be any worse than, say, SAI, which (the
>> last I checked) has O(N) complexity on returning result sets with regard to
>> rows returned.  We've also merged in Vector search, which has O(N) overhead
>> with the number of SSTables.  We're still fundamentally looking at, in most
>> cases, a limited number of SSTables and some merging of values.
>>
>> Write updates are essentially a timestamped mask, potentially
>> overlapping, and I suspect potentially resolvable during compaction by
>> propagating the values.  They could be eliminated or narrowed based on how
>> they've propagated by using the timestamp metadata on the SSTable.
>>
>> It would be a lot more constructive to apply our brains towards solving
>> an interesting problem than pointing out all its potential flaws based on
>> gut feelings.  We haven't even moved this past an idea.
>>
>> I think it would solve a massive problem for a lot of people and is 100%
>> worth considering.  Thanks, Patrick and David, for raising this.
>>
>> Jon
>>
>>
>>
>> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev <
>> dev@cassandra.apache.org> wrote:
>>
>>
>> Ranged update sounds like a disaster for compaction and read performance.
>>
>> Imagine compacting or reading some SSTables in which a large number of
>> overlapping but non-identical ranges were updated with different values. It
>> gives me a headache by just thinking about it.
>>
>> Ranged delete is much simpler, because the "value" is the same tombstone
>> marker, and it also is guaranteed to expire and disappear eventually, so
>> the performance impact of dealing with them at read and compaction time
>> doesn't suffer in the long term.
>>
>> On 14/05/2024 16:59, Benjamin Lerer wrote:
>>
>> It should be like range tombstones ... but much worse ;-). A tombstone is
>> a simple marker (deleted). An update can be far more complex.
>>
>> On Tue, 14 May 2024 at 15:52, Jon Haddad wrote:
>>
>> Is there a technical limitation that would prevent a range write that
>> functions the same way as a range tombstone, other than probably needing a
>> version bump of the storage format?
>>
>>
>> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer wrote:
>>
>> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs.
>> They do work on DELETE because under the hood in C* they get translated into
>> range tombstones.
>>
>> On Tue, 14 May 2024 at 02:44, David Capwell wrote:
>>
>> I would also include it in UPDATE… but yeah, <3 BETWEEN and welcome this
>> work.
>>
>> On May 13, 2024, at 7:40 AM, Patrick McFadin wrote:
>>
>> This is a great feature addition to CQL! I get asked about it from time
>> to time but then people figure out a workaround. It will be great to just
>> have it available.
>>
>> And right on Simon! I think the only project I had as a high school
>> senior was fi

Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-16 Thread Jon Haddad
Benjamin, I’m +1 on adding BETWEEN; thanks for bringing this up.

To all, my intention wasn’t to suggest we add support for BETWEEN in UPDATE
via range writes at the same time. If it came across that way, I apologize
for the confusion.

Josh, thanks for the suggestion. If I feel inspired to discuss with the dev
list any further I’ll be sure to start a new thread.

Jon


On Thu, May 16, 2024 at 7:57 AM Josh McKenzie wrote:

> More of a "how could we technically reach Mars?" discussion than a "how do we
> get Congress to authorize a budget to reach Mars?"
>
> Wow - that is genuinely a great simile. Really good point.
>
> To Jeff's point - want to kick off a [DISCUSS] thread referencing this
> thread, Jon, so we can take the conversation there? Definitely think it's
> worth continuing from a technical perspective.
>
> On Wed, May 15, 2024, at 2:49 PM, Jeff Jirsa wrote:
>
> You can remove the shadowed values at compaction time, but you can’t ever
> fully propagate the range update to point updates, so you’d be propagating
> all of the range-update structures throughout everything forever. It’s JUST
> like a range tombstone - you don’t know what it’s shadowing (and can’t, in
> many cases, because the width of the range is uncountable for some types).
>
> Setting aside whether or not this construct is worth adding (I suspect a
> lot of binding votes would say it’s not), the thread focuses on the BETWEEN
> operator, and there’s no reason we should pollute the conversation of “add
> a missing SQL operator that basically maps to existing functionality” with
> creation of a brand new form of update that definitely doesn’t map to any
> existing concepts.
>
>
>
>
>
> On May 14, 2024, at 10:05 AM, Jon Haddad wrote:
>
> Personally, I don't think that something being scary at first glance is a
> good reason not to explore an idea.  The scenario you've described here is
> tricky, but I'm not expecting it to be any worse than, say, SAI, which (the
> last I checked) has O(N) complexity on returning result sets with regard to
> rows returned.  We've also merged in Vector search, which has O(N) overhead
> with the number of SSTables.  We're still fundamentally looking at, in most
> cases, a limited number of SSTables and some merging of values.
>
> Write updates are essentially a timestamped mask, potentially overlapping,
> and I suspect potentially resolvable during compaction by propagating the
> values.  They could be eliminated or narrowed based on how they've
> propagated by using the timestamp metadata on the SSTable.
>
> It would be a lot more constructive to apply our brains towards solving an
> interesting problem than pointing out all its potential flaws based on gut
> feelings.  We haven't even moved this past an idea.
>
> I think it would solve a massive problem for a lot of people and is 100%
> worth considering.  Thanks, Patrick and David, for raising this.
>
> Jon
>
>
>
> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev <
> dev@cassandra.apache.org> wrote:
>
>
> Ranged update sounds like a disaster for compaction and read performance.
>
> Imagine compacting or reading some SSTables in which a large number of
> overlapping but non-identical ranges were updated with different values. It
> gives me a headache by just thinking about it.
>
> Ranged delete is much simpler, because the "value" is the same tombstone
> marker, and it also is guaranteed to expire and disappear eventually, so
> the performance impact of dealing with them at read and compaction time
> doesn't suffer in the long term.
>
> On 14/05/2024 16:59, Benjamin Lerer wrote:
>
> It should be like range tombstones ... but much worse ;-). A tombstone is a
> simple marker (deleted). An update can be far more complex.
>
> On Tue, 14 May 2024 at 15:52, Jon Haddad wrote:
>
> Is there a technical limitation that would prevent a range write that
> functions the same way as a range tombstone, other than probably needing a
> version bump of the storage format?
>
>
> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer wrote:
>
> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs. They
> do work on DELETE because under the hood in C* they get translated into range
> tombstones.
>
> On Tue, 14 May 2024 at 02:44, David Capwell wrote:
>
> I would also include it in UPDATE… but yeah, <3 BETWEEN and welcome this work.
>
> On May 13, 2024, at 7:40 AM, Patrick McFadin wrote:
>
> This is a great feature addition to CQL! I get asked about it from time to
> time but then people figure out a workaround. It will be great to just have
> it available.
>
> And right on Simon! I think the only project I had as a high school senior
> was figuring out how many parties I could go to and still maintain a
> passing grade. Thanks for your work here.
>
> Patrick
>
> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer wrote:
>
> Hi everybody,
>
> Just raising awareness that Simon is working on adding support for the
> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604

Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-16 Thread Josh McKenzie
> More of a "how could we technically reach Mars?" discussion than a "how do we 
> get Congress to authorize a budget to reach Mars?"
Wow - that is genuinely a great simile. Really good point.

To Jeff's point - want to kick off a [DISCUSS] thread referencing this thread, 
Jon, so we can take the conversation there? Definitely think it's worth 
continuing from a technical perspective.

On Wed, May 15, 2024, at 2:49 PM, Jeff Jirsa wrote:
> You can remove the shadowed values at compaction time, but you can’t ever 
> fully propagate the range update to point updates, so you’d be propagating 
> all of the range-update structures throughout everything forever. It’s JUST 
> like a range tombstone - you don’t know what it’s shadowing (and can’t, in 
> many cases, because the width of the range is uncountable for some types). 
> 
> Setting aside whether or not this construct is worth adding (I suspect a lot 
> of binding votes would say it’s not), the thread focuses on the BETWEEN operator, 
> and there’s no reason we should pollute the conversation of “add a missing 
> SQL operator that basically maps to existing functionality” with creation of 
> a brand new form of update that definitely doesn’t map to any existing 
> concepts. 
> 
> 
> 
> 
> 
>> On May 14, 2024, at 10:05 AM, Jon Haddad wrote:
>> 
>> Personally, I don't think that something being scary at first glance is a 
>> good reason not to explore an idea.  The scenario you've described here is 
>> tricky, but I'm not expecting it to be any worse than, say, SAI, which (the 
>> last I checked) has O(N) complexity on returning result sets with regard to 
>> rows returned.  We've also merged in Vector search, which has O(N) overhead 
>> with the number of SSTables.  We're still fundamentally looking at, in most 
>> cases, a limited number of SSTables and some merging of values.
>> 
>> Write updates are essentially a timestamped mask, potentially overlapping, 
>> and I suspect potentially resolvable during compaction by propagating the 
>> values.  They could be eliminated or narrowed based on how they've 
>> propagated by using the timestamp metadata on the SSTable.
>> 
>> It would be a lot more constructive to apply our brains towards solving an 
>> interesting problem than pointing out all its potential flaws based on gut 
>> feelings.  We haven't even moved this past an idea.  
>> 
>> I think it would solve a massive problem for a lot of people and is 100% 
>> worth considering.  Thanks, Patrick and David, for raising this.
>> 
>> Jon
>> 
>> 
>> 
>> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev wrote:
>>> Ranged update sounds like a disaster for compaction and read performance.
>>> 
>>> Imagine compacting or reading some SSTables in which a large number of 
>>> overlapping but non-identical ranges were updated with different values. It 
>>> gives me a headache by just thinking about it.
>>> 
>>> Ranged delete is much simpler, because the "value" is the same tombstone 
>>> marker, and it also is guaranteed to expire and disappear eventually, so 
>>> the performance impact of dealing with them at read and compaction time 
>>> doesn't suffer in the long term.
>>> 
>>> 
>>> On 14/05/2024 16:59, Benjamin Lerer wrote:
>>>> It should be like range tombstones ... but much worse ;-). A tombstone is a 
>>>> simple marker (deleted). An update can be far more complex.  
>>>> 
>>>> On Tue, 14 May 2024 at 15:52, Jon Haddad wrote:
>>>>> Is there a technical limitation that would prevent a range write that 
>>>>> functions the same way as a range tombstone, other than probably needing 
>>>>> a version bump of the storage format?
>>>>> 
>>>>> 
>>>>> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer wrote:
>>>>>> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs. 
>>>>>> They do work on DELETE because under the hood in C* they get translated 
>>>>>> into range tombstones.
>>>>>> 
>>>>>> On Tue, 14 May 2024 at 02:44, David Capwell wrote:
>>>>>>> I would also include it in UPDATE… but yeah, <3 BETWEEN and welcome this 
>>>>>>> work.
>>>>>>> 
>>>>>>>> On May 13, 2024, at 7:40 AM, Patrick McFadin wrote:
>>>>>>>> 
>>>>>>>> This is a great feature addition to CQL! I get asked about it from 
>>>>>>>> time to time but then people figure out a workaround. It will be great 
>>>>>>>> to just have it available. 
>>>>>>>> 
>>>>>>>> And right on Simon! I think the only project I had as a high school 
>>>>>>>> senior was figuring out how many parties I could go to and still 
>>>>>>>> maintain a passing grade. Thanks for your work here. 
>>>>>>>> 
>>>>>>>> Patrick 
>>>>>>>> 
>>>>>>>> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer wrote:
>>>>>>>>> Hi everybody,
>>>>>>>>> 
>>>>>>>>> Just raising awareness that Simon is working on adding support for 
>>>>>>>>> the BETWEEN operator in WHERE clauses (SELECT and DELETE) in 
>>>>>>>>> CASSANDRA-19604. We plan to add support for it in conditions in a 
>>>>>>>>> separate patch.
>>>>>>>>> 
>>>>>>>>> The pa

Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-15 Thread Jeff Jirsa
You can remove the shadowed values at compaction time, but you can’t ever fully 
propagate the range update to point updates, so you’d be propagating all of the 
range-update structures throughout everything forever. It’s JUST like a range 
tombstone - you don’t know what it’s shadowing (and can’t, in many cases, 
because the width of the range is uncountable for some types). 

Setting aside whether or not this construct is worth adding (I suspect a lot of 
binding votes would say it’s not), the thread focuses on the BETWEEN operator, and 
there’s no reason we should pollute the conversation of “add a missing SQL 
operator that basically maps to existing functionality” with creation of a 
brand new form of update that definitely doesn’t map to any existing concepts. 
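To make the point above concrete, here is a minimal Python sketch of a hypothetical range write ("set this value for every clustering key in [start, end]"). It is an illustration under stated assumptions, not Cassandra code: RangeWrite, compact and read are invented names, and the model assumes text clustering keys, inclusive bounds and last-write-wins by timestamp. Compaction can discard point cells that a newer range write shadows, but it cannot replace the range write with point entries.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, int]  # (value, write timestamp)


@dataclass(frozen=True)
class RangeWrite:
    """Hypothetical marker: 'set value for every clustering key in [start, end]'."""
    start: str
    end: str
    value: str
    timestamp: int

    def covers(self, key: str) -> bool:
        return self.start <= key <= self.end


def compact(cells: Dict[str, Cell], ranges: List[RangeWrite]):
    """Drop point cells shadowed by a strictly newer range write; keep every range write."""
    survivors = {
        key: cell
        for key, cell in cells.items()
        if not any(r.covers(key) and r.timestamp > cell[1] for r in ranges)
    }
    # The range writes cannot be dropped or expanded into point cells: they also
    # decide the value of keys never written here ("bb", "bzz", ...), and the text
    # keys between "b" and "d" cannot be enumerated.
    return survivors, list(ranges)


def read(key: str, cells: Dict[str, Cell], ranges: List[RangeWrite]) -> Optional[str]:
    """Newest version wins; on a timestamp tie the point cell (listed first) is kept."""
    candidates = [cells[key]] if key in cells else []
    candidates += [(r.value, r.timestamp) for r in ranges if r.covers(key)]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])[0]


cells = {"a": ("x", 10), "b": ("y", 10), "c": ("z", 30)}
ranges = [RangeWrite("b", "d", "R", 20)]
survivors, kept = compact(cells, ranges)
print(survivors)                    # {'a': ('x', 10), 'c': ('z', 30)}
print(read("bb", survivors, kept))  # R
```

Reading "bb" returns "R" even though no point cell for that key ever existed, which is why, in this model, the marker has to be carried along forever, just as a range tombstone is until it can be purged.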





> On May 14, 2024, at 10:05 AM, Jon Haddad wrote:
> 
> Personally, I don't think that something being scary at first glance is a 
> good reason not to explore an idea.  The scenario you've described here is 
> tricky, but I'm not expecting it to be any worse than, say, SAI, which (the 
> last I checked) has O(N) complexity on returning result sets with regard to 
> rows returned.  We've also merged in Vector search, which has O(N) overhead 
> with the number of SSTables.  We're still fundamentally looking at, in most 
> cases, a limited number of SSTables and some merging of values.
> 
> Write updates are essentially a timestamped mask, potentially overlapping, 
> and I suspect potentially resolvable during compaction by propagating the 
> values.  They could be eliminated or narrowed based on how they've propagated 
> by using the timestamp metadata on the SSTable.
> 
> It would be a lot more constructive to apply our brains towards solving an 
> interesting problem than pointing out all its potential flaws based on gut 
> feelings.  We haven't even moved this past an idea.  
> 
> I think it would solve a massive problem for a lot of people and is 100% 
> worth considering.  Thanks, Patrick and David, for raising this.
> 
> Jon
> 
> 
> 
> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev wrote:
>> Ranged update sounds like a disaster for compaction and read performance.
>> 
>> Imagine compacting or reading some SSTables in which a large number of 
>> overlapping but non-identical ranges were updated with different values. It 
>> gives me a headache by just thinking about it.
>> 
>> Ranged delete is much simpler, because the "value" is the same tombstone 
>> marker, and it also is guaranteed to expire and disappear eventually, so the 
>> performance impact of dealing with them at read and compaction time doesn't 
>> suffer in the long term.
>> 
>> 
>> On 14/05/2024 16:59, Benjamin Lerer wrote:
>>> It should be like range tombstones ... but much worse ;-). A tombstone is a 
>>> simple marker (deleted). An update can be far more complex.  
>>> 
>>> On Tue, 14 May 2024 at 15:52, Jon Haddad wrote:
>>>> Is there a technical limitation that would prevent a range write that 
>>>> functions the same way as a range tombstone, other than probably needing a 
>>>> version bump of the storage format?
>>>> 
>>>> 
>>>> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer wrote:
>>>>> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs. 
>>>>> They do work on DELETE because under the hood in C* they get translated into 
>>>>> range tombstones.
>>>>> 
>>>>> On Tue, 14 May 2024 at 02:44, David Capwell wrote:
>>>>>> I would also include it in UPDATE… but yeah, <3 BETWEEN and welcome this 
>>>>>> work.
>>>>>> 
>>>>>>> On May 13, 2024, at 7:40 AM, Patrick McFadin wrote:
>>>>>>> 
>>>>>>> This is a great feature addition to CQL! I get asked about it from time 
>>>>>>> to time but then people figure out a workaround. It will be great to 
>>>>>>> just have it available. 
>>>>>>> 
>>>>>>> And right on Simon! I think the only project I had as a high school 
>>>>>>> senior was figuring out how many parties I could go to and still 
>>>>>>> maintain a passing grade. Thanks for your work here. 
>>>>>>> 
>>>>>>> Patrick 
>>>>>>> 
>>>>>>> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer wrote:
>>>>>>>> Hi everybody,
>>>>>>>> 
>>>>>>>> Just raising awareness that Simon is working on adding support for the 
>>>>>>>> BETWEEN operator in WHERE clauses (SELECT and DELETE) in 
>>>>>>>> CASSANDRA-19604. We plan to add support for it in conditions in a 
>>>>>>>> separate patch.
>>>>>>>> 
>>>>>>>> The patch is available.
>>>>>>>> 
>>>>>>>> As a side note, Simon chose to do his high school senior project 
>>>>>>>> contributing to Apache Cassandra. This patch is his first contribution 
>>>>>>>> for his senior project (his second feature contribution to Apache 
>>>>>>>> Cassandra).
>>>>>>>> 
>>>>>>>> 
>> 
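Benjamin's announcement covers BETWEEN in SELECT and DELETE, and his later note explains that range restrictions work on DELETE because they are translated into range tombstones. A minimal Python sketch of that mapping follows, assuming SQL semantics where c BETWEEN low AND high means c >= low AND c <= high (so low > high matches nothing). Bound, Slice, RangeTombstone, between and delete_between are hypothetical names for illustration only; they are not the CASSANDRA-19604 implementation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Bound:
    value: Any
    inclusive: bool


@dataclass(frozen=True)
class Slice:
    """Hypothetical clustering slice: all keys between two bounds."""
    start: Bound
    end: Bound

    def contains(self, key: Any) -> bool:
        above = key > self.start.value or (self.start.inclusive and key == self.start.value)
        below = key < self.end.value or (self.end.inclusive and key == self.end.value)
        return above and below


def between(low: Any, high: Any) -> Slice:
    # SQL semantics: `c BETWEEN low AND high` means `c >= low AND c <= high`.
    return Slice(Bound(low, True), Bound(high, True))


@dataclass(frozen=True)
class RangeTombstone:
    bounds: Slice
    timestamp: int


def delete_between(low: Any, high: Any, timestamp: int) -> RangeTombstone:
    # In this model a DELETE restricted with BETWEEN is recorded as one marker
    # over the slice; existing rows are neither read nor enumerated.
    return RangeTombstone(between(low, high), timestamp)


s = between(10, 20)
print([k for k in range(8, 23) if s.contains(k)])  # 10 through 20 inclusive
```

Because BETWEEN reduces to a pair of inclusive bounds, it needs no new storage concept on the DELETE path; an UPDATE over the same slice has no existing construct to map onto, which is the gap discussed in this thread.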



Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-15 Thread David Capwell
Thanks for the reply, Benjamin; makes sense to me.  We can always add it later 
if it makes sense; no need for it in UPDATE now.

> On May 15, 2024, at 7:44 AM, Jon Haddad wrote:
> 
> I was trying to have a discussion about a technical possibility, not a 
> cost-benefit analysis.  More of a "how could we technically reach Mars?" 
> discussion than a "how do we get Congress to authorize a budget to reach Mars?"
> 
> Happy to talk about this privately with anyone interested as I enjoy a 
> technical discussion for the sake of a good technical discussion.
> 
> Thanks,
> Jon
> 
> On Wed, May 15, 2024 at 7:18 AM Josh McKenzie wrote:
>> Is there a technical limitation that would prevent a range write that 
>> functions the same way as a range tombstone, other than probably needing a 
>> version bump of the storage format?
> The technical limitation would be cost/benefit due to how this intersects 
> w/our architecture I think.
> 
> Range tombstones have taught us that something that should be relatively 
> simple (merge in deletion mask at read time) introduces a significant amount 
> of complexity on all the paths Benjamin enumerated with a pretty long tail of 
> bugs and data incorrectness issues and edge cases. The work to get there, at 
> a high level glance, would be:
> • Updates to CQL grammar, spec
> • Updates to write path
> • Updates to Accord. And thinking about how this intersects w/Accord's 
>   WAL / logic (I think? Consider me not well educated on details here)
> • Updates to compaction w/consideration for edge cases on all the 
>   different compaction strategies
> • Updates to iteration and merge logic
> • Updates to paging logic
> • Indexing
> • Repair, both full and incremental implications, support, etc.
> • The list probably goes on? There's always >= 1 thing we're not thinking 
>   of with a change like this. Usually more.
> For all of the above we also would need unit, integration, and fuzz testing 
> extensively to ensure the introduction of this new spanning concept on a 
> write doesn't introduce edge cases where incorrect data is returned on merge.
> 
> All of which is to say: it's an interesting problem, but IMO given our 
> architecture and what we know about the past of trying to introduce an 
> architectural concept like this, the costs to getting something like this to 
> production ready are pretty high.
> 
> To me the cost/benefit doesn't really balance out. Just my .02 though.
> 
> On Tue, May 14, 2024, at 2:50 PM, Benjamin Lerer wrote:
>> It would be a lot more constructive to apply our brains towards solving an 
>> interesting problem than pointing out all its potential flaws based on gut 
>> feelings.
>> 
>> It is not simply a gut feeling, Jon. This change impacts read, write, 
>> indexing, storage, compaction, repair... The risk and cost associated with 
>> it are pretty significant and I am not convinced at this point of its 
>> benefit.
>> 
>> On Tue, 14 May 2024 at 19:05, Jon Haddad wrote:
>> Personally, I don't think that something being scary at first glance is a 
>> good reason not to explore an idea.  The scenario you've described here is 
>> tricky, but I'm not expecting it to be any worse than, say, SAI, which (the 
>> last I checked) has O(N) complexity on returning result sets with regard to 
>> rows returned.  We've also merged in Vector search, which has O(N) overhead 
>> with the number of SSTables.  We're still fundamentally looking at, in most 
>> cases, a limited number of SSTables and some merging of values.
>> 
>> Write updates are essentially a timestamped mask, potentially overlapping, 
>> and I suspect potentially resolvable during compaction by propagating the 
>> values.  They could be eliminated or narrowed based on how they've 
>> propagated by using the timestamp metadata on the SSTable.
>> 
>> It would be a lot more constructive to apply our brains towards solving an 
>> interesting problem than pointing out all its potential flaws based on gut 
>> feelings.  We haven't even moved this past an idea.  
>> 
>> I think it would solve a massive problem for a lot of people and is 100% 
>> worth considering.  Thanks, Patrick and David, for raising this.
>> 
>> Jon
>> 
>> 
>> 
>> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev wrote:
>> 
>> Ranged update sounds like a disaster for compaction and read performance.
>> 
>> Imagine compacting or reading some SSTables in which a large number of 
>> overlapping but non-identical ranges were updated with different values. It 
>> gives me a headache by just thinking about it.
>> 
>> Ranged delete is much simpler, because the "value" is the same tombstone 
>> marker, and it also is guaranteed to expire and disappear eventually, so the 
>> performance impact of dealing with them at read and compaction time doesn't 
>> suffer in the long term.
>> 
>> On 14/05/2024 16:59, Benjamin Lerer wrote:
>>> It should be like range tombstones ... but much worse ;-). A tombstone is a 
>>> simple marker (d

Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-15 Thread Jon Haddad
I was trying to have a discussion about a technical possibility, not a
cost-benefit analysis.  More of a "how could we technically reach Mars?"
discussion than a "how do we get Congress to authorize a budget to reach Mars?"

Happy to talk about this privately with anyone interested as I enjoy a
technical discussion for the sake of a good technical discussion.

Thanks,
Jon

On Wed, May 15, 2024 at 7:18 AM Josh McKenzie wrote:

> Is there a technical limitation that would prevent a range write that
> functions the same way as a range tombstone, other than probably needing a
> version bump of the storage format?
>
> The technical limitation would be cost/benefit due to how this intersects
> w/our architecture I think.
>
> Range tombstones have taught us that something that should be relatively
> simple (merge in deletion mask at read time) introduces a significant
> amount of complexity on all the paths Benjamin enumerated with a pretty
> long tail of bugs and data incorrectness issues and edge cases. The work to
> get there, at a high level glance, would be:
>
> 1. Updates to CQL grammar, spec
> 2. Updates to write path
> 3. Updates to Accord. And thinking about how this intersects
>    w/Accord's WAL / logic (I think? Consider me not well educated on details
>    here)
> 4. Updates to compaction w/consideration for edge cases on all the
>    different compaction strategies
> 5. Updates to iteration and merge logic
> 6. Updates to paging logic
> 7. Indexing
> 8. Repair, both full and incremental implications, support, etc.
> 9. The list probably goes on? There's always >= 1 thing we're not
>    thinking of with a change like this. Usually more.
>
> For all of the above we also would need unit, integration, and fuzz
> testing extensively to ensure the introduction of this new spanning concept
> on a write doesn't introduce edge cases where incorrect data is returned on
> merge.
>
> All of which is to say: it's an interesting problem, but IMO given our
> architecture and what we know about the past of trying to introduce an
> architectural concept like this, the costs to getting something like this
> to production ready are pretty high.
>
> To me the cost/benefit doesn't really balance out. Just my .02 though.
>
> On Tue, May 14, 2024, at 2:50 PM, Benjamin Lerer wrote:
>
> It would be a lot more constructive to apply our brains towards solving an
> interesting problem than pointing out all its potential flaws based on gut
> feelings.
>
>
> It is not simply a gut feeling, Jon. This change impacts read, write,
> indexing, storage, compaction, repair... The risk and cost associated with
> it are pretty significant and I am not convinced at this point of its
> benefit.
>
> On Tue, 14 May 2024 at 19:05, Jon Haddad wrote:
>
> Personally, I don't think that something being scary at first glance is a
> good reason not to explore an idea.  The scenario you've described here is
> tricky, but I'm not expecting it to be any worse than, say, SAI, which (the
> last I checked) has O(N) complexity on returning result sets with regard to
> rows returned.  We've also merged in Vector search, which has O(N) overhead
> with the number of SSTables.  We're still fundamentally looking at, in most
> cases, a limited number of SSTables and some merging of values.
>
> Write updates are essentially a timestamped mask, potentially overlapping,
> and I suspect potentially resolvable during compaction by propagating the
> values.  They could be eliminated or narrowed based on how they've
> propagated by using the timestamp metadata on the SSTable.
>
> It would be a lot more constructive to apply our brains towards solving an
> interesting problem than pointing out all its potential flaws based on gut
> feelings.  We haven't even moved this past an idea.
>
> I think it would solve a massive problem for a lot of people and is 100%
> worth considering.  Thanks, Patrick and David, for raising this.
>
> Jon
>
>
>
> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev <
> dev@cassandra.apache.org> wrote:
>
>
> Ranged update sounds like a disaster for compaction and read performance.
>
> Imagine compacting or reading some SSTables in which a large number of
> overlapping but non-identical ranges were updated with different values. It
> gives me a headache by just thinking about it.
>
> Ranged delete is much simpler, because the "value" is the same tombstone
> marker, and it also is guaranteed to expire and disappear eventually, so
> the performance impact of dealing with them at read and compaction time
> doesn't suffer in the long term.
>
> On 14/05/2024 16:59, Benjamin Lerer wrote:
>
> It should be like range tombstones ... but much worse ;-). A tombstone is a
> simple marker (deleted). An update can be far more complex.
>
> On Tue, 14 May 2024 at 15:52, Jon Haddad wrote:
>
> Is there a technical limitation that would prevent a range write that
> functions the same way as a range tombstone, other than probably needing a
> ve

Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-15 Thread Josh McKenzie
> Is there a technical limitation that would prevent a range write that 
> functions the same way as a range tombstone, other than probably needing a 
> version bump of the storage format?
The technical limitation would be cost/benefit due to how this intersects w/our 
architecture I think.

Range tombstones have taught us that something that should be relatively simple 
(merge in deletion mask at read time) introduces a significant amount of 
complexity on all the paths Benjamin enumerated with a pretty long tail of bugs 
and data incorrectness issues and edge cases. The work to get there, at a high 
level glance, would be:
 1. Updates to CQL grammar, spec
 2. Updates to write path
 3. Updates to Accord. And thinking about how this intersects w/Accord's WAL / 
    logic (I think? Consider me not well educated on details here)
 4. Updates to compaction w/consideration for edge cases on all the different 
    compaction strategies
 5. Updates to iteration and merge logic
 6. Updates to paging logic
 7. Indexing
 8. Repair, both full and incremental implications, support, etc.
 9. The list probably goes on? There's always >= 1 thing we're not thinking of 
    with a change like this. Usually more.
For all of the above we also would need unit, integration, and fuzz testing 
extensively to ensure the introduction of this new spanning concept on a write 
doesn't introduce edge cases where incorrect data is returned on merge.

All of which is to say: it's an interesting problem, but IMO given our 
architecture and what we know about the past of trying to introduce an 
architectural concept like this, the costs to getting something like this to 
production ready are pretty high.

To me the cost/benefit doesn't really balance out. Just my .02 though.
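The phrase "merge in deletion mask at read time" can be illustrated with a simplified Python model: sorted runs of cells (standing in for SSTables) are merged in key order, the newest version of each key is kept, and that version is dropped if a covering range tombstone is at least as new. Everything here is an assumption for illustration (integer keys, inclusive tombstone bounds, a linear scan over tombstones where a real engine would stream their bounds in key order), and it leaves out paging, TTLs, indexing and repair, several of the paths listed above.

```python
import heapq
from typing import List, Tuple

Cell = Tuple[int, str, int]       # (clustering key, value, write timestamp)
Tombstone = Tuple[int, int, int]  # (start, end, deletion timestamp), bounds inclusive


def merged_read(runs: List[List[Cell]], tombstones: List[Tombstone]) -> List[Tuple[int, str]]:
    """Return the visible (key, value) pairs across all runs.

    Each run must be sorted by key and hold at most one version per key.
    """
    visible = []
    last_key = None
    # Order by key, newest version first, so the first cell seen per key wins.
    for key, value, ts in heapq.merge(*runs, key=lambda c: (c[0], -c[2])):
        if key == last_key:
            continue  # an older version of a key already resolved
        last_key = key
        deleted_at = max((t for s, e, t in tombstones if s <= key <= e), default=None)
        if deleted_at is not None and deleted_at >= ts:
            continue  # shadowed by the deletion mask (ties go to the deletion)
        visible.append((key, value))
    return visible


run1 = [(1, "a", 10), (3, "c", 10), (5, "e", 10)]
run2 = [(3, "c2", 20), (4, "d", 5)]
print(merged_read([run1, run2], [(2, 4, 15)]))  # [(1, 'a'), (3, 'c2'), (5, 'e')]
```

Key 3 survives because its newest version (timestamp 20) postdates the deletion at 15, while key 4 (timestamp 5) is masked. Even this toy version has to get ordering, tie-breaking and version skipping right together, which hints at where the long tail of edge cases comes from.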

On Tue, May 14, 2024, at 2:50 PM, Benjamin Lerer wrote:
>> It would be a lot more constructive to apply our brains towards solving an 
>> interesting problem than pointing out all its potential flaws based on gut 
>> feelings.
> 
> It is not simply a gut feeling, Jon. This change impacts read, write, 
> indexing, storage, compaction, repair... The risk and cost associated with it 
> are pretty significant and I am not convinced at this point of its benefit.
> 
> On Tue, 14 May 2024 at 19:05, Jon Haddad wrote:
>> Personally, I don't think that something being scary at first glance is a 
>> good reason not to explore an idea.  The scenario you've described here is 
>> tricky, but I'm not expecting it to be any worse than, say, SAI, which (the 
>> last I checked) has O(N) complexity on returning result sets with regard to 
>> rows returned.  We've also merged in Vector search which has O(N) overhead 
>> with the number of SSTables.  We're still fundamentally looking at, in most 
>> cases, a limited number of SSTables and some merging of values.
>> 
>> Write updates are essentially a timestamped mask, potentially overlapping, 
>> and I suspect potentially resolvable during compaction by propagating the 
>> values.  They could be eliminated or narrowed based on how they've 
>> propagated by using the timestamp metadata on the SSTable.
>> 
>> It would be a lot more constructive to apply our brains towards solving an 
>> interesting problem than pointing out all its potential flaws based on gut 
>> feelings.  We haven't even moved this past an idea.  
>> 
>> I think it would solve a massive problem for a lot of people and is 100% 
>> worth considering.  Thanks Patrick and David for raising this.
>> 
>> Jon
>> 
>> 
>> 
>> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev 
>>  wrote:
>>> Ranged update sounds like a disaster for compaction and read performance.
>>> 
>>> Imagine compacting or reading some SSTables in which a large number of 
>>> overlapping but non-identical ranges were updated with different values. It 
>>> gives me a headache just thinking about it.
>>> 
>>> Ranged delete is much simpler, because the "value" is the same tombstone 
>>> marker, and it also is guaranteed to expire and disappear eventually, so 
>>> the performance impact of dealing with them at read and compaction time 
>>> doesn't persist in the long term.
>>> 
>>> 
>>> On 14/05/2024 16:59, Benjamin Lerer wrote:
>>>> It should be like range tombstones ... but much worse ;-). A tombstone is a
>>>> simple marker (deleted). An update can be far more complex.
>>>>
>>>> On Tue, May 14, 2024 at 15:52, Jon Haddad wrote:
>>>>> Is there a technical limitation that would prevent a range write that
>>>>> functions the same way as a range tombstone, other than probably needing
>>>>> a version bump of the storage format?
>>>>>
>>>>>
>>>>> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer wrote:
>>>>>> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs.
>>>>>> They do work on DELETE because under the hood, C* translates them
>>>>>> into range tombstones.
>>>>>>
>>>>>> On Tue, May 14, 2024 at 02:44, David Capwell wrote:
>>>>>>> I would also include it in UPDATE… but yeah, <3 BETWEEN

Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-14 Thread Benjamin Lerer
>
> It would be a lot more constructive to apply our brains towards solving an
> interesting problem than pointing out all its potential flaws based on gut
> feelings.


It is not simply a gut feeling, Jon. This change impacts read, write,
indexing, storage, compaction, repair... The risk and cost associated with
it are pretty significant and I am not convinced at this point of its
benefit.

On Tue, May 14, 2024 at 19:05, Jon Haddad wrote:

> Personally, I don't think that something being scary at first glance is a
> good reason not to explore an idea.  The scenario you've described here is
> tricky, but I'm not expecting it to be any worse than, say, SAI, which (the
> last I checked) has O(N) complexity on returning result sets with regard to
> rows returned.  We've also merged in Vector search which has O(N) overhead
> with the number of SSTables.  We're still fundamentally looking at, in most
> cases, a limited number of SSTables and some merging of values.
>
> Write updates are essentially a timestamped mask, potentially overlapping,
> and I suspect potentially resolvable during compaction by propagating the
> values.  They could be eliminated or narrowed based on how they've
> propagated by using the timestamp metadata on the SSTable.
>
> It would be a lot more constructive to apply our brains towards solving an
> interesting problem than pointing out all its potential flaws based on gut
> feelings.  We haven't even moved this past an idea.
>
> I think it would solve a massive problem for a lot of people and is 100%
> worth considering.  Thanks Patrick and David for raising this.
>
> Jon
>
>
>
> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev <
> dev@cassandra.apache.org> wrote:
>
>> Ranged update sounds like a disaster for compaction and read performance.
>>
>> Imagine compacting or reading some SSTables in which a large number of
>> overlapping but non-identical ranges were updated with different values. It
>> gives me a headache just thinking about it.
>>
>> Ranged delete is much simpler, because the "value" is the same tombstone
>> marker, and it also is guaranteed to expire and disappear eventually, so
>> the performance impact of dealing with them at read and compaction time
>> doesn't persist in the long term.
>>
>> On 14/05/2024 16:59, Benjamin Lerer wrote:
>>
>> It should be like range tombstones ... but much worse ;-). A tombstone is
>> a simple marker (deleted). An update can be far more complex.
>>
>> On Tue, May 14, 2024 at 15:52, Jon Haddad wrote:
>>
>>> Is there a technical limitation that would prevent a range write that
>>> functions the same way as a range tombstone, other than probably needing a
>>> version bump of the storage format?
>>>
>>>
>>> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer 
>>> wrote:
>>>
>>>> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs.
>>>> They do work on DELETE because under the hood, C* translates them into
>>>> range tombstones.
>>>>
>>>> On Tue, May 14, 2024 at 02:44, David Capwell wrote:
>>>>
>>>>> I would also include it in UPDATE… but yeah, <3 BETWEEN and welcome this
>>>>> work.
>>>>>
>>>>> On May 13, 2024, at 7:40 AM, Patrick McFadin wrote:
>>>>>
>>>>> This is a great feature addition to CQL! I get asked about it from
>>>>> time to time but then people figure out a workaround. It will be great to
>>>>> just have it available.
>>>>>
>>>>> And right on Simon! I think the only project I had as a high school
>>>>> senior was figuring out how many parties I could go to and still maintain
>>>>> a passing grade. Thanks for your work here.
>>>>>
>>>>> Patrick
>>>>>
>>>>> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer wrote:
>>>>>
>>>>>> Hi everybody,
>>>>>>
>>>>>> Just raising awareness that Simon is working on adding support for
>>>>>> the BETWEEN operator in WHERE clauses (SELECT and DELETE) in
>>>>>> CASSANDRA-19604. We plan to add support for it in conditions in a
>>>>>> separate patch.
>>>>>>
>>>>>> The patch is available.
>>>>>>
>>>>>> As a side note, Simon chose to do his high school senior project
>>>>>> contributing to Apache Cassandra. This patch is his first contribution
>>>>>> for his senior project (his second feature contribution to Apache Cassandra).
>>>>>>
>>>>>>
>>>>>>
>>>>>


Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-14 Thread Jon Haddad
Personally, I don't think that something being scary at first glance is a
good reason not to explore an idea.  The scenario you've described here is
tricky, but I'm not expecting it to be any worse than, say, SAI, which (the
last I checked) has O(N) complexity on returning result sets with regard to
rows returned.  We've also merged in Vector search which has O(N) overhead
with the number of SSTables.  We're still fundamentally looking at, in most
cases, a limited number of SSTables and some merging of values.
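Here is a rough illustration of "some merging of values" across a limited number of SSTables: a minimal Python sketch of a k-way merge. It assumes each SSTable has been reduced to a list of `(key, timestamp, value)` cells sorted by key, and that the newest timestamp wins. Real Cassandra reads also reconcile tombstones and timestamp ties, which this sketch ignores. `heapq.merge` keeps one pending entry per input run on a heap, so merging n cells from k runs costs roughly O(n log k).

```python
import heapq
from itertools import groupby
from typing import Iterator, List, Tuple

Cell = Tuple[str, int, str]  # (clustering key, write timestamp, value)


def merge_runs(runs: List[List[Cell]]) -> Iterator[Tuple[str, str]]:
    """Lazily merge key-sorted runs, keeping only the newest value per key."""
    # heapq.merge holds one pending cell per run, so the heap never exceeds k.
    merged = heapq.merge(*runs, key=lambda cell: cell[0])
    for key, versions in groupby(merged, key=lambda cell: cell[0]):
        newest = max(versions, key=lambda cell: cell[1])  # last write wins
        yield key, newest[2]


runs = [
    [("a", 1, "x"), ("c", 5, "y")],
    [("a", 3, "z"), ("b", 2, "w")],
    [("c", 4, "q")],
]
print(list(merge_runs(runs)))  # [('a', 'z'), ('b', 'w'), ('c', 'y')]
```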

Write updates are essentially a timestamped mask, potentially overlapping,
and I suspect potentially resolvable during compaction by propagating the
values.  They could be eliminated or narrowed based on how they've
propagated by using the timestamp metadata on the SSTable.
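The "timestamped mask" idea can be sketched as follows. This is purely a thought experiment: `RangeUpdate`, `read`, and `drop_shadowed` are hypothetical names, keys are integers, bounds are inclusive, and timestamps are assumed unique. A read takes the newest of the point write and every covering mask. The compaction step drops only those masks that a strictly newer mask fully covers, which can never change a read result. Masks in general still have to be retained.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RangeUpdate:
    """Hypothetical range write: sets `value` for every key in [lo, hi]."""
    lo: int
    hi: int
    ts: int
    value: str

    def covers(self, key: int) -> bool:
        return self.lo <= key <= self.hi


def read(key: int, points: Dict[int, Tuple[int, str]],
         masks: List[RangeUpdate]) -> Optional[str]:
    """The newest timestamp wins between the point write and covering masks."""
    best = points.get(key)  # (ts, value) or None
    for mask in masks:
        if mask.covers(key) and (best is None or mask.ts > best[0]):
            best = (mask.ts, mask.value)
    return None if best is None else best[1]


def drop_shadowed(masks: List[RangeUpdate]) -> List[RangeUpdate]:
    """Conservative compaction step: drop a mask only if a strictly newer mask
    covers its whole range, so it can never win a read."""
    return [m for m in masks
            if not any(o.ts > m.ts and o.lo <= m.lo and m.hi <= o.hi
                       for o in masks)]


points = {5: (1, "old")}
masks = [RangeUpdate(0, 10, 2, "A"), RangeUpdate(3, 6, 3, "B")]
print(read(5, points, masks), read(8, points, masks), read(20, points, masks))
# B A None
```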

It would be a lot more constructive to apply our brains towards solving an
interesting problem than pointing out all its potential flaws based on gut
feelings.  We haven't even moved this past an idea.

I think it would solve a massive problem for a lot of people and is 100%
worth considering.  Thanks Patrick and David for raising this.

Jon



On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev 
wrote:

> Ranged update sounds like a disaster for compaction and read performance.
>
> Imagine compacting or reading some SSTables in which a large number of
> overlapping but non-identical ranges were updated with different values. It
> gives me a headache just thinking about it.
>
> Ranged delete is much simpler, because the "value" is the same tombstone
> marker, and it also is guaranteed to expire and disappear eventually, so
> the performance impact of dealing with them at read and compaction time
> doesn't persist in the long term.
>
> On 14/05/2024 16:59, Benjamin Lerer wrote:
>
> It should be like range tombstones ... but much worse ;-). A tombstone is a
> simple marker (deleted). An update can be far more complex.
>
> On Tue, May 14, 2024 at 15:52, Jon Haddad wrote:
>
>> Is there a technical limitation that would prevent a range write that
>> functions the same way as a range tombstone, other than probably needing a
>> version bump of the storage format?
>>
>>
>> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer 
>> wrote:
>>
>>> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs.
>>> They do work on DELETE because under the hood, C* translates them into
>>> range tombstones.
>>>
>>> On Tue, May 14, 2024 at 02:44, David Capwell wrote:
>>>
>>>> I would also include it in UPDATE… but yeah, <3 BETWEEN and welcome this
>>>> work.
>>>>
>>>> On May 13, 2024, at 7:40 AM, Patrick McFadin wrote:
>>>>
>>>> This is a great feature addition to CQL! I get asked about it from time
>>>> to time but then people figure out a workaround. It will be great to just
>>>> have it available.
>>>>
>>>> And right on Simon! I think the only project I had as a high school
>>>> senior was figuring out how many parties I could go to and still maintain a
>>>> passing grade. Thanks for your work here.
>>>>
>>>> Patrick
>>>>
>>>> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer wrote:
>>>>
>>>>> Hi everybody,
>>>>>
>>>>> Just raising awareness that Simon is working on adding support for the
>>>>> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604.
>>>>> We plan to add support for it in conditions in a separate patch.
>>>>>
>>>>> The patch is available.
>>>>>
>>>>> As a side note, Simon chose to do his high school senior project
>>>>> contributing to Apache Cassandra. This patch is his first contribution for
>>>>> his senior project (his second feature contribution to Apache Cassandra).
>>>>>
>>>>>
>>>>>



Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-14 Thread Bowen Song via dev

Ranged update sounds like a disaster for compaction and read performance.

Imagine compacting or reading some SSTables in which a large number of 
overlapping but non-identical ranges were updated with different values. 
It gives me a headache just thinking about it.


Ranged delete is much simpler, because the "value" is the same tombstone 
marker, and it also is guaranteed to expire and disappear eventually, so 
the performance impact of dealing with them at read and compaction time 
doesn't persist in the long term.
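Bowen's point that range deletions eventually disappear comes from tombstone purging. This minimal sketch assumes only the gc_grace_seconds rule (Cassandra's default is 864000 seconds, i.e. 10 days). It ignores the additional checks a real compaction makes, such as whether older overlapping data might still exist in SSTables outside the compaction. `RangeTombstone`, `is_purgeable`, and `compact_tombstones` are hypothetical names. A ranged update would carry live data, so it would have no equivalent expiry.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days)


@dataclass(frozen=True)
class RangeTombstone:
    lo: int
    hi: int
    local_deletion_time: int  # seconds since the epoch when the delete happened


def is_purgeable(t: RangeTombstone, now: int,
                 gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """Simplified rule: purgeable once gc_grace_seconds have fully elapsed."""
    return t.local_deletion_time + gc_grace_seconds < now


def compact_tombstones(tombstones: List[RangeTombstone],
                       now: int) -> List[RangeTombstone]:
    """Keep only the tombstones that are still within their grace period."""
    return [t for t in tombstones if not is_purgeable(t, now)]
```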



On 14/05/2024 16:59, Benjamin Lerer wrote:
It should be like range tombstones ... but much worse ;-). A tombstone 
is a simple marker (deleted). An update can be far more complex.


On Tue, May 14, 2024 at 15:52, Jon Haddad wrote:

Is there a technical limitation that would prevent a range write
that functions the same way as a range tombstone, other than
probably needing a version bump of the storage format?


On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer
 wrote:

Range restrictions (>, >=, <=, < and BETWEEN) do not work on
UPDATEs. They do work on DELETE because under the hood, C*
translates them into range tombstones.

On Tue, May 14, 2024 at 02:44, David Capwell wrote:

I would also include it in UPDATE… but yeah, <3 BETWEEN and
welcome this work.


On May 13, 2024, at 7:40 AM, Patrick McFadin
 wrote:

This is a great feature addition to CQL! I get
asked about it from time to time but then people figure
out a workaround. It will be great to just have it
available.

And right on Simon! I think the only project I had as a
high school senior was figuring out how many parties I
could go to and still maintain a passing grade. Thanks
for your work here.

Patrick

On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer
 wrote:

Hi everybody,

Just raising awareness that Simon is working on
adding support for the BETWEEN operator in WHERE
clauses (SELECT and DELETE) in CASSANDRA-19604. We
plan to add support for it in conditions in a
separate patch.

The patch is available.

As a side note, Simon chose to do his high school
senior project contributing to Apache Cassandra. This
patch is his first contribution for his senior
project (his second feature contribution to Apache
Cassandra).




Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-14 Thread Benjamin Lerer
It should be like range tombstones ... but much worse ;-). A tombstone is a
simple marker (deleted). An update can be far more complex.
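Here is a concrete version of the contrast, as a minimal sketch. It assumes hypothetical range updates represented as `(lo, hi, timestamp, {column: value})` with inclusive integer bounds. A range tombstone needs only a single timestamp comparison per cell; this sketch assumes deletions win timestamp ties, as they do for Cassandra cells. Overlapping range updates that set different columns, by contrast, must be reconciled column by column for every row they cover. The function names are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical range update: (lo, hi, ts, {column: value}), bounds inclusive.
RangeUpdate = Tuple[int, int, int, Dict[str, str]]
# Range tombstone: (lo, hi, ts), bounds inclusive.
RangeTombstone = Tuple[int, int, int]


def row_from_updates(key: int, updates: List[RangeUpdate]) -> Dict[str, str]:
    """Per-column last-write-wins across every range update covering `key`."""
    cells: Dict[str, Tuple[int, str]] = {}
    for lo, hi, ts, columns in updates:
        if lo <= key <= hi:
            for column, value in columns.items():
                if column not in cells or ts > cells[column][0]:
                    cells[column] = (ts, value)
    return {column: value for column, (_, value) in cells.items()}


def shadowed_by_tombstones(key: int, cell_ts: int,
                           tombstones: List[RangeTombstone]) -> bool:
    """One comparison: is the cell older than (or tied with) the newest
    covering tombstone? Ties go to the deletion."""
    newest = max((ts for lo, hi, ts in tombstones if lo <= key <= hi),
                 default=None)
    return newest is not None and cell_ts <= newest


updates = [(0, 10, 5, {"a": "x", "b": "y"}), (5, 15, 7, {"b": "z"})]
print(row_from_updates(6, updates))  # {'a': 'x', 'b': 'z'}
```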

On Tue, May 14, 2024 at 15:52, Jon Haddad wrote:

> Is there a technical limitation that would prevent a range write that
> functions the same way as a range tombstone, other than probably needing a
> version bump of the storage format?
>
>
> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer  wrote:
>
>> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs.
>> They do work on DELETE because under the hood, C* translates them into
>> range tombstones.
>>
>> On Tue, May 14, 2024 at 02:44, David Capwell wrote:
>>
>>> I would also include it in UPDATE… but yeah, <3 BETWEEN and welcome this
>>> work.
>>>
>>> On May 13, 2024, at 7:40 AM, Patrick McFadin  wrote:
>>>
>>> This is a great feature addition to CQL! I get asked about it from time
>>> to time but then people figure out a workaround. It will be great to just
>>> have it available.
>>>
>>> And right on Simon! I think the only project I had as a high school
>>> senior was figuring out how many parties I could go to and still maintain a
>>> passing grade. Thanks for your work here.
>>>
>>> Patrick
>>>
>>> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer 
>>> wrote:
>>>
>>>> Hi everybody,
>>>>
>>>> Just raising awareness that Simon is working on adding support for the
>>>> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604.
>>>> We plan to add support for it in conditions in a separate patch.
>>>>
>>>> The patch is available.
>>>>
>>>> As a side note, Simon chose to do his high school senior project
>>>> contributing to Apache Cassandra. This patch is his first contribution for
>>>> his senior project (his second feature contribution to Apache Cassandra).
>>>>
>>>>
>>>>
>>>


Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-14 Thread Jon Haddad
Is there a technical limitation that would prevent a range write that
functions the same way as a range tombstone, other than probably needing a
version bump of the storage format?


On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer  wrote:

> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs. They
> do work on DELETE because under the hood, C* translates them into range
> tombstones.
>
> On Tue, May 14, 2024 at 02:44, David Capwell wrote:
>
>> I would also include it in UPDATE… but yeah, <3 BETWEEN and welcome this
>> work.
>>
>> On May 13, 2024, at 7:40 AM, Patrick McFadin  wrote:
>>
>> This is a great feature addition to CQL! I get asked about it from time
>> to time but then people figure out a workaround. It will be great to just
>> have it available.
>>
>> And right on Simon! I think the only project I had as a high school
>> senior was figuring out how many parties I could go to and still maintain a
>> passing grade. Thanks for your work here.
>>
>> Patrick
>>
>> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer  wrote:
>>
>>> Hi everybody,
>>>
>>> Just raising awareness that Simon is working on adding support for the
>>> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604.
>>> We plan to add support for it in conditions in a separate patch.
>>>
>>> The patch is available.
>>>
>>> As a side note, Simon chose to do his high school senior project
>>> contributing to Apache Cassandra. This patch is his first contribution for
>>> his senior project (his second feature contribution to Apache Cassandra).
>>>
>>>
>>>
>>


Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-14 Thread Benjamin Lerer
Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs. They
do work on DELETE because under the hood, C* translates them into range
tombstones.
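The translation Benjamin describes can be pictured as mapping each restriction on a clustering column to a slice with inclusive or exclusive bounds, which is the span a range tombstone covers. This is a minimal sketch with several assumptions: `Bound`, `Slice`, and `slice_for` are hypothetical names, not Cassandra's internal classes; keys are integers; and BETWEEN is taken to be inclusive on both ends, as in SQL. Under those assumptions, `DELETE FROM ks.t WHERE pk = 0 AND ck BETWEEN 3 AND 7` (a hypothetical table) would delete the slice `[3, 7]`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bound:
    value: int
    inclusive: bool


@dataclass(frozen=True)
class Slice:
    """Hypothetical clustering slice; None means unbounded on that side."""
    start: Optional[Bound] = None
    end: Optional[Bound] = None

    def contains(self, key: int) -> bool:
        if self.start is not None and (
            key < self.start.value
            or (key == self.start.value and not self.start.inclusive)
        ):
            return False
        if self.end is not None and (
            key > self.end.value
            or (key == self.end.value and not self.end.inclusive)
        ):
            return False
        return True


def slice_for(op: str, value: int, upper: Optional[int] = None) -> Slice:
    """Map one restriction on a clustering column to the slice a range
    tombstone would cover."""
    if op == ">":
        return Slice(start=Bound(value, False))
    if op == ">=":
        return Slice(start=Bound(value, True))
    if op == "<":
        return Slice(end=Bound(value, False))
    if op == "<=":
        return Slice(end=Bound(value, True))
    if op == "BETWEEN" and upper is not None:
        # Assumed SQL semantics: inclusive on both ends.
        return Slice(Bound(value, True), Bound(upper, True))
    raise ValueError(f"unsupported restriction: {op}")
```

An UPDATE has no such single-marker representation, which is why the same restrictions are rejected there.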

On Tue, May 14, 2024 at 02:44, David Capwell wrote:

> I would also include it in UPDATE… but yeah, <3 BETWEEN and welcome this work.
>
> On May 13, 2024, at 7:40 AM, Patrick McFadin  wrote:
>
> This is a great feature addition to CQL! I get asked about it from time to
> time but then people figure out a workaround. It will be great to just have
> it available.
>
> And right on Simon! I think the only project I had as a high school senior
> was figuring out how many parties I could go to and still maintain a
> passing grade. Thanks for your work here.
>
> Patrick
>
> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer  wrote:
>
>> Hi everybody,
>>
>> Just raising awareness that Simon is working on adding support for the
>> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604.
>> We plan to add support for it in conditions in a separate patch.
>>
>> The patch is available.
>>
>> As a side note, Simon chose to do his high school senior project
>> contributing to Apache Cassandra. This patch is his first contribution for
>> his senior project (his second feature contribution to Apache Cassandra).
>>
>>
>>
>


Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-13 Thread David Capwell
I would also include it in UPDATE… but yeah, <3 BETWEEN and welcome this work.

> On May 13, 2024, at 7:40 AM, Patrick McFadin  wrote:
> 
> This is a great feature addition to CQL! I get asked about it from time to 
> time but then people figure out a workaround. It will be great to just have 
> it available. 
> 
> And right on Simon! I think the only project I had as a high school senior 
> was figuring out how many parties I could go to and still maintain a passing 
> grade. Thanks for your work here. 
> 
> Patrick 
> 
> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer wrote:
>> Hi everybody,
>> 
>> Just raising awareness that Simon is working on adding support for the 
>> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604. We 
>> plan to add support for it in conditions in a separate patch.
>> 
>> The patch is available.
>> 
>> As a side note, Simon chose to do his high school senior project contributing 
>> to Apache Cassandra. This patch is his first contribution for his senior 
>> project (his second feature contribution to Apache Cassandra).
>> 
>> 



Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-13 Thread Patrick McFadin
This is a great feature addition to CQL! I get asked about it from time to
time but then people figure out a workaround. It will be great to just have
it available.

And right on Simon! I think the only project I had as a high school senior
was figuring out how many parties I could go to and still maintain a
passing grade. Thanks for your work here.

Patrick

On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer  wrote:

> Hi everybody,
>
> Just raising awareness that Simon is working on adding support for the
> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604.
> We plan to add support for it in conditions in a separate patch.
>
> The patch is available.
>
> As a side note, Simon chose to do his high school senior project
> contributing to Apache Cassandra. This patch is his first contribution for
> his senior project (his second feature contribution to Apache Cassandra).
>
>
>


[DISCUSS] Adding support for BETWEEN operator

2024-05-13 Thread Benjamin Lerer
Hi everybody,

Just raising awareness that Simon is working on adding support for the
BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604.
We plan to add support for it in conditions in a separate patch.
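Assuming BETWEEN follows standard SQL semantics, `ck BETWEEN lo AND hi` means the same as `ck >= lo AND ck <= hi`, which can already be written with the existing range operators. Rows in a partition are stored sorted by clustering key, so that restriction selects one contiguous run. The Python sketch below finds that run with two binary searches. The function name and the plain sorted list standing in for a partition are illustrative assumptions. When `lo > hi` this sketch returns no rows; the actual patch may handle that case differently.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def between(sorted_keys: Sequence[int], lo: int, hi: int) -> List[int]:
    """Keys satisfying `key >= lo AND key <= hi` in a key-sorted partition."""
    start = bisect_left(sorted_keys, lo)   # index of the first key >= lo
    end = bisect_right(sorted_keys, hi)    # index just past the last key <= hi
    return list(sorted_keys[start:end])    # empty when lo > hi


keys = [1, 3, 5, 7, 9]
print(between(keys, 3, 7))  # [3, 5, 7]
print(between(keys, 4, 6))  # [5]
print(between(keys, 7, 3))  # []
```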

The patch is available.

As a side note, Simon chose to do his high school senior project
contributing to Apache Cassandra. This patch is his first contribution for
his senior project (his second feature contribution to Apache Cassandra).