Re: [VOTE][IP CLEARANCE] GoCQL driver

2024-06-26 Thread Ekaterina Dimitrova
+1

On Wed, 26 Jun 2024 at 6:14, Jon Haddad  wrote:

> +1.
>
> On Wed, Jun 26, 2024 at 1:50 AM J. D. Jordan 
> wrote:
>
>> +1 nb. Good to see this heavily used driver get continued development in
>> the project.
>>
>> > On Jun 25, 2024, at 5:29 PM, Michael Shuler 
>> wrote:
>> >
>> > +1
>> >
>> > Kind regards,
>> > Michael
>> >
>> >> On 6/25/24 12:29, Mick Semb Wever wrote:
>> >> Please vote on the acceptance of the GoCQL driver and its IP Clearance:
>> >> https://incubator.apache.org/ip-clearance/cassandra-gocql-driver.html
>> 
>> >> All consent from original authors of the donation, and tracking of
>> collected CLAs, is found in:
>> >>  - https://github.com/gocql/gocql/issues/1751
>> >>  - https://cwiki.apache.org/confluence/pages/worddav/preview.action?fileName=GoCQL+ASF+CLA+collection.xlsx=225152485
>> >> These do not require acknowledgement before the vote.
>> >> The code is prepared for donation at https://github.com/gocql/gocql
>> >> Once this vote passes we will request ASF Infra to move the gocql/gocql
>> as-is to apache/cassandra-gocql-driver. The master branch and tags, with
>> all their history, will be kept. Because consent and CLAs were not
>> received from all original authors, the source files keep additional
>> reference to their earlier copyright authors and license.
>> >> This will become part of the Drivers subproject, ref CEP-8:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
>> >> PMC members, please check carefully the IP Clearance requirements
>> before voting.
>> >> The vote will be open for 72 hours (or longer). Votes by PMC members are
>> considered binding. A vote passes if there are at least three binding +1s and
>> no -1's.
>> >> regards,
>> >> Mick
>>
>


Re: Cassandra PMC Chair Rotation, 2024 Edition

2024-06-20 Thread Ekaterina Dimitrova
Thank you, Josh, and congrats, Dinesh!

On Thu, 20 Jun 2024 at 19:44, Joseph Lynch  wrote:

> This is exciting news!!
>
> Congratulations Dinesh and thank you for taking on this role!  Also thank
> you to Josh for taking the role this year!
>
> -Joey
>
> On Thu, Jun 20, 2024 at 12:39 PM Rahul Xavier Singh <
> rahul.xavier.si...@gmail.com> wrote:
>
>> Congrats Dinesh!
>>
>> On Thu, Jun 20, 2024 at 12:27 PM Francisco Guerrero 
>> wrote:
>>
>>> Thanks Josh for your contributions to the project as PMC Chair.
>>> Congratulations Dinesh!
>>>
>>> On 2024/06/20 16:25:26 David Capwell wrote:
>>> > Congrats!
>>> >
>>> > > On Jun 20, 2024, at 9:10 AM, Melissa Logan 
>>> wrote:
>>> > >
>>> > > Josh, thank you for your time as chair + congrats Dinesh!
>>> > >
>>> > > On Thu, Jun 20, 2024 at 9:08 AM Abe Ratnofsky <a...@aber.io> wrote:
>>> > >> Congrats Dinesh! Thank you Josh!
>>> > >>
>>> > >>> On Jun 20, 2024, at 11:53 AM, Jeremiah Jordan <
>>> jeremiah.jor...@gmail.com > wrote:
>>> > >>>
>>> > >>> Welcome to the Chair role Dinesh!  Congrats!
>>> > >>>
>>> > >>> On Jun 20, 2024 at 10:50:37 AM, Josh McKenzie <
>>> jmcken...@apache.org > wrote:
>>> >  Another PMC Chair baton pass incoming! On behalf of the Apache
>>> Cassandra Project Management Committee (PMC) I would like to welcome and
>>> congratulate our next PMC Chair Dinesh Joshi (djoshi).
>>> > 
>>> >  Dinesh has been a member of the PMC for a few years now and many
>>> of you likely know him from his thoughtful, measured presence on many of
>>> our collective discussions as we've grown and evolved over the past few
>>> years.
>>> > 
>>> >  I appreciate the project trusting me as liaison with the board
>>> over the past year and look forward to supporting Dinesh in the role in the
>>> future.
>>> > 
>>> >  Repeating Mick (repeating Paulo's) words from last year: The
>>> chair is an administrative position that interfaces with the Apache
>>> Software Foundation Board, by submitting regular reports about project
>>> status and health. Read more about the PMC chair role on Apache projects:
>>> >  - https://www.apache.org/foundation/how-it-works.html#pmc
>>> >  - https://www.apache.org/foundation/how-it-works.html#pmc-chair
>>> >  -
>>> https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers
>>> > 
>>> >  The PMC as a whole is the entity that oversees and leads the
>>> project and any PMC member can be approached as a representative of the
>>> committee. A list of Apache Cassandra PMC members can be found on:
>>> https://cassandra.apache.org/_/community.html
>>> > >>
>>> >
>>> >
>>>
>>


Re: [DISCUSS] LWT conditions behavior on collections is inconsistent (CASSANDRA-19637)

2024-06-13 Thread Ekaterina Dimitrova
Hey Benjamin,

Thanks for the fix and for raising the point here. The patch seems reasonable
to me. The only thing left to confirm, in my mind, is whether people consider
it a behavior-breaking change or only a bug fix. Based on that, we can move
forward with the patch for 4.0+ or only for trunk.

It would also be good to ensure we have similar behavior in Accord.

Thoughts, anyone?

Best regards,
Ekaterina

On Tue, 11 Jun 2024 at 18:35, Benjamin Lerer  wrote:

> Hi everybody,
>
> I wanted to raise attention to CASSANDRA-19637, which I already
> mentioned in the "[DISCUSS] NULL handling and the unfrozen collection
> issue" thread.
>
> The patch attempts to fix some inconsistencies in the way LWT conditions
> work with collections when being compared to null values.
>
> In case you feel that the solution is not going in the right direction or
> you believe that there are some valid use cases that will be impacted by
> this change in behavior feel free to raise the problem.
>


Re: [DISCUSS] Stream Pipelines on hot paths

2024-06-02 Thread Ekaterina Dimitrova
+1 to ban them, considering we also do not have any regular performance
testing as part of the project test suites.
Jacek has a good point about the checkstyle - there is a separate one for
tests. Though I would not object if people want them banned from tests too.

I guess if we ban them this would mean - no new additions and fix only
places where there is a clear problem?
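
To make concrete what a ban would replace, here is a minimal sketch. It is not taken from the Cassandra code base; the class and method names are hypothetical and chosen only for illustration. It writes the same aggregation once as a Stream pipeline and once as a plain indexed loop. The stream version typically allocates pipeline objects on every call, which is the kind of overhead people in this thread report seeing on hot paths.

```java
import java.util.List;

// Hypothetical example class, not part of Cassandra.
final class HotPathSum {
    // Stream pipeline: concise, but builds a pipeline on each invocation.
    static long sumWithStream(List<Long> latencies) {
        return latencies.stream().mapToLong(Long::longValue).sum();
    }

    // Plain indexed loop: same result without creating a pipeline or iterator.
    static long sumWithLoop(List<Long> latencies) {
        long total = 0;
        for (int i = 0; i < latencies.size(); i++) {
            total += latencies.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> sample = List.of(3L, 5L, 8L);
        // Both variants compute the same sum.
        System.out.println(sumWithStream(sample) + " " + sumWithLoop(sample));
    }
}
```

Whether the difference matters for a given call site still has to be measured; the sketch only shows the shape of the rewrite.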

On Sun, 2 Jun 2024 at 14:03, Enrico Olivelli  wrote:

> +1 (from the peanuts gallery)
>
> Removing streams from anything that looks like a hot path is indeed a
> good thing.
>
> Please balance with 'don't fix things that aren't broken'.
>
> While doing such changes seems a great idea, sometimes it may have side
> effects that you don't see until you run on real datasets.
>
> Enrico
>
>
> On Sun, 2 Jun 2024, 19:38 J. D. Jordan wrote:
>
>> +1 agree with all this.  Also fine to just use in tests or ban completely.
>>
>> On Jun 2, 2024, at 11:58 AM, Jake Luciani  wrote:
>>
>> 
>>
>> +1 Java streams cause perf issues in hot paths. It's fine for tests and
>> slow paths. But for clarity it's fine to ban it as well if the majority
>> agrees.
>>
>>
>> On Sun, Jun 2, 2024 at 12:34 PM Sam  wrote:
>>
>>> I completely agree with this. I suspect there is a lot of low-hanging
>>> fruit with easy perf wins.
>>>
>>> On profiling a 90% write workload I found
>>> StorageProxy::updateCoordinatorWriteLatencyTableMetric to be a hot-path,
>>> consuming between 15-20% of ModificationStatement::executeWithoutCondition
>>> cycles.
>>>
>>> https://github.com/apache/cassandra/pull/3344
>>> 
>>>
>>>
>>> On Fri, 31 May 2024 at 21:19, Jacek Lewandowski <
>>> lewandowski.ja...@gmail.com> wrote:
>>>
 Usages of them in tests are ok I think. We have a separate checkstyle
 file for the test code.

 - - -- --- -  -
 Jacek Lewandowski


 On Fri, 31 May 2024 at 19:14, David Capwell wrote:

> I am cool with forbidding, with a callout that tests are ok.  I am cool
> with forbidding in tests as well, but that's more for consistency reasons
> than anything.
>
> On May 31, 2024, at 8:12 AM, Brandon Williams 
> wrote:
>
>
> On Fri, May 31, 2024 at 9:35 AM Abe Ratnofsky  wrote:
>
>> +1 to forbidding Stream usage entirely; the convenience of using them
>> outside of hot paths is less than the burden of figuring out whether or
>> not a particular path is hot.
>>
>
> I think I have most frequently appreciated them in tests, which I
> think we could except, since these are categorically not in the hot path.
>
> Kind Regards,
> Brandon
>
>
>
>


Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-16 Thread Ekaterina Dimitrova
I second Patrick about the parties and all that… Thanks, Simon for all your
work! I am excited to see what’s next from you as I am sure it will be
awesome!

Cheers!

On Thu, 16 May 2024 at 14:50, Jon Haddad  wrote:

> Benjamin, I’m +1 on adding BETWEEN, thanks for bringing this up.
>
> To all, my intention wasn’t to suggest we add support for update between
> via range writes at the same time. If it came across that way, I apologize
> for the confusion.
>
> Josh, thanks for the suggestion. If I feel inspired to discuss with the
> dev list any further I’ll be sure to start a new thread.
>
> Jon
>
>
> On Thu, May 16, 2024 at 7:57 AM Josh McKenzie 
> wrote:
>
>> More of a "how could we technically reach mars?" discussion than a "how
>> we get congress to authorize a budget to reach mars?"
>>
>> Wow - that is genuinely a great simile. Really good point.
>>
>> To Jeff's point - want to kick off a [DISCUSS] thread referencing this
>> thread Jon so we can take the conversation there? Definitely think it's
>> worth continuing from a technical perspective.
>>
>> On Wed, May 15, 2024, at 2:49 PM, Jeff Jirsa wrote:
>>
>> You can remove the shadowed values at compaction time, but you can’t ever
>> fully propagate the range update to point updates, so you’d be propagating
>> all of the range-update structures throughout everything forever. It’s JUST
>> like a range tombstone - you don’t know what it’s shadowing (and can’t, in
>> many cases, because the width of the range is uncountable for some types).
>>
>> Setting aside whether or not this construct is worth adding (I suspect a
>> lot of binding votes would say it’s not), the thread focuses on BETWEEN
>> operator, and there’s no reason we should pollute the conversation of “add
>> a missing SQL operator that basically maps to existing functionality” with
>> creation of a brand new form of update that definitely doesn’t map to any
>> existing concepts.
>>
>>
>>
>>
>>
>> On May 14, 2024, at 10:05 AM, Jon Haddad  wrote:
>>
>> Personally, I don't think that something being scary at first glance is a
>> good reason not to explore an idea.  The scenario you've described here is
>> tricky but I'm not expecting it to be any worse than say, SAI, which (the
>> last I checked) has O(N) complexity on returning result sets with regard to
>> rows returned.  We've also merged in Vector search which has O(N) overhead
>> with the number of SSTables.  We're still fundamentally looking at, in most
>> cases, a limited number of SSTables and some merging of values.
>>
>> Write updates are essentially a timestamped mask, potentially
>> overlapping, and I suspect potentially resolvable during compaction by
>> propagating the values.  They could be eliminated or narrowed based on how
>> they've propagated by using the timestamp metadata on the SSTable.
>>
>> It would be a lot more constructive to apply our brains towards solving
>> an interesting problem than pointing out all its potential flaws based on
>> gut feelings.  We haven't even moved this past an idea.
>>
>> I think it would solve a massive problem for a lot of people and is 100%
>> worth considering.  Thanks Patrick and David for raising this.
>>
>> Jon
>>
>>
>>
>> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev <
>> dev@cassandra.apache.org> wrote:
>>
>>
>> Ranged update sounds like a disaster for compaction and read performance.
>>
>> Imagine compacting or reading some SSTables in which a large number of
>> overlapping but non-identical ranges were updated with different values. It
>> gives me a headache by just thinking about it.
>>
>> Ranged delete is much simpler, because the "value" is the same tombstone
>> marker, and it also is guaranteed to expire and disappear eventually, so
>> the performance impact of dealing with them at read and compaction time
>> doesn't suffer in the long term.
>>
>> On 14/05/2024 16:59, Benjamin Lerer wrote:
>>
>> It should be like range tombstones ... but much worse ;-). A tombstone is
>> a simple marker (deleted). An update can be far more complex.
>>
>> On Tue, 14 May 2024 at 15:52, Jon Haddad wrote:
>>
>> Is there a technical limitation that would prevent a range write that
>> functions the same way as a range tombstone, other than probably needing a
>> version bump of the storage format?
>>
>>
>> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer 
>> wrote:
>>
>> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs.
>> They do work on DELETE because under the hood in C* they get translated
>> into range tombstones.
>>
>> On Tue, 14 May 2024 at 02:44, David Capwell wrote:
>>
>> I would also include in UPDATE… but yeah, <3 BETWEEN and welcome this
>> work.
>>
>> On May 13, 2024, at 7:40 AM, Patrick McFadin  wrote:
>>
>> This is a great feature addition to CQL! I get asked about it from time
>> to time but then people figure out a workaround. It will be great to just
>> have it available.
>>
>> And right on Simon! I think the only project I had as a high school
>> senior was 

Re: discuss: add to_human_size function

2024-04-25 Thread Ekaterina Dimitrova
All I am saying is that, while changing those classes, we should be careful
not to open the door for someone to set a cassandra.yaml parameter to 512MiB
and have it converted to 0 GiB internally. That means loss of precision and
weird settings. As long as that Pandora's box stays closed, all good.

 I do support this new function addition proposed here, thank you!
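
The precision concern can be shown with a few lines of arithmetic. This is a minimal sketch under stated assumptions: it does not use or reproduce DataStorageSpec, and the class, constant, and method names are hypothetical. It only demonstrates why an integer conversion from a smaller unit to a bigger one is lossy, while the opposite direction is exact.

```java
// Hypothetical illustration; not the Cassandra configuration classes.
final class UnitPrecision {
    static final long MIB_PER_GIB = 1024L;

    // Smaller to bigger unit: integer division silently drops the remainder.
    static long mebibytesToGibibytesTruncating(long mebibytes) {
        return mebibytes / MIB_PER_GIB;
    }

    // Bigger to smaller unit: exact, and fails loudly on overflow.
    static long gibibytesToMebibytes(long gibibytes) {
        return Math.multiplyExact(gibibytes, MIB_PER_GIB);
    }

    public static void main(String[] args) {
        System.out.println(mebibytesToGibibytesTruncating(512)); // prints 0
        System.out.println(gibibytesToMebibytes(5));             // prints 5120
    }
}
```

A value of 512 MiB stored internally as whole GiB becomes 0, which is exactly the kind of setting the configuration classes are meant to rule out.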

On Thu, 25 Apr 2024 at 7:31, Jon Haddad  wrote:

> I can’t see a good reason not to support it. Seems like extra work to
> avoid with no benefit.
>
> —
>
> Jon Haddad
> Rustyrazorblade Consulting
> rustyrazorblade.com
>
>
> On Thu, Apr 25, 2024 at 7:16 AM Štefan Miklošovič <
> stefan.mikloso...@gmail.com> wrote:
>
>> Can you elaborate on intentionally not supporting some conversions? Are
>> we safe to base these conversions on DataStorageUnit? We have set of units
>> from BYTES to GIBIBYTES and respective methods on them which convert from
>> that unit to whatever else. Is this OK to be used for the purposes of this
>> feature? I would expect that once we have units like these and methods on
>> them to convert from-to, it can be reused in wherever else.
>>
>> On Thu, Apr 25, 2024 at 4:06 PM Ekaterina Dimitrova <
>> e.dimitr...@gmail.com> wrote:
>>
>>> All I am saying is be careful with adding those conversions not to end
>>> up used while setting our configuration. Thanks 
>>>
>>> On Thu, 25 Apr 2024 at 6:53, Štefan Miklošovič <
>>> stefan.mikloso...@gmail.com> wrote:
>>>
>>>> Well, technically I do not need DataStorageSpec at all. All I need is
>>>> DataStorageUnit for that matter. That can convert from one unit to another
>>>> easily.
>>>>
>>>> We can omit tebibytes, that's just fine. People would need to live with
>>>> gibibytes at most in cqlsh output. They would not get 5 TiB but 5120 GiB, I
>>>> guess that is just enough to have a picture of what magnitude that value
>>>> looks like.
>>>>
>>>> On Thu, Apr 25, 2024 at 3:36 PM Ekaterina Dimitrova <
>>>> e.dimitr...@gmail.com> wrote:
>>>>
>>>>> Quick comment:
>>>>>
>>>>> DataRateSpec, DataStorageSpec, or DurationSpec
>>>>> - we intentionally do not support going smaller to bigger size in
>>>>> those classes which are specific for cassandra.yaml - precision issues.
>>>>> Please keep it that way. That is why the notion of min unit was added in
>>>>> cassandra.yaml for parameters that are internally represented in a bigger
>>>>> unit.
>>>>>
>>>>> I am not sure that people want to add TiB. There was explicit
>>>>> agreement what units we will allow in cassandra.yaml. I suspect any new
>>>>> units should be approved on the ML
>>>>>
>>>>> Hope this helps
>>>>>
>>>>>
>>>>>
>>>>> On Thu, 25 Apr 2024 at 5:55, Claude Warren, Jr via dev <
>>>>> dev@cassandra.apache.org> wrote:
>>>>>
>>>>>> TiB is not yet in DataStorageSpec (perhaps we should add it).
>>>>>>
>>>>>> A quick review tells me that all the units are unique across the 3
>>>>>> specs.  As long as we guarantee that in the future the method you propose
>>>>>> should be easily expandable to the other specs.
>>>>>>
>>>>>> +1 to this idea.
>>>>>>
>>>>>> On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
>>>>>> stefan.mikloso...@gmail.com> wrote:
>>>>>>
>>>>>>> That is a very interesting point, Claude. My so-far implementation
>>>>>>> is using FileUtils.stringifyFileSize which is just dividing a value by a
>>>>>>> respective divisor based on how big a value is. While this works, it will
>>>>>>> prevent you from specifying what unit you want that value to be converted
>>>>>>> to as well as it will prevent you from specifying what unit a value you
>>>>>>> provided is of. So, for example, if a column is known to be in kibibytes
>>>>>>> and we want that to be converted into gibibytes, that won't be possible
>>>>>>> because that function will think that a value is in bytes.
>>>>>>>
>>>>>>> It would be more appropriate to have something like this:
>>>>>>>
>>

Re: discuss: add to_human_size function

2024-04-25 Thread Ekaterina Dimitrova
All I am saying is be careful with adding those conversions not to end up
used while setting our configuration. Thanks 
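
For readers following the proposal quoted below, here is a rough sketch of what a display-only conversion with an explicit source and target unit could look like. Everything in it is an assumption made for illustration: the class, enum, and method names are hypothetical, the two-decimal output format is invented, and it is not the code from the pull request. It is deliberately kept separate from any configuration parsing, in line with the caution above.

```java
import java.util.Locale;

// Hypothetical sketch of a display-only helper; not the actual CQL function.
final class HumanSizeSketch {
    // Units up to GiB only, following the suggestion in this thread to omit TiB.
    enum Unit {
        B(1L), KiB(1L << 10), MiB(1L << 20), GiB(1L << 30);

        final long bytes;

        Unit(long bytes) {
            this.bytes = bytes;
        }
    }

    // Converts a value from the source unit to the target unit for display.
    // Floating point is used on purpose: the result is for humans to read,
    // not a value to be fed back into configuration.
    static String toHumanSize(long value, Unit source, Unit target) {
        double converted = (double) value * source.bytes / target.bytes;
        return String.format(Locale.ROOT, "%.2f %s", converted, target.name());
    }

    // Two-argument form: the source unit defaults to bytes.
    static String toHumanSize(long value, Unit target) {
        return toHumanSize(value, Unit.B, target);
    }

    public static void main(String[] args) {
        System.out.println(toHumanSize(1536, Unit.KiB, Unit.MiB)); // prints 1.50 MiB
        System.out.println(toHumanSize(1048576, Unit.MiB));        // prints 1.00 MiB
    }
}
```

Because the output is a formatted string with a fractional part, it cannot be mistaken for a whole-unit configuration value, which keeps the display path and the cassandra.yaml path apart.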

On Thu, 25 Apr 2024 at 6:53, Štefan Miklošovič 
wrote:

> Well, technically I do not need DataStorageSpec at all. All I need is
> DataStorageUnit for that matter. That can convert from one unit to another
> easily.
>
> We can omit tebibytes, that's just fine. People would need to live with
> gibibytes at most in cqlsh output. They would not get 5 TiB but 5120 GiB, I
> guess that is just enough to have a picture of what magnitude that value
> looks like.
>
> On Thu, Apr 25, 2024 at 3:36 PM Ekaterina Dimitrova 
> wrote:
>
>> Quick comment:
>>
>> DataRateSpec, DataStorageSpec, or DurationSpec
>> - we intentionally do not support going smaller to bigger size in those
>> classes which are specific for cassandra.yaml - precision issues. Please
>> keep it that way. That is why the notion of min unit was added in
>> cassandra.yaml for parameters that are internally represented in a bigger
>> unit.
>>
>> I am not sure that people want to add TiB. There was explicit agreement
>> what units we will allow in cassandra.yaml. I suspect any new units should
>> be approved on the ML
>>
>> Hope this helps
>>
>>
>>
>> On Thu, 25 Apr 2024 at 5:55, Claude Warren, Jr via dev <
>> dev@cassandra.apache.org> wrote:
>>
>>> TiB is not yet in DataStorageSpec (perhaps we should add it).
>>>
>>> A quick review tells me that all the units are unique across the 3
>>> specs.  As long as we guarantee that in the future the method you propose
>>> should be easily expandable to the other specs.
>>>
>>> +1 to this idea.
>>>
>>> On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
>>> stefan.mikloso...@gmail.com> wrote:
>>>
>>>> That is a very interesting point, Claude. My so-far implementation is
>>>> using FileUtils.stringifyFileSize which is just dividing a value by a
>>>> respective divisor based on how big a value is. While this works, it will
>>>> prevent you from specifying what unit you want that value to be converted
>>>> to as well as it will prevent you from specifying what unit a value you
>>>> provided is of. So, for example, if a column is known to be in kibibytes
>>>> and we want that to be converted into gibibytes, that won't be possible
>>>> because that function will think that a value is in bytes.
>>>>
>>>> It would be more appropriate to have something like this:
>>>>
>>>> to_human_size(val) -> alias to FileUtils.stringifyFileSize, without any
>>>> source nor target unit, it will consider it to be in bytes and it will
>>>> convert it like in FileUtils.stringifyFileSize
>>>>
>>>> to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
>>>> to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')
>>>>
>>>> the first argument is the source unit, the second argument is target
>>>> unit
>>>>
>>>> to_human_size(val, 'B', 'MiB')
>>>> to_human_size(val, 'B', 'GiB')
>>>> to_human_size(val, 'KiB', 'GiB')
>>>> to_human_size(val, 'KiB', 'TiB')
>>>>
>>>> I think this is more flexible and we should funnel this via
>>>> DataStorageSpec and similar as you mentioned.
>>>>
>>>> In the future, we might also add to_human_duration which would be
>>>> implemented against DurationSpec so similar conversions are possible.
>>>>
>>>> On Fri, Apr 19, 2024 at 10:53 AM Claude Warren, Jr via dev <
>>>> dev@cassandra.apache.org> wrote:
>>>>
>>>>> I like the idea.  Is the intention to have the output of the function be
>>>>> parsable by the config parsers like DataRateSpec, DataStorageSpec, or
>>>>> DurationSpec?
>>>>>
>>>>> Claude
>>>>>
>>>>> On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg 
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I think it’s a good quality of life improvement, but I am someone who
>>>>>> believes in a rich set of built-in functions being a good thing.
>>>>>>
>>>>>> A format function is a bit more scope and kind of orthogonal. It
>>>>>> would still be good to have shorthand functions for things like size.
>>>>>>
>>>>>> Ariel
>>>>>>
>>>>>> On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I want to propose CASSANDRA-19546. It would be possible to convert
>>>>>> raw numbers to something human-friendly.
>>>>>> There are cases when we write just a number of bytes in our system
>>>>>> tables but these numbers are just hard to parse visually. Users can
>>>>>> indeed use this for their tables too if they find it useful.
>>>>>>
>>>>>> Also, a user can indeed write a UDF for this but I would prefer if we
>>>>>> had something baked in.
>>>>>>
>>>>>> Does this make sense to people? Are there any other approaches to do
>>>>>> this?
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-19546
>>>>>> https://github.com/apache/cassandra/pull/3239/files
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>>
>>>>>>


Re: discuss: add to_human_size function

2024-04-25 Thread Ekaterina Dimitrova
Edit: I meant to say smaller to bigger unit, not size, sorry

On Thu, 25 Apr 2024 at 6:35, Ekaterina Dimitrova 
wrote:

> Quick comment:
>
> DataRateSpec, DataStorageSpec, or DurationSpec
> - we intentionally do not support going smaller to bigger size in those
> classes which are specific for cassandra.yaml - precision issues. Please
> keep it that way. That is why the notion of min unit was added in
> cassandra.yaml for parameters that are internally represented in a bigger
> unit.
>
> I am not sure that people want to add TiB. There was explicit agreement
> what units we will allow in cassandra.yaml. I suspect any new units should
> be approved on the ML
>
> Hope this helps
>
>
>
> On Thu, 25 Apr 2024 at 5:55, Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
>> TiB is not yet in DataStorageSpec (perhaps we should add it).
>>
>> A quick review tells me that all the units are unique across the 3
>> specs.  As long as we guarantee that in the future the method you propose
>> should be easily expandable to the other specs.
>>
>> +1 to this idea.
>>
>> On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
>> stefan.mikloso...@gmail.com> wrote:
>>
>>> That is a very interesting point, Claude. My so-far implementation is
>>> using FileUtils.stringifyFileSize which is just dividing a value by a
>>> respective divisor based on how big a value is. While this works, it will
>>> prevent you from specifying what unit you want that value to be converted
>>> to as well as it will prevent you from specifying what unit a value you
>>> provided is of. So, for example, if a column is known to be in kibibytes
>>> and we want that to be converted into gibibytes, that won't be possible
>>> because that function will think that a value is in bytes.
>>>
>>> It would be more appropriate to have something like this:
>>>
>>> to_human_size(val) -> alias to FileUtils.stringifyFileSize, without any
>>> source nor target unit, it will consider it to be in bytes and it will
>>> convert it like in FileUtils.stringifyFileSize
>>>
>>> to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
>>> to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')
>>>
>>> the first argument is the source unit, the second argument is target unit
>>>
>>> to_human_size(val, 'B', 'MiB')
>>> to_human_size(val, 'B', 'GiB')
>>> to_human_size(val, 'KiB', 'GiB')
>>> to_human_size(val, 'KiB', 'TiB')
>>>
>>> I think this is more flexible and we should funnel this via
>>> DataStorageSpec and similar as you mentioned.
>>>
>>> In the future, we might also add to_human_duration which would be
>>> implemented against DurationSpec so similar conversions are possible.
>>>
>>> On Fri, Apr 19, 2024 at 10:53 AM Claude Warren, Jr via dev <
>>> dev@cassandra.apache.org> wrote:
>>>
>>>> I like the idea.  Is the intention to have the output of the function be
>>>> parsable by the config parsers like DataRateSpec, DataStorageSpec, or
>>>> DurationSpec?
>>>>
>>>> Claude
>>>>
>>>> On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg 
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I think it’s a good quality of life improvement, but I am someone who
>>>>> believes in a rich set of built-in functions being a good thing.
>>>>>
>>>>> A format function is a bit more scope and kind of orthogonal. It would
>>>>> still be good to have shorthand functions for things like size.
>>>>>
>>>>> Ariel
>>>>>
>>>>> On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I want to propose CASSANDRA-19546. It would be possible to convert raw
>>>>> numbers to something human-friendly.
>>>>> There are cases when we write just a number of bytes in our system
>>>>> tables but these numbers are just hard to parse visually. Users can indeed
>>>>> use this for their tables too if they find it useful.
>>>>>
>>>>> Also, a user can indeed write a UDF for this but I would prefer if we
>>>>> had something baked in.
>>>>>
>>>>> Does this make sense to people? Are there any other approaches to do
>>>>> this?
>>>>>
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-19546
>>>>> https://github.com/apache/cassandra/pull/3239/files
>>>>>
>>>>> Regards
>>>>>
>>>>>
>>>>>


Re: discuss: add to_human_size function

2024-04-25 Thread Ekaterina Dimitrova
Quick comment:

DataRateSpec, DataStorageSpec, or DurationSpec
- we intentionally do not support going smaller to bigger size in those
classes which are specific for cassandra.yaml - precision issues. Please
keep it that way. That is why the notion of min unit was added in
cassandra.yaml for parameters that are internally represented in a bigger
unit.

I am not sure that people want to add TiB. There was explicit agreement
what units we will allow in cassandra.yaml. I suspect any new units should
be approved on the ML

Hope this helps



On Thu, 25 Apr 2024 at 5:55, Claude Warren, Jr via dev <
dev@cassandra.apache.org> wrote:

> TiB is not yet in DataStorageSpec (perhaps we should add it).
>
> A quick review tells me that all the units are unique across the 3 specs.
> As long as we guarantee that in the future the method you propose should be
> easily expandable to the other specs.
>
> +1 to this idea.
>
> On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
> stefan.mikloso...@gmail.com> wrote:
>
>> That is a very interesting point, Claude. My so-far implementation is
>> using FileUtils.stringifyFileSize which is just dividing a value by a
>> respective divisor based on how big a value is. While this works, it will
>> prevent you from specifying what unit you want that value to be converted
>> to as well as it will prevent you from specifying what unit a value you
>> provided is of. So, for example, if a column is known to be in kibibytes
>> and we want that to be converted into gibibytes, that won't be possible
>> because that function will think that a value is in bytes.
>>
>> It would be more appropriate to have something like this:
>>
>> to_human_size(val) -> alias to FileUtils.stringifyFileSize, without any
>> source nor target unit, it will consider it to be in bytes and it will
>> convert it like in FileUtils.stringifyFileSize
>>
>> to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
>> to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')
>>
>> the first argument is the source unit, the second argument is target unit
>>
>> to_human_size(val, 'B', 'MiB')
>> to_human_size(val, 'B', 'GiB')
>> to_human_size(val, 'KiB', 'GiB')
>> to_human_size(val, 'KiB', 'TiB')
>>
>> I think this is more flexible and we should funnel this via
>> DataStorageSpec and similar as you mentioned.
>>
>> In the future, we might also add to_human_duration which would be
>> implemented against DurationSpec so similar conversions are possible.
>>
>> On Fri, Apr 19, 2024 at 10:53 AM Claude Warren, Jr via dev <
>> dev@cassandra.apache.org> wrote:
>>
>>> I like the idea.  Is the intention to have the output of the function be
>>> parsable by the config parsers like DataRateSpec, DataStorageSpec, or
>>> DurationSpec?
>>>
>>> Claude
>>>
>>> On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg 
>>> wrote:
>>>
 Hi,

 I think it’s a good quality of life improvement, but I am someone who
 believes in a rich set of built-in functions being a good thing.

 A format function is a bit more scope and kind of orthogonal. It would
 still be good to have shorthand functions for things like size.

 Ariel

 On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:

 Hi,

 I want to propose CASSANDRA-19546. It would be possible to convert raw
 numbers to something human-friendly.
 There are cases when we write just a number of bytes in our system
 tables but these numbers are just hard to parse visually. Users can indeed
 use this for their tables too if they find it useful.

 Also, a user can indeed write a UDF for this but I would prefer if we
 had something baked in.

 Does this make sense to people? Are there any other approaches to do
 this?

 https://issues.apache.org/jira/browse/CASSANDRA-19546
 https://github.com/apache/cassandra/pull/3239/files

 Regards





Re: Welcome Alexandre Dutra, Andrew Tolbert, Bret McGuire, Olivier Michallat as Cassandra Committers

2024-04-17 Thread Ekaterina Dimitrova
Congrats and thank you for all your work on the drivers!

On Wed, 17 Apr 2024 at 13:17, Francisco Guerrero  wrote:

> Congratulations everyone!
>
> On 2024/04/17 17:14:34 Abe Ratnofsky wrote:
> > Congrats everyone!
> >
> > > On Apr 17, 2024, at 1:10 PM, Benjamin Lerer  wrote:
> > >
> > > The Apache Cassandra PMC is pleased to announce that Alexandre Dutra,
> Andrew Tolbert, Bret McGuire and Olivier Michallat have accepted the
> invitation to become committers on the java driver sub-project.
> > >
> > > Thanks for your contributions to the Java driver during all those
> years!
> > > Congratulations and welcome!
> > >
> > > The Apache Cassandra PMC members
> >
> >
>


Re: Default table compression defined in yaml.

2024-03-19 Thread Ekaterina Dimitrova
Any new settings are expected to be added in the new format

On Tue, 19 Mar 2024 at 5:52, Bowen Song via dev 
wrote:

> I believe the `foobar_in_kb: 123` format in the cassandra.yaml file is
> deprecated, and the new format is `foobar: 123KiB`. Is there a need to
> introduce new settings entries with the deprecated format only to be
> removed at a later version?
>
>
> On 18/03/2024 14:39, Claude Warren, Jr via dev wrote:
>
> After much work by several people, I have pulled together the changes to
> define the default compression in the cassandra.yaml file and have created
> a pull request [1].
>
> If you are interested in this topic, please take a look at the changes and
> give at least a cursory review.
>
> [1]  https://github.com/apache/cassandra/pull/3168
>
> Thanks,
> Claude
>
>


Re: Welcome Brad Schoening as Cassandra Committer

2024-02-21 Thread Ekaterina Dimitrova
Congrats and thank you for everything, Brad!

On Wed, 21 Feb 2024 at 15:46, Josh McKenzie  wrote:

> The Apache Cassandra PMC is pleased to announce that Brad Schoening has
> accepted
> the invitation to become a committer.
>
> Your work on the integrated python driver, launch script environment, and
> tests
> has been a big help to many. Congratulations and welcome!
>
> The Apache Cassandra PMC members
>


Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-16 Thread Ekaterina Dimitrova
Thanks for opening an epic @Jacek.

It seems the dtest_offheap job is replaced by dtest_latest, which means we
will have the same number of jobs after the current ticket, so I am not
worried about Jenkins.

In CircleCI, though, the dtest_offheap job was not a mandatory pre-commit
run, but as far as I can see this ticket suggests making dtest_latest a
mandatory part of the pre-commit workflow.
I would like to suggest we commit the current proposal. Only, I think the
config should be marked as experimental somewhere.

As a short-term solution to the increased resource consumption of pre-commit
test runs, I would like to suggest we accept running only the J11 pre-commit
workflow (which also covers tests run with J17) until we surface the other
discussion and apply other test configuration changes/optimizations.

On Fri, 16 Feb 2024 at 9:08, Paulo Motta  wrote:

> Thanks for clarifying Branimir! I'm +1 on proceeding as proposed and I
> think this change will make it easier to gain confidence to update
> configurations.
>
> Interesting discussion and suggestions on this thread - I think we can
> follow-up on improving test/CI workflow in a different thread/proposal to
> avoid blocking this.
>
> On Thu, Feb 15, 2024 at 9:59 AM Branimir Lambov <
> branimir.lam...@datastax.com> wrote:
>
>> Paulo:
>>
>>> 1) Will cassandra.yaml remain the default test config? Is the plan
>>> moving forward to require green CI for both configurations on pre-commit,
>>> or pre-release?
>>
>> The plan is to ensure both configurations are green pre-commit. This
>> should not increase the CI cost as this replaces extra configurations we
>> were running before (e.g. test-tries).
>>
>> 2) What will this mean for the release artifact, is the idea to continue
>>> shipping with the current cassandra.yaml or eventually switch to the
>>> optimized configuration (ie. 6.X) while making the legacy default
>>> configuration available via an optional flag?
>>
>> The release simply includes an additional yaml file, which contains a
>> one-liner how to use it.
>>
>> Jeff:
>>
>>> 1) If there’s an “old compatible default” and “latest recommended
>>> settings”, when does the value in “old compatible default” get updated?
>>> Never?
>>
>> This does not change anything about these decisions. The question is very
>> serious without this patch as well: Does V6 have to support pain-free
>> upgrade from V5 working in V4 compatible mode? If so, can we ever deprecate
>> or drop anything? If not, are we not breaking upgradeability promises?
>>
>> 2) If there are test failures with the new values, it seems REALLY
>>> IMPORTANT to make sure those test failures are discovered + fixed IN THE
>>> FUTURE TOO. If pushing new yaml into a different file makes us less likely
>>> to catch the failures in the future, it seems like we’re hurting ourselves.
>>> Branimir mentions this, but how do we ensure that we don’t let this pattern
>>> disguise future bugs?
>>
>> The main objective of this patch is to ensure that the second yaml is
>> tested too, pre-commit. We were not doing this for all features we tell
>> users are supported.
>>
>> Paulo:
>>
>>> - if cassandra_latest.yaml becomes the new default configuration for
> >>> 6.0, then precommit only needs to be run against that version - prerelease
>>> needs to be run against all cassandra.yaml variants.
>>
>> Assuming we keep the pace of development, there will be new "latest"
>> features in 6.0 (e.g. Accord could be one). The idea is more to move some
>> of the settings from latest to default when they are deemed mature enough.
>>
>> Josh:
>>
>>> I propose to significantly reduce that stuff. Let's distinguish the
>>> packages of tests that need to be run with CDC enabled / disabled, with
>>> commitlog compression enabled / disabled, tests that verify sstable formats
>>> (mostly io and index I guess), and leave other parameters set as with the
>>> latest configuration - this is the easiest way I think.
>>> For dtests we have vnodes/no-vnodes, offheap/onheap, and nothing about
>>> other stuff. To me running no-vnodes makes no sense because no-vnodes is
>>> just a special case of vnodes=1. On the other hand offheap/onheap buffers
>>> could be tested in unit tests. In short, I'd run dtests only with the
>>> default and latest configuration.
>>
>> Some of these changes are already done in this ticket.
>>
>> Regards,
>> Branimir
>>
>>
>>
>> On Thu, Feb 15, 2024 at 3:08 PM Paulo Motta  wrote:
>>
>>> > It's also been questioned about why we don't just enable settings we
>>> recommend.  These are settings we recommend for new clusters.  *Our
>>> existing cassandra.yaml needs to be tailored for existing clusters being
>>> upgraded, where we are very conservative about changing defaults.*
>>>
>>> I think this unnecessarily penalizes new users with subpar defaults and
>>> existing users who wish to use optimized/recommended defaults and need to
>>> maintain additional logic to support that. This change offers an
>>> opportunity to revisit this.
>>>
>>> 

Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-15 Thread Ekaterina Dimitrova
>
> Perhaps this was needed during j17 stabilization but is no longer required?

No, I only switched from tests running J8+J11 to tests running J11+J17.
What we tested was something decided in the 4.0 era when JDK11 was added,
and I was not even part of the community yet :-)

> Any known java-related changes require precommit j11 + j17.

While it is correct that we need to run both when changing anything
Java-related, I think there are more cases that are not always obvious
during development.
Just a few examples off the top of my head:
- Every next Java version is closing more and more internals - this means
that someone can be tempted to use some JDK11 internals, which are still
open in 11 but closed in later Java versions, and we will be in trouble. I
would prefer to see that before committing the patch, not after. Some cases
can be considered early in development and save us time. They are not
necessarily caught in compile time.
- Not all dependencies always support all Java versions, and it is not
necessarily immediately obvious.
- Not all issues are exposed as failures on every CI run - we have flaky tests
that are hard to reproduce, with runtime exceptions, especially around things
that require object crawling - off the top of my head, jamm and the leak
detector, but it could also come from a dependency we haven't even thought
about.

With all due respect to everyone and the extremely valuable discussion
here around what we run in CI, I fear we have diverted the discussion from
what we do about the ticket at hand, which is feasible today, to the long-term
discussion - nightly builds, etc. I believe the long-term discussion
deserves its own thread, ticket, and work.
@Berenguer, thank you for volunteering to open an additional thread and
working on a new suggestion. Does anyone have anything against moving all
long-term suggestions not immediately related to this work here into a new
discussion thread? Also, in the spirit of the repeatable CI project,
creating a table with pre-commit and post-commit suggested jobs to run will
be good. Then we can decide what we want and as a second step add/remove
jobs in Jenkins, Circle, or whatever other CI people use at the moment and
hopefully converge it soon through the repeatable CI project. Do you think
this makes sense?

> Again, I don't see value in running build J11 and J17 runtime additionally
> to J11 runtime - just pick one unless we change something specific to JVM

All JDK-17 problems I've seen were exposed in both situations - run 17 with
build 11 or 17 build. So I am fine with Jacek's suggestion, too, but I
prefer us to run on every commit, whatever we ship with. In the case of 5.0
- build JDK11, run JDK11 tests, run JDK17 tests, and to help ourselves -
build with JDK17.

> Branimir in this patch has already done some basic cleanup of test
> variations, so this is not a duplication of the pipeline. It's a
> significant improvement.
>  I'm ok with cassandra_latest being committed and added to the pipeline,
> *if* the authors genuinely believe there's significant time and effort
> saved in doing so.

I share this sentiment if people are okay with us now adding new Python and
Java distributed test jobs with the new configuration to pre-commit, and if
this is not going to raise resource consumption a lot. (Python tests are the
most resource-heavy tests, though we do not look at upgrade tests.)

> The plan is to ensure both configurations are green pre-commit. This should
> not increase the CI cost as this replaces extra configurations we were
> running before (e.g. test-tries).

Branimir, did you also replace any Python tests? I am not worried about
unit test consumption but about the Python tests primarily.  Those are
running on the bigger containers in CircleCI, which burn more credits.
Also, Stefan, valid point - does Jenkins currently have enough resources to
cover the load? Was this tested?


On Thu, 15 Feb 2024 at 13:59, David Capwell  wrote:

> This thread got large quick, yay!
>
> is there a reason all guardrails and reliability (aka repair retries)
> configs are off by default?  They are off by default in the normal config
> for backwards compatibility reasons, but if we are defining a config saying
> what we recommend, we should enable these things by default IMO.
>
> This is one more question to be answered by this discussion. Are there
> other options that should be enabled by the "latest" configuration? To what
> values should they be set?
> Is there something that is currently enabled that should not be?
>
>
> Very likely, we should try to figure that out.  We should also answer how
> conservative do we want to be by default?  There are many configs we need
> to flesh out here, glad to help with the configs I authored (prob best for
> JIRA rather than this thread)
>
>
> Should we merge the configs breaking these tests?  No…. When we have
> failing tests people do not spend the time to figure out if their logic
> caused a regression and merge, making things more unstable… so when we
> merge 

Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-14 Thread Ekaterina Dimitrova
>
> I'm ok with breaking trunk CI temporarily as long as failures are tracked
> and triaged/addressed before the next release.


From the ticket, I understand it is meant for 5.0-rc

I share this sentiment for the release we decide to ship with:

> The failures should block release or we should not advertise we have those
> features at all, and the configuration should be named "experimental"
> rather than "latest".


> Is the community okay with committing the patch before all of these are
> addressed?

If we aim to fix everything before the next release 5.0-rc, we can commit
CASSANDRA-18753 after the fixes are applied. If we are not going to do all
the fixes anytime soon - I prefer to commit and have the failures and the
tickets open. Otherwise, I can guarantee that I, personally, will forget
some of those failures and miss them in time... and I suspect I won’t
be the only one :-)

> This version is provided for new users of Cassandra who want to get the
> most out of their cluster and for users evaluating the technology.

From reading this thread, we do not recommend using it straight into
production but to experiment, gain trust, and then use it in production.
Did I get it correctly? We need to confirm what it is and be sure it is
clearly stated in the docs.

Announcing this new yaml file under NEWS.txt features sounds reasonable to
me. Or can we add a new separate section on top of  NEWS.txt 5.0, dedicated
only to the announcement of this new configuration file?

> Mick and Ekaterina (and everyone really) - any thoughts on what test
> coverage we should commit to for this new configuration? Acknowledging that
> we already have *a lot* of CI that we run.

I do not have an immediate answer. I see there is some proposed CI
configuration in the ticket. As far as I can tell from a quick look, the
suggestion is to replace unit-trie with unit-latest (which exercises also
tries) and the additional new jobs will be Python and Java DTests. (no new
upgrade tests)
Off the top of my head - we probably need a cost-benefit analysis, risk
analysis, and tradeoffs discussed - burnt resources vs manpower, early
detection vs late discovery, or even prod issues, experimental vs
production-ready, etc.

Now, this question can have different answers depending on whether this is
an experimental config or we recommend it for production use.

I would expect new features to be enabled in this configuration and all
tests to be run pre-commit with the default and the new YAML files. Is this
a correct assumption? Probably done with a note on the ML.

The question is, do we have enough resources in Jenkins to facilitate all
this testing post-commit?

> I think it is much more valuable to test those various configurations
> rather than test against j11 and j17 separately. I can see a really little
> value in doing that.

Excellent point. I have been saying for some time that, IMHO, we can reduce
what we run in CI, at least pre-commit, to:
1) build J11
2) build J17
3) run tests with build 11 + runtime 11
4) run tests with build 11 and runtime 17.
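
To make the difference concrete, the four steps above can be written down as data and compared against the full build/runtime matrix. This is only a sketch: the `Job` type and both function names are hypothetical and do not correspond to any existing CI configuration.

```python
from typing import List, NamedTuple, Optional


class Job(NamedTuple):
    kind: str                          # "build" or "test"
    build_jdk: int
    runtime_jdk: Optional[int] = None  # only meaningful for test jobs


def reduced_precommit_jobs() -> List[Job]:
    """The four steps listed above."""
    return [
        Job("build", 11),
        Job("build", 17),
        Job("test", 11, 11),
        Job("test", 11, 17),
    ]


def full_jobs() -> List[Job]:
    """Every build plus every build/runtime pair for JDK 11 and 17."""
    jdks = [11, 17]
    builds = [Job("build", jdk) for jdk in jdks]
    # Assumption: a runtime older than the build JDK is not a valid pair.
    tests = [Job("test", b, r) for b in jdks for r in jdks if r >= b]
    return builds + tests


dropped = [job for job in full_jobs() if job not in reduced_precommit_jobs()]
print(dropped)
```

Under these assumptions the only job removed from pre-commit is the test run that both builds and runs on JDK 17.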

Technically, that is also what we ship in 5.0 (except for 2), the JDK17
build, but we should not remove that from CI).
Does it make sense to reduce to what I mentioned in 1,2,3,4 and instead add
the suggested jobs with the new configuration from CASSANDRA-18753 in
pre-commit? Please correct me if I am wrong, but I understand that running
with JDK17 tests on the 17 build is experimental in CI, so we can gain
confidence until the release when we will drop 11. No? If that is correct,
I do not see why we run those tests on every pre-commit and not only what
we ship.

Best regards,
Ekaterina

On Wed, 14 Feb 2024 at 17:35, Štefan Miklošovič 
wrote:

> I agree with Jacek, I don't quite understand why we are running the
> pipeline for j17 and j11 every time. I think this should be opt-in.
> Majority of the time, we are just refactoring and coding stuff for
> Cassandra where testing it for both jvms is just pointless and we _know_
> that it will be fine in 11 and 17 too because we do not do anything
> special. If we find some subsystems where testing that on both jvms is
> crucial, we might do that, I just do not remember when it was last time
> that testing it in both j17 and j11 suddenly uncovered some bug. Seems more
> like a hassle.
>
> We might then test the whole pipeline with a different config basically
> for same time as we currently do.
>
> On Wed, Feb 14, 2024 at 9:32 PM Jacek Lewandowski <
> lewandowski.ja...@gmail.com> wrote:
>
> >> On Wed, 14 Feb 2024 at 17:30, Josh McKenzie  wrote:
>>
>>> When we have failing tests people do not spend the time to figure out if
>>> their logic caused a regression and merge, making things more unstable… so
>>> when we merge failing tests that leads to people merging even more failing
>>> tests...
>>>
>>> What's the counter position to this Jacek / Berenguer?
>>>
>>
>> For how long are we going to deceive ourselves? Are we shipping those
>> features or not? Perhaps it is 

Re: Welcome Maxim Muzafarov as Cassandra Committer

2024-01-08 Thread Ekaterina Dimitrova
Congrats, Maxim! Well deserved! Thank you for everything!

On Mon, 8 Jan 2024 at 13:26, Jeremiah Jordan 
wrote:

> Congrats Maxim!  Thanks for all of your contributions!
>
> On Jan 8, 2024 at 12:19:04 PM, Josh McKenzie  wrote:
>
>> The Apache Cassandra PMC is pleased to announce that Maxim Muzafarov has
>> accepted
>> the invitation to become a committer.
>>
>> Thanks for all the hard work and collaboration on the project thus far,
>> and we're all looking forward to working more with you in the future.
>> Congratulations and welcome!
>>
>> The Apache Cassandra PMC members
>>
>>
>>


Re: Welcome Mike Adamson as Cassandra committer

2023-12-08 Thread Ekaterina Dimitrova
Great news! Congrats, Mike!! Thank you for all your hard work!!!

On Fri, 8 Dec 2023 at 9:55, Jeremiah Jordan 
wrote:

> Congrats Mike!  Thanks for all your work on SAI and Vector index.  Well
> deserved!
>
> On Dec 8, 2023 at 8:52:07 AM, Brandon Williams  wrote:
>
>> Congratulations Mike!
>>
>> Kind Regards,
>> Brandon
>>
>> On Fri, Dec 8, 2023 at 8:41 AM Benjamin Lerer  wrote:
>>
>>
>> The PMC members are pleased to announce that Mike Adamson has accepted
>>
>> the invitation to become committer.
>>
>>
>> Thanks a lot, Mike, for everything you have done for the project.
>>
>>
>> Congratulations and welcome
>>
>>
>> The Apache Cassandra PMC members
>>
>>


Re: Pre-commit (devbranch) ci-cassandra.a.o on 5.0 and trunk unavailable

2023-11-29 Thread Ekaterina Dimitrova
Jenkins Trunk is also not running post commit at the moment - pending
https://issues.apache.org/jira/browse/CASSANDRA-19083

On Wed, 29 Nov 2023 at 3:38, Mick Semb Wever  wrote:

>
> For broader awareness.
>
> The devbranch job on ci-cassandra.a.o, the one used for pre-commit testing, is
> currently not available for patches against 5.0 and trunk.
>
> Restoring it is part of CASSANDRA-18594.  This was a priority up until
> recently, eta now is early January.
>
> Those who do not have a CircleCI premium plan, or any other CI system, to
> use for testing patches, please reach out to me (or anyone with access) to
> get it run through circleci.
>


Re: [DISCUSSION] CASSANDRA-19001 - JRE vs JDK runtime

2023-11-29 Thread Ekaterina Dimitrova
Thanks, Mick

It seems that no one had objections to your suggestion so I will move
forward with that:

“I suggest, for expediency, to

- put a nice failure message in Sjk.java (e.g. "JDK required for this
nodetool command"),
- add a comment in cassandra.yaml in front of audit_logging_options stating
compatibility with JRE is unknown (referencing the jira issue), and
- add the jdk add-opens and add-exports only when jdk is detected (javac is
on path).”
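
The third point could be sketched roughly as follows. This is not the actual startup script: the function names are hypothetical, and the option strings used in the example call are placeholders standing in for whatever add-opens/add-exports entries end up being needed.

```python
import shutil
from typing import Callable, List, Optional


def jdk_detected(which: Callable[[str], Optional[str]] = shutil.which) -> bool:
    """Treat the runtime as a JDK when `javac` can be found on PATH."""
    return which("javac") is not None


def jvm_options(
    base: List[str],
    jdk_only: List[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the base options, plus the JDK-only ones when a JDK is detected."""
    if jdk_detected(which):
        return base + jdk_only
    return list(base)


# Placeholder option values, for illustration only.
print(jvm_options(["-Xmx1G"], ["--add-opens=jdk.compiler/com.sun.tools.javac=ALL-UNNAMED"]))
```

Passing the lookup function as a parameter keeps the check easy to test without depending on what is installed on the machine.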


On Tue, 21 Nov 2023 at 7:05, Mick Semb Wever  wrote:

>
>
>
>> Given we have functionality that depends on a JDK, and all our testing is
>> done with a JDK, I'm in favour of printing a warning that JDK is
>> recommended, from 5.0 onwards.  And before 5.0 just leaving things in the
>> state they are in.
>>
>
>
> Correction: ignore these two sentences.  They should have been deleted
> before I hit send.
>
>


Re: CEP-21 - Transactional cluster metadata merged to trunk

2023-11-28 Thread Ekaterina Dimitrova
I hate to say it, but I was disappointed that this email thread was started
after the TCM work had already been committed. Especially knowing how we
had even an epic with patches spread around the codebase, which are waiting
on TCM to get committed first so that we do not disturb any rebase. The
latest commits in trunk were relatively isolated. We could have committed
the last outstanding tickets part of the push for the next release and
still made some exceptions for the TCM work merge after that — just a
constructive approach and visibility.

On a positive note, thank you to everyone who worked and keeps working on
TCM! This is an extreme effort!! I'm super excited!

"Can we get these tests temporarily annotated as skipped while all the
subtickets to 19055 are being worked on"

I am against that. Considering I can see Sam, Alex, and Marcus working
around the clock to solve any test-related tickets, I think there is no
need to do this. Also, in general, ignoring tests also leads to risks of
something being forgotten, a ticket being closed by mistake, and new types
of failures being missed.

"it makes sense to do a repeated run of the new tests."
I do agree with Jacek here.

"I also had to go the route of extracting a blend of what's in circle and
what's in ASF CI (in terms of test suites, filtering, etc) since neither
represented a complete view of our CI ecosystem;"
There is a single test suite in CircleCI, and the packaging needs to be
included.

"From a cursory inspection it looks like most of the breakages being
tracked on the ticket Sam linked for TCM are likely to be circle env
specific (new *nix optimized deletion having a race, OOM's, etc). The TCM
merge is actually a great forcing function for us to surface anything env
specific in terms of timing and resourcing up-front;"
I spent my afternoon triaging the tickets and running repeated runs on
reported failures in the CASSANDRA-19055, as I do not believe we can/should
blindly blame all new failures/flakies on TCM. Unfortunately, all the
tickets I triaged were accurately assigned to TCM, and I don't think there
were more than 1 or 2 OOM tickets.

"My gut tells me it's basically impossible to have a merge of this size
that doesn't disrupt what it's merging into"
I agree with this. There is no way that everything will end smoothly and
perfectly with such extensive work. But we should all recognize the
excellent work that Sam, Alex, and Marcus are doing here to identify and
fix outstanding issues in the past two days. Thank you! I am sure everyone
appreciates that! And I would like to appeal to the community members to
keep triaging and bisecting any new test flakies/failures as we were doing
and not just blindly assign everything to the TCM follow-up epic. We should
be constructive.

"for the record, I don't think we should hold off on merging things just
because some folks are on holiday. :)"

I do not believe anyone advocates for that. I personally even often commit
disruptive patches on the weekend. This gives me the time to deal with the
fallout and reduce the potential disruption to people's work during
office hours. My main concern is what was mentioned already by Benjamin and
Jacek - creating precedent here after all the discussions that happened.


On Mon, 27 Nov 2023 at 16:34, Josh McKenzie  wrote:

> on our internal CI system
>
> Some more context:
>
> This environment adheres to the requirements we laid out in pre-commit CI
> on Cassandra with
> a couple required differences. We don't yet include the resource
> restriction detail in the test report; it's on my backlog of things to do
> but I can confirm that less CPU and <= equivalent ASFCI memory is being
> allocated for each test suite. I also had to go the route of extracting a
> blend of what's in circle and what's in ASF CI (in terms of test suites,
> filtering, etc) since neither represented a complete view of our CI
> ecosystem; there are currently things executed in either environment not
> executed in the other.
>
> I've been tracking the upstreaming of that declarative combination in
> CASSANDRA-18731 but have had some other priorities take front-seat (i.e.
> getting a new CI system based on that working since neither upstream ASF CI
> nor circle are re-usable in their current form) and will be upstreaming
> that ASAP. https://issues.apache.org/jira/browse/CASSANDRA-18731
>
> I've left a pretty long comment on CASSANDRA-18731 about the structure of
> things and where my opinion falls; *I think we need a separate DISCUSS
> thread on the ML about CI and what we require for pre-commit smoke*
> suites:
> https://issues.apache.org/jira/browse/CASSANDRA-18731?focusedCommentId=17790270=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17790270
>
> The TL;DR:
>
> With an *incredibly large* patch in the form of TCM (88k+ LoC, 900+ files
> touched), we have less than a 

Re: Welcome Francisco Guerrero Hernandez as Cassandra Committer

2023-11-28 Thread Ekaterina Dimitrova
Congrats Francisco! Well deserved!

On Tue, 28 Nov 2023 at 14:08, J. D. Jordan 
wrote:

> Congrats!
>
> > On Nov 28, 2023, at 12:57 PM, C. Scott Andreas 
> wrote:
> >
> > Congratulations, Francisco!
> >
> > - Scott
> >
> >> On Nov 28, 2023, at 10:53 AM, Dinesh Joshi  wrote:
> >>
> >> The PMC members are pleased to announce that Francisco Guerrero
> Hernandez has accepted
> >> the invitation to become committer today.
> >>
> >> Congratulations and welcome!
> >>
> >> The Apache Cassandra PMC members
>


Re: [VOTE] Release Apache Cassandra 5.0-beta1

2023-11-28 Thread Ekaterina Dimitrova
“The QA label is part of our version number and that's baked in.”


How can we have beta 2, but there is no way for alpha 2?

On Tue, 28 Nov 2023 at 13:55, Mick Semb Wever  wrote:

>
>
> On Tue, 28 Nov 2023 at 19:27, J. D. Jordan 
> wrote:
>
>> That said. This is clearly better than and with many fixes from the
>> alpha. Would people be more comfortable if this cut was released as another
>> alpha and we do beta1 once the known fixes land?
>>
>
>
> There is no way with our release process to do this.  The QA label is part
> of our version number and that's baked in.
>
> Otherwise I entirely agree that alpha2 would have been the better label
> here.  We are working in an unusual situation, one I hope we don't repeat.
>
>


Re: [DISCUSS] Harry in-tree

2023-11-27 Thread Ekaterina Dimitrova
+1, also, Alex, just an idea - maybe you want to make a virtual talk, as
part of the contributors meetings?


On Monday, 27 November 2023, Yifan Cai  wrote:

> +1
> --
> *From:* Sam Tunnicliffe 
> *Sent:* Tuesday, November 28, 2023 2:43:51 AM
> *To:* dev 
> *Subject:* Re: [DISCUSS] Harry in-tree
>
> Definite +1 to bringing harry-core in tree.
>
> On 24 Nov 2023, at 15:43, Alex Petrov  wrote:
>
> Hi everyone,
>
> With TCM landed, there will be way more Harry tests in-tree: we are using
> it for many coordination tests, and there's now a simulator test that uses
> Harry. During development, Harry has allowed us to uncover and resolve
> numerous elusive edge cases.
>
> I had conversations with several folks, and wanted to propose to move
> harry-core to Cassandra test tree. This will substantially
> simplify/streamline co-development of Cassandra and Harry. With a new
> HistoryBuilder API that has helped to find and trigger [1] [2] and [3], it
> will also be much more approachable.
>
> Besides making it easier for everyone to develop new fuzz tests, it will
> also substantially lower the barrier to entry. Currently, debugging an
> issue found by Harry involves a cumbersome process of rebuilding and
> transferring jars between Cassandra and Harry, depending on which side you
> modify. This not only hampers efficiency but also deters broader adoption.
> By merging harry-core into the Cassandra test tree, we eliminate this
> barrier.
>
> Thank you,
> --Alex
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-19011
> [2] https://issues.apache.org/jira/browse/CASSANDRA-18993
> [3] https://issues.apache.org/jira/browse/CASSANDRA-18932
>
>
>


[DISCUSSION] CASSANDRA-19001 - JRE vs JDK runtime

2023-11-20 Thread Ekaterina Dimitrova
Hi everyone,

Some of you remember that Paulo raised a concern around warnings about
missing modules on startup during the 5.0 alpha2 voting.[1]
We opened the ticket CASSANDRA-19001. The problem turned out to be not a
5.0 one.
Those modules are missing when we run Cassandra on JRE. Those warnings
happened to appear only when using Java 17. However, the modules are also
missing with Java 11…

jdk.attach is needed for nodetool sjk hh. When someone tries to use
that command, they get an error that JDK is expected. There isn't
anything to do about that here.


Now, the potential issue comes with jdk.compiler. It was added for
Chronicle queues as per this article -
https://chronicle.software/chronicle-support-java-17/

I tried contacting the project to understand the implications of
missing this module - errors expected/degraded performance. I did not
get any response so far. More details on the ticket. (CASSANDRA-19001)


In the meantime, a discussion of whether we want to recommend using
JRE and not JDK started on the ticket. For more details, please check
CASSANDRA-19001.


FQL and Audit Logging use Chronicle Queues. Our CI does not show any
issues when I remove the add-opens/add-exports for jdk.compiler, but
that doesn't mean there is no hidden issue, as we all know.


Naturally, I have a few questions for you:

1) Does anyone have more experience with Chronicle queues and have
some thoughts/advice to share? Is it worth it to dig more into the
Chronicle repos ourselves?

2) Do we want to recommend people to switch to JDK? Or should we warn
that JDK is recommended on startup if someone has enabled Audit
Logging/FQL and is using JRE? These are some of the options so far
mentioned on CASSANDRA-19001. What we decide will also influence the
response to 1) (Is it worth it to dig more into the Chronicle repos
ourselves?).


A few notes:

- our CI tests with JDK

- our packages check JRE

- the official docker image has used JRE for years

- our docs are inconsistent when mentioning JDK and JRE in different places


[1] https://lists.apache.org/thread/sb3qfmgfx9hpbm58jfo83qsdsxqpqscx


Re: [VOTE] Release Apache Cassandra 5.0-alpha2

2023-11-04 Thread Ekaterina Dimitrova
“No objections from me since these issues are mostly cosmetic, but it
would be nice to clear these before the next alpha/beta. I will create a
ticket for the unknown module warning later if nobody beats me to it.”

CASSANDRA-19001 opened for the
issue to be checked. Until we have tested/investigated whether the features
that are supposed to use those modules experience any issues, this is an
isolated problem that might turn out to be a cosmetic one. So far, we know
the associated features and the JDK where the warnings are seen.


You are right, Paulo, I meant CASSANDRA-18711 is the one that will take
care of the Security Manager deprecation in the future. I just moved it out
of triage.

Best regards,
Ekaterina


On Sat, 4 Nov 2023 at 7:15, Mick Semb Wever  wrote:

> > As this is an alpha release - can we open a ticket to be resolved in the
>> next alpha/beta? It is up to PMC to decide, of course.
>>
>> No objections from me since these issues are mostly cosmetic, but it
>> would be nice to clear these before the next alpha/beta. I will create a
>> ticket for the unknown module warning later if nobody beats me to it.
>>
>
>
> I agree.  When these tickets are created please add fixVersion '5.0-beta'
> to indicate such.
>
>
>


Re: Road to 5.0-GA (was: [VOTE] Release Apache Cassandra 5.0-alpha2)

2023-11-04 Thread Ekaterina Dimitrova
Totally agree with the others. Such an issue on its own should be a
priority in any release. Looking forward to the reproduction test mentioned
on the ticket.

Thanks to Alex for his work on harry!

On Sat, 4 Nov 2023 at 12:47, Benedict  wrote:

> Alex can confirm but I think it actually turns out to be a new bug in 5.0,
> but either way we should not cut a release with such a serious potential
> known issue.
>
> > On 4 Nov 2023, at 16:18, J. D. Jordan  wrote:
> >
> > Sounds like 18993 is not a regression in 5.0? But present in 4.1 as
> well?  So I would say we should fix it with the highest priority and get a
> new 4.1.x released. Blocking 5.0 beta voting is a secondary issue to me if
> we have a “data not being returned” issue in an existing release?
> >
> >> On Nov 4, 2023, at 11:09 AM, Benedict  wrote:
> >>
> >> I think before we cut a beta we need to have diagnosed and fixed 18993
> (assuming it is a bug).
> >>
> >>> On 4 Nov 2023, at 16:04, Mick Semb Wever  wrote:
> >>>
> >>>
> >>>>
> >>>> With the publication of this release I would like to switch the
> >>>> default 'latest' docs on the website from 4.1 to 5.0.  Are there any
> >>>> objections to this?
> >>>
> >>>
> >>> I would also like to propose the next 5.0 release to be 5.0-beta1
> >>>
> >>> With the aim of reaching GA for the Summit, I would like to suggest we
> >>> work towards the best-case scenario of 5.0-beta1 in two weeks and
> >>> 5.0-rc1 first week Dec.
> >>>
> >>> I know this is a huge ask with lots of unknowns we can't actually
> >>> commit to.  But I believe it is a worthy goal, and possible if nothing
> >>> sideswipes us – but we'll need all the help we can get this month to
> >>> make it happen.
> >>
>
>


Re: [VOTE] Release Apache Cassandra 5.0-alpha2

2023-11-03 Thread Ekaterina Dimitrova
I am sorry, I totally forgot to address your other concern, Paulo. The
Security Manager is deprecated for removal as of JDK 17 (JEP 411). So this
warning is there to stress to people that they need to take care of a
replacement sooner rather than later. I believe we have an unassigned
ticket in Open status somewhere to address this topic.

On Fri, 3 Nov 2023 at 21:07, Ekaterina Dimitrova 
wrote:

> Hi Paulo,
>
> Thank you for testing and for raising the issue!
> I can confirm I do not use the same JDK as you, and I did not see any
> warnings on my machine on startup or when calling nodetool commands.
>
> I believe on a quick check that jdk.attach was needed for nodetool sjk.
> (It was mentioned on CASSANDRA-16895 at least)
> About jdk.compiler - it was added as per this recommendation for chronicle
> https://chronicle.software/chronicle-support-java-17/
>
> I do not believe we test with the mentioned JDK in CI, so additional
> testing will be required to figure out things better.
>
> As this is an alpha release - can we open a ticket to be resolved in the
> next alpha/beta? It is up to PMC to decide, of course. Also, we need a bit
> more investigation here. I can try to take a look tomorrow in more
> detail if no one beats me to that.
>
> Best regards,
> Ekaterina
>
>
> On Fri, 3 Nov 2023 at 20:01, Paulo Motta  wrote:
>
>> Clarification:
>> - When running nodetool only the "Unknown module" warnings show up. All
>> warnings show up during startup.
>>
>> On Fri, Nov 3, 2023 at 7:58 PM Paulo Motta  wrote:
>>
>>> Launched a tarball-based 5.0-alpha2 container on top of
>>> "eclipse-temurin:17-jre-focal" and the server starts up fine, can run
>>> nodetool and cqlsh.
>>>
>>> I got these seemingly harmless JDK17 warnings during startup and when
>>> running nodetool (no warnings on JDK11):
>>>
>>> WARNING: Unknown module: jdk.attach specified to --add-exports
>>> WARNING: Unknown module: jdk.compiler specified to --add-exports
>>> WARNING: Unknown module: jdk.compiler specified to --add-opens
>>> WARNING: A terminally deprecated method in java.lang.System has been
>>> called
>>> WARNING: System::setSecurityManager has been called by
>>> org.apache.cassandra.security.ThreadAwareSecurityManager
>>> (file:/opt/cassandra/lib/apache-cassandra-5.0-alpha2-SNAPSHOT.jar)
>>> WARNING: Please consider reporting this to the maintainers of
>>> org.apache.cassandra.security.ThreadAwareSecurityManager
>>> WARNING: System::setSecurityManager will be removed in a future release
>>>
>>> Does anybody know if these warnings are legit/expected? We can create
>>> follow-up tickets if needed.
>>>
>>> $ java --version
>>> openjdk 17.0.9 2023-10-17
>>> OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9)
>>> OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode,
>>> sharing)
>>>
>>> On Fri, Nov 3, 2023 at 6:13 PM Jonathan Ellis  wrote:
>>>
>>>> +1
>>>>
>>>> On Mon, Oct 30, 2023 at 3:47 PM Mick Semb Wever  wrote:
>>>>
>>>>> Proposing the test build of Cassandra 5.0-alpha2 for release.
>>>>>
>>>>> DISCLAIMER, this alpha release does not contain the features:
>>>>> Transactional Cluster Metadata (CEP-21) and Accord Transactions
>>>>> (CEP-15).  These features are under discussion to be pushed to a
>>>>> 5.1-alpha1 release, with an eta still this year.
>>>>>
>>>>> This release does contain Vector Similarity Search (CEP-30).
>>>>>
>>>>> Please also note that this is an alpha release and what that means,
>>>>> further info at
>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle
>>>>>
>>>>> sha1: ea76d148c374198fede6978422895668857a927f
>>>>> Git: https://github.com/apache/cassandra/tree/5.0-alpha2-tentative
>>>>> Maven Artifacts:
>>>>>
>>>>> https://repository.apache.org/content/repositories/orgapachecassandra-1317/org/apache/cassandra/cassandra-all/5.0-alpha2/
>>>>>
>>>>> The Source and Build Artifacts, and the Debian and RPM packages and
>>>>> repositories, are available here:
>>>>> https://dist.apache.org/repos/dist/dev/cassandra/5.0-alpha2/
>>>>>
>>>>> The vote will be open for 72 hours (longer if needed). Everyone who
>>>>> has tested the build is invited to vote. Votes by PMC members are
>>>>> considered binding. A vote passes if there are at least three binding
>>>>> +1s and no -1's.
>>>>>
>>>>> [1]: CHANGES.txt:
>>>>>
>>>>> https://github.com/apache/cassandra/blob/5.0-alpha2-tentative/CHANGES.txt
>>>>> [2]: NEWS.txt:
>>>>> https://github.com/apache/cassandra/blob/5.0-alpha2-tentative/NEWS.txt
>>>>>
>>>>
>>>>
>>>> --
>>>> Jonathan Ellis
>>>> co-founder, http://www.datastax.com
>>>> @spyced
>>>>
>>>


Re: [VOTE] Release Apache Cassandra 5.0-alpha2

2023-11-03 Thread Ekaterina Dimitrova
Hi Paulo,

Thank you for testing and for raising the issue!
I can confirm I do not use the same JDK as you, and I did not see any
warnings on my machine on startup or when calling nodetool commands.

I believe on a quick check that jdk.attach was needed for nodetool sjk. (It
was mentioned on CASSANDRA-16895 at least)
About jdk.compiler - it was added as per this recommendation for chronicle
https://chronicle.software/chronicle-support-java-17/
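
A minimal sketch for checking whether a given runtime actually contains the two modules named in the warnings. One plausible cause of an "Unknown module" warning is that the module passed to --add-exports/--add-opens is simply absent from the runtime image, which can be the case for JRE-only images; that is an assumption here, not something confirmed on this thread. The class name ModuleVisibility is hypothetical and not part of Cassandra; ModuleFinder.ofSystem() and ModuleLayer.boot() are standard JDK APIs.

```java
import java.lang.module.ModuleFinder;
import java.util.List;

// Hypothetical helper, not part of Cassandra: reports whether the modules
// named in the --add-exports/--add-opens flags exist in this runtime.
final class ModuleVisibility {

    // True if the module is part of the runtime image at all
    // (a trimmed or JRE-only image may omit it).
    static boolean inRuntimeImage(String name) {
        return ModuleFinder.ofSystem().find(name).isPresent();
    }

    // True if the module was actually resolved into the boot layer of this JVM.
    static boolean inBootLayer(String name) {
        return ModuleLayer.boot().findModule(name).isPresent();
    }

    public static void main(String[] args) {
        for (String name : List.of("jdk.attach", "jdk.compiler")) {
            System.out.printf("%s: in image=%b, in boot layer=%b%n",
                              name, inRuntimeImage(name), inBootLayer(name));
        }
    }
}
```

Running this with the same java binary that starts Cassandra (for example inside the container) would show whether the modules are missing from the image or merely not resolved.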

I do not believe we test with the mentioned JDK in CI, so additional
testing will be required to figure out things better.

As this is an alpha release - can we open a ticket to be resolved in the
next alpha/beta? It is up to PMC to decide, of course. Also, we need a bit
more investigation here. I can try to take a look tomorrow in more
detail if no one beats me to that.

Best regards,
Ekaterina


On Fri, 3 Nov 2023 at 20:01, Paulo Motta  wrote:

> Clarification:
> - When running nodetool only the "Unknown module" warnings show up. All
> warnings show up during startup.
>
> On Fri, Nov 3, 2023 at 7:58 PM Paulo Motta  wrote:
>
>> Launched a tarball-based 5.0-alpha2 container on top of
>> "eclipse-temurin:17-jre-focal" and the server starts up fine, can run
>> nodetool and cqlsh.
>>
>> I got these seemingly harmless JDK17 warnings during startup and when
>> running nodetool (no warnings on JDK11):
>>
>> WARNING: Unknown module: jdk.attach specified to --add-exports
>> WARNING: Unknown module: jdk.compiler specified to --add-exports
>> WARNING: Unknown module: jdk.compiler specified to --add-opens
>> WARNING: A terminally deprecated method in java.lang.System has been
>> called
>> WARNING: System::setSecurityManager has been called by
>> org.apache.cassandra.security.ThreadAwareSecurityManager
>> (file:/opt/cassandra/lib/apache-cassandra-5.0-alpha2-SNAPSHOT.jar)
>> WARNING: Please consider reporting this to the maintainers of
>> org.apache.cassandra.security.ThreadAwareSecurityManager
>> WARNING: System::setSecurityManager will be removed in a future release
>>
>> Does anybody know if these warnings are legit/expected? We can create
>> follow-up tickets if needed.
>>
>> $ java --version
>> openjdk 17.0.9 2023-10-17
>> OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9)
>> OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode,
>> sharing)
>>
>> On Fri, Nov 3, 2023 at 6:13 PM Jonathan Ellis  wrote:
>>
>>> +1
>>>
>>> On Mon, Oct 30, 2023 at 3:47 PM Mick Semb Wever  wrote:
>>>
>>>> Proposing the test build of Cassandra 5.0-alpha2 for release.
>>>>
>>>> DISCLAIMER, this alpha release does not contain the features:
>>>> Transactional Cluster Metadata (CEP-21) and Accord Transactions
>>>> (CEP-15).  These features are under discussion to be pushed to a
>>>> 5.1-alpha1 release, with an eta still this year.
>>>>
>>>> This release does contain Vector Similarity Search (CEP-30).
>>>>
>>>> Please also note that this is an alpha release and what that means,
>>>> further info at
>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle
>>>>
>>>> sha1: ea76d148c374198fede6978422895668857a927f
>>>> Git: https://github.com/apache/cassandra/tree/5.0-alpha2-tentative
>>>> Maven Artifacts:
>>>>
>>>> https://repository.apache.org/content/repositories/orgapachecassandra-1317/org/apache/cassandra/cassandra-all/5.0-alpha2/
>>>>
>>>> The Source and Build Artifacts, and the Debian and RPM packages and
>>>> repositories, are available here:
>>>> https://dist.apache.org/repos/dist/dev/cassandra/5.0-alpha2/
>>>>
>>>> The vote will be open for 72 hours (longer if needed). Everyone who
>>>> has tested the build is invited to vote. Votes by PMC members are
>>>> considered binding. A vote passes if there are at least three binding
>>>> +1s and no -1's.
>>>>
>>>> [1]: CHANGES.txt:
>>>>
>>>> https://github.com/apache/cassandra/blob/5.0-alpha2-tentative/CHANGES.txt
>>>> [2]: NEWS.txt:
>>>> https://github.com/apache/cassandra/blob/5.0-alpha2-tentative/NEWS.txt
>>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> co-founder, http://www.datastax.com
>>> @spyced
>>>
>>


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-26 Thread Ekaterina Dimitrova
Benedict, what is your expectation for stabilization time? And what is the
suggestion for the patches Benjamin mentioned, which are on their way to
land in trunk? (Or any other patch on its way to be merged)

On Thu, 26 Oct 2023 at 8:20, Benedict  wrote:

> The time to stabilise is orthogonal to the time we branch. Once we branch
> we stop accepting new features for the branch, and work to stabilise.
>
> My understanding is we will branch as soon as we have a viable alpha
> containing TCM and Accord. That means pretty soon after they land in the
> project, which we expect to be around the summit.
>
> If this isn’t the expectation we should make that clear, as it will affect
> how this decision is made.
>
> On 26 Oct 2023, at 10:14, Benjamin Lerer  wrote:
>
> 
>
> Regarding the release of 5.1, I understood the proposal to be that we cut
>> an actual alpha, thereby sealing the 5.1 release from new features. Only
>> features merged before we cut the alpha would be permitted, and the alpha
>> should be cut as soon as practicable. What exactly would we be waiting for?
>
>
> The problem I believe is about expectations. It seems that your
> expectation is that a release with only TCM and Accord will reach GA
> quickly. Based on the time it took us to release 4.1, I am simply expecting
> more delays (a GA around end of May, June). In which case it seems to me
> that we could be interested in shipping more stuff in the meantime
> (thinking of CASSANDRA-15254 or CEP-29 for example).
> I do not have a strong opinion, I just want to make sure that we all share
> the same understanding and fully understand what we agree upon.
>
> On Thu, 26 Oct 2023 at 10:59, Benjamin Lerer  wrote:
>
>> I am surprised this needs to be said, but - especially for long-running
>>> CEPs - you must involve yourself early, and certainly within some
>>> reasonable time of being notified the work is ready for broader input and
>>> review. In this case, more than six months ago.
>>
>>
>> It is unfortunately more complicated than that, because six months ago
>> Ekaterina and I were working on supporting Java 17 and dropping Java 8,
>> which was needed by different ongoing works. We both missed the
>> announcement that TCM was ready for review and anyway would not have been
>> available at that time. Maxim asked me for a review of CASSANDRA-15254
>> more than 6 months ago and I have not been able to help him so far. We all
>> have a limited bandwidth and can miss some announcements.
>>
>> The project has grown and a lot of things are going on in parallel. There
>> are also more interdependencies between the different projects. In my
>> opinion what we are lacking is a global overview of the different things
>> going on in the project and some rough ideas of the status of the different
>> significant pieces. It would allow us to better organize ourselves.
>>
>> On Thu, 26 Oct 2023 at 00:26, Benedict  wrote:
>>
>>> I have spoken privately with Ekaterina, and to clear up some possible
>>> ambiguity: I realise nobody has demanded a delay to this work to conduct
>>> additional reviews; a couple of folk have however said they would prefer
>>> one.
>>>
>>>
>>> My point is that, as a community, we need to work on ensuring folk that
>>> care about a CEP participate at an appropriate time. If they aren’t able
>>> to, the consequences of that are for them to bear.
>>>
>>>
>>> We should be working to avoid surprises as CEP start to land. To this
>>> end, I think we should work on some additional paragraphs for the
>>> governance doc covering expectations around the landing of CEPs.
>>>
>>> On 25 Oct 2023, at 21:55, Benedict  wrote:
>>>
>>> 
>>>
>>> I am surprised this needs to be said, but - especially for long-running
>>> CEPs - you must involve yourself early, and certainly within some
>>> reasonable time of being notified the work is ready for broader input and
>>> review. In this case, more than six months ago.
>>>
>>>
>>> This isn’t the first time this has happened, and it is disappointing to
>>> see it again. Clearly we need to make this explicit in the guidance docs.
>>>
>>>
>>> Regarding the release of 5.1, I understood the proposal to be that we
>>> cut an actual alpha, thereby sealing the 5.1 release from new features.
>>> Only features merged before we cut the alpha would be permitted, and the
>>> alpha should be cut as soon as practicable. What exactly would we be
>>> waiting for?
>>>
>>>
>>> If we don’t have a clear and near-term trigger for branching 5.1 for its
>>> own release, shortly after Accord and TCM merge, then I am in favour of
>>> instead delaying 5.0.
>>>
>>> On 25 Oct 2023, at 19:40, Mick Semb Wever  wrote:
>>>
>>> 
>>> I'm open to the suggestions of not branching cassandra-5.1 and/or naming
>>> a preview release something other than 5.1-alpha1.
>>>
>>> But… the codebases and release process (and upgrade tests) do not
>>> currently support releases with 

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-25 Thread Ekaterina Dimitrova
Hi everyone,
Thanks, Mick, for raising the topic.

I support releasing 5.0 without waiting on Accord and TCM, as
previously discussed here:
https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3
(It is almost November, and the features are not ready. The currently
committed set is glamorous in its own way :-) )

I support releasing 5.1 when TCM and Accord are ready and not
necessarily waiting another year for 5.1, next release.


Now, in more detail, here is my understanding of what exactly I just
supported:

"2. The next release will be 5.1 and will include only Accord and TCM"
I read the current thread, and I didn't see a mention that we are not
accepting anything else to trunk in the meantime until those are ready
and reviewed. I do not see an issue with accepting other works in the
meantime as long as everyone adheres to our approach of merging
finished, fully reviewed, and tested working patches.

"The TCM work (CEP-21) is in its review stage but being well past our
cut-off date¹ for merging, and now jeopardising 5.0 GA efforts, I would
like to propose the following."
I feel this statement is maybe a bit confusing, as there have been no new
patches (except a doc and a harry version change) since this
conversation happened:
https://the-asf.slack.com/archives/CK23JSY2K/p1695654432116079

This means to me that we are doing high-level checks and building
understanding at the moment (that's actually great; I am very happy we
are doing it), but the full-fledged review hasn't started because the
code is not yet fully committed.

"Reviewing of TCM and Accord will continue to happen post-merge. This is
not our normal practice, but this work will have already received its
two +1s from committers, and such ongoing review effort is akin to GA
stabilisation work on release branches."
My reading here is that a review, as per our project guidelines, by
two committers, will happen before the merge. As usual, if there is
some exception or something - it should be brought to the dev ML for
discussion. https://cassandra.apache.org/_/development/how_to_review.html
(feature flags are not mentioned in the doc, but we have it somewhere
on the ML for sure)
 BUT as it is a big piece of work, as it happens with many
features - there might be more than 2 reviewers involved. Normally, we
wait until everyone is done, not only as long as we have two
committers +1s and ignore the others (as long as they do not have
immediate concerns). My understanding of Mick's suggestion is if
people are not ready with their reviews and they do not have immediate
concerns to raise - things can be merged based on the two committers
who are confident the features meet our standards. I have a  strong
preference to wait on everyone, but I am not going to block anything
as long as we have the two reviewers confident and green CI, as usual.
Then, the rest of the reviewers can continue testing and reviewing,
and the authors will stay available to address any new
concerns/questions/feedback. (Overall, we know that the CLA says
contributions are accepted on "AS-IS" basis
(https://www.apache.org/licenses/icla.pdf), but the authors will make
a waiver in this case and ensure things are addressed in a reasonable
timeframe before a release (not two years later :D ); correct me if I
am wrong but that is how I read Mick's suggestion.)

Last but not least, I think we probably need a separate discussion
thread on the preview release with all the details, as this is
something we all agree is nice to have, but we haven't done it before,
and the details are unclear. I think the current discussion thread
proves that calling an early preview alpha1 gives a wrong perspective
of the state of affairs to our users. But overall, I am not against
having an early preview; we need to shape only the form of it. I think
this is a very good call.

Best regards,

Ekaterina



On Wed, 25 Oct 2023 at 12:07, Jeremiah Jordan 
wrote:

> If we do a 5.1 release why not take it as an opportunity to release more
>> things. I am not saying that we will. Just that we should let that door
>> open.
>>
>
> Agreed.  This is the reason I brought up the possibility of not branching
> off 5.1 immediately.
>
>
> On Oct 25, 2023 at 3:17:13 AM, Benjamin Lerer  wrote:
>
>> The proposal includes 3 things:
>> 1. Do not include TCM and Accord in 5.0 to avoid delaying 5.0
>> 2. The next release will be 5.1 and will include only Accord and TCM
>> 3. Merge TCM and Accord right now in 5.1 (making an initial release)
>>
>> I am fine with question 1 and do not have a strong opinion on which way
>> to go.
>> 2. Means that every new feature will have to wait for post 5.1 even if it
>> is ready before 5.1 is stabilized and shipped. If we do a 5.1 release why
>> not take it as an opportunity to release more things. I am not saying that
>> we will. Just that we should let that door open.
>> 3. There is a need to merge TCM and Accord as maintaining those separate
>> branches is costly in terms of 

Re: Development Dependencies documentation.

2023-10-25 Thread Ekaterina Dimitrova
Hi Claude,
You are not wrong. Unfortunately, it is outdated. Abe Ratnofsky has a work
in progress patch. You might want to get in touch with him to finish it.
Best regards,
Ekaterina

On Wed, 25 Oct 2023 at 8:04, Claude Warren, Jr via dev <
dev@cassandra.apache.org> wrote:

> I just had to change dependencies in Cassandra for the first  time and I
> think the documentation [1] is out of date.
>
> First I think most of the file edits are in the ".build" directory.
> Adding jars to the "lib" directory works until calling "ant realclean", so
> perhaps the instructions should include regenerating the "lib" folder after
> making the edits.
>
> If I am wrong please let me know, otherwise I will open a ticket and
> update the documentation.
>
> [1] https://cassandra.apache.org/_/development/dependencies.html
>


Re: CASSANDRA-18775 (Cassandra supported OSs)

2023-10-20 Thread Ekaterina Dimitrova
+1 on removing the whole lib if we are sure we don’t need it. Nothing better
than some healthy house cleaning

 -1 on partial removals

On Fri, 20 Oct 2023 at 17:34, David Capwell  wrote:

> +1 to drop the whole lib…
>
>
> On Oct 20, 2023, at 7:55 AM, Jeremiah Jordan 
> wrote:
>
> Agreed.  -1 on selectively removing any of the libs.  But +1 for removing
> the whole thing if it is no longer used.
>
> -Jeremiah
>
> On Oct 20, 2023 at 9:28:55 AM, Mick Semb Wever  wrote:
>
>> Does anyone see any reason _not_ to do this?
>>>
>>
>>
>> Thanks for bringing this to dev@
>>
>> I see a reason not to do it: folk do submit patches for other archs despite
>> us not formally maintaining and testing the code for those archs.  Some
>> examples are PPC64 Big Endian (CASSANDRA-7476), s390x (CASSANDRA-17723),
>> PPC64 Little Endian (CASSANDRA-7381), sparcv9 (CASSANDRA-6628).  Wrote this
>> on the ticket too.
>>
>> +1 for removing sigar altogether (as Brandon points out).
>>
>>
>


Re: [DISCUSS] putting versions into Deprecated annotations

2023-10-13 Thread Ekaterina Dimitrova
I’ve been told in the past not to remove public methods in a patch release
though.
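
For reference, the `since` and `forRemoval` elements of `@Deprecated` discussed further down this thread can be sketched as follows. This is a minimal illustration only: the class, method names, and version string are hypothetical and not taken from the Cassandra codebase. Both elements have been part of `java.lang.Deprecated` since Java 9, and the annotation is retained at runtime, so tooling can read it reflectively.

```java
import java.lang.reflect.Method;

// Hypothetical class and version number, for illustration only.
final class DeprecationSketch {

    // "since" records the release in which the method was deprecated;
    // "forRemoval" is left at its default (false), i.e. no removal is promised.
    @Deprecated(since = "4.1")
    static int oldApi() {
        return currentApi();
    }

    static int currentApi() {
        return 42;
    }

    // Looks up a no-argument method of this class by name.
    private static Method find(String methodName) {
        try {
            return DeprecationSketch.class.getDeclaredMethod(methodName);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + methodName, e);
        }
    }

    // Returns the "since" value of the method's @Deprecated annotation,
    // or null if the method is not deprecated.
    static String deprecatedSince(String methodName) {
        Deprecated d = find(methodName).getAnnotation(Deprecated.class);
        return d == null ? null : d.since();
    }

    // True only if the method is deprecated with forRemoval = true.
    static boolean markedForRemoval(String methodName) {
        Deprecated d = find(methodName).getAnnotation(Deprecated.class);
        return d != null && d.forRemoval();
    }
}
```

A check along these lines is one way a build step could list every deprecated member together with the version it was deprecated in, which is the kind of inventory the first phase of the proposal asks for.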

On Fri, 13 Oct 2023 at 8:03, Benjamin Lerer  wrote:

> Could you point me some document / ML thread this was explicitly decided
>> in if you know of anything like that? It would be great if there was some
>> solid guidance on this.
>
>
> I am seeing it the other way around. Using Deprecated annotations makes
> sense only if something is part of a public interface/API. Maintaining a
> public API represents significant work and puts some constraints on further
> evolution.
> By default most of the code of C* should be considered internal and we
> should be able to modify it without going through a deprecation phase.
> One problem that we have is that we have never been clear, outside of some
> obvious stuff, about what code should be considered APIs that need to go
> through a deprecation phase.
>
>
> On Fri, 13 Oct 2023 at 13:13, Miklosovic, Stefan via dev <
> dev@cassandra.apache.org> wrote:
>
>> OK. That is definitely something to mention when we approach the
>> second phase, where we decide what to do with it, but I humbly think we are
>> not there yet.
>>
>> Could you point me some document / ML thread this was explicitly decided
>> in if you know of anything like that? It would be great if there was some
>> solid guidance on this.
>>
>> 
>> From: Benjamin Lerer 
>> Sent: Friday, October 13, 2023 13:07
>> To: dev@cassandra.apache.org
>> Subject: Re: [DISCUSS] putting versions into Deprecated annotations
>>
>> NetApp Security WARNING: This is an external email. Do not click links or
>> open attachments unless you recognize the sender and know the content is
>> safe.
>>
>>
>>
>> I was asking because outside of configuration parameters and JMX calls,
>> the approach as far as I remember was to just change things without using
>> an annotation.
>>
>> On Fri, 13 Oct 2023 at 12:45, Miklosovic, Stefan via dev <
>> dev@cassandra.apache.org> wrote:
>> Hi Benjamin,
>>
>> in other words, anything we have @Deprecated annotation on top of (or
>> anything you want to annotate with it). Does it help with the explanation?
>>
>> For the initial phase, I plan to just put "since" everywhere (into every
>> already existing @Deprecated annotation) and we leave out "forRemoval" in
>> Deprecated annotation for now as that is quite tricky to get right.
>>
>> I am confused about what is considered for removal and what we keep
>> forever even if it is deprecated (referring to what Mick said in this thread,
>> that forRemoval cannot be true by default). After we map what technical debt
>> we have, we can summarize this and I will bring it to the ML again for further
>> discussion of what to actually remove and when.
>>
>> Regards
>>
>> 
>> From: Benjamin Lerer <b.le...@gmail.com>
>> Sent: Friday, October 13, 2023 12:19
>> To: dev@cassandra.apache.org
>> Subject: Re: [DISCUSS] putting versions into Deprecated annotations
>>
>> I am a bit confused by the starting point of this discussion: "When we
>> deprecate APIs / methods"
>> What are we exactly calling APIs/methods? It is really unclear to me what
>> we are talking about here.
>>
>> On Thu, 12 Oct 2023 at 02:38, Francisco Guerrero <fran...@apache.org> wrote:
>>
>>
>> On 2023/10/11 16:59:35 Maxim Muzafarov wrote:
>> > Francisco,
>> >
>> > I agree with your vision of the deprecation comments and actually, I
>> > think we should recommend doing it that way for the cases where it is
>> > applicable on our code-style page, but when things get to the
>> > implementation phase there are some obstacles that are not easy to
>> > overcome.
>>
>> Yeah, I agree that this should be recommended rather than enforced via
>> some checkstyle rule. However, reviewers should be aware of this
>> recommendation in the code-style page.
>>
>> >
>> > So, adding the MissingDeprecated will emphasize to a developer the
>> > need to describe the deprecation reasons in comments, but
>> > unfortunately, there is no general pattern that we can enforce for
>> > every such description message and/or automatically validate its
>> > meaningfulness. There may be no alternative for a deprecated field, or
>> > it may simply be marked for deletion, so the pattern is slightly
>> > different in this case.
>>
>>
>> +1 for adding the MissingDeprecated rule
>>
>> > Another problem is how to add meaningful comments to the deprecated
>> > annotations that we already have in the code, since we can't enforce
>> > checkstyle rules only on newly added code. This is a very exhausting
>> > process with no 100% guarantee of accuracy - some of the commits don't
>> > 

Re: [VOTE] Accept java-driver

2023-10-03 Thread Ekaterina Dimitrova
+1(nb), thank you for making it happen!

On Tuesday, 3 October 2023, Dinesh Joshi  wrote:

> +1
>
> This is great for the project. Thank you for all the hard work everyone
> put into this! It has been a long journey to get to this point.
>
> Dinesh
>
> On Oct 2, 2023, at 9:53 PM, Mick Semb Wever  wrote:
>
> 
> The donation of the java-driver is ready for its IP Clearance vote.
> https://incubator.apache.org/ip-clearance/cassandra-java-driver.html
>
> The SGA has been sent to the ASF.  This does not require acknowledgement
> before the vote.
>
> Once the vote passes, and the SGA has been filed by the ASF Secretary, we
> will request ASF Infra to move the datastax/java-driver as-is to
> apache/java-driver
>
> This means all branches and tags, with all their history, will be kept.  A
> cleaning effort has already cleaned up anything deemed not needed.
>
> Background for the donation is found in CEP-8:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
>
> PMC members, please take note of (and check) the IP Clearance requirements
> when voting.
>
> The vote will be open for 72 hours (or longer). Votes by PMC members are
> considered binding. A vote passes if there are at least three binding +1s
> and no -1's.
>
> regards,
> Mick
>
>


Re: [DISCUSS] Add JVector as a dependency for CEP-30

2023-09-22 Thread Ekaterina Dimitrova
For my understanding, wouldn’t it also be an issue if the code were
copyrighted to a single person? For the same reasons?

On Fri, 22 Sep 2023 at 7:59, Mick Semb Wever  wrote:

>
>
> Just for my understanding on this. Is the issue that the code has a
>> copyright header on it or that it is copyright to a corporate entity?
>>
>
>
> The potential issue here is about dependence upon one vendor (or
> commercial actor).
> If the project is not usable without a specific piece of work (library)
> that is controlled and maintained elsewhere, and exercising our freedom to
> rewrite/fork is difficult, the project isn't really independent.  Being
> independent is an important tenet for ASF projects.
>
> I don't see this being an issue with jamm or jvector.  But I do think it's
> important to check.
>
>


Re: [DISCUSS] Backport CASSANDRA-18816 to 5.0? Add support for repair coordinator to retry messages that timeout

2023-09-21 Thread Ekaterina Dimitrova
+1 from me too. Moreover, this work has started as part of the test efforts
and identifying weak points during the 4.0 testing, if I recall correctly.
5.0 sounds like a good place to land. Thank you David and everyone else
involved for your efforts!
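
As an aside on item 1 in David's description below (moving from `ActiveRepairService.instance` to `ActiveRepairService.instance()`), here is a minimal sketch of one common way an accessor method can defer construction so that tests which never call it do not trigger initialization. This is the generic lazy holder idiom under stated assumptions, not the code from the actual patch; the class name RepairServiceSketch and the counter are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a heavyweight service; not the actual patch.
final class RepairServiceSketch {

    // Counts constructions so that the laziness is observable.
    static final AtomicInteger CONSTRUCTED = new AtomicInteger();

    private RepairServiceSketch() {
        CONSTRUCTED.incrementAndGet();
    }

    // The nested class is initialized only on first use, i.e. on the
    // first call to instance(), not when the outer class is loaded.
    private static final class Holder {
        static final RepairServiceSketch INSTANCE = new RepairServiceSketch();
    }

    // Accessor replacing a public static field: construction is deferred
    // until a caller actually asks for the service.
    static RepairServiceSketch instance() {
        return Holder.INSTANCE;
    }
}
```

With a plain `static final` field on the outer class, any reference to the class would construct the service and everything it depends on; with the accessor, code that only touches other static members leaves it unconstructed.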

On Thu, 21 Sep 2023 at 1:01, Berenguer Blasi 
wrote:

> +1 I agree with Brandon. It's more like a bug imo.
> On 20/9/23 21:42, Caleb Rackliffe wrote:
>
> +1 on a 5.0 backport
>
> On Wed, Sep 20, 2023 at 2:26 PM Brandon Williams  wrote:
>
>> I think it could be argued that not retrying messages is a bug, I am
>> +1 on including this in 5.0.
>>
>> Kind Regards,
>> Brandon
>>
>> On Tue, Sep 19, 2023 at 1:16 PM David Capwell  wrote:
>> >
>> > To try to get repair more stable, I added optional retry logic (patch
>> is still in review) to a handful of critical repair verbs.  This patch is
>> disabled by default but allows you to opt-in to retries so ephemeral issues
>> don’t cause a repair to fail after running for a long time (assuming they
>> resolve within the retry window). There are 2 protocol level changes to
>> enable this: VALIDATION_RSP and SYNC_RSP now send an ACK (if the sender
>> doesn’t attach a callback, these ACKs get ignored in all versions; see
>> org.apache.cassandra.net.ResponseVerbHandler#doVerb and
>> Verb.REPAIR_RSP).  Given that we have already forked, I believe we would
>> need to give a waiver to allow this patch due to this change.
>> >
>> > The patch was written on trunk, but figured back porting 5.0 would be
>> rather trivial and this was brought up during the review, so floating this
>> to a wider audience.
>> >
>> > If you look at the patch you will see that it is very large, but this
>> is only to make testing of repair coordination easier and deterministic,
>> the biggest code changes are:
>> >
>> > 1) Moving from ActiveRepairService.instance to
>> ActiveRepairService.instance() (this is the main reason so many files were
>> touched; this was needed so unit tests don’t load the whole world)
>> > 2) Repair no longer reaches into global space and instead is provided
>> the subsystems needed to perform repair; this change is local to repair code
>> >
>> > Both of these changes were only for testing as they allow us to
>> simulate 1k repairs in around 15 seconds with 100% deterministic execution.
>>
>


Re: [DISCUSS] Addition of smile-nlp test dependency for CEP-30

2023-09-13 Thread Ekaterina Dimitrova
Jeff, isn’t this ok as long as it is used only in tests? If we are not sure,
we could open a Jira ticket with legal.

On Wed, 13 Sep 2023 at 12:23, Jeff Jirsa  wrote:

> Just to be clear - this repo?
> https://github.com/haifengl/smile/blob/master/LICENSE
>
> That shows GPL + Commercial?
>
>
>
> On Wed, Sep 13, 2023 at 9:10 AM Brandon Williams  wrote:
>
>> I don't see any problem with this, +1.
>>
>> Kind Regards,
>> Brandon
>>
>>
>> On Wed, Sep 13, 2023 at 11:09 AM Mike Adamson 
>> wrote:
>>
>>> CEP-30: [Approximate Nearest Neighbor(ANN) Vector Search via
>>> Storage-Attached Indexes] uses the smile-nlp library
>>> (com.github.haifengl.smile-nlp) in its testing to allow the creation of
>>> word2vec embeddings for valid input into the HNSW graph index.
>>>
>>> The reason for this library is that we found that using random vectors
>>> in testing produced very inconsistent results. Using the smile-nlp word2vec
>>> implementation with the glove.3k.50d library produces repeatable results.
>>>
>>> Does anyone have any objections to the use of this library as a test
>>> only dependency?
>>> --
>>> *Mike Adamson*
>>> Engineering
>>>
>>> +1 650 389 6000 | datastax.com
>>>
>>>


Re: [DISCUSS] Update default disk_access_mode to mmap_index_only on 5.0

2023-09-06 Thread Ekaterina Dimitrova
Thanks for starting this discussion, Paulo!

Shall we also share this thread with @user?

On Wed, 6 Sep 2023 at 16:35, C. Scott Andreas  wrote:

> Supportive of switching the default to mmap_index_only as well.
>
> I don’t have numbers handy to share, but my experience has been
> significantly lower read latency and I wouldn’t run with auto. I’ve also
> not observed substantial heap pressure after switching - it was strictly an
> improvement.
>
> - Scott
>
> —
> Mobile
>
> On Sep 6, 2023, at 8:50 AM, Paulo Motta  wrote:
>
> 
>
> Hi,
>
> I've been bitten by OOMs with disk_access_mode:auto/mmap that were fixed
> by changing to disk_access_mode:mmap_index_only. In a particular benchmark
> I got 5x more read throughput on 3.11.x with disk_access_mode:
> mmap_index_only vs disk_access_mode: auto/mmap.
>
> Changing disk_access_mode to mmap_index_only seems to be a common
> recommendation on forums[1][2][3][4] and slack (find by searching
> disk_access_mode in the #cassandra channel on https://the-asf.slack.com/).
>
> It's not clear to me when using the default disk_access_mode:auto/mmap is
> beneficial, perhaps only when the read set fits in memory? Mick seems to
> think on CASSANDRA-15531 [5], that mmap_index_only has a higher heap cost
> and should be only used when warranted. However it's not uncommon to see
> people being bitten with OOMs or lower read performance due to the default
> disk_access_mode, so it makes me think it's not the best fool-proof default.
>
> Should we consider changing default "auto" behavior of "disk_access_mode"
> to be "mmap_index_only" instead of "mmap" in 5.0 since it's likely safer
> and perhaps more performant?
>
> Thanks,
>
> Paulo
>
> [1]
> https://stackoverflow.com/questions/72272035/troubleshooting-and-fixing-cassandra-oom-issue
> [2] https://phabricator.wikimedia.org/T137419
> [3] https://stackoverflow.com/a/55975471
> [4]
> https://support.datastax.com/s/article/FAQ-Use-of-disk-access-mode-in-DSE-51-and-earlier
> [5] https://issues.apache.org/jira/browse/CASSANDRA-15531
>
>
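To make the proposal in the quoted message concrete, here is a small sketch of what "changing the default auto behavior" means. It is an illustration only, not Cassandra's implementation: the function name and the `auto_resolves_to` parameter are hypothetical, and only the mode names mentioned in the thread are used.

```python
def effective_disk_access_mode(configured, auto_resolves_to="mmap"):
    """Return the mode actually used for a configured disk_access_mode.

    `auto_resolves_to` models the behavior under discussion: "mmap" is the
    current meaning of "auto" as described in the thread, and
    "mmap_index_only" is the proposed one.
    """
    if configured == "auto":
        return auto_resolves_to
    # An explicitly configured mode is not affected by the proposal.
    return configured


current = effective_disk_access_mode("auto")
proposed = effective_disk_access_mode("auto", auto_resolves_to="mmap_index_only")
explicit = effective_disk_access_mode("mmap", auto_resolves_to="mmap_index_only")
```

Under this reading, only clusters that leave `disk_access_mode` at `auto` would see a change; anyone who sets `mmap` explicitly keeps it.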


Re: [DISCUSSION] Dependency management in our first alpha release

2023-08-30 Thread Ekaterina Dimitrova
Thank you for your work and the prompt update, Maxim!

We seem to have a lazy consensus here so I suggest we continue all
operational discussions on the tickets themselves (not only the mentioned
ones but also any other dependency management tickets that might be open).
My understanding is that people are happy with some careful housekeeping in
the cassandra-5.0 branch while we are still not feature-complete.

Best regards,
Ekaterina

On Tue, 29 Aug 2023 at 16:23, Maxim Muzafarov  wrote:

> A few updates.
>
> We've posted a message to the user-list asking the question about the
> use of the metrics-reporter-config library to make sure we are on the
> safe side with the removal:
> https://lists.apache.org/thread/c4m3tc08zhd4d41zs05jcdkr3gjwlhno
>
> The issue for the `org.caffinitas.ohc:ohc-core-j8` is here, we'll try
> to handle it:
> https://issues.apache.org/jira/browse/CASSANDRA-18799
>
> On Fri, 25 Aug 2023 at 18:39, Ekaterina Dimitrova 
> wrote:
> >
> > Thank you all. We are going to continue with those tickets and related
> problems then.
> >
> > On Maxim's question:
> > "Do we need a separate issue for 'org.caffinitas.ohc:ohc-core-j8' or we
> should handle everything under [2]?"
> >
> > It depends on whether someone has the time to sit and deal with the
> complete list as soon as possible or we should do divide and conquer. It
> will also require some archeology and potential discussions with users in
> some cases, etc.
> >
> > Best regards,
> > Ekaterina
> >
> >
> >
> > On Wed, 23 Aug 2023 at 17:29, Abe Ratnofsky  wrote:
> >>
> >> > I also want to hear if Abe still has concerns about not following
> deprecation process here.
> >>
> >> I support removing the library on an expedited schedule, rather than
> waiting for a full major of deprecation. We still have a large surface for
> metrics integrations, and users who depended on metrics-reporter-config
> will have a path forward if they need similar functionality.
> >>
> >> On Aug 23, 2023, at 07:28, Ekaterina Dimitrova 
> wrote:
> >>
> >> I also want to hear if Abe still has concerns about not following
> deprecation process here.
>


Re: [DISCUSSION] Dependency management in our first alpha release

2023-08-25 Thread Ekaterina Dimitrova
Thank you all. We are going to continue with those tickets and related
problems then.

On Maxim's question:
"Do we need a separate issue for 'org.caffinitas.ohc:ohc-core-j8' or we
should handle everything under [2]?"

It depends on whether someone has the time to sit and deal with the
complete list as soon as possible or we should do divide and conquer. It
will also require some archeology and potential discussions with users in
some cases, etc.

Best regards,
Ekaterina



On Wed, 23 Aug 2023 at 17:29, Abe Ratnofsky  wrote:

> > I also want to hear if Abe still has concerns about not following
> deprecation process here.
>
> I support removing the library on an expedited schedule, rather than
> waiting for a full major of deprecation. We still have a large surface for
> metrics integrations, and users who depended on metrics-reporter-config
> will have a path forward if they need similar functionality.
>
> On Aug 23, 2023, at 07:28, Ekaterina Dimitrova 
> wrote:
>
> I also want to hear if Abe still has concerns about not following
> deprecation process here.
>
>


Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-25 Thread Ekaterina Dimitrova
To close this one: the patch that fixes javadoc and makes CI fail on broken
javadoc under `ant check` is ready. After considering the current project
status, we decided to delay committing it until 5.0 is feature complete and
all big CEPs are in.

Best regards,
Ekaterina

On Mon, 21 Aug 2023 at 15:30, Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> You can play with this in
>
> File -> Settings -> Editor -> Inspections -> Javadoc
>
> HTML problems in Javadoc -> Severity: Error
>
> This will mark invalid HTML rendering of Javadocs as erroneous.
>
> There are other checkboxes as well. I am not saying it solves it all but
> there is some room for tweaking this if one wishes.
>
> ____
> From: Ekaterina Dimitrova 
> Sent: Monday, August 21, 2023 20:05
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSSION] Shall we remove ant javadoc task?
>
> My only slight concern is that almost all the errors fixed in
> CASSANDRA-18717 are such that they don’t trigger an error or warning in the
> IDE. This means people will realize there is something to be done
> differently around javadoc only after they run the check task. I figured
> it would be good to share this point for visibility before we commit to
> anything.
> Maxim highlighted for me that this is already the case with some of the
> other checks we have already added with checkstyle. He is looking into the
> problem in CASSANDRA-18277; I mention it here in case there are committers
> with cycles to help push it through.
>
> On Thu, 17 Aug 2023 at 12:46, Mick Semb Wever <m...@apache.org> wrote:
> +1 to `ant check` (and to failing on it).
>
> On Thu, 17 Aug 2023 at 18:43, Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:
> Agreed with Maxim. If we fail CI on the javadoc task, in my opinion it
> should be added to ant check probably.
>
> On Thu, 17 Aug 2023 at 12:40, Maxim Muzafarov <mmu...@apache.org> wrote:
> We have "artifacts" ant target that depends on "checks" and "gen-doc",
> from my point of view, it would be nice to have the "artifacts"
> depending on "javadocs" as well. That way we can be sure that
> everything related is in good order.
>
> On Thu, 17 Aug 2023 at 18:05, Brandon Williams <dri...@gmail.com> wrote:
> >
> > If everything is good now, I think CI should fail if it regresses so
> > we can keep it this way.
> >
> > Kind Regards,
> > Brandon
> >
> > On Thu, Aug 17, 2023 at 10:49 AM Ekaterina Dimitrova
> > <e.dimitr...@gmail.com> wrote:
> > >
> > > In CASSANDRA-18717 Maxim posted the javadoc fix. Stefan already made a
> first pass of review so it seems we are not removing this ant task as it
> was already fixed and there are people who find value of keeping it.
> > > My question is do we want to fail CI if this regress or not?
> > >
> > > On Thu, 3 Aug 2023 at 22:44, Josh McKenzie <jmcken...@apache.org> wrote:
> > >>
> > >> the problem is that the javadoc task is not given the attention
> > >> it deserves. The failonerror is currently 'false' and the task itself
> > >> is not a part of any build and/or release processes
> > >>
> > >>
> > >> I just wrote a tool that explores the distribution of keys across
> multiple sstables, I needed some of the tools classes but not much more.
> Javadocs would have made that easy
> > >>
> > >> You know what? I agree with all that. If I had to jump into the
> source for the JDK or other libraries every time I needed to work with them
> that'd be annoying.
> > >>
> > >> BTW, I have managed to fix all the javadoc errors.
> > >>
> > >> Of course you have. :) Industrious as usual Maxim; thanks for
> tackling that!
> > >>
> > >> So yeah. Depending on how long javadocs take to generate, I think
> having them as part of our pre-commit rotation makes sense. Could even add
> them to our site with something like an "API" section (gasp) here:
> https://cassandra.apache.org/doc/latest/

Re: [VOTE] Release Apache Cassandra 5.0-alpha1

2023-08-25 Thread Ekaterina Dimitrova
+1

On Fri, 25 Aug 2023 at 11:14, Mick Semb Wever  wrote:

>
> Proposing the test build of Cassandra 5.0-alpha1 for release.
>
> DISCLAIMER, this alpha release does not contain the expected 5.0
> features: Vector Search (CEP-30), Transactional Cluster Metadata
> (CEP-21) and Accord Transactions (CEP-15).  These features will land
> in a later alpha release.
>
> Please also note that this is an alpha release and what that means,
> further info at
> https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle
>
> sha1: 62cb03cc7311384db6619a102d1da6a024653fa6
> Git: https://github.com/apache/cassandra/tree/5.0-alpha1-tentative
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1314/org/apache/cassandra/cassandra-all/5.0-alpha1/
>
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/5.0-alpha1/
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
> [1]: CHANGES.txt:
> https://github.com/apache/cassandra/blob/5.0-alpha1-tentative/CHANGES.txt
> [2]: NEWS.txt:
> https://github.com/apache/cassandra/blob/5.0-alpha1-tentative/NEWS.txt
>
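The passing rule quoted above ("at least three binding +1s and no -1's") can be written down as a small check. This is only a sketch of the rule as worded in the email: `vote_passes` is a made-up helper, it treats any -1 (binding or not) as blocking because the email does not distinguish, and it ignores the 72-hour voting window.

```python
def vote_passes(votes):
    """Apply the rule from the vote email to a list of votes.

    `votes` is a list of (value, binding) pairs, where value is +1, 0 or -1
    and binding is True for PMC members.
    """
    binding_plus_ones = sum(
        1 for value, binding in votes if value == 1 and binding
    )
    any_minus_one = any(value == -1 for value, _ in votes)
    return binding_plus_ones >= 3 and not any_minus_one


three_binding = vote_passes([(1, True), (1, True), (1, True)])
too_few_binding = vote_passes([(1, True), (1, True), (1, False)])
blocked = vote_passes([(1, True), (1, True), (1, True), (-1, False)])
```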


[DISCUSSION] Dependency management in our first alpha release

2023-08-23 Thread Ekaterina Dimitrova
Hi everyone,

I wanted to clarify something. I understood dependency updates/cleaning can
also be done in an alpha release if they lead to minimal user-facing
changes, if any at all. I agree with that in our first 5.0 alpha release
because we are not yet feature-complete. It is a good time for people to do
a bit of housekeeping and tighten some loose ends.
Do you think this is a valid statement? Thoughts?
I wanted to clear this topic as we have a few in-flight tickets/discussions:

   - CASSANDRA-14667 - upgrade dropwizard metrics, for which to be
   accommodated, Bret is creating a new 3.11.4 drivers version. So we should
   update the driver. I am unsure how much effort and change it will be on
   our side to update the drivers though. Maxim, did you try it? Any
   thoughts?
   - CASSANDRA-18789 - commons-lang3, a pretty non-controversial bump with
   two versions. The one we are on is tested to Java 11, and the newest one
   tests up to JDK17 and beyond. This is enough reason for me honestly to
   update it.
   - In [1], Maxim mentioned that we can clean
   org.caffinitas.ohc:ohc-core-j8.
   - In [2], Stefan and Mick made a point that we could even remove in 5.0
   metrics-reporter-config (CASSANDRA-18743) and commons-codec
   (CASSANDRA-18772). I think this should be a good idea - let's make some
   noise in the user group to ensure people are aware and no one raises any
   significant concerns and then clean those two. I also want to hear if Abe
   still has concerns about not following deprecation process here.

And if we decide, we can find a few more loose ends to deal with. I am sure.

Looking forward to your feedback and thoughts.

Best regards,
Ekaterina


[1] https://lists.apache.org/thread/9m1vz5qyows97wlppkwk1fd8386rj9q1
[2] https://lists.apache.org/thread/9m1vz5qyows97wlppkwk1fd8386rj9q1


[DISCUSSION] News and announcements, help needed

2023-08-21 Thread Ekaterina Dimitrova
Hey friends,
The 5.0 alpha test build is out for testing! Great! Did you also have the
chance to attend any of the latest town halls?

Ok, it is time to say where I am going with this short intro...
Our friends from Constantia are working hard to help us spread the news and
promote all the great work we do in this community. They open Jira tickets
and PRs (not more than 1-2 per week, if any) to add announcements around
events, new features, town hall replays, etc. on the Cassandra website. It
requires a committer to do the review and push things through. The
publishing process on the Cassandra website underwent certain
transformations, and now it requires less effort (more on that below). I
wanted to raise the visibility and ask for committers' involvement; if you
see a ticket/PR and have 15 minutes, please, try to push it through.
Currently, some PRs can take up to 10 days to get committed. Is it possible
to cut that to 5 so we can spread any news on time?

For example, what it took me to correct a page on the website the other
week:

We have pre-commit CI in GitHub for cassandra-website commits; for example,
in CASSANDRA-18618 in my fork, the commits had checks like this:
https://github.com/ekaterinadimitrova2/cassandra-website/actions/runs/5825288289/job/15796474462

And it generates a (just top-level) website in the _generated branch, e.g.,
https://github.com/ekaterinadimitrova2/cassandra-website/blob/18738-trunk_generated/content/_/development/how_to_commit.html

I can take the raw URL of that and put it into raw.githack.com, e.g.,
https://raw.githack.com/ekaterinadimitrova2/cassandra-website/18738-trunk_generated/content/_/development/how_to_commit.html
to preview the generated output.
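
For anyone who wants to script that step, here is a rough sketch of the URL rewrite. The helper name is made up for illustration, and it assumes raw.githack.com keeps the same user/repo/branch/path layout as in the example URLs above.

```python
from urllib.parse import urlsplit


def github_blob_to_githack(blob_url):
    """Turn a github.com 'blob' URL into the matching raw.githack.com URL."""
    parts = urlsplit(blob_url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {blob_url}")
    # The path looks like /<user>/<repo>/blob/<branch>/<path to file>
    segments = parts.path.lstrip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    user, repo, _blob, *branch_and_path = segments
    # Drop the "blob" segment and switch the host; the rest is unchanged.
    return "https://raw.githack.com/" + "/".join([user, repo, *branch_and_path])


preview_url = github_blob_to_githack(
    "https://github.com/ekaterinadimitrova2/cassandra-website/blob/"
    "18738-trunk_generated/content/_/development/how_to_commit.html"
)
```

The result for the page above is the same raw.githack.com link given in this email.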

The rest is as before - we stage the changes and then push them live when
Jenkins completes. (full details in the website readme, I will also be
happy to guide anyone who is looking to lend a hand with this effort)

Feel free to let me know if you have any questions, concerns, or
suggestions to improve the process. Most discussions around the website
happen in ASF Slack #cassandra-website.

See you at the town hall on Thursday! :-)


Best regards,

Ekaterina


Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-21 Thread Ekaterina Dimitrova
My only slight concern is that almost all the errors fixed in
CASSANDRA-18717 are such that they don’t trigger an error or warning in the
IDE. This means people will realize there is something to be done
differently around javadoc only after they run the check task. I figured
it would be good to share this point for visibility before we commit to
anything.
Maxim highlighted for me that this is already the case with some of the
other checks we have already added with checkstyle. He is looking into the
problem in CASSANDRA-18277; I mention it here in case there are committers
with cycles to help push it through.

On Thu, 17 Aug 2023 at 12:46, Mick Semb Wever  wrote:

> +1 to `ant check` (and to failing on it).
>
> On Thu, 17 Aug 2023 at 18:43, Ekaterina Dimitrova 
> wrote:
>
>> Agreed with Maxim. If we fail CI on the javadoc task, in my opinion it
>> should be added to ant check probably.
>>
>> On Thu, 17 Aug 2023 at 12:40, Maxim Muzafarov  wrote:
>>
>>> We have "artifacts" ant target that depends on "checks" and "gen-doc",
>>> from my point of view, it would be nice to have the "artifacts"
>>> depending on "javadocs" as well. That way we can be sure that
>>> everything related is in good order.
>>>
>>> On Thu, 17 Aug 2023 at 18:05, Brandon Williams  wrote:
>>> >
>>> > If everything is good now, I think CI should fail if it regresses so
>>> > we can keep it this way.
>>> >
>>> > Kind Regards,
>>> > Brandon
>>> >
>>> > On Thu, Aug 17, 2023 at 10:49 AM Ekaterina Dimitrova
>>> >  wrote:
>>> > >
>>> > > In CASSANDRA-18717 Maxim posted the javadoc fix. Stefan already made
>>> a first pass of review so it seems we are not removing this ant task as it
>>> was already fixed and there are people who find value of keeping it.
>>> > > My question is do we want to fail CI if this regress or not?
>>> > >
>>> > > On Thu, 3 Aug 2023 at 22:44, Josh McKenzie 
>>> wrote:
>>> > >>
>>> > >> the problem is that the javadoc task is not given the attention
>>> > >> it deserves. The failonerror is currently 'false' and the task
>>> itself
>>> > >> is not a part of any build and/or release processes
>>> > >>
>>> > >>
>>> > >> I just wrote a tool that explores the distribution of keys across
>>> multiple sstables, I needed some of the tools classes but not much more.
>>> Javadocs would have made that easy
>>> > >>
>>> > >> You know what? I agree with all that. If I had to jump into the
>>> source for the JDK or other libraries every time I needed to work with them
>>> that'd be annoying.
>>> > >>
>>> > >> BTW, I have managed to fix all the javadoc errors.
>>> > >>
>>> > >> Of course you have. :) Industrious as usual Maxim; thanks for
>>> tackling that!
>>> > >>
>>> > >> So yeah. Depending on how long javadocs take to generate, I think
>>> having them as part of our pre-commit rotation makes sense. Could even add
>>> them to our site with something like an "API" section (gasp) here:
>>> https://cassandra.apache.org/doc/latest/.
>>> > >>
>>> > >> Would certainly help motivate us to clarify the whole "what is an
>>> external API we're committing to or not" discussions.
>>> > >>
>>> > >> On Thu, Aug 3, 2023, at 6:09 PM, Ekaterina Dimitrova wrote:
>>> > >>
>>> > >> Thank you Maxim. There is CASSANDRA-18717, I guess that patch
>>> should go there. Keeping the task or not, the fix of the docs should go in
>>> anyway IMHO. I will not be available the next few days, but I can help with
>>> reviews when I am back.
>>> > >>
>>> > >> On Thu, 3 Aug 2023 at 17:44, Maxim Muzafarov 
>>> wrote:
>>> > >>
>>> > >> Yes, I agree. The javadoc task should be part of our CI if we decide
>>> > >> to keep it, to keep it buildable at all times.
>>> > >>
>>> > >>
>>> > >> BTW, I have managed to fix all the javadoc errors.
>>> > >> I have tested the task for both jdk11 and jdk17.
>>> > >>
>>> > >> Changes are here:
>>> > >>
>>> https://github.com/apache/cassandra/compare/trunk...Mm

[DISCUSSION] CASSANDRA-18772 - removal of commons-codec on trunk

2023-08-17 Thread Ekaterina Dimitrova
Hi everyone,

I propose we remove commons-codec on trunk.
The only usage I found was from CASSANDRA-12790 - Support InfluxDb metrics
reporter configuration, which relied on commons-codec and
metrics-reporter-config, which will be removed as part of CASSANDRA-18743.
The only question is whether we can remove those two dependencies on trunk,
considering it is 5.1, or do we need to wait until 6.0.

Best regards,
Ekaterina


Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-17 Thread Ekaterina Dimitrova
Agreed with Maxim. If we fail CI on the javadoc task, in my opinion it
should be added to ant check probably.

On Thu, 17 Aug 2023 at 12:40, Maxim Muzafarov  wrote:

> We have "artifacts" ant target that depends on "checks" and "gen-doc",
> from my point of view, it would be nice to have the "artifacts"
> depending on "javadocs" as well. That way we can be sure that
> everything related is in good order.
>
> On Thu, 17 Aug 2023 at 18:05, Brandon Williams  wrote:
> >
> > If everything is good now, I think CI should fail if it regresses so
> > we can keep it this way.
> >
> > Kind Regards,
> > Brandon
> >
> > On Thu, Aug 17, 2023 at 10:49 AM Ekaterina Dimitrova
> >  wrote:
> > >
> > > In CASSANDRA-18717 Maxim posted the javadoc fix. Stefan already made a
> first pass of review so it seems we are not removing this ant task as it
> was already fixed and there are people who find value of keeping it.
> > > My question is do we want to fail CI if this regress or not?
> > >
> > > On Thu, 3 Aug 2023 at 22:44, Josh McKenzie 
> wrote:
> > >>
> > >> the problem is that the javadoc task is not given the attention
> > >> it deserves. The failonerror is currently 'false' and the task itself
> > >> is not a part of any build and/or release processes
> > >>
> > >>
> > >> I just wrote a tool that explores the distribution of keys across
> multiple sstables, I needed some of the tools classes but not much more.
> Javadocs would have made that easy
> > >>
> > >> You know what? I agree with all that. If I had to jump into the
> source for the JDK or other libraries every time I needed to work with them
> that'd be annoying.
> > >>
> > >> BTW, I have managed to fix all the javadoc errors.
> > >>
> > >> Of course you have. :) Industrious as usual Maxim; thanks for
> tackling that!
> > >>
> > >> So yeah. Depending on how long javadocs take to generate, I think
> having them as part of our pre-commit rotation makes sense. Could even add
> them to our site with something like an "API" section (gasp) here:
> https://cassandra.apache.org/doc/latest/.
> > >>
> > >> Would certainly help motivate us to clarify the whole "what is an
> external API we're committing to or not" discussions.
> > >>
> > >> On Thu, Aug 3, 2023, at 6:09 PM, Ekaterina Dimitrova wrote:
> > >>
> > >> Thank you Maxim. There is CASSANDRA-18717, I guess that patch should
> go there. Keeping the task or not, the fix of the docs should go in anyway
> IMHO. I will not be available the next few days, but I can help with
> reviews when I am back.
> > >>
> > >> On Thu, 3 Aug 2023 at 17:44, Maxim Muzafarov 
> wrote:
> > >>
> > >> Yes, I agree. The javadoc task should be part of our CI if we decide
> > >> to keep it, to keep it buildable at all times.
> > >>
> > >>
> > >> BTW, I have managed to fix all the javadoc errors.
> > >> I have tested the task for both jdk11 and jdk17.
> > >>
> > >> Changes are here:
> > >>
> https://github.com/apache/cassandra/compare/trunk...Mmuzaf:cassandra:javadoc_build
> > >>
> > >> On Thu, 3 Aug 2023 at 21:20, Ekaterina Dimitrova <
> e.dimitr...@gmail.com> wrote:
> > >> >
> > >> > Thank you Maxim,
> > >> >
> > >> > “
> > >> >
> > >> > From my point of
> > >> > view, the problem is that the javadoc task is not given the
> attention
> > >> > it deserves. The failonerror is currently 'false' and the task
> itself
> > >> > is not a part of any build and/or release processes, correct me if
> I'm
> > >> > wrong.
> > >> >
> > >> > So,
> > >> > 1. Fix warnings/errors;
> > >> > 2. Make the javadoc task part of the build (e.g. put it under
> > >> > 'artifacts'), or make it part of the release process that is
> regularly
> > >> > checked on the CI;
> > >> > 3. Publish/deploy the javadoc htmls for release in the special
> > >> > directory of the cassandra website to give them a chance of being
> > >> > indexed;“
> > >> >
> > >> > This is aligned with what I saw and the two options mentioned at
> the beginning - if we decide to keep it we should fix things and add the
>

Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-17 Thread Ekaterina Dimitrova
In CASSANDRA-18717 Maxim posted the javadoc fix. Stefan already made a
first pass of review, so it seems we are not removing this ant task, as it
has already been fixed and there are people who find value in keeping it.
My question is: do we want to fail CI if this regresses or not?

On Thu, 3 Aug 2023 at 22:44, Josh McKenzie  wrote:

> the problem is that the javadoc task is not given the attention
> it deserves. The failonerror is currently 'false' and the task itself
> is not a part of any build and/or release processes
>
>
> I just wrote a tool that explores the distribution of keys across multiple
> sstables, I needed some of the tools classes but not much more.  Javadocs
> would have made that easy
>
> You know what? I agree with all that. If I had to jump into the source for
> the JDK or other libraries every time I needed to work with them that'd be
> annoying.
>
> BTW, I have managed to fix all the javadoc errors.
>
> Of course you have. :) Industrious as usual Maxim; thanks for tackling
> that!
>
> So yeah. Depending on how long javadocs take to generate, I think having
> them as part of our pre-commit rotation makes sense. Could even add them to
> our site with something like an "API" section (gasp) here:
> https://cassandra.apache.org/doc/latest/.
>
> Would certainly help motivate us to clarify the whole "what is an external
> API we're committing to or not" discussions.
>
> On Thu, Aug 3, 2023, at 6:09 PM, Ekaterina Dimitrova wrote:
>
> Thank you Maxim. There is CASSANDRA-18717, I guess that patch should go
> there. Keeping the task or not, the fix of the docs should go in anyway
> IMHO. I will not be available the next few days, but I can help with
> reviews when I am back.
>
> On Thu, 3 Aug 2023 at 17:44, Maxim Muzafarov  wrote:
>
> Yes, I agree. The javadoc task should be part of our CI if we decide
> to keep it, to keep it buildable at all times.
>
>
> BTW, I have managed to fix all the javadoc errors.
> I have tested the task for both jdk11 and jdk17.
>
> Changes are here:
>
> https://github.com/apache/cassandra/compare/trunk...Mmuzaf:cassandra:javadoc_build
>
> On Thu, 3 Aug 2023 at 21:20, Ekaterina Dimitrova 
> wrote:
> >
> > Thank you Maxim,
> >
> > “
> >
> > From my point of
> > view, the problem is that the javadoc task is not given the attention
> > it deserves. The failonerror is currently 'false' and the task itself
> > is not a part of any build and/or release processes, correct me if I'm
> > wrong.
> >
> > So,
> > 1. Fix warnings/errors;
> > 2. Make the javadoc task part of the build (e.g. put it under
> > 'artifacts'), or make it part of the release process that is regularly
> > checked on the CI;
> > 3. Publish/deploy the javadoc htmls for release in the special
> > directory of the cassandra website to give them a chance of being
> > indexed;“
> >
> > This is aligned with what I saw and the two options mentioned at the
> beginning - if we decide to keep it we should fix things and add the task
> to CI; if we don’t, because no one wants the html pages, then it is better
> to remove this ant task.
> > On your comment about 100 errors - it seems they are more. There is a
> cap of 100 but when you fix them, more errors appear.
> > Further discussion can be found at CASSANDRA-17687
> >
> > On Thu, 3 Aug 2023 at 14:21, Maxim Muzafarov  wrote:
> >>
> >> Personally, I find javadocs quite useful, especially when htmls are
> >> indexed by search engines, which in turn increases the chances of
> >> finding the right answer faster (I have seen a lot of useful javadocs
> >> in the source code).
> >>
> >> I have done a quick build of the javadocs:
> >>
> >>   [javadoc] Building index for all the packages and classes...
> >>   [javadoc] Building index for all classes...
> >>   [javadoc] Building index for all classes...
> >>   [javadoc] 100 errors
> >>   [javadoc] 100 warnings
> >>
> >> 100 errors is no big deal and can be easily fixed. From my point of
> >> view, the problem is that the javadoc task is not given the attention
> >> it deserves. The failonerror is currently 'false' and the task itself
> >> is not a part of any build and/or release processes, correct me if I'm
> >> wrong.
> >>
> >> So,
> >> 1. Fix warnings/errors;
> >> 2. Make the javadoc task part of the build (e.g. put it under
> >> 'artifacts'), or make it part of the release process that is regularly
> >> checked on the CI;
> >> 3. Publish/deploy the javadoc htmls for release in the s

Re: [DISCUSS] CASSANDRA-18743 Deprecation of metrics-reporter-config

2023-08-11 Thread Ekaterina Dimitrova
“The rationale for this proposed deprecation is that the upcoming 5.0
release is a good time to evaluate dependencies that are no longer
receiving updates and will become risks in the future.”

Thank you for raising it, I support your proposal for deprecation.

On Fri, 11 Aug 2023 at 8:55, Abe Ratnofsky  wrote:

> Hey folks,
>
> Opening a thread to get input on a proposed dependency deprecation in 5.0:
> metrics-reporter-config has been archived for 3 years and not updated in
> nearly 6 years.
>
> This project has a minor security issue with its usage of unsafe YAML
> loading via snakeyaml’s unprotected Constructor:
> https://nvd.nist.gov/vuln/detail/CVE-2022-1471
>
> This CVE is reasonable to suppress, since operators should be able to
> trust their YAML configuration files.
>
> The rationale for this proposed deprecation is that the upcoming 5.0
> release is a good time to evaluate dependencies that are no longer
> receiving updates and will become risks in the future.
>
> https://issues.apache.org/jira/browse/CASSANDRA-18743
>
> —
> Abe
>
>


Re: Timing of the last releases of Cassandra 3.0.x / 3.11.x

2023-08-09 Thread Ekaterina Dimitrova
I agree, we have enough for a release of 3.0 and 3.11, and if there are
users expecting it, I don’t see a reason not to cut releases. Shall we
do it together with 5.0 alpha? There was a mention that one might be cut
soon, as far as I recall. Just thinking it might be easier to do the
releases all together, from an operational perspective.

On Wed, 9 Aug 2023 at 9:19, Brandon Williams  wrote:

> On Wed, Aug 9, 2023 at 3:58 AM Miklosovic, Stefan
>  wrote:
> > There is a user on Slack asking for a release  of 3.11.16 because of
> 16555.
>
> If people are showing interest, I think it's time.
>
> > If we release right now, we might potentially do one more release before
> 5.0 is GA.
>
> I think 5.0 is still far enough away from users' hands that we
> shouldn't factor it into decisions about other releases.  We also
> don't have to do a final release before 5.0, we can release anytime we
> like.
>


Re: [DISCUSS] Creating a 5.0 landing page

2023-08-03 Thread Ekaterina Dimitrova
I honestly didn’t realize the events page was added already. Thank you, Josh!

So yes, in that case I guess that is what I am asking - shall we add town
halls, etc. there?

On Thu, 3 Aug 2023 at 22:37, Josh McKenzie  wrote:

> We actually already have an events page:
> https://cassandra.apache.org/_/events.html; not sure if you were saying
> we should add one Ekaterina or saying we should add this content there. +1
> to the content there and having a landing page that points there +
> integrating meetups, town halls, etc.
>
> Community -> Events on the menu up top in case someone missed it.
>
> On Thu, Aug 3, 2023, at 4:21 PM, Ekaterina Dimitrova wrote:
>
>
> Hi Hugh,
>
> Thank you for reaching out. I think this is a great idea. Also, great
> timing, considering the community is discussing a potential 5.0 alpha
> release soon.
>
> It seems to me you actually suggest more than one page?
> 1) 5.0 and new features - could this be an update of the What’s new page?
> -
> https://cassandra.apache.org/doc/trunk/cassandra/new/index.html
> Adding also links to some of the talks sounds great to me.
> 2) Dedicated events page? We were using the Blogs page before but I don’t
> think it is a bad idea to split Blog posts from Events page.
>
> Thank you
> Ekaterina
>
>
> On Wed, 2 Aug 2023 at 21:03, Hugh Lashbrooke  wrote:
>
> With the upcoming release of Apache Cassandra 5.0, I’d like to create a
> landing page for the release and what that could look like.
>
> The landing page would be intended to educate users about what is coming
> up in this important release, highlighting why upgrading will be valuable
> to them, as well as guiding them into more community activities, such as
> Town Halls, where they can learn more and become further involved.
>
> The 5.0 landing page could include:
>
> - An overview of the release with a brief summary of the major features
> - A page for each CEP that is likely to be included, with key features,
>   implementation information, and other technical details. These pages can
>   also include recordings of relevant Contributor Meetings. Here is an
>   example for CEP-28 - Spark Bulk Analytics Library
>   <https://docs.google.com/document/d/1SUBEtkjWQoesDSCBzwUxb3Ugy1FdzUDUVp_k_Jh0BP4/edit>.
> - CTAs to community platforms and activities - Slack, Meetups, Town
>   Halls, Contributor Meetings, etc.
>
> Let’s discuss! Does this sound valuable? If so, I will create a Jira
> ticket and am happy to get started. What other things do you think should
> be included in a page like this?
>
>
>


Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-03 Thread Ekaterina Dimitrova
Thank you Maxim. There is CASSANDRA-18717, I guess that patch should go
there. Keeping the task or not, the fix of the docs should go in anyway
IMHO. I will not be available the next few days, but I can help with
reviews when I am back.

On Thu, 3 Aug 2023 at 17:44, Maxim Muzafarov  wrote:

> Yes, I agree. The javadoc task should be part of our CI if we decide
> to keep it, to keep it buildable at all times.
>
>
> BTW, I have managed to fix all the javadoc errors.
> I have tested the task for both jdk11 and jdk17.
>
> Changes are here:
>
> https://github.com/apache/cassandra/compare/trunk...Mmuzaf:cassandra:javadoc_build
>
> On Thu, 3 Aug 2023 at 21:20, Ekaterina Dimitrova 
> wrote:
> >
> > Thank you Maxim,
> >
> > “
> >
> > From my point of
> > view, the problem is that the javadoc task is not given the attention
> > it deserves. The failonerror is currently 'false' and the task itself
> > is not a part of any build and/or release processes, correct me if I'm
> > wrong.
> >
> > So,
> > 1. Fix warnings/errors;
> > 2. Make the javadoc task part of the build (e.g. put it under
> > 'artifacts'), or make it part of the release process that is regularly
> > checked on the CI;
> > 3. Publish/deploy the javadoc htmls for release in the special
> > directory of the cassandra website to give them a chance of being
> > indexed;“
> >
> > This is aligned with what I saw and the two options mentioned at the
> beginning - if we decide to keep it we should fix things and add the task
> to CI, if we don’t because no one wants the html pages - then better to
> remove this ant task.
> > On your comment about 100 errors - it seems there are more. There is a
> cap of 100, but when you fix them, more errors appear.
> > Further discussion can be found at CASSANDRA-17687
> >
> > On Thu, 3 Aug 2023 at 14:21, Maxim Muzafarov  wrote:
> >>
> >> Personally, I find javadocs quite useful, especially when htmls are
> >> indexed by search engines, which in turn increases the chances of
> >> finding the right answer faster (I have seen a lot of useful javadocs
> >> in the source code).
> >>
> >> I have done a quick build of the javadocs:
> >>
> >>   [javadoc] Building index for all the packages and classes...
> >>   [javadoc] Building index for all classes...
> >>   [javadoc] Building index for all classes...
> >>   [javadoc] 100 errors
> >>   [javadoc] 100 warnings
> >>
> >> 100 errors is no big deal and can be easily fixed. From my point of
> >> view, the problem is that the javadoc task is not given the attention
> >> it deserves. The failonerror is currently 'false' and the task itself
> >> is not a part of any build and/or release processes, correct me if I'm
> >> wrong.
> >>
> >> So,
> >> 1. Fix warnings/errors;
> >> 2. Make the javadoc task part of the build (e.g. put it under
> >> 'artifacts'), or make it part of the release process that is regularly
> >> checked on the CI;
> >> 3. Publish/deploy the javadoc htmls for release in the special
> >> directory of the cassandra website to give them a chance of being
> >> indexed;
> >>
> >> On Thu, 3 Aug 2023 at 17:11, Jeremiah Jordan 
> wrote:
> >> >
> >> > I don’t think anyone wants to remove the javadocs.  This thread is
> about removing the broken ant task which generates html files from them.
> >> >
> >> > +1 from me on removing the ant task.  If someone feels the task is
> useful they can always implement one that does not crash and add it back.
> >> >
> >> > -Jeremiah
> >> >
> >> > On Aug 3, 2023 at 9:59:55 AM, "Claude Warren, Jr via dev" <
> dev@cassandra.apache.org> wrote:
> >> >>
> >> >> I think that we can get more developers interested if there are
> available javadocs.  While many of the core classes are not going to be
> touched by someone just starting, being able to understand what the
> external touch points are and how they interact with other bits of the
> system can be invaluable, particularly when you don't have the entire code
> base in front of you.
> >> >>
> >> >> For example, I just wrote a tool that explores the distribution of
> keys across multiple sstables, I needed some of the tools classes but not
> much more.  Javadocs would have made that easy if I did not have the source
> code in front of me.
> >> >>
> >> >> I am -1 on removing the javadocs.

Re: [DISCUSS] Creating a 5.0 landing page

2023-08-03 Thread Ekaterina Dimitrova
Hi Hugh,

Thank you for reaching out. I think this is a great idea. Also, great
timing, considering the community is discussing a potential 5.0 alpha
release soon.

It seems to me you actually suggest more than one page?
1) 5.0 and new features - could this be an update of the What’s new page? -
https://cassandra.apache.org/doc/trunk/cassandra/new/index.html
Adding also links to some of the talks sounds great to me.
2) Dedicated events page? We were using the Blogs page before but I don’t
think it is a bad idea to split Blog posts from Events page.

Thank you
Ekaterina


On Wed, 2 Aug 2023 at 21:03, Hugh Lashbrooke  wrote:

> With the upcoming release of Apache Cassandra 5.0, I’d like to create a
> landing page for the release and what that could look like.
>
> The landing page would be intended to educate users about what is coming
> up in this important release, highlighting why upgrading will be valuable
> to them, as well as guiding them into more community activities, such as
> Town Halls, where they can learn more and become further involved.
>
> The 5.0 landing page could include:
>
> - An overview of the release with a brief summary of the major features
> - A page for each CEP that is likely to be included, with key features,
>   implementation information, and other technical details. These pages can
>   also include recordings of relevant Contributor Meetings. Here is an
>   example for CEP-28 - Spark Bulk Analytics Library.
> - CTAs to community platforms and activities - Slack, Meetups, Town
>   Halls, Contributor Meetings, etc.
>
> Let’s discuss! Does this sound valuable? If so, I will create a Jira
> ticket and am happy to get started. What other things do you think should
> be included in a page like this?
>


Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-03 Thread Ekaterina Dimitrova
> >>>
> >>>
> >>>
> >>> With or without outputting JavaDoc to HTML, there are some errors
> which we should maybe fix. We want to keep the documentation, but there can
> be syntax errors which may prevent IDE generating a proper preview. So, the
> question is - should we validate the JavaDoc comments as a precommit task?
> Can it be done without actually generating HTML output?
> >>>
> >>> Thanks,
> >>> Jacek
> >>>
> >>> On Wed, 2 Aug 2023, 22:24 Derek Chen-Becker <de...@chen-becker.org>
> wrote:
> >>> Oh, whoops, I guess I'm the only one that thinks Javadoc is just the
> tool and/or its output (not the markup itself) :P If anything, the
> codebase could use a little more package/class/method markup in some
> places, so I'm definitely only in favor of getting rid of the ant task. I
> should amend my statement to be "...I suspect most people are not opening
> their browsers and looking at Javadoc..." :)
> >>>
> >>> Cheers,
> >>>
> >>> Derek
> >>>
> >>>
> >>>
> >>> On Wed, Aug 2, 2023, 1:30 PM Josh McKenzie <jmcken...@apache.org> wrote:
> >>> most people are not looking at Javadoc when working on the codebase.
> >>> I definitely use it extensively inside the IDE. But never as a
> compiled set of external docs.
> >>>
> >>> Which is to say, I'm +1 on removing the target and I'd ask everyone to
> keep javadoccing your classes and methods where things are non-obvious or
> there's a logical coupling with something else in the system. :)
> >>>
> >>> On Wed, Aug 2, 2023, at 2:08 PM, Derek Chen-Becker wrote:
> >>> +1. If a need comes up for Javadoc we can fix it at that point, but I
> suspect most people are not looking at Javadoc when working on the codebase.
> >>>
> >>> Cheers,
> >>>
> >>> Derek
> >>>
> >>> On Wed, Aug 2, 2023 at 11:11 AM Brandon Williams <dri...@gmail.com> wrote:
> >>> I don't think even if it works anyone is going to use the output, so
> >>> I'm good with removal.
> >>>
> >>> Kind Regards,
> >>> Brandon
> >>>
> >>> On Wed, Aug 2, 2023 at 11:50 AM Ekaterina Dimitrova
> >>> <e.dimitr...@gmail.com> wrote:
> >>> >
> >>> > Hi everyone,
> >>> > We were looking into a user report around our ant javadoc task
> recently.
> >>> > That made us realize it is not run in CI; it finishes successfully
> even if there are hundreds of errors, some potentially breaking doc pages.
> >>> >
> >>> > There was a ticket discussion where a few community members
> mentioned that this task was probably unnecessary. Can we remove it, or
> shall we fix it?
> >>> >
> >>> > Best regards,
> >>> > Ekaterina
> >>>
> >>>
> >>> --
> >>> +---+
> >>> | Derek Chen-Becker |
> >>> | GPG Key available at https://keybase.io/dchenbecker and   |
> >>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> >>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> >>> +---+
> >>>
> >>>
> >>>
> >>>
>
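
On Jacek's question above (can Javadoc comments be validated without
generating HTML output?): one possible approach, offered only as a sketch, is
javac's -Xdoclint option, which checks doc comments during compilation and
produces no HTML. The class name DoclintCheck, the helper countDocErrors, and
the GOOD/BAD sample sources are hypothetical names made up for this
illustration; nothing here is taken from the Cassandra build, and whether this
fits a precommit task is an open question.

```java
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class DoclintCheck {
    // Hypothetical sample: doc comment tags match the method signature.
    static final String GOOD =
        "/** A documented class. */\n"
      + "public class Good {\n"
      + "    /**\n"
      + "     * Adds one.\n"
      + "     * @param x the input\n"
      + "     * @return x plus one\n"
      + "     */\n"
      + "    public int inc(int x) { return x + 1; }\n"
      + "}\n";

    // Hypothetical sample: @param names a parameter that does not exist.
    static final String BAD =
        "/** A documented class. */\n"
      + "public class Bad {\n"
      + "    /**\n"
      + "     * Adds one.\n"
      + "     * @param y no such parameter\n"
      + "     * @return x plus one\n"
      + "     */\n"
      + "    public int inc(int x) { return x + 1; }\n"
      + "}\n";

    /** In-memory source file, so the sketch needs no files on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles one class with doclint enabled and counts ERROR diagnostics. */
    static long countDocErrors(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("a JDK (not a bare JRE) is required");
        }
        Path out;
        try {
            // Class files go to a throwaway directory; no HTML is generated.
            out = Files.createTempDirectory("doclint-check");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        List<String> options =
            List.of("-Xdoclint:all", "-proc:none", "-d", out.toString());
        compiler.getTask(null, null, diagnostics, options, null,
                         List.of(new StringSource(className, code))).call();
        return diagnostics.getDiagnostics().stream()
                .filter(d -> d.getKind() == Diagnostic.Kind.ERROR)
                .count();
    }

    public static void main(String[] args) {
        System.out.println("Good errors: " + countDocErrors("Good", GOOD));
        System.out.println("Bad errors:  " + countDocErrors("Bad", BAD));
    }
}
```

Only ERROR diagnostics are counted here; doclint also reports missing
comments as warnings, which a real check could choose to surface or ignore.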


Re: [DISCUSSION] Adding BountyCastle as a test dependency

2023-08-03 Thread Ekaterina Dimitrova
As it seems there are no objections, we will continue with the addition of
this dependency, thank you

On Mon, 31 Jul 2023 at 11:41, Ekaterina Dimitrova 
wrote:

> Hi team,
> Probably most of you noticed that SSLFactoryTest unit tests fail now with
> JDK 17. Java 15 removed sun.security.x509 package. You can still generate
> a Self-Signed certificate using BouncyCastle by adding it to the classpath, 
> described
> in this issue <https://github.com/netty/netty/issues/10317> from the
> Netty community. We plan to add a BouncyCastle test dependency as part of
> CASSANDRA-17992, which will fix SSLFactoryTest. Please let us know if you
> have any questions or concerns.
> Best regards,
> Ekaterina
>


Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-02 Thread Ekaterina Dimitrova
I second what Josh said and confirm we were talking only about the task, no
one is going to remove javadoc from the source code and I totally encourage
people to continue documenting the code

On Wed, 2 Aug 2023 at 15:30, Josh McKenzie  wrote:

> most people are not looking at Javadoc when working on the codebase.
>
> I definitely use it extensively *inside the IDE*. But never as a compiled
> set of external docs.
>
> Which is to say, I'm +1 on removing the target and I'd ask everyone to
> keep javadoccing your classes and methods where things are non-obvious or
> there's a logical coupling with something else in the system. :)
>
> On Wed, Aug 2, 2023, at 2:08 PM, Derek Chen-Becker wrote:
>
> +1. If a need comes up for Javadoc we can fix it at that point, but I
> suspect most people are not looking at Javadoc when working on the codebase.
>
> Cheers,
>
> Derek
>
> On Wed, Aug 2, 2023 at 11:11 AM Brandon Williams  wrote:
>
> I don't think even if it works anyone is going to use the output, so
> I'm good with removal.
>
> Kind Regards,
> Brandon
>
> On Wed, Aug 2, 2023 at 11:50 AM Ekaterina Dimitrova
>  wrote:
> >
> > Hi everyone,
> > We were looking into a user report around our ant javadoc task recently.
> > That made us realize it is not run in CI; it finishes successfully even
> if there are hundreds of errors, some potentially breaking doc pages.
> >
> > There was a ticket discussion where a few community members mentioned
> that this task was probably unnecessary. Can we remove it, or shall we fix
> it?
> >
> > Best regards,
> > Ekaterina
>
>
>
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+
>
>
>


[DISCUSSION] Shall we remove ant javadoc task?

2023-08-02 Thread Ekaterina Dimitrova
Hi everyone,
We were looking into a user report around our ant javadoc task recently.
That made us realize it is not run in CI; it finishes successfully even if
there are hundreds of errors, some potentially breaking doc pages.

There was a ticket discussion where a few community members mentioned that
this task was probably unnecessary. Can we remove it, or shall we fix it?

Best regards,
Ekaterina


Re: [ANNOUNCEMENT] Expect failures today. Dropping JDK 8 and adding JDK 11

2023-07-31 Thread Ekaterina Dimitrova
As there were no objections from the community I closed CASSANDRA-18701

On Thu, 27 Jul 2023 at 16:22, Mick Semb Wever  wrote:

>
>
>
>> FYI,  devbranch (pre-commit) on ci-cassandra.a.o is hardcoded to build on
>> jdk8 and jdk11
>> so it's not going to work with trunk based patches…
>>
>> I'll review any patches to fix this, but otherwise focus is
>> on CASSANDRA-18594 where a standalone jenkinsfile avoids this problem
>> altogether.
>>
>
>
> To clarify this, devbranch was hardcoded to a matrix with jdk8 and jdk11,
> with a hack in place to skip jdk11 for <4 branches (at run time).
> Dropping jdk8 was not the cause of devbranch no longer working, it's just
> not built for it.  These limitations of devbranch (and cassandra-builds)
> are being addressed as part of the repeatable ci work.  Note, this does not
> affect post-commit ci-cassandra.a.o pipelines (which already is testing
> jdk11+17 on trunk).
>
> So… I would like to suggest we abandon CASSANDRA-18701 and we collectively
> focus on the intended replacement for this, which is the in-tree scripts
> and 18594.  (It might possibly come as part of CASSANDRA-18665 which is an
> intermediate ticket).
>
> Stefan and Maxim, I think this impacts the two of you most over the next
> few weeks.  In the meantime I suggest using the in-tree scripts as much as
> possible, using individual stage jobs under the patches tab in
> ci-cassandra.a.o (just ignoring the jdk8 axis failures), and asking others
> for final pre-commit ci runs on circleci.  We're close.
>
>
>


[DISCUSSION] Adding BountyCastle as a test dependency

2023-07-31 Thread Ekaterina Dimitrova
Hi team,
Most of you have probably noticed that the SSLFactoryTest unit tests now fail
with JDK 17. Java 15 removed the sun.security.x509 package. You can still
generate a self-signed certificate using BouncyCastle by adding it to the
classpath, as described in this issue from the Netty community. We plan to
add BouncyCastle as a test dependency as part of CASSANDRA-17992, which will
fix SSLFactoryTest. Please let us know if you have any questions or concerns.
Best regards,
Ekaterina


Re: [ANNOUNCEMENT] Expect failures today. Dropping JDK 8 and adding JDK 11

2023-07-27 Thread Ekaterina Dimitrova
CASSANDRA-18701

On Thu, 27 Jul 2023 at 13:10, Ekaterina Dimitrova 
wrote:

> It seems there was some misunderstanding as dev-branch was on the list of
> tasks to be fixed here.
>
> Anyway, I will work on a patch as I do not want to block people who are
> using it, thanks. My understanding from a private conversation is that
> CASSANDRA-18594 might land in August-September so I don’t think we should
> wait for it.
> Apologies for any inconvenience caused; it was definitely not my intention.
>
> On Thu, 27 Jul 2023 at 12:49, Mick Semb Wever  wrote:
>
>>
>>
>>
>> > I am working on dropping JDK 8 and adding JDK17 on trunk in both CI
>> systems today.
>> > This requires numerous patches in a few repos so you will be seeing
>> more failures in CI throughout the day today, but it shouldn’t be anything
>> more than what we have listed in the table of failures in
>> CASSANDRA-16895’s description. I will be applying the fixes one by one
>> today.
>> > I will keep you posted with updates. Also, please, do let me know if
>> you have any questions or concerns.
>>
>>
>>
>> FYI,  devbranch (pre-commit) on ci-cassandra.a.o is hardcoded to build on
>> jdk8 and jdk11
>> so it's not going to work with trunk based patches…
>>
>> I'll review any patches to fix this, but otherwise focus is
>> on CASSANDRA-18594 where a standalone jenkinsfile avoids this problem
>> altogether.
>>
>>


Re: [ANNOUNCEMENT] Expect failures today. Dropping JDK 8 and adding JDK 11

2023-07-27 Thread Ekaterina Dimitrova
It seems there was some misunderstanding as dev-branch was on the list of
tasks to be fixed here.

Anyway, I will work on a patch as I do not want to block people who are
using it, thanks. My understanding from a private conversation is that
CASSANDRA-18594 might land in August-September so I don’t think we should
wait for it.
Apologies for any inconvenience caused; it was definitely not my intention.

On Thu, 27 Jul 2023 at 12:49, Mick Semb Wever  wrote:

>
>
>
> > I am working on dropping JDK 8 and adding JDK17 on trunk in both CI
> systems today.
> > This requires numerous patches in a few repos so you will be seeing more
> failures in CI throughout the day today, but it shouldn’t be anything more
> than what we have listed in the table of failures in CASSANDRA-16895’s
> description. I will be applying the fixes one by one today.
> > I will keep you posted with updates. Also, please, do let me know if you
> have any questions or concerns.
>
>
>
> FYI,  devbranch (pre-commit) on ci-cassandra.a.o is hardcoded to build on
> jdk8 and jdk11
> so it's not going to work with trunk based patches…
>
> I'll review any patches to fix this, but otherwise focus is
> on CASSANDRA-18594 where a standalone jenkinsfile avoids this problem
> altogether.
>
>


Re: August 5.0 Freeze (with waivers…) and a 5.0-alpha1

2023-07-27 Thread Ekaterina Dimitrova
+1 on freeze 1st August with the suggested waiver.

Also, +1 on the idea for alpha 1, as long as we set the right expectations
with users up front.

Now that we have also added JDK17 to CI, I want to encourage people to start
testing it in their test environments and give us some feedback.



On Wed, 26 Jul 2023 at 18:39, J. D. Jordan 
wrote:

> I think this plan seems reasonable to me. +1
>
> -Jeremiah
>
> On Jul 26, 2023, at 5:28 PM, Mick Semb Wever  wrote:
>
> 
>
>
> The previous thread¹ on when to freeze 5.0 landed on freezing the first
> week of August, with a waiver in place for TCM and Accord to land later
> (but before October).
>
> With JDK8 now dropped and SAI and UCS merged, the only expected 5.0 work
> that hasn't landed is Vector search (CEP-30).
>
> Are there any objections to a waiver on Vector search?  All the
> groundwork: SAI and the vector type; has been merged, with all remaining
> work expected to land in August.
>
> I'm keen to freeze and see us shift gears – there's already SO MUCH in 5.0
> and a long list of flakies.  It takes time and patience to triage and
> identify the bugs that hit us before GA.  The freeze is about being "mostly
> feature complete",  so we have room for things before our first beta
> (precedence is to ask).   If we hope for a GA by December, account for the
> 6 weeks turnaround time for cutting and voting on one alpha, one beta, and
> one rc release, and the quiet period that August is, we really only have
> September and October left.
>
> I already feel this is asking a bit of a miracle from us given how 4.1
> went (and I'm hoping I will be proven wrong).
>
> In addition, are there any objections to cutting a 5.0-alpha1 release as
> soon as we freeze?
>
> This is on the understanding vector, tcm and accord will become available
> in later alphas.  Originally the discussion¹ was waiting for Accord for
> alpha1, but a number of folk off-list have requested earlier alphas to help
> with testing.
>
>
> ¹) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3
>
>


Re: Status Update on CEP-7 Storage Attached Indexes (SAI)

2023-07-26 Thread Ekaterina Dimitrova
Thanks Caleb!
Great  job everyone! 

On Wed, 26 Jul 2023 at 15:07, J. D. Jordan 
wrote:

> Thanks for all the work here!
>
> On Jul 26, 2023, at 1:57 PM, Caleb Rackliffe 
> wrote:
>
> 
>
> Alright, the cep-7-sai branch is now merged to trunk!
>
> Now we move to addressing the most urgent items from "Phase 2" (
> CASSANDRA-18473 )
> before (and in the case of some testing after) the 5.0 freeze...
>
> On Wed, Jul 26, 2023 at 6:07 AM Jeremy Hanna 
> wrote:
>
>> Thanks Caleb and Mike and Zhao and Andres and Piotr and everyone else
>> involved with the SAI implementation!
>>
>> On Jul 25, 2023, at 3:01 PM, Caleb Rackliffe 
>> wrote:
>>
>> 
>> Just a quick update...
>>
>> With CASSANDRA-18670
>>  complete, and
>> all remaining items in the category of performance optimizations and
>> further testing, the process of merging to trunk will likely start today,
>> beginning with a final rebase on the current trunk and J11 and J17 test
>> runs.
>>
>> On Tue, Jul 18, 2023 at 3:47 PM Caleb Rackliffe 
>> wrote:
>>
>>> Hello there!
>>>
>>> After much toil, the first phase of CEP-7 is nearing completion (see
>>> CASSANDRA-16052 ).
>>> There are presently two issues to resolve before we'd like to merge the
>>> cep-7-sai feature branch and all its goodness to trunk:
>>>
>>> CASSANDRA-18670 
>>> - Importer should build SSTable indexes successfully before making new
>>> SSTables readable (in review)
>>>
>>> CASSANDRA-18673 
>>> - Reduce size of per-SSTable index components (in progress)
>>>
>>> (We've been getting clean CircleCI runs for a while now, and have been
>>> using the multiplexer to sniff out as much flakiness as possible up front.)
>>>
>>> Once merged to trunk, the next steps are:
>>>
>>> 1.) Finish a Harry model that we can use to further fuzz test SAI before
>>> 5.0 releases (see CASSANDRA-18275
>>> ). We've done a
>>> fair amount of fuzz/randomized testing at the component level, but I'd
>>> still consider Harry (at least around single-partition query use-cases) a
>>> critical item for us to have confidence before release.
>>>
>>> 2.) Start pursuing Phase 2 items as time and our needs allow. (see
>>> CASSANDRA-18473 )
>>>
>>> A reminder, SAI is a secondary index, and therefore is by definition an
>>> opt-in feature, and has no explicit "feature flag". However, its
>>> availability to users is still subject to the secondary_indexes_enabled
>>> guardrail, which currently defaults to allowing creation.
>>>
>>> Any thoughts, questions, or comments on the pre-merge plan here?
>>>
>>


Re: Java distributed upgrade tests fail and issue with CircleCI generate.sh script

2023-07-25 Thread Ekaterina Dimitrova
Upgrade tests fixed, please rebase your branches on latest trunk


On Tue, 25 Jul 2023 at 13:03, Ekaterina Dimitrova 
wrote:

> 1 of 2 is resolved.
> The CircleCI script is fixed, you might want to rebase trunk.
>
> On Tue, 25 Jul 2023 at 12:05, Ekaterina Dimitrova 
> wrote:
>
>> Hi all,
>> All java distributed upgrade tests fail at the moment.
>> Testing a potential fix in CASSANDRA-18690.
>> Apologize for the issue.
>> After this issue is fixed you should not be seeing any other failures
>> than the SSLFactoryTest as pointed last night.
>>
>> Also, I added a small fix for generate.sh script which was silently
>> ignoring the -e option and not detecting new/modified tests. It is also
>> under testing in CASSANDRA-18255. Anyone is welcome to test it
>>
>> Apologies for the inconvenience caused. Both should be resolved today.
>>
>> Best regards,
>> Ekaterina
>>
>


Re: Java distributed upgrade tests fail and issue with CircleCI generate.sh script

2023-07-25 Thread Ekaterina Dimitrova
1 of 2 is resolved.
The CircleCI script is fixed; you might want to rebase on trunk.

On Tue, 25 Jul 2023 at 12:05, Ekaterina Dimitrova 
wrote:

> Hi all,
> All java distributed upgrade tests fail at the moment.
> Testing a potential fix in CASSANDRA-18690.
> Apologize for the issue.
> After this issue is fixed you should not be seeing any other failures than
> the SSLFactoryTest as pointed last night.
>
> Also, I added a small fix for generate.sh script which was silently
> ignoring the -e option and not detecting new/modified tests. It is also
> under testing in CASSANDRA-18255. Anyone is welcome to test it
>
> Apologies for the inconvenience caused. Both should be resolved today.
>
> Best regards,
> Ekaterina
>


Java distributed upgrade tests fail and issue with CircleCI generate.sh script

2023-07-25 Thread Ekaterina Dimitrova
Hi all,
All java distributed upgrade tests fail at the moment.
Testing a potential fix in CASSANDRA-18690.
Apologies for the issue.
After this issue is fixed, you should not see any failures other than
SSLFactoryTest, as pointed out last night.

Also, I added a small fix for the generate.sh script, which was silently
ignoring the -e option and not detecting new/modified tests. It is also
being tested in CASSANDRA-18255. Anyone is welcome to test it.

Apologies for the inconvenience caused. Both should be resolved today.

Best regards,
Ekaterina


Re: [ANNOUNCEMENT] Expect failures today. Dropping JDK 8 and adding JDK 11

2023-07-24 Thread Ekaterina Dimitrova
Good news!
After run #1638-39 you should not see anything failing other than the
SSLFactoryTest class. This known issue will potentially be fixed by adding
BouncyCastle. More info in CASSANDRA-17992 and this Netty issue:
https://github.com/netty/netty/issues/10317
We could mark the test class with @Ignore, but knowing how easily those are
forgotten, and with 17992 already in review, I prefer not to do it.

The only new failure I found in #1636 is a rare flaky test we never saw in
CircleCI before (unit tests were running only there; they were not enabled
in Jenkins until we cleaned them up). A ticket is already open: CASSANDRA-18685
<https://issues.apache.org/jira/browse/CASSANDRA-18685>

Last but not least, eclipse-warnings is already removed (it doesn't work
with post JDK8 versions), but the new static analysis from Checker
Framework is already in review and soon to land in trunk - CASSANDRA-18239

As usual - if you have any questions or concerns, please do let me know.
Last but not least - thank you to everyone who helped in one way or another
with this effort!!

On Mon, 24 Jul 2023 at 16:37, Ekaterina Dimitrova 
wrote:

> Ninja fix was required for Jenkins, new build started #1636
>
> On Mon, 24 Jul 2023 at 15:42, Ekaterina Dimitrova 
> wrote:
>
>> Done!
>>
>> All commits from 18255 are in.
>> The first run to monitor will be in Jenkins #1635
>>
>> There will be still fixes to be applied for some unit and in-jvm tests
>> that were pending on the drop but I will do it when I see Jenkins kicking
>> in this run properly.  (Which are those can be seen in CASSANDRA-16895,
>> there is a table in its description)
>>
>> I will keep you posted on any new developments.
>>
>>
>> On Mon, 24 Jul 2023 at 14:52, Ekaterina Dimitrova 
>> wrote:
>>
>>> Starting commits for 18255. Please put on hold any trunk commits. I will
>>> let you know when it is done. Thank you
>>>
>>> On Mon, 24 Jul 2023 at 11:29, Ekaterina Dimitrova 
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Happy Monday!
>>>>
>>>> I am working on dropping JDK 8 and adding JDK17 on trunk in both CI
>>>> systems today.
>>>> This requires numerous patches in a few repos so you will be seeing
>>>> more failures in CI throughout the day today, but it shouldn’t be anything
>>>> more than what we have listed in the table of failures in
>>>> CASSANDRA-16895’s description. I will be applying the fixes one by one
>>>> today.
>>>> I will keep you posted with updates. Also, please, do let me know if
>>>> you have any questions or concerns.
>>>>
>>>> Best regards,
>>>> Ekaterina
>>>>
>>>>
>>>>


Re: [ANNOUNCEMENT] Expect failures today. Dropping JDK 8 and adding JDK 11

2023-07-24 Thread Ekaterina Dimitrova
Ninja fix was required for Jenkins, new build started #1636

On Mon, 24 Jul 2023 at 15:42, Ekaterina Dimitrova 
wrote:

> Done!
>
> All commits from 18255 are in.
> The first run to monitor will be in Jenkins #1635
>
> There will be still fixes to be applied for some unit and in-jvm tests
> that were pending on the drop but I will do it when I see Jenkins kicking
> in this run properly.  (Which are those can be seen in CASSANDRA-16895,
> there is a table in its description)
>
> I will keep you posted on any new developments.
>
>
> On Mon, 24 Jul 2023 at 14:52, Ekaterina Dimitrova 
> wrote:
>
>> Starting commits for 18255. Please put on hold any trunk commits. I will
>> let you know when it is done. Thank you
>>
>> On Mon, 24 Jul 2023 at 11:29, Ekaterina Dimitrova 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Happy Monday!
>>>
>>> I am working on dropping JDK 8 and adding JDK17 on trunk in both CI
>>> systems today.
>>> This requires numerous patches in a few repos so you will be seeing more
>>> failures in CI throughout the day today, but it shouldn’t be anything more
>>> than what we have listed in the table of failures in CASSANDRA-16895’s
>>> description. I will be applying the fixes one by one today.
>>> I will keep you posted with updates. Also, please, do let me know if you
>>> have any questions or concerns.
>>>
>>> Best regards,
>>> Ekaterina
>>>
>>>
>>>


Re: [ANNOUNCEMENT] Expect failures today. Dropping JDK 8 and adding JDK 11

2023-07-24 Thread Ekaterina Dimitrova
Done!

All commits from 18255 are in.
The first run to monitor will be in Jenkins #1635

There are still fixes to be applied for some unit and in-jvm tests that
were pending on the drop, but I will apply them once I see Jenkins kick
off this run properly. (The affected tests are listed in the table in the
description of CASSANDRA-16895.)

I will keep you posted on any new developments.


On Mon, 24 Jul 2023 at 14:52, Ekaterina Dimitrova 
wrote:

> Starting commits for 18255. Please put on hold any trunk commits. I will
> let you know when it is done. Thank you
>
> On Mon, 24 Jul 2023 at 11:29, Ekaterina Dimitrova 
> wrote:
>
>> Hi everyone,
>>
>> Happy Monday!
>>
>> I am working on dropping JDK 8 and adding JDK17 on trunk in both CI
>> systems today.
>> This requires numerous patches in a few repos so you will be seeing more
>> failures in CI throughout the day today, but it shouldn’t be anything more
>> than what we have listed in the table of failures in CASSANDRA-16895’s
>> description. I will be applying the fixes one by one today.
>> I will keep you posted with updates. Also, please, do let me know if you
>> have any questions or concerns.
>>
>> Best regards,
>> Ekaterina
>>
>>
>>


Re: [ANNOUNCEMENT] Expect failures today. Dropping JDK 8 and adding JDK 11

2023-07-24 Thread Ekaterina Dimitrova
Starting commits for 18255. Please put on hold any trunk commits. I will
let you know when it is done. Thank you

On Mon, 24 Jul 2023 at 11:29, Ekaterina Dimitrova 
wrote:

> Hi everyone,
>
> Happy Monday!
>
> I am working on dropping JDK 8 and adding JDK17 on trunk in both CI
> systems today.
> This requires numerous patches in a few repos so you will be seeing more
> failures in CI throughout the day today, but it shouldn’t be anything more
> than what we have listed in the table of failures in CASSANDRA-16895’s
> description. I will be applying the fixes one by one today.
> I will keep you posted with updates. Also, please, do let me know if you
> have any questions or concerns.
>
> Best regards,
> Ekaterina
>
>
>


[ANNOUNCEMENT] Expect failures today. Dropping JDK 8 and adding JDK 11

2023-07-24 Thread Ekaterina Dimitrova
Hi everyone,

Happy Monday!

I am working on dropping JDK 8 and adding JDK17 on trunk in both CI systems
today.
This requires numerous patches in a few repos so you will be seeing more
failures in CI throughout the day today, but it shouldn’t be anything more
than what we have listed in the table of failures in CASSANDRA-16895’s
description. I will be applying the fixes one by one today.
I will keep you posted with updates. Also, please, do let me know if you
have any questions or concerns.

Best regards,
Ekaterina


Re: [Discuss] CQLSH confusion

2023-07-13 Thread Ekaterina Dimitrova
“ keep the C*
version somewhere in cqlsh and warn when it doesn't match the server.”

+1 on this suggestion. I had a horrible experience recently with things
changing their versioning in another project. It brings mostly confusion.
Warning and adding C* version makes it simple and obvious. No need to dig
into docs, tickets, PRs
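
To make the suggestion concrete, here is a minimal sketch of such a check,
assuming cqlsh carries the C* version it was shipped with and can read the
server's version (for example from the release_version column of
system.local). The function names and the warning text are hypothetical
and are not taken from the cqlsh code base.

```python
import warnings


def parse_major_minor(version):
    """Return (major, minor) from a string such as '4.1.3' or '5.0-alpha1'."""
    numeric = version.split("-")[0]      # drop suffixes such as '-alpha1'
    parts = numeric.split(".")
    return int(parts[0]), int(parts[1])


def check_server_version(bundled, server):
    """Warn and return False when the bundled C* version differs from the server's.

    Only major.minor is compared; this granularity is an assumption of the sketch.
    """
    if parse_major_minor(bundled) == parse_major_minor(server):
        return True
    warnings.warn(
        "cqlsh was shipped with Cassandra %s but the server reports %s; "
        "some features may not work as expected." % (bundled, server)
    )
    return False
```

Comparing only major.minor keeps patch releases quiet; whether that is the
right granularity is a separate question for the ticket.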

On Thu, 13 Jul 2023 at 13:26, Brandon Williams  wrote:

> You forgot to link references,
> https://issues.apache.org/jira/browse/CASSANDRA-18666 has it all
> though I think
>
> I think it's too late to align versions, that cat is out of the bag.
> What we can do though is what I last suggested there: keep the C*
> version somewhere in cqlsh and warn when it doesn't match the server.
>
> Kind Regards,
> Brandon
>
> On Thu, Jul 13, 2023 at 12:14 PM German Eichberger via dev
>  wrote:
> >
> > All,
> >
> > I am working with clusters with different Cassandra versions and have
> been using some cqlsh which "just worked". Recently I wanted to use virtual
> tables and ran into [1]. After that I filed [2].
> >
> > Brandon states that "do not use a cqlsh that is from a different version
> than what is distributed with the server" since I have no idea what other
> incompatibilities like this there are, compatibility of that kind has never
> been a goal."
> >
> > I would like to open the discussion if this is what we want: cqlsh needs
> to be in lockstep with the C* version.
> >
> > Assuming, this is how things should be, I would propose to change the
> cqlsh versioning to be in line with the C* versioning. Right now I am using
> cqlsh 6.0.1 and I have no idea to which C* version that translates to.
> Aligning versions would make this much easier.
> >
> > Thanks,
> > German
>


Re: Changing the output of tooling between majors

2023-07-13 Thread Ekaterina Dimitrova
On German’s point: to be honest, pre-4.1 we tried to put any new changes
behind a flag after concerns about breaking changes were raised. In that
sense, I think it would be confusing to flip that now, making the new
output the default and moving the old one to legacy.
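
For illustration only, a minimal sketch of what a flag-guarded output
format could look like. The tool name, the --legacy flag and the field
labels are taken from the examples in the quoted messages below; the code
itself is hypothetical and is not how sstablemetadata or nodetool are
implemented.

```python
import argparse

# Hypothetical mapping: internal key -> (legacy label, new label).
LABELS = {
    "total_columns_set": ("totalColumnsSet", "Total columns set"),
    "total_rows": ("totalRows", "Total rows"),
}


def render(stats, legacy):
    """Format the stats using either the legacy or the new labels."""
    lines = []
    for key, (old_label, new_label) in LABELS.items():
        label = old_label if legacy else new_label
        lines.append("%s: %s" % (label, stats[key]))
    return "\n".join(lines)


def main(argv=None):
    parser = argparse.ArgumentParser(prog="sstablemetadata-sketch")
    parser.add_argument("--legacy", action="store_true",
                        help="print the old field names for existing scripts")
    args = parser.parse_args(argv)
    # Sample values copied from the output quoted in this thread.
    return render({"total_columns_set": 0, "total_rows": 1}, args.legacy)
```

Which format sits behind the flag is a one-line change in render; the
open question in this thread is only which of the two should be the
default.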

On Thu, 13 Jul 2023 at 12:09, German Eichberger via dev <
dev@cassandra.apache.org> wrote:

> Let's take this discussion in a different direction: If we add a --legacy
> argument where we are supporting an old version for those who
> need/want it but have the (breaking) changes on the default this feels like
> a compromise - and then we can deprecate the legacy format without
> impacting innovation. We can also flip this with requiring a flag for the
> changed format if we feel this is better.
>
> This lets us innovate without breaking anyone. Thoughts?
>
> Thanks,
> German
>
> --
> *From:* Miklosovic, Stefan 
> *Sent:* Thursday, July 13, 2023 8:20 AM
> *To:* dev@cassandra.apache.org 
> *Subject:* [EXTERNAL] Re: Changing the output of tooling between majors
>
> "Dinesh's message cautions against making "breaking" changes that are
> likely to break parsing of output by current users (e.g., changes to
> naming/meaning/"
>
> That is 100% correct. So by that logic, changing the output which you grep
> on to something else will break your scripts if you expect it there.
>
> For example, take sstablemetadata command - I know it is not nodetool but
> it does not matter. This is just an example. Same "problem" can be found in
> nodetool probably, sstablemetadata just came to my mind first as that is
> what I hit recently.
>
> sstablemetadata writes this:
>
> Repaired at: 0
> Originating host id: d2d12c56-7d9c-49a7-aaef-05bd2633b09e
> Pending repair: --
> Replay positions covered: {CommitLogPosition(segmentId=1689261027905,
> position=59450)=CommitLogPosition(segmentId=1689261027905, position=60508)}
> totalColumnsSet: 0
> totalRows: 1
> Estimated tombstone drop times:
>
>
> Do you see "totalColumnsSet" and "totalRows" when all other keys in that
> output (in the whole command) follow a different format? In this case, they
> should be "Total columns set" and "Total rows".
>
> So when we change it to that, anybody who is grepping for "totalRows" will
> get no output. That is a breaking change to me: their script stops working.
>
> You are correct and I agree with you completely that STRICT ADDITIONS
> (what I was suggesting) are fine because we are not breaking anything to
> anybody.
>
> So here, if I want to change this then, by what Dinesh says (we change the
> naming and we break it), I need to offer a JSON / YAML alternative to what
> sstablemetadata currently prints (it might as well be nodetool; this is
> just an example).
>
> 
> From: C. Scott Andreas 
> Sent: Thursday, July 13, 2023 17:01
> To: dev@cassandra.apache.org
> Cc: dev@cassandra.apache.org
> Subject: Re: Changing the output of tooling between majors
>
>
>
>
> Dinesh's message cautions against making "breaking" changes that are
> likely to break parsing of output by current users (e.g., changes to
> naming/meaning/position of existing fields vs. adding new ones). I don't
> read his message as saying that any change to nodetool output is
> conditional on offering a JSON/YAML representation, though.
>
> What are some changes that you'd like to make?
>
> – Scott
>
> On Jul 13, 2023, at 7:44 AM, "Miklosovic, Stefan" <
> stefan.mikloso...@netapp.com> wrote:
>
>
> For example Dinesh said this:
>
> "Until nodetool can support JSON as output format for all interaction and
> there is a significant adoption in the user community, I would strongly
> advise against making breaking changes to the CLI output."
>
> That is where I get the idea that we need JSON output in order to fix a
> typo. That is, if we treat fixing a typo as a breaking change, which I
> would say it is: if somebody is grepping for it and it is not there, it
> will break.
>
> Do you understand that the same way or am I interpreting that wrong?
>
> 
> From: C. Scott Andreas 
> Sent: Thursday, July 13, 2023 16:35
> To: dev@cassandra.apache.org
> Cc: dev
> Subject: Re: Changing the output of tooling between majors
>
>
>
>
> "From what I see you guys want to condition any change by offering
> json/yaml as well."
>
> I don't think I've seen a proposal to block changes to nodetool output on
> machine-parseable formats in this thread.
>
> Additions of new delimited fields to nodetool output are mostly
> straightforward. Changes to fields that exist today are likely to cause
> problems - as Josh mentions. These seem 

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Ekaterina Dimitrova
Revert for trunk patches only, right?
I’d say we need to completely stabilize the environment, with no noise,
before we go in that direction.

On Wed, 12 Jul 2023 at 8:55, Jacek Lewandowski 
wrote:

> Would it be re-opening the ticket or creating a new ticket with "revert of
> fix" ?
>
>
>
> On Wed, 12 Jul 2023 at 14:51, Ekaterina Dimitrova
> wrote:
>
>> jenkins_jira_integration
>> <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
>>  script
>> updating the JIRA ticket with test results if you cause a regression + us
>> building a muscle around reverting your commit if they break tests.“
>>
>> I am not sure people finding the time to fix their breakages will be
>> solved but at least they will be pinged automatically. Hopefully many
>> follow Jira updates.
>>
>> “  I don't take the past as strongly indicative of the future here since
>> we've been allowing circle to validate pre-commit and haven't been
>> multiplexing.”
>> I am interested to compare how many tickets for flaky tests we will have
>> pre-5.0 now compared to pre-4.1.
>>
>>
>> On Wed, 12 Jul 2023 at 8:41, Josh McKenzie  wrote:
>>
>>> (This response ended up being a bit longer than intended; sorry about
>>> that)
>>>
>>> What is more common though is packaging errors,
>>> cdc/compression/system_ks_directory targeted fixes, CI w/wo
>>> upgrade tests, being less responsive post-commit as you already
>>> moved on
>>>
>>> *Two that **should **be resolved in the new regime:*
>>> * Packaging errors should be caught pre as we're making the artifact
>>> builds part of pre-commit.
>>> * I'm hoping to merge the commit log segment allocation so CDC allocator
>>> is the only one for 5.0 (and just bypasses the cdc-related work on
>>> allocation if it's disabled thus not impacting perf); the existing targeted
>>> testing of cdc specific functionality should be sufficient to confirm its
>>> correctness as it doesn't vary from the primary allocation path when it
>>> comes to mutation space in the buffer
>>> * Upgrade tests are going to be part of the pre-commit suite
>>>
>>> *Outstanding issues:*
>>> * compression. If we just run with defaults we won't test all cases so
>>> errors could pop up here
>>> * system_ks_directory related things: is this still ongoing or did we
>>> have a transient burst of these types of issues? And would we expect these
>>> to vary based on different JDK's, non-default configurations, etc?
>>> * Being less responsive post-commit: My only ideas here are a
>>> combination of the jenkins_jira_integration
>>> <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
>>> script updating the JIRA ticket with test results if you cause a regression
>>> + us building a muscle around reverting your commit if they break tests.
>>>
>>> To quote Jacek:
>>>
>>> why don't run dtests w/wo sstable compression x w/wo internode
>>> encryption x w/wo vnodes,
>>> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc.
>>> I think this is a matter of cost vs result.
>>>
>>>
>>> I think we've organically made these decisions and tradeoffs in the past
>>> without being methodical about it. If we can:
>>> 1. Multiplex changed or new tests
>>> 2. Tighten the feedback loop of "tests were green, now they're
>>> *consistently* not, you're the only one who changed something", and
>>> 3. Instill a culture of "if you can't fix it immediately revert your
>>> commit"
>>>
>>> Then I think we'll only be vulnerable to flaky failures introduced
>>> across different non-default configurations as side effects in tests that
>>> aren't touched, which *intuitively* feels like a lot less than we're
>>> facing today. We could even get clever as a day 2 effort and define
>>> packages in the primary codebase where changes take place and multiplex (on
>>> a smaller scale) their respective packages of unit tests in the future if
>>> we see problems in this area.
>>>
>>> Flakey tests are a giant pain in the ass and a huge drain on
>>> productivity, don't get me wrong. *And* we have to balance how much
>>> cost we're paying before each commit with the benefit we expect to gain
>>> from that.
>>>
>>> Does the above make sense? Are ther

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Ekaterina Dimitrova
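jenkins_jira_integration
<https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
script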
updating the JIRA ticket with test results if you cause a regression + us
building a muscle around reverting your commit if they break tests.“

I am not sure this will solve the problem of people finding the time to fix
their breakages, but at least they will be pinged automatically. Hopefully
many follow Jira updates.

“  I don't take the past as strongly indicative of the future here since
we've been allowing circle to validate pre-commit and haven't been
multiplexing.”
I am interested to compare how many tickets for flaky tests we will have
pre-5.0 now compared to pre-4.1.
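
As a rough way to reason about what multiplexing buys us, here is a small
sketch. It assumes a flaky test fails independently with a fixed
probability on every run, which real flaky tests may not do; the function
name and the sample failure rate are made up for illustration.

```python
def detection_probability(p_fail, runs):
    """Chance of seeing at least one failure in `runs` independent runs.

    Each run passes with probability (1 - p_fail), so all runs pass with
    probability (1 - p_fail) ** runs; the complement is the detection chance.
    """
    return 1.0 - (1.0 - p_fail) ** runs


if __name__ == "__main__":
    # Compare the repetition counts mentioned in this thread for a
    # hypothetical test that fails once in a hundred runs.
    for runs in (1, 100, 500):
        print(runs, round(detection_probability(0.01, runs), 4))
```

Under that assumption, more repetitions mostly help with the rarer
flakes; a test that fails often is caught after a handful of runs either
way.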


On Wed, 12 Jul 2023 at 8:41, Josh McKenzie  wrote:

> (This response ended up being a bit longer than intended; sorry about that)
>
> What is more common though is packaging errors,
> cdc/compression/system_ks_directory targeted fixes, CI w/wo
> upgrade tests, being less responsive post-commit as you already
> moved on
>
> *Two that **should **be resolved in the new regime:*
> * Packaging errors should be caught pre as we're making the artifact
> builds part of pre-commit.
> * I'm hoping to merge the commit log segment allocation so CDC allocator
> is the only one for 5.0 (and just bypasses the cdc-related work on
> allocation if it's disabled thus not impacting perf); the existing targeted
> testing of cdc specific functionality should be sufficient to confirm its
> correctness as it doesn't vary from the primary allocation path when it
> comes to mutation space in the buffer
> * Upgrade tests are going to be part of the pre-commit suite
>
> *Outstanding issues:*
> * compression. If we just run with defaults we won't test all cases so
> errors could pop up here
> * system_ks_directory related things: is this still ongoing or did we have
> a transient burst of these types of issues? And would we expect these to
> vary based on different JDK's, non-default configurations, etc?
> * Being less responsive post-commit: My only ideas here are a combination
> of the jenkins_jira_integration
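> <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>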
> script updating the JIRA ticket with test results if you cause a regression
> + us building a muscle around reverting your commit if they break tests.
>
> To quote Jacek:
>
> why don't run dtests w/wo sstable compression x w/wo internode encryption
> x w/wo vnodes,
> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I
> think this is a matter of cost vs result.
>
>
> I think we've organically made these decisions and tradeoffs in the past
> without being methodical about it. If we can:
> 1. Multiplex changed or new tests
> 2. Tighten the feedback loop of "tests were green, now they're
> *consistently* not, you're the only one who changed something", and
> 3. Instill a culture of "if you can't fix it immediately revert your
> commit"
>
> Then I think we'll only be vulnerable to flaky failures introduced across
> different non-default configurations as side effects in tests that aren't
> touched, which *intuitively* feels like a lot less than we're facing
> today. We could even get clever as a day 2 effort and define packages in
> the primary codebase where changes take place and multiplex (on a smaller
> scale) their respective packages of unit tests in the future if we see
> problems in this area.
>
> Flakey tests are a giant pain in the ass and a huge drain on productivity,
> don't get me wrong. *And* we have to balance how much cost we're paying
> before each commit with the benefit we expect to gain from that.
>
> Does the above make sense? Are there things you've seen in the trenches
> that challenge or invalidate any of those perspectives?
>
> On Wed, Jul 12, 2023, at 7:28 AM, Jacek Lewandowski wrote:
>
> Isn't novnodes a special case of vnodes with n=1 ?
>
> We should rather select a subset of tests for which it makes sense to run
> with different configurations.
>
> The set of configurations against which we run the tests currently is
> still only a subset of all possible cases.
> I could ask - why don't we run dtests w/wo sstable compression x w/wo
> internode encryption x w/wo vnodes,
> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I
> think this is a matter of cost vs result.
> This equation contains the likelihood of failure in configuration X, given
> there was no failure in the default
> configuration, the cost of running those tests, the time we delay merging,
> and the likelihood that we wait for
> the test results so long that our branch diverges and we have to rerun
> them or accept the fact that we merge
> code which was tested on an outdated base. Finally, there is the overall
> experience of new contributors - whether they
> want to participate in the future.
>
>
>
> On Wed, 12 Jul 2023 at 07:24, Berenguer Blasi
> wrote:
>
> On our 4.0 release I remember a 

Re: Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-12 Thread Ekaterina Dimitrova
“ On our 4.0 release I remember a number of such failures but not recently.
 ”

Based on all the 5.0 work I’d say we need as a minimum to build and start a
node with all JDK versions pre-commit.
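
To put a number on the matrix Jacek lists in the quoted message below,
here is a quick sketch that enumerates it. The axis names are my own
labels for the items in his list, and his "etc." means the real matrix
would be larger still.

```python
from itertools import product

# Axes as listed in the quoted message; the names are illustrative labels.
AXES = {
    "sstable_compression": [False, True],
    "internode_encryption": [False, True],
    "vnodes": [False, True],
    "offheap_buffers": [False, True],
    "jdk": ["j8", "j11", "j17"],
    "cdc": [False, True],
    "os": ["RedHat", "Debian", "SUSE"],
}


def all_configurations(axes):
    """Return one dict per combination of axis values (the full cross product)."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


configs = all_configurations(AXES)
print(len(configs), "configurations per test suite")
```

Building and starting a node on every JDK is a single axis of this
product, which is why it is so much cheaper than running the full suites
across all of them.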

On Wed, 12 Jul 2023 at 7:29, Jacek Lewandowski 
wrote:

> Isn't novnodes a special case of vnodes with n=1 ?
>
> We should rather select a subset of tests for which it makes sense to run
> with different configurations.
>
> The set of configurations against which we run the tests currently is
> still only a subset of all possible cases.
> I could ask - why don't we run dtests w/wo sstable compression x w/wo
> internode encryption x w/wo vnodes,
> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. I
> think this is a matter of cost vs result.
> This equation contains the likelihood of failure in configuration X, given
> there was no failure in the default
> configuration, the cost of running those tests, the time we delay merging,
> and the likelihood that we wait for
> the test results so long that our branch diverges and we have to rerun
> them or accept the fact that we merge
> code which was tested on an outdated base. Finally, there is the overall
> experience of new contributors - whether they
> want to participate in the future.
>
>
>
> On Wed, 12 Jul 2023 at 07:24, Berenguer Blasi
> wrote:
>
>> On our 4.0 release I remember a number of such failures but not recently.
>> What is more common though is packaging errors,
>> cdc/compression/system_ks_directory targeted fixes, CI w/wo upgrade tests,
>> being less responsive post-commit as you already moved on,... Either the
>> smoke pre-commit has approval steps for everything or we should give imo a
>> devBranch alike job to the dev pre-commit. I find it terribly useful. My
>> 2cts.
>> On 11/7/23 18:26, Josh McKenzie wrote:
>>
>> 2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: at
>> reviewer's discretion
>>
>> In general, maybe offering a dev the option of choosing either
>> "pre-commit smoke" or "post-commit full" at their discretion for any work
>> would be the right play.
>>
>> A follow-on thought: even with something as significant as Accord, TCM,
>> Trie data structures, etc - I'd be a bit surprised to see tests fail on
>> JDK17 that didn't on 11, or with vs. without vnodes, in ways that weren't
>> immediately clear the patch stumbled across something surprising and was
>> immediately trivially attributable if not fixable. *In theory* the
>> things we're talking about excluding from the pre-commit smoke test suite
>> are all things that are supposed to be identical across environments and
>> thus opaque / interchangeable by default (JDK version outside checking
>> build which we will, vnodes vs. non, etc).
>>
>> Has that not proven to be the case in your experience?
>>
>> On Tue, Jul 11, 2023, at 10:15 AM, Derek Chen-Becker wrote:
>>
>> A strong +1 to getting to a single CI system. CircleCI definitely has
>> some niceties and I understand why it's currently used, but right now we
>> get 2 CI systems for twice the price. +1 on the proposed subsets.
>>
>> Derek
>>
>> On Mon, Jul 10, 2023 at 9:37 AM Josh McKenzie 
>> wrote:
>>
>>
>> I'm personally not thinking about CircleCI at all; I'm envisioning a
>> world where all of us have 1 CI *software* system (i.e. reproducible on
>> any env) that we use for pre-commit validation, and then post-commit
>> happens on reference ASF hardware.
>>
>> So:
>> 1: Pre-commit subset of tests (suites + matrices + env) runs. On green,
>> merge.
>> 2: Post-commit tests (all suites, matrices, env) runs. If failure, link
>> back to the JIRA where the commit took place
>>
>> Circle would need to remain in lockstep with the requirements for point 1
>> here.
>>
>> On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:
>>
>> +1 to Josh which is exactly my line of thought as well. But that is only
>> valid if we have a solid Jenkins that will eventually run all test configs.
>> So I think I lost track a bit here. Are you proposing:
>>
>> 1- CircleCI: Run pre-commit a single (the most common/meaningful, TBD)
>> config of tests
>>
>> 2- Jenkins: Runs post-commit _all_ test configs and emails/notifies you
>> in case of problems?
>>
>> Or something different like having 1 also in Jenkins?
>> On 7/7/23 17:55, Andrés de la Peña wrote:
>>
>> I think 500 runs combining all configs could be reasonable, since it's
>> unlikely to have config-specific flaky tests. As in five configs with 100
>> repetitions each.
>>
>> On Fri, 7 Jul 2023 at 16:14, Josh McKenzie  wrote:
>>
>> Maybe. Kind of depends on how long we write our tests to run doesn't it?
>> :)
>>
>> But point taken. Any non-trivial test would start to be something of a
>> beast under this approach.
>>
>> On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
>>
>> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie 
>> wrote:
>> > 3. Multiplexed tests (changed, added) run against all JDK's and a
>> broader range of configs (no-vnode, vnode default, compression, 

Re: Proposed update to cassandra-stress to use Apache Commons CLI

2023-07-10 Thread Ekaterina Dimitrova
Hey Brad,

Thanks for raising the topic. I wanted to mention we are now on a very old
version of commons-cli (1.1 from 2007). So I would suggest we first update
it.
While there is activity in the commons-cli github repo (a lot of dependency
updates as far as I can tell from a quick look), the last version is 1.5
from 2021.

Best regards,
Ekaterina

On Mon, 10 Jul 2023 at 11:47, Brad  wrote:

> The Apache Commons CLI library provides an API for parsing command line
> options with the package org.apache.commons.cli and this is already used by
> a dozen existing Cassandra utilities including:
>
> SSTableMetadataViewer, StandaloneScrubber, StandaloneSplitter,
> SSTableExport, BulkLoader, and others.
>
>
> However, cassandra-stress is an outlier which uses its own custom classes
> to parse command line options with classes such as OptionsSimple.  In
> addition, the options syntax for username, password, and others are not
> aligned with the format used by CQLSH.
>
> This suggestion is to:
>
> a) Upgrade cassandra-stress to use Apache Commons CLI (no new dependencies
> are required as this library is already used by the project)
>
> b) Align the cassandra-stress CLI options with those in CQLSH,
>
> For example, using the new syntax like CQLSH:
>
>
> cassandra-stress -username foo -password bar
>
>
> and replacing the old syntax:
>
> cassandra-stress -mode username=foo and password=bar
>
>
> This will simplify and unify the code base, eliminate code and reduce the
> confusion between similar named classes such
> as org.apache.cassandra.stress.settings.{Option, OptionsMulti,
> OptionsSimple} and org.apache.commons.cli.{Option, OptionGroup, Options}
>
> If there are no significant objections, I can raise a Jira for this
> proposal.
>
> Regards,
>
> Brad Schoening
>


Re: Removal of CloudstackSnitch

2023-07-10 Thread Ekaterina Dimitrova
Hi Stefan,

I think we should follow our deprecation rules and deprecate it in 5.0,
potentially remove in 6.0. (Deprecate in one major, remove in the next
major)
Maybe the deprecation can come with a note on your findings for the users,
just in case someone somewhere uses it and did not follow the user mailing
list?

Thank you
Ekaterina

On Mon, 10 Jul 2023 at 9:47, Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> Hi list,
>
> I want to ask about the future of CloudstackSnitch.
>
> This snitch was added 9 years ago (1). I contacted the original author of
> that snitch, Pierre-Yves Ritschard, who is currently CEO of a company he
> coded that snitch for.
>
> In a nutshell, Pierre answered that he does not think this snitch is
> relevant anymore and the company is using a different way to fetch
> metadata from a node, rendering CloudstackSnitch, as is, irrelevant for
> them.
>
> I also wrote an email to the user ML (2) about two weeks ago, and nobody
> answered that they are using it.
>
> The current implementation is using this approach (3) but I think that it
> is already obsolete in the snitch because snitch is adding a path to parsed
> metadata service IP which is probably not there at all in the default
> implementation of Cloudstack data server.
>
> What also bothers me is that we, as a community, seem to not be able to
> test the functionality of this snitch as I do not know anybody with a
> Cloudstack deployment who would be able to test this reliably.
>
> For completeness, in (1), Brandon expressed his opinion that unless users
> come forward for this snitch, he thinks retiring it is the best option.
>
> For all cloud-based snitches, we refactored the code in 16555, and we are
> working on an improvement in 18438 which introduces a generic way of
> calling metadata services. It makes plugging in custom logic or reusing a
> default implementation of a cloud connector very easy, further making
> this snitch less relevant.
>
> This being said, should we:
>
> 1) remove it in 5.0
> 2) keep it there in 5.0 but mark it @Deprecated
> 3) keep it there
>
> Regards
>
> (1) https://issues.apache.org/jira/browse/CASSANDRA-7147
> (2) https://lists.apache.org/thread/k4woljlk23m2oylvrbnod6wocno2dlm3
> (3)
> https://docs.cloudstack.apache.org/en/latest/adminguide/virtual_machines/user-data.html#determining-the-virtual-router-address-without-dns


Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-07-06 Thread Ekaterina Dimitrova
Hi,

First of all, thank you for all the work!
I personally think that it should be ok to add a new column.

I will be very happy to see this landing in 5.0.
I am personally against porting this patch to 4.1. To be clear, I am sure
you did a great job and my response would be the same to every single
person - the configuration is quite wide-spread and the devil is in the
details. I do not see a good reason for an exception here other than
convenience. There is no feature flag for these changes either, right?
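
For readers skimming the thread, here is a toy model of the idea under
discussion: every setting carries a "mutable" marker and an optional
stateless validator, and an update is rejected unless both checks pass.
This is only a sketch in Python; the class names, setting names and
values are hypothetical and are not the @Mutable / @ValidatedBy
implementation from the PR.

```python
class Setting:
    """One row of a hypothetical settings table."""

    def __init__(self, value, mutable=False, validator=None):
        self.value = value
        self.mutable = mutable
        self.validator = validator


def require_positive_int(value):
    """Stateless validation: depends only on the new value itself."""
    if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
        raise ValueError("expected a positive integer, got %r" % (value,))


class SettingsTable:
    def __init__(self, settings):
        self._settings = dict(settings)

    def rows(self):
        """Return (name, value, mutable) tuples, like a table with a 'mutable' column."""
        return [(n, s.value, s.mutable) for n, s in sorted(self._settings.items())]

    def update(self, name, value):
        setting = self._settings[name]          # KeyError for unknown names
        if not setting.mutable:
            raise PermissionError("%s is not mutable at runtime" % name)
        if setting.validator is not None:
            setting.validator(value)            # raises before anything changes
        setting.value = value


# Made-up settings, used only to exercise the sketch.
table = SettingsTable({
    "example_timeout_ms": Setting(5000, mutable=True, validator=require_positive_int),
    "example_cluster_name": Setting("demo"),
})
```

Running the validator before the assignment means a rejected update
leaves the old value in place, which is the behaviour an operator would
expect from a failed UPDATE.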

Best regards,
Ekaterina

On Thursday, 6 July 2023, Miklosovic, Stefan
wrote:

> Hi Maxim,
>
> I went through the PR and added my comments. I think David also reviewed
> it. All points you mentioned make sense to me but I humbly think it is
> necessary to have at least one additional pair of eyes on this as the patch
> is relatively impactful.
>
> I would like to see an additional column in system_views.settings, named
> "mutable" and of type "boolean", to see which fields I am actually allowed
> to update as an operator.
>
> It seems to me you agree with the introduction of this column (1) but
> there is no clear agreement where we actually want to put it. You want this
> whole feature to be committed to 4.1 branch as well which is an interesting
> proposal. I was thinking that this work will go to 5.0 only. I am not
> completely sure it is necessary to backport this feature but your
> argumentation here (2) is worth discussing further.
>
> If we introduce this change to 4.1, that field would not be there but in
> 5.0 it would. So that way we will not introduce any new column to
> system_views.settings.
> We could also go with the introduction of this column to 4.1 if people are
> ok with that.
>
> For the simplicity, I am slightly leaning towards introducing this feature
> to 5.0 only.
>
> (1) https://github.com/apache/cassandra/pull/2334#discussion_r1251104171
> (2) https://github.com/apache/cassandra/pull/2334#discussion_r1251248041
>
> 
> From: Maxim Muzafarov 
> Sent: Friday, June 23, 2023 13:50
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] Allow UPDATE on settings virtual table to change
> running configuration
>
>
>
>
>
> Hello everyone,
>
>
> As there has been little feedback on which option to go with, or
> discussion of the pros and cons of each option, I tend to agree with
> the vision of this problem proposed by David :-) After a lot of
> discussion on Slack, we came to the @ValidatedBy annotation, which
> points to a validation method of a property; this will address all
> our concerns and issues with validation.
>
> I'd like to raise the visibility of these changes and try to find one
> more committer to look at them:
> https://issues.apache.org/jira/browse/CASSANDRA-15254
> https://github.com/apache/cassandra/pull/2334/files
>
> I'd really appreciate any kind of review in advance.
>
>
> Despite the number of changes +2,043 −302 and the fact that most of
> these additions are related to the tests themselves, I would like to
> highlight the crucial design points which are required to make the
> SettingsTable virtual table updatable. Some of these have already been
> discussed in this thread, and I would like to provide a brief outline
> of these points to facilitate the PR review.
>
> So, what are the problems that have been solved to make the
> SettingsTable updatable?
>
> 1. Input validation.
>
> Currently, the JMX, Yaml and DatabaseDescriptor#apply methods each
> validate user input for the same property in their own way, which
> usually, but not always, results in a consistent configuration state.
> CASSANDRA-17734 is a good example of this.
>
> The @ValidatedBy annotations, which point to a validation method have
> been added to address this particular problem. So, no matter what API
> is triggered the method will be called to validate input and will also
> work even if the cassandra.yaml is loaded by the yaml engine in a
> pre-parse state, such as we are now checking input properties for
> deprecation and nullability.
>
> There are two types of validation worth mentioning:
> - stateless - properties do not depend on any other configuration;
> - stateful - properties that require a fully-constructed Config
> instance to be validated and those values depend on other properties;
>
> For the sake of simplicity, the scope of this task will be limited to
> dealing with stateless properties only, but stateful validations are
> also supported in the initial PR using property change listeners.
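
A minimal Java sketch of the stateless validation idea described above, under stated assumptions: the annotation shape (a single `method` element naming a static validator), the class names `SettingsSketch` and `SketchConfig`, and the field `example_threshold` are all hypothetical and are not taken from the PR. It only illustrates how one entry point could run the same validator regardless of which API triggered the update.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class SettingsSketch
{
    // Single entry point that JMX, YAML loading, or a virtual table update
    // could all call, so the same validator runs for every API.
    static void update(Object config, String name, Object value)
    {
        try
        {
            Field field = config.getClass().getField(name);
            ValidatedBy validatedBy = field.getAnnotation(ValidatedBy.class);
            if (validatedBy != null)
            {
                Method validator = config.getClass().getMethod(validatedBy.method(), Object.class);
                validator.invoke(null, value); // static validator, no receiver
            }
            field.set(config, value); // only reached when validation passed
        }
        catch (InvocationTargetException e)
        {
            // Unwrap so callers see the validator's own exception.
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException)
                throw (RuntimeException) cause;
            throw new IllegalStateException(cause);
        }
        catch (ReflectiveOperationException e)
        {
            throw new IllegalStateException("cannot update " + name, e);
        }
    }

    public static void main(String[] args)
    {
        SketchConfig config = new SketchConfig();
        update(config, "example_threshold", 64);
        System.out.println(config.example_threshold);
    }
}

// Hypothetical annotation shape; the real one in the PR may differ.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ValidatedBy
{
    String method(); // name of a static validator declared on the config class
}

// Hypothetical stand-in for Config; the field name is illustrative only.
class SketchConfig
{
    @ValidatedBy(method = "validatePositive")
    public int example_threshold = 32;

    public static void validatePositive(Object value)
    {
        if (!(value instanceof Integer) || (Integer) value <= 0)
            throw new IllegalArgumentException("value must be a positive integer: " + value);
    }
}
```

Because the validator is looked up from the field itself, a rejected value never reaches `field.set`, so the configuration keeps its previous state.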
>
> 2. Property mutability.
>
> There is no way of distinguishing which parts of a property are
> mutable and which are not. This meta-information must be available at
> runtime and as we discussed earlier the @Mutable annotation is added
> to handle this.
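
A small Java sketch of how runtime mutability meta-information could be exposed with a marker annotation, as described above. The annotation shape, the class names `MutabilitySketch` and `MutabilityConfig`, and both field names are hypothetical; the actual @Mutable annotation in the PR may carry more information.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class MutabilitySketch
{
    // A property counts as mutable only if its field carries the marker.
    // Unknown property names are treated as immutable.
    static boolean isMutable(Class<?> configClass, String name)
    {
        try
        {
            return configClass.getField(name).isAnnotationPresent(Mutable.class);
        }
        catch (NoSuchFieldException e)
        {
            return false;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(isMutable(MutabilityConfig.class, "example_runtime_limit"));
        System.out.println(isMutable(MutabilityConfig.class, "example_startup_only"));
    }
}

// Hypothetical marker; RUNTIME retention makes it visible via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Mutable
{
}

// Hypothetical stand-in for Config; field names are illustrative only.
class MutabilityConfig
{
    @Mutable
    public volatile int example_runtime_limit = 10;

    public String example_startup_only = "fixed";
}
```

An update path could call `isMutable` first and reject writes to unmarked properties before any validation runs.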
>
> 3. Listening for 

Fwd: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-07-05 Thread Ekaterina Dimitrova
“I'm curious what it triggers for you Brandon, Berenguer, Andres,
Ekaterina, and Mick (when you're back from the mountains ;)). ”
We already have pre-commit a minimum set being mandatory in CircleCI.
People can manually trigger other tests if they feel they might have broken
something. The only tests that are a matter of config twist that is
mandatory to run now in the pre-commit CircleCI workflow are those that
were never added to Jenkins (for example, system keyspaces, and oa unit
tests). Probably the only combination that we might want to reconsider is
with/without vnodes?

I also wouldn’t advocate skipping JDK17 tests until we enable all test
suites post-commit. Reminder - those that still have failing tests we are
actively working on are disabled in Jenkins to reduce the noise until we
are ready to fully switch from 8+11 to 11+17. Probably also in the future,
when we work to introduce new JDK versions, we would again want to run
tests and see whether we regress the people who are dealing with all the
maintenance/problems in the background.

Another twist - I think Jenkins dev so far triggers all tests as
post-commit does, no? Probably that can change to mimic what we agreed on
for CircleCI. I am sure the devil will be again in the details, but just a
thing to consider.

“ If a failure makes it to post-commit, it's much more expensive to root
cause and figure out with much higher costs to the community's collective
productivity.”

Totally agree. And my hope is that the pre-commit spinning in loop tests
should help us deal with that to some extent. It is always easy for the
author to do a fix while their thoughts are still on the topic. It also
reduces the time people spend on bisecting and doing archeology later. On
Derek’s point about flakiness being also attributed to tests sometimes -
when we get bitten a few times pre-commit and have to improve our tests to
make them more deterministic, I believe we will learn a thing or two, and
those types of things will happen less in time.

Best regards,
Ekaterina

-- Forwarded message -
From: Josh McKenzie 
Date: Wed, 5 Jul 2023 at 8:25
Subject: Re: [DISCUSS] Formalizing requirements for pre-commit patches on
new CI
To: dev 


choose a consistent, representative subset of stable tests that we feel
give us a reasonable level of confidence in return for a reasonable amount
of runtime
...
Currently a dtest is being run in j8 w/wo vnodes, j8/j11 w/wo vnodes and
j11 w/wo vnodes. That is 6 times total. I wonder about that ROI.
...

test with the default number of vnodes, test with the default compression
settings, and test with the default heap/off-heap buffers.

If I take these at face value to be true (I happen to agree with them, so
I'm going to do this :)), what falls out for me:

   1. Pre-commit should be an intentional smoke-testing suite, much smaller
   relative to post-commit than it is today
   2. We should aggressively cull all low-signal pre-commit tests, suites,
   and configurations that aren't needed to keep post-commit stable

High signal in pre-commit (indicative; non-exhaustive):

   1. Only the most commonly used JDK (JDK11 atm?)
   2. Config defaults (vnodes, compression, heap/off-heap buffers, memtable
   format, sstable format)
   3. Most popular / general / run-of-the-mill linux distro (debian?)

Low signal in pre-commit (indicative; non-exhaustive):

   1. No vnodes
   2. JDK8; JDK17
   3. Non-default settings (Compression off. Fully mmap, no mmap. Trie
   memtables or sstables, cdc enabled)


So this shape of thinking - I'm curious what it triggers for you Brandon,
Berenguer, Andres, Ekaterina, and Mick (when you're back from the mountains
;)). You guys paid a lot of the debt in the run up to 4.1 so have the most
recent expertise and I trust your perspectives here.

If a failure makes it to post-commit, it's much more expensive to root
cause and figure out with much higher costs to the community's collective
productivity. That said, I think we can make a lot of progress along this
line of thinking.

On Wed, Jul 5, 2023, at 5:54 AM, Jacek Lewandowski wrote:

Perhaps pre-commit checks should include mostly the typical configuration
of Cassandra rather than some subset of possible combinations. Like it was
said somewhere above - test with the default number of vnodes, test with
the default compression settings, and test with the default heap/off-heap
buffers.

A longer-term goal could be to isolate what depends on particular
configuration options. Instead of blindly running everything with, say,
vnodes enabled and disabled, isolate those tests that need to be run with
those two configurations and run the rest with the default one.
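
The longer-term idea above can be sketched in Java as a small run planner. This is only an illustration under assumptions: the class `PreCommitMatrixSketch`, the config labels, and the test names are hypothetical, and how a test would be identified as configuration-sensitive is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PreCommitMatrixSketch
{
    // Every test runs once with the default config (vnodes); only tests known
    // to depend on the vnode setting get a second run without vnodes.
    static List<String> planRuns(List<String> tests, Set<String> needsBothVnodeModes)
    {
        List<String> runs = new ArrayList<>();
        for (String test : tests)
        {
            runs.add(test + " [vnodes]");
            if (needsBothVnodeModes.contains(test))
                runs.add(test + " [no-vnodes]");
        }
        return runs;
    }

    public static void main(String[] args)
    {
        // Hypothetical test names, for illustration only.
        List<String> tests = Arrays.asList("topology_test", "cql_test", "compaction_test");
        Set<String> sensitive = new HashSet<>(Arrays.asList("topology_test"));
        for (String run : planRuns(tests, sensitive))
            System.out.println(run);
    }
}
```

With three tests of which one is vnode-sensitive, this plans four runs instead of the six needed when everything runs in both modes.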

... the rule of multiplexing new or changed tests might go a long way to
mitigating that ...


I wonder if there is some commonality in the flaky tests reported so far,
like the presence of certain statements? Also, there could be a tool that
inspects coverage analysis reports and chooses the 

Re: JDK 17 Support Proposal for ant generate-idea-files in CASSANDRA-18467

2023-07-05 Thread Ekaterina Dimitrova
Dear all,

Thanks to everyone who shared feedback and tested the changes. The patch
was committed.

‘ant generate-idea-files’ now supports JDK8, JDK11, and JDK17.
To add support for another JDK, the java-jvmargs property must be set for
the JDK in question (see how it’s done in build.xml for Java 11 and 17).
Other minor but notable changes are:
- test jvmargs are now added to the idea run configuration
- the .idea dir and project iml file are first removed and then recreated
during ‘ant generate-idea-files’

Best regards,
Ekaterina

On Mon, 15 May 2023 at 5:18, Jakub Zytka  wrote:

> Dear Team,
>
> I am writing to discuss a proposal I've made in CASSANDRA-18467, which
> pertains to offering JDK 17 support for the ant generate-idea-files
> command. With this support in place, when you run ant generate-idea-files
> using Java 17, the resulting IntelliJ IDEA project will have the necessary
> settings to build and run tests with Java 17 within the IDE. This is
> similar to how the current Java 11 support works.
>
> In addition to this primary enhancement, I have also made two other minor,
> but significant updates:
>
>    1. The .idea subdirectory and the project IML files are now removed
>    when running ant generate-idea-files. However, please note that these
>    files are still unaffected when you run ant realclean.
>    2. The JUnit run configurations now include additional test JVM
>    arguments, referred to as _std-test-jvmargs in build.xml.
>
> Given the nature of these changes, I have conducted manual tests on the
> IntelliJ projects created. I validated these changes in two distinct
> environments:
>
>    1. Separate repositories for different JDKs, with no JDK switching
>    within a single repository.
>    2. A single repository with JDK switching.
>
> If your workflow differs from these scenarios, I kindly request you to
> test these changes and share your feedback. Alternatively, please inform me
> about your unique workflow, and I will conduct the necessary testing on my
> end.
>
> The updated code can be accessed at the following GitHub link:
> https://github.com/jakubzytka/cassandra/tree/CASSANDRA-18467
>
> You can also view the pull request here:
> https://github.com/apache/cassandra/pull/2320
>
> I look forward to hearing your thoughts on this proposal.
>
> Best regards, Kuba
>


Re: [DISCUSS] Formalizing requirements for pre-commit patches on new CI

2023-06-30 Thread Ekaterina Dimitrova
Thank you, Josh and Mick

Immediate questions on my mind:
- Currently we run at most two parallel CI runs in Jenkins-dev, I guess you
will try to improve that limitation?
- There are hw constraints, is there any approximation on how long it will
take to run all tests? Or is there a stated goal that we will strive to
reach as a project?
- Bringing scripts in-tree will make it easier to add a multiplexer, which
we miss at the moment; that’s great. (Running jobs in a loop helps a lot
with flaky tests.) It also makes it easier to add any new test suites.

On Fri, 30 Jun 2023 at 13:35, Derek Chen-Becker 
wrote:

> Thanks Josh, this looks great! I think the constraints you've outlined are
> reasonable for an initial attempt. We can always evolve if we run into
> issues.
>
> Cheers,
>
> Derek
>
> On Fri, Jun 30, 2023 at 11:19 AM Josh McKenzie 
> wrote:
>
>> Context: we're looking to get away from having split CircleCI and ASF CI
>> as well
>> as getting ASF CI to a stable state. There's a variety of reasons why
>> it's flaky
>> (orchestration, heterogeneous hardware, hardware failures, flaky tests,
>> non-deterministic runs, noisy neighbors, etc), many of which Mick has been
>> making great headway on starting to address.
>>
>> If you're curious see:
>> - Mick's 2023/01/09 email thread on CI:
>> https://lists.apache.org/thread/fqdvqkjmz6w8c864vw98ymvb1995lcy4
>> - Mick's 2023/04/26 email thread on CI:
>> https://lists.apache.org/thread/xb80v6r857dz5rlm5ckcn69xcl4shvbq
>> - CASSANDRA-18137: epic for "Repeatable ci-cassandra.a.o":
>> https://issues.apache.org/jira/browse/CASSANDRA-18137
>> - CASSANDRA-18133: In-tree build scripts:
>> https://issues.apache.org/jira/browse/CASSANDRA-18133
>>
>> What's fallen out from this: the new reference CI will have the following
>> logical layers:
>> 1. ant
>> 2. build/test scripts that setup the env. See run-tests.sh and
>> run-python-dtests.sh here:
>>
>> https://github.com/thelastpickle/cassandra/tree/0aecbd873ff4de5474fe15efac4cdde10b603c7b/.build
>> 3. dockerized build/test scripts that have containerized the flow of 1
>> and 2. See:
>>
>> https://github.com/thelastpickle/cassandra/tree/0aecbd873ff4de5474fe15efac4cdde10b603c7b/.build/docker
>> 4. CI integrations. See generation of unified test report in build.xml:
>>
>> https://github.com/thelastpickle/cassandra/blame/mck/18133/trunk/build.xml#L1794-L1817
>> 5. Optional full CI lifecycle w/Jenkins running in a container (full stack
>> setup, run, teardown, pending)
>>
>>
>> I want to let everyone know the high level structure of how this is
>> shaping up, as this is a change that will directly impact the work of
>> *all of us* on the project.
>>
>> In terms of our goals, the chief goals I'd like to call out in this
>> context are:
>> * ASF CI needs to be and remain consistent
>> * contributors need a turnkey way to validate their work before merging
>> that
>> they can accelerate by throwing resources at it.
>>
>> We as a project need to determine what is *required* to run in a CI
>> environment
>> to consider that run certified for merge. Where Mick and I landed
>> through a lot
>> of back and forth is that the following would be required:
>> 1. used ant / pytest to build and run tests
>> 2. used the reference scripts being changed in CASSANDRA-18133 (in-tree
>> .build/)
>> to setup and execute your test environment
>> 3. constrained your runtime environment to the same hardware and time
>> constraints we use in ASF CI, within reason (CPU count independent of
>> speed,
>> memory size and disk size independent of hardware specs, etc)
>> 4. reported test results in a unified fashion that has all the
>> information we
>> normally get from a test run
>> 5. (maybe) Parallelized the tests across the same split lines as upstream
>> ASF
>> (i.e. no weird env specific neighbor / scheduling flakes)
>>
>> Last but not least is the "What do we do with CircleCI?" angle. The
>> current
>> thought is we allow people to continue using it with the stated goal of
>> migrating the circle config over to using the unified build scripts as
>> well and
>> get it in compliance with the above requirements.
>>
>> For reference, here's a gdoc where we've hashed this out:
>>
>> https://docs.google.com/document/d/1TaYMvE5ryOYX03cxzY6XzuUS651fktVER02JHmZR5FU/edit?usp=sharing
>>
>> So my questions for the community here:
>> 1. What's missing from the above conceptualization of the problem?
>> 2. Are the constraints too strong? Too weak? Just right?
>>
>> Thanks everyone, and happy Friday. ;)
>>
>> ~Josh
>>
>
>
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> 

Re: [DISCUSS] When to run CheckStyle and other verificiations

2023-06-29 Thread Ekaterina Dimitrova
There is a separate thread started and respective ticket for
generate-idea-files.
https://lists.apache.org/thread/o2fdkyv2skvf9ngy9jhpnhvo92qvr17m
CASSANDRA-18467


On Thu, 29 Jun 2023 at 16:54, Jeremiah Jordan 
wrote:

> +100 I support making generate-idea-files auto setup everything in
> IntelliJ for you.  If you post a diff, I will test it.
>
> On this proposal, I don’t really have an opinion one way or the other
> about what the default is for local "ant jar"; if it's slow I will figure
> out how to turn it off, if it's fast I will leave it on.
> I do care that CI runs checks, and complains loudly if something is wrong
> such that it is very easy to tell during review.
>
> -Jeremiah
>
> On Jun 29, 2023 at 1:44:09 PM, Josh McKenzie  wrote:
>
>> In accord I added an opt-out for each hook, and will require such here as
>> well
>>
>> On for main branches, off for feature branches seems like it might
>> blanket satisfy this concern? Doesn't fix the "--atomic across 5 branches
>> means style checks and build on hook across those branches" which isn't
>> ideal. I don't think style check failures after push upstream are frequent
>> enough to make the cost/benefit there make sense overall are they?
>>
>> Related to this - I have sonarlint, spotbugs, and checkstyle all running
>> inside idea; since pulling those in and tuning the configs a bit I haven't
>> run into a single issue w/our checkstyle build target (go figure). Having
>> the required style checks reflected realtime inside your work environment
>> goes a long way towards making it a more intuitive part of your workflow
>> rather than being an annoying last minute block of your ability to progress
>> that requires circling back into the code.
>>
>> From a technical perspective, it looks like adding a reference
>> "externalDependencies.xml" to our ide/idea directory which we copied over
>> during "generate-idea-files" would be sufficient to get idea to pop up
>> prompts to install those extensions if you don't have them when opening the
>> project (theory; haven't tested).
>>
>> We'd need to make sure the configuration for each of those was calibrated
>> to our project out of the box of course, but making style considerations a
>> first-class citizen in that way seems a more intuitive and human-centered
>> approach to all this rather than debating nuance of our command-line
>> targets, hooks, and how we present things to people. To Berenguer's point -
>> better to have these be completely invisible to people with their workflows
>> and Just Work (except for when your IDE scolds you for bad behavior w/build
>> errors immediately).
>>
>> I still think Flags Are Bad. :)
>>
>> On Thu, Jun 29, 2023, at 1:38 PM, Ekaterina Dimitrova wrote:
>>
>> Should we just keep a single consolidated no-check flag for all kinds of
>> checks and get rid of the no-checkstyle one?
>>
>> Trading one for one with Josh :-)
>>
>> Best regards,
>> Ekaterina
>>
>> On Thu, 29 Jun 2023 at 10:52, Josh McKenzie  wrote:
>>
>>
>> I really prefer separate tasks over flags. Flags are not listed in the
>> help message like "ant -p" and are not auto-completed in the terminal. That
>> makes them almost undiscoverable for newcomers.
>>
>> Please, no more flags. We are *more* than flaggy enough right now.
>>
>> Having to dig through build.xml to determine how to change things or do
>> things is painful; the more we can avoid this (for oldtimers and newcomers
>> alike!) the better.
>>
>> On Thu, Jun 29, 2023, at 8:34 AM, Mick Semb Wever wrote:
>>
>>
>>
>> On Thu, 29 Jun 2023 at 13:30, Jacek Lewandowski <
>> lewandowski.ja...@gmail.com> wrote:
>>
>> There is another target called "build", which retrieves dependencies, and
>> then calls "build-project".
>>
>>
>>
>> Is it intended to be called by a user ?
>>
>> If not, please follow the ant style prefixing the target name with an
>> underscore (so that it does not appear in the `ant -projecthelp` list).
>>
>> If possible, I agree with Brandon, `build` is the better name to expose
>> to the user.
>>
>>
>>
>>


Re: [DISCUSS] When to run CheckStyle and other verificiations

2023-06-29 Thread Ekaterina Dimitrova
Should we just keep a single consolidated no-check flag for all kinds of
checks and get rid of the no-checkstyle one?

Trading one for one with Josh :-)

Best regards,
Ekaterina

On Thu, 29 Jun 2023 at 10:52, Josh McKenzie  wrote:

> I really prefer separate tasks over flags. Flags are not listed in the
> help message like "ant -p" and are not auto-completed in the terminal. That
> makes them almost undiscoverable for newcomers.
>
> Please, no more flags. We are *more* than flaggy enough right now.
>
> Having to dig through build.xml to determine how to change things or do
> things is painful; the more we can avoid this (for oldtimers and newcomers
> alike!) the better.
>
> On Thu, Jun 29, 2023, at 8:34 AM, Mick Semb Wever wrote:
>
>
>
> On Thu, 29 Jun 2023 at 13:30, Jacek Lewandowski <
> lewandowski.ja...@gmail.com> wrote:
>
> There is another target called "build", which retrieves dependencies, and
> then calls "build-project".
>
>
>
> Is it intended to be called by a user ?
>
> If not, please follow the ant style prefixing the target name with an
> underscore (so that it does not appear in the `ant -projecthelp` list).
>
> If possible, I agree with Brandon, `build` is the better name to expose to
> the user.
>
>
>


Re: [DISCUSS] When to run CheckStyle and other verificiations

2023-06-26 Thread Ekaterina Dimitrova
Thank you, Jacek, for starting the thread; those things are essential for
developer productivity.

I support the idea of opting out vs opting into checks. In my experience,
it also makes things easier and faster during review time.

If people have to opt-in - it is one more thing for new people to discover,
and it will probably happen only during review time if they do not have
access to Jenkins/paid CircleCI, etc.

I also support consolidating all types of checks/analyses and running them
together.

Maxim’s suggestion about rat replacement sounds like a good improvement
that can be explored (not part of what Jacek does here, though). Maxim, do
you mind creating a ticket, please?

Best regards,
Ekaterina

On Mon, 26 Jun 2023 at 17:04, Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> Yes, in this case, opting-out is better than opting-in. I feel like the
> build process is quite versatile and one just picks what is necessary. I
> never build docs, there is a flag for that. I turned off checkstyle because
> I was fed up with that until Berenguer cached it and now I get ant jar with
> checkstyle like under 10 seconds so I leave it on, which is great.
>
> Even though I feel like it is already flexible enough, grouping all
> checkstyles and rats etc under one target seems like a good idea. From my
> perspective, it is "all or nothing": turning it all off until I am about
> to push, at which point I want it all on, is a good idea. I rarely want to
> "just checkstyle" in the middle of development.
>
> I do not think that having a lot of flags is bad. I like that I have bash
> aliases almost for everything and I bet folks have their tricks to get the
> mundane stuff done.
>
> It would be pretty interesting to know the workflow of other people. I
> think there would be a lot of insights how other people have it on a daily
> basis when it comes to Cassandra development.
>
> 
> From: David Capwell 
> Sent: Monday, June 26, 2023 19:57
> To: dev
> Subject: Re: [DISCUSS] When to run CheckStyle and other verificiations
>
>
> not running it automatically with the targets which devs usually run
> locally.
>
> The checks tend to have an opt-out, such as -Dno-checkstyle=true… so it's
> really easy to set up your local environment to opt out of what you do not
> care about… I feel we should force people to opt-out rather than opt-in…
>
>
>
> On Jun 26, 2023, at 7:47 AM, Jacek Lewandowski <
> lewandowski.ja...@gmail.com> wrote:
>
> That would work as well Brandon, basically what is proposed in
> CASSANDRA-18618, that is "check" target, actually needs to build the
> project to perform some verifications - I suppose running "ant check"
> should be sufficient.
>
> - - -- --- -  -
> Jacek Lewandowski
>
>
> On Mon, 26 Jun 2023 at 16:01, Brandon Williams <dri...@gmail.com> wrote:
> The "artifacts" task is not quite the same since it builds other things
> like docs, which significantly contributes to longer build time.  I don't
> see why we couldn't add a new task that preserves the old behavior though,
> "fulljar" or something like that.
>
> Kind Regards,
> Brandon
>
>
> On Mon, Jun 26, 2023 at 6:12 AM Jacek Lewandowski <
> lewandowski.ja...@gmail.com> wrote:
> Yes, I've mentioned that there is a property we can set to skip checkstyle.
>
> Currently such a goal is "artifacts" which basically validates everything.
>
>
> - - -- --- -  -
> Jacek Lewandowski
>
>
> On Mon, 26 Jun 2023 at 13:09, Mike Adamson <madam...@datastax.com> wrote:
> While I like the idea of this because of added time these checks take, I
> was under the impression that checkstyle (at least) can be disabled with a
> flag.
>
> If we did do this, would it make sense to have a "release"  or "commit"
> target (or some other name) that ran a full build with all checks that can
> be used prior to pushing changes?
>
> On Mon, 26 Jun 2023 at 08:35, Berenguer Blasi wrote:
>
> I would prefer something that is totally transparent to me and does not add
> one more step I have to remember. Just to push/run CI to find out I missed it and
> rinse and repeat... With the recent fix to checkstyle I am happy as things
> stand atm. My 2cts
>
> On 26/6/23 8:43, Jacek Lewandowski wrote:
> Hi,
>
> The context is that we currently have 3 checks in the build:
> - Checkstyle,
> - Eclipse-Warnings,
> - RAT
>
> CheckStyle and RAT are executed with almost every target we run: build,
> jar, test, test-some, testclasslist, etc.; on the other hand,
> Eclipse-Warnings is executed automatically only with the artifacts target.
>
> Checkstyle currently uses some caching, so subsequent reruns without
> cleaning the project validate only the modified files.
>
> Both CI - Jenkins and 

Re: [DISCUSS] Being specific about JDK versions and python lib versions in CI

2023-06-23 Thread Ekaterina Dimitrova
   Sounds like you were assuming the effort would require 3 Ekaterina?

Thank you Josh.
Yes, you are totally right. I forgot for a moment that the new scripts are
layered. Valid point about 2.
Now the question is how fast 2 will be to run and how many people would
actually use it for running full CI.

On Fri, 23 Jun 2023 at 9:37, Josh McKenzie  wrote:

> I'll take engagement. Even if it's broadly disagreement. :D
>
> (Full disclosure, I have a weak opinion here held weakly):
>
> Wouldn’t we recommend people to use the test images the project CI uses?
> Thus using in testing the versions we use? I would assume the repeatable CI
> will still expect test images the way we have now?
>
> Not exactly no. Check out CASSANDRA-18137
> <https://issues.apache.org/jira/browse/CASSANDRA-18137>, specifically:
>
> Proposal
>
>    - ...
>    - *CI-agnostic build and test scripts can be run with docker, and without
>    any CI, on any machine.*
>
>
> I've been discussing this with mck (and working with him on that patch)
> and realistically what we have are architectural layers for CI:
>
>    1. ant
>    2. build/test scripts
>    3. dockerized build/test scripts (using scripts from 2; removes env
>    and setup variability)
>    4. CI integrations (using 3 or output of 2)
>    5. CI lifecycle (full stack setup, run, teardown)
>
> The debate there would be whether we accept CI results from environments
> using 2 (build/test scripts), or we require environments to use 3 (docker
> images so it's all fully uniform). Originally mick and I were leaning
> towards allowing results from environments using 2 which would then
> introduce possible drift in OS, JDK, and python env, which is where my
> questions on this thread came in. Sounds like you were assuming the effort
> would require 3 Ekaterina?
>
>  My concern with this is atrophy; we'll set the version once and when
> finally forced to update
>
> *In theory* we could pretty easily mitigate this through automation. For
> instance, having a branch where we run the test suite with no dependencies
> declared (i.e. run all latest) that we run periodically, and then update
> the main branch to reflect the last good run from that "always run latest"
> branch, could give us a blending of both worlds.
>
> So technically feasible or not, I think there's a good question that's
> surfacing here of "what problem would we be trying to solve". I think that
> pivots on 2 axes:
>
>    1. Do we allow folks to certify in a CI environment using build/test
>    scripts but not using specific docker images we provide on the project?
>    2. How frequently do we see failures in tests (straight failure,
>    flakes, etc) that are due to the versions of dependencies (JDK or python)
>    that we're running the tests with?
>
> Regardless of 1, if the answer to 2 is "almost never", then there's likely
> no bridge to cross here.
>
> To your point JD, our pre-reqs for C* installation in our docs explicitly
> call out "Run Latest JDK", so conforming to that recommendation in our CI
> environment makes sense to me (see here
> <https://cassandra.apache.org/doc/latest/cassandra/getting_started/installing.html#prerequisites>
> ).
>
> On Thu, Jun 22, 2023, at 6:47 PM, Jeremy Hanna wrote:
>
>
> I like having the latest patch release always brought in for testing and
> CI for the JDK versions explicitly supported. So for this release, 11/17.
>
> On Jun 22, 2023, at 5:42 PM, Jeremiah Jordan 
> wrote:
>
> 
> Yes.  -1 on forcing the patch release version, and possibly minor version,
> for anything that is not absolutely necessary to do so.  Especially for
> things like Java or Python version, I would hope we just install the latest
> Java 8, Java 11, or Java 17 JDK for the platform the image is built from
> and run under them.  Otherwise we don’t find out until it’s too late when
> some JDK update breaks things.  I know I always tell users to run the
> latest JDK patch release, so we should do the same.
>
> If we want to pin the major version of something, then I have no problem
> there.
>
> -Jeremiah
>
> On Jun 22, 2023 at 5:00:36 PM, Ekaterina Dimitrova 
> wrote:
>
> Wouldn’t we recommend people to use the test images the project CI uses?
> Thus using in testing the versions we use? I would assume the repeatable CI
> will still expect test images the way we have now?
> (I hope I did not misunderstand the question)
>
> I also share similar worries with Brandon
>
> On Thu, 22 Jun 2023 at 17:48, Brandon Williams  wrote:
>
> On Thu, Jun 22, 2023 at 4:23 PM Josh McKenzie 
> wrote:
> >
> > 2. Anyone concerned about us being specific about versi

Re: [DISCUSS] Being specific about JDK versions and python lib versions in CI

2023-06-22 Thread Ekaterina Dimitrova
Wouldn’t we recommend people to use the test images the project CI uses?
Thus using in testing the versions we use? I would assume the repeatable CI
will still expect test images the way we have now?
(I hope I did not misunderstand the question)

I also share similar worries with Brandon

On Thu, 22 Jun 2023 at 17:48, Brandon Williams  wrote:

> On Thu, Jun 22, 2023 at 4:23 PM Josh McKenzie 
> wrote:
> >
> > 2. Anyone concerned about us being specific about versions in
> requirements.txt in the dtest repo?
>
> My concern with this is atrophy; we'll set the version once and when
> finally forced to update, find that a lot of other things must also be
> updated in order to do so.  I think our current approach of only
> setting them on things we require at a certain version like thrift has
> been successful thus far, and I don't think having different versions
> would be very common, but also not really affect repeatability if
> encountered.  You can see what versions are used from the logs though
> and could adjust them to be the same if necessary.
>


Re: [DISCUSSIONS] Replace ant eclipse-warnings with CheckerFramework

2023-06-18 Thread Ekaterina Dimitrova
Thank you all! It seems there is a consensus here so I updated accordingly
CASSANDRA-18239

On Fri, 16 Jun 2023 at 8:56, Jeremiah Jordan 
wrote:

> +1 from me.
>
> On Jun 15, 2023 at 1:01:01 PM, Ekaterina Dimitrova 
> wrote:
>
>> Hi everyone,
>> Happy Thursday!
>> Some time ago, Jacek raised the point that ant eclipse-warnings is 
>> generating too many false positives and not really working as expected. 
>> (CASSANDRA-18239)
>>
>> Reminder: ant eclipse-warnings is a task we run to check Cassandra code:
>> static analysis that warns on unsafe use of AutoCloseable instances; it
>> checks against two related compiler options.
>>
>> While trying to upgrade ECJ compiler that we use for this task 
>> (CASSANDRA-18190) so we can switch the task from running it with JDK8 to 
>> JDK11 in preparation for dropping JDK8, I hit the following issues:
>> - the latest version of ECJ is throwing more than 300 Potential Resource 
>> Leak warnings. I looked at 10-15, and they were all false positives.
>> - Even if we file a bug report to the Eclipse community, JDK11 support is
>> about to be removed with the next version of the compiler
>>
>> So I shared this information with Jacek. He came up with a different 
>> solution:
>> It seems we already pull through Guava CheckerFramework with an MIT license, 
>> which appears to be acceptable according to this link -  
>> https://www.apache.org/legal/resolved.html#category-a
>> He already has an initial integration with Cassandra which shows the 
>> following:
>> - CheckerFramework does not understand the @SuppressWarnings("resource") 
>> (there is a different one to be used), so it is immediately visible how it 
>> does not report all those false positives that eclipse-warnings does. On the 
>> flip side, I got the feedback that what it has witnessed so far is something 
>> we should investigate.
>> - Also, there are additional annotations like @Owning that let you fix many 
>> problems at once because the tool understands that the ownership of the 
>> resources was passed to another entity; It also enables you to do something 
>> impossible with eclipse-warnings - you can tell the tool that there is 
>> another method that needs to be called to release the resources, like 
>> release, free, disconnect, etc.
>> - the tool works with JDK8, JDK11, JDK17, and JDK20, so we can backport it 
>> even to older branches (while at the same time keeping eclipse-warnings 
>> there)
>> - though it runs 8 minutes so, we should not run it with every test, some 
>> reorganization around ant tasks will be covered as even for eclipse-warnings 
>> it was weird to call it on every single test run locally by default
>>
>>
>> If there are no concerns, we will continue replacing ant eclipse-warnings 
>> with the CheckerFramework as part of CASSANDRA-18239 and CASSANDRA-18190 in 
>> trunk.
>>
>> Best regards,
>>
>> Ekaterina
>>
>>


Re: [DISCUSS] Remove deprecated keyspace_count_warn_threshold and table_count_warn_threshold

2023-06-16 Thread Ekaterina Dimitrova
Hi all,
I was following the discussion. What Andrés just summarized sounds
reasonable to me. Let’s just not forget to document all of this as well.
Thank you
Ekaterina

On Fri, 16 Jun 2023 at 10:16, Andrés de la Peña 
wrote:

> It seems we agree on removing the default value for the old thresholds,
> and don't count system keyspaces/tables on the new ones.
>
> The old thresholds were on active duty for around ten months, and they
> have been deprecated for around a year. They will have been deprecated for
> longer by the time we release 5.0. If we want to keep them in perpetuity, I
> guess the plan would be:
>
> - Remove the default value of the old thresholds in Config.java to make
> them disabled by default.
> - Remove the old thresholds from the default cassandra.yaml, although old
> yamls can still have them.
> - Use converters (@Replaces tag in Config.java) to read the old threshold
> values (if present) and apply them to the new guardrails.
> - During the conversion from the old thresholds to the new guardrails,
> subtract the current number of system keyspace/tables from the old value.
> For example, 150 tables in the old threshold translate to 103 tables in the
> new guardrail, considering that there are 47 system tables.
>
> Does this sound good?
>
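The conversion described in the last bullet above (subtracting the number of system tables from the old threshold) could look roughly like the sketch below. It is only an illustration, not the actual converter in Config.java: the class and method names, the -1 "disabled" sentinel, and the clamping at zero are all assumptions made for the example.

```java
/**
 * Hypothetical helper, not Cassandra's actual converter: maps a value of the
 * deprecated table_count_warn_threshold (which counted system tables) onto a
 * guardrail that counts user tables only.
 */
final class ThresholdConversionSketch
{
    /** Sentinel meaning "threshold disabled"; an assumption for this sketch. */
    static final int DISABLED = -1;

    static int toUserTableThreshold(int oldThreshold, int systemTableCount)
    {
        // With the default removed, an unset old threshold stays disabled.
        if (oldThreshold < 0)
            return DISABLED;

        // Old values at or below the system table count leave no room for
        // user tables; clamp at zero instead of producing a negative value.
        return Math.max(0, oldThreshold - systemTableCount);
    }
}
```

With the numbers from the example above, toUserTableThreshold(150, 47) gives 103.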
> On Wed, 14 Jun 2023 at 17:26, David Capwell  wrote:
>
>> That's problematic because the new thresholds we added in CASSANDRA-17147
>> don't include system tables. Do you think we should change that?
>>
>>
>> I wouldn’t change the semantics of the config as it’s already live.  I
>> guess where I am coming from is that logically we have to think about the
>> system tables, so to your point, if we think 150 is too much and the system
>> already exposes 50… then we should recommend no more than 100….
>>
>> I find it's better for usability to not count the system tables and just
>> say "It's recommended not to have more than 100 tables. This doesn't
>> include system tables.”
>>
>>
>> I am fine with this framing… internally we think about 150 but
>> publicly speak 100 (due to our 50 tables)...
>>
>>
>> On Jun 14, 2023, at 8:29 AM, Josh McKenzie  wrote:
>>
>> In my opinion including system tables defeats that purpose because it
>> forces users to know details about the system tables.
>>
>> Perhaps having a unit test that caps our system tables at some value and
>> keeping the guardrail user-scope specific would be a better approach. I see
>> your point about leaking internal details to users, specifically on things
>> they can't control at this point.
>>
>> On Wed, Jun 14, 2023, at 8:19 AM, Andrés de la Peña wrote:
>>
>> > Default value I agree with you; features should be off by default!  If
>> we remove the default then we disable the feature by default (which I'm cool
>> with) and for anyone who changed the config, they would keep their behavior
>>
>>
>> I'm glad we agree on at least removing the default value if we keep the
>> deprecated properties.
>>
>> > With that, I kinda don’t agree that including system tables is a
>> mistake, as we add more we allow less for user tables before we start to
>> have issues….
>>
>>
>> That's problematic because the new thresholds we added in CASSANDRA-17147
>> don't include system tables. Do you think we should change that?
>>
>> I still think it's better not to include the system tables in the count.
>> The thresholds on the number of keyspaces/tables/rows/columns/tombstones
>> are just guidance since they cannot be exactly related to exact resource
>> consumption. The main purpose of those thresholds is to prevent obvious
>> antipatterns such as creating thousands of tables. A benefit of expressing
>> the guardrails in terms of the number of schema entities, rather than
>> counting the memory usage of those entities, is that they are easy to
>> understand and reason about. In my opinion including system tables defeats
>> that purpose because it forces users to know details about the system
>> tables. The fact that those details change between versions doesn't help.
>> Including system tables is not going to make the thresholds precise in
>> terms of measuring memory consumption because that depends on other
>> factors, such as the columns they store.
>>
>> Including system tables also imposes a minimum threshold value, like in
>> 5.0 you cannot set a threshold value under 45 tables without triggering it
>> with an empty db. For other thresholds, this can be more tricky. That would
>> be the case of the guardrail on the number of columns in a partition, where
>> you would need to know the size of the widest row in the system tables,
>> which can change over time.
>>
>> I guess that if system tables were to be counted, a recommendation for
>> the threshold would say something like "It's recommended not to have more
>> than 150 tables. The system already includes 45 tables for internal usage,
>> so you shouldn't create more than 105 user tables". I find it's better for
>> usability to not count the system tables and just 

Re: [DISCUSSIONS] Replace ant eclipse-warnings with CheckerFramework

2023-06-16 Thread Ekaterina Dimitrova
I got so excited that I forgot to say exactly which of the two options I
meant: running the analysis only on changed files after the initial full
pass is done sounds like a good improvement to me.

On Fri, 16 Jun 2023 at 7:43, Ekaterina Dimitrova 
wrote:

> I think this is a great idea and it will probably reduce the time to run
> it. Thank you!
>
> On Fri, 16 Jun 2023 at 7:40, Jacek Lewandowski <
> lewandowski.ja...@gmail.com> wrote:
>
>> Additional question is whether we want to run the checks against the
>> whole project or just against the file changes between the feature branch
>> and the target release branch?
>>
>>
>> - - -- --- -  -
>> Jacek Lewandowski
>>
>>
>> pt., 16 cze 2023 o 13:09 Aleksey Yeshchenko 
>> napisał(a):
>>
>>> Sounds like a clear improvement to me. Only once this check flagged a
>>> legitimate issue I missed, if I’m remembering correctly. All other
>>> instances have just been annoyances, forcing us to add a redundant
>>> suppression annotation.
>>>
>>> On 15 Jun 2023, at 19:01, Ekaterina Dimitrova 
>>> wrote:
>>>
>>> Hi everyone,
>>> Happy Thursday!
>>> Some time ago, Jacek raised the point that ant eclipse-warnings is 
>>> generating too many false positives and not really working as expected. 
>>> (CASSANDRA-18239)
>>>
>>> Reminder: ant eclipse-warnings is a task we run with the goal to check 
>>> Cassandra code - static analysis to warn on unsafe use of AutoCloseable 
>>> instances; checks against two related particular compiler options
>>>
>>> While trying to upgrade ECJ compiler that we use for this task 
>>> (CASSANDRA-18190) so we can switch the task from running it with JDK8 to 
>>> JDK11 in preparation for dropping JDK8, I hit the following issues:
>>> - the latest version of ECJ is throwing more than 300 Potential Resource 
>>> Leak warnings. I looked at 10-15, and they were all false positives.
>>> - Even if we file a bug report to the Eclipse community, JDK11 is about to 
>>> be removed with the next version of the compiler
>>>
>>> So I shared this information with Jacek. He came up with a different 
>>> solution:
>>> It seems we already pull through Guava CheckerFramework with an MIT 
>>> license, which appears to be acceptable according to this link -  
>>> https://www.apache.org/legal/resolved.html#category-a
>>> He already has an initial integration with Cassandra which shows the 
>>> following:
>>> - CheckerFramework does not understand the @SuppressWarnings("resource") 
>>> (there is a different one to be used), so it is immediately visible how it 
>>> does not report all those false positives that eclipse-warnings does. On 
>>> the flip side, I got the feedback that what it has witnessed so far is 
>>> something we should investigate.
>>> - Also, there are additional annotations like @Owning that let you fix many 
>>> problems at once because the tool understands that the ownership of the 
>>> resources was passed to another entity; It also enables you to do something 
>>> impossible with eclipse-warnings - you can tell the tool that there is 
>>> another method that needs to be called to release the resources, like 
>>> release, free, disconnect, etc.
>>> - the tool works with JDK8, JDK11, JDK17, and JDK20, so we can backport it 
>>> even to older branches (while at the same time keeping eclipse-warnings 
>>> there)
>>> - though it runs 8 minutes so, we should not run it with every test, some 
>>> reorganization around ant tasks will be covered as even for 
>>> eclipse-warnings it was weird to call it on every single test run locally 
>>> by default
>>>
>>>
>>> If there are no concerns, we will continue replacing ant eclipse-warnings 
>>> with the CheckerFramework as part of CASSANDRA-18239 and CASSANDRA-18190 in 
>>> trunk.
>>>
>>> Best regards,
>>>
>>> Ekaterina
>>>
>>>
>>>


Re: [DISCUSSIONS] Replace ant eclipse-warnings with CheckerFramework

2023-06-16 Thread Ekaterina Dimitrova
I think this is a great idea and it will probably reduce the time to run
it. Thank you!

On Fri, 16 Jun 2023 at 7:40, Jacek Lewandowski 
wrote:

> Additional question is whether we want to run the checks against the whole
> project or just against the file changes between the feature branch and the
> target release branch?
>
>
> - - -- --- -  -
> Jacek Lewandowski
>
>
> pt., 16 cze 2023 o 13:09 Aleksey Yeshchenko 
> napisał(a):
>
>> Sounds like a clear improvement to me. Only once this check flagged a
>> legitimate issue I missed, if I’m remembering correctly. All other
>> instances have just been annoyances, forcing us to add a redundant
>> suppression annotation.
>>
>> On 15 Jun 2023, at 19:01, Ekaterina Dimitrova 
>> wrote:
>>
>> Hi everyone,
>> Happy Thursday!
>> Some time ago, Jacek raised the point that ant eclipse-warnings is 
>> generating too many false positives and not really working as expected. 
>> (CASSANDRA-18239)
>>
>> Reminder: ant eclipse-warnings is a task we run with the goal to check 
>> Cassandra code - static analysis to warn on unsafe use of AutoCloseable 
>> instances; checks against two related particular compiler options
>>
>> While trying to upgrade ECJ compiler that we use for this task 
>> (CASSANDRA-18190) so we can switch the task from running it with JDK8 to 
>> JDK11 in preparation for dropping JDK8, I hit the following issues:
>> - the latest version of ECJ is throwing more than 300 Potential Resource 
>> Leak warnings. I looked at 10-15, and they were all false positives.
>> - Even if we file a bug report to the Eclipse community, JDK11 is about to 
>> be removed with the next version of the compiler
>>
>> So I shared this information with Jacek. He came up with a different 
>> solution:
>> It seems we already pull through Guava CheckerFramework with an MIT license, 
>> which appears to be acceptable according to this link -  
>> https://www.apache.org/legal/resolved.html#category-a
>> He already has an initial integration with Cassandra which shows the 
>> following:
>> - CheckerFramework does not understand the @SuppressWarnings("resource") 
>> (there is a different one to be used), so it is immediately visible how it 
>> does not report all those false positives that eclipse-warnings does. On the 
>> flip side, I got the feedback that what it has witnessed so far is something 
>> we should investigate.
>> - Also, there are additional annotations like @Owning that let you fix many 
>> problems at once because the tool understands that the ownership of the 
>> resources was passed to another entity; It also enables you to do something 
>> impossible with eclipse-warnings - you can tell the tool that there is 
>> another method that needs to be called to release the resources, like 
>> release, free, disconnect, etc.
>> - the tool works with JDK8, JDK11, JDK17, and JDK20, so we can backport it 
>> even to older branches (while at the same time keeping eclipse-warnings 
>> there)
>> - though it runs 8 minutes so, we should not run it with every test, some 
>> reorganization around ant tasks will be covered as even for eclipse-warnings 
>> it was weird to call it on every single test run locally by default
>>
>>
>> If there are no concerns, we will continue replacing ant eclipse-warnings 
>> with the CheckerFramework as part of CASSANDRA-18239 and CASSANDRA-18190 in 
>> trunk.
>>
>> Best regards,
>>
>> Ekaterina
>>
>>
>>


[DISCUSSIONS] Replace ant eclipse-warnings with CheckerFramework

2023-06-15 Thread Ekaterina Dimitrova
Hi everyone,
Happy Thursday!
Some time ago, Jacek raised the point that ant eclipse-warnings is
generating too many false positives and not really working as
expected. (CASSANDRA-18239)

Reminder: ant eclipse-warnings is a task we run with the goal to check
Cassandra code - static analysis to warn on unsafe use of
AutoCloseable instances; checks against two related particular
compiler options

While trying to upgrade ECJ compiler that we use for this task
(CASSANDRA-18190) so we can switch the task from running it with JDK8
to JDK11 in preparation for dropping JDK8, I hit the following issues:
- the latest version of ECJ is throwing more than 300 Potential
Resource Leak warnings. I looked at 10-15, and they were all false
positives.
- Even if we file a bug report to the Eclipse community, JDK11 is
about to be removed with the next version of the compiler

So I shared this information with Jacek. He came up with a different solution:
It seems we already pull in CheckerFramework through Guava, and it has an
MIT license, which appears to be acceptable according to this link -
https://www.apache.org/legal/resolved.html#category-a
He already has an initial integration with Cassandra which shows the following:
- CheckerFramework does not understand the
@SuppressWarnings("resource") (there is a different one to be used),
so it is immediately visible that it does not report all those false
positives that eclipse-warnings does. On the flip side, the feedback I
got is that what it has flagged so far is something we should
investigate.
- Also, there are additional annotations like @Owning that let you fix
many problems at once because the tool understands that the ownership
of the resources was passed to another entity; It also enables you to
do something impossible with eclipse-warnings - you can tell the tool
that there is another method that needs to be called to release the
resources, like release, free, disconnect, etc.
- the tool works with JDK8, JDK11, JDK17, and JDK20, so we can
backport it even to older branches (while at the same time keeping
eclipse-warnings there)
- however, it takes 8 minutes to run, so we should not run it with
every test. Some reorganization around ant tasks will be covered, as
even for eclipse-warnings it was weird to call it on every single test
run locally by default
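
To make the ownership point above more concrete, here is a rough sketch of how such annotations are placed. It is not taken from Jacek's integration. In a real build the annotations would come from the Checker Framework's checker-qual artifact (to my knowledge under org.checkerframework.checker.mustcall.qual). The stand-in annotations declared here only let the example compile on its own and perform no checking, and Buffer and BufferHolder are made-up names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;

// Local stand-ins (assumption): they mimic the shape of the Checker Framework
// annotations so this file compiles without checker-qual on the classpath.
@Target(ElementType.TYPE)
@interface MustCall { String[] value(); }

@Target({ ElementType.PARAMETER, ElementType.FIELD })
@interface Owning {}

// A resource that is released through release() rather than close(),
// which is the case eclipse-warnings could not express.
@MustCall("release")
final class Buffer
{
    private boolean released;

    void release() { released = true; }

    boolean isReleased() { return released; }
}

// Takes over responsibility for the buffer it is handed, so callers no
// longer need to release it themselves once they pass it in.
@MustCall("close")
final class BufferHolder implements AutoCloseable
{
    private final @Owning Buffer buffer;

    BufferHolder(@Owning Buffer buffer) { this.buffer = buffer; }

    @Override
    public void close() { buffer.release(); }
}
```

The idea is that the tool, seeing @Owning on the constructor parameter, stops reporting the caller for not releasing the buffer and instead expects BufferHolder to be closed.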


If there are no concerns, we will continue replacing ant
eclipse-warnings with the CheckerFramework as part of CASSANDRA-18239
and CASSANDRA-18190 in trunk.

Best regards,

Ekaterina


Re: [DISCUSS] Moving system property names to the CassandraRelevantProperties

2023-06-13 Thread Ekaterina Dimitrova
Thank you Maxim!

I also agree with 1 and 2. Also, thank you for considering documentation.
The DataStax docs page mentioned here is not maintained by the project; the
other one can get some love. I guess if we are adding a description column,
we can automate the way we populate that page. But I think this is
something for a separate ticket for docs mgmt.

On Tue, 13 Jun 2023 at 15:05, Mick Semb Wever  wrote:

> > I want also to clarify and discuss with you some points related to the
> > SystemPropertiesTable virtual table.
> >
> > 1. Hide non-production environment properties in the SystemProperties
> > virtual table.
> >
> > As you may know, the result of the query on the virtual table
> > currently includes the environment properties related to tests as well
> > (used internally for our testing purposes) ~ 42 out of 290. This seems
> > a bit redundant for production use and floods the query output with
> > the things you don't need to think about. I think we can add a new
> > property -Dcassandra.vt.show.test.system.properties (false by default)
> > to hide these test-related properties for production
> > environments while still using them for our test runs and build
> > scripts. Hiding test properties is not a regression in this case. Any
> > thoughts?
> >
> >
> > 2. The absence of a description of the system properties.
> >
> > I have found that there is no good description of the system
> > properties used to configure production environments. Is there any
> > documentation for this case? Do we need to keep these pages up to
> > date?
>
>
>
> It's a yes to both (1) and (2) from me.
>
> Appreciate it Maxim.  Also that CASSANDRA-18587 is taking into account
> compatibility.
>
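
Point 1 above (hiding test-only properties unless a flag is set) could be sketched as follows. This is only an illustration of the filtering idea, not the SystemPropertiesTable code: the flag name is the one proposed in the thread, while the Entry class, the testOnly marker and the method names are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SystemPropertiesViewSketch
{
    // Flag name as proposed in the thread; it is a proposal, not a confirmed option.
    static final String SHOW_TEST_PROPERTIES = "cassandra.vt.show.test.system.properties";

    /** Hypothetical description of one known property. */
    static final class Entry
    {
        final String name;
        final boolean testOnly;

        Entry(String name, boolean testOnly)
        {
            this.name = name;
            this.testOnly = testOnly;
        }
    }

    /** Boolean.getBoolean returns false when the property is unset, so the default is "hide". */
    static boolean showTestProperties()
    {
        return Boolean.getBoolean(SHOW_TEST_PROPERTIES);
    }

    /** Builds the rows to expose: set properties only, test-only ones hidden unless showTest is true. */
    static Map<String, String> visibleRows(Iterable<Entry> known, Map<String, String> values, boolean showTest)
    {
        Map<String, String> rows = new LinkedHashMap<>();
        for (Entry entry : known)
        {
            if (entry.testOnly && !showTest)
                continue;
            String value = values.get(entry.name);
            if (value != null)
                rows.put(entry.name, value);
        }
        return rows;
    }
}
```

A caller would pass showTestProperties() as the last argument, so production nodes see only the non-test rows unless the flag is set explicitly.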


Re: [VOTE] CEP-8 Datastax Drivers Donation

2023-06-13 Thread Ekaterina Dimitrova
+1

On Tue, 13 Jun 2023 at 10:33, Jeff Jirsa  wrote:

> +1
>
>
> On Tue, Jun 13, 2023 at 7:15 AM Jeremy Hanna 
> wrote:
>
>> Calling for a vote on CEP-8 [1].
>>
>> To clarify the intent, as Benjamin said in the discussion thread [2], the
>> goal of this vote is simply to ensure that the community is in favor of
>> the donation. Nothing more.
>> The plan is to introduce the drivers, one by one. Each driver donation
>> will need to be accepted first by the PMC members, as it is the case for
>> any donation. Therefore the PMC should have full control on the pace at
>> which new drivers are accepted.
>>
>> If this vote passes, we can start this process for the Java driver under
>> the direction of the PMC.
>>
>> Jeremy
>>
>> 1.
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
>> 2. https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
>>
>


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Ekaterina Dimitrova
+1

On Thu, 25 May 2023 at 16:46, Brandon Williams  wrote:

> +1
>
> Kind Regards,
> Brandon
>
> On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis  wrote:
> >
> > Let's make this official.
> >
> > CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
> >
> > POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
> >
> > --
> > Jonathan Ellis
> > co-founder, http://www.datastax.com
> > @spyced
>


Re: [DISCUSS] Feature branch version hygiene

2023-05-16 Thread Ekaterina Dimitrova
I do not think we discussed it from a feature branch perspective. (Happy to
stand corrected but I did not find anything in the dev@ archive myself.) But
we do mark with 5.0 anything that lands in trunk. I think it might be a good
idea for anything that lands in a feature branch to have a different fix
version until the feature branch is merged.

“ Why I’m not keen on 5.0 is because if we cut the release today those
tickets would not be there.”

I guess it can also make the release management process easier: if I
remember correctly, there is a script that changes potentially all tickets
resolved with the major version (in this case 5.0) to 5.0-alpha or whatever
we settle on as the release version.

Though NA can be confusing I guess. Shall we use something like
5.0-candidate, 6.0-candidate? This can be based on the confidence people
have around a feature branch, where it can potentially land. I am curious
how other projects do it too.

I think it is a good call to decide on something and document it. I can
imagine it will be also easier during release management. Thank you
Jeremiah for raising the topic

Best regards,
Ekaterina


On Tue, 16 May 2023 at 16:04, Benedict  wrote:

> Copying my reply on the ticket…
>
>
> We have this discussion roughly once per major. If you look back through
> dev@ you'll find the last one a few years back.
>
> I don't recall NA ever being the approved approach, though. ".x" lines are
> target versions, whereas concrete versions are the ones a fix landed in.
> There's always ambiguity over the next release, as it's sort of both. But
> since there is no 5.0 version, only 5.0-alphaN, 5.0-betaN and 5.0.0,
> perhaps 5.0 is the correct label (and makes sense to me). I forget what we
> landed upon last time.
>
> Work that has actually landed should probably be labelled as 5.0-alpha1
>
> On 16 May 2023, at 21:02, J. D. Jordan  wrote:
>
> 
>
> Process question/discussion. Should tickets that are merged to CEP feature
> branches, like  https://issues.apache.org/jira/browse/CASSANDRA-18204, have
> a fixver of 5.0 on them after merging to the feature branch?
>
>
> For the SAI CEP which is also using the feature branch method the
> "reviewed and merged to feature branch" tickets seem to be given a version
> of NA.
>
>
> Not sure that's the best “waiting for cep to merge” version either?  But
> it seems better than putting 5.0 on them to me.
>
>
> Why I’m not keen on 5.0 is because if we cut the release today those
> tickets would not be there.
>
>
> What do other people think?  Is there a better version designation we can
> use?
>
>
> On a different project I have in the past made a “version number” in JIRA
> for each long running feature branch. Tickets merged to the feature branch
> got the epic ticket number as their version, and then it got updated to the
> “real” version when the feature branch was merged to trunk.
>
>
> -Jeremiah
>
>


Re: [VOTE] CEP-29 CQL NOT Operator

2023-05-10 Thread Ekaterina Dimitrova
+1

On Wed, 10 May 2023 at 0:40, Dinesh Joshi  wrote:

> +1
>
> > On May 8, 2023, at 1:52 AM, Piotr Kołaczkowski 
> wrote:
> >
> > Let's vote.
> >
> >
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator
> >
> > Piotr Kołaczkowski
> > e. pkola...@datastax.com
> > w. www.datastax.com
>
>


Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-06 Thread Ekaterina Dimitrova
“ KEYSPACE is fine. If we want to introduce a standard nomenclature like
DATABASE that’s also fine. Inventing brand new ones is not fine, there’s no
benefit.

I think it would be fine to introduce some arbitrary unrelated concept for
assigning tables with similar behaviours some configuration that is
orthogonal to replication, but that should be a different discussion about
how we evolve config.”

+1


On Thu, 6 Apr 2023 at 5:26, Benedict  wrote:

> KEYSPACE is fine. If we want to introduce a standard nomenclature like
> DATABASE that’s also fine. Inventing brand new ones is not fine, there’s no
> benefit.
>
> I think it would be fine to introduce some arbitrary unrelated concept for
> assigning tables with similar behaviours some configuration that is
> orthogonal to replication, but that should be a different discussion about
> how we evolve config.
>
> On 6 Apr 2023, at 09:40, Mick Semb Wever  wrote:
>
> 
>
>> Something like "TABLESPACE" or "TABLEGROUP" would *theoretically* better
>> satisfy point 1 and 2 above but subjectively I kind of recoil at both
>> equally. So there's that.
>>
>
>
>
> TABLEGROUP would work for me.  Immediately intuitive.
>
> brain-storming…
>
> A keyspace today defines replication strategy, rf, and durable_writes. If
> they also had the table options that could be defined as defaults for all
> tables in that group, and one tablegroup could be a child and inherit
> settings from another tablegroup, you could logically group tables in ways
> that both benefit your application platform's taxonomy and the spread of
> keyspace/table settings. DATABASE, NAMESPACE, whatever, can be aliases to
> it too, if you like.
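
The inheritance idea brainstormed above (a tablegroup holding default table options and optionally inheriting from a parent tablegroup) could be sketched like this. Nothing like it exists in Cassandra; the class, its methods and the option names used with it are purely hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tablegroup: holds default table options and may inherit from a parent. */
final class TableGroupSketch
{
    private final TableGroupSketch parent; // null for a top-level group
    private final Map<String, String> options = new HashMap<>();

    TableGroupSketch(TableGroupSketch parent)
    {
        this.parent = parent;
    }

    TableGroupSketch set(String option, String value)
    {
        options.put(option, value);
        return this;
    }

    /** Looks the option up locally first, then walks up the parents; null if nobody sets it. */
    String resolve(String option)
    {
        for (TableGroupSketch group = this; group != null; group = group.parent)
        {
            String value = group.options.get(option);
            if (value != null)
                return value;
        }
        return null;
    }
}
```

A child group that sets only its own compaction option would still pick up everything else, such as durable_writes, from its parent.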
>
>
>
>
>


Re: Google Season of Docs

2023-04-04 Thread Ekaterina Dimitrova
Thank you for your efforts Lorina!

On Tue, 4 Apr 2023 at 16:26, Deepak Vohra via dev 
wrote:

>   I noticed that fewer projects were selected this year and no Apache
> project was selected.
>
> On Monday, April 3, 2023 at 06:07:53 p.m. EDT, Nate McCall <
> zznat...@gmail.com> wrote:
>
>
> Thank you for your effort regardless, Lorina. Very much appreciated!
>
> On Tue, Apr 4, 2023 at 6:39 AM lorinapoland 
> wrote:
>
> Sadly, I am informing the community that our grant application to GSoD was
> unsuccessful.
>
> If you would like to see the list of winning projects, check out
> https://developers.google.com/season-of-docs/docs/participants.
>
> Lorina
>
>
>
> Sent from my Verizon, Samsung Galaxy smartphone
>
>


Re: [VOTE] CEP-26: Unified Compaction Strategy

2023-04-04 Thread Ekaterina Dimitrova
+1

On Tue, 4 Apr 2023 at 11:44, Benjamin Lerer  wrote:

> +1
>
> Le mar. 4 avr. 2023 à 17:17, Andrés de la Peña  a
> écrit :
>
>> +1
>>
>> On Tue, 4 Apr 2023 at 15:09, Jeremy Hanna 
>> wrote:
>>
>>> +1 nb, will be great to have this in the codebase - it will make nearly
>>> every table's compaction work more efficiently.  The only possible
>>> exception is tables that are well suited for TWCS.
>>>
>>> On Apr 4, 2023, at 8:00 AM, Berenguer Blasi 
>>> wrote:
>>>
>>> +1
>>> On 4/4/23 14:36, J. D. Jordan wrote:
>>>
>>> +1
>>>
>>> On Apr 4, 2023, at 7:29 AM, Brandon Williams 
>>>  wrote:
>>>
>>> 
>>> +1
>>>
>>> On Tue, Apr 4, 2023, 7:24 AM Branimir Lambov  wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I would like to put CEP-26 to a vote.
>>>>
>>>> Proposal:
>>>>
>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-26%3A+Unified+Compaction+Strategy
>>>>
>>>> JIRA and draft implementation:
>>>> https://issues.apache.org/jira/browse/CASSANDRA-18397
>>>>
>>>> Up-to-date documentation:
>>>>
>>>> https://github.com/blambov/cassandra/blob/CASSANDRA-18397/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md
>>>>
>>>> Discussion:
>>>> https://lists.apache.org/thread/8xf5245tclf1mb18055px47b982rdg4b
>>>>
>>>> The vote will be open for 72 hours.
>>>> A vote passes if there are at least three binding +1s and no binding
>>>> vetoes.
>>>>
>>>> Thanks,
>>>> Branimir

>>>
>>>

