Re: Welcome Joey Lynch as Cassandra PMC member

2024-07-24 Thread Abe Ratnofsky
Congratulations!


Re: [VOTE] CEP-42: Constraints Framework

2024-07-02 Thread Abe Ratnofsky
+1 (nb)

> On Jul 2, 2024, at 12:15 PM, Yifan Cai  wrote:
> 
> +1 on CEP-42.
> 
> - Yifan
> 
> On Tue, Jul 2, 2024 at 5:17 AM Jon Haddad  > wrote:
>> +1
>> 
>> On Tue, Jul 2, 2024 at 5:06 AM > > wrote:
>>> +1
>>> 
>>> 
 On Jul 1, 2024, at 8:34 PM, Doug Rohrer >>> > wrote:
 
 +1 (nb) - Thanks for all of the suggestions and Bernardo for wrangling the 
 CEP into shape!
 
 Doug
 
> On Jul 1, 2024, at 3:06 PM, Dinesh Joshi  > wrote:
> 
> +1
> 
> On Mon, Jul 1, 2024 at 11:58 AM Ariel Weisberg  > wrote:
>> Hi,
>> 
>> I am +1 on CEP-42 with the latest updates to the CEP to clarify syntax, 
>> error messages, constraint naming and generated naming, alter/drop, 
>> describe etc.
>> 
>> I think this now tracks very closely to how other SQL databases define 
>> constraints and the syntax is easily extensible to multi-column and 
>> multi-table constraints.
>> 
>> Ariel
>> 
>> On Mon, Jul 1, 2024, at 9:48 AM, Bernardo Botella wrote:
>>> With all the feedback that came in the discussion thread after the call 
>>> for votes, I’d like to extend the period another 72 hours starting 
>>> today.
>>> 
>>> As before, a vote passes if there are at least 3 binding +1s and no 
>>> binding vetoes.
>>> 
>>> Thanks,
>>> Bernardo Botella
>>> 
 On Jun 24, 2024, at 7:17 AM, Bernardo Botella 
 <conta...@bernardobotella.com> 
 wrote:
 
 Hi everyone,
 
 I would like to start the voting for CEP-42.
 
 Proposal: 
 https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
 Discussion: 
 https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj
 
 The vote will be open for 72 hours. A vote passes if there are at 
 least 3 binding +1s and no binding vetoes.
 
 Thanks,
 Bernardo Botella
>> 
 
>>> 



Re: [VOTE] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-06-26 Thread Abe Ratnofsky
+1


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-25 Thread Abe Ratnofsky
If we're going to introduce a feature that looks like SQL constraints, we 
should make sure it's "reasonably" compliant. In particular, we should avoid 
situations where a user creates a constraint, writes some data, then reads data 
that violates that constraint, unless they've expressed that violations on read 
would be acceptable.

For Postgres, when adding a new constraint you can specify NOT VALID to avoid 
scanning all existing relevant data[1]. If we want to avoid scan-on-DDL, this 
tradeoff needs to be made clear to a user.

As we've already discussed, constraints must deal with operations that appear to be 
within limits on the write path but, once reconciled on read or during 
compaction, can lead to a violation. Adding to non-frozen collections is one 
example. Expecting users to understand the write path for collections feels 
unrealistic to me; I wonder if we should express in the constraint itself that 
it only applies during write.

Anything that uses "nodetool import" (including cassandra-analytics) could 
theoretically push constraint-violating mutations to a table. We could update 
import to scan table contents first, or add a flag to trust the data in 
imported SSTables and make cassandra-analytics executors aware of table-level 
constraints.

Some client implementations read the system_schema tables to build their object 
mappers; I'd like to confirm that nothing will require clients to be aware of 
these new schema constructs.

Overall, I'm supportive of the distinctions discussed between constraints and 
guardrails and like the direction this is heading; I'd just like to make sure 
the more detailed semantics aren't confusing or misleading for our users, since 
semantics are much harder to change in the future.

[1]: https://www.postgresql.org/docs/current/sql-altertable.html



Re: [DISCUSS] spark-cassandra-connector donation to Analytics subproject

2024-06-24 Thread Abe Ratnofsky
Likewise - another vote in favor of bringing in this subproject.

Any thoughts on bringing in dsbulk as well? dsbulk has a lower barrier to entry 
than Spark Cassandra Connector, addresses a real need for users, and appears to 
be at a similar place in its project lifecycle.

Abe

> On Jun 24, 2024, at 4:36 PM, Francisco Guerrero  wrote:
> 
> Yeah, having the connector will enhance the Cassandra ecosystem. I'm looking 
> forward to this contribution.
> 
> On 2024/06/24 17:28:48 "C. Scott Andreas" wrote:
>> Supportive of accepting a donation of the Spark Cassandra Connector under 
>> the project's umbrella as well - I think that would be very welcome and 
>> appreciated. Spark Cassandra Connector and the Analytics library are also 
>> suited to slightly different usage patterns. SCC can be a good fit for Spark 
>> jobs that operate with a high degree of selectivity; vs. larger bulk scoops. 
>> – Scott
>> 
>> On Jun 24, 2024, at 1:29 AM, Jon Haddad wrote: 
>> I also think it would be a great contribution, especially since the bulk 
>> analytics library can’t be used by the majority of teams, since it’s hard 
>> coded to only work with single token clusters. 
>> 
>> On Mon, Jun 24, 2024 at 9:51 AM Dinesh Joshi <djo...@apache.org> wrote: 
>> This would be a great contribution to have for the Analytics subproject. 
>> The current bulk functionality in the Analytics subproject complements the 
>> spark-cassandra-connector so I see it as a good fit for donation. 
>> 
>> On Mon, Jun 24, 2024 at 12:32 AM Mick Semb Wever <m...@apache.org> wrote: 
>> What are folks thoughts on accepting a donation of the 
>> spark-cassandra-connector project into the Analytics subproject? A number 
>> of folks have requested this, stating that they cannot contribute to the 
>> project while it is under DataStax. The project has largely been in 
>> maintenance mode the past few years. Under ASF I believe that it will 
>> attract more attention and contributions, and offline discussions I have 
>> had indicate that the spark-cassandra-connector remains an important 
>> complement to the bulk analytics component.



Re: Cassandra PMC Chair Rotation, 2024 Edition

2024-06-20 Thread Abe Ratnofsky
Congrats Dinesh! Thank you Josh!

> On Jun 20, 2024, at 11:53 AM, Jeremiah Jordan  
> wrote:
> 
> Welcome to the Chair role Dinesh!  Congrats!
> 
> On Jun 20, 2024 at 10:50:37 AM, Josh McKenzie  > wrote:
>> Another PMC Chair baton pass incoming! On behalf of the Apache Cassandra 
>> Project Management Committee (PMC) I would like to welcome and congratulate 
>> our next PMC Chair Dinesh Joshi (djoshi).
>> 
>> Dinesh has been a member of the PMC for a few years now and many of you 
>> likely know him from his thoughtful, measured presence on many of our 
>> collective discussions as we've grown and evolved over the past few years.
>> 
>> I appreciate the project trusting me as liaison with the board over the past 
>> year and look forward to supporting Dinesh in the role in the future.
>> 
>> Repeating Mick (repeating Paulo's) words from last year: The chair is an 
>> administrative position that interfaces with the Apache Software Foundation 
>> Board, by submitting regular reports about project status and health. Read 
>> more about the PMC chair role on Apache projects:
>> - https://www.apache.org/foundation/how-it-works.html#pmc
>> - https://www.apache.org/foundation/how-it-works.html#pmc-chair
>> - https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers
>> 
>> The PMC as a whole is the entity that oversees and leads the project and any 
>> PMC member can be approached as a representative of the committee. A list of 
>> Apache Cassandra PMC members can be found on: 
>> https://cassandra.apache.org/_/community.html



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Abe Ratnofsky
I've thought about this some more. It would be useful for Cassandra to support 
user-defined "guardrails" (or constraints, whatever you want to call them), 
that could be applied per keyspace or table. Whether a user or an operator is 
considered the owner of a table depends on the organization deploying 
Cassandra, so allowing both parties to protect their tables against mis-use 
seems good to me, especially for large multi-tenant clusters with diverse 
workloads.

For example, it would be really useful if a user could set the 
Guardrails.{read,write}ConsistencyLevels for their tables, or declare whether 
all operations should be over LWTs to avoid mixing regular and LWT workloads.

I'm hesitant about adding lots of expression syntax to the CONSTRAINT clause. I 
think I'd prefer a function calling syntax that represents:
1. Whether the constraint is system / keyspace / table scoped
2. Where in query processing the constraint is checked
3. What is executed by the check

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Abe Ratnofsky
Hey Bernardo,

Thanks for the proposal and putting together your summary of the discussion. A 
few thoughts:

I'm not completely convinced of the value of CONSTRAINTS for a database like 
Cassandra, which doesn't support any referential integrity checks, doesn't do 
read-before-write for all queries, and doesn't have a wide library of built-in 
functions.

I'd be a supporter of more BIFs, and that's a solvable problem. String size, 
collection size, timestamp conversions, etc. could all be useful, even though 
there's not much gained over doing them in the client.

With constraints only being applied during write coordination, there's not much 
of an advantage over implementing the equivalent constraints in clients. Writes 
that don't include all columns could violate multi-column constraints, like 
your (a > b) example, for the same reason as CASSANDRA-19007. Constraints could be 
limited to only apply to frozen columns, where it's known that the entire value 
will be updated at once.

I don't think we should include any constraints where valid user action would 
lead to a violated constraint, like permitting multi-column constraints on 
regular columns or non-frozen types, since they would be too prone to mis-use.

Regarding 19007, it could be useful to have a constraint that indicates that a 
subset of columns will always be updated together, since that would actually 
allow Cassandra to know which read queries are safe, and permit a fix for 19007 
that minimizes the additional data replicas need to send to coordinators on 
ALLOW FILTERING queries. That's a very specific situation and shouldn't justify 
a new framework / API, but might be a useful consequence of it.

> - isJson (is the text a json?)

Wouldn't it be more compelling to have a new type, analogous to the Postgres 
JSONB type? https://www.postgresql.org/docs/current/datatype-json.html

If we're going to parse the entire JSON blob for validation, we might as well 
store it in an optimized format, support better access patterns, etc.

Re: [DISCUSS] Stream Pipelines on hot paths

2024-05-31 Thread Abe Ratnofsky
+1 to forbidding Stream usage entirely; the convenience of using them outside 
of hot paths is less than the burden of figuring out whether or not a 
particular path is hot. Even for reviewers it can be difficult to tell whether 
a particular path is hot; hard-to-diagnose bugs like CASSANDRA-18110 were 
somewhat caused by this ambiguity.

I wonder if there are ways we can better automate tracking of 
performance-sensitive paths. I know FB's Infer claims to be able to do this[1], 
but is limited in practice.

[1]: https://fbinfer.com/docs/checker-annotation-reachability
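
To make the tradeoff concrete, here is a toy illustration (not code from the 
codebase) of the same operation written both ways; on a path executed millions 
of times per second, the loop form avoids the per-call Stream, Spliterator, and 
intermediate pipeline allocations that tend to show up in profiles for small 
collections:

    import java.util.List;

    // Illustrative only: "sum of positive values" as a stream pipeline vs. a plain loop.
    final class HotPathExample
    {
        static long sumPositiveStream(List<Long> values)
        {
            // Convenient, but allocates a Stream, Spliterator, and intermediate pipeline objects per call
            return values.stream().filter(v -> v > 0).mapToLong(Long::longValue).sum();
        }

        static long sumPositiveLoop(List<Long> values)
        {
            // Equivalent "old skool" form: no per-call pipeline allocation
            long sum = 0;
            for (int i = 0; i < values.size(); i++)
            {
                long v = values.get(i);
                if (v > 0)
                    sum += v;
            }
            return sum;
        }
    }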

> On May 31, 2024, at 10:19 AM, Benedict Elliott Smith  
> wrote:
> 
> I think I have already proposed a simple solution to this problem on the 
> thread: if anyone says it’s a hot path (and cannot be persuaded otherwise), 
> it should be treated as such. Saves argument, but permits an easy escape 
> hatch if everyone agrees with minimal discussion.
> 
> I think this is a good general principle for raising standards in the 
> codebase like this: if somebody says something is important, and cannot be 
> convinced otherwise, then it should generally be treated as important. This 
> is different from cases where there are simply competing approaches.
> 
> That said, if people want to be absolutist about this I won’t mind.
> 
> 
> 
>> On 31 May 2024, at 15:04, Benjamin Lerer  wrote:
>> 
>> For me the definition of hot path is too vague. We had arguments with 
>> Berenger multiple times and it is more a waste of time than anything else at 
>> the end. If we are truly concerned about stream efficiency then we should 
>> simply forbid them. That will avoid lengthy discussions about what 
>> constitute the hot path and what does not.
>> 
>> On Fri, May 31, 2024 at 11:08, Berenguer Blasi wrote:
>>> +1 on avoiding streams in hot paths
>>> 
>>> On 31/5/24 9:48, Benedict wrote:
 My concept of hot path is simply anything we can expect to be called 
 frequently enough in normal operation that it might show up in a profiler. 
 If it’s a library method then it’s reasonable to assume it should be able 
 to be used in a hot path unless clearly labelled otherwise. 
 
 In my view this includes things that might normally be masked by caching 
 but under supported workloads may not be - such as query preparation.
 
 In fact, I’d say the default assumption should probably be that a method 
 is “in a hot path” unless there’s good argument they aren’t - such as that 
 the operation is likely to be run at some low frequency and the slow part 
 is not part of any loop. Repair setup messages perhaps aren’t a hot path 
 for instance (unless many of them are sent per repair), but validation 
 compaction or merkle tree construction definitely is.
 
 I think it’s fine to not have perfect agreement about edge cases, but if 
 anyone in a discussion thinks something is a hot path then it should be 
 treated as one IMO.
 
> On 30 May 2024, at 18:39, David Capwell  
>  wrote:
> 
>  As a general statement I agree with you (same for String.format as 
> well), but one thing to call out is that it can be hard to tell what is 
> the hot path and what isn’t.  When you are doing background work (like 
> repair) its clear, but when touching something internal it can be hard to 
> tell; this can also be hard with shared code as it gets authored outside 
> the hot path then later used in the hot path…
> 
> Also, what defines hot path?  Is this user facing only?  What about 
> Validation/Streaming (stuff processing a large dataset)?  
> 
>> On May 30, 2024, at 9:29 AM, Benedict  
>>  wrote:
>> 
>> Since it’s related to the logging discussion we’re already having, I 
>> have seen stream pipelines showing up in a lot of traces recently. I am 
>> surprised; I thought it was understood that they shouldn’t be used on 
>> hot paths as they are not typically as efficient as old skool for-each 
>> constructions done sensibly, especially for small collections that may 
>> normally take zero or one items.
>> 
>> I would like to propose forbidding the use of streams on hot paths 
>> without good justification that the cost:benefit is justified. 
>> 
>> It looks like it was nominally agreed two years ago that we would 
>> include words to this effect in the code style guide, but I forgot to 
>> include them when I transferred the new contents from the Google Doc 
>> proposal. So we could just include the “Performance” section that was 
>> meant to be included at the time.
>> 
>> lists.apache.org

Re: [DISCUSS] Adding experimental vtables and rules around them

2024-05-29 Thread Abe Ratnofsky
I agree that ClientWarning is the best way to indicate the risk of using an 
experimental feature directly to the user. Presenting information in the client 
application's logs directly means that the person who wrote the query is most 
likely to see the warning, rather than an operator who sees cluster logs.

I don't think it's necessary to attach a ClientWarning to every single client 
response; a ClientWarning analog to NoSpamLogger would be useful for this 
("warn a client at most once per day").

This would also be useful for warning on usage of deprecated features.
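
Roughly what I have in mind, as an illustrative sketch only - it assumes 
ClientWarn.instance.warn(String) as the delivery mechanism (which exists today); 
the rate-limiting wrapper itself is hypothetical:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.TimeUnit;
    import org.apache.cassandra.service.ClientWarn;

    // Hypothetical helper: attach a given warning to at most one client response per interval.
    final class NoSpamClientWarn
    {
        private static final ConcurrentHashMap<String, Long> lastWarnedNanos = new ConcurrentHashMap<>();

        static void warn(String warning, long interval, TimeUnit unit)
        {
            long now = System.nanoTime();
            long intervalNanos = unit.toNanos(interval);
            // compute() returns 'now' only for the caller that actually advanced the timestamp,
            // so exactly one caller per interval emits the warning
            Long updated = lastWarnedNanos.compute(warning, (k, last) ->
                (last == null || now - last >= intervalNanos) ? now : last);
            if (updated == now)
                ClientWarn.instance.warn(warning);
        }
    }

Keyed on the warning text here for simplicity; keying on the experimental table 
(or deprecated feature) name would keep the map small.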

> On May 29, 2024, at 3:01 PM, David Capwell  wrote:
> 
> We agreed a long time ago that all new features are disabled by default, but 
> I wanted to try to flesh out what we “should” do with something that might 
> still be experimental and subject to breaking changes; I would prefer we keep 
> this thread specific to vtables as the UX is different for different types of 
> things…
> 
> So, lets say we are adding a set of vtables but we are not 100% sure what the 
> schema should be and we learn after the release that changes should be made, 
> but that would end up breaking the table… we currently define everything as 
> “don’t break this” so if we publish a table that isn’t 100% baked we are 
> kinda stuck with it for a very very long time… I would like to define a way 
> to expose vtables that are subject to change (breaking schema changes) across 
> different release and rules around them (only in minor?  Maybe even in 
> patch?).
> 
> Lets try to use a concrete example so everyone is on the same page.
> 
> Accord is disabled by default (it is a new feature), so the vtables to expose 
> internals would be expected to be undefined and not present on the instance.
> 
> When accord is enabled (accord.enabled = true) we add a set of vtables:
> 
> Epochs - shows what epochs are known to accord
> Cache - shows how the internal caches are performing
> Etc.
> 
> Using epochs as an example it currently only shows a single column: the long 
> epoch
> 
> CREATE VIRTUAL TABLE system_accord.epochs (epoch bigint PRIMARY KEY);
> 
> Lets say we find that this table isn’t enough and we really need to scope it 
> to each of the “stores” (threads for processing accord tasks)
> 
> CREATE VIRTUAL TABLE system_accord.epochs (epoch bigint, store_id int, 
> PRIMARY KEY (epoch, store_id));
> 
> In this example the table changed the schema in a way that could break users, 
> so this normally is not allowed.
> 
> Since we don’t really have a way to define something experimental other than 
> NEWS.txt, we kinda get stuck with this table and are forced to make new 
> versions and maintain them for a long time (in this example we would have 
> epochs and epochs_v2)… it would be nice if we could define a way to express 
> that tables are free to be changed (modified or even deleted) and the life 
> cycle for them….
> 
> I propose that we allow such a case and make changes to the UX (as best as we 
> can) to warn about this:
> 
> 1) update NEWS.txt to denote that the feature is experimental
> 2) when you access an experimental table you get a ClientWarning stating that 
> this is free to change
> 3) the tables comments starts with “[EXPERIMENTAL]”
> 
> What do others think?
> 
> 



Re: [DISCUSS] ccm as a subproject

2024-05-15 Thread Abe Ratnofsky
Strong supporter for bringing ccm into the project as well. ccm is necessary 
test infrastructure for multiple subprojects, and Cassandra committers should 
be able to make the changes to ccm that are necessary for their patches.

There's also the security angle: we should work to consolidate our dependencies 
where appropriate, and reduce the risk of supply chain antics.

> On May 15, 2024, at 4:25 PM, Bret McGuire  wrote:
> 
>Speaking only for myself I _love_ this idea.  The various drivers use ccm 
> extensively in their integration test suites so having this tool in-house and 
> actively looked after would be very beneficial for our work.
> 
>- Bret -
> 
> On Wed, May 15, 2024 at 9:23 AM Josh McKenzie  > wrote:
>> Right now ccm isn't formally a subproject of Cassandra or under governance 
>> of the ASF. Given it's an integral components of our CI as well as for local 
>> testing for many devs, and we now have more experience w/our muscle on IP 
>> clearance and ingesting / absorbing subprojects where we can't track down 
>> every single contributor to get an ICLA, seems like it might be worth 
>> revisiting the topic of donation of ccm to Apache.
>> 
>> For what it's worth, Sylvain originally and then DataStax after transfer 
>> have both been incredible and receptive stewards of the projects and repos, 
>> so this isn't about any response to any behavior on their part. 
>> Structurally, however, it'd be better for the health of the project(s) 
>> long-term to have ccm promoted in. As far as I know there was strong 
>> receptivity to that donation in the past but the IP clearance was the 
>> primary hurdle.
>> 
>> Anyone have any thoughts for or against?
>> 
>> https://github.com/riptano/ccm
>> 



Re: Welcome Alexandre Dutra, Andrew Tolbert, Bret McGuire, Olivier Michallat as Cassandra Committers

2024-04-17 Thread Abe Ratnofsky
Congrats everyone!

> On Apr 17, 2024, at 1:10 PM, Benjamin Lerer  wrote:
> 
> The Apache Cassandra PMC is pleased to announce that Alexandre Dutra, Andrew 
> Tolbert, Bret McGuire and Olivier Michallat have accepted the invitation to 
> become committers on the java driver sub-project. 
> 
> Thanks for your contributions to the Java driver during all those years!
> Congratulations and welcome!
> 
> The Apache Cassandra PMC members



Cassandra Code Coverage Reports

2024-04-11 Thread Abe Ratnofsky
Hey folks,

Recently I put together per-suite and consolidated code coverage reports in 
advance of our upcoming releases. I’ve uploaded them to a static site on GitHub 
Pages so you can view them without downloading anything: 
https://aber.io/cassandra-coverage-reports

Disclaimers: This is just for informational purposes. I’m not advocating for 
any changes to the contribution process. I wanted to share this data as we go 
into qualification for our next major release, and hopefully start a discussion 
on how we can continue to improve our testing. I recognize that coverage 
is not a sole indicator of code quality or defect rate.

Potential areas for discussion:

- How can we improve the coverage of our fuzz suite?
- Are the 5.0 upgrade risks adequately covered? In particular, things like 
Config compatibility, Schema compatibility, mixed-version operation, etc.

These coverage reports are based on a fork of trunk with merge base 
16b43e4d4bd4b49029c0fc360bae1e732a7d5aae. The branch used to produce these 
reports is available here: 
https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:coverage-reports

I’m looking to merge these changes into trunk since they include fixes for a 
few issues that prevented correct coverage collection in the past, particularly 
for jvm-dtest-upgrade and jvm-dtest-fuzz suites. Feedback welcome.

--
Abe

Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-04-04 Thread Abe Ratnofsky
CEP-8 proposes using separate Jira projects per Cassandra sub-project:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation

> We suggest distinct Jira projects, one per driver, all to be created.

I don't see any discussion changing that from the [DISCUSS] or vote threads:
https://lists.apache.org/thread/01pljcncyjyo467l5orh8nf9okrh7oxm
https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
https://lists.apache.org/thread/crolkrhd4y6tt3k4hsy204xomshlcp4p

But looks like upon acceptance that was changed:
https://lists.apache.org/thread/dhov01s8dvvh3882oxhkmmfv4tqdd68o

> New issues will be tracked under the CASSANDRA project on Apache’s JIRA under 
> the component ‘Client/java-driver’.

I'm in favor of using the same Jira as Cassandra proper. Committership is 
project-wide, so having a standardized process (same ticket flow, review rules, 
labels, etc.) is beneficial. But multiple votes happened based on the content 
of the CEP, so we should stick to what was voted on and move to a separate Jira.

--
Abe

Re: [DISCUSS] Fixing coverage reports for jvm-dtest-upgrade

2024-03-19 Thread Abe Ratnofsky
Here's where ClassLoader names were introduced in JDK9: 
https://bugs.java.com/bugdatabase/view_bug?bug_id=6479237

And where Jacoco introduced support for the exclclassloader agent option: 
https://github.com/jacoco/jacoco/commit/b8ee4efe9c2ba93485fe5d9e25340113efc2390b

My understanding is that names only exist to improve error messages 
(ClassCastException specifically) and to support other diagnostic features, like 
Jacoco's exclusions. I am not aware of any behavior-impacting reasons to name a 
ClassLoader, or anything this would break by adding ClassLoader names, but I'm 
testing my changes now and will report back. Naming is done just by overriding 
ClassLoader.getName: 
https://docs.oracle.com/javase/9/docs/api/java/lang/ClassLoader.html#getName--
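
The shape of the change is roughly the following (illustrative only - the real 
InstanceClassLoader lives in the in-jvm dtest API, and this is not its actual 
code):

    import java.net.URL;
    import java.net.URLClassLoader;

    // Illustrative: a dtest-style classloader that reports a version-qualified name,
    // so tooling that filters by classloader name can tell current-version loaders
    // apart from alternate-version ones.
    class VersionedInstanceClassLoader extends URLClassLoader
    {
        private final String name;

        VersionedInstanceClassLoader(String version, int nodeId, URL[] urls, ClassLoader parent)
        {
            super(urls, parent);
            this.name = "InstanceClassLoader[" + version + ",node" + nodeId + ']';
        }

        @Override
        public String getName()
        {
            return name;
        }
    }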

> On Mar 18, 2024, at 3:48 PM, David Capwell  wrote:
> 
> Are there any side effects to naming ClassLoaders?  How do we do this?
> 
> I am +1 to the idea but don’t know enough to know what negatives could happen.
> 
>> On Mar 17, 2024, at 7:25 AM, Josh McKenzie  wrote:
>> 
>> +1 from me
>> 
>>> If classloaders are appropriately named in the current versions of 
>>> Cassandra, we should be able to test upgrade paths to that version without 
>>> updating the older branches or building new jvm-dtest JARs for them.
>> Pretty sure Caleb was wrestling w/something recently that might have 
>> benefited from being able to differentiate which ClassLoader was sad; in 
>> general this seems like it'd be a big help to debugging startup / env 
>> issues, assuming it doesn't break anything. :)
>> 
>> On Fri, Mar 15, 2024, at 4:41 PM, Abe Ratnofsky wrote:
>>> Hey folks,
>>> 
>>> I'm working on gathering coverage data across all of our test suites. The 
>>> jvm-dtest-upgrade suite is currently unfriendly to Jacoco: it uses classes 
>>> from multiple versions of Cassandra but with the same class names, so 
>>> Jacoco's analysis fails due to "Can't add different class with same 
>>> name"[1]. We need a way to exclude alternate-version classes from Jacoco's 
>>> analysis, so we can get coverage for the current version of Cassandra.
>>> 
>>> Jacoco supports exclusion of classes based on class name or classloader 
>>> name, but the class names are usually identical across Cassandra 
>>> versions. The jvm-dtest framework doesn't name classloaders, so we can't 
>>> use that either.
>>> 
>>> I'd like to propose that we name the jvm-dtest InstanceClassLoader 
>>> instances so that some can be excluded from Jacoco's analysis. Instances 
>>> that create new InstanceClassLoaders should be able to provide an immutable 
>>> name in the constructor. InstanceClassLoader names should indicate whether 
>>> they're for the current version of Cassandra (where coverage should be 
>>> collected) or an alternate version. If classloaders are appropriately named 
>>> in the current versions of Cassandra, we should be able to test upgrade 
>>> paths to that version without updating the older branches or building new 
>>> jvm-dtest JARs for them.
>>> 
>>> Any objections or alternate approaches?
>>> 
>>> --
>>> Abe
>>> 
>>> [1]: More specifically: Jacoco uses class IDs to map the analysis data 
>>> that's produced during test execution to .class files. I'm configuring the 
>>> Jacoco agent's classdumpdir to ensure that the classes loaded during 
>>> execution are the same classes that are analyzed during report generation, 
>>> as is recommended. When we build the alternate version JARs for 
>>> jvm-dtest-upgrade, we end up with multiple classes with the same name but 
>>> different IDs.
> 



[DISCUSS] Fixing coverage reports for jvm-dtest-upgrade

2024-03-15 Thread Abe Ratnofsky
Hey folks,

I'm working on gathering coverage data across all of our test suites. The 
jvm-dtest-upgrade suite is currently unfriendly to Jacoco: it uses classes from 
multiple versions of Cassandra but with the same class names, so Jacoco's 
analysis fails due to "Can't add different class with same name"[1]. We need a 
way to exclude alternate-version classes from Jacoco's analysis, so we can get 
coverage for the current version of Cassandra.

Jacoco supports exclusion of classes based on class name or classloader name, 
but the class names are usually identical across Cassandra versions. 
The jvm-dtest framework doesn't name classloaders, so we can't use that either.

I'd like to propose that we name the jvm-dtest InstanceClassLoader instances so 
that some can be excluded from Jacoco's analysis. Instances that create new 
InstanceClassLoaders should be able to provide an immutable name in the 
constructor. InstanceClassLoader names should indicate whether they're for the 
current version of Cassandra (where coverage should be collected) or an 
alternate version. If classloaders are appropriately named in the current 
versions of Cassandra, we should be able to test upgrade paths to that version 
without updating the older branches or building new jvm-dtest JARs for them.

Any objections or alternate approaches?

--
Abe

[1]: More specifically: Jacoco uses class IDs to map the analysis data that's 
produced during test execution to .class files. I'm configuring the Jacoco 
agent's classdumpdir to ensure that the classes loaded during execution are the 
same classes that are analyzed during report generation, as is recommended. 
When we build the alternate version JARs for jvm-dtest-upgrade, we end up with 
multiple classes with the same name but different IDs.

Re: [Discuss] Introducing Flexible Authentication in Cassandra via Feature Flag

2024-02-12 Thread Abe Ratnofsky
Hey Gaurav,

Thanks for your proposal.

> disruptive, full-cluster restart, posing significant risks in live 
> environments

For configuration that isn't hot-reloadable, like providing a new 
IAuthenticator implementation, a rolling restart is required. But rolling 
restarts are zero-downtime and safe in production, as long as you pace them 
accordingly.

In general, changing authenticators is a risky thing because it requires 
coordination with clients. To mitigate this risk and support clients while they 
transition between authenticators, I like the approach taken by 
MutualTlsWithPasswordFallbackAuthenticator:
https://github.com/apache/cassandra/blob/bec6bfde1f3b6a782f123f9f9ff18072a97e379f/src/java/org/apache/cassandra/auth/MutualTlsWithPasswordFallbackAuthenticator.java#L34

If client certificates are available, then use those, otherwise use the 
existing PasswordAuthenticator that clients are already using. The existing 
IAuthenticator interface supports this transitional behavior well.

Your proposal to include a new configuration for auth_enforcement_flag doesn't 
clearly cover how to transition from one authenticator to another. It says:

> Soft: Operates in a monitoring mode without enforcing authentication

Most users use authentication today, so auth_enforcement_flag=Soft would allow 
unauthenticated clients to connect to the database.

--
Abe

> On Feb 12, 2024, at 2:44 PM, Gaurav Agarwal  wrote:
> 
> Dear Cassandra Community,
> 
> I'm excited to share a proposal for a new feature that I believe would 
> significantly enhance the platform's security and operational flexibility: a 
> flexible authentication mechanism implemented through a feature flag .
> 
> Currently, enforcing authentication in Cassandra requires a disruptive, 
> full-cluster restart, posing significant risks in live environments. My 
> proposal, the auth_enforcement_flag, addresses this challenge by offering 
> three modes:
> 
> Hard: Enforces strict authentication with detailed logging.
> Soft: Monitors connection attempts (valid and invalid) without enforcing 
> authentication.
> None: Maintains the current Cassandra behavior.
> 
> This flag enables:
> Minimized downtime: Seamless authentication rollout without service 
> interruptions.
> Enhanced security: Detailed logs for improved threat detection and 
> troubleshooting.
> Gradual adoption: Phased implementation with real-world feedback integration.
> 
> I believe this feature provides substantial benefits for both users and 
> administrators. Please see the detailed proposal here: Introducing flexible 
> authentication mechanism 
> 
> 
> I warmly invite the community to review this proposal and share your valuable 
> feedback. I'm eager to discuss its potential impact and collaborate on making 
> Cassandra even better.
> 
> Thank you for your time and consideration.
> 
> Sincerely,
> Gaurav Agarwal
> Software Engineer at Uber



Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-08 Thread Abe Ratnofsky
> Deprecating helps nothing for existing releases. We can’t/shouldn’t remove 
> the feature in existing releases.

The deprecation I'm proposing is intended to push people to configure their 
servers in a way that is more secure and maximizes compatibility with clients. 
Deprecating can help for existing releases - the better configuration already 
exists, and it's likely that users of dual-native-port optional SSL can use it. 
At the very least, users should be made aware of the risks of dual-native-port 
operation.

Currently, if a user specifies the following server configuration:
- client_encryption_options.enabled=true
- client_encryption_options.optional=false
- native_transport_port != native_transport_port_ssl

then the server will still handle unencrypted traffic on native_transport_port. 
This feels like a security risk: it would be reasonable to interpret that this 
configuration requires all traffic to be encrypted.

And if a user specifies this server configuration:
- client_encryption_options.enabled=true
- client_encryption_options.optional=true
- native_transport_port != native_transport_port_ssl

then clients can still send unencrypted traffic to native_transport_port_ssl, 
since the server handles optional encryption on this port. In this case, there 
are two ports that accept unencrypted traffic, one of which also accepts 
encrypted traffic.

In both cases:
- Clients configured to use SSL will discover non-SSL ports from 
system.peers_v2 and fail to connect to those hosts, causing single-connection 
sessions and no load balancing
- Clients configured to use SSL will fail to interpret server-reported topology 
and status events because those events currently include non-SSL ports, causing 
user connection pools to incorrectly track cluster changes

I'm proposing that in the dual-native-port case, we log a warning to advise 
clients to move to a single port, with the same values for enabled and 
optional. With a single port, users won't need to worry about 
native_transport_port always accepting unencrypted traffic. The peers_v2 system 
table and EVENT response messages will include ports that clients will be able 
to connect to regardless of their SSL configuration.

I'm open to discussing other ways we could handle this, but I think the 
requirements are:
- Maintain compatibility with existing clients (no new tables for discovering 
peers, etc.)
- Ensure SSL and non-SSL sessions can continue to operate (with >1 pooled 
connections) without disruption

Thanks to Stefan for his efforts looking into this more closely with me 
yesterday. We found out how this came about as well - when the project moved to 
support dual-native-port in CASSANDRA-9590, driver configurations 
were expected to include hard-coded ports for encrypted and unencrypted 
traffic. Then, when customizable per-host native ports came out in 
CASSANDRA-7554, drivers 
were expected to discover the right native protocol port from the system table 
instead of hard-coding it. So this has been a problem since 4.0 for users 
running dual-native-port and clients requiring SSL.

--
Abe

Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-07 Thread Abe Ratnofsky
> For existing versions what about having a “default ssl” or “default no SSL” 
> yaml setting which decides what port is advertised?  Then someone could still 
> connect on the other port manually specifying.  Then new column can be added 
> with the new table in trunk.

I'm assuming "advertisement" here includes which port is in system.peers(_v2) 
and is included in status and topology events.

If dual-native-port is enabled, a client is connecting unencrypted to the 
non-SSL port, and "advertise-native-port=ssl" (name pending) is enabled, then 
when that client fetches peers it will get the SSL ports, right? If the client 
doesn't support SSL, then those subsequent connections will fail. An operator 
would have to set "advertise-native-port=ssl" and override the port options in 
all clients, which isn't feasible.

It might be possible to move to a system view, and return the right native port 
depending on whether the connection the query originates from is SSL or not. So 
if a control connection is over SSL, then ports in system_views.peers_ssl_aware 
(name pending) would be SSL ports. But having query responses dependent on 
connection state seems like a recipe for a very weird bug in the future.

I still think deprecating dual-native-port is the cleanest and most compatible 
solution, especially since it would address a few other related bugs as well. 
I'm considering a separate thread to get consensus on that.

Abe

Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-07 Thread Abe Ratnofsky
CASSANDRA-9590 (Support for both encrypted and unencrypted native transport 
connections) was implemented before CASSANDRA-10559 (Support encrypted and 
plain traffic on the same port), but both have been available since 3.0.

On 9590, STARTTLS was considered, but rejected due to the changes that would be 
required to support it from all drivers. But the current server implementation 
doesn't require STARTTLS: the client is expected to send the first message over 
the connection, so the server can just check if that message is encrypted, and 
then enable the Netty pipeline's SslHandler.
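
The sniffing step looks roughly like the sketch below (illustrative only, not 
the actual handler in the codebase), using Netty's SslHandler.isEncrypted to 
peek at the first record:

    import java.util.List;

    import io.netty.buffer.ByteBuf;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.handler.codec.ByteToMessageDecoder;
    import io.netty.handler.ssl.SslContext;
    import io.netty.handler.ssl.SslHandler;

    // Illustrative: decide per-connection whether to install an SslHandler, based on
    // whether the first bytes the client sends look like a TLS record.
    class OptionalTlsDetector extends ByteToMessageDecoder
    {
        private final SslContext sslContext;

        OptionalTlsDetector(SslContext sslContext)
        {
            this.sslContext = sslContext;
        }

        @Override
        protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out)
        {
            if (in.readableBytes() < 5)
                return; // wait until enough bytes are buffered to sniff a TLS record header

            if (SslHandler.isEncrypted(in))
            {
                // Looks like TLS: swap ourselves out for a real SslHandler on this connection
                ctx.pipeline().replace(this, "ssl", sslContext.newHandler(ctx.alloc()));
            }
            else
            {
                // Plain CQL: remove ourselves and let the rest of the pipeline handle it
                ctx.pipeline().remove(this);
            }
        }
    }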

The implementation in 10559 is compatible with existing clients, and is already 
used widely. Are there any reasons for users to stick with dual-native-port 
rather than a single port that supports both encrypted and unencrypted traffic?

Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-07 Thread Abe Ratnofsky
What is the audience for dual-native-port operation? My understanding is that 
most users can use a single port for optional SSL, ever since CASSANDRA-10559. 
Using a single port 
for both encrypted and unencrypted traffic also makes clients more likely to 
behave correctly, since status and topology events (which identify hosts by 
their native address and port) will correctly identify host+port pairs that 
exist in user load balancing policies and connection pools. Dual-port operation 
appears to behave incorrectly against apache/cassandra-java-driver 4.x, as 
discussed in the #cassandra-drivers Slack channel 
. If this 
does not behave correctly in the official driver, it's likely there are bugs in 
other drivers' handling of dual-native-port clusters as well.

Would there be any appetite for deprecating dual-native-port operation?

> Also, if there is currently a user who e.g. reads from peers_v2 table and 
> retrieves the value from a column by some index, then this would be "shifted" 
> and it might break her reading.

Given that peers_v2 is intended to be used by the internals of client 
implementations, we should be extra cautious about making changes. We shouldn't 
change the column index of any existing columns - if we're going to add a new 
column, it should be in the end position.

Abe

Re: CASSANDRA-19268: Improve Cassandra compression performance using hardware accelerators

2024-01-22 Thread Abe Ratnofsky
Hardware acceleration for more things would be great, especially based on the 
success of ACCP (CASSANDRA-18624). But I think it would 
be ideal to use existing compressor names and use hardware acceleration if a 
given JAR is present on the classpath / configured, like ACCP. Then hosts with 
varying hardware acceleration support can still interoperate (stream compressed 
SSTables between each other, for example) and migrating existing systems to use 
the new compressors would be as simple as ensuring a JAR is present / 
configured, not requiring a change to table options.

See org.apache.cassandra.security.DefaultCryptoProvider for example.
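
For reference, the ACCP model is roughly the following (a minimal sketch of 
standard JCA provider installation, not Cassandra code): callers keep using the 
same algorithm names and transparently get the accelerated implementation 
whenever the provider JAR is present and installed.

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    import com.amazon.corretto.crypto.provider.AmazonCorrettoCryptoProvider;

    public class AccpInstallSketch
    {
        public static void main(String[] args) throws NoSuchAlgorithmException
        {
            // Register ACCP as the highest-priority JCA provider; anything it doesn't
            // implement falls back to the stock JDK providers automatically.
            AmazonCorrettoCryptoProvider.install();

            // Callers are unchanged: same algorithm name, accelerated implementation when available.
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            System.out.println(md.getProvider().getName());
        }
    }

The suggestion above is to apply the same model to compressors: keep the 
existing compressor names in table options, and pick the accelerated 
implementation at runtime when the supporting JAR and hardware are present.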

> On Jan 19, 2024, at 1:51 PM, Kokoori, Shylaja  
> wrote:
> 
> Hi,
> Latest processors have integrated hardware accelerators which can speed up 
> operations like compress/decompress, crypto and analytics. Here are some 
> links to details
> 1) https://cdrdv2.intel.com/v1/dl/getContent/721858
> 2) 
> https://www.intel.com/content/www/us/en/content-details/780887/intel-in-memory-analytics-accelerator-intel-iaa.html
>  
> We would like to add a new compressor which can accelerate 
> compress/decompress when hardware is available and which will default to 
> software otherwise.
>  
> Thanks,
> Shylaja



Re: Welcome Maxim Muzafarov as Cassandra Committer

2024-01-08 Thread Abe Ratnofsky
Congrats Maxim!

> On Jan 8, 2024, at 1:28 PM, Ekaterina Dimitrova  wrote:
> 
> Congrats, Maxim! Well deserved! Thank you for everything!
> 
> On Mon, 8 Jan 2024 at 13:26, Jeremiah Jordan  > wrote:
>> Congrats Maxim!  Thanks for all of your contributions!
>> 
>> On Jan 8, 2024 at 12:19:04 PM, Josh McKenzie > > wrote:
>>> The Apache Cassandra PMC is pleased to announce that Maxim Muzafarov has 
>>> accepted
>>> the invitation to become a committer.
>>> 
>>> Thanks for all the hard work and collaboration on the project thus far, and 
>>> we're all looking forward to working more with you in the future. 
>>> Congratulations and welcome!
>>> 
>>> The Apache Cassandra PMC members
>>> 
>>> 



Re: [DISCUSSION] CEP-38: CQL Management API

2023-12-05 Thread Abe Ratnofsky
Adding to Hari's comments:

> Any changes expected at client/driver side? While using JMX/nodetool, it is 
> clear that the command/operations are getting executed against which 
> Cassandra node. But a client can connect to multiple hosts and trigger 
> queries, then how can it ensure that commands are executed against the 
> desired Cassandra instance?

Clients are expected to set the node for the given CQL statement in cases like 
this; see docstring for example: 
https://github.com/apache/cassandra-java-driver/blob/4.x/core/src/main/java/com/datastax/oss/driver/api/core/cql/Statement.java#L124-L147
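
For example, with the 4.x Java driver, pinning a statement to a node looks 
roughly like this (a minimal sketch; how the caller chooses the Node is elided):

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.SimpleStatement;
    import com.datastax.oss.driver.api.core.metadata.Node;

    final class NodeTargetedExecution
    {
        // Run a single statement against a specific node, e.g. a node-local management command.
        static void runOnNode(CqlSession session, Node node, String cql)
        {
            SimpleStatement statement = SimpleStatement.newInstance(cql).setNode(node);
            session.execute(statement);
        }
    }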

> The term COMMAND is a bit abstract I feel (subjective). Some of the examples 
> quoted are referring to updating settings (for example: EXECUTE COMMAND 
> setconcurrentcompactors WITH concurrent_compactors=5;) and some are referring 
> to operations. Updating settings and running operations are considerably 
> different things. They may have to be handled in their own way. And I also 
> feel the settings part is overlapping with virtual tables. If virtual tables 
> support writes (at least the settings virtual table), then settings can be 
> updated using the virtual table itself.

I agree with this - I actually think it would be clearer if this was referred 
to as nodetool, if the set of commands is going to be largely based on nodetool 
at the beginning. There is a lot of documentation online that references 
nodetool by name, and changing the nomenclature would make that existing 
documentation harder to understand. If a user can understand this as "nodetool, 
but better and over CQL not JMX" I think that's a clearer transition than a new 
concept of "commands".

I understand that this proposal includes more than just nodetool, but there's a 
benefit to having a tool with a name, and a web search for "cassandra commands" 
is going to have more competition and ambiguity.

[DISCUSS] CASSANDRA-19113: Publishing dtest-shaded JARs on release

2023-11-28 Thread Abe Ratnofsky
Hey folks - wanted to raise a separate thread to discuss publishing of 
dtest-shaded JARs on release.

Currently, adjacent projects that want to use the jvm-dtest framework need to 
build the shaded JARs themselves. This is a decent amount of work, and is 
duplicated across each project. This is mainly relevant for projects like 
Sidecar and Driver. Currently, those projects need to clone and build 
apache/cassandra themselves, run ant dtest-jar, and move the JAR into the 
appropriate place. Different build systems treat local JARs differently, and 
the whole process can be a bit complicated. Would be great to be able to treat 
these as normal dependencies.

https://issues.apache.org/jira/browse/CASSANDRA-19113

Any objections?

--
Abe

Re: Welcome Francisco Guerrero Hernandez as Cassandra Committer

2023-11-28 Thread Abe Ratnofsky
Congrats Francisco!

> On Nov 28, 2023, at 1:56 PM, C. Scott Andreas  wrote:
> 
> Congratulations, Francisco!
> 
> - Scott
> 
>> On Nov 28, 2023, at 10:53 AM, Dinesh Joshi  wrote:
>> 
>> The PMC members are pleased to announce that Francisco Guerrero Hernandez 
>> has accepted
>> the invitation to become committer today.
>> 
>> Congratulations and welcome!
>> 
>> The Apache Cassandra PMC members



Re: [DISCUSS] Harry in-tree

2023-11-28 Thread Abe Ratnofsky
Another strong +1 to have Harry in-tree, and another +1 to building shaded 
dtest JARs on release. There are a number of projects that would benefit from 
having these JARs available in a central repository, like Sidecar, Driver, etc. 
I didn't see a ticket so created one: 
https://issues.apache.org/jira/browse/CASSANDRA-19113

> On Nov 28, 2023, at 3:12 AM, Alex Petrov  wrote:
> 
> Sure, that should be possible. I will check and will get back.
> 
> On Mon, Nov 27, 2023, at 10:10 PM, Ekaterina Dimitrova wrote:
>> +1, also, Alex, just an idea - maybe you want to make a virtual talk, as 
>> part of the contributors meetings? 
>> 
>> 
>> On Monday, 27 November 2023, Yifan Cai wrote:
>> +1
>> 
>> 
>> From: Sam Tunnicliffe <s...@beobal.com>
>> Sent: Tuesday, November 28, 2023 2:43:51 AM
>> To: dev <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] Harry in-tree
>>  
>> Definite +1 to bringing harry-core in tree.
>> 
>>> On 24 Nov 2023, at 15:43, Alex Petrov >> > wrote:
>>> 
>>> Hi everyone,
>>> 
>>> With TCM landed, there will be way more Harry tests in-tree: we are using 
>>> it for many coordination tests, and there's now a simulator test that uses 
>>> Harry. During development, Harry has allowed us to uncover and resolve 
>>> numerous elusive edge cases.
>>> 
>>> I had conversations with several folks, and wanted to propose to move 
>>> harry-core to Cassandra test tree. This will substantially 
>>> simplify/streamline co-development of Cassandra and Harry. With a new 
>>> HistoryBuilder API that has helped to find and trigger [1] [2] and [3], it 
>>> will also be much more approachable.
>>> 
>>> Besides making it easier for everyone to develop new fuzz tests, it will 
>>> also substantially lower the barrier to entry. Currently, debugging an 
>>> issue found by Harry involves a cumbersome process of rebuilding and 
>>> transferring jars between Cassandra and Harry, depending on which side you 
>>> modify. This not only hampers efficiency but also deters broader adoption. 
>>> By merging harry-core into the Cassandra test tree, we eliminate this 
>>> barrier.
>>> 
>>> Thank you,
>>> --Alex
>>> 
>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-19011
>>> [2] https://issues.apache.org/jira/browse/CASSANDRA-18993
>>> [3] https://issues.apache.org/jira/browse/CASSANDRA-18932



Re: Development Dependencies documentation.

2023-11-01 Thread Abe Ratnofsky
Following back up here - patch has been updated and is ready for review: 
https://github.com/apache/cassandra-website/pull/170

> On Oct 25, 2023, at 8:07 AM, Ekaterina Dimitrova  
> wrote:
> 
> Hi Claude,
> You are not wrong. Unfortunately, it is outdated. Abe Ratnofsky has a work in 
> progress patch. You might want to get in touch with him to finish it.
> Best regards,
> Ekaterina
> 
> On Wed, 25 Oct 2023 at 8:04, Claude Warren, Jr via dev 
> <dev@cassandra.apache.org> wrote:
>> I just had to change dependencies in Cassandra for the first  time and I 
>> think the documentation [1] is out of date.
>> 
>> First I think most of the file edits are in the ".build" directory.  Adding 
>> jars to the "lib" directory works until calling "ant realclean", so perhaps 
>> the instructions should include regenerating the "lib" folder after making 
>> the edits.
>> 
>> If I am wrong please let me know, otherwise I will open a ticket and update 
>> the documentation.
>> 
>> [1] https://cassandra.apache.org/_/development/dependencies.html



Re: [DISCUSSION] Dependency management in our first alpha release

2023-08-23 Thread Abe Ratnofsky
> I also want to hear if Abe still has concerns about not following deprecation 
> process here.

I support removing the library on an expedited schedule, rather than waiting 
for a full major of deprecation. We still have a large surface for metrics 
integrations, and users who depended on metrics-reporter-config will have a 
path forward if they need similar functionality.

> On Aug 23, 2023, at 07:28, Ekaterina Dimitrova  wrote:
> 
> I also want to hear if Abe still has concerns about not following deprecation 
> process here. 


Re: [DISCUSSION] CASSANDRA-18772 - removal of commons-codec on trunk

2023-08-17 Thread Abe Ratnofsky
If we're going to do bulk dependency pruning, we should minimize the number of 
deprecation plans that users need to prepare for. There will likely be a few 
more dependencies we clean up around 5.0, so sticking with 5.0 deprecation and 
6.0 removal for all of them would likely make our users' lives easier.

If there's new information about a security issue in a dependency and no clear 
alternative, I'd be open to an expedited removal plan as an exception, but that 
would be on a case-by-case basis.

> On Aug 17, 2023, at 10:10 AM, Ekaterina Dimitrova  
> wrote:
> 
> Hi everyone,
> 
> I propose we remove commons-codec on trunk.
> The only usage I found was from CASSANDRA-12790 - Support InfluxDb 
> metrics reporter configuration, which relied on commons-codec and 
> metrics-reporter-config, which will be removed as part of CASSANDRA-18743. 
> The only question is whether we can remove those two dependencies on trunk, 
> considering it is 5.1, or do we need to wait until 6.0. 
> 
> Best regards,
> Ekaterina



Re: [DISCUSS] CASSANDRA-18743 Deprecation of metrics-reporter-config

2023-08-16 Thread Abe Ratnofsky
There's consensus here to deprecate metrics-reporter-config in 5.0.

Is there any objection to removing it in 5.1?

> On Aug 11, 2023, at 10:01 AM, Maxim Muzafarov  wrote:
> 
> +1
> 
> The rationale for deprecating/removing this library is not just that
> it is obsolete and doesn't get updates. In fact, when the
> metrics-reporter-config [1] was added the dropwizard metrics library
> (formerly com.yammer.metrics [2]) didn't support exporting metrics to
> files like csv, so it made sense at that time. Now it is fully covered
> by the dropwizard reporters [3], so users can achieve the same
> behaviour without the need for metrics-reporter-config. And that's why
> I have a lot of doubts about it being used by anyone, but deprecation
> is friendlier because there's no rush to remove it. :-)
> 
> 
> [1] https://issues.apache.org/jira/browse/CASSANDRA-4430
> [2] https://issues.apache.org/jira/browse/CASSANDRA-5838
> [3] https://metrics.dropwizard.io/4.2.0/getting-started.html#other-reporting
> 
> On Fri, 11 Aug 2023 at 16:50, Caleb Rackliffe  
> wrote:
>> 
>> +1
>> 
>>> On Aug 11, 2023, at 8:10 AM, Brandon Williams  wrote:
>>> 
>>> +1
>>> 
>>> Kind Regards,
>>> Brandon
>>> 
>>>> On Fri, Aug 11, 2023 at 8:08 AM Ekaterina Dimitrova
>>>>  wrote:
>>>> 
>>>> 
>>>> “ The rationale for this proposed deprecation is that the upcoming 5.0 
>>>> release is a good time to evaluate dependencies that are no longer 
>>>> receiving updates and will become risks in the future.”
>>>> 
>>>> Thank you for raising it, I support your proposal for deprecation
>>>> 
>>>>> On Fri, 11 Aug 2023 at 8:55, Abe Ratnofsky  wrote:
>>>>> 
>>>>> Hey folks,
>>>>> 
>>>>> Opening a thread to get input on a proposed dependency deprecation in 
>>>>> 5.0: metrics-reporter-config has been archived for 3 years and not 
>>>>> updated in nearly 6 years.
>>>>> 
>>>>> This project has a minor security issue with its usage of unsafe YAML 
>>>>> loading via snakeyaml’s unprotected Constructor: 
>>>>> https://nvd.nist.gov/vuln/detail/CVE-2022-1471
>>>>> 
>>>>> This CVE is reasonable to suppress, since operators should be able to 
>>>>> trust their YAML configuration files.
>>>>> 
>>>>> The rationale for this proposed deprecation is that the upcoming 5.0 
>>>>> release is a good time to evaluate dependencies that are no longer 
>>>>> receiving updates and will become risks in the future.
>>>>> 
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18743
>>>>> 
>>>>> —
>>>>> Abe
>>>>> 



[DISCUSS] CASSANDRA-18743 Deprecation of metrics-reporter-config

2023-08-11 Thread Abe Ratnofsky
Hey folks,

Opening a thread to get input on a proposed dependency deprecation in 5.0: 
metrics-reporter-config has been archived for 3 years and not updated in nearly 
6 years.

This project has a minor security issue with its usage of unsafe YAML loading 
via snakeyaml’s unprotected Constructor: 
https://nvd.nist.gov/vuln/detail/CVE-2022-1471

This CVE is reasonable to suppress, since operators should be able to trust 
their YAML configuration files.

The rationale for this proposed deprecation is that the upcoming 5.0 release is 
a good time to evaluate dependencies that are no longer receiving updates and 
will become risks in the future.

https://issues.apache.org/jira/browse/CASSANDRA-18743

—
Abe



Re: [Discuss] ​​CEP-35: Add PIP support for CQLSH

2023-08-09 Thread Abe Ratnofsky
I think it would be good for the project to have an official PyPI distribution, 
and the signal from users (40K downloads per month) is a clear indication that 
this is useful. Timely releases would help us get future improvements to cqlsh 
out faster, and moving this to an official distribution would protect users 
against any changes in this volunteer effort in case something happens in the 
future.

+1 (nb)

--
Abe

> On Aug 9, 2023, at 1:33 PM, Brad  wrote:
> 
> HI Dinesh,
> 
> You are correct that the scope of this CEP is practical, narrow and limited 
> to having an official distribution of CQLSH on the official Python package 
> repository. Cassandra end-users, who use the CQLSH command line, would 
> benefit in several direct ways:
> - A timely distribution of new CQLSH versions on the official Python package 
>   repository aligned with Apache Cassandra releases
> - A trusted distribution overseen by Apache Cassandra instead of third party 
>   maintainers. Today, there is only trust-based faith that the PyPI 
>   distribution of CQLSH matches the Apache Open Source one.
> - A lightweight distribution of CQLSH clocking in at 110KB vs downloading a 
>   50MB tarball.
> Perhaps those are modest goals, but I would suggest they are big wins for the 
> Cassandra user community. If you haven't tried it yet, please run 'pip 
> install cqlsh' on your desktop and see how nicely it works. Indeed, the 
> return-on-investment of effort here should be really high, as the work is 
> mostly already done, it's just run from a private repo at 
> https://github.com/jeffwidman/cqlsh and has been maintained continually since 
> 2013.
> 
> Other initiatives such as subdividing the project(s) or re-writing the REPL 
> in another language would be out-of-scope. It would be entirely appropriate 
> to have a separate discussion on those two topics, if you wish to start that 
> discussion.
> 
> The process and degree of overhead required to publish to PyPI will require 
> some discovery and discussion. Ideally, it would be possible to automate it. 
> That is definitely a topic we need further input from the engineers involved 
> in the build-release process.
> 
> A pre-CEP discussion of this proposal was started by Jeff on the mailing list 
> back in early July, see 
> https://lists.apache.org/thread/sy3p2b2tncg1bk6x3r0r60y10dm6l18d. 
> 
> Regards,
> 
> Brad
> 
> On Wed, Aug 9, 2023 at 3:31 PM Dinesh Joshi  > wrote:
>> Brad,
>> 
>> Thanks for starting this discussion. My understanding is that we're
>> simply adding pip support for cqlsh and Apache Cassandra project will
>> officially publish a cqlsh pip package. This is a good goal but other
>> than having an official pip package, what is it that we're gaining?
>> Please don't interpret this as push back on your proposal but I am
>> unclear on what we're trying to solve by making this official
>> distribution. There are several distribution channels and it is
>> untenable to officially support all of them.
>> 
>> If we do adopt this, there will be non-zero overhead of the release
>> process. This is fine but we need volunteers to run this process. My
>> understanding is that they need to be ideally PMC or at least Committers
>> on the project to go through all the steps to successfully release a new
>> artifact for our users.
>> 
>> I would have liked this CEP to go a bit further than just packaging
>> cqlsh in pip. IMHO we should have cqlsh as a separate sub-project. It
>> doesn't need to live in the cassandra repo. Extracting cqlsh into it's
>> separate repo would allow us to truly decouple cqlsh from the server.
>> This is already true for the most part as we rely on the Python driver
>> which is compatible with several cassandra releases. As it stands today
>> it is not possible for us to update cqlsh without making a Cassandra
>> release.
>> 
>> If you truly want to go a bit further, we should consider rewriting
>> cqlsh in Java so we can easily share code from the server. We can then
>> potentially use Java Native Image[1] to produce a truly platform
>> independent binary like golang. Python has its strengths but it does get
>> hairy as it expects certain runtime components on the target. Java With
>> Native Image we make things very simple from a user's perspective very
>> similar to how golang produces statically linked binaries. This might be
>> a very far out thought but it is worth exploring. I believe GraalVM's
>> license might allow us to produce binaries that we can incorporate in
>> our release but IANAL so maybe we can ask ASF legal on their opinion.
>> 
>> Giving cqlsh it's own identity as a sub-project might help us build a
>> roadmap and evolve it along these lines.
>> 
>> I would like other folks to chime in with their opinions.
>> 
>> Dinesh
>> 
>> On 8/9/23 09:18, Brad wrote:
>> > 
>> > As per the CEP process guidelines, I'm starting a formal DISCUSS thread
>> > to resume the conversation started here[1]. 
>> > 
>> > The developers who 

Re: [VOTE] CEP-34: mTLS based client and internode authenticators

2023-07-21 Thread Abe Ratnofsky
+1 (nb)

> On Jul 21, 2023, at 3:03 PM, Jon Meredith  wrote:
> 
> +1
> 
> On Fri, Jul 21, 2023 at 2:33 PM Blake Eggleston  > wrote:
>> +1
>> 
>>> On Jul 21, 2023, at 9:57 AM, Jyothsna Konisa >> > wrote:
>>> 
>>> Hi Everyone!
>>> 
>>> I would like to start a vote thread for CEP-34.
>>> 
>>> Proposal: 
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-34%3A+mTLS+based+client+and+internode+authenticators
>>> JIRA   : 
>>> https://issues.apache.org/jira/browse/CASSANDRA-18554
>>> Draft Implementation : https://github.com/apache/cassandra/pull/2372
>>> Discussion : 
>>> https://lists.apache.org/thread/pnfg65r76rbbs70hwhsz94ds6yo2042f
>>> 
>>> The vote will be open for 72 hours. A vote passes if there are at least 3 
>>> binding +1s and no binding vetoes.
>>> 
>>> Thanks,
>>> Jyothsna Konisa.
>> 



Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-20 Thread Abe Ratnofsky
This feels analogous to other past discussions around prioritizing a config 
that enables new users to clone + build + run as easily as possible, vs. having 
better prod recommendations out of the box.

Both are important. I personally think the default configuration should just work 
for new users, with a config option that lets power users fail startup if ACCP is 
not present, and a single warning at startup when we tolerate a missing ACCP but 
detect it is absent.
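
To make that concrete, roughly the shape of fallback I have in mind -- a sketch
only (the class and flag names are made up; install() and assertHealthy() are
ACCP's documented entry points, to the best of my knowledge):

import java.security.Security;

import com.amazon.corretto.crypto.provider.AmazonCorrettoCryptoProvider;

public final class CryptoProviderBootstrap
{
    // Prefer ACCP when its native library loads; otherwise either fail fast
    // (power-user setting) or warn once and keep the JRE's default providers.
    public static void tryInstallAccp(boolean failIfUnavailable)
    {
        try
        {
            AmazonCorrettoCryptoProvider.install();                 // register at highest priority
            AmazonCorrettoCryptoProvider.INSTANCE.assertHealthy();  // throws if native code didn't load
        }
        catch (Throwable t)
        {
            if (failIfUnavailable)
                throw new IllegalStateException("ACCP requested but unavailable on this platform", t);
            System.err.println("ACCP unavailable (" + t.getMessage() + "); falling back to default JCE providers");
            return;
        }
        System.out.println("Crypto provider in use: " + Security.getProviders()[0].getName());
    }
}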

> On Jul 20, 2023, at 14:51, Brandon Williams  wrote:
> 
> I think we could special-case and default to 'auto' but allow other
> more explicit options.
> 
> Kind Regards,
> Brandon
> 
>> On Thu, Jul 20, 2023 at 4:18 PM German Eichberger via dev
>>  wrote:
>> 
>> In general I agree with Joey -- but I would prefer if this behavior is 
>> configurable, e.g. there is an option to get a startup failure if the 
>> configured fastest provider can't run for any reason to avoid a "silent" 
>> performance degradation as Jordan was experiencing.
>> 
>> Thanks,
>> German
>> 
>> 
>> From: Joseph Lynch 
>> Sent: Thursday, July 20, 2023 7:38 AM
>> To: dev@cassandra.apache.org 
>> Subject: [EXTERNAL] Re: [DISCUSS] Using ACCP or tc-native by default
>> 
>> Having native dependencies shouldn't make the project x86 only, it
>> should just accelerate the performance on x86 when available. Can't we
>> just try to load the fastest available provider (so arm will use
>> native java but x86 will use proper hardware acceleration) and failing
>> that fall-back to the default? If I recall correctly from the
>> messaging service patches (and zstd/lz4) it's reasonably
>> straightforward to try to load native code and then fail-back if you
>> fail.
>> 
>> -Joey
>> 
>>> On Thu, Jul 20, 2023 at 10:27 AM J. D. Jordan  
>>> wrote:
>>> 
>>> Maybe we could start providing Dockerfile’s and/or make arch specific 
>>> rpm/deb packages that have everything setup correctly per architecture?
>>> We could also download them all and have the startup scripts put stuff in 
>>> the right places depending on the arch of the machine running them?
>>> I feel like there are probably multiple ways we could solve this without 
>>> requiring users to jump through a bunch of hoops?
>>> But I do agree we can’t make the project x86 only.
>>> 
>>> -Jeremiah
>>> 
 On Jul 20, 2023, at 2:01 AM, Miklosovic, Stefan 
  wrote:
 
 Hi,
 
 as I was reviewing the patch for this feature (1), we realized that it is 
 not quite easy to bundle this directly into Cassandra.
 
 The problem is that this was supposed to be introduced as a new dependency:
 
 <dependency>
   <groupId>software.amazon.cryptools</groupId>
   <artifactId>AmazonCorrettoCryptoProvider</artifactId>
   <version>2.2.0</version>
   <classifier>linux-x86_64</classifier>
 </dependency>
 
 Notice "classifier". That means that if we introduced this dependency into 
 the project, what about ARM users? (there is corresponding aarch 
 classifier as well). ACCP is platform-specific but we have to ship 
 Cassandra platform-agnostic. It just needs to run OOTB everywhere. If we 
 shipped that with x86 and a user runs Cassandra on ARM, I guess that would 
 break things, right?
 
 We also can not just add both dependencies (both x86 and aarch) because 
 how would we differentiate between them in runtime? That all is just too 
 tricky / error prone.
 
 So, the approach we want to take is this:
 
 1) nothing will be bundled in Cassandra by default
 2) a user is supposed to download the library and put it to the class path
 3) a user is supposed to put the implementation of ICryptoProvider 
 interface Cassandra exposes to the class path
 4) a user is supposed to configure cassandra.yaml and its section 
 "crypto_provider" to reference the implementation he wants
 
 That way, we avoid the situation when somebody runs x86 lib on ARM or vice 
 versa.
 
 By default, NoOpProvider will be used, that means that the default crypto 
 provider from JRE will be used.
 
 It can seem like we have not done too much progress here but hey ... we 
 opened the project to the custom implementations of crypto providers a 
 community can create. E.g. as 3rd party extensions etc ...
 
 I want to be sure that everybody is aware of this change (that we plan to 
 do that in such a way that it will not be "bundled") and that everybody is 
 on board with this. Otherwise I am all ears about how to do that 
 differently.
 
 (1) 
 https://issues.apache.org/jira/browse/CASSANDRA-18624
 

Re: [VOTE] CEP 33 - CIDR filtering authorizer

2023-06-28 Thread Abe Ratnofsky
+1 (nb)

> On Jun 28, 2023, at 18:38, guo Maxwell  wrote:
> 
> +1
> 
> Nate McCall wrote on Thu, Jun 29, 2023 at 9:25 AM:
>> +1
>> 
>> On Wed, Jun 28, 2023 at 5:17 AM Shailaja Koppu  wrote:
>>> Hi Team,
>>> 
>>> (Starting a new thread for VOTE instead of reusing the DISCUSS thread, to 
>>> follow usual procedure).
>>> 
>>> Please vote on CEP 33 - CIDR filtering authorizer 
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-33%3A+CIDR+filtering+authorizer
>>> 
>>> Thanks,
>>> Shailaja
> 
> -- 
> you are the apple of my eye !


Re: [VOTE] CEP-8 Datastax Drivers Donation

2023-06-13 Thread Abe Ratnofsky
+1 (nb)

> On Jun 13, 2023, at 09:23, Andrés de la Peña  wrote:
> 
> +1
> 
> On Tue, 13 Jun 2023 at 16:40, Yifan Cai  wrote:
>> +1
>> 
>> From: David Capwell 
>> Sent: Tuesday, June 13, 2023 8:37:10 AM
>> To: dev 
>> Subject: Re: [VOTE] CEP-8 Datastax Drivers Donation
>> 
>> +1
>> 
>>> On Jun 13, 2023, at 7:59 AM, Josh McKenzie  wrote:
>>> 
>>> +1
>>> 
>>> On Tue, Jun 13, 2023, at 10:55 AM, Jeremiah Jordan wrote:
>>>> +1 nb
>>>> 
>>>> On Jun 13, 2023 at 9:14:35 AM, Jeremy Hanna  wrote:
>>>>> Calling for a vote on CEP-8 [1].
>>>>> 
>>>>> To clarify the intent, as Benjamin said in the discussion thread [2], the goal 
>>>>> of this vote is simply to ensure that the community is in favor of the 
>>>>> donation. Nothing more.
>>>>> 
>>>>> The plan is to introduce the drivers, one by one. Each driver donation will 
>>>>> need to be accepted first by the PMC members, as it is the case for any 
>>>>> donation. Therefore the PMC should have full control on the pace at which new 
>>>>> drivers are accepted.
>>>>> 
>>>>> If this vote passes, we can start this process for the Java driver under the 
>>>>> direction of the PMC.
>>>>> 
>>>>> Jeremy
>>>>> 
>>>>> 1. https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
>>>>> 2. https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp


Re: [DISCUSS] CEP-8 Drivers Donation - take 2

2023-05-26 Thread Abe Ratnofsky
Sharing the same sentiment. Thank you for all your efforts on getting this 
ready to contribute!

> On May 26, 2023, at 9:59 AM, Francisco Guerrero  wrote:
> 
> I second Dinesh's sentiment. I'm looking forward to this contribution.
> 
> On 2023/05/26 16:29:12 Dinesh Joshi wrote:
>> This is exciting. Thank you for all your hard work on getting ICLAs from
>> contributors. I am in favor of moving forward.  
>> 
>> 
>>> 
>>> On May 26, 2023, at 5:54 AM, Jeremy Hanna 
>>> wrote:  
>>> 
>>> 
>> 
>>> To add to a somewhat crowded [DISCUSS] party, I'd like to restart the
>>> discussion around CEP-8.
>>> 
>>> 
>>> 
>>> 
>>> This is the original thread from July 2020:
>>> 
>>> 
>>> 
>>> 
>>> 
>>> At the time, several good points were discussed and the CEP has been updated
>>> with many of them.  One point in particular was that we should start with
>>> the DataStax Java driver as it is the reference implementation of the CQL
>>> protocol, a dependency of the project, and the most used of the 7 drivers
>>> discussed.  Other points were about package naming evolution and DataStax
>>> specific functionality.  I believe everyone agreed that we should take the
>>> first step of contributing the drivers as-is to minimize user disruption.
>>> That way we get through the legal and procedural process for the first
>>> driver.  Then we can proceed with discussing how it will be managed and by
>>> whom.
>>> 
>>> 
>>> 
>>> 
>>> As the next step in donating the Java driver to the project and as we talked
>>> about at ApacheCon last year, we needed to verify that we had all of the
>>> CLAs for all of the codebase.  Over the last year, Greg, Benjamin, Mick,
>>> Josh, Scott, and I have been tracking down all of the contributors of the
>>> DataStax Java driver that had not signed the DataStax CLA and asked them to
>>> please sign that one or the ASF CLA.  After having discussed the CLAs with
>>> the PMC and ASF legal, we believe we are ready to proceed.
>>> 
>>> 
>>> 
>>> 
>>> At this point, we'd like to propose CEP-8 for consideration, starting the
>>> process to accept the DataStax Java driver as an official ASF project.
>>> 
>>> 
>>> 
>>> 
>>> CEP-8: Datastax Drivers Donation - CASSANDRA - Apache Software Foundation:
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
>>> 
>>> 
>>> 
>>> 
>>> Jeremy



Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-25 Thread Abe Ratnofsky
>>>>>>>> understate how excited I am about this, and how important I think this 
>>>>>>>> is. Time constraints are somehow hard to overcome, but I hope the 
>>>>>>>> results brought by TCM will make it all worth it.
>>>>>>>> 
>>>>>>>> On Wed, May 24, 2023, at 4:23 PM, Alex Petrov wrote:
>>>>>>>>> I think pulling Harry into the tree will make adoption easier for the 
>>>>>>>>> folks. I have been a bit swamped with Transactional Metadata work, 
>>>>>>>>> but I wanted to make some of the things we were using for testing TCM 
>>>>>>>>> available outside of TCM branch. This includes a bunch of helper 
>>>>>>>>> methods to perform operations on the clusters, data generation, and 
>>>>>>>>> more useful stuff. Of course, the question always remains about how 
>>>>>>>>> much time I want to spend porting it all to Gossip, but I think we 
>>>>>>>>> can find a reasonable compromise. 
>>>>>>>>> 
>>>>>>>>> I would not set this improvement as a prerequisite to pulling Harry 
>>>>>>>>> into the main branch, but rather interpret it as a commitment from 
>>>>>>>>> myself to take community input and make it more approachable by the 
>>>>>>>>> day. 
>>>>>>>>> 
>>>>>>>>> On Wed, May 24, 2023, at 2:44 PM, Josh McKenzie wrote:
>>>>>>>>>>> importantly it’s a million times better than the dtest-api process 
>>>>>>>>>>> - which stymies development due to the friction.
>>>>>>>>>> This is my major concern.
>>>>>>>>>> 
>>>>>>>>>> What prompted this thread was harry being external to the core 
>>>>>>>>>> codebase and the lack of adoption and usage of it having led to 
>>>>>>>>>> atrophy of certain aspects of it, which then led to redundant 
>>>>>>>>>> implementation of some fuzz testing and lost time.
>>>>>>>>>> 
>>>>>>>>>> We'd all be better served to have this closer to the main codebase 
>>>>>>>>>> as a forcing function to smooth out the rough edges, integrate it, 
>>>>>>>>>> and make it a collective artifact and first class citizen IMO.
>>>>>>>>>> 
>>>>>>>>>> I have similar opinions about the dtest-api.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, May 24, 2023, at 4:05 AM, Benedict wrote:
>>>>>>>>>>> 
>>>>>>>>>>> It’s not without hiccups, and I’m sure we have more to learn. But 
>>>>>>>>>>> it mostly just works, and importantly it’s a million times better 
>>>>>>>>>>> than the dtest-api process - which stymies development due to the 
>>>>>>>>>>> friction.
>>>>>>>>>>> 
>>>>>>>>>>>> On 24 May 2023, at 08:39, Mick Semb Wever >>>>>>>>>>> <mailto:m...@apache.org>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> WRT git submodules and CASSANDRA-18204, are we happy with how it 
>>>>>>>>>>>> is working for accord ? 
>>>>>>>>>>>> 
>>>>>>>>>>>> The time spent on getting that running has been a fair few hours, 
>>>>>>>>>>>> where we could have cut many manual module releases in that time. 
>>>>>>>>>>>> 
>>>>>>>>>>>> David and folks working on accord ? 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, 23 May 2023 at 20:09, Josh McKenzie >>>>>>>>>>> <mailto:jmcken...@apache.org>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> I'll hold off on this until Alex Petrov chimes in. @Alex -> got 
>>>>>>>>>>>> any thoughts here?
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, May 16, 2023, at 5:17 PM, Jeremy Hanna wrote:
>>>>>>>>>>>>> I think it would be great to onboard Harry more officially into 
>>>>>>>>>>>>> the project.  However it would be nice to perhaps do some sanity 
>>>>>>>>>>>>> checking outside of Apple folks to see how approachable it is.  
>>>>>>>>>>>>> That is, can someone take it and just run it with the current 
>>>>>>>>>>>>> readme without any additional context?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I wonder if a mini-onboarding session would be good as a 
>>>>>>>>>>>>> community session - go over Harry, how to run it, how to add a 
>>>>>>>>>>>>> test?  Would that be the right venue?  I just would like to see 
>>>>>>>>>>>>> how we can not only plug it in to regular CI but get everyone 
>>>>>>>>>>>>> that wants to add a test be able to know how to get started with 
>>>>>>>>>>>>> it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jeremy
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On May 16, 2023, at 1:34 PM, Abe Ratnofsky >>>>>>>>>>>>> <mailto:a...@aber.io>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Just to make sure I'm understanding the details, this would mean 
>>>>>>>>>>>>>> apache/cassandra-harry maintains its status as a separate 
>>>>>>>>>>>>>> repository, apache/cassandra references it as a submodule, and 
>>>>>>>>>>>>>> clones and builds Harry locally, rather than pulling a released 
>>>>>>>>>>>>>> JAR. We can then reference Harry as a library without 
>>>>>>>>>>>>>> maintaining public artifacts for it. Is that in line with what 
>>>>>>>>>>>>>> you're thinking?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> > I'd also like to see us get a Harry run integrated as part of 
>>>>>>>>>>>>>> > our pre-commit CI
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I'm a strong supporter of this, of course.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On May 16, 2023, at 11:03 AM, Josh McKenzie 
>>>>>>>>>>>>>>> mailto:jmcken...@apache.org>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Similar to what we've done with accord in 
>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18204, I'd like 
>>>>>>>>>>>>>>> to discuss bringing cassandra-harry in-tree as a submodule. 
>>>>>>>>>>>>>>> repo link: https://github.com/apache/cassandra-harry
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Given the value it's brought to the project's stabilization 
>>>>>>>>>>>>>>> efforts and the movement of other things in the ecosystem to 
>>>>>>>>>>>>>>> being more integrated (accord, build-scripts 
>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18133), I think 
>>>>>>>>>>>>>>> having the testing framework better localized and integrated 
>>>>>>>>>>>>>>> would be a net benefit for adoption, awareness, maintenance, 
>>>>>>>>>>>>>>> and tighter workflows as we troubleshoot future failures it 
>>>>>>>>>>>>>>> surfaces.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I'd also like to see us get a Harry run integrated as part of 
>>>>>>>>>>>>>>> our pre-commit CI (a 5 minute simple soak test for instance) 
>>>>>>>>>>>>>>> and having that local in this fashion should make that a 
>>>>>>>>>>>>>>> cleaner integration as well.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thoughts?



Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-16 Thread Abe Ratnofsky
Just to make sure I'm understanding the details, this would mean 
apache/cassandra-harry maintains its status as a separate repository, 
apache/cassandra references it as a submodule, and clones and builds Harry 
locally, rather than pulling a released JAR. We can then reference Harry as a 
library without maintaining public artifacts for it. Is that in line with what 
you're thinking?

> I'd also like to see us get a Harry run integrated as part of our pre-commit 
> CI

I'm a strong supporter of this, of course.

> On May 16, 2023, at 11:03 AM, Josh McKenzie  wrote:
> 
> Similar to what we've done with accord in 
> https://issues.apache.org/jira/browse/CASSANDRA-18204, I'd like to discuss 
> bringing cassandra-harry in-tree as a submodule. repo link: 
> https://github.com/apache/cassandra-harry
> 
> Given the value it's brought to the project's stabilization efforts and the 
> movement of other things in the ecosystem to being more integrated (accord, 
> build-scripts https://issues.apache.org/jira/browse/CASSANDRA-18133), I think 
> having the testing framework better localized and integrated would be a net 
> benefit for adoption, awareness, maintenance, and tighter workflows as we 
> troubleshoot future failures it surfaces.
> 
> I'd also like to see us get a Harry run integrated as part of our pre-commit 
> CI (a 5 minute simple soak test for instance) and having that local in this 
> fashion should make that a cleaner integration as well.
> 
> Thoughts?



Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-04 Thread Abe Ratnofsky
I agree with Bowen - I find Keyspace easier to communicate with. There are 
plenty of situations where the use of "database" is ambiguous (like "Could you 
help me connect to database x?"), but Keyspace refers to a single thing. I 
think more software is moving towards calling these things "namespaces" (like 
Kubernetes), and while "Keyspaces" is not a term used in this way elsewhere, I 
still find it leads to clearer communication.

--
Abe


> On Apr 4, 2023, at 9:24 AM, Andrés de la Peña  wrote:
> 
> I think supporting DATABASE is a great idea. 
> 
> It's better aligned with SQL databases, and can save new users one of the 
> first troubles they find. 
> 
> Probably anyone starting to use Cassandra for the first time is going to face 
> the what is a keyspace? question in the first minutes. Saving that to users 
> with a more common name would be a victory for usability IMO.
> 
> On Tue, 4 Apr 2023 at 16:48, Mike Adamson  > wrote:
>> Hi,
>> 
>> I'd like to propose that we add DATABASE to the CQL grammar as an 
>> alternative to KEYSPACE. 
>> 
>> Background: While TABLE was introduced as an alternative for COLUMNFAMILY in 
>> the grammar we have kept KEYSPACE for the container name for a group of 
>> tables. Nearly all traditional SQL databases use DATABASE as the container 
>> name for a group of tables so it would make sense for Cassandra to adopt 
>> this naming as well.
>> 
>> KEYSPACE would be kept in the grammar but we would update some logging and 
>> documentation to encourage use of the new name. 
>> 
>> Mike Adamson
>> 
>> -- 
>>   Mike Adamson
>> Engineering
>> 
>> +1 650 389 6000  | datastax.com 
>> Find DataStax Online: 



Re: [DISCUSS] Enhanced Disk Error Handling

2023-03-09 Thread Abe Ratnofsky
> there's a point at which a host limping along is better put down and replaced

I did a basic literature review and it looks like load (total program-erase 
cycles), disk age, and operating temperature all lead to BER increases. We 
don't need to build a whole model of disk failure; we could probably get a lot 
of mileage out of a warn / failure threshold on the number of automatic 
corruption repairs.

Under this model, Cassandra could automatically repair X (3?) corruption events 
before warning a user ("time to replace this host"), and Y (10?) corruption 
events before forcing itself down.
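
As a rough sketch of the policy I'm imagining (the class, method, and numbers are 
purely illustrative, not an existing Cassandra API):

import java.util.concurrent.atomic.AtomicInteger;

final class DiskCorruptionPolicy
{
    private final int warnThreshold;   // e.g. 3 automatic repairs
    private final int failThreshold;   // e.g. 10 automatic repairs
    private final AtomicInteger repairedCorruptions = new AtomicInteger();

    DiskCorruptionPolicy(int warnThreshold, int failThreshold)
    {
        this.warnThreshold = warnThreshold;
        this.failThreshold = failThreshold;
    }

    // Invoked each time a corrupted range on this disk is automatically repaired from peers
    void onCorruptionRepaired(String disk)
    {
        int count = repairedCorruptions.incrementAndGet();
        if (count >= failThreshold)
            throw new IllegalStateException(disk + " exceeded " + failThreshold + " corruption repairs; take it down and replace it");
        if (count >= warnThreshold)
            System.err.println(disk + " has needed " + count + " corruption repairs; consider replacing it");
    }
}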

But it would be good to get a better sense of user expectations here. Bowen - 
how would you want Cassandra to handle frequent disk corruption events?

--
Abe

> On Mar 9, 2023, at 12:44 PM, Josh McKenzie  wrote:
> 
>> I'm not seeing any reasons why CEP-21 would make this more difficult to 
>> implement
> I think I communicated poorly - I was just trying to point out that there's a 
> point at which a host limping along is better put down and replaced than 
> piecemeal flagging range after range dead and working around it, and there's 
> no immediately obvious "Correct" answer to where that point is regardless of 
> what mechanism we're using to hold a cluster-wide view of topology.
> 
>> ...CEP-21 makes this sequencing safe...
> For sure - I wouldn't advocate for any kind of "automated corrupt data 
> repair" in a pre-CEP-21 world.
> 
> On Thu, Mar 9, 2023, at 2:56 PM, Abe Ratnofsky wrote:
>> I'm not seeing any reasons why CEP-21 would make this more difficult to 
>> implement, besides the fact that it hasn't landed yet.
>> 
>> There are two major potential pitfalls that CEP-21 would help us avoid:
>> 1. Bit-errors beget further bit-errors, so we ought to be resistant to a 
>> high frequency of corruption events
>> 2. Avoid token ownership changes when attempting to stream a corrupted token
>> 
>> I found some data supporting (1) - 
>> https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2014/20140806_T1_Hetzler.pdf
>> 
>> If we detect bit-errors and store them in system_distributed, then we need a 
>> capacity to throttle that load and ensure that consistency is maintained.
>> 
>> When we attempt to rectify any bit-error by streaming data from peers, we 
>> implicitly take a lock on token ownership. A user needs to know that it is 
>> unsafe to change token ownership in a cluster that is currently in the 
>> process of repairing a corruption error on one of its instances' disks. 
>> CEP-21 makes this sequencing safe, and provides abstractions to better 
>> expose this information to operators.
>> 
>> --
>> Abe
>> 
>>> On Mar 9, 2023, at 10:55 AM, Josh McKenzie  wrote:
>>> 
>>>> Personally, I'd like to see the fix for this issue come after CEP-21. It 
>>>> could be feasible to implement a fix before then, that detects bit-errors 
>>>> on the read path and refuses to respond to the coordinator, implicitly 
>>>> having speculative execution handle the retry against another replica 
>>>> while repair of that range happens. But that feels suboptimal to me when a 
>>>> better framework is on the horizon.
>>> I originally typed something in agreement with you but the more I think 
>>> about this, the more a node-local "reject queries for specific token 
>>> ranges" degradation profile seems like it _could_ work. I don't see an 
>>> obvious way to remove the need for a human-in-the-loop on fixing things in 
>>> a pre-CEP-21 world without opening pandora's box (Gossip + TMD + 
>>> non-deterministic agreement on ownership state cluster-wide /cry).
>>> 
>>> And even in a post CEP-21 world you're definitely in the "at what point is 
>>> it better to declare a host dead and replace it" fuzzy territory where 
>>> there's no immediately correct answers.
>>> 
>>> A system_distributed table of corrupt token ranges that are currently being 
>>> rejected by replicas with a mechanism to kick off a repair of those ranges 
>>> could be interesting.
>>> 
>>> On Thu, Mar 9, 2023, at 1:45 PM, Abe Ratnofsky wrote:
>>>> Thanks for proposing this discussion Bowen. I see a few different issues 
>>>> here:
>>>> 
>>>> 1. How do we safely handle corruption of a handful of tokens without 
>>>> taking an entire instance offline for re-bootstrap? This includes refusal 
>>>> to serve read requests for the corrupted token(s), and correct repair of 
>>>> the data.
>>>

Re: [DISCUSS] Enhanced Disk Error Handling

2023-03-09 Thread Abe Ratnofsky
I'm not seeing any reasons why CEP-21 would make this more difficult to 
implement, besides the fact that it hasn't landed yet.

There are two major potential pitfalls that CEP-21 would help us avoid:
1. Bit-errors beget further bit-errors, so we ought to be resistant to a high 
frequency of corruption events
2. Avoid token ownership changes when attempting to stream a corrupted token

I found some data supporting (1) - 
https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2014/20140806_T1_Hetzler.pdf

If we detect bit-errors and store them in system_distributed, then we need a 
capacity to throttle that load and ensure that consistency is maintained.

When we attempt to rectify any bit-error by streaming data from peers, we 
implicitly take a lock on token ownership. A user needs to know that it is 
unsafe to change token ownership in a cluster that is currently in the process 
of repairing a corruption error on one of its instances' disks. CEP-21 makes 
this sequencing safe, and provides abstractions to better expose this 
information to operators.

--
Abe

> On Mar 9, 2023, at 10:55 AM, Josh McKenzie  wrote:
> 
>> Personally, I'd like to see the fix for this issue come after CEP-21. It 
>> could be feasible to implement a fix before then, that detects bit-errors on 
>> the read path and refuses to respond to the coordinator, implicitly having 
>> speculative execution handle the retry against another replica while repair 
>> of that range happens. But that feels suboptimal to me when a better 
>> framework is on the horizon.
> I originally typed something in agreement with you but the more I think about 
> this, the more a node-local "reject queries for specific token ranges" 
> degradation profile seems like it _could_ work. I don't see an obvious way to 
> remove the need for a human-in-the-loop on fixing things in a pre-CEP-21 
> world without opening pandora's box (Gossip + TMD + non-deterministic 
> agreement on ownership state cluster-wide /cry).
> 
> And even in a post CEP-21 world you're definitely in the "at what point is it 
> better to declare a host dead and replace it" fuzzy territory where there's 
> no immediately correct answers.
> 
> A system_distributed table of corrupt token ranges that are currently being 
> rejected by replicas with a mechanism to kick off a repair of those ranges 
> could be interesting.
> 
> On Thu, Mar 9, 2023, at 1:45 PM, Abe Ratnofsky wrote:
>> Thanks for proposing this discussion Bowen. I see a few different issues 
>> here:
>> 
>> 1. How do we safely handle corruption of a handful of tokens without taking 
>> an entire instance offline for re-bootstrap? This includes refusal to serve 
>> read requests for the corrupted token(s), and correct repair of the data.
>> 2. How do we expose the corruption rate to operators, in a way that lets 
>> them decide whether a full disk replacement is worthwhile?
>> 3. When CEP-21 lands it should become feasible to support ownership 
>> draining, which would let us migrate read traffic for a given token range 
>> away from an instance where that range is corrupted. Is it worth planning a 
>> fix for this issue before CEP-21 lands?
>> 
>> I'm also curious whether there's any existing literature on how different 
>> filesystems and storage media accommodate bit-errors (correctable and 
>> uncorrectable), so we can be consistent with those behaviors.
>> 
>> Personally, I'd like to see the fix for this issue come after CEP-21. It 
>> could be feasible to implement a fix before then, that detects bit-errors on 
>> the read path and refuses to respond to the coordinator, implicitly having 
>> speculative execution handle the retry against another replica while repair 
>> of that range happens. But that feels suboptimal to me when a better 
>> framework is on the horizon.
>> 
>> --
>> Abe
>> 
>>> On Mar 9, 2023, at 8:23 AM, Bowen Song via dev  
>>> wrote:
>>> 
>>> Hi Jeremiah,
>>> 
>>> I'm fully aware of that, which is why I said that deleting the affected 
>>> SSTable files is "less safe".
>>> 
>>> If the "bad blocks" logic is implemented and the node abort the current 
>>> read query when hitting a bad block, it should remain safe, as the data in 
>>> other SSTable files will not be used. The streamed data should contain the 
>>> unexpired tombstones, and that's enough to keep the data consistent on the 
>>> node.
>>> 
>>> 
>>> Cheers,
>>> Bowen
>>> 
>>> 
>>> 
>>> On 09/03/2023 15:58, Jeremiah D Jordan wrote:
>>>> It is actua

Re: [DISCUSS] Enhanced Disk Error Handling

2023-03-09 Thread Abe Ratnofsky
Thanks for proposing this discussion Bowen. I see a few different issues here:

1. How do we safely handle corruption of a handful of tokens without taking an 
entire instance offline for re-bootstrap? This includes refusal to serve read 
requests for the corrupted token(s), and correct repair of the data.
2. How do we expose the corruption rate to operators, in a way that lets them 
decide whether a full disk replacement is worthwhile?
3. When CEP-21 lands it should become feasible to support ownership draining, 
which would let us migrate read traffic for a given token range away from an 
instance where that range is corrupted. Is it worth planning a fix for this 
issue before CEP-21 lands?

I'm also curious whether there's any existing literature on how different 
filesystems and storage media accommodate bit-errors (correctable and 
uncorrectable), so we can be consistent with those behaviors.

Personally, I'd like to see the fix for this issue come after CEP-21. It could 
be feasible to implement a fix before then, that detects bit-errors on the read 
path and refuses to respond to the coordinator, implicitly having speculative 
execution handle the retry against another replica while repair of that range 
happens. But that feels suboptimal to me when a better framework is on the 
horizon.

--
Abe

> On Mar 9, 2023, at 8:23 AM, Bowen Song via dev  
> wrote:
> 
> Hi Jeremiah,
> 
> I'm fully aware of that, which is why I said that deleting the affected 
> SSTable files is "less safe".
> 
> If the "bad blocks" logic is implemented and the node abort the current read 
> query when hitting a bad block, it should remain safe, as the data in other 
> SSTable files will not be used. The streamed data should contain the 
> unexpired tombstones, and that's enough to keep the data consistent on the 
> node.
> 
> Cheers,
> Bowen
> 
> 
> 
> On 09/03/2023 15:58, Jeremiah D Jordan wrote:
>> It is actually more complicated than just removing the sstable and running 
>> repair.
>> 
>> In the face of expired tombstones that might be covering data in other 
>> sstables the only safe way to deal with a bad sstable is wipe the token 
>> range in the bad sstable and rebuild/bootstrap that range (or wipe/rebuild 
>> the whole node which is usually the easier way).  If there are expired 
>> tombstones in play, it means they could have already been compacted away on 
>> the other replicas, but may not have compacted away on the current replica, 
>> meaning the data they cover could still be present in other sstables on this 
>> node.  Removing the sstable will mean resurrecting that data.  And pulling 
>> the range from other nodes does not help because they can have already 
>> compacted away the tombstone, so you won’t get it back.
>> 
>> Tl;DR you can’t just remove the one sstable you have to remove all data in 
>> the token range covered by the sstable (aka all data that sstable may have 
>> had a tombstone covering).  Then you can stream from the other nodes to get 
>> the data back.
>> 
>> -Jeremiah
>> 
>>> On Mar 8, 2023, at 7:24 AM, Bowen Song via dev  
>>>  wrote:
>>> 
>>> At the moment, when a read error, such as unrecoverable bit error or data 
>>> corruption, occurs in the SSTable data files, regardless of the 
>>> disk_failure_policy configuration, manual (or to be precise, external) 
>>> intervention is required to recover from the error.
>>> 
>>> Commonly, there's two approach to recover from such error:
>>> 
>>> The safer, but slower recover strategy: replace the entire node.
>>> The less safe, but faster recover strategy: shut down the node, delete the 
>>> affected SSTable file(s), and then bring the node back online and run 
>>> repair.
>>> Based on my understanding of Cassandra, it should be possible to recover 
>>> from such error by marking the affected token range in the existing SSTable 
>>> as "corrupted" and stop reading from them (e.g. creating a "bad block" file 
>>> or in memory), and then streaming the affected token range from the healthy 
>>> replicas. The corrupted SSTable file can then be removed upon the next 
>>> successful compaction involving it, or alternatively an anti-compaction is 
>>> performed on it to remove the corrupted data.
>>> 
>>> The advantage of this strategy is:
>>> 
>>> Reduced node down time - node restart or replacement is not needed
>>> Less data streaming is required - only the affected token range
>>> Faster recovery time - less streaming and delayed compaction or 
>>> anti-compaction
>>> No less safe than replacing the entire node
>>> This process can be automated internally, removing the need for operator 
>>> inputs
>>> The disadvantage is added complexity on the SSTable read path and it may 
>>> mask disk failures from the operator who is not paying attention to it.
>>> 
>>> What do you think about this?
>>> 
>> 



Re: Downgradability

2023-02-21 Thread Abe Ratnofsky
Some interesting existing work on this subject is "Understanding and Detecting 
Software Upgrade Failures in Distributed Systems" - 
https://dl.acm.org/doi/10.1145/3477132.3483577, also summarized by Andrey 
Satarin here: 
https://asatarin.github.io/talks/2022-09-upgrade-failures-in-distributed-systems/

They specifically tested Cassandra upgrades, and have a solid list of defects 
that they found. They also describe their testing mechanism DUPTester, which 
includes a component that confirms that the leftover state from one version can 
start up on the next version. There is a wider scope of upgrade defects 
highlighted in the paper, beyond SSTable version support.

I believe the project would benefit from expanding our test suite similarly, by 
parametrizing more tests on upgrade version pairs.
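
As an illustration, parametrizing on version pairs could look roughly like this 
(plain JUnit parameterized style; the harness, test body, and version pairs are 
illustrative, not the project's actual upgrade-test framework):

import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class UpgradePairTest
{
    @Parameters(name = "{0} -> {1}")
    public static Collection<Object[]> versionPairs()
    {
        return Arrays.asList(new Object[][]{
            { "4.0", "4.1" },
            { "4.1", "5.0" },
            { "4.1", "4.0" }   // downgrade direction, which is the point of this thread
        });
    }

    private final String from;
    private final String to;

    public UpgradePairTest(String from, String to)
    {
        this.from = from;
        this.to = to;
    }

    @Test
    public void leftoverStateStartsOnNextVersion()
    {
        // 1. start a cluster on `from`, write and flush data, then stop it
        // 2. restart the same data directories on `to`
        // 3. assert startup succeeds and the data reads back correctly
        //    (this mirrors DUPTester's leftover-state check from the paper)
    }
}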

Also, per Benedict's comment:

> It’s a commitment, and it requires every contributor to consider it as part 
> of work they produce.

But it shouldn't be a burden. Ability to downgrade is a testable problem, so I 
see this work as a function of the suite of tests the project is willing to 
agree on supporting.

Specifically - I agree with Scott's proposal to emulate the HDFS 
upgrade-then-finalize approach. I would also support automatic finalization 
based on a time threshold or similar, to balance the priorities of safe and 
straightforward upgrades. Users need to be aware of the range of SSTable 
formats supported by a given version, and how to handle when their SSTables 
wouldn't be supported by an upcoming upgrade.

--
Abe

[DISCUSS] CEP-22: Drivers Donation Status

2023-01-12 Thread Abe Ratnofsky
What's the current status of CEP-22?

It looks like the CEP draft on Confluence [1] hasn't been updated since Aug 
2022 and mailing list search [2] shows no results, so it never made it to a 
vote.

--
Abe

[1]: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-22%3A+Datastax+Drivers+Donation
[2]: https://lists.apache.org/list?*@cassandra.apache.org:lte=240M:CEP-22




Re: [VOTE] Release Apache Cassandra 4.1.0 (take2)

2022-12-12 Thread Abe Ratnofsky
I agree Benedict - I don't think we can provide a clear advisory to our users, 
so I would approve of not sharing anything in the release notes. But if someone 
posts an issue (likely to the user ML) related to streaming / bootstrapping on 
4.1.0, then we should engage with the knowledge that it might be related to 
18110.

> On Dec 12, 2022, at 5:06 PM, Benedict  wrote:
> 
> I’m unsure that without more information it is very helpful to highlight in 
> the release notes. We don’t even have a strong hypothesis tying this issue to 
> 4.1.0 specifically, and don’t have a general policy of highlighting 
> undiagnosed issues in release notes?
> 
> 
>> On 13 Dec 2022, at 00:48, Jon Meredith  wrote:
>> 
>> 
>> Thanks for the extra time to investigate. Unfortunately no progress on 
>> finding the root cause for this issue, just successful bootstraps in our 
>> attempts to reproduce. I think highlighting the ticket in the release notes 
>> is sufficient and resolving this issue should not hold up the release.
>> 
>> I agree with Jeff that the multiple concurrent bootstraps are unlikely to be 
>> the issue - I only mentioned in the ticket in case I am wrong. Abe or I will 
>> update the ticket if we find anything new.
>> 
>> On Sun, Dec 11, 2022 at 12:33 PM Jeff Jirsa > <mailto:jji...@gmail.com>> wrote:
>> Concurrent shouldn’t matter (they’re non-overlapping in the repro). And I’d 
>> personally be a bit surprised if table count matters that much. 
>> 
>> It probably just requires high core count and enough data that the streams 
>> actually interact with the rate limiter 
>> 
>>> On Dec 11, 2022, at 10:32 AM, Mick Semb Wever >> <mailto:m...@apache.org>> wrote:
>>> 
>>> 
>>> 
>>> 
>>> On Sat, 10 Dec 2022 at 23:09, Abe Ratnofsky >> <mailto:a...@aber.io>> wrote:
>>> Sorry - responded on the take1 thread:
>>> 
>>> Could we defer the close of this vote til Monday, December 12th after 6pm 
>>> Pacific Time?
>>> 
>>> Jon Meredith and I have been working thru an issue blocking streaming on 
>>> 4.1 for the last couple months, and are now testing a promising fix. We're 
>>> currently working on a write-up, and we'd like to hold the release until 
>>> the community is able to review our findings.
>>> 
>>> 
>>> Update on behalf of Jon and Abe.
>>> 
>>> The issue raised is CASSANDRA-18110.
>>> Concurrent, or nodes with high cpu count and number of tables performing, 
>>> host replacements can fail.
>>> 
>>> It is still unclear if this is applicable to OSS C*, and if so to what 
>>> extent users might ever be impacted.
>>> More importantly, there's a simple workaround for anyone that hits the 
>>> problem.
>>> 
>>> Without further information on the table, I'm inclined to continue with 
>>> 4.1.0 GA (closing the vote in 32 hours), but add a clear message to the 
>>> release announcement of the issue and workaround. Interested in hearing 
>>> others' positions, don't be afraid to veto if that's where you're at.
>>> 
>>> 



Re: [VOTE] Release Apache Cassandra 4.1.0 (take2)

2022-12-10 Thread Abe Ratnofsky
Sorry - responded on the take1 thread:

Could we defer the close of this vote til Monday, December 12th after 6pm 
Pacific Time?

Jon Meredith and I have been working thru an issue blocking streaming on 4.1 
for the last couple months, and are now testing a promising fix. We're 
currently working on a write-up, and we'd like to hold the release until the 
community is able to review our findings.

Thanks,
Abe

> On Dec 9, 2022, at 4:11 PM, guo Maxwell  wrote:
> 
> +1
> 
> Jeremiah D Jordan wrote on Sat, Dec 10, 2022 at 5:59 AM:
> +1 nb
> 
> 
>> On Dec 7, 2022, at 3:40 PM, Mick Semb Wever > > wrote:
>> 
>> 
>> Proposing the (second) test build of Cassandra 4.1.0 for release.
>> 
>> sha1: f9e033f519c14596da4dc954875756a69aea4e78
>> Git: 
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.1.0-tentative
>>  
>> 
>> Maven Artifacts: 
>> https://repository.apache.org/content/repositories/orgapachecassandra-1282/org/apache/cassandra/cassandra-all/4.1.0/
>>  
>> 
>> 
>> The Source and Build Artifacts, and the Debian and RPM packages and 
>> repositories, are available here: 
>> https://dist.apache.org/repos/dist/dev/cassandra/4.1.0/ 
>> 
>> 
>> The vote will be open for 96 hours (longer if needed). Everyone who has 
>> tested the build is invited to vote. Votes by PMC members are considered 
>> binding. A vote passes if there are at least three binding +1s and no -1's.
>> 
>> [1]: CHANGES.txt: 
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.1.0-tentative
>>  
>> 
>> [2]: NEWS.txt: 
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.1.0-tentative
>>  
>> 
> 
> -- 
> you are the apple of my eye !



Re: [VOTE] Release Apache Cassandra 4.1.0 GA

2022-12-10 Thread Abe Ratnofsky
Could we defer the close of this vote til Monday, December 12th after 6pm 
Pacific Time?

Jon Meredith and I have been working thru an issue blocking streaming on 4.1 
for the last couple months, and are now testing a promising fix. We're 
currently working on a write-up, and we'd like to hold the release until the 
community is able to review our findings.

Thanks,
Abe

> On Dec 9, 2022, at 5:13 AM, Marianne Lyne Manaog  
> wrote:
> 
> Hi everyone, 
> 
> Matt and I finished running all the tests for V2 with the bug fixes from 
> CASSANDRA-18086  and 
> the results for the 100 partitions are definitely better than V1. For the 
> larger partitions (1000, 1), the results for V2 are comparable and 
> overall V2 did not have any performance regression.
> 
> On Thu, Dec 8, 2022 at 11:58 AM Marianne Lyne Manaog 
> mailto:marianne.man...@ieee.org>> wrote:
> Hi everyone, 
> 
> Matt and I finished running all the tests for V2 with the bug fixes from 
> CASSANDRA-18086  and 
> the results for the 100 partitions are definitely better than V1. For the 
> larger partitions (1000, 1), the results for V2 are comparable and 
> overall V2 did not have any performance regression.
> 
> On Tue, Dec 6, 2022 at 4:49 PM Marianne Lyne Manaog  > wrote:
> Here is CASSANDRA-18097 
>  with the bug fix for 
> the performance regression encountered with 100 partitions in V2.
> 
> On Mon, Dec 5, 2022 at 2:05 PM Marianne Lyne Manaog  > wrote:
> Following on what Matt said:
>   - Here is the link to the Cassandra repo with the bugfix of wait time 
> from ms to ns: 
> https://github.com/apache/cassandra/compare/trunk...marianne-manaog:cassandra:bugfix/wait-from-ms-to-ns
>  
> 
>   - the Paxos configuration used is:
>   paxos_contention_wait_randomizer: uniform
>   paxos_contention_min_wait: 0
>   paxos_contention_max_wait: 100ms
> 
>   - V1 and V2 have the same configurations except for paxos_variant: 
> which changes accordingly
> 
>   Results: V1 (100 partitions)
>   - Average read: 28948
>   - Standard Deviation: 416.271
>   - Coefficient of variance: 1.44%
>   - Average write: 19248
>   - Standard Deviation:158.595
>   - Coefficient of variance:0.82%
> 
>   Results: V2 (100 partitions)
>   - Average read: 12307
>   - Standard Deviation: 2367.473
>   - Coefficient of variance: 19.24%
>   - Average write: 5780
>   - Standard Deviation: 1154.261
>   - Coefficient of variance: 19.97%
> 
> 
> On Mon, Dec 5, 2022 at 1:50 PM Matt Fleming  > wrote:
> Me and Marianne are also still chasing a performance issue with Paxos v2 when 
> compared with v1. We
> see way more contention on v2 for a LOCAL_SERIALIZABLE workload that 
> writes/reads to only 100 
> partitions (v2 performs better for higher partition counts). We're still 
> investigating what's going
> on.
> 
> Should that be a -1 vote? I'm not sure :)
> 
> On Mon, 5 Dec 2022 at 11:37, Benedict  > wrote:
> -0 
> 
> CASSANDRA-18086 should probably be fixed and merged first, as Paxos v2 will 
> be unlikely to work well for users without it. Either that or we need to 
> update NEWS.txt to mention it.
> 
>> On 5 Dec 2022, at 11:01, Aleksey Yeshchenko > > wrote:
>> 
>> +1
>> 
>>> On 5 Dec 2022, at 10:17, Benjamin Lerer >> > wrote:
>>> 
>>> +1
>>> 
>>> On Mon, Dec 5, 2022 at 11:02, Berenguer Blasi wrote:
>>> +1
>>> 
>>> On 5/12/22 10:53, guo Maxwell wrote:
 +1 
 
 Mick Semb Wever  wrote on Mon, Dec 5, 2022 at 5:33 PM:
 
 Proposing the test build of Cassandra 4.1.0 GA for release.
 
 sha1: b807f97b37933fac251020dbd949ee8ef245b158
 Git: 
 https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.1.0-tentative
  
 
 Maven Artifacts: 
 https://repository.apache.org/content/repositories/orgapachecassandra-1281/org/apache/cassandra/cassandra-all/4.1.0/
  
 
 
 The Source and Build Artifacts, and the Debian and RPM packages and 
 repositories, are available here: 
 https://dist.apache.org/repos/dist/dev/cassandra/4.1.0/ 
 
 
 The vote will be open for 72 hours (longer if needed). Everyone who has 
 tested the build is invited to vote. Votes by PMC 

Re: Using bash instead of sh in generate.sh?

2022-11-01 Thread Abe Ratnofsky
I took a look through our scripts, and Bash is used in a handful of development 
tools. /bin/sh is used by everything under bin/ and tools/bin. I wouldn’t want 
to change a user-facing dependency unless there’s a solid reason, but I think 
it’s fine for CircleCI tooling to use Bash instead of /bin/sh.

Abe

> 
> On Oct 26, 2022, at 09:50, Derek Chen-Becker  wrote:
> 
> 
> I don't think this quite rises to the level of a CEP, but I wanted to get 
> input on this. I'm working on the CircleCI build tooling and the generate.sh 
> script uses "/bin/sh" for its execution. There are a couple of things that 
> could be cleaned up and simplified if we used bash instead. I'm all for 
> making sure that tooling is portable, but I think that in this case it's 
> probably safe to assume bash is available for dev work. For example, all of 
> the shell scripts in the cassandra-builds repo use bash, not sh.
> 
> Thanks,
> 
> Derek
> 
> -- 
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+
> 


Re: [DISCUSS] CEP-23: Enhancement for Sparse Data Serialization

2022-09-05 Thread Abe Ratnofsky
Looking at this link: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-23%3A++Enhancement+for+Sparse+Data+Serialization

Do you have any plans to include benchmarks in your test plan? It would be 
useful to include disk usage / read performance / write performance comparisons 
with the new encodings, particularly for sparse collections where a subset of 
data is selected out of a collection.

I do wonder whether this is CEP-worthy. The CEP says that the changes will not 
impact existing users, will be backwards compatible, and overall is an 
efficiency improvement. The CEP guidelines say a CEP is encouraged “for 
significant user-facing or changes that cut across multiple subsystems”. Any 
reason why a Jira isn’t sufficient?

Abe

> On Sep 5, 2022, at 1:57 AM, Claude Warren via dev  
> wrote:
> 
> I have just posted a CEP  covering an Enhancement for Sparse Data 
> Serialzation.  This is in response to CASSANDRA-8959
> 
> I look forward to responses.
> 
> 



Re: [DISCUSS] CASSANDRA-17750: Security migration away from Maven Ant Tasks

2022-07-20 Thread Abe Ratnofsky
Most of the discussion has happened in the PR: 
https://github.com/apache/cassandra/pull/1725

Leaving this thread open over the weekend to gather input.

> On Jul 20, 2022, at 10:40 AM, emmanuel warreng  
> wrote:
> 
> Unsubscribe
> 
> On Tue, Jul 19, 2022, 21:20 Abe Ratnofsky  <mailto:a...@aber.io>> wrote:
> Hello all,
> 
> We currently depend on Maven Ant Tasks (MAT) during build, for declaring 
> dependencies and generating POM files from within build.xml. MAT has long 
> been retired (no commits since maintenance in 2015), has registered CVEs in 
> its dependencies (CVE-2017-1000487), and encourages migration to its 
> successor, Maven Artifact Resolver Ant Tasks (MARAT). More detail in the 
> Jira: https://issues.apache.org/jira/browse/CASSANDRA-17750 
> <https://issues.apache.org/jira/browse/CASSANDRA-17750>
> 
> I have a PR up to remove our dependency on MAT, with discussion from David 
> Capwell and Mick Semb Wever: https://github.com/apache/cassandra/pull/1725 
> <https://github.com/apache/cassandra/pull/1725>
> 
> There are two main items for wider discussion:
> 
> 1. Is it worth addressing this CVE and retired dependency with changes to our 
> build system, or should we suppress it?
> 
> 2. Are there more alternatives to Maven Ant Tasks that should be considered, 
> like Ivy?
> 
> My stance, summarized from the PR comments, is that a retired dependency that 
> does not receive security updates (current CVE or not) should be replaced by 
> a maintained project, and that the general approach in the PR (give or take 
> minor changes to POM packaging) is the one most compatible with our current 
> approach, and does not preclude any build system changes in the near or 
> distant future.
> 
> Curious to hear from others.
> 
> —
> Abe



[DISCUSS] CASSANDRA-17750: Security migration away from Maven Ant Tasks

2022-07-19 Thread Abe Ratnofsky
Hello all,

We currently depend on Maven Ant Tasks (MAT) during build, for declaring 
dependencies and generating POM files from within build.xml. MAT has long been 
retired (no commits since maintenance in 2015), has registered CVEs in its 
dependencies (CVE-2017-1000487), and encourages migration to its successor, 
Maven Artifact Resolver Ant Tasks (MARAT). More detail in the Jira: 
https://issues.apache.org/jira/browse/CASSANDRA-17750

I have a PR up to remove our dependency on MAT, with discussion from David 
Capwell and Mick Semb Wever: https://github.com/apache/cassandra/pull/1725

There are two main items for wider discussion:

1. Is it worth addressing this CVE and retired dependency with changes to our 
build system, or should we suppress it?

2. Are there more alternatives to Maven Ant Tasks that should be considered, 
like Ivy?

My stance, summarized from the PR comments, is that a retired dependency that 
does not receive security updates (current CVE or not) should be replaced by a 
maintained project, and that the general approach in the PR (give or take minor 
changes to POM packaging) is the one most compatible with our current approach, 
and does not preclude any build system changes in the near or distant future.

Curious to hear from others.

—
Abe

Re: CEP-15 multi key transaction syntax

2022-06-30 Thread Abe Ratnofsky
The new syntax looks great, and I’m really excited to see this coming together.

One piece of feedback on the proposed syntax is around the use of “=“ as a 
declaration in addition to its current use as an equality operator in a WHERE 
clause and an assignment operator in an UPDATE:

BEGIN TRANSACTION
  LET car_miles = miles_driven, car_is_running = is_running FROM cars WHERE 
model=’pinto’
  LET user_miles = miles_driven FROM users WHERE name=’blake’
  SELECT something else from some other table
  IF NOT car_is_running THEN ABORT
  UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
  UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
COMMIT TRANSACTION

This is supported in languages like PL/pgSQL, but in a normal SQL query this kind 
of local declaration is often expressed as a column alias (SELECT col AS new_col), 
a subquery alias ((SELECT col) t), or a common table expression (WITH t AS (SELECT 
col)).

Here’s an example of an alternative to the proposed syntax that I’d find more 
readable:

BEGIN TRANSACTION
  WITH car_miles, car_is_running AS (SELECT miles_driven, is_running FROM cars 
WHERE model=’pinto’),
user_miles AS (SELECT miles_driven FROM users WHERE name=’blake’)
  IF NOT car_is_running THEN ABORT
  UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
  UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
COMMIT TRANSACTION

There’s also the option of naming the transaction like a subquery, and 
supporting LET via AS (this one I’m less sure about but wanted to propose 
anyway):

BEGIN TRANSACTION t1
  SELECT miles_driven AS t1.car_miles, is_running AS t1.car_is_running FROM 
cars WHERE model=’pinto’;
  SELECT miles_driven AS t1.user_miles FROM users WHERE name=’blake’;
  IF NOT car_is_running THEN ABORT
  UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
  UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
COMMIT TRANSACTION

This also has the benefit of resolving ambiguity in case of naming conflicts 
with existing (or future) column names.

--
Abe

Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection

2021-11-19 Thread Abe Ratnofsky
I like the idea of adding this to the CQL Grammar, but would like to see it 
follow the ReplicationStrategy style of defining a map with a class and 
parameters. For example, something like this (names I’m not tied to):

SELECT * FROM table WHERE pk = x WITH ARTIFICIAL LATENCY = { 'class': 
'UniformLatencyInjector', 'durationMs': 4 }

If we want to support additional types of latency injection (like between pairs 
of DCs, non-uniform, etc) then the grammar will be able to handle those:

SELECT * FROM table WHERE pk = x WITH ARTIFICIAL LATENCY = { 'class': 
'WideAreaLatencyInjector', 'fromDc': 'DC1', 'toDc': 'DC2', 'durationMs': 4 }

—
Abe

> On Nov 19, 2021, at 09:30, bened...@apache.org wrote:
> 
> To resurrect this discussion briefly, does anyone have a preference for 
> either CQL Grammar or Protocol support?
> 
> This originally felt to me like something we might want to support at the 
> native protocol level, however that creates a dependency on specific clients 
> and the feature might ultimately be less flexible. It’s not clear why we 
> wouldn’t prefer some kind of CQL change like:
> 
> SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY
> 
> With queries being able to supply specific latencies if they so choose:
> 
> SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY 4ms
> 
> That might even support some DC->DC map for additional latencies:
> 
> SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY ‘{dc1:{dc2: 4ms}}’
> 
> This would leave applications a great deal of flexibility for experimenting 
> with latency impacts, and greater ease for evolving this feature over time 
> than specifying query eligibility at the protocol level.
> 
> Does anyone have any thoughts about this?
> 
> From: bened...@apache.org 
> Date: Wednesday, 6 October 2021 at 14:48
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection
> This is a very good point. I forget the reason we settled on consistency 
> levels, I assume it was due to simplicity of the solution, as deploying 
> support for a new protocol-level change is more involved.
> 
> That’s probably not a good reason here, and I agree that overloading 
> consistency level feels wrong. I hope we will retire user-provided 
> consistency levels over the coming year or two, which is another good reason 
> not to begin enhancing it with new meanings.
> 
> I will rework the ticket and patches.
> 
> From: Paulo Motta 
> Date: Wednesday, 6 October 2021 at 14:37
> To: Cassandra DEV 
> Subject: Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection
> This sounds like a great feature!
> 
> I wonder if Consistencylevel is the best way to expose this to users
> though, can't we implement this via another driver/protocol option ? Ie.
> "delay_enabled" flag that would be a modifier to an existing CL.
> 
> If we decide to go the CL route, I wonder if this isn't a good opportunity
> to introduce pluggable consistency levels (CASSANDRA-8119 <
> https://issues.apache.org/jira/browse/CASSANDRA-8119>)
>  so these would only
> become available when the feature is enabled.
> 
> My concern here is adding niche consistency levels to the default CL table
> which may create confusion to non-power users.
> 
>> Em qua., 6 de out. de 2021 às 10:12, bened...@apache.org <
>> bened...@apache.org> escreveu:
>> 
>> Hi Everyone,
>> 
>> This is a modest user-facing feature that I want to highlight in case
>> anyone has any input. In order to validate if a real cluster may modify its
>> topology or consistency level (e.g. from local to global), this ticket
>> introduces a facility for injecting latency to internode messages. This is
>> particularly helpful for high-availability topologies, and in particular
>> for LWTs (where performance may be unpredictable due to contention), so
>> that real traffic may be modified to experience gradually increasing
>> latency in order to validate a topology (or the impact of a global
>> consistency level) before any transition is undertaken.
>> 
>> The user-visible changes include new config parameters, new JMX end points
>> for modifying these parameters, and new consistency levels that may be
>> supplied to mark queries as suitable for latency injection (so that
>> applications may nominate queries for this mechanism)
>> 
>> 
>> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org