Re: [VOTE] Release Apache Cassandra 4.1.2

2023-05-25 Thread Berenguer Blasi

+1

On 25/5/23 22:12, Mick Semb Wever wrote:


The vote will be open for 72 hours (longer if needed). Everyone
who has tested the build is invited to vote. Votes by PMC members
are considered binding. A vote passes if there are at least three
binding +1s and no -1's.




+1


Checked
- signing correct
- checksums are correct
- source artefact builds (JDK 8+11)
- binary artefact runs (JDK 8+11)
- debian package runs (JDK 8+11)
- debian repo runs (JDK 8+11)
- redhat* package runs (JDK 8+11)
- redhat* repo runs (JDK 8+11)***


***) same as 4.0.10 vote comment: 
https://lists.apache.org/thread/qpf7gvdts0vqmmwwjv9k3v3lsk248n7g


Re: [VOTE] Release Apache Cassandra 4.0.10

2023-05-25 Thread Berenguer Blasi

+1

On 25/5/23 21:57, Mick Semb Wever wrote:



The vote will be open for 72 hours (longer if needed). Everyone
who has tested the build is invited to vote. Votes by PMC members
are considered binding. A vote passes if there are at least three
binding +1s and no -1's.



+1


Checked
- signing correct
- checksums are correct
- source artefact builds (JDK 8+11)
- binary artefact runs (JDK 8+11)
- debian package runs (JDK 8+11)
- debian repo runs (JDK 8+11)
- redhat* package runs (JDK 8+11)
- redhat* repo runs (JDK 8+11)***


***) yum repo installation looks to be failing due to a legacy (SHA1) 
third-party sig in our KEYS file. This would impact all rhel9+ users.

Workaround is…
```
# run this before `yum install cassandra`
update-crypto-policies --set LEGACY
```
ref: 
https://www.redhat.com/en/blog/rhel-security-sha-1-package-signatures-distrusted-rhel-9


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread guo Maxwell
+1

On Fri, May 26, 2023 at 11:08 AM, Dinesh Joshi wrote:

> +1
>
>
> On May 25, 2023, at 8:45 AM, Jonathan Ellis  wrote:
>
> 
>
> Let's make this official.
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
> --
you are the apple of my eye !


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Dinesh Joshi
+1

On May 25, 2023, at 8:45 AM, Jonathan Ellis wrote:

> Let's make this official.
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced


Re: [CASSANDRA-11471] Authentication mechanism negotiation (OPTIONS/SUPPORTED)

2023-05-25 Thread Dinesh Joshi
Leaving the naming aside (the hardest part of any software), I am generally 
positive about your idea. A protocol version bump may be avoidable like you 
suggested. Perhaps a prototype of this idea is in order to help shape the idea? 
Would you like to take it on?

> On May 21, 2023, at 4:21 AM, Derek Chen-Becker  wrote:
> 
> We had a recent discussion in Slack about how to potentially use the OPTIONS 
> and SUPPORTED messages in the existing CQL protocol to allow the server to 
> advertise more than one authentication method and allow the client to then 
> choose which authenticator to use. The primary use case here is to allow 
> seamless migration to a new authenticator without having to have all parties 
> involved agree on a single class (and avoid a disruptive change). There's 
> already a ticket open that was focused on making a change to the binary 
> protocol (https://issues.apache.org/jira/browse/CASSANDRA-11471) but I think 
> that we can accomplish this in a backwards compatible way that avoids a 
> change to the protocol itself.
> 
> What I propose is to allow a server configured for this graceful auth change 
> to send an additional value in the [string multimap] body of the SUPPORTED 
> message that indicates which authenticators are supported, in descending 
> priority order. For example, if I wanted to migrate my server to support both 
> PlainTextAuthProvider and some new MyAwesomeAuthProvider, I would configure 
> my client to query options and the server would respond with
> 
> 'AUTHENTICATORS': ['MyAwesomeAuthProvider', 'PlainTextAuthProvider']
> 
> The client can then choose from its own supported providers and send it as 
> part of the STARTUP message [string map] body:
> 
> 'AUTHENTICATOR': 'MyAwesomeAuthProvider'
> 
> I'm not good with naming so feel free to propose a different key for either 
> of these map entries. In any case, the server then validates that the 
> client-chosen authenticator is actually supported and would then proceed with 
> the AUTHENTICATE message. In the case where the client sends an 
> invalid/unsupported authenticator choice, the server can simply respond with 
> an AUTHENTICATE using the most-preferred configured authenticator.
> 
> I think this is a better approach than changing the binary protocol because 
> the mechanism already exists for negotiating options and this seems like a 
> natural use case that avoids having to create an entirely new version of the 
> protocol. It does not appear to conflict with the existing protocol 
> definition but I'm not 100% certain. Section 4.1.1 discusses "Possible 
> options"  for the STARTUP message 
> (https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec#L296),
>  but that's an unfortunate use of English that's ambiguous as to whether it 
> means "The only ones supported" or "Supported but not exclusively".
> 
> I've taken a look at the Java and Python driver source so far and I can't 
> find anything that would seem to cause a problem by returning a SUPPORTED 
> multimap entry that the client isn't aware of (in both they would appear to 
> ignore it), but I'll also admit that this is the first time I've looked at 
> this part of the client code and I could be missing something. Is anyone 
> aware of possible problems that would be caused by using this approach? In 
> particular, if there are clients that strictly validate all entries in the 
> SUPPORTED map then this could cause a problem. 
> 
> Worst case, we may still need a protocol version bump if the enumeration of 
> STARTUP options is intended to be strict, but at least this would not require 
> a new message type and would fit into the existing framework for negotiation 
> between client and server.
> 
> Thoughts, questions, or concerns would be appreciated.
> 
> Cheers,
> 
> Derek
> 
> -- 
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+
> 
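
The negotiation proposed in the quoted message can be sketched roughly as
follows. This is only an illustration, not driver or server code: the
function names are hypothetical, the key names 'AUTHENTICATORS' and
'AUTHENTICATOR' are the placeholders from the proposal, and the SUPPORTED and
STARTUP bodies are modelled as plain Python dicts rather than wire-format
frames.

```python
# Hypothetical sketch of the proposed negotiation; not real driver/server code.
AUTHENTICATORS_KEY = "AUTHENTICATORS"  # proposed SUPPORTED [string multimap] key
AUTHENTICATOR_KEY = "AUTHENTICATOR"    # proposed STARTUP [string map] key


def choose_authenticator(supported, client_providers):
    """Client side: pick the first server-preferred authenticator we support.

    `supported` models the SUPPORTED body as a dict of lists. Returns None
    when the server does not advertise the key or nothing matches, so an
    unaware server or client behaves exactly as it does today.
    """
    for name in supported.get(AUTHENTICATORS_KEY, []):
        if name in client_providers:
            return name
    return None


def build_startup(cql_version, chosen):
    """Client side: build the STARTUP body, adding the choice only if made."""
    body = {"CQL_VERSION": cql_version}
    if chosen is not None:
        body[AUTHENTICATOR_KEY] = chosen
    return body


def resolve_authenticator(startup, configured):
    """Server side: honour a valid choice, else use the most-preferred one.

    `configured` is the server's list in descending priority order.
    """
    requested = startup.get(AUTHENTICATOR_KEY)
    if requested in configured:
        return requested
    return configured[0]
```

A client that only knows PlainTextAuthProvider would end up with that
provider even though the server prefers MyAwesomeAuthProvider, and a client
that sends an unknown or missing choice would be handed the server's first
preference, which matches the fallback described in the proposal.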



Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread J. D. Jordan
+1 nb

On May 25, 2023, at 7:47 PM, Jasonstack Zhao Yang wrote:

+1

On Fri, 26 May 2023 at 8:44 AM, Yifan Cai wrote:

+1

From: Josh McKenzie
Sent: Thursday, May 25, 2023 5:37:02 PM
To: dev
Subject: Re: [VOTE] CEP-30 ANN Vector Search

+1

On Thu, May 25, 2023, at 8:33 PM, Jake Luciani wrote:

+1

On Thu, May 25, 2023 at 11:45 AM Jonathan Ellis wrote:

Let's make this official.

CEP:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes

POC that demonstrates all the big rocks, including distributed queries:
https://github.com/datastax/cassandra/tree/cep-vsearch

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced

--
http://twitter.com/tjake


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Jasonstack Zhao Yang
+1

On Fri, 26 May 2023 at 8:44 AM, Yifan Cai  wrote:

> +1
> --
> *From:* Josh McKenzie 
> *Sent:* Thursday, May 25, 2023 5:37:02 PM
> *To:* dev 
> *Subject:* Re: [VOTE] CEP-30 ANN Vector Search
>
> +1
>
> On Thu, May 25, 2023, at 8:33 PM, Jake Luciani wrote:
>
> +1
>
> On Thu, May 25, 2023 at 11:45 AM Jonathan Ellis  wrote:
>
> Let's make this official.
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
> --
> http://twitter.com/tjake
>
>


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Yifan Cai
+1

From: Josh McKenzie 
Sent: Thursday, May 25, 2023 5:37:02 PM
To: dev 
Subject: Re: [VOTE] CEP-30 ANN Vector Search

+1

On Thu, May 25, 2023, at 8:33 PM, Jake Luciani wrote:
+1

On Thu, May 25, 2023 at 11:45 AM Jonathan Ellis wrote:
Let's make this official.

CEP: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes

POC that demonstrates all the big rocks, including distributed queries: 
https://github.com/datastax/cassandra/tree/cep-vsearch

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
--
http://twitter.com/tjake


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Josh McKenzie
+1

On Thu, May 25, 2023, at 8:33 PM, Jake Luciani wrote:
> +1
> 
> On Thu, May 25, 2023 at 11:45 AM Jonathan Ellis  wrote:
>> Let's make this official.
>> 
>> CEP: 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>> 
>> POC that demonstrates all the big rocks, including distributed queries: 
>> https://github.com/datastax/cassandra/tree/cep-vsearch
>> 
>> --
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
> --
> http://twitter.com/tjake

Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Jake Luciani
+1

On Thu, May 25, 2023 at 11:45 AM Jonathan Ellis  wrote:

> Let's make this official.
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
-- 
http://twitter.com/tjake


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread David Capwell
+1

> On May 25, 2023, at 1:53 PM, Ekaterina Dimitrova wrote:
> 
> +1
> 
> On Thu, 25 May 2023 at 16:46, Brandon Williams wrote:
>> +1
>> 
>> Kind Regards,
>> Brandon
>> 
>> On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis wrote:
>> >
>> > Let's make this official.
>> >
>> > CEP: 
>> > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>> >
>> > POC that demonstrates all the big rocks, including distributed queries: 
>> > https://github.com/datastax/cassandra/tree/cep-vsearch
>> >
>> > --
>> > Jonathan Ellis
>> > co-founder, http://www.datastax.com 
>> > @spyced



Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Ekaterina Dimitrova
+1

On Thu, 25 May 2023 at 16:46, Brandon Williams  wrote:

> +1
>
> Kind Regards,
> Brandon
>
> On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis  wrote:
> >
> > Let's make this official.
> >
> > CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
> >
> > POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
> >
> > --
> > Jonathan Ellis
> > co-founder, http://www.datastax.com
> > @spyced
>


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Brandon Williams
+1

Kind Regards,
Brandon

On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis  wrote:
>
> Let's make this official.
>
> CEP: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
> POC that demonstrates all the big rocks, including distributed queries: 
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced


Re: Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread German Eichberger via dev
+ 1

I am seeing ANN Vector Search pop up in every database...

From: Patrick McFadin 
Sent: Thursday, May 25, 2023 11:29 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [VOTE] CEP-30 ANN Vector Search

+1
Love the buzz this is creating with new users. Thanks for the work on this 
Jonathan.

On Thu, May 25, 2023 at 8:45 AM Jonathan Ellis wrote:
Let's make this official.

CEP: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes

POC that demonstrates all the big rocks, including distributed queries: 
https://github.com/datastax/cassandra/tree/cep-vsearch

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 4.1.2

2023-05-25 Thread Mick Semb Wever
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.





+1


Checked
- signing correct
- checksums are correct
- source artefact builds (JDK 8+11)
- binary artefact runs (JDK 8+11)
- debian package runs (JDK 8+11)
- debian repo runs (JDK 8+11)
- redhat* package runs (JDK 8+11)
- redhat* repo runs (JDK 8+11) ***


***) same as 4.0.10 vote comment:
https://lists.apache.org/thread/qpf7gvdts0vqmmwwjv9k3v3lsk248n7g


Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Benedict
Nope, my awareness of Agrona predates Branimir’s proposal, as does others’.
Aleksey intended to propose its inclusion beforehand also.

If all we’re getting is lock striping, do we really need a separate library?

> On 25 May 2023, at 19:33, Jonathan Ellis wrote:
>
> Let's not fall prey to status quo bias, nobody performed an exhaustive
> analysis of agrona in November.  If Branimir had proposed fastutils at the
> time that's what we'd be using today.
>
> On Thu, May 25, 2023 at 10:50 AM Benedict wrote:
>
>> Given they provide no data or explanation, and that benchmarking is hard,
>> I’m not inclined to give much weight to their analysis.
>>
>> Agrona was favoured in large part due to the perceived quality of the
>> library. I’m not inclined to swap it out without proper evidence the
>> fastutils is both materially faster in a manner we care about and of
>> similar quality.
>>
>> On 25 May 2023, at 16:43, Jonathan Ellis wrote:
>>
>>> Try it out and see, the only data point I have is that the company who
>>> has spent more effort here than anyone else I could find likes fastutil
>>> better.
>>>
>>> On Thu, May 25, 2023 at 10:33 AM Dinesh Joshi wrote:
>>>
>>>> > On May 25, 2023, at 6:14 AM, Jonathan Ellis wrote:
>>>> >
>>>> > Any objections to adding the concurrent wrapper and switching out
>>>> > agrona for fastutil?
>>>>
>>>> How does fastutil compare to agrona in terms of memory profile and
>>>> runtime performance? How invasive would it be to switch?
>>>
>>> --
>>> Jonathan Ellis
>>> co-founder, http://www.datastax.com
>>> @spyced
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced


Re: [VOTE] Release Apache Cassandra 4.0.10

2023-05-25 Thread Mick Semb Wever
>
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>


+1


Checked
- signing correct
- checksums are correct
- source artefact builds (JDK 8+11)
- binary artefact runs (JDK 8+11)
- debian package runs (JDK 8+11)
- debian repo runs (JDK 8+11)
- redhat* package runs (JDK 8+11)
- redhat* repo runs (JDK 8+11) ***


***) yum repo installation looks to be failing due to a legacy (SHA1)
third-party sig in our KEYS file. This would impact all rhel9+ users.
Workaround is…
```
# run this before `yum install cassandra`
update-crypto-policies --set LEGACY
```
ref:
https://www.redhat.com/en/blog/rhel-security-sha-1-package-signatures-distrusted-rhel-9
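
For anyone repeating the "checksums are correct" item from the checklist
above, a minimal sketch of that one check is below. It assumes the artifact
and its digest file have already been downloaded, and that the digest file
starts with the hex digest as its first whitespace-separated token; the
function names and any file paths are illustrative only. It does not cover
the signature check.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_matches(artifact_path, digest_path, algorithm="sha512"):
    """Compare an artifact against the digest stored in a companion file.

    Assumption: the first whitespace-separated token in the digest file is
    the expected hex digest (any trailing file name is ignored).
    """
    with open(digest_path, "r", encoding="ascii") as f:
        expected = f.read().split()[0].lower()
    return file_digest(artifact_path, algorithm) == expected
```

Calling checksum_matches with the downloaded artifact and its digest file
returns True only when the computed and published digests agree; pass
algorithm="sha256" for a SHA-256 digest file.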


Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Jonathan Ellis
Fair enough.  Yes, my thought was if we're going to use fastutils
concurrent we might as well use them for single threaded use cases rather
than having both floating around, but, if we're in love with Agrona buffers
I'm fine with both.

On Thu, May 25, 2023 at 11:29 AM David Capwell  wrote:

> Agrona isn’t going anywhere due to the library being more than basic
> collections.
>
> Now, with regard to single-threaded collections… honestly I dislike Agrona
> as I always fight to avoid boxing; carrot was far better with this regard….
> Didn’t look at the fastutil versions to see if they are better here, but I
> do know I am personally not happy with Agrona primitive collections…
>
> I do believe the main motivator for this is that fastutil has a concurrent
> version of their collections, so you gain access to concurrent primitive
> collections; something we do not have today… Given the desire for
> concurrent primitive collections, I am cool with it.
>
> I’m not inclined to swap it out
>
>
> When it came to random testing libraries, I believe the stance taken
> before was that we should allow multiple versions and the best one will win
> eventually… so I am cool having the same stance for primitive collections...
>
> On May 25, 2023, at 8:50 AM, Benedict  wrote:
>
> Given they provide no data or explanation, and that benchmarking is hard,
> I’m not inclined to give much weight to their analysis.
>
> Agrona was favoured in large part due to the perceived quality of the
> library. I’m not inclined to swap it out without proper evidence the
> fastutils is both materially faster in a manner we care about and of similar
> quality.
>
> On 25 May 2023, at 16:43, Jonathan Ellis  wrote:
>
> 
> Try it out and see, the only data point I have is that the company who has
> spent more effort here than anyone else I could find likes fastutil better.
>
> On Thu, May 25, 2023 at 10:33 AM Dinesh Joshi  wrote:
>
>> > On May 25, 2023, at 6:14 AM, Jonathan Ellis  wrote:
>> >
>> > Any objections to adding the concurrent wrapper and switching out
>> agrona for fastutil?
>>
>> How does fastutil compare to agrona in terms of memory profile and
>> runtime performance? How invasive would it be to switch?
>
>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Jonathan Ellis
Let's not fall prey to status quo bias, nobody performed an exhaustive
analysis of agrona in November.  If Branimir had proposed fastutils at the
time that's what we'd be using today.



On Thu, May 25, 2023 at 10:50 AM Benedict  wrote:

> Given they provide no data or explanation, and that benchmarking is hard,
> I’m not inclined to give much weight to their analysis.
>
> Agrona was favoured in large part due to the perceived quality of the
> library. I’m not inclined to swap it out without proper evidence the
> fastutils is both materially faster in a manner we care about and of similar
> quality.
>
> On 25 May 2023, at 16:43, Jonathan Ellis  wrote:
>
> 
> Try it out and see, the only data point I have is that the company who has
> spent more effort here than anyone else I could find likes fastutil better.
>
> On Thu, May 25, 2023 at 10:33 AM Dinesh Joshi  wrote:
>
>> > On May 25, 2023, at 6:14 AM, Jonathan Ellis  wrote:
>> >
>> > Any objections to adding the concurrent wrapper and switching out
>> agrona for fastutil?
>>
>> How does fastutil compare to agrona in terms of memory profile and
>> runtime performance? How invasive would it be to switch?
>
>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Patrick McFadin
+1
Love the buzz this is creating with new users. Thanks for the work on this
Jonathan.

On Thu, May 25, 2023 at 8:45 AM Jonathan Ellis  wrote:

> Let's make this official.
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


Re: [VOTE] Release Apache Cassandra 4.1.2

2023-05-25 Thread Brandon Williams
+1



On Thu, May 25, 2023 at 10:14 AM Mick Semb Wever  wrote:
>
> Proposing the test build of Cassandra 4.1.2 for release.
>
> sha1: c5c075f0080f3f499d2b01ffb155f89723076285
> Git: https://github.com/apache/cassandra/tree/4.1.2-tentative
> Maven Artifacts: 
> https://repository.apache.org/content/repositories/orgapachecassandra-1302/org/apache/cassandra/cassandra-all/4.1.2/
>
> The Source and Build Artifacts, and the Debian and RPM packages and 
> repositories, are available here: 
> https://dist.apache.org/repos/dist/dev/cassandra/4.1.2/
>
> The vote will be open for 72 hours (longer if needed). Everyone who has 
> tested the build is invited to vote. Votes by PMC members are considered 
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
> [1]: CHANGES.txt: 
> https://github.com/apache/cassandra/blob/4.1.2-tentative/CHANGES.txt
> [2]: NEWS.txt: 
> https://github.com/apache/cassandra/blob/4.1.2-tentative/NEWS.txt
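
The passing rule quoted above (at least three binding +1s and no -1's) can be
written down as a small check. This is only a sketch of that sentence, not
project tooling: the function name and the (value, binding) representation
are made up for illustration, and it reads "no -1's" literally, so any -1
blocks the vote whether or not it is binding.

```python
def vote_passes(votes, min_binding_plus_ones=3):
    """Apply the quoted rule to a list of (value, binding) pairs.

    value is +1, 0 or -1; binding is True for PMC members' votes.
    Assumption: any -1, binding or not, blocks the vote.
    """
    votes = list(votes)
    binding_plus = sum(1 for value, binding in votes if value == 1 and binding)
    any_minus = any(value == -1 for value, _ in votes)
    return binding_plus >= min_binding_plus_ones and not any_minus
```

Non-binding +1s are accepted as input but do not count towards the threshold
of three.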


Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-25 Thread Abe Ratnofsky
I'm seeing a few distinct topics here:

1. Harry's adoption and approachability

I agree that approachability is one of Harry's main improvement areas right 
now. If our goal is to produce a fuzz testing framework for the Cassandra 
project, then adoption by contributors and usage for new feature development 
are reasonable indicators for whether we're achieving that goal. If Harry is 
not getting adopted by contributors outside of Apple, and is not getting used 
for new feature development, then we should make an effort to understand why. I 
don't think that a several-hour seminar is the best point of leverage to 
achieve those goals.

Here's what I think we do need:

- The README should be understandable by anyone interested in writing a fuzz 
test
- Example tests should be runnable from a fresh clone of Cassandra, in an IDE 
or on the command line
- Examples of how we would test new features (like CEP-7, CEP-29, etc) with the 
fuzz testing framework

I find the JVM dtest framework accomplishes similar goals, and one reason is 
because there are plenty of examples, and it's relatively easy to copy and 
paste one example and have it do what you'd like. I believe the same approach 
would work for a fuzz testing framework.

Some of these tasks above are already done for Harry, such as better IDE 
support for samples. This will be available in OSS Harry shortly.

2. Moving Harry in-tree vs. in submodule

As I understand it, making Harry a submodule of Cassandra would make it easier 
to deal with versioning, since we wouldn't have to do the entire release dance 
we need to do for dtest-api, but I don't see this as a big improvement to 
approachability.

I do think that moving Harry in-tree would improve approachability, for the 
same reason as the JVM dtests. It's nice to write a feature or fix, find a 
similar JVM dtest, copy, paste, and edit, and have something useful.

3. General subdivision of Cassandra projects

This topic has come up quite a few times recently - around shared utilities 
(CEP-10 concurrency primitives, etc), dtest-api, query parser, etc. The project 
has tried out a few different approaches on composition of separate projects. 
Hopefully in the near future we find the one that works best and can start this 
process of splitting out libraries.

--
Abe

> On May 25, 2023, at 6:36 AM, Josh McKenzie  wrote:
> 
>> I would really like us to split out utilities into a common project
> +1 to the sentiment.
> 
> Would also advocate strongly for it being more tightly integrated with the 
> base project than what we've been doing with our ecosystem (i.e. completely 
> separate projects, not submodules), mostly from a discoverability and 
> workflow standpoint.
> 
> I'm definitely salty about having to have 4 IDE's / projects open just to 
> work on the entire stack.
> 
> On Thu, May 25, 2023, at 5:05 AM, Alex Petrov wrote:
>> This was not a talk, but rather an interactive workshop, unfortunately will 
>> not work in a recorded way, but I am trying to work out ways to preserve 
>> this.
>> 
>> On Thu, May 25, 2023, at 10:26 AM, Claude Warren, Jr via dev wrote:
>>> Since the talk was not accepted for Cassandra Summit, would it be possible 
>>> to record it as a simple youtube video and publish it so that the detailed 
>>> information about how to use Harry is not lost?
>>> 
>>> On Thu, May 25, 2023 at 7:36 AM Alex Petrov wrote:
>>> 
>>> While we are at it, we may also want to pull the in-jvm dtest API as a 
>>> submodule, and actually move some tests that are common between the 
>>> branches there.
>>> 
>>> On Thu, May 25, 2023, at 6:03 AM, Caleb Rackliffe wrote:
>>>> Isn’t the other reason Accord works well as a submodule that it has no 
>>>> dependencies on C* proper? Harry does at the moment, right? (Not that we 
>>>> couldn’t address that…just trying to think this through…)
>>>> 
>>>>> On May 24, 2023, at 6:54 PM, Benedict wrote:
>>>>> 
>>>>> 
>>>>> In this case Harry is a testing module - it’s not something we will 
>>>>> develop in tandem with C* releases, and we will want improvements to be 
>>>>> applied across all branches.
>>>>> 
>>>>> So it seems a natural fit for submodules to me.
>>>>> 
>>>>> 
>>>>>> On 24 May 2023, at 21:09, Caleb Rackliffe wrote:
>>>>>> 
>>>>>> > Submodules do have their own overhead and edge cases, so I am mostly 
>>>>>> > in favor of using for cases where the code must live outside of tree 
>>>>>> > (such as jvm-dtest that lives out of tree as all branches need the 
>>>>>> > same interfaces)
>>>>>> 
>>>>>> Agreed. Basically where I've ended up on this topic.
>>>>>> 
>>>>>> > We could go over some interesting examples such as testing 2i (SAI)
>>>>>> 
>>>>>> +100
>>>>>> 
>>>>>> 
>>>>>> On Wed, May 24, 2023 at 1:40 PM Alex Petrov wrote:
>>>>>> 
>>>>>> > I'm about to need to harry test for the paging across tombstone work 

Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Benedict
I’m far less inclined to take that approach to fundamental libraries, where
quality is far more important than presentation.

> On 25 May 2023, at 17:29, David Capwell wrote:
>
> Agrona isn’t going anywhere due to the library being more than basic
> collections.
>
> Now, with regard to single-threaded collections… honestly I dislike Agrona
> as I always fight to avoid boxing; carrot was far better with this regard….
> Didn’t look at the fastutil versions to see if they are better here, but I
> do know I am personally not happy with Agrona primitive collections…
>
> I do believe the main motivator for this is that fastutil has a concurrent
> version of their collections, so you gain access to concurrent primitive
> collections; something we do not have today… Given the desire for
> concurrent primitive collections, I am cool with it.
>
>> I’m not inclined to swap it out
>
> When it came to random testing libraries, I believe the stance taken
> before was that we should allow multiple versions and the best one will win
> eventually… so I am cool having the same stance for primitive collections...
>
>> On May 25, 2023, at 8:50 AM, Benedict wrote:
>>
>> Given they provide no data or explanation, and that benchmarking is hard,
>> I’m not inclined to give much weight to their analysis.
>>
>> Agrona was favoured in large part due to the perceived quality of the
>> library. I’m not inclined to swap it out without proper evidence the
>> fastutils is both materially faster in a manner we care about and of
>> similar quality.
>>
>>> On 25 May 2023, at 16:43, Jonathan Ellis wrote:
>>>
>>> Try it out and see, the only data point I have is that the company who
>>> has spent more effort here than anyone else I could find likes fastutil
>>> better.
>>>
>>> On Thu, May 25, 2023 at 10:33 AM Dinesh Joshi wrote:
>>>> > On May 25, 2023, at 6:14 AM, Jonathan Ellis wrote:
>>>> >
>>>> > Any objections to adding the concurrent wrapper and switching out
>>>> > agrona for fastutil?
>>>>
>>>> How does fastutil compare to agrona in terms of memory profile and
>>>> runtime performance? How invasive would it be to switch?
>>>
>>> --
>>> Jonathan Ellis
>>> co-founder, http://www.datastax.com
>>> @spyced


Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread David Capwell
Agrona isn’t going anywhere due to the library being more than basic 
collections.

Now, with regard to single-threaded collections… honestly I dislike Agrona as I 
always fight to avoid boxing; carrot was far better with this regard…. Didn’t 
look at the fastutil versions to see if they are better here, but I do know I 
am personally not happy with Agrona primitive collections…

I do believe the main motivator for this is that fastutil has a concurrent 
version of their collections, so you gain access to concurrent primitive 
collections; something we do not have today… Given the desire for concurrent 
primitive collections, I am cool with it.

> I’m not inclined to swap it out

When it came to random testing libraries, I believe the stance taken before was 
that we should allow multiple versions and the best one will win eventually… so 
I am cool having the same stance for primitive collections...

> On May 25, 2023, at 8:50 AM, Benedict  wrote:
> 
> Given they provide no data or explanation, and that benchmarking is hard, I’m 
> not inclined to give much weight to their analysis.
> 
> Agrona was favoured in large part due to the perceived quality of the 
> library. I’m not inclined to swap it out without proper evidence the 
> fastutils is both materially faster in a manner we care about and of similar 
> quality.
> 
>> On 25 May 2023, at 16:43, Jonathan Ellis  wrote:
>> 
>> 
>> Try it out and see, the only data point I have is that the company who has 
>> spent more effort here than anyone else I could find likes fastutil better.
>> 
>> On Thu, May 25, 2023 at 10:33 AM Dinesh Joshi wrote:
>>> > On May 25, 2023, at 6:14 AM, Jonathan Ellis wrote:
>>> > 
>>> > Any objections to adding the concurrent wrapper and switching out agrona 
>>> > for fastutil?
>>> 
>>> How does fastutil compare to agrona in terms of memory profile and runtime 
>>> performance? How invasive would it be to switch?
>> 
>> 
>> -- 
>> Jonathan Ellis
>> co-founder, http://www.datastax.com 
>> @spyced



Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Benedict
Given they provide no data or explanation, and that benchmarking is hard,
I’m not inclined to give much weight to their analysis.

Agrona was favoured in large part due to the perceived quality of the
library. I’m not inclined to swap it out without proper evidence the
fastutils is both materially faster in a manner we care about and of similar
quality.

> On 25 May 2023, at 16:43, Jonathan Ellis wrote:
>
> Try it out and see, the only data point I have is that the company who has
> spent more effort here than anyone else I could find likes fastutil better.
>
> On Thu, May 25, 2023 at 10:33 AM Dinesh Joshi wrote:
>
>> > On May 25, 2023, at 6:14 AM, Jonathan Ellis wrote:
>> >
>> > Any objections to adding the concurrent wrapper and switching out
>> > agrona for fastutil?
>>
>> How does fastutil compare to agrona in terms of memory profile and
>> runtime performance? How invasive would it be to switch?
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced


Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Jonathan Ellis
There's about a dozen uses of agrona so far, plus a few more in test code,
almost all of which are SAI.  Porting over won't be hard.

On Thu, May 25, 2023 at 10:33 AM Dinesh Joshi  wrote:

> > On May 25, 2023, at 6:14 AM, Jonathan Ellis  wrote:
> >
> > Any objections to adding the concurrent wrapper and switching out agrona
> for fastutil?
>
> How does fastutil compare to agrona in terms of memory profile and runtime
> performance? How invasive would it be to switch?



-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release dtest-api 0.0.15

2023-05-25 Thread Dinesh Joshi
With 5 +1s and no -1s, the vote passes. Thanks everybody.

> On May 24, 2023, at 9:58 AM, Jon Meredith  wrote:
> 
> +1
> 
> On Wed, May 24, 2023 at 10:13 AM Francisco Guerrero wrote:
>> +1 (nb)
>> 
>> On 2023/05/24 15:38:54 Alex Petrov wrote:
>> > +1
>> > 
>> > On Wed, May 24, 2023, at 5:36 PM, Doug Rohrer wrote:
>> > > +1 (nb)
>> > > 
>> > > > On May 24, 2023, at 11:32 AM, Brandon Williams wrote:
>> > > > 
>> > > > +1
>> > > > 
>> > > > Kind Regards,
>> > > > Brandon
>> > > > 
>> > > > On Wed, May 24, 2023 at 10:31 AM Dinesh Joshi wrote:
>> > > >> 
>> > > >> Proposing the test build of in-jvm dtest API 0.0.15 for release.
>> > > >> 
>> > > >> Repository:
>> > > >> https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git
>> > > >> 
>> > > >> Candidate SHA:
>> > > >> https://github.com/apache/cassandra-in-jvm-dtest-api/commit/48af78d1d4b5f285d3dd4991afd4df3101e3983a
>> > > >> tagged with 0.0.15
>> > > >> 
>> > > >> Artifacts:
>> > > >> https://repository.apache.org/content/repositories/orgapachecassandra-1290/org/apache/cassandra/dtest-api/0.0.15/
>> > > >> 
>> > > >> Key signature: 53371F9B1B425A336988B6A03B6042413D323470
>> > > >> 
>> > > >> Changes since last release:
>> > > >> 
>> > > >> * CASSANDRA-18537: Add JMX utility class to in-jvm dtest to ease
>> > > >> development of new tests using JMX
>> > > >> 
>> > > >> The vote will be open for 24 hours. Everyone who has tested the build
>> > > >> is invited to vote. Votes by PMC members are considered binding. A
>> > > >> vote passes if there are at least three binding +1s.
>> > > 
>> > > 



[VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread Jonathan Ellis
Let's make this official.

CEP:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes

POC that demonstrates all the big rocks, including distributed queries:
https://github.com/datastax/cassandra/tree/cep-vsearch

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Jonathan Ellis
Try it out and see, the only data point I have is that the company who has
spent more effort here than anyone else I could find likes fastutil better.

On Thu, May 25, 2023 at 10:33 AM Dinesh Joshi  wrote:

> > On May 25, 2023, at 6:14 AM, Jonathan Ellis  wrote:
> >
> > Any objections to adding the concurrent wrapper and switching out agrona
> for fastutil?
>
> How does fastutil compare to agrona in terms of memory profile and runtime
> performance? How invasive would it be to switch?



-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced


Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Dinesh Joshi
> On May 25, 2023, at 6:14 AM, Jonathan Ellis  wrote:
> 
> Any objections to adding the concurrent wrapper and switching out agrona for 
> fastutil?

How does fastutil compare to agrona in terms of memory profile and runtime 
performance? How invasive would it be to switch?

Re: [VOTE] Release Apache Cassandra 4.1.2

2023-05-25 Thread scott
+1 (nb)

> On May 25, 2023, at 10:12 AM, Mick Semb Wever  wrote:
> 
> Proposing the test build of Cassandra 4.1.2 for release.
> 
> sha1: c5c075f0080f3f499d2b01ffb155f89723076285
> Git: https://github.com/apache/cassandra/tree/4.1.2-tentative
> Maven Artifacts: 
> https://repository.apache.org/content/repositories/orgapachecassandra-1302/org/apache/cassandra/cassandra-all/4.1.2/
> 
> The Source and Build Artifacts, and the Debian and RPM packages and 
> repositories, are available here: 
> https://dist.apache.org/repos/dist/dev/cassandra/4.1.2/
> 
> The vote will be open for 72 hours (longer if needed). Everyone who has 
> tested the build is invited to vote. Votes by PMC members are considered 
> binding. A vote passes if there are at least three binding +1s and no -1's.
> 
> [1]: CHANGES.txt: 
> https://github.com/apache/cassandra/blob/4.1.2-tentative/CHANGES.txt
> [2]: NEWS.txt: 
> https://github.com/apache/cassandra/blob/4.1.2-tentative/NEWS.txt



Re: [VOTE] Release Apache Cassandra 4.0.10

2023-05-25 Thread scott
+1 (nb)

> On May 25, 2023, at 10:14 AM, Brandon Williams  wrote:
> 
> +1
> 
> Kind Regards,
> Brandon
> 
> On Thu, May 25, 2023 at 10:13 AM Mick Semb Wever  wrote:
>> 
>> Proposing the test build of Cassandra 4.0.10 for release.
>> 
>> sha1: da77d3f729160e84fbab37666de99550be794265
>> Git: https://github.com/apache/cassandra/tree/4.0.10-tentative
>> Maven Artifacts: 
>> https://repository.apache.org/content/repositories/orgapachecassandra-1299/org/apache/cassandra/cassandra-all/4.0.10/
>> 
>> The Source and Build Artifacts, and the Debian and RPM packages and 
>> repositories, are available here: 
>> https://dist.apache.org/repos/dist/dev/cassandra/4.0.10/
>> 
>> The vote will be open for 72 hours (longer if needed). Everyone who has 
>> tested the build is invited to vote. Votes by PMC members are considered 
>> binding. A vote passes if there are at least three binding +1s and no -1's.
>> 
>> [1]: CHANGES.txt: 
>> https://github.com/apache/cassandra/blob/4.0.10-tentative/CHANGES.txt
>> [2]: NEWS.txt: 
>> https://github.com/apache/cassandra/blob/4.0.10-tentative/NEWS.txt



Re: [VOTE] Release Apache Cassandra 4.0.10

2023-05-25 Thread Brandon Williams
+1

Kind Regards,
Brandon

On Thu, May 25, 2023 at 10:13 AM Mick Semb Wever  wrote:
>
> Proposing the test build of Cassandra 4.0.10 for release.
>
> sha1: da77d3f729160e84fbab37666de99550be794265
> Git: https://github.com/apache/cassandra/tree/4.0.10-tentative
> Maven Artifacts: 
> https://repository.apache.org/content/repositories/orgapachecassandra-1299/org/apache/cassandra/cassandra-all/4.0.10/
>
> The Source and Build Artifacts, and the Debian and RPM packages and 
> repositories, are available here: 
> https://dist.apache.org/repos/dist/dev/cassandra/4.0.10/
>
> The vote will be open for 72 hours (longer if needed). Everyone who has 
> tested the build is invited to vote. Votes by PMC members are considered 
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
> [1]: CHANGES.txt: 
> https://github.com/apache/cassandra/blob/4.0.10-tentative/CHANGES.txt
> [2]: NEWS.txt: 
> https://github.com/apache/cassandra/blob/4.0.10-tentative/NEWS.txt


[VOTE] Release Apache Cassandra 4.1.2

2023-05-25 Thread Mick Semb Wever
Proposing the test build of Cassandra 4.1.2 for release.

sha1: c5c075f0080f3f499d2b01ffb155f89723076285
Git: https://github.com/apache/cassandra/tree/4.1.2-tentative
Maven Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1302/org/apache/cassandra/cassandra-all/4.1.2/

The Source and Build Artifacts, and the Debian and RPM packages and
repositories, are available here:
https://dist.apache.org/repos/dist/dev/cassandra/4.1.2/

The vote will be open for 72 hours (longer if needed). Everyone who has
tested the build is invited to vote. Votes by PMC members are considered
binding. A vote passes if there are at least three binding +1s and no -1's.

[1]: CHANGES.txt:
https://github.com/apache/cassandra/blob/4.1.2-tentative/CHANGES.txt
[2]: NEWS.txt:
https://github.com/apache/cassandra/blob/4.1.2-tentative/NEWS.txt
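
The pass rule stated in this vote email (at least three binding +1s and no -1's) can be written down as a small check. The following is only a sketch: the `VoteTally` and `Vote` names and the voter labels are hypothetical, and it assumes that any -1 blocks the vote whether or not it is binding, which the email does not spell out.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch of the stated rule; not an official Apache tool. */
class VoteTally {

    /** One vote on the thread. */
    static final class Vote {
        final String voter;    // hypothetical label, not a real voter
        final int value;       // +1, 0, or -1
        final boolean binding; // true when cast by a PMC member

        Vote(String voter, int value, boolean binding) {
            this.voter = voter;
            this.value = value;
            this.binding = binding;
        }
    }

    /** True when there are at least three binding +1s and no -1s. */
    static boolean passes(List<Vote> votes) {
        int bindingPlusOnes = 0;
        for (Vote v : votes) {
            // Assumption: any -1 fails the vote, binding or not.
            if (v.value < 0) return false;
            if (v.value > 0 && v.binding) bindingPlusOnes++;
        }
        return bindingPlusOnes >= 3;
    }

    public static void main(String[] args) {
        List<Vote> votes = Arrays.asList(
            new Vote("pmc-a", +1, true),
            new Vote("pmc-b", +1, true),
            new Vote("pmc-c", +1, true),
            new Vote("contributor-x", +1, false)); // "+1 (nb)"
        System.out.println(passes(votes));
    }
}
```

Non-binding votes are recorded but do not count toward the threshold of three, which is why "+1 (nb)" replies on the thread do not change the outcome by themselves.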


[VOTE] Release Apache Cassandra 4.0.10

2023-05-25 Thread Mick Semb Wever
Proposing the test build of Cassandra 4.0.10 for release.

sha1: da77d3f729160e84fbab37666de99550be794265
Git: https://github.com/apache/cassandra/tree/4.0.10-tentative
Maven Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1299/org/apache/cassandra/cassandra-all/4.0.10/

The Source and Build Artifacts, and the Debian and RPM packages and
repositories, are available here:
https://dist.apache.org/repos/dist/dev/cassandra/4.0.10/

The vote will be open for 72 hours (longer if needed). Everyone who has
tested the build is invited to vote. Votes by PMC members are considered
binding. A vote passes if there are at least three binding +1s and no -1's.

[1]: CHANGES.txt:
https://github.com/apache/cassandra/blob/4.0.10-tentative/CHANGES.txt
[2]: NEWS.txt:
https://github.com/apache/cassandra/blob/4.0.10-tentative/NEWS.txt


Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-25 Thread Josh McKenzie
> I would really like us to split out utilities into a common project
+1 to the sentiment.

Would also advocate strongly for it being more tightly integrated with the base 
project than what we've been doing with our ecosystem (i.e. completely separate 
projects, not submodules), mostly from a discoverability and workflow 
standpoint.

I'm definitely salty about having to have 4 IDEs / projects open just to work 
on the entire stack.

On Thu, May 25, 2023, at 5:05 AM, Alex Petrov wrote:
> This was not a talk but rather an interactive workshop. Unfortunately it 
> will not work in recorded form, but I am trying to work out ways to preserve it.
> 
> On Thu, May 25, 2023, at 10:26 AM, Claude Warren, Jr via dev wrote:
>> Since the talk was not accepted for Cassandra Summit, would it be possible 
>> to record it as a simple youtube video and publish it so that the detailed 
>> information about how to use Harry is not lost?
>> 
>> On Thu, May 25, 2023 at 7:36 AM Alex Petrov  wrote:
>>> While we are at it, we may also want to pull the in-jvm dtest API as a 
>>> submodule, and actually move some tests that are common between the 
>>> branches there.
>>> 
>>> On Thu, May 25, 2023, at 6:03 AM, Caleb Rackliffe wrote:
>>>> Isn’t the other reason Accord works well as a submodule that it has no 
>>>> dependencies on C* proper? Harry does at the moment, right? (Not that we 
>>>> couldn’t address that…just trying to think this through…)
>>>> 
>>>>> On May 24, 2023, at 6:54 PM, Benedict wrote:
>>>>> 
>>>>> In this case Harry is a testing module - it’s not something we will 
>>>>> develop in tandem with C* releases, and we will want improvements to be 
>>>>> applied across all branches.
>>>>> 
>>>>> So it seems a natural fit for submodules to me.
>>>>> 
>>>>>> On 24 May 2023, at 21:09, Caleb Rackliffe wrote:
>>>>>> 
>>>>>> > Submodules do have their own overhead and edge cases, so I am mostly 
>>>>>> > in favor of using for cases where the code must live outside of tree 
>>>>>> > (such as jvm-dtest that lives out of tree as all branches need the 
>>>>>> > same interfaces)
>>>>>> 
>>>>>> Agreed. Basically where I've ended up on this topic.
>>>>>> 
>>>>>> > We could go over some interesting examples such as testing 2i (SAI)
>>>>>> 
>>>>>> +100
>>>>>> 
>>>>>> On Wed, May 24, 2023 at 1:40 PM Alex Petrov wrote:
>>>>>>> > I'm about to need to harry test for the paging across tombstone work 
>>>>>>> > for https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's 
>>>>>>> > where my own overlapping fuzzing came in). In the process, I'll see 
>>>>>>> > if I can't distill something really simple along the lines of how 
>>>>>>> > React approaches it (https://react.dev/learn).
>>>>>>> 
>>>>>>> We can pick that up as an example, sure. 
>>>>>>> 
>>>>>>> On Wed, May 24, 2023, at 4:53 PM, Josh McKenzie wrote:
>>>>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry 
>>>>>>>>> workshop,
>>>>>>>> I'm about to need to harry test for the paging across tombstone work 
>>>>>>>> for https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's 
>>>>>>>> where my own overlapping fuzzing came in). In the process, I'll see if 
>>>>>>>> I can't distill something really simple along the lines of how React 
>>>>>>>> approaches it (https://react.dev/learn).
>>>>>>>> 
>>>>>>>> Ideally we'd be able to get something together that's a high level "In 
>>>>>>>> the next 15 minutes, you will know and understand A-G and have access 
>>>>>>>> to N% of the power of harry" kind of offer.
>>>>>>>> 
>>>>>>>> Honestly, there's a *lot* in our ecosystem where we could benefit from 
>>>>>>>> taking a page from their book in terms of onboarding and getting 
>>>>>>>> started IMO.
>>>>>>>> 
>>>>>>>> On Wed, May 24, 2023, at 10:31 AM, Alex Petrov wrote:
>>>>>>>>> > I wonder if a mini-onboarding session would be good as a community 
>>>>>>>>> > session - go over Harry, how to run it, how to add a test?  Would 
>>>>>>>>> > that be the right venue?  I just would like to see how we can not 
>>>>>>>>> > only plug it in to regular CI but get everyone that wants to add a 
>>>>>>>>> > test be able to know how to get started with it.
>>>>>>>>> 
>>>>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry 
>>>>>>>>> workshop, but unfortunately it got declined. Goes without saying, we 
>>>>>>>>> can still do it online, time and resources permitting. But again, I 
>>>>>>>>> do not think it should be barring us from making Harry a part of the 
>>>>>>>>> codebase, as it already is. In fact, we can be iterating on the 
>>>>>>>>> development quicker having it in-tree. 
>>>>>>>>> 
>>>>>>>>> We could go over some interesting examples such as testing 2i (SAI), 
>>>>>>>>> modelling Group By tests, or testing repair. If there is enough 
>>>>>>>>> appetite and collaboration in the community, I will see if we can 
>>>>>>>>> pull something like that together. Input on _what_ you would like to 

Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Jonathan Ellis
Hi all,

We've been using agrona for almost a year and it's a huge improvement over
boxing everything.  But it's limited to single thread use cases.

Fastutil is an alternative that has a concurrent wrapper:
https://github.com/vigna/fastutil
https://github.com/trivago/fastutil-concurrent-wrapper

Both fastutil and the concurrent wrapper are actively maintained.  The
authors of the wrapper say they evaluated fastutil vs agrona and built
their wrapper for fastutil because it's faster at reads and writes:
https://tech.trivago.com/post/2022-03-09-why-and-how-we-use-primitive-maps

Any objections to adding the concurrent wrapper and switching out agrona
for fastutil?

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
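
To make the trade-off in this proposal concrete, here is a minimal, standard-library-only sketch of the two ideas involved: a primitive `long`-to-`long` map that avoids boxing, and a thread-safe wrapper built by partitioning keys across independently locked buckets. Everything in it is an assumption made for illustration: the class `StripedLongLongMap`, its `Bucket`, and the lock-striping design are hypothetical and are not the API or the confirmed internals of agrona, fastutil, or fastutil-concurrent-wrapper.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative sketch only; not the API of agrona, fastutil, or the wrapper. */
class StripedLongLongMap {

    /** Single-threaded open-addressing long-to-long map: no Long boxing. */
    static final class Bucket {
        private long[] keys = new long[16];
        private long[] values = new long[16];
        private boolean[] used = new boolean[16];
        private int size;

        long get(long key, long missing) {
            int mask = keys.length - 1;
            // Linear probing; the table is kept at most half full, so an
            // unused slot always ends the scan.
            for (int i = slot(key, mask); used[i]; i = (i + 1) & mask) {
                if (keys[i] == key) return values[i];
            }
            return missing;
        }

        void put(long key, long value) {
            if ((size + 1) * 2 > keys.length) grow();
            int mask = keys.length - 1;
            int i = slot(key, mask);
            while (used[i] && keys[i] != key) i = (i + 1) & mask;
            if (!used[i]) { used[i] = true; keys[i] = key; size++; }
            values[i] = value;
        }

        private void grow() {
            long[] oldKeys = keys, oldValues = values;
            boolean[] oldUsed = used;
            int capacity = oldKeys.length * 2;
            keys = new long[capacity];
            values = new long[capacity];
            used = new boolean[capacity];
            size = 0;
            for (int i = 0; i < oldKeys.length; i++) {
                if (oldUsed[i]) put(oldKeys[i], oldValues[i]);
            }
        }

        private static int slot(long key, int mask) {
            long h = key * 0x9E3779B97F4A7C15L; // multiplicative hash
            return (int) (h ^ (h >>> 32)) & mask;
        }
    }

    private final Bucket[] buckets;
    private final ReentrantReadWriteLock[] locks;

    StripedLongLongMap(int stripes) {
        buckets = new Bucket[stripes];
        locks = new ReentrantReadWriteLock[stripes];
        for (int i = 0; i < stripes; i++) {
            buckets[i] = new Bucket();
            locks[i] = new ReentrantReadWriteLock();
        }
    }

    private int stripe(long key) {
        return Math.floorMod(Long.hashCode(key), buckets.length);
    }

    /** Readers of the same stripe may proceed concurrently. */
    long get(long key, long missing) {
        int s = stripe(key);
        locks[s].readLock().lock();
        try {
            return buckets[s].get(key, missing);
        } finally {
            locks[s].readLock().unlock();
        }
    }

    /** A writer only blocks other operations on the same stripe. */
    void put(long key, long value) {
        int s = stripe(key);
        locks[s].writeLock().lock();
        try {
            buckets[s].put(key, value);
        } finally {
            locks[s].writeLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StripedLongLongMap map = new StripedLongLongMap(8);
        Thread[] writers = new Thread[4];
        for (int t = 0; t < writers.length; t++) {
            final long base = t * 10_000L;
            writers[t] = new Thread(() -> {
                for (long k = base; k < base + 10_000; k++) map.put(k, k * 2);
            });
            writers[t].start();
        }
        for (Thread w : writers) w.join();
        System.out.println(map.get(39_999L, -1L));
    }
}
```

The inner `Bucket` stands in for a single-threaded primitive collection of the kind both libraries provide; the outer class shows why a wrapper is needed before such a map can be shared between threads. Whether a real library is faster than another at reads and writes would still need to be measured, as the replies in this thread point out.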


Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-25 Thread Alex Petrov
This was not a talk but rather an interactive workshop. Unfortunately it will 
not work in recorded form, but I am trying to work out ways to preserve it.

On Thu, May 25, 2023, at 10:26 AM, Claude Warren, Jr via dev wrote:
> Since the talk was not accepted for Cassandra Summit, would it be possible to 
> record it as a simple youtube video and publish it so that the detailed 
> information about how to use Harry is not lost?
> 
> On Thu, May 25, 2023 at 7:36 AM Alex Petrov  wrote:
>> While we are at it, we may also want to pull the in-jvm dtest API as a 
>> submodule, and actually move some tests that are common between the branches 
>> there.
>> 
>> On Thu, May 25, 2023, at 6:03 AM, Caleb Rackliffe wrote:
>>> Isn’t the other reason Accord works well as a submodule that it has no 
>>> dependencies on C* proper? Harry does at the moment, right? (Not that we 
>>> couldn’t address that…just trying to think this through…)
>>> 
>>>> On May 24, 2023, at 6:54 PM, Benedict wrote:
>>>> 
>>>> In this case Harry is a testing module - it’s not something we will 
>>>> develop in tandem with C* releases, and we will want improvements to be 
>>>> applied across all branches.
>>>> 
>>>> So it seems a natural fit for submodules to me.
>>>> 
>>>>> On 24 May 2023, at 21:09, Caleb Rackliffe wrote:
>>>>> 
>>>>> > Submodules do have their own overhead and edge cases, so I am mostly in 
>>>>> > favor of using for cases where the code must live outside of tree (such 
>>>>> > as jvm-dtest that lives out of tree as all branches need the same 
>>>>> > interfaces)
>>>>> 
>>>>> Agreed. Basically where I've ended up on this topic.
>>>>> 
>>>>> > We could go over some interesting examples such as testing 2i (SAI)
>>>>> 
>>>>> +100
>>>>> 
>>>>> On Wed, May 24, 2023 at 1:40 PM Alex Petrov wrote:
>>>>>> > I'm about to need to harry test for the paging across tombstone work 
>>>>>> > for https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's 
>>>>>> > where my own overlapping fuzzing came in). In the process, I'll see if 
>>>>>> > I can't distill something really simple along the lines of how React 
>>>>>> > approaches it (https://react.dev/learn).
>>>>>> 
>>>>>> We can pick that up as an example, sure. 
>>>>>> 
>>>>>> On Wed, May 24, 2023, at 4:53 PM, Josh McKenzie wrote:
>>>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry 
>>>>>>>> workshop,
>>>>>>> I'm about to need to harry test for the paging across tombstone work 
>>>>>>> for https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's where 
>>>>>>> my own overlapping fuzzing came in). In the process, I'll see if I 
>>>>>>> can't distill something really simple along the lines of how React 
>>>>>>> approaches it (https://react.dev/learn).
>>>>>>> 
>>>>>>> Ideally we'd be able to get something together that's a high level "In 
>>>>>>> the next 15 minutes, you will know and understand A-G and have access 
>>>>>>> to N% of the power of harry" kind of offer.
>>>>>>> 
>>>>>>> Honestly, there's a *lot* in our ecosystem where we could benefit from 
>>>>>>> taking a page from their book in terms of onboarding and getting 
>>>>>>> started IMO.
>>>>>>> 
>>>>>>> On Wed, May 24, 2023, at 10:31 AM, Alex Petrov wrote:
>>>>>>>> > I wonder if a mini-onboarding session would be good as a community 
>>>>>>>> > session - go over Harry, how to run it, how to add a test?  Would 
>>>>>>>> > that be the right venue?  I just would like to see how we can not 
>>>>>>>> > only plug it in to regular CI but get everyone that wants to add a 
>>>>>>>> > test be able to know how to get started with it.
>>>>>>>> 
>>>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry 
>>>>>>>> workshop, but unfortunately it got declined. Goes without saying, we 
>>>>>>>> can still do it online, time and resources permitting. But again, I do 
>>>>>>>> not think it should be barring us from making Harry a part of the 
>>>>>>>> codebase, as it already is. In fact, we can be iterating on the 
>>>>>>>> development quicker having it in-tree. 
>>>>>>>> 
>>>>>>>> We could go over some interesting examples such as testing 2i (SAI), 
>>>>>>>> modelling Group By tests, or testing repair. If there is enough 
>>>>>>>> appetite and collaboration in the community, I will see if we can pull 
>>>>>>>> something like that together. Input on _what_ you would like to see / 
>>>>>>>> hear / tested is also appreciated. Harry was developed out of a strong 
>>>>>>>> need for large-scale testing, which also has informed many of its 
>>>>>>>> APIs, but we can make it easier to access for interactive testing / 
>>>>>>>> unit tests. We have been doing a lot of that with Transactional 
>>>>>>>> Metadata, too. 
>>>>>>>> 
>>>>>>>> > I'll hold off on this until Alex Petrov chimes in. @Alex -> got any 
>>>>>>>> > thoughts here?
>>>>>>>> 
>>>>>>>> Yes, sorry for not responding on this thread earlier. I can not 
>>>>>>>> understate how excited I am about this, and 

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-25 Thread Benedict
I would really like us to split out utilities into a common project,
personally. It would be nice to work with a shared palette, including for
dtest-api, accord, Harry etc.

I think it would help clean up the codebase a bit too, as we have some
(minimal) tight coupling with utilities and the C* process.

But doubt we have the time for that anytime soon.

On 25 May 2023, at 05:04, Caleb Rackliffe wrote:

> Isn’t the other reason Accord works well as a submodule that it has no
> dependencies on C* proper? Harry does at the moment, right? (Not that we
> couldn’t address that…just trying to think this through…)
>
>> On May 24, 2023, at 6:54 PM, Benedict wrote:
>>
>> In this case Harry is a testing module - it’s not something we will develop
>> in tandem with C* releases, and we will want improvements to be applied
>> across all branches.
>>
>> So it seems a natural fit for submodules to me.
>>
>>> On 24 May 2023, at 21:09, Caleb Rackliffe wrote:
>>>
>>> > Submodules do have their own overhead and edge cases, so I am mostly in
>>> > favor of using for cases where the code must live outside of tree (such
>>> > as jvm-dtest that lives out of tree as all branches need the same
>>> > interfaces)
>>>
>>> Agreed. Basically where I've ended up on this topic.
>>>
>>> > We could go over some interesting examples such as testing 2i (SAI)
>>>
>>> +100
>>>
>>> On Wed, May 24, 2023 at 1:40 PM Alex Petrov wrote:
>>>> > I'm about to need to harry test for the paging across tombstone work for
>>>> > https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's where my
>>>> > own overlapping fuzzing came in). In the process, I'll see if I can't
>>>> > distill something really simple along the lines of how React approaches
>>>> > it (https://react.dev/learn).
>>>>
>>>> We can pick that up as an example, sure.
>>>>
>>>> On Wed, May 24, 2023, at 4:53 PM, Josh McKenzie wrote:
>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry
>>>>>> workshop,
>>>>> I'm about to need to harry test for the paging across tombstone work for
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's where my
>>>>> own overlapping fuzzing came in). In the process, I'll see if I can't
>>>>> distill something really simple along the lines of how React approaches
>>>>> it (https://react.dev/learn).
>>>>>
>>>>> Ideally we'd be able to get something together that's a high level "In
>>>>> the next 15 minutes, you will know and understand A-G and have access to
>>>>> N% of the power of harry" kind of offer.
>>>>>
>>>>> Honestly, there's a lot in our ecosystem where we could benefit from
>>>>> taking a page from their book in terms of onboarding and getting started
>>>>> IMO.
>>>>>
>>>>> On Wed, May 24, 2023, at 10:31 AM, Alex Petrov wrote:
>>>>>> > I wonder if a mini-onboarding session would be good as a community
>>>>>> > session - go over Harry, how to run it, how to add a test?  Would that
>>>>>> > be the right venue?  I just would like to see how we can not only plug
>>>>>> > it in to regular CI but get everyone that wants to add a test be able
>>>>>> > to know how to get started with it.
>>>>>>
>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry
>>>>>> workshop, but unfortunately it got declined. Goes without saying, we can
>>>>>> still do it online, time and resources permitting. But again, I do not
>>>>>> think it should be barring us from making Harry a part of the codebase,
>>>>>> as it already is. In fact, we can be iterating on the development
>>>>>> quicker having it in-tree.
>>>>>>
>>>>>> We could go over some interesting examples such as testing 2i (SAI),
>>>>>> modelling Group By tests, or testing repair. If there is enough appetite
>>>>>> and collaboration in the community, I will see if we can pull something
>>>>>> like that together. Input on _what_ you would like to see / hear /
>>>>>> tested is also appreciated. Harry was developed out of a strong need for
>>>>>> large-scale testing, which also has informed many of its APIs, but we
>>>>>> can make it easier to access for interactive testing / unit tests. We
>>>>>> have been doing a lot of that with Transactional Metadata, too.
>>>>>>
>>>>>> > I'll hold off on this until Alex Petrov chimes in. @Alex -> got any
>>>>>> > thoughts here?
>>>>>>
>>>>>> Yes, sorry for not responding on this thread earlier. I can not
>>>>>> understate how excited I am about this, and how important I think this
>>>>>> is. Time constraints are somehow hard to overcome, but I hope the
>>>>>> results brought by TCM will make it all worth it.
>>>>>>
>>>>>> On Wed, May 24, 2023, at 4:23 PM, Alex Petrov wrote:
>>>>>>> I think pulling Harry into the tree will make adoption easier for the
>>>>>>> folks. I have been a bit swamped with Transactional Metadata work, but
>>>>>>> I wanted to make some of the things we were using for testing TCM
>>>>>>> available outside of TCM branch. This includes a bunch of helper
>>>>>>> methods to perform operations on the clusters, data generation, and
>>>>>>> more useful stuff. Of course, the question always remains about how
>>>>>>> much time I want to spend porting it all to Gossip, but I think we can
>>>>>>> find a reasonable compromise.
>>>>>>>
>>>>>>> I would not set this improvement as a prerequisite to pulling Harry
>>>>>>> into the main branch, but rather interpret it as a commitment from
>>>>>>> myself to take community input and make it more approachable by the
>>>>>>> day.
>>>>>>>
>>>>>>> On Wed, May 24, 2023, at 2:44 PM, Josh McKenzie wrote:
>>>>>>>> importantly it’s a million times better than the dtest-api process - 

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-25 Thread Claude Warren, Jr via dev
Since the talk was not accepted for Cassandra Summit, would it be possible
to record it as a simple youtube video and publish it so that the detailed
information about how to use Harry is not lost?

On Thu, May 25, 2023 at 7:36 AM Alex Petrov  wrote:

> While we are at it, we may also want to pull the in-jvm dtest API as a
> submodule, and actually move some tests that are common between the
> branches there.
>
> On Thu, May 25, 2023, at 6:03 AM, Caleb Rackliffe wrote:
>
> Isn’t the other reason Accord works well as a submodule that it has no
> dependencies on C* proper? Harry does at the moment, right? (Not that we
> couldn’t address that…just trying to think this through…)
>
> On May 24, 2023, at 6:54 PM, Benedict  wrote:
>
> 
>
> In this case Harry is a testing module - it’s not something we will
> develop in tandem with C* releases, and we will want improvements to be
> applied across all branches.
>
> So it seems a natural fit for submodules to me.
>
>
> On 24 May 2023, at 21:09, Caleb Rackliffe 
> wrote:
>
> 
> > Submodules do have their own overhead and edge cases, so I am mostly in
> favor of using for cases where the code must live outside of tree (such as
> jvm-dtest that lives out of tree as all branches need the same interfaces)
>
> Agreed. Basically where I've ended up on this topic.
>
> > We could go over some interesting examples such as testing 2i (SAI)
>
> +100
>
>
> On Wed, May 24, 2023 at 1:40 PM Alex Petrov  wrote:
>
>
> > I'm about to need to harry test for the paging across tombstone work for
> https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's where my
> own overlapping fuzzing came in). In the process, I'll see if I can't
> distill something really simple along the lines of how React approaches it (
> https://react.dev/learn).
>
> We can pick that up as an example, sure.
>
> On Wed, May 24, 2023, at 4:53 PM, Josh McKenzie wrote:
>
> I have submitted a proposal to Cassandra Summit for a 4-hour Harry
> workshop,
>
> I'm about to need to harry test for the paging across tombstone work for
> https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's where my
> own overlapping fuzzing came in). In the process, I'll see if I can't
> distill something really simple along the lines of how React approaches it (
> https://react.dev/learn).
>
> Ideally we'd be able to get something together that's a high level "In the
> next 15 minutes, you will know and understand A-G and have access to N% of
> the power of harry" kind of offer.
>
> Honestly, there's a *lot* in our ecosystem where we could benefit from
> taking a page from their book in terms of onboarding and getting started
> IMO.
>
> On Wed, May 24, 2023, at 10:31 AM, Alex Petrov wrote:
>
> > I wonder if a mini-onboarding session would be good as a community
> session - go over Harry, how to run it, how to add a test?  Would that be
> the right venue?  I just would like to see how we can not only plug it in
> to regular CI but get everyone that wants to add a test be able to know how
> to get started with it.
>
> I have submitted a proposal to Cassandra Summit for a 4-hour Harry
> workshop, but unfortunately it got declined. Goes without saying, we can
> still do it online, time and resources permitting. But again, I do not
> think it should be barring us from making Harry a part of the codebase, as
> it already is. In fact, we can be iterating on the development quicker
> having it in-tree.
>
> We could go over some interesting examples such as testing 2i (SAI),
> modelling Group By tests, or testing repair. If there is enough appetite
> and collaboration in the community, I will see if we can pull something
> like that together. Input on _what_ you would like to see / hear / tested
> is also appreciated. Harry was developed out of a strong need for
> large-scale testing, which also has informed many of its APIs, but we can
> make it easier to access for interactive testing / unit tests. We have been
> doing a lot of that with Transactional Metadata, too.
>
> > I'll hold off on this until Alex Petrov chimes in. @Alex -> got any
> thoughts here?
>
> Yes, sorry for not responding on this thread earlier. I can not understate
> how excited I am about this, and how important I think this is. Time
> constraints are somehow hard to overcome, but I hope the results brought by
> TCM will make it all worth it.
>
> On Wed, May 24, 2023, at 4:23 PM, Alex Petrov wrote:
>
> I think pulling Harry into the tree will make adoption easier for the
> folks. I have been a bit swamped with Transactional Metadata work, but I
> wanted to make some of the things we were using for testing TCM available
> outside of TCM branch. This includes a bunch of helper methods to perform
> operations on the clusters, data generation, and more useful stuff. Of
> course, the question always remains about how much time I want to spend
> porting it all to Gossip, but I think we can find a reasonable compromise.
>
> I would not set this improvement as a 

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-25 Thread Alex Petrov
While we are at it, we may also want to pull the in-jvm dtest API as a 
submodule, and actually move some tests that are common between the branches 
there.

On Thu, May 25, 2023, at 6:03 AM, Caleb Rackliffe wrote:
> Isn’t the other reason Accord works well as a submodule that it has no 
> dependencies on C* proper? Harry does at the moment, right? (Not that we 
> couldn’t address that…just trying to think this through…)
> 
>> On May 24, 2023, at 6:54 PM, Benedict  wrote:
>> 
>> 
>> In this case Harry is a testing module - it’s not something we will develop 
>> in tandem with C* releases, and we will want improvements to be applied 
>> across all branches.
>> 
>> So it seems a natural fit for submodules to me.
>> 
>> 
>>> On 24 May 2023, at 21:09, Caleb Rackliffe  wrote:
>>> 
>>> > Submodules do have their own overhead and edge cases, so I am mostly in 
>>> > favor of using for cases where the code must live outside of tree (such 
>>> > as jvm-dtest that lives out of tree as all branches need the same 
>>> > interfaces)
>>> 
>>> Agreed. Basically where I've ended up on this topic.
>>> 
>>> > We could go over some interesting examples such as testing 2i (SAI)
>>> 
>>> +100
>>> 
>>> 
>>> On Wed, May 24, 2023 at 1:40 PM Alex Petrov  wrote:
>>>> > I'm about to need to harry test for the paging across tombstone work for 
>>>> > https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's where my 
>>>> > own overlapping fuzzing came in). In the process, I'll see if I can't 
>>>> > distill something really simple along the lines of how React approaches 
>>>> > it (https://react.dev/learn).
>>>> 
>>>> We can pick that up as an example, sure. 
>>>> 
>>>> On Wed, May 24, 2023, at 4:53 PM, Josh McKenzie wrote:
>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry 
>>>>>> workshop,
>>>>> I'm about to need to harry test for the paging across tombstone work for 
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's where my 
>>>>> own overlapping fuzzing came in). In the process, I'll see if I can't 
>>>>> distill something really simple along the lines of how React approaches 
>>>>> it (https://react.dev/learn).
>>>>> 
>>>>> Ideally we'd be able to get something together that's a high level "In 
>>>>> the next 15 minutes, you will know and understand A-G and have access to 
>>>>> N% of the power of harry" kind of offer.
>>>>> 
>>>>> Honestly, there's a *lot* in our ecosystem where we could benefit from 
>>>>> taking a page from their book in terms of onboarding and getting started 
>>>>> IMO.
>>>>> 
>>>>> On Wed, May 24, 2023, at 10:31 AM, Alex Petrov wrote:
>>>>>> > I wonder if a mini-onboarding session would be good as a community 
>>>>>> > session - go over Harry, how to run it, how to add a test?  Would that 
>>>>>> > be the right venue?  I just would like to see how we can not only plug 
>>>>>> > it in to regular CI but get everyone that wants to add a test be able 
>>>>>> > to know how to get started with it.
>>>>>> 
>>>>>> I have submitted a proposal to Cassandra Summit for a 4-hour Harry 
>>>>>> workshop, but unfortunately it got declined. Goes without saying, we can 
>>>>>> still do it online, time and resources permitting. But again, I do not 
>>>>>> think it should be barring us from making Harry a part of the codebase, 
>>>>>> as it already is. In fact, we can be iterating on the development 
>>>>>> quicker having it in-tree. 
>>>>>> 
>>>>>> We could go over some interesting examples such as testing 2i (SAI), 
>>>>>> modelling Group By tests, or testing repair. If there is enough appetite 
>>>>>> and collaboration in the community, I will see if we can pull something 
>>>>>> like that together. Input on _what_ you would like to see / hear / 
>>>>>> tested is also appreciated. Harry was developed out of a strong need for 
>>>>>> large-scale testing, which also has informed many of its APIs, but we 
>>>>>> can make it easier to access for interactive testing / unit tests. We 
>>>>>> have been doing a lot of that with Transactional Metadata, too. 
>>>>>> 
>>>>>> > I'll hold off on this until Alex Petrov chimes in. @Alex -> got any 
>>>>>> > thoughts here?
>>>>>> 
>>>>>> Yes, sorry for not responding on this thread earlier. I can not 
>>>>>> understate how excited I am about this, and how important I think this 
>>>>>> is. Time constraints are somehow hard to overcome, but I hope the 
>>>>>> results brought by TCM will make it all worth it.
>>>>>> 
>>>>>> On Wed, May 24, 2023, at 4:23 PM, Alex Petrov wrote:
>>>>>>> I think pulling Harry into the tree will make adoption easier for the 
>>>>>>> folks. I have been a bit swamped with Transactional Metadata work, but 
>>>>>>> I wanted to make some of the things we were using for testing TCM 
>>>>>>> available outside of TCM branch. This includes a bunch of helper 
>>>>>>> methods to perform operations on the clusters, data generation, and 
>>>>>>> more useful stuff. Of course, the question always remains about how