Re: [DISCUSS] Stream Pipelines on hot paths

2024-05-31 Thread David Capwell
I am cool for forbidding with a callout that tests are ok. I am cool with forbidding in tests as well, but that's more for consistency reasons than anything. > On May 31, 2024, at 8:12 AM, Brandon Williams wrote: > > > On Fri, May 31, 2024 at 9:35 AM Abe Ratnofsky >

Re: [DISCUSS] Stream Pipelines on hot paths

2024-05-31 Thread David Capwell
agreement about edge cases, but if anyone in a discussion thinks something is a hot path then it should be treated as one IMO. On 30 May 2024, at 18:39, David Capwell wrote

Re: [DISCUSS] Stream Pipelines on hot paths

2024-05-30 Thread David Capwell
As a general statement I agree with you (same for String.format as well), but one thing to call out is that it can be hard to tell what is the hot path and what isn’t. When you are doing background work (like repair) it’s clear, but when touching something internal it can be hard to tell; this
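To make the trade-off in this thread concrete, here is a minimal sketch of the same computation written as a stream pipeline and as a plain loop. The class and method names are hypothetical and not taken from the Cassandra code base, and no performance numbers are claimed; the only point is that the loop form avoids setting up a pipeline on every call.

```java
import java.util.List;

// Hypothetical illustration only; not Cassandra code.
final class HotPathSum {
    // Stream version: concise, but a pipeline is set up on each call.
    static long sumOfEvenStream(List<Integer> values) {
        return values.stream()
                     .filter(v -> v % 2 == 0)
                     .mapToLong(Integer::longValue)
                     .sum();
    }

    // Loop version: same result, no pipeline setup.
    static long sumOfEvenLoop(List<Integer> values) {
        long sum = 0;
        for (int i = 0, size = values.size(); i < size; i++) {
            int v = values.get(i);
            if (v % 2 == 0)
                sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(sumOfEvenStream(values));
        System.out.println(sumOfEvenLoop(values));
    }
}
```

Whether the difference matters depends on how hot the call site really is, which is exactly the judgment call the message above describes.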

Re: [DISCUSS] The way we log

2024-05-29 Thread David Capwell
> I saw a lot of cases like this: > > if (logger.isTraceEnabled()) > logger.trace("a log message"); > > and sometimes just: > > logger.trace("a log message"); > > Why do we do it once like that and another time the other way? I remember years ago seeing perf numbers where the
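A small sketch of why the guard can matter. `TinyLogger` is a hypothetical stand-in written for this example, not the SLF4J API; it only mimics the `isTraceEnabled()` / `trace(String)` pair quoted above. The sketch counts how often an expensive message argument is built when trace logging is off.

```java
// Hypothetical illustration only; TinyLogger is not a real logging API.
final class TraceGuardSketch {
    static final class TinyLogger {
        private final boolean traceEnabled;
        int messagesWritten = 0;

        TinyLogger(boolean traceEnabled) { this.traceEnabled = traceEnabled; }

        boolean isTraceEnabled() { return traceEnabled; }

        void trace(String message) {
            if (traceEnabled)
                messagesWritten++;
        }
    }

    // Counts how many times the costly message argument was built.
    static int expensiveCalls = 0;

    static String expensiveDescription() {
        expensiveCalls++;
        return "state=" + System.nanoTime();
    }

    // The argument is built before trace() gets a chance to check the level.
    static void unguarded(TinyLogger logger) {
        logger.trace("details: " + expensiveDescription());
    }

    // The guard skips building the argument when trace is disabled.
    static void guarded(TinyLogger logger) {
        if (logger.isTraceEnabled())
            logger.trace("details: " + expensiveDescription());
    }
}
```

For a constant string such as "a log message" there is no argument to build, so the guard saves little, which may be one reason both forms show up side by side.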

Re: [DISCUSS] Adding experimental vtables and rules around them

2024-05-29 Thread David Capwell
t a thought. I do not see yet how we could leverage it, just > saying how I perceive it. > > On Wed, May 29, 2024 at 9:02 PM David Capwell <mailto:dcapw...@apple.com>> wrote: >> We agreed a long time ago that all new features are disabled by default, but >> I want

[DISCUSS] Adding experimental vtables and rules around them

2024-05-29 Thread David Capwell
We agreed a long time ago that all new features are disabled by default, but I wanted to try to flesh out what we “should” do with something that might still be experimental and subject to breaking changes; I would prefer we keep this thread specific to vtables as the UX is different for

Re: [DISCUSS] Gossip Protocol Change

2024-05-16 Thread David Capwell
ad serious impact (from >> data loss to loss of ability to safely replace nodes in the cluster). >> >> I am happy to contribute some time to this if lack of folks is the issue. >> >> Jordan >> >> On Mon, May 13, 2024 at 17:05 David Capwell > <mailt

Re: [DISCUSS] ccm as a subproject

2024-05-15 Thread David Capwell
Yes please! > On May 15, 2024, at 2:23 PM, Bret McGuire wrote: > >Very much agreed Paulo; I was musing on the idea of adding Docker support > to ccm recently as well. We'd want to preserve the current ability to work > with releases (and Github branches) but I very much like the idea of

Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-15 Thread David Capwell
like range tombstones ... in much worse ;-). A tombstone is a >>> simple marker (deleted). An update can be far more complex. >>> >>> Le mar. 14 mai 2024 à 15:52, Jon Haddad a écrit : >>> Is there a technical limitation that would prevent a range write

Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-13 Thread David Capwell
I would also include in UPDATE… but yeah, <3 BETWEEN and welcome this work. > On May 13, 2024, at 7:40 AM, Patrick McFadin wrote: > > This is a great feature addition to CQL! I get asked about it from time to > time but then people figure out a workaround. It will be great to just have > it

Re: [DISCUSS] Gossip Protocol Change

2024-05-13 Thread David Capwell
So, I created https://issues.apache.org/jira/browse/CASSANDRA-18917 which lets you do deterministic gossip simulation testing across large clusters within seconds… I stopped this work as it conflicted with TCM (they were trying to merge that week) and it hit issues where some nodes never

Re: [DISCUSS] Donating easy-cass-stress to the project

2024-04-29 Thread David Capwell
> So: besides Jon, who in the community expects/desires to maintain this going > forward? I have been maintaining a fork for years, so don’t mind helping maintain this project. > On Apr 28, 2024, at 4:08 AM, Mick Semb Wever wrote: > >> A separate subproject like dtest and the Java driver

Re: [DISCUSS] Fixing coverage reports for jvm-dtest-upgrade

2024-03-19 Thread David Capwell
l#getName-- > >> On Mar 18, 2024, at 3:48 PM, David Capwell wrote: >> >> Are there any side effects to naming ClassLoaders? How do we do this? >> >> I am +1 to the idea but don’t know enough to know what negatives could >> happen. >> >>>

Re: [DISCUSS] Fixing coverage reports for jvm-dtest-upgrade

2024-03-18 Thread David Capwell
Are there any side effects to naming ClassLoaders? How do we do this? I am +1 to the idea but don’t know enough to know what negatives could happen. > On Mar 17, 2024, at 7:25 AM, Josh McKenzie wrote: > > +1 from me > >> If classloaders are appropriately named in the current versions of >>

Re: [DISCUSSION] Replace the Config class instance with the tree-based framework

2024-03-14 Thread David Capwell
Just went over the doc and I don’t see a real argument for why we want to switch “frameworks”… I think this should be fleshed out more; what is it we are actually solving for? Also, I “think” it’s trying to propose we switch from a mostly flat yaml to a nested structure… I kinda feel this

Re: [DISCUSS] Cassandra 5.0 support for RHEL 7

2024-03-11 Thread David Capwell
>>>>> >>>>> Was there something else in 3.29.0 that actually necessitated the move to >>>>> a floor of Python 3.8? Do we generally change runtime requirements in >>>>> minor releases for the driver? >>>>> >>>>>

Re: [DISCUSS] Cassandra 5.0 support for RHEL 7

2024-03-11 Thread David Capwell
ntime requirements in minor >>> releases for the driver? >>> >>> On Mon, Mar 11, 2024 at 12:12 PM Brandon Williams >> <mailto:dri...@gmail.com>> wrote: >>> Given that 3.6 has been EOL for 2+ years[1], I don't think it makes >>> sense t

[DISCUSS] Cassandra 5.0 support for RHEL 7

2024-03-11 Thread David Capwell
Originally we had planned to support RHEL 7 but in testing 5.0 we found out that cqlsh no longer works on RHEL 7[1]. This was changed in CASSANDRA-19245 which upgraded python-driver from 3.28.0 to 3.29.0. For some reason this minor version upgrade also dropped support for python 3.6 which is

[DISCUSS] What SHOULD we do when we index an inet type that is ipv4?

2024-03-06 Thread David Capwell
So, I was reviewing SAI and found we convert ipv4 to ipv6 (which is valid for the type), which made me wonder what the behavior would be if a client mixed ipv4 with ipv4 encoded as ipv6… this caused me to find a different behavior in SAI to the rest of C*… where I feel C* is doing the wrong thing…
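For readers unfamiliar with "ipv4 encoded as ipv6", the sketch below builds the standard IPv4-mapped IPv6 form (`::ffff:a.b.c.d`, per RFC 4291). That SAI uses exactly this encoding is an assumption made for illustration; the message does not say which form is used, and the class name is hypothetical.

```java
// Hypothetical illustration only; assumes the IPv4-mapped form ::ffff:a.b.c.d.
final class Ipv4MappedSketch {
    // Builds the 16-byte IPv4-mapped IPv6 address from 4 IPv4 bytes.
    static byte[] toIpv4Mapped(byte[] ipv4) {
        if (ipv4.length != 4)
            throw new IllegalArgumentException("expected 4 bytes, got " + ipv4.length);
        byte[] mapped = new byte[16];   // bytes 0..9 stay zero
        mapped[10] = (byte) 0xff;
        mapped[11] = (byte) 0xff;
        System.arraycopy(ipv4, 0, mapped, 12, 4);
        return mapped;
    }

    public static void main(String[] args) {
        byte[] mapped = toIpv4Mapped(new byte[] { 127, 0, 0, 1 });
        System.out.println(java.util.Arrays.toString(mapped));
    }
}
```

The 4-byte and 16-byte forms name the same address but differ byte for byte, so any code path that compares raw bytes will treat them as distinct values unless it normalizes first.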

Re: Welcome Brad Schoening as Cassandra Committer

2024-02-27 Thread David Capwell
Congrats! > On Feb 26, 2024, at 2:14 AM, Mick Semb Wever wrote: > > Big Congrats Brad! > > On Wed, 21 Feb 2024 at 21:47, Josh McKenzie > wrote: >> The Apache Cassandra PMC is pleased to announce that Brad Schoening has >> accepted >> the invitation to become a

Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-16 Thread David Capwell
I’m +1 once the tests are passing and +0 while they are failing. Sent from my iPhone > On Feb 16, 2024, at 6:08 AM, Paulo Motta wrote: > Thanks for clarifying Branimir! I'm +1 on proceeding as proposed and I think this change will make it easier to gain confidence to update configurations. Interesting

Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-15 Thread David Capwell
This thread got large quick, yay! >> is there a reason all guardrails and reliability (aka repair retries) >> configs are off by default? They are off by default in the normal config >> for backwards compatibility reasons, but if we are defining a config saying >> what we recommend, we should

Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-13 Thread David Capwell
> so can cause repairs to deadlock forever Small correction, I finished fixing the tests in CASSANDRA-19042 and we don’t deadlock, we timeout and fail repair if any of those messages are dropped. > On Feb 13, 2024, at 11:04 AM, David Capwell wrote: > >> and to point

Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-13 Thread David Capwell
> and to point potential users that are evaluating the technology to an > optimized set of defaults Left this comment in the GH… is there a reason all guardrails and reliability (aka repair retries) configs are off by default? They are off by default in the normal config for backwards

Re: Welcome Maxim Muzafarov as Cassandra Committer

2024-01-08 Thread David Capwell
Congrats! > On Jan 8, 2024, at 10:53 AM, Jacek Lewandowski > wrote: > > Congratulations Maxim, well deserved, it's a pleasure to work with you! > > - - -- --- - - > Jacek Lewandowski > > > pon., 8 sty 2024 o 19:35 Lorina Poland >

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-19 Thread David Capwell
> In general, there are plenty of use cases that prefer determinism. So I >>>> agree that there should at least be a CBO implementation that makes the >>>> same decisions as the status quo, deterministically. >>>> >>>> I do support the proposal, but

Re: Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?

2023-12-18 Thread David Capwell
> A brief perusal shows jqwik as integrated with JUnit 5 taking a fairly > interesting annotation-based approach to property testing. Curious if you've > looked into or used that at all David (Capwell)? (link for the lazy: > https://jqwik.net/docs/current/user-guide.html#de

Re: Moving Semver4j from test to main dependencies

2023-12-18 Thread David Capwell
+1 > On Dec 15, 2023, at 7:35 PM, Mick Semb Wever wrote: > > > >> I'd like to add Semver4j to the production dependencies. It is currently on >> the test classpath. The library is pretty lightweight, licensed with MIT and >> has no transitive dependencies. >> >> We need to represent the

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-12 Thread David Capwell
Overall LGTM. > On Dec 12, 2023, at 5:29 AM, Benjamin Lerer wrote: > > Hi everybody, > > I would like to open the discussion on the introduction of a cost based > optimizer to allow Cassandra to pick the best execution plan based on the > data distribution.Therefore, improving the overall

Re: Welcome Mike Adamson as Cassandra committer

2023-12-08 Thread David Capwell
Congrats! > On Dec 8, 2023, at 11:00 AM, Lorina Poland wrote: > > Congratulations, Mike!

Re: [DISCUSS] CASSANDRA-19113: Publishing dtest-shaded JARs on release

2023-11-28 Thread David Capwell
+1 from me > On Nov 28, 2023, at 12:55 PM, Doug Rohrer wrote: > > +1 (nb, but not a vote, so ¯\_(ツ)_/¯ ) - would be lovely to not have to deal > with this individually for each project in which we use the in-jvm dtest > framework. As Francisco noted, we’re using this in the sidecar and

Re: [DISCUSS] Harry in-tree

2023-11-27 Thread David Capwell
+1 to in-tree > On Nov 27, 2023, at 9:17 AM, Benjamin Lerer wrote: > > +1 > > Le lun. 27 nov. 2023 à 18:01, Brandon Williams > a écrit : >> I am +1 on including Harry in-tree. >> >> Kind Regards, >> Brandon >> >> On Fri, Nov 24, 2023 at 9:44 AM Alex Petrov >

Re: [DISCUSS] Backport CASSANDRA-18816 to 5.0? Add support for repair coordinator to retry messages that timeout

2023-10-25 Thread David Capwell
IR patch is up for review https://issues.apache.org/jira/browse/CASSANDRA-18962 > On Oct 24, 2023, at 3:15 PM, David Capwell wrote: > > I sat down to add IR messages to the mix… given how positive the feedback was > for other repair messages I assume people are still ok with this

Re: [DISCUSS] Backport CASSANDRA-18816 to 5.0? Add support for repair coordinator to retry messages that timeout

2023-10-24 Thread David Capwell
retry logic > On Sep 26, 2023, at 12:08 PM, David Capwell wrote: > > Thanks all for the feedback! The patch has 2 +1s on trunk and back ported to > 5.0, making sure it’s stable now; I plan to merge early this week. > >> On Sep 21, 2023, at 2:07 PM, Ekaterina Dimitrova >

Re: CASSANDRA-18775 (Cassandra supported OSs)

2023-10-20 Thread David Capwell
+1 to drop the whole lib… > On Oct 20, 2023, at 7:55 AM, Jeremiah Jordan > wrote: > > Agreed. -1 on selectively removing any of the libs. But +1 for removing the > whole thing if it is no longer used. > > -Jeremiah > > On Oct 20, 2023 at 9:28:55 AM, Mick Semb Wever

Re: [DISCUSS] Gossip shutdown may corrupt peers making it so the cluster never converges, and a small protocol change to fix

2023-10-09 Thread David Capwell
PM, David Capwell wrote: > >> Won't the replacement have a newer generation? > > The replacement is a different instance. It performs a shadow round with its > seeds and if they are impacted by this issue then they are missing tokens, so > we fail the host replacem

Re: [DISCUSS] Gossip shutdown may corrupt peers making it so the cluster never converges, and a small protocol change to fix

2023-10-06 Thread David Capwell
liams wrote: > > On Fri, Oct 6, 2023 at 5:50 PM David Capwell wrote: >> Let's say you now need to host replace node1 > > Won't the replacement have a newer generation? > >> avoid peers mutating endpoint states they don’t own > > This sounds reasonable to me. > &

[DISCUSS] Gossip shutdown may corrupt peers making it so the cluster never converges, and a small protocol change to fix

2023-10-06 Thread David Capwell
Just filed https://issues.apache.org/jira/browse/CASSANDRA-18913 (Gossip NPE due to shutdown event corrupting empty statuses) which is where I saw this issue. When we do gossip shutdown we send a message GOSSIP_SHUTDOWN which then gets handled by this method

Re: [VOTE] Accept java-driver

2023-10-03 Thread David Capwell
+1 > On Oct 3, 2023, at 8:32 AM, Chris Lohfink wrote: > > +1 > > On Tue, Oct 3, 2023 at 10:30 AM Jeff Jirsa > wrote: >> +1 >> >> >> On Mon, Oct 2, 2023 at 9:53 PM Mick Semb Wever > > wrote: >>> The donation of the java-driver is ready for its

Re: multiple ParameterizedClass objects?

2023-10-03 Thread David Capwell
It would help me if you could give examples of what you want the yaml to look like and why it requires ParameterizedClass. I try to avoid that class as much as possible when doing configs and newer configs are finding different ways to solve the same problems... > On Oct 3, 2023, at 12:10 AM,

Re: [DISCUSS] Backport CASSANDRA-18816 to 5.0? Add support for repair coordinator to retry messages that timeout

2023-09-26 Thread David Capwell
I think it could be argued that not retrying messages is a bug, I am >>>> +1 on including this in 5.0. >>>> >>>> Kind Regards, >>>> Brandon >>>> >>>> On Tue, Sep 19, 2023 at 1:16 PM David Capwell >>> <mailto:dcap

Re: [DISCUSS] Vector type and empty value

2023-09-20 Thread David Capwell
nse: strings and blobs. All else >> should not allow this except for backward compatibility reasons. So, not for >> new types. >> >>>> On 20 Sep 2023, at 00:08, David Capwell wrote: >>>> >>>> When does empty mean null? >>> >>&

Re: [DISCUSS] Vector type and empty value

2023-09-20 Thread David Capwell
Days. It’s a distinct concern from columns being nullable or not. > > There are a couple types where this makes sense: strings and blobs. All else > should not allow this except for backward compatibility reasons. So, not for > new types. > >> On 20 Sep 2023, at 00:08, David Ca

Re: [DISCUSS] Vector type and empty value

2023-09-19 Thread David Capwell
t empty. It’s too late for the existing types, but we should > hold to this going forward. Which is what I think the idea was in > https://issues.apache.org/jira/browse/CASSANDRA-8951 as well? That it was > sad the existing numerics were emptiable, but too late to change, and we > co

[DISCUSS] Backport CASSANDRA-18816 to 5.0? Add support for repair coordinator to retry messages that timeout

2023-09-19 Thread David Capwell
To try to get repair more stable, I added optional retry logic (patch is still in review) to a handful of critical repair verbs. This patch is disabled by default but allows you to opt-in to retries so ephemeral issues don’t cause a repair to fail after running for a long time (assuming they
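A minimal sketch of what opt-in retries on timeout could look like. This is not the CASSANDRA-18816 patch; the class name, method name, and parameters are hypothetical. It only shows the shape described above: disabled by default means one attempt, enabled means a bounded number of attempts on timeouts.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not the actual repair retry implementation.
final class RetrySketch {
    // Runs the attempt once when retries are disabled, otherwise up to maxAttempts times.
    // Only TimeoutException is treated as retryable; anything else propagates at once.
    static <T> T callWithRetries(Callable<T> attempt, boolean retriesEnabled, int maxAttempts) throws Exception {
        int allowed = retriesEnabled ? Math.max(1, maxAttempts) : 1;
        TimeoutException last = null;
        for (int i = 0; i < allowed; i++) {
            try {
                return attempt.call();
            } catch (TimeoutException e) {
                last = e; // transient failure: try again if attempts remain
            }
        }
        throw last; // every allowed attempt timed out
    }
}
```

A real implementation would likely add backoff between attempts and per-verb configuration; both are left out here to keep the sketch short.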

Re: [DISCUSS] Vector type and empty value

2023-09-19 Thread David Capwell
> When we introduced TINYINT and SMALLINT (CASSANDRA-895) we started making > types non -emptiable. This approach makes more sense to me as having to deal > with empty value is error prone in my opinion. I agree it’s confusing, and in the patch I found that different code paths didn’t handle

[DISCUSS] Add Jepsen's Elle as a test dependency for Accord / Paxos

2023-09-13 Thread David Capwell
For validation of Paxos and Accord 2 different consistency verifiers were created: accord.verify.StrictSerializabilityVerifier (Accord), and org.apache.cassandra.simulator.paxos.LinearizabilityValidator (Paxos). To increase confidence in both protocols it would be good to use an external

Re: [Discuss] Repair inside C*

2023-07-26 Thread David Capwell
, etc., that has been >>> working for us for the last six years at an immense scale, and I will share >>> it soon on a private fork. >>> >>> Thanks, >>> Jaydeep >>> >>> On Tue, Jul 25, 2023 at 9:48 AM German Eichberger via dev &

Re: [Discuss] Repair inside C*

2023-07-25 Thread David Capwell
As someone who has done a lot of work trying to make repair stable, I approve of this message ^_^ More than glad to help mentor this work > On Jul 24, 2023, at 6:29 PM, Jaydeep Chovatia > wrote: > > To clarify the repair solution timing, the one we have listed in the article > is not the

Re: [DISCUSS] When to run CheckStyle and other verificiations

2023-06-27 Thread David Capwell
> nobody referred to running checks in a pre-push (or pre-commit) hook In accord I added an opt-out for each hook, and will require such here as well… as long as you can opt-out, it’s fine by me… I know I will likely opt-out, but wouldn’t block such an effort > Your point that pre-push hook

Re: [DISCUSS] When to run CheckStyle and other verificiations

2023-06-26 Thread David Capwell
> not running it automatically with the targets which devs usually run locally. The checks tend to have an opt-out, such as -Dno-checkstyle=true… so it’s really easy to set up your local environment to opt out of what you do not care about… I feel we should force people to opt-out rather than

Re: [DISCUSS] Using ACCP or tc-native by default

2023-06-22 Thread David Capwell
+1 to ACCP > On Jun 22, 2023, at 3:05 PM, C. Scott Andreas wrote: > > +1 for ACCP and can attest to its results. ACCP also optimizes for a range of > hash functions and other cryptographic primitives beyond TLS acceleration for > Netty. > >> On Jun 22, 2023, at 2:07 PM, Jeff Jirsa wrote: >>

Re: [DISCUSS] Remove deprecated keyspace_count_warn_threshold and table_count_warn_threshold

2023-06-16 Thread David Capwell
ct the current number of system keyspace/tables from the old value. >>> For example, 150 tables in the old threshold translate to 103 tables in the >>> new guardrail, considering that there are 47 system tables. >>> >>> Does this sound good? >>>
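The conversion quoted above is a single subtraction, sketched below with the thread's own numbers. The class and method names are hypothetical, and clamping at zero is an assumption added for the sketch, not something stated in the thread.

```java
// Hypothetical illustration only; not the actual converter.
final class TableThresholdConversion {
    // Old threshold counted system tables; the new guardrail counts user tables only.
    // Clamping at zero is an assumption for this sketch.
    static int toUserTableThreshold(int oldThreshold, int systemTableCount) {
        return Math.max(0, oldThreshold - systemTableCount);
    }

    public static void main(String[] args) {
        // 150 tables in the old threshold, 47 system tables
        System.out.println(toUserTableThreshold(150, 47));
    }
}
```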

Re: [DISCUSS] Remove org.apache.cassandra.io.sstable.SSTableHeaderFix in trunk (5.0)?

2023-06-15 Thread David Capwell
Not heard any feedback yet, so tomorrow plan to remove… the feature was local to 3.6+ so all users migrating from 3.0 to 4.0 never had this issue > On Jun 13, 2023, at 10:22 AM, David Capwell wrote: > > org.apache.cassandra.io.sstable.SSTableHeaderFix was added due to bugs in 3.6

Re: [DISCUSS] Remove deprecated keyspace_count_warn_threshold and table_count_warn_threshold

2023-06-14 Thread David Capwell
che.org>> wrote: >> >>> Warning that too many tables (including system) may have negative behavior >>> I think is fine >> This reminds me of the current situation with our tests where we just keep >> adding more and more without really considering the v

Re: [DISCUSS] Remove deprecated keyspace_count_warn_threshold and table_count_warn_threshold

2023-06-13 Thread David Capwell
se converters, we would need to know how many system > keyspaces/tables were on the version we are upgrading from. I don't know if > that information is available. Or perhaps we could assume that counting > system keyspaces/tables was a bug, and just translate changing the meaning to > not inc

[DISCUSS] Remove org.apache.cassandra.io.sstable.SSTableHeaderFix in trunk (5.0)?

2023-06-13 Thread David Capwell
org.apache.cassandra.io.sstable.SSTableHeaderFix was added due to bugs in 3.6 causing invalid types or incompatible types (due to toString changes) in the SSTableHeader… this logic runs on start and rewrites all Stats files that had a mismatch from the local schema; with 5.0 requiring

Re: [DISCUSS] Remove deprecated keyspace_count_warn_threshold and table_count_warn_threshold

2023-06-13 Thread David Capwell
> Have we been dropping support entirely for old params or using the @Replaces > annotation into perpetuity? My understanding is that the goal is to keep things around in perpetuity unless it actively causes us harm… and with @Replaces, there tends to be no harm to keep around… Looking at

Re: [VOTE] CEP-8 Datastax Drivers Donation

2023-06-13 Thread David Capwell
+1 > On Jun 13, 2023, at 7:59 AM, Josh McKenzie wrote: > > +1 > > On Tue, Jun 13, 2023, at 10:55 AM, Jeremiah Jordan wrote: >> +1 nb >> >> On Jun 13, 2023 at 9:14:35 AM, Jeremy Hanna > > wrote: >>> >>> Calling for a vote on CEP-8 [1]. >>> >>> To clarify

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-06-01 Thread David Capwell
t and the default > logging on git being... completely silent. =/ > > Looks like subsequent runs aren't hanging on that and are hopping right > through, so perhaps this a "first run tax" for submodule + worktree. > > On Thu, Jun 1, 2023, at 2:05 PM, David Capwell w

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-06-01 Thread David Capwell
to apache. So the main issue has been when 2 authors try to work together (such as during review of a PR) > On Jun 1, 2023, at 10:15 AM, David Capwell wrote: > > Most edge cases we have seen in Accord are working with feature branches from > other authors where we use relative p

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-06-01 Thread David Capwell
Most edge cases we have seen in Accord are working with feature branches from other authors where we use relative paths to make sure the git@ vs https:// doesn’t become a problem for CI (submodule points to https:// to work in CI, but if you do that during feature development it gets annoying

Re: [VOTE] CEP-30 ANN Vector Search

2023-05-25 Thread David Capwell
+1 > On May 25, 2023, at 1:53 PM, Ekaterina Dimitrova > wrote: > > +1 > > On Thu, 25 May 2023 at 16:46, Brandon Williams > wrote: >> +1 >> >> Kind Regards, >> Brandon >> >> On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis > > wrote: >> > >>

Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread David Capwell
Agrona isn’t going anywhere due to the library being more than basic collections. Now, with regard to single-threaded collections… honestly I dislike Agrona as I always fight to avoid boxing; carrot was far better with this regard…. Didn’t look at the fastutil versions to see if they are

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-24 Thread David Capwell
> The time spent on getting that running has been a fair few hours, where we > could have cut many manual module releases in that time. We spent a few hours getting submodules working, and we no longer need to release for every single commit… $ git log

Re: Vector search demo, and query syntax

2023-05-23 Thread David Capwell
I am ok with the syntax, but wondering if a function may be better than a CQL change? SELECT id, start, end, text FROM {self.keyspace}.{self.table} ORDER BY ANN(embedding, ?) LIMIT ? Not really a common syntax, but could be useful down the line > On May 23, 2023, at 12:37 AM, Mick Semb Wever

Re: CEP-30: Approximate Nearest Neighbor(ANN) Vector Search via Storage-Attached Indexes

2023-05-17 Thread David Capwell
the database >>>> scales/works. (I've done my best to call this out in all discussions >>>> around SAI over time, and there may even end up being further guardrails >>>> put in place to make it even harder to misuse it...but I digress.) >>>> >>>

Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread David Capwell
> [POLL] Centralize existing syntax or create new syntax? 1.) CREATE INDEX ... USING WITH OPTIONS... > [POLL] Should there be a default? (YES/NO) Yes > [POLL] What to do with the default? 3.) YAML config to override default index (legacy 2i remains the default) 4.) YAML config/guardrail

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread David Capwell
> I really dislike the idea of the same CQL doing different things based upon a > per-node configuration. > I agree with Brandon that changing CQL behaviour like this based on node > config is really not ideal. I am cool adding such a config, and also cool keeping CREATE INDEX disabled by

Re: [VOTE] CEP-29 CQL NOT Operator

2023-05-10 Thread David Capwell
+1 > On May 10, 2023, at 9:36 AM, Francisco Guerrero wrote: > > +1 (nb) > > On 2023/05/10 14:10:06 Jeremiah D Jordan wrote: >> +1 nb >> >>> On May 8, 2023, at 3:52 AM, Piotr Kołaczkowski >>> wrote: >>> >>> Let's vote. >>> >>>

Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread David Capwell
> Having to revert to CREATE CUSTOM INDEX sounds pretty awful, so I'd prefer > allowing USING...WITH... for CREATE INDEX I have 0 issues with a new syntax to make this more clear > just deprecating CREATE CUSTOM INDEX (at least after 5.0), but that's more or > less what my original proposal

Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread David Capwell
If we assume SAI is what we should use by default for the cluster, would it make sense to allow CREATE INDEX [IF NOT EXISTS] [name] ON () But use a new yaml config that switches from legacy to SAI? default_2i_impl: sai For 5.0 we can default to “legacy” (new features disabled by default),
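A minimal sketch of how a plain `CREATE INDEX` could pick its implementation from the proposed setting. Only the key `default_2i_impl` and the values `legacy` and `sai` come from the message; the class, the enum, and the use of a flat map to stand in for parsed yaml are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the actual Cassandra config code.
final class DefaultIndexConfigSketch {
    enum IndexImpl { LEGACY, SAI }

    // Falls back to "legacy" when the key is absent, matching the proposed 5.0 default.
    static IndexImpl defaultIndexImpl(Map<String, String> yaml) {
        String raw = yaml.getOrDefault("default_2i_impl", "legacy");
        return IndexImpl.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(defaultIndexImpl(Map.of()));
        System.out.println(defaultIndexImpl(Map.of("default_2i_impl", "sai")));
    }
}
```

An unknown value makes `valueOf` throw `IllegalArgumentException`, which in this sketch plays the role of rejecting a bad config at startup.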

Re: CEP-30: Approximate Nearest Neighbor(ANN) Vector Search via Storage-Attached Indexes

2023-05-09 Thread David Capwell
Approach section doesn’t go over how this will handle cross replica search, this would be good to flesh out… given results have a real ranking, the current 2i logic may yield incorrect results… so would think we need num_ranges / rf queries in the best case, with some new capability to sort the

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
https://issues.apache.org/jira/browse/CASSANDRA-18504 > On May 5, 2023, at 12:27 PM, David Capwell wrote: > > Yep, fair point…. SPARSE VECTOR better maps to NON NULL MAP > >> On May 5, 2023, at 11:58 AM, David Capwell wrote: >> >>> If we ever add sparse

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
Yep, fair point…. SPARSE VECTOR better maps to NON NULL MAP > On May 5, 2023, at 11:58 AM, David Capwell wrote: > >> If we ever add sparse vectors, we can assume that DENSE is the default and >> allow to use either DENSE, SPARSE or nothing. > > I have been feeling tha

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
If we ever add sparse vectors, we can assume that DENSE is the default and > allow to use either DENSE, SPARSE or nothing. > > Perhaps the dimension could be separated from the type, such as in > VECTOR[dimension] or VECTOR(dimension). > > On Fri, 5 May 2023 at 19:05

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
t conversation to remove =)… maybe defer this to JIRA as long as all parties agree in the ticket? With all votes in, this is what I see Syntax Jonathan Ellis David Capwell Josh McKenzie Caleb Rackliffe Patrick McFadin Brandon Williams Mike Adamson Benedict Mick Semb Wever Derek Chen-Becker VECTOR 1

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
Sorry, DENSE_VECTOR was pointing to the wrong row, updated score
Syntax                 Score
VECTOR                 16
DENSE VECTOR           11
type[dimension]        9
NON NULL [dimension]   6
VECTOR type[n]         5
DENSE_VECTOR           3
NON-NULL FROZEN        3
ARRAY                  0
> On May 5, 2023, at 10:01 AM, David Capwell wrote: > > Updated > > Sy

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
Updated Syntax Jonathan Ellis David Capwell Josh McKenzie Caleb Rackliffe Patrick McFadin Brandon Williams Mike Adamson Benedict Mick Semb Wever Derek Chen-Becker VECTOR 1 2 2 1 ? 3 2 DENSE VECTOR 2 1 ? ? type[dimension] 3 3 3 1 3 2 DENSE_VECTOR 1 NON NULL [dimension] 1

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
>>>> >>>> - We don't have to explain what it is. A lot of prior art out there >>>> already [1][2][3] >>>> - We're matching an established term with what users would expect. No >>>> surprises. >>>> - Shorter ramp-up time for users. Cassandra is being modernized. >>>

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
being modernized. >>> >>> The implementation is flexible, but the interface should empower our users >>> to be awesome. >>> >>> Patrick >>> >>> 1 - >>> https://stats.stackexchange.com/questions/266996/what-do-the-terms-dense-a

Re: [POLL] Vector type for ML

2023-05-04 Thread David Capwell
My views have changed over time on syntax and I feel type[dimension] may not be the best, so it has gone lower in my own personal ranking… this is my current preference 1) DENSE [dimension] | NON NULL [dimension] 2) VECTOR 3) type[dimension] My reasoning for this order * type[dimension] looks

Re: [POLL] Vector type for ML

2023-05-03 Thread David Capwell
>>>>> >>>>> Patrick >>>>> >>>>> On Tue, May 2, 2023 at 3:27 PM Jonathan Ellis >>>> <mailto:jbel...@gmail.com>> wrote: >>>>>> I had a call with David. We agreed that we want a "vector" data

Re: [POLL] Vector type for ML

2023-05-02 Thread David Capwell
> How about it, David? Did you already make this? I checked out the patch, fixed serialize/deserialize, added the constraints, then added a composeForFloat(ByteBuffer), with this the impact to the POC patch was the following 1) move away from VectorType.instance.serializer().deserialize(bb)

Re: [POLL] Vector type for ML

2023-05-02 Thread David Capwell
> B) Should we introduce a type that is general purpose, and supports all > Cassandra types, so that this may be used to support ML (and perhaps other) > workloads I vote B only as well... > On May 2, 2023, at 9:02 AM, Benedict wrote: > > This is not the poll I thought we would be

Re: [DISCUSS] New data type for vector search

2023-05-01 Thread David Capwell
e possibility of using the VECTOR keyword and bring us back to >>>> something like `NON-NULL FROZEN`. This is odd to me because >>>> `VECTOR` here can be just an alias for `NON-NULL FROZEN` while meeting the >>>> patch's audience and their idioms. I have no p

Re: [DISCUSS] New data type for vector search

2023-05-01 Thread David Capwell
troducing such >> an alias to meet the ML crowd. >> >> Another way I think of this is >> `VECTOR FLOAT[n]` is the porcelain ML cql api, >> `NON-NULL FROZEN` and `FROZEN` and `FLOAT[n]` are the >> general-use plumbing cql apis. >> >> This would

Re: [DISCUSS] New data type for vector search

2023-05-01 Thread David Capwell
> I think it is totally reasonable that the ANN patch (and Jonathan) is not > asked to implement on top of, or towards, other array (or other) new data > types. This impacts serialization, if you do not think about this day 1 you then can’t add later on without having to worry about migration

Re: [DISCUSS] New data type for vector search

2023-05-01 Thread David Capwell
e most similar >>>> proposal in the past as wontfix. >>>> >>>> On Thu, Apr 27, 2023 at 7:49 PM Josh McKenzie wrote: >>>> From a machine learning perspective, vectors are a well-known concept that >>>> are effectively immutab

Re: [DISCUSS] New data type for vector search

2023-04-27 Thread David Capwell
> but as you point out it has the problem of allowing nulls. If nulls are not allowed for the elements, then either we need a) a new type, or b) add some way to say elements may not be null…. As much as I do like b, I am leaning towards new type for this use case. So, to flesh out the type

Re: [DISCUSS] New data type for vector search

2023-04-26 Thread David Capwell
023, at 10:50 AM, David Capwell wrote: > > Thanks for starting this thread! > >> In the initial commits and thread, this was DENSE FLOAT32. Nobody really >> loved that, so we considered a bunch of alternatives, including >> >> - `FLOAT[N]`: This minimal opti

Re: [DISCUSS] New data type for vector search

2023-04-26 Thread David Capwell
Thanks for starting this thread! > In the initial commits and thread, this was DENSE FLOAT32. Nobody really > loved that, so we considered a bunch of alternatives, including > > - `FLOAT[N]`: This minimal option resembles C and Java array syntax, which > would make it familiar for many users.

Re: Adding vector search to SAI with heirarchical navigable small world graph index

2023-04-26 Thread David Capwell
> DENSE seems to just be an array? So very similar to a frozen list, but with a > fixed size? How I read the doc, DENSE = ARRAY, but knew that couldn’t be the case, so when I read the code it’s a fixed-size array…. So the real syntax was “DENSE FLOAT32[42]” Not a fan of the type naming, and feel

Re: Adding vector search to SAI with heirarchical navigable small world graph index

2023-04-24 Thread David Capwell
This work sounds interesting, I would recommend decoupling the types from the ANN support as the types require client changes and can go in now (would give a lot of breathing room to get this ready for 5.0), where as ANN depends on SAI which is still being worked on. > On Apr 22, 2023, at 1:02

Re: [DISCUSS] CEP-29 CQL NOT Operator

2023-04-06 Thread David Capwell
Overall I welcome this feature, was trying to use this around 1-2 months back and found we didn’t support it, so glad to see it coming! From a testing point of view, I think we would want to have good fuzz testing covering complex types (frozen/non-frozen collections, tuples, udt, etc.), and

Re: [DISCUSS] cep-15-accord, cep-21-tcm, and trunk

2023-03-24 Thread David Capwell
>> against it. > Isn't the argument that cep-21 provides this so we could just remove the > temporary impl and point to the new facility for this generation? > > On Fri, Mar 24, 2023, at 3:22 PM, David Capwell wrote: >>> the question we want to answer is wh

Re: [DISCUSS] cep-15-accord, cep-21-tcm, and trunk

2023-03-24 Thread David Capwell
> the question we want to answer is whether or not we build a throwaway patch > for linearizable epochs If this is in a release, we then need to maintain that feature, so would be against it. If this is for testing, then I would argue the current world is “fine”… current world is hard to use

Re: Role of Hadoop code in Cassandra 5.0

2023-03-16 Thread David Capwell
Aren’t our deprecation rules that if we deprecate in 4.0.0 we can remove in 5.x, but 4.x needs to wait for 6.x? I am cool deprecating this and willing to pull into another repo if people (not me) are willing to maintain it (else just delete). > On Mar 10, 2023, at 1:13 AM, Jacek Lewandowski

Re: hsqldb test dependency in simulator code

2023-03-14 Thread David Capwell
Quickly looking I think we can switch to org.agrona.collections.Long2LongHashMap, the key isn’t the “correct” type (long when we want int) but isn’t too hard to switch. Few differences in the semantics need to be handled, but not much 1) get of non-defeiled key should throw

Re: [DISCUSS] Next release date

2023-03-01 Thread David Capwell
I am cool with defining target release date and working backwards from there. If we do want to go this route, I think we do need to answer why 4.1 cut -> release took so much time, and if people could start validation “before” we branch? If we know trunk is stable today then we could release
