Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as part of the release process

2023-07-06 Thread Brad
The 'cqlsh' package has been maintained on pypi.org since 2013 (see
https://pypi.org/project/cqlsh/#history). There is a solid 10-year history
of support and interest in the Python package distribution of cqlsh, and it
has 11K downloads per week.

A few additions to Jeff's comments:

   - The 'cqlsh' package has one primary file, *setup.cfg*, which is all of
   36 lines. The suggestion is to 1) move this file into the Apache
   Cassandra repository together with a README.md, and 2) add PyPI as a
   distribution target for new Apache Cassandra releases of cqlsh.

   - As it exists today, the 'cqlsh' project is really just a stub that
   lives outside of Apache Cassandra to package cqlsh for distribution on
   pypi.org.

   - For Windows clients (and yes, there are lots), 'pip install cqlsh' is
   the best way to run cqlsh on Windows.

On Thu, Jul 6, 2023 at 9:50 PM guo Maxwell  wrote:

> Hi,
> First of all, thank you very much for your work. I have a question: what
> is your long-term plan for evolving this project? How will you ensure
> long-term, continuous maintenance of it? I have seen situations where
> someone's work is tied to a certain project, so they have time to maintain
> it, but once they change jobs, they may not have enough time to do so.
> Also, can you share more about the code management mechanism?
>

Re: CASSANDRA-18654 - start publishing CQLSH to PyPI as part of the release process

2023-07-06 Thread guo Maxwell
Hi,
First of all, thank you very much for your work. I have a question: what is
your long-term plan for evolving this project? How will you ensure
long-term, continuous maintenance of it? I have seen situations where
someone's work is tied to a certain project, so they have time to maintain
it, but once they change jobs, they may not have enough time to do so.
Also, can you share more about the code management mechanism?



-- 
you are the apple of my eye !


CASSANDRA-18654 - start publishing CQLSH to PyPI as part of the release process

2023-07-06 Thread Jeff Widman
Brad Schoening and I currently maintain https://pypi.org/project/cqlsh/,
which repackages the CQLSH that ships with every Cassandra release.

This way:

   - anyone who wants a lightweight client to talk to a remote Cassandra
   cluster can simply `pip install cqlsh` without having to download the
   full Cassandra source, unzip it, etc.
   - it's very easy for folks to use it as scaffolding in their Python
   scripts/tooling, since they can simply include it in the list of their
   required dependencies.

We currently handle the packaging by waiting for a release, then manually
copying the code out of the Cassandra source tree into
https://github.com/jeffwidman/cqlsh, which has some additional build/Python
package configuration files, and then using standard Python tooling to
publish to PyPI.
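The manual copy step above could be scripted as part of a release. A minimal Python sketch follows; it assumes the cqlsh sources live at `bin/cqlsh.py` and `pylib/cqlshlib/` in the Cassandra tree, while the destination layout and the `stage_cqlsh` name are hypothetical and may not match the existing packaging repository.

```python
import shutil
from pathlib import Path

# Assumed source locations inside an Apache Cassandra checkout, mapped to
# hypothetical destinations inside the packaging directory.
CQLSH_SOURCES = {
    Path("bin/cqlsh.py"): Path("cqlsh/cqlsh.py"),
    Path("pylib/cqlshlib"): Path("cqlshlib"),
}


def stage_cqlsh(cassandra_root: Path, staging_dir: Path) -> list[Path]:
    """Copy the cqlsh sources from a Cassandra tree into a packaging directory.

    Automates the manual copy step; packaging metadata such as setup.cfg is
    assumed to already live in staging_dir.
    """
    copied = []
    for src_rel, dst_rel in CQLSH_SOURCES.items():
        src = cassandra_root / src_rel
        dst = staging_dir / dst_rel
        if src.is_dir():
            # Directories (the cqlshlib package) are copied recursively.
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            # Single files need their parent directory to exist first.
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

After staging, standard tooling such as `python -m build` followed by `twine upload dist/*` would build and publish the distribution.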

Given that our project is simply a build/packaging project, I wanted to
start a conversation about upstreaming this into core Cassandra. I realize
that Cassandra has no interest in maintaining lots of build targets... but
given that cqlsh is written in Python, and publishing to PyPI enables DBAs
to share more complicated tooling built on top of it, this seems like a
natural fit for core Cassandra rather than a standalone project.

Goal:
When a Cassandra release happens, the build/release process automatically
publishes cqlsh to https://pypi.org/project/cqlsh/.

Non-Goal: This is _not_ about having Cassandra itself rely on PyPI. There
was some initial chatter about that in
https://issues.apache.org/jira/browse/CASSANDRA-18654, but that adds a lot
of complexity, and I'm honestly not sure it's a great idea. Even if folks
later want to go that route, the first hurdle is publishing to PyPI, so for
now let's keep the scope of the discussion limited to treating PyPI purely
as a release target, and not as an ingredient to a release.

From an implementation perspective, this should be very straightforward. We
don't have any differences from the CQLSH source that's in Cassandra;
instead, we point folks to make changes to cqlsh in the Cassandra source. In
fact, we've made multiple contributions back to `cqlsh` ourselves and have
drastically cleaned up the code:
https://github.com/search?q=repo%3Aapache%2Fcassandra%20is%3Apr%20author%3Ajeffwidman%20author%3Abschoening&type=pullrequests.
So the only real change is adding the package config files and the build /
release pipeline.

We realize the Cassandra team members aren't Python/PyPI experts, so we'd be
more than happy to help wire this up and maintain it. I am also a maintainer
of kazoo and kafka-python, which are both popular Python clients for other
distributed databases, so I'm very familiar with open source, Python, and
distributed databases.

My one hesitation around this discussion is that I'm a little concerned
that we might lose the nimbleness we currently have from being a
separate project. I.e., if something is screwed up on PyPI or in the build
process, we can quickly get it fixed and get a new release out so that
users aren't blocked. Would it be possible, as part of this process, for
Brad and me to keep having commit rights to the PyPI build process?
To be clear, I'm not asking for commit rights to the Java code or anything
outside of Python. I just want to be sure that if we go to the trouble of
working with you to upstream this, there's a commitment from Cassandra
to keeping this build working, or to letting us fix the build ourselves.
Otherwise there's no point in upstreaming it only for it to go unmaintained,
leaving us looking on helplessly from the sidelines. I'm very flexible here
on the solution.

Thoughts?

-- 

*Jeff Widman*
jeffwidman.com  | 740-WIDMAN-J (943-6265)
<><


Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-07-06 Thread Dinesh Joshi
> On Jun 30, 2023, at 1:09 PM, Jeremiah Jordan  wrote:
> 
> I don’t think users necessarily need to be able to update their own 
> identities.  I just don’t want to have to use the super user role.  The super 
> user role has all power over all things in the database.  I don’t want to 
> have to give that much power to the person who manages identities, I just 
> want to give them the power to manage identities.

Makes sense. I think Jyothsna already pushed an update to the PR to relax the 
restriction. Please feel free to take a look at it.
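Jeremiah's request, letting someone manage identities without holding the full superuser role, can be sketched as a permission check that accepts either a superuser or a role holding a dedicated permission. This is a minimal Python model, not Cassandra's implementation; `MANAGE_IDENTITIES`, `Role` and `IdentityManager` are hypothetical names. Authentication looks up only the identity, which matches the point that rotating certificates for an already-provisioned identity needs no database change.

```python
from dataclasses import dataclass, field

MANAGE_IDENTITIES = "MANAGE_IDENTITIES"  # hypothetical permission name


@dataclass
class Role:
    name: str
    is_superuser: bool = False
    permissions: set[str] = field(default_factory=set)


class IdentityManager:
    """Toy mapping of mTLS identities (e.g. SPIFFE URIs) to role names."""

    def __init__(self) -> None:
        self._identity_to_role: dict[str, str] = {}

    def add_identity(self, actor: Role, identity: str, role_name: str) -> None:
        # Accept a superuser OR a role granted the dedicated permission, so an
        # identity administrator does not need power over the whole database.
        if not (actor.is_superuser or MANAGE_IDENTITIES in actor.permissions):
            raise PermissionError(f"{actor.name} may not manage identities")
        self._identity_to_role[identity] = role_name

    def authenticate(self, identity: str) -> str:
        # Only the identity from the client certificate is consulted, so a
        # reissued certificate carrying the same identity needs no update here.
        role = self._identity_to_role.get(identity)
        if role is None:
            raise PermissionError(f"unknown identity {identity!r}")
        return role
```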

Dinesh





Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-07-06 Thread Dinesh Joshi
> It is surprising to me that we load the identity from the keystore vs 
> explicitly setting an expected value in cassandra.yaml. I get that an error 
> is thrown if the identity doesn't match those of other nodes in the cluster, 
> but does it make sense to prevent startup should the value in the keystore 
> deviate from a (currently nonexistent) value in cassandra.yaml?

We can make it optionally configurable. The concern with adding identities
to a yaml file is that it generally requires a bounce (restart) for
Cassandra to pick up new values.
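Making the expected identity optionally configurable could look like the following Python sketch. The keystore stays the source of the node's identity, and a configured value, when present, only guards startup, much like the cluster name check. The `expected_identity` setting and the function name are hypothetical.

```python
from typing import Optional


class IdentityMismatchError(RuntimeError):
    """Raised to abort startup when the keystore identity is unexpected."""


def resolve_node_identity(keystore_identity: str,
                          expected_identity: Optional[str] = None) -> str:
    """Return the identity this node uses for internode mTLS.

    keystore_identity: identity extracted from the node's own certificate.
    expected_identity: optional value from cassandra.yaml (hypothetical
                       setting); when unset, the keystore value is used as-is.
    """
    if expected_identity is not None and expected_identity != keystore_identity:
        # Fail fast, similar to how a cluster name mismatch prevents startup.
        raise IdentityMismatchError(
            f"keystore identity {keystore_identity!r} does not match "
            f"configured identity {expected_identity!r}")
    return keystore_identity
```

Since the configured value would be read once at startup, changing it would still need a restart, which is the trade-off raised above.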

> It feels like there is a parallel to how we set the cluster name in 
> cassandra.yaml even though the value is also present within our local 
> sstables and leads to startup errors should they differ.

I can see the parallels here. Thanks for the feedback.

Dinesh

Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-07-06 Thread Christopher Bradford
Looping back to the discussion around keystore usage and shared vs
individual identity. I understand the approach of having a single identity
shared by all nodes in the cluster. I'm including the entire response here,
but I want to focus on the first line.

> *The reason we use the keystore is that the node extracts its own identity
> and expects other nodes in the cluster to share the same identity.* This
> default behavior makes it easy to avoid configuring individual identities
> of nodes in the cluster. It's critical to recognize that if we had a
> separate identity for each node in the cluster, then we would need to
> update all nodes in the cluster when a new node is added or removed. This
> way all nodes in the cluster can have a shared identity while
> simultaneously preventing unnecessary operational pain of adding and
> removing identities each time a node is added or removed from the cluster.

(emphasis mine)

It is surprising to me that we load the identity from the keystore vs
explicitly setting an expected value in cassandra.yaml. I get that an error
is thrown if the identity doesn't match those of other nodes in the
cluster, but does it make sense to prevent startup should the value in the
keystore deviate from a (currently nonexistent) value in cassandra.yaml?

It feels like there is a parallel to how we set the cluster name in
cassandra.yaml even though the value is also present within our local
sstables and leads to startup errors should they differ.
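The trade-off in the quoted explanation can be illustrated with two minimal Python predicates. They are hypothetical helpers, not Cassandra code: with a shared identity a node trusts peers that present its own keystore identity, while per-node identities would require an allow-list that changes on every node whenever membership changes.

```python
def accept_peer_shared(own_identity: str, peer_identity: str) -> bool:
    # Shared-identity model: trust a peer that presents the same identity this
    # node extracted from its own keystore. Nothing to update on topology change.
    return peer_identity == own_identity


def accept_peer_per_node(allowed_identities: set[str], peer_identity: str) -> bool:
    # Per-node model, for contrast: every node must hold the full set of peer
    # identities, so this set changes on all nodes when a node joins or leaves.
    return peer_identity in allowed_identities
```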

Christopher Bradford



On Fri, Jun 30, 2023 at 4:09 PM Jeremiah Jordan 
wrote:

> I don’t think users necessarily need to be able to update their own
> identities.  I just don’t want to have to use the super user role.  The
> super user role has all power over all things in the database.  I don’t
> want to have to give that much power to the person who manages identities,
> I just want to give them the power to manage identities.
>
> Jeremiah Jordan
> e. jerem...@datastax.com
> w. www.datastax.com
>
>
>
> On Jun 30, 2023 at 1:35:41 PM, Dinesh Joshi  wrote:
>
>> Yuki, Jeremiah, both are fair points. The mental model we're using for
>> mTLS authentication is slightly different.
>>
>> In your model you're treating the TLS identity itself as similar to
>> the password. The password is the 'shared secret' that currently needs
>> to be rotated by the user that owns the account, therefore necessitating
>> the permission to update their password. But that is not the case with
>> TLS certificates and mTLS identities.
>>
>> The model we're going for is different. The identity is provisioned for
>> an account by a super user. This is more locked down: the user can
>> still rotate their own certificates but cannot change the identity
>> associated with their account without a super user.
>>
>> Once provisioned, a user does not need to rotate the identity itself. They
>> only need to obtain fresh certificates as their certificates near
>> expiry. This requires no updates to the database, unlike passwords.
>>
>> We could extend this functionality in the future to allow users to
>> change their own identity. Nothing here prevents that.
>>
>> thanks,
>>
>> Dinesh
>>
>>
>>
>> On 6/29/23 08:16, Jeremiah Jordan wrote:
>>
>> I like the idea of extending CREATE ROLE rather than adding a brand new
>> ADD IDENTITY syntax.  Not sure how that can line up with one-to-many
>> relationships for an identity, but maybe that can just be done through
>> role hierarchy?
>>
>> In either case, I don’t think IDENTITY-related operations should be tied
>> to the super user flag. They should be tied to either existing role
>> permissions, or a brand new permission for IDENTITY.  We should not
>> require that end users give the account allowed to make IDENTITY changes
>> super user permission to do whatever they want across the whole database.
>>
>> On Jun 28, 2023 at 11:48:02 PM, Yuki Morishita wrote:
>>
>> > Thinking more about "CREATE ROLE" permission, if we can extend CREATE
>> > ROLE/ALTER ROLE statements, it may look streamlined:
>> >
>> > I don't have a good example, but something like:
>> > ```
>> > CREATE ROLE dev WITH LOGIN = true AND IDENTITIES = {'spiffe://xxx'};
>> > ALTER ROLE dev ADD IDENTITY 'xxx';
>> > LIST ROLES;
>> > ```
>> >
>> > This requires a role-to-identities table as well as the current
>> > identity-to-role table, though.
>> >
>> > On Thu, Jun 29, 2023 at 12:34 PM Yuki Morishita wrote:
>> >
>> > Hi Jyothsna,
>> >
>> > I think for the *initial* commit, the description looks fine to me.
>> > I'd like to see/contribute to future improvements, though:
>> >
>> > * ADD IDENTITY requires SUPERUSER, which means that a brand new
>> > cluster needs to start with
>> > PasswordAuthenticator/CassandraAuthorizer first, and then change
>> > to the mTLS one.
>> > * For this
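Yuki's point that extending CREATE ROLE/ALTER ROLE needs a role-to-identities table in addition to the existing identity-to-role table can be illustrated with a small in-memory Python model. `RoleIdentityIndex` is a hypothetical name, and the rule that an identity belongs to at most one role is an assumption based on the identity-to-role mapping.

```python
from collections import defaultdict


class RoleIdentityIndex:
    """Keeps identity -> role and role -> identities views in sync.

    Authentication needs identity -> role; CREATE ROLE ... IDENTITIES = {...}
    and a LIST ROLES output that shows identities need role -> identities.
    """

    def __init__(self) -> None:
        self.identity_to_role: dict[str, str] = {}
        self.role_to_identities: defaultdict[str, set[str]] = defaultdict(set)

    def add_identity(self, role: str, identity: str) -> None:
        # Assumption: an identity belongs to at most one role.
        owner = self.identity_to_role.get(identity)
        if owner is not None and owner != role:
            raise ValueError(f"{identity!r} already belongs to role {owner!r}")
        # Write both views together so they never disagree.
        self.identity_to_role[identity] = role
        self.role_to_identities[role].add(identity)

    def create_role(self, role: str, identities: set[str]) -> None:
        # Rough analogue of CREATE ROLE ... AND IDENTITIES = {...}.
        for identity in identities:
            self.add_identity(role, identity)

    def drop_identity(self, identity: str) -> None:
        role = self.identity_to_role.pop(identity, None)
        if role is not None:
            self.role_to_identities[role].discard(identity)

    def identities_of(self, role: str) -> set[str]:
        # Copy, so callers cannot mutate the index by accident.
        return set(self.role_to_identities.get(role, set()))
```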

Re: [DISCUSS] When to run CheckStyle and other verificiations

2023-07-06 Thread Jon Meredith
sorry, hit send early.

ant test is an interesting one, as it seems impractical to run all tests
sequentially, but somebody may want to, I suppose.


Re: [DISCUSS] When to run CheckStyle and other verificiations

2023-07-06 Thread Jon Meredith
I think the -Dno-blah settings have usability issues. Because they check
only whether the property is set, not its value, you cannot override them,
or set defaults for them, with ANT_ARGS or by importing into another Ant
build file. The way rat.skip does it, using the configured value, seems
much better.
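The difference between the two styles can be modeled in a few lines of Python. The helpers are hypothetical and describe the behaviour discussed here, not Ant's internals: a presence-based flag skips whenever the property exists, while a value-based one, like rat.skip, reads the configured value.

```python
TRUTHY = {"true", "yes", "on"}


def skipped_by_presence(props: dict[str, str], flag: str) -> bool:
    # "-Dno-checks" style: only the property's existence matters, so
    # "-Dno-checks=false" still skips, and changing the value cannot undo it.
    return flag in props


def skipped_by_value(props: dict[str, str], flag: str,
                     default: str = "false") -> bool:
    # "rat.skip" style: the configured value decides, so "false" means "run".
    return props.get(flag, default).strip().lower() in TRUTHY
```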

Ideally, I would like an easy, fast way to set a default for checks that
slow down the local compile/test cycle, so that I can iterate quickly and
deal with javadoc/checkstyle comments when I'm ready to commit, or opt into
them on the command line when needed.

e.g.
export ANT_ARGS="-Dcheckstyle.default=skip -Djavadoc.default=skip"
ant # should just compile, no checkstyle/javadoc etc
ant checkstyle  # explicitly requested, run checkstyle

Similarly, I'd like the option to configure any CI system I run so that
all checks that are not essential to execution run in their own pipeline
and fail the build if there's a problem, while the other test targets still
run despite violations. Each builder wastes time running checks that only
need to happen once, and you don't get feedback from tests that could have
run. Of course, not everybody may want that, and the main Apache Cassandra
CI may only want to run tests for checked commits for resource reasons.

Also, as a minor nuisance, if you forget the =true as in the examples,
ant consumes the next argument as the value, so "ant publish
-Dno-tests -Dno-checks" would set no-tests=-Dno-checks and run the
checks you tried to skip anyway.
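A small Python model of that parsing rule shows why `no-checks` never gets set. `parse_defines` is a hypothetical helper that mimics the behaviour described above; it is not Ant's own code.

```python
def parse_defines(args: list[str]) -> tuple[dict[str, str], list[str]]:
    """Split ant-style arguments into -D properties and remaining targets.

    A -D option without '=' takes the *next* argument as its value.
    """
    props: dict[str, str] = {}
    targets: list[str] = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg.startswith("-D"):
            name, sep, value = arg[2:].partition("=")
            if not sep and i + 1 < len(args):
                i += 1
                value = args[i]  # the following argument is swallowed
            props[name] = value
        else:
            targets.append(arg)
        i += 1
    return props, targets
```

Writing `-Dno-tests=true -Dno-checks=true` sets both properties as intended.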

Back to the proposal: I like the idea of an explicit check target that runs
all checks. I would not personally have the default target run them, but I
think that's fine as long as you can disable them.

ant test is an interesting one

On Thu, Jul 6, 2023 at 7:30 AM Maxim Muzafarov  wrote:

> In my humble opinion, it is better to have only one plain and
> straightforward build pipeline for the whole project, with custom
> flags used to skip a particular step, than to have multiple pipelines
> under the ant tool with multiple endpoints accordingly. I mean, all
> the steps need to be lined up, with each step in the pipeline
> executing everything that stands before it unless skip flags are
> specified. Meanwhile, I like your idea of grouping all the checks
> under the dedicated step (and changing the no-checkstyle flag to
> no-checks accordingly as Ekaterina mentioned).
>
>
> Let me share a simple example of what I'm talking about with one
> single endpoint.
> Let's assume the following step order:
>
> init -> _build_java (compile) -> checks -> build -> jar -> test ->
> artifacts -> publish;
>
> So, the usage would be:
>
> ant jar -Dno-checks
> ant test -Dno-build
> ant publish -Dno-tests -Dno-checks
>
>
> I'm not saying what you've proposed is bad; in fact, we're not
> currently doing the pipeline I'm talking about. But adding an
> additional endpoint is something we should consider very carefully, as
> it may create some difficulties for a Maven/Gradle migration if it ever
> happens.
>
> So, if I'm not mistaken, you're trying to add a new endpoint to the
> way we build the project:
>
> - "ant [check]" = build + all checks (first endpoint)
> - "ant jar" = build + make jars + no checks (second endpoint)
>
> And I would suggest running `ant jar -Dno-checks` instead to achieve
> the same result assuming the `jar` is still transitively dependent on
> `checks`.
>
> On Thu, 6 Jul 2023 at 14:02, Jacek Lewandowski
>  wrote:
> >
> > Great discussion, but I feel we still have no conclusion.
> >
> > I fully support automatically setting up IDE(A) to run the necessary
> > steps in a developer-friendly environment, but let's continue that in a
> > separate thread.
> >
> > I don't like flags, especially if they have to be used on a daily
> > basis. The build script's help message does not list them when
> > "ant -p" is run.
> >
> > I'm going to make these changes unless vetoed:
> >
> > "ant [check]" = build + all checks: build everything and run all the
> > checks; this would also become the default target if no target is
> > specified
> > "ant jar" = build + make jars: build all the jars and tests, no checks
> > All "test" commands = build + make jars + run the tests: build all the
> > jars and tests, run the tests, no checks
> >
> > Therefore, a user who wants to validate their branch before running CI
> > would just need to run "ant" without any arguments. This way, a newcomer
> > who does not know our build targets will likely run the checks.
> >
> > We still need some flags for skipping specific tasks to optimize for CI,
> > but in general, they would not be required for local development.
> >
> > Flags will also be needed to customize some tasks, but they should be
> > optional for newcomers. In addition, a "help" target could display a list
> > of selected tasks and properties with descriptions.
> >
> > I'd be more than happy if we could conclude the discussion somehow and
> > move forward :)
> >
> >
> > th
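Maxim's single-pipeline proposal can be modeled as a linear list of steps, where a target runs every step before it minus any skipped ones. A minimal Python sketch; the mapping of skip flags to steps (`-Dno-checks` → `checks`) and the rule that a flag removes only the step it names are assumptions for illustration.

```python
from collections.abc import Collection

# Step order from the proposal; each target implies every step before it.
PIPELINE = ["init", "_build_java", "checks", "build", "jar", "test",
            "artifacts", "publish"]


def steps_to_run(target: str, skip: Collection[str] = ()) -> list[str]:
    """Return the steps executed for `target`, honouring skip flags."""
    if target not in PIPELINE:
        raise ValueError(f"unknown target {target!r}")
    upto = PIPELINE[: PIPELINE.index(target) + 1]
    # Assumption: a skip flag removes only the step it names.
    return [step for step in upto if step not in skip]
```

For example, `ant jar -Dno-checks` corresponds to `steps_to_run("jar", {"checks"})`, which yields `init`, `_build_java`, `build` and `jar`.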

New episode of The Apache Cassandra (R) Corner podcast!

2023-07-06 Thread Aaron Ploetz
Link to the next episode (audio only):
https://drive.google.com/file/d/1HmhtR1stWmtD8gJTFh3gKIye7lQKXN1y/view?usp=sharing

s2e7 - German Eighberger and Theo van Kraay (Microsoft)
(You may have to download it to play)

It will remain in staging for 72 hours, going live (assuming no objections)
by Monday, July 10th.

If anyone should have any questions or comments, or if you want to be a
guest, please reach out to me.

For my guest pipeline, I'm still coordinating with Josh McKenzie.  But I am
looking for additional guests.  So if you know someone who would be a
great guest, let me know!

Thanks, everyone!

Aaron


Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-07-06 Thread Ekaterina Dimitrova
Hi,

First of all, thank you for all the work!
I personally think that it should be ok to add a new column.
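The column under discussion is a boolean `mutable` in `system_views.settings`, and validation is routed through a `@ValidatedBy` annotation (both described in the quoted messages below). A toy Python model of how an UPDATE could use them; `Setting`, `SettingsTable`, `positive_int` and `example_limit` are hypothetical, and Cassandra's real implementation is in Java.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Setting:
    name: str
    value: Any
    mutable: bool  # the proposed `mutable` column
    validator: Optional[Callable[[Any], None]] = None  # like @ValidatedBy


def positive_int(value: Any) -> None:
    if not isinstance(value, int) or value <= 0:
        raise ValueError(f"expected a positive integer, got {value!r}")


class SettingsTable:
    """Toy model of SELECT/UPDATE on a settings virtual table."""

    def __init__(self, settings: list[Setting]) -> None:
        self._settings = {s.name: s for s in settings}

    def rows(self) -> list[tuple[str, Any, bool]]:
        # What a SELECT would show: name, value, mutable.
        return [(s.name, s.value, s.mutable) for s in self._settings.values()]

    def update(self, name: str, value: Any) -> None:
        setting = self._settings[name]
        if not setting.mutable:
            raise PermissionError(f"{name} cannot be changed at runtime")
        if setting.validator is not None:
            setting.validator(value)  # reject invalid values before applying
        setting.value = value
```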

I will be very happy to see this landing in 5.0.
I am personally against porting this patch to 4.1. To be clear, I am sure
you did a great job, and my response would be the same to anyone: the
configuration is quite widespread and the devil is in the details. I do not
see a good reason for an exception here other than convenience. There is
also no feature flag for these changes, right?

Best regards,
Ekaterina

On Thursday, July 6, 2023, Miklosovic, Stefan
wrote:

> Hi Maxim,
>
> I went through the PR and added my comments. I think David also reviewed
> it. All points you mentioned make sense to me but I humbly think it is
> necessary to have at least one additional pair of eyes on this as the patch
> is relatively impactful.
>
> I would like to see an additional column in system_views.settings named
> "mutable", of type "boolean", to show which fields I am actually allowed
> to update as an operator.
>
> It seems to me you agree with the introduction of this column (1), but
> there is no clear agreement on where we actually want to put it. You want
> this whole feature to be committed to the 4.1 branch as well, which is an
> interesting proposal. I was thinking that this work would go to 5.0 only.
> I am not completely sure it is necessary to backport this feature, but
> your argument here (2) is worth discussing further.
>
> If we introduce this change to 4.1, that field would not be there, but in
> 5.0 it would. That way, we would not introduce any new column to
> system_views.settings in 4.1.
> We could also introduce this column in 4.1 if people are OK with that.
>
> For simplicity, I am slightly leaning towards introducing this feature
> to 5.0 only.
>
> (1) https://github.com/apache/cassandra/pull/2334#discussion_r1251104171
> (2) https://github.com/apache/cassandra/pull/2334#discussion_r1251248041
>
> 
> From: Maxim Muzafarov 
> Sent: Friday, June 23, 2023 13:50
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] Allow UPDATE on settings virtual table to change
> running configuration
>
> Hello everyone,
>
>
> As there has been little feedback on which option to go with, and after
> weighing the pros and cons of each option, I tend to agree with the view
> of this problem proposed by David :-) After a lot of discussion on Slack,
> we came to the @ValidatedBy annotation, which points to a validation
> method for a property; this will address all our concerns and issues
> with validation.
>
> I'd like to raise the visibility of these changes and try to find one
> more committer to look at them:
> https://issues.apache.org/jira/browse/CASSANDRA-15254
> https://github.com/apache/cassandra/pull/2334/files
>
> Any kind of review would be greatly appreciated. Thanks in advance!
>
>
> Although the change is fairly large (+2,043 −302), most of these
> additions are tests. I would like to highlight the crucial design points
> that are required to make the SettingsTable virtual table updatable. Some
> of these have already been discussed in this thread, and I would like to
> provide a brief outline of them to facilitate the PR review.
>
> So, what are the problems that have been solved to make the
> SettingsTable updatable?
>
> 1. Input validation.
>
> Currently, the JMX, Yaml and DatabaseDescriptor#apply methods each
> validate user input for the same property in their own way, which
> usually, but not always, results in a consistent configuration state.
> CASSANDRA-17734 is a good example of this.
>
> The @ValidatedBy annotation, which points to a validation method, has
> been added to address this particular problem. No matter which API is
> triggered, the method will be called to validate the input. This also
> works when cassandra.yaml is loaded by the yaml engine in a pre-parse
> state, in the same way that we now check input properties for
> deprecation and nullability.
>
> There are two types of validation worth mentioning:
> - stateless - properties do not depend on any other configuration;
> - stateful - properties that require a fully-constructed Config
> instance to be validated, as their values depend on other properties;
>
> For the sake of simplicity, the scope of this task will be limited to
> dealing with stateless properties only, but stateful validations are
> also supported in the initial PR using property change listeners.
>
> 2. Property mutability.
>
> There is currently no way of distinguishing which parts of a property
> are mutable and which are not. This meta-information must be available
> at runtime, and, as discussed earlier, the @Mutable annotation has been
> added to handle this.
>
> 3. Listening for p

Re: [DISCUSS] When to run CheckStyle and other verifications

2023-07-06 Thread Maxim Muzafarov
In my humble opinion, it is better to have only one plain and
straightforward build pipeline for the whole project, with custom
flags used to skip a particular step, than to have multiple pipelines
under the ant tool with a corresponding number of endpoints. I mean, all
the steps need to be lined up, with each step in the pipeline
executing everything that stands before it unless skip flags are
specified. Meanwhile, I like your idea of grouping all the checks
under a dedicated step (and renaming the no-checkstyle flag to
no-checks accordingly, as Ekaterina mentioned).


Let me share a simple example of what I'm talking about with one
single endpoint.
Let's assume the following step order:

init -> _build_java (compile) -> checks -> build -> jar -> test ->
artifacts -> publish;

So, the usage would be:

ant jar -Dno-checks
ant test -Dno-build
ant publish -Dno-tests -Dno-checks
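To make the "each step runs everything before it unless skipped" rule concrete, here is a minimal Java sketch that models the step order above as a list and computes which steps a given target would run. It is an illustration only, not how build.xml works: the class name LinearPipeline is hypothetical, and the flag-to-step mapping is an assumption that covers only no-checks and no-tests, since the exact scope of a flag such as no-build is not spelled out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a single linear build pipeline with skip flags. */
final class LinearPipeline {
    // Step order from the proposal: each target runs every step before it.
    static final List<String> STEPS = List.of(
            "init", "_build_java", "checks", "build", "jar", "test", "artifacts", "publish");

    // Assumed mapping from a -D skip flag to the step it skips.
    static final Map<String, String> SKIP_FLAGS = Map.of(
            "no-checks", "checks",
            "no-tests", "test");

    /** Steps executed for a target: all steps up to and including it, minus skipped ones. */
    static List<String> plan(String target, Set<String> flags) {
        int end = STEPS.indexOf(target);
        if (end < 0) {
            throw new IllegalArgumentException("unknown target: " + target);
        }
        List<String> run = new ArrayList<>();
        for (String step : STEPS.subList(0, end + 1)) {
            boolean skipped = flags.stream()
                    .map(SKIP_FLAGS::get)      // unknown flags map to null
                    .anyMatch(step::equals);   // step.equals(null) is simply false
            if (!skipped) {
                run.add(step);
            }
        }
        return run;
    }

    public static void main(String[] args) {
        // ant jar -Dno-checks
        System.out.println(plan("jar", Set.of("no-checks")));
        // prints [init, _build_java, build, jar]

        // ant publish -Dno-tests -Dno-checks
        System.out.println(plan("publish", Set.of("no-tests", "no-checks")));
        // prints [init, _build_java, build, jar, artifacts, publish]
    }
}
```

The point of the model is that there is only one ordering; a target never needs its own dependency list, and a skip flag only removes a step from an otherwise fixed prefix.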


I'm not saying what you've proposed is bad; in fact, we don't currently
have the pipeline I'm talking about. But adding another endpoint is
something we should consider very carefully, as it may create some
difficulties for a Maven/Gradle migration if one ever happens.

So, if I'm not mistaken, you're trying to add a new endpoint to the way
we build the project:

- "ant [check]" = build + all checks (first endpoint)
- "ant jar" = build + make jars + no checks (second endpoint)

And I would suggest running `ant jar -Dno-checks` instead to achieve
the same result assuming the `jar` is still transitively dependent on
`checks`.

On Thu, 6 Jul 2023 at 14:02, Jacek Lewandowski
 wrote:
>
> Great discussion, but I feel we still have no conclusion.
>
>
> I fully support automatically setting up IDE(A) to run the necessary stuff 
> in a developer-friendly environment, but let it be continued in a separate 
> thread.
>
>
> I wouldn't say I like flags, especially if they have to be used on a daily 
> basis. The build script help message does not list them when "ant -p" is run.
>
>
> I'm going to make these changes unless it is vetoed:
>
> "ant [check]" = build + all checks, build everything, and run all the checks; 
> also, this would become the default target if no target is specified
> "ant jar" = build + make jars: build all the jars and tests, no checks
> All "test" commands = build + make jars + run the tests: build all the jars 
> and tests, run the tests, no checks
>
>
> Therefore, a user who wants to validate their branch before running CI would 
> need to run just "ant" without any args. This way, a newcomer who does not 
> know our build targets will likely run the checks.
>
>
> We still need some flags for skipping specific tasks to optimize for CI, but 
> in general, they would not be required for local development.
>
>
> Flags will also be needed to customize some tasks, but they should be 
> optional for newcomers. In addition, a "help" target could display a list of 
> selected tasks and properties with descriptions.
>
>
> I'd be more than happy if we could conclude the discussion somehow and move 
> forward :)
>
>
> thanks,
>
> Jacek
>
>
>
> On Thu, 29 Jun 2023 at 23:34, Ekaterina Dimitrova  
> wrote:
>>
>> There is a separate thread and a corresponding ticket for 
>> generate-idea-files.
>> https://lists.apache.org/thread/o2fdkyv2skvf9ngy9jhpnhvo92qvr17m
>> CASSANDRA-18467
>>
>>
>> On Thu, 29 Jun 2023 at 16:54, Jeremiah Jordan  
>> wrote:
>>>
>>> +100 I support making generate-idea-files automatically set up everything 
>>> in IntelliJ for you.  If you post a diff, I will test it.
>>>
>>> On this proposal, I don’t really have an opinion one way or the other about 
>>> what the default is for local "ant jar"; if it's slow I will figure out how 
>>> to turn it off, if it's fast I will leave it on.
>>> I do care that CI runs checks, and complains loudly if something is wrong 
>>> such that it is very easy to tell during review.
>>>
>>> -Jeremiah
>>>
>>> On Jun 29, 2023 at 1:44:09 PM, Josh McKenzie  wrote:
>>>
>>>> In accord I added an opt-out for each hook, and will require such here as 
>>>> well
>>>>
>>>> On for main branches, off for feature branches seems like it might blanket 
>>>> satisfy this concern? Doesn't fix the "--atomic across 5 branches means 
>>>> style checks and build on hook across those branches" which isn't ideal. I 
>>>> don't think style check failures after push upstream are frequent enough 
>>>> for the cost/benefit there to make sense overall, are they?
>>>>
>>>> Related to this - I have sonarlint, spotbugs, and checkstyle all running 
>>>> inside idea; since pulling those in and tuning the configs a bit I haven't 
>>>> run into a single issue w/our checkstyle build target (go figure). Having 
>>>> the required style checks reflected in real time inside your work 
>>>> environment goes a long way towards making it a more intuitive part of your 
>>>> workflow rather than being an annoying last minute block of your ability to 
>>>> progress that requires circling back into the code.

Re: [DISCUSS] When to run CheckStyle and other verifications

2023-07-06 Thread Jacek Lewandowski
Great discussion, but I feel we still have no conclusion.


I fully support automatically setting up IDE(A) to run the necessary stuff
in a developer-friendly environment, but let it be continued in a separate
thread.


I wouldn't say I like flags, especially if they have to be used on a daily
basis. The build script help message does not list them when "ant -p" is
run.


I'm going to make these changes unless it is vetoed:

   - "ant [check]" = build + all checks, build everything, and run all the
   checks; also, this would become the default target if no target is specified
   - "ant jar" = build + make jars: build all the jars and tests, no checks
   - All "test" commands = build + make jars + run the tests: build all the
   jars and tests, run the tests, no checks
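A minimal Java sketch of the three proposed endpoints, modelling each target as a fixed task list with "check" as the default when no target is given. The class and task names are hypothetical, and all the "test" commands are collapsed into a single "test" entry for brevity; it only illustrates which targets would run the checks, not how build.xml would implement them.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical model of the proposed ant endpoints and the tasks each runs. */
final class ProposedTargets {
    // Under the proposal, plain "ant" behaves like "ant check".
    static final String DEFAULT_TARGET = "check";

    // Every "test" command is collapsed into one "test" entry for brevity.
    static final Map<String, List<String>> TARGETS = Map.of(
            "check", List.of("build", "checks"),
            "jar", List.of("build", "jars"),
            "test", List.of("build", "jars", "tests"));

    /** Resolves a target (null = no target given on the command line) to its tasks. */
    static List<String> tasksFor(String target) {
        String resolved = (target == null) ? DEFAULT_TARGET : target;
        List<String> tasks = TARGETS.get(resolved);
        if (tasks == null) {
            throw new IllegalArgumentException("unknown target: " + resolved);
        }
        return tasks;
    }

    /** True only for targets that include the verification step. */
    static boolean runsChecks(String target) {
        return tasksFor(target).contains("checks");
    }

    public static void main(String[] args) {
        System.out.println(runsChecks(null));   // prints true: plain "ant" runs the checks
        System.out.println(runsChecks("jar"));  // prints false
        System.out.println(runsChecks("test")); // prints false
    }
}
```

Compared with a single linear pipeline, each endpoint here owns its task list, which is what makes a bare "ant" a discoverable way to run all checks without any flags.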


Therefore, a user who wants to validate their branch before running CI
would need to run just "ant" without any args. This way, a newcomer who
does not know our build targets will likely run the checks.


We still need some flags for skipping specific tasks to optimize for CI,
but in general, they would not be required for local development.


Flags will also be needed to customize some tasks, but they should be
optional for newcomers. In addition, a "help" target could display a list
of selected tasks and properties with descriptions.


I'd be more than happy if we could conclude the discussion somehow and move
forward :)


thanks,

Jacek



On Thu, 29 Jun 2023 at 23:34, Ekaterina Dimitrova
wrote:

> There is a separate thread and a corresponding ticket for
> generate-idea-files.
> https://lists.apache.org/thread/o2fdkyv2skvf9ngy9jhpnhvo92qvr17m
> CASSANDRA-18467
>
>
> On Thu, 29 Jun 2023 at 16:54, Jeremiah Jordan 
> wrote:
>
>> +100 I support making generate-idea-files automatically set up everything
>> in IntelliJ for you.  If you post a diff, I will test it.
>>
>> On this proposal, I don’t really have an opinion one way or the other
>> about what the default is for local "ant jar"; if it's slow I will figure
>> out how to turn it off, if it's fast I will leave it on.
>> I do care that CI runs checks, and complains loudly if something is wrong
>> such that it is very easy to tell during review.
>>
>> -Jeremiah
>>
>> On Jun 29, 2023 at 1:44:09 PM, Josh McKenzie 
>> wrote:
>>
>>> In accord I added an opt-out for each hook, and will require such here
>>> as well
>>>
>>> On for main branches, off for feature branches seems like it might
>>> blanket satisfy this concern? Doesn't fix the "--atomic across 5 branches
>>> means style checks and build on hook across those branches" which isn't
>>> ideal. I don't think style check failures after push upstream are frequent
>>> enough for the cost/benefit there to make sense overall, are they?
>>>
>>> Related to this - I have sonarlint, spotbugs, and checkstyle all running
>>> inside idea; since pulling those in and tuning the configs a bit I haven't
>>> run into a single issue w/our checkstyle build target (go figure). Having
>>> the required style checks reflected in real time inside your work environment
>>> goes a long way towards making it a more intuitive part of your workflow
>>> rather than being an annoying last minute block of your ability to progress
>>> that requires circling back into the code.
>>>
>>> From a technical perspective, it looks like adding a reference
>>> "externalDependencies.xml" to our ide/idea directory, which would be copied
>>> over during "generate-idea-files", would be sufficient to get IDEA to pop up
>>> prompts to install those extensions if you don't have them when opening the
>>> project (theory; haven't tested).
>>>
>>> We'd need to make sure the configuration for each of those was
>>> calibrated to our project out of the box of course, but making style
>>> considerations a first-class citizen in that way seems a more intuitive and
>>> human-centered approach to all this than debating the nuances of our
>>> command-line targets, hooks, and how we present things to people. To
>>> Berenguer's point - better to have these be completely invisible to people
>>> with their workflows and Just Work (except for when your IDE scolds you for
>>> bad behavior w/build errors immediately).
>>>
>>> I still think Flags Are Bad. :)
>>>
>>> On Thu, Jun 29, 2023, at 1:38 PM, Ekaterina Dimitrova wrote:
>>>
>>> Should we just keep a single consolidated no-check flag for all kinds of
>>> checks and get rid of the no-checkstyle one?
>>>
>>> Trading one for one with Josh :-)
>>>
>>> Best regards,
>>> Ekaterina
>>>
>>> On Thu, 29 Jun 2023 at 10:52, Josh McKenzie 
>>> wrote:
>>>
>>>
>>> I much prefer separate tasks to flags. Flags are not listed in the
>>> help message (e.g. "ant -p") and are not auto-completed in the terminal.
>>> That makes them almost undiscoverable for newcomers.
>>>
>>> Please, no more flags. We are *more* than flaggy enough right now.
>>>
>>> Having to dig through build.xml to determine how to change things or do
>>> things is painful; the more we can avoid this (for oldti

Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-07-06 Thread Miklosovic, Stefan
Hi Maxim,

I went through the PR and added my comments. I think David also reviewed it. 
All points you mentioned make sense to me but I humbly think it is necessary to 
have at least one additional pair of eyes on this as the patch is relatively 
impactful.

I would like to see an additional column in system_views.settings named
"mutable" of type "boolean" to see which fields I am actually allowed to
update as an operator.

It seems to me you agree with the introduction of this column (1) but there is
no clear agreement on where we actually want to put it. You want this whole
feature to be committed to the 4.1 branch as well, which is an interesting
proposal. I was thinking that this work would go to 5.0 only. I am not
completely sure it is necessary to backport this feature, but your
argumentation here (2) is worth discussing further.

If we introduce this change to 4.1, that field would not be there, but in 5.0
it would. That way, we would not introduce any new column to
system_views.settings in 4.1.
We could also introduce this column in 4.1 if people are OK with that.

For simplicity, I am slightly leaning towards introducing this feature to
5.0 only.

(1) https://github.com/apache/cassandra/pull/2334#discussion_r1251104171
(2) https://github.com/apache/cassandra/pull/2334#discussion_r1251248041


From: Maxim Muzafarov 
Sent: Friday, June 23, 2023 13:50
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Allow UPDATE on settings virtual table to change running 
configuration


Hello everyone,


As there has been little feedback on which option to go with, and after
weighing the pros and cons of each option, I tend to agree with the view
of this problem proposed by David :-) After a lot of discussion on Slack,
we came to the @ValidatedBy annotation, which points to a validation
method for a property; this will address all our concerns and issues
with validation.

I'd like to raise the visibility of these changes and try to find one
more committer to look at them:
https://issues.apache.org/jira/browse/CASSANDRA-15254
https://github.com/apache/cassandra/pull/2334/files

Any kind of review would be greatly appreciated. Thanks in advance!


Although the change is fairly large (+2,043 −302), most of these
additions are tests. I would like to highlight the crucial design points
that are required to make the SettingsTable virtual table updatable. Some
of these have already been discussed in this thread, and I would like to
provide a brief outline of them to facilitate the PR review.

So, what are the problems that have been solved to make the
SettingsTable updatable?

1. Input validation.

Currently, the JMX, Yaml and DatabaseDescriptor#apply methods each
validate user input for the same property in their own way, which
usually, but not always, results in a consistent configuration state.
CASSANDRA-17734 is a good example of this.

The @ValidatedBy annotation, which points to a validation method, has
been added to address this particular problem. No matter which API is
triggered, the method will be called to validate the input. This also
works when cassandra.yaml is loaded by the yaml engine in a pre-parse
state, in the same way that we now check input properties for
deprecation and nullability.
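One way to picture the annotation-based approach is a single reflective validation path that every API calls before applying a value. The Java sketch below is an illustration under assumptions, not the code from the PR: the annotation's attributes (useClass, method), the SketchConfig stand-in with its sample_limit field, and the Validators class are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical annotation naming a static validation method for a config field. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ValidatedBy {
    Class<?> useClass();
    String method();
}

/** Hypothetical validators: each takes the candidate value and throws if it is invalid. */
final class Validators {
    static void nonNegative(Integer value) {
        if (value == null || value < 0) {
            throw new IllegalArgumentException("value must be >= 0, got " + value);
        }
    }
}

/** Stand-in for a config class; the field name is illustrative only. */
final class SketchConfig {
    @ValidatedBy(useClass = Validators.class, method = "nonNegative")
    public Integer sample_limit = 32;
}

/** One validation path that JMX, YAML loading and the settings table could all share. */
final class PropertyValidation {
    static void validate(Class<?> configClass, String property, Object value) {
        Field field;
        try {
            field = configClass.getField(property);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("unknown property: " + property, e);
        }
        ValidatedBy annotation = field.getAnnotation(ValidatedBy.class);
        if (annotation == null) {
            return; // no validator declared for this property
        }
        Method method;
        try {
            // The validator must accept exactly the field's declared type.
            method = annotation.useClass().getDeclaredMethod(annotation.method(), field.getType());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("bad @ValidatedBy on " + property, e);
        }
        try {
            method.invoke(null, value); // static method, so no receiver
        } catch (InvocationTargetException e) {
            // Rethrow the validator's own exception so callers see the real cause.
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause;
            }
            throw new IllegalStateException(cause);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the rule lives next to the field rather than inside each API, a JMX setter and an UPDATE on the virtual table would reject the same inputs by construction.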

There are two types of validation worth mentioning:
- stateless - properties do not depend on any other configuration;
- stateful - properties that require a fully-constructed Config
instance to be validated, as their values depend on other properties;

For the sake of simplicity, the scope of this task will be limited to
dealing with stateless properties only, but stateful validations are
also supported in the initial PR using property change listeners.
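A hedged Java sketch of the distinction: a stateless validator sees only the new value, while a stateful one also receives the current configuration and is run as a "before change" listener. RangeConfig, low_mark/high_mark and both validator interfaces are hypothetical names chosen for illustration; they are not Cassandra's Config properties or the PR's types.

```java
import java.util.List;

/** Stand-in config with two related properties (names are illustrative only). */
final class RangeConfig {
    int low_mark = 4;
    int high_mark = 32;
}

/** Stateless: the decision depends only on the new value. */
interface StatelessValidator {
    void validate(int newValue);
}

/** Stateful: needs the fully constructed config to compare against other properties. */
interface StatefulValidator {
    void validate(RangeConfig current, int newValue);
}

final class LowMarkProperty {
    static final StatelessValidator POSITIVE = v -> {
        if (v <= 0) {
            throw new IllegalArgumentException("low_mark must be > 0, got " + v);
        }
    };

    // Stateful checks run as "before change" listeners with access to the whole config.
    static final List<StatefulValidator> BEFORE_CHANGE = List.<StatefulValidator>of(
            (cfg, v) -> {
                if (v > cfg.high_mark) {
                    throw new IllegalArgumentException(
                            "low_mark must be <= high_mark (" + cfg.high_mark + "), got " + v);
                }
            });

    static void set(RangeConfig cfg, int newValue) {
        POSITIVE.validate(newValue);                  // stateless check first
        for (StatefulValidator listener : BEFORE_CHANGE) {
            listener.validate(cfg, newValue);         // then stateful checks
        }
        cfg.low_mark = newValue;                      // applied only if every check passed
    }
}
```

The stateless check could run even before a Config exists (for example while the YAML is being parsed), whereas the stateful one only makes sense once the other property has a value.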

2. Property mutability.

There is currently no way of distinguishing which parts of a property
are mutable and which are not. This meta-information must be available
at runtime, and, as discussed earlier, the @Mutable annotation has been
added to handle this.
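As a rough illustration, a runtime-retained @Mutable marker can serve both purposes discussed in this thread: rejecting updates to fixed properties and exposing a per-property boolean such as the proposed "mutable" column. The annotation below is a hypothetical stand-in written for this Java sketch, and the fields on MutabilityConfig are illustrative only; they do not claim anything about which real Cassandra settings are mutable.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical marker: the property may be changed while the node is running. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Mutable {}

/** Stand-in config; field names are illustrative only. */
final class MutabilityConfig {
    @Mutable
    public Integer compaction_throughput = 64;   // may change at runtime in this sketch
    public String cluster_name = "Test Cluster"; // fixed at startup in this sketch
}

final class MutableSettings {
    /** Property name -> whether it is @Mutable, e.g. to back a "mutable" boolean column. */
    static Map<String, Boolean> mutability(Class<?> configClass) {
        Map<String, Boolean> result = new TreeMap<>();
        for (Field f : configClass.getFields()) {
            result.put(f.getName(), f.isAnnotationPresent(Mutable.class));
        }
        return result;
    }

    /** Applies an update only if the target field carries @Mutable. */
    static void update(Object config, String property, Object value) {
        try {
            Field f = config.getClass().getField(property);
            if (!f.isAnnotationPresent(Mutable.class)) {
                throw new UnsupportedOperationException(property + " is not mutable at runtime");
            }
            f.set(config, value);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new IllegalArgumentException("cannot update " + property, e);
        }
    }
}
```

Keeping the metadata on the field means the read path (listing which settings are mutable) and the write path (refusing an UPDATE) cannot drift apart.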

3. Listening for property changes.

Some internal components, e.g. CommitLog, may perform operations
and/or calculations just before or just after a property changes. As
long as JMX is the only API used to update configuration properties,
this is not a problem, since those operations are wired into the JMX
code paths. To keep the same behaviour when a property is updated
through another API, the observer pattern has been used.
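The observer approach can be sketched as a shared setter that notifies registered listeners before and after the value changes, regardless of which API called it. ObservableSettings and SettingChangeListener are hypothetical names for this Java sketch (unrelated to java.beans.PropertyChangeListener), and values are kept as plain objects for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical observer hooks invoked around a property change. */
interface SettingChangeListener {
    default void beforeChange(String property, Object oldValue, Object newValue) {}
    default void afterChange(String property, Object oldValue, Object newValue) {}
}

/** Holds property values and notifies listeners whichever API performs the update. */
final class ObservableSettings {
    private final Map<String, Object> values = new HashMap<>();
    private final Map<String, List<SettingChangeListener>> listeners = new HashMap<>();

    synchronized void register(String property, SettingChangeListener listener) {
        listeners.computeIfAbsent(property, k -> new ArrayList<>()).add(listener);
    }

    /** The single setter that JMX, the settings table, etc. would all route through. */
    synchronized void set(String property, Object newValue) {
        Object oldValue = values.get(property);
        List<SettingChangeListener> observers = listeners.getOrDefault(property, List.of());
        for (SettingChangeListener l : observers) {
            l.beforeChange(property, oldValue, newValue); // e.g. flush or validate first
        }
        values.put(property, newValue);
        for (SettingChangeListener l : observers) {
            l.afterChange(property, oldValue, newValue);  // e.g. recompute derived state
        }
    }

    synchronized Object get(String property) {
        return values.get(property);
    }
}
```

A CommitLog-like component would register a listener for the properties it cares about, so its recalculation runs whether the change came from JMX or from an UPDATE on the settings table.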

4. SettingsTable input/output format.

JMX, SettingsTable and Yaml accept values in different formats, which
may not be compatible in some cases, especially when representing
composite objects. The former uses toString() as an output, and the
latter uses a human-readable yaml format.

So, in order to see the same properties in the same format through
different APIs, the Yaml representation is reuse