Re: Intra-project dependencies

2023-02-17 Thread Mick Semb Wever
On Thu, 16 Feb 2023 at 21:43, David Capwell  wrote:

> After a lot of effort I think this branch is in a good state: accord feels
> mostly like it's in-tree, and most of the complexity of git is hidden.  I
> would love more feedback, as the patch is in a usable state.
>


This work is very good, thanks David.
It is going to take folk a little while to get familiar with it (there's
no free lunch), so give it a whirl. There's help in CONTRIBUTING.md:
https://github.com/dcapwell/cassandra/tree/accord-submodules


Re: Intra-project dependencies

2023-02-16 Thread David Capwell
After a lot of effort I think this branch is in a good state: accord feels 
mostly like it's in-tree, and most of the complexity of git is hidden.  I 
would love more feedback, as the patch is in a usable state.

> On Jan 30, 2023, at 3:16 PM, David Capwell  wrote:
> 
> I took a stab at creating a patch that I think addresses most of the comments 
> I saw in this thread, would love feedback in 
> https://issues.apache.org/jira/browse/CASSANDRA-18204
> 
> Given that the leading solution is git submodules I went down this path and 
> fleshed out the things I saw in this thread.  I don’t think this patch is 
> 100% perfect (been trying to figure out release logic to confirm) so would 
> love to hear about places that I neglected or problem areas found!
> 
>> On Jan 20, 2023, at 6:48 AM, Mick Semb Wever wrote:
>> 
>>>>> Both a git post-checkout and a build fail-fast will protect us here. But 
>>>>> the post-checkout will need to fail silently if the .git subdirectory 
>>>>> doesn't exist.
>>>> 
>>>> Correction: the build fail-fast will need to fail silently if the .git 
>>>> subdirectory doesn't exist.
>>> 
>>> How will this work for users downloading source distributions?
>> 
>> It is presumed that the source found in the submodule is on the correct SHA. 
>> The integrity checks are in place when creating and when voting on the 
>> source tarball release. This means that the build of the submodule has 
>> to be part of the in-tree build (which I have assumed already).
> 



Re: Intra-project dependencies

2023-01-30 Thread David Capwell
I took a stab at creating a patch that I think addresses most of the comments I 
saw in this thread, would love feedback in 
https://issues.apache.org/jira/browse/CASSANDRA-18204 


Given that the leading solution is git submodules I went down this path and 
fleshed out the things I saw in this thread.  I don’t think this patch is 100% 
perfect (been trying to figure out release logic to confirm) so would love to 
hear about places that I neglected or problem areas found!

> On Jan 20, 2023, at 6:48 AM, Mick Semb Wever  wrote:
> 
>  
> Both a git post-checkout and a build fail-fast will protect us here. But the 
> post-checkout will need to fail silently if the .git subdirectory doesn't 
> exist.
> 
> Correction: the build fail-fast will need to fail silently if the .git 
> subdirectory doesn't exist.
> 
> How will this work for users downloading source distributions?
> 
> It is presumed that the source found in the submodule is on the correct SHA. 
> The integrity checks are in place when creating and when voting on the source 
> tarball release. This means that the build of the submodule has to be 
> part of the in-tree build (which I have assumed already).



Re: Intra-project dependencies

2023-01-20 Thread Mick Semb Wever
>>> Both a git post-checkout and a build fail-fast will protect us here. But
>>> the post-checkout will need to fail silently if the .git subdirectory
>>> doesn't exist.
>>>
>>
>> Correction: the build fail-fast will need to fail silently if the .git
>> subdirectory doesn't exist.
>>
>
> How will this work for users downloading source distributions?
>

It is presumed that the source found in the submodule is on the correct
SHA. The integrity checks are in place when creating and when voting on the
source tarball release. This means that the build of the submodule has
to be part of the in-tree build (which I have assumed already).


Re: Intra-project dependencies

2023-01-20 Thread Brandon Williams
On Fri, Jan 20, 2023, 8:31 AM Mick Semb Wever  wrote:

>> Both a git post-checkout and a build fail-fast will protect us here. But
>> the post-checkout will need to fail silently if the .git subdirectory
>> doesn't exist.
>>
>
>
> Correction: the build fail-fast will need to fail silently if the .git
> subdirectory doesn't exist.
>

How will this work for users downloading source distributions?


Re: Intra-project dependencies

2023-01-20 Thread Mick Semb Wever
>
> Both a git post-checkout and a build fail-fast will protect us here. But
> the post-checkout will need to fail silently if the .git subdirectory
> doesn't exist.
>


Correction: the build fail-fast will need to fail silently if the .git
subdirectory doesn't exist.
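
A minimal sketch of that "fail silently" guard, for illustration only. It is written in Python rather than as a build.xml target, and the function name `should_verify_submodules` is hypothetical; nothing here is taken from the actual patch. The idea is simply that the SHA check is skipped when there is no `.git` entry, as in a source tarball.

```python
from pathlib import Path


def should_verify_submodules(project_root: str) -> bool:
    """Return True only when project_root looks like a git checkout.

    A source tarball has no .git entry, so the SHA check is skipped
    quietly and the submodule sources are trusted as shipped.
    """
    # .git is a directory in a normal clone but a plain file in a linked
    # worktree, so test for existence rather than for a directory.
    return (Path(project_root) / ".git").exists()
```

A build would call this first and run the submodule SHA comparison only when it returns True.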


Re: Intra-project dependencies

2023-01-20 Thread Henrik Ingo
Thanks Mick and David. I've been following this silently for a few days
because we already exhausted my knowledge on the topic. But it seems your
collective knowledge is uncovering a nice solution.

If I summarize, I like all of this:

- link to SHA, not library version
- use git submodules because that's what they are meant to be used for/
it's standard
- use git hooks to automate the otherwise annoying ux of submodules
- use gradle to automate the installation of the hooks (note: imo must ask
user for explicit permission)
- whether or not the user installed the hooks, the build system should by
default check for, and refuse to run with, the wrong SHA in any submodule.
But allow overrides.
- The build system, source tarball etc should consider the submodules as
just being a directory in the source tree. Things should work the same
whether you are in a git checkout or source tarball.
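
As a rough illustration of the "install hooks, but only with explicit permission" point, here is a small build-tool-agnostic sketch in Python. The function name `install_git_hooks`, its parameters, and the idea of a directory holding the hook scripts are all assumptions made for this example; the real build would do the equivalent in its own tooling.

```python
import shutil
from pathlib import Path
from typing import List


def install_git_hooks(hooks_source: str, project_root: str,
                      user_agreed: bool) -> List[str]:
    """Copy hook scripts into .git/hooks and return the names installed.

    Nothing is copied unless the user agreed, and nothing is copied when
    .git is not a directory (for example in a source tarball).
    """
    git_dir = Path(project_root) / ".git"
    if not user_agreed or not git_dir.is_dir():
        return []
    target = git_dir / "hooks"
    target.mkdir(exist_ok=True)
    installed = []
    for hook in sorted(Path(hooks_source).iterdir()):
        if hook.is_file():
            destination = target / hook.name
            shutil.copy(hook, destination)
            destination.chmod(0o755)  # git only runs executable hooks
            installed.append(hook.name)
    return installed
```

How the permission is obtained (a prompt, a property, an environment variable) is left open; the sketch only shows that the answer gates the copy.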

Henrik

On Fri, 20 Jan 2023, 02:54 David Capwell,  wrote:

> Thanks for the reply, my replies are inline to your inline replies =D
>
> On Jan 19, 2023, at 2:39 PM, Mick Semb Wever  wrote:
>
>
> Thanks David for the detailed write up. Replies inline…
>
>
>
>> We tried in-tree for in-jvm dtest and found that this broke every other
>> commit… maintaining the APIs across all our supported branches was too hard
>> to do and moving it outside of the tree helped make the upgrade tests more
>> stable (there were breakage but less frequent)….
>>
>
>
> The in-jvm dtest-api library is unique in this way. I would not use it as
> reasoning that other libraries should not be in-tree.
>
>
> Fair, it's the only API we have that is required to be bytecode-compatible
> across versions/builds; this unique property may argue for different
> solutions than for others
>
>
>
>
>
>> We tried to do snapshot builds where the version contained the SHA, but
>> this has the issue that snapshot builds “may” go away over time and made
>> older SHAs no longer building…
>>
>
>
> Only keeping the last snapshot in repository.a.o is INFRA's policy (I've
> found out).
> We can ask INFRA to set up a separate snapshots repository just for us,
> with a longer expiry policy. I'd rather not create extra work for infra if
> there are other ways we can do this, and this approach would always require
> some fallback approach to rebuilding the dependency's SHA from scratch.
>
>
> If they will allow this and allow the snapshots to never be purged, then I
> am ok with this as a solution.
>
>
>
>
>
>> We break python-dtest when cross-cutting changes are added as CI is hard
>> to do correctly or not supported (testing downstream users (our 4 supported
>> branches) is rarely done).
>>
>
>
> python dtests are also in a different category (the context and
> consumption run in a different direction, i.e. it's not a library used
> in-tree).
>
>
> I disagree.  The point I was making is we have several dependencies and we
> should think about how we maintain them.  My point is still valid that
> python dtests are involved with cross cutting changes to Cassandra, and the
> lack of downstream testing has broken us several times.  The solution to
> this problem may be different than Accord (as C* doesn’t depend on python
> dtest as you point out), but that does not mean we shouldn’t think about it
> in this conversation….
>
> One thing that comes to mind is that dependencies may benefit from running
> a limited C* CI as part of their merge process.  At the moment people are
> expected to create a tmp CI branch for all 4 supported C* versions, point
> it to the python dtest change, then submit to the JIRA as proof that CI was
> run… normally when I find python dtest broke in branch X I find this had
> not happened…
>
> This holds true I believe for JVM dtest as well as we should be validating
> that the 4 target C* branches still work if you are touching jvm dtest…
>
> Now, with all that, Accord being external will have similar issues, a
> change there may break Cassandra so we should include a subset of Cassandra
> tests in Accord’s CI.
>
>
>
>
>> * [nice to have] be able to work with all subprojects in one IDE and not
>> have to switch between windows while making cross-cutting changes
>>
>
>
> Isn't it only IntelliJ that suffers this problem? (That doesn't invalidate
> it, just asking…)
>
>
> I have not used Eclipse or NetBeans for around 10 years so no clue!
>
>
>
>
>>  - same with `git pull …` (easy to be left with out-of-sync submodules)
>>
>>
>> Correct, if you use submodules/script you have a text file saying what we
>> “should” use, but this does not enforce actually using them… again we could
>> make sure build.xml does the right thing,
>>
>
>
> If we try this approach out, I'm definitely in favour of any build.xml
> command immediately failing if `git submodule status` != `git submodule
> status --cached`
>
>
> +1
>
>
>
>
>> but this can be confusing for people who mainly build in IDE and don’t
>> depend on build.xml until later in development… this is something we should
>> think about…
>>
>

Re: Intra-project dependencies

2023-01-20 Thread Mick Semb Wever
replies are inline to your inline replies to my inline replies



> We can ask INFRA to set up a separate snapshots repository just for us,
> with a longer expiry policy. I'd rather not create extra work for infra if
> there are other ways we can do this, and this approach would always require
> some fallback approach to rebuilding the dependency's SHA from scratch.
>
>
> If they will allow this and allow the snapshots to never be purged, then I
> am ok with this as a solution.
>


They will get purged eventually, and may get lost (no backups).


>
>
>> We break python-dtest when cross-cutting changes are added as CI is hard
>> to do correctly or not supported (testing downstream users (our 4 supported
>> branches) is rarely done).
>>
>
>
> python dtests are also in a different category (the context and
> consumption run in a different direction, i.e. it's not a library used
> in-tree).
>
>
> I disagree.  The point I was making is we have several dependencies and we
> should think about how we maintain them.  My point is still valid that
> python dtests are involved with cross cutting changes to Cassandra, and the
> lack of downstream testing has broken us several times.  The solution to
> this problem may be different than Accord (as C* doesn’t depend on python
> dtest as you point out), but that does not mean we shouldn’t think about it
> in this conversation….
>
> One thing that comes to mind is that dependencies may benefit from running
> a limited C* CI as part of their merge process.  At the moment people are
> expected to create a tmp CI branch for all 4 supported C* versions, point
> it to the python dtest change, then submit to the JIRA as proof that CI was
> run… normally when I find python dtest broke in branch X I find this had
> not happened…
>
> This holds true I believe for JVM dtest as well as we should be validating
> that the 4 target C* branches still work if you are touching jvm dtest…
>
> Now, with all that, Accord being external will have similar issues, a
> change there may break Cassandra so we should include a subset of Cassandra
> tests in Accord’s CI.
>


Fair enough, and this reasoning also applies to dtest-api. But this is an
additional concern in the discussion, with potentially different solutions.

Part of the testing requirements for dtests (and for libraries that are
included in-tree) is downstream CI.
When you make a change in cassandra-dtest, you shouldn't have to go test
the C* branches; that should be part of the CI pipeline for cassandra-dtest
itself.

For dtests the versions tested are explicit. It's different for libraries
that are included in-tree, but you have to make the change in-tree, so it
makes sense it's part of in-tree CI.



> * [nice to have] be able to work with all subprojects in one IDE and not
>> have to switch between windows while making cross-cutting changes
>>
> Isn't it only IntelliJ that suffers this problem? (That doesn't invalidate
> it, just asking…)
>
> I have not used Eclipse or NetBeans for around 10 years so no clue!
>
>
>
>> but this can be confusing for people who mainly build in IDE and don’t
>> depend on build.xml until later in development… this is something we should
>> think about…
>>
> Again, isn't this only IntelliJ?
>
> Not sure, the only other IDE we support is NetBeans and not sure what we
> do there.
>


Off-topic: NetBeans allows you to have many projects open in the one window
(easy to have 20-30 projects open), and it does not do anything with
sources its own way – everything is delegated to the project's build system
(ant/gradle/maven).


>> A project I am familiar with has their build auto-inject git hooks to make
>> sure things “just work”, we may be able to solve this in a similar way?
>>
>
> I'd like to hear/see more!
>
> The project wants to make sure commit messages are structured “correctly”
> so enforces this via git hooks.  Gradle (the build they use) makes tasks
> depend on “installGitHooks” which copies 2 hooks to .git/hooks (commit-msg,
> and pre-push)
>
> We could always do the same in build.xml, that copies hooks we define into
> .git/hooks to make sure the behaviors we expect are enforced.  See
> https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks
> 
>
> There is a “post-checkout” hook we could leverage to detect that the
> dependencies’ SHAs are no longer the same and recursively check out the
> “correct” dependencies
>


I like it! But I think we would need this AND a fail-fast in the build.xml
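
The post-checkout idea quoted above could look roughly like the sketch below. It is in Python purely for illustration, and the function name `submodule_update_command` is made up for this example. What it relies on is that git passes a post-checkout hook three arguments: the previous HEAD, the new HEAD, and a flag that is "1" for a branch checkout and "0" for a file checkout.

```python
from typing import List, Optional


def submodule_update_command(hook_args: List[str]) -> Optional[List[str]]:
    """Decide what a post-checkout hook should run, if anything.

    hook_args are the three arguments git hands to post-checkout:
    previous HEAD, new HEAD, and the branch-checkout flag.
    """
    if len(hook_args) != 3:
        return None  # not invoked the way git invokes the hook
    _previous_head, _new_head, branch_flag = hook_args
    if branch_flag != "1":
        return None  # a file checkout cannot move submodule pointers
    # Bring every submodule back to the SHA recorded by the new commit.
    return ["git", "submodule", "update", "--init", "--recursive"]
```

A hook script would pass its own command-line arguments to this function and execute the returned command when it is not None. Since a hook can be skipped or never installed, the build-side check is still needed as the backstop.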



>>  - our releases get more complicated (our source tarballs are the ASF
>> releases)
>>
>>
>> We don’t include our dependencies do we?  If so, then does it really?  If
>> Accord is a library we use, why would we include its source in the build?
>> Isn’t it just another library from this point of view?
>>
>
> The build of the source 

Re: Intra-project dependencies

2023-01-19 Thread David Capwell
Thanks for the reply, my replies are inline to your inline replies =D

> On Jan 19, 2023, at 2:39 PM, Mick Semb Wever  wrote:
> 
> 
> Thanks David for the detailed write up. Replies inline…
> 
>  
> We tried in-tree for in-jvm dtest and found that this broke every other 
> commit… maintaining the APIs across all our supported branches was too hard 
> to do and moving it outside of the tree helped make the upgrade tests more 
> stable (there were breakage but less frequent)…. 
> 
> 
> The in-jvm dtest-api library is unique in this way. I would not use it as 
> reasoning that other libraries should not be in-tree.

Fair, it’s the only API we have that is required to be bytecode-compatible 
across versions/builds; this unique property may argue for different solutions 
than for others

> 
> 
>  
> We tried to do snapshot builds where the version contained the SHA, but this 
> has the issue that snapshot builds “may” go away over time and made older 
> SHAs no longer building… 
>  
> 
> Only keeping the last snapshot in repository.a.o is INFRA's policy (I've 
> found out).
> We can ask INFRA to set up a separate snapshots repository just for us, with 
> a longer expiry policy. I'd rather not create extra work for infra if there 
> are other ways we can do this, and this approach would always require some 
> fallback approach to rebuilding the dependency's SHA from scratch.

If they will allow this and allow the snapshots to never be purged, then I am 
ok with this as a solution.

> 
> 
>  
> We break python-dtest when cross-cutting changes are added as CI is hard to 
> do correctly or not supported (testing downstream users (our 4 supported 
> branches) is rarely done).  
> 
> 
> python dtests are also in a different category (the context and consumption 
> run in a different direction, i.e. it's not a library used in-tree). 

I disagree.  The point I was making is we have several dependencies and we 
should think about how we maintain them.  My point is still valid that python 
dtests are involved with cross cutting changes to Cassandra, and the lack of 
downstream testing has broken us several times.  The solution to this problem 
may be different than Accord (as C* doesn’t depend on python dtest as you point 
out), but that does not mean we shouldn’t think about it in this conversation….

One thing that comes to mind is that dependencies may benefit from running a 
limited C* CI as part of their merge process.  At the moment people are 
expected to create a tmp CI branch for all 4 supported C* versions, point it to 
the python dtest change, then submit to the JIRA as proof that CI was run… 
normally when I find python dtest broke in branch X I find this had not 
happened… 

This holds true I believe for JVM dtest as well as we should be validating that 
the 4 target C* branches still work if you are touching jvm dtest…

Now, with all that, Accord being external will have similar issues, a change 
there may break Cassandra so we should include a subset of Cassandra tests in 
Accord’s CI.

> 
>  
> * [nice to have] be able to work with all subprojects in one IDE and not have 
> to switch between windows while making cross-cutting changes
> 
> 
> Isn't it only IntelliJ that suffers this problem? (That doesn't invalidate 
> it, just asking…)

I have not used Eclipse or NetBeans for around 10 years so no clue!  

> 
>   
>>  - same with `git pull …` (easy to be left with out-of-sync submodules)
> 
> Correct, if you use submodules/script you have a text file saying what we 
> “should” use, but this does not enforce actually using them… again we could 
> make sure build.xml does the right thing,
> 
> 
> If we try this approach out, I'm definitely in favour of any build.xml 
> command immediately failing if `git submodule status` != `git submodule 
> status --cached`

+1

> 
>  
> but this can be confusing for people who mainly build in IDE and don’t depend 
> on build.xml until later in development… this is something we should think 
> about…
> 
> 
> Again, isn't this only IntelliJ?

Not sure, the only other IDE we support is NetBeans and not sure what we do 
there. 

> 
>  
> A project I am familiar with has their build auto-inject git hooks to make 
> sure things “just work”, we may be able to solve this in a similar way?
> 
> 
> I'd like to hear/see more!

The project wants to make sure commit messages are structured “correctly” so 
enforces this via git hooks.  Gradle (the build they use) makes tasks depend on 
“installGitHooks” which copies 2 hooks to .git/hooks (commit-msg, and pre-push)


We could always do the same in build.xml, that copies hooks we define into 
.git/hooks to make sure the behaviors we expect are enforced.  See 
https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks 



There is a “post-checkout” hook we could leverage to detect that the 
dependencies’ SHAs are no longer the same and recursively check out the “correct” 

Re: Intra-project dependencies

2023-01-19 Thread Mick Semb Wever
Thanks David for the detailed write up. Replies inline…



> We tried in-tree for in-jvm dtest and found that this broke every other
> commit… maintaining the APIs across all our supported branches was too hard
> to do and moving it outside of the tree helped make the upgrade tests more
> stable (there were breakage but less frequent)….
>


The in-jvm dtest-api library is unique in this way. I would not use it as
reasoning that other libraries should not be in-tree.




> We tried to do snapshot builds where the version contained the SHA, but
> this has the issue that snapshot builds “may” go away over time and made
> older SHAs no longer building…
>


Only keeping the last snapshot in repository.a.o is INFRA's policy (I've
found out).
We can ask INFRA to set up a separate snapshots repository just for us,
with a longer expiry policy. I'd rather not create extra work for infra if
there are other ways we can do this, and this approach would always require
some fallback approach to rebuilding the dependency's SHA from scratch.




> We break python-dtest when cross-cutting changes are added as CI is hard
> to do correctly or not supported (testing downstream users (our 4 supported
> branches) is rarely done).
>


python dtests are also in a different category (the context and
consumption run in a different direction, i.e. it's not a library used
in-tree).



> * [nice to have] be able to work with all subprojects in one IDE and not
> have to switch between windows while making cross-cutting changes
>


Isn't it only IntelliJ that suffers this problem? (That doesn't invalidate
it, just asking…)



>  - same with `git pull …` (easy to be left with out-of-sync submodules)
>
>
> Correct, if you use submodules/script you have a text file saying what we
> “should” use, but this does not enforce actually using them… again we could
> make sure build.xml does the right thing,
>


If we try this approach out, I'm definitely in favour of any build.xml
command immediately failing if `git submodule status` != `git submodule
status --cached`
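
To make that comparison concrete, here is a small sketch that takes the text printed by the two commands and reports which submodules differ. It is Python for illustration only; the function names and the `modules/accord` path used in the usage note are made up. It assumes the usual output shape of `git submodule status`: an optional state flag (`-`, `+` or `U`), the SHA, then the path. Paths containing spaces are not handled.

```python
from typing import Dict, List, Tuple


def parse_submodule_status(output: str) -> Dict[str, Tuple[str, str]]:
    """Map submodule path -> (flag, sha) from `git submodule status` text."""
    entries = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # '-' means not initialized, '+' means the checked-out SHA
        # differs from the index, 'U' means merge conflicts.
        flag = line[0] if line[0] in "+-U" else " "
        sha, path = line.lstrip(" +-U").split()[:2]
        entries[path] = (flag, sha)
    return entries


def out_of_sync(checked_out: str, recorded: str) -> List[str]:
    """Return paths whose checked-out SHA is missing or not the recorded one."""
    current = parse_submodule_status(checked_out)
    expected = parse_submodule_status(recorded)
    stale = []
    for path, (_, wanted_sha) in expected.items():
        flag, actual_sha = current.get(path, ("-", ""))
        if flag == "-" or actual_sha != wanted_sha:
            stale.append(path)
    return sorted(stale)
```

Here `checked_out` would be the output of `git submodule status` and `recorded` the output of `git submodule status --cached`; a build step could fail fast whenever the returned list is non-empty, naming for instance `modules/accord` as the stale path.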



> but this can be confusing for people who mainly build in IDE and don’t
> depend on build.xml until later in development… this is something we should
> think about…
>


Again, isn't this only IntelliJ?



> A project I am familiar with has their build auto-inject git hooks to make
> sure things “just work”, we may be able to solve this in a similar way?
>


I'd like to hear/see more!

>  - permanence from a git SHA no longer exists
>
>
> Why is this?  The SHA points to other SHAs, so it is still immutable.  If
> we claim that pointing to other SHAs doesn’t count then why do library
> versions?  Both are immutable snapshots of code at a specific point in time?
>


This, and a number of the other points, is already resolved (submodules
are on fixed SHAs, not a floating HEAD).

>  - our releases get more complicated (our source tarballs are the ASF
> releases)
>
>
> We don’t include our dependencies do we?  If so, then does it really?  If
> Accord is a library we use, why would we include its source in the build?
> Isn’t it just another library from this point of view?
>


The build of the source tarball must work. If the source tarball release
switches how it does things, from building the submodule to including a
dependency then we're back to having to make releases (and introducing a
risk, and we don't ourselves work frequently with the source tarballs).




> switching between branches,
>
>
> This is a pain point that I feel should be handled by git hooks.  We have
> this issue when you try to reuse the same directory for different release
> branches, and it’s super annoying when you go back in time to when jars were
> in-tree, as you need to clean up after switching back…. I do agree that we
> should flesh this out as it’s going to be the common case, so how we “do
> the right thing” should be handled
>


+1



> Rather that you need to know in advance when the SHA is not HEAD.
>
>
> Do you?  Or do you really need to know which “branch” it is following?
> For example, lets say we release 5.0 then 5.1 then 5.2, and there are
> accord versions for each: 1.0, 1.2, 2.0… do we not need to really know
> which branch it is following, and only when you are trying to do a
> cross-cutting change?
>

I'm still a little confused here. If a submodule is following a branch, is
that floating? Then a parent SHA isn't fixed to a submodule SHA?

Say trunk is using accord:a12 where a12 is a SHA on its trunk. Other
non-cassandra people using accord make commits, but our in-tree trunk isn't
moved forward. Then someone in-tree does some dev that touches accord, they
work away but late in the dev cycle find out that in-tree trunk isn't on
the latest accord trunk and there's a conflict rebasing their work onto the
latest accord. Is this an accurate description?

Hope that all makes sense.


Re: Intra-project dependencies

2023-01-18 Thread Benedict
If we make sure all branches are using the latest “stable” accord then this
is 6 commits (4 for C*, 1 for accord the stable branch, then 1 to merge into
trunk)

If we’re modifying stable, we only need one commit per C* branch per release.
We don’t need to immediately point C* to it. So there could plausibly be far
fewer total commits this way, though the reality is hard to predict and will
vary.

On 18 Jan 2023, at 20:45, David Capwell wrote:

> Been out, sorry for just catching up now…
>
> I feel this thread pigeonholed on the word Accord and ignored the fact that
> we are dealing with this pain today with python/jvm dtest, and trying to
> improve that would help the project…. We also have other related projects
> that we are developing in parallel to Cassandra such as Harry, and there is
> interest in exporting our utils + simulator for other projects to use…. We
> also depend on related projects such as JAMM which block us from bumping
> JDK versions...
>
> Accord is just 1 example of a Cassandra dependency needed for a release… by
> only focusing on Accord and “should it be external” this thread is ignoring
> the pain we face today and how we could improve.
>
> We tried in-tree for in-jvm dtest and found that this broke every other
> commit… maintaining the APIs across all our supported branches was too hard
> to do and moving it outside of the tree helped make the upgrade tests more
> stable (there was breakage, but less frequently)…. We currently have to
> release this for every patch, which has actually caused us to rely on class
> path ordering to have some branches fork the classes so they can avoid
> this….  We tried to do snapshot builds where the version contained the SHA,
> but this has the issue that snapshot builds “may” go away over time, which
> left older SHAs no longer buildable… Jvm-dtest is in bad shape and really
> could benefit from us looking to improve this process…
>
> We break python-dtest when cross-cutting changes are added as CI is hard to
> do correctly or not supported (testing downstream users (our 4 supported
> branches) is rarely done).
>
> We want to start using Harry as part of our test suite, so if a patch needs
> to change harry then what “should” we do? Do we block merging into
> Cassandra until we vote on a Harry release?
>
> Maybe we should be asking what capabilities we need and how to address
> each?  I believe Mick has focused on this capabilities conversation and
> feel it’s 100% the best route to take; we should be listing out what we
> need to do our work and if/how the different solutions address this.
>
> For me I need the following:
>
> * be able to make cross-cutting changes in 1 ticket
> ** in my PR override CI to use my PRs for sub-projects
> * commits to Cassandra should be reproducible and buildable
> * downstream testing support… if we make a change to python-dtest or Harry
> we should know if this breaks Cassandra before merging and which supported
> branches
> * [nice to have] be able to work with all subprojects in one IDE and not
> have to switch between windows while making cross-cutting changes
> * [nice to have] the commit process understands dependencies and commits
> things in the correct order
>
> Now, for the “how”, I am open but see the two leading cases are: git
> submodule and script that mimics git submodules…. I have used other tools
> that boil down to fetching a list of repo/sha into specific directories and
> find them more annoying than git submodules…
>
> For me, both ways address my needs above; I can make cross-cutting changes
> with ease and could change CI to build my changes rather than the HEAD of a
> specific branch.
>
> To address Mick’s capabilities I think I saw the following (correct me if
> missing any):
>
>>  - you can no longer just `git clone …`  (and we clone automatically in a
>> number of places)
>
> With either submodules or a script, that no longer works, but we can make
> this less painful by enhancing build.xml to make sure it builds out of the
> gate; we can’t see all the code on a fresh clone, but we would still be
> buildable
>
>>  - same with `git pull …` (easy to be left with out-of-sync submodules)
>
> Correct, if you use submodules/script you have a text file saying what we
> “should” use, but this does not enforce actually using them… again we could
> make sure build.xml does the right thing, but this can be confusing for
> people who mainly build in IDE and don’t depend on build.xml until later in
> development… this is something we should think about…
>
> A project I am familiar with has their build auto-inject git hooks to make
> sure things “just work”, we may be able to solve this in a similar way?
>
>>  - permanence from a git SHA no longer exists
>
> Why is this?  The SHA points to other SHAs, so it is still immutable.  If
> we claim that pointing to other SHAs doesn’t count then why do library
> versions?  Both are immutable snapshots of code at a specific point in
> time?
>
>>  - our releases get more complicated (our source tarballs are the ASF
>> releases)
>
> We don’t include our dependencies do we?  If so, then does it really?  If
> Accord is a library we use, why would we include its source in the build?
> Isn’t it just another library from this 

Re: Intra-project dependencies

2023-01-18 Thread David Capwell
Been out, sorry for just catching up now…

I feel this thread pigeonholed on the word Accord and ignored the fact that we 
are dealing with this pain today with python/jvm dtest, and trying to improve 
that would help the project…. We also have other related projects that we are 
developing in parallel to Cassandra such as Harry, and there is interest in 
exporting our utils + simulator for other projects to use…. We also depend on 
related projects such as JAMM which block us from bumping JDK versions...

Accord is just 1 example of a Cassandra dependency needed for a release… by 
only focusing on Accord and “should it be external” this thread is ignoring the 
pain we face today and how we could improve.

We tried in-tree for in-jvm dtest and found that this broke every other commit… 
maintaining the APIs across all our supported branches was too hard to do and 
moving it outside of the tree helped make the upgrade tests more stable (there 
was breakage, but less frequently)…. We currently have to release this for 
every patch, which has actually caused us to rely on class path ordering to 
have some branches fork the classes so they can avoid this….  We tried to do 
snapshot builds where the version contained the SHA, but this has the issue 
that snapshot builds “may” go away over time, which left older SHAs no longer 
buildable… Jvm-dtest is in bad shape and really could benefit from us looking 
to improve this process…

We break python-dtest when cross-cutting changes are added as CI is hard to do 
correctly or not supported (testing downstream users (our 4 supported branches) 
is rarely done).  

We want to start using Harry as part of our test suite, so if a patch needs to 
change harry then what “should” we do? Do we block merging into Cassandra until 
we vote on a Harry release?

Maybe we should be asking what capabilities we need and how to address each?  I 
believe Mick has focused on this capabilities conversation and feel it’s 100% 
the best route to take; we should be listing out what we need to do our work 
and if/how the different solutions address this.

For me I need the following:

* be able to make cross-cutting changes in 1 ticket
** in my PR override CI to use my PRs for sub-projects
* commits to Cassandra should be reproducible and buildable
* downstream testing support… if we make a change to python-dtest or Harry we 
should know if this breaks Cassandra before merging and which supported branches
* [nice to have] be able to work with all subprojects in one IDE and not have 
to switch between windows while making cross-cutting changes
* [nice to have] the commit process understands dependencies and commits things 
in the correct order

Now, for the “how”, I am open, but I see the two leading options as: git 
submodules, and a script that mimics git submodules…. I have used other tools 
that boil down to fetching a list of repo/sha into specific directories and find 
them more annoying than git submodules…
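
To make the "script that mimics git submodules" option concrete, here is a minimal sketch of a tool that reads a list of repo/sha pins and works out the git commands needed to place each one in its directory. The pin-file format (one "path repo sha" triple per line), the names `Pin`, `parse_pins` and `fetch_commands`, and the example path and URL are all assumptions made for illustration; nothing here is taken from the patch discussed in this thread.

```python
from typing import List, NamedTuple


class Pin(NamedTuple):
    path: str  # directory the dependency is checked out into
    repo: str  # clone URL of the dependency
    sha: str   # exact commit the parent project expects


def parse_pins(text: str) -> List[Pin]:
    """Parse 'path repo sha' lines; blank lines and '#' comment lines are skipped."""
    pins = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        path, repo, sha = line.split()
        pins.append(Pin(path, repo, sha))
    return pins


def fetch_commands(pin: Pin) -> List[List[str]]:
    """Git commands that would place the pinned commit at pin.path (not executed here)."""
    return [
        ["git", "clone", pin.repo, pin.path],
        ["git", "-C", pin.path, "checkout", "--detach", pin.sha],
    ]


if __name__ == "__main__":
    # Hypothetical pin file content.
    example = "# path repo sha\nmodules/accord https://example.org/accord.git " + "a" * 40
    for pin in parse_pins(example):
        for command in fetch_commands(pin):
            print(" ".join(command))
```

The sketch only prints the commands rather than running them, so it stays side-effect free; a real script would also need to handle a directory that already exists.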

For me, both ways address my needs above; I can make cross-cutting changes with 
ease and could change CI to build my changes rather than the HEAD of a specific 
branch.

To address Mick’s capabilities I think I saw the following (correct me if 
missing any):

>  - you can no longer just `git clone …`  (and we clone automatically in a 
> number of places)

With both submodules and a script, that no longer works, but we can make this 
less painful by enhancing build.xml to make sure it builds out of the gate; we 
can’t see all the code on a fresh checkout, but we would still be buildable

>  - same with `git pull …` (easy to be left with out-of-sync submodules)

Correct, if you use submodules/script you have a text file saying what we 
“should” use, but this does not enforce actually using them… again we could 
make sure build.xml does the right thing, but this can be confusing for people 
who mainly build in IDE and don’t depend on build.xml until later in 
development… this is something we should think about…

A project I am familiar with has their build auto-inject git hooks to make sure 
things “just work”, we may be able to solve this in a similar way?
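
As a rough illustration of the kind of check a build step or git hook could run, the sketch below parses the output of `git submodule status`, where a leading `-` marks an uninitialized submodule, `+` marks a checkout that differs from the recorded SHA, and `U` marks merge conflicts. The function names and the decision to return an empty list when there is no `.git` entry (for example in a source tarball) are assumptions for this example, not the behaviour of any existing Cassandra build target.

```python
import subprocess
from pathlib import Path
from typing import List


def out_of_sync(status_output: str) -> List[str]:
    """Return paths of submodules that are uninitialized, modified or conflicted."""
    problems = []
    for line in status_output.splitlines():
        if not line:
            continue
        flag, rest = line[0], line[1:]
        if flag in "-+U":
            # rest looks like "<sha> <path>" optionally followed by " (<describe>)"
            problems.append(rest.split()[1])
    return problems


def check_submodules(repo_dir: str) -> List[str]:
    """Check a working tree; stay quiet when it is not a git checkout."""
    if not (Path(repo_dir) / ".git").exists():
        return []  # e.g. a source tarball: nothing to verify against
    result = subprocess.run(
        ["git", "-C", repo_dir, "submodule", "status"],
        capture_output=True, text=True, check=True,
    )
    return out_of_sync(result.stdout)
```

A build could fail fast when `check_submodules(".")` returns a non-empty list, which would cover people who pull without updating submodules.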

>  - permanence from a git SHA no longer exists

Why is this?  The SHA points to other SHAs, so it is still immutable.  If we 
claim that pointing to other SHAs doesn’t count then why do library versions?  
Both are immutable snapshots of code at a specific point in time?
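
The point that "the SHA points to other SHAs" can be shown with a small sketch of how git hashes a tree object: a submodule is stored as an entry with mode 160000 whose value is the submodule commit's SHA, so changing the pointer changes the tree hash and therefore the parent commit's SHA. This is a simplified illustration (entries are sorted by plain name, which is not git's full ordering rule), and the entry names and SHAs are made up.

```python
import hashlib
from typing import List, Tuple


def tree_sha(entries: List[Tuple[str, str, str]]) -> str:
    """SHA-1 of a git tree object built from (mode, name, hex_sha) entries.

    Simplification: entries are ordered by plain name, which matches git
    only for simple cases like the one below.
    """
    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(sha)
        for mode, name, sha in sorted(entries, key=lambda e: e[1])
    )
    header = f"tree {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    build_xml = ("100644", "build.xml", "1" * 40)  # a regular file (made-up SHA)
    before = tree_sha([build_xml, ("160000", "accord", "a" * 40)])
    after = tree_sha([build_xml, ("160000", "accord", "b" * 40)])
    # Moving the submodule pointer yields a different tree, hence a different commit.
    print(before != after)
```

In other words, a parent commit cannot silently start referring to different submodule code without its own SHA changing.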

>  - our releases get more complicated (our source tarballs are the asf 
> releases)

We don’t include our dependencies, do we?  If so, then does it really?  If 
Accord is a library we use, why would we include its source in the build?  
Isn’t it just another library from this point of view?

>  - handling patches cover submodules

I don’t know what you mean by this, do you mean how do we submit cross-cutting 
patches?  How I do this in the cep-15-accord branch is by updating the pointer 
to point to my dependency PR, that way the build “does the right thing”, I just 
have to fix this up before merging into 

Re: Intra-project dependencies

2023-01-18 Thread Benedict
> Linking or merging while it is still also being a separate library and repo.

I am still unclear why you think this is “a significant thing”?

> On 18 Jan 2023, at 10:41, Mick Semb Wever  wrote:
> 
> 
> 
> 
>>> You would reference the snapshot dependency by the timestamped snapshot. 
>>> This makes it a reproducible build.
>> 
>> How confident are we that the repository will not alter or delete them?
> 
> 
> They cannot be altered.
> 
> I see artefacts there that are more than a decade old. But we cannot rely on 
> their permanence. 
> 
> Putting the SHA into the jar's manifest is easy.  And this blog post shows 
> how you can also expose this info on the command line: 
> https://medium.com/liveramp-engineering/identifying-maven-snapshot-artifacts-by-git-revision-15b860d6228b
>  
> 
> Given there's no guaranteed permanence to the snapshots, we would need to 
> have the git sha in the version, so if much older versions can't be 
> downloaded it can still be rebuilt.
> 
> This is done like: 1.0.0_${sha1}-SNAPSHOT
> 
>  
>>> linking in the source code into in-tree is a significant thing to do
>> 
>> Could you explain why? I thought your preferred alternative was merging the 
>> source trees permanently
> 
> 
> Linking or merging while it is still also being a separate library and repo.
> If we are really not that interested in it as a separate library, and dev 
> change is high, or the code is somewhere less accessible, then in tree makes 
> sense IMHO.
> 


Re: Intra-project dependencies

2023-01-18 Thread Mick Semb Wever
You would reference the snapshot dependency by the timestamped snapshot.
> This makes it a reproducible build.
>
>
> How confident are we that the repository will not alter or delete them?
>


They cannot be altered.

I see artefacts there that are more than a decade old. But we cannot rely
on their permanence.

Putting the SHA into the jar's manifest is easy.  And this blog post shows
how you can also expose this info on the command line:
https://medium.com/liveramp-engineering/identifying-maven-snapshot-artifacts-by-git-revision-15b860d6228b


Given there's no guaranteed permanence to the snapshots, we would need to
have the git sha in the version, so if much older versions can't be
downloaded it can still be rebuilt.

This is done like: 1.0.0_${sha1}-SNAPSHOT
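
A small sketch of the version scheme above: composing a `1.0.0_${sha1}-SNAPSHOT` style version and recovering the SHA from it again, so that an old build can be rebuilt from source if the snapshot artifact is gone. The helper names and the accepted SHA length (7 to 40 hex characters) are assumptions for this example.

```python
import re
from typing import Optional

# <base version>_<git sha>-SNAPSHOT, e.g. 1.0.0_abc1234-SNAPSHOT
_VERSION = re.compile(r"^(?P<base>\d+(?:\.\d+)*)_(?P<sha>[0-9a-f]{7,40})-SNAPSHOT$")


def snapshot_version(base: str, sha: str) -> str:
    """Build a snapshot version string that embeds the git SHA."""
    return f"{base}_{sha}-SNAPSHOT"


def sha_from_version(version: str) -> Optional[str]:
    """Return the embedded SHA, or None if the version does not follow the scheme."""
    match = _VERSION.match(version)
    return match.group("sha") if match else None
```

With this, `sha_from_version(snapshot_version("1.0.0", "abc1234"))` gives back `"abc1234"`, which is the commit to check out when the artifact can no longer be downloaded.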



> linking in the source code into in-tree is a significant thing to do
>
>
> Could you explain why? I thought your preferred alternative was merging
> the source trees permanently
>


Linking or merging while it is still also being a separate library and repo.
If we are really not that interested in it as a separate library, and dev
change is high, or the code is somewhere less accessible, then in tree
makes sense IMHO.


Re: Intra-project dependencies

2023-01-17 Thread Henrik Ingo
On Tue, Jan 17, 2023 at 11:40 PM Mick Semb Wever  wrote:

>
>> It introduces some overhead when bisecting to go from the snapshot's
> timestamp to the other repo's SHA (this is easily solvable by putting the
> SHA inside the jarfile).
>

Whatever system we choose, the link should be the SHA. It shouldn't be
necessary for a human to lookup the necessary parameters based on some
mapping to other parameters.

A basic first order requirement: Restarting an old build in Jenkins should
rerun the exact same version of all modules.

More complex requirement: Starting a Jenkins build from cassandra commit
123abc, should checkout/download/use the correct versions of  other modules
at the time 123abc was committed.
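
For the submodule option, the more complex requirement could be met by asking git which submodule commit a given parent commit recorded. The sketch below builds a `git ls-tree` command and parses one line of its output, which has the form `<mode> <type> <sha><TAB><path>` and uses mode 160000 and type `commit` for submodules. The function names and the example path `modules/accord` are hypothetical.

```python
from typing import List, Optional


def ls_tree_command(commit: str, path: str) -> List[str]:
    """Command that lists the tree entry for `path` as of `commit`."""
    return ["git", "ls-tree", commit, path]


def submodule_sha(ls_tree_line: str) -> Optional[str]:
    """Return the recorded submodule SHA, or None if the entry is not a submodule."""
    meta, _, _path = ls_tree_line.partition("\t")
    mode, obj_type, sha = meta.split()
    if mode == "160000" and obj_type == "commit":
        return sha
    return None


if __name__ == "__main__":
    print(" ".join(ls_tree_command("123abc", "modules/accord")))
    print(submodule_sha("160000 commit " + "a" * 40 + "\tmodules/accord"))
```

A CI job started from commit 123abc could use the returned SHA to check out the matching version of the other module, with no human lookup involved.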

henrik




Re: Intra-project dependencies

2023-01-17 Thread Benedict
> You would reference the snapshot dependency by the timestamped snapshot. This 
> makes it a reproducible build.

How confident are we that the repository will not alter or delete them?

> linking in the source code into in-tree is a significant thing to do

Could you explain why? I thought your preferred alternative was merging the 
source trees permanently

> On 17 Jan 2023, at 21:40, Mick Semb Wever  wrote:
> 
> 
>> Regarding the use of snapshots, this isn’t impossible Henrik - I floated 
>> this as an option. But besides the additional overhead during development, 
>> this does not maintain reproducible builds, as the snapshot can change. 
> 
> You would reference the snapshot dependency by the timestamped snapshot. This 
> makes it a reproducible build.
> 
> We have done this with dtest-api already, and there's already a comment 
> explaining it:
> https://github.com/apache/cassandra/blob/trunk/.build/build-resolver.xml#L59-L60
>  
> 
> It introduces some overhead when bisecting to go from the snapshot's 
> timestamp to the other repo's SHA (this is easily solvable by putting the SHA 
> inside the jarfile).
> 
> I don't see the problem of letting trunk use snapshots during the annual 
> development cycle, if we accept the overhead of cutting all library releases 
> before we cut the first alpha/beta.
> 
> FTR, i'm sitting on the fence between this and submodules. There's many dev 
> tasks we do, and different approaches have different pain points. The amount 
> of dev happening in the library also matters. I also agree with Derek that 
> linking in the source code into in-tree is a significant thing to do, just to 
> avoid the rigamaroles of dependency management.
> 
> Josh, bundling releases gets tricky in that you need to include the library 
> sources, because the cassandra release is essentially being voted on (because 
> it has been built) with non-released dependencies.


Re: Intra-project dependencies

2023-01-17 Thread Josh McKenzie
> Josh, bundling releases gets tricky in that you need to include the library 
> sources, because the cassandra release is essentially being voted on (because 
> it has been built) with non-released dependencies.
Arguably, one shouldn't vote on a release of Accord unless there's something 
that's integrated it and shown it's working. Through that lens it doesn't make 
sense to release those dependencies w/out the parent, nor the parent without 
the dependency.

Not a hill I'm willing to die on but at least out of the gate, seems like a way 
we could streamline the process of cutting releases until someone / something 
external starts exerting influence on Accord.

On Tue, Jan 17, 2023, at 4:39 PM, Mick Semb Wever wrote:
>> Regarding the use of snapshots, this isn’t impossible Henrik - I floated 
>> this as an option. But besides the additional overhead during development, 
>> this does not maintain reproducible builds, as the snapshot can change. 
> 
> 
> You would reference the snapshot dependency by the timestamped snapshot. This 
> makes it a reproducible build.
> 
> We have done this with dtest-api already, and there's already a comment 
> explaining it:
> https://github.com/apache/cassandra/blob/trunk/.build/build-resolver.xml#L59-L60
>  
> 
> It introduces some overhead when bisecting to go from the snapshot's 
> timestamp to the other repo's SHA (this is easily solvable by putting the SHA 
> inside the jarfile).
> 
> I don't see the problem of letting trunk use snapshots during the annual 
> development cycle, if we accept the overhead of cutting all library releases 
> before we cut the first alpha/beta.
> 
> FTR, i'm sitting on the fence between this and submodules. There's many dev 
> tasks we do, and different approaches have different pain points. The amount 
> of dev happening in the library also matters. I also agree with Derek that 
> linking in the source code into in-tree is a significant thing to do, just to 
> avoid the rigamaroles of dependency management.
> 
> Josh, bundling releases gets tricky in that you need to include the library 
> sources, because the cassandra release is essentially being voted on (because 
> it has been built) with non-released dependencies.


Re: Intra-project dependencies

2023-01-17 Thread Mick Semb Wever
>
> Regarding the use of snapshots, this isn’t impossible Henrik - I floated
> this as an option. But besides the additional overhead during development,
> this does not maintain reproducible builds, as the snapshot can change.
>

You would reference the snapshot dependency by the timestamped snapshot.
This makes it a reproducible build.

We have done this with dtest-api already, and there's already a comment
explaining it:
https://github.com/apache/cassandra/blob/trunk/.build/build-resolver.xml#L59-L60


It introduces some overhead when bisecting to go from the snapshot's
timestamp to the other repo's SHA (this is easily solvable by putting the
SHA inside the jarfile).
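
As a sketch of how a SHA stored inside the jar could be read back when bisecting, the code below opens a jar, reads the main section of `META-INF/MANIFEST.MF` and returns its attributes. The attribute name `Git-SHA` used in the usage note is an assumption; the thread does not say which manifest key would be used.

```python
import zipfile
from typing import Dict, Optional


def main_manifest_attributes(jar_path: str) -> Dict[str, str]:
    """Read the main-section attributes of a jar's manifest."""
    with zipfile.ZipFile(jar_path) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    attrs: Dict[str, str] = {}
    key: Optional[str] = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and key is not None:
            attrs[key] += line[1:]  # continuation of a wrapped value
        elif ": " in line:
            key, value = line.split(": ", 1)
            attrs[key] = value
    return attrs
```

Assuming the build wrote a `Git-SHA` attribute, `main_manifest_attributes("lib/some-dep.jar").get("Git-SHA")` would return the commit to bisect against, or `None` if the attribute is absent.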

I don't see the problem of letting trunk use snapshots during the annual
development cycle, if we accept the overhead of cutting all library
releases before we cut the first alpha/beta.

FTR, I'm sitting on the fence between this and submodules. There are many dev
tasks we do, and different approaches have different pain points. The
amount of dev happening in the library also matters. I also agree with
Derek that linking in the source code into in-tree is a significant thing
to do, just to avoid the rigmarole of dependency management.

Josh, bundling releases gets tricky in that you need to include the library
sources, because the cassandra release is essentially being voted on
(because it has been built) with non-released dependencies.


Re: Intra-project dependencies

2023-01-17 Thread Benedict
I am certainly not proposing any certainty about outside interest, but I think as the only full implementation of a leaderless protocol in existence, as well as an open source pluggable distributed transaction protocol, the chance of some interest is not vanishingly small (once it is proven in Cassandra). Whether that will translate to any useful interest is even less certain, but I think it would be valuable if it transpires.

It’s also not the only reason to develop this project as a library, and wasn’t one of the reasons given when we discussed this when the proposal was discussed initially. The main reasons were ensuring that the transaction system in Cassandra was not tied to Accord, and to allow better testing in isolation.

Regarding the use of snapshots, this isn’t impossible Henrik - I floated this as an option. But besides the additional overhead during development, this does not maintain reproducible builds, as the snapshot can change. Perhaps we could introduce permanent snapshots for a given SHA, but at that point it seems to just make more sense to use submodules. Including snapshots inside the lib directory until release would seem to be fine, and then perform parallel release votes.

On 17 Jan 2023, at 21:04, Henrik Ingo  wrote:

Hi Derek

Somewhat of a newcomer myself, it seems the answers to your excellent questions are:

 * We don't all agree with the premise that Accord will attract substantial outside interest. Even so, my personal opinion is that whether that happens or not, it's not wrong to aspire toward or plan for such a future.

 * Yes, just using Accord as a library dependency would be the normal thing to do, but that introduces a need to create Accord releases to match Cassandra releases. Since ASF mandates a 3 day voting process to release software artifacts, this creates a lot of bureaucratic overhead, which is why this otherwise sane alternative is nobody's favorite. (Cassandra releases cannot or should not depend on snapshot releases of libraries.)

 * So we are discussing various alternatives that keep Accord separate, while at the same time recording some link about which exact version of Accord was checked out.

henrik

On Tue, Jan 17, 2023 at 7:23 PM Derek Chen-Becker  wrote:

Actually, re-reading the thread, I think I missed the initial point
brought up and got lost in the discussion specific to submodules. What
is the technical reason for bringing Accord in-tree? While I think
submodules are the best way to include source in-tree, I'm not sure
this is actually the correct thing to do in this case. Don't we
already have mechanisms to deal with snapshot versions of library
dependencies in the build? Do we need release votes for snapshots?

Thanks,

Derek

On Tue, Jan 17, 2023 at 9:25 AM Derek Chen-Becker  wrote:
>
> I'd like to go back to Benedict's initial point: if we have a new
> consensus protocol that other projects would potentially be interested
> in, then by all means it should be its own project. Let's start with
> that as a basis for discussion, because from my reading it seems like
> people might be disagreeing with that initial premise.
>
> If we agree that Accord should be independent, I'm +1 for git
> submodules primarily because that's a standard way of doing things and
> I don't think we need yet another bespoke solution to a problem that
> hundreds, if not thousands of other software projects encounter. I've
> worked with lots of projects using submodules and while they're not a
> panacea, they've never been a significant problem to work with.
>
> It's also a little confusing to see people argue about HEAD in
> relation to any of this, since that's just an alias to the latest
> commit for a given branch. In every project I've worked with that uses
> submodules you would never use HEAD, because the submodule itself
> already records the *exact* commit associated with the parent.
>
> Cheers,
>
> Derek
>
> On Tue, Jan 17, 2023 at 2:28 AM Benedict  wrote:
> >
> > The answer to all your questions is “like any other library” - this is a procedural hack to ease development. There are alternative isomorphic hacks, like compiling source jars from Accord and including them in the C* tree, if it helps your mental model.
> >
> > > you stated that a goal was to avoid maintaining multiple branches.
> >
> > No, I stated that a goal was to *decouple* development of Accord from C*. I don’t see why you would take that to mean there are no branches of Accord, as that would quite clearly be incompatible with the C* release strategy.
> >
> >
> >
> > On 17 Jan 2023, at 07:36, Mick Semb Wever  wrote:
> >
> > 
> >>
> >> … extrapolating this experience to multiple C* versions
> >
> >
> > To include forward-merging, bisecting old history, etc etc. that's a leap of faith that I believe deserves the discussion.
> >
> >> - patches are off submodule SHAs, not the submodule's HEAD,
> >>
> >>
> >> A SHA would point to the 

Re: Intra-project dependencies

2023-01-17 Thread Josh McKenzie
Is there any reason we couldn't "bundle" a release vote to include both an 
Accord release and ASF C* in one voting round as a combined release? My reading 
of the release process w/the ASF doesn't speak to that (if anything it implies 
this might be a valid approach):

https://www.apache.org/legal/release-policy.html#release-approval

> Every ASF release MUST contain one or more source packages,

On Tue, Jan 17, 2023, at 4:03 PM, Henrik Ingo wrote:
> Hi Derek
> 
> Somewhat of a newcomer myself, it seems the answers to your excellent 
> questions are:
> 
>  * We don't all agree with the premise that Accord will attract substantial 
> outside interest. Even so, my personal opinion is that whether that happens 
> or not, it's not wrong to aspire toward or plan for such a future.
> 
>  * Yes, just using Accord as a library dependency would be the normal thing 
> to do, but that introduces a need to create Accord releases to match 
> Cassandra releases. Since ASF mandates a 3 day voting process to release 
> software artifacts, this creates a lot of bureaucratic overhead, which is why 
> this otherwise sane alternative is nobody's favorite. (Cassandra releases 
> cannot or should not depend on snapshot releases of libraries.)
> 
>  * So we are discussing various alternatives that keep Accord separate, while 
> at the same time recording some link about which exact version of Accord was 
> checked out.
> 
> henrik
> 
> On Tue, Jan 17, 2023 at 7:23 PM Derek Chen-Becker  
> wrote:
>> Actually, re-reading the thread, I think I missed the initial point
>> brought up and got lost in the discussion specific to submodules. What
>> is the technical reason for bringing Accord in-tree? While I think
>> submodules are the best way to include source in-tree, I'm not sure
>> this is actually the correct thing to do in this case. Don't we
>> already have mechanisms to deal with snapshot versions of library
>> dependencies in the build? Do we need release votes for snapshots?
>> 
>> Thanks,
>> 
>> Derek
>> 
>> On Tue, Jan 17, 2023 at 9:25 AM Derek Chen-Becker  
>> wrote:
>> >
>> > I'd like to go back to Benedict's initial point: if we have a new
>> > consensus protocol that other projects would potentially be interested
>> > in, then by all means it should be its own project. Let's start with
>> > that as a basis for discussion, because from my reading it seems like
>> > people might be disagreeing with that initial premise.
>> >
>> > If we agree that Accord should be independent, I'm +1 for git
>> > submodules primarily because that's a standard way of doing things and
>> > I don't think we need yet another bespoke solution to a problem that
>> > hundreds, if not thousands of other software projects encounter. I've
>> > worked with lots of projects using submodules and while they're not a
>> > panacea, they've never been a significant problem to work with.
>> >
>> > It's also a little confusing to see people argue about HEAD in
>> > relation to any of this, since that's just an alias to the latest
>> > commit for a given branch. In every project I've worked with that uses
>> > submodules you would never use HEAD, because the submodule itself
>> > already records the *exact* commit associated with the parent.
>> >
>> > Cheers,
>> >
>> > Derek
>> >
>> > On Tue, Jan 17, 2023 at 2:28 AM Benedict  wrote:
>> > >
>> > > The answer to all your questions is “like any other library” - this is a 
>> > > procedural hack to ease development. There are alternative isomorphic 
>> > > hacks, like compiling source jars from Accord and including them in the 
>> > > C* tree, if it helps your mental model.
>> > >
>> > > > you stated that a goal was to avoid maintaining multiple branches.
>> > >
>> > > No, I stated that a goal was to *decouple* development of Accord from 
>> > > C*. I don’t see why you would take that to mean there are no branches of 
>> > > Accord, as that would quite clearly be incompatible with the C* release 
>> > > strategy.
>> > >
>> > >
>> > >
>> > > On 17 Jan 2023, at 07:36, Mick Semb Wever  wrote:
>> > >
>> > > 
>> > >>
>> > >> … extrapolating this experience to multiple C* versions
>> > >
>> > >
>> > > To include forward-merging, bisecting old history, etc etc. that's a 
>> > > leap of faith that I believe deserves the discussion.
>> > >
>> > >> - patches are off submodule SHAs, not the submodule's HEAD,
>> > >>
>> > >>
>> > >> A SHA would point to the HEAD of a given branch, at the time of merge, 
>> > >> just by SHA? I’ve no idea what you imagine here, but this just ensures 
>> > >> that a given SHA of the importing project continues to compile 
>> > >> correctly when it is no longer HEAD. It does not mean there’s no HEAD 
>> > >> that corresponds directly to the SHA of the importing project’s HEAD.
>> > >
>> > >
>> > >
>> > > That wasn't my concern. Rather that you need to know in advance when the 
>> > > SHA is not HEAD. You can't commit off a past SHA. Once you find out (and 
>> > > how does this happen?) that 

Re: Intra-project dependencies

2023-01-17 Thread Henrik Ingo
Hi Derek

Somewhat of a newcomer myself, it seems the answers to your excellent
questions are:

 * We don't all agree with the premise that Accord will attract substantial
outside interest. Even so, my personal opinion is that whether that happens
or not, it's not wrong to aspire toward or plan for such a future.

 * Yes, just using Accord as a library dependency would be the normal thing
to do, but that introduces a need to create Accord releases to match
Cassandra releases. Since ASF mandates a 3 day voting process to release
software artifacts, this creates a lot of bureaucratic overhead, which is
why this otherwise sane alternative is nobody's favorite. (Cassandra
releases cannot or should not depend on snapshot releases of libraries.)

 * So we are discussing various alternatives that keep Accord separate,
while at the same time recording some link about which exact version of
Accord was checked out.

henrik

On Tue, Jan 17, 2023 at 7:23 PM Derek Chen-Becker 
wrote:

> Actually, re-reading the thread, I think I missed the initial point
> brought up and got lost in the discussion specific to submodules. What
> is the technical reason for bringing Accord in-tree? While I think
> submodules are the best way to include source in-tree, I'm not sure
> this is actually the correct thing to do in this case. Don't we
> already have mechanisms to deal with snapshot versions of library
> dependencies in the build? Do we need release votes for snapshots?
>
> Thanks,
>
> Derek
>
> On Tue, Jan 17, 2023 at 9:25 AM Derek Chen-Becker 
> wrote:
> >
> > I'd like to go back to Benedict's initial point: if we have a new
> > consensus protocol that other projects would potentially be interested
> > in, then by all means it should be its own project. Let's start with
> > that as a basis for discussion, because from my reading it seems like
> > people might be disagreeing with that initial premise.
> >
> > If we agree that Accord should be independent, I'm +1 for git
> > submodules primarily because that's a standard way of doing things and
> > I don't think we need yet another bespoke solution to a problem that
> > hundreds, if not thousands of other software projects encounter. I've
> > worked with lots of projects using submodules and while they're not a
> > panacea, they've never been a significant problem to work with.
> >
> > It's also a little confusing to see people argue about HEAD in
> > relation to any of this, since that's just an alias to the latest
> > commit for a given branch. In every project I've worked with that uses
> > submodules you would never use HEAD, because the submodule itself
> > already records the *exact* commit associated with the parent.
> >
> > Cheers,
> >
> > Derek
> >
> > On Tue, Jan 17, 2023 at 2:28 AM Benedict  wrote:
> > >
> > > The answer to all your questions is “like any other library” - this is
> a procedural hack to ease development. There are alternative isomorphic
> hacks, like compiling source jars from Accord and including them in the C*
> tree, if it helps your mental model.
> > >
> > > > you stated that a goal was to avoid maintaining multiple branches.
> > >
> > > No, I stated that a goal was to *decouple* development of Accord from
> C*. I don’t see why you would take that to mean there are no branches of
> Accord, as that would quite clearly be incompatible with the C* release
> strategy.
> > >
> > >
> > >
> > > On 17 Jan 2023, at 07:36, Mick Semb Wever  wrote:
> > >
> > > 
> > >>
> > >> … extrapolating this experience to multiple C* versions
> > >
> > >
> > > To include forward-merging, bisecting old history, etc etc. that's a
> leap of faith that I believe deserves the discussion.
> > >
> > >> - patches are off submodule SHAs, not the submodule's HEAD,
> > >>
> > >>
> > >> A SHA would point to the HEAD of a given branch, at the time of
> merge, just by SHA? I’ve no idea what you imagine here, but this just
> ensures that a given SHA of the importing project continues to compile
> correctly when it is no longer HEAD. It does not mean there’s no HEAD that
> corresponds directly to the SHA of the importing project’s HEAD.
> > >
> > >
> > >
> > > That wasn't my concern. Rather that you need to know in advance when
> the SHA is not HEAD. You can't commit off a past SHA. Once you find out
> (and how does this happen?) that the submodule code is not HEAD what do you
> then do? What if fast-forwarding the submodule to HEAD's SHA breaks things,
> do you now have to fix that or introduce branching in the submodule? If the
> submodule doesn't have releases, is it doing versioning, and if not how are
> branches distinguished?
> > >
> > > Aren't these all fair enquiries to raise?
> > >
> > >> - you need to be making commits to all branches (and forward merging)
> anyway to update submodule SHAs,
> > >>
> > >>
> > >> Exactly as you would any library upgrade?
> > >
> > >
> > >
> > > Correct. submodules does not solve/remove the need to commit to
> multiple branches and forward merge.
> 

Re: Intra-project dependencies

2023-01-17 Thread Derek Chen-Becker
Actually, re-reading the thread, I think I missed the initial point
brought up and got lost in the discussion specific to submodules. What
is the technical reason for bringing Accord in-tree? While I think
submodules are the best way to include source in-tree, I'm not sure
this is actually the correct thing to do in this case. Don't we
already have mechanisms to deal with snapshot versions of library
dependencies in the build? Do we need release votes for snapshots?

Thanks,

Derek

On Tue, Jan 17, 2023 at 9:25 AM Derek Chen-Becker  wrote:
>
> I'd like to go back to Benedict's initial point: if we have a new
> consensus protocol that other projects would potentially be interested
> in, then by all means it should be its own project. Let's start with
> that as a basis for discussion, because from my reading it seems like
> people might be disagreeing with that initial premise.
>
> If we agree that Accord should be independent, I'm +1 for git
> submodules primarily because that's a standard way of doing things and
> I don't think we need yet another bespoke solution to a problem that
> hundreds, if not thousands of other software projects encounter. I've
> worked with lots of projects using submodules and while they're not a
> panacea, they've never been a significant problem to work with.
>
> It's also a little confusing to see people argue about HEAD in
> relation to any of this, since that's just an alias to the latest
> commit for a given branch. In every project I've worked with that uses
> submodules you would never use HEAD, because the submodule itself
> already records the *exact* commit associated with the parent.
>
> Cheers,
>
> Derek
>
> On Tue, Jan 17, 2023 at 2:28 AM Benedict  wrote:
> >
> > The answer to all your questions is “like any other library” - this is a 
> > procedural hack to ease development. There are alternative isomorphic 
> > hacks, like compiling source jars from Accord and including them in the C* 
> > tree, if it helps your mental model.
> >
> > > you stated that a goal was to avoid maintaining multiple branches.
> >
> > No, I stated that a goal was to *decouple* development of Accord from C*. I 
> > don’t see why you would take that to mean there are no branches of Accord, 
> > as that would quite clearly be incompatible with the C* release strategy.
> >
> >
> >
> > On 17 Jan 2023, at 07:36, Mick Semb Wever  wrote:
> >
> > 
> >>
> >> … extrapolating this experience to multiple C* versions
> >
> >
> > To include forward-merging, bisecting old history, etc etc. that's a leap 
> > of faith that I believe deserves the discussion.
> >
> >> - patches are off submodule SHAs, not the submodule's HEAD,
> >>
> >>
> >> A SHA would point to the HEAD of a given branch, at the time of merge, 
> >> just by SHA? I’ve no idea what you imagine here, but this just ensures 
> >> that a given SHA of the importing project continues to compile correctly 
> >> when it is no longer HEAD. It does not mean there’s no HEAD that 
> >> corresponds directly to the SHA of the importing project’s HEAD.
> >
> >
> >
> > That wasn't my concern. Rather that you need to know in advance when the 
> > SHA is not HEAD. You can't commit off a past SHA. Once you find out (and 
> > how does this happen?) that the submodule code is not HEAD what do you then 
> > do? What if fast-forwarding the submodule to HEAD's SHA breaks things, do 
> > you now have to fix that or introduce branching in the submodule? If the 
> > submodule doesn't have releases, is it doing versioning, and if not how are 
> > branches distinguished?
> >
> > Aren't these all fair enquiries to raise?
> >
> >> - you need to be making commits to all branches (and forward merging) 
> >> anyway to update submodule SHAs,
> >>
> >>
> >> Exactly as you would any library upgrade?
> >
> >
> >
> > Correct. submodules does not solve/remove the need to commit to multiple 
> > branches and forward merge.
> > Furthermore submodules means at least one additional commit, and possibly 
> > twice as many commits.
> >
> >
> >> - if development is active on trunk, and then you need an update on an 
> >> older branch, you have to accommodate to backporting all those trunk 
> >> changes (or introduce the same branching in the submodule),
> >>
> >>
> >> If you do feature development against Accord then you will obviously 
> >> branch it? You would only make bug fixes to a bug fix branch. I’m not sure 
> >> what you think is wrong here.
> >
> >
> >
> > That's not obvious, you stated that a goal was to avoid maintaining 
> > multiple branches. Sure there's benefits to a lazy branching approach, but 
> > it contradicts your initial motivations and introduces methodology changes 
> > that are worth pointing out. What happens when there are multiple consumers 
> > of Accord, and (like the situation we face with jamm) its HEAD is well in 
> > front of anything C* is using.
> >
> > As Henrik states, the underlying problem doesn't change, we're just 
> > choosing between 

Re: Intra-project dependencies

2023-01-17 Thread Derek Chen-Becker
I'd like to go back to Benedict's initial point: if we have a new
consensus protocol that other projects would potentially be interested
in, then by all means it should be its own project. Let's start with
that as a basis for discussion, because from my reading it seems like
people might be disagreeing with that initial premise.

If we agree that Accord should be independent, I'm +1 for git
submodules primarily because that's a standard way of doing things and
I don't think we need yet another bespoke solution to a problem that
hundreds, if not thousands of other software projects encounter. I've
worked with lots of projects using submodules and while they're not a
panacea, they've never been a significant problem to work with.

It's also a little confusing to see people argue about HEAD in
relation to any of this, since that's just an alias to the latest
commit for a given branch. In every project I've worked with that uses
submodules you would never use HEAD, because the submodule itself
already records the *exact* commit associated with the parent.
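
To make that concrete: the parent repository stores a submodule as a tree
entry with mode 160000 (a "gitlink") whose object ID is the pinned commit,
and that is what `git ls-tree` prints. Below is a minimal Python sketch that
reads such an entry. The path `modules/accord` and the SHA are made-up sample
values for illustration, not taken from any actual branch.

```python
from typing import NamedTuple, Optional


class Gitlink(NamedTuple):
    path: str
    sha: str


def parse_gitlink(ls_tree_line: str) -> Optional[Gitlink]:
    """Return the pinned submodule commit from one `git ls-tree` line.

    `git ls-tree` prints "<mode> <type> <object>\t<path>"; a submodule
    shows up with mode 160000 and type "commit".
    """
    meta, _, path = ls_tree_line.rstrip("\n").partition("\t")
    fields = meta.split()
    if len(fields) != 3:
        return None
    mode, obj_type, sha = fields
    if mode != "160000" or obj_type != "commit":
        return None  # a regular file or directory, not a submodule
    return Gitlink(path=path, sha=sha)


# Hypothetical sample output of `git ls-tree HEAD modules/accord`.
sample = "160000 commit 1f3c9a0b7d2e4c5f8a6b9c0d1e2f3a4b5c6d7e8f\tmodules/accord"
print(parse_gitlink(sample))
```

The recorded SHA is part of the parent commit itself, so checking out an old
parent commit also tells you which submodule commit it was built against.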

Cheers,

Derek

On Tue, Jan 17, 2023 at 2:28 AM Benedict  wrote:
>
> The answer to all your questions is “like any other library” - this is a 
> procedural hack to ease development. There are alternative isomorphic hacks, 
> like compiling source jars from Accord and including them in the C* tree, if 
> it helps your mental model.
>
> > you stated that a goal was to avoid maintaining multiple branches.
>
> No, I stated that a goal was to *decouple* development of Accord from C*. I 
> don’t see why you would take that to mean there are no branches of Accord, as 
> that would quite clearly be incompatible with the C* release strategy.
>
>
>
> On 17 Jan 2023, at 07:36, Mick Semb Wever  wrote:
>
> 
>>
>> … extrapolating this experience to multiple C* versions
>
>
> To include forward-merging, bisecting old history, etc etc. that's a leap of 
> faith that I believe deserves the discussion.
>
>> - patches are off submodule SHAs, not the submodule's HEAD,
>>
>>
>> A SHA would point to the HEAD of a given branch, at the time of merge, just 
>> by SHA? I’ve no idea what you imagine here, but this just ensures that a 
>> given SHA of the importing project continues to compile correctly when it is 
>> no longer HEAD. It does not mean there’s no HEAD that corresponds directly 
>> to the SHA of the importing project’s HEAD.
>
>
>
> That wasn't my concern. Rather that you need to know in advance when the SHA 
> is not HEAD. You can't commit off a past SHA. Once you find out (and how does 
> this happen?) that the submodule code is not HEAD what do you then do? What 
> if fast-forwarding the submodule to HEAD's SHA breaks things, do you now have 
> to fix that or introduce branching in the submodule? If the submodule doesn't 
> have releases, is it doing versioning, and if not how are branches 
> distinguished?
>
> Aren't these all fair enquiries to raise?
>
>> - you need to be making commits to all branches (and forward merging) anyway 
>> to update submodule SHAs,
>>
>>
>> Exactly as you would any library upgrade?
>
>
>
> Correct. submodules does not solve/remove the need to commit to multiple 
> branches and forward merge.
> Furthermore submodules means at least one additional commit, and possibly 
> twice as many commits.
>
>
>> - if development is active on trunk, and then you need an update on an older 
>> branch, you have to accommodate to backporting all those trunk changes (or 
>> introduce the same branching in the submodule),
>>
>>
>> If you do feature development against Accord then you will obviously branch 
>> it? You would only make bug fixes to a bug fix branch. I’m not sure what you 
>> think is wrong here.
>
>
>
> That's not obvious, you stated that a goal was to avoid maintaining multiple 
> branches. Sure there's benefits to a lazy branching approach, but it 
> contradicts your initial motivations and introduces methodology changes that 
> are worth pointing out. What happens when there are multiple consumers of 
> Accord, and (like the situation we face with jamm) its HEAD is well in front 
> of anything C* is using.
>
> As Henrik states, the underlying problem doesn't change, we're just choosing 
> between trade-offs. My concern is that we're not even doing a very good job 
> of choosing between the trade-offs. Based on past experiences with 
> submodules: that started with great excitement and led to tears and 
> frustration after a few years; I'm only pushing for a more thorough 
> discussion and proposal.
>
>
>
>


-- 
+---+
| Derek Chen-Becker |
| GPG Key available at https://keybase.io/dchenbecker and   |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---+


Re: Intra-project dependencies

2023-01-17 Thread Benedict
The answer to all your questions is “like any other library” - this is a 
procedural hack to ease development. There are alternative isomorphic hacks, 
like compiling source jars from Accord and including them in the C* tree, if it 
helps your mental model.

> you stated that a goal was to avoid maintaining multiple branches.

No, I stated that a goal was to *decouple* development of Accord from C*. I 
don’t see why you would take that to mean there are no branches of Accord, as 
that would quite clearly be incompatible with the C* release strategy.



> On 17 Jan 2023, at 07:36, Mick Semb Wever  wrote:
> 
> 
>>> … extrapolating this experience to multiple C* versions
> 
> To include forward-merging, bisecting old history, etc etc. that's a leap of 
> faith that I believe deserves the discussion.
> 
>>> - patches are off submodule SHAs, not the submodule's HEAD,
>> 
>> A SHA would point to the HEAD of a given branch, at the time of merge, just 
>> by SHA? I’ve no idea what you imagine here, but this just ensures that a 
>> given SHA of the importing project continues to compile correctly when it is 
>> no longer HEAD. It does not mean there’s no HEAD that corresponds directly 
>> to the SHA of the importing project’s HEAD.
> 
> 
> That wasn't my concern. Rather that you need to know in advance when the SHA 
> is not HEAD. You can't commit off a past SHA. Once you find out (and how does 
> this happen?) that the submodule code is not HEAD what do you then do? What 
> if fast-forwarding the submodule to HEAD's SHA breaks things, do you now have 
> to fix that or introduce branching in the submodule? If the submodule doesn't 
> have releases, is it doing versioning, and if not how are branches 
> distinguished? 
> 
> Aren't these all fair enquiries to raise?
> 
>>> - you need to be making commits to all branches (and forward merging) 
>>> anyway to update submodule SHAs,
>> 
>> Exactly as you would any library upgrade?
> 
> 
> Correct. submodules does not solve/remove the need to commit to multiple 
> branches and forward merge.
> Furthermore submodules means at least one additional commit, and possibly 
> twice as many commits.
>  
> 
>>> - if development is active on trunk, and then you need an update on an 
>>> older branch, you have to accommodate to backporting all those trunk 
>>> changes (or introduce the same branching in the submodule),
>> 
>> If you do feature development against Accord then you will obviously branch 
>> it? You would only make bug fixes to a bug fix branch. I’m not sure what you 
>> think is wrong here.
> 
> 
> That's not obvious, you stated that a goal was to avoid maintaining multiple 
> branches. Sure there's benefits to a lazy branching approach, but it 
> contradicts your initial motivations and introduces methodology changes that 
> are worth pointing out. What happens when there are multiple consumers of 
> Accord, and (like the situation we face with jamm) its HEAD is well in front 
> of anything C* is using.
> 
> As Henrik states, the underlying problem doesn't change, we're just choosing 
> between trade-offs. My concern is that we're not even doing a very good job 
> of choosing between the trade-offs. Based on past experiences with 
> submodules: that started with great excitement and led to tears and 
> frustration after a few years; I'm only pushing for a more thorough 
> discussion and proposal.
> 
>  
> 


Re: Intra-project dependencies

2023-01-16 Thread Mick Semb Wever
>
> … extrapolating this experience to multiple C* versions
>
>
To include forward-merging, bisecting old history, etc etc. that's a leap
of faith that I believe deserves the discussion.

>> - patches are off submodule SHAs, not the submodule's HEAD,
>
>
> A SHA would point to the HEAD of a given branch, at the time of merge,
> just by SHA? I’ve no idea what you imagine here, but this just ensures that
> a given SHA of the importing project continues to compile correctly when it
> is no longer HEAD. It does not mean there’s no HEAD that corresponds
> directly to the SHA of the importing project’s HEAD.
>


That wasn't my concern. Rather that you need to know in advance when the
SHA is not HEAD. You can't commit off a past SHA. Once you find out (and
how does this happen?) that the submodule code is not HEAD what do you then
do? What if fast-forwarding the submodule to HEAD's SHA breaks things, do
you now have to fix that or introduce branching in the submodule? If the
submodule doesn't have releases, is it doing versioning, and if not how are
branches distinguished?

Aren't these all fair enquiries to raise?

>> - you need to be making commits to all branches (and forward merging)
>> anyway to update submodule SHAs,
>
>
> Exactly as you would any library upgrade?
>


Correct. Submodules do not solve/remove the need to commit to multiple
branches and forward merge.
Furthermore, submodules mean at least one additional commit, and possibly
twice as many commits.


>> - if development is active on trunk, and then you need an update on an
>> older branch, you have to accommodate to backporting all those trunk
>> changes (or introduce the same branching in the submodule),
>
>
> If you do feature development against Accord then you will obviously
> branch it? You would only make bug fixes to a bug fix branch. I’m not sure
> what you think is wrong here.
>


That's not obvious, you stated that a goal was to avoid maintaining
multiple branches. Sure there's benefits to a lazy branching approach, but
it contradicts your initial motivations and introduces methodology changes
that are worth pointing out. What happens when there are multiple consumers
of Accord, and (like the situation we face with jamm) its HEAD is well in
front of anything C* is using?

As Henrik states, the underlying problem doesn't change, we're just
choosing between trade-offs. My concern is that we're not even doing a very
good job of choosing between the trade-offs. Based on past experiences with
submodules: that started with great excitement and led to tears and
frustration after a few years; I'm only pushing for a more thorough
discussion and proposal.


Re: Intra-project dependencies

2023-01-16 Thread Benedict
> Benedict, experience based on developing one feature against one branch
> doesn't face the problems of working, and switching frequently, between
> branches.

Mick, please take a look at the ongoing development. Over the last week I
have been actively developing five separate PRs against each repository at
once (ten in total), with not insignificant changes between them. I am quite
experienced with actively developing against multiple branches, and of
extrapolating this experience to multiple C* versions, and your hypothetical
concerns do not invalidate that experience.

> - patches are off submodule SHAs, not the submodule's HEAD,

A SHA would point to the HEAD of a given branch, at the time of merge, just
by SHA? I’ve no idea what you imagine here, but this just ensures that a
given SHA of the importing project continues to compile correctly when it is
no longer HEAD. It does not mean there’s no HEAD that corresponds directly
to the SHA of the importing project’s HEAD.

> - you need to be making commits to all branches (and forward merging)
> anyway to update submodule SHAs,

Exactly as you would any library upgrade?

> - if development is active on trunk, and then you need an update on an
> older branch, you have to accommodate to backporting all those trunk
> changes (or introduce the same branching in the submodule),

If you do feature development against Accord then you will obviously branch
it? You would only make bug fixes to a bug fix branch. I’m not sure what you
think is wrong here.

> On 16 Jan 2023, at 19:52, Mick Semb Wever  wrote:
>
>>  - permanence from a git SHA no longer exists
>>
>> With the caveat that I haven't worked w/submodules before and only know
>> about them from a cursory search, it looks like git-submodule status would
>> show us the sha for submodules and …
>
> That isn't one SHA, but a collection of SHAs.
>
> I'm thinking about reproducible builds, switching between branches, and git
> bisecting, this stuff needs to just work. A build that fails fast if a
> submodule is not on a specific SHA helps but introduces more problems.
>
>> we could have parent projects reference specific shas to pull for
>> submodules to build?
>> https://git-scm.com/docs/git-submodule/#Documentation/git-submodule.txt-status--cached--recursive--ltpathgt82308203
>
> Yes, we can enforce a 1:1 relationship from parent SHA to submodule SHAs,
> but then what's the point: you have both the headache of submodules and
> having to always commit to multiple branches and forward merge.
>
> That is, with fixed parent-to-submodule SHA relationships, these new
> challenges are introduced:
> - patches are off submodule SHAs, not the submodule's HEAD,
> - you need to be making commits to all branches (and forward merging)
> anyway to update submodule SHAs,
> - if development is active on trunk, and then you need an update on an
> older branch, you have to accommodate to backporting all those trunk
> changes (or introduce the same branching in the submodule),
>
> IMHO submodules are just trading one set of problems for another. And
> overall life is simpler if we reduce the cognitive burden to just what we
> have today: forward merging.
>
> Benedict, experience based on developing one feature against one branch
> doesn't face the problems of working, and switching frequently, between
> branches.
>
> The problem of wanting an external repository for these libraries to
> promote external non-cassandra consumers I would solve by exporting the
> code out of cassandra (not trying to import it). Git history is easy to
> keep/replicate. We were talking about doing this with the jamm library,
> given its primary development is currently with C* but we want it to appear
> as a standalone library (/github codebase).


Re: Intra-project dependencies

2023-01-16 Thread Mick Semb Wever
>  - permanence from a git SHA no longer exists
>
> With the caveat that I haven't worked w/submodules before and only know
> about them from a cursory search, it looks like git-submodule status would
> show us the sha for submodules and …
>


That isn't one SHA, but a collection of SHAs.

I'm thinking about reproducible builds, switching between branches, and git
bisecting, this stuff needs to just work. A build that fails fast if a
submodule is not on a specific SHA helps but introduces more problems.
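
For illustration only, a fail-fast check of this kind could be built on the
output of `git submodule status`, where the first character of each line is
a space when the submodule is in sync, `-` when it is not initialized, `+`
when the checked-out commit differs from the SHA recorded in the parent, and
`U` on merge conflicts. The Python sketch below only parses that text; the
sample paths and SHAs are hypothetical, and it assumes submodule paths
contain no spaces.

```python
from typing import List, Tuple

# Meaning of the first character of each `git submodule status` line.
STATES = {
    " ": "in sync",
    "-": "not initialized",
    "+": "checked-out commit differs from the recorded SHA",
    "U": "merge conflict",
}


def out_of_sync(status_output: str) -> List[Tuple[str, str]]:
    """Return (path, problem) for every submodule that is not in sync."""
    problems = []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        flag, rest = line[0], line[1:]
        fields = rest.split()
        path = fields[1]  # assumes paths without spaces
        if flag != " ":
            problems.append((path, STATES.get(flag, "unknown state")))
    return problems


# Hypothetical sample output; not taken from a real repository.
sample = (
    " 1f3c9a0b7d2e4c5f8a6b9c0d1e2f3a4b5c6d7e8f modules/accord (heads/trunk)\n"
    "+9e8d7c6b5a4f3e2d1c0b9a8f7e6d5c4b3a2f1e0d modules/other (heads/trunk)\n"
)
for path, problem in out_of_sync(sample):
    print(f"{path}: {problem}")
```

A build could refuse to continue when the returned list is non-empty; what
it should do next (update the submodule, or stop and ask) is exactly the
kind of follow-on problem I mean.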



> we could have parent projects reference specific shas to pull for
> submodules to build?
> https://git-scm.com/docs/git-submodule/#Documentation/git-submodule.txt-status--cached--recursive--ltpathgt82308203
> 
>


Yes, we can enforce a 1:1 relationship from parent SHA to submodule SHAs,
but then what's the point: you have both the headache of submodules and
having to always commit to multiple branches and forward merge.

That is, with fixed parent-to-submodule SHA relationships, these new
challenges are introduced:
- patches are off submodule SHAs, not the submodule's HEAD,
- you need to be making commits to all branches (and forward merging)
anyway to update submodule SHAs,
- if development is active on trunk, and then you need an update on an
older branch, you have to accommodate to backporting all those trunk
changes (or introduce the same branching in the submodule),

IMHO submodules are just trading one set of problems for another. And
overall life is simpler if we reduce the cognitive burden to just what we
have today: forward merging.

Benedict, experience based on developing one feature against one branch
doesn't face the problems of working, and switching frequently, between
branches.

The problem of wanting an external repository for these libraries to
promote external non-cassandra consumers I would solve by exporting the
code out of cassandra (not trying to import it). Git history is easy to
keep/replicate. We were talking about doing this with the jamm library,
given its primary development is currently with C* but we want it to appear
as a standalone library (/github codebase).


Re: Intra-project dependencies

2023-01-16 Thread Benedict
We have a build script that is invoked by ant to grab a specific SHA (or
HEAD of a branch). We were previously just grabbing HEAD but this has the
problems mentioned elsewhere in the thread, amongst others. I don’t think it
probably matters much if we use a build script or submodules.

I am driven in part by wanting to maintain the library status and not
wanting to discard the work done to maintain this, but no less also by my
expectation that tying Accord to C* version would entail additional
maintenance burden (that might in the near term perhaps fall predominantly
on me).

I could be wrong in this prediction of course, but it seems to be a
one-sided trade. I don’t think there’s much extra work with separate
repositories even in the worst case of a 1:1 mapping, and we can more easily
reverse this decision if there’s no external interest and we really are just
1:1 for several releases.

That said, clearly we don’t want to pursue this approach for every
subsystem. So perhaps one of the decisive reasons is indeed the broader
utility, but the fact the library is fully decoupled is by itself a strong
reason IMO.

I guess an interesting thought exercise to validate this is what other
idealised subsystems I might want to apply this approach to. I’ll ponder
that.

> On 16 Jan 2023, at 18:32, Henrik Ingo  wrote:
>
> Hi Benedict
>
> At least for my part, again, I'm not (yet) trying to argue for or against
> a particular alternative. So I think you'll find that if you allow a few
> more iterations of discussion, we can gravitate to some good consensus. Or
> failing that, we can at least gravitate around a small number of
> alternatives and then argue about those :-D
>
> It seems also in your email, the strongest argument for keeping a separate
> library, is your desire or expectation that Accord would attract
> significant 3rd party interest. And - this is btw also some advice Magnus
> Carlsen would give - your main argument therefore is, if we expect we need
> to make a specific move in the future, it's usually best to just do it
> immediately.
>
> I didn't write in my previous email, but I did have in mind that one
> drawback with the proposal of later extracting Accord out of Cassandra into
> its own repository would be to lose the history of commits. (At least
> without significant effort to keep/recreate the history.) For example,
> there could be commits in the Accord history that also edit files in
> Cassandra. So yes, I agree that if this is a major goal, then keeping
> Accord development in its own repository is the right choice.
>
> This then leads to the question should the link from Cassandra to Accord be
> via git sub-modules or via some bash code in the build system. I now
> remember something that was a major problem for years in the MongoDB CI
> system, and I believe this is also a problem with our dtests? That the
> nightly CI system would just check out HEAD of each module, and then
> compile them and run tests. This had the problem that it was impossible to
> return to a specific failure, say, a week later, and expect to rebuild and
> retest the same combination, because the system would just check out and
> build whatever the HEAD was at that date. (The only way to test the actual
> SHA you had been bisecting or patching was to submit it as a patch to the
> CI system. So if a test setup had 5 sub modules, and you were fixing a bug
> in one of them, you had to "patch" the 4 other ones too, simply because
> otherwise the CI system wouldn't check out the right position in their
> history.)
>
> So, whatever method we choose, it's important that our CI system and other
> tools can know and track the correct and current SHA for each sub-module.
> Presumably git sub-modules actually are the best answer to this need. How
> have you dealt with this in Accord so far?
>
> One point: I wouldn't directly compare dtest and Accord though. For a test
> framework, it's the dtest framework that is consuming a Cassandra version,
> while for Accord it's Cassandra that depends on a specific Accord version.
> Because of this, the same solution may or may not be right for both of them.
>
> henrik
>
> On Mon, Jan 16, 2023 at 6:44 PM Benedict  wrote:
>
>> How often have we modified Paxos?
>>
>> There are currently no proposals to develop Accord further after the
>> initial release. So I think it is very likely that Accord development will
>> decouple from Cassandra version, unless there is significant external
>> interest that drives it.
>>
>> Furthermore, the idea of revisiting this later is problematic. We can’t
>> easily decouple Accord if it becomes tightly coupled with Cassandra, which
>> becomes quite likely when the builds are co-dependent. We have spent great
>> effort developing them separately to avoid this.
>>
>> You can’t go back later and recover lost interest. How many projects have
>> adopted ZAB, versus Raft?
>>
>> None of this also addresses the wider need for reform of our approach
>> here, for both the dtest-api and the simulator.
>>
>> I’m still not clear on the concrete downsides of maintaining a separate
>> tree here? Could somebody explain what they expect to

Re: Intra-project dependencies

2023-01-16 Thread Henrik Ingo
Hi Benedict

At least for my part, again, I'm not (yet) trying to argue for or  against
a particular alternative. So I think you'll find that if you allow a few
more iterations of discussion, we can gravitate to some good consensus. Or
failing that, we can at least gravitate around a small number of
alternatives and then argue about those :-D

It seems also in your email, the strongest argument for keeping a separate
library, is your desire or expectation that Accord would attract
significant 3rd party interest. And - this is btw also some advice Magnus
Carlsen would give - your main argument therefore is, if we expect we need
to make a specific move in the future, it's usually best to just do it
immediately.

I didn't write in my previous email, but I did have in mind that one
drawback with the proposal of later extracting Accord out of Cassandra into
its own repository would be to lose the history of commits. (At least
without significant effort to keep/recreate the history.) For example,
there could be commits in the Accord history that also edit files in
Cassandra. So yes, I agree that if this is a major goal, then keeping
Accord development in its own repository is the right choice.

This then leads to the question should the link from Cassandra to Accord be
via git sub-modules or via some bash code in the build system. I now
remember something that was a major problem for years in the MongoDB CI
system, and I believe this is also a problem with our dtests? That the
nightly CI system would just check out HEAD of each module, and then
compile them and run tests. This had the problem that it was impossible to
return to a specific failure, say, a week later, and expect to rebuild and
retest the same combination, because the system would just check out and
build whatever the HEAD was at that date. (The only way to test  the actual
SHA you had been bisecting or patching was to submit it as a patch to the
CI system. So if a test setup had 5 sub modules, and you were fixing a bug
in one of them, you had to "patch" the 4 other ones too, simply because
otherwise the CI system wouldn't check out the right position in their
history.)

So, whatever method we choose, it's important that our CI system and other
tools can know and track the correct and current SHA for each sub-module.
Presumably git sub-modules actually are the best answer to this need. How
have you dealt with this in Accord so far?
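
As a rough sketch of what "know and track" could mean in practice, a CI run
could write down the parent SHA together with the SHA of every sub-module it
built, so that the same combination can be checked out again later. The
Python below is only an illustration: the function names, module names and
SHAs are all invented for the example and do not describe our current CI.

```python
import json
from typing import Dict


def build_manifest(parent_sha: str, submodule_shas: Dict[str, str]) -> str:
    """Serialise the exact commits of one CI run as canonical JSON."""
    record = {"parent": parent_sha, "submodules": submodule_shas}
    # sort_keys makes the text independent of dict insertion order
    return json.dumps(record, sort_keys=True)


def same_combination(manifest_a: str, manifest_b: str) -> bool:
    """True when two runs built exactly the same set of commits."""
    return json.loads(manifest_a) == json.loads(manifest_b)


# Hypothetical runs: identical commits, then one module that moved on.
monday = build_manifest("aaa111", {"accord": "bbb222", "dtest-api": "ccc333"})
rerun = build_manifest("aaa111", {"dtest-api": "ccc333", "accord": "bbb222"})
drifted = build_manifest("aaa111", {"accord": "ddd444", "dtest-api": "ccc333"})

print(same_combination(monday, rerun))    # True
print(same_combination(monday, drifted))  # False
```

With git sub-modules the parent commit already carries this information, so
a separate record like this would mainly matter for the bash-script approach.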


One point: I wouldn't directly compare dtest and Accord though. For a test
framework, it's the dtest framework that is consuming a Cassandra version,
while for Accord it's Cassandra that depends on a specific Accord version.
Because of this, the same solution may or may not be right for both of them.

henrik

On Mon, Jan 16, 2023 at 6:44 PM Benedict  wrote:

> How often have we modified Paxos?
>
> There are currently no proposals to develop Accord further after the
> initial release. So I think it is very likely that Accord development will
> decouple from Cassandra version, unless there is significant external
> interest that drives it.
>
> Furthermore, the idea of revisiting this later is problematic. We can’t
> easily decouple Accord if it becomes tightly coupled with Cassandra, which
> becomes quite likely when the builds are co-dependent. We have spent great
> effort developing them separately to avoid this.
>
> You can’t go back later and recover lost interest. How many projects have
> adopted ZAB, versus Raft?
>
> None of this also addresses the wider need for reform of our approach
> here, for both the dtest-api and the simulator.
>
> I’m still not clear on the concrete downsides of maintaining a separate
> tree here? Could somebody explain what they expect to go wrong? I respond
> to Mick’s points below, as I do not recognise them from our experience.
> We’ve been doing this for a year without incident.
>
> I will note we explicitly voted to develop Accord as a standalone library
> as part of the original CEP, and this was debated quite extensively, so to
> change that will require a new dedicated DISCUSS thread and vote.
>
>>  - you can no longer just `git clone …`  (and we clone automatically in a
>> number of places)
>
> Yes you can, if your build script updates the sub modules like we have
> been doing.
>
>>  - same with `git pull …` (easy to be left with out-of-sync submodules)
>
> Yes you can, again for the same reason. This is no different to ensuring
> your libraries are in sync, which must be done on every pull or checkout.
>
>>  - permanence from a git SHA no longer exists
>
> It is intact, if you link to a SHA.
>
>>  - our releases get more complicated (our source tarballs are the asf
>> releases)
>
> How?
>
>>  - handling patches cover submodules
>
> How is this different to patches affecting multiple versions in C*?
>
>>  - switching branches, and using git worktrees, during dv
>
> Elaborate? I don’t see any problem, but I might be missing something.
>
> On 16 Jan 2023, at 16:11, Henrik 

Re: Intra-project dependencies

2023-01-16 Thread Benedict
How often have we modified Paxos?

There are currently no proposals to develop Accord further after the initial
release. So I think it is very likely that Accord development will decouple
from Cassandra version, unless there is significant external interest that
drives it.

Furthermore, the idea of revisiting this later is problematic. We can’t
easily decouple Accord if it becomes tightly coupled with Cassandra, which
becomes quite likely when the builds are co-dependent. We have spent great
effort developing them separately to avoid this.

You can’t go back later and recover lost interest. How many projects have
adopted ZAB, versus Raft?

None of this also addresses the wider need for reform of our approach here,
for both the dtest-api and the simulator.

I’m still not clear on the concrete downsides of maintaining a separate tree
here? Could somebody explain what they expect to go wrong? I respond to
Mick’s points below, as I do not recognise them from our experience. We’ve
been doing this for a year without incident.

I will note we explicitly voted to develop Accord as a standalone library as
part of the original CEP, and this was debated quite extensively, so to
change that will require a new dedicated DISCUSS thread and vote.

>  - you can no longer just `git clone …`  (and we clone automatically in a
> number of places)

Yes you can, if your build script updates the sub modules like we have been
doing.

>  - same with `git pull …` (easy to be left with out-of-sync submodules)

Yes you can, again for the same reason. This is no different to ensuring
your libraries are in sync, which must be done on every pull or checkout.

>  - permanence from a git SHA no longer exists

It is intact, if you link to a SHA.

>  - our releases get more complicated (our source tarballs are the asf
> releases)

How?

>  - handling patches cover submodules

How is this different to patches affecting multiple versions in C*?

>  - switching branches, and using git worktrees, during dv

Elaborate? I don’t see any problem, but I might be missing something.

> On 16 Jan 2023, at 16:11, Henrik Ingo  wrote:
>
> Hi all
>
> I was invited to share my thoughts just as an additional and somewhat fresh
> point of view...
>
> On a high level: We talked through this with Mick and a few other
> colleagues, and I/we came to the conclusion that fundamentally all of the
> mentioned options 1-5 are just variations of the same problem being moved
> into different places. That is to say there's complexity here that isn't
> going away. This is good to recognize just so that you realize when you are
> feeling that you don't quite like any of the available options, this is
> why. At least for me it's somehow calming when you understand this is the
> reality and you just have to face it.
>
> It seems to me the fundamental question is, will the link from Cassandra to
> Accord be a 1-1 or n-1 mapping? Superficially we would think that Accord is
> a separate library and all future Cassandra versions will use the same
> version of Accord. But is that really the case? Isn't it rather expected
> that Cassandra 5.1, 5.2 will probably come with more and improved
> functionality than what will be in 5.0? Fundamental additional
> functionality like less-than-strict consistency, mvcc, and maybe one day
> interactive transactions. What I'd expect to see here is then that the
> separate Accord library in fact is rather closely tied to its parent
> Cassandra release, and as soon as we have a 5.0 GA, we will also need a
> stable Accord branch to match, while significant new development will
> happen in tandem with Cassandra trunk/5.1?
>
> If the latter scenario is more likely, then having Accord in tree seems to
> be the easiest choice, because it's actually not the case that you are
> maintaining three copies of the same codebase. (Any more than that's the
> case for all Cassandra code.)
>
> FWIW MongoDB does in fact use option 5: At build time there's a bash script
> that copies your separate WiredTiger repository into the source tree, then
> compiles. A major reason they did it this way was to support the possibility
> that some modules would be closed source. Git modules would not work - or
> at least be very annoying - for a case where the parent directory is open
> source but the sub-module is not available to everyone. But having used the
> MongoDB system - which apparently is also Accord's system today - I'd say
> in the end it's just git submodules in a different form: You get to choose
> whether to manage the library dependency with git or a bash script.
>
> Finally, and I know this was stated before as well, the Accord developers
> seem hopeful that Accord will gain interest and contributors from outside
> of Cassandra, and as such warrants its own repository. For argument's sake,
> let's assume this is possible/likely...
>
> I didn't write this email to support any particular alternative or opinion.
> But combining the above thoughts, I feel like there is a conclusion sticking
> out of this email... And the conclusion is of the form "we can always
> change this later"...
>
> It seems to me that especially now, and

Re: Intra-project dependencies

2023-01-16 Thread Henrik Ingo
Hi all

I was invited to share my thoughts just as an additional and somewhat fresh
point of view...

On a high level: We talked through this with Mick and a few other
colleagues, and I/we came to the conclusion that fundamentally all of the
mentioned options 1-5 are just variations of the same problem being moved
into different places. That is to say there's complexity here that isn't
going away. This is good to recognize just so that you realize when you are
feeling that you don't quite like any of the available options, this is
why. At least for me it's somehow calming when you understand this is the
reality and you just have to face it.



It seems to me the fundamental question is, will the link from Cassandra to
Accord be a 1-1 or n-1 mapping? Superficially we would think that Accord is
a separate library and all future Cassandra versions will use the same
version of Accord. But is that really the case? Isn't it rather expected
that Cassandra 5.1, 5.2 will probably come with more and improved
functionality than what will be in 5.0? Fundamental additional
functionality like less-than-strict consistency, mvcc, and maybe one day
interactive transactions. What I'd expect to see here is then that the
separate Accord library in fact is rather closely tied to its parent
Cassandra release, and as soon as we have a 5.0 GA, we will also need a
stable Accord branch to match, while significant new development will
happen in tandem with Cassandra trunk/5.1?

If the latter scenario is more likely, then having Accord in tree seems to
be the easiest choice, because it's actually not the case that you are
maintaining three copies of the same codebase. (Any more than that's the
case for all Cassandra code.)


FWIW MongoDB does in fact use option 5: At build time there's a bash script
that copies your separate WiredTiger repository into the source tree, then
compiles. A major reason they did it this way was to support the possibility
that some modules would be closed source. Git modules would not work - or
at least be very annoying - for a case where the parent directory is open
source but the sub-module is not available to everyone.

But having used the MongoDB system - which apparently is also Accord's
system today - I'd say in the end it's just git submodules in a different
form: You get to choose whether to manage the library dependency with git
or a bash script.


Finally, and I know this was stated before as well, the Accord developers
seem hopeful that Accord will gain interest and contributors from outside
of Cassandra, and as such warrants its own repository. For argument's sake,
let's assume this is possible/likely...



I didn't write this email to support any particular alternative or opinion.
But combining the above thoughts, I feel like there is a conclusion
sticking out of this email... And the conclusion is of the form "we can
always change this later"...

It seems to me that especially now, and probably also after 5.0 is
released, we will in any case only have a single version of Cassandra using
a single version of Accord. So at least to begin with, it's the least
effort to keep it in-tree, to avoid the overhead of git submodules, or
having to make releases, etc.  The separate constituency of Accord-only
developers can be satisfied by keeping Accord in its own directory, could
even be a top-level directory, and a small build system that can build a
separate Accord jar file. You could even maintain a separate github repo
just for advertising purposes. (Just like github.com/apache/cassandra isn't
the official git repo for Cassandra either.)

If both of my assumptions above are true, then from a Cassandra point of
view there's not much benefit having Accord separately, but if 3rd party
interest in Accord grows, then it could indeed be split out into its own
repository at that point. The main motivation then would be to service
those 3rd party developers who aren't so interested in Cassandra. But this
split would only be done once it is known that such a community will form.

Thoughts?

henrik


On Mon, Jan 16, 2023 at 2:30 PM Josh McKenzie  wrote:

>  - permanence from a git SHA no longer exists
>
> With the caveat that I haven't worked w/submodules before and only know
> about them from a cursory search, it looks like `git submodule status` would
> show us the SHA for submodules and we could have parent projects reference
> specific SHAs to pull for submodules to build?
> https://git-scm.com/docs/git-submodule/#Documentation/git-submodule.txt-status--cached--recursive--ltpathgt82308203
> 
>
> It seems like our use case is one of the primary ones git submodules are
> designed to address.
>
> On Mon, Jan 16, 2023, at 6:40 AM, Benedict wrote:
>
>
> I guess option 5 

Re: Intra-project dependencies

2023-01-16 Thread Josh McKenzie
>  - permanence from a git SHA no longer exists
With the caveat that I haven't worked w/submodules before and only know about 
them from a cursory search, it looks like `git submodule status` would show us 
the SHA for submodules and we could have parent projects reference specific 
SHAs to pull for submodules to build? 
https://git-scm.com/docs/git-submodule/#Documentation/git-submodule.txt-status--cached--recursive--ltpathgt82308203

It seems like our use case is one of the primary ones git submodules are 
designed to address.
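
As an illustration of the point about reading pinned SHAs from `git submodule status`, here is a minimal sketch of a check a build could run on that command's output. It assumes the documented line format (a one-character prefix, the SHA, the path, and an optional describe string) and paths without spaces. The function names and the sample path `modules/accord` are hypothetical and are not taken from any patch in this thread.

```python
from typing import NamedTuple


class SubmoduleStatus(NamedTuple):
    state: str  # "ok", "uninitialized", "out-of-sync" or "conflict"
    sha: str
    path: str


# Prefix characters documented for `git submodule status`.
_PREFIXES = {" ": "ok", "-": "uninitialized", "+": "out-of-sync", "U": "conflict"}


def parse_status_line(line: str) -> SubmoduleStatus:
    """Parse one line of `git submodule status` output."""
    # A line whose leading space was stripped starts with the SHA itself,
    # so an unknown first character is treated as "ok".
    state = _PREFIXES.get(line[0], "ok")
    # SHAs are lowercase hex, so stripping the prefix set cannot eat into them.
    fields = line.lstrip(" -+U").split()
    return SubmoduleStatus(state, fields[0], fields[1])


def out_of_sync(output: str) -> list:
    """Return the paths of submodules whose state is anything other than "ok"."""
    statuses = (parse_status_line(line) for line in output.splitlines() if line)
    return [s.path for s in statuses if s.state != "ok"]


if __name__ == "__main__":
    sample = (
        " " + "1" * 40 + " modules/accord (heads/trunk)\n"
        "+" + "2" * 40 + " modules/other (v1)\n"
    )
    print(out_of_sync(sample))
```

A build could fail fast when `out_of_sync` returns a non-empty list, since a `+` prefix means the checked-out commit differs from the SHA recorded in the parent repository.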

On Mon, Jan 16, 2023, at 6:40 AM, Benedict wrote:
> 
> I guess option 5 is what we have today in cep-15, have the build file grab 
> the relevant SHA for the library. This way you maintain a precise SHA for 
> builds and scripts don’t have to be modified.
> 
> I believe this is also possible with git submodules, but I’m happy to bake 
> this into our build file instead with a script.
> 
> > As the library itself no longer has an explicit version, what I presume you 
> > meant by logical version.
> 
> I mean that we don’t want to duplicate work and risk diverging functionality 
> maintaining what is logically (meant to be) the same code. As a developer, 
> managing all of the branches is already a pain. Libraries naturally have a 
> different development cadence to the main project, and tying the development 
> to C* versions is just an unnecessary ongoing burden (and risk) that we can 
> avoid.
> 
> There’s also an additional penalty: we reduce the likelihood of outside 
> contributions to the libraries only. Accord in particular I hope will attract 
> outside interest if it is maintained as a separate library, as it has broad 
> applicability, and is likely of academic interest. Tying it to C* version and 
> more tightly coupling with C* codebase makes that less likely. We might also 
> see folk interested in our utilities, or our simulator framework, if they 
> were to be maintained separately, which could be valuable.
> 
> 
> 
> 
>> On 16 Jan 2023, at 10:49, Mick Semb Wever  wrote:
>> 
>>> I think (4) is the only sensible option. It permits different development 
>>> branches to easily reference different versions of a library and also to 
>>> easily co-develop them - from within the same IDE project, even.
>>> 
>> 
>> 
>> I've only heard horror stories about submodules. The challenges they bring 
>> should be listed and checked.
>> 
>> Some examples
>>  - you can no longer just `git clone …`  (and we clone automatically in a 
>> number of places)
>>  - same with `git pull …` (easy to be left with out-of-sync submodules)
>>  - permanence from a git SHA no longer exists
>>  - our releases get more complicated (our source tarballs are the asf 
>> releases)
>>  - handling patches that cover submodules
>>  - switching branches, and using git worktrees, during dev
>> 
>> I see (4) as a valid option, but I'm concerned with the amount of work required 
>> to adapt to it, and whether it will only make it more complicated for the 
>> new contributor to the project. For example the first two points are 
>> addressed by remembering to do `git clone --recurse-submodules …` . And who 
>> would be fixing our build/test/release scripts to accommodate?
>> 
>> Not blockers, just concerns we need to raise and address.
>> 
>>  
>>> We might even be able to avoid additional release votes as a matter of 
>>> course, by compiling the library source as part of the C* release, so that 
>>> they adopt the C* release vote (or else we may periodically release the 
>>> library as we do other releases)
>>> 
>> 
>> 
>> Yes. Today we do a combination of first (3) and then (1). Having to make a 
>> release of these libraries every time a patch (/feature branch) is 
>> completing is a horror story in itself.
>> 
>> 
>>> I might be missing something, does anyone have any other bright ideas for 
>>> approaching this problem? I’m sure there are plenty of opinions out there.
>>> 
>> 
>> 
>> Looking at the problem with these libraries, 
>>  - we don't need releases
>>  - we don't have a clean version/branch parity to in-tree
>>  - codebase parity between branches is important for upgrade tests (shared 
>> classloaders)
>> 
>>  For (2) you mention drift of the "same" version, isn't this only a problem 
>> for dtest-api in the way it requires the "same version" of a codebase for 
>> compatibility when running upgrade tests? As the library itself no longer 
>> has an explicit version, what I presume you meant by logical version.
>> 
>> To begin with, I'm leaning towards (2) because it is a cognitive re-use of 
>> our release branches, and the problems around classpath compatibility can be 
>> solved with tests. I'm sure I'm not seeing the whole picture though…
>> 


Re: Intra-project dependencies

2023-01-16 Thread Benedict
I guess option 5 is what we have today in cep-15, have the build file grab the 
relevant SHA for the library. This way you maintain a precise SHA for builds 
and scripts don’t have to be modified.

I believe this is also possible with git submodules, but I’m happy to bake this 
into our build file instead with a script.
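
To make the "build file grabs the relevant SHA" idea concrete, here is a minimal sketch under stated assumptions. It is not the cep-15 build logic: the function name, URL and destination directory are hypothetical, and the only git features relied on are `git clone --no-checkout`, `git -C <dir>` and `git checkout --detach`.

```python
import re

# A full commit SHA-1 is 40 lowercase hex characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def pinned_checkout_commands(repo_url: str, sha: str, dest: str) -> list:
    """Build the git commands that fetch a library at one exact commit."""
    if not _FULL_SHA.fullmatch(sha):
        # Branch names can move and abbreviated SHAs can become ambiguous,
        # so neither gives a reproducible build.
        raise ValueError(f"expected a full 40-character commit SHA, got {sha!r}")
    return [
        ["git", "clone", "--no-checkout", repo_url, dest],
        ["git", "-C", dest, "checkout", "--detach", sha],
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for cmd in pinned_checkout_commands(
        "https://example.invalid/library.git", "a" * 40, "build/library"
    ):
        print(" ".join(cmd))
```

A build script could pass each returned list to `subprocess.run(cmd, check=True)`. Rejecting anything other than a full SHA is the design choice that keeps the build pinned to one precise commit.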

> As the library itself no longer has an explicit version, what I presume you 
> meant by logical version.

I mean that we don’t want to duplicate work and risk diverging functionality 
maintaining what is logically (meant to be) the same code. As a developer, 
managing all of the branches is already a pain. Libraries naturally have a 
different development cadence to the main project, and tying the development to 
C* versions is just an unnecessary ongoing burden (and risk) that we can avoid.

There’s also an additional penalty: we reduce the likelihood of outside 
contributions to the libraries only. Accord in particular I hope will attract 
outside interest if it is maintained as a separate library, as it has broad 
applicability, and is likely of academic interest. Tying it to C* version and 
more tightly coupling with C* codebase makes that less likely. We might also 
see folk interested in our utilities, or our simulator framework, if they were 
to be maintained separately, which could be valuable.




> On 16 Jan 2023, at 10:49, Mick Semb Wever  wrote:
> 
> 
>> I think (4) is the only sensible option. It permits different development 
>> branches to easily reference different versions of a library and also to 
>> easily co-develop them - from within the same IDE project, even.
> 
> 
> I've only heard horror stories about submodules. The challenges they bring 
> should be listed and checked.
> 
> Some examples
>  - you can no longer just `git clone …`  (and we clone automatically in a 
> number of places)
>  - same with `git pull …` (easy to be left with out-of-sync submodules)
>  - permanence from a git SHA no longer exists
>  - our releases get more complicated (our source tarballs are the asf 
> releases)
>  - handling patches that cover submodules
>  - switching branches, and using git worktrees, during dev
> 
> I see (4) as a valid option, but I'm concerned with the amount of work required 
> to adapt to it, and whether it will only make it more complicated for the new 
> contributor to the project. For example the first two points are addressed by 
> remembering to do `git clone --recurse-submodules …` . And who would be 
> fixing our build/test/release scripts to accommodate?
> 
> Not blockers, just concerns we need to raise and address.
> 
>  
>> We might even be able to avoid additional release votes as a matter of 
>> course, by compiling the library source as part of the C* release, so that 
>> they adopt the C* release vote (or else we may periodically release the 
>> library as we do other releases)
> 
> 
> Yes. Today we do a combination of first (3) and then (1). Having to make a 
> release of these libraries every time a patch (/feature branch) is completing 
> is a horror story in itself.
> 
>> I might be missing something, does anyone have any other bright ideas for 
>> approaching this problem? I’m sure there are plenty of opinions out there.
> 
> 
> Looking at the problem with these libraries, 
>  - we don't need releases
>  - we don't have a clean version/branch parity to in-tree
>  - codebase parity between branches is important for upgrade tests (shared 
> classloaders)
> 
>  For (2) you mention drift of the "same" version, isn't this only a problem 
> for dtest-api in the way it requires the "same version" of a codebase for 
> compatibility when running upgrade tests? As the library itself no longer has 
> an explicit version, what I presume you meant by logical version.
> 
> To begin with, I'm leaning towards (2) because it is a cognitive re-use of 
> our release branches, and the problems around classpath compatibility can be 
> solved with tests. I'm sure I'm not seeing the whole picture though…
> 


Re: Intra-project dependencies

2023-01-16 Thread Mick Semb Wever
>
> I think (4) is the only sensible option. It permits different development
> branches to easily reference different versions of a library and also to
> easily co-develop them - from within the same IDE project, even.
>


I've only heard horror stories about submodules. The challenges they bring
should be listed and checked.

Some examples
 - you can no longer just `git clone …`  (and we clone automatically in a
number of places)
 - same with `git pull …` (easy to be left with out-of-sync submodules)
 - permanence from a git SHA no longer exists
 - our releases get more complicated (our source tarballs are the asf
releases)
 - handling patches that cover submodules
 - switching branches, and using git worktrees, during dev

I see (4) as a valid option, but I'm concerned with the amount of work required
to adapt to it, and whether it will only make it more complicated for the
new contributor to the project. For example the first two points are
addressed by remembering to do `git clone --recurse-submodules …` . And who
would be fixing our build/test/release scripts to accommodate?
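
For the places that clone automatically, the first two points could be handled by generating the git invocations in one helper rather than relying on people remembering the flag. This is a minimal sketch, assuming a script that shells out to git. The function names are hypothetical. The flags used (`clone --recurse-submodules`, `pull --recurse-submodules`, `submodule update --init --recursive`) are standard git options.

```python
def clone_command(repo_url: str, dest: str, with_submodules: bool = True) -> list:
    """Build a `git clone` command that also populates submodules by default."""
    cmd = ["git", "clone"]
    if with_submodules:
        cmd.append("--recurse-submodules")
    return cmd + [repo_url, dest]


def sync_commands(checkout: str) -> list:
    """Build commands that update an existing checkout and its submodules."""
    return [
        # Pull the parent repository and recurse into known submodules.
        ["git", "-C", checkout, "pull", "--recurse-submodules"],
        # Also initialise submodules that were added upstream since the clone.
        ["git", "-C", checkout, "submodule", "update", "--init", "--recursive"],
    ]


if __name__ == "__main__":
    # Hypothetical URL and directory, for illustration only.
    print(clone_command("https://example.invalid/project.git", "project"))
    for cmd in sync_commands("project"):
        print(cmd)
```

Individual developers can get a similar default locally with `git config submodule.recurse true`, although the plain `git clone` case still needs the explicit flag.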

Not blockers, just concerns we need to raise and address.



> We might even be able to avoid additional release votes as a matter of
> course, by compiling the library source as part of the C* release, so that
> they adopt the C* release vote (or else we may periodically release the
> library as we do other releases)
>


Yes. Today we do a combination of first (3) and then (1). Having to make a
release of these libraries every time a patch (/feature branch) is
completing is a horror story in itself.

> I might be missing something, does anyone have any other bright ideas for
> approaching this problem? I’m sure there are plenty of opinions out there.
>


Looking at the problem with these libraries,
 - we don't need releases
 - we don't have a clean version/branch parity to in-tree
 - codebase parity between branches is important for upgrade tests (shared
classloaders)

 For (2) you mention drift of the "same" version, isn't this only a problem
for dtest-api in the way it requires the "same version" of a codebase for
compatibility when running upgrade tests? As the library itself no longer
has an explicit version, what I presume you meant by logical version.

To begin with, I'm leaning towards (2) because it is a cognitive re-use of
our release branches, and the problems around classpath compatibility can
be solved with tests. I'm sure I'm not seeing the whole picture though…