Re: Paying off tech debt and correctly naming things

2018-03-28 Thread Jonathan Haddad
As requested, I'm alerting the ML that I've got the first patch of several
to come.  This one only addresses the ColumnFamilyStore class - and only
changes the name.  There's follow up tickets to change method and variable
names.  As we agreed, I'm doing this incrementally, so please let's keep
that in mind, I have no interest in producing a 10K patch that changes all
the things.

Since the patch only touches lines explicitly naming ColumnFamilyStore this
patch should be reasonably non-intrusive.

https://issues.apache.org/jira/browse/CASSANDRA-14340

Jon

On Thu, Mar 22, 2018 at 8:09 AM Jon Haddad  wrote:

> Cool.  I think there’s general agreement that doing this in as small bites
> as possible is going to be the best approach.  I have no interest in mega
> patches.
>
> >  The combined approach takes a
> > change that's already non-trivially dealing with complex subsystem
> > changes and injects a bunch of trivial renaming noise across unrelated
> > subsystems into the signal of an actual logic refactor.
>
> I agree.  This is why I like the idea of proactively working to improve
> the readability of the codebase as a specific goal, rather than being
> wrapped into some other unrelated patch.  Keeping the scope in check is the
> challenge.  Simple class and method renames, as several have pointed out,
> is easy enough with IDEA.
>
> I’ll start with class renames, as individual patches for each of them.
> I’ll be sure to call it out on the ML.  First one will be ColumnFamilyStore
> -> TableStore.
>
> Jon
>
> > On Mar 22, 2018, at 7:13 AM, Jason Brown  wrote:
> >
> > Jon,
> >
> > Thanks for bringing up this topic. I'll admit that I've been around this
> > code base for long enough, and have enough accumulated history, that I
> > probably can't fully appreciate the impact for a newcomer wrt naming.
> > However, as Josh points out, this situation probably happens to "every
> > non-trivially aged code-base ever".
> >
> > One thing I'd like to add is that with these types of large refactoring
> > changes, the review effort is non-trivial. This is because the review
> still
> > has to ensure that correctness is preserved and it's easy to overlook a
> > seemingly innocuous change.
> >
> > That being said, I am supportive of this effort. However, I believe it's
> > going to be best, for contributor and reviewer, to break it up into
> > smaller, more digestible pieces. I'd also like to request that we not go
> > whole hog and try to do everything in a compressed time frame; reviewer
> > availability is already stretched thin and I'm afraid of deepening the
> > review queue, especially mine :)
> >
> > Thanks,
> >
> > -Jason
> >
> >
> >
> >
> > On Thu, Mar 22, 2018 at 6:41 AM, Josh McKenzie 
> wrote:
> >
> >>> Some of us have big patches in flight, things that actually
> >>> pay off some technical debt, and dealing with such renames is rebase
> >> hell :\
> >> For sure, but with a code-base this old / organically grown, I expect
> >> this will always be the case. If we're talking something as simple as
> >> an intellij rename refactor, while menial, couldn't someone with a
> >> giant patch just do the same thing on their side and spend half an
> >> hour of their life clicking next? ;)
> >>
> >>> That said, there is good time for such renames - it’s during
> >>> those major refactors and rewrites. When you are
> >>> changing a subsystem, might as well do the appropriate renames.
> >> Does that hold true for a code-base with as much static state and
> >> abstraction leaking / bad factoring as we have? (i.e. every
> >> non-trivially aged code-base ever) The combined approach takes a
> >> change that's already non-trivially dealing with complex subsystem
> >> changes and injects a bunch of trivial renaming noise across unrelated
> >> subsystems into the signal of an actual logic refactor.
> >>
> >> On Thu, Mar 22, 2018 at 9:31 AM, Aleksey Yeshchenko 
> >> wrote:
> >>> Poor and out-of-date naming of things is probably the least serious
> part
> >> of our technical debt. Bad factoring, and straight-up
> >>> poorly written components is where it’s really at.
> >>>
> >>> Doing a big rename for rename sake alone does more harm than it is
> good,
> >> sometimes. Some of us have big patches
> >>> in flight, things that actually pay off some technical debt, and
> dealing
> >> with such renames is rebase hell :\
> >>>
> >>> That said, there is good time for such renames - it’s during those
> major
> >> refactors and rewrites. When you are
> >>> changing a subsystem, might as well do the appropriate renames.
> >>>
> >>> —
> >>> AY
> >>>
> >>> On 20 March 2018 at 22:04:48, Jon Haddad (j...@jonhaddad.com) wrote:
> >>>
> >>> Whenever I hop around in the codebase, one thing that always manages to
> >> slow me down is needing to understand the context of the variable names
> >> that I’m looking at. We’ve now removed thrift the transport, but the
> >> variables, classes and comments still remain. Personally, I’d like to
> go in
> >> and

Re: Paying off tech debt and correctly naming things

2018-03-21 Thread DuyHai Doan
+10

There are 2 hard problems in CS: naming things and cache invalidation

Le 20 mars 2018 23:04, "Jon Haddad"  a écrit :

> Whenever I hop around in the codebase, one thing that always manages to
> slow me down is needing to understand the context of the variable names
> that I’m looking at.  We’ve now removed thrift the transport, but the
> variables, classes and comments still remain.  Personally, I’d like to go
> in and pay off as much technical debt as possible by refactoring the code
> to be as close to CQL as possible.  Rows should be rows, not partitions,
> I’d love to see the term column family removed forever in favor of always
> using tables.  That said, it’s a big task.  I did a quick refactor in a
> branch, simply changing the ColumnFamilyStore class to TableStore, and
> pushed it up to GitHub. [1]
>
> Didn’t click on the link?  That’s ok.  The TL;DR is that it’s almost 2K
> LOC changed across 275 files.  I’ll note that my branch doesn’t change any
> of the almost 1000 search results of “columnfamilystore” found in the
> codebase and hundreds of tests failed on my branch in CircleCI, so that 2K
> LOC change would probably be quite a bit bigger.  There is, of course, a
> lot more than just renaming this one class, there’s thousands of variable
> names using any manner of “cf”, “cfs”, “columnfamily”, names plus comments
> and who knows what else.  There’s lots of references in probably every file
> that would have to get updated.
>
> What are people’s thoughts on this?  We should be honest with ourselves
> and know this isn’t going to get any easier over time.  It’s only going to
> get more confusing for new people to the project, and having to figure out
> “what kind of row am i even looking at” is a waste of time.  There’s
> obviously a much bigger impact than just renaming a bunch of files, there’s
> any number of patches and branches that would become outdated, plus anyone
> pulling in Cassandra as a dependency would be affected.  I don’t really
> have a solution for the disruption other than “leave it in place”, but in
> my mind that’s not a great (or even good) solution.
>
> Anyways, enough out of me.  My concern for ergonomics and naming might be
> significantly higher than the rest of the folks working in the code, and I
> wanted to put a feeler out there before I decided to dig into this in a
> more serious manner.
>
> Jon
>
> [1] https://github.com/apache/cassandra/compare/trunk...
> rustyrazorblade:refactor_column_family_store?expand=1 <
> https://github.com/apache/cassandra/compare/trunk...
> rustyrazorblade:refactor_column_family_store?expand=1>


Re: Paying off tech debt and correctly naming things

2018-03-21 Thread kurt greaves
As someone who came to the codebase post CQL but prior to thrift being
removed, +1 to refactor. The current mixing of terminology is a complete
nightmare. This would also give a good opportunity document a lot of code
that simply isn't documented (or incorrect). I'd say it's worth doing it in
multiple steps though, such as refactor of a single class at a time, then
followed by refactor of variable names. We've already done one pretty big
refactor (InetAddressAndPort) for 4.0, I don't see how a few more could
make it any worse (lol).

Row vs partition vs key vs PK is killing me

On 20 March 2018 at 22:04, Jon Haddad  wrote:

> Whenever I hop around in the codebase, one thing that always manages to
> slow me down is needing to understand the context of the variable names
> that I’m looking at.  We’ve now removed thrift the transport, but the
> variables, classes and comments still remain.  Personally, I’d like to go
> in and pay off as much technical debt as possible by refactoring the code
> to be as close to CQL as possible.  Rows should be rows, not partitions,
> I’d love to see the term column family removed forever in favor of always
> using tables.  That said, it’s a big task.  I did a quick refactor in a
> branch, simply changing the ColumnFamilyStore class to TableStore, and
> pushed it up to GitHub. [1]
>
> Didn’t click on the link?  That’s ok.  The TL;DR is that it’s almost 2K
> LOC changed across 275 files.  I’ll note that my branch doesn’t change any
> of the almost 1000 search results of “columnfamilystore” found in the
> codebase and hundreds of tests failed on my branch in CircleCI, so that 2K
> LOC change would probably be quite a bit bigger.  There is, of course, a
> lot more than just renaming this one class, there’s thousands of variable
> names using any manner of “cf”, “cfs”, “columnfamily”, names plus comments
> and who knows what else.  There’s lots of references in probably every file
> that would have to get updated.
>
> What are people’s thoughts on this?  We should be honest with ourselves
> and know this isn’t going to get any easier over time.  It’s only going to
> get more confusing for new people to the project, and having to figure out
> “what kind of row am i even looking at” is a waste of time.  There’s
> obviously a much bigger impact than just renaming a bunch of files, there’s
> any number of patches and branches that would become outdated, plus anyone
> pulling in Cassandra as a dependency would be affected.  I don’t really
> have a solution for the disruption other than “leave it in place”, but in
> my mind that’s not a great (or even good) solution.
>
> Anyways, enough out of me.  My concern for ergonomics and naming might be
> significantly higher than the rest of the folks working in the code, and I
> wanted to put a feeler out there before I decided to dig into this in a
> more serious manner.
>
> Jon
>
> [1]
> https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
> <
> https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
> >


Re: Paying off tech debt and correctly naming things

2018-03-21 Thread Benjamin Lerer
I agree that the refactoring make sense but I also know the pain of merging
branches with smaller refactorings. When you have to make a patch for
several branches and that you have to go through several round of reviews
it can be pretty painful.
I know that pain too well so I am not in favor of the bing bang approach.





On Wed, Mar 21, 2018 at 9:05 AM, kurt greaves  wrote:

> As someone who came to the codebase post CQL but prior to thrift being
> removed, +1 to refactor. The current mixing of terminology is a complete
> nightmare. This would also give a good opportunity document a lot of code
> that simply isn't documented (or incorrect). I'd say it's worth doing it in
> multiple steps though, such as refactor of a single class at a time, then
> followed by refactor of variable names. We've already done one pretty big
> refactor (InetAddressAndPort) for 4.0, I don't see how a few more could
> make it any worse (lol).
>
> Row vs partition vs key vs PK is killing me
>
> On 20 March 2018 at 22:04, Jon Haddad  wrote:
>
> > Whenever I hop around in the codebase, one thing that always manages to
> > slow me down is needing to understand the context of the variable names
> > that I’m looking at.  We’ve now removed thrift the transport, but the
> > variables, classes and comments still remain.  Personally, I’d like to go
> > in and pay off as much technical debt as possible by refactoring the code
> > to be as close to CQL as possible.  Rows should be rows, not partitions,
> > I’d love to see the term column family removed forever in favor of always
> > using tables.  That said, it’s a big task.  I did a quick refactor in a
> > branch, simply changing the ColumnFamilyStore class to TableStore, and
> > pushed it up to GitHub. [1]
> >
> > Didn’t click on the link?  That’s ok.  The TL;DR is that it’s almost 2K
> > LOC changed across 275 files.  I’ll note that my branch doesn’t change
> any
> > of the almost 1000 search results of “columnfamilystore” found in the
> > codebase and hundreds of tests failed on my branch in CircleCI, so that
> 2K
> > LOC change would probably be quite a bit bigger.  There is, of course, a
> > lot more than just renaming this one class, there’s thousands of variable
> > names using any manner of “cf”, “cfs”, “columnfamily”, names plus
> comments
> > and who knows what else.  There’s lots of references in probably every
> file
> > that would have to get updated.
> >
> > What are people’s thoughts on this?  We should be honest with ourselves
> > and know this isn’t going to get any easier over time.  It’s only going
> to
> > get more confusing for new people to the project, and having to figure
> out
> > “what kind of row am i even looking at” is a waste of time.  There’s
> > obviously a much bigger impact than just renaming a bunch of files,
> there’s
> > any number of patches and branches that would become outdated, plus
> anyone
> > pulling in Cassandra as a dependency would be affected.  I don’t really
> > have a solution for the disruption other than “leave it in place”, but in
> > my mind that’s not a great (or even good) solution.
> >
> > Anyways, enough out of me.  My concern for ergonomics and naming might be
> > significantly higher than the rest of the folks working in the code, and
> I
> > wanted to put a feeler out there before I decided to dig into this in a
> > more serious manner.
> >
> > Jon
> >
> > [1]
> > https://github.com/apache/cassandra/compare/trunk...
> rustyrazorblade:refactor_column_family_store?expand=1
> > <
> > https://github.com/apache/cassandra/compare/trunk...
> rustyrazorblade:refactor_column_family_store?expand=1
> > >
>


Re: Paying off tech debt and correctly naming things

2018-03-21 Thread Sylvain Lebresne
I really don't think anyone has been recently against such renaming, and in
fact, a _lot_ of renaming *has* already happen over time. The problem, as
you carefully noted, is that it's such a big task that there is still a lot
to do. Anyway, I've yet to see a patch renaming things to match the CQL
naming scheme be rejected, so I'd personally encourage such submission. But
maybe with a few caveats (already mentioned largely, so repeating here to
signify my personal agreement with them):
- renaming with large surface area can be painful for ongoing patches or
even future merge. That's not a reason for not doing them, but that's imo a
good enough reason to do things incrementally/in as-small-as-reasonable
steps. Making sure a renaming commit only does renaming and doesn't change
the logic is also pretty nice when you rebase such things.
- breaking hundreds of tests is obviously not ok :)
- pure code renaming is one reasonably simple aspect, but quite a few
renaming may have user visible impact. Particularly around JMX where many
things are name based on their class, and to a lesser extend some of our
tools still use "old" naming. We can't and shouldn't ignore those impact:
such user visible changes should imo be documented, and we should make sure
we have a reasonably painless (and thus incremental) upgrade path. My hunch
is the latter isn't as simple as it seems.


--
Sylvain


On Wed, Mar 21, 2018 at 9:06 AM kurt greaves  wrote:

> As someone who came to the codebase post CQL but prior to thrift being
> removed, +1 to refactor. The current mixing of terminology is a complete
> nightmare. This would also give a good opportunity document a lot of code
> that simply isn't documented (or incorrect). I'd say it's worth doing it in
> multiple steps though, such as refactor of a single class at a time, then
> followed by refactor of variable names. We've already done one pretty big
> refactor (InetAddressAndPort) for 4.0, I don't see how a few more could
> make it any worse (lol).
>
> Row vs partition vs key vs PK is killing me
>
> On 20 March 2018 at 22:04, Jon Haddad  wrote:
>
> > Whenever I hop around in the codebase, one thing that always manages to
> > slow me down is needing to understand the context of the variable names
> > that I’m looking at.  We’ve now removed thrift the transport, but the
> > variables, classes and comments still remain.  Personally, I’d like to go
> > in and pay off as much technical debt as possible by refactoring the code
> > to be as close to CQL as possible.  Rows should be rows, not partitions,
> > I’d love to see the term column family removed forever in favor of always
> > using tables.  That said, it’s a big task.  I did a quick refactor in a
> > branch, simply changing the ColumnFamilyStore class to TableStore, and
> > pushed it up to GitHub. [1]
> >
> > Didn’t click on the link?  That’s ok.  The TL;DR is that it’s almost 2K
> > LOC changed across 275 files.  I’ll note that my branch doesn’t change
> any
> > of the almost 1000 search results of “columnfamilystore” found in the
> > codebase and hundreds of tests failed on my branch in CircleCI, so that
> 2K
> > LOC change would probably be quite a bit bigger.  There is, of course, a
> > lot more than just renaming this one class, there’s thousands of variable
> > names using any manner of “cf”, “cfs”, “columnfamily”, names plus
> comments
> > and who knows what else.  There’s lots of references in probably every
> file
> > that would have to get updated.
> >
> > What are people’s thoughts on this?  We should be honest with ourselves
> > and know this isn’t going to get any easier over time.  It’s only going
> to
> > get more confusing for new people to the project, and having to figure
> out
> > “what kind of row am i even looking at” is a waste of time.  There’s
> > obviously a much bigger impact than just renaming a bunch of files,
> there’s
> > any number of patches and branches that would become outdated, plus
> anyone
> > pulling in Cassandra as a dependency would be affected.  I don’t really
> > have a solution for the disruption other than “leave it in place”, but in
> > my mind that’s not a great (or even good) solution.
> >
> > Anyways, enough out of me.  My concern for ergonomics and naming might be
> > significantly higher than the rest of the folks working in the code, and
> I
> > wanted to put a feeler out there before I decided to dig into this in a
> > more serious manner.
> >
> > Jon
> >
> > [1]
> >
> https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
> > <
> >
> https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
> > >
>


Re: Paying off tech debt and correctly naming things

2018-03-21 Thread Eric Evans
On Wed, Mar 21, 2018 at 3:48 AM, Sylvain Lebresne  wrote:

[ ... ]

> - pure code renaming is one reasonably simple aspect, but quite a few
> renaming may have user visible impact. Particularly around JMX where many
> things are name based on their class, and to a lesser extend some of our
> tools still use "old" naming. We can't and shouldn't ignore those impact:
> such user visible changes should imo be documented, and we should make sure
> we have a reasonably painless (and thus incremental) upgrade path. My hunch
> is the latter isn't as simple as it seems.

Speaking as someone who has personally been burned by this
(repeatedly, and it's on-going), please think very carefully before
making such changes.  I hate to think about of all the hours I wasted
shaving this breed of yak.

> On Wed, Mar 21, 2018 at 9:06 AM kurt greaves  wrote:

[ ... ]

-- 
Eric Evans
john.eric.ev...@gmail.com

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Paying off tech debt and correctly naming things

2018-03-21 Thread Lerh Chuan Low
For reasons others have mentioned (nightmare to continuously update branch
and resolve merge conflicts, existing patches/big features..) it will be a
nightmare. It seems like in software projects (just basing it off personal
experience) people typically refactor if a ticket they are working on
touches the part of the code base that needs refactoring, I've not really
seen a freeze and work off technical debt before (I'll admit upfront I
don't know much).

Thinking about it, the only ones I could come up with are the same as
Sylvain had mentioned, which is start with a small subset and just do only
renaming and cleaning up comments; no logic changes. I would think some
parts of the code may take ages more before a ticket finds its way to it
(and a knowledgable enough person is involved to even guide the refactor).

So definitely, you have my (moral) support if you are going to go with it,
+1 +1 +1

On 22 March 2018 at 00:31, Eric Evans  wrote:

> On Wed, Mar 21, 2018 at 3:48 AM, Sylvain Lebresne 
> wrote:
>
> [ ... ]
>
> > - pure code renaming is one reasonably simple aspect, but quite a few
> > renaming may have user visible impact. Particularly around JMX where many
> > things are name based on their class, and to a lesser extend some of our
> > tools still use "old" naming. We can't and shouldn't ignore those impact:
> > such user visible changes should imo be documented, and we should make
> sure
> > we have a reasonably painless (and thus incremental) upgrade path. My
> hunch
> > is the latter isn't as simple as it seems.
>
> Speaking as someone who has personally been burned by this
> (repeatedly, and it's on-going), please think very carefully before
> making such changes.  I hate to think about of all the hours I wasted
> shaving this breed of yak.
>
> > On Wed, Mar 21, 2018 at 9:06 AM kurt greaves 
> wrote:
>
> [ ... ]
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Paying off tech debt and correctly naming things

2018-03-21 Thread Jeremiah D Jordan
+1 if you are willing to take it on.  As the person who performed the 
Table->Keyspace rename of 2.0, I say good luck!  From hindsight of doing that, 
as others suggested, I would come at this in multiple tickets.
I would suggest a simple class rename with intellij refactoring tools or 
something as the first ticket.  This is going to touch the most files at once, 
but will be mechanical, and for the most part if it compiles it was right :).
After you have done that you can take on other renaming of things with a 
smaller scope.
Also as others have said the main things to be wary of are the naming of things 
in JMX metrics.  Ideally we would keep around deprecated aliases of the old JMX 
names for a release before removing them.  The other thing is to watch out for 
class names in byte man scripts in dtest.

-Jeremiah

> On Mar 21, 2018, at 4:48 AM, Sylvain Lebresne  wrote:
> 
> I really don't think anyone has been recently against such renaming, and in
> fact, a _lot_ of renaming *has* already happen over time. The problem, as
> you carefully noted, is that it's such a big task that there is still a lot
> to do. Anyway, I've yet to see a patch renaming things to match the CQL
> naming scheme be rejected, so I'd personally encourage such submission. But
> maybe with a few caveats (already mentioned largely, so repeating here to
> signify my personal agreement with them):
> - renaming with large surface area can be painful for ongoing patches or
> even future merge. That's not a reason for not doing them, but that's imo a
> good enough reason to do things incrementally/in as-small-as-reasonable
> steps. Making sure a renaming commit only does renaming and doesn't change
> the logic is also pretty nice when you rebase such things.
> - breaking hundreds of tests is obviously not ok :)
> - pure code renaming is one reasonably simple aspect, but quite a few
> renaming may have user visible impact. Particularly around JMX where many
> things are name based on their class, and to a lesser extend some of our
> tools still use "old" naming. We can't and shouldn't ignore those impact:
> such user visible changes should imo be documented, and we should make sure
> we have a reasonably painless (and thus incremental) upgrade path. My hunch
> is the latter isn't as simple as it seems.
> 
> 
> --
> Sylvain
> 
> 
> On Wed, Mar 21, 2018 at 9:06 AM kurt greaves  wrote:
> 
>> As someone who came to the codebase post CQL but prior to thrift being
>> removed, +1 to refactor. The current mixing of terminology is a complete
>> nightmare. This would also give a good opportunity document a lot of code
>> that simply isn't documented (or incorrect). I'd say it's worth doing it in
>> multiple steps though, such as refactor of a single class at a time, then
>> followed by refactor of variable names. We've already done one pretty big
>> refactor (InetAddressAndPort) for 4.0, I don't see how a few more could
>> make it any worse (lol).
>> 
>> Row vs partition vs key vs PK is killing me
>> 
>> On 20 March 2018 at 22:04, Jon Haddad  wrote:
>> 
>>> Whenever I hop around in the codebase, one thing that always manages to
>>> slow me down is needing to understand the context of the variable names
>>> that I’m looking at.  We’ve now removed thrift the transport, but the
>>> variables, classes and comments still remain.  Personally, I’d like to go
>>> in and pay off as much technical debt as possible by refactoring the code
>>> to be as close to CQL as possible.  Rows should be rows, not partitions,
>>> I’d love to see the term column family removed forever in favor of always
>>> using tables.  That said, it’s a big task.  I did a quick refactor in a
>>> branch, simply changing the ColumnFamilyStore class to TableStore, and
>>> pushed it up to GitHub. [1]
>>> 
>>> Didn’t click on the link?  That’s ok.  The TL;DR is that it’s almost 2K
>>> LOC changed across 275 files.  I’ll note that my branch doesn’t change
>> any
>>> of the almost 1000 search results of “columnfamilystore” found in the
>>> codebase and hundreds of tests failed on my branch in CircleCI, so that
>> 2K
>>> LOC change would probably be quite a bit bigger.  There is, of course, a
>>> lot more than just renaming this one class, there’s thousands of variable
>>> names using any manner of “cf”, “cfs”, “columnfamily”, names plus
>> comments
>>> and who knows what else.  There’s lots of references in probably every
>> file
>>> that would have to get updated.
>>> 
>>> What are people’s thoughts on this?  We should be honest with ourselves
>>> and know this isn’t going to get any easier over time.  It’s only going
>> to
>>> get more confusing for new people to the project, and having to figure
>> out
>>> “what kind of row am i even looking at” is a waste of time.  There’s
>>> obviously a much bigger impact than just renaming a bunch of files,
>> there’s
>>> any number of patches and branches that would become outdated, plus
>> anyone
>>> pulling in Cassandra as a dependency would be af

Re: Paying off tech debt and correctly naming things

2018-03-21 Thread Jeff Jirsa
Please please please ping the list and ask if anyone has big commits ready to 
merge before actually committing any huge automated refactors - people who may 
be sitting on big patches will thank you if they don’t have to rebase against 
huge IntelliJ refactors . 

-- 
Jeff Jirsa


> On Mar 21, 2018, at 5:49 PM, Jeremiah D Jordan  
> wrote:
> 
> +1 if you are willing to take it on.  As the person who performed the 
> Table->Keyspace rename of 2.0, I say good luck!  From hindsight of doing 
> that, as others suggested, I would come at this in multiple tickets.
> I would suggest a simple class rename with intellij refactoring tools or 
> something as the first ticket.  This is going to touch the most files at 
> once, but will be mechanical, and for the most part if it compiles it was 
> right :).
> After you have done that you can take on other renaming of things with a 
> smaller scope.
> Also as others have said the main things to be wary of are the naming of 
> things in JMX metrics.  Ideally we would keep around deprecated aliases of 
> the old JMX names for a release before removing them.  The other thing is to 
> watch out for class names in byte man scripts in dtest.
> 
> -Jeremiah
> 
>> On Mar 21, 2018, at 4:48 AM, Sylvain Lebresne  wrote:
>> 
>> I really don't think anyone has been recently against such renaming, and in
>> fact, a _lot_ of renaming *has* already happen over time. The problem, as
>> you carefully noted, is that it's such a big task that there is still a lot
>> to do. Anyway, I've yet to see a patch renaming things to match the CQL
>> naming scheme be rejected, so I'd personally encourage such submission. But
>> maybe with a few caveats (already mentioned largely, so repeating here to
>> signify my personal agreement with them):
>> - renaming with large surface area can be painful for ongoing patches or
>> even future merge. That's not a reason for not doing them, but that's imo a
>> good enough reason to do things incrementally/in as-small-as-reasonable
>> steps. Making sure a renaming commit only does renaming and doesn't change
>> the logic is also pretty nice when you rebase such things.
>> - breaking hundreds of tests is obviously not ok :)
>> - pure code renaming is one reasonably simple aspect, but quite a few
>> renaming may have user visible impact. Particularly around JMX where many
>> things are name based on their class, and to a lesser extend some of our
>> tools still use "old" naming. We can't and shouldn't ignore those impact:
>> such user visible changes should imo be documented, and we should make sure
>> we have a reasonably painless (and thus incremental) upgrade path. My hunch
>> is the latter isn't as simple as it seems.
>> 
>> 
>> --
>> Sylvain
>> 
>> 
>>> On Wed, Mar 21, 2018 at 9:06 AM kurt greaves  wrote:
>>> 
>>> As someone who came to the codebase post CQL but prior to thrift being
>>> removed, +1 to refactor. The current mixing of terminology is a complete
>>> nightmare. This would also give a good opportunity document a lot of code
>>> that simply isn't documented (or incorrect). I'd say it's worth doing it in
>>> multiple steps though, such as refactor of a single class at a time, then
>>> followed by refactor of variable names. We've already done one pretty big
>>> refactor (InetAddressAndPort) for 4.0, I don't see how a few more could
>>> make it any worse (lol).
>>> 
>>> Row vs partition vs key vs PK is killing me
>>> 
 On 20 March 2018 at 22:04, Jon Haddad  wrote:
 
 Whenever I hop around in the codebase, one thing that always manages to
 slow me down is needing to understand the context of the variable names
 that I’m looking at.  We’ve now removed thrift the transport, but the
 variables, classes and comments still remain.  Personally, I’d like to go
 in and pay off as much technical debt as possible by refactoring the code
 to be as close to CQL as possible.  Rows should be rows, not partitions,
 I’d love to see the term column family removed forever in favor of always
 using tables.  That said, it’s a big task.  I did a quick refactor in a
 branch, simply changing the ColumnFamilyStore class to TableStore, and
 pushed it up to GitHub. [1]
 
 Didn’t click on the link?  That’s ok.  The TL;DR is that it’s almost 2K
 LOC changed across 275 files.  I’ll note that my branch doesn’t change
>>> any
 of the almost 1000 search results of “columnfamilystore” found in the
 codebase and hundreds of tests failed on my branch in CircleCI, so that
>>> 2K
 LOC change would probably be quite a bit bigger.  There is, of course, a
 lot more than just renaming this one class, there’s thousands of variable
 names using any manner of “cf”, “cfs”, “columnfamily”, names plus
>>> comments
 and who knows what else.  There’s lots of references in probably every
>>> file
 that would have to get updated.
 
 What are people’s thoughts on this?  We should be honest with ourselves
>>

Re: Paying off tech debt and correctly naming things

2018-03-21 Thread Rahul Singh
+1 - can help with sections of the code

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 21, 2018, 4:25 PM -0500, Lerh Chuan Low , wrote:
> For reasons others have mentioned (nightmare to continuously update branch
> and resolve merge conflicts, existing patches/big features..) it will be a
> nightmare. It seems like in software projects (just basing it off personal
> experience) people typically refactor if a ticket they are working on
> touches the part of the code base that needs refactoring, I've not really
> seen a freeze and work off technical debt before (I'll admit upfront I
> don't know much).
>
> Thinking about it, the only ones I could come up with are the same as
> Sylvain had mentioned, which is start with a small subset and just do only
> renaming and cleaning up comments; no logic changes. I would think some
> parts of the code may take ages more before a ticket finds its way to it
> (and a knowledgable enough person is involved to even guide the refactor).
>
> So definitely, you have my (moral) support if you are going to go with it,
> +1 +1 +1
>
> On 22 March 2018 at 00:31, Eric Evans  wrote:
>
> > On Wed, Mar 21, 2018 at 3:48 AM, Sylvain Lebresne  > wrote:
> >
> > [ ... ]
> >
> > > - pure code renaming is one reasonably simple aspect, but quite a few
> > > renaming may have user visible impact. Particularly around JMX where many
> > > things are name based on their class, and to a lesser extend some of our
> > > tools still use "old" naming. We can't and shouldn't ignore those impact:
> > > such user visible changes should imo be documented, and we should make
> > sure
> > > we have a reasonably painless (and thus incremental) upgrade path. My
> > hunch
> > > is the latter isn't as simple as it seems.
> >
> > Speaking as someone who has personally been burned by this
> > (repeatedly, and it's on-going), please think very carefully before
> > making such changes. I hate to think about of all the hours I wasted
> > shaving this breed of yak.
> >
> > > On Wed, Mar 21, 2018 at 9:06 AM kurt greaves  > wrote:
> >
> > [ ... ]
> >
> > --
> > Eric Evans
> > john.eric.ev...@gmail.com
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >


Re: Paying off tech debt and correctly naming things

2018-03-21 Thread Dinesh Joshi
Having been through a bunch of refactors in other projects, I would suggest 
doing small incremental patches isolated in related parts of the code. It would 
also be nice if you could give us a heads up on changes you're making.
Best of luck!
Dinesh 

On Tuesday, March 20, 2018, 3:04:43 PM PDT, Jon Haddad  
wrote:  
 
 Whenever I hop around in the codebase, one thing that always manages to slow 
me down is needing to understand the context of the variable names that I’m 
looking at.  We’ve now removed thrift the transport, but the variables, classes 
and comments still remain.  Personally, I’d like to go in and pay off as much 
technical debt as possible by refactoring the code to be as close to CQL as 
possible.  Rows should be rows, not partitions, I’d love to see the term column 
family removed forever in favor of always using tables.  That said, it’s a big 
task.  I did a quick refactor in a branch, simply changing the 
ColumnFamilyStore class to TableStore, and pushed it up to GitHub. [1]

Didn’t click on the link?  That’s ok.  The TL;DR is that it’s almost 2K LOC 
changed across 275 files.  I’ll note that my branch doesn’t change any of the 
almost 1000 search results of “columnfamilystore” found in the codebase and 
hundreds of tests failed on my branch in CircleCI, so that 2K LOC change would 
probably be quite a bit bigger.  There is, of course, a lot more than just 
renaming this one class, there’s thousands of variable names using any manner 
of “cf”, “cfs”, “columnfamily”, names plus comments and who knows what else.  
There’s lots of references in probably every file that would have to get 
updated.

What are people’s thoughts on this?  We should be honest with ourselves and 
know this isn’t going to get any easier over time.  It’s only going to get more 
confusing for new people to the project, and having to figure out “what kind of 
row am i even looking at” is a waste of time.  There’s obviously a much bigger 
impact than just renaming a bunch of files, there’s any number of patches and 
branches that would become outdated, plus anyone pulling in Cassandra as a 
dependency would be affected.  I don’t really have a solution for the 
disruption other than “leave it in place”, but in my mind that’s not a great 
(or even good) solution.

Anyways, enough out of me.  My concern for ergonomics and naming might be 
significantly higher than the rest of the folks working in the code, and I 
wanted to put a feeler out there before I decided to dig into this in a more 
serious manner. 

Jon

[1] 
https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
 

  

Re: Paying off tech debt and correctly naming things

2018-03-22 Thread Aleksey Yeshchenko
Poor and out-of-date naming of things is probably the least serious part of our 
technical debt. Bad factoring, and straight-up
poorly written components is where it’s really at.

Doing a big rename for rename sake alone does more harm than it is good, 
sometimes. Some of us have big patches
in flight, things that actually pay off some technical debt, and dealing with 
such renames is rebase hell :\

That said, there is good time for such renames - it’s during those major 
refactors and rewrites. When you are
changing a subsystem, might as well do the appropriate renames.

—
AY

On 20 March 2018 at 22:04:48, Jon Haddad (j...@jonhaddad.com) wrote:

Whenever I hop around in the codebase, one thing that always manages to slow me 
down is needing to understand the context of the variable names that I’m 
looking at. We’ve now removed thrift the transport, but the variables, classes 
and comments still remain. Personally, I’d like to go in and pay off as much 
technical debt as possible by refactoring the code to be as close to CQL as 
possible. Rows should be rows, not partitions, I’d love to see the term column 
family removed forever in favor of always using tables. That said, it’s a big 
task. I did a quick refactor in a branch, simply changing the ColumnFamilyStore 
class to TableStore, and pushed it up to GitHub. [1]  

Didn’t click on the link? That’s ok. The TL;DR is that it’s almost 2K LOC 
changed across 275 files. I’ll note that my branch doesn’t change any of the 
almost 1000 search results of “columnfamilystore” found in the codebase and 
hundreds of tests failed on my branch in CircleCI, so that 2K LOC change would 
probably be quite a bit bigger. There is, of course, a lot more than just 
renaming this one class, there’s thousands of variable names using any manner 
of “cf”, “cfs”, “columnfamily”, names plus comments and who knows what else. 
There’s lots of references in probably every file that would have to get 
updated.  

What are people’s thoughts on this? We should be honest with ourselves and know 
this isn’t going to get any easier over time. It’s only going to get more 
confusing for new people to the project, and having to figure out “what kind of 
row am i even looking at” is a waste of time. There’s obviously a much bigger 
impact than just renaming a bunch of files, there’s any number of patches and 
branches that would become outdated, plus anyone pulling in Cassandra as a 
dependency would be affected. I don’t really have a solution for the disruption 
other than “leave it in place”, but in my mind that’s not a great (or even 
good) solution.  

Anyways, enough out of me. My concern for ergonomics and naming might be 
significantly higher than the rest of the folks working in the code, and I 
wanted to put a feeler out there before I decided to dig into this in a more 
serious manner.  

Jon  

[1] 
https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
 


Re: Paying off tech debt and correctly naming things

2018-03-22 Thread Josh McKenzie
> Some of us have big patches in flight, things that actually
> pay off some technical debt, and dealing with such renames is rebase hell :\
For sure, but with a code-base this old / organically grown, I expect
this will always be the case. If we're talking something as simple as
an intellij rename refactor, while menial, couldn't someone with a
giant patch just do the same thing on their side and spend half an
hour of their life clicking next? ;)

> That said, there is good time for such renames - it’s during
> those major refactors and rewrites. When you are
> changing a subsystem, might as well do the appropriate renames.
Does that hold true for a code-base with as much static state and
abstraction leaking / bad factoring as we have? (i.e. every
non-trivially aged code-base ever) The combined approach takes a
change that's already non-trivially dealing with complex subsystem
changes and injects a bunch of trivial renaming noise across unrelated
subsystems into the signal of an actual logic refactor.

On Thu, Mar 22, 2018 at 9:31 AM, Aleksey Yeshchenko  wrote:
> Poor and out-of-date naming of things is probably the least serious part of 
> our technical debt. Bad factoring, and straight-up
> poorly written components is where it’s really at.
>
> Doing a big rename for rename sake alone does more harm than it is good, 
> sometimes. Some of us have big patches
> in flight, things that actually pay off some technical debt, and dealing with 
> such renames is rebase hell :\
>
> That said, there is good time for such renames - it’s during those major 
> refactors and rewrites. When you are
> changing a subsystem, might as well do the appropriate renames.
>
> —
> AY
>
> On 20 March 2018 at 22:04:48, Jon Haddad (j...@jonhaddad.com) wrote:
>
> Whenever I hop around in the codebase, one thing that always manages to slow 
> me down is needing to understand the context of the variable names that I’m 
> looking at. We’ve now removed thrift the transport, but the variables, 
> classes and comments still remain. Personally, I’d like to go in and pay off 
> as much technical debt as possible by refactoring the code to be as close to 
> CQL as possible. Rows should be rows, not partitions, I’d love to see the 
> term column family removed forever in favor of always using tables. That 
> said, it’s a big task. I did a quick refactor in a branch, simply changing 
> the ColumnFamilyStore class to TableStore, and pushed it up to GitHub. [1]
>
> Didn’t click on the link? That’s ok. The TL;DR is that it’s almost 2K LOC 
> changed across 275 files. I’ll note that my branch doesn’t change any of the 
> almost 1000 search results of “columnfamilystore” found in the codebase and 
> hundreds of tests failed on my branch in CircleCI, so that 2K LOC change 
> would probably be quite a bit bigger. There is, of course, a lot more than 
> just renaming this one class, there’s thousands of variable names using any 
> manner of “cf”, “cfs”, “columnfamily”, names plus comments and who knows what 
> else. There’s lots of references in probably every file that would have to 
> get updated.
>
> What are people’s thoughts on this? We should be honest with ourselves and 
> know this isn’t going to get any easier over time. It’s only going to get 
> more confusing for new people to the project, and having to figure out “what 
> kind of row am i even looking at” is a waste of time. There’s obviously a 
> much bigger impact than just renaming a bunch of files, there’s any number of 
> patches and branches that would become outdated, plus anyone pulling in 
> Cassandra as a dependency would be affected. I don’t really have a solution 
> for the disruption other than “leave it in place”, but in my mind that’s not 
> a great (or even good) solution.
>
> Anyways, enough out of me. My concern for ergonomics and naming might be 
> significantly higher than the rest of the folks working in the code, and I 
> wanted to put a feeler out there before I decided to dig into this in a more 
> serious manner.
>
> Jon
>
> [1] 
> https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
>  
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Paying off tech debt and correctly naming things

2018-03-22 Thread Jason Brown
Jon,

Thanks for bringing up this topic. I'll admit that I've been around this
code base for long enough, and have enough accumulated history, that I
probably can't fully appreciate the impact for a newcomer wrt naming.
However, as Josh points out, this situation probably happens to "every
non-trivially aged code-base ever".

One thing I'd like to add is that with these types of large refactoring
changes, the review effort is non-trivial. This is because the review still
has to ensure that correctness is preserved and it's easy to overlook a
seemingly innocuous change.

That being said, I am supportive of this effort. However, I believe it's
going to be best, for contributor and reviewer, to break it up into
smaller, more digestible pieces. I'd also like to request that we not go
whole hog and try to do everything in a compressed time frame; reviewer
availability is already stretched thin and I'm afraid of deepening the
review queue, especially mine :)

Thanks,

-Jason




On Thu, Mar 22, 2018 at 6:41 AM, Josh McKenzie  wrote:

> > Some of us have big patches in flight, things that actually
> > pay off some technical debt, and dealing with such renames is rebase
> hell :\
> For sure, but with a code-base this old / organically grown, I expect
> this will always be the case. If we're talking something as simple as
> an intellij rename refactor, while menial, couldn't someone with a
> giant patch just do the same thing on their side and spend half an
> hour of their life clicking next? ;)
>
> > That said, there is good time for such renames - it’s during
> > those major refactors and rewrites. When you are
> > changing a subsystem, might as well do the appropriate renames.
> Does that hold true for a code-base with as much static state and
> abstraction leaking / bad factoring as we have? (i.e. every
> non-trivially aged code-base ever) The combined approach takes a
> change that's already non-trivially dealing with complex subsystem
> changes and injects a bunch of trivial renaming noise across unrelated
> subsystems into the signal of an actual logic refactor.
>
> On Thu, Mar 22, 2018 at 9:31 AM, Aleksey Yeshchenko 
> wrote:
> > Poor and out-of-date naming of things is probably the least serious part
> of our technical debt. Bad factoring, and straight-up
> > poorly written components is where it’s really at.
> >
> > Doing a big rename for rename sake alone does more harm than it is good,
> sometimes. Some of us have big patches
> > in flight, things that actually pay off some technical debt, and dealing
> with such renames is rebase hell :\
> >
> > That said, there is good time for such renames - it’s during those major
> refactors and rewrites. When you are
> > changing a subsystem, might as well do the appropriate renames.
> >
> > —
> > AY
> >
> > On 20 March 2018 at 22:04:48, Jon Haddad (j...@jonhaddad.com) wrote:
> >
> > Whenever I hop around in the codebase, one thing that always manages to
> slow me down is needing to understand the context of the variable names
> that I’m looking at. We’ve now removed thrift the transport, but the
> variables, classes and comments still remain. Personally, I’d like to go in
> and pay off as much technical debt as possible by refactoring the code to
> be as close to CQL as possible. Rows should be rows, not partitions, I’d
> love to see the term column family removed forever in favor of always using
> tables. That said, it’s a big task. I did a quick refactor in a branch,
> simply changing the ColumnFamilyStore class to TableStore, and pushed it up
> to GitHub. [1]
> >
> > Didn’t click on the link? That’s ok. The TL;DR is that it’s almost 2K
> LOC changed across 275 files. I’ll note that my branch doesn’t change any
> of the almost 1000 search results of “columnfamilystore” found in the
> codebase and hundreds of tests failed on my branch in CircleCI, so that 2K
> LOC change would probably be quite a bit bigger. There is, of course, a lot
> more than just renaming this one class, there’s thousands of variable names
> using any manner of “cf”, “cfs”, “columnfamily”, names plus comments and
> who knows what else. There’s lots of references in probably every file that
> would have to get updated.
> >
> > What are people’s thoughts on this? We should be honest with ourselves
> and know this isn’t going to get any easier over time. It’s only going to
> get more confusing for new people to the project, and having to figure out
> “what kind of row am i even looking at” is a waste of time. There’s
> obviously a much bigger impact than just renaming a bunch of files, there’s
> any number of patches and branches that would become outdated, plus anyone
> pulling in Cassandra as a dependency would be affected. I don’t really have
> a solution for the disruption other than “leave it in place”, but in my
> mind that’s not a great (or even good) solution.
> >
> > Anyways, enough out of me. My concern for ergonomics and naming might be
> significantly higher than the rest of the fo

Re: Paying off tech debt and correctly naming things

2018-03-22 Thread Jon Haddad
Cool.  I think there’s general agreement that doing this in as small bites as 
possible is going to be the best approach.  I have no interest in mega patches. 
 

>  The combined approach takes a
> change that's already non-trivially dealing with complex subsystem
> changes and injects a bunch of trivial renaming noise across unrelated
> subsystems into the signal of an actual logic refactor.

I agree.  This is why I like the idea of proactively working to improve the 
readability of the codebase as a specific goal, rather than being wrapped into 
some other unrelated patch.  Keeping the scope in check is the challenge.  
Simple class and method renames, as several have pointed out, is easy enough 
with IDEA.  

I’ll start with class renames, as individual patches for each of them.  I’ll be 
sure to call it out on the ML.  First one will be ColumnFamilyStore -> 
TableStore.  

Jon

> On Mar 22, 2018, at 7:13 AM, Jason Brown  wrote:
> 
> Jon,
> 
> Thanks for bringing up this topic. I'll admit that I've been around this
> code base for long enough, and have enough accumulated history, that I
> probably can't fully appreciate the impact for a newcomer wrt naming.
> However, as Josh points out, this situation probably happens to "every
> non-trivially aged code-base ever".
> 
> One thing I'd like to add is that with these types of large refactoring
> changes, the review effort is non-trivial. This is because the review still
> has to ensure that correctness is preserved and it's easy to overlook a
> seemingly innocuous change.
> 
> That being said, I am supportive of this effort. However, I believe it's
> going to be best, for contributor and reviewer, to break it up into
> smaller, more digestible pieces. I'd also like to request that we not go
> whole hog and try to do everything in a compressed time frame; reviewer
> availability is already stretched thin and I'm afraid of deepening the
> review queue, especially mine :)
> 
> Thanks,
> 
> -Jason
> 
> 
> 
> 
> On Thu, Mar 22, 2018 at 6:41 AM, Josh McKenzie  wrote:
> 
>>> Some of us have big patches in flight, things that actually
>>> pay off some technical debt, and dealing with such renames is rebase
>> hell :\
>> For sure, but with a code-base this old / organically grown, I expect
>> this will always be the case. If we're talking something as simple as
>> an intellij rename refactor, while menial, couldn't someone with a
>> giant patch just do the same thing on their side and spend half an
>> hour of their life clicking next? ;)
>> 
>>> That said, there is good time for such renames - it’s during
>>> those major refactors and rewrites. When you are
>>> changing a subsystem, might as well do the appropriate renames.
>> Does that hold true for a code-base with as much static state and
>> abstraction leaking / bad factoring as we have? (i.e. every
>> non-trivially aged code-base ever) The combined approach takes a
>> change that's already non-trivially dealing with complex subsystem
>> changes and injects a bunch of trivial renaming noise across unrelated
>> subsystems into the signal of an actual logic refactor.
>> 
>> On Thu, Mar 22, 2018 at 9:31 AM, Aleksey Yeshchenko 
>> wrote:
>>> Poor and out-of-date naming of things is probably the least serious part
>> of our technical debt. Bad factoring, and straight-up
>>> poorly written components is where it’s really at.
>>> 
>>> Doing a big rename for rename sake alone does more harm than it is good,
>> sometimes. Some of us have big patches
>>> in flight, things that actually pay off some technical debt, and dealing
>> with such renames is rebase hell :\
>>> 
>>> That said, there is good time for such renames - it’s during those major
>> refactors and rewrites. When you are
>>> changing a subsystem, might as well do the appropriate renames.
>>> 
>>> —
>>> AY
>>> 
>>> On 20 March 2018 at 22:04:48, Jon Haddad (j...@jonhaddad.com) wrote:
>>> 
>>> Whenever I hop around in the codebase, one thing that always manages to
>> slow me down is needing to understand the context of the variable names
>> that I’m looking at. We’ve now removed thrift the transport, but the
>> variables, classes and comments still remain. Personally, I’d like to go in
>> and pay off as much technical debt as possible by refactoring the code to
>> be as close to CQL as possible. Rows should be rows, not partitions, I’d
>> love to see the term column family removed forever in favor of always using
>> tables. That said, it’s a big task. I did a quick refactor in a branch,
>> simply changing the ColumnFamilyStore class to TableStore, and pushed it up
>> to GitHub. [1]
>>> 
>>> Didn’t click on the link? That’s ok. The TL;DR is that it’s almost 2K
>> LOC changed across 275 files. I’ll note that my branch doesn’t change any
>> of the almost 1000 search results of “columnfamilystore” found in the
>> codebase and hundreds of tests failed on my branch in CircleCI, so that 2K
>> LOC change would probably be quite a bit bigger. There is, of course, a lot

RE: Paying off tech debt and correctly naming things

2018-03-22 Thread Kenneth Brotman
I agree with Jason Brown: good topic to address, effect not trivial.  Going 
whole hog too risky.  Careful meticulous small areas at a time with warning to 
everyone so they can chime in if they are working on that area and would prefer 
it left alone for now.  

I wouldn't want it to delay others things.  For example, not sure where 
everyone is on this but, if Reaper was integrated in Cassandra so it just 
worked away in Cassandra without needing any installation or attention of any 
kind - in time for version 4.0 - would be really cool.

Kenneth Brotman

-Original Message-
From: Jason Brown [mailto:jasedbr...@gmail.com] 
Sent: Thursday, March 22, 2018 7:14 AM
To: dev@cassandra.apache.org
Subject: Re: Paying off tech debt and correctly naming things

Jon,

Thanks for bringing up this topic. I'll admit that I've been around this code 
base for long enough, and have enough accumulated history, that I probably 
can't fully appreciate the impact for a newcomer wrt naming.
However, as Josh points out, this situation probably happens to "every 
non-trivially aged code-base ever".

One thing I'd like to add is that with these types of large refactoring 
changes, the review effort is non-trivial. This is because the review still has 
to ensure that correctness is preserved and it's easy to overlook a seemingly 
innocuous change.

That being said, I am supportive of this effort. However, I believe it's going 
to be best, for contributor and reviewer, to break it up into smaller, more 
digestible pieces. I'd also like to request that we not go whole hog and try to 
do everything in a compressed time frame; reviewer availability is already 
stretched thin and I'm afraid of deepening the review queue, especially mine :)

Thanks,

-Jason




On Thu, Mar 22, 2018 at 6:41 AM, Josh McKenzie  wrote:

> > Some of us have big patches in flight, things that actually pay off 
> > some technical debt, and dealing with such renames is rebase
> hell :\
> For sure, but with a code-base this old / organically grown, I expect 
> this will always be the case. If we're talking something as simple as 
> an intellij rename refactor, while menial, couldn't someone with a 
> giant patch just do the same thing on their side and spend half an 
> hour of their life clicking next? ;)
>
> > That said, there is good time for such renames - it’s during those 
> > major refactors and rewrites. When you are changing a subsystem, 
> > might as well do the appropriate renames.
> Does that hold true for a code-base with as much static state and 
> abstraction leaking / bad factoring as we have? (i.e. every 
> non-trivially aged code-base ever) The combined approach takes a 
> change that's already non-trivially dealing with complex subsystem 
> changes and injects a bunch of trivial renaming noise across unrelated 
> subsystems into the signal of an actual logic refactor.
>
> On Thu, Mar 22, 2018 at 9:31 AM, Aleksey Yeshchenko 
> 
> wrote:
> > Poor and out-of-date naming of things is probably the least serious 
> > part
> of our technical debt. Bad factoring, and straight-up
> > poorly written components is where it’s really at.
> >
> > Doing a big rename for rename sake alone does more harm than it is 
> > good,
> sometimes. Some of us have big patches
> > in flight, things that actually pay off some technical debt, and 
> > dealing
> with such renames is rebase hell :\
> >
> > That said, there is good time for such renames - it’s during those 
> > major
> refactors and rewrites. When you are
> > changing a subsystem, might as well do the appropriate renames.
> >
> > —
> > AY
> >
> > On 20 March 2018 at 22:04:48, Jon Haddad (j...@jonhaddad.com) wrote:
> >
> > Whenever I hop around in the codebase, one thing that always manages 
> > to
> slow me down is needing to understand the context of the variable 
> names that I’m looking at. We’ve now removed thrift the transport, but 
> the variables, classes and comments still remain. Personally, I’d like 
> to go in and pay off as much technical debt as possible by refactoring 
> the code to be as close to CQL as possible. Rows should be rows, not 
> partitions, I’d love to see the term column family removed forever in 
> favor of always using tables. That said, it’s a big task. I did a 
> quick refactor in a branch, simply changing the ColumnFamilyStore 
> class to TableStore, and pushed it up to GitHub. [1]
> >
> > Didn’t click on the link? That’s ok. The TL;DR is that it’s almost 
> > 2K
> LOC changed across 275 files. I’ll note that my branch doesn’t change 
> any of the almost 1000 search results of “columnfamilystore” found in 
> the codebase and hundreds of 

RE: Paying off tech debt and correctly naming things

2018-03-22 Thread Kenneth Brotman
Perfect!

-Original Message-
From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
Sent: Thursday, March 22, 2018 8:10 AM
To: dev@cassandra.apache.org
Subject: Re: Paying off tech debt and correctly naming things

Cool.  I think there’s general agreement that doing this in as small bites as 
possible is going to be the best approach.  I have no interest in mega patches. 
 

>  The combined approach takes a
> change that's already non-trivially dealing with complex subsystem 
> changes and injects a bunch of trivial renaming noise across unrelated 
> subsystems into the signal of an actual logic refactor.

I agree.  This is why I like the idea of proactively working to improve the 
readability of the codebase as a specific goal, rather than being wrapped into 
some other unrelated patch.  Keeping the scope in check is the challenge.  
Simple class and method renames, as several have pointed out, is easy enough 
with IDEA.  

I’ll start with class renames, as individual patches for each of them.  I’ll be 
sure to call it out on the ML.  First one will be ColumnFamilyStore -> 
TableStore.  

Jon

> On Mar 22, 2018, at 7:13 AM, Jason Brown  wrote:
> 
> Jon,
> 
> Thanks for bringing up this topic. I'll admit that I've been around 
> this code base for long enough, and have enough accumulated history, 
> that I probably can't fully appreciate the impact for a newcomer wrt naming.
> However, as Josh points out, this situation probably happens to "every 
> non-trivially aged code-base ever".
> 
> One thing I'd like to add is that with these types of large 
> refactoring changes, the review effort is non-trivial. This is because 
> the review still has to ensure that correctness is preserved and it's 
> easy to overlook a seemingly innocuous change.
> 
> That being said, I am supportive of this effort. However, I believe 
> it's going to be best, for contributor and reviewer, to break it up 
> into smaller, more digestible pieces. I'd also like to request that we 
> not go whole hog and try to do everything in a compressed time frame; 
> reviewer availability is already stretched thin and I'm afraid of 
> deepening the review queue, especially mine :)
> 
> Thanks,
> 
> -Jason
> 
> 
> 
> 
> On Thu, Mar 22, 2018 at 6:41 AM, Josh McKenzie  wrote:
> 
>>> Some of us have big patches in flight, things that actually pay off 
>>> some technical debt, and dealing with such renames is rebase
>> hell :\
>> For sure, but with a code-base this old / organically grown, I expect 
>> this will always be the case. If we're talking something as simple as 
>> an intellij rename refactor, while menial, couldn't someone with a 
>> giant patch just do the same thing on their side and spend half an 
>> hour of their life clicking next? ;)
>> 
>>> That said, there is good time for such renames - it’s during those 
>>> major refactors and rewrites. When you are changing a subsystem, 
>>> might as well do the appropriate renames.
>> Does that hold true for a code-base with as much static state and 
>> abstraction leaking / bad factoring as we have? (i.e. every 
>> non-trivially aged code-base ever) The combined approach takes a 
>> change that's already non-trivially dealing with complex subsystem 
>> changes and injects a bunch of trivial renaming noise across 
>> unrelated subsystems into the signal of an actual logic refactor.
>> 
>> On Thu, Mar 22, 2018 at 9:31 AM, Aleksey Yeshchenko 
>> 
>> wrote:
>>> Poor and out-of-date naming of things is probably the least serious 
>>> part
>> of our technical debt. Bad factoring, and straight-up
>>> poorly written components is where it’s really at.
>>> 
>>> Doing a big rename for rename sake alone does more harm than it is 
>>> good,
>> sometimes. Some of us have big patches
>>> in flight, things that actually pay off some technical debt, and 
>>> dealing
>> with such renames is rebase hell :\
>>> 
>>> That said, there is good time for such renames - it’s during those 
>>> major
>> refactors and rewrites. When you are
>>> changing a subsystem, might as well do the appropriate renames.
>>> 
>>> —
>>> AY
>>> 
>>> On 20 March 2018 at 22:04:48, Jon Haddad (j...@jonhaddad.com) wrote:
>>> 
>>> Whenever I hop around in the codebase, one thing that always manages 
>>> to
>> slow me down is needing to understand the context of the variable 
>> names that I’m looking at. We’ve now removed thrift the transport, 
>> but the variables,