Re: Paying off tech debt and correctly naming things

Benjamin Lerer Wed, 21 Mar 2018 01:47:46 -0700

I agree that the refactoring make sense but I also know the pain of merging
branches with smaller refactorings. When you have to make a patch for
several branches and that you have to go through several round of reviews
it can be pretty painful.
I know that pain too well so I am not in favor of the bing bang approach.






On Wed, Mar 21, 2018 at 9:05 AM, kurt greaves <[email protected]> wrote:

> As someone who came to the codebase post CQL but prior to thrift being
> removed, +1 to refactor. The current mixing of terminology is a complete
> nightmare. This would also give a good opportunity document a lot of code
> that simply isn't documented (or incorrect). I'd say it's worth doing it in
> multiple steps though, such as refactor of a single class at a time, then
> followed by refactor of variable names. We've already done one pretty big
> refactor (InetAddressAndPort) for 4.0, I don't see how a few more could
> make it any worse (lol).
>
> Row vs partition vs key vs PK is killing me
>
> On 20 March 2018 at 22:04, Jon Haddad <[email protected]> wrote:
>
> > Whenever I hop around in the codebase, one thing that always manages to
> > slow me down is needing to understand the context of the variable names
> > that I’m looking at.  We’ve now removed thrift the transport, but the
> > variables, classes and comments still remain.  Personally, I’d like to go
> > in and pay off as much technical debt as possible by refactoring the code
> > to be as close to CQL as possible.  Rows should be rows, not partitions,
> > I’d love to see the term column family removed forever in favor of always
> > using tables.  That said, it’s a big task.  I did a quick refactor in a
> > branch, simply changing the ColumnFamilyStore class to TableStore, and
> > pushed it up to GitHub. [1]
> >
> > Didn’t click on the link?  That’s ok.  The TL;DR is that it’s almost 2K
> > LOC changed across 275 files.  I’ll note that my branch doesn’t change
> any
> > of the almost 1000 search results of “columnfamilystore” found in the
> > codebase and hundreds of tests failed on my branch in CircleCI, so that
> 2K
> > LOC change would probably be quite a bit bigger.  There is, of course, a
> > lot more than just renaming this one class, there’s thousands of variable
> > names using any manner of “cf”, “cfs”, “columnfamily”, names plus
> comments
> > and who knows what else.  There’s lots of references in probably every
> file
> > that would have to get updated.
> >
> > What are people’s thoughts on this?  We should be honest with ourselves
> > and know this isn’t going to get any easier over time.  It’s only going
> to
> > get more confusing for new people to the project, and having to figure
> out
> > “what kind of row am i even looking at” is a waste of time.  There’s
> > obviously a much bigger impact than just renaming a bunch of files,
> there’s
> > any number of patches and branches that would become outdated, plus
> anyone
> > pulling in Cassandra as a dependency would be affected.  I don’t really
> > have a solution for the disruption other than “leave it in place”, but in
> > my mind that’s not a great (or even good) solution.
> >
> > Anyways, enough out of me.  My concern for ergonomics and naming might be
> > significantly higher than the rest of the folks working in the code, and
> I
> > wanted to put a feeler out there before I decided to dig into this in a
> > more serious manner.
> >
> > Jon
> >
> > [1]
> > https://github.com/apache/cassandra/compare/trunk...
> rustyrazorblade:refactor_column_family_store?expand=1
> > <
> > https://github.com/apache/cassandra/compare/trunk...
> rustyrazorblade:refactor_column_family_store?expand=1
> > >
>

Re: Paying off tech debt and correctly naming things

Reply via email to