I agree that the refactoring make sense but I also know the pain of merging branches with smaller refactorings. When you have to make a patch for several branches and that you have to go through several round of reviews it can be pretty painful. I know that pain too well so I am not in favor of the bing bang approach.
On Wed, Mar 21, 2018 at 9:05 AM, kurt greaves <k...@instaclustr.com> wrote: > As someone who came to the codebase post CQL but prior to thrift being > removed, +1 to refactor. The current mixing of terminology is a complete > nightmare. This would also give a good opportunity document a lot of code > that simply isn't documented (or incorrect). I'd say it's worth doing it in > multiple steps though, such as refactor of a single class at a time, then > followed by refactor of variable names. We've already done one pretty big > refactor (InetAddressAndPort) for 4.0, I don't see how a few more could > make it any worse (lol). > > Row vs partition vs key vs PK is killing me > > On 20 March 2018 at 22:04, Jon Haddad <j...@jonhaddad.com> wrote: > > > Whenever I hop around in the codebase, one thing that always manages to > > slow me down is needing to understand the context of the variable names > > that I’m looking at. We’ve now removed thrift the transport, but the > > variables, classes and comments still remain. Personally, I’d like to go > > in and pay off as much technical debt as possible by refactoring the code > > to be as close to CQL as possible. Rows should be rows, not partitions, > > I’d love to see the term column family removed forever in favor of always > > using tables. That said, it’s a big task. I did a quick refactor in a > > branch, simply changing the ColumnFamilyStore class to TableStore, and > > pushed it up to GitHub. [1] > > > > Didn’t click on the link? That’s ok. The TL;DR is that it’s almost 2K > > LOC changed across 275 files. I’ll note that my branch doesn’t change > any > > of the almost 1000 search results of “columnfamilystore” found in the > > codebase and hundreds of tests failed on my branch in CircleCI, so that > 2K > > LOC change would probably be quite a bit bigger. There is, of course, a > > lot more than just renaming this one class, there’s thousands of variable > > names using any manner of “cf”, “cfs”, “columnfamily”, names plus > comments > > and who knows what else. There’s lots of references in probably every > file > > that would have to get updated. > > > > What are people’s thoughts on this? We should be honest with ourselves > > and know this isn’t going to get any easier over time. It’s only going > to > > get more confusing for new people to the project, and having to figure > out > > “what kind of row am i even looking at” is a waste of time. There’s > > obviously a much bigger impact than just renaming a bunch of files, > there’s > > any number of patches and branches that would become outdated, plus > anyone > > pulling in Cassandra as a dependency would be affected. I don’t really > > have a solution for the disruption other than “leave it in place”, but in > > my mind that’s not a great (or even good) solution. > > > > Anyways, enough out of me. My concern for ergonomics and naming might be > > significantly higher than the rest of the folks working in the code, and > I > > wanted to put a feeler out there before I decided to dig into this in a > > more serious manner. > > > > Jon > > > > [1] > > https://github.com/apache/cassandra/compare/trunk... > rustyrazorblade:refactor_column_family_store?expand=1 > > < > > https://github.com/apache/cassandra/compare/trunk... > rustyrazorblade:refactor_column_family_store?expand=1 > > > >