Re: Paying off tech debt and correctly naming things

Jeff Jirsa Wed, 21 Mar 2018 17:56:38 -0700

Please please please ping the list and ask if anyone has big commits ready to 
merge before actually committing any huge automated refactors - people who may 
be sitting on big patches will thank you if they don’t have to rebase against 
huge IntelliJ refactors .


-- 
Jeff Jirsa


> On Mar 21, 2018, at 5:49 PM, Jeremiah D Jordan <[email protected]> 
> wrote:
> 
> +1 if you are willing to take it on.  As the person who performed the 
> Table->Keyspace rename of 2.0, I say good luck!  From hindsight of doing 
> that, as others suggested, I would come at this in multiple tickets.
> I would suggest a simple class rename with intellij refactoring tools or 
> something as the first ticket.  This is going to touch the most files at 
> once, but will be mechanical, and for the most part if it compiles it was 
> right :).
> After you have done that you can take on other renaming of things with a 
> smaller scope.
> Also as others have said the main things to be wary of are the naming of 
> things in JMX metrics.  Ideally we would keep around deprecated aliases of 
> the old JMX names for a release before removing them.  The other thing is to 
> watch out for class names in byte man scripts in dtest.
> 
> -Jeremiah
> 
>> On Mar 21, 2018, at 4:48 AM, Sylvain Lebresne <[email protected]> wrote:
>> 
>> I really don't think anyone has been recently against such renaming, and in
>> fact, a _lot_ of renaming *has* already happen over time. The problem, as
>> you carefully noted, is that it's such a big task that there is still a lot
>> to do. Anyway, I've yet to see a patch renaming things to match the CQL
>> naming scheme be rejected, so I'd personally encourage such submission. But
>> maybe with a few caveats (already mentioned largely, so repeating here to
>> signify my personal agreement with them):
>> - renaming with large surface area can be painful for ongoing patches or
>> even future merge. That's not a reason for not doing them, but that's imo a
>> good enough reason to do things incrementally/in as-small-as-reasonable
>> steps. Making sure a renaming commit only does renaming and doesn't change
>> the logic is also pretty nice when you rebase such things.
>> - breaking hundreds of tests is obviously not ok :)
>> - pure code renaming is one reasonably simple aspect, but quite a few
>> renaming may have user visible impact. Particularly around JMX where many
>> things are name based on their class, and to a lesser extend some of our
>> tools still use "old" naming. We can't and shouldn't ignore those impact:
>> such user visible changes should imo be documented, and we should make sure
>> we have a reasonably painless (and thus incremental) upgrade path. My hunch
>> is the latter isn't as simple as it seems.
>> 
>> 
>> --
>> Sylvain
>> 
>> 
>>> On Wed, Mar 21, 2018 at 9:06 AM kurt greaves <[email protected]> wrote:
>>> 
>>> As someone who came to the codebase post CQL but prior to thrift being
>>> removed, +1 to refactor. The current mixing of terminology is a complete
>>> nightmare. This would also give a good opportunity document a lot of code
>>> that simply isn't documented (or incorrect). I'd say it's worth doing it in
>>> multiple steps though, such as refactor of a single class at a time, then
>>> followed by refactor of variable names. We've already done one pretty big
>>> refactor (InetAddressAndPort) for 4.0, I don't see how a few more could
>>> make it any worse (lol).
>>> 
>>> Row vs partition vs key vs PK is killing me
>>> 
>>>> On 20 March 2018 at 22:04, Jon Haddad <[email protected]> wrote:
>>>> 
>>>> Whenever I hop around in the codebase, one thing that always manages to
>>>> slow me down is needing to understand the context of the variable names
>>>> that I’m looking at.  We’ve now removed thrift the transport, but the
>>>> variables, classes and comments still remain.  Personally, I’d like to go
>>>> in and pay off as much technical debt as possible by refactoring the code
>>>> to be as close to CQL as possible.  Rows should be rows, not partitions,
>>>> I’d love to see the term column family removed forever in favor of always
>>>> using tables.  That said, it’s a big task.  I did a quick refactor in a
>>>> branch, simply changing the ColumnFamilyStore class to TableStore, and
>>>> pushed it up to GitHub. [1]
>>>> 
>>>> Didn’t click on the link?  That’s ok.  The TL;DR is that it’s almost 2K
>>>> LOC changed across 275 files.  I’ll note that my branch doesn’t change
>>> any
>>>> of the almost 1000 search results of “columnfamilystore” found in the
>>>> codebase and hundreds of tests failed on my branch in CircleCI, so that
>>> 2K
>>>> LOC change would probably be quite a bit bigger.  There is, of course, a
>>>> lot more than just renaming this one class, there’s thousands of variable
>>>> names using any manner of “cf”, “cfs”, “columnfamily”, names plus
>>> comments
>>>> and who knows what else.  There’s lots of references in probably every
>>> file
>>>> that would have to get updated.
>>>> 
>>>> What are people’s thoughts on this?  We should be honest with ourselves
>>>> and know this isn’t going to get any easier over time.  It’s only going
>>> to
>>>> get more confusing for new people to the project, and having to figure
>>> out
>>>> “what kind of row am i even looking at” is a waste of time.  There’s
>>>> obviously a much bigger impact than just renaming a bunch of files,
>>> there’s
>>>> any number of patches and branches that would become outdated, plus
>>> anyone
>>>> pulling in Cassandra as a dependency would be affected.  I don’t really
>>>> have a solution for the disruption other than “leave it in place”, but in
>>>> my mind that’s not a great (or even good) solution.
>>>> 
>>>> Anyways, enough out of me.  My concern for ergonomics and naming might be
>>>> significantly higher than the rest of the folks working in the code, and
>>> I
>>>> wanted to put a feeler out there before I decided to dig into this in a
>>>> more serious manner.
>>>> 
>>>> Jon
>>>> 
>>>> [1]
>>>> 
>>> https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
>>>> <
>>>> 
>>> https://github.com/apache/cassandra/compare/trunk...rustyrazorblade:refactor_column_family_store?expand=1
>>>>> 
>>> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: Paying off tech debt and correctly naming things

Reply via email to