Refactoring cassandra service package

2014-06-03 Thread Simon Chemouil
Hi,

I'm new to Cassandra and felt like exploring and hacking on the code. I
was surprised to see the usage of so many mutable/global state statics
all over the service package (basically global variables/singletons).

While I understand it can be practical to work with singletons, and that
in any case I'm not sure multi-tenant Cassandra (as in two different
Cassandra instances within the same process) would make sense at all (or
even work considering there is some native access going on with JNA), I
find static state can easily lead to tangled 'spaghetti' code (accessing
the singletons from anywhere, even where one shouldn't), and in general
it ties the code to the VM instance, rather than to the class.

I tried to find if it was an actual design choice, but from my
understanding this is more something inherited from the early Cassandra
times at Facebook. I just found this thread[1] pointing to issue
CASSANDRA-741 (slightly more limited scope) that was marked as WONTFIX
because no one took it (but still marked as open for contribution). The
current code conventions also don't mention the usage of singletons
except by stating:  "Do not extract interfaces (or abstract classes)
unless you actually need multiple implementations of it" (switching to a
"service"-style design doesn't require passing interfaces but it's
highly encouraged to help testability).

So, I'd like to try to make this refactoring happen and remove all (or
most) mutable static state. It would be an easy way for me to get into
Cassandra's internals (maybe to contribute further). I think it would
help testing (ability to unit test components without going to the
storage for instance) and in general modernize the code. It would also
make hacking on Cassandra easier because people could pick different
pieces without pulling the whole thing.

It would definitely break backwards compatibility with current Java code
that directly embeds Cassandra / uses it as a library, but I would keep
the same abstraction so the refactoring would be easy. In any case,
backwards compatibility can be broken by many more changes than just
refactoring, and once this is done it will be easier to deal with
backwards compatibility.

Obviously all ".instance" fields would be gone, and I'd try to fix
potential cyclic class dependencies and generally make sure class
dependencies form a directed acyclic graph with CassandraDaemon as its
root. The basic idea is to have each 'service' component require all its
service dependencies in its constructor (keeping them as final
fields), rather than getting them via the global namespace (singleton
instances).
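
A minimal sketch of the constructor-based style described above, in plain
Java with no framework. All names here (KeyValueStore, InMemoryStore,
LookupService) are hypothetical and are not Cassandra classes; main() only
stands in for the root of the graph.

```java
import java.util.HashMap;
import java.util.Map;

final class ConstructorInjectionSketch {

    /** Hypothetical storage abstraction; stands in for whatever a service reads from. */
    interface KeyValueStore {
        String read(String key);
    }

    /** In-memory implementation, usable in a unit test without touching disk. */
    static final class InMemoryStore implements KeyValueStore {
        private final Map<String, String> data = new HashMap<>();

        InMemoryStore put(String key, String value) {
            data.put(key, value);
            return this;
        }

        @Override
        public String read(String key) {
            return data.get(key);
        }
    }

    /** A 'service' that receives its dependency in the constructor instead of a static .instance. */
    static final class LookupService {
        private final KeyValueStore store; // final field, assigned exactly once

        LookupService(KeyValueStore store) {
            this.store = store;
        }

        String lookup(String key) {
            String value = store.read(key);
            return value == null ? "<missing>" : value;
        }
    }

    /** Plays the role of the root that wires everything together. */
    public static void main(String[] args) {
        KeyValueStore store = new InMemoryStore().put("k1", "v1");
        LookupService service = new LookupService(store);
        System.out.println(service.lookup("k1")); // prints "v1"
        System.out.println(service.lookup("k2")); // prints "<missing>"
    }
}
```

Because LookupService only sees the interface it was handed, a test can
pass in the in-memory store and never touch real storage.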

If I had it my way, I'd probably use a dependency injection framework,
namely Dagger, which is as far as I know the lightest Java DI framework
actively developed (jointly developed by Square and Google's Java team
responsible for Guice & Guava), and which has a neat compile-time
annotation processor that detects missing dependencies early on. It works
with both Android and J2SE and is very fast, simple and light (65kB vs
710kB for Guice).

So, the question is: would you guys accept such a patch? I'd rather not
do the work if it has no chance of being merged upstream :).

Cheers,

-- 
Simon


[1]
http://grokbase.com/t/cassandra/dev/107xr48hek/creating-two-instances-in-code


Re: Refactoring cassandra service package

2014-06-03 Thread Gary Dusbabek
On Tue, Jun 3, 2014 at 3:52 AM, Simon Chemouil  wrote:

> So, the question is: would you guys accept such a patch? I'd rather not
> do the work if it has no chance of being merged upstream :).
>

This has come up before. Let's face it, removing the singletons is a
tempting proposition.

Several of us have been down the path of trying to do it.

At the end of the day, here's what you'd end up with (absolutely best case):

1. Modifying just about every class, sometimes substantially.
2. A huge patch for someone else to review.
3. No performance gains, no bug fixes.  In fact, since so many classes have
to be changed, I'd say that the risk of introducing a bug/regression is
fairly likely.
4. Complicated merges when bugs need to be fixed in older versions.
5. More modular and testable code.

So far, the positive aspects of 5 have not been able to trump the
challenges presented by 1, 2, 3, and 4.

Kind Regards,

Gary.




Re: Refactoring cassandra service package

2014-06-03 Thread Brandon Williams
Relevant: https://issues.apache.org/jira/browse/CASSANDRA-6881




Re: Refactoring cassandra service package

2014-06-03 Thread Brian O'Neill

Interesting proposition.  We've embedded Cassandra a few times, so I'd be
interested in an approach that makes that easier.

Is there a way to do it incrementally?  Introduce the injection framework,
and convert a few classes (those required for startup), then slowly
convert the remainder?

peanut gallery,
-brian

---
Brian O'Neill
Chief Technology Officer

Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 • healthmarketscience.com


Re: Refactoring cassandra service package

2014-06-03 Thread Jeremy Hanna
There was some hope started in CASSANDRA-6881 - see some of the later comments: 
https://issues.apache.org/jira/browse/CASSANDRA-6881

On 4 Jun 2014, at 04:04, Brian O'Neill  wrote:

> 
> Interesting proposition.  We've embedded Cassandra a few times, so I'd be
> interested in an approach that makes that easier.
> 
> Is there a way to do it incrementally?  Introduce the injection framework,
> and convert a few classes (those required for startup), then slowly
> convert the remainder?
> 
> peanut gallery,
> -brian

Re: Refactoring cassandra service package

2014-06-03 Thread Simon Chemouil
Brandon Williams wrote on 03/06/2014 20:00:
> Relevant: https://issues.apache.org/jira/browse/CASSANDRA-6881

Thanks for the pointer, couldn't find that issue.







Re: Refactoring cassandra service package

2014-06-03 Thread Simon Chemouil
Gary Dusbabek wrote on 03/06/2014 19:59:
> This has come up before. Let's face it, removing the singletons is a
> tempting proposition.
> 
> Several of us have been down the path of trying to do it.
> 
> At the end of the day, here's what you'd end up with (absolutely best case):
> 
> 1. Modifying just about every class, sometimes substantially.
> 2. A huge patch for someone else to review.
> 3. No performance gains, no bug fixes.  In fact, since so many classes have
> to be changed, I'd say that the risk of introducing a bug/regression is
> fairly likely.
> 4. Complicated merges when bugs need to be fixed in older versions.
> 5. More modular and testable code.
> 
> So far, the positive aspects of 5 have not been able to trump the
> challenges presented by 1, 2, 3, and 4.

Thanks for your reply. I understand the reasoning, yet obviously I think
your 5th point weighs a lot more than the others, because it also means
more hackable code, and though a huge patch might be scary, seeing so
many static fields is scary as well ;).

We could make the patch more manageable by splitting it into several
patches, one per 'service', and start from the leaves of the dependency
graph (services not using other services), but we'd have to apply the
patches in a specific order. Still, it would make it easier to review.
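
To illustrate the ordering constraint, here is a rough sketch that derives
a patch order from a "uses" map by depth-first traversal, so that every
service comes after the services it uses. The service names and the
dependency map are made up for the example and do not reflect the real
Cassandra dependency graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PatchOrderSketch {

    /**
     * Orders services so that each one appears after everything it uses.
     * Throws IllegalStateException if the "uses" relation contains a cycle.
     */
    static List<String> leavesFirst(Map<String, List<String>> uses) {
        List<String> order = new ArrayList<>();
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (String service : uses.keySet()) {
            visit(service, uses, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String service, Map<String, List<String>> uses,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(service)) {
            return; // already placed in the order
        }
        if (!inProgress.add(service)) {
            // We came back to a service that is still being expanded: a cycle.
            throw new IllegalStateException("cyclic dependency involving " + service);
        }
        for (String dependency : uses.getOrDefault(service, Collections.<String>emptyList())) {
            visit(dependency, uses, done, inProgress, order);
        }
        inProgress.remove(service);
        done.add(service);
        order.add(service); // appended only after all of its dependencies
    }

    public static void main(String[] args) {
        Map<String, List<String>> uses = new LinkedHashMap<>();
        uses.put("Daemon", Arrays.asList("ServiceA", "ServiceB")); // hypothetical graph
        uses.put("ServiceA", Arrays.asList("ServiceC"));
        uses.put("ServiceB", Arrays.asList("ServiceC"));
        uses.put("ServiceC", Collections.<String>emptyList());
        System.out.println(leavesFirst(uses)); // prints [ServiceC, ServiceA, ServiceB, Daemon]
    }
}
```

Reversing the resulting list gives the opposite reading (services that
nothing else uses come first). The cycle check also flags the cyclic
dependencies that would have to be untangled before any ordering exists.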

The 3rd point is, I guess, usual in the life of a healthy project: fixing
the 'technical debt' the project accumulated over the years seems almost
as important as fixing bugs and improving performance. While doing it,
we would also port the tests and hopefully introduce new tests to make
sure we didn't introduce bugs, and generally be careful we don't break
anything. Yes, there might be new bugs, probably the kind of bugs that
are already there but lurking because of the initialization order or
relying on specific side-effects, but those already exist and might pop
up any time. Refactoring this code seems, for the most part, to be a
fairly repetitive process, and doing it carefully should make it possible
to avoid most bugs introduced by inattention.

Point (4) is an important problem, but merging this would probably mean a
major version bump (e.g., within the 3.0.0 branch), and at some point in
time, older versions will reach their EOL. It seems to me that only bug
fixes are backported, and the patches already have to be adapted for the
1.x branch... But unless people give up entirely on the idea of fixing
this, this problem is going to become worse as time goes by.

Cheers,

Simon





Re: Refactoring cassandra service package

2014-06-03 Thread Simon Chemouil
> (services not using other services)
meant services not used by other services

Simon Chemouil racontait le 03/06/2014 21:20:
> Gary Dusbabek racontait le 03/06/2014 19:59:
>> This has come up before. Let's face it, removing the singletons is a
>> tempting proposition.
>>
>> Several of us have been down the path of trying to do it.
>>
>> At the end of the day, here's what you'd end up with (absolutely best case):
>>
>> 1. Modifying just about every class, sometimes substantially.
>> 2. A huge patch for someone else to review.
>> 3. No performance gains, no bug fixes.  In fact, since so many classes have
>> to be changed, I'd say that the risk of introducing a bug/regression is
>> fairly likely.
>> 4. Complicated merges when bugs need to be fixed in older versions.
>> 5. More modular and testable code.
>>
>> So far, the positive aspects of 5 have not been able to trump the
>> challenges presented by 1, 2, 3, and 4.
> 
> Thanks for your reply. I understand the reasoning, yet obviously I think
> your 5th point weighs a lot more than the others, because it also means
> more hackable code, and though a huge patch might be scary, seeing so
> many static fields is scary as well ;).
> 
> We could make the patch more manageable by splitting it in several
> patches, one per 'service', and start from the leaves of the dependency
> graph (services not using other services), but we'd have to apply the
> patches in a specific order. Still, it would make it easier to review.
> 
> The 3rd point is I guess usual in the life of an healthy project: fixing
> the 'technical debt' the project accumulated over the years seems almost
> as important as fixing bugs and improving performance. While doing it,
> we would also port the tests and hopefully introduce new tests to make
> sure we didn't introduce bugs, and generally be careful we don't break
> anything. Yes, there might be new bugs, probably the kind of bugs that
> are already there but lurking because of the initialization order or
> relying on specific side-effects, but those already exist and might pop
> out any time. Refactoring this code seems, for the most part, to be a
> fairly repetitive process, and doing it carefully should make it
> possible to avoid most bugs introduced by inattention.
> 
> Point (4) is an important problem, but merging this would probably mean a
> major version bump (e.g., within the 3.0.0 branch), and at some point in
> time, older versions will reach their EOL. It seems to me that only bug
> fixes are backported, and the patches already have to be adapted for the
> 1.x branch... But unless people give up entirely on the idea of fixing
> this, this problem is going to become worse as time goes by.
> 
> Cheers,
> 
> Simon
> 
> 
>> Kind Regards,
>>
>> Gary.
>>
>>
>>>
>>> Cheers,
>>>
>>> --
>>> Simon
>>>
>>>
>>> [1]
>>>
>>> http://grokbase.com/t/cassandra/dev/107xr48hek/creating-two-instances-in-code
>>>
>>
> 
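
The leaf-first ordering suggested in the quoted message (patch the services that use no other services first, then the services that use them) amounts to a topological sort of the service dependency graph. Below is a minimal sketch in Java 8+, assuming the dependencies have already been worked out by hand. The service names and edges are hypothetical placeholders, not Cassandra's actual dependency graph, and `PatchOrder` / `leafFirstOrder` are illustrative names only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PatchOrder {
    /**
     * Orders services so that each one appears after every service it uses.
     * deps maps a service name to the services it depends on directly.
     */
    static List<String> leafFirstOrder(Map<String, List<String>> deps) {
        // remaining: number of not-yet-emitted dependencies per service
        Map<String, Integer> remaining = new TreeMap<>();
        // usedBy: reverse edges, dependency -> services that use it
        Map<String, List<String>> usedBy = new HashMap<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            remaining.putIfAbsent(e.getKey(), 0);
            for (String dep : e.getValue()) {
                remaining.putIfAbsent(dep, 0);
                remaining.merge(e.getKey(), 1, Integer::sum);
                usedBy.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Leaves (services that use no other service) can be patched first.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String service = ready.remove();
            order.add(service);
            // A user becomes patchable once all of its dependencies are done.
            for (String user : usedBy.getOrDefault(service, Collections.emptyList())) {
                if (remaining.merge(user, -1, Integer::sum) == 0) {
                    ready.add(user);
                }
            }
        }

        if (order.size() != remaining.size()) {
            // Mutually dependent services cannot be split into ordered patches.
            throw new IllegalStateException("dependency cycle among services");
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical services and edges, for illustration only.
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("Storage", Arrays.asList("Gossip", "Messaging"));
        deps.put("Gossip", Arrays.asList("Messaging"));
        deps.put("Messaging", Arrays.asList("Config"));
        deps.put("Config", Collections.<String>emptyList());
        // prints [Config, Messaging, Gossip, Storage]
        System.out.println(leafFirstOrder(deps));
    }
}
```

A cycle in the graph makes the sketch throw, which would be a sign that those services need to be untangled, or patched together, before a strict one-patch-per-service order is possible.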



Re: Refactoring cassandra service package

2014-06-03 Thread Dave Brosius

This is a relatively stupid comment, but I'm good at that so here goes.

The solution to several of these issues is,

Refactor 1 singleton use case, and post a patch.

1) fewer classes are involved
2) the review is more likely to be understood by one person
3) less chance for regression
4) marginally more modular code

When that goes well, pick another one.

There still is the issue of

 >> 4. Complicated merges when bugs need to be fixed in older versions.



Re: Refactoring cassandra service package

2014-06-03 Thread Jacob Rhoden


> On 4 Jun 2014, at 7:24 am, Dave Brosius wrote:
> 
> The solution to several of these issues is,
> 
> Refactor 1 singleton use case, and post a patch.

+1 do it slowly

-1 for adding a dependency on a dependency injection framework. It's not 
strictly needed if your goal is to remove singletons and improve 
testability.

> 
> 1) fewer classes are involved
> 2) the review is more likely to be understood by one person
> 3) less chance for regression
> 4) marginally more modular code
> 
> When that goes well, pick another one.
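
The two suggestions above (refactor one singleton use at a time, and do it without a dependency injection framework) can be illustrated with plain constructor injection. This is a minimal sketch only: `Settings`, `GlobalSettings`, `TimeoutCheckerBefore` and `TimeoutChecker` are hypothetical names invented for the example and do not correspond to actual Cassandra classes.

```java
// Hypothetical configuration holder, standing in for some piece of global state.
class Settings {
    private final int timeoutMillis;

    Settings(int timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    int timeoutMillis() {
        return timeoutMillis;
    }
}

// Before: the collaborator is reached through mutable static state.
class GlobalSettings {
    static Settings instance = new Settings(10_000);
}

class TimeoutCheckerBefore {
    boolean expired(long elapsedMillis) {
        // Hidden dependency: a test has to mutate the global to control this.
        return elapsedMillis > GlobalSettings.instance.timeoutMillis();
    }
}

// After: the same logic, with the dependency passed in by the caller.
class TimeoutChecker {
    private final Settings settings;

    TimeoutChecker(Settings settings) {
        this.settings = settings;
    }

    boolean expired(long elapsedMillis) {
        return elapsedMillis > settings.timeoutMillis();
    }
}

class Wiring {
    public static void main(String[] args) {
        // Wiring is done by hand at startup; no framework is involved.
        Settings settings = new Settings(10_000);
        TimeoutChecker checker = new TimeoutChecker(settings);
        System.out.println(checker.expired(12_000)); // prints true

        // A unit test can build the class with whatever settings it needs.
        TimeoutChecker strict = new TimeoutChecker(new Settings(50));
        System.out.println(strict.expired(50)); // prints false
    }
}
```

During an incremental migration the old static accessor could stay in place for callers that have not been converted yet, so each patch touches only one class and its direct users.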


Re: Refactoring cassandra service package

2014-06-03 Thread Dave Brosius



It also means that problems that are introduced are bisectable, which 
given these changes would be a real benefit.


It also means if the process doesn't go well, you won't hate reviewers 
for life.




Re: Cassandra internals bootcamp, Sept 12-13, San Francisco

2014-06-03 Thread Jonathan Ellis
Update:

I'm planning to send out confirmations for half of the seats at the
end of June, and the other half two weeks later.  We are definitely
going to fill this up, so don't wait until the last minute to apply!

On Mon, May 5, 2014 at 10:16 PM, Jonathan Ellis wrote:
> Want to contribute to Cassandra but don't know where to start?
>
> For the first time, we'll be running a bootcamp for new Cassandra
> contributors immediately after the Cassandra Summit this September.
> This is NOT a projectors-and-powerpoint conference.  The best way to
> learn a new project is to hack on it, and that's what we'll be doing.
>
> We won't be throwing you in the deep end.  Actually, we will, but
> we'll give you some pointers on swimming technique first.  Friday
> morning will cover overviews of different areas of the Cassandra
> storage engine and query processing.  Then Friday afternoon we'll have
> everyone working on a single LHF ("low hanging fruit") ticket as a
> warm up.  Saturday we'll open things up for working on any Cassandra
> ticket individually or in groups.  Both days, we'll have Cassandra
> committers circulating the room to answer questions and get you
> un-stuck.
>
> Attendance is free, but to make sure we can give people individual
> attention as needed, we're limiting this to 40 attendees.  We also
> have some prerequisites in the interest of getting the most out of our
> time together:
>
> - Have a strong core Java background including familiarity with
> java.util.concurrent
>
> - Bring a laptop with the Cassandra source checked out and ready to
> run in your IDE of choice (see
> http://wiki.apache.org/cassandra/HowToContribute for links to
> instructions for Eclipse and Intellij).  We will NOT be covering this
> during the boot camp, so come prepared.
>
> - Set up an Apache Cassandra JIRA account ahead of time
> (https://issues.apache.org/jira/browse/CASSANDRA).
>
> - Read the Dynamo and Annotated Cassandra papers ahead of
> time (http://aws.amazon.com/dynamodb/,
> http://www.datastax.com/documentation/articles/cassandra/cassandrathenandnow.html)
>
> - Browse the low-hanging fruit tickets
> (https://issues.apache.org/jira/issues/?jql=project%20%3D%2012310865%20AND%20labels%20%3D%20lhf%20AND%20status%20!%3D%20resolved)
> and have an idea of what you want to work on in day 2.
>
> Apply at http://learn.datastax.com/CassandraSummitBootcampApplication.html
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced