Re: How is Cassandra being used?

2011-11-16 Thread Peter Tillotson
I've read through the thread and have a few comments and an idea.

1) I can understand a preference for opt in
2) As a user I would have probably opted in every time I hit a performance issue
3) Opt in may well be skewed to poorer use cases or hardware issues
4) There is a trust gap that needs to be bridged before opt out is acceptable

Now for the idea: perhaps a report tool in nodetool that generates a
human-readable profile. In the short term it would feed a manual submission
process; perhaps down the line it could be fully automated.

So basically there are two good plans in your email
1) Standard reporting  (+1)
2) Automated feedback (opt in +1)

 p



From: Jonathan Ellis jbel...@gmail.com
To: dev dev@cassandra.apache.org
Sent: Tuesday, 15 November 2011, 23:23
Subject: How is Cassandra being used?

I started a users survey thread over on the users list (replies are
still trickling in), but as useful as that is, I'd like to get
feedback that is more quantitative and with a broader base.  This will
let us prioritize our development efforts to better address what
people are actually using it for, with less guesswork.  For instance:
we put a lot of effort into compression for 1.0.0; if it turned out
that only 1% of 1.0.x users actually enable compression, then it means
that we should spend less effort fine-tuning that moving forward, and
use the energy elsewhere.

(Of course it could also mean that we did a terrible job getting the
word out about new features and explaining how to use them, but either
way, it would be good to know!)

I propose adding a basic cluster reporting feature to cassandra.yaml,
enabled by default.  It would send anonymous information about your
cluster to an apache.org VM.  Information like, number (but not names)
of keyspaces and columnfamilies, ks-level options like compression, cf
options like compaction strategy, data types (again, not names) of
columns, average row size (or better: the histogram data), and average
sstables per read.

Thoughts?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
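
As a rough illustration of the kind of report proposed above, here is a minimal Python sketch. The schema layout, the `build_report` function, and every field name are assumptions made for illustration only; none of this is an existing Cassandra API. The point is that the summary keeps counts, options, and data types while dropping every keyspace, column family, and column name.

```python
from collections import Counter


def build_report(schema):
    """Summarize a schema without including keyspace, CF, or column names.

    `schema` is a hypothetical dict: {keyspace_name: {cf_name: cf_options}}.
    """
    column_families = [cf for ks in schema.values() for cf in ks.values()]
    return {
        "keyspaces": len(schema),
        "column_families": len(column_families),
        # Number of column families with compression turned on.
        "compression_enabled": sum(
            1 for cf in column_families if cf.get("compression")
        ),
        # How often each compaction strategy is used.
        "compaction_strategies": dict(
            Counter(cf["compaction_strategy"] for cf in column_families)
        ),
        # Column data types only; the column names are discarded.
        "column_types": dict(
            Counter(
                col_type
                for cf in column_families
                for col_type in cf.get("columns", {}).values()
            )
        ),
    }


# Hypothetical example input.
example_schema = {
    "users_ks": {
        "profiles": {
            "compression": True,
            "compaction_strategy": "SizeTiered",
            "columns": {"email": "UTF8Type", "age": "LongType"},
        },
        "sessions": {
            "compression": False,
            "compaction_strategy": "Leveled",
            "columns": {"token": "UTF8Type"},
        },
    },
    "logs_ks": {
        "events": {
            "compression": True,
            "compaction_strategy": "SizeTiered",
            "columns": {},
        },
    },
}

report = build_report(example_schema)
```

Row-size histograms and sstables-per-read figures are left out of the sketch because their source is not described in the proposal.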

Re: How is Cassandra being used?

2011-11-16 Thread Eric Evans
On Wed, Nov 16, 2011 at 2:01 AM, Jonathan Ellis jbel...@gmail.com wrote:
 On Tue, Nov 15, 2011 at 7:02 PM, Eric Evans eev...@acunu.com wrote:
 I think this is potentially quite dangerous; There are a lot of people
 who get very twitchy at the idea of software that Phones Home.  I've
 seen this so many times, and in all cases it was for software a lot
 less sensitive than a database.

 True, but unlike most Home Phoners, ours will be out there in the open
 and you can see exactly what it's sending (or not, if you disable it).
  I'm sure there's other examples in the wild of this, but the only one
 I can think of is popcon [1].

I don't think the transparency of the implementation changes things
much.  It's still going to be opaque to a lot of folks, and more
important is the precedent it sets and the way it changes the
project/user trust relationship.

Even if you're satisfied with the implementation, and trust that it
won't be extended to transmit additional data later (unintentionally
or otherwise), there are still very valid privacy concerns.  For
example, seeing as how this must be transmitted over an IP network,
there are only so many guarantees you can make with respect to
anonymity.  There will always be *someone* that can tie the data to a
unique IP, and an IP can almost always be tied to an individual or
organization.  Imagine an organization that doesn't want *anyone* to
know it uses Cassandra, and isn't willing to accept the risk that one
of their admins might accidentally enable this reporting.

It's also interesting that you mention popcon because it has always
been contentious.  It's taken years for it to transition from the
point where it required users to install it themselves, to a prompt at
install-time that defaulted to No, to the current state of an
install-time prompt that defaults to Yes.  And, the installer asks
*very* few questions; Whether or not popcon is enabled is on par with
partitioning and the assignment of a root password.

Also, there should be no shame in the admission that we haven't earned
anywhere near the level of trust and respect that the Debian project
has.

 More broadly, my sense is that people are getting used to the idea
 that it's okay to give away anonymous statistics as part of the price
 of free, although YMMclearlyV. I am, after all, a Windows user. :)

As privacy becomes more threatened people are either capitulating, or
becoming even more defensive; Whether that makes it better or worse
for us if we do this is debatable.

 I'm sure you've already considered this though, you're already talking
 about anonymity, and transparency, and what I assume is neutrality of
 the collection endpoint (can apache actually provide a VM; is that a
 thing?).

 Yes, they provide Ubuntu or FreeBSD VMs.

 I'm just afraid that we'll scare people off before they can
 be properly convinced that it's all on the up-and-up.

 How would you propose addressing this?

Honestly?  The best way to convince people that we take the privacy of
their data seriously is to not transmit any of it to a machine outside
their control.

 I'm curious to see what others think, but at the moment I'm hovering
 somewhere around a -0 if it were opt-in (off by default).

 I'm okay with opt-in if you think that's useful as a first step to
 ease the twitchiness you mention, but longer term I think it's only
 really useful if it's on by default. There's a lot of research that
 shows that people tend to stick with whatever is the path of least
 resistance [2], and specifically, my experience with Cassandra users
 is exactly that -- one reason we've spent so much effort getting
 defaults so good is because almost nobody goes beyond that.

It's even worse than that.  It's not just that you'll be receiving
less data, it will also be less meaningful (since it's from a
self-selecting group).

 [1] http://popcon.debian.org/
 [2] 
 http://www.richmondfed.org/publications/research/region_focus/2007/winter/pdf/feature2.pdf

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: How is Cassandra being used?

2011-11-16 Thread Brian O'Neill
Lively thread...

+1 opt-in
+1 in separate module

I'll just substantiate Rick Shaw's comments.  If this is on by default, I
can see it making its way into production at a large corporation, at which
time the traffic would sound an alarm as suspicious activity, which would
immediately get the server's plug pulled and trigger an investigation.
That would land the architect responsible for deploying that server in the
proverbial principal's office.  In the extreme case, that might
black-list the technology and add fuel to any debate that the corporation
should just stick with the 'proven enterprise' solutions.  That is not my
perspective; just be aware that in some large corporations it is an uphill
battle to deploy Cassandra in the first place given incumbent systems.

In every situation I've been in, even outside of large corporations, we
would need to disable this feature given the sensitivity of the data.

All that said... I would love to see this data. ;)
I'd love to know where our deployment lies on the spectrum of use.

Maybe a good old fashioned web form that allows companies to submit their
usage scenarios might accomplish the same goal? (and you could get
additional context information about the industry, etc.)  It wouldn't be
comprehensive, but it may be sufficiently representative.  Maybe you could
just output a couple of lines at server start that said something like "Go
here http://... to see how your usage compares to others."

I personally wouldn't throw too big a hissy if it was incorporated into the
actual server and on by default, but I certainly know others that would.

-brian



Re: How is Cassandra being used?

2011-11-16 Thread Jake Luciani
Having worked at places where you get fired if software *attempts* to
contact the outside world, I understand the concerns.

However, if it's opt-in via config file and requires a restart then there
is no reason why it should be a concern.
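
A minimal Python sketch of what "opt-in via config file, requires a restart" could look like. The option name `usage_reporting_enabled` and the `StartupSettings` class are hypothetical; no such setting exists in cassandra.yaml. The flag defaults to off and is read exactly once at startup, so changing it later has no effect until the process is restarted.

```python
def reporting_enabled(config):
    """Return True only if the operator explicitly set the flag to True.

    `config` is a dict parsed from the config file. A missing key, or any
    value other than the boolean True, leaves reporting disabled.
    """
    return config.get("usage_reporting_enabled", False) is True


class StartupSettings:
    """Snapshot of settings taken once when the process starts."""

    def __init__(self, config):
        # Captured at construction time; later edits to `config` are ignored,
        # which models the "requires a restart" behaviour.
        self.reporting = reporting_enabled(config)


config = {}                            # flag absent from the config file
settings = StartupSettings(config)
config["usage_reporting_enabled"] = True   # edited while running
still_off = settings.reporting             # unchanged until a restart
after_restart = StartupSettings(config).reporting
```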


On Wed, Nov 16, 2011 at 3:29 AM, Zhu Han schumi@gmail.com wrote:

 On Wed, Nov 16, 2011 at 3:03 PM, Norman Maurer nor...@apache.org wrote:

  2011/11/16 Jonathan Ellis jbel...@gmail.com:
   I started a users survey thread over on the users list (replies are
   still trickling in), but as useful as that is, I'd like to get
   feedback that is more quantitative and with a broader base.  This will
   let us prioritize our development efforts to better address what
   people are actually using it for, with less guesswork.  For instance:
   we put a lot of effort into compression for 1.0.0; if it turned out
   that only 1% of 1.0.x users actually enable compression, then it means
   that we should spend less effort fine-tuning that moving forward, and
   use the energy elsewhere.
  
   (Of course it could also mean that we did a terrible job getting the
   word out about new features and explaining how to use them, but either
   way, it would be good to know!)
  
   I propose adding a basic cluster reporting feature to cassandra.yaml,
   enabled by default.  It would send anonymous information about your
   cluster to an apache.org VM.  Information like, number (but not names)
   of keyspaces and columnfamilies, ks-level options like compression, cf
   options like compaction strategy, data types (again, not names) of
   columns, average row size (or better: the histogram data), and average
   sstables per read.
  
   Thoughts?
 

 -1.

 It may scare some admins who store sensitive data in Cassandra. Even if
 it can be disabled, we cannot sleep well at night when we know the door
 can be opened unintentionally...


  Hi there,
 
  I'm not a Cassandra dev but a user of it. I would really hate to
  see such code in the Cassandra code-base. I understand that it would
  be kind of useful to get a better feeling about usage etc, but it's
  really something that scares the shit out of many managers (and even
  devs ;) ).
 
  So -1 to add this code (*non-binding)
 
  Bye,
  Norman
 




-- 
http://twitter.com/tjake


Re: How is Cassandra being used?

2011-11-16 Thread Eric Evans
On Wed, Nov 16, 2011 at 2:59 PM, Jonathan Ellis jbel...@gmail.com wrote:
 On Wed, Nov 16, 2011 at 8:46 AM, Gary Dusbabek gdusba...@gmail.com wrote:
 Here is what should determine where energy is spent:  if enough people
 are willing to expend the effort to voice their concerns about feature
 X in JIRA and on the mailing list, and there are people willing to do
 the technical work, and it doesn't represent a technical Wrong Turn
 for the project, then it should (it will) get worked on.

 Well, sort of.  I'm *willing* to work on all or most of the 217 open
 Cassandra tickets, but since I don't have time to do them all I need
 to prioritize aggressively.  My motivation here is to get more data
 for that prioritization, which so far has been mostly guided by
 intuition.

 It sounds like your implicit assumption is that jira + mailing list
 are a good enough approximation for who-is-using-what, but I'm not
 sure that's the case.

There probably is a rather large group of shadow users whose
(valuable?) input doesn't make it to the list or bug tracker.  It
sounds like Gary is questioning whether we should be giving these
people a voice.  Assuming I have that right, I agree that's a very
good question.  This is a community-based project after all.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: How is Cassandra being used?

2011-11-16 Thread Jonathan Ellis
On Wed, Nov 16, 2011 at 10:56 AM, Eric Evans eev...@acunu.com wrote:
 There probably is a rather large group of shadow users whose
 (valuable?) input doesn't make it to the list or bug tracker.  It
 sounds like Gary is questioning whether we should be giving these
 people a voice.  Assuming I have that right, I agree that's a very
 good question.  This is a community-based project after all.

First, as attractive (and easy!) as it is to live inside our echo
chamber, yes, I do think we should give them a voice.  Of course, that
doesn't mean you're obliged to listen to it.  If you don't think that
is a valuable source of input for prioritizing your work, you're free
to ignore it.

Second, what I'm talking about is a different type of data from what
you get on jira + ML.  Those are negative sources of information --
you mostly only find out someone is using compression if they have a
problem with it.  How many people are using it with no problems?  That
is what this would let us start to find out.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: hintedhandoff in 1.0.3

2011-11-16 Thread Jonathan Ellis
Keys in HCF (HintsColumnFamily) are the nodes it has hints for.  You can
try forcing delivery
to the node that still has hints.  It's also possible that new hints
were created (because that node timed out some writes) during the
delivery of the first ones.

On Tue, Nov 15, 2011 at 3:42 AM, Radim Kolar h...@sendmail.cz wrote:
 Same problem on other node:  2 keys in HintsColumnFamily. One delivered, one
 left.

  INFO [HintedHandoff:1] 2011-11-15 10:31:53,181 HintedHandOffManager.java
 (line 268) Started hinted handoff for token:
 99070591730234615865843651857942052864
  INFO [HintedHandoff:1] 2011-11-15 10:32:49,385 ColumnFamilyStore.java (line
 688) Enqueuing flush of Memtable-HintsColumnFamily@797897458(1674737/2093421
 serialized/live bytes, 6176 ops)
  INFO [FlushWriter:5] 2011-11-15 10:32:49,386 Memtable.java (line 239)
 Writing Memtable-HintsColumnFamily@797897458(1674737/2093421 serialized/live
 bytes, 6176 ops)
  INFO [CompactionExecutor:10] 2011-11-15 10:32:49,387 CompactionTask.java
 (line 112) Compacting
 [SSTableReader(path='/usr/local/cassandra/data/system/HintsColumnFamily-hb-754-Data.db'),
 SSTableReader(path='/usr/local/cassandra/data/system/HintsColumnFamily-hb-752-Data.db')]
  INFO [FlushWriter:5] 2011-11-15 10:32:49,523 Memtable.java (line 275)
 Completed flushing
 /usr/local/cassandra/data/system/HintsColumnFamily-hb-755-Data.db (1888357
 bytes)
  INFO [CompactionExecutor:10] 2011-11-15 10:32:49,820 CompactionTask.java
 (line 213) Compacted to
 [/usr/local/cassandra/data/system/HintsColumnFamily-hb-756-Data.db,].
  19,913,818 to 19,913,392 (~99% of original) bytes for 2 keys at
 43.960395MB/s.  Time: 432ms.
  INFO [HintedHandoff:1] 2011-11-15 10:32:49,820 HintedHandOffManager.java
 (line 334) Finished hinted handoff of 5796 rows





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: How is Cassandra being used?

2011-11-16 Thread Jonathan Ellis
Sounds like the consensus is that if this is a good idea at all, it
needs to be opt-in.  Like I said earlier, I can live with that.

On Wed, Nov 16, 2011 at 10:35 AM, Jake Luciani jak...@gmail.com wrote:
 Having worked at places where you get fired if software *attempts* to
 contact the outside world, I understand the concerns.

 However, if it's opt-in via config file and requires a restart then there
 is no reason why it should be a concern.





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: How is Cassandra being used?

2011-11-16 Thread Ryan King
On Wed, Nov 16, 2011 at 10:02 AM, Jonathan Ellis jbel...@gmail.com wrote:
 Sounds like the consensus is that if this is a good idea at all, it
 needs to be opt-in.  Like I said earlier, I can live with that.

In addition, if you want to get data from large companies that manage
their own datacenters, there needs to be a way to contribute data
without the software phoning home automatically. We aren't allowed to
make connections to the outside world from our datacenter. And I'm not
willing to ask for an exception for this.

A mode that dumps the data to a file which can be uploaded would be
preferable. People probably won't do it often, but imagine if your
periodic "how are you using Cassandra?" email threads included data?

-ryan
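
A minimal Python sketch of the dump-to-file mode suggested above. The `dump_report` function, the report fields, and the file name are hypothetical and not part of nodetool or any Cassandra tooling. Nothing in it opens a network connection; the operator reviews the file and uploads it by hand if they choose to.

```python
import json
import os
import tempfile


def dump_report(report, path):
    """Write the report as human-readable JSON and return the path."""
    with open(path, "w", encoding="utf-8") as handle:
        # Sorted keys and indentation make the file easy to review before
        # anyone decides to share it.
        json.dump(report, handle, indent=2, sort_keys=True)
        handle.write("\n")
    return path


# Hypothetical report contents, for illustration only.
sample_report = {
    "keyspaces": 2,
    "column_families": 3,
    "compression_enabled": 2,
}

output_dir = tempfile.mkdtemp()
written_path = dump_report(
    sample_report, os.path.join(output_dir, "cluster-report.json")
)
```

Because the output is plain JSON, the same file could be pasted into one of the periodic survey threads or attached to a support request.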


Re: [VOTE] Release Apache Cassandra 1.0.3 (take 2)

2011-11-16 Thread Jonathan Ellis
I'm +1 on either these artifacts as is, or these artifacts with thrift
rebuilt to reflect the correct api version

On Tue, Nov 15, 2011 at 7:46 AM, Eric Evans eev...@acunu.com wrote:
 On Tue, Nov 15, 2011 at 1:40 AM, Sylvain Lebresne sylv...@datastax.com 
 wrote:
 So, CASSANDRA-3491 and CASSANDRA-3492 got in the way of the first take.
 Now that they are fixed, let's try again. I propose the following artifacts
 for release as 1.0.3.

 SVN: 
 https://svn.apache.org/repos/asf/cassandra/branches/cassandra-1.0@1202082
 Artifacts: 
 https://repository.apache.org/content/repositories/orgapachecassandra-186/org/apache/cassandra/apache-cassandra/1.0.3/
 Staging repository:
 https://repository.apache.org/content/repositories/orgapachecassandra-186/

 The artifacts as well as the debian package are also available here:
 http://people.apache.org/~slebresne/

 The vote will be open for 72 hours (longer if needed).

 [1]: http://goo.gl/I1dZG (CHANGES.txt)
 [2]: http://goo.gl/PeD3Z (NEWS.txt)


 It looks like interface/cassandra.thrift has changed without the Java
 code being regenerated.  The test_describe system test is failing
 because of this, (the versions don't match).

 Probably not justification for a re-roll, but not a great thing for
 the release either...

 --
 Eric Evans
 Acunu | http://www.acunu.com | @acunu




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Build failed in Jenkins: Cassandra #1209

2011-11-16 Thread Apache Jenkins Server
See https://builds.apache.org/job/Cassandra/1209/changes

Changes:

[jbellis] merge from 1.0

--
[...truncated 2243 lines...]
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.163 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.locator.ReplicationStrategyEndpointCacheTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.513 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.locator.SimpleStrategyTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.687 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.locator.TokenMetadataTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.453 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.service.AntiEntropyServiceCounterTest
[junit] Tests run: 6, Failures: 0, Errors: 1, Time elapsed: 2.607 sec
[junit] 
[junit] Testcase: testValidatorPrepare(org.apache.cassandra.service.AntiEntropyServiceCounterTest): Caused an ERROR
[junit] /127.0.0.1:7010 is in use by another process.  Change listen_address:storage_port in cassandra.yaml to values that do not conflict with other services
[junit] org.apache.cassandra.config.ConfigurationException: /127.0.0.1:7010 is in use by another process.  Change listen_address:storage_port in cassandra.yaml to values that do not conflict with other services
[junit] at org.apache.cassandra.net.MessagingService.getServerSocket(MessagingService.java:271)
[junit] at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:241)
[junit] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:484)
[junit] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:461)
[junit] at org.apache.cassandra.service.AntiEntropyServiceTestAbstract.prepare(AntiEntropyServiceTestAbstract.java:80)
[junit] 
[junit] 
[junit] Test org.apache.cassandra.service.AntiEntropyServiceCounterTest FAILED
[junit] Testsuite: org.apache.cassandra.service.AntiEntropyServiceStandardTest
[junit] Tests run: 6, Failures: 0, Errors: 1, Time elapsed: 2.437 sec
[junit] 
[junit] Testcase: testValidatorPrepare(org.apache.cassandra.service.AntiEntropyServiceStandardTest): Caused an ERROR
[junit] /127.0.0.1:7010 is in use by another process.  Change listen_address:storage_port in cassandra.yaml to values that do not conflict with other services
[junit] org.apache.cassandra.config.ConfigurationException: /127.0.0.1:7010 is in use by another process.  Change listen_address:storage_port in cassandra.yaml to values that do not conflict with other services
[junit] at org.apache.cassandra.net.MessagingService.getServerSocket(MessagingService.java:271)
[junit] at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:241)
[junit] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:484)
[junit] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:461)
[junit] at org.apache.cassandra.service.AntiEntropyServiceTestAbstract.prepare(AntiEntropyServiceTestAbstract.java:80)
[junit] 
[junit] 
[junit] Test org.apache.cassandra.service.AntiEntropyServiceStandardTest FAILED
[junit] Testsuite: org.apache.cassandra.service.CassandraServerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.461 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.service.ConsistencyLevelTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.776 sec
[junit] 
[junit] Testcase: testReadWriteConsistencyChecks(org.apache.cassandra.service.ConsistencyLevelTest): Caused an ERROR
[junit] invalid consistency level: ANY
[junit] java.lang.UnsupportedOperationException: invalid consistency level: ANY
[junit] at org.apache.cassandra.service.ReadCallback.determineBlockFor(ReadCallback.java:195)
[junit] at org.apache.cassandra.service.ReadCallback.init(ReadCallback.java:68)
[junit] at org.apache.cassandra.service.StorageProxy.getReadCallback(StorageProxy.java:798)
[junit] at org.apache.cassandra.service.ConsistencyLevelTest.testReadWriteConsistencyChecks(ConsistencyLevelTest.java:110)
[junit] 
[junit] 
[junit] Test org.apache.cassandra.service.ConsistencyLevelTest FAILED
[junit] Testsuite: org.apache.cassandra.service.EmbeddedCassandraServiceTest
[junit] Testsuite: org.apache.cassandra.service.EmbeddedCassandraServiceTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] 
[junit] Testcase: org.apache.cassandra.service.EmbeddedCassandraServiceTest:BeforeFirstTest: Caused an ERROR
[junit] Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
[junit] 
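
The two AntiEntropyService errors above report that 127.0.0.1:7010 was already bound when the test tried to listen on it. A minimal Python sketch of one way to check whether an address is free before starting a test; the `port_is_free` helper is hypothetical and not part of the Cassandra build, and the cause of the conflict on the build machine is not stated in the log.

```python
import socket


def port_is_free(host, port):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds it.
            return False
        return True


# Demonstration: occupy an ephemeral port, then probe it.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))       # port 0 lets the OS pick a free port
holder.listen(1)
busy_port = holder.getsockname()[1]

free_while_held = port_is_free("127.0.0.1", busy_port)
holder.close()
free_after_close = port_is_free("127.0.0.1", busy_port)
```

A check like this is inherently racy, since another process can take the port between the probe and the real bind, so it is a diagnostic aid rather than a fix.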

Jenkins build is still unstable: Cassandra-Coverage #168

2011-11-16 Thread Apache Jenkins Server
See https://builds.apache.org/job/Cassandra-Coverage/changes




Re: How is Cassandra being used?

2011-11-16 Thread Jeremy Hanna
Sounds like it would be best if it were in a separate jar for people?

On Nov 16, 2011, at 4:58 PM, Bill wrote:

  Thoughts?
 
 
 We'll turn this off, and would possibly patch it out of the code. That's not 
 to say it wouldn't be useful to others.
 
 Bill
 
 



Re: How is Cassandra being used?

2011-11-16 Thread Jeremiah Jordan
+1 for a separate jar (and a second download link that doesn't include this 
jar, though I would make the primary link include it with BIG BOLD PRINT saying 
it is in there)
+1 for a config option to turn off auto-post (defaulted on in the download that 
has the jar)
+1 for a nodetool command to dump it to a file for manual posting

I think this could be a good debugging tool as well.  A command to dump 
"here is what my cluster looks like" to a file, which could then be sent 
through email for others to use to help resolve issues, would be nice.  The 
current nodetool information commands have too much stuff that needs to be 
sanitized out before you can send it outside the firewall.

- Jeremiah

On Nov 16, 2011, at 7:16 PM, Jeremy Hanna wrote:

 Sounds like it would be best if it were in a separate jar for people?
 