New Metrics Collector for Apache Cassandra w/ Prometheus

2020-05-15 Thread Jake Luciani
Hi,

Hope this email finds you well.

DataStax has recently open sourced a new metrics collector for Apache
Cassandra.
It's a drop-in solution that comes with Prometheus dashboards and works with
all versions from 2.2 to 4.0 alpha.

Blog:
https://www.datastax.com/blog/2020/05/monitoring-apache-cassandratm-made-simple
GH: https://github.com/datastax/metric-collector-for-apache-cassandra

Stay Safe!

Jake


Re: Counter performance

2017-04-17 Thread Jake Luciani
You can set the trace probability on a node to 1% and you'll catch a trace
on that table.

http://cassandra.apache.org/doc/latest/tools/nodetool/settraceprobability.html
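
For reference, here is a minimal sketch of that step in Python. It only builds (and optionally runs) the nodetool invocation. The helper names `settraceprobability_command` and `apply_trace_probability` and the optional `host` argument are made up for illustration, and it assumes `nodetool` is on the PATH of the machine running it.

```python
import subprocess


def settraceprobability_command(percent, host=None):
    """Build the nodetool argv that samples `percent` % of requests for tracing."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # nodetool expects a probability between 0 and 1, so 1% becomes 0.01.
    probability = percent / 100
    argv = ["nodetool"]
    if host is not None:
        argv += ["-h", host]  # point nodetool at a specific node
    return argv + ["settraceprobability", str(probability)]


def apply_trace_probability(percent, host=None):
    """Run the command; raises CalledProcessError if nodetool fails."""
    subprocess.run(settraceprobability_command(percent, host), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(settraceprobability_command(1)))
```

With the probability at 0.01, roughly one request in a hundred on that node is traced, and the sampled traces can be read from the `system_traces.sessions` and `system_traces.events` tables. Setting the value back to 0 turns sampling off again.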

On Mon, Apr 17, 2017 at 11:17 AM, benjamin roth  wrote:

> Just run some queries on counter tables. Some on regular tables. Look at
> traces and then compare. You don't need to do anything with application
> code. You can also set trace probability on a table level and then analyze
> the queries.
>
> On 17.04.2017 at 17:07, "Eren Yilmaz" wrote:
>
>> I can’t add tracing using the driver – Usergrid code is way too complex. When
>> I look at logging the slow queries on the C* side, it says the feature was
>> added in version 3.10
>> (https://issues.apache.org/jira/browse/CASSANDRA-12403), and we use 3.7.
>> Any other ways to log slow queries in this version? Or, what do we expect
>> with this log output?
>>
>>
>>
>> *From:* benjamin roth [mailto:brs...@gmail.com]
>> *Sent:* Monday, April 17, 2017 5:44 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* RE: Counter performance
>>
>>
>>
>> You could enable a slow query log and then trace single queries, couldn't
>> you?
>>
>>
>>
>> On 17.04.2017 at 16:31, "Eren Yilmaz" wrote:
>>
>> I can’t trace selects on the application tables unfortunately. The
>> application is Usergrid, and it stores the data in binary. We have little
>> control over Usergrid-created data.
>>
>>
>>
>> *From:* benjamin roth [mailto:brs...@gmail.com]
>> *Sent:* Monday, April 17, 2017 4:12 PM
>>
>>
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Counter performance
>>
>>
>>
>> Do you see a difference when tracing the selects?
>>
>>
>>
>> 2017-04-17 13:36 GMT+02:00 Eren Yilmaz :
>>
>> Application tables use LeveledCompactionStrategy. At first, counter
>> tables were created with the default SizeTieredCompactionStrategy, but we
>> later changed them to LeveledCompactionStrategy.
>>
>>
>>
>> compaction = { 'class' : 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy',
>> 'sstable_size_in_mb' : 512 }
>>
>>
>>
>> *From:* benjamin roth [mailto:brs...@gmail.com]
>> *Sent:* Monday, April 17, 2017 12:12 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Counter performance
>>
>>
>>
>> Do you have a different compaction strategy on the counter tables?
>>
>>
>>
>> 2017-04-17 10:07 GMT+02:00 Eren Yilmaz :
>>
>> We are using Cassandra (3.7) counter tables in our application, and there
>> are about 10 counter tables. The counter tables are in a separate keyspace
>> with RF=3 (total 10 nodes). The tables are read-heavy: for each web request
>> to the application, we read at least 20 counter values. The counter reads
>> are very slow compared to the other application data reads from Cassandra,
>> and sometimes the reads put extra heavy CPU load on some nodes.
>>
>>
>>
>> Are there any tips, or best practices for increasing the performance of
>> counter tables?
>>
>>
>>
>>
>>
>>
>>
>


-- 
http://twitter.com/tjake


Re: Incremental repair for the first time

2016-12-16 Thread Jake Luciani
This was fixed after 3.0.4; please upgrade to the latest 3.0 release.

On Fri, Dec 16, 2016 at 4:49 PM, Kathiresan S 
wrote:

> Hi,
>
> We have a brand new Cassandra cluster (version 3.0.4) and we set up
> nodetool repair scheduled for every day (without any options for repair).
> As per documentation, incremental repair is the default in this case.
> Should we do a full repair for the very first time on each node once and
> then leave it to do incremental repair afterwards?
>
> *Problem we are facing:*
>
> On a random node, the repair process throws a "validation failed" error,
> pointing to some other node.
>
> For example, Node A, where the repair is run (without any options), throws
> the error below:
>
> *Validation failed in /Node B*
>
> In Node B when we check the logs, below exception is seen at the same
> exact time...
>
> *java.lang.RuntimeException: Cannot start multiple repair sessions over
> the same sstables*
> *at
> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1087)
> ~[apache-cassandra-3.0.4.jar:3.0.4]*
> *at
> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80)
> ~[apache-cassandra-3.0.4.jar:3.0.4]*
> *at
> org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:700)
> ~[apache-cassandra-3.0.4.jar:3.0.4]*
> *at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[na:1.8.0_73]*
> *at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> ~[na:1.8.0_73]*
>
> Can you please help on how this can be fixed?
>
> Thanks,
> Kathir
>



-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 3.0.9 released

2016-09-20 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.9.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: https://goo.gl/YfvFn8 (CHANGES.txt)
[2]: https://goo.gl/k9leqx (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: [RELEASE] Apache Cassandra 3.0.8 released

2016-07-07 Thread Jake Luciani
Sorry, I totally missed that.  Uploading now.

On Thu, Jul 7, 2016 at 4:51 AM, horschi  wrote:

> Same for 2.2.7.
>
> On Thu, Jul 7, 2016 at 10:49 AM, Julien Anguenot 
> wrote:
>
>> Hey,
>>
>> The Debian packages do not seem to have been published. Normal?
>>
>> Thank you.
>>
>>J.
>>
>> On Jul 6, 2016, at 4:20 PM, Jake Luciani  wrote:
>>
>> The Cassandra team is pleased to announce the release of Apache Cassandra
>> version 3.0.8.
>>
>> Apache Cassandra is a fully distributed database. It is the right choice
>> when you need scalability and high availability without compromising
>> performance.
>>
>>  http://cassandra.apache.org/
>>
>> Downloads of source and binary distributions are listed in our download
>> section:
>>
>>  http://cassandra.apache.org/download/
>>
>> This version is a bug fix release[1] on the 3.0 series. As always, please pay
>> attention to the release notes[2] and let us know[3] if you encounter any
>> problems.
>>
>> Enjoy!
>>
>> [1]: http://goo.gl/DQpe4d (CHANGES.txt)
>> [2]: http://goo.gl/UISX1K (NEWS.txt)
>> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>>
>>
>>
>


[RELEASE] Apache Cassandra 3.0.8 released

2016-07-06 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.8.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/DQpe4d (CHANGES.txt)
[2]: http://goo.gl/UISX1K (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.7 released

2016-07-06 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.7.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/KNV34t (CHANGES.txt)
[2]: http://goo.gl/VQfst8 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.15 released

2016-07-06 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.15.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/Lozbuh (CHANGES.txt)
[2]: http://goo.gl/Omcaa1 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.7 released

2016-06-14 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.7.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/yPJaXi (CHANGES.txt)
[2]: http://goo.gl/Jph9Fh (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.7 released

2016-06-14 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.7.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a tick-tock bug fix release[1] on the 3.x series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: http://goo.gl/k1abJV (CHANGES.txt)
[2]: http://goo.gl/3ENJIz (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Why there is no native shutdown command in cassandra

2016-06-13 Thread Jake Luciani
If that's true, then it's a bug. Can you open a ticket and include the logs?
https://issues.apache.org/jira/browse/CASSANDRA

On Mon, Jun 13, 2016 at 2:19 PM, Anshu Vajpayee 
wrote:

> I just tested. It doesn't flush memtables the way the nodetool drain/flush
> commands do. That means it just crashes the node, with no graceful shutdown.
>
>
>
> On Mon, Jun 13, 2016 at 10:51 PM, Jake Luciani  wrote:
>
>> Yeah same as drain.  Just exits at the end.
>>
>> On Mon, Jun 13, 2016 at 1:11 PM, Anshu Vajpayee wrote:
>>
>>> Thanks for information.
>>>
>>> Does stopdaemon also flush memtables and stop the Thrift and CQL interfaces
>>> before shutting down the daemon? Does the node also announce a shutdown
>>> message to the ring?
>>>
>>>
>>> On Mon, Jun 13, 2016 at 10:14 PM, Jake Luciani  wrote:
>>>
>>>> If you want to understand why, it's because C* was designed to be
>>>> crash-only.
>>>>
>>>> https://www.usenix.org/conference/hotos-ix/crash-only-software
>>>>
>>>> Since this is great for the project but bad for the operator experience,
>>>> we later added the stopdaemon command.
>>>>
>>>> On Mon, Jun 13, 2016 at 12:37 PM, Anshu Vajpayee <
>>>> anshu.vajpa...@gmail.com> wrote:
>>>>
>>>>> As per the documentation (pasted below), it does not stop the daemon. I
>>>>> also tested it. I was looking for a graceful shutdown of the Cassandra
>>>>> daemon.
>>>>> Description
>>>>> <https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsDrain.html?scroll=toolsDrain__description_unique_11>
>>>>>
>>>>> Flushes all memtables from the node to SSTables on disk. Cassandra
>>>>> stops listening for connections from the client and other nodes. You need
>>>>> to restart Cassandra after running nodetool drain. You typically use
>>>>> this command before upgrading a node to a new version of Cassandra. To
>>>>> simply flush memtables to disk, use nodetool flush.
>>>>>
>>>>> On Mon, Jun 13, 2016 at 10:00 PM, Jeff Jirsa <
>>>>> jeff.ji...@crowdstrike.com> wrote:
>>>>>
>>>>>> `nodetool drain`
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From: *Anshu Vajpayee 
>>>>>> *Reply-To: *"user@cassandra.apache.org" 
>>>>>> *Date: *Monday, June 13, 2016 at 9:28 AM
>>>>>> *To: *"user@cassandra.apache.org" 
>>>>>> *Subject: *Why there is no native shutdown command in cassandra
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi All
>>>>>>
>>>>>>
>>>>>>
>>>>>> Why don't we have a native shutdown command in Cassandra?
>>>>>>
>>>>>> Every piece of software provides a graceful shutdown command.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Anshu
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Regards,*
>>>>> *Anshu *
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> http://twitter.com/tjake
>>>>
>>>
>>>
>>>
>>> --
>>> *Regards,*
>>> *Anshu *
>>>
>>>
>>>
>>
>>
>> --
>> http://twitter.com/tjake
>>
>
>
>
> --
> *Regards,*
> *Anshu *
>
>
>


-- 
http://twitter.com/tjake


Re: Why there is no native shutdown command in cassandra

2016-06-13 Thread Jake Luciani
Yeah same as drain.  Just exits at the end.
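
A minimal sketch of a shutdown sequence built from the two commands discussed in this thread, written in Python. It drains explicitly before calling stopdaemon, which is a conservative assumption rather than something stopdaemon is documented here to require. The helper names are hypothetical, and `nodetool` is assumed to be on the PATH.

```python
import subprocess


def graceful_shutdown_commands(host=None):
    """Return the nodetool invocations for a drain followed by stopdaemon."""
    base = ["nodetool"]
    if host is not None:
        base += ["-h", host]  # point nodetool at a specific node
    return [base + ["drain"], base + ["stopdaemon"]]


def graceful_shutdown(host=None, run=subprocess.run):
    """Run drain, then stopdaemon; never stop a node whose drain failed."""
    drain, stop = graceful_shutdown_commands(host)
    # Fail loudly if the flush did not complete.
    run(drain, check=True)
    # Assumption: the exit status of stopdaemon is not checked here, because
    # the JVM goes away while nodetool is still connected to it.
    run(stop, check=False)


if __name__ == "__main__":
    # Print the plan instead of executing it, so the sketch is safe to try.
    for argv in graceful_shutdown_commands():
        print(" ".join(argv))
```

Passing a different `run` callable makes it easy to dry-run the sequence or log each step before touching a real node.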

On Mon, Jun 13, 2016 at 1:11 PM, Anshu Vajpayee 
wrote:

> Thanks for information.
>
> Does stopdaemon also flush memtables and stop the Thrift and CQL interfaces
> before shutting down the daemon? Does the node also announce a shutdown
> message to the ring?
>
>
> On Mon, Jun 13, 2016 at 10:14 PM, Jake Luciani  wrote:
>
>> If you want to understand why, it's because C* was designed to be
>> crash-only.
>>
>> https://www.usenix.org/conference/hotos-ix/crash-only-software
>>
>> Since this is great for the project but bad for the operator experience, we
>> later added the stopdaemon command.
>>
>> On Mon, Jun 13, 2016 at 12:37 PM, Anshu Vajpayee <
>> anshu.vajpa...@gmail.com> wrote:
>>
>>> As per the documentation (pasted below), it does not stop the daemon. I
>>> also tested it. I was looking for a graceful shutdown of the Cassandra
>>> daemon.
>>> Description
>>> <https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsDrain.html?scroll=toolsDrain__description_unique_11>
>>>
>>> Flushes all memtables from the node to SSTables on disk. Cassandra stops
>>> listening for connections from the client and other nodes. You need to
>>> restart Cassandra after running nodetool drain. You typically use this
>>> command before upgrading a node to a new version of Cassandra. To simply
>>> flush memtables to disk, use nodetool flush.
>>>
>>> On Mon, Jun 13, 2016 at 10:00 PM, Jeff Jirsa wrote:
>>>
>>>> `nodetool drain`
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From: *Anshu Vajpayee 
>>>> *Reply-To: *"user@cassandra.apache.org" 
>>>> *Date: *Monday, June 13, 2016 at 9:28 AM
>>>> *To: *"user@cassandra.apache.org" 
>>>> *Subject: *Why there is no native shutdown command in cassandra
>>>>
>>>>
>>>>
>>>> Hi All
>>>>
>>>>
>>>>
>>>> Why don't we have a native shutdown command in Cassandra?
>>>>
>>>> Every piece of software provides a graceful shutdown command.
>>>>
>>>> Regards,
>>>>
>>>> Anshu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> *Regards,*
>>> *Anshu *
>>>
>>>
>>>
>>
>>
>> --
>> http://twitter.com/tjake
>>
>
>
>
> --
> *Regards,*
> *Anshu *
>
>
>


-- 
http://twitter.com/tjake


Re: Why there is no native shutdown command in cassandra

2016-06-13 Thread Jake Luciani
If you want to understand why, it's because C* was designed to be
crash-only.

https://www.usenix.org/conference/hotos-ix/crash-only-software

Since this is great for the project but bad for the operator experience, we
later added the stopdaemon command.

On Mon, Jun 13, 2016 at 12:37 PM, Anshu Vajpayee 
wrote:

> As per the documentation (pasted below), it does not stop the daemon. I also
> tested it. I was looking for a graceful shutdown of the Cassandra daemon.
>
> Description
>
> Flushes all memtables from the node to SSTables on disk. Cassandra stops
> listening for connections from the client and other nodes. You need to
> restart Cassandra after running nodetool drain. You typically use this
> command before upgrading a node to a new version of Cassandra. To simply
> flush memtables to disk, use nodetool flush.
>
> On Mon, Jun 13, 2016 at 10:00 PM, Jeff Jirsa 
> wrote:
>
>> `nodetool drain`
>>
>>
>>
>>
>>
>> *From: *Anshu Vajpayee 
>> *Reply-To: *"user@cassandra.apache.org" 
>> *Date: *Monday, June 13, 2016 at 9:28 AM
>> *To: *"user@cassandra.apache.org" 
>> *Subject: *Why there is no native shutdown command in cassandra
>>
>>
>>
>> Hi All
>>
>>
>>
>> Why don't we have a native shutdown command in Cassandra?
>>
>> Every piece of software provides a graceful shutdown command.
>>
>> Regards,
>>
>> Anshu
>>
>>
>>
>>
>>
>
>
>
> --
> *Regards,*
> *Anshu *
>
>
>


-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 3.6 released

2016-06-06 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.6.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a tick-tock feature release[1] on the 3.x series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: http://goo.gl/eu90nx (CHANGES.txt)
[2]: http://goo.gl/ugkBQW (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.6 released

2016-05-13 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.6.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/cBU6AT (CHANGES.txt)
[2]: http://goo.gl/XvXLaJ (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.6 released

2016-04-26 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.6.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/yCpWu7 (CHANGES.txt)
[2]: http://goo.gl/qktJUS (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.14 released

2016-04-26 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.14.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/7lm5sY (CHANGES.txt)
[2]: http://goo.gl/SUIzT9 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: What does Cassandra use (JNI?) that triggers GCLocker Initiated GCs?

2016-04-21 Thread Jake Luciani
What kind of collection? If it's ParNew, I wouldn't worry.
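
To answer that from the logs, a small Python sketch like the following can tally GC events by cause and collector. It assumes JDK 8 style `-XX:+PrintGCDetails` output. The sample lines at the bottom are illustrative rather than taken from a real log, and `tally_gc_causes` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches e.g. "[GC (GCLocker Initiated GC) ..." and captures the cause.
GC_LINE = re.compile(r"\[(?:Full GC|GC pause|GC) \((?P<cause>[^)]+)\)(?P<rest>.*)")
# Collector names as they appear in ParNew/CMS and parallel collector logs.
COLLECTOR = re.compile(r"\[(ParNew|PSYoungGen|DefNew|CMS|Tenured|ParOldGen)\b")


def tally_gc_causes(lines):
    """Count GC events per (cause, collector) pair; non-GC lines are skipped."""
    counts = Counter()
    for line in lines:
        event = GC_LINE.search(line)
        if event is None:
            continue
        collector = COLLECTOR.search(event.group("rest"))
        name = collector.group(1) if collector else "unknown"
        counts[(event.group("cause"), name)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "12.345: [GC (GCLocker Initiated GC) 12.345: [ParNew: 1677824K->20992K(1887488K), 0.0312 secs]",
        "13.001: [GC (Allocation Failure) 13.001: [ParNew: 1698816K->21504K(1887488K), 0.0298 secs]",
    ]
    for (cause, collector), count in tally_gc_causes(sample).items():
        print(cause, collector, count)
```

If nearly all of the GCLocker-initiated events turn out to be ParNew collections with short pause times, that matches the "wouldn't worry" case above.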

On Thu, Apr 21, 2016 at 2:02 PM, Sotirios Delimanolis 
wrote:

> Should this be of any concern? Are the corresponding threads spending too
> long in this JNI critical region and delaying GC?
>
> I don't get that impression at all from the GC log timings. They're very
> reasonable.
>
> On Thursday, April 21, 2016 10:57 AM, Jake Luciani 
> wrote:
>
>
> It's only used by the Snappy and LZ4 Compressors
>
> On Thu, Apr 21, 2016 at 1:54 PM, Sotirios Delimanolis <
> sotodel...@yahoo.com> wrote:
>
> According to this Oracle document
> <https://blogs.oracle.com/g1gc/entry/g1_gc_glossary_of_terms>, GCLocker
> Initiated GC
>
> is triggered when a JNI critical region was released. GC is blocked
> when any thread is in the JNI Critical region.
> If GC was requested during that period, that GC is invoked after all
> the threads come out of the JNI critical region.
>
> What part of Cassandra's implementation does anything with JNI?
>
> In our GC logs, this is by far the most common reason for GC pauses.
>
>
>
>
> --
> http://twitter.com/tjake
>
>
>


-- 
http://twitter.com/tjake


Re: What does Cassandra use (JNI?) that triggers GCLocker Initiated GCs?

2016-04-21 Thread Jake Luciani
JNI is only used by the Snappy and LZ4 compressors.

On Thu, Apr 21, 2016 at 1:54 PM, Sotirios Delimanolis 
wrote:

> According to this Oracle document, GCLocker Initiated GC
>
> is triggered when a JNI critical region was released. GC is blocked
> when any thread is in the JNI Critical region.
> If GC was requested during that period, that GC is invoked after all
> the threads come out of the JNI critical region.
>
> What part of Cassandra's implementation does anything with JNI?
>
> In our GC logs, this is by far the most common reason for GC pauses.
>
>


-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 3.5 released

2016-04-13 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.5.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a tick-tock bug fix release[1] on the 3.x series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: http://goo.gl/FchTrl (CHANGES.txt)
[2]: http://goo.gl/0zpkJU (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.5 released

2016-04-11 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.5.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/tlNv8g (CHANGES.txt)
[2]: http://goo.gl/WrCSKw (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.4 released

2016-03-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.4.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a tick-tock feature release[1] on the 3.x series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: http://goo.gl/l61Mvd (CHANGES.txt)
[2]: http://goo.gl/hIamQh (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.4 released

2016-03-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.4.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/i27IR3 (CHANGES.txt)
[2]: http://goo.gl/8Fy3pe (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: [RELEASE] Apache Cassandra 3.3 released

2016-02-09 Thread Jake Luciani
Well, typically you should run upgradesstables when you upgrade major
versions as well.

https://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgradeCassandraDetails.html

On Tue, Feb 9, 2016 at 6:11 PM, Will Zhang  wrote:

> Nice work guys.
>
> Just to confirm, if you upgrade from, say, 2.2.x directly to 3.3, you will
> *not* need to run upgradesstables, right? It seems pretty clear that the
> answer is no but I just wanted to make sure. Is it only needed if you are
> coming from a 3.x version?
>
> Thank you.
>
> Sent from my iPhone
>
> On 9 Feb 2016, at 19:06, Jake Luciani  wrote:
>
> No problem. Run it after you upgrade.
>
> On Tue, Feb 9, 2016 at 2:01 PM, Will Hayworth 
> wrote:
>
>> Pardon my ignorance, Jake--should we run upgradesstables -a after or
>> before we install 3.3?
>>
>> Thanks! :)
>>
>> ___
>> Will Hayworth
>> Developer, Engagement Engine
>> Atlassian
>>
>> My pronoun is "they". <http://pronoun.is/they>
>>
>>
>>
>> On Tue, Feb 9, 2016 at 10:50 AM, Jake Luciani  wrote:
>>
>>> The Cassandra team is pleased to announce the release of Apache Cassandra
>>> version 3.3.
>>>
>>> *This release contains a fix for a critical bug in the 3.0 series[4].* If
>>> you have installed any version >= 3.0, you will need to run
>>> 'nodetool upgradesstables -a' on all nodes to receive the fix.
>>>
>>> Apache Cassandra is a fully distributed database. It is the right choice
>>> when you need scalability and high availability without compromising
>>> performance.
>>>
>>>  http://cassandra.apache.org/
>>>
>>> Downloads of source and binary distributions are listed in our download
>>> section:
>>>
>>>  http://cassandra.apache.org/download/
>>>
>>> This version is a tick-tock bug fix release[1] on the 3.x series. As
>>> always, please pay attention to the release notes[2] and let us know[3]
>>> if you encounter any problems.
>>>
>>> Enjoy!
>>>
>>> [1]: http://goo.gl/V2lsST (CHANGES.txt)
>>> [2]: http://goo.gl/5UBlNl (NEWS.txt)
>>> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>>> [4]: https://issues.apache.org/jira/browse/CASSANDRA-11102
>>>
>>>
>>
>
>
> --
> http://twitter.com/tjake
>
>


-- 
http://twitter.com/tjake


Re: [RELEASE] Apache Cassandra 3.3 released

2016-02-09 Thread Jake Luciani
No problem. Run it after you upgrade.
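
As a rough sketch of what "after you upgrade" can look like across a cluster, the Python below builds one `nodetool upgradesstables -a` invocation per node and runs them one at a time. The helper names and the host addresses are hypothetical, and it assumes every listed node is already running the new version and is reachable by `nodetool -h`.

```python
import subprocess


def upgradesstables_commands(hosts):
    """One 'nodetool upgradesstables -a' invocation per node, in order."""
    return [["nodetool", "-h", host, "upgradesstables", "-a"] for host in hosts]


def upgrade_all(hosts, run=subprocess.run):
    """Rewrite SSTables on each node in turn, stopping at the first failure."""
    for argv in upgradesstables_commands(hosts):
        run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical node addresses; print the plan rather than executing it.
    for argv in upgradesstables_commands(["10.0.0.1", "10.0.0.2", "10.0.0.3"]):
        print(" ".join(argv))
```

Going node by node keeps the extra I/O from rewriting SSTables confined to one machine at a time, which is a design choice of this sketch rather than a requirement stated in the announcement.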

On Tue, Feb 9, 2016 at 2:01 PM, Will Hayworth 
wrote:

> Pardon my ignorance, Jake--should we run upgradesstables -a after or
> before we install 3.3?
>
> Thanks! :)
>
> ___
> Will Hayworth
> Developer, Engagement Engine
> Atlassian
>
> My pronoun is "they". <http://pronoun.is/they>
>
>
>
> On Tue, Feb 9, 2016 at 10:50 AM, Jake Luciani  wrote:
>
>> The Cassandra team is pleased to announce the release of Apache Cassandra
>> version 3.3.
>>
>> *This release contains a fix for a critical bug in the 3.0 series[4].* If
>> you have installed any version >= 3.0, you will need to run
>> 'nodetool upgradesstables -a' on all nodes to receive the fix.
>>
>> Apache Cassandra is a fully distributed database. It is the right choice
>> when you need scalability and high availability without compromising
>> performance.
>>
>>  http://cassandra.apache.org/
>>
>> Downloads of source and binary distributions are listed in our download
>> section:
>>
>>  http://cassandra.apache.org/download/
>>
>> This version is a tick-tock bug fix release[1] on the 3.x series. As
>> always, please pay attention to the release notes[2] and let us know[3]
>> if you encounter any problems.
>>
>> Enjoy!
>>
>> [1]: http://goo.gl/V2lsST (CHANGES.txt)
>> [2]: http://goo.gl/5UBlNl (NEWS.txt)
>> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>> [4]: https://issues.apache.org/jira/browse/CASSANDRA-11102
>>
>>
>


-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 3.3 released

2016-02-09 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.3.

*This release contains a fix for a critical bug in the 3.0 series[4].* If you
have installed any version >= 3.0, you will need to run
'nodetool upgradesstables -a' on all nodes to receive the fix.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a tick-tock bug fix release[1] on the 3.x series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: http://goo.gl/V2lsST (CHANGES.txt)
[2]: http://goo.gl/5UBlNl (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: https://issues.apache.org/jira/browse/CASSANDRA-11102


[RELEASE] Apache Cassandra 3.0.3 released

2016-02-09 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.3.

*This release contains a fix for a critical bug in the 3.0 series[4].* If you
have installed any version >= 3.0, you will need to run
'nodetool upgradesstables -a' on all nodes to receive the fix.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/UtWBp4 (CHANGES.txt)
[2]: http://goo.gl/QGrGiy (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: https://issues.apache.org/jira/browse/CASSANDRA-11102


[RELEASE] Apache Cassandra 2.2.5 released

2016-02-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.5.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/lT2JXJ (CHANGES.txt)
[2]: http://goo.gl/9m6hGQ (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: [RELEASE] Apache Cassandra 2.1.13 released

2016-02-08 Thread Jake Luciani
Apologies, I sent the wrong changelog and news links.

Here are the correct ones for 2.1.13:

http://goo.gl/9ZPnNX (CHANGES.txt)
http://goo.gl/5cR7eh (NEWS.txt)



On Mon, Feb 8, 2016 at 9:19 AM, Jake Luciani  wrote:

> The Cassandra team is pleased to announce the release of Apache Cassandra
> version 2.1.13.
>
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a bug fix release[1] on the 2.1 series. As always, please pay
> attention to the release notes[2] and let us know[3] if you encounter any
> problems.
>
> Enjoy!
>
> [1]: http://goo.gl/lT2JXJ (CHANGES.txt)
> [2]: http://goo.gl/9m6hGQ (NEWS.txt)
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>
>


[RELEASE] Apache Cassandra 2.1.13 released

2016-02-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.13.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/lT2JXJ (CHANGES.txt)
[2]: http://goo.gl/9m6hGQ (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: cassandra-stress tool - InvalidQueryException: Batch too large

2016-02-01 Thread Jake Luciani
Yeah that looks like a bug.  Can you open a JIRA and attach the full .yaml?

Thanks!


On Mon, Feb 1, 2016 at 5:09 AM, Ralf Steppacher 
wrote:

> I am using Cassandra 2.2.4 and I am struggling to get the cassandra-stress
> tool to work for my test scenario. I have followed the example on
> http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
> to create a yaml file describing my test.
>
> I am collecting events per user id (text, partition key). Events have a
> session type (text), event type (text), and creation time (timestamp)
> (clustering keys, in that order). Plus some more attributes required for
> rendering the events in a UI. For testing purposes I ended up with the
> following column spec and insert distribution:
>
> columnspec:
>   - name: created_at
>     cluster: uniform(10..1)
>   - name: event_type
>     size: uniform(5..10)
>     population: uniform(1..30)
>     cluster: uniform(1..30)
>   - name: session_type
>     size: fixed(5)
>     population: uniform(1..4)
>     cluster: uniform(1..4)
>   - name: user_id
>     size: fixed(15)
>     population: uniform(1..100)
>   - name: message
>     size: uniform(10..100)
>     population: uniform(1..100B)
>
> insert:
>   partitions: fixed(1)
>   batchtype: UNLOGGED
>   select: fixed(1)/120
>
>
> Running stress tool for just the insert prints
>
> Generating batches with [1..1] partitions and [0..1] rows (of
> [10..120] total rows in the partitions)
>
> and then immediately starts flooding me with
> "com.datastax.driver.core.exceptions.InvalidQueryException: Batch too
> large".
>
> I do not understand why I would be exceeding the
> "batch_size_fail_threshold_in_kb: 50" in cassandra.yaml. My understanding is
> that the stress tool should generate one row per batch. The size of a single
> row should not exceed 8+10*3+5*3+15*3+100*3 = 398 bytes, assuming a worst
> case of all text characters being 3-byte Unicode characters.
>
> How come I end up with batches that exceed the 50kb threshold? Am I
> missing the point about the “select” attribute?
>
>
> Thanks!
> Ralf
>



-- 
http://twitter.com/tjake
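
The row-size estimate in the quoted message can be checked with a few lines
of Python. This is only a sketch of the poster's own arithmetic: it assumes 3
bytes per text character, an 8-byte timestamp, and a threshold of
batch_size_fail_threshold_in_kb * 1024 bytes, and it ignores any per-cell or
per-row serialization overhead. The names below are made up for illustration.

```python
# Worst-case row size estimate, following the arithmetic in the message above.
BYTES_PER_CHAR = 3    # assumption: every character is a 3-byte UTF-8 sequence
TIMESTAMP_BYTES = 8   # created_at

# Maximum character counts taken from the "size" settings in the columnspec.
MAX_CHARS = {"event_type": 10, "session_type": 5, "user_id": 15, "message": 100}


def worst_case_row_bytes():
    """Timestamp plus the largest possible text columns."""
    return TIMESTAMP_BYTES + sum(n * BYTES_PER_CHAR for n in MAX_CHARS.values())


def rows_needed_to_exceed(threshold_kb):
    """Smallest number of worst-case rows whose total size exceeds the threshold."""
    threshold_bytes = threshold_kb * 1024
    return threshold_bytes // worst_case_row_bytes() + 1


print(worst_case_row_bytes())     # 398
print(rows_needed_to_exceed(50))  # 129
```

By this estimate a single row is far below the 50 KB threshold, which matches
the poster's expectation that one-row batches should not fail.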


[RELEASE] Apache Cassandra 3.2.1 released

2016-01-19 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.2.1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.2 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: https://goo.gl/ySa5hr (CHANGES.txt)
[2]: https://goo.gl/tCBBPv (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Jake Luciani
Yes, you can restart without data loss.

Can you please include info about how much data you have loaded per node
and perhaps what your schema looks like?

Thanks

On Thu, Jan 14, 2016 at 12:24 PM, Jean Tremblay <
jean.tremb...@zen-innovations.com> wrote:

>
> Ok, I will open a ticket.
>
> How could I restart my cluster without losing everything?
> Would there be a better memory configuration to select for my nodes?
> Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16 GB RAM node.
>
> Thanks
>
> Jean
>
> On 14 Jan 2016, at 18:19, Tyler Hobbs  wrote:
>
> I don't think that's a known issue.  Can you open a ticket at
> https://issues.apache.org/jira/browse/CASSANDRA and attach your schema
> along with the commitlog files and the mutation that was saved to /tmp?
>
> On Thu, Jan 14, 2016 at 10:56 AM, Jean Tremblay <
> jean.tremb...@zen-innovations.com> wrote:
>
>> Hi,
>>
>> I have a small Cassandra cluster with 5 nodes, each with 16 GB of RAM.
>> I use Cassandra 3.1.1.
>> I use the following setup for the memory:
>>   MAX_HEAP_SIZE="6G"
>>   HEAP_NEWSIZE="496M"
>>
>> I have been loading a lot of data in this cluster over the last 24 hours.
>> The system behaved very nicely, I think. It was loading very fast and
>> giving excellent read times. There were no error messages until this one:
>>
>>
>> ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602
>> JVMStabilityInspector.java:139 - JVM state determined to be unstable.
>> Exiting forcefully due to:
>> java.lang.OutOfMemoryError: Java heap space
>> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_65]
>> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_65]
>> at
>> org.apache.cassandra.io.util.DataOutputBuffer.reallocate(DataOutputBuffer.java:126)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:86)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:297)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:374)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.rows.BufferCell$Serializer.serialize(BufferCell.java:263)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:183)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:298)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:136)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:128)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> ~[na:1.8.0_65]
>> at
>> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
>> [apache-cassandra-3.1.1.jar:3.1.1]
>> at java.lang.Thread.run(Thread.jav
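
On the heap sizing question quoted above, one reference point is the default
that cassandra-env.sh computes when MAX_HEAP_SIZE and HEAP_NEWSIZE are left
unset. A rough sketch in Python of that calculation, as described in the
script's comments: max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB)) for the heap
and min(100MB per core, 1/4 heap) for the new generation. The 16 GB of RAM
and the 8 cores are assumptions: the thread gives the RAM as 16M/16MB, which
could not hold a 6G heap, and does not state the core count.

```python
def default_max_heap_mb(system_memory_mb):
    """Default heap: max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))."""
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    return max(half, quarter)


def default_new_size_mb(max_heap_mb, cpu_cores):
    """Default young generation: min(100MB per core, 1/4 of the heap)."""
    return min(100 * cpu_cores, max_heap_mb // 4)


ram_mb = 16 * 1024   # assumed 16 GB node
cores = 8            # hypothetical core count

heap = default_max_heap_mb(ram_mb)
print(heap)                              # 4096
print(default_new_size_mb(heap, cores))  # 800
```

This only shows what the script would pick by default for comparison with the
6G / 496M settings in the thread; it does not say which setting is better for
this workload.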

Re: [RELEASE] Apache Cassandra 3.2 released

2016-01-12 Thread Jake Luciani
Note: I made a mistake saying this is a bug fix release; it's a feature
release that includes bugfixes.

On Tue, Jan 12, 2016 at 8:46 AM, Jake Luciani  wrote:

>
> The Cassandra team is pleased to announce the release of Apache Cassandra
> version 3.2.
>
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a bug fix release[1] on the 3.2 series. As always, please pay
> attention to the release notes[2] and let us know[3] if you encounter any
> problems.
>
> Enjoy!
>
> [1]: http://goo.gl/vBb0Ad (CHANGES.txt)
> [2]: http://goo.gl/JjUIGF (NEWS.txt)
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>
>


[RELEASE] Apache Cassandra 3.2 released

2016-01-12 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.2 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/vBb0Ad (CHANGES.txt)
[2]: http://goo.gl/JjUIGF (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.2 released

2015-12-21 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: https://goo.gl/swRjp9 (CHANGES.txt)
[2]: https://goo.gl/ipA763 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.1.1 released

2015-12-21 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.1.1.

There has been some understandable confusion about our new Tick-Tock
release style. This thread should help explain it[4]. Since a critical
bug was discovered just after 3.1, we are releasing 3.1.1 to address it
before 3.2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.1 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: https://goo.gl/etxSuG (CHANGES.txt)
[2]: https://goo.gl/gP7B3J (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: http://www.mail-archive.com/user@cassandra.apache.org/msg45119.html


[RELEASE] Apache Cassandra 3.1 released

2015-12-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.1. This is the first release from our new Tick-Tock release
process[4]. It contains only bugfixes on the 3.0 release.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.x series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/rQJ9yd (CHANGES.txt)
[2]: http://goo.gl/WBrlCs (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/


[RELEASE] Apache Cassandra 3.0.1 released

2015-12-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/99MRn6 (CHANGES.txt)
[2]: http://goo.gl/jwoQl6 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.12 released

2015-12-07 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.12.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/Phl5Pd (CHANGES.txt)
[2]: http://goo.gl/L1HIfj (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.4 released

2015-12-07 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.4.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/EWjhm1 (CHANGES.txt)
[2]: http://goo.gl/WLSytN (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: cassandra-stress 2.1: Generating data

2015-12-03 Thread Jake Luciani
The data is only being inserted from gen01

On Thu, Dec 3, 2015 at 10:52 AM,  wrote:

> Hi,
>
> I’m trying to insert data with cassandra-stress into a C* cluster with 6
> nodes: *node001….006*
>
> The stress tool is executed on a different machine (*gen01*), specifying
> one of the 6 nodes: tools/bin/cassandra-stress user profile=cf.yml
> ops\(insert=1\) n=500 -mode thrift -node node001 -rate threads=50
>
> My question: is the data generated on gen01 and then inserted into the
> Cassandra nodes, or does everything (generation and insertion) run on the
> Cassandra nodes?
>
> Thanks.


-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 3.0.0 released

2015-11-09 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.0.

Top Cassandra 3.0 features:

  * CQL optimized storage engine and sstable format
  * Materialized views
  * More efficient hints

Read more about features and upgrade instructions in NEWS.txt[2]

The Java driver beta for 3.0.0 will be officially released within the next
week. In the meantime, use the version included in the release under /lib.

The Python driver rc has been released as '3.0.0rc1'.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is the first release[1] on the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/TduZdw (CHANGES.txt)
[2]: http://goo.gl/mJxdHZ (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.0-rc2 released

2015-10-19 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.0-rc2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a release candidate[1] for the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/mLK41h (CHANGES.txt)
[2]: http://goo.gl/JO8474 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.11 released

2015-10-16 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.11.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/mJCyUf (CHANGES.txt)
[2]: http://goo.gl/ax1w4y (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.3 released

2015-10-16 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.3.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/zLlUcO (CHANGES.txt)
[2]: http://goo.gl/pC433O (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.2 released

2015-10-05 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/d9xIEO (CHANGES.txt)
[2]: http://goo.gl/S64khA (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.10 released

2015-10-05 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.10.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/KE0tlf (CHANGES.txt)
[2]: http://goo.gl/0CW2iz (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Consistency Issues

2015-10-01 Thread Jake Luciani
Onur, I was responding to Stephen's issue.


On Thu, Oct 1, 2015 at 8:56 AM, Onur Yalazı  wrote:

> Thank you Jake.
>
> The issue is I do not have missing CF's and upgrading beyond 2.1.3 is not
> a possibility because of the deprecation of cql dialects. Our application
> is using Hector and migrating to cql3 is a huge refactoring.
>
>
>
> On 01/10/15 15:48, Jake Luciani wrote:
>
>> Couple things to try.
>>
>> 1. nodetool resetlocalschema on the nodes with missing CFs. This will
>> refresh the schema on the local node.
>> 2. upgrade to 2.1.9. There are some pretty major issues in 2.1.6 (nothing
>> specific to this problem but worth upgrading)
>>
>
>


-- 
http://twitter.com/tjake


Re: Consistency Issues

2015-10-01 Thread Jake Luciani
Couple things to try.

1. nodetool resetlocalschema on the nodes with missing CFs. This will
refresh the schema on the local node.
2. upgrade to 2.1.9. There are some pretty major issues in 2.1.6 (nothing
specific to this problem but worth upgrading)


[RELEASE] Apache Cassandra 3.0.0-rc1 released

2015-09-21 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.0-rc1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a release candidate[1] on the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/Oppn3S (CHANGES.txt)
[2]: http://goo.gl/zQFaj4 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.0.17 released

2015-09-21 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.17.

This is most likely the final release for the 2.0 release series.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/QwruFc (CHANGES.txt)
[2]: http://goo.gl/fHlSqL (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.0-beta2 released

2015-09-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.0-beta2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a *beta* release[1] on the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/UWe5yb (CHANGES.txt)
[2]: http://goo.gl/xWOSDa (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.1 released

2015-09-01 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/x6ilHu (CHANGES.txt)
[2]: http://goo.gl/FHwYLN (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.9 released

2015-08-28 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.9.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/xnYwFa (CHANGES.txt)
[2]: http://goo.gl/QDqPhN (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.0-beta1 released

2015-08-24 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.0-beta1.

You’ll need python-driver 3.0.0a2 (available on pypi) or java-driver
3.0.0-alpha2 (uploaded to Maven Central) to try out 3.0.0-beta1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a *BETA* release[1] on the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/2TNRm5 (CHANGES.txt)
[2]: http://goo.gl/9xluWy (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.0-alpha1 released

2015-08-03 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.0-alpha1.

This is the first test build of Cassandra 3.0 that includes:

   * New storage engine
   * New sstable format
   * Materialized Views

We expect bugs in this release so test and report any issues please!


Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is an *ALPHA* release[1] on the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/qTe3Ed (CHANGES.txt)
[2]: http://goo.gl/eMIDGw (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.0 released

2015-07-20 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.0.

You can read about the release here:
http://www.datastax.com/dev/blog/cassandra-2-2

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is the first release[1] on the 2.2 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/nUjs6O (CHANGES.txt)
[2]: http://goo.gl/Qk4ljt (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.0-rc2 released

2015-07-09 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.0-rc2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a release candidate[1] on the 2.2 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/pE0pPF (CHANGES.txt)
[2]: http://goo.gl/h5OJie (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.8 released

2015-07-09 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.8.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/heI10N (CHANGES.txt)
[2]: http://goo.gl/BIe5dS (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Adding Nodes With Inconsistent Data

2015-06-24 Thread Jake Luciani
This is no longer an issue in 2.1.
https://issues.apache.org/jira/browse/CASSANDRA-2434

We now make sure the replica we bootstrap from is the one that will no
longer own that range.
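
To make the difference concrete, here is a toy model of the scenario quoted
below (3 nodes, RF=3, key K written to nodes 1 and 2 only, node 4 joining and
node 2 giving up the range). It is only a sketch: the function names are made
up for illustration and this is not how Cassandra's streaming code is written.

```python
from itertools import combinations


def bootstrap(holders, old_replicas, leaving, joining, source):
    """Model one bootstrap: `joining` replaces `leaving` in the replica set
    and copies whatever `source` has. Returns (new_replicas, new_holders)."""
    new_replicas = (old_replicas - {leaving}) | {joining}
    new_holders = set(holders)
    if source in holders:
        # The joining node only receives the key if its source had it.
        new_holders.add(joining)
    return new_replicas, new_holders & new_replicas


def quorum_read_always_sees(replicas, holders):
    """True if every possible quorum of `replicas` contains a holder."""
    quorum = len(replicas) // 2 + 1
    return all(set(group) & holders
               for group in combinations(sorted(replicas), quorum))


old_replicas = {1, 2, 3}
holders = {1, 2}  # key K reached nodes 1 and 2, was dropped on node 3

# Streaming from node 3 (which keeps the range but is missing K):
replicas, bad = bootstrap(holders, old_replicas, leaving=2, joining=4, source=3)
print(quorum_read_always_sees(replicas, bad))   # False

# Streaming from node 2 (the replica that will no longer own the range):
replicas, good = bootstrap(holders, old_replicas, leaving=2, joining=4, source=2)
print(quorum_read_always_sees(replicas, good))  # True
```

In this model, copying from the departing replica keeps the number of replicas
holding K unchanged, so a quorum read still overlaps the earlier quorum write.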

On Wed, Jun 24, 2015 at 4:58 PM, Alain RODRIGUEZ  wrote:

> It looks to me that can indeed happen theoretically (I might be wrong).
>
> However,
>
> - Hinted Handoff tends to remove this issue; if this is a big worry, you
> might want to make sure HH is enabled and well tuned
> - Read Repairs (synchronous or not) might have mitigated things also, if
> you read fresh data. You can set this to higher values.
> - After an outage, you should always run a nodetool repair on the node
> that went down - following the best practices, or because you understand
> the reasons - or just trust HH if it is enough for you.
>
> So I would say that you can always "shoot yourself in your foot", whatever
> you do, yet following best practices or understanding the internals is the
> key imho.
>
> I would say it is a good question though.
>
> Alain.
>
>
>
> 2015-06-24 19:43 GMT+02:00 Anuj Wadehra :
>
>> Hi,
>>
>> We faced a scenario where we lost little data after adding 2 nodes in the
>> cluster. There were intermittent dropped mutations in the cluster. Need to
>> verify my understanding how this may have happened to do Root Cause
>> Analysis:
>>
>> Scenario: 3 nodes, RF=3, Read / Write CL= Quorum
>>
>> 1. Due to overloaded cluster, some writes just happened on 2 nodes: node
>> 1 & node 2 while asynchronous mutations dropped on node 3.
>> So say key K with Token T was not written to 3.
>>
>> 2. I added node 4 and suppose as per newly calculated ranges, now token T
>> is supposed to have replicas on node 1, node 3, and node 4. Unfortunately
>> node 4 started bootstrapping from node 3 where key K was missing.
>>
>> 3. After 2 min gap recommended, I added node 5 and as per new token
>> distribution suppose token T now is supposed to have replicas on node 3,
>> node 4 and node 5. Again node 5 bootstrapped from node 3 where data was
>> missing.
>>
>> So now key K is lost and that's how we lost a few rows.
>>
>> Moreover, in step 1 situation could be worse. we can also have a scenario
>> where some writes just happened on one of three replicas and cassandra
>> chooses  replicas where this data is missing for streaming ranges to 2 new
>> nodes.
>>
>> Am I making sense?
>>
>> We are using C* 2.0.3.
>>
>> Thanks
>> Anuj
>>
>>
>>
>> Sent from Yahoo Mail on Android
>> 
>>
>
>


-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 2.1.7 released

2015-06-22 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.7.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/0AxLpL (CHANGES.txt)
[2]: http://goo.gl/kkEDSi (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.0.16 released

2015-06-22 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.16.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/XtSTxA (CHANGES.txt)
[2]: http://goo.gl/9NHMdH (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.0-rc1 released

2015-06-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.0-rc1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a release candidate[1] on the 2.2 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/pBjybx (CHANGES.txt)
[2]: http://goo.gl/E1RiHd (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.6 released

2015-06-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.6.  We are now calling 2.1 series stable and suitable for
production.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/8aR9L2 (CHANGES.txt)
[2]: http://goo.gl/dstU4D (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Multiple cassandra instances per physical node

2015-05-26 Thread Jake Luciani
>
>  If I have a 20-node cluster with 2 nodes on each physical server, can I
> use 10 racks to properly segment my partitions?
>
>
Yes.
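
As a rough illustration of why one rack per physical server works, here is a
small sketch. It assumes a simplified ring in which node order equals token
order and a placement rule that skips nodes whose rack is already used (which
is what rack-aware placement aims for). The node and rack names are made up.

```python
NODES_PER_SERVER = 2
NODE_COUNT = 20


def rack_for(node_index):
    """Both nodes on the same physical server share one rack name."""
    return "rack%d" % (node_index // NODES_PER_SERVER + 1)


# Hypothetical ring: node1..node20 in token order, two per server.
ring = ["node%d" % (i + 1) for i in range(NODE_COUNT)]
racks = {name: rack_for(i) for i, name in enumerate(ring)}


def pick_replicas(start, rf=3):
    """Walk the ring from `start`, taking only nodes on racks not yet used."""
    chosen, seen = [], set()
    for offset in range(len(ring)):
        node = ring[(start + offset) % len(ring)]
        if racks[node] not in seen:
            chosen.append(node)
            seen.add(racks[node])
        if len(chosen) == rf:
            break
    return chosen


print(len(set(racks.values())))  # 10
print(pick_replicas(0))          # ['node1', 'node3', 'node5']
```

With 10 racks and RF=3, every replica set in this model lands on three
different physical servers, so losing one server costs at most one replica.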


>
>
> On Sun, May 24, 2015 at 5:38 PM, Jonathan Haddad 
> wrote:
>
>> What impact would vnodes have on strong consistency?  I think the problem
>> you're describing exists with or without them.
>>
>> On Sat, May 23, 2015 at 2:30 PM Nate McCall 
>> wrote:
>>
>>>
>>>> So my question is: suppose I take a 12 disk JBOD and run 2 Cassandra
>>>> nodes (each with 5 data disks, 1 commit log disk) and either give each its
>>>> own container & IP or change the listen ports. Will this work? What are the
>>>> risks? Will/should Cassandra support this better in the future?

>>>
>>> Don't use vnodes if any operations need strong consistency (reading or
>>> writing at quorum). Otherwise, at RF=3, if you lose a single node you will
>>> only have one replica left for some portion of the ring.
>>>
>>>
>>>
>>> --
>>> -
>>> Nate McCall
>>> Austin, TX
>>> @zznate
>>>
>>> Co-Founder & Sr. Technical Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>
>
>
> --
> *Ken Hancock *| System Architect, Advanced Advertising
> SeaChange International
> 50 Nagog Park
> Acton, Massachusetts 01720
> ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
> 
> Office: +1 (978) 889-3329 | [image: Google Talk:] ken.hanc...@schange.com
>  | [image: Skype:]hancockks | [image: Yahoo IM:]hancockks[image: LinkedIn]
> 
>
> [image: SeaChange International]
> This e-mail and any attachments may contain
> information which is SeaChange International confidential. The information
> enclosed is intended only for the addressees herein and may not be copied
> or forwarded without permission from SeaChange International.
>



-- 
http://twitter.com/tjake


[BETA-RELEASE] Apache Cassandra 2.2.0-beta1 released

2015-05-19 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.0-beta1.

This release is *not* production ready. We are looking for testing of
existing and new features. If you encounter any problem please let us know
[1].

Cassandra 2.2 features major enhancements such as:

* Resume-able Bootstrapping
* JSON Support [4]
* User Defined Functions [5]
* Server-side Aggregation [6]
* Role based access control

Read [2] and [3] to learn about all the new features.

Downloads of source and binary distributions are listed in our download
section:

http://cassandra.apache.org/download/

Enjoy!

-The Cassandra Team

[1]: https://issues.apache.org/jira/browse/CASSANDRA
[2]: http://goo.gl/MyOEib (NEWS.txt)
[3]: http://goo.gl/MBJd1S (CHANGES.txt)
[4]: http://cassandra.apache.org/doc/cql3/CQL-2.2.html#json
[5]: http://cassandra.apache.org/doc/cql3/CQL-2.2.html#udfs
[6]: http://cassandra.apache.org/doc/cql3/CQL-2.2.html#udas


[RELEASE] Apache Cassandra 2.0.15 released

2015-05-18 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.15.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/G050Kn (CHANGES.txt)
[2]: http://goo.gl/ZyvMnR (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.5 released

2015-04-29 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.5.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/xjzhhE (CHANGES.txt)
[2]: http://goo.gl/skvzNS (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[SECURITY ANNOUNCEMENT] CVE-2015-0225

2015-04-01 Thread Jake Luciani
CVE-2015-0225: Apache Cassandra remote execution of arbitrary code

Severity: Important

Vendor:
The Apache Software Foundation

Versions Affected:
Cassandra 1.2.0 to 1.2.19
Cassandra 2.0.0 to 2.0.13
Cassandra 2.1.0 to 2.1.3

Description:
Under its default configuration, Cassandra binds an unauthenticated
JMX/RMI interface to all network interfaces.  As RMI is an API for the
transport and remote execution of serialized Java, anyone with access
to this interface can execute arbitrary code as the running user.

Mitigation:
1.2.x has reached EOL, so users of <= 1.2.x are recommended to upgrade
to a supported version of Cassandra, or manually configure encryption
and authentication of JMX,
(see https://wiki.apache.org/cassandra/JmxSecurity).
2.0.x users should upgrade to 2.0.14
2.1.x users should upgrade to 2.1.4
Alternately, users of any version not wishing to upgrade can
reconfigure JMX/RMI to enable encryption and authentication according
to https://wiki.apache.org/cassandra/JmxSecurity or
http://docs.oracle.com/javase/7/docs/technotes/guides/management/agent.html
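
A quick way to see whether a node's JMX port is exposed beyond localhost is to
try a plain TCP connection from another machine. The sketch below assumes the
default Cassandra JMX port 7199; the function name is made up for
illustration. A successful connection only shows that the port is reachable,
not whether authentication is enabled.

```python
import socket


def jmx_port_reachable(host, port=7199, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False
```

Calling jmx_port_reachable() with a node's public address from a machine
outside the cluster and getting True means the interface described above is
reachable from that machine and should be locked down or upgraded.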

Credit:
This issue was discovered by Georgi Geshev of MWR InfoSecurity


[RELEASE] Apache Cassandra 2.0.13 released

2015-03-16 Thread Jake Luciani
Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/Rh9gyx (CHANGES.txt)
[2]: http://goo.gl/k8vIom (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Cassandra Stress Test Result Evaluation

2015-03-09 Thread Jake Luciani
Your insert settings look unrealistic since I doubt you would be
writing 50k rows at a time.  Try to set this to 1 per partition and
you should get much more consistent numbers across runs I would think.
select: fixed(1)/10
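
For reference, the 50k figure falls out of the quoted profile with simple
arithmetic. This sketch assumes partitions: fixed(100), 1000 clustering rows
per partition, and select: fixed(1)/2 read as "half of each partition's rows
per batch", as in the quoted message. The function name is made up.

```python
from fractions import Fraction


def rows_per_batch(partitions, rows_per_partition, select_ratio):
    """Rows written by one batch: partitions x rows per partition x ratio."""
    return int(partitions * rows_per_partition * select_ratio)


# 100 partitions x 1000 rows x 1/2 selected per batch
print(rows_per_batch(100, 1000, Fraction(1, 2)))  # 50000
```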

On Wed, Mar 4, 2015 at 7:53 AM, Nisha Menon  wrote:
> I have been using the cassandra-stress tool to evaluate my cassandra cluster
> for quite some time now. My problem is that I am not able to comprehend the
> results generated for my specific use case.
>
> My schema looks something like this:
>
> CREATE TABLE Table_test(
>   ID uuid,
>   Time timestamp,
>   Value double,
>   Date timestamp,
>   PRIMARY KEY ((ID,Date), Time)
> ) WITH COMPACT STORAGE;
>
> I have parsed this information in a custom yaml file and used parameters
> n=1, threads=100 and the rest are default options (cl=one, mode=native
> cql3 etc). The Cassandra cluster is a 3 node CentOS VM setup.
>
> A few specifics of the custom yaml file are as follows:
>
> insert:
> partitions: fixed(100)
> select: fixed(1)/2
> batchtype: UNLOGGED
>
> columnspecs:
> -name: Time
>  size: fixed(1000)
> -name: ID
>  size: uniform(1..100)
> -name: Date
>  size: uniform(1..10)
> -name: Value
>  size: uniform(-100..100)
>
> My observations so far are as follows (Please correct me if I am wrong):
>
> With n=1 and time: fixed(1000), the number of rows getting inserted is
> 10 million. (1*1000=1000)
> The number of row-keys/partitions is 1(i.e n), within which 100
> partitions are taken at a time (which means 100 *1000 = 10 key-value
> pairs) out of which 5 key-value pairs are processed at a time. (This is
> because of select: fixed(1)/2 ~ 50%)
>
> The output message also confirms the same:
>
> Generating batches with [100..100] partitions and [5..5] rows
> (of[10..10] total rows in the partitions)
>
> The results that I get are the following for consecutive runs with the same
> configuration as above:
>
> Run Total_ops   Op_rate Partition_rate  Row_Rate   Time
> 1 56   19 1885   943246 3.0
> 2 46   46 4648  2325498 1.0
> 3 27   30 2982  1489870 0.9
> 4 59   19 1932   966034 3.1
> 5 100  17 1730   865182 5.8
>
> Now what I need to understand are as follows:
>
> Which among these metrics is the throughput i.e, No. of records inserted per
> second? Is it the Row_rate, Op_rate or Partition_rate? If it’s the Row_rate,
> can I safely conclude here that I am able to insert close to 1 million
> records per second? Any thoughts on what the Op_rate and Partition_rate mean
> in this case?
> Why is it that the Total_ops vary so drastically in every run ? Has the
> number of threads got anything to do with this variation? What can I
> conclude here about the stability of my Cassandra setup?
> How do I determine the batch size per thread here? In my example, is the
> batch size 5?
>
> Thanks in advance.



-- 
http://twitter.com/tjake


Re: Many pending compactions

2015-02-18 Thread Jake Luciani
Ja, please upgrade to the official 2.1.3; we've fixed many things related to
compaction.  Are you seeing the compaction % complete progress at all?

On Wed, Feb 18, 2015 at 11:58 AM, Roni Balthazar 
wrote:

> Try repair -pr on all nodes.
>
> If after that you still have issues, you can try to rebuild the SSTables
> using nodetool upgradesstables or scrub.
>
> Regards,
>
> Roni Balthazar
>
> Em 18/02/2015, às 14:13, Ja Sam  escreveu:
>
> ad 3)  I did this already yesterday (setcompactionthrouput also). But
> still SSTables are increasing.
>
> ad 1) What do you think I should use -pr or try to use incremental?
>
>
>
> On Wed, Feb 18, 2015 at 4:54 PM, Roni Balthazar 
> wrote:
>
>> You are right... Repair makes the data consistent between nodes.
>>
>> I understand that you have 2 issues going on.
>>
>> You need to run repair periodically without errors and need to decrease
>> the numbers of compactions pending.
>>
>> So I suggest:
>>
>> 1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you can
>> use incremental repairs. There were some bugs on 2.1.2.
>> 2) Run cleanup on all nodes
>> 3) Since you have too many cold SSTables, set cold_reads_to_omit to 0.0,
>> and increase setcompactionthroughput for some time and see if the number
>> of SSTables is going down.
>>
>> Let us know what errors are you getting when running repairs.
>>
>> Regards,
>>
>> Roni Balthazar
>>
>>
>> On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam  wrote:
>>
>>> Can you explain me what is the correlation between growing SSTables and
>>> repair?
>>> I was sure, until your  mail, that repair is only to make data
>>> consistent between nodes.
>>>
>>> Regards
>>>
>>>
>>> On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar >> > wrote:
>>>
 Which error are you getting when running repairs?
 You need to run repair on your nodes within gc_grace_seconds (eg:
 weekly). They have data that are not read frequently. You can run
 "repair -pr" on all nodes. Since you do not have deletes, you will not
 have trouble with that. If you have deletes, it's better to increase
 gc_grace_seconds before the repair.

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
 After repair, try to run a "nodetool cleanup".

 Check if the number of SSTables goes down after that... Pending
 compactions must decrease as well...

 Cheers,

 Roni Balthazar




 On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam  wrote:
 > 1) we tried to run repairs but they usually does not succeed. But we
 had
 > Leveled compaction before. Last week we ALTER tables to STCS, because
 guys
 > from DataStax suggest us that we should not use Leveled and alter
 tables in
 > STCS, because we don't have SSD. After this change we did not run any
 > repair. Anyway I don't think it will change anything in SSTable count
 - if I
 > am wrong please give me an information
 >
 > 2) I did this. My tables are 99% write only. It is audit system
 >
 > 3) Yes I am using default values
 >
 > 4) In both operations I am using LOCAL_QUORUM.
 >
 > I am almost sure that READ timeout happens because of too much
 SSTables.
 > Anyway firstly I would like to fix to many pending compactions. I
 still
 > don't know how to speed up them.
 >
 >
 > On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar <
 ronibaltha...@gmail.com>
 > wrote:
 >>
 >> Are you running repairs within gc_grace_seconds? (default is 10 days)
 >>
 >>
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
 >>
 >> Double check if you set cold_reads_to_omit to 0.0 on tables with STCS
 >> that you do not read often.
 >>
 >> Are you using default values for the properties
 >> min_compaction_threshold(4) and max_compaction_threshold(32)?
 >>
 >> Which Consistency Level are you using for reading operations? Check
 if
 >> you are not reading from DC_B due to your Replication Factor and CL.
 >>
 >>
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
 >>
 >>
 >> Cheers,
 >>
 >> Roni Balthazar
 >>
 >> On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam 
 wrote:
 >> > I don't have problems with DC_B (replica) only in DC_A(my system
 write
 >> > only
 >> > to it) I have read timeouts.
 >> >
 >> > I checked in OpsCenter SSTable count  and I have:
 >> > 1) in DC_A  same +-10% for last week, a small increase for last
 24h (it
 >> > is
 >> > more than 15000-2 SSTables depends on node)
 >> > 2) in DC_B last 24h shows up to 50% decrease, which give nice
 >> > prognostics.
 >> > Now I have less then 1000 SSTables
 >> >
 >> > What did you measure during system optimizations? Or do you have
 an idea
 >> > what m

[RELEASE] Apache Cassandra 2.1.3 released

2015-02-17 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.3.

This release contains over 100 fixes for 2.1 so anyone on 2.1.X should
upgrade to this ASAP.


Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/xGm4Qq (CHANGES.txt)
[2]: http://goo.gl/dBGQa0 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.0.12 released

2015-01-20 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.12.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/ZeeTfs (CHANGES.txt)
[2]: http://goo.gl/1zEijH (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.2 released

2014-11-10 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problems.

Enjoy!

[1]: http://goo.gl/pi45XF (CHANGES.txt)
[2]: http://goo.gl/vtSXzZ (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Apache Cassandra debian repo issue

2014-10-31 Thread Jake Luciani
Hello,

There is currently an issue with the apache debian repo for cassandra.

ASF infrastructure is working on fixing this
https://issues.apache.org/jira/browse/INFRA-8558

Sorry for the inconvenience.

-Jake


Re: CPU consumption of Cassandra

2014-09-22 Thread Jake Luciani
Eric,

We have a new stress tool to help you share your schema for wider
benchmarking.  See
http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
If you wouldn't mind creating a yaml for your schema I would be happy to
take a look.

-Jake




On Mon, Sep 22, 2014 at 12:39 PM, Leleu Eric 
wrote:

>  Hi,
>
>
>
>
>
> I’m currently testing Cassandra 2.0.9  (and since the last week 2.1) under
> some read heavy load…
>
>
>
> I have 2 cassandra nodes (RF : 2) running under CentOS 6 with 16GB of RAM
> and 8 Cores.
>
> I have around 93GB of data per node (one Disk of 300GB with SAS interface
> and a Rotational Speed of 10500)
>
>
>
> I have 300 active client threads and they request the C* nodes with a
> Consitency level set to ONE (I’m using the CQL datastax driver).
>
>
>
> During my tests I saw  a lot of CPU consumption (70% user / 6%sys / 4%
> iowait / 20%idle).
>
> C* nodes respond to around 5000 op/s (sometime up to 6000op/s)
>
>
>
> I try to profile a node and at the first look, 60% of the CPU is passed in
> the “sun.nio.ch” package. (SelectorImpl.select or Channel.read)
>
>
>
> I know that Benchmark results are highly dependent of the Dataset and use
> cases, but according to my point of view this CPU consumption is normal
> according to the load.
>
> Someone can confirm that point ?
>
> According to my Hardware configuration, can I expect to have more than
> 6000 read op/s ?
>
>
>
>
>
> Regards,
>
> Eric
>
>
>
>
>
>
>
>
>
> --
>
> Ce message et les pièces jointes sont confidentiels et réservés à l'usage
> exclusif de ses destinataires. Il peut également être protégé par le secret
> professionnel. Si vous recevez ce message par erreur, merci d'en avertir
> immédiatement l'expéditeur et de le détruire. L'intégrité du message ne
> pouvant être assurée sur Internet, la responsabilité de Worldline ne pourra
> être recherchée quant au contenu de ce message. Bien que les meilleurs
> efforts soient faits pour maintenir cette transmission exempte de tout
> virus, l'expéditeur ne donne aucune garantie à cet égard et sa
> responsabilité ne saurait être recherchée pour tout dommage résultant d'un
> virus transmis.
>
> This e-mail and the documents attached are confidential and intended
> solely for the addressee; it may also be privileged. If you receive this
> e-mail in error, please notify the sender immediately and destroy it. As
> its integrity cannot be secured on the Internet, the Worldline liability
> cannot be triggered for the message content. Although the sender endeavours
> to maintain a computer virus-free network, the sender does not warrant that
> this transmission is virus-free and will not be liable for any damages
> resulting from any virus transmitted.
>



-- 
http://twitter.com/tjake


Re: [RELEASE] Apache Cassandra 1.2.19 released

2014-09-18 Thread Jake Luciani
Apologies, the correct url for CHANGES.txt is http://goo.gl/eB973i

On Thu, Sep 18, 2014 at 12:58 PM, Jake Luciani  wrote:

> The Cassandra team is pleased to announce the release of Apache Cassandra
> version 1.2.19.
>
> Cassandra is a highly scalable second-generation distributed database,
> bringing together Dynamo's fully distributed design and Bigtable's
> ColumnFamily-based data model. You can read more here:
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a maintenance/bug fix release[1] on the 1.2 series. As always,
> please pay attention to the release notes[2] and Let us know[3] if you were to
> encounter any problem. This will likely be the final release in the 1.2 series.
>
> Enjoy!
>
> [1]: http://goo.gl/F6szqv (CHANGES.txt)
> [2]: http://goo.gl/9VsZ88 (NEWS.txt)
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>


[RELEASE] Apache Cassandra 1.2.19 released

2014-09-18 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.2.19.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1] on the 1.2 series. As always,
please pay attention to the release notes[2] and Let us know[3] if you were to
encounter any problem. This will likely be the final release in the 1.2 series.

Enjoy!

[1]: http://goo.gl/F6szqv (CHANGES.txt)
[2]: http://goo.gl/9VsZ88 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Jake Luciani
I'll note that historically the wiki used to be open to all, and due to massive
amounts of spam it was put on lockdown by the ASF.

If there is a better platform the community feels would make it simpler to
provide community-based documentation, then we should consider it.
The ASF also has a Confluence wiki which might be simpler for users to
contribute to? (at least they have captchas)

-Jake



On Wed, Jul 23, 2014 at 9:20 AM, Peter Lin  wrote:

> @benedict - you're right that I've haven't requested permission to edit.
> You're also right that I've given up on getting edit permission to
> cassandra wiki. I've been struggling and struggled with "how" to manage
> open source projects, so I totally get it. Managing projects is a thankless
> job most of the time. Pleasing everyone is totally impossible. Apache isn't
> alone in this. I've submitted stuff to google's open source projects in the
> past and had it go into a black hole. We all struggle with managing open
> source projects.
>
> I am committed to contributing Cassandra community, but just not through
> the wiki. There's lots of different ways to contribute. The jira tickets
> I've submitted have gotten good responses generally. It does take several
> days depending on how busy the committers are, but that's normal for all
> projects.
>
>
>
> On Wed, Jul 23, 2014 at 9:00 AM, Benedict Elliott Smith <
> belliottsm...@datastax.com> wrote:
>
>> Requesting a change is very different to requesting permission to edit
>> (which, I note, still hasn't been made); we do our best to promote
>> community engagement, so granting a privilege request has a different
>> mental category to a random edit request, which is much more likely to be
>> forgotten by any particular committer in the process of attending to their
>> more pressing work.
>>
>> The relationship between committers and the community is debated at
>> length in all projects, often by vocal individuals such as yourselves who
>> are unhappy in some way with how the project is being run. However it is
>> very hard to please everyone - most of the time we can't even please all
>> the committers, and that is a much smaller and more homogenous group.
>>
>>
>>
>>
>>
>> On Wed, Jul 23, 2014 at 2:30 PM, Peter Lin  wrote:
>>
>>>
>>> I sent a request to add a link my .Net driver for cassandra to the wiki
>>> over 5 weeks back and no response at all.
>>>
>>> I sent another request way back in 2013 and got zero response. Again, I
>>> totally understand people are busy and I'm just as guilty as everyone else
>>> of letting requests slip by. It's the reality of contributing to open
>>> source as a hobby. If I wasn't serious about contributing to cassandra
>>> community, I wouldn't have spent 2.5 months porting Hector to C# manually.
>>>
>>> Perhaps the real cause is that some committers can't "empathise" with
>>> others in the community?
>>>
>>>
>>> On Wed, Jul 23, 2014 at 8:22 AM, Benedict Elliott Smith <
>>> belliottsm...@datastax.com> wrote:
>>>
>>>> All requests I've seen in the past year to edit the wiki (admittedly
>>>> only 2-3) have been answered promptly with editing privileges. Personally I
>>>> don't have a major preference either way for policy - there are positives
>>>> and negatives to each approach - but, like I said, raise it on the dev list
>>>> and see if anybody else does.
>>>>
>>>> However I must admit I cannot empathise with your characterisation of
>>>> requesting permission as 'begging', or a 'slap in the face', or that it is
>>>> even particularly onerous. It is a slight psychological barrier, but in my
>>>> personal experience when a psychological barrier as low as this prevents me
>>>> from taking action, it's usually because I don't have as much desire to
>>>> contribute as I thought I did.
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jul 23, 2014 at 1:54 PM, Peter Lin  wrote:
>>>>
>>>>>
>>>>> I've submitted requests to edit the wiki in the past and nothing ever
>>>>> got done.
>>>>>
>>>>> Having been an apache committer and contributor over the years, I can
>>>>> totally understand that people are busy. I also understand that "most"
>>>>> developers find writing docs tedious.
>>>>>
>>>>> I'd rather not harass the committers about wiki edits, since I didn't
>>>>> like it when it happened to me in the past. That's why many apache projects
>>>>> keep their wikis open. Honestly, as much as I find writing docs
>>>>> challenging and tedious, it's critical and important. For my other open
>>>>> source projects, I force myself to write docs.
>>>>>
>>>>> My point is, the wiki should be open and the barrier should be
>>>>> removed. Having to "beg/ask" to edit the wiki feels like a slap in the face
>>>>> to me, but maybe I'm alone in this. Then again, I've heard the same
>>>>> sentiment from other people about cassandra's wiki. The thing is, they just
>>>>> chalk it up to "cassandra committers don't give a crap about docs". I do my
>>>>> best to defend the committers and point out some are

Re: Which way to Cassandraville?

2014-07-22 Thread Jake Luciani
Check out DataStax DevCenter, which is a GUI data modelling tool for CQL3:

http://www.datastax.com/what-we-offer/products-services/devcenter


On Sun, Jul 20, 2014 at 7:17 PM, jcllings  wrote:

> So I'm a Java application developer and I'm trying to find entry points
> for learning to work with Cassandra.
> I just finished reading "Cassandra: The Definitive Guide" which seems
> pretty out of date and while very informative as to the technology that
> Cassandra uses, was not very helpful from the perspective of an
> application developer.
>
> Having said that, what Java clients should I be looking at?  Are there
> any reasonably mature PoJo mapping techs for Cassandra analogous to
> Hibernate? I can't say that I'm looking forward to yet another *QL
> variant but I guess CQL is going to be a necessity.  What, if any, GUI
> tools are available for working with Cassandra, for data modelling?
>
> Jim C.
>
>


-- 
http://twitter.com/tjake


Re: high pending compactions

2014-06-08 Thread Jake Luciani
2&3

On Sunday, June 8, 2014, S C  wrote:

> I am using Cassandra 1.1 (sorry bit old) and I am seeing high pending
> compaction count. "pending tasks: 67" while active compaction tasks are
> not more than 5. I have a 24CPU machine. Shouldn't I be seeing more
> compactions? Is this a pattern of high writes and compactions backing up?
> How can I improve this? Here are my thoughts.
>
>
>    1. Increase memtable_total_space_in_mb
>    2. Increase compaction_throughput_mb_per_sec
>    3. Increase concurrent_compactions
>
>
> Sorry if this was discussed already. Any pointers are much appreciated.
>
> Thanks,
> Kumar
>


-- 
http://twitter.com/tjake


Re: HsHa

2013-08-14 Thread Jake Luciani
This is technically a Thrift message, not a Cassandra one; it happens when a
client hangs up without closing the socket.
You should be able to silence it by raising the class-specific log level;
see log4j-server.properties for an example.


On Wed, Aug 14, 2013 at 9:59 AM, Alain RODRIGUEZ  wrote:

> @Committers/Experts,
>
> Does this sound like a bug or like 4 PEBCAKs to you ? Should we raise a
> JIRA ?
>
> Alain
>
>
> 2013/8/14 Keith Wright 
>
>> Same here on 1.2.4.
>>
>> From: Romain HARDOUIN 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Wednesday, August 14, 2013 3:36 AM
>> To: "user@cassandra.apache.org" 
>> Subject: Re: HsHa
>>
>> The same goes for us.
>>
>> Romain
>>
>> Alain RODRIGUEZ  wrote on 13/08/2013 18:10:05:
>>
>> > From: Alain RODRIGUEZ 
>> > To: user@cassandra.apache.org,
>> > Date: 13/08/2013 18:10
>> > Subject: Re: HsHa
>> >
>> > I have this anytime I try to switch to "hsha" since 0.8.
>> >
>> > Always kept "sync" for this reason. Thought I was alone with this
>> > bug since I never had any clue about this on the mailing list.
>> >
>> > So +1.
>> >
>> > Alain
>> >
>>
>> > 2013/8/13 Christopher Wirt 
>> > Hello,
>> >
>> > I was trying out the hsha thrift server implementation and found
>> > that I get a fair amount of these appearing in the server logs.
>> >
>> > ERROR [Selector-Thread-9] 2013-08-13 15:39:10,433
>> > TNonblockingServer.java (line 468) Read an invalid frame size of 0.
>> > Are you using TFramedTransport on the client side?
>> > ERROR [Selector-Thread-9] 2013-08-13 15:39:11,499
>> > TNonblockingServer.java (line 468) Read an invalid frame size of 0.
>> > Are you using TFramedTransport on the client side?
>> > ERROR [Selector-Thread-9] 2013-08-13 15:39:11,695
>> > TNonblockingServer.java (line 468) Read an invalid frame size of 0.
>> > Are you using TFramedTransport on the client side?
>> > ERROR [Selector-Thread-9] 2013-08-13 15:39:12,562
>> > TNonblockingServer.java (line 468) Read an invalid frame size of 0.
>> > Are you using TFramedTransport on the client side?
>> > ERROR [Selector-Thread-1] 2013-08-13 15:39:12,660
>> > TNonblockingServer.java (line 468) Read an invalid frame size of 0.
>> > Are you using TFramedTransport on the client side?
>> > ERROR [Selector-Thread-9] 2013-08-13 15:39:13,496
>> > TNonblockingServer.java (line 468) Read an invalid frame size of 0.
>> > Are you using TFramedTransport on the client side?
>> > ERROR [Selector-Thread-9] 2013-08-13 15:39:14,281
>> > TNonblockingServer.java (line 468) Read an invalid frame size of 0.
>> > Are you using TFramedTransport on the client side?
>> >
>> > Has anyone seen this message before, or know what it means or what issues it
>> > could hide?
>> >
>> > https://issues.apache.org/jira/browse/CASSANDRA-4573
>> > in the comments suggests it might be a 10 second client timeout,
>> > but looking at JMX client stats the max value for read/write/slice
>> > is well below 10 secs.
>> >
>> >
>> > I’m using 1.2.8 on centos
>> >
>> >
>> > Cheers,
>> > Chris
>>
>
>


-- 
http://twitter.com/tjake


Re: Why does cassandra PoolingSegmentedFile recycle the RandomAccessReader?

2013-07-15 Thread Jake Luciani
Take a look at https://issues.apache.org/jira/browse/CASSANDRA-5661


On Mon, Jul 15, 2013 at 4:18 AM, sulong  wrote:

> Thanks for your help. Yes, I will try to increase the sstable size. I hope
> it can save me.
>
> 9000 SSTableReader x 10 RandomAccessReader x 64Kb = 5.6G memory. If there
> is only one RandomAccessReader, the memory will be 9000 * 1 * 64Kb = 0.56G
> . Looks great. But I think it must be reasonable to recycle the
> RandomAccessReader.
>
>
> On Mon, Jul 15, 2013 at 4:02 PM, Janne Jalkanen 
> wrote:
>
>>
>> I had exactly the same problem, so I increased the sstable size (from 5
>> to 50 MB - the default 5MB is most certainly too low for serious usecases).
>>  Now the number of SSTableReader objects is manageable, and my heap is
>> happier.
>>
>> Note that for immediate effect I stopped the node, removed the *.json
>> files and restarted - which put all SSTables to L0, which meant a weekend
>> full of compactions… Would be really cool if there was a way to
>> automatically drop all LCS SSTables one level down to make them compact
>> earlier while avoiding the
>> "OMG-must-compact-everything-aargh-my-L0-is-full" -effect of removing the
>> JSON file.
>>
>> /Janne
>>
>> On 15 Jul 2013, at 10:48, sulong  wrote:
>>
>> > Why does cassandra PoolingSegmentedFile recycle the RandomAccessReader?
>> The RandomAccessReader objects consume too much memory.
>> >
>> > I have a cluster of 4 nodes. Every node's cassandra jvm has 8G heap.
>> The cassandra's memory is full after about one month, so I have to restart
>> the 4 nodes every month.
>> >
>> > I have 100G data on every node, with LeveledCompactionStrategy and 10M
>> sstable size, so there are more than 1 sstable files. By looking
>> through the heap dump file, I see there are more than 9000 SSTableReader
>> objects in memory, which references lots of  RandomAccessReader objects.
>> The memory is consumed by these RandomAccessReader objects.
>> >
>> > I see the PoolingSegmentedFile has a recycle method, which puts the
>> RandomAccessReader to a queue. Looks like the queue always grows until the
>> sstable is compacted.  Is there any way to stop the RandomAccessReader
>> recycling? Or, set a limit to the recycled RandomAccessReader's number?
>> >
>> >
>>
>>
>


-- 
http://twitter.com/tjake
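
For anyone following the numbers in this thread, the heap estimate quoted above can be checked with a few lines of arithmetic. This is only a sketch: the 64 KB buffer per RandomAccessReader and the 10 pooled readers per SSTableReader are the figures given in the thread, and the function names are made up for illustration.

```python
KIB = 1024


def pooled_reader_bytes(sstable_readers, readers_per_sstable, buffer_kib=64):
    """Bytes held by pooled reader buffers (all inputs are figures from the thread)."""
    return sstable_readers * readers_per_sstable * buffer_kib * KIB


def as_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / (KIB ** 3)


if __name__ == "__main__":
    # Compare 10 pooled readers per sstable with a single reader per sstable.
    for per_sstable in (10, 1):
        total = pooled_reader_bytes(9000, per_sstable)
        print(f"{per_sstable:>2} reader(s) per sstable: {as_gib(total):.2f} GiB")
```

The two cases come out at roughly 5.5 GiB and 0.55 GiB, close to the 5.6G and 0.56G quoted, so the estimate holds as long as the pooled reader count really is around 10 per sstable.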


Re: Leveled Compaction, number of SStables growing.

2013-07-09 Thread Jake Luciani
We run with 128 MB; some run with 256 MB.  Leveled compaction creates
fixed-size sstables by design, so raising the sstable size is the only way to
lower the file count.


On Tue, Jul 9, 2013 at 2:56 PM, PARASHAR, BHASKARJYA JAY wrote:

>  Hi,
>
> We recently switched from size-tiered compaction to leveled compaction. We
> made this change because our rows are frequently updated. We also have a
> lot of data.
>
> With size-tiered compaction, we had about 5-10 sstables per CF. So with
> about 15 CFs we had about 100 sstables.
>
> With the default sstable size of 5 MB, now after leveled compaction, we have
> about 130k sstables and growing as the writes increase. There are a lot of
> compaction jobs pending.
>
> If we increase the sstable size to 20 MB, that will be about 30k sstables
> but it’s still a lot.
>
> Is this common? Any solutions or hints on reducing the sstables are welcome.
>
> Thanks
>
> -Jay
>



-- 
http://twitter.com/tjake
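
To put rough numbers on the file counts discussed in this thread, here is a small sketch. It assumes the total data volume is simply the reported 130k sstables times the 5 MB default, and it ignores partially filled sstables and per-level overhead; the function name is illustrative only.

```python
def sstable_count(total_mb, sstable_size_mb):
    """Ceiling division: fixed-size sstables needed to hold total_mb of data."""
    return -(-total_mb // sstable_size_mb)


# Figure from the thread: about 130k sstables at the 5 MB default size.
TOTAL_MB = 130_000 * 5

if __name__ == "__main__":
    for size_mb in (5, 20, 128, 256):
        count = sstable_count(TOTAL_MB, size_mb)
        print(f"{size_mb:>3} MB sstables -> about {count:,} files")
```

Under those assumptions, 20 MB gives about 32,500 files, while 128 MB and 256 MB bring the count down to roughly 5,000 and 2,500.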


Re: Data model for financial time series

2013-06-07 Thread Jake Luciani
We have built a similar system; you can read about our data model in CQL3
here:

http://www.slideshare.net/carlyeks/nyc-big-tech-day-2013

We are going to be presenting a similar talk next week at the cassandra
summit.


On Fri, Jun 7, 2013 at 12:34 PM, Davide Anastasia <
davide.anasta...@qualitycapital.com> wrote:

>  Hi,
>
> I am trying to build the storage of stock prices in Cassandra. My queries
> are ideally of three types:
>
> - give me everything between time A and time B;
>
> - give me everything about symbol X;
>
> - give me everything of type Y;
>
> …or an intersection of the three. Something I will be happy doing is:
>
> - give me all the trades about AAPL between 7:00am and 3:00pm of a certain
> day.
>
> However, being a time series, I will be happy to retrieve the data in
> ascending order of timestamp (from 7:00 to 3:00).
>
> I have tried to build my table with the timestamp (as timeuuid) as primary
> key, however I cannot manage to get my data in order, and “order by” in
> CQL3 raises an error and doesn’t perform the query.
>
> Does anybody have any suggestion to get a good design that fits my queries?
>
> Thanks,
>
> David
>



-- 
http://twitter.com/tjake
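
As background on why ordering fails with the timestamp alone as the primary key: in CQL3, ORDER BY applies only to clustering columns within a single partition. The Python sketch below mimics that layout in memory. It is not the model from the linked slides; the (symbol, day) partition key, the class name TradeStore, its methods, and the sample prices are all assumptions made up for illustration.

```python
import bisect
from collections import defaultdict
from datetime import datetime


class TradeStore:
    """In-memory stand-in for a table partitioned by (symbol, day) and
    clustered by timestamp. Hypothetical; for illustration only."""

    def __init__(self):
        # partition key (symbol, day) -> rows kept sorted by (timestamp, price)
        self._partitions = defaultdict(list)

    def insert(self, symbol, ts, price):
        # insort keeps each partition ordered, like a clustering column would.
        bisect.insort(self._partitions[(symbol, ts.date())], (ts, price))

    def trades_between(self, symbol, start, end):
        """Rows for one symbol with start <= ts <= end, ascending by time.
        Assumes start and end fall on the same day (a single partition)."""
        rows = self._partitions.get((symbol, start.date()), [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_right(rows, (end, float("inf")))
        return rows[lo:hi]


if __name__ == "__main__":
    store = TradeStore()
    # Made-up prices, inserted out of order on purpose.
    store.insert("AAPL", datetime(2013, 6, 7, 14, 30), 3.0)
    store.insert("AAPL", datetime(2013, 6, 7, 7, 15), 1.0)
    store.insert("AAPL", datetime(2013, 6, 7, 9, 0), 2.0)
    day_start = datetime(2013, 6, 7, 7, 0)
    day_end = datetime(2013, 6, 7, 15, 0)
    for ts, price in store.trades_between("AAPL", day_start, day_end):
        print(ts.isoformat(), price)
```

The point of the sketch is that a time-range query becomes a cheap slice once rows for one symbol and day live together in timestamp order, at the cost of needing the symbol (and day) up front for every query.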


Re: Cassandra and Apache Drill

2012-08-31 Thread Jake Luciani
I don't think Drill has been accepted into the incubator yet or has any
code.

If/When that happens then it's entirely possible Cassandra could be
integrated.

On Fri, Aug 31, 2012 at 4:29 PM, John Onusko  wrote:

> Like a lot of folks, I have a need for Big Data and fast queries on that
> data. Hive queries against Cassandra functionally meet my requirements, but
> the job oriented processing is too slow when you need to execute many
> queries on a small portion of the data. It seems like Apache Drill might be
> the right answer to this problem. I see HBase mentioned as a possible
> integration point with Drill, but no mention of Cassandra. Has anyone taken
> a look at Drill to see how it could access the data in Cassandra?
>
> -John
>



-- 
http://twitter.com/tjake


Re: DSE solr HA

2012-08-13 Thread Jake Luciani


 
> 
>>  
>> Going through this page and it looks like indexes are stored locally 
>> http://www.datastax.com/dev/blog/cassandra-with-solr-integration-details . 
>> My question is what happens if one of the solr nodes crashes? Is the data 
>> indexed again on those nodes?
>>  

Yes, the data is indexed again on the node, either from the commitlog, hints,
or repair. Same as Cassandra.

>> Also, if RF > 1 then is the same data being indexed on all RF nodes or is 
>> that RF only for document replication?

The former. Each replica has an indexed copy. We remove duplicates on read.

Re: java.lang.OutOfMemoryError: unable to create new native thread

2012-06-25 Thread Jake Luciani
This means you need to raise the nproc limit for the user you run cassandra
with.

On Mon, Jun 25, 2012 at 8:48 AM, Oli Schacher wrote:

> Hi list
>
> I have a small cassandra cluster consisting of three nodes. Every few
> weeks the whole cluster goes down at the same time. All nodes show:
>
> java.lang.OutOfMemoryError: unable to create new native thread
>at java.lang.Thread.start0(Native Method)
>at java.lang.Thread.start(Thread.java:691)
>at
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943)
>at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1336)
>at
> org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:104)
>at
> org.apache.cassandra.thrift.CassandraDaemon$ThriftServer.run(CassandraDaemon.java:214)
>
> There are no other log messages shortly before the crash.
>
> I don't have much experience with cassandra, so I probably forgot to
> configure an important memory parameter. But before I screw things up
> even more, I hope someone on the list can point me in the right
> direction.
>
> Hardware:
> Each Node runs on two Intel Xeon CPU E5645  @ 2.40GHz (6 physical cores
> per CPU, 12 total), 12 Gig memory
>
> Software:
> Datastax Cassandra 1.1 , on Centos 6
>
> Clients:
> 10 linux servers, all of them connecting using pycassa. total of 10-30
> writes / sec
>
> I haven't changed any memory settings from the default, except
> uncommented
> MAX_HEAP_SIZE="4G"
> HEAP_NEWSIZE="800M"
> in cassandra-env.sh, this hasn't made a difference though.
>
> Any hints would be appreciated.
>
> Thanks,
> Oli
>
>
>


-- 
http://twitter.com/tjake
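
One quick way to see which nproc limit is actually in effect is sketched below, using Python's standard resource module (Unix only). It has to be run as the same user that runs Cassandra to be meaningful, and the helper names are illustrative, not part of any Cassandra tooling.

```python
import resource


def nproc_limits():
    """Return the (soft, hard) limit on processes/threads for the current user."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def describe(value):
    """Render a limit value, mapping the 'no limit' sentinel to a readable word."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"nproc soft limit: {describe(soft)}")
    print(f"nproc hard limit: {describe(hard)}")
```

If the soft limit is low (some distributions default to around a thousand for ordinary users), a server with many client connections and one thread per connection can plausibly run into it long before the heap fills up.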


Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-25 Thread Jake Luciani
Hi Safdar,

Yes, you should make it a multiple.  The issue is that each shard 'sticks' to a
given node, but there is no way to guarantee 5 random keys will distribute
equally across 5 nodes.  The idea is that eventually they will as you add
more and more keys, so increasing shards at once can make that happen
faster.  You can change this parameter and restart the nodes without
affecting your old data.

If you have more issues, raise them on the GitHub issues tab for Solandra.

-Jake

On Mon, Jun 25, 2012 at 2:23 AM, Safdar Kureishy
wrote:

> Hi Jake,
>
> Thanks. Yes, I forgot to mention also that I had raised the
> solandra.shards.at.once param from 4 to 5 (to match the # of nodes). Should
> I have raised it to 10 or 15 (multiple of 5)? I have added all the
> documents that I needed to the index now. It appears the distribution
> became more even at a later stage, after indexing 12 million Nutch
> documents. The distribution is now 35G / 35G / 56G / 324M / 51G, but there
> is still one node that has a small fraction (i.e 324M) of what the other
> nodes have. In addition, some nodes also have about double the data as
> others (e.g., 56G vs 35G). If you think that increasing
> solandra.shards.at.once param will further improve the distribution, what
> would I need to do to enforce that change when the cluster is running, now
> that all the data has already been added to the index? And on the flip
> side, if the change cannot be made for existing data, what would happen (to
> existing + new data) if the setting was changed and the servers were
> restarted?
>
> Lastly, is there another mailing list I should be using for Solandra
> questions? I couldn't find one....
>
> Thanks,
> Safdar
>
>
>
>
>
> On Mon, Jun 25, 2012 at 4:16 AM, Jake Luciani  wrote:
>
>> Hi Safdar,
>>
>> If you want to get better utilization of the cluster raise the
>> solandra.shards.at.once param in solandra.properties
>>
>> -Jake
>>
>>
>>
>> On Sun, Jun 24, 2012 at 11:00 AM, Safdar Kureishy <
>> safdar.kurei...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I've searched online but was unable to find any leads for the problem
>>> below. This mailing list seemed the most appropriate place. Apologies in
>>> advance if that isn't the case.
>>>
>>> I'm running a 5-node Solandra cluster (Solr + Cassandra). I've setup the
>>> nodes with tokens *evenly distributed across the token space*, for a
>>> 5-node cluster (as evidenced below under the "effective-ownership" column
>>> of the "nodetool ring" output). My data is a set of a few million crawled
>>> web pages, crawled using Nutch, and also indexed using the "solrindex"
>>> command available through Nutch. AFAIK, the key for each document generated
>>> from the crawled data is the URL.
>>>
>>> Based on the "load" values for the nodes below, despite adding about 3
>>> million web pages to this index via the HTTP Rest API (e.g.:
>>> http://9.9.9.x:8983/solandra/index/update), some nodes are still
>>> "empty". Specifically, nodes 9.9.9.1 and 9.9.9.3 have just a few kilobytes
>>> (shown in *bold* below) of the index, while the remaining 3 nodes are
>>> consistently getting hammered by all the data. If the RandomPartitioner
>>> (which is what I'm using for this cluster) is supposed to achieve an even
>>> distribution of keys across the token space, why is it that the data below
>>> is skewed in this fashion? Literally, no key has yet been hashed to the
>>> nodes 9.9.9.1 and 9.9.9.3 below. Could someone possibly shed some light on
>>> this absurdity?
>>>
>>> [me@hm1 solandra-app]$ bin/nodetool -h hm1 ring
>>> Address  DC           Rack   Status  State   Load        Effective-Owership  Token
>>>                                                                              136112946768375385385349842972707284580
>>> 9.9.9.0  datacenter1  rack1  Up      Normal  7.57 GB     20.00%              0
>>> 9.9.9.1  datacenter1  rack1  Up      Normal  *21.44 KB*  20.00%              34028236692093846346337460743176821145
>>> 9.9.9.2  datacenter1  rack1  Up      Normal  14.99 GB    20.00%              68056473384187692692674921486353642290
>>> 9.9.9.3  datacenter1  rack1  Up      Normal  *50.79 KB*  20.00%              102084710076281539039012382229530463435
>>> 9.9.9.4  datacenter1  rack1  Up      Normal  15.22 GB    20.00%              136112946768375385385349842972707284580
>>>
>>> Thanks in advance.
>>>
>>> Regards,
>>> Safdar
>>>
>>
>>
>>
>> --
>> http://twitter.com/tjake
>>
>
>


-- 
http://twitter.com/tjake
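
The effect described in this thread (a handful of shard keys landing unevenly on evenly spaced tokens) can be reproduced with a small simulation. This is a sketch under assumptions: the "shard-N" key names are invented, since the thread does not show Solandra's real shard key format, and the MD5-based token function only approximates what RandomPartitioner does.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # RandomPartitioner tokens run from 0 to 2**127


def even_tokens(node_count):
    """Evenly spaced initial tokens, matching the ring output in the thread."""
    step = RING_SIZE // node_count
    return [i * step for i in range(node_count)]


def token_for(key):
    """Approximation of RandomPartitioner: absolute value of the key's MD5."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def owner_index(tokens, token):
    """A node owns the range (previous token, own token], wrapping at the end."""
    return bisect.bisect_left(tokens, token) % len(tokens)


def spread(tokens, keys):
    """Count how many keys each node owns."""
    counts = [0] * len(tokens)
    for key in keys:
        counts[owner_index(tokens, token_for(key))] += 1
    return counts


if __name__ == "__main__":
    tokens = even_tokens(5)
    for n in (5, 50, 5000):
        keys = [f"shard-{i}" for i in range(n)]  # hypothetical key names
        print(n, "keys ->", spread(tokens, keys))
```

With only a few keys, some nodes can easily end up with none at all even though each owns 20% of the ring; the counts even out only as the number of keys grows, which is the reason for raising solandra.shards.at.once.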


Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Jake Luciani
Hi Safdar,

If you want to get better utilization of the cluster raise the
solandra.shards.at.once param in solandra.properties

-Jake



On Sun, Jun 24, 2012 at 11:00 AM, Safdar Kureishy  wrote:

> Hi,
>
> I've searched online but was unable to find any leads for the problem
> below. This mailing list seemed the most appropriate place. Apologies in
> advance if that isn't the case.
>
> I'm running a 5-node Solandra cluster (Solr + Cassandra). I've setup the
> nodes with tokens *evenly distributed across the token space*, for a
> 5-node cluster (as evidenced below under the "effective-ownership" column
> of the "nodetool ring" output). My data is a set of a few million crawled
> web pages, crawled using Nutch, and also indexed using the "solrindex"
> command available through Nutch. AFAIK, the key for each document generated
> from the crawled data is the URL.
>
> Based on the "load" values for the nodes below, despite adding about 3
> million web pages to this index via the HTTP Rest API (e.g.:
> http://9.9.9.x:8983/solandra/index/update), some nodes are still
> "empty". Specifically, nodes 9.9.9.1 and 9.9.9.3 have just a few kilobytes
> (shown in *bold* below) of the index, while the remaining 3 nodes are
> consistently getting hammered by all the data. If the RandomPartitioner
> (which is what I'm using for this cluster) is supposed to achieve an even
> distribution of keys across the token space, why is it that the data below
> is skewed in this fashion? Literally, no key has yet been hashed to the
> nodes 9.9.9.1 and 9.9.9.3 below. Could someone possibly shed some light on
> this absurdity?
>
> [me@hm1 solandra-app]$ bin/nodetool -h hm1 ring
> Address  DC           Rack   Status  State   Load        Effective-Owership  Token
>                                                                              136112946768375385385349842972707284580
> 9.9.9.0  datacenter1  rack1  Up      Normal  7.57 GB     20.00%              0
> 9.9.9.1  datacenter1  rack1  Up      Normal  *21.44 KB*  20.00%              34028236692093846346337460743176821145
> 9.9.9.2  datacenter1  rack1  Up      Normal  14.99 GB    20.00%              68056473384187692692674921486353642290
> 9.9.9.3  datacenter1  rack1  Up      Normal  *50.79 KB*  20.00%              102084710076281539039012382229530463435
> 9.9.9.4  datacenter1  rack1  Up      Normal  15.22 GB    20.00%              136112946768375385385349842972707284580
>
> Thanks in advance.
>
> Regards,
> Safdar
>



-- 
http://twitter.com/tjake


Re: 200TB in Cassandra ?

2012-04-20 Thread Jake Luciani
What other solutions are you considering?  Any OLTP style access of 200TB
of data will require substantial IO.

Do you know how big your working dataset will be?

-Jake

On Fri, Apr 20, 2012 at 3:30 AM, Franc Carter wrote:

> On Fri, Apr 20, 2012 at 6:27 AM, aaron morton wrote:
>
>> Couple of ideas:
>>
>> * take a look at compression in 1.X
>> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
>> * is there repetition in the binary data ? Can you save space by
>> implementing content addressable storage ?
>>
>
> The data is already very highly space optimised. We've come to the
> conclusion that Cassandra is probably not the right fit for the use case this
> time
>
> cheers
>
>
>>
>> Cheers
>>
>>
>>   -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 20/04/2012, at 12:55 AM, Dave Brosius wrote:
>>
>>  I think your math is 'relatively' correct. It would seem to me you
>> should focus on how you can reduce the amount of storage you are using per
>> item, if at all possible, if that node count is prohibitive.
>>
>> On 04/19/2012 07:12 AM, Franc Carter wrote:
>>
>>
>>  Hi,
>>
>>  One of the projects I am working on is going to need to store about
>> 200TB of data - generally in manageable binary chunks. However, after doing
>> some rough calculations based on rules of thumb I have seen for how much
>> storage should be on each node I'm worried.
>>
>>200TB with RF=3 is 600TB = 600,000GB
>>   Which is 1000 nodes at 600GB per node
>>
>>  I'm hoping I've missed something as 1000 nodes is not viable for us.
>>
>>  cheers
>>
>>  --
>> *Franc Carter* | Systems architect | Sirca Ltd
>>  
>> franc.car...@sirca.org.au | www.sirca.org.au
>> Tel: +61 2 9236 9118
>>  Level 9, 80 Clarence St, Sydney NSW 2000
>> PO Box H58, Australia Square, Sydney NSW 1215
>>
>>
>>
>>
>
>
> --
>
> *Franc Carter* | Systems architect | Sirca Ltd
>  
>
> franc.car...@sirca.org.au | www.sirca.org.au
>
> Tel: +61 2 9236 9118
>
> Level 9, 80 Clarence St, Sydney NSW 2000
>
> PO Box H58, Australia Square, Sydney NSW 1215
>
>


-- 
http://twitter.com/tjake
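
The node-count estimate quoted in this thread is easy to rerun with different assumptions. A minimal sketch follows; the 600 GB per node figure is the rule of thumb used in the thread, the 2000 GB alternative is just an example value rather than a recommendation, and the function name is made up.

```python
def nodes_needed(raw_tb, replication_factor, per_node_gb):
    """Nodes required to hold raw_tb of data at the given RF and per-node density.

    Uses decimal units (1 TB = 1000 GB), as the thread does, and rounds up.
    """
    total_gb = raw_tb * 1000 * replication_factor
    return -(-total_gb // per_node_gb)


if __name__ == "__main__":
    # 600 GB per node is the thread's figure; 2000 GB is a hypothetical denser node.
    for per_node_gb in (600, 2000):
        count = nodes_needed(200, 3, per_node_gb)
        print(f"{per_node_gb} GB per node -> {count} nodes")
```

The result is linear in all three inputs, so the only levers are less data per item, a lower replication factor, or more data per node.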


Re: Is the secondary index re-built under compaction?

2012-04-17 Thread Jake Luciani
How many indexes are there?

On Tue, Apr 17, 2012 at 10:16 AM, Maxim Potekhin  wrote:

>  Yes. Sorry I didn't mention this, but of course I'm checking on indexes
> once in a while.
> So yes, they are marked as built.
>
> All of this started happening after a few days of continuous loading
> process. Since
> the nodes have good hardware (24 cores + SSD), the apparent load on each
> node
> was nothing remarkable, even at 20kHz insertion rate. But maybe I'm being
> overoptimistic.
>
> Maxim
>
>
>
> On 4/17/2012 10:12 AM, Jake Luciani wrote:
>
> Hmm that does sound fishy.
>
>  When you run show keyspaces from cassandra-cli it shows which indexes
> are built.  Are they marked built in your column family?
>
>  -Jake
>
>  On Tue, Apr 17, 2012 at 10:09 AM, Maxim Potekhin wrote:
>
>>  I understand that indexes are CFs. But the compaction stats says it's
>> building the
>> index, not compacting the corresponding CF. Either that's an ambiguous
>> diagnostic,
>> or indeed something is not right with my rig as of late.
>>
>> Maxim
>>
>>
>>
>>
>> On 4/17/2012 10:05 AM, Jake Luciani wrote:
>>
>> Well, since the secondary indexes are themselves column families, they
>> too are compacted along with everything else.
>>
>> On Tue, Apr 17, 2012 at 10:02 AM, Maxim Potekhin wrote:
>>
>>>  Thanks Jake. Then I am definitely seeing weirdness, as there are tons of
>>> "pending tasks" in compaction stats, and tons of index files created in
>>> the
>>> data directory. Plus it does tell me that it is building the secondary
>>> index,
>>> and that seems to be happening at an amazingly glacial pace.
>>>
>>> I have 2 CFs there, with multiple secondary indexes. I'll try
>>> to compact the CF one by one, reboot and see if that helps.
>>>
>>> Maxim
>>>
>>>
>>>
>>> On 4/17/2012 9:53 AM, Jake Luciani wrote:
>>>
>>> No, the indexes are not rebuilt every compaction.  Only if you manually
>>> rebuild or bootstrap a new node does it use compaction manager to rebuild.
>>>
>>> On Tue, Apr 17, 2012 at 9:47 AM, Maxim Potekhin wrote:
>>>
>>>>  Thanks Aaron. Just to be clear, every time I do a compaction,
>>>> I rebuild all indexes from scratch. Right?
>>>>
>>>> Maxim
>>>>
>>>>
>>>>
>>>> On 4/17/2012 6:16 AM, aaron morton wrote:
>>>>
>>>> Yes secondary index builds are done via the compaction manager.
>>>>
>>>>  Cheers
>>>>
>>>> -
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>>  On 17/04/2012, at 1:06 PM, Maxim Potekhin wrote:
>>>>
>>>>  I noticed that "nodetool compactionstats" shows the building of the
>>>> secondary index while
>>>> I initiate compaction. Is this to be expected? Cassandra version 0.8.8.
>>>>
>>>> Thank you
>>>>
>>>> Maxim
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>  --
>>> http://twitter.com/tjake
>>>
>>>
>>>
>>
>>
>>  --
>> http://twitter.com/tjake
>>
>>
>>
>
>
>  --
> http://twitter.com/tjake
>
>
>


-- 
http://twitter.com/tjake


Re: Is the secondary index re-built under compaction?

2012-04-17 Thread Jake Luciani
Hmm that does sound fishy.

When you run show keyspaces from cassandra-cli it shows which indexes are
built.  Are they marked built in your column family?

-Jake

On Tue, Apr 17, 2012 at 10:09 AM, Maxim Potekhin  wrote:

>  I understand that indexes are CFs. But the compaction stats says it's
> building the
> index, not compacting the corresponding CF. Either that's an ambiguous
> diagnostic,
> or indeed something is not right with my rig as of late.
>
> Maxim
>
>
>
>
> On 4/17/2012 10:05 AM, Jake Luciani wrote:
>
> Well, since the secondary indexes are themselves column families, they
> too are compacted along with everything else.
>
> On Tue, Apr 17, 2012 at 10:02 AM, Maxim Potekhin  wrote:
>
>>  Thanks Jake. Then I am definitely seeing weirdness, as there are tons of
>> "pending tasks" in compaction stats, and tons of index files created in
>> the
>> data directory. Plus it does tell me that it is building the secondary
>> index,
>> and that seems to be happening at an amazingly glacial pace.
>>
>> I have 2 CFs there, with multiple secondary indexes. I'll try
>> to compact the CF one by one, reboot and see if that helps.
>>
>> Maxim
>>
>>
>>
>> On 4/17/2012 9:53 AM, Jake Luciani wrote:
>>
>> No, the indexes are not rebuilt every compaction.  Only if you manually
>> rebuild or bootstrap a new node does it use compaction manager to rebuild.
>>
>> On Tue, Apr 17, 2012 at 9:47 AM, Maxim Potekhin  wrote:
>>
>>>  Thanks Aaron. Just to be clear, every time I do a compaction,
>>> I rebuild all indexes from scratch. Right?
>>>
>>> Maxim
>>>
>>>
>>>
>>> On 4/17/2012 6:16 AM, aaron morton wrote:
>>>
>>> Yes secondary index builds are done via the compaction manager.
>>>
>>>  Cheers
>>>
>>> -
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>>  On 17/04/2012, at 1:06 PM, Maxim Potekhin wrote:
>>>
>>>  I noticed that "nodetool compactionstats" shows the building of the
>>> secondary index while
>>> I initiate compaction. Is this to be expected? Cassandra version 0.8.8.
>>>
>>> Thank you
>>>
>>> Maxim
>>>
>>>
>>>
>>>
>>
>>
>>  --
>> http://twitter.com/tjake
>>
>>
>>
>
>
>  --
> http://twitter.com/tjake
>
>
>


-- 
http://twitter.com/tjake


Re: Is the secondary index re-built under compaction?

2012-04-17 Thread Jake Luciani
Well, since the secondary indexes are themselves column families, they
too are compacted along with everything else.

On Tue, Apr 17, 2012 at 10:02 AM, Maxim Potekhin  wrote:

>  Thanks Jake. Then I am definitely seeing weirdness, as there are tons of
> "pending tasks" in compaction stats, and tons of index files created in the
> data directory. Plus it does tell me that it is building the secondary
> index,
> and that seems to be happening at an amazingly glacial pace.
>
> I have 2 CFs there, with multiple secondary indexes. I'll try
> to compact the CF one by one, reboot and see if that helps.
>
> Maxim
>
>
>
> On 4/17/2012 9:53 AM, Jake Luciani wrote:
>
> No, the indexes are not rebuilt every compaction.  Only if you manually
> rebuild or bootstrap a new node does it use compaction manager to rebuild.
>
> On Tue, Apr 17, 2012 at 9:47 AM, Maxim Potekhin  wrote:
>
>>  Thanks Aaron. Just to be clear, every time I do a compaction,
>> I rebuild all indexes from scratch. Right?
>>
>> Maxim
>>
>>
>>
>> On 4/17/2012 6:16 AM, aaron morton wrote:
>>
>> Yes secondary index builds are done via the compaction manager.
>>
>>  Cheers
>>
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>>  On 17/04/2012, at 1:06 PM, Maxim Potekhin wrote:
>>
>>  I noticed that "nodetool compactionstats" shows the building of the
>> secondary index while
>> I initiate compaction. Is this to be expected? Cassandra version 0.8.8.
>>
>> Thank you
>>
>> Maxim
>>
>>
>>
>>
>
>
>  --
> http://twitter.com/tjake
>
>
>


-- 
http://twitter.com/tjake


Re: Is the secondary index re-built under compaction?

2012-04-17 Thread Jake Luciani
No, the indexes are not rebuilt every compaction.  Only if you manually
rebuild or bootstrap a new node does it use compaction manager to rebuild.

On Tue, Apr 17, 2012 at 9:47 AM, Maxim Potekhin  wrote:

>  Thanks Aaron. Just to be clear, every time I do a compaction,
> I rebuild all indexes from scratch. Right?
>
> Maxim
>
>
>
> On 4/17/2012 6:16 AM, aaron morton wrote:
>
> Yes secondary index builds are done via the compaction manager.
>
>  Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
>  On 17/04/2012, at 1:06 PM, Maxim Potekhin wrote:
>
>  I noticed that "nodetool compactionstats" shows the building of the
> secondary index while
> I initiate compaction. Is this to be expected? Cassandra version 0.8.8.
>
> Thank you
>
> Maxim
>
>
>
>


-- 
http://twitter.com/tjake


Re: cassandra and .net

2012-04-10 Thread Jake Luciani
You can also look at using a .net client wrapper like
https://github.com/managedfusion/fluentcassandra

On Tue, Apr 10, 2012 at 8:06 AM, puneet loya  wrote:

> thankk  :) :) it works :)
>
>
> On Tue, Apr 10, 2012 at 3:07 PM, Henrik Schröder wrote:
>
>> In your code you are using BufferedTransport, but in the Cassandra logs
>> you're getting errors when it tries to use FramedTransport. If I remember
>> correctly, BufferedTransport is gone, so you should only use
>> FramedTransport. Like this:
>>
>> TTransport transport = new TFramedTransport(new TSocket(host, port));
>>
>> TProtocol protocol = new TBinaryProtocol(transport);
>> var client = new Cassandra.Client(protocol);
>> transport.Open();
>> client.describe_keyspace("abc");
>>
>>
>> /Henrik
>>
>>
>> On Tue, Apr 10, 2012 at 11:23, puneet loya  wrote:
>>
>>>
>>> Log is showing the following exception
>>>
>>> DEBUG [ScheduledTasks:1] 2012-04-10 14:49:29,654 LoadBroadcaster.java
>>> (line 86) Disseminating load info ...
>>> DEBUG [Thrift:7] 2012-04-10 14:50:00,820 CustomTThreadPoolServer.java
>>> (line 197) Thrift transport error occurred during processing of message.
>>> org.apache.thrift.transport.TTransportException
>>> at
>>> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>>> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>>>  at
>>> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>>> at
>>> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>>>  at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>>> at
>>> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>>>  at
>>> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>>> at
>>> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>>>  at
>>> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
>>> at
>>> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>>>  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
>>> Source)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>>  at java.lang.Thread.run(Unknown Source)
>>> DEBUG [Thrift:7] 2012-04-10 14:50:00,820 ClientState.java (line 104)
>>> logged out: #
>>>
>>> On Tue, Apr 10, 2012 at 11:24 AM, Maki Watanabe wrote:
>>>
>>>> Check your cassandra log.
>>>> If you can't find any interesting log, set cassandra log level
>>>> to DEBUG and run your program again.
>>>>
>>>> maki
>>>>
>>>> 2012/4/10 puneet loya :
>>>> > hi,
>>>> >
>>>> > sorry i posted the port as 7000. I m using 9160 but still has the same
>>>> > error.
>>>> >
>>>> > "Cannot read, Remote side has closed".
>>>> > Can u guess whats happening??
>>>> >
>>>> > On Tue, Apr 10, 2012 at 11:00 AM, Pierre Chalamet <pie...@chalamet.net>
>>>> > wrote:
>>>> >>
>>>> >> hello,
>>>> >>
>>>> >> 9160 is probably the port to use if you use the default config.
>>>> >>
>>>> >> - Pierre
>>>> >>
>>>> >> On Apr 10, 2012, at 7:26 AM, puneet loya wrote:
>>>> >>
>>>> >> > using System;
>>>> >> > using System.Collections.Generic;
>>>> >> > using System.Linq;
>>>> >> > using System.Text;
>>>> >> > using Thrift.Collections;
>>>> >> > using Thrift.Protocol;
>>>> >> > using Thrift.Transport;
>>>> >> > using Apache.Cassandra;
>>>> >> >
>>>> >> > namespace ConsoleApplication1
>>>> >> > {
>>>> >> >     class Program
>>>> >> >     {
>>>> >> >         static void Main(string[] args)
>>>> >> >         {
>>>> >> >             TTransport transport = null;
>>>> >> >             try
>>>> >> >             {
>>>> >> >                 transport = new TBufferedTransport(new TSocket("127.0.0.1", 7000));
>>>> >> >
>>>> >> >                 //if(buffered)
>>>> >> >                 //    trans = new TBufferedTransport(trans as TStreamTransport);
>>>> >> >                 //if (framed)
>>>> >> >                 //    trans = new TFramedTransport(trans);
>>>> >> >
>>>> >> >                 TProtocol protocol = new TBinaryProtocol(transport);
>>>> >> >                 Cassandra.Client client = new Cassandra.Client(protocol);
>>>> >> >
>>>> >> >                 Console.WriteLine("Opening connection");
>>>> >> >
>>>> >> >                 if (!transport.IsOpen)
>>>> >> >                     transport.Open();
>>>> >> >
>>>> >> >                 client.describe_keyspace("abc");   // Crashing at this point
>>>> >> >             }
>>>> >> >             catch (Exception ex)
>>>> >> >             {
>>>> >> >                 Console.WriteLine(ex.Message);
>>>> >> >             }
>>>> >> >             finally
>>>> >> >             {
>>>> >> >                 if (transport != null)
>>>> >> >                     transport.Close();
>>>> >> >             }
>>>> >> >             Console.ReadLine();
>>>> >> >         }
>>>> >> >     }
>>>> >> > }
>>>> >> >
>>>> >> > I m trying t

Re: 2 questions DataStax Enterprise

2012-04-03 Thread Jake Luciani
Hi, replies inline.

On Tue, Apr 3, 2012 at 12:18 PM, Alexandru Sicoe  wrote:

> Hi guys,
>  I'm trying out DSE and looking for the best way to arrange the cluster. I
> have 9 nodes: 3 behind a gateway taking in writes from my collectors and 6
> outside the gateway that are supposed to take replicas from the other 3 and
> serve reads and analytics jobs.
>
> 1. Is it ok to run the 3 nodes as normal Cassandra nodes and run the other
> 6 nodes as analytics? Can I serve both real time reads and M/R jobs from
> the 6 nodes? How will these affect each other performancewise?
>

If you plan to use CFS heavily, it will affect the performance of the other
nodes. If you raise the RF of your column families, it should be fine as
long as you run MapReduce at CL=ONE.


>
> I know that the way the system is supposed to be used is to separate
> analytics from real time queries. I've already explored a possible 3DC
> setup with Tyler in another message and it indeed works but I'm afraid it
> is too complex and would require me to send 2 replicas across the firewall
> which it can't handle very well at peak times, affecting other applications.
>
> 2. I started the cluster in the setup described in 1 (3 normal, 6
> analytics) and as soon as the Analytics nodes start up they start
> outputting this message:
>
> INFO [TASK-TRACKER-INIT] 2012-04-03 17:54:59,575 Client.java (line 629)
> Retrying connect to server: IP_OF_NORMAL_CASSANDRA_SEED_NODE:8012. Already
> tried 10 time(s).
> 
>
> So it seems my analytics nodes are trying to contact the normal Cassandra
> seed node on port 8012 which I read is a "Hadoop Job Tracker client port".
> It doesn't seem like this is the normal behavior. Why is it getting
> confused? In the .yaml of each node I'm using endpoint_snitch:
> com.datastax.bdp.snitch.DseSimpleSnitch and putting in the Analytics seed
> node before the normal cassandra seed node in the seeds.
>


You can run dsetool movejt to move the job tracker to one of the known
Hadoop nodes.


>
> Cheers,
> Alex
>
>


-- 
http://twitter.com/tjake


Re: Write performance compared to Postgresql

2012-04-03 Thread Jake Luciani
Hi Jeff,

Writing serially over one connection will be slower. If you run many threads 
hitting the server at once you will see throughput improve. 
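
To make the point concrete, here is a rough Ruby sketch of the difference
between one serial writer and several concurrent ones. It is not taken from
the attached benchmark script: insert_log is a hypothetical stand-in for a
single blocking insert, the 1 ms sleep is an assumed round-trip latency, and
write_serial, write_concurrent and timed are illustrative helper names. With
a real client you would most likely give each thread its own connection.

```ruby
# Hypothetical stand-in for one blocking insert; replace with a real client call.
# The 1 ms sleep is an assumed round-trip latency, not a measured value.
def insert_log(entry)
  sleep(0.001)
  entry
end

# Baseline: one caller, so only one request is in flight at a time.
def write_serial(entries)
  entries.each { |entry| insert_log(entry) }
  entries.size
end

# Split the entries across up to `workers` threads so several requests overlap.
# Returns the total number of entries written.
def write_concurrent(entries, workers)
  return 0 if entries.empty?
  slice_size = (entries.size.to_f / workers).ceil
  threads = entries.each_slice(slice_size).map do |slice|
    Thread.new do
      slice.each { |entry| insert_log(entry) }
      slice.size # becomes the thread's value
    end
  end
  threads.map(&:value).inject(0, :+)
end

# Wall-clock seconds taken by the given block.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

if __FILE__ == $PROGRAM_NAME
  entries = Array.new(200) { |i| { "id" => i, "msg" => "log line #{i}" } }
  puts format("serial:     %.3fs", timed { write_serial(entries) })
  puts format("10 threads: %.3fs", timed { write_concurrent(entries, 10) })
end
```

Because the simulated call only waits, the threaded run should finish in
roughly a tenth of the serial time here; real numbers depend on the client
library, the cluster and the consistency level, so treat this purely as an
illustration of why a single serial writer understates throughput.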

Jake

 

On Apr 3, 2012, at 7:08 AM, Jeff Williams  wrote:

> Hi,
> 
> I am looking at cassandra for a logging application. We currently log to a 
> Postgresql database.
> 
> I set up 2 cassandra servers for testing. I did a benchmark where I had 100 
> hashes representing logs entries, read from a json file. I then looped over 
> these to do 10,000 log inserts. I repeated the same writing to a postgresql 
> instance on one of the cassandra servers. The script is attached. The 
> cassandra writes appear to perform a lot worse. Is this expected?
> 
> jeff@transcoder01:~$ ruby cassandra-bm.rb 
> cassandra
>  3.17   0.48   3.65 ( 12.032212)
> jeff@transcoder01:~$ ruby cassandra-bm.rb 
> postgres
>  2.14   0.33   2.47 (  7.002601)
> 
> Regards,
> Jeff
> 
> 

