New Metrics Collector for Apache Cassandra w/ Prometheus

2020-05-15 Thread Jake Luciani
Hi,

Hope this email finds you well.

DataStax has recently open sourced a new metrics collector for Apache
Cassandra.
It's a drop-in solution that comes with Prometheus dashboards and works with
all versions from 2.2 to 4.0 alpha.

Blog:
https://www.datastax.com/blog/2020/05/monitoring-apache-cassandratm-made-simple
GH: https://github.com/datastax/metric-collector-for-apache-cassandra

Stay Safe!

Jake


Re: Counter performance

2017-04-17 Thread Jake Luciani
You can set the trace probability on a node to 1% and you'll catch a trace
on that table.

http://cassandra.apache.org/doc/latest/tools/nodetool/settraceprobability.html
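
As a rough illustration of the suggestion above, here is a minimal Python sketch that builds the `nodetool settraceprobability` invocation from a percentage. The helper name and the optional host argument are assumptions for illustration; nodetool itself takes a fraction between 0 and 1, so 1% is passed as 0.01.

```python
def settraceprobability_command(percent, host=None):
    """Build the argv list for `nodetool settraceprobability`.

    nodetool expects a fraction between 0 and 1, so 1% becomes 0.01.
    The helper itself is hypothetical; only the nodetool subcommand is real.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    fraction = percent / 100
    command = ["nodetool"]
    if host is not None:
        # -h selects the node to talk to; the host name is whatever you pass in.
        command += ["-h", host]
    command += ["settraceprobability", f"{fraction:g}"]
    return command


if __name__ == "__main__":
    # Print the command rather than running it, so this is safe to try anywhere.
    print(" ".join(settraceprobability_command(1)))
```

The resulting list can be handed to `subprocess.run` on a machine where nodetool is installed. Remember to set the probability back to 0 afterwards, since tracing adds load.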

On Mon, Apr 17, 2017 at 11:17 AM, benjamin roth  wrote:

> Just run some queries on counter tables. Some on regular tables. Look at
> traces and then compare. You don't need to do anything with application
> code. You can also set trace probability on a table level and then analyze
> the queries.
>
> Am 17.04.2017 17:07 schrieb "Eren Yilmaz" :
>
>> I can’t add tracing using the driver – Usergrid code is way too complex. When
>> I look at logging the slow queries on the C* side, it says the feature was
>> added in version 3.10 (https://issues.apache.org/jira/browse/CASSANDRA-12403),
>> and we use 3.7. Are there any other ways to log slow queries in this version?
>> Or, what should we expect with this log output?
>>
>>
>>
>> *From:* benjamin roth [mailto:brs...@gmail.com]
>> *Sent:* Monday, April 17, 2017 5:44 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* RE: Counter performance
>>
>>
>>
>> You could enable a slow query log and then trace single queries, couldn't
>> you?
>>
>>
>>
>> Am 17.04.2017 16:31 schrieb "Eren Yilmaz" :
>>
>> I can’t trace selects on the application tables unfortunately. The
>> application is Usergrid, and it stores the data in binary. We have little
>> control over Usergrid-created data.
>>
>>
>>
>> *From:* benjamin roth [mailto:brs...@gmail.com]
>> *Sent:* Monday, April 17, 2017 4:12 PM
>>
>>
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Counter performance
>>
>>
>>
>> Do you see a difference when tracing the selects?
>>
>>
>>
>> 2017-04-17 13:36 GMT+02:00 Eren Yilmaz :
>>
>> Application tables use LeveledCompactionStrategy. At first, counter
>> tables were created with the default SizeTieredCompactionStrategy, but we
>> then changed them to LeveledCompactionStrategy.
>>
>>
>>
>> compaction = { 'class' : 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy',
>> 'sstable_size_in_mb' : 512 }
>>
>>
>>
>> *From:* benjamin roth [mailto:brs...@gmail.com]
>> *Sent:* Monday, April 17, 2017 12:12 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Counter performance
>>
>>
>>
>> Do you have a different compaction strategy on the counter tables?
>>
>>
>>
>> 2017-04-17 10:07 GMT+02:00 Eren Yilmaz :
>>
>> We are using Cassandra (3.7) counter tables in our application, and there
>> are about 10 counter tables. The counter tables are in a separate keyspace
>> with RF=3 (total 10 nodes). The tables are read-heavy: for each web request
>> to the application, we read at least 20 counter values. The counter reads
>> are very slow compared to the other application data reads from Cassandra,
>> and sometimes the reads put extra heavy CPU load on some nodes.
>>
>>
>>
>> Are there any tips, or best practices for increasing the performance of
>> counter tables?
>>
>>
>>
>>
>>
>>
>>
>


-- 
http://twitter.com/tjake


Re: Incremental repair for the first time

2016-12-16 Thread Jake Luciani
This was fixed after 3.0.4; please upgrade to the latest 3.0 release.

On Fri, Dec 16, 2016 at 4:49 PM, Kathiresan S 
wrote:

> Hi,
>
> We have a brand new Cassandra cluster (version 3.0.4) and we set up
> nodetool repair scheduled for every day (without any options for repair).
> As per documentation, incremental repair is the default in this case.
> Should we do a full repair for the very first time on each node once and
> then leave it to do incremental repair afterwards?
>
> *Problem we are facing:*
>
> On a random node, the repair process throws a validation failed error,
> pointing to some other node.
>
> For example, Node A, where the repair is run (without any option), throws the
> error below:
>
> *Validation failed in /Node B*
>
> On Node B, when we check the logs, the exception below is seen at the
> exact same time...
>
> *java.lang.RuntimeException: Cannot start multiple repair sessions over
> the same sstables*
> *at
> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1087)
> ~[apache-cassandra-3.0.4.jar:3.0.4]*
> *at
> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80)
> ~[apache-cassandra-3.0.4.jar:3.0.4]*
> *at
> org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:700)
> ~[apache-cassandra-3.0.4.jar:3.0.4]*
> *at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[na:1.8.0_73]*
> *at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> ~[na:1.8.0_73]*
>
> Can you please help on how this can be fixed?
>
> Thanks,
> Kathir
>



-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 3.0.9 released

2016-09-20 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.9.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: https://goo.gl/YfvFn8 (CHANGES.txt)
[2]: https://goo.gl/k9leqx (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: [RELEASE] Apache Cassandra 3.0.8 released

2016-07-07 Thread Jake Luciani
Sorry, I totally missed that.  Uploading now.

On Thu, Jul 7, 2016 at 4:51 AM, horschi <hors...@gmail.com> wrote:

> Same for 2.2.7.
>
> On Thu, Jul 7, 2016 at 10:49 AM, Julien Anguenot <jul...@anguenot.org>
> wrote:
>
>> Hey,
>>
>> The Debian packages do not seem to have been published. Normal?
>>
>> Thank you.
>>
>>J.
>>
>> On Jul 6, 2016, at 4:20 PM, Jake Luciani <j...@apache.org> wrote:
>>
>> The Cassandra team is pleased to announce the release of Apache Cassandra
>> version 3.0.8.
>>
>> Apache Cassandra is a fully distributed database. It is the right choice
>> when you need scalability and high availability without compromising
>> performance.
>>
>>  http://cassandra.apache.org/
>>
>> Downloads of source and binary distributions are listed in our download
>> section:
>>
>>  http://cassandra.apache.org/download/
>>
>> This version is a bug fix release[1] on the 3.0 series. As always, please
>> pay
>> attention to the release notes[2] and Let us know[3] if you were to
>> encounter
>> any problem.
>>
>> Enjoy!
>>
>> [1]: http://goo.gl/DQpe4d (CHANGES.txt)
>> [2]: http://goo.gl/UISX1K (NEWS.txt)
>> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>>
>>
>>
>


[RELEASE] Apache Cassandra 2.2.7 released

2016-07-06 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.7.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/KNV34t (CHANGES.txt)
[2]: http://goo.gl/VQfst8 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.7 released

2016-06-14 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.7.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/yPJaXi (CHANGES.txt)
[2]: http://goo.gl/Jph9Fh (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.7 released

2016-06-14 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.7.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a tick-tock bug fix release[1] on the 3.x series. As
always, please pay attention to the release notes[2] and let us know[3] if
you encounter any problems.

Enjoy!

[1]: http://goo.gl/k1abJV (CHANGES.txt)
[2]: http://goo.gl/3ENJIz (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Why there is no native shutdown command in cassandra

2016-06-13 Thread Jake Luciani
If that's true, then it's a bug. Can you open a ticket and include the logs?
https://issues.apache.org/jira/browse/CASSANDRA

On Mon, Jun 13, 2016 at 2:19 PM, Anshu Vajpayee <anshu.vajpa...@gmail.com>
wrote:

> I just tested. It doesn't flush memtables the way the nodetool drain/flush
> commands do. That means it only crashes the node, with no graceful shutdown.
>
>
>
> On Mon, Jun 13, 2016 at 10:51 PM, Jake Luciani <jak...@gmail.com> wrote:
>
>> Yeah same as drain.  Just exits at the end.
>>
>> On Mon, Jun 13, 2016 at 1:11 PM, Anshu Vajpayee <anshu.vajpa...@gmail.com
>> > wrote:
>>
>>> Thanks for information.
>>>
>>> Does stopdaemon also flush memtables and stop the Thrift and CQL interfaces
>>> before shutting down the daemon? Does the node also announce a shutdown
>>> message to the ring?
>>>
>>>
>>> On Mon, Jun 13, 2016 at 10:14 PM, Jake Luciani <jak...@gmail.com> wrote:
>>>
>>>> If you want to understand why, it's because C* was designed to be
>>>> crash-only.
>>>>
>>>> https://www.usenix.org/conference/hotos-ix/crash-only-software
>>>>
>>>> Since this is great for the project but bad for operators experience we
>>>> have later added this stopdaemon command.
>>>>
>>>> On Mon, Jun 13, 2016 at 12:37 PM, Anshu Vajpayee <
>>>> anshu.vajpa...@gmail.com> wrote:
>>>>
>>>>> As per Documentation(pasted as below), It does not stop Daemon . I
>>>>> tested also.I was looking for graceful shutdown  for Cassandra Daemon.
>>>>> Description
>>>>> <https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsDrain.html?scroll=toolsDrain__description_unique_11>
>>>>>
>>>>> Flushes all memtables from the node to SSTables on disk. Cassandra
>>>>> stops listening for connections from the client and other nodes. You need
>>>>> to restart Cassandra after running nodetool drain. You typically use
>>>>> this command before upgrading a node to a new version of Cassandra. To
>>>>> simply flush memtables to disk, use nodetool flush.
>>>>>
>>>>> On Mon, Jun 13, 2016 at 10:00 PM, Jeff Jirsa <
>>>>> jeff.ji...@crowdstrike.com> wrote:
>>>>>
>>>>>> `nodetool drain`
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From: *Anshu Vajpayee <anshu.vajpa...@gmail.com>
>>>>>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>>>>>> *Date: *Monday, June 13, 2016 at 9:28 AM
>>>>>> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>>>>>> *Subject: *Why there is no native shutdown command in cassandra
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi All
>>>>>>
>>>>>>
>>>>>>
>>>>>> Why we dont have native shutdown command in Cassandra ?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Every software provides graceful shutdown command.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Anshu
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Regards,*
>>>>> *Anshu *
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> http://twitter.com/tjake
>>>>
>>>
>>>
>>>
>>> --
>>> *Regards,*
>>> *Anshu *
>>>
>>>
>>>
>>
>>
>> --
>> http://twitter.com/tjake
>>
>
>
>
> --
> *Regards,*
> *Anshu *
>
>
>


-- 
http://twitter.com/tjake


Re: Why there is no native shutdown command in cassandra

2016-06-13 Thread Jake Luciani
Yeah same as drain.  Just exits at the end.

On Mon, Jun 13, 2016 at 1:11 PM, Anshu Vajpayee <anshu.vajpa...@gmail.com>
wrote:

> Thanks for the information.
>
> Does stopdaemon also flush memtables and stop the Thrift and CQL interfaces
> before shutting down the daemon? Does the node also announce a shutdown
> message to the ring?
>
>
> On Mon, Jun 13, 2016 at 10:14 PM, Jake Luciani <jak...@gmail.com> wrote:
>
>> If you want to understand why, it's because C* was designed to be
>> crash-only.
>>
>> https://www.usenix.org/conference/hotos-ix/crash-only-software
>>
>> Since this is great for the project but bad for operators experience we
>> have later added this stopdaemon command.
>>
>> On Mon, Jun 13, 2016 at 12:37 PM, Anshu Vajpayee <
>> anshu.vajpa...@gmail.com> wrote:
>>
>>> As per Documentation(pasted as below), It does not stop Daemon . I
>>> tested also.I was looking for graceful shutdown  for Cassandra Daemon.
>>> Description
>>> <https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsDrain.html?scroll=toolsDrain__description_unique_11>
>>>
>>> Flushes all memtables from the node to SSTables on disk. Cassandra stops
>>> listening for connections from the client and other nodes. You need to
>>> restart Cassandra after running nodetool drain. You typically use this
>>> command before upgrading a node to a new version of Cassandra. To simply
>>> flush memtables to disk, use nodetool flush.
>>>
>>> On Mon, Jun 13, 2016 at 10:00 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com
>>> > wrote:
>>>
>>>> `nodetool drain`
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From: *Anshu Vajpayee <anshu.vajpa...@gmail.com>
>>>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> *Date: *Monday, June 13, 2016 at 9:28 AM
>>>> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>>>> *Subject: *Why there is no native shutdown command in cassandra
>>>>
>>>>
>>>>
>>>> Hi All
>>>>
>>>>
>>>>
>>>> Why we dont have native shutdown command in Cassandra ?
>>>>
>>>>
>>>>
>>>> Every software provides graceful shutdown command.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Anshu
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> *Regards,*
>>> *Anshu *
>>>
>>>
>>>
>>
>>
>> --
>> http://twitter.com/tjake
>>
>
>
>
> --
> *Regards,*
> *Anshu *
>
>
>


-- 
http://twitter.com/tjake


Re: Why there is no native shutdown command in cassandra

2016-06-13 Thread Jake Luciani
If you want to understand why, it's because C* was designed to be
crash-only.

https://www.usenix.org/conference/hotos-ix/crash-only-software

Since this is great for the project but bad for the operator experience, we
later added the stopdaemon command.

On Mon, Jun 13, 2016 at 12:37 PM, Anshu Vajpayee 
wrote:

> As per the documentation (pasted below), it does not stop the daemon. I
> tested it as well. I was looking for a graceful shutdown for the Cassandra
> daemon.
>
> Description
>
> Flushes all memtables from the node to SSTables on disk. Cassandra stops
> listening for connections from the client and other nodes. You need to
> restart Cassandra after running nodetool drain. You typically use this
> command before upgrading a node to a new version of Cassandra. To simply
> flush memtables to disk, use nodetool flush.
>
> On Mon, Jun 13, 2016 at 10:00 PM, Jeff Jirsa 
> wrote:
>
>> `nodetool drain`
>>
>>
>>
>>
>>
>> *From: *Anshu Vajpayee 
>> *Reply-To: *"user@cassandra.apache.org" 
>> *Date: *Monday, June 13, 2016 at 9:28 AM
>> *To: *"user@cassandra.apache.org" 
>> *Subject: *Why there is no native shutdown command in cassandra
>>
>>
>>
>> Hi All
>>
>>
>>
>> Why don't we have a native shutdown command in Cassandra?
>>
>>
>>
>> Every piece of software provides a graceful shutdown command.
>>
>>
>>
>>
>>
>>
>>
>> Regards,
>>
>> Anshu
>>
>>
>>
>>
>>
>
>
>
> --
> *Regards,*
> *Anshu *
>
>
>


-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 3.6 released

2016-06-06 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.6.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a tick-tock feature release[1] on the 3.x series. As
always, please pay attention to the release notes[2] and let us know[3] if
you encounter any problems.

Enjoy!

[1]: http://goo.gl/eu90nx (CHANGES.txt)
[2]: http://goo.gl/ugkBQW (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.6 released

2016-05-13 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.6.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/cBU6AT (CHANGES.txt)
[2]: http://goo.gl/XvXLaJ (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.6 released

2016-04-26 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.6.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/yCpWu7 (CHANGES.txt)
[2]: http://goo.gl/qktJUS (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.14 released

2016-04-26 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.14.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/7lm5sY (CHANGES.txt)
[2]: http://goo.gl/SUIzT9 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: What does Cassandra use (JNI?) that triggers GCLocker Initiated GCs?

2016-04-21 Thread Jake Luciani
What kind of collection? If it's ParNew, I wouldn't worry.

On Thu, Apr 21, 2016 at 2:02 PM, Sotirios Delimanolis <sotodel...@yahoo.com>
wrote:

> Should this be of any concern? Are the corresponding threads spending too
> long in this JNI critical region and delaying GC?
>
> I don't get that impression at all from the GC log timings. They're very
> reasonable.
>
> On Thursday, April 21, 2016 10:57 AM, Jake Luciani <jak...@gmail.com>
> wrote:
>
>
> It's only used by the Snappy and LZ4 Compressors
>
> On Thu, Apr 21, 2016 at 1:54 PM, Sotirios Delimanolis <
> sotodel...@yahoo.com> wrote:
>
> According to this Oracle document
> <https://blogs.oracle.com/g1gc/entry/g1_gc_glossary_of_terms>, GCLocker
> Initiated GC
>
> is triggered when a JNI critical region was released. GC is blocked
> when any thread is in the JNI Critical region.
> If GC was requested during that period, that GC is invoked after all
> the threads come out of the JNI critical region.
>
> What part of Cassandra's implementation does anything with JNI?
>
> In our GC logs, this is by far the most common reason for GC pauses.
>
>
>
>
> --
> http://twitter.com/tjake
>
>
>


-- 
http://twitter.com/tjake


Re: What does Cassandra use (JNI?) that triggers GCLocker Initiated GCs?

2016-04-21 Thread Jake Luciani
It's only used by the Snappy and LZ4 Compressors

On Thu, Apr 21, 2016 at 1:54 PM, Sotirios Delimanolis 
wrote:

> According to this Oracle document, GCLocker Initiated GC
>
> is triggered when a JNI critical region was released. GC is blocked
> when any thread is in the JNI Critical region.
> If GC was requested during that period, that GC is invoked after all
> the threads come out of the JNI critical region.
>
> What part of Cassandra's implementation does anything with JNI?
>
> In our GC logs, this is by far the most common reason for GC pauses.
>
>


-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 3.5 released

2016-04-13 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.5.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a tick-tock bug fix release[1] on the 3.x series. As
always, please pay attention to the release notes[2] and let us know[3] if
you encounter any problems.

Enjoy!

[1]: http://goo.gl/FchTrl (CHANGES.txt)
[2]: http://goo.gl/0zpkJU (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.5 released

2016-04-11 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.5.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/tlNv8g (CHANGES.txt)
[2]: http://goo.gl/WrCSKw (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.4 released

2016-03-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.4.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a tick-tock feature release[1] on the 3.x series. As
always, please pay attention to the release notes[2] and let us know[3] if
you encounter any problems.

Enjoy!

[1]: http://goo.gl/l61Mvd (CHANGES.txt)
[2]: http://goo.gl/hIamQh (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: [RELEASE] Apache Cassandra 3.3 released

2016-02-09 Thread Jake Luciani
No problem. Run it after you upgrade.

On Tue, Feb 9, 2016 at 2:01 PM, Will Hayworth <whaywo...@atlassian.com>
wrote:

> Pardon my ignorance, Jake--should we run upgradesstables -a after or
> before we install 3.3?
>
> Thanks! :)
>
> ___
> Will Hayworth
> Developer, Engagement Engine
> Atlassian
>
> My pronoun is "they". <http://pronoun.is/they>
>
>
>
> On Tue, Feb 9, 2016 at 10:50 AM, Jake Luciani <j...@apache.org> wrote:
>
>> The Cassandra team is pleased to announce the release of Apache Cassandra
>> version 3.3.
>>
>> *This release contains a fix for a critical bug in the 3.0 series[4].* If
>> you have installed any version >= 3.0, you will need to run
>> 'nodetool upgradesstables -a' on all nodes to receive the fix.
>>
>> Apache Cassandra is a fully distributed database. It is the right choice
>> when you need scalability and high availability without compromising
>> performance.
>>
>>  http://cassandra.apache.org/
>>
>> Downloads of source and binary distributions are listed in our download
>> section:
>>
>>  http://cassandra.apache.org/download/
>>
>> This version is a bug fix release[1] on the 3.3 series. As always, please
>> pay
>> attention to the release notes[2] and Let us know[3] if you were to
>> encounter
>> any problem.
>>
>> Enjoy!
>>
>> [1]: http://goo.gl/V2lsST (CHANGES.txt)
>> [2]: http://goo.gl/5UBlNl (NEWS.txt)
>> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>> [4]: https://issues.apache.org/jira/browse/CASSANDRA-11102
>>
>>
>


-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 3.3 released

2016-02-09 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.3.

*This release contains a fix for a critical bug in the 3.0 series[4].* If you
have installed any version >= 3.0, you will need to run
'nodetool upgradesstables -a' on all nodes to receive the fix.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a tick-tock bug fix release[1] on the 3.x series. As
always, please pay attention to the release notes[2] and let us know[3] if
you encounter any problems.

Enjoy!

[1]: http://goo.gl/V2lsST (CHANGES.txt)
[2]: http://goo.gl/5UBlNl (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: https://issues.apache.org/jira/browse/CASSANDRA-11102
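
The announcement asks for 'nodetool upgradesstables -a' to be run on all nodes. A minimal Python sketch of how that could be scripted follows. The host names are placeholders, the helper is hypothetical, and it assumes nodetool can reach each node remotely via `-h` (otherwise run the command locally on each node).

```python
def upgradesstables_commands(hosts):
    """Return one `nodetool upgradesstables -a` argv list per host.

    -a (--include-all-sstables) rewrites every SSTable, including ones
    already on the current format, which is what the announcement asks for.
    """
    return [["nodetool", "-h", host, "upgradesstables", "-a"] for host in hosts]


if __name__ == "__main__":
    # Placeholder host names; replace with the nodes of your own cluster.
    cluster = ["node1.example.com", "node2.example.com", "node3.example.com"]
    for command in upgradesstables_commands(cluster):
        # Printed as a dry run; pass each list to subprocess.run to execute it.
        print(" ".join(command))
```

Running the nodes one at a time, as the loop order suggests, keeps the extra compaction load from hitting the whole cluster at once.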


[RELEASE] Apache Cassandra 3.0.3 released

2016-02-09 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.3.

*This release contains a fix for a critical bug in the 3.0 series[4].* If you
have installed any version >= 3.0, you will need to run
'nodetool upgradesstables -a' on all nodes to receive the fix.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/UtWBp4 (CHANGES.txt)
[2]: http://goo.gl/QGrGiy (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: https://issues.apache.org/jira/browse/CASSANDRA-11102


Re: [RELEASE] Apache Cassandra 3.3 released

2016-02-09 Thread Jake Luciani
Well, typically you should run upgradesstables when you upgrade major
versions as well.

https://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgradeCassandraDetails.html

On Tue, Feb 9, 2016 at 6:11 PM, Will Zhang <weiliang.zh...@gmail.com> wrote:

> Nice work guys.
>
> Just to confirm, if you upgrade from, say, 2.2.x directly to 3.3, you will
> *not* need to run upgradesstables, right? It seems pretty clear that the
> answer is no but I just wanted to make sure. Is it only needed if you are
> coming from a 3.x version?
>
> Thank you.
>
> Sent from my iPhone
>
> On 9 Feb 2016, at 19:06, Jake Luciani <jak...@gmail.com> wrote:
>
> No problem. Run it after you upgrade.
>
> On Tue, Feb 9, 2016 at 2:01 PM, Will Hayworth <whaywo...@atlassian.com>
> wrote:
>
>> Pardon my ignorance, Jake--should we run upgradesstables -a after or
>> before we install 3.3?
>>
>> Thanks! :)
>>
>> ___
>> Will Hayworth
>> Developer, Engagement Engine
>> Atlassian
>>
>> My pronoun is "they". <http://pronoun.is/they>
>>
>>
>>
>> On Tue, Feb 9, 2016 at 10:50 AM, Jake Luciani <j...@apache.org> wrote:
>>
>>> The Cassandra team is pleased to announce the release of Apache Cassandra
>>> version 3.3.
>>>
>>> *This release contains a fix for a critical bug in the 3.0 series[4].* If
>>> you have installed any version >= 3.0, you will need to run
>>> 'nodetool upgradesstables -a' on all nodes to receive the fix.
>>>
>>> Apache Cassandra is a fully distributed database. It is the right choice
>>> when you need scalability and high availability without compromising
>>> performance.
>>>
>>>  http://cassandra.apache.org/
>>>
>>> Downloads of source and binary distributions are listed in our download
>>> section:
>>>
>>>  http://cassandra.apache.org/download/
>>>
>>> This version is a bug fix release[1] on the 3.3 series. As always,
>>> please pay
>>> attention to the release notes[2] and Let us know[3] if you were to
>>> encounter
>>> any problem.
>>>
>>> Enjoy!
>>>
>>> [1]: http://goo.gl/V2lsST (CHANGES.txt)
>>> [2]: http://goo.gl/5UBlNl (NEWS.txt)
>>> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>>> [4]: https://issues.apache.org/jira/browse/CASSANDRA-11102
>>>
>>>
>>
>
>
> --
> http://twitter.com/tjake
>
>


-- 
http://twitter.com/tjake


Re: [RELEASE] Apache Cassandra 2.1.13 released

2016-02-08 Thread Jake Luciani
Apologies, I sent the wrong changelog and news links.

Here are the correct ones for 2.1.13

http://goo.gl/9ZPnNX (CHANGES.txt)
http://goo.gl/5cR7eh (NEWS.txt)



On Mon, Feb 8, 2016 at 9:19 AM, Jake Luciani <j...@apache.org> wrote:

> The Cassandra team is pleased to announce the release of Apache Cassandra
> version 2.1.13.
>
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a bug fix release[1] on the 2.1 series. As always, please
> pay
> attention to the release notes[2] and Let us know[3] if you were to
> encounter
> any problem.
>
> Enjoy!
>
> [1]: http://goo.gl/lT2JXJ (CHANGES.txt)
> [2]: http://goo.gl/9m6hGQ (NEWS.txt)
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>
>


[RELEASE] Apache Cassandra 2.1.13 released

2016-02-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.13.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/lT2JXJ (CHANGES.txt)
[2]: http://goo.gl/9m6hGQ (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: cassandra-stress tool - InvalidQueryException: Batch too large

2016-02-01 Thread Jake Luciani
Yeah that looks like a bug.  Can you open a JIRA and attach the full .yaml?

Thanks!


On Mon, Feb 1, 2016 at 5:09 AM, Ralf Steppacher 
wrote:

> I am using Cassandra 2.2.4 and I am struggling to get the cassandra-stress
> tool to work for my test scenario. I have followed the example on
> http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
>  to
> create a yaml file describing my test.
>
> I am collecting events per user id (text, partition key). Events have a
> session type (text), event type (text), and creation time (timestamp)
> (clustering keys, in that order). Plus some more attributes required for
> rendering the events in a UI. For testing purposes I ended up with the
> following column spec and insert distribution:
>
> columnspec:
>   - name: created_at
>     cluster: uniform(10..1)
>   - name: event_type
>     size: uniform(5..10)
>     population: uniform(1..30)
>     cluster: uniform(1..30)
>   - name: session_type
>     size: fixed(5)
>     population: uniform(1..4)
>     cluster: uniform(1..4)
>   - name: user_id
>     size: fixed(15)
>     population: uniform(1..100)
>   - name: message
>     size: uniform(10..100)
>     population: uniform(1..100B)
>
> insert:
>   partitions: fixed(1)
>   batchtype: UNLOGGED
>   select: fixed(1)/120
>
>
> Running stress tool for just the insert prints
>
> Generating batches with [1..1] partitions and [0..1] rows (of
> [10..120] total rows in the partitions)
>
> and then immediately starts flooding me with
> "com.datastax.driver.core.exceptions.InvalidQueryException: Batch too
> large".
>
> I do not understand why I should be exceeding the
> "batch_size_fail_threshold_in_kb: 50" in the cassandra.yaml. My
> understanding is that the stress tool should generate one row per batch.
> The size of a single row should not exceed 8+10*3+5*3+15*3+100*3 = 398
> bytes, assuming a worst case of all text characters being 3-byte Unicode
> characters.
>
> How come I end up with batches that exceed the 50kb threshold? Am I
> missing the point about the "select" attribute?
>
>
> Thanks!
> Ralf
>
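
The arithmetic in the quoted message can be checked with a short sketch. It
only counts raw column payload from the quoted columnspec and assumes 1 KB =
1024 bytes; it ignores whatever per-row or per-batch overhead the server adds
when it measures a batch, so treat it as a lower bound rather than what
Cassandra actually computes. The variable and function names are made up for
illustration.

```python
# Worst-case bytes per text character, as assumed in the quoted message.
BYTES_PER_CHAR = 3

# Maximum character counts taken from the quoted columnspec.
MAX_TEXT_CHARS = {
    "event_type": 10,    # size: uniform(5..10)
    "session_type": 5,   # size: fixed(5)
    "user_id": 15,       # size: fixed(15)
    "message": 100,      # size: uniform(10..100)
}
TIMESTAMP_BYTES = 8      # created_at (timestamp)

# batch_size_fail_threshold_in_kb: 50, assuming 1 KB = 1024 bytes.
FAIL_THRESHOLD_BYTES = 50 * 1024


def worst_case_row_bytes():
    """Raw payload of one row if every text column is at its maximum size."""
    return TIMESTAMP_BYTES + BYTES_PER_CHAR * sum(MAX_TEXT_CHARS.values())


row_bytes = worst_case_row_bytes()
# Smallest number of worst-case rows whose payload exceeds the threshold.
rows_to_exceed = FAIL_THRESHOLD_BYTES // row_bytes + 1
# Payload of the largest partition reported by the tool (120 rows).
full_partition_bytes = 120 * row_bytes

print(row_bytes, rows_to_exceed, full_partition_bytes)
```

Under these assumptions even a whole 120-row partition stays below the
threshold on payload alone, which is consistent with the behaviour looking
like a bug rather than a sizing mistake.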



-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 3.2.1 released

2016-01-19 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.2.1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.2 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: https://goo.gl/ySa5hr (CHANGES.txt)
[2]: https://goo.gl/tCBBPv (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Cassandra 3.1.1 with respect to HeapSpace

2016-01-14 Thread Jake Luciani
Yes you can restart without data loss.

Can you please include info about how much data you have loaded per node
and perhaps what your schema looks like?

Thanks
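
For the heap question quoted below, it can help to compare the hand-set values
against what the stock cassandra-env.sh would pick on its own. The sketch
below mirrors that script's default calculation as I recall it (heap = max of
"half of RAM capped at 1024 MB" and "a quarter of RAM capped at 8192 MB";
young generation = min of "100 MB per core" and "a quarter of the heap").
Check your own copy of the script before relying on it. The function name and
the 8-core count are assumptions, since the thread does not say how many
cores the nodes have.

```python
def default_heap_mb(system_memory_mb, cpu_cores):
    """Approximate the default (MAX_HEAP_SIZE, HEAP_NEWSIZE) in MB.

    Assumed to follow the calculation in the stock cassandra-env.sh;
    this is a sketch, not a copy of the script.
    """
    half = min(system_memory_mb // 2, 1024)      # 1/2 of RAM, capped at 1 GB
    quarter = min(system_memory_mb // 4, 8192)   # 1/4 of RAM, capped at 8 GB
    max_heap = max(half, quarter)
    # Young generation: 100 MB per core, but no more than 1/4 of the heap.
    new_size = min(100 * cpu_cores, max_heap // 4)
    return max_heap, new_size


# A 16 GB node with a hypothetical 8 cores.
print(default_heap_mb(16 * 1024, 8))
```

This says nothing about whether those defaults suit the workload in question;
it only shows how far the 6G / 496M setting is from the script's own choice.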

On Thu, Jan 14, 2016 at 12:24 PM, Jean Tremblay <
jean.tremb...@zen-innovations.com> wrote:

>
> Ok, I will open a ticket.
>
> How could I restart my cluster without losing everything?
> Would there be a better memory configuration to select for my nodes?
> Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16 GB RAM node.
>
> Thanks
>
> Jean
>
> On 14 Jan 2016, at 18:19, Tyler Hobbs  wrote:
>
> I don't think that's a known issue.  Can you open a ticket at
> https://issues.apache.org/jira/browse/CASSANDRA and attach your schema
> along with the commitlog files and the mutation that was saved to /tmp?
>
> On Thu, Jan 14, 2016 at 10:56 AM, Jean Tremblay <
> jean.tremb...@zen-innovations.com> wrote:
>
>> Hi,
>>
>> I have a small Cassandra cluster with 5 nodes, each having 16 GB of RAM.
>> I use Cassandra 3.1.1.
>> I use the following setup for the memory:
>>   MAX_HEAP_SIZE="6G"
>>   HEAP_NEWSIZE="496M"
>>
>> I have been loading a lot of data in this cluster over the last 24 hours.
>> The system behaved, I think, very nicely. It was loading very fast, and
>> giving excellent read times. There were no error messages until this one:
>>
>>
>> ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602
>> JVMStabilityInspector.java:139 - JVM state determined to be unstable.
>> Exiting forcefully due to:
>> java.lang.OutOfMemoryError: Java heap space
>> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_65]
>> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_65]
>> at
>> org.apache.cassandra.io.util.DataOutputBuffer.reallocate(DataOutputBuffer.java:126)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:86)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:297)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:374)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.rows.BufferCell$Serializer.serialize(BufferCell.java:263)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:183)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:298)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:136)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:128)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> ~[na:1.8.0_65]
>> at
>> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
>> [apache-cassandra-3.1.1.jar:3.1.1]
>> at 

[RELEASE] Apache Cassandra 3.2 released

2016-01-12 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.2 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/vBb0Ad (CHANGES.txt)
[2]: http://goo.gl/JjUIGF (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: [RELEASE] Apache Cassandra 3.2 released

2016-01-12 Thread Jake Luciani
Note: I made a mistake in saying this is a bug fix release; it's a feature
release that includes bug fixes.

On Tue, Jan 12, 2016 at 8:46 AM, Jake Luciani <j...@apache.org> wrote:

>
> The Cassandra team is pleased to announce the release of Apache Cassandra
> version 3.2.
>
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a bug fix release[1] on the 3.2 series. As always, please
> pay attention to the release notes[2] and let us know[3] if you encounter
> any problems.
>
> Enjoy!
>
> [1]: http://goo.gl/vBb0Ad (CHANGES.txt)
> [2]: http://goo.gl/JjUIGF (NEWS.txt)
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>
>


[RELEASE] Apache Cassandra 3.1.1 released

2015-12-21 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.1.1.

There has been some understandable confusion about our new Tick-Tock
release style.  This thread should help explain it [4]. Since a critical
bug was discovered just after 3.1 we are releasing 3.1.1 to address it
before 3.2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.1 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: https://goo.gl/etxSuG (CHANGES.txt)
[2]: https://goo.gl/gP7B3J (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: http://www.mail-archive.com/user@cassandra.apache.org/msg45119.html


[RELEASE] Apache Cassandra 3.0.2 released

2015-12-21 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: https://goo.gl/swRjp9 (CHANGES.txt)
[2]: https://goo.gl/ipA763 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.1 released

2015-12-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.1. This is the first release from our new Tick-Tock release
process[4].
It contains only bugfixes on the 3.0 release.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.x series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/rQJ9yd (CHANGES.txt)
[2]: http://goo.gl/WBrlCs (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/


[RELEASE] Apache Cassandra 3.0.1 released

2015-12-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/99MRn6 (CHANGES.txt)
[2]: http://goo.gl/jwoQl6 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.4 released

2015-12-07 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.4.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/EWjhm1 (CHANGES.txt)
[2]: http://goo.gl/WLSytN (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.12 released

2015-12-07 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.12.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/Phl5Pd (CHANGES.txt)
[2]: http://goo.gl/L1HIfj (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: cassandra-stress 2.1: Generating data

2015-12-03 Thread Jake Luciani
The data is generated on gen01 and inserted only from there.

On Thu, Dec 3, 2015 at 10:52 AM,  wrote:

> Hi,
>
> I'm trying to insert data with cassandra-stress into a C* cluster with 6
> nodes: node001 to node006.
>
> The stress tool is executed on a different machine (gen01), specifying
> one of the 6 nodes: tools/bin/cassandra-stress  user profile=cf.yml
> ops\(insert=1\) n=500 -mode thrift -node node001  -rate threads=50
>
> My question: is the data generated on gen01 and then inserted into the
> Cassandra nodes, or does everything (generation and insertion) run on the
> Cassandra nodes?
>
> Thanks.
>


-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 3.0.0 released

2015-11-09 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.0.

Top Cassandra 3.0 features:

  * CQL optimized storage engine and sstable format
  * Materialized views
  * More efficient hints

Read more about features and upgrade instructions in NEWS.txt[2]

The Java driver beta for 3.0.0 will be officially released within the next
week. In the meantime, use the version included in the release under /lib.

The Python driver rc has been released as '3.0.0rc1'.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is the first release[1] on the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/TduZdw (CHANGES.txt)
[2]: http://goo.gl/mJxdHZ (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.0-rc2 released

2015-10-19 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.0-rc2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a release candidate[1] for the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/mLK41h (CHANGES.txt)
[2]: http://goo.gl/JO8474 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.11 released

2015-10-16 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.11.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/mJCyUf (CHANGES.txt)
[2]: http://goo.gl/ax1w4y (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.3 released

2015-10-16 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.3.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/zLlUcO (CHANGES.txt)
[2]: http://goo.gl/pC433O (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.10 released

2015-10-05 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.10.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/KE0tlf (CHANGES.txt)
[2]: http://goo.gl/0CW2iz (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.2 released

2015-10-05 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/d9xIEO (CHANGES.txt)
[2]: http://goo.gl/S64khA (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Consistency Issues

2015-10-01 Thread Jake Luciani
Couple things to try.

1. nodetool resetlocalschema on the nodes with missing CFs. This will
refresh the schema on the local node.
2. upgrade to 2.1.9. There are some pretty major issues in 2.1.6 (nothing
specific to this problem but worth upgrading)
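
Step 1 can be scripted if several nodes are affected. The sketch below only
builds and, by default, prints the `nodetool -h <host> resetlocalschema`
commands; set `dry_run=False` to actually run them. The host names are
hypothetical, and it assumes `nodetool` is on the PATH and can reach each
node's JMX port without extra credentials.

```python
import subprocess


def resetlocalschema_cmd(host):
    """Build the nodetool invocation for one node."""
    return ["nodetool", "-h", host, "resetlocalschema"]


def reset_schema(hosts, dry_run=True):
    """Print (dry run) or execute resetlocalschema for each host in turn."""
    cmds = [resetlocalschema_cmd(h) for h in hosts]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first node where nodetool fails.
            subprocess.run(cmd, check=True)
    return cmds


# Hypothetical hosts that are missing column families.
commands = reset_schema(["node001", "node002"])
```

Running it one node at a time and checking schema agreement in between is
probably safer than resetting every node at once.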


Re: Consistency Issues

2015-10-01 Thread Jake Luciani
Onur, I was responding to Stephen's issue.


On Thu, Oct 1, 2015 at 8:56 AM, Onur Yalazı <onur.yal...@8digits.com> wrote:

> Thank you Jake.
>
> The issue is I do not have missing CFs, and upgrading beyond 2.1.3 is not
> a possibility because of the deprecation of cql dialects. Our application
> is using Hector and migrating to cql3 is a huge refactoring.
>
>
>
> On 01/10/15 15:48, Jake Luciani wrote:
>
>> Couple things to try.
>>
>> 1. nodetool resetlocalschema on the nodes with missing CFs. This will
>> refresh the schema on the local node.
>> 2. upgrade to 2.1.9. There are some pretty major issues in 2.1.6 (nothing
>> specific to this problem but worth upgrading)
>>
>
>


-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 2.0.17 released

2015-09-21 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.17.

This is most likely the final release for the 2.0 release series.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/QwruFc (CHANGES.txt)
[2]: http://goo.gl/fHlSqL (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.0-rc1 released

2015-09-21 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.0-rc1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a release candidate[1] on the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/Oppn3S (CHANGES.txt)
[2]: http://goo.gl/zQFaj4 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.1 released

2015-09-01 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.2 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/x6ilHu (CHANGES.txt)
[2]: http://goo.gl/FHwYLN (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.9 released

2015-08-28 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.9.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/xnYwFa (CHANGES.txt)
[2]: http://goo.gl/QDqPhN (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.0-beta1 released

2015-08-24 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.0-beta1.

You’ll need python-driver 3.0.0a2 (available on pypi) or java-driver
3.0.0-alpha2 (uploaded to Maven Central) to try out 3.0.0-beta1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a *BETA* release[1] on the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/2TNRm5 (CHANGES.txt)
[2]: http://goo.gl/9xluWy (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 3.0.0-alpha1 released

2015-08-03 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.0-alpha1.

This is the first test build of Cassandra 3.0 that includes:

   * New storage engine
   * New sstable format
   * Materialized Views

We expect bugs in this release so test and report any issues please!


Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is an *ALPHA* release[1] on the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/qTe3Ed (CHANGES.txt)
[2]: http://goo.gl/eMIDGw (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.0 released

2015-07-20 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.0.

You can read about the release here:
http://www.datastax.com/dev/blog/cassandra-2-2

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is the first release[1] on the 2.2 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/nUjs6O (CHANGES.txt)
[2]: http://goo.gl/Qk4ljt (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.8 released

2015-07-09 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.8.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/heI10N (CHANGES.txt)
[2]: http://goo.gl/BIe5dS (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.0-rc2 released

2015-07-09 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.0-rc2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a release candidate[1] on the 2.2 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter
any problems.

Enjoy!

[1]: http://goo.gl/pE0pPF (CHANGES.txt)
[2]: http://goo.gl/h5OJie (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Adding Nodes With Inconsistent Data

2015-06-24 Thread Jake Luciani
This is no longer an issue in 2.1.
https://issues.apache.org/jira/browse/CASSANDRA-2434

We now make sure the replica we bootstrap from is the one that will no
longer own that range.
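
A toy model of the scenario quoted below may make the difference clearer. It
tracks only which replicas of token T hold key K, and compares streaming from
an arbitrary replica (node 3, as in the quoted report) with streaming from
the replica that gives up the range. The replica sets are the ones given in
the quoted message; the function names are made up, and hints, read repair
and anti-entropy repair are deliberately left out.

```python
from itertools import combinations


def bootstrap(replicas, holders, new_node, leaving, source):
    """Hand the range from `leaving` to `new_node`, streaming from `source`.

    replicas: set of nodes owning token T before the change
    holders:  subset of replicas that actually has key K
    Returns the (replicas, holders) pair after the bootstrap.
    """
    new_replicas = (replicas - {leaving}) | {new_node}
    new_holders = holders & new_replicas
    if source in holders:  # the new node only gets K if its source had it
        new_holders = new_holders | {new_node}
    return new_replicas, new_holders


def quorum_read_always_finds(replicas, holders):
    """True if every possible quorum of replicas includes a node holding K."""
    quorum = len(replicas) // 2 + 1
    return all(set(q) & holders for q in combinations(sorted(replicas), quorum))


# RF=3 on nodes 1-3; the write for K reached nodes 1 and 2 only.
start = ({1, 2, 3}, {1, 2})

# Worst case from the report: both new nodes stream from node 3.
r, h = bootstrap(*start, new_node=4, leaving=2, source=3)
old_replicas, old_holders = bootstrap(r, h, new_node=5, leaving=1, source=3)

# Streaming from the replica that will no longer own the range.
r, h = bootstrap(*start, new_node=4, leaving=2, source=2)
new_replicas, new_holders = bootstrap(r, h, new_node=5, leaving=1, source=1)

print(old_holders, quorum_read_always_finds(old_replicas, old_holders))
print(new_holders, quorum_read_always_finds(new_replicas, new_holders))
```

In the first run no replica of T holds K after both bootstraps; in the second
the number of copies is preserved at each step, so a quorum read still
overlaps a node that has the key.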

On Wed, Jun 24, 2015 at 4:58 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 It looks to me like that can indeed happen theoretically (I might be wrong).

 However,

 - Hinted Handoff tends to remove this issue. If this is a big worry, you
 might want to make sure HH is enabled and well tuned.
 - Read Repairs (synchronous or not) might also have mitigated things, if
 you read fresh data. You can set this to higher values.
 - After an outage, you should always run a nodetool repair on the node
 that went down - following the best practices, or because you understand
 the reasons - or just trust HH if it is enough for you.

 So I would say that you can always shoot yourself in the foot, whatever
 you do, yet following best practices or understanding the internals is the
 key imho.

 I would say it is a good question though.

 Alain.



 2015-06-24 19:43 GMT+02:00 Anuj Wadehra anujw_2...@yahoo.co.in:

 Hi,

 We faced a scenario where we lost a little data after adding 2 nodes to the
 cluster. There were intermittent dropped mutations in the cluster. I need to
 verify my understanding of how this may have happened in order to do Root
 Cause Analysis:

 Scenario: 3 nodes, RF=3, Read / Write CL= Quorum

 1. Due to an overloaded cluster, some writes just happened on 2 nodes: node
 1 and node 2, while asynchronous mutations were dropped on node 3.
 So say key K with Token T was not written to node 3.

 2. I added node 4 and suppose, as per the newly calculated ranges, token T
 is now supposed to have replicas on node 1, node 3, and node 4. Unfortunately
 node 4 started bootstrapping from node 3, where key K was missing.

 3. After the recommended 2 min gap, I added node 5 and, as per the new token
 distribution, suppose token T is now supposed to have replicas on node 3,
 node 4 and node 5. Again node 5 bootstrapped from node 3, where the data was
 missing.

 So now key K is lost, and that's how we lost very few rows.

 Moreover, in step 1 the situation could be worse. We can also have a scenario
 where some writes just happened on one of the three replicas and Cassandra
 chooses replicas where this data is missing for streaming ranges to the 2 new
 nodes.

 Am I making sense?

 We are using C* 2.0.3.

 Thanks
 Anuj








-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 2.0.16 released

2015-06-22 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.16.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/XtSTxA (CHANGES.txt)
[2]: http://goo.gl/9NHMdH (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.7 released

2015-06-22 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.7.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/0AxLpL (CHANGES.txt)
[2]: http://goo.gl/kkEDSi (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.0-rc1 released

2015-06-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.0-rc1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a release candidate[1] on the 2.2 series. As always, please
pay attention to the release notes[2] and Let us know[3] if you were to
encounter any problem.

Enjoy!

[1]: http://goo.gl/pBjybx (CHANGES.txt)
[2]: http://goo.gl/E1RiHd (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.6 released

2015-06-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.6.  We are now calling 2.1 series stable and suitable for
production.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/8aR9L2 (CHANGES.txt)
[2]: http://goo.gl/dstU4D (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Multiple cassandra instances per physical node

2015-05-26 Thread Jake Luciani

  If I have a 20-node cluster with 2 nodes on each physical server, can I
 use 10 racks to properly segment my partitions?


Yes.
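
A minimal sketch of that layout, assuming each physical server is treated as
one rack and that the rack is declared per instance with the `dc` and `rack`
keys of cassandra-rackdc.properties (as read by GossipingPropertyFileSnitch).
The server, node and rack names are hypothetical.

```python
def rack_assignments(physical_servers, nodes_per_server, dc="dc1"):
    """Map each Cassandra instance to a rack named after its physical server."""
    assignments = {}
    for server in range(1, physical_servers + 1):
        for instance in range(1, nodes_per_server + 1):
            node = f"server{server:02d}-node{instance}"
            assignments[node] = {"dc": dc, "rack": f"rack{server:02d}"}
    return assignments

# 20 instances, 2 per physical server, hence 10 racks.
layout = rack_assignments(physical_servers=10, nodes_per_server=2)
racks = {props["rack"] for props in layout.values()}
print(len(layout), "nodes in", len(racks), "racks")

# Lines one might put in cassandra-rackdc.properties for a single instance.
props = layout["server03-node2"]
print(f"dc={props['dc']}\nrack={props['rack']}")
```

Both instances on a server share a rack, so a rack-aware replication strategy
can avoid placing two replicas of the same range on one physical machine.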




 On Sun, May 24, 2015 at 5:38 PM, Jonathan Haddad j...@jonhaddad.com
 wrote:

 What impact would vnodes have on strong consistency?  I think the problem
 you're describing exists with or without them.

 On Sat, May 23, 2015 at 2:30 PM Nate McCall n...@thelastpickle.com
 wrote:


 So my question is: suppose I take a 12 disk JBOD and run 2 Cassandra
 nodes (each with 5 data disks, 1 commit log disk) and either give each its
 own container & IP or change the listen ports. Will this work? What are the
 risks? Will/should Cassandra support this better in the future?


 Don't use vnodes if any operations need strong consistency (reading or
 writing at quorum). Otherwise, at RF=3, if you lose a single node you will
 only have 1 replica left for some portion of the ring.



 --
 -
 Nate McCall
 Austin, TX
 @zznate

 Co-Founder & Sr. Technical Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com




 --
 *Ken Hancock *| System Architect, Advanced Advertising
 SeaChange International
 50 Nagog Park
 Acton, Massachusetts 01720
 ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC




-- 
http://twitter.com/tjake


[BETA-RELEASE] Apache Cassandra 2.2.0-beta1 released

2015-05-19 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.0-beta1.

This release is *not* production ready. We are looking for testing of
existing and new features. If you encounter any problem please let us know
[1].

Cassandra 2.2 features major enhancements such as:

* Resume-able Bootstrapping
* JSON Support [4]
* User Defined Functions [5]
* Server-side Aggregation [6]
* Role based access control

Read [2] and [3] to learn about all the new features.

Downloads of source and binary distributions are listed in our download
section:

http://cassandra.apache.org/download/

Enjoy!

-The Cassandra Team

[1]: https://issues.apache.org/jira/browse/CASSANDRA
[2]: http://goo.gl/MyOEib (NEWS.txt)
[3]: http://goo.gl/MBJd1S (CHANGES.txt)
[4]: http://cassandra.apache.org/doc/cql3/CQL-2.2.html#json
[5]: http://cassandra.apache.org/doc/cql3/CQL-2.2.html#udfs
[6]: http://cassandra.apache.org/doc/cql3/CQL-2.2.html#udas


[RELEASE] Apache Cassandra 2.0.15 released

2015-05-18 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.15.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/G050Kn (CHANGES.txt)
[2]: http://goo.gl/ZyvMnR (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.5 released

2015-04-29 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.5.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/xjzhhE (CHANGES.txt)
[2]: http://goo.gl/skvzNS (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[SECURITY ANNOUNCEMENT] CVE-2015-0225

2015-04-01 Thread Jake Luciani
CVE-2015-0225: Apache Cassandra remote execution of arbitrary code

Severity: Important

Vendor:
The Apache Software Foundation

Versions Affected:
Cassandra 1.2.0 to 1.2.19
Cassandra 2.0.0 to 2.0.13
Cassandra 2.1.0 to 2.1.3

Description:
Under its default configuration, Cassandra binds an unauthenticated
JMX/RMI interface to all network interfaces.  As RMI is an API for the
transport and remote execution of serialized Java, anyone with access
to this interface can execute arbitrary code as the running user.

Mitigation:
1.2.x has reached EOL, so users of <= 1.2.x are recommended to upgrade
to a supported version of Cassandra, or manually configure encryption
and authentication of JMX
(see https://wiki.apache.org/cassandra/JmxSecurity).
2.0.x users should upgrade to 2.0.14
2.1.x users should upgrade to 2.1.4
Alternately, users of any version not wishing to upgrade can
reconfigure JMX/RMI to enable encryption and authentication according
to https://wiki.apache.org/cassandra/JmxSecurity or
http://docs.oracle.com/javase/7/docs/technotes/guides/management/agent.html
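
A quick way to see whether a node's JMX port is exposed is to test whether it
accepts TCP connections. The Python sketch below assumes the default Cassandra
JMX port 7199; the function name is hypothetical. It only checks reachability,
not whether authentication or encryption is enabled.

```python
import socket

def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace 127.0.0.1 with the node's address as seen from another machine;
# a port reachable from other hosts is the exposure described above.
host, jmx_port = "127.0.0.1", 7199
state = "reachable" if port_is_reachable(host, jmx_port) else "not reachable"
print(f"JMX port {jmx_port} on {host} is {state}")
```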

Credit:
This issue was discovered by Georgi Geshev of MWR InfoSecurity


[RELEASE] Apache Cassandra 2.0.13 released

2015-03-16 Thread Jake Luciani
Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/Rh9gyx (CHANGES.txt)
[2]: http://goo.gl/k8vIom (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Cassandra Stress Test Result Evaluation

2015-03-09 Thread Jake Luciani
Your insert settings look unrealistic since I doubt you would be
writing 50k rows at a time.  Try to set this to 1 per partition and
you should get much more consistent numbers across runs I would think.
select: fixed(1)/10
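
The 50k figure follows from the quoted profile: 100 partitions per batch,
1000 rows per partition, and a select ratio of 1/2. A minimal Python sketch
of that arithmetic (the function name is hypothetical, and the second call
assumes "1 per partition" means a ratio of 1/1000 with the same partition
count):

```python
from fractions import Fraction

def rows_per_batch(partitions_per_batch, rows_per_partition, select_ratio):
    """Rows written by one insert operation under the given stress settings."""
    return int(partitions_per_batch * rows_per_partition * select_ratio)

# Quoted settings: partitions: fixed(100), Time fixed(1000), select: fixed(1)/2
print(rows_per_batch(100, 1000, Fraction(1, 2)))     # 50000 rows per batch

# One row selected per partition instead of half of each partition.
print(rows_per_batch(100, 1000, Fraction(1, 1000)))  # 100 rows per batch
```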

On Wed, Mar 4, 2015 at 7:53 AM, Nisha Menon nisha.meno...@gmail.com wrote:
 I have been using the cassandra-stress tool to evaluate my cassandra cluster
 for quite some time now. My problem is that I am not able to comprehend the
 results generated for my specific use case.

 My schema looks something like this:

 CREATE TABLE Table_test(
   ID uuid,
   Time timestamp,
   Value double,
   Date timestamp,
   PRIMARY KEY ((ID,Date), Time)
 ) WITH COMPACT STORAGE;

 I have parsed this information in a custom yaml file and used parameters
 n=1, threads=100 and the rest are default options (cl=one, mode=native
 cql3 etc). The Cassandra cluster is a 3 node CentOS VM setup.

 A few specifics of the custom yaml file are as follows:

 insert:
   partitions: fixed(100)
   select: fixed(1)/2
   batchtype: UNLOGGED

 columnspecs:
   - name: Time
     size: fixed(1000)
   - name: ID
     size: uniform(1..100)
   - name: Date
     size: uniform(1..10)
   - name: Value
     size: uniform(-100..100)

 My observations so far are as follows (Please correct me if I am wrong):

 With n=1 and time: fixed(1000), the number of rows getting inserted is
 10 million. (1*1000=1000)
 The number of row-keys/partitions is 1(i.e n), within which 100
 partitions are taken at a time (which means 100 *1000 = 10 key-value
 pairs) out of which 5 key-value pairs are processed at a time. (This is
 because of select: fixed(1)/2 ~ 50%)

 The output message also confirms the same:

 Generating batches with [100..100] partitions and [5..5] rows
 (of[10..10] total rows in the partitions)

 The results that I get are the following for consecutive runs with the same
 configuration as above:

 Run Total_ops   Op_rate Partition_rate  Row_Rate   Time
 1 56   19 1885   943246 3.0
 2 46   46 4648  2325498 1.0
 3 27   30 2982  1489870 0.9
 4 59   19 1932   966034 3.1
 5 100  17 1730   865182 5.8

 Now what I need to understand are as follows:

 Which among these metrics is the throughput i.e, No. of records inserted per
 second? Is it the Row_rate, Op_rate or Partition_rate? If it’s the Row_rate,
 can I safely conclude here that I am able to insert close to 1 million
 records per second? Any thoughts on what the Op_rate and Partition_rate mean
 in this case?
 Why is it that the Total_ops vary so drastically in every run ? Has the
 number of threads got anything to do with this variation? What can I
 conclude here about the stability of my Cassandra setup?
 How do I determine the batch size per thread here? In my example, is the
 batch size 5?

 Thanks in advance.



-- 
http://twitter.com/tjake


Re: Many pending compactions

2015-02-18 Thread Jake Luciani
Ja, please upgrade to the official 2.1.3 release; we've fixed many things
related to compaction. Are you seeing the compaction % complete progress at
all?
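
To track whether the backlog is shrinking over time, the pending count can be
pulled out of the text printed by `nodetool compactionstats`. A minimal Python
sketch, assuming the output contains a "pending tasks: N" line; the function
name is hypothetical and the exact output layout varies between Cassandra
versions.

```python
import re

def pending_compactions(compactionstats_output):
    """Extract the 'pending tasks' count from `nodetool compactionstats` text."""
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    return int(match.group(1)) if match else None

# Sample text only; in practice this would be the captured nodetool output.
sample = "pending tasks: 67\n"
print(pending_compactions(sample))
```

Sampling this value every few minutes shows whether compaction is keeping up
with writes or falling further behind.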

On Wed, Feb 18, 2015 at 11:58 AM, Roni Balthazar ronibaltha...@gmail.com
wrote:

 Try repair -pr on all nodes.

 If after that you still have issues, you can try to rebuild the SSTables
 using nodetool upgradesstables or scrub.

 Regards,

 Roni Balthazar

 Em 18/02/2015, às 14:13, Ja Sam ptrstp...@gmail.com escreveu:

 ad 3) I already did this yesterday (setcompactionthroughput also). But
 the SSTables are still increasing.

 ad 1) What do you think: should I use -pr or try to use incremental?



 On Wed, Feb 18, 2015 at 4:54 PM, Roni Balthazar ronibaltha...@gmail.com
 wrote:

 You are right... Repair makes the data consistent between nodes.

 I understand that you have 2 issues going on.

 You need to run repair periodically without errors and need to decrease
 the numbers of compactions pending.

 So I suggest:

 1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you can
 use incremental repairs. There were some bugs on 2.1.2.
 2) Run cleanup on all nodes
 3) Since you have too many cold SSTables, set cold_reads_to_omit to 0.0,
 and increase setcompactionthroughput for some time and see if the number
 of SSTables is going down.

 Let us know what errors you are getting when running repairs.

 Regards,

 Roni Balthazar


 On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam ptrstp...@gmail.com wrote:

 Can you explain to me the correlation between growing SSTables and
 repair?
 I was sure, until your mail, that repair only makes data
 consistent between nodes.

 Regards


 On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar ronibaltha...@gmail.com
  wrote:

 Which error are you getting when running repairs?
 You need to run repair on your nodes within gc_grace_seconds (eg:
 weekly). They have data that are not read frequently. You can run
 repair -pr on all nodes. Since you do not have deletes, you will not
 have trouble with that. If you have deletes, it's better to increase
 gc_grace_seconds before the repair.

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
 After repair, try to run a nodetool cleanup.

 Check if the number of SSTables goes down after that... Pending
 compactions must decrease as well...

 Cheers,

 Roni Balthazar




 On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam ptrstp...@gmail.com wrote:
  1) We tried to run repairs but they usually do not succeed. But we had
  Leveled compaction before. Last week we ALTERed the tables to STCS, because
  guys from DataStax suggested that we should not use Leveled and should alter
  the tables to STCS, because we don't have SSDs. After this change we did not
  run any repair. Anyway, I don't think it will change anything in the SSTable
  count - if I am wrong please let me know.
 
  2) I did this. My tables are 99% write only. It is an audit system.
 
  3) Yes, I am using default values.
 
  4) In both operations I am using LOCAL_QUORUM.
 
  I am almost sure that the READ timeouts happen because of too many SSTables.
  Anyway, first I would like to fix the too many pending compactions. I still
  don't know how to speed them up.
 
 
  On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar 
 ronibaltha...@gmail.com
  wrote:
 
  Are you running repairs within gc_grace_seconds? (default is 10 days)
 
 
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
 
  Double check if you set cold_reads_to_omit to 0.0 on tables with STCS
  that you do not read often.
 
  Are you using default values for the properties
  min_compaction_threshold(4) and max_compaction_threshold(32)?
 
  Which Consistency Level are you using for reading operations? Check if
  you are not reading from DC_B due to your Replication Factor and CL.
 
 
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
 
 
  Cheers,
 
  Roni Balthazar
 
  On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam ptrstp...@gmail.com
 wrote:
   I don't have problems with DC_B (replica); only in DC_A (my system writes
   only to it) do I have read timeouts.
  
   I checked the SSTable count in OpsCenter and I have:
   1) in DC_A the same +-10% for the last week, a small increase for the last
   24h (it is more than 15000-2 SSTables depending on the node)
   2) in DC_B the last 24h shows up to a 50% decrease, which gives a nice
   prognosis. Now I have less than 1000 SSTables
  
   What did you measure during system optimizations? Or do you have an idea
   what more I should check?
   1) I look at CPU idle (one node is 50% idle, the rest 70% idle)
   2) Disk queue - mostly it is near zero: avg 0.09. Sometimes there are
   spikes
   3) system RAM usage is almost full
   4) In Total Bytes Compacted most lines are below 3MB/s. For total DC_A
   it is less than 10MB/s, in DC_B it looks much better (avg is like
   17MB/s)
  
   something else?
  
  
  
   On Wed, Feb 18, 2015 at 1:32 

[RELEASE] Apache Cassandra 2.1.3 released

2015-02-17 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.3.

This release contains over 100 fixes for 2.1 so anyone on 2.1.X should
upgrade to this ASAP.


Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/xGm4Qq (CHANGES.txt)
[2]: http://goo.gl/dBGQa0 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.0.12 released

2015-01-20 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.12.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problem.

Enjoy!

[1]: http://goo.gl/ZeeTfs (CHANGES.txt)
[2]: http://goo.gl/1zEijH (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.1.2 released

2014-11-10 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and Let us know[3] if you were to encounter
any problems.

Enjoy!

[1]: http://goo.gl/pi45XF (CHANGES.txt)
[2]: http://goo.gl/vtSXzZ (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Apache Cassandra debian repo issue

2014-10-31 Thread Jake Luciani
Hello,

There is currently an issue with the apache debian repo for cassandra.

ASF infrastructure is working on fixing this
https://issues.apache.org/jira/browse/INFRA-8558

Sorry for the inconvenience.

-Jake


Re: CPU consumption of Cassandra

2014-09-22 Thread Jake Luciani
Eric,

We have a new stress tool to help you share your schema for wider
benchmarking. See
http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
If you wouldn't mind creating a yaml for your schema, I would be happy to
take a look.

-Jake




On Mon, Sep 22, 2014 at 12:39 PM, Leleu Eric eric.le...@worldline.com
wrote:

  Hi,





 I’m currently testing Cassandra 2.0.9  (and since the last week 2.1) under
 some read heavy load…



 I have 2 cassandra nodes (RF : 2) running under CentOS 6 with 16GB of RAM
 and 8 Cores.

 I have around 93GB of data per node (one Disk of 300GB with SAS interface
 and a Rotational Speed of 10500)



 I have 300 active client threads and they query the C* nodes with the
 Consistency level set to ONE (I’m using the CQL DataStax driver).



 During my tests I saw  a lot of CPU consumption (70% user / 6%sys / 4%
 iowait / 20%idle).

 C* nodes respond to around 5000 op/s (sometime up to 6000op/s)



 I tried to profile a node and at first look, 60% of the CPU time is spent in
 the “sun.nio.ch” package. (SelectorImpl.select or Channel.read)



 I know that benchmark results are highly dependent on the dataset and use
 cases, but from my point of view this CPU consumption is normal
 for the load.

 Can someone confirm that point?

 Given my hardware configuration, can I expect to get more than
 6000 read op/s?





 Regards,

 Eric













-- 
http://twitter.com/tjake


[RELEASE] Apache Cassandra 1.2.19 released

2014-09-18 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.2.19.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1] on the 1.2 series. As always,
please pay attention to the release notes[2] and Let us know[3] if you were to
encounter any problem. This will likely be the final release in the 1.2 series.

Enjoy!

[1]: http://goo.gl/F6szqv (CHANGES.txt)
[2]: http://goo.gl/9VsZ88 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: [RELEASE] Apache Cassandra 1.2.19 released

2014-09-18 Thread Jake Luciani
Apologies, the correct url for CHANGES.txt is http://goo.gl/eB973i

On Thu, Sep 18, 2014 at 12:58 PM, Jake Luciani j...@apache.org wrote:

 The Cassandra team is pleased to announce the release of Apache Cassandra
 version 1.2.19.

 Cassandra is a highly scalable second-generation distributed database,
 bringing together Dynamo's fully distributed design and Bigtable's
 ColumnFamily-based data model. You can read more here:

  http://cassandra.apache.org/

 Downloads of source and binary distributions are listed in our download
 section:

  http://cassandra.apache.org/download/

 This version is a maintenance/bug fix release[1] on the 1.2 series. As always,
 please pay attention to the release notes[2] and Let us know[3] if you were to
 encounter any problem. This will likely be the final release in the 1.2 series.

 Enjoy!

 [1]: http://goo.gl/F6szqv (CHANGES.txt)
 [2]: http://goo.gl/9VsZ88 (NEWS.txt)
 [3]: https://issues.apache.org/jira/browse/CASSANDRA



Re: Why is the cassandra documentation such poor quality?

2014-07-23 Thread Jake Luciani
I'll note that historically the wiki used to be open to all, and due to
massive amounts of spam it was put on lockdown by the ASF.

If there is a better platform the community feels would make it simpler to
provide community based documentation then we should consider it.
The ASF also has a Confluence wiki, which might be simpler for users to
contribute to? (at least they have captchas)

-Jake



On Wed, Jul 23, 2014 at 9:20 AM, Peter Lin wool...@gmail.com wrote:

 @benedict - you're right that I haven't requested permission to edit.
 You're also right that I've given up on getting edit permission to the
 Cassandra wiki. I've struggled with how to manage
 open source projects myself, so I totally get it. Managing projects is a thankless
 job most of the time. Pleasing everyone is totally impossible. Apache isn't
 alone in this. I've submitted stuff to Google's open source projects in the
 past and had it go into a black hole. We all struggle with managing open
 source projects.

 I am committed to contributing to the Cassandra community, but just not through
 the wiki. There's lots of different ways to contribute. The jira tickets
 I've submitted have gotten good responses generally. It does take several
 days depending on how busy the committers are, but that's normal for all
 projects.



 On Wed, Jul 23, 2014 at 9:00 AM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:

 Requesting a change is very different to requesting permission to edit
 (which, I note, still hasn't been made); we do our best to promote
 community engagement, so granting a privilege request has a different
 mental category to a random edit request, which is much more likely to be
 forgotten by any particular committer in the process of attending to their
 more pressing work.

 The relationship between committers and the community is debated at
 length in all projects, often by vocal individuals such as yourselves who
 are unhappy in some way with how the project is being run. However it is
 very hard to please everyone - most of the time we can't even please all
 the committers, and that is a much smaller and more homogenous group.





 On Wed, Jul 23, 2014 at 2:30 PM, Peter Lin wool...@gmail.com wrote:


 I sent a request to add a link my .Net driver for cassandra to the wiki
 over 5 weeks back and no response at all.

 I sent another request way back in 2013 and got zero response. Again, I
 totally understand people are busy and I'm just as guilty as everyone else
 of letting requests slip by. It's the reality of contributing to open
 source as a hobby. If I wasn't serious about contributing to cassandra
 community, I wouldn't have spent 2.5 months porting Hector to C# manually.

 Perhaps the real cause is that some committers can't empathise with
 others in the community?


 On Wed, Jul 23, 2014 at 8:22 AM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:

 All requests I've seen in the past year to edit the wiki (admittedly
 only 2-3) have been answered promptly with editing privileges. Personally I
 don't have a major preference either way for policy - there are positives
 and negatives to each approach - but, like I said, raise it on the dev list
 and see if anybody else does.

 However I must admit I cannot empathise with your characterisation of
 requesting permission as 'begging', or a 'slap in the face', or that it is
 even particularly onerous. It is a slight psychological barrier, but in my
 personal experience when a psychological barrier as low as this prevents me
 from taking action, it's usually because I don't have as much desire to
 contribute as I thought I did.




 On Wed, Jul 23, 2014 at 1:54 PM, Peter Lin wool...@gmail.com wrote:


 I've submitted requests to edit the wiki in the past and nothing ever
 got done.

 Having been an Apache committer and contributor over the years, I can
 totally understand that people are busy. I also understand that most
 developers find writing docs tedious.

 I'd rather not harass the committers about wiki edits, since I didn't
 like it when it happened to me in the past. That's why many Apache projects
 keep their wikis open. Honestly, as much as I find writing docs
 challenging and tedious, it's critical and important. For my other open
 source projects, I force myself to write docs.

 My point is, the wiki should be open and the barrier should be
 removed. Having to beg/ask to edit the wiki feels like a slap in the face
 to me, but maybe I'm alone in this. Then again, I've heard the same
 sentiment from other people about Cassandra's wiki. The thing is, they just
 chalk it up to Cassandra committers not giving a crap about docs. I do my
 best to defend the committers and point out some are volunteers, but it
 does give the public a negative impression. I know the committers care
 about docs, but they don't always have time to do it.

 I know that given a choice between coding or writing docs, 90% of the
 time I'll choose coding. What I've decided 

Re: Which way to Cassandraville?

2014-07-22 Thread Jake Luciani
Check out DataStax DevCenter, which is a GUI data modelling tool for CQL3:

http://www.datastax.com/what-we-offer/products-services/devcenter


On Sun, Jul 20, 2014 at 7:17 PM, jcllings jclli...@gmail.com wrote:

 So I'm a Java application developer and I'm trying to find entry points
 for learning to work with Cassandra.
 I just finished reading Cassandra: The Definitive Guide which seems
 pretty out of date and while very informative as to the technology that
 Cassandra uses, was not very helpful from the perspective of an
 application developer.

 Having said that, what Java clients should I be looking at?  Are there
 any reasonably mature PoJo mapping techs for Cassandra analogous to
 Hibernate? I can't say that I'm looking forward to yet another *QL
 variant but I guess CQL is going to be a necessity.  What, if any, GUI
 tools are available for working with Cassandra, for data modelling?

 Jim C.




-- 
http://twitter.com/tjake


Re: high pending compactions

2014-06-08 Thread Jake Luciani
23

On Sunday, June 8, 2014, S C as...@outlook.com wrote:

 I am using Cassandra 1.1 (sorry bit old) and I am seeing high pending
 compaction count. pending tasks: 67 while active compaction tasks are
 not more than 5. I have a 24CPU machine. Shouldn't I be seeing more
 compactions? Is this a pattern of high writes and compactions backing up?
 How can I improve this? Here are my thoughts.


1. Increase memtable_total_space_in_mb
2. Increase compaction_throughput_mb_per_sec
3. Increase concurrent_compactions


 Sorry if this was discussed already. Any pointers is much appreciated.

 Thanks,
 Kumar



-- 
http://twitter.com/tjake


Re: HsHa

2013-08-14 Thread Jake Luciani
Technically this is a Thrift message, not a Cassandra one; it happens when a
client hangs up without closing the socket.
You should be able to silence it by raising the class-specific log level;
see log4j-server.properties for an example.


On Wed, Aug 14, 2013 at 9:59 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 @Commiters/Experts,

 Does this sound like a bug or like 4 PEBCAKs to you ? Should we raise a
 JIRA ?

 Alain


 2013/8/14 Keith Wright kwri...@nanigans.com

 Same here on 1.2.4.

 From: Romain HARDOUIN romain.hardo...@urssaf.fr
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Wednesday, August 14, 2013 3:36 AM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: HsHa

 The same goes for us.

 Romain

 Alain RODRIGUEZ arodr...@gmail.com wrote on 13/08/2013 18:10:05:

  From: Alain RODRIGUEZ arodr...@gmail.com
  To: user@cassandra.apache.org,
  Date: 13/08/2013 18:10
  Subject: Re: HsHa
 
  I have this anytime I try to switch to hsha since 0.8.
 
  Always kept sync for this reason. Thought I was alone with this
  bug since I never had any clue about this on the mailing list.
 
  So +1.
 
  Alain
 

  2013/8/13 Christopher Wirt chris.w...@struq.com
  Hello,
 
  I was trying out the hsha thrift server implementation and found
  that I get a fair amount of these appearing in the server logs.
 
  ERROR [Selector-Thread-9] 2013-08-13 15:39:10,433
  TNonblockingServer.java (line 468) Read an invalid frame size of 0.
  Are you using TFramedTransport on the client side?
  ERROR [Selector-Thread-9] 2013-08-13 15:39:11,499
  TNonblockingServer.java (line 468) Read an invalid frame size of 0.
  Are you using TFramedTransport on the client side?
  ERROR [Selector-Thread-9] 2013-08-13 15:39:11,695
  TNonblockingServer.java (line 468) Read an invalid frame size of 0.
  Are you using TFramedTransport on the client side?
  ERROR [Selector-Thread-9] 2013-08-13 15:39:12,562
  TNonblockingServer.java (line 468) Read an invalid frame size of 0.
  Are you using TFramedTransport on the client side?
  ERROR [Selector-Thread-1] 2013-08-13 15:39:12,660
  TNonblockingServer.java (line 468) Read an invalid frame size of 0.
  Are you using TFramedTransport on the client side?
  ERROR [Selector-Thread-9] 2013-08-13 15:39:13,496
  TNonblockingServer.java (line 468) Read an invalid frame size of 0.
  Are you using TFramedTransport on the client side?
  ERROR [Selector-Thread-9] 2013-08-13 15:39:14,281
  TNonblockingServer.java (line 468) Read an invalid frame size of 0.
  Are you using TFramedTransport on the client side?
 
  Anyone seen this message before? know what it means? or issues it could
 hide?
 
  https://issues.apache.org/jira/browse/CASSANDRA-4573
  in the comments suggests it might be a 10 client timeout
  but looking at JMX client stats the max value for read/write/slice
  is well below 10secs
 
 
  I’m using 1.2.8 on centos
 
 
  Cheers,
  Chris





-- 
http://twitter.com/tjake


Re: Why does cassandra PoolingSegmentedFile recycle the RandomAccessReader?

2013-07-15 Thread Jake Luciani
Take a look at https://issues.apache.org/jira/browse/CASSANDRA-5661


On Mon, Jul 15, 2013 at 4:18 AM, sulong sulong1...@gmail.com wrote:

 Thanks for your help. Yes, I will try to increase the sstable size. I hope
 it can save me.

 9000 SSTableReader x 10 RandomAccessReader x 64Kb = 5.6G memory. If there
 is only one RandomAccessReader, the memory will be 9000 * 1 * 64Kb = 0.56G
 . Looks great. But I think it must be reasonable to recycle the
 RandomAccessReader.
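
 The estimate above can be checked with a few lines of arithmetic. This is
 only a sketch of the poster's own numbers (9000 SSTableReaders, a 64 KB
 buffer per RandomAccessReader, 10 pooled readers per sstable); the function
 name is made up for illustration and nothing here is measured on a real node.

```python
# Figures come from the message above; they are the poster's estimates.
SSTABLE_READERS = 9000
BUFFER_KB = 64


def pooled_reader_memory_gib(readers_per_sstable: int) -> float:
    """Approximate heap held by pooled reader buffers, in GiB."""
    total_kb = SSTABLE_READERS * readers_per_sstable * BUFFER_KB
    return total_kb / (1024 * 1024)


if __name__ == "__main__":
    for pooled in (10, 1):
        gib = pooled_reader_memory_gib(pooled)
        print(f"{pooled:>2} readers per sstable: about {gib:.2f} GiB")
```

 With binary units this comes out slightly below the 5.6G and 0.56G quoted
 above, but the tenfold difference between the two cases is the same.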


 On Mon, Jul 15, 2013 at 4:02 PM, Janne Jalkanen janne.jalka...@ecyrd.com wrote:


 I had exactly the same problem, so I increased the sstable size (from 5
 to 50 MB - the default 5MB is most certainly too low for serious usecases).
  Now the number of SSTableReader objects is manageable, and my heap is
 happier.

 Note that for immediate effect I stopped the node, removed the *.json
 files and restarted - which put all SSTables to L0, which meant a weekend
 full of compactions… Would be really cool if there was a way to
 automatically drop all LCS SSTables one level down to make them compact
 earlier without avoiding the
 OMG-must-compact-everything-aargh-my-L0-is-full -effect of removing the
 JSON file.

 /Janne

 On 15 Jul 2013, at 10:48, sulong sulong1...@gmail.com wrote:

  Why does cassandra PoolingSegmentedFile recycle the RandomAccessReader?
 The RandomAccessReader objects consume too much memory.
 
  I have a cluster of 4 nodes. Every node's cassandra jvm has 8G heap.
 The cassandra's memory is full after about one month, so I have to restart
 the 4 nodes every month.
 
  I have 100G data on every node, with LeveledCompactionStrategy and 10M
 sstable size, so there are more than 1 sstable files. By looking
 through the heap dump file, I see there are more than 9000 SSTableReader
 objects in memory, which reference lots of RandomAccessReader objects.
 The memory is consumed by these RandomAccessReader objects.
 
  I see the PoolingSegmentedFile has a recycle method, which puts the
 RandomAccessReader in a queue. It looks like the queue always grows until the
 sstable is compacted.  Is there any way to stop the RandomAccessReader
 recycling? Or to set a limit on the number of recycled RandomAccessReaders?
 
 





-- 
http://twitter.com/tjake


Re: Leveled Compaction, number of SStables growing.

2013-07-09 Thread Jake Luciani
We run with 128 MB; some run with 256 MB.  Leveled compaction creates
fixed-size sstables by design, so raising the sstable size is the only way
to lower the file count.
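
To see how much the sstable size matters, here is a rough sketch using the
figures from the question quoted below (about 130k sstables at 5 MB each).
It assumes the file count is simply total data divided by sstable size,
which ignores partially filled files and data still waiting in L0; the
function name is hypothetical.

```python
import math


def estimated_sstable_count(total_data_mb: float, sstable_size_mb: float) -> int:
    """Rough file count if every sstable were filled to the configured size."""
    return math.ceil(total_data_mb / sstable_size_mb)


if __name__ == "__main__":
    # About 130k sstables of 5 MB each, as reported in the question.
    total_mb = 130_000 * 5
    for size_mb in (5, 20, 128, 256):
        count = estimated_sstable_count(total_mb, size_mb)
        print(f"{size_mb:>3} MB sstables: about {count} files")
```

Under those assumptions, 128 MB brings the count down to roughly five
thousand files and 256 MB to about half of that.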


On Tue, Jul 9, 2013 at 2:56 PM, PARASHAR, BHASKARJYA JAY bp1...@att.com wrote:

  Hi,

 We recently switched from size-tiered compaction to leveled compaction. We
 made this change because our rows are frequently updated. We also have a
 lot of data.

 With size-tiered compaction, we had about 5-10 sstables per CF. So with
 about 15 CFs we had about 100 sstables.

 With the default sstable size of 5 MB, now after leveled compaction, we have
 about 130k sstables and growing as the writes increase. There are a lot of
 compaction jobs pending.

 If we increase the sstable size to 20 MB, that will be about 30k sstables,
 but it's still a lot.

 Is this common? Any solutions or hints on reducing the sstables are welcome.

 Thanks

 -Jay




-- 
http://twitter.com/tjake


Re: Data model for financial time series

2013-06-07 Thread Jake Luciani
We have built a similar system; you can read about our data model in CQL3
here:

http://www.slideshare.net/carlyeks/nyc-big-tech-day-2013

We are going to be presenting a similar talk next week at the cassandra
summit.
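
For readers who want the general idea without the slides, here is a toy
sketch of one common approach: bucket rows by (symbol, day) and keep them
sorted by timestamp inside each bucket, so a time range for one symbol is a
contiguous slice that comes back in ascending order. This is an assumption
about a typical time-series layout, not the model from the linked talk, and
the TradeStore class is purely illustrative (it is an in-memory stand-in,
not a Cassandra client).

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import datetime


class TradeStore:
    """In-memory stand-in for a table keyed by (symbol, day), ordered by time."""

    def __init__(self):
        # (symbol, date) -> (sorted timestamps, payloads in the same order)
        self._partitions = defaultdict(lambda: ([], []))

    def insert(self, symbol, ts, payload):
        times, rows = self._partitions[(symbol, ts.date())]
        i = bisect_right(times, ts)  # keep the bucket sorted by timestamp
        times.insert(i, ts)
        rows.insert(i, payload)

    def trades_between(self, symbol, start, end):
        """Rows for one symbol within a single day, in ascending time order."""
        times, rows = self._partitions.get((symbol, start.date()), ([], []))
        lo = bisect_left(times, start)
        hi = bisect_right(times, end)
        return list(zip(times[lo:hi], rows[lo:hi]))


if __name__ == "__main__":
    store = TradeStore()
    day = datetime(2013, 6, 7)
    store.insert("APPL", day.replace(hour=9), {"price": 2})
    store.insert("APPL", day.replace(hour=7, minute=30), {"price": 1})
    store.insert("APPL", day.replace(hour=16), {"price": 3})
    for ts, row in store.trades_between(
        "APPL", day.replace(hour=7), day.replace(hour=15)
    ):
        print(ts.isoformat(), row)
```

The point of the sketch is that ordering comes from how rows are grouped
and sorted at write time, rather than from sorting at query time.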


On Fri, Jun 7, 2013 at 12:34 PM, Davide Anastasia 
davide.anasta...@qualitycapital.com wrote:

  Hi,

 I am trying to build the storage of stock prices in Cassandra. My queries
 are ideally of three types:

 - give me everything between time A and time B;

 - give me everything about symbol X;

 - give me everything of type Y;

 …or an intersection of the three. Something I will be happy doing is:

 - give me all the trades about APPL between 7:00am and 3:00pm of a certain
 day.

 However, being a time series, I will be happy to retrieve the data in
 ascending order of timestamp (from 7:00 to 3:00).

 I have tried to build my table with the timestamp (as timeuuid) as primary
 key, however I cannot manage to get my data in order, and “order by” in
 CQL3 raises an error and doesn’t perform the query.

 Does anybody have any suggestions for a good design that fits my queries?

 Thanks,

 David




-- 
http://twitter.com/tjake


Re: Cassandra and Apache Drill

2012-08-31 Thread Jake Luciani
I don't think Drill has been accepted into the incubator yet or has any
code.

If/When that happens then it's entirely possible Cassandra could be
integrated.

On Fri, Aug 31, 2012 at 4:29 PM, John Onusko jonu...@actiance.com wrote:

 Like a lot of folks, I have a need for Big Data and fast queries on that
 data. Hive queries against Cassandra functionally meet my requirements, but
 the job oriented processing is too slow when you need to execute many
 queries on a small portion of the data. It seems like Apache Drill might be
 the right answer to this problem. I see HBase mentioned as a possible
 integration point with Drill, but no mention of Cassandra. Has anyone taken
 a look at Drill to see how it could access the data in Cassandra?

 -John




-- 
http://twitter.com/tjake


Re: DSE solr HA

2012-08-13 Thread Jake Luciani


 
 
  
 Going through this page and it looks like indexes are stored locally
 http://www.datastax.com/dev/blog/cassandra-with-solr-integration-details .
 My question is what happens if one of the solr nodes crashes? Is the data
 indexed again on those nodes?

Yes, the data is indexed again on the node, either from the commitlog, hints,
or repair. Same as Cassandra.

 Also, if RF > 1 then is the same data being indexed on all RF nodes or is
 that RF only for document replication?

The former. Each replica has an indexed copy. We remove duplicates on read.

Re: java.lang.OutOfMemoryError: unable to create new native thread

2012-06-25 Thread Jake Luciani
This means you need to raise the nproc limit for the user you run cassandra
with.
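
A quick way to see the limit a process actually inherits is to ask the OS
from inside that user's session. The sketch below uses Python's standard
resource module on Linux; the helper names are made up. Raising the limit
persistently is normally done in /etc/security/limits.conf or a file under
/etc/security/limits.d/, but which file applies on a given CentOS 6 install
is an assumption to verify locally.

```python
import resource


def nproc_limits():
    """Return the (soft, hard) limit on processes/threads for this user."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def describe(limit: int) -> str:
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"nproc soft limit: {describe(soft)}")
    print(f"nproc hard limit: {describe(hard)}")
```

Run it as the same user that starts Cassandra; a low soft limit would be
consistent with the "unable to create new native thread" error.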

On Mon, Jun 25, 2012 at 8:48 AM, Oli Schacher cassan...@lists.wgwh.ch wrote:

 Hi list

 I have a small cassandra cluster consisting of three nodes. Every few
 weeks the whole cluster goes down at the same time. All nodes show:

 java.lang.OutOfMemoryError: unable to create new native thread
     at java.lang.Thread.start0(Native Method)
     at java.lang.Thread.start(Thread.java:691)
     at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943)
     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1336)
     at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:104)
     at org.apache.cassandra.thrift.CassandraDaemon$ThriftServer.run(CassandraDaemon.java:214)

 There are no other log messages shortly before the crash.

 I don't have much experience with cassandra, so I probably forgot to
 configure an important memory parameter. But before I screw things up
 even more, I hope someone on the list can point me in the right
 direction.

 Hardware:
 Each Node runs on two Intel Xeon CPU E5645  @ 2.40GHz (6 physical cores
 per CPU, 12 total), 12 Gig memory

 Software:
 Datastax Cassandra 1.1 , on Centos 6

 Clients:
 10 linux servers, all of them connecting using pycassa. total of 10-30
 writes / sec

 I haven't changed any memory settings from the default, except
 uncommented
 MAX_HEAP_SIZE=4G
 HEAP_NEWSIZE=800M
 in cassandra-env.sh, this hasn't made a difference though.

 Any hints would be appreciated.

 Thanks,
 Oli





-- 
http://twitter.com/tjake


Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Jake Luciani
Hi Safdar,

If you want to get better utilization of the cluster, raise the
solandra.shards.at.once param in solandra.properties.

-Jake



On Sun, Jun 24, 2012 at 11:00 AM, Safdar Kureishy safdar.kurei...@gmail.com
 wrote:

 Hi,

 I've searched online but was unable to find any leads for the problem
 below. This mailing list seemed the most appropriate place. Apologies in
 advance if that isn't the case.

 I'm running a 5-node Solandra cluster (Solr + Cassandra). I've setup the
 nodes with tokens *evenly distributed across the token space*, for a
 5-node cluster (as evidenced below under the effective-ownership column
 of the nodetool ring output). My data is a set of a few million crawled
 web pages, crawled using Nutch, and also indexed using the solrindex
 command available through Nutch. AFAIK, the key for each document generated
 from the crawled data is the URL.

 Based on the load values for the nodes below, despite adding about 3
 million web pages to this index via the HTTP Rest API (e.g.:
 http://9.9.9.x:8983/solandra/index/update), some nodes are still
 empty. Specifically, nodes 9.9.9.1 and 9.9.9.3 have just a few kilobytes
 (shown in *bold* below) of the index, while the remaining 3 nodes are
 consistently getting hammered by all the data. If the RandomPartitioner
 (which is what I'm using for this cluster) is supposed to achieve an even
 distribution of keys across the token space, why is it that the data below
 is skewed in this fashion? Literally, no key has yet been hashed to the
 nodes 9.9.9.1 and 9.9.9.3 below. Could someone possibly shed some light on
 this absurdity?

 [me@hm1 solandra-app]$ bin/nodetool -h hm1 ring
 Address   DC           Rack   Status  State   Load        Effective-Owership  Token
                                                                               136112946768375385385349842972707284580
 9.9.9.0   datacenter1  rack1  Up      Normal  7.57 GB     20.00%              0
 9.9.9.1   datacenter1  rack1  Up      Normal  *21.44 KB*  20.00%              34028236692093846346337460743176821145
 9.9.9.2   datacenter1  rack1  Up      Normal  14.99 GB    20.00%              68056473384187692692674921486353642290
 9.9.9.3   datacenter1  rack1  Up      Normal  *50.79 KB*  20.00%              102084710076281539039012382229530463435
 9.9.9.4   datacenter1  rack1  Up      Normal  15.22 GB    20.00%              136112946768375385385349842972707284580
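
 The tokens in the ring output above are what you get by splitting the
 RandomPartitioner range of 0 to 2**127 into five equal steps. A small
 sketch (the function name is made up for illustration) reproduces them,
 which confirms the ring itself is balanced and the skew has to come from
 how keys are chosen rather than from token assignment:

```python
RANDOM_PARTITIONER_RANGE = 2 ** 127


def evenly_spaced_tokens(node_count: int) -> list:
    """Initial tokens that divide the RandomPartitioner range into equal steps."""
    step = RANDOM_PARTITIONER_RANGE // node_count
    return [i * step for i in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(evenly_spaced_tokens(5)):
        print(f"node {node}: {token}")
```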

 Thanks in advance.

 Regards,
 Safdar




-- 
http://twitter.com/tjake


Re: 200TB in Cassandra ?

2012-04-20 Thread Jake Luciani
What other solutions are you considering?  Any OLTP style access of 200TB
of data will require substantial IO.

Do you know how big your working dataset will be?

-Jake

On Fri, Apr 20, 2012 at 3:30 AM, Franc Carter franc.car...@sirca.org.au wrote:

 On Fri, Apr 20, 2012 at 6:27 AM, aaron morton aa...@thelastpickle.com wrote:

 Couple of ideas:

 * take a look at compression in 1.X
 http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
 * is there repetition in the binary data ? Can you save space by
 implementing content addressable storage ?


 The data is already very highly space optimised. We've come to the
 conclusion that Cassandra is probably not the right fit for the use case
 this time

 cheers



 Cheers


   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 20/04/2012, at 12:55 AM, Dave Brosius wrote:

  I think your math is 'relatively' correct. It would seem to me you
 should focus on how you can reduce the amount of storage you are using per
 item, if at all possible, if that node count is prohibitive.

 On 04/19/2012 07:12 AM, Franc Carter wrote:


  Hi,

  One of the projects I am working on is going to need to store about
 200TB of data - generally in manageable binary chunks. However, after doing
 some rough calculations based on rules of thumb I have seen for how much
 storage should be on each node I'm worried.

200TB with RF=3 is 600TB = 600,000GB
   Which is 1000 nodes at 600GB per node

  I'm hoping I've missed something as 1000 nodes is not viable for us.
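
  The sizing above can be written as a short calculation. This is a sketch
  of the same rule-of-thumb arithmetic (1 TB taken as 1000 GB, as in the
  message); the function name is hypothetical, and the second per-node
  figure in the example is only there to show how sensitive the result is
  to that assumption.

```python
import math


def nodes_needed(raw_tb: float, replication_factor: int, per_node_gb: float) -> int:
    """Node count for a given raw data size, RF, and storage budget per node."""
    total_gb = raw_tb * replication_factor * 1000  # 1 TB taken as 1000 GB
    return math.ceil(total_gb / per_node_gb)


if __name__ == "__main__":
    # The figures from the message: 200 TB, RF=3, 600 GB per node.
    print(nodes_needed(200, 3, 600))
    # A purely illustrative alternative per-node budget.
    print(nodes_needed(200, 3, 2000))
```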

  cheers

  --
 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au
 franc.car...@sirca.org.au | www.sirca.org.au
 Tel: +61 2 9236 9118
  Level 9, 80 Clarence St, Sydney NSW 2000
 PO Box H58, Australia Square, Sydney NSW 1215






 --

 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 9236 9118

 Level 9, 80 Clarence St, Sydney NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215




-- 
http://twitter.com/tjake


Re: Is the secondary index re-built under compaction?

2012-04-17 Thread Jake Luciani
No, the indexes are not rebuilt every compaction.  Only if you manually
rebuild or bootstrap a new node does it use compaction manager to rebuild.

On Tue, Apr 17, 2012 at 9:47 AM, Maxim Potekhin potek...@bnl.gov wrote:

  Thanks Aaaron. Just to be clear, every time I do a compaction,
 I rebuild all indexes from scratch. Right?

 Maxim



 On 4/17/2012 6:16 AM, aaron morton wrote:

 Yes secondary index builds are done via the compaction manager.

  Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

  On 17/04/2012, at 1:06 PM, Maxim Potekhin wrote:

  I noticed that nodetool compactionstats shows the building of the
 secondary index while
 I initiate compaction. Is this to be expected? Cassandra version 0.8.8.

 Thank you

 Maxim






-- 
http://twitter.com/tjake


Re: Is the secondary index re-built under compaction?

2012-04-17 Thread Jake Luciani
Well, since the secondary indexes are themselves column families, they
too are compacted along with everything else.

On Tue, Apr 17, 2012 at 10:02 AM, Maxim Potekhin potek...@bnl.gov wrote:

  Thanks Jake. Then I am definitely seeing weirdness, as there are tons of
 pending tasks in compaction stats, and tons of index files created in the
 data directory. Plus it does tell me that it is building the secondary
 index,
 and that seems to be happening at an amazingly glacial pace.

 I have 2 CFs there, with multiple secondary indexes. I'll try
 to compact the CF one by one, reboot and see if that helps.

 Maxim



 On 4/17/2012 9:53 AM, Jake Luciani wrote:

 No, the indexes are not rebuilt every compaction.  Only if you manually
 rebuild or bootstrap a new node does it use compaction manager to rebuild.

 On Tue, Apr 17, 2012 at 9:47 AM, Maxim Potekhin potek...@bnl.gov wrote:

  Thanks Aaaron. Just to be clear, every time I do a compaction,
 I rebuild all indexes from scratch. Right?

 Maxim



 On 4/17/2012 6:16 AM, aaron morton wrote:

 Yes secondary index builds are done via the compaction manager.

  Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

  On 17/04/2012, at 1:06 PM, Maxim Potekhin wrote:

  I noticed that nodetool compactionstats shows the building of the
 secondary index while
 I initiate compaction. Is this to be expected? Cassandra version 0.8.8.

 Thank you

 Maxim






  --
 http://twitter.com/tjake





-- 
http://twitter.com/tjake


Re: Is the secondary index re-built under compaction?

2012-04-17 Thread Jake Luciani
Hmm that does sound fishy.

When you run show keyspaces from cassandra-cli it shows which indexes are
built.  Are they marked built in your column family?

-Jake

On Tue, Apr 17, 2012 at 10:09 AM, Maxim Potekhin potek...@bnl.gov wrote:

  I understand that indexes are CFs. But the compaction stats says it's
 building the
 index, not compacting the corresponding CF. Either that's an ambiguous
 diagnostic,
 or indeed something is not right with my rig as of late.

 Maxim




 On 4/17/2012 10:05 AM, Jake Luciani wrote:

 Well, the since the secondary indexes are themselves column families they
 too are compacted along with everything else.

 On Tue, Apr 17, 2012 at 10:02 AM, Maxim Potekhin potek...@bnl.gov wrote:

  Thanks Jake. Then I am definitely seeing weirdness, as there are tons of
 pending tasks in compaction stats, and tons of index files created in
 the
 data directory. Plus it does tell me that it is building the secondary
 index,
 and that seems to be happening at an amazingly glacial pace.

 I have 2 CFs there, with multiple secondary indexes. I'll try
 to compact the CF one by one, reboot and see if that helps.

 Maxim



 On 4/17/2012 9:53 AM, Jake Luciani wrote:

 No, the indexes are not rebuilt every compaction.  Only if you manually
 rebuild or bootstrap a new node does it use compaction manager to rebuild.

 On Tue, Apr 17, 2012 at 9:47 AM, Maxim Potekhin potek...@bnl.gov wrote:

  Thanks Aaaron. Just to be clear, every time I do a compaction,
 I rebuild all indexes from scratch. Right?

 Maxim



 On 4/17/2012 6:16 AM, aaron morton wrote:

 Yes secondary index builds are done via the compaction manager.

  Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

  On 17/04/2012, at 1:06 PM, Maxim Potekhin wrote:

  I noticed that nodetool compactionstats shows the building of the
 secondary index while
 I initiate compaction. Is this to be expected? Cassandra version 0.8.8.

 Thank you

 Maxim






  --
 http://twitter.com/tjake





  --
 http://twitter.com/tjake





-- 
http://twitter.com/tjake


Re: Is the secondary index re-built under compaction?

2012-04-17 Thread Jake Luciani
How many indexes are there?

On Tue, Apr 17, 2012 at 10:16 AM, Maxim Potekhin potek...@bnl.gov wrote:

  Yes. Sorry I didn't mention this, but of course I'm checking on indexes
 once in a while.
 So yes, they are marked as built.

 All of this started happening after a few days of continuous loading
 process. Since
 the nodes have good hardware (24 cores + SSD), the apparent load on each
 node
 was nothing remarkable, even at 20kHz insertion rate. But maybe I'm being
 overoptimistic.

 Maxim



 On 4/17/2012 10:12 AM, Jake Luciani wrote:

 Hmm that does sound fishy.

  When you run show keyspaces from cassandra-cli it shows which indexes
 are built.  Are they marked built in your column family?

  -Jake

  On Tue, Apr 17, 2012 at 10:09 AM, Maxim Potekhin potek...@bnl.gov wrote:

  I understand that indexes are CFs. But the compaction stats says it's
 building the
 index, not compacting the corresponding CF. Either that's an ambiguous
 diagnostic,
 or indeed something is not right with my rig as of late.

 Maxim




 On 4/17/2012 10:05 AM, Jake Luciani wrote:

 Well, the since the secondary indexes are themselves column families they
 too are compacted along with everything else.

 On Tue, Apr 17, 2012 at 10:02 AM, Maxim Potekhin potek...@bnl.gov wrote:

  Thanks Jake. Then I am definitely seeing weirdness, as there are tons of
 pending tasks in compaction stats, and tons of index files created in
 the
 data directory. Plus it does tell me that it is building the secondary
 index,
 and that seems to be happening at an amazingly glacial pace.

 I have 2 CFs there, with multiple secondary indexes. I'll try
 to compact the CF one by one, reboot and see if that helps.

 Maxim



 On 4/17/2012 9:53 AM, Jake Luciani wrote:

 No, the indexes are not rebuilt every compaction.  Only if you manually
 rebuild or bootstrap a new node does it use compaction manager to rebuild.

 On Tue, Apr 17, 2012 at 9:47 AM, Maxim Potekhin potek...@bnl.gov wrote:

  Thanks Aaaron. Just to be clear, every time I do a compaction,
 I rebuild all indexes from scratch. Right?

 Maxim



 On 4/17/2012 6:16 AM, aaron morton wrote:

 Yes secondary index builds are done via the compaction manager.

  Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

  On 17/04/2012, at 1:06 PM, Maxim Potekhin wrote:

  I noticed that nodetool compactionstats shows the building of the
 secondary index while
 I initiate compaction. Is this to be expected? Cassandra version 0.8.8.

 Thank you

 Maxim






  --
 http://twitter.com/tjake





  --
 http://twitter.com/tjake





  --
 http://twitter.com/tjake





-- 
http://twitter.com/tjake


Re: cassandra and .net

2012-04-10 Thread Jake Luciani
You can also look at using a .net client wrapper like
https://github.com/managedfusion/fluentcassandra

On Tue, Apr 10, 2012 at 8:06 AM, puneet loya puneetl...@gmail.com wrote:

 thankk  :) :) it works :)


 On Tue, Apr 10, 2012 at 3:07 PM, Henrik Schröder skro...@gmail.com wrote:

 In your code you are using BufferedTransport, but in the Cassandra logs
 you're getting errors when it tries to use FramedTransport. If I remember
 correctly, BufferedTransport is gone, so you should only use
 FramedTransport. Like this:

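 // Wrap the socket in a framed transport, which is what the server expects.
 TTransport transport = new TFramedTransport(new TSocket(host, port));
 TProtocol protocol = new TBinaryProtocol(transport);
 var client = new Cassandra.Client(protocol);
 transport.Open();
 // The keyspace name is a string literal.
 client.describe_keyspace("abc");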


 /Henrik


 On Tue, Apr 10, 2012 at 11:23, puneet loya puneetl...@gmail.com wrote:


 Log is showing the following exception

 DEBUG [ScheduledTasks:1] 2012-04-10 14:49:29,654 LoadBroadcaster.java (line 86) Disseminating load info ...
 DEBUG [Thrift:7] 2012-04-10 14:50:00,820 CustomTThreadPoolServer.java (line 197) Thrift transport error occurred during processing of message.
 org.apache.thrift.transport.TTransportException
     at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
     at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
     at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
     at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
     at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
     at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
     at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
     at java.lang.Thread.run(Unknown Source)
 DEBUG [Thrift:7] 2012-04-10 14:50:00,820 ClientState.java (line 104) logged out: #User allow_all groups=[]

 On Tue, Apr 10, 2012 at 11:24 AM, Maki Watanabe watanabe.m...@gmail.com
  wrote:

 Check your cassandra log.
 If you can't find any interesting log, set cassandra log level
 to DEBUG and run your program again.

 maki

 2012/4/10 puneet loya puneetl...@gmail.com:
  hi,
 
  sorry i posted the port as 7000. I m using 9160 but still has the same
  error.
 
  Cannot read, Remote side has closed.
  Can u guess whats happening??
 
  On Tue, Apr 10, 2012 at 11:00 AM, Pierre Chalamet 
 pie...@chalamet.net
  wrote:
 
  hello,
 
  9160 is probably the port to use if you use the default config.
 
  - Pierre
 
  On Apr 10, 2012, at 7:26 AM, puneet loya puneetl...@gmail.com
 wrote:
 
   using System;
   using System.Collections.Generic;
   using System.Linq;
   using System.Text;
   using Thrift.Collections;
   using Thrift.Protocol;
   using Thrift.Transport;
   using Apache.Cassandra;

   namespace ConsoleApplication1
   {
       class Program
       {
           static void Main(string[] args)
           {
               TTransport transport = null;
               try
               {
                   transport = new TBufferedTransport(new TSocket("127.0.0.1", 7000));

                   //if(buffered)
                   //    trans = new TBufferedTransport(trans as TStreamTransport);
                   //if (framed)
                   //    trans = new TFramedTransport(trans);

                   TProtocol protocol = new TBinaryProtocol(transport);
                   Cassandra.Client client = new Cassandra.Client(protocol);

                   Console.WriteLine("Opening connection");

                   if (!transport.IsOpen)
                       transport.Open();

                   client.describe_keyspace("abc");   // Crashing at this point
               }
               catch (Exception ex)
               {
                   Console.WriteLine(ex.Message);
               }
               finally
               {
                   if (transport != null)
                       transport.Close();
               }
               Console.ReadLine();
           }
       }
   }
  
   I m trying to interact with cassandra server(database) from .net.
 For
   that i have referred two libraries i.e, apacheCassandra08.dll and
   thrift.dll.. In the following piece of code the connection is
 getting opened
   but when i m using client object it is giving an error stating
 Cannot read,
   Remote side has closed.
  
   Can any1 help me out with this? Has any1 faced the same prob?
  
  
 
 







-- 
http://twitter.com/tjake


Re: Write performance compared to Postgresql

2012-04-03 Thread Jake Luciani
Hi Jeff,

Writing serially over one connection will be slower. If you run many threads
hitting the server at once you will see throughput improve.
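
To illustrate why, here is a small sketch that replaces the real write with
a sleep standing in for one network round trip. Everything in it is
hypothetical (the latency value, the function names, the worker count); it
does not talk to Cassandra and only shows how overlapping waits raises
throughput when each write is dominated by round-trip time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY_S = 0.002  # hypothetical round trip per write


def fake_write(entry):
    """Stand-in for one insert: waits for a simulated round trip."""
    time.sleep(SIMULATED_LATENCY_S)
    return entry


def write_serially(entries):
    """One write at a time, as a single-connection loop would do."""
    return [fake_write(e) for e in entries]


def write_concurrently(entries, workers=16):
    """Many writes in flight at once, one per worker thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_write, entries))


def timed(fn, *args):
    """Return the wall-clock seconds taken by fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    entries = list(range(500))
    print(f"serial:     {timed(write_serially, entries):.2f}s")
    print(f"concurrent: {timed(write_concurrently, entries):.2f}s")
```

A benchmark that loops over inserts on one connection measures latency per
write rather than what the cluster can absorb.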

Jake

 

On Apr 3, 2012, at 7:08 AM, Jeff Williams je...@wherethebitsroam.com wrote:

 Hi,
 
 I am looking at cassandra for a logging application. We currently log to a 
 Postgresql database.
 
 I set up 2 cassandra servers for testing. I did a benchmark where I had 100 
 hashes representing logs entries, read from a json file. I then looped over 
 these to do 10,000 log inserts. I repeated the same writing to a postgresql 
 instance on one of the cassandra servers. The script is attached. The 
 cassandra writes appear to perform a lot worse. Is this expected?
 
 jeff@transcoder01:~$ ruby cassandra-bm.rb 
 cassandra
  3.17   0.48   3.65 ( 12.032212)
 jeff@transcoder01:~$ ruby cassandra-bm.rb 
 postgres
  2.14   0.33   2.47 (  7.002601)
 
 Regards,
 Jeff
 
 cassandra-bm.rb


Re: 2 questions DataStax Enterprise

2012-04-03 Thread Jake Luciani
Hi, replies inline.

On Tue, Apr 3, 2012 at 12:18 PM, Alexandru Sicoe adsi...@gmail.com wrote:

 Hi guys,
  I'm trying out DSE and looking for the best way to arrange the cluster. I
 have 9 nodes: 3 behind a gateway taking in writes from my collectors and 6
 outside the gateway that are supposed to take replicas from the other 3 and
 serve reads and analytics jobs.

 1. Is it ok to run the 3 nodes as normal Cassandra nodes and run the other
 6 nodes as analytics? Can I serve both real time reads and M/R jobs from
 the 6 nodes? How will these affect each other performancewise?


If you plan to use CFS heavily then it will affect the performance of the
other nodes.  If you raise the RF of your column families then it should be
fine if you run MapReduce at CL=ONE.



 I know that the way the system is supposed to be used is to separate
 analytics from real time queries. I've already explored a possible 3DC
 setup with Tyler in another message and it indeed works but I'm afraid it
 is too complex and would require me to send 2 replicas across the firewall
 which it can't handle very well at peak times, affecting other applications.

 2. I started the cluster in the setup described in 1 (3 normal, 6
 analytics) and as soon as the Analytics nodes start up they start
 outputting this message:

 INFO [TASK-TRACKER-INIT] 2012-04-03 17:54:59,575 Client.java (line 629)
 Retrying connect to server: IP_OF_NORMAL_CASSANDRA_SEED_NODE:8012. Already
 tried 10 time(s).
 

 So it seems my analytics nodes are trying to contact the normal Cassandra
 seed node on port 8012 which I read is a Hadoop Job Tracker client port.
 It doesn't seem like this is the normal behavior. Why is it getting
 confused? In the .yaml of each node I'm using endpoint_snitch:
 com.datastax.bdp.snitch.DseSimpleSnitch and putting in the Analytics seed
 node before the normal cassandra seed node in the seeds.



You can run dsetool movejt to move the jobtracker to one of the known
hadoop nodes.



 Cheers,
 Alex




-- 
http://twitter.com/tjake


Re: Row iteration using RandomPartitioner

2012-04-02 Thread Jake Luciani
Correct. Random partitioner order is md5 token order. If you make no changes
you will get the same order.
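
As an illustration of why the order is stable, the sketch below derives a
token from the MD5 digest of each row key and sorts by it. Taking the
absolute value of the digest read as a signed big-endian integer is my
understanding of how RandomPartitioner builds its token; treat that detail,
and the function names, as assumptions rather than a reference
implementation.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Token for a row key: MD5 digest as a signed integer, absolute value."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def iteration_order(keys):
    """Keys in the order a full range scan would visit them (token order)."""
    return sorted(keys, key=random_partitioner_token)


if __name__ == "__main__":
    keys = [b"alice", b"bob", b"carol", b"dave"]
    for key in iteration_order(keys):
        print(key.decode(), random_partitioner_token(key))
```

Because the token depends only on the key bytes, two scans over an
unchanged column family visit the rows in the same order, even though that
order looks arbitrary relative to the keys themselves.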

 

On Apr 2, 2012, at 7:53 AM, christopher-t...@ubs.com wrote:

 Hi,
 
 Bit of a silly question, is row iteration using the RandomPartitioner 
 deterministic?  I don't particularly care what the order is relative to the 
 row keys (obviously there isn't one, it's the RandomPartitioner), but if I 
 run a full iteration over all rows in a CF twice, assuming no underlying 
 changes to the CF in the meantime, will the rows be returned in the same 
 order both times?
 
 I assume so, as I don’t see how one could use get_range_slices to do this 
 otherwise, but I wanted to check.
 


Re: Is the wiki outdated regarding Hive support?

2012-04-01 Thread Jake Luciani
Hi Ben. That is still the repo. The code that ships with the latest DSE is the 
hive-0.8.1-merge branch. 

We will try to get this into the Cassandra trunk asap. 

Jake

 

On Apr 1, 2012, at 6:39 PM, Ben McCann b...@benmccann.com wrote:

 The wiki says "Hive support is currently a standalone project but will become 
 part of the main Cassandra source tree in the future. See 
 https://github.com/riptano/hive for details."  This seems outdated to me 
 since DataStax isn't planning any future updates to Brisk.  The closest thing 
 I've seen for Hive support is this Hive bug.  Should I update the wiki to 
 delete this statement, or is it still accurate?
 
 Thanks,
 Ben
 
 


Re: How much has Cassandra improved from 0.8.6 to 1.0+?

2012-01-30 Thread Jake Luciani
Well, as they say, "Lies, damned lies, and statistics."  This is an alternate
comparison you can review:
http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/

YCSB is a known and agreed-upon benchmark.  The benchmark you link includes
no source code to reproduce it with and, as the author mentions, "For Cassandra
this was single node cluster, for Mongo simply one server with no
replication. Cluster tests were run for functionality."

-Jake

On Mon, Jan 30, 2012 at 1:56 PM, Kevin klawso...@gmail.com wrote:

 I’m currently using 0.8.6 and want to know how much Cassandra has improved
 performance-wise, specifically read performance. This benchmark
 (http://amesar.wordpress.com/2011/10/19/mongodb-vs-cassandra-benchmarks/)
 illustrates my concerns. I don’t know whether it was a fair comparison
 (especially since the conductor did not perform any tweaks or optimizations
 beforehand), but from all the resources I’ve read it seems that Cassandra
 still has quite a way to go before matching the read performance of MongoDB
 and some of the other NoSQL alternatives. 


 Is this still true, and if so, how far down the line can we expect to see
 work on this specific area?




-- 
http://twitter.com/tjake


Re: Deploying Cassandra 1.0.7 on EC2 in minutes

2012-01-18 Thread Jake Luciani
Thanks Andrei!

On Wed, Jan 18, 2012 at 8:00 AM, Andrei Savu savu.and...@gmail.com wrote:

 Hi guys,

 I just want to let you know that Apache Whirr trunk (the upcoming
 0.7.1 release) can deploy Cassandra 1.0.7 on AWS EC2 and Rackspace Cloud.

 You can give it a try by running the following commands:
 https://gist.github.com/1632893

 One last thing: we would appreciate any suggestions on improving the
 deployment scripts or on improving Whirr.

 Thanks,

 -- Andrei Savu / andreisavu.ro




-- 
http://twitter.com/tjake


Re: best practices for simulating transactions in Cassandra

2011-12-12 Thread Jake Luciani
I've written a locking mechanism for Solandra (I refer to it as a
reservation system) which basically allows you to acquire a lock.  This is
used to ensure a node is serving unique sequential IDs for Lucene.

It sounds a bit similar to Dominic's description but I'll explain how the
Solandra one works.

The code is at
https://github.com/tjake/Solandra/blob/solandra/src/lucandra/cluster/CassandraIndexManager.java#L714

The algorithm is basically:

   - each node has a unique id.
   - a lock name is a row key.
   - client writes to that row @ QUORUM a column named with its ID, with a TTL
of N seconds.
   - client immediately reads back the entire row @ QUORUM.
   - if client encounters a column that is non-expiring then the lock is
already acquired.
   - if client encounters a non-deleted but expiring column with a
timestamp < the one it wrote then it sleeps and tries again.
   - if the client's own timestamp was the earliest then it has won the lock and
writes a non-expiring column of the same name to mark it as officially
locked.
   - in the case of a tie (2 columns with the same ts), the uuids are sorted and
the lesser one wins.
   - once finished, the node with the lock deletes the column and frees the
lock.
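
The steps above can be sketched in Python against an in-memory stand-in for the row. This is only an illustration under stated assumptions: LockRow, Column, try_acquire and release are hypothetical names, a plain dict replaces QUORUM reads and writes, and a single integer "now" serves as both the clock and the client-supplied column timestamp. It is not the Solandra code linked above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Column:
    timestamp: int             # client-supplied write timestamp
    expires_at: Optional[int]  # None marks a non-expiring column (lock held)


class LockRow:
    """One row per lock name; column names are node IDs (in-memory stand-in)."""

    def __init__(self) -> None:
        self.columns: Dict[str, Column] = {}

    def live_columns(self, now: int) -> Dict[str, Column]:
        # Columns whose TTL has passed are treated as gone.
        return {name: col for name, col in self.columns.items()
                if col.expires_at is None or col.expires_at > now}


def try_acquire(row: LockRow, node_id: str, now: int, ttl: int = 10) -> str:
    # Write a column named after our ID with a TTL (would be a QUORUM write).
    mine = Column(timestamp=now, expires_at=now + ttl)
    row.columns[node_id] = mine
    # Read the whole row straight back (would be a QUORUM read).
    for other_id, col in row.live_columns(now).items():
        if other_id == node_id:
            continue
        if col.expires_at is None:
            return "held"    # someone already owns the lock
        # An earlier attempt is pending; ties fall back to comparing IDs.
        if (col.timestamp, other_id) < (mine.timestamp, node_id):
            return "retry"   # caller sleeps and tries again
    # Our attempt was the earliest: mark the lock as officially taken.
    row.columns[node_id] = Column(timestamp=now, expires_at=None)
    return "acquired"


def release(row: LockRow, node_id: str) -> None:
    # Deleting the column frees the lock.
    row.columns.pop(node_id, None)
```

A caller would loop on try_acquire, sleeping whenever it returns "retry" or "held", and call release when done. Because the losing attempt's column carries a TTL, it disappears on its own and does not block later attempts.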

This algorithm allows for deadlocks because the client has a huge number of
locks to work with.  It would be fairly simple to use a TTL again to make
locks auto-expire after N seconds; this would make it more like Google
Chubby.

It also allows for bad clients to game the system but that's not something
that could be dealt with using authorization apis.

For legacy reasons the linked code uses super columns but a regular column
family will work just fine.

-Jake


On Mon, Dec 12, 2011 at 7:36 AM, Dominic Williams 
dwilli...@fightmymonster.com wrote:

 Hi guys, just thought I'd chip in...

 Fight My Monster is still using Cages, which is working fine, but...

 I'm looking at using Cassandra to replace Cages/ZooKeeper(!) There are 2
 main reasons:-

 1. Although a fast ZooKeeper cluster can handle a lot of load (we aren't
 getting anywhere near to capacity and we do a *lot* of serialisation) at
 some point it will be necessary to start hashing lock paths onto separate
 ZooKeeper clusters, and I tend to believe that these days you should choose
 platforms that handle sharding themselves (e.g. choose Cassandra rather
 than MySQL)

 2. Why have more components in your system when you can have less!!! KISS

 Recently I therefore tried to devise an algorithm which can be used to add
 a distributed locking layer to clients such as Pelops, Hector, Pycassa etc.

 There is a doc describing the algorithm, to which may be added an appendix
 describing a protocol so that locking can be interoperable between the
 clients. That could be extended to describe a protocol for transactions.
 Word of warning this is a *beta* algorithm that has only been seen by a
 select group so far, and therefore not even 100% sure it works but there is
 a useful general discussion regarding serialization of reads/writes so I
 include it anyway (and since this algorithm is going to be out there now,
 if there's anyone out there who fancies doing a Z proof or disproof, that
 would be fantastic).
 http://media.fightmymonster.com/Shared/docs/Wait%20Chain%20Algorithm.pdf

 Final word on this re transactions: if/when transactions are added to
 locking system in Pelops/Hector/Pycassa, Cassandra will provide better
 performance than ZooKeeper for storing snapshots, especially as transaction
 size increases

 Best, Dominic

 On 11 December 2011 01:53, Guy Incognito dnd1...@gmail.com wrote:

  you could try writing with the clock of the initial replay entry?

 On 06/12/2011 20:26, John Laban wrote:

 Ah, neat.  It is similar to what was proposed in (4) above with adding
 transactions to Cages, but instead of snapshotting the data to be rolled
 back (the before data), you snapshot the data to be replayed (the after
 data).  And then later, if you find that the transaction didn't complete,
 you just keep replaying the transaction until it takes.

  The part I don't understand with this approach though:  how do you
 ensure that someone else didn't change the data between your initial failed
 transaction and the later replaying of the transaction?  You could get lost
 writes in that situation.

  Dominic (in the Cages blog post) explained a workaround with that for
 his rollback proposal:  all subsequent readers or writers of that data
 would have to check for abandoned transactions and roll them back
 themselves before they could read the data.  I don't think this is possible
 with the XACT_LOG replay approach in these slides though, based on how
 the data is indexed (cassandra node token + timeUUID).


  PS:  How are you liking Cages?




 2011/12/6 Jérémy SEVELLEC jsevel...@gmail.com

 Hi John,

  I had exactly the same thoughts.

  I'm using ZooKeeper and Cages to lock and isolate.

  But how do you roll back?
 It's impossible, so try replay!

  the idea is explained in 

Re: best practices for simulating transactions in Cassandra

2011-12-12 Thread Jake Luciani


 Jake:  The algorithm you've outlined is pretty similar to how ZooKeeper
 clients implement locking.  The only potential issue that I see with it
 implemented in Cassandra is that it uses the timestamps of the inserted
 columns to determine the winner of the lock.  The column timestamps are
 generated by the clients (whose clocks can drift from each other), so it's
 possible for a client (whose clock is skewed to some time in the near past)
 to accidentally steal a lock from another client who presently thinks
 that it is the winner of the lock.  At least it seems that way to me.


I don't see that. If a client wants to abuse the system or doesn't run NTP
then it can grab all the locks, but each lock is guaranteed to be owned by
one person, since the client timestamps are used to pick a winner; see
points 4 and 5.

It inspects each column, which represents a different acquire attempt, and
compares those timestamps.  So if client A is skewed in the past but
encounters a non-expiring column, it knows the lock is taken.
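
A small Python sketch of that decision rule, to make the skew case concrete. Claim and decide are hypothetical names, and the row is just a list of tuples rather than a real QUORUM read. It only illustrates the rule as described here; it does not settle whether a race window remains between the read-back and the non-expiring write.

```python
from typing import List, NamedTuple


class Claim(NamedTuple):
    node_id: str
    timestamp: int  # client-supplied, so subject to clock skew
    expiring: bool  # False = non-expiring column written by the lock winner


def decide(mine: Claim, row: List[Claim]) -> str:
    others = [c for c in row if c.node_id != mine.node_id]
    # A non-expiring column means the lock is taken, whatever the timestamps say.
    if any(not c.expiring for c in others):
        return "held"
    # Otherwise the earliest pending attempt wins; ties fall back to the ID.
    if any((c.timestamp, c.node_id) < (mine.timestamp, mine.node_id)
           for c in others):
        return "retry"
    return "acquired"


if __name__ == "__main__":
    skewed = Claim("a", timestamp=10, expiring=True)   # clock far in the past
    holder = Claim("b", timestamp=1000, expiring=False)
    pending = Claim("b", timestamp=1000, expiring=True)
    print(decide(skewed, [holder, skewed]))   # lock already confirmed by b
    print(decide(skewed, [pending, skewed]))  # b has only a pending attempt
```

In the first call the skewed client backs off despite its older timestamp, because the non-expiring column is checked first. In the second call it wins, which is the "can grab all the locks" case for a client with a bad clock.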

-Jake



 Dominic:  I'll have to re-read your paper a few times (while furrowing
 my brow and scratching my head) before I can convince myself that the
 proposed algorithm doesn't have the possibility of deadlock or livelock.
  It does seem that you have covered a lot of the bases though.

 Thanks for sharing guys :)
 John


 On Mon, Dec 12, 2011 at 6:21 AM, Jake Luciani jak...@gmail.com wrote:

 I've written a locking mechanism for Solandra (I refer to it as a
 reservation system) which basically allows you to acquire a lock.  This is
 used to ensure a node is serving unique sequential IDs for Lucene.

 It sounds a bit similar to Dominic's description but I'll explain how the
 Solandra one works.

 The code is at
 https://github.com/tjake/Solandra/blob/solandra/src/lucandra/cluster/CassandraIndexManager.java#L714

 The algorithm is basically:

- each node has a unique id.
- a lock name is a row key.
- client writes to that row @ QUORUM a column named with its ID, with a
 TTL of N seconds.
- client immediately reads back the entire row @ QUORUM.
- if client encounters a column that is non-expiring then the lock is
 already acquired.
- if client encounters a non-deleted but expiring column with a
 timestamp < the one it wrote then it sleeps and tries again.
- if the client's own timestamp was the earliest then it has won the lock
 and writes a non-expiring column of the same name to mark it as officially
 locked.
- in the case of a tie (2 columns with the same ts), the uuids are sorted
 and the lesser one wins.
- once finished, the node with the lock deletes the column and frees the
 lock.

 This algorithm allows for deadlocks because the client has a huge number
 of locks to work with.  It would be fairly simple to use a TTL again to
 make locks auto-expire after N seconds; this would make it more like Google
 Chubby.

 It also allows for bad clients to game the system but that's not
 something that could be dealt with using authorization apis.

 For legacy reasons the linked code uses super columns but a regular
 column family will work just fine.

 -Jake


 On Mon, Dec 12, 2011 at 7:36 AM, Dominic Williams 
 dwilli...@fightmymonster.com wrote:

 Hi guys, just thought I'd chip in...

 Fight My Monster is still using Cages, which is working fine, but...

 I'm looking at using Cassandra to replace Cages/ZooKeeper(!) There are 2
 main reasons:-

 1. Although a fast ZooKeeper cluster can handle a lot of load (we aren't
 getting anywhere near to capacity and we do a *lot* of serialisation) at
 some point it will be necessary to start hashing lock paths onto separate
 ZooKeeper clusters, and I tend to believe that these days you should choose
 platforms that handle sharding themselves (e.g. choose Cassandra rather
 than MySQL)

 2. Why have more components in your system when you can have less!!! KISS

 Recently I therefore tried to devise an algorithm which can be used to
 add a distributed locking layer to clients such as Pelops, Hector, Pycassa
 etc.

 There is a doc describing the algorithm, to which may be added an
 appendix describing a protocol so that locking can be interoperable between
 the clients. That could be extended to describe a protocol for
 transactions. Word of warning this is a *beta* algorithm that has only been
 seen by a select group so far, and therefore not even 100% sure it works
 but there is a useful general discussion regarding serialization of
 reads/writes so I include it anyway (and since this algorithm is going to
 be out there now, if there's anyone out there who fancies doing a Z proof
 or disproof, that would be fantastic).
 http://media.fightmymonster.com/Shared/docs/Wait%20Chain%20Algorithm.pdf

 Final word on this re transactions: if/when transactions are added to
 locking system in Pelops/Hector/Pycassa, Cassandra will provide better
 performance than ZooKeeper for storing snapshots, especially as transaction
 size increases

 Best, Dominic
