Re: Compatibility, performance & portability of Cassandra data types (MAP, UDT & JSON) in DSE Search & Analytics

2016-02-18 Thread daemeon reiydelle
Given you only have 16 columns vs. over 200 ... I would expect a
substantial improvement in writes, but not 5x.
Ditto reads. I would be interested to understand where that 5x comes from.


...


Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Thu, Feb 18, 2016 at 8:20 PM, Chandra Sekar KR <
chandraseka...@hotmail.com> wrote:

> Hi,
>
>
> I'm looking for help in arriving at pros & cons of using MAP, UDT & JSON
> (Text) data types in Cassandra & its ease of use/impact across other DSE
> products - Spark & Solr. We are migrating an OLTP database from RDBMS to
> Cassandra which has 200+ columns and with an average daily volume of 25
> million records/day. The access pattern is quite simple and in OLTP the
> access is always based on primary key. For OLAP, there are other access
> patterns with a combination of columns where we are planning to use Spark &
> Solr for search & analytical capabilities (in a separate DC).
>
>
> The average size of each record is ~2KB and the application workload is of
> type INSERT only (no updates/deletes). We conducted performance tests on
> two types of data models
>
> 1) A table with 200+ columns similar to RDBMS
>
> 2) A table with 15 columns where only critical business fields are
> maintained as key/value pairs and the remaining are stored in a single
> column of type TEXT as JSON object.
>
>
> In the results, we noticed significant advantage in the JSON model where
> the performance was 5X times better than columnar data model.
> Alternatively, we are in the process of evaluating performance for other
> data types - MAP & UDT instead of using TEXT for storing JSON object.
> Sample data model structure for columnar, json, map & udt types are given
> below:
>
>
>
>
> I would like to know the performance, transformation, compatibility &
> portability impacts & east-of-use of each of these data types from Search &
> Analytics perspective (Spark & Solr). I'm aware that we will have to use
> field transformers in Solr to use index on JSON fields, not sure about MAP
> & UDT. Any help on comparison of these data types in Spark & Solr is highly
> appreciated.
>
>
> Regards, KR
>


Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Branton Davis
Jan, thanks!  That makes perfect sense to run a second time before stopping
cassandra.  I'll add that in when I do the production cluster.

On Fri, Feb 19, 2016 at 12:16 AM, Jan Kesten  wrote:

> Hi Branton,
>
> two cents from me - I didnt look through the script, but for the rsyncs I
> do pretty much the same when moving them. Since they are immutable I do a
> first sync while everything is up and running to the new location which
> runs really long. Meanwhile new ones are created and I sync them again
> online, much less files to copy now. After that I shutdown the node and my
> last rsync now has to copy only a few files which is quite fast and so the
> downtime for that node is within minutes.
>
> Jan
>
>
>
> Von meinem iPhone gesendet
>
> Am 18.02.2016 um 22:12 schrieb Branton Davis :
>
> Alain, thanks for sharing!  I'm confused why you do so many repetitive
> rsyncs.  Just being cautious or is there another reason?  Also, why do you
> have --delete-before when you're copying data to a temp (assumed empty)
> directory?
>
> On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ 
> wrote:
>
>> I did the process a few weeks ago and ended up writing a runbook and a
>> script. I have anonymised and share it fwiw.
>>
>> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk
>>
>> It is basic bash. I tried to have the shortest down time possible, making
>> this a bit more complex, but it allows you to do a lot in parallel and just
>> do a fast operation sequentially, reducing overall operation time.
>>
>> This worked fine for me, yet I might have make some errors while making
>> it configurable though variables. Be sure to be around if you decide to run
>> this. Also I automated this more by using knife (Chef), I hate to repeat
>> ops, this is something you might want to consider.
>>
>> Hope this is useful,
>>
>> C*heers,
>> -
>> Alain Rodriguez
>> France
>>
>> The Last Pickle
>> http://www.thelastpickle.com
>>
>> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal :
>>
>>> Hey Branton,
>>>
>>> Please do let us know if you face any problems  doing this.
>>>
>>> Thanks
>>> anishek
>>>
>>> On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis <
>>> branton.da...@spanning.com> wrote:
>>>
 We're about to do the same thing.  It shouldn't be necessary to shut
 down the entire cluster, right?

 On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli 
 wrote:

>
>
> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal 
> wrote:
>>
>> To accomplish this can I just copy the data from disk1 to disk2 with
>> in the relevant cassandra home location folders, change the cassanda.yaml
>> configuration and restart the node. before starting i will shutdown the
>> cluster.
>>
>
> Yes.
>
> =Rob
>
>


>>>
>>
>


Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Branton Davis
Here's what I ended up doing on a test cluster.  It seemed to work well.
I'm running a full repair on the production cluster, probably over the
weekend, then I'll have a go at the test cluster again and go for broke.

# sync to temporary directory on original volume
rsync -azvuiP /var/data/cassandra_data2/ /var/data/cassandra/data2/

# check "before" size of data directory
du -sh /var/data/cassandra/data

# compare sizes
du -sh /var/data/cassandra_data2 && du -sh /var/data/cassandra/data2

service cassandra stop

# sync anything that changed before stop/drain completed
rsync -azvuiP /var/data/cassandra_data2/ /var/data/cassandra/data2/

# compare sizes
du -sh /var/data/cassandra_data2 && du -sh /var/data/cassandra/data2

# edit /usr/local/cassandra/conf/cassandra.yaml:
#  - remove /var/data/cassandra_data2 from data_file_directories

# sync files into real data directory
rsync -azvuiP /var/data/cassandra/data2/ /var/data/cassandra/data/

# check "after" size of data directory (should be size of
/var/data/cassandra_data2 plus "before" size)
du -sh /var/data/cassandra/data

# remove temporary directory
rm -Rf /var/data/cassandra/data2

# unmount second volume
umount /dev/xvdf

# In AWS console:
#  - detach sdf volume
#  - delete volume

# remove mount directory
rm -Rf /var/data/cassandra_data2/

# restart cassandra
service cassandra start

# run repair
/usr/local/cassandra/bin/nodetool repair -pr



On Thu, Feb 18, 2016 at 3:12 PM, Branton Davis 
wrote:

> Alain, thanks for sharing!  I'm confused why you do so many repetitive
> rsyncs.  Just being cautious or is there another reason?  Also, why do you
> have --delete-before when you're copying data to a temp (assumed empty)
> directory?
>
> On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ 
> wrote:
>
>> I did the process a few weeks ago and ended up writing a runbook and a
>> script. I have anonymised and share it fwiw.
>>
>> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk
>>
>> It is basic bash. I tried to have the shortest down time possible, making
>> this a bit more complex, but it allows you to do a lot in parallel and just
>> do a fast operation sequentially, reducing overall operation time.
>>
>> This worked fine for me, yet I might have make some errors while making
>> it configurable though variables. Be sure to be around if you decide to run
>> this. Also I automated this more by using knife (Chef), I hate to repeat
>> ops, this is something you might want to consider.
>>
>> Hope this is useful,
>>
>> C*heers,
>> -
>> Alain Rodriguez
>> France
>>
>> The Last Pickle
>> http://www.thelastpickle.com
>>
>> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal :
>>
>>> Hey Branton,
>>>
>>> Please do let us know if you face any problems  doing this.
>>>
>>> Thanks
>>> anishek
>>>
>>> On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis <
>>> branton.da...@spanning.com> wrote:
>>>
 We're about to do the same thing.  It shouldn't be necessary to shut
 down the entire cluster, right?

 On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli 
 wrote:

>
>
> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal 
> wrote:
>>
>> To accomplish this can I just copy the data from disk1 to disk2 with
>> in the relevant cassandra home location folders, change the cassanda.yaml
>> configuration and restart the node. before starting i will shutdown the
>> cluster.
>>
>
> Yes.
>
> =Rob
>
>


>>>
>>
>


Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Jan Kesten
Hi Branton,

two cents from me - I didn't look through the script, but for the rsyncs I do
pretty much the same when moving them. Since the files are immutable, I do a first
sync to the new location while everything is up and running, which runs really
long. Meanwhile new ones are created, so I sync them again online, with far fewer
files to copy now. After that I shut down the node, and my last rsync only has to
copy a few files, which is quite fast, so the downtime for that node is
within minutes.

Jan



Sent from my iPhone

> Am 18.02.2016 um 22:12 schrieb Branton Davis :
> 
> Alain, thanks for sharing!  I'm confused why you do so many repetitive 
> rsyncs.  Just being cautious or is there another reason?  Also, why do you 
> have --delete-before when you're copying data to a temp (assumed empty) 
> directory?
> 
>> On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ  wrote:
>> I did the process a few weeks ago and ended up writing a runbook and a 
>> script. I have anonymised and share it fwiw.
>> 
>> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk
>> 
>> It is basic bash. I tried to have the shortest down time possible, making 
>> this a bit more complex, but it allows you to do a lot in parallel and just 
>> do a fast operation sequentially, reducing overall operation time.
>> 
>> This worked fine for me, yet I might have make some errors while making it 
>> configurable though variables. Be sure to be around if you decide to run 
>> this. Also I automated this more by using knife (Chef), I hate to repeat 
>> ops, this is something you might want to consider.
>> 
>> Hope this is useful,
>> 
>> C*heers,
>> -
>> Alain Rodriguez
>> France
>> 
>> The Last Pickle
>> http://www.thelastpickle.com
>> 
>> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal :
>>> Hey Branton,
>>> 
>>> Please do let us know if you face any problems  doing this.
>>> 
>>> Thanks
>>> anishek
>>> 
 On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis 
  wrote:
 We're about to do the same thing.  It shouldn't be necessary to shut down 
 the entire cluster, right?
 
> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli  
> wrote:
> 
> 
>> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal  
>> wrote:
>> To accomplish this can I just copy the data from disk1 to disk2 with in 
>> the relevant cassandra home location folders, change the cassanda.yaml 
>> configuration and restart the node. before starting i will shutdown the 
>> cluster.
> 
> Yes.
> 
> =Rob
> 


Compatibility, performance & portability of Cassandra data types (MAP, UDT & JSON) in DSE Search & Analytics

2016-02-18 Thread Chandra Sekar KR
Hi,


I'm looking for help in arriving at the pros & cons of using the MAP, UDT & JSON (Text)
data types in Cassandra, and their ease of use/impact across other DSE products -
Spark & Solr. We are migrating an OLTP database from RDBMS to Cassandra; the source
table has 200+ columns and an average daily volume of 25 million records/day.
The access pattern is quite simple: in OLTP the access is always based on the
primary key. For OLAP, there are other access patterns over combinations of
columns, where we are planning to use Spark & Solr for search & analytical
capabilities (in a separate DC).


The average size of each record is ~2KB and the application workload is
INSERT-only (no updates/deletes). We conducted performance tests on two data models:

1) A table with 200+ columns, similar to the RDBMS schema

2) A table with 15 columns, where only critical business fields are maintained
as key/value pairs and the remaining fields are stored in a single column of type
TEXT as a JSON object.


In the results, we noticed a significant advantage for the JSON model, whose
performance was 5x better than the columnar data model. We are now also evaluating
performance for other data types - MAP & UDT - instead of using TEXT to store the
JSON object. Sample data model structures for the columnar, JSON, map & UDT types
are given below:


[inline image (not preserved in the archive): sample data model structures for the columnar, JSON, map & UDT options]

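In case the image does not come through, a rough CQL sketch of the four candidates
is below; all table, type and column names are illustrative placeholders, not our
actual schema:

-- 1) Columnar: one CQL column per RDBMS column (200+ in the real table)
CREATE TABLE txn_columnar (
    txn_id   uuid PRIMARY KEY,
    field_1  text,
    field_2  int,
    field_3  timestamp
    -- ... remaining columns ...
);

-- 2) JSON-as-TEXT: critical fields as real columns, the rest serialized
--    into a single TEXT column holding a JSON object
CREATE TABLE txn_json (
    txn_id   uuid PRIMARY KEY,
    field_1  text,
    field_2  int,
    payload  text
);

-- 3) MAP: the non-critical fields kept as a text-to-text map
CREATE TABLE txn_map (
    txn_id   uuid PRIMARY KEY,
    field_1  text,
    field_2  int,
    payload  map<text, text>
);

-- 4) UDT: the non-critical fields grouped into a user-defined type
--    (UDT columns must be frozen in 2.1/2.2)
CREATE TYPE txn_details (
    field_3  timestamp,
    field_4  text
);

CREATE TABLE txn_udt (
    txn_id   uuid PRIMARY KEY,
    field_1  text,
    field_2  int,
    payload  frozen<txn_details>
);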

I would like to know the performance, transformation, compatibility & portability
impacts & ease of use of each of these data types from a Search & Analytics
perspective (Spark & Solr). I'm aware that we will have to use field transformers
in Solr to index JSON fields, but I'm not sure about MAP & UDT. Any help on
comparing these data types in Spark & Solr is highly appreciated.


Regards, KR


Re: High Bloom filter false ratio

2016-02-18 Thread Anishek Agarwal
Hey all,

@Jaydeep here is the cfstats output from one node.

Read Count: 1721134722

Read Latency: 0.04268825050756254 ms.

Write Count: 56743880

Write Latency: 0.014650376727851532 ms.

Pending Tasks: 0

Table: user_stay_points

SSTable count: 1289

Space used (live), bytes: 122141272262

Space used (total), bytes: 224227850870

Off heap memory used (total), bytes: 653827528

SSTable Compression Ratio: 0.4959736121441446

Number of keys (estimate): 345137664

Memtable cell count: 339034

Memtable data size, bytes: 106558314

Memtable switch count: 3266

Local read count: 1721134803

Local read latency: 0.048 ms

Local write count: 56743898

Local write latency: 0.018 ms

Pending tasks: 0

Bloom filter false positives: 40664437

Bloom filter false ratio: 0.69058

Bloom filter space used, bytes: 493777336

Bloom filter off heap memory used, bytes: 493767024

Index summary off heap memory used, bytes: 91677192

Compression metadata off heap memory used, bytes: 68383312

Compacted partition minimum bytes: 104

Compacted partition maximum bytes: 1629722

Compacted partition mean bytes: 1773

Average live cells per slice (last five minutes): 0.0

Average tombstones per slice (last five minutes): 0.0


@Tyler Hobbs

we are using cassandra 2.0.15, so
https://issues.apache.org/jira/browse/CASSANDRA-8525 shouldn't occur. The other
problems look like they will be fixed in 3.0; we will most likely try to slot in
an upgrade to a 3.x version towards the second quarter of this year.


@Daemeon

Latencies do seem to show higher ratios; the graph is attached.


I am mostly trying to look at Bloom filters because of the way we do reads:
we read data with non-existent partition keys and it seems to take a long
time to respond. For 720 queries it takes 2 seconds, with none of the 720
queries returning anything. The 720 queries are issued in sequential batches of
180 queries, with the 180 queries in each batch running in parallel.

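For reference, one way to check this (per Tyler's suggestion) is to trace a single
lookup of a key that is known not to exist, from cqlsh; the keyspace name and key
column names below are placeholders for our real ones:

TRACING ON;
SELECT * FROM my_keyspace.user_stay_points
 WHERE key_part_1 = 123 AND key_part_2 = 456;
-- the trace should show, per sstable, whether the bloom filter allowed skipping it
-- or whether a false positive forced a partition index / data file read
TRACING OFF;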

thanks

anishek



On Fri, Feb 19, 2016 at 3:09 AM, Jaydeep Chovatia <
chovatia.jayd...@gmail.com> wrote:

> How many partition keys exists for the table which shows this problem (or
> provide nodetool cfstats for that table)?
>
> On Thu, Feb 18, 2016 at 11:38 AM, daemeon reiydelle 
> wrote:
>
>> The bloom filter buckets the values in a small number of buckets. I have
>> been surprised by how many cases I see with large cardinality where a few
>> values populate a given bloom leaf, resulting in high false positives, and
>> a surprising impact on latencies!
>>
>> Are you seeing 2:1 ranges between mean and worse case latencies (allowing
>> for gc times)?
>>
>> Daemeon Reiydelle
>> On Feb 18, 2016 8:57 AM, "Tyler Hobbs"  wrote:
>>
>>> You can try slightly lowering the bloom_filter_fp_chance on your table.
>>>
>>> Otherwise, it's possible that you're repeatedly querying one or two
>>> partitions that always trigger a bloom filter false positive.  You could
>>> try manually tracing a few queries on this table (for non-existent
>>> partitions) to see if the bloom filter rejects them.
>>>
>>> Depending on your Cassandra version, your false positive ratio could be
>>> inaccurate: https://issues.apache.org/jira/browse/CASSANDRA-8525
>>>
>>> There are also a couple of recent improvements to bloom filters:
>>> * https://issues.apache.org/jira/browse/CASSANDRA-8413
>>> * https://issues.apache.org/jira/browse/CASSANDRA-9167
>>>
>>>
>>> On Thu, Feb 18, 2016 at 1:35 AM, Anishek Agarwal 
>>> wrote:
>>>
 Hello,

 We have a table with composite partition key with humungous
 cardinality, its a combination of (long,long). On the table we have
 bloom_filter_fp_chance=0.01.

 On doing "nodetool cfstats" on the 5 nodes we have in the cluster we
 are seeing  "Bloom filter false ratio:" in the range of 0.7 -0.9.

 I thought over time the bloom filter would adjust to the key space
 cardinality, we have been running the cluster for a long time now but have
 added significant traffic from Jan this year, which would not lead to
 writes in the db but would lead to high reads to see if are any values.

 Are there any settings that can be changed to allow better ratio.

 Thanks
 Anishek

>>>
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax 
>>>
>>
>


Live upgrade 2.0 to 2.1 temporarily increases GC time causing timeouts and unavailability

2016-02-18 Thread Sotirios Delimanolis
We have a Cassandra cluster with 24 nodes. These nodes were running 2.0.16. 
While the nodes are in the ring and handling queries, we perform the upgrade to 
2.1.12 as follows (more or less) one node at a time:
   
   - Stop the Cassandra process
   - Deploy jars, scripts, binaries, etc.
   - Start the Cassandra process

A few nodes into the upgrade, we start noticing that the majority of queries
(mostly through Thrift) time out or report unavailable. Looking at system
information, Cassandra GC time goes through the roof, which is what we assume
causes the timeouts.
Once all nodes are upgraded, the cluster stabilizes and no more (or barely any)
timeouts occur.
What could explain this? Does it have anything to do with how a 2.0 node
communicates with a 2.1 node?
Our Cassandra consumers haven't changed.






Re: High Bloom filter false ratio

2016-02-18 Thread Jaydeep Chovatia
How many partition keys exist for the table that shows this problem (or
provide nodetool cfstats output for that table)?

On Thu, Feb 18, 2016 at 11:38 AM, daemeon reiydelle 
wrote:

> The bloom filter buckets the values in a small number of buckets. I have
> been surprised by how many cases I see with large cardinality where a few
> values populate a given bloom leaf, resulting in high false positives, and
> a surprising impact on latencies!
>
> Are you seeing 2:1 ranges between mean and worse case latencies (allowing
> for gc times)?
>
> Daemeon Reiydelle
> On Feb 18, 2016 8:57 AM, "Tyler Hobbs"  wrote:
>
>> You can try slightly lowering the bloom_filter_fp_chance on your table.
>>
>> Otherwise, it's possible that you're repeatedly querying one or two
>> partitions that always trigger a bloom filter false positive.  You could
>> try manually tracing a few queries on this table (for non-existent
>> partitions) to see if the bloom filter rejects them.
>>
>> Depending on your Cassandra version, your false positive ratio could be
>> inaccurate: https://issues.apache.org/jira/browse/CASSANDRA-8525
>>
>> There are also a couple of recent improvements to bloom filters:
>> * https://issues.apache.org/jira/browse/CASSANDRA-8413
>> * https://issues.apache.org/jira/browse/CASSANDRA-9167
>>
>>
>> On Thu, Feb 18, 2016 at 1:35 AM, Anishek Agarwal 
>> wrote:
>>
>>> Hello,
>>>
>>> We have a table with composite partition key with humungous cardinality,
>>> its a combination of (long,long). On the table we have
>>> bloom_filter_fp_chance=0.01.
>>>
>>> On doing "nodetool cfstats" on the 5 nodes we have in the cluster we are
>>> seeing  "Bloom filter false ratio:" in the range of 0.7 -0.9.
>>>
>>> I thought over time the bloom filter would adjust to the key space
>>> cardinality, we have been running the cluster for a long time now but have
>>> added significant traffic from Jan this year, which would not lead to
>>> writes in the db but would lead to high reads to see if are any values.
>>>
>>> Are there any settings that can be changed to allow better ratio.
>>>
>>> Thanks
>>> Anishek
>>>
>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax 
>>
>


Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Branton Davis
Alain, thanks for sharing!  I'm confused why you do so many repetitive
rsyncs.  Just being cautious or is there another reason?  Also, why do you
have --delete-before when you're copying data to a temp (assumed empty)
directory?

On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ  wrote:

> I did the process a few weeks ago and ended up writing a runbook and a
> script. I have anonymised and share it fwiw.
>
> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk
>
> It is basic bash. I tried to have the shortest down time possible, making
> this a bit more complex, but it allows you to do a lot in parallel and just
> do a fast operation sequentially, reducing overall operation time.
>
> This worked fine for me, yet I might have make some errors while making it
> configurable though variables. Be sure to be around if you decide to run
> this. Also I automated this more by using knife (Chef), I hate to repeat
> ops, this is something you might want to consider.
>
> Hope this is useful,
>
> C*heers,
> -
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal :
>
>> Hey Branton,
>>
>> Please do let us know if you face any problems  doing this.
>>
>> Thanks
>> anishek
>>
>> On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis <
>> branton.da...@spanning.com> wrote:
>>
>>> We're about to do the same thing.  It shouldn't be necessary to shut
>>> down the entire cluster, right?
>>>
>>> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli 
>>> wrote:
>>>


 On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal 
 wrote:
>
> To accomplish this can I just copy the data from disk1 to disk2 with
> in the relevant cassandra home location folders, change the cassanda.yaml
> configuration and restart the node. before starting i will shutdown the
> cluster.
>

 Yes.

 =Rob


>>>
>>>
>>
>


Re: High Bloom filter false ratio

2016-02-18 Thread daemeon reiydelle
The bloom filter buckets the values into a small number of buckets. I have
been surprised by how many cases I see with large cardinality where a few
values populate a given bloom leaf, resulting in high false positives and
a surprising impact on latencies!

Are you seeing 2:1 ranges between mean and worst-case latencies (allowing
for GC times)?

Daemeon Reiydelle
On Feb 18, 2016 8:57 AM, "Tyler Hobbs"  wrote:

> You can try slightly lowering the bloom_filter_fp_chance on your table.
>
> Otherwise, it's possible that you're repeatedly querying one or two
> partitions that always trigger a bloom filter false positive.  You could
> try manually tracing a few queries on this table (for non-existent
> partitions) to see if the bloom filter rejects them.
>
> Depending on your Cassandra version, your false positive ratio could be
> inaccurate: https://issues.apache.org/jira/browse/CASSANDRA-8525
>
> There are also a couple of recent improvements to bloom filters:
> * https://issues.apache.org/jira/browse/CASSANDRA-8413
> * https://issues.apache.org/jira/browse/CASSANDRA-9167
>
>
> On Thu, Feb 18, 2016 at 1:35 AM, Anishek Agarwal 
> wrote:
>
>> Hello,
>>
>> We have a table with composite partition key with humungous cardinality,
>> its a combination of (long,long). On the table we have
>> bloom_filter_fp_chance=0.01.
>>
>> On doing "nodetool cfstats" on the 5 nodes we have in the cluster we are
>> seeing  "Bloom filter false ratio:" in the range of 0.7 -0.9.
>>
>> I thought over time the bloom filter would adjust to the key space
>> cardinality, we have been running the cluster for a long time now but have
>> added significant traffic from Jan this year, which would not lead to
>> writes in the db but would lead to high reads to see if are any values.
>>
>> Are there any settings that can be changed to allow better ratio.
>>
>> Thanks
>> Anishek
>>
>
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Debugging write timeouts on Cassandra 2.2.5

2016-02-18 Thread Anuj Wadehra
What's the GC overhead? Can you share your GC collector and settings?

What's your query pattern? Do you use secondary indexes, batches, IN clauses, etc.?

Anuj

Sent from Yahoo Mail on Android 
 
  On Thu, 18 Feb, 2016 at 8:45 pm, Mike Heffner wrote:   
Alain,
Thanks for the suggestions.

Sure, tpstats are here: https://gist.github.com/mheffner/a979ae1a0304480b052a. 
Looking at the metrics across the ring, there were no blocked tasks nor dropped 
messages.
Iowait metrics look fine, so it doesn't appear to be blocking on disk. 
Similarly, there are no long GC pauses.
We haven't noticed latency on any particular table higher than others or 
correlated around the occurrence of a timeout. We have noticed with further 
testing that running cassandra-stress against the ring, while our workload is 
writing to the same ring, will incur similar 10 second timeouts. If our 
workload is not writing to the ring, cassandra stress will run without hitting 
timeouts. This seems to imply that our workload pattern is causing something to 
block cluster-wide, since the stress tool writes to a different keyspace then 
our workload.
I mentioned in another reply that we've tracked it to something between 2.0.x 
and 2.1.x, so we are focusing on narrowing which point release it was 
introduced in.
Cheers,
Mike
On Thu, Feb 18, 2016 at 3:33 AM, Alain RODRIGUEZ  wrote:

Hi Mike,
What about the output of tpstats ? I imagine you have dropped messages there. 
Any blocked threads ? Could you paste this output here ?
May this be due to some network hiccup to access the disks as they are EBS ? 
Can you think of anyway of checking this ? Do you have a lot of GC logs, how 
long are the pauses (use something like: grep -i 'GCInspector' 
/var/log/cassandra/system.log) ?
Something else you could check are local_writes stats to see if only one table 
if affected or this is keyspace / cluster wide. You can use metrics exposed by 
cassandra or if you have no dashboards I believe a: 'nodetool cfstats  | 
grep -e 'Table:' -e 'Local'' should give you a rough idea of local latencies.
Those are just things I would check, I have not a clue on what is happening 
here, hope this will help.
C*heers,-Alain RodriguezFrance
The Last Picklehttp://www.thelastpickle.com
2016-02-18 5:13 GMT+01:00 Mike Heffner :

Jaydeep,
No, we don't use any light weight transactions.
Mike
On Wed, Feb 17, 2016 at 6:44 PM, Jaydeep Chovatia  
wrote:

Are you guys using light weight transactions in your write path?
On Thu, Feb 11, 2016 at 12:36 AM, Fabrice Facorat  
wrote:

Are your commitlog and data on the same disk ? If yes, you should put
commitlogs on a separate disk which don't have a lot of IO.

Others IO may have great impact impact on your commitlog writing and
it may even block.

An example of impact IO may have, even for Async writes:
https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic

2016-02-11 0:31 GMT+01:00 Mike Heffner :
> Jeff,
>
> We have both commitlog and data on a 4TB EBS with 10k IOPS.
>
> Mike
>
> On Wed, Feb 10, 2016 at 5:28 PM, Jeff Jirsa 
> wrote:
>>
>> What disk size are you using?
>>
>>
>>
>> From: Mike Heffner
>> Reply-To: "user@cassandra.apache.org"
>> Date: Wednesday, February 10, 2016 at 2:24 PM
>> To: "user@cassandra.apache.org"
>> Cc: Peter Norton
>> Subject: Re: Debugging write timeouts on Cassandra 2.2.5
>>
>> Paulo,
>>
>> Thanks for the suggestion, we ran some tests against CMS and saw the same
>> timeouts. On that note though, we are going to try doubling the instance
>> sizes and testing with double the heap (even though current usage is low).
>>
>> Mike
>>
>> On Wed, Feb 10, 2016 at 3:40 PM, Paulo Motta 
>> wrote:
>>>
>>> Are you using the same GC settings as the staging 2.0 cluster? If not,
>>> could you try using the default GC settings (CMS) and see if that changes
>>> anything? This is just a wild guess, but there were reports before of
>>> G1-caused instabilities with small heap sizes (< 16GB - see CASSANDRA-10403
>>> for more context). Please ignore if you already tried reverting back to CMS.
>>>
>>> 2016-02-10 16:51 GMT-03:00 Mike Heffner :

 Hi all,

 We've recently embarked on a project to update our Cassandra
 infrastructure running on EC2. We are long time users of 2.0.x and are
 testing out a move to version 2.2.5 running on VPC with EBS. Our test setup
 is a 3 node, RF=3 cluster supporting a small write load (mirror of our
 staging load).

 We are writing at QUORUM and while p95's look good compared to our
 staging 2.0.x cluster, we are seeing frequent write operations that time 
 out
 at the max write_request_timeout_in_ms (10 seconds). CPU across the cluster
 is < 10% and EBS write load is < 100 IOPS. Cassandra is running with the
 Oracle JDK 8u60 and we're using G1GC and any GC pauses are less than 500ms.

 We run on c4.2xl instances with GP2 EBS attached storage for data and
 com

Re: High Bloom filter false ratio

2016-02-18 Thread Tyler Hobbs
You can try slightly lowering the bloom_filter_fp_chance on your table.

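For example, something along these lines (keyspace and table are placeholders;
pick the value based on how much extra off-heap memory you can afford):

ALTER TABLE my_keyspace.my_table
    WITH bloom_filter_fp_chance = 0.001;  -- e.g. down from the current 0.01

Note that the new value should only take effect for SSTables written after the
change, so existing SSTables keep their old filters until they are rewritten
(by normal compaction, or forced with something like nodetool upgradesstables -a).
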
Otherwise, it's possible that you're repeatedly querying one or two
partitions that always trigger a bloom filter false positive.  You could
try manually tracing a few queries on this table (for non-existent
partitions) to see if the bloom filter rejects them.

Depending on your Cassandra version, your false positive ratio could be
inaccurate: https://issues.apache.org/jira/browse/CASSANDRA-8525

There are also a couple of recent improvements to bloom filters:
* https://issues.apache.org/jira/browse/CASSANDRA-8413
* https://issues.apache.org/jira/browse/CASSANDRA-9167


On Thu, Feb 18, 2016 at 1:35 AM, Anishek Agarwal  wrote:

> Hello,
>
> We have a table with composite partition key with humungous cardinality,
> its a combination of (long,long). On the table we have
> bloom_filter_fp_chance=0.01.
>
> On doing "nodetool cfstats" on the 5 nodes we have in the cluster we are
> seeing  "Bloom filter false ratio:" in the range of 0.7 -0.9.
>
> I thought over time the bloom filter would adjust to the key space
> cardinality, we have been running the cluster for a long time now but have
> added significant traffic from Jan this year, which would not lead to
> writes in the db but would lead to high reads to see if are any values.
>
> Are there any settings that can be changed to allow better ratio.
>
> Thanks
> Anishek
>



-- 
Tyler Hobbs
DataStax 


Re: „Using Timestamp“ Feature

2016-02-18 Thread Tyler Hobbs
2016-02-18 2:00 GMT-06:00 Matthias Niehoff :

>
> * is the 'using timestamp' feature (and providing statement timestamps)
> sufficiently robust and mature to build an application on?
>

Yes.  It's been there since the start of CQL3.


> * In a BatchedStatement, can different statements have different
> (explicitly provided) timestamps, or is the BatchedStatement's timestamp
> used for them all? Is this specified / stable behaviour?
>

Yes, you can separate timestamps per statement.  And, in fact, if you
potentially mix inserts and deletes on the same rows, you *should* use
explicit timestamps with different values.  See the timestamp notes here:
http://cassandra.apache.org/doc/cql3/CQL.html#batchStmt
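
For instance, a batch that deletes and re-writes the same row might look roughly
like this (placeholder table, values, and microsecond timestamps); giving the
insert the later timestamp is what makes it win over the delete:

BEGIN BATCH
  DELETE FROM mytable USING TIMESTAMP 1455810000000000 WHERE key = 'k1';
  INSERT INTO mytable (key, col) VALUES ('k1', 'new-value')
      USING TIMESTAMP 1455810000000001;
APPLY BATCH;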


> * cqhsh reports a syntax error when I use 'using timestamp' with an update
> statement (works with 'insert'). Is there a good reason for this, or is it
> a bug?
>

The "USING TIMESTAMP" goes in a different place in update statements.  It
should be something like:

UPDATE mytable USING TIMESTAMP ? SET col = ? WHERE key = ?


-- 
Tyler Hobbs
DataStax 


Re: Debugging write timeouts on Cassandra 2.2.5

2016-02-18 Thread Mike Heffner
Alain,

Thanks for the suggestions.

Sure, tpstats are here:
https://gist.github.com/mheffner/a979ae1a0304480b052a. Looking at the
metrics across the ring, there were no blocked tasks nor dropped messages.

Iowait metrics look fine, so it doesn't appear to be blocking on disk.
Similarly, there are no long GC pauses.

We haven't noticed higher latency on any particular table than on the others, or
latency correlated with the occurrence of a timeout. We have noticed with further
testing that running cassandra-stress against the ring, while our workload
is writing to the same ring, will incur similar 10 second timeouts. If our
workload is not writing to the ring, cassandra-stress will run without
hitting timeouts. This seems to imply that our workload pattern is causing
something to block cluster-wide, since the stress tool writes to a
different keyspace than our workload.

I mentioned in another reply that we've tracked it to something between
2.0.x and 2.1.x, so we are focusing on narrowing which point release it was
introduced in.

Cheers,

Mike

On Thu, Feb 18, 2016 at 3:33 AM, Alain RODRIGUEZ  wrote:

> Hi Mike,
>
> What about the output of tpstats ? I imagine you have dropped messages
> there. Any blocked threads ? Could you paste this output here ?
>
> May this be due to some network hiccup to access the disks as they are EBS
> ? Can you think of anyway of checking this ? Do you have a lot of GC logs,
> how long are the pauses (use something like: grep -i 'GCInspector'
> /var/log/cassandra/system.log) ?
>
> Something else you could check are local_writes stats to see if only one
> table if affected or this is keyspace / cluster wide. You can use metrics
> exposed by cassandra or if you have no dashboards I believe a: 'nodetool
> cfstats  | grep -e 'Table:' -e 'Local'' should give you a rough idea
> of local latencies.
>
> Those are just things I would check, I have not a clue on what is
> happening here, hope this will help.
>
> C*heers,
> -
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-02-18 5:13 GMT+01:00 Mike Heffner :
>
>> Jaydeep,
>>
>> No, we don't use any light weight transactions.
>>
>> Mike
>>
>> On Wed, Feb 17, 2016 at 6:44 PM, Jaydeep Chovatia <
>> chovatia.jayd...@gmail.com> wrote:
>>
>>> Are you guys using light weight transactions in your write path?
>>>
>>> On Thu, Feb 11, 2016 at 12:36 AM, Fabrice Facorat <
>>> fabrice.faco...@gmail.com> wrote:
>>>
 Are your commitlog and data on the same disk ? If yes, you should put
 commitlogs on a separate disk which don't have a lot of IO.

 Others IO may have great impact impact on your commitlog writing and
 it may even block.

 An example of impact IO may have, even for Async writes:

 https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic

 2016-02-11 0:31 GMT+01:00 Mike Heffner :
 > Jeff,
 >
 > We have both commitlog and data on a 4TB EBS with 10k IOPS.
 >
 > Mike
 >
 > On Wed, Feb 10, 2016 at 5:28 PM, Jeff Jirsa <
 jeff.ji...@crowdstrike.com>
 > wrote:
 >>
 >> What disk size are you using?
 >>
 >>
 >>
 >> From: Mike Heffner
 >> Reply-To: "user@cassandra.apache.org"
 >> Date: Wednesday, February 10, 2016 at 2:24 PM
 >> To: "user@cassandra.apache.org"
 >> Cc: Peter Norton
 >> Subject: Re: Debugging write timeouts on Cassandra 2.2.5
 >>
 >> Paulo,
 >>
 >> Thanks for the suggestion, we ran some tests against CMS and saw the
 same
 >> timeouts. On that note though, we are going to try doubling the
 instance
 >> sizes and testing with double the heap (even though current usage is
 low).
 >>
 >> Mike
 >>
 >> On Wed, Feb 10, 2016 at 3:40 PM, Paulo Motta <
 pauloricard...@gmail.com>
 >> wrote:
 >>>
 >>> Are you using the same GC settings as the staging 2.0 cluster? If
 not,
 >>> could you try using the default GC settings (CMS) and see if that
 changes
 >>> anything? This is just a wild guess, but there were reports before
 of
 >>> G1-caused instabilities with small heap sizes (< 16GB - see
 CASSANDRA-10403
 >>> for more context). Please ignore if you already tried reverting
 back to CMS.
 >>>
 >>> 2016-02-10 16:51 GMT-03:00 Mike Heffner :
 
  Hi all,
 
  We've recently embarked on a project to update our Cassandra
  infrastructure running on EC2. We are long time users of 2.0.x and
 are
  testing out a move to version 2.2.5 running on VPC with EBS. Our
 test setup
  is a 3 node, RF=3 cluster supporting a small write load (mirror of
 our
  staging load).
 
  We are writing at QUORUM and while p95's look good compared to our
  staging 2.0.x cluster, we are seeing frequent write operations
 that time out
  at the ma

Re: Re : decommissioned nodes shows up in "nodetool describecluster" as UNREACHABLE in 2.1.12 version

2016-02-18 Thread sai krishnam raju potturi
thanks a lot Alain. We did rely on "unsafeAssassinate" earlier, which
worked. We were planning to upgrade from version 2.0.14 to 2.1.12 on all
our clusters.
  But we are still trying to figure out why decommissioned nodes are showing up
in "nodetool describecluster" as "UNREACHABLE".

thanks
Sai

On Wed, Feb 17, 2016 at 5:42 AM, Alain RODRIGUEZ  wrote:

> Hi,
>
> nodetool gossipinfo shows the decommissioned nodes as "LEFT"
>
>
> I believe this is the expected behavior, we keep some a trace of leaving
> nodes for a few days, this shouldn't be an issue for you
>
> nodetool describecluster shows the decommissioned nodes as UNREACHABLE.
>>
>
> This is a weird behaviour I haven't see for a while. You might want to dig
> this some more.
>
> Restarting the entire cluster,  everytime a node is decommissioned does
>> not seem right
>>
>
> Meanwhile, if you are sure the node is out and streams have ended, I guess
> it could be ok to use a JMX client (MX4J, JConsole...) and then use the JMX
> method Gossiper.unsafeAssassinateEndpoints(ip_address) to assassinate the
> gone node from any of the remaining nodes.
>
> How to -->
> http://tumblr.doki-pen.org/post/22654515359/assassinating-cassandra-nodes
> (3 years old post, I partially read it, but I think it might still be
> relevant)
>
> Has anybody experienced similar behaviour
>
>
> FTR, 3 years old similar issue I faced -->
> http://grokbase.com/t/cassandra/user/127knx7nn0/unreachable-node-not-in-nodetool-ring
>
> FWIW, people using C* = 3.x, this is exposed through nodetool -->
> https://docs.datastax.com/en/cassandra/3.x/cassandra/tools/toolsAssassinate.html
>
> Keep in mind that something called 'unsafe' and 'assassinate' at the same
> time is not something you want to use in a regular decommissioning process
> as it drop the node with no file transfer, you basically totally lose a
> node (unless node is out already which seems to be your case, it should be
> safe to use it in your case). I only used it to fix gossip status in the
> past or at some point when forcing a removenode was not working, followed
> by full repairs on remaining nodes.
>
> C*heers,
> -
> Alain Rodriguez
> France
>
> The Last Pickle
> http://www.thelastpickle.com
>
> 2016-02-16 20:08 GMT+01:00 sai krishnam raju potturi 
> :
>
>> hi;
>> we have a 12 node cluster across 2 datacenters. We are currently
>> using cassandra 2.1.12 version.
>>
>> SNITCH : GossipingPropertyFileSnitch
>>
>> When we decommissioned few nodes in a particular datacenter and observed
>> the following :
>>
>> nodetool status shows only the live nodes in the cluster.
>>
>> nodetool describecluster shows the decommissioned nodes as UNREACHABLE.
>>
>> nodetool gossipinfo shows the decommissioned nodes as "LEFT"
>>
>>
>> When the live nodes were restarted, "nodetool describecluster" shows
>> only the live nodes, which is expected.
>>
>> Purging the gossip info too did not help.
>>
>> INFO  17:27:07 InetAddress /X.X.X.X is now DOWN
>> INFO  17:27:07 Removing tokens [125897680671740685543105407593050165202,
>> 140213388002871593911508364312533329916,
>>  98576967436431350637134234839492449485] for /X.X.X.X
>> INFO  17:27:07 InetAddress /X.X.X.X is now DOWN
>> INFO  17:27:07 Removing tokens [6977666116265389022494863106850615,
>> 111270759969411259938117902792984586225,
>> 138611464975439236357814418845450428175] for /X.X.X.X
>>
>> Has anybody experienced similar behaviour. Restarting the entire cluster,
>>  everytime a node is decommissioned does not seem right. Thanks in advance
>> for the help.
>>
>>
>> thanks
>> Sai
>>
>>
>>
>


Re: Re : decommissioned nodes shows up in "nodetool describecluster" as UNREACHABLE in 2.1.12 version

2016-02-18 Thread sai krishnam raju potturi
thank you Ben. We are using cassandra version 2.1.12. We did face the bug
mentioned at https://issues.apache.org/jira/browse/CASSANDRA-10371 in DSE
4.6.7, in another cluster. It's strange that we are seeing it even
in cassandra 2.1.12.

  "nodetool describecluster" showing decommissioned nodes as
UNREACHABLE is something we are seeing for the first time.

thanks
Sai

On Wed, Feb 17, 2016 at 12:36 PM, Ben Bromhead  wrote:

> I'm not sure what version of Cassandra you are running so here is some
> general advice:
>
>- Gossip entries for decommissioned nodes will hang around for a few
>days to help catch up nodes in the case of a partition. This is why you see
>the decommissioned nodes listed as LEFT. This is intentional
>- If you keep seeing those entries in your logs and you are on 2.0.x,
>you might be impacted by
>https://issues.apache.org/jira/browse/CASSANDRA-10371. In this case
>upgrade to 2.1 or you can try the work arounds listed in the ticket.
>
> Ben
>
> On Tue, 16 Feb 2016 at 11:09 sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> hi;
>> we have a 12 node cluster across 2 datacenters. We are currently
>> using cassandra 2.1.12 version.
>>
>> SNITCH : GossipingPropertyFileSnitch
>>
>> When we decommissioned few nodes in a particular datacenter and observed
>> the following :
>>
>> nodetool status shows only the live nodes in the cluster.
>>
>> nodetool describecluster shows the decommissioned nodes as UNREACHABLE.
>>
>> nodetool gossipinfo shows the decommissioned nodes as "LEFT"
>>
>>
>> When the live nodes were restarted, "nodetool describecluster" shows
>> only the live nodes, which is expected.
>>
>> Purging the gossip info too did not help.
>>
>> INFO  17:27:07 InetAddress /X.X.X.X is now DOWN
>> INFO  17:27:07 Removing tokens [125897680671740685543105407593050165202,
>> 140213388002871593911508364312533329916,
>>  98576967436431350637134234839492449485] for /X.X.X.X
>> INFO  17:27:07 InetAddress /X.X.X.X is now DOWN
>> INFO  17:27:07 Removing tokens [6977666116265389022494863106850615,
>> 111270759969411259938117902792984586225,
>> 138611464975439236357814418845450428175] for /X.X.X.X
>>
>> Has anybody experienced similar behaviour. Restarting the entire cluster,
>>  everytime a node is decommissioned does not seem right. Thanks in advance
>> for the help.
>>
>>
>> thanks
>> Sai
>>
>>
>> --
> Ben Bromhead
> CTO | Instaclustr 
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>


Re: Debugging write timeouts on Cassandra 2.2.5

2016-02-18 Thread Mike Heffner
Following up from our earlier post...

We have continued to do exhaustive testing and measuring of the numerous
hardware and configuration variables here. What we have uncovered is that
on identical hardware (including the configuration we run in production),
something between versions 2.0.17 and 2.1.13 introduced this write timeout
for our workload. We still aren't any closer to identifying the what or
why, but it is easily reproduced using our workload when we bump to the
2.1.x release line.

At the moment we are going to focus on hardening this new hardware
configuration using the 2.0.17 release and roll it out internally to some
of our production rings. We also want to bisect the 2.1.x release line to
find if there was a particular point release that introduced the timeout.
If anyone has suggestions for particular changes to look out for, we'd be
happy to focus a test on those first.

Thanks,

Mike

On Wed, Feb 10, 2016 at 2:51 PM, Mike Heffner  wrote:

> Hi all,
>
> We've recently embarked on a project to update our Cassandra
> infrastructure running on EC2. We are long time users of 2.0.x and are
> testing out a move to version 2.2.5 running on VPC with EBS. Our test setup
> is a 3 node, RF=3 cluster supporting a small write load (mirror of our
> staging load).
>
> We are writing at QUORUM and while p95's look good compared to our staging
> 2.0.x cluster, we are seeing frequent write operations that time out at the
> max write_request_timeout_in_ms (10 seconds). CPU across the cluster is <
> 10% and EBS write load is < 100 IOPS. Cassandra is running with the Oracle
> JDK 8u60 and we're using G1GC and any GC pauses are less than 500ms.
>
> We run on c4.2xl instances with GP2 EBS attached storage for data and
> commitlog directories. The nodes are using EC2 enhanced networking and have
> the latest Intel network driver module. We are running on HVM instances
> using Ubuntu 14.04.2.
>
> Our schema is 5 tables, all with COMPACT STORAGE. Each table is similar to
> the definition here: https://gist.github.com/mheffner/4d80f6b53ccaa24cc20a
>
> This is our cassandra.yaml:
> https://gist.github.com/mheffner/fea80e6e939dd483f94f#file-cassandra-yaml
>
> Like I mentioned we use 8u60 with G1GC and have used many of the GC
> settings in Al Tobey's tuning guide. This is our upstart config with JVM
> and other CPU settings:
> https://gist.github.com/mheffner/dc44613620b25c4fa46d
>
> We've used several of the sysctl settings from Al's guide as well:
> https://gist.github.com/mheffner/ea40d58f58a517028152
>
> Our client application is able to write using either Thrift batches using
> Asytanax driver or CQL async INSERT's using the Datastax Java driver.
>
> For testing against Thrift (our legacy infra uses this) we write batches
> of anywhere from 6 to 1500 rows at a time. Our p99 for batch execution is
> around 45ms but our maximum (p100) sits less than 150ms except when it
> periodically spikes to the full 10seconds.
>
> Testing the same write path using CQL writes instead demonstrates similar
> behavior. Low p99s except for periodic full timeouts. We enabled tracing
> for several operations but were unable to get a trace that completed
> successfully -- Cassandra started logging many messages as:
>
> INFO  [ScheduledTasks:1] - MessagingService.java:946 - _TRACE messages
> were dropped in last 5000 ms: 52499 for internal timeout and 0 for cross
> node timeout
>
> And all the traces contained rows with a "null" source_elapsed row:
> https://gist.githubusercontent.com/mheffner/1d68a70449bd6688a010/raw/0327d7d3d94c3a93af02b64212e3b7e7d8f2911b/trace.out
>
>
> We've exhausted as many configuration option permutations that we can
> think of. This cluster does not appear to be under any significant load and
> latencies seem to largely fall in two bands: low normal or max timeout.
> This seems to imply that something is getting stuck and timing out at the
> max write timeout.
>
> Any suggestions on what to look for? We had debug enabled for awhile but
> we didn't see any msg that pointed to something obvious. Happy to provide
> any more information that may help.
>
> We are pretty much at the point of sprinkling debug around the code to
> track down what could be blocking.
>
>
> Thanks,
>
> Mike
>
> --
>
>   Mike Heffner 
>   Librato, Inc.
>
>


-- 

  Mike Heffner 
  Librato, Inc.


Re: How Cassandra reduce the size of stored data ?

2016-02-18 Thread Alain RODRIGUEZ
I know of no paper, but here is some information that might be of interest:

 http://www.datastax.com/2015/12/storage-engine-30

Also, Cassandra uses standard compression (LZ4, Snappy, Deflate), depending
on the user's choice:


   - for data storage (a table-level example is sketched after this list) -->
   https://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsAboutConfigCompress.html
   - for commitlogs -->
   https://docs.datastax.com/en/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__commonProps-ph
   - for internode communication -->
   https://docs.datastax.com/en/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__commonProps-ph
   - for client / server communication -->
   http://docs.datastax.com/en/drivers/java/3.0/com/datastax/driver/core/ProtocolOptions.Compression.html

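For the table-level (data storage) setting, a minimal sketch (placeholder
keyspace / table; note the option names differ by version, I believe
'sstable_compression' / 'chunk_length_kb' in 2.x became 'class' /
'chunk_length_in_kb' in 3.x):

-- 3.x style option names
ALTER TABLE my_keyspace.my_table
    WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 64};

-- 2.x equivalent:
--   WITH compression = {'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': 64};
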
C*heers,
-
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com

2016-02-18 10:26 GMT+01:00 Thouraya TH :

> Hi all,
>
> Please, is there a scientific paper about this topic "How Cassandra reduce
> the size of stored data on nodes and exchanged between nodes"?
>
> Thank you so much for help.
> Best Regards.
>


Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Alain RODRIGUEZ
I did the process a few weeks ago and ended up writing a runbook and a
script. I have anonymised it and shared it, FWIW.

https://github.com/arodrime/cassandra-tools/tree/master/remove_disk

It is basic bash. I tried to have the shortest downtime possible, which makes
this a bit more complex, but it allows you to do a lot in parallel and only
a fast operation sequentially, reducing overall operation time.

This worked fine for me, yet I might have made some errors while making it
configurable through variables. Be sure to be around if you decide to run
this. Also, I automated this further by using knife (Chef), as I hate to repeat
ops; this is something you might want to consider.

Hope this is useful,

C*heers,
-
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com

2016-02-18 8:28 GMT+01:00 Anishek Agarwal :

> Hey Branton,
>
> Please do let us know if you face any problems  doing this.
>
> Thanks
> anishek
>
> On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis  > wrote:
>
>> We're about to do the same thing.  It shouldn't be necessary to shut down
>> the entire cluster, right?
>>
>> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli 
>> wrote:
>>
>>>
>>>
>>> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal 
>>> wrote:

 To accomplish this can I just copy the data from disk1 to disk2 with in
 the relevant cassandra home location folders, change the cassanda.yaml
 configuration and restart the node. before starting i will shutdown the
 cluster.

>>>
>>> Yes.
>>>
>>> =Rob
>>>
>>>
>>
>>
>


How Cassandra reduce the size of stored data ?

2016-02-18 Thread Thouraya TH
Hi all,

Please, is there a scientific paper about this topic "How Cassandra reduce
the size of stored data on nodes and exchanged between nodes"?

Thank you so much for help.
Best Regards.


Re: Debugging write timeouts on Cassandra 2.2.5

2016-02-18 Thread Alain RODRIGUEZ
Hi Mike,

What about the output of tpstats? I imagine you have dropped messages
there. Any blocked threads? Could you paste this output here?

Might this be due to some network hiccup in accessing the disks, as they are
EBS? Can you think of any way of checking this? Do you have a lot of GC logs,
and how long are the pauses (use something like: grep -i 'GCInspector'
/var/log/cassandra/system.log)?

Something else you could check is the local write stats, to see if only one
table is affected or if this is keyspace / cluster wide. You can use the metrics
exposed by cassandra, or if you have no dashboards I believe a 'nodetool
cfstats <keyspace> | grep -e 'Table:' -e 'Local'' should give you a rough idea
of local latencies.

Those are just things I would check; I don't have a clue about what is happening
here, but I hope this will help.

C*heers,
-
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com

2016-02-18 5:13 GMT+01:00 Mike Heffner :

> Jaydeep,
>
> No, we don't use any light weight transactions.
>
> Mike
>
> On Wed, Feb 17, 2016 at 6:44 PM, Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> Are you guys using light weight transactions in your write path?
>>
>> On Thu, Feb 11, 2016 at 12:36 AM, Fabrice Facorat <
>> fabrice.faco...@gmail.com> wrote:
>>
>>> Are your commitlog and data on the same disk ? If yes, you should put
>>> commitlogs on a separate disk which don't have a lot of IO.
>>>
>>> Others IO may have great impact impact on your commitlog writing and
>>> it may even block.
>>>
>>> An example of impact IO may have, even for Async writes:
>>>
>>> https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
>>>
>>> 2016-02-11 0:31 GMT+01:00 Mike Heffner :
>>> > Jeff,
>>> >
>>> > We have both commitlog and data on a 4TB EBS with 10k IOPS.
>>> >
>>> > Mike
>>> >
>>> > On Wed, Feb 10, 2016 at 5:28 PM, Jeff Jirsa <
>>> jeff.ji...@crowdstrike.com>
>>> > wrote:
>>> >>
>>> >> What disk size are you using?
>>> >>
>>> >>
>>> >>
>>> >> From: Mike Heffner
>>> >> Reply-To: "user@cassandra.apache.org"
>>> >> Date: Wednesday, February 10, 2016 at 2:24 PM
>>> >> To: "user@cassandra.apache.org"
>>> >> Cc: Peter Norton
>>> >> Subject: Re: Debugging write timeouts on Cassandra 2.2.5
>>> >>
>>> >> Paulo,
>>> >>
>>> >> Thanks for the suggestion, we ran some tests against CMS and saw the
>>> same
>>> >> timeouts. On that note though, we are going to try doubling the
>>> instance
>>> >> sizes and testing with double the heap (even though current usage is
>>> low).
>>> >>
>>> >> Mike
>>> >>
>>> >> On Wed, Feb 10, 2016 at 3:40 PM, Paulo Motta <
>>> pauloricard...@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Are you using the same GC settings as the staging 2.0 cluster? If
>>> not,
>>> >>> could you try using the default GC settings (CMS) and see if that
>>> changes
>>> >>> anything? This is just a wild guess, but there were reports before of
>>> >>> G1-caused instabilities with small heap sizes (< 16GB - see
>>> CASSANDRA-10403
>>> >>> for more context). Please ignore if you already tried reverting back
>>> to CMS.
>>> >>>
>>> >>> 2016-02-10 16:51 GMT-03:00 Mike Heffner :
>>> 
>>>  Hi all,
>>> 
>>>  We've recently embarked on a project to update our Cassandra
>>>  infrastructure running on EC2. We are long time users of 2.0.x and
>>> are
>>>  testing out a move to version 2.2.5 running on VPC with EBS. Our
>>> test setup
>>>  is a 3 node, RF=3 cluster supporting a small write load (mirror of
>>> our
>>>  staging load).
>>> 
>>>  We are writing at QUORUM and while p95's look good compared to our
>>>  staging 2.0.x cluster, we are seeing frequent write operations that
>>> time out
>>>  at the max write_request_timeout_in_ms (10 seconds). CPU across the
>>> cluster
>>>  is < 10% and EBS write load is < 100 IOPS. Cassandra is running
>>> with the
>>>  Oracle JDK 8u60 and we're using G1GC and any GC pauses are less
>>> than 500ms.
>>> 
>>>  We run on c4.2xl instances with GP2 EBS attached storage for data
>>> and
>>>  commitlog directories. The nodes are using EC2 enhanced networking
>>> and have
>>>  the latest Intel network driver module. We are running on HVM
>>> instances
>>>  using Ubuntu 14.04.2.
>>> 
>>>  Our schema is 5 tables, all with COMPACT STORAGE. Each table is
>>> similar
>>>  to the definition here:
>>>  https://gist.github.com/mheffner/4d80f6b53ccaa24cc20a
>>> 
>>>  This is our cassandra.yaml:
>>> 
>>> https://gist.github.com/mheffner/fea80e6e939dd483f94f#file-cassandra-yaml
>>> 
>>>  Like I mentioned we use 8u60 with G1GC and have used many of the GC
>>>  settings in Al Tobey's tuning guide. This is our upstart config
>>> with JVM and
>>>  other CPU settings:
>>> https://gist.github.com/mheffner/dc44613620b25c4fa46d
>>> 
>>>  We've used several of the sysctl settings from Al's guide as well:
>>>  https://gist.github.com/mhe

„Using Timestamp“ Feature

2016-02-18 Thread Matthias Niehoff
Hi,

I have a few questions regarding the „Using timestamp“ feature. I would be
glad if you could help me.

* is the 'using timestamp' feature (and providing statement timestamps)
sufficiently robust and mature to build an application on?
* In a BatchedStatement, can different statements have different
(explicitly provided) timestamps, or is the BatchedStatement's timestamp
used for them all? Is this specified / stable behaviour?
* cqlsh reports a syntax error when I use 'using timestamp' with an update
statement (works with 'insert'). Is there a good reason for this, or is it
a bug?

Thank You
-- 
Matthias Niehoff | IT-Consultant | Agile Software Factory  | Consulting
codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland
tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0)
172.1702676
www.codecentric.de | blog.codecentric.de | www.meettheexperts.de |
www.more4fi.de

Registered office: Solingen | HRB 25917 | Wuppertal Local Court (Amtsgericht)
Management Board: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
Supervisory Board: Patric Fedlmeier (Chairman) . Klaus Jäger . Jürgen Schütz

This e-mail, including any attached files, contains confidential
and/or legally protected information. If you are not the intended recipient
or have received this e-mail in error, please inform
the sender immediately and delete this e-mail and any
attached files. The unauthorized copying, use or opening
of any attached files, as well as the unauthorized forwarding of this e-mail, is
not permitted.