Re: How to Parse raw CQL text?

2018-02-05 Thread Kant Kodali
I just did some trial and error. Looks like this would work

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.apache.cassandra.config.CFMetaData;
import org.apache.cassandra.cql3.CqlLexer;
import org.apache.cassandra.cql3.CqlParser;
import org.apache.cassandra.cql3.statements.CreateTableStatement;
import org.apache.cassandra.cql3.statements.ParsedStatement;

public class Test {

    public static void main(String[] args) throws Exception {
        String stmt = "create table if not exists test_keyspace.my_table "
                + "(field1 text, field2 int, field3 set, field4 map, "
                + "primary key (field1) );";
        ANTLRStringStream stringStream = new ANTLRStringStream(stmt);
        CqlLexer cqlLexer = new CqlLexer(stringStream);
        CommonTokenStream token = new CommonTokenStream(cqlLexer);
        CqlParser parser = new CqlParser(token);
        ParsedStatement query = parser.cqlStatement();
        if (query.getClass().getDeclaringClass() == CreateTableStatement.class) {
            CreateTableStatement.RawStatement cts =
                    (CreateTableStatement.RawStatement) query;
            // unlike RawStatement.prepare(), CFMetaData.compile() does not
            // require the keyspace to exist in the local schema
            CFMetaData
                    .compile(stmt, cts.keyspace())
                    .getColumnMetadata()
                    .values()
                    .stream()
                    .forEach(cd -> System.out.println(cd));
        }
    }
}
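
The other parts I asked about (partition key, clustering columns, clustering order) can be read off the compiled CFMetaData too. A minimal sketch against the same 3.x internals (these accessor names may differ between Cassandra versions):

CFMetaData cfm = CFMetaData.compile(stmt, cts.keyspace());
System.out.println("keyspace = " + cfm.ksName + ", table = " + cfm.cfName);
// partition key columns, in declaration order
cfm.partitionKeyColumns().forEach(cd -> System.out.println("partition key: " + cd.name));
// clustering columns; a ReversedType means that column is ordered DESC
cfm.clusteringColumns().forEach(cd -> System.out.println("clustering: " + cd.name
        + (cd.type instanceof org.apache.cassandra.db.marshal.ReversedType ? " (DESC)" : " (ASC)")));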


On Mon, Feb 5, 2018 at 2:13 PM, Kant Kodali  wrote:

> Hi Anant,
>
> I just have a CQL create table statement as a string, and I want to extract
> all the parts: table name, keyspace name, regular columns, partition key,
> clustering key, clustering order, and so on. That's really it!
>
> Thanks!
>
> On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh 
> wrote:
>
>> I think I understand what you are trying to do … but what is your goal?
>> What do you mean by “use it for different queries”? Maybe you want to emit an
>> event and have an event processor? It seems like you are trying to basically
>> bypass that pattern and instead parse a query and split it into several actions?
>>
>> Did you look into this unit test folder?
>>
>> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/CQLTester.java
>>
>> --
>> Rahul Singh
>> rahul.si...@anant.us
>>
>> Anant Corporation
>>
>> On Feb 5, 2018, 4:06 PM -0500, Kant Kodali , wrote:
>>
>> Hi All,
>>
>> I have a need where I get a raw CQL create table statement as a String,
>> and I need to parse out the keyspace, table name, columns, and so on, so I
>> can use it for various queries and send it to C*. I used the example below
>> from this link. I get the following error, and I thought maybe someone on
>> this mailing list would be more familiar with the internals.
>>
>> Exception in thread "main" org.apache.cassandra.exceptions.ConfigurationException: Keyspace test_keyspace doesn't exist
>> at org.apache.cassandra.cql3.statements.CreateTableStatement$RawStatement.prepare(CreateTableStatement.java:200)
>> at com.hello.world.Test.main(Test.java:23)
>>
>>
>> Here is my code.
>>
>> package com.hello.world;
>>
>> import org.antlr.runtime.ANTLRStringStream;
>> import org.antlr.runtime.CommonTokenStream;
>> import org.apache.cassandra.cql3.CqlLexer;
>> import org.apache.cassandra.cql3.CqlParser;
>> import org.apache.cassandra.cql3.statements.CreateTableStatement;
>> import org.apache.cassandra.cql3.statements.ParsedStatement;
>>
>> public class Test {
>>
>>     public static void main(String[] args) throws Exception {
>>         String stmt = "create table if not exists test_keyspace.my_table (field1 text, field2 int, field3 set, field4 map, primary key (field1) );";
>>         ANTLRStringStream stringStream = new ANTLRStringStream(stmt);
>>         CqlLexer cqlLexer = new CqlLexer(stringStream);
>>         CommonTokenStream token = new CommonTokenStream(cqlLexer);
>>         CqlParser parser = new CqlParser(token);
>>         ParsedStatement query = parser.query();
>>         if (query.getClass().getDeclaringClass() == CreateTableStatement.class) {
>>             CreateTableStatement.RawStatement cts = (CreateTableStatement.RawStatement) query;
>>             System.out.println(cts.keyspace());
>>             System.out.println(cts.columnFamily());
>>             ParsedStatement.Prepared prepared = cts.prepare();
>>             CreateTableStatement cts2 = (CreateTableStatement) prepared.statement;
>>             cts2.getCFMetaData()
>>                 .getColumnMetadata()
>>                 .values()
>>                 .stream()
>>                 .forEach(cd -> System.out.println(cd));
>>         }
>>     }
>> }
>>
>> Thanks!
>>
>>
>


Re: How to Parse raw CQL text?

2018-02-05 Thread Kant Kodali
Hi Anant,

I just have a CQL create table statement as a string, and I want to extract
all the parts: table name, keyspace name, regular columns, partition key,
clustering key, clustering order, and so on. That's really it!

Thanks!

On Mon, Feb 5, 2018 at 1:50 PM, Rahul Singh 
wrote:

> I think I understand what you are trying to do … but what is your goal?
> What do you mean by “use it for different queries”? Maybe you want to emit an
> event and have an event processor? It seems like you are trying to basically
> bypass that pattern and instead parse a query and split it into several actions?
>
> Did you look into this unit test folder?
>
> https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/CQLTester.java
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Feb 5, 2018, 4:06 PM -0500, Kant Kodali , wrote:
>
> Hi All,
>
> I have a need where I get a raw CQL create table statement as a String, and
> I need to parse out the keyspace, table name, columns, and so on, so I can
> use it for various queries and send it to C*. I used the example below from
> this link. I get the following error, and I thought maybe someone on this
> mailing list would be more familiar with the internals.
>
> Exception in thread "main" org.apache.cassandra.exceptions.ConfigurationException: Keyspace test_keyspace doesn't exist
> at org.apache.cassandra.cql3.statements.CreateTableStatement$RawStatement.prepare(CreateTableStatement.java:200)
> at com.hello.world.Test.main(Test.java:23)
>
>
> Here is my code.
>
> package com.hello.world;
>
> import org.antlr.runtime.ANTLRStringStream;
> import org.antlr.runtime.CommonTokenStream;
> import org.apache.cassandra.cql3.CqlLexer;
> import org.apache.cassandra.cql3.CqlParser;
> import org.apache.cassandra.cql3.statements.CreateTableStatement;
> import org.apache.cassandra.cql3.statements.ParsedStatement;
>
> public class Test {
>
>     public static void main(String[] args) throws Exception {
>         String stmt = "create table if not exists test_keyspace.my_table (field1 text, field2 int, field3 set, field4 map, primary key (field1) );";
>         ANTLRStringStream stringStream = new ANTLRStringStream(stmt);
>         CqlLexer cqlLexer = new CqlLexer(stringStream);
>         CommonTokenStream token = new CommonTokenStream(cqlLexer);
>         CqlParser parser = new CqlParser(token);
>         ParsedStatement query = parser.query();
>         if (query.getClass().getDeclaringClass() == CreateTableStatement.class) {
>             CreateTableStatement.RawStatement cts = (CreateTableStatement.RawStatement) query;
>             System.out.println(cts.keyspace());
>             System.out.println(cts.columnFamily());
>             ParsedStatement.Prepared prepared = cts.prepare();
>             CreateTableStatement cts2 = (CreateTableStatement) prepared.statement;
>             cts2.getCFMetaData()
>                 .getColumnMetadata()
>                 .values()
>                 .stream()
>                 .forEach(cd -> System.out.println(cd));
>         }
>     }
> }
>
> Thanks!
>
>


Re: Add column if it does not exist?

2018-02-05 Thread Rahul Singh
Yeah, you can handle the exception — what I meant is that it wouldn't cause harm to
the DB.
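
A minimal sketch of what I mean with the DataStax Java driver 3.x (contact point, keyspace, table, and column names are hypothetical):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.InvalidQueryException;

public class AddColumnIfAbsent {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            try {
                session.execute("ALTER TABLE my_ks.my_table ADD new_col text");
            } catch (InvalidQueryException e) {
                // thrown when the column already exists -- safe to swallow here
            }
        }
    }
}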

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 5, 2018, 5:07 PM -0500, Oliver Ruebenacker , wrote:
> Well, it does throw an InvalidQueryException if the column already exists.
>
> > On Mon, Feb 5, 2018 at 4:44 PM, Rahul Singh  
> > wrote:
> > > Since CQL != SQL, there isn't a syntactical way. Just run the ALTER TABLE
> > > command and it shouldn't be an issue if it's there.
> > >
> > > --
> > > Rahul Singh
> > > rahul.si...@anant.us
> > >
> > > Anant Corporation
> > >
> > > On Feb 5, 2018, 4:15 PM -0500, Oliver Ruebenacker , 
> > > wrote:
> > > >
> > > >  Hello,
> > > >
> > > >   What's the easiest way to add a column to a table but only if it does 
> > > > not exist? Thanks!
> > > >
> > > >  Best, Oliver
> > > >
> > > > --
> > > > Oliver Ruebenacker
> > > > Senior Software Engineer, Diabetes Portal, Broad Institute
> > > >
>
>
>
> --
> Oliver Ruebenacker
> Senior Software Engineer, Diabetes Portal, Broad Institute
>


Re: Add column if it does not exist?

2018-02-05 Thread Oliver Ruebenacker
Well, it does throw an InvalidQueryException if the column already exists.

On Mon, Feb 5, 2018 at 4:44 PM, Rahul Singh 
wrote:

> Since CQL != SQL, there isn't a syntactical way. Just run the ALTER TABLE
> command and it shouldn't be an issue if it's there.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Feb 5, 2018, 4:15 PM -0500, Oliver Ruebenacker ,
> wrote:
>
>
>  Hello,
>
>   What's the easiest way to add a column to a table but only if it does
> not exist? Thanks!
>
>  Best, Oliver
>
> --
> Oliver Ruebenacker
> Senior Software Engineer, Diabetes Portal, Broad Institute
>
>


-- 
Oliver Ruebenacker
Senior Software Engineer, Diabetes Portal, Broad Institute


Re: How to Parse raw CQL text?

2018-02-05 Thread Rahul Singh
I think I understand what you are trying to do … but what is your goal? What do
you mean by “use it for different queries”? Maybe you want to emit an event and
have an event processor? It seems like you are trying to basically bypass that
pattern and instead parse a query and split it into several actions?

Did you look into this unit test folder?

https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/CQLTester.java

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 5, 2018, 4:06 PM -0500, Kant Kodali , wrote:
> Hi All,
>
> I have a need where I get a raw CQL create table statement as a String, and I
> need to parse out the keyspace, table name, columns, and so on, so I can use it
> for various queries and send it to C*. I used the example below from this link.
> I get the following error, and I thought maybe someone on this mailing list
> would be more familiar with the internals.
>
> Exception in thread "main" org.apache.cassandra.exceptions.ConfigurationException: Keyspace test_keyspace doesn't exist
> at org.apache.cassandra.cql3.statements.CreateTableStatement$RawStatement.prepare(CreateTableStatement.java:200)
> at com.hello.world.Test.main(Test.java:23)
>
>
> Here is my code.
>
> package com.hello.world;
>
> import org.antlr.runtime.ANTLRStringStream;
> import org.antlr.runtime.CommonTokenStream;
> import org.apache.cassandra.cql3.CqlLexer;
> import org.apache.cassandra.cql3.CqlParser;
> import org.apache.cassandra.cql3.statements.CreateTableStatement;
> import org.apache.cassandra.cql3.statements.ParsedStatement;
>
> public class Test {
>
>     public static void main(String[] args) throws Exception {
>         String stmt = "create table if not exists test_keyspace.my_table (field1 text, field2 int, field3 set, field4 map, primary key (field1) );";
>         ANTLRStringStream stringStream = new ANTLRStringStream(stmt);
>         CqlLexer cqlLexer = new CqlLexer(stringStream);
>         CommonTokenStream token = new CommonTokenStream(cqlLexer);
>         CqlParser parser = new CqlParser(token);
>         ParsedStatement query = parser.query();
>         if (query.getClass().getDeclaringClass() == CreateTableStatement.class) {
>             CreateTableStatement.RawStatement cts = (CreateTableStatement.RawStatement) query;
>             System.out.println(cts.keyspace());
>             System.out.println(cts.columnFamily());
>             ParsedStatement.Prepared prepared = cts.prepare();
>             CreateTableStatement cts2 = (CreateTableStatement) prepared.statement;
>             cts2.getCFMetaData()
>                 .getColumnMetadata()
>                 .values()
>                 .stream()
>                 .forEach(cd -> System.out.println(cd));
>         }
>     }
> }
> Thanks!


Re: Add column if it does not exist?

2018-02-05 Thread Rahul Singh
Since CQL != SQL, there isn't a syntactical way. Just run the ALTER TABLE
command and it shouldn't be an issue if it's there.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 5, 2018, 4:15 PM -0500, Oliver Ruebenacker , wrote:
>
>  Hello,
>
>   What's the easiest way to add a column to a table but only if it does not 
> exist? Thanks!
>
>  Best, Oliver
>
> --
> Oliver Ruebenacker
> Senior Software Engineer, Diabetes Portal, Broad Institute
>


Add column if it does not exist?

2018-02-05 Thread Oliver Ruebenacker
 Hello,

  What's the easiest way to add a column to a table but only if it does not
exist? Thanks!

 Best, Oliver

-- 
Oliver Ruebenacker
Senior Software Engineer, Diabetes Portal, Broad Institute


How to Parse raw CQL text?

2018-02-05 Thread Kant Kodali
Hi All,

I have a need where I get a raw CQL create table statement as a String, and
I need to parse out the keyspace, table name, columns, and so on, so I can
use it for various queries and send it to C*. I used the example below from
this link. I get the following error, and I thought maybe someone on this
mailing list would be more familiar with the internals.

Exception in thread "main" org.apache.cassandra.exceptions.ConfigurationException: Keyspace test_keyspace doesn't exist
at org.apache.cassandra.cql3.statements.CreateTableStatement$RawStatement.prepare(CreateTableStatement.java:200)
at com.hello.world.Test.main(Test.java:23)


Here is my code.

package com.hello.world;

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.apache.cassandra.cql3.CqlLexer;
import org.apache.cassandra.cql3.CqlParser;
import org.apache.cassandra.cql3.statements.CreateTableStatement;
import org.apache.cassandra.cql3.statements.ParsedStatement;

public class Test {

    public static void main(String[] args) throws Exception {
        String stmt = "create table if not exists test_keyspace.my_table "
                + "(field1 text, field2 int, field3 set, field4 map, "
                + "primary key (field1) );";
        ANTLRStringStream stringStream = new ANTLRStringStream(stmt);
        CqlLexer cqlLexer = new CqlLexer(stringStream);
        CommonTokenStream token = new CommonTokenStream(cqlLexer);
        CqlParser parser = new CqlParser(token);
        ParsedStatement query = parser.query();
        if (query.getClass().getDeclaringClass() == CreateTableStatement.class) {
            CreateTableStatement.RawStatement cts = (CreateTableStatement.RawStatement) query;
            System.out.println(cts.keyspace());
            System.out.println(cts.columnFamily());
            // prepare() validates against the node's live schema, which is
            // what throws the ConfigurationException above
            ParsedStatement.Prepared prepared = cts.prepare();
            CreateTableStatement cts2 = (CreateTableStatement) prepared.statement;
            cts2.getCFMetaData()
                .getColumnMetadata()
                .values()
                .stream()
                .forEach(cd -> System.out.println(cd));
        }
    }
}

Thanks!


Re: Heavy one-off writes best practices

2018-02-05 Thread Romain Hardouin
Hi Julien,

We have such a use case on some clusters. If you want to insert big batches at a
fast pace, the only viable solution is to generate SSTables on the Spark side and
stream them to C*. The last time we benchmarked such a job we achieved 1.3 million
partitions inserted per second on a 3-node C* test cluster, which is impossible
with regular inserts.

Best,
Romain
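
P.S. A minimal sketch of the SSTable-generation step, using the CQLSSTableWriter that ships with Cassandra (schema, paths, and row values are hypothetical; inside a Spark job you would run something like this per task, then stream the output with sstableloader):

import java.io.File;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class BulkWriteSketch {
    public static void main(String[] args) throws Exception {
        String schema = "CREATE TABLE my_ks.events (id text PRIMARY KEY, payload text)";
        String insert = "INSERT INTO my_ks.events (id, payload) VALUES (?, ?)";
        // SSTables are written locally, then streamed to the cluster with:
        //   sstableloader -d <contact point> /tmp/my_ks/events
        try (CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory(new File("/tmp/my_ks/events"))
                .forTable(schema)
                .using(insert)
                .build()) {
            for (int i = 0; i < 1000; i++) {
                writer.addRow(String.valueOf(i), "payload-" + i);
            }
        }
    }
}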
On Monday, 5 February 2018 at 03:54:09 UTC+1, kurt greaves
wrote:
 
 
> Would you know if there is evidence that inserting skinny rows in sorted order
> (no batching) helps C*?

This won't have any effect, as each insert will be handled separately by the
coordinator (or a different coordinator, even). Sorting is also very unlikely
to help even if you did batch.

> Also, in the case of wide rows, is there evidence that sorting clustering keys
> within partition batches helps ease C*'s job?

No evidence; seems very unlikely.

Re: Cassandra 2.1: replace running node without streaming

2018-02-05 Thread Oleksandr Shulgin
On Sat, Feb 3, 2018 at 11:23 AM, Kyrylo Lebediev 
wrote:

> Just tested on 3.11.1 and it worked for me (you may see the logs below).
>
> Just realized that there is one important prerequisite for this method to
> work: the new node MUST be located in the same rack (in terms of C*) as the old
> one. Otherwise the correct replica placement will be violated (I mean
> when replicas of the same token range should be placed in different racks).
>

Correct.

> Anyway, even having a successful run of node replacement in a sandbox, I'm still
> in doubt.
>
> Just wondering why this procedure, which seems to be much easier than
> [add/remove node] or [replace a node] (the documented ways to replace a live
> node), has never been included in the documentation.
>
> Does anybody in the ML know the reason for this?
>

There are a number of reasons why one would need to replace a node.  Losing
a disk would be the most frequent one, I guess.  In that case using
replace_address is the way to go, since it allows you to avoid any
ownership changes.
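
For reference, the replace_address path boils down to a single JVM flag on the
new, empty node before its first start (the value below is a placeholder;
remove the flag again once the node has joined):

# in cassandra-env.sh on the replacement node
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<IP of the node being replaced>"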

At the same time on EC2 you might be replacing nodes in order to apply
security updates to your base machine image, etc.  In this case it is
possible to apply the described procedure to migrate the data to the new
node.  However, given that your nodes are small enough, simply using
replace_address seems like a more straightforward way to me.

> Also, for some reason, in his article Carlos drops the files of the system keyspace
> (which contains the system.local table):
>
> In the new node, delete all system tables except for the schema ones. This
> will ensure that the new Cassandra node will not have any corrupt or
> previous configuration assigned.
>
>    1. sudo cd /var/lib/cassandra/data/system && sudo ls | grep -v schema | xargs -I {} sudo rm -rf {}
>
>
Ah, this sounds like the wrong thing to do. That would remove the system.local
table, which I expect makes the node forget its tokens.

I wouldn't do that: the node's state on disk should be just like after a
normal restart.

--
Alex


Re: Increased latency after setting row_cache_size_in_mb

2018-02-05 Thread Jeff Jirsa
Also: the coordinator handles tracing and read repair. Make sure tracing is off in
production. Keep your data repaired if possible, to eliminate read-repair work.

Use tracing to see what’s taking the time.
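
A minimal sketch of tracing a single query from the client with the DataStax Java driver 3.x (query and contact point are placeholders; enable it only while diagnosing):

import com.datastax.driver.core.*;

public class TraceOneQuery {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            Statement stmt = new SimpleStatement(
                    "SELECT * FROM stresstest.user_to_segment WHERE userid = 'u1'")
                    .enableTracing();  // trace only this statement
            ResultSet rs = session.execute(stmt);
            QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
            System.out.println("total: " + trace.getDurationMicros() + " us");
            for (QueryTrace.Event e : trace.getEvents()) {
                System.out.println(e.getSourceElapsedMicros() + " us  " + e.getDescription());
            }
        }
    }
}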

-- 
Jeff Jirsa


> On Feb 5, 2018, at 6:32 AM, Jeff Jirsa  wrote:
> 
> There are two parts to latency on the Cassandra side:
> 
> Local and coordinator
> 
> When you read, the node to which you connect coordinates the request to the
> node which has the data (potentially itself). The long tail in coordinator
> latencies tends to be the coordinator itself GC'ing, which will happen from
> time to time. If it's more consistently high, it may be natural latency in
> your cluster (i.e. your requests are going cross-WAN and the other DC is
> 10-20ms away).
> 
> If the latency is seen in p99 but not p50, you can almost always 
> speculatively read from another coordinator (driver level speculation) after 
> a millisecond or so.
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Feb 5, 2018, at 5:41 AM, mohsin k  wrote:
>> 
>> Thanks for the response, @Nicolas. I was considering the total read latency
>> from the client to the server (as shown in the image above), which is around
>> 30 ms and which I want to get down to around 3 ms (client and server are on
>> the same network). I did not consider the read latency reported by the server
>> (which I should have). I monitored CPU, memory, and the JVM lifecycle, which
>> are all at safe levels. I think the difference (0.030 ms vs. 30 ms) might be
>> because of low network bandwidth; correct me if I am wrong.
>> 
>> I did reduce chunk_length_in_kb to 4 KB, but I didn't see a considerable
>> difference, perhaps because there is little room left for improvement on the
>> server side.
>> 
>> Thanks again.
>> 
>>> On Mon, Feb 5, 2018 at 6:52 PM, Nicolas Guyomar  
>>> wrote:
>>> Your row hit rate is 0.971, which is already very high; IMHO there is
>>> "nothing" left to do here if you can afford to store your entire dataset in
>>> memory.
>>> 
>>> A local read latency of 0.030 ms already seems good to me; what makes you
>>> think you can achieve more with the relatively "small" box you are using?
>>> 
>>> You have to keep an eye on other metrics which might be a limiting factor,
>>> like CPU usage, JVM heap lifecycle, and so on.
>>> 
>>> For read-heavy workloads it is sometimes advised to reduce
>>> chunk_length_in_kb from the default 64 KB to 4 KB; see if it helps!
>>> 
 On 5 February 2018 at 13:09, mohsin k  wrote:
 Hey Rahul,
 
 Each partition has around 10 clustering keys. Based on nodetool, I can
 roughly estimate the partition size to be less than 1 KB.
 
> On Mon, Feb 5, 2018 at 5:37 PM, mohsin k  
> wrote:
> Hey Nicolas,
> 
> My goal is to reduce latency as much as possible. I did wait for warm-up.
> The test ran for more than 15 minutes; I am not sure why it shows 2 minutes,
> though.
> 
> 
> 
>> On Mon, Feb 5, 2018 at 5:25 PM, Rahul Singh 
>>  wrote:
>> What is the average size of your partitions / rows? 1 GB may not be
>> enough.
>> 
>> Rahul
>> 
>>> On Feb 5, 2018, 6:52 AM -0500, mohsin k , 
>>> wrote:
>>> Hi,
>>> 
>>> I have been looking into different configurations for tuning my Cassandra
>>> servers. Initially I load-tested the server using the cassandra-stress tool
>>> with default configs, then tuned one config at a time to measure the impact
>>> of each change. The first config I tried was setting "row_cache_size_in_mb"
>>> to 1000 (MB) in the yaml and adding caching = {'keys': 'ALL',
>>> 'rows_per_partition': 'ALL'}. After changing these configs, I observed that
>>> latency increased rather than decreased. It would be really helpful to
>>> understand why this is the case and what steps must be taken to decrease
>>> the latency.
>>> 
>>> I am running a cluster with 4 nodes.
>>> 
>>> Following is my schema:
>>> 
>>> CREATE TABLE stresstest.user_to_segment (
>>> userid text,
>>> segmentid text,
>>> PRIMARY KEY (userid, segmentid)
>>> ) WITH CLUSTERING ORDER BY (segmentid DESC)
>>> AND bloom_filter_fp_chance = 0.1
>>> AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
>>> AND comment = 'A table to hold blog segment user relation'
>>> AND compaction = {'class': 
>>> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>>> AND compression = {'chunk_length_in_kb': '64', 'class': 
>>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>> AND crc_check_chance = 1.0
>>> AND dclocal_read_repair_chance = 0.1
>>> AND default_time_to_live = 0
>>> AND gc_grace_seconds = 864000
>>> AND max_index_interval = 2048
>>> AND memtable_flush_period_in_ms = 0
>>> AND min_index_interval = 128
>>> AND read_repair_chance = 0.0
>>> AND speculative_retry = '99PERCENTILE';
>>> 
>>> Following are node specs:
>>> RAM: 4GB

Re: Increased latency after setting row_cache_size_in_mb

2018-02-05 Thread Jeff Jirsa
There are two parts to latency on the Cassandra side:

Local and coordinator

When you read, the node to which you connect coordinates the request to the
node which has the data (potentially itself). The long tail in coordinator
latencies tends to be the coordinator itself GC'ing, which will happen from time
to time. If it's more consistently high, it may be natural latency in your
cluster (i.e. your requests are going cross-WAN and the other DC is 10-20ms
away).

If the latency is seen in p99 but not p50, you can almost always speculatively 
read from another coordinator (driver level speculation) after a millisecond or 
so.
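
A minimal sketch of that driver-level speculation with the DataStax Java driver 3.x (delay and contact point are placeholders; note the driver only speculates on statements marked idempotent):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.ConstantSpeculativeExecutionPolicy;

public class SpeculativeReads {
    public static void main(String[] args) {
        // fire one extra attempt at another coordinator if the first
        // hasn't answered within 1 ms
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withSpeculativeExecutionPolicy(
                        new ConstantSpeculativeExecutionPolicy(1 /* ms */, 1))
                .build()) {
            cluster.connect();
        }
    }
}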

-- 
Jeff Jirsa


> On Feb 5, 2018, at 5:41 AM, mohsin k  wrote:
> 
> Thanks for the response, @Nicolas. I was considering the total read latency
> from the client to the server (as shown in the image above), which is around
> 30 ms and which I want to get down to around 3 ms (client and server are on
> the same network). I did not consider the read latency reported by the server
> (which I should have). I monitored CPU, memory, and the JVM lifecycle, which
> are all at safe levels. I think the difference (0.030 ms vs. 30 ms) might be
> because of low network bandwidth; correct me if I am wrong.
> 
> I did reduce chunk_length_in_kb to 4 KB, but I didn't see a considerable
> difference, perhaps because there is little room left for improvement on the
> server side.
> 
> Thanks again.
> 
>> On Mon, Feb 5, 2018 at 6:52 PM, Nicolas Guyomar  
>> wrote:
>> Your row hit rate is 0.971, which is already very high; IMHO there is
>> "nothing" left to do here if you can afford to store your entire dataset in
>> memory.
>> 
>> A local read latency of 0.030 ms already seems good to me; what makes you
>> think you can achieve more with the relatively "small" box you are using?
>> 
>> You have to keep an eye on other metrics which might be a limiting factor,
>> like CPU usage, JVM heap lifecycle, and so on.
>> 
>> For read-heavy workloads it is sometimes advised to reduce chunk_length_in_kb
>> from the default 64 KB to 4 KB; see if it helps!
>> 
>>> On 5 February 2018 at 13:09, mohsin k  wrote:
>>> Hey Rahul,
>>> 
>>> Each partition has around 10 clustering keys. Based on nodetool, I can
>>> roughly estimate the partition size to be less than 1 KB.
>>> 
 On Mon, Feb 5, 2018 at 5:37 PM, mohsin k  wrote:
 Hey Nicolas,
 
 My goal is to reduce latency as much as possible. I did wait for warm-up.
 The test ran for more than 15 minutes; I am not sure why it shows 2 minutes, though.
 
 
 
> On Mon, Feb 5, 2018 at 5:25 PM, Rahul Singh 
>  wrote:
> What is the average size of your partitions / rows? 1 GB may not be enough.
> 
> Rahul
> 
>> On Feb 5, 2018, 6:52 AM -0500, mohsin k , 
>> wrote:
>> Hi,
>> 
>> I have been looking into different configurations for tuning my Cassandra
>> servers. Initially I load-tested the server using the cassandra-stress tool
>> with default configs, then tuned one config at a time to measure the impact
>> of each change. The first config I tried was setting "row_cache_size_in_mb"
>> to 1000 (MB) in the yaml and adding caching = {'keys': 'ALL',
>> 'rows_per_partition': 'ALL'}. After changing these configs, I observed that
>> latency increased rather than decreased. It would be really helpful to
>> understand why this is the case and what steps must be taken to decrease
>> the latency.
>> 
>> I am running a cluster with 4 nodes.
>> 
>> Following is my schema:
>> 
>> CREATE TABLE stresstest.user_to_segment (
>> userid text,
>> segmentid text,
>> PRIMARY KEY (userid, segmentid)
>> ) WITH CLUSTERING ORDER BY (segmentid DESC)
>> AND bloom_filter_fp_chance = 0.1
>> AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
>> AND comment = 'A table to hold blog segment user relation'
>> AND compaction = {'class': 
>> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>> AND compression = {'chunk_length_in_kb': '64', 'class': 
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>> AND crc_check_chance = 1.0
>> AND dclocal_read_repair_chance = 0.1
>> AND default_time_to_live = 0
>> AND gc_grace_seconds = 864000
>> AND max_index_interval = 2048
>> AND memtable_flush_period_in_ms = 0
>> AND min_index_interval = 128
>> AND read_repair_chance = 0.0
>> AND speculative_retry = '99PERCENTILE';
>> 
>> Following are node specs:
>> RAM: 4GB
>> CPU: 4 Core
>> HDD: 250GB
>> 
>> 
>> Following is the output of 'nodetool info' after setting 
>> row_cache_size_in_mb:
>> 
>> ID : d97dfbbf-1dc3-4d95-a1d9-c9a8d22a3d32
>> Gossip active  : true
>> Thrift active  : false
>> Native Transport active: true
>> Load   : 10.94 MiB
>> Generation No  : 15

Re: Increased latency after setting row_cache_size_in_mb

2018-02-05 Thread Rahul Singh
What are the table-local read latency stats vs. the read request latency stats?
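
For anyone comparing the two, they come from different tools; a sketch, assuming C* 3.x nodetool:

nodetool proxyhistograms                              # coordinator-level request latency
nodetool tablehistograms stresstest user_to_segment   # table-local read latency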


Rahul

On Feb 5, 2018, 8:41 AM -0500, mohsin k , wrote:
> Thanks for the response, @Nicolas. I was considering the total read latency
> from the client to the server (as shown in the image above), which is around
> 30 ms and which I want to get down to around 3 ms (client and server are on
> the same network). I did not consider the read latency reported by the server
> (which I should have). I monitored CPU, memory, and the JVM lifecycle, which
> are all at safe levels. I think the difference (0.030 ms vs. 30 ms) might be
> because of low network bandwidth; correct me if I am wrong.
>
> I did reduce chunk_length_in_kb to 4 KB, but I didn't see a considerable
> difference, perhaps because there is little room left for improvement on the
> server side.
>
> Thanks again.
>
> > On Mon, Feb 5, 2018 at 6:52 PM, Nicolas Guyomar  
> > wrote:
> > > Your row hit rate is 0.971, which is already very high; IMHO there is
> > > "nothing" left to do here if you can afford to store your entire dataset
> > > in memory.
> > >
> > > A local read latency of 0.030 ms already seems good to me; what makes you
> > > think you can achieve more with the relatively "small" box you are
> > > using?
> > >
> > > You have to keep an eye on other metrics which might be a limiting
> > > factor, like CPU usage, JVM heap lifecycle, and so on.
> > >
> > > For read-heavy workloads it is sometimes advised to reduce
> > > chunk_length_in_kb from the default 64 KB to 4 KB; see if it helps!
> > >
> > > > On 5 February 2018 at 13:09, mohsin k  wrote:
> > > > > Hey Rahul,
> > > > >
> > > > > Each partition has around 10 clustering keys. Based on nodetool, I can
> > > > > roughly estimate the partition size to be less than 1 KB.
> > > > >
> > > > > > On Mon, Feb 5, 2018 at 5:37 PM, mohsin k 
> > > > > >  wrote:
> > > > > > > Hey Nicolas,
> > > > > > >
> > > > > > > My goal is to reduce latency as much as possible. I did wait for
> > > > > > > warm-up. The test ran for more than 15 minutes; I am not sure why
> > > > > > > it shows 2 minutes, though.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On Mon, Feb 5, 2018 at 5:25 PM, Rahul Singh 
> > > > > > > >  wrote:
> > > > > > > > > What is the average size of your partitions / rows? 1 GB may
> > > > > > > > > not be enough.
> > > > > > > > >
> > > > > > > > > Rahul
> > > > > > > > >
> > > > > > > > > On Feb 5, 2018, 6:52 AM -0500, mohsin k 
> > > > > > > > > , wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > I have been looking into different configurations for
> > > > > > > > > > tuning my Cassandra servers. Initially I load-tested the
> > > > > > > > > > server using the cassandra-stress tool with default configs,
> > > > > > > > > > then tuned one config at a time to measure the impact of
> > > > > > > > > > each change. The first config I tried was setting
> > > > > > > > > > "row_cache_size_in_mb" to 1000 (MB) in the yaml and adding
> > > > > > > > > > caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}.
> > > > > > > > > > After changing these configs, I observed that latency
> > > > > > > > > > increased rather than decreased. It would be really helpful
> > > > > > > > > > to understand why this is the case and what steps must be
> > > > > > > > > > taken to decrease the latency.
> > > > > > > > > >
> > > > > > > > > > I am running a cluster with 4 nodes.
> > > > > > > > > >
> > > > > > > > > > Following is my schema:
> > > > > > > > > >
> > > > > > > > > > CREATE TABLE stresstest.user_to_segment (
> > > > > > > > > >     userid text,
> > > > > > > > > >     segmentid text,
> > > > > > > > > >     PRIMARY KEY (userid, segmentid)
> > > > > > > > > > ) WITH CLUSTERING ORDER BY (segmentid DESC)
> > > > > > > > > >     AND bloom_filter_fp_chance = 0.1
> > > > > > > > > >     AND caching = {'keys': 'ALL', 'rows_per_partition': 
> > > > > > > > > > 'ALL'}
> > > > > > > > > >     AND comment = 'A table to hold blog segment user 
> > > > > > > > > > relation'
> > > > > > > > > >     AND compaction = {'class': 
> > > > > > > > > > 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
> > > > > > > > > >     AND compression = {'chunk_length_in_kb': '64', 'class': 
> > > > > > > > > > 'org.apache.cassandra.io.compress.LZ4Compressor'}
> > > > > > > > > >     AND crc_check_chance = 1.0
> > > > > > > > > >     AND dclocal_read_repair_chance = 0.1
> > > > > > > > > >     AND default_time_to_live = 0
> > > > > > > > > >     AND gc_grace_seconds = 864000
> > > > > > > > > >     AND max_index_interval = 2048
> > > > > > > > > >     AND memtable_flush_period_in_ms = 0
> > > > > > > > > >     AND min_index_interval = 128
> > > > > > > > > >     AND read_repair_chance = 0.0
> > > > > > > > > >     AND speculative_retry = '99PERCENTILE';
> > > > > > > > > >
> > > > > > > > > > Following are node specs:
> > > > > > > > > > RAM: 4GB
> > > > > > > > > > CPU: 4 Core
> > > > > > > > > > HDD: 250GB
> > > > > > > > > >
> > > > > > > > > >
> 

Re: Increased latency after setting row_cache_size_in_mb

2018-02-05 Thread mohsin k
Thanks for the response, @Nicolas. I was considering the total read latency
from the client to the server (as shown in the image above), which is around
30 ms and which I want to get down to around 3 ms (client and server are on
the same network). I did not consider the read latency reported by the server
(which I should have). I monitored CPU, memory, and the JVM lifecycle, which
are all at safe levels. *I think the difference (0.030 ms vs. 30 ms) might be
because of low network bandwidth; correct me if I am wrong.*

I did reduce chunk_length_in_kb to 4 KB, but I didn't see a considerable
difference, perhaps because there is little room left for improvement on the
server side.

Thanks again.

On Mon, Feb 5, 2018 at 6:52 PM, Nicolas Guyomar 
wrote:

> Your row hit rate is 0.971, which is already very high; IMHO there is
> "nothing" left to do here if you can afford to store your entire dataset in
> memory.
>
> A local read latency of 0.030 ms already seems good to me; what makes you
> think you can achieve more with the relatively "small" box you are using?
>
> You have to keep an eye on other metrics which might be a limiting factor,
> like CPU usage, JVM heap lifecycle, and so on.
>
> For read-heavy workloads it is sometimes advised to reduce chunk_length_in_kb
> from the default 64 KB to 4 KB; see if it helps!
>
> On 5 February 2018 at 13:09, mohsin k  wrote:
>
>> Hey Rahul,
>>
>> Each partition has around 10 clustering keys. Based on nodetool, I can
>> roughly estimate the partition size to be less than 1 KB.
>>
>> On Mon, Feb 5, 2018 at 5:37 PM, mohsin k 
>> wrote:
>>
>>> Hey Nicolas,
>>>
>>> My goal is to reduce latency as much as possible. I did wait for warm-up.
>>> The test ran for more than 15 minutes; I am not sure why it shows 2 minutes, though.
>>>
>>>
>>>
>>> On Mon, Feb 5, 2018 at 5:25 PM, Rahul Singh <
>>> rahul.xavier.si...@gmail.com> wrote:
>>>
 What is the average size of your partitions / rows? 1 GB may not be
 enough.

 Rahul

 On Feb 5, 2018, 6:52 AM -0500, mohsin k ,
 wrote:

 Hi,

 I have been looking into different configurations for tuning my Cassandra
 servers. Initially I load-tested the server using the cassandra-stress tool
 with default configs, then tuned one config at a time to measure the impact
 of each change. The first config I tried was setting "row_cache_size_in_mb"
 to 1000 (MB) in the yaml and adding caching = {'keys': 'ALL',
 'rows_per_partition': 'ALL'}. After changing these configs, I observed that
 latency increased rather than decreased. It would be really helpful to
 understand why this is the case and what steps must be taken to decrease
 the latency.

 I am running a cluster with 4 nodes.

 Following is my schema:

 CREATE TABLE stresstest.user_to_segment (
 userid text,
 segmentid text,
 PRIMARY KEY (userid, segmentid)
 ) WITH CLUSTERING ORDER BY (segmentid DESC)
 AND bloom_filter_fp_chance = 0.1
 AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
 AND comment = 'A table to hold blog segment user relation'
 AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
 AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
 AND crc_check_chance = 1.0
 AND dclocal_read_repair_chance = 0.1
 AND default_time_to_live = 0
 AND gc_grace_seconds = 864000
 AND max_index_interval = 2048
 AND memtable_flush_period_in_ms = 0
 AND min_index_interval = 128
 AND read_repair_chance = 0.0
 AND speculative_retry = '99PERCENTILE';

 Following are node specs:
 RAM: 4GB
 CPU: 4 Core
 HDD: 250GB


 Following is the output of 'nodetool info' after setting
 row_cache_size_in_mb:

 ID : d97dfbbf-1dc3-4d95-a1d9-c9a8d22a3d32
 Gossip active  : true
 Thrift active  : false
 Native Transport active: true
 Load   : 10.94 MiB
 Generation No  : 1517571163
 Uptime (seconds)   : 9169
 Heap Memory (MB)   : 136.01 / 3932.00
 Off Heap Memory (MB)   : 0.10
 Data Center: dc1
 Rack   : rack1
 Exceptions : 0
 Key Cache  : entries 125881, size 9.6 MiB, capacity 100
 MiB, 107 hits, 126004 requests, 0.001 recent hit rate, 14400 save period in
 seconds
 Row Cache  : entries 125861, size 31.54 MiB, capacity 1000
 MiB, 4262684 hits, 4388545 requests, 0.971 recent hit rate, 0 save
 period in seconds
 Counter Cache  : entries 0, size 0 bytes, capacity 50 MiB, 0
 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
 Chunk Cache: entries 273, size 17.06 MiB, capacity 480 MiB,
 325 misses, 126623 requests, 0.997 recent hit rate, NaN microseconds miss latency

Re: Increased latency after setting row_cache_size_in_mb

2018-02-05 Thread Nicolas Guyomar
Your row hit rate is 0.971, which is already very high; IMHO there is
"nothing" left to do here if you can afford to store your entire dataset in
memory.

A local read latency of 0.030 ms already seems good to me; what makes you
think you can achieve more with the relatively "small" box you are using?

You have to keep an eye on other metrics which might be a limiting factor,
like CPU usage, JVM heap lifecycle, and so on.

For read-heavy workloads it is sometimes advised to reduce chunk_length_in_kb
from the default 64 KB to 4 KB; see if it helps!
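
For reference, that is a live schema change (table name taken from this thread; note that existing SSTables keep the old chunk size until they are rewritten by compaction or by nodetool upgradesstables -a):

ALTER TABLE stresstest.user_to_segment
    WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};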

On 5 February 2018 at 13:09, mohsin k  wrote:

> Hey Rahul,
>
> Each partition has around 10 clustering keys. Based on nodetool, I can
> roughly estimate the partition size to be less than 1 KB.
>
> On Mon, Feb 5, 2018 at 5:37 PM, mohsin k 
> wrote:
>
>> Hey Nicolas,
>>
>> My goal is to reduce latency as much as possible. I did wait for warm-up.
>> The test ran for more than 15 minutes; I am not sure why it shows 2 minutes, though.
>>
>>
>>
>> On Mon, Feb 5, 2018 at 5:25 PM, Rahul Singh wrote:
>>
>>> What is the average size of your partitions / rows? 1 GB may not be
>>> enough.
>>>
>>> Rahul
>>>
>>> On Feb 5, 2018, 6:52 AM -0500, mohsin k ,
>>> wrote:
>>>
>>> Hi,
>>>
>>> I have been looking into different configurations for tuning my Cassandra
>>> servers. Initially I load-tested the server using the cassandra-stress tool
>>> with default configs, then tuned one config at a time to measure the impact
>>> of each change. The first config I tried was setting "row_cache_size_in_mb"
>>> to 1000 (MB) in the yaml and adding caching = {'keys': 'ALL',
>>> 'rows_per_partition': 'ALL'}. After changing these configs, I observed that
>>> latency increased rather than decreased. It would be really helpful to
>>> understand why this is the case and what steps must be taken to decrease
>>> the latency.
>>>
>>> I am running a cluster with 4 nodes.
>>>
>>> Following is my schema:
>>>
>>> CREATE TABLE stresstest.user_to_segment (
>>> userid text,
>>> segmentid text,
>>> PRIMARY KEY (userid, segmentid)
>>> ) WITH CLUSTERING ORDER BY (segmentid DESC)
>>> AND bloom_filter_fp_chance = 0.1
>>> AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
>>> AND comment = 'A table to hold blog segment user relation'
>>> AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>>> AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>> AND crc_check_chance = 1.0
>>> AND dclocal_read_repair_chance = 0.1
>>> AND default_time_to_live = 0
>>> AND gc_grace_seconds = 864000
>>> AND max_index_interval = 2048
>>> AND memtable_flush_period_in_ms = 0
>>> AND min_index_interval = 128
>>> AND read_repair_chance = 0.0
>>> AND speculative_retry = '99PERCENTILE';
>>>
>>> Following are node specs:
>>> RAM: 4GB
>>> CPU: 4 Core
>>> HDD: 250GB
>>>
>>>
>>> Following is the output of 'nodetool info' after setting
>>> row_cache_size_in_mb:
>>>
>>> ID : d97dfbbf-1dc3-4d95-a1d9-c9a8d22a3d32
>>> Gossip active  : true
>>> Thrift active  : false
>>> Native Transport active: true
>>> Load   : 10.94 MiB
>>> Generation No  : 1517571163
>>> Uptime (seconds)   : 9169
>>> Heap Memory (MB)   : 136.01 / 3932.00
>>> Off Heap Memory (MB)   : 0.10
>>> Data Center: dc1
>>> Rack   : rack1
>>> Exceptions : 0
>>> Key Cache  : entries 125881, size 9.6 MiB, capacity 100 MiB,
>>> 107 hits, 126004 requests, 0.001 recent hit rate, 14400 save period in
>>> seconds
>>> Row Cache  : entries 125861, size 31.54 MiB, capacity 1000
>>> MiB, 4262684 hits, 4388545 requests, 0.971 recent hit rate, 0 save
>>> period in seconds
>>> Counter Cache  : entries 0, size 0 bytes, capacity 50 MiB, 0
>>> hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
>>> Chunk Cache: entries 273, size 17.06 MiB, capacity 480 MiB,
>>> 325 misses, 126623 requests, 0.997 recent hit rate, NaN microseconds miss
>>> latency
>>> Percent Repaired   : 100.0%
>>> Token  : (invoke with -T/--tokens to see all 256 tokens)
>>>
>>>
>>> Following is output of nodetool cfstats:
>>>
>>> Total number of tables: 37
>>> 
>>> Keyspace : stresstest
>>> Read Count: 4398162
>>> Read Latency: 0.02184742626579012 ms.
>>> Write Count: 0
>>> Write Latency: NaN ms.
>>> Pending Flushes: 0
>>> Table: user_to_segment
>>> SSTable count: 1
>>> SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
>>> Space used (live): 11076103
>>> Space used (total): 11076103
>>> Space used by snapshots (total): 0
>>> Off heap memory used (total): 107981
>>> SSTable Compression Ratio: 0.5123353861375962
>>> Number of partitions (estimate): 125782
>>> Memtable cell count: 0
>>> Memtable data size: 0
>>> Memtable off heap memory used: 0
>>> Memtable switch count: 2
>>> Local read count: 4398162

Re: Increased latency after setting row_cache_size_in_mb

2018-02-05 Thread mohsin k
Hey Rahul,

Each partition has around 10 clustering keys. Based on nodetool, I can roughly
estimate the partition size to be less than 1 KB.
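
(The nodetool numbers I mean are the partition-size percentiles; a sketch, assuming C* 3.x naming: nodetool tablehistograms stresstest user_to_segment.)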

On Mon, Feb 5, 2018 at 5:37 PM, mohsin k  wrote:

> Hey Nicolas,
>
> My goal is to reduce latency as much as possible. I did wait for warm-up.
> The test ran for more than 15 minutes; I am not sure why it shows 2 minutes, though.
>
>
>
> On Mon, Feb 5, 2018 at 5:25 PM, Rahul Singh 
> wrote:
>
>> What is the average size of your partitions / rows? 1 GB may not be enough.
>>
>> Rahul
>>
>> On Feb 5, 2018, 6:52 AM -0500, mohsin k ,
>> wrote:
>>
>> Hi,
>>
>> I have been looking into different configurations for tuning my Cassandra
>> servers. Initially I load-tested the server using the cassandra-stress tool
>> with default configs, then tuned one config at a time to measure the impact
>> of each change. The first config I tried was setting "row_cache_size_in_mb"
>> to 1000 (MB) in the yaml and adding caching = {'keys': 'ALL',
>> 'rows_per_partition': 'ALL'}. After changing these configs, I observed that
>> latency increased rather than decreased. It would be really helpful to
>> understand why this is the case and what steps must be taken to decrease
>> the latency.
>>
>> I am running a cluster with 4 nodes.
>>
>> Following is my schema:
>>
>> CREATE TABLE stresstest.user_to_segment (
>> userid text,
>> segmentid text,
>> PRIMARY KEY (userid, segmentid)
>> ) WITH CLUSTERING ORDER BY (segmentid DESC)
>> AND bloom_filter_fp_chance = 0.1
>> AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
>> AND comment = 'A table to hold blog segment user relation'
>> AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>> AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>> AND crc_check_chance = 1.0
>> AND dclocal_read_repair_chance = 0.1
>> AND default_time_to_live = 0
>> AND gc_grace_seconds = 864000
>> AND max_index_interval = 2048
>> AND memtable_flush_period_in_ms = 0
>> AND min_index_interval = 128
>> AND read_repair_chance = 0.0
>> AND speculative_retry = '99PERCENTILE';
>>
>> Following are node specs:
>> RAM: 4GB
>> CPU: 4 Core
>> HDD: 250GB
>>
>>
>> Following is the output of 'nodetool info' after setting
>> row_cache_size_in_mb:
>>
>> ID : d97dfbbf-1dc3-4d95-a1d9-c9a8d22a3d32
>> Gossip active  : true
>> Thrift active  : false
>> Native Transport active: true
>> Load   : 10.94 MiB
>> Generation No  : 1517571163
>> Uptime (seconds)   : 9169
>> Heap Memory (MB)   : 136.01 / 3932.00
>> Off Heap Memory (MB)   : 0.10
>> Data Center: dc1
>> Rack   : rack1
>> Exceptions : 0
>> Key Cache  : entries 125881, size 9.6 MiB, capacity 100 MiB,
>> 107 hits, 126004 requests, 0.001 recent hit rate, 14400 save period in
>> seconds
>> Row Cache  : entries 125861, size 31.54 MiB, capacity 1000
>> MiB, 4262684 hits, 4388545 requests, 0.971 recent hit rate, 0 save
>> period in seconds
>> Counter Cache  : entries 0, size 0 bytes, capacity 50 MiB, 0
>> hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
>> Chunk Cache: entries 273, size 17.06 MiB, capacity 480 MiB,
>> 325 misses, 126623 requests, 0.997 recent hit rate, NaN microseconds miss
>> latency
>> Percent Repaired   : 100.0%
>> Token  : (invoke with -T/--tokens to see all 256 tokens)
>>
>>
>> Following is output of nodetool cfstats:
>>
>> Total number of tables: 37
>> 
>> Keyspace : stresstest
>> Read Count: 4398162
>> Read Latency: 0.02184742626579012 ms.
>> Write Count: 0
>> Write Latency: NaN ms.
>> Pending Flushes: 0
>> Table: user_to_segment
>> SSTable count: 1
>> SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
>> Space used (live): 11076103
>> Space used (total): 11076103
>> Space used by snapshots (total): 0
>> Off heap memory used (total): 107981
>> SSTable Compression Ratio: 0.5123353861375962
>> Number of partitions (estimate): 125782
>> Memtable cell count: 0
>> Memtable data size: 0
>> Memtable off heap memory used: 0
>> Memtable switch count: 2
>> Local read count: 4398162
>> Local read latency: 0.030 ms
>> Local write count: 0
>> Local write latency: NaN ms
>> Pending flushes: 0
>> Percent repaired: 0.0
>> Bloom filter false positives: 0
>> Bloom filter false ratio: 0.0
>> Bloom filter space used: 79280
>> Bloom filter off heap memory used: 79272
>> Index summary off heap memory used: 26757
>> Compression metadata off heap memory used: 1952
>> Compacted partition minimum bytes: 43
>> Compacted partition maximum bytes: 215
>> Compacted partition mean bytes: 136
>> Average live cells per slice (last five minutes): 5.719932432432432
>> Maximum live cells per slice (last five minutes): 10
>> Average tombstones per slice (last five minutes): 1.0
>> Maximum tombstones per slice (last five minutes): 1

Re: Increased latency after setting row_cache_size_in_mb

2018-02-05 Thread mohsin k
Hey Nicolas,

My goal is to reduce latency as much as possible. I did wait for warm-up.
The test ran for more than 15 minutes; I am not sure why it shows 2 minutes, though.



On Mon, Feb 5, 2018 at 5:25 PM, Rahul Singh 
wrote:

> What is the average size of your partitions / rows? 1 GB may not be enough.
>
> Rahul
>
> On Feb 5, 2018, 6:52 AM -0500, mohsin k ,
> wrote:
>
> Hi,
>
> I have been looking into different configurations for tuning my Cassandra
> servers. Initially I load-tested the server using the cassandra-stress tool
> with default configs, then tuned one config at a time to measure the impact
> of each change. The first config I tried was setting "row_cache_size_in_mb"
> to 1000 (MB) in the yaml and adding caching = {'keys': 'ALL',
> 'rows_per_partition': 'ALL'}. After changing these configs, I observed that
> latency increased rather than decreased. It would be really helpful to
> understand why this is the case and what steps must be taken to decrease
> the latency.
>
> I am running a cluster with 4 nodes.
>
> Following is my schema:
>
> CREATE TABLE stresstest.user_to_segment (
> userid text,
> segmentid text,
> PRIMARY KEY (userid, segmentid)
> ) WITH CLUSTERING ORDER BY (segmentid DESC)
> AND bloom_filter_fp_chance = 0.1
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
> AND comment = 'A table to hold blog segment user relation'
> AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
>
> Following are node specs:
> RAM: 4GB
> CPU: 4 Core
> HDD: 250GB
>
>
> Following is the output of 'nodetool info' after setting
> row_cache_size_in_mb:
>
> ID : d97dfbbf-1dc3-4d95-a1d9-c9a8d22a3d32
> Gossip active  : true
> Thrift active  : false
> Native Transport active: true
> Load   : 10.94 MiB
> Generation No  : 1517571163
> Uptime (seconds)   : 9169
> Heap Memory (MB)   : 136.01 / 3932.00
> Off Heap Memory (MB)   : 0.10
> Data Center: dc1
> Rack   : rack1
> Exceptions : 0
> Key Cache  : entries 125881, size 9.6 MiB, capacity 100 MiB,
> 107 hits, 126004 requests, 0.001 recent hit rate, 14400 save period in
> seconds
> Row Cache  : entries 125861, size 31.54 MiB, capacity 1000
> MiB, 4262684 hits, 4388545 requests, 0.971 recent hit rate, 0 save period
> in seconds
> Counter Cache  : entries 0, size 0 bytes, capacity 50 MiB, 0 hits,
> 0 requests, NaN recent hit rate, 7200 save period in seconds
> Chunk Cache: entries 273, size 17.06 MiB, capacity 480 MiB,
> 325 misses, 126623 requests, 0.997 recent hit rate, NaN microseconds miss
> latency
> Percent Repaired   : 100.0%
> Token  : (invoke with -T/--tokens to see all 256 tokens)
>
>
> Following is output of nodetool cfstats:
>
> Total number of tables: 37
> 
> Keyspace : stresstest
> Read Count: 4398162
> Read Latency: 0.02184742626579012 ms.
> Write Count: 0
> Write Latency: NaN ms.
> Pending Flushes: 0
> Table: user_to_segment
> SSTable count: 1
> SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
> Space used (live): 11076103
> Space used (total): 11076103
> Space used by snapshots (total): 0
> Off heap memory used (total): 107981
> SSTable Compression Ratio: 0.5123353861375962
> Number of partitions (estimate): 125782
> Memtable cell count: 0
> Memtable data size: 0
> Memtable off heap memory used: 0
> Memtable switch count: 2
> Local read count: 4398162
> Local read latency: 0.030 ms
> Local write count: 0
> Local write latency: NaN ms
> Pending flushes: 0
> Percent repaired: 0.0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 79280
> Bloom filter off heap memory used: 79272
> Index summary off heap memory used: 26757
> Compression metadata off heap memory used: 1952
> Compacted partition minimum bytes: 43
> Compacted partition maximum bytes: 215
> Compacted partition mean bytes: 136
> Average live cells per slice (last five minutes): 5.719932432432432
> Maximum live cells per slice (last five minutes): 10
> Average tombstones per slice (last five minutes): 1.0
> Maximum tombstones per slice (last five minutes): 1
> Dropped Mutations: 0
>
>  Following are my results:
>  The blue graph is before setting row_cache_size_in_mb,
> orange is after
>
> Thanks,
> Mohsin
>
>

Re: Increased latency after setting row_cache_size_in_mb

2018-02-05 Thread Rahul Singh
What is the average size of your partitions / rows? 1 GB may not be enough.

Rahul

On Feb 5, 2018, 6:52 AM -0500, mohsin k , wrote:
> Hi,
>
> I have been looking into different configurations for tuning my Cassandra
> servers. Initially I load-tested the server using the cassandra-stress tool
> with default configs, then tuned one config at a time to measure the impact
> of each change. The first config I tried was setting "row_cache_size_in_mb"
> to 1000 (MB) in the yaml and adding caching = {'keys': 'ALL',
> 'rows_per_partition': 'ALL'}. After changing these configs, I observed that
> latency increased rather than decreased. It would be really helpful to
> understand why this is the case and what steps must be taken to decrease
> the latency.
>
> I am running a cluster with 4 nodes.
>
> Following is my schema:
>
> CREATE TABLE stresstest.user_to_segment (
>     userid text,
>     segmentid text,
>     PRIMARY KEY (userid, segmentid)
> ) WITH CLUSTERING ORDER BY (segmentid DESC)
>     AND bloom_filter_fp_chance = 0.1
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
>     AND comment = 'A table to hold blog segment user relation'
>     AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
>
> Following are node specs:
> RAM: 4GB
> CPU: 4 Core
> HDD: 250GB
>
>
> Following is the output of 'nodetool info' after setting row_cache_size_in_mb:
>
> ID                     : d97dfbbf-1dc3-4d95-a1d9-c9a8d22a3d32
> Gossip active          : true
> Thrift active          : false
> Native Transport active: true
> Load                   : 10.94 MiB
> Generation No          : 1517571163
> Uptime (seconds)       : 9169
> Heap Memory (MB)       : 136.01 / 3932.00
> Off Heap Memory (MB)   : 0.10
> Data Center            : dc1
> Rack                   : rack1
> Exceptions             : 0
> Key Cache              : entries 125881, size 9.6 MiB, capacity 100 MiB, 107 
> hits, 126004 requests, 0.001 recent hit rate, 14400 save period in seconds
> Row Cache              : entries 125861, size 31.54 MiB, capacity 1000 MiB, 
> 4262684 hits, 4388545 requests, 0.971 recent hit rate, 0 save period in 
> seconds
> Counter Cache          : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 
> requests, NaN recent hit rate, 7200 save period in seconds
> Chunk Cache            : entries 273, size 17.06 MiB, capacity 480 MiB, 325 
> misses, 126623 requests, 0.997 recent hit rate, NaN microseconds miss latency
> Percent Repaired       : 100.0%
> Token                  : (invoke with -T/--tokens to see all 256 tokens)
>
>
> Following is output of nodetool cfstats:
>
> Total number of tables: 37
> 
> Keyspace : stresstest
> Read Count: 4398162
> Read Latency: 0.02184742626579012 ms.
> Write Count: 0
> Write Latency: NaN ms.
> Pending Flushes: 0
> Table: user_to_segment
> SSTable count: 1
> SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
> Space used (live): 11076103
> Space used (total): 11076103
> Space used by snapshots (total): 0
> Off heap memory used (total): 107981
> SSTable Compression Ratio: 0.5123353861375962
> Number of partitions (estimate): 125782
> Memtable cell count: 0
> Memtable data size: 0
> Memtable off heap memory used: 0
> Memtable switch count: 2
> Local read count: 4398162
> Local read latency: 0.030 ms
> Local write count: 0
> Local write latency: NaN ms
> Pending flushes: 0
> Percent repaired: 0.0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 79280
> Bloom filter off heap memory used: 79272
> Index summary off heap memory used: 26757
> Compression metadata off heap memory used: 1952
> Compacted partition minimum bytes: 43
> Compacted partition maximum bytes: 215
> Compacted partition mean bytes: 136
> Average live cells per slice (last five minutes): 5.719932432432432
> Maximum live cells per slice (last five minutes): 10
> Average tombstones per slice (last five minutes): 1.0
> Maximum tombstones per slice (last five minutes): 1
> Dropped Mutations: 0
>
>                  Following are my results:
>                  The blue graph is before setting row_cache_size_in_mb, 
> orange is after
>
> Thanks,
> Mohsin
>
>


Re: Increased latency after setting row_cache_size_in_mb

2018-02-05 Thread Nicolas Guyomar
Hi,

Could you explain a bit more what you are trying to achieve, please?
Performance tuning is by far the most complex problem we have to deal with,
and there are a lot of configuration changes that can be made on a C*
cluster.

When you do performance tuning, do not forget that you need to warm up the C*
JVM. Judging from the provided graph, it seems to me that your test ran for
2 minutes, which is really too short.
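
For example, a hypothetical invocation that keeps the load running long enough
for the JVM to warm up before you compare numbers (adjust to your own schema
and nodes):

cassandra-stress read duration=15m -rate threads=50 -node 127.0.0.1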


On 5 February 2018 at 08:16, mohsin k  wrote:

> Hi,
>
> I have been looking into different configurations for tuning my Cassandra
> servers. Initially I load-tested the server using the cassandra-stress tool
> with default configs, then tuned one config at a time to measure the impact
> of each change. The first config I tried was setting "row_cache_size_in_mb"
> to 1000 (MB) in the yaml and adding caching = {'keys': 'ALL',
> 'rows_per_partition': 'ALL'}. After changing these configs, I observed that
> latency increased rather than decreased. It would be really helpful to
> understand why this is the case and what steps must be taken to decrease
> the latency.
>
> I am running a cluster with 4 nodes.
>
> Following is my schema:
>
> CREATE TABLE stresstest.user_to_segment (
> userid text,
> segmentid text,
> PRIMARY KEY (userid, segmentid)
> ) WITH CLUSTERING ORDER BY (segmentid DESC)
> AND bloom_filter_fp_chance = 0.1
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
> AND comment = 'A table to hold blog segment user relation'
> AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
>
> Following are node specs:
> RAM: 4GB
> CPU: 4 Core
> HDD: 250GB
>
>
> Following is the output of 'nodetool info' after setting
> row_cache_size_in_mb:
>
> ID : d97dfbbf-1dc3-4d95-a1d9-c9a8d22a3d32
> Gossip active  : true
> Thrift active  : false
> Native Transport active: true
> Load   : 10.94 MiB
> Generation No  : 1517571163
> Uptime (seconds)   : 9169
> Heap Memory (MB)   : 136.01 / 3932.00
> Off Heap Memory (MB)   : 0.10
> Data Center: dc1
> Rack   : rack1
> Exceptions : 0
> Key Cache  : entries 125881, size 9.6 MiB, capacity 100 MiB,
> 107 hits, 126004 requests, 0.001 recent hit rate, 14400 save period in
> seconds
> Row Cache  : entries 125861, size 31.54 MiB, capacity 1000
> MiB, 4262684 hits, 4388545 requests, 0.971 recent hit rate, 0 save period
> in seconds
> Counter Cache  : entries 0, size 0 bytes, capacity 50 MiB, 0 hits,
> 0 requests, NaN recent hit rate, 7200 save period in seconds
> Chunk Cache: entries 273, size 17.06 MiB, capacity 480 MiB,
> 325 misses, 126623 requests, 0.997 recent hit rate, NaN microseconds miss
> latency
> Percent Repaired   : 100.0%
> Token  : (invoke with -T/--tokens to see all 256 tokens)
>
>
> Following is output of nodetool cfstats:
>
> Total number of tables: 37
> 
> Keyspace : stresstest
> Read Count: 4398162
> Read Latency: 0.02184742626579012 ms.
> Write Count: 0
> Write Latency: NaN ms.
> Pending Flushes: 0
> Table: user_to_segment
> SSTable count: 1
> SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
> Space used (live): 11076103
> Space used (total): 11076103
> Space used by snapshots (total): 0
> Off heap memory used (total): 107981
> SSTable Compression Ratio: 0.5123353861375962
> Number of partitions (estimate): 125782
> Memtable cell count: 0
> Memtable data size: 0
> Memtable off heap memory used: 0
> Memtable switch count: 2
> Local read count: 4398162
> Local read latency: 0.030 ms
> Local write count: 0
> Local write latency: NaN ms
> Pending flushes: 0
> Percent repaired: 0.0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 79280
> Bloom filter off heap memory used: 79272
> Index summary off heap memory used: 26757
> Compression metadata off heap memory used: 1952
> Compacted partition minimum bytes: 43
> Compacted partition maximum bytes: 215
> Compacted partition mean bytes: 136
> Average live cells per slice (last five minutes): 5.719932432432432
> Maximum live cells per slice (last five minutes): 10
> Average tombstones per slice (last five minutes): 1.0
> Maximum tombstones per slice (last five minutes): 1
> Dropped Mutations: 0
>
> Following are my results:
> The blue graph is before setting row_cache_size_in_mb, orange is after.
>
> Thanks,
> Mohsin
>
>
>
> 

Re: How to start cluster after the yaml file has been changed?

2018-02-05 Thread Nicolas Guyomar
Hi,

You have an invalid yaml file: file:/etc/cassandra/cassandra.yaml. I suppose
what you just changed is not valid YAML; pay attention to the space between
the colon and the value.

You can use any tool like https://jsonformatter.org/yaml-formatter to help
fix this issue.
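
If you want to check the file from code before restarting the node, here is
a minimal sketch using SnakeYAML (the library Cassandra itself parses its
config with). It assumes snakeyaml is on your classpath, and it only proves
the file is syntactically valid YAML, not that Cassandra will accept every
value:

import java.io.FileInputStream;
import java.io.InputStream;
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.error.YAMLException;

public class CassandraYamlCheck {

    public static void main(String[] args) throws Exception {
        // Default package location, matching the path in your logs
        try (InputStream in = new FileInputStream("/etc/cassandra/cassandra.yaml")) {
            Object config = new Yaml().load(in);
            System.out.println("Parsed OK, top-level type: "
                    + (config == null ? "empty" : config.getClass().getName()));
        } catch (YAMLException e) {
            // SnakeYAML reports the line and column of the first syntax error
            System.err.println("Invalid yaml: " + e.getMessage());
        }
    }
}

Since your stack trace complains about seed_provider, the usual culprit is
indentation or a missing space after a colon somewhere in that block, so
compare it against the same section of a pristine cassandra.yaml.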

On 5 February 2018 at 09:28, milenko markovic <
milenko...@yahoo.co.uk.invalid> wrote:

> I have changed the yaml config file. Now I want to start my cassandra
> cluster. I have restarted the node:
>  cassandra.service - LSB: distributed storage system for structured data
>Loaded: loaded (/etc/init.d/cassandra; bad; vendor preset: enabled)
>Active: active (running) since ma. 2018-02-05 09:27:05 CET; 2s ago
>  Docs: man:systemd-sysv-generator(8)
>   Process: 4854 ExecStop=/etc/init.d/cassandra stop (code=exited,
> status=0/SUCCESS)
>   Process: 4863 ExecStart=/etc/init.d/cassandra start (code=exited,
> status=0/SUCCESS)
>CGroup: /system.slice/cassandra.service
>└─4944 java -Xloggc:/var/log/cassandra/gc.log -ea
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 
> -XX:+HeapDumpOnOutOfMemoryError
> -Xss256k -XX:StringTableSize=103 -XX:+AlwaysPreTouch -
>
> If I go for
>
> sudo -u milenko /usr/sbin/cassandra
>
> INFO  [main] 2018-02-05 09:27:31,681 YamlConfigurationLoader.java:89 -
> Configuration location: file:/etc/cassandra/cassandra.yaml
> Exception (org.apache.cassandra.exceptions.ConfigurationException)
> encountered during startup: Invalid yaml: file:/etc/cassandra/cassandra.
> yaml
>  Error: null; Can't construct a java object for tag:yaml.org
> ,2002:org.apache.cassandra.config.Config; exception=Cannot create
> property=seed_provider for JavaBean=org.apache.cassandra.
> config.Config@551bdc27; java.lang.reflect.InvocationTargetException;  in
> 'reader', line 10, column 1:
> cluster_name: 'Petter Cluster'
> ^
> Invalid yaml: file:/etc/cassandra/cassandra.yaml
>  Error: null; Can't construct a java object for tag:yaml.org
> ,2002:org.apache.cassandra.config.Config; exception=Cannot create
> property=seed_provider for JavaBean=org.apache.cassandra.
> config.Config@551bdc27; java.lang.reflect.InvocationTargetException;  in
> 'reader', line 10, column 1:
> cluster_name: 'Petter Cluster'
> ^
> ERROR [main] 2018-02-05 09:27:32,005 CassandraDaemon.java:706 - Exception
> encountered during startup: Invalid yaml: file:/etc/cassandra/cassandra.
> yaml
>  Error: null; Can't construct a java object for tag:yaml.org
> ,2002:org.apache.cassandra.config.Config; exception=Cannot create
> property=seed_provider for JavaBean=org.apache.cassandra.
> config.Config@551bdc27; java.lang.reflect.InvocationTargetException;  in
> 'reader', line 10, column 1:
> cluster_name: 'Petter Cluster'
>
> How to fix this?
>
>
>
>


How to start cluster after the yaml file has been changed?

2018-02-05 Thread milenko markovic
I have changed the yaml config file. Now I want to start my cassandra cluster.
I have restarted the node:

 cassandra.service - LSB: distributed storage system for structured data
   Loaded: loaded (/etc/init.d/cassandra; bad; vendor preset: enabled)
   Active: active (running) since ma. 2018-02-05 09:27:05 CET; 2s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 4854 ExecStop=/etc/init.d/cassandra stop (code=exited, status=0/SUCCESS)
  Process: 4863 ExecStart=/etc/init.d/cassandra start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/cassandra.service
           └─4944 java -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103 -XX:+AlwaysPreTouch -

If I go for

sudo -u milenko /usr/sbin/cassandra

INFO  [main] 2018-02-05 09:27:31,681 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml
Exception (org.apache.cassandra.exceptions.ConfigurationException) encountered during startup: Invalid yaml: file:/etc/cassandra/cassandra.yaml
 Error: null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=seed_provider for JavaBean=org.apache.cassandra.config.Config@551bdc27; java.lang.reflect.InvocationTargetException;  in 'reader', line 10, column 1:
    cluster_name: 'Petter Cluster'
    ^
Invalid yaml: file:/etc/cassandra/cassandra.yaml
 Error: null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=seed_provider for JavaBean=org.apache.cassandra.config.Config@551bdc27; java.lang.reflect.InvocationTargetException;  in 'reader', line 10, column 1:
    cluster_name: 'Petter Cluster'
    ^
ERROR [main] 2018-02-05 09:27:32,005 CassandraDaemon.java:706 - Exception encountered during startup: Invalid yaml: file:/etc/cassandra/cassandra.yaml
 Error: null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=seed_provider for JavaBean=org.apache.cassandra.config.Config@551bdc27; java.lang.reflect.InvocationTargetException;  in 'reader', line 10, column 1:
    cluster_name: 'Petter Cluster'

How to fix this?