Fwd: store avro to cassandra

2015-11-05 Thread Lu Niu
Hi, cassandra users

My data is in Avro format and the schema is huge. Is there any way to
automatically convert the Avro schema into a schema that Cassandra can use?
Also, is there an API I can use to store and fetch the data? Thank you!

Best,
Lu
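
There was no built-in Avro-to-CQL converter at the time, so mapping usually
meant code of your own. Below is a minimal, hypothetical sketch of generating
a CREATE TABLE statement from an Avro record schema with the Avro Java
library; the type mapping, the text fallback, and treating the first field as
the partition key are all illustrative assumptions, not a definitive
implementation.

import org.apache.avro.Schema;

public class AvroToCql {
    // Hypothetical minimal mapping from Avro primitive types to CQL types.
    static String cqlType(Schema.Type t) {
        switch (t) {
            case STRING:  return "text";
            case INT:     return "int";
            case LONG:    return "bigint";
            case FLOAT:   return "float";
            case DOUBLE:  return "double";
            case BOOLEAN: return "boolean";
            case BYTES:   return "blob";
            default:      return "text"; // unions/records/arrays need real handling
        }
    }

    // Build a CREATE TABLE statement, assuming the first field is the partition key.
    static String createTable(String keyspace, Schema avro) {
        StringBuilder sb = new StringBuilder(
            "CREATE TABLE " + keyspace + "." + avro.getName() + " (");
        for (Schema.Field f : avro.getFields()) {
            sb.append(f.name()).append(' ')
              .append(cqlType(f.schema().getType())).append(", ");
        }
        sb.append("PRIMARY KEY (").append(avro.getFields().get(0).name()).append("))");
        return sb.toString();
    }
}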


Re: Re: why cassandra max is 20000/s on a node ?

2015-11-05 Thread 郝加来
hi,
are writes to the same partition key on a single node atomic and isolated?
Sorry, I haven't read the source code, but I think Cassandra is single-threaded
on the same keyspace, not the same partition key, and the same keyspace is
atomic and isolated.
Because when the client inserts data into tables a and b, the throughput on
table a drops, and the sum of a + b is still 20000/s.




郝加来

From: Graham Sanderson
Date: 2015-11-06 11:06
To: user@cassandra.apache.org
Subject: Re: why cassandra max is 20000/s on a node ?
Agreed too. It also matters what you are inserting… if you are inserting to the 
same (or small set of) partition key(s) you will be limited because writes to 
the same partition key on a single node are atomic and isolated.


On Thu, Nov 5, 2015 at 9:05 AM, 郝加来  wrote:

hi everyone
I set up Cassandra 2.2.3 on a single node. The machine's environment is
OpenJDK 1.8.0, 512 GB memory, a 128-core CPU, and a 3 TB SSD.
The node has 256 tokens. The program uses DataStax driver 2.1.8 with 5 threads
to insert data into Cassandra on the same machine; the data is 6 GB in size
and 1157000 lines.

Why is the throughput only 20000/s on the node?

# Per-thread stack size.
JVM_OPTS="$JVM_OPTS -Xss512k"

# Larger interned string table, for gossip's benefit (CASSANDRA-6410)
JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"

# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+CMSIncrementalMode"
JVM_OPTS="$JVM_OPTS -XX:+DisableExplicitGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSConcurrentMTEnabled"
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"  
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:CompileCommandFile=$CASSANDRA_CONF/hotspot_compiler"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=60000"

memtable_heap_space_in_mb: 1024
memtable_offheap_space_in_mb: 10240
memtable_cleanup_threshold: 0.55 
memtable_allocation_type: heap_buffers 



That is all.
Thank you.



 
郝加来
Financial East China Business Unit
Neusoft Corporation
Neusoft Software Park, 1000 Ziyue Road, Minhang District, Shanghai
Postcode: 200241
Email: ha...@neusoft.com
http://www.neusoft.com

Re: why cassandra max is 20000/s on a node ?

2015-11-05 Thread Graham Sanderson
Also, it sounds like you are reading the data from a single file - the problem
could easily be with your load tool.

Try (as someone suggested) using cassandra-stress.
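
For reference, a hypothetical invocation of the stress tool that ships with
Cassandra 2.1+ (the op count, rate, and node address would all need tuning for
a real comparison):

cassandra-stress write n=1000000 cl=ONE -rate threads=200 -node 127.0.0.1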

> On Nov 5, 2015, at 9:06 PM, Graham Sanderson  wrote:
> 
> Agreed too. It also matters what you are inserting… if you are inserting to 
> the same (or small set of) partition key(s) you will be limited because 
> writes to the same partition key on a single node are atomic and isolated.

Re: why cassandra max is 20000/s on a node ?

2015-11-05 Thread Graham Sanderson
Agreed too. It also matters what you are inserting… if you are inserting to the 
same (or small set of) partition key(s) you will be limited because writes to 
the same partition key on a single node are atomic and isolated.

> On Nov 5, 2015, at 8:49 PM, Venkatesh Arivazhagan  
> wrote:
> 
> I agree with Tyler! Have you tried increasing the client threads from 5
> to a higher number?

Re: Re: why cassandra max is 20000/s on a node ?

2015-11-05 Thread Venkatesh Arivazhagan
I agree with Tyler! Have you tried increasing the client threads from 5
to a higher number?
On Nov 5, 2015 6:46 PM, "郝加来"  wrote:

> right,
> but we want a node's throughput to be above a million, so if the system has
> fifty tables, a single table can achieve 20000/s.

Re: Re: why cassandra max is 20000/s on a node ?

2015-11-05 Thread 郝加来
right,
but we want a node's throughput to be above a million, so if the system has
fifty tables, a single table can achieve 20000/s.





郝加来

From: Eric Stevens
Date: 2015-11-05 23:56
To: user@cassandra.apache.org
Subject: Re: why cassandra max is 20000/s on a node ?
> 512G memory , 128core cpu


This seems dramatically oversized for a Cassandra node.  You'd do much better 
to have a much larger cluster of much smaller nodes.






Re: Re: why cassandra max is 20000/s on a node ?

2015-11-05 Thread 郝加来
> Cassandra is designed for clusters with lots of nodes
Right, I know that, but is a single node's throughput only 20000/s? And is the
total throughput across all tables only 20000/s?
So I think a single thread is handling the commands for all of the tables.
Normally, a database's total throughput across all of its tables is above
200,000/s.





郝加来



Re: Does nodetool cleanup clears tombstones in the CF?

2015-11-05 Thread K F
Thanks Rob, I will look into the checksstablegarbage utility. However, I don't
want to run a major compaction, as that would result in too big an sstable.
Regards,
K F

What are the repercussions of a restart during anticompaction?

2015-11-05 Thread Bryan Cheng
Hey list,

Tried to find an answer to this elsewhere, but turned up nothing.

We ran our first incremental repair two days ago, after a large DC migration;
the cluster had been running full repairs prior to this, during the migration.
Our nodes are currently going through anticompaction, as expected.

However, two days later, there is little to no apparent progress on this
process. The compaction count does increase, in bursts, but compactionstats
hangs with no response. We're seeing our disk space footprint grow steadily
as well. The number of sstables on disk is reaching high levels.

In the past, when our compactions seem to hang, a restart seems to move
things along; at the very least, it seems to allow JMX to respond. However,
I'm not sure of the repercussions of a restart during anticompaction.

Given my understanding of anticompaction, my expectation would be that the
sstables that had been split and marked repaired would remain that way, the
ones that had not yet been split would be left as unrepaired and some
ranges would probably be re-repaired on the next incremental repair, and
the machine would do standard compaction among the two sets (repaired vs
unrepaired). In other words, we wouldn't lose any progress in incremental
repair + anticompaction, but some repaired data would get re-repaired. Does
this seem reasonable?

Should I just let this anticompaction run its course? We did the migration
procedure (marking sstables as repaired) a while ago, but did a full repair
again after that, before we decommissioned our old DC.

Any guidance would be appreciated! Thanks,

Bryan


cassandra-stress and "op rate"

2015-11-05 Thread Herbert Fischer
Hi,

I'm doing some hardware benchmarks for Cassandra and trying to figure out
what is the best setup with the hardware options I have.

I'm testing a single-node Cassandra with three different setups:

- 1 HDD for the commit log and 6 HDDs for data
- 1 HDD for the commit log and 1 HDD for data
- 1 HDD for both the commit log and data

The load generator is more powerful in terms of CPU than the Cassandra
server, so I'm able to saturate the server's CPU with just one load
generator.

So far I have gotten really unexpected results: a higher "op rate" for the
setup with fewer HDDs and, even more surprisingly, the highest with just one
HDD.

So my question is: what does the "op rate" in the results of the
cassandra-stress tool actually mean?


Thanks in advance!

-- 
Herbert Fischer


Re: Does nodetool cleanup clears tombstones in the CF?

2015-11-05 Thread Robert Coli
On Wed, Nov 4, 2015 at 12:56 PM, K F  wrote:

> Quick question, in order for me to purge tombstones on particular nodes if
> I run nodetool cleanup   will that help in
> purging the tombstones from that node?
>

cleanup is for removing data from ranges the node no longer owns.

It is unrelated to tombstones.

There are various approaches to cleaning up tombstones. A simple (if
manual) one is to use "checksstablegarbage" and user defined compaction.
Even simpler is to run a major compaction, but this has some downsides.

=Rob
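
As a rough sketch of the user defined compaction route (the MBean object name
is Cassandra's CompactionManager; the sstable file name is made up, and the
single-String operation signature matches 2.x - verify both against your
version):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class UserDefinedCompaction {
    public static void main(String[] args) throws Exception {
        // Connect to Cassandra's JMX port (7199 by default).
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName cm = new ObjectName(
                "org.apache.cassandra.db:type=CompactionManager");
            // Compact only the named sstable(s): comma-separated -Data.db names.
            mbs.invoke(cm, "forceUserDefinedCompaction",
                new Object[]{"myks-mytable-ka-1234-Data.db"},
                new String[]{"java.lang.String"});
        }
    }
}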


Re: why cassandra max is 20000/s on a node ?

2015-11-05 Thread Tyler Hobbs
>
> the program uses DataStax driver 2.1.8 with 5 threads to insert data into
> Cassandra on the same machine


The client with five threads is probably your bottleneck.  Try running the
cassandra stress tool for comparison.  You should see at least double the
throughput.
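
As a hedged illustration (the keyspace, table, and concurrency cap below are
made up), the Java driver can also keep far more than five writes in flight
from a single thread with executeAsync:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import java.util.concurrent.Semaphore;

public class AsyncWriter {
    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("myks");
        PreparedStatement ps = session.prepare(
            "INSERT INTO mytable (id, payload) VALUES (?, ?)");
        // Allow up to 256 writes in flight instead of 5 blocking threads.
        Semaphore inFlight = new Semaphore(256);
        for (long i = 0; i < 1000000; i++) {
            inFlight.acquire();
            ResultSetFuture f = session.executeAsync(ps.bind(i, "row-" + i));
            // Release the permit when the write completes (success or failure).
            f.addListener(inFlight::release, Runnable::run);
        }
        inFlight.acquire(256); // wait for the outstanding writes to finish
        cluster.close();
    }
}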



-- 
Tyler Hobbs
DataStax 


Re: Replication Factor Change

2015-11-05 Thread Yulian Oifa
Hello,
OK, I got it, so I should set CL to ALL for reads; otherwise data may be
retrieved from nodes that do not yet have the current record.
Thanks for the help.
Yulian Oifa

On Thu, Nov 5, 2015 at 5:33 PM, Eric Stevens  wrote:

> If you switch reads to CL=ALL, you should be able to increase RF,
> then run repair, and after repair is complete, go back to your old
> consistency level.  However, while you're operating at ALL consistency, you
> have no tolerance for a node failure (but at RF=1 you already have no
> tolerance for a node failure, so that doesn't really change your
> availability model).


Re: why cassandra max is 20000/s on a node ?

2015-11-05 Thread Eric Stevens
> 512G memory , 128core cpu

This seems dramatically oversized for a Cassandra node.  You'd do *much* better
to have a much larger cluster of much smaller nodes.



Re: Cassandra 2.0 Batch Statement for timeseries schema

2015-11-05 Thread Eric Stevens
If you're talking about logged batches, these absolutely have an impact on
performance of about 30%.  The whole batch will succeed or fail as a unit,
but throughput will go down and load will go up.  Keep in mind that logged
batches are atomic but are not isolated - i.e. it's totally possible to get
a dirty read.  See
http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2

If you're not doing some kind of CAS operation inside the logged batch, then
the only advantage of a logged batch over an unlogged batch is that if
consistency can't be achieved for the second statement (so its write fails),
the first statement will also not succeed (but at that point your cluster is
effectively offline).

Unlogged batches offer very few guarantees over single statements, and even
have the drawback of eliminating your driver's ability to operate in a
token aware fashion.
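
A hedged sketch of the two batch types with the Java driver (the prepared
statements are assumed to target the orders and order_sequence tables
discussed in this thread):

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class OrderBatch {
    // insertOrder and insertSequence are assumed PreparedStatements for the
    // two tables.
    static void writeBoth(Session session, PreparedStatement insertOrder,
                          PreparedStatement insertSequence, long orderId,
                          String orderBlob, String shardAndDate, long sequenceId) {
        // Logged batch: both writes succeed or fail as a unit (atomic, not
        // isolated); expect roughly 30% lower throughput. Using
        // BatchStatement.Type.UNLOGGED drops the atomicity guarantee and the
        // driver's token-aware routing benefit.
        BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
        batch.add(insertOrder.bind(orderId, orderBlob));
        batch.add(insertSequence.bind(shardAndDate, sequenceId));
        session.execute(batch);
    }
}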



RE: Replication Factor Change

2015-11-05 Thread aeljami.ext
Hello,

If the current CL = ONE, be careful when changing the replication factor in
production: any of the 3 replicas may be queried while the data is still being
replicated ==> data read errors!
From: Yulian Oifa [mailto:oifa.yul...@gmail.com]
Sent: Thursday, November 5, 2015 16:02
To: user@cassandra.apache.org
Subject: Replication Factor Change

Hello to all.
I am planning to change replication factor from 1 to 3.
Will it cause data read errors while the nodes are being repaired?
Best regards
Yulian Oifa




Re: Cassandra 2.0 Batch Statement for timeseries schema

2015-11-05 Thread DuyHai Doan
""Get me the count of orders changed in a given sequence-id range"" --> Can
you give an example of SELECT statement for this query ?

Because given the table structure, you have to provide the shard-and-date
partition key and I don't see how you can know this value unless you create
as many SELECT as there are Cassandra nodes, for a given date ...
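
A hedged sketch of that fan-out (the shard count and date format follow the
schema from the original question; the per-partition counts are summed
client-side):

import com.datastax.driver.core.Session;

public class SequenceRangeCount {
    // One SELECT per shard for a given date.
    static long countChanged(Session session, int numShards, String date,
                             long fromSeq, long toSeq) {
        long total = 0;
        for (int shard = 0; shard < numShards; shard++) {
            String partition = shard + "-" + date; // e.g. "2-Nov-11-2015"
            total += session.execute(
                "SELECT count(*) FROM order_sequence WHERE shard_and_date = ? "
                + "AND sequence_id >= ? AND sequence_id <= ?",
                partition, fromSeq, toSeq).one().getLong(0);
        }
        return total;
    }
}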



Re: Replication Factor Change

2015-11-05 Thread Eric Stevens
If you switch reads to CL=ALL, you should be able to increase RF,
then run repair, and after repair is complete, go back to your old
consistency level.  However, while you're operating at ALL consistency, you
have no tolerance for a node failure (but at RF=1 you already have no
tolerance for a node failure, so that doesn't really change your
availability model).
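
A hedged sketch of that sequence (the keyspace, table, and strategy are
hypothetical; `nodetool repair` runs on every node between steps 2 and 3):

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class RfBump {
    static void raiseRf(Session session) {
        // 1. During the transition, read at CL=ALL so every replica is consulted.
        Statement read = new SimpleStatement("SELECT * FROM myks.mytable")
                .setConsistencyLevel(ConsistencyLevel.ALL);
        session.execute(read);

        // 2. Raise the replication factor from 1 to 3.
        session.execute("ALTER KEYSPACE myks WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 3}");

        // 3. After `nodetool repair` completes on all nodes, drop reads back
        //    to the old consistency level.
    }
}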

On Thu, Nov 5, 2015 at 8:01 AM Yulian Oifa  wrote:

> Hello to all.
> I am planning to change replication factor from 1 to 3.
> Will it cause data read errors while the nodes are being repaired?
>
> Best regards
> Yulian Oifa
>


Re: Does datastax java driver works with ipv6 address?

2015-11-05 Thread Eric Stevens
The server is binding to the IPv4 "all addresses" reserved address
(0.0.0.0), but binding it as IPv4 over IPv6 (:::0.0.0.0), which does
not have the same meaning as the IPv6 all addresses reserved IP (being ::,
aka 0:0:0:0:0:0:0:0).

My guess is you have an IPv4 address of 0.0.0.0 in rpc_address, and the
server is binding as instructed.  Probably you just need to set rpc_address
to either :: or the node's actual IPv6 address.
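
For example, in cassandra.yaml (note that with a wildcard bind such as :: or
0.0.0.0, broadcast_rpc_address must also be set to a concrete address):

rpc_address: "::"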

On Wed, Nov 4, 2015 at 10:36 PM Dikang Gu  wrote:

> Thanks Michael,
>
> Actually I find the problem is with the sever setup, I put "rpc_address:
> 0.0.0.0" in the config, and I find the sever bind to the address like this:
>
> tcp    0    0 :::9160           :::*    LISTEN    2411582/java
> tcp    0    0 :::0.0.0.0:9042   :::*    LISTEN    2411582/java
>
> So using the sever ip "2401:db00:11:60ed:face:0:31:0", I can connect to
> the thrift port 9160, but not the native port 9042. Do you know the reason
> for this?
>
> Thanks
> Dikang.
>
>
> On Wed, Nov 4, 2015 at 12:29 PM, Michael Shuler 
> wrote:
>
>> On 11/04/2015 11:17 AM, Dikang Gu wrote:
>>
>>> I have ipv6 only cassandra cluster, and I'm trying to connect to it
>>> using java driver, like:
>>>
>>> Inet6Address inet6 = (Inet6Address)
>>> InetAddress.getByName("2401:db00:0011:60ed:face::0031:");
>>> cluster = Cluster.builder().addContactPointsWithPorts(Arrays.asList(new
>>> InetSocketAddress(inet6,9042))).build();
>>> session =cluster.connect(CASSANDRA_KEYSPACE);
>>>
>>> But it failed to connect to the cassandra, looks like the java driver
>>> does not parse the ipv6 address correctly, exceptions are:
>>>
>>> 
>>
>> Open a JIRA bug report for the java driver at:
>>
>>   https://datastax-oss.atlassian.net/browse/JAVA
>>
>> As for IPv6 testing for Cassandra in general, it has been brought up, but
>> little testing is done at this time. If you have some contributions to be
>> made in this area, I'm sure they would be greatly appreciated. You are in a
>> relatively unique position with an IPv6-only cluster, so your input is
>> valuable.
>>
>>
>>
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20text%20~%20ipv6%20AND%20status%20!%3D%20Resolved
>>
>> --
>> Kind regards,
>> Michael
>>
>>
>
>
> --
> Dikang
>
>


Re: why cassandra max is 20000/s on a node ?

2015-11-05 Thread Jack Krupansky
I don't know what current numbers are, but last year the idea of getting 1
million writes per second on a 96 node cluster was considered a reasonable
achievement. That would be roughly 10,000 writes per second per node and
you are getting twice that.

See:
http://www.datastax.com/1-million-writes

Or this Google test which hit 1 million writes per second with 330 nodes,
which would be roughly 3,000 writes per second per node:
http://googlecloudplatform.blogspot.com/2014/03/cassandra-hits-one-million-writes-per-second-on-google-compute-engine.html

So, is your question why your throughput is so good or are you disappointed
that it wasn't better?

Cassandra is designed for clusters with lots of nodes, so if you want to
get an accurate measure of per-node performance you need to test with a
reasonable number of nodes and then divide aggregate performance by the
number of nodes, not test a single node alone. In short, testing a single
node in isolation is not a recommended approach to testing Cassandra
performance.


-- Jack Krupansky



Cassandra 2.0 Batch Statement for timeseries schema

2015-11-05 Thread Sachin Nikam
I currently have a keyspace with table definition that looks like this.


CREATE TABLE orders (
  order_id bigint PRIMARY KEY,
  order_blob text
);

This table will have a write load of ~40-100 tps and a read load of
~200-400 tps.

We are now considering adding another table definition which closely
resembles a timeseries table.

CREATE TABLE order_sequence (
  -- shard_and_date is generated as (order_id % number of nodes in the
  -- Cassandra ring), suffixed with the current date, e.g. '2-Nov-11-2015'
  shard_and_date text,
  -- sequence_id is a simple flake-generated long
  sequence_id bigint,
  PRIMARY KEY (shard_and_date, sequence_id)
) WITH CLUSTERING ORDER BY (sequence_id DESC);


The goal of this table is to answer queries like "Get me the count of
orders changed in a given sequence-id range". This query will be
called once every 5 sec.

The plan is to write to both of these tables in a single BATCH statement.

1. Will this impact the write latency?

2. Also will it impact Read latency of "orders" table?

3. Will it impact the overall stability of the cluster?


Re: Question for datastax java Driver

2015-11-05 Thread Eric Stevens
In short: Yes, but it's not a good idea.

To do it, look into WhiteListPolicy for your load balancing policy; if the
WhiteListPolicy contains only the same host(s) that you added as contact
points, then the client will only connect to those hosts.
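
A hedged sketch (the host and port are hypothetical; WhiteListPolicy wraps a
child policy and filters it down to the listed addresses):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.WhiteListPolicy;
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;

public class SingleHostClient {
    public static void main(String[] args) {
        List<InetSocketAddress> whiteList =
            Arrays.asList(new InetSocketAddress("10.0.0.1", 9042));
        // The client will connect only to the white-listed host(s).
        Cluster cluster = Cluster.builder()
            .addContactPointsWithPorts(whiteList)
            .withLoadBalancingPolicy(
                new WhiteListPolicy(new RoundRobinPolicy(), whiteList))
            .build();
        cluster.close();
    }
}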

However it's probably not a good idea for several reasons.

First, it's directly at odds with Cassandra's availability guarantees.  If
you connect only to one node, and that node goes down, your client has lost
the ability to communicate with the cluster *at all*.  Even though you
(presumably) have replication set up, and the cluster is fully capable of
answering questions and taking writes with that node offline.  If you
permit the default behavior, your client remains connected and functional
through node losses (one or more depending on your replication factor).

Second, this produces coordination overhead, which increases latency for
your requests as well as GC pressure in your cluster.  When you do an
operation on a host that does not own that data, that host will in turn
communicate with the host(s) that *do* own that data.  This is work that
doesn't have to happen, because the java driver can do that work itself,
and communicate directly with primary replicas.  This saves a network hop
(reducing latency) and saves GC pressure in the cluster (the hosts don't
have to coordinate operations, and the requests complete more quickly).

Aside from very narrow scenarios (perhaps diagnostic ones where you're
testing a specific host that you suspect to be misbehaving), I can't think
of a reason you'd want to do this.

On Wed, Nov 4, 2015 at 10:32 PM Dikang Gu  wrote:

> Hi there,
>
> Right now, it seems if I add a contact point like this:
>
> cluster = Cluster.builder().addContactPoint().build();
>
> When client is connected to the cluster, client will fetch the addresses
> for all the nodes in the cluster, and try to connect to them.
>
> I'm wondering can I disable the behavior? I mean I just want each client
> to connect to one or several contact point, not connect to all of the
> nodes, am I able to do this?
>
> Thanks.
> --
> Dikang
>
>


Re: Can't save Opscenter Dashboard

2015-11-05 Thread Kai Wang
It happened again after I rebooted another node. This time I see errors in
agent.log. They seem to be related to the previously dead node.
  INFO [clojure-agent-send-off-pool-2] 2015-11-05 09:48:41,602 Attempting
to load stored metric values.
 ERROR [clojure-agent-send-off-pool-2] 2015-11-05 09:48:41,613 There was an
error when attempting to load stored rollups.
 com.datastax.driver.core.exceptions.DriverInternalError: Unexpected error
while processing response from /x.x.x.x:9042
at
com.datastax.driver.core.exceptions.DriverInternalError.copy(DriverInternalError.java:42)
at
com.datastax.driver.core.exceptions.DriverInternalError.copy(DriverInternalError.java:24)
...
Caused by: com.datastax.driver.core.exceptions.DriverInternalError:
Unexpected error while processing response from /x.x.x.x:9042
at
com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:150)
at
com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:183)
at
com.datastax.driver.core.RequestHandler.access$2300(RequestHandler.java:45)
...
Caused by: java.lang.IllegalStateException: Can't use this cluster instance
because it was previously closed
at com.datastax.driver.core.Cluster.checkNotClosed(Cluster.java:493)
at com.datastax.driver.core.Cluster.access$400(Cluster.java:61)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1231)
...
INFO [clojure-agent-send-off-pool-1] 2015-11-05 09:48:41,618 Attempting to
load stored metric values.
 ERROR [clojure-agent-send-off-pool-1] 2015-11-05 09:48:41,622 There was an
error when attempting to load stored rollups.
 com.datastax.driver.core.exceptions.InvalidQueryException: Invalid null
value for partition key part key
at
com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
at
com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:291)
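
The inner IllegalStateException above is the driver refusing any use of a
Cluster instance after close(); a minimal reproduction (assuming only a
reachable node on 127.0.0.1):

import com.datastax.driver.core.Cluster;

public class ClosedClusterRepro {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        cluster.close();
        // Throws IllegalStateException: "Can't use this cluster instance
        // because it was previously closed"
        cluster.connect();
    }
}

So the agent appears to be reusing a Cluster object it already closed when
the node went away, rather than building a fresh one.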


On Wed, Nov 4, 2015 at 8:43 PM, qihuang.zheng 
wrote:

> We had this problem with version 5.2.0, so we decided to update to 5.2.2.
>
> But the problem seemed to remain. We solved it by completely deleting the
> related agent files and processes and restarting, just like a first-time install.
>
>
> sudo kill -9 `ps -ef | grep '[d]atastax_agent_monitor' | head -1 | awk '{print
> $2}'` && \
>
> sudo kill -9 `cat /var/run/datastax-agent/datastax-agent.pid` && \
>
> sudo rm -rf /var/lib/datastax-agent && \
>
> sudo rm -rf /usr/share/datastax-agent
>
> --
> qihuang.zheng
>
>  Original message
> From: Kai Wang
> To: user
> Sent: Thursday, November 5, 2015 04:39
> Subject: Can't save Opscenter Dashboard
>
> Hi,
>
> Today, after one of the nodes was rebooted, the OpsCenter dashboard doesn't
> save anymore. It starts as an empty dashboard with no widgets or graphs. If I
> add some graphs/widgets, they update fine. But if I refresh the
> browser, the dashboard becomes empty again.
>
> Also there's no "DEFAULT" tab on the dashboard as the user guide shows. I
> am not sure if it was there before.
>


Replication Factor Change

2015-11-05 Thread Yulian Oifa
Hello to all.
I am planning to change the replication factor from 1 to 3.
Will it cause data read errors while the nodes are being repaired?

Best regards
Yulian Oifa
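
For reference, the RF change itself is a single schema statement; a hedged
sketch via the Java driver (keyspace "ks" and SimpleStrategy are placeholders).
The new replicas hold no data until "nodetool repair" completes on each node,
so in the interim reads below CL ALL can miss rows rather than error:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class BumpReplication {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        // Raise the replication factor; existing data is NOT copied until
        // "nodetool repair ks" has been run on the affected nodes.
        session.execute("ALTER KEYSPACE ks WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 3}");
        cluster.close();
    }
}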


Re: scylladb

2015-11-05 Thread Carlos Rolo
Something to do on an expected rainy weekend. Thanks for the information.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen  wrote:

> As of two days ago, they say they've got it @cjrolo.
>
> https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>
>
> On Thursday, November 5, 2015, Carlos Rolo  wrote:
>
>> I will not try it until multi-DC is implemented. More than a month has
>> passed since I last looked, so it could be in place by now; if so, I may
>> take some time to test it.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>>
>> On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad 
>> wrote:
>>
>>> Nope, no one I know.  Let me know if you try it; I'd love to hear your
>>> feedback.
>>>
>>> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli 
>>> wrote:
>>> >
>>> > Hi guys,
>>> >
>>> > did anyone already try ScyllaDB (yet another "fastest NoSQL database
>>> > in town") and have some thoughts/hands-on experience to share?
>>> >
>>> > Cheers,
>>> > Tommaso
>>>
>>>
>>
>
> --
> Sent from mobile -- apologies for brevity or errors.
>


why cassanra max is 20000/s on a node ?

2015-11-05 Thread 郝加来
hi everyone,
I set up Cassandra 2.2.3 on a node; the machine's environment is openjdk-1.8.0,
512G memory, a 128-core CPU, and 3T of SSD.
The token num is 256 on the node. The program uses DataStax driver 2.1.8 and
5 threads to insert data into Cassandra on the same machine; the data's size
is 6G, 1,157,000 lines.

Why is the throughput only 20000/s on the node?

# Per-thread stack size.
JVM_OPTS="$JVM_OPTS -Xss512k"
 
# Larger interned string table, for gossip's benefit (CASSANDRA-6410)
JVM_OPTS="$JVM_OPTS -XX:StringTableSize=103"
 
# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+CMSIncrementalMode"
JVM_OPTS="$JVM_OPTS -XX:+DisableExplicitGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSConcurrentMTEnabled"
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"  
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:CompileCommandFile=$CASSANDRA_CONF/hotspot_compiler"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=6"

memtable_heap_space_in_mb: 1024
memtable_offheap_space_in_mb: 10240
memtable_cleanup_threshold: 0.55 
memtable_allocation_type: heap_buffers 
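
With only 5 synchronous threads, the client rather than the node is often the
bottleneck. As a hedged illustration (not the original code; keyspace "ks" and
table "t" are placeholders), one thread keeping many requests in flight with
executeAsync usually drives far more throughput than a handful of blocking
threads:

import java.util.concurrent.Semaphore;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

public class AsyncLoader {
    public static void main(String[] args) throws InterruptedException {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("ks");
        PreparedStatement ps = session.prepare(
                "INSERT INTO t (id, payload) VALUES (?, ?)");

        // Bound the number of in-flight requests instead of the number of
        // threads; one thread can keep hundreds of requests pending.
        final Semaphore inFlight = new Semaphore(1024);
        for (long i = 0; i < 1157000L; i++) {
            inFlight.acquire();
            ResultSetFuture f = session.executeAsync(ps.bind(i, "payload-" + i));
            Futures.addCallback(f, new FutureCallback<ResultSet>() {
                public void onSuccess(ResultSet rs) { inFlight.release(); }
                public void onFailure(Throwable t) { inFlight.release(); }
            });
        }
        inFlight.acquire(1024);  // wait for the outstanding tail to drain
        cluster.close();
    }
}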
 


That's all.
Thanks.




郝加来

Financial East China Business Unit


Neusoft Corporation
Neusoft Software Park, 1000 Ziyue Road, Minhang District, Shanghai
Postcode:200241
Tel:(86 21) 33578591
Fax:(86 21) 23025565-111
Mobile:13764970711
Email:ha...@neusoft.com
Http://www.neusoft.com




Re: scylladb

2015-11-05 Thread Dani Traphagen
As of two days ago, they say they've got it @cjrolo.

https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta

On Thursday, November 5, 2015, Carlos Rolo  wrote:

> I will not try it until multi-DC is implemented. More than a month has
> passed since I last looked, so it could be in place by now; if so, I may
> take some time to test it.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad wrote:
>
>> Nope, no one I know.  Let me know if you try it; I'd love to hear your
>> feedback.
>>
>> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli wrote:
>> >
>> > Hi guys,
>> >
>> > did anyone already try ScyllaDB (yet another "fastest NoSQL database
>> > in town") and have some thoughts/hands-on experience to share?
>> >
>> > Cheers,
>> > Tommaso
>>
>>
>

-- 
Sent from mobile -- apologies for brevity or errors.


Re: scylladb

2015-11-05 Thread Carlos Rolo
I will not try it until multi-DC is implemented. More than a month has passed
since I last looked, so it could be in place by now; if so, I may take
some time to test it.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad 
wrote:

> Nope, no one I know.  Let me know if you try it; I'd love to hear your
> feedback.
>
> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli 
> wrote:
> >
> > Hi guys,
> >
> > did anyone already try ScyllaDB (yet another "fastest NoSQL database
> > in town") and have some thoughts/hands-on experience to share?
> >
> > Cheers,
> > Tommaso
>
>



Re: scylladb

2015-11-05 Thread Jon Haddad
Nope, no one I know.  Let me know if you try it; I'd love to hear your feedback.

> On Nov 5, 2015, at 9:22 AM, tommaso barbugli  wrote:
> 
> Hi guys,
> 
> did anyone already try ScyllaDB (yet another "fastest NoSQL database in town")
> and have some thoughts/hands-on experience to share?
> 
> Cheers,
> Tommaso



scylladb

2015-11-05 Thread tommaso barbugli
Hi guys,

did anyone already try ScyllaDB (yet another "fastest NoSQL database in
town") and have some thoughts/hands-on experience to share?

Cheers,
Tommaso