How to use Cassandra with Python better?

2019-12-12 Thread lampahome
As the title says, I want to make the most of Cassandra's advantages, but I
don't know how.

So far, I know I can improve performance with execute_async and
BatchStatement.

When I want to scale out to more nodes, I just add servers and modify some
config files.

Are there other ways to use the Cassandra Python driver better?
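
For context, this is roughly how I use them today with the DataStax Python
driver (a minimal sketch, assuming a local node and a hypothetical demo.kv
table):

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('demo')
prepared = session.prepare("INSERT INTO kv (k, v) VALUES (?, ?)")

# execute_async: issue requests concurrently, then wait on the futures
futures = [session.execute_async(prepared, (i, 'v')) for i in range(100)]
for f in futures:
    f.result()  # raises if the request failed

# BatchStatement: group writes, ideally to the same partition
batch = BatchStatement()
for i in range(3):
    batch.add(prepared, (i, 'v%d' % i))
session.execute(batch)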


Re: Data is not syncing up when we add one more node (DR) to an existing 3-node cluster

2019-12-12 Thread John Belliveau
Hi Anil,

In the cassandra.yaml file on your new node in DC2, is the IP address for
the seeds set to the seed node in DC1?

Best,
John
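
For reference, the relevant cassandra.yaml stanza on the DC2 node would look
something like this (the seed IP here is hypothetical):

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # point at one or more seed nodes in DC1
      - seeds: "10.0.1.10"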

On Wed, Dec 11, 2019 at 11:09 PM Anil Kumar Ganipineni <
akganipin...@adaequare.com> wrote:

> Hi All,
>
> We have a 3-node cluster in datacentre DC1, and below is our keyspace
> declaration. The current data size on the cluster is ~10GB. When we add a
> new node in datacentre DC2, the new node is not syncing up with the data,
> but it shows UN when I run *nodetool status*.
>
> *CREATE* KEYSPACE *production* *WITH* REPLICATION = { 'class' :
> 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'DC1': '3', 'DC2':
> '1' } *AND* DURABLE_WRITES = *true*;
>
> Please provide suggestions to make the new node in DC2 sync up with the
> existing cluster. This is required because DC2 is our DR in a different
> region from the existing cluster.
>
> *Regards,*
>
> *Anil Ganipineni*


Re: Measuring Cassandra Metrics at a Session/Connection Level

2019-12-12 Thread Reid Pinchback
Metrics are exposed via JMX.  You can use something like jmxtrans or collectd 
with the jmx plugin to capture metrics per-node and route them to whatever you 
use to aggregate metrics.
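
For example, a jmxtrans query for Cassandra's read-latency MBean might look
roughly like this (host names and the Graphite writer are assumptions; adapt
to whatever aggregator you use):

{
  "servers": [{
    "host": "cassandra-node-1",
    "port": "7199",
    "queries": [{
      "obj": "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency",
      "resultAlias": "cassandra.read.latency",
      "outputWriters": [{
        "@class": "com.googlecode.jmxtrans.model.output.GraphiteWriter",
        "settings": { "host": "graphite.example.com", "port": 2003 }
      }]
    }]
  }]
}

Note these are node-level metrics; as your findings below say, per-client
attribution needs the 4.0 instruments (nodetool clientstats,
system_views.clients).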

From: Fred Habash 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, December 12, 2019 at 9:38 AM
To: "user@cassandra.apache.org" 
Subject: Measuring Cassandra Metrics at a Session/Connection Level

Hi all ...

We are facing a scenario where we have to measure some metrics on a per
connection or client basis. For example, counts of read/write requests by
client IP/host/user/program. We want to know the source of C* requests for
budgeting, capacity planning, or charge-backs.
We are running 2.2.8.

I did some research and I just wanted to verify my findings ...

1. C* 4+ has two instruments: 'nodetool clientstats' & system_views.clients
2. Earlier releases have no native instruments to collect these metrics

Is there any other way to measure such metrics?


Thank you



Re: average row size in a cassandra table

2019-12-12 Thread Raney, Michael
For a rough estimate, I've seen the following pattern.

Pseudo code:
Do queries by token range at random.

SELECT JSON * FROM table;

Take the length of the JSON string of each row.

Average the lengths.
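
A rough sketch of that in Python (the keyspace, table, and partition key pk
are hypothetical; SELECT JSON needs C* 2.2+):

import random
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('ks')

sizes = []
for _ in range(20):  # sample 20 random spots in the token ring
    start = random.randint(-2**63, 2**63 - 1)
    rows = session.execute(
        "SELECT JSON * FROM tbl WHERE token(pk) > %s LIMIT 1000", (start,))
    sizes.extend(len(row[0]) for row in rows)  # row[0] is the JSON string

print('average row size (bytes):', sum(sizes) / len(sizes))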

Cheers.

From: Ayub M 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday, December 11, 2019 at 11:17 PM
To: "user@cassandra.apache.org" 
Subject: average row size in a cassandra table

How do I find the average row size of a table in Cassandra? I am not looking
for partition size (which can be found with nodetool tablehistograms), since a
partition can have many rows. I am looking for row size.


Measuring Cassandra Metrics at a Session/Connection Level

2019-12-12 Thread Fred Habash
Hi all ...

We are facing a scenario where we have to measure some metrics on a per
connection or client basis. For example, counts of read/write requests by
client IP/host/user/program. We want to know the source of C* requests for
budgeting, capacity planning, or charge-backs.
We are running 2.2.8.

I did some research and I just wanted to verify my findings ...

1. C* 4+ has two instruments: 'nodetool clientstats' & system_views.clients
2. Earlier releases have no native instruments to collect these metrics

Is there any other way to measure such metrics?


Thank you


Re: execute is faster than execute_async?

2019-12-12 Thread Avi Kivity


On 12/12/2019 06:25, lampahome wrote:

Jon Haddad <j...@jonhaddad.com> wrote on Thursday, December 12, 2019 at 12:42 AM:


I'm not sure how you're measuring this - could you share your
benchmarking code?

Here are the details:


start = time.time()
prep = session.prepare(query)  # prepare once; bind arguments at execute time
for i in range(40960):
    session.execute(prep, args)  # or session.execute_async(prep, args)
print('time', time.time() - start)

Just like the above code snippet.
In almost every run, execute_async() costs more time than plain execute().



I think you're just exposing Python and perhaps driver weaknesses.


With .execute(), memory usage stays constant and you suffer the round 
trip time once per loop.


With .execute_async(), memory usage grows, and if there is any algorithm 
in the driver that is not O(1) (say to maintain the outstanding request 
table), execution time grows as you push more and more requests. The 
thread(s) that process responses have to contend with the request 
issuing thread over locks. You don't suffer the round trip time, but 
from your results the other issues dominate.



If you also collected responses in your loop, and also bound the number 
of outstanding requests to a reasonable number, you'll see execute_async 
performing better. You'll see even better performance if you drop Python 
for a language more suitable for the data plane.
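
A minimal sketch of that bounded-concurrency pattern (the window size,
keyspace, and table are assumptions):

import time
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('demo')
prep = session.prepare("INSERT INTO kv (k, v) VALUES (?, ?)")

WINDOW = 128  # cap on outstanding requests; tune per client
futures = []

start = time.time()
for i in range(40960):
    futures.append(session.execute_async(prep, (i, 'v')))
    if len(futures) >= WINDOW:
        futures.pop(0).result()  # collect the oldest response
for f in futures:  # drain whatever is still in flight
    f.result()
print('time', time.time() - start)

The driver also ships cassandra.concurrent.execute_concurrent_with_args,
which implements this pattern for you.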