How to use Cassandra with Python more effectively?
As the title says, I want to get the most out of Cassandra's advantages, but I don't know how. So far I know I can improve performance with execute_async and BatchStatement, and that scaling out to more nodes is just a matter of adding servers and modifying some config files. Are there other ways to help me use the Python driver better?
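A minimal sketch of the execute_async pattern, assuming an already-running node and a hypothetical demo keyspace and users table; the key point is to keep the returned futures and wait on them instead of firing and forgetting:

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])      # hypothetical contact point
    session = cluster.connect('demo')     # hypothetical keyspace

    # Prepare once; the server parses the statement a single time.
    insert = session.prepare("INSERT INTO users (id, name) VALUES (?, ?)")

    # Issue writes asynchronously, keeping every future so completion
    # can be awaited and errors surfaced.
    futures = [session.execute_async(insert, (i, 'user%d' % i))
               for i in range(1000)]
    for f in futures:
        f.result()                        # raises if that write failed

One caveat on BatchStatement: in Cassandra, batches mainly provide atomicity for writes to the same partition; large multi-partition batches usually hurt throughput rather than help it.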
Re: Data is not syncing up when we add one more node (DR) to an existing 3-node cluster
Hi Anil, In the cassandra.yaml file on your new node in DC2, is the IP address in the seeds list set to a seed node in DC1? Best, John

On Wed, Dec 11, 2019 at 11:09 PM Anil Kumar Ganipineni <akganipin...@adaequare.com> wrote:

Hi All,

We have a 3-node cluster in datacenter DC1, and below is our keyspace declaration. The current data size on the cluster is ~10GB. When we add a new node in datacenter DC2, the new node is not syncing up with the data, but it shows UN when I run nodetool status.

    CREATE KEYSPACE production WITH REPLICATION = {
        'class' : 'org.apache.cassandra.locator.NetworkTopologyStrategy',
        'DC1': '3', 'DC2': '1' } AND DURABLE_WRITES = true;

Please provide suggestions to make the new node in DC2 sync up with the existing cluster. This is required because DC2 is our DR site in a different region from the existing cluster.

Regards,
Anil Ganipineni
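For reference, a minimal sketch of the relevant cassandra.yaml section on the new DC2 node; the address is hypothetical and should point at an existing seed node in DC1:

    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              # hypothetical address of a seed node in DC1
              - seeds: "10.0.1.11"

Note also that a node added in a new datacenter does not stream existing data by itself; the usual step after it joins is to run nodetool rebuild -- DC1 on the DC2 node to pull the data across.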
Re: Measuring Cassandra Metrics at a Session/Connection Level
Metrics are exposed via JMX. You can use something like jmxtrans, or collectd with the JMX plugin, to capture metrics per node and route them to whatever you use to aggregate metrics.

From: Fred Habash
Date: Thursday, December 12, 2019 at 9:38 AM
To: "user@cassandra.apache.org"
Subject: Measuring Cassandra Metrics at a Session/Connection Level
Re: average row size in a cassandra table
For a rough estimate, I've seen the following pattern. Pseudocode: query by token range at random; SELECT JSON * FROM the table; take the length of the JSON string of each row; average those lengths. Cheers.

From: Ayub M
Date: Wednesday, December 11, 2019 at 11:17 PM
To: "user@cassandra.apache.org"
Subject: average row size in a cassandra table

How do I find the average row size of a table in Cassandra? I am not looking for the partition size (which can be found from nodetool tablehistograms), since a partition can have many rows. I am looking for the row size.
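A minimal sketch of that sampling pattern with the Python driver, assuming a hypothetical table ks.t with partition key pk; the probe and limit counts are illustrative:

    import random
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect()

    # Full Murmur3Partitioner token range.
    MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1

    stmt = session.prepare(
        "SELECT JSON * FROM ks.t WHERE token(pk) >= ? LIMIT 100")

    sizes = []
    for _ in range(20):                   # 20 random probes into the ring
        start = random.randint(MIN_TOKEN, MAX_TOKEN)
        for row in session.execute(stmt, (start,)):
            sizes.append(len(row[0]))     # length of the row's JSON string

    print('average row size ~ %.1f bytes' % (sum(sizes) / len(sizes)))

Keep in mind the JSON form also counts column names and quoting, so treat the result as a rough upper bound.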
Measuring Cassandra Metrics at a Session/Connection Level
Hi all. We are facing a scenario where we have to measure some metrics on a per-connection or per-client basis, for example the count of read/write requests by client IP/host/user/program. We want to know the source of C* requests for budgeting, capacity planning, or charge-backs. We are running 2.2.8. I did some research and I just wanted to verify my findings:

1. C* 4+ has two instruments, 'nodetool clientstats' and the system_views.clients virtual table.
2. Earlier releases have no native instruments to collect these metrics.

Is there any other way to measure such metrics? Thank you
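For reference, a minimal sketch of reading the 4.0+ virtual table from the Python driver (this will not work on 2.2.8; column names follow the 4.0 schema):

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect()

    # system_views.clients is node-local: it lists only the clients
    # connected to the coordinator this query lands on.
    for row in session.execute('SELECT * FROM system_views.clients'):
        print(row.address, row.username, row.request_count)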
Re: execute is faster than execute_async?
On 12/12/2019 06.25, lampahome wrote:

Jon Haddad <j...@jonhaddad.com> wrote on Thursday, December 12, 2019 at 12:42 AM:
I'm not sure how you're measuring this - could you share your benchmarking code?

Just the code snippet below. execute_async() almost always costs me more time than plain execute():

    start = time.time()
    for i in range(40960):
        prep = session.prepare(query)   # prepare takes only the query; args are bound at execute time
        session.execute(prep, args)     # or: session.execute_async(prep, args)
    print('time', time.time() - start)

I think you're just exposing Python, and perhaps driver, weaknesses. With .execute(), memory usage stays constant and you pay the round-trip time once per loop iteration. With .execute_async(), memory usage grows, and if any algorithm in the driver is not O(1) (say, maintaining the outstanding-request table), execution time grows as you push more and more requests. The threads that process responses also contend with the request-issuing thread over locks. You avoid paying the round-trip time per request, but from your results the other issues dominate. If you also collected responses in your loop, and bounded the number of outstanding requests to a reasonable number, you would see execute_async performing better. You would see even better performance if you dropped Python for a language more suitable for the data plane.
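A minimal sketch of the bounded approach suggested above, using the driver's execute_concurrent_with_args helper; query and args are carried over from the benchmark, and the concurrency value is illustrative:

    from cassandra.concurrent import execute_concurrent_with_args

    prep = session.prepare(query)             # prepare once, outside any loop
    params = [args for _ in range(40960)]     # one parameter tuple per request

    # Issues requests asynchronously, never keeping more than 100 in
    # flight, and collects every response (raising on the first error).
    results = execute_concurrent_with_args(session, prep, params,
                                           concurrency=100)

This keeps memory bounded and overlaps round trips, which is where execute_async can actually beat serial execute.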