Re: operation and maintenance tools

2016-11-07 Thread wxn...@zjqunshuo.com
Thank you for the response. Now I have more confidence in using nodetool :-)

From: Dikang Gu
Date: 2016-11-08 14:52
To: cassandra
Subject: Re: operation and maintenance tools
Hi Simon, 

For a 10-node cluster, Cassandra nodetool should be enough for most C* 
operations and maintenance, unless you have some special requirements.

For the memory, you can check your JVM settings and the GC log for JVM 
usage.

--Dikang.

On Mon, Nov 7, 2016 at 7:25 PM, wxn...@zjqunshuo.com  
wrote:
Hi All,

I need to do maintenance work for a C* cluster with about 10 nodes. Please 
recommend any C* operation and maintenance tools you are using.
I also noticed my C* daemon using a large amount of memory while doing nothing. Is there 
any convenient tool to deeply analyze the C* node memory?

Cheers,
Simon



-- 
Dikang



Re: A difficult data model with C*

2016-11-07 Thread Mickael Delanoë
Which version of Cassandra are you using? If it is 3.0 or higher, why
don't you create a materialized view of your base table with last_time
as the first clustering column?

However, it needs to be confirmed that this is not an anti-pattern for
Cassandra, as this materialized view will see a lot of deletes + inserts (i.e.
lots of tombstones, I think).
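
For illustration, the suggested view might look like this on the recent table from the original message (the view name is an assumption; materialized views require Cassandra 3.0+):

    CREATE MATERIALIZED VIEW recent_by_last_time AS
        SELECT user_name, last_time, vedio_id, position
        FROM recent
        WHERE user_name IS NOT NULL
          AND last_time IS NOT NULL
          AND vedio_id IS NOT NULL
        PRIMARY KEY (user_name, last_time, vedio_id)
        WITH CLUSTERING ORDER BY (last_time DESC);

    -- last 10 viewing records of a user:
    SELECT * FROM recent_by_last_time WHERE user_name = 'ben' LIMIT 10;

Every change of last_time in a base row becomes a delete plus an insert in the view, which is where the tombstone concern comes from.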

On 8 Nov 2016 at 7:49 AM, "Dikang Gu"  wrote:
>
> Agreed, changing last_time to descending order will help. You can
also TTL the data, so that the old records will be purged by Cassandra.
>
> --Dikang.
>
> On Mon, Nov 7, 2016 at 10:39 PM, Alain Rastoul 
wrote:
>>
>> On 11/08/2016 03:54 AM, ben ben wrote:
>>>
>>> Hi guys,
>>>CREATE TABLE recent (
>>>  user_name text,
>>>  vedio_id text,
>>>  position int,
>>>  last_time timestamp,
>>>  PRIMARY KEY (user_name, vedio_id)
>>> )
>>>
>>
>> Hi Ben,
>>
>> Maybe a clustering column order would help:
>> CREATE TABLE recent (
>> ...
>> ) WITH CLUSTERING ORDER BY (last_time DESC);
>> So you can query only the last 10 records:
>> SELECT * FROM recent WHERE user_name = xxx LIMIT 10
>>
>> See here http://www.datastax.com/dev/blog/we-shall-have-order
>> --
>> best,
>> Alain
>
>
>
>
> --
> Dikang
>


Re: operation and maintenance tools

2016-11-07 Thread Dikang Gu
Hi Simon,

For a 10-node cluster, Cassandra nodetool should be enough for most C*
operations and maintenance, unless you have some special requirements.

For the memory, you can check your JVM settings and the GC log for
JVM usage.
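
As a concrete starting point, these standard commands cover the checks described above (the jstat interval of 1000 ms is just an example):

    nodetool info             # heap used/total and off-heap usage
    nodetool tpstats          # thread pool backlogs
    nodetool compactionstats  # pending compactions
    jstat -gcutil <cassandra-pid> 1000   # live GC utilisation, every 1s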

--Dikang.

On Mon, Nov 7, 2016 at 7:25 PM, wxn...@zjqunshuo.com 
wrote:

> Hi All,
>
> I need to do maintenance work for a C* cluster with about 10 nodes. Please
> recommend any C* operation and maintenance tools you are using.
> I also noticed my C* daemon using a large amount of memory while doing nothing. Is
> there any convenient tool to deeply analyze the C* node memory?
>
> Cheers,
> Simon
>



-- 
Dikang


Re: A difficult data model with C*

2016-11-07 Thread Dikang Gu
Agreed, changing last_time to descending order will help. You can also
TTL the data, so that the old records will be purged by Cassandra.
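
For illustration, both TTL options in CQL on the recent table (the 30-day value of 2592000 seconds is only an example; toTimestamp(now()) assumes C* 2.2+):

    -- per-write TTL:
    INSERT INTO recent (user_name, vedio_id, position, last_time)
    VALUES ('ben', 'v1', 42, toTimestamp(now())) USING TTL 2592000;

    -- or a table-wide default:
    ALTER TABLE recent WITH default_time_to_live = 2592000;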

--Dikang.

On Mon, Nov 7, 2016 at 10:39 PM, Alain Rastoul 
wrote:

> On 11/08/2016 03:54 AM, ben ben wrote:
>
>> Hi guys,
>>CREATE TABLE recent (
>>  user_name text,
>>  vedio_id text,
>>  position int,
>>  last_time timestamp,
>>  PRIMARY KEY (user_name, vedio_id)
>> )
>>
>>
> Hi Ben,
>
> Maybe a clustering column order would help:
> CREATE TABLE recent (
> ...
> ) WITH CLUSTERING ORDER BY (last_time DESC);
> So you can query only the last 10 records:
> SELECT * FROM recent WHERE user_name = xxx LIMIT 10
>
> See here http://www.datastax.com/dev/blog/we-shall-have-order
> --
> best,
> Alain
>



-- 
Dikang


Re: A difficult data model with C*

2016-11-07 Thread Alain Rastoul

On 11/08/2016 03:54 AM, ben ben wrote:

Hi guys,
   CREATE TABLE recent (
 user_name text,
 vedio_id text,
 position int,
 last_time timestamp,
 PRIMARY KEY (user_name, vedio_id)
)



Hi Ben,

Maybe a clustering column order would help:
CREATE TABLE recent (
...
) WITH CLUSTERING ORDER BY (last_time DESC);
So you can query only the last 10 records:
SELECT * FROM recent WHERE user_name = xxx LIMIT 10

See here http://www.datastax.com/dev/blog/we-shall-have-order
--
best,
Alain
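
Note that CLUSTERING ORDER BY only applies to clustering columns, so last_time would also have to be added to the primary key for the above to work. A sketch of the reworked table (the _v2 name is hypothetical):

    CREATE TABLE recent_v2 (
        user_name text,
        last_time timestamp,
        vedio_id text,
        position int,
        PRIMARY KEY (user_name, last_time, vedio_id)
    ) WITH CLUSTERING ORDER BY (last_time DESC, vedio_id ASC);

    -- last 10 viewing records of a user:
    SELECT * FROM recent_v2 WHERE user_name = 'ben' LIMIT 10;

With this key each viewing event is its own row, so several rows per movie can accumulate -- the trade-off raised in the original question.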


Secondary index tombstone limit

2016-11-07 Thread Oleg Krayushkin
Hi, could you please clarify: is the 100k tombstone limit for secondary
indexes per CF, per CF per node, per original sstable, or (very unlikely) per partition?

Thanks!
-- 

Oleg Krayushkin


operation and maintenance tools

2016-11-07 Thread wxn...@zjqunshuo.com
Hi All,

I need to do maintenance work for a C* cluster with about 10 nodes. Please 
recommend any C* operation and maintenance tools you are using.
I also noticed my C* daemon using a large amount of memory while doing nothing. Is there 
any convenient tool to deeply analyze the C* node memory?

Cheers,
Simon


A difficult data model with C*

2016-11-07 Thread ben ben
Hi guys,

  We are maintaining a system for an on-line video service. All users' viewing 
records for every movie are stored in C*, so that a user can resume a 
movie from the last position next time. The table is designed as below:
  CREATE TABLE recent (
user_name text,
vedio_id text,
position int,
last_time timestamp,
PRIMARY KEY (user_name, vedio_id)
)

  It worked well before. However, the records increase every day, and the last 
ten items would be adequate for the business. The current model uses vedio_id as 
the clustering key to keep one row per movie, but as you know, the business prefers to 
order by last_time desc. If we used last_time as the clustering key, there would be 
many records for a single movie while only the most recent one is actually desired. So how 
should we model that? Do you have any suggestions?
  Thanks!


BRs,
BEN



Re: store individual inventory items in a table, how to assign them correctly

2016-11-07 Thread Justin Cameron
You can use lightweight transactions to achieve this.

Example:
UPDATE item SET customer = 'Joe' WHERE item_id = 2 IF customer = null;

Keep in mind that lightweight transactions have performance tradeoffs (
http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0)
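
For context, a fuller sketch of the pattern (the table layout is an assumption, not from the original thread):

    CREATE TABLE item (
        item_id int PRIMARY KEY,
        customer text
    );

    -- claim item 2 for Joe only if nobody owns it yet; the result set
    -- contains an [applied] column saying whether the CAS round succeeded
    UPDATE item SET customer = 'Joe' WHERE item_id = 2 IF customer = null;

If [applied] comes back false, the item was already taken and the application should retry with another free row.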


On Mon, 7 Nov 2016 at 11:52 S Ahmed  wrote:

> Say I have 100 products in inventory; instead of having a counter I want
> to create one row per inventory item (100 rows).
>
> When someone purchases a product, how can I correctly assign that customer
> a product from inventory without having any race conditions etc?
>
> Thanks.
>
-- 

Justin Cameron

Senior Software Engineer | Instaclustr






Re: large number of pending compactions, sstables steadily increasing

2016-11-07 Thread Benjamin Roth
Hm, this MAY somehow relate to the issue I encountered recently:
https://issues.apache.org/jira/browse/CASSANDRA-12730
I also made a proposal to mitigate excessive (unnecessary) flushes during
repair streams but unfortunately nobody commented on it yet.
Maybe there are some opinions on it around here?

2016-11-07 20:15 GMT+00:00 Ben Slater :

> What I’ve seen happen a number of times is you get into a negative feedback
> loop:
> not enough capacity to keep up with compactions (often triggered by repair
> or compaction hitting a large partition) -> more sstables -> more expensive
> reads -> even less capacity to keep up with compactions -> repeat
>
> The way we deal with this at Instaclustr is typically to take the node
> offline to let it catch up with compactions. We take it offline by running
> nodetool disablegossip + disablethrift + disablebinary, unthrottle
> compactions (nodetool setcompactionthroughput 0) and then leave it to chug
> through compactions until it gets close to zero then reverse the settings
> or restart C* to set things back to normal. This typically resolves the
> issues. If you see it happening regularly your cluster probably needs more
> processing capacity (or other tuning).
>
> Cheers
> Ben
>
> On Tue, 8 Nov 2016 at 02:38 Eiti Kimura  wrote:
>
>> Hey guys,
>>
>> Do we have any conclusions about this case? Ezra, did you solve your
>> problem?
>> We are facing a very similar problem here: LeveledCompaction with VNodes,
>> and it looks like a node went into a weird state and started to consume a lot of
>> CPU; the compaction process seems to be stuck and the number of SSTables
>> has increased significantly.
>>
>> Do you have any clue about it?
>>
>> Thanks,
>> Eiti
>>
>>
>>
>> J.P. Eiti Kimura
>> Plataformas
>>
>> +55 19 3518  5500
>> + 55 19 98232 2792
>> skype: eitikimura
>>
>> 2016-09-11 18:20 GMT-03:00 Jens Rantil :
>>
>> I just want to chime in and say that we also had issues keeping up with
>> compaction once (with vnodes/ssd disks) and I also want to recommend
>> keeping track of your open file limit which might bite you.
>>
>> Cheers,
>> Jens
>>
>>
>> On Friday, August 19, 2016, Mark Rose  wrote:
>>
>> Hi Ezra,
>>
>> Are you making frequent changes to your rows (including TTL'ed
>> values), or mostly inserting new ones? If you're only inserting new
>> data, it's probable that size-tiered compaction would work better for
>> you. If you are TTL'ing whole rows, consider date-tiered.
>>
>> If leveled compaction is still the best strategy, one way to catch up
>> with compactions is to have less data per partition -- in other words,
>> use more machines. Leveled compaction is CPU expensive. You are CPU
>> bottlenecked currently, or from the other perspective, you have too
>> much data per node for leveled compaction.
>>
>> At this point, compaction is so far behind that you'll likely be
>> getting high latency if you're reading old rows (since dozens to
>> hundreds of uncompacted sstables will likely need to be checked for
>> matching rows). You may be better off with size tiered compaction,
>> even if it will mean always reading several sstables per read (higher
>> latency than when leveled can keep up).
>>
>> How much data do you have per node? Do you update/insert to/delete
>> rows? Do you TTL?
>>
>> Cheers,
>> Mark
>>
>> On Wed, Aug 17, 2016 at 2:39 PM, Ezra Stuetzel 
>> wrote:
>> > I have one node in my cluster 2.2.7 (just upgraded from 2.2.6 hoping to
>> fix
>> > issue) which seems to be stuck in a weird state -- with a large number
>> of
>> > pending compactions and sstables. The node is compacting about
>> 500gb/day,
>> > number of pending compactions is going up at about 50/day. It is at
>> about
>> > 2300 pending compactions now. I have tried increasing number of
>> compaction
>> > threads and the compaction throughput, which doesn't seem to help
>> eliminate
>> > the many pending compactions.
>> >
>> > I have tried running 'nodetool cleanup' and 'nodetool compact'. The
>> latter
>> > has fixed the issue in the past, but most recently I was getting OOM
>> errors,
>> > probably due to the large number of sstables. I upgraded to 2.2.7 and
>> am no
>> > longer getting OOM errors, but also it does not resolve the issue. I do
>> see
>> > this message in the logs:
>> >
>> >> INFO  [RMI TCP Connection(611)-10.9.2.218] 2016-08-17 01:50:01,985
>> >> CompactionManager.java:610 - Cannot perform a full major compaction as
>> >> repaired and unrepaired sstables cannot be compacted together. These
>> two set
>> >> of sstables will be compacted separately.
>> >
>> > Below are the 'nodetool tablestats' comparing a normal and the
>> problematic

Re: large number of pending compactions, sstables steadily increasing

2016-11-07 Thread Ben Slater
What I’ve seen happen a number of times is you get into a negative feedback
loop:
not enough capacity to keep up with compactions (often triggered by repair
or compaction hitting a large partition) -> more sstables -> more expensive
reads -> even less capacity to keep up with compactions -> repeat

The way we deal with this at Instaclustr is typically to take the node
offline to let it catch up with compactions. We take it offline by running
nodetool disablegossip + disablethrift + disablebinary, unthrottle
compactions (nodetool setcompactionthroughput 0) and then leave it to chug
through compactions until it gets close to zero then reverse the settings
or restart C* to set things back to normal. This typically resolves the
issues. If you see it happening regularly your cluster probably needs more
processing capacity (or other tuning).
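
As a concrete sketch, the sequence described above (all standard nodetool subcommands):

    nodetool disablegossip
    nodetool disablethrift
    nodetool disablebinary
    nodetool setcompactionthroughput 0    # unthrottle compactions
    nodetool compactionstats              # watch pending compactions fall

    # once close to zero: re-enable gossip/thrift/binary and restore the
    # throughput setting, or simply restart Cassandra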

Cheers
Ben

On Tue, 8 Nov 2016 at 02:38 Eiti Kimura  wrote:

> Hey guys,
>
> Do we have any conclusions about this case? Ezra, did you solve your
> problem?
> We are facing a very similar problem here: LeveledCompaction with VNodes,
> and it looks like a node went into a weird state and started to consume a lot of
> CPU; the compaction process seems to be stuck and the number of SSTables
> has increased significantly.
>
> Do you have any clue about it?
>
> Thanks,
> Eiti
>
>
>
> J.P. Eiti Kimura
> Plataformas
>
> +55 19 3518  5500
> + 55 19 98232 2792
> skype: eitikimura
>
> 2016-09-11 18:20 GMT-03:00 Jens Rantil :
>
> I just want to chime in and say that we also had issues keeping up with
> compaction once (with vnodes/ssd disks) and I also want to recommend
> keeping track of your open file limit which might bite you.
>
> Cheers,
> Jens
>
>
> On Friday, August 19, 2016, Mark Rose  wrote:
>
> Hi Ezra,
>
> Are you making frequent changes to your rows (including TTL'ed
> values), or mostly inserting new ones? If you're only inserting new
> data, it's probable that size-tiered compaction would work better for
> you. If you are TTL'ing whole rows, consider date-tiered.
>
> If leveled compaction is still the best strategy, one way to catch up
> with compactions is to have less data per partition -- in other words,
> use more machines. Leveled compaction is CPU expensive. You are CPU
> bottlenecked currently, or from the other perspective, you have too
> much data per node for leveled compaction.
>
> At this point, compaction is so far behind that you'll likely be
> getting high latency if you're reading old rows (since dozens to
> hundreds of uncompacted sstables will likely need to be checked for
> matching rows). You may be better off with size tiered compaction,
> even if it will mean always reading several sstables per read (higher
> latency than when leveled can keep up).
>
> How much data do you have per node? Do you update/insert to/delete
> rows? Do you TTL?
>
> Cheers,
> Mark
>
> On Wed, Aug 17, 2016 at 2:39 PM, Ezra Stuetzel 
> wrote:
> > I have one node in my cluster 2.2.7 (just upgraded from 2.2.6 hoping to
> fix
> > issue) which seems to be stuck in a weird state -- with a large number of
> > pending compactions and sstables. The node is compacting about 500gb/day,
> > number of pending compactions is going up at about 50/day. It is at about
> > 2300 pending compactions now. I have tried increasing number of
> compaction
> > threads and the compaction throughput, which doesn't seem to help
> eliminate
> > the many pending compactions.
> >
> > I have tried running 'nodetool cleanup' and 'nodetool compact'. The
> latter
> > has fixed the issue in the past, but most recently I was getting OOM
> errors,
> > probably due to the large number of sstables. I upgraded to 2.2.7 and am
> no
> > longer getting OOM errors, but also it does not resolve the issue. I do
> see
> > this message in the logs:
> >
> >> INFO  [RMI TCP Connection(611)-10.9.2.218] 2016-08-17 01:50:01,985
> >> CompactionManager.java:610 - Cannot perform a full major compaction as
> >> repaired and unrepaired sstables cannot be compacted together. These
> two set
> >> of sstables will be compacted separately.
> >
> > Below are the 'nodetool tablestats' comparing a normal and the
> problematic
> > node. You can see problematic node has many many more sstables, and they
> are
> > all in level 1. What is the best way to fix this? Can I just delete those
> > sstables somehow then run a repair?
> >>
> >> Normal node
> >>>
> >>> keyspace: mykeyspace
> >>>
> >>> Read Count: 0
> >>>
> >>> Read Latency: NaN ms.
> >>>
> >>> Write Count: 31905656
> >>>
> >>> Write Latency: 0.051713177939359714 ms.
> >>>
> >>> Pending Flushes: 0
> >>>
> >>> Table: mytable
> >>>
> >>> 

store individual inventory items in a table, how to assign them correctly

2016-11-07 Thread S Ahmed
Say I have 100 products in inventory; instead of having a counter I want to
create one row per inventory item (100 rows).

When someone purchases a product, how can I correctly assign that customer
a product from inventory without having any race conditions etc?

Thanks.


Re: Designing a table in cassandra

2016-11-07 Thread sat
Hi Carlos Alonso,

Thanks for your quick answer.

Thanks and Regards
A.SathishKumar

On Mon, Nov 7, 2016 at 2:26 AM, Carlos Alonso  wrote:

> Hi,
>
> I think your best bet is, as usual, the simplest one that can work, which,
> to me, in this case is the 3rd one. Creating one single device table that
>> contains the different 'versions' of the configuration over time, along
>> with a flag to know whether it was updated by the user or by the network gives you
> all the flexibility you need. The primary key you suggest sounds good to me.
>
> To finally validate the model it would be good to know which are the
> queries you're thinking of running against this model because as you
> probably know, Cassandra models should be query driven.
>
> The suggested primary key will work for queries like "Give me the
> version(s) of this particular device_name in this particular time range"
>
> Hope it helps.
>
> Regards
>
> Carlos Alonso | Software Engineer | @calonso 
>
> On 7 November 2016 at 01:23, sat  wrote:
>
>> Hi,
>>
>> We are new to Cassandra. For our POC, we tried creating tables and
>> inserting data as JSON, and all of that went fine. Now we are trying to
>> implement one of the application scenarios, and I am having difficulty
>> coming up with the best approach.
>>
>> Scenario:
>> We have a Device POJO which has some attributes/fields that are
>> read/write by users as well as the network, and some attributes/fields that only
>> the network can modify. When users need to configure a device they will create an
>> instance of the Device POJO and set/configure the applicable fields; however, the
>> network can update those attributes. We want to know the discrepancy between
>> the values configured by users and the values updated by the network. Hence
>> we have thought of 3 different approaches
>>
>> 1) Create multiple tables for the same Device like Device_Users and
>> Device_Network so that we can see the difference.
>>
>> 2) Create a different keyspace, as multiple objects like Device can have the
>> same requirement.
>>
>> 3) Create one "Device" table and insert one row for the user configuration
>> and another row for the network update. We will create this table with a
>> compound primary key (device_name, updated_by)
>>
>> Please let us know which is the best option (with their pros and cons if
>> possible) among these 3, and also let us know if there are other options.
>>
>> Thanks and Regards
>> A.SathishKumar
>>
>
>


-- 
A.SathishKumar
044-24735023


Re: Cassandra Python Driver : execute_async consumes lots of memory?

2016-11-07 Thread Lahiru Gamathige
Hi Rajesh,

By looking at your code I see that the memory would definitely grow, because
you write big batches asynchronously and you end up with a large number of
in-flight batch statements that all end up slowing things down. We recently
migrated some data to C*: what we did was create a data stream, write in
batches, and use a library which is sensitive to the back-pressure of the
stream. In your implementation there is no back-pressure to control it. We
migrated data pretty fast by keeping the CPU at 100% constantly and achieved
the highest performance (we used Scala with akka-streams and phantom-websudo).

I would consider using some streaming API to implement this. When you do
batching, make sure you don't exceed the max batch size, otherwise things will
slow down anyway.
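
A minimal sketch of such back-pressure with the Python driver, reusing the session and batch from the code quoted below; the in-flight cap of 64 is an arbitrary example:

    from threading import BoundedSemaphore

    MAX_IN_FLIGHT = 64                 # example cap, tune for your cluster
    in_flight = BoundedSemaphore(MAX_IN_FLIGHT)

    def on_success(rows):
        in_flight.release()

    def on_error(exc):
        in_flight.release()
        print exc                      # Python 2, as in the original code

    in_flight.acquire()                # blocks while the cap is reached
    future = session.execute_async(batch)
    future.add_callbacks(on_success, on_error)

This keeps memory bounded because execute_async can never run ahead of the cluster by more than the semaphore size.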

Lahiru

On Mon, Nov 7, 2016 at 8:51 AM, Rajesh Radhakrishnan <
rajesh.radhakrish...@phe.gov.uk> wrote:

> Hi
>
> We are trying to inject millions of rows of data into a table by executing
> batches of PreparedStatements.
>
> We found that when we use 'session.execute(batch)', it writes the data but
> very, very slowly.
> However, if we use 'session.execute_async(batch)' then it is relatively fast,
> but when it reaches a certain limit, it fills up the memory (of the python process).
>
> Our implementation:
> Cassandra 3.7.0 cluster  ring with 3 nodes (RedHat, 150GB Disk, 8GB of RAM
> each)
>
> Python 2.7.12
>
> Anybody know how to reduce the memory use of Cassandra-python driver API
> specifically for execute_async? Thank you!
>
>
>
> ===CODE ==
> sqlQuery = "INSERT INTO tableV (id, sample_name, pos, ref_base, var_base) values (?,?,?,?,?)"
> random_numbers_for_strains = random.sample(xrange(1, 300), 200)
> random_numbers = random.sample(xrange(1, 200), 20)
>
> totalCounter = 0
> c = 0
> time_init = time.time()
> for random_number_strain in random_numbers_for_strains:
>
>     sample_name = None
>     sample_name = 'sample' + str(random_number_strain)
>
>     cassandraCluster = CassandraCluster.CassandraCluster()
>     cluster = cassandraCluster.create_cluster_with_protocol2()
>     session = cluster.connect()
>     #session.default_timeout = 1800
>     session.set_keyspace(self.KEYSPACE_NAME)
>
>     preparedStatement = session.prepare(sqlQuery)
>
>     counter = 0
>     c = c + 1
>
>     for random_number in random_numbers:
>
>         totalCounter += 1
>         if counter == 0:
>             batch = BatchStatement()
>
>         counter += 1
>         if totalCounter % 1 == 0:
>             print "Total Count " + str(totalCounter)
>
>         batch.add(preparedStatement.bind([uuid.uuid1(), sample_name,
>             random_number, random.choice('GT'), random.choice('AC')]))
>         if counter % 50 == 0:
>             session.execute_async(batch)
>             #session.execute(batch)
>             batch = None
>             del batch
>             counter = 0
>
>     time.sleep(2)
>     session.cluster.shutdown()
>     random_number = None
>     del random_number
>     preparedStatement = None
>     session = None
>     del session
>     cluster = None
>     del cluster
>     cassandraCluster = None
>     del cassandraCluster
>     gc.collect()
>
> ===CODE ==
>
>
>
> Kind regards,
> Rajesh Radhakrishnan
>
>
>


Cassandra Python Driver : execute_async consumes lots of memory?

2016-11-07 Thread Rajesh Radhakrishnan
Hi

We are trying to inject millions of rows of data into a table by executing batches of 
PreparedStatements.

We found that when we use 'session.execute(batch)', it writes the data but very, 
very slowly.
However, if we use 'session.execute_async(batch)' then it is relatively fast, but 
when it reaches a certain limit, it fills up the memory (of the python process).

Our implementation:
Cassandra 3.7.0 cluster  ring with 3 nodes (RedHat, 150GB Disk, 8GB of RAM each)

Python 2.7.12

Anybody know how to reduce the memory use of Cassandra-python driver API 
specifically for execute_async? Thank you!



===CODE ==
sqlQuery = "INSERT INTO tableV (id, sample_name, pos, ref_base, var_base) values (?,?,?,?,?)"
random_numbers_for_strains = random.sample(xrange(1, 300), 200)
random_numbers = random.sample(xrange(1, 200), 20)

totalCounter = 0
c = 0
time_init = time.time()
for random_number_strain in random_numbers_for_strains:

    sample_name = None
    sample_name = 'sample' + str(random_number_strain)

    cassandraCluster = CassandraCluster.CassandraCluster()
    cluster = cassandraCluster.create_cluster_with_protocol2()
    session = cluster.connect()
    #session.default_timeout = 1800
    session.set_keyspace(self.KEYSPACE_NAME)

    preparedStatement = session.prepare(sqlQuery)

    counter = 0
    c = c + 1

    for random_number in random_numbers:

        totalCounter += 1
        if counter == 0:
            batch = BatchStatement()

        counter += 1
        if totalCounter % 1 == 0:
            print "Total Count " + str(totalCounter)

        batch.add(preparedStatement.bind([uuid.uuid1(), sample_name,
            random_number, random.choice('GT'), random.choice('AC')]))
        if counter % 50 == 0:
            session.execute_async(batch)
            #session.execute(batch)
            batch = None
            del batch
            counter = 0

    time.sleep(2)
    session.cluster.shutdown()
    random_number = None
    del random_number
    preparedStatement = None
    session = None
    del session
    cluster = None
    del cluster
    cassandraCluster = None
    del cassandraCluster
    gc.collect()

===CODE ==



Kind regards,
Rajesh Radhakrishnan



Re: large number of pending compactions, sstables steadily increasing

2016-11-07 Thread Eiti Kimura
Hey guys,

Do we have any conclusions about this case? Ezra, did you solve your
problem?
We are facing a very similar problem here: LeveledCompaction with VNodes,
and it looks like a node went into a weird state and started to consume a lot of
CPU; the compaction process seems to be stuck and the number of SSTables
has increased significantly.

Do you have any clue about it?

Thanks,
Eiti



J.P. Eiti Kimura
Plataformas

+55 19 3518  5500
+ 55 19 98232 2792
skype: eitikimura

2016-09-11 18:20 GMT-03:00 Jens Rantil :

> I just want to chime in and say that we also had issues keeping up with
> compaction once (with vnodes/ssd disks) and I also want to recommend
> keeping track of your open file limit which might bite you.
>
> Cheers,
> Jens
>
>
> On Friday, August 19, 2016, Mark Rose  wrote:
>
>> Hi Ezra,
>>
>> Are you making frequent changes to your rows (including TTL'ed
>> values), or mostly inserting new ones? If you're only inserting new
>> data, it's probable that size-tiered compaction would work better for
>> you. If you are TTL'ing whole rows, consider date-tiered.
>>
>> If leveled compaction is still the best strategy, one way to catch up
>> with compactions is to have less data per partition -- in other words,
>> use more machines. Leveled compaction is CPU expensive. You are CPU
>> bottlenecked currently, or from the other perspective, you have too
>> much data per node for leveled compaction.
>>
>> At this point, compaction is so far behind that you'll likely be
>> getting high latency if you're reading old rows (since dozens to
>> hundreds of uncompacted sstables will likely need to be checked for
>> matching rows). You may be better off with size tiered compaction,
>> even if it will mean always reading several sstables per read (higher
>> latency than when leveled can keep up).
>>
>> How much data do you have per node? Do you update/insert to/delete
>> rows? Do you TTL?
>>
>> Cheers,
>> Mark
>>
>> On Wed, Aug 17, 2016 at 2:39 PM, Ezra Stuetzel 
>> wrote:
>> > I have one node in my cluster 2.2.7 (just upgraded from 2.2.6 hoping to
>> fix
>> > issue) which seems to be stuck in a weird state -- with a large number
>> of
>> > pending compactions and sstables. The node is compacting about
>> 500gb/day,
>> > number of pending compactions is going up at about 50/day. It is at
>> about
>> > 2300 pending compactions now. I have tried increasing number of
>> compaction
>> > threads and the compaction throughput, which doesn't seem to help
>> eliminate
>> > the many pending compactions.
>> >
>> > I have tried running 'nodetool cleanup' and 'nodetool compact'. The
>> latter
>> > has fixed the issue in the past, but most recently I was getting OOM
>> errors,
>> > probably due to the large number of sstables. I upgraded to 2.2.7 and
>> am no
>> > longer getting OOM errors, but also it does not resolve the issue. I do
>> see
>> > this message in the logs:
>> >
>> >> INFO  [RMI TCP Connection(611)-10.9.2.218] 2016-08-17 01:50:01,985
>> >> CompactionManager.java:610 - Cannot perform a full major compaction as
>> >> repaired and unrepaired sstables cannot be compacted together. These
>> two set
>> >> of sstables will be compacted separately.
>> >
>> > Below are the 'nodetool tablestats' comparing a normal and the
>> problematic
>> > node. You can see problematic node has many many more sstables, and
>> they are
>> > all in level 1. What is the best way to fix this? Can I just delete
>> those
>> > sstables somehow then run a repair?
>> >>
>> >> Normal node
>> >>>
>> >>> keyspace: mykeyspace
>> >>>
>> >>> Read Count: 0
>> >>>
>> >>> Read Latency: NaN ms.
>> >>>
>> >>> Write Count: 31905656
>> >>>
>> >>> Write Latency: 0.051713177939359714 ms.
>> >>>
>> >>> Pending Flushes: 0
>> >>>
>> >>> Table: mytable
>> >>>
>> >>> SSTable count: 1908
>> >>>
>> >>> SSTables in each level: [11/4, 20/10, 213/100, 1356/1000, 306, 0, 0, 0, 0]
>> >>>
>> >>> Space used (live): 301894591442
>> >>>
>> >>> Space used (total): 301894591442
>> >>>
>> >>>
>> >>>
>> >>> Problematic node
>> >>>
>> >>> Keyspace: mykeyspace
>> >>>
>> >>> Read Count: 0
>> >>>
>> >>> Read Latency: NaN ms.
>> >>>
>> >>> Write Count: 30520190
>> >>>
>> >>> Write Latency: 0.05171286705620116 ms.
>> >>>
>> >>> Pending Flushes: 0
>> >>>
>> >>> Table: mytable
>> >>>
>> >>> SSTable count: 14105
>> >>>
>> >>> SSTables in each level: [13039/4, 21/10, 206/100, 831, 0, 0, 0, 0, 0]
>> >>>
>> >>> Space used (live): 561143255289
>> >>>
>> >>> Space used (total): 561143255289
>> >
>> > Thanks,
>> >
>> > Ezra
>>
>
>
> --
> Jens Rantil
> Backend engineer
> 

Re: Using a Set for UDTs, how is uniqueness established?

2016-11-07 Thread Ali Akhtar
Huh, so that means updates to the udt values won't be possible?

Sticking to a map then.
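
For reference, the map-based alternative might look like this (type/table names and fields are illustrative; a UDT inside a collection must be frozen):

    CREATE TYPE item_udt (
        id text,
        name text
    );

    CREATE TABLE items_by_pk (
        pk int PRIMARY KEY,
        items map<text, frozen<item_udt>>   -- map key = the UDT's id field
    );

    -- assigning to an existing key replaces the whole UDT value:
    UPDATE items_by_pk
    SET items['abc'] = {id: 'abc', name: 'new name'}
    WHERE pk = 1;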

On Mon, Nov 7, 2016 at 5:31 PM, DuyHai Doan  wrote:

> So, to compare UDT values, Cassandra will compare them field by field. So
> that udt1.equals(udt2) results in:
>
>   udt1.field1.equals(udt2.field1)
> && udt1.field2.equals(udt2.field2)
> ...
> && udt1.fieldN.equals(udt2.fieldN)
>
> Your idea of using the field "id" to distinguish between UDT values is good,
> e.g. if the "id" values mismatch then the 2 UDTs are different. However, if
> the "id" values do match, it does not guarantee that the UDT values match,
> since that requires that all other fields match.
>
>
>
> On Mon, Nov 7, 2016 at 1:14 PM, Ali Akhtar  wrote:
>
>> I have a UDT which contains a text 'id' field, which should be used to
>> establish the uniqueness of the UDT.
>>
>> I'd like to have a set field in a table, and I'd like to use the
>> id of the udts to establish uniqueness.
>>
>> Any ideas how this can be done? Also using Java, and c* 3.7
>>
>
>


Re: Using a Set for UDTs, how is uniqueness established?

2016-11-07 Thread DuyHai Doan
So, to compare UDT values, Cassandra will compare them field by field. So
that udt1.equals(udt2) results in:

  udt1.field1.equals(udt2.field1)
&& udt1.field2.equals(udt2.field2)
...
&& udt1.fieldN.equals(udt2.fieldN)

Your idea of using the field "id" to distinguish between UDT values is good, e.g.
if the "id" values mismatch then the 2 UDTs are different. However, if the
"id" values do match, it does not guarantee that the UDT values match, since
that requires that all other fields match.



On Mon, Nov 7, 2016 at 1:14 PM, Ali Akhtar  wrote:

> I have a UDT which contains a text 'id' field, which should be used to
> establish the uniqueness of the UDT.
>
> I'd like to have a set field in a table, and I'd like to use the
> id of the udts to establish uniqueness.
>
> Any ideas how this can be done? Also using Java, and c* 3.7
>


Using a Set for UDTs, how is uniqueness established?

2016-11-07 Thread Ali Akhtar
I have a UDT which contains a text 'id' field, which should be used to
establish the uniqueness of the UDT.

I'd like to have a set field in a table, and I'd like to use the id
of the udts to establish uniqueness.

Any ideas how this can be done? Also using Java, and c* 3.7


Re: failing bootstraps with OOM

2016-11-07 Thread Carlos Alonso
If what you need is a replacement node to increase the hardware specs, I'd
recommend an 'immediate node replacement' as described here:
http://mrcalonso.com/cassandra-instantaneous-in-place-node-replacement/

Basically the process just rsyncs the relevant data (data + configuration)
from one node to another, stops the old node and starts the new one. As
the configuration is all the same (just the IP will change) it joins the
ring as if it were the old one, and there's no need for any bootstrapping.
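
A rough sketch of the rsync step, assuming default package install paths (adjust to your layout):

    # while the old node is still running, pre-copy the bulk of the data:
    rsync -avh oldnode:/var/lib/cassandra/ /var/lib/cassandra/
    rsync -avh oldnode:/etc/cassandra/ /etc/cassandra/

    # then stop Cassandra on the old node, rsync once more to catch the
    # delta, and start the new node in its place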

BTW, are you using vnodes?

Regards

Carlos Alonso | Software Engineer | @calonso 

On 3 November 2016 at 15:46, Oleksandr Shulgin  wrote:

> On Thu, Nov 3, 2016 at 2:32 PM, Mike Torra  wrote:
>
>> Hi Alex - I do monitor sstable counts and pending compactions, but
>> probably not closely enough. In 3/4 regions the cluster is running in, both
>> counts are very high - ~30-40k sstables for one particular CF, and on many
>> nodes >1k pending compactions.
>>
>
> It is generally a good idea to try to keep the number of pending
> compactions minimal.  We usually see it is close to zero on every node
> during normal operations and less than some tens during maintenance such as
> repair.
>
> I had noticed this before, but I didn't have a good sense of what a "high"
>> number for these values was.
>>
>
> I would say anything higher than 20 probably requires someone to have a
> look and over 1k is very troublesome.
>
> It makes sense to me why this would cause the issues I've seen. After
>> increasing concurrent_compactors and compaction_throughput_mb_per_sec
>> (to 8 and 64mb, respectively), I'm starting to see those counts go down
>> steadily. Hopefully that will resolve the OOM issues, but it looks like it
>> will take a while for compactions to catch up.
>>
>> Thanks for the suggestions, Alex
>>
>
> Welcome. :-)
>
> --
> Alex
>
>


Re: Are Cassandra writes faster than reads?

2016-11-07 Thread Vikas Jaiman
Thanks Jeff and Ben for the info.

On Mon, Nov 7, 2016 at 6:44 AM, Ben Bromhead  wrote:

> They can be and it depends on your compaction strategy :)
>
> On Sun, 6 Nov 2016 at 21:24 Ali Akhtar  wrote:
>
>> tl;dr? I just want to know if updates are bad for performance, and if so,
>> for how long.
>>
>> On Mon, Nov 7, 2016 at 10:23 AM, Ben Bromhead 
>> wrote:
>>
>> Check out https://wiki.apache.org/cassandra/WritePathForUsers for the
>> full gory details.
>>
>> On Sun, 6 Nov 2016 at 21:09 Ali Akhtar  wrote:
>>
>> How long does it take for updates to get merged / compacted into the main
>> data file?
>>
>> On Mon, Nov 7, 2016 at 5:31 AM, Ben Bromhead  wrote:
>>
>> To add some flavor as to how the commitlog implementation is so quick.
>>
>> It only flushes to disk every 10s by default. So writes are effectively
>> done to memory and then to disk asynchronously later on. This is generally
>> accepted to be OK, as the write is also going to other nodes.
>>
>> You can of course change this behavior to flush on each write or to skip
>> the commitlog altogether (danger!). This however will change how "safe"
>> things are from a durability perspective.
>>
>> On Sun, Nov 6, 2016, 12:51 Jeff Jirsa  wrote:
>>
>> Cassandra writes are particularly fast, for a few reasons:
>>
>>
>>
>> 1)   Most writes go to a commitlog (append-only file, written
>> linearly, so particularly fast in terms of disk operations) and then pushed
>> to the memTable. Memtable is flushed in batches to the permanent data
>> files, so it buffers many mutations and then does a sequential write to
>> persist that data to disk.
>>
>> 2)   Reads may have to merge data from many data tables on disk.
>> Because the writes (described very briefly in step 1) write to immutable
>> files, updates/deletes have to be merged on read – this is extra effort for
>> the read path.
>>
>>
>>
>> If you don’t do much in terms of overwrites/deletes, and your partitions
>> are particularly small, and your data fits in RAM (probably mmap/page cache
>> of data files, unless you’re using the row cache), reads may be very fast
>> for you. Certainly individual reads on low-merge workloads can be < 0.1ms.
>>
>>
>>
>> -  Jeff
>>
>>
>>
>> *From: *Vikas Jaiman 
>> *Reply-To: *"user@cassandra.apache.org" 
>> *Date: *Sunday, November 6, 2016 at 12:42 PM
>> *To: *"user@cassandra.apache.org" 
>> *Subject: *Are Cassandra writes faster than reads?
>>
>>
>>
>> Hi all,
>>
>>
>>
>> Are Cassandra writes faster than reads? If yes, why is this so? I
>> am using consistency 1 and data is in memory.
>>
>>
>>
>> Vikas
>>
>> --
>> Ben Bromhead
>> CTO | Instaclustr 
>> +1 650 284 9692
>> Managed Cassandra / Spark on AWS, Azure and Softlayer
>>
>>
>> --
>> Ben Bromhead
>> CTO | Instaclustr 
>> +1 650 284 9692
>> Managed Cassandra / Spark on AWS, Azure and Softlayer
>>
>>
>> --
> Ben Bromhead
> CTO | Instaclustr 
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>
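
The 10-second flush behaviour Ben mentions corresponds to the commitlog settings in cassandra.yaml; the relevant defaults look like this:

    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 10000

    # the stricter alternative (sync before acknowledging each write):
    # commitlog_sync: batch
    # commitlog_sync_batch_window_in_ms: 2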


Re: Designing a table in cassandra

2016-11-07 Thread Carlos Alonso
Hi,

I think your best bet is, as usual, the simplest one that can work, which,
to me, in this case is the 3rd one. Creating one single device table that
contains the different 'versions' of the configuration over time, along
with a flag to know whether it was updated by the user or by the network gives you
all the flexibility you need. The primary key you suggest sounds good to me.

To finally validate the model it would be good to know which are the
queries you're thinking of running against this model because as you
probably know, Cassandra models should be query driven.

The suggested primary key will work for queries like "Give me the
version(s) of this particular device_name in this particular time range"
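
For illustration, a sketch of option 3 (columns other than device_name and updated_by are assumptions; adding updated_at as a clustering column keeps the history of versions over time):

    CREATE TABLE device (
        device_name text,
        updated_by text,        -- e.g. 'user' or 'network'
        updated_at timestamp,
        config text,            -- hypothetical: the serialised configuration
        PRIMARY KEY ((device_name), updated_by, updated_at)
    ) WITH CLUSTERING ORDER BY (updated_by ASC, updated_at DESC);

    -- versions of one device updated by the network in a time range:
    SELECT * FROM device
    WHERE device_name = 'd1'
      AND updated_by = 'network'
      AND updated_at >= '2016-11-01' AND updated_at < '2016-11-08';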

Hope it helps.

Regards

Carlos Alonso | Software Engineer | @calonso 

On 7 November 2016 at 01:23, sat  wrote:

> Hi,
>
> We are new to Cassandra. For our POC, we tried creating tables and
> inserting data as JSON, and all of that went fine. Now we are trying to
> implement one of the application scenarios, and I am having difficulty
> coming up with the best approach.
>
> Scenario:
> We have a Device POJO which has some attributes/fields that are
> read/write by users as well as the network, and some attributes/fields that only
> the network can modify. When users need to configure a device they will create an
> instance of the Device POJO and set/configure the applicable fields; however, the
> network can update those attributes. We want to know the discrepancy between
> the values configured by users and the values updated by the network. Hence
> we have thought of 3 different approaches
>
> 1) Create multiple tables for the same Device like Device_Users and
> Device_Network so that we can see the difference.
>
> 2) Create a different keyspace, as multiple objects like Device can have the
> same requirement.
>
> 3) Create one "Device" table and insert one row for the user configuration and
> another row for the network update. We will create this table with a compound
> primary key (device_name, updated_by)
>
> Please let us know which is the best option (with their pros and cons if
> possible) among these 3, and also let us know if there are other options.
>
> Thanks and Regards
> A.SathishKumar
>