Re: Versioning in cassandra

2013-09-03 Thread Laing, Michael
try the following. -ml

-- put this in <file> and run using 'cqlsh -f <file>'

DROP KEYSPACE latest;

CREATE KEYSPACE latest WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};

USE latest;

CREATE TABLE file (
parentid text, -- row_key, same for each version
id text, -- column_key, same for each version
contenttype map<timestamp, text>, -- differs by version; the version is the key to the map
PRIMARY KEY (parentid, id)
);

update file set contenttype = contenttype + {'2011-03-04':'pdf1'} where
parentid = 'd1' and id = 'f1';
update file set contenttype = contenttype + {'2011-03-05':'pdf2'} where
parentid = 'd1' and id = 'f1';
update file set contenttype = contenttype + {'2011-03-04':'pdf3'} where
parentid = 'd1' and id = 'f2';
update file set contenttype = contenttype + {'2011-03-05':'pdf4'} where
parentid = 'd1' and id = 'f2';

select * from file where parentid = 'd1';

-- returns:

-- parentid | id | contenttype
-- ---------+----+----------------------------------------------------------------
--       d1 | f1 | {'2011-03-04 00:00:00-0500': 'pdf1', '2011-03-05 00:00:00-0500': 'pdf2'}
--       d1 | f2 | {'2011-03-04 00:00:00-0500': 'pdf3', '2011-03-05 00:00:00-0500': 'pdf4'}

-- use an app to pop off the latest version from the map

-- map other varying fields using the same technique as used for contenttype
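
-- For example, a minimal sketch of that app-side step, assuming the DataStax
-- python-driver (the driver choice and connection details are assumptions,
-- not part of the example above):

#!/usr/bin/env python
# pick the newest version out of each contenttype map
from cassandra.cluster import Cluster

cql_session = Cluster().connect('latest')
rows = cql_session.execute("SELECT * FROM file WHERE parentid = 'd1'")

for row in rows:
    latest = max(row.contenttype.keys())   # map keys are timestamps, so max() is the latest version
    print row.parentid, row.id, latest, row.contenttype[latest]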



On Tue, Sep 3, 2013 at 2:31 PM, Vivek Mishra  wrote:

> create table file(id text , parentid text,contenttype text,version
> timestamp, descr text, name text, PRIMARY KEY(id,version) ) WITH CLUSTERING
> ORDER BY (version DESC);
>
> insert into file (id, parentid, version, contenttype, descr, name) values
> ('f2', 'd1', '2011-03-06', 'pdf', 'f2 file', 'file1');
> insert into file (id, parentid, version, contenttype, descr, name) values
> ('f2', 'd1', '2011-03-05', 'pdf', 'f2 file', 'file1');
> insert into file (id, parentid, version, contenttype, descr, name) values
> ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
> insert into file (id, parentid, version, contenttype, descr, name) values
> ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
> create index on file(parentid);
>
>
> select * from file where id='f1' and parentid='d1' limit 1;
>
> select * from file where parentid='d1' limit 1;
>
>
> Will it work for you?
>
> -Vivek
>
>
>
>
> On Tue, Sep 3, 2013 at 11:29 PM, Vivek Mishra wrote:
>
>> My bad. I did miss out to read "latest version" part.
>>
>> -Vivek
>>
>>
>> On Tue, Sep 3, 2013 at 11:20 PM, dawood abdullah <
>> muhammed.daw...@gmail.com> wrote:
>>
>>> I have tried with both the options creating secondary index and also
>>> tried adding parentid to primary key, but I am getting all the files with
>>> parentid 'yyy', what I want is the latest version of file with the
>>> combination of parentid, fileid. Say below are the records inserted in the
>>> file table:
>>>
>>> insert into file (id, parentid, version, contenttype, description, name)
>>> values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
>>> insert into file (id, parentid, version, contenttype, description, name)
>>> values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
>>> insert into file (id, parentid, version, contenttype, description, name)
>>> values ('f2', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
>>> insert into file (id, parentid, version, contenttype, description, name)
>>> values ('f2', 'd1', '2011-03-06', 'pdf', 'f1 file', 'file1');
>>>
>>> I want to write a query which returns me second and last record and not
>>> the first and third record, because for the first and third record there
>>> exists a latest version, for the combination of id and parentid.
>>>
>>> I am confused If at all this is achievable, please suggest.
>>>
>>> Dawood
>>>
>>>
>>>
>>> On Tue, Sep 3, 2013 at 10:58 PM, Vivek Mishra wrote:
>>>
 create secondary index over parentid.
 OR
 make it part of clustering key

 -Vivek


 On Tue, Sep 3, 2013 at 10:42 PM, dawood abdullah <
 muhammed.daw...@gmail.com> wrote:

> Jan,
>
> The solution you gave works spot on, but there is one more requirement
> I forgot to mention. Following is my table structure
>
> CREATE TABLE file (
>   id text,
>   contenttype text,
>   createdby text,
>   createdtime timestamp,
>   description text,
>   name text,
>   parentid text,
>   version timestamp,
>   PRIMARY KEY (id, version)
>
> ) WITH CLUSTERING ORDER BY (version DESC);
>
>
> The query (select * from file where id = 'xxx' limit 1;) provided
> solves the problem of finding the latest version file. But I have one more
> requirement of finding all the latest version files having parentid say
> 'yyy'.
>
> Please suggest how can this query be achieved.
>
> Dawood
>
>
>
> On Tue, Sep 3, 2013 at 12:43 AM, dawood abdullah <
> muhammed.daw...@gmail.com> wrote:
>
>> In my case version can be timestamp

Re: Versioning in cassandra

2013-09-03 Thread Laing, Michael
I use the technique described in my previous message to handle millions of
messages and their versions.

Actually, I use timeuuid's instead of timestamps, as they have more
'uniqueness'. Also I index my maps by a timeuuid that is the complement
(based on a future date) of a current timeuuid. Since maps are kept sorted
by key, this means I can just pop off the first one to get the most recent.
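
The complement calculation isn't spelled out above; one way it might look in
Python is sketched below, where the 2100-01-01 horizon and the helper name are
illustrative assumptions, not our actual implementation:

import time
import uuid

# Build a 'complement' timeuuid: embed (horizon - now) as the version-1 time
# field, so keys generated later compare lower and the most recent version
# sorts first in the map.
HORIZON = 4102444800  # 2100-01-01 00:00:00 UTC, an illustrative far-future date

def complement_timeuuid():
    remaining = int((HORIZON - time.time()) * 1e7)            # 100-ns ticks left until the horizon
    time_low = remaining & 0xffffffff
    time_mid = (remaining >> 32) & 0xffff
    time_hi_version = ((remaining >> 48) & 0x0fff) | 0x1000   # stamp UUID version 1
    clock_seq = uuid.uuid1().clock_seq
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3f) | 0x80   # RFC 4122 variant bits
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq & 0xff, uuid.getnode()))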

The downside of this approach is that you get more stuff returned to you
from Cassandra than you need. To mitigate that I queue a job to examine and
correct the situation if, upon doing a read, the number of versions for a
particular key is higher than some threshold, e.g. 50. There are many ways
to approach this problem.

Our actual implementation proceeds to another level, as we also have
replicas of versions. This happens because we process important
transactions in parallel and can expect up to 9 replicas of each version.
We journal them all and use them for reporting latencies in our processing
pipelines as well as for replay when we need to recover application state.

Regards,

Michael


On Tue, Sep 3, 2013 at 3:15 PM, Laing, Michael wrote:

> try the following. -ml
>
> -- put this in  and run using 'cqlsh -f 
>
> DROP KEYSPACE latest;
>
> CREATE KEYSPACE latest WITH replication = {
> 'class': 'SimpleStrategy',
> 'replication_factor' : 1
> };
>
> USE latest;
>
> CREATE TABLE file (
> parentid text, -- row_key, same for each version
> id text, -- column_key, same for each version
> contenttype map, -- differs by version, version is
> the key to the map
> PRIMARY KEY (parentid, id)
> );
>
> update file set contenttype = contenttype + {'2011-03-04':'pdf1'} where
> parentid = 'd1' and id = 'f1';
> update file set contenttype = contenttype + {'2011-03-05':'pdf2'} where
> parentid = 'd1' and id = 'f1';
> update file set contenttype = contenttype + {'2011-03-04':'pdf3'} where
> parentid = 'd1' and id = 'f2';
> update file set contenttype = contenttype + {'2011-03-05':'pdf4'} where
> parentid = 'd1' and id = 'f2';
>
> select * from file where parentid = 'd1';
>
> -- returns:
>
> -- parentid | id | contenttype
>
> ++--
> --   d1 | f1 | {'2011-03-04 00:00:00-0500': 'pdf1', '2011-03-05
> 00:00:00-0500': 'pdf2'}
> --   d1 | f2 | {'2011-03-04 00:00:00-0500': 'pdf3', '2011-03-05
> 00:00:00-0500': 'pdf4'}
>
> -- use an app to pop off the latest version from the map
>
> -- map other varying fields using the same technique as used for
> contenttype
>
>
>
> On Tue, Sep 3, 2013 at 2:31 PM, Vivek Mishra wrote:
>
>> create table file(id text , parentid text,contenttype text,version
>> timestamp, descr text, name text, PRIMARY KEY(id,version) ) WITH CLUSTERING
>> ORDER BY (version DESC);
>>
>> insert into file (id, parentid, version, contenttype, descr, name) values
>> ('f2', 'd1', '2011-03-06', 'pdf', 'f2 file', 'file1');
>> insert into file (id, parentid, version, contenttype, descr, name) values
>> ('f2', 'd1', '2011-03-05', 'pdf', 'f2 file', 'file1');
>> insert into file (id, parentid, version, contenttype, descr, name) values
>> ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1');
>> insert into file (id, parentid, version, contenttype, descr, name) values
>> ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1');
>> create index on file(parentid);
>>
>>
>> select * from file where id='f1' and parentid='d1' limit 1;
>>
>> select * from file where parentid='d1' limit 1;
>>
>>
>> Will it work for you?
>>
>> -Vivek
>>
>>
>>
>>
>> On Tue, Sep 3, 2013 at 11:29 PM, Vivek Mishra wrote:
>>
>>> My bad. I did miss out to read "latest version" part.
>>>
>>> -Vivek
>>>
>>>
>>> On Tue, Sep 3, 2013 at 11:20 PM, dawood abdullah <
>>> muhammed.daw...@gmail.com> wrote:
>>>
>>>> I have tried with both the options creating secondary index and also
>>>> tried adding parentid to primary key, but I am getting all the files with
>>

Re: Versioning in cassandra

2013-09-04 Thread Laing, Michael
Dawood,

In general that will work. However it does mean that you 1) read the old
version 2) update the new version and 3) write the archive version.

Step 2 is a problem: what if someone else has updated the old version after
step 1? There are also at least three atomic operations required.

However, these considerations may be mitigated using Cassandra 2.0 lightweight
transactions; and it is not a problem if you have only one updater.
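
For illustration, such a conditional update might look like the sketch below;
the python-driver, the keyspace, and a 'file' table where 'version' is a
regular (non-key) column are assumptions for the sketch, not the schema from
this thread:

# read-check-write guarded by a Cassandra 2.0 lightweight transaction
from cassandra.cluster import Cluster

session = Cluster().connect('latest')

old = session.execute("SELECT version FROM file WHERE id = 'f1'")[0]   # step 1: read

# step 2: the IF clause applies the update only if nobody changed 'version'
# since our read; the returned row carries an [applied] flag to check.
session.execute(
    "UPDATE file SET contenttype = 'pdf', version = %s WHERE id = 'f1' IF version = %s",
    ('2013-09-05', old.version))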

But another problem may be performance. You must test. The solution I
proposed does not require a read before write and does an atomic append,
even if multiple maps are being updated. It also defers deletions via ttl's
and a separate, manageable queue for 'cleanup' of large maps.

I think the most important word in my reply is: 'test'.

Cheers,

Michael


On Wed, Sep 4, 2013 at 9:05 AM, dawood abdullah
wrote:

> Michael,
>
> Your approach solves the problem, thanks for the solution. I was thinking
> of another approach as well where in I would create another column family
> say file_archive, so whenever an update is made to the File table, I will
> create a new version in the File and move the old version to the new
> file_archive table. Please let me know if the second approach is fine.
>
> Regards,
> Dawood
>
>
> On Wed, Sep 4, 2013 at 2:47 AM, Laing, Michael 
> wrote:
>
>> I use the technique described in my previous message to handle millions
>> of messages and their versions.
>>
>> Actually, I use timeuuid's instead of timestamps, as they have more
>> 'uniqueness'. Also I index my maps by a timeuuid that is the complement
>> (based on a future date) of a current timeuuid. Since maps are kept sorted
>> by key, this means I can just pop off the first one to get the most recent.
>>
>> The downside of this approach is that you get more stuff returned to you
>> from Cassandra than you need. To mitigate that I queue a job to examine and
>> correct the situation if, upon doing a read, the number of versions for a
>> particular key is higher than some threshold, e.g. 50. There are many ways
>> to approach this problem.
>>
>> Our actual implementation proceeds to another level, as we also have
>> replicas of versions. This happens because we process important
>> transactions in parallel and can expect up to 9 replicas of each version.
>> We journal them all and use them for reporting latencies in our processing
>> pipelines as well as for replay when we need to recover application state.
>>
>> Regards,
>>
>> Michael
>>
>>
>> On Tue, Sep 3, 2013 at 3:15 PM, Laing, Michael > > wrote:
>>
>>> try the following. -ml
>>>
>>> -- put this in  and run using 'cqlsh -f 
>>>
>>> DROP KEYSPACE latest;
>>>
>>> CREATE KEYSPACE latest WITH replication = {
>>> 'class': 'SimpleStrategy',
>>> 'replication_factor' : 1
>>> };
>>>
>>> USE latest;
>>>
>>> CREATE TABLE file (
>>> parentid text, -- row_key, same for each version
>>> id text, -- column_key, same for each version
>>> contenttype map, -- differs by version, version is
>>> the key to the map
>>> PRIMARY KEY (parentid, id)
>>> );
>>>
>>> update file set contenttype = contenttype + {'2011-03-04':'pdf1'} where
>>> parentid = 'd1' and id = 'f1';
>>> update file set contenttype = contenttype + {'2011-03-05':'pdf2'} where
>>> parentid = 'd1' and id = 'f1';
>>> update file set contenttype = contenttype + {'2011-03-04':'pdf3'} where
>>> parentid = 'd1' and id = 'f2';
>>> update file set contenttype = contenttype + {'2011-03-05':'pdf4'} where
>>> parentid = 'd1' and id = 'f2';
>>>
>>> select * from file where parentid = 'd1';
>>>
>>> -- returns:
>>>
>>> -- parentid | id | contenttype
>>>
>>> ++--
>>> --   d1 | f1 | {'2011-03-04 00:00:00-0500': 'pdf1', '2011-03-05
>>> 00:00:00-0500': 'pdf2'}
>>> --   d1 | f2 | {'2011-03-04 00:00:00-0500': 'pdf3', '2011-03-05
>>> 00:00:00-0500': 'pdf4'}
>>>
>>> -- use an app to pop off the latest version from the map
>>>
>>> -- map other varying fields using the same technique as

Re: Selecting multiple rows with composite partition keys using CQL3

2013-09-04 Thread Laing, Michael
you could try this. -ml

-- put this in <file> and run using 'cqlsh -f <file>'

DROP KEYSPACE carl_test;

CREATE KEYSPACE carl_test WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};

USE carl_test;

CREATE TABLE carl_table (
app text,
name text,
ts int,
data  text,
PRIMARY KEY ((app, name), ts)
);

update carl_table set data = 'whatever' where app='foo' and name='bar' and
ts=123;
update carl_table set data = 'whomever' where app='foo' and name='hello'
and ts=123;

SELECT * FROM carl_table WHERE app='foo' and name in ('bar', 'hello') and
ts=123;

-- returns:

-- app | name  | ts  | data
-- ----+-------+-----+----------
-- foo |   bar | 123 | whatever
-- foo | hello | 123 | whomever



On Wed, Sep 4, 2013 at 1:33 PM, Carl Lerche  wrote:

> I can't find a way to do this with the current implementation of CQL3. Are
> there any plans to add an OR feature to CQL3 or some other way to select a
> batch of disjoint composite keys?
>
>
> On Fri, Aug 30, 2013 at 7:52 PM, Carl Lerche  wrote:
>
>> Hello,
>>
>> I've been trying to figure out how to port my application to CQL3 based
>> on http://cassandra.apache.org/doc/cql3/CQL.html.
>>
>> I have a table with a primary key: ( (app, name), timestamp ). So, the
>> partition key would be composite (on app and name). I'm trying to figure
>> out if there is a way to select multiple rows that span partition keys.
>> Basically, I am trying to do:
>>
>> SELECT .. WHERE (app = 'foo' AND name = 'bar' AND timestamp = 123) OR
>> (app = 'foo' AND name='hello' AND timestamp = 123)
>>
>
>


Re: One node out of three not flushing memtables

2013-09-09 Thread Laing, Michael
I have seen something similar.

Of course correlation is not causation...

Like you, doing testing with heavy writes.

I was using a python client to drive the writes using the cql module which
is thrift based.

The correlation I eventually tracked down was that whichever node my python
client(s) connected to eventually ran out of memory because it could not
gain enough back by flushing memtables. It was just a matter of time.

I switched to the new python-driver client and the problem disappeared.

I have now been able to return almost all parameters to defaults and get
out of the business of manually managing the JVM heap, to my great relief!

Currently, I have to retool my test harness as I have been unable to drive
C*2.0.0 to destruction (yet).

Michael


On Mon, Sep 9, 2013 at 8:11 PM, Jan Algermissen
wrote:

> I have a strange pattern: In a cluster with three equally dimensioned and
> configured nodes I keep losing one because apparently it fails to flush
> its memtables:
>
> http://twitpic.com/dcrtel
>
>
> It is a different node every time.
>
> So far I understand that I should expect to see the chain-saw graph when
> memtables are build up and then get flushed. But what about that third
> node? Has anyone seen something similar?
>
> Jan
>
> C* dsc 2.0 ,  3x 4GB, 2CPU nodes with heavy writes of 70 col-rows (aprox
> 10 of those rows per wide row)
>
> I have turned off caches, reduced overall memtable size and set flush_writers
> to 2, rpc reader and writer threads to 1.
>
>
>


Re: Composite Column Grouping

2013-09-10 Thread Laing, Michael
You could try this. C* doesn't do it all for you, but it will efficiently
get you the right data.

-ml

-- put this in <file> and run using 'cqlsh -f <file>'

DROP KEYSPACE latest;

CREATE KEYSPACE latest WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};

USE latest;

CREATE TABLE time_series (
userid text,
pkid text,
colname map<text, text>,
PRIMARY KEY (userid, pkid)
);

UPDATE time_series SET colname = colname + {'200':'Col-Name-1'} WHERE userid = 'XYZ' AND pkid = '1000';
UPDATE time_series SET colname = colname + {'201':'Col-Name-2'} WHERE userid = 'XYZ' AND pkid = '1001';
UPDATE time_series SET colname = colname + {'202':'Col-Name-3'} WHERE userid = 'XYZ' AND pkid = '1000';
UPDATE time_series SET colname = colname + {'203':'Col-Name-4'} WHERE userid = 'XYZ' AND pkid = '1000';
UPDATE time_series SET colname = colname + {'204':'Col-Name-5'} WHERE userid = 'XYZ' AND pkid = '1002';

SELECT * FROM time_series WHERE userid = 'XYZ';

-- returns:
-- userid | pkid | colname
-- -------+------+-----------------------------------------------------------------
--    XYZ | 1000 | {'200': 'Col-Name-1', '202': 'Col-Name-3', '203': 'Col-Name-4'}
--    XYZ | 1001 | {'201': 'Col-Name-2'}
--    XYZ | 1002 | {'204': 'Col-Name-5'}

-- use an app to pop off the latest key/value from the map for each row,
then sort by key desc.


On Tue, Sep 10, 2013 at 9:21 AM, Ravikumar Govindarajan <
ravikumar.govindara...@gmail.com> wrote:

> I have been faced with a problem of grouping composites on the second-part.
>
> Lets say my CF contains this
>
>
> TimeSeriesCF
>key:UserID
>composite-col-name:TimeUUID:PKID
>
> Some sample data
>
> UserID = XYZ
>  Time:PKID
>Col-Name1 = 200:1000
>Col-Name2 = 201:1001
>Col-Name3 = 202:1000
>Col-Name4 = 203:1000
>Col-Name5 = 204:1002
>
> Whenever a time-series query is issued, it should return the following in
> time-desc order.
>
> UserID = XYZ
>   Col-Name5 = 204:1002
>   Col-Name4 = 203:1000
>   Col-Name2 = 201:1001
>
> Is something like this possible in Cassandra? Is there a different way to
> design and achieve the same objective?
>
> --
> Ravi
>
>


Re: Composite Column Grouping

2013-09-10 Thread Laing, Michael
If you have set up the table as described in my previous message, you could
run this python snippet to return the desired result:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
logging.basicConfig()

from operator import itemgetter

import cassandra
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cql_cluster = Cluster()
cql_session = cql_cluster.connect()
cql_session.set_keyspace('latest')

select_stmt = "select * from time_series where userid = 'XYZ'"
query = SimpleStatement(select_stmt)
rows = cql_session.execute(query)

results = []
for row in rows:
max_time = max(row.colname.keys())
results.append((row.userid, row.pkid, max_time, row.colname[max_time]))

sorted_results = sorted(results, key=itemgetter(2), reverse=True)
for result in sorted_results: print result

# prints:

# (u'XYZ', u'1002', u'204', u'Col-Name-5')
# (u'XYZ', u'1000', u'203', u'Col-Name-4')
# (u'XYZ', u'1001', u'201', u'Col-Name-2')



On Tue, Sep 10, 2013 at 6:32 PM, Laing, Michael
wrote:

> You could try this. C* doesn't do it all for you, but it will efficiently
> get you the right data.
>
> -ml
>
> -- put this in  and run using 'cqlsh -f 
>
> DROP KEYSPACE latest;
>
> CREATE KEYSPACE latest WITH replication = {
> 'class': 'SimpleStrategy',
> 'replication_factor' : 1
> };
>
> USE latest;
>
> CREATE TABLE time_series (
> userid text,
> pkid text,
> colname map,
> PRIMARY KEY (userid, pkid)
> );
>
> UPDATE time_series SET colname = colname + {'200':'Col-Name-1'} WHERE
> userid = 'XYZ' AND pkid = '1000';
> UPDATE time_series SET colname = colname +
> {'201':'Col-Name-2'} WHERE userid = 'XYZ' AND pkid = '1001';
> UPDATE time_series SET colname = colname +
> {'202':'Col-Name-3'} WHERE userid = 'XYZ' AND pkid = '1000';
> UPDATE time_series SET colname = colname +
> {'203':'Col-Name-4'} WHERE userid = 'XYZ' AND pkid = '1000';
> UPDATE time_series SET colname = colname +
> {'204':'Col-Name-5'} WHERE userid = 'XYZ' AND pkid = '1002';
>
> SELECT * FROM time_series WHERE userid = 'XYZ';
>
> -- returns:
> -- userid | pkid | colname
>
> --+--+-
> --XYZ | 1000 | {'200': 'Col-Name-1', '202': 'Col-Name-3', '203':
> 'Col-Name-4'}
> --XYZ | 1001 |   {'201':
> 'Col-Name-2'}
> --XYZ | 1002 |   {'204':
> 'Col-Name-5'}
>
> -- use an app to pop off the latest key/value from the map for each row,
> then sort by key desc.
>
>
> On Tue, Sep 10, 2013 at 9:21 AM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
>> I have been faced with a problem of grouping composites on the
>> second-part.
>>
>> Lets say my CF contains this
>>
>>
>> TimeSeriesCF
>>key:UserID
>>composite-col-name:TimeUUID:PKID
>>
>> Some sample data
>>
>> UserID = XYZ
>>  Time:PKID
>>Col-Name1 = 200:1000
>>Col-Name2 = 201:1001
>>Col-Name3 = 202:1000
>>Col-Name4 = 203:1000
>>Col-Name5 = 204:1002
>>
>> Whenever a time-series query is issued, it should return the following in
>> time-desc order.
>>
>> UserID = XYZ
>>   Col-Name5 = 204:1002
>>   Col-Name4 = 203:1000
>>   Col-Name2 = 201:1001
>>
>> Is something like this possible in Cassandra? Is there a different way to
>> design and achieve the same objective?
>>
>> --
>> Ravi
>>
>>
>
>


Re: Composite Column Grouping

2013-09-11 Thread Laing, Michael
Then you can do this. I handle millions of entries this way and it works
well if you are mostly interested in recent activity.

If you need to span all activity then you can use a separate table to
maintain the 'latest'. This table should also be sharded as entries will be
'hot'. Sharding will spread the heat and the tombstones (compaction load)
around the cluster.

-ml

-- put this in <file> and run using 'cqlsh -f <file>'

DROP KEYSPACE latest;

CREATE KEYSPACE latest WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};

USE latest;

CREATE TABLE time_series (
bucket_userid text, -- bucket is the beginning of a datetime span concatenated with a shard designator
pkid text,
timeuuid text,
colname text,
PRIMARY KEY (bucket_userid, timeuuid)
);

-- the example table is using 15 minute bucket spans and 2 shards for illustration (you would usually use more shards)
-- adjust these appropriately for your application

UPDATE time_series SET pkid = '1000', colname = 'Col-Name-1' where
bucket_userid = '2013-09-11T05:15-0_XYZ' AND timeuuid='200';
UPDATE time_series SET pkid = '1001', colname = 'Col-Name-2' where
bucket_userid = '2013-09-11T05:15-1_XYZ' AND timeuuid='201';
UPDATE time_series SET pkid = '1000', colname = 'Col-Name-3' where
bucket_userid = '2013-09-11T05:15-0_XYZ' AND timeuuid='202';
UPDATE time_series SET pkid = '1000', colname = 'Col-Name-4' where
bucket_userid = '2013-09-11T05:30-1_XYZ' AND timeuuid='203';
UPDATE time_series SET pkid = '1002', colname = 'Col-Name-5' where
bucket_userid = '2013-09-11T05:30-0_XYZ' AND timeuuid='204';

-- This query assumes that the 'current' span is 2013-09-11T05:30 and I am interested in this span and the previous one.

SELECT * FROM time_series
WHERE bucket_userid in ( -- go back as many spans as you need to, all shards in each span (cartesian product)
'2013-09-11T05:15-0_XYZ',
'2013-09-11T05:15-1_XYZ',
'2013-09-11T05:30-0_XYZ',
'2013-09-11T05:30-1_XYZ'
)
ORDER BY timeuuid DESC;

-- returns:
-- bucket_userid          | timeuuid | colname    | pkid
-- -----------------------+----------+------------+------
-- 2013-09-11T05:30-0_XYZ |      204 | Col-Name-5 | 1002
-- 2013-09-11T05:30-1_XYZ |      203 | Col-Name-4 | 1000
-- 2013-09-11T05:15-0_XYZ |      202 | Col-Name-3 | 1000
-- 2013-09-11T05:15-1_XYZ |      201 | Col-Name-2 | 1001
-- 2013-09-11T05:15-0_XYZ |      200 | Col-Name-1 | 1000

-- do a stable purge on pkid to get the result.


On Wed, Sep 11, 2013 at 1:01 AM, Ravikumar Govindarajan <
ravikumar.govindara...@gmail.com> wrote:

> Thanks Michael,
>
> But I cannot sort the rows in memory, as the number of columns will be
> quite huge.
>
> From the python script above:
>select_stmt = "select * from time_series where userid = 'XYZ'"
>
> This would return me many hundreds of thousands of columns. I need to go
> in time-series order using ranges [Pagination queries].
>
>
> On Wed, Sep 11, 2013 at 7:06 AM, Laing, Michael  > wrote:
>
>> If you have set up the table as described in my previous message, you
>> could run this python snippet to return the desired result:
>>
>> #!/usr/bin/env python
>> # -*- coding: utf-8 -*-
>> import logging
>> logging.basicConfig()
>>
>> from operator import itemgetter
>>
>> import cassandra
>> from cassandra.cluster import Cluster
>> from cassandra.query import SimpleStatement
>>
>> cql_cluster = Cluster()
>> cql_session = cql_cluster.connect()
>> cql_session.set_keyspace('latest')
>>
>> select_stmt = "select * from time_series where userid = 'XYZ'"
>> query = SimpleStatement(select_stmt)
>> rows = cql_session.execute(query)
>>
>> results = []
>> for row in rows:
>> max_time = max(row.colname.keys())
>> results.append((row.userid, row.pkid, max_time,
>> row.colname[max_time]))
>>
>> sorted_results = sorted(results, key=itemgetter(2), reverse=True)
>> for result in sorted_results: print result
>>
>> # prints:
>>
>> # (u'XYZ', u'1002', u'204', u'Col-Name-5')
>> # (u'XYZ', u'1000', u'203', u'Col-Name-4')
>> # (u'XYZ', u'1001', u'201', u'Col-Name-2')
>>
>>
>>
>> On Tue, Sep 10, 2013 at 6:32 PM, Laing, Michael <
>> michael.la...@nytimes.com> wrote:
>>
>>> You could try this. C* doesn't

Re: Composite Column Grouping

2013-09-11 Thread Laing, Michael
Here's a slightly better version and a python script. -ml

-- put this in <file> and run using 'cqlsh -f <file>'

DROP KEYSPACE latest;

CREATE KEYSPACE latest WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};

USE latest;

CREATE TABLE time_series (
bucket_userid text, -- bucket is the beginning of a datetime span concatenated with a shard designator
user_id text,
pkid text,
timeuuid text,
colname text,
PRIMARY KEY (bucket_userid, timeuuid)
);

UPDATE time_series
SET
user_id = 'XYZ',
pkid = '1000',
colname = 'Col-Name-1'
WHERE
bucket_userid = '2013-09-11T05:15-0_XYZ' AND
timeuuid='200'
;
UPDATE time_series
SET
user_id = 'XYZ',
pkid = '1001',
colname = 'Col-Name-2'
WHERE
bucket_userid = '2013-09-11T05:15-1_XYZ' AND
timeuuid='201'
;
UPDATE time_series
SET
user_id = 'XYZ',
pkid = '1000',
colname = 'Col-Name-3'
WHERE
bucket_userid = '2013-09-11T05:15-0_XYZ' AND
timeuuid='202'
;
UPDATE time_series
SET
user_id = 'XYZ',
pkid = '1000',
colname = 'Col-Name-4'
WHERE
bucket_userid = '2013-09-11T05:30-1_XYZ' AND
timeuuid='203'
;
UPDATE time_series
SET
user_id = 'XYZ',
pkid = '1002',
colname = 'Col-Name-5'
WHERE
bucket_userid = '2013-09-11T05:30-0_XYZ' AND
timeuuid='204'
;

-- This query assumes that the 'current' span is 2013-09-11T05:30 and I am interested in this span and the previous one.

SELECT * FROM time_series
WHERE bucket_userid IN ( -- go back as many spans as you need to, all shards in each span (cartesian product)
'2013-09-11T05:15-0_XYZ',
'2013-09-11T05:15-1_XYZ',
'2013-09-11T05:30-0_XYZ',
'2013-09-11T05:30-1_XYZ'
) -- you could add a range condition on timeuuid to further restrict the results
ORDER BY timeuuid DESC;

-- returns:
-- bucket_userid          | timeuuid | colname    | pkid | user_id
-- -----------------------+----------+------------+------+---------
-- 2013-09-11T05:30-0_XYZ |      204 | Col-Name-5 | 1002 | XYZ
-- 2013-09-11T05:30-1_XYZ |      203 | Col-Name-4 | 1000 | XYZ
-- 2013-09-11T05:15-0_XYZ |      202 | Col-Name-3 | 1000 | XYZ
-- 2013-09-11T05:15-1_XYZ |      201 | Col-Name-2 | 1001 | XYZ
-- 2013-09-11T05:15-0_XYZ |      200 | Col-Name-1 | 1000 | XYZ

-- do a stable purge on pkid to get the result


python script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
logging.basicConfig()

import cassandra
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cql_cluster = Cluster()
cql_session = cql_cluster.connect()
cql_session.set_keyspace('latest')

select_stmt = """
SELECT * FROM time_series
WHERE bucket_userid IN ( -- go back as many spans as you need to, all shards in each span (cartesian product)
'2013-09-11T05:15-0_XYZ',
'2013-09-11T05:15-1_XYZ',
'2013-09-11T05:30-0_XYZ',
'2013-09-11T05:30-1_XYZ'
)
ORDER BY timeuuid DESC;
"""

query = SimpleStatement(select_stmt)
rows = cql_session.execute(query)

pkids = set()
for row in rows:
if row.pkid in pkids:
continue
else:
print row.user_id, row.timeuuid, row.colname, row.pkid
pkids.add(row.pkid)

# prints:

# XYZ 204 Col-Name-5 1002
# XYZ 203 Col-Name-4 1000
# XYZ 201 Col-Name-2 1001


On Wed, Sep 11, 2013 at 6:13 AM, Laing, Michael
wrote:

> Then you can do this. I handle millions of entries this way and it works
> well if you are mostly interested in recent activity.
>
> If you need to span all activity then you can use a separate table to
> maintain the 'latest'. This table should also be sharded as entries will be
> 'hot'. Sharding will spread the heat and the tombstones (compaction load)
> around the cluster.
>
> -ml
>
> -- put this in  and run using 'cqlsh -f 
>
> DROP KEYSPACE latest;
>
> CREATE KEYSPACE latest WITH replication = {
> 'class': 'SimpleStrategy',
> 'replication_factor' : 1
> };
>
> USE latest;
>
> CREATE TABLE time_series (
> bucket_userid text, -- bucket is the beginning of a datetime span
> concatenated with a shard designator
> pkid text,
> timeuuid text,
> colname text,
> PRIMARY KEY (bucket_userid, timeuuid)
> );
>
> -- the example table is using 15 minute bucket spans and 2 shards for
> illustration (you would usually use more shards)
> -- adjust these appropriately for your application
>
> UPDATE time_se

Re: Complex JSON objects

2013-09-11 Thread Laing, Michael
A way to do this would be to express the JSON structure as (path, value)
tuples and then use a map to store them.

For example, your JSON above can be expressed as shown below where the path
is a list of keys/indices and the value is a scalar.

You could also concatenate the path elements and use them as a column key
instead. The advantage there is that you can do range queries against such
structures, and they will efficiently yield subtrees. E.g. a query for
"path > 'readings.1.' and path < 'readings.1.\u'" will yield the
appropriate rows.

ml

([u'events', 0, u'timestamp'], 1378686742465)

([u'events', 0, u'version'], 0.1)

([u'events', 0, u'type'], u'direction_change')

([u'events', 0, u'data', u'units'], u'miles')

([u'events', 0, u'data', u'direction'], u'NW')

([u'events', 0, u'data', u'offset'], 23)

([u'events', 1, u'timestamp'], 1378686742465)

([u'events', 1, u'version'], 0.1)

([u'events', 1, u'type'], u'altitude_change')

([u'events', 1, u'data', u'duration'], 18923)

([u'events', 1, u'data', u'rate'], 0.2)

([u'readings', 0, u'timestamp'], 1378686742465)

([u'readings', 0, u'value'], 20)

([u'readings', 0, u'rate_of_change'], 0.05)

([u'readings', 1, u'timestamp'], 1378686742466)

([u'readings', 1, u'value'], 22)

([u'readings', 1, u'rate_of_change'], 0.05)

([u'readings', 2, u'timestamp'], 1378686742467)

([u'readings', 2, u'value'], 21)

([u'readings', 2, u'rate_of_change'], 0.05)
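
A minimal sketch (not from the original message) of producing such (path,
value) tuples from a parsed JSON document:

import json

def flatten(node, path=None):
    """Yield (path, scalar) pairs for every leaf of a parsed JSON structure."""
    path = path or []
    if isinstance(node, dict):
        for key, value in node.items():
            for pair in flatten(value, path + [key]):
                yield pair
    elif isinstance(node, list):
        for index, value in enumerate(node):
            for pair in flatten(value, path + [index]):
                yield pair
    else:
        yield (path, node)

doc = json.loads('{"readings": [{"value": 20, "rate_of_change": 0.05}]}')  # abbreviated sample
for path, value in flatten(doc):
    print (path, value)
    # '.'.join(map(str, path)) gives the concatenated form usable as a column key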


On Wed, Sep 11, 2013 at 2:26 PM, Hartzman, Leslie <
leslie.d.hartz...@medtronic.com> wrote:

>  Hi,
>
> ** **
>
> What would be the recommended way to deal with a complex JSON structure,
> short of storing the whole JSON as a value to a column? What options are
> there to store dynamic data like this?
>
> ** **
>
> e.g.,
>
> ** **
>
> {
>
>   “ readings”: [
>
> {
>
>“value” : 20,
>
>   “rate_of_change” : 0.05,
>
>   “timestamp” :  1378686742465
>
>  },
>
> {
>
>“value” : 22,
>
>   “rate_of_change” : 0.05,
>
>   “timestamp” :  1378686742466
>
>  },
>
> {
>
>“value” : 21,
>
>   “rate_of_change” : 0.05,
>
>   “timestamp” :  1378686742467
>
>  }
>
>   ],
>
>   “events” : [
>
>  {
>
> “type” : “direction_change”,
>
> “version” : 0.1,
>
> “timestamp”: 1378686742465
>
>  “data” : {
>
>   “units” : “miles”,
>
>   “direction” : “NW”,
>
>   “offset” : 23
>
>   }
>
>},
>
>  {
>
> “type” : “altitude_change”,
>
> “version” : 0.1,
>
> “timestamp”: 1378686742465
>
>  “data” : {
>
>   “rate”: 0.2,
>
>   “duration” : 18923
>
>   }
>
> }
>
>]
>
> }
>
> ** **
>
>  
>
>


Re: Nodes not added to existing cluster

2013-09-25 Thread Laing, Michael
Check your security groups to be sure you have appropriate access.

If in a VPC check both IN and OUT; if using ACLs check those.


On Wed, Sep 25, 2013 at 3:41 PM, Skye Book  wrote:

> Hi all,
>
> I have a three node cluster using the EC2 Multi-Region Snitch currently
> operating only in US-EAST.  On having a node go down this morning, I
> started a new node with an identical configuration, except for the seed
> list, the listen address and the rpc address.  The new node comes up and
> creates its own cluster rather than joining the pre-existing ring.  I've
> tried creating a node both *before* ad *after* using `nodetool remove`
> for the bad node, each time with the same result.
>
> Does anyone have any suggestions for where to look that might put me on
> the right track?
>
> Thanks,
> -Skye
>
>


Re: How to select timestamp with CQL

2013-10-23 Thread Laing, Michael
http://www.datastax.com/documentation/cql/3.1/webhelp/index.html#cql/cql_reference/select_r.html


On Wed, Oct 23, 2013 at 6:50 AM, Alex N  wrote:

> Thanks!
> I can't find it in the documentation...
>
>
>
> 2013/10/23 Cyril Scetbon 
>
>> Hi,
>>
>> Now you can ask for the TTL and the TIMESTAMP as shown in the following
>> example :
>>
>> cqlsh:k1> select * FROM t1 ;
>>
>>  *ise*| *filtre* | *value_1*
>> ++-
>>  *cyril1* |  *2* |   *49926*
>>  *cyril2* |  *1* |   *18584*
>>  *cyril3* |  *2* |   *31415*
>>
>> cqlsh:k1> select filtre,writetime(filtre),ttl(filtre) FROM t1 ;
>>
>>  *filtre* | *writetime(filtre)* | *ttl(filtre)*
>> +---+-
>>   *2* |  *1380088288623000* |*null*
>>   *1* |  *1380088288636000* |*null*
>>   *2* |  *1380088289309000* |*null*
>>
>>  Regards
>> --
>> Cyril SCETBON
>>
>> On 23 Oct 2013, at 12:00, Alex N  wrote:
>>
>> Hi,
>> I was wondering how could I select column timestamp with CQL. I've been
>> using Hector so far, and it gives me this option. But I want to use
>> datastax CQL driver now.
>> I don't want to mess with this value! just read it. I know I should
>> probably have separate column with timestamp value created by my own, but I
>> don't want to change the schema and update milions of rows know.
>> I found this ticket https://issues.apache.org/jira/browse/CASSANDRA-4217and 
>> it's fixed but I don't know how to use it -
>>
>> SELECT key, value, timestamp(value) FROM foo; - this doesn't work.
>> Regards,
>> Alex
>>
>>
>>
>>
>


Re: How would you model that?

2013-11-08 Thread Laing, Michael
You could try this:

CREATE TABLE user_activity (shard text, user text, ts timeuuid, primary key
(shard, ts));

select user, ts from user_activity where shard in ('00', '01', ...) order
by ts desc;

Grab each user and ts the first time you see that user.

Use as many shards as you think you need to control row size and spread the
load.

Set ttls to expire user_activity entries when you are no longer interested
in them.
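
A small sketch of those steps, assuming the DataStax python-driver; the
keyspace name and the two-shard list are illustrative:

from cassandra.cluster import Cluster

session = Cluster().connect('mykeyspace')            # hypothetical keyspace
rows = session.execute(
    "select user, ts from user_activity "
    "where shard in ('00', '01') order by ts desc")  # list every shard you use

seen = set()
for row in rows:
    if row.user not in seen:      # first appearance == most recent activity for that user
        seen.add(row.user)
        print row.user, row.ts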

ml


On Fri, Nov 8, 2013 at 6:10 AM, pavli...@gmail.com wrote:

> Hey guys, I need to retrieve a list of distinct users based on their
> activity datetime. How can I model a table to store that kind of
> information?
>
> The straightforward decision was this:
>
> CREATE TABLE user_activity (user text primary key, ts timeuuid);
>
> but it turned out it is impossible to do a select like this:
>
> select * from user_activity order by ts;
>
> as it fails with "ORDER BY is only supported when the partition key is
> restricted by an EQ or an IN".
>
> How would you model the thing? Just need to have a list of users based on
> their last activity timestamp...
>
> Thanks!
>
>


Re: IN predicates on non-primary-key columns (%s) is not yet supported - then will it be ?

2013-11-08 Thread Laing, Michael
try this:

CREATE COLUMNFAMILY post (
KEY uuid,
author uuid,
blog timeuuid, -- sortable
name text,
data text,
PRIMARY KEY ( KEY, blog )
);

create index on post (author);

SELECT * FROM post
WHERE
blog >= 4d6b5fc5-487b-11e3-a6f4-406c8f1838fa
AND blog <= 50573ef8-487b-11e3-be65-406c8f1838fa
AND author= a6c9f405-487b-11e3-bd38-406c8f1838fa
;

works if blog can be modeled this way...

ml


On Fri, Nov 8, 2013 at 6:58 AM, Сергей Нагайцев  wrote:

> CREATE COLUMNFAMILY post (
> KEY uuid,
> author uuid,
> blog uuid,
> name text,
> data text,
> PRIMARY KEY ( KEY )
> );
>
> SELECT * FROM post WHERE blog IN (1,2) AND author=3 ALLOW FILTERING;
> (don't look at fact numbers are not uuids :)
>
> Error: IN predicates on non-primary-key columns (blog) is not yet supported
>
> And how to workaround this ?
> Manual index tables ? Any guidelines how to design them ?
>


Re: Best data structure for tracking most recent updates.

2013-11-08 Thread Laing, Michael
Here are a couple ideas:

1. You can rotate tables and truncate to avoid deleting.
2. You can shard your tables (partition key) to mitigate hotspots.
3. You can use a column key to store rows in timeuuid sequence.

create table recent_updates_00 (shard text, uuid timeuuid, message text, primary key (shard, uuid));
create table recent_updates_01 (shard text, uuid timeuuid, message text, primary key (shard, uuid));
...

You can determine 'shard' randomly within a range, e.g. 1 of 24 shards,
when you write. Sharding spreads the load as each shard is a row.

You determine which table to write to by current datetime, e.g. hour of
day, day of week, etc. and use the modulus based upon, e.g. every 5 hours,
every 3 days, etc. So you are only writing to 1 table at a time. Usually I
derive the datetime from the timeuuid so all is consistent. Within your
modulus range, you can truncate currently unused tables so they are ready
for reuse - truncation is overall much cheaper than deletion.

You can retrieve 'the latest' updates by doing a query like this - the
table is determined by current time, but possibly you will want to append
results from the 'prior' table if you do not satisfy your limit:

select uuid, message from recent_updates_xx where shard in ('00', '01',
...) order by uuid desc limit 10; -- get the latest 10

This is a very efficient query. You can improve efficiency somewhat by
altering the storage order in the table creates.
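
As a sketch of that bookkeeping (the shard count, table pool size, and
rotation rule below are illustrative assumptions):

import random
from datetime import datetime

NUM_SHARDS = 24   # e.g. 1 of 24 shards per write
NUM_TABLES = 5    # size of the rotating table pool

def target(now=None):
    """Pick a random shard and the table for the current datetime span."""
    now = now or datetime.utcnow()
    shard = '%02d' % random.randrange(NUM_SHARDS)
    table = 'recent_updates_%02d' % (now.hour % NUM_TABLES)   # one simple rotation scheme
    return table, shard

table, shard = target()
# e.g. ('recent_updates_03', '17'); the write then goes to that table and shard:
# session.execute("insert into " + table + " (shard, uuid, message) values (%s, now(), %s)",
#                 (shard, 'the message'))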

ml








On Fri, Nov 8, 2013 at 6:02 PM, Jacob Rhoden  wrote:

> I need to be able to show the most recent changes that have occurred in a
> system, I understand inserting every update into a tracking table and
> deleting old updates may not be great, as I may end up creating millions of
> tombstones. i.e. don't do this:
>
> create table recent_updates(uuid timeuuid primary key, message text);
> insert into recent_updates(now(), 'the message');
> insert into recent_updates(now(), 'the message');
> 
> insert into recent_updates(now(), 'the message');
> // delete all but the most recent ten messages.
>
> So how do people solve it? The following option occurs to me, but I am not
> sure if its the best option:
>
> create table recent_updates(record int primary key, message text, uuid
> timeuuid);
> insert into recent_updates(1, 'the message', now());
> insert into recent_updates(2, 'the message', now());
> 
> insert into recent_updates(10, 'the message', now());
> // rotate back to 1
> insert into recent_updates(1, 'the message', now());
>
> Doing it this way would require a query to find out what number in the
> sequence we are up to.
>
> Best regards,
> Jacob
>


Re: Efficient IP address location lookup

2013-11-16 Thread Laing, Michael
This approach is similar to Janne's.

But I used a shard as an example to make more even rows, and just converted
each IP to an int.

-- put this in <file> and run using 'cqlsh -f <file>'

DROP KEYSPACE jacob_test;

CREATE KEYSPACE jacob_test WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};

USE jacob_test;

CREATE TABLE location (
shard int,
start bigint,
end bigint,
country text,
city text,
PRIMARY KEY (shard, start)
);

-- shard is calculated as start % 12

-- range 100.0.0.0 - 100.0.0.10 == 1677721600 - 1677721610
INSERT INTO location (shard, start, end, country, city) VALUES
(4,1677721600,1677721610,'AU','Melbourne');

-- range 100.0.0.11 - 100.0.0.200
INSERT INTO location (shard, start, end, country, city) VALUES
(3,1677721611,1677721800,'US','New York');

-- range 100.0.0.201-100.0.0.255
INSERT INTO location (shard, start, end, country, city) VALUES
(1,1677721801,1677721855,'UK','London');

-- where is IP 100.0.0.30?
SELECT * FROM location WHERE shard IN (0,1,2,3,4,5,6,7,8,9,10,11) AND start
<= 1677721630 LIMIT 1;

-- returns:

-- shard | start      | city     | country | end
-- ------+------------+----------+---------+------------
--     3 | 1677721611 | New York |      US | 1677721800

--(1 rows)

-- app should check that 'end' value is >= IP
-- alternatively fill in ranges with 'unknown', as previously suggested
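
-- A short sketch of those app-side steps; the conversion helper uses only the
-- standard library, and the commented-out call assumes the python-driver:

import socket
import struct

NUM_SHARDS = 12

def ip_to_int(ip):
    """Dotted IPv4 string -> the bigint form stored in the table."""
    return struct.unpack('!I', socket.inet_aton(ip))[0]

ip = ip_to_int('100.0.0.30')                           # 1677721630
shards = ','.join(str(s) for s in range(NUM_SHARDS))   # '0,1,2,...,11'
query = ("SELECT * FROM location WHERE shard IN (%s) "
         "AND start <= %d LIMIT 1" % (shards, ip))

# row = session.execute(query)[0]
# found = row.end >= ip    # otherwise the address falls in an unmapped gap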



On Sat, Nov 16, 2013 at 3:48 AM, Janne Jalkanen wrote:

> Idea:
>
> Put only range end points in the table with primary key (part, remainder)
>
> insert into location (part, remainder, city) values (100,10,Sydney) //
> 100.0.0.1-100.0.0.10 is Sydney
> insert into location (part, remainder, city) values (100,50,Melbourne) //
> 100.0.0.11-100.0.0.5 is Melb
>
> then look up (100.0.0.30) as
>
> select * from location where part=100 and remainder >= 30 limit 1
>
> For nonused ranges just put in an empty city or some other known value :)
>
> /Janne
>
> On Nov 16, 2013, at 04:51 , Jacob Rhoden  wrote:
>
>
> On 16 Nov 2013, at 1:47 pm, Jon Haddad  wrote:
>
> Instead of determining your table first, you should figure out what you
> want to ask Cassandra.
>
>
> Thanks Jon, Perhaps I should have been more clear. I need to efficiently
> look up the location of an IP address.
>
> On Nov 15, 2013, at 4:36 PM, Jacob Rhoden  wrote:
>
> Hi Guys,
>
> It occurs to me that someone may have done this before and be willing to
> share, or may just be interested in helping work out it.
>
> Assuming a database table where the partition key is the first component
> of a users IPv4 address, i.e. (ip=100.0.0.1, part=100) and the remaining
> three parts of the IP address become a 24bit integer.
>
> create table location(
> part int,
> start bigint,
> end bigint,
> country text,
> city text,
> primary key (part, start, end));
>
> // range 100.0.0.0 - 100.0.0.10
> insert into location (part, start, end, country, city)
> values(100,0,10,'AU','Melbourne’);
>
> // range 100.0.0.11 - 100.0.0.200
> insert into location (part, start, end, country, city)
> values(100,11,200,'US','New York’);
>
> // range 100.0.0.201-100.0.0.255
> insert into location (part, start, end, country, city)
> values(100,201,255,'UK','London');
>
>
> What is the appropriate way to then query this? While the following is
> possible:
>
> select * from location where part=100 and start<=30
>
>
> What I need to do, is this, which seems not allowed. What is the correct
> way to query this?
>
> select * from location where part=100 and start<=30 and end>=30
>
>
> Or perhaps I’m going about this all wrong? Thanks!
>
>
>
>


Re: Cassandra 2.0.2 - Frequent Read timeouts and delays in replication on 3-node cluster in AWS VPC

2013-11-19 Thread Laing, Michael
We had a similar problem when our nodes could not sync using ntp due to VPC
ACL settings. -ml


On Mon, Nov 18, 2013 at 8:49 PM, Steven A Robenalt wrote:

> Hi all,
>
> I am attempting to bring up our new app on a 3-node cluster and am having
> problems with frequent read timeouts and slow inter-node replication.
> Initially, these errors were mostly occurring in our app server, affecting
> 0.02%-1.0% of our queries in an otherwise unloaded cluster. No exceptions
> were logged on the servers in this case, and reads in a single node
> environment with the same code and client driver virtually never see
> exceptions like this, so I suspect problems with the inter-cluster
> communication between nodes.
>
> The 3 nodes are deployed in a single AWS VPC, and are all in a common
> subnet. The Cassandra version is 2.0.2 following an upgrade this past
> weekend due to NPEs in a secondary index that were affecting certain
> queries under 2.0.1. The servers are m1.large instances running AWS Linux
> and Oracle JDK7u40. The first 2 nodes in the cluster are the seed nodes.
> All database contents are CQL tables with replication factor of 3, and the
> application is Java-based, using the latest Datastax 2.0.0-rc1 Java Driver.
>
> In testing with the application, I noticed this afternoon that the
> contents of the 3 nodes differed in their respective copies of the same
> table for newly written data, for time periods exceeding several minutes,
> as reported by cqlsh on each node. Specifying different hosts from the same
> server using cqlsh also exhibited timeouts on multiple attempts to connect,
> and on executing some queries, though they eventually succeeded in all
> cases, and eventually the data in all nodes was fully replicated.
>
> The AWS servers have a security group with only ports 22, 7000, 9042, and
> 9160 open.
>
> At this time, it seems that either I am still missing something in my
> cluster configuration, or maybe there are other ports that are needed for
> inter-node communication.
>
> Any advice/suggestions would be appreciated.
>
>
>
> --
> Steve Robenalt
> Software Architect
> HighWire | Stanford University
> 425 Broadway St, Redwood City, CA 94063
>
> srobe...@stanford.edu
> http://highwire.stanford.edu
>
>
>
>
>
>


Re: CQL and counters

2013-11-22 Thread Laing, Michael
Here's another example that may help:

-- put this in <file> and run using 'cqlsh -f <file>'

DROP KEYSPACE bryce_test;

CREATE KEYSPACE bryce_test WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};

USE bryce_test;

CREATE TABLE samples (
name text,
bucket text,
count counter,
total counter,
PRIMARY KEY (name, bucket)
);

UPDATE samples SET count = count + 1, total = total + 1 WHERE name='test'
AND bucket='2013-11-22T19:00';
UPDATE samples SET count = count + 1, total = total + 2 WHERE name='test'
AND bucket='2013-11-22T19:00';
UPDATE samples SET count = count + 1, total = total + 3 WHERE name='test'
AND bucket='2013-11-22T19:15';
UPDATE samples SET count = count + 1, total = total + 4 WHERE name='test'
AND bucket='2013-11-22T19:30';
UPDATE samples SET count = count + 1, total = total + 5 WHERE name='test'
AND bucket='2013-11-22T19:30';

SELECT * FROM samples;

SELECT * FROM samples
WHERE
name = 'test'
AND bucket >= '2013-11-22T19:30'
AND bucket <= '2013-11-22T19:45';

-- returns:

-- name | bucket           | count | total
-- -----+------------------+-------+-------
-- test | 2013-11-22T19:00 |     2 |     3
-- test | 2013-11-22T19:15 |     1 |     3
-- test | 2013-11-22T19:30 |     2 |     9

--(3 rows)


-- name | bucket           | count | total
-- -----+------------------+-------+-------
-- test | 2013-11-22T19:30 |     2 |     9

--(1 rows)
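
-- A small sketch of the app-side averaging step (total divided by count),
-- assuming the DataStax python-driver:

from cassandra.cluster import Cluster

session = Cluster().connect('bryce_test')
rows = session.execute("SELECT * FROM samples WHERE name = 'test'")

for name, bucket, count, total in rows:    # columns: name, bucket, count, total
    print bucket, float(total) / count     # average sampled value per bucket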



On Fri, Nov 22, 2013 at 7:21 PM, Tyler Hobbs  wrote:

> Something like this would work:
>
> CREATE TABLE foo (
> interface text,
> property text,
> bucket timestamp,
> count counter,
> PRIMARY KEY ((interface, property), bucket)
> )
>
> interface is 'NIC1' and property is 'Total' or 'Count'.
>
> To query over a date range, you'd run a query like:
>
> SELECT bucket, count FROM foo WHERE interface='NIC1' AND property='total'
> AND bucket > '2013-11-22 10:00:00' AND bucket < '2013-11-22 12:00:00';
>
>
> On Fri, Nov 22, 2013 at 4:48 PM, Bryce Godfrey 
> wrote:
>
>>  I’m looking for some guidance on how to model some stat tracking over
>> time, bucketed to some type of interval (15 min, hour, etc).
>>
>>
>>
>> As an example, let’s say I would like to track network traffic throughput
>> and bucket it to 15 minute intervals.  In our old model, using thrift I
>> would create a column family set to counter, and use a timestamp ticks for
>> the column name for a “total” and “count” column.  And as data was sampled,
>> we would increment count by one, and increment the total with the sampled
>> value for that time bucket.  The column name would give us the datetime for
>> the values, as well as provide me with a convenient row slice query to get
>> a date range for any given statistic.
>>
>>
>>
>> Key| 1215  | 1230 | 1245
>>
>> NIC1:Total   | 100| 56  |  872
>>
>> NIC1:Count | 15  | 15  | 15
>>
>>
>>
>> Then given the total/count I can show an average over time.
>>
>>
>>
>> In CQL it seems like I can’t do new counter columns at runtime unless
>> they are defined in the schema first or run an ALTER statement, which may
>> not be the correct way to go.  So is there a better way to model this type
>> of data with the new CQL world?  Nor do I know how to query that type of
>> data, similar to the row slice by column name.
>>
>>
>>
>> Thanks,
>>
>> Bryce
>>
>
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Choosing python client lib for Cassandra

2013-11-26 Thread Laing, Michael
We use the python-driver and have contributed some to its development.

I have been careful to not push too fast on features until we need them.
For example, we have just started using prepared statements - working well
BTW.

Next we will employ futures and start to exploit the async nature of new
interface to C*.

We are very familiar with libev in both C and python, and are happy to dig
into the code to add features and fix bugs as needed, so the rewards of
bypassing the old and focusing on the new seem worth the risks to us.
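
A minimal sketch of that next step, combining a prepared statement with
execute_async; the statement text and keyspace are illustrative:

from cassandra.cluster import Cluster

session = Cluster().connect('latest')
prepared = session.prepare("SELECT * FROM file WHERE parentid = ?")

# fire several queries without blocking, then gather the results
futures = [session.execute_async(prepared, (pid,)) for pid in ('d1', 'd2', 'd3')]

for future in futures:
    for row in future.result():   # result() blocks until that query completes
        print row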

ml


On Tue, Nov 26, 2013 at 1:16 PM, Jonathan Haddad  wrote:

> So, for cqlengine (https://github.com/cqlengine/cqlengine), we're
> currently using the thrift api to execute CQL until the native driver is
> out of beta.  I'm a little biased in recommending it, since I'm one of the
> primary authors.  If you've got cqlengine specific questions, head to the
> mailing list: https://groups.google.com/forum/#!forum/cqlengine-users
>
> If you want to roll your own solution, it might make sense to take an
> approach like we did and throw a layer on top of thrift so you don't have
> to do a massive rewrite of your entire app once you want to go native.
>
> Jon
>
>
> On Tue, Nov 26, 2013 at 9:46 AM, Kumar Ranjan wrote:
>
>> I have worked with Pycassa before and wrote a wrapper to use batch
>> mutation & connection pooling etc. But
>> http://wiki.apache.org/cassandra/ClientOptions recommends now to use CQL
>> 3 based api because Thrift based api (Pycassa) will be supported for
>> backward compatibility only. Apache site recommends to use Python api
>> written by DataStax which is still in Beta (As per their documentation).
>> See warnings from their python-driver/README.rst file
>>
>> *Warning*
>>
>> This driver is currently under heavy development, so the API and layout
>> of packages,modules, classes, and functions are subject to change. There
>> may also be serious bugs, so usage in a production environment is *not* 
>> recommended
>> at this time.
>>
>> DataStax site http://www.datastax.com/download/clientdrivers recommends
>> using DB-API 2.0 plus legacy api's. Is there more? Has any one compared
>> between CQL 3 based apis? Which stands out on top? Answers based on facts
>> will help the community so please refrain from opinions.
>>
>> Please help ??
>>
>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade
>


Re: Choosing python client lib for Cassandra

2013-11-26 Thread Laing, Michael
I think thread pooling is always in operation - and we haven't seen any
problems in that regard going to the 6 local nodes each client connects to.
We haven't tried batching yet.


On Tue, Nov 26, 2013 at 2:05 PM, Kumar Ranjan  wrote:

> Michael - thanks. Have you tried batching and thread pooling in
> python-driver? For now, i would avoid object mapper cqlengine, just because
> of my deadlines.
> —
> Sent from Mailbox <https://www.dropbox.com/mailbox> for iPhone
>
>
> On Tue, Nov 26, 2013 at 1:52 PM, Laing, Michael  > wrote:
>
>> We use the python-driver and have contributed some to its development.
>>
>> I have been careful to not push too fast on features until we need them.
>> For example, we have just started using prepared statements - working well
>> BTW.
>>
>> Next we will employ futures and start to exploit the async nature of new
>> interface to C*.
>>
>> We are very familiar with libev in both C and python, and are happy to
>> dig into the code to add features and fix bugs as needed, so the rewards of
>> bypassing the old and focusing on the new seem worth the risks to us.
>>
>> ml
>>
>>
>> On Tue, Nov 26, 2013 at 1:16 PM, Jonathan Haddad wrote:
>>
>>>  So, for cqlengine (https://github.com/cqlengine/cqlengine), we're
>>> currently using the thrift api to execute CQL until the native driver is
>>> out of beta.  I'm a little biased in recommending it, since I'm one of the
>>> primary authors.  If you've got cqlengine specific questions, head to the
>>> mailing list: https://groups.google.com/forum/#!forum/cqlengine-users
>>>
>>> If you want to roll your own solution, it might make sense to take an
>>> approach like we did and throw a layer on top of thrift so you don't have
>>> to do a massive rewrite of your entire app once you want to go native.
>>>
>>> Jon
>>>
>>>
>>> On Tue, Nov 26, 2013 at 9:46 AM, Kumar Ranjan wrote:
>>>
>>>>  I have worked with Pycassa before and wrote a wrapper to use batch
>>>> mutation & connection pooling etc. But
>>>> http://wiki.apache.org/cassandra/ClientOptions recommends now to use
>>>> CQL 3 based api because Thrift based api (Pycassa) will be supported for
>>>> backward compatibility only. Apache site recommends to use Python api
>>>> written by DataStax which is still in Beta (As per their documentation).
>>>> See warnings from their python-driver/README.rst file
>>>>
>>>> *Warning*
>>>>
>>>> This driver is currently under heavy development, so the API and layout
>>>> of packages,modules, classes, and functions are subject to change. There
>>>> may also be serious bugs, so usage in a production environment is *not* 
>>>> recommended
>>>> at this time.
>>>>
>>>> DataStax site http://www.datastax.com/download/clientdrivers recommends
>>>> using DB-API 2.0 plus legacy api's. Is there more? Has any one compared
>>>> between CQL 3 based apis? Which stands out on top? Answers based on facts
>>>> will help the community so please refrain from opinions.
>>>>
>>>> Please help ??
>>>>
>>>
>>>
>>>
>>>  --
>>> Jon Haddad
>>> http://www.rustyrazorblade.com
>>> skype: rustyrazorblade
>>>
>>
>>
>


Re: Choosing python client lib for Cassandra

2013-11-26 Thread Laing, Michael
That's not a problem we have faced yet.


On Tue, Nov 26, 2013 at 2:46 PM, Kumar Ranjan  wrote:

> How do you insert huge amount of data?
> —
> Sent from Mailbox <https://www.dropbox.com/mailbox> for iPhone
>
>
> On Tue, Nov 26, 2013 at 2:31 PM, Laing, Michael  > wrote:
>
>> I think thread pooling is always in operation - and we haven't seen any
>> problems in that regard going to the 6 local nodes each client connects to.
>> We haven't tried batching yet.
>>
>>
>> On Tue, Nov 26, 2013 at 2:05 PM, Kumar Ranjan wrote:
>>
>>> Michael - thanks. Have you tried batching and thread pooling in
>>> python-driver? For now, i would avoid object mapper cqlengine, just because
>>> of my deadlines.
>>> —
>>> Sent from Mailbox <https://www.dropbox.com/mailbox> for iPhone
>>>
>>>
>>> On Tue, Nov 26, 2013 at 1:52 PM, Laing, Michael <
>>> michael.la...@nytimes.com> wrote:
>>>
>>>> We use the python-driver and have contributed some to its development.
>>>>
>>>> I have been careful to not push too fast on features until we need
>>>> them. For example, we have just started using prepared statements - working
>>>> well BTW.
>>>>
>>>> Next we will employ futures and start to exploit the async nature of
>>>> new interface to C*.
>>>>
>>>> We are very familiar with libev in both C and python, and are happy to
>>>> dig into the code to add features and fix bugs as needed, so the rewards of
>>>> bypassing the old and focusing on the new seem worth the risks to us.
>>>>
>>>> ml
>>>>
>>>>
>>>> On Tue, Nov 26, 2013 at 1:16 PM, Jonathan Haddad wrote:
>>>>
>>>>>  So, for cqlengine (https://github.com/cqlengine/cqlengine), we're
>>>>> currently using the thrift api to execute CQL until the native driver is
>>>>> out of beta.  I'm a little biased in recommending it, since I'm one of the
>>>>> primary authors.  If you've got cqlengine specific questions, head to the
>>>>> mailing list: https://groups.google.com/forum/#!forum/cqlengine-users
>>>>>
>>>>> If you want to roll your own solution, it might make sense to take an
>>>>> approach like we did and throw a layer on top of thrift so you don't have
>>>>> to do a massive rewrite of your entire app once you want to go native.
>>>>>
>>>>> Jon
>>>>>
>>>>>
>>>>> On Tue, Nov 26, 2013 at 9:46 AM, Kumar Ranjan wrote:
>>>>>
>>>>>>  I have worked with Pycassa before and wrote a wrapper to use batch
>>>>>> mutation & connection pooling etc. But
>>>>>> http://wiki.apache.org/cassandra/ClientOptions recommends now to use
>>>>>> CQL 3 based api because Thrift based api (Pycassa) will be supported for
>>>>>> backward compatibility only. Apache site recommends to use Python api
>>>>>> written by DataStax which is still in Beta (As per their documentation).
>>>>>> See warnings from their python-driver/README.rst file
>>>>>>
>>>>>> *Warning*
>>>>>>
>>>>>> This driver is currently under heavy development, so the API and
>>>>>> layout of packages,modules, classes, and functions are subject to change.
>>>>>> There may also be serious bugs, so usage in a production environment is
>>>>>> *not* recommended at this time.
>>>>>>
>>>>>> DataStax site http://www.datastax.com/download/clientdrivers recommends
>>>>>> using DB-API 2.0 plus legacy api's. Is there more? Has any one compared
>>>>>> between CQL 3 based apis? Which stands out on top? Answers based on facts
>>>>>> will help the community so please refrain from opinions.
>>>>>>
>>>>>> Please help ??
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> Jon Haddad
>>>>> http://www.rustyrazorblade.com
>>>>> skype: rustyrazorblade
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Nodetool repair exceptions in Cassandra 2.0.2

2013-12-09 Thread Laing, Michael
My experience is that you must upgrade to 2.0.3 ASAP to fix this.

Michael


On Mon, Dec 9, 2013 at 6:39 PM, David Laube  wrote:

> Hi All,
>
> We are running Cassandra 2.0.2 and have recently stumbled upon an issue
> with nodetool repair. Upon running nodetool repair on each of the 5 nodes
> in the ring (one at a time) we observe the following exceptions returned to
> standard out;
>
>
> [2013-12-08 11:04:02,047] Repair session
> ff16c510-5ff7-11e3-97c0-5973cc397f8f for range
> (1246984843639507027,1266616572749926276] failed with error
> org.apache.cassandra.exceptions.RepairException: [repair
> #ff16c510-5ff7-11e3-97c0-5973cc397f8f on keyspace_name/col_family1,
> (1246984843639507027,1266616572749926276]] Validation failed in /10.x.x.48
> [2013-12-08 11:04:02,063] Repair session
> 284c8b40-5ff8-11e3-97c0-5973cc397f8f for range
> (-109256956528331396,-89316884701275697] failed with error
> org.apache.cassandra.exceptions.RepairException: [repair
> #284c8b40-5ff8-11e3-97c0-5973cc397f8f on keyspace_name/col_family2,
> (-109256956528331396,-89316884701275697]] Validation failed in /10.x.x.103
> [2013-12-08 11:04:02,070] Repair session
> 399e7160-5ff8-11e3-97c0-5973cc397f8f for range
> (8901153810410866970,8915879751739915956] failed with error
> org.apache.cassandra.exceptions.RepairException: [repair
> #399e7160-5ff8-11e3-97c0-5973cc397f8f on keyspace_name/col_family1,
> (8901153810410866970,8915879751739915956]] Validation failed in /10.x.x.103
> [2013-12-08 11:04:02,072] Repair session
> 3ea73340-5ff8-11e3-97c0-5973cc397f8f for range
> (1149084504576970235,1190026362216198862] failed with error
> org.apache.cassandra.exceptions.RepairException: [repair
> #3ea73340-5ff8-11e3-97c0-5973cc397f8f on keyspace_name/col_family1,
> (1149084504576970235,1190026362216198862]] Validation failed in /10.x.x.103
> [2013-12-08 11:04:02,091] Repair session
> 6f0da460-5ff8-11e3-97c0-5973cc397f8f for range
> (-5407189524618266750,-5389231566389960750] failed with error
> org.apache.cassandra.exceptions.RepairException: [repair
> #6f0da460-5ff8-11e3-97c0-5973cc397f8f on keyspace_name/col_family1,
> (-5407189524618266750,-5389231566389960750]] Validation failed in
> /10.x.x.103
> [2013-12-09 23:16:36,962] Repair session
> 7efc2740-6127-11e3-97c0-5973cc397f8f for range
> (1246984843639507027,1266616572749926276] failed with error
> org.apache.cassandra.exceptions.RepairException: [repair
> #7efc2740-6127-11e3-97c0-5973cc397f8f on keyspace_name/col_family1,
> (1246984843639507027,1266616572749926276]] Validation failed in /10.x.x.48
> [2013-12-09 23:16:36,986] Repair session
> a8c44260-6127-11e3-97c0-5973cc397f8f for range
> (-109256956528331396,-89316884701275697] failed with error
> org.apache.cassandra.exceptions.RepairException: [repair
> #a8c44260-6127-11e3-97c0-5973cc397f8f on keyspace_name/col_family2,
> (-109256956528331396,-89316884701275697]] Validation failed in /10.x.x.210
>
> The /var/log/cassandra/system.log shows similar info as above with no real
> explanation as to the root cause behind the exception(s).  There also does
> not appear to be any additional info in /var/log/cassandra/cassandra.log.
> We have tried restoring a recent snapshot of the keyespace in question to a
> separate staging ring and the repair runs successfully and without
> exception there. This is even after we tried insert/delete on the keyspace
> in the separate staging ring. Has anyone seen this behavior before and what
> can we do to resolve this? Any assistance would be greatly appreciated.
>
> Best regards,
> -Dave


Re: Recurring actions with 4 hour interval

2013-12-10 Thread Laing, Michael
2.0.3: system tables have a 1 hour memtable_flush_period which I have
observed to trigger compaction on the 4 hour mark. Going by memory tho...
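
If you would rather check than rely on my memory, the setting is visible in
the schema tables - something like this from cqlsh (assuming the 2.0.x system
schema):

SELECT columnfamily_name, memtable_flush_period_in_ms
FROM system.schema_columnfamilies
WHERE keyspace_name = 'system';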
-ml


On Tue, Dec 10, 2013 at 10:31 AM, Andre Sprenger
wrote:

> As far as I know there is nothing hard coded in Cassandra that kicks in
> every 4 hours. Turn on GC logging, maybe dump the output of jstats to a
> file and correlate this data with the Cassandra logs. Cassandra logs are
> pretty good in telling you what is going on.
>
>
> 2013/12/10 Joel Samuelsson 
>
>> Hello,
>>
>> We've been having a lot of problems with extremely long GC (and still do)
>> which I've asked about several times on this list (I can find links to
>> those discussions if anyone is interested).
>> We noticed a pattern that the GC pauses may be related to something
>> happening every 4 hours. Is there anything specific happening within
>> Cassandra with a 4 hour interval?
>>
>> Any help is much appreciated,
>> Joel Samuelsson
>>
>
>


Re: Exactly one wide row per node for a given CF?

2013-12-10 Thread Laing, Michael
You could shard your rows like the following.

You would need over 100 shards, possibly... so testing is in order :)

Michael

-- put this in <file> and run using 'cqlsh -f <file>'

DROP KEYSPACE robert_test;

CREATE KEYSPACE robert_test WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};

USE robert_test;

CREATE TABLE bdn_index_pub (
tree int,
shard int,
pord int,
hpath text,
PRIMARY KEY ((tree, shard), pord)
);

-- shard is calculated as pord % 12

COPY bdn_index_pub (tree, shard, pord, hpath) FROM STDIN;
1, 1, 1, "Chicago"
5, 3, 15, "New York"
1, 5, 5, "Melbourne"
3, 2, 2, "San Francisco"
1, 3, 3, "Palo Alto"
\.

SELECT * FROM bdn_index_pub
WHERE shard IN (0,1,2,3,4,5,6,7,8,9,10,11)
AND tree =  1
AND pord < 4
AND pord > 0
ORDER BY pord desc
;

-- returns:

-- tree | shard | pord | hpath
-- ------+-------+------+--------------
--     1 |     3 |    3 |   "Palo Alto"
--     1 |     1 |    1 |     "Chicago"



On Tue, Dec 10, 2013 at 8:41 AM, Robert Wille  wrote:

> I have a question about this statement:
>
> When rows get above a few 10’s  of MB things can slow down, when they get
> above 50 MB they can be a pain, when they get above 100MB it’s a warning
> sign. And when they get above 1GB, well you don’t want to know what
> happens then.
>
> I tested a data model that I created. Here’s the schema for the table in
> question:
>
> CREATE TABLE bdn_index_pub (
>
> tree INT,
>
> pord INT,
>
> hpath VARCHAR,
>
> PRIMARY KEY (tree, pord)
>
> );
>
> As a test, I inserted 100 million records. tree had the same value for
> every record, and I had 100 million values for pord. hpath averaged about
> 50 characters in length. My understanding is that all 100 million strings
> would have been stored in a single row, since they all had the same value
> for the first component of the primary key. I didn’t look at the size of
> the table, but it had to be several gigs (uncompressed). Contrary to what
> Aaron says, I do want to know what happens, because I didn’t experience any
> issues with this table during my test. Inserting was fast. The last batch
> of records inserted in approximately the same amount of time as the first
> batch. Querying the table was fast. What I didn’t do was test the table
> under load, nor did I try this in a multi-node cluster.
>
> If this is bad, can somebody suggest a better pattern? This table was
> designed to support a query like this: select hpath from bdn_index_pub
> where tree = :tree and pord >= :start and pord <= :end. In my application,
> most trees will have less than a million records. A handful will have 10’s
> of millions, and one of them will have 100 million.
>
> If I need to break up my rows, my first instinct would be to divide each
> tree into blocks of say 10,000 and change tree to a string that contains
> the tree and the block number. Something like this:
>
> 17:0, 0, ‘/’
> …
> 17:0, , ’/a/b/c’
> 17:1,1, ‘/a/b/d’
> …
>
> I’d then need to issue an extra query for ranges that crossed block
> boundaries.
>
> Any suggestions on a better pattern?
>
> Thanks
>
> Robert
>
> From: Aaron Morton 
> Reply-To: 
> Date: Tuesday, December 10, 2013 at 12:33 AM
> To: Cassandra User 
> Subject: Re: Exactly one wide row per node for a given CF?
>
> But this becomes troublesome if I add or remove nodes. What effectively I
>> want is to partition on the unique id of the record modulus N (id % N;
>> where N is the number of nodes).
>
> This is exactly the problem consistent hashing (used by cassandra) is
> designed to solve. If you hash the key and modulo the number of nodes,
> adding and removing nodes requires a lot of data to move.
>
> I want to be able to randomly distribute a large set of records but keep
>> them clustered in one wide row per node.
>
> Sounds like you should revisit your data modelling, this is a pretty well
> known anti pattern.
>
> When rows get above a few 10’s  of MB things can slow down, when they get
> above 50 MB they can be a pain, when they get above 100MB it’s a warning
> sign. And when they get above 1GB, well you don’t want to know what
> happens then.
>
> It’s a bad idea and you should take another look at the data model. If you
> have to do it, you can try the ByteOrderedPartitioner which uses the row
> key as a token, given you total control of the row placement.
>
> Cheers
>
>
> -
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 4/12/2013, at 8:32 pm, Vivek Mishra  wrote:
>
> So Basically you want to create a cluster of multiple unique keys, but
> data which belongs to one unique should be colocated. correct?
>
> -Vivek
>
>
> On Tue, Dec 3, 2013 at 10:39 AM, onlinespending 
> wrote:
>
>> Subject says it all. I want to be able to randomly distribute a large set
>> of records but keep them clustered in one wide row per node.
>>
>> As an example, lets say I’ve got a collection of 

Re: Crash with TombstoneOverwhelmingException

2013-12-25 Thread Laing, Michael
It's a feature:

In the stock cassandra.yaml file for 2.0.3 see:

# When executing a scan, within or across a partition, we need to keep the
> # tombstones seen in memory so we can return them to the coordinator, which
> # will use them to make sure other replicas also know about the deleted
> rows.
> # With workloads that generate a lot of tombstones, this can cause
> performance
> # problems and even exhaust the server heap.
> # (
> http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
> )
> # Adjust the thresholds here if you understand the dangers and want to
> # scan more tombstones anyway.  These thresholds may also be adjusted at
> runtime
> # using the StorageService mbean.
> tombstone_warn_threshold: 1000
> tombstone_failure_threshold: 100000


You are hitting the failure threshold.
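
As an aside, Rahul's question below about the size of the hints cf can be
checked from cqlsh - though on a large hints table the count itself may be
slow, and it is capped by the LIMIT:

SELECT count(*) FROM system.hints LIMIT 1000000;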

ml


On Wed, Dec 25, 2013 at 12:17 PM, Rahul Menon  wrote:

> Sanjeeth,
>
> Looks like the error is being populated from the hintedhandoff, what is
> the size of your hints cf?
>
> Thanks
> Rahul
>
>
> On Wed, Dec 25, 2013 at 8:54 PM, Sanjeeth Kumar wrote:
>
>> Hi all,
>>   One of my cassandra nodes crashes with the following exception
>> periodically -
>> ERROR [HintedHandoff:33] 2013-12-25 20:29:22,276 SliceQueryFilter.java
>> (line 200) Scanned over 100000 tombstones; query aborted (see
>> tombstone_fail_threshold)
>> ERROR [HintedHandoff:33] 2013-12-25 20:29:22,278 CassandraDaemon.java
>> (line 187) Exception in thread Thread[HintedHandoff:33,1,main]
>> org.apache.cassandra.db.filter.TombstoneOverwhelmingException
>> at
>> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:201)
>> at
>> org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
>> at
>> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
>> at
>> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
>> at
>> org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
>> at
>> org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
>> at
>> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1487)
>> at
>> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1306)
>> at
>> org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:351)
>> at
>> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:309)
>> at
>> org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:92)
>> at
>> org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:530)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
>>
>> Why does this happen? Does this relate to any incorrect config value?
>>
>> The Cassandra Version I'm running is
>> ReleaseVersion: 2.0.3
>>
>> - Sanjeeth
>>
>>
>


Re: Latest Stable version of cassandra in production

2014-01-09 Thread Laing, Michael
I would like to +1 Jan.

We are using C* 2.0 and have just gone into production directly supporting
the latest revision of www.nytimes.com.

I avoid new features unless I really need them; we are prepared to read
code and make fixes ourselves if necessary, but it has not been.

Best regards,

Michael


On Thu, Jan 9, 2014 at 7:04 AM, Jan Algermissen
wrote:

>
> On 09.01.2014, at 03:36, Sanjeeth Kumar  wrote:
>
> > Hi all,
> >   What is the latest stable version of cassandra  you have in production
> ? We are migrating a large chunk of our mysql database to cassandra. I see
> a lot of discussions regarding 1.* versions, but I have not seen / could
> not find discussions regarding using 2.* versions in production. Any
> suggestions for the version based on your experience?
> >
>
> I came to C* with 2.0 and have it up and running since about August. I have
> a production environment but can live with occasional problems. I could
> even setup the whole 3-node cluster from scratch without being killed :-)
>
> So I am happy with 2.0 having a short history of production use.
>
> What I noticed or saw on the list is that newer features (auto-paging, CAS
> support) seem to make it into the production releases in a non-production
> state.
>
> It appears this is generally accepted practice in the community - and
> honestly, I would not really know how to thoroughly test such features
> without handing them to the community to run them. So I do not think the C*
> crew is to blame, really.
>
> I have not had any issues with C* 2.0 when steering clear of these
> features (e.g. manually disable paging when working with ranges and do it
> yourself).
>
> Jan
>
>
>
> > - Sanjeeth
>
>


Re: Latest Stable version of cassandra in production

2014-01-09 Thread Laing, Michael
Good: doesn't OOM on smallish machines, can use defaults for almost all
params w good results.

Bad: watch the list like a hawk to avoid problems others have, be aware of
bug fixes, workarounds, and Jira issues.

ml


On Thu, Jan 9, 2014 at 12:58 PM, Bruce Durling  wrote:

> So, what are you getting from 2.0 if you aren't using the new
> features? Why not stick with 1.2.x?
>
> cheers,
> Bruce
>
> On Thu, Jan 9, 2014 at 12:37 PM, Laing, Michael
>  wrote:
> > I would like to +1 Jan.
> >
> > We are using C* 2.0 and have just gone into production directly
> supporting
> > the latest revision of www.nytimes.com.
> >
> > I avoid new features unless I really need them; we are prepared to read
> code
> > and make fixes ourselves if necessary, but it has not been.
> >
> > Best regards,
> >
> > Michael
> >
> >
> > On Thu, Jan 9, 2014 at 7:04 AM, Jan Algermissen <
> jan.algermis...@nordsc.com>
> > wrote:
> >>
> >>
> >> On 09.01.2014, at 03:36, Sanjeeth Kumar  wrote:
> >>
> >> > Hi all,
> >> >   What is the latest stable version of cassandra  you have in
> production
> >> > ? We are migrating a large chunk of our mysql database to cassandra.
> I see a
> >> > lot of discussions regarding 1.* versions, but I have not seen /
> could not
> >> > find discussions regarding using 2.* versions in production. Any
> suggestions
> >> > for the version based on your experience?
> >> >
> >>
> >> I came to C* with 2.0 and have it up and running since about August. I
> have
> >> a production environment but can live with occasional problems. I could
> even
> >> setup the whole 3-node cluster from scratch without being killed :-)
> >>
> >> So I am happy with 2.0 having a short history of production use.
> >>
> >> What I noticed or saw on the list is that newer features (auto-paging,
> CAS
> >> support) seem to make it into the production releases in a
> non-production
> >> state.
> >>
> >> It appears this is generally accepted practice in the community - and
> >> honestly, I would not really know how to thoroughly test such features
> >> without handing them to the community to run them. So I do not think
> the C*
> >> crew is to blame, really.
> >>
> >> I have not had any issues with C* 2.0 when steering clear of these
> >> features (e.g. manually disable paging when working with ranges and do
> it
> >> yourself).
> >>
> >> Jan
> >>
> >>
> >>
> >> > - Sanjeeth
> >>
> >
>
>
>
> --
> @otfrom | CTO & co-founder @MastodonC | mastodonc.com
> See recent coverage of us in the Economist http://econ.st/WeTd2i and
> the Financial Times http://on.ft.com/T154BA
>


No deletes - is periodic repair needed? I think not...

2014-01-25 Thread Laing, Michael
I have a simple set of tables that can be grouped as follows:

1. Regular values, no deletes, no overwrites, write heavy, ttl's to manage
size

2. Regular values, no deletes, some overwrites, read heavy (10 to 1), ttl's
to manage size

3. Counter values, no deletes, update heavy, rotation/truncation to manage
size

It seems to me that I can set gc_grace_seconds to 0 on each set of tables
and that I do not need to do periodic repair on any of them.

Is this the case? If so it relieves an operational headache and eliminates
a lot of processing.
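
The change itself is trivial - a sketch, with a placeholder table name:

ALTER TABLE my_keyspace.regular_values WITH gc_grace_seconds = 0;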

The only downside I can see is if (when) a node really gets wiped out -
then I might lose any hints it may be holding as a coordinator and maybe
some other stuff. This is a rare occurrence, but if it happened I guess I
would replace the node, repairing and cleaning it as needed, and run repair
-pr sequentially on all other nodes to be sure the cluster is in sync.

BTW I am using Cassandra 2.0.3 and local quorum reads and writes on a 2 dc
12-node cluster.

Thanks,

Michael


Re: No deletes - is periodic repair needed? I think not...

2014-01-27 Thread Laing, Michael
Thanks Sylvain,

Your assumption is correct!

So I think I actually have 4 classes:

1.Regular values, no deletes, no overwrites, write heavy, variable
ttl's to manage size
2.Regular values, no deletes, some overwrites, read heavy (10 to 1),
fixed ttl's to manage size
2.a. Regular values, no deletes, some overwrites, read heavy (10 to 1),
variable ttl's to manage size
3.Counter values, no deletes, update heavy, rotation/truncation to
manage size

Only 2.a. above requires me to do 'periodic repair'.

What I will actually do is change my schema and applications slightly to
eliminate the need for overwrites on the only table I have in that category.

And I will set gc_grace_seconds to 0 for the tables in the updated schema
and drop 'periodic repair' from the schedule.

Cheers,

Michael


On Mon, Jan 27, 2014 at 4:22 AM, Sylvain Lebresne wrote:

> By periodic repair, I'll assume you mean "having to run repair every
> gc_grace period to make sure no deleted entries resurrect". With that
> assumption:
>
>
>> 1. Regular values, no deletes, no overwrites, write heavy, ttl's to
>> manage size
>>
>
> Since 'repair within gc_grace' is about avoiding value that have been
> deleted to resurrect, if you do no delete nor overwrites, you're in no risk
> of that (and don't need to 'repair withing gc_grace').
>
>
>> 2. Regular values, no deletes, some overwrites, read heavy (10 to 1),
>> ttl's to manage size
>>
>
> It depends a bit. In general, if you always set the exact same TTL on
> every insert (implying you always set a TTL), then you have nothing to
> worry about. If the TTL varies (of if you only set TTL some of the times),
> then you might still need to have some periodic repairs. That being said,
> if there is no deletes but only TTLs, then the TTL kind of lengthen the
> period at which you need to do repair: instead of needing to repair withing
> gc_grace, you only need to repair every gc_grace + min(TTL) (where min(TTL)
> is the smallest TTL you set on columns).
>
> 3. Counter values, no deletes, update heavy, rotation/truncation to manage
>> size
>>
>
> No deletes and no TTL implies that your fine (as in, there is no need for
> 'repair withing gc_grace').
>
> --
> Sylvain
>


Re: No deletes - is periodic repair needed? I think not...

2014-01-28 Thread Laing, Michael
Thanks again Sylvain!

I have actually set up one of our application streams such that the same
key is only overwritten with a monotonically increasing ttl.

For example, a breaking news item might have an initial ttl of 60 seconds,
followed in 45 seconds by an update with a ttl of 3000 seconds, followed by
an 'ignore me' update in 600 seconds with a ttl of 30 days (our maximum
ttl) when the article is published.
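
In CQL the pattern is just successive overwrites of the same primary key,
each with a larger ttl - a minimal sketch with placeholder names and values:

CREATE TABLE breaking_news (
    id text PRIMARY KEY,
    body text
);

INSERT INTO breaking_news (id, body) VALUES ('item1', 'first alert') USING TTL 60;
-- ~45 seconds later
INSERT INTO breaking_news (id, body) VALUES ('item1', 'updated alert') USING TTL 3000;
-- at publication; 30 days = 2592000 seconds
INSERT INTO breaking_news (id, body) VALUES ('item1', 'published') USING TTL 2592000;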

My understanding is that this case fits the criteria and no 'periodic
repair' is needed.

I guess another thing I would point out that is easy to miss or forget (if
you are a newish user like me), is that ttl's are fine-grained, by column.
So we are talking 'fixed' or 'variable' by individual column, not by table.
Which means, in my case, that ttl's can vary widely across a table, but as
long as I constrain them by key value to be fixed or monotonically
increasing, it fits the criteria.

Cheers,

Michael


On Tue, Jan 28, 2014 at 4:18 AM, Sylvain Lebresne wrote:

> On Tue, Jan 28, 2014 at 1:05 AM, Edward Capriolo wrote:
>
>> If you have only ttl columns, and you never update the column I would not
>> think you need a repair.
>>
>
> Right, no deletes and no updates is the case 1. of Michael on which I
> think we all agree 'periodic repair to avoid resurrected columns' is not
> required.
>
>
>>
>> Repair cures lost deletes. If all your writes have a ttl a lost write
>> should not matter since the column was never written to the node and thus
>> could never be resurected on said node.
>>
>
> I'm sure we're all in agreement here, but for the record, this is only
> true if you have no updates (overwrites) and/or if all writes have the
> *same* ttl. Because in the general case, a column with a relatively short
> TTL is basically very close to a delete, while a column with a long TTL is
> very close from one that has no TTL. If the former column (with short TTL)
> overwrites the latter one (with long TTL), and if one nodes misses the
> overwrite, that node could resurrect the column with the longer TTL (until
> that column expires that is). Hence the separation of the case 2. (fixed
> ttl, no repair needed) and 2.a. (variable ttl, repair may be needed).
>
> --
> Sylvain
>
>
>>
>> Unless i am missing something.
>>
>> On Monday, January 27, 2014, Laing, Michael 
>> wrote:
>> > Thanks Sylvain,
>> > Your assumption is correct!
>> > So I think I actually have 4 classes:
>> > 1.Regular values, no deletes, no overwrites, write heavy, variable
>> ttl's to manage size
>> > 2.Regular values, no deletes, some overwrites, read heavy (10 to
>> 1), fixed ttl's to manage size
>> > 2.a. Regular values, no deletes, some overwrites, read heavy (10 to 1),
>> variable ttl's to manage size
>> > 3.Counter values, no deletes, update heavy, rotation/truncation to
>> manage size
>> > Only 2.a. above requires me to do 'periodic repair'.
>> > What I will actually do is change my schema and applications slightly
>> to eliminate the need for overwrites on the only table I have in that
>> category.
>> > And I will set gc_grace_seconds to 0 for the tables in the updated
>> schema and drop 'periodic repair' from the schedule.
>> > Cheers,
>> > Michael
>> >
>> >
>> > On Mon, Jan 27, 2014 at 4:22 AM, Sylvain Lebresne 
>> wrote:
>> >>
>> >> By periodic repair, I'll assume you mean "having to run repair every
>> gc_grace period to make sure no deleted entries resurrect". With that
>> assumption:
>> >>
>> >>>
>> >>> 1. Regular values, no deletes, no overwrites, write heavy, ttl's to
>> manage size
>> >>
>> >> Since 'repair within gc_grace' is about avoiding value that have been
>> deleted to resurrect, if you do no delete nor overwrites, you're in no risk
>> of that (and don't need to 'repair withing gc_grace').
>> >>
>> >>>
>> >>> 2. Regular values, no deletes, some overwrites, read heavy (10 to 1),
>> ttl's to manage size
>> >>
>> >> It depends a bit. In general, if you always set the exact same TTL on
>> every insert (implying you always set a TTL), then you have nothing to
>> worry about. If the TTL varies (of if you only set TTL some of the times),
>> then you might still need to have some periodic repairs. That being said,
>> if there is no deletes but only TTLs, then the TTL kind of lengthen the
>> period at which you need to do repair: instead of needing to repair withing
>> gc_grace, you only need to repair every gc_grace + min(TTL) (where min(TTL)
>> is the smallest TTL you set on columns).
>> >>>
>> >>> 3. Counter values, no deletes, update heavy, rotation/truncation to
>> manage size
>> >>
>> >> No deletes and no TTL implies that your fine (as in, there is no need
>> for 'repair withing gc_grace').
>> >>
>> >> --
>> >> Sylvain
>> >
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check
>> than usual.
>>
>
>


Re: GC taking a long time

2014-02-06 Thread Laing, Michael
For the restart issue see CASSANDRA-6008 and CASSANDRA-6086.


On Thu, Feb 6, 2014 at 12:19 PM, Alain RODRIGUEZ  wrote:

> Hi Robert,
>
> The heap, and GC are things a bit tricky to tune,
>
> I recently read a post about heap, explaining how heap works and how to
> tune it :
> http://tech.shift.com/post/74311817513/cassandra-tuning-the-jvm-for-read-heavy-workloads
>
> There is plenty of this kind of blogs or articles on the web.
>
> Be careful, tuning highly depends on your workload and hardware. You
> shouldn't use configuration found on these posts as is. You need to test,
> incrementally and monitor how it behaves.
>
> If you are not able to find a solution, there is also a lot of
> professionals, consultants (like Datastax or Aaron Morton) whose job is to
> help with Cassandra integrations, including heap and GC tuning.
>
> Hope this will help somehow.
>
>
> 2014-01-29 16:51 GMT+01:00 Robert Wille :
>
>  Forget about what I said about there not being any load during the
>> night. I forgot about my unit tests. They would have been running at this
>> time and they run against this cluster.
>>
>> I also forgot to provide JVM information:
>>
>> java version "1.7.0_17"
>> Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
>> Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
>>
>> Thanks
>>
>> Robert
>>
>> From: Robert Wille 
>> Reply-To: 
>> Date: Wednesday, January 29, 2014 at 4:06 AM
>> To: "user@cassandra.apache.org" 
>> Subject: GC taking a long time
>>
>> I read through the recent thread "Cassandra mad GC", which seemed very
>> similar to my situation, but didn’t really help.
>>
>> Here is what I get from my logs when I grep for GCInspector. Note that
>> this is the middle of the night on a dev server, so there should have been
>> almost no load.
>>
>>  INFO [ScheduledTasks:1] 2014-01-29 02:41:16,579 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 341 ms for 1 collections, 8001582816 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:41:29,135 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 350 ms for 1 collections, 802776 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:41:41,646 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 364 ms for 1 collections, 8075851136 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:41:54,223 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 375 ms for 1 collections, 8124762400 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:42:24,258 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 22995 ms for 2 collections, 7385470288
>> used; max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:45:21,328 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 218 ms for 1 collections, 7582480104 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:45:33,418 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 222 ms for 1 collections, 7584743872 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:45:45,527 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 217 ms for 1 collections, 7588514264 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:45:57,594 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 223 ms for 1 collections, 7590223632 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:46:09,686 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 226 ms for 1 collections, 7592826720 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:46:21,867 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7595464520 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:46:33,869 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 227 ms for 1 collections, 7597109672 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:46:45,962 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7599909296 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:46:57,964 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 230 ms for 1 collections, 7601584048 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:47:10,018 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 229 ms for 1 collections, 7604217952 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:47:22,136 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 236 ms for 1 collections, 7605867784 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:47:34,277 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 239 ms for 1 collections, 7607521456 used;
>> max is 8126464000
>>  INFO [ScheduledTasks:1] 2014-01-29 02:47:46,292 GCInspector.java (line
>> 116) GC for ConcurrentMarkSweep: 235 ms for 1 collections, 7610667376 used;
>> max is

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Laing, Michael
Just to add my 2 cents...

We are very happy CQL users, running in production.

I have had no problems modeling whatever I have needed to, including
problems similar to the examples set forth previously, in CQL.

Personally I think it is an excellent improvement to Cassandra, and we have
no intentions to ever look back to thrift.

Michael Laing
Systems Architect
NYTimes


On Thu, Feb 20, 2014 at 7:49 PM, Edward Capriolo wrote:

>
>
> On Thursday, February 20, 2014, Robert Coli  wrote:
> > On Thu, Feb 20, 2014 at 9:12 AM, Sylvain Lebresne 
> wrote:
> >>
> >> Of course, if everyone was using that reasoning, no-one would ever test
> new features and report problems/suggest improvement. So thanks to anyone
> like Rüdiger that actually tries stuff and take the time to report problems
> when they think they encounter one. Keep at it, *you* are the one helping
> Cassandra to get better everyday.
> >
> >
> > Perhaps people who are prototyping their first application with a piece
> of software are not the ideal people to beta test it?
> >
> > The people catching new version bullets for the community should be
> experienced operators choosing to do so in development and staging
> environments.
> > The current paradigm ensures that new users have to deal with Cassandra
> problems that interfere with their prototyping process and initial
> production deploy, presumably getting a very bad initial impression of
> Cassandra in the process.
> > =Rob
> >
>
> You would be surprised how many people pick software a of software b based
> on initial impressions.
>
> The reason I ended up choosing cassandra over hbase mostly boilded down to
> c* being easy to set up and not crashing. If it took us say 3 days to stand
> up a cassandra cluster and do the hello world thing i might very well be a
> voldemort user!
>
>
>
>
>
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>


Re: Queuing System

2014-02-22 Thread Laing, Michael
We use RabbitMQ for queuing and Cassandra for persistence.

RabbitMQ with clustering and/or federation should meet your high
availability needs.

Michael


On Sat, Feb 22, 2014 at 10:25 AM, DuyHai Doan  wrote:

> Jagan
>
>  Queue-like data structures are known to be one of the worst anti patterns
> for Cassandra:
> http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
>
>
>
> On Sat, Feb 22, 2014 at 4:03 PM, Jagan Ranganathan wrote:
>
>> Hi,
>>
>> I need to decouple some of the work being processed from the user thread
>> to provide better user experience. For that I need a queuing system with
>> the following needs,
>>
>>- High Availability
>>- No Data Loss
>>- Better Performance.
>>
>> Following are some libraries that were considered along with the
>> limitation I see,
>>
>>- Redis - Data Loss
>>- ZooKeeper - Not advised for Queue system.
>>- TokyoCabinet/SQLite/LevelDB - of this Level DB seem to be
>>performing better. With replication requirement, I probably have to look 
>> at
>>Apache ActiveMQ+LevelDB.
>>
>> After checking on the third option above, I kind of wonder if Cassandra
>> with Leveled Compaction offer a similar system. Do you see any issues in
>> such a usage or is there other better solutions available.
>>
>> Will be great to get insights on this.
>>
>> Regards,
>> Jagan
>>
>
>


Re: CQL decimal encoding

2014-02-26 Thread Laing, Michael
Go uses 'zig-zag' encoding, perhaps that is the difference?


On Wed, Feb 26, 2014 at 6:52 AM, Peter Lin  wrote:

>
> You may need to bit shift if that is the case
>
> Sent from my iPhone
>
> > On Feb 26, 2014, at 2:53 AM, Ben Hood <0x6e6...@gmail.com> wrote:
> >
> > Hey Colin,
> >
> >> On Tue, Feb 25, 2014 at 10:26 PM, Colin Blower 
> wrote:
> >> It looks like you are trying to implement the Decimal type. You might
> want
> >> to start with implementing the Integer type. The Decimal type follows
> pretty
> >> easily from the Integer type.
> >>
> >> For example:
> >> i = unmarchalInteger(data[4:])
> >> s = decInt(data[0:4])
> >> out = inf.newDec(i, s)
> >
> > Thanks for the suggestion.
> >
> > This is pretty much what I've got already. I think the issue might be
> > to do with the way that big.Int doesn't appear to use two's complement
> > to encode the varint. Maybe what is happening is that the encoding is
> > isomorphic across say Java, .NET, Python and Ruby, but that the
> > big.Int library in Go is not encoding in the same way.
> >
> > Cheers,
> >
> > Ben
>


Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
I have no problem doing this w 2.0.5 - what version of C* are you using? Or
maybe I don't understand your data model... attach 'creates' if you don't
mind.

ml


On Thu, Mar 13, 2014 at 9:24 AM, David Savage wrote:

> Hi Peter,
>
> Thanks for the help, unfortunately I'm not sure that's the problem, the id
> is the primary key on the documents table and the timestamp is the
> primary key on the eventlog table
>
> Kind regards,
>
>
> Dave
>
> On Thursday, 13 March 2014, Peter Lin  wrote:
>
>>
>> it's not clear to me if your "id" column is the KEY or just a regular
>> column with secondary index.
>>
>> queries that have IN on non primary key columns isn't supported yet. not
>> sure if that answers your question.
>>
>>
>> On Thu, Mar 13, 2014 at 7:12 AM, David Savage wrote:
>>
>>> Hi there,
>>>
>>> I'm experimenting using cassandra and have run across an error message
>>> which I need a little more information on.
>>>
>>> The use case I'm experimenting with is a series of document updates
>>> (documents being an arbitrary map of key value pairs), I would like to find
>>> the latest document updates after a specified time period. I don't want to
>>> store many copies of the documents (one per update) as the updates are
>>> often only to single keys in the map so that would involve a lot of
>>> duplicated data.
>>>
>>> The solution I've found that seems to fit best in terms of performance
>>> is to have two tables.
>>>
>>> One that has an event log of timeuuid -> docid and a second that stores
>>> the documents themselves stored by docid -> map. I then run
>>> two queries, one to select ids that have changed after a certain time:
>>>
>>> SELECT id FROM eventlog WHERE timestamp>=minTimeuuid($minimumTime)
>>>
>>> and then a second to select the actual documents themselves
>>>
>>> SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7…)
>>>
>>> However this then explodes on query with the error message:
>>>
>>> "Cannot restrict PRIMARY KEY part id by IN relation as a collection is
>>> selected by the query"
>>>
>>> Detective work lead me to these lines in
>>> org.apache.cassandra.cql3.statementsSelectStatement:
>>>
>>> // We only support IN for the last name and for
>>> compact storage so far
>>> // TODO: #3885 allows us to extend to non compact as
>>> well, but that remains to be done
>>> if (i != stmt.columnRestrictions.length - 1)
>>> throw new
>>> InvalidRequestException(String.format("PRIMARY KEY part %s cannot be
>>> restricted by IN relation", cname));
>>> else if (stmt.selectACollection())
>>> throw new
>>> InvalidRequestException(String.format("Cannot restrict PRIMARY KEY part %s
>>> by IN relation as a collection is selected by the query", cname));
>>>
>>> It seems like #3885 will allow support for the first IF block above, but
>>> I don't think it will allow the second, am I correct?
>>>
>>> Any pointers on how I can work around this would be greatly appreciated.
>>>
>>> Kind regards,
>>>
>>> Dave
>>>
>>
>>


Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
Create your table like this and it will work:

CREATE TABLE test.documents (group text, id bigint, data map<text, text>,
PRIMARY KEY ((group, id)));

The extra parens catenate 'group' and 'id' into the partition key - IN will
work on the last component of a partition key.

ml


On Thu, Mar 13, 2014 at 10:40 AM, David Savage wrote:

> Nope, upgraded to 2.0.5 and still get the same problem, I actually
> simplified the problem a little in my first post, there's a composite
> primary key involved as I need to partition ids into groups
>
> So the full CQL statements are:
>
> CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy',
> 'replication_factor':3};
>
>
> CREATE TABLE test.documents (group text, id bigint, data map<text, text>,
> PRIMARY KEY (group, id));
>
>
> INSERT INTO test.documents(id,group,data) VALUES (0,'test',{'count':'0'});
>
> INSERT INTO test.documents(id,group,data) VALUES (1,'test',{'count':'1'});
>
> INSERT INTO test.documents(id,group,data) VALUES (2,'test',{'count':'2'});
>
>
> SELECT id,data FROM test.documents WHERE group='test' AND id IN (0,1,2);
>
>
> Thanks for your help.
>
>
> Kind regards,
>
>
> /Dave
>
>
> On Thu, Mar 13, 2014 at 2:00 PM, David Savage wrote:
>
>> Hmmm that maybe the problem, I'm currently testing with 2.0.2 which got
>> dragged in by the cassandra unit library I'm using for testing [1] I will
>> try to fix my build dependencies and retry, thx.
>>
>> /Dave
>>
>> [1] https://github.com/jsevellec/cassandra-unit
>>
>>
>> On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael <
>> michael.la...@nytimes.com> wrote:
>>
>>> I have no problem doing this w 2.0.5 - what version of C* are you using?
>>> Or maybe I don't understand your data model... attach 'creates' if you
>>> don't mind.
>>>
>>> ml
>>>
>>>
>>> On Thu, Mar 13, 2014 at 9:24 AM, David Savage wrote:
>>>
>>>> Hi Peter,
>>>>
>>>> Thanks for the help, unfortunately I'm not sure that's the problem, the
>>>> id is the primary key on the documents table and the timestamp is the
>>>> primary key on the eventlog table
>>>>
>>>> Kind regards,
>>>>
>>>>
>>>> Dave
>>>>
>>>> On Thursday, 13 March 2014, Peter Lin  wrote:
>>>>
>>>>>
>>>>> it's not clear to me if your "id" column is the KEY or just a regular
>>>>> column with secondary index.
>>>>>
>>>>> queries that have IN on non primary key columns isn't supported yet.
>>>>> not sure if that answers your question.
>>>>>
>>>>>
>>>>> On Thu, Mar 13, 2014 at 7:12 AM, David Savage 
>>>>> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> I'm experimenting using cassandra and have run across an error
>>>>>> message which I need a little more information on.
>>>>>>
>>>>>> The use case I'm experimenting with is a series of document updates
>>>>>> (documents being an arbitrary map of key value pairs), I would like to 
>>>>>> find
>>>>>> the latest document updates after a specified time period. I don't want 
>>>>>> to
>>>>>> store many copies of the documents (one per update) as the updates are
>>>>>> often only to single keys in the map so that would involve a lot of
>>>>>> duplicated data.
>>>>>>
>>>>>> The solution I've found that seems to fit best in terms of
>>>>>> performance is to have two tables.
>>>>>>
>>>>>> One that has an event log of timeuuid -> docid and a second that
>>>>>> stores the documents themselves stored by docid -> map. I
>>>>>> then run two queries, one to select ids that have changed after a certain
>>>>>> time:
>>>>>>
>>>>>> SELECT id FROM eventlog WHERE timestamp>=minTimeuuid($minimumTime)
>>>>>>
>>>>>> and then a second to select the actual documents themselves
>>>>>>
>>>>>> SELECT id, data FROM documents WHERE id IN (0, 1, 2, 3, 4, 5, 6, 7…)
>>>>>>
>>>>>> However this then explodes on query with the error message:
>>>>>>
>>>>>> "Cannot restrict PRIMARY KEY part id by IN relation as a collection
>>>>>> is selected by the query"
>>>>>>
>>>>>> Detective work lead me to these lines in
>>>>>> org.apache.cassandra.cql3.statementsSelectStatement:
>>>>>>
>>>>>> // We only support IN for the last name and for
>>>>>> compact storage so far
>>>>>> // TODO: #3885 allows us to extend to non compact
>>>>>> as well, but that remains to be done
>>>>>> if (i != stmt.columnRestrictions.length - 1)
>>>>>> throw new
>>>>>> InvalidRequestException(String.format("PRIMARY KEY part %s cannot be
>>>>>> restricted by IN relation", cname));
>>>>>> else if (stmt.selectACollection())
>>>>>> throw new
>>>>>> InvalidRequestException(String.format("Cannot restrict PRIMARY KEY part 
>>>>>> %s
>>>>>> by IN relation as a collection is selected by the query", cname));
>>>>>>
>>>>>> It seems like #3885 will allow support for the first IF block above,
>>>>>> but I don't think it will allow the second, am I correct?
>>>>>>
>>>>>> Any pointers on how I can work around this would be greatly
>>>>>> appreciated.
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>> Dave
>>>>>>
>>>>>
>>>>>
>>>
>>
>


Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
Think of them as:

PRIMARY KEY (partition_key[, range_key])

where the partition_key can be compounded as:

(partition_key0 [, partition_key1, ...])

and the optional range_key can be compounded as:

range_key0 [, range_key1 ...]

If you do this: PRIMARY KEY (key1, key2) - then key1 is the partition_key
and key2 is the range_key and queries will work that hash to key1 (the
partition) using = or IN and specify a range on key2.

But if you do this: PRIMARY key ((key1, key2)) then (key1, key2) is the
compound partition key - there is no range key - and you can specify = on
key1 and = or IN on key2 (but not a range).

Anyway that's what I remember! Hope it helps.

ml


On Thu, Mar 13, 2014 at 11:27 AM, David Savage wrote:

> Great that works, thx! I probably would have never found that...
>
> It now makes me wonder in general when to use PRIMARY KEY (key1, key2) or
> PRIMARY KEY ((key1, key2)), any examples would be welcome if you have the
> time.
>
> Kind regards,
>
> Dave
>
>
> On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael  > wrote:
>
>> Create your table like this and it will work:
>>
>> CREATE TABLE test.documents (group text,id bigint,data
>> map,PRIMARY KEY ((group, id)));
>>
>> The extra parens catenate 'group' and 'id' into the partition key - IN
>> will work on the last component of a partition key.
>>
>> ml
>>
>>
>> On Thu, Mar 13, 2014 at 10:40 AM, David Savage wrote:
>>
>>> Nope, upgraded to 2.0.5 and still get the same problem, I actually
>>> simplified the problem a little in my first post, there's a composite
>>> primary key involved as I need to partition ids into groups
>>>
>>> So the full CQL statements are:
>>>
>>> CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy',
>>> 'replication_factor':3};
>>>
>>>
>>> CREATE TABLE test.documents (group text,id bigint,data
>>> map,PRIMARY KEY (group, id));
>>>
>>>
>>> INSERT INTO test.documents(id,group,data) VALUES
>>> (0,'test',{'count':'0'});
>>>
>>> INSERT INTO test.documents(id,group,data) VALUES
>>> (1,'test',{'count':'1'});
>>>
>>> INSERT INTO test.documents(id,group,data) VALUES
>>> (2,'test',{'count':'2'});
>>>
>>>
>>> SELECT id,data FROM test.documents WHERE group='test' AND id IN (0,1,2);
>>>
>>>
>>> Thanks for your help.
>>>
>>>
>>> Kind regards,
>>>
>>>
>>> /Dave
>>>
>>>
>>> On Thu, Mar 13, 2014 at 2:00 PM, David Savage wrote:
>>>
>>>> Hmmm that maybe the problem, I'm currently testing with 2.0.2 which got
>>>> dragged in by the cassandra unit library I'm using for testing [1] I will
>>>> try to fix my build dependencies and retry, thx.
>>>>
>>>> /Dave
>>>>
>>>> [1] https://github.com/jsevellec/cassandra-unit
>>>>
>>>>
>>>> On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael <
>>>> michael.la...@nytimes.com> wrote:
>>>>
>>>>> I have no problem doing this w 2.0.5 - what version of C* are you
>>>>> using? Or maybe I don't understand your data model... attach 'creates' if
>>>>> you don't mind.
>>>>>
>>>>> ml
>>>>>
>>>>>
>>>>> On Thu, Mar 13, 2014 at 9:24 AM, David Savage 
>>>>> wrote:
>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> Thanks for the help, unfortunately I'm not sure that's the problem,
>>>>>> the id is the primary key on the documents table and the timestamp
>>>>>> is the primary key on the eventlog table
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>>
>>>>>> Dave
>>>>>>
>>>>>> On Thursday, 13 March 2014, Peter Lin  wrote:
>>>>>>
>>>>>>>
>>>>>>> it's not clear to me if your "id" column is the KEY or just a
>>>>>>> regular column with secondary index.
>>>>>>>
>>>>>>> queries that have IN on non primary key columns isn't supported yet.
>>>>>>> not sure if that answers your question.
>>>>>>>
>>

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
I have found that range_key communicates better what you can actually do
with them, whereas clustering is more passive.

ml


On Thu, Mar 13, 2014 at 2:08 PM, Jack Krupansky wrote:

>   “range key” is formally known as “clustering column”. One or more
> clustering columns can be specified to identify individual rows in a
> partition. Without clustering columns, one partition is one row. So, it’s a
> matter of whether you want your rows to be in the same partition or
> distributed.
>
> -- Jack Krupansky
>
>  *From:* Laing, Michael 
> *Sent:* Thursday, March 13, 2014 1:39 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: CQL Select Map using an IN relationship
>
>  Think of them as:
>
>
> PRIMARY KEY (partition_key[, range_key])
>
>
> where the partition_key can be compounded as:
>
>
> (partition_key0 [, partition_key1, ...])
>
>
> and the optional range_key can be compounded as:
>
>
> range_key0 [, range_key1 ...]
>
>
> If you do this: PRIMARY KEY (key1, key2) - then key1 is the partition_key
> and key2 is the range_key and queries will work that hash to key1 (the
> partition) using = or IN and specify a range on key2.
>
> But if you do this: PRIMARY key ((key1, key2)) then (key1, key2) is the
> compound partition key - there is no range key - and you can specify = on
> key1 and = or IN on key2 (but not a range).
>
> Anyway that's what I remember! Hope it helps.
>
> ml
>
>
> On Thu, Mar 13, 2014 at 11:27 AM, David Savage wrote:
>
>> Great that works, thx! I probably would have never found that...
>>
>> It now makes me wonder in general when to use PRIMARY KEY (key1, key2) or
>> PRIMARY KEY ((key1, key2)), any examples would be welcome if you have the
>> time.
>>
>> Kind regards,
>>
>> Dave
>>
>>
>> On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael <
>> michael.la...@nytimes.com> wrote:
>>
>>> Create your table like this and it will work:
>>>
>>> CREATE TABLE test.documents (group text,id bigint,data
>>> map,PRIMARY KEY ((group, id)));
>>>
>>> The extra parens catenate 'group' and 'id' into the partition key - IN
>>> will work on the last component of a partition key.
>>>
>>> ml
>>>
>>>
>>> On Thu, Mar 13, 2014 at 10:40 AM, David Savage 
>>> wrote:
>>>
>>>> Nope, upgraded to 2.0.5 and still get the same problem, I actually
>>>> simplified the problem a little in my first post, there's a composite
>>>> primary key involved as I need to partition ids into groups
>>>>
>>>> So the full CQL statements are:
>>>>
>>>>
>>>> CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy',
>>>> 'replication_factor':3};
>>>>
>>>>
>>>>
>>>> CREATE TABLE test.documents (group text,id bigint,data
>>>> map,PRIMARY KEY (group, id));
>>>>
>>>>
>>>>
>>>> INSERT INTO test.documents(id,group,data) VALUES
>>>> (0,'test',{'count':'0'});
>>>>
>>>> INSERT INTO test.documents(id,group,data) VALUES
>>>> (1,'test',{'count':'1'});
>>>>
>>>> INSERT INTO test.documents(id,group,data) VALUES
>>>> (2,'test',{'count':'2'});
>>>>
>>>>
>>>>
>>>> SELECT id,data FROM test.documents WHERE group='test' AND id IN (0,1,2);
>>>>
>>>>
>>>>
>>>> Thanks for your help.
>>>>
>>>>
>>>>
>>>> Kind regards,
>>>>
>>>>
>>>>
>>>> /Dave
>>>>
>>>>
>>>> On Thu, Mar 13, 2014 at 2:00 PM, David Savage 
>>>> wrote:
>>>>
>>>>>  Hmmm that maybe the problem, I'm currently testing with 2.0.2 which
>>>>> got dragged in by the cassandra unit library I'm using for testing [1] I
>>>>> will try to fix my build dependencies and retry, thx.
>>>>>
>>>>> /Dave
>>>>>
>>>>> [1] https://github.com/jsevellec/cassandra-unit
>>>>>
>>>>>
>>>>> On Thu, Mar 13, 2014 at 1:56 PM, Laing, Michael <
>>>>> michael.la...@nytimes.com> wrote:
>>>>>
>>>>>> I have no problem doing this w 2.0.5 - what version of C* are you
>&

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
These are my personal opinions, reflecting both my long experience w
database systems, and my newness to Cassandra...

[tl;dr]

The Cassandra contributors, having made its history, tend to describe it in
terms of implementation rather than action. And its implementation has a
history, all relatively recent, that many know, but which to newcomers like
me is obscure and, frankly, not particularly relevant.

Note: we are all trying to understand Crimea now, and to really understand,
you have to ingest several hundred years of history. Luckily, Cassandra has
not been around quite so long!

But Cassandra's history creeps into the nomenclature of CQL3. So what might
logically be called a 'hash key' is called a 'partition key', what is
called a 'clustering key' might be better termed a 'range key' IMHO.

The 'official' terms in the nomenclature are important to know, they are
just not descriptive of the actions one takes as a user of them. However,
they have meaning to those who have 'lived' the history of Cassandra, and
form an important bridge to the past.

As a new user I found them non-intuitive. Amazon has done a much better job
with DynamoDB - muddled, however, by bad syntax choices.

But you adjust and mentally map... I am still bumfuzzled when people talk
of slices and other C* cruft but just let it slide by like lectures from my
mother. That and thrift can just fade into history with gopher and lynx as
far as I am concerned - CQL3 is where it's at.

But another thing to remember is that performance is king - and to get
performance you fly 'close to the metal': Cassandra does that and you
should know the code paths, the physical structures, and the
characteristics of your 'metal' to understand how to build high-performing
apps.

***

The answer to both asterisks is Yes. You should use the term 'clustering
column' because that is what is in the docs - but you should think 'range
key' for how you use it. Similarly 'partition key' : 'hash key'.

Good luck,

ml


Re: Cassandra slow on some reads

2014-03-14 Thread Laing, Michael
*If* you do not need to do range queries on your 'timestam' (ts) column -
*and* if you can change your schema (big if...), then you could move
'timestam' into the partition key like this (using your notation):

PK((key String , timestam int), column1 string, col2 string) , list1 , list
2, list 3 .

Now the select query you showed should execute more consistently.
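
Spelled out as CQL it might look like the following - I am guessing text for
the list element types and using a placeholder table name:

CREATE TABLE table1 (
    key text,
    timestam int,
    column1 text,
    col2 text,
    list1 list<text>,
    list2 list<text>,
    list3 list<text>,
    PRIMARY KEY ((key, timestam), column1, col2)
);

SELECT * FROM table1 WHERE key = 'some string' AND timestam = 12345;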

But of course something else might break...!

ml


On Fri, Mar 14, 2014 at 8:50 AM, Batranut Bogdan  wrote:

> Hello all,
>
> Here is the environment:
>
> I have a 6 node Cassandra cluster. On each node I have:
> - 32 G RAM
> - 24 G RAM for cassa
> - ~150 - 200 MB/s disk speed
> - tomcat 6 with axis2 webservice that uses the datastax java driver to make
> asynch reads / writes
> - replication factor for the keyspace is 3
>
> (I know that there is a lot of heap but I also have write heavy tasks and
> I want them to get into mem fast) .
>
> All nodes in the same data center
> The clients that read / write are in the same datacenter so network is
> Gigabit.
>
> The table structure is like this: PK(key String , timestam int, column1
> string, col2 string) , list1 , list 2, list 3 .
> There are about 300 milions individual keys.
> There are about 100 timestamps for each key now, so the rows will get
> wider as time passes.
>
> I am using datastax java driver to query the cluster.
>
> I have ~450 queries that are like this: SELECT * FROM table where key =
> 'some string' and ts = some value; some value is close to present time.
>
> The problem:
>
> About 10 - 20 % of these queries take more than 5 seconds to execute, in
> fact, the majority of those take around 10 seconds.
> When investigating I saw that if I have a slow response and I redo the
> query it will finish in 8 - 10 MILIseconds like the rest of the queries
> that I have.
> I could not see using JConsole any spikes in CPU / memory when executing
> the queries. The rise in resource consumtion is very small on all nodes on
> the cluster. I expect such delays to be generated by a BIG increase in
> resource consumption.
>
> Any comments will be appreciated.
>
> Thank you.
>
>
>
>
>


Re: Exception in thread event_loop

2014-03-16 Thread Laing, Michael
A possible workaround - not a fix - might be to install libev so the libev
event loop is used.

See http://datastax.github.io/python-driver/installation.html

Also be sure you are running the latest version: 1.0.2 I believe.

Your ';' is outside of your 'str' - actually shouldn't be a problem tho.

Good luck!
ml


On Sun, Mar 16, 2014 at 8:19 PM, Sundeep Kambhampati <
satyasunde...@gmail.com> wrote:

> Hi,
>
> I am using cassandra-2.0.4.
>
> When I execute following python code, I often get an exception mentioned
> below. Sometimes it works fine but most of the times it throws an error.
> All the nodes are up and running. Data in second column of table is huge,
> few thousand characters long and there around 10K rows.
>
> Code :
>
> from cassandra.cluster import Cluster
>
> cluster = Cluster(['10.2.252.0', '10.2.252.1'])
> session = cluster.connect('test')
>
> str = "SELECT * FROM users ";
> rows = session.execute(str)
>
> for user_row in rows:
>     print user_row.f_id, user_row.data
>
> cluster.shutdown()
>
> Exception:
>
> Exception in thread event_loop (most likely raised during interpreter
> shutdown):
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/threading.py", line 810, in
> __bootstrap_inner
>   File "/usr/local/lib/python2.7/threading.py", line 763, in run
>   File
> "/usr/local/lib/python2.7/site-packages/cassandra/io/asyncorereactor.py",
> line 52, in _run_loop
> : __exit__
>
>
> Can someone help me fixing this error?
>
> Regards,
> Sundeep
>


Re: ALLOW FILTERING usage

2014-03-17 Thread Laing, Michael
Your second query is invalid:

*Bad Request: Partition KEY part key cannot be restricted by IN relation
(only the last part of the partition key can)*
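
Per that message, only the last component of the partition key may use IN,
so a form like this should be accepted (a sketch against your table, using
your placeholder notation):

SELECT * FROM json
WHERE key = :key:
  AND group IN (:group1:, :group2:)
  AND date < :now: AND date > :lastCheck:;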

ml


On Mon, Mar 17, 2014 at 6:56 AM, Tupshin Harper  wrote:

> It's the difference between reading from only the partitions that you are
> interested, vs reading every single partition before filtering the
> results.  At scale, and assuming you don't actually need to read every
> partition, there would be a huge difference.
>
> If the model requires you to read every partition in the table in the IN
> approach, then there probably won't be much difference.
>
> In general, I would prefer the approach where you explicitly specify the
> keys you are needing to poll.
>
> -Tupshin
> On Mar 17, 2014 6:51 AM, "Kasper Middelboe Petersen" 
> wrote:
>
>> Hi,
>>
>> I have a table:
>>
>> CREATE TABLE json (
>>key text,
>>group text,
>>date timestamp,
>>json text,
>>PRIMARY KEY((key, group), date)
>> ) WITH CLUSTERING ORDER BY (date DESC);
>>
>> This table will contain a small amount of data (only what an
>> administrator creates by hand - a year from now maybe 6 different keys,
>> 10-15 groups for a total of 60-90 rows each with up to maybe 15 columns).
>>
>> I need the application to detect changes to this and was planning to poll
>> the table every few minutes for new content (while its not often its
>> updated, the update needs to go live fairly quick).
>>
>> My question then is how big, if any, a difference there would be to doing:
>> SELECT * FROM json WHERE date < :now: AND date > :lastCheck: ALLOW
>> FILTERING
>> to
>> SELECT * FROM json WHERE key IN () AND group IN
>> () AND date < :now: AND date > :lastCheck:
>>
>>
>> /Kasper
>>
>


Re: Data model for boolean attributes

2014-03-21 Thread Laing, Michael
Of course what you really want is this:

create table x(
  id text,
  timestamp timeuuid,
  flag boolean,
  // other fields
  primary key (flag, id, timestamp)
)

Whoops, now there are only 2 possible partition key values! Not good if you
have any reasonable number of rows...

Faced with a situation like this (although this is extreme) of a limited
number of partition keys - and if this access path is important - then you
can add shards like this:

create table x(
  id text,
  timestamp timeuuid,
  flag boolean,
  // other fields
  shard int,
  primary key ((flag, shard), id, timestamp)
)

The last part of the partition key may be used with IN so you can query
like this:

select * from x where flag=true and shard in (0,1,2,3,4,5,6,...);

I monitor partition sizes and shard enough to keep them reasonable in this
sort of situation. The C* infrastructure parallelizes a lot of the activity
so such queries are quite fast. Oh, and ORDER BY works across shards.

But the main point is: drive from your queries. Designing for C* is NOT
like SQL - don't expect to develop a normalized set of tables to do it all.
Start with how you want to access data and design from there.

So - if you need to get a bunch of ids fast given a flag and maybe an
id/timestamp range, and your volumes/sizes are such that the number of
shards can be kept reasonable, this might be a good design, otherwise its
crap. Drive from your own access patterns to derive your (typically
denormalized) table defs.

ml



On Fri, Mar 21, 2014 at 3:34 PM, DuyHai Doan  wrote:

> Hello Ben
>
>  Try the following alternative with composite partition key to encode the
> dual states of the boolean:
>
>
> create table x(
>   id text,
>   flag boolean,
>   timestamp timeuuid,
> // other fields
>   primary key (*(id, flag)*, timestamp)
> )
>
> Your previous "select * from x where flag = true;"  translate into:
>
>  SELECT * FROM x WHERE id=... AND flag = true
>
> Of course, you'll need to provide the id in any case.
>
>  If you want to query only on the boolean flag, I'm afraid that manual
> indexing or secondary index (beware of cardinality !) are your only choices.
>
> Regards
>
>  Duy Hai DOAN
>
>
>
>
> On Fri, Mar 21, 2014 at 8:27 PM, Ben Hood <0x6e6...@gmail.com> wrote:
>
>> Hi,
>>
>> I was wondering what the best way is to lay column families out so
>> that you can to query by a boolean attribute.
>>
>> For example:
>>
>> create table x(
>>   id text,
>>   timestamp timeuuid,
>>   flag boolean,
>>   // other fields
>>   primary key (id, timestamp)
>> )
>>
>> So that you can query
>>
>> select * from x where flag = true;
>>
>> Three approaches spring to mind:
>>
>> 1) Put a secondary index on the flag column
>> 2) Split the column family definition out into two separate CFs, one
>> for true and one for false
>> 3) Maintain a manual index table
>> 4) Something else
>>
>> Option (1) seems to be the easiest, but I was wondering if that was
>> going to put too much load on the secondary index, given the low
>> cardinality of this attribute. It seems like what Cassandra would have
>> to do internally to achieve (1) is to effectively implement (2) or (3)
>> behind the scenes, but I'm just guessing.
>>
>> Does anybody have any experience with this?
>>
>> Cheers,
>>
>> Ben
>>
>
>


Re: Kernel keeps killing cassandra process - OOM

2014-03-22 Thread Laing, Michael
I ran into the same problem some time ago.

Upgrading to Cassandra 2, jdk 1.7, and default parameters fixed it.

I think the jdk change was the key for my similarly small memory cluster.

ml



On Sat, Mar 22, 2014 at 1:36 PM, prem yadav  wrote:

> Michael, no memory constraints. System memory is 4 GB and Cassandra run on
> default.
>
>
> On Sat, Mar 22, 2014 at 5:32 PM, prem yadav  wrote:
>
>> Its Oracle jdk 1.6.
>> Robert, any fix that you know of which went into 1.2.15 for this
>> particular issue?
>>
>>
>> On Sat, Mar 22, 2014 at 4:50 PM, Robert Coli wrote:
>>
>>> On Sat, Mar 22, 2014 at 7:48 AM, prem yadav wrote:
>>>
 But, the cassandra process keeps getting killed due to OOM. Cassandra
 version in use is 1.1.9.

>>>
>>> Try using 1.2.15, instead?
>>>
>>> =Rob
>>>
>>>
>>
>>
>


Re: Kernel keeps killing cassandra process - OOM

2014-03-22 Thread Laing, Michael
You might want to look at:

http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/


On Sat, Mar 22, 2014 at 4:38 PM, Brian Flad  wrote:

> Is there anything else running on the boxes? Can you show us the output of
> ps aux for the Cassandra process so we can see the -xmx, etc? JDK 7 may
> help, even if you cannot upgrade Cassandra yet (which I would really
> recommend since it moves items off-heap).
>
>
> On Sat, Mar 22, 2014 at 4:01 PM, prem yadav  wrote:
>
>> Upgrading is not possible right now. Any other suggestions guys?
>> I have already tried reducing the number of rpc threads. Also tried
>> reducing the linux kernel overcommit.
>>
>>
>> On Sat, Mar 22, 2014 at 5:44 PM, Laing, Michael <
>> michael.la...@nytimes.com> wrote:
>>
>>> I ran into the same problem some time ago.
>>>
>>> Upgrading to Cassandra 2, jdk 1.7, and default parameters fixed it.
>>>
>>> I think the jdk change was the key for my similarly small memory cluster.
>>>
>>> ml
>>>
>>>
>>>
>>> On Sat, Mar 22, 2014 at 1:36 PM, prem yadav wrote:
>>>
>>>> Michael, no memory constraints. System memory is 4 GB and Cassandra run
>>>> on default.
>>>>
>>>>
>>>> On Sat, Mar 22, 2014 at 5:32 PM, prem yadav wrote:
>>>>
>>>>> Its Oracle jdk 1.6.
>>>>> Robert, any fix that you know of which went into 1.2.15 for this
>>>>> particular issue?
>>>>>
>>>>>
>>>>> On Sat, Mar 22, 2014 at 4:50 PM, Robert Coli wrote:
>>>>>
>>>>>> On Sat, Mar 22, 2014 at 7:48 AM, prem yadav wrote:
>>>>>>
>>>>>>> But, the cassandra process keeps getting killed due to OOM.
>>>>>>> Cassandra version in use is 1.1.9.
>>>>>>>
>>>>>>
>>>>>> Try using 1.2.15, instead?
>>>>>>
>>>>>> =Rob
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
>
>
> Brian Flad
> http://about.me/bflad
>


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
In your step 4, be sure you create a consistent EBS snapshot. You may have
pieces of your sstables that have not actually been flushed all the way to
EBS.

See https://github.com/alestic/ec2-consistent-snapshot

ml


On Fri, Mar 28, 2014 at 3:21 PM, Russ Lavoie  wrote:

> Thank you for your quick response.
>
> Is there a way to tell when a snapshot is completely done?
>
>
>   On Friday, March 28, 2014 1:30 PM, Robert Coli 
> wrote:
>  On Fri, Mar 28, 2014 at 11:15 AM, Russ Lavoie wrote:
>
> We are using cassandra 1.2.10 (With JNA installed) on ubuntu 12.04.3 and
> are running our instances in Amazon Web Services.
>
>
>
>  Our cassandra systems data is on an EBS volume
>
>
> Best practice for Cassandra on AWS is to run on ephemeral stripe, not EBS.
>
>
>  so we can take snapshots of the data and create volumes based on those
> snapshots and restore them where we want to.
>
>
> https://github.com/synack/tablesnap
>
>
> ?
>
>
>  How can I tell when a snapshot is fully complete so I do not have
> corrupted SSTables?
>
>
> SStables are immutable after they are created. I'm not sure how you're
> getting a snapshot that has corrupted SSTables in it. If you can repro
> reliably, file a JIRA on issues.apache.org.
>
> =Rob
>
>
>
>


Re: Migration from 2.0.10 to 2.1.12

2016-03-30 Thread Laing, Michael
fyi the list of reserved keywords is at:

https://cassandra.apache.org/doc/cql3/CQL.html#appendixA

ml

On Wed, Mar 30, 2016 at 9:41 AM, Jean Carlo 
wrote:

> Yes we did some reads and writes, the problem is that adding double quotes
> force us to modify our code to change and insert like that
>
> INSERT INTO table1 (bill_id, *full*, name,provider_date ,total) values
> ('qs','full','name','2015-02-23','toltal');
>
> to this
>
> INSERT INTO table1 (bill_id, *"full"*, name,provider_date ,total) values
> ('qs','full','name','2015-02-23','toltal');
>
> this last one is ok, but obviously the first one makes an error:
>
> cqlsh:pns_fr_2_jean> INSERT INTO table1 (bill_id, full, name,provider_date
> ,total) values ('qs','full','name','2015-02-23','toltal');
> SyntaxException:  message="line 1:29 no viable alternative at input '*full*' (INSERT INTO
> table1 (bill_id, [full]...)">
>
> and it is because the name of the column is not longer full, is "full"
>
>
>
> Best regards
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
> On Wed, Mar 30, 2016 at 3:26 PM, Eric Evans  wrote:
>
>> On Wed, Mar 30, 2016 at 8:08 AM, Jean Carlo 
>> wrote:
>> > With double quotes it doesn't show error
>> >
>> > CREATE TABLE table1 ( bill_id text, "full" text, name text,
>> > provider_date timestamp, total text, PRIMARY KEY ( bill_id) ) ;
>> >
>> > but it changes the name of the column
>>
>> I don't think it does though; You're meant to be able to use the
>> output of DESC TABLE to (re)create these tables, so it's quoted below
>> for the same reason you did so above.
>>
>> Obviously it would be better to have column names that parse without
>> needing to be double quoted, but it should work.  Have you tried some
>> reads and writes?
>>
>> > desc table table1;
>> >
>> > CREATE TABLE pns_fr_2_jean.table1 (
>> > bill_id text PRIMARY KEY,
>> > "full" text,
>> > name text,
>> > provider_date timestamp,
>> > total text
>> > )
>> >
>> > instead of
>> >
>> > CREATE TABLE pns_fr_2_jean.table1 (
>> > bill_id text PRIMARY KEY,
>> > full text,
>> > name text,
>> > provider_date timestamp,
>> > total text
>> > )
>>
>>
>> --
>> Eric Evans
>> eev...@wikimedia.org
>>
>
>


Re: Publishing from cassandra

2016-04-24 Thread Laing, Michael
You could take a look at, or follow:
https://issues.apache.org/jira/browse/CASSANDRA-8844

On Sun, Apr 24, 2016 at 10:51 AM, Alexander Orr  wrote:

> Hi,
>
> I'm wondering if someone could help me, I'd like to use cassandra to store
> data and publish this downstream to another database (kdb if anyone is
> interested). Essentially I'd like to be able to run a function or operation
> on cassandra from an upstream process that would insert to table and
> publish the data on downstream.
>
> I can't see anything in the docs, but I'm relatively new to cassandra.
> Assuming there's not something simple already in place what would be the
> best way to impliment this kind of mechanism? I have some java that will
> allow me to talk to the db I want to, but I'm not sure of the  best way to
> integrate this with cassandra.
>
> UDFs seem to have potential, but I don't think it's possible to use
> external libraries/classes within UDFs. All I can think of at the minute is
> either having a process which controls cassandra, publishes to it and also
> the downstream system directly or cloning the git repo and seeing if I can
> hack in some extra functionality.
>
> Any suggestions welcome.
>
> Thanks
>
> Alex
>


Re: UUID coming as int while using SPARK SQL

2016-05-24 Thread Laing, Michael
Try converting that int from decimal to hex and inserting dashes in the
appropriate spots - or go the other way.

Also, you are looking at different rows, based upon your selection
criteria...

ml

On Tue, May 24, 2016 at 6:23 AM, Rajesh Radhakrishnan <
rajesh.radhakrish...@phe.gov.uk> wrote:

> Hi,
>
>
> I got a Cassandra keyspace, but while reading the data(especially UUID)
> via Spark SQL using Python is not returning the correct value.
>
> Cassandra:
> --
> My table 'SAM'' is described below:
>
> CREATE table ks.sam (id uuid, dept text, workflow text, type double,
> primary key (id, dept))
>
> SELECT id, workflow FROM sam WHERE dept='blah';
>
> The above example  CQL gives me the following
> id   | workflow
> --+
>  9547v26c-f528-12e5-da8b-001a4q3dac10 |   testWK
>
>
> Spark/Python:
> --
> from pyspark import SparkConf
> from pyspark.sql import SQLContext
> import pyspark_cassandra
> from pyspark_cassandra import CassandraSparkContext
>
> 
> conf =
> SparkConf().set("spark.cassandra.connection.host",IP_ADDRESS).set("spark.cassandra.connection.native.port",PORT_NUMBER)
> sparkContext = CassandraSparkContext(conf = conf)
> sqlContext = SQLContext(sparkContext)
>
> samTable =sparkContext.cassandraTable("ks", "sam").select('id', 'dept','
> workflow')
> samTable.cache()
>
> samdf.registerTempTable("samd")
>
>  sparkSQLl ="SELECT distinct id, dept, workflow FROM samd WHERE workflow='
> testWK'
>  new_df = sqlContext.sql(sparkSQLl)
>  results  =  new_df.collect()
>  for row in results:
> print "dept=",row.dept
> print "wk=",row.workflow
> print "id=",row.id
> ...
> The Python code above prints the following:
> dept=Biology
> wk=testWK
> id=293946894141093607334963674332192894528
>
>
> You can see here that the id (uuid) whose correct value at Cassandra is '
> 9547v26c-f528-12e5-da8b-001a4q3dac10'  but via Spark I am getting an int '
> 29394689414109360733496367433219289452'.
> What I am doing wrong here? How to get the correct UUID value from
> Cassandra via Spark/Python ? Please help me.
>
> Thank you
> Rajesh R
>
> **
> The information contained in the EMail and any attachments is confidential
> and intended solely and for the attention and use of the named
> addressee(s). It may not be disclosed to any other person without the
> express authority of Public Health England, or the intended recipient, or
> both. If you are not the intended recipient, you must not disclose, copy,
> distribute or retain this message or any part of it. This footnote also
> confirms that this EMail has been swept for computer viruses by
> Symantec.Cloud, but please re-sweep any attachments before opening or
> saving. http://www.gov.uk/PHE
> **
>


Re: UUID coming as int while using SPARK SQL

2016-05-24 Thread Laing, Michael
Yes - a UUID is just a 128 bit value. You can view it using any base or
format.

If you are looking at the same row, you should see the same 128 bit value,
otherwise my theory is incorrect :)

Cheers,
ml

On Tue, May 24, 2016 at 6:57 AM, Rajesh Radhakrishnan <
rajesh.radhakrish...@phe.gov.uk> wrote:

> Hi Michael,
>
> Thank you for the quick reply.
> So you are suggesting to convert this int value(UUID comes back as int via
> Spark SQL) to hex?
>
>
> And selection is just a example to highlight the UUID convertion issue.
> So in Cassandra it should be
> SELECT id, workflow FROM sam WHERE dept='blah';
>
> And in Spark with Python:
> SELECT distinct id, dept, workflow FROM samd WHERE dept='blah';
>
>
> Best,
> Rajesh R
>
>
> --
> *From:* Laing, Michael [michael.la...@nytimes.com]
> *Sent:* 24 May 2016 11:40
> *To:* user@cassandra.apache.org
> *Subject:* Re: UUID coming as int while using SPARK SQL
>
> Try converting that int from decimal to hex and inserting dashes in the
> appropriate spots - or go the other way.
>
> Also, you are looking at different rows, based upon your selection
> criteria...
>
> ml
>
> On Tue, May 24, 2016 at 6:23 AM, Rajesh Radhakrishnan <
> rajesh.radhakrish...@phe.gov.uk
> > wrote:
>
>> Hi,
>>
>>
>> I got a Cassandra keyspace, but while reading the data(especially UUID)
>> via Spark SQL using Python is not returning the correct value.
>>
>> Cassandra:
>> --
>> My table 'SAM'' is described below:
>>
>> CREATE table ks.sam (id uuid, dept text, workflow text, type double,
>> primary key (id, dept))
>>
>> SELECT id, workflow FROM sam WHERE dept='blah';
>>
>> The above example  CQL gives me the following
>> id   | workflow
>> --+
>>  9547v26c-f528-12e5-da8b-001a4q3dac10 |   testWK
>>
>>
>> Spark/Python:
>> --
>> from pyspark import SparkConf
>> from pyspark.sql import SQLContext
>> import pyspark_cassandra
>> from pyspark_cassandra import CassandraSparkContext
>>
>> 
>> conf =
>> SparkConf().set("spark.cassandra.connection.host",IP_ADDRESS).set("spark.cassandra.connection.native.port",PORT_NUMBER)
>> sparkContext = CassandraSparkContext(conf = conf)
>> sqlContext = SQLContext(sparkContext)
>>
>> samTable =sparkContext.cassandraTable("ks", "sam").select('id', 'dept','
>> workflow')
>> samTable.cache()
>>
>> samdf.registerTempTable("samd")
>>
>>  sparkSQLl ="SELECT distinct id, dept, workflow FROM samd WHERE workflow
>> ='testWK'
>>  new_df = sqlContext.sql(sparkSQLl)
>>  results  =  new_df.collect()
>>  for row in results:
>> print "dept=",row.dept
>> print "wk=",row.workflow
>> print "id=",row.id
>> ...
>> The Python code above prints the following:
>> dept=Biology
>> wk=testWK
>> id=293946894141093607334963674332192894528
>>
>>
>> You can see here that the id (uuid) whose correct value at Cassandra is '
>> 9547v26c-f528-12e5-da8b-001a4q3dac10'  but via Spark I am getting an int
>> '29394689414109360733496367433219289452'.
>> What I am doing wrong here? How to get the correct UUID value from
>> Cassandra via Spark/Python ? Please help me.
>>
>> Thank you
>> Rajesh R
>>
>> **
>> The information contained in the EMail and any attachments is
>> confidential and intended solely and for the attention and use of the named
>> addressee(s). It may not be disclosed to any other person without the
>> express authority of Public Health England, or the intended recipient, or
>> both. If you are not the intended recipient, you must not disclose, copy,
>> distribute or retain this message or any part of it. This footnote also
>> confirms that this EMail has been swept for computer viruses by
>> Symantec.Cloud, but please re-sweep any attachments before opening or
>> saving. http://www.gov.uk/PHE

Re: Cassandra event notification on INSERT/DELETE of records

2016-05-25 Thread Laing, Michael
You could also follow this related issue:
https://issues.apache.org/jira/browse/CASSANDRA-8844

On Wed, May 25, 2016 at 12:04 PM, Aaditya Vadnere  wrote:

> Thanks Eric and Mark, we were thinking along similar lines. But we already
> need Cassandra for regular database purpose, so instead of having both
> Kafka and Cassandra, the possibility of using Cassandra alone was explored.
>
> Another usecase where update notification can be useful is when we want to
> synchronize two or more instances of same component. Say two threads of
> component 'A' can share the same database. When a record is updated in
> database by thread 1, a notification is sent to thread 2. After that thread
> 2, performs a read.
>
> I think this also is an anti-pattern.
>
> Regards,
> Aaditya
>
> On Tue, May 24, 2016 at 12:45 PM, Mark Reddy 
> wrote:
>
>> +1 to what Eric said, a queue is a classic C* anti-pattern. Something
>> like Kafka or RabbitMQ might fit your use case better.
>>
>>
>> Mark
>>
>> On 24 May 2016 at 18:03, Eric Stevens  wrote:
>>
>>> It sounds like you're trying to build a queue in Cassandra, which is one
>>> of the classic anti-pattern use cases for Cassandra.
>>>
>>> You may be able to do something clever with triggers, but I highly
>>> recommend you look at purpose-built queuing software such as Kafka to solve
>>> this instead.
>>>
>>> On Tue, May 24, 2016 at 9:49 AM Aaditya Vadnere 
>>> wrote:
>>>
 Hi experts,

 We are evaluating Cassandra as messaging infrastructure for a project.

 In our workflow Cassandra database will be synchronized across two
 nodes, a component will INSERT/UPDATE records on one node and another
 component (who has registered for the specific table) on second node will
 get notified of record change.

 The second component will then try to read the database to find out the
 specific message.

 Is it possible for Cassandra to support such workflow? Basically, is
 there a way for Cassandra to generate a notification anytime schema changes
 (so we can set processes to listen for schema changes). As I understand,
 polling the database periodically or database triggers might work but they
 are costly operations.


 --
 Aaditya Vadnere

>>>
>>
>
>
> --
> Aaditya Vadnere
>


Re: Adding disk capacity to a running node

2016-10-17 Thread Laing, Michael
You could just expand the size of your ebs volume and extend the file
system. No data is lost - assuming you are running Linux.

On Monday, October 17, 2016, Seth Edwards  wrote:

> We're running 2.0.16. We're migrating to a new data model but we've had an
> unexpected increase in write traffic that has caused us some capacity
> issues when we encounter compactions. Our old data model is on STCS. We'd
> like to add another ebs volume (we're on aws) to our JBOD config and
> hopefully avoid any situation where we run out of disk space during a large
> compaction. It appears that the behavior we are hoping to get is actually
> undesirable and removed in 3.2. It still might be an option for us until we
> can finish the migration.
>
> I'm not familiar with LVM so it may be a bit risky to try at this point.
>
> On Mon, Oct 17, 2016 at 9:42 AM, Yabin Meng  > wrote:
>
>> I assume you're talking about Cassandra JBOD (just a bunch of disk) setup
>> because you do mention it as adding it to the list of data directories. If
>> this is the case, you may run into issues, depending on your C* version.
>> Check this out: http://www.datastax.com/dev/blog/improving-jbod.
>>
>> Or another approach is to use LVM to manage multiple devices into a
>> single mount point. If you do so, from what Cassandra can see is just
>> simply increased disk storage space and there should should have no problem.
>>
>> Hope this helps,
>>
>> Yabin
>>
>> On Mon, Oct 17, 2016 at 11:54 AM, Vladimir Yudovin > > wrote:
>>
>>> Yes, Cassandra should keep percent of disk usage equal for all disk.
>>> Compaction process and SSTable flushes will use new disk to distribute both
>>> new and existing data.
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>>
>>> *Winguzone  - Hosted Cloud Cassandra on
>>> Azure and SoftLayer.Launch your cluster in minutes.*
>>>
>>>
>>>  On Mon, 17 Oct 2016 11:43:27 -0400*Seth Edwards >> >* wrote 
>>>
>>> We have a few nodes that are running out of disk capacity at the moment
>>> and instead of adding more nodes to the cluster, we would like to add
>>> another disk to the server and add it to the list of data directories. My
>>> question, is, will Cassandra use the new disk for compactions on sstables
>>> that already exist in the primary directory?
>>>
>>>
>>>
>>> Thanks!
>>>
>>>
>>>
>>
>


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
As I tried to say, EBS snapshots require much care or you get corruption
such as you have encountered.

Does Cassandra quiesce the file system after a snapshot using fsfreeze or
xfs_freeze? Somehow I doubt it...


On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad  wrote:

> I have a nagging memory of reading about issues with virtualization and
> not actually having durable versions of your data even after an fsync
> (within the VM).  Googling around lead me to this post:
> http://petercai.com/virtualization-is-bad-for-database-integrity/
>
> It's possible you're hitting this issue, with with the virtualization
> layer, or with EBS itself.  Just a shot in the dark though, other people
> would likely know much more than I.
>
>
>
> On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie  wrote:
>
>> Robert,
>>
>> That is what I thought as well.  But apparently something is happening.
>>  The only way I can get away with doing this is adding a sleep 60 right
>> after the nodetool snapshot is executed.  I can reproduce this 100% of the
>> time by not issuing a sleep after nodetool snapshot.
>>
>> This is the error.
>>
>> ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java
>> (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
>> org.apache.cassandra.io.sstable.CorruptSSTableException:
>> java.io.EOFException
>> at
>> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
>> at
>> org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
>>  at
>> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
>> at
>> org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407)
>>  at
>> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198)
>> at
>> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
>> at
>> org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>  at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
>> Caused by: java.io.EOFException
>> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>> at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>> at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>> at
>> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
>>  ... 11 more
>>
>>
>>   On Friday, March 28, 2014 2:38 PM, Robert Coli 
>> wrote:
>>  On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie wrote:
>>
>> Thank you for your quick response.
>>
>> Is there a way to tell when a snapshot is completely done?
>>
>>
>> IIRC, the JMX call blocks until the snapshot completes. It should be done
>> when nodetool returns.
>>
>> =Rob
>>
>>
>>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade
>


Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
+1 for tablesnap


On Fri, Mar 28, 2014 at 4:28 PM, Jonathan Haddad  wrote:

> I will +1 the recommendation on using tablesnap over EBS.  S3 is at least
> predictable.
>
> Additionally, from a practical standpoint, you may want to back up your
> sstables somewhere.  If you use S3, it's easy to pull just the new tables
> out via aws-cli tools (s3 sync), to your remote, non-aws server, and not
> incur the overhead of routinely backing up the entire dataset.  For a non
> trivial database, this matters quite a bit.
>
>
> On Fri, Mar 28, 2014 at 1:21 PM, Laing, Michael  > wrote:
>
>> As I tried to say, EBS snapshots require much care or you get corruption
>> such as you have encountered.
>>
>> Does Cassandra quiesce the file system after a snapshot using fsfreeze or
>> xfs_freeze? Somehow I doubt it...
>>
>>
>> On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad wrote:
>>
>>> I have a nagging memory of reading about issues with virtualization and
>>> not actually having durable versions of your data even after an fsync
>>> (within the VM).  Googling around lead me to this post:
>>> http://petercai.com/virtualization-is-bad-for-database-integrity/
>>>
>>> It's possible you're hitting this issue, with with the virtualization
>>> layer, or with EBS itself.  Just a shot in the dark though, other people
>>> would likely know much more than I.
>>>
>>>
>>>
>>> On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie wrote:
>>>
>>>> Robert,
>>>>
>>>> That is what I thought as well.  But apparently something is happening.
>>>>  The only way I can get away with doing this is adding a sleep 60 right
>>>> after the nodetool snapshot is executed.  I can reproduce this 100% of the
>>>> time by not issuing a sleep after nodetool snapshot.
>>>>
>>>> This is the error.
>>>>
>>>> ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java
>>>> (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
>>>> org.apache.cassandra.io.sstable.CorruptSSTableException:
>>>> java.io.EOFException
>>>> at
>>>> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
>>>> at
>>>> org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
>>>>  at
>>>> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
>>>> at
>>>> org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407)
>>>>  at
>>>> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198)
>>>> at
>>>> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
>>>> at
>>>> org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262)
>>>> at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>  at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:744)
>>>> Caused by: java.io.EOFException
>>>> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>>>> at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>>>> at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>>>> at
>>>> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
>>>>  ... 11 more
>>>>
>>>>
>>>>   On Friday, March 28, 2014 2:38 PM, Robert Coli 
>>>> wrote:
>>>>  On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie wrote:
>>>>
>>>> Thank you for your quick response.
>>>>
>>>> Is there a way to tell when a snapshot is completely done?
>>>>
>>>>
>>>> IIRC, the JMX call blocks until the snapshot completes. It should be
>>>> done when nodetool returns.
>>>>
>>>> =Rob
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Jon Haddad
>>> http://www.rustyrazorblade.com
>>> skype: rustyrazorblade
>>>
>>
>>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade
>


Re: Timeseries with TTL

2014-04-06 Thread Laing, Michael
Since you are using LeveledCompactionStrategy there is no major/minor
compaction - just compaction.

Leveled compaction does more work - your logs don't look unreasonable to me
- the real question is whether your nodes can keep up w the IO. SSDs work
best.

BTW if you never delete and only ttl your values at a constant value, you
can set gc=0 and forget about periodic repair of the table, saving some
space, IO, CPU, and an operational step.
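
For example, for the table in your message below, that is just:

ALTER TABLE metrics_5min WITH gc_grace_seconds = 0;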

If your nodes cannot keep up with the IO, switch to SizeTieredCompaction and
monitor read response times. Or add SSDs.

In my experience, for smallish nodes running C* 2 without SSDs,
LeveledCompactionStrategy
can cause the disk cache to churn, reducing read performance substantially.
So watch out for that.

Good luck,

Michael


On Sun, Apr 6, 2014 at 10:25 AM, Vicent Llongo  wrote:

> Hi,
>
> Most of the queries to that table are just getting a range of values for a
> metric:
> SELECT val FROM metrics_5min WHERE uid = ? AND metric = ? AND ts >= ? AND
> ts <= ?
>
> I'm not sure from the logs what kind of compactions they are. This is what
> I see in system.log (grepping for that specific table):
>
> ...
> INFO [CompactionExecutor:742] 2014-04-06 13:30:11,223 CompactionTask.java
> (line 105) Compacting
> [SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14991-Data.db'),
> SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14990-Data.db')]
> INFO [CompactionExecutor:753] 2014-04-06 13:35:22,495 CompactionTask.java
> (line 105) Compacting
> [SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14992-Data.db'),
> SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14993-Data.db')]
> INFO [CompactionExecutor:770] 2014-04-06 13:41:09,146 CompactionTask.java
> (line 105) Compacting
> [SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14995-Data.db'),
> SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14994-Data.db')]
> INFO [CompactionExecutor:783] 2014-04-06 13:46:21,250 CompactionTask.java
> (line 105) Compacting
> [SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14996-Data.db'),
> SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14997-Data.db')]
> INFO [CompactionExecutor:798] 2014-04-06 13:51:28,369 CompactionTask.java
> (line 105) Compacting
> [SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14998-Data.db'),
> SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14999-Data.db')]
> INFO [CompactionExecutor:816] 2014-04-06 13:57:17,585 CompactionTask.java
> (line 105) Compacting
> [SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-15000-Data.db'),
> SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-15001-Data.db')]
> ...
>
> As you can see every ~5 minutes there's a compaction going on.
>
>
>
>
> On Sun, Apr 6, 2014 at 4:33 PM, Sergey Murylev wrote:
>
>>  Hi Vincent,
>>
>>
>> Is that a good pattern for Cassandra? Is there some compaction tunings I
>> should take into account?
>>
>> Actually it depends on how you use Cassandra :). If you use it as
>> key-value storage TTL works fine. But if you would use rather complex CQL
>> queries to this table I not sure that it would be good.
>>
>>
>> With this structure is obvious that after one week inserting data, from
>> that moment there's gonna be new expired columns every 5 minutes in that
>> table. Because of that I've noticed that this table is being compacted
>> every 5 minutes.
>>
>> Compaction isn't triggered when some column expires. It is triggered on the
>> gc_grace_seconds timeout and according to the compaction strategy. You can see
>> more detailed description of LeveledCompactionStrategy in following
>> article: Leveled compaction in 
>> Cassandra.
>>
>>
>> There are 2 types of compaction: minor and major, which kind of
>> compaction do you see and how come to conclusion that compaction triggered
>> every 5 minutes? If you see major compaction that situation is very bad
>> otherwise it is normal case.
>>
>> --
>> Thanks,
>> Sergey
>>
>>
>>
>> On 06/04/14 15:48, Vicent Llongo wrote:
>>
>>  Hi there,
>>
>>  I have this table where I'm inserting timeseries values with a TTL of
>> 86400*7 (1week):
>>
>> CREATE TABLE metrics_5min (
>>   object_id varchar,
>>   metric varchar,
>>   ts timestamp,
>>   val double,
>>   PRIMARY KEY ((object_id, metric), ts)
>> )
>> WITH gc_grace_seconds = 86400
>> AND compaction = {'class': 'LeveledCompactionStrategy',
>> 'sstable_size_in_mb' : 100};
>>
>>
>>  With this structure is obvious that after one week inserting data, from
>

Re: Setting gc_grace_seconds to zero and skipping "nodetool repair (was RE: Timeseries with TTL)

2014-04-07 Thread Laing, Michael
Perhaps following this recent thread would help clarify things:

http://mail-archives.apache.org/mod_mbox/cassandra-user/201401.mbox/%3ccakgmdnfk3pa-w+ltusm88a15jdg275o31p4ujwol1b7bkaj...@mail.gmail.com%3E

Cheers,

Michael


On Mon, Apr 7, 2014 at 2:00 PM, Donald Smith <
donald.sm...@audiencescience.com> wrote:

>  This statement is significant: “BTW if you never delete and only ttl
> your values at a constant value, you can set gc=0 and forget about periodic
> repair of the table, saving some space, IO, CPU, and an operational step.”
>
>
> Setting gc_grace_seconds to zero has the effect of not storing hinted
> handoffs (which prevent deleted data from reappearing), I believe.
> “Periodic repair” refers to running “nodetool repair” (aka Anti-Entropy).
>
>
>
> I too have wondered if setting gc_grace_seconds to zero and skipping
> “nodetool repair” are safe.
>
>
>
> We’re using C* 2.0.6. In the 2.0.X versions, with vnodes, “nodetool repair
> …” is very slow (see https://issues.apache.org/jira/browse/CASSANDRA-5220 and
> https://issues.apache.org/jira/browse/CASSANDRA-6611). We found read
> repairs via “nodetool repair” unacceptably slow, even when we restricted it
> to one table, and often the repairs hung or failed.  We also tried subrange
> repairs and the other options.
>
>
>
> Our app does no deletes and only rarely updates a row (if there was bad
> data that needs to be replaced).  So it’s very tempting to set
> gc_grace_seconds = 0 in the table definitions and skip read repairs.
>
>
>
> But there is Cassandra documentation that warns that read repairs are
> necessary even if you don’t do deletes. For example,
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html says:
>
>
>
>  Note: If deletions never occur, you should still schedule regular
> repairs. Be aware that setting a column to null is a delete.
>
>
>
> The apache wiki
> https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair says:
>
>  Unless your application performs no deletes, it is strongly recommended
> that production clusters run nodetool repair periodically on all nodes in
> the cluster.
>
> *IF* your operations team is sufficiently on the ball, you can get by
> without repair as long as you do not have hardware failure -- in that case,
> HintedHandoff <https://wiki.apache.org/cassandra/HintedHandoff> is
> adequate to repair successful updates that some replicas have missed.
> Hinted handoff is active for max_hint_window_in_ms after a replica fails.
>
> Full repair or re-bootstrap is necessary to re-replicate data lost to
> hardware failure (see below).
>
> So, if there are hardware failures, “nodetool repair” is needed.  And
> http://planetcassandra.org/general-faq/ says:
>
>
>
> Anti-Entropy Node Repair – For data that is not read frequently, or to
> update data on a node that has been down for an extended period, the node
> repair process (also referred to as anti-entropy repair) ensures that all
> data on a replica is made consistent. Node repair (using the nodetool
> utility) should be run routinely as part of regular cluster maintenance
> operations.
>
>
>
> If RF=2, ReadConsistency is ONE and data failed to get replicated to the
> second node, then during a read might the app incorrectly return “missing
> data”?
>
>
>
> It seems to me that the need to run “nodetool repair” reflects a design
> bug; it should be automated.
>
>
>
> Don
>
>
>
> *From:* Laing, Michael [mailto:michael.la...@nytimes.com]
> *Sent:* Sunday, April 06, 2014 11:31 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Timeseries with TTL
>
>
>
> Since you are using LeveledCompactionStrategy there is no major/minor
> compaction - just compaction.
>
>
>
> Leveled compaction does more work - your logs don't look unreasonable to
> me - the real question is whether your nodes can keep up w the IO. SSDs
> work best.
>
>
>
> BTW if you never delete and only ttl your values at a constant value, you
> can set gc=0 and forget about periodic repair of the table, saving some
> space, IO, CPU, and an operational step.
>
>
>
> If your nodes cannot keep up the IO, switch to SizeTieredCompaction and
> monitor read response times. Or add SSDs.
>
>
>
> In my experience, for smallish nodes running C* 2 without
> SSDs, LeveledCompactionStrategy can cause the disk cache to churn, reducing
> read performance substantially. So watch out for that.
>
>
>
> Good luck,
>
>
>
> Michael
>
>
>
> On Sun, Apr 6, 2014 at 10:25 AM, Vicent Llongo  wrote:
>
> Hi,
>
>
>
> Most of the queries to that table a

Re: clearing tombstones?

2014-04-11 Thread Laing, Michael
I have played with this quite a bit and recommend you set gc_grace_seconds
to 0 and use 'nodetool compact [keyspace] [cfname]' on your table.
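
For example (hypothetical keyspace/table names - substitute your own):

ALTER TABLE my_ks.my_cf WITH gc_grace_seconds = 0;

and then, on each node:

nodetool compact my_ks my_cf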

A caveat I have is that we use C* 2.0.6 - but the space we expect to
recover is in fact recovered.

Actually, since we never delete explicitly (just ttl) we always have
gc_grace_seconds set to 0.

Another important caveat is to be careful with repair: having set gc to 0
and compacted on a node, if you then repair it, data may come streaming in
from the other nodes. We don't run into this, as our gc is always 0, but
others may be able to comment.

ml


On Fri, Apr 11, 2014 at 11:26 AM, William Oberman
wrote:

> Yes, I'm using SizeTiered.
>
> I totally understand the "mess up the heuristics" issue.  But, I don't
> understand "You will incur the operational overhead of having to manage
> compactions if you wish to compact these smaller SSTables".  My
> understanding is the small tables will still compact.  The problem is that
> until I have 3 other (by default) tables of the same size as the "big
> table", it won't be compacted.
>
> In my case, this might not be terrible though, right?  To get into the
> trees, I have 9 nodes with RF=3 and this CF is ~500GB/node.  I deleted like
> 90-95% of the data, so I expect the data to be 25-50GB after the tombstones
> are cleared, but call it 50GB.  That means I won't compact this 50GB file
> until I gather another 150GB (50,50,50,50->200).   But, that's not
> *horrible*.  Now, if I only deleted 10% of the data, waiting to compact
> 450GB until I had another 1.3TB would be rough...
>
> I think your advice is great for people looking for "normal" answers in
> the forum, but I don't think my use case is very normal :-)
>
> will
>
> On Fri, Apr 11, 2014 at 11:12 AM, Mark Reddy wrote:
>
>> Yes, running nodetool compact (major compaction) creates one large
>> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
>> this the compaction strategy you are using?) leading to multiple 'small'
>> SSTables alongside the single large SSTable, which results in increased
>> read latency. You will incur the operational overhead of having to manage
>> compactions if you wish to compact these smaller SSTables. For all these
>> reasons it is generally advised to stay away from running compactions
>> manually.
>>
>> Assuming that this is a production environment and you want to keep
>> everything running as smoothly as possible I would reduce the gc_grace on
>> the CF, allow automatic minor compactions to kick in and then increase the
>> gc_grace once again after the tombstones have been removed.
>>
>>
>> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <
>> ober...@civicscience.com> wrote:
>>
>>> So, if I was impatient and just "wanted to make this happen now", I
>>> could:
>>>
>>> 1.) Change GCGraceSeconds of the CF to 0
>>> 2.) run nodetool compact (*)
>>> 3.) Change GCGraceSeconds of the CF back to 10 days
>>>
>>> Since I have ~900M tombstones, even if I miss a few due to impatience, I
>>> don't care *that* much as I could re-run my clean up tool against the now
>>> much smaller CF.
>>>
>>> (*) A long long time ago I seem to recall reading advice about "don't
>>> ever run nodetool compact", but I can't remember why.  Is there any bad
>>> long term consequence?  Short term there are several:
>>> -a heavy operation
>>> -temporary 2x disk space
>>> -one big SSTable afterwards
>>> But moving forward, everything is ok right?
>>>  CommitLog/MemTable->SStables, minor compactions that merge SSTables,
>>> etc...  The only flaw I can think of is it will take forever until the
>>> SSTable minor compactions build up enough to consider including the big
>>> SSTable in a compaction, making it likely I'll have to self manage
>>> compactions.
>>>
>>>
>>>
>>> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy wrote:
>>>
 Correct, a tombstone will only be removed after gc_grace period has
 elapsed. The default value is set to 10 days which allows a great deal of
 time for consistency to be achieved prior to deletion. If you are
 operationally confident that you can achieve consistency via anti-entropy
 repairs within a shorter period you can always reduce that 10 day interval.


 Mark


 On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <
 ober...@civicscience.com> wrote:

> I'm seeing a lot of articles about a dependency between removing
> tombstones and GCGraceSeconds, which might be my problem (I just checked,
> and this CF has GCGraceSeconds of 10 days).
>
>
> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <
> tbarbu...@gmail.com> wrote:
>
>> compaction should take care of it; for me it never worked so I run
>> nodetool compaction on every node; that does it.
>>
>>
>> 2014-04-11 16:05 GMT+02:00 William Oberman 
>> :
>>
>> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
>>> nodetool repair, or time (as in just

Re: clearing tombstones?

2014-04-11 Thread Laing, Michael
At the cost of really quite a lot of compaction, you can temporarily switch
to SizeTiered, and when that is completely done (check each node), switch
back to Leveled.
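
In CQL that is just (hypothetical keyspace/table name, and re-adding whatever
sstable_size_in_mb setting you were using):

ALTER TABLE my_ks.my_cf WITH compaction = {'class': 'SizeTieredCompactionStrategy'};

-- when compactions have finished on every node:
ALTER TABLE my_ks.my_cf WITH compaction = {'class': 'LeveledCompactionStrategy'};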

it's like doing the laundry twice :)

I've done this on CFs that were about 5GB but I don't see why it wouldn't
work on larger ones.

ml


On Fri, Apr 11, 2014 at 1:33 PM, Paulo Ricardo Motta Gomes <
paulo.mo...@chaordicsystems.com> wrote:

> This thread is really informative, thanks for the good feedback.
>
> My question is : Is there a way to force tombstones to be clared with LCS?
> Does scrub help in any case? Or the only solution would be to create a new
> CF and migrate all the data if you intend to do a large CF cleanup?
>
> Cheers,
>
>
> On Fri, Apr 11, 2014 at 2:02 PM, Mark Reddy wrote:
>
>> Thats great Will, if you could update the thread with the actions you
>> decide to take and the results that would be great.
>>
>>
>> Mark
>>
>>
>> On Fri, Apr 11, 2014 at 5:53 PM, William Oberman <
>> ober...@civicscience.com> wrote:
>>
>>> I've learned a *lot* from this thread.  My thanks to all of the
>>> contributors!
>>>
>>> Paulo: Good luck with LCS.  I wish I could help there, but all of my
>>> CF's are SizeTiered (mostly as I'm on the same schema/same settings since
>>> 0.7...)
>>>
>>> will
>>>
>>>
>>>
>>> On Fri, Apr 11, 2014 at 12:14 PM, Mina Naguib wrote:
>>>

 Levelled Compaction is a wholly different beast when it comes to
 tombstones.

 The tombstones are inserted, like any other write really, at the lower
 levels in the leveldb hierarchy.

 They are only removed after they have had the chance to "naturally"
 migrate upwards in the leveldb hierarchy to the highest level in your data
 store.  How long that takes depends on:
  1. The amount of data in your store and the number of levels your LCS
 strategy has
 2. The amount of new writes entering the bottom funnel of your leveldb,
 forcing upwards compaction and combining

 To give you an idea, I had a similar scenario and ran a (slow,
 throttled) delete job on my cluster around December-January.  Here's a
 graph of the disk space usage on one node.  Notice the still-diclining
 usage long after the cleanup job has finished (sometime in January).  I
 tend to think of tombstones in LCS as little bombs that get to explode much
 later in time:

 http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg



 On 2014-04-11, at 11:20 AM, Paulo Ricardo Motta Gomes <
 paulo.mo...@chaordicsystems.com> wrote:

 I have a similar problem here, I deleted about 30% of a very large CF
 using LCS (about 80GB per node), but still my data hasn't shrinked, even if
 I used 1 day for gc_grace_seconds. Would nodetool scrub help? Does nodetool
 scrub forces a minor compaction?

 Cheers,

 Paulo


 On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy wrote:

> Yes, running nodetool compact (major compaction) creates one large
> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
> this the compaction strategy you are using?) leading to multiple 'small'
> SSTables alongside the single large SSTable, which results in increased
> read latency. You will incur the operational overhead of having to manage
> compactions if you wish to compact these smaller SSTables. For all these
> reasons it is generally advised to stay away from running compactions
> manually.
>
> Assuming that this is a production environment and you want to keep
> everything running as smoothly as possible I would reduce the gc_grace on
> the CF, allow automatic minor compactions to kick in and then increase the
> gc_grace once again after the tombstones have been removed.
>
>
> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <
> ober...@civicscience.com> wrote:
>
>> So, if I was impatient and just "wanted to make this happen now", I
>> could:
>>
>> 1.) Change GCGraceSeconds of the CF to 0
>> 2.) run nodetool compact (*)
>> 3.) Change GCGraceSeconds of the CF back to 10 days
>>
>> Since I have ~900M tombstones, even if I miss a few due to
>> impatience, I don't care *that* much as I could re-run my clean up tool
>> against the now much smaller CF.
>>
>> (*) A long long time ago I seem to recall reading advice about "don't
>> ever run nodetool compact", but I can't remember why.  Is there any bad
>> long term consequence?  Short term there are several:
>> -a heavy operation
>> -temporary 2x disk space
>> -one big SSTable afterwards
>> But moving forward, everything is ok right?
>>  CommitLog/MemTable->SStables, minor compactions that merge SSTables,
>> etc...  The only flaw I can think of is it will take forever until the
>> SSTable minor compactions build up enough to consider including the big
>> SSTable in a com

Re: clearing tombstones?

2014-04-11 Thread Laing, Michael
I've never noticed that setting tombstone_threshold has any effect...
at least in 2.0.6.
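
For anyone who wants to try it, it is a compaction subproperty, so it would be
set something like this (hypothetical table name):

ALTER TABLE my_ks.my_cf WITH compaction = {'class': 'LeveledCompactionStrategy', 'tombstone_threshold': 0.01};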

What gets written to the log?


On Fri, Apr 11, 2014 at 3:31 PM, DuyHai Doan  wrote:

> I was wondering, to remove the tombstones from Sstables created by LCS,
> why don't we just set the tombstone_threshold table property to a very
> small value (say 0.01)..?
>
> As the doc said (
> www.datastax.com/documentation/cql/3.0/cql/cql_reference/compactSubprop.html)
> this will force compaction on the sstable itself for the purpose of
> cleaning tombstones, no merging with other sstables is done.
>
> In addition this property applies to both compaction strategies :-)
>
> Isn't a little bit lighter than changing strategy and hoping for the best?
>
> Regards
>
> Duy Hai DOAN
>  Le 11 avr. 2014 20:16, "Robert Coli"  a écrit :
>
> On Fri, Apr 11, 2014 at 10:33 AM, Paulo Ricardo Motta Gomes <
>> paulo.mo...@chaordicsystems.com> wrote:
>>
>>> My question is : Is there a way to force tombstones to be clared with
>>> LCS? Does scrub help in any case?
>>>
>>
>> 1) Switch to size tiered compaction, compact, and switch back. Not only
>> "with LCS", but...
>>
>> 2)  scrub does a 1:1 rewrite of sstables, watching for corruption. I
>> believe it does throw away tombstones if it is able to, but that is not the
>> purpose of it.
>>
>> =Rob
>>
>>


Re: Deleting column names

2014-04-22 Thread Laing, Michael
Referring to the original post, I think the confusion is over what a "row" is
in this context:

So as far as I understand, the s column is now the *row* key

...

Since I have multiple different p, o, c combinations per s, deleting the whole
> *row* identified by s is no option


The s column is in fact the *partition_key*, not the row key, which is the
composite of all 4 columns (the partition_key plus the clustering columns).

Deleting the row, as Steven correctly showed, will not delete the
partition, but only the row - the tuple of the 4 columns.

Terminology has changed with cql and we all have to get used to it.

ml


On Mon, Apr 21, 2014 at 10:00 PM, Steven A Robenalt
wrote:

> Is there a reason you can't use:
>
> DELETE FROM table_name WHERE s = ? AND p = ? AND o = ? AND c = ?;
>
>
> On Mon, Apr 21, 2014 at 6:51 PM, Eric Plowe  wrote:
>
>> Also I don't think you can null out columns that are part of the primary
>> key after they've been set.
>>
>>
>> On Monday, April 21, 2014, Andreas Wagner <
>> andreas.josef.wag...@googlemail.com> wrote:
>>
>>> Hi cassandra users, hi Sebastian,
>>>
>>> I'd be interested in this ... is there any update/solution?
>>>
>>> Thanks so much ;)
>>> Andreas
>>>
>>> On 04/16/2014 11:43 AM, Sebastian Schmidt wrote:
>>>
 Hi,

 I'm using a Cassandra table to store some data. I created the table like
 this:
 CREATE TABLE IF NOT EXISTS table_name (s BLOB, p BLOB, o BLOB, c BLOB,
 PRIMARY KEY (s, p, o, c));

 I need the at least the p column to be sorted, so that I can use it in a
 WHERE clause. So as far as I understand, the s column is now the row
 key, and (p, o, c) is the column name.

 I tried to delete single entries with a prepared statement like this:
 DELETE p, o, c FROM table_name WHERE s = ? AND p = ? AND o = ? AND c =
 ?;

 That didn't work, because p is a primary key part. It failed during
 preparation.

 I also tried to use variables like this:
 DELETE ?, ?, ? FROM table_name WHERE s = ?;

 This also failed during preparation, because ? is an unknown identifier.


 Since I have multiple different p, o, c combinations per s, deleting the
 whole row identified by s is no option. So how can I delete a s, p, o, c
 tuple, without deleting other s, p, o, c tuples with the same s? I know
 that this worked with Thrift/Hector before.

 Regards,
 Sebastian

>>>
>>>
>
>
> --
> Steve Robenalt
> Software Architect
> HighWire | Stanford University
> 425 Broadway St, Redwood City, CA 94063
>
> srobe...@stanford.edu
> http://highwire.stanford.edu
>
>
>
>
>
>


Re: Deleting column names

2014-04-22 Thread Laing, Michael
Your understanding is incorrect - the easiest way to see that is to try it.
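
For example (using your placeholder values - in real CQL s, p, o and c would
be blob literals):

DELETE FROM table_name WHERE s = sa AND p = p1 AND o = o1 AND c = c1;

SELECT * FROM table_name WHERE s = sa;
-- (sa, p2, o2, c2) is still returned; only the single row was removed.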


On Tue, Apr 22, 2014 at 12:00 PM, Sebastian Schmidt wrote:

> From my understanding, this would delete all entries with the given s.
> Meaning, if I have inserted (sa, p1, o1, c1) and (sa, p2, o2, c2),
> executing this:
>
> DELETE FROM table_name WHERE s = sa AND p = p1 AND o = o1 AND c = c1
>
> would delete sa, p1, o1, c1, p2, o2, c2. Is this correct? Or does the
> above statement only delete p1, o1, c1?
>
>
> 2014-04-22 4:00 GMT+02:00 Steven A Robenalt :
>
> Is there a reason you can't use:
>>
>> DELETE FROM table_name WHERE s = ? AND p = ? AND o = ? AND c = ?;
>>
>>
>> On Mon, Apr 21, 2014 at 6:51 PM, Eric Plowe  wrote:
>>
>>> Also I don't think you can null out columns that are part of the primary
>>> key after they've been set.
>>>
>>>
>>> On Monday, April 21, 2014, Andreas Wagner <
>>> andreas.josef.wag...@googlemail.com> wrote:
>>>
 Hi cassandra users, hi Sebastian,

 I'd be interested in this ... is there any update/solution?

 Thanks so much ;)
 Andreas

 On 04/16/2014 11:43 AM, Sebastian Schmidt wrote:

> Hi,
>
> I'm using a Cassandra table to store some data. I created the table
> like
> this:
> CREATE TABLE IF NOT EXISTS table_name (s BLOB, p BLOB, o BLOB, c BLOB,
> PRIMARY KEY (s, p, o, c));
>
> I need the at least the p column to be sorted, so that I can use it in
> a
> WHERE clause. So as far as I understand, the s column is now the row
> key, and (p, o, c) is the column name.
>
> I tried to delete single entries with a prepared statement like this:
> DELETE p, o, c FROM table_name WHERE s = ? AND p = ? AND o = ? AND c =
> ?;
>
> That didn't work, because p is a primary key part. It failed during
> preparation.
>
> I also tried to use variables like this:
> DELETE ?, ?, ? FROM table_name WHERE s = ?;
>
> This also failed during preparation, because ? is an unknown
> identifier.
>
>
> Since I have multiple different p, o, c combinations per s, deleting
> the
> whole row identified by s is no option. So how can I delete a s, p, o,
> c
> tuple, without deleting other s, p, o, c tuples with the same s? I know
> that this worked with Thrift/Hector before.
>
> Regards,
> Sebastian
>


>>
>>
>> --
>> Steve Robenalt
>> Software Architect
>>  HighWire | Stanford University
>> 425 Broadway St, Redwood City, CA 94063
>>
>> srobe...@stanford.edu
>> http://highwire.stanford.edu
>>
>>
>>
>>
>>
>>
>


Re: Schema disagreement errors

2014-05-12 Thread Laing, Michael
Upgrade to 2.0.7 fixed this for me.

You can also try 'nodetool resetlocalschema' on disagreeing nodes. This
worked temporarily for me in 2.0.6.

ml


On Mon, May 12, 2014 at 3:31 PM, Gaurav Sehgal  wrote:

> We have recently started seeing a lot of Schema Disagreement errors. We
> are using Cassandra 2.0.6 with Oracle Java 1.7. I went through the
> Cassandra FAQ and followed the below steps:
>
>
>
>- nodetool disablethrift
>- nodetool disablegossip
>- nodetool drain
>-
>
>'kill '.
>
>
> As per the documentation, the commit logs should have been flushed, but that
> did not happen in our case. The commit logs were still there. So, I removed
> them manually to make sure there are no commit logs when Cassandra starts
> up (which was fine in our case as this data can always be replayed). I
> also deleted the schema* directory from the /data/system folder.
>
> Though when we started cassandra back up the issue started happening again.
>
>
> Any help would be appreciated
>
> Cheers!
> Gaurav
>
>
>


Re: migration to a new model

2014-06-03 Thread Laing, Michael
Hi Marcelo,

I could create a fast copy program by repurposing some python apps that I
am using for benchmarking the python driver - do you still need this?

With high levels of concurrency and multiple subprocess workers, based on
my current actual benchmarks, I think I can get well over 1,000 rows/second
on my mac and significantly more in AWS. I'm using variable size rows
averaging 5kb.

This would be the initial version of a piece of the benchmark suite we will
release as part of our nyt⨍aбrik project on 21 June for my Cassandra Day
NYC talk re the python driver.

ml


On Mon, Jun 2, 2014 at 2:15 PM, Marcelo Elias Del Valle <
marc...@s1mbi0se.com.br> wrote:

> Hi Jens,
>
> Thanks for trying to help.
>
> Indeed, I know I can't do it using just CQL. But what would you use to
> migrate data manually? I tried to create a python program using auto
> paging, but I am getting timeouts. I also tried Hive, but no success.
> I only have two nodes and less than 200Gb in this cluster, any simple way
> to extract the data quickly would be good enough for me.
>
> Best regards,
> Marcelo.
>
>
>
> 2014-06-02 15:08 GMT-03:00 Jens Rantil :
>
> Hi Marcelo,
>>
>> Looks like you can't do this without migrating your data manually:
>> https://stackoverflow.com/questions/18421668/alter-cassandra-column-family-primary-key-using-cassandra-cli-or-cql
>>
>> Cheers,
>> Jens
>>
>>
>> On Mon, Jun 2, 2014 at 7:48 PM, Marcelo Elias Del Valle <
>> marc...@s1mbi0se.com.br> wrote:
>>
>>> Hi,
>>>
>>> I have some cql CFs in a 2 node Cassandra 2.0.8 cluster.
>>>
>>> I realized I created my column family with the wrong partition. Instead
>>> of:
>>>
>>> CREATE TABLE IF NOT EXISTS entity_lookup (
>>>   name varchar,
>>>   value varchar,
>>>   entity_id uuid,
>>>   PRIMARY KEY ((name, value), entity_id))
>>> WITH
>>> caching=all;
>>>
>>> I used:
>>>
>>> CREATE TABLE IF NOT EXISTS entitylookup (
>>>   name varchar,
>>>   value varchar,
>>>   entity_id uuid,
>>>   PRIMARY KEY (name, value, entity_id))
>>> WITH
>>> caching=all;
>>>
>>>
>>> Now I need to migrate the data from the second CF to the first one.
>>> I am using Data Stax Community Edition.
>>>
>>> What would be the best way to convert data from one CF to the other?
>>>
>>> Best regards,
>>> Marcelo.
>>>
>>
>>
>


Re: High latency on 5 node Cassandra Cluster

2014-06-04 Thread Laing, Michael
I would first check to see if there was a time synchronization issue among
nodes that triggered and/or perpetuated the event.

ml


On Wed, Jun 4, 2014 at 3:12 AM, Arup Chakrabarti  wrote:

> Hello. We had some major latency problems yesterday with our 5 node
> cassandra cluster. Wanted to get some feedback on where we could start to
> look to figure out what was causing the issue. If there is more info I
> should provide, please let me know.
>
> Here are the basics of the cluster:
> Clients: Hector and Cassie
> Size: 5 nodes (2 in AWS US-West-1, 2 in AWS US-West-2, 1 in Linode Fremont)
> Replication Factor: 5
> Quorum Reads and Writes enabled
> Read Repair set to true
> Cassandra Version: 1.0.12
>
> We started experiencing catastrophic latency from our app servers. We
> believed at the time this was due to compactions running, and the clients
> were not re-routing appropriately, so we disabled thrift on a single node
> that had high load. This did not resolve the issue. After that, we stopped
> gossip on the same node that had high load on it, again this did not
> resolve anything. We then took down gossip on another node (leaving 3/5 up)
> and that fixed the latency from the application side. For a period of ~4
> hours, every time we would try to bring up a fourth node, the app would see
> the latency again. We then rotated the three nodes that were up to make
> sure it was not a networking event related to a single region/provider and
> we kept seeing the same problem: 3 nodes showed no latency problem, 4 or 5
> nodes would. After the ~4hours, we brought the cluster up to 5 nodes and
> everything was fine.
>
> We currently have some ideas on what caused this behavior, but has anyone
> else seen this type of problem where a full cluster causes problems, but
> removing nodes fixes it? Any input on what to look for in our logs to
> understand the issue?
>
> Thanks
>
> Arup
>


Re: migration to a new model

2014-06-04 Thread Laing, Michael
OK Marcelo, I'll work on it today. -ml


On Tue, Jun 3, 2014 at 8:24 PM, Marcelo Elias Del Valle <
marc...@s1mbi0se.com.br> wrote:

> Hi Michael,
>
> For sure I would be interested in this program!
>
> I am new both to python and for cql. I started creating this copier, but
> was having problems with timeouts. Alex solved my problem here on the list,
> but I think I will still have a lot of trouble making the copy to work fine.
>
> I open sourced my version here:
> https://github.com/s1mbi0se/cql_record_processor
>
> Just in case it's useful for anything.
>
> However, I saw CQL has support for concurrency itself and having something
> made by someone who knows Python CQL Driver better would be very helpful.
>
> My two servers today are at OVH (ovh.com), we have servers at AWS but but
> several cases we prefer other hosts. Both servers have SDD and 64 Gb RAM, I
> could use the script as a benchmark for you if you want. Besides, we have
> some bigger clusters, I could run on the just to test the speed if this is
> going to help.
>
> Regards
> Marcelo.
>
>
> 2014-06-03 11:40 GMT-03:00 Laing, Michael :
>
> Hi Marcelo,
>>
>> I could create a fast copy program by repurposing some python apps that I
>> am using for benchmarking the python driver - do you still need this?
>>
>> With high levels of concurrency and multiple subprocess workers, based on
>> my current actual benchmarks, I think I can get well over 1,000 rows/second
>> on my mac and significantly more in AWS. I'm using variable size rows
>> averaging 5kb.
>>
>> This would be the initial version of a piece of the benchmark suite we
>> will release as part of our nyt⨍aбrik project on 21 June for my
>> Cassandra Day NYC talk re the python driver.
>>
>> ml
>>
>>
>> On Mon, Jun 2, 2014 at 2:15 PM, Marcelo Elias Del Valle <
>> marc...@s1mbi0se.com.br> wrote:
>>
>>> Hi Jens,
>>>
>>> Thanks for trying to help.
>>>
>>> Indeed, I know I can't do it using just CQL. But what would you use to
>>> migrate data manually? I tried to create a python program using auto
>>> paging, but I am getting timeouts. I also tried Hive, but no success.
>>> I only have two nodes and less than 200Gb in this cluster, any simple
>>> way to extract the data quickly would be good enough for me.
>>>
>>> Best regards,
>>> Marcelo.
>>>
>>>
>>>
>>> 2014-06-02 15:08 GMT-03:00 Jens Rantil :
>>>
>>> Hi Marcelo,
>>>>
>>>> Looks like you can't do this without migrating your data manually:
>>>> https://stackoverflow.com/questions/18421668/alter-cassandra-column-family-primary-key-using-cassandra-cli-or-cql
>>>>
>>>> Cheers,
>>>> Jens
>>>>
>>>>
>>>> On Mon, Jun 2, 2014 at 7:48 PM, Marcelo Elias Del Valle <
>>>> marc...@s1mbi0se.com.br> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have some cql CFs in a 2 node Cassandra 2.0.8 cluster.
>>>>>
>>>>> I realized I created my column family with the wrong partition.
>>>>> Instead of:
>>>>>
>>>>> CREATE TABLE IF NOT EXISTS entity_lookup (
>>>>>   name varchar,
>>>>>   value varchar,
>>>>>   entity_id uuid,
>>>>>   PRIMARY KEY ((name, value), entity_id))
>>>>> WITH
>>>>> caching=all;
>>>>>
>>>>> I used:
>>>>>
>>>>> CREATE TABLE IF NOT EXISTS entitylookup (
>>>>>   name varchar,
>>>>>   value varchar,
>>>>>   entity_id uuid,
>>>>>   PRIMARY KEY (name, value, entity_id))
>>>>> WITH
>>>>> caching=all;
>>>>>
>>>>>
>>>>> Now I need to migrate the data from the second CF to the first one.
>>>>> I am using Data Stax Community Edition.
>>>>>
>>>>> What would be the best way to convert data from one CF to the other?
>>>>>
>>>>> Best regards,
>>>>> Marcelo.
>>>>>
>>>>
>>>>
>>>
>>
>


Re: migration to a new model

2014-06-04 Thread Laing, Michael
Marcelo,

Here is a link to the preview of the python fast copy program:

https://gist.github.com/michaelplaing/37d89c8f5f09ae779e47

It will copy a table from one cluster to another with some transformation -
the source and destination can be the same cluster.

It has 3 main throttles to experiment with:

   1. fetch_size: size of source pages in rows
   2. worker_count: number of worker subprocesses
   3. concurrency: number of async callback chains per worker subprocess

It is easy to overrun Cassandra and the python driver, so I recommend
starting with the defaults: fetch_size: 1000; worker_count: 2; concurrency:
10.

Additionally there are switches to set 'policies' by source and
destination: retry (downgrade consistency), dc_aware, and token_aware.
retry is useful if you are getting timeouts. For the others YMMV.

To use it you need to define the SELECT and UPDATE cql statements as well
as the 'map_fields' method.
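
For the entitylookup -> entity_lookup case earlier in this thread, the two
statements might look roughly like this - illustrative only, since the bind
markers and exact form depend on how you configure the script (and because
every column in the corrected table is part of the primary key, the write
side is simply an INSERT):

-- source, paged by token range (one range per worker):
SELECT name, value, entity_id FROM entitylookup
WHERE token(name) > ? AND token(name) <= ?;

-- destination, keyed by the corrected compound partition key:
INSERT INTO entity_lookup (name, value, entity_id) VALUES (?, ?, ?);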

The worker subprocesses divide up the token range among themselves and
proceed quasi-independently. Each worker opens a connection to each cluster
and the driver sets up connection pools to the nodes in the cluster. Anyway
there are a lot of processes, threads, callbacks going at once so it is fun
to watch.

On my regional cluster of small nodes in AWS I got about 3000 rows per
second transferred after things warmed up a bit - each row about 6kb.

ml


On Wed, Jun 4, 2014 at 11:49 AM, Laing, Michael 
wrote:

> OK Marcelo, I'll work on it today. -ml
>
>
> On Tue, Jun 3, 2014 at 8:24 PM, Marcelo Elias Del Valle <
> marc...@s1mbi0se.com.br> wrote:
>
>> Hi Michael,
>>
>> For sure I would be interested in this program!
>>
>> I am new both to python and for cql. I started creating this copier, but
>> was having problems with timeouts. Alex solved my problem here on the list,
>> but I think I will still have a lot of trouble making the copy to work fine.
>>
>> I open sourced my version here:
>> https://github.com/s1mbi0se/cql_record_processor
>>
>> Just in case it's useful for anything.
>>
>> However, I saw CQL has support for concurrency itself and having
>> something made by someone who knows Python CQL Driver better would be very
>> helpful.
>>
>> My two servers today are at OVH (ovh.com), we have servers at AWS but
>> but several cases we prefer other hosts. Both servers have SDD and 64 Gb
>> RAM, I could use the script as a benchmark for you if you want. Besides, we
>> have some bigger clusters, I could run on the just to test the speed if
>> this is going to help.
>>
>> Regards
>> Marcelo.
>>
>>
>> 2014-06-03 11:40 GMT-03:00 Laing, Michael :
>>
>> Hi Marcelo,
>>>
>>> I could create a fast copy program by repurposing some python apps that
>>> I am using for benchmarking the python driver - do you still need this?
>>>
>>> With high levels of concurrency and multiple subprocess workers, based
>>> on my current actual benchmarks, I think I can get well over 1,000
>>> rows/second on my mac and significantly more in AWS. I'm using variable
>>> size rows averaging 5kb.
>>>
>>> This would be the initial version of a piece of the benchmark suite we
>>> will release as part of our nyt⨍aбrik project on 21 June for my
>>> Cassandra Day NYC talk re the python driver.
>>>
>>> ml
>>>
>>>
>>> On Mon, Jun 2, 2014 at 2:15 PM, Marcelo Elias Del Valle <
>>> marc...@s1mbi0se.com.br> wrote:
>>>
>>>> Hi Jens,
>>>>
>>>> Thanks for trying to help.
>>>>
>>>> Indeed, I know I can't do it using just CQL. But what would you use to
>>>> migrate data manually? I tried to create a python program using auto
>>>> paging, but I am getting timeouts. I also tried Hive, but no success.
>>>> I only have two nodes and less than 200Gb in this cluster, any simple
>>>> way to extract the data quickly would be good enough for me.
>>>>
>>>> Best regards,
>>>> Marcelo.
>>>>
>>>>
>>>>
>>>> 2014-06-02 15:08 GMT-03:00 Jens Rantil :
>>>>
>>>> Hi Marcelo,
>>>>>
>>>>> Looks like you can't do this without migrating your data manually:
>>>>> https://stackoverflow.com/questions/18421668/alter-cassandra-column-family-primary-key-using-cassandra-cli-or-cql
>>>>>
>>>>> Cheers,
>>>>> Jens
>>>>>
>>>>>
>>>>> On Mon, Jun 2, 2014 at 7:48 PM, Marcelo Elias Del Valle <
>>>>> mar

Re: migration to a new model

2014-06-04 Thread Laing, Michael
BTW you might want to put a LIMIT clause on your SELECT for testing. -ml


On Wed, Jun 4, 2014 at 6:04 PM, Laing, Michael 
wrote:

> Marcelo,
>
> Here is a link to the preview of the python fast copy program:
>
> https://gist.github.com/michaelplaing/37d89c8f5f09ae779e47
>
> It will copy a table from one cluster to another with some transformation-
> they can be the same cluster.
>
> It has 3 main throttles to experiment with:
>
>1. fetch_size: size of source pages in rows
>2. worker_count: number of worker subprocesses
>3. concurrency: number of async callback chains per worker subprocess
>
> It is easy to overrun Cassandra and the python driver, so I recommend
> starting with the defaults: fetch_size: 1000; worker_count: 2; concurrency:
> 10.
>
> Additionally there are switches to set 'policies' by source and
> destination: retry (downgrade consistency), dc_aware, and token_aware.
> retry is useful if you are getting timeouts. For the others YMMV.
>
> To use it you need to define the SELECT and UPDATE cql statements as well
> as the 'map_fields' method.
>
> The worker subprocesses divide up the token range among themselves and
> proceed quasi-independently. Each worker opens a connection to each cluster
> and the driver sets up connection pools to the nodes in the cluster. Anyway
> there are a lot of processes, threads, callbacks going at once so it is fun
> to watch.
>
> On my regional cluster of small nodes in AWS I got about 3000 rows per
> second transferred after things warmed up a bit - each row about 6kb.
>
> ml
>
>
> On Wed, Jun 4, 2014 at 11:49 AM, Laing, Michael  > wrote:
>
>> OK Marcelo, I'll work on it today. -ml
>>
>>
>> On Tue, Jun 3, 2014 at 8:24 PM, Marcelo Elias Del Valle <
>> marc...@s1mbi0se.com.br> wrote:
>>
>>> Hi Michael,
>>>
>>> For sure I would be interested in this program!
>>>
>>> I am new both to python and for cql. I started creating this copier, but
>>> was having problems with timeouts. Alex solved my problem here on the list,
>>> but I think I will still have a lot of trouble making the copy to work fine.
>>>
>>> I open sourced my version here:
>>> https://github.com/s1mbi0se/cql_record_processor
>>>
>>> Just in case it's useful for anything.
>>>
>>> However, I saw CQL has support for concurrency itself and having
>>> something made by someone who knows Python CQL Driver better would be very
>>> helpful.
>>>
>>> My two servers today are at OVH (ovh.com), we have servers at AWS but
>>> but several cases we prefer other hosts. Both servers have SDD and 64 Gb
>>> RAM, I could use the script as a benchmark for you if you want. Besides, we
>>> have some bigger clusters, I could run on the just to test the speed if
>>> this is going to help.
>>>
>>> Regards
>>> Marcelo.
>>>
>>>
>>> 2014-06-03 11:40 GMT-03:00 Laing, Michael :
>>>
>>> Hi Marcelo,
>>>>
>>>> I could create a fast copy program by repurposing some python apps that
>>>> I am using for benchmarking the python driver - do you still need this?
>>>>
>>>> With high levels of concurrency and multiple subprocess workers, based
>>>> on my current actual benchmarks, I think I can get well over 1,000
>>>> rows/second on my mac and significantly more in AWS. I'm using variable
>>>> size rows averaging 5kb.
>>>>
>>>> This would be the initial version of a piece of the benchmark suite we
>>>> will release as part of our nyt⨍aбrik project on 21 June for my
>>>> Cassandra Day NYC talk re the python driver.
>>>>
>>>> ml
>>>>
>>>>
>>>> On Mon, Jun 2, 2014 at 2:15 PM, Marcelo Elias Del Valle <
>>>> marc...@s1mbi0se.com.br> wrote:
>>>>
>>>>> Hi Jens,
>>>>>
>>>>> Thanks for trying to help.
>>>>>
>>>>> Indeed, I know I can't do it using just CQL. But what would you use to
>>>>> migrate data manually? I tried to create a python program using auto
>>>>> paging, but I am getting timeouts. I also tried Hive, but no success.
>>>>> I only have two nodes and less than 200Gb in this cluster, any simple
>>>>> way to extract the data quickly would be good enough for me.
>>>>>
>>>>> Best regards,
>>>>> Marcelo.
>>>>>
>>>>>
>&

Re: Bad Request: Type error: cannot assign result of function token (type bigint) to id (type int)

2014-06-06 Thread Laing, Michael
select * from test_paging where token(id) > token(0);
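
To page through the whole table this way, feed the largest token you have seen
back into the next query - a minimal sketch (the LIMIT values are arbitrary):

SELECT id, token(id) FROM test_paging LIMIT 2;

-- then, using the largest token returned above:
SELECT id, token(id) FROM test_paging WHERE token(id) > <last token seen> LIMIT 2;

-- repeat until a page comes back with fewer rows than the LIMIT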

ml


On Fri, Jun 6, 2014 at 1:47 AM, Jonathan Haddad  wrote:

> Sorry, the datastax docs are actually a bit better:
> http://www.datastax.com/documentation/cql/3.0/cql/cql_using/paging_c.html
>
> Jon
>
>
> On Thu, Jun 5, 2014 at 10:46 PM, Jonathan Haddad 
> wrote:
>
>> You should read through the token docs, it has examples and
>> specifications: http://cassandra.apache.org/doc/cql3/CQL.html#tokenFun
>>
>>
>> On Thu, Jun 5, 2014 at 10:22 PM, Kevin Burton  wrote:
>>
>>> I'm building a new schema which I need to read externally by paging
>>> through the result set.
>>>
>>> My understanding from reading the documentation , and this list, is that
>>> I can do that but I need to use the token() function.
>>>
>>> Only it doesn't work.
>>>
>>> Here's a reduction:
>>>
>>>
>>> create table test_paging (
>>> id int,
>>> primary key(id)
>>> );
>>>
>>> insert into test_paging (id) values (1);
>>> insert into test_paging (id) values (2);
>>> insert into test_paging (id) values (3);
>>> insert into test_paging (id) values (4);
>>> insert into test_paging (id) values (5);
>>>
>>> select * from test_paging where id > token(0);
>>>
>>> … but it gives me:
>>>
>>> Bad Request: Type error: cannot assign result of function token (type
>>> bigint) to id (type int)
>>>
>>> …
>>>
>>> What's that about?  I can't find any documentation for this and there
>>> aren't any concise examples.
>>>
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> Skype: *burtonator*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>> 
>>> War is peace. Freedom is slavery. Ignorance is strength. Corporations
>>> are people.
>>>
>>>
>>
>>
>> --
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> skype: rustyrazorblade
>>
>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade
>


python fast table copy/transform (subject updated)

2014-06-06 Thread Laing, Michael
Hi Marcelo,

I have updated the prerelease app in this gist:

https://gist.github.com/michaelplaing/37d89c8f5f09ae779e47

I found that it was too easy to overrun my Cassandra clusters so I added a
throttle arg which by default is 1000 rows per second.

Fixed a few bugs too, reworked the args, etc.

I'll be interested to hear if you find it useful and/or have any comments.

ml


On Thu, Jun 5, 2014 at 1:09 PM, Marcelo Elias Del Valle <
marc...@s1mbi0se.com.br> wrote:

> Michael,
>
> I will try to test it up to tomorrow and I will let you know all the
> results.
>
> Thanks a lot!
>
> Best regards,
> Marcelo.
>
>
> 2014-06-04 22:28 GMT-03:00 Laing, Michael :
>
> BTW you might want to put a LIMIT clause on your SELECT for testing. -ml
>>
>>
>> On Wed, Jun 4, 2014 at 6:04 PM, Laing, Michael > > wrote:
>>
>>> Marcelo,
>>>
>>> Here is a link to the preview of the python fast copy program:
>>>
>>> https://gist.github.com/michaelplaing/37d89c8f5f09ae779e47
>>>
>>> It will copy a table from one cluster to another with some
>>> transformation- they can be the same cluster.
>>>
>>> It has 3 main throttles to experiment with:
>>>
>>>1. fetch_size: size of source pages in rows
>>>2. worker_count: number of worker subprocesses
>>>3. concurrency: number of async callback chains per worker subprocess
>>>
>>> It is easy to overrun Cassandra and the python driver, so I recommend
>>> starting with the defaults: fetch_size: 1000; worker_count: 2; concurrency:
>>> 10.
>>>
>>> Additionally there are switches to set 'policies' by source and
>>> destination: retry (downgrade consistency), dc_aware, and token_aware.
>>> retry is useful if you are getting timeouts. For the others YMMV.
>>>
>>> To use it you need to define the SELECT and UPDATE cql statements as
>>> well as the 'map_fields' method.
>>>
>>> The worker subprocesses divide up the token range among themselves and
>>> proceed quasi-independently. Each worker opens a connection to each cluster
>>> and the driver sets up connection pools to the nodes in the cluster. Anyway
>>> there are a lot of processes, threads, callbacks going at once so it is fun
>>> to watch.
>>>
>>> On my regional cluster of small nodes in AWS I got about 3000 rows per
>>> second transferred after things warmed up a bit - each row about 6kb.
>>>
>>> ml
>>>
>>>
>>> On Wed, Jun 4, 2014 at 11:49 AM, Laing, Michael <
>>> michael.la...@nytimes.com> wrote:
>>>
>>>> OK Marcelo, I'll work on it today. -ml
>>>>
>>>>
>>>> On Tue, Jun 3, 2014 at 8:24 PM, Marcelo Elias Del Valle <
>>>> marc...@s1mbi0se.com.br> wrote:
>>>>
>>>>> Hi Michael,
>>>>>
>>>>> For sure I would be interested in this program!
>>>>>
>>>>> I am new both to python and for cql. I started creating this copier,
>>>>> but was having problems with timeouts. Alex solved my problem here on the
>>>>> list, but I think I will still have a lot of trouble making the copy to
>>>>> work fine.
>>>>>
>>>>> I open sourced my version here:
>>>>> https://github.com/s1mbi0se/cql_record_processor
>>>>>
>>>>> Just in case it's useful for anything.
>>>>>
>>>>> However, I saw CQL has support for concurrency itself and having
>>>>> something made by someone who knows Python CQL Driver better would be very
>>>>> helpful.
>>>>>
>>>>> My two servers today are at OVH (ovh.com), we have servers at AWS but
>>>>> but several cases we prefer other hosts. Both servers have SDD and 64 Gb
>>>>> RAM, I could use the script as a benchmark for you if you want. Besides, 
>>>>> we
>>>>> have some bigger clusters, I could run on the just to test the speed if
>>>>> this is going to help.
>>>>>
>>>>> Regards
>>>>> Marcelo.
>>>>>
>>>>>
>>>>> 2014-06-03 11:40 GMT-03:00 Laing, Michael :
>>>>>
>>>>> Hi Marcelo,
>>>>>>
>>>>>> I could create a fast copy program by repurposing some python apps
>>>>>> that I am using for benchmarking the python driver - do you still need 
>>>>>> th

Re: Large number of row keys in query kills cluster

2014-06-10 Thread Laing, Michael
Perhaps if you described both the schema and the query in more detail, we
could help... e.g. did the query have an IN clause with 2 keys? Or is
the key compound? More detail will help.


On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma  wrote:

> I didn't explain clearly - I'm not requesting 2 unknown keys
> (resulting in a full scan), I'm requesting 2 specific rows by key.
> On Jun 10, 2014 6:02 PM, "DuyHai Doan"  wrote:
>
>> Hello Jeremy
>>
>> Basically what you are doing is to ask Cassandra to do a distributed full
>> scan on all the partitions across the cluster, it's normal that the nodes
>> are somehow stressed.
>>
>> How did you make the query? Are you using Thrift or CQL3 API?
>>
>> Please note that there is another way to get all partition keys : SELECT
>> DISTINCT  FROM..., more details here :
>> www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3
>> I ran an application today that attempted to fetch 20,000+ unique row
>> keys in one query against a set of completely empty column families. On a
>> 4-node cluster (EC2 m1.large instances) with the recommended memory
>> settings (2 GB heap), every single node immediately ran out of memory and
>> became unresponsive, to the point where I had to kill -9 the cassandra
>> processes.
>>
>> Now clearly this query is not the best idea in the world, but the effects
>> of it are a bit disturbing. What could be going on here? Are there any
>> other query pitfalls I should be aware of that have the potential to
>> explode the entire cluster?
>>
>> -j
>>
>


Re: Large number of row keys in query kills cluster

2014-06-12 Thread Laing, Michael
Just an FYI, my benchmarking of the new python driver, which uses the
asynchronous CQL native transport, indicates that one can largely overcome
client-to-node latency effects if you employ a suitable level of
concurrency and non-blocking techniques.

Of course response size and other factors come into play, but having a
hundred or so queries simultaneously in the pipeline from each worker
subprocess is a big help.


On Thu, Jun 12, 2014 at 10:46 AM, Jeremy Jongsma 
wrote:

> Good to know, thanks Peter. I am worried about client-to-node latency if I
> have to do 20,000 individual queries, but that makes it clearer that at
> least batching in smaller sizes is a good idea.
>
>
> On Wed, Jun 11, 2014 at 6:34 PM, Peter Sanford 
> wrote:
>
>> On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma 
>> wrote:
>>
>>> The big problem seems to have been requesting a large number of row keys
>>> combined with a large number of named columns in a query. 20K rows with 20K
>>> columns destroyed my cluster. Splitting it into slices of 100 sequential
>>> queries fixed the performance issue.
>>>
>>> When updating 20K rows at a time, I saw a different issue -
>>> BrokenPipeException from all nodes. Splitting into slices of 1000 fixed
>>> that issue.
>>>
>>> Is there any documentation on this? Obviously these limits will vary by
>>> cluster capacity, but for new users it would be great to know that you can
>>> run into problems with large queries, and how they present themselves when
>>> you hit them. The errors I saw are pretty opaque, and took me a couple days
>>> to track down.
>>>
>>>
>> The first thing that comes to mind is the Multiget section on the
>> Datastax anti-patterns page:
>> http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningAntiPatterns_c.html?scroll=concept_ds_emm_hwl_fk__multiple-gets
>>
>>
>>
>> -psanford
>>
>>
>>
>


Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Laing, Michael
Just to add 2 more cents... :)

The CQL3 protocol is asynchronous. This can provide a substantial
throughput increase, according to my benchmarking, when one uses
non-blocking techniques.

It is also peer-to-peer. Hence the server can generate events to send to
the client, e.g. schema changes - in general, 'triggers' become possible.

ml


On Fri, Jun 13, 2014 at 6:21 PM, graham sanderson  wrote:

> My 2 cents…
>
> A motivation for CQL3 AFAIK was to make Cassandra more familiar to SQL
> users. This is a valid goal, and works well in many cases.
> Equally there are use cases (that some might find ugly) where Cassandra is
> chosen explicitly because of the sorts of things you can do at the thrift
> level, which aren’t (currently) exposed via CQL3
>
> To Robert’s point earlier - "Rational people should presume that Thrift
> support must eventually disappear”… he is probably right (though frankly
> I’d rather the non-blocking thrift version was added instead). However if
> we do get rid of the thrift interface, then it needs to be at a time that
> CQLn is capable of expressing all the things you could do via the thrift
> API. Note, I need to go look and see if the non-blocking thrift version
> also requires materializing the entire thrift object in memory.
>
> On Jun 13, 2014, at 4:55 PM, DuyHai Doan  wrote:
>
> There are always the pros and the cons with a querying language, as always.
>
> But as far as I can see, the advantages of Thrift I can see over CQL3 are:
>
>  1) Thrift require a little bit less decoding server-side (a difference
> around 10% in CPU usage).
>
>  2) Thrift use more "compact" storage because CQL3 need to add extra
> "marker" columns to guarantee the existence of primary key. It is worsen
> when you use clustering columns because for each distinct clustering group
> you have a related "marker" columns.
>
>  That being said, point 1) is not really an issue since most of the time
> nodes are more I/O bound than CPU bound. Only in extreme cases where you
> have incredible read rate with data that fits entirely in memory that you
> may notice the difference.
>
>  For point 2) this is a small trade-off to have access to a query language
> and being able to do slice queries using the WHERE clause. Some like it,
> other hate it, it's just a question of taste.  Please note that the "waste"
> in disk space is somehow mitigated by compression.
>
>  Long story short I think Thrift may have appropriate usage but only in
> very few use cases. Recently a lot of improvement and features have been
> added to CQL3 so that it shoud be considered as the first choice for most
> users and if they fall into those few use cases then switch back to Thrift
>
> My 2 cents
>
>
>
>
>
>
> On Fri, Jun 13, 2014 at 11:43 PM, Peter Lin  wrote:
>
>>
>> With text based query approach like CQL, you loose the type with dynamic
>> columns. Yes, we're storing it as bytes, but it is simpler and easier with
>> Thrift to do these types of things.
>>
>> I like CQL3 and what it does, but text based query languages make certain
>> dynamic schema use cases painful. Having used and built ORM's they are
>> poorly suited to dynamic schemas. If you've never had to write an ORM to
>> handle dynamic user defined schemas at runtime, it's tough to see where the
>> problems arise and how that makes life painful.
>>
>> Just to be clear, I'm not saying "don't use CQL3" or "CQL3 is bad". I'm
>> saying CQL3 is good for certain kinds of use cases and Thrift is good at
>> certain use cases. People need to look at what and how they're storing data
>> and do what makes the most sense to them. Slavishly following CQL3 doesn't
>> make any sense to me.
>>
>>
>>
>> On Fri, Jun 13, 2014 at 5:30 PM, DuyHai Doan 
>> wrote:
>>
>>> "the validation type is set to bytes, and my code is type safe, so it
>>> knows which serializers to use. Those dynamic columns are driven off the
>>> types in Java."  --> Correct. However, you are still bound by the column
>>> comparator type which should be fixed (unless again you set it to bytes, in
>>> this case you loose the ordering and sorting feature)
>>>
>>>  Basically what you are doing is telling Cassandra to save data in the
>>> cells as raw bytes, the serialization is taken care client side using the
>>> appropriate serializer. This is perfectly a valid strategy.
>>>
>>>  But how is it different from using CQL3 and setting the value to "blob"
>>> (equivalent to bytes) and take care of the serialization client-side also ?
>>> You can even imagine saving value in JSON format and set the type to "text".
>>>
>>>  Really, I don't see why CQL3 cannot achieve the scenario you describe.
>>>
>>>  For the record, when you create a table in CQL3 as follow:
>>>
>>>  CREATE TABLE user (
>>>  id bigint PRIMARY KEY,
>>>  firstname text,
>>>  lastname text,
>>>  last_connection timestamp,
>>>  );
>>>
>>>  C* will create a column family with validation type = bytes to
>>> accommodate the timestamp and text type

Re: Summarizing Timestamp datatype

2014-06-17 Thread Laing, Michael
If you can arrange to index your rows by:

(<something else>, <timestamp>)

Then you can select ranges as you wish.

This works because <something else> is the "partition key", arrived at by
hash (really it's a hash key), whereas <timestamp> is the "clustering
key" (really it is a range key) which is kept in sorted order both in
memory and on disk.

If you don't have too many rows, <something else> can be a constant.

If you want to avoid hot spots, and/or have more rows, then <something else>
can be a shard, e.g. an int from 0 to 11. Then you can use IN to
select, plus your range, and it works very nicely in practice (in my
experience) despite being considered by some as an anti-pattern.
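
As a concrete sketch of that layout - the names and the day bucket are just
illustrative:

CREATE TABLE events_by_day (
day text, -- <something else>: a day bucket such as '2014-06-10', or a shard
ts timestamp, -- <timestamp>: the clustering (range) key
payload text,
PRIMARY KEY (day, ts)
);

-- everything recorded on one day, in time order:
SELECT ts, payload FROM events_by_day WHERE day = '2014-06-10';

-- or several days at once, using IN on the partition key:
SELECT day, ts, payload FROM events_by_day
WHERE day IN ('2014-06-10', '2014-06-12');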

ml


On Tue, Jun 17, 2014 at 8:41 PM, Jason Lewis  wrote:

> I have data stored with the timestamp datatype. Is it possible to use
> CQL to return results based on if a row falls in a range for a day?
>
> Ex. If I have 20 rows that occur on 2014-06-10, no rows for 2014-06-11
> and 15 rows that occured on 2014-06-12, I'd like to only return
> results that data exists for 2014-06-10 and 2014-06-12.
>
> Does that makes sense?  Is it possible?  CQL doesn't seem super
> flexible and I haven't had luck coming up with a CQL solution.
>
> jas
>


Re: Summarizing Timestamp datatype

2014-06-18 Thread Laing, Michael
Well then you'd better provide your schema and query, as I select ranges like
this all the time using CQL, so I (at least) must not be understanding your
problem from the description so far.


On Wed, Jun 18, 2014 at 2:54 AM, DuyHai Doan  wrote:

> Hello Jason
>
> If you want to check for presence / absence of data for a day, you can add
> the date as a composite component to your partition key. Cassandra will
> rely on the bloom filter and avoid hitting disk for maximum performance.
>
> The only drawback of this modelling is that you need to provide the date
> each time you query your data
> Le 18 juin 2014 04:22, "Jason Lewis"  a écrit :
>
> That's how my schema is built. So far, I'm pulling the data out by a
>> range of 30 days.  I want to see if I have data for every day, just
>> wondering if it's possible in the CQL, as opposed to how i'm doing it
>> now, in python.
>>
>> On Tue, Jun 17, 2014 at 9:46 PM, Laing, Michael
>>  wrote:
>> > If you can arrange to index your rows by:
>> >
>> > (<something else>, <timestamp>)
>> >
>> > Then you can select ranges as you wish.
>> >
>> > This works because <something else> is the "partition key", arrived at
>> > by hash (really it's a hash key), whereas <timestamp> is the
>> > "clustering key" (really it is a range key) which is kept in sorted
>> > order both in memory and on disk.
>> >
>> > If you don't have too many rows, <something else> can be a constant.
>> >
>> > If you want to avoid hot spots, and/or have more rows, then <something else>
>> > can be a shard, e.g. an int from 0 to 11. Then you can use IN to select,
>> > plus your range, and it works very nicely in practice (in my experience)
>> > despite being considered by some as an anti-pattern.
>> >
>> > ml
>> >
>> >
>> > On Tue, Jun 17, 2014 at 8:41 PM, Jason Lewis 
>> wrote:
>> >>
>> >> I have data stored with the timestamp datatype. Is it possible to use
>> >> CQL to return results based on if a row falls in a range for a day?
>> >>
>> >> Ex. If I have 20 rows that occur on 2014-06-10, no rows for 2014-06-11
>> >> and 15 rows that occured on 2014-06-12, I'd like to only return
>> >> results that data exists for 2014-06-10 and 2014-06-12.
>> >>
>> >> Does that makes sense?  Is it possible?  CQL doesn't seem super
>> >> flexible and I haven't had luck coming up with a CQL solution.
>> >>
>> >> jas
>> >
>> >
>>
>


Re: Best way to do a multi_get using CQL

2014-06-20 Thread Laing, Michael
However my extensive benchmarking this week of the python driver from
master shows a performance *decrease* when using 'token_aware'.

This is on 12-node, 2-datacenter, RF-3 cluster in AWS.

Also, why do the work the coordinator will do for you: sending all the queries,
waiting for everything to come back in whatever order, and sorting the result.

I would rather keep my app code simple.

But the real point is that you should benchmark in your own environment.

ml


On Fri, Jun 20, 2014 at 3:29 AM, Marcelo Elias Del Valle <
marc...@s1mbi0se.com.br> wrote:

> Yes, I am using the CQL datastax drivers.
> It was a good advice, thanks a lot Janathan.
> []s
>
>
> 2014-06-20 0:28 GMT-03:00 Jonathan Haddad :
>
> The only case in which it might be better to use an IN clause is if
>> the entire query can be satisfied from that machine.  Otherwise, go
>> async.
>>
>> The native driver reuses connections and intelligently manages the
>> pool for you.  It can also multiplex queries over a single connection.
>>
>> I am assuming you're using one of the datastax drivers for CQL, btw.
>>
>> Jon
>>
>> On Thu, Jun 19, 2014 at 7:37 PM, Marcelo Elias Del Valle
>>  wrote:
>> > This is interesting, I didn't know that!
>> > It might make sense then to use select = + async + token aware, I will
>> try
>> > to change my code.
>> >
>> > But would it be a "recomended solution" for these cases? Any other
>> options?
>> >
>> > I still would if this is the right use case for Cassandra, to look for
>> > random keys in a huge cluster. After all, the amount of connections to
>> > Cassandra will still be huge, right... Wouldn't it be a problem?
>> > Or when you use async the driver reuses the connection?
>> >
>> > []s
>> >
>> >
>> > 2014-06-19 22:16 GMT-03:00 Jonathan Haddad :
>> >
>> >> If you use async and your driver is token aware, it will go to the
>> >> proper node, rather than requiring the coordinator to do so.
>> >>
>> >> Realistically you're going to have a connection open to every server
>> >> anyways.  It's the difference between you querying for the data
>> >> directly and using a coordinator as a proxy.  It's faster to just ask
>> >> the node with the data.
>> >>
>> >> On Thu, Jun 19, 2014 at 6:11 PM, Marcelo Elias Del Valle
>> >>  wrote:
>> >> > But using async queries wouldn't be even worse than using SELECT IN?
>> >> > The justification in the docs is I could query many nodes, but I
>> would
>> >> > still
>> >> > do it.
>> >> >
>> >> > Today, I use both async queries AND SELECT IN:
>> >> >
>> >> > SELECT_ENTITY_LOOKUP = "SELECT entity_id FROM " + ENTITY_LOOKUP + "
>> >> > WHERE
>> >> > name=%s and value in(%s)"
>> >> >
>> >> > for name, values in identifiers.items():
>> >> >query = self.SELECT_ENTITY_LOOKUP % ('%s',
>> >> > ','.join(['%s']*len(values)))
>> >> >args = [name] + values
>> >> >query_msg = query % tuple(args)
>> >> >futures.append((query_msg, self.session.execute_async(query,
>> args)))
>> >> >
>> >> > for query_msg, future in futures:
>> >> >try:
>> >> >   rows = future.result(timeout=10)
>> >> >   for row in rows:
>> >> > entity_ids.add(row.entity_id)
>> >> >except:
>> >> >   logging.error("Query '%s' returned ERROR " % (query_msg))
>> >> >   raise
>> >> >
>> >> > Using async just with select = would mean instead of 1 async query
>> >> > (example:
>> >> > in (0, 1, 2)), I would do several, one for each value of "values"
>> array
>> >> > above.
>> >> > In my head, this would mean more connections to Cassandra and the
>> same
>> >> > amount of work, right? What would be the advantage?
>> >> >
>> >> > []s
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > 2014-06-19 22:01 GMT-03:00 Jonathan Haddad :
>> >> >
>> >> >> Your other option is to fire off async queries.  It's pretty
>> >> >> straightforward w/ the java or python drivers.
>> >> >>
>> >> >> On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle
>> >> >>  wrote:
>> >> >> > I was taking a look at Cassandra anti-patterns list:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html
>> >> >> >
>> >> >> > Among then is
>> >> >> >
>> >> >> > SELECT ... IN or index lookups¶
>> >> >> >
>> >> >> > SELECT ... IN and index lookups (formerly secondary indexes)
>> should
>> >> >> > be
>> >> >> > avoided except for specific scenarios. See When not to use IN in
>> >> >> > SELECT
>> >> >> > and
>> >> >> > When not to use an index in Indexing in
>> >> >> >
>> >> >> > CQL for Cassandra 2.0"
>> >> >> >
>> >> >> > And Looking at the SELECT doc, I saw:
>> >> >> >
>> >> >> > When not to use IN¶
>> >> >> >
>> >> >> > The recommendations about when not to use an index apply to using
>> IN
>> >> >> > in
>> >> >> > the
>> >> >> > WHERE clause. Under most conditions, using IN in the WHERE clause
>> is
>> >> >> > not
>> >> >> > recommended. Using IN can degrade performance because usually many
>> >> >> > nodes
>> >> >> > must be queried. For exam

Re: Does the default LIMIT applies to automatic paging?

2014-06-24 Thread Laing, Michael
And with python use future.has_more_pages and
future.start_fetching_next_page().


On Tue, Jun 24, 2014 at 1:20 AM, DuyHai Doan  wrote:

> With the Java Driver,  set the fetchSize and use ResultSet.iterator
> Le 24 juin 2014 01:04, "ziju feng"  a écrit :
>
> Hi All,
>>
>> I have a wide row table that I want to iterate through all rows under a
>> specific partition key. The table may contains around one million rows per
>> partition
>>
>> I was wondering if the default 1 rows LIMIT applies to automatic
>> pagination in C* 2.0 (I'm using Datastax driver). If so, what is best way
>> to retrieve all rows of a given partition? Should I use a super large LIMIT
>> value or should I manually page through the table?
>>
>> Thanks,
>>
>> Ziju
>>
>


Re: How to maintain the N-most-recent versions of a value?

2014-07-18 Thread Laing, Michael
The cql you provided is invalid. You probably meant something like:

CREATE TABLE foo (
rowkey text,
family text,
qualifier text,
version int,
value blob,
PRIMARY KEY ((rowkey, family, qualifier), version))
WITH CLUSTERING ORDER BY (version DESC);

We use TTLs and LIMIT for structures like these, paying attention to the
construction of the partition key so that partition sizes are reasonable.

If the blob might be large, store it somewhere else. We use S3 but you
could also put it in another C* table.

In 2.1 the row cache may help as it will store N rows per recently accessed
partition, starting at the beginning of the partition.
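
A rough sketch of the TTL + LIMIT approach against the table above - the TTL
value is an assumption; pick it from your expected write rate so that at least
N versions are always live:

-- write each version with a TTL (7 days here, purely illustrative):
INSERT INTO foo (rowkey, family, qualifier, version, value)
VALUES ('r1', 'f1', 'q1', 42, 0x00)
USING TTL 604800;

-- read the N most recent versions (N = 3 here); the DESC clustering order
-- means the newest versions sit at the front of the partition:
SELECT version, value FROM foo
WHERE rowkey = 'r1' AND family = 'f1' AND qualifier = 'q1'
LIMIT 3;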

ml


On Fri, Jul 18, 2014 at 6:30 AM, Benedict Elliott Smith <
belliottsm...@datastax.com> wrote:

> If the versions can be guaranteed to be a adjacent (i.e. if the latest
> version is V, the prior version is V-1) you could issue a delete at the
> same time as an insert for V-N-(buffer) where buffer >= 0
>
> In general guaranteeing that is probably hard, so this seems like
> something that would be nice to have C* manage for you. Unfortunately we
> don't have anything on the roadmap to help with this. A custom compaction
> strategy might do the trick, or permitting some filter during compaction
> that can omit/tombstone certain records based on the input data. This
> latter option probably wouldn't be too hard to implement, although it might
> not offer any guarantees about expiring records in order without incurring
> extra compaction cost (you could reasonably easily guarantee the most
> recent N are present, but the cleaning up of older records might happen
> haphazardly, in no particular order, and without any promptness guarantees,
> if you want to do it cheaply). Feel free to file a ticket, or submit a
> patch!
>
>
> On Fri, Jul 18, 2014 at 1:32 AM, Clint Kelly 
> wrote:
>
>> Hi everyone,
>>
>> I am trying to design a schema that will keep the N-most-recent
>> versions of a value.  Currently my table looks like the following:
>>
>> CREATE TABLE foo (
>> rowkey text,
>> family text,
>> qualifier text,
>> version long,
>> value blob,
>> PRIMARY KEY (rowkey, family, qualifier, version))
>> WITH CLUSTER ORDER BY (rowkey ASC, family ASC, qualifier ASC, version
>> DESC));
>>
>> Is there any standard design pattern for updating such a layout such
>> that I keep the N-most-recent (version, value) pairs for every unique
>> (rowkey, family, qualifier)?  I can't think of any way to do this
>> without doing a read-modify-write.  The best thing I can think of is
>> to use TTL to approximate the desired behavior (which will work if I
>> know how often we are writing new data to the table).  I could also
>> use "LIMIT N" in my queries to limit myself to only N items, but that
>> does not address any of the storage-size issues.
>>
>> In case anyone is curious, this question is related to some work that
>> I am doing translating a system built on HBase (which provides this
>> "keep the N-most-recent-version-of-a-cell" behavior) to Cassandra
>> while providing the user with as-similar-as-possible an interface.
>>
>> Best regards,
>> Clint
>>
>
>


Re: Does SELECT … IN () use parallel dispatch?

2014-07-25 Thread Laing, Michael
We use IN (keeping the number down). The coordinator does parallel dispatch
AND applies ORDER BY to the aggregate results, which we would otherwise
have to do ourselves. Anyway, worth it for us.
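
As a sketch, with a hypothetical table where ts is the clustering column:

SELECT day, ts, payload FROM events
WHERE day IN ('2014-07-23', '2014-07-24', '2014-07-25')
ORDER BY ts DESC
LIMIT 100;

-- one request: the coordinator fans out to the partitions, merges the rows,
-- and applies the ORDER BY for you
-- note: some versions refuse to page IN-plus-ORDER-BY queries, so keep the
-- result set small or disable paging for this statement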

ml


On Fri, Jul 25, 2014 at 1:24 PM, Kevin Burton  wrote:

> Perhaps the best strategy is to have the datastax java-driver do this and
> I just wait or each result individually.  This will give me parallel
> dispatch.
>
>
> On Fri, Jul 25, 2014 at 11:40 AM, Graham Sanderson 
> wrote:
>
>> Of course the driver in question is allowed to be smarter and can do so
>> if use use a ? parameter for a list or even individual elements
>>
>> I'm not sure which if any drivers currently do this but we plan to
>> combine this with token aware routing in our scala driver in the future
>>
>> Sent from my iPhone
>>
>> On Jul 25, 2014, at 1:14 PM, DuyHai Doan  wrote:
>>
>> Nope. Select ... IN() sends one request to a coordinator. This
>> coordinator dispatch the request to 50 nodes as in your example and waits
>> for 50 responses before sending back the final result. As you can guess
>> this approach is not optimal since the global request latency is bound to
>> the slowest latency among 50 nodes.
>>
>>  On the other hand if you use async feature from the native protocol, you
>> client will issue 50 requests in parallel and the answers arrive as soon as
>> they are fetched from different nodes.
>>
>>  Clearly the only advantage of using IN() clause is ease of query. I
>> would advise to use IN() only when you have a "few" values, not 50.
>>
>>
>> On Fri, Jul 25, 2014 at 8:08 PM, Kevin Burton  wrote:
>>
>>> Say I have about 50 primary keys I need to fetch.
>>>
>>> I'd like to use parallel dispatch.  So that if I have 50 hosts, and each
>>> has record, I can read from all 50 at once.
>>>
>>> I assume cassandra does the right thing here ?  I believe it does… at
>>> least from reading the docs but it's still a bit unclear.
>>>
>>> Kevin
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>>  … or check out my Google+ profile
>>> 
>>> 
>>>
>>>
>>
>
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> 
>
>


Re: Does SELECT … IN () use parallel dispatch?

2014-07-25 Thread Laing, Michael
Except then you have to merge results if you want them ordered.


On Fri, Jul 25, 2014 at 2:15 PM, Kevin Burton  wrote:

> Ah.. ok. Nice.  That should work.  Parallel dispatch on the client would
> work too.. using async.
>
>
> On Fri, Jul 25, 2014 at 1:37 PM, Laing, Michael  > wrote:
>
>> We use IN (keeping the number down). The coordinator does parallel
>> dispatch AND applies ORDERED BY to the aggregate results, which we would
>> otherwise have to do ourselves. Anyway, worth it for us.
>>
>> ml
>>
>>
>> On Fri, Jul 25, 2014 at 1:24 PM, Kevin Burton  wrote:
>>
>>> Perhaps the best strategy is to have the datastax java-driver do this
>>> and I just wait or each result individually.  This will give me parallel
>>> dispatch.
>>>
>>>
>>> On Fri, Jul 25, 2014 at 11:40 AM, Graham Sanderson 
>>> wrote:
>>>
>>>> Of course the driver in question is allowed to be smarter and can do so
>>>> if use use a ? parameter for a list or even individual elements
>>>>
>>>> I'm not sure which if any drivers currently do this but we plan to
>>>> combine this with token aware routing in our scala driver in the future
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Jul 25, 2014, at 1:14 PM, DuyHai Doan  wrote:
>>>>
>>>> Nope. Select ... IN() sends one request to a coordinator. This
>>>> coordinator dispatch the request to 50 nodes as in your example and waits
>>>> for 50 responses before sending back the final result. As you can guess
>>>> this approach is not optimal since the global request latency is bound to
>>>> the slowest latency among 50 nodes.
>>>>
>>>>  On the other hand if you use async feature from the native protocol,
>>>> you client will issue 50 requests in parallel and the answers arrive as
>>>> soon as they are fetched from different nodes.
>>>>
>>>>  Clearly the only advantage of using IN() clause is ease of query. I
>>>> would advise to use IN() only when you have a "few" values, not 50.
>>>>
>>>>
>>>> On Fri, Jul 25, 2014 at 8:08 PM, Kevin Burton 
>>>> wrote:
>>>>
>>>>> Say I have about 50 primary keys I need to fetch.
>>>>>
>>>>> I'd like to use parallel dispatch.  So that if I have 50 hosts, and
>>>>> each has record, I can read from all 50 at once.
>>>>>
>>>>> I assume cassandra does the right thing here ?  I believe it does… at
>>>>> least from reading the docs but it's still a bit unclear.
>>>>>
>>>>> Kevin
>>>>>
>>>>> --
>>>>>
>>>>> Founder/CEO Spinn3r.com
>>>>> Location: *San Francisco, CA*
>>>>> blog: http://burtonator.wordpress.com
>>>>>  … or check out my Google+ profile
>>>>> <https://plus.google.com/102718274791889610666/posts>
>>>>> <http://spinn3r.com>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> <https://plus.google.com/102718274791889610666/posts>
>>> <http://spinn3r.com>
>>>
>>>
>>
>
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
> <http://spinn3r.com>
>
>


Re: IN clause with composite primary key?

2014-07-25 Thread Laing, Michael
You may also want to use tuples for the clustering columns:

> The tuple notation may also be used for IN clauses on CLUSTERING COLUMNS:
>
> SELECT * FROM posts WHERE userid='john doe' AND (blog_title, posted_at) IN
> (('John''s Blog', '2012-01-01'), ('Extreme Chess', '2014-06-01'))

from https://cassandra.apache.org/doc/cql3/CQL.html#selectStmt


On Fri, Jul 25, 2014 at 2:29 PM, DuyHai Doan  wrote:

> Below are the rules for IN clause
>
> a. composite partition keys: the IN clause only applies to the last
> composite component
> b. clustering keys: the IN clause only applies to the last clustering key
>
> Contrived example:
>
> CREATE TABLE test(
>pk1 int,
>pk2 int,
>clust1 int,
>clust2 int,
>clust3 int,
>PRIMARY KEY ((pk1,pk2), clust1, clust2, clust3));
>
> Possible queries
>
> SELECT * FROM test WHERE pk1=1 AND pk2 IN (1,2,3);
> SELECT * FROM test WHERE pk1=1 AND pk2 IN (1,2,3) AND col1=1 AND col2=2
> AND col3 IN (3,4,5);
>
> Theoretically there should be possible to do   SELECT * FROM test WHERE
> pk1 IN(1,2)  AND pk2 =3;  or SELECT * FROM test WHERE pk1 IN(1,2)  AND pk2
> IN (3,4) because the values in the IN() clause are just expanded to all
> linear combinations with other composites of the partiton key. But for some
> reason it's not allowed.
>
> However the restriction of IN clause for the clustering keys some how
> makes sense. Having multiple clustering keys, if you allow using IN clause
> for the first or any clustering key that is not the last one, C* would have
> to do a very large slice to pick some discrete values matching the IN()
> clause ...
>
>
>
>
>
>
>
> On Fri, Jul 25, 2014 at 11:17 PM, Kevin Burton  wrote:
>
>> How the heck would you build an IN clause with a primary key which is
>> composite?
>>
>> so say columns foo and bar are the primary key.
>>
>> if you just had foo as your column name, you can do
>>
>> where foo in ()
>>
>> … but with two keys I don't see how it's possible.
>>
>> specifying both actually builds a cartesian product.  which is kind of
>> cool but not what I want :)
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>> 
>>
>>
>


Re: Measuring WAN replication latency

2014-07-29 Thread Laing, Michael
I saw this a while back:

With requests possibly coming in from either US region, we need to make
> sure that the replication of data happens within an acceptable time
> threshold. This lead us to perform an experiment where we wrote 1 million
> records in one region of a multi-region cluster. We then initiated a read,
> 500ms later, in the other region, of the records that were just written in
> the initial region, while keeping a production level of load on the
> cluster. All records were successfully read.


from
http://techblog.netflix.com/2013/12/active-active-for-multi-regional.html


On Tue, Jul 29, 2014 at 5:53 PM, Robert Coli  wrote:

> On Tue, Jul 29, 2014 at 3:15 PM, Rahul Neelakantan  wrote:
>
>> Does anyone know of a way to measure/monitor WAN replication latency for
>> Cassandra?
>
>
> No. [1]
>
> =Rob
>
> [1] There are ways to do something like this task, but you probably don't
> actually want to do them. Trying to do them suggests that you are relying
> on WAN replication timing for your application, which is something you
> almost certainly do not want to do. Why do you believe you have this
> requirement?
>


Re: select many rows one time or select many times?

2014-08-01 Thread Laing, Michael
I don't think there is an easy "answer" to this...

A possible approach, based upon the implied dimensions of the problem,
would be to maintain a bloom filter over "words" for each user as a
partition key with the user as clustering key. Then a single query would
efficiently yield the list of users that "may" match and other techniques
could be used to refine that list down to actual matches.
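
One possible reading of that layout, purely as an illustration (table and
column names are invented):

CREATE TABLE word_filters (
shard int, -- a constant or a small shard range as the partition key
user text, -- clustering key
filter blob, -- serialized bloom filter over this user's words
PRIMARY KEY (shard, user)
);

-- one query pulls back every user's filter (per shard); the client then tests
-- the candidate words against each filter and fetches only the probable matches:
SELECT user, filter FROM word_filters WHERE shard = 0;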

ml


On Thu, Jul 31, 2014 at 10:44 AM, Philo Yang  wrote:

> Hi all,
>
> I have a cluster of 2.0.6 and one of my tables is like this:
> CREATE TABLE word (
>   user text,
>   word text,
>   flag double,
>   PRIMARY KEY (user, word)
> )
>
> each "user" has about 1 "word" per node. I have a requirement of
> selecting all rows where user='someuser' and word is in a large set whose
> size is about 1000 .
>
> In C* document, it is not recommended to use "select ... in" just like:
>
> select from word where user='someuser' and word in ('a','b','aa','ab',...)
>
> So now I select all rows where user='someuser' and filtrate them via
> client rather than via C*. Of course, I use Datastax Java Driver to page
> the resultset by setFetchSize(1000).  Is it the best way? I found the
> system's load is high because of large range query, should I change to
> select for only one row each time and select 1000 times?
>
> just like:
> select from word where user='someuser' and word = 'a';
> select from word where user='someuser' and word = 'b';
> select from word where user='someuser' and word = 'c';
> .
>
> Which method will cause lower pressure on Cassandra cluster?
>
> Thanks,
> Philo Yang
>
>


Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

2014-08-31 Thread Laing, Michael
Actually I think you do want to use scopeId, scopeType as the partition key
(and drop row caching until you upgrade to 2.1 where "rows" are in fact
rows and not partitions):

CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes
(
scopeId uuid, scopeType varchar, nodeId uuid, nodeType varchar,
timestamp bigint,
PRIMARY KEY ((scopeId , scopeType), nodeId, nodeType)
);

Then you can select using IN on the cartesian product of your clustering
keys:

SELECT timestamp
FROM  Graph_Marked_Nodes
WHERE scopeId = ?
AND scopeType = ?
AND (nodeId, nodeType) IN (
(uuid1, 'foo'), (uuid1, 'bar'),
(uuid2, 'foo'), (uuid2, 'bar'),
(uuid3, 'foo'), (uuid3, 'bar')
);

ml

PS Of course you could boldly go to 2.1 now for a nice performance boost :)




On Sat, Aug 30, 2014 at 8:59 PM, Todd Nine  wrote:

> Hi all,
>   I'm working on transferring our thrift DAOs over to CQL.  It's going
> well, except for 2 cases that both use multi get.  The use case is very
> simple.  It is a narrow row, by design, with only a few columns.  When I
> perform a multiget, I need to get up to 1k rows at a time.  I do not want
> to turn these into a wide row using scopeId and scopeType as the row key.
>
>
> On the physical level, my Column Family needs something similar to the
> following format.
>
>
> scopeId, scopeType, nodeId, nodeType :{ timestamp: 0x00 }
>
>
> I've defined my table with the following CQL.
>
>
> CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes
> ( scopeId uuid, scopeType varchar, nodeId uuid, nodeType varchar,
> timestamp bigint,
> PRIMARY KEY ((scopeId , scopeType, nodeId, nodeType))
> )WITH caching = 'all'
>
>
> This works well for inserts deletes and single reads.  I always know the
> scopeId, scopeType, nodeId, and nodeType, so I want to return the timestamp
> columns.  I thought I could use the IN operation and specify the pairs of
> nodeId and nodeTypes I have as input, however this doesn't work.
>
> Can anyone give me a suggestion on how to perform a multiget when I have
> several values for the nodeId and the nodeType?  This read occurs on every
> read of edges so making 1k trips is not going to work from a performance
> perspective.
>
> Below is the query I've tried.
>
> SELECT timestamp FROM  Graph_Marked_Nodes WHERE scopeId = ? AND scopeType
> = ? AND nodeId IN (uuid1, uuid2, uuid3) AND nodeType IN ('foo','bar')
>
> I've found this issue, which looks like it's a solution to my problem.
>
> https://issues.apache.org/jira/browse/CASSANDRA-6875
>
> However, I'm not able to get the syntax in the issue description to work
> either.  Any input would be appreciated!
>
> Cassandra: 2.0.10
> Datastax Driver: 2.1.0
>
> Thanks,
> Todd
>
>
>
>
>


Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

2014-08-31 Thread Laing, Michael
You need to switch gears to new terminology as well - a "thrift row" is a
partition now, etc... :)

So yes - the *partition* key of the *table* would be scopeId, scopeType in
my proposed scheme.

But the partitions would be too big, given what you describe.

You could shard the rows, but even then they would be large and retrieval
with IN on the shard would put a lot of pressure on the cluster and
coordinator. That's what we do to avoid hot spots but our numbers are much
smaller. Also we never delete, just ttl and compact.

If the type of a node is known at the time its uuid is assigned, you could
embed the type in the uuid, e.g. by taking over either part of the MAC
address or some of the random bits in a uuid v1. This would greatly
simplify the problem (presuming the types have low cardinality).

E.g:
CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes
(
scopeId uuid, scopeType varchar, nodeIdType timeuuid, timestamp bigint,

PRIMARY KEY ((scopeId , scopeType, nodeIdType))
);

SELECT timestamp
FROM  Graph_Marked_Nodes
WHERE scopeId = ?
AND scopeType = ?
AND (nodeIdType) IN (uuid1foo, uuid2bar, uuid3foo);

A possible similar approach would be to use User Defined Types in 2.1, but
I haven't even looked at that yet.

There are blog posts from Datastax describing internal structures - and
then there is the source of course :)

ml


On Sun, Aug 31, 2014 at 11:06 AM, Todd Nine  wrote:

> Hey Michael,
>Thanks for the response.  If I use the clustered columns in the way you
> described, won't that make the row key of the column family scopeId and
> scopeType?
>
> The scope fields represent a graph's owner.  The graph itself can have
> several billion nodes in it.  When a lot of deletes start occurring on the
> same graph, I will quickly saturate the row capacity of a column family if
> the physical row key is only the scope.
>
> This is why I have each node on its own row key.  As long as our cluster
> has the capacity to handle the load, we won't hit the upper bounds of the
> maximum columns in a row.
>
> I'm new to CQL in our code.  I've only been using it for
> administration.  I've been using the thrift interface in code since the 0.6
> days.
>
> I feel I have a strong understanding of the internals of the column family
> structure.   I'm struggling to find documentation on the CQL to physical
> layout that isn't a trivial example, especially around multiget use
> cases.  Do you have any pointers to blogs or tutorials you've found
> helpful?
>
> Thanks,
> Todd
>
>
> On Sunday, August 31, 2014, Laing, Michael 
> wrote:
>
>> Actually I think you do want to use scopeId, scopeType as the partition
>> key (and drop row caching until you upgrade to 2.1 where "rows" are in fact
>> rows and not partitions):
>>
>> CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes
>> (
>> scopeId uuid, scopeType varchar, nodeId uuid, nodeType varchar,
>> timestamp bigint,
>> PRIMARY KEY ((scopeId , scopeType), nodeId, nodeType)
>> );
>>
>> Then you can select using IN on the cartesian product of your clustering
>> keys:
>>
>> SELECT timestamp
>> FROM  Graph_Marked_Nodes
>> WHERE scopeId = ?
>> AND scopeType = ?
>> AND (nodeId, nodeType) IN (
>> (uuid1, 'foo'), (uuid1, 'bar'),
>> (uuid2, 'foo'), (uuid2, 'bar'),
>> (uuid3, 'foo'), (uuid3, 'bar')
>> );
>>
>> ml
>>
>> PS Of course you could boldly go to 2.1 now for a nice performance boost
>> :)
>>
>>
>>
>>
>> On Sat, Aug 30, 2014 at 8:59 PM, Todd Nine  wrote:
>>
>>> Hi all,
>>>   I'm working on transferring our thrift DAOs over to CQL.  It's going
>>> well, except for 2 cases that both use multi get.  The use case is very
>>> simple.  It is a narrow row, by design, with only a few columns.  When I
>>> perform a multiget, I need to get up to 1k rows at a time.  I do not want
>>> to turn these into a wide row using scopeId and scopeType as the row key.
>>>
>>>
>>> On the physical level, my Column Family needs something similar to the
>>> following format.
>>>
>>>
>>> scopeId, scopeType, nodeId, nodeType :{ timestamp: 0x00 }
>>>
>>>
>>> I've defined my table with the following CQL.
>>>
>>>
>>> CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes
>>> ( scopeId uuid, scopeType varchar, nodeId uuid, nodeType varchar,
>>> timestamp bigint,
>>> PRIMARY KEY ((scopeId , scopeType, nodeId, nodeType))
>>> )WITH caching = 'all'

Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
Are event_time and timestamp essentially representing the same datetime?

On Sunday, August 31, 2014, Subodh Nijsure  wrote:

> I have following database schema
>
> CREATE TABLE sensor_info_table (
>   asset_id text,
>   event_time timestamp,
>   "timestamp" timeuuid,
>   sensor_reading map,
>   sensor_serial_number text,
>   sensor_type int,
>   PRIMARY KEY ((asset_id), event_time, "timestamp")
> );
>
> CREATE INDEX event_time_index ON sensor_info_table (event_time);
>
> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
>
> Now I am able to insert the data into this table, however I am unable
> to do following query where I want to select items with specific
> timeuuid values.
>
> It gives me following error.
>
> SELECT * from mydb.sensor_info_table where timestamp IN (
> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
>
> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
> (preceding column "event_time" is either not restricted or by a non-EQ
> relation)
>
> What do I have to do to make this work?
>
> For what its worth I am using django for my front end development and
> I am using "timestamp timeuuid" field as unique indentifier to
> reference specific sensor reading from django framework -- since
> cassandra doesn't have way to generate unique id upon insert (like
> old-style rdms's auto-fields).
>
>
> Below is software version info.
>
> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
> Thrift protocol 19.39.0]
>
> I really don't understand what the error message preceeding column
> "event_time" is either not restricted or by no-EQ relation?
>
> -Subodh Nijsure
>


Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
Hmm. Because the clustering key is (event_time, "timestamp"), event_time
must be specified as well - hopefully that info is available to the ux.

Unfortunately you will then hit another problem with your query: you are
selecting a collection field... this will not work with IN on "timestamp".

So you could select all the "timestamp"s for an asset_id/event_time:

> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
> 1231234;


Or you could apply a range of "timestamp"s:

> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
> 1231234 and "timestamp" > 1d934af3-3178-11e4-ba8d-406c8f1838fa and
> "timestamp" < 20b82021-3178-11e4-abc2-406c8f1838fa;


BTW the secondary indices are not a good idea: high cardinality and of no
use in this query that I can see.
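
If you keep this layout they can simply be dropped, e.g. (using the index
names from your schema):

DROP INDEX event_time_index;
DROP INDEX timestamp_index;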

ml


On Sun, Aug 31, 2014 at 8:40 PM, Subodh Nijsure 
wrote:

> Not really event time stamp is created by the sensor when it reads data
> and  timestamp is something server creates when inserting data into
> cassandra db.  At later point in time my django ux allows users to browse
> this data and reference interesting data points via the timestamp field.
> The timestamp field is my bridge between Sal and nosql world.
>
> Subodh
> On Aug 31, 2014 5:33 PM, "Laing, Michael" 
> wrote:
>
>> Are event_time and timestamp essentially representing the same datetime?
>>
>> On Sunday, August 31, 2014, Subodh Nijsure 
>> wrote:
>>
>>> I have following database schema
>>>
>>> CREATE TABLE sensor_info_table (
>>>   asset_id text,
>>>   event_time timestamp,
>>>   "timestamp" timeuuid,
>>>   sensor_reading map,
>>>   sensor_serial_number text,
>>>   sensor_type int,
>>>   PRIMARY KEY ((asset_id), event_time, "timestamp")
>>> );
>>>
>>> CREATE INDEX event_time_index ON sensor_info_table (event_time);
>>>
>>> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
>>>
>>> Now I am able to insert the data into this table, however I am unable
>>> to do following query where I want to select items with specific
>>> timeuuid values.
>>>
>>> It gives me following error.
>>>
>>> SELECT * from mydb.sensor_info_table where timestamp IN (
>>> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
>>> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
>>>
>>> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
>>> (preceding column "event_time" is either not restricted or by a non-EQ
>>> relation)
>>>
>>> What do I have to do to make this work?
>>>
>>> For what its worth I am using django for my front end development and
>>> I am using "timestamp timeuuid" field as unique indentifier to
>>> reference specific sensor reading from django framework -- since
>>> cassandra doesn't have way to generate unique id upon insert (like
>>> old-style rdms's auto-fields).
>>>
>>>
>>> Below is software version info.
>>>
>>> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
>>> Thrift protocol 19.39.0]
>>>
>>> I really don't understand what the error message preceeding column
>>> "event_time" is either not restricted or by no-EQ relation?
>>>
>>> -Subodh Nijsure
>>>
>>


Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
Oh it must be late - I missed the fact that you didn't want to specify
asset_id. The above queries will still work but you have to use 'allow
filtering' - generally not a good idea. I'll look again in the morning.


On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael 
wrote:

> Hmm. Because the clustering key is (event_time, "timestamp"), event_time
> must be specified as well - hopefully that info is available to the ux.
>
> Unfortunately you will then hit another problem with your query: you are
> selecting a collection field... this will not work with IN on "timestamp".
>
> So you could select all the "timestamp"s for an asset_id/event_time:
>
>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
>> 1231234;
>
>
> Or you could apply a range of "timestamp"s:
>
>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
>> 1231234 and "timestamp" > 1d934af3-3178-11e4-ba8d-406c8f1838fa and
>> "timestamp" < 20b82021-3178-11e4-abc2-406c8f1838fa;
>
>
> BTW the secondary indices are not a good idea: high cardinality and of no
> use in this query that I can see.
>
> ml
>
>
> On Sun, Aug 31, 2014 at 8:40 PM, Subodh Nijsure 
> wrote:
>
>> Not really event time stamp is created by the sensor when it reads data
>> and  timestamp is something server creates when inserting data into
>> cassandra db.  At later point in time my django ux allows users to browse
>> this data and reference interesting data points via the timestamp field.
>> The timestamp field is my bridge between Sal and nosql world.
>>
>> Subodh
>> On Aug 31, 2014 5:33 PM, "Laing, Michael" 
>> wrote:
>>
>>> Are event_time and timestamp essentially representing the same datetime?
>>>
>>> On Sunday, August 31, 2014, Subodh Nijsure 
>>> wrote:
>>>
>>>> I have following database schema
>>>>
>>>> CREATE TABLE sensor_info_table (
>>>>   asset_id text,
>>>>   event_time timestamp,
>>>>   "timestamp" timeuuid,
>>>>   sensor_reading map,
>>>>   sensor_serial_number text,
>>>>   sensor_type int,
>>>>   PRIMARY KEY ((asset_id), event_time, "timestamp")
>>>> );
>>>>
>>>> CREATE INDEX event_time_index ON sensor_info_table (event_time);
>>>>
>>>> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
>>>>
>>>> Now I am able to insert the data into this table, however I am unable
>>>> to do following query where I want to select items with specific
>>>> timeuuid values.
>>>>
>>>> It gives me following error.
>>>>
>>>> SELECT * from mydb.sensor_info_table where timestamp IN (
>>>> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
>>>> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
>>>>
>>>> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
>>>> (preceding column "event_time" is either not restricted or by a non-EQ
>>>> relation)
>>>>
>>>> What do I have to do to make this work?
>>>>
>>>> For what its worth I am using django for my front end development and
>>>> I am using "timestamp timeuuid" field as unique indentifier to
>>>> reference specific sensor reading from django framework -- since
>>>> cassandra doesn't have way to generate unique id upon insert (like
>>>> old-style rdms's auto-fields).
>>>>
>>>>
>>>> Below is software version info.
>>>>
>>>> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
>>>> Thrift protocol 19.39.0]
>>>>
>>>> I really don't understand what the error message preceeding column
>>>> "event_time" is either not restricted or by no-EQ relation?
>>>>
>>>> -Subodh Nijsure
>>>>
>>>
>


Re: Help with select IN query in cassandra

2014-09-01 Thread Laing, Michael
This should work for your query requirements - 2 tables w same info because
disk is cheap and writes are fast so optimize for reads:

CREATE TABLE sensor_asset (
  asset_id text,
  event_time timestamp,
  tuuid timeuuid,
  sensor_reading map,
  sensor_serial_number text,
  sensor_type int,
  PRIMARY KEY ((asset_id), event_time)
);

CREATE TABLE sensor_tuuid (
  asset_id text,
  event_time timestamp,
  tuuid timeuuid,
  sensor_reading map,
  sensor_serial_number text,
  sensor_type int,
  PRIMARY KEY (tuuid)
);

1. Give me all sensor data for an asset:

select * from sensor_asset where asset_id = <asset_id>;

2. Give me sensor data that matches a set of timeuuids:

select * from sensor_tuuid where tuuid in (<tuuid1>, <tuuid2>, ...);

3. Give me all sensor data for an asset collected after | before | between
event_time(s):

select * from sensor_asset where asset_id = <asset_id> and event_time > <t1>;
select * from sensor_asset where asset_id = <asset_id> and event_time < <t1>;
select * from sensor_asset where asset_id = <asset_id> and event_time < <t2>
and event_time > <t1>;
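
To keep the two copies in sync on write, generate the timeuuid client side
and write both rows together - a sketch only (the asset id, serial number and
timeuuid literal are made up, and I've left out the map column):

BEGIN BATCH
    INSERT INTO sensor_asset (asset_id, event_time, tuuid, sensor_serial_number, sensor_type)
    VALUES ('asset-1', '2014-08-31 16:54:02-0700',
            1d934af3-3178-11e4-ba8d-406c8f1838fa, 'sn-001', 1);
    INSERT INTO sensor_tuuid (asset_id, event_time, tuuid, sensor_serial_number, sensor_type)
    VALUES ('asset-1', '2014-08-31 16:54:02-0700',
            1d934af3-3178-11e4-ba8d-406c8f1838fa, 'sn-001', 1);
APPLY BATCH;

The logged batch costs a bit more than two plain inserts, but it keeps the
two tables consistent with each other.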

***

Many people (not me) handle sensor data, so there may be better overall
approaches considering volumes, deletion, compaction etc.

But the above is simple and should make your current approach workable as
you iterate toward a complete solution.

Cheers,
ml



On Sun, Aug 31, 2014 at 11:08 PM, Subodh Nijsure 
wrote:

> Thanks for your help Michael.
>
> If specifying asset_id would help I can construct queries that can
> include asset_id
>
> So I have been "playing" around with PRIMARY KEY definition and
> following table definition
>
> CREATE TABLE sensor_info_table (
>   asset_id text,
>   event_time timestamp,
>   "timestamp" timeuuid,
>   sensor_reading map,
>   sensor_serial_number text,
>   sensor_type int,
>   PRIMARY KEY ((asset_id, "timestamp"), event_time)
> );
>
> It does what I want to do, and I removed the index for timestamp item
> since now it is part of primary key and thus my query like this works.
>
> SELECT * from sigsense.sensor_info_table where  asset_id='3' AND
> timestamp IN (
> 17830bb0-316a-11e4-800f-b888e30f5d17,16ddbdfe-316a-11e4-9f50-b888e30f5d17
> );
>
> But now this doesn't work it give
>
> SELECT * from sensor_info_table where  asset_id='3' ;
>
> Bad Request: Partition key part timestamp must be restricted since
> preceding part is
>
> I am keeping index on event_time as I sometime need to query something
> "give me all data since time x" i.e. something like this works.
>
>  SELECT * from sensor_info_table where  event_time > '2014-08-31
> 16:54:02-0700' ALLOW FILTERING;
>
> However if I do this things then this don't work:
>
> SELECT * from sensor_info_table where  asset_id='3' AND event_time >
> '2014-08-31 16:54:02-0700';
>
> Bad Request: Partition key part timestamp must be restricted since
> preceding part is
>
> Also  I am not conformable with fact that I need to specify ALLOW
> FILTERING.
>
> I guess cassandra schema design task asks designer to write down
> queries before designing schema.
>
> For the above table definition I want to do following queries:
>
> - Give me all sensor data for given asset.
> - Give me sensor data that matches given set of timeuuids
> - Give me all sendor data for a given asset, that were collected after
> | before | between  certain event_time.
>
> Given these query criteria how should  I construct my schema? One
> thought has occurred to me is make three tables with each item
> asset_id , event_time, timeuuid as primary keys and depending on type
> of query choose the table to do query upon. That seems like a waste of
> resources (disk, cpu ), also increasing insert times(!) but thats the
> way things need to happen in cassandra world its okay. ( I am
> two-three weeks into learning about cassandra).
>
> -Subodh
>
> On Sun, Aug 31, 2014 at 6:44 PM, Laing, Michael
>  wrote:
> > Oh it must be late - I missed the fact that you didn't want to specify
> > asset_id. The above queries will still work but you have to use 'allow
> > filtering' - generally not a good idea. I'll look again in the morning.
> >
> >
> > On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael <
> michael.la...@nytimes.com>
> > wrote:
> >>
> >> Hmm. Because the clustering key is (event_time, "timestamp"), event_time
> >> must be specified as well - hopefully that info is available to the ux.
> >>
> >> Unfortunately you will then hit another problem with your query: you are
> >> selecting a collection field... this will not work with IN on
> "timestamp".
> >>
> >> So you could select 

Re: Help with select IN query in cassandra

2014-09-01 Thread Laing, Michael
Did the OP propose that?


On Mon, Sep 1, 2014 at 10:53 AM, Jack Krupansky 
wrote:

>   One comment on deletions – aren’t deletions kind of an anti-pattern for
> modern data processing, such as sensor data, time series data, and social
> media? I mean, isn’t it usually better to return a full history of the
> data, with some aging scheme, and manage the tracking of which values are
> “current” (or “recent”)? Shouldn’t we be looking for and promoting “write
> once” approaches as a much stronger preference/pattern? Or maybe I should
> say “write once and bulk delete on aging” rather than the exercise in
> futility of doing a massive number of deletes and updates in place?
>
> -- Jack Krupansky
>
>  *From:* Laing, Michael 
> *Sent:* Monday, September 1, 2014 9:33 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Help with select IN query in cassandra
>
>  This should work for your query requirements - 2 tables w same info
> because disk is cheap and writes are fast so optimize for reads:
>
>  CREATE TABLE sensor_asset (
>   asset_id text,
>   event_time timestamp,
>   tuuid timeuuid,
>   sensor_reading map,
>   sensor_serial_number text,
>   sensor_type int,
>   PRIMARY KEY ((asset_id), event_time)
> );
>
> CREATE TABLE sensor_tuuid (
>   asset_id text,
>   event_time timestamp,
>   tuuid timeuuid,
>   sensor_reading map,
>   sensor_serial_number text,
>   sensor_type int,
>   PRIMARY KEY (tuuid)
> );
>
> 1. Give me all sensor data for an asset:
>
> select * from sensor_asset where asset_id = <asset_id>;
>
> 2. Give me sensor data that matches a set of timeuuids:
>
> select * from sensor_tuuid where tuuid in (<tuuid1>, <tuuid2>, ...);
>
> 3. Give me all sensor data for an asset collected after | before | between
> event_time(s):
>
> select * from sensor_asset where asset_id = <asset_id> and event_time > <t1>;
> select * from sensor_asset where asset_id = <asset_id> and event_time < <t1>;
> select * from sensor_asset where asset_id = <asset_id> and event_time < <t2>
> and event_time > <t1>;
>
> ***
>
> Many people (not me) handle sensor data, so there may be better overall
> approaches considering volumes, deletion, compaction etc.
>
> But the above is simple and should make your current approach workable as
> you iterate toward a complete solution.
>
> Cheers,
> ml
>
>
>
> On Sun, Aug 31, 2014 at 11:08 PM, Subodh Nijsure  > wrote:
>
>> Thanks for your help Michael.
>>
>> If specifying asset_id would help I can construct queries that can
>> include asset_id
>>
>> So I have been "playing" around with PRIMARY KEY definition and
>> following table definition
>>
>> CREATE TABLE sensor_info_table (
>>   asset_id text,
>>   event_time timestamp,
>>   "timestamp" timeuuid,
>>   sensor_reading map,
>>   sensor_serial_number text,
>>   sensor_type int,
>>   PRIMARY KEY ((asset_id, "timestamp"), event_time)
>> );
>>
>> It does what I want to do, and I removed the index for timestamp item
>> since now it is part of primary key and thus my query like this works.
>>
>> SELECT * from sigsense.sensor_info_table where  asset_id='3' AND
>> timestamp IN (
>> 17830bb0-316a-11e4-800f-b888e30f5d17,16ddbdfe-316a-11e4-9f50-b888e30f5d17
>> );
>>
>> But now this doesn't work it give
>>
>> SELECT * from sensor_info_table where  asset_id='3' ;
>>
>> Bad Request: Partition key part timestamp must be restricted since
>> preceding part is
>>
>> I am keeping index on event_time as I sometime need to query something
>> "give me all data since time x" i.e. something like this works.
>>
>> SELECT * from sensor_info_table where  event_time > '2014-08-31
>> 16:54:02-0700' ALLOW FILTERING;
>>
>> However if I do this things then this don't work:
>>
>> SELECT * from sensor_info_table where  asset_id='3' AND event_time >
>> '2014-08-31 16:54:02-0700';
>>
>> Bad Request: Partition key part timestamp must be restricted since
>> preceding part is
>>
>> Also  I am not conformable with fact that I need to specify ALLOW
>> FILTERING.
>>
>> I guess cassandra schema design task asks designer to write down
>> queries before designing schema.
>>
>> For the above table definition I want to do following queries:
>>
>> - Give me all sensor data for given asset.
>> - Give me sensor data that matches given set of timeuuids
>> - Give me all sendor data for a given asset, that were collected after
>> | before | between  certain event_time.
>>
&

Re: EC2 - Performace Question

2014-09-01 Thread Laing, Michael
Is table track_user equivalent to table userpixel?

On Monday, September 1, 2014, Eduardo Cusa <
eduardo.c...@usmediaconsulting.com> wrote:

> Hi All. I Have a Cluster in Amazon with the following settings:
>
> * 2 Nodes M3.Large
> * Cassandra 2.0.7
> * Default instaltion on ubuntu
>
> And I have one table with 5.000.000 rows:
>
>
> CREATE TABLE track_user ( userid text, trackid text,date text ,advid text,
> country text, region text,
> PRIMARY KEY( (trackid,advid , country,
> region),userid ));
>
>
> When run  the following query take *20 **seconds * to finish :
>
> cqlsh:usmc> select count(*) from userpixel where trackid = 'ab1' and advid
> = 'adb1' and country = 'AR' and region = 'C' limit 500;
>
>
> Is this time normal?
>
> There are any way to improve the response?
>
>
> Thanks
> Eduardo
>


Re: EC2 - Performace Question

2014-09-01 Thread Laing, Michael
Is there a reason why updating a counter for this information will not work
for you?
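
Just a sketch, with a made-up table name - a counter table keyed the same
way, bumped once per tracked user at write time, turns the count(*) into a
single-row read:

CREATE TABLE track_user_count (
    trackid text,
    advid text,
    country text,
    region text,
    users counter,
    PRIMARY KEY ((trackid, advid, country, region))
);

-- on each write to track_user
UPDATE track_user_count SET users = users + 1
WHERE trackid = 'ab1' AND advid = 'adb1' AND country = 'AR' AND region = 'C';

-- instead of count(*)
SELECT users FROM track_user_count
WHERE trackid = 'ab1' AND advid = 'adb1' AND country = 'AR' AND region = 'C';

Note counter updates are not idempotent and this counts writes rather than
distinct userids, so it only fits if each user is inserted once.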

On Monday, September 1, 2014, eduardo.cusa <
eduardo.c...@usmediaconsulting.com> wrote:

> yes, is the same table, my mistake.
>
>
> On Mon, Sep 1, 2014 at 6:35 PM, Laing, Michael wrote:
>
>> Is table track_user equivalent to table userpixel?
>>
>> On Monday, September 1, 2014, Eduardo Cusa wrote:
>>
>>> Hi All. I Have a Cluster in Amazon with the following settings:
>>>
>>> * 2 Nodes M3.Large
>>> * Cassandra 2.0.7
>>> * Default instaltion on ubuntu
>>>
>>> And I have one table with 5.000.000 rows:
>>>
>>>
>>> CREATE TABLE track_user ( userid text, trackid text,date text ,advid
>>> text, country text, region text,
>>> PRIMARY KEY( (trackid,advid , country,
>>> region),userid ));
>>>
>>>
>>> When run  the following query take *20 **seconds * to finish :
>>>
>>> cqlsh:usmc> select count(*) from userpixel where trackid = 'ab1' and
>>> advid = 'adb1' and country = 'AR' and region = 'C' limit 500;
>>>
>>>
>>> Is this time normal?
>>>
>>> There are any way to improve the response?
>>>
>>>
>>> Thanks
>>> Eduardo
>>>
>>
>>


Re: OOM at Bootstrap Time

2014-10-25 Thread Laing, Michael
Since no one else has stepped in...

We have run clusters with ridiculously small nodes - I have a production
cluster in AWS with 4GB nodes each with 1 CPU and disk-based instance
storage. It works fine but you can see those little puppies struggle...

And I ran into problems such as you observe...

Upgrading Java to the latest 1.7 and - most importantly - *reverting to the
default configuration, esp. for heap*, seemed to settle things down
completely. Also make sure that you are using the 'recommended production
settings' from the docs on your boxen.

However we are running 2.0.x not 2.1.0 so YMMV.

And we are switching to 15GB nodes w 2 heftier CPUs each and SSD storage -
still a 'small' machine, but much more reasonable for C*.

However I can't say I am an expert, since I deliberately keep things so
simple that we do not encounter problems - it just works so I dig into
other stuff.

ml


On Sat, Oct 25, 2014 at 5:22 PM, Maxime  wrote:

> Hello, I've been trying to add a new node to my cluster ( 4 nodes ) for a
> few days now.
>
> I started by adding a node similar to my current configuration, 4 GB or
> RAM + 2 Cores on DigitalOcean. However every time, I would end up getting
> OOM errors after many log entries of the type:
>
> INFO  [SlabPoolCleaner] 2014-10-25 13:44:57,240 ColumnFamilyStore.java:856
> - Enqueuing flush of mycf: 5383 (0%) on-heap, 0 (0%) off-heap
>
> leading to:
>
> ka-120-Data.db (39291 bytes) for commitlog position
> ReplayPosition(segmentId=1414243978538, position=23699418)
> WARN  [SharedPool-Worker-13] 2014-10-25 13:48:18,032
> AbstractTracingAwareExecutorService.java:167 - Uncaught exception on thread
> Thread[SharedPool-Worker-13,5,main]: {}
> java.lang.OutOfMemoryError: Java heap space
>
> Thinking it had to do with either compaction somehow or streaming, 2
> activities I've had tremendous issues with in the past; I tried to slow
> down the setstreamthroughput to extremely low values all the way to 5. I
> also tried setting setcompactionthoughput to 0, and then reading that in
> some cases it might be too fast, down to 8. Nothing worked, it merely
> vaguely changed the mean time to OOM but not in a way indicating either was
> anywhere a solution.
>
> The nodes were configured with 2 GB of Heap initially, I tried to crank it
> up to 3 GB, stressing the host memory to its limit.
>
> After doing some exploration (I am considering writing a Cassandra Ops
> documentation with lessons learned since there seems to be little of it in
> organized fashions), I read that some people had strange issues on
> lower-end boxes like that, so I bit the bullet and upgraded my new node to
> a 8GB + 4 Core instance, which was anecdotally better.
>
> To my complete shock, exact same issues are present, even raising the Heap
> memory to 6 GB. I figure it can't be a "normal" situation anymore, but must
> be a bug somehow.
>
> My cluster is 4 nodes, RF of 2, about 160 GB of data across all nodes.
> About 10 CF of varying sizes. Runtime writes are between 300 to 900 /
> second. Cassandra 2.1.0, nothing too wild.
>
> Has anyone encountered these kinds of issues before? I would really enjoy
> hearing about the experiences of people trying to run small-sized clusters
> like mine. From everything I read, Cassandra operations go very well on
> large (16 GB + 8 Cores) machines, but I'm sad to report I've had nothing
> but trouble trying to run on smaller machines, perhaps I can learn from
> other's experience?
>
> Full logs can be provided to anyone interested.
>
> Cheers
>


Re: OOM at Bootstrap Time

2014-10-27 Thread Laing, Michael
>>> #a6e54ea0-5bed-11e4-8df5-f357715e1a79
>>> > ID#0] Prepare completed. Receiving 154 files(3 332 779 920 bytes),
>>> sending 0
>>> > files(0 bytes)
>>> >
>>> > INFO  [STREAM-IN-/...71] 2014-10-25 02:21:50,494
>>> > StreamResultFuture.java:166 - [Stream
>>> #a6e54ea0-5bed-11e4-8df5-f357715e1a79
>>> > ID#0] Prepare completed. Receiving 1315 files(4 606 316 933 bytes),
>>> sending
>>> > 0 files(0 bytes)
>>> >
>>> > INFO  [STREAM-IN-/...217] 2014-10-25 02:21:51,036
>>> > StreamResultFuture.java:166 - [Stream
>>> #a6e54ea0-5bed-11e4-8df5-f357715e1a79
>>> > ID#0] Prepare completed. Receiving 1640 files(3 208 023 573 bytes),
>>> sending
>>> > 0 files(0 bytes)
>>> >
>>> >  As you can see, the existing 4 nodes are streaming data to the new
>>> node and
>>> > on average the data set size is about 3.3 - 4.5 Gb. However the number
>>> of
>>> > SSTables is around 150 files for nodes ...20 and
>>> > ...81 but goes through the roof to reach 1315 files for
>>> > ...71 and 1640 files for ...217
>>> >
>>> >  The total data set size is roughly the same but the file number is
>>> x10,
>>> > which mean that you'll have a bunch of tiny files.
>>> >
>>> >  I guess that upon reception of those files, there will be a massive
>>> flush
>>> > to disk, explaining the behaviour you're facing (flush storm)
>>> >
>>> > I would suggest looking on nodes ...71 and
>>> ...217 to
>>> > check for the total SSTable count for each table to confirm this
>>> intuition
>>> >
>>> > Regards
>>> >
>>> >
>>> > On Sun, Oct 26, 2014 at 4:58 PM, Maxime  wrote:
>>> >>
>>> >> I've emailed you a raw log file of an instance of this happening.
>>> >>
>>> >> I've been monitoring more closely the timing of events in tpstats and
>>> the
>>> >> logs and I believe this is what is happening:
>>> >>
>>> >> - For some reason, C* decides to provoke a flush storm (I say some
>>> reason,
>>> >> I'm sure there is one but I have had difficulty determining the
>>> behaviour
>>> >> changes between 1.* and more recent releases).
>>> >> - So we see ~ 3000 flush being enqueued.
>>> >> - This happens so suddenly that even boosting the number of flush
>>> writers
>>> >> to 20 does not suffice. I don't even see "all time blocked" numbers
>>> for it
>>> >> before C* stops responding. I suspect this is due to the sudden OOM
>>> and GC
>>> >> occurring.
>>> >> - The last tpstat that comes back before the node goes down indicates
>>> 20
>>> >> active and 3000 pending and the rest 0. It's by far the anomalous
>>> activity.
>>> >>
>>> >> Is there a way to throttle down this generation of Flush? C*
>>> complains if
>>> >> I set the queue_size to any value (deprecated now?) and boosting the
>>> threads
>>> >> does not seem to help since even at 20 we're an order of magnitude
>>> off.
>>> >>
>>> >> Suggestions? Comments?
>>> >>
>>> >>
>>> >> On Sun, Oct 26, 2014 at 2:26 AM, DuyHai Doan 
>>> wrote:
>>> >>>
>>> >>> Hello Maxime
>>> >>>
>>> >>>  Can you put the complete logs and config somewhere ? It would be
>>> >>> interesting to know what is the cause of the OOM.
>>> >>>
>>> >>> On Sun, Oct 26, 2014 at 3:15 AM, Maxime  wrote:
>>> >>>>
>>> >>>> Thanks a lot that is comforting. We are also small at the moment so
>>> I
>>> >>>> definitely can relate with the idea of keeping small and simple at
>>> a level
>>> >>>> where it just works.
>>> >>>>
>>> >>>> I see the new Apache version has a lot of fixes so I will try to
>>> upgrade
>>> >>>> before I look into downgrading.
>>> >>>>
>>> >>>>
>>> >>>> On Saturday, October 25, 2014, Laing, Michael

Re: Using Cassandra for session tokens

2014-12-01 Thread Laing, Michael
Since the session tokens are random, perhaps computing a shard from each
one and using it as the partition key would be a good idea.

I would also use uuid v1 to get ordering.

With such a small amount of data, only a few shards would be needed.
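
A sketch of what I mean - the table name and shard count are illustrative,
and the timeuuid (v1, generated client side) doubles as the session token:

CREATE TABLE tokens_by_shard (
    shard int,        -- computed by the app from the token, e.g. hash(id) % 16
    id timeuuid,      -- v1, so rows are time ordered within each shard
    username text,
    PRIMARY KEY (shard, id)
) WITH CLUSTERING ORDER BY (id DESC);

INSERT INTO tokens_by_shard (shard, id, username)
VALUES (3, 1d934af3-3178-11e4-ba8d-406c8f1838fa, 'bob') USING TTL 3600;

SELECT username FROM tokens_by_shard
WHERE shard = 3 AND id = 1d934af3-3178-11e4-ba8d-406c8f1838fa;

The read recomputes the shard from the token before the lookup, so it stays
a single-partition query.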

On Mon, Dec 1, 2014 at 10:08 AM, Phil Wise 
wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> The session will be written once at create time, and never modified
> after that. Will that affect things?
>
> Thank you
>
> - -Phil
>
> On 01.12.2014 15:58, Jonathan Haddad wrote:
> > I don't think DateTiered will help here, since there's no
> > clustering key defined.  This is a pretty straightforward workload,
> > I've done something similar.
> >
> > Are you overwriting the session on every request? Or just writing
> > it once?
> >
> > On Mon Dec 01 2014 at 6:45:14 AM Matt Brown 
> > wrote:
> >
> >> This sounds like a good use case for
> >> http://www.datastax.com/dev/blog/datetieredcompactionstrategy
> >>
> >>
> >> On Dec 1, 2014, at 3:07 AM, Phil Wise
> >>  wrote:
> >>
> >> We're considering switching from using Redis to Cassandra to
> >> store short lived (~1 hour) session tokens, in order to reduce
> >> the number of data storage engines we have to manage.
> >>
> >> Can anyone foresee any problems with the following approach:
> >>
> >> 1) Use the TTL functionality in Cassandra to remove old tokens.
> >>
> >> 2) Store the tokens in a table like:
> >>
> >> CREATE TABLE tokens ( id uuid, username text, // (other session
> >> information) PRIMARY KEY (id) );
> >>
> >> 3) Perform ~100 writes/sec like:
> >>
> >> INSERT INTO tokens (id, username ) VALUES
> >> (468e0d69-1ebe-4477-8565-00a4cb6fa9f2, 'bob') USING TTL 3600;
> >>
> >> 4) Perform ~1000 reads/sec like:
> >>
> >> SELECT * FROM tokens WHERE
> >> ID=468e0d69-1ebe-4477-8565-00a4cb6fa9f2 ;
> >>
> >> The tokens will be about 100 bytes each, and we will grant 100
> >> per second on a small 3 node cluster. Therefore there will be
> >> about 360k tokens alive at any time, with a total size of 36MB
> >> before database overhead.
> >>
> >> My biggest worry at the moment is that this kind of workload
> >> will stress compaction in an unusual way.  Are there any metrics
> >> I should keep an eye on to make sure it is working fine?
> >>
> >> I read over the following links, but they mostly talk about
> >> DELETE-ing and tombstones. Am I right in thinking that as soon as
> >> a node performs a compaction then the rows with an expired TTL
> >> will be thrown away, regardless of gc_grace_seconds?
> >>
> >> https://issues.apache.org/jira/browse/CASSANDRA-7534
> >>
> >>
> >>
> http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
> >>
> >>
> >>
> https://issues.apache.org/jira/browse/CASSANDRA-6654
> >>
> >> Thank you
> >>
> >> Phil
> >>
> >>
> >>
> >>
> >
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1
>
> iQIcBAEBAgAGBQJUfIR1AAoJEAvGtrO88FBAnpAP/0RCdwCy4Wi0ogz24SRKpCu0
> c/i6O2HBTinl2RXLoH9xMOT8kXJ82P9tVDeKjLQAZYnBgRwF7Fcbvd40GPf+5aaj
> aU1TkU4jLnDCeFTwG/vx+TIfZEE27nppsECLtfmnzJEl/4yZwAG3Dy+VkuqBurMu
> J6If9bMnseEgvF1onmA7ZLygJq44tlgOGyHT0WdYRX7CwAE6HeyxMC38ArarRU37
> dfGhsttBMqdxHreKE0CqRZZ67iT+KixGoUeCvZUnTvOLTsrEWO17yTezQDamAee0
> jIsVfgKqqhoiKeAj99J75rcsIT3WAbS23MV1s92AQXYkpR1KmHTB6KvUjH2AQBew
> 9xwdDSg/eVsdQNkGbtSJ2cNPnFuBBZv2kzW5PVyQ625bMHNAF2GE9rLIKddMUbNQ
> LiwOPAJDWBJeZnJYj3cncdfC2Jw1H4rlV0k6BHwdzZUrEdbvUKlHtyl8/ZsZnJHs
> SrPsiYQa0NI6C+faAFqzBEyLhsWdJL3ygNZTo4CW3I8z+yYEyzZtmKPDmHdVzK/M
> M8GlaRYw1t7OY81VBXKcmPyr5Omti7wtkffC6bhopsPCm7ATSq2r46z8OFlkUdJl
> wcTMJM0E6gZtiMIr3D+WbOTzI5kPX6x4UB3ec3xq6+GIObPwioVAJf3ADmIK4iHT
> G106NwdUnag5XlnbwgMX
> =6zXb
> -END PGP SIGNATURE-
>


Re: Using Cassandra for session tokens

2014-12-01 Thread Laing, Michael
Sharding lets you use the row cache effectively in 2.1.

But like everything, one should test :)

On Mon, Dec 1, 2014 at 1:56 PM, Jonathan Haddad  wrote:

> I don't know what the advantage would be of using this sharding system.  I
> would recommend just going with a simple k->v table as the OP suggested.
>
> On Mon Dec 01 2014 at 7:18:51 AM Laing, Michael 
> wrote:
>
>> Since the session tokens are random, perhaps computing a shard from each
>> one and using it as the partition key would be a good idea.
>>
>> I would also use uuid v1 to get ordering.
>>
>> With such a small amount of data, only a few shards would be needed.
>>
>> On Mon, Dec 1, 2014 at 10:08 AM, Phil Wise 
>> wrote:
>>
>>> -BEGIN PGP SIGNED MESSAGE-
>>> Hash: SHA1
>>>
>>> The session will be written once at create time, and never modified
>>> after that. Will that affect things?
>>>
>>> Thank you
>>>
>>> - -Phil
>>>
>>> On 01.12.2014 15:58, Jonathan Haddad wrote:
>>> > I don't think DateTiered will help here, since there's no
>>> > clustering key defined.  This is a pretty straightforward workload,
>>> > I've done something similar.
>>> >
>>> > Are you overwriting the session on every request? Or just writing
>>> > it once?
>>> >
>>> > On Mon Dec 01 2014 at 6:45:14 AM Matt Brown 
>>> > wrote:
>>> >
>>> >> This sounds like a good use case for
>>> >> http://www.datastax.com/dev/blog/datetieredcompactionstrategy
>>> >>
>>> >>
>>> >> On Dec 1, 2014, at 3:07 AM, Phil Wise
>>> >>  wrote:
>>> >>
>>> >> We're considering switching from using Redis to Cassandra to
>>> >> store short lived (~1 hour) session tokens, in order to reduce
>>> >> the number of data storage engines we have to manage.
>>> >>
>>> >> Can anyone foresee any problems with the following approach:
>>> >>
>>> >> 1) Use the TTL functionality in Cassandra to remove old tokens.
>>> >>
>>> >> 2) Store the tokens in a table like:
>>> >>
>>> >> CREATE TABLE tokens ( id uuid, username text, // (other session
>>> >> information) PRIMARY KEY (id) );
>>> >>
>>> >> 3) Perform ~100 writes/sec like:
>>> >>
>>> >> INSERT INTO tokens (id, username ) VALUES
>>> >> (468e0d69-1ebe-4477-8565-00a4cb6fa9f2, 'bob') USING TTL 3600;
>>> >>
>>> >> 4) Perform ~1000 reads/sec like:
>>> >>
>>> >> SELECT * FROM tokens WHERE
>>> >> ID=468e0d69-1ebe-4477-8565-00a4cb6fa9f2 ;
>>> >>
>>> >> The tokens will be about 100 bytes each, and we will grant 100
>>> >> per second on a small 3 node cluster. Therefore there will be
>>> >> about 360k tokens alive at any time, with a total size of 36MB
>>> >> before database overhead.
>>> >>
>>> >> My biggest worry at the moment is that this kind of workload
>>> >> will stress compaction in an unusual way.  Are there any metrics
>>> >> I should keep an eye on to make sure it is working fine?
>>> >>
>>> >> I read over the following links, but they mostly talk about
>>> >> DELETE-ing and tombstones. Am I right in thinking that as soon as
>>> >> a node performs a compaction then the rows with an expired TTL
>>> >> will be thrown away, regardless of gc_grace_seconds?
>>> >>
>>> >> https://issues.apache.org/jira/browse/CASSANDRA-7534
>>> >>
>>> >>
>>> >>
>>> http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
>>> >>
>>> >>
>>> >>
>>> https://issues.apache.org/jira/browse/CASSANDRA-6654
>>> >>
>>> >> Thank you
>>> >>
>>> >> Phil
>>> >>
>>> >>
>>> >>
>>> >>
>>> >
>>> -BEGIN PGP SIGNATURE-
>>> Version: GnuPG v1
>>>
>>> iQIcBAEBAgAGBQJUfIR1AAoJEAvGtrO88FBAnpAP/0RCdwCy4Wi0ogz24SRKpCu0
>>> c/i6O2HBTinl2RXLoH9xMOT8kXJ82P9tVDeKjLQAZYnBgRwF7Fcbvd40GPf+5aaj
>>> aU1TkU4jLnDCeFTwG/vx+TIfZEE27nppsECLtfmnzJEl/4yZwAG3Dy+VkuqBurMu
>>> J6If9bMnseEgvF1onmA7ZLygJq44tlgOGyHT0WdYRX7CwAE6HeyxMC38ArarRU37
>>> dfGhsttBMqdxHreKE0CqRZZ67iT+KixGoUeCvZUnTvOLTsrEWO17yTezQDamAee0
>>> jIsVfgKqqhoiKeAj99J75rcsIT3WAbS23MV1s92AQXYkpR1KmHTB6KvUjH2AQBew
>>> 9xwdDSg/eVsdQNkGbtSJ2cNPnFuBBZv2kzW5PVyQ625bMHNAF2GE9rLIKddMUbNQ
>>> LiwOPAJDWBJeZnJYj3cncdfC2Jw1H4rlV0k6BHwdzZUrEdbvUKlHtyl8/ZsZnJHs
>>> SrPsiYQa0NI6C+faAFqzBEyLhsWdJL3ygNZTo4CW3I8z+yYEyzZtmKPDmHdVzK/M
>>> M8GlaRYw1t7OY81VBXKcmPyr5Omti7wtkffC6bhopsPCm7ATSq2r46z8OFlkUdJl
>>> wcTMJM0E6gZtiMIr3D+WbOTzI5kPX6x4UB3ec3xq6+GIObPwioVAJf3ADmIK4iHT
>>> G106NwdUnag5XlnbwgMX
>>> =6zXb
>>> -END PGP SIGNATURE-
>>>
>>
>>


Re: Recommissioned node is much smaller

2014-12-07 Thread Laing, Michael
On a mac this works (different sed, use an actual newline):

"
nodetool info -T | grep ^Token | awk '{ print $3 }' | tr \\n , | sed -e
's/,$/\
>/'
"

Otherwise the last token will have an 'n' appended which you may not notice.

On Fri, Dec 5, 2014 at 4:34 PM, Robert Coli  wrote:

> On Wed, Dec 3, 2014 at 10:10 AM, Robert Wille  wrote:
>
>>  Load and ownership didn’t correlate nearly as well as I expected. I have
>> lots and lots of very small records. I would expect very high correlation.
>>
>>  I think the moral of the story is that I shouldn’t delete the system
>> directory. If I have issues with a node, I should recommission it properly.
>>
>
> If you always specify initial_token in cassandra.yaml, then you are
> protected from some cases similar to the one that you seem to have just
> encountered.
>
> Wish I had actually managed to post this on a blog, but :
>
>
> --- cut ---
>
> example of why :
>
> https://issues.apache.org/jira/browse/CASSANDRA-5571
>
> 11:22 < rcoli> but basically, explicit is better than implicit
> 11:22 < rcoli> the only reason ppl let cassandra pick tokens is that it's
> semi-complex to do "right" with vnodes
> 11:22 < rcoli> but once it has picked tokens
> 11:22 < rcoli> you know what they are
> 11:22 < rcoli> why have a risky conf file that relies on implicit state?
> 11:23 < rcoli> just put the tokens in the conf file. done.
> 11:23 < rcoli> then you can use auto_bootstrap:false even if you lose
> system keyspace, etc.
>
> I plan to write a short blog post about this, but...
>
> I recommend that anyone using Cassandra, vnodes or not, always explicitly
> populate their initial_token line in cassandra.yaml. There are a number of
> cases where you will lose if you do not do so, and AFAICT no cases where
> you lose by doing so.
>
> If one is using vnodes and wants to do this, the process goes like :
>
> 1) set num_tokens to the desired number of vnodes
> 2) start node/bootstrap
> 3) use a one liner like jeffj's :
>
> "
> nodetool info -T | grep ^Token | awk '{ print $3 }' | tr \\n , | sed -e
> 's/,$/\n/'
> "
>
> to get a comma delimited list of the vnode tokens
>
> 4) insert this comma delimited list in initial_tokens, and comment out
> num_tokens (though it is a NOOP)
>
>  --- cut ---
>
> =Rob
>
>


Re: [Consitency on cqlsh command prompt]

2014-12-17 Thread Laing, Michael
http://datastax.github.io/python-driver/api/cassandra.html

On Wed, Dec 17, 2014 at 9:27 AM, nitin padalia 
wrote:
>
> Thanks! Philip/Ryan,
> Ryan I am using single Datacenter.
> Philip could you point some link where we could see those enums.
> -Nitin
> On Dec 17, 2014 7:14 PM, "Philip Thompson" 
> wrote:
>
>> I believe the problem here is that the consistency level it is showing
>> you is not the number of nodes that need to respond, but the enum value
>> that corresponds to QUORUM internally. If you would like, you can file an
>> improvement request on the Apache Cassandra Jira.
>>
>> On Wed, Dec 17, 2014 at 12:47 AM, nitin padalia 
>> wrote:
>>>
>>> Hi,
>>>
>>> When I set Consistency to QUORUM in cqlsh command line. It says
>>> consistency is set to quorum.
>>>
>>> cqlsh:testdb> CONSISTENCY QUORUM ;
>>> Consistency level set to QUORUM.
>>>
>>> However when I check it back using CONSISTENCY command on the prompt
>>> it says consistency is 4. However it should be 2 as my replication
>>> factor for the keyspace is 3.
>>> cqlsh:testdb> CONSISTENCY ;
>>> Current consistency level is 4.
>>>
>>> Isn't consistency QUORUM calculated by: (replication_factor/2)+1?
>>> Where replication_factor/2 is rounded down.
>>>
>>> If yes then why consistency is displayed as 4, however it should be 2
>>> (3/2 = 1.5 = 1)+1 = 2.
>>>
>>> I am using Casssandra version 2.1.2 and cqlsh 5.0.1 and CQL spec 3.2.0
>>>
>>>
>>> Thanks! in advance.
>>> Nitin Padalia
>>>
>>


Re: number of replicas per data center?

2015-01-19 Thread Laing, Michael
Since our workload is spread globally, we spread our nodes across AWS
regions as well: 2 nodes per zone, 6 nodes per region (datacenter) (RF 3),
12 nodes total (except during upgrade migrations). We autodeploy into VPCs.
If a region goes "bad" we can route all traffic to another and bring up a
third.
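
For reference, the keyspace side of that is just - a sketch, since the
datacenter names depend on your snitch and the ones here are placeholders:

CREATE KEYSPACE example WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us-east': 3,
    'eu-west': 3
};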

Super inexpensive in that it is very reliable, available and easy to manage.

ml

On Sun, Jan 18, 2015 at 11:50 PM, Kevin Burton  wrote:

> Ah.. six replicas.  At least its super inexpensive that way (sarcasm!)
>
>
>
> On Sun, Jan 18, 2015 at 8:14 PM, Jonathan Haddad 
> wrote:
>
>> Sorry, I left out RF.  Yes, I prefer 3 replicas in each datacenter, and
>> that's pretty common.
>>
>>
>> On Sun Jan 18 2015 at 8:02:12 PM Kevin Burton  wrote:
>>
>>> < 3 what? :-P replicas per datacenter or 3 data centers?
>>>
>>> So if you have 2 data centers you would have 6 total replicas with 3
>>> local replicas per datacenter?
>>>
>>> On Sun, Jan 18, 2015 at 7:53 PM, Jonathan Haddad 
>>> wrote:
>>>
 Personally I wouldn't go < 3 unless you have a good reason.


 On Sun Jan 18 2015 at 7:52:10 PM Kevin Burton 
 wrote:

> How do people normally setup multiple data center replication in terms
> of number of *local* replicas?
>
> So say you have two data centers, do you have 2 local replicas, for a
> total of 4 replicas?  Or do you have 2 in one datacenter, and 1 in 
> another?
>
> If you only have one in a local datacenter then when it fails you have
> to transfer all that data over the WAN.
>
>
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> 
>
>
>>>
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>> 
>>>
>>>
>
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> 
>
>


Re: Adding more nodes causes performance problem

2015-02-09 Thread Laing, Michael
Use token-awareness so you don't have as much coordinator overhead.

ml

On Mon, Feb 9, 2015 at 5:32 AM, Marcelo Valle (BLOOMBERG/ LONDON) <
mvallemil...@bloomberg.net> wrote:

> AFAIK, if you were using RF 3 in a 3 node cluster, so all your nodes had
> all your data.
> When the number of nodes started to grow, this assumption stopped being
> true.
> I think Cassandra will scale linearly from 9 nodes on, but comparing a
> situation where all your nodes hold all your data is not really fair, as in
> this situation Cassandra will behave as a database with two more replicas,
> for reads.
> I can be wrong, but this is my call.
>
> From: user@cassandra.apache.org
> Subject: Re:Adding more nodes causes performance problem
>
> I have a cluster with 3 nodes, the only keyspace is with replication
> factor of 3,
> the application read/write UUID-keyed data. I use CQL (casssandra-python),
> most writes are done by execute_async, most read are done with consistency
> level of ONE, overall performance in this setup is better than I expected.
>
> Then I test 6-nodes cluster and 9-nodes. The performance (both read and
> write) was getting worse and worse. Roughly speaking, 6-nodes is about 2~3
> times slower than 3-nodes, and 9-nodes is about 5~6 times slower than
> 3-nodes. All tests were done with same data set, same test program, same
> client machines, for multiple times. I'm running Cassandra 2.1.2 with
> default
> configuration.
>
> What I observed, is that with 6-nodes and 9-nodes, the Cassandra servers
> were doing OK with IO, but CPU utilization was about 60%~70% higher than
> 3-nodes.
>
> I'd like to get suggestion how to troubleshoot this, as this is totally
> against
> what I read, that Cassandra is scaled linearly.
>
>
>
>


Re: Storing bi-temporal data in Cassandra

2015-02-15 Thread Laing, Michael
Perhaps you should learn more about Cassandra before you ask such questions.

It's easy if you just look at the readily accessible docs.

ml

On Sat, Feb 14, 2015 at 6:05 PM, Raj N  wrote:

> I don't think thats solves my problem. The question really is why can't we
> use ranges for both time columns when they are part of the primary key.
> They are on 1 row after all. Is this just a CQL limitation?
>
> -Raj
>
> On Sat, Feb 14, 2015 at 3:35 AM, DuyHai Doan  wrote:
>
>> "I am trying to get the state as of a particular transaction_time"
>>
>>  --> In that case you should probably define your primary key in another
>> order for clustering columns
>>
>> PRIMARY KEY (weatherstation_id,transaction_time,event_time)
>>
>> Then, select * from temperatures where weatherstation_id = 'foo' and
>> event_time >= '2015-01-01 00:00:00' and event_time < '2015-01-02
>> 00:00:00' and transaction_time = ''
>>
>>
>>
>> On Sat, Feb 14, 2015 at 3:06 AM, Raj N  wrote:
>>
>>> Has anyone designed a bi-temporal table in Cassandra? Doesn't look like
>>> I can do this using CQL for now. Taking the time series example from well
>>> known modeling tutorials in Cassandra -
>>>
>>> CREATE TABLE temperatures (
>>> weatherstation_id text,
>>> event_time timestamp,
>>> temperature text,
>>> PRIMARY KEY (weatherstation_id,event_time),
>>> ) WITH CLUSTERING ORDER BY (event_time DESC);
>>>
>>> If I add another column transaction_time
>>>
>>> CREATE TABLE temperatures (
>>> weatherstation_id text,
>>> event_time timestamp,
>>> transaction_time timestamp,
>>> temperature text,
>>> PRIMARY KEY (weatherstation_id,event_time,transaction_time),
>>> ) WITH CLUSTERING ORDER BY (event_time DESC, transaction_time DESC);
>>>
>>> If I try to run a query using the following CQL, it throws an error -
>>>
>>> select * from temperatures where weatherstation_id = 'foo' and
>>> event_time >= '2015-01-01 00:00:00' and event_time < '2015-01-02
>>> 00:00:00' and transaction_time < '2015-01-02 00:00:00'
>>>
>>> It works if I use an equals clause for the event_time. I am trying to
>>> get the state as of a particular transaction_time
>>>
>>> -Raj
>>>
>>
>>
>

