RE: How to configure linux service for Cassandra?

2013-11-12 Thread Christopher Wirt
Starting multiple Cassandra nodes on the same machine involves setting
loopback aliases and some configuration fiddling.

 

Luckily for you, Sylvain Lebresne made this handy tool in Python which does
the job for you:

https://github.com/pcmanus/ccm
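
For example, spinning up a three-node local cluster looks roughly like this
(a sketch; the version and flags are illustrative, see the ccm README):

    ccm create test -v 1.2.11 -n 3 -s   # create and start 3 nodes on loopback aliases
    ccm status                          # confirm the nodes came up
    ccm node1 ring                      # run nodetool ring against node1
    ccm remove                          # tear the cluster down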

 

To run Cassandra as a service you need an init script like this one:
http://www.bajb.net/2012/01/cassandra-service-script/

I haven't tried that script; I just run Cassandra in the foreground of a
screen session.
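
The shape of such an init script is simple enough. A minimal sketch, assuming
illustrative paths, user, and pidfile location:

    #!/bin/sh
    # /etc/init.d/cassandra -- minimal start/stop wrapper (sketch only)
    PIDFILE=/var/run/cassandra.pid
    case "$1" in
      start) su cassandra -c "/opt/cassandra/bin/cassandra -p $PIDFILE" ;;
      stop)  [ -f "$PIDFILE" ] && kill "$(cat $PIDFILE)" ;;
      *)     echo "Usage: $0 {start|stop}"; exit 1 ;;
    esac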

 

 

From: Boole.Z.Guo (mis.cnsh04.Newegg) 41442 [mailto:boole.z@newegg.com] 
Sent: 12 November 2013 05:17
To: user@cassandra.apache.org
Subject: How to configure linux service for Cassandra?

 

How do I configure a Linux service for Cassandra, or start multiple Cassandra
nodes on a single machine?

 

Thanks very much!

 

Best Regards,

Boole Guo



Modeling multi-tenanted Cassandra schema

2013-11-12 Thread Ben Hood
Hi,

I've just received a requirement to make a Cassandra app
multi-tenanted, where we'll have up to 100 tenants.

Most of the tables are timestamped wide-row tables with a natural
application key as the partitioning key and a timestamp as the
clustering key.

So I was considering the options:

(a) Add a tenant column to each table and stick a secondary index on
that column;
(b) Add a tenant column to each table and maintain index tables that
use the tenant id as a partitioning key;
(c) Decompose the partitioning key of each table and add the tenant as
the leading component of the key;
(d) Add the tenant as a separate clustering key;
(e) Replicate the schema in separate tenant-specific keyspaces;
(f) Something I may have missed;

Option (a) seems the easiest, but I'm wary of just adding secondary
indexes without thinking about it.

Option (b) seems to have the least impact on the storage layout, but at
the cost of maintaining each index table, both code-wise and in terms of
performance.

Option (c) seems quite straightforward, but I feel it might have a
significant effect on the distribution of the rows, if the cardinality
of the tenants is low.

Option (d) seems simple enough, but it would mean that you couldn't
query for a range of tenants without supplying a range of natural
application keys, through which you would need to iterate (under the
assumption that you don't use an ordered partitioner).

Option (e) appears relatively straightforward, but it does mean that
the application CQL client needs to maintain separate cluster
connections for each tenant. Also I'm not sure to what extent keyspaces
were designed to partition identically structured data.
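
To make (c) and (d) concrete, here is roughly how the two layouts differ for
a hypothetical timestamped wide-row table (all names are illustrative):

    -- Option (c): tenant folded into a composite partition key
    CREATE TABLE events_by_tenant (
        tenant  text,
        app_key text,
        ts      timeuuid,
        payload text,
        PRIMARY KEY ((tenant, app_key), ts)
    );

    -- Option (d): tenant as an extra clustering column; the partition layout
    -- is unchanged, but queries must restrict app_key before filtering on tenant
    CREATE TABLE events_with_tenant (
        app_key text,
        tenant  text,
        ts      timeuuid,
        payload text,
        PRIMARY KEY (app_key, tenant, ts)
    );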

Does anybody have any experience with running a multi-tenanted
Cassandra app, or does this just depend too much on the specifics of
the application?

Cheers,

Ben


RE: How to configure linux service for Cassandra?

2013-11-12 Thread Boole.Z.Guo (mis.cnsh04.Newegg) 41442
Thanks very much. I will try.

The goal of ccm and ccmlib is to make it easy to create, manage and destroy a
small cluster on a local box. It is meant for testing a Cassandra cluster.
Best Regards,
Boole Guo
Software Engineer, NESC-SH.MIS
+86-021-51530666*41442
Floor 19, KaiKai Plaza, 888, Wanhangdu Rd, Shanghai (200042)

From: Christopher Wirt [mailto:chris.w...@struq.com]
Sent: 12 November 2013 16:53
To: user@cassandra.apache.org
Subject: RE: How to configure linux service for Cassandra?



Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled

2013-11-12 Thread Jiri Horky
Hi,
On 11/12/2013 05:29 AM, Aaron Morton wrote:
>>> Are you doing large slices, or could you have a lot of tombstones
>>> on the rows?
>> don't really know - how can I monitor that?
> For tombstones, do you do a lot of deletes?
> Also in v2.0.2 cfstats has this 
>
> Average live cells per slice (last five minutes): 0.0
> Average tombstones per slice (last five minutes): 0.0
>
> For large slices you need to check your code, e.g. do you do anything
> that reads lots of columns or very large columns, or lets the user
> select how many columns to read?
>
> The org.apache.cassandra.db.ArrayBackedSortedColumns in the stack trace
> is used during reads (e.g.
> org.apache.cassandra.db.filter.SliceQueryFilter)
thanks for explanation, will try to provide some figures (but
unfortunately not from the 2.0.2).
>
>>> You probably want the heap to be 4G to 8G in size, 10G will
>>> encounter longer pauses. 
>>> Also the size of the new heap may be too big depending on the number
>>> of cores. I would recommend trying 800M
>> I tried to decrease it first to 384M, then to 128M, with no change in
>> the behaviour. I don't really mind the extra memory overhead of the
>> cache - having actual objects to point to - but I don't see why it
>> should create/delete that many objects so quickly.
> Not sure what you changed to 384M.
Sorry for the confusion. I meant to say that I tried to decrease row
cache size to 384M and then to 128M and the GC times did not change at
all (still ~30% of the time).
>
>>> Shows the heap growing very quickly. This could be due to wide reads
>>> or a high write throughput. 
>> Well, both prg01 and prg02 receive the same load, which is about
>> ~150-250 read requests per second (during peak) and 100-160 write
>> requests per second. The only nodes with the heap growing rapidly and
>> GC kicking in are those with the row cache enabled.
>
> This sounds like, on a row cache miss, Cassandra is reading the whole
> row, which happens to be a wide row. I would also guess some writes
> are going to the rows and they are getting invalidated out of the row
> cache.
>
> The row cache is not great for rows that update frequently and/or wide
> rows.
>
> How big are the rows? Use nodetool cfstats and nodetool cfhistograms.
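
For reference, a sketch of those commands (the keyspace/table names are
illustrative):

    nodetool cfstats                  # per-CF stats for every keyspace, incl. row sizes
    nodetool cfhistograms myks mycf   # row-size / column-count histograms for one CF
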
I will get in touch with the developers and take the data from cf*
commands in a few days (I am out of office for some days).

Thanks for the pointers, will get in touch.

Cheers
Jiri Horky



Re: java.io.FileNotFoundException when setting up internode_compression

2013-11-12 Thread srmore
Thanks Christopher !
I don't think glibc is the issue (it did get that far). /usr/tmp/
snappy-1.0.5-libsnappyjava.so is not there, and permissions look OK. Are
there any special settings (like JVM args) that I should be using? I can
see libsnappyjava.so in the jar though
(snappy-java-1.0.5.jar\org\xerial\snappy\native\Linux\i386\). One other
thing: I am using RedHat 6. I will try updating glibc and see what happens.

Thanks !




On Mon, Nov 11, 2013 at 5:01 PM, Christopher Wirt wrote:

> I had this the other day when we were accidentally provisioned a CentOS 5
> machine (instead of 6). I think it relates to the version of glibc. Notice
> it wants the native binary .so, not the .jar.
>
>
>
> So maybe update to a newer version of glibc? Or possibly make sure the .so
> exists at /usr/tmp/snappy-1.0.5-libsnappyjava.so?
>
> I was lucky and just did an OS reload to CentOS 6.
>
>
>
> Here is someone having a similar issue.
>
>
> http://mail-archives.apache.org/mod_mbox/cassandra-commits/201307.mbox/%3CJIRA.12616012.1352862646995.6820.1373083550278@arcas%3E
>
>
>
>
>
> *From:* srmore [mailto:comom...@gmail.com]
> *Sent:* 11 November 2013 21:32
> *To:* user@cassandra.apache.org
> *Subject:* java.io.FileNotFoundException when setting up
> internode_compression
>
>
>
> I might be missing something obvious here, but for some reason I cannot
> seem to get internode_compression = all to work. I am getting the
> following exception. I am using Cassandra 1.2.9 and have
> snappy-java-1.0.5.jar in my classpath. A Google search did not return any
> useful results; has anyone seen this before?
>
>
> java.io.FileNotFoundException: /usr/tmp/snappy-1.0.5-libsnappyjava.so (No
> such file or directory)
> at java.io.FileOutputStream.open(Native Method)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
> at org.xerial.snappy.SnappyLoader.extractLibraryFile(SnappyLoader.java:394)
> at org.xerial.snappy.SnappyLoader.findNativeLibrary(SnappyLoader.java:468)
> at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:318)
> at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
> at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
> at org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
> at org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
> at org.apache.cassandra.io.compress.SnappyCompressor.<clinit>(SnappyCompressor.java:37)
> at org.apache.cassandra.config.CFMetaData.<clinit>(CFMetaData.java:82)
> at org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:81)
> at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:471)
> at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:123)
>
> Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in
> java.library.path
> at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1738)
> at java.lang.Runtime.loadLibrary0(Runtime.java:823)
> at java.lang.System.loadLibrary(System.java:1028)
> at
> org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
> ... 18 more
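
One thing worth checking: snappy-java extracts the bundled .so into a temp
directory at startup, so that directory must exist and be writable by the
user running Cassandra. A sketch (the tempdir property is documented by
snappy-java; paths and user are illustrative):

    # verify the directory snappy is trying to use is writable
    ls -ld /usr/tmp
    sudo -u cassandra touch /usr/tmp/snappy.test && rm /usr/tmp/snappy.test

    # or point snappy-java at a writable directory via cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -Dorg.xerial.snappy.tempdir=/var/tmp"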
>


Re: rpc timeout after using sstableloader

2013-11-12 Thread Tyler Hobbs
You might be seeing https://issues.apache.org/jira/browse/CASSANDRA-6272,
depending on what version of Cassandra you're running.


On Tue, Nov 12, 2013 at 1:34 AM, Turi, Ferenc (GE Power & Water, Non-GE) <
ferenc.t...@ge.com> wrote:

>  Hi,
>
>
>
> I tried to get some experience creating sstables with my own Java
> code… Everything went OK, but after the data was loaded using
> sstableloader I was no longer able to select from the column family.
>
>
>
> CQL returned an rpc timeout. Does somebody know what the reason could
> be? (I did not find any warning/error message in system.log.)
>
>
>
> Thanks,
>
>
>
> Ferenc
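
For reference, the loader invocation looks roughly like this - it is pointed
at a live node and at a directory laid out as <keyspace>/<columnfamily>
(the host and path here are illustrative):

    sstableloader -d 10.0.0.1 /path/to/MyKeyspace/MyColumnFamily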



-- 
Tyler Hobbs
DataStax 


Re: Statistics

2013-11-12 Thread Tyler Hobbs
This may be an easier method:
http://www.datastax.com/dev/blog/pluggable-metrics-reporting-in-cassandra-2-0-2
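
With that approach you drop a metrics-reporter-config yaml into the conf
directory and point Cassandra at it with a JVM property. A sketch, assuming a
Graphite reporter (host, prefix, and file name are illustrative; exact format
per the linked post and the metrics-reporter-config project):

    # metrics-reporting.yaml
    graphite:
      -
        period: 60
        timeunit: 'SECONDS'
        prefix: 'cassandra.node1'
        hosts:
          - host: 'graphite.example.com'
            port: 2003
        predicate:
          color: 'white'
          useQualifiedName: true
          patterns:
            - '^org.apache.cassandra.metrics.ClientRequest.+'

    # in cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -Dcassandra.metricsReporterConfigFile=metrics-reporting.yaml"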


On Fri, Nov 8, 2013 at 3:57 PM, David Chia  wrote:

> http://www.datastax.com/dev/blog/metrics-in-cassandra12
>
>
> On Fri, Nov 8, 2013 at 11:42 AM, Parag Patel wrote:
>
>>Hi,
>>
>>
>>
>> I’m looking for a way to view statistics.  Mainly, I’d like to see the
>> distribution of writes and reads over the course of a day or a set of days.
>> Is there a way to do this through nodetool or by downloading a utility?
>>
>>
>>
>> Thanks,
>>
>> Parag
>>
>
>


-- 
Tyler Hobbs
DataStax 


Re: CF backup / restore selected columns

2013-11-12 Thread Tyler Hobbs
There's no easy way to do this that I'm aware of.  Snapshots are just
hardlinks to existing SSTable files.  Your best option is probably either
scanning the CF with a normal client or setting up an M/R job.


On Mon, Nov 11, 2013 at 5:06 PM, Turi, Ferenc (GE Power & Water, Non-GE) <
ferenc.t...@ge.com> wrote:

>  Hi,
>
>
>
> I have a question which I was not able to find the right answer for.
>
>  What is the best way to backup/restore a set of columns?
>
> Let’s say we have:
>
>
>
> CF1(a,b,c,d) – a,b,c,d are columns; the original CF we would like to take
> a backup from
>
>
>
> CF2(d,e,f,g) – e,f,g are different columns; we would restore data into CF2
>
>
>
> a,b,c,d,e,f,g all have the same type, let’s say text.
>
>
>
> Cassandra version: 1.2.10, DataStax distribution 3.1.4
>
> The structure which we would like to take backup from differs from the
> restore CF.
>
>
>
> -  COPY FROM / COPY TO cannot be used in cqlsh (error: rpc_timeout)
>
> -  Hive (insert overwrite xxx from ….) cannot specify columns; I
> don’t want to save all columns, and would like to restore into a different
> structure.
>
>
>
> So what would be the solution? Is it possible at all?
>
>
>
> Thanks.
>
>
>



-- 
Tyler Hobbs
DataStax 


Cassandra debian package that supports ver. 1.1.11

2013-11-12 Thread Michael Hayes
I need this specific version for my usergrid chef recipe. Right now I’m using 
the datastax tarball from:
http://downloads.datastax.com/community/dsc-cassandra-1.1.11-bin.tar.gz

But a debian package would be better to properly setup the node.

Any suggestions appreciated. Thanks.

Re: Cassandra debian package that supports ver. 1.1.11

2013-11-12 Thread Blair Zajac

On 11/12/2013 12:23 PM, Michael Hayes wrote:

I need this specific version for my usergrid chef recipe. Right now I’m using 
the datastax tarball from:
http://downloads.datastax.com/community/dsc-cassandra-1.1.11-bin.tar.gz

But a debian package would be better to properly setup the node.


http://debian.datastax.com/community/pool/

Blair
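
A pinned install from that repo would look roughly like this (a sketch; the
repo line and key URL follow the DataStax install docs of the time, and the
exact package name is worth checking against the pool listing above):

    echo "deb http://debian.datastax.com/community stable main" \
        > /etc/apt/sources.list.d/datastax.list
    curl -L http://debian.datastax.com/debian/repo_key | apt-key add -
    apt-get update
    apt-get install cassandra=1.1.11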



Re: Duplicate hard link - Cassandra 1.2.9

2013-11-12 Thread Robert Coli
On Mon, Nov 11, 2013 at 8:06 PM, Aaron Morton wrote:

> If you can reproduce it may be time to raise a ticket including the JBOD
> setup https://issues.apache.org/jira/browse/CASSANDRA
>

Per OP on https://issues.apache.org/jira/browse/CASSANDRA-6298
"
As it turns out, I was using symlinks. I had my last data directory
pointing to the same data directory.

lrwxrwxrwx 1 root root 6 Oct 30 18:37 data01 -> /data1
lrwxrwxrwx 1 root root 6 Oct 30 18:37 data02 -> /data2
lrwxrwxrwx 1 root root 6 Oct 30 18:37 data03 -> /data3
lrwxrwxrwx 1 root root 6 Oct 30 18:37 data04 -> /data4
lrwxrwxrwx 1 root root 6 Oct 30 18:37 data05 -> /data5
lrwxrwxrwx 1 root root 6 Nov 8 18:46 data06 -> /data5

As you can see (not so clearly) the last link data06 is pointing to data5.

Maybe Cassandra could do some basic checking (at startup) to verify that the
data directories aren't repeated or pointing to the same location.

In any case, not a bug in the server.
"

=Rob


Re: How would you model that?

2013-11-12 Thread Aaron Morton
> Hey guys, I need to retrieve a list of distinct users based on their activity 
> datetime. How can I model a table to store that kind of information?
If it’s for an arbitrary time slice it will be tricky; if you can use pre-set
time slices, something like this would work:

CREATE TABLE users_by_timeslice (   -- table name is illustrative
    timeslice_start timestamp,
    timeslice_size  int,
    user            text,
    PRIMARY KEY ((timeslice_start, timeslice_size), user)
);

That would give you the unique users in a time slice, e.g. unique users for a
4-hour window.
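
Reading one pre-set slice back would then look like this (the values are
illustrative; timeslice_size here encodes a 4-hour window):

    SELECT user FROM users_by_timeslice
    WHERE timeslice_start = '2013-11-12 00:00:00' AND timeslice_size = 4;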

Cheers


-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 9/11/2013, at 12:56 am, Franc Carter  wrote:

> 
> How about something like using a time-range as the key (e.g. an hour,
> depending on your update rate) and a composite (time:user) as the column name
> 
> cheers
> 
> 
> 
> On Fri, Nov 8, 2013 at 10:45 PM, Laing, Michael  
> wrote:
> You could try this:
> 
> CREATE TABLE user_activity (shard text, user text, ts timeuuid, primary key 
> (shard, ts));
> 
> select user, ts from user_activity where shard in ('00', '01', ...) order by 
> ts desc;
> 
> Grab each user and ts the first time you see that user.
> 
> Use as many shards as you think you need to control row size and spread the 
> load.
> 
> Set ttls to expire user_activity entries when you are no longer interested in 
> them.
> 
> ml
> 
> 
> On Fri, Nov 8, 2013 at 6:10 AM, pavli...@gmail.com  wrote:
> Hey guys, I need to retrieve a list of distinct users based on their activity 
> datetime. How can I model a table to store that kind of information?
> 
> The straightforward decision was this:
> 
> CREATE TABLE user_activity (user text primary key, ts timeuuid);
> 
> but it turned out it is impossible to do a select like this:
> 
> select * from user_activity order by ts;
> 
> as it fails with "ORDER BY is only supported when the partition key is 
> restricted by an EQ or an IN".
> 
> How would you model the thing? Just need to have a list of users based on 
> their last activity timestamp...
> 
> Thanks!
> 
> 
> 
> 
> 
> -- 
> Franc Carter | Systems architect | Sirca Ltd
> franc.car...@sirca.org.au | www.sirca.org.au
> Tel: +61 2 8355 2514 
> Level 4, 55 Harrington St, The Rocks NSW 2000
> PO Box H58, Australia Square, Sydney NSW 1215
>