Re:

2012-09-26 Thread Manu Zhang
I still don't see it in jconsole. BTW, how long would you expect it to take to
read a column family of 15 rows if it fits into the row cache entirely? It
takes me around 7s now. My experiment is done on a single node.

On Thu, Sep 27, 2012 at 6:00 AM, aaron morton wrote:

> Set the caching strategy for the CF to be ROWS_ONLY.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 26/09/2012, at 2:18 PM, Manu Zhang  wrote:
>
> The DEFAULT_CACHING_STRATEGY is Caching.KEYS_ONLY but even configuring row
> cache size to be greater than zero
>  won't enable row cache. Why?
>
> On Wed, Sep 26, 2012 at 9:44 AM, Manu Zhang wrote:
>
>> I wonder now if the "get_range_slices" call will ever look for data in row
>> cache. I don't see it in the codebase. Only the "get" call will check row
>> cache?
>>
>>
>> On Wed, Sep 26, 2012 at 12:11 AM, Charles Brophy wrote:
>>
>>> There are settings in cassandra.yaml that will _gradually_ reduce the
>>> available cache to zero if you are under constant memory pressure:
>>>
>>>  # Set to 1.0 to disable.  
>>> reduce_cache_sizes_at: *
>>> reduce_cache_capacity_to: *
>>>
>>> My experience is that the cache size will not return to the configured
>>> size until a service restart if you leave this enabled.  The text of this
>>> setting is not explicit about the long-term cache shrinkage, so it's easy
>>> to think that it will restore the cache to its configured size after the
>>> pressures have subsided. It won't.
>>>
>>> Charles
>>>
>>> On Tue, Sep 25, 2012 at 8:14 AM, Manu Zhang wrote:
>>>
 I've enabled row cache and set its capacity to 10MB but when I check
 its size in jconsole it's always 0. Isn't a row written to the row cache
 when I read it, if it isn't already there? I've bulk loaded the data
 onto disk, so the row cache is crucial to performance.
>>>
>>>
>>>
>>
>
>


is nodetool row count always way off?

2012-09-26 Thread Hiller, Dean
For nodetool cfstats, what is the row count estimate usually off by (what
percentage? or what absolute number?)

We have a CF with 4 rows that prints this out….

Column Family: bacnet11700AnalogInput8
SSTable count: 3
Space used (live): 13526
Space used (total): 13526
Number of Keys (estimate): 384

An SSTable count of 3 sounds very weird… do I need to compact it myself? It's been 
like this for around 12 hours.

Thanks,
Dean


Re:

2012-09-26 Thread Manu Zhang
I mean I have modifications only on one column; do I have to add the rest
of the columns as well?
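
(If the full list is unavoidable, a minimal sketch of building the ColumnDefs
in a loop, assuming a hypothetical columnNames array and reusing the helpers
from the code quoted below:)

    // Sketch only: columnNames is a hypothetical array of the non-indexed column names.
    List<ColumnDef> column_metadata = new LinkedList<ColumnDef>();
    for (String name : columnNames) {
        column_metadata.add(new ColumnDef(CassandraUtil.string2ByteBuffer(name), "UTF8Type"));
    }
    // Only the indexed column needs its index type set explicitly.
    ColumnDef custKey = new ColumnDef(CassandraUtil.string2ByteBuffer("O_CUSTKEY"), "UTF8Type");
    custKey.setIndex_type(IndexType.KEYS);
    column_metadata.add(custKey);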

On Thu, Sep 27, 2012 at 5:18 AM, aaron morton wrote:

> That looks right to me.
>
> btw, most people use CLI or CQL scripts to manage the schema
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/09/2012, at 7:59 PM, Manu Zhang  wrote:
>
> Is there an example to update column family adding secondary indices with
> thrift api? Here's how I do that now...but what if I have a hundred columns?
>
> // add secondary index on column "o_custkey"
>  CfDef cf_def = new CfDef("tpch", "orders");
> cf_def.setComparator_type("UTF8Type");
>  cf_def.setKey_validation_class("UTF8Type");
> List<ColumnDef> column_metadata = new LinkedList<ColumnDef>();
>  ColumnDef col_def = new
> ColumnDef(CassandraUtil.string2ByteBuffer("O_CUSTKEY"), "UTF8Type");
> col_def.setIndex_type(IndexType.KEYS);
>  column_metadata.add(col_def);
> column_metadata.add(new
> ColumnDef(CassandraUtil.string2ByteBuffer("O_ORDERSTATUS"), "UTF8Type"));
>  column_metadata.add(new
> ColumnDef(CassandraUtil.string2ByteBuffer("O_TOTALPRICE"), "UTF8Type"));
> column_metadata.add(new
> ColumnDef(CassandraUtil.string2ByteBuffer("O_ORDERPRIORITY"), "UTF8Type"));
>  column_metadata.add(new
> ColumnDef(CassandraUtil.string2ByteBuffer("O_CLERK"), "UTF8Type"));
> column_metadata.add(new
> ColumnDef(CassandraUtil.string2ByteBuffer("O_SHIPPRIORITY"), "UTF8Type"));
>  column_metadata.add(new
> ColumnDef(CassandraUtil.string2ByteBuffer("O_COMMENT"), "UTF8Type"));
>  cf_def.setColumn_metadata(column_metadata);
> client.system_update_column_family(cf_def);
>
>
>


Re: 1.1.5 Missing Insert! Strange Problem

2012-09-26 Thread Arya Goudarzi
Any chance anyone has seen the same mysterious issue?

On Wed, Sep 26, 2012 at 12:03 AM, Arya Goudarzi  wrote:

> No. We don't use TTLs.
>
>
> On Tue, Sep 25, 2012 at 11:47 PM, Roshni Rajagopal <
> roshni_rajago...@hotmail.com> wrote:
>
>>  By any chance is a TTL (time to live ) set on the columns...
>>
>> --
>> Date: Tue, 25 Sep 2012 19:56:19 -0700
>> Subject: 1.1.5 Missing Insert! Strange Problem
>> From: gouda...@gmail.com
>> To: user@cassandra.apache.org
>>
>>
>> Hi All,
>>
>> I have a 4 node cluster setup in 2 zones with NetworkTopology strategy
>> and strategy options for writing a copy to each zone, so the effective load
>> on each machine is 50%.
>>
>> Symptom:
>> I have a column family that has gc grace seconds of 10 days (the
>> default). On 17th there was an insert done to this column family and from
>> our application logs I can see that the client got a successful response
>> back with write consistency of ONE. I can verify the existence of the key
>> that was inserted in the Commitlogs of both replicas; however, it seems that this
>> record was never inserted. I used list to get all the column family rows,
>> which were about 800ish, and examined them to see if it could possibly have been
>> deleted by our application. List should have shown them to me since I have
>> not gone beyond gc grace seconds if this record was deleted during past
>> days. I could not find it.
>>
>> Things happened:
>> During the same time as this insert was happening, I was performing a
>> rolling upgrade of Cassandra from 1.1.3 to 1.1.5 by taking one node down at
>> a time, performing the package upgrade and restarting the service and going
>> to the next node. I could see from system.log that some mutations were
>> replayed during those restarts, so I suppose the memtables were not flushed
>> before restart.
>>
>>
>> Could this procedure cause the row insert to disappear? How could I
>> troubleshoot this, as I am running out of ideas?
>>
>> Your help is greatly appreciated.
>>
>>
>> Cheers,
>> =Arya
>>
>
>


Re: Data Modeling: Comments with Voting

2012-09-26 Thread Kirk True
Depending on your needs, you could simply duplicate the comments in two 
separate CFs with the column names including time in one and the vote in 
the other. If you allow for updates to the comments, that would pose 
some issues you'd need to solve at the app level.
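
A minimal sketch of that layout with the raw thrift API (all names
hypothetical; "client" is an open Cassandra.Client, and bytes()/zeroPad()
stand in for string-to-ByteBuffer and fixed-width formatting helpers):

    // Sketch only: write each comment twice, once per sort order.
    long ts = System.currentTimeMillis() * 1000;

    // CommentsByTime: the column name is the create time, so a slice comes back time-ordered.
    Column byTime = new Column(bytes(createTime + ":" + commentId));
    byTime.setValue(bytes(commentText));
    byTime.setTimestamp(ts);
    client.insert(postKey, new ColumnParent("CommentsByTime"), byTime, ConsistencyLevel.QUORUM);

    // CommentsByVotes: the column name leads with the vote count, so a slice comes back vote-ordered.
    Column byVotes = new Column(bytes(zeroPad(voteCount) + ":" + commentId));
    byVotes.setValue(bytes(commentText));
    byVotes.setTimestamp(ts);
    client.insert(postKey, new ColumnParent("CommentsByVotes"), byVotes, ConsistencyLevel.QUORUM);

When a comment's vote count changes, its CommentsByVotes column name changes
too, so the old column has to be deleted and re-inserted; that is the same
app-level issue as comment edits.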


On 9/26/12 4:28 PM, Drew Kutcharian wrote:

Hi Guys,

Wondering what would be the best way to model a flat (no sub comments, i.e. 
twitter) comments list with support for voting (where I can sort by create time 
or votes) in Cassandra?

To demonstrate:

Sorted by create time:
- comment 1 (5 votes)
- comment 2 (1 vote)
- comment 3 (no votes)
- comment 4 (10 votes)

Sorted by votes:
- comment 4 (10 votes)
- comment 1 (5 votes)
- comment 2 (1 vote)
- comment 3 (no votes)

It's the sorted-by-votes that I'm having a bit of trouble with. I'm looking 
for a roll-your-own approach and prefer not to use secondary indexes and CQL 
sorting.

Thanks,

Drew





Re: downgrade from 1.1.4 to 1.0.X

2012-09-26 Thread Radim Kolar

We have a paid tool capable of downgrading Cassandra 1.2, 1.1, 1.0, and 0.8.


Data Modeling: Comments with Voting

2012-09-26 Thread Drew Kutcharian
Hi Guys,

Wondering what would be the best way to model a flat (no sub comments, i.e. 
twitter) comments list with support for voting (where I can sort by create time 
or votes) in Cassandra?

To demonstrate:

Sorted by create time:
- comment 1 (5 votes)
- comment 2 (1 votes)
- comment 3 (no votes)
- comment 4 (10 votes)

Sorted by votes:
- comment 4 (10 votes)
- comment 1 (5 votes)
- comment 2 (1 vote)
- comment 3 (no votes)

It's the sorted-by-votes that I'm having a bit of trouble with. I'm looking 
for a roll-your-own approach and prefer not to use secondary indexes and CQL 
sorting.

Thanks,

Drew



1000's of column families

2012-09-26 Thread Hiller, Dean
We are streaming data with 1 stream per 1 CF and we have 1000's of CF.  When 
using the tools they are all geared to analyzing ONE column family at a time 
:(.  If I remember correctly, Cassandra supports as many CF's as you want, 
correct?  Even though I am going to have tons of fun with limitations on the 
tools, correct?

(I may end up wrapping nodetool with my own aggregate calls if needed to 
sum up multiple column families and such).

Thanks,
Dean


Re:

2012-09-26 Thread aaron morton
Set the caching strategy for the CF to be ROWS_ONLY.
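
(In 1.1 caching is a per-CF attribute; a cassandra-cli sketch with a
hypothetical CF name, noting the global cache capacity lives in
cassandra.yaml as row_cache_size_in_mb:)

    update column family MyCF with caching = 'rows_only';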

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/09/2012, at 2:18 PM, Manu Zhang  wrote:

> The DEFAULT_CACHING_STRATEGY is Caching.KEYS_ONLY but even configuring row 
> cache size to be greater than zero 
>  won't enable row cache. Why? 
> 
> On Wed, Sep 26, 2012 at 9:44 AM, Manu Zhang  wrote:
> I wonder now if the "get_range_slices" call will ever look for data in row cache. 
> I don't see it in the codebase. Only the "get" call will check row cache?
> 
> 
> On Wed, Sep 26, 2012 at 12:11 AM, Charles Brophy  wrote:
> There are settings in cassandra.yaml that will _gradually_ reduce the 
> available cache to zero if you are under constant memory pressure:
> 
>  # Set to 1.0 to disable.  
> reduce_cache_sizes_at: *
> reduce_cache_capacity_to: *
> 
> My experience is that the cache size will not return to the configured size 
> until a service restart if you leave this enabled.  The text of this setting 
> is not explicit about the long-term cache shrinkage, so it's easy to think 
> that it will restore the cache to its configured size after the pressures 
> have subsided. It won't. 
> 
> Charles
> 
> On Tue, Sep 25, 2012 at 8:14 AM, Manu Zhang  wrote:
> I've enabled row cache and set its capacity to 10MB but when I check its size 
> in jconsole it's always 0. Isn't a row written to the row cache when I read 
> it, if it isn't already there? I've bulk loaded the data onto disk, so the 
> row cache is crucial to performance.
> 
> 
> 



Re: Running repair negatively impacts read performance?

2012-09-26 Thread aaron morton
Sounds very odd. 

Is read performance degrading _after_ repair and compactions that normally 
result have completed ? 
What Compaction Strategy ?
What OS and JVM ? 

What are the bloom filter false positive stats from cfstats?

Do you have some read latency numbers from cfstats ?
Also, could you take a look at cfhistograms  ? 

Cheers
  

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/09/2012, at 3:05 AM, Charles Brophy  wrote:

> Hey guys,
> 
> I've begun to notice that read operations take a performance nose-dive after 
> a standard (full) repair of a fairly large column family: ~11 million 
> records. Interestingly, I've then noticed that read performance returns to 
> normal after a full scrub of the column family. Is it possible that the 
> repair operation is not correctly establishing the bloom filter afterwards? 
> An interesting note about the scrub operation is that it will 
> "rebuild sstables with correct bloom filters", which is what is leading me to 
> this conclusion. Does this make sense?
> 
> I'm using 1.1.3 and Oracle JDK 1.6.31
> The column family is a standard type and I've noticed this exact behavior 
> regardless of the key/column/value serializers in use.
> 
> Charles



Re: The compaction task cannot delete sstables which are used in a repair session

2012-09-26 Thread aaron morton
I see you are using Windows. 

On Windows, sometimes the memory mapping of the files has not been 
finalised before we try to delete. This is an issue with Windows and (I 
think) non-Sun JVMs. 

Cassandra will drop back to using the JVM GC to free disk space when it's 
running low. 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/09/2012, at 11:56 PM, Rene Kochen  wrote:

> Is this a bug? I'm using Cassandra 1.0.11:
> 
> INFO 13:45:43,750 Compacting 
> [SSTableReader(path='d:\data\Traxis\Parameters-hd-47-Data.db'), 
> SSTableReader(path='d:\data\Traxis\Parameters-hd-44-Data.db'), 
> SSTableReader(path='d:\data\Traxis\Parameters-hd-46-Data.db'), 
> SSTableReader(path='d:\data\Traxis\Parameters-hd-45-Data.db')]
> INFO 13:45:43,782 Compacted to [d:\data\Traxis\Parameters-hd-48-Data.db,].  
> 2,552 to 638 (~25% of original) bytes for 1 keys at 0.019014MB/s.  Time: 32ms.
> ERROR 13:45:43,782 Unable to delete d:\data\Traxis\Parameters-hd-44-Data.db 
> (it will be removed on server restart; we'll also retry after GC)
> ERROR 13:45:43,782 Unable to delete d:\data\Traxis\Parameters-hd-45-Data.db 
> (it will be removed on server restart; we'll also retry after GC)
> ERROR 13:45:43,797 Unable to delete d:\data\Traxis\Parameters-hd-46-Data.db 
> (it will be removed on server restart; we'll also retry after GC)
> ERROR 13:45:43,797 Unable to delete d:\data\Traxis\Parameters-hd-47-Data.db 
> (it will be removed on server restart; we'll also retry after GC)
> INFO 13:45:43,797 [repair #88f6f3a0-0706-11e2--aac4e84dbbbf] Sending 
> completed merkle tree to /10.49.94.171 for (Traxis,Parameters)
> 
> Thanks,
> 
> Rene



Re: a node stays in joining

2012-09-26 Thread aaron morton
> 
> But the Load keeps on increasing.

Sounds like the nodes are / were sending it data. 

nodetool netstats will show you what's going on. 
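
For example:

    nodetool -h <host> netstats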

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/09/2012, at 10:22 PM, Satoshi Yamada  wrote:

> 
> hi,
> 
> One node in my cluster stays in "joining". I found a jira about this, which is
> fixed, but I still see something similar. This is a node whose token I removed
> first because it did not boot correctly; it re-joined the cluster without any
> pre-set token (should I set the previous token?).
> 
> As you see below, the node()'s state is Joining and Effective-Ownership 
> is 0.00 %
> for more than 10 hours. But the Load keeps on increasing.
> 
> Also I noticed in the gossipinfo, the status of the node is BOOT while the
> other nodes are
> NORMAL.
> 
> So, how can I get the status of the node to NORMAL?
> 
> $ nodetool -h `hostname` ring
> Address  DC           Rack   Status  State    Load       Effective-Ownership
>          datacenter1  rack1  Up      Normal   122.41 MB  4.27 %
> .
>          datacenter1  rack1  Up      Joining  371.33 MB  0.00 %
> 
> 
> $ nodetool -h `hostname` gossipinfo
> 
> 
> /192.0.1.111
>RELEASE_VERSION:1.1.4
>LOAD:3.89343423E8
>STATUS:BOOT, 1231231231312.
>SCHEMA:a442323-..
> 
> thanks,
> satoshi



Re: Why data tripled in size after repair?

2012-09-26 Thread Andrey Ilinykh
On Wed, Sep 26, 2012 at 11:07 AM, Rob Coli  wrote:
> On Wed, Sep 26, 2012 at 9:30 AM, Andrey Ilinykh  wrote:
>> [ repair ballooned my data size ]
>> 1. Why repair almost triples data size?
>
> You didn't mention what version of cassandra you're running. In some
> old versions of cassandra (prior to 1.0), repair often creates even
> more extraneous data than it should by design.
>
Thank you for reply.

I run 1.1.5

Honestly, I don't understand what is going on.

I ran major compaction on Sep 15;
as a result I had one big sstable and several small ones. This is the biggest:

-rw-rw-r-- 1 ubuntu ubuntu  90G Sep 15 12:56 Bidgely-rawstreams-he-8475-Data.db

On Sep 22 (one week later) I ran repair and got two more sstables:

-rw-rw-r-- 1 ubuntu ubuntu  85G Sep 22 00:41 Bidgely-rawstreams-he-8605-Data.db
-rw-rw-r-- 1 ubuntu ubuntu  86G Sep 22 00:45 Bidgely-rawstreams-he-8606-Data.db

I don't understand why it copied the data twice. In the worst-case scenario it
should copy everything (~90G), but the data is tripled (90G + 85G
+ 85G).
Yesterday I ran repair one more time, and six(!) more big sstables were
added. It doesn't make any sense! What am I missing?

-rw-rw-r-- 1 ubuntu ubuntu  75G Sep 26 09:43 Bidgely-rawstreams-he-8785-Data.db
-rw-rw-r-- 1 ubuntu ubuntu  77G Sep 26 09:45 Bidgely-rawstreams-he-8788-Data.db
-rw-rw-r-- 1 ubuntu ubuntu  76G Sep 26 11:54 Bidgely-rawstreams-he-8793-Data.db
-rw-rw-r-- 1 ubuntu ubuntu  75G Sep 26 11:55 Bidgely-rawstreams-he-8797-Data.db
-rw-rw-r-- 1 ubuntu ubuntu  76G Sep 26 14:03 Bidgely-rawstreams-he-8804-Data.db
-rw-rw-r-- 1 ubuntu ubuntu  75G Sep 26 14:03 Bidgely-rawstreams-he-8807-Data.db

Even if I somehow compact it back to 100G, I will have the same problem
very soon. What did I do wrong?

Andrey


Re:

2012-09-26 Thread aaron morton
That looks right to me. 

btw, most people use CLI or CQL scripts to manage the schema 

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/09/2012, at 7:59 PM, Manu Zhang  wrote:

> Is there an example to update column family adding secondary indices with 
> thrift api? Here's how I do that now...but what if I have a hundred columns?
> 
> // add secondary index on column "o_custkey" 
>   CfDef cf_def = new CfDef("tpch", "orders");
>   cf_def.setComparator_type("UTF8Type");
>   cf_def.setKey_validation_class("UTF8Type");
>   List<ColumnDef> column_metadata = new LinkedList<ColumnDef>();
>   ColumnDef col_def = new 
> ColumnDef(CassandraUtil.string2ByteBuffer("O_CUSTKEY"), "UTF8Type");
>   col_def.setIndex_type(IndexType.KEYS);
>   column_metadata.add(col_def);
>   column_metadata.add(new 
> ColumnDef(CassandraUtil.string2ByteBuffer("O_ORDERSTATUS"), "UTF8Type"));
>   column_metadata.add(new 
> ColumnDef(CassandraUtil.string2ByteBuffer("O_TOTALPRICE"), "UTF8Type"));
>   column_metadata.add(new 
> ColumnDef(CassandraUtil.string2ByteBuffer("O_ORDERPRIORITY"), "UTF8Type"));
>   column_metadata.add(new 
> ColumnDef(CassandraUtil.string2ByteBuffer("O_CLERK"), "UTF8Type"));
>   column_metadata.add(new 
> ColumnDef(CassandraUtil.string2ByteBuffer("O_SHIPPRIORITY"), "UTF8Type"));
>   column_metadata.add(new 
> ColumnDef(CassandraUtil.string2ByteBuffer("O_COMMENT"), "UTF8Type"));
>   
>   cf_def.setColumn_metadata(column_metadata);
>   client.system_update_column_family(cf_def);



pig and widerows

2012-09-26 Thread William Oberman
Hi,

I'm trying to figure out what's going on with my cassandra/hadoop/pig
system.  I created a "mini" copy of my main cassandra data by randomly
subsampling to get ~50,000 keys.  I was then writing pig scripts but also
the equivalent operation using simple single threaded code to double check
pig.

Of course my very first test failed.  After doing a pig DUMP on the raw
data, what appears to be happening is I'm only getting the first 1024
columns of a key.  After some googling, this seems to be known behavior
unless you add "?widerows=true" to the pig load URI. I tried this, but
it didn't seem to fix anything :-(   Here's the start of my pig script:
foo = LOAD 'cassandra://KEYSPACE/COLUMN_FAMILY?widerows=true' USING
CassandraStorage() AS (key:chararray, columns:bag {column:tuple (name,
value)});

I'm using cassandra 1.1.5 from datastax rpms.  I'm using hadoop
(0.20.2+923.418-1) and pig (0.8.1+28.39-1) from cloudera rpms.

What am I doing wrong?  Or, how can I enable debugging/logging to next
figure out what is going on?  I haven't had to debug hadoop+pig+cassandra
much, other than doing DUMP/ILLUSTRATE from pig.
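
(One generic knob, assuming the stock log4j setup that Cassandra and that era
of Hadoop both use: raise the Cassandra input format's log level in the task's
log4j.properties:)

    log4j.logger.org.apache.cassandra.hadoop=DEBUG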

will


Re: Why data tripled in size after repair?

2012-09-26 Thread Peter Schuller
> What is strange every time I run repair data takes almost 3 times more
> - 270G, then I run compaction and get 100G back.

https://issues.apache.org/jira/browse/CASSANDRA-2699 outlines the
main issues with repair. In short - in your case the limited
granularity of merkle trees is causing too much data to be streamed
(effectively duplicate data).
https://issues.apache.org/jira/browse/CASSANDRA-3912 may be a bandaid
for you in that it allows granularity to be much finer, and the
process to be more incremental.

A 'nodetool compact' decreases disk space temporarily as you have
noticed, but it may also have a long-term negative effect on steady
state disk space usage depending on your workload. If you've got a
workload that's not limited to insertions only (i.e., you have
overwrites/deletes), a major compaction will tend to push steady state
disk space usage up - because you're creating a single sstable bigger
than what would normally happen, and it takes more total disk space
before it will be part of a compaction again.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Truncate causing subsequent timeout on KeyIterator?

2012-09-26 Thread Conan Cook
Hi,

I'm running a bunch of integration tests using an embedded cassandra
instance via the Cassandra Maven Plugin v1.0.0-1, using Hector v1.0-5.
 I've got an issue where one of the tests is using a StringKeyIterator to
iterate over all the keys in a CF, but it gets TimedOutExceptions every
time when trying to communicate with Cassandra; all the other tests using
the same (Spring-wired) keyspace behave fine (stack trace below).  A
previous test is calling a cluster.truncate() to ensure an empty CF before
each test, and it's this that seems to cause the problem - at least,
commenting it out causes the other test to run fine.

Any ideas on what could be causing this?  Both tests are using the same
instance of Keyspace, autowired via Spring, and the same instance of
Cluster in the same way.  No exceptions are being thrown by the truncate
operation - it completes successfully and does its job.
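
(For reference, a minimal sketch of the pattern described, using Hector's
KeyIterator with hypothetical keyspace/CF names:)

    // Sketch only: an earlier test truncates, then this test iterates all keys.
    cluster.truncate("TestKeyspace", "TestCF");

    KeyIterator<String> keys =
        new KeyIterator<String>(keyspace, "TestCF", StringSerializer.get());
    for (String key : keys) {
        // ... this iteration is where the TimedOutExceptions occur
    }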

Thanks,


Conan

Stack trace:

[2012-09-26 18:59:53,002] [WARN ] [main] [m.p.c.c.HConnectionManager] Could
not fullfill request on this host CassandraClient
[2012-09-26 18:59:53,003] [WARN ] [main] [m.p.c.c.HConnectionManager]
Exception:
me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
at
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:35)
~[hector-core-1.0-5.jar:na]
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl$3.execute(KeyspaceServiceImpl.java:163)
~[hector-core-1.0-5.jar:na]
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl$3.execute(KeyspaceServiceImpl.java:145)
~[hector-core-1.0-5.jar:na]
at
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
~[hector-core-1.0-5.jar:na]
at
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
~[hector-core-1.0-5.jar:na]
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
[hector-core-1.0-5.jar:na]
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl.getRangeSlices(KeyspaceServiceImpl.java:167)
[hector-core-1.0-5.jar:na]
at
me.prettyprint.cassandra.model.thrift.ThriftRangeSlicesQuery$1.doInKeyspace(ThriftRangeSlicesQuery.java:66)
[hector-core-1.0-5.jar:na]
at
me.prettyprint.cassandra.model.thrift.ThriftRangeSlicesQuery$1.doInKeyspace(ThriftRangeSlicesQuery.java:62)
[hector-core-1.0-5.jar:na]
at
me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
[hector-core-1.0-5.jar:na]
at
me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
[hector-core-1.0-5.jar:na]
at
me.prettyprint.cassandra.model.thrift.ThriftRangeSlicesQuery.execute(ThriftRangeSlicesQuery.java:61)
[hector-core-1.0-5.jar:na]
at
me.prettyprint.cassandra.service.KeyIterator.runQuery(KeyIterator.java:102)
[hector-core-1.0-5.jar:na]

..

Caused by: org.apache.cassandra.thrift.TimedOutException: null
at
org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12270)
~[cassandra-thrift-1.1.0.jar:1.1.0]
at
org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
~[libthrift-0.7.0.jar:0.7.0]
at
org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:683)
~[cassandra-thrift-1.1.0.jar:1.1.0]
at
org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:667)
~[cassandra-thrift-1.1.0.jar:1.1.0]
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl$3.execute(KeyspaceServiceImpl.java:151)
~[hector-core-1.0-5.jar:na]


Re: Why data tripled in size after repair?

2012-09-26 Thread Rob Coli
On Wed, Sep 26, 2012 at 9:30 AM, Andrey Ilinykh  wrote:
> [ repair ballooned my data size ]
> 1. Why repair almost triples data size?

You didn't mention what version of cassandra you're running. In some
old versions of cassandra (prior to 1.0), repair often creates even
more extraneous data than it should by design.

However, by design, Repair repairs differing ranges based on merkle
trees. Merkle trees are an optimization, what you trade for the
optimization is over-repair. When you have multiple replicas, each
over-repairs. If you are running repair on your whole cluster, this is
why you should use repair -pr, as it reduces the per-replica
over-repair.

> 2. How to compact my data back to 100G?

1) do a major compaction, one CF at a time. if you only have one CF,
you're out of luck because you don't have enough headroom.
2) then convince someone to write "sstablesplit" so you can turn your
100G sstable into [n] smaller sstables and/or learn to live with your
giant sstable
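
(For step 1, the per-CF major compaction is along these lines, with
placeholder names:)

    nodetool -h <host> compact <Keyspace> <ColumnFamily>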

Or add a new data directory with more space in it, to allow you to
compact. I mention the latter in case it is trivial to attach
additional storage in your env.

The other alternative is to wait. Most space will be reclaimed over
time by minor compaction.

=Rob

-- 
=Robert Coli
AIM>ALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Integrated cassandra

2012-09-26 Thread Robin Verlangen
Thank you both for your reply.

We're not 100% sure yet about what to use. The application itself is just
as distributed as Cassandra is. It also embeds ElasticSearch.

At this point I only see the "ring" as a real pain in the ass, as I have to
automatically move nodes around to prevent an unbalanced setup.

The goal is not to prevent users from connecting to Cassandra. If they want
to change anything internal they can and should. Flexibility is one of our
main goals.

But you might have a point: Cassandra is not a "shoot-and-forget" kind of
software.

Not really sure what to do yet ...

Best regards,

Robin Verlangen
*Software engineer*
*
*
W http://www.robinverlangen.nl
E ro...@us2.nl



Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.



2012/9/26 Aaron Turner 

> Cassandra is a distributed database meant to run across multiple systems.
> Is your existing Java application distributed as well?  Does "maintain
> control" mean "exclude end users from connecting to it and making changes"
> or merely "provisioning and keep it running well operationally for the
> application"?  Honestly, either of those seem like a lot to ask right now
> for any solution requiring the scalability that Cassandra provides.
>
> That said, I've done embedded PostgreSQL in the past.  Not distributed mind
> you.  And it was on an appliance.  We picked PG because it's super reliable
> and very good at recovering from all kinds of evil things that customers
> do... pulling power cords, etc.  I don't think any of our customers even
> knew we were using PG unless they looked in the Licensing section of the
> manual.
>
> Personally, I don't think Cassandra is there yet where it can be an opaque
> datastore from the end user perspective- especially if you're distributing
> it as part of a software application and don't have full control over the
> hardware/environment.  Not to say Cassandra hasn't been reliable for us,
> but it's far from "install it and forget it".  Simple things like dealing
> with network/node outages or adding/removing new nodes are complicated
> enough that I'd be hesitant to automate without some human familiar with
> Cassandra being involved.
>
>
>
>
> On Tue, Sep 25, 2012 at 10:11 PM, Robin Verlangen  wrote:
>
>> Hi there,
>>
>> Is there a way to "embed"/package Cassandra with another Java
>> application and maintain control over it? Has this been done before? Are there
>> any best practices?
>>
>> Why do I want to do this? We want to offer as little configuration as
>> possible to our customers, but only if it's possible without messing around
>> in the Cassandra core.
>>
>> Best regards,
>>
>> Robin Verlangen
>> *Software engineer*
>> *
>> *
>> W http://www.robinverlangen.nl
>> E ro...@us2.nl
>>
>> 
>>
>> Disclaimer: The information contained in this message and attachments is
>> intended solely for the attention and use of the named addressee and may be
>> confidential. If you are not the intended recipient, you are reminded that
>> the information remains the property of the sender. You must not use,
>> disclose, distribute, copy, print or rely on this e-mail. If you have
>> received this message in error, please contact the sender immediately and
>> irrevocably delete this message and any copies.
>>
>>
>
>
> --
> Aaron Turner
> http://synfin.net/ Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
> Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
> -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
>
>


Re: Integrated cassandra

2012-09-26 Thread Aaron Turner
Cassandra is a distributed database meant to run across multiple systems.
Is your existing Java application distributed as well?  Does "maintain
control" mean "exclude end users from connecting to it and making changes"
or merely "provisioning and keep it running well operationally for the
application"?  Honestly, either of those seem like a lot to ask right now
for any solution requiring the scalability that Cassandra provides.

That said, I've done embedded PostgreSQL in the past.  Not distributed mind
you.  And it was on an appliance.  We picked PG because it's super reliable
and very good at recovering from all kinds of evil things that customers
do... pulling power cords, etc.  I don't think any of our customers even
knew we were using PG unless they looked in the Licensing section of the
manual.

Personally, I don't think Cassandra is there yet where it can be an opaque
datastore from the end user perspective- especially if you're distributing
it as part of a software application and don't have full control over the
hardware/environment.  Not to say Cassandra hasn't been reliable for us,
but it's far from "install it and forget it".  Simple things like dealing
with network/node outages or adding/removing new nodes are complicated
enough that I'd be hesitant to automate without some human familiar with
Cassandra being involved.



On Tue, Sep 25, 2012 at 10:11 PM, Robin Verlangen  wrote:

> Hi there,
>
> Is there a way to "embed"/package Cassandra with another Java application
> and maintain control over it? Has this been done before? Are there any best
> practices?
>
> Why do I want to do this? We want to offer as little configuration as
> possible to our customers, but only if it's possible without messing around
> in the Cassandra core.
>
> Best regards,
>
> Robin Verlangen
> *Software engineer*
> *
> *
> W http://www.robinverlangen.nl
> E ro...@us2.nl
>
> 
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
>


-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
"carpe diem quam minimum credula postero"


Re: any ways to have compaction use less disk space?

2012-09-26 Thread Rob Coli
On Wed, Sep 26, 2012 at 6:05 AM, Sylvain Lebresne  wrote:
> On Wed, Sep 26, 2012 at 2:35 AM, Rob Coli  wrote:
>> 150,000 sstables seem highly unlikely to be performant. As a simple
>> example of why, on the read path the bloom filter for every sstable
>> must be consulted...
>
> Unfortunately that's a bad example since that's not true.

You learn something new every day. Thanks for the clarification.

I reduce my claim to "a huge number of SSTables are unlikely to be
performant". :)

=Rob

-- 
=Robert Coli
AIM>ALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Why data tripled in size after repair?

2012-09-26 Thread Andrey Ilinykh
Hello everybody!
I have 3 node cluster with replication factor of 3.
each node has 800G disk and it used to have 100G of data.
What is strange is that every time I run repair the data takes almost 3 times
more space - 270G; then I run compaction and get 100G back.
Unfortunately, yesterday I forgot to compact and ran repair again (at
that moment I had around 270G). As a result I have 720G on each node.
I ran compaction again and got a lot of warnings like this:

WARN [CompactionExecutor:732] 2012-09-26 16:13:00,745
CompactionTask.java (line 84) insufficient space to compact all
requested files

which makes sense, because I'm almost out of disk space.

So, I have two questions.

1. Why repair almost triples data size?

2. How to compact my data back to 100G?

Thank you,
  Andrey


Re: Why periodical repairs?

2012-09-26 Thread Tyler Hobbs
The DistributedDeletes link in that section explains the root reason for
needing to do this.  It's not that deletes are forgotten, it's that a write
(deletes are basically tombstone writes) didn't get replicated to all
replicas.  For example, at RF=3, write consistency level QUORUM, if one of
the replicas goes down for several hours while you're performing deletes,
then comes back up, it won't necessarily have all of those tombstones.
Hinted handoff will replay some of the deletes, but not all of them if
you're down for an extended period of time.

Once you have "zombie" data, the only way to get rid of it is to re-run the
delete.
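
(In practice that means running something like the following on each node,
host placeholder hypothetical, at least once every GCGraceSeconds:)

    nodetool -h <host> repair -pr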

On Wed, Sep 26, 2012 at 3:26 AM, Thomas Stets wrote:

> The Cassandra Operations page (http://wiki.apache.org/cassandra/Operations) 
> says:
>
> > Unless your application performs no deletes, it is vital that production
> clusters run nodetool repair periodically on all nodes in the cluster.
> The hard requirement for repair frequency is the value used for
> GCGraceSeconds. Running nodetool repair often enough to guarantee that all
> nodes have performed a repair in a given period GCGraceSeconds long,
> ensures that deletes are not "forgotten" in the cluster.
>
> Is it really that common for deletes to be forgotten, or is it just a
> precaution against an unlikely-but-hard-to-fix problem?
>
>   regards, Thomas
>
>


-- 
Tyler Hobbs
DataStax 


Re: is this a cassandra bug?

2012-09-26 Thread Hiller, Dean
bump

On 9/25/12 2:40 PM, "Hiller, Dean"  wrote:

>Hmmm, is rowkey validation asynchronous to the actual sending of the
>data to cassandra?
>
>I seem to be able to put an invalid type and GET that invalid data back
>just fine even though my key type was an int and the key comparator was
>Decimal
>BUT then in the logs I see a validation fail exception but I never saw
>anything client side… in fact, the client READ back the data fine so I am a
>bit confused here… 1.1.4… I tested this on a single node after seeing it
>in our 6 node cluster with the same results.
>
>Thanks,
>Dean



any ideas on what these mean

2012-09-26 Thread Hiller, Dean
We were consistently getting this exception over and over as we put data into 
the system.  A reboot caused it to go away but we don't want to be rebooting in 
the future….

 1.  When does this occur?
 2.  Is it affecting my data put?  (I have seen other weird validation 
exceptions where my data is still put and I can read it from cassandra and I 
get no exception client side)
 3.  How do I reverse engineer what column families 13740 and 13739 are?  (ie. 
Their names?) so I can check for data corruption.
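
(On question 3, a hedged pointer: in 1.1 the schema lives in the system
keyspace, and system.schema_columnfamilies carries an id column, so a CQL
query along these lines should map the ids back to names:)

    SELECT keyspace_name, columnfamily_name, id FROM system.schema_columnfamilies;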

ERROR [MigrationStage:1] 2012-09-26 09:51:03,128 AbstractCassandraDaemon.java 
(line 134) Exception in thread Thread[MigrationStage:1,5,main]
java.lang.RuntimeException: java.io.IOException: 
org.apache.cassandra.config.ConfigurationException: Column family ID mismatch 
(found 13740; expected 13739)
at 
org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: 
org.apache.cassandra.config.ConfigurationException: Column family ID mismatch 
(found 13740; expected 13739)
at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:676)
at 
org.apache.cassandra.db.DefsTable.updateColumnFamily(DefsTable.java:463)
at 
org.apache.cassandra.db.DefsTable.mergeColumnFamilies(DefsTable.java:407)
at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:271)
at 
org.apache.cassandra.db.DefsTable.mergeRemoteSchema(DefsTable.java:249)
at 
org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 6 more
Caused by: org.apache.cassandra.config.ConfigurationException: Column family ID 
mismatch (found 13740; expected 13739)
at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:698)
at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:672)   
 ... 12 more

We also see this exception in the same log, which is ironic considering the 
above one says it found 13740!! And this one says it couldn't find it….

ERROR [MutationStage:27379] 2012-09-26 09:50:57,558 RowMutationVerbHandler.java 
(line 61) Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=13740
at 
org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:439)
at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:447)
at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:395)
at 
org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


Re: any ways to have compaction use less disk space?

2012-09-26 Thread Sylvain Lebresne
On Wed, Sep 26, 2012 at 2:35 AM, Rob Coli  wrote:
> 150,000 sstables seem highly unlikely to be performant. As a simple
> example of why, on the read path the bloom filter for every sstable
> must be consulted...

Unfortunately that's a bad example since that's not true.

Leveled compaction keeps sstables in levels of non-overlapping key
ranges, meaning that a read only has to check one sstable per level (a
little bit more to be precise, since it has to include all of Level 0,
but provided your node is not lagging too far behind, that's still a
small number of sstables). I'm too lazy to do the exact maths but I
believe that for 700gb you'll have 8 levels.

--
Sylvain


Re: is this a cassandra bug?

2012-09-26 Thread Sylvain Lebresne
You're mistaking 'key validation class' and 'comparator'. It is your
key validation class that is DecimalType. Your comparator is UTF8Type,
and yes, switching the comparator from UTF8Type to DecimalType is not
allowed.
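
(A minimal CLI sketch, hypothetical CF name, of where each setting lives:)

    create column family Foo with key_validation_class = DecimalType and comparator = UTF8Type;

The key_validation_class validates row keys; the comparator sorts and
validates column names, which is why it cannot be switched in place.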

--
Sylvain

On Tue, Sep 25, 2012 at 10:13 PM, Hiller, Dean  wrote:
> This is cassandra 1.1.4
>
> Describe shows DecimalType, and I tested setting the comparator TO 
> DecimalType and it fails (realize I have never touched this column family 
> until now except for posting data, which succeeded).
>
> [default@unknown] use databus;
> Authenticated to keyspace: databus
> [default@databus] describe bacnet9800AnalogInput9;
> ColumnFamily: bacnet9800AnalogInput9
>   Key Validation Class: org.apache.cassandra.db.marshal.DecimalType
>   Default column value validator: 
> org.apache.cassandra.db.marshal.BytesType
>   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>   GC grace seconds: 864000
>   Compaction min/max thresholds: 4/32
>   Read repair chance: 0.1
>   DC Local Read repair chance: 0.0
>   Replicate on write: true
>   Caching: KEYS_ONLY
>   Bloom Filter FP chance: default
>   Built indexes: []
>   Compaction Strategy: 
> org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
>   Compression Options:
> sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
> [default@databus] update column family bacnet9800AnalogInput9 with comparator 
> = DecimalType;
> org.apache.thrift.transport.TTransportException
> [default@databus]
>
> Exception from system.log from the node in the cluster is
>
> ERROR [MigrationStage:1] 2012-09-25 14:11:20,327 AbstractCassandraDaemon.java 
> (line 134) Exception in thread Thread[MigrationStage:1,5,main]
> java.lang.RuntimeException: java.io.IOException: 
> org.apache.cassandra.config.ConfigurationException: comparators do not match 
> or are not compatible.
> at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: 
> org.apache.cassandra.config.ConfigurationException: comparators do not match 
> or are not compatible.
> at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:676)
> at org.apache.cassandra.db.DefsTable.updateColumnFamily(DefsTable.java:463)
> at org.apache.cassandra.db.DefsTable.mergeColumnFamilies(DefsTable.java:407)
> at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:271)
> at org.apache.cassandra.db.DefsTable.mergeRemoteSchema(DefsTable.java:249)
> at 
> org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:48)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> ... 6 more
> Caused by: org.apache.cassandra.config.ConfigurationException: comparators do 
> not match or are not compatible.
> at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:705)
> at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:672)
> ... 12 more
>


Re: Nodetool repair and Leveled Compaction

2012-09-26 Thread Omid Aladini
I think this JIRA answers your question:

https://issues.apache.org/jira/browse/CASSANDRA-2610

which in order not to duplicate work (creation of Merkle trees) repair
is done on all replicas for a range.

Cheers,
Omid

On Tue, Sep 25, 2012 at 8:27 AM, Sergey Tryuber  wrote:
> Hi Radim
>
> Unfortunately the number of compaction tasks is not overestimated. The number is
> decremented one-by-one and this process takes several hours for our 40GB
> node(( Also, when a lot of compaction tasks appear, we see that total disk
> space used (via JMX) is doubled and Cassandra really tries to compact
> something. When compactions are done, "total disk space used" is back to
> normal.
>
>
> On 24 September 2012 19:04, Radim Kolar  wrote:
>>
>>
>>> Repair process by itself is going well in a background, but the issue I'm
>>> concerned is a lot of unnecessary compaction tasks
>>
>> The number in the compaction tasks counter is overestimated. For example I have
>> 1100 tasks left, and if I stop inserting data, all tasks will finish
>> within 30 minutes.
>>
>> I suppose that this counter is incremented for every sstable which needs
>> compaction, but it's not decremented properly because you can compact about
>> 20 sstables at once, and this reduces the counter only by 1.
>
>


Re: Integrated cassandra

2012-09-26 Thread Vivek Mishra
If I am getting it correctly, then what you need to do is open a connection
to the cassandra daemon thread and access it via the client API. Have a look at:
https://github.com/impetus-opensource/Kundera/blob/trunk/kundera-cassandra/src/test/java/com/impetus/client/persistence/CassandraCli.java

Here, initClient() initializes the connection, and using this client
instance you can then connect and maintain column families/keyspaces.

The above-mentioned source code is built using EmbeddedCassandraService, so you
just need to initialize the client, not start the cassandra server.
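
(For the embedding route itself, a minimal sketch; the config path is an
assumption:)

    // Sketch only: start an in-process Cassandra before initializing the client.
    System.setProperty("cassandra.config", "file:conf/cassandra.yaml"); // hypothetical path
    EmbeddedCassandraService cassandra = new EmbeddedCassandraService();
    cassandra.start(); // boots Cassandra inside this JVM; throws IOException on failure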

HTH

-Vivek

On Wed, Sep 26, 2012 at 2:06 PM, Robin Verlangen  wrote:

> Do you have any ideas how to do this Vivek?
>
> Best regards,
>
> Robin Verlangen
> *Software engineer*
> *
> *
> W http://www.robinverlangen.nl
> E ro...@us2.nl
>
> 
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
>
>
> 2012/9/26 Vivek Mishra 
>
>> I guess you can always open/maintain a socket with the running cassandra
>> daemon and have control over specific column families/keyspaces or the server
>> itself.
>>
>> -Vivek
>>
>>
>>
>> On Wed, Sep 26, 2012 at 12:51 PM, Robin Verlangen  wrote:
>>
>>> Some additional information: I already read about "Embedding"
>> http://wiki.apache.org/cassandra/Embedding, however that doesn't seem like a
>> rock-solid solution to me. The word "volatile" is not really comforting me
>>> ;-)
>>>
>>> Best regards,
>>>
>>> Robin Verlangen
>>> *Software engineer*
>>> *
>>> *
>>> W http://www.robinverlangen.nl
>>> E ro...@us2.nl
>>>
>>> 
>>>
>>> Disclaimer: The information contained in this message and attachments is
>>> intended solely for the attention and use of the named addressee and may be
>>> confidential. If you are not the intended recipient, you are reminded that
>>> the information remains the property of the sender. You must not use,
>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>> received this message in error, please contact the sender immediately and
>>> irrevocably delete this message and any copies.
>>>
>>>
>>>
>>> 2012/9/25 Robin Verlangen 
>>>
 Hi there,

 Is there a way to "embed"/package Cassandra with another Java
 application and maintain control over it? Has this been done before? Are there
 any best practices?

 Why do I want to do this? We want to offer as little configuration as
 possible to our customers, but only if it's possible without messing around
 in the Cassandra core.

 Best regards,

 Robin Verlangen
 *Software engineer*
 *
 *
 W http://www.robinverlangen.nl
 E ro...@us2.nl

 

 Disclaimer: The information contained in this message and attachments
 is intended solely for the attention and use of the named addressee and may
 be confidential. If you are not the intended recipient, you are reminded
 that the information remains the property of the sender. You must not use,
 disclose, distribute, copy, print or rely on this e-mail. If you have
 received this message in error, please contact the sender immediately and
 irrevocably delete this message and any copies.


>>>
>>
>


Re: Integrated cassandra

2012-09-26 Thread Robin Verlangen
Do you have any ideas how to do this Vivek?

Best regards,

Robin Verlangen
*Software engineer*
*
*
W http://www.robinverlangen.nl
E ro...@us2.nl



Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.



2012/9/26 Vivek Mishra 

> I guess you can always open/maintain a socket with the running cassandra
> daemon and have control over specific column families/keyspaces or the server
> itself.
>
> -Vivek
>
>
>
> On Wed, Sep 26, 2012 at 12:51 PM, Robin Verlangen  wrote:
>
>> Some additional information: I already read about "Embedding"
>> http://wiki.apache.org/cassandra/Embedding, however that doesn't seem like a
>> rock-solid solution to me. The word "volatile" is not really comforting me
>> ;-)
>>
>> Best regards,
>>
>> Robin Verlangen
>> *Software engineer*
>> *
>> *
>> W http://www.robinverlangen.nl
>> E ro...@us2.nl
>>
>> 
>>
>> Disclaimer: The information contained in this message and attachments is
>> intended solely for the attention and use of the named addressee and may be
>> confidential. If you are not the intended recipient, you are reminded that
>> the information remains the property of the sender. You must not use,
>> disclose, distribute, copy, print or rely on this e-mail. If you have
>> received this message in error, please contact the sender immediately and
>> irrevocably delete this message and any copies.
>>
>>
>>
>> 2012/9/25 Robin Verlangen 
>>
>>> Hi there,
>>>
>>> Is there a way to "embed"/package Cassandra with another Java
>>> application and maintain control over it? Has this been done before? Are there
>>> any best practices?
>>>
>>> Why do I want to do this? We want to offer as little configuration as
>>> possible to our customers, but only if it's possible without messing around
>>> in the Cassandra core.
>>>
>>> Best regards,
>>>
>>> Robin Verlangen
>>> *Software engineer*
>>> *
>>> *
>>> W http://www.robinverlangen.nl
>>> E ro...@us2.nl
>>>
>>> 
>>>
>>> Disclaimer: The information contained in this message and attachments is
>>> intended solely for the attention and use of the named addressee and may be
>>> confidential. If you are not the intended recipient, you are reminded that
>>> the information remains the property of the sender. You must not use,
>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>> received this message in error, please contact the sender immediately and
>>> irrevocably delete this message and any copies.
>>>
>>>
>>
>


Re: Integrated cassandra

2012-09-26 Thread Vivek Mishra
I guess you can always open/maintain a socket with the running cassandra
daemon and have control over specific column families/keyspaces or the server
itself.

-Vivek


On Wed, Sep 26, 2012 at 12:51 PM, Robin Verlangen  wrote:

> Some additional information: I already read about "Embedding"
> http://wiki.apache.org/cassandra/Embedding, however that doesn't seem like a
> rock-solid solution to me. The word "volatile" is not really comforting me
> ;-)
>
> Best regards,
>
> Robin Verlangen
> *Software engineer*
> *
> *
> W http://www.robinverlangen.nl
> E ro...@us2.nl
>
> 
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
>
>
> 2012/9/25 Robin Verlangen 
>
>> Hi there,
>>
>> Is there a way to "embed"/package Cassandra with another Java
>> application and maintain control over it? Has this been done before? Are there
>> any best practices?
>>
>> Why do I want to do this? We want to offer as little configuration as
>> possible to our customers, but only if it's possible without messing around
>> in the Cassandra core.
>>
>> Best regards,
>>
>> Robin Verlangen
>> *Software engineer*
>> *
>> *
>> W http://www.robinverlangen.nl
>> E ro...@us2.nl
>>
>> 
>>
>> Disclaimer: The information contained in this message and attachments is
>> intended solely for the attention and use of the named addressee and may be
>> confidential. If you are not the intended recipient, you are reminded that
>> the information remains the property of the sender. You must not use,
>> disclose, distribute, copy, print or rely on this e-mail. If you have
>> received this message in error, please contact the sender immediately and
>> irrevocably delete this message and any copies.
>>
>>
>


Why periodical repairs?

2012-09-26 Thread Thomas Stets
The Cassandra Operations page
(http://wiki.apache.org/cassandra/Operations) says:

> Unless your application performs no deletes, it is vital that production
clusters run nodetool repair periodically on all nodes in the cluster. The
hard requirement for repair frequency is the value used for GCGraceSeconds.
Running nodetool repair often enough to guarantee that all nodes have
performed a repair in a given period GCGraceSeconds long, ensures that
deletes are not "forgotten" in the cluster.

Is it really that common for deletes to be forgotten, or is it just a
precaution against an unlikely-but-hard-to-fix problem?

  regards, Thomas


Re: Prevent queries from OOM nodes

2012-09-26 Thread Віталій Тимчишин
Actually, an easy way to put cassandra down is
select count(*) from A limit 1000
CQL will read everything into a List in order to count it later.

2012/9/26 aaron morton 

> Can you provide some information on the queries and the size of the data
> they traversed ?
>
> The default maximum size for a single thrift message is 16MB, was it
> larger than that ?
> https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L375
>
> Cheers
>
>
> On 25/09/2012, at 8:33 AM, Bryce Godfrey 
> wrote:
>
> Is there anything I can do on the configuration side to prevent nodes from
> going OOM due to queries that will read large amounts of data and exceed
> the heap available? 
>
> For the past few days of we had some nodes consistently freezing/crashing
> with OOM.  We got a heap dump into MAT and figured out the nodes were dying
> due to some queries for a few extremely large data sets.  Tracked it back
> to an app that just didn’t prevent users from doing these large queries,
> but it seems like Cassandra could be smart enough to guard against this
> type of thing?
>
> Basically some kind of setting like “if the data to satisfy the query >
> available heap, then throw an error to the caller and abort the query”.  I would
> much rather return errors to clients than crash a node, as the error is
> easier to track down that way and resolve.
>
> Thanks.
>
>
>


-- 
Best regards,
 Vitalii Tymchyshyn


Re: Integrated cassandra

2012-09-26 Thread Robin Verlangen
Some additional information: I already read about "Embedding"
http://wiki.apache.org/cassandra/Embedding, however that doesn't seem like a
rock-solid solution to me. The word "volatile" is not really comforting me
;-)

Best regards,

Robin Verlangen
*Software engineer*
*
*
W http://www.robinverlangen.nl
E ro...@us2.nl



Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.



2012/9/25 Robin Verlangen 

> Hi there,
>
> Is there a way to "embed"/package Cassandra with another Java application
> and maintain control over it? Has this been done before? Are there any best
> practices?
>
> Why do I want to do this? We want to offer as little configuration as
> possible to our customers, but only if it's possible without messing around
> in the Cassandra core.
>
> Best regards,
>
> Robin Verlangen
> *Software engineer*
> *
> *
> W http://www.robinverlangen.nl
> E ro...@us2.nl
>
> 
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
>


Re: 1.1.5 Missing Insert! Strange Problem

2012-09-26 Thread Arya Goudarzi
No. We don't use TTLs.

On Tue, Sep 25, 2012 at 11:47 PM, Roshni Rajagopal <
roshni_rajago...@hotmail.com> wrote:

>  By any chance is a TTL (time to live ) set on the columns...
>
> --
> Date: Tue, 25 Sep 2012 19:56:19 -0700
> Subject: 1.1.5 Missing Insert! Strange Problem
> From: gouda...@gmail.com
> To: user@cassandra.apache.org
>
>
> Hi All,
>
> I have a 4 node cluster setup in 2 zones with NetworkTopology strategy and
> strategy options for writing a copy to each zone, so the effective load on
> each machine is 50%.
>
> Symptom:
> I have a column family that has gc grace seconds of 10 days (the default).
> On 17th there was an insert done to this column family and from our
> application logs I can see that the client got a successful response back
> with write consistency of ONE. I can verify the existence of the key that
> was inserted in the Commitlogs of both replicas; however, it seems that this
> record was never inserted. I used list to get all the column family rows
> which were about 800ish, and examined them to see if it could possibly have been
> deleted by our application. List should have shown them to me since I have
> not gone beyond gc grace seconds if this record was deleted during past
> days. I could not find it.
>
> Things happened:
> During the same time as this insert was happening, I was performing a
> rolling upgrade of Cassandra from 1.1.3 to 1.1.5 by taking one node down at
> a time, performing the package upgrade and restarting the service and going
> to the next node. I could see from system.log that some mutations were
> replayed during those restarts, so I suppose the memtables were not flushed
> before restart.
>
>
> Could this procedure cause the row insert to disappear? How could I
> troubleshoot this, as I am running out of ideas?
>
> Your help is greatly appreciated.
>
>
> Cheers,
> =Arya
>