Re: Wide rows (time series data) and ORM

2013-10-23 Thread Vivek Mishra
Can Kundera work with wide rows in an ORM manner?

What specifically are you looking for? A composite-column-based implementation
can be built using Kundera.
With recent CQL3 developments, Kundera supports most of these. I think the POJO
needs to be aware of the number of fields to be persisted (same as CQL3).

-Vivek


On Wed, Oct 23, 2013 at 12:48 AM, Les Hartzman lhartz...@gmail.com wrote:

 As I'm becoming more familiar with Cassandra I'm still trying to shift my
 thinking from relational to NoSQL.

 Can Kundera work with wide rows in an ORM manner? In other words, can you
 actually design a POJO that fits the standard recipe for JPA usage? Would
 the queries return collections of the POJO to handle wide row data?

 I had considered using Spring and JPA for Cassandra, but it appears that
 other than basic configuration issues for Cassandra, to use Spring and JPA
 on a Cassandra database seems like an effort in futility if Cassandra is
 used as a NoSQL database instead of mimicking an RDBMS solution.

 If anyone can shed any light on this, I'd appreciate it.

 Thanks.

 Les




Re: MemtablePostFlusher pending

2013-10-23 Thread Kais Ahmed
Thanks robert,

For info, in case it helps to fix the bug: I'm starting the downgrade. I restarted
all the nodes and ran a repair, and there are a lot of errors like this:

ERROR [ValidationExecutor:2] 2013-10-23 08:39:27,558 Validator.java (line
242) Failed creating a merkle tree for [repair
#9f9b7fc0-3bbe-11e3-a220-b18f7c69b044 on ks01/messages,
(8746393670077301406,8763948586274310360]], /172.31.38.135 (see log for
details)
ERROR [ValidationExecutor:2] 2013-10-23 08:39:27,558 CassandraDaemon.java
(line 185) Exception in thread Thread[ValidationExecutor:2,1,main]
java.lang.AssertionError
at
org.apache.cassandra.db.compaction.PrecompactedRow.update(PrecompactedRow.java:171)
at org.apache.cassandra.repair.Validator.rowHash(Validator.java:198)
at org.apache.cassandra.repair.Validator.add(Validator.java:151)
at
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:798)
at
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:60)
at
org.apache.cassandra.db.compaction.CompactionManager$8.call(CompactionManager.java:395)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

And the repair stops after this error:

ERROR [FlushWriter:9] 2013-10-23 08:39:32,979 CassandraDaemon.java (line
185) Exception in thread Thread[FlushWriter:9,5,main]
java.lang.AssertionError
at
org.apache.cassandra.io.sstable.SSTableWriter.rawAppend(SSTableWriter.java:198)
at
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:186)
at
org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:358)
at
org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:317)
at
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)

2013/10/22 Robert Coli rc...@eventbrite.com

 On Mon, Oct 21, 2013 at 11:57 PM, Kais Ahmed k...@neteck-fr.com wrote:

 I will try to create a new cluster on 1.2 and copy the data. Can you tell me
 please the best practice to do this? Do I have to use sstable2json /
 json2sstable or some other method?


 Unfortunately to downgrade versions you are going to need to use a method
 like sstable2json/json2sstable.
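
 A rough sketch of that round trip (file names and paths here are only
 placeholders; actual sstable file names depend on your Cassandra version):

 sstable2json /var/lib/cassandra/data/ks1/cf1/ks1-cf1-ic-1-Data.db > cf1.json
 json2sstable -K ks1 -c cf1 cf1.json /path/for/new/cluster/ks1-cf1-ic-1-Data.db

 and repeat per sstable before loading the result into the new cluster.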

 Other bulkload options, which mostly don't apply in the downgrade case,
 here :

 http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

 =Rob



How to select timestamp with CQL

2013-10-23 Thread Alex N
Hi,
I was wondering how I could select a column's timestamp with CQL. I've been
using Hector so far, and it gives me this option. But I want to use the
DataStax CQL driver now.
I don't want to mess with this value, just read it! I know I should
probably have a separate column with a timestamp value created on my own, but I
don't want to change the schema and update millions of rows now.
I found this ticket
https://issues.apache.org/jira/browse/CASSANDRA-4217 and it's fixed, but
I don't know how to use it -

SELECT key, value, timestamp(value) FROM foo; - this doesn't work.
Regards,
Alex


Re: How to select timestamp with CQL

2013-10-23 Thread Cyril Scetbon
Hi,

Now you can ask for the TTL and the TIMESTAMP as shown in the following example 
:

cqlsh:k1> select * FROM t1 ;

 ise    | filtre | value_1
--------+--------+---------
 cyril1 |      2 |   49926
 cyril2 |      1 |   18584
 cyril3 |      2 |   31415

cqlsh:k1> select filtre,writetime(filtre),ttl(filtre) FROM t1 ;

 filtre | writetime(filtre) | ttl(filtre)
--------+-------------------+-------------
      2 |  1380088288623000 |        null
      1 |  1380088288636000 |        null
      2 |  1380088289309000 |        null

Regards
-- 
Cyril SCETBON

On 23 Oct 2013, at 12:00, Alex N lot...@gmail.com wrote:

 Hi,
 I was wondering how could I select column timestamp with CQL. I've been using 
 Hector so far, and it gives me this option. But I want to use datastax CQL 
 driver now. 
 I don't want to mess with this value! just read it. I know I should probably 
 have separate column with timestamp value created by my own, but I don't want 
 to change the schema and update milions of rows know.
 I found this ticket https://issues.apache.org/jira/browse/CASSANDRA-4217 and 
 it's fixed but I don't know how to use it - 
 SELECT key, value, timestamp(value) FROM foo; - this doesn't work.
 Regards,
 Alex
 
 



Re: How to select timestamp with CQL

2013-10-23 Thread Alex N
Thanks!
I can't find it in the documentation...



2013/10/23 Cyril Scetbon cyril.scet...@free.fr

 Hi,

 Now you can ask for the TTL and the TIMESTAMP as shown in the following
 example :

 cqlsh:k1> select * FROM t1 ;

  ise    | filtre | value_1
 --------+--------+---------
  cyril1 |      2 |   49926
  cyril2 |      1 |   18584
  cyril3 |      2 |   31415

 cqlsh:k1> select filtre,writetime(filtre),ttl(filtre) FROM t1 ;

  filtre | writetime(filtre) | ttl(filtre)
 --------+-------------------+-------------
       2 |  1380088288623000 |        null
       1 |  1380088288636000 |        null
       2 |  1380088289309000 |        null

 Regards
 --
 Cyril SCETBON

 On 23 Oct 2013, at 12:00, Alex N lot...@gmail.com wrote:

 Hi,
 I was wondering how could I select column timestamp with CQL. I've been
 using Hector so far, and it gives me this option. But I want to use
 datastax CQL driver now.
 I don't want to mess with this value! just read it. I know I should
 probably have separate column with timestamp value created by my own, but I
 don't want to change the schema and update milions of rows know.
 I found this ticket https://issues.apache.org/jira/browse/CASSANDRA-4217and 
 it's fixed but I don't know how to use it -

 SELECT key, value, timestamp(value) FROM foo; - this doesn't work.
 Regards,
 Alex





Re: How to select timestamp with CQL

2013-10-23 Thread Laing, Michael
http://www.datastax.com/documentation/cql/3.1/webhelp/index.html#cql/cql_reference/select_r.html


On Wed, Oct 23, 2013 at 6:50 AM, Alex N lot...@gmail.com wrote:

 Thanks!
 I can't find it in the documentation...



 2013/10/23 Cyril Scetbon cyril.scet...@free.fr

 Hi,

 Now you can ask for the TTL and the TIMESTAMP as shown in the following
 example :

 cqlsh:k1> select * FROM t1 ;

  ise    | filtre | value_1
 --------+--------+---------
  cyril1 |      2 |   49926
  cyril2 |      1 |   18584
  cyril3 |      2 |   31415

 cqlsh:k1> select filtre,writetime(filtre),ttl(filtre) FROM t1 ;

  filtre | writetime(filtre) | ttl(filtre)
 --------+-------------------+-------------
       2 |  1380088288623000 |        null
       1 |  1380088288636000 |        null
       2 |  1380088289309000 |        null

  Regards
 --
 Cyril SCETBON

 On 23 Oct 2013, at 12:00, Alex N lot...@gmail.com wrote:

 Hi,
 I was wondering how could I select column timestamp with CQL. I've been
 using Hector so far, and it gives me this option. But I want to use
 datastax CQL driver now.
 I don't want to mess with this value! just read it. I know I should
 probably have separate column with timestamp value created by my own, but I
 don't want to change the schema and update milions of rows know.
 I found this ticket https://issues.apache.org/jira/browse/CASSANDRA-4217and 
 it's fixed but I don't know how to use it -

 SELECT key, value, timestamp(value) FROM foo; - this doesn't work.
 Regards,
 Alex







RE: Questions related to the data in SSTable files

2013-10-23 Thread java8964 java8964
We run a major repair on every node every 7 days.
I think you mean 2 cases of failed writes.
One is the replication failure of a write. Duplication generated from this
kind of failure should be very small in my case, because I only parse the data
from 12 nodes, which should NOT contain any replica nodes.
If one node persists a write, plus a hint for a failed replication write, this
write will still be stored as one write in its SSTable files, right? Why would it
need to store 2 copies as duplicates in the SSTable files?
The other case is what you describe: the client retries the write when a time-out
exception happens. This explains the duplication reasonably well.
Here are the duplication counts found in our SSTable files. You can see a lot
of data duplicated 2 times, but also some with even higher numbers. But the max
duplication count is 27 - can one client retry 27 times?
duplication_count duplication_occurrence
 2                123615348
 3                6446783
 4                21102
 5                1054
 6                2496
 7                47
 8                726
 9                52
10                12
11                3
12                7
13                9
14                7
15                3
16                2
17                2
18                1
19                5
20                5
22                1
23                3
25                2
27                99
Another question: do you have any guess what could cause case 2 (from my
original email) to happen?
Thanks
Date: Tue, 22 Oct 2013 17:52:24 -0700
Subject: Re: Questions related to the data in SSTable files
From: rc...@eventbrite.com
To: user@cassandra.apache.org

On Tue, Oct 22, 2013 at 5:17 PM, java8964 java8964 java8...@hotmail.com wrote:

Any way I can verify how often the system is being repaired? I can ask another
group who maintains the Cassandra cluster. But do you mean that even the failed
writes will be stored in the SSTable files?

repair sessions are logged in system.log, and the best practice is to run a 
repair once every gc_grace_seconds, which defaults to 10 days.

A failed write means only that it failed to meet its ConsistencyLevel within
the request_timeout. It does not mean that it failed to write everywhere it
tried to write. There is no rollback, so in practice with RF > 1 it is likely
that a failed write succeeded at least somewhere. But if any failure is
noted, Cassandra will generate a hint for hinted handoff and attempt to
redeliver the failed write. Also, many/most client applications will respond
to a TimedOutException by attempting to re-write the failed write, using the
same client timestamp.

Repair has a fixed granularity, so the larger the size of your dataset, the more
over-repair any given repair will cause.
Duplicates occur as a natural consequence of this: if you have 1 row which
differs in the merkle tree chunk and the merkle tree chunk is, for example,
1000 rows, you will repair one row and duplicate the other 999.
 
=Rob  

Compaction issues

2013-10-23 Thread Russ Garrett
Hi,

We have a cluster which we've recently moved to use
LeveledCompactionStrategy. We were experiencing some disk space
issues, so we added two additional nodes temporarily to aid
compaction. Once the compaction had completed on all nodes, we
decommissioned the two temporary nodes.

All nodes now have a high number of pending tasks which isn't dropping
- it's remaining approximately static. There are constantly
compaction tasks running, but when they complete, the pending task
count doesn't drop. We've set the compaction rate limit to 0, and
increased the number of compactor threads until the I/O utilisation is
at maximum, but neither of these has helped.
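
For reference, the rate limit can be changed at runtime with nodetool (0
disables throttling), and the pending/running counts come from
compactionstats:

nodetool setcompactionthroughput 0
nodetool compactionstats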

Any suggestions?

Cheers,

-- 
Russ Garrett
r...@garrett.co.uk


How to use Cassandra on-node storage engine only?

2013-10-23 Thread Yasin Celik
I am developing an application for data storage. All the replication,
routing and data-retrieval types of business logic are handled in my
application. Up to now, the data has been stored in memory. Now, I want to use
the Cassandra storage engine to flush data from memory to the hard drive. I am
not sure if that is a correct approach.

My question: can I use the Cassandra data storage engine only? I do not
want to use Cassandra as a whole standalone product (in that case, I would have to
run one independent Cassandra per node and my application would act as a
client of Cassandra. That would put a lot of burden on each node, since it
puts unnecessary layers between my application and the storage engine).

I have my own replication, ring and routing code. I only need the on-node
storage facilities of Cassandra. I want to embed Cassandra in my
application as a library.

-- 
-
Yasin Celik


Re: Compaction issues

2013-10-23 Thread Michael Theroux
One more note,

When we did this conversion, we were on Cassandra 1.1.X.  You didn't mention
what version of Cassandra you were running.

Thanks,
-Mike

On Oct 23, 2013, at 10:05 AM, Michael Theroux wrote:

 When we made a similar move, for an unknown reason (I didn't hear any 
 feedback from the list when I asked why this might be), compaction didn't 
 start after we moved from SizedTiered to leveled compaction until I ran 
 nodetool compact keyspace column-family-converted-to-lcs.
 
 The thread is here:
 
 http://www.mail-archive.com/user@cassandra.apache.org/msg27726.html
 
 I've also seen other individuals on this list state that those pending 
 compaction stats didn't move unless the node was restarted.  Compaction 
 started to run several minutes after restart.
 
 Thanks,
 -Mike
 
 On Oct 23, 2013, at 9:14 AM, Russ Garrett wrote:
 
 Hi,
 
 We have a cluster which we've recently moved to use
 LeveledCompactionStrategy. We were experiencing some disk space
 issues, so we added two additional nodes temporarily to aid
 compaction. Once the compaction had completed on all nodes, we
 decommissioned the two temporary nodes.
 
 All nodes now have a high number of pending tasks which isn't dropping
 - they're remaining approximately static. There are constantly
 compaction tasks running, but when they complete, the pending tasks
 number doesn't drop. We've set the compaction rate limit to 0, and
 increased the number of compactor threads until the I/O utilisation is
 at maximum, but neither of these have helped.
 
 Any suggestions?
 
 Cheers,
 
 -- 
 Russ Garrett
 r...@garrett.co.uk
 



Re: Compaction issues

2013-10-23 Thread Russ Garrett
On 23 October 2013 15:05, Michael Theroux mthero...@yahoo.com wrote:
 When we made a similar move, for an unknown reason (I didn't hear any 
 feedback from the list when I asked why this might be), compaction didn't 
 start after we moved from SizedTiered to leveled compaction until I ran 
 nodetool compact keyspace column-family-converted-to-lcs.

Yup, I understand that's this issue (the fix is still unreleased):
https://issues.apache.org/jira/browse/CASSANDRA-6092

However, my suspicion is that adding/removing nodes has probably
kickstarted the compaction anyway. I've tried issuing nodetool
compact with no obvious improvement. Also, there are plenty of
compactions running - it just seems like the number of pending tasks
is never affected.

Good point on the version. I'm on 2.0.1.

-- 
Russ Garrett
r...@garrett.co.uk


Re: Wide rows (time series data) and ORM

2013-10-23 Thread Hiller, Dean
PlayOrm supports different types of wide rows like embedded list in the object, 
etc. etc.  There is a list of nosql patterns mixed with playorm patterns on 
this page

http://buffalosw.com/wiki/patterns-page/

From: Les Hartzman lhartz...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Tuesday, October 22, 2013 1:18 PM
To: user@cassandra.apache.org
Subject: Wide rows (time series data) and ORM

As I'm becoming more familiar with Cassandra I'm still trying to shift my 
thinking from relational to NoSQL.

Can Kundera work with wide rows in an ORM manner? In other words, can you 
actually design a POJO that fits the standard recipe for JPA usage? Would the 
queries return collections of the POJO to handle wide row data?

I had considered using Spring and JPA for Cassandra, but it appears that other 
than basic configuration issues for Cassandra, to use Spring and JPA on a 
Cassandra database seems like an effort in futility if Cassandra is used as a 
NoSQL database instead of mimicking an RDBMS solution.

If anyone can shed any light on this, I'd appreciate it.

Thanks.

Les



Re: Questions related to the data in SSTable files

2013-10-23 Thread Robert Coli
On Wed, Oct 23, 2013 at 5:23 AM, java8964 java8964 java8...@hotmail.com wrote:

 We enabled the major repair on every node every 7 days.


This is almost certainly the cause of your many duplicates.

If you don't DELETE heavily, consider changing gc_grace_seconds to 34 days
and then doing a repair on the first of the month.
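
For a CQL3 table that change would look something like this (34 days =
2937600 seconds; the keyspace/table name below is a placeholder):

ALTER TABLE my_ks.my_cf WITH gc_grace_seconds = 2937600;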


 If one node persistent a write, plus a hint of failed replication write,
 this write will still store as one write in its SSTable files, right? Why
 need to store 2 copies as duplication in SSTable files?


Write destined for replica nodes A B C.

Write comes into A.

Write fails but actually succeeds in replicating to B. A writes it as a
hint.

B flushes its memtable.

A then delivers hint to B, creating another copy of the identical write in
a memtable.

B then flushes this new memtable.

There are now two copies of the same write on disk.


 Here is the duplication count happened in our SSTable files. You can see a
 lot of data duplicate 2 times, but also some with even higher number. But
 max duplication count is 27, can one client retry 27 times?


This many duplicates are almost certainly a result of repair
over-repairing. Re-read this chunk from my previous mail :


 Repair has a fixed granularity, so the larger the size of your dataset the
 more over-repair any given repair will cause.

 Duplicates occur as a natural consequences of this, if you have 1 row
 which differs in the merkle tree chunk and the merkle tree chunk is, for
 example, 1000 rows.. you will repair one row and duplicate the other
 999.


Question #2 from your original mail is also almost certainly a result of
over-repair. The duplicate chunks can be from any time.

=Rob
PS - What cassandra version?


Re: Wide rows (time series data) and ORM

2013-10-23 Thread Les Hartzman
Hi Vivek,

What I'm looking for are a couple of things as I'm gaining an understanding
of Cassandra. With wide rows and time series data, how do you (or can you)
handle this data in an ORM manner? Now I understand that with CQL3, doing a
select * from time_series_data will return the data as multiple rows. So
does handling this data equal the way you would deal with any mapping of
objects to results in a relational manner? Would you still use a JPA
approach or is there a Cassandra/CQL3-specific way of interacting with the
database?

I expect to use a compound key for partitioning/clustering. For example I'm
planning on creating a table as follows:
  CREATE TABLE sensor_data (
      sensor_id       text,
      date            text,
      data_time_stamp timestamp,
      reading         int,
      PRIMARY KEY ( (sensor_id, date), data_time_stamp )
  );
The 'date' field will be day-specific so that for each day there will be a
new row created.
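(A day's readings then come back with a single-partition query, e.g.
SELECT * FROM sensor_data WHERE sensor_id = 's1' AND date = '2013-10-23';
where 's1' is just an example sensor id.)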

So will I be able to define a POJO, SensorData, with the fields shown above,
and basically process each 'row' returned by CQL as another SensorData
object?

Thanks.

Les



On Wed, Oct 23, 2013 at 1:22 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Can Kundera work with wide rows in an ORM manner?

 What specifically you looking for? Composite column based implementation
 can be built using Kundera.
 With Recent CQL3 developments, Kundera supports most of these. I think
 POJO needs to be aware of number of fields needs to be persisted(Same as
 CQL3)

 -Vivek


 On Wed, Oct 23, 2013 at 12:48 AM, Les Hartzman lhartz...@gmail.com wrote:

 As I'm becoming more familiar with Cassandra I'm still trying to shift my
 thinking from relational to NoSQL.

 Can Kundera work with wide rows in an ORM manner? In other words, can you
 actually design a POJO that fits the standard recipe for JPA usage? Would
 the queries return collections of the POJO to handle wide row data?

 I had considered using Spring and JPA for Cassandra, but it appears that
 other than basic configuration issues for Cassandra, to use Spring and JPA
 on a Cassandra database seems like an effort in futility if Cassandra is
 used as a NoSQL database instead of mimicking an RDBMS solution.

 If anyone can shed any light on this, I'd appreciate it.

 Thanks.

 Les





Re: Wide rows (time series data) and ORM

2013-10-23 Thread Les Hartzman
Thanks Dean. I'll check that page out.

Les


On Wed, Oct 23, 2013 at 7:52 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 PlayOrm supports different types of wide rows like embedded list in the
 object, etc. etc.  There is a list of nosql patterns mixed with playorm
 patterns on this page

 http://buffalosw.com/wiki/patterns-page/

 From: Les Hartzman lhartz...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Tuesday, October 22, 2013 1:18 PM
 To: user@cassandra.apache.org
 Subject: Wide rows (time series data) and ORM

 As I'm becoming more familiar with Cassandra I'm still trying to shift my
 thinking from relational to NoSQL.

 Can Kundera work with wide rows in an ORM manner? In other words, can you
 actually design a POJO that fits the standard recipe for JPA usage? Would
 the queries return collections of the POJO to handle wide row data?

 I had considered using Spring and JPA for Cassandra, but it appears that
 other than basic configuration issues for Cassandra, to use Spring and JPA
 on a Cassandra database seems like an effort in futility if Cassandra is
 used as a NoSQL database instead of mimicking an RDBMS solution.

 If anyone can shed any light on this, I'd appreciate it.

 Thanks.

 Les




Re: Wide rows (time series data) and ORM

2013-10-23 Thread Hiller, Dean
Another idea is the open source Energy Databus project, which does time series
data and is actually based on PlayORM (ORM is a bad name since it is more noSQL
patterns and not really relational).

http://www.nrel.gov/analysis/databus/

That Energy Databus project is mainly time series data with some meta data.  I
think NREL may be holding an Energy Databus summit soon (though again, it is
100% time series data, and renaming it to just Databus has been
talked about at NREL).

Dean

From: Les Hartzman lhartz...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, October 23, 2013 11:12 AM
To: user@cassandra.apache.org
Subject: Re: Wide rows (time series data) and ORM

Thanks Dean. I'll check that page out.

Les


On Wed, Oct 23, 2013 at 7:52 AM, Hiller, Dean
dean.hil...@nrel.gov wrote:
PlayOrm supports different types of wide rows like embedded list in the object, 
etc. etc.  There is a list of nosql patterns mixed with playorm patterns on 
this page

http://buffalosw.com/wiki/patterns-page/

From: Les Hartzman lhartz...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Tuesday, October 22, 2013 1:18 PM
To: user@cassandra.apache.org
Subject: Wide rows (time series data) and ORM

As I'm becoming more familiar with Cassandra I'm still trying to shift my 
thinking from relational to NoSQL.

Can Kundera work with wide rows in an ORM manner? In other words, can you 
actually design a POJO that fits the standard recipe for JPA usage? Would the 
queries return collections of the POJO to handle wide row data?

I had considered using Spring and JPA for Cassandra, but it appears that other 
than basic configuration issues for Cassandra, to use Spring and JPA on a 
Cassandra database seems like an effort in futility if Cassandra is used as a 
NoSQL database instead of mimicking an RDBMS solution.

If anyone can shed any light on this, I'd appreciate it.

Thanks.

Les




Re: [RELEASE] Apache Cassandra 1.2.11 released

2013-10-23 Thread Janne Jalkanen

Question - is https://issues.apache.org/jira/browse/CASSANDRA-6102 in 1.2.11 or 
not? CHANGES.txt says it's not, JIRA says it is.

/Janne (temporarily unable to check out the git repo)

On Oct 22, 2013, at 13:48 , Sylvain Lebresne sylv...@datastax.com wrote:

 The Cassandra team is pleased to announce the release of Apache Cassandra
 version 1.2.11.
 
 Cassandra is a highly scalable second-generation distributed database,
 bringing together Dynamo's fully distributed design and Bigtable's
 ColumnFamily-based data model. You can read more here:
 
  http://cassandra.apache.org/
 
 Downloads of source and binary distributions are listed in our download
 section:
 
  http://cassandra.apache.org/download/
 
 This version is a maintenance/bug fix release[1] on the 1.2 series. As always,
 please pay attention to the release notes[2] and Let us know[3] if you were to
 encounter any problem.
 
 Enjoy!
 
 [1]: http://goo.gl/xjiN74 (CHANGES.txt)
 [2]: http://goo.gl/r5pVU2 (NEWS.txt)
 [3]: https://issues.apache.org/jira/browse/CASSANDRA
 



Re: The performance difference of online bulk insertion and the file-based bulk loading

2013-10-23 Thread Chris Burroughs

On 10/15/2013 08:41 AM, José Elias Queiroga da Costa Araújo wrote:

- is there a way that we can warm up the cache after the
file-based bulk loading, so that we can allow the data to be cached first
in memory, and then afterwards, when we issue the bulk retrieval, the
performance can be closer to what is provided by the online bulk insertion?


Somewhat hacky, but you can at least warm up the OS page cache with `cat
FILES > /dev/null`.


Re: nodetool status reporting dead node as UN

2013-10-23 Thread Chris Burroughs
When debugging gossip-related problems (is this node really
down/dead/in some weird state?) you might have better luck looking at
`nodetool gossipinfo`.  The "UN even though everything is bad" thing
might be https://issues.apache.org/jira/browse/CASSANDRA-5913


I'm not sure exactly what happened in your case.  I'm also confused
about why an IP changed on restart.


On 10/17/2013 06:12 PM, Philip Persad wrote:

Hello,

I seem to have gotten my cluster into a bit of a strange state.
Pardon the rather verbose email, but there is a fair amount of
background.  I'm running a 3 node Cassandra 2.0.1 cluster.  This
particular cluster is used only rather intermittently for dev/testing
and does not see particularly heavy use, it's mostly a catch-all
cluster for environments which don't have a dedicated cluster to
themselves.  I noticed today that one of the nodes had died because
nodetool repair was failing due to a down replica.  I run nodetool
status and sure enough, one of my nodes shows up as down.

When I looked on the actual box, the cassandra process was up and
running and everything in the logs looked sensible.  The most
controversial thing I saw was 1 CMS Garbage Collection per hour, each
taking ~250 ms.  None the less, the node was not responding, so I
restarted it.  So far so good, everything is starting up, my ~30
column families across ~6 key spaces are all initializing.  The node
then handshakes with my other two nodes and reports them both as up.
Here is where things get strange.  According to the logs on the other
two nodes, the third node has come back up and all is well.  However
in the third node, I see a wall of the following in the logs (IP
addresses masked):

  INFO [GossipTasks:1] 2013-10-17 20:22:25,652 Gossiper.java (line 806)
InetAddress /x.x.x.222 is now DOWN
  INFO [GossipTasks:1] 2013-10-17 20:22:25,653 Gossiper.java (line 806)
InetAddress /x.x.x.221 is now DOWN
  INFO [HANDSHAKE-/10.21.5.222] 2013-10-17 20:22:25,655
OutboundTcpConnection.java (line 386) Handshaking version with
/x.x.x.222
  INFO [RequestResponseStage:3] 2013-10-17 20:22:25,658 Gossiper.java
(line 789) InetAddress /x.x.x.222 is now UP
  INFO [GossipTasks:1] 2013-10-17 20:22:26,654 Gossiper.java (line 806)
InetAddress /x.x.x.222 is now DOWN
  INFO [HANDSHAKE-/10.21.5.222] 2013-10-17 20:22:26,657
OutboundTcpConnection.java (line 386) Handshaking version with
/x.x.x.222
  INFO [RequestResponseStage:4] 2013-10-17 20:22:26,660 Gossiper.java
(line 789) InetAddress /x.x.x.222 is now UP
  INFO [RequestResponseStage:3] 2013-10-17 20:22:26,660 Gossiper.java
(line 789) InetAddress /x.x.x.222 is now UP
  INFO [GossipTasks:1] 2013-10-17 20:22:27,655 Gossiper.java (line 806)
InetAddress /x.x.x.222 is now DOWN
  INFO [HANDSHAKE-/10.21.5.222] 2013-10-17 20:22:27,660
OutboundTcpConnection.java (line 386) Handshaking version with
/x.x.x.222
  INFO [RequestResponseStage:4] 2013-10-17 20:22:27,662 Gossiper.java
(line 789) InetAddress /x.x.x.222 is now UP
  INFO [RequestResponseStage:3] 2013-10-17 20:22:27,662 Gossiper.java
(line 789) InetAddress /x.x.x.222 is now UP
  INFO [HANDSHAKE-/10.21.5.221] 2013-10-17 20:22:28,254
OutboundTcpConnection.java (line 386) Handshaking version with
/x.x.x.221
  INFO [GossipTasks:1] 2013-10-17 20:22:28,657 Gossiper.java (line 806)
InetAddress /x.x.x.222 is now DOWN
  INFO [RequestResponseStage:4] 2013-10-17 20:22:28,660 Gossiper.java
(line 789) InetAddress /x.x.x.221 is now UP
  INFO [RequestResponseStage:3] 2013-10-17 20:22:28,660 Gossiper.java
(line 789) InetAddress /x.x.x.221 is now UP
  INFO [HANDSHAKE-/10.21.5.222] 2013-10-17 20:22:28,661
OutboundTcpConnection.java (line 386) Handshaking version with
/x.x.x.222
  INFO [RequestResponseStage:4] 2013-10-17 20:22:28,663 Gossiper.java
(line 789) InetAddress /x.x.x.222 is now UP
  INFO [GossipTasks:1] 2013-10-17 20:22:29,658 Gossiper.java (line 806)
InetAddress /x.x.x.222 is now DOWN
  INFO [GossipTasks:1] 2013-10-17 20:22:29,660 Gossiper.java (line 806)
InetAddress /x.x.x.221 is now DOWN

Additionally, client requests to the cluster at consistency QUORUM start
failing (saying 2 responses were required but only 1 replica
responded).  According to nodetool status, all the nodes are up.

This is clearly not good.  I take down the problem node.  Nodetool
reports it down and QUORUM client reads/writes start working again.
In an attempt to get the cluster back into a good state, I delete all
the data on the problem node and then bring it back up.  The other two
nodes log a changed host ID for the IP of the node I wiped and then
handshake with it.  The problem node also comes up, but reads/writes
start failing again with the same error.

I decide to take the problem node down again.  However this time, even
after the process is dead, nodetool and the other two nodes report
that my third node is still up and requests to the cluster continue to
fail.  Running nodetool status against either of the live nodes shows
that all nodes are up.  Running nodetool status against 

Re: Huge multi-data center latencies

2013-10-23 Thread Chris Burroughs

On 10/21/2013 07:03 PM, Hobin Yoon wrote:

Another question is how do you get the local DC name?



Have a look at org.apache.cassandra.db.EndpointSnitchInfo.getDatacenter



Re: How to use Cassandra on-node storage engine only?

2013-10-23 Thread Chris Burroughs
As far as I know this had not been done before.  I would be interested 
in hearing how it turned out.


On 10/23/2013 09:47 AM, Yasin Celik wrote:

I am developing an application for data storage. All the replication,
routing and data retrieving types of business are handled in my
application. Up to now, the data is stored in memory. Now, I want to use
Cassandra storage engine to flush data from memory into hard drive. I am
not sure if that is a correct approach.

My question: Can I use the Cassandra data storage engine only? I do not
want to use Cassandra as a whole standalone product (In this case, I should
run one independent Cassandra per node and my application act as if it is
client of Cassandra. This idea will put a lot of burden on node since it
puts unnecessary levels between my application and storage engine).

I have my own replication, ring and routing code. I only need the on-node
storage facilities of Cassandra. I want to embed cassandra in my
application as a library.





Re: MemtablePostFlusher pending

2013-10-23 Thread Aaron Morton
On a plane and cannot check jira but…

 ERROR [FlushWriter:216] 2013-10-07 07:11:46,538 CassandraDaemon.java (line 
 186) Exception in thread Thread[FlushWriter:216,5,main]
 java.lang.AssertionError
 at 
 org.apache.cassandra.io.sstable.SSTableWriter.rawAppend(SSTableWriter.java:198)
This happened because we tried to write a row to disk that had zero columns and was
not a row-level tombstone.


 ERROR [ValidationExecutor:2] 2013-10-23 08:39:27,558 CassandraDaemon.java 
 (line 185) Exception in thread Thread[ValidationExecutor:2,1,main]
 java.lang.AssertionError
 at 
 org.apache.cassandra.db.compaction.PrecompactedRow.update(PrecompactedRow.java:171)
 at org.apache.cassandra.repair.Validator.rowHash(Validator.java:198)
 at org.apache.cassandra.repair.Validator.add(Validator.java:151)

I *think* this is happening for similar reasons. 

(notes to self below)…

public PrecompactedRow(CompactionController controller, List<SSTableIdentityIterator> rows)
{
    this(rows.get(0).getKey(),
         removeDeletedAndOldShards(rows.get(0).getKey(), controller,
                                   merge(rows, controller)));
}

results in a call to this on the ColumnFamilyStore:

public static ColumnFamily removeDeletedCF(ColumnFamily cf, int gcBefore)
{
    cf.maybeResetDeletionTimes(gcBefore);
    return cf.getColumnCount() == 0 && !cf.isMarkedForDelete() ? null : cf;
}

If the CF has zero columns and is not marked for delete, the CF will be null,
and the PrecompactedRow will be created with a null cf. This is the source of the
assertion.

Any information on how you are using Cassandra? Does the "zero columns, no row
delete" idea sound like something you are doing?


This may already be fixed. Will take a look later when on the ground. 

Cheers


-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder  Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 23/10/2013, at 9:50 PM, Kais Ahmed k...@neteck-fr.com wrote:

 Thanks robert,
 
 For info if it helps to fix the bug i'm starting the downgrade, i restart all 
 the node and do a repair and there are a lot of error like this :
 
 EERROR [ValidationExecutor:2] 2013-10-23 08:39:27,558 Validator.java (line 
 242) Failed creating a merkle tree for [repair 
 #9f9b7fc0-3bbe-11e3-a220-b18f7c69b044 on ks01/messages, 
 (8746393670077301406,8763948586274310360]], /172.31.38.135 (see log for 
 details)
 ERROR [ValidationExecutor:2] 2013-10-23 08:39:27,558 CassandraDaemon.java 
 (line 185) Exception in thread Thread[ValidationExecutor:2,1,main]
 java.lang.AssertionError
 at 
 org.apache.cassandra.db.compaction.PrecompactedRow.update(PrecompactedRow.java:171)
 at org.apache.cassandra.repair.Validator.rowHash(Validator.java:198)
 at org.apache.cassandra.repair.Validator.add(Validator.java:151)
 at 
 org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:798)
 at 
 org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:60)
 at 
 org.apache.cassandra.db.compaction.CompactionManager$8.call(CompactionManager.java:395)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 
 And the repair stop after this error :
 
 ERROR [FlushWriter:9] 2013-10-23 08:39:32,979 CassandraDaemon.java (line 185) 
 Exception in thread Thread[FlushWriter:9,5,main]
 java.lang.AssertionError
 at 
 org.apache.cassandra.io.sstable.SSTableWriter.rawAppend(SSTableWriter.java:198)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:186)
 at 
 org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:358)
 at 
 org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:317)
 at 
 org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)

 2013/10/22 Robert Coli rc...@eventbrite.com
 On Mon, Oct 21, 2013 at 11:57 PM, Kais Ahmed k...@neteck-fr.com wrote:
 I will try to create a new cluster 1.2 and copy data, can you tell me please 
 the best pratice to do this, do i have to use sstable2json / json2sstable or 
 other method.
 
 Unfortunately to downgrade versions you are going to need to use a method 
 like sstable2json/json2sstable.
 
 Other bulkload options, which mostly don't apply in the downgrade case, here :
 
 

Re: How to use Cassandra on-node storage engine only?

2013-10-23 Thread Aaron Morton
  Can I use the Cassandra data storage engine only?
 

You should be able to; it's pretty well architected.

I did a talk at Cassandra EU last week about the internals which will be 
helpful, look on the Planet Cassandra site it will be posted there soon. (I did 
the same talk at Cassandra SF this year as well.)

If you are looking at the code the getColumnFamily() and apply() functions on 
the o.a.c.db.Table class are the read and write calls you would want to use. 

Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder  Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 24/10/2013, at 2:47 AM, Yasin Celik yasinceli...@gmail.com wrote:

 
 
 I am developing an application for data storage. All the replication, routing 
 and data retrieving types of business are handled in my application. Up to 
 now, the data is stored in memory. Now, I want to use Cassandra storage 
 engine to flush data from memory into hard drive. I am not sure if that is a 
 correct approach.
 
 My question: Can I use the Cassandra data storage engine only? I do not want 
 to use Cassandra as a whole standalone product (In this case, I should run 
 one independent Cassandra per node and my application act as if it is client 
 of Cassandra. This idea will put a lot of burden on node since it puts 
 unnecessary levels between my application and storage engine).
 
 I have my own replication, ring and routing code. I only need the on-node 
 storage facilities of Cassandra. I want to embed cassandra in my application 
 as a library.
 
 
 -- 
 -
 Yasin Celik



Re: Compaction issues

2013-10-23 Thread Aaron Morton
 Also, there are plenty of
 compactions running - it just seems like the number of pending tasks
 is never affected.
Is there ever a time when the pending count is non zero but nodetool 
compactionstats does not show any running tasks ? 

If compaction cannot keep up you may be generating data faster than LCS can 
compact it. What sort of disks do you have?

What is the sstable_size_in_mb ? The old default was 5, the new one is 160. The
higher value will result in less IO.
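
For what it's worth, on 2.0 that can be changed per table with something like
the following (table name is a placeholder):

ALTER TABLE my_ks.my_cf WITH compaction =
    {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};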

Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder  Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 24/10/2013, at 3:16 AM, Russ Garrett r...@garrett.co.uk wrote:

 On 23 October 2013 15:05, Michael Theroux mthero...@yahoo.com wrote:
 When we made a similar move, for an unknown reason (I didn't hear any 
 feedback from the list when I asked why this might be), compaction didn't 
 start after we moved from SizedTiered to leveled compaction until I ran 
 nodetool compact keyspace column-family-converted-to-lcs.
 
 Yup, I understand that's this issue (the fix is still unreleased):
 https://issues.apache.org/jira/browse/CASSANDRA-6092
 
 However, my suspicion is that adding/removing nodes has probably
 kickstarted the compaction anyway. I've tried issuing nodetool
 compact with no obvious improvement. Also, there are plenty of
 compactions running - it just seems like the number of pending tasks
 is never affected.
 
 Good point on the version. I'm on 2.0.1.
 
 -- 
 Russ Garrett
 r...@garrett.co.uk



Re: Wide rows (time series data) and ORM

2013-10-23 Thread Vivek Mishra
Hi,
CREATE TABLE sensor_data (
    sensor_id       text,
    date            text,
    data_time_stamp timestamp,
    reading         int,
    PRIMARY KEY ( (sensor_id, date), data_time_stamp )
);

Yes, you can create a POJO for this and map each row exactly to a POJO
object.

Please have a look at:
https://github.com/impetus-opensource/Kundera/wiki/Using-Compound-keys-with-Kundera

There are users who have built production systems using Kundera; please refer to:
https://github.com/impetus-opensource/Kundera/wiki/Kundera-in-Production-Deployments
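
A minimal sketch of the mapping, following the compound-key pattern from the
wiki page above (class, field and persistence-unit names here are
illustrative, not from a real project):

import javax.persistence.*;  // standard JPA annotations, as Kundera uses them

@Embeddable
public class SensorDataKey
{
    @Column private String sensorId;
    @Column private String date;
    @Column private java.util.Date dataTimeStamp;
    // getters/setters omitted
}

@Entity
@Table(name = "sensor_data", schema = "sensorks@cassandra_pu")
public class SensorData
{
    @EmbeddedId private SensorDataKey key;
    @Column private int reading;
    // getters/setters omitted
}

A JPA query over SensorData then gives you one object per CQL row.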


I am working as a core committer on Kundera, so please do let me know if you
have any questions.

Sincerely,
-Vivek



On Wed, Oct 23, 2013 at 10:41 PM, Les Hartzman lhartz...@gmail.com wrote:

 Hi Vivek,

 What I'm looking for are a couple of things as I'm gaining an
 understanding of Cassandra. With wide rows and time series data, how do you
 (or can you) handle this data in an ORM manner? Now I understand that with
 CQL3, doing a select * from time_series_data will return the data as
 multiple rows. So does handling this data equal the way you would deal with
 any mapping of objects to results in a relational manner? Would you still
 use a JPA approach or is there a Cassandra/CQL3-specific way of interacting
 with the database?

 I expect to use a compound key for partitioning/clustering. For example
 I'm planning on creating a table as follows:
   CREATE TABLE sensor_data (
 sensor_id   text,
 date   text,
 data_time_stamptimestamp,
 reading  int,
 PRIMARY KEY ( (sensor_id, date),
 data_time_stamp) );
 The 'date' field will be day-specific so that for each day there will be a
 new row created.

 So will I be able to define a POJO, SensorData, with the fields show above
 and basically process each 'row' returned by CQL as another SensorData
 object?

 Thanks.

 Les



 On Wed, Oct 23, 2013 at 1:22 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Can Kundera work with wide rows in an ORM manner?

 What specifically you looking for? Composite column based implementation
 can be built using Kundera.
 With Recent CQL3 developments, Kundera supports most of these. I think
 POJO needs to be aware of number of fields needs to be persisted(Same as
 CQL3)

 -Vivek


 On Wed, Oct 23, 2013 at 12:48 AM, Les Hartzman lhartz...@gmail.com wrote:

 As I'm becoming more familiar with Cassandra I'm still trying to shift
 my thinking from relational to NoSQL.

 Can Kundera work with wide rows in an ORM manner? In other words, can
 you actually design a POJO that fits the standard recipe for JPA usage?
 Would the queries return collections of the POJO to handle wide row data?

 I had considered using Spring and JPA for Cassandra, but it appears that
 other than basic configuration issues for Cassandra, to use Spring and JPA
 on a Cassandra database seems like an effort in futility if Cassandra is
 used as a NoSQL database instead of mimicking an RDBMS solution.

 If anyone can shed any light on this, I'd appreciate it.

 Thanks.

 Les






read latencies?

2013-10-23 Thread Matt Mankins
Hi.

I have a table with about 300k rows in it, and am doing a query that returns 
about 800 results.

select * from fc.co WHERE thread_key = 'fastcompany:3000619';

The read latencies seem really high (upwards of 500 ms). Is this expected? Is
this a bad schema, or…? What's the best way to trace the bottleneck, besides this
tracing query:

http://pastebin.com/sherFpgY

Or, how would you interpret that?
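
(For reference, that trace came from turning tracing on in cqlsh before
running the query - roughly:

TRACING ON;
select * from fc.co WHERE thread_key = 'fastcompany:3000619';
)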

I'm not sure that row caches are being used, despite them being turned on in 
the cassandra.yaml file.

I'm using a 3 node cluster on amazon, using datastax community edition, 
cassandra 2.0.1, in the same EC2 availability zone. 

Many thanks,
@Mankins



Re: Wide rows (time series data) and ORM

2013-10-23 Thread Les Hartzman
Thanks Vivek. I'll look over those links tonight.



On Wed, Oct 23, 2013 at 4:20 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 CREATE TABLE sensor_data (
 sensor_id   text,
 date   text,
 data_time_stamptimestamp,
 reading  int,
 PRIMARY KEY ( (sensor_id, date),
 data_time_stamp) );

 Yes, you can create a POJO for this and map exactly with one row as a POJO
 object.

 Please have a look at:

 https://github.com/impetus-opensource/Kundera/wiki/Using-Compound-keys-with-Kundera

 There are users built production system using Kundera, please refer :

 https://github.com/impetus-opensource/Kundera/wiki/Kundera-in-Production-Deployments


 I am working as a core commitor in Kundera, please do let me know if you
 have any query.

 Sincerely,
 -Vivek



 On Wed, Oct 23, 2013 at 10:41 PM, Les Hartzman lhartz...@gmail.com wrote:

 Hi Vivek,

 What I'm looking for are a couple of things as I'm gaining an
 understanding of Cassandra. With wide rows and time series data, how do you
 (or can you) handle this data in an ORM manner? Now I understand that with
 CQL3, doing a select * from time_series_data will return the data as
 multiple rows. So does handling this data equal the way you would deal with
 any mapping of objects to results in a relational manner? Would you still
 use a JPA approach or is there a Cassandra/CQL3-specific way of interacting
 with the database?

 I expect to use a compound key for partitioning/clustering. For example
 I'm planning on creating a table as follows:
   CREATE TABLE sensor_data (
 sensor_id   text,
 date   text,
 data_time_stamptimestamp,
 reading  int,
 PRIMARY KEY ( (sensor_id, date),
 data_time_stamp) );
 The 'date' field will be day-specific so that for each day there will be
 a new row created.

 So will I be able to define a POJO, SensorData, with the fields show
 above and basically process each 'row' returned by CQL as another
 SensorData object?

 Thanks.

 Les



 On Wed, Oct 23, 2013 at 1:22 AM, Vivek Mishra mishra.v...@gmail.com wrote:

 Can Kundera work with wide rows in an ORM manner?

 What specifically you looking for? Composite column based implementation
 can be built using Kundera.
 With Recent CQL3 developments, Kundera supports most of these. I think
 POJO needs to be aware of number of fields needs to be persisted(Same as
 CQL3)

 -Vivek


 On Wed, Oct 23, 2013 at 12:48 AM, Les Hartzman lhartz...@gmail.com wrote:

 As I'm becoming more familiar with Cassandra I'm still trying to shift
 my thinking from relational to NoSQL.

 Can Kundera work with wide rows in an ORM manner? In other words, can
 you actually design a POJO that fits the standard recipe for JPA usage?
 Would the queries return collections of the POJO to handle wide row data?

 I had considered using Spring and JPA for Cassandra, but it appears
 that other than basic configuration issues for Cassandra, to use Spring and
 JPA on a Cassandra database seems like an effort in futility if Cassandra
 is used as a NoSQL database instead of mimicking an RDBMS solution.

 If anyone can shed any light on this, I'd appreciate it.

 Thanks.

 Les







RE: gc_grace_seconds to 0?

2013-10-23 Thread Arindam Barua

Forwarding to the group in case this helps out anyone else.


If so, should I set gc_grace_seconds to a lower non-zero value like 1-2 days?
Yes.

A
-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder  Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 23/10/2013, at 1:08 PM, Arindam Barua
aba...@247-inc.com wrote:


We are not doing deletes, but are setting TTLs of 8 days on most of our columns
(these are not updates to existing columns). Hence it seems safe to reduce
gc_grace_seconds even to 0 as far as tombstones coming back to life are concerned. The
motivation is that the lower gc_grace_seconds will help us reclaim tombstoned
data quickly; it really adds up otherwise, since it will linger
around for an additional 10 days by default. However, I'm concerned about the
hints being TTL'd to 0, which would mean hints would be effectively turned off.
If so, should I set gc_grace_seconds to a lower non-zero value like 1-2 days?

Thanks,
Arindam

-Original Message-
From: Aaron Morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, October 22, 2013 11:50 AM
To: Arindam Barua
Subject: Re: Decommissioned nodes not leaving and Hinted Handoff flood


I haven't seen documentation for this elsewhere. I'm using Cassandra 1.1.5, 
and the yaml seems to mentions max_hint_window_in_ms for that purpose.
max_hint_window_in_ms is how long we will collect hints for when a node is down.


We don't have any deletes in our application, and hence are considering 
making gc_grace_seconds 0. However, if this affects the ttls of hints, then 
we probably don't want it to be 0.
If you are not doing deletes you can leave gc_grace_seconds with the default.



thanks
aaron



-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder  Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 23/10/2013, at 6:06 AM, Arindam Barua
aba...@247-inc.com wrote:



Hi Aaron,

If you have a chance, can you please respond to a question I had related to one 
of your emails from earlier.


The hints are stored with a TTL that is the gc_grace_seconds for the CF at the
time the hint is written, so they will eventually be purged by compaction.
I haven't seen documentation for this elsewhere. I'm using Cassandra 1.1.5, and
the yaml only seems to mention max_hint_window_in_ms for that purpose.

We don't have any deletes in our application, and hence are considering making 
gc_grace_seconds 0. However, if this affects the ttls of hints, then we 
probably don't want it to be 0.

Thanks,
Arindam

[1]
hinted_handoff_enabled: true
# this defines the maximum amount of time a dead host will have hints
# generated.  After it has been dead this long, hints will be dropped.
max_hint_window_in_ms: 3600000 # one hour

From: Arindam Barua [mailto:aba...@247-inc.com]
Sent: Wednesday, October 16, 2013 8:32 PM
To: user@cassandra.apache.org
Subject: gc_grace_seconds to 0?


We don't do any deletes in our cluster, but do set ttls of 8 days on most of 
the columns. After reading a bunch of earlier threads, I have concluded that I 
can safely set gc_grace_seconds to 0 and not have to worry about expired 
columns coming back to life. However, I wanted to know if there is any other 
downside to setting gc_grace_seconds to 0. E.g., I saw a mention of the TTL of
hints being set to gc_grace_seconds.

Thanks,
Arindam