Re: Help on Cassandra Limitations

2013-09-06 Thread Sylvain Lebresne
Well, I don't know if that's what Patrick replied, but that's not correct.
The wording *is* correct, though it does use CQL3 terms.
In CQL3, the term partition is used to describe all the (CQL) rows that
share the same partition key (if you don't know what the latter is:
http://cassandra.apache.org/doc/cql3/CQL.html).
So it says that the number of rows sharing a particular partition key,
multiplied by their number of effective columns, is capped at 2 billion.

In Thrift terminology, this means a 'thrift row' (not to be confused
with a CQL3 row) cannot have more than 2 billion thrift columns.
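
To make that concrete, here is a minimal CQL3 sketch; the table and column
names are invented purely for illustration:

CREATE TABLE events (
  user_id text,
  event_time timestamp,
  payload text,
  details text,
  PRIMARY KEY (user_id, event_time)
);

-- All CQL rows with the same user_id form one partition (one 'thrift row').
-- Its cell count is roughly (number of CQL rows) x (number of non-key
-- columns), here 2 per row, and that total is what the 2 billion cap
-- applies to.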

--
Sylvain


On Fri, Sep 6, 2013 at 7:55 AM, Hannu Kröger hkro...@gmail.com wrote:

 I asked the same thing earlier and this is what Patrick McFadin replied:
 "It's not worded well. Essentially it's saying there is a 2B limit on a
 row. It should be worded a 'CQL row'."

 I hope it helps.

 Cheers,
 Hannu

 On 6.9.2013, at 8.20, J Ramesh Kumar rameshj1...@gmail.com wrote:

 Hi,

 http://wiki.apache.org/cassandra/CassandraLimitations

 In the above link, I found the following limitation:

 "The maximum number of cells (rows x columns) in a single partition is 2
 billion."

 Here, what does partition mean? Is it a node, a column family, or
 something else?

 Thanks,
 Ramesh




Re: Help on Cassandra Limitations

2013-09-06 Thread Hannu Kröger
Hi,

Well, that was a word-for-word quotation. :)

Anyway, I think what you just said is a better explanation than the two
previous ones. I hope it ends up on the wiki page, because what it says
there now is causing confusion, no matter how technically correct it is :)

Cheers,
Hannu


2013/9/6 Sylvain Lebresne sylv...@datastax.com

 Well, I don't know if that's what Patrick replied, but that's not correct.
 The wording *is* correct, though it does use CQL3 terms.
 In CQL3, the term partition is used to describe all the (CQL) rows that
 share the same partition key (if you don't know what the latter is:
 http://cassandra.apache.org/doc/cql3/CQL.html).
 So it says that the number of rows sharing a particular partition key,
 multiplied by their number of effective columns, is capped at 2 billion.

 In Thrift terminology, this means a 'thrift row' (not to be confused
 with a CQL3 row) cannot have more than 2 billion thrift columns.

 --
 Sylvain


 On Fri, Sep 6, 2013 at 7:55 AM, Hannu Kröger hkro...@gmail.com wrote:

 I asked the same thing earlier and this is what Patrick McFadin replied:
 "It's not worded well. Essentially it's saying there is a 2B limit on a
 row. It should be worded a 'CQL row'."

 I hope it helps.

 Cheers,
 Hannu

 On 6.9.2013, at 8.20, J Ramesh Kumar rameshj1...@gmail.com wrote:

 Hi,

 http://wiki.apache.org/cassandra/CassandraLimitations

 In the above link, I found the following limitation:

 "The maximum number of cells (rows x columns) in a single partition is 2
 billion."

 Here, what does partition mean? Is it a node, a column family, or
 something else?

 Thanks,
 Ramesh





Cassandra 1.2.4 - Unflushed data lost on restart

2013-09-06 Thread Thapar, Vishal (HP Networking)
I am running Cassandra 1.2.4 in standalone mode and see a data loss when I 
stop/start Cassandra [kill the task].

All data is written in atomic mutation batches. I can see the files in the
commitlog directory but they're not replayed. If I flush the data before
stopping Cassandra, it is available on restart.

These are the steps:

1. Start Cassandra and run the app that adds data.
2. Stop Cassandra and start it again.
3. Query the data: it is not available, not even with the CLI.

I am using default configuration file settings. My understanding is that any
data not flushed should be available in the commitlog and should get replayed
on restart.

- Vishal.




Cassandra Reads

2013-09-06 Thread Sridhar Chellappa
Folks,

When I read Column(s) from a table, does Cassandra read only that column?
Or, does it read the entire row into memory and then filter out the
contents to send only the requested column(s)?


Re: Cassandra crashes

2013-09-06 Thread Alex Major
Have you changed the appropriate config settings so that Cassandra will run
with only 2GB RAM? You shouldn't find the nodes go down.

Check out this blog post
http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/,
it outlines the configuration settings needed to run Cassandra on 64MB
RAM and might give you some insights.


On Wed, Sep 4, 2013 at 9:44 AM, Jan Algermissen
jan.algermis...@nordsc.com wrote:

 Hi,

 I have set up C* in a very limited environment: 3 VMs at digitalocean with
 2GB RAM and 40GB SSDs, so my expectations about overall performance are low.

 The keyspace uses a replication factor of 2.

 I am loading 1.5 million rows (each with 60 columns holding a mix of numbers
 and small texts, effectively 300,000 wide rows) in a quite 'aggressive' way,
 using the java-driver and async update statements.

 After a while of importing data, I start seeing timeouts reported by the
 driver:

 com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra
 timeout during write query at consistency ONE (1 replica were required but
 only 0 acknowledged the write

 and then later, host-unavailability exceptions:

 com.datastax.driver.core.exceptions.UnavailableException: Not enough
 replica available for query at consistency ONE (1 required but only 0
 alive).

 Looking at the 3 hosts, I see two C*s went down - which explains that I
 still see some writes succeeding (that must be the one host left,
 satisfying the consistency level ONE).


 The logs tell me AFAIU that the servers shut down due to reaching the heap
 size limit.

 I am irritated by the fact that the instances (it seems) shut themselves
 down instead of limiting their amount of work. I understand that I need to
 tweak the configuration and likely get more RAM, but still, I would
 actually be satisfied with reduced service (and likely more timeouts in the
 client).  Right now it looks as if I would have to slow down the client
 'artificially'  to prevent the loss of hosts - does that make sense?

 Can anyone explain whether this is intended behavior, meaning I'll just
 have to accept the self-shutdown of the hosts? Or alternatively, what data
 I should collect to investigate the cause further?

 Jan








Re: Cassandra Reads

2013-09-06 Thread Shahab Yunus
It only reads up to that column (a sequential scan, I believe) and does not
read the whole row. It uses a row-level column index to reduce the amount
of data read.
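
As a minimal CQL sketch (the table and names are invented for illustration,
and the exact read path is covered in the links below):

CREATE TABLE user_events (
  user_id text,
  event_time timestamp,
  category text,
  payload text,
  PRIMARY KEY (user_id, event_time)
);

-- Roughly speaking, Cassandra seeks to the requested cell inside the
-- partition (via the row-level column index once the partition is large
-- enough to have one) instead of materialising every column of every event
-- stored under this user_id.
SELECT category FROM user_events
WHERE user_id = 'u1' AND event_time = '2013-09-06 12:00:00+0000';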

Much more detail is available at the links below (the first 2-3 are must-reads, in fact):
http://thelastpickle.com/blog/2011/07/04/Cassandra-Query-Plans.html
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docsversion=1.2file=#cassandra/dml/dml_about_reads_c.html
http://www.roman10.net/how-apache-cassandra-read-works/
http://wiki.apache.org/cassandra/ArchitectureInternals

Regards,
Shahab


On Fri, Sep 6, 2013 at 6:28 AM, Sridhar Chellappa schellap2...@gmail.com wrote:

 Folks,

 When I read Column(s) from a table, does Cassandra read only that column?
 Or, does it read the entire row into memory and then filter out the
 contents to send only the requested column(s)?



Re: Help on Cassandra Limitations

2013-09-06 Thread Shahab Yunus
Also, Sylvain, you have a couple of great posts about the relationships between
CQL3/Thrift entities and naming issues:

http://www.datastax.com/dev/blog/cql3-for-cassandra-experts
http://www.datastax.com/dev/blog/thrift-to-cql3

I always refer to them when I get confused :)

Regards,
Shahab


On Fri, Sep 6, 2013 at 3:04 AM, Hannu Kröger hkro...@gmail.com wrote:

 Hi,

 Well, that was a word-for-word quotation. :)

 Anyway, I think what you just said is a better explanation than the two
 previous ones. I hope it ends up on the wiki page, because what it says
 there now is causing confusion, no matter how technically correct it is :)

 Cheers,
 Hannu


 2013/9/6 Sylvain Lebresne sylv...@datastax.com

 Well, I don't know if that's what Patrick replied, but that's not correct.
 The wording *is* correct, though it does use CQL3 terms.
 In CQL3, the term partition is used to describe all the (CQL) rows
 that share the same partition key (if you don't know what the latter is:
 http://cassandra.apache.org/doc/cql3/CQL.html).
 So it says that the number of rows sharing a particular partition key,
 multiplied by their number of effective columns, is capped at 2 billion.

 In Thrift terminology, this means a 'thrift row' (not to be confused
 with a CQL3 row) cannot have more than 2 billion thrift columns.

 --
 Sylvain


 On Fri, Sep 6, 2013 at 7:55 AM, Hannu Kröger hkro...@gmail.com wrote:

 I asked the same thing earlier and this is what Patrick McFadin replied:
 "It's not worded well. Essentially it's saying there is a 2B limit on a
 row. It should be worded a 'CQL row'."

 I hope it helps.

 Cheers,
 Hannu

 On 6.9.2013, at 8.20, J Ramesh Kumar rameshj1...@gmail.com wrote:

 Hi,

 http://wiki.apache.org/cassandra/CassandraLimitations

 In the above link, I found the following limitation:

 "The maximum number of cells (rows x columns) in a single partition is 2
 billion."

 Here, what does partition mean? Is it a node, a column family, or
 something else?

 Thanks,
 Ramesh






Cannot get secondary indexes on fields in compound primary key to work (Cassandra 2.0.0)

2013-09-06 Thread Petter von Dolwitz (Hem)
I am struggling with getting secondary indexes to work. I have created
secondary indexes on some fields that are part of the compound primary key,
but only one of the indexes seems to work (the one set on the field 'e' in
the table definition below). Using any other secondary index in a where
clause causes the message "Request did not complete within rpc_timeout."
It seems like if I put a value in the where clause that does not exist in a
column with a secondary index, then Cassandra quickly returns with the result
(0 rows), but if I put in a value that does exist, I get a timeout. There is
no exception in the logs in connection with this. I've tried to increase the
timeout to a minute but it does not help.

I am currently running on a single machine with 12 GB RAM and the heap set
to 8 GB. The table and indexes occupy 1 GB of disk space. I query the table
with cqlsh.

The issue below should have fixed something in this area. Is this still a
problem, or is it something in my environment/design that is the problem?
https://issues.apache.org/jira/browse/CASSANDRA-5851

/Petter

TABLE DEFINITION-

CREATE TABLE x.y (
  a varchar,
  b varchar,
  c varchar,
  d varchar,
  e varchar,
  f varchar,
  g varchar,
  h varchar,
  i int,
  j varchar,
  k varchar,
  l timestamp,
  m blob,
  PRIMARY KEY (a, b, c, d, e, f, g, h, i, j, k, l)
);

CREATE INDEX d_idx
  ON x.y ( d );

CREATE INDEX e_idx
  ON x.y ( e );

CREATE INDEX f_idx
  ON x.y ( f );

CREATE INDEX g_idx
  ON x.y ( g );

CREATE INDEX h_idx
  ON x.y ( h );

CREATE INDEX i_idx
  ON x.y ( i );

CREATE INDEX j_idx
  ON x.y ( j );

CREATE INDEX k_idx
  ON x.y ( k );


Re: Cassandra crashes

2013-09-06 Thread Jan Algermissen

On 06.09.2013, at 13:12, Alex Major al3...@gmail.com wrote:

 Have you changed the appropriate config settings so that Cassandra will run 
 with only 2GB RAM? You shouldn't find the nodes go down.
 
 Check out this blog post 
 http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/
  , it outlines the configuration settings needed to run Cassandra on 64MB RAM 
 and might give you some insights.

Yes, I have my fingers on the knobs and have also seen the article you mention
- very helpful indeed, as were the replies so far. Thanks very much.

However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my data 
import :-(

Now, while it would be easy to scale out and up a bit until the default config
of C* is sufficient, I would really like to dive deep and try to understand why
the thing is still going down, IOW, which of my config settings is so darn wrong
that in most cases kill -9 remains the only way to shut down the Java process in
the end.


The problem seems to be the heap size (set to MAX_HEAP_SIZE=640M and
HEAP_NEWSIZE=120M) in combination with some Cassandra activity that demands
too much heap, right?

So how do I find out what activity this is, and how do I sufficiently reduce
that activity?

What bugs me in general is that, AFAIU, C* is so eager to deliver massive write
speed that it sort of forgets to protect itself from client demand. I would
very much like to understand why and how that happens. I mean: no matter how
many clients are flooding the database, it should not die due to out-of-memory
situations, regardless of any configuration specifics, should it?


tl;dr

Currently my client side (with java-driver) after a while reports more and more 
timeouts and then the following exception:

com.datastax.driver.core.exceptions.DriverInternalError: An unexpected error
occured server side: java.lang.OutOfMemoryError: unable to create new native
thread;

On the server side, my cluster remains more or less in this condition:

DN  x   71,33 MB  256  34,1%  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  rack1
UN  x  189,38 MB  256  32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f  rack1
UN  x  198,49 MB  256  33,9%  0c2931a9-6582-48f2-b65a-e406e0bf1e56  rack1

The host that is down (it is the seed host, if that matters) still shows the 
running java process, but I cannot shut down cassandra or connect with 
nodetool, hence kill -9 to the rescue.

In that host, I still see a load of around 1.

jstack -F lists 892 threads, all blocked, except for 5 inactive ones.


The system.log after a few seconds of import shows the following exception:

java.lang.AssertionError: incorrect row data size 771030 written to /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db; correct is 771200
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
        at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)


And then, after about 2 minutes there are out of memory errors:

ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:5,1,main]
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:693)
        at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer.init(ParallelCompactionIterable.java:296)
        at org.apache.cassandra.db.compaction.ParallelCompactionIterable.iterator(ParallelCompactionIterable.java:73)
        at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:120)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
 

Re: Cassandra 1.2.4 - Unflushed data lost on restart

2013-09-06 Thread sankalp kohli
You should be using replication. Not all machines will power off at the
same time.
Regarding changing the fsync setting, even if you choose it to be fully
sync, there have been many studies which have shown that data was lost on
many SSDs even after fsync has returned.
So I will fix this problem by replication and not changing the fsync stuff.


On Fri, Sep 6, 2013 at 10:21 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

 Are you not using RF = 3 ?


 On Fri, Sep 6, 2013 at 10:14 AM, Thapar, Vishal (HP Networking) 
 vtha...@hp.com wrote:

 My usage requirements are such that there should be the least possible data
 loss, even in the case of a power-off. When you say clean shutdown, do you
 mean a Cassandra service stop?

 I ran across this issue when testing out different scenarios to figure
 out what would be the best configuration for my requirements; I will consider
 batch mode, as a 10-second window might be too much for some of my use cases.

 Thanks for the reply,
 Vishal.
 ---
 From: Robert Coli [mailto:rc...@eventbrite.com]
 Sent: Friday, September 06, 2013 10:35 PM
 To: user@cassandra.apache.org
 Subject: Re: Cassandra 1.2.4 - Unflushed data lost on restart

 On Fri, Sep 6, 2013 at 2:42 AM, Thapar, Vishal (HP Networking) 
 vtha...@hp.com wrote:
 I am running Cassandra 1.2.4 in standalone mode and see a data loss when
 I stop/start Cassandra [kill the task].

 Clean shutdown waits for commitlog flush. Unclean shutdown does not. In
 default periodic mode, this could mean up to 10 seconds of loss. If you
 don't want to lose up to 10 seconds, use batch commitlog mode or... just
 shut down cleanly?

 There are also cases in your vintage Cassandra where data in secondary
 indexes has not been available, in case you are querying on those.

 =Rob





Tuning for heavy write load with limited RAM

2013-09-06 Thread Jan Algermissen
Trying to approach this in a bit more structured way to make it more helpful 
for others.

AFAIU my problem seems to be the combination of heavy write load and very
limited RAM (2GB).

C* design seems to cause nodes to run out of heap space instead of reducing 
processing of incoming writes.

What I am looking for is a configuration setting that keeps C* stable and puts
the burden on the client to slow down if it wants all writes to be handled.
(Sure, we can scale out - but the point here is really that I want to configure
C* to protect itself better from being overwhelmed by writes.)

What I think I understand so far is that there are four major areas to look at:

- Cache sizes
- Compaction
- GC
- Request handling

* Cache sizes *
Cache sizes are solely relevant to reads, correct? As I do not do reads for
now, I have set everything I found about key and row caches to 0. Anything
else?

* Compaction *
I do not fully understand how compaction thresholds mix with GC. For example, 
there is a comment[1] in cassandra.yaml that tells me that other config 
switches are far more important than the threshold defined by 
flush_largest_memtables_at.

I did set
- flush_largest_memtables_at: 0.50
- CMSInitiatingOccupancyFraction: 50
Does that make sense or should I go lower, i.e. 0.1 and 10?

In addition, I have disabled compaction throttling (set to 0). Makes sense?

And I did set
- in_memory_compaction_limit_in_mb: 1
Is that actually good or bad?

* GC *
Besides CMSInitiatingOccupancyFraction, I do not really have an understanding
of what else regarding GC I should do to prevent the OutOfMemoryError I see in
my logs.

* Request Handling *
The goal here would be to find a configuration that prevents request processing 
when C* is still busy writing to disk (which is what *I* want in this case, 
right?)

I have set
- rpc_server_type: hsha   (though I only have one client, so sync would 
not make a difference)
- rpc_min_threads: 1
- rpc_max_threads: 1
(which also renders hsha vs sync irrelevant, right?)

Do you have any suggestions as to what I could specifically monitor to see how
the causes of the out-of-memory error develop, and what other switches I should
try out?
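
For reference, here are the settings mentioned above collected in one place in
cassandra.yaml form; the values are simply the ones stated in this message (not
recommendations), and the heap settings (MAX_HEAP_SIZE, HEAP_NEWSIZE,
CMSInitiatingOccupancyFraction) live in cassandra-env.sh rather than here:

# cassandra.yaml excerpt with the values currently in use in this experiment
key_cache_size_in_mb: 0
row_cache_size_in_mb: 0
flush_largest_memtables_at: 0.50
in_memory_compaction_limit_in_mb: 1
compaction_throughput_mb_per_sec: 0    # 0 disables compaction throttling
rpc_server_type: hsha
rpc_min_threads: 1
rpc_max_threads: 1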

Jan


[1]
# emergency pressure valve: each time heap usage after a full (CMS)
# garbage collection is above this fraction of the max, Cassandra will
# flush the largest memtables.
#
# Set to 1.0 to disable.  Setting this lower than
# CMSInitiatingOccupancyFraction is not likely to be useful.
#
# RELYING ON THIS AS YOUR PRIMARY TUNING MECHANISM WILL WORK POORLY:
# it is most effective under light to moderate load, or read-heavy
# workloads; under truly massive write load, it will often be too
# little, too late.
flush_largest_memtables_at: 0.50











Re: Cassandra 1.2.4 - Unflushed data lost on restart

2013-09-06 Thread Robert Coli
On Fri, Sep 6, 2013 at 2:42 AM, Thapar, Vishal (HP Networking) 
vtha...@hp.com wrote:

 I am running Cassandra 1.2.4 in standalone mode and see a data loss when I
 stop/start Cassandra [kill the task].


Clean shutdown waits for commitlog flush. Unclean shutdown does not. In
default periodic mode, this could mean up to 10 seconds of loss. If you
don't want to lose up to 10 seconds, use batch commitlog mode or... just
shut down cleanly?
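
For reference, the commitlog sync behaviour is controlled in cassandra.yaml;
this is a sketch of the two modes, with illustrative values (the 10 seconds
mentioned above corresponds to the default periodic setting):

# periodic (the default): the commitlog is fsynced every
# commitlog_sync_period_in_ms, so an unclean shutdown can lose up to that
# window of acknowledged writes
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

# batch: writes are only acknowledged after the commitlog has been fsynced,
# grouping syncs within the batch window (use instead of the two lines above)
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 50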

There are also cases in your vintage Cassandra where data in secondary
indexes has not been available, in case you are querying on those.

=Rob


RE: Cassandra 1.2.4 - Unflushed data lost on restart

2013-09-06 Thread Thapar, Vishal (HP Networking)
My usage requirements are such that there should be the least possible data loss,
even in the case of a power-off. When you say clean shutdown, do you mean a
Cassandra service stop?

I ran across this issue when testing out different scenarios to figure out what
would be the best configuration for my requirements; I will consider batch mode,
as a 10-second window might be too much for some of my use cases.

Thanks for the reply,
Vishal.
---
From: Robert Coli [mailto:rc...@eventbrite.com] 
Sent: Friday, September 06, 2013 10:35 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra 1.2.4 - Unflushed data lost on restart

On Fri, Sep 6, 2013 at 2:42 AM, Thapar, Vishal (HP Networking) vtha...@hp.com 
wrote:
I am running Cassandra 1.2.4 in standalone mode and see a data loss when I 
stop/start Cassandra [kill the task].

Clean shutdown waits for commitlog flush. Unclean shutdown does not. In default 
periodic mode, this could mean up to 10 seconds of loss. If you don't want to 
lose up to 10 seconds, use batch commitlog mode or... just shut down cleanly?

There are also cases in your vintage Cassandra where data in secondary indexes 
has not been available, in case you are querying on those.

=Rob 


Is there a client side method of determining the Cassandra version to which it is connected?

2013-09-06 Thread Dwight Smith
This question is specific to Thrift, but since we are in the process of moving to
CQL, an answer for either client will be fine.

Thanks

Re: is there a SSTableInput for Map/Reduce instead of ColumnFamily?

2013-09-06 Thread Jim Ancona
Unfortunately, Netflix doesn't seem to have released Aegisthus as open
source.

Jim


On Fri, Aug 30, 2013 at 1:44 PM, Jeremiah D Jordan 
jeremiah.jor...@gmail.com wrote:

 FYI:
 http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html

 -Jeremiah

 On Aug 30, 2013, at 9:21 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

  is there a SSTableInput for Map/Reduce instead of ColumnFamily (which
 uses thrift)?
 
  We are not worried about repeated reads since we are idempotent but
 would rather have the direct speed (even if we had to read from a snapshot,
 it would be fine).
 
  (We would most likely run our M/R on 4 nodes of the 12 nodes we have
 since we have RF=3 right now).
 
  Thanks,
  Dean




Re: Is there a client side method of determining the Cassandra version to which it is connected?

2013-09-06 Thread Nate McCall
You can get a good idea from describe_version, which returns the Thrift API
version. Here are some mappings I have in my notes:

1.1.12 has an API version of 19.33.0
1.2.0 has an API version of 19.35.0
1.2.8 has an API version of 19.36.0
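
If the client speaks CQL, another option (my addition, not something mentioned
above; it assumes Cassandra 1.2 or later, where the system.local table exists)
is to ask the server directly:

-- Returns the server's Cassandra release version, e.g. 1.2.8
SELECT release_version FROM system.local;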


On Fri, Sep 6, 2013 at 1:15 PM, Dwight Smith dwight.sm...@genesyslab.com wrote:

 This question is specific to Thrift, but since we are in the process of moving
 to CQL, an answer for either client will be fine.


 Thanks



Re: Cannot get secondary indexes on fields in compound primary key to work (Cassandra 2.0.0)

2013-09-06 Thread Robert Coli
On Fri, Sep 6, 2013 at 6:18 AM, Petter von Dolwitz (Hem) 
petter.von.dolw...@gmail.com wrote:

 I am struggling with getting secondary indexes to work. I have created
 secondary indexes on some fields that are part of the compound primary key,
 but only one of the indexes seems to work (the one set on the field 'e' in
 the table definition below). Using any other secondary index in a where
 clause causes the message "Request did not complete within rpc_timeout."
 It seems like if I put a value in the where clause that does not exist in a
 column with a secondary index, then Cassandra quickly returns with the result
 (0 rows), but if I put in a value that does exist, I get a timeout. There is
 no exception in the logs in connection with this. I've tried to increase the
 timeout to a minute but it does not help.


In general, unless you absolutely need the atomicity of updating a secondary
index together with the underlying storage row, you are better off making a
manual secondary index column family.
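
As a minimal sketch of what such a manual index could look like for the schema
quoted above (this layout is my assumption, not part of the reply; the
application has to write to both tables and keep them in sync):

-- Index table keyed by d; its columns are the primary key columns of x.y,
-- so a lookup on d yields the keys needed to read the base rows from x.y.
CREATE TABLE x.y_by_d (
  d varchar,
  a varchar,
  b varchar,
  c varchar,
  e varchar,
  f varchar,
  g varchar,
  h varchar,
  i int,
  j varchar,
  k varchar,
  l timestamp,
  PRIMARY KEY (d, a, b, c, e, f, g, h, i, j, k, l)
);

-- Query pattern: first fetch the base-table keys for a given d, then read
-- the matching rows from x.y by primary key.
SELECT a, b, c, e, f, g, h, i, j, k, l FROM x.y_by_d WHERE d = 'some-value';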

=Rob


Re: Cassandra 1.2.4 - Unflushed data lost on restart

2013-09-06 Thread Mohit Anchlia
Are you not using RF = 3 ?

On Fri, Sep 6, 2013 at 10:14 AM, Thapar, Vishal (HP Networking) 
vtha...@hp.com wrote:

 My usage requirements are such that there should be the least possible data
 loss, even in the case of a power-off. When you say clean shutdown, do you
 mean a Cassandra service stop?

 I ran across this issue when testing out different scenarios to figure out
 what would be the best configuration for my requirements; I will consider
 batch mode, as a 10-second window might be too much for some of my use cases.

 Thanks for the reply,
 Vishal.
 ---
 From: Robert Coli [mailto:rc...@eventbrite.com]
 Sent: Friday, September 06, 2013 10:35 PM
 To: user@cassandra.apache.org
 Subject: Re: Cassandra 1.2.4 - Unflushed data lost on restart

 On Fri, Sep 6, 2013 at 2:42 AM, Thapar, Vishal (HP Networking) 
 vtha...@hp.com wrote:
 I am running Cassandra 1.2.4 in standalone mode and see a data loss when I
 stop/start Cassandra [kill the task].

 Clean shutdown waits for commitlog flush. Unclean shutdown does not. In
 default periodic mode, this could mean up to 10 seconds of loss. If you
 don't want to lose up to 10 seconds, use batch commitlog mode or... just
 shut down cleanly?

 There are also cases in your vintage Cassandra where data in secondary
 indexes has not been available, in case you are querying on those.

 =Rob