Re: is the select result grouped by the value of the partition key?

2013-04-14 Thread aaron morton
 
 Is it guaranteed that the rows are grouped by the value of the
 partition key? That is, is it guaranteed that I'll get
yes.
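For illustration, a minimal cqlsh sketch of why (using the table from the
question below): each partition is read and returned as a unit, so rows that
share a partition key always come back together, ordered by the clustering
column within the partition.

create table t (k1 varchar, k2 varchar, value varchar, primary key (k1, k2));
select * from t where k1 in ('a', 'z');
-- all rows for k1 = 'a' are returned contiguously (ordered by k2), then all
-- rows for k1 = 'z'; the two partitions are never interleaved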


-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 12/04/2013, at 7:24 PM, Sorin Manolache sor...@gmail.com wrote:

 On 2013-04-11 22:10, aaron morton wrote:
 Is it guaranteed that the rows are grouped by the value of the
 partition key? That is, is it guaranteed that I'll get
 Your primary key (k1, k2) is considered in two parts: (partition_key,
 grouping_columns). In your case the partition key is k1 and the grouping
 column is k2. Columns are ordered by the grouping column, k2.
 
 See http://thelastpickle.com/2013/01/11/primary-keys-in-cql/
 
 Thank you for the answer.
 
 However my question was about the _grouping_ (not ordering) of _rows_ (not 
 columns).
 
 Sorin
 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 12/04/2013, at 3:19 AM, Sorin Manolache sor...@gmail.com
 mailto:sor...@gmail.com wrote:
 
 Hello,
 
 Let us consider that we have a table t created as follows:
 
 create table t(k1 varchar, k2 varchar, value varchar, primary key (k1,
 k2));
 
 Its contents is
 
 a m x
 a n y
 z 0 9
 z 1 8
 
 and I perform a
 
 select * from t where k1 in ('a', 'z');
 
 Is it guaranteed that the rows are grouped by the value of the
 partition key? That is, is it guaranteed that I'll get
 
 a m x
 a n y
 z 0 9
 z 1 8
 
 or
 
 a n y
 a m x
 z 1 8
 z 0 9
 
 or even
 
 z 0 9
 z 1 8
 a n y
 a m x
 
 but NEVER
 
 a m x
 z 0 9
 a n y
 z 1 8
 
 
 Thank you,
 Sorin
 
 



Re: multiple Datacenter values in PropertyFileSnitch

2013-04-14 Thread aaron morton
 So that 2 apps with the same very high load pattern are not clashing.
I'm not sure what the advantage is of putting two apps in the same cluster but
then using the replication strategy settings to keep them on different nodes. The
reason to put the apps in the same cluster is to share the resources.
 
Having a different number of nodes in different DC's and mixing the RF between 
them can get complicated. 

What sort of load are you considering? IMHO the simple thing is to do some
capacity planning and, when in doubt, start with one multi-DC cluster with the
same RF in both.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 12/04/2013, at 7:33 PM, Andras Szerdahelyi 
andras.szerdahe...@ignitionone.com wrote:

 I would replicate your different keyspaces to different DCs and scale those 
 appropriately 
 So, for example, HighLoad KS replicates to really-huge-dc, which would have, 
 10 nodes, LowerLoad KS replicates to smaller-dc with 5 nodes.
 The idea is, you do not mix your different keyspaces in the same datacenter
 (this is possible with NetworkTopologyStrategy); or, for redundancy/HA purposes,
 you place a single replica in the other keyspace's DC but direct your
 applications to the primary DC of the keyspace, with LOCAL_QUORUM or ONE
 reads.
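 
 As a sketch in CQL3 (the keyspace and DC names are taken from the example
 above; the replica counts are chosen purely for illustration):
 
 CREATE KEYSPACE highload
   WITH replication = {'class': 'NetworkTopologyStrategy',
                       'really-huge-dc': 3, 'smaller-dc': 1};
 -- three replicas in the keyspace's primary DC, a single replica in the
 -- other DC for redundancy; read with LOCAL_QUORUM against the primary DC.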
 
 Regards,
 Andras
 
 From: Matthias Zeilinger matthias.zeilin...@bwinparty.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Friday 12 April 2013 07:57
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: RE: multiple Datacenter values in PropertyFileSnitch
 
 I'm using its own keyspace for each application.
 What I want is to split up by different load patterns,
 so that 2 apps with the same very high load pattern are not clashing.
  
 For other load patterns I want to use another splitting.
  
 Is there any best practice, or should I scale out so that the complete load
 can be distributed over all nodes?
  
 Br,
 Matthias Zeilinger
 Production Operation – Shared Services
  
 P: +43 (0) 50 858-31185
 M: +43 (0) 664 85-34459
 E: matthias.zeilin...@bwinparty.com
  
 bwin.party services (Austria) GmbH
 Marxergasse 1B
 A-1030 Vienna
  
 www.bwinparty.com
  
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: Donnerstag, 11. April 2013 20:48
 To: user@cassandra.apache.org
 Subject: Re: multiple Datacenter values in PropertyFileSnitch
  
 A node can only exist in one DC and one rack. 
  
 Use different keyspaces as suggested. 
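  
 For reference, in cassandra-topology.properties each node maps to exactly
 one DC:rack pair, so there is no way to list a node twice. A sketch with
 hypothetical IPs:
 
 # cassandra-topology.properties (a sketch)
 192.168.1.1=DC-A:RAC1
 192.168.1.2=DC-B:RAC1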
  
 Cheers
  
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
  
 @aaronmorton
 http://www.thelastpickle.com
  
 On 12/04/2013, at 1:47 AM, Jabbar Azam aja...@gmail.com wrote:
 
 
 Hello,
 
 I'm not an expert but I don't think you can do what you want. The way to 
 separate data for applications on the same cluster is to use different tables 
 for different applications or use multiple keyspaces, a keyspace per 
 application. The replication factor you specify for each keyspace specifies 
 how many copies of the data are stored in each datacenter.
 
 You can't specify that data for a particular application is stored on a 
 specific node, unless that node is in its own cluster.
 
 I think of a cassandra cluster as a shared resource where all the 
 applications have access to all the nodes in the cluster.
  
 
 Thanks
 
 Jabbar Azam
  
 
 On 11 April 2013 14:13, Matthias Zeilinger matthias.zeilin...@bwinparty.com 
 wrote:
 Hi,
  
 I would like to create big cluster for many applications.
 Within this cluster I would like to separate the data for each application, 
 which can be easily done via different virtual datacenters and the correct 
 replication strategy.
 What I would like to know is whether I can specify multiple values for 1 node
 in the PropertyFileSnitch configuration, so that I can use 1 node for more
 applications.
 For example:
 6 nodes:
 3 for App A
 3 for App B
 4 for App C
  
 I want to have such a configuration:
 Node 1 – DC-A & DC-C
 Node 2 – DC-B & DC-C
 Node 3 – DC-A & DC-C
 Node 4 – DC-B & DC-C
 Node 5 – DC-A
 Node 6 – DC-B
  
 Is this possible or does anyone have another solution for this?
  
  
 Thx & br matthias
  
  



Re: Exception for version 1.1.0

2013-04-14 Thread aaron morton
Always read the news.txt guide 
https://github.com/apache/cassandra/blob/cassandra-1.2/NEWS.txt

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 12/04/2013, at 8:54 PM, Winsdom Chen wins...@gmail.com wrote:

 Hi Aaron,
 Thanks for your reply! I've checked the release notes;
 the patch was applied in 1.2.3. If upgrading from 1.1.0 to
 1.2.3, is any data migration or other effort needed?



Re: running cassandra on 8 GB servers

2013-04-14 Thread aaron morton
 ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line 164) 
 Exception in thread Thread[Thrift:641,5,main]
 java.lang.OutOfMemoryError: Java heap space
It's easier for people to help if you provide the error stack. Does this happen
at startup or after it has been running for a while?

What are the full JVM startup params? 
How many CFs do you have and how many rows per node?
Are you using the key cache and what is it set to?
Double check you are using the serialising row cache provider (in the yaml 
file). 
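
For reference, the relevant cassandra.yaml lines (a sketch; the serialising
provider keeps cached rows off heap):

row_cache_size_in_mb: 0            # 0 disables the row cache
row_cache_provider: SerializingCacheProvider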

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 12/04/2013, at 8:53 AM, Nikolay Mihaylov n...@nmmm.nu wrote:

 I am using 1.2.3. I first used the default heap (2 GB) without JNA installed,
 then modified the heap to 4 GB / 400 MB young generation, with JNA installed.
 The bloom filter on the CFs is lowered (more false positives, less disk space).
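 
 For reference, a sketch of those heap settings as they appear in
 cassandra-env.sh:
 
 MAX_HEAP_SIZE="4G"
 HEAP_NEWSIZE="400M"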
 
  WARN [ScheduledTasks:1] 2013-04-11 11:09:41,899 GCInspector.java (line 142) 
 Heap is 0.9885574036095974 full.  You may need to reduce memtable and/or 
 cache sizes.  Cassandra will now flush up to the two largest memtables to 
 free up memory.  Adjust flush_largest_memtables_at threshold in 
 cassandra.yaml if you don't want Cassandra to do this automatically
  WARN [ScheduledTasks:1] 2013-04-11 11:09:41,906 StorageService.java (line 
 3541) Flushing CFS(Keyspace='CRAWLER', ColumnFamily='counters') to relieve 
 memory pressure
  INFO [ScheduledTasks:1] 2013-04-11 11:09:41,949 ColumnFamilyStore.java (line 
 637) Enqueuing flush of Memtable-counters@862481781(711504/6211531 
 serialized/live bytes, 11810 ops)
 ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line 164) 
 Exception in thread Thread[Thrift:641,5,main]
 java.lang.OutOfMemoryError: Java heap space
 
 
 On Thu, Apr 11, 2013 at 11:26 PM, aaron morton aa...@thelastpickle.com 
 wrote:
  The data will be huge, I am estimating 4-6 TB per server. I know this is not
  ideal, but those are my resources.
 You will have a very unhappy time.
 
The general rule of thumb / guideline for an HDD based system with 1G
networking is 300GB to 500GB per node. See previous discussions on this topic
for reasons.
 
  ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line 164) 
  Exception in thread Thread[Thrift:641,5,main]
  ...
   INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915 
  ThriftServer.java (line 116) Stop listening to thrift clients
 What was the error ?
 
 What version are you using?
 If you have changed any defaults for memory in cassandra-env.sh or 
 cassandra.yaml revert them. Generally C* will do the right thing and not OOM, 
 unless you are trying to store a lot of data on a node that does not have 
 enough memory. See this thread for background 
 http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 12/04/2013, at 7:35 AM, Nikolay Mihaylov n...@nmmm.nu wrote:
 
  For one project I will need to run cassandra on following dedicated servers:
 
  Single CPU XEON 4 cores no hyper-threading, 8 GB RAM, 12 TB locally 
  attached HDD's in some kind of RAID, visible as single HDD.
 
  I can do cluster of 20-30 such servers, may be even more.
 
  The data will be huge, I am estimating 4-6 TB per server. I know this is 
  best, but those are my resources.
 
  Currently I am testing with one of such servers, except HDD is 300 GB. 
  Every 15-20 hours, I get out of heap memory, e.g. something like:
 
  ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line 164) 
  Exception in thread Thread[Thrift:641,5,main]
  ...
   INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915 
  ThriftServer.java (line 116) Stop listening to thrift clients
   INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,943 Gossiper.java 
  (line 1077) Announcing shutdown
   INFO [StorageServiceShutdownHook] 2013-04-11 11:26:08,613 
  MessagingService.java (line 682) Waiting for messaging service to quiesce
   INFO [ACCEPT-/208.94.232.37] 2013-04-11 11:26:08,655 MessagingService.java 
  (line 888) MessagingService shutting down server thread.
  ERROR [Thrift:721] 2013-04-11 11:26:37,709 CustomTThreadPoolServer.java 
  (line 217) Error occurred during processing of message.
  java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has 
  shut down
 
  Anyone have any advice about better utilization of such servers?
 
  Nick.
 
 



Re: running cassandra on 8 GB servers

2013-04-14 Thread aaron morton
 Hmmm, what is the recommendation for a 10G network if 1G was 300G to
 500G… I am guessing I can't do 10 times that, correct?  But maybe I could
 squeak out 600G to 1T?
Best thing to do would be to run a test on how long it takes to repair or
bootstrap a node. The 300GB to 500GB was just a guideline.
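
For example, a rough way to time a repair (a sketch; -pr limits it to the
node's primary ranges):

time nodetool -h localhost repair -pr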

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 12:02 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 Hmmm, what is the recommendation for a 10G network if 1G was 300G to
 500G… I am guessing I can't do 10 times that, correct?  But maybe I could
 squeak out 600G to 1T?
 
 Thanks,
 Dean
 
 On 4/11/13 2:26 PM, aaron morton aa...@thelastpickle.com wrote:
 
 The data will be huge, I am estimating 4-6 TB per server. I know this
 is not ideal, but those are my resources.
 You will have a very unhappy time.
 
 The general rule of thumb / guideline for an HDD based system with 1G
 networking is 300GB to 500GB per node. See previous discussions on this
 topic for reasons.
 
 ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line
 164) Exception in thread Thread[Thrift:641,5,main]
 ...
 INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915
 ThriftServer.java (line 116) Stop listening to thrift clients
 What was the error ?
 
 What version are you using?
 If you have changed any defaults for memory in cassandra-env.sh or
 cassandra.yaml revert them. Generally C* will do the right thing and not
 OOM, unless you are trying to store a lot of data on a node that does not
 have enough memory. See this thread for background
 http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 12/04/2013, at 7:35 AM, Nikolay Mihaylov n...@nmmm.nu wrote:
 
 For one project I will need to run cassandra on following dedicated
 servers:
 
 Single CPU XEON 4 cores no hyper-threading, 8 GB RAM, 12 TB locally
 attached HDD's in some kind of RAID, visible as single HDD.
 
 I can do cluster of 20-30 such servers, may be even more.
 
 The data will be huge, I am estimating 4-6 TB per server. I know this
 is not ideal, but those are my resources.
 
 Currently I am testing with one of such servers, except HDD is 300 GB.
 Every 15-20 hours, I get out of heap memory, e.g. something like:
 
 ERROR [Thrift:641] 2013-04-11 11:25:19,563 CassandraDaemon.java (line
 164) Exception in thread Thread[Thrift:641,5,main]
 ...
 INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,915
 ThriftServer.java (line 116) Stop listening to thrift clients
 INFO [StorageServiceShutdownHook] 2013-04-11 11:25:39,943
 Gossiper.java (line 1077) Announcing shutdown
 INFO [StorageServiceShutdownHook] 2013-04-11 11:26:08,613
 MessagingService.java (line 682) Waiting for messaging service to quiesce
 INFO [ACCEPT-/208.94.232.37] 2013-04-11 11:26:08,655
 MessagingService.java (line 888) MessagingService shutting down server
 thread.
 ERROR [Thrift:721] 2013-04-11 11:26:37,709 CustomTThreadPoolServer.java
 (line 217) Error occurred during processing of message.
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
 shut down
 
 Anyone have any advice about better utilization of such servers?
 
 Nick.
 
 



Re: Repair hangs on 1.1.4

2013-04-14 Thread aaron morton
The errors from Hints are not concerned with repair. Increasing the rpc_timeout 
may help with those. If it's logging about 0 hints you may be seeing this 
https://issues.apache.org/jira/browse/CASSANDRA-5068

How did the repair hang? Check for progress with nodetool compactionstats and
nodetool netstats.
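
For example (a sketch):

nodetool -h localhost compactionstats   # pending validation/compaction tasks
nodetool -h localhost netstats          # active streams between nodes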

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 3:01 AM, Alexis Rodríguez arodrig...@inconcertcc.com wrote:

 Adeel,
 
 It may be a problem in the remote node, could you check the system.log?
 
 Also you might want to check the rpc_timeout_in_ms in both nodes, maybe an 
 increase in this parameter helps.
 
 
 
 
 
 On Fri, Apr 12, 2013 at 9:17 AM, adeel.ak...@panasiangroup.com wrote:
 Hi,
 
 I have started a repair on a newly added node with -pr; this node exists in
 another data center. I have a 5MB internet connection and configured
 setstreamthroughput 1. After some time the repair hangs and the following
 message is found in the logs:
 
 # /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost ring
 Address DC  RackStatus State   Load
 Effective-Ownership Token
   
  169417178424467235000914166253263322299
 10.0.0.3DC1 RAC1Up Normal  93.26 GB66.67% 
  0
 10.0.0.4DC1 RAC1Up Normal  89.1 GB 66.67% 
  56713727820156410577229101238628035242
 10.0.0.15   DC1 RAC1Up Normal  72.87 GB66.67% 
  113427455640312821154458202477256070484
 10.40.1.103 DC2 RAC1Up Normal  48.59 GB
 100.00% 169417178424467235000914166253263322299
 
 
  INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java 
 (line 372) Timed out replaying hints to /10.40.1.103; aborting further 
 deliveries
  INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java 
 (line 390) Finished hinted handoff of 0 rows to endpoint /10.40.1.103
 
 Why are we getting this message, and how do I prevent the repair from hitting this error?
 
 Regards,
 
 Adeel Akbar
 



Re: unexplained hinted handoff

2013-04-14 Thread aaron morton
 Do slow reads trigger hint storage?
No. 
But dropped read messages are often an indicator that the node is overwhelmed.

  If hints are being stored, doesn't that imply DOWN nodes, and why don't I 
 see that in the logs?
Hints are stored for two reasons. First if the node is down when the write 
request starts, second if the node does not reply to the coordinator before 
rpc_timeout. If you are not seeing dropped write messages it may indicate 
network issues between the nodes. 
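
A quick way to check both is nodetool tpstats (a sketch); dropped READs with no
dropped MUTATIONs would point at slow or lost replies rather than failed
writes:

nodetool -h localhost tpstats   # the Dropped section breaks out READ vs MUTATION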

 I'm seeing hinted handoff kick in on all our nodes during periods of
 high activity,
Are you seeing log messages about hints being sent to nodes?

Cheers

  
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 8:23 AM, Dane Miller d...@optimalsocial.com wrote:

 On Fri, Apr 12, 2013 at 1:12 PM, Dane Miller d...@optimalsocial.com wrote:
 I'm seeing hinted handoff kick in on all our nodes during periods of
 high activity, but all the nodes seem to be up (according to the logs
 and nodetool status).  The pattern in the logs is something like this:
 
 18:10:45 194 READ messages dropped in last 5000ms
 18:11:10 Started hinted handoff for host:
 7668c813-41a9-4d42-b362-5420528fefa0 with IP: /10
 18:11:11 Finished hinted handoff of 13 rows to endpoint /10
 
 This happens on all the nodes every 10 min, and with a different
 endpoint each time.  tpstats shows thousands of dropped reads, but no
 other types of messages are dropped.
 
 Do slow reads trigger hint storage?  If hints are being stored,
 doesn't that imply DOWN nodes, and why don't I see that in the logs?
 
 Sorry, meant to add: Cassandra 1.2.3, Ubuntu 12.04 x64



Re: CQL3 And ReversedTypes Question

2013-04-14 Thread aaron morton
 Bad Request: Type error: 
 org.apache.cassandra.cql3.statements.Selection$SimpleSelector@1e7318 cannot 
 be passed as argument 0 of function dateof of type timeuuid
 
 Is there something I am missing here or should I open a new ticket?
Yes please. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 4:40 PM, Gareth Collins gareth.o.coll...@gmail.com wrote:

 OK, trying out 1.2.4. The previous issue seems to be fine, but I am 
 experiencing a new one:
 
 cqlsh:location> create table test_y (message_id timeuuid, name text, PRIMARY
 KEY (name,message_id));
 cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_y (message_id,name) VALUES (now(),'foo');
 cqlsh:location> select dateOf(message_id) from test_y;
 
  dateOf(message_id)
 --
  2013-04-13 00:33:42-0400
  2013-04-13 00:33:43-0400
  2013-04-13 00:33:43-0400
  2013-04-13 00:33:44-0400
 
 cqlsh:location> create table test_x (message_id timeuuid, name text, PRIMARY
 KEY (name,message_id)) WITH CLUSTERING ORDER BY (message_id DESC);
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> insert into test_x (message_id,name) VALUES (now(),'foo');
 cqlsh:location> select dateOf(message_id) from test_x;
 Bad Request: Type error: 
 org.apache.cassandra.cql3.statements.Selection$SimpleSelector@1e7318 cannot 
 be passed as argument 0 of function dateof of type timeuuid
 
 Is there something I am missing here or should I open a new ticket?
 
 thanks in advance,
 Gareth
 
 
 On Tue, Mar 26, 2013 at 3:30 PM, Gareth Collins gareth.o.coll...@gmail.com 
 wrote:
 Added:
 
 https://issues.apache.org/jira/browse/CASSANDRA-5386
 
 Thanks very much for the quick answer!
 
 regards,
 Gareth
 
 On Tue, Mar 26, 2013 at 3:55 AM, Sylvain Lebresne sylv...@datastax.com 
 wrote:
  You aren't missing anything obvious. That's a bug really. Would you mind
  opening a ticket on https://issues.apache.org/jira/browse/CASSANDRA?
 
  --
  Sylvain
 
 
  On Tue, Mar 26, 2013 at 2:48 AM, Gareth Collins gareth.o.coll...@gmail.com
  wrote:
 
  Hi,
 
  I created a table with the following structure in cqlsh (Cassandra
  1.2.3 - cql 3):
 
   CREATE TABLE client_queue ( column1 text,
column2 text,
messageId timeuuid,
message blob,
PRIMARY KEY ((column1, column2), messageId));
 
  I can quite happily add values to this table. e.g:
 
  insert into client_queue (column1,column2,messageId,message) VALUES
  ('string1','string2',now(),'ABCCDCC123');
 
  Yet if I decide I want to set the clustering order on messageId DESC:
 
   CREATE TABLE client_queue2 ( column1 text,
column2 text,
messageId timeuuid,
message blob,
PRIMARY KEY ((column1, column2), messageId)) WITH CLUSTERING
  ORDER BY (messageId DESC);
 
  and try to do an insert:
 
  insert into client_queue2 (column1,column2,messageId,message) VALUES
  ('string1','string2',now(),'ABCCDCC123');
 
  I get the following error:
 
  Bad Request: Type error: cannot assign result of function now (type
  timeuuid) to messageid (type
 
  'org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.TimeUUIDType)')
 
  I am sure I am missing something obvious here, but I don't understand.
  Why am I getting an error? What do I need
  to do to be able to add an entry to this table?
 
  thanks in advance,
  Gareth
 
 
 



Re: Any experience of 20 node mini-itx cassandra cluster

2013-04-14 Thread aaron morton
That's better. 

The SSD size is a bit small, and be warned that you will want to leave 50GB to
100GB free to allow room for compaction (using the default size tiered).

On the RAM side you will want to run about 4GB (assuming cass 1.2) for the JVM;
the rest can be off heap Cassandra structures. This may not leave too much free
space for the os page cache, but SSD may help there.

Cheers
  
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 4:47 PM, Jabbar Azam aja...@gmail.com wrote:

 What about using a quad core Athlon X4 740 3.2 GHz with 8GB of RAM and 256GB
 SSDs?
 
 I know it will depend on our workload, but it will be better than a dual core
 CPU, I think.
 
 Jabbar Azam
 
 On 13 Apr 2013 01:05, Edward Capriolo edlinuxg...@gmail.com wrote:
 Dual core is not the greatest; you might run into GC issues before you run out
 of IO from your SSD devices. Also, Cassandra has other concurrency settings
 that are tuned roughly around the number of processors/cores. It is not
 uncommon to see 4-6 cores of CPU (600% in top) dealing with young gen garbage,
 managing lots of sockets, and so on.
 
 
 On Fri, Apr 12, 2013 at 12:02 PM, Jabbar Azam aja...@gmail.com wrote:
 That's my guess. My colleague is still looking at CPU's so I'm hoping he can 
 get quad core CPU's for the servers.
 
 Thanks
 
 Jabbar Azam
 
 
 On 12 April 2013 16:48, Colin Blower cblo...@barracuda.com wrote:
 If you have not seen it already, checkout the Netflix blog post on their 
 performance testing of AWS SSD instances.
 
 http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
 
 My guess, based on very little experience, is that you will be CPU bound.
 
 
 On 04/12/2013 03:05 AM, Jabbar Azam wrote:
 Hello,
 
 I'm going to be building a 20 node cassandra cluster in one datacentre. The 
 spec of the servers will roughly be dual core Celeron CPU, 256 GB SSD, 16GB 
 RAM and two nics.
 
 
 Has anybody done any performance testing with this setup, or have any
 gotchas I should be aware of wrt the hardware?
 
  I do realise the CPU has fairly low computational power, but I'm going to
 assume the system is going to be IO bound, hence the RAM and SSDs.
 
 
 Thanks
 
 Jabbar Azam
 
 
 -- 
 Colin Blower
 Software Engineer
 Barracuda Networks Inc.
 +1 408-342-5576 (o)
 
 



Re: Extracting data from SSTable files with MapReduce

2013-04-14 Thread aaron morton
 The SSTable files are in the -f- format from 0.8.10.
If you can upgrade to the latest version it will make things easier. 
Start a node and use nodetool upgradesstables. 

The org.apache.cassandra.tools.SSTableExport class provides a blueprint for
reading rows from disk.
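
For reference, both suggestions as shell commands (a sketch; the sstable path
is hypothetical):

bin/nodetool -h localhost upgradesstables
# SSTableExport also backs the sstable2json command line tool:
bin/sstable2json /var/lib/cassandra/data/MyKeyspace/MyCF-hf-1-Data.db > MyCF.json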

hope that helps. 

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 7:58 PM, Jasper K. jasper.knu...@incentro.com wrote:

 Hi,
 
 Does anyone have any experience with running a MapReduce directly against a 
 CF's SSTable files?
 
 I have a use case where this seems to be an option. I want to export all data 
 from a CF to a flat file format for statistical analysis.
 
 Some factors that make it (more) doable in my case:
 -The Cassandra instance is not 'on-line' (no writes, no reads)
 -The .db files were exported from another instance. I have them all in one
 place now
 
 The SSTable files are in the -f- format from 0.8.10.
 
 Looking at this : http://wiki.apache.org/cassandra/ArchitectureSSTable it 
 should be possible to write a Hadoop RecordReader for Cassandra rowkeys.
 
 But maybe I am not fully aware of what I am up to.
 
 -- 
 
 Jasper 



Re: Any experience of 20 node mini-itx cassandra cluster

2013-04-14 Thread Jabbar Azam
Thanks Aaron.

Thanks

Jabbar Azam


On 14 April 2013 19:39, aaron morton aa...@thelastpickle.com wrote:

 That's better.

 The SSD size is a bit small, and be warned that you will want to leave
 50GB to 100GB free to allow room for compaction (using the default size
 tiered).

 On the RAM side you will want to run about 4GB (assuming cass 1.2) for the
 JVM; the rest can be off heap Cassandra structures. This may not leave too
 much free space for the os page cache, but SSD may help there.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 13/04/2013, at 4:47 PM, Jabbar Azam aja...@gmail.com wrote:

 What about using a quad core Athlon X4 740 3.2 GHz with 8GB of RAM and 256GB
 SSDs?

 I know it will depend on our workload, but it will be better than a dual core
 CPU, I think.

 Jabbar Azam
 On 13 Apr 2013 01:05, Edward Capriolo edlinuxg...@gmail.com wrote:

 Dual core is not the greatest; you might run into GC issues before you run
 out of IO from your SSD devices. Also, Cassandra has other concurrency
 settings that are tuned roughly around the number of processors/cores. It
 is not uncommon to see 4-6 cores of CPU (600% in top) dealing with young
 gen garbage, managing lots of sockets, and so on.


 On Fri, Apr 12, 2013 at 12:02 PM, Jabbar Azam aja...@gmail.com wrote:

 That's my guess. My colleague is still looking at CPU's so I'm hoping he
 can get quad core CPU's for the servers.

 Thanks

 Jabbar Azam


 On 12 April 2013 16:48, Colin Blower cblo...@barracuda.com wrote:

  If you have not seen it already, checkout the Netflix blog post on
 their performance testing of AWS SSD instances.


 http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html

 My guess, based on very little experience, is that you will be CPU
 bound.


 On 04/12/2013 03:05 AM, Jabbar Azam wrote:

   Hello,

  I'm going to be building a 20 node cassandra cluster in one
 datacentre. The spec of the servers will roughly be dual core Celeron CPU,
 256 GB SSD, 16GB RAM and two nics.


  Has anybody done any performance testing with this setup, or have any
 gotchas I should be aware of wrt the hardware?

  I do realise the CPU has fairly low computational power, but I'm going
 to assume the system is going to be IO bound, hence the RAM and SSDs.


  Thanks

 Jabbar Azam



 --
  Colin Blower
 Software Engineer
 Barracuda Networks Inc.
 +1 408-342-5576 (o)







Re: Problems with shuffle

2013-04-14 Thread aaron morton
 How does Cassandra with vnodes exactly decide how many vnodes to move?
The num_tokens setting in the yaml file. What did you set this to?
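
For reference, the line in question (a sketch):

num_tokens: 256   # the number of vnodes this node should own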

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 14/04/2013, at 11:56 AM, Rustam Aliyev rustam.li...@code.az wrote:

 Just a followup on this issue. Due to the cost of shuffle, we decided not to
 do it. Recently, we added a new node and ended up with a not well balanced cluster:
 
 Datacenter: datacenter1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address   Load   Tokens  Owns   Host ID   
 Rack
 UN  10.0.1.8  52.28 GB   260 18.3%  
 d28df6a6-c888-4658-9be1-f9e286368dce  rack1
 UN  10.0.1.11 55.21 GB   256 9.4%   
 7b0cf3c8-0c42-4443-9b0c-68f794299443  rack1
 UN  10.0.1.2  49.03 GB   259 17.9%  
 2d308bc3-1fd7-4fa4-b33f-cbbbdc557b2f  rack1
 UN  10.0.1.4  48.51 GB   255 18.4%  
 c253dcdf-3e93-495c-baf1-e4d2a033bce3  rack1
 UN  10.0.1.1  67.14 GB   253 17.9%  
 4f77fd70-b134-486b-9c25-cfea96b6d412  rack1
 UN  10.0.1.3  47.65 GB   253 18.0%  
 4d03690d-5363-42c1-85c2-5084596e09fc  rack1
 
 It looks like the new node took an equal number of vnodes from each other node -
 which is good. However, it's not clear why it decided to take half as many as
 the other nodes.
 
 How does Cassandra with vnodes exactly decide how many vnodes to move?
 
 Btw, during JOINING the nodetool status command does not show any information
 about the joining node. It appears only when the join has finished (on v1.2.3).
 
 -- Rustam
 
 
 On 08/04/2013 22:33, Rustam Aliyev wrote:
 After 2 days of endless compactions and streaming I had to stop this and
 cancel the shuffle. One of the nodes even complained that there was no free disk
 space (it grew from 30GB to 400GB). After all these problems the number of
 moved tokens was less than 40 (out of 1280!).
 
 Now, when nodes start they report duplicate ranges. I wonder how bad that is
 and how I can get rid of it.
 
  INFO [GossipStage:1] 2013-04-09 02:16:37,920 StorageService.java (line 
 1386) Nodes /10.0.1.2 and /10.0.1.1 have the same token 
 99027485685976232531333625990885670910.  Ignoring /10.0.1.2 
  INFO [GossipStage:1] 2013-04-09 02:16:37,921 StorageService.java (line 
 1386) Nodes /10.0.1.2 and /10.0.1.4 have the same token 
 4319990986300976586937372945998718.  Ignoring /10.0.1.2 
 
 Overall, I'm not sure how bad it is to leave the data unshuffled (I read the
 DataStax blog post; it's not clear). When adding a new node, wouldn't it be
 assigned ranges randomly from all nodes?
 
 Some other notes inline below: 
 
 On 08/04/2013 15:00, Eric Evans wrote: 
 [ Rustam Aliyev ] 
 Hi, 
 
 After upgrading to the vnodes I created and enabled shuffle 
 operation as suggested. After running for a couple of hours I had to 
 disable it because nodes were not catching up with compactions. I 
 repeated this process 3 times (enable/disable). 
 
 I have 5 nodes and each of them had ~35GB. After shuffle operations 
 described above some nodes are now reaching ~170GB. In the log files 
 I can see same files transferred 2-4 times to the same host within 
 the same shuffle session. Worst of all, after all of these I had 
 only 20 vnodes transferred out of 1280. So if it continues at
 the same speed it will take about a month or two to complete the
 shuffle.
 As Edward says, you'll need to issue a cleanup post-shuffle if you expect 
 to see disk usage match your expectations. 
 
 I had a few questions to better understand shuffle:
 
 1. Does disabling and re-enabling shuffle starts shuffle process from 
 scratch or it resumes from the last point? 
 It resumes. 
 
 2. Will vnode reallocations speedup as shuffle proceeds or it will 
 remain the same? 
 The shuffle proceeds synchronously, 1 range at a time; It's not going to 
 speed up as it progresses. 
 
 3. Why I see multiple transfers of the same file to the same host? e.g.: 
 
 INFO [Streaming to /10.0.1.8:6] 2013-04-07 14:27:10,038 
 StreamReplyVerbHandler.java (line 44) Successfully sent 
 /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db 
 to /10.0.1.8 
 INFO [Streaming to /10.0.1.8:7] 2013-04-07 16:27:07,427 
 StreamReplyVerbHandler.java (line 44) Successfully sent 
 /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db 
 to /10.0.1.8 
 I'm not sure, but perhaps that file contained data for two different 
 ranges? 
 Does it mean that if I have a huge file (e.g. 20GB) which contains a lot of
 ranges (let's say 100), it will be transferred each time (20GB*100)?
 
 4. When I enable/disable shuffle I receive warning message such as 
 below. Do I need to worry about it? 
 
 cassandra-shuffle -h localhost disable 
 Failed to enable shuffling on 10.0.1.1! 
 Failed to enable shuffling on 10.0.1.3! 
 Is that the verbatim output?  

Re: Problems with shuffle

2013-04-14 Thread Rustam Aliyev

How does Cassandra with vnodes exactly decide how many vnodes to move?

The num_tokens setting in the yaml file. What did you set this to?

256, same as on all other nodes.



Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 14/04/2013, at 11:56 AM, Rustam Aliyev rustam.li...@code.az wrote:


Just a followup on this issue. Due to the cost of shuffle, we decided not to do
it. Recently, we added a new node and ended up with a not well balanced cluster:

Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load   Tokens  Owns   Host ID 
  Rack
UN  10.0.1.8  52.28 GB   260 18.3%  
d28df6a6-c888-4658-9be1-f9e286368dce  rack1
UN  10.0.1.11 55.21 GB   256 9.4%   
7b0cf3c8-0c42-4443-9b0c-68f794299443  rack1
UN  10.0.1.2  49.03 GB   259 17.9%  
2d308bc3-1fd7-4fa4-b33f-cbbbdc557b2f  rack1
UN  10.0.1.4  48.51 GB   255 18.4%  
c253dcdf-3e93-495c-baf1-e4d2a033bce3  rack1
UN  10.0.1.1  67.14 GB   253 17.9%  
4f77fd70-b134-486b-9c25-cfea96b6d412  rack1
UN  10.0.1.3  47.65 GB   253 18.0%  
4d03690d-5363-42c1-85c2-5084596e09fc  rack1

It looks like the new node took an equal number of vnodes from each other node - which
is good. However, it's not clear why it decided to take half as many as the other
nodes.

How does Cassandra with vnodes exactly decide how many vnodes to move?

Btw, during JOINING the nodetool status command does not show any information about
the joining node. It appears only when the join has finished (on v1.2.3).

-- Rustam


On 08/04/2013 22:33, Rustam Aliyev wrote:

After 2 days of endless compactions and streaming I had to stop this and cancel
the shuffle. One of the nodes even complained that there was no free disk space (it
grew from 30GB to 400GB). After all these problems the number of moved tokens was
less than 40 (out of 1280!).

Now, when nodes start they report duplicate ranges. I wonder how bad that is
and how I can get rid of it.

  INFO [GossipStage:1] 2013-04-09 02:16:37,920 StorageService.java (line 1386) 
Nodes /10.0.1.2 and /10.0.1.1 have the same token 
99027485685976232531333625990885670910.  Ignoring /10.0.1.2
  INFO [GossipStage:1] 2013-04-09 02:16:37,921 StorageService.java (line 1386) 
Nodes /10.0.1.2 and /10.0.1.4 have the same token 
4319990986300976586937372945998718.  Ignoring /10.0.1.2

Overall, I'm not sure how bad it is to leave the data unshuffled (I read the DataStax
blog post; it's not clear). When adding a new node, wouldn't it be assigned ranges
randomly from all nodes?

Some other notes inline below:

On 08/04/2013 15:00, Eric Evans wrote:

[ Rustam Aliyev ]

Hi,

After upgrading to the vnodes I created and enabled shuffle
operation as suggested. After running for a couple of hours I had to
disable it because nodes were not catching up with compactions. I
repeated this process 3 times (enable/disable).

I have 5 nodes and each of them had ~35GB. After shuffle operations
described above some nodes are now reaching ~170GB. In the log files
I can see same files transferred 2-4 times to the same host within
the same shuffle session. Worst of all, after all of these I had
only 20 vnodes transferred out of 1280. So if it continues at
the same speed it will take about a month or two to complete the
shuffle.

As Edward says, you'll need to issue a cleanup post-shuffle if you expect
to see disk usage match your expectations.


I had a few questions to better understand shuffle:

1. Does disabling and re-enabling shuffle starts shuffle process from
 scratch or it resumes from the last point?

It resumes.


2. Will vnode reallocations speedup as shuffle proceeds or it will
 remain the same?

The shuffle proceeds synchronously, 1 range at a time; It's not going to
speed up as it progresses.


3. Why I see multiple transfers of the same file to the same host? e.g.:

 INFO [Streaming to /10.0.1.8:6] 2013-04-07 14:27:10,038
 StreamReplyVerbHandler.java (line 44) Successfully sent
 /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
 to /10.0.1.8
 INFO [Streaming to /10.0.1.8:7] 2013-04-07 16:27:07,427
 StreamReplyVerbHandler.java (line 44) Successfully sent
 /u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
 to /10.0.1.8

I'm not sure, but perhaps that file contained data for two different
ranges?

Does it mean that if I have a huge file (e.g. 20GB) which contains a lot of ranges
(let's say 100), it will be transferred each time (20GB*100)?

4. When I enable/disable shuffle I receive warning message such as
 below. Do I need to worry about it?

 cassandra-shuffle -h localhost disable
 Failed to enable shuffling on 10.0.1.1!
 Failed to enable shuffling on 10.0.1.3!

Is that the verbatim output?  Did it report failing to enable when you
tried to disable?

Yes, this is verbatim 

Re: Rename failed while cassandra is starting up

2013-04-14 Thread aaron morton
 From the log messages, it looked like the table/keyspace was
 opened before the scrubDataDirectories was executed. This created a race
 condition between two threads.
Seems odd. 
AFAIK that startup is single threaded and the scrub runs before the tables are 
opened. See AbstractCassandraDaemon.setup()

 INFO [OptionalTasks:1] 2013-04-09 02:49:39,900 SecondaryIndexManager.java
 (line 184) Creating new index :
 ColumnDefinition{name=6d6f62696c6974795a6f6e654944,
 validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
 index_name='fmzd_ap_mobilityZoneUUID'}
 ERROR [FlushWriter:1] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java
 (line 139) Fatal exception in thread Thread[FlushWriter:1,5,main]
 java.io.IOError: java.io.IOException: rename failed of
 /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db

Looks like a secondary index is being created at startup and there is an error 
renaming the file. 
OR
The node was shut down before the index was built and it's been rebuilt at 
startup.

Both of these are async operations and cause a race with scrubDirectories(). 

Probably not the log replaying because it looks like the sstables have not been 
opened. 

I *think* the way around this is to um…. 
* move all existing data and commit log out of the way 
* start with node with -Dcassandra.join_ring=false JVM option in 
cassandra-env.sh
* check that all indexes are built using nodetool cfstats
* shut it down
* put the commit log and data dirs back in place. 
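
For reference, the middle steps as a sketch:

# cassandra-env.sh: start the node without joining the ring
JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"
# then check that all indexes are built:
nodetool -h localhost cfstats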

All we want to do is get the system KS updated, but in 1.0 that's a serialised 
object and not easy to poke. 

Hope that helps. 
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 14/04/2013, at 3:50 PM, Boris Yen yulin...@gmail.com wrote:

 Hi All,
 
 Recently, we encountered an error on 1.0.12 that prevented cassandra from
 starting up. From the log messages, it looked like the table/keyspace was
 opened before the scrubDataDirectories was executed. This created a race
 condition between two threads. One was trying to rename files while the
 other was trying to remove tmp files. I was wondering if anyone could
 provide us some information or workaround for this.
 
 INFO [MemoryMeter:1] 2013-04-09 02:49:39,868 Memtable.java (line 186)
 CFS(Keyspace='fmzd', ColumnFamily='alarm.fmzd_alarm_category') liveRatio is
 3.7553409423470883 (just-counted was 3.1413828689370487).  calculation took
 2ms for 265 columns
 INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,868 SSTableReader.java (line
 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-2 (83 bytes)
 INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,868 SSTableReader.java (line
 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-1 (123 bytes)
 INFO [Creating index: alarm.fmzd_alarm_category] 2013-04-09 02:49:39,874
 ColumnFamilyStore.java (line 705) Enqueuing flush of
 Memtable-alarm.fmzd_alarm_category@413535513(14025/65835 serialized/live
 bytes, 275 ops)
 INFO [OptionalTasks:1] 2013-04-09 02:49:39,877 SecondaryIndexManager.java
 (line 184) Creating new index : ColumnDefinition{name=6d65736853534944,
 validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
 index_name='fmzd_ap_meshSSID'}
 INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,895 SSTableReader.java (line
 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-1 (122 bytes)
 INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,896 SSTableReader.java (line
 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-2 (82 bytes)
 INFO [OptionalTasks:1] 2013-04-09 02:49:39,900 SecondaryIndexManager.java
 (line 184) Creating new index :
 ColumnDefinition{name=6d6f62696c6974795a6f6e654944,
 validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
 index_name='fmzd_ap_mobilityZoneUUID'}
 ERROR [FlushWriter:1] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java
 (line 139) Fatal exception in thread Thread[FlushWriter:1,5,main]
 java.io.IOError: java.io.IOException: rename failed of
 /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db
 at
 org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:375)
 at
 org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:319)
 at
 org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:302)
 at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:276)
 at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
 at org.apache.cassandra.db.Memtable$4.runMayThrow(Memtable.java:299)
 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 Caused by: java.io.IOException: rename failed of
 /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db
 at
 

1.1.9 to 1.2.3 upgrade issue

2013-04-14 Thread John Watson
Started doing a rolling upgrade of nodes from 1.1.9 to 1.2.3 and nodes on
1.1.9 started flooding this error:

Exception in thread Thread[RequestResponseStage:19496,5,main]
java.io.IOError: java.io.EOFException
at
org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
at
org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:155)
at
org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at
org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
at
org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
at
org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
... 6 more

As I understand it, the Hints CF changed from 1.1.x to 1.2.x, so I assume that's
the cause of the 1.2.3 nodes flooding this (for various IPs still on 1.1.9):

Unable to store hint for host with missing ID, /10.37.62.71 (old node?)

Is this a known issue? Or is a rolling upgrade from 1.1.x to 1.2.x not possible?

Thanks,

John


re-execution of failed queries with rpc_timeout

2013-04-14 Thread Moty Kosharovsky
Hello,

I'm running a 12 node cluster with cassandra 1.1.5 and oracle jdk 1.6.0_35.
Our application constantly writes large updates with cql. Once in a while,
an rpc_timeout will occur.

Since a lot of the information is counters, it's impossible for me to
understand if the updates complete partially on rpc_timeout, or cassandra
somehow rolls back the change completely, and hence I can't tell if I
should re-execute the query on rpc_timeout (with double processing being a
bigger concern than missing updates).

I am thinking, but unsure of this, that if I switch to LOCAL_QUORUM,
rpc_timeout will always mean that the update was not processed as a whole.
In all other cases, the rpc_timeout might be thrown from a remote node (not
the one I'm connected to), and hence some parts of the update will be
performed and other parts will not.

Anyone solved this issue before?

Kind Regards,
Kosha


Re: re-execution of failed queries with rpc_timeout

2013-04-14 Thread Moty Kosharovsky
Sorry, not LOCAL_QUORUM, I meant ANY quorum.


On Mon, Apr 15, 2013 at 4:12 AM, Moty Kosharovsky motyk...@gmail.comwrote:

 Hello,

 I'm running a 12 node cluster with cassandra 1.1.5 and oracle jdk 1.6.0_35.
 Our application constantly writes large updates with cql. Once in a while,
 an rpc_timeout will occur.

 Since a lot of the information is counters, it's impossible for me to
 understand if the updates complete partially on rpc_timeout, or cassandra
 somehow rolls back the change completely, and hence I can't tell if I
 should re-execute the query on rpc_timeout (with double processing being a
 bigger concern than missing updates).

 I am thinking, but unsure of this, that if I switch to LOCAL_QUORUM,
 rpc_timeout will always mean that the update was not processed as a whole.
 In all other cases, the rpc_timeout might be thrown from a remote node (not
 the one I'm connected to), and hence some parts of the update will be
 performed and other parts will not.

 Anyone solved this issue before?

 Kind Regards,
 Kosha



Re: Rename failed while cassandra is starting up

2013-04-14 Thread Boris Yen
Hi Aaron,

startup is single threaded and the scrub runs before the tables are opened.

This is what I was thinking too. However, after using the debugger to trace
the code, I realized that MeteredFlusher (see the countFlushBytes method)
might open the sstables before the scrub is completed. I suppose this is
the cause of the exceptions I saw.

My plan is to add a boolean flag named scrubCompleted at
AbstractCassandraDaemon or StorageService. By default it is false; after
the scrub is completed, AbstractCassandraDaemon needs to set it to true.
MeteredFlusher needs to make sure the scrub is completed by checking
this boolean value before it starts to do all the calculation.
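
Something like this minimal sketch (the class and method names here are
hypothetical, following the plan above):

public class ScrubGate
{
    // volatile so the MeteredFlusher thread sees the write from the startup thread
    private static volatile boolean scrubCompleted = false;

    // called by AbstractCassandraDaemon once scrubDataDirectories() has finished
    public static void markScrubCompleted()
    {
        scrubCompleted = true;
    }

    public static boolean isScrubCompleted()
    {
        return scrubCompleted;
    }
}

// MeteredFlusher (e.g. at the top of countFlushBytes) would then bail out early:
// if (!ScrubGate.isScrubCompleted())
//     return;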

Is this a good plan? or it might have side effects?

Thanks and Regards,
Boris


On Mon, Apr 15, 2013 at 4:26 AM, aaron morton aa...@thelastpickle.comwrote:

 From the log messages, it looked like the table/keyspace was
 opened before the scrubDataDirectories was executed. This created a race
 condition between two threads.

 Seems odd.
 AFAIK that startup is single threaded and the scrub runs before the tables
 are opened. See AbstractCassandraDaemon.setup()

 INFO [OptionalTasks:1] 2013-04-09 02:49:39,900 SecondaryIndexManager.java

 (line 184) Creating new index :
 ColumnDefinition{name=6d6f62696c6974795a6f6e654944,
 validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
 index_name='fmzd_ap_mobilityZoneUUID'}
 ERROR [FlushWriter:1] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java
 (line 139) Fatal exception in thread Thread[FlushWriter:1,5,main]
 java.io.IOError: java.io.IOException: rename failed of
 /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db


 Looks like a secondary index is being created at startup and there is an
 error renaming the file.
 OR
 The node was shut down before the index was built and it's been rebuilt at
 startup.

 Both of these are async operations and cause a race with
 scrubDirectories().

 Probably not the log replaying because it looks like the sstables have not
 been opened.

 I *think* the way around this is to um….
 * move all existing data and commit log out of the way
 * start with node with -Dcassandra.join_ring=false JVM option in
 cassandra-env.sh
 * check that all indexes are built using nodetool cfstats
 * shut it down
 * put the commit log and data dirs back in place.

 All we want to do is get the system KS updated, but in 1.0 that's a
 serialised object and not easy to poke.

 Hope that helps.

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 14/04/2013, at 3:50 PM, Boris Yen yulin...@gmail.com wrote:

 Hi All,

 Recently, we encountered an error on 1.0.12 that prevented cassandra from
 starting up. From the log messages, it looked like the table/keyspace was
 opened before the scrubDataDirectories was executed. This created a race
 condition between two threads. One was trying to rename files while the
 other was trying to remove tmp files. I was wondering if anyone could
 provide us some information or workaround for this.

 INFO [MemoryMeter:1] 2013-04-09 02:49:39,868 Memtable.java (line 186)
 CFS(Keyspace='fmzd', ColumnFamily='alarm.fmzd_alarm_category') liveRatio is
 3.7553409423470883 (just-counted was 3.1413828689370487).  calculation took
 2ms for 265 columns
 INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,868 SSTableReader.java (line
 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-2 (83 bytes)
 INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,868 SSTableReader.java (line
 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-1 (123 bytes)
 INFO [Creating index: alarm.fmzd_alarm_category] 2013-04-09 02:49:39,874
 ColumnFamilyStore.java (line 705) Enqueuing flush of
 Memtable-alarm.fmzd_alarm_category@413535513(14025/65835 serialized/live
 bytes, 275 ops)
 INFO [OptionalTasks:1] 2013-04-09 02:49:39,877 SecondaryIndexManager.java
 (line 184) Creating new index : ColumnDefinition{name=6d65736853534944,
 validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
 index_name='fmzd_ap_meshSSID'}
 INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,895 SSTableReader.java (line
 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-1 (122 bytes)
 INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,896 SSTableReader.java (line
 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-2 (82 bytes)
 INFO [OptionalTasks:1] 2013-04-09 02:49:39,900 SecondaryIndexManager.java
 (line 184) Creating new index :
 ColumnDefinition{name=6d6f62696c6974795a6f6e654944,
 validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS,
 index_name='fmzd_ap_mobilityZoneUUID'}
 ERROR [FlushWriter:1] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java
 (line 139) Fatal exception in thread Thread[FlushWriter:1,5,main]
 java.io.IOError: java.io.IOException: rename failed of
 /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db
 at

 

AUTO : Samuel CARRIERE is out of the office (back 22/04/2013)

2013-04-14 Thread Samuel CARRIERE


I am out of the office until 22/04/2013




Note: this is an automatic reply to your message re-execution
of failed queries with rpc_timeout, sent on 15/04/2013 3:12:45.

This is the only notification you will receive during this person's absence.

Added extra column as composite key while creating counter column family

2013-04-14 Thread Kuldeep Mishra
Hi,
   While I am creating a counter column family, an extra column is being added.
What should I do?
Table creation script:
 CREATE TABLE counters (
  key text,
  value counter,
  PRIMARY KEY (key)
 ) WITH COMPACT STORAGE

After describing the column family I am getting the following:
CREATE TABLE counters (
  key text,
  column1 text,
  value counter,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

The extra column column1 is added.

Please help

-- 
Thanks and Regards
Kuldeep Kumar Mishra
+919540965199