Re: ERROR service.AbstractCassandraDaemon: Exception in thread Thread[Thrift:4,5,main]

2013-04-08 Thread Everton Lima
Is there any way to limit the heap usage by setting some attribute in the
cassandra.yaml file?


2013/4/9 aaron morton 

> You need to increase the JVM heap, cassandra picks sensible defaults if
> you have enough server memory.
>
> Reset all changed config files to default settings and make sure cassandra
> has at least 4GB of JVM heap. The heap size is calculated in
> cassandra-env.sh.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/04/2013, at 7:07 AM, Everton Lima  wrote:
>
> > Hello,
> >
> > I am trying to insert a lot of data in cassandra 1.1.8, in 2 servers.
> > As a client I was using Astyanax to send INSERT CQL statements.
> > It starts to insert the data, but after some time I receive this error
> and both the server and the client die.
> > Does someone know how to fix it? Is this the best way to insert the data?
> > Attached is the cassandra.yaml that the servers use.
> >
> > PS: I am just reading the data from a Postgres table, tuple by tuple, adding
> one column as the key for Cassandra, and inserting it into Cassandra's servers.
> >
> >
> > The full log is:
> >
> >  ERROR service.AbstractCassandraDaemon: Exception in thread
> Thread[Thrift:4,5,main]
> > java.lang.OutOfMemoryError: Java heap space
> > at java.util.Arrays.copyOf(Arrays.java:2746)
> > at java.util.ArrayList.ensureCapacity(ArrayList.java:187)
> > at java.util.ArrayList.add(ArrayList.java:378)
> > at
> org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:116)
> > at
> org.antlr.runtime.CommonTokenStream.mark(CommonTokenStream.java:305)
> > at org.antlr.runtime.DFA.predict(DFA.java:68)
> > at org.apache.cassandra.cql.CqlParser.query(CqlParser.java:215)
> > at
> org.apache.cassandra.cql.QueryProcessor.getStatement(QueryProcessor.java:959)
> > at
> org.apache.cassandra.cql.QueryProcessor.process(QueryProcessor.java:879)
> > at
> org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1267)
> > at
> org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3637)
> > at
> org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3625)
> > at
> org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
> > at
> org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
> > at
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
> > at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > at java.lang.Thread.run(Thread.java:636)
> > [04/04/13 20:20:04,715 BRT] ERROR service.AbstractCassandraDaemon:
> Exception in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> > java.lang.OutOfMemoryError: Java heap space
> > [04/04/13 20:20:04,712 BRT] ERROR service.AbstractCassandraDaemon:
> Exception in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> > java.lang.OutOfMemoryError: Java heap space
> > [04/04/13 20:20:04,711 BRT] ERROR service.AbstractCassandraDaemon:
> Exception in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> > java.lang.OutOfMemoryError: Java heap space
> > [04/04/13 20:20:04,710 BRT] ERROR service.AbstractCassandraDaemon:
> Exception in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> > java.lang.OutOfMemoryError: Java heap space
> > [04/04/13 20:20:04,702 BRT] ERROR service.AbstractCassandraDaemon:
> Exception in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> > java.lang.OutOfMemoryError: Java heap space
> > [04/04/13 20:20:04,700 BRT] ERROR service.AbstractCassandraDaemon:
> Exception in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> > java.lang.OutOfMemoryError: Java heap space
> > [04/04/13 20:20:04,692 BRT] ERROR service.AbstractCassandraDaemon:
> Exception in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> > java.lang.OutOfMemoryError: Java heap space
> > [04/04/13 20:20:04,591 BRT] ERROR service.AbstractCassandraDaemon:
> Exception in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> > java.lang.OutOfMemoryError: Java heap space
> > [04/04/13 20:20:01,114 BRT] ERROR service.AbstractCassandraDaemon:
> Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
> > java.lang.OutOfMemoryError: Java heap space
> > at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
> > at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2072)
> > at
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:423)
> > at
> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecut

Re: Counter batches query

2013-04-08 Thread aaron morton
For #1, Storage Proxy (server-wide) metrics are per request, so 1 in your 
example. CF-level metrics are per row, so 5 in your example. 

Not sure which graph you were looking at in OpsCenter; it's probably best to ask 
here: http://www.datastax.com/support-forums/

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/04/2013, at 2:30 AM, Edward Capriolo  wrote:

> For #2
> There are two mutates in thrift: batch_mutate and atomic_batch_mutate. The 
> atomic version was just added. If you care more about performance, do not 
> use the atomic version.
> 
> 
> On Sat, Apr 6, 2013 at 12:03 AM, Matt K  wrote:
> Hi,
> 
> I have an application that does batch (counter) writes to multiple CFs. The 
> application itself is multi-threaded and I'm using C* 1.2.2 and Astyanax 
> driver. Could someone share insights on:
> 
> 1) When I look at the cluster write throughput graph in OpsCenter, the number is 
> not reflective of the actual number of writes. For example, if I issue a single 
> batch write (internally containing 5 mutations), are the OpsCenter/JMX cluster/node 
> writes supposed to indicate 1 or 5? (I would assume 5.) 
> 
> 2) I read that from C* 1.2.x there are atomic counter batches which can cause a 
> 30% performance hit - wondering if this is applicable to existing thrift-based 
> clients like Astyanax/Hector and, if so, what is the way to turn it off? Any 
> server-side settings too?
> 
> Thanks!
> 



Re: Lost data after expanding cluster c* 1.2.3-1

2013-04-08 Thread aaron morton
Look in the logs for messages from the SecondaryIndexManager.

They start with "Submitting index build of"
and end with "Index build of"

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/04/2013, at 12:55 AM, Kais Ahmed  wrote:

> hi aaron,
> 
> nodetool compactionstats on all nodes return 1 pending task :
> 
> ubuntu@app:~$ nodetool compactionstats host
> pending tasks: 1
> Active compaction remaining time :n/a
> 
> The command nodetool rebuild_index was launched several days ago.
> 
> 2013/4/5 aaron morton 
>> but nothing's happening, how can i monitor the progress? and how can i know 
>> when it's finished?
> 
> check nodetool compactionstats
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 4/04/2013, at 2:51 PM, Kais Ahmed  wrote:
> 
>> Hi aaron,
>> 
>> I ran the command "nodetool rebuild_index host keyspace cf" on all the 
>> nodes, in the log i see :
>> 
>> INFO [RMI TCP Connection(5422)-10.34.139.xxx] 2013-04-04 08:31:53,641 
>> ColumnFamilyStore.java (line 558) User Requested secondary index re-build 
>> for ...
>> 
>> but nothing's happening, how can i monitor the progress? and how can i know 
>> when it's finished?
>> 
>> Thanks,
>>  
>> 
>> 2013/4/2 aaron morton 
>>> The problem comes from the fact that I didn't set auto_bootstrap to true for the new 
>>> nodes; it is not in this documentation 
>>> (http://www.datastax.com/docs/1.2/install/expand_ami)
>> auto_bootstrap defaults to True if not specified in the yaml. 
>> 
>>> Can I do that at any time, or only when the cluster is not loaded?
>> Not sure what the question is. 
>> Both those operations are online operations you can do while the node is 
>> processing requests. 
>>  
>> Cheers
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 1/04/2013, at 9:26 PM, Kais Ahmed  wrote:
>> 
>>> > At this moment the errors started; we saw that members and other data were 
>>> > gone. At this moment nodetool status returned (with the 3 new 
>>> > nodes in red)
>>> > What errors?
>>> The errors were on my side in the application, not Cassandra errors
>>> 
>>> > For each of them I set seeds = A's IP, and started each with two-minute 
>>> > intervals.
>>> > When I'm making changes I tend to change a single node first, confirm 
>>> > everything is OK and then do a bulk change.
>>> Thank you for that advice.
>>> 
>>> >I'm not sure what or why it went wrong, but that should get you to a 
>>> >stable place. If you have any problems keep an eye on the logs for errors 
>>> >or warnings.
>>> The problem comes from the fact that I didn't set auto_bootstrap to true for the new 
>>> nodes; it is not in this documentation 
>>> (http://www.datastax.com/docs/1.2/install/expand_ami)
>>> 
>>> >if you are using secondary indexes use nodetool rebuild_index to rebuild 
>>> >those.
>>> Can I do that at any time, or only when the cluster is not loaded?
>>> 
>>> Thanks aaron,
>>> 
>>> 2013/4/1 aaron morton 
>>> Please do not rely on colour in your emails, the best way to get your 
>>> emails accepted by the Apache mail servers is to use plain text.
>>> 
>>> > At this moment the errors started; we saw that members and other data were 
>>> > gone. At this moment nodetool status returned (with the 3 new 
>>> > nodes in red)
>>> What errors?
>>> 
>>> > For each of them I set seeds = A's IP, and started each with two-minute 
>>> > intervals.
>>> When I'm making changes I tend to change a single node first, confirm 
>>> everything is OK and then do a bulk change.
>>> 
>>> > Now the cluster seems to work normally, but I can't use the secondary indexes 
>>> > for the moment; the query answers are random
>>> run nodetool repair -pr on each node, let it finish before starting the 
>>> next one.
>>> if you are using secondary indexes use nodetool rebuild_index to rebuild 
>>> those.
>>> Add one node new node to the cluster and confirm everything is ok, then add 
>>> the remaining ones.
>>> 
>>> >I'm not sure what or why it went wrong, but that should get you to a 
>>> >stable place. If you have any problems keep an eye on the logs for errors 
>>> >or warnings.
>>> 
>>> Cheers
>>> 
>>> -
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>> 
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 31/03/2013, at 10:01 PM, Kais Ahmed  wrote:
>>> 
>>> > Hi aaron,
>>> >
>>> > Thanks for the reply, I will try to explain what happened exactly
>>> >
>>> > I had a 4-node C* cluster [A,B,C,D] (version 1.2.3-1) started with the ec2 ami 
>>> > (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2) with
>>> > this config: --clustername myDSCcluster --totalnodes 4 --version community
>>> >
>>> > Two days after this cluster went into production, I saw that the cluster was 
>>> > overloaded, so I wanted to extend it by adding 3 more nodes.
>>> >
>>> > I c

Re: Data Modeling: How to keep track of arbitrarily inserted column names?

2013-04-08 Thread aaron morton
If you create a reverse index on all column names, where the single row has a 
key something like "the_index" and each column name is a column name that has 
been used elsewhere, you are approaching the "twitter global timeline 
anti-pattern" (™). 

Basically you will end up with a hot row that has to handle 100k inserts a 
second. It would be a good idea to do some tests if that is your target 
throughput. Your design options are to consider sharding the index using 
something simple like hash and mod or consistent sharding like C* does. 
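
A rough sketch of the hash-and-mod idea, with an assumed shard count, row key
prefix and plain-string column names (illustrative only, not a drop-in
implementation):

    public final class ColumnNameIndexShards {
        // Hypothetical shard count; pick something that spreads the write load.
        private static final int NUM_SHARDS = 16;

        // Row key of the index shard that should record this column name.
        // Writers insert (columnName -> empty value) under this key; readers
        // fetch all NUM_SHARDS rows and merge the column names they contain.
        public static String shardRowKeyFor(String columnName) {
            int shard = (columnName.hashCode() & 0x7fffffff) % NUM_SHARDS;
            return "the_index:" + shard;  // e.g. "the_index:7"
        }
    }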

Hope that helps. 
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/04/2013, at 7:37 AM, Drew Kutcharian  wrote:

> One thing I can do is to have a client-side cache of the keys to reduce the 
> number of updates.
> 
> 
> On Apr 5, 2013, at 6:14 AM, Edward Capriolo  wrote:
> 
>> Since there are few column names, what you can do is this: make a reverse 
>> index with a low read repair chance, and be aggressive with compaction. It will be 
>> many extra writes, but that is ok. 
>> 
>> Other option is turn on row cache and try read before write. It is a good 
>> case for row cache because it is a very small data set.
>> 
>> On Thursday, April 4, 2013, Drew Kutcharian  wrote:
>> > I don't really need to answer "what rows contain column named X", so no 
>> > need for a reverse index here. All I want is a distinct set of all the 
>> > column names, so I can answer "what are all the available column names"
>> >
>> > On Apr 4, 2013, at 4:20 PM, Edward Capriolo  wrote:
>> >
>> > Your reverse index of "which rows contain a column named X" will have very 
>> > wide rows. You could look at cassandra's secondary indexing, or possibly 
>> > look at a solandra/solr approach. Another option is you can shift the 
>> > problem slightly, "which rows have column X that was added between time y 
>> > and time z". Remember with few distinct column names that reverse index of 
>> > column to row is going to be a very big list.
>> >
>> >
>> > On Thu, Apr 4, 2013 at 5:45 PM, Drew Kutcharian  wrote:
>> >>
>> >> Hi Edward,
>> >> I anticipate that the column names will be reused a lot. For example, 
>> >> key1 will be in many rows. So I think the number of distinct column names 
>> >> will be much much smaller than the number of rows. Is there a way to have 
>> >> a separate CF that keeps track of the column names? 
>> >> What I was thinking was to have a separate CF that I write only the 
>> >> column name with a null value in there every time I write a key/value to 
>> >> the main CF. In this case if that column name exist, then it will just be 
>> >> overridden. Now if I wanted to get all the column names, then I can just 
>> >> query that CF. Not sure if that's the best approach at high load (100k 
>> >> inserts a second).
>> >> -- Drew
>> >>
>> >> On Apr 4, 2013, at 12:02 PM, Edward Capriolo  
>> >> wrote:
>> >>
>> >> You cannot get only the column name (which you are calling a key); you 
>> >> can use get_range_slice, which returns all the columns. When you specify 
>> >> an empty byte array (new byte[0]) as the start and finish you get back 
>> >> all the columns. From there you can return only the columns to the user 
>> >> in a format that you like.
>> >>
>> >>
>> >> On Thu, Apr 4, 2013 at 2:18 PM, Drew Kutcharian  wrote:
>> >>>
>> >>> Hey Guys,
>> >>>
>> >>> I'm working on a project and one of the requirements is to have a schema 
>> >>> free CF where end users can insert arbitrary key/value pairs per row. 
>> >>> What would be the best way to know what are all the "keys" that were 
>> >>> inserted (preferably w/o any locking). For example,
>> >>>
>> >>> Row1 => key1 -> XXX, key2 -> XXX
>> >>> Row2 => key1 -> XXX, key3 -> XXX
>> >>> Row3 => key4 -> XXX, key5 -> XXX
>> >>> Row4 => key2 -> XXX, key5 -> XXX
>> >>> …
>> >>>
>> >>> The query would be give me all the inserted keys and the response would 
>> >>> be {key1, key2, key3, key4, key5}
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Drew
>> >>>
>> >>
>> >>
>> >
>> >
>> >
> 
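
As a rough thrift-level illustration of the "empty start/finish" slice Edward
describes above: the column family name below is hypothetical, an
already-connected Cassandra.Client is assumed, and real code would page through
rows by re-issuing the call starting from the last key returned.

    import java.nio.ByteBuffer;
    import java.util.List;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.KeyRange;
    import org.apache.cassandra.thrift.KeySlice;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;

    public class AllColumnsScan {
        // Fetch one page of rows, with every column of each row in the page.
        public static List<KeySlice> scanPage(Cassandra.Client client) throws Exception {
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(new SliceRange(
                    ByteBuffer.wrap(new byte[0]),  // empty start: from the first column
                    ByteBuffer.wrap(new byte[0]),  // empty finish: to the last column
                    false,                         // not reversed
                    1000));                        // max columns returned per row
            KeyRange range = new KeyRange(100);    // max rows per call
            range.setStart_key(ByteBuffer.wrap(new byte[0]));  // empty start/end key: whole ring
            range.setEnd_key(ByteBuffer.wrap(new byte[0]));
            ColumnParent parent = new ColumnParent("MyColumnFamily");  // hypothetical CF
            return client.get_range_slices(parent, predicate, range, ConsistencyLevel.ONE);
        }
    }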



Re: ERROR service.AbstractCassandraDaemon: Exception in thread Thread[Thrift:4,5,main]

2013-04-08 Thread aaron morton
You need to increase the JVM heap, cassandra picks sensible defaults if you 
have enough server memory. 

Reset all changed config files to default settings and make sure cassandra has 
at least 4GB of JVM heap. The heap size is calculated in cassandra-env.sh. 
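
(For reference, the calculated heap is normally overridden near the top of
cassandra-env.sh rather than in cassandra.yaml, by uncommenting and setting
something like the lines below; the values here are only an illustration, not a
recommendation.)

    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="400M"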

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/04/2013, at 7:07 AM, Everton Lima  wrote:

> Hello,
> 
> I am trying to insert a lot of data in cassandra 1.1.8, in 2 servers.
> As a client I was using Astyanax to send INSERT CQL statements.
> It starts to insert the data, but after some time I receive this error and 
> both the server and the client die.
> Does someone know how to fix it? Is this the best way to insert the data?
> Attached is the cassandra.yaml that the servers use.
> 
> PS: I am just reading the data from a Postgres table, tuple by tuple, adding one 
> column as the key for Cassandra, and inserting it into Cassandra's servers.
> 
> 
> The full log is:
> 
>  ERROR service.AbstractCassandraDaemon: Exception in thread 
> Thread[Thrift:4,5,main]
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2746)
> at java.util.ArrayList.ensureCapacity(ArrayList.java:187)
> at java.util.ArrayList.add(ArrayList.java:378)
> at 
> org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:116)
> at 
> org.antlr.runtime.CommonTokenStream.mark(CommonTokenStream.java:305)
> at org.antlr.runtime.DFA.predict(DFA.java:68)
> at org.apache.cassandra.cql.CqlParser.query(CqlParser.java:215)
> at 
> org.apache.cassandra.cql.QueryProcessor.getStatement(QueryProcessor.java:959)
> at 
> org.apache.cassandra.cql.QueryProcessor.process(QueryProcessor.java:879)
> at 
> org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1267)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3637)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3625)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
> at 
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:636)
> [04/04/13 20:20:04,715 BRT] ERROR service.AbstractCassandraDaemon: Exception 
> in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> java.lang.OutOfMemoryError: Java heap space
> [04/04/13 20:20:04,712 BRT] ERROR service.AbstractCassandraDaemon: Exception 
> in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> java.lang.OutOfMemoryError: Java heap space
> [04/04/13 20:20:04,711 BRT] ERROR service.AbstractCassandraDaemon: Exception 
> in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> java.lang.OutOfMemoryError: Java heap space
> [04/04/13 20:20:04,710 BRT] ERROR service.AbstractCassandraDaemon: Exception 
> in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> java.lang.OutOfMemoryError: Java heap space
> [04/04/13 20:20:04,702 BRT] ERROR service.AbstractCassandraDaemon: Exception 
> in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> java.lang.OutOfMemoryError: Java heap space
> [04/04/13 20:20:04,700 BRT] ERROR service.AbstractCassandraDaemon: Exception 
> in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> java.lang.OutOfMemoryError: Java heap space
> [04/04/13 20:20:04,692 BRT] ERROR service.AbstractCassandraDaemon: Exception 
> in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> java.lang.OutOfMemoryError: Java heap space
> [04/04/13 20:20:04,591 BRT] ERROR service.AbstractCassandraDaemon: Exception 
> in thread Thread[RMI TCP Connection(idle),5,RMI Runtime]
> java.lang.OutOfMemoryError: Java heap space
> [04/04/13 20:20:01,114 BRT] ERROR service.AbstractCassandraDaemon: Exception 
> in thread Thread[COMMIT-LOG-WRITER,5,main]
> java.lang.OutOfMemoryError: Java heap space
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2072)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:423)
> at 
> org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:47)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> at java.lang.Thread.run(Thread.java:636)
> [04/04/13 20:20:01,114 BRT] E

Re: Problems with shuffle

2013-04-08 Thread Rustam Aliyev
After 2 days of endless compactions and streaming I had to stop this and 
cancel the shuffle. One of the nodes even complained that there was no free 
disk space (it grew from 30GB to 400GB). After all these problems the number of 
moved tokens was less than 40 (out of 1280!).


Now, when nodes start they report duplicate ranges. I wonder how bad that is 
and how I can get rid of it?


 INFO [GossipStage:1] 2013-04-09 02:16:37,920 StorageService.java (line 
1386) Nodes /10.0.1.2 and /10.0.1.1 have the same token 
99027485685976232531333625990885670910.  Ignoring /10.0.1.2
 INFO [GossipStage:1] 2013-04-09 02:16:37,921 StorageService.java (line 
1386) Nodes /10.0.1.2 and /10.0.1.4 have the same token 
4319990986300976586937372945998718.  Ignoring /10.0.1.2


Overall, I'm not sure how bad it is to leave data unshuffled (I read the 
DataStax blog post, but it's not clear). When adding a new node, wouldn't it be 
assigned ranges randomly from all nodes?


Some other notes inline below:

On 08/04/2013 15:00, Eric Evans wrote:

[ Rustam Aliyev ]

Hi,

After upgrading to the vnodes I created and enabled shuffle
operation as suggested. After running for a couple of hours I had to
disable it because nodes were not catching up with compactions. I
repeated this process 3 times (enable/disable).

I have 5 nodes and each of them had ~35GB. After shuffle operations
described above some nodes are now reaching ~170GB. In the log files
I can see same files transferred 2-4 times to the same host within
the same shuffle session. Worst of all, after all of these I had
only 20 vnodes transferred out of 1280. So if it will continue at
the same speed it will take about a month or two to complete
shuffle.

As Edward says, you'll need to issue a cleanup post-shuffle if you expect
to see disk usage match your expectations.


I had a few questions to better understand shuffle:

1. Does disabling and re-enabling shuffle start the shuffle process from
scratch, or does it resume from the last point?

It resumes.


2. Will vnode reallocations speed up as the shuffle proceeds, or will they
remain the same?

The shuffle proceeds synchronously, 1 range at a time; It's not going to
speed up as it progresses.


3. Why do I see multiple transfers of the same file to the same host? e.g.:

INFO [Streaming to /10.0.1.8:6] 2013-04-07 14:27:10,038
StreamReplyVerbHandler.java (line 44) Successfully sent
/u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
to /10.0.1.8
INFO [Streaming to /10.0.1.8:7] 2013-04-07 16:27:07,427
StreamReplyVerbHandler.java (line 44) Successfully sent
/u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
to /10.0.1.8

I'm not sure, but perhaps that file contained data for two different
ranges?
Does it mean that if I have a huge file (e.g. 20GB) which contains a lot of 
ranges (let's say 100), it will be transferred each time (20GB*100)?



4. When I enable/disable shuffle I receive a warning message such as the one
below. Do I need to worry about it?

cassandra-shuffle -h localhost disable
Failed to enable shuffling on 10.0.1.1!
Failed to enable shuffling on 10.0.1.3!

Is that the verbatim output?  Did it report failing to enable when you
tried to disable?
Yes, this is the verbatim output. It reports failure for enable as well as 
disable. Nodes .1.1 and .1.3 were not RELOCATING unless I ran the 
cassandra-shuffle enable command on them locally.


As a rule of thumb though, you don't want a disable/enable to result in
only a subset of nodes shuffling.  Are there no other errors?  What do
the logs say?

No errors in logs. Only INFO about streams and WARN about relocation.



I couldn't find many docs on shuffle, only read through JIRA and
original proposal by Eric.




RE: gossip not working

2013-04-08 Thread S C
I did try this option and everything is working fine. Thank you Aaron.
From: aa...@thelastpickle.com
Subject: Re: gossip not working
Date: Fri, 5 Apr 2013 23:02:58 +0530
To: user@cassandra.apache.org

Starting the node with the JVM option -Dcassandra.load_ring_state=false in 
cassandra-env.sh sometimes works. 
If not, post the output from nodetool gossipinfo.
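
One common way to pass that option, assuming you add it through cassandra-env.sh,
is a line like the following (adjust to however your install sets JVM options):

    JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"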
Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com



On 5/04/2013, at 9:38 AM, S C  wrote:

Is there a way to force gossip among the nodes?

From: as...@outlook.com
To: user@cassandra.apache.org
Subject: RE: gossip not working
Date: Thu, 4 Apr 2013 19:59:45 -0500

I am not seeing anything in the logs other than "Starting up server gossip" and 
there is no firewall between the nodes.
From: paulsu...@gmail.com
Subject: Re: gossip not working
Date: Thu, 4 Apr 2013 18:49:29 -0500
To: user@cassandra.apache.org

What errors are you seeing in the log files of the down nodes? Did you run 
upgradesstables? You need to run upgradesstables when moving from < 1.1.7 to 1.1.9.
On Apr 4, 2013, at 6:11 PM, S C  wrote:

I was in the middle of an upgrade to 1.1.9. I brought one node up with 1.1.9 while 
the others were running 1.1.5. Once that node was on 1.1.9 it no longer recognized 
the other nodes in the ring.
On 192.168.56.10 and 11
192.168.56.10  DC1-Cass  RAC1  Up    Normal  28.06 GB  50.00%  0
192.168.56.11  DC1-Cass  RAC1  Up    Normal  31.59 GB  25.00%  42535295865117307932921825928971026432
192.168.56.12  DC1-Cass  RAC1  Down  Normal  29.02 GB  25.00%  85070591730234615865843651857942052864

On 192.168.56.12
192.168.56.10  DC1-Cass  RAC1  Down  Normal  28.06 GB  50.00%  0
192.168.56.11  DC1-Cass  RAC1  Down  Normal  31.59 GB  25.00%  42535295865117307932921825928971026432
192.168.56.12  DC1-Cass  RAC1  Up    Normal  29.02 GB  25.00%  85070591730234615865843651857942052864

I do not see anything in the logs that tells me that there is a gossip issue.
nodetool info
Token            : 85070591730234615865843651857942052864
Gossip active    : true
Thrift active    : true
Load             : 29.05 GB
Generation No    : 1365114563
Uptime (seconds) : 2127
Heap Memory (MB) : 848.71 / 7945.94
Exceptions       : 0
Key Cache        : size 2208 (bytes), capacity 104857584 (bytes), 1056 hits, 1099 requests, 0.961 recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

nodetool info
Token            : 42535295865117307932921825928971026432
Gossip active    : true
Thrift active    : true
Load             : 31.59 GB
Generation No    : 1364413038
Uptime (seconds) : 703904
Heap Memory (MB) : 733.02 / 7945.94
Exceptions       : 1
Key Cache        : size 3693312 (bytes), capacity 104857584 (bytes), 26071678 hits, 26616282 requests, 0.980 recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds


There is no firewall between the nodes and they can reach each other on the storage 
port. What else should I be looking at to find the root cause? Appreciate your 
inputs.
  

data modeling from batch_mutate point of view

2013-04-08 Thread DE VITO Dominique
Hi,

I have a use case that amounts to storing data associated with files. So, I 
store the data with this CF:
rowkey = (folder_id, file_id)
colname = property name (about the file corresponding to file_id)
colvalue = property value

And I have CF for "manual" indexing:
rowkey = (folder_id, indexed value)
colname = (timestamp, file_id)
colvalue = ""

like
rowkey = (folder_id, note_of_5) or (folder_id, some_status)
colname = (some_date, some_filename)
colvalue = ""

I have many CF for indexing, as I index according to different (file) 
properties.

So, one alternative design for indexing CF could be:
rowkey = folder_id
colname = (indexed value, timestamp, file_id)
colvalue = ""

Alternative design:
* pro: same rowkey for all indexing CFs => **all** indexing CFs could be updated 
through one batch_mutate
* con: repeating "indexed value" (the 1st colname part) again and again (= a string 
of up to 20 chars)

According to these pros and cons, is the alternative design more or less interesting?

Thanks.

Dominique
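
For what it's worth, a minimal Astyanax sketch of the "one batch for all indexing
CFs" pro might look like this; the CF names are hypothetical, plain string column
names stand in for the composites described above, and the keyspace wiring is
assumed:

    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.MutationBatch;
    import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.serializers.StringSerializer;

    public class FolderIndexWriter {
        // Hypothetical indexing CFs keyed by folder_id (composite column names elided).
        private static final ColumnFamily<String, String> CF_INDEX_BY_NOTE =
                new ColumnFamily<String, String>("index_by_note",
                        StringSerializer.get(), StringSerializer.get());
        private static final ColumnFamily<String, String> CF_INDEX_BY_STATUS =
                new ColumnFamily<String, String>("index_by_status",
                        StringSerializer.get(), StringSerializer.get());

        public static void indexFile(Keyspace keyspace, String folderId, String note,
                                     String status, String timestamp, String fileId)
                throws ConnectionException {
            MutationBatch m = keyspace.prepareMutationBatch();
            // Same row key (folder_id) for every indexing CF, so all the index updates
            // go out in a single thrift batch_mutate call.
            m.withRow(CF_INDEX_BY_NOTE, folderId)
                    .putColumn(note + ":" + timestamp + ":" + fileId, "");
            m.withRow(CF_INDEX_BY_STATUS, folderId)
                    .putColumn(status + ":" + timestamp + ":" + fileId, "");
            m.execute();
        }
    }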




Re: Really have to repair ?

2013-04-08 Thread Edward Capriolo
Because cassandra is eventually consistent, and there are many settings
(QUORUM, ONE, hint windows) and failure modes (disk failures, cosmic rays,
node joins), there are few absolutes.


On Mon, Apr 8, 2013 at 10:15 AM,  wrote:

> So, you're saying that deleted rows can come back even if the node is
> always up or down for less than max_hint_window_in_ms, right ?
>
> --
> Cyril SCETBON
>
> On Apr 5, 2013, at 11:59 PM, Edward Capriolo 
> wrote:
>
> There are a series of edge cases that dictate the need for repair. The
> largest cases are 1) lost deletes 2) random disk corruptions
>
> In our use case we only delete entire row keys, and if the row key comes
> back it is not actually a problem because our software will find it and
> delete it again. In those places we dodge running repair, believe it or not.
>
> Edward
>
>
> On Fri, Apr 5, 2013 at 11:22 AM, Jean-Armel Luce wrote:
>
>> Hi Cyril,
>>
>> According to the documentation (
>> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair),
>> I understand that it is not necessary to repair every node before
>> gc_grace_seconds if you are sure that you don't miss running a repair each
>> time a node is down longer than gc_grace_seconds.
>>
>> "*IF* your operations team is sufficiently on the ball, you can get by
>> without repair as long as you do not have hardware failure -- in that case,
>> HintedHandoff is adequate to repair successful updates that some replicas
>> have missed"
>>
>> Am I wrong ? Thoughts ?
>>
>>
>>
>>
>> 2013/4/4 
>>
>>> Hi,
>>>
>>> I know that deleted rows can reappear if "node repair" is not run on
>>> every node before *gc_grace_seconds* seconds. However, do we really need
>>> to obey this rule if we run "node repair" on nodes that are down for more
>>> than *max_hint_window_in_ms* milliseconds?
>>>
>>> Thanks
>>>  --
>>> Cyril SCETBON
>>>
>>
>


Re: Really have to repair ?

2013-04-08 Thread cscetbon.ext
So, you're saying that deleted rows can come back even if the node is always up 
or down for less than max_hint_window_in_ms, right ?

--
Cyril SCETBON

On Apr 5, 2013, at 11:59 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

There are a series of edge cases that dictate the need for repair. The largest 
cases are 1) lost deletes 2) random disk corruptions

In our use case we only delete entire row keys, and if the row key comes back 
it is not actually a problem because our software will find it and delete it 
again. In those places we dodge running repair, believe it or not.

Edward


On Fri, Apr 5, 2013 at 11:22 AM, Jean-Armel Luce <jaluc...@gmail.com> wrote:
Hi Cyril,

According to the documentation 
(http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair), I 
understand that it is not necessary to repair every node before 
gc_grace_seconds if you are sure that you don't miss running a repair each time 
a node is down longer than gc_grace_seconds.

"*IF* your operations team is sufficiently on the ball, you can get by without 
repair as long as you do not have hardware failure -- in that case, 
HintedHandoff is adequate to repair successful updates that some replicas have 
missed"

Am I wrong ? Thoughts ?




2013/4/4 <cscetbon@orange.com>
Hi,

I know that deleted rows can reappear if "node repair" is not run on every node 
before gc_grace_seconds seconds. However, do we really need to obey this rule if 
we run "node repair" on nodes that are down for more than max_hint_window_in_ms 
milliseconds?

Thanks
--
Cyril SCETBON





Re: Problems with shuffle

2013-04-08 Thread Eric Evans
[ Rustam Aliyev ]
> Hi,
> 
> After upgrading to the vnodes I created and enabled shuffle
> operation as suggested. After running for a couple of hours I had to
> disable it because nodes were not catching up with compactions. I
> repeated this process 3 times (enable/disable).
> 
> I have 5 nodes and each of them had ~35GB. After shuffle operations
> described above some nodes are now reaching ~170GB. In the log files
> I can see same files transferred 2-4 times to the same host within
> the same shuffle session. Worst of all, after all of these I had
> only 20 vnodes transferred out of 1280. So if it will continue at
> the same speed it will take about a month or two to complete
> shuffle.

As Edward says, you'll need to issue a cleanup post-shuffle if you expect
to see disk usage match your expectations.

> I had a few questions to better understand shuffle:
> 
> 1. Does disabling and re-enabling shuffle start the shuffle process from
>    scratch, or does it resume from the last point?

It resumes.

> 2. Will vnode reallocations speed up as the shuffle proceeds, or will they
>    remain the same?

The shuffle proceeds synchronously, 1 range at a time; It's not going to
speed up as it progresses.

> 3. Why do I see multiple transfers of the same file to the same host? e.g.:
> 
>INFO [Streaming to /10.0.1.8:6] 2013-04-07 14:27:10,038
>StreamReplyVerbHandler.java (line 44) Successfully sent
>/u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
>to /10.0.1.8
>INFO [Streaming to /10.0.1.8:7] 2013-04-07 16:27:07,427
>StreamReplyVerbHandler.java (line 44) Successfully sent
>/u01/cassandra/data/Keyspace/Metadata/Keyspace-Metadata-ib-111-Data.db
>to /10.0.1.8

I'm not sure, but perhaps that file contained data for two different
ranges?

> 4. When I enable/disable shuffle I receive a warning message such as the one
>    below. Do I need to worry about it?
> 
>cassandra-shuffle -h localhost disable
>Failed to enable shuffling on 10.0.1.1!
>Failed to enable shuffling on 10.0.1.3!

Is that the verbatim output?  Did it report failing to enable when you
tried to disable?

As a rule of thumb though, you don't want a disable/enable to result in
only a subset of nodes shuffling.  Are there no other errors?  What do
the logs say?

> I couldn't find many docs on shuffle, only read through JIRA and
> original proposal by Eric.

-- 
Eric Evans
eev...@sym-link.com


Re: Issues running Bulkloader program on AIX server

2013-04-08 Thread praveen.akunuru
Thanks Aaron.

We managed to sort out the Snappyjava issue by using snappy-java-1.1.0-M3.jar 
available at http://code.google.com/p/snappy-java/downloads/list.

We are still facing an issue with the below error when using Java 1.6.

Unhandled exception
Type=Segmentation error vmState=0x
J9Generic_Signal_Number=0004 Signal_Number=000b Error_Value= Signal_Code=0032
Handler1=09001000A06FF5A0 Handler2=09001000A06F60F0...
We are troubleshooting this and I will drop a note once we get a breakthrough.

Best Regards,
Praveen
