Help: Data Migration Errors

2013-09-08 Thread Ben Waine
Hi -

Can anyone help me with some Cas data migration issues I'm having?

I'm attempting to migrate a dev ring (5 nodes) to a larger production one
(6 nodes). Both are hosted on EC2.

Cluster Info:

Small: Cassandra 1.2.6, replication factor 1, vnodes enabled
Larger: Cassandra 1.2.9, replication factor 3, vnodes enabled

After a lot of reading round I decided to use sstableloader:

http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

http://www.datastax.com/dev/blog/bulk-loading

http://geekswithblogs.net/johnsPerfBlog/archive/2011/07/26/how-to-use-cassandrs-sstableloader.aspx

I have a script which exports the target SSTables from each of the nodes
in the smaller ring and moves them to an instance which can connect to the
larger ring and run sstableloader. The script does the following on each
node (a rough sketch follows the list):

nodetool flush
nodetool compact
nodetool scrub
scp to a node-specific directory on the target instance
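
Per node, that amounts to something like this (the keyspace name, paths and
the loader host are placeholders, not my real values):

  # flush memtables, then compact and scrub so the SSTables on disk are current
  KEYSPACE=myks
  nodetool flush $KEYSPACE
  nodetool compact $KEYSPACE
  nodetool scrub $KEYSPACE
  # copy the keyspace's data directory to a node-specific directory on the loader
  scp -r /mnt/cassandra/data/$KEYSPACE \
      loader-host:/data/sstables/$(hostname)/$KEYSPACE/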

I've set up the sstableloader machine with Cassandra 1.2.9 and configured it
to bind to its static IP (listen_address, broadcast_address, rpc_address), and
added a seed node from my target ring to its seed config. The security groups
of the loader and the ring are both open to each other (all ports).
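
For reference, the relevant cassandra.yaml entries on the loader look roughly
like this (the addresses are placeholders for the loader's static IP and one
of the target ring's seeds):

  listen_address: 10.0.0.50
  broadcast_address: 10.0.0.50
  rpc_address: 10.0.0.50
  seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
            - seeds: "10.0.0.10"

and the load itself is run per keyspace/column family directory, along the
lines of:

  sstableloader -d 10.0.0.10 /data/sstables/<node>/myks/mycf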

Between tests I'm deleting the content of /mnt/cassandra/* and then
creating a fresh schema (matching the smaller ring's schema).

I'm getting some errors when attempting to load data from the sstableloader
instance.

On the sstableloader instance:

ERROR 22:03:30,409 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.IOException: Connection reset by peer
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
at sun.nio.ch.IOUtil.read(IOUtil.java:198)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at java.io.InputStream.read(InputStream.java:101)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
at java.io.DataInputStream.readInt(DataInputStream.java:388)
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:60)
at
org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:197)
at
org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:180)
at
org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
... 3 more
Exception in thread "Streaming to /10.xxx.xxx.161:1"
java.lang.RuntimeException: java.io.IOException: Connection reset by peer
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
at sun.nio.ch.IOUtil.read(IOUtil.java:198)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at java.io.InputStream.read(InputStream.java:101)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
at java.io.DataInputStream.readInt(DataInputStream.java:388)
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:60)
at
org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:197)
at
org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:180)
at
org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
... 3 more

And then the progress bar gets stuck like this:

progress: [/10.xxx.xxx.199 1/1 (100)] [/10.xxx.xxx.75 1/1 (100)]
[/10.xxx.xxx.228 1/1 (100)] [/10.xxx.xxx.243 1/1 (100)] [/10.xxx.xxx.46 1/1
(100)] [/10.xxx.xxx.161 0/1 (300)] [total: 149 - 0MB/s (avg: 0MB/s)]


And on the instances in the ring:

(node which accepts the request)

DEBUG [Thrift:5] 2013-09-08 22:03:30,027 CustomTThreadPoolServer.java (line
209) Thrift transport error occurred during processing of message.
org.

Re: Cassandra crashes - solved

2013-09-08 Thread Jan Algermissen

On 06.09.2013, at 17:07, Jan Algermissen  wrote:

> 
> On 06.09.2013, at 13:12, Alex Major  wrote:
> 
>> Have you changed the appropriate config settings so that Cassandra will run 
>> with only 2GB RAM? You shouldn't find the nodes go down.
>> 
>> Check out this blog post 
>> http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/
>>  , it outlines the configuration settings needed to run Cassandra on 64MB 
>> RAM and might give you some insights.
> 
> Yes, I have my fingers on the knobs and have also seen the article you 
> mention - very helpful indeed. As well as the replies so far. Thanks very 
> much.
> 
> However, I still manage to kill 2 or 3 nodes of my 3-node cluster with my 
> data import :-(

The problem for me was

  in_memory_compaction_limit_in_mb: 1

It seems that the combination of my rather large rows (70 columns each) and
the slower two-pass compaction process mentioned in the comment for that
config switch caused the "java.lang.AssertionError: incorrect row data size"
exceptions.

After turning in_memory_compaction_limit_in_mb back up to 64, all I am getting
is write timeouts.

AFAIU that is fine, because now C* is stable and all I have is a capacity
problem, solvable with more nodes or more RAM (maybe, depending on whether IO
is an issue).
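
For the record, the line in cassandra.yaml now reads (64 is the default; 1 was
my misguided low-memory value):

  # rows larger than this limit fall back to the slower two-pass compaction path
  in_memory_compaction_limit_in_mb: 64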

Jan



> 
> Now, while it would be easy to scale out and up a bit until the default 
> config of C* is sufficient, I really like to dive deep and try to understand 
> why the thing is still going down, IOW, which of my config settings is so 
> darn wrong that in most cases kill -9 remains the only way to shutdown the 
> Java process in the end.
> 
> 
> The problem seems to be the heap size (set to MAX_HEAP_SIZE="640M"   and 
> HEAP_NEWSIZE="120M" ) in combination with some cassandra activity that 
> demands too much heap, right?
> 
> So how do I find out what activity this is and how do I sufficiently reduce 
> that activity.
> 
> What bugs me in general is that AFAIU C* is so eager at giving massive write 
> speed, that it sort of forgets to protect itself from client demand. I would 
> very much like to understand why and how that happens.  I mean: no matter how 
> many clients are flooding the database, it should not die due to out of 
> memory situations, regardless of any configuration specifics, or?
> 
> 
> tl;dr
> 
> Currently my client side (with java-driver) after a while reports more and 
> more timeouts and then the following exception:
> 
> com.datastax.driver.core.ex
> ceptions.DriverInternalError: An unexpected error occured server side: 
> java.lang.OutOfMemoryError: unable 
> to create new native thread ;
> 
> On the server side, my cluster remains more or less in this condition:
> 
> DN  x 71,33 MB   256 34,1%  2f5e0b70-dbf4-4f37-8d5e-746ab76efbae  
> rack1
> UN  x  189,38 MB  256 32,0%  e6d95136-f102-49ce-81ea-72bd6a52ec5f  
> rack1
> UN  x198,49 MB  256 33,9%  0c2931a9-6582-48f2-b65a-e406e0bf1e56  
> rack1
> 
> The host that is down (it is the seed host, if that matters) still shows the 
> running java process, but I cannot shut down cassandra or connect with 
> nodetool, hence kill -9 to the rescue.
> 
> In that host, I still see a load of around 1.
> 
> jstack -F lists 892 threads, all blocked, except for 5 inactive ones.
> 
> 
> The system.log after a few seconds of import shows the following exception:
> 
> java.lang.AssertionError: incorrect row data size 771030 written to 
> /var/lib/cassandra/data/products/product/products-product-tmp-ic-6-Data.db; 
> correct is 771200
>at 
> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
>at 
> org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
>at 
> org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>at java.lang.Thread.run(Thread.java:724)
> 
> 
> And then, after about 2 minutes there are out of memory errors:
> 
> ERROR [CompactionExecutor:5] 2013-09-06 11:02:28,630 CassandraDaemon.java 
> (line 192) Exception in thread Thread[CompactionEx

Meaning of "java.lang.AssertionError: incorrect row data size ..."

2013-09-08 Thread Jan Algermissen
Hi,

I keep seeing the error message below in my log files. Can someone explain what 
it means and how to prevent it?




 INFO [OptionalTasks:1] 2013-09-07 13:46:27,160 MeteredFlusher.java (line 58) flushing high-traffic column family CFS(Keyspace='products', ColumnFamily='product') (estimated 3145728 bytes)
ERROR [CompactionExecutor:2] 2013-09-07 13:46:27,163 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:2,1,main]
java.lang.AssertionError: incorrect row data size 132289 written to /var/lib/cassandra/data/products/product/products-product-tmp-ic-177-Data.db; correct is 132382
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)



I am doing high-frequency inserts in a 3-node cluster with 2 GB RAM per node
and I am experiencing crashing C* processes.
My write pattern is a series of 70-column rows, with 5-20 consecutive rows
pertaining to the same wide row. Maybe that is causing too frequent compactions
at high volume?
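
For illustration, that pattern is shaped roughly like this (the table and
column names here are invented, not my real schema):

  -- each partition ("wide row") receives 5-20 clustering rows in a burst,
  -- each row carrying ~70 columns
  CREATE TABLE demo.wide_rows (
      group_key ascii,
      item_id   timeuuid,
      attr_01   ascii,
      -- ... ~70 attribute columns in total ...
      PRIMARY KEY (group_key, item_id)
  );

  INSERT INTO demo.wide_rows (group_key, item_id, attr_01)
  VALUES ('group-42', now(), 'value');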

Jan

Re: Recommended way of data migration

2013-09-08 Thread Paulo Motta
That's a good approach. You could also migrate in-place if you're confident
your migration algorithm is correct, but for more safety having another CF
is better.

If you have a huge volume of data to be migrated (millions of rows or
more), I'd suggest you to use Hadoop to perform these migrations (
http://wiki.apache.org/cassandra/HadoopSupport).

If it's only a few rows, then you could do it programmatically via
*get_range_slices* using the language binding of your choice. Below are some
links on how to do this with Hector or Pycassa:

* Hector:
http://stackoverflow.com/questions/8418448/cassandra-hector-how-to-retrieve-all-rows-of-a-column-family
* Pycassa:
http://pycassa.github.io/pycassa/api/pycassa/columnfamily.html#pycassa.columnfamily.ColumnFamily.get_range

I agree with Edward that you should only delete the rows once you have made
sure they were correctly migrated.
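
If it helps, steps 2-4 of your plan might look roughly like this with Pycassa
(an untested sketch: host, keyspace and CF names are placeholders, and the
Thrift view of a CQL3 table will differ slightly from the definition quoted
below):

  # page through the source CF, transform each row, write it to the new CF
  import pycassa

  pool = pycassa.ConnectionPool('MyKeyspace', ['10.0.0.1:9160'])
  src = pycassa.ColumnFamily(pool, 'example')      # source CF
  dst = pycassa.ColumnFamily(pool, 'example_v2')   # migrated CF

  def encrypt(value):
      return value  # placeholder for the real transformation

  # get_range() pages transparently; buffer_size is how many rows are fetched
  # per Thrift call, which covers the "paging" in step 2
  for key, columns in src.get_range(buffer_size=1024):
      columns['data'] = encrypt(columns.get('data', ''))
      dst.insert(key, columns)
      # delete from the source only after verifying the migrated copy,
      # as Edward suggests
      # src.remove(key)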


2013/9/7 Edward Capriolo 

> I would do something like you are suggesting. I would not do the delete
> until all the rows are moved. Since writes in cassandra are idempotent you
> can even run the migration process multiple times without harm.
>
>
> On Sat, Sep 7, 2013 at 5:31 PM, Renat Gilfanov  wrote:
>
>> Hello,
>>
>> Let's say we have a simple CQL3 table
>>
>> CREATE TABLE example (
>> id UUID PRIMARY KEY,
>> timestamp TIMESTAMP,
>> data ASCII
>> );
>>
>> And I need to mutate  (for example encrypt) column values in the "data"
>> column for all rows.
>>
>> What's the recommended approach to perform such migration
>> programatically?
>>
>> For me the general approach is:
>>
>> 1. Create another column family
>> 2. extract a batch of records
>> 3. for each extracted record, perform mutation, insert it in the new cf
>> and delete from old one
>> 4. repeat until source cf not empty
>>
>> Is it correct approach and if yes, how to implement some kind of paging
>> for the step 2?
>>
>
>


-- 
Paulo Ricardo

-- 
European Master in Distributed Computing
Royal Institute of Technology - KTH
Instituto Superior Técnico - IST
http://paulormg.com


Re: w00tw00t.at.ISC.SANS.DFind not found

2013-09-08 Thread Tim Dunphy
Richard,

Good advice, thank you! I'll work on tuning iptables so that only my other
Cassandra nodes can connect to mx4j. Good thing I caught this; I was just
making sure JNA was working when I saw it!

Sent from my iPhone

On Sep 8, 2013, at 5:40 AM, Richard Low  wrote:

> On 8 September 2013 02:55, Tim Dunphy  wrote:
>> Hey all,
>> 
>>  I'm seeing this exception in my cassandra logs:
>> 
>> Exception during http request
>> mx4j.tools.adaptor.http.HttpException: file 
>> mx4j/tools/adaptor/http/xsl/w00tw00t.at.ISC.SANS.DFind:) not found
>> at 
>> mx4j.tools.adaptor.http.XSLTProcessor.notFoundElement(XSLTProcessor.java:314)
>> at 
>> mx4j.tools.adaptor.http.HttpAdaptor.findUnknownElement(HttpAdaptor.java:800)
>> at 
>> mx4j.tools.adaptor.http.HttpAdaptor$HttpClient.run(HttpAdaptor.java:976)
>> 
>> Do I need to be concerned about the security of this server? How can I 
>> correct/eliminate this error message? I've just upgraded to Cassandra 2.0 
>> ,and this is the first time I've seen this error. 
> 
> There is a web vulnerability scanner that does "GET 
> /w00tw00t.at.ISC.SANS.DFind:)" on anything it thinks is HTTP.  This probably 
> means your mx4j port is open to the public which is a security issue.  This 
> means anyone can e.g. delete all your data or stop your Cassandra nodes.  You 
> should make sure that all your Cassandra ports (at least) are firewalled so 
> only you and other nodes can connect.
> 
> Richard.


C* 2.0 reduce_cache_sizes_at ?

2013-09-08 Thread Andrew Cobley
I'm following John Berryman's blog "Building the Perfect Cassandra Test
Environment" concerning running C* in a very small amount of memory. He
recommends these settings in cassandra.yaml:

reduce_cache_sizes_at: 0
reduce_cache_capacity_to: 0

(Blog is at 
http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/)

I'm assuming the blog must be talking about C* prior to version 2.0, because
these settings do not appear in 2.0's .yaml file.

Why were they removed, and what's the alternative?

Andy


The University of Dundee is a registered Scottish Charity, No: SC015096



Re: w00tw00t.at.ISC.SANS.DFind not found

2013-09-08 Thread Richard Low
On 8 September 2013 02:55, Tim Dunphy  wrote:

> Hey all,
>
>  I'm seeing this exception in my cassandra logs:
>
> Exception during http request
> mx4j.tools.adaptor.http.HttpException: file
> mx4j/tools/adaptor/http/xsl/w00tw00t.at.ISC.SANS.DFind:) not found
> at
> mx4j.tools.adaptor.http.XSLTProcessor.notFoundElement(XSLTProcessor.java:314)
> at
> mx4j.tools.adaptor.http.HttpAdaptor.findUnknownElement(HttpAdaptor.java:800)
> at
> mx4j.tools.adaptor.http.HttpAdaptor$HttpClient.run(HttpAdaptor.java:976)
>
> Do I need to be concerned about the security of this server? How can I
> correct/eliminate this error message? I've just upgraded to Cassandra 2.0
> ,and this is the first time I've seen this error.
>

There is a web vulnerability scanner that does "GET
/w00tw00t.at.ISC.SANS.DFind:)" on anything it thinks is HTTP.  This
probably means your mx4j port is open to the public which is a security
issue.  This means anyone can e.g. delete all your data or stop your
Cassandra nodes.  You should make sure that all your Cassandra ports (at
least) are firewalled so only you and other nodes can connect.
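
For example, a first pass with iptables might look like this (mx4j listens on
port 8081 by default; adjust the port and source range to your environment,
and do the same for the storage, Thrift and JMX ports):

  # allow the other cluster nodes to reach mx4j, drop everyone else
  iptables -A INPUT -p tcp --dport 8081 -s 10.0.0.0/24 -j ACCEPT
  iptables -A INPUT -p tcp --dport 8081 -j DROP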

Richard.


Re: Cannot get secondary indexes on fields in compound primary key to work (Cassandra 2.0.0)

2013-09-08 Thread Petter von Dolwitz (Hem)
Thank you for your reply.

I will look into this. I cannot get my head around why the scenario I am
describing does not work, though. Should I report an issue about this, or is
this expected behaviour? A similar setup is described in this blog post by the
development lead:

http://www.datastax.com/dev/blog/cql3-for-cassandra-experts




2013/9/6 Robert Coli 

> On Fri, Sep 6, 2013 at 6:18 AM, Petter von Dolwitz (Hem) <
> petter.von.dolw...@gmail.com> wrote:
>
>> I am struggling with getting secondary indexes to work. I have created
>> secondary indexes on some fields that are part of the compound primary key
>> but only one of the indexes seems to work (the one set on the field 'e' on
>> the table definition below). Using any other secondary index in a where
>> clause causes the message "Request did not complete within rpc_timeout.".
>> It seems like if a put a value in the where clause that does not exist in a
>> column with secondary index then cassandra quickly return with the result
>> (0 rows) but if a put in a value that do exist I get a timeout. There is no
>> exception in the logs in connection with this. I've tried to increase the
>> timeout to a minute but it does not help.
>>
>
> In general unless you absolutely need the atomicity of the update of a
> secondary index with the underlying storage row, you are better off making
> a manual secondary index column family.
>
> =Rob
>