Re: MultiInput/MultiGet CF in MapReduce

2013-03-31 Thread aaron morton
> If I were to use client.get_slice(key): my rowkey is '20130314', from the Index 
> Table.
> Q1) How do I know which token range & endpoint rowkey '20130314' falls in?
Calculate the MD5 hash of the key and find the token range that contains it. 
This is what is used internally 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/FBUtilities.java#L239
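
For illustration, here is a minimal sketch of that lookup, assuming the RandomPartitioner and the token ranges returned by client.describe_ring (the class and helper names below are illustrative, not part of Cassandra's API):

import java.math.BigInteger;
import java.nio.charset.Charset;
import java.security.MessageDigest;

public class TokenLookup {

    // Mirrors FBUtilities.hashToBigInteger: MD5 of the raw key bytes, taken as a positive BigInteger.
    static BigInteger tokenFor(String rowKey) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] hash = md5.digest(rowKey.getBytes(Charset.forName("UTF-8")));
        return new BigInteger(hash).abs();
    }

    // A Thrift TokenRange is (start, end]; it wraps around the ring when start >= end.
    static boolean contains(BigInteger start, BigInteger end, BigInteger token) {
        if (start.compareTo(end) < 0) {
            return token.compareTo(start) > 0 && token.compareTo(end) <= 0;
        }
        return token.compareTo(start) > 0 || token.compareTo(end) <= 0;
    }

    public static void main(String[] args) throws Exception {
        BigInteger token = tokenFor("20130314");
        System.out.println("Token for 20130314: " + token);
        // Compare this token against each TokenRange returned by client.describe_ring(keyspace);
        // the endpoints of the matching range are the replicas that hold the row.
    }
}

Each TokenRange from describe_ring also carries its endpoints, so once the containing range is found you know which nodes to send the read to.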

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 30/03/2013, at 10:45 AM, Alicia Leong  wrote:

> This is the current flow for ColumnFamilyInputFormat.  Please correct me if 
> I'm wrong:
> 
> 1) In ColumnFamilyInputFormat, get all the nodes' token ranges using 
> client.describe_ring
> 2) Get the CfSplits using client.describe_splits_ex with each token range
> 3) Create a new ColumnFamilySplit with the start range, end range and endpoint
> 4) In ColumnFamilyRecordReader, query client.get_range_slices with the 
> start range & end range of the ColumnFamilySplit at the endpoint (datanode)
> 
> 
> If I were to use client.get_slice(key): my rowkey is '20130314', from the Index 
> Table.
> Q1) How do I know which token range & endpoint rowkey '20130314' falls in?
> Even if I manage to find out the token range & endpoint,  
> is there a Thrift API that I can pass both ( ByteBuffer key, KeyRange 
> range ) to, like a merge of client.get_slice & client.get_range_slices?
> 
> 
> Thanks
> 
> 
> 
> On Sat, Mar 30, 2013 at 7:53 AM, Edward Capriolo  
> wrote:
> You can use the output of describe_ring along with partitioner information to 
> determine which nodes data lives on.
> 
> 
> On Fri, Mar 29, 2013 at 12:33 PM, Alicia Leong  wrote:
> Hi All
> I’m thinking of doing it this way:
> 
> 1) get_slice ( MMDDHH )  from the Index Table.
> 
> 2) With the returned list of ROWKEYs,
> 
> 3) pass it to multiget_slice ( keys … ).
> 
>  
> But my question is: how do I ensure ‘Data Locality’?
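
For reference, a minimal raw-Thrift sketch of this two-step read (the host, keyspace and column family names are placeholders, and paging and error handling are omitted):

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class IndexThenMultiget {
    public static void main(String[] args) throws Exception {
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("MyKeyspace");                               // placeholder keyspace

        SlicePredicate allColumns = new SlicePredicate().setSlice_range(
                new SliceRange(ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, 1000));

        // Step 1: read the index row for the time bucket; its column names are data row keys.
        ByteBuffer indexKey = ByteBuffer.wrap("2013030114".getBytes("UTF-8"));
        List<ColumnOrSuperColumn> indexColumns = client.get_slice(
                indexKey, new ColumnParent("IndexTable"), allColumns, ConsistencyLevel.ONE);

        List<ByteBuffer> dataKeys = new ArrayList<ByteBuffer>();
        for (ColumnOrSuperColumn cosc : indexColumns) {
            dataKeys.add(cosc.column.name);                              // column name = data row key
        }

        // Step 2: fetch all referenced rows from the data table in one call.
        Map<ByteBuffer, List<ColumnOrSuperColumn>> rows = client.multiget_slice(
                dataKeys, new ColumnParent("DataTable"), allColumns, ConsistencyLevel.ONE);
        System.out.println("Fetched " + rows.size() + " rows");
        transport.close();
    }
}

As written everything goes through a single coordinator, so there is no data locality; to get it you would have to group the data keys by replica (for example with the token calculation Aaron describes at the top of this thread) and send each multiget_slice to a node that owns those keys.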
> 
> 
> 
> On Tue, Mar 19, 2013 at 3:33 PM, aaron morton  wrote:
> I would be looking at Hive or Pig, rather than writing the MapReduce myself. 
> 
> There is an example in the Cassandra source distribution, or you can look at 
> DataStax Enterprise to start playing with Hive. 
> 
> Typically with Hadoop you want to query a lot of data; if you are 
> only querying a few rows, consider writing the code in your favourite 
> language. 
> 
> Cheers
>  
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 18/03/2013, at 1:29 PM, Alicia Leong  wrote:
> 
>> Hi All
>> 
>> I have 2 tables 
>> 
>> Data Table 
>> -
>> RowKey: 1 
>> => (column=name, value=apple) 
>> RowKey: 2 
>> => (column=name, value=orange) 
>> RowKey: 3 
>> => (column=name, value=banana) 
>> RowKey: 4 
>> => (column=name, value=mango) 
>> 
>> 
>> Index Table (MMDDHH)
>> 
>> RowKey: 2013030114 
>> => (column=1, value=) 
>> => (column=2, value=) 
>> => (column=3, value=) 
>> RowKey: 2013030115 
>> => (column=4, value=) 
>> 
>> 
>> I would like to know how to implement the following in MapReduce: 
>> 1) first query the Index Table by RowKey: 2013030114 
>> 2) then pass the Index Table column names (1, 2, 3) to query the Data Table 
>> 
>> Thanks in advance.
> 
> 
> 
> 



Re: Lost data after expanding cluster c* 1.2.3-1

2013-03-31 Thread aaron morton
Please do not rely on colour in your emails, the best way to get your emails 
accepted by the Apache mail servers is to use plain text. 

> At this point the errors started; we saw that members and other data were 
> gone. At this point nodetool status returned (the 3 new nodes in red):
What errors?

> I put for each of them seeds = A ip, and start each with two minutes 
> intervals. 
When I'm making changes I tend to change a single node first, confirm 
everything is OK and then do a bulk change.

> Now the cluster seems to work normally, but I can't use the secondary indexes 
> for the moment; the query answers are random.
Run nodetool repair -pr on each node; let it finish before starting the next 
one. 
If you are using secondary indexes, use nodetool rebuild_index to rebuild them. 
Add one new node to the cluster and confirm everything is OK, then add the 
remaining ones. 

I'm not sure what went wrong or why, but that should get you to a stable 
place. If you have any problems, keep an eye on the logs for errors or warnings. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/03/2013, at 10:01 PM, Kais Ahmed  wrote:

> Hi aaron,
> 
> Thanks for the reply, I will try to explain exactly what happened.
> 
> I had a 4-node C* cluster [A,B,C,D] (version 1.2.3-1) started with the EC2 AMI 
> (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2) with 
> this config: --clustername myDSCcluster --totalnodes 4 --version community
> 
> Two days after this cluster went into production, I saw that it was 
> overloaded, so I wanted to extend it by adding 3 more nodes.
> 
> I created a new cluster with 3 C* nodes [D,E,F]  
> (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2)
> 
> and followed the documentation 
> (http://www.datastax.com/docs/1.2/install/expand_ami) for adding them to the 
> ring.
> I set seeds = A's IP for each of them, and started them at two-minute 
> intervals. 
> 
> At this point the errors started; we saw that members and other data were 
> gone. At this point nodetool status returned (the 3 new nodes in red):
> 
> Datacenter: eu-west
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns   Host ID                               Rack
> UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
> UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
> UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
> UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
> UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
> UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
> UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
> 
> I saw that the 3 nodes had joined the ring but they had no data. I put the 
> website in maintenance and launched a nodetool repair on
> the 3 new nodes; for 5 hours I watched in OpsCenter the data streaming to the 
> new nodes (very nice :))
> 
> During this time, I wrote a script to check whether all members are present 
> (relative to a copy of the members in MySQL).
> 
> After that the streaming seemed to be finished, but I'm not sure because nodetool 
> compactionstats showed pending tasks while nodetool netstats seemed to be OK.
> 
> I ran my script to check the data, but members were still missing.
> 
> I decided to roll back by running nodetool decommission on nodes D, E, F.
> 
> I re-ran my script; all seemed to be OK, but the secondary index had strange 
> behaviour: 
> sometimes the row was returned, sometimes no result.
> 
> The user 'kais' can be retrieved by his key with cassandra-cli, but if I use 
> cqlsh:
> 
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
> 
>  login
> 
>  kais
> 
> cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
> 
>  login
> 
>  kais
> 
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
> 
>  login
> 
>  kais
> 
> cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
> 
>  login
> 
>  kais
> 
> cqlsh:mydatabase> TRACING ON;
> When tracing is activated I get this error, but not every time:
> cqlsh:mydatabase> SELECT * FROM userdata where login='kais' ;
> unsupported operand type(s) for /: 'NoneType' and 'float'
> 
> 
> NOTE: When the cluster contained 7 nodes, I saw that my table userdata (RF 
> 3) on node D was replicated on E and F, which seems strange because those 3 
> nodes were not correctly filled.
> 
> Now the cluster seems to work normally, but I can't use the secondary indexes 
> for the moment; the query answers are random.

Re: weird behavior with RAID 0 on EC2

2013-03-31 Thread Rudolf van der Leeden
I've seen the same behaviour (a SLOW ephemeral disk) a few times. 
You can't do anything with a single slow disk except not use it. 
Our solution was always: replace the m1.xlarge instance ASAP and everything is 
good.
-Rudolf.

On 31.03.2013, at 18:58, Alexis Lê-Quôc wrote:

> Alain,
> 
> Can you post your mdadm --detail /dev/md0 output here as well as your iostat 
> -x -d when that happens. A bad ephemeral drive on EC2 is not unheard of.
> 
> Alexis | @alq | http://datadog.com
> 
> P.S. also, disk utilization is not a reliable metric, iostat's await and 
> svctm are more useful imho.
> 
> 
> On Sun, Mar 31, 2013 at 6:03 AM, aaron morton  wrote:
>> Ok, if you're going to look into it, please keep me/us posted.
> 
> It's not on my radar.
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 28/03/2013, at 2:43 PM, Alain RODRIGUEZ  wrote:
> 
>> Ok, if you're going to look into it, please keep me/us posted.
>> 
>> It happened twice for me, the same day, within a few hours on the same node, 
>> and it only happened to 1 node out of 12, making that node almost unreachable.
>> 
>> 
>> 2013/3/28 aaron morton 
>> I noticed this on an m1.xlarge (cassandra 1.1.10) instance today as well: 1 
>> or 2 disks in a RAID 0 running at 85 to 100%, the others at 35 to 50ish. 
>> 
>> Have not looked into it. 
>> 
>> Cheers
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 26/03/2013, at 11:57 PM, Alain RODRIGUEZ  wrote:
>> 
>>> We use C* on m1.xlarge AWS EC2 servers, with 4 disks (xvdb, xvdc, xvdd, xvde) 
>>> as parts of a logical RAID 0 (md0).
>>> 
>>> I used to see their utilisation increase in the same way. This morning there was a 
>>> normal minor compaction followed by messages dropped on one node (out of 
>>> 12).
>>> 
>>> Looking closely at this node I saw the following:
>>> 
>>> http://img69.imageshack.us/img69/9425/opscenterweirddisk.png
>>> 
>>> On this node, one of the four disks (xvdd) started working hard while the 
>>> others worked less intensively.
>>> 
>>> This is quite weird since I have always seen these 4 disks being used exactly the 
>>> same way at every moment (as you can see on the 5 other nodes, or when the node 
>>> ".239" comes back to normal).
>>> 
>>> Any idea what happened and how it can be avoided?
>>> 
>>> Alain
>> 
>> 
> 
> 



Re: weird behavior with RAID 0 on EC2

2013-03-31 Thread Alexis Lê-Quôc
Alain,

Can you post your mdadm --detail /dev/md0 output here as well as your
iostat -x -d when that happens. A bad ephemeral drive on EC2 is not unheard
of.

Alexis | @alq | http://datadog.com

P.S. also, disk utilization is not a reliable metric, iostat's await and
svctm are more useful imho.


On Sun, Mar 31, 2013 at 6:03 AM, aaron morton wrote:

> Ok, if you're going to look into it, please keep me/us posted.
>
> It's not on my radar.
>
> Cheers
>
>-
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 28/03/2013, at 2:43 PM, Alain RODRIGUEZ  wrote:
>
> Ok, if you're going to look into it, please keep me/us posted.
>
> It happened twice for me, the same day, within a few hours on the same node,
> and it only happened to 1 node out of 12, making that node almost unreachable.
>
>
> 2013/3/28 aaron morton 
>
>> I noticed this on an m1.xlarge (cassandra 1.1.10) instance today as well:
>> 1 or 2 disks in a RAID 0 running at 85 to 100%, the others at 35 to 50ish.
>>
>> Have not looked into it.
>>
>> Cheers
>>
>>-
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 26/03/2013, at 11:57 PM, Alain RODRIGUEZ  wrote:
>>
>> We use C* on m1.xlarge AWS EC2 servers, with 4 disks (xvdb, xvdc, xvdd,
>> xvde) as parts of a logical RAID 0 (md0).
>>
>> I used to see their utilisation increase in the same way. This morning there was
>> a normal minor compaction followed by messages dropped on one node (out of
>> 12).
>>
>> Looking closely at this node I saw the following:
>>
>> http://img69.imageshack.us/img69/9425/opscenterweirddisk.png
>>
>> On this node, one of the four disks (xvdd) started working hard while the
>> others worked less intensively.
>>
>> This is quite weird since I have always seen these 4 disks being used exactly
>> the same way at every moment (as you can see on the 5 other nodes, or when the
>> node ".239" comes back to normal).
>>
>> Any idea what happened and how it can be avoided?
>>
>> Alain
>>
>>
>>
>
>


Re: Lost data after expanding cluster c* 1.2.3-1

2013-03-31 Thread Kais Ahmed
Hi aaron,

Thanks for the reply, I will try to explain exactly what happened.

I had a 4-node C* cluster [A,B,C,D] (version 1.2.3-1) started with the EC2 AMI (
https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2) with
this config: --clustername myDSCcluster --totalnodes 4 --version community

Two days after this cluster went into production, I saw that it was
overloaded, so I wanted to extend it by adding 3 more nodes.

I created a new cluster with 3 C* nodes [D,E,F]  (
https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2)

and followed the documentation (
http://www.datastax.com/docs/1.2/install/expand_ami) for adding them to the
ring.
I set seeds = A's IP for each of them, and started them at two-minute
intervals.

At this point the errors started; we saw that members and other data were
gone. At this point nodetool status returned (the 3 new nodes in red):

Datacenter: eu-west
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns   Host ID                               Rack
UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b


I saw that the 3 nodes had joined the ring but they had no data. I put the
website in maintenance and launched a nodetool repair on
the 3 new nodes; for 5 hours I watched in OpsCenter the data streaming to the
new nodes (very nice :))

During this time, I wrote a script to check whether all members are present
(relative to a copy of the members in MySQL).

After that the streaming seemed to be finished, but I'm not sure because nodetool
compactionstats showed pending tasks while nodetool netstats seemed to be OK.

I ran my script to check the data, but members were still missing.

I decided to roll back by running nodetool decommission on nodes D, E, F.

I re-ran my script; all seemed to be OK, but the secondary index had strange
behaviour:
sometimes the row was returned, sometimes no result.

The user 'kais' can be retrieved by his key with cassandra-cli, but if I use
cqlsh:

cqlsh:database> SELECT login FROM userdata where login='kais' ;

 login

 kais

cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
cqlsh:database> SELECT login FROM userdata where login='kais' ;

 login

 kais

cqlsh:database> SELECT login FROM userdata where login='kais' ;

 login

 kais

cqlsh:database> SELECT login FROM userdata where login='kais' ; //empty
cqlsh:database> SELECT login FROM userdata where login='kais' ;

 login

 kais

cqlsh:mydatabase> TRACING ON;
When tracing is activated I get this error, but not every time:
cqlsh:mydatabase> SELECT * FROM userdata where login='kais' ;
unsupported operand type(s) for /: 'NoneType' and 'float'


NOTE: When the cluster contained 7 nodes, I saw that my table userdata (RF
3) on node D was replicated on E and F, which seems strange because those
3 nodes were not correctly filled.

Now the cluster seems to work normally, but I can't use the secondary indexes
for the moment; the query answers are random.

Thanks a lot for any help,
Kais





2013/3/31 aaron morton 

> First thought is the new nodes were marked as seeds.
> Next thought is check the logs for errors.
>
> You can always run a nodetool repair if you are concerned data is not
> where you think it should be.
>
> Cheers
>
>
>-
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/03/2013, at 8:01 PM, Kais Ahmed  wrote:
>
> Hi all,
>
> I followed this tutorial to expand a 4-node c* cluster (production) by adding 3
> new nodes.
>
> Datacenter: eu-west
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns   Host ID                               Rack
> UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
> UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
> UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
> UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
> UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
> UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
> UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>
> The data are not streamed.

Re: CQL queries timing out (and had worked)

2013-03-31 Thread Edward Capriolo
Technically a mix of hsha and the other option should work. I tried a
mix and match and noticed some clients were not happy and some other odd
stuff, but I could not tie it down to the setting because Thrift from the
CLI was working for me.


On Sun, Mar 31, 2013 at 6:30 AM, aaron morton wrote:

> So that mismatch can break rpc across the cluster, apparently.
>
> mmm, that ain't right.
>
> Anything in the logs?
> Can you reproduce this on a small cluster or using ccm
> https://github.com/pcmanus/ccm ?
> Can you raise a ticket ?
>
> Thanks
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/03/2013, at 9:17 PM, David McNelis  wrote:
>
> Final reason for problem:
>
> We'd had one node's config for rpc type changed from sync to hsha...
>
> So that mismatch can break rpc across the cluster, apparently.
>
> It would be nice if there was a good way to set that in a single spot for
> the cluster or handle the mismatch differently.  Otherwise, if you wanted
> to change from sync to hsha in a cluster you'd have to entirely restart the
> cluster (not a big deal), but CQL would apparently not work at all until
> all of your nodes had been restarted.
>
>
> On Fri, Mar 29, 2013 at 10:35 AM, David McNelis wrote:
>
>> Appears that restarting a node makes CQL available on that node again,
>> but only that node.
>>
>> Looks like I'll be doing a rolling restart.
>>
>>
>> On Fri, Mar 29, 2013 at 10:26 AM, David McNelis wrote:
>>
>>> I'm running 1.2.3 and have both CQL3 tables and old school style CFs in
>>> my cluster.
>>>
>>> I'd had a large insert job running the last several days which just
>>> ended; it had been inserting using CQL3 insert statements into a CQL3
>>> table.
>>>
>>> Now, I show no compactions going on in my cluster, but for some reason
>>> any CQL3 query I try to execute, insert, select, through cqlsh or through an
>>> external library, all time out with an rpc_timeout.
>>>
>>> If I use cassandra-cli, I can do "list tablename limit 10" and
>>> immediately get my 10 rows back.
>>>
>>> However, if I do "select * from tablename limit 10" I get the rpc
>>> timeout error.  Same table, same server.  It doesn't seem to matter if I'm
>>> hitting a CQL3-defined table or an older style one.
>>>
>>> Load on the nodes is relatively low at the moment.
>>>
>>> Any suggestions short of restarting nodes?  This is a pretty major issue
>>> for us right now.
>>>
>>
>>
>
>


Re: Cassandra/MapReduce ‘Data Locality’

2013-03-31 Thread Alicia Leong
I have 4 Cassandra nodes, each also running a DataNode & TaskTracker.
This log was printed to the console when I executed hadoop jar:

TokenRange (1) >> 127605887595351923798765477786913079296 => 0

TokenRange (2) >> 85070591730234615865843651857942052864 =>
127605887595351923798765477786913079296

TokenRange (3) >> 42535295865117307932921825928971026432 =>
85070591730234615865843651857942052864

TokenRange (4) >> 0 => 42535295865117307932921825928971026432

ColumnFamilySplit((127605887595351923798765477786913079296, '-1] @[d2t0050g
])

ColumnFamilySplit((-1, '0] @[d2t0050g])

ColumnFamilySplit((85070591730234615865843651857942052864,
'127605887595351923798765477786913079296] @[d2t0053g])

ColumnFamilySplit((42535295865117307932921825928971026432,
'85070591730234615865843651857942052864] @[d2t0052g])

ColumnFamilySplit((0, '42535295865117307932921825928971026432] @[d2t0051g])


The logs below were grabbed from each DataNode's map log folder.
In my mapper's map method:
System.out.println("Rowkey " + ByteBufferUtil.toInt(key) + ":" + name + "="
+ value + " from " + context.getInputSplit());
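
For context, a rough sketch of the kind of mapper this println sits in, assuming the standard ColumnFamilyInputFormat key/value types in Cassandra 1.2 (the class name and the Text output types are illustrative):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.SortedMap;

import org.apache.cassandra.db.IColumn;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// ColumnFamilyInputFormat hands each mapper a row key plus that row's columns;
// context.getInputSplit() is the ColumnFamilySplit the row was read from,
// which is what the log lines above are printing.
public class AwardsMapper extends Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, Text> {

    @Override
    protected void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
            throws IOException, InterruptedException {
        for (IColumn column : columns.values()) {
            String name = ByteBufferUtil.string(column.name());
            String value = ByteBufferUtil.string(column.value());
            System.out.println("Rowkey " + ByteBufferUtil.toInt(key) + ":" + name + "="
                    + value + " from " + context.getInputSplit());
            context.write(new Text(name), new Text(value));
        }
    }
}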

*RF1*---

*d2t0050g*

KeyRange(start_token:127605887595351923798765477786913079296, end_token:-1,
count:4096)

*d2t0051g*

KeyRange(start_token:85070591730234615865843651857942052864,
end_token:127605887595351923798765477786913079296, count:4096)

Rowkey:3; columnvalue=Critics Choice Awards from
ColumnFamilySplit((85070591730234615865843651857942052864,
'127605887595351923798765477786913079296] @[d2t0053g])

KeyRange(start_token:117356732921465116845890410746976120467,
end_token:127605887595351923798765477786913079296, count:4096)

KeyRange(start_token:0, end_token:42535295865117307932921825928971026432,
count:4096)

Rowkey:1; columnvalue=Academy Awards from ColumnFamilySplit((0,
'42535295865117307932921825928971026432] @[d2t0051g])

Rowkey 2: columnvalue=Golden Globe Awards from ColumnFamilySplit((0,
'42535295865117307932921825928971026432] @[d2t0051g])

KeyRange(start_token:19847720572362509985402305765727304993,
end_token:42535295865117307932921825928971026432, count:4096)

*d2t0052g*

KeyRange(start_token:42535295865117307932921825928971026432,
end_token:85070591730234615865843651857942052864, count:4096)

KeyRange(start_token:-1, end_token:0, count:4096)

*d2t0053g*

Nil





On Sun, Mar 31, 2013 at 6:26 PM, aaron morton wrote:

> > ColumnFamilySplit((85070591730234615865843651857942052864,
> '127605887595351923798765477786913079296] @[d2t0053g])
> Can you provide some more information on where these log lines are from
> and what you did to get them ?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/03/2013, at 9:12 PM, Alicia Leong  wrote:
>
> > Hi All,
> >
> > The CfSplit that is highlighted in RED is in d2t0053g.
> > 
> > But why is it being submitted to d2t0051g and not d2t0053g ??
> > 
> > Is this normal? Does it matter? In this case there is no longer ‘Data
> Locality’, correct ?
> >
> >
> >
> > I’m using hadoop-1.1.2 & apache-cassandra-1.2.3.
> >
> > TokenRange (1) >> 127605887595351923798765477786913079296 => 0
> >
> > TokenRange (2) >> 85070591730234615865843651857942052864 =>
> 127605887595351923798765477786913079296
> >
> > TokenRange (3) >> 42535295865117307932921825928971026432 =>
> 85070591730234615865843651857942052864
> >
> > TokenRange (4) >> 0 => 42535295865117307932921825928971026432
> >
> > ColumnFamilySplit((127605887595351923798765477786913079296, '-1]
> @[d2t0050g])
> >
> > ColumnFamilySplit((-1, '0] @[d2t0050g])
> >
> > ColumnFamilySplit((85070591730234615865843651857942052864,
> '127605887595351923798765477786913079296] @[d2t0053g])
> >
> > ColumnFamilySplit((42535295865117307932921825928971026432,
> '85070591730234615865843651857942052864] @[d2t0052g])
> >
> > ColumnFamilySplit((0, '42535295865117307932921825928971026432]
> @[d2t0051g])
> >
> >
> > RF1---
> >
> > d2t0050g
> >
> > KeyRange(start_token:127605887595351923798765477786913079296,
> end_token:-1, count:4096)
> >
> > d2t0051g
> >
> > KeyRange(start_token:85070591730234615865843651857942052864,
> end_token:127605887595351923798765477786913079296, count:4096)
> >
> > Rowkey:3; columnvalue=Critics Choice Awards from
> ColumnFamilySplit((85070591730234615865843651857942052864,
> '127605887595351923798765477786913079296] @[d2t0053g])
> >
> > KeyRange(start_token:117356732921465116845890410746976120467,
> end_token:127605887595351923798765477786913079296, count:4096)
> >
> > KeyRange(start_token:0,
> end_token:42535295865117307932921825928971026432, count:4096)
> >
> > Rowkey:1; columnvalue=Academy Awards from ColumnFamilySplit((0,
> '42535295865117307932921825928971026432] @[d2t0051g])
> >
> > Rowkey 2: columnvalue=Golden Globe Awards from ColumnFamilySplit((0,
> '42535295865117307932921825928971026432] @[d2t0051g])
> >
> > KeyRange(start_token:19847720572362509985402305765727304993,
> end_token:42535295865117307932921825928971026432, count:4096

Re: Insert v/s Update performance

2013-03-31 Thread aaron morton
> How does this parameter work? I have 3 nodes with 2 cores each, and I have 
> heavy writes.
It slows down the rate at which compaction reads from disk. It reads a bit then 
has to take a break and wait until it can read again. 
With only 2 cores you will be running into issues when compaction or repair do 
their work. 
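
Conceptually the throttle behaves like the sketch below; this is a simplified illustration of rate limiting, not Cassandra's actual compaction code:

// Simplified throttle: after reading a chunk, sleep long enough that the
// average read rate stays at or below the configured MB/s budget.
public class ReadThrottle {
    private final double bytesPerSecond;
    private final long startNanos = System.nanoTime();
    private long bytesRead = 0;

    public ReadThrottle(int throughputMbPerSec) {
        this.bytesPerSecond = throughputMbPerSec * 1024.0 * 1024.0;
    }

    // Call after every chunk read from disk; blocks if we are ahead of the budget.
    public void acquire(long chunkBytes) throws InterruptedException {
        bytesRead += chunkBytes;
        double elapsedSec = (System.nanoTime() - startNanos) / 1e9;
        double earliestAllowedSec = bytesRead / bytesPerSecond;
        if (earliestAllowedSec > elapsedSec) {
            Thread.sleep((long) ((earliestAllowedSec - elapsedSec) * 1000));
        }
    }
}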

> So usually, for a high-update and high-read situation, what parameters should we 
> consider tuning?
In this case I think the issue is only having 2 cores. There are background 
processes like compaction and repair that have to run while your system is 
running.

Slowing down compaction will reduce its impact. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 30/03/2013, at 12:58 AM, Jay Svc  wrote:

> Hi Aaron,
>  
> Thank you for your input. I have been monitoring my GC activity and, looking 
> at my heap, it shows pretty linear activity without any spikes.
>  
> When I look at CPU it shows higher utilization during writes alone. I 
> also expect heavy read traffic.
>  
> When I tried the compaction_throughput_* parameter, I observed that a higher number 
> here in my case gives better CPU utilization and keeps pending compactions 
> pretty low. How does this parameter work? I have 3 nodes with 2 cores each, and 
> I have heavy writes.
>  
> So usually, for a high-update and high-read situation, what parameters should we 
> consider tuning?
>  
> Thanks,
> Jay
>  
>  
>  
> 
> 
> On Wed, Mar 27, 2013 at 9:55 PM, aaron morton  wrote:
> * Check for GC activity in the logs
> * check the volume the commit log is on to see it it's over utilised. 
> * check if the dropped messages correlate to compaction, look at the 
> compaction_* settings in yaml and consider reducing the throughput. 
> 
> Like Dean says if you have existing data it will result in more compaction. 
> You may be able to get a lot of writes through in a clean new cluster, but it 
> also has to work when compaction and repair are running. 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 27/03/2013, at 1:43 PM, Jay Svc  wrote:
> 
>> Thanks Dean again!
>>  
>> My use case has a high number of reads and writes; of those I am just 
>> focusing on writes now. I thought LCS was suitable for my situation. I tried 
>> the same on STCS and the results are the same.
>>  
>> I ran nodetool tpstats and the MutationStage pending count is very high. At the 
>> same time the SSTable count and pending compactions are high too during my 
>> updates.
>>  
>> Please find a snapshot of my system log below.
>>  
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:48,560 StatusLogger.java (line 116) 
>> OpsCenter.rollups864000,0
>> INFO [FlushWriter:55] 2013-03-26 15:05:48,608 Memtable.java (line 264) 
>> Writing Memtable-InventoryPrice@1051586614(11438914/129587272 
>> serialized/live bytes, 404320 ops)
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,561 MessagingService.java (line 
>> 658) 2701 MUTATION messages dropped in last 5000ms
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,562 StatusLogger.java (line 57) 
>> Pool NameActive   Pending   Blocked
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,563 StatusLogger.java (line 72) 
>> ReadStage 0 0 0
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,568 StatusLogger.java (line 72) 
>> RequestResponseStage  0 0 0
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,627 StatusLogger.java (line 72) 
>> ReadRepairStage   0 0 0
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,627 StatusLogger.java (line 72) 
>> MutationStage32 19967 0
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,628 StatusLogger.java (line 72) 
>> ReplicateOnWriteStage 0 0 0
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,628 StatusLogger.java (line 72) 
>> GossipStage   0 0 0
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,628 StatusLogger.java (line 72) 
>> AntiEntropyStage  0 0 0
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,629 StatusLogger.java (line 72) 
>> MigrationStage0 0 0
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,629 StatusLogger.java (line 72) 
>> StreamStage   0 0 0
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,629 StatusLogger.java (line 72) 
>> MemtablePostFlusher   1 1 0
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,673 StatusLogger.java (line 72) 
>> FlushWriter   1 1 0
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,673 StatusLogger.java (line 72) 
>> MiscStage 0 0 0
>> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,673 StatusLogger.java (line

Re: CQL queries timing out (and had worked)

2013-03-31 Thread aaron morton
> So that mismatch can break rpc across the cluster, apparently.  
mmm, that ain't right. 

Anything in the logs?
Can you reproduce this on a small cluster or using ccm 
https://github.com/pcmanus/ccm ?
Can you raise a ticket ? 

Thanks

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/03/2013, at 9:17 PM, David McNelis  wrote:

> Final reason for problem:
> 
> We'd had one node's config for rpc type changed from sync to hsha...  
> 
> So that mismatch can break rpc across the cluster, apparently.  
> 
> It would be nice if there was a good way to set that in a single spot for the 
> cluster or handle the mismatch differently.  Otherwise, if you wanted to 
> change from sync to hsha in a cluster you'd have to entirely restart the 
> cluster (not a big deal), but CQL would apparently not work at all until all 
> of your nodes had been restarted.
> 
> 
> On Fri, Mar 29, 2013 at 10:35 AM, David McNelis  wrote:
> Appears that restarting a node makes CQL available on that node again, but 
> only that node.
> 
> Looks like I'll be doing a rolling restart.
> 
> 
> On Fri, Mar 29, 2013 at 10:26 AM, David McNelis  wrote:
> I'm running 1.2.3 and have both CQL3 tables and old school style CFs in my 
> cluster.
> 
> I'd had a large insert job running the last several days which just ended; 
> it had been inserting using CQL3 insert statements into a CQL3 table.
> 
> Now, I show no compactions going on in my cluster, but for some reason any 
> CQL3 query I try to execute, insert, select, through cqlsh or through an 
> external library, all time out with an rpc_timeout.
> 
> If I use cassandra-cli, I can do "list tablename limit 10" and immediately 
> get my 10 rows back.
> 
> However, if I do "select * from tablename limit 10" I get the rpc timeout 
> error.  Same table, same server.  It doesn't seem to matter if I'm hitting a 
> CQL3-defined table or an older style one.
> 
> Load on the nodes is relatively low at the moment. 
> 
> Any suggestions short of restarting nodes?  This is a pretty major issue for 
> us right now.
> 
> 



Re: Cassandra/MapReduce ‘Data Locality’

2013-03-31 Thread aaron morton
> ColumnFamilySplit((85070591730234615865843651857942052864, 
> '127605887595351923798765477786913079296] @[d2t0053g])
Can you provide some more information on where these log lines are from and 
what you did to get them ? 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/03/2013, at 9:12 PM, Alicia Leong  wrote:

> Hi All,
> 
> The CfSplit that is highlighted in RED is in d2t0053g.
> 
> But why is it being submitted to d2t0051g and not d2t0053g ??
> 
> Is this normal? Does it matter? In this case there is no longer ‘Data Locality’, 
> correct ?
> 
> 
> 
> I’m using hadoop-1.1.2 & apache-cassandra-1.2.3.
> 
> TokenRange (1) >> 127605887595351923798765477786913079296 => 0
> 
> TokenRange (2) >> 85070591730234615865843651857942052864 => 
> 127605887595351923798765477786913079296
> 
> TokenRange (3) >> 42535295865117307932921825928971026432 => 
> 85070591730234615865843651857942052864
> 
> TokenRange (4) >> 0 => 42535295865117307932921825928971026432
> 
> ColumnFamilySplit((127605887595351923798765477786913079296, '-1] @[d2t0050g])
> 
> ColumnFamilySplit((-1, '0] @[d2t0050g])
> 
> ColumnFamilySplit((85070591730234615865843651857942052864, 
> '127605887595351923798765477786913079296] @[d2t0053g])
> 
> ColumnFamilySplit((42535295865117307932921825928971026432, 
> '85070591730234615865843651857942052864] @[d2t0052g])
> 
> ColumnFamilySplit((0, '42535295865117307932921825928971026432] @[d2t0051g])
> 
>  
> RF1---
> 
> d2t0050g
> 
> KeyRange(start_token:127605887595351923798765477786913079296, end_token:-1, 
> count:4096)
> 
> d2t0051g
> 
> KeyRange(start_token:85070591730234615865843651857942052864, 
> end_token:127605887595351923798765477786913079296, count:4096)
> 
> Rowkey:3; columnvalue=Critics Choice Awards from 
> ColumnFamilySplit((85070591730234615865843651857942052864, 
> '127605887595351923798765477786913079296] @[d2t0053g])
> 
> KeyRange(start_token:117356732921465116845890410746976120467, 
> end_token:127605887595351923798765477786913079296, count:4096)
> 
> KeyRange(start_token:0, end_token:42535295865117307932921825928971026432, 
> count:4096)
> 
> Rowkey:1; columnvalue=Academy Awards from ColumnFamilySplit((0, 
> '42535295865117307932921825928971026432] @[d2t0051g])
> 
> Rowkey 2: columnvalue=Golden Globe Awards from ColumnFamilySplit((0, 
> '42535295865117307932921825928971026432] @[d2t0051g])
> 
> KeyRange(start_token:19847720572362509985402305765727304993, 
> end_token:42535295865117307932921825928971026432, count:4096)
> 
> d2t0052g
> 
> KeyRange(start_token:42535295865117307932921825928971026432, 
> end_token:85070591730234615865843651857942052864, count:4096)
> 
> KeyRange(start_token:-1, end_token:0, count:4096)
> 
> d2t0053g
> 
> Nil
> 
> 
> 
> 
> 
> 
> 
> Thanks in advance.
> 



Re: Lost data after expanding cluster c* 1.2.3-1

2013-03-31 Thread aaron morton
First thought is the new nodes were marked as seeds. 
Next thought is check the logs for errors. 

You can always run a nodetool repair if you are concerned data is not where you 
think it should be. 

Cheers


-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/03/2013, at 8:01 PM, Kais Ahmed  wrote:

> Hi all,
> 
> I followed this tutorial to expand a 4-node c* cluster (production) by adding 3 
> new nodes.
> 
> Datacenter: eu-west
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns   Host ID                               Rack
> UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
> UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
> UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
> UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
> UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
> UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
> UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
> 
> The data are not streamed.
> 
> Can anyone help me? Our web site is down.
> 
> Thanks a lot,
> 
> 



Re: Timeseries data

2013-03-31 Thread aaron morton
> I think if you use Level compaction, the number of sstables you will touch 
> will be less because sstables in each level is non overlapping except L0.
You will want to do some testing because LCS uses extra IO to make those 
guarantees. You will also want to look at the SSTable size with LCS if you are 
going to have wide rows. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/03/2013, at 12:18 PM, sankalp kohli  wrote:

> I think if you use Level compaction, the number of sstables you will touch 
> will be less because sstables in each level is non overlapping except L0.
> 
> 
> On Wed, Mar 27, 2013 at 8:20 PM, aaron morton  wrote:
> sstablekeys can help you find which sstables your keys are in. 
> 
> But yes, a slice call will need to read from all sstables the row has a 
> fragment in. This is one reason we normally suggest partitioning time series 
> data by month or year or something sensible in your problem domain. 
> 
> You will probably also want to use reversed comparators so you do not have to 
> use reversed in your query. 
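> 
A small sketch of that month-bucketing idea, so a time-range slice only has to touch the rows (and hence the SSTables) for the buckets it covers; the key format and names here are illustrative, not something prescribed in the thread:

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimeSeriesKeys {
    // Not thread-safe; fine for a sketch.
    private static final SimpleDateFormat MONTH = new SimpleDateFormat("yyyyMM");
    static {
        MONTH.setTimeZone(TimeZone.getTimeZone("UTC"));
    }

    // Row key = source id + month bucket, e.g. "sensor42:201303".
    // Columns inside the row are still named by TimeUUID as in Kanwar's model,
    // but a slice for time range X->Y only reads the rows for the months it spans.
    static String rowKey(String sourceId, Date timestamp) {
        return sourceId + ":" + MONTH.format(timestamp);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("sensor42", new Date()));
    }
}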
> 
> Hope that helps. 
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 28/03/2013, at 8:25 AM, Bryan Talbot  wrote:
> 
>> In the worst case, that is possible, but compaction strategies try to 
>> minimize the number of SSTables that a row appears in so a row being in ALL 
>> SStables is not likely for most cases.
>> 
>> -Bryan
>> 
>> 
>> 
>> On Wed, Mar 27, 2013 at 12:17 PM, Kanwar Sangha  wrote:
>> Hi – I have a query on reads with Cassandra. We are planning to have a dynamic 
>> column family and each column would be based on a timeseries.
>> 
>>  
>> 
>> Inserting data — key => 'xxx', {column_name => TimeUUID(now), 
>> :column_value => 'value' }, {column_name => TimeUUID(now), :column_value => 
>> 'value' },..
>> 
>>  
>> 
>> Now this key might be spread across multiple SSTables over a period of days. 
>> When we do a READ query to fetch, say, a slice of data from this row based on 
>> time X->Y, would it need to get data from ALL sstables ?
>> 
>>  
>> 
>> Thanks,
>> 
>> Kanwar
>> 
>>  
>> 
>> 
> 
> 



Re: CQL3 And Map Literals

2013-03-31 Thread aaron morton
> I am curious. Was there a specific reason why it was decided to use
> single-quotes?
ANSI SQL compatible. 
(Am offline now and cannot confirm, but years of writing SQL with single quotes 
makes me think of that. )

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/03/2013, at 10:09 AM, Gareth Collins  wrote:

> Hello,
> 
> I have been playing with map literals in CQL3 queries. I see that
> single-quotes work:
> 
> {'foo':'bar'}
> 
> but double-quotes do not:
> 
> {"foo":"bar"}
> 
> I am curious. Was there a specific reason why it was decided to use
> single-quotes?
> I ask because double-quotes would make this valid json.
> 
> thanks in advance,
> Gareth



Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10

2013-03-31 Thread aaron morton
> But what if the gc_grace was changed to a lower value as part of a schema 
> migration after the hints have been marked with TTLs equal to the lower 
> gc_grace before the migration? 
There would be a chance then if the tombstones had been purged. 
Want to raise a ticket ? 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/03/2013, at 2:58 AM, Arya Goudarzi  wrote:

> I am not familiar with that part of the code yet. But what if the gc_grace 
> was changed to a lower value as part of a schema migration after the hints 
> have been marked with TTLs equal to the lower gc_grace before the migration? 
> 
> From what you've described, I think this is not an issue for us as we did not 
> have a node down for a long period of time, but just pointing out what I 
> think could happen based on what you've described.
> 
> On Sun, Mar 24, 2013 at 10:03 AM, aaron morton  
> wrote:
>> I could imagine a  scenario where a hint was replayed to a replica after all 
>> replicas had purged their tombstones
> Scratch that, the hints are TTL'd with the lowest gc_grace. 
> Ticket closed https://issues.apache.org/jira/browse/CASSANDRA-5379
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 24/03/2013, at 6:24 AM, aaron morton  wrote:
> 
>>> Beside the joke, would hinted handoff really have any role in this issue?
>> I could imagine a  scenario where a hint was replayed to a replica after all 
>> replicas had purged their tombstones. That seems like a long shot, it would 
>> need one node to be down for the write and all up for the delete and for all 
>> of them to have purged the tombstone. But maybe we should have a max age on 
>> hints so it cannot happen. 
>> 
>> Created https://issues.apache.org/jira/browse/CASSANDRA-5379
>> 
>> Ensuring no hints are in place during an upgrade would work around it. I tend 
>> to make sure hints and the commit log are clear during an upgrade. 
>> 
>> Cheers
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 22/03/2013, at 7:54 AM, Arya Goudarzi  wrote:
>> 
>>> Beside the joke, would hinted handoff really have any role in this issue? I 
>>> have been struggling to reproduce this issue using the snapshot data taken 
>>> from our cluster and following the same upgrade process from 1.1.6 to 
>>> 1.1.10. I know snapshots only link to active SSTables. What if these 
>>> returned rows belong to some inactive SSTables and some bug exposed itself 
>>> and marked them as active? What are the possibilities that could lead to 
>>> this? I am eager to find out, as this is blocking our upgrade.
>>> 
>>> On Tue, Mar 19, 2013 at 2:11 AM,  wrote:
>>> This obscure feature of Cassandra is called “haunted handoff”.
>>> 
>>>  
>>> 
>>> Happy (early) April Fools :)
>>> 
>>>  
>>> 
>>> From: aaron morton [mailto:aa...@thelastpickle.com] 
>>> Sent: Monday, March 18, 2013 7:45 PM
>>> To: user@cassandra.apache.org
>>> Subject: Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10
>>> 
>>>  
>>> 
>>> As you can see, this node thinks lots of ranges are out of sync, which shouldn't 
>>> be the case as successful repairs were done every night prior to the 
>>> upgrade. 
>>> 
>>> Could this be explained by writes occurring during the upgrade process ? 
>>> 
>>>  
>>> 
>>> I found this bug, which touches timestamps and tombstones and was fixed in 
>>> 1.1.10, but I am not 100% sure if it could be related to this issue: 
>>> https://issues.apache.org/jira/browse/CASSANDRA-5153
>>> 
>>> Me neither, but the issue was fixed in 1.1.0
>>> 
>>>  
>>> 
>>>  It appears that the repair task that I executed after upgrade, brought 
>>> back lots of deleted rows into life.
>>> 
>>> Was it entire rows or columns in a row?
>>> 
>>> Do you know if row level or column level deletes were used ? 
>>> 
>>>  
>>> 
>>> Can you look at the data in cassanca-cli and confirm the timestamps on the 
>>> columns make sense ?  
>>> 
>>>  
>>> 
>>> Cheers
>>> 
>>>  
>>> 
>>> -
>>> 
>>> Aaron Morton
>>> 
>>> Freelance Cassandra Consultant
>>> 
>>> New Zealand
>>> 
>>>  
>>> 
>>> @aaronmorton
>>> 
>>> http://www.thelastpickle.com
>>> 
>>>  
>>> 
>>> On 16/03/2013, at 2:31 PM, Arya Goudarzi  wrote:
>>> 
>>> 
>>> 
>>> 
>>> Hi,
>>> 
>>>  
>>> 
>>> I have upgraded our test cluster from 1.1.6 to 1.1.10, followed by running 
>>> repairs. It appears that the repair task that I executed after the upgrade 
>>> brought back lots of deleted rows to life. Here are some logistics:
>>> 
>>>  
>>> 
>>> - The upgraded cluster started from 1.1.1 -> 1.1.2 -> 1.1.5 -> 1.1.6 
>>> 
>>> - Old cluster: 4 node, C* 1.1.6 with RF3 using NetworkTopology;
>>> 
>>> - Upgrade to : 1.1.10 with all other settings the same;
>>> 
>>> - Successful repairs were being done on this cluster every night;

Re: weird behavior with RAID 0 on EC2

2013-03-31 Thread aaron morton
> Ok, if you're going to look into it, please keep me/us posted.
It's not on my radar.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/03/2013, at 2:43 PM, Alain RODRIGUEZ  wrote:

> Ok, if you're going to look into it, please keep me/us posted.
> 
> It happen twice for me, the same day, within a few hours on the same node and 
> only happened to 1 node out of 12, making this node almost unreachable.
> 
> 
> 2013/3/28 aaron morton 
> I noticed this on an m1.xlarge (cassandra 1.1.10) instance today as well, 1 
> or 2 disks in a raid 0 running at 85 to 100% the others 35 to 50ish. 
> 
> Have not looked into it. 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 26/03/2013, at 11:57 PM, Alain RODRIGUEZ  wrote:
> 
>> We use C* on m1.xLarge AWS EC2 servers, with 4 disks xvdb, xvdc, xvdd, xvde 
>> parts of a logical Raid0 (md0).
>> 
>> I use to see their use increasing in the same way. This morning there was a 
>> normal minor compaction followed by messages dropped on one node (out of 12).
>> 
>> Looking closely at this node I saw the following:
>> 
>> http://img69.imageshack.us/img69/9425/opscenterweirddisk.png
>> 
>> On this node, one of the four disks (xvdd) started working hardly while 
>> other worked less intensively.
>> 
>> This is quite weird since I always saw this 4 disks being used the exact 
>> same way at every moment (as you can see on 5 other nodes or when the node 
>> ".239" come back to normal).
>> 
>> Any idea on what happened and on how it can be avoided ?
>> 
>> Alain
> 
> 



Re: Reading data in bulk from cassandra for indexing in Elastic search

2013-03-31 Thread aaron morton
> Approach 1:
> 1. Get chunks of 10,000 keys (which is configurable, but when I increase it 
> to more than 15,000, I get a thrift frame size error from cassandra. To fix it, I 
> would need to increase that frame size via cassandra.yaml) and their columns 
> (around 15 columns/key).
> 
You can model this on the way the Hadoop ColumnFamilyRecordReader works. Run it 
in parallel on every node in the cluster, and have each process only read the rows 
which are in the primary token range of the node it's running on. For the 
first range_slices query use the token range for the node; for the subsequent 
queries convert the last row key to a token and use that as the start token. 

IMHO 10K rows per slice is too many, I would start at 1K. More is not always 
better. 
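
A rough sketch of that pagination loop over one node's primary token range, using the raw Thrift client (the host, keyspace and column family names are placeholders, and error handling is omitted):

import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.dht.BigIntegerToken;
import org.apache.cassandra.dht.RandomPartitioner;
import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class TokenRangeScanner {
    public static void main(String[] args) throws Exception {
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("MyKeyspace");                      // placeholder

        RandomPartitioner partitioner = new RandomPartitioner();
        // This node's primary token range, e.g. taken from describe_ring.
        String startToken = args[0];
        String endToken = args[1];

        SlicePredicate allColumns = new SlicePredicate().setSlice_range(
                new SliceRange(ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, 100));
        KeyRange range = new KeyRange(1000)                     // ~1K rows per slice, per the advice above
                .setStart_token(startToken).setEnd_token(endToken);

        while (true) {
            List<KeySlice> page = client.get_range_slices(
                    new ColumnParent("MyColumnFamily"), allColumns, range, ConsistencyLevel.ONE);
            if (page.isEmpty())
                break;
            for (KeySlice row : page) {
                // ... push row.getKey() / row.getColumns() to Elasticsearch here ...
            }
            // start_token is exclusive, so restarting from the last key's token
            // continues just after the rows that have already been processed.
            ByteBuffer lastKey = page.get(page.size() - 1).key;
            BigIntegerToken lastToken = partitioner.getToken(lastKey);
            range.setStart_token(partitioner.getTokenFactory().toString(lastToken));
            if (page.size() < range.count)
                break;                                          // reached the end of the range
        }
        transport.close();
    }
}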
 
> 1. What is the suggested strategy to read bulk data from cassandra? Which read 
> pattern is better, one big get_range_slices with 10,000 keys-columns or 
> multiple small GETs, one for every key?
Somewhere there is a sweet spot. Big queries hurt overall query throughput on 
the nodes and can lead to memory/GC issues on the client and servers. Lots of 
small queries result in more time spent waiting for network latency. Start 
small and find the point where the overall throughput stops improving, then 
make sure you are not hurting the throughput for other clients. 
 
> 2. How about reading more values at once, say 50,000 keys-columns, by 
> increasing the thrift frame size from 16MB to something greater like 54MB? 
> How will it impact cassandra's performance in general?
It will result in increased GC pressure.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/03/2013, at 1:44 PM, Utkarsh Sengar  wrote:

> Hello,
> 
> I am trying to implement an indexer for a column family in cassandra (a cluster 
> of 4 nodes) using Elasticsearch. There is a river plugin which I am writing 
> which retrieves data from cassandra and pushes it to Elasticsearch. It is 
> triggered once a day (which is configurable based on the requirement).
> 
> Total keys: ~50M
> 
> So for reading the whole column family (random partitioner), I am going ahead 
> with this approach:
> As mentioned here, I use this example (PaginateGetRangeSlices.java):
> 
> Approach 1:
> 1. Get chunks of 10,000 keys (which is configurable, but when I increase it 
> to more than 15,000, I get a thrift frame size error from cassandra. To fix it, I 
> would need to increase that frame size via cassandra.yaml) and their columns 
> (around 15 columns/key).
> 2. Then send the 15,000 read records to Elasticsearch.
> 3. It is single threaded for now. It will be hard to make this multithreaded 
> because I would need to track the range of keys which has already been read and 
> share the start key value with every thread. Think of the PaginateGetRangeSlices.java 
> example, but multi-threaded.
> 
> I have implemented this approach; it's not that fast. It takes about 6 hours to 
> complete.
> 
> Approach 2:
> 1. Get all the keys using the same query as above, but retrieve only the key.
> 2. Divide the keys by x, where x is the total number of threads I spawn. Every 
> individual thread will do an individual GET for a key and insert it into 
> Elasticsearch. This will considerably increase hits to cassandra, but sounds 
> more efficient.
> 
> 
> So my questions are:
> 1. What is the suggested strategy to read bulk data from cassandra? Which read 
> pattern is better, one big get_range_slices with 10,000 keys-columns or 
> multiple small GETs, one for every key?
> 
> 2. How about reading more values at once, say 50,000 keys-columns, by 
> increasing the thrift frame size from 16MB to something greater like 54MB? 
> How will it impact cassandra's performance in general?
> 
> I will appreciate your input about any other strategies you use to move bulk 
> data from cassandra.
> 
> -- 
> Thanks,
> -Utkarsh



Re: Problem with streaming data from Hadoop: DecoratedKey(-1, )

2013-03-31 Thread aaron morton
>  but yesterday one of 600 mappers failed
>  
:)

> From what I can understand by looking into the C* source, it seems to me that 
> the problem is caused by an empty (or unexpectedly exhausted?) input buffer 
> causing the token to be set to -1, which is invalid for the RandomPartitioner:
Yes, there is a zero length key which as a -1 token. 

> However, I can't figure out what's the root cause of this problem.
> Any ideas?
mmm, the BulkOutputFormat uses an SSTableSimpleUnsortedWriter and neither of 
them checks for zero-length row keys. I would look there first. 

There is no validation in the AbstractSSTableSimpleWriter; I'm not sure if that is 
by design or an oversight. Can you catch the zero-length key in your map job ? 
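
For example, a small guard like the one below, called on every key before it is handed to the BulkOutputFormat writer, would surface the offending record (a hypothetical helper, since the thread does not show Michal's job code):

import java.nio.ByteBuffer;

public final class RowKeyGuard {
    private RowKeyGuard() {}

    // A zero-length key hashes to token -1 under the RandomPartitioner, which is
    // exactly the "current key DecoratedKey(-1, )" the streaming node rejects.
    public static boolean isWritable(ByteBuffer rowKey) {
        return rowKey != null && rowKey.remaining() > 0;
    }
}

In the map method, something like if (!RowKeyGuard.isWritable(key)) { context.getCounter("bulkload", "empty_keys").increment(1); return; } would skip the bad record and leave a counter showing how many there were.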

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/03/2013, at 2:26 PM, Michal Michalski  wrote:

> We're streaming data to Cassandra directly from a MapReduce job using 
> BulkOutputFormat. It's been working for more than a year without any 
> problems, but yesterday one of 600 mappers failed and we got a strange-looking 
> exception on one of the C* nodes.
> 
> IMPORTANT: It happens on one node and on one cluster only. We've loaded the 
> same data to test cluster and it worked.
> 
> 
> ERROR [Thread-1340977] 2013-03-28 06:35:47,695 CassandraDaemon.java (line 
> 133) Exception in thread Thread[Thread-1340977,5,main]
> java.lang.RuntimeException: Last written key 
> DecoratedKey(5664330507961197044404922676062547179, 
> 302c6461696c792c32303133303332352c312c646f6d61696e2c756e6971756575736572732c633a494e2c433a6d63635f6d6e635f636172726965725f43656c6c4f6e655f4b61726e6174616b615f2842616e67616c6f7265295f494e2c643a53616d73756e675f47542d49393037302c703a612c673a3133)
>  >= current key DecoratedKey(-1, ) writing into 
> /cassandra/production/IndexedValues/production-IndexedValues-tmp-ib-240346-Data.db
>   at 
> org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
>   at 
> org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:209)
>   at 
> org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:179)
>   at 
> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
> 
> 
> From what I can understand by looking into the C* source, it seems to me that 
> the problem is caused by an empty (or unexpectedly exhausted?) input buffer 
> causing the token to be set to -1, which is invalid for the RandomPartitioner:
> 
> public BigIntegerToken getToken(ByteBuffer key)
> {
>     if (key.remaining() == 0)
>         return MINIMUM;    // which is -1
>     return new BigIntegerToken(FBUtilities.hashToBigInteger(key));
> }
> 
> However, I can't figure out what's the root cause of this problem.
> Any ideas?
> 
> Of course I can't exclude a bug in my code which streams this data, but - as 
> I said - it works when loading the same data to the test cluster (which has a 
> different number of nodes and thus a different token assignment, which might be a 
> factor too).
> 
> Michał



Re: Digest Query Seems to be corrupt on certain cases

2013-03-31 Thread aaron morton
> When I manually inspected this byte array, it seems to hold all the details 
> correctly except the super-column name, causing it to fetch the entire wide 
> row.
What is the CF definition and what is the exact query you are sending? 
There does not appear to be anything obvious in the QueryPath serde for 1.0.7.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/03/2013, at 10:54 AM, Ravikumar Govindarajan 
 wrote:

> VM Settings are
> -javaagent:./../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities 
> -XX:ThreadPriorityPolicy=42 -Xms8G -Xmx8G -Xmn800M 
> -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 
> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> 
> The error stack contained 2 threads for the same key, stalling on the digest 
> query.
> 
> The bytes below, which I referred to, are the actual value of the "_body" variable in 
> the org.apache.cassandra.net.Message object taken from the heap dump.
> 
> As I understand from the code, ReadVerbHandler will deserialize this "_body" 
> variable into a SliceByNamesReadCommand object.
> 
> When I manually inspected this byte array, it seems to hold all the details 
> correctly except the super-column name, causing it to fetch the entire wide 
> row.
> 
> --
> Ravi
> 
> On Thu, Mar 28, 2013 at 8:36 AM, aaron morton  wrote:
>> We started receiving OOMs in our cassandra grid and took a heap dump
> What are the JVM settings ? 
> What was the error stack? 
> 
>> I am pasting the serialized byte array of SliceByNamesReadCommand, which 
>> seems to be corrupt on issuing certain digest queries.
> 
> Sorry I don't follow what you are saying here. 
> Can you can you enable DEBUG logging and identify the behaviour you think is 
> incorrect ?
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 28/03/2013, at 4:15 AM, Ravikumar Govindarajan 
>  wrote:
> 
>> We started receiving OOMs in our cassandra grid and took a heap dump. We are 
>> running version 1.0.7 with LOCAL_QUORUM for both reads and writes.
>> 
>> After some analysis, we kind of identified the problem, with 
>> SliceByNamesReadCommand, involving a single Super-Column. This seems to be 
>> happening only in digest query and not during actual reads.
>> 
>> I am pasting the serialized byte array of SliceByNamesReadCommand, which 
>> seems to be corrupt on issuing certain digest queries.
>> 
>>  //Type is SliceByNamesReadCommand
>>  body[0] = (byte)1;
>>  
>>  //This is a digest query here.
>>  body[1] = (byte)1;
>> 
>> //Table-Name from 2-8 bytes
>> 
>> //Key-Name from 9-18 bytes
>> 
>> //QueryPath deserialization here
>>  
>>  //CF-Name from 19-30 bytes
>> 
>> //Super-Col-Name from 31st byte onwards, but gets 
>> corrupt as found in heap dump
>> 
>> //body[32-37] = 0, body[38] = 1, body[39] = 0.  This 
>> causes the SliceByNamesDeserializer to mark both ColName=NULL and 
>> SuperColName=NULL, fetching entire wide-row!!!
>> 
>>//Actual super-col-name starts only from byte 40, whereas 
>> it should have started from 31st byte itself
>> 
>> Has someone already encountered such an issue? Why is the super-column name not 
>> correctly de-serialized during a digest query?
>> 
>> --
>> Ravi
>> 
> 
>