Re: Truncate flushes memtables for all CFs causing timeouts

2012-03-06 Thread Viktor Jevdokimov
Thank you. To sum up: to free up and discard a commit log segment, all
memtables must be flushed. So a higher timeout for truncate should work.

2012/3/6 aaron morton 

> Truncate uses RPC timeout, which is in my case set to 10 seconds (I want
> even less) and it's not enough. I've seen in sources TODO for this case.
>
> created
> https://issues.apache.org/jira/browse/CASSANDRA-4006
>
> Is it possible to flush only required CF for truncate, not all? This could
> improve truncate time.
>
> see code comments here
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1681
>
> AFAIK truncate is not considered a regular operation. (All nodes must be
> online for example)
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7/03/2012, at 1:34 AM, Viktor Jevdokimov wrote:
>
> Hello,
>
> Truncate uses RPC timeout, which is in my case set to 10 seconds (I want
> even less) and it's not enough. I've seen in sources TODO for this case.
>
> What I found is that truncate starts a flush of the memtables for all
> CFs, not only for the CF to be truncated. When there are a lot of CFs to be
> flushed, it takes time.
>
> Is it possible to flush only required CF for truncate, not all? This could
> improve truncate time.
>
>
> Best regards,
> Viktor
>
>
>
>
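Viktor's conclusion (give truncate its own, higher deadline) can be sketched client-side with a generic wrapper; the truncate call below is a stub, not a real client invocation. With Hector, the analogous knob would be the thrift socket timeout on the host configurator.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class TruncateWithTimeout {
    // Run one operation under its own, longer deadline instead of the
    // cluster-wide RPC timeout. Throws TimeoutException if exceeded.
    static <T> T callWithDeadline(Callable<T> op, long seconds) throws Exception {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            Future<T> f = ex.submit(op);
            return f.get(seconds, TimeUnit.SECONDS);
        } finally {
            ex.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stub standing in for a real client.truncate("MyCF") call.
        String result = callWithDeadline(() -> "truncated", 60);
        System.out.println(result);
    }
}
```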


Re: key sorting question

2012-03-06 Thread Tamar Fraenkel
Thanks.

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Wed, Mar 7, 2012 at 8:55 AM, Dave Brosius wrote:

>  With random partitioner, the rows are sorted by the hashes of the keys,
> so for all intents and purposes, not sorted.
>
> This comment below really is talking about how columns are sorted, and yes,
> when time UUIDs are used, they are sorted by the time component, as time
> UUIDs start with the time component and then add various randomness bits.
>
>
> On 03/07/2012 01:51 AM, Tamar Fraenkel wrote:
>
>  Hi!
> I am currently experimenting with Cassandra 1.0.7, but while reading
> http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
> something caught my eye:
> "Cassandra orders version 1 UUIDs by their time component"
> Is this true?
> If I have for example USER_CF where key is randomly generated
> java.util.UUID (UUID.randomUUID()), will the rows be sorted by the
> generation time?
> I use random partitioner if that makes any difference.
> Thanks,
>
>
>
>  *Tamar Fraenkel *
> Senior Software Engineer, TOK Media
>
>
>
> ta...@tok-media.com
> Tel:   +972 2 6409736
> Mob:  +972 54 8356490
> Fax:   +972 2 5612956
>
>
>
>
>

Re: key sorting question

2012-03-06 Thread Dave Brosius

  
  
With random partitioner, the rows are sorted by the hashes of the keys,
so for all intents and purposes, not sorted.

This comment below really is talking about how columns are sorted, and
yes, when time UUIDs are used, they are sorted by the time component, as
time UUIDs start with the time component and then add various randomness
bits.

On 03/07/2012 01:51 AM, Tamar Fraenkel wrote:

Hi!
I am currently experimenting with Cassandra 1.0.7, but while reading
http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
something caught my eye:
"Cassandra orders version 1 UUIDs by their time component"
Is this true?
If I have for example USER_CF where key is randomly generated
java.util.UUID (UUID.randomUUID()), will the rows be sorted by the
generation time?
I use random partitioner if that makes any difference.
Thanks,

Tamar Fraenkel
Senior Software Engineer, TOK Media

ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956



key sorting question

2012-03-06 Thread Tamar Fraenkel
Hi!
I am currently experimenting with Cassandra 1.0.7, but while reading
http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
something caught my eye:
"Cassandra orders version 1 UUIDs by their time component"
Is this true?
If I have for example USER_CF where key is randomly generated
java.util.UUID (UUID.randomUUID()), will the rows be sorted by the
generation time?
I use random partitioner if that makes any difference.
Thanks,



*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956
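One detail worth making explicit for the question above: java.util.UUID.randomUUID() returns a version 4 (random) UUID, not a version 1 time UUID, so it carries no time component for Cassandra to order by. A minimal, self-contained check:

```java
import java.util.UUID;

public class UuidVersionCheck {
    public static void main(String[] args) {
        // UUID.randomUUID() produces a version 4 (random) UUID:
        // it has no time component, so TimeUUID ordering cannot apply to it.
        UUID random = UUID.randomUUID();
        System.out.println("version = " + random.version()); // prints: version = 4

        // timestamp() is only defined for version 1 UUIDs; on a random
        // UUID it throws UnsupportedOperationException.
        try {
            random.timestamp();
            System.out.println("has time component");
        } catch (UnsupportedOperationException e) {
            System.out.println("no time component");
        }
    }
}
```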

Re: Issue with nodetool clearsnapshot

2012-03-06 Thread B R
Thanks a lot, Aaron. Our cluster is much stable now. We'll look at
upgrading to 1.x in the coming weeks.

On Tue, Mar 6, 2012 at 2:33 PM, aaron morton wrote:

> 1)Since you mentioned hard links, I would like to add that our data
> directory itself is a sym-link. Could that be causing an issue ?
>
> Seems unlikely.
>
> I restarted the node and it went about deleting the files and the disk
> space has been released. Can this be done using nodetool, and without
> restarting ?
>
> Under 0.8.x they are deleted when the files are no longer in use and when
> the JVM GC frees all references. You can provoke this by getting the JVM to GC
> using JConsole or another JMX client.
>
> If there is not enough free disk space, a GC is forced and the space is
> reclaimed.
>
> Under 1.x file handles are counted and the files are quickly deleted.
>
> Cheers
>
>   -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/03/2012, at 7:38 AM, B R wrote:
>
> Hi Aaron,
>
> 1)Since you mentioned hard links, I would like to add that our data
> directory itself is a sym-link. Could that be causing an issue ?
>
> 2)Yes, there are 0 byte files of the same numbers
> in Keyspace1 directory
> 0 Mar  4 01:33 Standard1-g-7317-Compacted
> 0 Mar  3 22:58 Standard1-g-7968-Compacted
> 0 Mar  3 23:10 Standard1-g-8778-Compacted
> 0 Mar  3 23:47 Standard1-g-8782-Compacted
> ...
>
> I restarted the node and it went about deleting the files and the disk
> space has been released. Can this be done using nodetool, and without
> restarting ?
>
> Thanks.
>
> On Mon, Mar 5, 2012 at 10:59 PM, aaron morton wrote:
>
>>   It seems that instead of removing the snapshot, clearsnapshot moved
>> the data files from the snapshot directory to the parent directory and the
>> size of the data for that keyspace has doubled.
>>
>> That is not possible; there is only code there to delete the files in the
>> snapshot.
>>
>> Note that in the snapshot are hard links to the files in the data dir.
>> Deleting / clearing the snapshot will not delete the files from the data
>> dir if they are still in use.
>>
>>  Many of the files are looking like duplicates.
>>
>> in Keyspace1 directory
>> 156987786084 Jan 21 03:18 Standard1-g-7317-Data.db
>> 156987786084 Mar  4 01:33 Standard1-g-8850-Data.db
>>
>> Under 0.8.x files are not immediately deleted. Did the data directory
>> contain zero size -Compacted files with the same number ?
>>
>> Cheers
>>
>>
>>   -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 5/03/2012, at 11:50 PM, B R wrote:
>>
>> Version 0.8.9
>>
>> We run a 2 node cluster with RF=2. We ran a scrub and after that ran the
>> clearsnapshot to remove the backup snapshot created by scrub. It seems that
>> instead of removing the snapshot, clearsnapshot moved the data files from
>> the snapshot directory to the parent directory and the size of the data for
>> that keyspace has doubled. Many of the files are looking like duplicates.
>>
>> in Keyspace1 directory
>> 156987786084 Jan 21 03:18 Standard1-g-7317-Data.db
>> 156987786084 Mar  4 01:33 Standard1-g-8850-Data.db
>> 118211555728 Jan 31 12:50 Standard1-g-7968-Data.db
>> 118211555728 Mar  3 22:58 Standard1-g-8840-Data.db
>> 116902342895 Feb 25 02:04 Standard1-g-8832-Data.db
>> 116902342895 Mar  3 22:10 Standard1-g-8836-Data.db
>> 93788425710 Feb 21 04:20 Standard1-g-8791-Data.db
>> 93788425710 Mar  4 00:29 Standard1-g-8845-Data.db
>> .
>>
>> Even though the nodetool ring command shows the correct data size for the
>> node, the du -sh on the keyspace directory gives double the size.
>>
>> Can you guide us to proceed from this situation ?
>>
>> Thanks.
>>
>>
>>
>
>


RE: Secondary indexes don't go away after metadata change

2012-03-06 Thread Frisch, Michael
Sure enough it does.  Looking back in the logs when the node was first coming 
online I can see it applying migrations and submitting index builds on indexes 
that are deleted in the newest version of the schema.  This may be a silly 
question but shouldn't it just apply the most recent version of the schema on a 
new node?  Is there a reason to apply the migrations?

- Mike

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, March 06, 2012 4:14 AM
To: user@cassandra.apache.org
Subject: Re: Secondary indexes don't go away after metadata change

When the new node comes online the history of schema changes is streamed to 
it. I've not looked at the code, but it could be that schema migrations are 
creating indexes that are then deleted from the schema but not from the DB 
itself.

Does that fit your scenario? When the new node comes online, does it log 
migrations being applied and then indexes being created?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 10:56 AM, Frisch, Michael wrote:


Thank you very much for your response.  It is true that the older, previously 
existing nodes are not snapshotting the indexes that I had removed.  I'll go 
ahead and just delete those SSTables from the data directory.  They may be 
around still because they were created back when we used 0.8.

The more troubling issue is with adding new nodes to the cluster though.  It 
built indexes for column families that have had all indexes dropped weeks or 
months in the past.  It also will snapshot the index SSTables that it created.  
The index files are non-empty as well, some are hundreds of megabytes.

All nodes have the same schema, none list themselves as having the rows 
indexed.  I cannot drop the indexes via the CLI either because it says that 
they don't exist.  It's quite perplexing.

- Mike


From: aaron morton 
[mailto:aa...@thelastpickle.com]
Sent: Monday, March 05, 2012 3:58 AM
To: user@cassandra.apache.org
Subject: Re: Secondary indexes don't go away after metadata change

The secondary index CF's are marked as no longer required / marked as 
compacted. Under 1.x they would then be deleted reasonably quickly, and 
definitely deleted after a restart.

Is there a zero length .Compacted file there?

Also, when adding a new node to the ring the new node will build indexes for 
the ones that supposedly don't exist any longer.  Is this supposed to happen?  
Would this have happened if I had deleted the old SSTables from the previously 
existing nodes?
Check you have a consistent schema using describe cluster in the CLI, and check 
the schema is what you think it is using show schema.

Another trick is to do a snapshot. Only the files in use are included in the 
snapshot.

Hope that helps.

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/03/2012, at 2:53 AM, Frisch, Michael wrote:



I have a few column families that I decided to get rid of the secondary indexes 
on.  I see that there aren't any new index SSTables being created, but all of 
the old ones remain (some from as far back as September).  Is it safe to just 
delete then when the node is offline?  Should I run clean-up or scrub?

Also, when adding a new node to the ring the new node will build indexes for 
the ones that supposedly don't exist any longer.  Is this supposed to happen?  
Would this have happened if I had deleted the old SSTables from the previously 
existing nodes?

The nodes in question have either been upgraded from v0.8.1 => v1.0.2 (scrubbed 
at this time) => v1.0.6 or from v1.0.2 => v1.0.6.  The secondary index was 
dropped when the nodes were version 1.0.6.  The new node added was also 1.0.6.

- Mike



Re: Repairing nodes when two schema versions appear

2012-03-06 Thread aaron morton
Go to one of the nodes, stop it, and delete the Migrations and Schema files in 
the system keyspace. 

When you restart the node it will stream the migrations from the others. Note that if 
the node is UP and accepting traffic it may log errors about missing CF's 
during this time. 

Cheers 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/03/2012, at 1:43 AM, Tharindu Mathew wrote:

> Hi,
> 
> I try to add column families programatically and end up with 2 schema 
> versions in the Cassandra cluster. Using Cassandra 0.7.
> 
> Is there a way to bring this back to normal (to one schema version) through 
> the cli or through the API?
> 
> -- 
> Regards,
> 
> Tharindu
> 
> blog: http://mackiemathew.com/
> 



Re: Truncate flushes memtables for all CFs causing timeouts

2012-03-06 Thread aaron morton
> Truncate uses RPC timeout, which is in my case set to 10 seconds (I want even 
> less) and it's not enough. I've seen in sources TODO for this case.
created 
https://issues.apache.org/jira/browse/CASSANDRA-4006

> Is it possible to flush only required CF for truncate, not all? This could 
> improve truncate time.
see code comments here 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1681

AFAIK truncate is not considered a regular operation. (All nodes must be online 
for example)

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/03/2012, at 1:34 AM, Viktor Jevdokimov wrote:

> Hello,
> 
> Truncate uses RPC timeout, which is in my case set to 10 seconds (I want even 
> less) and it's not enough. I've seen in sources TODO for this case.
> 
> What I found is that truncate starts a flush of the memtables for all CFs, 
> not only for the CF to be truncated. When there are a lot of CFs to be flushed, 
> it takes time.
> 
> Is it possible to flush only required CF for truncate, not all? This could 
> improve truncate time.
> 
> 
> Best regards,
> Viktor
> 
> 



Re: Mutation Dropped Messages

2012-03-06 Thread aaron morton
> 1.   One node is running at 8G rest on 10G – same config
Make them all the same. 

> 2.   Nodetool –
Even though the token ranges are not balanced, the load looks a little odd. 
Have you moved tokens ? Did you do a cleanup ? 

You'll need to look at the node that is dropping messages (not sure what that 
is). 

What is happening in the log ? Is it having GC problems ? 
What is happening with the io and CPU load on the machine ? 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 11:57 PM, Tiwari, Dushyant wrote:

> 1.   One node is running at 8G rest on 10G – same config
> 2.   Nodetool –
> Status State   LoadOwnsToken
>   
>  162563731948587347959549934419333022646
> Up Normal  107.79 MB   25.00%  34957844353235424160784456632419943350
> Up Normal  116.44 MB   25.00%  77493140218352732093706282561390969782
> Up Normal  27.01 MB12.68%  99065646426277998282363457251162269147
> Up Normal  35.9 MB 12.32%  120028436083470040026628108490361996214
> Up Normal  512.55 KB   25.00%  162563731948587347959549934419333022646
>  
> RF:2 and CL: QUORUM – writes at a rate of 1750 rows/s – every row – 5 cols 
> and 2 of them indexes.
>  
> Thanks,
> Dushyant
>  
>  
> From: aaron morton [mailto:aa...@thelastpickle.com] 
> Sent: Monday, March 05, 2012 11:07 PM
> To: user@cassandra.apache.org
> Subject: Re: Mutation Dropped Messages
>  
> I increased the size of the cluster also the concurrent_writes parameter. 
> Still there is a node which keeps on dropping the mutation messages.
> Ensure all the nodes have the same spec, and the nodes have the same config. 
> In a virtual environment consider moving the node.
>  
> Is this due to some improper load balancing? 
> What does nodetool ring say and what sort of queries (and RF and CL) are you 
> sending.
>  
> Cheers
>  
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 6/03/2012, at 3:58 AM, Tiwari, Dushyant wrote:
> 
> 
> Hey Aaron,
>  
> I increased the size of the cluster also the concurrent_writes parameter. 
> Still there is a node which keeps on dropping the mutation messages. The 
> other nodes are not dropping mutation messages. I am using Hector API and had 
> done nothing for load balancing so far. Just provided the host:port of the 
> nodes in the Cassandrahostconfig. Is this due to some improper load 
> balancing? Also the physical host where the node is hosted is relatively 
> heavier than other nodes’ host. What can I do to improve?
> PS: The node is seed of the cluster.
>  
> Thanks,
> Dushyant
>  
> From: aaron morton [mailto:aa...@thelastpickle.com] 
> Sent: Monday, March 05, 2012 4:15 PM
> To: user@cassandra.apache.org
> Subject: Re: Mutation Dropped Messages
>  
> 1.   Which parameters to tune in the config files? – Especially looking 
> for heavy writes
> The node is overloaded. It may be because there are no enough nodes, or the 
> node is under temporary stress such as GC or repair. 
> If you have spare IO / CPU capacity you could increase concurrent_writes to 
> increase throughput on the write stage. You then need to ensure the commit 
> log and, to a lesser degree, the data volumes can keep up. 
>  
> 2.   What is the difference between TimedOutException and silently 
> dropping mutation messages while operating on a CL of QUORUM.
> TimedOutException means CL nodes did not respond to the coordinator before 
> rpc_timeout. Dropping messages happens when a message is removed from the 
> queue in a thread pool after rpc_timeout has occurred. It is a feature of 
> the architecture, and correct behaviour under stress. 
> Inconsistencies created by dropped messages are repaired via reads at high 
> CL, HH (in 1.+), Read Repair or Anti Entropy.
>  
> Cheers
>  
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 5/03/2012, at 11:32 PM, Tiwari, Dushyant wrote:
> 
> 
> 
> Hi All,
>  
> While benchmarking Cassandra I found “Mutation Dropped” messages in the logs. 
>  Now I know this is a good old question. It will be really great if someone 
> can provide a check list to recover when such a thing happens. I am looking 
> for answers of the following questions  -
>  
> 1.   Which parameters to tune in the config files? – Especially looking 
> for heavy writes
> 2.   What is the difference between TimedOutException and silently 
> dropping mutation messages while operating on a CL of QUORUM.
>  
>  
> Regards,
> Dushyant
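For reference, the two write-path settings discussed in this thread live in cassandra.yaml; the values below are illustrative only, not recommendations (the era's defaults were 32 and 10000):

```yaml
# Illustrative cassandra.yaml fragment -- example values, not advice.
concurrent_writes: 64      # default 32; raise only with spare CPU / IO
rpc_timeout_in_ms: 10000   # messages still queued past this are dropped
```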

Re: Old data coming alive after adding node

2012-03-06 Thread aaron morton
> All our writes/deletes are done with CL.QUORUM.
> Our reads are done with CL.ONE. Although the reads that confirmed the old 
> data were done with CL.QUORUM.


> According to 
> https://svn.apache.org/viewvc/cassandra/branches/cassandra-0.6/CHANGES.txt 
> 0.6.6 has the same patch
> for (CASSANDRA-1074) as 0.7 and so I assumed that minor compactions in 0.6.6 
> and up also purged tombstones.
My bad. As you were. 

After the repair, did the un-deleted data remain un-deleted? Are you back to a 
stable situation? 

Without a lot more detail I am at a bit of a loss. 

I know it's painful but migrating to 1.0 *really* will make your life so much 
easier and faster. At some point you may hit a bug or a problem in 0.6 and the 
solution may be to upgrade, quickly.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 11:13 PM, Stefan Reek wrote:

> Hi Aaron,
> 
> Thanks for the quick reply.
> All our writes/deletes are done with CL.QUORUM.
> Our reads are done with CL.ONE. Although the reads that confirmed the old 
> data were done with CL.QUORUM.
> According to 
> https://svn.apache.org/viewvc/cassandra/branches/cassandra-0.6/CHANGES.txt 
> 0.6.6 has the same patch
> for (CASSANDRA-1074) as 0.7 and so I assumed that minor compactions in 0.6.6 
> and up also purged tombstones.
> The only suspicious thing I noticed was that after adding the fourth node 
> repairs became extremely slow and heavy.
> Running it degraded the performance of the whole cluster and the new node 
> even went OOM when running it.
> 
> Cheers,
> 
> Stefan
> 
> On 03/06/2012 10:51 AM, aaron morton wrote:
>> 
>>> After we added a fourth node, keeping RF=3, some old data appeared in the 
>>> database.
>> What CL are you working at ? (Should not matter too much with repair 
>> working, just asking)
>> 
>> 
>>> We don't run compact on the nodes explicitly as I understand that running 
>>> repair will trigger a
>>> major compaction. I'm not entirely sure if it does so, but in any case the 
>>> tombstones will be removed by a minor
>>> compaction.
>> In 0.6.x tombstones were only purged during a major / manual compaction. 
>> Purging during minor compaction came in during 0.7
>> https://github.com/apache/cassandra/blob/trunk/CHANGES.txt#L1467
>> 
>>> Can anyone think of any reason why the old data reappeared?
>> It sounds like you are doing things correctly. The complicating factor is 
>> 0.6 is so very old. 
>> 
>> 
>> If I wanted to poke around some more I would conduct reads at CL ONE against 
>> nodes and see if they return the "deleted" data or not. This would help me 
>> understand if the tombstone is still out there. 
>> 
>> I would also poke around a lot in the logs to make sure repair was running 
>> as expected and completing. If you find anything suspicious post examples. 
>> 
>> Finally I would ensure CL QUORUM was being used. 
>> 
>> Hope that helps.
>> 
>> 
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 6/03/2012, at 10:13 PM, Stefan Reek wrote:
>> 
>>> Hi,
>>> 
>>> We were running a 3-node cluster of cassandra 0.6.13 with RF=3.
>>> After we added a fourth node, keeping RF=3, some old data appeared in the 
>>> database.
>>> As far as I understand this can only happen if nodetool repair wasn't run 
>>> for more than GCGraceSeconds.
>>> Our GCGraceSeconds is set to the default of 10 days (864000 seconds).
>>> We have  a scheduled cronjob to run repair once each week on every node, 
>>> each on another day.
>>> I'm sure that none of the nodes ever skipped running a repair.
>>> We don't run compact on the nodes explicitly as I understand that running 
>>> repair will trigger a
>>> major compaction. I'm not entirely sure if it does so, but in any case the 
>>> tombstones will be removed by a minor
>>> compaction. So I expected that the reappearing data, which is a couple of 
>>> months old in some cases, was long gone
>>> by the time we added the node.
>>> 
>>> Can anyone think of any reason why the old data reappeared?
>>> 
>>> Stefan
>> 
> 
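For what it's worth, the margin in Stefan's schedule can be computed directly; this sketch just restates the thread's numbers (default gc_grace_seconds of 10 days, weekly repair cron):

```java
public class GcGraceMargin {
    public static void main(String[] args) {
        // Values from the thread: default gc_grace_seconds and a weekly repair.
        long gcGraceSeconds = 864_000;               // 10 days
        long repairIntervalSeconds = 7 * 24 * 3600;  // weekly cron

        // Repair must complete on every node within gc_grace_seconds of the
        // previous run, or tombstones may be purged before they propagate
        // everywhere and deleted data can resurrect.
        long marginSeconds = gcGraceSeconds - repairIntervalSeconds;
        System.out.println(marginSeconds / 3600 + " hours of slack"); // 72 hours of slack
    }
}
```

So a weekly schedule leaves three days of headroom per node, which is why the reappearing data pointed Aaron at repair not actually completing rather than at the schedule itself.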



Re: Schema change causes exception when adding data

2012-03-06 Thread Tamar Fraenkel
Hi!
Maybe I didn't understand, but if you use Hector's
addColumnFamily(CF, true);
it should wait for schema agreement.
Will that solve your problem?

Thanks

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Tue, Mar 6, 2012 at 7:55 PM, Jeremiah Jordan <
jeremiah.jor...@morningstar.com> wrote:

>  That is the best one I have found.
>
>
> On 03/01/2012 03:12 PM, Tharindu Mathew wrote:
>
> There are 2. I'd like to wait till there is one, when I insert the value.
>
> Going through the code, calling client.describe_schema_versions() seems to
> give a good answer to this. And I discovered that if I wait till there is
> only 1 version, I will not get this error.
>
> Is this the best practice if I want to check this programatically?
>
> On Thu, Mar 1, 2012 at 11:15 PM, aaron morton wrote:
>
>> use describe cluster in the CLI to see how many schema versions there
>> are.
>>
>>  Cheers
>>
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>>  On 2/03/2012, at 12:25 AM, Tharindu Mathew wrote:
>>
>>
>>
>> On Thu, Mar 1, 2012 at 11:47 AM, Tharindu Mathew wrote:
>>
>>> Jeremiah,
>>>
>>> Thanks for the reply.
>>>
>>> This is what we have been doing, but it's not reliable as we don't know
>>> a definite time that the schema would get replicated. Is there any way I
>>> can know for sure that changes have propagated?
>>>
>> [Edit: corrected to a question]
>>
>>>
>>> Then I can block the insertion of data until then.
>>>
>>>
>>> On Thu, Mar 1, 2012 at 4:33 AM, Jeremiah Jordan <
>>> jeremiah.jor...@morningstar.com> wrote:
>>>
  The error is that the specified column family doesn’t exist.  If you
 connect with the CLI and describe the keyspace does it show up?  Also,
 after adding a new column family programmatically you can’t use it
 immediately; you have to wait for it to propagate.  You can use calls to
 describe schema to do so, keep calling it until every node is on the same
 schema.



 -Jeremiah



 *From:* Tharindu Mathew [mailto:mcclou...@gmail.com]
 *Sent:* Wednesday, February 29, 2012 8:27 AM
 *To:* user
 *Subject:* Schema change causes exception when adding data



 Hi,

 I have a 3 node cluster and I'm dynamically updating a keyspace with a
 new column family. Then, when I try to write records to it I get the
 following exception shown at [1].

 How do I avoid this. I'm using Hector and the default consistency level
 of QUORUM is used. Cassandra version 0.7.8. Replication Factor is 1.

 How can I solve my problem?

 [1] -
 me.prettyprint.hector.api.exceptions.HInvalidRequestException:
 InvalidRequestException(why:unconfigured columnfamily proxySummary)

 at
 me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)

 at
 me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:397)

 at
 me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:383)

 at
 me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)

 at
 me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:156)

 at
 me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)

 at
 me.prettyprint.cassandra.service.KeyspaceServiceImpl.multigetSlice(KeyspaceServiceImpl.java:401)

 at
 me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery$1.doInKeyspace(ThriftMultigetSliceQuery.java:67)

 at
 me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery$1.doInKeyspace(ThriftMultigetSliceQuery.java:59)

 at
 me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)

 at
 me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:72)

 at
 me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery.execute(ThriftMultigetSliceQuery.java:58)



 --
 Regards,

 Tharindu



 blog: http://mackiemathew.com/



>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Tharindu
>>>
>>>  blog: http://mackiemathew.com/
>>>
>>>
>>
>>
>> --
>> Regards,
>>
>> Tharindu
>>
>>  blog: http://mackiemathew.com/
>>
>>
>>
>
>
> --
> Regards,
>
> Tharindu
>
>  blog: http://mackiemathew.com/
>
>
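The approach this thread converges on, keep calling describe_schema_versions until a single version remains, can be sketched as a generic polling loop. The version count is stubbed here; a real implementation would call the Thrift client's describe_schema_versions() and count the distinct versions, typically ignoring any UNREACHABLE bucket:

```java
import java.util.function.Supplier;

public class SchemaAgreement {
    // Poll until the cluster reports a single schema version, or give up.
    // versionCount stands in for client.describe_schema_versions().size().
    static boolean waitForAgreement(Supplier<Integer> versionCount,
                                    long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (versionCount.get() == 1) return true;
            Thread.sleep(pollMs);
        }
        return false;  // schema never converged within the timeout
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated cluster that agrees after two polls.
        int[] calls = {0};
        Supplier<Integer> stub = () -> (++calls[0] < 3) ? 2 : 1;
        System.out.println(waitForAgreement(stub, 5_000, 10)); // prints: true
    }
}
```

Blocking inserts behind a call like this is exactly the "wait till there is only 1 version" behaviour Tharindu describes.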

Re: Schema change causes exception when adding data

2012-03-06 Thread Jeremiah Jordan

That is the best one I have found.

On 03/01/2012 03:12 PM, Tharindu Mathew wrote:

There are 2. I'd like to wait till there is one, when I insert the value.

Going through the code, calling client.describe_schema_versions() 
seems to give a good answer to this. And I discovered that if I wait 
till there is only 1 version, I will not get this error.


Is this the best practice if I want to check this programatically?

On Thu, Mar 1, 2012 at 11:15 PM, aaron morton wrote:


use describe cluster in the CLI to see how many schema versions
there are.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/03/2012, at 12:25 AM, Tharindu Mathew wrote:




On Thu, Mar 1, 2012 at 11:47 AM, Tharindu Mathew wrote:

Jeremiah,

Thanks for the reply.

This is what we have been doing, but it's not reliable as we
don't know a definite time that the schema would get
replicated. Is there any way I can know for sure that changes
have propagated?

[Edit: corrected to a question]


Then I can block the insertion of data until then.


On Thu, Mar 1, 2012 at 4:33 AM, Jeremiah Jordan wrote:

The error is that the specified column family doesn’t
exist.  If you connect with the CLI and describe the
keyspace does it show up?  Also, after adding a new
column family programmatically you can’t use it
immediately; you have to wait for it to propagate.  You
can use calls to describe schema to do so, keep calling
it until every node is on the same schema.

-Jeremiah

From: Tharindu Mathew [mailto:mcclou...@gmail.com]
Sent: Wednesday, February 29, 2012 8:27 AM
To: user
Subject: Schema change causes exception when adding data

Hi,

I have a 3 node cluster and I'm dynamically updating a
keyspace with a new column family. Then, when I try to
write records to it I get the following exception shown
at [1].

How do I avoid this. I'm using Hector and the default
consistency level of QUORUM is used. Cassandra version
0.7.8. Replication Factor is 1.

How can I solve my problem?

[1] -
me.prettyprint.hector.api.exceptions.HInvalidRequestException:
InvalidRequestException(why:unconfigured columnfamily
proxySummary)

at

me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)

at

me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:397)

at

me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:383)

at

me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)

at

me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:156)

at

me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)

at

me.prettyprint.cassandra.service.KeyspaceServiceImpl.multigetSlice(KeyspaceServiceImpl.java:401)

at

me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery$1.doInKeyspace(ThriftMultigetSliceQuery.java:67)

at

me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery$1.doInKeyspace(ThriftMultigetSliceQuery.java:59)

at

me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)

at

me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:72)

at

me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery.execute(ThriftMultigetSliceQuery.java:58)



-- 
Regards,


Tharindu

blog: http://mackiemathew.com/







Re: running two rings on the same subnet

2012-03-06 Thread aaron morton
Reduce these settings for the CF
row_cache (disable it)
key_cache (disable it)

Increase these settings for the CF
bloom_filter_fp_chance

Reduce these settings in cassandra.yaml

flush_largest_memtables_at
memtable_flush_queue_size
sliced_buffer_size_in_kb
in_memory_compaction_limit_in_mb
concurrent_compactors


Increase these settings 
index_interval


While it obviously depends on load, I would not be surprised if you had a lot 
of trouble running cassandra with that setup. 

Cheers
A


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 11:02 PM, Tamar Fraenkel wrote:

> Arron, Thanks for your response. I was afraid this is the issue.
> Can you give me some direction regarding the fine tuning of my VMs, I would 
> like to explore that option some more.
> Thanks!
> 
> Tamar Fraenkel 
> Senior Software Engineer, TOK Media 
> 
> 
> 
> ta...@tok-media.com
> Tel:   +972 2 6409736 
> Mob:  +972 54 8356490 
> Fax:   +972 2 5612956 
> 
> 
> 
> 
> 
> On Tue, Mar 6, 2012 at 11:58 AM, aaron morton  wrote:
> You do not have enough memory allocated to the JVM and are suffering from 
> excessive GC as a result.
> 
> There are some tuning things you can try, but 480MB is not enough. 1GB would 
> be a better start, 2 better than that. 
> 
> Consider using https://github.com/pcmanus/ccm for testing multiple instances 
> on a single server rather than a VM.
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 6/03/2012, at 10:21 PM, Tamar Fraenkel wrote:
> 
>> I have some more info, after couple of hours running the problematic node 
>> became again 100% CPU and I had to reboot it, last lines from log show it 
>> did GC:
>> 
>>  INFO [ScheduledTasks:1] 2012-03-06 10:28:00,880 GCInspector.java (line 122) 
>> GC for Copy: 203 ms for 1 collections, 185983456 used; max is 513802240
>>  INFO [ScheduledTasks:1] 2012-03-06 10:28:50,595 GCInspector.java (line 122) 
>> GC for Copy: 3927 ms for 1 collections, 156572576 used; max is 513802240
>>  INFO [ScheduledTasks:1] 2012-03-06 10:28:55,434 StatusLogger.java (line 50) 
>> Pool Name                    Active   Pending   Blocked
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,298 StatusLogger.java (line 65) 
>> ReadStage 2 2 0
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,499 StatusLogger.java (line 65) 
>> RequestResponseStage  0 0 0
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line 65) 
>> ReadRepairStage   0 0 0
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line 65) 
>> MutationStage 0 0 0
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line 65) 
>> ReplicateOnWriteStage 0 0 0
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line 65) 
>> GossipStage   0 0 0
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line 65) 
>> AntiEntropyStage  0 0 0
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line 65) 
>> MigrationStage0 0 0
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line 65) 
>> StreamStage   0 0 0
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line 65) 
>> MemtablePostFlusher   0 0 0
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line 65) 
>> FlushWriter   0 0 0
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line 65) 
>> MiscStage 0 0 0
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line 65) 
>> InternalResponseStage 0 0 0
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line 65) 
>> HintedHandoff 0 0 0
>>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,553 StatusLogger.java (line 69) 
>> CompactionManager   n/a 0
>> 
>> Thanks,
>> 
>> Tamar Fraenkel 
>> Senior Software Engineer, TOK Media 
>> 
>> 
>> 
>> ta...@tok-media.com
>> Tel:   +972 2 6409736 
>> Mob:  +972 54 8356490 
>> Fax:   +972 2 5612956 
>> 
>> 
>> 
>> 
>> 
>> On Tue, Mar 6, 2012 at 9:12 AM, Tamar Fraenkel  wrote:
>> Works..
>> 
>> But during the night my setup encountered a problem.
>> I have two VMs on my cluster (running on VmWare ESXi).
>> Each VM has 1 GB memory, and two virtual disks of 16 GB.
>> They are running on a small server with 4 CPUs (2.66 GHz) and 4 GB memory 
>> (together with two other VMs)
>> I put cassandra data on the second disk of each machine.
>> VMs are running Ubuntu 11.10 and cassandra 1.0.7.
>> 
>>

Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Florent Lefillâtre
CFRR.getProgress() is called by the child mapper tasks on each TaskTracker
node, so the log should appear in
${hadoop_log_dir}/attempt_201202081707_0001_m_00_0/syslog (or
something like this) on the TaskTrackers, not in the client job logs.
Are you sure you are looking at the right log file? I ask because in your first
mail you linked the client job log.
And maybe you can log the size of each split in CFIF.




Le 6 mars 2012 13:09, Patrik Modesto  a écrit :

> I've added a debug message in the CFRR.getProgress() and I can't find
> it in the debug output. Seems like the getProgress() has not been
> called at all;
>
> Regards,
> P.
>
> On Tue, Mar 6, 2012 at 09:49, Jeremy Hanna 
> wrote:
> > you may be running into this -
> https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it
> really affects the execution of the job itself though.
> >
> > On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote:
> >
> >> Hi,
> >>
> >> I was recently trying Hadoop job + cassandra-all 0.8.10 again and the
> >> timeouts I get are not because Cassandra can't handle the
> >> requests. I've noticed there are several tasks that show progress of
> >> several thousand percent. Seems like they are looping over their range of
> >> keys. I've run the job with debug enabled and the ranges look ok, see
> >> http://pastebin.com/stVsFzLM
> >>
> >> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
> >> number of mappers the job creates:
> >> 0.8.7: 4680
> >> 0.8.10: 595
> >>
> >> Task   Complete
> >> task_201202281457_2027_m_41   9076.81%
> >> task_201202281457_2027_m_73   9639.04%
> >> task_201202281457_2027_m_000105   10538.60%
> >> task_201202281457_2027_m_000108   9364.17%
> >>
> >> None of this happens with cassandra-all 0.8.7.
> >>
> >> Regards,
> >> P.
> >>
> >>
> >>
> >> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto 
> wrote:
> >>> I'll alter these settings and will let you know.
> >>>
> >>> Regards,
> >>> P.
> >>>
> >>> On Tue, Feb 28, 2012 at 09:23, aaron morton 
> wrote:
>  Have you tried lowering the  batch size and increasing the time out?
> Even
>  just to get it to work.
> 
>  If you get a TimedOutException it means CL number of servers did not
> respond
>  in time.
> 
>  Cheers
> 
>  -
>  Aaron Morton
>  Freelance Developer
>  @aaronmorton
>  http://www.thelastpickle.com
> 
>  On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
> 
>  Hi aaron,
> 
>  this is our current settings:
> 
>   <property>
>     <name>cassandra.range.batch.size</name>
>     <value>1024</value>
>   </property>
> 
>   <property>
>     <name>cassandra.input.split.size</name>
>     <value>16384</value>
>   </property>
> 
>  rpc_timeout_in_ms: 3
> 
>  Regards,
>  P.
> 
>  On Mon, Feb 27, 2012 at 21:54, aaron morton 
> wrote:
> 
>  What settings do you have for cassandra.range.batch.size
> 
>  and rpc_timeout_in_ms  ? Have you tried reducing the first and/or
> increasing
> 
>  the second ?
> 
> 
>  Cheers
> 
> 
>  -
> 
>  Aaron Morton
> 
>  Freelance Developer
> 
>  @aaronmorton
> 
>  http://www.thelastpickle.com
> 
> 
>  On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
> 
> 
>  On Sun, Feb 26, 2012 at 04:25, Edward Capriolo  >
> 
>  wrote:
> 
> 
>  Did you see the notes here?
> 
> 
> 
>  I'm not sure what do you mean by the notes?
> 
> 
>  I'm using the mapred.* settings suggested there:
> 
> 
>  <property>
>    <name>mapred.max.tracker.failures</name>
>    <value>20</value>
>  </property>
> 
>  <property>
>    <name>mapred.map.max.attempts</name>
>    <value>20</value>
>  </property>
> 
>  <property>
>    <name>mapred.reduce.max.attempts</name>
>    <value>20</value>
>  </property>
> 
> 
>  But I still see the timeouts that I haven't with cassandra-all 0.8.7.
> 
> 
>  P.
> 
> 
>  http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
> 
> 
> 
> 
> >
>


Repairing nodes when two schema versions appear

2012-03-06 Thread Tharindu Mathew
Hi,

I try to add column families programmatically and end up with 2 schema
versions in the Cassandra cluster. I am using Cassandra 0.7.

Is there a way to bring this back to normal (to one schema version) through
the cli or through the API?

-- 
Regards,

Tharindu

blog: http://mackiemathew.com/


Truncate flushes memtables for all CFs causing timeouts

2012-03-06 Thread Viktor Jevdokimov
Hello,

Truncate uses RPC timeout, which is in my case set to 10 seconds (I want
even less) and it's not enough. I've seen in sources TODO for this case.

What I found is that truncate starts a flush of the memtables for all CFs,
not only for the CF being truncated. When there are a lot of CFs to be
flushed, it takes time.

Is it possible to flush only required CF for truncate, not all? This could
improve truncate time.


Best regards,
Viktor
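
Since truncate is bounded by the client's RPC timeout while it waits for all those flushes, one workaround is to raise the client-side Thrift socket timeout for the connection that issues the truncate. A minimal Hector sketch of that idea; the cluster name, host, keyspace, and column family below are placeholders, and the exact method names (setCassandraThriftSocketTimeout, Cluster.truncate) should be verified against your Hector version:

```java
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class TruncateWithLongTimeout {
    public static void main(String[] args) {
        CassandraHostConfigurator conf = new CassandraHostConfigurator("localhost:9160");
        // Truncate flushes every CF's memtable before completing, so allow far
        // more than the usual RPC timeout (60s here is an arbitrary example).
        conf.setCassandraThriftSocketTimeout(60000);

        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", conf);
        // Placeholder keyspace / column family names.
        cluster.truncate("MyKeyspace", "MyColumnFamily");
    }
}
```

Note this only relaxes the client side; the server's rpc_timeout_in_ms still bounds the internal operation, so both may need raising.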


Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Patrik Modesto
I've added a debug message in the CFRR.getProgress() and I can't find
it in the debug output. Seems like the getProgress() has not been
called at all;

Regards,
P.

On Tue, Mar 6, 2012 at 09:49, Jeremy Hanna  wrote:
> you may be running into this - 
> https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it 
> really affects the execution of the job itself though.
>
> On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote:
>
>> Hi,
>>
>> I was recently trying Hadoop job + cassandra-all 0.8.10 again and the
>> timeouts I get are not because Cassandra can't handle the
>> requests. I've noticed there are several tasks that show progress of
>> several thousand percent. Seems like they are looping over their range of
>> keys. I've run the job with debug enabled and the ranges look ok, see
>> http://pastebin.com/stVsFzLM
>>
>> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
>> number of mappers the job creates:
>> 0.8.7: 4680
>> 0.8.10: 595
>>
>> Task       Complete
>> task_201202281457_2027_m_41       9076.81%
>> task_201202281457_2027_m_73       9639.04%
>> task_201202281457_2027_m_000105       10538.60%
>> task_201202281457_2027_m_000108       9364.17%
>>
>> None of this happens with cassandra-all 0.8.7.
>>
>> Regards,
>> P.
>>
>>
>>
>> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto  
>> wrote:
>>> I'll alter these settings and will let you know.
>>>
>>> Regards,
>>> P.
>>>
>>> On Tue, Feb 28, 2012 at 09:23, aaron morton  wrote:
 Have you tried lowering the  batch size and increasing the time out? Even
 just to get it to work.

 If you get a TimedOutException it means CL number of servers did not 
 respond
 in time.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:

 Hi aaron,

 this is our current settings:

      <property>
          <name>cassandra.range.batch.size</name>
          <value>1024</value>
      </property>

      <property>
          <name>cassandra.input.split.size</name>
          <value>16384</value>
      </property>

 rpc_timeout_in_ms: 3

 Regards,
 P.

 On Mon, Feb 27, 2012 at 21:54, aaron morton  
 wrote:

 What settings do you have for cassandra.range.batch.size

 and rpc_timeout_in_ms  ? Have you tried reducing the first and/or 
 increasing

 the second ?


 Cheers


 -

 Aaron Morton

 Freelance Developer

 @aaronmorton

 http://www.thelastpickle.com


 On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:


 On Sun, Feb 26, 2012 at 04:25, Edward Capriolo 

 wrote:


 Did you see the notes here?



 I'm not sure what do you mean by the notes?


 I'm using the mapred.* settings suggested there:


     <property>
         <name>mapred.max.tracker.failures</name>
         <value>20</value>
     </property>

     <property>
         <name>mapred.map.max.attempts</name>
         <value>20</value>
     </property>

     <property>
         <name>mapred.reduce.max.attempts</name>
         <value>20</value>
     </property>


 But I still see the timeouts that I haven't with cassandra-all 0.8.7.


 P.


 http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting




>
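
The cassandra.range.batch.size and cassandra.input.split.size settings quoted in this thread can also be set programmatically on the Hadoop job Configuration via ConfigHelper. A hedged sketch; the setters shown live in org.apache.cassandra.hadoop, but check the exact signatures against your cassandra-all version:

```java
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;

public class JobTuning {
    static void tune(Configuration conf) {
        // Fewer rows per get_range_slices call, so each Thrift call has a
        // better chance of finishing inside rpc_timeout_in_ms.
        ConfigHelper.setRangeBatchSize(conf, 1024);

        // Rows per input split, i.e. per map task.
        ConfigHelper.setInputSplitSize(conf, 16384);

        // Equivalent raw properties, as set in the job XML in this thread:
        // conf.setInt("cassandra.range.batch.size", 1024);
        // conf.setInt("cassandra.input.split.size", 16384);
    }
}
```

Lowering the batch size trades more round trips for shorter individual calls, which is the direction suggested earlier in the thread for avoiding TimedOutExceptions.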


Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Patrik Modesto
I've tried cassandra-all 0.8.10 with the rpc_endpoints ==
"0.0.0.0" bug fixed, but the result is the same: there are still tasks over
1000%. The only change is that there are real host names instead of
0.0.0.0 in the debug output.

Reconfiguring the whole cluster is not possible, so I can't test with
"rpc_address" commented out.

Regards,
P.


On Tue, Mar 6, 2012 at 12:26, Florent Lefillâtre  wrote:
> I remember a bug in the ColumnFamilyInputFormat class in 0.8.10.
> There was a test rpc_endpoints == "0.0.0.0" in place of
> rpc_endpoint.equals("0.0.0.0"); maybe that can help you.
>
> Le 6 mars 2012 12:18, Florent Lefillâtre  a écrit :
>
>> Excuse me, I had not understood.
>> So, for me, the problem comes from the change in the ColumnFamilyInputFormat
>> class between 0.8.7 and 0.8.10 where the splits are created (0.8.7 uses
>> endpoints and 0.8.10 uses rpc_endpoints).
>> With your config, the splits fail, so Hadoop doesn't run a Map task on
>> approximately 16384 rows (your cassandra.input.split.size) but on all the
>> rows of a node (certainly far more than 16384).
>> However, Hadoop estimates the task progress against 16384 inputs, which is
>> why you see something like 9076.81%.
>>
>> If you can't change the rpc_address configuration, I don't know how you can
>> solve your problem :/, sorry.
>>
>> Le 6 mars 2012 11:53, Patrik Modesto  a écrit :
>>
>>> Hi Florent,
>>>
>>> I don't change the server version, it is the Cassandra 0.8.10. I
>>> change just the version of cassandra-all in pom.xml of the mapreduce
>>> job.
>>>
>>> I have the 'rpc_address: 0.0.0.0'  in cassandra.yaml, because I want
>>> cassandra to bind RPC to all interfaces.
>>>
>>> Regards,
>>> P.
>>>
>>> On Tue, Mar 6, 2012 at 09:44, Florent Lefillâtre 
>>> wrote:
>>> > Hi, I had the same problem on Hadoop 0.20.2 and Cassandra 1.0.5.
>>> > In my case the split of the token range failed.
>>> > I have commented out the line 'rpc_address: 0.0.0.0' in cassandra.yaml.
>>> > Maybe check whether you have configuration changes between 0.8.7 and
>>> > 0.8.10
>>> >
>>> >
>>> > Le 6 mars 2012 09:32, Patrik Modesto  a écrit
>>> > :
>>> >
>>> >> Hi,
>>> >>
>>> >> I was recently trying Hadoop job + cassandra-all 0.8.10 again and the
>>> >> timeouts I get are not because Cassandra can't handle the
>>> >> requests. I've noticed there are several tasks that show progress of
>>> >> several thousand percent. Seems like they are looping over their range of
>>> >> keys. I've run the job with debug enabled and the ranges look ok, see
>>> >> http://pastebin.com/stVsFzLM
>>> >>
>>> >> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
>>> >> number of mappers the job creates:
>>> >> 0.8.7: 4680
>>> >> 0.8.10: 595
>>> >>
>>> >> Task       Complete
>>> >> task_201202281457_2027_m_41 9076.81%
>>> >> task_201202281457_2027_m_73 9639.04%
>>> >> task_201202281457_2027_m_000105 10538.60%
>>> >> task_201202281457_2027_m_000108 9364.17%
>>> >>
>>> >> None of this happens with cassandra-all 0.8.7.
>>> >>
>>> >> Regards,
>>> >> P.
>>> >>
>>> >>
>>> >>
>>> >> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto
>>> >> 
>>> >> wrote:
>>> >> > I'll alter these settings and will let you know.
>>> >> >
>>> >> > Regards,
>>> >> > P.
>>> >> >
>>> >> > On Tue, Feb 28, 2012 at 09:23, aaron morton
>>> >> > 
>>> >> > wrote:
>>> >> >> Have you tried lowering the  batch size and increasing the time
>>> >> >> out?
>>> >> >> Even
>>> >> >> just to get it to work.
>>> >> >>
>>> >> >> If you get a TimedOutException it means CL number of servers did
>>> >> >> not
>>> >> >> respond
>>> >> >> in time.
>>> >> >>
>>> >> >> Cheers
>>> >> >>
>>> >> >> -
>>> >> >> Aaron Morton
>>> >> >> Freelance Developer
>>> >> >> @aaronmorton
>>> >> >> http://www.thelastpickle.com
>>> >> >>
>>> >> >> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
>>> >> >>
>>> >> >> Hi aaron,
>>> >> >>
>>> >> >> this is our current settings:
>>> >> >>
>>> >> >>  <property>
>>> >> >>    <name>cassandra.range.batch.size</name>
>>> >> >>    <value>1024</value>
>>> >> >>  </property>
>>> >> >>
>>> >> >>  <property>
>>> >> >>    <name>cassandra.input.split.size</name>
>>> >> >>    <value>16384</value>
>>> >> >>  </property>
>>> >> >>
>>> >> >> rpc_timeout_in_ms: 3
>>> >> >>
>>> >> >> Regards,
>>> >> >> P.
>>> >> >>
>>> >> >> On Mon, Feb 27, 2012 at 21:54, aaron morton
>>> >> >> 
>>> >> >> wrote:
>>> >> >>
>>> >> >> What settings do you have for cassandra.range.batch.size
>>> >> >>
>>> >> >> and rpc_timeout_in_ms  ? Have you tried reducing the first and/or
>>> >> >> increasing
>>> >> >>
>>> >> >> the second ?
>>> >> >>
>>> >> >>
>>> >> >> Cheers
>>> >> >>
>>> >> >>
>>> >> >> -
>>> >> >>
>>> >> >> Aaron Morton
>>> >> >>
>>> >> >> Freelance Developer
>>> >> >>
>>> >> >> @aaronmorton
>>> >> >>
>>> >> >> http://www.thelastpickle.com
>>> >> >>
>>> >> >>
>>> >> >> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
>>> >> >>
>>> >> >>
>>> >> >> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo
>>> >> >> 
>>> >> >>
>>> >> >> wrote:
>>> >> >>
>>> >> >>
>>> >> >> Did you see the notes here?
>>> >> >>
>>> >

Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Florent Lefillâtre
I remember a bug in the ColumnFamilyInputFormat class in 0.8.10.
There was a test rpc_endpoints == "0.0.0.0" in place of
rpc_endpoint.equals("0.0.0.0"); maybe that can help you.

Le 6 mars 2012 12:18, Florent Lefillâtre  a écrit :

> Excuse me, I had not understood.
> So, for me, the problem comes from the change in the ColumnFamilyInputFormat
> class between 0.8.7 and 0.8.10 where the splits are created (0.8.7 uses
> endpoints and 0.8.10 uses rpc_endpoints).
> With your config, the splits fail, so Hadoop doesn't run a Map task on
> approximately 16384 rows (your cassandra.input.split.size) but on all the
> rows of a node (certainly far more than 16384).
> However, Hadoop estimates the task progress against 16384 inputs, which is
> why you see something like 9076.81%.
>
> If you can't change the rpc_address configuration, I don't know how you can
> solve your problem :/, sorry.
>
> Le 6 mars 2012 11:53, Patrik Modesto  a écrit :
>
> Hi Florent,
>>
>> I don't change the server version, it is the Cassandra 0.8.10. I
>> change just the version of cassandra-all in pom.xml of the mapreduce
>> job.
>>
>> I have the 'rpc_address: 0.0.0.0'  in cassandra.yaml, because I want
>> cassandra to bind RPC to all interfaces.
>>
>> Regards,
>> P.
>>
>> On Tue, Mar 6, 2012 at 09:44, Florent Lefillâtre 
>> wrote:
>> > Hi, I had the same problem on Hadoop 0.20.2 and Cassandra 1.0.5.
>> > In my case the split of the token range failed.
>> > I have commented out the line 'rpc_address: 0.0.0.0' in cassandra.yaml.
>> > Maybe check whether you have configuration changes between 0.8.7 and
>> 0.8.10
>> >
>> >
>> > Le 6 mars 2012 09:32, Patrik Modesto  a
>> écrit :
>> >
>> >> Hi,
>> >>
>> >> I was recently trying Hadoop job + cassandra-all 0.8.10 again and the
>> >> timeouts I get are not because Cassandra can't handle the
>> >> requests. I've noticed there are several tasks that show progress of
>> >> several thousand percent. Seems like they are looping over their range of
>> >> keys. I've run the job with debug enabled and the ranges look ok, see
>> >> http://pastebin.com/stVsFzLM
>> >>
>> >> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
>> >> number of mappers the job creates:
>> >> 0.8.7: 4680
>> >> 0.8.10: 595
>> >>
>> >> Task   Complete
>> >> task_201202281457_2027_m_41 9076.81%
>> >> task_201202281457_2027_m_73 9639.04%
>> >> task_201202281457_2027_m_000105 10538.60%
>> >> task_201202281457_2027_m_000108 9364.17%
>> >>
>> >> None of this happens with cassandra-all 0.8.7.
>> >>
>> >> Regards,
>> >> P.
>> >>
>> >>
>> >>
>> >> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto <
>> patrik.mode...@gmail.com>
>> >> wrote:
>> >> > I'll alter these settings and will let you know.
>> >> >
>> >> > Regards,
>> >> > P.
>> >> >
>> >> > On Tue, Feb 28, 2012 at 09:23, aaron morton > >
>> >> > wrote:
>> >> >> Have you tried lowering the  batch size and increasing the time out?
>> >> >> Even
>> >> >> just to get it to work.
>> >> >>
>> >> >> If you get a TimedOutException it means CL number of servers did not
>> >> >> respond
>> >> >> in time.
>> >> >>
>> >> >> Cheers
>> >> >>
>> >> >> -
>> >> >> Aaron Morton
>> >> >> Freelance Developer
>> >> >> @aaronmorton
>> >> >> http://www.thelastpickle.com
>> >> >>
>> >> >> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
>> >> >>
>> >> >> Hi aaron,
>> >> >>
>> >> >> this is our current settings:
>> >> >>
>> >> >>  <property>
>> >> >>    <name>cassandra.range.batch.size</name>
>> >> >>    <value>1024</value>
>> >> >>  </property>
>> >> >>
>> >> >>  <property>
>> >> >>    <name>cassandra.input.split.size</name>
>> >> >>    <value>16384</value>
>> >> >>  </property>
>> >> >>
>> >> >> rpc_timeout_in_ms: 3
>> >> >>
>> >> >> Regards,
>> >> >> P.
>> >> >>
>> >> >> On Mon, Feb 27, 2012 at 21:54, aaron morton <
>> aa...@thelastpickle.com>
>> >> >> wrote:
>> >> >>
>> >> >> What settings do you have for cassandra.range.batch.size
>> >> >>
>> >> >> and rpc_timeout_in_ms  ? Have you tried reducing the first and/or
>> >> >> increasing
>> >> >>
>> >> >> the second ?
>> >> >>
>> >> >>
>> >> >> Cheers
>> >> >>
>> >> >>
>> >> >> -
>> >> >>
>> >> >> Aaron Morton
>> >> >>
>> >> >> Freelance Developer
>> >> >>
>> >> >> @aaronmorton
>> >> >>
>> >> >> http://www.thelastpickle.com
>> >> >>
>> >> >>
>> >> >> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
>> >> >>
>> >> >>
>> >> >> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo <
>> edlinuxg...@gmail.com>
>> >> >>
>> >> >> wrote:
>> >> >>
>> >> >>
>> >> >> Did you see the notes here?
>> >> >>
>> >> >>
>> >> >>
>> >> >> I'm not sure what do you mean by the notes?
>> >> >>
>> >> >>
>> >> >> I'm using the mapred.* settings suggested there:
>> >> >>
>> >> >>
>> >> >> <property>
>> >> >>   <name>mapred.max.tracker.failures</name>
>> >> >>   <value>20</value>
>> >> >> </property>
>> >> >>
>> >> >> <property>
>> >> >>   <name>mapred.map.max.attempts</name>
>> >> >>   <value>20</value>
>> >> >> </property>
>> >> >>
>> >> >> <property>
>> >> >>   <name>mapred.reduce.max.attempts</name>
>> >> >>   <value>20</value>
>> >> >> </property>
>> >> >
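
The rpc_endpoints == "0.0.0.0" bug discussed in this thread is the classic Java reference-versus-value comparison mistake: == compares object identity, while equals() compares the characters. A self-contained illustration (the class and method names here are invented for the demo, not Cassandra's actual code):

```java
public class EndpointCheck {
    // Buggy check: '==' compares object identity, so an endpoint string built
    // at runtime (e.g. deserialized from a Thrift response) is never '==' to
    // the literal "0.0.0.0", even when the characters match.
    static boolean isWildcardBuggy(String rpcEndpoint) {
        return rpcEndpoint == "0.0.0.0";
    }

    // Fixed check: equals() compares the characters themselves; putting the
    // literal first also makes the call null-safe.
    static boolean isWildcardFixed(String rpcEndpoint) {
        return "0.0.0.0".equals(rpcEndpoint);
    }

    public static void main(String[] args) {
        // A distinct object with the same characters, simulating a value
        // that did not come from the compile-time constant pool.
        String fromWire = new String("0.0.0.0");
        System.out.println(isWildcardBuggy(fromWire)); // false - the bug
        System.out.println(isWildcardFixed(fromWire)); // true
    }
}
```

This is why the fix was to replace the == test with an equals() call.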

Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Florent Lefillâtre
Excuse me, I had not understood.
So, for me, the problem comes from the change in the ColumnFamilyInputFormat
class between 0.8.7 and 0.8.10 where the splits are created (0.8.7 uses
endpoints and 0.8.10 uses rpc_endpoints).
With your config, the splits fail, so Hadoop doesn't run a Map task on
approximately 16384 rows (your cassandra.input.split.size) but on all the
rows of a node (certainly far more than 16384).
However, Hadoop estimates the task progress against 16384 inputs, which is why
you see something like 9076.81%.

If you can't change the rpc_address configuration, I don't know how you can
solve your problem :/, sorry.

Le 6 mars 2012 11:53, Patrik Modesto  a écrit :

> Hi Florent,
>
> I don't change the server version, it is the Cassandra 0.8.10. I
> change just the version of cassandra-all in pom.xml of the mapreduce
> job.
>
> I have the 'rpc_address: 0.0.0.0'  in cassandra.yaml, because I want
> cassandra to bind RPC to all interfaces.
>
> Regards,
> P.
>
> On Tue, Mar 6, 2012 at 09:44, Florent Lefillâtre 
> wrote:
> > Hi, I had the same problem on Hadoop 0.20.2 and Cassandra 1.0.5.
> > In my case the split of the token range failed.
> > I have commented out the line 'rpc_address: 0.0.0.0' in cassandra.yaml.
> > Maybe check whether you have configuration changes between 0.8.7 and 0.8.10
> >
> >
> > Le 6 mars 2012 09:32, Patrik Modesto  a écrit
> :
> >
> >> Hi,
> >>
> >> I was recently trying Hadoop job + cassandra-all 0.8.10 again and the
> >> timeouts I get are not because Cassandra can't handle the
> >> requests. I've noticed there are several tasks that show progress of
> >> several thousand percent. Seems like they are looping over their range of
> >> keys. I've run the job with debug enabled and the ranges look ok, see
> >> http://pastebin.com/stVsFzLM
> >>
> >> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
> >> number of mappers the job creates:
> >> 0.8.7: 4680
> >> 0.8.10: 595
> >>
> >> Task   Complete
> >> task_201202281457_2027_m_41 9076.81%
> >> task_201202281457_2027_m_73 9639.04%
> >> task_201202281457_2027_m_000105 10538.60%
> >> task_201202281457_2027_m_000108 9364.17%
> >>
> >> None of this happens with cassandra-all 0.8.7.
> >>
> >> Regards,
> >> P.
> >>
> >>
> >>
> >> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto  >
> >> wrote:
> >> > I'll alter these settings and will let you know.
> >> >
> >> > Regards,
> >> > P.
> >> >
> >> > On Tue, Feb 28, 2012 at 09:23, aaron morton 
> >> > wrote:
> >> >> Have you tried lowering the  batch size and increasing the time out?
> >> >> Even
> >> >> just to get it to work.
> >> >>
> >> >> If you get a TimedOutException it means CL number of servers did not
> >> >> respond
> >> >> in time.
> >> >>
> >> >> Cheers
> >> >>
> >> >> -
> >> >> Aaron Morton
> >> >> Freelance Developer
> >> >> @aaronmorton
> >> >> http://www.thelastpickle.com
> >> >>
> >> >> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
> >> >>
> >> >> Hi aaron,
> >> >>
> >> >> this is our current settings:
> >> >>
> >> >>  <property>
> >> >>    <name>cassandra.range.batch.size</name>
> >> >>    <value>1024</value>
> >> >>  </property>
> >> >>
> >> >>  <property>
> >> >>    <name>cassandra.input.split.size</name>
> >> >>    <value>16384</value>
> >> >>  </property>
> >> >>
> >> >> rpc_timeout_in_ms: 3
> >> >>
> >> >> Regards,
> >> >> P.
> >> >>
> >> >> On Mon, Feb 27, 2012 at 21:54, aaron morton  >
> >> >> wrote:
> >> >>
> >> >> What settings do you have for cassandra.range.batch.size
> >> >>
> >> >> and rpc_timeout_in_ms  ? Have you tried reducing the first and/or
> >> >> increasing
> >> >>
> >> >> the second ?
> >> >>
> >> >>
> >> >> Cheers
> >> >>
> >> >>
> >> >> -
> >> >>
> >> >> Aaron Morton
> >> >>
> >> >> Freelance Developer
> >> >>
> >> >> @aaronmorton
> >> >>
> >> >> http://www.thelastpickle.com
> >> >>
> >> >>
> >> >> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
> >> >>
> >> >>
> >> >> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo <
> edlinuxg...@gmail.com>
> >> >>
> >> >> wrote:
> >> >>
> >> >>
> >> >> Did you see the notes here?
> >> >>
> >> >>
> >> >>
> >> >> I'm not sure what do you mean by the notes?
> >> >>
> >> >>
> >> >> I'm using the mapred.* settings suggested there:
> >> >>
> >> >>
> >> >> <property>
> >> >>   <name>mapred.max.tracker.failures</name>
> >> >>   <value>20</value>
> >> >> </property>
> >> >>
> >> >> <property>
> >> >>   <name>mapred.map.max.attempts</name>
> >> >>   <value>20</value>
> >> >> </property>
> >> >>
> >> >> <property>
> >> >>   <name>mapred.reduce.max.attempts</name>
> >> >>   <value>20</value>
> >> >> </property>
> >> >>
> >> >>
> >> >> But I still see the timeouts that I haven't with cassandra-all 0.8.7.
> >> >>
> >> >>
> >> >> P.
> >> >>
> >> >>
> >> >> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
> >> >>
> >> >>
> >> >>
> >> >>
> >
> >
>


RE: Mutation Dropped Messages

2012-03-06 Thread Tiwari, Dushyant
1.   One node is running at 8G rest on 10G - same config

2.   Nodetool -

Status  State   Load        Owns     Token

   
162563731948587347959549934419333022646

Up Normal  107.79 MB   25.00%  34957844353235424160784456632419943350

Up Normal  116.44 MB   25.00%  77493140218352732093706282561390969782

Up Normal  27.01 MB12.68%  99065646426277998282363457251162269147

Up Normal  35.9 MB 12.32%  120028436083470040026628108490361996214

Up Normal  512.55 KB   25.00%  162563731948587347959549934419333022646



RF: 2 and CL: QUORUM - writes at a rate of 1750 rows/s; every row has 5 columns, 
2 of them indexed.



Thanks,

Dushyant



From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, March 05, 2012 11:07 PM
To: user@cassandra.apache.org
Subject: Re: Mutation Dropped Messages

I increased the size of the cluster and also the concurrent_writes parameter. Still 
there is a node which keeps on dropping mutation messages.
Ensure all the nodes have the same spec, and the nodes have the same config. In 
a virtual environment consider moving the node.

Is this due to some improper load balancing?
What does nodetool ring say and what sort of queries (and RF and CL) are you 
sending.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 3:58 AM, Tiwari, Dushyant wrote:


Hey Aaron,

I increased the size of the cluster and also the concurrent_writes parameter. Still 
there is one node which keeps on dropping mutation messages; the other nodes 
are not dropping any. I am using the Hector API and have done nothing 
for load balancing so far, just provided the host:port of the nodes in the 
CassandraHostConfigurator. Is this due to some improper load balancing? Also, the 
physical host where the node runs is under relatively heavier load than the other 
nodes' hosts. What can I do to improve?
PS: The node is seed of the cluster.

Thanks,
Dushyant

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, March 05, 2012 4:15 PM
To: user@cassandra.apache.org
Subject: Re: Mutation Dropped Messages

1.   Which parameters to tune in the config files? - Especially looking for 
heavy writes
The node is overloaded. It may be because there are not enough nodes, or the 
node is under temporary stress such as GC or repair.
If you have spare IO / CPU capacity you could increase concurrent_writes to 
increase throughput on the write stage. You then need to ensure the commit log 
and, to a lesser degree, the data volumes can keep up.

2.   What is the difference between TimedOutException and silently dropping 
mutation messages while operating on a CL of QUORUM.
A TimedOutException means CL nodes did not respond to the coordinator before 
rpc_timeout. Dropping messages happens when a message is removed from the queue 
of a thread pool after rpc_timeout has occurred. It is a feature of the 
architecture, and correct behaviour under stress.
Inconsistencies created by dropped messages are repaired via reads at high CL, 
HH (in 1.+), Read Repair or Anti Entropy.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/03/2012, at 11:32 PM, Tiwari, Dushyant wrote:



Hi All,

While benchmarking Cassandra I found "Mutation Dropped" messages in the logs.  
Now I know this is a good old question. It will be really great if someone can 
provide a check list to recover when such a thing happens. I am looking for 
answers of the following questions  -

1.   Which parameters to tune in the config files? - Especially looking for 
heavy writes
2.   What is the difference between TimedOutException and silently dropping 
mutation messages while operating on a CL of QUORUM.


Regards,
Dushyant

NOTICE: Morgan Stanley is not acting as a municipal advisor and the opinions or 
views contained herein are not intended to be, and do not constitute, advice 
within the meaning of Section 975 of the Dodd-Frank Wall Street Reform and 
Consumer Protection Act. If you have received this communication in error, 
please destroy all electronic and paper copies and notify the sender 
immediately. Mistransmission is not intended to waive confidentiality or 
privilege. Morgan Stanley reserves the right, to the extent permitted under 
applicable law, to monitor electronic communications. This message is subject 
to terms available at the following link: 
http://www.morganstanley.com/disclaimers. If you cannot access these links, 
please notify us by reply message and we will send the contents to you. By 
messaging with Morgan Stanley you consent to the foregoing.



Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Patrik Modesto
Hi Florent,

I don't change the server version, it is the Cassandra 0.8.10. I
change just the version of cassandra-all in pom.xml of the mapreduce
job.

I have the 'rpc_address: 0.0.0.0'  in cassandra.yaml, because I want
cassandra to bind RPC to all interfaces.

Regards,
P.

On Tue, Mar 6, 2012 at 09:44, Florent Lefillâtre  wrote:
> Hi, I had the same problem on Hadoop 0.20.2 and Cassandra 1.0.5.
> In my case the split of the token range failed.
> I have commented out the line 'rpc_address: 0.0.0.0' in cassandra.yaml.
> Maybe check whether you have configuration changes between 0.8.7 and 0.8.10
>
>
> Le 6 mars 2012 09:32, Patrik Modesto  a écrit :
>
>> Hi,
>>
>> I was recently trying Hadoop job + cassandra-all 0.8.10 again and the
>> timeouts I get are not because Cassandra can't handle the
>> requests. I've noticed there are several tasks that show progress of
>> several thousand percent. Seems like they are looping over their range of
>> keys. I've run the job with debug enabled and the ranges look ok, see
>> http://pastebin.com/stVsFzLM
>>
>> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
>> number of mappers the job creates:
>> 0.8.7: 4680
>> 0.8.10: 595
>>
>> Task       Complete
>> task_201202281457_2027_m_41 9076.81%
>> task_201202281457_2027_m_73 9639.04%
>> task_201202281457_2027_m_000105 10538.60%
>> task_201202281457_2027_m_000108 9364.17%
>>
>> None of this happens with cassandra-all 0.8.7.
>>
>> Regards,
>> P.
>>
>>
>>
>> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto 
>> wrote:
>> > I'll alter these settings and will let you know.
>> >
>> > Regards,
>> > P.
>> >
>> > On Tue, Feb 28, 2012 at 09:23, aaron morton 
>> > wrote:
>> >> Have you tried lowering the  batch size and increasing the time out?
>> >> Even
>> >> just to get it to work.
>> >>
>> >> If you get a TimedOutException it means CL number of servers did not
>> >> respond
>> >> in time.
>> >>
>> >> Cheers
>> >>
>> >> -
>> >> Aaron Morton
>> >> Freelance Developer
>> >> @aaronmorton
>> >> http://www.thelastpickle.com
>> >>
>> >> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
>> >>
>> >> Hi aaron,
>> >>
>> >> this is our current settings:
>> >>
>> >> <property>
>> >>   <name>cassandra.range.batch.size</name>
>> >>   <value>1024</value>
>> >> </property>
>> >>
>> >> <property>
>> >>   <name>cassandra.input.split.size</name>
>> >>   <value>16384</value>
>> >> </property>
>> >>
>> >> rpc_timeout_in_ms: 3
>> >>
>> >> Regards,
>> >> P.
>> >>
>> >> On Mon, Feb 27, 2012 at 21:54, aaron morton 
>> >> wrote:
>> >>
>> >> What settings do you have for cassandra.range.batch.size
>> >>
>> >> and rpc_timeout_in_ms  ? Have you tried reducing the first and/or
>> >> increasing
>> >>
>> >> the second ?
>> >>
>> >>
>> >> Cheers
>> >>
>> >>
>> >> -
>> >>
>> >> Aaron Morton
>> >>
>> >> Freelance Developer
>> >>
>> >> @aaronmorton
>> >>
>> >> http://www.thelastpickle.com
>> >>
>> >>
>> >> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
>> >>
>> >>
>> >> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo 
>> >>
>> >> wrote:
>> >>
>> >>
>> >> Did you see the notes here?
>> >>
>> >>
>> >>
>> >> I'm not sure what do you mean by the notes?
>> >>
>> >>
>> >> I'm using the mapred.* settings suggested there:
>> >>
>> >>
>> >> <property>
>> >>   <name>mapred.max.tracker.failures</name>
>> >>   <value>20</value>
>> >> </property>
>> >>
>> >> <property>
>> >>   <name>mapred.map.max.attempts</name>
>> >>   <value>20</value>
>> >> </property>
>> >>
>> >> <property>
>> >>   <name>mapred.reduce.max.attempts</name>
>> >>   <value>20</value>
>> >> </property>
>> >>
>> >>
>> >> But I still see the timeouts that I haven't with cassandra-all 0.8.7.
>> >>
>> >>
>> >> P.
>> >>
>> >>
>> >> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
>> >>
>> >>
>> >>
>> >>
>
>


Re: Old data coming alive after adding node

2012-03-06 Thread Stefan Reek

Hi Aaron,

Thanks for the quick reply.
All our writes/deletes are done with CL.QUORUM.
Our reads are done with CL.ONE, although the reads that confirmed the 
old data were done with CL.QUORUM.
According to 
https://svn.apache.org/viewvc/cassandra/branches/cassandra-0.6/CHANGES.txt 
0.6.6 has the same patch
for (CASSANDRA-1074) as 0.7 and so I assumed that minor compactions in 
0.6.6 and up also purged tombstones.
The only suspicious thing I noticed was that after adding the fourth 
node, repairs became extremely slow and heavy.
Running one degraded the performance of the whole cluster, and the new 
node even went OOM while running it.


Cheers,

Stefan

On 03/06/2012 10:51 AM, aaron morton wrote:
After we added a fourth node, keeping RF=3, some old data appeared in 
the database.
What CL are you working at ? (Should not matter too much with repair 
working, just asking)



We don't run compact on the nodes explicitly as I understand that 
running repair will trigger a
major compaction. I'm not entirely sure if it does so, but in any 
case the tombstones will be removed by a minor

compaction.
In 0.6.x tombstones were only purged during a major / manual 
compaction. Purging during minor compaction came in during 0.7

https://github.com/apache/cassandra/blob/trunk/CHANGES.txt#L1467


Can anyone think of any reason why the old data reappeared?
It sounds like you are doing things correctly. The complicating factor 
is that 0.6 is very old.



If I wanted to poke around some more, I would conduct reads at CL ONE 
against individual nodes and see whether they return the "deleted" 
data. This would help me understand if the tombstone is still out there.


I would also poke around a lot in the logs to make sure repair was 
running as expected and completing. If you find anything suspicious 
post examples.


Finally, I would ensure CL QUORUM was being used.

Hope that helps.


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 10:13 PM, Stefan Reek wrote:


Hi,

We were running a 3-node cluster of cassandra 0.6.13 with RF=3.
After we added a fourth node, keeping RF=3, some old data appeared in 
the database.
As far as I understand this can only happen if nodetool repair wasn't 
run for more than GCGraceSeconds.

Our GCGraceSeconds is set to the default of 10 days (864000 seconds).
We have a scheduled cronjob to run repair once each week on every 
node, each on a different day.

I'm sure that none of the nodes ever skipped running a repair.
We don't run compact on the nodes explicitly as I understand that 
running repair will trigger a
major compaction. I'm not entirely sure if it does so, but in any 
case the tombstones will be removed by a minor
compaction. So I expected that the reappearing data, which is a 
couple of months old in some cases, was long gone

by the time we added the node.

Can anyone think of any reason why the old data reappeared?

Stefan






Re: running two rings on the same subnet

2012-03-06 Thread Tamar Fraenkel
Aaron, thanks for your response. I was afraid this was the issue.
Can you give me some direction regarding the fine-tuning of my VMs? I would
like to explore that option some more.
Thanks!

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Tue, Mar 6, 2012 at 11:58 AM, aaron morton wrote:

> You do not have enough memory allocated to the JVM and are suffering from
> excessive GC as a result.
>
> There are some tuning things you can try, but 480MB is not enough. 1GB
> would be a better start, 2 better than that.
>
> Consider using https://github.com/pcmanus/ccm for testing multiple
> instances on a single server rather than a VM.
>
> Cheers
>
>   -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/03/2012, at 10:21 PM, Tamar Fraenkel wrote:
>
> I have some more info: after a couple of hours of running, the problematic
> node again hit 100% CPU and I had to reboot it. The last lines from the log
> show it did GC:
>
>  INFO [ScheduledTasks:1] 2012-03-06 10:28:00,880 GCInspector.java (line
> 122) GC for Copy: 203 ms for 1 collections, 185983456 used; max is 513802240
>  INFO [ScheduledTasks:1] 2012-03-06 10:28:50,595 GCInspector.java (line
> 122) GC for Copy: 3927 ms for 1 collections, 156572576 used; max is
> 513802240
>  INFO [ScheduledTasks:1] 2012-03-06 10:28:55,434 StatusLogger.java (line
> 50) Pool Name                    Active   Pending   Blocked
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,298 StatusLogger.java (line
> 65) ReadStage 2 2 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,499 StatusLogger.java (line
> 65) RequestResponseStage  0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line
> 65) ReadRepairStage   0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line
> 65) MutationStage 0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line
> 65) ReplicateOnWriteStage 0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line
> 65) GossipStage   0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line
> 65) AntiEntropyStage  0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line
> 65) MigrationStage0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line
> 65) StreamStage   0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line
> 65) MemtablePostFlusher   0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line
> 65) FlushWriter   0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line
> 65) MiscStage 0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line
> 65) InternalResponseStage 0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line
> 65) HintedHandoff 0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,553 StatusLogger.java (line
> 69) CompactionManager   n/a 0
>
> Thanks,
>
> *Tamar Fraenkel *
> Senior Software Engineer, TOK Media
>
> 
>
> ta...@tok-media.com
> Tel:   +972 2 6409736
> Mob:  +972 54 8356490
> Fax:   +972 2 5612956
>
>
>
>
>
> On Tue, Mar 6, 2012 at 9:12 AM, Tamar Fraenkel wrote:
>
>> Works..
>>
>> But during the night my setup encountered a problem.
>> I have two VMs on my cluster (running on VmWare ESXi).
>> Each VM has 1GB of memory and two virtual disks of 16 GB
>> They are running on a small server with 4CPUs (2.66 GHz), and 4 GB memory
>> (together with two other VMs)
>> I put cassandra data on the second disk of each machine.
>> VMs are running Ubuntu 11.10 and cassandra 1.0.7.
>>
>> I left them running overnight and this morning when I came:
>> In one node cassandra was down, and the last thing in the system.log is:
>>
>>  INFO [CompactionExecutor:150] 2012-03-06 00:55:04,821
>> CompactionTask.java (line 113) Compacting
>> [SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1243-Data.db'),
>> SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1245-Data.db'),
>> SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1242-Data.db'),
>> SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1244-Data.db')]
>>  INFO [CompactionExecutor:150] 2012-03-06 00:55:07,919
>> CompactionTask.java (line 221) Compacted to
>> [/opt/cassandra/data/tok/tk_verti

Re: running two rings on the same subnet

2012-03-06 Thread aaron morton
You do not have enough memory allocated to the JVM and are suffering from 
excessive GC as a result.

There are some tuning things you can try, but 480MB is not enough. 1GB would be 
a better start, 2 better than that. 

Consider using https://github.com/pcmanus/ccm for testing multiple instances on 
a single server rather than a VM.
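Pauses like the 3927 ms Copy collection quoted below are the symptom to watch for; here is a small sketch of scanning a log for them. This is a hypothetical helper, and the regex targets only the GCInspector line format shown in this thread:

```python
import re

# Matches GCInspector lines like:
#   GC for Copy: 3927 ms for 1 collections, 156572576 used; max is 513802240
GC_LINE = re.compile(
    r"GC for (\w+): (\d+) ms for \d+ collections?, (\d+) used; max is (\d+)"
)

def long_pauses(log_lines, threshold_ms=1000):
    """Return (collector, pause_ms, heap_used_fraction) for pauses over threshold."""
    hits = []
    for line in log_lines:
        m = GC_LINE.search(line)
        if m:
            collector = m.group(1)
            ms, used, cap = int(m.group(2)), int(m.group(3)), int(m.group(4))
            if ms >= threshold_ms:
                hits.append((collector, ms, round(used / cap, 2)))
    return hits

log = [
    "INFO ... GC for Copy: 203 ms for 1 collections, 185983456 used; max is 513802240",
    "INFO ... GC for Copy: 3927 ms for 1 collections, 156572576 used; max is 513802240",
]
print(long_pauses(log))  # only the 3927 ms pause is flagged
```

With the default 1000 ms threshold, only the second line in the sample is reported.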

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 10:21 PM, Tamar Fraenkel wrote:

> I have some more info: after a couple of hours of running, the problematic 
> node again hit 100% CPU and I had to reboot it. The last lines from the log 
> show it did GC:
> 
>  INFO [ScheduledTasks:1] 2012-03-06 10:28:00,880 GCInspector.java (line 122) 
> GC for Copy: 203 ms for 1 collections, 185983456 used; max is 513802240
>  INFO [ScheduledTasks:1] 2012-03-06 10:28:50,595 GCInspector.java (line 122) 
> GC for Copy: 3927 ms for 1 collections, 156572576 used; max is 513802240
>  INFO [ScheduledTasks:1] 2012-03-06 10:28:55,434 StatusLogger.java (line 50) 
> Pool Name                    Active   Pending   Blocked
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,298 StatusLogger.java (line 65) 
> ReadStage 2 2 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,499 StatusLogger.java (line 65) 
> RequestResponseStage  0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line 65) 
> ReadRepairStage   0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line 65) 
> MutationStage 0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line 65) 
> ReplicateOnWriteStage 0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line 65) 
> GossipStage   0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line 65) 
> AntiEntropyStage  0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line 65) 
> MigrationStage0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line 65) 
> StreamStage   0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line 65) 
> MemtablePostFlusher   0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line 65) 
> FlushWriter   0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line 65) 
> MiscStage 0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line 65) 
> InternalResponseStage 0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line 65) 
> HintedHandoff 0 0 0
>  INFO [ScheduledTasks:1] 2012-03-06 10:29:03,553 StatusLogger.java (line 69) 
> CompactionManager   n/a 0
> 
> Thanks,
> 
> Tamar Fraenkel 
> Senior Software Engineer, TOK Media 
> 
> 
> 
> ta...@tok-media.com
> Tel:   +972 2 6409736 
> Mob:  +972 54 8356490 
> Fax:   +972 2 5612956 
> 
> 
> 
> 
> 
> On Tue, Mar 6, 2012 at 9:12 AM, Tamar Fraenkel  wrote:
> Works..
> 
> But during the night my setup encountered a problem.
> I have two VMs on my cluster (running on VmWare ESXi).
> Each VM has 1GB of memory and two virtual disks of 16 GB 
> They are running on a small server with 4CPUs (2.66 GHz), and 4 GB memory 
> (together with two other VMs)
> I put cassandra data on the second disk of each machine.
> VMs are running Ubuntu 11.10 and cassandra 1.0.7.
> 
> I left them running overnight and this morning when I came:
> In one node cassandra was down, and the last thing in the system.log is:
> 
>  INFO [CompactionExecutor:150] 2012-03-06 00:55:04,821 CompactionTask.java 
> (line 113) Compacting 
> [SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1243-Data.db'),
>  
> SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1245-Data.db'),
>  
> SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1242-Data.db'),
>  
> SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1244-Data.db')]
>  INFO [CompactionExecutor:150] 2012-03-06 00:55:07,919 CompactionTask.java 
> (line 221) Compacted to 
> [/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1246-Data.db,].  
> 32,424,771 to 26,447,685 (~81% of original) bytes for 58,938 keys at 
> 8.144165MB/s.  Time: 3,097ms.
> 
> 
> The other node was using all its CPU and I had to restart it.
> After that, I can see that the last lines in its system.log are that the 
> other node is down...
> 
>  INFO [FlushWriter:142] 2012-03-06 00:55:02,418 Memtable.java (line 246) 
> Writing Memtable-tk_

Re: Old data coming alive after adding node

2012-03-06 Thread aaron morton
> After we added a fourth node, keeping RF=3, some old data appeared in the 
> database.
What CL are you working at ? (Should not matter too much with repair working, 
just asking)


> We don't run compact on the nodes explicitly as I understand that running 
> repair will trigger a
> major compaction. I'm not entirely sure if it does so, but in any case the 
> tombstones will be removed by a minor
> compaction.
In 0.6.x tombstones were only purged during a major / manual compaction. 
Purging during minor compaction came in during 0.7
https://github.com/apache/cassandra/blob/trunk/CHANGES.txt#L1467

> Can anyone think of any reason why the old data reappeared?
It sounds like you are doing things correctly. The complicating factor is that 
0.6 is very old. 


If I wanted to poke around some more, I would conduct reads at CL ONE against 
individual nodes and see whether they return the "deleted" data. This would 
help me understand if the tombstone is still out there. 

I would also poke around a lot in the logs to make sure repair was running as 
expected and completing. If you find anything suspicious post examples. 

Finally, I would ensure CL QUORUM was being used. 
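The QUORUM reasoning above can be made concrete with the usual overlap rule, R + W > RF. This is a sketch of the arithmetic, not Cassandra code:

```python
def quorum(rf):
    # QUORUM = floor(RF / 2) + 1
    return rf // 2 + 1

def strongly_consistent(read_cl, write_cl, rf):
    """True when every read overlaps every successful write: R + W > RF."""
    return read_cl + write_cl > rf

RF = 3
# QUORUM writes + QUORUM reads always overlap on at least one replica:
print(strongly_consistent(quorum(RF), quorum(RF), RF))  # True
# QUORUM writes + ONE reads give no such guarantee:
print(strongly_consistent(1, quorum(RF), RF))           # False
```

This is why confirming the "resurrected" data at QUORUM, as Stefan did, is the meaningful check, while CL.ONE reads can hit a replica that missed the delete.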

Hope that helps.


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 10:13 PM, Stefan Reek wrote:

> Hi,
> 
> We were running a 3-node cluster of cassandra 0.6.13 with RF=3.
> After we added a fourth node, keeping RF=3, some old data appeared in the 
> database.
> As far as I understand this can only happen if nodetool repair wasn't run for 
> more than GCGraceSeconds.
> Our GCGraceSeconds is set to the default of 10 days (864000 seconds).
> We have a scheduled cronjob to run repair once each week on every node, each 
> on a different day.
> I'm sure that none of the nodes ever skipped running a repair.
> We don't run compact on the nodes explicitly as I understand that running 
> repair will trigger a
> major compaction. I'm not entirely sure if it does so, but in any case the 
> tombstones will be removed by a minor
> compaction. So I expected that the reappearing data, which is a couple of 
> months old in some cases, was long gone
> by the time we added the node.
> 
> Can anyone think of any reason why the old data reappeared?
> 
> Stefan



Re: running two rings on the same subnet

2012-03-06 Thread Tamar Fraenkel
I have some more info: after a couple of hours of running, the problematic
node again hit 100% CPU and I had to reboot it. The last lines from the log
show it did GC:

 INFO [ScheduledTasks:1] 2012-03-06 10:28:00,880 GCInspector.java (line
122) GC for Copy: 203 ms for 1 collections, 185983456 used; max is 513802240
 INFO [ScheduledTasks:1] 2012-03-06 10:28:50,595 GCInspector.java (line
122) GC for Copy: 3927 ms for 1 collections, 156572576 used; max is
513802240
 INFO [ScheduledTasks:1] 2012-03-06 10:28:55,434 StatusLogger.java (line
50) Pool Name                    Active   Pending   Blocked
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,298 StatusLogger.java (line
65) ReadStage 2 2 0
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,499 StatusLogger.java (line
65) RequestResponseStage  0 0 0
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line
65) ReadRepairStage   0 0 0
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line
65) MutationStage 0 0 0
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line
65) ReplicateOnWriteStage 0 0 0
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,500 StatusLogger.java (line
65) GossipStage   0 0 0
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line
65) AntiEntropyStage  0 0 0
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line
65) MigrationStage0 0 0
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line
65) StreamStage   0 0 0
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,501 StatusLogger.java (line
65) MemtablePostFlusher   0 0 0
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line
65) FlushWriter   0 0 0
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line
65) MiscStage 0 0 0
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line
65) InternalResponseStage 0 0 0
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,502 StatusLogger.java (line
65) HintedHandoff 0 0 0
 INFO [ScheduledTasks:1] 2012-03-06 10:29:03,553 StatusLogger.java (line
69) CompactionManager   n/a 0

Thanks,

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Tue, Mar 6, 2012 at 9:12 AM, Tamar Fraenkel  wrote:

> Works..
>
> But during the night my setup encountered a problem.
> I have two VMs on my cluster (running on VmWare ESXi).
> Each VM has 1GB of memory and two virtual disks of 16 GB
> They are running on a small server with 4CPUs (2.66 GHz), and 4 GB memory
> (together with two other VMs)
> I put cassandra data on the second disk of each machine.
> VMs are running Ubuntu 11.10 and cassandra 1.0.7.
>
> I left them running overnight and this morning when I came:
> In one node cassandra was down, and the last thing in the system.log is:
>
>  INFO [CompactionExecutor:150] 2012-03-06 00:55:04,821
> CompactionTask.java (line 113) Compacting
> [SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1243-Data.db'),
> SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1245-Data.db'),
> SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1242-Data.db'),
> SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1244-Data.db')]
>  INFO [CompactionExecutor:150] 2012-03-06 00:55:07,919 CompactionTask.java
> (line 221) Compacted to
> [/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1246-Data.db,].
>  32,424,771 to 26,447,685 (~81% of original) bytes for 58,938 keys at
> 8.144165MB/s.  Time: 3,097ms.
>
>
> The other node was using all its CPU and I had to restart it.
> After that, I can see that the last lines in its system.log are that the
> other node is down...
>
>  INFO [FlushWriter:142] 2012-03-06 00:55:02,418 Memtable.java (line 246)
> Writing Memtable-tk_vertical_tag_story_indx@1365852701(1122169/25154556
> serialized/live bytes, 21173 ops)
>  INFO [FlushWriter:142] 2012-03-06 00:55:02,742 Memtable.java (line 283)
> Completed flushing
> /opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1244-Data.db (2075930
> bytes)
>  INFO [GossipTasks:1] 2012-03-06 08:02:18,584 Gossiper.java (line 818)
> InetAddress /10.0.0.31 is now dead.
>
> How can I trace why that happened?
> Also, I brought Cassandra up on both nodes. They both spent a long time
> reading commit logs, but now they seem to be running.
> Any idea how to debug or improve my setup?
> Thanks,
> Tamar
>
>
>
> *Tamar Fraenkel *

Re: Secondary indexes don't go away after metadata change

2012-03-06 Thread aaron morton
When the new node comes online, the history of schema changes is streamed to 
it. I've not looked at the code, but it could be that schema migrations are 
creating indexes that are then deleted from the schema but not from the DB 
itself.

Does that fit your scenario? When the new node comes online, does it log 
migrations being applied and then indexes being created?
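If stale index SSTables really are being left behind, they should be identifiable by filename. A sketch, under the assumption that secondary-index SSTables carry a `ParentCF.indexname` prefix (as in the 1.0.x on-disk layout); every filename below is made up:

```python
def orphaned_index_sstables(filenames, schema_indexes):
    """Return SSTable files whose 'CF.index' prefix is not a known index.

    Assumes names like 'Users.birthdate_idx-hc-12-Data.db'; plain CF
    sstables ('Users-hc-3-Data.db') have no dot before the version tag.
    """
    orphans = []
    for name in filenames:
        cf_part = name.split("-")[0]          # e.g. 'Users.birthdate_idx'
        if "." in cf_part and cf_part not in schema_indexes:
            orphans.append(name)
    return orphans

files = [
    "Users-hc-3-Data.db",                 # regular CF sstable: keep
    "Users.birthdate_idx-hc-12-Data.db",  # index still in the schema: keep
    "Users.old_email_idx-hc-7-Data.db",   # dropped index: orphan
]
print(orphaned_index_sstables(files, {"Users.birthdate_idx"}))
# ['Users.old_email_idx-hc-7-Data.db']
```

Comparing such a listing against `show schema` output would tell you whether the files on disk belong to indexes the cluster still knows about.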

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 10:56 AM, Frisch, Michael wrote:

> Thank you very much for your response.  It is true that the older, previously 
> existing nodes are not snapshotting the indexes that I had removed.  I’ll go 
> ahead and just delete those SSTables from the data directory.  They may be 
> around still because they were created back when we used 0.8.
>  
> The more troubling issue is with adding new nodes to the cluster though.  It 
> built indexes for column families that have had all indexes dropped weeks or 
> months in the past.  It also will snapshot the index SSTables that it 
> created.  The index files are non-empty as well, some are hundreds of 
> megabytes.
>  
> All nodes have the same schema, none list themselves as having the rows 
> indexed.  I cannot drop the indexes via the CLI either because it says that 
> they don’t exist.  It’s quite perplexing.
>  
> - Mike
>  
>  
> From: aaron morton [mailto:aa...@thelastpickle.com] 
> Sent: Monday, March 05, 2012 3:58 AM
> To: user@cassandra.apache.org
> Subject: Re: Secondary indexes don't go away after metadata change
>  
> The secondary index CF's are marked as no longer required / marked as 
> compacted. under 1.x they would then be deleted reasonably quickly, and 
> definitely deleted after a restart. 
>  
> Is there a zero length .Compacted file there ? 
>  
> Also, when adding a new node to the ring the new node will build indexes for 
> the ones that supposedly don’t exist any longer.  Is this supposed to happen? 
>  Would this have happened if I had deleted the old SSTables from the 
> previously existing nodes?
> Check you have a consistent schema using describe cluster in the CLI. And 
> check the schema is what you think it is using show schema. 
>  
> Another trick is to do a snapshot. Only the files in use are included the 
> snapshot. 
>  
> Hope that helps. 
>  
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 2/03/2012, at 2:53 AM, Frisch, Michael wrote:
> 
> 
> I have a few column families that I decided to get rid of the secondary 
> indexes on.  I see that there aren’t any new index SSTables being created, 
> but all of the old ones remain (some from as far back as September).  Is it 
> safe to just delete then when the node is offline?  Should I run clean-up or 
> scrub?
>  
> Also, when adding a new node to the ring the new node will build indexes for 
> the ones that supposedly don’t exist any longer.  Is this supposed to happen? 
>  Would this have happened if I had deleted the old SSTables from the 
> previously existing nodes?
>  
> The nodes in question have either been upgraded from v0.8.1 => v1.0.2 
> (scrubbed at this time) => v1.0.6 or from v1.0.2 => v1.0.6.  The secondary 
> index was dropped when the nodes were version 1.0.6.  The new node added was 
> also 1.0.6.
>  
> - Mike



Old data coming alive after adding node

2012-03-06 Thread Stefan Reek

Hi,

We were running a 3-node cluster of cassandra 0.6.13 with RF=3.
After we added a fourth node, keeping RF=3, some old data appeared in 
the database.
As far as I understand this can only happen if nodetool repair wasn't 
run for more than GCGraceSeconds.

Our GCGraceSeconds is set to the default of 10 days (864000 seconds).
We have a scheduled cronjob to run repair once each week on every node, 
each on a different day.

I'm sure that none of the nodes ever skipped running a repair.
We don't run compact on the nodes explicitly as I understand that 
running repair will trigger a
major compaction. I'm not entirely sure if it does so, but in any case 
the tombstones will be removed by a minor
compaction. So I expected that the reappearing data, which is a couple 
of months old in some cases, was long gone

by the time we added the node.
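The expectation above rests on simple arithmetic: every node must be repaired at least once per GCGraceSeconds window, or tombstones can be purged before all replicas have seen them. A sketch of that rule with the numbers from this message:

```python
GC_GRACE_SECONDS = 864_000        # 10 days, the default
WEEK_SECONDS = 7 * 24 * 3600      # per-node repair cron interval

def repair_schedule_safe(repair_interval_s, gc_grace_s=GC_GRACE_SECONDS):
    """Tombstones are only purged safely if every node is repaired
    within each GCGraceSeconds window."""
    return repair_interval_s <= gc_grace_s

print(repair_schedule_safe(WEEK_SECONDS))        # True: weekly < 10 days
# A fortnightly schedule would risk resurrecting deleted data:
print(repair_schedule_safe(14 * 24 * 3600))      # False
```

By this check the weekly schedule is fine, which is what makes the reappearance puzzling.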

Can anyone think of any reason why the old data reappeared?

Stefan


Re: Issue with nodetool clearsnapshot

2012-03-06 Thread aaron morton
> 1) Since you mentioned hard links, I would like to add that our data
> directory itself is a symlink. Could that be causing an issue?
Seems unlikely. 

> I restarted the node and it went about deleting the files and the disk space 
> has been released. Can this be done using nodetool, and without restarting ?

Under 0.8.x they are deleted when the files are no longer in use and the JVM 
GC frees all references. You can provoke this by triggering a JVM GC from 
JConsole or another JMX client. 

If there is not enough free disk space, a GC is forced and the space is reclaimed.  

Under 1.x file handles are counted and the files are quickly deleted. 
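The hard-link behaviour described here is ordinary filesystem semantics and easy to demonstrate. This is a generic POSIX sketch, not Cassandra code, and the filenames are made up:

```python
import os
import shutil
import tempfile

d = tempfile.mkdtemp()
data = os.path.join(d, "Standard1-g-7317-Data.db")
snap = os.path.join(d, "snapshot-Data.db")

with open(data, "wb") as f:
    f.write(b"sstable bytes")

os.link(data, snap)                        # a snapshot is a hard link: no data copied
links_with_snapshot = os.stat(data).st_nlink   # 2: two names, one inode

os.remove(snap)                            # clearing the snapshot drops one name only
links_after_clear = os.stat(data).st_nlink     # 1: the data file is untouched
with open(data, "rb") as f:
    still_intact = f.read() == b"sstable bytes"

print(links_with_snapshot, links_after_clear, still_intact)
shutil.rmtree(d)
```

This is why clearing a snapshot cannot "move" or duplicate data: deleting one name for an inode never touches the bytes while another name still references them.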

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/03/2012, at 7:38 AM, B R wrote:

> Hi Aaron,
> 
> 1) Since you mentioned hard links, I would like to add that our data
> directory itself is a symlink. Could that be causing an issue?
> 
> 2) Yes, there are 0-byte files with the same numbers
> in Keyspace1 directory
> 0 Mar  4 01:33 Standard1-g-7317-Compacted
> 0 Mar  3 22:58 Standard1-g-7968-Compacted
> 0 Mar  3 23:10 Standard1-g-8778-Compacted
> 0 Mar  3 23:47 Standard1-g-8782-Compacted
> ...
> 
> I restarted the node and it went about deleting the files and the disk space 
> has been released. Can this be done using nodetool, and without restarting ?
> 
> Thanks.
> 
> On Mon, Mar 5, 2012 at 10:59 PM, aaron morton  wrote:
>> It seems that instead of removing the snapshot, clearsnapshot moved the data 
>> files from the snapshot directory to the parent directory and the size of 
>> the data for that keyspace has doubled.
> That is not possible; there is only code there to delete files in the 
> snapshot. 
> 
> Note that in the snapshot are hard links to the files in the data dir. 
> Deleting / clearing the snapshot will not delete the files from the data dir 
> if they are still in use. 
> 
>>  Many of the files are looking like duplicates.
>> 
>> in Keyspace1 directory
>> 156987786084 Jan 21 03:18 Standard1-g-7317-Data.db
>> 156987786084 Mar  4 01:33 Standard1-g-8850-Data.db
> Under 0.8.x files are not immediately deleted. Did the data directory contain 
> zero-size -Compacted files with the same numbers?
>   
> Cheers
> 
> 
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 5/03/2012, at 11:50 PM, B R wrote:
> 
>> Version 0.8.9
>> 
>> We run a 2 node cluster with RF=2. We ran a scrub and after that ran the 
>> clearsnapshot to remove the backup snapshot created by scrub. It seems that 
>> instead of removing the snapshot, clearsnapshot moved the data files from 
>> the snapshot directory to the parent directory and the size of the data for 
>> that keyspace has doubled. Many of the files are looking like duplicates.
>> 
>> in Keyspace1 directory
>> 156987786084 Jan 21 03:18 Standard1-g-7317-Data.db
>> 156987786084 Mar  4 01:33 Standard1-g-8850-Data.db
>> 118211555728 Jan 31 12:50 Standard1-g-7968-Data.db
>> 118211555728 Mar  3 22:58 Standard1-g-8840-Data.db
>> 116902342895 Feb 25 02:04 Standard1-g-8832-Data.db
>> 116902342895 Mar  3 22:10 Standard1-g-8836-Data.db
>> 93788425710 Feb 21 04:20 Standard1-g-8791-Data.db
>> 93788425710 Mar  4 00:29 Standard1-g-8845-Data.db
>> .
>> 
>> Even though the nodetool ring command shows the correct data size for the 
>> node, the du -sh on the keyspace directory gives double the size.
>> 
>> Can you guide us to proceed from this situation ?
>> 
>> Thanks.
> 
> 



Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Jeremy Hanna
you may be running into this - 
https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it 
really affects the execution of the job itself though.
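Separately from that ticket, the batch-size versus timeout advice quoted further down reduces to simple arithmetic; a sketch with illustrative numbers, not the thread's actual settings:

```python
def batch_fits_timeout(batch_size, ms_per_row, rpc_timeout_ms):
    """A range-scan batch must complete within rpc_timeout_in_ms,
    or the client sees a TimedOutException."""
    return batch_size * ms_per_row < rpc_timeout_ms

# With a hypothetical 1024-row batch and a 30,000 ms timeout, anything
# slower than ~29 ms/row times out:
print(batch_fits_timeout(1024, 10, 30_000))   # True
print(batch_fits_timeout(1024, 40, 30_000))   # False: shrink the batch
print(batch_fits_timeout(256, 40, 30_000))    # True
```

Hence the standard advice in this thread: lower `cassandra.range.batch.size` and/or raise `rpc_timeout_in_ms` until each batch reliably fits.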

On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote:

> Hi,
> 
> I was recently trying a Hadoop job + cassandra-all 0.8.10 again, and
> the timeouts I get are not because Cassandra can't handle the requests.
> I've noticed several tasks that show progress of several thousand
> percent; it seems they are looping over their range of keys. I've run
> the job with debug enabled and the ranges look OK, see
> http://pastebin.com/stVsFzLM
> 
> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
> number of mappers the job creates:
> 0.8.7: 4680
> 0.8.10: 595
> 
> Task   Complete
> task_201202281457_2027_m_41   9076.81%
> task_201202281457_2027_m_73   9639.04%
> task_201202281457_2027_m_000105   10538.60%
> task_201202281457_2027_m_000108   9364.17%
> 
> None of this happens with cassandra-all 0.8.7.
> 
> Regards,
> P.
> 
> 
> 
> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto  
> wrote:
>> I'll alter these settings and will let you know.
>> 
>> Regards,
>> P.
>> 
>> On Tue, Feb 28, 2012 at 09:23, aaron morton  wrote:
>>> Have you tried lowering the  batch size and increasing the time out? Even
>>> just to get it to work.
>>> 
>>> If you get a TimedOutException it means CL number of servers did not respond
>>> in time.
>>> 
>>> Cheers
>>> 
>>> -
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
>>> 
>>> Hi aaron,
>>> 
>>> this is our current settings:
>>> 
>>> <property>
>>>   <name>cassandra.range.batch.size</name>
>>>   <value>1024</value>
>>> </property>
>>>
>>> <property>
>>>   <name>cassandra.input.split.size</name>
>>>   <value>16384</value>
>>> </property>
>>> 
>>> rpc_timeout_in_ms: 3
>>> 
>>> Regards,
>>> P.
>>> 
>>> On Mon, Feb 27, 2012 at 21:54, aaron morton  wrote:
>>> 
>>> What settings do you have for cassandra.range.batch.size
>>> 
>>> and rpc_timeout_in_ms  ? Have you tried reducing the first and/or increasing
>>> 
>>> the second ?
>>> 
>>> 
>>> Cheers
>>> 
>>> 
>>> -
>>> 
>>> Aaron Morton
>>> 
>>> Freelance Developer
>>> 
>>> @aaronmorton
>>> 
>>> http://www.thelastpickle.com
>>> 
>>> 
>>> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
>>> 
>>> 
>>> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo  wrote:
>>> 
>>> 
>>> Did you see the notes here?
>>> 
>>> 
>>> 
>>> I'm not sure what you mean by the notes.
>>> 
>>> 
>>> I'm using the mapred.* settings suggested there:
>>> 
>>> <property>
>>> <name>mapred.max.tracker.failures</name>
>>> <value>20</value>
>>> </property>
>>> 
>>> <property>
>>> <name>mapred.map.max.attempts</name>
>>> <value>20</value>
>>> </property>
>>> 
>>> <property>
>>> <name>mapred.reduce.max.attempts</name>
>>> <value>20</value>
>>> </property>
>>> But I still see the timeouts that I didn't see with cassandra-all 0.8.7.
>>> 
>>> 
>>> P.
>>> 
>>> 
>>> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
>>> 
>>> 
>>> 
>>> 



Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Florent Lefillâtre
Hi, I had the same problem on hadoop 0.20.2 and cassandra 1.0.5.
In my case the token range split failed.
I commented out the line 'rpc_address: 0.0.0.0' in cassandra.yaml.
Maybe check whether there are configuration changes between 0.8.7 and 0.8.10.
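A concrete sketch of this workaround, assuming an otherwise default cassandra.yaml (the comment wording below is mine, not from the thread):

```yaml
# cassandra.yaml fragment: the wildcard bind address is commented out, so
# clients such as Hadoop tasks are handed a concrete, connectable RPC
# address instead of 0.0.0.0.
# rpc_address: 0.0.0.0
```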


On 6 March 2012 at 09:32, Patrik Modesto  wrote:

> Hi,
>
> I was recently trying a Hadoop job + cassandra-all 0.8.10 again, and the
> timeouts I get are not because Cassandra can't handle the
> requests. I've noticed that several tasks show progress of
> several thousand percent. It seems they are looping over their key
> ranges. I've run the job with debug enabled and the ranges look OK, see
> http://pastebin.com/stVsFzLM
>
> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
> number of mappers the job creates:
> 0.8.7: 4680
> 0.8.10: 595
>
> Task   Complete
> task_201202281457_2027_m_41 9076.81%
> task_201202281457_2027_m_73 9639.04%
> task_201202281457_2027_m_000105 10538.60%
> task_201202281457_2027_m_000108 9364.17%
>
> None of this happens with cassandra-all 0.8.7.
>
> Regards,
> P.
>
>
>
> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto  wrote:
> > I'll alter these settings and will let you know.
> >
> > Regards,
> > P.
> >
> >> On Tue, Feb 28, 2012 at 09:23, aaron morton  wrote:
> >> Have you tried lowering the batch size and increasing the timeout? Even
> >> just to get it to work.
> >>
> >> If you get a TimedOutException it means CL number of servers did not
> >> respond in time.
> >>
> >> Cheers
> >>
> >> -
> >> Aaron Morton
> >> Freelance Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >>
> >> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
> >>
> >> Hi aaron,
> >>
> >> these are our current settings:
> >>
> >>  <property>
> >>  <name>cassandra.range.batch.size</name>
> >>  <value>1024</value>
> >>  </property>
> >>
> >>  <property>
> >>  <name>cassandra.input.split.size</name>
> >>  <value>16384</value>
> >>  </property>
> >>
> >> rpc_timeout_in_ms: 3
> >>
> >> Regards,
> >> P.
> >>
> >> On Mon, Feb 27, 2012 at 21:54, aaron morton  wrote:
> >>
> >> What settings do you have for cassandra.range.batch.size
> >>
> >> and rpc_timeout_in_ms? Have you tried reducing the first and/or increasing
> >>
> >> the second?
> >>
> >>
> >> Cheers
> >>
> >>
> >> -
> >>
> >> Aaron Morton
> >>
> >> Freelance Developer
> >>
> >> @aaronmorton
> >>
> >> http://www.thelastpickle.com
> >>
> >>
> >> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
> >>
> >>
> >> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo  wrote:
> >>
> >>
> >> Did you see the notes here?
> >>
> >>
> >>
> >> I'm not sure what you mean by the notes.
> >>
> >>
> >> I'm using the mapred.* settings suggested there:
> >>
> >> <property>
> >> <name>mapred.max.tracker.failures</name>
> >> <value>20</value>
> >> </property>
> >>
> >> <property>
> >> <name>mapred.map.max.attempts</name>
> >> <value>20</value>
> >> </property>
> >>
> >> <property>
> >> <name>mapred.reduce.max.attempts</name>
> >> <value>20</value>
> >> </property>
> >>
> >>
> >> But I still see the timeouts that I didn't see with cassandra-all 0.8.7.
> >>
> >>
> >> P.
> >>
> >>
> >> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
> >>
> >>
> >>
> >>
>


Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Patrik Modesto
Hi,

I was recently trying a Hadoop job + cassandra-all 0.8.10 again, and the
timeouts I get are not because Cassandra can't handle the
requests. I've noticed that several tasks show progress of
several thousand percent. It seems they are looping over their key
ranges. I've run the job with debug enabled and the ranges look OK, see
http://pastebin.com/stVsFzLM

Another difference between cassandra-all 0.8.7 and 0.8.10 is the
number of mappers the job creates:
0.8.7: 4680
0.8.10: 595

Task   Complete
task_201202281457_2027_m_41 9076.81%
task_201202281457_2027_m_73 9639.04%
task_201202281457_2027_m_000105 10538.60%
task_201202281457_2027_m_000108 9364.17%

None of this happens with cassandra-all 0.8.7.

Regards,
P.



On Tue, Feb 28, 2012 at 12:29, Patrik Modesto  wrote:
> I'll alter these settings and will let you know.
>
> Regards,
> P.
>
> On Tue, Feb 28, 2012 at 09:23, aaron morton  wrote:
>> Have you tried lowering the batch size and increasing the timeout? Even
>> just to get it to work.
>>
>> If you get a TimedOutException it means CL number of servers did not respond
>> in time.
>>
>> Cheers
>>
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
>>
>> Hi aaron,
>>
>> these are our current settings:
>>
>>  <property>
>>  <name>cassandra.range.batch.size</name>
>>  <value>1024</value>
>>  </property>
>>
>>  <property>
>>  <name>cassandra.input.split.size</name>
>>  <value>16384</value>
>>  </property>
>>
>> rpc_timeout_in_ms: 3
>>
>> Regards,
>> P.
>>
>> On Mon, Feb 27, 2012 at 21:54, aaron morton  wrote:
>>
>> What settings do you have for cassandra.range.batch.size
>>
>> and rpc_timeout_in_ms? Have you tried reducing the first and/or increasing
>>
>> the second?
>>
>>
>> Cheers
>>
>>
>> -
>>
>> Aaron Morton
>>
>> Freelance Developer
>>
>> @aaronmorton
>>
>> http://www.thelastpickle.com
>>
>>
>> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
>>
>>
>> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo  wrote:
>>
>>
>> Did you see the notes here?
>>
>>
>>
>> I'm not sure what you mean by the notes.
>>
>>
>> I'm using the mapred.* settings suggested there:
>>
>> <property>
>> <name>mapred.max.tracker.failures</name>
>> <value>20</value>
>> </property>
>>
>> <property>
>> <name>mapred.map.max.attempts</name>
>> <value>20</value>
>> </property>
>>
>> <property>
>> <name>mapred.reduce.max.attempts</name>
>> <value>20</value>
>> </property>
>>
>>
>> But I still see the timeouts that I didn't see with cassandra-all 0.8.7.
>>
>>
>> P.
>>
>>
>> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
>>
>>
>>
>>
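To make the advice in this thread concrete: Aaron's suggestion to lower the batch size and raise the timeout could be tried roughly as follows. The values below are illustrative starting points of mine, not recommendations from the thread:

```xml
<!-- Hadoop job configuration: fetch fewer rows per get_range_slices call,
     so each call is more likely to finish within Cassandra's RPC timeout -->
<property>
  <name>cassandra.range.batch.size</name>
  <value>256</value>
</property>
```

together with a larger rpc_timeout_in_ms in cassandra.yaml on the Cassandra nodes.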