Timeout Errors while running Hadoop over Cassandra

2011-01-12 Thread Jairam Chandar

Hi folks,

We have a Cassandra 0.6.6 cluster running in production. We want to run
Hadoop (version 0.20.2) jobs over this cluster in order to generate
reports.
I modified the word_count example in the contrib folder of the Cassandra
distribution. While the program runs fine for small datasets (on the
order of 100-200 MB) on small clusters (2 machines), it starts to give
errors when run on a bigger cluster (5 machines) with a much larger
dataset (400 GB). Here is the error that we get -

java.lang.RuntimeException: TimedOutException()
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:186)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:236)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:104)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:98)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: TimedOutException()
    at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11094)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:628)
    at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:602)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:164)
    ... 11 more




I came across this page on the Cassandra wiki -
http://wiki.apache.org/cassandra/HadoopSupport and tried modifying the
ulimit and changing batch sizes. These did not help. Though the number of
successful map tasks increased, the job eventually fails since the total
number of map tasks is huge.

Any idea what could be causing this? The program we are running is a very
slight modification of the word_count example with respect to reading from
Cassandra. The only changes are the specific keyspace, column family and
columns. The rest of the code for reading is the same as the word_count
example in the source code for Cassandra 0.6.6.

Thanks and regards,
Jairam Chandar

Re: Timeout Errors while running Hadoop over Cassandra

2011-01-12 Thread Aaron Morton

What's happening in the Cassandra server logs when you get these errors?

Reading through the 0.6.6 Hadoop code, it looks like it creates a Thrift
client with an infinite timeout. So it may be an internode timeout, which
is set in storage-conf.xml.

Aaron

On 13 Jan, 2011, at 07:40 AM, Jairam Chandar wrote:

> Hi folks,
>
> We have a Cassandra 0.6.6 cluster running in production. We want to run
> Hadoop (version 0.20.2) jobs over this cluster in order to generate
> reports. I modified the word_count example in the contrib folder of the
> cassandra distribution. While the program is running fine for small
> datasets (in the order of 100-200 MB) on small clusters (2 machines), it
> starts to give errors while trying to run on a bigger cluster (5
> machines) with much larger dataset (400 GB). Here is the error that we
> get -
>
> java.lang.RuntimeException: TimedOutException()
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:186)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:236)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:104)
>   at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>   at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:98)
>   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>   at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: TimedOutException()
>   at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11094)
>   at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:628)
>   at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:602)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:164)
>   ... 11 more
>
> I came across this page on the Cassandra wiki -
> http://wiki.apache.org/cassandra/HadoopSupport and tried modifying the
> ulimit and changing batch sizes. These did not help. Though the number of
> successful map tasks increased, it eventually fails since the total
> number of map tasks is huge.
>
> Any idea on what could be causing this? The program we are running is a
> very slight modification of the word_count example with respect to
> reading from Cassandra. The only change being specific keyspace,
> columnfamily and columns. The rest of the code for reading is the same as
> the word_count example in the source code for Cassandra 0.6.6.
>
> Thanks and regards,
> Jairam Chandar

Re: Timeout Errors while running Hadoop over Cassandra

2011-01-12 Thread mck

On Wed, 2011-01-12 at 18:40 +, Jairam Chandar wrote:
> Caused by: TimedOutException()

What is the exception in the cassandra logs?

~mck

-- 
"Don't use Outlook. Outlook is really just a security hole with a small
e-mail client attached to it." Brian Trosko | www.semb.wever.org |
www.sesat.no | www.finn.no | http://xss-http-filter.sf.net



Re: Timeout Errors while running Hadoop over Cassandra

2011-01-12 Thread mck

On Wed, 2011-01-12 at 23:04 +0100, mck wrote:
> > Caused by: TimedOutException()
> 
> What is the exception in the cassandra logs? 

Or have you tried increasing rpc_timeout_in_ms?

~mck

-- 
"When there is no enemy within, the enemies outside can't hurt you."
African proverb | www.semb.wever.org | www.sesat.no | www.finn.no |
http://xss-http-filter.sf.net



Re: Timeout Errors while running Hadoop over Cassandra

2011-01-13 Thread Jeremy Hanna

On Jan 12, 2011, at 12:40 PM, Jairam Chandar wrote:

> Hi folks,
> 
> We have a Cassandra 0.6.6 cluster running in production. We want to run 
> Hadoop (version 0.20.2) jobs over this cluster in order to generate reports. 
> I modified the word_count example in the contrib folder of the cassandra 
> distribution. While the program is running fine for small datasets (in the 
> order of 100-200 MB) on small clusters (2 machines), it starts to give errors 
> while trying to run on a bigger cluster (5 machines) with much larger dataset 
> (400 GB). Here is the error that we get - 
> 
> java.lang.RuntimeException: TimedOutException()
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:186)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:236)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:104)
>   at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>   at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:98)
>   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>   at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: TimedOutException()
>   at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11094)
>   at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:628)
>   at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:602)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:164)
>   ... 11 more
> 

I wonder if messing with RpcTimeoutInMillis in storage-conf.xml would help. 

> 
> 
> 
> I came across this page on the Cassandra wiki - 
> http://wiki.apache.org/cassandra/HadoopSupport and tried modifying the ulimit 
> and changing batch sizes. These did not help. Though the number of successful 
> map tasks increased, it eventually fails since the total number of map tasks 
> is huge. 
> 
> Any idea on what could be causing this? The program we are running is a very 
> slight modification of the word_count example with respect to reading from 
> Cassandra. The only change being specific keyspace, columnfamily and columns. 
> The rest of the code for reading is the same as the word_count example in the 
> source code for Cassandra 0.6.6.
> 
> Thanks and regards,
> Jairam Chandar

Re: Timeout Errors while running Hadoop over Cassandra

2011-01-14 Thread Jairam Chandar

Strangely, the Cassandra logs show no errors at the time of failure.
Changing RpcTimeoutInMillis seemed to help. Though it slowed the job down
considerably, the job now seems to finish with the timeout set to
1 minute. Unfortunately, I cannot be sure it will continue to work if
the data grows further. Hopefully we will be upgrading to the recently
released final version of 0.7.0.
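
For anyone wanting to try the same change, a sketch of what it might look
like in storage-conf.xml, assuming the 0.6-style XML configuration discussed
in this thread. The value 60000 is simply 1 minute expressed in
milliseconds; it is an illustration, not a recommended setting.

```xml
<!-- storage-conf.xml: how long a node waits on an RPC before timing out -->
<RpcTimeoutInMillis>60000</RpcTimeoutInMillis>
```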

Thanks for all the help and suggestions.

Warm regards,
Jairam Chandar

Re: Timeout Errors while running Hadoop over Cassandra

2011-01-19 Thread Jairam Chandar

I was able to work around this problem by modifying the
ColumnFamilyRecordReader class from the org.apache.cassandra.hadoop package.
Since the errors were TimedOutExceptions, I added sleep-and-retry logic
around the following call in the RowIterator.maybeInit() function:

    rows = client.get_range_slices(keyspace,
                                   new ColumnParent(cfName),
                                   predicate,
                                   keyRange,
                                   ConsistencyLevel.ONE);

And it works. :)

Check out http://pastebin.com/FxV4Gw5U for the modified maybeInit()
function. Please note that I also made a slight modification to
ConfigHelper to pass in the sleep time and the maximum retry count. The
ConfigHelper.getRetryCount(conf) and ConfigHelper.getSleepTime(conf)
methods are not part of the original ConfigHelper.
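
In case the pastebin link is unavailable, here is a minimal, self-contained
sketch of the sleep-and-retry idea. It is not the code from the link. The
RetrySketch class, the callWithRetry method and its parameter names are
hypothetical, and the local TimedOutException is only a stand-in for
org.apache.cassandra.thrift.TimedOutException so that the sketch compiles
without the Cassandra jars.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for org.apache.cassandra.thrift.TimedOutException.
class TimedOutException extends Exception {}

class RetrySketch {
    /**
     * Runs the call and, whenever it times out, sleeps and tries again.
     * maxRetries and sleepMillis play the role of the values read through
     * the modified ConfigHelper; both names are assumptions.
     */
    static <T> T callWithRetry(Callable<T> call, int maxRetries, long sleepMillis) {
        int retries = 0;
        while (true) {
            try {
                return call.call();
            } catch (TimedOutException e) {
                if (retries >= maxRetries) {
                    // Out of retries: fail the same way the unmodified reader does.
                    throw new RuntimeException(e);
                }
                retries++;
                try {
                    Thread.sleep(sleepMillis); // give the cluster time to recover
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
            } catch (Exception e) {
                // Any other failure is not retried.
                throw new RuntimeException(e);
            }
        }
    }
}
```

Inside maybeInit(), the get_range_slices call would then be the body of the
Callable. The lambda syntax needs Java 8 or later; on older JVMs an
anonymous Callable does the same job.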

Hope this helps anyone facing similar problems.

Regards,
Jairam
