Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "HadoopSupport" page has been changed by jeremyhanna:
https://wiki.apache.org/cassandra/HadoopSupport?action=diff&rev1=58&rev2=59

  If you are running into timeout exceptions, you might need to tweak one or 
both of these settings:
  
   * Each input split is divided into sequential batches of rows requested at a 
time from Cassandra.  This is the '''cassandra.range.batch.size''' property and 
it defaults to 4096.  If you are experiencing timeouts, you might first try to 
reduce the batch size so that it can more easily complete the request within 
the timeout.  Set it either in your Hadoop configuration or with 
`org.apache.cassandra.hadoop.ConfigHelper.setRangeBatchSize`.
-  * Starting in Cassandra 1.2, there is a range-request-specific timeout called 
'''range_request_timeout_in_ms''' in the cassandra.yaml.  Hadoop will request 
data in sequential batches and the request has to complete within this timeout. 
 Prior to Cassandra 1.2, you can set the general '''rpc_timeout_in_ms''' 
higher, which affects timeouts for reads, writes, and truncate operations in 
addition to range requests.
+  * Starting in Cassandra 1.2, there is a range-request-specific timeout called 
'''range_request_timeout_in_ms''' in the cassandra.yaml.  Hadoop requests data 
in sequential batches and each request has to complete within this timeout.  
Prior to Cassandra 1.2, you can set the general '''rpc_timeout_in_ms''' 
higher, which affects timeouts for reads, writes, and truncate operations in 
addition to range requests.
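
  As a rough sketch of the first setting above, the batch size could be lowered 
in the Hadoop job configuration with a property entry like the following.  The 
value 1024 is only an illustrative assumption (the page gives the default as 
4096); a suitable value depends on row size and cluster load.

```xml
<!-- Illustrative value only: request fewer rows per batch so that each
     range request is more likely to finish within the timeout. -->
<property>
  <name>cassandra.range.batch.size</name>
  <value>1024</value>
</property>
```

  The second setting lives in cassandra.yaml on each Cassandra node rather than 
in the Hadoop configuration, as a single line of the form 
`range_request_timeout_in_ms: <milliseconds>`.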
  
  If you still see timeout exceptions with resultant failed jobs and/or 
blacklisted tasktrackers, there are settings that can give Cassandra more 
latitude before failing the jobs.  An example of usage (in either the job 
configuration or tasktracker mapred-site.xml):
  
