Does it make sense to add some kind of throttle capability on the 
ColumnFamilyRecordReader for Hadoop?

If I have 60 or so Map tasks running at the same time when the cluster is
already heavily loaded with OLTP operations, I can get some decreased on-line
performance that may not be acceptable.  (I'm loading an 8 node cluster with
2000 TPS.)  By default my cluster of 8 nodes (which are also the Hadoop
JobTracker nodes) has 8 Map tasks per node making the get_range_slices call,
based on what the ColumnFamilyInputFormat has calculated from my token ranges.
I can increase the inputSplitSize (ConfigHelper.setInputSplitSize()) so that
there is only one Map task per node, and this helps quite a bit.

But is it reasonable to provide a configurable sleep that waits between
smaller range queries?  That would stretch out the Map time and let the OLTP
processing be less affected.
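
To make the idea concrete, here is a rough sketch of the kind of throttle I
have in mind. It is not the actual ColumnFamilyRecordReader code: the class
ThrottledBatchReader, the BatchSource interface and the sleepMillis setting
are all hypothetical names, with BatchSource standing in for one small
get_range_slices call. The reader fetches one batch at a time and sleeps for a
configurable interval before every fetch after the first.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch: iterates rows batch by batch, pausing between fetches. */
class ThrottledBatchReader<T> implements Iterator<T> {

    /** Stand-in for one small range query; an empty list means no more data. */
    interface BatchSource<T> {
        List<T> nextBatch();
    }

    private final BatchSource<T> source;
    private final long sleepMillis;   // assumed to come from job configuration
    private Iterator<T> current;      // rows of the batch being consumed
    private boolean exhausted;
    private int batchesFetched;
    private int pauses;

    ThrottledBatchReader(BatchSource<T> source, long sleepMillis) {
        this.source = source;
        this.sleepMillis = sleepMillis;
    }

    @Override
    public boolean hasNext() {
        // Fetch further batches until one has rows or the source runs dry.
        while (!exhausted && (current == null || !current.hasNext())) {
            if (batchesFetched > 0 && sleepMillis > 0) {
                pause(); // throttle: wait before every fetch except the first
            }
            List<T> batch = source.nextBatch();
            batchesFetched++;
            if (batch.isEmpty()) {
                exhausted = true;
            } else {
                current = batch.iterator();
            }
        }
        return current != null && current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    int batchesFetched() { return batchesFetched; }

    int pauses() { return pauses; }

    private void pause() {
        try {
            Thread.sleep(sleepMillis);
            pauses++;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new RuntimeException("interrupted while throttling", e);
        }
    }
}
```

With a sleep of 0 this behaves like an unthrottled reader, so the default
behaviour would not change unless someone sets the value.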


--Michael

