Re: Different time ranges for different cfs when using TableInputFormat

2015-03-04 Thread Nick Dimiduk
Have a look at the versions of TableMapReduceUtil#initTableMapperJob that
take a List<Scan>. Does that provide what you're looking for?
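
For illustration, a rough sketch of what a multi-Scan setup could look like.
The class name TwoRangeJob, the mapper, the job name, and the two timestamp
parameters are made up for this example; the table and family names come from
the question below. One caveat worth checking: as far as I know each Scan
produces its own input splits, so the cf1/cf2 cells and the cf3 cells of one
row would reach different map() calls and would still need to be joined by
row key, for example in a reducer.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;

public class TwoRangeJob {

    // Hypothetical mapper: emits the row key and the number of cells returned.
    // Real logic would emit whatever the reducer needs to merge per row.
    public static class CellCountMapper
            extends TableMapper<ImmutableBytesWritable, IntWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            context.write(row, new IntWritable(value.size()));
        }
    }

    // Builds a Scan over the given families and tags it with the table name,
    // which the multi-Scan input format reads from a Scan attribute.
    static Scan scanFor(String table, String... families) {
        Scan scan = new Scan();
        for (String family : families) {
            scan.addFamily(Bytes.toBytes(family));
        }
        scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(table));
        return scan;
    }

    // yesterdayStart and todayStart are epoch milliseconds chosen by the caller.
    public static Job buildJob(long yesterdayStart, long todayStart) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "two-time-ranges");
        job.setJarByClass(TwoRangeJob.class);

        // Scan 1: cf1 and cf2 with the default (unbounded) time range,
        // so the most recent version of each column is returned.
        Scan recent = scanFor("X", "cf1", "cf2");

        // Scan 2: cf3 restricted to [yesterdayStart, todayStart).
        Scan yesterday = scanFor("X", "cf3");
        yesterday.setTimeRange(yesterdayStart, todayStart);

        List<Scan> scans = new ArrayList<>();
        scans.add(recent);
        scans.add(yesterday);

        TableMapReduceUtil.initTableMapperJob(
                scans, CellCountMapper.class,
                ImmutableBytesWritable.class, IntWritable.class, job);
        return job;
    }
}
```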

-n

On Wed, Mar 4, 2015 at 6:05 AM, Dave Latham lat...@davelink.net wrote:

 That's not possible with HBase today.  The simplest thing may be to set
 your Scan time range to include both today's and yesterday's data and then
 filter down to only the data you want inside your map task.  Other
 possibilities would be creating a custom filter to do the filtering on the
 server side or even changing your input format or map task to run two
 concurrent scans with different families/time ranges and merging the
 results.

 Being able to specify different time ranges for different column families
 is something I'd like to do as well.  Perhaps we'll get that into HBase at
 some point.

 Dave

 On Tue, Mar 3, 2015 at 5:23 PM, Felipe Sodré Silva fso...@gmail.com
 wrote:

  When using TableInputFormat to make HBase data available to map/reduce
  jobs we can use the settings SCAN_TIMERANGE_START and
  SCAN_TIMERANGE_END to specify a time range during scan.
  Is it possible to somehow have different time ranges for different
  column families?
 
  This is my problem:
  I have table X with column families cf1, cf2 and cf3. I want to run a
  map/reduce job on it using the most recent versions of columns in cf1
  and cf2, but I want to use yesterday's data from cf3. Is this
  possible?
 
  Felipe
 



Re: Different time ranges for different cfs when using TableInputFormat

2015-03-04 Thread Dave Latham
That's not possible with HBase today.  The simplest thing may be to set
your Scan time range to include both today's and yesterday's data and then
filter down to only the data you want inside your map task.  Other
possibilities would be creating a custom filter to do the filtering on the
server side or even changing your input format or map task to run two
concurrent scans with different families/time ranges and merging the results.
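
As a rough illustration of the first option (filtering inside the map task),
here is a sketch of the selection logic in plain Java with no HBase
dependency. VersionedCell and Range are hypothetical stand-ins, not HBase
classes; in a real mapper the cells would come from the Result handed to
map(). It assumes the Scan was configured to return more than one version per
column, since with only the newest version returned, a cf3 value written
yesterday and overwritten today would never reach the mapper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FamilyTimeRangeFilter {

    /** Hypothetical stand-in for one versioned cell of a row. */
    static final class VersionedCell {
        final String family;
        final String qualifier;
        final long timestamp;
        final String value;

        VersionedCell(String family, String qualifier, long timestamp, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Half-open timestamp range [min, max). */
    static final class Range {
        final long min;
        final long max;

        Range(long min, long max) {
            this.min = min;
            this.max = max;
        }

        boolean contains(long ts) {
            return ts >= min && ts < max;
        }
    }

    /**
     * Keeps the newest cell per column. Families listed in rangeByFamily only
     * consider cells whose timestamp falls inside that family's range;
     * families without an entry are unrestricted.
     */
    static List<VersionedCell> newestPerColumn(List<VersionedCell> cells,
                                               Map<String, Range> rangeByFamily) {
        Map<String, VersionedCell> newest = new LinkedHashMap<>();
        for (VersionedCell cell : cells) {
            Range range = rangeByFamily.get(cell.family);
            if (range != null && !range.contains(cell.timestamp)) {
                continue; // outside the range required for this family
            }
            String column = cell.family + ":" + cell.qualifier;
            VersionedCell best = newest.get(column);
            if (best == null || cell.timestamp > best.timestamp) {
                newest.put(column, cell);
            }
        }
        return new ArrayList<>(newest.values());
    }
}
```

For Felipe's case the map would hold a single entry, cf3 mapped to
yesterday's range, while cf1 and cf2 stay unrestricted.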

Being able to specify different time ranges for different column families
is something I'd like to do as well.  Perhaps we'll get that into HBase at
some point.

Dave

On Tue, Mar 3, 2015 at 5:23 PM, Felipe Sodré Silva fso...@gmail.com wrote:

 When using TableInputFormat to make HBase data available to map/reduce
 jobs we can use the settings SCAN_TIMERANGE_START and
 SCAN_TIMERANGE_END to specify a time range during scan.
 Is it possible to somehow have different time ranges for different
 column families?

 This is my problem:
 I have table X with column families cf1, cf2 and cf3. I want to run a
 map/reduce job on it using the most recent versions of columns in cf1
 and cf2, but I want to use yesterday's data from cf3. Is this
 possible?

 Felipe