Have a look at the versions of TableMapReduceUtil#initTableMapperJob that
take a List<Scan>. Does that provide what you're looking for?
-n
On Wed, Mar 4, 2015 at 6:05 AM, Dave Latham lat...@davelink.net wrote:
That's not possible with HBase today. The simplest thing may be to set
your Scan time range to include both today's and yesterday's data and then
filter down to only the data you want inside your map task. Other
possibilities would be creating a custom filter to do the filtering on the
server side or even changing your input format or map task to run two
concurrent scans with different families/time ranges and merging the
results.
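A minimal sketch of the first option (filtering inside the map task), written
in plain Java with no HBase dependency so the selection logic stands on its
own. SimpleCell, TimeRangeFilterSketch and select are hypothetical names made
up for illustration, not HBase classes; in a real TableMapper the same loop
would run over the cells of the Result passed to map(). It assumes the Scan
asked for enough versions that yesterday's cf3 cell is actually returned (a
Scan returns only the newest version per column by default), and it treats the
time range as half-open, [start, end), as HBase time ranges are.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TimeRangeFilterSketch {

    /** Hypothetical stand-in for one HBase cell (not an HBase class). */
    public static final class SimpleCell {
        public final String family;
        public final String qualifier;
        public final long timestamp;
        public final String value;

        public SimpleCell(String family, String qualifier, long timestamp, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Picks one cell per column from the cells of a single row.
     * Families listed in rangedFamilies only consider cells with
     * start <= timestamp < end; all other families consider every cell.
     * In both cases the newest remaining cell of each column wins.
     */
    public static List<SimpleCell> select(List<SimpleCell> rowCells,
                                          Set<String> rangedFamilies,
                                          long start, long end) {
        // LinkedHashMap keeps columns in the order they were first seen.
        Map<String, SimpleCell> newest = new LinkedHashMap<>();
        for (SimpleCell cell : rowCells) {
            boolean ranged = rangedFamilies.contains(cell.family);
            if (ranged && (cell.timestamp < start || cell.timestamp >= end)) {
                continue; // outside the per-family time range, drop it
            }
            String column = cell.family + ":" + cell.qualifier;
            SimpleCell seen = newest.get(column);
            if (seen == null || cell.timestamp > seen.timestamp) {
                newest.put(column, cell);
            }
        }
        return new ArrayList<>(newest.values());
    }
}
```

For the case in the question, rangedFamilies would contain only cf3 and
start/end would bracket yesterday, while the Scan's own time range stays wide
enough to cover both yesterday and today.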
Being able to specify different time ranges for different column families
is something I'd like to do as well. Perhaps we'll get that into HBase at
some point.
Dave
On Tue, Mar 3, 2015 at 5:23 PM, Felipe Sodré Silva fso...@gmail.com
wrote:
When using TableInputFormat to make HBase data available to map/reduce
jobs, we can use the settings SCAN_TIMERANGE_START and
SCAN_TIMERANGE_END to specify a time range for the scan.
Is it possible to somehow have different time ranges for different
column families?
This is my problem:
I have table X with column families cf1, cf2 and cf3. I want to run a
map/reduce job on it using the most recent versions of columns in cf1
and cf2, but I want to use yesterday's data from cf3. Is this
possible?
Felipe