Re: MR job "randomly" scans up thousands of rows less than the it should.

Cosmin Lehene Mon, 06 Feb 2012 08:25:50 -0800

Thanks Ted!

I wonder if it would make more sense to port it to 0.90.X or upgrade to
0.92.


Cosmin

On 2/2/12 5:03 PM, "Ted Yu" <[email protected]> wrote:

>HBASE-4838 ports HBASE-2856 to 0.92
>
>FYI
>
>On Thu, Feb 2, 2012 at 4:46 PM, Cosmin Lehene <[email protected]> wrote:
>
>> (sorry for the damaged subject :))
>>
>>
>> Hey Jon,
>> We have two column families.
>> There are no filters and there's a full table scan. We're not skipping
>> rows.
>> I did, however, see one qualifier "fault" in the job counters a single
>> time (it was missing, and it wasn't supposed to be missing).
>> That happened only once, though, and it doesn't happen when we encounter
>> missing rows.
>>
>> We're getting this behavior consistently, although I couldn't figure out
>> a way to reproduce it. I'll try running multiple instances of the job in
>> parallel to figure out if that would affect the outcome.
>> I'll probably have to add more debugging for the affected rows and dig
>> deeper.
>>
>> HBASE-2856 is a pretty large issue - do you think it could be related to
>> what I'm seeing? If so it could help me reproduce it.
>>
>> Thanks,
>> Cosmin
>>
>>
>>
>>
>> On 2/1/12 11:30 PM, "Jonathan Hsieh" <[email protected]> wrote:
>>
>> >Cosmin,
>> >
>> >How many column families do you have in this table?  Are you using any
>> >filters in your HBase scans?  Are you skipping rows that may not have
>> >qualifiers present?
>> >
>> >There are a few known issues with multi-CF atomicity and a recent one
>> >about flushes that may be related to this problem.  There is HBASE-2856,
>> >a fix having to do with flushes which is pretty intricate and only in
>> >0.92.
>> >
>> >Jon.
>> >
>> >On Wed, Feb 1, 2012 at 8:46 PM, Cosmin Lehene <[email protected]> wrote:
>> >
>> >> We have a MR job that runs every few minutes on some time series data
>> >> which is continuously updated (never deleted).
>> >> Every few runs (in the range of tens to hundreds), the map task that
>> >> covers the last region gets fewer input records (off by 500-5000 rows)
>> >> without any splits happening. This lower number of input records can
>> >> persist for a few MR runs, but eventually gets back to the "correct"
>> >> value.
>> >>
>> >> This drop can be seen in the "map input records" metric, and it is
>> >> correlated with the metrics that get computed by the MR job (so it's
>> >> not a MR counter bug).
>> >>
>> >> There are no exceptions in the MR job or in the region server, and
>> >> this doesn't seem to be correlated with any compaction, split or
>> >> region movement.
>> >> The only "variable" in this scenario is that new data gets injected
>> >> continuously (and the actual MR job, which is idempotent).
>> >>
>> >> This entire puzzle takes place on HBase 0.90.5 ish (12 dec 2011) on
>> >> top of Hadoop cdh3u2.
>> >>
>> >> Cosmin
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >--
>> >// Jonathan Hsieh (shay)
>> >// Software Engineer, Cloudera
>> >// [email protected]
>>
>>
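
Since the data in this thread is described as continuously updated and never deleted, the "map input records" value for a given region should not go down between runs unless a split happens. A minimal sketch of how such dips could be flagged from recorded counter values follows. The function name find_dips and the sample numbers are hypothetical and not taken from the job discussed here; collecting the counter values from the MR framework is left out.

```python
from typing import List, Tuple


def find_dips(counts: List[int]) -> List[Tuple[int, int]]:
    """Return (run_index, missing_rows) for every run whose input record
    count is below the highest count seen in any earlier run."""
    dips = []
    high_water = 0  # largest count observed so far
    for i, n in enumerate(counts):
        if n < high_water:
            # Rows are never deleted, so a lower count means rows were missed.
            dips.append((i, high_water - n))
        else:
            high_water = n
    return dips


# Hypothetical per-run "map input records" values for one map task.
runs = [100000, 100400, 99100, 99300, 101200]
print(find_dips(runs))  # [(2, 1300), (3, 1100)]
```

Comparing against the running maximum rather than the previous run keeps a dip flagged for as long as it persists, which matches the report that the lower count can last for a few runs before recovering.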
