Duh.  Thanks Ioannis for finding my dumb bug.  I made hbase-2101 with your
suggested fix.
St.Ack
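
For anyone reading this in the archive, here is a minimal sketch of the
aliasing problem described in the quoted message below, written against plain
Java collections rather than HBase. The Cell class, the reusingIterable helper
and the BY_KEY comparator are hypothetical stand-ins, not HBase or Hadoop
APIs. The sketch assumes the framework's iterator refills one shared object on
each step, as the quoted explanation suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

class ReusedObjectDemo {
    // Hypothetical stand-in for a mutable, reusable record such as KeyValue.
    static final class Cell implements Cloneable {
        String key;
        Cell(String key) { this.key = key; }
        @Override public Cell clone() { return new Cell(key); }
    }

    // Assumption: mimics an iterator that refills ONE shared instance per next().
    static Iterable<Cell> reusingIterable(final List<String> keys) {
        return () -> new Iterator<Cell>() {
            private final Cell shared = new Cell(null);
            private int i = 0;
            @Override public boolean hasNext() { return i < keys.size(); }
            @Override public Cell next() { shared.key = keys.get(i++); return shared; }
        };
    }

    // Orders cells by their full key string.
    static final Comparator<Cell> BY_KEY = Comparator.comparing((Cell c) -> c.key);

    // Sorts the incoming cells in a TreeSet, optionally cloning each one first,
    // and returns the keys the set ends up holding, in order.
    static List<String> collect(List<String> keys, boolean cloneFirst) {
        TreeSet<Cell> sorted = new TreeSet<>(BY_KEY);
        for (Cell c : reusingIterable(keys)) {
            sorted.add(cloneFirst ? c.clone() : c);
        }
        List<String> out = new ArrayList<>();
        for (Cell c : sorted) {
            out.add(c.key);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("row1/cf:b", "row1/cf:a", "row1/cf:c");
        System.out.println(collect(keys, false)); // [row1/cf:c]
        System.out.println(collect(keys, true));  // [row1/cf:a, row1/cf:b, row1/cf:c]
    }
}
```

Without the clone, the set only ever holds the one shared instance, so each
later add compares that object with itself and is treated as a duplicate. The
set ends up with a single entry showing whatever was written into it last.
With the clone, each entry is an independent copy and all three keys survive.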

On Sat, Jan 9, 2010 at 10:31 AM, Ioannis Konstantinou
<[email protected]> wrote:

> The problem is in the class KeyValueSortReducer.
>
> When you add KeyValues to the TreeSet for sorting, you need to add KeyValue
> clones instead of just references. What happens now is that, in every
> iteration, the value already in the TreeSet gets replaced with the new
> value.
>
> So, you need to replace line 41: map.add(kv)
> with this line: map.add(kv.clone())
>
> In this case, the TreeSet populates correctly.
>
> On 9/1/2010 at 7:58 PM, stack wrote:
>
>> Something is up here.  KVSR uses KeyValue.COMPARATOR which does:
>>
>>
>>    * Compare KeyValues.  When we compare KeyValues, we only compare the Key
>>    * portion.  This means two KeyValues with same Key but different Values are
>>    * considered the same as far as this Comparator is concerned.
>>    * Hosts a {@link KeyComparator}.
>>
>> ... where Key in the above is the
>> key/columnfamily/columnqualifier/timestamp/type combination.
>>
>> If we're only keeping the last value added, that's odd.  It should be
>> keeping them all, since differing in column makes for a different key.
>>
>> Can you send us over a sample of the keyvalues that are getting conflated?
>> Something is wrong.
>>
>> Thanks for reporting this.
>> St.Ack
>>
>> On Sat, Jan 9, 2010 at 9:09 AM, Ioannis Konstantinou <[email protected]>
>> wrote:
>>
>>
>>
>>> Hello,
>>>
>>> I am trying to bulk upload content to hbase using the instructions
>>> provided at
>>>
>>> http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
>>>
>>> I have a mapper that reads input and emits KeyValue objects to be fed to
>>> the KeyValueSortReducer. The mapper emits a number of KeyValue objects
>>> for each row. For the same rowid, the KeyValue objects have different
>>> columnids.
>>>
>>> The problem is the following: when these KeyValue objects (that have the
>>> same rowid but different colids in the same column family) reach the
>>> reducer, the TreeSet used to sort KeyValues keeps only the KeyValue that
>>> arrives last (it replaces all entries with the last one that reaches the
>>> reducer), as the KeyValue.COMPARATOR compares only the rowid!
>>>
>>> Can I use a different Comparator? Must KeyValue objects with the same
>>> rowid be sorted before they are written to the HFile, or does this not
>>> matter?
>>>
>>> Thank you in advance for your time.
>>>
>>>
>>> --
>>> Ioannis Konstantinou
>>> Research Associate, Computing Systems Laboratory
>>> National Technical University of Athens
>>> Web:http://www.cslab.ntua.gr/~ikons
>>>
>>>
>>>
>>>
>>
>>
>
> --
> Ioannis Konstantinou
> Research Associate, Computing Systems Laboratory
> National Technical University of Athens
> phone: +30 2107721544(internal 421)
> Web:http://www.cslab.ntua.gr/~ikons
>
>
