Duh. Thanks Ioannis for finding my dumb bug. I made HBASE-2101 with your suggested fix.
St.Ack

On Sat, Jan 9, 2010 at 10:31 AM, Ioannis Konstantinou <[email protected]> wrote:

> The problem is in the class KeyValueSortReducer.
>
> When you add KeyValues to the TreeSet for sorting, you need to add KeyValue
> clones instead of just references. What happens now is that in every
> iteration, the value that exists in the TreeSet gets replaced with the new
> value.
>
> So, you need to replace line 41: map.add(kv)
> with this line: map.add(kv.clone())
>
> In this case, the TreeSet populates correctly.
>
> On 9/1/2010 7:58 PM, stack wrote:
>
>> Something is up here. KVSR uses KeyValue.COMPARATOR which does:
>>
>>  * Compare KeyValues. When we compare KeyValues, we only compare the Key
>>  * portion. This means two KeyValues with same Key but different Values are
>>  * considered the same as far as this Comparator is concerned.
>>  * Hosts a {@link KeyComparator}.
>>
>> ... where Key in the above is the
>> key/columnfamily/columnqualifier/timestamp/type combination.
>>
>> If we're only keeping the last value added, that's odd. It should be keeping
>> them all, since differing in column makes for a different key.
>>
>> Can you send us over a sample of the KeyValues that are getting conflated?
>> Something is wrong.
>>
>> Thanks for reporting this.
>> St.Ack
>>
>> On Sat, Jan 9, 2010 at 9:09 AM, Ioannis Konstantinou <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I am trying to bulk upload content to HBase using the instructions provided
>>> at
>>> http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
>>>
>>> I have a mapper that reads input and emits KeyValue objects to be fed into
>>> the KeyValueSortReducer. The mapper emits a number of KeyValue objects for
>>> each row. For the same rowid, the KeyValue objects have different
>>> columnids.
>>>
>>> The problem is the following: when these KeyValue objects (that have the
>>> same rowid but different colids in the same column family) reach the
>>> reducer, the TreeSet used to sort KeyValues keeps only the KeyValue that
>>> arrives last (it replaces all entries with the last one that reaches the
>>> reducer), as the KeyValue.COMPARATOR compares only the rowid !!!!!
>>>
>>> Can I use a different Comparator??? Must KeyValue objects of the same rowid
>>> be sorted before writing them to the HFile, or does this not matter???
>>>
>>> Thank you in advance for your time.
>>>
>>> --
>>> Ioannis Konstantinou
>>> Research Associate, Computing Systems Laboratory
>>> National Technical University of Athens
>>> Web: http://www.cslab.ntua.gr/~ikons
>
> --
> Ioannis Konstantinou
> Research Associate, Computing Systems Laboratory
> National Technical University of Athens
> phone: +30 2107721544 (internal 421)
> Web: http://www.cslab.ntua.gr/~ikons
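
For anyone reading this thread later, here is a minimal sketch of the kind of bug described above. It does not use the HBase classes: Cell is a hypothetical stand-in for KeyValue, and the single reused instance imitates a framework handing the reducer the same mutable object on every iteration (an assumption about the cause that fits the symptoms reported). The names ReusedObjectTreeSetDemo, Cell and collect are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class ReusedObjectTreeSetDemo {

    /** Hypothetical stand-in for KeyValue: a mutable (row, qualifier) pair. */
    static final class Cell {
        String row;
        String qualifier;

        void set(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }

        /** Returns an independent copy, so later mutation does not affect it. */
        @Override
        public Cell clone() {
            Cell copy = new Cell();
            copy.set(row, qualifier);
            return copy;
        }
    }

    /** Compares row first, then qualifier, so different columns are different keys. */
    static final Comparator<Cell> COMPARATOR = (a, b) -> {
        int byRow = a.row.compareTo(b.row);
        return byRow != 0 ? byRow : a.qualifier.compareTo(b.qualifier);
    };

    /**
     * Feeds one reused, mutated Cell into a TreeSet once per qualifier and
     * returns the qualifiers found in the set afterwards, in sorted order.
     */
    public static List<String> collect(String[] qualifiers, boolean cloneBeforeAdd) {
        TreeSet<Cell> sorted = new TreeSet<>(COMPARATOR);
        Cell reused = new Cell(); // the same instance is handed out every time
        for (String q : qualifiers) {
            reused.set("row1", q);
            // Without clone(), every add() passes the same reference: the set
            // compares the object with itself, sees "equal", and adds nothing.
            sorted.add(cloneBeforeAdd ? reused.clone() : reused);
        }
        List<String> out = new ArrayList<>();
        for (Cell c : sorted) {
            out.add(c.qualifier);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] qualifiers = {"c", "a", "b"};
        System.out.println(collect(qualifiers, false)); // prints [b]
        System.out.println(collect(qualifiers, true));  // prints [a, b, c]
    }
}
```

Without the clone, the set holds a single element whose contents are whatever was written last, which matches the "only the last KeyValue survives" symptom even though the comparator does look at the column. With the clone, each entry is an independent object and all three sort as expected.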
