Ok. Thanks.  Looks like a dumb bug in the HFileOutputFormat.  I'll check
tomorrow.  Thanks for your patience.
St.Ack

On Sat, Nov 7, 2009 at 11:06 PM, Murali Krishna. P
<[email protected]>wrote:

> No, we are not dropping it. It is going in as the previous region's last
> entry. So the last key is inclusive but the first key is exclusive.
>
> look at my test code:
>     HFile.Reader reader = new HFile.Reader(fs, new Path(args[0]), null, true);
>     reader.loadFileInfo();
>     System.out.println("FirstKey:" + new String(reader.getFirstKey()));
>     System.out.println("LastKey:" + new String(reader.getLastKey()));
>     HFileScanner l = reader.getScanner();
>     l.seekTo(reader.getLastKey());
>     KeyValue t = l.getKeyValue();
>     System.out.println("last key:" + t.getKeyString()
>         + " last value length:" + t.getValueLength() + " value:" + t.getValue());
> and output is:
>
> FirstKey:00000d7d4f36c112imagevalue�������
> LastKey:333305184e0f7c3eimagevalue�������
> last key:\x00\x10333305184e0f7c3e\x05imagevalue\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x04
> last value length:3398 value:[...@8888e6c
>
>  Thanks,
> Murali Krishna
>
>
>
>
> ________________________________
> From: stack <[email protected]>
> To: [email protected]
> Sent: Sun, 8 November, 2009 12:21:48 PM
> Subject: Re: Issue with bulk loader tool
>
> So, do you think we are dropping the first key in the region?
> Thanks,
> St.Ack
>
> On Sat, Nov 7, 2009 at 9:17 PM, Murali Krishna. P <[email protected]
> >wrote:
>
> > No, the first key is 6666909d611e8d7e for the region whose startKey is
> > 666629fe4378c096 (this is actually the next key in order).
> >
> > HFile -p:-
> > Scanning -> /hbase/test12/336573097/image/2362265315474952099
> > K: \x00\x106666909d611e8d7e\x05imagevalue\x7F\x..
> >
> >  HFileUtil /hbase/test12/336573097/image/2362265315474952099 :-
> > FirstKey:6666909d611e8d7eimagevalue�������
> > LastKey:99998c8f356b0d86imagevalue�������
> >
> > But the scan .META. shows the start key as 666629fe4378c096. (attached
> > .META.)
> >
> > This seems to be the case for all the regions (the actual firstKey is the
> > next one after the claimed firstKey).
> >
> > I am on Hadoop 0.20.0.
> >
> > Thanks,
> > Murali Krishna
> >
> >
> > ------------------------------
> > *From:* stack <[email protected]>
> > *To:* [email protected]
> > *Sent:* Sun, 8 November, 2009 4:30:15 AM
> >
> > *Subject:* Re: Issue with bulk loader tool
> >
> > It's what Lars says, Murali: a region's startkey is inclusive and its
> > endkey exclusive.  If the key exists, it should be in the region that has
> > it for a start key (it will not be duplicated in both).
> >
> > For .META., there is usually only one Region instance in a .META. table.
> > Its startkey will be the empty key, so it's not surprising its first key
> > is different from the empty key.  What do you see when you look at the
> > second region in your just-uploaded table?  I'd expect the key
> > 666629fe4378c096 to be first in the region whose startkey is
> > 666629fe4378c096.
> >
> > Thanks for figuring MAPREDUCE-565 could trip us up.  Your hadoop is not
> > 0.20.1?
> >
> > Yours,
> > St.Ack
> >
> >
> >
> > On Sat, Nov 7, 2009 at 7:58 AM, Murali Krishna. P <
> [email protected]
> > >wrote:
> >
> > > Thanks Lars for the clarification,
> > >    But where does the record reside? Is it duplicated in both
> > > regions? When I use HFile.Reader, the first key in the second region is
> > > different. Maybe this behaviour (overlap) is only in .META.?
> > >    The issue is that when I request that boundary record, it is
> > > loading the next region.
> > >
> > > 09/11/07 07:52:05 DEBUG client.HConnectionManager$TableServers: Cached
> > > location address: 76.13.20.58:60020, regioninfo: REGION => {NAME =>
> > > '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,
> > > TABLE => {{NAME => '.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384',
> > > FAMILIES => [{NAME => 'historian', VERSIONS => '2147483647',
> > > COMPRESSION => 'NONE', TTL => '604800', BLOCKSIZE => '8192', IN_MEMORY => 'false',
> > > BLOCKCACHE => 'false'}, {NAME => 'info', VERSIONS => '10',
> > > COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'false',
> > > BLOCKCACHE => 'false'}]}}
> > > 09/11/07 07:52:05 DEBUG client.HConnectionManager$TableServers: Cached
> > > location address: 76.13.20.114:60020, regioninfo: REGION => {NAME =>
> > > 'test12,333305184e0f7c3e,1257515988652', STARTKEY => '333305184e0f7c3e',
> > > ENDKEY => '666629fe4378c096', ENCODED => 170637321, TABLE => {{NAME =>
> > > 'test12', FAMILIES => [{NAME => 'image', VERSIONS => '3',
> > > COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536',
> > > IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
> > >
> > >  Thanks,
> > > Murali Krishna
> > >
> > >
> > >
> > >
> > > ________________________________
> > > From: Lars George <[email protected]>
> > > To: "[email protected]" <[email protected]>
> > > Sent: Sat, 7 November, 2009 9:19:37 PM
> > > Subject: Re: Issue with bulk loader tool
> > >
> > > Hi Murali,
> > >
> > > What you see is normal; the last keys do indeed overlap. The last key of
> > > a region is exclusive and marks the first key of the subsequent region.
> > >
> > > Lars
> > >
> > > On Nov 7, 2009, at 9:05, "Murali Krishna. P" <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > > > I got it resolved. https://issues.apache.org/jira/browse/HADOOP-5750 was
> > > > > causing this; even though I supplied a custom total ordering
> > > > > partitioner, it didn't use that.
> > > >
> > > >
> > > >  Now the regions looks properly sorted, but facing a new issue. The
> > last
> > > key of the each region is not retrievable. The table.jsp  page shows
> the
> > > start and end key wrongly.
> > > > for eg, take first 2 regions
> > > > region1: start : end: 333305184e0f7c3e
> > > > region2: start: 333305184e0f7c3e end: 666629fe4378c096
> > > >
> > > > The end key of first region = start key of second ??
> > > >
> > > > > If I get the first and last key using HFile.Reader, it shows as follows:
> > > >
> > > > HFileUtil /hbase/test12/98766318/image/9052388247118781160
> > > > FirstKey:00000d7d4f36c112imagevalue�������
> > > > LastKey:333305184e0f7c3eimagevalue�������
> > > >
> > > > HFileUtil /hbase/test12/170637321/image/7602871928600243730
> > > > FirstKey:33338d45cc2491b8imagevalue�������
> > > > LastKey:666629fe4378c096imagevalue�������
> > > >
> > > > > So, according to this, the first key of the 2nd region is
> > > > > 33338d45cc2491b8, not 333305184e0f7c3e, which is correct!
> > > > >
> > > > > Now when I do a get on 333305184e0f7c3e with debug on, it is loading
> > > > > the second region, which is wrong!
> > > > >
> > > > > Something went wrong with the index?
> > > >
> > > > Thanks,
> > > > Murali Krishna
> > > >
> > > >
> > > >
> > > >
> > > > ________________________________
> > > > From: stack <[email protected]>
> > > > To: [email protected]
> > > > Sent: Sat, 7 November, 2009 6:26:03 AM
> > > > Subject: Re: Issue with bulk loader tool
> > > >
> > > > On Fri, Nov 6, 2009 at 12:58 AM, Murali Krishna. P
> > > > <[email protected]>wrote:
> > > >
> > > >> Hi,
> > > > >> If I increase hbase.hregion.max.filesize so that all the records fit
> > > > >> in one region (and one reducer), all the records are retrievable. If
> > > > >> one reducer creates multiple hfiles, or multiple reducers create one
> > > > >> hfile each, the problem occurs.
> > > >>
> > > >>
> > > >
> > > > > Multiple hfiles in a region?  Or are you saying if a reducer creates
> > > > > multiple regions?  There is supposed to be one file per region only
> > > > > when done.
> > > >
> > > > Thanks for digging in,
> > > > St.Ack
> > > >
> > > >
> > > >
> > > >
> > > >> Does that give any clue?
> > > >>
> > > >> Thanks,
> > > >> Murali Krishna
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> ________________________________
> > > >> From: Murali Krishna. P <[email protected]>
> > > >> To: [email protected]
> > > >> Sent: Thu, 5 November, 2009 6:34:20 PM
> > > >> Subject: Re: Issue with bulk loader tool
> > > >>
> > > >> Hi Stack,
> > > >> Sorry, could not look into this last week...
> > > >>
> > > > >> I got the problem with the HTable interface as well. Some records I am
> > > > >> not able to retrieve from HTable either.
> > > > >> I lost the old table, but reproduced the problem with a different
> > > > >> table.
> > > > >>
> > > > >> I cannot send the region since it is very large. I will try to give as
> > > > >> much info as possible here :)
> > > >>
> > > > >> There are a total of 5 regions in that table, as below:
> > > > >> Name                                  Encoded Name  Start Key         End Key
> > > > >> test1,,1257414794600                  106817540                       fffe9c7f87c8332a
> > > > >> test1,fffe9c7f87c8332a,1257414794616  1346846599    fffe9c7f87c8332a  fffebe279c0ac4d2
> > > > >> test1,fffebe279c0ac4d2,1257414794628  1835851728    fffebe279c0ac4d2  fffec418284d6fbc
> > > > >> test1,fffec418284d6fbc,1257414794637  1078205908    fffec418284d6fbc  fffef7a12ea22498
> > > > >> test1,fffef7a12ea22498,1257414794647  1515378663    fffef7a12ea22498
> > > >>
> > > > >> I am looking for a key, say 000011d1bc8cd6fe. This should be in the
> > > > >> first region?
> > > > >>
> > > > >> Using the hfile tool,
> > > > >> org.apache.hadoop.hbase.io.hfile.HFile -k -f
> > > > >> /hbase/test1/106817540/image/3828859735461759684 -v -m -p |  grep
> > > > >> 000011d1bc8cd6fe
> > > > >> the first region doesn't have it. Not sure what happened to that
> > > > >> record.
> > record.
> > > >>
> > > > >> For a working key, it gives the record properly as below
> > > > >> K: \x00\x100003bdd08ca88ee2\x05imagevalue\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x04
> > > > >> V: \xFF...
> > > >>
> > > >> Please let me know if you need more information
> > > >>
> > > >> Thanks,
> > > >> Murali Krishna
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> ________________________________
> > > >> From: stack <[email protected]>
> > > >> To: [email protected]
> > > >> Sent: Mon, 2 November, 2009 11:05:43 PM
> > > >> Subject: Re: Issue with bulk loader tool
> > > >>
> > > >> Murali:
> > > >>
> > > >> Any developments worth mentioning?
> > > >>
> > > >> St.Ack
> > > >>
> > > >>
> > > >> On Fri, Oct 30, 2009 at 10:14 AM, stack <[email protected]> wrote:
> > > >>
> > > > >>> That is interesting.  It'd almost point to a shell issue.  Enable DEBUG
> > > > >>> so client can see it.  Then rerun shell.  Is it at least loading the
> > > > >>> right region?  (Do the region's start and end keys span the asked-for
> > > > >>> key?)  I took a look at your attached .META. scan.  All looks good
> > > > >>> there.  The region specifications look right.  If you want to bundle up
> > > > >>> the region that is failing -- the one that the failing key comes out
> > > > >>> of -- I can take a look here.  You could also try playing with the HFile
> > > > >>> tool: ./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile.  Run the
> > > > >>> former and it'll output usage.  You should be able to get it to dump
> > > > >>> content of the region (you need to supply flags like -v to see actual
> > > > >>> keys to the HFile tool, else it just runs its check silently).  Check
> > > > >>> for your key.  Check things like timestamp on it.  Maybe it's 100 years
> > > > >>> in advance of now or something?
> > > >>>
> > > >>> Yours,
> > > >>> St.Ack
> > > >>>
> > > >>>
> > > >>> On Fri, Oct 30, 2009 at 9:01 AM, Murali Krishna. P <
> > > >> [email protected]
> > > >>>> wrote:
> > > >>>
> > > >>>> Attached ".META"
> > > >>>>
> > > > >>>> Interesting, I was able to get the row from HTable via java code.
> > > > >>>> But from the shell, I am still getting the following:
> > > >>>>
> > > >>>> hbase(main):004:0> get 'TestTable2', 'ffffef95bcbf2638'
> > > >>>> 0 row(s) in 1.2250 seconds
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Murali Krishna
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Murali Krishna
> > > >>>>
> > > >>>>
> > > >>>> ------------------------------
> > > >>>> *From:* stack <[email protected]>
> > > >>>> *To:* [email protected]
> > > >>>> *Sent:* Fri, 30 October, 2009 8:39:46 PM
> > > >>>> *Subject:* Re: Issue with bulk loader tool
> > > >>>>
> > > >>>> Can you send a listing of ".META."?
> > > >>>>
> > > >>>> hbase> scan ".META."
> > > >>>>
> > > > >>>> Also, can you bring a region down from hdfs, tar and gzip it, and
> > > > >>>> then put it someplace I can pull so I can take a look?
> > > >>>>
> > > >>>> Thanks,
> > > >>>> St.Ack
> > > >>>>
> > > >>>>
> > > >>>> On Fri, Oct 30, 2009 at 3:31 AM, Murali Krishna. P
> > > >>>> <[email protected]>wrote:
> > > >>>>
> > > >>>>> Hi guys,
> > > > >>>>> I created a table according to HBASE-48: a mapreduce job which
> > > > >>>>> creates HFiles, and then used the loadtable.rb script to create the
> > > > >>>>> table. Everything worked fine and I was able to scan the table. But
> > > > >>>>> when I do a get for a key displayed in the scan output, it does not
> > > > >>>>> retrieve the row. The shell says 0 row(s).
> > > >>>>>
> > > > >>>>> I tried using one reducer to ensure total ordering, but still the
> > > > >>>>> same issue.
> > > >>>>>
> > > >>>>>
> > > >>>>> My mapper is like:
> > > >>>>> context.write(new
> > > >>>>> ImmutableBytesWritable(((Text)key).toString().getBytes()), new
> > > >>>>> KeyValue(((Text)key).toString().getBytes(), "family1".getBytes(),
> > > >>>>>                  "column1".getBytes(), getValueBytes()));
> > > >>>>>
> > > >>>>>
> > > >>>>> Please help me investigate this.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Murali Krishna
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>>
> > > >>
> > >
> >
>
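
For readers following the thread, here is a minimal sketch of the lookup rule Lars and St.Ack describe (region start key inclusive, end key exclusive). It is not HBase code: the class RegionLookup, its methods addRegion, regionFor and exampleFromThread, and the labels region1..region3 are made up for illustration. It also assumes row keys are ASCII hex strings, so that Java String ordering matches HBase's byte-wise ordering. The boundary keys are the ones quoted in the thread.

```java
import java.util.Map;
import java.util.TreeMap;

public class RegionLookup {
    // Region label indexed by region start key. The empty string is the
    // start key of the first region, as in the table.jsp listing.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String name) {
        regionsByStartKey.put(startKey, name);
    }

    // A row belongs to the region with the greatest start key <= rowKey.
    // That makes the start key inclusive and the end key (which is the
    // next region's start key) exclusive.
    public String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }

    // Hypothetical labels; the start keys are taken from the thread.
    public static RegionLookup exampleFromThread() {
        RegionLookup lookup = new RegionLookup();
        lookup.addRegion("", "region1");
        lookup.addRegion("333305184e0f7c3e", "region2");
        lookup.addRegion("666629fe4378c096", "region3");
        return lookup;
    }

    public static void main(String[] args) {
        RegionLookup lookup = exampleFromThread();
        // First key reported for the first HFile.
        System.out.println(lookup.regionFor("00000d7d4f36c112"));
        // Boundary key: end key of region1, start key of region2.
        System.out.println(lookup.regionFor("333305184e0f7c3e"));
        // First key actually found in the second HFile.
        System.out.println(lookup.regionFor("33338d45cc2491b8"));
    }
}
```

With these boundaries, a get on 333305184e0f7c3e resolves to the second region. A bulk-load output file that stores that row as the last entry of the first region's HFile therefore leaves it unreachable, which matches the symptom Murali reports.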
