Well, you configured it in hbase.rootdir, something like /hbase, so you
need to do "./bin/hadoop dfs -ls /hbase".
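
For reference, here is roughly what that setting looks like in
hbase-site.xml. The hdfs:// authority below is only an illustration;
yours will name your own namenode and port:

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://your-namenode:9000/hbase</value>
  </property>

With a value like that, listing /hbase should show the -ROOT- and .META.
catalog folders plus one folder per table you created.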

J-D

On Wed, Sep 2, 2009 at 8:30 AM, Xine Jar<[email protected]> wrote:
> :)
>
> Since I am seeing neither the ROOT nor the META folder, I am obviously on
> the wrong path. I thought it should be seen in the DFS, where a mapreduce
> program takes its input file from and stores its output file, and the
> default for me is:
>
> pc150:~/Desktop/hbase-0.19.3 # /root/Desktop/hadoop-0.19.1/bin/hadoop dfs
> -ls
> Found 2 items
> drwxr-xr-x   - root supergroup          0 2009-08-31 22:21 /user/root/input
> drwxr-xr-x   - root supergroup          0 2009-09-02 16:02 /user/root/output
>
> If there is another path, could you please tell me where it is configured,
> so that I can check it?
>
> Thank you
>
>
> On Wed, Sep 2, 2009 at 12:28 PM, Jean-Daniel Cryans 
> <[email protected]> wrote:
>
>> Same drill.
>>
>> J-D
>>
>> On Wed, Sep 2, 2009 at 5:51 AM, Xine Jar<[email protected]> wrote:
>> > Hello,
>> > The theoretical concept of the table is clear to me. I am aware that
>> > writes are kept in memory in a buffer called the memtable, and whenever this
>> > buffer reaches a threshold, the memtable is automatically flushed to
>> > disk.
>> >
>> > Now I have tried to flush the table by executing the following:
>> >
>> > hbase(main):001:0> flush 'myTable'
>> > 0 row(s) in 0.2019 seconds
>> >
>> > hbase(main):002:0> describe 'myTable'
>> > {NAME => 'myTable', FAMILIES => [{NAME => 'cf', COMPRESSION => 'NONE',
>> > VERSIONS => '3', LENGTH => '2147483647'
>> > , TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}
>> >
>> > Q1- Does the expression "0 row(s) in 0.2019 seconds" mean that it did not
>> > flush anything?
>>
>> Nah, it's just the way the shell counts the rows it displays. In this
>> case no counter was incremented, so it shows "0 row(s)"; it's a UI
>> bug. BTW, describing your table won't tell you how many rows you
>> have or how many are still kept in the memtable.
>>
>> >
>> > Q2- Does IN_MEMORY => 'false' mean that the table is not in memory? So is
>> > it on disk? If it is, I still cannot see it in the DFS when executing
>> > "bin/hadoop dfs -ls".
>>
>> This is a family-scope property that tells HBase to always keep that
>> family in RAM (but also on disk; it's not ephemeral). In your case, it
>> means that HBase shouldn't do anything in particular for that family.
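>>
>> For the record, IN_MEMORY is set per family when you create or alter the
>> table. From memory, the 0.19 shell syntax is roughly the following, but
>> double-check with "help" in your shell:
>>
>> hbase> disable 'myTable'
>> hbase> alter 'myTable', {NAME => 'cf', IN_MEMORY => 'true'}
>> hbase> enable 'myTable'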
>>
>> Are you sure you are doing an ls at the right place in the filesystem?
>> Do you see the META and ROOT folders? Is there any data in your table?
>> You can do a "count" in the shell to make sure.
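>>
>> For example (the count below assumes the 100-row table you described
>> earlier in this thread; the timing is illustrative):
>>
>> hbase(main):003:0> count 'myTable'
>> 100 row(s) in 1.2340 seconds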
>>
>> >
>> >
>> > Thank you for taking a look at that.
>> >
>> > Regards,
>> > CJ
>> >
>> > On Tue, Sep 1, 2009 at 7:13 PM, Jean-Daniel Cryans
>> > <[email protected]> wrote:
>> >
>> >> Inline.
>> >>
>> >> J-D
>> >>
>> >> On Tue, Sep 1, 2009 at 1:05 PM, Xine Jar<[email protected]> wrote:
>> >> > Thank you,
>> >> >
>> >> > While the answers to Q3 and Q4 were clear enough, I still have some
>> >> > problems with the first two questions.
>> >>
>> >> Good
>> >>
>> >> >
>> >> > - Which entry in hbase-default.xml allows me to check the size of a
>> >> > tablet?
>> >>
>> >> Those are configuration parameters, not commands. A region will split
>> >> when a family reaches that size. See
>> >> http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#hregion for more
>> >> info on splitting.
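>> >>
>> >> For instance, to change the split threshold you would override it in
>> >> hbase-site.xml; the 256 MB value below is only an illustration:
>> >>
>> >>   <property>
>> >>     <name>hbase.hregion.max.filesize</name>
>> >>     <value>268435456</value>
>> >>   </property>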
>> >>
>> >> >
>> >> > - In hadoop, I used to copy a file to the DFS by doing "bin/hadoop dfs
>> >> > -copyFromLocal filesource fileDFS".
>> >> > Having this file in the DFS, I could list it with "bin/hadoop dfs -ls" and
>> >> > check its size by doing "bin/hadoop dfs -du fileDFS".
>> >> > But when I create an hbase table, this table does not appear in the DFS.
>> >> > Therefore the latter command gives an error that it cannot find
>> >> > the table! So how can I point to the folder of the table?
>> >>
>> >> Just make sure the table is flushed to disk; the writes are kept in
>> >> memory as described in the link I pasted for the previous question.
>> >> You can force that by going into the shell and issuing "flush 'table'",
>> >> where 'table' is replaced with the name of your table.
>> >>
>> >> >
>> >> > Regards,
>> >> > CJ
>> >> >
>> >> >
>> >> > On Tue, Sep 1, 2009 at 5:00 PM, Jean-Daniel Cryans
>> >> > <[email protected]> wrote:
>> >> >
>> >> >> Answers inline.
>> >> >>
>> >> >> J-D
>> >> >>
>> >> >> On Tue, Sep 1, 2009 at 10:53 AM, Xine Jar<[email protected]> wrote:
>> >> >> > Hello,
>> >> >> > I have a cluster of 6 nodes running hadoop 0.19.1 and hbase 0.19.3. I
>> >> >> > have managed to write small programs to test the settings, and
>> >> >> > everything seems to be fine.
>> >> >> >
>> >> >> > I wrote a mapreduce program reading a small hbase table (100 rows,
>> >> >> > one column family, 6 columns) and summing some values. In my opinion
>> >> >> > the job is slow; it is taking 19 sec. I would like to look closer at
>> >> >> > what is going on, and whether the table is split into tablets or
>> >> >> > not. Therefore I would appreciate it if someone could answer my
>> >> >> > following questions:
>> >> >>
>> >> >> With that size, that's expected. You would be better off scanning
>> >> >> your table directly instead; MapReduce has a startup cost, and 19
>> >> >> seconds isn't that much.
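>> >> >>
>> >> >> As a rough sketch of what a direct scan could look like with the
>> >> >> 0.19-era Java client (I'm going from memory on the exact overloads,
>> >> >> and "cf:value" is just a placeholder for one of your columns):
>> >> >>
>> >> >>   import org.apache.hadoop.hbase.HBaseConfiguration;
>> >> >>   import org.apache.hadoop.hbase.client.HTable;
>> >> >>   import org.apache.hadoop.hbase.client.Scanner;
>> >> >>   import org.apache.hadoop.hbase.io.Cell;
>> >> >>   import org.apache.hadoop.hbase.io.RowResult;
>> >> >>   import org.apache.hadoop.hbase.util.Bytes;
>> >> >>
>> >> >>   HTable table = new HTable(new HBaseConfiguration(), "myTable");
>> >> >>   // "cf:" with no qualifier asks for the whole family.
>> >> >>   Scanner scanner = table.getScanner(new byte[][] { Bytes.toBytes("cf:") });
>> >> >>   long sum = 0;
>> >> >>   for (RowResult row : scanner) {
>> >> >>     Cell cell = row.get(Bytes.toBytes("cf:value"));
>> >> >>     if (cell != null) {
>> >> >>       sum += Long.parseLong(Bytes.toString(cell.getValue()));
>> >> >>     }
>> >> >>   }
>> >> >>   scanner.close();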
>> >> >>
>> >> >> >
>> >> >> >
>> >> >> > Q1- Does the value of "hbase.hregion.max.filesize" in the
>> >> >> > hbase-default.xml indicate the maximum size of a tablet in bytes?
>> >> >>
>> >> >> It's the maximum size of a family (in a region) in bytes.
>> >> >>
>> >> >> >
>> >> >> > Q2- How can I know the size of the hbase table I have created? (I
>> >> >> > guess the "describe" command from the shell does not provide it.)
>> >> >>
>> >> >> Size as in disk space? You could use the hadoop dfs -du command on
>> >> >> your table's folder.
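>> >> >>
>> >> >> Something like the following, assuming the default /hbase rootdir and
>> >> >> your table being named myTable:
>> >> >>
>> >> >>   bin/hadoop dfs -du /hbase/myTable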
>> >> >>
>> >> >> >
>> >> >> > Q3- Is there a way to know the real number of tablets constituting
>> >> >> > my table?
>> >> >>
>> >> >> In the Master's web UI, click on the name of your table. If you want
>> >> >> to do that programmatically, you can do it indirectly by calling
>> >> >> HTable.getEndKeys(); the length of that array is the number of
>> >> >> regions.
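>> >> >>
>> >> >> A minimal sketch, assuming the 0.19 client API and a table named
>> >> >> myTable:
>> >> >>
>> >> >>   import org.apache.hadoop.hbase.HBaseConfiguration;
>> >> >>   import org.apache.hadoop.hbase.client.HTable;
>> >> >>
>> >> >>   HTable table = new HTable(new HBaseConfiguration(), "myTable");
>> >> >>   // One end key per region, so the array length is the region count.
>> >> >>   int regionCount = table.getEndKeys().length;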
>> >> >>
>> >> >> >
>> >> >> > Q4- Is there a way to get more information on the tablets handled
>> >> >> > by each regionserver? (their number, the rows constituting each
>> >> >> > tablet)
>> >> >>
>> >> >> In the Master's web UI, click on the region server you want info for.
>> >> >> Getting the number of rows inside a region, for the moment, can't be
>> >> >> done directly (it requires doing a scan between the start and end keys
>> >> >> of a region and counting the number of rows you see).
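>> >> >>
>> >> >> Roughly like this, using the same 0.19-era classes as the scan sketch
>> >> >> above (again from memory on the exact overloads; startKey and endKey
>> >> >> would come from HTable.getStartKeys() and HTable.getEndKeys() for the
>> >> >> region you care about):
>> >> >>
>> >> >>   Scanner scanner = table.getScanner(
>> >> >>       new byte[][] { Bytes.toBytes("cf:") }, startKey, endKey);
>> >> >>   int rows = 0;
>> >> >>   for (RowResult row : scanner) {
>> >> >>     rows++;  // one RowResult per row in the region
>> >> >>   }
>> >> >>   scanner.close();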
>> >> >>
>> >> >> >
>> >> >> > Thank you for your help,
>> >> >> > CJ
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>
