how to get rowkey with largest number of versions

2018-08-22 Thread Antonio Si
Hi,

I am new to hbase. I am wondering how I could find out which rowkey has the
largest number of versions in a column family.

Any pointer would be very helpful.

Thanks.

Antonio.


Re: how to get rowkey with largest number of versions

2018-08-22 Thread Antonio Si
Thanks for all the info. I will give it a try.

On Wed, Aug 22, 2018 at 12:13 PM, Ted Yu  wrote:

> Antonio:
> Please take a look at CellCounter under the hbase-mapreduce module, which
> may be of use to you; among the statistics it reports:
>
>  * 6. Total number of versions of each qualifier.
>
>
> Please note that the max versions may fluctuate depending on when major
> compaction kicks in.
>
>
> FYI
>
> On Wed, Aug 22, 2018 at 11:53 AM Ankit Singhal 
> wrote:
>
> > I don't think there is any direct way.
> > You may need to do a raw scan of the full table and count the number of
> > versions of a column returned for each row to find the max. (You can
> > optimize this with a custom coprocessor: have each regionserver return
> > the single row key with the largest number of versions of a column, and
> > select the max of all results at the client.)
> >
>
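For reference, a CellCounter invocation along the lines Ted suggests might look like this; the table name and output directory are placeholders, and the exact argument list should be checked against the usage message of your HBase version:

```shell
# Run CellCounter as a MapReduce job; among its output counters is the
# number of versions per qualifier, per row, from which the row with the
# most versions can be found.
# 'mytable' and the output path are placeholders.
hbase org.apache.hadoop.hbase.mapreduce.CellCounter \
  mytable /tmp/cellcounter-out

# Inspect the counters written by the job (standard MapReduce part file):
hdfs dfs -cat /tmp/cellcounter-out/part-r-00000 | head
```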


time out when running CellCounter

2018-08-25 Thread Antonio Si
Hi,

When I run org.apache.hadoop.hbase.mapreduce.CellCounter, I am getting
"Timed out after 600 secs". Is there a way to override the timeout value
rather than changing it in hbase-site.xml and restarting hbase?

Any suggestions would be helpful.

Thank you.

Antonio.


Re: time out when running CellCounter

2018-08-25 Thread Antonio Si
Thanks Ted.

I tried passing "-Dhbase.client.scanner.timeout.period=180" when I invoke
CellCounter, but it still times out after 600 secs.

Thanks.

Antonio.

On Sat, Aug 25, 2018 at 2:09 PM Ted Yu  wrote:

> It seems CellCounter doesn't have such a (command-line) option.
>
> You can specify, e.g., a scan timerange, scan max versions, start row,
> stop row, etc. so that each individual run has a shorter runtime.
>
> Cheers
>
>
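One hedged observation: a "Timed out after 600 secs" failure in a MapReduce job typically corresponds to mapreduce.task.timeout (default 600000 ms) rather than the HBase scanner timeout, and such properties can usually be overridden per-job with -D, without touching hbase-site.xml or restarting anything. A sketch (values and names are illustrative; confirm your version accepts generic -D options):

```shell
# Raise both the MapReduce task timeout and the scanner timeout for this
# run only; 'mytable' and the output path are placeholders.
hbase org.apache.hadoop.hbase.mapreduce.CellCounter \
  -Dmapreduce.task.timeout=1800000 \
  -Dhbase.client.scanner.timeout.period=1800000 \
  mytable /tmp/cellcounter-out
```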


question on reducing number of versions

2018-08-26 Thread Antonio Si
Hello,

I have a hbase table whose definition has a max number of versions set to
36000.
I have verified that there are rows which have more than 2 versions
saved.

Now, I change the definition of the table and reduce the max number of
versions to 18000. Should I see the size of the table being reduced? I am
not seeing that.

Also, after I reduce the max number of versions, I try to create a
snapshot, but I am getting a
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found;)


What may be the cause of that?

I am using s3 as my storage.


Thanks in advance for your suggestions.


Antonio.


Re: question on reducing number of versions

2018-08-26 Thread Antonio Si
Thanks Anil.

We are using hbase on s3. Yes, I understand 18000 is very high. We are in
the process of reducing it.

Suppose I restore a table from a snapshot; let's call this table t1.
I then clone another table from the same snapshot; call it t2.

If I reduce the max versions of t2 and run a major compaction on t2, will I
see a decrease in table size for t2? That is, if I compare the sizes of t2
and t1, should I see a smaller size for t2?

Thanks.

Antonio.

On Sun, Aug 26, 2018 at 3:33 PM Anil Gupta  wrote:

> You will need to run a major compaction on the table for it to clean up /
> delete the extra versions.
> Btw, 18000 max versions is an unusually high value.
>
> Are you using hbase on s3 or hbase on hdfs?
>
> Sent from my iPhone
>
>
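Anil's sequence — lower the max versions, then major-compact so the excess versions are actually dropped — would look roughly like this in the hbase shell (table and family names are placeholders):

```shell
# Reduce the max versions kept for family 'cf' on table 't2'
alter 't2', NAME => 'cf', VERSIONS => 18000

# Rewrite the store files, discarding versions beyond the new limit
major_compact 't2'
```

major_compact is asynchronous; only after the compaction actually completes should t2 shrink relative to t1.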


a table is neither disabled nor enabled

2018-08-29 Thread Antonio Si
Hi,

We have a table which is stuck in the FAILED_OPEN state, so we planned to
drop the table and re-clone it from an old snapshot. We disabled the
table, but the disable procedure has been running for more than 20 hrs.

In the hbase shell, I found that "is_disabled" and "is_enabled" both
return false. Is that normal behavior, since the table is in the middle of
being disabled?

Is it normal that the disable takes this many hours, given that the table
is large (about 33TB)?

Thanks.

Antonio.


Re: a table is neither disabled nor enabled

2018-08-29 Thread Antonio Si
Thanks Ted.

The log says "java.io.IOException: missing table descriptor for
ba912582f295f7ac0b83e7e419351602

[AM.ZK.Worker-pool2-t6552] master.RegionStates: Failed to open/close
ba912582f295f7ac0b83e7e419351602  set to FAILED_OPEN"


The version of hbase is 1.3.1


Thanks.


Antonio.

On Wed, Aug 29, 2018 at 2:28 PM Ted Yu  wrote:

> Do you have access to master / region logs for when FAILED_OPEN state was
> noticed ?
>
> There should be some hint there as to why some region couldn't open.
>
> The duration of a table DDL operation is related to the number of regions
> the table has, but it should be less related to the data amount.
>
> Which version of hbase are you using ?
>
> Thanks
>


Re: a table is neither disabled nor enabled

2018-08-29 Thread Antonio Si
Thanks Ted.
Now that the table is in neither the disabled nor the enabled state, will
the table eventually get disabled completely?
From the "Procedure" tab of the hbase ui, I see the "disable" is still
running.

Thanks.

Antonio.

On Wed, Aug 29, 2018 at 3:31 PM Ted Yu  wrote:

> The 'missing table descriptor' error should have been fixed by running hbck
> (with selected parameters).
>
> FYI
>


Re: a table is neither disabled nor enabled

2018-08-29 Thread Antonio Si
Forgot to mention that all regions of the table are offline now. Wondering
if the table will eventually get disabled, as the procedure has been
running for almost 24 hrs now.

Thanks.

Antonio.



Re: a table is neither disabled nor enabled

2018-08-29 Thread Antonio Si
Thanks Ted.

Antonio.

On Wed, Aug 29, 2018 at 4:00 PM Ted Yu  wrote:

> I doubt the procedure would finish, considering it has run for so long.
>
> You can check the tail of master log to see if it is stuck.
> If it is stuck, see if you can use abort_procedure.rb to stop.
>
> After the procedure is stopped, see if running hbck can fix the issue (I
> haven't worked with 1.3 release in production).
> When running hbck, run without -fix parameter first to see what
> inconsistencies hbck reports.
>
> Cheers
>
>
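Ted's advice maps onto commands roughly like the following for HBase 1.x: run the read-only report first, then apply a targeted fix. The fix flag shown is a sketch for the 'missing table descriptor' case; verify option names against `hbase hbck -h` on your version before running:

```shell
# Read-only: report inconsistencies, change nothing
hbase hbck

# Attempt to rebuild a missing table descriptor (.tableinfo file)
hbase hbck -fixTableOrphans
```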


question on snapshot and export utility

2018-09-05 Thread Antonio Si
Hi,

When taking a snapshot or running the export utility, is it possible to
specify a condition or filter on some columns so that only rows that
satisfy the condition will be included in the snapshot or exported?

Thanks.

Antonio.


Re: question on snapshot and export utility

2018-09-06 Thread Antonio Si
Thanks Vlad. I will take a look.

Antonio.

On Wed, Sep 5, 2018 at 12:15 PM Vladimir Rodionov 
wrote:

> No, it is not, to the best of my knowledge. ExportSnapshot just moves
> files to a new destination using an M/R job.
> But you can do the custom filtering yourself. Look at the ExportSnapshot
> implementation. All you need is a new Mapper which does the required
> filtering of an HFile before moving the data to the destination.
>
> -Vlad
>
>
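For the unfiltered case, ExportSnapshot is invoked roughly as below (snapshot name and destination are placeholders); Vlad's suggestion amounts to replacing its mapper with one that filters cells while copying:

```shell
# Copy an existing snapshot's files to another filesystem via MapReduce.
# 'my_snapshot' and the S3 bucket are placeholders.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my_snapshot \
  -copy-to s3://my-bucket/hbase/ \
  -mappers 16
```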


questions regarding hbase major compaction

2018-09-10 Thread Antonio Si
Hello,

As I understand it, deleted records in hbase files do not get removed until
a major compaction is performed.

I have a few questions regarding major compaction:

1. If I set a TTL and/or a max number of versions, records that are older
   than the TTL, or expired versions, will still be in the hbase files
   until a major compaction is performed? Is my understanding correct?

2. If a major compaction is never performed on a table, besides the size of
   the table growing, eventually we will have too many hbase files and the
   cluster will slow down. Are there any other implications?

3. Are there any guidelines on how often we should run major compaction?

4. During major compaction, do we need to pause all read/write operations
   until the major compaction is finished?

   I realize that when using S3 as the storage, after I run a major
   compaction there are inconsistencies between the S3 metadata and the S3
   file system, and I need to run an "emrfs sync" to synchronize them after
   the major compaction is completed. Does that mean I need to pause all
   read/write operations during this period?

Thanks.

Antonio.
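On question 3, a common pattern (not from this thread; stated as general practice) is to disable periodic major compaction by setting hbase.hregion.majorcompaction to 0 and trigger it manually during off-peak hours. From the hbase shell that looks like this (table/family names are placeholders):

```shell
# Trigger a major compaction on the whole table (runs asynchronously)
major_compact 'mytable'

# Or major-compact a single column family
major_compact 'mytable', 'cf'
```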


check if column family has any data

2018-11-13 Thread Antonio Si
Hi,

Is there an easy way to check if a column family of a hbase table has any
data?

I tried something like "scan '', { LIMIT => 10,
FILTER=>"FamilyFilter(=, 'binary:')" }" in the hbase shell and it
timed out. I guess that's because my table has 15TB of data. So, I am
guessing that particular family has no data, but I need a way to confirm
that.

Any pointers would be appreciated.

Thanks.

Antonio.
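A cheaper probe than a FamilyFilter over the whole table, assuming the family name is known: restrict the scan to that family with COLUMNS and LIMIT 1, so HBase reads only that family's store files instead of filtering every cell of every family (names below are placeholders):

```shell
# Returns at most one row that has any cell in family 'cf';
# zero rows back suggests the family holds no data.
scan 'mytable', { COLUMNS => 'cf', LIMIT => 1 }
```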


question on column families

2018-11-13 Thread Antonio Si
Hi,

I would like to confirm my understanding.

Let's say I have 13 column families in a hbase table. 11 of those column
families have no data, while 2 column families have a large amount of data.

My understanding is that the size of the memstore, which is 128M in my env,
will be shared across all column families, even the column families that
have no data. Is my understanding correct?

Thanks in advance.

Antonio.


Re: question on column families

2018-11-13 Thread Antonio Si
Thanks Allan.

Then why is having too many column families a problem? If there are column
families with no data, would that cause any issues?

Thanks.

Antonio.

On Tue, Nov 13, 2018 at 7:09 PM Allan Yang  wrote:

> No. Every column family has its own memstore; each one is 128MB in your
> case. When flushing, the flusher will choose those memstores that satisfy
> certain conditions, so it is possible that not every column family (Store)
> will flush its memstore.
> Best Regards
> Allan Yang
>
>
>
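Following Allan's point, if the 11 empty families will never hold data, removing them avoids carrying per-family memstore and store overhead. A sketch in the hbase shell (names are placeholders; on some versions the alter requires disabling the table first, as shown):

```shell
disable 'mytable'
alter 'mytable', 'delete' => 'cf_unused'  # drop an unused column family
enable 'mytable'
```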