how to get rowkey with largest number of versions
Hi,

I am new to HBase. I am wondering how I can find out which row key has the
largest number of versions in a column family. Any pointers would be very
helpful.

Thanks.

Antonio.
Re: how to get rowkey with largest number of versions
Thanks for all the info. I will give it a try.

On Wed, Aug 22, 2018 at 12:13 PM, Ted Yu wrote:
> Antonio:
> Please take a look at CellCounter under the hbase-mapreduce module, which
> may be of use to you. Among its outputs:
>
> * 6. Total number of versions of each qualifier.
>
> Please note that the max versions may fluctuate depending on when major
> compaction kicks in.
>
> FYI
>
> On Wed, Aug 22, 2018 at 11:53 AM Ankit Singhal wrote:
> > I don't think there is any direct way.
> > You may need to do a raw scan of the full table and count the number of
> > versions of a column returned for each row to calculate the max. (You
> > can optimize this with a custom coprocessor by returning a single row
> > key having the largest number of versions of a column from each
> > regionserver, and at the client select the max out of all results.)
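[Editor's note] The two approaches suggested above can be sketched from the command line. This is a hedged sketch, not from the thread: the table name `mytable`, column `cf:q`, and output path are placeholders.

```shell
# Approach 1 (Ted): run CellCounter, which among other stats reports the
# total number of versions of each qualifier.
# Usage: CellCounter <tablename> <outputDir> [reportSeparator]
hbase org.apache.hadoop.hbase.mapreduce.CellCounter mytable /tmp/cellcounter-out

# Approach 2 (Ankit): a raw scan returns every stored version, so the cell
# count per row can be tallied client-side. In hbase shell, RAW => true with
# a high VERSIONS cap surfaces all versions of a column:
echo "scan 'mytable', {RAW => true, VERSIONS => 2147483647, COLUMNS => ['cf:q']}" \
  | hbase shell
```

Note that, as Ted points out, the counts can change whenever major compaction removes excess versions.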
time out when running CellCounter
Hi,

When I run org.apache.hadoop.hbase.mapreduce.CellCounter, I am getting
"Timed out after 600 secs". Is there a way to override the timeout value
rather than changing it in hbase-site.xml and restarting HBase?

Any suggestions would be helpful.

Thank you.

Antonio.
Re: time out when running CellCounter
Thanks Ted. I tried passing "-Dhbase.client.scanner.timeout.period=180"
when I invoke CellCounter, but it still says timed out after 600 secs.

Thanks.

Antonio.

On Sat, Aug 25, 2018 at 2:09 PM Ted Yu wrote:
> It seems CellCounter doesn't have such a (command-line) option.
>
> You can specify, e.g., scan timerange, scan max versions, start row, stop
> row, etc. so that an individual run has a shorter runtime.
>
> Cheers
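[Editor's note] "Timed out after 600 secs" matches the MapReduce task timeout (`mapreduce.task.timeout`, default 600000 ms) rather than the HBase scanner timeout, and both properties take milliseconds, so a value of 180 would mean 180 ms. A hedged sketch of overriding both per invocation (table and output path are placeholders):

```shell
# Both timeouts are in milliseconds; -D generic options go before the
# positional arguments.
hbase org.apache.hadoop.hbase.mapreduce.CellCounter \
  -Dmapreduce.task.timeout=1800000 \
  -Dhbase.client.scanner.timeout.period=1800000 \
  mytable /tmp/cellcounter-out
```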
question on reducing number of versions
Hello,

I have an HBase table whose definition has a max number of versions set to
36000. I have verified that there are rows which have more than 2 versions
saved.

Now, I change the definition of the table and reduce the max number of
versions to 18000. Should I see the size of the table being reduced? I am
not seeing that.

Also, after I reduce the max number of versions, I try to create a
snapshot, but I am getting a
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found;

What may be the cause of that? I am using S3 as my storage.

Thanks in advance for your suggestions.

Antonio.
Re: question on reducing number of versions
Thanks Anil. We are using HBase on S3.

Yes, I understand 18000 is very high. We are in the process of reducing it.

Suppose I have a snapshot and I restore a table from this snapshot; call
this table t1. I then clone another table from the same snapshot; call it
t2. If I reduce the max versions of t2 and run a major compaction on t2,
will I see the decrease in table size for t2? If I compare the sizes of t2
and t1, should I see a smaller size for t2?

Thanks.

Antonio.

On Sun, Aug 26, 2018 at 3:33 PM Anil Gupta wrote:
> You will need to run a major compaction on the table for it to
> clean/delete the extra versions.
> Btw, 18000 max versions is an unusually high value.
>
> Are you using HBase on S3 or HBase on HDFS?
>
> Sent from my iPhone
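[Editor's note] The sequence Anil describes can be sketched in hbase shell. This is a hedged sketch: the family name `cf` and the default HDFS data paths are placeholders, and on S3 the paths would differ.

```shell
hbase shell <<'EOF'
# Reduce retained versions on the clone t2, then force a major compaction
# so store files are rewritten and the excess versions are physically dropped.
alter 't2', {NAME => 'cf', VERSIONS => 18000}
major_compact 't2'
EOF

# Afterwards, compare the on-disk size of the two clones (paths shown are
# the HDFS defaults; substitute the S3 equivalents as appropriate):
hdfs dfs -du -s -h /hbase/data/default/t1 /hbase/data/default/t2
```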
a table is neither disabled nor enabled
Hi,

We have a table which is stuck in FAILED_OPEN state, so we planned to drop
the table and re-clone it from an old snapshot. We disabled the table, but
the disable procedure has been running for more than 20 hrs.

I went to hbase shell and found that "is_disabled" and "is_enabled" both
return false. Is that normal behavior, since the table is in the middle of
being disabled?

Is it normal for the disable to take that many hours, even for a table
this large (about 33 TB)?

Thanks.

Antonio.
Re: a table is neither disabled nor enabled
Thanks Ted.

The log says:
"java.io.IOException: missing table descriptor for
ba912582f295f7ac0b83e7e419351602
[AM.ZK.Worker-pool2-t6552] master.RegionStates: Failed to open/close
ba912582f295f7ac0b83e7e419351602 set to FAILED_OPEN"

The version of HBase is 1.3.1.

Thanks.

Antonio.

On Wed, Aug 29, 2018 at 2:28 PM Ted Yu wrote:
> Do you have access to master / region logs for when the FAILED_OPEN state
> was noticed?
>
> There should be some hint there as to why some region couldn't open.
>
> The length of the table DDL is related to the number of regions the table
> has, but it should be less related to the data amount.
>
> Which version of hbase are you using?
>
> Thanks
Re: a table is neither disabled nor enabled
Thanks Ted.

Now that the table is in neither the disabled nor the enabled state, will
the table eventually get disabled completely?

From the "Procedures" tab of the HBase UI, I see the "disable" is still
running.

Thanks.

Antonio.

On Wed, Aug 29, 2018 at 3:31 PM Ted Yu wrote:
> The 'missing table descriptor' error should have been fixed by running
> hbck (with selected parameters).
>
> FYI
Re: a table is neither disabled nor enabled
Forgot to mention that all regions of the table are offline now. Wondering
if the table will eventually get disabled, as the procedure has been
running for almost 24 hrs now.

Thanks.

Antonio.
Re: a table is neither disabled nor enabled
Thanks Ted.

Antonio.

On Wed, Aug 29, 2018 at 4:00 PM Ted Yu wrote:
> I doubt the procedure will finish, considering it has run for so long.
>
> You can check the tail of the master log to see if it is stuck.
> If it is stuck, see if you can use abort_procedure.rb to stop it.
>
> After the procedure is stopped, see if running hbck can fix the issue (I
> haven't worked with the 1.3 release in production).
> When running hbck, run without the -fix parameter first to see what
> inconsistencies hbck reports.
>
> Cheers
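[Editor's note] Ted's recovery sequence can be sketched as follows. This is a hedged sketch: the procedure id is a placeholder, and on HBase 1.2+ `abort_procedure` is also available directly as a shell command.

```shell
# Find the stuck disable procedure's id (also visible in the Master UI
# "Procedures" tab), then abort it.
hbase shell <<'EOF'
list_procedures
abort_procedure 12345   # placeholder id taken from list_procedures output
EOF

# Run hbck in report-only mode first to see what inconsistencies it finds;
# only add -fix* options after reviewing the report.
hbase hbck
```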
question on snapshot and export utility
Hi,

When taking a snapshot or running the export utility, is it possible to
specify a condition or filter on some columns so that only rows that
satisfy the condition will be included in the snapshot or exported?

Thanks.

Antonio.
Re: question on snapshot and export utility
Thanks Vlad. I will take a look.

Antonio.

On Wed, Sep 5, 2018 at 12:15 PM Vladimir Rodionov wrote:
> No, it is not, to the best of my knowledge. ExportSnapshot just moves
> files to a new destination using an M/R job.
> But you can do the custom filtering yourself. Look at the ExportSnapshot
> implementation. All you need is a new Mapper which does the required
> filtering of an HFile before moving data to the destination.
>
> -Vlad
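[Editor's note] A partial alternative to the custom Mapper Vlad describes: the Export tool (distinct from ExportSnapshot) can narrow what is exported by version count and time range through its positional arguments, though arbitrary column-value filtering still requires custom code. A hedged sketch; the table name, output path, and timestamps are placeholders:

```shell
# Usage: Export [-D <property=value>]* <tablename> <outputdir>
#               [<versions> [<starttime> [<endtime>]]]
# Here: export 1 version per cell, limited to a timestamp range (epoch ms).
hbase org.apache.hadoop.hbase.mapreduce.Export mytable /tmp/export-out 1 \
  1535000000000 1536000000000
```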
questions regarding hbase major compaction
Hello,

As I understand it, deleted records in HBase files do not get removed
until a major compaction is performed. I have a few questions regarding
major compaction:

1. If I set a TTL and/or a max number of versions, records that are older
than the TTL, or versions beyond the max, will still be in the HBase files
until a major compaction is performed. Is my understanding correct?

2. If a major compaction is never performed on a table, besides the size
of the table increasing, eventually we will have too many HBase files and
the cluster will slow down. Are there any other implications?

3. Are there any guidelines on how often we should run major compaction?

4. During major compaction, do we need to pause all read/write operations
until the major compaction is finished? I notice that when using S3 as the
storage, after I run a major compaction there are inconsistencies between
the S3 metadata and the S3 file system, and I need to run "emrfs sync" to
synchronize them after the major compaction completes. Does that mean I
need to pause all read/write operations during this period?

Thanks.

Antonio.
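[Editor's note] For questions 1 and 3, a hedged hbase shell sketch; the table and family names are placeholders:

```shell
hbase shell <<'EOF'
# TTL and VERSIONS are enforced logically at read time (expired cells are
# filtered from results), but the expired cells remain in the HFiles until
# a major compaction rewrites them.
alter 'mytable', {NAME => 'cf', TTL => 604800, VERSIONS => 3}  # 7-day TTL

# Trigger a major compaction manually; many operators schedule this
# off-peak rather than relying on the periodic
# hbase.hregion.majorcompaction interval.
major_compact 'mytable'
EOF
```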
check if column family has any data
Hi,

Is there an easy way to check if a column family of an HBase table has any
data?

I tried something like
  scan '', { LIMIT => 10, FILTER => "FamilyFilter(=, 'binary:')" }
in hbase shell and it timed out. I guess that's because my table has 15 TB
of data. So I am guessing that particular family has no data, but I need a
way to confirm that.

Any pointers would be appreciated.

Thanks.

Antonio.
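[Editor's note] A cheaper probe than a FamilyFilter over the whole table is to restrict the scan to the family in question and stop at the first hit. A hedged sketch; `mytable` and `cf` are placeholders:

```shell
hbase shell <<'EOF'
# Restricting COLUMNS to one family means only that family's store files
# are read, and LIMIT => 1 stops at the first row with any cell in it.
# An empty result (0 rows) suggests the family holds no data.
scan 'mytable', {COLUMNS => ['cf'], LIMIT => 1}
EOF
```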
question on column families
Hi,

I would like to confirm my understanding.

Let's say I have 13 column families in an HBase table. 11 of those column
families have no data, while 2 column families have a large amount of data.

My understanding is that the memstore, which is 128 MB in my environment,
will be shared across all column families even though some column families
have no data. Is my understanding correct?

Thanks in advance.

Antonio.
Re: question on column families
Thanks Allan. Then why is it a problem to have too many column families?
If there are column families with no data, would that cause any issues?

Thanks.

Antonio.

On Tue, Nov 13, 2018 at 7:09 PM Allan Yang wrote:
> No. Every column family has its own memstore; each one is 128 MB in your
> case. When flushing, the flusher will choose those memstores which
> satisfy certain conditions, so it is possible that not every column
> family (Store) will flush its memstore.
> Best Regards
> Allan Yang
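[Editor's note] Following Allan's point that each family is a separate Store with its own memstore and HFiles, unused families still add per-family bookkeeping on every region. Dropping them is the usual cleanup; a hedged sketch with placeholder names (`mytable`, `cf_unused`), taking the conservative disable-first path:

```shell
hbase shell <<'EOF'
# Drop an empty/unused column family. Disabling first is the conservative
# route; ensure nothing is writing to the family before deleting it.
disable 'mytable'
alter 'mytable', 'delete' => 'cf_unused'
enable 'mytable'
EOF
```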