Re: Problem to insert the row that I deleted
Your only chance is to run a major compaction on your table - that will get rid of the delete marker. Then you can re-add the Put with the same TS.

-- Lars

ps. Rereading my email below... At some point I will learn to proof-read my emails before I send them full of grammatical errors.

----- Original Message -----
From: Mahdi Negahi negahi.ma...@hotmail.com
To: Hbase user@hbase.apache.org
Sent: Tuesday, April 24, 2012 10:46 PM
Subject: RE: Problem to insert the row that I deleted

Thanks for sharing. So there is no solution for bringing back the row (or cells/columns)?

Date: Tue, 24 Apr 2012 22:39:49 -0700
From: lhofha...@yahoo.com
Subject: Re: Problem to insert the row that I deleted
To: user@hbase.apache.org

Rows (or rather cells/columns) are not actually deleted. Instead they are marked for deletion by a delete marker. The deleted cells are collected during the next major or minor compaction. As long as the marker exists, a new Put with the same timestamp as the existing Put will be affected by the delete marker. The delete marker itself will exist until the next major compaction. This might seem strange, but it is actually an important feature of HBase, as it allows operations to be executed in any order with the same end result.

-- Lars

From: Mahdi Negahi negahi.ma...@hotmail.com
To: Hbase user@hbase.apache.org
Sent: Tuesday, April 24, 2012 9:05 PM
Subject: Problem to insert the row that I deleted

I deleted a row and now I want to add the same row (with the same timestamp) to HBase, but it is not added to the table. I know that if I change the timestamp it will be added, but it is necessary to add it with the same timestamp. Please advise me: where is my problem?

Regards,
Mahdi
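To make Lars's recipe concrete, here is a minimal sketch using the 0.90/0.92-era Java client. The table, family, and value names are hypothetical, and note that flush and majorCompact are asynchronous, so a real program has to wait for them to finish:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ReinsertWithSameTimestamp {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "t1");      // hypothetical table name
        byte[] row = Bytes.toBytes("r1");
        byte[] cf  = Bytes.toBytes("f1");           // hypothetical column family
        byte[] col = Bytes.toBytes("c1");
        long ts = 1000L;                            // the timestamp we must preserve

        Put p = new Put(row);
        p.add(cf, col, ts, Bytes.toBytes("v1"));
        table.put(p);
        table.delete(new Delete(row));              // writes a delete marker (tombstone)

        // Re-adding with the same timestamp is masked by the marker:
        table.put(p);
        System.out.println(table.get(new Get(row)).isEmpty());  // prints true

        // Flush, then major-compact to drop the marker. Both calls return
        // immediately; the sleep is a crude stand-in for waiting properly.
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.flush("t1");
        admin.majorCompact("t1");
        Thread.sleep(30000);

        table.put(p);                               // now the same Put sticks
        System.out.println(table.get(new Get(row)).isEmpty());  // prints false
      }
    }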
Re: Problem to insert the row that I deleted
As Lars mentioned, the row is not physically deleted. What HBase does is insert a cell called a tombstone, which masks the deleted value, but the value itself is still there. (If the deleted value is in the same memstore as the tombstone, it will be deleted in the memstore, so you will not find the tombstone and the deleted value in the same HFile.) This is new in HBase 0.92.0; in the previous 0.90.* releases, both the tombstone and the deleted value end up in the HFile. If you want to read your deleted data, you can read the HFile on the server side, which works with the 0.90.* versions. If you just read the table content on the client side, I am afraid you have to first run the major compaction and then reinsert your deleted data.

Regards!

Yong

On Wed, Apr 25, 2012 at 8:14 AM, lars hofhansl lhofha...@yahoo.com wrote: [...]
Re: Hbase Quality Of Service: large standard deviation in insert time while inserting same type of rows in Hbase
I guess Sesame Street isn't global... ;-) Oh, and of course I f'd the joke by saying Grover and not Oscar, so it's my bad. :-( [Google "Oscar the Grouch" and you'll understand the joke that I botched.]

It's most likely GC and a mistuned cluster. The OP doesn't really go into detail, except to say that his cluster is tiny. Yes, size does matter, regardless of those rumors to the contrary... 3 DNs is kinda small. If he's splitting that often then his region size is too small; hot-spotting and other things can impact performance, however not in the way he described. Also, when you look at performance, look at reads, not writes. You can cache both, and writes are less important than reads. (Think about it.)

Since this type of conversation keeps popping up, it would be a good topic for Strata in NY. (Not too subtle of a hint to those who are picking topics...) Good cluster design is important, more important than people think.

Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 25, 2012, at 12:08 AM, Mikael Sitruk mikael.sit...@gmail.com wrote:

1. Writes are not blocked during compaction.
2. Compaction cannot take a constant time, since the files/regions are getting bigger.
3. Besides the GC pauses (which seem to be the best candidate here) on either the client or the RS (what are your settings, BTW, and the data size per insert), did you presplit your regions, or is a split occurring during the execution? (See the pre-split sketch after this message.)
4. Did you look at the logs? Is any operation taking too long there? (In 0.92 you can configure HBase to log any operation that takes a long time.)

Regards, Mikael.S

On Wed, Apr 25, 2012 at 4:58 AM, Michael Segel michael_se...@hotmail.com wrote:

Have you thought about Garbage Collection? -Grover

Sent from my iPhone

On Apr 24, 2012, at 12:41 PM, Skchaudhary schoudh...@ivp.in wrote:

I have a clustered HBase set-up with 3 Region Servers. There is a table which has 27 regions, equally distributed among the 3 Region Servers: 9 regions per Region Server.

Region server 1 has regions 1-9
Region server 2 has regions 10-18
Region server 3 has regions 19-27

Now when I start a program which inserts rows into region 1 and region 5 (both under Region Server 1) alternately and on a continuous basis, I see that the insert time for each row is not constant or consistent: there is a lot of variance, i.e. the standard deviation of the insert time is quite large. Sometimes it takes 2 ms to insert a row, sometimes 3 ms, sometimes 1000 ms, and sometimes even 3000 ms, even though the data size of every row is equal.

I understand that writes are blocked due to flushing and compaction of regions, but then they should not be blocked for a large span of time, and the blockage time should be consistent for every flush/compaction (minor compaction). All in all, every flush and compaction should take nearly the same time.

For our application we need a consistent quality of service, and if not perfect, at least some well-visible boundary lines, e.g. each row insert will take 0 to 10 ms and never more than 10 ms (just an example), even when a minor compaction or flush occurs. Is there any setting/configuration which I should try? Any ideas on how to achieve this in HBase? Any help would be really appreciated. Thanks in advance!!
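The pre-split referenced in Mikael's point 3 can be done at table-creation time. A minimal sketch with the era's Java admin API; the table name, family, and split points are hypothetical and would have to match the real key distribution:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PresplitTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("mytable"); // hypothetical
        desc.addFamily(new HColumnDescriptor("f1"));             // hypothetical

        // Creating the table pre-split means regions do not have to split
        // (and briefly stall writes) while the insert job is running.
        byte[][] splits = new byte[][] {
            Bytes.toBytes("key10"), Bytes.toBytes("key19")       // hypothetical keys
        };
        admin.createTable(desc, splits);
      }
    }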
Re: Problem to insert the row that I deleted
Uhm... Not exactly, Lars... Just my $0.02...

While I don't disagree w/ Lars, I think the question you have to ask is: why is the timestamp important? Is it an element of the data or is it an artifact? This kind of gets into your schema design and taking shortcuts. You may want to instead create a data element or column containing the timestamp rather than rely on an HBase internal timestamp.

Or you could increase the existing timestamp by 1 ms... ;-) (Blame it on clock drift in your cluster? Of course we don't know the significance of the timestamp... or how often the row is un/re-deleted... 1000 times and you'd be off by a whole second.)

-Just saying... :-)

Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 25, 2012, at 1:14 AM, lars hofhansl lhofha...@yahoo.com wrote: [...]
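A minimal sketch of Mike's first suggestion: store the timestamp that matters to the application as an ordinary column value, so it no longer collides with HBase's internal version timestamp on delete/re-insert. All names and the example value are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TimestampAsColumn {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "t1");              // hypothetical table

        long eventTime = 1335337549000L;                    // the TS the app cares about
        Put p = new Put(Bytes.toBytes("r1"));
        // Keep the meaningful timestamp as data...
        p.add(Bytes.toBytes("f1"), Bytes.toBytes("event_ts"), Bytes.toBytes(eventTime));
        // ...and let HBase assign its own cell timestamp to the payload.
        p.add(Bytes.toBytes("f1"), Bytes.toBytes("payload"), Bytes.toBytes("value"));
        table.put(p);
      }
    }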
Regions not cleared
Hi, I have set a TTL on an HBase table, due to which the data is cleared after the specified time, but the regions are not removed even though the data inside them is cleared. Can someone please let me know if I am missing anything? Thanks, Ajay
hbase installation
Hi, I am new to HBase and Hadoop. I want to install HBase and work with it, writing MapReduce jobs for data in HBase. I installed HBase. It works well in standalone mode, but the master and ZooKeeper don't start properly in pseudo-distributed mode. Kindly help me resolve this problem. Thanks
Re: hbase installation
Any error message?

On Wed, Apr 25, 2012 at 7:02 PM, shehreen shehreen_cute...@hotmail.com wrote: [...]

--
Nitin Pawar
Re: Hbase Quality Of Service: large standard deviation in insert time while inserting same type of rows in Hbase
Hi there. In addition to what was said about GC, you might want to double-check this:

http://hbase.apache.org/book.html#performance

...as well as this case study on performance troubleshooting:

http://hbase.apache.org/book.html#casestudies.perftroub

On 4/24/12 9:58 PM, Michael Segel michael_se...@hotmail.com wrote: [...]
Re: Integrity constraints
Thank you, Gary! Now I understand the actual method.

On Wed, Apr 25, 2012 at 11:36 AM, Gary Helmling ghelml...@gmail.com wrote:

Hi Vamshi,

See the ConstraintProcessor coprocessor that was added for just this kind of case:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/constraint/package-summary.html

You would need to implement the Constraint interface and apply the configuration to your tables via the Constraints utility. Assuming the fields are being handled as strings on the client end, your Constraint implementation could simply call Bytes.toString() and apply some basic regexes for validation. Or you could consider using a more structured serialization format like protobufs.

--gh

On Tue, Apr 24, 2012 at 9:35 PM, Vamshi Krishna vamshi2...@gmail.com wrote:

Hi all, I have one basic doubt about constraints on an HBase table, knowing that there is no concept of data types in HBase and everything is stored as bytes. Suppose a table in HBase has 3 columns (under the same column family): the 1st column is 'Name', which should accept only character strings, not numbers or special symbols; the 2nd column is 'phoneNumber', which should be numerals, exactly 10 digits; and the 3rd column is 'city', which should accept only upper-case character strings. If such is the situation, how do I enforce the constraints on each of the columns of the HBase table? Also, can anybody please tell me how to write the equivalent query in the HBase shell and in Java to do so?

Regards,
Vamshi Krishna
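A minimal sketch of the approach Gary describes, under the assumption that the client writes these fields as strings; the family name, qualifier, and regex are illustrative, not part of the constraint API itself:

    import java.util.List;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.constraint.Constraint;
    import org.apache.hadoop.hbase.constraint.ConstraintException;
    import org.apache.hadoop.hbase.util.Bytes;

    // Rejects any Put whose 'phoneNumber' column is not exactly 10 digits.
    public class PhoneNumberConstraint extends Configured implements Constraint {
      private static final byte[] CF    = Bytes.toBytes("cf");          // illustrative
      private static final byte[] PHONE = Bytes.toBytes("phoneNumber"); // illustrative

      @Override
      public void check(Put p) throws ConstraintException {
        List<KeyValue> kvs = p.get(CF, PHONE);
        for (KeyValue kv : kvs) {
          String value = Bytes.toString(kv.getValue());
          if (!value.matches("\\d{10}")) {
            throw new ConstraintException("not a 10-digit phone number: " + value);
          }
        }
      }
    }

It would then be wired onto the table descriptor via the Constraints utility Gary mentions, along the lines of Constraints.add(desc, PhoneNumberConstraint.class), before creating or altering the table.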
Re: Problem to insert the row that I deleted
Thanks yonghu. That is HBASE-4241. One small point: the deleted rows are not deleted from the memstore, but rather not included when the memstore is flushed to disk.

-- Lars

----- Original Message -----
From: yonghu yongyong...@gmail.com
To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
Sent: Wednesday, April 25, 2012 1:10 AM
Subject: Re: Problem to insert the row that I deleted

[...]
0.20 to 0.90 upgrade
As per the docs, it looks painless to upgrade from 0.20.3 to 0.90 (you only need to run the upgrade script if upgrading to 0.92): http://hbase.apache.org/book/upgrading.html#upgrade0.90

Does anyone have experience upgrading from 0.20 to 0.90, or something similar with a major upgrade? Do we need to upgrade Hadoop (0.20.1) as well, or can 0.20.1 work with 0.90 HBase? Trying to minimize the impact with one upgrade at a time... any help appreciated.

Thanks in advance,
David
Re: hbase installation
Just follow this: http://hbase.apache.org/book/standalone_dist.html

On Wed, Apr 25, 2012 at 7:05 PM, Nitin Pawar nitinpawar...@gmail.com wrote: [...]

--
∞ Shashwat Shriparv
Re: hbase installation
Change 127.0.1.1 in your /etc/hosts file to 127.0.0.1. Also add the hadoop-core.jar from the hadoop folder, and commons-configuration.jar from hadoop/lib, to the hbase/lib folder.

On Apr 25, 2012 11:59 PM, shashwat shriparv dwivedishash...@gmail.com wrote: [...]
Re: hbase installation
Check this out too; it seems to make it work. Do what Tariq has suggested as well: http://ria101.wordpress.com/2010/01/28/setup-hbase-in-pseudo-distributed-mode-and-connect-java-client/

On Thu, Apr 26, 2012 at 1:05 AM, Mohammad Tariq donta...@gmail.com wrote: [...]

--
∞ Shashwat Shriparv
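For what it's worth, the pseudo-distributed setups in the guides linked above boil down to a small hbase-site.xml along these lines. The HDFS address and port depend on your Hadoop configuration, so treat the values as placeholders:

    <configuration>
      <!-- Where HBase stores its data; must match the NameNode address/port. -->
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://localhost:9000/hbase</value>
      </property>
      <!-- false = standalone; true = (pseudo-)distributed with separate daemons. -->
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <!-- Pseudo-distributed: ZooKeeper runs on the same host. -->
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>localhost</value>
      </property>
    </configuration>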
Re: 0.20 to 0.90 upgrade
On Wed, Apr 25, 2012 at 11:14 AM, David Charle dbchar2...@gmail.com wrote: [...]

Hello David. I would not call the 0.20 to 0.90 migration 'painless'. 'Well-exercised' and 'awkward but works' come more to mind. All of us long-term users have come out on the other side of this step. I would suggest you follow the instructions closely to minimize headache. You should be OK going from 0.20.3 to 0.90.x.

St.Ack
Re: TIMERANGE performance on uniformly distributed keyspace
Hi,

On 2012-04-14 at 21:07, Rob Verkuylen wrote:

> As far as I understand, sequential keys with a time-range scan give the best read performance possible, because of the HFile metadata, just as N indicates. Maybe adding Bloom filters can further improve the performance.

As far as I understand it, Bloom filters are only useful for lookups based on row key (and possibly column name), not for any time-related lookups.

> Still, in my case with random keys I get a quick (sub-second) response from my scan example earlier. Does HBase keep all the HFile metadata in memory? I can't imagine it will start hitting hundreds, potentially thousands of HFiles, reading their metadata, full-scanning the files and returning rows. Does it?

What does quick response mean here? Is it the response time for the first batch of results? This can be quite low if the scan finds rows that match your scan criteria in a region/HFile at the start of the scanned range (e.g. at the beginning of the table). Did you also measure the time for the complete scan to complete (and the load it causes on your cluster), and relate it to the performance of a sequential scan over a secondary index table with monotonically increasing keys (and the load that causes on your cluster, since the index has to be maintained and written to a single region server)?

-- Wouter
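To make Wouter's measurement point concrete, a minimal sketch that times a full time-range scan rather than just the first batch; the table name and timestamp bounds are placeholders:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class TimeRangeScan {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // hypothetical table
        Scan scan = new Scan();
        // HFiles whose metadata places them entirely outside this range
        // can be skipped; everything else must still be read.
        scan.setTimeRange(1334300000000L, 1334400000000L);

        long start = System.currentTimeMillis();
        ResultScanner scanner = table.getScanner(scan);
        int n = 0;
        for (Result r : scanner) n++;                 // drain the complete scan
        scanner.close();
        System.out.println(n + " rows in "
            + (System.currentTimeMillis() - start) + " ms");
      }
    }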
Re: Regions not cleared
Hi,

As far as I know, TTL, like deletions, only takes effect on major compaction (see http://hbase.apache.org/book.html#regions.arch, section 8.7.5.5).

Regards,
Christian

From: ajay.bhosle ajay.bho...@zapak.co.in
To: user@hbase.apache.org
Sent: Wednesday, 25 April 2012, 14:33
Subject: Regions not cleared

[...]
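A hedged sketch of both halves of that answer: TTL is a column-family property (in seconds), and expired cells are only physically dropped when a major compaction rewrites the store files. The table and family names are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class TtlExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // TTL is set per column family, in seconds.
        HColumnDescriptor cf = new HColumnDescriptor("f1");   // hypothetical
        cf.setTimeToLive(7 * 24 * 3600);                      // expire after a week
        HTableDescriptor desc = new HTableDescriptor("t1");   // hypothetical
        desc.addFamily(cf);
        admin.createTable(desc);

        // Expired cells are removed when a major compaction runs (or can
        // be forced); the regions themselves are never removed automatically.
        admin.majorCompact("t1");
      }
    }

Note that even after expiry the now-empty regions stay in place; in this generation of HBase, merging them away was, as far as I know, an explicit offline step (the Merge utility), not something TTL triggers.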
Re: HBase, CDH3U2, EC2
We use EC2 and CDH as well, and have around 80 Hadoop/HBase nodes deployed across a few different clusters. We use a combination of Puppet for package management and Fabric scripts for pushing configs and managing services. Our base AMI is a pretty bare CentOS 6 install, and Puppet handles most of the rest after spinning up. Puppet also worked fine for managing configs, until we started having many clusters with different setups; that's the point at which we moved to Fabric for that. There is certainly an investment required for setting this stuff up initially, but it pays off as you continually need to spin up replacements or new nodes. We can do that with only a couple of minutes of work at this point.

Sent from iPhone.

On Apr 26, 2012, at 1:12 AM, Something Something mailinglist...@gmail.com wrote:

Hello, we have a Hadoop cluster running on EC2 with Cloudera's hadoop-0.20.2-cdh3u2 distribution. We are now ready to install HBase on it and are trying to figure out the best way to accomplish this. We have quite a few machines in the cluster, so installing HBase on each machine would be time-consuming, but if that's the only way, we can do it by creating our own RPMs. Is this document the best resource: https://ccp.cloudera.com/display/CDHDOC/HBase+Installation#HBaseInstallation-InstallingHBase ? Are there EC2 scripts that work with Cloudera's distribution to make this process easier? Please help. Thanks.