Re: Problem to Insert the row that i was deleted

2012-04-25 Thread lars hofhansl
Your only chance is to run a major compaction on your table - that will get rid 
of the delete marker. Then you can re-add the Put with the same TS.
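
To make that concrete, here is a rough sketch against the 0.90-era Java
client. The table, family, qualifier, and timestamp below are placeholders,
and majorCompact() is asynchronous, so the fixed sleep is only a stand-in
for properly polling compaction state:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ReinsertAfterMajorCompact {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Ask for a major compaction to purge the delete marker. The request
    // is asynchronous, so wait for it to finish before re-inserting.
    admin.majorCompact("mytable");
    Thread.sleep(60 * 1000);  // crude; poll region status in real code

    // Once the marker is gone, a Put with the original timestamp sticks.
    HTable table = new HTable(conf, "mytable");
    long originalTs = 1335300000000L;  // placeholder: the deleted cell's TS
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), originalTs,
        Bytes.toBytes("value"));
    table.put(put);
    table.close();
  }
}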

-- Lars

ps. Rereading my email below... At some point I will learn to proof-read my 
emails before I send them full of grammatical errors.


- Original Message -
From: Mahdi Negahi negahi.ma...@hotmail.com
To: Hbase user@hbase.apache.org
Cc: 
Sent: Tuesday, April 24, 2012 10:46 PM
Subject: RE: Problem to Insert the row that i was deleted



Thanks for sharing.

So there is no way to bring back the row (or cells/columns)?


 Date: Tue, 24 Apr 2012 22:39:49 -0700
 From: lhofha...@yahoo.com
 Subject: Re: Problem to Insert the row that i was deleted
 To: user@hbase.apache.org
 
 Rows (or rather cells/columns) are not actually deleted. Instead they are 
 marked for deletion by a delete marker. The deleted cells are collected 
 during the next major or minor compaction.
 
 As long as the marker exists, a new Put (with the same timestamp as the 
 existing Put) will be affected by the delete marker.
 The delete marker itself will exist until the next major compaction.
 
 This might seem strange, but it is actually an important feature of HBase, as it 
 allows operations to be executed in any order with the same end result.
 
 -- Lars
 
 
 From: Mahdi Negahi negahi.ma...@hotmail.com
 To: Hbase user@hbase.apache.org 
 Sent: Tuesday, April 24, 2012 9:05 PM
 Subject: Problem to Insert the row that i was deleted
 
 
 
 
 
 I deleted a row and I want to add the same row (with the same timestamp) to 
 HBase, but it is not added to the table. I know that if I change the timestamp 
 it will be added, but it is necessary to add it with the same timestamp. 
 
 Please advise me: where is my problem?
 
 Regards,
 Mahdi


Re: Problem to Insert the row that i was deleted

2012-04-25 Thread yonghu
As Lars mentioned, the row is not physically deleted. Instead, HBase
inserts a cell called a tombstone, which masks the deleted value; the
value itself is still there. (If the deleted value is in the same
memstore as the tombstone, it will be deleted in the memstore, so you
will not find the tombstone and the deleted value in the same HFile.)
This is new in HBase 0.92.0; in the previous 0.90.* releases, both the
tombstone and the deleted value end up in the HFile. If you want to read
your deleted data, you can read the HFile on the server side, which
works on the 0.90.* versions. If you just read the table content from
the client side, I am afraid you have to run a major compaction first
and then reinsert your deleted data.
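
A small sketch that reproduces the masking behavior described above
(0.90-era client API; row, family, column, and timestamp are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class TombstoneMasking {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    byte[] row = Bytes.toBytes("row1");
    long ts = 1335300000000L;  // placeholder timestamp

    Put put = new Put(row);
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), ts,
        Bytes.toBytes("value"));
    table.put(put);

    table.delete(new Delete(row));  // writes a tombstone; data stays on disk

    // Re-insert with the same timestamp: the tombstone still masks it.
    table.put(put);
    Result r = table.get(new Get(row));
    System.out.println("still empty? " + r.isEmpty());  // true until a
                                                        // major compaction
    table.close();
  }
}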

Regards!

Yong

On Wed, Apr 25, 2012 at 8:14 AM, lars hofhansl lhofha...@yahoo.com wrote:
 Your only chance is to run a major compaction on your table - that will get 
 rid of the delete marker. Then you can re-add the Put with the same TS.

 -- Lars

 ps. Rereading my email below... At some point I will learn to proof-read my 
 emails before I send them full of grammatical errors.


 - Original Message -
 From: Mahdi Negahi negahi.ma...@hotmail.com
 To: Hbase user@hbase.apache.org
 Cc:
 Sent: Tuesday, April 24, 2012 10:46 PM
 Subject: RE: Problem to Insert the row that i was deleted



 Thanks for sharing.

 So there is no way to bring back the row (or cells/columns)?


 Date: Tue, 24 Apr 2012 22:39:49 -0700
 From: lhofha...@yahoo.com
 Subject: Re: Problem to Insert the row that i was deleted
 To: user@hbase.apache.org

 Rows (or rather cells/columns) are not actually deleted. Instead they are 
 marked for deletion by a delete marker. The deleted cells are collected 
 during the next major or minor compaction.

 As long as the marker exists, a new Put (with the same timestamp as the 
 existing Put) will be affected by the delete marker.
 The delete marker itself will exist until the next major compaction.

 This might seem strange, but it is actually an important feature of HBase, as 
 it allows operations to be executed in any order with the same end result.

 -- Lars

 
 From: Mahdi Negahi negahi.ma...@hotmail.com
 To: Hbase user@hbase.apache.org
 Sent: Tuesday, April 24, 2012 9:05 PM
 Subject: Problem to Insert the row that i was deleted





 I deleted a row and I want to add the same row (with the same timestamp) to 
 HBase, but it is not added to the table. I know that if I change the timestamp 
 it will be added, but it is necessary to add it with the same timestamp.

 Please advise me: where is my problem?

 Regards,
 Mahdi


Re: Hbase Quality Of Service: large standarad deviation in insert time while inserting same type of rows in Hbase

2012-04-25 Thread Michel Segel
I guess Sesame Street isn't global... ;-) Oh, and of course I f'd the joke by 
saying Grover and not Oscar, so it's my bad. :-( [Google "Oscar the Grouch" and 
you'll understand the joke that I botched.]

It's most likely GC and a mistuned cluster.
The OP doesn't really get into detail, except to say that his cluster is tiny. 
Yes, size does matter, regardless of those rumors to the contrary... 3 DNs is 
kind of small. If he's splitting that often, then his region size is too small; 
hot-spotting and other things can impact performance, but not in the way he 
described.
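
If frequent splitting is indeed the issue, one knob is the split threshold.
A hedged sketch of raising it per table (0.90-era API; the names and the
4 GB figure are illustrative, and the cluster-wide equivalent is the
hbase.hregion.max.filesize setting):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class LargerRegions {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("mytable");
    desc.addFamily(new HColumnDescriptor("cf"));
    // Regions split once they grow past this size; a larger value means
    // fewer splits while a sustained insert load is running.
    desc.setMaxFileSize(4L * 1024 * 1024 * 1024);  // 4 GB, illustrative
    admin.createTable(desc);
  }
}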

Also, when you look at performance, look at reads, not writes. You can cache 
both, and writes are less important than reads. (Think about it.)

Since this type of conversation keeps popping up, it would be a good topic for 
Strata in NY. (Not too subtle a hint to those who are picking topics...) 
Good cluster design is important, more important than people think. 


Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 25, 2012, at 12:08 AM, Mikael Sitruk mikael.sit...@gmail.com wrote:

 1. Writes are not blocked during compaction.
 2. Compaction cannot take a constant time, since the files/regions keep
 getting bigger.
 3. Besides the GC pauses (which seem to be the best candidate here) on
 either the client or the RS (what are your settings, BTW, and the data
 size per insert?), did you presplit your regions, or is a split occurring
 during the execution? (See the presplit sketch after this list.)
 4. Did you look at the logs? Is any operation taking too long there? (In
 0.92 you can configure and log any operation that takes too long.)
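
For item 3, a minimal presplit sketch (0.90-era API; table, family, and
split points are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PresplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("mytable");
    desc.addFamily(new HColumnDescriptor("cf"));
    // Three split points yield four regions up front, so no splits are
    // triggered while the initial load runs.
    byte[][] splits = new byte[][] {
        Bytes.toBytes("row-25"), Bytes.toBytes("row-50"),
        Bytes.toBytes("row-75")
    };
    admin.createTable(desc, splits);
  }
}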
 
 
 Regards
 Mikael.S
 
 On Wed, Apr 25, 2012 at 4:58 AM, Michael Segel 
 michael_se...@hotmail.comwrote:
 
 Have you thought about Garbage Collection?
 
 -Grover
 
 Sent from my iPhone
 
 On Apr 24, 2012, at 12:41 PM, Skchaudhary schoudh...@ivp.in wrote:
 
 
 I have a cluster HBase set-up. In it I have 3 region servers. There is a
 table which has 27 regions equally distributed among the 3 region
 servers: 9 regions per region server.
 
 Region server 1 has regions 1-9, region server 2 has regions 10-18, and
 region server 3 has regions 19-27.
 
 Now when I start a program which inserts rows into region 1 and region 5
 (both under region server 1) alternately and on a continuous basis, I see
 that the insert time for each row is not constant or consistent---there
 is a lot of variance; that is, the standard deviation of the insert time
 is quite large. Sometimes it takes 2 ms to insert a row, sometimes 3 ms,
 sometimes 1000 ms, and sometimes even 3000 ms, even though the data size
 of the rows is equal.
 
 I understand that due to flushing and compaction of regions the writes
 are blocked---but then they should not be blocked for a long span of
 time, and the blockage time should be consistent for every
 flush/compaction (minor compaction).
 
 All in all, every time a flush or compaction occurs, it should take
 nearly the same time.
 
 For our application we need a consistent quality of service; if not
 perfect, at least we need well-visible boundary lines--e.g., each row
 insert will take 0 to 10 ms and not more than 10 ms (just an example),
 even when a minor compaction or flush occurs.
 
 Is there any setting/configuration which I should try?
 
 Any ideas on how to achieve this in HBase?
 
 Any help would be really appreciated.
 
 Thanks in advance!!
 
 --
 View this message in context:
 http://old.nabble.com/Hbase-Quality-Of-Service%3A-large-standarad-deviation-in-insert-time-while-inserting-same-type-of-rows-in-Hbase-tp33740438p33740438.html
 Sent from the HBase User mailing list archive at Nabble.com.
 
 


Re: Problem to Insert the row that i was deleted

2012-04-25 Thread Michel Segel
Uhm... not exactly, Lars...
Just my $0.02 ...

While I don't disagree with Lars, I think the question you have to ask is: why 
is the timestamp important?
Is it an element of the data, or is it an artifact?
This kind of gets into your schema design and taking shortcuts. You may want 
to instead create a data element or column containing the timestamp rather 
than rely on the HBase internal timestamp. 
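
A hedged sketch of that approach (0.90-era API; the names are placeholders):
keep the event time as an ordinary column and let HBase assign its own
version timestamp, so delete-marker masking never comes into play.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TimestampAsColumn {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    long eventTime = 1335300000000L;  // the time the application cares about

    Put put = new Put(Bytes.toBytes("row1"));
    // Store the application-level time as data...
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("event_time"),
        Bytes.toBytes(eventTime));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("payload"),
        Bytes.toBytes("value"));
    // ...and let the cell's version timestamp default to "now".
    table.put(put);
    table.close();
  }
}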

Or you could increase the existing timestamp by 1 ms... ;-)
(Blame it on clock drift in your cluster? Of course we don't know the 
significance of the timestamp... or how often the row is un/re-deleted... 
1000 times and you'd be off by a whole second.)

-Just saying... :-)


Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 25, 2012, at 1:14 AM, lars hofhansl lhofha...@yahoo.com wrote:

 Your only chance is to run a major compaction on your table - that will get 
 rid of the delete marker. Then you can re-add the Put with the same TS.
 
 -- Lars
 
 ps. Rereading my email below... At some point I will learn to proof-read my 
 emails before I send them full of grammatical errors.
 
 
 - Original Message -
 From: Mahdi Negahi negahi.ma...@hotmail.com
 To: Hbase user@hbase.apache.org
 Cc: 
 Sent: Tuesday, April 24, 2012 10:46 PM
 Subject: RE: Problem to Insert the row that i was deleted
 
 
 
 Thanks for sharing.
 
 So there is no way to bring back the row (or cells/columns)?
 
 
 Date: Tue, 24 Apr 2012 22:39:49 -0700
 From: lhofha...@yahoo.com
 Subject: Re: Problem to Insert the row that i was deleted
 To: user@hbase.apache.org
 
 Rows (or rather cells/columns) are not actually deleted. Instead they are 
 marked for deletion by a delete marker. The deleted cells are collected 
 during the next major or minor compaction.

 As long as the marker exists, a new Put (with the same timestamp as the 
 existing Put) will be affected by the delete marker.
 The delete marker itself will exist until the next major compaction.
 
 This might seem strange, but it is actually an important feature of HBase, as 
 it allows operations to be executed in any order with the same end result.
 
 -- Lars
 
 
 From: Mahdi Negahi negahi.ma...@hotmail.com
 To: Hbase user@hbase.apache.org 
 Sent: Tuesday, April 24, 2012 9:05 PM
 Subject: Problem to Insert the row that i was deleted
 
 
 
 
 
 I deleted a row and I want to add the same row (with the same timestamp) to 
 HBase, but it is not added to the table. I know that if I change the timestamp 
 it will be added, but it is necessary to add it with the same timestamp. 
 
 Please advise me: where is my problem?
 
 Regards,
 Mahdi
 


Regions not cleared

2012-04-25 Thread ajay.bhosle
Hi,

 

I have set a TTL on an HBase table, due to which the data is cleared after the
specified time, but the regions are not removed even though the data inside the
regions is cleared. Can someone please let me know if I am missing
anything?

 

Thanks

Ajay



hbase installation

2012-04-25 Thread shehreen

Hi 

I am new to HBase and Hadoop. I want to install HBase and work with it,
writing MapReduce jobs for data in HBase. I installed HBase; it works well
in standalone mode, but the master and ZooKeeper don't start properly in
pseudo-distributed mode. 

Kindly help me resolve this problem.

Thanks 
-- 
View this message in context: 
http://old.nabble.com/hbase-installation-tp33746422p33746422.html
Sent from the HBase User mailing list archive at Nabble.com.



Re: hbase installation

2012-04-25 Thread Nitin Pawar
Any error message?

On Wed, Apr 25, 2012 at 7:02 PM, shehreen shehreen_cute...@hotmail.comwrote:


 Hi

 I am new to HBase and Hadoop. I want to install HBase and work with it,
 writing MapReduce jobs for data in HBase. I installed HBase; it works well
 in standalone mode, but the master and ZooKeeper don't start properly in
 pseudo-distributed mode.

 Kindly help me resolve this problem.

 Thanks
 --
 View this message in context:
 http://old.nabble.com/hbase-installation-tp33746422p33746422.html
 Sent from the HBase User mailing list archive at Nabble.com.




-- 
Nitin Pawar


Re: Hbase Quality Of Service: large standarad deviation in insert time while inserting same type of rows in Hbase

2012-04-25 Thread Doug Meil

Hi there-

In addition to what was said about GC, you might want to double-check
this...

http://hbase.apache.org/book.html#performance

... as well as this case-study for performance troubleshooting

http://hbase.apache.org/book.html#casestudies.perftroub




On 4/24/12 9:58 PM, Michael Segel michael_se...@hotmail.com wrote:

Have you thought about Garbage Collection?

-Grover

Sent from my iPhone

On Apr 24, 2012, at 12:41 PM, Skchaudhary schoudh...@ivp.in wrote:

 
 I have a cluster HBase set-up. In it I have 3 region servers. There is a
 table which has 27 regions equally distributed among the 3 region
 servers: 9 regions per region server.
 
 Region server 1 has regions 1-9, region server 2 has regions 10-18, and
 region server 3 has regions 19-27.
 
 Now when I start a program which inserts rows into region 1 and region 5
 (both under region server 1) alternately and on a continuous basis, I see
 that the insert time for each row is not constant or consistent---there
 is a lot of variance; that is, the standard deviation of the insert time
 is quite large. Sometimes it takes 2 ms to insert a row, sometimes 3 ms,
 sometimes 1000 ms, and sometimes even 3000 ms, even though the data size
 of the rows is equal.
 
 I understand that due to flushing and compaction of regions the writes
 are blocked---but then they should not be blocked for a long span of
 time, and the blockage time should be consistent for every
 flush/compaction (minor compaction).
 
 All in all, every time a flush or compaction occurs, it should take
 nearly the same time.
 
 For our application we need a consistent quality of service; if not
 perfect, at least we need well-visible boundary lines--e.g., each row
 insert will take 0 to 10 ms and not more than 10 ms (just an example),
 even when a minor compaction or flush occurs.
 
 Is there any setting/configuration which I should try?
 
 Any ideas on how to achieve this in HBase?
 
 Any help would be really appreciated.
 
 Thanks in advance!!
 
 --
 View this message in context:
 http://old.nabble.com/Hbase-Quality-Of-Service%3A-large-standarad-deviation-in-insert-time-while-inserting-same-type-of-rows-in-Hbase-tp33740438p33740438.html
 Sent from the HBase User mailing list archive at Nabble.com.
 





Re: Integrity constraints

2012-04-25 Thread Vamshi Krishna
Thank you, Gary! Now I understand the actual method.

On Wed, Apr 25, 2012 at 11:36 AM, Gary Helmling ghelml...@gmail.com wrote:

 Hi Vamshi,

 See the ConstraintProcessor coprocessor that was added for just this
 kind of case:
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/constraint/package-summary.html

 You would need to implement the Constraint interface and apply the
 configuration to your tables via the Constraints utility.

 Assuming the fields are being handled as strings on the client end,
 your Constraint implementation could simply call Bytes.toString() and
 apply some basic regexes for validation.  Or you could consider using a
 more structured serialization format like protobufs.
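
A rough sketch of what that could look like for the 'phoneNumber' column
(the class and column names are made up, and the exact Constraint API
should be verified against the package docs linked above for your version):

import java.util.List;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.constraint.Constraint;
import org.apache.hadoop.hbase.constraint.ConstraintException;
import org.apache.hadoop.hbase.util.Bytes;

public class PhoneNumberConstraint extends Configured implements Constraint {
  private static final byte[] CF = Bytes.toBytes("cf");
  private static final byte[] COL = Bytes.toBytes("phoneNumber");

  @Override
  public void check(Put p) throws ConstraintException {
    // Reject any Put whose phoneNumber cell is not exactly 10 digits.
    List<KeyValue> kvs = p.get(CF, COL);
    for (KeyValue kv : kvs) {
      String value = Bytes.toString(kv.getValue());
      if (!value.matches("\\d{10}")) {
        throw new ConstraintException(
            "phoneNumber must be exactly 10 digits, got: " + value);
      }
    }
  }
}

Wiring it up would then be roughly Constraints.add(tableDescriptor,
PhoneNumberConstraint.class) before creating or altering the table, per the
same package docs.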

 --gh

 On Tue, Apr 24, 2012 at 9:35 PM, Vamshi Krishna vamshi2...@gmail.com
 wrote:
  Hi all, I have one basic question about constraints on an HBase table,
  given that there is no concept of data types in HBase and everything
  is stored as bytes.
  Suppose a table in HBase has 3 columns (under the same column family):
  the 1st column is 'Name', which accepts only character strings, not
  numbers or special symbols;
  the 2nd column is 'phoneNumber', which is numerals, exactly 10 digits;
  and the 3rd column is 'city', which should accept only upper-case
  character strings. If such is the situation, how do I enforce the
  constraints on each of the columns of the HBase table?
 
  Also, can anybody please tell me how to write the equivalent query in
  the HBase shell and in Java?
  --
  Regards,
  Vamshi Krishna




-- 
Regards,
Vamshi Krishna


Re: Problem to Insert the row that i was deleted

2012-04-25 Thread lars hofhansl
Thanks yonghu.


That is HBASE-4241.

One small point: The deleted rows are not deleted from the memstore, but rather 
not included when the memstore is flushed to disk.


-- Lars


- Original Message -
From: yonghu yongyong...@gmail.com
To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
Cc: 
Sent: Wednesday, April 25, 2012 1:10 AM
Subject: Re: Problem to Insert the row that i was deleted

As Lars mentioned, the row is not physically deleted. Instead, HBase
inserts a cell called a tombstone, which masks the deleted value; the
value itself is still there. (If the deleted value is in the same
memstore as the tombstone, it will be deleted in the memstore, so you
will not find the tombstone and the deleted value in the same HFile.)
This is new in HBase 0.92.0; in the previous 0.90.* releases, both the
tombstone and the deleted value end up in the HFile. If you want to read
your deleted data, you can read the HFile on the server side, which
works on the 0.90.* versions. If you just read the table content from
the client side, I am afraid you have to run a major compaction first
and then reinsert your deleted data.

Regards!

Yong

On Wed, Apr 25, 2012 at 8:14 AM, lars hofhansl lhofha...@yahoo.com wrote:
 Your only chance is to run a major compaction on your table - that will get 
 rid of the delete marker. Then you can re-add the Put with the same TS.

 -- Lars

 ps. Rereading my email below... At some point I will learn to proof-read my 
 emails before I send them full of grammatical errors.


 - Original Message -
 From: Mahdi Negahi negahi.ma...@hotmail.com
 To: Hbase user@hbase.apache.org
 Cc:
 Sent: Tuesday, April 24, 2012 10:46 PM
 Subject: RE: Problem to Insert the row that i was deleted



 Thanks for sharing.

 So there is no way to bring back the row (or cells/columns)?


 Date: Tue, 24 Apr 2012 22:39:49 -0700
 From: lhofha...@yahoo.com
 Subject: Re: Problem to Insert the row that i was deleted
 To: user@hbase.apache.org

 Rows (or rather cells/columns) are not actually deleted. Instead they are 
 marked for deletion by a delete marker. The deleted cells are collected 
 during the next major or minor compaction.

 As long as the marker exists, a new Put (with the same timestamp as the 
 existing Put) will be affected by the delete marker.
 The delete marker itself will exist until the next major compaction.

 This might seem strange, but it is actually an important feature of HBase, as 
 it allows operations to be executed in any order with the same end result.

 -- Lars

 
 From: Mahdi Negahi negahi.ma...@hotmail.com
 To: Hbase user@hbase.apache.org
 Sent: Tuesday, April 24, 2012 9:05 PM
 Subject: Problem to Insert the row that i was deleted





 I deleted a row and I want to add the same row (with the same timestamp) to 
 HBase, but it is not added to the table. I know that if I change the timestamp 
 it will be added, but it is necessary to add it with the same timestamp.

 Please advise me: where is my problem?

 Regards,
 Mahdi



0.20 to 0.90 upgrade

2012-04-25 Thread David Charle
As per the docs, it looks painless to upgrade from 0.20.3 to 0.90
(you only need to run the upgrade script if upgrading to 0.92).
http://hbase.apache.org/book/upgrading.html#upgrade0.90

Does anyone have experience upgrading from 0.20 to 0.90, or with a similar
major upgrade? Do we need to upgrade Hadoop (0.20.1) as well, or can
0.20.1 work with HBase 0.90?

Trying to minimize the impact with one upgrade at a time... any help is
appreciated.

Thanks in advance
David


Re: hbase installation

2012-04-25 Thread shashwat shriparv
Just follow this:

http://hbase.apache.org/book/standalone_dist.html

On Wed, Apr 25, 2012 at 7:05 PM, Nitin Pawar nitinpawar...@gmail.comwrote:

 Any error message?

 On Wed, Apr 25, 2012 at 7:02 PM, shehreen shehreen_cute...@hotmail.com
 wrote:

 
  Hi
 
  I am new to HBase and Hadoop. I want to install HBase and work with
  it, writing MapReduce jobs for data in HBase. I installed HBase; it
  works well in standalone mode, but the master and ZooKeeper don't
  start properly in pseudo-distributed mode.
 
  Kindly help me resolve this problem.
 
  Thanks
  --
  View this message in context:
  http://old.nabble.com/hbase-installation-tp33746422p33746422.html
  Sent from the HBase User mailing list archive at Nabble.com.
 
 


 --
 Nitin Pawar




-- 


∞
Shashwat Shriparv


Re: hbase installation

2012-04-25 Thread Mohammad Tariq
Change 127.0.1.1 in your /etc/hosts file to 127.0.0.1. Also add the
hadoop-core.jar from the Hadoop folder and commons-configuration.jar from
hadoop/lib to the hbase/lib folder.
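
For example, on a typical Ubuntu box the change is a one-line edit to
/etc/hosts (the hostname here is just a placeholder):

127.0.0.1   localhost
127.0.0.1   myhost    # was: 127.0.1.1   myhost
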
On Apr 25, 2012 11:59 PM, shashwat shriparv dwivedishash...@gmail.com
wrote:

 Just follow this:

 http://hbase.apache.org/book/standalone_dist.html

 On Wed, Apr 25, 2012 at 7:05 PM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

  Any error message?
 
  On Wed, Apr 25, 2012 at 7:02 PM, shehreen shehreen_cute...@hotmail.com
  wrote:
 
  
   Hi
  
   I am new to HBase and Hadoop. I want to install HBase and work with
   it, writing MapReduce jobs for data in HBase. I installed HBase; it
   works well in standalone mode, but the master and ZooKeeper don't
   start properly in pseudo-distributed mode.
  
   Kindly help me resolve this problem.
  
   Thanks
   --
   View this message in context:
   http://old.nabble.com/hbase-installation-tp33746422p33746422.html
   Sent from the HBase User mailing list archive at Nabble.com.
  
  
 
 
  --
  Nitin Pawar
 



 --


 ∞
 Shashwat Shriparv



Re: hbase installation

2012-04-25 Thread shashwat shriparv
Check this out too; it seems to make it work. Do what Tariq has suggested as well:

http://ria101.wordpress.com/2010/01/28/setup-hbase-in-pseudo-distributed-mode-and-connect-java-client/



On Thu, Apr 26, 2012 at 1:05 AM, Mohammad Tariq donta...@gmail.com wrote:

 Change 127.0.1.1 in your /etc/hosts file to 127.0.0.1. Also add the
 hadoop-core.jar from the Hadoop folder and commons-configuration.jar from
 hadoop/lib to the hbase/lib folder.
 On Apr 25, 2012 11:59 PM, shashwat shriparv dwivedishash...@gmail.com
 wrote:

  Just follow this:
 
  http://hbase.apache.org/book/standalone_dist.html
 
  On Wed, Apr 25, 2012 at 7:05 PM, Nitin Pawar nitinpawar...@gmail.com
  wrote:
 
   Any error message?
  
   On Wed, Apr 25, 2012 at 7:02 PM, shehreen 
 shehreen_cute...@hotmail.com
   wrote:
  
   
Hi
   
I am new to HBase and Hadoop. I want to install HBase and work with it,
writing MapReduce jobs for data in HBase. I installed HBase; it works
well in standalone mode, but the master and ZooKeeper don't start
properly in pseudo-distributed mode.
   
Kindly help me resolve this problem.
   
Thanks
--
View this message in context:
http://old.nabble.com/hbase-installation-tp33746422p33746422.html
Sent from the HBase User mailing list archive at Nabble.com.
   
   
  
  
   --
   Nitin Pawar
  
 
 
 
  --
 
 
  ∞
  Shashwat Shriparv
 




-- 


∞
Shashwat Shriparv


Re: 0.20 to 0.90 upgrade

2012-04-25 Thread Stack
On Wed, Apr 25, 2012 at 11:14 AM, David Charle dbchar2...@gmail.com wrote:
 As per the docs, it looks painless to upgrade from 0.20.3 to 0.90
 (you only need to run the upgrade script if upgrading to 0.92).
 http://hbase.apache.org/book/upgrading.html#upgrade0.90

 Does anyone have experience upgrading from 0.20 to 0.90, or with a similar
 major upgrade? Do we need to upgrade Hadoop (0.20.1) as well, or can
 0.20.1 work with HBase 0.90?


Hello David.

I would not call the 0.20 to 0.90 migration 'painless'.
'Well-exercised' and 'awkward but works' come to mind instead.
All of us long-term users have come out on the other side of this
step.

I would suggest you follow the instructions closely to minimize headache.

You should be ok going from 0.20.3 to 0.90.x.

St.Ack


Re: TIMERANGE performance on uniformly distributed keyspace

2012-04-25 Thread Wouter Bolsterlee
Hi,

On 2012-04-14 at 21:07, Rob Verkuylen wrote:
 As far as I understand, sequential keys with a timerange scan have the best
 read performance possible, because of the HFile metadata, just as N
 indicates. Maybe adding Bloom filters can further improve performance.

As far as I understand it, Bloom filters are only useful for lookups based on
the row key (and possibly the column name), not for any time-related lookups.

 Still, in my case with random keys I get a quick (sub-second) response from
 my scan example earlier. Does HBase keep all the HFile metadata in memory?
 I can't imagine it will start hitting hundreds, potentially thousands of
 HFiles, reading their metadata, full-scanning the files, and returning
 rows. Does it?

What does "quick response" mean here? Is it the response time for the first
batch of results? This can be quite low if the scan finds rows that match
your scan criteria in a region/HFile at the start of the scanned range (e.g.
at the beginning of the table).
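
One way to tell the two apart is to time the first result and the full
drain separately. A minimal sketch (0.90-era client; the table name and
time range are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class TimeRangeScanTiming {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    Scan scan = new Scan();
    scan.setTimeRange(1335200000000L, 1335300000000L);  // placeholder range

    long t0 = System.currentTimeMillis();
    ResultScanner scanner = table.getScanner(scan);
    Result first = scanner.next();            // latency to the first match
    long tFirst = System.currentTimeMillis();
    long rows = (first == null) ? 0 : 1;
    for (Result r : scanner) rows++;          // drain the rest of the scan
    long tAll = System.currentTimeMillis();
    scanner.close();

    System.out.println("first result: " + (tFirst - t0) + " ms, full scan: "
        + (tAll - t0) + " ms, rows: " + rows);
    table.close();
  }
}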

Did you also measure the time for the complete scan to finish (and the
load it causes on your cluster), and relate it to the performance of a
sequential scan over a secondary index table with monotonically increasing
keys (and the load that causes on your cluster since the index has to be
maintained and written to a single region server)?

— Wouter




Re: Regions not cleared

2012-04-25 Thread Christian Schäfer

Hi,

As far as I know, TTL, like deletions, only takes effect at major
compaction. (See http://hbase.apache.org/book.html#regions.arch,
section 8.7.5.5.)
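
For reference, a hedged sketch of where the TTL is set (0.90-era API; the
names and the one-day TTL are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class TtlTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("mytable");
    HColumnDescriptor family = new HColumnDescriptor("cf");
    family.setTimeToLive(24 * 60 * 60);  // TTL in seconds: one day
    desc.addFamily(family);
    admin.createTable(desc);
    // Expired cells stop being returned right away, but their space is
    // reclaimed only when a (major) compaction rewrites the store files;
    // the regions themselves are never removed automatically, even when
    // they become empty.
    admin.majorCompact("mytable");
  }
}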

regards
Christian
From: ajay.bhosle ajay.bho...@zapak.co.in
To: user@hbase.apache.org
Sent: Wednesday, April 25, 2012, 14:33
Subject: Regions not cleared
   
Hi,

 

I have set a TTL on an HBase table, due to which the data is cleared after the
specified time, but the regions are not removed even though the data inside the
regions is cleared. Can someone please let me know if I am missing
anything?

 

Thanks

Ajay
  

Re: HBase, CDH3U2, EC2

2012-04-25 Thread Bryan Beaudreault
We use EC2 and CDH as well and have around 80 Hadoop/HBase nodes deployed 
across a few different clusters. We use a combination of Puppet for package 
management and Fabric scripts for pushing configs and managing services. 

Our base AMI is a pretty bare CentOS 6 install, and Puppet handles most of the 
rest after spinning up. Puppet also worked fine for managing configs, until we 
started having many clusters with different setups. That's when we moved to 
Fabric for that.

There is certainly an investment required to set this stuff up initially, 
but it pays off as you continually need to spin up replacements or new nodes. 
We can do that with only a couple of minutes of work at this point. 


Sent from iPhone.

On Apr 26, 2012, at 1:12 AM, Something Something mailinglist...@gmail.com 
wrote:

 Hello,
 
 We have a Hadoop cluster running on EC2 with Cloudera's
 hadoop-0.20.2-cdh3u2 distribution.  We are now ready to install HBase on
 it and are trying to figure out the best way to accomplish this.
 
 We have quite a few machines in the cluster, so installing HBase on each
 machine would be time-consuming.  But if that's the only way, we can do it
 by creating our own RPMs.  Is this document the best resource:
 https://ccp.cloudera.com/display/CDHDOC/HBase+Installation#HBaseInstallation-InstallingHBase
 
 Are there EC2 scripts that work with Cloudera's distribution to make this
 process easier?
 
 Please help.  Thanks.