HBase and MapReduce

2012-05-23 Thread Hemant Bhanawat
I have a couple of questions related to MapReduce over HBase: 1. HBase guarantees data locality of store files and Regionserver only if it stays up for long. If there are too many region movements or the server has been recycled recently, there is a high probability that store file blocks are not

Re: Using put for nullifying qualifiers

2012-05-23 Thread Kristoffer Sjögren
Ted: Awesome. I can think of several use cases where this is useful, but I'm pretty stuck on 0.92 right now. I tried the null-version trick but must be doing something wrong. How do I set version to null on a column? Isn't version equal to the timestamp (primitive long)? Setting timestamp to 0 and

Restrictions during compactions

2012-05-23 Thread Takahiko Kawasaki
Hello, I'm a newbie and wondering whether or not there is any restriction during HBase minor/major compactions. I read the online document but could not find any explicit mention about restrictions. What I'm mostly worrying about is whether read/write operations are blocked during compactions.

Re: Using put for nullifying qualifiers

2012-05-23 Thread Tom Brown
I didn't mean to set the version to null; I meant to include a revision of the column whose contents are empty. This empty revision will still be returned by any gets on that row, but you can put code into your client that treats empty values as deleted. It's a bit of a hack, but it's the best I
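
A minimal sketch of the client-side convention described above, assuming a fetched row is represented as a plain dict of qualifier to value bytes. The function name `visible_columns` and the dict layout are hypothetical and are not part of any HBase client API; the sketch only illustrates "treat empty values as deleted".

```python
def visible_columns(row):
    """Return only the columns whose latest value is non-empty.

    `row` is a hypothetical stand-in for a fetched row: a dict that maps
    a qualifier (bytes) to its latest value (bytes). An empty value is
    interpreted as "deleted" by client convention.
    """
    return {qualifier: value for qualifier, value in row.items() if value != b""}


if __name__ == "__main__":
    # The nickname column was "nullified" by writing an empty value.
    fetched = {b"cf:name": b"alice", b"cf:nickname": b""}
    print(visible_columns(fetched))
```

The empty revision is still stored and still returned by reads, so the filtering has to happen in every client that reads the row.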

Re: Using put for nullifying qualifiers

2012-05-23 Thread Kristoffer Sjögren
Gotcha. Columns are quite dynamic in my case, but since I need to fetch rows first anyway, a KeyOnlyFilter to first find them and then overwrite values will do just fine. Cheers, -Kristoffer

Re: HBase and MapReduce

2012-05-23 Thread Dave Revell
1. HBase guarantees data locality of store files and Regionserver only if it stays up for long. If there are too many region movements or the server has been recycled recently, there is a high probability that store file blocks are not local to the region server. But the getSplits command

Re: Restrictions during compactions

2012-05-23 Thread Dave Revell
On Wed, May 23, 2012 at 6:15 AM, Takahiko Kawasaki takahiko.kawas...@jibemobile.jp wrote: Hello, I'm a newbie and wondering whether or not there is any restriction during HBase minor/major compactions. I read the online document but could not find any explicit mention about restrictions.

HBase 0.94 thrift2 (TScan struct missing filterString)

2012-05-23 Thread Jay T
We are currently on HBase 0.90 (cdh3u3) and soon will be upgrading to HBase 0.94. Our application is written in Python and we use Thrift to connect to HBase. Looking at Thrift2 (hbase.thrift) I noticed that TScan struct does not accept filterString as a parameter. This was introduced in HBase

Re: HBase 0.94 thrift2 (TScan struct missing filterString)

2012-05-23 Thread Ted Yu
Why don't you log a JIRA? By the time you reach the next iteration, hopefully this feature is there - especially if your team can contribute. On Wed, May 23, 2012 at 10:06 AM, Jay T jay.pyl...@gmail.com wrote: We are currently on HBase 0.90 (cdh3u3) and soon will be upgrading to HBase

Re: Can we store a HBase Result object using Put

2012-05-23 Thread Alex Baranau
I have seen the need for such a conversion many times before. Should we add it as a public method in some utility class (and create a JIRA for that)? Alex Baranau -- Sematext :: http://blog.sematext.com/ On Mon, May 21, 2012 at 4:26 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: How exactly are you

Re: HBase 0.94 thrift2 (TScan struct missing filterString)

2012-05-23 Thread Jay T
Added a JIRA to track this issue. https://issues.apache.org/jira/browse/HBASE-6073 Thanks, Jay On 5/23/12 1:14 PM, Ted Yu wrote: Why don't you log a JIRA? By the time you reach the next iteration, hopefully this feature is there - especially if your team can contribute. On Wed, May 23,

Re: About HBase Memstore Flushes

2012-05-23 Thread Alex Baranau
Talked to J-D (and source code). It turned out that when hbase.regionserver.global.memstore.lowerLimit is reached flushes are forced without blocking reads (of course, if hbase.regionserver.global.memstore.upperLimit is not hit). Makes perfect sense. Though couldn't figure this out from settings
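
The two thresholds mentioned above can be read as a simple decision rule. The following is a rough sketch of that reading only, not the region server's actual code: the function name `memstore_action`, the returned labels, and the 0.35 / 0.40 heap fractions are assumptions made for illustration (check the defaults shipped with your HBase version).

```python
# Assumed values for hbase.regionserver.global.memstore.lowerLimit and
# hbase.regionserver.global.memstore.upperLimit, expressed as heap fractions.
LOWER_LIMIT = 0.35
UPPER_LIMIT = 0.40


def memstore_action(global_memstore_bytes, heap_bytes,
                    lower=LOWER_LIMIT, upper=UPPER_LIMIT):
    """Classify the global memstore usage against the two limits.

    Returns one of three hypothetical labels:
      "none"            - usage is below the lower limit
      "flush"           - lower limit reached: flushes are forced, no blocking
      "flush_and_block" - upper limit reached: flushes are forced and clients block
    """
    usage = global_memstore_bytes / heap_bytes
    if usage >= upper:
        return "flush_and_block"
    if usage >= lower:
        return "flush"
    return "none"


if __name__ == "__main__":
    heap = 8 * 1024 ** 3  # 8 GiB heap, illustrative only
    for used_fraction in (0.20, 0.37, 0.45):
        print(used_fraction, memstore_action(int(heap * used_fraction), heap))
```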

Re: Consider individual RSs performance when writing records with random keys?

2012-05-23 Thread Alex Baranau
Talked to Stack. It's not a completely crazy idea. It may be implemented as a tiny lib, which can be used when row keys are randomized in some way by application logic. In this case randomization would take into account how individual regionservers behave (wrt writing speed). Would be very interesting

Re: About HBase Memstore Flushes

2012-05-23 Thread Jean-Daniel Cryans
On Wed, May 23, 2012 at 2:33 PM, Alex Baranau alex.barano...@gmail.com wrote: Talked to J-D (and source code). It turned out that when hbase.regionserver.global.memstore.lowerLimit is reached flushes are forced without blocking reads (of course, if hbase.regionserver.global.memstore.upperLimit

Re: Append and Put

2012-05-23 Thread Jean-Daniel Cryans
It's a facility so that you don't have to read+write in order to add something to a value. With Append the read is done in the region server before the write; it also solves the problem where you could have a race when there are multiple appenders. J-D On Tue, May 22, 2012 at 8:51 PM, NNever
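
To illustrate why doing the read and the write in one place avoids the race, here is a toy in-memory model. It assumes nothing about HBase internals: the `Cell` class, its lock, and `run_appenders` are hypothetical names, and a Python thread lock merely stands in for whatever serialization the region server applies.

```python
import threading


class Cell:
    """Toy stand-in for a single stored value guarded by one write lock."""

    def __init__(self):
        self._value = b""
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._value

    def put(self, value):
        # A plain write: replaces the value, no read involved.
        with self._lock:
            self._value = value

    def append(self, suffix):
        # Read and write happen under the same lock, so concurrent
        # appenders cannot overwrite each other's additions.
        with self._lock:
            self._value = self._value + suffix
            return self._value


def run_appenders(cell, n_threads, appends_per_thread):
    """Start n_threads threads that each append one byte repeatedly."""
    def work():
        for _ in range(appends_per_thread):
            cell.append(b"x")

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    cell = Cell()
    run_appenders(cell, n_threads=4, appends_per_thread=250)
    print(len(cell.get()))  # every append is preserved
```

If each thread instead called `get()` and then `put(old + b"x")` as two separate steps, another thread could write in between and that update would be lost, which is the race described above.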

Re: Unblock Put/Delete

2012-05-23 Thread NNever
Thanks Harsh, I'll try it ;) --- Best regards, nn 2012/5/24 Harsh J ha...@cloudera.com NNever, You can use asynchbase (an asynchronous API for HBase) for that need: https://github.com/stumbleupon/asynchbase On Thu, May 24, 2012 at 7:25 AM, NNever

Re: Append and Put

2012-05-23 Thread NNever
Thanks J-D. So it means 'Append' keeps a write-lock only and 'Put' keeps both write-lock and read-lock? And if we use 'Append' instead of 'Put', then the chance that clients have to wait will be reduced, right? 2012/5/24 Jean-Daniel Cryans jdcry...@apache.org It's a facility so that you don't have to

Re: Append and Put

2012-05-23 Thread Jean-Daniel Cryans
On Wed, May 23, 2012 at 8:11 PM, NNever nnever...@gmail.com wrote: Thanks J-D. so it means 'Append' keeps write-lock only and 'Put' keeps write-lock/read-lock both? Yeah... not at all. First, there's no read lock. Then Put is just a Put, it takes a write lock. Append is a read+write

Re: auto-added RegionServers

2012-05-23 Thread Michael Drzal
I think a similar concept would be a great idea. It would definitely prevent the type of issue that you mentioned. I think that doing it in a similar way to how it is handled for Hadoop, where you can specify a list but get auto-add if you don't, should keep everyone happy. Mike