Re: Optimizing Multi Gets in hbase

2013-02-19 Thread lars hofhansl
I should qualify that statement, actually. I was comparing scanning 1m KVs to getting 1m KVs when all KVs are returned. As James Taylor pointed out to me privately: A fairer comparison would have been to run a scan with a filter that lets x% of the rows pass (i.e. the selectivity of the scan

Re: HBase without compactions?

2013-02-19 Thread lars hofhansl
If you store data in LSM trees you need compactions. The advantage is that your data files are immutable. MapR has a mutable file system and they probably store their data in something more akin to B-Trees...? Or maybe they somehow avoid the expensive merge sorting of many small files. It seems

Re: Optimizing Multi Gets in hbase

2013-02-19 Thread Nicolas Liochon
Looking at the code, it seems possible to do this server side within the multi invocation: we could group the get by region, and do a single scan. We could also add some heuristics if necessary... On Tue, Feb 19, 2013 at 9:02 AM, lars hofhansl la...@apache.org wrote: I should qualify that
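
A minimal sketch of the "group the gets by region" step mentioned above. It is not HBase code: regions are modelled as a sorted map from region start key to a region name, and each requested row goes to the region with the greatest start key not exceeding it. The names GroupGetsByRegion and groupByRegion, and the use of String keys instead of byte arrays, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this does not use or mirror the HBase API.
class GroupGetsByRegion {

    // regionsByStartKey maps region start key -> region name; the first region starts at "".
    static Map<String, List<String>> groupByRegion(TreeMap<String, String> regionsByStartKey,
                                                   List<String> rowKeys) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String row : rowKeys) {
            // The owning region is the one with the greatest start key <= row.
            String region = regionsByStartKey.floorEntry(row).getValue();
            grouped.computeIfAbsent(region, r -> new ArrayList<>()).add(row);
        }
        // Sort each group so that one forward scan per region could serve it.
        for (List<String> rows : grouped.values()) {
            Collections.sort(rows);
        }
        return grouped;
    }

    public static void main(String[] args) {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "region-1");
        regions.put("g", "region-2");
        regions.put("p", "region-3");
        System.out.println(groupByRegion(regions,
                Arrays.asList("zebra", "apple", "kiwi", "grape", "banana")));
    }
}
```

Once the rows are grouped and sorted, each region would need at most one pass over its own key range, which is the point of doing a single scan instead of many independent gets.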

Re: PreSplit the table with Long format

2013-02-19 Thread Viral Bajaria
HBase shell is a jruby shell and so you can invoke any java commands from it. For example: import org.apache.hadoop.hbase.util.Bytes Bytes.toLong(Bytes.toBytes(1000)) Not sure if this works as expected since I don't have a terminal in front of me but you could try (assuming the SPLITS keyword

Re: PreSplit the table with Long format

2013-02-19 Thread Farrokh Shahriari
Thanks for your help, but it doesn't work. Do you have any other idea? I must run it from the shell. Farrokh On Tue, Feb 19, 2013 at 1:30 PM, Viral Bajaria viral.baja...@gmail.com wrote: HBase shell is a jruby shell and so you can invoke any java commands from it. For example: import

Re: storing lists in columns

2013-02-19 Thread Stas Maksimov
Hi Jean-Marc, I've validated this, it works perfectly. Very easy to implement and it's very fast! Thankfully in this project there aren't a lot of lists in each table, so I won't have to create too many column families. In other scenarios it could be a problem. Many thanks, Stas On 16 February

Re: storing lists in columns

2013-02-19 Thread Jean-Marc Spaggiari
Hi Stas, Don't forget that you should always try to keep the number of column families lower than 3, else you might face some performance issues. JM 2013/2/19, Stas Maksimov maksi...@gmail.com: Hi Jean-Marc, I've validated this, it works perfectly. Very easy to implement and it's very

Table deleted after restart of computer

2013-02-19 Thread Paul van Hoven
I just started with hbase, so I created a table and filled it with some data. But after restarting my computer all the data is gone. This even happens when stopping hbase with stop-hbase.sh. How can this happen?

Re: Table deleted after restart of computer

2013-02-19 Thread Ted Yu
Which HBase / hadoop version were you using ? Did you start the cluster in standalone mode ? Thanks On Tue, Feb 19, 2013 at 5:23 AM, Paul van Hoven paul.van.ho...@googlemail.com wrote: I just started with hbase. Therefore I created a table and filled this table with some data. But after

Re: Table deleted after restart of computer

2013-02-19 Thread Paul van Hoven
I installed hbase via brew. brew install hadoop hbase pig hive Then I started hbase via start-hbase.sh command. Therefore I'm pretty sure it is a standalone version. 2013/2/19 Ted Yu yuzhih...@gmail.com: Which HBase / hadoop version were you using ? Did you start the cluster in standalone

Re: Table deleted after restart of computer

2013-02-19 Thread Ibrahim Yakti
Hello Paul, The default location for hbase data is /tmp so when you restart your machine it will be deleted, you need to change it as per http://hbase.apache.org/book.html#quickstart -- Ibrahim On Tue, Feb 19, 2013 at 5:54 PM, Ted Yu yuzhih...@gmail.com wrote: Which HBase / hadoop version

Re: coprocessor enabled put very slow, help please~~~

2013-02-19 Thread Wei Tan
A side question: if HTablePool is not encouraged to be used... how do we handle thread safety when using HTable? Is any replacement for HTablePool planned? Thanks, Best Regards, Wei From: Michel Segel michael_se...@hotmail.com To: user@hbase.apache.org user@hbase.apache.org, Date:

Re: Optimizing Multi Gets in hbase

2013-02-19 Thread Varun Sharma
I have another question, if I am running a scan wrapped around multiple rows in the same region, in the following way: Scan scan = new Scan(getWithMultipleRowsInSameRegion); Now, how does execution occur? Is it just a sequential scan across the entire region or does it seek to hfile blocks

Re: coprocessor enabled put very slow, help please~~~

2013-02-19 Thread Michael Segel
Good question.. You create a class MyRO. How many instances of MyRO exist per RS? How many queries can access the instance MyRO at the same time? On Feb 19, 2013, at 9:15 AM, Wei Tan w...@us.ibm.com wrote: A side question: if HTablePool is not encouraged to be used... how we handle

Rowkey design question

2013-02-19 Thread Paul van Hoven
Hi, I'm currently playing with hbase. The design of the rowkey seems to be critical. The rowkey for a certain database table of mine is: timestamp+ipaddress It looks something like this when performing a scan on the table in the shell: hbase(main):012:0 scan 'ToyDataTable' ROW

Re: Rowkey design question

2013-02-19 Thread Mohammad Tariq
Hello Paul, Try this and see if it works: scan.setStartRow(Bytes.toBytes(startDate.getTime() + "")); scan.setStopRow(Bytes.toBytes(endDate.getTime() + 1 + "")); Also try not to use TS as the rowkey, as it may lead to RS hotspotting. Just add a hash to your rowkeys so that data is

Re: Rowkey design question

2013-02-19 Thread Paul van Hoven
Hey Tariq, thanks for your quick answer. I'm not sure if I got the idea in the second part of your answer. You mean if I use a timestamp as a rowkey I should append a hash like this: 135727920+MD5HASH and then the data would be distributed more equally? 2013/2/19 Mohammad Tariq

Re: coprocessor enabled put very slow, help please~~~

2013-02-19 Thread Michael Segel
I should follow up by saying that I was asking why he was using an HTable Pool, not saying that it was wrong. Still. I think in the pool the writes shouldn't have to go to the WAL. On Feb 19, 2013, at 10:01 AM, Michael Segel michael_se...@hotmail.com wrote: Good question.. You create a

Re: Optimizing Multi Gets in hbase

2013-02-19 Thread Nicolas Liochon
Imho, the easiest thing to do would be to write a filter. You need to order the rows, then you can use hints to navigate to the next row (SEEK_NEXT_USING_HINT). The main drawback I see is that the filter will be invoked on all region servers, including the ones that don't need it. But this would

Re: Using HBase for Deduping

2013-02-19 Thread Rahul Ravindran
I could surround with a Try..Catch, but that would mean that each time I insert a UUID for the first time (99% of the time), I would do a checkAndPut(), catch the resultant exception and perform a Put; so, 2 operations per reduce invocation, which is what I was looking to avoid

Re: Co-Processor in scanning the HBase's Table

2013-02-19 Thread Farrokh Shahriari
Thanks you guys On Mon, Feb 18, 2013 at 12:00 PM, Amit Sela am...@infolinks.com wrote: Yes... that was emailing half asleep... :) On Mon, Feb 18, 2013 at 7:23 AM, Anoop Sam John anoo...@huawei.com wrote: We dont have any hook like postScan().. In ur case you can try with

Re: Rowkey design question

2013-02-19 Thread Mohammad Tariq
No. before the timestamp. All the row keys which are identical go to the same region. This is the default Hbase behavior and is meant to make the performance better. But sometimes the machine gets overloaded with reads and writes because we get concentrated on that particular machine. For example
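
A minimal sketch of one common way to put a hash "before the timestamp", written here as an assumption rather than as what the poster had in mind: a small bucket number derived from the MD5 of the IP address is prefixed to the key, and a time-range query becomes one start/stop pair per bucket. The bucket count of 4, the key layout, and the names SaltedRowKey, bucketFor, rowKey and scanRanges are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; no HBase classes are used.
class SaltedRowKey {
    static final int BUCKETS = 4; // assumed number of salt buckets

    // Derive a stable bucket from the first byte of the MD5 of the IP address.
    static int bucketFor(String ipAddress) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(ipAddress.getBytes(StandardCharsets.UTF_8));
            return (md5[0] & 0xFF) % BUCKETS;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Key layout: <bucket>-<zero-padded millis>-<ip>; padding keeps string order numeric.
    static String rowKey(long timestampMillis, String ipAddress) {
        return String.format(Locale.ROOT, "%02d-%013d-%s",
                bucketFor(ipAddress), timestampMillis, ipAddress);
    }

    // One {startRow, stopRow} pair per bucket covering [startMillis, endMillis).
    static List<String[]> scanRanges(long startMillis, long endMillis) {
        List<String[]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            ranges.add(new String[] {
                String.format(Locale.ROOT, "%02d-%013d", b, startMillis),
                String.format(Locale.ROOT, "%02d-%013d", b, endMillis)
            });
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(rowKey(1357279200000L, "10.0.0.1"));
        for (String[] r : scanRanges(1357279200000L, 1357279300000L)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

The trade-off in this sketch is that writes spread over the buckets, while a time-range read has to issue one scan per bucket and merge the results on the client.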

Re: Rowkey design question

2013-02-19 Thread Paul van Hoven
Yeah it worked fine. But as I understand: If I prefix my row key with something like md5-hash + timestamp then the rowkeys are probably evenly distributed but how would I then perform a scan restricted to a specific time range? 2013/2/19 Mohammad Tariq donta...@gmail.com: No. before the

Re: Rowkey design question

2013-02-19 Thread Mohammad Tariq
You can use FuzzyRowFilter (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FuzzyRowFilter.html) to do that. Have a look at this link: http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/. You might find it helpful. Warm

Re: Optimizing Multi Gets in hbase

2013-02-19 Thread Varun Sharma
The other suggestion sounds better to me, where the multi call is modified to run the Get(s) with this new filter or just initiate a scan with all the get(s), since the client automatically groups the multi calls by region server and only calls the respective region servers. That would eliminate

Re: Optimizing Multi Gets in hbase

2013-02-19 Thread lars hofhansl
I was thinking along the same lines. Doing a skip scan via filter hinting. The problem is, as you say, that the Filter is instantiated everywhere and it might be of significant size (have to maintain all row keys you are looking for). RegionScanner now has a reseek method, so it is possible to do this
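
A minimal sketch of the skip-scan idea discussed in this thread, under the assumption that the store is sorted by row key and that the wanted rows are also kept sorted. It only models the seek-to-hint behaviour with a TreeMap; it does not use HBase's Filter or RegionScanner classes, and the names SkipScanSketch and skipScan are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; a TreeMap stands in for a sorted region.
class SkipScanSketch {

    // Returns the values of all wanted rows that exist in the store, in key order.
    static List<String> skipScan(TreeMap<String, String> store, TreeSet<String> wanted) {
        List<String> found = new ArrayList<>();
        for (String target : wanted) { // wanted rows are visited in sorted order
            // "Seek" straight to the first stored row >= target, skipping rows in between.
            Map.Entry<String, String> entry = store.ceilingEntry(target);
            if (entry == null) {
                break; // past the end of the store; no later target can match
            }
            if (entry.getKey().equals(target)) {
                found.add(entry.getValue());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        TreeMap<String, String> store = new TreeMap<>();
        store.put("row01", "a");
        store.put("row02", "b");
        store.put("row05", "c");
        store.put("row09", "d");
        TreeSet<String> wanted = new TreeSet<>(Arrays.asList("row05", "row03", "row01", "row10"));
        System.out.println(skipScan(store, wanted));
    }
}
```

The set of wanted keys is the state that has to travel with the filter, which is the size concern raised above.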

Re: Optimizing Multi Gets in hbase

2013-02-19 Thread Nicolas Liochon
Interesting, in the client we already group the multiget by location. So we could have the filter as HBase core code, and then we could use it in the client for the multiget: compared to my initial proposal, we don't have to change anything in the server code and we reuse the filtering

Re: Optimizing Multi Gets in hbase

2013-02-19 Thread Nicolas Liochon
As well, an advantage of going only to the servers needed is the famous MTTR: there is less chance of going to a dead server or to a region that just moved. On Tue, Feb 19, 2013 at 7:42 PM, Nicolas Liochon nkey...@gmail.com wrote: Interesting, in the client we're doing a group by location the

Scanning a row for certain md5hash does not work

2013-02-19 Thread Paul van Hoven
I'm currently reading a book about hbase (hbase in action by manning). In this book it is explained how to perform a scan if the rowkey is made out of a md5 hash (page 45 in the book). My rowkey design (and table filling method) looks like this: SimpleDateFormat dateFormatter = new

Re: Scanning a row for certain md5hash does not work

2013-02-19 Thread Paul van Hoven
Sorry, I had a mistake in my rowkey generation. Thanks for reading! 2013/2/19 Paul van Hoven paul.van.ho...@googlemail.com: I'm currently reading a book about hbase (hbase in action by manning). In this book it is explained how to perform a scan if the rowkey is made out of a md5 hash (page

Re: coprocessor enabled put very slow, help please~~~

2013-02-19 Thread Andrew Purtell
A coprocessor is some code running in a server process. The resources available and rules of the road are different from client side programming. HTablePool (and HTable in general) is problematic for server side programming in my opinion: http://search-hadoop.com/m/XtAi5Fogw32 Since this comes up

Re: coprocessor enabled put very slow, help please~~~

2013-02-19 Thread Asaf Mesika
1. Try batching your increment calls to a List<Row> and use batch() to execute it. Should reduce RPC calls by two orders of magnitude. 2. Combine batching with scanning more words, thus aggregating your count for a certain word and issuing fewer Increment commands. 3. Enable Bloom Filters. Should speed up Increment by

Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
Hi, Is there any way to balance just one table? I found that one of my tables is not balanced, while all the other tables are balanced. So I want to fix this table. Best Regards, Raymond Liu

Re: Is there any way to balance one table?

2013-02-19 Thread Ted Yu
What version of HBase are you using ? 0.94 has per-table load balancing. Cheers On Tue, Feb 19, 2013 at 5:01 PM, Liu, Raymond raymond@intel.com wrote: Hi Is there any way to balance just one table? I found one of my table is not balanced, while all the other table is balanced. So I

RE: Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
0.94.1. Any cmd in shell? Or do I need to change the balance threshold to 0 and run the global balancer cmd in shell? Best Regards, Raymond Liu -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, February 20, 2013 9:09 AM To: user@hbase.apache.org Subject: Re: Is

availability of 0.94.4 and 0.94.5 in maven repo?

2013-02-19 Thread James Taylor
Unless I'm doing something wrong, it looks like the Maven repository (http://mvnrepository.com/artifact/org.apache.hbase/hbase) only contains HBase up to 0.94.3. Is there a different repo I should use, or if not, any ETA on when it'll be updated? James

Re: availability of 0.94.4 and 0.94.5 in maven repo?

2013-02-19 Thread Viral Bajaria
I have come across this too, I think someone with authorization needs to perform a maven release to the apache maven repository and/or maven central. For now, I just end up compiling the dot release from trunk and deploy it to my local repository for other projects to use. Thanks, Viral On Tue,

Re: availability of 0.94.4 and 0.94.5 in maven repo?

2013-02-19 Thread Joarder KAMAL
I also came across the same issue 1 day ago while building YCSB HBase client for HBase 0.94.5. Later I used the 0.94.3 version to carry out my work for the time being. Regards, Joarder Kamal On 20 February 2013 12:32, Viral Bajaria viral.baja...@gmail.com wrote: I have come across this too,

RE: Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
I chose to move regions manually. Any other approach? 0.94.1. Any cmd in shell? Or do I need to change the balance threshold to 0 and run the global balancer cmd in shell? Best Regards, Raymond Liu -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday,

Re: availability of 0.94.4 and 0.94.5 in maven repo?

2013-02-19 Thread Andrew Purtell
Same here, just tripped over this moments ago. On Tue, Feb 19, 2013 at 5:30 PM, James Taylor jtay...@salesforce.com wrote: Unless I'm doing something wrong, it looks like the Maven repository (

Re: Is there any way to balance one table?

2013-02-19 Thread Jean-Marc Spaggiari
Hi Liu, Why didn't you simply call the balancer? If other tables are already balanced, it should not touch them and will only balance the table which is not balanced. JM 2013/2/19, Liu, Raymond raymond@intel.com: I choose to move region manually. Any other approaching? 0.94.1 Any

Problem In Understanding Compaction Process

2013-02-19 Thread Anty
Hi guys, I have a problem understanding the compaction process. Can someone shed some light on it? Much appreciated. Here is the problem: after the Region Server successfully generates the final compacted file, it goes through two steps: 1. move the above compacted file into

Re: Is there any way to balance one table?

2013-02-19 Thread Ted Yu
HBASE-3373 introduced hbase.master.loadbalance.bytable which defaults to true. This means when you issue 'balancer' command in shell, table should be balanced for you. Cheers On Tue, Feb 19, 2013 at 5:16 PM, Liu, Raymond raymond@intel.com wrote: 0.94.1 Any cmd in shell? Or I need to

RE: Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
Hi, I did call the balancer, but it seems it doesn't work. Might it be because this table is small and the overall region count difference is within the threshold? -Original Message- From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org] Sent: Wednesday, February 20, 2013 10:59 AM To:

Re: Is there any way to balance one table?

2013-02-19 Thread Marcos Ortiz
What is the size of your table? On 02/19/2013 10:40 PM, Liu, Raymond wrote: Hi I do call balancer, while it seems it doesn't work. Might due to this table is small and overall region number difference is within threshold? -Original Message- From: Jean-Marc Spaggiari

Re: Is there any way to balance one table?

2013-02-19 Thread Ted Yu
You're right. Default sloppiness is 20%: this.slop = conf.getFloat("hbase.regions.slop", (float) 0.2); src/main/java/org/apache/hadoop/hbase/master/DefaultLoadBalancer.java Meaning, region count on any server can be as far as 20% from average region count. You can tighten sloppiness. On Tue,

RE: Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
I mean the region count is small. Overall I have, say, 3000 regions on 4 nodes, while this table only has 96 regions. It won't be 24 for each region server; instead, it will be something like 19/30/23/21 etc. Does this mean that I need to limit the slop to 0.02 or so, so that the balancer actually runs on
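
A minimal sketch of the slop arithmetic being discussed, under the assumption (taken from the description in this thread, not from the HBase source) that balancing is triggered when the most or least loaded server falls outside a floor and ceiling derived from the average region count and the slop. The class and method names are hypothetical, and the region counts are example values.

```java
// Hypothetical illustration only; this is not the DefaultLoadBalancer code.
class SlopCheck {

    // Returns true if any server is outside [floor, ceiling] around the average.
    static boolean needsBalancing(int[] regionsPerServer, float slop) {
        int total = 0;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int count : regionsPerServer) {
            total += count;
            min = Math.min(min, count);
            max = Math.max(max, count);
        }
        float average = (float) total / regionsPerServer.length;
        int floor = (int) Math.floor(average * (1 - slop));
        int ceiling = (int) Math.ceil(average * (1 + slop));
        return max > ceiling || min < floor;
    }

    public static void main(String[] args) {
        // One table, 96 regions on 4 servers: average 24, allowed range 19..29 at slop 0.2.
        System.out.println(needsBalancing(new int[] {19, 30, 24, 23}, 0.2f));
        // Whole cluster, about 3000 regions: a few regions of skew stay well inside 600..900.
        System.out.println(needsBalancing(new int[] {745, 756, 749, 750}, 0.2f));
    }
}
```

Under these assumptions, the same skew of a few regions is significant when the check runs per table and negligible when it runs over the whole cluster, which is why it matters whether slop is applied per table.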

RE: Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
Yeah. Since balancing is already done per table, why is slop not calculated per table... You're right. Default sloppiness is 20%: this.slop = conf.getFloat("hbase.regions.slop", (float) 0.2); src/main/java/org/apache/hadoop/hbase/master/DefaultLoadBalancer.java Meaning, region

Re: Is there any way to balance one table?

2013-02-19 Thread Ted Yu
Yes, Raymond. You should lower sloppiness. On Tue, Feb 19, 2013 at 7:48 PM, Liu, Raymond raymond@intel.com wrote: I mean region number is small. Overall I have say 3000 region on 4 node, while this table only have 96 region. It won't be 24 for each region server, instead , will be

RE: Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
Hmm, in order to have the 96-region table balanced within 20% on a 3000-region cluster when all other tables are balanced, the slop will need to be around 20%/30, say 0.006? Won't that be too small? Yes, Raymond. You should lower sloppiness. On Tue, Feb 19, 2013 at 7:48 PM, Liu, Raymond

Re: Is there any way to balance one table?

2013-02-19 Thread Ted Yu
bq. On a 3000 region cluster Balancing is per-table. Meaning total number of regions doesn't come into play. On Tue, Feb 19, 2013 at 7:55 PM, Liu, Raymond raymond@intel.com wrote: Hmm, in order to have the 96 region table be balanced within 20% On a 3000 region cluster when all other

region server of -ROOT- table is dead, but not reassigned

2013-02-19 Thread Lu, Wei
Hi, all, When I scan any table, I got: 13/02/20 05:16:45 INFO ipc.HBaseRPC: Server at Rs1/10.20.118.3:60020 could not be reached after 1 tries, giving up. ... ERROR: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=7, exceptions: ... What I observe: 1)

RE: region server of -ROOT- table is dead, but not reassigned

2013-02-19 Thread Lu, Wei
By the way, the hbase version I am using is 0.92.1-cdh4.0.1 From: Lu, Wei Sent: Wednesday, February 20, 2013 1:28 PM To: user@hbase.apache.org Subject: region server of -ROOT- table is dead, but not reassigned Hi, all, When I scan any table, I got: 13/02/20 05:16:45 INFO ipc.HBaseRPC: Server

[resend] region server of -ROOT- table is dead, but not reassigned

2013-02-19 Thread Lu, Wei
Hi, all, When I scan any table, I got: 13/02/20 05:16:45 INFO ipc.HBaseRPC: Server at Rs1/10.20.118.3:60020 could not be reached after 1 tries, giving up. ... ERROR: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=7, exceptions: ... What I observe:

RE: Is there any way to balance one table?

2013-02-19 Thread Liu, Raymond
You mean slop is also calculated per table? Weird, then it should work for my case. Let me check again. Best Regards, Raymond Liu bq. On a 3000 region cluster Balancing is per-table. Meaning total number of regions doesn't come into play. On Tue, Feb 19, 2013 at 7:55 PM, Liu, Raymond

Re: [resend] region server of -ROOT- table is dead, but not reassigned

2013-02-19 Thread ramkrishna vasudevan
Ideally the ROOT table should be reassigned once the RS carrying ROOT goes down. This should happen automatically. What do your logs say? That would give us an insight. Before that, if you can restart your master, it may solve this problem. If it still persists, try to delete the zk

Re: availability of 0.94.4 and 0.94.5 in maven repo?

2013-02-19 Thread lars hofhansl
Time permitting, I will do that tomorrow. From: Andrew Purtell apurt...@apache.org To: user@hbase.apache.org user@hbase.apache.org Sent: Tuesday, February 19, 2013 6:58 PM Subject: Re: availability of 0.94.4 and 0.94.5 in maven repo? Same here, just tripped

Re: PreSplit the table with Long format

2013-02-19 Thread Farrokh Shahriari
Hello again, Does anyone know how I can do this? The problem is: When you insert something from the shell, it supposes it's a string and then does a Bytes.toBytes conversion on the string and stores it in hbase. So how can I tell the shell that the thing I'm entering isn't a string? How can I
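
A minimal sketch of the difference at the heart of this question: the bytes of the string "1000" versus the 8-byte big-endian encoding of the long 1000, which is, to my understanding, the layout that Bytes.toBytes(long) produces. It uses only the JDK; the names LongSplitKeys, longToBytes and evenSplits are hypothetical, and the split helper assumes non-negative keys, since negative longs would sort after positive ones in byte order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; no HBase classes are used.
class LongSplitKeys {

    // 8-byte big-endian encoding of a long (ByteBuffer is big-endian by default).
    static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Evenly spaced split points for `regions` regions over [min, max), with min >= 0.
    static byte[][] evenSplits(long min, long max, int regions) {
        byte[][] splits = new byte[regions - 1][];
        long step = (max - min) / regions;
        for (int i = 1; i < regions; i++) {
            splits[i - 1] = longToBytes(min + i * step);
        }
        return splits;
    }

    public static void main(String[] args) {
        // What a string-based conversion stores: the characters '1', '0', '0', '0'.
        System.out.println(Arrays.toString("1000".getBytes(StandardCharsets.UTF_8)));
        // What a long-based conversion stores: eight bytes, most significant first.
        System.out.println(Arrays.toString(longToBytes(1000L)));
        for (byte[] split : evenSplits(0L, 1000L, 4)) {
            System.out.println(Arrays.toString(split));
        }
    }
}
```

Split keys built this way only line up with the data if the row keys themselves are written with the same 8-byte encoding.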