I should qualify that statement, actually.
I was comparing scanning 1m KVs to getting 1m KVs when all KVs are returned.
As James Taylor pointed out to me privately: A fairer comparison would have
been to run a scan with a filter that lets x% of the rows pass (i.e. the
selectivity of the scan
If you store data in LSM trees you need compactions.
The advantage is that your data files are immutable.
MapR has a mutable file system and they probably store their data in something
more akin to B-Trees...?
Or maybe they somehow avoid the expensive merge sorting of many small files. It
seems
Looking at the code, it seems possible to do this server-side within the
multi invocation: we could group the gets by region and do a single scan.
We could also add some heuristics if necessary...
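The grouping step described above can be sketched with plain JDK collections. This is only an illustration: the region boundaries below are hypothetical stand-ins, whereas the real HBase client resolves them from its region cache.

```java
import java.util.*;

// Sketch: group a batch of row keys by the region that would serve them.
// Region start keys here are made up; HBase's client derives them from META.
public class GetGrouper {
    // Each region is identified by its start key; a row belongs to the
    // region with the greatest start key <= row (lexicographic order).
    public static Map<String, List<String>> groupByRegion(
            SortedSet<String> regionStartKeys, List<String> rows) {
        TreeMap<String, List<String>> groups = new TreeMap<>();
        TreeSet<String> starts = new TreeSet<>(regionStartKeys);
        for (String row : rows) {
            String start = starts.floor(row);
            if (start == null) start = starts.first(); // row before first region boundary
            groups.computeIfAbsent(start, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        SortedSet<String> regions = new TreeSet<>(Arrays.asList("", "g", "p"));
        List<String> rows = Arrays.asList("apple", "hbase", "zebra", "grape");
        // Each map entry could then become one scan over that region.
        System.out.println(groupByRegion(regions, rows));
    }
}
```

Each resulting group could then be served by a single scan bounded by its smallest and largest key, instead of one get per row.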
On Tue, Feb 19, 2013 at 9:02 AM, lars hofhansl la...@apache.org wrote:
HBase shell is a jruby shell and so you can invoke any java commands from
it.
For example:
import org.apache.hadoop.hbase.util.Bytes
Bytes.toLong(Bytes.toBytes(1000))
Not sure if this works as expected since I don't have a terminal in front
of me but you could try (assuming the SPLITS keyword
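One caveat with the snippet above: in HBase's Bytes class, Bytes.toBytes(1000) encodes a 4-byte int, while Bytes.toLong expects 8 bytes, so for a round trip you'd want Bytes.toBytes(1000L). The big-endian layout involved can be reproduced with plain JDK classes; this is a sketch of that encoding, not the HBase implementation itself:

```java
import java.nio.ByteBuffer;

// Sketch of the big-endian long encoding used for HBase row/value bytes.
public class LongBytes {
    public static byte[] toBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    public static long toLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        byte[] b = toBytes(1000L); // note the L: 8 bytes, not a 4-byte int
        System.out.println(b.length + " bytes, round-trips to " + toLong(b));
    }
}
```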
Thanks for your help, but it doesn't work. Do you have any other ideas? I
must run it from the shell.
Farrokh
On Tue, Feb 19, 2013 at 1:30 PM, Viral Bajaria viral.baja...@gmail.com wrote:
Hi Jean-Marc,
I've validated this, it works perfectly. Very easy to implement and it's
very fast!
Thankfully in this project there aren't a lot of lists in each table, so I
won't have to create too many column families. In other scenarios it could
be a problem.
Many thanks,
Stas
On 16 February
Hi Stas,
Don't forget that you should always try to keep the number of column
families lower than 3, or you might face performance issues.
JM
2013/2/19, Stas Maksimov maksi...@gmail.com:
I just started with HBase. I created a table and filled it with some
data. But after restarting my computer all the data is gone. This even
happens when stopping HBase with stop-hbase.sh.
How can this happen?
Which HBase / hadoop version were you using ?
Did you start the cluster in standalone mode ?
Thanks
On Tue, Feb 19, 2013 at 5:23 AM, Paul van Hoven
paul.van.ho...@googlemail.com wrote:
I installed hbase via brew.
brew install hadoop hbase pig hive
Then I started HBase via the start-hbase.sh command, so I'm pretty
sure it is a standalone version.
2013/2/19 Ted Yu yuzhih...@gmail.com:
Hello Paul,
The default location for HBase data is /tmp, so when you restart your
machine the data will be deleted. You need to change it as per
http://hbase.apache.org/book.html#quickstart
--
Ibrahim
On Tue, Feb 19, 2013 at 5:54 PM, Ted Yu yuzhih...@gmail.com wrote:
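Per the quickstart page linked above, the fix is to point hbase.rootdir at a durable location in hbase-site.xml. A minimal sketch for standalone mode (the path is an example; adjust it to your machine):

```xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- any durable local path works in standalone mode -->
    <value>file:///home/paul/hbase-data</value>
  </property>
</configuration>
```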
A side question: if HTablePool is discouraged, how do we handle
thread safety when using HTable? Is any replacement for HTablePool
planned?
Thanks,
Best Regards,
Wei
From: Michel Segel michael_se...@hotmail.com
To: user@hbase.apache.org user@hbase.apache.org,
Date:
I have another question: if I am running a scan wrapped around multiple
rows in the same region, in the following way:
Scan scan = new Scan(getWithMultipleRowsInSameRegion);
how does execution occur? Is it just a sequential scan across the
entire region, or does it seek to HFile blocks
Good question..
You create a class MyRO.
How many instances of MyRO exist per RS?
How many queries can access the instance MyRO at the same time?
On Feb 19, 2013, at 9:15 AM, Wei Tan w...@us.ibm.com wrote:
Hi,
I'm currently playing with hbase. The design of the rowkey seems to be
critical.
The rowkey for a certain database table of mine is:
timestamp+ipaddress
It looks something like this when performing a scan on the table in the shell:
hbase(main):012:0> scan 'ToyDataTable'
ROW
Hello Paul,
Try this and see if it works:
scan.setStartRow(Bytes.toBytes(startDate.getTime() + ""));
scan.setStopRow(Bytes.toBytes(endDate.getTime() + 1 + ""));
Also try not to use TS as the rowkey, as it may lead to RS hotspotting.
Just add a hash to your rowkeys so that data is
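The hashing idea can be sketched like this. It is only an illustration, not the poster's code: the 4-byte prefix length and the source-id notion are arbitrary choices for the sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Sketch: prefix a timestamp row key with a few bytes of MD5(sourceId)
// so writes spread across regions instead of hotspotting one region server.
public class SaltedKey {
    static final int PREFIX_LEN = 4; // arbitrary; trade-off vs. scan fan-out

    public static byte[] make(String sourceId, long timestamp) {
        try {
            byte[] hash = MessageDigest.getInstance("MD5")
                    .digest(sourceId.getBytes(StandardCharsets.UTF_8));
            ByteBuffer key = ByteBuffer.allocate(PREFIX_LEN + Long.BYTES);
            key.put(hash, 0, PREFIX_LEN); // distributes keys across the keyspace
            key.putLong(timestamp);       // keeps per-source time ordering
            return key.array();
        } catch (Exception e) {
            throw new RuntimeException(e); // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        byte[] k = make("10.0.0.1", 1361272920000L);
        System.out.println(k.length + "-byte row key");
    }
}
```

The same sourceId always yields the same prefix, so all rows for one source stay contiguous and time-ordered, while different sources land in different parts of the keyspace.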
Hey Tariq,
thanks for your quick answer. I'm not sure I got the idea in the
second part of your answer. You mean if I use a timestamp as a rowkey I
should append a hash like this:
135727920+MD5HASH
and then the data would be distributed more equally?
2013/2/19 Mohammad Tariq
I should follow up: I was asking why he was using an HTablePool, not
saying that it was wrong.
Still, I think in the pool the writes shouldn't have to go to the WAL.
On Feb 19, 2013, at 10:01 AM, Michael Segel michael_se...@hotmail.com wrote:
Imho, the easiest thing to do would be to write a filter.
You need to order the rows, then you can use hints to navigate to the next
row (SEEK_NEXT_USING_HINT).
The main drawback I see is that the filter will be invoked on all region
servers, including the ones that don't need it. But this would
I could surround it with a try..catch, but then each time I insert a UUID
for the first time (99% of the time) I would do a checkAndPut(), catch the
resulting exception, and perform a Put: two operations per reduce
invocation, which is what I was looking to avoid.
Thank you, guys.
On Mon, Feb 18, 2013 at 12:00 PM, Amit Sela am...@infolinks.com wrote:
Yes... that was emailing half asleep... :)
On Mon, Feb 18, 2013 at 7:23 AM, Anoop Sam John anoo...@huawei.com
wrote:
We don't have any hook like postScan(). In your case you can try with
No, before the timestamp. All the row keys which are identical go to the
same region. This is the default HBase behavior and is meant to improve
performance. But sometimes one machine gets overloaded because the reads
and writes are concentrated on that particular machine. For
example
Yeah it worked fine.
But as I understand: If I prefix my row key with something like
md5-hash + timestamp
then the rowkeys are probably evenly distributed, but how would I then
perform a scan restricted to a specific time range?
2013/2/19 Mohammad Tariq donta...@gmail.com:
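With a hash or salt prefix, a time-range query can no longer be one contiguous scan: you issue one scan per prefix bucket and merge the results. A sketch of building the per-bucket scan boundaries, assuming a made-up layout of one salt byte followed by an 8-byte timestamp (the bucket count is illustrative):

```java
import java.nio.ByteBuffer;
import java.util.*;

// Sketch: build (startRow, stopRow) pairs, one per salt bucket, for a
// time-range query over keys laid out as [1 salt byte][8-byte timestamp].
public class SaltedRangeScan {
    public static List<byte[][]> ranges(int buckets, long startTs, long endTs) {
        List<byte[][]> out = new ArrayList<>();
        for (int b = 0; b < buckets; b++) {
            byte[] start = ByteBuffer.allocate(9).put((byte) b).putLong(startTs).array();
            // +1 because the stop row of a scan is exclusive
            byte[] stop = ByteBuffer.allocate(9).put((byte) b).putLong(endTs + 1).array();
            out.add(new byte[][] { start, stop });
        }
        return out;
    }

    public static void main(String[] args) {
        // One Scan per bucket would be issued with these boundaries.
        System.out.println(ranges(8, 1361232000000L, 1361318400000L).size() + " scans needed");
    }
}
```

The cost of the even write distribution is this read-side fan-out: N buckets means N scans per time-range query.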
You can use FuzzyRowFilter
(http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FuzzyRowFilter.html)
to do that.
Have a look at this link
(http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/).
You might find it helpful.
Warm
The other suggestion sounds better to me, where the multi call is modified
to run the Get(s) with this new filter, or just initiates a scan with all
the get(s). Since the client automatically groups the multi calls by region
server and only calls the respective region servers, that would eliminate
I was thinking along the same lines. Doing a skip scan via filter hinting. The
problem is as you say that the Filter is instantiated everywhere and it might
be of significant size (have to maintain all row keys you are looking for).
RegionScanner now has a reseek method, so it is possible to do this
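The hint computation at the heart of such a skip scan is simple; here is a stdlib sketch of picking the next seek target (in a real implementation this would live inside a Filter returning SEEK_NEXT_USING_HINT, and the class name here is made up):

```java
import java.util.*;

// Sketch: given the sorted set of wanted row keys, compute the hint a
// skip-scan filter would return for the row the scanner is currently on.
public class SkipScanHint {
    private final TreeSet<String> wanted;

    public SkipScanHint(Collection<String> rows) {
        // This is the per-instance state the thread worries about: the
        // filter must carry all target keys to every region server.
        this.wanted = new TreeSet<>(rows);
    }

    // Smallest wanted key >= current row, or null when the scan can stop.
    public String nextHint(String currentRow) {
        return wanted.ceiling(currentRow);
    }

    public static void main(String[] args) {
        SkipScanHint h = new SkipScanHint(Arrays.asList("row05", "row20", "row60"));
        System.out.println(h.nextHint("row10")); // scanner would seek forward
    }
}
```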
Interesting. In the client we're doing a group-by-location for the multiget.
So we could have the filter as HBase core code, and then we could use it in
the client for the multiget: compared to my initial proposal, we don't have
to change anything in the server code and we reuse the filtering
As well, an advantage of going only to the servers needed is the famous
MTTR: there is less chance of hitting a dead server or a region that
just moved.
On Tue, Feb 19, 2013 at 7:42 PM, Nicolas Liochon nkey...@gmail.com wrote:
I'm currently reading a book about hbase (hbase in action by manning).
In this book it is explained how to perform a scan if the rowkey is
made out of a md5 hash (page 45 in the book). My rowkey design (and
table filling method) looks like this:
SimpleDateFormat dateFormatter = new
Sorry, I had a mistake in my rowkey generation.
Thanks for reading!
2013/2/19 Paul van Hoven paul.van.ho...@googlemail.com:
A coprocessor is some code running in a server process. The resources
available and rules of the road are different from client side programming.
HTablePool (and HTable in general) is problematic for server side
programming in my opinion: http://search-hadoop.com/m/XtAi5Fogw32 Since
this comes up
1. Try batching your increment calls into a List<Row> and use batch() to
execute it. That should reduce RPC calls by two orders of magnitude.
2. Combine batching with scanning more words, aggregating your count
for a given word, and thus issue fewer Increment commands.
3. Enable Bloom filters. That should speed up Increment by
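Point 2, aggregating counts before issuing increments, can be sketched with a plain map; each resulting entry would then become a single Increment in the batch rather than one RPC per word occurrence. This is an illustration, not the poster's code:

```java
import java.util.*;

// Sketch: collapse per-word occurrences into one count per word, so a
// batch of N Increment ops replaces one RPC per individual occurrence.
public class CountAggregator {
    public static Map<String, Long> aggregate(List<String> words) {
        Map<String, Long> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1L, Long::sum); // accumulate locally, flush once
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hbase", "scan", "hbase", "hbase");
        // Each entry would become one element of the batched increment list.
        System.out.println(aggregate(words));
    }
}
```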
Hi
Is there any way to balance just one table? I found that one of my tables is
not balanced, while all the other tables are balanced. So I want to fix this table.
Best Regards,
Raymond Liu
What version of HBase are you using ?
0.94 has per-table load balancing.
Cheers
On Tue, Feb 19, 2013 at 5:01 PM, Liu, Raymond raymond@intel.com wrote:
0.94.1
Any cmd in shell? Or do I need to change the balance threshold to 0 and run
the global balancer cmd in the shell?
Best Regards,
Raymond Liu
-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, February 20, 2013 9:09 AM
To: user@hbase.apache.org
Subject: Re: Is
Unless I'm doing something wrong, it looks like the Maven repository
(http://mvnrepository.com/artifact/org.apache.hbase/hbase) only contains
HBase up to 0.94.3. Is there a different repo I should use, or if not,
any ETA on when it'll be updated?
James
I have come across this too, I think someone with authorization needs to
perform a maven release to the apache maven repository and/or maven central.
For now, I just end up compiling the dot release from trunk and deploy it
to my local repository for other projects to use.
Thanks,
Viral
On Tue,
I also ran into the same issue a day ago while building the YCSB HBase
client for HBase 0.94.5. Later I used the 0.94.3 version to carry out my
work for the time being.
Regards,
Joarder Kamal
On 20 February 2013 12:32, Viral Bajaria viral.baja...@gmail.com wrote:
I chose to move regions manually. Any other approach?
Same here, just tripped over this moments ago.
On Tue, Feb 19, 2013 at 5:30 PM, James Taylor jtay...@salesforce.com wrote:
Hi Liu,
Why didn't you simply call the balancer? If the other tables are
already balanced, it should not touch them and will only balance the
table which is not balanced.
JM
2013/2/19, Liu, Raymond raymond@intel.com:
Hi Guys,
I have some problem understanding the compaction process. Can
someone shed some light on this? Much appreciated. Here is the problem:
After the region server successfully generates the final compacted file,
it goes through two steps:
1. move the above compacted file into
HBASE-3373 introduced hbase.master.loadbalance.bytable which defaults to
true.
This means when you issue 'balancer' command in shell, table should be
balanced for you.
Cheers
On Tue, Feb 19, 2013 at 5:16 PM, Liu, Raymond raymond@intel.com wrote:
Hi
I did call the balancer, but it seems it doesn't work. Might this be because
the table is small and the overall region-count difference is within the threshold?
-Original Message-
From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org]
Sent: Wednesday, February 20, 2013 10:59 AM
To:
What is the size of your table?
On 02/19/2013 10:40 PM, Liu, Raymond wrote:
You're right. Default sloppiness is 20%:
this.slop = conf.getFloat("hbase.regions.slop", (float) 0.2);
src/main/java/org/apache/hadoop/hbase/master/DefaultLoadBalancer.java
Meaning, region count on any server can be as far as 20% from average
region count.
You can tighten sloppiness.
On Tue,
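With the default slop of 0.2, the balancer tolerates roughly the following band around the per-server average. This is only a sketch of the floor/ceil arithmetic implied above; the actual DefaultLoadBalancer logic involves more than this:

```java
// Sketch: the region-count band a server may sit in before the balancer
// moves regions, given an average load and a slop fraction.
public class SlopBand {
    public static int min(double avg, float slop) {
        return (int) Math.floor(avg * (1 - slop));
    }

    public static int max(double avg, float slop) {
        return (int) Math.ceil(avg * (1 + slop));
    }

    public static void main(String[] args) {
        double avg = 96 / 4.0; // 96 regions spread over 4 region servers
        System.out.println(min(avg, 0.2f) + ".." + max(avg, 0.2f)
                + " regions per server tolerated");
    }
}
```

For 96 regions on 4 servers the average is 24, so anywhere from 19 to 29 regions per server is within the default band, which matches the 19/30/23/21 distribution reported later in this thread being left mostly alone.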
I mean the region number is small.
Overall I have, say, 3000 regions on 4 nodes, while this table only has 96
regions. It won't be 24 for each region server; instead it will be something
like 19/30/23/21 etc.
Does this mean I need to lower the slop to 0.02 or so, so that the balancer
actually runs on
Yeah. Since balancing is already done per table, why isn't slop calculated
per table...
Yes, Raymond.
You should lower sloppiness.
On Tue, Feb 19, 2013 at 7:48 PM, Liu, Raymond raymond@intel.com wrote:
Hmm, in order to have the 96-region table balanced within 20% on a 3000-region
cluster when all other tables are balanced, the slop would need to be around
20%/30, say 0.006? Won't that be too small?
bq. On a 3000 region cluster
Balancing is per-table. Meaning total number of regions doesn't come into
play.
On Tue, Feb 19, 2013 at 7:55 PM, Liu, Raymond raymond@intel.com wrote:
Hi, all,
When I scan any table, I got:
13/02/20 05:16:45 INFO ipc.HBaseRPC: Server at Rs1/10.20.118.3:60020 could not
be reached after 1 tries, giving up.
...
ERROR: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=7, exceptions:
...
What I observe:
1)
By the way, the hbase version I am using is 0.92.1-cdh4.0.1
From: Lu, Wei
Sent: Wednesday, February 20, 2013 1:28 PM
To: user@hbase.apache.org
Subject: region server of -ROOT- table is dead, but not reassigned
You mean slop is also computed per table? Weird, then it should work for my
case. Let me check again.
Best Regards,
Raymond Liu
Ideally the ROOT table should be reassigned once the RS carrying ROOT goes
down. This should happen automatically.
What do your logs say? That would give us an insight.
Before that, restarting your master may solve this problem. Even
then, if it persists, try deleting the zk
Time permitting, I will do that tomorrow.
From: Andrew Purtell apurt...@apache.org
To: user@hbase.apache.org user@hbase.apache.org
Sent: Tuesday, February 19, 2013 6:58 PM
Subject: Re: availability of 0.94.4 and 0.94.5 in maven repo?
Same here, just tripped
Hello again,
Does anyone know how I can do this?
The problem is:
When you insert something from the shell, it assumes it's a string,
does a Bytes.toBytes conversion on the string, and stores it in HBase.
So how can I tell the shell that the thing I'm entering isn't a string? How
can I