Hi all,
I was wondering how many HBase users there are in Paris (France...).
Would you guys be interested in participating in a Paris-based user group?
The idea would be to share HBase practices, with something like a meet-up
per quarter.
Reply to me directly or on the list, as you prefer.
is to copy data from the old cluster to the new one and switch clients to the new cluster,
and I am looking for the best strategy to manage it.
A scanner based on timestamps should be enough to get the last updates
after switching (but trying to keep it short).
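For illustration, such a catch-up scan could look like this with the 0.94-era client API (the table name and the cut-over timestamp variable are made up; this is a sketch, not the poster's actual code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "mytable"); // hypothetical table name
long cutOverTimestamp = 0L;                 // set to the time of the copy

Scan scan = new Scan();
// Only return cells written at or after the cut-over time
scan.setTimeRange(cutOverTimestamp, Long.MAX_VALUE);
ResultScanner scanner = table.getScanner(scan);
for (Result r : scanner) {
  // re-apply r to the new cluster
}
scanner.close();
table.close();
```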
Cheers,
--
Damien
2012/9/27 n keywal nkey
limiting the export with a start time
http://hadoop.apache.org/docs/hdfs/current/hftp.html
Would this way be safe, with minimal downtime?
Cheers,
2012/9/28 n keywal nkey...@gmail.com
Depending on what you're doing with the data, I guess you might have some
corner cases, especially after a major
Hi,
I would like to direct you to the reference guide, but I must acknowledge
that, well, it's a reference guide, hence not really easy for a plain new
start.
You should have a look at Lars' blog (and maybe buy his book), and
especially this entry:
scan tables, as the two nodes are in the
cluster as namenode/datanode.
On Thu, Sep 27, 2012 at 1:02 PM, n keywal nkey...@gmail.com wrote:
Hi,
I would like to direct you to the reference guide, but I must acknowledge
that, well, it's a reference guide, hence not really easy for a plain new
You don't have to migrate the data when you upgrade, it's done on the fly.
But it seems you want to do something more complex? A kind of realtime
replication between two clusters in two different versions?
On Thu, Sep 27, 2012 at 9:56 AM, Damien Hardy dha...@viadeoteam.com wrote:
Hello,
Hi,
I think there is some confusion between hbase replication (replication between
clusters) and hdfs replication (replication between datanodes).
hdfs replication is (more or less) hidden and done for you.
Nicolas
On Wed, Sep 26, 2012 at 9:20 AM, Venkateswara Rao Dokku dvrao@gmail.com
wrote:
DoNotRetryIOException means that the error is considered permanent: it's
not a missing regionserver, but for example a table that's not enabled.
I would expect a more detailed exception (a 'caused by' or something similar).
If it's missing, you should have more info in the regionserver logs.
On
For each file, there is a time range. When you scan/search, the file is
skipped if there is no overlap between the file timerange and the timerange
of the query. As there are other parameters as well (row distribution,
compaction effects, cache, bloom filters, ...) it's difficult to know in
Hi,
You can use HBase in standalone mode. Cf.
http://hbase.apache.org/book.html#standalone_dist
I guess you already tried and it didn't work?
Nicolas
On Fri, Sep 7, 2012 at 9:57 AM, Jeroen Hoek jer...@lable.org wrote:
Hello,
We are developing a web-application that uses HBase as database,
Hi,
With 8 regionservers, yes, you can. Target a few hundred by default imho.
N.
On Wed, Sep 5, 2012 at 4:55 AM, 某因幡 tewil...@gmail.com wrote:
+HBase users.
-- Forwarded message --
From: Dmitriy Ryaboy dvrya...@gmail.com
Date: 2012/9/4
Subject: Re: Extremely slow when
Hi Cristopher,
HBase starts a minicluster for many of its tests because we have a lot of
destructive tests; otherwise the non-destructive tests would be impacted by the
destructive ones. When writing a client application, you usually don't
need to do that: you can rely on the same instance for all your
On Fri, Aug 31, 2012 at 2:33 PM, Cristofer Weber
cristofer.we...@neogrid.com wrote:
For the other adapters (Cassandra, Cassandra + Thrift, Cassandra +
Astyanax, etc) they managed to run tests as Internal and External for unit
tests and also have a profile for Performance and Concurrent tests,
Hi Bing,
You should expect HBase to be slower in the generic case:
1) it writes much more data (see the hbase data model), with extra column
qualifiers, timestamps and so on.
2) the data is written multiple times: once in the write-ahead-log, once
per replica on the datanodes, and so on again.
3) there are
Totally random (even on keys that do not exist).
It's worth checking if it matches your real use cases. I expect that reads by
row key are most of the time on existing rows (as in a traditional db
relationship, or UI- or workflow-driven stuff), even if I'm sure it's
possible to have something
Hi Adrien,
What do you think about that hypothesis ?
Yes, there is something fishy to look at here. Difficult to say
without more logs as well.
Are your gets totally random, or are you doing gets on rows that do
exist? That would explain the number of request vs. empty/full
regions.
It does
Hi,
For a possible future, there is as well this to monitor:
http://docs.oracle.com/javase/7/docs/technotes/guides/vm/G1.html
More or less requires JDK 1.7
See HBASE-2039
Cheers,
N.
On Thu, Aug 23, 2012 at 8:16 AM, J Mohamed Zahoor jmo...@gmail.com wrote:
Slab cache might help
Hi Adrien,
As well, if you can share the client code (number of threads, regions,
is it a set of single get, or are they multi gets, this kind of
stuff).
Cheers,
N.
On Thu, Aug 23, 2012 at 7:40 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:
Hi Adrien,
I would love to see the region
Hi,
Please use the user mailing list (added at dest) for this type of
questions instead of the dev list (now in bcc).
It's a little bit strange to use the full distributed mode with a
single region server. Is the Pseudo-distributed mode working?
Check the number of datanodes vs. dfs.replication
Hi,
Well the first steps would be:
1) Use the JDK 1.6 from Oracle. 1.7 is not supported yet.
2) Check the content of
http://hbase.apache.org/book.html#configuration to set up your first
cluster. Worth reading the whole guide imho.
3) Start with the last released version (0.94), except if you have
missing any
basic setup configuration.
On Wed, Aug 22, 2012 at 12:00 AM, N Keywal nkey...@gmail.com wrote:
Hi,
Please use the user mailing list (added at dest) for this type of
questions instead of the dev list (now in bcc).
It's a little bit strange to use the full distributed mode
Hi,
What are your queries exactly? What's the HBase version?
The mechanism is:
- There is a location cache, per HConnection, on the client
- The client first tries the region server in its cache
- If it fails, the client removes this entry from the cache and enters
the retry loop
- there is a
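A sketch of how a client could tune this retry loop (parameter names as in the 0.94-era configuration; the values shown are illustrative, not recommendations):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration conf = HBaseConfiguration.create();
// Number of retries after a cache miss / region move (default is 10)
conf.setInt("hbase.client.retries.number", 10);
// Base pause between retries, in milliseconds
conf.setLong("hbase.client.pause", 1000);
```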
different between my servers, and there is no problem with the network.
2012/8/10 N Keywal nkey...@gmail.com
Hi,
What are your queries exactly? What's the HBase version?
The mechanism is:
- There is a location cache, per HConnection, on the client
- The client first tries the region server
Hi Mohit,
For simple cases, it works for me for hbase 0.94 at least. But I'm not
sure it works for all features. I've never tried to run hbase unit
tests on windows for example.
N.
On Fri, Aug 3, 2012 at 6:01 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
I am trying to run mini cluster using
Hi,
The issue is in ZooKeeper, not directly in HBase. It seems its data is
corrupted, so it cannot start. You can configure ZooKeeper to use another
data directory to make it start.
N.
On Thu, Aug 2, 2012 at 11:11 AM, abloz...@gmail.com abloz...@gmail.com wrote:
I even move /hbase to hbase2, and
Hi Jay,
Yes, the whole log would be interesting, plus the logs of the datanode
on the same box as the dead RS.
What are your HBase and HDFS versions?
The RS should be immune to hdfs errors. There are known issues (see
HDFS-3701), but it seems you have something different...
This:
. We are using HBase 0.94 on Hadoop
1.0.3.
I have uploaded the logs here:
Region Server log: http://pastebin.com/QEQ22UnU
Data Node log: http://pastebin.com/DF0JNL8K
Appreciate your help in figuring this out.
Thanks,
Jay
On 7/30/12 1:02 PM, N Keywal wrote:
Hi Jay,
Yes, the whole
Hi Bryan,
It's a difficult question, because dfs.socket.timeout is used all over
the place in hdfs. I'm currently documenting this.
Especially:
- it's used for connections between datanodes, and not only for
connections between hdfs clients and hdfs datanodes.
- It's also used for the two types of
is
in the HDFS client code, couldn't I set this dfs.socket.timeout in my
hbase-site.xml and it would only affect hbase connections to hdfs? I.e. we
wouldn't have to worry about affecting connections between datanodes, etc.
--
Bryan Beaudreault
On Wednesday, July 18, 2012 at 4:38 AM, N
Hi,
There are no real limits as far as I know. As you will have one region
per table (at least :-), the number of regions will be something to
monitor carefully if you need thousands of tables. See
http://hbase.apache.org/book.html#arch.regions.size.
Don't forget that you can add as many columns as
.
So - was hoping to get a confirmation if this is the only side effect.
Again - this is on the client side - I wouldn't risk doing this on the
cluster side ...
--Suraj
On Mon, Jul 9, 2012 at 9:44 AM, N Keywal nkey...@gmail.com wrote:
Hi,
What you're describing -the 35 minutes recovery
:12 AM, N Keywal nkey...@gmail.com wrote:
Thanks for the jira.
The client can be connected to multiple RS, depending on the rows it is
working on. So yes it's initial, but it's a dynamic initial :-).
This said there is a retry on error...
On Tue, Jul 10, 2012 at 6:46 PM, Suraj Varma svarma
Hi,
What you're describing -the 35 minutes recovery time- seems to match
the code. And it's a bug (still there on trunk). Could you please
create a jira for it? If you have the logs, it's even better.
Lowering the ipc.socket.timeout seems to be an acceptable partial
workaround. Setting it to 10s
Hi Cyril,
BTW, have you checked dfs.datanode.max.xcievers and ulimit -n? When
underconfigured they can cause this type of errors, even if it seems
it's not the case here...
Cheers,
N.
On Fri, Jul 6, 2012 at 11:31 AM, Cyril Scetbon cyril.scet...@free.fr wrote:
The file is now missing but I
Hi,
It's a ZK expiry on sunday 1st. Root cause could be the leap second bug?
N.
On Thu, Jul 5, 2012 at 8:59 AM, lztaomin lztao...@163.com wrote:
HI ALL
My HBase cluster has a total of 3 machines, with Hadoop and HBase on the same
machines, and ZooKeeper managed by HBase itself. After 3 months of operation
Would Datanode issues impact the HMaster stability?
Yes and no. If you have only a few datanodes down, there should be no
issue. When there are enough missing datanodes to make some blocks not
available at all in the cluster, there are many tasks that can not be
done anymore (to say the least,
(moving this to the user mailing list, with the dev one in bcc)
From what you said it should be
customerid_MIN_TX_ID to customerid_MAX_TX_ID
But only if customerid size is constant.
Note that with this rowkey design there will be very few regions
involved, so it's unlikely to be parallelized.
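A sketch of the corresponding range scan (the key layout, variables, and padding are illustrative; this assumes a fixed-length customerid as noted above):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

String customerId = "CUST0001";   // hypothetical fixed-length customer id
String minTxId = "00000000";      // hypothetical fixed-length tx ids
String maxTxId = "99999999";

Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes(customerId + "_" + minTxId));
scan.setStopRow(Bytes.toBytes(customerId + "_" + maxTxId));
// All rows for one customer are contiguous, hence few regions involved.
```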
be many due to randomness.
Regards
Ram
-Original Message-
From: N Keywal [mailto:nkey...@gmail.com]
Sent: Thursday, June 28, 2012 2:00 PM
To: user@hbase.apache.org
Subject: Re: Scan vs Put vs Get
Hi Jean-Marc,
Interesting :-)
Added to Anoop questions:
What's the hbase
of the performance
when there is a real selection. Your code for list of gets was correct
imho. I'm interested by the results if you activate bloomfilters.
Cheers,
N.
On Thu, Jun 28, 2012 at 3:45 PM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:
Hi N Keywal,
This result:
Time to read 1 lines
For the filter list my guess is that you're filtering out all rows
because RandomRowFilter#chance is not initialized (it should be
something like RandomRowFilter rrf = new RandomRowFilter(0.5);)
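A sketch of the suggested fix in context (in Java the chance argument is a float, so 0.5f; the scan setup around it is illustrative):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.RandomRowFilter;

Scan scan = new Scan();
// Without an explicit chance, the filter can reject every row.
// With 0.5f, each row is kept with probability 0.5.
scan.setFilter(new RandomRowFilter(0.5f));
```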
But note that this test will never be comparable to the test with a
list of gets. You can make it as
Hi,
Usually I'm inserting about 40 000 rows at a time. Should I do 40 000
calls to put? Or is there any bulkinsert method?
There is this chapter on bulk loading:
http://hbase.apache.org/book.html#arch.bulk.load
But for 40K rows you may just want to use void put(final List&lt;Put&gt; puts) in
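A sketch of that batched put (table, family, and row keys are made up; the 0.94-era Put.add API is used):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

List<Put> puts = new ArrayList<Put>(40000);
for (int i = 0; i < 40000; i++) {
  Put p = new Put(Bytes.toBytes("row-" + i)); // hypothetical row keys
  p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
  puts.add(p);
}
// One client call; the puts are grouped by region server internally.
table.put(puts);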
Hi,
You can have this if the region moved, i.e. was previously managed by
this region server and is now managed by another. The client keeps a
cache of the locations, so after a move it will first contact the
wrong server. Then the client will update its cache. By default there
are 10 internal
Yes, this is the balance process (as its name says: keeps the cluster
balanced), and it's not related to the process of looking after dead
nodes.
The nodes are monitored by ZooKeeper, the timeout is by default 180
seconds (setting: zookeeper.session.timeout)
On Fri, Jun 1, 2012 at 4:40 PM, Cyril
There is a one to one mapping between the result and the get arrays;
so the result for rowkeys[i] is in results[i].
That's not what you want?
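A sketch of that one-to-one mapping (the rowkeys array is assumed to exist; this is illustrative, not the poster's code):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;

List<Get> gets = new ArrayList<Get>();
for (byte[] rowkey : rowkeys) {
  gets.add(new Get(rowkey));
}
Result[] results = table.get(gets);
// results[i] corresponds to gets.get(i);
// results[i].isEmpty() is true when the row does not exist.
```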
On Tue, May 29, 2012 at 9:34 AM, Ben Kim benkimkim...@gmail.com wrote:
Maybe I showed you a bad example. This makes more sense when it comes to
using
From http://hbase.apache.org/book/os.html:
HBase expects the loopback IP address to be 127.0.0.1. Ubuntu and some
other distributions, for example, will default to 127.0.1.1 and this
will cause problems for you.
It's worth reading the whole section ;-).
You also don't need to set the master
Hi,
If you're speaking about preparing the query it's in HTable and
HConnectionManager.
If you're on the pure network level, then, on trunk, it's now done
with a third-party library called protobuf.
See the code from HConnectionManager#createCallable to see how it's used.
Cheers,
N.
On Tue, May 29,
to that somewhere, are those alternate connection libs?)
I know protobuf is just generating types for various languages...
On Tue, May 29, 2012 at 10:26 AM, N Keywal nkey...@gmail.com wrote:
Hi,
If you're speaking about preparing the query it's in HTable and
HConnectionManager.
If you're
the client, for data intensive tasks
like mapreduce etc. where they want direct access to the files?
On Tue, May 29, 2012 at 11:00 AM, N Keywal nkey...@gmail.com wrote:
There are two levels:
- communication between hbase client and hbase cluster: this is the
code you have in hbase client
Hi,
For the multiget, if it's small enough, it will be:
- parallelized on all region servers concerned. i.e. you will be as
fast as the slowest region server.
- there will be one query per region server (i.e. gets are grouped by
region server).
If there are too many gets, it will be split in
Hi,
What version are you using?
On trunk, put(Put) and put(List&lt;Put&gt;) call the same code, so I would
expect comparable performance when autoflush is set to false.
However, with 250K small puts you may have the gc playing a role.
What are the results if you do the inserts with 50 times 5K rows?
Hi,
There could be multiple issues, but it's strange to have in hbase-site.xml
&lt;value&gt;hdfs://namenode:9000/hbase&lt;/value&gt;
while the core-site.xml says:
&lt;value&gt;hdfs://namenode:54310/&lt;/value&gt;
The two entries should match.
I would recommend to:
- use netstat to check the ports (netstat -l)
- do the
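For example, with a namenode listening on port 54310 (host and port here are illustrative), hbase-site.xml should carry the same authority as fs.default.name:

```xml
<!-- hbase-site.xml -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://namenode:54310/hbase</value>
</property>
```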
-14, at 4:07 PM, N Keywal nkey...@gmail.com wrote:
Hi,
There could be multiple issues, but it's strange to have in hbase-site.xml
&lt;value&gt;hdfs://namenode:9000/hbase&lt;/value&gt;
while the core-site.xml says:
&lt;value&gt;hdfs://namenode:54310/&lt;/value&gt;
The two entries should match.
I would recommend
Hi Alex,
On the same idea, note that hbase is launched with
-XX:OnOutOfMemoryError=kill -9 %p.
N.
On Tue, May 1, 2012 at 10:41 AM, Igal Shilman ig...@wix.com wrote:
Hi Alex, just to rule out the OOM killer, try this:
Hi,
fwiw, the close method was added in HBaseAdmin for HBase 0.90.5.
N.
On Thu, Apr 19, 2012 at 8:09 AM, Eason Lee softse@gmail.com wrote:
I don't think this issue can resolve the problem.
ZKWatcher is removed, but the configuration and HConnectionImplementation
objects are still in
Hi,
For the filtering part, every HFile is associated to a set of meta data.
This meta data includes the timerange. So if there is no overlap between
the time range you want and the time range of the store, the HFile is
totally skipped.
This work is done in StoreScanner#selectScannersFrom
Hi,
Literally, it means that ZooKeeper is there but the hbase client can't find
the hbase master address in it.
By default, the node used is /hbase/master, and it contains the hostname
and port of the master.
You can check its content in ZK by doing a get /hbase/master in
bin/zkCli.sh (see
Hi,
It should. I haven't tested the .90, but I tested hbase trunk a few
months ago vs. ZK 3.4.x and ZK 3.3.x and it was working.
N.
2012/4/5 lulynn_2008 lulynn_2...@163.com
Hi,
I found that hbase-0.90.2 uses zookeeper-3.4.2. Can this version of hbase work with
zookeeper-3.3.4?
Thank you.
, at 10:42 AM, N Keywal wrote:
It must be waiting for the master. Have you launched the master?
On Wed, Mar 28, 2012 at 7:40 PM, Nabib El-Rahman
nabib.elrah...@tubemogul.com wrote:
Hi Guys,
I'm starting up a region server and it stalls on initialization. I took
a thread dump and found
Hi,
Just a few... See http://hbase.apache.org/book.html#number.of.cfs
N.
On Tue, Mar 20, 2012 at 12:39 PM, Manish Bhoge
manishbh...@rocketmail.comwrote:
Very basic question:
How many column families are possible in a table in HBase? I know you can have
thousands of columns in a family. But I
Hi,
The way you describe the in memory caching component, it looks very
similar to HBase memstore. Any reason for not relying on it?
N.
On Fri, Mar 16, 2012 at 4:21 PM, Kleegrewe, Christian
christian.kleegr...@siemens.com wrote:
Dear all,
We are currently working on an architecture for a
-----Original Message-----
From: N Keywal [mailto:nkey...@gmail.com]
Sent: Friday
You will need the hadoop jar for this. Hbase uses hadoop for common stuff
like the configuration you've seen, so even a simple client needs it.
N.
On 12 March 2012 at 12:06, Mahdi Negahi negahi.ma...@hotmail.com wrote:
Is it necessary to install hadoop for hbase, if I want to use Hbase in my
only jar files. They are already in the hbase distrib (i.e. if you download
hbase, you get the hadoop jar files you need). You just need to import them
in your IDE.
On Mon, Mar 12, 2012 at 1:05 PM, Mahdi Negahi negahi.ma...@hotmail.com wrote:
I am so confused. Must I install Hadoop, or use only
Hi,
Yes and no.
No, because as a table can have millions of columns and these columns can
be different for every row, the only way to get all the columns is to scan
the whole table.
Yes, because if you scan the table you can get the column names. See
Result#getMap: it's organized by family --
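A sketch of walking that nested map, assuming a Result obtained from a scan (the map goes family, then qualifier, then timestamp, then value):

```java
import java.util.NavigableMap;
import org.apache.hadoop.hbase.client.Result;

NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> map =
    result.getMap();
for (byte[] family : map.keySet()) {
  for (byte[] qualifier : map.get(family).keySet()) {
    // column names discovered for this row; values are keyed by timestamp
  }
}
```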
Hi,
It's replaced by HBaseTestingUtility.
Cheers,
N.
2012/3/8 lulynn_2008 lulynn_2...@163.com
Hi All,
I am integrating flume-0.9.4 with hbase-0.92.0. And I find hbase-0.92.0
removed HBaseClusterTestCase which is used in flume-0.9.4.
My question is:
Is there any replacement for
Can't you use the option -DskipTests?
On Wed, Feb 15, 2012 at 5:27 PM, Stack st...@duboce.net wrote:
On Tue, Feb 14, 2012 at 11:18 PM, Ulrich Staudinger
ustaudin...@activequant.com wrote:
Hi St.Ack,
i don't wanna be a pain in the back, but any progress on this?
You are not being
Hi,
The client needs to connect to zookeeper as well. You haven't set the
parameters for zookeeper, so it goes with the default settings
(localhost:2181), hence the error you're seeing. Set the zookeeper
connection property in the client, it should work.
This should do it:
conf
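The truncated snippet presumably set the quorum on the client configuration; something like this (host names are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "zkhost1,zkhost2,zkhost3");
conf.set("hbase.zookeeper.property.clientPort", "2181");
```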
Hi,
FYI. I've been doing some tests mixing zookeeper client/server versions
on hbase trunk, by executing medium category unit tests with a standalone
zookeeper server. (Mixing versions 3.3 and 3.4 is officially supported by
ZooKeeper, but it was worth checking.)
I tested:
ZooKeeper server 3.3.4 and
Hi,
Yes, each cell is associated with a long. By default it's a timestamp, but
you can set it yourself when you create the put.
It's stored everywhere.
You've got a lot of information and links on this in the hbase book (
http://hbase.apache.org/book.html#versions)
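A sketch of setting an explicit version when creating the put (table, family, and values are made up; 0.94-era Put.add API):

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

long ts = 1234567890000L; // explicit version instead of the server time
Put put = new Put(Bytes.toBytes("rowkey"));
put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), ts, Bytes.toBytes("value"));
table.put(put);
```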
Cheers,
N.
On Mon, Jan 30,
Hi Damien,
Can't say for the Python stuff.
You can reuse or extract what you need in HBaseTestingUtility from the
hbase test package, this will allow you to start a full Hbase mini cluster
in a few lines of Java code.
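Those few lines could look like this (the table and family names are made up; the mini cluster starts HDFS, ZooKeeper and HBase in-process):

```java
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

HBaseTestingUtility util = new HBaseTestingUtility();
util.startMiniCluster();
HTable table = util.createTable(Bytes.toBytes("t"), Bytes.toBytes("f"));
// ... run your tests against 'table' ...
util.shutdownMiniCluster();
```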
Cheers,
N.
On Mon, Jan 30, 2012 at 11:10 AM, Damien Hardy
If you're interested, here are some good slides on GC (slide 45 and after):
http://www.azulsystems.com/sites/www.azulsystems.com/SpringOne2011_UnderstandingGC.pdf
On Tue, Nov 8, 2011 at 11:25 PM, Mikael Sitruk mikael.sit...@gmail.comwrote:
Concurrent GC (a.k.a CMS) does not mean that there is no more