Re: Streaming data processing and hBase

2012-03-16 Thread N Keywal
Hi,

The way you describe the "in memory caching component", it looks very
similar to the HBase memstore. Any reason for not relying on it?

N.

On Fri, Mar 16, 2012 at 4:21 PM, Kleegrewe, Christian <
christian.kleegr...@siemens.com> wrote:

> Dear all,
>
> We are currently working on an architecture for a system that should
> serve as an archive for 1000+ measuring components that frequently (~30/s)
> send messages containing measurement values (~300 bytes/message). The
> archiving system should be capable of not only serving as long-term
> storage but also as a kind of streaming data processing and caching
> component. There are several functions that should be computed on the
> incoming data before finally storing it.
>
> We suggested an architecture that comprises:
> A message routing component that could route data to calculations and
> route calculation results to other components that are interested in these
> data.
> An in memory caching component that is used for storing up to 10 - 20
> minutes of data before it is written to the long term archive.
> An HBase database that is used for long-term storage.
> MapReduce framework for doing analytics on the data stored in the hBase
> database.
>
> The complete system should be failsafe and reliable regarding component
> failures and it should scale with the number of computers that are utilized.
>
> Are there any suggestions or feedback on this approach from the community,
> and are there any suggestions on which tools or systems to use for the message
> routing component and the in-memory cache?
>
> Thanks for any help and suggestions
>
> all the best
>
> Christian
>
>
> 8<---
>
> Siemens AG
> Corporate Technology
> Corporate Research and Technologies
> CT T DE IT3
> Otto-Hahn-Ring 6
> 81739 Munich, Germany
> Tel.: +49 89 636-42722
> Fax: +49 89 636-41423
> mailto:christian.kleegr...@siemens.com
>
> Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Gerhard
> Cromme; Managing Board: Peter Loescher, Chairman, President and Chief
> Executive Officer; Roland Busch, Brigitte Ederer, Klaus Helmrich, Joe
> Kaeser, Barbara Kux, Hermann Requardt, Siegfried Russwurm, Peter Y.
> Solmssen, Michael Suess; Registered offices: Berlin and Munich, Germany;
> Commercial registries: Berlin Charlottenburg, HRB 12300, Munich, HRB 6684;
> WEEE-Reg.-No. DE 23691322
>


Re: Streaming data processing and hBase

2012-03-16 Thread N Keywal
Hi Christian,

It's a component internal to HBase, so you don't have to use it directly.
See http://hbase.apache.org/book/wal.html on how writes are handled by
HBase to ensure reliability & data distribution...
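
For reference, here is a minimal client-side sketch (0.90/0.92-era API; the
table, family and row key are invented) showing that a plain Put already goes
through the WAL path described in that chapter, with setWriteToWAL(false)
being the explicit opt-out:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WalDefaultSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "measurements");   // hypothetical table

    Put put = new Put(Bytes.toBytes("sensor42-1331900000"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("value"), Bytes.toBytes("payload"));

    // By default the edit is written to the WAL and then to the memstore,
    // which is what provides the durability discussed above. Disabling the
    // WAL trades safety for speed:
    // put.setWriteToWAL(false);   // only if losing recent data on a crash is acceptable

    table.put(put);
    table.close();
  }
}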

Cheers,

N.

On Fri, Mar 16, 2012 at 7:39 PM, Kleegrewe, Christian <
christian.kleegr...@siemens.com> wrote:

> Hi
>
> Is this memstore replicated? Since we store a significant amount of data
> in the memory cache we need a replicated solution. Also I can't find lots
> of information besides a java api doc for the MemStore class. I will
> continue searching for this, but if you have any URL with more
> documentation please send it. Thanks in advance
>
> regards
>
> Christian
>
>
> 8<--
> Siemens AG
> Corporate Technology
> Corporate Research and Technologies
> CT T DE IT3
> Otto-Hahn-Ring 6
> 81739 München, Deutschland
> Tel.: +49 89 636-42722
> Fax: +49 89 636-41423
> mailto:christian.kleegr...@siemens.com
>
> Siemens Aktiengesellschaft: Vorsitzender des Aufsichtsrats: Gerhard
> Cromme; Vorstand: Peter Löscher, Vorsitzender; Roland Busch, Brigitte
> Ederer, Klaus Helmrich, Joe Kaeser, Barbara Kux, Hermann Requardt,
> Siegfried Russwurm, Peter Y. Solmssen, Michael Süß; Sitz der Gesellschaft:
> Berlin und München, Deutschland; Registergericht: Berlin Charlottenburg,
> HRB 12300, München, HRB 6684; WEEE-Reg.-Nr. DE 23691322
>
>
> -Ursprüngliche Nachricht-
> Von: N Keywal [mailto:nkey...@gmail.com]
> Gesendet: Freitag, 16. März 2012 18:02
> An: user@hbase.apache.org
> Betreff: Re: Streaming data processing and hBase
>
> Hi,
>
> The way you describe the "in memory caching component", it looks very
> similar to HBase memstore. Any reason for not relying on it?
>
> N.
>
> On Fri, Mar 16, 2012 at 4:21 PM, Kleegrewe, Christian <
> christian.kleegr...@siemens.com> wrote:
>
> > Dear all,
> >
> > We are currently working on an architecture for a system that should be
> > serve as an archive for 1000+ measuring components that frequently
> (~30/s)
> > send messages containing measurement values (~300 bytes/message). The
> > archiving system should be capable of not only serving as a long term
> > storage but also as a kind of streaming data processing and caching
> > component. There are several functions that should be computed on the
> > incoming data before finally storing it.
> >
> > We suggested an architecture that comprises of:
> > A message routing component that could route data to calculations and
> > route calculation results to other components that are interested in
> these
> > data.
> > An in memory caching component that is used for storing up to 10 - 20
> > minutes of data before it is written to the long term archive.
> > An hBase database that is used for the long term storage.
> > MapReduce framework for doing analytics on the data stored in the hBase
> > database.
> >
> > The complete system should be failsafe and reliable regarding component
> > failures and it should scale with the number of computers that are
> utilized.
> >
> > Are there any suggestions or feedback to this approach from the
> community?
> > and are there any suggestions which tools or systems to use for the
> message
> > routing component and the in memory cache.
> >
> > Thanks for any help and suggestions
> >
> > all the best
> >
> > Christian
> >
> >
> >
> 8<---
> >
> > Siemens AG
> > Corporate Technology
> > Corporate Research and Technologies
> > CT T DE IT3
> > Otto-Hahn-Ring 6
> > 81739 Munich, Germany
> > Tel.: +49 89 636-42722
> > Fax: +49 89 636-41423
> > mailto:christian.kleegr...@siemens.com
> >
> > Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Gerhard
> > Cromme; Managing Board: Peter Loescher, Chairman, President and Chief
> > Executive Officer; Roland Busch, Brigitte Ederer, Klaus Helmrich, Joe
> > Kaeser, Barbara Kux, Hermann Requardt, Siegfried Russwurm, Peter Y.
> > Solmssen, Michael Suess; Registered offices: Berlin and Munich, Germany;
> > Commercial registries: Berlin Charlottenburg, HRB 12300, Munich, HRB
> 6684;
> > WEEE-Reg.-No. DE 23691322
> >
>


Re: HBase schema model question.

2012-03-20 Thread N Keywal
Hi,

Just a few... See http://hbase.apache.org/book.html#number.of.cfs
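
For illustration, here is a minimal sketch (0.90-era admin API; the table and
family names are invented) of declaring a table with just two column families,
in line with the advice in that link:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTableSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Keep the number of families small: every family gets its own memstore
    // and store files, and all the families of a region are flushed together.
    HTableDescriptor desc = new HTableDescriptor("students");   // hypothetical
    desc.addFamily(new HColumnDescriptor("info"));
    desc.addFamily(new HColumnDescriptor("courses"));

    admin.createTable(desc);
  }
}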

N.

On Tue, Mar 20, 2012 at 12:39 PM, Manish Bhoge
wrote:

> Very basic question:
> How many column families possible in a table in Hbase? I know you can have
> thousand of columns in a family. But I don't know how many families can be
> possible. So far in example I haven't seen more than 1.
> Thanks
> Manish
> Sent from my BlackBerry, pls excuse typo
>
>


Re: Hbase RegionServer stalls on initialization

2012-03-28 Thread N Keywal
Then you should have an error in the master logs.
If not, it's worth checking that the master & the region servers speak to
the same ZK...

As it's HBase related, I'm redirecting the question to the HBase user mailing
list (hadoop common is in bcc).

On Wed, Mar 28, 2012 at 8:03 PM, Nabib El-Rahman <
nabib.elrah...@tubemogul.com> wrote:

> The master is up. is it possible that zookeeper might not know about it?
>
>
>  *Nabib El-Rahman *|  Senior Sofware Engineer
>
> *M:* 734.846.25 <734.846.2529>
> www.tubemogul.com | *twitter: @nabiber*
>
>  <http://www.tubemogul.com/>
>  <http://www.tubemogul.com/>
>
> On Mar 28, 2012, at 10:42 AM, N Keywal wrote:
>
> It must be waiting for the master. Have you launched the master?
>
> On Wed, Mar 28, 2012 at 7:40 PM, Nabib El-Rahman <
> nabib.elrah...@tubemogul.com> wrote:
>
>> Hi Guys,
>>
>> I'm starting up an region server and it stalls on initialization.  I took
>> a thread dump and found it hanging on this spot:
>>
>> "regionserver60020" prio=10 tid=0x7fa90c5c4000 nid=0x4b50 in 
>> Object.wait() [0x7fa9101b4000]
>>
>>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>
>> at java.lang.Object.wait(Native Method)
>>
>> - waiting on <0xbc63b2b8> (a 
>> org.apache.hadoop.hbase.MasterAddressTracker)
>>
>>
>> at 
>> org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:122)
>>
>>
>> - locked <0xbc63b2b8> (a 
>> org.apache.hadoop.hbase.MasterAddressTracker)
>>
>>
>> at 
>> org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:516)
>>
>>
>> at 
>> org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:493)
>>
>>
>> at 
>> org.apache.hadoop.hbase.regionserver.HRegionServer.initialize(HRegionServer.java:461)
>>
>>
>> at 
>> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:560)
>>
>>
>> at java.lang.Thread.run(Thread.java:662)
>>
>>
>>
>> Any Idea on who or what its being blocked on?
>>
>>  *Nabib El-Rahman *|  Senior Sofware Engineer
>>
>> *M:* 734.846.2529
>> www.tubemogul.com | *twitter: @nabiber*
>>
>>  <http://www.tubemogul.com/>
>>  <http://www.tubemogul.com/>
>>
>>
>
>


Re: can hbase-0.90.2 work with zookeeper-3.3.4?

2012-04-05 Thread N Keywal
Hi,

It should. I haven't tested 0.90, but I tested the HBase trunk a few
months ago against ZK 3.4.x and ZK 3.3.x and it was working.

N.

2012/4/5 lulynn_2008 

>  Hi,
> I found hbase-0.90.2 use zookeeper-3.4.2. Can this version hbase work with
> zookeeper-3.3.4?
>
> Thank you.
>


Re: Zookeeper available but no active master location found

2012-04-13 Thread N Keywal
Hi,

Literally, it means that ZooKeeper is there but the hbase client can't find
the hbase master address in it.
By default, the node used is /hbase/master, and it contains the hostname
and port of the master.

You can check its content in ZK by doing a "get /hbase/master" in
bin/zkCli.sh (see
http://zookeeper.apache.org/doc/r3.4.3/zookeeperStarted.html#sc_ConnectingToZooKeeper
).

There should be a root cause for this, so it's worth looking for other error
messages in the logs (the master's especially).
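
If you'd rather check it programmatically than with zkCli.sh, here is a small
sketch using the plain ZooKeeper client (the quorum address is taken from your
configuration below; the exact encoding of the znode content depends on the
HBase version, but it should contain the master's host and port):

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class CheckMasterZNode {
  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("10.251.27.130:2181", 30000, new Watcher() {
      public void process(WatchedEvent event) {
        // no-op watcher; we only do a one-shot read
      }
    });
    byte[] data = zk.getData("/hbase/master", false, null);
    System.out.println(data == null ? "no data" : new String(data));
    zk.close();
  }
}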

N.

On Fri, Apr 13, 2012 at 1:23 AM, Henri Pipe  wrote:

> "client.HConnectionManager$HConnectionImplementation: ZooKeeper available
> but no active master location found"
>
> Having a problem with master startup that I have not seen before.
>
> running the following packages:
>
> hadoop-hbase-0.90.4+49.137-1
> hadoop-0.20-secondarynamenode-0.20.2+923.197-1
> hadoop-hbase-thrift-0.90.4+49.137-1
> hadoop-zookeeper-3.3.4+19.3-1
> hadoop-0.20-datanode-0.20.2+923.197-1
> hadoop-0.20-namenode-0.20.2+923.197-1
> hadoop-0.20-tasktracker-0.20.2+923.197-1
> hadoop-hbase-regionserver-0.90.4+49.137-1
> hadoop-zookeeper-server-3.3.4+19.3-1
> hadoop-0.20-0.20.2+923.197-1
> hadoop-0.20-jobtracker-0.20.2+923.197-1
> hadoop-hbase-master-0.90.4+49.137-1
> [root@ip-10-251-27-130 logs]# java -version
> java version "1.6.0_31"
> Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
>
> I start master and region server on another node.
>
> Master is initialized, but as soon as I try to check the master_status or
> do a zkdump via web interface, it blows up with:
>
> 2012-04-12 19:16:10,453 INFO
>
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> ZooKeeper available but no active master location found
> 2012-04-12 19:16:10,453 INFO
>
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> getMaster attempt 10 of 10 failed; retrying after sleep of 16000
>
> I am running three zookeepers:
>
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> dataDir=/mnt/zookeeper
> # The maximum number of zookeeper client connections
> maxClientCnxns=2000
> # the port at which the clients will connect
> clientPort=2181
> server.1=10.251.27.130:2888:3888
> server.2=10.250.9.220:2888:3888
> server.3=10.251.110.50:2888:3888
>
> I can telnet to the zookeepers just fine.
>
> Here is my hbase-site.xml file:
>
> 
>  
>hbase.rootdir
>hdfs://namenode:9000/hbase
>  
>  
>hbase.cluster.distributed
>true
>  
> 
>hbase.zookeeper.quorum
>10.251.27.130,10.250.9.220,10.251.110.50
> 
> 
>hbase.zookeeper.property.dataDir
>/hadoop/zookeeper/data
> 
> 
>hbase.zookeeper.property.maxClientCnxns
>2000
>true
> 
> 
>
> Any thoughts? Any help is greatly appreciated.
>
> Thanks
>
> Henri Pipe
>


Re: TIMERANGE performance on uniformly distributed keyspace

2012-04-14 Thread N Keywal
Hi,

For the filtering part, every HFile is associated with a set of metadata.
This metadata includes the time range of the data it contains. So if there is
no overlap between the time range you want and the time range of the store
file, the HFile is skipped entirely.

This work is done in StoreScanner#selectScannersFrom
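
On the client side this is driven simply by setting a time range on the scan.
A minimal sketch (the table name and the caching value are arbitrary choices,
not from the thread):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class TimeRangeScanSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "events");        // hypothetical table

    long t = System.currentTimeMillis() - 3600 * 1000L;
    Scan scan = new Scan();
    scan.setTimeRange(t, t + 600 * 1000L);            // [t, t + 10 minutes)
    scan.setCaching(500);                             // fewer RPC round trips

    // Server side, store files whose recorded time range does not overlap
    // the requested one are skipped (StoreScanner#selectScannersFrom); the
    // remaining files and the memstore are still scanned.
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process r
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}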

Cheers,

N.


On Sat, Apr 14, 2012 at 5:11 PM, Doug Meil wrote:

> Hi there-
>
> With respect to:
>
> "* Does it need to hit every memstore and HFile to determine if there
> isdata available? And if so does it need to do a full scan of that file to
> determine the records qualifying to the timerange, since keys are stored
> lexicographically?"
>
> And...
>
> "Using "scan 'table', {TIMERANGE => [t, t+x]}" :"
> See...
>
>
> http://hbase.apache.org/book.html#regions.arch
> 8.7.5.4. KeyValue
>
>
>
> The timestamp is an attribute of the KeyValue, but unless you perform a
> restriction using start/stop row it have to process every row.
>
> Major compactions don't change this fact, they just change the number of
> HFiles that have to get processed.
>
>
>
> On 4/14/12 10:38 AM, "Rob Verkuylen"  wrote:
>
> >I'm trying to find a definitive answer to the question if scans on
> >timerange alone will scale when you use uniformly distributed keys like
> >UUIDs.
> >
> >Since the keys are randomly generated that would mean the keys will be
> >spread out over all RegionServers, Regions and HFiles. In theory, assuming
> >enough writes, that would mean that every HFile will contain the entire
> >timerange of writes.
> >
> >Now before a major compaction, data is in the memstores and (non
> >max.filesize) flushed&merged HFiles. I can imagine that a scan using a
> >TIMERANGE can quickly serve from memstores and the smaller files, but how
> >does it perform after a major compaction?
> >
> >Using "scan 'table', {TIMERANGE => [t, t+x]}" :
> >* How does HBase handle this query in this case(UUIDs)?
> >* Does it need to hit every memstore and HFile to determine if there is
> >data available? And if so does it need to do a full scan of that file to
> >determine the records qualifying to the timerange, since keys are stored
> >lexicographically?
> >
> >I've run some tests on 300+ region tables, on month old data(so after
> >major
> >compaction) and performance/response seems fairly quick. But I'm trying to
> >understand why that is, because hitting every HFile on every region seems
> >to be ineffective. Lars' book figure 9-3 seems to indicate this as well,
> >but cant seem to get the answer from the book or anywhere else.
> >
> >Thnx, Rob
>
>
>


Re: HBaseAdmin needs a close methord

2012-04-19 Thread N Keywal
Hi,

FWIW, the "close" method was added to HBaseAdmin in HBase 0.90.5.
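
A minimal usage sketch (the table name is invented) showing the admin being
closed so that the underlying connection is released rather than leaked, which
is the concern discussed in the quoted thread below:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class AdminCloseSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      boolean exists = admin.tableExists("mytable");   // hypothetical table name
      System.out.println("mytable exists: " + exists);
    } finally {
      // Releases the connection resources held on behalf of this HBaseAdmin
      // instead of leaving them registered in HConnectionManager.
      admin.close();
    }
  }
}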

N.

On Thu, Apr 19, 2012 at 8:09 AM, Eason Lee  wrote:

> I don't think this issue can resovle the problem
> ZKWatcher is removed,but the configuration and HConnectionImplementation
> objects are still in HConnectionManager
> this may still cause memery leak
>
> but calling HConnectionManager.**deleteConnection may resolve HBASE-5073
> problem.
> I can see
>
>  if (this.zooKeeper != null) {
>LOG.info("Closed zookeeper sessionid=0x" +
>  Long.toHexString(this.**zooKeeper.getZooKeeper().**
> getSessionId()));
>this.zooKeeper.close();
>this.zooKeeper = null;
>  }
>
> in HConnectionImplementation.**close which is called by
> HConnectionManager.**deleteConnection
>
>
>
>
>  Hi Lee
>>
>> Is HBASE-5073 resolved in that release?
>>
>> Regards
>> Ram
>>
>>  -Original Message-
>>> From: Eason Lee [mailto:softse@gmail.com]
>>> Sent: Thursday, April 19, 2012 10:40 AM
>>> To: user@hbase.apache.org
>>> Subject: Re: HBaseAdmin needs a close methord
>>>
>>> I am using cloudera's cdh3u3
>>>
 Hi Lee

 Which version of HBase are you using?

 Regards
 Ram

  -Original Message-
> From: Eason Lee [mailto:softse@gmail.com]
> Sent: Thursday, April 19, 2012 9:36 AM
> To: user@hbase.apache.org
> Subject: HBaseAdmin needs a close methord
>
> Resently, my app meets a problem list as follows
>
> Can't construct instance of class
> org/apache/hadoop/hbase/**client/HBaseAdmin
> Exception in thread "Thread-2" java.lang.OutOfMemoryError: unable to
> create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.**java:640)
> at org.apache.zookeeper.**ClientCnxn.start(ClientCnxn.**java:414)
> at org.apache.zookeeper.**ZooKeeper.(ZooKeeper.**java:378)
> at org.apache.hadoop.hbase.**zookeeper.ZKUtil.connect(**
> ZKUtil.java:97)
> at
>
>  org.apache.hadoop.hbase.**zookeeper.ZooKeeperWatcher.<**
>>> init>(ZooKeeperWatc
>>>
 her.java:119)
> at
>
>  org.apache.hadoop.hbase.**client.HConnectionManager$**
>>> HConnectionImplementa
>>>
 tion.getZooKeeperWatcher(**HConnectionManager.java:1002)
> at
>
>  org.apache.hadoop.hbase.**client.HConnectionManager$**
>>> HConnectionImplementa
>>>
 tion.setupZookeeperTrackers(**HConnectionManager.java:304)
> at
>
>  org.apache.hadoop.hbase.**client.HConnectionManager$**
>>> HConnectionImplementa
>>>
 tion.(**HConnectionManager.java:295)
> at
>
>  org.apache.hadoop.hbase.**client.HConnectionManager.**
>>> getConnection(HConnec
>>>
 tionManager.java:157)
> at
>
 org.apache.hadoop.hbase.**client.HBaseAdmin.(**
>>> HBaseAdmin.java:90)
>>>
 Call to org.apache.hadoop.hbase.**HBaseAdmin::HBaseAdmin failed!
>
> My app create HBaseAdmin every 30s,and the threads used by my app
> increases about 1thread/30s.See from the stack, there is only one
> HBaseAdmin in Memory, but lots of Configuration and
> HConnectionImplementation instances.
>
> I can see from the sources, everytime when HBaseAdmin is created, a
>
 new
>>>
 Configuration and HConnectionImplementation is created and added to
> HConnectionManager.HBASE_**INSTANCES.Sothey
>  are not collected by gc
>
 when
>>>
 HBaseAdmin is collected.
>
> So i think we need to add a close methord to remove the
> Configuration&**HConnectionImplementation from
> HConnectionManager.HBASE_**INSTANCES.Just as follows:
>
> public void close(){
>HConnectionManager.**deleteConnection(**getConfiguration(),
> true);
> }
>




>>
>>
>>
>
>


Re: RegionServer silently stops (only "issue": CMS-concurrent-mark ~80sec)

2012-05-01 Thread N Keywal
Hi Alex,

Along the same lines, note that HBase is launched with
-XX:OnOutOfMemoryError="kill -9 %p".

N.

On Tue, May 1, 2012 at 10:41 AM, Igal Shilman  wrote:

> Hi Alex, just to rule out, oom killer,
> Try this:
>
> http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer
>
>
> On Mon, Apr 30, 2012 at 10:48 PM, Alex Baranau  >wrote:
>
> > Hello,
> >
> > During recent weeks I constantly see some RSs *silently* dying on our
> HBase
> > cluster. By "silently" I mean that process stops, but no errors in logs
> > [1].
> >
> > The only thing I can relate to it is long CMS-concurrent-mark: almost 80
> > seconds. But this should not cause issues as it is not a "stop-the-world"
> > process.
> >
> > Any advice?
> >
> > HBase: hbase-0.90.4-cdh3u3
> > Hadoop: 0.20.2-cdh3u3
> >
> > Thank you,
> > Alex Baranau
> >
> > [1]
> >
> > last lines from RS log (no errors before too, and nothing written in
> *.out
> > file):
> >
> > 2012-04-30 18:52:11,806 DEBUG
> > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
> > requested for agg-sa-1.3,0011|
> >
> >
> te|dtc|\x00\x00\x00\x00\x00\x00<\x1E\x002\x00\x00\x00\x015\x9C_n\x00\x00\x00\x00\x00\x00\x00\x00\x00,1334852280902.4285f9339b520ee617c087c0fd0dbf65.
> > because regionserver60020.cacheFlusher; priority=-1, compaction queue
> > size=0
> > 2012-04-30 18:54:58,779 DEBUG
> > org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: using new
> > createWriter -- HADOOP-6840
> > 2012-04-30 18:54:58,779 DEBUG
> > org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter:
> >
> >
> Path=hdfs://xxx.ec2.internal/hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335812098651,
> > syncFs=true, hflush=false
> > 2012-04-30 18:54:58,874 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog:
> > Roll
> >
> >
> /hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335811856672,
> > entries=73789, filesize=63773934. New hlog
> >
> >
> /hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335812098651
> > 2012-04-30 18:56:31,867 INFO
> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke
> up
> > with memory above low water.
> > 2012-04-30 18:56:31,867 INFO
> > org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region
> > agg-sa-1.3,s_00I4|
> >
> >
> tdqc\x00docs|mrtdocs|\x00\x00\x00\x00\x00\x03\x11\xF4\x00none\x00|1334692562\x00\x0D\xE0\xB6\xB3\xA7c\xFF\xBC|26837373\x00\x00\x00\x016\xC1\xE0D\xBE\x00\x00\x00\x00\x00\x00\x00\x00,1335761291026.30b127193485342359eadf1586819805.
> > due to global heap pressure
> > 2012-04-30 18:56:31,867 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Started memstore flush for agg-sa-1.3,s_00I4|
> >
> >
> tdqc\x00docs|mrtdocs|\x00\x00\x00\x00\x00\x03\x11\xF4\x00none\x00|1334692562\x00\x0D\xE0\xB6\xB3\xA7c\xFF\xBC|26837373\x00\x00\x00\x016\xC1\xE0D\xBE\x00\x00\x00\x00\x00\x00\x00\x00,1335761291026.30b127193485342359eadf1586819805.,
> > current region memstore size 138.1m
> > 2012-04-30 18:56:31,867 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Finished snapshotting, commencing flushing stores
> > 2012-04-30 18:56:56,303 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=322.84
> MB,
> > free=476.34 MB, max=799.17 MB, blocks=5024, accesses=12189396,
> hits=127592,
> > hitRatio=1.04%%, cachingAccesses=132480, cachingHits=126949,
> > cachingHitsRatio=95.82%%, evictions=0, evicted=0, evictedPerRun=NaN
> > 2012-04-30 18:56:59,026 INFO org.apache.hadoop.hbase.regionserver.Store:
> > Renaming flushed file at
> >
> >
> hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/.tmp/391890051647401997
> > to
> >
> >
> hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/a/1139737908876846168
> > 2012-04-30 18:56:59,034 INFO org.apache.hadoop.hbase.regionserver.Store:
> > Added
> >
> >
> hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/a/1139737908876846168,
> > entries=476418, sequenceid=880198761, memsize=138.1m, filesize=5.7m
> > 2012-04-30 18:56:59,097 INFO
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Finished memstore flush of ~138.1m for region agg-sa-1.3,s_00I4|
> >
> >
> tdqc\x00docs|mrtdocs|\x00\x00\x00\x00\x00\x03\x11\xF4\x00none\x00|1334692562\x00\x0D\xE0\xB6\xB3\xA7c\xFF\xBC|26837373\x00\x00\x00\x016\xC1\xE0D\xBE\x00\x00\x00\x00\x00\x00\x00\x00,1335761291026.30b127193485342359eadf1586819805.
> > in 27230ms, sequenceid=880198761, compaction requested=false
> > ~
> >
> > [2]
> >
> > last lines from GC log:
> >
> > 2012-04-30T18:58:46.683+: 105717.791: [GC 105717.791: [ParNew:
> > 35638K->1118K(38336K), 0.0548970 secs] 3145651K->3111412K(4091776K)
> > icms_dc=6 , 0.0550360 secs] [Times: user=0.08 sys=0.00, real=0.09 secs]
> > 2012-04-30T18:58:46.961+: 105718.069: [GC 105718.069: [ParNew:
> > 35230K->2224K(38336K), 0.0802440 secs] 3145524K->3112533K(4091776K)
> > icms_dc=6 , 0.0803

Re: Important "Undefined Error"

2012-05-14 Thread N Keywal
Hi,

There could be multiple issues, but it's strange to have in hbase-site.xml

  hdfs://namenode:9000/hbase

while the core-site.xml says:

hdfs://namenode:54310/

The two entries should match.

I would recommend:
- use netstat to check the ports (netstat -l)
- do the check recommended by Harsh J previously.

N.


On Mon, May 14, 2012 at 3:21 PM, Dalia Sobhy  wrote:
>
>
> pleas hel
>
>> From: dalia.mohso...@hotmail.com
>> To: user@hbase.apache.org
>> Subject: RE: Important "Undefined Error"
>> Date: Mon, 14 May 2012 12:20:18 +0200
>>
>>
>>
>> Hi,
>> I tried what you told me, but nothing worked:(((
>> First when I run this command:dalia@namenode:~$ host -v -t A 
>> `hostname`Output:Trying "namenode"Host namenode not found: 
>> 3(NXDOMAIN)Received 101 bytes from 10.0.2.1#53 in 13 ms My 
>> core-site.xml:        fs.default.name  
>>               
>> hdfs://namenode:54310/
>> My 
>> hdfs-site.xmldfs.name.dir/data/1/dfs/nn,/nfsmount/dfs/nndfs.datanode.max.xcievers4096dfs.replication3
>>  dfs.permissions.superusergroup hadoop
>> My 
>> Mapred-site.xmlmapred.local.dir/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local
>> My 
>> Hbase-site.xmlhbase.cluster.distributed
>>   true  hbase.rootdir     
>> hdfs://namenode:9000/hbasehbase.zookeeper.quorun
>>  
>> namenodehbase.regionserver.port60020The
>>  host and port that the HBase master runs 
>> at.dfs.replication1hbase.zookeeper.property.clientPort2181Property
>>  from ZooKeeper's config zoo.cfg.The port at which the clients will 
>> connect.
>> Please Help I am really disappointed I have been through all that for two 
>> weeks 
>>
>>
>>
>> > From: dwivedishash...@gmail.com
>> > To: user@hbase.apache.org
>> > Subject: RE: Important "Undefined Error"
>> > Date: Sat, 12 May 2012 23:31:49 +0530
>> >
>> > The problem is your hbase is not able to connect to Hadoop, can you put 
>> > your
>> > hbase-site.xml content >> here.. have you specified localhost somewhere, if
>> > so remove localhost from everywhere and put your hdfsl namenode address
>> > suppose your namenode is running on master:9000 then put your hbase file
>> > system setting as master:9000/hbase here I am sending you the configuration
>> > which I am using in hbase and is working
>> >
>> >
>> > My hbase-site.xml content is
>> >
>> > 
>> > 
>> > 
>> > 
>> > 
>> > hbase.rootdir
>> > hdfs://master:9000/hbase
>> > 
>> > 
>> > hbase.master
>> > master:6
>> > The host and port that the HBase master runs at.
>> > 
>> > 
>> > hbase.regionserver.port
>> > 60020
>> > The host and port that the HBase master runs at.
>> > 
>> > 
>> > 
>> > hbase.cluster.distributed
>> > true
>> > 
>> > 
>> > hbase.tmp.dir
>> > /home/shashwat/Hadoop/hbase-0.90.4/temp
>> > 
>> > 
>> > hbase.zookeeper.quorum
>> > master
>> > 
>> > 
>> > dfs.replication
>> > 1
>> > 
>> > 
>> > hbase.zookeeper.property.clientPort
>> > 2181
>> > Property from ZooKeeper's config zoo.cfg.
>> > The port at which the clients will connect.
>> > 
>> > 
>> > 
>> > hbase.zookeeper.property.dataDir
>> > /home/shashwat/zookeeper
>> > Property from ZooKeeper's config zoo.cfg.
>> > The directory where the snapshot is stored.
>> > 
>> > 
>> >
>> > 
>> >
>> >
>> >
>> >
>> > Check this out, and also stop hbase, If its not stopping kill all the
>> > processes, and after putting your  hdfs-site.xml, mapred-site.xml and
>> > core-site.sml to hbase conf directory try to restart, and also delete the
>> > folders created by hbase ,,, like temp directory or other then try to 
>> > start.
>> >
>> > Regards
>> > ∞
>> > Shashwat Shriparv
>> >
>> >
>> > -Original Message-
>> > From: Dalia Sobhy [mailto:dalia.mohso...@hotmail.com]
>> > Sent: 12 May 2012 22:48
>> > To: user@hbase.apache.org
>> > Subject: RE: Important "Undefined Error"
>> >
>> >
>> > Hi Shashwat,
>> > I want to tell you about my configurations:
>> > I am using 4 nodesOne "Master": Namenode, SecondaryNamenode, Job Tracker,
>> > Zookeeper, HMasterThree "Slaves": datanodes, tasktrackers, regionservers In
>> > both master and slaves, all the hadoop daemons are working well, but as for
>> > the hbase master service it is not working..
>> > As for region server here is the error:12/05/12 14:42:13 INFO
>> > util.ServerCommandLine: vmName=Java HotSpot(TM) 64-Bit Server VM,
>> > vmVendor=Sun Microsystems Inc., vmVersion=20.1-b0212/05/12 14:42:13 INFO
>> > util.ServerCommandLine: vmInputArguments=[-Xmx1000m, -ea,
>> > -XX:+UseConcMarkSweepGC, -XX:+CMSIncrementalMode,
>> > -Dhbase.log.dir=/usr/lib/hbase/bin/../logs, -Dhbase.log.file=hbase.log,
>> > -Dhbase.home.dir=/usr/lib/hbase/bin/.., -Dhbase.id.str=,
>> > -Dhbase.root.logger=INFO,console,
>> > -Djava.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64:/usr/lib/
>> > hbase/bin/../lib/native/Linux-amd64-64]12/05/12 14:42:13 INFO
>> > ipc.HBaseRpcMetrics: Initializing RPC Metrics with hostName=HRegionServer,
>> > port=6002012/05/12 14:42:14 FATAL zookeeper.ZKConfig: The server in zoo.cfg
>> > cannot be set

Re: Important "Undefined Error"

2012-05-14 Thread N Keywal
In core-site.xml, do you have this?

<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode:8020/hbase</value>
</property>

If you want hbase to connect to 8020 you must have hdfs listening on
8020 as well.


On Mon, May 14, 2012 at 5:17 PM, Dalia Sobhy  wrote:
> H
>
> I have tried to make both ports the same.
> But the prob is the hbase cannot connect to port 8020.
> When i run nmap hostname, port 8020 wasnt with the list of open ports.
> I have tried what harsh told me abt.
> I used the same port he used but same error occurred.
> Another aspect in cloudera doc it says that i have to canonical name for the 
> host ex: namenode.example.com as the hostname, but i didnt find it in any 
> tutorial. No one makes it.
> Note that i am deploying my cluster in fully distributed mode i.e am using 4 
> machines..
>
> So any ideas??!!
>
> Sent from my iPhone
>
> On 2012-05-14, at 4:07 PM, "N Keywal"  wrote:
>
>> Hi,
>>
>> There could be multiple issues, but it's strange to have in hbase-site.xml
>>
>>  hdfs://namenode:9000/hbase
>>
>> while the core-site.xml says:
>>
>> hdfs://namenode:54310/
>>
>> The two entries should match.
>>
>> I would recommend to:
>> - use netstat to check the ports (netstat -l)
>> - do the check recommended by Harsh J previously.
>>
>> N.
>>
>>
>> On Mon, May 14, 2012 at 3:21 PM, Dalia Sobhy  
>> wrote:
>>>
>>>
>>> pleas hel
>>>
>>>> From: dalia.mohso...@hotmail.com
>>>> To: user@hbase.apache.org
>>>> Subject: RE: Important "Undefined Error"
>>>> Date: Mon, 14 May 2012 12:20:18 +0200
>>>>
>>>>
>>>>
>>>> Hi,
>>>> I tried what you told me, but nothing worked:(((
>>>> First when I run this command:dalia@namenode:~$ host -v -t A 
>>>> `hostname`Output:Trying "namenode"Host namenode not found: 
>>>> 3(NXDOMAIN)Received 101 bytes from 10.0.2.1#53 in 13 ms My 
>>>> core-site.xml:        
>>>> fs.default.name        
>>>>         
>>>> hdfs://namenode:54310/
>>>> My 
>>>> hdfs-site.xmldfs.name.dir/data/1/dfs/nn,/nfsmount/dfs/nndfs.datanode.max.xcievers4096dfs.replication3
>>>>  dfs.permissions.superusergroup 
>>>> hadoop
>>>> My 
>>>> Mapred-site.xmlmapred.local.dir/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local
>>>> My 
>>>> Hbase-site.xmlhbase.cluster.distributed
>>>>   true  hbase.rootdir     
>>>> hdfs://namenode:9000/hbasehbase.zookeeper.quorun
>>>>  
>>>> namenodehbase.regionserver.port60020The
>>>>  host and port that the HBase master runs 
>>>> at.dfs.replication1hbase.zookeeper.property.clientPort2181Property
>>>>  from ZooKeeper's config zoo.cfg.The port at which the clients will 
>>>> connect.
>>>> Please Help I am really disappointed I have been through all that for two 
>>>> weeks 
>>>>
>>>>
>>>>
>>>>> From: dwivedishash...@gmail.com
>>>>> To: user@hbase.apache.org
>>>>> Subject: RE: Important "Undefined Error"
>>>>> Date: Sat, 12 May 2012 23:31:49 +0530
>>>>>
>>>>> The problem is your hbase is not able to connect to Hadoop, can you put 
>>>>> your
>>>>> hbase-site.xml content >> here.. have you specified localhost somewhere, 
>>>>> if
>>>>> so remove localhost from everywhere and put your hdfsl namenode address
>>>>> suppose your namenode is running on master:9000 then put your hbase file
>>>>> system setting as master:9000/hbase here I am sending you the 
>>>>> configuration
>>>>> which I am using in hbase and is working
>>>>>
>>>>>
>>>>> My hbase-site.xml content is
>>>>>
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> hbase.rootdir
>>>>> hdfs://master:9000/hbase
>>>>> 
>>>>> 
>>>>> hbase.master
>>>>> master:6
>>>>> The host and port that the HBase master runs 
>>>>> at.
>>>>> 
>>>>> 
>>>>> hbase.regionserver.port
>>>>> 60020
>>>>> The host and port that the HBase master runs 
>>>>> at.
>>>>> 
>&

Re: batch insert performance

2012-05-27 Thread N Keywal
Hi,

What version are you using?
On trunk, put(Put) and put(List<Put>) call the same code, so I would
expect comparable performance when autoflush is set to false.

However, with 250K small puts you may have the GC playing a role.

What are the results if you do the inserts with 50 times 5K rows?
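
Something like the following (a sketch, not tested against your setup; it
assumes batchAllRows is the same List<Put> as in your code):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class ChunkedPuts {
  // Sends the rows in chunks of 5000 instead of one 250K-element put(List),
  // so each client-side call stays small; with autoflush off, the actual
  // RPCs are still driven by the write buffer size.
  public static void insertInChunks(HTable table, List<Put> batchAllRows) throws Exception {
    table.setAutoFlush(false);
    int chunkSize = 5000;
    List<Put> chunk = new ArrayList<Put>(chunkSize);
    for (Put p : batchAllRows) {
      chunk.add(p);
      if (chunk.size() == chunkSize) {
        table.put(chunk);
        chunk = new ArrayList<Put>(chunkSize);
      }
    }
    if (!chunk.isEmpty()) {
      table.put(chunk);
    }
    table.flushCommits();
  }
}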

N.

On Sun, May 27, 2012 at 1:58 AM, Faruk Berksöz  wrote:
> Codes and their results:
>
> Code 1 (List<Put> batchAllRows, 250.000 rows) - average elapsed time: 27 sec
>   table.setAutoFlush(false);
>   for (Put mRow : batchAllRows) {
>     table.put(mRow);
>   }
>   table.flushCommits();
>
> Code 2 (List<Put> batchAllRows, 250.000 rows) - average elapsed time: 103 sec
>   table.setAutoFlush(false);
>   table.put(batchAllRows);
>   table.flushCommits();
>
> Code 3 (List<Put> batchAllRows, 250.000 rows) - average elapsed time: 105 sec
>   table.setAutoFlush(false);
>   Object[] results = new Object[batchAllRows.size()];
>   table.batch(batchAllRows, results);
>   //table.batch(batchAllRows); /* already tried */
>   table.flushCommits();
> -- Forwarded message --
> From: Faruk Berksöz 
> Date: 2012/5/27
> Subject: batch insert performance
> To: user@hbase.apache.org
>
>
> Hi, HBase users,
>
> I have 250.000 Rows in a list.
> I want to insert all rows in HTable as soon as possible.
> I have 3 different code snippets and 3 different elapsed times.
> Why are the HTable.batch(List<Row> actions, Object[] results) and
> HTable.put(List<Put> puts) methods 4 times slower than Code 1, which
> inserts the records into the HTable in a simple loop?
> Codes and their results :
>
>
>
>
>
>
> Faruk


Re: Null rowkey with empty get operation

2012-05-29 Thread N Keywal
There is a one-to-one mapping between the result array and the get list,
so the result for rowkeys[i] is in results[i].
Isn't that what you want?
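
A small sketch of using that positional mapping (assuming rowkeys is a
List<String>, as suggested by the code below):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiGetMapping {
  // Prints the row keys whose Get returned no key-values, relying on the
  // positional mapping: results[i] corresponds to gets.get(i).
  public static void printEmptyRows(HTable table, List<String> rowkeys) throws IOException {
    List<Get> gets = new ArrayList<Get>(rowkeys.size());
    for (String rowkey : rowkeys) {
      gets.add(new Get(Bytes.toBytes(rowkey)));
    }
    Result[] results = table.get(gets);
    for (int i = 0; i < results.length; i++) {
      if (results[i] == null || results[i].isEmpty()) {
        System.out.println("Empty result for row " + rowkeys.get(i));
      }
    }
  }
}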

On Tue, May 29, 2012 at 9:34 AM, Ben Kim  wrote:
> Maybe I showed you a bad example. This makes more sense when it comes to
> using List
> For instance,
>
> List gets = new ArrayList();
> for(String rowkey : rowkeys){
>  Get get = new Get(Bytes.toBytes(rowkey));
>  get.addFamily(family);
>  Filter filter = new QualifierFilter(CompareOp.NOT_EQUAL, new
> BinaryComparator(item));
>  get.setFilter(filter);
>  gets.add(get);
> }
> Result[] results = table.get(get);
>
> Now I have multiple results, I need to find the rowkey of the result that
> has no keyvalue.
> but results[0].getRow() is null if results[0] has no keyvalue.  so it's
> hard to derive which row the empty result belongs to :(
>
> Thank you for your response,
> Ben
>
>
>
> On Tue, May 29, 2012 at 2:33 PM, Anoop Sam John  wrote:
>
>> Hi Ben,
>>      In HBase rowkey exists with KVs only. As in your case there is no KVs
>> in the result, and so no rowkey. What is the use case that you are
>> referring here? When you issued Get with a rowkey and empty result for that
>> , you know the rowkey already right? I mean any specific reason why you try
>> to find the rowkey from the result object?
>>
>> -Anoop-
>>
>> 
>> From: Ben Kim [benkimkim...@gmail.com]
>> Sent: Tuesday, May 29, 2012 6:42 AM
>> To: user@hbase.apache.org
>> Subject: Null rowkey with empty get operation
>>
>> I have following Get code with HBase 0.92.0
>>
>> Get get = new Get(Bytes.toBytes(rowkey));
>> get.addFamily(family);
>> Filter filter = new QualifierFilter(CompareOp.NOT_EQUAL, new
>> BinaryComparator(item));
>> get.setFilter(filter);
>> Result r = table.get(get);
>>
>> System.out.println(r);  // (1) prints "keyvalues=NONE"
>> System.out.println(Bytes.toString(r.getRow()));  // (2) throws
>> NullpointerException
>>
>>
>>
>> printing out the result shows that all columns in a row was filtered out.
>> but i still want to print out the row key of the empty result.
>> But the value of r.getRow() is null
>>
>> Shouldn't r.getRow() return the rowkey even if the keyvalues are emtpy?
>>
>>
>> --
>>
>> *Benjamin Kim**
>> benkimkimben at gmail*
>>
>
>
>
> --
>
> *Benjamin Kim*
> **Mo : +82 10.5357.0521*
> benkimkimben at gmail*


Re: Issues with Java sample for connecting to remote Hbase

2012-05-29 Thread N Keywal
From http://hbase.apache.org/book/os.html:
HBase expects the loopback IP address to be 127.0.0.1. Ubuntu and some
other distributions, for example, will default to 127.0.1.1 and this
will cause problems for you.

It's worth reading the whole section ;-).

You also don't need to set the master address: it will be read from
zookeeper. I.e. you can remove this line from your client code:
>> >   config.set("hbase.master", "10.78.32.131:60010");
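
A minimal connection sketch under that assumption (the quorum address and the
"asset" table name are reused from the message below; everything else is left
at its default):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

public class ClientConnectSketch {
  public static void main(String[] args) throws Exception {
    Configuration config = HBaseConfiguration.create();
    // Only the ZooKeeper quorum is needed; the master and region server
    // addresses are then looked up in ZooKeeper by the client.
    config.set("hbase.zookeeper.quorum", "10.78.32.131");
    config.set("hbase.zookeeper.property.clientPort", "2181");

    HBaseAdmin.checkHBaseAvailable(config);
    HTable table = new HTable(config, "asset");
    // ... Gets/Puts as in the code below ...
    table.close();
  }
}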

N.

On Tue, May 29, 2012 at 3:46 PM, AnandaVelMurugan Chandra Mohan
 wrote:
> Thanks for the response. It still errors out.
>
> On Tue, May 29, 2012 at 7:05 PM, Mohammad Tariq  wrote:
>
>> change the name from "localhost" to something else in the line
>> "10.78.32.131    honeywel-4a7632    localhost" and see if it works
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>> On Tue, May 29, 2012 at 6:59 PM, AnandaVelMurugan Chandra Mohan
>>  wrote:
>> > I have HBase version 0.92.1 running in standalone mode. I created a table
>> > and added few rows using hbase shell. Now I am developing a standalone
>> java
>> > application to connect to Hbase and retrieve the data from the table.
>> > *
>> > This is the code I am using
>> > *
>> >              Configuration config = HBaseConfiguration.create();
>> >               config.clear();
>> >               config.set("hbase.zookeeper.quorum", "10.78.32.131");
>> >               config.set("hbase.zookeeper.property.clientPort","2181");
>> >               config.set("hbase.master", "10.78.32.131:60010");
>> >
>> >               HBaseAdmin.checkHBaseAvailable(config);
>> >
>> >
>> >               // This instantiates an HTable object that connects you to
>> > the "myTable"
>> >               // table.
>> >               HTable table = new HTable(config, "asset");
>> >
>> >               Get g = new Get(Bytes.toBytes("APU 331-350"));
>> >               Result r = table.get(g);
>> >
>> > *This is the content of my /etc/hosts file*
>> >
>> > #127.0.0.1    localhost.localdomain    localhost
>> > #10.78.32.131   honeywel-4a7632
>> > #127.0.1.1    honeywel-4a7632
>> > ::1    honeywel-4a7632    localhost6.localdomain6    localhost6
>> > 10.78.32.131    honeywel-4a7632    localhost
>> > *
>> > This is part of my error stack trace*
>> >
>> > 12/05/29 18:53:33 INFO
>> client.HConnectionManager$HConnectionImplementation:
>> > getMaster attempt 0 of 1 failed; no more retrying.
>> > java.net.ConnectException: Connection refused: no further information
>> >    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>> >    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
>> >    at
>> >
>> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>> >    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>> >    at
>> >
>> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
>> >    at
>> >
>> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
>> >    at
>> >
>> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
>> >    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
>> >    at
>> >
>> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
>> >    at $Proxy5.getProtocolVersion(Unknown Source)
>> >    at
>> >
>> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
>> >    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:303)
>> >    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:280)
>> >    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:332)
>> >    at
>> >
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:642)
>> >    at
>> org.apache.hadoop.hbase.client.HBaseAdmin.(HBaseAdmin.java:106)
>> >    at
>> >
>> org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:1553)
>> >    at hbaseMain.main(hbaseMain.java:27)
>> > 12/05/29 18:53:33 INFO
>> client.HConnectionManager$HConnectionImplementation:
>> > Closed zookeeper sessionid=0x13798c3ce190003
>> > 12/05/29 18:53:33 INFO zookeeper.ZooKeeper: Session: 0x13798c3ce190003
>> > closed
>> > 12/05/29 18:53:33 INFO zookeeper.ClientCnxn: EventThread shut down
>> >
>> > Can some one help me fix this? Thanks a lot.
>> > --
>> > Regards,
>> > Anand
>>
>
>
>
> --
> Regards,
> Anand


Re: understanding the client code

2012-05-29 Thread N Keywal
Hi,

If you're speaking about preparing the query, it's in HTable and
HConnectionManager.
If you're asking about the pure network level then, on trunk, it's now done
with a third-party library called protobuf.

See the code from HConnectionManager#createCallable to see how it's used.

Cheers,

N.

On Tue, May 29, 2012 at 4:15 PM, S Ahmed  wrote:
> I'm looking at the client code here:
> https://github.com/apache/hbase/tree/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client
>
> Is this the high level operations, and the actual sending of this data over
> the network is done somewhere else?
>
> For example, during a PUT, you may want it to write to n nodes, where is
> the code that does that? And the actual network connection etc?


Re: understanding the client code

2012-05-29 Thread N Keywal
There are two levels:
- communication between the hbase client and the hbase cluster: this is the
code you have in the hbase client package. As an end user you don't really
care, but you do care if you want to learn hbase internals.
- communication between your application code and hbase as a whole if you
don't want to use the hbase client. Then several options are
available, thrift being one of them (I'm not sure of Avro's status).

What do you want to do exactly?

On Tue, May 29, 2012 at 4:33 PM, S Ahmed  wrote:
> So how does thrift and avro fit into the picture?  (I believe I saw
> references to that somewhere, are those alternate connection libs?)
>
> I know protobuf is just generating types for various languages...
>
> On Tue, May 29, 2012 at 10:26 AM, N Keywal  wrote:
>
>> Hi,
>>
>> If you're speaking about preparing the query it's in HTable and
>> HConnectionManager.
>> If you're on the pure network level, then, on trunk, it's now done
>> with a third party called protobuf.
>>
>> See the code from HConnectionManager#createCallable to see how it's used.
>>
>> Cheers,
>>
>> N.
>>
>> On Tue, May 29, 2012 at 4:15 PM, S Ahmed  wrote:
>> > I'm looking at the client code here:
>> >
>> https://github.com/apache/hbase/tree/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client
>> >
>> > Is this the high level operations, and the actual sending of this data
>> over
>> > the network is done somewhere else?
>> >
>> > For example, during a PUT, you may want it to write to n nodes, where is
>> > the code that does that? And the actual network connection etc?
>>


Re: understanding the client code

2012-05-29 Thread N Keywal
So it's the right place for the internals :-).
The main use case for the Thrift API is when you have non-Java client code.

On Tue, May 29, 2012 at 5:07 PM, S Ahmed  wrote:
> I don't really want any, I just want to learn the internals :)
>
> So why would someone not want to use the client, for data intensive tasks
> like mapreduce etc. where they want direct access to the files?
>
> On Tue, May 29, 2012 at 11:00 AM, N Keywal  wrote:
>
>> There are two levels:
>> - communication between hbase client and hbase cluster: this is the
>> code you have in hbase client package. As a end user you don't really
>> care, but you care if you want to learn hbase internals.
>> - communication between customer code and hbase as a whole if you
>> don't want to use the hbase client. Then several options are
>> available, thrift being one of them (I'm not sure of avro status).
>>
>> What do you want to do exactly?
>>
>> On Tue, May 29, 2012 at 4:33 PM, S Ahmed  wrote:
>> > So how does thrift and avro fit into the picture?  (I believe I saw
>> > references to that somewhere, are those alternate connection libs?)
>> >
>> > I know protobuf is just generating types for various languages...
>> >
>> > On Tue, May 29, 2012 at 10:26 AM, N Keywal  wrote:
>> >
>> >> Hi,
>> >>
>> >> If you're speaking about preparing the query it's in HTable and
>> >> HConnectionManager.
>> >> If you're on the pure network level, then, on trunk, it's now done
>> >> with a third party called protobuf.
>> >>
>> >> See the code from HConnectionManager#createCallable to see how it's
>> used.
>> >>
>> >> Cheers,
>> >>
>> >> N.
>> >>
>> >> On Tue, May 29, 2012 at 4:15 PM, S Ahmed  wrote:
>> >> > I'm looking at the client code here:
>> >> >
>> >>
>> https://github.com/apache/hbase/tree/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client
>> >> >
>> >> > Is this the high level operations, and the actual sending of this data
>> >> over
>> >> > the network is done somewhere else?
>> >> >
>> >> > For example, during a PUT, you may want it to write to n nodes, where
>> is
>> >> > the code that does that? And the actual network connection etc?
>> >>
>>


Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread N Keywal
Hi,

For the multiget, if it's small enough, it will be:
- parallelized across all the region servers concerned, i.e. you will be as
fast as the slowest region server.
- executed as one query per region server (i.e. the gets are grouped by
region server).

If there are too many gets, the list will be split into small subsets and the
strategy above will be used for each subset, doing one subset after
another (and blocking between them).

so Large set  --> Small set will be ok from this point of view. Large
--> Large won't.
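
If the large set really has to be fetched, one option is to keep each multiget
small on the client side. A sketch (the chunk size is an arbitrary choice, not
a recommendation from the thread):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class ChunkedMultiGet {
  // Issues the gets in client-side chunks so no single multiget is too large;
  // each chunk is still grouped by region server and parallelized as above.
  public static List<Result> getInChunks(HTable table, List<Get> gets, int chunkSize)
      throws IOException {
    List<Result> all = new ArrayList<Result>(gets.size());
    for (int i = 0; i < gets.size(); i += chunkSize) {
      List<Get> chunk = gets.subList(i, Math.min(i + chunkSize, gets.size()));
      Result[] results = table.get(chunk);
      for (Result r : results) {
        all.add(r);
      }
    }
    return all;
  }
}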

N.


On Tue, May 29, 2012 at 5:54 PM, Em  wrote:
> Ian,
>
> thanks for your detailed response!
>
> Let me give you feedback to each point:
>> 1. You could denormalize the additional information (e.g. course
>> name) into the students table. Then, you're simply reading the
>> student row, and all the info you need is there. That places an extra
>> burden of write time and disk space, and does make you do a lot more
>> work when a course name changes.
> That's exactly what I thought about and that's why I avoid it. The
> students and courses example is an example you find at several points on
> the web, when describing the differences and translations of relations
> from an RDBMS into a Key-Value-store.
> In fact, everything you model with a Key-Value-storage like HBase,
> Cassandra etc. can be modeled as an RDMBS-scheme.
> Since a lot of people, like me, are coming from that edge, we must
> re-learn several basic things.
> It starts with understanding that you model a K-V-storage the way you
> want to access the data, not as the data relates to eachother (in
> general terms) and ends with translating the connections of data into a
> K-V-schema as good as possible.
>
>
>> 2. You could do what you're talking about in your HBase access code:
>> find the list of course IDs you need for the student, and do a multi
>> get on the course table. Fundamentally, this won't be much more
>> efficient to do in batch mode, because the courses are likely to be
>> evenly spread out over the region servers (orthogonal to the
>> students). You're essentially doing a hash join, except that it's a
>> lot less pleasant than on a relational DB b/c you've got network
>> round trips for each GET. The disk blocks from the course table (I'm
>> assuming it's the smaller side) will likely be cached so at least
>> that part will be fast--you'll be answering those questions from
>> memory, not via disk IO.
>
> Whow, what?
> I thought a Multiget would reduce network-roundtrips as it only accesses
> each region *one* time, fetching all the queried keys and values from
> there. If your data is randomly distributed, this could result in the
> same costs as with doing several Gets in a loop, but should work better
> if several Keys are part of the same region.
> Am I right or did I missunderstood the concept???
>
>> 3. You could also let a higher client layer worry about this. For
>> example, your data layer query just returns a student with a list of
>> their course IDs, and then another process in your client code looks
>> up each course by ID to get the name. You can then put an external
>> caching layer (like memcached) in the middle and make things a lot
>> faster (though that does put the burden on you to have the code path
>> for changing course info also flush the relevant cache entries). In
>> your example, it's unlikely any institution would have more than a
>> few thousand courses, so they'd probably all stay in memory and be
>> served instantaneously.
> Hm, in what way does this give me an advantage over using HBase -
> assuming that the number of courses is small enough to fit in RAM - ?
> I know that Memcached is optimized for this purpose and might have much
> faster response times - no doubts.
> However, from a conceptual point of view: Why does Memcached handles the
> K-V-distribution more efficiently than a HBase with warmed caches?
> Hopefully this question isn't that hard :).
>
>> This might seem laborious, and to a degree it is. But note that it's
>> difficult to see the utility of HBase with toy examples like this; if
>> you're really storing courses and students, don't use HBase (unless
>> you've got billions of students and courses, which seems unlikely).
>> The extra thought you have to put in to making schemas work for you
>> in HBase is only worth it when it gives you the ability to scale to
>> gigantic data sets where other solutions wouldn't.
> Well, the background is a private project. I know that it's a lot easier
> to do what I want in a RDBMS and there is no real need for using a
> highly scalable beast like HBase.
> However, I want to learn something new and since I do not break
> someone's business by trying out new technology privately, I want to go
> with HStack.
> Without ever doing it, you never get a real feeling of when to use the
> right tool.
> Using a good tool for the wrong problem can be an interesting
> experience, since you learn some of the do's and don'ts of the software
> you use.
>
> Sinc

Re: hosts unreachables

2012-06-01 Thread N Keywal
Yes, this is the balancer process (as its name says, it keeps the cluster
balanced), and it's not related to the process that watches for dead nodes.
The nodes are monitored through ZooKeeper; the timeout is 180 seconds by
default (setting: zookeeper.session.timeout).

On Fri, Jun 1, 2012 at 4:40 PM, Cyril Scetbon  wrote:
> I've another regionserver (hb-d2) that crashed (I can easily reproduce the
> issue by continuing injections), and as I see in master log, it gets
> information about hb-d2 every 5 minutes. I suppose it's what helps him to
> note if a node is dead or not. However it adds hb-d2 to the dead node list
> at 13:32:20, so before 5 minutes since the last time it got the server
> information. Is it normal ?
>
> 2012-06-01 13:02:36,309 DEBUG org.apache.hadoop.hbase.master.LoadBalancer:
> Server information: hb-d5,60020,1338553124247=47,
> hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46,
> hb-d10,60020,1338553126695=47, hb-d6,60020,133
> 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47,
> hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47,
> hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47
> ..
> 2012-06-01 13:07:36,319 DEBUG org.apache.hadoop.hbase.master.LoadBalancer:
> Server information: hb-d5,60020,1338553124247=47,
> hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46,
> hb-d10,60020,1338553126695=47, hb-d6,60020,133
> 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47,
> hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47,
> hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47
> ..
> 2012-06-01 13:12:36,328 DEBUG org.apache.hadoop.hbase.master.LoadBalancer:
> Server information: hb-d5,60020,1338553124247=47,
> hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46,
> hb-d10,60020,1338553126695=47, hb-d6,60020,133
> 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47,
> hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47,
> hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47
> ..
> 2012-06-01 13:17:36,337 DEBUG org.apache.hadoop.hbase.master.LoadBalancer:
> Server information: hb-d5,60020,1338553124247=47,
> hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46,
> hb-d10,60020,1338553126695=47, hb-d6,60020,133
> 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47,
> hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47,
> hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47
> ..
> 2012-06-01 13:22:36,346 DEBUG org.apache.hadoop.hbase.master.LoadBalancer:
> Server information: hb-d5,60020,1338553124247=47,
> hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46,
> hb-d10,60020,1338553126695=47, hb-d6,60020,133
> 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47,
> hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47,
> hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47
> ..
> 2012-06-01 13:27:36,353 DEBUG org.apache.hadoop.hbase.master.LoadBalancer:
> Server information: hb-d5,60020,1338553124247=47,
> hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46,
> hb-d10,60020,1338553126695=47, hb-d6,60020,133
> 8553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47,
> hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47,
> hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47
> ..
> 2012-06-01 13:32:20,048 INFO
> org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer
> ephemeral node deleted, processing expiration [hb-d2,60020,1338553126560]
> 2012-06-01 13:32:20,048 DEBUG org.apache.hadoop.hbase.master.ServerManager:
> Added=hb-d2,60020,1338553126560 to dead servers, submitted shutdown handler
> to be executed, root=false, meta=false
> 2012-06-01 13:32:20,048 INFO
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs
> for hb-d2,60020,1338553126560
>
>
>
> On 6/1/12 3:25 PM, Cyril Scetbon wrote:
>>
>> I've added hbase.hregion.memstore.mslab.enabled = true to the
>> configuration of all regionservers and add flags -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
>> -XX:CMSInitiatingOccupancyFraction=60 to the hbase environment
>> However my regionservers are still crashing when I load data into the
>> cluster
>>
>> Here are the logs for the node hb-d3 that crashed at 12:56
>>
>> - GC logs : http://pastebin.com/T0d0y8pZ
>> - regionserver logs : http://pastebin.com/n6v9x3XM
>>
>> thanks
>>
>> On 5/31/12 11:12 PM, Jean-Daniel Cryans wrote:
>>>
>>> Both, also you could bigger log snippets (post them on something like
>>> pastebin.com) and we could see more evidence of the issue.
>>>
>>> J-D
>>>
>>> On Thu, May 31, 2012 at 2:09 PM, Cyril Scetbon
>>>  wrote:

 On 5/31/12 11:00 PM, Jean-Daniel Cryans wrote:
>
> What I'm seeing looks more like GC issues. Start reading this:
> http://hbase.apache.org/book.html#gc
>
> J-D

 Hi,

 Really not sure cause I've e

Re: Region is not online Execptions

2012-06-07 Thread N Keywal
Hi,

You can have this if the region moved, i.e. it was previously managed by
this region server and is now managed by another one. The client keeps a
cache of the locations, so after a move it will first contact the
wrong server. Then the client will update its cache. By default there
are 10 internal retries, so the next retry will hit the right server, and
this error should not be seen by the client application code.

In 0.96 the region server will send back a RegionMovedException with
the new location if the move is not too old (less than around ~5
minutes if I remember correctly).
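
The retry behaviour is configurable on the client side. A sketch of the
relevant settings (the values shown are, as far as I know, the defaults of
that era):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ClientRetrySettings {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // The relocation described above is absorbed by the client's retry loop;
    // these two settings bound the number of attempts and the sleep between them.
    conf.setInt("hbase.client.retries.number", 10);
    conf.setLong("hbase.client.pause", 1000);   // base sleep in milliseconds
    return conf;
  }
}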

N.

On Thu, Jun 7, 2012 at 9:36 PM, arun sirimalla  wrote:
> Hi,
>
> My Hbase cluster seems to work fine, but i see some exepctions in one of
> the RegionServer  with below message
>
> 2012-06-07 19:24:48,809 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> NotServingRegionException; Region is not online: -ROOT-,,0
> 2012-06-07 19:24:56,154 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> NotServingRegionException; Region is not online: -ROOT-,,0
>
> This regionserver is not hosting the ROOT region, though; the -ROOT- region
> is hosted by another regionserver. Can someone please tell me why these
> exceptions occur?
>
> Thanks
> Arun


Re: HBase first steps: Design a table

2012-06-13 Thread N Keywal
Hi,

> Usually I'm inserting about 40 000 rows at a time. Should I do 40 000
> calls to put? Or is there any "bulkinsert" method?

There is this chapter on bulk loading:
http://hbase.apache.org/book.html#arch.bulk.load
But for 40K rows you may just want to use "void put(final List<Put>
puts)" in HTableInterface; that will save a lot of RPC calls.

Cheers,

N.


Re: Scan vs Put vs Get

2012-06-28 Thread N Keywal
Hi Jean-Marc,

Interesting :-)

Added to Anoop questions:

What's the hbase version you're using?

Is it repeatable, I mean if you try twice the same "gets" with the
same client do you have the same results? I'm asking because the
client caches the locations.

If the locations are wrong (region moved) you will have a retry loop,
and it includes a sleep. Do you have anything in the logs?

Could you share as well the code you're using to get the ~100 ms time?

Cheers,

N.

On Thu, Jun 28, 2012 at 6:56 AM, Anoop Sam John  wrote:
> Hi
>     How many Gets do you batch together in one call? Is this equal to the
> Scan#setCaching() that you are using?
> If both are the same, you can be sure that the number of network calls is
> almost the same.
>
> Also, you are giving random keys in the Gets. The scan will always be
> sequential. It seems that in your get scenario the reads are very random,
> resulting in too many reads of HFile blocks from HDFS. [Is block caching enabled?]
>
> Also have you tried using Bloom filters?  ROW blooms might improve your get 
> performance.
>
> -Anoop-
> 
> From: Jean-Marc Spaggiari [jean-m...@spaggiari.org]
> Sent: Thursday, June 28, 2012 5:04 AM
> To: user
> Subject: Scan vs Put vs Get
>
> Hi,
>
> I have a small piece of code, for testing, which is putting 1B lines
> in an existing table, getting 3000 lines and scanning 10 000.
>
> The table is one family, one column.
>
> Everything is done randomly. Put with Random key (24 bytes), fixed
> family and fixed column names with random content (24 bytes).
>
> Get (batch) is done with random keys and scan with RandomRowFilter.
>
> And here are the results.
> Time to insert 1000000 lines: 43 seconds (23255 lines/seconds)
> That's correct for my needs based on the poor performances of the
> servers in the cluster. I'm fine with the results.
>
> Time to read 3000 lines: 11444.0 mseconds (262 lines/seconds)
> This is way too low. I don't understand why. So I tried the random scan
> because I'm not able to figure the issue.
>
> Time to read 10000 lines: 108.0 mseconds (92593 lines/seconds)
> This is impressive! I have added that after I failed with the get. I
> moved from 262 lines per seconds to almost 100K lines/seconds!!! It's
> awesome!
>
> However, I'm still wondering what's wrong with my gets.
>
> The code is very simple. I'm using Get objects that I'm executing in a
> Batch. I tried to add a filter but it's not helping. Here is an
> extract of the code.
>
>                        for (long l = 0; l < linesToRead; l++)
>                        {
>                                byte[] array1 = new byte[24];
>                                for (int i = 0; i < array1.length; i++)
>                                                array1[i] = 
> (byte)Math.floor(Math.random() * 256);
>                                Get g = new Get (array1);
>                                gets.addElement(g);
>                        }
>                                Object[] results = new Object[gets.size()];
>                                System.out.println(new java.util.Date () + " 
> \"gets\" created.");
>                                long timeBefore = System.currentTimeMillis();
>                        table.batch(gets, results);
>                        long timeAfter = System.currentTimeMillis();
>
>                        float duration = timeAfter - timeBefore;
>                        System.out.println ("Time to read " + gets.size() + " 
> lines : "
> + duration + " mseconds (" + Math.round(((float)linesToRead /
> (duration / 1000))) + " lines/seconds)");
>
> What's wrong with it? I can't add setBatch, nor can I add
> setCaching because it's not a scan. I tried with different numbers of
> gets but it's almost always the same speed. Am I using it the wrong
> way? Does anyone have any advice to improve that?
>
> Thanks,
>
> JM


Re: Stargate: ScannerModel

2012-06-28 Thread N Keywal
(moving this to the user mailing list, with the dev one in bcc)

From what you said it should be

customerid_MIN_TX_ID to customerid_MAX_TX_ID
But only if customerid size is constant.

Note that with this rowkey design there will be very few regions
involved, so it's unlikely to be parallelized.
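
For illustration, a minimal sketch of the start/stop row approach (the table name and the fixed-width customer id are assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class CustomerScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "transactions"); // hypothetical table name
    String customerId = "CUST0042";                  // fixed-width id assumed

    Scan scan = new Scan();
    // Start at the first possible key for this customer...
    scan.setStartRow(Bytes.toBytes(customerId + "_"));
    // ...and stop just after the last one: '`' is the ASCII character
    // following '_', so every key starting with "CUST0042_" is in range.
    scan.setStopRow(Bytes.toBytes(customerId + "`"));
    scan.setCaching(1000);

    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      System.out.println(Bytes.toString(r.getRow()));
    }
    scanner.close();
    table.close();
  }
}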

N.


On Thu, Jun 28, 2012 at 7:43 AM, sameer  wrote:
> Hello,
>
> I want to know what the parameters for scan.setStartRow and scan.setStopRow are.
>
> My requirement is that I have a table, with key as customerid_transactionId.
>
> I want to scan all the rows, the key of which contains the customer Id that
> I have.
>
> I tried using rowFilter but it is quite slow.
>
> If I am using the scan - setStartRow and setStopRow then what would I give
> as parameters?
>
> Thanks,
> Sameer
>
> --
> View this message in context: 
> http://apache-hbase.679495.n3.nabble.com/Stargate-ScannerModel-tp2975161p4019139.html
> Sent from the HBase - Developer mailing list archive at Nabble.com.


Re: Scan vs Put vs Get

2012-06-28 Thread N Keywal
> float duration = timeAfter - timeBefore;
> System.out.println ("Time to read " + gets.size() + " lines : " +
> duration + " mseconds (" + Math.round(((float)linesToRead / (duration
> / 1000))) + " lines/seconds)");
>
>
> for (int i = 0; i < results.length; i++)
> {
>        if (results[i] instanceof KeyValue)
>                if (!((KeyValue)results[i]).isEmptyColumn())
>                        System.out.println("Result[" + i + "]: " + 
> results[i]); // co
> BatchExample-9-Dump Print all results.
> }
>
> 2012/6/28, Ramkrishna.S.Vasudevan :
>> Hi
>>
>> You can also check the cache hit and cache miss statistics that appears on
>> the UI?
>>
>> In your random scan how many Regions are scanned whereas in gets may be
>> many
>> due to randomness.
>>
>> Regards
>> Ram
>>
>>> -Original Message-
>>> From: N Keywal [mailto:nkey...@gmail.com]
>>> Sent: Thursday, June 28, 2012 2:00 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: Scan vs Put vs Get
>>>
>>> Hi Jean-Marc,
>>>
>>> Interesting :-)
>>>
>>> Added to Anoop questions:
>>>
>>> What's the hbase version you're using?
>>>
>>> Is it repeatable, I mean if you try twice the same "gets" with the
>>> same client do you have the same results? I'm asking because the
>>> client caches the locations.
>>>
>>> If the locations are wrong (region moved) you will have a retry loop,
>>> and it includes a sleep. Do you have anything in the logs?
>>>
>>> Could you share as well the code you're using to get the ~100 ms time?
>>>
>>> Cheers,
>>>
>>> N.
>>>
>>> On Thu, Jun 28, 2012 at 6:56 AM, Anoop Sam John 
>>> wrote:
>>> > Hi
>>> >     How many Gets you batch together in one call? Is this equal to
>>> the Scan#setCaching () that u are using?
>>> > If both are same u can be sure that the the number of NW calls is
>>> coming almost same.
>>> >
>>> > Also you are giving random keys in the Gets. The scan will be always
>>> sequential. Seems in your get scenario it is very very random reads
>>> resulting in too many reads of HFile block from HDFS. [Block caching is
>>> enabled?]
>>> >
>>> > Also have you tried using Bloom filters?  ROW blooms might improve
>>> your get performance.
>>> >
>>> > -Anoop-
>>> > 
>>> > From: Jean-Marc Spaggiari [jean-m...@spaggiari.org]
>>> > Sent: Thursday, June 28, 2012 5:04 AM
>>> > To: user
>>> > Subject: Scan vs Put vs Get
>>> >
>>> > Hi,
>>> >
>>> > I have a small piece of code, for testing, which is putting 1B lines
>>> > in an existing table, getting 3000 lines and scanning 1.
>>> >
>>> > The table is one family, one column.
>>> >
>>> > Everything is done randomly. Put with Random key (24 bytes), fixed
>>> > family and fixed column names with random content (24 bytes).
>>> >
>>> > Get (batch) is done with random keys and scan with RandomRowFilter.
>>> >
>>> > And here are the results.
>>> > Time to insert 100 lines: 43 seconds (23255 lines/seconds)
>>> > That's correct for my needs based on the poor performances of the
>>> > servers in the cluster. I'm fine with the results.
>>> >
>>> > Time to read 3000 lines: 11444.0 mseconds (262 lines/seconds)
>>> > This is way to low. I don't understand why. So I tried the random
>>> scan
>>> > because I'm not able to figure the issue.
>>> >
>>> > Time to read 1 lines: 108.0 mseconds (92593 lines/seconds)
>>> > This it impressive! I have added that after I failed with the get. I
>>> > moved from 262 lines per seconds to almost 100K lines/seconds!!! It's
>>> > awesome!
>>> >
>>> > However, I'm still wondering what's wrong with my gets.
>>> >
>>> > The code is very simple. I'm using Get objects that I'm executing in
>>> a
>>> > Batch. I tried to add a filter but it's not helping. Here is an
>>> > extract of the code.
>>> >
>>> >                        for (long l = 0; l < linesToRead; l++)
>>> >                        {
>>> >                                byte[] array1 = new byte[24];
>>> >                                for (int i = 0; i < array1.length;
>>> i++)
>>> >                                                array1[i] =
>>> (byte)Math.floor(Math.random() * 256);
>>> >                                Get g = new Get (array1);
>>> >                                gets.addElement(g);
>>> >                        }
>>> >                                Object[] results = new
>>> Object[gets.size()];
>>> >                                System.out.println(new java.util.Date
>>> () + " \"gets\" created.");
>>> >                                long timeBefore =
>>> System.currentTimeMillis();
>>> >                        table.batch(gets, results);
>>> >                        long timeAfter = System.currentTimeMillis();
>>> >
>>> >                        float duration = timeAfter - timeBefore;
>>> >                        System.out.println ("Time to read " +
>>> gets.size() + " lines : "
>>> > + duration + " mseconds (" + Math.round(((float)linesToRead /
>>> > (duration / 1000))) + " lines/seconds)");
>>> >
>>> > What's wrong with it? I can't add the setBatch neither I can add
>>> > setCaching because it's not a scan. I tried with different numbers of
>>> > gets but it's almost always the same speed. Am I using it the wrong
>>> > way? Does anyone have any advice to improve that?
>>> >
>>> > Thanks,
>>> >
>>> > JM
>>
>>


Re: Scan vs Put vs Get

2012-06-28 Thread N Keywal
Thank you. It's clearer now. From the code you sent, RandomRowFilter
is not used. You're only using the KeyOnlyFilter (the second setFilter
replaces the first one; you need to use something like FilterList to
combine filters). (Note as well that you would need to initialize
RandomRowFilter#chance, otherwise all the rows will be filtered out.)
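
For illustration, a minimal sketch of combining the two filters with a FilterList and initializing the chance parameter:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
import org.apache.hadoop.hbase.filter.RandomRowFilter;

public class CombinedFilterScan {
  public static Scan buildScan() {
    // Keep roughly half of the rows; with the no-arg constructor the
    // chance is never set and every row gets filtered out.
    RandomRowFilter rrf = new RandomRowFilter(0.5f);
    // Strip the values, keep only the keys.
    KeyOnlyFilter kof = new KeyOnlyFilter();

    List<Filter> filters = new ArrayList<Filter>();
    filters.add(rrf);
    filters.add(kof);
    // MUST_PASS_ALL: a row is returned only if both filters accept it.
    FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, filters);

    Scan scan = new Scan();
    scan.setFilter(filterList);
    scan.setCaching(1000);
    return scan;
  }
}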

So, in one case -list of gets-, you're reading a well defined set of
rows (defined randomly, but well defined :-), and this set spreads all
over the regions.
In the second one (KeyOnlyFilter), you're reading the first 1K rows
you could get from the cluster.

This explains the difference between the results. Activating
RandomRowFilter should not change the results much, as selecting a
random subset of rows is different from reading a set of rows that
was defined randomly (don't know if I'm clear here...).

Unfortunately you're likely to be more interested in the performance
when there is a real selection. Your code for the list of gets was
correct imho. I'm interested in the results if you activate bloom filters.

Cheers,

N.

On Thu, Jun 28, 2012 at 3:45 PM, Jean-Marc Spaggiari
 wrote:
> Hi N Keywal,
>
> This result:
> Time to read 10000 lines : 122.0 mseconds (81967 lines/seconds)
>
> Is obtain with this code:
> HTable table = new HTable(config, "test3");
> final int linesToRead = 10000;
> System.out.println(new java.util.Date () + " Processing iteration " +
> iteration + "... ");
> RandomRowFilter rrf = new RandomRowFilter();
> KeyOnlyFilter kof = new KeyOnlyFilter();
>
> Scan scan = new Scan();
> scan.setFilter(rrf);
> scan.setFilter(kof);
> scan.setBatch(Math.min(linesToRead, 1000));
> scan.setCaching(Math.min(linesToRead, 1000));
> ResultScanner scanner = table.getScanner(scan);
> processed = 0;
> long timeBefore = System.currentTimeMillis();
> for (Result result : scanner.next(linesToRead))
> {
>        if (result != null)
>                processed++;
> }
> scanner.close();
> long timeAfter = System.currentTimeMillis();
>
> float duration = timeAfter - timeBefore;
> System.out.println ("Time to read " + linesToRead + " lines : " +
> duration + " mseconds (" + Math.round(((float)linesToRead / (duration
> / 1000))) + " lines/seconds)");
> table.close ();
>
> This is with the scan.
>
> scan > 80 000 lines/seconds
> put > 20 000 lines/seconds
> get < 300 lines/seconds
>
> 2012/6/28, Jean-Marc Spaggiari :
>> Hi Anoop,
>>
>> Are Bloom filters for columns? If I add "g.setFilter(new
>> KeyOnlyFilter());" that mean I can't use bloom filters, right?
>> Basically, what I'm doing here is something like
>> "existKey(byte[]):boolean" where I try to see if a key exist in the
>> database whitout taking into consideration if there is any column
>> content or not. This should be very fast. Even faster than the scan
>> which need to keep some tracks of where I'm reading for the next row.
>>
>> JM
>>
>> 2012/6/28, Anoop Sam John :
>>>>blockCacheHitRatio=69%
>>> Seems blocks you are getting from cache.
>>> You can check with Blooms also once.
>>>
>>> You can enable the usage of bloom using the config param
>>> "io.storefile.bloom.enabled" set to true  . This will enable the usage of
>>> bloom globally
>>> Now you need to set the bloom type for your CF
>>> HColumnDescriptor#setBloomFilterType()   U can check with type
>>> BloomType.ROW
>>>
>>> -Anoop-
>>>
>>> _
>>> From: Jean-Marc Spaggiari [jean-m...@spaggiari.org]
>>> Sent: Thursday, June 28, 2012 5:42 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: Scan vs Put vs Get
>>>
>>> Oh! I never looked at this part ;) Ok. I have it.
>>>
>>> Here are the numbers for one server before the read:
>>>
>>> blockCacheSizeMB=186.28
>>> blockCacheFreeMB=55.4
>>> blockCacheCount=2923
>>> blockCacheHitCount=195999
>>> blockCacheMissCount=89297
>>> blockCacheEvictedCount=69858
>>> blockCacheHitRatio=68%
>>> blockCacheHitCachingRatio=72%
>>>
>>> And here are the numbers after 100 iterations of 1000 gets for  the same
>>> server:
>>>
>>> blockCacheSizeMB=194.44
>>> blockCacheFreeMB=47.25
>>> blockCacheCount=3052
>>> blockCacheHitCount=232034
>>> blockCacheMissCount=103250
>>> blockCacheEvictedCount=83682
>>> blockCacheHitRatio=69%
>>> blockCacheHitCachingRatio=72%
>>>
>>> Don

Re: Scan vs Put vs Get

2012-06-28 Thread N Keywal
For the filter list my guess is that you're filtering out all rows
because RandomRowFilter#chance is not initialized (it should be
something like RandomRowFilter rrf = new RandomRowFilter(0.5f);)
But note that this test will never be comparable to the test with a
list of gets. You can make it as slow/fast as you want by playing with
the 'chance' parameter.

The results with gets and bloom filter are also in the interesting
category, hopefully an expert will get in the loop...



On Thu, Jun 28, 2012 at 6:04 PM, Jean-Marc Spaggiari
 wrote:
> Oh! I see! KeyOnlyFilter is overwriting the RandomRowFilter! Bad. I
> mean, bad I did not figured that. Thanks for pointing that. That
> definitively explain the difference in the performances.
>
> I have activated the bloomfilters with this code:
> HBaseAdmin admin = new HBaseAdmin(config);
> HTable table = new HTable(config, "test3");
> System.out.println (table.getTableDescriptor().getColumnFamilies()[0]);
> HColumnDescriptor cd = table.getTableDescriptor().getColumnFamilies()[0];
> cd.setBloomFilterType(BloomType.ROW);
> admin.disableTable("test3");
> admin.modifyColumn("test3", cd);
> admin.enableTable("test3");
> System.out.println (table.getTableDescriptor().getColumnFamilies()[0]);
>
> And here is the result for the first attempt (using gets):
> {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
> REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
> MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS =>
> 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
> 'true', BLOCKCACHE => 'true'}
> {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
> REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
> MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS =>
> 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
> 'true', BLOCKCACHE => 'true'}
> Thu Jun 28 11:08:59 EDT 2012 Processing iteration 0...
> Time to read 1000 lines : 40177.0 mseconds (25 lines/seconds)
>
> 2nd: Time to read 1000 lines : 7621.0 mseconds (131 lines/seconds)
> 3rd: Time to read 1000 lines : 7659.0 mseconds (131 lines/seconds)
> After few more iterations (about 30), I'm between 200 and 250
> lines/seconds, like before.
>
> Regarding the filterList, I tried, but now I'm getting this error from
> the servers:
> org.apache.hadoop.hbase.regionserver.LeaseException:
> org.apache.hadoop.hbase.regionserver.LeaseException: lease
> '-6376193724680783311' does not exist
> Here is the code:
>        final int linesToRead = 10000;
>        System.out.println(new java.util.Date () + " Processing iteration " +
> iteration + "... ");
>        RandomRowFilter rrf = new RandomRowFilter();
>        KeyOnlyFilter kof = new KeyOnlyFilter();
>        Scan scan = new Scan();
>        List filters = new ArrayList();
>        filters.add(rrf);
>        filters.add(kof);
>        FilterList filterList = new FilterList(filters);
>        scan.setFilter(filterList);
>        scan.setBatch(Math.min(linesToRead, 1000));
>        scan.setCaching(Math.min(linesToRead, 1000));
>        ResultScanner scanner = table.getScanner(scan);
>        processed = 0;
>        long timeBefore = System.currentTimeMillis();
>        for (Result result : scanner.next(linesToRead))
>        {
>                System.out.println("Result: " + result); //
>                if (result != null)
>                        processed++;
>        }
>        scanner.close();
>
> It's failing when I try to do for (Result result :
> scanner.next(linesToRead)). I tried with linesToRead=1000, 100, 10 and
> 1 with the same result :(
>
> I will try to find the root cause, but if you have any hint, it's welcome.
>
> JM


Re: HMASTER -- odd messages ?

2012-07-03 Thread N Keywal
> Would Datanode issues impact the HMaster stability?

Yes and no. If you have only a few datanodes down, there should be no
issue. When there are enough missing datanodes to make some blocks not
available at all in the cluster, there are many tasks that cannot be
done anymore (to say the least, and depending on the blocks), for the
master or for the region server. In this case the ideal contract for
the master would be to survive, do the tasks it can, and log the
tasks it can't. Today, the contract for the master in such a
situation is more "do your best but don't corrupt anything". Note that
there is an autorestart option in the scripts in the planned 0.96, so
the master can be asked to restart automatically if not stopped
properly.

N.

On Tue, Jul 3, 2012 at 7:08 PM, Jay Wilson
 wrote:
> My HMaster and HRegionservers start and run for a while.
>
> Looking at the messages, there appear to be some Datanodes with some
> issues, HLogSplitter has some block issues, the HMaster appears to drop
> off the network (i know bad), then it comes back, and then the cluster
> runs for about 10 more minutes before everything aborts.
>
> Questions:
>   . Are HLogSplitter block error messages common?
>   . Would Datanode issues impact the HMaster stability?
>   . Other than an actual network issue is there anything that can cause
> a "No route to host"
>
> Thank you
> ---
> Jay Wilson
>
> 2012-07-03 09:04:58,266 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Split writers
> finished
> 2012-07-03 09:04:58,273 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Archived
> processed log
> hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-03,60020,1341328322971-splitting/devrackA-03%3A60020.1341328323503
> to
> hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.oldlogs/devrackA-03%3A60020.1341328323503
> 2012-07-03 09:04:58,275 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file
> splitting completed in 1052 ms for
> hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-03,60020,1341328322971-splitting
> 2012-07-03 09:04:58,277 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 1
> hlog(s) in
> hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting
> 2012-07-03 09:04:58,277 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread
> Thread[WriterThread-0,5,main]: starting
> 2012-07-03 09:04:58,277 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread
> Thread[WriterThread-1,5,main]: starting
> 2012-07-03 09:04:58,278 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread
> Thread[WriterThread-2,5,main]: starting
> 2012-07-03 09:04:58,278 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 1
> of 1:
> hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517,
> length=124
> 2012-07-03 09:04:58,278 INFO org.apache.hadoop.hbase.util.FSUtils:
> Recovering file
> hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517
> 2012-07-03 09:04:59,282 INFO org.apache.hadoop.hbase.util.FSUtils:
> Finished lease recover attempt for
> hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517
> 2012-07-03 09:04:59,339 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=0 entries
> from
> hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517
> 2012-07-03 09:04:59,341 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Waiting for split
> writer threads to finish
> 2012-07-03 09:04:59,342 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Split writers
> finished
> 2012-07-03 09:04:59,347 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Archived
> processed log
> hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting/devrackA-04%3A60020.1341328323517
> to
> hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.oldlogs/devrackA-04%3A60020.1341328323517
> 2012-07-03 09:04:59,349 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file
> splitting completed in 1073 ms for
> hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-04,60020,1341328322988-splitting
> 2012-07-03 09:04:59,352 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 1
> hlog(s) in
> hdfs://devrackA-00:8020/var/hbase-hadoop/hbase/.logs/devrackA-05,60020,1341328322976-splitting
> 2012-07-03 09:04:59,352 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread
> Thread[WriterThread-0,5,main]: starting
> 2012-07-03 09:04:59,352 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Writer thread
> Thread[WriterThread-1,5,mai

Re: Hmaster and HRegionServer disappearance reason to ask

2012-07-05 Thread N Keywal
Hi,

It's a ZK expiry on Sunday the 1st. The root cause could be the leap second bug?

N.

On Thu, Jul 5, 2012 at 8:59 AM, lztaomin  wrote:
> HI ALL
>   My HBase cluster has a total of 3 machines, with Hadoop and HBase running on the
> same machines and ZooKeeper managed by HBase itself. After 3 months of operation it
> reported the errors below, which caused the HMaster and HRegionServer processes to
> die. Please help me.
> Thanks
>
> The following is a log
>
> ABORTING region server serverName=datanode1,60020,1325326435553, 
> load=(requests=332, regions=188, usedHeap=2741, maxHeap=8165): 
> regionserver:60020-0x3488dec38a02b1 regionserver:60020-0x3488dec38a02b1 
> received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> 2012-07-01 13:45:38,707 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
> for datanode1,60020,1325326435553
> 2012-07-01 13:45:38,756 INFO 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 32 hlog(s) 
> in hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553
> 2012-07-01 13:45:38,764 INFO 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 1 of 
> 32: 
> hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352,
>  length=5671397
> 2012-07-01 13:45:38,764 INFO org.apache.hadoop.hbase.util.FSUtils: Recovering 
> file 
> hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352
> 2012-07-01 13:45:39,766 INFO org.apache.hadoop.hbase.util.FSUtils: Finished 
> lease recover attempt for 
> hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352
> 2012-07-01 13:45:39,880 INFO 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs 
> -- HDFS-200
> 2012-07-01 13:45:39,925 INFO 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs 
> -- HDFS-200
>
> ABORTING region server serverName=datanode2,60020,1325146199444, 
> load=(requests=614, regions=189, usedHeap=3662, maxHeap=8165): 
> regionserver:60020-0x3488dec38a0002 regionserver:60020-0x3488dec38a0002 
> received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> 2012-07-01 13:24:10,308 INFO org.apache.hadoop.hbase.util.FSUtils: Finished 
> lease recover attempt for 
> hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341075090535
> 2012-07-01 13:24:10,918 INFO 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 21 of 
> 32: 
> hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341078690560,
>  length=11778108
> 2012-07-01 13:24:29,809 INFO 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path 
> hdfs://namenode:9000/hbase/t_speakfor_relation_chapter/ffd2057b46da227e078c82ff43f0f9f2/recovered.edits/00660951991
>  (wrote 8178 edits in 403ms)
> 2012-07-01 13:24:29,809 INFO 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting 
> completed in -1268935 ms for 
> hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553
> 2012-07-01 13:24:29,824 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Received 
> exception accessing META during server shutdown of 
> datanode1,60020,1325326435553, retrying META read
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not 
> running, aborting
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2408)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionInfo(HRegionServer.java:1649)
> at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
> at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>
>
>
> lztaomin


Re: distributed log splitting aborted

2012-07-06 Thread N Keywal
Hi Cyril,

BTW, have you checked dfs.datanode.max.xcievers and ulimit -n? When
underconfigured they can cause this type of errors, even if it seems
it's not the case here...

Cheers,

N.

On Fri, Jul 6, 2012 at 11:31 AM, Cyril Scetbon  wrote:
> The file is now missing but I have tried with another one and you can see the 
> error :
>
> shell> hdfs dfs -ls 
> "/hbase/.logs/hb-d11,60020,1341097456894-splitting/hb-d11%2C60020%2C1341097456894.1341421613446"
> Found 1 items
> -rw-r--r--   4 hbase supergroup  0 2012-07-04 17:06 
> /hbase/.logs/hb-d11,60020,1341097456894-splitting/hb-d11%2C60020%2C1341097456894.1341421613446
> shell> hdfs dfs -cat 
> "/hbase/.logs/hb-d11,60020,1341097456894-splitting/hb-d11%2C60020%2C1341097456894.1341421613446"
> 12/07/06 09:27:51 WARN hdfs.DFSClient: Last block locations not available. 
> Datanodes might not have reported blocks completely. Will retry for 3 times
> 12/07/06 09:27:55 WARN hdfs.DFSClient: Last block locations not available. 
> Datanodes might not have reported blocks completely. Will retry for 2 times
> 12/07/06 09:27:59 WARN hdfs.DFSClient: Last block locations not available. 
> Datanodes might not have reported blocks completely. Will retry for 1 times
> cat: Could not obtain the last block locations.
>
> I'm using hadoop 2.0 from Cloudera package (CDH4) with hbase 0.92.1
>
> Regards
> Cyril SCETBON
>
> On Jul 5, 2012, at 11:44 PM, Jean-Daniel Cryans wrote:
>
>> Interesting... Can you read the file? Try a "hadoop dfs -cat" on it
>> and see if it goes to the end of it.
>>
>> It could also be useful to see a bigger portion of the master log, for
>> all I know maybe it handles it somehow and there's a problem
>> elsewhere.
>>
>> Finally, which Hadoop version are you using?
>>
>> Thx,
>>
>> J-D
>>
>> On Thu, Jul 5, 2012 at 1:58 PM, Cyril Scetbon  wrote:
>>> yes :
>>>
>>> /hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.134143064971
>>>
>>> I did a fsck and here is the report :
>>>
>>> Status: HEALTHY
>>> Total size:618827621255 B (Total open files size: 868 B)
>>> Total dirs:4801
>>> Total files:   2825 (Files currently being written: 42)
>>> Total blocks (validated):  11479 (avg. block size 53909541 B) (Total 
>>> open file blocks (not validated): 41)
>>> Minimally replicated blocks:   11479 (100.0 %)
>>> Over-replicated blocks:1 (0.008711561 %)
>>> Under-replicated blocks:   0 (0.0 %)
>>> Mis-replicated blocks: 0 (0.0 %)
>>> Default replication factor:4
>>> Average block replication: 4.873
>>> Corrupt blocks:0
>>> Missing replicas:  0 (0.0 %)
>>> Number of data-nodes:  12
>>> Number of racks:   1
>>> FSCK ended at Thu Jul 05 20:56:35 UTC 2012 in 795 milliseconds
>>>
>>>
>>> The filesystem under path '/hbase' is HEALTHY
>>>
>>> Cyril SCETBON
>>>
>>> Cyril SCETBON
>>>
>>> On Jul 5, 2012, at 7:59 PM, Jean-Daniel Cryans wrote:
>>>
 Does this file really exist in HDFS?

 hdfs://hb-zk1:54310/hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.1341430649711

 If so, did you run fsck in HDFS?

 It would be weird if HDFS doesn't report anything bad but somehow the
 clients (like HBase) can't read it.

 J-D

 On Thu, Jul 5, 2012 at 12:45 AM, Cyril Scetbon  
 wrote:
> Hi,
>
> I can nolonger start my cluster correctly and get messages like 
> http://pastebin.com/T56wrJxE (taken on one region server)
>
> I suppose Hbase is not done for being stopped but only for having some 
> nodes going down ??? HDFS is not complaining, it's only HBase that can't 
> start correctly :(
>
> I suppose some data has not been flushed and it's not really important 
> for me. Is there a way to fix theses errors even if I will lose data ?
>
> thanks
>
> Cyril SCETBON
>
>>>
>


Re: HBaseClient recovery from .META. server power down

2012-07-09 Thread N Keywal
Hi,

What you're describing -the 35 minutes recovery time- seems to match
the code. And it's a bug (still there on trunk). Could you please
create a jira for it? If you have the logs, even better.

Lowering the ipc.socket.timeout seems to be an acceptable partial
workaround. Setting it to 10s seems ok to me. Lower than this... I
don't know.
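
For illustration, a minimal sketch of overriding it on the client side only (the 10s value is just an example, not a recommendation):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class ClientTimeoutConfig {
  public static HTable openTable(String tableName) throws Exception {
    // Picks up hbase-site.xml from the classpath, then overrides the
    // socket connect timeout the HBase client uses towards region servers.
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("ipc.socket.timeout", 10000); // 10s instead of the 20s default
    return new HTable(conf, tableName);
  }
}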

N.


On Mon, Jul 9, 2012 at 6:16 PM, Suraj Varma  wrote:
> Hello:
> I'd like to get advice on the below strategy of decreasing the
> "ipc.socket.timeout" configuration on the HBase Client side ... has
> anyone tried this? Has anyone had any issues with configuring this
> lower than the default 20s?
>
> Thanks,
> --Suraj
>
> On Mon, Jul 2, 2012 at 5:51 PM, Suraj Varma  wrote:
>> By "power down" below, I mean powering down the host with the RS that
>> holds the .META. table. (So - essentially, the host IP is unreachable
>> and the RS/DN is gone.)
>>
>> Just wanted to clarify my below steps ...
>> --S
>>
>> On Mon, Jul 2, 2012 at 5:36 PM, Suraj Varma  wrote:
>>> Hello:
>>> We've been doing some failure scenario tests by powering down a .META.
>>> holding region server host and while the HBase cluster itself recovers
>>> and reassigns the META region and other regions (after we tweaked down
>>> the default timeouts), our client apps using HBaseClient take a long
>>> time to recover.
>>>
>>> hbase-0.90.6 / cdh3u4 / JDK 1.6.0_23
>>>
>>> Process:
>>> 1) Apply load via client app on HBase cluster for several minutes
>>> 2) Power down the region server holding the .META. server
>>> 3) Measure how long it takes for cluster to reassign META table and
>>> for client threads to re-lookup and re-orient to the lesser cluster
>>> (minus the RS and DN on that host).
>>>
>>> What we see:
>>> 1) Client threads spike up to maxThread size ... and take over 35 mins
>>> to recover (i.e. for the thread count to go back to normal) - no calls
>>> are being serviced - they are all just backed up on a synchronized
>>> method ...
>>>
>>> 2) Essentially, all the client app threads queue up behind the
>>> HBaseClient.setupIOStreams method in oahh.ipc.HBaseClient
>>> (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.90.2/org/apache/hadoop/hbase/ipc/HBaseClient.java#312).
>>> http://tinyurl.com/7js53dj
>>>
>>> After taking several thread dumps we found that the thread within this
>>> synchronized method was blocked on
>>>NetUtils.connect(this.socket, remoteId.getAddress(), 
>>> getSocketTimeout(conf));
>>>
>>> Essentially, the thread which got the lock would try to connect to the
>>> dead RS (till socket times out), retrying, and then the next thread
>>> gets in and so forth.
>>>
>>> Solution tested:
>>> ---
>>> So - the ipc.HBaseClient code shows ipc.socket.timeout default is 20s.
>>> We dropped this down to a low number (1000 ms,  100 ms, etc) and the
>>> recovery was much faster (in a couple of minutes).
>>>
>>> So - we're thinking of setting the HBase client side hbase-site.xml
>>> with an ipc.socket.timeout of 100ms. Looking at the code, it appears
>>> that this is only ever used during the initial "HConnection" setup via
>>> the NetUtils.connect and should only ever be used when connectivity to
>>> a region server is lost and needs to be re-established. i.e it does
>>> not affect the normal "RPC" actiivity as this is just the connect
>>> timeout.
>>>
>>> Am I reading the code right? Any thoughts on how whether this is too
>>> low for comfort? (Our internal tests did not show any errors during
>>> normal operation related to timeouts etc ... but, I just wanted to run
>>> this by the experts.).
>>>
>>> Note that this above timeout tweak is only on the HBase client side.
>>> Thanks,
>>> --Suraj


Re: HBaseClient recovery from .META. server power down

2012-07-10 Thread N Keywal
Thanks for the jira.
The client can be connected to multiple RS, depending on the rows it is
working on. So yes, it's initial, but it's a dynamic initial :-).
That said, there is a retry on error...

On Tue, Jul 10, 2012 at 6:46 PM, Suraj Varma  wrote:
> I will create a JIRA ticket ...
>
> The only side-effect I could think of is ... if a RS is having a GC of
> a few seconds, any _new_ client trying to connect would get connect
> failures. So ... the _initial_ connection to the RS is what would
> suffer from a super-low setting of the ipc.socket.timeout. This was my
> read of the code.
>
> So - was hoping to get a confirmation if this is the only side effect.
> Again - this is on the client side - I wouldn't risk doing this on the
> cluster side ...
> --Suraj
>
> On Mon, Jul 9, 2012 at 9:44 AM, N Keywal  wrote:
>> Hi,
>>
>> What you're describing -the 35 minutes recovery time- seems to match
>> the code. And it's a bug (still there on trunk). Could you please
>> create a jira for it? If you have the logs it even better.
>>
>> Lowering the ipc.socket.timeout seems to be an acceptable partial
>> workaround. Setting it to 10s seems ok to me. Lower than this... I
>> don't know.
>>
>> N.
>>
>>
>> On Mon, Jul 9, 2012 at 6:16 PM, Suraj Varma  wrote:
>>> Hello:
>>> I'd like to get advice on the below strategy of decreasing the
>>> "ipc.socket.timeout" configuration on the HBase Client side ... has
>>> anyone tried this? Has anyone had any issues with configuring this
>>> lower than the default 20s?
>>>
>>> Thanks,
>>> --Suraj
>>>
>>> On Mon, Jul 2, 2012 at 5:51 PM, Suraj Varma  wrote:
>>>> By "power down" below, I mean powering down the host with the RS that
>>>> holds the .META. table. (So - essentially, the host IP is unreachable
>>>> and the RS/DN is gone.)
>>>>
>>>> Just wanted to clarify my below steps ...
>>>> --S
>>>>
>>>> On Mon, Jul 2, 2012 at 5:36 PM, Suraj Varma  wrote:
>>>>> Hello:
>>>>> We've been doing some failure scenario tests by powering down a .META.
>>>>> holding region server host and while the HBase cluster itself recovers
>>>>> and reassigns the META region and other regions (after we tweaked down
>>>>> the default timeouts), our client apps using HBaseClient take a long
>>>>> time to recover.
>>>>>
>>>>> hbase-0.90.6 / cdh3u4 / JDK 1.6.0_23
>>>>>
>>>>> Process:
>>>>> 1) Apply load via client app on HBase cluster for several minutes
>>>>> 2) Power down the region server holding the .META. server
>>>>> 3) Measure how long it takes for cluster to reassign META table and
>>>>> for client threads to re-lookup and re-orient to the lesser cluster
>>>>> (minus the RS and DN on that host).
>>>>>
>>>>> What we see:
>>>>> 1) Client threads spike up to maxThread size ... and take over 35 mins
>>>>> to recover (i.e. for the thread count to go back to normal) - no calls
>>>>> are being serviced - they are all just backed up on a synchronized
>>>>> method ...
>>>>>
>>>>> 2) Essentially, all the client app threads queue up behind the
>>>>> HBaseClient.setupIOStreams method in oahh.ipc.HBaseClient
>>>>> (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.90.2/org/apache/hadoop/hbase/ipc/HBaseClient.java#312).
>>>>> http://tinyurl.com/7js53dj
>>>>>
>>>>> After taking several thread dumps we found that the thread within this
>>>>> synchronized method was blocked on
>>>>>NetUtils.connect(this.socket, remoteId.getAddress(), 
>>>>> getSocketTimeout(conf));
>>>>>
>>>>> Essentially, the thread which got the lock would try to connect to the
>>>>> dead RS (till socket times out), retrying, and then the next thread
>>>>> gets in and so forth.
>>>>>
>>>>> Solution tested:
>>>>> ---
>>>>> So - the ipc.HBaseClient code shows ipc.socket.timeout default is 20s.
>>>>> We dropped this down to a low number (1000 ms,  100 ms, etc) and the
>>>>> recovery was much faster (in a couple of minutes).
>>>>>
>>>>> So - we're thinking of setting the HBase client side hbase-site.xml
>>>>> with an ipc.socket.timeout of 100ms. Looking at the code, it appears
>>>>> that this is only ever used during the initial "HConnection" setup via
>>>>> the NetUtils.connect and should only ever be used when connectivity to
>>>>> a region server is lost and needs to be re-established. i.e it does
>>>>> not affect the normal "RPC" actiivity as this is just the connect
>>>>> timeout.
>>>>>
>>>>> Am I reading the code right? Any thoughts on how whether this is too
>>>>> low for comfort? (Our internal tests did not show any errors during
>>>>> normal operation related to timeouts etc ... but, I just wanted to run
>>>>> this by the experts.).
>>>>>
>>>>> Note that this above timeout tweak is only on the HBase client side.
>>>>> Thanks,
>>>>> --Suraj


Re: HBaseClient recovery from .META. server power down

2012-07-10 Thread N Keywal
I expect (without double checking the path in the code ;-) that the
code in HConnectionManager will retry.

On Tue, Jul 10, 2012 at 7:22 PM, Suraj Varma  wrote:
> Yes.
>
> On the maxRetries, though ... I saw the code
> (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.90.2/org/apache/hadoop/hbase/ipc/HBaseClient.java#677)
> show
> this.maxRetries = conf.getInt("hbase.ipc.client.connect.max.retries", 0);
>
> So - looks like by default, the maxRetries is set to 0? So ... there
> is effectively no retry (i.e. it is fail-fast)
> --Suraj
>
> On Tue, Jul 10, 2012 at 10:12 AM, N Keywal  wrote:
>> Thanks for the jira.
>> The client can be connected to multiple RS, depending on the rows is
>> working on. So yes it's initial, but it's a dynamic initial :-).
>> This said there is a retry on error...
>>
>> On Tue, Jul 10, 2012 at 6:46 PM, Suraj Varma  wrote:
>>> I will create a JIRA ticket ...
>>>
>>> The only side-effect I could think of is ... if a RS is having a GC of
>>> a few seconds, any _new_ client trying to connect would get connect
>>> failures. So ... the _initial_ connection to the RS is what would
>>> suffer from a super-low setting of the ipc.socket.timeout. This was my
>>> read of the code.
>>>
>>> So - was hoping to get a confirmation if this is the only side effect.
>>> Again - this is on the client side - I wouldn't risk doing this on the
>>> cluster side ...
>>> --Suraj
>>>
>>> On Mon, Jul 9, 2012 at 9:44 AM, N Keywal  wrote:
>>>> Hi,
>>>>
>>>> What you're describing -the 35 minutes recovery time- seems to match
>>>> the code. And it's a bug (still there on trunk). Could you please
>>>> create a jira for it? If you have the logs it even better.
>>>>
>>>> Lowering the ipc.socket.timeout seems to be an acceptable partial
>>>> workaround. Setting it to 10s seems ok to me. Lower than this... I
>>>> don't know.
>>>>
>>>> N.
>>>>
>>>>
>>>> On Mon, Jul 9, 2012 at 6:16 PM, Suraj Varma  wrote:
>>>>> Hello:
>>>>> I'd like to get advice on the below strategy of decreasing the
>>>>> "ipc.socket.timeout" configuration on the HBase Client side ... has
>>>>> anyone tried this? Has anyone had any issues with configuring this
>>>>> lower than the default 20s?
>>>>>
>>>>> Thanks,
>>>>> --Suraj
>>>>>
>>>>> On Mon, Jul 2, 2012 at 5:51 PM, Suraj Varma  wrote:
>>>>>> By "power down" below, I mean powering down the host with the RS that
>>>>>> holds the .META. table. (So - essentially, the host IP is unreachable
>>>>>> and the RS/DN is gone.)
>>>>>>
>>>>>> Just wanted to clarify my below steps ...
>>>>>> --S
>>>>>>
>>>>>> On Mon, Jul 2, 2012 at 5:36 PM, Suraj Varma  wrote:
>>>>>>> Hello:
>>>>>>> We've been doing some failure scenario tests by powering down a .META.
>>>>>>> holding region server host and while the HBase cluster itself recovers
>>>>>>> and reassigns the META region and other regions (after we tweaked down
>>>>>>> the default timeouts), our client apps using HBaseClient take a long
>>>>>>> time to recover.
>>>>>>>
>>>>>>> hbase-0.90.6 / cdh3u4 / JDK 1.6.0_23
>>>>>>>
>>>>>>> Process:
>>>>>>> 1) Apply load via client app on HBase cluster for several minutes
>>>>>>> 2) Power down the region server holding the .META. server
>>>>>>> 3) Measure how long it takes for cluster to reassign META table and
>>>>>>> for client threads to re-lookup and re-orient to the lesser cluster
>>>>>>> (minus the RS and DN on that host).
>>>>>>>
>>>>>>> What we see:
>>>>>>> 1) Client threads spike up to maxThread size ... and take over 35 mins
>>>>>>> to recover (i.e. for the thread count to go back to normal) - no calls
>>>>>>> are being serviced - they are all just backed up on a synchronized
>>>>>>> method ...
>>>>>>>
>>>>>>> 2) Essentially, all the client app threads queue up behind the
>>>>>>&

Re: Maximum number of tables ?

2012-07-13 Thread N Keywal
Hi,

There is no real limit as far as I know. As you will have one region
per table (at least :-), the number of regions will be something to
monitor carefully if you need thousands of tables. See
http://hbase.apache.org/book.html#arch.regions.size.

Don't forget that you can add as many columns as you want, and that an
empty cell costs nothing. For example, a class hierarchy is often
mapped to multiple tables in an RDBMS, while in HBase having a single
table for the same hierarchy makes much more sense. Moreover, there are
no transactions between tables, so sometimes a 'UML composition' will
go into a single table. And so on.
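
For illustration, a minimal sketch (table, family and class names are made up) of mapping two subclasses to a single table with sparse columns:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleTableHierarchy {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "vehicles"); // one table for the whole hierarchy

    // A "Car" row: only car-specific columns are written.
    Put car = new Put(Bytes.toBytes("car-001"));
    car.add(Bytes.toBytes("attr"), Bytes.toBytes("type"), Bytes.toBytes("car"));
    car.add(Bytes.toBytes("attr"), Bytes.toBytes("doors"), Bytes.toBytes("5"));

    // A "Truck" row: different columns, same table; the unused
    // "doors" cell simply does not exist and costs nothing.
    Put truck = new Put(Bytes.toBytes("truck-001"));
    truck.add(Bytes.toBytes("attr"), Bytes.toBytes("type"), Bytes.toBytes("truck"));
    truck.add(Bytes.toBytes("attr"), Bytes.toBytes("payload_kg"), Bytes.toBytes("12000"));

    table.put(car);
    table.put(truck);
    table.close();
  }
}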

N.

On Fri, Jul 13, 2012 at 9:04 AM, Adrien Mogenet
 wrote:
> Hi there,
>
> I read some good practices about number of columns / column families, but
> nothing about the number of tables.
> What if I need to spread my data among hundred or thousand (big) tables ?
> What should I care about ? I guess I should keep a tight number of
> storeFiles per RegionServer ?
>
> --
> Adrien Mogenet
> http://www.mogenet.me


Re: Lowering HDFS socket timeouts

2012-07-18 Thread N Keywal
Hi Bryan,

It's a difficult question, because dfs.socket.timeout is used all over
the place in hdfs. I'm currently documenting this.
Especially:
- it's used for connections between datanodes, and not only for
connections between hdfs clients & hdfs datanodes.
- It's also used for the two types of datanode connections (ports
being 50010 & 50020 by default).
- It's used as a connect timeout, but also as a read timeout
(the socket is connected, but the application does not write for a while).
- It's used with various extensions, so when you're seeing values like
69000 or 66000 it's often the same setting: timeout + 3s (hardcoded) *
#replicas.

For a single datanode issue, with everything going well, it will make
the cluster much more reactive: hbase will go to another node
immediately instead of waiting. But it will make it much more
sensitive to gc and network issues. If you have a major hardware
issue, something like 10% of your cluster going down, this setting
will multiply the number of retries, and will add a lot of workload to
your already damaged cluster, and this could make things worse.

That said, I think we will need to make it shorter sooner or later, so
if you do it on your cluster, it will be helpful...
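
For illustration, a minimal sketch (the path and the 10s value are just examples) of what a lowered client-side dfs.socket.timeout looks like for an HDFS client such as the one embedded in HBase:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsTimeoutSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Client-side socket timeout used by the DFSClient when talking to
    // datanodes (connect and read); the default is 60000 ms.
    conf.setInt("dfs.socket.timeout", 10000);

    FileSystem fs = FileSystem.get(conf);
    // With the lower timeout, a read against an unreachable datanode should
    // fail over to another replica after ~10s instead of ~60s.
    FSDataInputStream in = fs.open(new Path("/hbase/hbase.version")); // example path
    in.close();
    fs.close();
  }
}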

N.

On Tue, Jul 17, 2012 at 7:11 PM, Bryan Beaudreault
 wrote:
> Today I needed to restart one of my region servers, and did so without 
> gracefully shutting down the datanode.  For the next 1-2 minutes we had a 
> bunch of failed queries from various other region servers trying to access 
> that datanode.  Looking at the logs, I saw that they were all socket timeouts 
> after 60000 milliseconds.
>
> We use HBase mostly as an online datastore, with various APIs powering 
> various web apps and external consumers.  Writes come from both the API in 
> some cases, but we have continuous hadoop jobs feeding data in as well.
>
> Since we have web app consumers, this 60 second timeout seems unreasonably 
> long.  If a datanode goes down, ideally the impact would be much smaller than 
> that.  I want to lower the dfs.socket.timeout to something like 5-10 seconds, 
> but do not know the implications of this.
>
> In googling I did not find much precedent for this, but I did find some 
> people talking about upping the timeout to much longer than 60 seconds.  Is 
> it generally safe to lower this timeout dramatically if you want faster 
> failures? Are there any downsides to this?
>
> Thanks
>
> --
> Bryan Beaudreault
>


Re: Lowering HDFS socket timeouts

2012-07-18 Thread N Keywal
I don't know. The question is mainly about the read timeout: you will
connect to the ipc.Client with a read timeout of, let's say, 10s. Server
side the implementation may do something with another server, with a
connect & read timeout of 60s. So if you have:
HBase --> live DN --> dead DN

The timeout will be triggered in HBase while the live DN is still
waiting for the answer from the dead DN. It could even retry on
another node.
On paper, this should work, as this could happen in real life without
changing the dfs timeouts. And maybe this case does not even exist.
But as the extension mechanism is designed to add some extra seconds,
it could exist for this reason or something similar. Worth asking on the
hdfs mailing list, I would say.

On Wed, Jul 18, 2012 at 4:28 PM, Bryan Beaudreault
 wrote:
> Thanks for the response, N.  I could be wrong here, but since this problem is 
> in the HDFS client code, couldn't I set this dfs.socket.timeout in my 
> hbase-site.xml and it would only affect hbase connections to hdfs?  I.e. we 
> wouldn't have to worry about affecting connections between datanodes, etc.
>
> --
> Bryan Beaudreault
>
>
> On Wednesday, July 18, 2012 at 4:38 AM, N Keywal wrote:
>
>> Hi Bryan,
>>
>> It's a difficult question, because dfs.socket.timeout is used all over
>> the place in hdfs. I'm currently documenting this.
>> Especially:
>> - it's used for connections between datanodes, and not only for
>> connections between hdfs clients & hdfs datanodes.
>> - It's also used for the two types of datanodes connection (ports
>> beeing 50010 & 50020 by default).
>> - It's used as a connect timeout, but as well as a read timeout
>> (socket is connected, but the application does not write for a while)
>> - It's used with various extensions, so when your seeing stuff like
>> 69000 or 66000 it's often the same setting timeout + 3s (hardcoded) *
>> #replica
>>
>> For a single datanode issue, with everything going well, it will make
>> the cluster much more reactive: hbase will go to another node
>> immediately instead of waiting. But it will make it much more
>> sensitive to gc and network issues. If you have a major hardware
>> issue, something like 10% of your cluster going down, this setting
>> will multiply the number of retries, and will add a lot of workload to
>> your already damaged cluster, and this could make the things worse.
>>
>> This said, I think we will need to make it shorter sooner or later, so
>> if you do it on your cluster, it will be helpful...
>>
>> N.
>>
>> On Tue, Jul 17, 2012 at 7:11 PM, Bryan Beaudreault
>> mailto:bbeaudrea...@gmail.com)> wrote:
>> > Today I needed to restart one of my region servers, and did so without 
>> > gracefully shutting down the datanode. For the next 1-2 minutes we had a 
>> > bunch of failed queries from various other region servers trying to access 
>> > that datanode. Looking at the logs, I saw that they were all socket 
>> > timeouts after 6 milliseconds.
>> >
>> > We use HBase mostly as an online datastore, with various APIs powering 
>> > various web apps and external consumers. Writes come from both the API in 
>> > some cases, but we have continuous hadoop jobs feeding data in as well.
>> >
>> > Since we have web app consumers, this 60 second timeout seems unreasonably 
>> > long. If a datanode goes down, ideally the impact would be much smaller 
>> > than that. I want to lower the dfs.socket.timeout to something like 5-10 
>> > seconds, but do not know the implications of this.
>> >
>> > In googling I did not find much precedent for this, but I did find some 
>> > people talking about upping the timeout to much longer than 60 seconds. Is 
>> > it generally safe to lower this timeout dramatically if you want faster 
>> > failures? Are there any downsides to this?
>> >
>> > Thanks
>> >
>> > --
>> > Bryan Beaudreault
>> >
>>
>>
>>
>
>


Re: Region Server failure due to remote data node errors

2012-07-30 Thread N Keywal
Hi Jay,

Yes, the whole log would be interesting, plus the logs of the datanode
on the same box as the dead RS.
What's your hbase & hdfs versions?

The RS should be immune to hdfs errors. There are known issues (see
HDFS-3701), but it seems you have something different...
This:
> java.nio.channels.SocketChannel[connected local=/10.128.204.225:52949
> remote=/10.128.204.225:50010]

Seems to say that the error was between the datanode on the same box as the RS?

Nicolas

On Mon, Jul 30, 2012 at 6:43 PM, Jay T  wrote:
>  A couple of our region servers (in a 16 node cluster) crashed due to
> underlying Data Node errors. I am trying to understand how errors on remote
> data nodes impact other region server processes.
>
> *To briefly describe what happened:
> *
> 1) Cluster was in operation. All 16 nodes were up, reads and writes were
> happening extensively.
> 2) Nodes 7 and 8 were shutdown for maintenance. (No graceful shutdown DN and
> RS service were running and the power was just pulled out)
> 3) Nodes 2 and 5 flushed and DFS client started reporting errors. From the
> log it seems like DFS blocks were being replicated to the nodes that were
> shutdown (7 and 8) and since replication could not go through successfully
> DFS client raised errors on 2 and 5 and eventually the RS itself died.
>
> The question I am trying to get an answer for is : Is a Region Server immune
> from remote data node errors (that are part of the replication pipeline) or
> not. ?
> *
> Part of the Region Server Log:* (Node 5)
>
> 2012-07-26 18:53:15,245 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream 10.128.204.225:50010 java.io.IOException: Bad
> connect ack with firstBadLink
> as 10.128.204.228:50010
> 2012-07-26 18:53:15,245 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_-316956372096761177_489798
> 2012-07-26 18:53:15,246 INFO org.apache.hadoop.hdfs.DFSClient: Excluding
> datanode 10.128.204.228:50010
> 2012-07-26 18:53:16,903 INFO org.apache.hadoop.hbase.regionserver.StoreFile:
> NO General Bloom and NO DeleteFamily was added to HFile
> (hdfs://Node101:8020/hbase/table/754de060
> c9d96286e0c8cd200716ffde/.tmp/26f5cd1fb2cb4547972a31073d2da124)
> 2012-07-26 18:53:16,903 INFO org.apache.hadoop.hbase.regionserver.Store:
> Flushed , sequenceid=4046717645, memsize=256.5m, into tmp file
> hdfs://Node101:8020/hbase/table/754de0
> 60c9d96286e0c8cd200716ffde/.tmp/26f5cd1fb2cb4547972a31073d2da1242012-07-26
> 18:53:16,907 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming
> flushed file at
> hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/.tmp/26f5c
> d1fb2cb4547972a31073d2da124 to
> hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/CF/26f5cd1fb2cb4547972a31073d2da124
> 2012-07-26 18:53:16,921 INFO org.apache.hadoop.hbase.regionserver.Store:
> Added
> hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/CF/26f5cd1fb2cb4547972a31073d2d
> a124, entries=1137956, sequenceid=4046717645, filesize=13.2m2012-07-26
> 18:53:32,048 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception:
> java.net.SocketTimeoutException: 15000 millis timeout while waiting for
> channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/10.128.204.225:52949
> remote=/10.128.204.225:50010]
> at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
> at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2857)
> 2012-07-26 18:53:32,049 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_5116092240243398556_489796 bad datanode[0]
> 10.128.204.225:50010
> 2012-07-26 18:53:32,049 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_5116092240243398556_489796 in pipeline
> 10.128.204.225:50010, 10.128.204.221:50010, 10.128.204.227:50010: bad
> datanode 10.128.204.225:50010
>
> I can pastebin the entire log but this is when things started going wrong
> for Node 5 and eventually shutdown hook for RS started and the RS was
> shutdown.
>
> Any help in troubleshooting this is greatly appreciated.
>
> Thanks,
> Jay


Re: Region Server failure due to remote data node errors

2012-07-30 Thread N Keywal
Hi Jay,

As you already said, the pipeline for blk_5116092240243398556_489796
contains only dead nodes, and this is likely the cause of the wrong
behavior.
This block is used by an hlog file, created just before the error. I
don't get why there are 3 nodes in the pipeline, I would expect only
2. Do you have a specific setting for dfs.replication?

Log files are special: HBase checks that the replication really
occurs by checking the replication count, and closes them if it's not
OK. But it seems that all the nodes are dead from the start, and this
could be ill-managed in HBase. Reproducing this may be difficult, but
should be possible.

Then the region server is stopped, but I didn't see in the logs which
path led to this, so it's surprising to say the least.

After this, all the 'already closed' errors are not that critical
imho: the close will fail since hdfs closes the file when it cannot
recover from an error.

I guess your question is still open. But from what I see it could be
an HBase bug.

I'd be interested to know the conclusions of your analysis...

Nicolas

On Mon, Jul 30, 2012 at 8:01 PM, Jay T  wrote:
>  Thanks for the quick reply Nicolas. We are using HBase 0.94 on Hadoop
> 1.0.3.
>
> I have uploaded the logs here:
>
> Region Server  log: http://pastebin.com/QEQ22UnU
> Data Node log:  http://pastebin.com/DF0JNL8K
>
> Appreciate your help in figuring this out.
>
> Thanks,
> Jay
>
>
>
>
> On 7/30/12 1:02 PM, N Keywal wrote:
>>
>> Hi Jay,
>>
>> Yes, the whole log would be interesting, plus the logs of the datanode
>> on the same box as the dead RS.
>> What are your hbase & hdfs versions?
>>
>>
>> The RS should be immune to hdfs errors. There are known issues (see
>> HDFS-3701), but it seems you have something different...
>> This:
>>>
>>> java.nio.channels.SocketChannel[connected local=/10.128.204.225:52949
>>> remote=/10.128.204.225:50010]
>>
>> Seems to say that the error was between the datanode on the same box as
>> the RS?
>>
>> Nicolas
>>
>> On Mon, Jul 30, 2012 at 6:43 PM, Jay T  wrote:
>>>
>>>   A couple of our region servers (in a 16 node cluster) crashed due to
>>> underlying Data Node errors. I am trying to understand how errors on
>>> remote
>>> data nodes impact other region server processes.
>>>
>>> *To briefly describe what happened:
>>> *
>>> 1) Cluster was in operation. All 16 nodes were up, reads and writes were
>>> happening extensively.
>>> 2) Nodes 7 and 8 were shutdown for maintenance. (No graceful shutdown DN
>>> and
>>> RS service were running and the power was just pulled out)
>>> 3) Nodes 2 and 5 flushed and DFS client started reporting errors. From
>>> the
>>> log it seems like DFS blocks were being replicated to the nodes that were
>>> shutdown (7 and 8) and since replication could not go through
>>> successfully
>>> DFS client raised errors on 2 and 5 and eventually the RS itself died.
>>>
>>> The question I am trying to get an answer for is : Is a Region Server
>>> immune
>>> from remote data node errors (that are part of the replication pipeline)
>>> or
>>> not. ?
>>> *
>>> Part of the Region Server Log:* (Node 5)
>>>
>>> 2012-07-26 18:53:15,245 INFO org.apache.hadoop.hdfs.DFSClient: Exception
>>> in
>>> createBlockOutputStream 10.128.204.225:50010 java.io.IOException: Bad
>>> connect ack with firstBadLink
>>> as 10.128.204.228:50010
>>> 2012-07-26 18:53:15,245 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
>>> block blk_-316956372096761177_489798
>>> 2012-07-26 18:53:15,246 INFO org.apache.hadoop.hdfs.DFSClient: Excluding
>>> datanode 10.128.204.228:50010
>>> 2012-07-26 18:53:16,903 INFO
>>> org.apache.hadoop.hbase.regionserver.StoreFile:
>>> NO General Bloom and NO DeleteFamily was added to HFile
>>> (hdfs://Node101:8020/hbase/table/754de060
>>> c9d96286e0c8cd200716ffde/.tmp/26f5cd1fb2cb4547972a31073d2da124)
>>> 2012-07-26 18:53:16,903 INFO org.apache.hadoop.hbase.regionserver.Store:
>>> Flushed , sequenceid=4046717645, memsize=256.5m, into tmp file
>>> hdfs://Node101:8020/hbase/table/754de0
>>>
>>> 60c9d96286e0c8cd200716ffde/.tmp/26f5cd1fb2cb4547972a31073d2da1242012-07-26
>>> 18:53:16,907 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming
>>> flushed file at
>>>
>>> hdfs://Node101:8020/hbase/table/754de060c9d96286e0c8cd200716ffde/.tmp/26f5c
>>> d1fb2cb4547972a31073d2da1

Re: hbase can't start:KeeperErrorCode = NoNode for /hbase

2012-08-02 Thread N Keywal
Hi,

The issue is in ZooKeeper, not directly in HBase. It seems its data is
corrupted, so it cannot start. You can point ZooKeeper to another
data directory to make it start.

N.


On Thu, Aug 2, 2012 at 11:11 AM, abloz...@gmail.com  wrote:
> I even moved /hbase to /hbase2, created a new dir /hbase1, and modified
> hbase-site.xml to:
> <property>
> <name>hbase.rootdir</name>
> <value>hdfs://Hadoop48:54310/hbase1</value>
> </property>
> <property>
> <name>zookeeper.znode.parent</name>
> <value>/hbase1</value>
> </property>
>
> But the error message is still: KeeperErrorCode = NoNode for /hbase
>
> Can anybody give any help?
> Thanks!
>
> Andy zhou
>
> 2012/8/2 abloz...@gmail.com 
>
>> hi all,
>> After I killed all java processes, I can't restart hbase; it reports:
>>
>> Hadoop46: starting zookeeper, logging to
>> /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop46.out
>> Hadoop47: starting zookeeper, logging to
>> /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop47.out
>> Hadoop48: starting zookeeper, logging to
>> /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop48.out
>> Hadoop46: java.lang.RuntimeException: Unable to run quorum server
>> Hadoop46:   at
>> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
>> Hadoop46:   at
>> org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
>> Hadoop46:   at
>> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
>> Hadoop46:   at
>> org.apache.hadoop.hbase.zookeeper.HQuorumPeer.runZKServer(HQuorumPeer.java:74)
>> Hadoop46:   at
>> org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:64)
>> Hadoop46: Caused by: java.io.IOException: Failed to process transaction
>> type: 1 error: KeeperErrorCode = NoNode for /hbase
>> Hadoop46:   at
>> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>> Hadoop46:   at
>> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>> Hadoop46:   at
>> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>> Hadoop47: java.lang.RuntimeException: Unable to run quorum server
>> Hadoop47:   at
>> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
>> Hadoop47:   at
>> org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
>> Hadoop47:   at
>> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
>>  Hadoop47:   at
>> org.apache.hadoop.hbase.zookeeper.HQuorumPeer.runZKServer(HQuorumPeer.java:74)
>> Hadoop47:   at
>> org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:64)
>> Hadoop47: Caused by: java.io.IOException: Failed to process transaction
>> type: 1 error: KeeperErrorCode = NoNode for /hbase
>> Hadoop47:   at
>> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>> Hadoop47:   at
>> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>> Hadoop47:   at
>> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>>
>> while Hadoop48 is the HMaster.
>> but hdfs://xxx/hbase exists.
>> [zhouhh@Hadoop47 ~]$ hadoop fs -ls /hbase
>> Found 113 items
>> drwxr-xr-x   - zhouhh supergroup  0 2012-07-03 19:24 /hbase/-ROOT-
>> drwxr-xr-x   - zhouhh supergroup  0 2012-07-03 19:24 /hbase/.META.
>> ...
>>
>> So what's the problem?
>> Thanks!
>>
>> andy
>>


Re: HBaseTestingUtility on windows

2012-08-02 Thread N Keywal
Hi Mohit,

For simple cases, it works for me for hbase 0.94 at least. But I'm not
sure it works for all features. I've never tried to run hbase unit
tests on windows for example.

N.

On Fri, Aug 3, 2012 at 6:01 AM, Mohit Anchlia  wrote:
> I am trying to run mini cluster using HBaseTestingUtility Class from hbase
> tests on windows, but I get "bash command error". Is it not possible to run
> this utility class on windows?
>
> I followed this example:
>
> http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/


Re: after region split, client didnt get result after timeout setting,so the cachedLocation didnot update, client still query the old region id

2012-08-10 Thread N Keywal
Hi,

What are your queries exactly? What's the HBase version?

The mechanism is:
- There is a location cache, per HConnection, on the client
- The client first tries the region server in its cache
- if it fails, the client removes this entry from the cache and enters
the retry loop
- there is a limited amount of retries and a sleep between the retries
- most of the time, the client will connect to meta to get the new location

When there are multiple queries, before HBASE-5924, the errors are
analyzed only after the other region servers have returned as well. It
could be an explanation. HBASE-5877 exists as well, but only for
moves, not for splits...
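To make the mechanism above more concrete, here is a minimal client-side
sketch (values are examples only, not recommendations; the table and row key
are taken from the region name in your log purely for illustration) showing
the two settings that drive the retry loop, hbase.client.retries.number and
hbase.client.pause. Widening them lets the client keep retrying longer than
a 1.5 s task timeout would allow:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RetryTuningSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Number of retries before the client gives up (example value).
    conf.setInt("hbase.client.retries.number", 10);
    // Base pause, in ms, between two retries (example value).
    conf.setLong("hbase.client.pause", 500);

    HTable table = new HTable(conf, "test_list");
    try {
      // On a NotServingRegionException, the client drops the cached
      // location, looks up the new region and retries after the pause.
      Result r = table.get(new Get(Bytes.toBytes("zWPpyme")));  // example row
      System.out.println("columns found: " + r.size());
    } finally {
      table.close();
    }
  }
}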

Cheers,

N.


On Fri, Aug 10, 2012 at 11:26 AM, deanforwever2010
 wrote:
> on the region server's log :2012-08-10 11:49:50,796 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> NotServingRegionException; Region is not online:
> test_list,zWPpyme,1342510667492.91486e7fa0ac39048276848a2618479b.
>
> after a region split, the client didn't get a result within the timeout
> setting (1.5 seconds), then the task was canceled by my program, so the
> HConnectionManager didn't delete the cachedLocation;
> the client still queries the old region id, which no longer exists
>
> What's more, some of my processes updated the region location info and some
> did not. I'm sure the network is fine;
>
> how to fix the problem? why does it take so long to detect the new
> regions?


Re: after region split, client didnt get result after timeout setting,so the cachedLocation didnot update, client still query the old region id

2012-08-10 Thread N Keywal
If it's a single row, I would expect the server to return the error
immediately. Then you will have the sleep I was mentioning previously,
but the cache should be cleaned before the sleep...

On Fri, Aug 10, 2012 at 1:32 PM, deanforwever2010
 wrote:
> hi, Keywal
> my hbase version is 0.94,
> my query is just to get limited columns of a row,
> I make a callable task of 1.5 seconds, so  maybe it didnot fail but
> canceled by my process,so the region cache didnot clear after many requests
> happened.
> my question is why should it take so long time for failure? and it behave
> different between my servers, and there is no problem with network.
>
> 2012/8/10 N Keywal 
>
>> Hi,
>>
>> What are your queries exactly? What's the HBase version?
>>
>> The mechanism is:
>> - There is a location cache, per HConnection, on the client
>> - The client first tries the region server in its cache
>> - if it fails, the client removes this entry from the cache and enters
>> the retry loop
>> - there is a limited amount of retries and a sleep between the retries
>> - most of the times, the client will connect to meta to get the new
>> location
>>
>> When there are multiple queries, before HBASE-5924, the errors will be
>> analyzed after the other regions servers has returned as well. It
>> could be an explanation. HBASE-5877 exists as well, but only for
>> moves, not for splits...
>>
>> Cheers,
>>
>> N.
>>
>>
>> On Fri, Aug 10, 2012 at 11:26 AM, deanforwever2010
>>  wrote:
>> > on the region server's log :2012-08-10 11:49:50,796 DEBUG
>> > org.apache.hadoop.hbase.regionserver.HRegionServer:
>> > NotServingRegionException; Region is not online:
>> > test_list,zWPpyme,1342510667492.91486e7fa0ac39048276848a2618479b.
>> >
>> > after region split, client didnt get result after timeout setting(1.5
>> > second),then the task is canceled by my program, so the
>> HConnectionManager
>> > didnt delete the cachedLocation;
>> > the client  still query the old region id which is no more exists
>> >
>> > And more, part of my processes updated the region location info, part
>> > not.I'm sure the network is fine;
>> >
>> > how to fix the problem?why does it need so long time to detect the new
>> > regions?
>>


Re: Problem - Bringing up the HBase cluster

2012-08-22 Thread N Keywal
Hi,

Please use the user mailing list (added as a destination) for this type of
question instead of the dev list (now in bcc).

It's a little bit strange to use the fully distributed mode with a
single region server. Is the pseudo-distributed mode working?
Check the number of datanodes vs. dfs.replication (default 3). If you
have fewer datanodes than the dfs.replication value, it won't work
properly.
Check as well that the region server is connected to the master.

Cheers,



On Wed, Aug 22, 2012 at 3:16 AM, kbmkumar  wrote:
> Hi,
>   I am trying to bring up an HBase cluster with 1 master and 1 region
> server. I am using
> Hadoop 1.0.3
> Hbase 0.94.1
>
> Starting hdfs was straightforward and I could see the namenode up and
> running successfully. But the problem is with HBase. I followed all the
> guidelines given in the Hbase cluster setup (fully distributed mode) and ran
> the start-hbase.sh
>
> It started the Master, Region server and zookeeper (in the region server) as
> per my configuration. But i am not sure the master is fully functional. When
> i try to connect hbase shell and create table, it errors out saying
> PleaseHoldException- Master is initializing
>
> In UI HMaster status shows like this *Assigning META region (since 18mins,
> 39sec ago)*
>
> and i see the Hmaster logs are flowing with the following debug prints, the
> log file is full of below prints,
> *
> 2012-08-22 01:08:19,637 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> Looked up root region location,
> connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd;
> serverName=hadoop-datanode1,60020,1345596463277
> 2012-08-22 01:08:19,638 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> Looked up root region location,
> connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd;
> serverName=hadoop-datanode1,60020,1345596463277
> 2012-08-22 01:08:19,639 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> Looked up root region location,
> connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd;
> serverName=hadoop-datanode1,60020,1345596463277*
>
> Please help me in debugging this.
>
>
>
>
>
> --
> View this message in context: 
> http://apache-hbase.679495.n3.nabble.com/Problem-Bringing-up-the-HBase-cluster-tp4019948.html
> Sent from the HBase - Developer mailing list archive at Nabble.com.


Re: Hbase Shell: UnsatisfiedLinkError

2012-08-22 Thread N Keywal
Hi,

Well, the first steps would be:
1) Use the JDK 1.6 from Oracle. 1.7 is not supported yet.
2) Check the content of
http://hbase.apache.org/book.html#configuration to set up your first
cluster. Worth reading the whole guide imho.
3) Start with the latest released version (0.94), except if you have a
good reason to use 0.90 of course.
4) Use the user mailing list for this type of question and not the
dev one. :-) I kept dev in bcc.

Good luck,

N.

On Wed, Aug 22, 2012 at 12:25 PM, o brbrs  wrote:
> Hi,
> I'm new at hbase. I installed Hadoop 1.0.3 and Hbase 0.90.6 with Java 1.7.0
> on Ubuntu 12.04.
> When I run the "hbase shell" command, this error occurs:
> $ /usr/local/hbase/bin/hbase shell
> java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: Could not
> locate stub library in jar file.  Tried [jni/ı386-Linux/libjffi-1.0.so,
> /jni/ı386-Linux/libjffi-1.0.so]
> at
> com.kenai.jffi.Foreign$InValidInstanceHolder.getForeign(Foreign.java:90)
> at com.kenai.jffi.Foreign.getInstance(Foreign.java:95)
> at com.kenai.jffi.Library.openLibrary(Library.java:151)
> at com.kenai.jffi.Library.getCachedInstance(Library.java:125)
> at
> com.kenai.jaffl.provider.jffi.Library.loadNativeLibraries(Library.java:66)
> at
> com.kenai.jaffl.provider.jffi.Library.getNativeLibraries(Library.java:56)
> at
> com.kenai.jaffl.provider.jffi.Library.getSymbolAddress(Library.java:35)
> at
> com.kenai.jaffl.provider.jffi.Library.findSymbolAddress(Library.java:45)
> at
> com.kenai.jaffl.provider.jffi.AsmLibraryLoader.generateInterfaceImpl(AsmLibraryLoader.java:188)
> at
> com.kenai.jaffl.provider.jffi.AsmLibraryLoader.loadLibrary(AsmLibraryLoader.java:110)
> .
>
> What is the reason of this error? Please help.
>
> Thanks...
> --
> ...
> Obrbrs


Re: Problem - Bringing up the HBase cluster

2012-08-22 Thread N Keywal
If you have a single datanode with a replication of two, it
(basically) won't work, as it will try to replicate the blocks on two
datanodes while there is only one available. Note that I'm speaking
about datanodes (i.e. hdfs) and not region servers (i.e. hbase).

Pastebin the full logs of the region server; maybe someone will
have an idea of the root issue.

But I think it's safer to start with the pseudo-distributed mode: it's
easier to set up and it's documented. A distributed config with a
single node is not really standard; it's better to start with the
easiest path imho.

On Wed, Aug 22, 2012 at 5:43 PM, Jothikumar Ekanath  wrote:
> Hi,
>  Thanks for the response, sorry i put this email in the dev space.
> My data replication is 2. and yes the region and master server connectivity
> is good
>
> Initially i started with 4 data nodes and 1 master, i faced the same
> problem. So i reduced the data nodes to 1 and wanted to test it. I see the
> same issue. I haven't tested the pseudo distribution mode, i can test that.
> But my objective is to test the full distributed mode and do some testing. I
> can send my configuration for review. Please let me know if i am missing any
> basic setup configuration.
>
>
> On Wed, Aug 22, 2012 at 12:00 AM, N Keywal  wrote:
>>
>> Hi,
>>
>> Please use the user mailing list (added at dest) for this type of
>> questions instead of the dev list (now in bcc).
>>
>> It's a little bit strange to use the full distributed mode with a
>> single region server. Is the Pseudo-distributed mode working?
>> Check the number of datanodes vs. dfs.replication (default 3). If you
>> have less datanodes then dfs.replication value, it won't work
>> properly.
>> Check as well that the region server is connected to the master.
>>
>> Cheers,
>>
>>
>>
>> On Wed, Aug 22, 2012 at 3:16 AM, kbmkumar  wrote:
>> > Hi,
>> >   I am trying to bring up a HBase cluster with 1 master and 1 one
>> > region
>> > server. I am using
>> > Hadoop 1.0.3
>> > Hbase 0.94.1
>> >
>> > Starting the hdfs was straight forward and i could see the namenode up
>> > and
>> > running successfully. But the problem is with Hbase. I followed all the
>> > guidelines given in the Hbase cluster setup (fully distributed mode) and
>> > ran
>> > the start-hbase.sh
>> >
>> > It started the Master, Region server and zookeeper (in the region
>> > server) as
>> > per my configuration. But i am not sure the master is fully functional.
>> > When
>> > i try to connect hbase shell and create table, it errors out saying
>> > PleaseHoldException- Master is initializing
>> >
>> > In UI HMaster status shows like this *Assigning META region (since
>> > 18mins,
>> > 39sec ago)*
>> >
>> > and i see the Hmaster logs are flowing with the following debug prints,
>> > the
>> > log file is full of below prints,
>> > *
>> > 2012-08-22 01:08:19,637 DEBUG
>> >
>> > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>> > Looked up root region location,
>> >
>> > connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd;
>> > serverName=hadoop-datanode1,60020,1345596463277
>> > 2012-08-22 01:08:19,638 DEBUG
>> >
>> > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>> > Looked up root region location,
>> >
>> > connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd;
>> > serverName=hadoop-datanode1,60020,1345596463277
>> > 2012-08-22 01:08:19,639 DEBUG
>> >
>> > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
>> > Looked up root region location,
>> >
>> > connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@49586cbd;
>> > serverName=hadoop-datanode1,60020,1345596463277*
>> >
>> > Please help me in debugging this.
>> >
>> >
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://apache-hbase.679495.n3.nabble.com/Problem-Bringing-up-the-HBase-cluster-tp4019948.html
>> > Sent from the HBase - Developer mailing list archive at Nabble.com.
>
>


Re: How to avoid stop-the-world GC for HBase Region Server under big heap size

2012-08-23 Thread N Keywal
Hi,

For a possible future, there is also this to monitor:
http://docs.oracle.com/javase/7/docs/technotes/guides/vm/G1.html
It more or less requires JDK 1.7.
See HBASE-2039.

Cheers,

N.

On Thu, Aug 23, 2012 at 8:16 AM, J Mohamed Zahoor  wrote:
> Slab cache might help
> http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/
>
> ./zahoor
>
> On Thu, Aug 23, 2012 at 11:36 AM, Gen Liu  wrote:
>
>> Hi,
>>
>> We are running Region Server on big memory machine (70G) and set Xmx=64G.
>> Most heap is used as block cache for random read.
>> Stop-the-world GC is killing the region server, but using less heap (16G)
>> doesn't utilize our machines well.
>>
>> Is there a concurrent or parallel GC option that won't block all threads?
>>
>> Any thought is appreciated. Thanks.
>>
>> Gen Liu
>>
>>


Re: Client receives SocketTimeoutException (CallerDisconnected on RS)

2012-08-23 Thread N Keywal
Hi Adrien,

As well, could you share the client code (number of threads, regions,
whether it is a set of single gets or multi gets, this kind of
stuff)?

Cheers,

N.


On Thu, Aug 23, 2012 at 7:40 PM, Jean-Daniel Cryans  wrote:
> Hi Adrien,
>
> I would love to see the region server side of the logs while those
> socket timeouts happen, also check the GC log, but one thing people
> often hit while doing pure random read workloads with tons of clients
> is running out of sockets because they are all stuck in CLOSE_WAIT.
> You can check that by using lsof. There are other discussion on this
> mailing list about it.
>
> J-D
>
> On Thu, Aug 23, 2012 at 10:24 AM, Adrien Mogenet
>  wrote:
>> Hi there,
>>
>> While I'm performing read-intensive benchmarks, I'm seeing a storm of
>> "CallerDisconnectedException" in certain RegionServers. As the
>> documentation says, my client received a SocketTimeoutException
>> (6ms etc...) at the same time.
>> It's always happening and I get very poor read performance (from 10
>> to 5000 reads/sec) in a 10-node cluster.
>>
>> My benchmark consists in several iterations launching 10, 100 and 1000
>> Get requests on a given random rowkey with a single CF/qualifier.
>> I'm using HBase 0.94.1 (a few commits before the official stable
>> release) with Hadoop 1.0.3.
>> Bloom filters have been enabled (at the rowkey level).
>>
>> I do not find very clear informations about these exceptions. From the
>> reference guide :
>>   (...) you should consider digging in a bit more if you aren't doing
>> something to trigger them.
>>
>> Well... could you help me digging? :-)
>>
>> --
>> AM.


Re: Client receives SocketTimeoutException (CallerDisconnected on RS)

2012-08-24 Thread N Keywal
Hi Adrien,

>  What do you think about that hypothesis ?

Yes, there is something fishy to look at here. Difficult to say
without more logs as well.
Are your gets totally random, or are you doing gets on rows that do
exist? That would explain the number of requests vs. empty/full
regions.

It does not explain everything you're seeing, however. So if you're not
exhausting the system resources, there may be a bug somewhere. If you
can reproduce the behaviour on a pseudo-distributed cluster it could
be interesting; as I understand from your previous mail, you have a
single client, and maybe a single working server in the end...

Nicolas


Re: Client receives SocketTimeoutException (CallerDisconnected on RS)

2012-08-28 Thread N Keywal
>
>
> Totally randoms (even on keys that do not exist).
>

It's worth checking whether it matches your real use cases. I expect that
reads by row key are most of the time on existing rows (as in a traditional
db relationship, or UI- or workflow-driven stuff), even if I'm sure it's
possible to have something totally different.

It's not going to have an impact all the time. But I can easily imagine
scenarios with better performance when the row exists vs. does not exist.
For example, you have to read more files to check that the row key is
really not there. This will be even more true if you're inserting a lot of
data simultaneously (i.e. the files won't be major compacted). On the
other hand, bloom filters may be more efficient in this case. But again,
I'm not sure they're going to be efficient on random data. It's like
compression algorithms: on really random data, they will all have similarly
bad results. It does not mean they are equivalent, nor useless.


> I'm working on it ! Thanks,
>

If you can reproduce a 'bad behavior' or a performance issue, we will try
to fix it for sure.

Have a nice day,

N.


Re: HBase Is So Slow To Save Data?

2012-08-29 Thread N Keywal
Hi Bing,

You should expect HBase to be slower in the generic case:
1) it writes much more data (see the hbase data model), with extra column
qualifiers, timestamps & so on.
2) the data is written multiple times: once in the write-ahead-log, once
per replica on the datanodes & so on again.
3) there are inter-process calls & inter-machine calls on the critical path.

This is the cost of the atomicity, reliability and scalability features.
With these features in mind, HBase is reasonably fast to save data on a
cluster.

In your specific case (without points 2 & 3 above), the performance
seems to be very bad.

You should first look at:
- how much time is spent in the put vs. preparing the list
- do you have garbage collection going on? even swap?
- what's the size of your final ArrayList vs. the available memory?
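For the first point, a rough sketch of how the measurement could be split
(the table name, family and row count below are made up for the example):

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutTimingSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "timing_test");   // hypothetical table

    long t0 = System.currentTimeMillis();
    List<Put> puts = new ArrayList<Put>(100000);       // pre-sized list
    for (int i = 0; i < 100000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v-" + i));
      puts.add(put);
    }
    long t1 = System.currentTimeMillis();

    table.put(puts);                                   // the actual write
    table.flushCommits();
    long t2 = System.currentTimeMillis();

    System.out.println("building the list: " + (t1 - t0) + " ms");
    System.out.println("table.put + flush: " + (t2 - t1) + " ms");
    table.close();
  }
}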

Cheers,

N.


On Wed, Aug 29, 2012 at 4:08 PM, Bing Li  wrote:

> Dear all,
>
> By the way, my HBase is in the pseudo-distributed mode. Thanks!
>
> Best regards,
> Bing
>
> On Wed, Aug 29, 2012 at 10:04 PM, Bing Li  wrote:
>
> > Dear all,
> >
> > In my experience, it is very slow for HBase to save data. Am I
> > right?
> >
> > For example, today I needed to save the data in a HashMap to HBase. It took
> > more than three hours. However, when saving the same HashMap in a file
> > in text format with redirected System.out, it took only 4.5 seconds!
> >
> > Why is HBase so slow? Is it indexing?
> >
> > My code to save data in HBase is as follows. I think the code must be
> > correct.
> >
> > ..
> > public synchronized void
> > AddVirtualOutgoingHHNeighbors(ConcurrentHashMap<String,
> > ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap, int timingScale)
> > {
> > List<Put> puts = new ArrayList<Put>();
> >
> > String hhNeighborRowKey;
> > Put hubKeyPut;
> > Put groupKeyPut;
> > Put topGroupKeyPut;
> > Put timingScalePut;
> > Put nodeKeyPut;
> > Put hubNeighborTypePut;
> >
> > for (Map.Entry<String, ConcurrentHashMap<String, Set<String>>>
> > sourceHubGroupNeighborEntry : hhOutNeighborMap.entrySet())
> > {
> > for (Map.Entry<String, Set<String>>
> > groupNeighborEntry : sourceHubGroupNeighborEntry.getValue().entrySet())
> > {
> > for (String neighborKey :
> > groupNeighborEntry.getValue())
> > {
> > hhNeighborRowKey =
> > NeighborStructure.HUB_HUB_NEIGHBOR_ROW +
> > Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() +
> > groupNeighborEntry.getKey() + timingScale + neighborKey);
> >
> > hubKeyPut = new
> > Put(Bytes.toBytes(hhNeighborRowKey));
> >
> > hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> > Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN),
> > Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
> > puts.add(hubKeyPut);
> >
> > groupKeyPut = new
> > Put(Bytes.toBytes(hhNeighborRowKey));
> >
> > groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> > Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN),
> > Bytes.toBytes(groupNeighborEntry.getKey()));
> > puts.add(groupKeyPut);
> >
> > topGroupKeyPut = new
> > Put(Bytes.toBytes(hhNeighborRowKey));
> >
> >
> topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> > Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN),
> >
> Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey(;
> > puts.add(topGroupKeyPut);
> >
> > timingScalePut = new
> > Put(Bytes.toBytes(hhNeighborRowKey));
> >
> >
> timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> > Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN),
> > Bytes.toBytes(timingScale));
> > puts.add(timingScalePut);
> >
> > nodeKeyPut = new
> > Put(Bytes.toBytes(hhNeighborRowKey));
> >
> > nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> > Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN),
> > Bytes.toBytes(neighborKey));
> > puts.add(nodeKeyPut);
> >
> > hubNeighborTypePut = new
> > Put(Bytes.toBytes(hhNeighborRowKey));
> >
> >
> hubNeighborTypePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> > Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN),
> > Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR));
> > puts.add(hubNeighborTypePut);

Re: HBase Is So Slow To Save Data?

2012-08-29 Thread N Keywal
It's not useful here: if you have a memory issue, it's when you're using the
list, not when you have finished with it and set it to null.
You need to monitor the memory consumption of the jvm, both on the client &
the server.
Google around these keywords, there are many examples on the web.
Google ArrayList initialization as well.

Note as well that what matters is not the memory size of the structure on
disk but the size of the "List<Put> puts = new ArrayList<Put>();" before
the table put.
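For instance, a small sketch of both points (pre-sizing the list and
looking at the heap from inside the client JVM); the numbers are examples
only:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Put;

public class MemoryCheckSketch {
  public static void main(String[] args) {
    // Pre-size the ArrayList when the number of puts is roughly known,
    // to avoid repeated internal array copies while it grows.
    int expectedPuts = 500000;                          // example value
    List<Put> puts = new ArrayList<Put>(expectedPuts);

    // Quick look at the heap of the client JVM.
    Runtime rt = Runtime.getRuntime();
    long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    long maxMb = rt.maxMemory() / (1024 * 1024);
    System.out.println("list capacity: " + expectedPuts
        + ", heap used: " + usedMb + " MB, max: " + maxMb + " MB");
  }
}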

On Wed, Aug 29, 2012 at 5:42 PM, Bing Li  wrote:

> Dear N Keywal,
>
> Thanks so much for your reply!
>
> The total amount of data is about 110M. The available memory is enough, 2G.
>
> In Java, I just set a collection to NULL to collect garbage. Do you think
> it is fine?
>
> Best regards,
> Bing
>
>
> On Wed, Aug 29, 2012 at 11:22 PM, N Keywal  wrote:
>
>> Hi Bing,
>>
>> You should expect HBase to be slower in the generic case:
>> 1) it writes much more data (see hbase data model), with extra columns
>> qualifiers, timestamps & so on.
>> 2) the data is written multiple times: once in the write-ahead-log, once
>> per replica on datanode & so on again.
>> 3) there are inter process calls & inter machine calls on the critical
>> path.
>>
>> This is the cost of the atomicity, reliability and scalability features.
>> With these features in mind, HBase is reasonably fast to save data on a
>> cluster.
>>
>> On your specific case (without the points 2 & 3 above), the performance
>> seems to be very bad.
>>
>> You should first look at:
>> - how much is spent in the put vs. preparing the list
>> - do you have garbage collection going on? even swap?
>> - what's the size of your final Array vs. the available memory?
>>
>> Cheers,
>>
>> N.
>>
>>
>>
>> On Wed, Aug 29, 2012 at 4:08 PM, Bing Li  wrote:
>>
>>> Dear all,
>>>
>>> By the way, my HBase is in the pseudo-distributed mode. Thanks!
>>>
>>> Best regards,
>>> Bing
>>>
>>> On Wed, Aug 29, 2012 at 10:04 PM, Bing Li  wrote:
>>>
>>> > Dear all,
>>> >
>>> > According to my experiences, it is very slow for HBase to save data?
>>> Am I
>>> > right?
>>> >
>>> > For example, today I need to save data in a HashMap to HBase. It took
>>> > about more than three hours. However when saving the same HashMap in a
>>> file
>>> > in the text format with the redirected System.out, it took only 4.5
>>> seconds!
>>> >
>>> > Why is HBase so slow? It is indexing?
>>> >
>>> > My code to save data in HBase is as follows. I think the code must be
>>> > correct.
>>> >
>>> > ..
>>> > public synchronized void
>>> > AddVirtualOutgoingHHNeighbors(ConcurrentHashMap>> > ConcurrentHashMap>> hhOutNeighborMap, int
>>> timingScale)
>>> > {
>>> > List puts = new ArrayList();
>>> >
>>> > String hhNeighborRowKey;
>>> > Put hubKeyPut;
>>> > Put groupKeyPut;
>>> > Put topGroupKeyPut;
>>> > Put timingScalePut;
>>> > Put nodeKeyPut;
>>> > Put hubNeighborTypePut;
>>> >
>>> > for (Map.Entry>> > Set>> sourceHubGroupNeighborEntry :
>>> hhOutNeighborMap.entrySet())
>>> > {
>>> > for (Map.Entry>
>>> > groupNeighborEntry : sourceHubGroupNeighborEntry.getValue().entrySet())
>>> > {
>>> > for (String neighborKey :
>>> > groupNeighborEntry.getValue())
>>> > {
>>> > hhNeighborRowKey =
>>> > NeighborStructure.HUB_HUB_NEIGHBOR_ROW +
>>> > Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() +
>>> > groupNeighborEntry.getKey() + timingScale + neighborKey);
>>> >
>>> > hubKeyPut = new
>>> > Put(Bytes.toBytes(hhNeighborRowKey));
>>> >
>>> > hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
>>> > Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLU

Re: HBase and unit tests

2012-08-31 Thread n keywal
Hi Cristopher,

HBase starts a minicluster for many of its tests because we have a lot of
destructive tests; otherwise the non-destructive tests would be impacted by
the destructive ones. When writing a client application, you usually don't
need to do that: you can rely on the same instance for all your tests.

It's also useful to write the tests in a way that is compatible with a real
cluster or a pseudo-distributed one. Sometimes, when a test fails, you
want to have a look at what the code wrote or found in HBase: you won't
have this with a mini cluster. And it saves a start.

I don't know if there is a blog entry on this, but it's not very difficult
to do (though as usual not that easy when you start). I've personally done it
with a singleton class + prefixing the table names with a random key (to
allow multiple tests in parallel on the same cluster without relying on
cleanup) + getProperty to decide between starting a mini cluster or
connecting to an existing cluster.
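As a rough sketch of that approach (the class name, system property and
prefix scheme below are made up for the example, not an existing HBase
utility):

import java.util.UUID;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HBaseTestingUtility;

public final class SharedTestCluster {
  private static HBaseTestingUtility util;
  private static Configuration conf;

  // Started at most once per JVM (e.g. with surefire forkMode=once).
  public static synchronized Configuration getConf() throws Exception {
    if (conf == null) {
      if (Boolean.getBoolean("tests.external.cluster")) {
        // Use the cluster described by the hbase-site.xml on the classpath.
        conf = HBaseConfiguration.create();
      } else {
        util = new HBaseTestingUtility();
        util.startMiniCluster();
        conf = util.getConfiguration();
      }
    }
    return conf;
  }

  // Random prefix so several test runs can share an external cluster
  // without relying on cleanup.
  public static String tableName(String base) {
    return "t_" + UUID.randomUUID().toString().replace("-", "") + "_" + base;
  }

  private SharedTestCluster() {
  }
}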

HTH,

Nicolas


On Fri, Aug 31, 2012 at 12:28 PM, Cristofer Weber <
cristofer.we...@neogrid.com> wrote:

> Hi Sonal, Stack and Ulrich!
>
> Yes, I should provide more details :$
>
> I reached the links you provided when I was searching for a way to start
> HBase with JUnit. From the defaults, the only params I have changed are the
> Zookeeper port and the number of nodes, which is 1 in my case. Based on the
> logs I suspect that most of the time is spent with HDFS, and that's why I
> asked if there is a way to start a standalone instance of HBase. The amount of
> data written at each test case would probably fit in memstore anyway, and
> table cleansing between each test method is managed by a loop of deletes.
>
> At least 15 seconds are spent on starting the mini cluster for each test
> case.
>
> Right now I remembered that I should turn off the WAL when running unit tests
> :-), but this will not affect startup time.
>
> Thanks!!
>
> Best regards,
> Cristofer
>
> 
> From: Ulrich Staudinger [ustaudin...@gmail.com]
> Sent: Friday, August 31, 2012 2:21
> To: user@hbase.apache.org
> Subject: Re: HBase and unit tests
>
> As general advice, although you probably do take care of this,
> instantiate the mini cluster only once in your junit test constructor
> and not in every test method. At the end of each test, either clean up
> your hbase or use a different "area" per test.
>
> best regards,
> ulrich
>
>
> --
> connect on xing or linkedin. sent from my tablet.
>
> On 31.08.2012, at 06:46, Stack  wrote:
>
> > On Thu, Aug 30, 2012 at 4:44 PM, Cristofer Weber
> >  wrote:
> >> Hi there!
> >>
> >> After I started studying HBase, I've searched for open source projects
> backed by HBase and I found Titan distributed graph database (you probably
> heard about it). As soon as I read in their documentation that HBase
> adapter is experimental and suboptimal (disclaimer here:
> https://github.com/thinkaurelius/titan/wiki/Using-HBase) I volunteered to
> help improving this adapter and since then I made a few changes to improve
> on running tests (reduced from hours to minutes) and also an improvement on
> search feature.
> >>
> >> Now I'm trying to break the dependency on a pre-installed HBase for
> unit tests and found miniCluster inside HBase tests, but minicluster
> demands too much time to start and I don't know if tweaking on configs will
> improve significantly. Is there a way to start a 'lightweight' instance,
> like programatically starting a standalone instance?
> >>
> >
> > How much is 'too much time' Cristofer?  Do you want a standalone cluster
> at all?
> > St.Ack
> > P.S. If digging in this area, you might find the blog post by the
> > sematextians of use:
> >
> http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/
>


Re: HBase and unit tests

2012-08-31 Thread n keywal
On Fri, Aug 31, 2012 at 2:33 PM, Cristofer Weber <
cristofer.we...@neogrid.com> wrote:

> For the other adapters (Cassandra, Cassandra + Thrift, Cassandra +
> Astyanax, etc) they managed to run tests as Internal and External for unit
> tests and also have a profile for Performance and Concurrent tests, where
> External and Performance/Concurrent runs over a live database instance and
> only with Internal tests it is expected to start a database per test case,
> remaining the same tests as in External. HBase adapter already have
> External and Performance/Concurrent so I'm trying to provide the Internal
> set where the objective is to test Titan|HBase interaction.
>

Understood, thanks for sharing the context.

And my goal is to achieve better times than Cassandra :-)
>
> Singleton seems to be a good option, but I have to check if Maven Surefire
> can keep the same process between JUnit test cases.
>

It should be ok with the parameter "forkMode=once" in surefire.

Because Titan work with adapters for different databases and manage
> table/CF creation when not exists, I think it will not be possible to
> prefix table names per test without changing some core components of Titan,
> and it seems to be too invasive to change this now, and deletion is fast
> enough so we can keep same table.
>

It's useful on an external cluster, as you can't fully rely on the clean up
when a test fails nastily, or if you want to analyse the content. It won't
be such an issue on a mini cluster, as it's recreated between the test runs.

Thanks!!
>

You're welcome. Keep us updated, and tell us if you have issues.


Re: Extremely slow when loading small amount of data from HBase

2012-09-05 Thread n keywal
Hi,

With 8 regionservers, yes, you can. Target a few hundred regions by default imho.

N.

On Wed, Sep 5, 2012 at 4:55 AM, 某因幡  wrote:

> +HBase users.
>
>
> -- Forwarded message --
> From: Dmitriy Ryaboy 
> Date: 2012/9/4
> Subject: Re: Extremely slow when loading small amount of data from HBase
> To: "u...@pig.apache.org" 
>
>
> I think the hbase folks recommend something like 40 regions per node
> per table, but I might be misremembering something. Have you tried
> emailing the hbase users list?
>
> On Sep 4, 2012, at 3:39 AM, 某因幡  wrote:
>
> > After merging ~8000 regions to ~4000 on an 8-node cluster, things
> > are getting better.
> > Should I continue merging?
> >
> >
> > 2012/8/29 Dmitriy Ryaboy :
> >> Can you try the same scans with a regular hbase mapreduce job? If you
> see the same problem, it's an hbase issue. Otherwise, we need to see the
> script and some facts about your table (how many regions, how many rows,
> how big a cluster, is the small range all on one region server, etc)
> >>
> >> On Aug 27, 2012, at 11:49 PM, 某因幡  wrote:
> >>
> >>> When I load a range of data from HBase simply using row key range in
> >>> HBaseStorageHandler, I find that the speed is acceptable when I'm
> >>> trying to load some tens of millions rows or more, while the only map
> >>> ends up in a timeout when it's some thousands of rows.
> >>> What is going wrong here? Tried both Pig-0.9.2 and Pig-0.10.0.
> >>>
> >>>
> >>> --
> >>> language: Chinese, Japanese, English
> >
> >
> >
> > --
> > language: Chinese, Japanese, English
>
>
> --
> language: Chinese, Japanese, English
>


Re: Local debugging (possibly with Maven and HBaseTestingUtility?)

2012-09-07 Thread n keywal
Hi,

Could you use HBase in standalone mode? Cf.
http://hbase.apache.org/book.html#standalone_dist
I guess you already tried it and it didn't work?

Nicolas

On Fri, Sep 7, 2012 at 9:57 AM, Jeroen Hoek  wrote:

> Hello,
>
> We are developing a web-application that uses HBase as database, with
> Tomcat as application server. Currently, our server-side code can act
> as a sort of NoSQL abstraction-layer for either HBase or Google
> AppEngine. HBase is used in production, AppEngine mainly for testing
> and demo deployments.
>
> Our current development setup is centred around Eclipse, and local
> testing and debugging is done by running the application from Eclipse,
> which launches the Jetty application server and connects to a local
> AppEngine database persisted to a single file in the WEB-INF
> directory. This allows the developers to easily test new features
> against an existing (local) database that is persisted as long you
> don't throw away the binary file yourself.
>
> We would like to be able to do the same thing with HBase. So far I
> have seen examples of HBaseTestingUtility being used in unit tests
> (usually with Maven), but while that covers unit-testing, I have not
> been able to find a way to run a local, persistent faux-HBase cluster
> like AppEngine does. Is there a recommended way of doing this?
>
> The reason for wanting to be able to test locally like this is to
> avoid the overhead of running a local VM with HBase or having to
> connect to a remote test-cluster when developing.
>
> Kind regards,
>
> Jeroen Hoek
>


Re: Local debugging (possibly with Maven and HBaseTestingUtility?)

2012-09-10 Thread n keywal
With stand-alone mode I assume you mean installing HBase locally and

> work with that?
>

Yes. You can also launch "a la standalone" any version, including a
development version. The launch scripts check this, and use maven to get
the classpath needed for the dev version.

The problem with installing HBase directly on the developer laptop's
> OS is that this is limits you to the version installed at any one
> time. When writing software that uses the HBase client API it is
> sometimes necessary to switch between versions. For example, one day I
> might be working on a feature for our next release, based on
> Cloudera's CDH4 version of HBase, the next day I might have to switch
> back to CDH3, because that runs on production and a sudden hotfix is
> needed, and at the end of the week I might want to try out some of the
> new features in HBase 0.94.1.
>

I don't know if CDH versions are exclusive. This would be a question for
the cdh lists. But for the apache releases at least, nothing prevents you
from having multiple ones installed on the same computer. If you need it,
you should even be able to run multiple versions simultaneously (I've never
done that, but I don't see why it would be an issue, it's just a matter of
ports & directory configuration).

Nicolas


Re: Regarding column family

2012-09-11 Thread n keywal
Yes, because there is one store (hence one set of files) per column family.
See this: http://hbase.apache.org/book.html#number.of.cfs
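As an illustration, a minimal sketch of creating a table with a single
column family (the table and family names are made up for the example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class SingleFamilyTableSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // One column family means one store, hence one set of files, per region.
    HTableDescriptor desc = new HTableDescriptor("my_table"); // hypothetical
    desc.addFamily(new HColumnDescriptor("d"));               // single family
    admin.createTable(desc);
    admin.close();
  }
}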

On Tue, Sep 11, 2012 at 9:52 AM, Ramasubramanian <
ramasubramanian.naraya...@gmail.com> wrote:

> Hi,
>
> Does column family play any role during loading a file into hbase from
> hdfs in terms of performance?
>
> Regards,
> Rams


Re: Performance of scan setTimeRange VS manually doing it

2012-09-12 Thread n keywal
For each file, there is a time range. When you scan/search, the file is
skipped if there is no overlap between the file timerange and the timerange
of the query. As there are other parameters as well (row distribution,
compaction effects, cache, bloom filters, ...) it's difficult to know in
advance what's going to happen exactly. But specifying a timerange does no
harm for sure, if it matches your functional needs...

That said, if you already have the rowkey, the time range is less
interesting as you will already skip a lot of files.
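For reference, a small sketch combining a row key range with setTimeRange on
a scan (the table, key prefix and column family are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeScanSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "events");         // hypothetical table

    long end = System.currentTimeMillis();
    long start = end - 3600 * 1000L;                   // last hour, example

    // Seek on the row key range (timestamp embedded in the key here)...
    Scan scan = new Scan(Bytes.toBytes("event-" + start),
                         Bytes.toBytes("event-" + end));
    // ...and also let HBase skip store files whose time range cannot match.
    scan.setTimeRange(start, end);

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process r
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}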

On Wed, Sep 12, 2012 at 11:52 PM, Tom Brown  wrote:

> When I query HBase, I always include a time range. This has not been a
> problem when querying recent data, but it seems to be an issue when I
> query older data (a few hours old). All of my row keys include the
> timestamp as part of the key (this value is the same as the HBase
> timestamp for the row).  I recently tried an experiment where I
> manually re-seek to the possible row (based on the timestamp as part
> of the row key) instead of using "setTimeRange" on my scan object and
> was amazed to see that there was no degradation for older data.
>
> Can someone postulate a theory as to why this might be happening? I'm
> happy to provide extra data if it will help you theorize...
>
> Is there a downside to stopping using "setTimeRange"?
>
> --Tom
>


Re: RetriesExhaustedWithDetailsException while puting in Table

2012-09-19 Thread n keywal
DoNotRetryIOException means that the error is considered permanent: it's
not a missing regionserver, but for example a table that's not enabled.
I would expect a more detailed exception (a 'caused by' or something alike).
If it's missing, you should have more info in the regionserver logs.

On Wed, Sep 19, 2012 at 11:54 AM, Dhirendra Singh  wrote:

> I am getting this exception while trying to insert an entry into the table.
> The table has its secondary index and its coprocessors defined properly.
> I suspect this error is because the inserted row didn't have all the
> columns required by the secondary index, but I am not sure.
>
> Could someone tell me the way to debug this scenario, as the exception is
> also a bit vague; it actually doesn't tell what went wrong.
>
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
> 1 action: DoNotRetryIOException: 1 time, servers with issues:
> tserver.corp.nextag.com:60020,
>  at
>
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1641)
> at
>
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
>  at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:943)
> at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:820)
>  at org.apache.hadoop.hbase.client.HTable.put(HTable.java:795)
> at
>
> org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397)
>
>
> --
> Warm Regards,
> Dhirendra Pratap
> +91. 9717394713
>


Re: H-base Master/Slave replication

2012-09-26 Thread n keywal
Hi,

I think there is a confusion between hbase replication (replication between
clusters) and hdfs replication (replication between datanodes).
hdfs replication is (more or less) hidden and done for you.

Nicolas

On Wed, Sep 26, 2012 at 9:20 AM, Venkateswara Rao Dokku  wrote:

> Hi,
> I wanted to cluster HBase on 2 nodes. I put one of my nodes as
> hadoop-namenode as well as hbase-master & the other node as hadoop-datanode1
> as well as hbase-region-server1. I started the hadoop cluster as well as HBase
> on the name-node side. They started fine. I created tables & it went fine
> in the master. Now I am trying to replicate the data across the nodes. In
> some of the sites it is mentioned that we have to maintain zookeeper by
> ourselves. How do I do it?
>  Currently my hbase is maintaining the zookeeper. What are the changes I
> need to make to the conf/ files in order to replicate data between Master/Slave
> nodes?
>
> --
> Thanks & Regards,
> Venkateswara Rao Dokku,
> Software Engineer,One Convergence Devices Pvt Ltd.,
> Jubille Hills,Hyderabad.
>


Re: Hbase clustering

2012-09-27 Thread n keywal
Hi,

I would like to direct you to the reference guide, but I must acknowledge
that, well, it's a reference guide, hence not really easy for someone just
starting out.
You should have a look at Lars' blog (and maybe buy his book), and
especially this entry:
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

Some hints however:
- the replication occurs at the hdfs level, not the hbase level: hbase
writes files that are split into hdfs blocks that are replicated across the
datanodes. If you want to check the replication, you must look at what
files are written by hbase, how they have been split into blocks by hdfs
and how these blocks have been replicated. That will be in the hdfs
interface. As a side note, it's not the easiest thing to learn when you
start :-)
- The error "ERROR: org.apache.hadoop.hbase.MasterNotRunningException:
Retried 7 times"
  is not linked to replication or whatever. It means that the second
machine cannot find the master. You need to fix this first (by googling &
checking the logs).


Good luck,

Nicolas




On Thu, Sep 27, 2012 at 9:07 AM, Venkateswara Rao Dokku  wrote:

> How can we verify that the data(tables) is distributed across the cluster??
> Is there a way to confirm it that the data is distributed across all the
> nodes in the cluster.?
>
> On Thu, Sep 27, 2012 at 12:26 PM, Venkateswara Rao Dokku <
> dvrao@gmail.com> wrote:
>
> > Hi,
> > I am completely new to Hbase. I want to cluster the Hbase on two
> > nodes.I installed hadoop,hbase on the two nodes & my conf files are as
> > given below.
> > *cat  conf/regionservers *
> > hbase-regionserver1
> > hbase-master
> > *cat conf/masters *
> > hadoop-namenode
> > * cat conf/slaves *
> > hadoop-datanode1
> > *vim conf/hdfs-site.xml *
> > <configuration>
> > <property>
> > <name>dfs.replication</name>
> > <value>2</value>
> > <description>Default block replication. The actual number of
> > replications can be specified when the file is created. The default is used
> > if replication is not specified in create time.</description>
> > </property>
> > <property>
> > <name>dfs.support.append</name>
> > <value>true</value>
> > <description>Default block replication. The actual number of
> > replications can be specified when the file is created. The default is used
> > if replication is not specified in create time.</description>
> > </property>
> > </configuration>
> > *& finally my /etc/hosts file is *
> > 127.0.0.1   localhost
> > 127.0.0.1   oc-PowerEdge-R610
> > 10.2.32.48  hbase-master hadoop-namenode
> > 10.240.13.35 hbase-regionserver1  hadoop-datanode1
> >  The above files are identical on both of the machines. The following are
> > the processes that are running on my m/c's when I ran start scripts in
> > hadoop as well as hbase
> > *hadoop-namenode:*
> > HQuorumPeer
> > HMaster
> > Main
> > HRegionServer
> > SecondaryNameNode
> > Jps
> > NameNode
> > JobTracker
> > *hadoop-datanode1:*
> >
> > TaskTracker
> > Jps
> > DataNode
> > -- process information unavailable
> > Main
> > NC
> > HRegionServer
> >
> > I can able to create,list & scan tables on the *hadoop-namenode* machine
> > using Hbase shell. But while trying to run the same on the  *
> > hadoop-datanode1 *machine I couldn't able to do it as I am getting
> > following error.
> > hbase(main):001:0> list
> > TABLE
> >
> >
> > ERROR: org.apache.hadoop.hbase.MasterNotRunningException: Retried 7 times
> >
> > Here is some help for this command:
> > List all tables in hbase. Optional regular expression parameter could
> > be used to filter the output. Examples:
> >
> >   hbase> list
> >   hbase> list 'abc.*'
> > How can I list,scan the tables that are created by the *hadoop-namenode *
> > from the *hadoop-datanode1* machine. Similarly Can I create some tables
> > on  *hadoop-datanode1 *& can I access them from the *hadoop-namenode * &
> > vice-versa as the data is distributed as this is a cluster.
> >
> >
> >
> > --
> > Thanks & Regards,
> > Venkateswara Rao Dokku,
> > Software Engineer,One Convergence Devices Pvt Ltd.,
> > Jubille Hills,Hyderabad.
> >
> >
>
>
> --
> Thanks & Regards,
> Venkateswara Rao Dokku,
> Software Engineer,One Convergence Devices Pvt Ltd.,
> Jubille Hills,Hyderabad.
>


Re: Hbase clustering

2012-09-27 Thread n keywal
You should launch the master only once, on whatever machine you like. Then
you will be able to access it from any other machine.
Please have a look at the blog I mentioned in my previous mail.

On Thu, Sep 27, 2012 at 9:39 AM, Venkateswara Rao Dokku  wrote:

> I can see that HMaster is not started on the data-node machine when the
> start scripts in hadoop & hbase ran on the hadoop-namenode. My doubt is
> this: do we have to start the master on hadoop-datanode1 too, or will
> hadoop-datanode1 access the HMaster that is running on the
> hadoop-namenode to create, list, and scan tables, as the two nodes are in the
> cluster as namenode & datanode?
>
> On Thu, Sep 27, 2012 at 1:02 PM, n keywal  wrote:
>
> > Hi,
> >
> > I would like to direct you to the reference guide, but I must acknowledge
> > that, well, it's a reference guide, hence not really easy for a plain new
> > start.
> > You should have a look at Lars' blog (and may be buy his book), and
> > especially this entry:
> > http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
> >
> > Some hints however:
> > - the replication occurs at the hdfs level, not the hbase level: hbase
> > writes files that are split in hdfs blocks that are replicated accross
> the
> > datanodes. If you want to check the replications, you must look at what
> > files are written by hbase and how they have been split in blocks by hdfs
> > and how these blocks have been replicated. That will be in the hdfs
> > interface. As a side note, it's not the easiest thing to learn when you
> > start :-)
> > - The error > ERROR: org.apache.hadoop.hbase.MasterNotRunningException:
> > Retried 7 times
> >   this is not linked to replication or whatever. It means that second
> > machine cannot find the master. You need to fix this first. (googling &
> > checking the logs).
> >
> >
> > Good luck,
> >
> > Nicolas
> >
> >
> >
> >
> > On Thu, Sep 27, 2012 at 9:07 AM, Venkateswara Rao Dokku <
> > dvrao@gmail.com
> > > wrote:
> >
> > > How can we verify that the data(tables) is distributed across the
> > cluster??
> > > Is there a way to confirm it that the data is distributed across all
> the
> > > nodes in the cluster.?
> > >
> > > On Thu, Sep 27, 2012 at 12:26 PM, Venkateswara Rao Dokku <
> > > dvrao@gmail.com> wrote:
> > >
> > > > Hi,
> > > > I am completely new to Hbase. I want to cluster the Hbase on two
> > > > nodes.I installed hadoop,hbase on the two nodes & my conf files are
> as
> > > > given below.
> > > > *cat  conf/regionservers *
> > > > hbase-regionserver1
> > > > hbase-master
> > > > *cat conf/masters *
> > > > hadoop-namenode
> > > > * cat conf/slaves *
> > > > hadoop-datanode1
> > > > *vim conf/hdfs-site.xml *
> > > > 
> > > > 
> > > >
> > > > 
> > > >
> > > > 
> > > > 
> > > > dfs.replication
> > > > 2
> > > > Default block replication.The actual number of
> > > > replications can be specified when the file is created. The default
> is
> > > used
> > > > if replication is not specified in create time.
> > > > 
> > > > 
> > > > 
> > > > dfs.support.append
> > > > true
> > > > Default block replication.The actual number of
> > > > replications can be specified when the file is created. The default
> is
> > > used
> > > > if replication is not specified in create time.
> > > > 
> > > > 
> > > > 
> > > > *& finally my /etc/hosts file is *
> > > > 127.0.0.1   localhost
> > > > 127.0.0.1   oc-PowerEdge-R610
> > > > 10.2.32.48  hbase-master hadoop-namenode
> > > > 10.240.13.35 hbase-regionserver1  hadoop-datanode1
> > > >  The above files are identical on both of the machines. The following
> > are
> > > > the processes that are running on my m/c's when I ran start scripts
> in
> > > > hadoop as well as hbase
> > > > *hadoop-namenode:*
> > > > HQuorumPeer
> > > > HMaster
> > > > Main
> > > > HRegionServer
> > > > SecondaryNameNode
> > > > Jps
> > > > NameNode
> > > > JobTracker

Re: Does hbase 0.90 client work with 0.92 server?

2012-09-27 Thread n keywal
You don't have to migrate the data when you upgrade, it's done on the fly.
But it seems you want to do something more complex? A kind of realtime
replication between two clusters in two different versions?

On Thu, Sep 27, 2012 at 9:56 AM, Damien Hardy  wrote:

> Hello,
>
> Corollary, what is the best way to migrate data from a 0.90 cluster to a
> 0.92 cluster?
>
> Hbase 0.90 => Client 0.90 => stdout | stdin => client 0.92 => Hbase 0.92
>
> All the data must transit through a single host where the 2 clients run.
>
> It may be parallelized with multiple versions working on different range
> scanners, maybe, but that is not so easy.
>
> Is there a CopyTable version that could read from 0.90 and write to 0.92
> as a mapreduce job?
>
> Maybe there is some sort of namespace available for Java classes, so that
> we could use 2 versions of the same package and go for a mapreduce job?
>
> Cheers,
>
> --
> Damien
>
> 2012/9/25 Jean-Daniel Cryans 
>
> > It's not compatible. Like the guide says[1]:
> >
> > "replace your hbase 0.90.x with hbase 0.92.0 binaries (be sure you
> > clear out all 0.90.x instances) and restart (You cannot do a rolling
> > restart from 0.90.x to 0.92.x -- you must restart)"
> >
> > This includes the client.
> >
> > J-D
> >
> > 1. http://hbase.apache.org/book.html#upgrade0.92
> >
> > On Tue, Sep 25, 2012 at 11:16 AM, Agarwal, Saurabh
> >  wrote:
> > > Hi,
> > >
> > > We recently upgraded hbase 0.90.4 to HBase 0.92. Our HBase app worked
> > > fine in hbase 0.90.4.
> > >
> > > Our new setup has an HBase 0.92 server and an hbase 0.90.4 client, and it
> > > throws the following exception when the client tries to connect to the
> > > server.
> > >
> > > Is anyone running an HBase 0.92 server and an hbase 0.90.4 client? Let me
> > > know.
> > >
> > > Thanks,
> > > Saurabh.
> > >
> > >
> > > 12/09/24 14:58:31 INFO zookeeper.ClientCnxn: Session establishment
> > complete on server vm-3733-969C.nam.nsroot.net/10.49.217.56:2181,
> > sessionid = 0x139f61977650034, negotiated timeout = 6
> > >
> > > java.lang.IllegalArgumentException: Not a host:port pair: ?
> > >
> > >   at
> > org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:60)
> > >
> > >   at
> >
> org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82)
> > >
> > >   at
> >
> org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73)
> > >
> > >   at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:786)
> > >
> > >   at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766)
> > >
> > >   at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:895)
> > >
> > >   at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:797)
> > >
> > >   at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766)
> > >
> > >   at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:895)
> > >
> > >   at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:801)
> > >
> > >   at
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:766)
> > >
> > >   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:179)
> > >
> > >   at
> >
> org.apache.hadoop.hbase.HBaseTestingUtility.truncateTable(HBaseTestingUtility.java:609)
> > >
> > >   at
> >
> com.citi.sponge.flume.sink.ELFHbaseSinkTest.testAppend2(ELFHbaseSinkTest.java:221)
> > >
> > >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >
> > >   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> > >
> > >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
> Source)
> > >
> > >   at java.lang.reflect.Method.invoke(Unknown Source)
> > >
> > >   at junit.framework.TestCase.runTest(TestCase.java:168)
> > >
> > >   at junit.framework.TestCase.runBare(TestCase.java:134)
> > >
> > >   at junit.framework.TestResult$1.protect(TestResult.java:110)
> > >
> > >   at junit.framework.TestResult.runProtected(TestResult.java:128)
> > >
> > >   at junit.framework.TestResult.run(TestResult.java:113)
> > >
> > >   at junit.framework.TestCase.run(TestCase.java:124)
> > >
> > >   at junit.framework.TestSuite.runTest(TestSuite.java:232)
> > >
> > >   at junit.framework.TestSuite.run(TestSuite.java:227)
> > >
> > >   at
> >
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:81)
> > >
> > >   at
> >
> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
> 

Re: Does hbase 0.90 client work with 0.92 server?

2012-09-28 Thread n keywal
Depending on what you're doing with the data, I guess you might have some
corner cases, especially after a major compaction. That may be a non-trivial
piece of code to write (again, it depends on how you use HBase; maybe it is
actually trivial).
And, if you're pessimistic, the regression in 0.92 can be one of those that
corrupts the data, so you will need manual data fixes as well during the
rollback.

It may be simpler to secure the migration by investing more in the testing
process (dry/parallel runs). Also, if you find bugs while a release is in
progress, it increases your chances of getting your bugs fixed...

Nicolas

On Thu, Sep 27, 2012 at 10:37 AM, Damien Hardy wrote:

> Actually, I have an old cluster in prod with version 0.90.3 installed
> manually, and I am working on a new CDH4 cluster deployed fully
> automatically with puppet.
> While the migration is not reversible (according to the pointer given by
> Jean-Daniel), I would like to keep the old cluster safe on the side to be
> able to revert the operation.
> Switching from an old vanilla version to a Cloudera one is another risk
> introduced in migrating the actual cluster, and I'm not feeling comfortable
> with it.
> My idea is to copy the data from old to new, switch the clients to the new
> cluster, and I am looking for the best strategy to manage it.
>
> A scanner based on timestamps should be enough to get the last updates
> after switching (but trying to keep it short).
>
> Cheers,
>
> --
> Damien
>
> 2012/9/27 n keywal 
>
> > You don't have to migrate the data when you upgrade, it's done on the
> fly.
> > But it seems you want to do something more complex? A kind of realtime
> > replication between two clusters in two different versions?
> >
> > On Thu, Sep 27, 2012 at 9:56 AM, Damien Hardy 
> > wrote:
> >
> > > Hello,
> > >
> > > Corollary, what is the better way to migrate data from a 0.90 cluster
> to
> > a
> > > 0.92 cluser ?
> > >
> > > Hbase 0.90 => Client 0.90 => stdout | stdin => client 0.92 => Hbase
> 0.92
> > >
> > > All the data must tansit on a single host where compute the 2 clients.
> > >
> > > It may be paralalize with mutiple version working with different range
> > > scanner maybe but not so easy.
> > >
> > > Is there a copytable version that should read on 0.90 to write on 0.92
> > with
> > > mapreduce version ?
> > >
> > > maybe there is some sort of namespace available for Java Classes that
> we
> > > may use 2 version of a same package and go for a mapreduce ?
> > >
> > > Cheers,
> > >
> > > --
> > > Damien
> > >
> > > 2012/9/25 Jean-Daniel Cryans 
> > >
> > > > It's not compatible. Like the guide says[1]:
> > > >
> > > > "replace your hbase 0.90.x with hbase 0.92.0 binaries (be sure you
> > > > clear out all 0.90.x instances) and restart (You cannot do a rolling
> > > > restart from 0.90.x to 0.92.x -- you must restart)"
> > > >
> > > > This includes the client.
> > > >
> > > > J-D
> > > >
> > > > 1. http://hbase.apache.org/book.html#upgrade0.92
> > > >
> > > > On Tue, Sep 25, 2012 at 11:16 AM, Agarwal, Saurabh
> > > >  wrote:
> > > > > Hi,
> > > > >
> > > > > We recently upgraded hbase 0.90.4 to HBase 0.92. Our HBase app
> worked
> > > > fine in hbase 0.90.4.
> > > > >
> > > > > Our new setup has HBase 0.92 server and hbase 0.90.4 client. And
> > throw
> > > > following exception when client would like to connect to server.
> > > > >
> > > > > Is anyone running HBase 0.92 server and hbase 0.90.4 client? Let me
> > > know,
> > > > >
> > > > > Thanks,
> > > > > Saurabh.
> > > > >
> > > > >
> > > > > 12/09/24 14:58:31 INFO zookeeper.ClientCnxn: Session establishment
> > > > complete on server vm-3733-969C.nam.nsroot.net/10.49.217.56:2181,
> > > > sessionid = 0x139f61977650034, negotiated timeout = 6
> > > > >
> > > > > java.lang.IllegalArgumentException: Not a host:port pair: ?
> > > > >
> > > > >   at
> > > > org.apache.hadoop.hbase.HServerAddress.(HServerAddress.java:60)
> > > > >
> > > > >   at
> > > >
> > >
> >
> org.apache.hadoop.hbase.zookeeper.RootRegionTracker.d

Re: Does hbase 0.90 client work with 0.92 server?

2012-09-28 Thread n keywal
I understood that you were targeting a backup plan to go back from 0.92 -->
0.90 if anything goes wrong?
But in any case, it might work, it depends on the data you're working with
and the downtime you're ready to accept. It's not simple to ensure you
won't miss any operation and to manage the deletes mixed with compactions.
Not taking into account the root issue you may have with the source
cluster. For example, if you're migrating back because your 0.92 cluster
cannot handle the load, adding a map reduce task to do an "export world" on
top of this might bring this extra little workload that will put it down
completely.

On Fri, Sep 28, 2012 at 11:59 AM, Damien Hardy wrote:

> And what about hbase 0.90 export && distcp hftp://hdfs0.20/ dfs://hdfs1.0/
> && hbase 0.92 import ?
>
> Then switch the client (a rest interface), then recover the last few updates
> with the same approach, limiting the export by start time.
>
> http://hadoop.apache.org/docs/hdfs/current/hftp.html
>
> Could this way be safe, with minimal downtime?
>
> Cheers,
>
> 2012/9/28 n keywal 
>
> > Depending on what you're doing with the data, I guess you might have some
> > corner cases, especially after a major compaction. That may be a non
> > trivial piece of code to write (again, it depends on how you use HBase.
> May
> > be it is actually trivial).
> > And, if you're pessimistic, the regression in 0.92 can be one of those
> that
> > corrupts the data, so you will need manual data fixes as well during the
> > rollback.
> >
> > It may be simpler to secure the migration by investing more in the
> testing
> > process (dry/parallel runs). As well, if you find bugs while a release is
> > in progress, it increases your chances to get your bugs fixed...
> >
> > Nicolas
> >
> > On Thu, Sep 27, 2012 at 10:37 AM, Damien Hardy  > >wrote:
> >
> > > Actually, I have an old cluster on on prod with 0.90.3 version
> installed
> > > manually and I am working on a CDH4 new cluster deployed full automatic
> > > with puppet.
> > > While migration is not reversible (according to the pointer given by
> > > Jean-Daniel) I would like to keep he old cluster safe by side to be
> able
> > to
> > > revert operation
> > > Switching from an old vanilla version to a Cloudera one is an other
> risk
> > > introduced in migrating the actual cluster and I'm not feeling
> > confortable
> > > with.
> > > My idea is to copy data from old to new and switch clients the new
> > cluster
> > > and I am lookin for the best strategy to manage it.
> > >
> > > A scanner based on timestamp should be enougth to get the last updates
> > > after switching (But trying to keep it short).
> > >
> > > Cheers,
> > >
> > > --
> > > Damien
> > >
> > > 2012/9/27 n keywal 
> > >
> > > > You don't have to migrate the data when you upgrade, it's done on the
> > > fly.
> > > > But it seems you want to do something more complex? A kind of
> realtime
> > > > replication between two clusters in two different versions?
> > > >
> > > > On Thu, Sep 27, 2012 at 9:56 AM, Damien Hardy  >
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > Corollary, what is the better way to migrate data from a 0.90
> cluster
> > > to
> > > > a
> > > > > 0.92 cluser ?
> > > > >
> > > > > Hbase 0.90 => Client 0.90 => stdout | stdin => client 0.92 => Hbase
> > > 0.92
> > > > >
> > > > > All the data must tansit on a single host where compute the 2
> > clients.
> > > > >
> > > > > It may be paralalize with mutiple version working with different
> > range
> > > > > scanner maybe but not so easy.
> > > > >
> > > > > Is there a copytable version that should read on 0.90 to write on
> > 0.92
> > > > with
> > > > > mapreduce version ?
> > > > >
> > > > > maybe there is some sort of namespace available for Java Classes
> that
> > > we
> > > > > may use 2 version of a same package and go for a mapreduce ?
> > > > >
> > > > > Cheers,
> > > > >
> > > > > --
> > > > > Damien
> > > > >
> > > > > 2012/9/25 Jean-Daniel Cryans 
> > 

HBase User Group in Paris

2012-10-02 Thread n keywal
Hi all,

I was wondering how many HBase users there are in Paris (France...).

Would you guys be interested in participating in a Paris-based user group?
The idea would be to share HBase practises, with something like a meet-up
per quarter.

Reply to me directly or on the list, as you prefer.

Cheers,

Nicolas


Re: hbase heap size beyond 16G ?

2011-11-08 Thread N Keywal
If you're interested, here are some good slides on GC (slide 45 and after):
http://www.azulsystems.com/sites/www.azulsystems.com/SpringOne2011_UnderstandingGC.pdf

On Tue, Nov 8, 2011 at 11:25 PM, Mikael Sitruk wrote:

> Concurrent GC (a.k.a. CMS) does not mean that there are no more pauses. The
> pauses are reduced to a minimum but can still happen, especially if the
> concurrent threads do not finish their work under high pressure. The G1
> collector in JDK 7.0 claims to be a better collector than CMS, but I
> presume tests will need to be done to validate this.
> BTW the CMS collector is the one that is recommended in the book.
>
> Mikael.S
>
> On Tue, Nov 8, 2011 at 11:57 PM, Sujee Maniyam  wrote:
>
> > HI All
> > the HBase book by Lars warns it is not recommended to set heap size above
> > 16G, because of 'stop the world' GC.
> >
> > Does this still apply?  Especially with 'concurrent GC'?
> >
> > thanks
> > Sujee
> > http://sujee.net
> >
>
>
>
> --
> Mikael.S
>


Re: How to implement tests for python based application using Hbase-thrift interface

2012-01-30 Thread N Keywal
Hi Damien,

Can't say for the Python stuff.
You can reuse or extract what you need from HBaseTestingUtility in the
hbase test package; this will allow you to start a full HBase mini cluster
in a few lines of Java code.
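
For example, a minimal sketch (class, table, family and row names are made
up; it assumes the hbase tests jar and its dependencies are on the classpath):

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class MiniClusterExample {
  public static void main(String[] args) throws Exception {
    HBaseTestingUtility util = new HBaseTestingUtility();
    util.startMiniCluster();  // starts HDFS, ZooKeeper and HBase in-process
    HTable table = util.createTable(Bytes.toBytes("test"), Bytes.toBytes("f"));
    table.put(new Put(Bytes.toBytes("row1"))
        .add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value")));
    // Start the thrift gateway against util.getConfiguration() here, run the
    // python tests against it, then tear everything down:
    util.shutdownMiniCluster();
  }
}

The idea would then be to start the thrift gateway with that same
configuration so the python tests have something to talk to.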

Cheers,

N.


On Mon, Jan 30, 2012 at 11:10 AM, Damien Hardy  wrote:

> Hello,
>
> I wrote some code in python using Hbase as image storage.
> I want my code to be tested independently of a full external Hbase
> architecture, so my question is:
> Is there some "howto" that helps with instantiating a temporary local
> minicluster + thrift interface, in order to run python (or maybe other
> language) hbase-thrift based tests easily?
>
> Cheers,
>
> --
> Damien
>
>


Re: sequence number

2012-01-31 Thread N Keywal
Hi,

Yes, each cell is associated with a long. By default it's a timestamp, but
you can set it yourself when you create the Put.
It's stored everywhere.
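
For instance (imports omitted; table, family, qualifier and value names are
made up):

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "mytable");
Put put = new Put(Bytes.toBytes("row1"));
// family "f", qualifier "q", explicit version (the long) 42, value "v"
put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), 42L, Bytes.toBytes("v"));
table.put(put);

If you don't pass the long, the region server assigns the current time in
milliseconds when it writes the cell.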

You've got a lot of information and links on this in the hbase book (
http://hbase.apache.org/book.html#versions)

Cheers,

N.

On Mon, Jan 30, 2012 at 9:38 PM, Noureddine BOUYAHIAOUI <
nour.bouyahia...@free.fr> wrote:

> Hi,
>
> In my reading about HBase, I understand that the HRegionServer (hosting n
> HRegions) uses a sequence number (an AtomicLong) to version each key/value
> stored in the WAL.
>
> Can you please give me some details about this notion, for example how the
> HRegionServer creates its sequence number, and why we use it. Is it
> considered a version identifier?
>
> Best regards.
>
> Noureddine Bouyahiaoui
>
>
>
>


zookeeper 3.3/3.4 on hbase trunk

2012-02-07 Thread N Keywal
Hi,

FYI. I've been doing some tests mixing zookeeper client/server versions
on hbase trunk, by executing medium category unit tests with a standalone
zookeeper server (mixing versions 3.3 & 3.4 is officially supported by
Zookeeper, but it was worth checking).

I tested:
Zookeeper Server server 3.3.4 and 3.4.2
Zookeeper Client API 3.3.4 and 3.4.2 (with some changes in hbase to make it
build with 3.3 API).

Meaning:
Client 3.4.2 --> Server 3.4.2
Client 3.3.4 --> Server 3.4.2
Client 3.3.4 --> Server 3.3.4
Client 3.4.2 --> Server 3.3.4

Conclusion:
- It works, except of course if you're activating secure login (the related
unit tests will hang).
- I had a strange random error with the 3.3.4 server (whatever the client
version), but it seems to be linked only to the start/stop phase (the
zookeeper server surviving a stop request).
- It's difficult from the client to know what the zookeeper server version
is. A zookeeper jira was created for this (ZOOKEEPER-1381).
- If you use a 3.4.2 feature like "multi" on a 3.3 server, it hangs: once
again, it's up to the developer/administrator to make sure he's not using
something specific to the 3.4 server, hence the jira (ZOOKEEPER-1381) if we
want stuff like warnings or implementations optimized for a given server.

Cheers,

N.


Re: Is it possible to connect HBase remotely?

2012-02-08 Thread N Keywal
Hi,

The client needs to connect to zookeeper as well. You haven't set the
parameters for zookeeper, so it goes with the default settings
(localhost/2181), hence the error you're seeing. Set the zookeeper
connection properties in the client and it should work.

This should do it:
conf.set("hbase.zookeeper.quorum", "192.168.2.122");
conf.set("hbase.zookeeper.property.clientPort", "2181");
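
Putting it together, a minimal client sketch (imports omitted; it assumes a
table named "table" with a row "row1" already exists on the remote cluster):

Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "192.168.2.122");
conf.set("hbase.zookeeper.property.clientPort", "2181");
HTable hTable = new HTable(conf, "table");
Result result = hTable.get(new Get(Bytes.toBytes("row1")));
System.out.println(result);

Note that hbase.master is not needed: the client finds the master and the
region servers through zookeeper.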

Cheers,

N.

On Wed, Feb 8, 2012 at 3:26 PM, shashwat shriparv  wrote:

> I have two machines on the same network, with IPs like *192.168.2.122* and
> *192.168.2.133*. Suppose hbase (standalone mode) is running on
> *192.168.2.122*, and I have eclipse or netbeans running on *192.168.2.133*,
> so I need to retrieve and put data to the hbase running on the other IP.
> Till now, what I have tried is creating a configuration for hbase inside my
> code like:
>
> Configuration conf = HBaseConfiguration.create();
> conf.set("hbase.master", "*192.168.2.122:9000*");
> HTable hTable = new HTable(conf, "table");
>
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
> 12/02/08 19:44:28 INFO zookeeper.ClientCnxn: Opening socket connection to
> server* localhost/127.0.0.1:2181*
> 12/02/08 19:44:28 WARN zookeeper.ClientCnxn: Session 0x1355d44ae6f0003 for
> server null, unexpected error, closing socket connection and attempting
> reconnect
>
> I am not able to understand why it's trying to go to *localhost/
> 127.0.0.1:2181
> .*
> *
> *
> My host file configuration is follows :
>
>
> ==
> 127.0.0.1 localhost
> 127.0.0.1 ubuntu.ubuntu-domain ubuntu
> 192.168.2.126 ubuntu
> 192.168.2.125   ubuntu1
> 192.168.2.106   ubuntu2
> 192.168.2.56   ubuntu3
>
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
>
> ==
> I am able to telnet to localhost:9000, 127.0.0.1:9000, and myhostname:9000,
> but if I try to connect to my IP, which is 192.168.2.125, it's not
> connecting: it's saying connection refused.
>
> What method should I follow to achieve this (connecting to HBase running on
> another PC on the same network)? Any tutorial link will be appreciated.
>


Re: Is it possible to connect HBase remotely?

2012-02-08 Thread N Keywal
Do you have this with a simple client, or are you doing something more
complicated? Does it work when you run it on the same machine as the hbase
server?
You should have a look at the zookeeper logs; they may contain useful info
(post them here as well :-)

Someone posted this some time ago:
http://www.mail-archive.com/user@hbase.apache.org/msg13488.html

On Wed, Feb 8, 2012 at 3:59 PM, shashwat shriparv  wrote:

> Hey,
> I tried using what you suggested not its giving the following exception :
>
> org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to
> connect to ZooKeeper but the connection closes immediately. This could be a
> sign that the server has too many connections (30 is the default). Consider
> inspecting your ZK server logs for that error and then make sure you are
> reusing HBaseConfiguration as often as you can. See HTable's javadoc for
> more information.
>
> What may be the reason? Do I need to do some port-related setting on the
> remote machine, or what zookeeper configuration do I need to look into on
> the remote machine?
>
> Regards
> Shashwat
>
> On Wed, Feb 8, 2012 at 8:13 PM, shashwat shriparv <
> dwivedishash...@gmail.com
> > wrote:
>
> > Let me, try thanks alot
> >
> >
> > On Wed, Feb 8, 2012 at 8:05 PM, N Keywal  wrote:
> >
> >> Hi,
> >>
> >> The client needs to connect to zookeeper as well. You haven't set the
> >> parameters for zookeeper, so it goes with the default settings
> >> (localhost/2181), hence the error you're seeing. Set the zookeeper
> >> connection property in the client, it should work.
> >>
> >> This should do it:
> >>conf .set("hbase.zookeeper.quorum", "192.168.2.122");
> >>conf .set("hbase.zookeeper.property.clientPort", "2181");
> >>
> >> Cheers,
> >>
> >> N.
> >>
> >> On Wed, Feb 8, 2012 at 3:26 PM, shashwat shriparv <
> >> dwivedishash...@gmail.com
> >> > wrote:
> >>
> >> > I have two machine on same network IPs are like *192.168.2.122* and *
> >> > 192.168.2.133*, suppose hbase (stand alone mode) running on
> >> *192.168.2.122,
> >> > *and i have eclipse or netbeans running on *192.168.2.133,* so i need
> to
> >> > retrieve and put data to hbase running on other ip, till now what i
> have
> >> > tried is creating a configuration for hbase inside my code like :
> >> >
> >> > Configuration conf = HBaseConfiguration.create();
> >> > conf.set("hbase.master", "*192.168.2.122:9000*");
> >> > HTable hTable = new HTable(conf, "table");
> >> >
> >> > java.net.ConnectException: Connection refused
> >> > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >> > at
> >> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> >> > at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
> >> > 12/02/08 19:44:28 INFO zookeeper.ClientCnxn: Opening socket connection
> >> to
> >> > server* localhost/127.0.0.1:2181*
> >> > 12/02/08 19:44:28 WARN zookeeper.ClientCnxn: Session 0x1355d44ae6f0003
> >> for
> >> > server null, unexpected error, closing socket connection and
> attempting
> >> > reconnect
> >> >
> >> > I a not able to understand why its trying to go to *localhost/
> >> > 127.0.0.1:2181
> >> > .*
> >> > *
> >> > *
> >> > My host file configuration is follows :
> >> >
> >> >
> >> >
> >>
> ==
> >> > 127.0.0.1 localhost
> >> > 127.0.0.1 ubuntu.ubuntu-domain ubuntu
> >> > 192.168.2.126 ubuntu
> >> > 192.168.2.125   ubuntu1
> >> > 192.168.2.106   ubuntu2
> >> > 192.168.2.56   ubuntu3
> >> >
> >> > # The following lines are desirable for IPv6 capable hosts
> >> > ::1 ip6-localhost ip6-loopback
> >> > fe00::0 ip6-localnet
> >> > ff00::0 ip6-mcastprefix
> >> > ff02::1 ip6-allnodes
> >> > ff02::2 ip6-allrouters
> >> >
> >> >
> >>
> ==
> >> > I am able to telnet to localhost:9000, 127.0.0.1:9000,
> myhostname:9000,
> >> > but
> >> > if i am trying to connect to my ip which is 1982.168.2.125 its not
> >> > connecting : its saying connection reffused.
> >> >
> >> > What method should follow to achieve this(connect to HBase running on
> >> > another pc on the same network). any tutorial link will be
> appreciated.
> >> >
> >>
> >
> >
> >
> > --
> > Shashwat Shriparv
> >
> >
> >
>
>
> --
> Shashwat Shriparv
>


Re: 0.92 in mvn repository somewhere?

2012-02-15 Thread N Keywal
Can't you use the option -D*skipTests*?

On Wed, Feb 15, 2012 at 5:27 PM, Stack  wrote:

> On Tue, Feb 14, 2012 at 11:18 PM, Ulrich Staudinger
>  wrote:
> > Hi St.Ack,
> >
> > i don't wanna be a pain in the back, but any progress on this?
> >
>
> You are not being a pain.
>
> I'm fumbling the mvn publishing, repeatedly.  It's a little
> embarrassing, which is why I'm not talking too much about it (smile).
>
> To publish to maven, we need to build ~3 (perhaps 4) times.  Each
> build takes ~two hours.  They can fail on an odd flakey test.  Also,
> maven release can fail w/ an error code 1 and that's all she wrote, so I
> try a few things to try and get over the error code 1.. it doesn't
> always happen (then I restart the two-hour build).  I'm doing this
> task in the background so I forget about it from time to time (until you
> email me above).
>
> I promise to doc all I do to get it up there this time.  I half did it
> last time: http://hbase.apache.org/book.html#mvn_repo  Also, our build
> gets more sane in the next versions, taking 1/4 of the time.
>
> Sorry its taking so long,
> St.Ack
>


Re: HBase-0.92.0 removed HBaseClusterTestCase, is there any replacement for this class

2012-03-07 Thread N Keywal
Hi,

It's replaced by HBaseTestingUtility.

Cheers,

N.

2012/3/8 lulynn_2008 

>  Hi All,
> I am integrating flume-0.9.4 with hbase-0.92.0. And I find hbase-0.92.0
> removed HBaseClusterTestCase which is used in flume-0.9.4.
> My question is:
> Is there any replacement for HBaseClusterTestCase?
>
> Thank you.


RE: Java Programming and Hbase

2012-03-12 Thread N Keywal
You will need the hadoop jar for this. Hbase uses hadoop for common stuff
like the configuration you've seen, so even a simple client needs it.

N.
On 12 Mar 2012 12:06, "Mahdi Negahi"  wrote:

>
> Is it necessary to install hadoop for hbase, if I want to use HBase on my
> laptop and use it via Java?
>
> > Date: Mon, 12 Mar 2012 10:43:44 +0100
> > Subject: Re: Java Programming and Hbase
> > From: khi...@googlemail.com
> > To: user@hbase.apache.org
> >
> > you also need to import hadoop.jar, since hbase runs on hadoop
> >
> >
> >
> > On Mon, Mar 12, 2012 at 9:45 AM, Mahdi Negahi  >wrote:
> >
> > >
> > > Dear Friends
> > >
> > >
> > > I am trying to write a simple application in Java to manipulate my Hbase
> > > table, so I read this post and tried to follow it.
> > >
> > > http://hbase.apache.org/docs/current/api/index.html
> > >
> > > I use eclipse and added hbase-0.92.0.jar as an external jar file for my
> > > project, but I have a problem with the first line of the guideline. The
> > > following code line
> > > Configuration config = HBaseConfiguration.create();
> > >
> > > gives the following error:
> > >
> > > The type org.apache.hadoop.conf.Configuration cannot be resolved. It is
> > > indirectly referenced from required .class files
> > >
> > > and the Configuration package that eclipse wants to add to my project is
> > >
> > > import javax.security.auth.login.Configuration;
> > >
> > > I think it is not the appropriate package.
> > >
> > > Please advise me and refer me to a new guideline.
> > >
>


Re: Java Programming and Hbase

2012-03-12 Thread N Keywal
Only the jar files. They are already in the hbase distribution (i.e. if you
download hbase, you get the hadoop jar files you need). You just need to
import them into your IDE project.


On Mon, Mar 12, 2012 at 1:05 PM, Mahdi Negahi wrote:

>
> I am so confused. Must I install Hadoop, or use only the jar files?
>
> > Date: Mon, 12 Mar 2012 12:46:09 +0100
> > Subject: RE: Java Programming and Hbase
> > From: nkey...@gmail.com
> > To: user@hbase.apache.org
> >
> > You will need the hadoop jar for this. Hbase uses hadoop for common stuff
> > like the configuration you've seen, so even a simple client needs it.
> >
> > N.
> > On 12 Mar 2012 12:06, "Mahdi Negahi"  wrote:
> >
> > >
> > > Is it necessary to install hadoop for hbase, if  want use Hbase in my
> > > laptop and use it via Java ?
> > >
> > > > Date: Mon, 12 Mar 2012 10:43:44 +0100
> > > > Subject: Re: Java Programming and Hbase
> > > > From: khi...@googlemail.com
> > > > To: user@hbase.apache.org
> > > >
> > > > you also need to import hadoop.jar, since hbase runs on hahoop
> > > >
> > > >
> > > >
> > > > On Mon, Mar 12, 2012 at 9:45 AM, Mahdi Negahi <
> negahi.ma...@hotmail.com
> > > >wrote:
> > > >
> > > > >
> > > > > Dear Friends
> > > > >
> > > > >
> > > > > I try to write a simple application with Java and manipulate my
> Hbase
> > > > > table. so I read this post and try to follow it.
> > > > >
> > > > > http://hbase.apache.org/docs/current/api/index.html
> > > > >
> > > > > I use eclipse and add hbase-092.0.jar as external jar file for my
> > > project.
> > > > > but i have problem in the first line of guideline. the following
> code
> > > line
> > > > > Configuration config = HBaseConfiguration.create();
> > > > >
> > > > > has a following error
> > > > >
> > > > > The type org.apache.hadoop.conf.Configuration cannot be resolved.
> It is
> > > > > indirectly referenced from required .class files
> > > > >
> > > > > and Configuration's package that eclipse want to add to my project
> is
> > > > >
> > > > > import javax.security.auth.login.Configuration;
> > > > >
> > > > > i think it is not an appropriate package.
> > > > >
> > > > > please advice me and refer me to new guideline.
> > > > >
> > >
>
>


Re: Retrieve Column Family and Column with Java API

2012-03-12 Thread N Keywal
Hi,

Yes and no.
No, because as a table can have millions of columns and these columns can
be different for every row, the only way to get all the columns is to scan
the whole table.
Yes, because if you scan the table you can get the column names. See
Result#getMap: it's organized by family --> qualifier --> version --> value.
And yes, because you can get the column families from the HTableDescriptor.
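
A rough sketch putting the two together (the table name is made up; note
that a full scan can be expensive on a large table):

import java.util.NavigableMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ListColumns {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");

    // The column families are part of the table definition:
    for (HColumnDescriptor family : table.getTableDescriptor().getFamilies()) {
      System.out.println("family: " + family.getNameAsString());
    }

    // The qualifiers can only be discovered by reading the rows:
    ResultScanner scanner = table.getScanner(new Scan());
    for (Result result : scanner) {
      NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> map =
          result.getMap();
      for (byte[] family : map.keySet()) {
        for (byte[] qualifier : map.get(family).keySet()) {
          System.out.println(Bytes.toString(family) + ":"
              + Bytes.toString(qualifier));
        }
      }
    }
    scanner.close();
    table.close();
  }
}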

N.

On Mon, Mar 12, 2012 at 3:10 PM, Mahdi Negahi wrote:

>
> Dear All friends
>
> Is there any way to retrieve a table's column families and columns with
> Java?
>
> For example, I want to scan a table when I only know its name.
>