Integration of Hadoop Streaming with Ruby and HBase

2012-03-16 Thread Subir S
Hi,

I just joined HBase user list. So this is my first question :-)

Is there any way I can dump the output of my Ruby MapReduce jobs into
HBase directly? In other words, does Hadoop Streaming with Ruby integrate
with HBase, the way Pig has HBaseStorage?

Thanks in advance!

Regards
Subir


Re: gc pause killing regionserver

2012-03-16 Thread Ferdy Galema
CPU resources were never a problem; munin shows there is enough idle time.
Although the graphs might miss sharp peaks, I'm pretty certain CPU is
not the issue.

After some experiments it seems that disabling swap is indeed the
solution to the aborting-regionserver problem. I reverted the GC settings
to more 'default' values; currently that is -XX:+UseConcMarkSweepGC
-XX:+UseParNewGC. Overcommit is set to 50%. The consequence of this is
that the memory policy must be strictly enforced, to prevent important
processes from being killed by the OOM-killer.
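If "overcommit is set to 50%" refers to the Linux strict-overcommit mode, the corresponding sysctl settings would look roughly like this (an illustration only; which knobs were actually set is not stated in the thread):

```
# /etc/sysctl.conf (Linux) -- hypothetical values matching the description above
vm.swappiness = 0          # discourage paging, but does not prevent it entirely
vm.overcommit_memory = 2   # strict accounting: refuse allocations over the commit limit
vm.overcommit_ratio = 50   # commit limit = swap + 50% of physical RAM
```

With strict accounting, the memory footprints of the daemons on a node (regionserver heap, DataNode, TaskTracker) must together stay under the commit limit, which is what "the memory policy must be strictly enforced" implies.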

Ferdy.



On Fri, Mar 9, 2012 at 5:30 AM, Laxman lakshman...@huawei.com wrote:

 Hi Ferdy,

   I'm running regionservers with a 2GB heap and the following tuning options:
   -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=16
   -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
   -XX:MaxGCPauseMillis=100

 GC with huge heaps may incur long pauses.

 But in your case, the heap is just 2GB. It doesn't look like a general
 problem to me.

 [Times: user=1.82 sys=0.31, real=51.38 secs]

 User - 1.82 seconds - actual CPU time consumed
 Real - 51.38 seconds - wall-clock time, including time spent waiting off-CPU

 Based on these numbers, it looks to me like a problem with CPU
 underprovisioning.
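The user-vs-real gap discussed above can be checked mechanically when eyeballing many GC logs. A stdlib-only sketch (the GcTimes class is hypothetical, not part of HBase or the JDK):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls user/sys/real out of a GC log
// "[Times: ...]" stanza so the real-vs-user gap can be spotted.
public class GcTimes {
    private static final Pattern P = Pattern.compile(
        "user=([0-9.]+)\\s+sys=([0-9.]+),\\s+real=([0-9.]+)");

    /** Returns {user, sys, real} in seconds for one GC log line. */
    public static double[] parse(String line) {
        Matcher m = P.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no [Times: ...] stanza: " + line);
        }
        return new double[] {
            Double.parseDouble(m.group(1)),  // user CPU
            Double.parseDouble(m.group(2)),  // sys CPU
            Double.parseDouble(m.group(3))   // wall clock
        };
    }

    public static void main(String[] args) {
        double[] t = parse("[Times: user=1.82 sys=0.31, real=51.38 secs]");
        // Wall-clock time far exceeding user+sys CPU time is the classic
        // signature of the JVM being stalled (swapping or CPU starvation).
        System.out.printf("user=%.2f real=%.2f ratio=%.1f%n",
            t[0], t[2], t[2] / (t[0] + t[1]));
    }
}
```

For the pause in this thread the ratio is roughly 24x, i.e. the JVM spent most of those 51 seconds waiting rather than collecting.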

 So, it's worth investigating the following:

 - What is CPU usage at that point of time?
 - Are there any processes consuming all your CPU resources?

 Alternatively, you can also try increasing the zookeeper session timeout.
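 For reference, that timeout is the zookeeper.session.timeout property in
 hbase-site.xml (milliseconds). The value below is only an illustration,
 not a recommendation:

```xml
<property>
  <!-- How long ZooKeeper waits before declaring a silent regionserver dead.
       Raising it tolerates longer GC pauses at the cost of slower failure
       detection. -->
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
```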
 --
 Regards,
 Laxman

  -Original Message-
  From: Gaojinchao [mailto:gaojinc...@huawei.com]
  Sent: Friday, March 09, 2012 8:29 AM
  To: user@hbase.apache.org
  Subject: re: gc pause killing regionserver
 
  We encountered the same thing. We set the swappiness priority to 0, but
  swap was still being used, so we disabled swap.
 
  -Original Message-
  From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel
  Cryans
  Sent: March 9, 2012 6:29
  To: user@hbase.apache.org
  Subject: Re: gc pause killing regionserver
 
  When real cpu is bigger than user cpu it very often points to
  swapping. Even if you think you turned that off or that there's no
  possible way you could be swapping, check it again.
 
  It could also be that your CPUs were busy doing something else; I've
  seen crazy context-switching CPUs freezing up my nodes, but in my
  experience it's not very likely.
 
  Setting swappiness to 0 just means it's not going to page anything out
  until it really needs to do it, meaning it's possible to swap. The
  only way to guarantee no swapping whatsoever is giving your system 0
  swap space.
 
  Regarding that promotion failure, you could try reducing the eden
  size. Try -Xmn128m
 
  J-D
 
  On Sat, Mar 3, 2012 at 5:05 AM, Ferdy Galema ferdy.gal...@kalooga.com
  wrote:
   Hi,
  
    I'm running regionservers with a 2GB heap and the following tuning options:
    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=16
    -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
    -XX:MaxGCPauseMillis=100
  
    A regionserver aborted (YouAreDeadException) and this was printed in the GC
    logs (everything is shown up until the abort):
  
    211663.516: [GC 211663.516: [ParNew: 118715K->13184K(118912K), 0.0445390 secs] 1373940K->1289814K(2233472K), 0.0446420 secs] [Times: user=0.14 sys=0.01, real=0.05 secs]
    211663.686: [GC 211663.686: [ParNew: 118912K->13184K(118912K), 0.0594280 secs] 1395542K->1310185K(2233472K), 0.0595420 secs] [Times: user=0.15 sys=0.00, real=0.06 secs]
    211663.869: [GC 211663.869: [ParNew: 118790K->13184K(118912K), 0.0434820 secs] 1415792K->1331317K(2233472K), 0.0435930 secs] [Times: user=0.13 sys=0.01, real=0.04 secs]
    211667.598: [GC 211667.598: [ParNew (promotion failed): 118912K->118912K(118912K), 0.0225390 secs]211667.621: [CMS: 1330845K->1127914K(2114560K), 51.3610670 secs] 1437045K->1127914K(2233472K), [CMS Perm : 20680K->20622K(34504K)], 51.3838170 secs] [Times: user=1.82 sys=0.31, real=51.38 secs]
    211719.713: [GC 211719.714: [ParNew: 105723K->13184K(118912K), 0.0176130 secs] 1233638K->1149393K(2233472K), 0.0177230 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
    211719.851: [GC 211719.852: [ParNew: 118912K->13184K(118912K), 0.0281860 secs] 1255121K->1170269K(2233472K), 0.0282970 secs] [Times: user=0.10 sys=0.01, real=0.03 secs]
    211719.993: [GC 211719.993: [ParNew: 118795K->13184K(118912K), 0.0276320 secs] 1275880K->1191268K(2233472K), 0.0277350 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
    211720.490: [GC 211720.490: [ParNew: 118912K->13184K(118912K), 0.0624650 secs] 1296996K->1210640K(2233472K), 0.0625560 secs] [Times: user=0.15 sys=0.00, real=0.06 secs]
    211720.687: [GC 211720.687: [ParNew: 118702K->13184K(118912K), 0.1651750 secs] 1316159K->1231993K(2233472K), 0.1652660 secs] [Times: user=0.25 sys=0.01, real=0.17 secs]
    211721.038: [GC 211721.038: [ParNew: 118912K->13184K(118912K), 0.0952750 secs] 

Re: Can anyone show me how to construct the HFileReaderV2 object to read the HFile content.

2012-03-16 Thread Stack
On Fri, Mar 16, 2012 at 9:01 AM, yonghu yongyong...@gmail.com wrote:
 Hello,

 Can anyone give me an example of how to construct the HFileReaderV2 object to
 read the HFile content?

 Thanks!


See http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFile.html#499

St.Ack


Re: Can anyone show me how to construct the HFileReaderV2 object to read the HFile content.

2012-03-16 Thread yonghu
I implemented the code this way. My HBase version is 0.92.0.

Configuration conf = new Configuration();
CacheConfig cconf = new CacheConfig(conf);
FileSystem fs = FileSystem.get(conf);
Path path = new Path("hdfs://localhost:8020/hbase/test/97366a27a98e0f43dfcecf9b63abd630/Course/33b6235defe44f69bda69c04e87fc7a9");
HFile.Reader reader = HFile.createReader(fs, path, cconf);
HFileScanner hscanner = reader.getScanner(false, false, false);
hscanner.seekTo();
KeyValue kv = hscanner.getKeyValue();
// Bytes.toString() decodes the value bytes; byte[].toString() would only print a reference
System.out.println(kv.toString() + " " + Bytes.toString(kv.getValue()));
while (hscanner.next()) {
    kv = hscanner.getKeyValue();
    System.out.println(kv.toString() + " " + Bytes.toString(kv.getValue()));
}

I have already flushed the data from the memstore to disk, but I
got this error: Exception in thread "main"
java.io.IOException: Cannot open filename
/hbase/test/97366a27a98e0f43dfcecf9b63abd630/Course/33b6235defe44f69bda69c04e87fc7a9.

I can see the results from the command line, so I want to know why
this code does not work.

Regards!

Yong

On Fri, Mar 16, 2012 at 5:50 PM, yonghu yongyong...@gmail.com wrote:
 Thanks for your information.

 Regards!

 Yong

 On Fri, Mar 16, 2012 at 5:39 PM, Stack st...@duboce.net wrote:
 On Fri, Mar 16, 2012 at 9:01 AM, yonghu yongyong...@gmail.com wrote:
 Hello,

  Can anyone give me an example of how to construct the HFileReaderV2 object to
  read the HFile content?

 Thanks!


 See 
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFile.html#499

 St.Ack


Re: Streaming data processing and hBase

2012-03-16 Thread N Keywal
Hi,

The way you describe the in-memory caching component, it looks very
similar to the HBase memstore. Any reason for not relying on it?

N.

On Fri, Mar 16, 2012 at 4:21 PM, Kleegrewe, Christian 
christian.kleegr...@siemens.com wrote:

 Dear all,

 We are currently working on an architecture for a system that should
 serve as an archive for 1000+ measuring components that frequently (~30/s)
 send messages containing measurement values (~300 bytes/message). The
 archiving system should be capable of not only serving as long-term
 storage but also as a kind of streaming data processing and caching
 component. There are several functions that should be computed on the
 incoming data before finally storing it.

 We suggested an architecture that comprises:
 - A message routing component that routes data to calculations and routes
   calculation results to other components that are interested in these data.
 - An in-memory caching component that stores up to 10 - 20 minutes of data
   before it is written to the long-term archive.
 - An HBase database that is used for the long-term storage.
 - A MapReduce framework for doing analytics on the data stored in the HBase
   database.

 The complete system should be fail-safe and reliable with regard to component
 failures, and it should scale with the number of computers that are utilized.

 Are there any suggestions or feedback on this approach from the community?
 And are there any suggestions about which tools or systems to use for the
 message routing component and the in-memory cache?

 Thanks for any help and suggestions

 all the best

 Christian


 8---

 Siemens AG
 Corporate Technology
 Corporate Research and Technologies
 CT T DE IT3
 Otto-Hahn-Ring 6
 81739 Munich, Germany
 Tel.: +49 89 636-42722
 Fax: +49 89 636-41423
 mailto:christian.kleegr...@siemens.com

 Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Gerhard
 Cromme; Managing Board: Peter Loescher, Chairman, President and Chief
 Executive Officer; Roland Busch, Brigitte Ederer, Klaus Helmrich, Joe
 Kaeser, Barbara Kux, Hermann Requardt, Siegfried Russwurm, Peter Y.
 Solmssen, Michael Suess; Registered offices: Berlin and Munich, Germany;
 Commercial registries: Berlin Charlottenburg, HRB 12300, Munich, HRB 6684;
 WEEE-Reg.-No. DE 23691322



Re: Problem accessing table.jsp page in HBase

2012-03-16 Thread anil gupta
Hi Steven,

Thanks for your reply. The folder is present under the hbase home directory.

Here is the output of ls -lRt run on that folder:
[root@namenode1 hbase]# ls -lRt hbase-webapps/
hbase-webapps/:
total 12
drwxr-xr-x. 2 root root 4096 Jan 20 08:26 static
drwxr-xr-x. 3 root root 4096 Jan 20 08:26 regionserver
drwxr-xr-x. 3 root root 4096 Jan 20 08:26 master

hbase-webapps/static:
total 8
-rwxr-xr-x. 1 root root  545 Oct 13 20:31 hbase.css
-rwxr-xr-x. 1 root root 3592 Oct 13 20:31 hbase_logo_med.gif

hbase-webapps/regionserver:
total 8
drwxr-xr-x. 2 root root 4096 Mar 16 09:58 WEB-INF
-rwxr-xr-x. 1 root root   56 Oct 13 20:31 index.html

hbase-webapps/regionserver/WEB-INF:
total 4
-rwxr-xr-x. 1 root root 722 Oct 13 20:32 web.xml

hbase-webapps/master:
total 8
drwxr-xr-x. 2 root root 4096 Jan 20 08:26 WEB-INF
-rwxr-xr-x. 1 root root   60 Oct 13 20:31 index.html

hbase-webapps/master/WEB-INF:
total 4
-rwxr-xr-x. 1 root root 1444 Oct 13 20:32 web.xml

Can you provide the ls -lRt output of your hbase-webapps folder or
point out if any of the files are missing on my cluster.

Thanks,
Anil Gupta


On Wed, Mar 14, 2012 at 11:40 PM, steven zhuang zhuangxin8...@gmail.comwrote:

 hi, have you checked whether there is a hbase-webapps/ directory under your
 hbase home dir? I had the same problem; I built HBase cdh3u3 myself, and
 somehow the directory was missing when I checked out the code from SVN. I
 fixed it by copying the directory to the right spot.





-- 
Thanks & Regards,
Anil Gupta


ANN: HBase 0.90.6 is available for download

2012-03-16 Thread Ramakrishna s vasudevan
Hi All,

Your HBase crew is pleased to announce the release of HBase 0.90.6.

Download it from your favourite Apache mirror [1].

HBase 0.90.6 is a bug-fix version on the 0.90 branch, with a handful of issues 
fixed in it.  For the complete list of changes, see the release notes [2].

Yours
The HBasistas

P.S. A big thanks to all those who contributed to this 0.90.6 release!

1. http://www.apache.org/dyn/closer.cgi/hbase/
2. https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12319200






Re: Can anyone show me how to construct the HFileReaderV2 object to read the HFile content.

2012-03-16 Thread yonghu
I noticed that the problem is that somehow I lost the data from HDFS. The
code is OK.

Regards!

Yong

On Fri, Mar 16, 2012 at 5:59 PM, yonghu yongyong...@gmail.com wrote:
 I implemented the code this way. My HBase version is 0.92.0.

                Configuration conf = new Configuration();
                CacheConfig cconf = new CacheConfig(conf);
                FileSystem fs = FileSystem.get(conf);
                Path path = new Path("hdfs://localhost:8020/hbase/test/97366a27a98e0f43dfcecf9b63abd630/Course/33b6235defe44f69bda69c04e87fc7a9");
                HFile.Reader reader = HFile.createReader(fs, path, cconf);
                HFileScanner hscanner = reader.getScanner(false, false, false);
                hscanner.seekTo();
                KeyValue kv = hscanner.getKeyValue();
                System.out.println(kv.toString() + " " + Bytes.toString(kv.getValue()));
                while (hscanner.next()) {
                        kv = hscanner.getKeyValue();
                        System.out.println(kv.toString() + " " + Bytes.toString(kv.getValue()));
                }

 I have already flushed the data from the memstore to disk, but I
 got this error: Exception in thread "main"
 java.io.IOException: Cannot open filename
 /hbase/test/97366a27a98e0f43dfcecf9b63abd630/Course/33b6235defe44f69bda69c04e87fc7a9.

 I can see the results from the command line, so I want to know why
 this code does not work.

 Regards!

 Yong

 On Fri, Mar 16, 2012 at 5:50 PM, yonghu yongyong...@gmail.com wrote:
 Thanks for your information.

 Regards!

 Yong

 On Fri, Mar 16, 2012 at 5:39 PM, Stack st...@duboce.net wrote:
 On Fri, Mar 16, 2012 at 9:01 AM, yonghu yongyong...@gmail.com wrote:
 Hello,

  Can anyone give me an example of how to construct the HFileReaderV2 object to
  read the HFile content?

 Thanks!


 See 
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFile.html#499

 St.Ack


Re: HBase Region splitting many times.

2012-03-16 Thread hdev ml
Does anybody have an answer to this?

Please let me know.

Thanks
Harshad

On Mar 15, 2012, at 11:12 PM, hdev ml wrote:

  Hi,
 
  We are using HBase version 0.90.3 in a 2 node cluster. Maybe this
 question
  has been asked too many times. But I could not find a good answer for
 this.
 
  I created a test table with one column family "cf" with 2 columns "a" and
  "b", each having a 3000-character-long string as its value. Maximum versions
  allowed is 3 and maxfilesize is at the default 256M.

  In a loop, I put 10 rows into it, with 3000-character-long values for
  both "a" and "b". The row key is incremental, like row to row0009.
 
  I applied an outer loop which runs the above 10-row put loop 10 times.

  After running it 10 times, I found it had split into the following number of
  regions after each run:
 
  Run  Regions
  1    4
  2    5
  3    7
  4    10
  5    13
  6    19
  7    19
  8    19
  9    19
  10   19
 
  The question is, why did it stabilize after the 6th run? Shouldn't it
  stabilize after 3 runs, because the number of versions is 3? After 3 runs it
  should not split further, because new versions are being added but old
  versions should be purged/deleted. Is that a correct statement?
 
  Any help is really appreciated.
 
  Thanks,
  Harshad





Re: HBase Region splitting many times.

2012-03-16 Thread Jean-Daniel Cryans
On Fri, Mar 16, 2012 at 10:40 AM, hdev ml hde...@gmail.com wrote:
 Does anybody have an answer to this?

Is there a hurry? Have you tried gathering more data about it?


  I created a test table with one column family "cf" with 2 columns "a" and
  "b", each having a 3000-character-long string as its value. Maximum versions
  allowed is 3 and maxfilesize is at the default 256M.

  In a loop, I put 10 rows into it, with 3000-character-long values for
  both "a" and "b". The row key is incremental, like row to row0009.
 
  I applied an outer loop which will run the above 10 row put loop, 10
  times.
 
  After running it 10 times, I found that it split into following number
 of
  regions for every run.
 
  Run     Regions
  1            4
  2            5
  3            7
  4           10
  5           13
  6           19
  7           19
  8           19
  9           19
  10          19
 
  Question is, why did it stabilize after the 6th run? Shouldn't it
 stabilize

If you let it settle down, does it split later? It might just be that
it was getting behind on compactions.

  after 3 runs, because number of versions is 3? After 3 runs, It should
 not
  split further, because new versions are being added but old version
 should
  be purged/deleted. Is that a correct statement?

No, unless you got lucky and the major compactions ran during the
import, but even then it will run 24h after a region is created.

J-D


Re: HBase Region splitting many times.

2012-03-16 Thread hdev ml
Thanks J-D for the answers.

My answers in *bold* below.

On Fri, Mar 16, 2012 at 10:52 AM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 On Fri, Mar 16, 2012 at 10:40 AM, hdev ml hde...@gmail.com wrote:
  Does anybody have an answer to this?

 Is there a hurry? Have you tried gathering more data about it?

*I am doing some capacity planning and wanted an answer for this.
Apologies if I sounded too pushy. Yes, I went over the documentation and
did a Google search, but could not find a reference to this particular problem.*

 
   I created a test table with one column family cf with 2 columns a
  and
   b, each having value of a 3000 character long string. Maximum
 versions
   allowed is 3 and maxfilesize is at default 256M.
  
   In a loop, I put 10 rows into it, with 3000 character long values
  for
   both a and b. Row key is incremental like row to row0009.
  
   I applied an outer loop which will run the above 10 row put
 loop, 10
   times.
  
   After running it 10 times, I found that it split into following
 number
  of
   regions for every run.
  
   Run  Regions
   1    4
   2    5
   3    7
   4    10
   5    13
   6    19
   7    19
   8    19
   9    19
   10   19
  
   Question is, why did it stabilize after the 6th run? Shouldn't it
  stabilize

 If you let it settle down, does it split later? It might just be that
 it was getting behind compactions.

*Yes, I let it settle down for 2 days. I ran major_compact from the
shell and that did nothing. It did not split later either.*


   after 3 runs, because number of versions is 3? After 3 runs, It
 should
  not
   split further, because new versions are being added but old version
  should
   be purged/deleted. Is that a correct statement?

 No, unless you got lucky and the major compactions ran during the
 import, but even then it will run 24h after a region is created.

*As I mentioned, I already ran major compaction with no positive
results.*


 J-D



Re: Can anyone show me how to construct the HFileReaderV2 object to read the HFile content.

2012-03-16 Thread Stack
On Fri, Mar 16, 2012 at 10:12 AM, yonghu yongyong...@gmail.com wrote:
 I noticed the problem is that somehow I lost the data from hdfs. The
 code is ok.

 Regards!


Why write code?  The hfile tool,
http://hbase.apache.org/book.html#hfile_tool2, gives a pretty rich
summary of file content (or you can ask it to dump it for you).
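A typical invocation, following the hfile_tool section of the book (the HDFS path is a placeholder for a real store file):

```
$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile \
    -v -p -m -f hdfs://localhost:8020/hbase/<table>/<region>/<family>/<storefile>
```

Here -p prints the key/values, -m prints the file metadata, and -v enables verbose output.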

St.Ack


Re: Streaming data processing and hBase

2012-03-16 Thread N Keywal
Hi Christian,

It's a component internal to HBase, so you don't have to use it directly.
See http://hbase.apache.org/book/wal.html for how writes are handled by
HBase to ensure reliability and data distribution.

Cheers,

N.

On Fri, Mar 16, 2012 at 7:39 PM, Kleegrewe, Christian 
christian.kleegr...@siemens.com wrote:

 Hi

 Is this memstore replicated? Since we store a significant amount of data
 in the memory cache, we need a replicated solution. Also, I can't find much
 information besides a Java API doc for the MemStore class. I will
 continue searching, but if you have any URL with more
 documentation, please send it. Thanks in advance.

 regards

 Christian


 8--
 Siemens AG
 Corporate Technology
 Corporate Research and Technologies
 CT T DE IT3
 Otto-Hahn-Ring 6
 81739 München, Deutschland
 Tel.: +49 89 636-42722
 Fax: +49 89 636-41423
 mailto:christian.kleegr...@siemens.com



 -Original Message-
 From: N Keywal [mailto:nkey...@gmail.com]
 Sent: Friday, March 16, 2012 6:02 PM
 To: user@hbase.apache.org
 Subject: Re: Streaming data processing and hBase

 Hi,

 The way you describe the in memory caching component, it looks very
 similar to HBase memstore. Any reason for not relying on it?

 N.

 On Fri, Mar 16, 2012 at 4:21 PM, Kleegrewe, Christian 
 christian.kleegr...@siemens.com wrote:

  Dear all,
 
  We are currently working on an architecture for a system that should
  serve as an archive for 1000+ measuring components that frequently (~30/s)
  send messages containing measurement values (~300 bytes/message). The
  archiving system should be capable of not only serving as a long term
  storage but also as a kind of streaming data processing and caching
  component. There are several functions that should be computed on the
  incoming data before finally storing it.
 
  We suggested an architecture that comprises:
  A message routing component that could route data to calculations and
  route calculation results to other components that are interested in
 these
  data.
  An in memory caching component that is used for storing up to 10 - 20
  minutes of data before it is written to the long term archive.
  An hBase database that is used for the long term storage.
  MapReduce framework for doing analytics on the data stored in the hBase
  database.
 
  The complete system should be failsafe and reliable regarding component
  failures and it should scale with the number of computers that are
 utilized.
 
  Are there any suggestions or feedback to this approach from the
 community?
  and are there any suggestions which tools or systems to use for the
 message
  routing component and the in memory cache.
 
  Thanks for any help and suggestions
 
  all the best
 
  Christian
 
 
 