Integration of Hadoop Streaming with Ruby and HBase
Hi, I just joined the HBase user list, so this is my first question :-) Is there any way I can dump the output of my Ruby MapReduce jobs into HBase directly? In other words, does Hadoop Streaming with Ruby integrate with HBase, the way Pig does via HBaseStorage? Thanks in advance! Regards, Subir
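(Not an answer given in the thread, but one commonly used route, sketched under the assumption that the streaming job can emit tab-separated rows: write TSV output to HDFS with Hadoop Streaming, then load it with HBase's importtsv tool. The paths, script names, and table/column names below are illustrative assumptions.)

```shell
# 1) The Ruby streaming job emits lines of the form: rowkey<TAB>value
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -input /data/in -output /data/tsv-out \
  -mapper map.rb -reducer reduce.rb -file map.rb -file reduce.rb

# 2) Load the TSV into an existing table 'mytable' with column family 'cf'
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:val mytable /data/tsv-out
```

This is a CLI sketch against a running cluster, not a tested script; the direct alternative is writing a Java MapReduce job with TableOutputFormat.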
Re: gc pause killing regionserver
CPU resources were never a problem; munin shows there is enough idle time. Although the graphs might exclude sharp peaks, I'm pretty certain CPU is not the problem. After some experiments it seems that disabling swap is indeed the solution to the aborting-regionserver problem. I reverted the GC settings to be more 'default'; currently that is -XX:+UseConcMarkSweepGC -XX:+UseParNewGC. Overcommit is set to 50%. The consequence of this is that the memory policy must be strictly enforced, to prevent important processes being killed by the OOM killer. Ferdy.

On Fri, Mar 9, 2012 at 5:30 AM, Laxman lakshman...@huawei.com wrote:

Hi Ferdy,

> I'm running regionservers with 2GB heap and the following tuning options:
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=16
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:MaxGCPauseMillis=100

GC with huge heaps may take longer pauses, but in your case the heap is just 2GB, so this doesn't look like a general problem to me.

[Times: user=1.82 sys=0.31, real=51.38 secs]
User - 1.82 seconds - excluding context switches
Real - 51.38 seconds - including context switches

Based on these numbers it looks to me like a CPU under-provisioning problem, so it's worth investigating the following: What is the CPU usage at that point in time? Are there any processes consuming all your CPU resources? Alternatively, you can also try increasing the ZooKeeper session timeout. -- Regards, Laxman

-----Original Message----- From: Gaojinchao [mailto:gaojinc...@huawei.com] Sent: Friday, March 09, 2012 8:29 AM To: user@hbase.apache.org Subject: re: gc pause killing regionserver

We encountered the same thing. We set swappiness to 0, but swap kept being used, so we disabled swap entirely.

-----Original Message----- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On behalf of Jean-Daniel Cryans Sent: Friday, March 9, 2012 6:29 To: user@hbase.apache.org Subject: Re: gc pause killing regionserver

When real CPU time is bigger than user CPU time, it very often points to swapping.
Even if you think you turned that off or that there's no possible way you could be swapping, check it again. It could also be that your CPUs were busy doing something else; I've seen crazy context-switching CPUs freezing up my nodes, but in my experience that's not very likely. Setting swappiness to 0 just means the kernel is not going to page anything out until it really needs to, meaning it is still possible to swap. The only way to guarantee no swapping whatsoever is to give your system 0 swap space. Regarding that promotion failure, you could try reducing the eden size: try -Xmn128m. J-D

On Sat, Mar 3, 2012 at 5:05 AM, Ferdy Galema ferdy.gal...@kalooga.com wrote:

Hi, I'm running regionservers with 2GB heap and the following tuning options:
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=16
-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
-XX:MaxGCPauseMillis=100

A regionserver aborted (YouAreDeadException) and this was printed in the GC logs (everything is shown up until the abort):

211663.516: [GC 211663.516: [ParNew: 118715K->13184K(118912K), 0.0445390 secs] 1373940K->1289814K(2233472K), 0.0446420 secs] [Times: user=0.14 sys=0.01, real=0.05 secs]
211663.686: [GC 211663.686: [ParNew: 118912K->13184K(118912K), 0.0594280 secs] 1395542K->1310185K(2233472K), 0.0595420 secs] [Times: user=0.15 sys=0.00, real=0.06 secs]
211663.869: [GC 211663.869: [ParNew: 118790K->13184K(118912K), 0.0434820 secs] 1415792K->1331317K(2233472K), 0.0435930 secs] [Times: user=0.13 sys=0.01, real=0.04 secs]
211667.598: [GC 211667.598: [ParNew (promotion failed): 118912K->118912K(118912K), 0.0225390 secs]211667.621: [CMS: 1330845K->1127914K(2114560K), 51.3610670 secs] 1437045K->1127914K(2233472K), [CMS Perm : 20680K->20622K(34504K)], 51.3838170 secs] [Times: user=1.82 sys=0.31, real=51.38 secs]
211719.713: [GC 211719.714: [ParNew: 105723K->13184K(118912K), 0.0176130 secs] 1233638K->1149393K(2233472K), 0.0177230 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
211719.851: [GC 211719.852: [ParNew: 118912K->13184K(118912K), 0.0281860 secs] 1255121K->1170269K(2233472K), 0.0282970 secs] [Times: user=0.10 sys=0.01, real=0.03 secs]
211719.993: [GC 211719.993: [ParNew: 118795K->13184K(118912K), 0.0276320 secs] 1275880K->1191268K(2233472K), 0.0277350 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
211720.490: [GC 211720.490: [ParNew: 118912K->13184K(118912K), 0.0624650 secs] 1296996K->1210640K(2233472K), 0.0625560 secs] [Times: user=0.15 sys=0.00, real=0.06 secs]
211720.687: [GC 211720.687: [ParNew: 118702K->13184K(118912K), 0.1651750 secs] 1316159K->1231993K(2233472K), 0.1652660 secs] [Times: user=0.25 sys=0.01, real=0.17 secs]
211721.038: [GC 211721.038: [ParNew: 118912K->13184K(118912K), 0.0952750 secs]
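To gather the thread's remedies in one place, here is an illustrative configuration sketch (file locations and values are assumptions, not settings confirmed by the participants) of disabling swap and using the 'default' CMS flags Ferdy reverted to:

```shell
# Turn swap off entirely -- per J-D, the only guarantee against swapping.
swapoff -a              # also remove swap entries from /etc/fstab to persist

# Weaker alternative: discourage paging (the kernel can still swap
# under real memory pressure, as noted in the thread).
sysctl -w vm.swappiness=0

# GC settings in conf/hbase-env.sh (Ferdy's reverted defaults):
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
# J-D's suggestion for the ParNew promotion failure: a smaller eden.
# export HBASE_OPTS="$HBASE_OPTS -Xmn128m"
```

These commands require root and a real host; treat them as a configuration fragment, not a script to paste blindly.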
Re: Can anyone show me how to construct the HFileReaderV2 object to read the HFile content.
On Fri, Mar 16, 2012 at 9:01 AM, yonghu yongyong...@gmail.com wrote: Hello, can anyone give me an example of how to construct the HFileReaderV2 object to read the HFile content? Thanks!

See http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFile.html#499 St.Ack
Re: Can anyone show me how to construct the HFileReaderV2 object to read the HFile content.
I implemented the code this way. My HBase version is 0.92.0.

Configuration conf = new Configuration();
CacheConfig cconf = new CacheConfig(conf);
FileSystem fs = FileSystem.get(conf);
Path path = new Path("hdfs://localhost:8020/hbase/test/97366a27a98e0f43dfcecf9b63abd630/Course/33b6235defe44f69bda69c04e87fc7a9");
HFile.Reader reader = HFile.createReader(fs, path, cconf);
HFileScanner hscanner = reader.getScanner(false, false, false);
hscanner.seekTo();
KeyValue kv = hscanner.getKeyValue();
System.out.println(kv.toString() + " " + Bytes.toString(kv.getValue()));
while (hscanner.next()) {
    kv = hscanner.getKeyValue();
    System.out.println(kv.toString() + " " + Bytes.toString(kv.getValue()));
}

I have already flushed the data from the memstore to disk, but I got this error: Exception in thread main java.io.IOException: Cannot open filename /hbase/test/97366a27a98e0f43dfcecf9b63abd630/Course/33b6235defe44f69bda69c04e87fc7a9. I can see the results from the command line. I want to know why this code does not work. Regards! Yong

On Fri, Mar 16, 2012 at 5:50 PM, yonghu yongyong...@gmail.com wrote: Thanks for your information. Regards! Yong

On Fri, Mar 16, 2012 at 5:39 PM, Stack st...@duboce.net wrote: See http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFile.html#499 St.Ack
Re: Streaming data processing and hBase
Hi, the way you describe the in-memory caching component, it looks very similar to the HBase memstore. Any reason for not relying on it? N.

On Fri, Mar 16, 2012 at 4:21 PM, Kleegrewe, Christian christian.kleegr...@siemens.com wrote:

Dear all, we are currently working on an architecture for a system that should serve as an archive for 1000+ measuring components that frequently (~30/s) send messages containing measurement values (~300 bytes/message). The archiving system should be capable of serving not only as long-term storage but also as a kind of streaming-data-processing and caching component. There are several functions that should be computed on the incoming data before finally storing it. We suggested an architecture that comprises:

- A message routing component that routes data to calculations and routes calculation results to other components that are interested in these data.
- An in-memory caching component that stores up to 10-20 minutes of data before it is written to the long-term archive.
- An HBase database used for the long-term storage.
- A MapReduce framework for doing analytics on the data stored in the HBase database.

The complete system should be failsafe and reliable with regard to component failures, and it should scale with the number of computers utilized. Are there any suggestions or feedback on this approach from the community? And are there any suggestions on which tools or systems to use for the message routing component and the in-memory cache?
Thanks for any help and suggestions all the best Christian 8--- Siemens AG Corporate Technology Corporate Research and Technologies CT T DE IT3 Otto-Hahn-Ring 6 81739 Munich, Germany Tel.: +49 89 636-42722 Fax: +49 89 636-41423 mailto:christian.kleegr...@siemens.com Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Gerhard Cromme; Managing Board: Peter Loescher, Chairman, President and Chief Executive Officer; Roland Busch, Brigitte Ederer, Klaus Helmrich, Joe Kaeser, Barbara Kux, Hermann Requardt, Siegfried Russwurm, Peter Y. Solmssen, Michael Suess; Registered offices: Berlin and Munich, Germany; Commercial registries: Berlin Charlottenburg, HRB 12300, Munich, HRB 6684; WEEE-Reg.-No. DE 23691322
Re: Problem accessing table.jsp page in HBase
Hi Steven, thanks for your reply. The folder is present under the HBase home directory. Here is the output of ls -lRt run on that folder:

[root@namenode1 hbase]# ls -lRt hbase-webapps/
hbase-webapps/:
total 12
drwxr-xr-x. 2 root root 4096 Jan 20 08:26 static
drwxr-xr-x. 3 root root 4096 Jan 20 08:26 regionserver
drwxr-xr-x. 3 root root 4096 Jan 20 08:26 master

hbase-webapps/static:
total 8
-rwxr-xr-x. 1 root root  545 Oct 13 20:31 hbase.css
-rwxr-xr-x. 1 root root 3592 Oct 13 20:31 hbase_logo_med.gif

hbase-webapps/regionserver:
total 8
drwxr-xr-x. 2 root root 4096 Mar 16 09:58 WEB-INF
-rwxr-xr-x. 1 root root   56 Oct 13 20:31 index.html

hbase-webapps/regionserver/WEB-INF:
total 4
-rwxr-xr-x. 1 root root  722 Oct 13 20:32 web.xml

hbase-webapps/master:
total 8
drwxr-xr-x. 2 root root 4096 Jan 20 08:26 WEB-INF
-rwxr-xr-x. 1 root root   60 Oct 13 20:31 index.html

hbase-webapps/master/WEB-INF:
total 4
-rwxr-xr-x. 1 root root 1444 Oct 13 20:32 web.xml

Can you provide the ls -lRt output of your hbase-webapps folder, or point out if any of the files are missing on my cluster? Thanks, Anil Gupta

On Wed, Mar 14, 2012 at 11:40 PM, steven zhuang zhuangxin8...@gmail.com wrote: Hi, have you checked whether there is a hbase-webapps/ directory under your HBase home dir? I had the same problem; I built HBase cdh3u3 myself and somehow the directory was missing when I checked out the code from SVN. I fixed it by copying the directory to the right spot.

-- Thanks & Regards, Anil Gupta
ANN: HBase 0.90.6 is available for download
Hi All, your HBase crew are pleased to announce the release of HBase 0.90.6. Download it from your favourite Apache mirror [1]. HBase 0.90.6 is a bug-fix version on the 0.90 branch, with a handful of issues fixed in it. For the complete list of changes, see the release notes [2]. Yours, The HBasistas. P.S. A big thanks to all those who contributed to this 0.90.6 release! 1. http://www.apache.org/dyn/closer.cgi/hbase/ 2. https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12319200
Re: Can anyone show me how to construct the HFileReaderV2 object to read the HFile content.
I noticed the problem: somehow I lost the data from HDFS. The code is OK. Regards! Yong

On Fri, Mar 16, 2012 at 5:59 PM, yonghu yongyong...@gmail.com wrote:

> I implemented the code this way. My HBase version is 0.92.0.
>
> Configuration conf = new Configuration();
> CacheConfig cconf = new CacheConfig(conf);
> FileSystem fs = FileSystem.get(conf);
> Path path = new Path("hdfs://localhost:8020/hbase/test/97366a27a98e0f43dfcecf9b63abd630/Course/33b6235defe44f69bda69c04e87fc7a9");
> HFile.Reader reader = HFile.createReader(fs, path, cconf);
> HFileScanner hscanner = reader.getScanner(false, false, false);
> hscanner.seekTo();
> KeyValue kv = hscanner.getKeyValue();
> System.out.println(kv.toString() + " " + Bytes.toString(kv.getValue()));
> while (hscanner.next()) {
>     kv = hscanner.getKeyValue();
>     System.out.println(kv.toString() + " " + Bytes.toString(kv.getValue()));
> }
>
> I have already flushed the data from the memstore to disk, but I got this error: Exception in thread main java.io.IOException: Cannot open filename /hbase/test/97366a27a98e0f43dfcecf9b63abd630/Course/33b6235defe44f69bda69c04e87fc7a9. I can see the results from the command line. I want to know why this code does not work. Regards! Yong
Re: HBase Region splitting many times.
Does anybody have an answer to this? Please let me know. Thanks, Harshad

On Mar 15, 2012, at 11:12 PM, hdev ml wrote:

Hi, we are using HBase version 0.90.3 in a 2-node cluster. Maybe this question has been asked too many times, but I could not find a good answer. I created a test table with one column family cf with 2 columns a and b, each having a 3000-character-long string as its value. Maximum versions allowed is 3 and maxfilesize is at the default 256M. In a loop, I put 10 rows into it, with 3000-character-long values for both a and b. The row key is incremental, like row to row0009. I applied an outer loop which runs the above 10-row put loop 10 times. After running it 10 times, I found that the table had split into the following number of regions after each run:

Run  Regions
1    4
2    5
3    7
4    10
5    13
6    19
7    19
8    19
9    19
10   19

The question is: why did it stabilize after the 6th run? Shouldn't it stabilize after 3 runs, because the number of versions is 3? After 3 runs it should not split further, because new versions are being added but old versions should be purged/deleted. Is that a correct statement? Any help is really appreciated. Thanks, Harshad
Re: HBase Region splitting many times.
On Fri, Mar 16, 2012 at 10:40 AM, hdev ml hde...@gmail.com wrote:
> Does anybody have an answer to this?

Is there a hurry? Have you tried gathering more data about it?

> I created a test table with one column family cf with 2 columns a and b, each having a 3000-character-long string as its value. Maximum versions allowed is 3 and maxfilesize is at the default 256M. In a loop, I put 10 rows into it, with 3000-character-long values for both a and b. The row key is incremental, like row to row0009. I applied an outer loop which runs the above 10-row put loop 10 times. After running it 10 times, I found that the table had split into the following number of regions after each run:
>
> Run  Regions
> 1    4
> 2    5
> 3    7
> 4    10
> 5    13
> 6    19
> 7    19
> 8    19
> 9    19
> 10   19
>
> Question is, why did it stabilize after the 6th run? Shouldn't it stabilize

If you let it settle down, does it split later? It might just be that it was getting behind on compactions.

> after 3 runs, because the number of versions is 3? After 3 runs, it should not split further, because new versions are being added but old versions should be purged/deleted. Is that a correct statement?

No, unless you got lucky and the major compactions ran during the import; but even then, it will only run 24h after a region is created.

J-D
Re: HBase Region splitting many times.
Thanks J-D for the answers. My answers in *bold* below.

On Fri, Mar 16, 2012 at 10:52 AM, Jean-Daniel Cryans jdcry...@apache.org wrote:

> Is there a hurry? Have you tried gathering more data about it?

*I am doing some capacity planning and wanted an answer for this. Apologies if I sounded too pushy. Yes, I went over the documentation and searched Google, but could not find a reference to this particular problem.*

> If you let it settle down, does it split later? It might just be that it was getting behind on compactions.

*Yes, I let it settle down for 2 days. I ran major_compact from the shell and that did nothing. It did not split later either.*

> No, unless you got lucky and the major compactions ran during the import; but even then, it will only run 24h after a region is created.

*As I mentioned, I already ran major compaction with no positive results.*
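For anyone reproducing this experiment, a minimal HBase shell session matching the described setup might look as follows (the table and family names are the poster's; this is a sketch, not the original script):

```shell
hbase shell
create 'test', {NAME => 'cf', VERSIONS => 3}
# ... run the put loops from a client ...
flush 'test'           # push memstore contents out to store files
major_compact 'test'   # drops versions beyond 3 from the store files
```

Note that a major compaction reclaims space inside each region, but regions are never automatically merged back in this HBase generation, which is consistent with the region count plateauing at 19 rather than shrinking.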
Re: Can anyone show me how to construct the HFileReaderV2 object to read the HFile content.
On Fri, Mar 16, 2012 at 10:12 AM, yonghu yongyong...@gmail.com wrote:
> I noticed the problem: somehow I lost the data from HDFS. The code is OK. Regards!

Why write code? The hfile tool, http://hbase.apache.org/book.html#hfile_tool2, gives a pretty rich summary of file content (or you can ask it to dump it for you). St.Ack
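For reference, the hfile tool invocation for the store file mentioned earlier in the thread would look roughly like this (the path is taken from the thread; run against a cluster with that file present):

```shell
hbase org.apache.hadoop.hbase.io.hfile.HFile -v -p -m \
  -f /hbase/test/97366a27a98e0f43dfcecf9b63abd630/Course/33b6235defe44f69bda69c04e87fc7a9
# -v verbose output, -p print key/value pairs, -m print the file's metadata
```

Running the class with no arguments prints its full usage, including -r to scan every store file of a region at once.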
Re: Streaming data processing and hBase
Hi Christian, it's a component internal to HBase, so you don't have to use it directly. See http://hbase.apache.org/book/wal.html for how writes are handled by HBase to ensure reliability and data distribution. Cheers, N.

On Fri, Mar 16, 2012 at 7:39 PM, Kleegrewe, Christian christian.kleegr...@siemens.com wrote:

Hi, is this memstore replicated? Since we store a significant amount of data in the memory cache, we need a replicated solution. Also, I can't find much information besides the Java API doc for the MemStore class. I will continue searching for this, but if you have any URL with more documentation, please send it. Thanks in advance, regards, Christian

-----Original Message----- From: N Keywal [mailto:nkey...@gmail.com] Sent: Friday, March 16, 2012 18:02 To: user@hbase.apache.org Subject: Re: Streaming data processing and hBase

Hi, the way you describe the in-memory caching component, it looks very similar to the HBase memstore. Any reason for not relying on it? N.

On Fri, Mar 16, 2012 at 4:21 PM, Kleegrewe, Christian christian.kleegr...@siemens.com wrote: Dear all, we are currently working on an architecture for a system that should serve as an archive for 1000+ measuring components that frequently (~30/s) send messages containing measurement values (~300 bytes/message).
The archiving system should be capable of serving not only as long-term storage but also as a kind of streaming-data-processing and caching component. There are several functions that should be computed on the incoming data before finally storing it. We suggested an architecture that comprises: a message routing component that routes data to calculations and routes calculation results to other components interested in these data; an in-memory caching component that stores up to 10-20 minutes of data before it is written to the long-term archive; an HBase database used for the long-term storage; and a MapReduce framework for doing analytics on the data stored in the HBase database. The complete system should be failsafe and reliable with regard to component failures, and it should scale with the number of computers utilized. Are there any suggestions or feedback on this approach from the community? And are there any suggestions on which tools or systems to use for the message routing component and the in-memory cache? Thanks for any help and suggestions. All the best, Christian