Re: HBase- Hive Integration
Hi all,

Thank you for the replies. I am looking for more suggestions.

-- *THANKS REGARDS,* G.SAI PAVAN, CCDH4 CERTIFIED, Ph: 8121914494, *www.bigdatatrendz.com http://www.bigdatatrendz.com* linkedin profile http://in.linkedin.com/pub/gadde-sai-pavan/38/44b/453/ HYDERABAD.

On Fri, Mar 14, 2014 at 11:59 PM, Nick Dimiduk ndimi...@gmail.com wrote:

hbase-dev to bcc; adding hive-user. This is a question for the user lists, and more for Hive's than HBase's, as HBaseStorageHandler is code in the Hive project, not HBase.

Hi Sai,

You are embarking into a brave world. Because your aim is interop across these different Apache projects, I highly recommend using a vendor's distribution. The reason is that the vendor has done the work of certifying interoperability for you, so you won't be left finding bugs in the edges that aren't well tested in the communities. Notice that in the blog post referenced, I was using a distribution and even then I found interop issues.

Hive 0.11 will require you to manually specify the HBase jars. Please don't copy them around; instead, use environment variables. You can follow along with my blog post and set HADOOP_CLASSPATH and HIVE_AUX_JARS_PATH appropriately. You're using HBase 0.96.1, which includes HBASE-8438, so you can use `hbase mapredcp` instead of itemizing jars manually. This requirement (hopefully) goes away entirely in Hive 0.13 with HIVE-2379 and HIVE-2055.

Finally, are you using Hive on the CLI or through a web tool (Hue?). The details change slightly based on all these... details.

Good luck. Keep asking questions. Please file bugs.

Thanks, Nick

On Thu, Mar 13, 2014 at 10:56 PM, Sai Pavan Gadde id4gpa...@gmail.com wrote:

Hi, Here is my document in an attached file. It contains my flow of Hive-HBase integration.
1. I created a table in MySQL and imported it into HBase.
2. I copied hive-handler.jar, guava.jar and zookeeper.jar into the Hadoop-2.2.0/share/mapred/lib folder.
3. I copied the required Hive (0.11.0) jars to HBase (0.96.1.1-hadoop2) and the HBase jars to Hive.
4. Trying to create a table in Hive with HBase properties fails with *FAILED: Error in metadata: MetaException(message:java.io.IOException: java.lang.reflect.InvocationTargetException*

I think the hbase-0.96.1.1-hadoop2.jar file is not present in the HBase directory. Please suggest a further step to proceed.

On Thu, Mar 13, 2014 at 7:12 PM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote:

Hi Sai, If you can show us what specific exception you are seeing, we would be able to help you out in a better way.

Thanks, Swarnim

On Thu, Mar 13, 2014 at 7:44 AM, Sai Pavan Gadde id4gpa...@gmail.com wrote:

Hi, Thanks for the reply. I have worked with hadoop-1.2.1, which works fine, but my real requirement is Hadoop-2.2.0, i.e. YARN. Please show me a way to solve this with the Apache releases.

I installed HBase-0.96.1.1-hadoop2 on top of hadoop-2.2.0, and installed hive-0.11.0 too. I copied hive-hbase-handler-0.11.0, guava.jar and zookeeper.jar to hadoop-2.2.0. I created hbase-0.96.1.1-hadoop2.jar (which is absent in this version) and it worked, but it does not connect with Hive. I was supposed to work with hadoop-0.94.x, but that version is not compatible with hadoop-2.2.0. I have tried every way I can and have not found a solution; please advise.

On Thu, Mar 13, 2014 at 4:20 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:

Hi, We have successfully created an external Hive table pointing to an HBase table, with settings for Hive, ZooKeeper and HBase security tokens. If it is a jar problem, then most likely the environment is not set correctly. Hive should be able to recognize HBaseStorageHandler.

Thanks, Rekha
http://hortonworks.com/blog/using-hive-to-interact-with-hbase-part-2/

On 3/13/14, 10:15 AM, Sai Pavan Gadde id4gpa...@gmail.com wrote:

Hi, I am using hadoop-2.2.0, HBASE-0.96.1.1-hadoop2 and hive-0.11.0 for a project. I have a requirement to integrate HBase with Hive, so that tables are reflected between the two. While doing this, I got an InvocationTargetException error. I believe this is because a jar file is missing. Please suggest how to overcome the problem.
[jira] [Created] (HBASE-10770) Don't exit from the Canary daemon mode if no regions are present
Matteo Bertozzi created HBASE-10770: --- Summary: Don't exit from the Canary daemon mode if no regions are present Key: HBASE-10770 URL: https://issues.apache.org/jira/browse/HBASE-10770 Project: HBase Issue Type: Bug Affects Versions: 0.96.1.1, 0.98.1, 0.99.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Trivial Attachments: HBASE-10770-v0.patch In regionserver mode the canary exits if there are no region servers {code}hbase o.a.h.h.tool.Canary -daemon -regionserver REGION_NAME{code} while in table mode the canary waits until the table is up {code}hbase o.a.h.h.tool.Canary -daemon TABLE_NAME{code} Remove the exit from the region server mode, and keep waiting until the -t timeout expires or the region shows up -- This message was sent by Atlassian JIRA (v6.2#6252)
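The behaviour requested in the issue, to keep polling until the target shows up or the timeout elapses instead of exiting at once, could look roughly like the sketch below. This is not the Canary's code or the attached patch: `WaitForTarget`, `waitUntilPresent`, the `BooleanSupplier` check and the poll interval are hypothetical names chosen for illustration.

```java
import java.util.function.BooleanSupplier;

final class WaitForTarget {
    /**
     * Polls `present` until it reports true or `timeoutMs` has elapsed.
     * Returns true if the target showed up, false on timeout or interrupt.
     * (Hypothetical helper; the real Canary is structured differently.)
     */
    static boolean waitUntilPresent(BooleanSupplier present, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!present.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing the target
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return true;
    }
}
```

A daemon loop would then call something like `waitUntilPresent(check, timeout, interval)` and only give up when it returns false.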
setMaxResultSize method in Scan
Hello, I could not find the method setMaxResultSize(long m) (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html) in my Scan class (version 0.94.13). Can anyone help me? Thanks, Weiping
Re: setMaxResultSize method in Scan
There is no such method on Scan in 0.94.x. If you want to set the max result size for a scan, you can do so by setting the hbase.client.scanner.max.result.size configuration, whose default is Long.MAX_VALUE (no limit).
Re: Re: setMaxResultSize method in Scan
Thanks. I had assumed that setMaxResultSize is equivalent to the SQL LIMIT clause, which is specified each time a SQL statement is executed. With hbase.client.scanner.max.result.size, the limit on the number of rows returned can only apply to all scanner instances at once. I am wondering why setMaxResultSize has been removed. -- Mit freundlichen Grüßen / Kind Regards *Weiping Qu* University of Kaiserslautern Department of Computer Science Heterogeneous Information Systems Group P.O. Box 3049 67653 Kaiserslautern, Germany Email: qu (at) informatik.uni-kl.de Phone: +49 631 205 3264 Fax: +49 631 205 3299 Room: 36/331
[jira] [Created] (HBASE-10772) Use ByteRanges instead of ByteBuffers in BlockCache
ramkrishna.s.vasudevan created HBASE-10772: -- Summary: Use ByteRanges instead of ByteBuffers in BlockCache Key: HBASE-10772 URL: https://issues.apache.org/jira/browse/HBASE-10772 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Try replacing the BBs with Byte Ranges in Block cache. See if this can be done in a pluggable way. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10773) Make use of ByteRanges in HFileBlock instead of ByteBuffers
ramkrishna.s.vasudevan created HBASE-10773: -- Summary: Make use of ByteRanges in HFileBlock instead of ByteBuffers Key: HBASE-10773 URL: https://issues.apache.org/jira/browse/HBASE-10773 Project: HBase Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Replacing BBs with Byte Ranges in block cache as part of HBASE-10772, would help in replacing BBs with BRs in HFileBlock also. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10774) Restore TestMultiTableInputFormat
Liu Shaohui created HBASE-10774: --- Summary: Restore TestMultiTableInputFormat Key: HBASE-10774 URL: https://issues.apache.org/jira/browse/HBASE-10774 Project: HBase Issue Type: Test Reporter: Liu Shaohui Priority: Minor TestMultiTableInputFormat was removed in HBASE-9009 because the test made the CI fail. But in HBASE-10692 we need to add a new test, TestSecureMultiTableInputFormat, which depends on it. So we try to restore it in this issue. I reran the test several times and it passed. {code} Running org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 314.163 sec {code} [~stack] -- This message was sent by Atlassian JIRA (v6.2#6252)
Problem with TScan.StartRow
Hi, I am facing a strange problem with the TScan.StartRow property in the C# Thrift API. I set the values as follows:
scanFilter.StartRow = Encoding.ASCII.GetBytes("25|China|South|China South Branch|Times Square|668|2013|07|21|PM");
scanFilter.EndRow = Encoding.ASCII.GetBytes("25|China|South|China South Branch|Times Square|668|2013|12|21|PM");
The string passed to GetBytes is my row key, but I get no rows back, even though the row key is correct. Please let me know if there is any other way to pass StartRow and EndRow.
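One thing worth checking for a scan like this is whether the stored row keys really fall between the two bounds. HBase orders row keys as unsigned bytes, with the start row inclusive and the stop row exclusive. The sketch below reproduces that comparison in plain Java so candidate keys can be tested offline; `RowRange`, `compare` and `inRange` are hypothetical helper names, not part of the Thrift or HBase APIs, and the sample keys are made up from the pattern in the question.

```java
import java.nio.charset.StandardCharsets;

final class RowRange {
    /** Lexicographic comparison of two byte arrays, treating each byte as unsigned. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // a shorter key sorts before its extensions
    }

    /** True if row lies in [start, stop): start inclusive, stop exclusive. */
    static boolean inRange(String row, String start, String stop) {
        byte[] r = row.getBytes(StandardCharsets.US_ASCII);
        byte[] s = start.getBytes(StandardCharsets.US_ASCII);
        byte[] e = stop.getBytes(StandardCharsets.US_ASCII);
        return compare(r, s) >= 0 && compare(r, e) < 0;
    }
}
```

With this ordering, a key whose month is written as `7` rather than `07` sorts after the `...|2013|12|21|PM` bound and would be skipped, so differences in padding or delimiters between the stored keys and the bounds are a plausible thing to rule out.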
Re: Re: setMaxResultSize method in Scan
This method was introduced by HBASE-2214, which is in 0.96+. Can you upgrade to 0.96 or 0.98? Cheers
Re: Re: setMaxResultSize method in Scan
Thank you for the reply. I will check that. Cheers, Weiping Qu
RE: setMaxResultSize method in Scan
It's in 0.96+. You can use setBatch(limit) to limit the number of key-values transmitted in one RPC call. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com Confidentiality Notice: The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or notificati...@carrieriq.com and delete or destroy any copy of this message and its attachments.
RE: Re: setMaxResultSize method in Scan
A LIMIT clause in a SQL SELECT statement makes sense because it allows the query optimizer to plan accordingly. It does not make sense in HBase, since there is no query planner or optimization involved in scanning an HBase table. You can easily mimic this functionality (the limit, I mean) on the client side. Best regards, Vladimir Rodionov
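The client-side limit suggested above amounts to stopping iteration after N results. A minimal sketch in plain Java, assuming the results arrive through an ordinary `Iterator` (an HBase `ResultScanner` is iterable, so the same loop shape should carry over); `ClientSideLimit` and `take` are hypothetical names for illustration:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class ClientSideLimit {
    /** Pulls at most `limit` items from the iterator and then stops reading. */
    static <T> List<T> take(Iterator<T> results, int limit) {
        List<T> out = new ArrayList<>();
        // The size check comes first so nothing extra is fetched once the limit is reached.
        while (out.size() < limit && results.hasNext()) {
            out.add(results.next());
        }
        return out;
    }
}
```

With a real scanner, the caller would also close it once the limit is reached so the server can release its resources.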
RE: Region server slowdown
I think 0.90.6 reached EOL a couple of years ago. The best you can do right now is to start planning an upgrade to the latest stable 0.94 or 0.96.

Best regards, Vladimir Rodionov

From: Salabhanjika S [salabhanji...@gmail.com]
Sent: Monday, March 17, 2014 2:55 AM
To: dev@hbase.apache.org
Subject: Re: Region server slowdown

@Devs, please respond if you can provide me some hints on this problem.

Did some more analysis. While going through the code in the stack trace I noticed something sub-optimal. This may not be the root cause of our slowdown, but I felt it may be something worth optimizing or fixing.

HBase is making a call to Compressor *WITHOUT* a config object. This results in a configuration reload for every call. Should this call pass the existing config object as a parameter, so that the configuration reload (discovery and XML parsing) does not happen so frequently?

http://svn.apache.org/viewvc/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/Compression.java?view=markup
{code}
public Compressor getCompressor() {
  CompressionCodec codec = getCodec(conf);
  if (codec != null) {
    Compressor compressor = CodecPool.getCompressor(codec);
    if (compressor != null) {
{code}

http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CodecPool.java?view=markup
{code}
public static Compressor getCompressor(CompressionCodec codec) {
  return getCompressor(codec, null);
}
{code}

On Fri, Mar 14, 2014 at 1:47 PM, Salabhanjika S salabhanji...@gmail.com wrote:

Thanks for the quick response, Ted.
- Hadoop version is 0.20.2
- Other previous flushes (600MB to 1.5GB) take around 60 to 300 seconds

On Fri, Mar 14, 2014 at 1:21 PM, Ted Yu yuzhih...@gmail.com wrote:

What Hadoop version are you using? Btw, the sentence about previous flushes was incomplete.

Cheers

On Mar 14, 2014, at 12:12 AM, Salabhanjika S salabhanji...@gmail.com wrote:

Devs,

We are using HBase version 0.90.6 in production (please don't complain about the old version; we are in the process of upgrading) and every few weeks we notice a strange problem at arbitrary times: a region server becomes extremely slow. We have to restart the region server once this happens. There is no unique pattern to this problem. It happens on different region servers, different tables/regions and at different times.

Here are the observations and findings from our analysis.
- We are using LZO compression (0.4.10).
- [RS Dashboard] Flush is running for more than 6 hours. It is in "creating writer" status for a long time. Other previous flushes (600MB to 1.5GB) takes
- [Thread dumps] No deadlocks. Flusher thread stack is below. Even the compactor thread is in the same state, in Configuration.loadResource.

regionserver60020.cacheFlusher daemon prio=10 tid=0x7efd016c4800 nid=0x35e9 runnable [0x7efcad9c5000]
java.lang.Thread.State: RUNNABLE
at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
- locked 0x7f02ccc2ef78 (a sun.net.www.protocol.file.FileURLConnection)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
... [cutting down some stack to keep mail compact. all this stack is in com.sun.org.apache.xerces...]
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1259)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1200)
- locked 0x7f014f1543b8 (a org.apache.hadoop.conf.Configuration)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:501)
at com.hadoop.compression.lzo.LzoCodec.getCompressionStrategy(LzoCodec.java:205)
at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:204)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
at
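The concern raised in this thread, that building a fresh configuration on each call repeats the XML load while passing an existing one does not, can be illustrated with a toy model. Nothing below is Hadoop or HBase code, and the thread does not establish that this is the root cause of the slowdown: `ConfReuse`, `FakeConf`, the `codec.strategy` key and its value are hypothetical stand-ins, and the counter simply records how often the pretend parse runs.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfReuse {
    /** Counts how many times the pretend resource parse has run. */
    static int parses = 0;

    /** Stand-in for a configuration object that lazily loads its resources on first get(). */
    static final class FakeConf {
        private Map<String, String> props;

        String get(String key, String dflt) {
            if (props == null) {
                parses++; // first access on this instance: the expensive load
                props = new HashMap<>();
                props.put("codec.strategy", "LZO1X_1"); // placeholder key and value
            }
            return props.getOrDefault(key, dflt);
        }
    }

    /** Builds a new configuration per call, so every call pays for a load. */
    static String strategyWithFreshConf() {
        return new FakeConf().get("codec.strategy", "none");
    }

    /** Reuses the caller's configuration, so only the first call pays for a load. */
    static String strategyWithSharedConf(FakeConf shared) {
        return shared.get("codec.strategy", "none");
    }
}
```

In this model, three calls through `strategyWithFreshConf` trigger three loads, while three calls through `strategyWithSharedConf` on one instance trigger a single load.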
Re: Re: setMaxResultSize method in Scan
I am doing a multi-threaded (100 threads) scan test over HBase. If one request with a given key range matches a large number of corresponding rows in HBase, my request waits for the scan to complete, and the throughput is really low. For test purposes, I'd like to use LIMIT to reduce the time spent scanning and transferring results back from HBase, in order to increase the throughput. Do you think hbase.client.scanner.max.result.size or setMaxResultSize (in bytes) could help HBase stop the scan at the LIMIT before it has scanned all corresponding rows? As you mentioned that there is no query optimizer in HBase, I assume that in this case the region servers will not stop scanning the rows in this key range until they have all the results, and only then limit the results sent to the client to the max size. If so, there is not much I can do to compare the throughput with that of relational databases like MySQL. Thanks, Cheers. Weiping Qu
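On the question of bytes versus rows: as far as I understand it, the max result size caps how many bytes one scanner fetch returns, and the scan simply continues with the next fetch, so it does not act like a row LIMIT. The sketch below is a simplified model of that behaviour under this assumption, not HBase's implementation; `SizeCappedBatch` and `nextBatch` are hypothetical names, and including the row that crosses the cap is a modelling choice.

```java
import java.util.ArrayList;
import java.util.List;

final class SizeCappedBatch {
    /**
     * Collects rows starting at index `from` until the accumulated size reaches maxBytes.
     * The row that crosses the cap is still included, so a batch makes progress
     * even when a single row is larger than the cap.
     */
    static List<byte[]> nextBatch(List<byte[]> rows, int from, long maxBytes) {
        List<byte[]> batch = new ArrayList<>();
        long size = 0;
        for (int i = from; i < rows.size() && size < maxBytes; i++) {
            batch.add(rows.get(i));
            size += rows.get(i).length;
        }
        return batch;
    }
}
```

Five 10-byte rows with a 25-byte cap come back as a batch of three and then a batch of two: the cap changes how the rows are chunked, not how many are returned overall. Cutting the scan short still needs a client-side stop or a narrower key range.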
[jira] [Resolved] (HBASE-10027) CompoundRowPrefixFilter
[ https://issues.apache.org/jira/browse/HBASE-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju resolved HBASE-10027. Resolution: Fixed CompoundRowPrefixFilter --- Key: HBASE-10027 URL: https://issues.apache.org/jira/browse/HBASE-10027 Project: HBase Issue Type: New Feature Components: Filters Affects Versions: 0.89-fb Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Fix For: 0.89-fb Original Estimate: 168h Remaining Estimate: 168h In order to scan a sequence of row prefixes optimally, this filter will provide a hint to the Scanner via the scan query matcher to go to the next prefix after finishing scanning with the current range. -- This message was sent by Atlassian JIRA (v6.2#6252)
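The seek hint described in the issue can be pictured as follows: when the scanner sits on a row that matches none of the wanted prefixes, jump straight to the smallest prefix that sorts after it. The sketch below is only an illustration of that idea, not the 0.89-fb filter; `PrefixSeekHint` and `nextSeek` are hypothetical names, and it uses `String` ordering for readability where HBase would compare raw bytes.

```java
import java.util.TreeSet;

final class PrefixSeekHint {
    /**
     * Given the sorted prefixes of interest and the row the scanner is on,
     * returns the row to continue from, or null when no prefix remains.
     */
    static String nextSeek(TreeSet<String> prefixes, String currentRow) {
        // A row that already matches a prefix needs no jump.
        for (String p : prefixes) {
            if (currentRow.startsWith(p)) {
                return currentRow;
            }
        }
        // Otherwise skip ahead to the smallest prefix sorting after the current row.
        return prefixes.higher(currentRow);
    }
}
```

For prefixes `a1` and `c7`, a scanner on `a2x` would be told to seek to `c7`, skipping everything in between.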
[jira] [Resolved] (HBASE-10437) Integrating CompoundRowPrefixFilter with RowPrefixBloomFilter
[ https://issues.apache.org/jira/browse/HBASE-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju resolved HBASE-10437. Resolution: Fixed Integrating CompoundRowPrefixFilter with RowPrefixBloomFilter - Key: HBASE-10437 URL: https://issues.apache.org/jira/browse/HBASE-10437 Project: HBase Issue Type: New Feature Components: Scanners Affects Versions: 0.89-fb Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Priority: Minor Fix For: 0.89-fb Original Estimate: 72h Remaining Estimate: 72h Adding the changes to Filter which can be used to incorporate the bloom filter optimizations into the CompoundRowPrefixFilter. Having the context of the bloom filters from inside the filter gives a lot of benefit in terms of performance. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HBASE-9828) Fix the reflection call in SequenceFileLogReader
[ https://issues.apache.org/jira/browse/HBASE-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manukranth Kolloju resolved HBASE-9828. --- Resolution: Fixed Fix the reflection call in SequenceFileLogReader Key: HBASE-9828 URL: https://issues.apache.org/jira/browse/HBASE-9828 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Priority: Trivial Fix For: 0.89-fb Original Estimate: 24h Remaining Estimate: 24h In SequenceFileLogReader, Class.getMethod() fails to get a private method in the class. So converting it to Class.getDeclaredMethod(). -- This message was sent by Atlassian JIRA (v6.2#6252)
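The difference the issue relies on is standard Java reflection behaviour: `Class.getMethod()` only finds public methods, while `Class.getDeclaredMethod()` also finds private ones declared on the class itself. A small self-contained demonstration follows; `ReflectionDemo`, `Target` and `secret` are made-up names and have nothing to do with SequenceFileLogReader.

```java
import java.lang.reflect.Method;

final class ReflectionDemo {
    static final class Target {
        private String secret() {
            return "ok";
        }
    }

    /** getMethod only sees public members, so the private method is not found. */
    static boolean foundByGetMethod() {
        try {
            Target.class.getMethod("secret");
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** getDeclaredMethod finds the private method; setAccessible permits the call. */
    static String callViaGetDeclaredMethod() {
        try {
            Method m = Target.class.getDeclaredMethod("secret");
            m.setAccessible(true);
            return (String) m.invoke(new Target());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that `getDeclaredMethod()` does not search superclasses, so it has to be called on the class that actually declares the method.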
Re: Re: setMaxResultSize method in Scan
Hi Weiping, Take a look at Apache Phoenix (http://phoenix.incubator.apache.org/). It's a SQL layer on top of HBase and has support for LIMIT and a query planner and optimizer. Thanks, James
Re: 答复: setMaxResultSize method in Scan
Hi James, Thank you for reminding me of that. Cheers, Weiping Hi Weiping, Take a look at Apache Phoenix (http://phoenix.incubator.apache.org/). It's a SQL layer on top of HBase and has support for LIMIT and a query planner and optimizer. Thanks, James On Mon, Mar 17, 2014 at 12:19 PM, Weiping Qu q...@informatik.uni-kl.dewrote: I am doing a mult-thread(100) scan test over hbase. If one request with given key-range matches a large number of correspoding rows in hbase, my request is waiting for this scan to complete. The throughput is really slow. For test purpose, I'd like to use LIMIT to reduce the time on scanning and transferring results back from hbase to increase the throughput. Do you think the hbase.client.scan.max.result.size or setMaxResultSize(in bytes) could help HBase to stop scan at the LIMIT before scanning complete corresponding rows? As you mentioned that there is no query optimizer in HBase, I assume that region servers will not stop scanning the rows in this key-range in this case until it gets all the results and limit the results to max size which is sent to the client. If so, there is not much I can do to compare the throughput with that in relational databases like MySQL. Thanks, Cheers. Limit clause in SQL Select statement makes sense because it allows query optimizer to plan accordingly. It does not make sense in HBase as since there is no query planner and or optimization involved during scanning HBase table. You can easily mimic this functionality on a client side (I mean - limit). Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Weiping Qu [q...@informatik.uni-kl.de] Sent: Monday, March 17, 2014 4:48 AM To: dev@hbase.apache.org Subject: Re: 答复: setMaxResultSize method in Scan Thanks. I'd like to assume that setMaxResultSize is equivalent to the SQL Limit clause as expected which is specified each time a SQL statement is executed . 
Now, through hbase.client.scanner.max.result.size, the limit on the number of rows returned can only be applied to all scanner instances. I am wondering why setMaxResultSize is removed now. No such method for Scan in 0.94.x. If you want to set the max result size for a scan, you can achieve this by setting the hbase.client.scanner.max.result.size configuration, the default for which is Long.MAX_VALUE (no limit) From: Weiping Qu [q...@informatik.uni-kl.de] Sent: March 17, 2014 18:50 To: dev@hbase.apache.org Subject: setMaxResultSize method in Scan Hello, I could not find the method setMaxResultSize(long m) (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html) in my Scan class (0.94.13 version). Can anyone help me? Thanks Weiping -- Mit freundlichen Grüßen / Kind Regards *Weiping Qu* University of Kaiserslautern Department of Computer Science Heterogeneous Information Systems Group P.O. Box 3049 67653 Kaiserslautern, Germany Email: qu (at) informatik.uni-kl.de Phone: +49 631 205 3264 Fax: +49 631 205 3299 Room: 36/331 Confidentiality Notice: The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and/or notificati...@carrieriq.com and delete or destroy any copy of this message and its attachments.
[jira] [Created] (HBASE-10775) Revisit the HBASE-4012 byte compare optimization
stack created HBASE-10775: - Summary: Revisit the HBASE-4012 byte compare optimization Key: HBASE-10775 URL: https://issues.apache.org/jira/browse/HBASE-10775 Project: HBase Issue Type: Task Components: Performance Reporter: stack Some recent findings have it that we undo our HBASE-4012 Unsafe byte compare optimizations. See tail of the original issue for findings that show compare slower if no bytes in common, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
RE: 答复: setMaxResultSize method in Scan
HBase RegionServer does scanning in batches: the client requests the next batch from the server, and the server reads and merges the data from cache/disk. You can control the batch size by setting Scan.setRowCaching(number of rows to send in one RPC request). Technically speaking, this allows you to control LIMIT from the client side. Your overhead will never be larger than the limit set by setRowCaching. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com
Re: 答复: setMaxResultSize method in Scan
bq. Scan.setRowCaching() I think you meant Scan.setCaching()
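The client-side limit discussed in this thread can be sketched with a toy model. This is not the HBase client API: ToyServer, nextBatch and scanWithLimit are hypothetical names invented for illustration, and the caching parameter only plays the role that the thread attributes to Scan.setCaching (rows sent per RPC). The sketch assumes the client simply stops requesting batches once it has enough rows.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a batched scan with a client-side limit; not the HBase API. */
class BatchedScanSketch {

    /** Hypothetical stand-in for a server holding totalRows matching rows. */
    static final class ToyServer {
        final int totalRows;
        int rowsSent = 0;   // rows shipped to the client so far
        int rpcCalls = 0;   // number of "next batch" requests served

        ToyServer(int totalRows) {
            this.totalRows = totalRows;
        }

        /** Returns up to caching row ids starting at offset. */
        List<Integer> nextBatch(int offset, int caching) {
            rpcCalls++;
            List<Integer> batch = new ArrayList<>();
            int end = Math.min(offset + caching, totalRows);
            for (int i = offset; i < end; i++) {
                batch.add(i);
            }
            rowsSent += batch.size();
            return batch;
        }
    }

    /** Keeps at most limit rows, then stops asking the server for more batches. */
    static List<Integer> scanWithLimit(ToyServer server, int caching, int limit) {
        List<Integer> rows = new ArrayList<>();
        int offset = 0;
        while (rows.size() < limit) {
            List<Integer> batch = server.nextBatch(offset, caching);
            if (batch.isEmpty()) {
                break;                      // the server has no more rows
            }
            offset += batch.size();
            for (int row : batch) {
                if (rows.size() == limit) {
                    break;                  // discard the unused tail of the batch
                }
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        ToyServer server = new ToyServer(1_000_000);
        List<Integer> rows = scanWithLimit(server, 100, 250);
        System.out.println(rows.size() + " rows kept, " + server.rowsSent
                + " rows transferred in " + server.rpcCalls + " requests");
    }
}
```

With a batch size of 100 and a limit of 250, the model transfers 300 rows in 3 requests, so the extra rows stay below one batch, which matches the claim that the overhead is bounded by the caching value.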
Re: Region server slowdown
Hi, Agreed with Vladimir. I doubt anybody will spend the time to debug the issue. It would be easier if you can upgrade your HBase cluster. You will have to upgrade your Hadoop cluster as well. You should go with 0.96.x/0.98.x and either Hadoop-2.2 or Hadoop-2.3. Check out the HBase book for the upgrade process. Enis On Mon, Mar 17, 2014 at 11:19 AM, Vladimir Rodionov vrodio...@carrieriq.com wrote: I think 0.90.6 reached EOL a couple of years ago. The best you can do right now is start planning an upgrade to the latest stable 0.94 or 0.96. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Salabhanjika S [salabhanji...@gmail.com] Sent: Monday, March 17, 2014 2:55 AM To: dev@hbase.apache.org Subject: Re: Region server slowdown @Devs, please respond if you can provide me some hints on this problem. Did some more analysis. While going through the code in the stack trace I noticed something sub-optimal. This may not be the root cause of our slowdown, but I felt it may be something worth optimizing/fixing. HBase is making a call to Compressor *WITHOUT* a config object. This results in a configuration reload for every call. Should this call pass the existing config object as a parameter so that the configuration reload (discovery xml parsing) does not happen so frequently?
http://svn.apache.org/viewvc/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/Compression.java?view=markup {code} public Compressor getCompressor() { CompressionCodec codec = getCodec(conf); if (codec != null) { Compressor compressor = CodecPool.getCompressor(codec); if (compressor != null) { {code} http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CodecPool.java?view=markup {code} public static Compressor getCompressor(CompressionCodec codec) { return getCompressor(codec, null); } {code} On Fri, Mar 14, 2014 at 1:47 PM, Salabhanjika S salabhanji...@gmail.com wrote: Thanks for the quick response, Ted. - Hadoop version is 0.20.2 - Other previous flushes (600MB to 1.5GB) take around 60 to 300 seconds On Fri, Mar 14, 2014 at 1:21 PM, Ted Yu yuzhih...@gmail.com wrote: What Hadoop version are you using? Btw, the sentence about previous flushes was incomplete. Cheers On Mar 14, 2014, at 12:12 AM, Salabhanjika S salabhanji...@gmail.com wrote: Devs, We are using hbase version 0.90.6 (please don't complain of old version. we are in process of upgrading) in our production and we are noticing a strange problem that occurs arbitrarily every few weeks. Region server goes extremely slow. We have to restart the Region Server once this happens. There is no unique pattern to this problem. This happens on different region servers, different tables/regions and at different times. Here are the observations and findings from our analysis. - We are using LZO compression (0.4.10). - [RS Dashboard] Flush is running for more than 6 hours. It is in creating writer status for long time. Other previous flushes (600MB to 1.5GB) takes - [Thread dumps] No deadlocks. Flusher thread stack.
Even compactor thread is in same state Configuration.loadResource regionserver60020.cacheFlusher daemon prio=10 tid=0x7efd016c4800 nid=0x35e9 runnable [0x7efcad9c5000] java.lang.Thread.State: RUNNABLE at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70) at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161) - locked 0x7f02ccc2ef78 (a sun.net.www.protocol.file.FileURLConnection) at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653) ... [cutting down some stack to keep mail compact. all this stack is in com.sun.org.apache.xerces...] at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284) at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1259) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1200) - locked 0x7f014f1543b8 (a org.apache.hadoop.conf.Configuration) at org.apache.hadoop.conf.Configuration.get(Configuration.java:501) at com.hadoop.compression.lzo.LzoCodec.getCompressionStrategy(LzoCodec.java:205) at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:204) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112) at
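The concern raised in this thread, that calling without a config object leads to a configuration reload on every call, can be illustrated with a toy model. Nothing here is Hadoop or HBase code: ToyConfig, reinit and parsesFor are hypothetical names, and the sketch assumes (as the thread suggests, without confirming it) that a missing config is replaced by a fresh one that must parse its resources on first lookup.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of config reuse versus config reload; not Hadoop's Configuration. */
class ConfigReuseSketch {

    /** Counts how many times the simulated resource parsing ran. */
    static int parseCount = 0;

    /** Hypothetical configuration that loads its properties lazily. */
    static final class ToyConfig {
        private Map<String, String> props;   // null until the first lookup

        String get(String key, String defaultValue) {
            if (props == null) {
                // Stand-in for the expensive step (locating and parsing XML files).
                props = new HashMap<>();
                props.put("toy.compression.strategy", "fast");
                parseCount++;
            }
            return props.getOrDefault(key, defaultValue);
        }
    }

    /** Hypothetical compressor setup that reads one setting from a config. */
    static String reinit(ToyConfig conf) {
        // Assumption of this sketch: a null config is replaced by a brand new one.
        ToyConfig effective = (conf != null) ? conf : new ToyConfig();
        return effective.get("toy.compression.strategy", "default");
    }

    /** Runs calls setups and reports how many parses they triggered. */
    static int parsesFor(int calls, boolean reuseConfig) {
        parseCount = 0;
        ToyConfig shared = reuseConfig ? new ToyConfig() : null;
        for (int i = 0; i < calls; i++) {
            reinit(shared);
        }
        return parseCount;
    }

    public static void main(String[] args) {
        System.out.println("without shared config: " + parsesFor(1000, false) + " parses");
        System.out.println("with shared config:    " + parsesFor(1000, true) + " parses");
    }
}
```

In this model, 1000 setups without a shared config trigger 1000 parses, while passing one existing config triggers a single parse. Whether the real code path behaves this way on a given Hadoop version would need to be checked against its source.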
[jira] [Created] (HBASE-10776) Separate HConnectionManager into several parts
Yi Deng created HBASE-10776: --- Summary: Separate HConnectionManager into several parts Key: HBASE-10776 URL: https://issues.apache.org/jira/browse/HBASE-10776 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.89-fb Reporter: Yi Deng Priority: Minor Fix For: 0.89-fb HConnectionManager is too large to effectively maintain. This Jira records some refactoring jobs: 1. Move TableServers out as a standalone class 2. Move region-locating code as a class -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10777) Restart Embedded Thrift Server in case of failures
Rishit Shroff created HBASE-10777: - Summary: Restart Embedded Thrift Server in case of failures Key: HBASE-10777 URL: https://issues.apache.org/jira/browse/HBASE-10777 Project: HBase Issue Type: Bug Reporter: Rishit Shroff Priority: Trivial -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10778) Unique keys accounting in MultiThreadedReader is incorrect
Ted Yu created HBASE-10778: -- Summary: Unique keys accounting in MultiThreadedReader is incorrect Key: HBASE-10778 URL: https://issues.apache.org/jira/browse/HBASE-10778 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Assignee: Ted Yu TestMiniClusterLoad* tests fail in 10070 branch. Here is one example: {code} java.lang.AssertionError: expected:<0> but was:<7> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential.runLoadTestOnExistingTable(TestMiniClusterLoadSequential.java:139) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
RE: 答复: setMaxResultSize method in Scan
Sure. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com
The 1st HBase 0.98.1 release candidate (RC0) is available
The 1st HBase 0.98.1 release candidate (RC0) is available for download at http://people.apache.org/~apurtell/0.98.1RC0/ and Maven artifacts are also available in the temporary repository https://repository.apache.org/content/repositories/orgapachehbase-1007. Signed with my code signing key D5365CCD. The issues resolved in this release can be found here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12325664 This vote will run for 14 days given that a few RCs have stacked up this month. Please try out the candidate and vote +1/-1 by midnight Pacific Time (00:00 -0800 GMT) on February 31 on whether or not we should release this as 0.98.1. -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: The 1st HBase 0.98.1 release candidate (RC0) is available
Will give RC a spin. bq. on February 31 Possibly a typo: February -> March. Cheers
[VOTE] The 1st HBase 0.98.1 release candidate (RC0) is available
Adding VOTE tag to subject and a clarification (pardon the typo): This vote will run for 14 days given that a few RCs have stacked up this month. Please try out the candidate and vote +1/-1 by midnight Pacific Time (00:00 -0800 GMT) on March 31 on whether or not we should release this as 0.98.1. On Mon, Mar 17, 2014 at 4:42 PM, Andrew Purtell apurt...@apache.org wrote: The 1st HBase 0.98.1 release candidate (RC0) is available for download at http://people.apache.org/~apurtell/0.98.1RC0/ and Maven artifacts are also available in the temporary repository https://repository.apache.org/content/repositories/orgapachehbase-1007. Signed with my code signing key D5365CCD. The issues resolved in this release can be found here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12325664 This vote will run for 14 days given that a few RCs have stacked up this month. Please try out the candidate and vote +1/-1 by midnight Pacific Time (00:00 -0800 GMT) on February 31 on whether or not we should release this as 0.98.1. -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
[jira] [Resolved] (HBASE-10778) Unique keys accounting in MultiThreadedReader is incorrect
[ https://issues.apache.org/jira/browse/HBASE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-10778. Resolution: Fixed Unique keys accounting in MultiThreadedReader is incorrect -- Key: HBASE-10778 URL: https://issues.apache.org/jira/browse/HBASE-10778 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Assignee: Ted Yu Fix For: hbase-10070 Attachments: 10778-v1.txt TestMiniClusterLoad* tests fail in 10070 branch. Here is one example: {code} java.lang.AssertionError: expected:<0> but was:<7> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential.runLoadTestOnExistingTable(TestMiniClusterLoadSequential.java:139) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HBASE-10735) [WINDOWS] Set -XX:MaxPermSize for unit tests
[ https://issues.apache.org/jira/browse/HBASE-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar resolved HBASE-10735. --- Resolution: Fixed [WINDOWS] Set -XX:MaxPermSize for unit tests Key: HBASE-10735 URL: https://issues.apache.org/jira/browse/HBASE-10735 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Trivial Fix For: 0.96.2, 0.99.0, 0.98.2 Attachments: hbase-10735_v1.patch, hbase-10735_v2.patch The tests on windows fail with java.lang.OutOfMemoryError: PermGen space. We have -XX:MaxPermSize=100m for tests on linux. We should just set it the same in windows. -- This message was sent by Atlassian JIRA (v6.2#6252)
Introducing libhbase (C APIs for Apache HBase)
Hi, Pursuant to the JIRAs HBASE-10168 (https://issues.apache.org/jira/browse/HBASE-10168), HBASE-9977 (https://issues.apache.org/jira/browse/HBASE-9977) and HBASE-9835 (https://issues.apache.org/jira/browse/HBASE-9835), I am happy to announce that the first draft of a JNI-based implementation of C APIs for HBase is now available for your review. The source and the instructions to build and use it are available at MapR's GitHub repository http://goo.gl/dE5tzB. The slides from my presentation on the same can be downloaded from the meetup site http://goo.gl/nfXx9f. I will put the patches on the respective JIRAs shortly. Regards, aditya...
Re: Region server slowdown
Thanks Rodinov Enis for responding. I agree with you that we need to upgrade. As I mentioned in my first mail, we are in process of upgrade. We are using hbase version 0.90.6 (please don't complain of old version. we are in process of upgrading) - Suboptimal (as per me) code snippets I posted in followup mail holds good for trunk as well. - I strongly feel this issue has something to do with HBase version. I verified the code paths of the stack I posted. I don't see any significant changes in current version in this code (Flusher - getCompressor). On Tue, Mar 18, 2014 at 2:30 AM, Enis Söztutar enis@gmail.com wrote: Hi Agreed with Vladimir. I doubt anybody will spend the time to debug the issue. It would be easier if you can upgrade your HBase cluster. Also you will have to upgrade your Hadoop cluster as well. You should go with 0.96.x/0.98.x and either Hadoop-2.2 or Hadoop2.3. Check out the Hbase book for the upgrade process. Enis On Mon, Mar 17, 2014 at 11:19 AM, Vladimir Rodionov vrodio...@carrieriq.com wrote: I think, 0.90.6 has reached EOL a couple years ago. The best you can do right now is start planning upgrading to the latest stable 0.94 or 0.96. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Salabhanjika S [salabhanji...@gmail.com] Sent: Monday, March 17, 2014 2:55 AM To: dev@hbase.apache.org Subject: Re: Region server slowdown @Devs, please respond if you can provide me some hints on this problem. Did some more analysis. While going through the code in stack track I noticed something sub-optimal. This may not be a root cause of our slowdown but I felt it may be some thing worthy to optimize/fix. HBase is making a call to Compressor *WITHOUT* config object. This is resulting in configuration reload for every call. Should this be calling with existing config object as a parameter so that configuration reload (discovery xml parsing) will not happen so frequently? 
http://svn.apache.org/viewvc/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/Compression.java?view=markup {code} 309 public Compressor getCompressor() { 310 CompressionCodec codec = getCodec(conf); 311 if (codec != null) { 312 Compressor compressor = CodecPool.getCompressor(codec); 313 if (compressor != null) { {code} http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CodecPool.java?view=markup {code} 162 public static Compressor getCompressor(CompressionCodec codec) { 163 return getCompressor(codec, null); 164 } {code} On Fri, Mar 14, 2014 at 1:47 PM, Salabhanjika S salabhanji...@gmail.com wrote: Thanks for quick response Ted. - Hadoop version is 0.20.2 - Other previous flushes (600MB to 1.5GB) takes around 60 to 300 seconds On Fri, Mar 14, 2014 at 1:21 PM, Ted Yu yuzhih...@gmail.com wrote: What Hadoop version are you using ? Btw, the sentence about previous flushes was incomplete. Cheers On Mar 14, 2014, at 12:12 AM, Salabhanjika S salabhanji...@gmail.com wrote: Devs, We are using hbase version 0.90.6 (please don't complain of old version. we are in process of upgrading) in our production and we are noticing a strange problem arbitrarily for every few weeks. Region server goes extremely slow. We have to restart Region Server once this happens. There is no unique pattern of this problem. This happens on different region servers, different tables/regions and different times. Here are observations findings from our analysis. - We are using LZO compression (0.4.10). - [RS Dashboard] Flush is running for more than 6 hours. It is in creating writer status for long time. Other previous flushes (600MB to 1.5GB) takes - [Thread dumps] No deadlocks. Flusher thread stack. 
Even the compactor thread is in the same state (Configuration.loadResource).
regionserver60020.cacheFlusher daemon prio=10 tid=0x7efd016c4800 nid=0x35e9 runnable [0x7efcad9c5000]
   java.lang.Thread.State: RUNNABLE
	at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
	at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
	- locked 0x7f02ccc2ef78 (a sun.net.www.protocol.file.FileURLConnection)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
	... [cutting down some stack to keep mail compact. all this stack is in com.sun.org.apache.xerces...]
	at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
	at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308)
	at
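The concern raised in this thread is that looking up a compressor without a config object forces a configuration reload on every call. The following is only a minimal sketch of that pattern, not the Hadoop or HBase code: `SketchConfig`, `SketchCodecPool` and `ConfigReloadSketch` are hypothetical stand-ins, and the assumption that a missing config leads to a fresh load each time is the reporter's hypothesis rather than something verified here.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an expensive, XML-backed configuration object. */
final class SketchConfig {
    /** Counts how many times a configuration was (notionally) loaded. */
    static final AtomicInteger LOADS = new AtomicInteger();

    SketchConfig() {
        // Stands in for resource discovery plus XML parsing.
        LOADS.incrementAndGet();
    }
}

/** Hypothetical stand-in for a codec pool; this is NOT the Hadoop CodecPool API. */
final class SketchCodecPool {
    private SketchCodecPool() {}

    /** One-argument overload: forwards null, so a new config is built per call. */
    static String getCompressor(String codec) {
        return getCompressor(codec, null);
    }

    /** Two-argument overload: reuses the caller's config when one is supplied. */
    static String getCompressor(String codec, SketchConfig conf) {
        SketchConfig effective = (conf != null) ? conf : new SketchConfig();
        return codec + "-compressor@" + System.identityHashCode(effective);
    }
}

final class ConfigReloadSketch {
    private ConfigReloadSketch() {}

    /** Number of config loads when every call omits the config object. */
    static int loadsWithoutConf(int calls) {
        SketchConfig.LOADS.set(0);
        for (int i = 0; i < calls; i++) {
            SketchCodecPool.getCompressor("lzo");
        }
        return SketchConfig.LOADS.get();
    }

    /** Number of config loads when one shared config is passed to every call. */
    static int loadsWithSharedConf(int calls) {
        SketchConfig.LOADS.set(0);
        SketchConfig shared = new SketchConfig(); // loaded once, up front
        for (int i = 0; i < calls; i++) {
            SketchCodecPool.getCompressor("lzo", shared);
        }
        return SketchConfig.LOADS.get();
    }

    public static void main(String[] args) {
        System.out.println("loads without conf:    " + loadsWithoutConf(1000));
        System.out.println("loads with shared conf: " + loadsWithSharedConf(1000));
    }
}
```

In this sketch the first helper performs one load per call, while the second performs a single load regardless of the number of calls. Whether the real code path behaves the same way depends on the Hadoop and codec versions in use.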
[jira] [Created] (HBASE-10779) Doc hadoop1 deprecated in 0.98 and NOT supported in hbase 1.0
stack created HBASE-10779: - Summary: Doc hadoop1 deprecated in 0.98 and NOT supported in hbase 1.0 Key: HBASE-10779 URL: https://issues.apache.org/jira/browse/HBASE-10779 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Do first two bullet items from parent issue adding doc to our hadoop support matrix. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10780) HFilePrettyPrinter#processFile should return immediately if file does not exists.
Ashish Singhi created HBASE-10780: - Summary: HFilePrettyPrinter#processFile should return immediately if file does not exists. Key: HBASE-10780 URL: https://issues.apache.org/jira/browse/HBASE-10780 Project: HBase Issue Type: Bug Components: HFile Affects Versions: 0.94.11 Reporter: Ashish Singhi Priority: Minor HFilePrettyPrinter#processFile should return immediately if the file does not exist, the same as HLogPrettyPrinter#run.
{code}
if (!fs.exists(file)) {
  System.err.println("ERROR, file doesnt exist: " + file);
}
{code}
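The behaviour requested in HBASE-10780 can be sketched as an early return after the existence check. This is a minimal illustration only: it uses `java.io.File` as a stand-in for the Hadoop `FileSystem` used by the real tool, and `ProcessFileSketch`, its boolean return value, and the "Scanning" message are hypothetical, not taken from HFilePrettyPrinter.

```java
import java.io.File;
import java.io.IOException;

final class ProcessFileSketch {
    private ProcessFileSketch() {}

    /**
     * Returns true if the file was processed, false if it was skipped
     * because it does not exist.
     */
    static boolean processFile(File file) {
        if (!file.exists()) {
            System.err.println("ERROR, file doesnt exist: " + file);
            return false; // early return: nothing below runs for a missing file
        }
        // Placeholder for the real work (opening and scanning the file).
        System.out.println("Scanning " + file + " (" + file.length() + " bytes)");
        return true;
    }

    public static void main(String[] args) throws IOException {
        File present = File.createTempFile("hfile-sketch", ".tmp");
        try {
            processFile(present);
            processFile(new File(present.getPath() + ".missing"));
        } finally {
            present.delete();
        }
    }
}
```

Without the `return`, execution would fall through to the processing step and fail later with a less helpful error, which appears to be the situation the issue describes.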