Re: HBase- Hive Integration

2014-03-17 Thread Sai Pavan Gadde
Hi all,

Thank you for the replies.

Looking for more suggestions.

-- 
*THANKS & REGARDS,*
G.SAI PAVAN,
CCDH4 CERTIFIED,
Ph: 8121914494,
*www.bigdatatrendz.com*
LinkedIn profile: http://in.linkedin.com/pub/gadde-sai-pavan/38/44b/453/
HYDERABAD.



On Fri, Mar 14, 2014 at 11:59 PM, Nick Dimiduk ndimi...@gmail.com wrote:

 hbase-dev to bcc; adding hive-user. This is a question for the user lists,
 and more for Hive's than HBase's, as HBaseStorageHandler is code in the Hive
 project, not HBase.

 Hi Sai,

 You are embarking into a brave world. Because your aim is the interop
 across these different Apache projects, I highly recommend using a vendor's
 distribution. The reason being that the vendor has done the work of
 certifying interoperability for you, so you won't be left finding bugs in
 the edges that aren't well tested in the communities. Notice in the blog
 post referenced, I was using a distribution and even then I've found
 interop issues.

 Hive 0.11 requires you to manually specify the HBase jars. Please don't
 copy them around, instead use environment variables. You can follow along
 with my blog post and set HADOOP_CLASSPATH and HIVE_AUX_JARS_PATH
 appropriately. You're using HBase 0.96.1, which includes HBASE-8438, so you
 can use `hbase mapredcp` instead of itemizing jars manually. This
 requirement (hopefully) goes away entirely in Hive 0.13 with HIVE-2379
 and HIVE-2055.

 Finally, are you using Hive on the CLI or through a web tool (Hue?). The
 details change slightly based on all these... details.

 Good luck. Keep asking questions. Please file bugs.

 Thanks,
 Nick


 On Thu, Mar 13, 2014 at 10:56 PM, Sai Pavan Gadde id4gpa...@gmail.com
 wrote:

  Hi,
 
  Here is my document in an attached file. It contains my workflow for
  Hive-HBase integration.
  1. I created a table in MySQL and imported it into HBase.
  2. I copied the hive-hbase-handler.jar, guava.jar, and zookeeper.jar files
  into the Hadoop-2.2.0/share/mapred/lib folder.
  3. Copied the required Hive (0.11.0) jars to HBase (0.96.1.1-hadoop2) and
  the HBase jars to Hive.
  4. Tried to create a table in Hive with HBase properties, but I got this
  error:
 
 
  *FAILED: Error in metadata: MetaException(message:java.io.IOException:
  java.lang.reflect.InvocationTargetException*
 
 
  I think the hbase-0.96.1.1-hadoop2.jar file is not present in the HBase
  directory.

  Please suggest the next step to proceed.
 
 
  On Thu, Mar 13, 2014 at 7:12 PM, kulkarni.swar...@gmail.com 
  kulkarni.swar...@gmail.com wrote:
 
  Hi Sai,
 
  If you can show us the specific exception you are seeing, we will be able
  to help you better.
 
  Thanks,
 
 
  On Thu, Mar 13, 2014 at 7:44 AM, Sai Pavan Gadde id4gpa...@gmail.com
  wrote:
 
   Hi,
   Thanks for the reply... but

   I worked with hadoop-1.2.1, which works fine, but the real requirement
   is Hadoop-2.2.0, i.e., YARN.

   Please show me the way to solve this with Apache products.

   I installed HBase-0.96.1.1-hadoop2 on top of hadoop-2.2.0, and installed
   hive-0.11.0 too.

   Copied hive-hbase-handler-0.11.0, guava.jar, and zookeeper.jar to
   hadoop-2.2.0.

   I created hbase-0.96.1.1-hadoop2.jar (which is absent in this version)
   and it worked, but it is not getting connected with Hive. It is supposed
   to work with hbase-0.94.x, but that version is not compatible with
   hadoop-2.2.0.

   I tried every way and found no solution; please advise.
  
  
  
  
   On Thu, Mar 13, 2014 at 4:20 PM, Joshi, Rekha rekha_jo...@intuit.com
   wrote:
  
Hi,
   
We have successfully created an external Hive table pointing to an HBase
table, with settings for Hive, ZooKeeper, and HBase security tokens.

If it is a jar problem, then most likely the env is not set correctly. Hive
should be able to recognize HBaseStorageHandler.
   
Thanks
Rekha
   
   
 http://hortonworks.com/blog/using-hive-to-interact-with-hbase-part-2/
   
   
   
On 3/13/14, 10:15 AM, Sai Pavan Gadde id4gpa...@gmail.com
 wrote:
   
Hi,

I am using hadoop-2.2.0, HBase-0.96.1.1-hadoop2, and hive-0.11.0
for a project. I have a requirement to integrate HBase with Hive, so that
tables are reflected between them.

While doing this, I got an error like InvocationTargetException.
This is because a target jar file is missing. Please suggest how to
overcome the problem.

--
*THANKS & REGARDS,*
G.SAI PAVAN,
CCDH4 CERTIFIED,
Ph: 8121914494,
*www.bigdatatrendz.com*
LinkedIn profile:
   http://in.linkedin.com/pub/gadde-sai-pavan/38/44b/453/
HYDERABAD.
   
   
  
  
   --
   *THANKS & REGARDS,*
   G.SAI PAVAN,
   CCDH4 CERTIFIED,
   Ph: 8121914494,
   *www.bigdatatrendz.com*
   LinkedIn profile:
  http://in.linkedin.com/pub/gadde-sai-pavan/38/44b/453/
   HYDERABAD.
  
 
 
 
  --
  Swarnim
 
 
 
 
  --
  *THANKS & REGARDS,*
  G.SAI PAVAN,
  CCDH4 CERTIFIED,
  Ph: 8121914494,
  

[jira] [Created] (HBASE-10770) Don't exit from the Canary daemon mode if no regions are present

2014-03-17 Thread Matteo Bertozzi (JIRA)
Matteo Bertozzi created HBASE-10770:
---

 Summary: Don't exit from the Canary daemon mode if no regions are 
present
 Key: HBASE-10770
 URL: https://issues.apache.org/jira/browse/HBASE-10770
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.1.1, 0.98.1, 0.99.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Trivial
 Attachments: HBASE-10770-v0.patch

Using the regionserver mode, the canary exits if there are no region servers:
{code}hbase o.a.h.h.tool.Canary -daemon -regionserver REGION_NAME{code}
while in table mode the canary waits until the table is up:
{code}hbase o.a.h.h.tool.Canary -daemon TABLE_NAME{code}

Remove the exit from the region server mode, and keep waiting until the -t
timeout elapses or the region shows up.
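
For illustration, a hedged sketch of the proposed daemon-mode behavior (not the attached patch; fetchRegionServers() is a hypothetical helper):
{code}
// Rather than exiting when the region server list is empty, poll until a
// server shows up or the -t timeout elapses.
boolean waitForRegionServers(long timeoutMs, long intervalMs)
    throws InterruptedException {
  long deadline = System.currentTimeMillis() + timeoutMs;
  while (System.currentTimeMillis() < deadline) {
    if (!fetchRegionServers().isEmpty()) {
      return true;              // a region server appeared; start probing it
    }
    Thread.sleep(intervalMs);   // back off before re-checking
  }
  return false;                 // -t timeout reached with no region servers
}
{code}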



--
This message was sent by Atlassian JIRA
(v6.2#6252)


setMaxResultSize method in Scan

2014-03-17 Thread Weiping Qu

Hello,

I could not find the method setMaxResultSize(long m)
(http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
in my Scan class (version 0.94.13).

Can anyone help me? Thanks

Weiping


Re: setMaxResultSize method in Scan

2014-03-17 Thread 冯宏华
No such method for Scan in 0.94.x.

If you want to set the max result size for a scan, you can achieve this by
setting the hbase.client.scanner.max.result.size configuration, the default
for which is Long.MAX_VALUE (no limit).
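
For illustration, a minimal sketch of that configuration-based approach (assuming the 0.94 Java client API; the table name "t1" and the 2 MB value are placeholders):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class MaxResultSizeExample {
  public static void main(String[] args) throws Exception {
    // Cap the data returned per scanner fetch at ~2 MB instead of the default
    // Long.MAX_VALUE. Note this applies to every scanner created from this
    // Configuration, not to a single Scan.
    Configuration conf = HBaseConfiguration.create();
    conf.setLong("hbase.client.scanner.max.result.size", 2L * 1024 * 1024);

    HTable table = new HTable(conf, "t1");
    ResultScanner scanner = table.getScanner(new Scan());
    scanner.close();
    table.close();
  }
}
{code}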

From: Weiping Qu [q...@informatik.uni-kl.de]
Sent: March 17, 2014 18:50
To: dev@hbase.apache.org
Subject: setMaxResultSize method in Scan

Hello,

I could not find the method setMaxResultSize(long m)
(http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
in my Scan class (version 0.94.13).
Can anyone help me? Thanks

Weiping


Re: setMaxResultSize method in Scan

2014-03-17 Thread Weiping Qu
Thanks.

I'd like to assume that setMaxResultSize is equivalent to the SQL LIMIT
clause, which, as expected, is specified each time a SQL statement is
executed.
Through hbase.client.scanner.max.result.size, however, the limit on the
number of rows returned can only apply to all scanner instances at once.
I am wondering why setMaxResultSize was removed.

 No such method for Scan in 0.94.x.

 If you want to set the max result size for a scan, you can achieve this by 
 setting the hbase.client.scanner.max.result.size configuration, the default 
 for which is Long.MAX_VALUE (no limit)
 
 From: Weiping Qu [q...@informatik.uni-kl.de]
 Sent: March 17, 2014 18:50
 To: dev@hbase.apache.org
 Subject: setMaxResultSize method in Scan

 Hello,

 I could not find the method setMaxResultSize(long m)
 (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
 in my Scan class (version 0.94.13).
 Can anyone help me? Thanks

 Weiping


-- 
Mit freundlichen Grüßen / Kind Regards

*Weiping Qu*

University of Kaiserslautern
Department of Computer Science
Heterogeneous Information Systems Group
P.O. Box 3049
67653 Kaiserslautern, Germany

Email: qu (at) informatik.uni-kl.de
Phone: +49 631 205 3264
Fax: +49 631 205 3299
Room: 36/331


[jira] [Created] (HBASE-10772) Use ByteRanges instead of ByteBuffers in BlockCache

2014-03-17 Thread ramkrishna.s.vasudevan (JIRA)
ramkrishna.s.vasudevan created HBASE-10772:
--

 Summary: Use ByteRanges instead of ByteBuffers in BlockCache
 Key: HBASE-10772
 URL: https://issues.apache.org/jira/browse/HBASE-10772
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan


Try replacing the BBs with ByteRanges in the block cache. See if this can be
done in a pluggable way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10773) Make use of ByteRanges in HFileBlock instead of ByteBuffers

2014-03-17 Thread ramkrishna.s.vasudevan (JIRA)
ramkrishna.s.vasudevan created HBASE-10773:
--

 Summary: Make use of ByteRanges in HFileBlock instead of 
ByteBuffers
 Key: HBASE-10773
 URL: https://issues.apache.org/jira/browse/HBASE-10773
 Project: HBase
  Issue Type: Sub-task
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan


Replacing BBs with ByteRanges in the block cache, as part of HBASE-10772,
would help in replacing BBs with BRs in HFileBlock also.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10774) Restore TestMultiTableInputFormat

2014-03-17 Thread Liu Shaohui (JIRA)
Liu Shaohui created HBASE-10774:
---

 Summary: Restore TestMultiTableInputFormat
 Key: HBASE-10774
 URL: https://issues.apache.org/jira/browse/HBASE-10774
 Project: HBase
  Issue Type: Test
Reporter: Liu Shaohui
Priority: Minor


TestMultiTableInputFormat was removed in HBASE-9009 because it made the CI
fail. But in HBASE-10692 we need to add a new test,
TestSecureMultiTableInputFormat, which depends on it. So we try to restore it
in this issue.

I reran the test several times and it passed.
{code}
Running org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 314.163 sec
{code}

[~stack]



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Problem with TScan.StartRow

2014-03-17 Thread Umesh Chaudhary
Hi,
I am facing a strange problem with the TScan.StartRow property in the C#
Thrift API.
My problem is when I try to give the values as:

scanFilter.StartRow = Encoding.ASCII.GetBytes("25|China|South|China South
Branch|Times Square|668|2013|07|21|PM");
scanFilter.EndRow = Encoding.ASCII.GetBytes("25|China|South|China South
Branch|Times Square|668|2013|12|21|PM");

The string passed to GetBytes is my row key, but I am getting no rows back,
even though the row key is correct.

Please let me know if there is any other way to pass this StartRow and EndRow.




Re: setMaxResultSize method in Scan

2014-03-17 Thread Ted Yu
This method was introduced by HBASE-2214 which is in 0.96+

Can you upgrade to 0.96 or 0.98 ?
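
For reference, a minimal sketch of the 0.96+ per-scan variant (the 1 MB value is an arbitrary illustration):
{code}
import org.apache.hadoop.hbase.client.Scan;

// On 0.96+, the cap can be set per scan rather than globally:
Scan scan = new Scan();
scan.setMaxResultSize(1024L * 1024); // ~1 MB of cells per client fetch
{code}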

Cheers


On Mon, Mar 17, 2014 at 4:48 AM, Weiping Qu q...@informatik.uni-kl.de wrote:

 Thanks.

 I'd like to assume that setMaxResultSize is equivalent to the SQL Limit
 clause as expected which is specified each time a SQL statement is
 executed .
 Now through hbase.client.scanner.max.result.size can the limitation of
 number of row returned only apply to all the scanner instances.
 I am wondering why the setMaxResultSize is removed now.

  No such method for Scan in 0.94.x.
 
  If you want to set the max result size for a scan, you can achieve this
 by setting the hbase.client.scanner.max.result.size configuration, the
 default for which is Long.MAX_VALUE (no limit)
  
 From: Weiping Qu [q...@informatik.uni-kl.de]
 Sent: March 17, 2014 18:50
 To: dev@hbase.apache.org
 Subject: setMaxResultSize method in Scan
 
  Hello,
 
  I could not find the method setMaxResultSize(long m)
  (
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
  in my Scan class (version 0.94.13).
  Can anyone help me? Thanks
 
  Weiping


 --
 Mit freundlichen Grüßen / Kind Regards

 *Weiping Qu*

 University of Kaiserslautern
 Department of Computer Science
 Heterogeneous Information Systems Group
 P.O. Box 3049
 67653 Kaiserslautern, Germany

 Email: qu (at) informatik.uni-kl.de
 Phone: +49 631 205 3264
 Fax: +49 631 205 3299
 Room: 36/331



Re: setMaxResultSize method in Scan

2014-03-17 Thread Weiping Qu
Thank you for the reply.
I will check that.

Cheers
 This method was introduced by HBASE-2214 which is in 0.96+

 Can you upgrade to 0.96 or 0.98 ?

 Cheers


 On Mon, Mar 17, 2014 at 4:48 AM, Weiping Qu q...@informatik.uni-kl.de wrote:

 Thanks.

 I'd like to assume that setMaxResultSize is equivalent to the SQL Limit
 clause as expected which is specified each time a SQL statement is
 executed .
 Now through hbase.client.scanner.max.result.size can the limitation of
 number of row returned only apply to all the scanner instances.
 I am wondering why the setMaxResultSize is removed now.

 No such method for Scan in 0.94.x.

 If you want to set the max result size for a scan, you can achieve this
 by setting the hbase.client.scanner.max.result.size configuration, the
 default for which is Long.MAX_VALUE (no limit)
 
 From: Weiping Qu [q...@informatik.uni-kl.de]
 Sent: March 17, 2014 18:50
 To: dev@hbase.apache.org
 Subject: setMaxResultSize method in Scan

 Hello,

 I could not find the method setMaxResultSize(long m)
 (
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
 in my Scan class (version 0.94.13).
 Can anyone help me? Thanks

 Weiping

 --
 Mit freundlichen Grüßen / Kind Regards

 *Weiping Qu*

 University of Kaiserslautern
 Department of Computer Science
 Heterogeneous Information Systems Group
 P.O. Box 3049
 67653 Kaiserslautern, Germany

 Email: qu (at) informatik.uni-kl.de
 Phone: +49 631 205 3264
 Fax: +49 631 205 3299
 Room: 36/331



-- 
Mit freundlichen Grüßen / Kind Regards

*Weiping Qu*

University of Kaiserslautern
Department of Computer Science
Heterogeneous Information Systems Group
P.O. Box 3049
67653 Kaiserslautern, Germany

Email: qu (at) informatik.uni-kl.de
Phone: +49 631 205 3264
Fax: +49 631 205 3299
Room: 36/331


RE: setMaxResultSize method in Scan

2014-03-17 Thread Vladimir Rodionov
It's in 0.96+.

You can use setBatch(limit) to limit the number of key-values transmitted in
one RPC call.
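
For illustration, a minimal sketch of the two knobs (0.94 client API; the values are arbitrary):
{code}
import org.apache.hadoop.hbase.client.Scan;

// setBatch() caps how many KeyValues of a row come back per Result;
// setCaching() caps how many rows are fetched per RPC round trip.
Scan scan = new Scan();
scan.setBatch(100);  // at most 100 KeyValues per Result
scan.setCaching(50); // at most 50 rows per RPC
{code}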

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Weiping Qu [q...@informatik.uni-kl.de]
Sent: Monday, March 17, 2014 3:50 AM
To: dev@hbase.apache.org
Subject: setMaxResultSize method in Scan

Hello,

I could not find the method setMaxResultSize(long m)
(http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
in my Scan class (version 0.94.13).
Can anyone help me? Thanks

Weiping



RE: setMaxResultSize method in Scan

2014-03-17 Thread Vladimir Rodionov
The LIMIT clause in a SQL SELECT statement makes sense because it allows the
query optimizer to plan accordingly.
It does not make sense in HBase, since there is no query planner or
optimization involved in scanning an HBase table. You can easily mimic this
functionality (I mean, LIMIT) on the client side.
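
For illustration, a minimal client-side LIMIT sketch along those lines (0.94 Java client API; the table name "t1" and the limit value are placeholders):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ClientSideLimit {
  public static void main(String[] args) throws Exception {
    int limit = 100; // the LIMIT we want to mimic
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "t1");
    Scan scan = new Scan();
    scan.setCaching(limit); // don't fetch more rows per RPC than we need
    ResultScanner scanner = table.getScanner(scan);
    try {
      int count = 0;
      for (Result r : scanner) {
        // ... process r ...
        if (++count >= limit) {
          break; // stop once the limit is reached
        }
      }
    } finally {
      scanner.close(); // closing early cancels the server-side scanner
      table.close();
    }
  }
}
{code}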

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Weiping Qu [q...@informatik.uni-kl.de]
Sent: Monday, March 17, 2014 4:48 AM
To: dev@hbase.apache.org
Subject: Re: setMaxResultSize method in Scan

Thanks.

I'd like to assume that setMaxResultSize is equivalent to the SQL Limit
clause as expected which is specified each time a SQL statement is
executed .
Now through hbase.client.scanner.max.result.size can the limitation of
number of row returned only apply to all the scanner instances.
I am wondering why the setMaxResultSize is removed now.

 No such method for Scan in 0.94.x.

 If you want to set the max result size for a scan, you can achieve this by 
 setting the hbase.client.scanner.max.result.size configuration, the default 
 for which is Long.MAX_VALUE (no limit)
 
 From: Weiping Qu [q...@informatik.uni-kl.de]
 Sent: March 17, 2014 18:50
 To: dev@hbase.apache.org
 Subject: setMaxResultSize method in Scan

 Hello,

 I could not find the method setMaxResultSize(long m)
 (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
 in my Scan class (version 0.94.13).
 Can anyone help me? Thanks

 Weiping


--
Mit freundlichen Grüßen / Kind Regards

*Weiping Qu*

University of Kaiserslautern
Department of Computer Science
Heterogeneous Information Systems Group
P.O. Box 3049
67653 Kaiserslautern, Germany

Email: qu (at) informatik.uni-kl.de
Phone: +49 631 205 3264
Fax: +49 631 205 3299
Room: 36/331



RE: Region server slowdown

2014-03-17 Thread Vladimir Rodionov
I think 0.90.6 reached EOL a couple of years ago. The best you can do right
now is to start planning an upgrade to the latest stable 0.94 or 0.96.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Salabhanjika S [salabhanji...@gmail.com]
Sent: Monday, March 17, 2014 2:55 AM
To: dev@hbase.apache.org
Subject: Re: Region server slowdown

@Devs, please respond if you can provide me some hints on this problem.

Did some more analysis. While going through the code in the stack trace I
noticed something sub-optimal.
This may not be the root cause of our slowdown, but I felt it may be
something worth optimizing/fixing.

HBase is making a call to get a Compressor *WITHOUT* a config object. This
results in a configuration reload on every call.
Should this call pass the existing config object as a parameter so that the
configuration reload (discovery & XML parsing) does not happen so
frequently?

http://svn.apache.org/viewvc/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/Compression.java?view=markup
{code}
309 public Compressor getCompressor() {
310 CompressionCodec codec = getCodec(conf);
311 if (codec != null) {
312 Compressor compressor = CodecPool.getCompressor(codec);
313 if (compressor != null) {
{code}

http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CodecPool.java?view=markup
{code}
162 public static Compressor getCompressor(CompressionCodec codec) {
163 return getCompressor(codec, null);
164 }
{code}
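
For illustration, a hedged sketch of the suggested change (not a committed patch): pass the existing Configuration through to the two-argument CodecPool.getCompressor() overload shown above, so the codec reuses it instead of reloading its resources.
{code}
// Sketch only: a getCompressor() variant that takes the caller's conf.
// getCodec() is the private helper in Compression.java quoted above.
public Compressor getCompressor(Configuration conf) {
  CompressionCodec codec = getCodec(conf);
  if (codec != null) {
    // Passing conf avoids the getCompressor(codec, null) path that makes
    // the codec rebuild a Configuration (and re-parse XML) on every call.
    Compressor compressor = CodecPool.getCompressor(codec, conf);
    if (compressor != null) {
      return compressor;
    }
  }
  return null;
}
{code}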

On Fri, Mar 14, 2014 at 1:47 PM, Salabhanjika S salabhanji...@gmail.com wrote:
 Thanks for quick response Ted.

 - Hadoop version is 0.20.2
 - Other previous flushes (600MB to 1.5GB) takes around 60 to 300 seconds

 On Fri, Mar 14, 2014 at 1:21 PM, Ted Yu yuzhih...@gmail.com wrote:
 What Hadoop version are you using ?

 Btw, the sentence about previous flushes was incomplete.

 Cheers

 On Mar 14, 2014, at 12:12 AM, Salabhanjika S salabhanji...@gmail.com wrote:

 Devs,

  We are using HBase version 0.90.6 (please don't complain about the old
  version; we are in the process of upgrading) in our production, and we
  notice a strange problem arbitrarily every few weeks: a region
  server goes extremely slow.
  We have to restart the region server once this happens. There is no unique
  pattern to this problem. It happens on different region servers, on
  different tables/regions, and at different times.

  Here are observations & findings from our analysis.
 - We are using LZO compression (0.4.10).

  - [RS Dashboard] A flush has been running for more than 6 hours. It has
  been in creating writer status for a long time. Other previous flushes
  (600MB to 1.5GB) takes

  - [Thread dumps] No deadlocks. Flusher thread stack below. Even the
  compactor thread is in the same state, Configuration.loadResource:
 regionserver60020.cacheFlusher daemon prio=10 tid=0x7efd016c4800
 nid=0x35e9 runnable [0x7efcad9c5000]
   java.lang.Thread.State: RUNNABLE
at 
 sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
at 
 sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
- locked 0x7f02ccc2ef78 (a
 sun.net.www.protocol.file.FileURLConnection)
at 
 com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
... [cutting down some stack to keep mail compact. all this stack
 is in com.sun.org.apache.xerces...]
at 
 com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308)
at 
 org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1259)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1200)
- locked 0x7f014f1543b8 (a org.apache.hadoop.conf.Configuration)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:501)
at 
 com.hadoop.compression.lzo.LzoCodec.getCompressionStrategy(LzoCodec.java:205)
at 
 com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:204)
at 
 org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
at 
 org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
at 
 org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
at 
 org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
at 
 org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
at 
 

Re: setMaxResultSize method in Scan

2014-03-17 Thread Weiping Qu
I am doing a multi-threaded (100 threads) scan test over HBase.
If one request with a given key range matches a large number of
corresponding rows in HBase, my request waits for this scan to complete.
The throughput is really slow.
For test purposes, I'd like to use LIMIT to reduce the time spent scanning
and transferring results back from HBase, to increase the throughput.
Do you think hbase.client.scanner.max.result.size or
setMaxResultSize (in bytes) could help HBase stop the scan at the LIMIT
before it has scanned all the corresponding rows?

As you mentioned, there is no query optimizer in HBase, so I assume
that region servers will not stop scanning the rows in this key range
until they get all the results, and only then trim the results to the max
size that is sent to the client.
If so, there is not much I can do to compare the throughput with that of
relational databases like MySQL.

Thanks,
Cheers.
 Limit clause in SQL Select statement makes sense because it allows query 
 optimizer to plan accordingly.
 It does not make sense in HBase as since there is no query planner and or 
 optimization involved during
 scanning HBase table. You can easily mimic this functionality on a client 
 side (I mean - limit).

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Weiping Qu [q...@informatik.uni-kl.de]
 Sent: Monday, March 17, 2014 4:48 AM
 To: dev@hbase.apache.org
 Subject: Re: setMaxResultSize method in Scan

 Thanks.

 I'd like to assume that setMaxResultSize is equivalent to the SQL Limit
 clause as expected which is specified each time a SQL statement is
 executed .
 Now through hbase.client.scanner.max.result.size can the limitation of
 number of row returned only apply to all the scanner instances.
 I am wondering why the setMaxResultSize is removed now.

 No such method for Scan in 0.94.x.

 If you want to set the max result size for a scan, you can achieve this by 
 setting the hbase.client.scanner.max.result.size configuration, the 
 default for which is Long.MAX_VALUE (no limit)
 
 From: Weiping Qu [q...@informatik.uni-kl.de]
 Sent: March 17, 2014 18:50
 To: dev@hbase.apache.org
 Subject: setMaxResultSize method in Scan

 Hello,

 I could not find the method setMaxResultSize(long m)
 (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
 in my Scan class (version 0.94.13).
 Can anyone help me? Thanks

 Weiping

 --
 Mit freundlichen Grüßen / Kind Regards

 *Weiping Qu*

 University of Kaiserslautern
 Department of Computer Science
 Heterogeneous Information Systems Group
 P.O. Box 3049
 67653 Kaiserslautern, Germany

 Email: qu (at) informatik.uni-kl.de
 Phone: +49 631 205 3264
 Fax: +49 631 205 3299
 Room: 36/331



-- 
Mit freundlichen Grüßen / Kind Regards

*Weiping Qu*

University of Kaiserslautern
Department of Computer Science
Heterogeneous Information Systems Group
P.O. Box 3049
67653 Kaiserslautern, Germany

Email: qu (at) informatik.uni-kl.de
Phone: +49 631 205 3264
Fax: +49 631 205 3299
Room: 36/331


[jira] [Resolved] (HBASE-10027) CompoundRowPrefixFilter

2014-03-17 Thread Manukranth Kolloju (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manukranth Kolloju resolved HBASE-10027.


Resolution: Fixed

 CompoundRowPrefixFilter
 ---

 Key: HBASE-10027
 URL: https://issues.apache.org/jira/browse/HBASE-10027
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Affects Versions: 0.89-fb
Reporter: Manukranth Kolloju
Assignee: Manukranth Kolloju
 Fix For: 0.89-fb

   Original Estimate: 168h
  Remaining Estimate: 168h

 In order to scan a sequence of row prefixes optimally, this filter will 
 provide a hint to the Scanner via the scan query matcher to go to the next 
 prefix after finishing scanning with the current range.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HBASE-10437) Integrating CompoundRowPrefixFilter with RowPrefixBloomFilter

2014-03-17 Thread Manukranth Kolloju (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manukranth Kolloju resolved HBASE-10437.


Resolution: Fixed

 Integrating CompoundRowPrefixFilter with RowPrefixBloomFilter
 -

 Key: HBASE-10437
 URL: https://issues.apache.org/jira/browse/HBASE-10437
 Project: HBase
  Issue Type: New Feature
  Components: Scanners
Affects Versions: 0.89-fb
Reporter: Manukranth Kolloju
Assignee: Manukranth Kolloju
Priority: Minor
 Fix For: 0.89-fb

   Original Estimate: 72h
  Remaining Estimate: 72h

 Adding the changes to Filter which can be used to incorporate the bloom 
 filter optimizations into the CompoundRowPrefixFilter.
 Having the context of the bloom filters from inside the filter gives a lot of 
 benefit in terms of performance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HBASE-9828) Fix the reflection call in SequenceFileLogReader

2014-03-17 Thread Manukranth Kolloju (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manukranth Kolloju resolved HBASE-9828.
---

Resolution: Fixed

 Fix the reflection call in SequenceFileLogReader
 

 Key: HBASE-9828
 URL: https://issues.apache.org/jira/browse/HBASE-9828
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb
Reporter: Manukranth Kolloju
Assignee: Manukranth Kolloju
Priority: Trivial
 Fix For: 0.89-fb

   Original Estimate: 24h
  Remaining Estimate: 24h

 In SequenceFileLogReader, Class.getMethod() fails to get a private method in
 the class, so we convert it to Class.getDeclaredMethod().
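
For background, a minimal standalone sketch of the difference (not the HBase code itself):
{code}
import java.lang.reflect.Method;

public class ReflectionDemo {
  private void secret() {}

  public static void main(String[] args) throws Exception {
    // getDeclaredMethod() sees private members declared on the class itself;
    // getMethod() only sees public members (including inherited ones).
    Method m = ReflectionDemo.class.getDeclaredMethod("secret");
    m.setAccessible(true);          // required before invoking a private method
    m.invoke(new ReflectionDemo()); // works

    ReflectionDemo.class.getMethod("secret"); // throws NoSuchMethodException
  }
}
{code}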



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: setMaxResultSize method in Scan

2014-03-17 Thread James Taylor
Hi Weiping,
Take a look at Apache Phoenix (http://phoenix.incubator.apache.org/). It's
a SQL layer on top of HBase and has support for LIMIT and a query planner
and optimizer.
Thanks,
James


On Mon, Mar 17, 2014 at 12:19 PM, Weiping Qu q...@informatik.uni-kl.de wrote:

 I am doing a mult-thread(100) scan test over hbase.
 If one request with given key-range matches a large number of
 correspoding rows in hbase, my request is waiting for this scan to
 complete.
 The throughput is really slow.
 For test purpose, I'd like to use LIMIT to reduce the time on scanning
 and transferring results back from hbase to increase the throughput.
 Do you think the hbase.client.scan.max.result.size or
 setMaxResultSize(in bytes) could help HBase to stop scan at the LIMIT
 before scanning complete corresponding rows?

 As you mentioned that there is no query optimizer in HBase, I assume
 that region servers will not stop scanning the rows in this key-range in
 this case until it gets all the results and limit the results to max
 size which is sent to the client.
 If so, there is not much I can do to compare the throughput with that in
 relational databases like MySQL.

 Thanks,
 Cheers.
  Limit clause in SQL Select statement makes sense because it allows query
 optimizer to plan accordingly.
  It does not make sense in HBase as since there is no query planner and
 or optimization involved during
  scanning HBase table. You can easily mimic this functionality on a
 client side (I mean - limit).
 
  Best regards,
  Vladimir Rodionov
  Principal Platform Engineer
  Carrier IQ, www.carrieriq.com
  e-mail: vrodio...@carrieriq.com
 
  
  From: Weiping Qu [q...@informatik.uni-kl.de]
  Sent: Monday, March 17, 2014 4:48 AM
  To: dev@hbase.apache.org
  Subject: Re: setMaxResultSize method in Scan
 
  Thanks.
 
  I'd like to assume that setMaxResultSize is equivalent to the SQL Limit
  clause as expected which is specified each time a SQL statement is
  executed .
  Now through hbase.client.scanner.max.result.size can the limitation of
  number of row returned only apply to all the scanner instances.
  I am wondering why the setMaxResultSize is removed now.
 
  No such method for Scan in 0.94.x.
 
  If you want to set the max result size for a scan, you can achieve this
 by setting the hbase.client.scanner.max.result.size configuration, the
  default for which is Long.MAX_VALUE (no limit)
  
  From: Weiping Qu [q...@informatik.uni-kl.de]
  Sent: March 17, 2014 18:50
  To: dev@hbase.apache.org
  Subject: setMaxResultSize method in Scan
 
  Hello,
 
  I could not find the method setMaxResultSize(long m)
  (
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
  in my Scan class (version 0.94.13).
  Can anyone help me? Thanks
 
  Weiping
 
  --
  Mit freundlichen Grüßen / Kind Regards
 
  *Weiping Qu*
 
  University of Kaiserslautern
  Department of Computer Science
  Heterogeneous Information Systems Group
  P.O. Box 3049
  67653 Kaiserslautern, Germany
 
  Email: qu (at) informatik.uni-kl.de
  Phone: +49 631 205 3264
  Fax: +49 631 205 3299
  Room: 36/331
 


 --
 Mit freundlichen Grüßen / Kind Regards

 *Weiping Qu*

 University of Kaiserslautern
 Department of Computer Science
 Heterogeneous Information Systems Group
 P.O. Box 3049
 67653 Kaiserslautern, Germany

 Email: qu (at) informatik.uni-kl.de
 Phone: +49 631 205 3264
 Fax: +49 631 205 3299
 Room: 36/331



Re: setMaxResultSize method in Scan

2014-03-17 Thread Weiping Qu
Hi James,

Thank you for reminding me of that.
Cheers,
Weiping
 Hi Weiping,
 Take a look at Apache Phoenix (http://phoenix.incubator.apache.org/). It's
 a SQL layer on top of HBase and has support for LIMIT and a query planner
 and optimizer.
 Thanks,
 James


 On Mon, Mar 17, 2014 at 12:19 PM, Weiping Qu q...@informatik.uni-kl.de wrote:

 I am doing a mult-thread(100) scan test over hbase.
 If one request with given key-range matches a large number of
 correspoding rows in hbase, my request is waiting for this scan to
 complete.
 The throughput is really slow.
 For test purpose, I'd like to use LIMIT to reduce the time on scanning
 and transferring results back from hbase to increase the throughput.
 Do you think the hbase.client.scan.max.result.size or
 setMaxResultSize(in bytes) could help HBase to stop scan at the LIMIT
 before scanning complete corresponding rows?

 As you mentioned that there is no query optimizer in HBase, I assume
 that region servers will not stop scanning the rows in this key-range in
 this case until it gets all the results and limit the results to max
 size which is sent to the client.
 If so, there is not much I can do to compare the throughput with that in
 relational databases like MySQL.

 Thanks,
 Cheers.
 Limit clause in SQL Select statement makes sense because it allows query
 optimizer to plan accordingly.
 It does not make sense in HBase as since there is no query planner and
 or optimization involved during
 scanning HBase table. You can easily mimic this functionality on a
 client side (I mean - limit).
 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Weiping Qu [q...@informatik.uni-kl.de]
 Sent: Monday, March 17, 2014 4:48 AM
 To: dev@hbase.apache.org
 Subject: Re: setMaxResultSize method in Scan

 Thanks.

 I'd like to assume that setMaxResultSize is equivalent to the SQL Limit
 clause as expected which is specified each time a SQL statement is
 executed .
 Now through hbase.client.scanner.max.result.size can the limitation of
 number of row returned only apply to all the scanner instances.
 I am wondering why the setMaxResultSize is removed now.

 No such method for Scan in 0.94.x.

 If you want to set the max result size for a scan, you can achieve this
 by setting the hbase.client.scanner.max.result.size configuration, the
  default for which is Long.MAX_VALUE (no limit)
 
 From: Weiping Qu [q...@informatik.uni-kl.de]
 Sent: March 17, 2014 18:50
 To: dev@hbase.apache.org
 Subject: setMaxResultSize method in Scan

 Hello,

 I could not find the method setMaxResultSize(long m)
 (
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
  in my Scan class (version 0.94.13).
 Can anyone help me? Thanks

 Weiping
 --
 Mit freundlichen Grüßen / Kind Regards

 *Weiping Qu*

 University of Kaiserslautern
 Department of Computer Science
 Heterogeneous Information Systems Group
 P.O. Box 3049
 67653 Kaiserslautern, Germany

 Email: qu (at) informatik.uni-kl.de
 Phone: +49 631 205 3264
 Fax: +49 631 205 3299
 Room: 36/331



 --
 Mit freundlichen Grüßen / Kind Regards

 *Weiping Qu*

 University of Kaiserslautern
 Department of Computer Science
 Heterogeneous Information Systems Group
 P.O. Box 3049
 67653 Kaiserslautern, Germany

 Email: qu (at) informatik.uni-kl.de
 Phone: +49 631 205 3264
 Fax: +49 631 205 3299
 Room: 36/331



-- 
Mit freundlichen Grüßen / Kind Regards

*Weiping Qu*

University of Kaiserslautern
Department of Computer Science
Heterogeneous Information Systems Group
P.O. Box 3049
67653 Kaiserslautern, Germany

Email: qu (at) informatik.uni-kl.de
Phone: +49 631 205 3264
Fax: +49 631 205 3299
Room: 36/331


[jira] [Created] (HBASE-10775) Revisit the HBASE-4012 byte compare optimization

2014-03-17 Thread stack (JIRA)
stack created HBASE-10775:
-

 Summary: Revisit the HBASE-4012 byte compare optimization
 Key: HBASE-10775
 URL: https://issues.apache.org/jira/browse/HBASE-10775
 Project: HBase
  Issue Type: Task
  Components: Performance
Reporter: stack


Some recent findings suggest that we should undo our HBASE-4012 Unsafe byte
compare optimizations. See the tail of the original issue for findings that
show the compare is slower if there are no bytes in common, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


RE: setMaxResultSize method in Scan

2014-03-17 Thread Vladimir Rodionov

The HBase RegionServer does scanning in batches: the client requests the next
batch from the server, and the server reads and merges the data from
cache/disk. You can control the batch data size by setting:

Scan.setRowCaching(number of rows to send in one RPC request)

Technically speaking, this allows you to control LIMIT from the client side.
Your overhead will never be larger than the limit set by setRowCaching.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Weiping Qu [q...@informatik.uni-kl.de]
Sent: Monday, March 17, 2014 12:19 PM
To: dev@hbase.apache.org
Subject: Re: setMaxResultSize method in Scan

I am doing a mult-thread(100) scan test over hbase.
If one request with given key-range matches a large number of
correspoding rows in hbase, my request is waiting for this scan to complete.
The throughput is really slow.
For test purpose, I'd like to use LIMIT to reduce the time on scanning
and transferring results back from hbase to increase the throughput.
Do you think the hbase.client.scan.max.result.size or
setMaxResultSize(in bytes) could help HBase to stop scan at the LIMIT
before scanning complete corresponding rows?

As you mentioned that there is no query optimizer in HBase, I assume
that region servers will not stop scanning the rows in this key-range in
this case until it gets all the results and limit the results to max
size which is sent to the client.
If so, there is not much I can do to compare the throughput with that in
relational databases like MySQL.

Thanks,
Cheers.
 Limit clause in SQL Select statement makes sense because it allows query 
 optimizer to plan accordingly.
 It does not make sense in HBase as since there is no query planner and or 
 optimization involved during
 scanning HBase table. You can easily mimic this functionality on a client 
 side (I mean - limit).

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Weiping Qu [q...@informatik.uni-kl.de]
 Sent: Monday, March 17, 2014 4:48 AM
 To: dev@hbase.apache.org
 Subject: Re: setMaxResultSize method in Scan

 Thanks.

 I'd like to assume that setMaxResultSize is equivalent to the SQL Limit
 clause as expected which is specified each time a SQL statement is
 executed .
 Now through hbase.client.scanner.max.result.size can the limitation of
 number of row returned only apply to all the scanner instances.
 I am wondering why the setMaxResultSize is removed now.

 No such method for Scan in 0.94.x.

 If you want to set the max result size for a scan, you can achieve this by 
 setting the hbase.client.scanner.max.result.size configuration, the 
 default for which is Long.MAX_VALUE (no limit)
 
 From: Weiping Qu [q...@informatik.uni-kl.de]
 Sent: March 17, 2014 18:50
 To: dev@hbase.apache.org
 Subject: setMaxResultSize method in Scan

 Hello,

 I could not find the method setMaxResultSize(long m)
 (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
 in my Scan class (version 0.94.13).
 Can anyone help me? Thanks

 Weiping

 --
 Mit freundlichen Grüßen / Kind Regards

 *Weiping Qu*

 University of Kaiserslautern
 Department of Computer Science
 Heterogeneous Information Systems Group
 P.O. Box 3049
 67653 Kaiserslautern, Germany

 Email: qu (at) informatik.uni-kl.de
 Phone: +49 631 205 3264
 Fax: +49 631 205 3299
 Room: 36/331



--
Mit freundlichen Grüßen / Kind Regards

*Weiping Qu*

University of Kaiserslautern
Department of Computer Science
Heterogeneous Information Systems Group
P.O. Box 3049
67653 Kaiserslautern, Germany

Email: qu (at) informatik.uni-kl.de
Phone: +49 631 205 3264
Fax: +49 631 205 3299
Room: 36/331


Re: setMaxResultSize method in Scan

2014-03-17 Thread Ted Yu
bq. Scan.setRowCaching()

I think you meant Scan.setCaching()
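
For clarity, a minimal sketch with the corrected method name (0.94 client API; the value is arbitrary):
{code}
import org.apache.hadoop.hbase.client.Scan;

// setCaching() bounds how many rows the server ships per RPC, so a client
// that stops iterating early never pulls much more than one extra batch.
Scan scan = new Scan();
scan.setCaching(100); // rows per RPC round trip
{code}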


On Mon, Mar 17, 2014 at 1:28 PM, Vladimir Rodionov
vrodio...@carrieriq.com wrote:


 HBase RegionServer does scanning in batches, client requests next batch
 from server
 and server reads and merge the data from cache/disk. You can control batch
 data size by setting both:

 Scan.setRowCaching(number of rows to send in one RPC request)

 Technically speaking, this allows you to control LIMIT from the client
 side. Your overhead will never be larger than the limit set by
 setRowCaching.

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Weiping Qu [q...@informatik.uni-kl.de]
 Sent: Monday, March 17, 2014 12:19 PM
 To: dev@hbase.apache.org
 Subject: Re: setMaxResultSize method in Scan

 I am doing a mult-thread(100) scan test over hbase.
 If one request with given key-range matches a large number of
 correspoding rows in hbase, my request is waiting for this scan to
 complete.
 The throughput is really slow.
 For test purpose, I'd like to use LIMIT to reduce the time on scanning
 and transferring results back from hbase to increase the throughput.
 Do you think the hbase.client.scan.max.result.size or
 setMaxResultSize(in bytes) could help HBase to stop scan at the LIMIT
 before scanning complete corresponding rows?

 As you mentioned that there is no query optimizer in HBase, I assume
 that region servers will not stop scanning the rows in this key-range in
 this case until it gets all the results and limit the results to max
 size which is sent to the client.
 If so, there is not much I can do to compare the throughput with that in
 relational databases like MySQL.

 Thanks,
 Cheers.
  Limit clause in SQL Select statement makes sense because it allows query
 optimizer to plan accordingly.
  It does not make sense in HBase as since there is no query planner and
 or optimization involved during
  scanning HBase table. You can easily mimic this functionality on a
 client side (I mean - limit).
 
  Best regards,
  Vladimir Rodionov
  Principal Platform Engineer
  Carrier IQ, www.carrieriq.com
  e-mail: vrodio...@carrieriq.com
 
  
  From: Weiping Qu [q...@informatik.uni-kl.de]
  Sent: Monday, March 17, 2014 4:48 AM
  To: dev@hbase.apache.org
  Subject: Re: setMaxResultSize method in Scan
 
  Thanks.
 
  I'd like to assume that setMaxResultSize is equivalent to the SQL Limit
  clause as expected which is specified each time a SQL statement is
  executed .
  Now through hbase.client.scanner.max.result.size can the limitation of
  number of row returned only apply to all the scanner instances.
  I am wondering why the setMaxResultSize is removed now.
 
  No such method for Scan in 0.94.x.
 
  If you want to set the max result size for a scan, you can achieve this
 by setting the hbase.client.scanner.max.result.size configuration, the
  default for which is Long.MAX_VALUE (no limit)
  
  From: Weiping Qu [q...@informatik.uni-kl.de]
  Sent: March 17, 2014 18:50
  To: dev@hbase.apache.org
  Subject: setMaxResultSize method in Scan
 
  Hello,
 
  I could not find the method setMaxResultSize(long m)
  (
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
  in my Scan class (version 0.94.13).
  Can anyone help me? Thanks
 
  Weiping
 
  --
  Mit freundlichen Grüßen / Kind Regards
 
  *Weiping Qu*
 
  University of Kaiserslautern
  Department of Computer Science
  Heterogeneous Information Systems Group
  P.O. Box 3049
  67653 Kaiserslautern, Germany
 
  Email: qu (at) informatik.uni-kl.de
  Phone: +49 631 205 3264
  Fax: +49 631 205 3299
  Room: 36/331
 


 --
 Mit freundlichen Grüßen / Kind Regards

 *Weiping Qu*

 University of Kaiserslautern
 Department of Computer Science
 Heterogeneous Information Systems Group
 P.O. Box 3049
 67653 Kaiserslautern, Germany

 Email: qu (at) informatik.uni-kl.de
 Phone: +49 631 205 3264
 Fax: +49 631 205 3299
 Room: 36/331



Re: Region server slowdown

2014-03-17 Thread Enis Söztutar
Hi

Agreed with Vladimir. I doubt anybody will spend the time to debug the
issue. It would be easier if you can upgrade your HBase cluster. You
will have to upgrade your Hadoop cluster as well. You should go with
0.96.x/0.98.x and either Hadoop 2.2 or Hadoop 2.3. Check out the HBase book
for the upgrade process.

Enis


On Mon, Mar 17, 2014 at 11:19 AM, Vladimir Rodionov vrodio...@carrieriq.com
 wrote:

 I think, 0.90.6 has reached EOL a couple years ago. The best you can do
 right now is
 start planning upgrading to the latest stable 0.94 or 0.96.

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Salabhanjika S [salabhanji...@gmail.com]
 Sent: Monday, March 17, 2014 2:55 AM
 To: dev@hbase.apache.org
 Subject: Re: Region server slowdown

 @Devs, please respond if you can provide me some hints on this problem.

 Did some more analysis. While going through the code in stack track I
 noticed something sub-optimal.
 This may not be a root cause of our slowdown but I felt it may be some
 thing worthy to optimize/fix.

 HBase is making a call to Compressor *WITHOUT* config object. This is
 resulting in configuration reload for every call.
 Should this be calling with existing config object as a parameter so
  that configuration reload (discovery & xml parsing) will not happen so
 frequently?


 http://svn.apache.org/viewvc/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/Compression.java?view=markup
 {code}
 309 public Compressor getCompressor() {
 310 CompressionCodec codec = getCodec(conf);
 311 if (codec != null) {
 312 Compressor compressor = CodecPool.getCompressor(codec);
 313 if (compressor != null) {
 {code}


 http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CodecPool.java?view=markup
 {code}
 162 public static Compressor getCompressor(CompressionCodec codec) {
 163 return getCompressor(codec, null);
 164 }
 {code}

 On Fri, Mar 14, 2014 at 1:47 PM, Salabhanjika S salabhanji...@gmail.com
 wrote:
  Thanks for quick response Ted.
 
  - Hadoop version is 0.20.2
  - Other previous flushes (600MB to 1.5GB) takes around 60 to 300 seconds
 
  On Fri, Mar 14, 2014 at 1:21 PM, Ted Yu yuzhih...@gmail.com wrote:
  What Hadoop version are you using ?
 
  Btw, the sentence about previous flushes was incomplete.
 
  Cheers
 
  On Mar 14, 2014, at 12:12 AM, Salabhanjika S salabhanji...@gmail.com
 wrote:
 
  Devs,
 
  We are using hbase version 0.90.6 (please don't complain of old
  version. we are in process of upgrading) in our production and we are
  noticing a strange problem arbitrarily for every few weeks. Region
  server goes extremely slow.
  We have to restart Region Server once this happens. There is no unique
  pattern of this problem. This happens on different region servers,
  different tables/regions and different times.
 
   Here are observations & findings from our analysis.
  - We are using LZO compression (0.4.10).
 
  - [RS Dashboard] Flush is running for more than 6 hours. It is in
  creating writer status for long time. Other previous flushes (600MB
  to 1.5GB) takes
 
  - [Thread dumps] No deadlocks. Flusher thread stack. Even compactor
  thread is in same state Configuration.loadResource
  regionserver60020.cacheFlusher daemon prio=10 tid=0x7efd016c4800
  nid=0x35e9 runnable [0x7efcad9c5000]
java.lang.Thread.State: RUNNABLE
 at
 sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
 at
 sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
 - locked 0x7f02ccc2ef78 (a
  sun.net.www.protocol.file.FileURLConnection)
 at
 com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
 ... [cutting down some stack to keep mail compact. all this stack
  is in com.sun.org.apache.xerces...]
 at
 com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
 at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
 at
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308)
 at
 org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1259)
 at
 org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1200)
 - locked 0x7f014f1543b8 (a
 org.apache.hadoop.conf.Configuration)
 at org.apache.hadoop.conf.Configuration.get(Configuration.java:501)
 at
 com.hadoop.compression.lzo.LzoCodec.getCompressionStrategy(LzoCodec.java:205)
 at
 com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:204)
 at
 org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
 at
 org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
 at
 

[jira] [Created] (HBASE-10776) Separate HConnectionManager into several parts

2014-03-17 Thread Yi Deng (JIRA)
Yi Deng created HBASE-10776:
---

 Summary: Separate HConnectionManager into several parts
 Key: HBASE-10776
 URL: https://issues.apache.org/jira/browse/HBASE-10776
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.89-fb
Reporter: Yi Deng
Priority: Minor
 Fix For: 0.89-fb


HConnectionManager is too large to effectively maintain. This Jira records some 
refactoring jobs:

1. Move TableServers out as a standalone class.
2. Move the region-locating code into its own class.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10777) Restart Embedded Thrift Server in case of failures

2014-03-17 Thread Rishit Shroff (JIRA)
Rishit Shroff created HBASE-10777:
-

 Summary: Restart Embedded Thrift Server in case of failures
 Key: HBASE-10777
 URL: https://issues.apache.org/jira/browse/HBASE-10777
 Project: HBase
  Issue Type: Bug
Reporter: Rishit Shroff
Priority: Trivial






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10778) Unique keys accounting in MultiThreadedReader is incorrect

2014-03-17 Thread Ted Yu (JIRA)
Ted Yu created HBASE-10778:
--

 Summary: Unique keys accounting in MultiThreadedReader is incorrect
 Key: HBASE-10778
 URL: https://issues.apache.org/jira/browse/HBASE-10778
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu
Assignee: Ted Yu


TestMiniClusterLoad* tests fail in 10070 branch.
Here is one example:
{code}
java.lang.AssertionError: expected:<0> but was:<7>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential.runLoadTestOnExistingTable(TestMiniClusterLoadSequential.java:139)
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


RE: setMaxResultSize method in Scan

2014-03-17 Thread Vladimir Rodionov
Sure.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com


From: Ted Yu [yuzhih...@gmail.com]
Sent: Monday, March 17, 2014 1:34 PM
To: dev@hbase.apache.org
Subject: Re: setMaxResultSize method in Scan

bq. Scan.setRowCaching()

I think you meant Scan.setCaching()


On Mon, Mar 17, 2014 at 1:28 PM, Vladimir Rodionov
vrodio...@carrieriq.com wrote:


 HBase RegionServer does scanning in batches, client requests next batch
 from server
 and server reads and merge the data from cache/disk. You can control batch
 data size by setting both:

 Scan.setRowCaching(number of rows to send in one RPC request)

 Technically speaking, this allows you to control LIMIT from the client
 side. Your overhead will never be larger than the limit set by
 setRowCaching.

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Weiping Qu [q...@informatik.uni-kl.de]
 Sent: Monday, March 17, 2014 12:19 PM
 To: dev@hbase.apache.org
 Subject: Re: setMaxResultSize method in Scan

 I am doing a mult-thread(100) scan test over hbase.
 If one request with given key-range matches a large number of
 correspoding rows in hbase, my request is waiting for this scan to
 complete.
 The throughput is really slow.
 For test purpose, I'd like to use LIMIT to reduce the time on scanning
 and transferring results back from hbase to increase the throughput.
 Do you think the hbase.client.scan.max.result.size or
 setMaxResultSize(in bytes) could help HBase to stop scan at the LIMIT
 before scanning complete corresponding rows?

 As you mentioned that there is no query optimizer in HBase, I assume
 that region servers will not stop scanning the rows in this key-range in
 this case until it gets all the results and limit the results to max
 size which is sent to the client.
 If so, there is not much I can do to compare the throughput with that in
 relational databases like MySQL.

 Thanks,
 Cheers.
  The Limit clause in a SQL Select statement makes sense because it allows the
 query optimizer to plan accordingly.
  It does not make sense in HBase, since there is no query planner or
 optimization involved during
  scanning of an HBase table. You can easily mimic this functionality on the
 client side (I mean - limit).
 
  Best regards,
  Vladimir Rodionov
  Principal Platform Engineer
  Carrier IQ, www.carrieriq.com
  e-mail: vrodio...@carrieriq.com
 
  
  From: Weiping Qu [q...@informatik.uni-kl.de]
  Sent: Monday, March 17, 2014 4:48 AM
  To: dev@hbase.apache.org
  Subject: Re: Reply: setMaxResultSize method in Scan
 
  Thanks.
 
  I'd like to assume that setMaxResultSize is equivalent to the SQL Limit
  clause, which is specified each time a SQL statement is
  executed.
  But through hbase.client.scanner.max.result.size, the limit on the
  number of rows returned can only be applied to all scanner instances at once.
  I am wondering why setMaxResultSize has been removed.
 
  No such method for Scan in 0.94.x.
 
  If you want to set the max result size for a scan, you can achieve this
 by setting the hbase.client.scanner.max.result.size configuration, the
 default for which is Long.MAX_VALUE (no limit)
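
 A short hedged sketch of setting that property on the client side (the 2 MB
 value is just an example, not a recommendation):

 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;

 // Caps the bytes returned per scanner RPC for every scanner created
 // from this Configuration.
 Configuration conf = HBaseConfiguration.create();
 conf.setLong("hbase.client.scanner.max.result.size", 2L * 1024 * 1024);
 {code}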
  
  From: Weiping Qu [q...@informatik.uni-kl.de]
  Sent: March 17, 2014 18:50
  To: dev@hbase.apache.org
  Subject: setMaxResultSize method in Scan
 
  Hello,
 
  I could not find the method setMaxResultSize(long m)
  (
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html)
  in my Scan class (version 0.94.13).
  Can anyone help me? Thanks
 
  Weiping
 
  --
  Mit freundlichen Grüßen / Kind Regards
 
  *Weiping Qu*
 
  University of Kaiserslautern
  Department of Computer Science
  Heterogeneous Information Systems Group
  P.O. Box 3049
  67653 Kaiserslautern, Germany
 
  Email: qu (at) informatik.uni-kl.de
  Phone: +49 631 205 3264
  Fax: +49 631 205 3299
  Room: 36/331
 


 --
 Mit freundlichen Grüßen / Kind Regards

 *Weiping Qu*

 University of Kaiserslautern
 Department of Computer Science
 Heterogeneous Information Systems Group
 P.O. Box 3049
 67653 Kaiserslautern, Germany

 Email: qu (at) informatik.uni-kl.de

The 1st HBase 0.98.1 release candidate (RC0) is available

2014-03-17 Thread Andrew Purtell
The 1st HBase 0.98.1 release candidate (RC0) is available for download at
http://people.apache.org/~apurtell/0.98.1RC0/ and Maven artifacts are also
available in the temporary repository
https://repository.apache.org/content/repositories/orgapachehbase-1007.

Signed with my code signing key D5365CCD.

The issues resolved in this release can be found here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12325664


This vote will run for 14 days given that a few RCs have stacked up this
month. Please try out the candidate and vote +1/-1 by midnight Pacific Time
(00:00 -0800 GMT) on February 31 on whether or not we should release this
as 0.98.1.

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: The 1st HBase 0.98.1 release candidate (RC0) is available

2014-03-17 Thread Ted Yu
Will give RC a spin.

bq. on February 31

possibly a typo: February -> March.

Cheers


On Mon, Mar 17, 2014 at 4:42 PM, Andrew Purtell apurt...@apache.org wrote:

 The 1st HBase 0.98.1 release candidate (RC0) is available for download at
 http://people.apache.org/~apurtell/0.98.1RC0/ and Maven artifacts are also
 available in the temporary repository
 https://repository.apache.org/content/repositories/orgapachehbase-1007.

 Signed with my code signing key D5365CCD.

 The issues resolved in this release can be found here:

 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12325664


 This vote will run for 14 days given that a few RCs have stacked up this
 month. Please try out the candidate and vote +1/-1 by midnight Pacific Time
 (00:00 -0800 GMT) on February 31 on whether or not we should release this
 as 0.98.1.

 --
 Best regards,

- Andy

 Problems worthy of attack prove their worth by hitting back. - Piet Hein
 (via Tom White)



[VOTE] The 1st HBase 0.98.1 release candidate (RC0) is available

2014-03-17 Thread Andrew Purtell
Adding VOTE tag to subject

and a clarification (pardon the typo):

This vote will run for 14 days given that a few RCs have stacked up this
month. Please try out the candidate and vote +1/-1 by midnight Pacific Time
(00:00 -0800 GMT) on March 31 on whether or not we should release this as
0.98.1.


On Mon, Mar 17, 2014 at 4:42 PM, Andrew Purtell apurt...@apache.org wrote:

 The 1st HBase 0.98.1 release candidate (RC0) is available for download at
 http://people.apache.org/~apurtell/0.98.1RC0/ and Maven artifacts are
 also available in the temporary repository
 https://repository.apache.org/content/repositories/orgapachehbase-1007.

 Signed with my code signing key D5365CCD.

 The issues resolved in this release can be found here:
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12325664


 This vote will run for 14 days given that a few RCs have stacked up this
 month. Please try out the candidate and vote +1/-1 by midnight Pacific Time
 (00:00 -0800 GMT) on February 31 on whether or not we should release this
 as 0.98.1.

 --
 Best regards,

- Andy

 Problems worthy of attack prove their worth by hitting back. - Piet Hein
 (via Tom White)




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


[jira] [Resolved] (HBASE-10778) Unique keys accounting in MultiThreadedReader is incorrect

2014-03-17 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-10778.


Resolution: Fixed

 Unique keys accounting in MultiThreadedReader is incorrect
 --

 Key: HBASE-10778
 URL: https://issues.apache.org/jira/browse/HBASE-10778
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: hbase-10070

 Attachments: 10778-v1.txt


 TestMiniClusterLoad* tests fail in 10070 branch.
 Here is one example:
 {code}
 java.lang.AssertionError: expected:<0> but was:<7>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential.runLoadTestOnExistingTable(TestMiniClusterLoadSequential.java:139)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HBASE-10735) [WINDOWS] Set -XX:MaxPermSize for unit tests

2014-03-17 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar resolved HBASE-10735.
---

Resolution: Fixed

 [WINDOWS] Set -XX:MaxPermSize for unit tests
 

 Key: HBASE-10735
 URL: https://issues.apache.org/jira/browse/HBASE-10735
 Project: HBase
  Issue Type: Bug
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Trivial
 Fix For: 0.96.2, 0.99.0, 0.98.2

 Attachments: hbase-10735_v1.patch, hbase-10735_v2.patch


 The tests on Windows fail with java.lang.OutOfMemoryError: PermGen space.
 We have -XX:MaxPermSize=100m for tests on Linux. We should just set the
 same on Windows.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Introducing libhbase (C APIs for Apache HBase)

2014-03-17 Thread Aditya
Hi,

Pursuant to the JIRAs
HBASE-10168 (https://issues.apache.org/jira/browse/HBASE-10168),
HBASE-9977 (https://issues.apache.org/jira/browse/HBASE-9977) and
HBASE-9835 (https://issues.apache.org/jira/browse/HBASE-9835), I am happy
to announce that the first draft of a JNI-based implementation
of C APIs for HBase is now available for your review.

The source and instructions to build and use are available at MapR's GitHub
repository (http://goo.gl/dE5tzB). Slides from my presentation on the same
can be downloaded from the meetup site (http://goo.gl/nfXx9f).

I will put the patches on the respective JIRAs shortly.

Regards,
aditya...



Re: Region server slowdown

2014-03-17 Thread Salabhanjika S
Thanks Rodionov & Enis for responding. I agree with you that we need to upgrade.

As I mentioned in my first mail, we are in the process of upgrading.
  We are using hbase version 0.90.6 (please don't complain about the old
  version; we are in the process of upgrading)

- The suboptimal (in my view) code snippets I posted in the follow-up mail hold
good for trunk as well.

- I strongly feel this issue has something to do with the HBase version. I
verified the code paths of the stack I posted.
I don't see any significant changes in the current version of this code
(Flusher -> getCompressor).


On Tue, Mar 18, 2014 at 2:30 AM, Enis Söztutar enis@gmail.com wrote:
 Hi

 Agreed with Vladimir. I doubt anybody will spend the time to debug the
 issue. It would be easier if you can upgrade your HBase cluster; you
 will have to upgrade your Hadoop cluster as well. You should go with
 0.96.x/0.98.x and either Hadoop-2.2 or Hadoop-2.3. Check out the HBase book
 for the upgrade process.

 Enis


 On Mon, Mar 17, 2014 at 11:19 AM, Vladimir Rodionov vrodio...@carrieriq.com
 wrote:

 I think 0.90.6 reached EOL a couple of years ago. The best you can do
 right now is
 start planning an upgrade to the latest stable 0.94 or 0.96.

 Best regards,
 Vladimir Rodionov
 Principal Platform Engineer
 Carrier IQ, www.carrieriq.com
 e-mail: vrodio...@carrieriq.com

 
 From: Salabhanjika S [salabhanji...@gmail.com]
 Sent: Monday, March 17, 2014 2:55 AM
 To: dev@hbase.apache.org
 Subject: Re: Region server slowdown

 @Devs, please respond if you can provide me some hints on this problem.

 Did some more analysis. While going through the code in the stack trace I
 noticed something sub-optimal.
 This may not be the root cause of our slowdown, but I felt it may be
 something worth optimizing/fixing.

 HBase is making a call for a Compressor *WITHOUT* the config object. This
 results in a configuration reload for every call.
 Should this call pass the existing config object as a parameter, so
 that the configuration reload (discovery & XML parsing) does not happen so
 frequently?


 http://svn.apache.org/viewvc/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/Compression.java?view=markup
 {code}
 309 public Compressor getCompressor() {
 310 CompressionCodec codec = getCodec(conf);
 311 if (codec != null) {
 312 Compressor compressor = CodecPool.getCompressor(codec);
 313 if (compressor != null) {
 {code}


 http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CodecPool.java?view=markup
 {code}
 162 public static Compressor getCompressor(CompressionCodec codec) {
 163 return getCompressor(codec, null);
 164 }
 {code}
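
 A minimal sketch (not an actual patch) of the change suggested above: pass
 the existing conf through CodecPool's two-argument overload, shown in the
 snippet above, so no fresh Configuration is built per call. The class and
 method names here are made up:

 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.io.compress.CodecPool;
 import org.apache.hadoop.io.compress.CompressionCodec;
 import org.apache.hadoop.io.compress.Compressor;

 public class CompressorReuseSketch {
   // Borrow a pooled Compressor, passing the caller's Configuration so it
   // can be reused instead of triggering a reload (per the suggestion above).
   public static Compressor borrow(CompressionCodec codec, Configuration conf) {
     return CodecPool.getCompressor(codec, conf);
   }
 }
 {code}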

 On Fri, Mar 14, 2014 at 1:47 PM, Salabhanjika S salabhanji...@gmail.com
 wrote:
  Thanks for quick response Ted.
 
  - Hadoop version is 0.20.2
  - Other previous flushes (600MB to 1.5GB) take around 60 to 300 seconds
 
  On Fri, Mar 14, 2014 at 1:21 PM, Ted Yu yuzhih...@gmail.com wrote:
  What Hadoop version are you using ?
 
  Btw, the sentence about previous flushes was incomplete.
 
  Cheers
 
  On Mar 14, 2014, at 12:12 AM, Salabhanjika S salabhanji...@gmail.com
 wrote:
 
  Devs,
 
  We are using hbase version 0.90.6 (please don't complain about the old
  version; we are in the process of upgrading) in production, and we are
  noticing a strange problem arbitrarily every few weeks: a region
  server becomes extremely slow.
  We have to restart the region server once this happens. There is no unique
  pattern to this problem. It happens on different region servers,
  different tables/regions and at different times.
 
  Here are observations & findings from our analysis.
  - We are using LZO compression (0.4.10).

  - [RS Dashboard] Flush is running for more than 6 hours. It has been in
  "creating writer" status for a long time. Other previous flushes (600MB
  to 1.5GB) takes

  - [Thread dumps] No deadlocks. Flusher thread stack below. Even the compactor
  thread is in the same state (Configuration.loadResource):
  regionserver60020.cacheFlusher daemon prio=10 tid=0x7efd016c4800
  nid=0x35e9 runnable [0x7efcad9c5000]
java.lang.Thread.State: RUNNABLE
 at
 sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
 at
 sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
 - locked 0x7f02ccc2ef78 (a
  sun.net.www.protocol.file.FileURLConnection)
 at
 com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
 ... [cutting down some stack to keep mail compact. all this stack
  is in com.sun.org.apache.xerces...]
 at
 com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
 at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
 at
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308)
 at
 

[jira] [Created] (HBASE-10779) Doc hadoop1 deprecated in 0.98 and NOT supported in hbase 1.0

2014-03-17 Thread stack (JIRA)
stack created HBASE-10779:
-

 Summary: Doc hadoop1 deprecated in 0.98 and NOT supported in hbase 
1.0
 Key: HBASE-10779
 URL: https://issues.apache.org/jira/browse/HBASE-10779
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: stack


Do the first two bullet items from the parent issue, adding doc to our hadoop
support matrix.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10780) HFilePrettyPrinter#processFile should return immediately if file does not exist.

2014-03-17 Thread Ashish Singhi (JIRA)
Ashish Singhi created HBASE-10780:
-

 Summary: HFilePrettyPrinter#processFile should return immediately 
if the file does not exist.
 Key: HBASE-10780
 URL: https://issues.apache.org/jira/browse/HBASE-10780
 Project: HBase
  Issue Type: Bug
  Components: HFile
Affects Versions: 0.94.11
Reporter: Ashish Singhi
Priority: Minor


HFilePrettyPrinter#processFile should return immediately if the file does not
exist, like HLogPrettyPrinter#run does:

{code}
if (!fs.exists(file)) {
  System.err.println("ERROR, file doesnt exist: " + file);
}
{code}
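
A hedged sketch of how such a guard might sit at the top of processFile (the
surrounding class, the fs field, and the void return type are assumptions
here, not the actual 0.94 code):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ProcessFileGuardSketch {
  private final FileSystem fs;

  public ProcessFileGuardSketch(FileSystem fs) {
    this.fs = fs;
  }

  void processFile(Path file) throws IOException {
    if (!fs.exists(file)) {
      System.err.println("ERROR, file doesnt exist: " + file);
      return; // return immediately, mirroring HLogPrettyPrinter#run
    }
    // ... existing HFile reading and printing would continue here ...
  }
}
{code}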



--
This message was sent by Atlassian JIRA
(v6.2#6252)