[ https://issues.apache.org/jira/browse/FLINK-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579173#comment-14579173 ]

Ufuk Celebi commented on FLINK-2188:
------------------------------------

OK, I've looked into the issue. I had a problem running the 
mapreduce.TableInputFormat, which I've fixed now. Thanks for your feedback; 
this is very important for the release. You can try out the fix shortly from 
my branch if you have the time.

The issue with your programs is most likely a bug in Flink's HBase input 
format.

> Reading from big HBase Tables
> -----------------------------
>
>                 Key: FLINK-2188
>                 URL: https://issues.apache.org/jira/browse/FLINK-2188
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Hilmi Yildirim
>            Priority: Critical
>         Attachments: flinkTest.zip
>
>
> I detected a bug in reading from a big HBase table.
> I used a cluster of 13 machines with 13 processing slots per machine, 
> which results in a total of 169 processing slots. Further, our cluster 
> runs cdh5.4.1 and the HBase version is 1.0.0-cdh5.4.1. There is an HBase 
> table with nearly 100 mio. rows. I used Spark and Hive to count the number 
> of rows and both results are identical (nearly 100 mio.). 
> Then, I used Flink to count the number of rows. For that I added the 
> hbase-client 1.0.0-cdh5.4.1 Java API as a dependency in Maven and excluded 
> the other hbase-client dependencies. The result of the job is nearly 
> 102 mio., 2 mio. rows more than the result of Spark and Hive. Moreover, I 
> ran the Flink job multiple times and sometimes the result fluctuates by +-5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
