[ https://issues.apache.org/jira/browse/FLINK-2188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580282#comment-14580282 ]
Ufuk Celebi commented on FLINK-2188:
------------------------------------

You can try it with this branch: https://github.com/uce/incubator-flink/tree/configurable_if-2195

The following code snippet should allow you to adjust your example.

{code}
public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(4);

    Configuration conf = new Configuration();
    conf.set(TableInputFormat.INPUT_TABLE, "test");

    DataSource<Tuple2<ImmutableBytesWritable, Result>> hbase = env.createHadoopInput(
            new TableInputFormat(),
            ImmutableBytesWritable.class,
            Result.class,
            Job.getInstance(conf));

    DataSet<Tuple2<String, String>> toTuple = hbase.map(
            new MapFunction<Tuple2<ImmutableBytesWritable, Result>, Tuple2<String, String>>() {
                @Override
                public Tuple2<String, String> map(Tuple2<ImmutableBytesWritable, Result> record) throws Exception {
                    Result result = record.f1;
                    return new Tuple2<String, String>(
                            Bytes.toString(result.getRow()),
                            new String(result.value()));
                }
            });

    System.out.println(toTuple.count());
}
{code}

> Reading from big HBase Tables
> -----------------------------
>
>                 Key: FLINK-2188
>                 URL: https://issues.apache.org/jira/browse/FLINK-2188
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Hilmi Yildirim
>            Priority: Critical
>         Attachments: flinkTest.zip
>
>
> I detected a bug when reading from a big HBase table.
> I used a cluster of 13 machines with 13 processing slots per machine,
> which results in a total of 169 processing slots. Further, our cluster
> runs cdh5.4.1 and the HBase version is 1.0.0-cdh5.4.1. There is an
> HBase table with nearly 100 mio. rows. I used Spark and Hive to count
> the number of rows and both results are identical (nearly 100 mio.).
> Then I used Flink to count the number of rows. For that I added the
> hbase-client 1.0.0-cdh5.4.1 Java API as a dependency in Maven and
> excluded the other hbase-client dependencies. The result of the job is
> nearly 102 mio., 2 mio. rows more than the result of Spark and Hive.
> Moreover, I ran the Flink job multiple times and sometimes the result
> fluctuates by +-5.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
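Not part of the ticket, but one way to narrow down whether the extra ~2 mio. rows reported above are duplicate reads (rather than phantom rows) is to compare the raw count with the distinct row-key count. The sketch below is a hypothetical, HBase-free illustration of that cross-check using plain Java collections in place of the scanned row keys; the class and key names are made up for the example.

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CountCheck {

    // Compares the total row count against the distinct row-key count.
    // A positive difference means some rows were read more than once.
    static long duplicateCount(List<String> rowKeys) {
        Set<String> distinct = new HashSet<String>(rowKeys);
        return rowKeys.size() - distinct.size();
    }

    public static void main(String[] args) {
        // Hypothetical row keys; "row-2" appears twice, as it would if the
        // same region were covered by two overlapping input splits.
        List<String> scanned = Arrays.asList("row-1", "row-2", "row-2", "row-3");
        System.out.println(duplicateCount(scanned)); // prints 1
    }
}
{code}

In the Flink job from the comment, the equivalent comparison would be {{toTuple.count()}} versus a distinct count keyed on the row-key field (position 0 of the tuple), e.g. via the DataSet API's {{distinct(0)}}.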