[ 
https://issues.apache.org/jira/browse/FLINK-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126231#comment-15126231
 ] 

PJ Van Aeken edited comment on FLINK-2055 at 2/1/16 2:22 PM:
-------------------------------------------------------------

Indeed, the example you described uses the native client API, which I think is 
the way to go. Unfortunately, HTable is now deprecated, so the examples are 
outdated. The mailing list thread linked in the issue description suggests 
using the write method on DataStream combined with TableOutputFormat instead.

https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/datastream/DataStream.html#write%28org.apache.flink.api.common.io.OutputFormat,%20long%29

What I am proposing instead is to write a SinkFunction (like the one we have 
for Flume, for instance) that uses the new HBase client APIs, similar to how 
the example you referred to used to work. I would rather not rely on 
TableOutputFormat, which, as far as I understand, buffers requests on the 
client side based on some internal heuristics, as per the HBase documentation:

https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/BufferedMutator.html
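
To make the concern about client-side buffering concrete, here is a minimal, 
self-contained sketch. It does not use the Flink or HBase APIs: RecordSink, 
FakeTable, DirectSink and BufferedSink are hypothetical stand-ins, and the 
record-count threshold is only an assumption standing in for whatever flush 
heuristic the real client applies. It only illustrates that buffered records 
are not visible in the table until a flush happens.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a streaming sink; NOT Flink's SinkFunction.
interface RecordSink<T> {
    void invoke(T value);
    void close();
}

// In-memory stand-in for a remote table; NOT an HBase class.
class FakeTable {
    final List<String> rows = new ArrayList<>();
    int roundTrips = 0;

    // Simulates one request to the server carrying a batch of rows.
    void putAll(List<String> batch) {
        rows.addAll(batch);
        roundTrips++;
    }
}

// Sends every record to the table as soon as it arrives.
class DirectSink implements RecordSink<String> {
    private final FakeTable table;

    DirectSink(FakeTable table) { this.table = table; }

    @Override
    public void invoke(String value) {
        table.putAll(Collections.singletonList(value));
    }

    @Override
    public void close() { }
}

// Holds records on the client and sends them only when the buffer is full
// (assumed threshold in records) or when the sink is closed.
class BufferedSink implements RecordSink<String> {
    private final FakeTable table;
    private final int flushThreshold;
    private final List<String> buffer = new ArrayList<>();

    BufferedSink(FakeTable table, int flushThreshold) {
        this.table = table;
        this.flushThreshold = flushThreshold;
    }

    @Override
    public void invoke(String value) {
        buffer.add(value);
        if (buffer.size() >= flushThreshold) {
            flush();
        }
    }

    void flush() {
        if (!buffer.isEmpty()) {
            table.putAll(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Records accepted by the sink but not yet written to the table.
    int pending() { return buffer.size(); }

    @Override
    public void close() { flush(); }
}

class BufferingSketch {
    public static void main(String[] args) {
        FakeTable directTable = new FakeTable();
        FakeTable bufferedTable = new FakeTable();
        DirectSink direct = new DirectSink(directTable);
        BufferedSink buffered = new BufferedSink(bufferedTable, 3);

        for (String row : new String[] {"a", "b", "c", "d"}) {
            direct.invoke(row);
            buffered.invoke(row);
        }

        System.out.println("direct rows visible:   " + directTable.rows.size());
        System.out.println("buffered rows visible: " + bufferedTable.rows.size()
                + " (pending: " + buffered.pending() + ")");

        buffered.close(); // the remaining record only reaches the table here
        System.out.println("buffered rows after close: " + bufferedTable.rows.size());
    }
}
```

In this sketch the direct sink pays one round trip per record, while the 
buffered sink needs fewer round trips but leaves records sitting on the client 
until the threshold is reached or close() is called. For a streaming job, 
those pending records are the ones at risk if the task fails before a flush.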


was (Author: vanaepi):
Indeed, the example you described uses the native client API, which I think is 
the way to go. Unfortunately, HTable is now deprecated, so the examples are 
outdated. The mailing list thread linked in the issue description suggests 
using the write method on DataStream combined with TableOutputFormat instead.

https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/datastream/DataStream.html#write%28org.apache.flink.api.common.io.OutputFormat,%20long%29

What I am proposing instead is to write a SinkFunction (like the one we have 
for Flume, for instance) that uses the new HBase client APIs, similar to how 
the example you referred to used to work. I would rather not rely on 
TableOutputFormat, which, as far as I understand, buffers requests on the 
client side based on some internal heuristics, as per the HBase documentation:

https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/BufferedMutator.html

EDIT: There appears to be a version mismatch, which is why we are not seeing 
the same problems. It turns out my assumptions do not hold in version 0.98.x. 
I am unsure about 1.x for now, and they definitely hold for 2.x, which is 
currently in snapshot. So the inner workings of TableOutputFormat have changed 
in recent versions, which introduces the problem I have described.

> Implement Streaming HBaseSink
> -----------------------------
>
>                 Key: FLINK-2055
>                 URL: https://issues.apache.org/jira/browse/FLINK-2055
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming, Streaming Connectors
>    Affects Versions: 0.9
>            Reporter: Robert Metzger
>            Assignee: Hilmi Yildirim
>
> As per:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Write-Stream-to-HBase-td1300.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)