[ 
https://issues.apache.org/jira/browse/BEAM-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Robertson updated BEAM-2661:
--------------------------------
    Description: 
New IO for Apache Kudu (https://kudu.apache.org/overview.html).

This work is in progress [on this 
branch|https://github.com/timrobertson100/beam/tree/BEAM-2661-KuduIO].

Design aspects are documented below.

The API
# The Kudu 
[Operation|https://kudu.apache.org/apidocs/org/apache/kudu/client/Operation.html]
 is a fat class and a subclass of {{KuduRpc<OperationResponse>}}. It holds 
RPC logic, callbacks and a Kudu client, so an {{Operation}} cannot be 
serialized. Furthermore, the logic for encoding the operations (Insert, 
Upsert, etc.) in the Kudu Java API is one-way only (encode, no decode) because 
the decoding side is the server, which is written in C++.
# An alternative could be to introduce a new object to Beam (e.g. 
{{o.a.b.sdk.io.kudu.KuduOperation}}) to enable {{PCollection<KuduOperation>}}. 
This was considered but was discounted because:
## It would not be a familiar API to those who already know Kudu.
## It would still require serialization and deserialization of the operations. 
Using the existing Kudu approach of serializing into compact byte arrays would 
require a decoder along the lines of [this almost complete 
example|https://gist.github.com/timrobertson100/df77d1337ba8f5609319751ee7c6e01e].
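
For illustration only, the discounted alternative in point 2 could look roughly like the sketch below. It is not code from the branch: the {{KuduOperation}} shape, its fields, and the {{encode}}/{{decode}} helpers are all hypothetical, and it uses plain Java serialization rather than Kudu's compact byte-array encoding. It only shows the serialize/deserialize round trip that such a wrapper would have to support, which the Kudu {{Operation}} class does not offer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in Beam or Kudu.
class KuduOperationSketch {

  /** Kinds of change, mirroring two of the operation names mentioned above. */
  enum Type { INSERT, UPSERT }

  /** Plain, serializable description of one row change (assumed shape). */
  static final class KuduOperation implements Serializable {
    private static final long serialVersionUID = 1L;
    final Type type;
    final String table;
    final LinkedHashMap<String, Serializable> columns;

    KuduOperation(Type type, String table, Map<String, ? extends Serializable> columns) {
      this.type = type;
      this.table = table;
      // Defensive copy that keeps the column order stable.
      this.columns = new LinkedHashMap<>(columns);
    }
  }

  /** Encodes an operation to bytes using standard Java serialization. */
  static byte[] encode(KuduOperation op) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(op);
    }
    return bytes.toByteArray();
  }

  /** Decodes bytes produced by encode() back into an operation. */
  static KuduOperation decode(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return (KuduOperation) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    Map<String, Serializable> row = new LinkedHashMap<>();
    row.put("id", 42L);
    row.put("name", "example");
    KuduOperation original = new KuduOperation(Type.UPSERT, "users", row);

    // Round trip: the copy carries the same type, table and column values.
    KuduOperation copy = decode(encode(original));
    System.out.println(copy.type + " " + copy.table + " " + copy.columns);
  }
}
```

Even in this simplified form, a writer would still have to translate each decoded wrapper into a real Kudu {{Operation}} before applying it, which is the extra, unfamiliar layer described in the two sub-points above.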




  was:New IO for Apache Kudu (https://kudu.apache.org/overview.html).


> Add KuduIO
> ----------
>
>                 Key: BEAM-2661
>                 URL: https://issues.apache.org/jira/browse/BEAM-2661
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-ideas
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Tim Robertson
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)