Re: Pubsub to Beam SQL

Anton Kedin Thu, 03 May 2018 12:53:00 -0700

A SQL-specific wrapper plus custom transforms around PubsubIO should suffice.
We will probably need a way to expose the message publish timestamp if we
want to use it as an event timestamp, but that will be consumed by the same
wrapper/transform without adding anything schema- or SQL-specific to
PubsubIO itself.
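
To make the wrapper idea concrete, here is a rough sketch in plain Python
(not Beam code) of what such a transform would do with each message: pick
the event timestamp from either the publish time or a configured attribute,
then flatten the message into the three columns described in the proposal
quoted below. The Message class, to_row, the "timestampAttributeKey" config
key, and the milliseconds-since-epoch encoding of the attribute are all
assumptions made for illustration; none of them come from PubsubIO or
Beam SQL.

```python
import json
from datetime import datetime, timezone


class Message:
    """Hypothetical stand-in for a Pubsub message; not the PubsubIO class."""

    def __init__(self, payload, attributes, publish_time):
        self.payload = payload            # bytes, expected to hold JSON
        self.attributes = attributes      # dict of str -> str
        self.publish_time = publish_time  # timezone-aware datetime


def event_timestamp(message, timestamp_attribute=None):
    """Use the named attribute if one is configured, else the publish time."""
    if timestamp_attribute is None:
        return message.publish_time
    # Assumption: the attribute holds milliseconds since the Unix epoch.
    millis = int(message.attributes[timestamp_attribute])
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


def to_row(message, table_properties="{}"):
    """Flatten one message into timestamp, attributes, and payload columns."""
    # table_properties mimics the JSON text blob passed via TBLPROPERTIES.
    config = json.loads(table_properties)
    return {
        "event_timestamp": event_timestamp(
            message, config.get("timestampAttributeKey")
        ),
        "attributes": dict(message.attributes),
        "payload": json.loads(message.payload.decode("utf-8")),
    }
```

With an empty configuration the row carries the publish time; passing
'{"timestampAttributeKey": "ts"}' switches it to the "ts" attribute. PubsubIO
itself stays untouched in either case, which is the point of keeping this
logic in the wrapper.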


On Thu, May 3, 2018 at 11:44 AM Reuven Lax <re...@google.com> wrote:

> Are you planning on integrating this directly into PubsubIO, or adding a
> follow-on transform?
>
> On Wed, May 2, 2018 at 10:30 AM Anton Kedin <ke...@google.com> wrote:
>
>> Hi
>>
>> I am working on adding functionality to support querying Pubsub messages
>> directly from Beam SQL.
>>
>> *Goal*
>>   Provide Beam users a pure SQL solution to create the pipelines with
>> Pubsub as a data source, without the need to set up the pipelines in
>> Java before applying the query.
>>
>> *High level approach*
>>
>>    - Build on top of PubsubIO;
>>    - The Pubsub source will be declared using a CREATE TABLE DDL statement:
>>       - Beam SQL already supports declaring sources like Kafka and Text
>>       using CREATE TABLE DDL;
>>       - it supports additional configuration using the TBLPROPERTIES clause.
>>       Currently it takes a text blob, where we can put a JSON configuration;
>>       - wrapping PubsubIO into a similar source looks feasible;
>>    - The plan is to initially support only messages with a JSON payload:
>>       - more payload formats can be added later;
>>    - Messages will be fully described in the CREATE TABLE statements:
>>       - event timestamps. The source of the timestamp is configurable.
>>       Beam SQL requires an explicit timestamp column for windowing
>>       support;
>>       - message attributes map;
>>       - JSON payload schema;
>>    - Event timestamps will be taken from either the publish time or a
>>    user-specified message attribute (configurable).
>>
>> Thoughts, ideas, comments?
>>
>> More details are in the doc here:
>> https://docs.google.com/document/d/1wIXTxh-nQ3u694XbF0iEZX_7-b3yi4ad0ML2pcAxYfE
>>
>>
>> Thank you,
>> Anton
>>
>
