I like to avoid magic too. I might not have been entirely clear in what I was asking. Here is an example of what I had in mind, replacing the TBLPROPERTIES with a more generic TIMESTAMP option:
CREATE TABLE table_name ( publishTimestamp TIMESTAMP, attributes MAP(VARCHAR, VARCHAR), payload ROW ( name VARCHAR, age INTEGER, isSWE BOOLEAN, tags ARRAY(VARCHAR))) TIMESTAMP attributes["createTime"]; Andrew On Thu, May 3, 2018 at 12:47 PM Anton Kedin <ke...@google.com> wrote: > I think it makes sense for the case when timestamp is provided in the > payload (including pubsub message attributes). We can mark the field as an > event timestamp. But if the timestamp is internally defined by the source > (pubsub message publish time) and not exposed in the event body, then we > need a source-specific mechanism to extract and map the event timestamp to > the schema. This is, of course, if we don't automatically add a magic > timestamp field which Beam SQL can populate behind the scenes and add to > the schema. I want to avoid this magic path for now. > > On Thu, May 3, 2018 at 11:10 AM Andrew Pilloud <apill...@google.com> > wrote: > >> This sounds awesome! >> >> Is event timestamp something that we need to specify for every source? If >> so, I would suggest we add this as a first class option on CREATE TABLE >> rather then something hidden in TBLPROPERTIES. >> >> Andrew >> >> On Wed, May 2, 2018 at 10:30 AM Anton Kedin <ke...@google.com> wrote: >> >>> Hi >>> >>> I am working on adding functionality to support querying Pubsub messages >>> directly from Beam SQL. >>> >>> *Goal* >>> Provide Beam users a pure SQL solution to create the pipelines with >>> Pubsub as a data source, without the need to set up the pipelines in >>> Java before applying the query. >>> >>> *High level approach* >>> >>> - >>> - Build on top of PubsubIO; >>> - Pubsub source will be declared using CREATE TABLE DDL statement: >>> - Beam SQL already supports declaring sources like Kafka and Text >>> using CREATE TABLE DDL; >>> - it supports additional configuration using TBLPROPERTIES >>> clause. Currently it takes a text blob, where we can put a JSON >>> configuration; >>> - wrapping PubsubIO into a similar source looks feasible; >>> - The plan is to initially support messages only with JSON payload: >>> - >>> - more payload formats can be added later; >>> - Messages will be fully described in the CREATE TABLE statements: >>> - event timestamps. Source of the timestamp is configurable. It >>> is required by Beam SQL to have an explicit timestamp column for >>> windowing >>> support; >>> - messages attributes map; >>> - JSON payload schema; >>> - Event timestamps will be taken either from publish time or >>> user-specified message attribute (configurable); >>> >>> Thoughts, ideas, comments? >>> >>> More details are in the doc here: >>> https://docs.google.com/document/d/1wIXTxh-nQ3u694XbF0iEZX_7-b3yi4ad0ML2pcAxYfE >>> >>> >>> Thank you, >>> Anton >>> >>