[ https://issues.apache.org/jira/browse/BEAM-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth Knowles updated BEAM-3157:
----------------------------------
    Fix Version/s:     (was: 2.4.0)

> BeamSql transform should support other PCollection types
> --------------------------------------------------------
>
>                 Key: BEAM-3157
>                 URL: https://issues.apache.org/jira/browse/BEAM-3157
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Ismaël Mejía
>            Assignee: Anton Kedin
>            Priority: Major
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Currently the Beam SQL transform only supports input and output data
> represented as a BeamRecord. This seems to me like a usability limitation
> (even if we can do a ParDo to prepare objects before and after the transform).
>
> I suppose this constraint comes from the fact that we need to map
> name/type/value from an object field into Calcite, so it is convenient to have
> a specific data type (BeamRecord) for this. However, we can accomplish the
> same by using a PCollection of JavaBeans (where we know the same information
> via the field names/types/values) or by using Avro records, where we also have
> the Schema information. For the output PCollection we can map the object via
> a Reference (e.g. a JavaBean to be filled with the names of an Avro object).
>
> Note: I am assuming simple mappings for now, since the SQL does not yet
> support composite types.
>
> A simple API idea would be something like this:
>
> A simple filter:
>
> PCollection<MyPojo> col = BeamSql.query("SELECT * FROM .... WHERE ...").from(MyPojo.class);
>
> A projection:
>
> PCollection<MyNewPojo> newCol = BeamSql.query("SELECT id, name").from(MyPojo.class).as(MyNewPojo.class);
>
> A first approach could be to just add the extra ParDos + transform DoFns;
> however, I suppose that for memory use reasons mapping directly into
> Calcite might make more sense.
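
The description above relies on the observation that a JavaBean already exposes the name/type/value triple that the SQL layer needs. As a rough illustration only, the sketch below recovers that information with the standard `java.beans.Introspector`. It does not use Beam or Calcite, and `BeanRowSketch`, `schemaOf`, `toRow`, and the field types of `MyPojo` are hypothetical names and assumptions made for this example, not anything taken from the Beam code base.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Beam: shows that a JavaBean carries the
// name/type/value information the issue wants to map into the SQL layer.
class BeanRowSketch {

    /** Stand-in for the MyPojo of the proposal; the field types are assumptions. */
    public static class MyPojo {
        private final int id;
        private final String name;

        public MyPojo(int id, String name) {
            this.id = id;
            this.name = name;
        }

        public int getId() { return id; }
        public String getName() { return name; }
    }

    /** Property name -> declared type for every readable property of the bean class. */
    static Map<String, Class<?>> schemaOf(Class<?> beanClass) throws IntrospectionException {
        // TreeMap keeps the properties in a deterministic (alphabetical) order.
        Map<String, Class<?>> schema = new TreeMap<>();
        // Stopping at Object.class leaves out the synthetic "class" property.
        BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
        for (PropertyDescriptor property : info.getPropertyDescriptors()) {
            if (property.getReadMethod() != null) {
                schema.put(property.getName(), property.getPropertyType());
            }
        }
        return schema;
    }

    /** Property name -> current value for one bean instance, read through its getters. */
    static Map<String, Object> toRow(Object bean)
            throws IntrospectionException, ReflectiveOperationException {
        Map<String, Object> row = new TreeMap<>();
        BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
        for (PropertyDescriptor property : info.getPropertyDescriptors()) {
            Method getter = property.getReadMethod();
            if (getter != null) {
                row.put(property.getName(), getter.invoke(bean));
            }
        }
        return row;
    }

    public static void main(String[] args) throws Exception {
        MyPojo pojo = new MyPojo(1, "beam");
        System.out.println(schemaOf(MyPojo.class)); // names and types
        System.out.println(toRow(pojo));            // names and values
    }
}
```

A per-element conversion like `toRow` corresponds to the "extra ParDos" approach mentioned in the description: it allocates an intermediate map for every element, which is the kind of memory overhead that mapping directly into Calcite would presumably avoid.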