[ https://issues.apache.org/jira/browse/BEAM-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ismaël Mejía updated BEAM-3157:
-------------------------------
    Description:
Currently the Beam SQL transform only supports input and output data represented as a BeamRecord. This seems to me like a usability limitation (even if we can do a ParDo to prepare objects before and after the transform).

I suppose this constraint comes from the fact that we need to map name/type/value from an object field into Calcite, so it is convenient to have a specific data type (BeamRecord) for this. However, we can accomplish the same by using a PCollection of JavaBeans (where we know the same information via the field names/types/values) or by using Avro records, where we also have the Schema information. For the output PCollection we can map the object via a Reference (e.g. a JavaBean to be filled with the names of an Avro object).

Note: I am assuming simple mappings for the moment, since the SQL does not yet support composite types.

A simple API idea would be something like this:

PCollection<MyPojo> col = ...
PCollection<MyNewPojo> newCol = BeamSql.query("SELECT ...", MyNewPojo.class);

A first approach could be to just add the extra ParDos + transform DoFns; however, I suppose that for memory use reasons mapping directly into Calcite might make more sense.

  was:
Currently the Beam SQL transform only supports input and output data represented as a BeamRecord. This seems to me like a usability limitation (even if we can do a ParDo to prepare objects before and after the transform).

I suppose this constraint comes from the fact that we need to map name/type/value from an object field into Calcite, so it is convenient to have a specific data type (BeamRecord) for this. However, we can accomplish the same by using a PCollection of JavaBeans (where we know the same information via the field names/types/values) or by using Avro records, where we also have the Schema information. For the output PCollection we can map the object via a Reference (e.g. a JavaBean to be filled with the names of an Avro object).

Note: I am assuming simple mappings for the moment, since the SQL does not yet support composite types.

A simple API idea would be something like this:

PCollection<MyPojo> col = ...
PCollection<>BeamSql.query("SELECT ...", MyNewPojo.class);

A first approach could be to just add the extra ParDos + transform DoFns; however, I suppose that for memory use reasons mapping directly into Calcite might make more sense.

> BeamSql transform should support other PCollection types
> --------------------------------------------------------
>
>                 Key: BEAM-3157
>                 URL: https://issues.apache.org/jira/browse/BEAM-3157
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Ismaël Mejía
>
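To illustrate the point that a JavaBean already carries the name/type/value information the SQL layer needs, here is a minimal sketch that uses only the standard java.beans introspection API. It is not Beam or Calcite code: the class BeanRowSketch, the helpers schemaOf and rowOf, and the fields chosen for MyPojo are all hypothetical names introduced for this example.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class BeanRowSketch {

  /** Hypothetical input bean; the issue only names the type, the fields are assumed. */
  public static class MyPojo {
    private final String name;
    private final int age;

    public MyPojo(String name, int age) {
      this.name = name;
      this.age = age;
    }

    public String getName() { return name; }
    public int getAge() { return age; }
  }

  /** Collects property name -> property type, i.e. the schema a SQL planner would need. */
  static Map<String, Class<?>> schemaOf(Class<?> beanClass) throws IntrospectionException {
    Map<String, Class<?>> schema = new TreeMap<>();
    // Stop at Object.class so the synthetic "class" property is not reported.
    for (PropertyDescriptor pd :
        Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors()) {
      if (pd.getReadMethod() != null) {
        schema.put(pd.getName(), pd.getPropertyType());
      }
    }
    return schema;
  }

  /** Collects property name -> property value for one bean instance, i.e. one row. */
  static Map<String, Object> rowOf(Object bean)
      throws IntrospectionException, ReflectiveOperationException {
    Map<String, Object> row = new TreeMap<>();
    for (PropertyDescriptor pd :
        Introspector.getBeanInfo(bean.getClass(), Object.class).getPropertyDescriptors()) {
      Method getter = pd.getReadMethod();
      if (getter != null) {
        row.put(pd.getName(), getter.invoke(bean));
      }
    }
    return row;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(schemaOf(MyPojo.class));
    System.out.println(rowOf(new MyPojo("Ada", 36)));
  }
}
```

A conversion like this is what the extra ParDos mentioned above would have to perform for every element, which is one reason mapping directly into Calcite may be preferable.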
-- This message was sent by Atlassian JIRA (v6.4.14#64029)