[jira] [Updated] (BEAM-5384) [SQL] Calcite optimizes away LogicalProject
[ https://issues.apache.org/jira/browse/BEAM-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-5384: -- Summary: [SQL] Calcite optimizes away LogicalProject (was: [SQL] Calcite Doesn't optimizes away LogicalProject) > [SQL] Calcite optimizes away LogicalProject > --- > > Key: BEAM-5384 > URL: https://issues.apache.org/jira/browse/BEAM-5384 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > > *From > [https://stackoverflow.com/questions/52313324/beam-sql-wont-work-when-using-aggregation-in-statement-cannot-plan-execution] > :* > I have a basic Beam pipeline that reads from GCS, does a Beam SQL transform > and writes the results to BigQuery. > When I don't do any aggregation in my SQL statement it works fine: > {code:java} > .. > PCollection<Row> outputStream = > sqlRows.apply( > "sql_transform", > SqlTransform.query("select views from PCOLLECTION")); > outputStream.setCoder(SCHEMA.getRowCoder()); > .. > {code} > However, when I try to aggregate with a sum then it fails (throws a > CannotPlanException exception): > {code:java} > .. > PCollection<Row> outputStream = > sqlRows.apply( > "sql_transform", > SqlTransform.query("select wikimedia_project, > sum(views) from PCOLLECTION group by wikimedia_project")); > outputStream.setCoder(SCHEMA.getRowCoder()); > .. > {code} > Stacktrace: > {code:java} > Step #1: 11:47:37,562 0[main] INFO > org.apache.beam.runners.dataflow.DataflowRunner - > PipelineOptions.filesToStage was not specified. Defaulting to files from the > classpath: will stage 117 files. Enable logging at DEBUG level to see which > files will be staged. 
> Step #1: 11:47:39,845 2283 [main] INFO > org.apache.beam.sdk.extensions.sql.impl.BeamQueryPlanner - SQL: > Step #1: SELECT `PCOLLECTION`.`wikimedia_project`, SUM(`PCOLLECTION`.`views`) > Step #1: FROM `beam`.`PCOLLECTION` AS `PCOLLECTION` > Step #1: GROUP BY `PCOLLECTION`.`wikimedia_project` > Step #1: 11:47:40,387 2825 [main] INFO > org.apache.beam.sdk.extensions.sql.impl.BeamQueryPlanner - SQLPlan> > Step #1: LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)]) > Step #1: BeamIOSourceRel(table=[[beam, PCOLLECTION]]) > Step #1: > Step #1: Exception in thread "main" > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException: > Node [rel#7:Subset#1.BEAM_LOGICAL.[]] could not be implemented; planner > state: > Step #1: > Step #1: Root: rel#7:Subset#1.BEAM_LOGICAL.[] > Step #1: Original rel: > Step #1: LogicalAggregate(subset=[rel#7:Subset#1.BEAM_LOGICAL.[]], > group=[{0}], EXPR$1=[SUM($1)]): rowcount = 10.0, cumulative cost = > {11.375000476837158 rows, 0.0 cpu, 0.0 io}, id = 5 > Step #1: BeamIOSourceRel(subset=[rel#4:Subset#0.BEAM_LOGICAL.[]], > table=[[beam, PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 > rows, 101.0 cpu, 0.0 io}, id = 2 > Step #1: > Step #1: Sets: > Step #1: Set#0, type: RecordType(VARCHAR wikimedia_project, BIGINT views) > Step #1:rel#4:Subset#0.BEAM_LOGICAL.[], best=rel#2, importance=0.81 > Step #1:rel#2:BeamIOSourceRel.BEAM_LOGICAL.[](table=[beam, > PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io} > Step #1:rel#10:Subset#0.ENUMERABLE.[], best=rel#9, importance=0.405 > Step #1: > rel#9:BeamEnumerableConverter.ENUMERABLE.[](input=rel#4:Subset#0.BEAM_LOGICAL.[]), > rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, > 1.7976931348623157E308 cpu, 1.7976931348623157E308 io} > Step #1: Set#1, type: RecordType(VARCHAR wikimedia_project, BIGINT EXPR$1) > Step #1:rel#6:Subset#1.NONE.[], best=null, importance=0.9 > Step #1: > 
rel#5:LogicalAggregate.NONE.[](input=rel#4:Subset#0.BEAM_LOGICAL.[],group={0},EXPR$1=SUM($1)), > rowcount=10.0, cumulative cost={inf} > Step #1:rel#7:Subset#1.BEAM_LOGICAL.[], best=null, importance=1.0 > Step #1: > rel#8:AbstractConverter.BEAM_LOGICAL.[](input=rel#6:Subset#1.NONE.[],convention=BEAM_LOGICAL,sort=[]), > rowcount=10.0, cumulative cost={inf} > Step #1: > Step #1: > Step #1:at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:448) > Step #1:at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:298) > Step #1:at > org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:666) > Step #1:at >
[jira] [Commented] (BEAM-5384) [SQL] Calcite Doesn't optimizes away LogicalProject
[ https://issues.apache.org/jira/browse/BEAM-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615043#comment-16615043 ] Anton Kedin commented on BEAM-5384: --- Reproduction: https://github.com/polleyg/gcp-batch-ingestion-bigquery/blob/beam_sql/src/main/java/org/polleyg/TemplatePipeline.java
[jira] [Updated] (BEAM-5384) [SQL] Calcite Doesn't optimizes away LogicalProject
[ https://issues.apache.org/jira/browse/BEAM-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-5384: -- Description: *From [https://stackoverflow.com/questions/52313324/beam-sql-wont-work-when-using-aggregation-in-statement-cannot-plan-execution] :* I have a basic Beam pipeline that reads from GCS, does a Beam SQL transform and writes the results to BigQuery. Aggregating with a sum fails with a CannotPlanException (full code and stacktrace quoted above).
[jira] [Created] (BEAM-5384) [SQL] Calcite Doesn't optimizes away LogicalProject
Anton Kedin created BEAM-5384: - Summary: [SQL] Calcite Doesn't optimizes away LogicalProject Key: BEAM-5384 URL: https://issues.apache.org/jira/browse/BEAM-5384 Project: Beam Issue Type: Bug Components: dsl-sql Reporter: Anton Kedin *From https://stackoverflow.com/questions/52313324/beam-sql-wont-work-when-using-aggregation-in-statement-cannot-plan-execution *: I have a basic Beam pipeline that reads from GCS, does a Beam SQL transform and writes the results to BigQuery. When I don't do any aggregation in my SQL statement it works fine; when I aggregate with a sum it fails with a CannotPlanException (full code and stacktrace quoted above).
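For reference, the result the failing query is meant to compute — per-project sums of views — can be sketched in plain Java, with no Beam or Calcite dependency. The sample rows below are hypothetical; this only illustrates the expected GROUP BY / SUM semantics, not the Beam SQL implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupBySum {
    // Computes the equivalent of:
    //   SELECT wikimedia_project, SUM(views) FROM PCOLLECTION GROUP BY wikimedia_project
    // over in-memory (project, views) pairs.
    static Map<String, Long> sumViewsByProject(List<Map.Entry<String, Long>> rows) {
        Map<String, Long> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Long> row : rows) {
            // merge() adds the views to the running sum for this project.
            sums.merge(row.getKey(), row.getValue(), Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> rows = List.of(
                Map.entry("en.wikipedia", 10L),
                Map.entry("de.wikipedia", 3L),
                Map.entry("en.wikipedia", 5L));
        System.out.println(sumViewsByProject(rows)); // {en.wikipedia=15, de.wikipedia=3}
    }
}
```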
[jira] [Commented] (BEAM-5335) [SQL] Output schema is not set incorrectly
[ https://issues.apache.org/jira/browse/BEAM-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607355#comment-16607355 ] Anton Kedin commented on BEAM-5335: --- From https://stackoverflow.com/questions/52181795/how-do-i-get-an-output-schema-for-an-apache-beam-sql-query/52209683: bq. I think it is a bug in Scio - the pipeline works in plain Java, as it turns out. Then I saw that Scio sets the coder itself after applying a transform, maybe this is the problem. I'll raise a ticket on the repo https://github.com/spotify/scio/blob/e95310ec0828e60209e279baa5047a6b62288c5a/scio-core/src/main/scala/com/spotify/scio/values/SCollection.scala#L149 > [SQL] Output schema is not set incorrectly > -- > > Key: BEAM-5335 > URL: https://issues.apache.org/jira/browse/BEAM-5335 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > > *From: > https://stackoverflow.com/questions/52181795/how-do-i-get-an-output-schema-for-an-apache-beam-sql-query > :* > I've been playing with the Beam SQL DSL and I'm unable to use the output from > a query without providing a coder that's aware of the output schema manually. > Can I infer the output schema rather than hardcoding it? > Neither the walkthrough nor the examples actually use the output from a query. > I'm using Scio rather than the plain Java API to keep the code relatively > readable and concise; I don't think that makes a difference for this question. > Here's an example of what I mean. 
> Given an input schema inSchema and some data source that is mapped onto a Row > as follows: (in this example, Avro-based, but again, I don't think that > matters): > {code} > sc.avroFile[Foo](args("input")) >.map(fooToRow) >.setCoder(inSchema.getRowCoder) >.applyTransform(SqlTransform.query("SELECT COUNT(1) FROM PCOLLECTION")) >.saveAsTextFile(args("output")) > {code} > Running this pipeline results in a KryoException as follows: > {code} > com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException > Serialization trace: > fieldIndices (org.apache.beam.sdk.schemas.Schema) > schema (org.apache.beam.sdk.values.RowWithStorage) > org.apache.beam.sdk.Pipeline$PipelineExecutionException: > com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException > {code} > However, inserting a RowCoder matching the SQL output, in this case a single > count int column: > {code} >...snip... >.applyTransform(SqlTransform.query("SELECT COUNT(1) FROM PCOLLECTION")) >.setCoder(Schema.builder().addInt64Field("count").build().getRowCoder) >.saveAsTextFile(args("output")) > {code} > Now the pipeline runs just fine. > Having to manually tell the pipeline how to encode the SQL output seems > unnecessary, given that we specify the input schema/coder(s) and a query. It > seems to me that we should be able to infer the output schema from that - but > I can't see how, other than maybe using Calcite directly? > Before raising a ticket on the Beam Jira, I thought I'd check I wasn't > missing something obvious! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5335) [SQL] Output schema is not set incorrectly
[ https://issues.apache.org/jira/browse/BEAM-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-5335: -- Summary: [SQL] Output schema is not set incorrectly (was: [SQL] Output schema is set incorrectly)
[jira] [Commented] (BEAM-5335) [SQL] Output schema is set incorrectly
[ https://issues.apache.org/jira/browse/BEAM-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606168#comment-16606168 ] Anton Kedin commented on BEAM-5335: --- And we need a test to chain multiple queries
[jira] [Commented] (BEAM-5335) [SQL] Output schema is set incorrectly
[ https://issues.apache.org/jira/browse/BEAM-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606167#comment-16606167 ] Anton Kedin commented on BEAM-5335: --- This is potentially scio-specific, as output schema should be set correctly: https://github.com/apache/beam/blob/9319ce18ac625b239c8cdc1b32e2b697d83c2504/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java#L80
[jira] [Created] (BEAM-5335) [SQL] Output schema is set incorrectly
Anton Kedin created BEAM-5335: - Summary: [SQL] Output schema is set incorrectly Key: BEAM-5335 URL: https://issues.apache.org/jira/browse/BEAM-5335 Project: Beam Issue Type: Bug Components: dsl-sql Reporter: Anton Kedin *From: https://stackoverflow.com/questions/52181795/how-do-i-get-an-output-schema-for-an-apache-beam-sql-query :* I've been playing with the Beam SQL DSL and I'm unable to use the output from a query without providing a coder that's aware of the output schema manually. Can I infer the output schema rather than hardcoding it? (full description quoted above)
[jira] [Created] (BEAM-5127) [SQL] Avoid String parsing in BeamTableUtils
Anton Kedin created BEAM-5127: - Summary: [SQL] Avoid String parsing in BeamTableUtils Key: BEAM-5127 URL: https://issues.apache.org/jira/browse/BEAM-5127 Project: Beam Issue Type: Bug Components: dsl-sql Reporter: Anton Kedin Looks like we're going through DateTime parsing on each projection: https://github.com/apache/beam/blob/9319ce18ac625b239c8cdc1b32e2b697d83c2504/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java#L125 https://github.com/apache/beam/blob/9ef5e8690871737cccfe513ac00481320f662c18/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamTableUtils.java#L110 We should avoid unnecessary casting and parsing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-5050) [SQL] NULLs are aggregated incorrectly
[ https://issues.apache.org/jira/browse/BEAM-5050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-5050. - Resolution: Fixed Fix Version/s: Not applicable > [SQL] NULLs are aggregated incorrectly > -- > > Key: BEAM-5050 > URL: https://issues.apache.org/jira/browse/BEAM-5050 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > Fix For: Not applicable > > Time Spent: 50m > Remaining Estimate: 0h > > For example, COUNT(field) should not count records with NULL field. We also > should handle and test on other aggregation functions (like AVG, SUM, MIN, > MAX, VAR_POP, VAR_SAMP, etc.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
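The behavior this issue asks for is standard SQL aggregation semantics, which can be illustrated with Python's sqlite3 (a reference demonstration, not Beam code): COUNT(col) skips rows whose col is NULL, COUNT(*) counts every row, and SUM/AVG/MIN/MAX ignore NULL inputs.

```python
import sqlite3

# Standard SQL NULL-handling in aggregates, shown with an embedded database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (f INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

count_star, count_f, sum_f = conn.execute(
    "SELECT COUNT(*), COUNT(f), SUM(f) FROM t").fetchone()
# COUNT(*) counts all rows; COUNT(f) skips the NULL; SUM(f) ignores the NULL.
print(count_star, count_f, sum_f)  # → 3 2 4
```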
[jira] [Closed] (BEAM-5056) [SQL] Nullability of aggregation expressions isn't inferred properly
[ https://issues.apache.org/jira/browse/BEAM-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-5056. - Resolution: Fixed Fix Version/s: Not applicable > [SQL] Nullability of aggregation expressions isn't inferred properly > > > Key: BEAM-5056 > URL: https://issues.apache.org/jira/browse/BEAM-5056 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Gleb Kanterov >Assignee: Xu Mingmin >Priority: Major > Fix For: Not applicable > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Given schema and rows: > {code:java} > Schema schema = > Schema.builder() > .addNullableField("f_int1", Schema.FieldType.INT32) > .addNullableField("f_int2", Schema.FieldType.INT32) > .build(); > List rows = > TestUtils.RowsBuilder.of(schema) > .addRows(null, null) > .getRows(); > {code} > Following query fails: > {code:sql} > SELECT AVG(f_int1) FROM PCOLLECTION GROUP BY f_int2 > {code} > {code:java} > Caused by: java.lang.IllegalArgumentException: Field EXPR$0 is not > nullable{code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5050) [SQL] NULLs are aggregated incorrectly
Anton Kedin created BEAM-5050: - Summary: [SQL] NULLs are aggregated incorrectly Key: BEAM-5050 URL: https://issues.apache.org/jira/browse/BEAM-5050 Project: Beam Issue Type: Bug Components: dsl-sql Reporter: Anton Kedin For example, COUNT(field) should not count records with NULL field -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5049) [SQL] Batch Join results in two shuffles
[ https://issues.apache.org/jira/browse/BEAM-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-5049: -- Description: The query like this: {code} SELECT a.*, b.*, c.* FROM a JOIN b ON a.some_id = b.some_id JOIN c ON a.some_id = c.some_id; {code} results in two shuffles. Can probably be optimized. Relevant code: - BeamJoinRel implements Join in SQL: https://github.com/apache/beam/blob/1675b0f843ed34de8ba6f3676f794db80b40139d/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L194 - CoGBK Join implementation: https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L36 was: The query like this: {code} SELECT a.*, b.*, c.* FROM a JOIN b ON a.user_id = b.user_id JOIN c ON a.user_id = c.user_id; {code} results in two shuffles. Can probably be optimized. Relevant code: - BeamJoinRel implements Join in SQL: https://github.com/apache/beam/blob/1675b0f843ed34de8ba6f3676f794db80b40139d/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L194 - CoGBK Join implementation: https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L36 > [SQL] Batch Join results in two shuffles > > > Key: BEAM-5049 > URL: https://issues.apache.org/jira/browse/BEAM-5049 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > > The query like this: > {code} > SELECT a.*, b.*, c.* FROM a JOIN b ON a.some_id = b.some_id JOIN c ON > a.some_id = c.some_id; > {code} > results in two shuffles. Can probably be optimized. 
> Relevant code: > - BeamJoinRel implements Join in SQL: > https://github.com/apache/beam/blob/1675b0f843ed34de8ba6f3676f794db80b40139d/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L194 > - CoGBK Join implementation: > https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L36 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-5049) [SQL] Batch Join results in two shuffles
Anton Kedin created BEAM-5049: - Summary: [SQL] Batch Join results in two shuffles Key: BEAM-5049 URL: https://issues.apache.org/jira/browse/BEAM-5049 Project: Beam Issue Type: Bug Components: dsl-sql Reporter: Anton Kedin A query like this: {code} SELECT a.*, b.*, c.* FROM a JOIN b ON a.user_id = b.user_id JOIN c ON a.user_id = c.user_id; {code} results in two shuffles. Can probably be optimized. Relevant code: - BeamJoinRel implements Join in SQL: https://github.com/apache/beam/blob/1675b0f843ed34de8ba6f3676f794db80b40139d/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L194 - CoGBK Join implementation: https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L36 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
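The suggested optimization rests on the fact that both joins use the same key, so a single grouping pass ("shuffle") can co-locate rows from all three inputs, instead of shuffling once per pairwise join. A plain-Python sketch of that idea (not Beam code; the inputs and inner-join semantics are illustrative):

```python
from collections import defaultdict

# Three inputs, all keyed by the same join key. One grouping pass builds a
# co-group per key, replacing two successive pairwise shuffles.
a = [(1, "a1"), (2, "a2")]
b = [(1, "b1")]
c = [(1, "c1"), (2, "c2")]

groups = defaultdict(lambda: ([], [], []))
for i, rows in enumerate((a, b, c)):
    for key, value in rows:
        groups[key][i].append(value)

# Emit the inner join: only keys present in all three inputs produce output.
joined = [(k, av, bv, cv)
          for k, (avs, bvs, cvs) in sorted(groups.items())
          for av in avs for bv in bvs for cv in cvs]
print(joined)  # → [(1, 'a1', 'b1', 'c1')]
```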
[jira] [Created] (BEAM-4815) [SQL] Document/fix text table (TextIO) insert behavior
Anton Kedin created BEAM-4815: - Summary: [SQL] Document/fix text table (TextIO) insert behavior Key: BEAM-4815 URL: https://issues.apache.org/jira/browse/BEAM-4815 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Anton Kedin Tables of type `text` are backed by TextIO. We don't do any extra configuration, so it results in behavior like this: - create text table with location 'test-file.csv' - if the file exists, you can select from it; - now insert values into that table; - this results in TextIO writing files with names like 'test-file.csv-0-of-1'; - selecting from the original table doesn't allow selecting from the newly created files; We need to properly document this, possibly log to Stdout. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4771) Update BeamSQL Walkthrough documentation to align with latest version
[ https://issues.apache.org/jira/browse/BEAM-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546989#comment-16546989 ] Anton Kedin commented on BEAM-4771: --- updated the doc: https://beam.apache.org/documentation/dsls/sql/walkthrough/ > Update BeamSQL Walkthrough documentation to align with latest version > - > > Key: BEAM-4771 > URL: https://issues.apache.org/jira/browse/BEAM-4771 > Project: Beam > Issue Type: Bug > Components: examples-java, website >Affects Versions: 2.5.0 >Reporter: Akanksha Sharma >Assignee: Reuven Lax >Priority: Minor > Fix For: Not applicable > > Time Spent: 1h 20m > Remaining Estimate: 0h > > As I see, in 2.5 BeamSQL had been changed to work with Schema. > The sample code provided in > [https://beam.apache.org/documentation/dsls/sql/walkthrough/] does not > compile with Beam 2.5, and needs to be updated. > > Row.withRowType(appType) > > The above mentioned line needs to be adapted to use schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-4771) Update BeamSQL Walkthrough documentation to align with latest version
[ https://issues.apache.org/jira/browse/BEAM-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin resolved BEAM-4771. --- Resolution: Fixed Assignee: Anton Kedin (was: Reuven Lax) > Update BeamSQL Walkthrough documentation to align with latest version > - > > Key: BEAM-4771 > URL: https://issues.apache.org/jira/browse/BEAM-4771 > Project: Beam > Issue Type: Bug > Components: examples-java, website >Affects Versions: 2.5.0 >Reporter: Akanksha Sharma >Assignee: Anton Kedin >Priority: Minor > Fix For: Not applicable > > Time Spent: 1h 20m > Remaining Estimate: 0h > > As I see, in 2.5 BeamSQL had been changed to work with Schema. > The sample code provided in > [https://beam.apache.org/documentation/dsls/sql/walkthrough/] does not > compile with Beam 2.5, and needs to be updated. > > Row.withRowType(appType) > > The above mentioned line needs to be adapted to use schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-3191) [SQL] Implement Stateful Join
[ https://issues.apache.org/jira/browse/BEAM-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-3191. - Resolution: Won't Fix Fix Version/s: Not applicable Doesn't seem like stateful join is a good path forward > [SQL] Implement Stateful Join > - > > Key: BEAM-3191 > URL: https://issues.apache.org/jira/browse/BEAM-3191 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > Fix For: Not applicable > > > For binary key-joins, we probably could implement stateful join. > Things to consider: > - identify primary keys for the inputs if possible; > - otherwise buffer both collections, correctly discard collections prefixes > on timer; -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-3190) [SQL] Join Windowing Semantics
[ https://issues.apache.org/jira/browse/BEAM-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-3190. - Resolution: Fixed Fix Version/s: Not applicable Joins of unbounded inputs only support non-global windows with default triggers at the moment. Everything else is rejected. Further improvements will be covered by retractions > [SQL] Join Windowing Semantics > -- > > Key: BEAM-3190 > URL: https://issues.apache.org/jira/browse/BEAM-3190 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > > Should join implementation reject incorrect windowing strategies? > Concerns: discarding mode + joins + multiple trigger firings might lead to > incorrect results, like missing join/data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-3180) [Nexmark][SQL] Refactor NexmarkLauncher
[ https://issues.apache.org/jira/browse/BEAM-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-3180. - Resolution: Won't Fix Fix Version/s: Not applicable > [Nexmark][SQL] Refactor NexmarkLauncher > --- > > Key: BEAM-3180 > URL: https://issues.apache.org/jira/browse/BEAM-3180 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Labels: nexmark > Fix For: Not applicable > > > NexmarkLauncher is a huge class which handles everything about running the > queries, from generation of entities, managing sources and sinks, to > monitoring the perf. And it has zero coverage. > It needs to be split into few components, at least need to move event > generation, sources and sinks creation, and monitoring into their own > components. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-3176) BeamSqlCli: support DROP clause
[ https://issues.apache.org/jira/browse/BEAM-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-3176. - Resolution: Fixed Fix Version/s: Not applicable This is done > BeamSqlCli: support DROP clause > --- > > Key: BEAM-3176 > URL: https://issues.apache.org/jira/browse/BEAM-3176 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: James Xu >Assignee: James Xu >Priority: Major > Fix For: Not applicable > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-3175) BeamSqlCli: support SELECT CLAUSE
[ https://issues.apache.org/jira/browse/BEAM-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-3175. - Resolution: Fixed Fix Version/s: Not applicable This is done > BeamSqlCli: support SELECT CLAUSE > - > > Key: BEAM-3175 > URL: https://issues.apache.org/jira/browse/BEAM-3175 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: James Xu >Priority: Major > Fix For: Not applicable > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-2998) add IT test for SQL
[ https://issues.apache.org/jira/browse/BEAM-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-2998. - Resolution: Fixed Assignee: Anton Kedin Fix Version/s: Not applicable Integration tests added for Pubsub->BigQuery writes using Beam SQL. They cover DDL, sources/sinks > add IT test for SQL > --- > > Key: BEAM-2998 > URL: https://issues.apache.org/jira/browse/BEAM-2998 > Project: Beam > Issue Type: Test > Components: dsl-sql, testing >Reporter: Xu Mingmin >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > > Add IT test for SQL module > https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/example/BeamSqlExample.java > is the base example. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-4173) [SQL] Refactor BeamSql, BeamSqlCli, BeamSqlEnv
[ https://issues.apache.org/jira/browse/BEAM-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-4173. - Resolution: Fixed Assignee: Anton Kedin Fix Version/s: Not applicable Deleted BeamSql Renamed QueryTransform -> SqlTransform Hidden the planner behind BeamSqlEnv > [SQL] Refactor BeamSql, BeamSqlCli, BeamSqlEnv > -- > > Key: BEAM-4173 > URL: https://issues.apache.org/jira/browse/BEAM-4173 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > > BeamSql is a single method which delegates to QueryTransform factory method > which creates BeamSqlEnv which creates BeamSqlPlanner which then configures > the parser and parses the query. > It looks like we can squash together a lot of it by: > - replacing BeamSql invocations with direct QueryTransform invocations; > - combining BeamSqlEnv with BeamSqlPlanner or extracting a higher level > configuration object; > - exposing few more QueryTransform builders to accept either planner or a > configuration object; > - building QueryTransforms in BeamSqlCli; -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4174) [SQL] Add DDL Support for MockedBoundedTable
[ https://issues.apache.org/jira/browse/BEAM-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516347#comment-16516347 ] Anton Kedin commented on BEAM-4174: --- There's a TestTableProvider which supports in-memory reads/writes/ddl > [SQL] Add DDL Support for MockedBoundedTable > > > Key: BEAM-4174 > URL: https://issues.apache.org/jira/browse/BEAM-4174 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > Fix For: Not applicable > > > We have a MockedBoundedTable which is an in-memory implementation of > BeamSqlTable. > We should support it in DDL and use it for testing of DDL/DML/CLI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-4174) [SQL] Add DDL Support for MockedBoundedTable
[ https://issues.apache.org/jira/browse/BEAM-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-4174. - Resolution: Fixed Assignee: Anton Kedin Fix Version/s: Not applicable > [SQL] Add DDL Support for MockedBoundedTable > > > Key: BEAM-4174 > URL: https://issues.apache.org/jira/browse/BEAM-4174 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > > We have a MockedBoundedTable which is an in-memory implementation of > BeamSqlTable. > We should support it in DDL and use it for testing of DDL/DML/CLI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL
[ https://issues.apache.org/jira/browse/BEAM-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-3773. - Resolution: Fixed Fix Version/s: Not applicable JDBC Driver Implemented on top of Avatica > [SQL] Investigate JDBC interface for Beam SQL > - > > Key: BEAM-3773 > URL: https://issues.apache.org/jira/browse/BEAM-3773 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Andrew Pilloud >Priority: Major > Fix For: Not applicable > > Time Spent: 17h > Remaining Estimate: 0h > > JDBC allows integration with a lot of third-party tools, e.g > [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], > [sqlline|https://github.com/julianhyde/sqlline]. We should look into how > feasible it is to implement a JDBC interface for Beam SQL -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-3983) BigQuery writes from pure SQL
[ https://issues.apache.org/jira/browse/BEAM-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-3983. - Resolution: Fixed Fix Version/s: Not applicable > BigQuery writes from pure SQL > - > > Key: BEAM-3983 > URL: https://issues.apache.org/jira/browse/BEAM-3983 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Andrew Pilloud >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > Time Spent: 33h 20m > Remaining Estimate: 0h > > It would be nice if you could write to BigQuery in SQL without writing any > java code. For example: > {code:java} > INSERT INTO bigquery SELECT * FROM PCOLLECTION{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4580) [SQL] Package self-contained REPL
[ https://issues.apache.org/jira/browse/BEAM-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-4580: -- Description: Create a build target to package everything needed to launch a command-line tool. All runners and IOs (was: Create a build target to package everything needed to launch a command-line tool) > [SQL] Package self-contained REPL > - > > Key: BEAM-4580 > URL: https://issues.apache.org/jira/browse/BEAM-4580 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > > Create a build target to package everything needed to launch a command-line > tool. All runners and IOs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4580) [SQL] Package self-contained REPL
Anton Kedin created BEAM-4580: - Summary: [SQL] Package self-contained REPL Key: BEAM-4580 URL: https://issues.apache.org/jira/browse/BEAM-4580 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Anton Kedin Create a build target to package everything needed to launch a command-line tool -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4562) [SQL] Fix INSERT VALUES in JdbcDriver
Anton Kedin created BEAM-4562: - Summary: [SQL] Fix INSERT VALUES in JdbcDriver Key: BEAM-4562 URL: https://issues.apache.org/jira/browse/BEAM-4562 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Anton Kedin Executing INSERT VALUES against JdbcDriver fails. Executing similar statements against BeamSqlEnv works fine. Example: {code:java} TestTableProvider tableProvider = new TestTableProvider(); Connection connection = JdbcDriver.connect(tableProvider); connection .createStatement() .executeUpdate("CREATE TABLE person (id BIGINT, name VARCHAR) TYPE 'test'"); connection.createStatement().executeQuery("INSERT INTO person VALUES(3, 'yyy')"); {code} Output: {code} java.sql.SQLException: Error while executing SQL "INSERT INTO person VALUES(3, 'yyy')": Node [rel#9:Subset#1.ENUMERABLE.[]] could not be implemented; planner state: Root: rel#9:Subset#1.ENUMERABLE.[] Original rel: BeamIOSinkRel(subset=[rel#9:Subset#1.ENUMERABLE.[]], table=[[beam, person]], operation=[INSERT], flattened=[false]): rowcount = 1.0, cumulative cost = {1.0 rows, 0.0 cpu, 0.0 io}, id = 6 LogicalValues(subset=[rel#5:Subset#0.NONE.[]], tuples=[[{ 3, 'yyy' }]]): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 0 Sets: Set#0, type: RecordType(BIGINT id, VARCHAR name) rel#5:Subset#0.NONE.[], best=null, importance=0.81 rel#0:LogicalValues.NONE.[[0, 1], [1]](type=RecordType(BIGINT id, VARCHAR name),tuples=[{ 3, 'yyy' }]), rowcount=1.0, cumulative cost={inf} rel#14:Subset#0.BEAM_LOGICAL.[], best=null, importance=0.81 rel#20:Subset#0.ENUMERABLE.[], best=rel#19, importance=0.405 rel#19:EnumerableValues.ENUMERABLE.[[0, 1], [1]](type=RecordType(BIGINT id, VARCHAR name),tuples=[{ 3, 'yyy' }]), rowcount=1.0, cumulative cost={1.0 rows, 1.0 cpu, 0.0 io} Set#1, type: RecordType(BIGINT ROWCOUNT) rel#7:Subset#1.BEAM_LOGICAL.[], best=null, importance=0.9 rel#6:BeamIOSinkRel.BEAM_LOGICAL.[](input=rel#5:Subset#0.NONE.[],table=[beam, person],operation=INSERT,flattened=false), 
rowcount=1.0, cumulative cost={inf} rel#15:BeamIOSinkRel.BEAM_LOGICAL.[](input=rel#14:Subset#0.BEAM_LOGICAL.[],table=[beam, person],operation=INSERT,flattened=false), rowcount=1.0, cumulative cost={inf} rel#9:Subset#1.ENUMERABLE.[], best=null, importance=1.0 rel#10:AbstractConverter.ENUMERABLE.[](input=rel#7:Subset#1.BEAM_LOGICAL.[],convention=ENUMERABLE,sort=[]), rowcount=1.0, cumulative cost={inf} rel#11:BeamEnumerableConverter.ENUMERABLE.[](input=rel#7:Subset#1.BEAM_LOGICAL.[]), rowcount=1.0, cumulative cost={inf}{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4516) [SQL] Allow casting to complex types
[ https://issues.apache.org/jira/browse/BEAM-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-4516: -- Description: Currently CAST is parsed in Calcite's main Parser.jj. When parsing the target type parameter it only supports primitive types or MULTISET. It's impossible to cast to an inline fully specified Row type at the moment (Calcite 1.16). Probably CREATE TYPE in Calcite 1.17 is worth looking at, it should support CAST was: Currently CAST is parsed in Calcite's main Parser.jj. When parsing the target type parameter it only supports primitive types or MULTISET. It's impossible to cast to a fully ad-hoc specified Row at the moment (Calcite 1.16). Probably CREATE TYPE in Calcite 1.17 is worth looking at, it should support CAST > [SQL] Allow casting to complex types > > > Key: BEAM-4516 > URL: https://issues.apache.org/jira/browse/BEAM-4516 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > > Currently CAST is parsed in Calcite's main Parser.jj. When parsing the > target type parameter it only supports primitive types or MULTISET. It's > impossible to cast to an inline fully specified Row type at the moment > (Calcite 1.16). > Probably CREATE TYPE in Calcite 1.17 is worth looking at, it should support > CAST -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4516) [SQL] Allow casting to complex types
Anton Kedin created BEAM-4516: - Summary: [SQL] Allow casting to complex types Key: BEAM-4516 URL: https://issues.apache.org/jira/browse/BEAM-4516 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Anton Kedin Currently CAST is parsed in Calcite's main Parser.jj. When parsing the target type parameter it only supports primitive types or MULTISET. It's impossible to cast to a fully ad-hoc specified Row at the moment (Calcite 1.16). Probably CREATE TYPE in Calcite 1.17 is worth looking at, it should support CAST -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4515) [SQL] Support Row construction with named fields
Anton Kedin created BEAM-4515: - Summary: [SQL] Support Row construction with named fields Key: BEAM-4515 URL: https://issues.apache.org/jira/browse/BEAM-4515 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Anton Kedin Currently Calcite's ROW() constructor only accepts values and doesn't allow names, so `INSERT INTO f_row SELECT ROW('asdfasdf', 'sdfsdfsd')` will fail with something like this: `Cannot assign to target field 'f_row' of type RecordType(VARCHAR f_str1, VARCHAR f_str2) from source field 'f_row' of type RecordType(VARCHAR EXPR$0, VARCHAR EXPR$1)` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4514) [SQL] Enable flattening configuration
Anton Kedin created BEAM-4514: - Summary: [SQL] Enable flattening configuration Key: BEAM-4514 URL: https://issues.apache.org/jira/browse/BEAM-4514 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Anton Kedin Currently the calcite projection also flattens the output row in PlannerImpl. This causes failure when inserting into a table with a ROW field. E.g. `INSERT INTO table (f_row_destination) SELECT f_row_source` will not work, as f_row_source will be flattened into separate fields and won't match the destination schema -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4509) Implement ROW_NUMBER
Anton Kedin created BEAM-4509: - Summary: Implement ROW_NUMBER Key: BEAM-4509 URL: https://issues.apache.org/jira/browse/BEAM-4509 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Anton Kedin Design and implement ROW_NUMBER() OVER window. It is supported by Calcite and we should look at feasibility of supporting it in Beam SQL [StackOverflow Post|https://stackoverflow.com/questions/50724531/implement-row-number-in-beamsql] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
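For reference, what ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) computes can be sketched in plain Python: partition the rows, order within each partition, then number from 1. The column names below are hypothetical; this is a semantics illustration, not a Beam implementation.

```python
from itertools import groupby

# Sketch of ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC).
rows = [("eng", 120), ("eng", 100), ("sales", 90)]

numbered = []
# Sort by partition key, then by salary descending within each partition.
ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
for dept, grp in groupby(ordered, key=lambda r: r[0]):
    # Number rows within the partition, starting at 1.
    for n, (d, salary) in enumerate(grp, start=1):
        numbered.append((d, salary, n))
print(numbered)  # → [('eng', 120, 1), ('eng', 100, 2), ('sales', 90, 1)]
```

The hard part in Beam is not this numbering but defining it over unbounded, windowed input, which is why the ticket calls for a feasibility look first.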
[jira] [Created] (BEAM-4387) [SQL] Implement date types comparisons
Anton Kedin created BEAM-4387: - Summary: [SQL] Implement date types comparisons Key: BEAM-4387 URL: https://issues.apache.org/jira/browse/BEAM-4387 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Anton Kedin Implement and document comparisons for datetime/timestamp and related types -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-3785) [SQL] Add support for arrays
[ https://issues.apache.org/jira/browse/BEAM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-3785. - Resolution: Fixed Fix Version/s: Not applicable > [SQL] Add support for arrays > > > Key: BEAM-3785 > URL: https://issues.apache.org/jira/browse/BEAM-3785 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > Time Spent: 4.5h > Remaining Estimate: 0h > > Support fields of Array type -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-3789) [SQL] Support Nested Rows
[ https://issues.apache.org/jira/browse/BEAM-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-3789. - Resolution: Fixed Fix Version/s: Not applicable > [SQL] Support Nested Rows > - > > Key: BEAM-3789 > URL: https://issues.apache.org/jira/browse/BEAM-3789 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Add support for SqlTypeName.ROW -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-4162) Wire up PubsubIO+JSON to Beam SQL
[ https://issues.apache.org/jira/browse/BEAM-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-4162. - Resolution: Fixed Fix Version/s: Not applicable > Wire up PubsubIO+JSON to Beam SQL > - > > Key: BEAM-4162 > URL: https://issues.apache.org/jira/browse/BEAM-4162 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > Time Spent: 6h 10m > Remaining Estimate: 0h > > Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up to > Beam SQL. > > Use publication time as event timestamp -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-4199) [SQL] Add a DLQ support for Pubsub tables
[ https://issues.apache.org/jira/browse/BEAM-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-4199. - Resolution: Fixed Fix Version/s: Not applicable > [SQL] Add a DLQ support for Pubsub tables > - > > Key: BEAM-4199 > URL: https://issues.apache.org/jira/browse/BEAM-4199 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently we crash the pipeline if there's any error processing the message > from the pubsub, including if it has incorrect JSON format, like missing > fields etc. > Correct solution would be for the user to specify a way to handle the errors, > and ideally point to a dead-letter-queue where Beam should send the messages > it could not process. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-4386) [SQL] Implement string LIKE operator
[ https://issues.apache.org/jira/browse/BEAM-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-4386. - Resolution: Duplicate Fix Version/s: Not applicable > [SQL] Implement string LIKE operator > > > Key: BEAM-4386 > URL: https://issues.apache.org/jira/browse/BEAM-4386 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > Fix For: Not applicable > > > Implement string LIKE operator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4386) [SQL] Implement string LIKE operator
Anton Kedin created BEAM-4386: - Summary: [SQL] Implement string LIKE operator Key: BEAM-4386 URL: https://issues.apache.org/jira/browse/BEAM-4386 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Anton Kedin Implement string LIKE operator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
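One common way to implement SQL LIKE is to translate the pattern into a regular expression, mapping % to .* and _ to . while escaping every other character. A Python sketch of that approach (an assumption about how one might implement it, not Beam's actual code; standard LIKE is case-sensitive):

```python
import re

def sql_like(value: str, pattern: str) -> bool:
    """Evaluate a SQL LIKE pattern by compiling it to a regex."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern)
    # fullmatch: LIKE must match the entire string, not a substring.
    return re.fullmatch(regex, value) is not None

print(sql_like("hello", "h%o"))    # → True
print(sql_like("hello", "h_llo"))  # → True
print(sql_like("hello", "H%"))     # → False (LIKE is case-sensitive)
```

A full implementation would also handle the optional ESCAPE clause, which this sketch omits.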
[jira] [Updated] (BEAM-4358) Create test artifacts
[ https://issues.apache.org/jira/browse/BEAM-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-4358: -- Description: Currently things like TestPipeline and TestPubsub implement TestRule and thus require the project to depend on Junit. We need to create separate artifacts for these and depend on Junit only in test scope. (was: Currently things like TestPipeline implement TestRule and thus require dependency on Junit. We need to create separate artifacts for these and depend on Junit only in test scope.) > Create test artifacts > - > > Key: BEAM-4358 > URL: https://issues.apache.org/jira/browse/BEAM-4358 > Project: Beam > Issue Type: Improvement > Components: build-system, testing >Reporter: Anton Kedin >Priority: Major > > Currently things like TestPipeline and TestPubsub implement TestRule and thus > require the project to depend on Junit. We need to create separate artifacts > for these and depend on Junit only in test scope. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4358) Create test artifacts
[ https://issues.apache.org/jira/browse/BEAM-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-4358: -- Description: Currently things like TestPipeline and TestPubsub implement TestRule and thus require the project to depend on Junit. We need to create separate artifacts for these test utilities and depend on Junit only in test scope. (was: Currently things like TestPipeline and TestPubsub implement TestRule and thus require the project to depend on Junit. We need to create separate artifacts for these and depend on Junit only in test scope.) > Create test artifacts > - > > Key: BEAM-4358 > URL: https://issues.apache.org/jira/browse/BEAM-4358 > Project: Beam > Issue Type: Improvement > Components: build-system, testing >Reporter: Anton Kedin >Priority: Major > > Currently things like TestPipeline and TestPubsub implement TestRule and thus > require the project to depend on Junit. We need to create separate artifacts > for these test utilities and depend on Junit only in test scope. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4358) Create test artifacts
Anton Kedin created BEAM-4358: - Summary: Create test artifacts Key: BEAM-4358 URL: https://issues.apache.org/jira/browse/BEAM-4358 Project: Beam Issue Type: Improvement Components: build-system, testing Reporter: Anton Kedin Currently things like TestPipeline implement TestRule and thus require dependency on Junit. We need to create separate artifacts for these and depend on Junit only in test scope. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-4201) Integration Tests for PubsubIO
[ https://issues.apache.org/jira/browse/BEAM-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-4201. - Resolution: Fixed Fix Version/s: Not applicable > Integration Tests for PubsubIO > -- > > Key: BEAM-4201 > URL: https://issues.apache.org/jira/browse/BEAM-4201 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > Time Spent: 5h > Remaining Estimate: 0h > > Add integration tests for PubsubIO -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4199) [SQL] Add a DLQ support for Pubsub tables
[ https://issues.apache.org/jira/browse/BEAM-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin reassigned BEAM-4199: - Assignee: Anton Kedin > [SQL] Add a DLQ support for Pubsub tables > - > > Key: BEAM-4199 > URL: https://issues.apache.org/jira/browse/BEAM-4199 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > > Currently we crash the pipeline if there's any error processing a message > from Pubsub, including malformed JSON, e.g. missing fields. > The correct solution would be for the user to specify a way to handle the > errors, and ideally point to a dead-letter queue where Beam should send the > messages it could not process. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
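The dead-letter-queue behavior described in this issue can be sketched in plain Java: instead of letting a parse failure crash the pipeline, each failed message is routed to a side collection. This is an illustrative sketch only; the class name and the use of in-memory lists (rather than Beam's tagged outputs) are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative DLQ routing sketch (plain Java, not Beam's API): messages that
// fail to parse go to a "deadLetter" collection instead of failing the job.
class DeadLetterRouting {
    static <T> Map<String, List<Object>> process(
            List<String> messages, Function<String, T> parser) {
        List<Object> output = new ArrayList<>();
        List<Object> deadLetters = new ArrayList<>();
        for (String msg : messages) {
            try {
                output.add(parser.apply(msg));
            } catch (RuntimeException e) {
                // A real DLQ entry would carry the error cause with the payload.
                deadLetters.add(msg);
            }
        }
        Map<String, List<Object>> result = new HashMap<>();
        result.put("output", output);
        result.put("deadLetter", deadLetters);
        return result;
    }
}
```

In Beam proper this shape corresponds to a ParDo with a main output and an additional tagged output for failures.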
[jira] [Commented] (BEAM-4294) Join operator translator
[ https://issues.apache.org/jira/browse/BEAM-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476114#comment-16476114 ] Anton Kedin commented on BEAM-4294: --- You're probably aware of the Joins library in Beam, but I'm going to leave the link here anyway :) [https://github.com/apache/beam/blob/2eeeaa257c9336b2c48c3850c8508686a75be7b3/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java] I don't know how helpful it is in the context of Euphoria > Join operator translator > > > Key: BEAM-4294 > URL: https://issues.apache.org/jira/browse/BEAM-4294 > Project: Beam > Issue Type: Sub-task > Components: dsl-euphoria >Affects Versions: Not applicable >Reporter: David Moravek >Assignee: Vaclav Plajt >Priority: Major > > Implementation of basic LeftJoin, RightJoin, InnerJoin and FullJoin > translation using CoGroupByKey. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4162) Wire up PubsubIO+JSON to Beam SQL
[ https://issues.apache.org/jira/browse/BEAM-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471491#comment-16471491 ] Anton Kedin commented on BEAM-4162: --- Main PR merged. Remaining work: integration test > Wire up PubsubIO+JSON to Beam SQL > - > > Key: BEAM-4162 > URL: https://issues.apache.org/jira/browse/BEAM-4162 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up to > Beam SQL. > > Use publication time as event timestamp -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-4200) [SQL] Support Pubsub publish time as event timestamp
[ https://issues.apache.org/jira/browse/BEAM-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-4200. - Resolution: Fixed Assignee: Anton Kedin Fix Version/s: Not applicable > [SQL] Support Pubsub publish time as event timestamp > > > Key: BEAM-4200 > URL: https://issues.apache.org/jira/browse/BEAM-4200 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > > [https://github.com/apache/beam/pull/5253] adds support for event timestamps > explicitly specified in the messages attributes. PubsubIO also supports using > messages publish timestamps instead. We need to wire this through. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4200) [SQL] Support Pubsub publish time as event timestamp
[ https://issues.apache.org/jira/browse/BEAM-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470894#comment-16470894 ] Anton Kedin commented on BEAM-4200: --- Updated the PR to use ProcessContext.timestamp() to populate the event_timestamp field. This will work for timestamps extracted from both attributes and publish time. > [SQL] Support Pubsub publish time as event timestamp > > > Key: BEAM-4200 > URL: https://issues.apache.org/jira/browse/BEAM-4200 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > > [https://github.com/apache/beam/pull/5253] adds support for event timestamps > explicitly specified in the messages attributes. PubsubIO also supports using > messages publish timestamps instead. We need to wire this through. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-4196) [SQL] Support Complex Types in DDL
[ https://issues.apache.org/jira/browse/BEAM-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-4196. - Resolution: Fixed Fix Version/s: Not applicable > [SQL] Support Complex Types in DDL > -- > > Key: BEAM-4196 > URL: https://issues.apache.org/jira/browse/BEAM-4196 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Neither the DDL parser we copied from calcite-server nor calcite-server itself > supports complex types in DDL. If we want to model something like JSON > objects we need to support at least Arrays and nested Rows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4201) Integration Tests for PubsubIO
[ https://issues.apache.org/jira/browse/BEAM-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin reassigned BEAM-4201: - Assignee: Anton Kedin > Integration Tests for PubsubIO > -- > > Key: BEAM-4201 > URL: https://issues.apache.org/jira/browse/BEAM-4201 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > > Add integration tests for PubsubIO -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-4196) [SQL] Support Complex Types in DDL
[ https://issues.apache.org/jira/browse/BEAM-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin reassigned BEAM-4196: - Assignee: Anton Kedin > [SQL] Support Complex Types in DDL > -- > > Key: BEAM-4196 > URL: https://issues.apache.org/jira/browse/BEAM-4196 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > > Neither the DDL parser we copied from calcite-server nor calcite-server itself > supports complex types in DDL. If we want to model something like JSON > objects we need to support at least Arrays and nested Rows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4201) Integration Tests for PubsubIO
Anton Kedin created BEAM-4201: - Summary: Integration Tests for PubsubIO Key: BEAM-4201 URL: https://issues.apache.org/jira/browse/BEAM-4201 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Anton Kedin Add integration tests for PubsubIO -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4200) [SQL] Support Pubsub publish time as event timestamp
Anton Kedin created BEAM-4200: - Summary: [SQL] Support Pubsub publish time as event timestamp Key: BEAM-4200 URL: https://issues.apache.org/jira/browse/BEAM-4200 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Anton Kedin [https://github.com/apache/beam/pull/5253] adds support for event timestamps explicitly specified in the messages attributes. PubsubIO also supports using messages publish timestamps instead. We need to wire this through. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4199) [SQL] Add a DLQ support for Pubsub tables
Anton Kedin created BEAM-4199: - Summary: [SQL] Add a DLQ support for Pubsub tables Key: BEAM-4199 URL: https://issues.apache.org/jira/browse/BEAM-4199 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Anton Kedin Currently we crash the pipeline if there's any error processing a message from Pubsub, including malformed JSON, e.g. missing fields. The correct solution would be for the user to specify a way to handle the errors, and ideally point to a dead-letter queue where Beam should send the messages it could not process. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4198) Automatically infer JSON schema from Pubsub messages
[ https://issues.apache.org/jira/browse/BEAM-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-4198: -- Summary: Automatically infer JSON schema from Pubsub messages (was: Dynamically infer JSON schema from Pubsub messages) > Automatically infer JSON schema from Pubsub messages > > > Key: BEAM-4198 > URL: https://issues.apache.org/jira/browse/BEAM-4198 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Anton Kedin >Priority: Major > > JsonToRow transform allows JSON String->Row conversion but requires users to > know and specify the correct schema upfront. It would be great to be able to > infer the schema from a message sample. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4198) Dynamically infer JSON schema from Pubsub messages
Anton Kedin created BEAM-4198: - Summary: Dynamically infer JSON schema from Pubsub messages Key: BEAM-4198 URL: https://issues.apache.org/jira/browse/BEAM-4198 Project: Beam Issue Type: Improvement Components: sdk-java-core Reporter: Anton Kedin JsonToRow transform allows JSON String->Row conversion but requires users to know and specify the correct schema upfront. It would be great to be able to infer the schema from a message sample. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
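Inferring a schema from a message sample, as proposed above, could work roughly like this. The sketch below assumes the JSON sample has already been parsed into a `Map` (for example by a JSON library); the class name, type names, and the single-sample strategy are illustrative assumptions, not Beam APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: derive a field-name -> SQL-ish type mapping from one
// sample message. A real implementation would merge several samples and
// handle nulls and conflicting types.
class SchemaInference {
    static Map<String, String> inferSchema(Map<String, Object> sample) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : sample.entrySet()) {
            schema.put(e.getKey(), inferType(e.getValue()));
        }
        return schema;
    }

    private static String inferType(Object value) {
        if (value instanceof Boolean) return "BOOLEAN";
        if (value instanceof Integer || value instanceof Long) return "BIGINT";
        if (value instanceof Double) return "DOUBLE";
        if (value instanceof List) return "ARRAY";    // element type needs recursion
        if (value instanceof Map) return "ROW";       // nested object -> nested Row
        return "VARCHAR";
    }
}
```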
[jira] [Created] (BEAM-4197) Add tests for PubsubIO.readXxx() -> Row
Anton Kedin created BEAM-4197: - Summary: Add tests for PubsubIO.readXxx() -> Row Key: BEAM-4197 URL: https://issues.apache.org/jira/browse/BEAM-4197 Project: Beam Issue Type: Improvement Components: sdk-java-core Reporter: Anton Kedin PubsubIO has methods like PubsubIO.readStrings(), PubsubIO.readAvros() and PubsubIO.readProtos(). We need to add tests to make sure it's easy to convert objects read from Pubsub to Rows, e.g. using JsonToRow transform, or InferredRowCoder. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4196) [SQL] Support Complex Types in DDL
Anton Kedin created BEAM-4196: - Summary: [SQL] Support Complex Types in DDL Key: BEAM-4196 URL: https://issues.apache.org/jira/browse/BEAM-4196 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Anton Kedin Neither the DDL parser we copied from calcite-server nor calcite-server itself supports complex types in DDL. If we want to model something like JSON objects we need to support at least Arrays and nested Rows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4195) Support Pubsub emulator in Junit
Anton Kedin created BEAM-4195: - Summary: Support Pubsub emulator in Junit Key: BEAM-4195 URL: https://issues.apache.org/jira/browse/BEAM-4195 Project: Beam Issue Type: New Feature Components: io-java-gcp Reporter: Anton Kedin Assignee: Anton Kedin Add support for unit testing of Pubsub-related functionality by, for example, wrapping the Pubsub emulator into a Junit test runner or a rule. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4194) [SQL] Support LIMIT
Anton Kedin created BEAM-4194: - Summary: [SQL] Support LIMIT Key: BEAM-4194 URL: https://issues.apache.org/jira/browse/BEAM-4194 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Anton Kedin We need to support queries with "LIMIT xxx". The problem is that we don't know when aggregates will trigger; they can potentially accumulate values in the global window and never trigger. If we have some trigger syntax (BEAM-4193), then the use case becomes similar to what we have at the moment, where the user defines the trigger upstream for all inputs. In this case LIMIT can probably be implemented as sample.any(5) with a trigger at count. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
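The "sample.any(N) with a trigger at count" idea above amounts to taking any N elements and then firing once. A minimal plain-Java sketch of that semantics for a bounded input (illustrative only; the class name is hypothetical and this ignores Beam's windowing machinery):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of LIMIT-as-sample semantics: keep any N elements, then stop.
// For an unbounded source, "break" corresponds to a trigger firing at count N.
class LimitSketch {
    static <T> List<T> limit(Iterable<T> input, int n) {
        List<T> out = new ArrayList<>();
        for (T element : input) {
            if (out.size() >= n) break; // the count-N "trigger" has fired
            out.add(element);
        }
        return out;
    }
}
```

Note that SQL LIMIT without ORDER BY does not guarantee which rows are returned, which is why a "sample any" semantics is a reasonable fit.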
[jira] [Updated] (BEAM-4193) [SQL] Support trigger definition in CLI
[ https://issues.apache.org/jira/browse/BEAM-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-4193: -- Description: We need to be able to define triggers for the sources in SQL CLI. Otherwise it is impossible to reason about outputs. One approach is to come up with a simple JSON trigger definition syntax, along the lines of {code:java} SET OPT( '{ "trigger" : { "DefaultTrigger" }, "allowedLateness" : 0 }' ) {code} was: We need to be able to define triggers for the sources in SQL CLI. Otherwise it is impossible to reason about outputs. One approach is to come up with a simple JSON trigger definition syntax, along the lines of SET OPT( '\{ "trgger" : { "DefaultTrigger" }, "allowedLateness" : 0 }' ) > [SQL] Support trigger definition in CLI > --- > > Key: BEAM-4193 > URL: https://issues.apache.org/jira/browse/BEAM-4193 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > > We need to be able to define triggers for the sources in SQL CLI. Otherwise > it is impossible to reason about outputs. > One approach is to come up with a simple JSON trigger definition syntax, > along the lines of > > {code:java} > SET OPT( '{ "trigger" : { "DefaultTrigger" }, "allowedLateness" : 0 }' ) > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4193) [SQL] Support trigger definition in CLI
Anton Kedin created BEAM-4193: - Summary: [SQL] Support trigger definition in CLI Key: BEAM-4193 URL: https://issues.apache.org/jira/browse/BEAM-4193 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Anton Kedin We need to be able to define triggers for the sources in SQL CLI. Otherwise it is impossible to reason about outputs. One approach is to come up with a simple JSON trigger definition syntax, along the lines of SET OPT( '\{ "trigger" : { "DefaultTrigger" }, "allowedLateness" : 0 }' ) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4183) [SQL] Document supported DDL
Anton Kedin created BEAM-4183: - Summary: [SQL] Document supported DDL Key: BEAM-4183 URL: https://issues.apache.org/jira/browse/BEAM-4183 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Anton Kedin Document supported DDL with all supported types, parameters, and properties for each table type -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4174) [SQL] Add DDL Support for MockedBoundedTable
[ https://issues.apache.org/jira/browse/BEAM-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-4174: -- Description: We have a MockedBoundedTable which is an in-memory implementation of BeamSqlTable. We should support it in DDL and use it for testing of DDL/DML/CLI. was: We have a MockedBoundedTable which implements BeamSqlTable. We should support it in DDL and use it for testing of DDL/DML/CLI. > [SQL] Add DDL Support for MockedBoundedTable > > > Key: BEAM-4174 > URL: https://issues.apache.org/jira/browse/BEAM-4174 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > > We have a MockedBoundedTable which is an in-memory implementation of > BeamSqlTable. > We should support it in DDL and use it for testing of DDL/DML/CLI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4174) [SQL] Add DDL Support for MockedBoundedTable
Anton Kedin created BEAM-4174: - Summary: [SQL] Add DDL Support for MockedBoundedTable Key: BEAM-4174 URL: https://issues.apache.org/jira/browse/BEAM-4174 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Anton Kedin We have a MockedBoundedTable which implements BeamSqlTable. We should support it in DDL and use it for testing of DDL/DML/CLI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4173) [SQL] Refactor BeamSql, BeamSqlCli, BeamSqlEnv
Anton Kedin created BEAM-4173: - Summary: [SQL] Refactor BeamSql, BeamSqlCli, BeamSqlEnv Key: BEAM-4173 URL: https://issues.apache.org/jira/browse/BEAM-4173 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Anton Kedin BeamSql is a single method which delegates to the QueryTransform factory method, which creates BeamSqlEnv, which creates BeamSqlPlanner, which then configures the parser and parses the query. It looks like we can squash together a lot of it by: - replacing BeamSql invocations with direct QueryTransform invocations; - combining BeamSqlEnv with BeamSqlPlanner or extracting a higher level configuration object; - exposing a few more QueryTransform builders to accept either a planner or a configuration object; - building QueryTransforms in BeamSqlCli; -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-4160) Convert JSON objects to Rows
[ https://issues.apache.org/jira/browse/BEAM-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-4160. - Resolution: Fixed Fix Version/s: Not applicable > Convert JSON objects to Rows > > > Key: BEAM-4160 > URL: https://issues.apache.org/jira/browse/BEAM-4160 > Project: Beam > Issue Type: New Feature > Components: dsl-sql, sdk-java-core >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Automate conversion of JSON objects to Rows to reduce overhead for querying > JSON-based sources -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4168) [SQL] Create a "WordCount" example
Anton Kedin created BEAM-4168: - Summary: [SQL] Create a "WordCount" example Key: BEAM-4168 URL: https://issues.apache.org/jira/browse/BEAM-4168 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Anton Kedin Create a SQL example which we can execute and verify in runners other than DirectRunner -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4167) Implement UNNEST
Anton Kedin created BEAM-4167: - Summary: Implement UNNEST Key: BEAM-4167 URL: https://issues.apache.org/jira/browse/BEAM-4167 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Anton Kedin We need to be able to convert collections to relations in the query to perform any meaningful operations on them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
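UNNEST turns an array-valued column into one output row per array element, paired with the row's other fields. A toy plain-Java sketch of that semantics (rows are modeled as strings purely for illustration; this is not the Calcite or Beam relational machinery):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of UNNEST semantics: for a row (key, [v1, v2, ...]) emit one flat
// row per array element: (key, v1), (key, v2), ...
class UnnestSketch {
    static List<String> unnest(String key, List<String> values) {
        List<String> rows = new ArrayList<>();
        for (String v : values) {
            rows.add(key + ":" + v); // one output row per element
        }
        return rows;
    }
}
```

In SQL terms this corresponds to `SELECT t.key, e FROM t, UNNEST(t.values) AS e`.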
[jira] [Commented] (BEAM-3157) BeamSql transform should support other PCollection types
[ https://issues.apache.org/jira/browse/BEAM-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450456#comment-16450456 ] Anton Kedin commented on BEAM-3157: --- Added example: [https://github.com/apache/beam/pull/5215] Jira for fixing SELECT *: BEAM-4163 > BeamSql transform should support other PCollection types > > > Key: BEAM-3157 > URL: https://issues.apache.org/jira/browse/BEAM-3157 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Ismaël Mejía >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Currently the Beam SQL transform only supports input and output data > represented as a BeamRecord. This seems to me like a usability limitation > (even if we can do a ParDo to prepare objects before and after the transform). > I suppose this constraint comes from the fact that we need to map > name/type/value from an object field into Calcite so it is convenient to have > a specific data type (BeamRecord) for this. However we can accomplish the > same by using a PCollection of JavaBean (where we know the same information > via the field names/types/values) or by using Avro records where we also have > the Schema information. For the output PCollection we can map the object via > a Reference (e.g. a JavaBean to be filled with the names of an Avro object). > Note: I am assuming for the moment simple mappings since the SQL does not > support composite types for the moment. > A simple API idea would be something like this: > A simple filter: > PCollection col = BeamSql.query("SELECT * FROM WHERE > ...").from(MyPojo.class); > A projection: > PCollection newCol = BeamSql.query("SELECT id, > name").from(MyPojo.class).as(MyNewPojo.class); > A first approach could be to just add the extra ParDos + transform DoFns > however I suppose that for memory use reasons maybe mapping directly into > Calcite would make sense. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4163) Fix SELECT * for Pojo queries
Anton Kedin created BEAM-4163: - Summary: Fix SELECT * for Pojo queries Key: BEAM-4163 URL: https://issues.apache.org/jira/browse/BEAM-4163 Project: Beam Issue Type: Bug Components: dsl-sql Reporter: Anton Kedin Rows generated from Pojos are based on field indices, which means they can break if Pojo fields are enumerated in a different order. This can cause the generated Row to differ between runner instances, which in turn can cause SELECT * to fail. One solution is to make Pojo field ordering deterministic, e.g. sort the fields before generating field accessors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
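The proposed fix can be sketched with plain reflection: `Class.getDeclaredFields()` makes no ordering guarantee, so sorting the field names before assigning indices makes the ordering deterministic across JVM instances. The class names below, including `SamplePojo`, are hypothetical examples, not Beam code.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of deterministic Pojo field ordering: getDeclaredFields() may return
// fields in any order, so sort by name before generating accessors/indices.
class DeterministicFields {
    static List<String> orderedFieldNames(Class<?> pojoClass) {
        List<String> names = new ArrayList<>();
        for (Field f : pojoClass.getDeclaredFields()) {
            names.add(f.getName());
        }
        Collections.sort(names); // stable order regardless of JVM enumeration
        return names;
    }

    // Hypothetical Pojo used only to demonstrate the ordering.
    static class SamplePojo {
        int views;
        String wikimediaProject;
    }
}
```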
[jira] [Updated] (BEAM-4161) Nested Rows flattening doesn't work
[ https://issues.apache.org/jira/browse/BEAM-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-4161: -- Issue Type: Bug (was: New Feature) > Nested Rows flattening doesn't work > --- > > Key: BEAM-4161 > URL: https://issues.apache.org/jira/browse/BEAM-4161 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > > Calcite flattens nested rows. It updates the field indices of the flattened > row so the fields are referenced correctly in the Rel Nodes. But the fields > after the flattened row don't have the indices updated, they have the > previous ordinals before the flattening. There is no way to look up the > correct index at the point when it reaches Beam SQL Rel Nodes. It will be > fixed in Calcite 1.17. > We need to update Calcite as soon as it is released and add a few > integration tests around nested Rows: > - basic nesting with fields before and after the row field; > - multi-level row nesting; > - multiple row fields; > > Calcite JIRA: CALCITE-2220 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4162) Wire up PubsubIO+JSON to Beam SQL
[ https://issues.apache.org/jira/browse/BEAM-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-4162: -- Description: Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up to Beam SQL. Use publication time as event timestamp was:Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up to Beam SQL > Wire up PubsubIO+JSON to Beam SQL > - > > Key: BEAM-4162 > URL: https://issues.apache.org/jira/browse/BEAM-4162 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > > Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up to > Beam SQL. > > Use publication time as event timestamp -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4162) Wire up PubsubIO+JSON to Beam SQL
Anton Kedin created BEAM-4162: - Summary: Wire up PubsubIO+JSON to Beam SQL Key: BEAM-4162 URL: https://issues.apache.org/jira/browse/BEAM-4162 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Anton Kedin Assignee: Anton Kedin Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up to Beam SQL -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-4160) Convert JSON objects to Rows
[ https://issues.apache.org/jira/browse/BEAM-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449058#comment-16449058 ] Anton Kedin edited comment on BEAM-4160 at 4/23/18 11:58 PM: - PR: [https://github.com/apache/beam/pull/5120] Caveat: Calcite has a bug which prevents querying complex Rows: BEAM-4161. It is getting fixed in Calcite 1.17 (we're at 1.16 (latest) at the moment). was (Author: kedin): PR: [https://github.com/apache/beam/pull/5120 ] Caveat: Calcite has a bug which prevents querying complex Rows: BEAM-4161. It is getting fixed in Calcite 1.17 (we're at 1.16 (latest) at the moment). > Convert JSON objects to Rows > > > Key: BEAM-4160 > URL: https://issues.apache.org/jira/browse/BEAM-4160 > Project: Beam > Issue Type: New Feature > Components: dsl-sql, sdk-java-core >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > > Automate conversion of JSON objects to Rows to reduce overhead for querying > JSON-based sources -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4161) Nested Rows flattening doesn't work
[ https://issues.apache.org/jira/browse/BEAM-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-4161: -- Description: Calcite flattens nested rows. It updates the field indices of the flattened row so the fields are referenced correctly in the Rel Nodes. But the fields after the flattened row don't have the indices updated, they have the previous ordinals before the flattening. There is no way to look up the correct index at the point when it reaches Beam SQL Rel Nodes. It will be fixed in Calcite 1.17. We need to update Calcite as soon as it is released and add a few integration tests around nested Rows: - basic nesting with fields before and after the row field; - multi-level row nesting; - multiple row fields; Calcite JIRA: CALCITE-2220 was: Calcite flattens nested rows. It updates the field indices of the flattened row so the fields are referenced correctly in the Rel Nodes. But the fields after the flattened row don't have the indices updated, they have the previous ordinals before the flattening. There is no way to look up the correct index at the point when it reaches Beam SQL Rel Nodes. It will be fixed in Calcite 1.17. We need to update the Calcite as soon as it is released and add few integration tests around nested Rows: - basic nesting with fields before and after the row field; - multi-level row nesting; - multiple row fields; > Nested Rows flattening doesn't work > --- > > Key: BEAM-4161 > URL: https://issues.apache.org/jira/browse/BEAM-4161 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > > Calcite flattens nested rows. It updates the field indices of the flattened > row so the fields are referenced correctly in the Rel Nodes. But the fields > after the flattened row don't have the indices updated, they have the > previous ordinals before the flattening. 
There is no way to look up the > correct index at the point when it reaches Beam SQL Rel Nodes. It will be > fixed in Calcite 1.17. > We need to update Calcite as soon as it is released and add a few > integration tests around nested Rows: > - basic nesting with fields before and after the row field; > - multi-level row nesting; > - multiple row fields; > > Calcite JIRA: CALCITE-2220 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-4160) Convert JSON objects to Rows
[ https://issues.apache.org/jira/browse/BEAM-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449058#comment-16449058 ] Anton Kedin commented on BEAM-4160: --- PR: [https://github.com/apache/beam/pull/5120 ] Caveat: Calcite has a bug which prevents querying complex Rows: BEAM-4161. It is getting fixed in Calcite 1.17 (we're at 1.16 (latest) at the moment). > Convert JSON objects to Rows > > > Key: BEAM-4160 > URL: https://issues.apache.org/jira/browse/BEAM-4160 > Project: Beam > Issue Type: New Feature > Components: dsl-sql, sdk-java-core >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > > Automate conversion of JSON objects to Rows to reduce overhead for querying > JSON-based sources -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4160) Convert JSON objects to Rows
[ https://issues.apache.org/jira/browse/BEAM-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-4160: -- Description: Automate conversion of JSON objects to Rows to reduce overhead for querying JSON-based sources > Convert JSON objects to Rows > > > Key: BEAM-4160 > URL: https://issues.apache.org/jira/browse/BEAM-4160 > Project: Beam > Issue Type: New Feature > Components: dsl-sql, sdk-java-core >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > > Automate conversion of JSON objects to Rows to reduce overhead for querying > JSON-based sources -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4161) Nested Rows flattening doesn't work
Anton Kedin created BEAM-4161: - Summary: Nested Rows flattening doesn't work Key: BEAM-4161 URL: https://issues.apache.org/jira/browse/BEAM-4161 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Anton Kedin Assignee: Anton Kedin Calcite flattens nested rows. It updates the field indices of the flattened row so the fields are referenced correctly in the Rel Nodes. But the fields after the flattened row don't have the indices updated, they have the previous ordinals before the flattening. There is no way to look up the correct index at the point when it reaches Beam SQL Rel Nodes. It will be fixed in Calcite 1.17. We need to update Calcite as soon as it is released and add a few integration tests around nested Rows: - basic nesting with fields before and after the row field; - multi-level row nesting; - multiple row fields; -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-4160) Convert JSON objects to Rows
Anton Kedin created BEAM-4160: - Summary: Convert JSON objects to Rows Key: BEAM-4160 URL: https://issues.apache.org/jira/browse/BEAM-4160 Project: Beam Issue Type: New Feature Components: dsl-sql, sdk-java-core Reporter: Anton Kedin Assignee: Anton Kedin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-4076) Schema followups
[ https://issues.apache.org/jira/browse/BEAM-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-4076: -- Component/s: dsl-sql > Schema followups > > > Key: BEAM-4076 > URL: https://issues.apache.org/jira/browse/BEAM-4076 > Project: Beam > Issue Type: Improvement > Components: beam-model, dsl-sql, sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > > This umbrella bug contains subtasks with followups for Beam schemas, which > were moved from SQL to the core Java SDK and made to be type-name-based > rather than coder based. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-3449) [SQL] Document joins
[ https://issues.apache.org/jira/browse/BEAM-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-3449. - Resolution: Fixed Fix Version/s: Not applicable Doc updated: https://beam.apache.org/documentation/dsls/sql/#features-joins > [SQL] Document joins > > > Key: BEAM-3449 > URL: https://issues.apache.org/jira/browse/BEAM-3449 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > > Join behavior becomes complicated in the presence of non-trivial windowing on > the inputs, e.g. merging windows. We need to provide a complete description of > how each windowing mode works. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-3157) BeamSql transform should support other PCollection types
[ https://issues.apache.org/jira/browse/BEAM-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-3157. - Resolution: Fixed Fix Version/s: Not applicable > BeamSql transform should support other PCollection types > > > Key: BEAM-3157 > URL: https://issues.apache.org/jira/browse/BEAM-3157 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Ismaël Mejía >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Currently the Beam SQL transform only supports input and output data > represented as a BeamRecord. This seems to me like a usability limitation > (even if we can do a ParDo to prepare objects before and after the transform). > I suppose this constraint comes from the fact that we need to map > name/type/value from an object field into Calcite so it is convenient to have > a specific data type (BeamRecord) for this. However we can accomplish the > same by using a PCollection of JavaBean (where we know the same information > via the field names/types/values) or by using Avro records where we also have > the Schema information. For the output PCollection we can map the object via > a Reference (e.g. a JavaBean to be filled with the names of an Avro object). > Note: I am assuming for the moment simple mappings since the SQL does not > support composite types for the moment. > A simple API idea would be something like this: > A simple filter: > PCollection col = BeamSql.query("SELECT * FROM WHERE > ...").from(MyPojo.class); > A projection: > PCollection newCol = BeamSql.query("SELECT id, > name").from(MyPojo.class).as(MyNewPojo.class); > A first approach could be to just add the extra ParDos + transform DoFns > however I suppose that for memory use reasons maybe mapping directly into > Calcite would make sense. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3157) BeamSql transform should support other PCollection types
[ https://issues.apache.org/jira/browse/BEAM-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437561#comment-16437561 ] Anton Kedin commented on BEAM-3157: --- PR is merged which allows Rows generation for POJOs: [https://github.com/apache/beam/pull/4649/files] Usage example: [https://github.com/apache/beam/blob/670c75e94795ad9da6a0690647e996dc97b60718/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/InferredRowCoderSqlTest.java#L82] Caveat: `SELECT *` doesn't work correctly due to undefined order of fields. This PR resolves this Jira for now. Bugs and extra features will go into separate Jiras. PS: there's also work to convert JSON objects directly to Rows: https://github.com/apache/beam/pull/5120 > BeamSql transform should support other PCollection types > > > Key: BEAM-3157 > URL: https://issues.apache.org/jira/browse/BEAM-3157 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Ismaël Mejía >Assignee: Anton Kedin >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > Currently the Beam SQL transform only supports input and output data > represented as a BeamRecord. This seems to me like a usability limitation > (even if we can do a ParDo to prepare objects before and after the transform). > I suppose this constraint comes from the fact that we need to map > name/type/value from an object field into Calcite so it is convenient to have > a specific data type (BeamRecord) for this. However we can accomplish the > same by using a PCollection of JavaBean (where we know the same information > via the field names/types/values) or by using Avro records where we also have > the Schema information. For the output PCollection we can map the object via > a Reference (e.g. a JavaBean to be filled with the names of an Avro object). > Note: I am assuming for the moment simple mappings since the SQL does not > support composite types for the moment. 
> A simple API idea would be something like this: > A simple filter: > PCollection col = BeamSql.query("SELECT * FROM WHERE > ...").from(MyPojo.class); > A projection: > PCollection newCol = BeamSql.query("SELECT id, > name").from(MyPojo.class).as(MyNewPojo.class); > A first approach could be to just add the extra ParDos + transform DoFns > however I suppose that for memory use reasons maybe mapping directly into > Calcite would make sense. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
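The JavaBean idea above can be sketched without any Beam dependency: infer a name-to-value "row" from a POJO via reflection. (`InferRowDemo` and `MyPojo` are hypothetical; Beam's real implementation is the InferredRowCoder linked in the comment. Note that reflection reports fields in no guaranteed order, which is exactly the `SELECT *` caveat mentioned there.)

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class InferRowDemo {

    // Hypothetical sketch of row inference: read field names and values
    // off an arbitrary POJO. Beam's InferredRowCoder does this properly
    // with generated coders; this shows only the core idea.
    static Map<String, Object> toRow(Object pojo) throws IllegalAccessException {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Field f : pojo.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            // getDeclaredFields() makes no ordering guarantee -- the
            // reason `SELECT *` was unreliable for inferred rows.
            row.put(f.getName(), f.get(pojo));
        }
        return row;
    }

    static class MyPojo {
        String wikimediaProject = "wikipedia";
        long views = 1000L;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> row = toRow(new MyPojo());
        System.out.println(row.get("wikimediaProject")); // wikipedia
        System.out.println(row.get("views"));            // 1000
    }
}
```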
[jira] [Closed] (BEAM-3360) [SQL] Do not assign triggers for HOP/TUMBLE
[ https://issues.apache.org/jira/browse/BEAM-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-3360. - Resolution: Fixed Fix Version/s: 2.4.0 fixed by https://github.com/apache/beam/pull/4546 > [SQL] Do not assign triggers for HOP/TUMBLE > --- > > Key: BEAM-3360 > URL: https://issues.apache.org/jira/browse/BEAM-3360 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: 2.4.0 > > > Currently when parsing HOP/TUMBLE/SESSION expressions we create a repeating > trigger for the defined windows, see: > {code:java|title=BeamAggregationRule.java} > private Trigger createTriggerWithDelay(GregorianCalendar delayTime) { > return > Repeatedly.forever(AfterWatermark.pastEndOfWindow().withLateFirings(AfterProcessingTime > .pastFirstElementInPane().plusDelayOf(Duration.millis(delayTime.getTimeInMillis())))); > } > {code} > This will not work correctly with joins, as joins with multiple trigger > firings are currently broken: https://issues.apache.org/jira/browse/BEAM-3190 > . > Even if joins with multiple firings worked correctly, the SQL parsing stage is > still probably an incorrect place to infer them. > Better alternatives: > - inherit the user-defined triggers for the input pcollection without > modification; > - triggering at sinks ( https://s.apache.org/beam-sink-triggers ) might > define a way to backpropagate triggers with correct semantics; -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL
[ https://issues.apache.org/jira/browse/BEAM-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin reassigned BEAM-3773: - Assignee: Andrew Pilloud > [SQL] Investigate JDBC interface for Beam SQL > - > > Key: BEAM-3773 > URL: https://issues.apache.org/jira/browse/BEAM-3773 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Andrew Pilloud >Priority: Major > > JDBC allows integration with a lot of third-party tools, e.g > [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], > [sqlline|https://github.com/julianhyde/sqlline]. We should look into how > feasible it is to implement a JDBC interface for Beam SQL -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL
[ https://issues.apache.org/jira/browse/BEAM-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437547#comment-16437547 ] Anton Kedin commented on BEAM-3773: --- Raw Beam JDBC Prototype: [https://github.com/akedin/beam/commit/096ca8d7185af6b6a01f8231cf85c40fa221051a] [~apilloud] is looking at Calcite Avatica integration: [https://calcite.apache.org/avatica/docs/] > [SQL] Investigate JDBC interface for Beam SQL > - > > Key: BEAM-3773 > URL: https://issues.apache.org/jira/browse/BEAM-3773 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Andrew Pilloud >Priority: Major > > JDBC allows integration with a lot of third-party tools, e.g > [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], > [sqlline|https://github.com/julianhyde/sqlline]. We should look into how > feasible it is to implement a JDBC interface for Beam SQL -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-3547) [SQL] Nested Query Generates Incompatible Trigger
[ https://issues.apache.org/jira/browse/BEAM-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin updated BEAM-3547: -- Fix Version/s: (was: Not applicable) 2.4.0 > [SQL] Nested Query Generates Incompatible Trigger > - > > Key: BEAM-3547 > URL: https://issues.apache.org/jira/browse/BEAM-3547 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: 2.4.0 > > > From > [https://stackoverflow.com/questions/48335383/nested-queries-in-beam-sql] : > > SQL: > {code:java} > PCollection Query_Output = Query.apply( > BeamSql.queryMulti("Select Orders.OrderID From Orders Where > Orders.CustomerID IN (Select Customers.CustomerID From Customers WHERE > Customers.CustomerID = 2)"));{code} > > Error: > {code:java} > org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner > validateAndConvert > INFO: SQL: > SELECT `Orders`.`OrderID` > FROM `Orders` AS `Orders` > WHERE `Orders`.`CustomerID` IN (SELECT `Customers`.`CustomerID` > FROM `Customers` AS `Customers` > WHERE `Customers`.`CustomerID` = 2) > Jan 19, 2018 11:56:36 AM > org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner > convertToBeamRel > INFO: SQLPlan> > LogicalProject(OrderID=[$0]) > LogicalJoin(condition=[=($1, $3)], joinType=[inner]) > LogicalTableScan(table=[[Orders]]) > LogicalAggregate(group=[{0}]) > LogicalProject(CustomerID=[$0]) > LogicalFilter(condition=[=($0, 2)]) > LogicalTableScan(table=[[Customers]]) > Exception in thread "main" java.lang.IllegalStateException: > java.lang.IllegalStateException: Inputs to Flatten had incompatible triggers: > DefaultTrigger, Repeatedly.forever(AfterWatermark.pastEndOfWindow()) > at > org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:165) > at > org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:116) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533) > at 
org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465) > at > org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:160) > at com.bitwise.cloud.ExampleOfJoins.main(ExampleOfJoins.java:91) > Caused by: java.lang.IllegalStateException: Inputs to Flatten had > incompatible triggers: DefaultTrigger, > Repeatedly.forever(AfterWatermark.pastEndOfWindow()) > at > org.apache.beam.sdk.transforms.Flatten$PCollections.expand(Flatten.java:123) > at > org.apache.beam.sdk.transforms.Flatten$PCollections.expand(Flatten.java:101) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465) > at > org.apache.beam.sdk.values.PCollectionList.apply(PCollectionList.java:182) > at > org.apache.beam.sdk.transforms.join.CoGroupByKey.expand(CoGroupByKey.java:124) > at > org.apache.beam.sdk.transforms.join.CoGroupByKey.expand(CoGroupByKey.java:74) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465) > at > org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple.apply(KeyedPCollectionTuple.java:107) > at org.apache.beam.sdk.extensions.joinlibrary.Join.innerJoin(Join.java:59) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel.standardJoin(BeamJoinRel.java:217) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel.buildBeamPipeline(BeamJoinRel.java:161) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamProjectRel.buildBeamPipeline(BeamProjectRel.java:68) > at > org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:163) > ... 5 more{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-3547) [SQL] Nested Query Generates Incompatible Trigger
[ https://issues.apache.org/jira/browse/BEAM-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Kedin closed BEAM-3547. - Resolution: Fixed Fix Version/s: Not applicable > [SQL] Nested Query Generates Incompatible Trigger > - > > Key: BEAM-3547 > URL: https://issues.apache.org/jira/browse/BEAM-3547 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Fix For: Not applicable > > > From > [https://stackoverflow.com/questions/48335383/nested-queries-in-beam-sql] : > > SQL: > {code:java} > PCollection Query_Output = Query.apply( > BeamSql.queryMulti("Select Orders.OrderID From Orders Where > Orders.CustomerID IN (Select Customers.CustomerID From Customers WHERE > Customers.CustomerID = 2)"));{code} > > Error: > {code:java} > org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner > validateAndConvert > INFO: SQL: > SELECT `Orders`.`OrderID` > FROM `Orders` AS `Orders` > WHERE `Orders`.`CustomerID` IN (SELECT `Customers`.`CustomerID` > FROM `Customers` AS `Customers` > WHERE `Customers`.`CustomerID` = 2) > Jan 19, 2018 11:56:36 AM > org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner > convertToBeamRel > INFO: SQLPlan> > LogicalProject(OrderID=[$0]) > LogicalJoin(condition=[=($1, $3)], joinType=[inner]) > LogicalTableScan(table=[[Orders]]) > LogicalAggregate(group=[{0}]) > LogicalProject(CustomerID=[$0]) > LogicalFilter(condition=[=($0, 2)]) > LogicalTableScan(table=[[Customers]]) > Exception in thread "main" java.lang.IllegalStateException: > java.lang.IllegalStateException: Inputs to Flatten had incompatible triggers: > DefaultTrigger, Repeatedly.forever(AfterWatermark.pastEndOfWindow()) > at > org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:165) > at > org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:116) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533) > at 
org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465) > at > org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:160) > at com.bitwise.cloud.ExampleOfJoins.main(ExampleOfJoins.java:91) > Caused by: java.lang.IllegalStateException: Inputs to Flatten had > incompatible triggers: DefaultTrigger, > Repeatedly.forever(AfterWatermark.pastEndOfWindow()) > at > org.apache.beam.sdk.transforms.Flatten$PCollections.expand(Flatten.java:123) > at > org.apache.beam.sdk.transforms.Flatten$PCollections.expand(Flatten.java:101) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465) > at > org.apache.beam.sdk.values.PCollectionList.apply(PCollectionList.java:182) > at > org.apache.beam.sdk.transforms.join.CoGroupByKey.expand(CoGroupByKey.java:124) > at > org.apache.beam.sdk.transforms.join.CoGroupByKey.expand(CoGroupByKey.java:74) > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533) > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465) > at > org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple.apply(KeyedPCollectionTuple.java:107) > at org.apache.beam.sdk.extensions.joinlibrary.Join.innerJoin(Join.java:59) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel.standardJoin(BeamJoinRel.java:217) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel.buildBeamPipeline(BeamJoinRel.java:161) > at > org.apache.beam.sdk.extensions.sql.impl.rel.BeamProjectRel.buildBeamPipeline(BeamProjectRel.java:68) > at > org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:163) > ... 5 more{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-3785) [SQL] Add support for arrays
[ https://issues.apache.org/jira/browse/BEAM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396561#comment-16396561 ] Anton Kedin edited comment on BEAM-3785 at 3/13/18 5:38 PM: to go: - arrays of arrays - test complex indexing (nested expressions) - test aggregations, other complex operations was (Author: kedin): to go: - arrays of arrays - test complex indexing (nested expressions) - test aggregations, other complex operations - DOT operator > [SQL] Add support for arrays > > > Key: BEAM-3785 > URL: https://issues.apache.org/jira/browse/BEAM-3785 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > Support fields of Array type -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (BEAM-3785) [SQL] Add support for arrays
[ https://issues.apache.org/jira/browse/BEAM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396561#comment-16396561 ] Anton Kedin edited comment on BEAM-3785 at 3/13/18 5:28 AM: to go: - arrays of arrays - test complex indexing (nested expressions) - test aggregations, other complex operations - DOT operator was (Author: kedin): to go: - arrays of arrays - test complex indexing - DOT operator > [SQL] Add support for arrays > > > Key: BEAM-3785 > URL: https://issues.apache.org/jira/browse/BEAM-3785 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Support fields of Array type -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3785) [SQL] Add support for arrays
[ https://issues.apache.org/jira/browse/BEAM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396561#comment-16396561 ] Anton Kedin commented on BEAM-3785: --- to go: - arrays of arrays - test complex indexing - DOT operator > [SQL] Add support for arrays > > > Key: BEAM-3785 > URL: https://issues.apache.org/jira/browse/BEAM-3785 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Support fields of Array type -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-3785) [SQL] Add support for arrays
[ https://issues.apache.org/jira/browse/BEAM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396555#comment-16396555 ] Anton Kedin commented on BEAM-3785: --- implemented arrays of rows > [SQL] Add support for arrays > > > Key: BEAM-3785 > URL: https://issues.apache.org/jira/browse/BEAM-3785 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Anton Kedin >Assignee: Anton Kedin >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > Support fields of Array type -- This message was sent by Atlassian JIRA (v7.6.3#76005)
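For reference, the kinds of expressions this feature covers look like the following (a sketch assuming a hypothetical `PCOLLECTION` with an array column `tags`; Beam SQL inherits Calcite's standard-SQL array syntax, where indexing is 1-based):

```sql
-- element access (1-based)
SELECT tags[1] FROM PCOLLECTION;

-- array length
SELECT CARDINALITY(tags) FROM PCOLLECTION;

-- array literal
SELECT ARRAY['a', 'b', 'c'] FROM PCOLLECTION;
```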
[jira] [Comment Edited] (BEAM-3417) Fix Calcite assertions
[ https://issues.apache.org/jira/browse/BEAM-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390732#comment-16390732 ] Anton Kedin edited comment on BEAM-3417 at 3/12/18 8:10 PM: *What fails?* [The assert in question is in VolcanoPlanner|https://github.com/apache/calcite/blob/9ab47c732ec99c3162954e1eb74eaa30cddf/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L546]. It checks whether [all traits are simple|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L517], i.e. not instances of RelCompositeTrait. *Why does it fail?* In our case, when it fails, the trait set contains 2 traits. One is BeamLogicalConvention (it's not a composite trait), and another is a collation-related composite trait which causes the assertion to fail. *Where does the composite trait come from?* We specify the collation trait def in [BeamQueryPlanner|https://github.com/apache/beam/blob/14b17ad574342a875c8f99278e18c605aa5b4bc3/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L89] before parsing. It then [gets replaced in LogicalTableScan|https://github.com/apache/calcite/blob/914b5cfbf978e796afeaff7b780e268ed39d8ec5/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L102] with the [composite trait|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L239] which causes the failure. *Why does LogicalTableScan need to do the collation magic?* Dunno, it seems that it adds the statistics information to the collation trait so that the engine can handle sorting correctly. It does so only when we ask it to by adding the collation trait def. *Why doesn't VolcanoPlanner like RelCompositeTrait in that part?* Dunno. *Do we need the collation trait def?* Dunno. 
*What do we do?* If we can, it probably makes sense to replace LogicalTableScan rel with some kind of BeamIllogicalPCollectionScan which doesn't do all the collation magic or makes it configurable. - (update) At the second look, this assertion seems to happen before the Rel Replacement, so we probably won't be able to replace the logical table scan rel with our own logic. was (Author: kedin): *What fails?* [Assert in question is in in VolcanoPlanner |https://github.com/apache/calcite/blob/9ab47c732ec99c3162954e1eb74eaa30cddf/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L546]. It checks whether [all traits are simple|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L517] by checking whether they're not instances of RelCompositeTrait. *Why it fails?* In our case, when it fails, traitSet.allSimple() has 2 traits. One is BeamLogicalConvention (it's not a composite trait), and another is a collation-related composite trait which causes the assertion to fail. *Where does the composite trait come from?* We specify the collation trait def in [BeamQueryPlanner|https://github.com/apache/beam/blob/14b17ad574342a875c8f99278e18c605aa5b4bc3/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L89] before parsing. It then [gets replaced in LogicalTableScan|https://github.com/apache/calcite/blob/914b5cfbf978e796afeaff7b780e268ed39d8ec5/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L102] with the [composite trait|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L239] which causes the failure. *Why LogicalTableScan needs to do the collation magic?* Dunno, it seems that it adds the statistics information to the collation trait so that the engine can handle sorting correctly. 
It does so only when we ask it to by adding the collation trait def. *Why VolcanoPlanner doesn't like CompositeTraitSet in that part?* Dunno. *Do we need the collation trait def?* Dunno. *What do we do?* If we can, it probably makes sense to replace LogicalTableScanRel with some kind of BeamIllogicalPCollectionScan which doesn't do all the collation magic or makes it configurable > Fix Calcite assertions > -- > > Key: BEAM-3417 > URL: https://issues.apache.org/jira/browse/BEAM-3417 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > > Currently we disable assertions in test for every project which depends on > Beam SQL / Calcite. Otherwise it fails assertions when Calcite validates > relational representation of the query. E.g. in projects
[jira] [Comment Edited] (BEAM-3417) Fix Calcite assertions
[ https://issues.apache.org/jira/browse/BEAM-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390732#comment-16390732 ] Anton Kedin edited comment on BEAM-3417 at 3/8/18 4:30 AM: --- *What fails?* [The assert in question is in VolcanoPlanner|https://github.com/apache/calcite/blob/9ab47c732ec99c3162954e1eb74eaa30cddf/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L546]. It checks whether [all traits are simple|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L517], i.e. not instances of RelCompositeTrait. *Why does it fail?* In our case, when it fails, the trait set contains 2 traits. One is BeamLogicalConvention (it's not a composite trait), and another is a collation-related composite trait which causes the assertion to fail. *Where does the composite trait come from?* We specify the collation trait def in [BeamQueryPlanner|https://github.com/apache/beam/blob/14b17ad574342a875c8f99278e18c605aa5b4bc3/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L89] before parsing. It then [gets replaced in LogicalTableScan|https://github.com/apache/calcite/blob/914b5cfbf978e796afeaff7b780e268ed39d8ec5/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L102] with the [composite trait|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L239] which causes the failure. *Why does LogicalTableScan need to do the collation magic?* Dunno, it seems that it adds the statistics information to the collation trait so that the engine can handle sorting correctly. It does so only when we ask it to by adding the collation trait def. *Why doesn't VolcanoPlanner like RelCompositeTrait in that part?* Dunno. *Do we need the collation trait def?* Dunno. 
*What do we do?* If we can, it probably makes sense to replace LogicalTableScanRel with some kind of BeamIllogicalPCollectionScan which doesn't do all the collation magic or makes it configurable was (Author: kedin): *What fails? [Assert in question is in in VolcanoPlanner |https://github.com/apache/calcite/blob/9ab47c732ec99c3162954e1eb74eaa30cddf/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L546]. It checks whether [all traits are simple|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L517] by checking whether they're not instances of RelCompositeTrait. *Why it fails? In our case, when it fails, traitSet.allSimple() has 2 traits. One is BeamLogicalConvention (it's not a composite trait), and another is a collation-related composite trait which causes the assertion to fail. *Where does the composite trait come from? We specify the collation trait def in [BeamQueryPlanner|https://github.com/apache/beam/blob/14b17ad574342a875c8f99278e18c605aa5b4bc3/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L89] before parsing. It then [gets replaced in LogicalTableScan|https://github.com/apache/calcite/blob/914b5cfbf978e796afeaff7b780e268ed39d8ec5/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L102] with the [composite trait|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L239] which causes the failure. *Why LogicalTableScan needs to do the collation magic? Dunno, it seems that it adds the statistics information to the collation trait so that the engine can handle sorting correctly. It does so only when we ask it to by adding the collation trait def. *Why VolcanoPlanner doesn't like CompositeTraitSet in that part? Dunno. *Do we need the collation trait def? Dunno. *What do we do? 
If we can, it probably makes sense to replace LogicalTableScanRel with some kind of BeamIllogicalPCollectionScan which doesn't do all the collation magic or makes it configurable > Fix Calcite assertions > -- > > Key: BEAM-3417 > URL: https://issues.apache.org/jira/browse/BEAM-3417 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Anton Kedin >Priority: Major > > Currently we disable assertions in test for every project which depends on > Beam SQL / Calcite. Otherwise it fails assertions when Calcite validates > relational representation of the query. E.g. in projects which depend on Beam > SQL / Calcite we have to specify > {code:java|title=build.gradle} > test { > jvmArgs "-da" > } > {code} > We need to either update our relational conversion logic
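If fixing the underlying traits proves hard, a narrower stopgap than the blanket `-da` quoted in the description is the JVM's package-scoped assertion switch, which keeps assertions on for Beam's own code (a sketch; it assumes the failing assertions all originate under `org.apache.calcite`):

```groovy
test {
  // enable assertions everywhere, then disable them only for Calcite
  // ("..." means the package and all of its subpackages)
  jvmArgs "-ea", "-da:org.apache.calcite..."
}
```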