[jira] [Updated] (BEAM-5384) [SQL] Calcite optimizes away LogicalProject

2018-10-02 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-5384:
--
Summary: [SQL] Calcite optimizes away LogicalProject  (was: [SQL] Calcite 
Doesn't optimizes away LogicalProject)

> [SQL] Calcite optimizes away LogicalProject
> ---
>
> Key: BEAM-5384
> URL: https://issues.apache.org/jira/browse/BEAM-5384
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
>
> *From 
> [https://stackoverflow.com/questions/52313324/beam-sql-wont-work-when-using-aggregation-in-statement-cannot-plan-execution]
>  :*
> I have a basic Beam pipeline that reads from GCS, does a Beam SQL transform 
> and writes the results to BigQuery.
> When I don't do any aggregation in my SQL statement it works fine:
> {code:java}
> ..
> PCollection<Row> outputStream =
>     sqlRows.apply(
>         "sql_transform",
>         SqlTransform.query("select views from PCOLLECTION"));
> outputStream.setCoder(SCHEMA.getRowCoder());
> ..
> {code}
> However, when I try to aggregate with a sum then it fails (throws a 
> CannotPlanException):
> {code:java}
> ..
> PCollection<Row> outputStream =
>     sqlRows.apply(
>         "sql_transform",
>         SqlTransform.query(
>             "select wikimedia_project, sum(views) from PCOLLECTION group by wikimedia_project"));
> outputStream.setCoder(SCHEMA.getRowCoder());
> ..
> {code}
> Stacktrace:
> {code:java}
> Step #1: 11:47:37,562 0[main] INFO  
> org.apache.beam.runners.dataflow.DataflowRunner - 
> PipelineOptions.filesToStage was not specified. Defaulting to files from the 
> classpath: will stage 117 files. Enable logging at DEBUG level to see which 
> files will be staged.
> Step #1: 11:47:39,845 2283 [main] INFO  
> org.apache.beam.sdk.extensions.sql.impl.BeamQueryPlanner - SQL:
> Step #1: SELECT `PCOLLECTION`.`wikimedia_project`, SUM(`PCOLLECTION`.`views`)
> Step #1: FROM `beam`.`PCOLLECTION` AS `PCOLLECTION`
> Step #1: GROUP BY `PCOLLECTION`.`wikimedia_project`
> Step #1: 11:47:40,387 2825 [main] INFO  
> org.apache.beam.sdk.extensions.sql.impl.BeamQueryPlanner - SQLPlan>
> Step #1: LogicalAggregate(group=[{0}], EXPR$1=[SUM($1)])
> Step #1:   BeamIOSourceRel(table=[[beam, PCOLLECTION]])
> Step #1: 
> Step #1: Exception in thread "main" 
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.RelOptPlanner$CannotPlanException:
>  Node [rel#7:Subset#1.BEAM_LOGICAL.[]] could not be implemented; planner 
> state:
> Step #1: 
> Step #1: Root: rel#7:Subset#1.BEAM_LOGICAL.[]
> Step #1: Original rel:
> Step #1: LogicalAggregate(subset=[rel#7:Subset#1.BEAM_LOGICAL.[]], 
> group=[{0}], EXPR$1=[SUM($1)]): rowcount = 10.0, cumulative cost = 
> {11.375000476837158 rows, 0.0 cpu, 0.0 io}, id = 5
> Step #1:   BeamIOSourceRel(subset=[rel#4:Subset#0.BEAM_LOGICAL.[]], 
> table=[[beam, PCOLLECTION]]): rowcount = 100.0, cumulative cost = {100.0 
> rows, 101.0 cpu, 0.0 io}, id = 2
> Step #1: 
> Step #1: Sets:
> Step #1: Set#0, type: RecordType(VARCHAR wikimedia_project, BIGINT views)
> Step #1:rel#4:Subset#0.BEAM_LOGICAL.[], best=rel#2, importance=0.81
> Step #1:rel#2:BeamIOSourceRel.BEAM_LOGICAL.[](table=[beam, 
> PCOLLECTION]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> Step #1:rel#10:Subset#0.ENUMERABLE.[], best=rel#9, importance=0.405
> Step #1:
> rel#9:BeamEnumerableConverter.ENUMERABLE.[](input=rel#4:Subset#0.BEAM_LOGICAL.[]),
>  rowcount=100.0, cumulative cost={1.7976931348623157E308 rows, 
> 1.7976931348623157E308 cpu, 1.7976931348623157E308 io}
> Step #1: Set#1, type: RecordType(VARCHAR wikimedia_project, BIGINT EXPR$1)
> Step #1:rel#6:Subset#1.NONE.[], best=null, importance=0.9
> Step #1:
> rel#5:LogicalAggregate.NONE.[](input=rel#4:Subset#0.BEAM_LOGICAL.[],group={0},EXPR$1=SUM($1)),
>  rowcount=10.0, cumulative cost={inf}
> Step #1:rel#7:Subset#1.BEAM_LOGICAL.[], best=null, importance=1.0
> Step #1:
> rel#8:AbstractConverter.BEAM_LOGICAL.[](input=rel#6:Subset#1.NONE.[],convention=BEAM_LOGICAL,sort=[]),
>  rowcount=10.0, cumulative cost={inf}
> Step #1: 
> Step #1: 
> Step #1:at 
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:448)
> Step #1:at 
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:298)
> Step #1:at 
> org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:666)
> Step #1:at 
> 

[jira] [Commented] (BEAM-5384) [SQL] Calcite Doesn't optimizes away LogicalProject

2018-09-14 Thread Anton Kedin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615043#comment-16615043
 ] 

Anton Kedin commented on BEAM-5384:
---

Reproduction: 
https://github.com/polleyg/gcp-batch-ingestion-bigquery/blob/beam_sql/src/main/java/org/polleyg/TemplatePipeline.java
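The CannotPlanException in this thread is easier to read with a rough mental model of what the Volcano planner does: every logical node must be converted into the BEAM_LOGICAL convention by some registered rule, and planning fails when no rule path produces such an implementation for a node (here, the LogicalAggregate left over once the LogicalProject is optimized away). The following is a drastically simplified, hypothetical sketch; none of these names or behaviors are Calcite's actual API:

```java
import java.util.List;
import java.util.Map;

// Toy model of a convention-based planner (hypothetical; Calcite's real
// RelOptPlanner/VolcanoPlanner is far more sophisticated).
public class ToyPlanner {
    // Each "rule" maps a logical node kind to the convention it can implement it in.
    static final Map<String, String> RULES = Map.of(
        "LogicalProject", "BEAM_LOGICAL",
        "BeamIOSourceRel", "BEAM_LOGICAL"
        // Note: no rule for LogicalAggregate, so the planner cannot implement it.
    );

    // True iff every node in the plan can be converted to the target convention.
    static boolean canPlan(List<String> nodes, String targetConvention) {
        return nodes.stream()
            .allMatch(n -> targetConvention.equals(RULES.get(n)));
    }

    public static void main(String[] args) {
        // "select views from PCOLLECTION": project over a source -> plannable.
        System.out.println(
            canPlan(List.of("BeamIOSourceRel", "LogicalProject"), "BEAM_LOGICAL"));
        // With "group by" the plan contains a LogicalAggregate, for which this
        // toy rule set has no rule -> analogous to CannotPlanException.
        System.out.println(
            canPlan(List.of("BeamIOSourceRel", "LogicalAggregate"), "BEAM_LOGICAL"));
    }
}
```

In the real planner the search is cost-based and rules compose; the point is only that "Node ... could not be implemented" means no chain of registered rules reaches the required convention for that node.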


[jira] [Updated] (BEAM-5384) [SQL] Calcite Doesn't optimizes away LogicalProject

2018-09-13 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-5384:
--
Description: 

[jira] [Created] (BEAM-5384) [SQL] Calcite Doesn't optimizes away LogicalProject

2018-09-13 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-5384:
-

 Summary: [SQL] Calcite Doesn't optimizes away LogicalProject
 Key: BEAM-5384
 URL: https://issues.apache.org/jira/browse/BEAM-5384
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Anton Kedin



[jira] [Commented] (BEAM-5335) [SQL] Output schema is not set incorrectly

2018-09-07 Thread Anton Kedin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607355#comment-16607355
 ] 

Anton Kedin commented on BEAM-5335:
---

From https://stackoverflow.com/questions/52181795/how-do-i-get-an-output-schema-for-an-apache-beam-sql-query/52209683:

bq. I think it is a bug in Scio - the pipeline works in plain Java, as it turns 
out. Then I saw that Scio sets the coder itself after applying a transform, 
maybe this is the problem. I'll raise a ticket on the repo 
https://github.com/spotify/scio/blob/e95310ec0828e60209e279baa5047a6b62288c5a/scio-core/src/main/scala/com/spotify/scio/values/SCollection.scala#L149
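The suspicion above, that Scio re-sets the coder itself after applying a transform, can be modeled in miniature. This is a hypothetical sketch of the failure mode, not Scio's actual code: a wrapper that unconditionally overwrites the output coder discards the schema-aware coder the SQL transform installed.

```java
// Toy model of the suspected behavior: a wrapper re-sets the coder after
// every transform, clobbering the schema-aware coder the transform set.
public class CoderClobber {
    static class Collection {
        String coder; // whatever coder was set last
        Collection setCoder(String c) { this.coder = c; return this; }
    }

    // The SQL transform sets a proper schema-aware coder on its output...
    static Collection sqlTransform(Collection in) {
        return new Collection().setCoder("RowCoder(outputSchema)");
    }

    // ...but the wrapper then overwrites it with a fallback coder that knows
    // nothing about the output schema (the hypothesized Scio behavior).
    static Collection applyTransformWithFallback(Collection in, String fallback) {
        return sqlTransform(in).setCoder(fallback);
    }

    public static void main(String[] args) {
        Collection out = applyTransformWithFallback(new Collection(), "KryoCoder");
        System.out.println(out.coder); // the schema-aware coder is gone
    }
}
```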

> [SQL] Output schema is not set incorrectly
> --
>
> Key: BEAM-5335
> URL: https://issues.apache.org/jira/browse/BEAM-5335
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
>
> *From: 
> https://stackoverflow.com/questions/52181795/how-do-i-get-an-output-schema-for-an-apache-beam-sql-query
>  :*
> I've been playing with the Beam SQL DSL and I'm unable to use the output from 
> a query without manually providing a coder that's aware of the output schema. 
> Can I infer the output schema rather than hardcoding it?
> Neither the walkthrough nor the examples actually use the output from a query. 
> I'm using Scio rather than the plain Java API to keep the code relatively 
> readable and concise; I don't think that makes a difference for this question.
> Here's an example of what I mean.
> Given an input schema inSchema and some data source that is mapped onto a Row 
> as follows: (in this example, Avro-based, but again, I don't think that 
> matters):
> {code}
> sc.avroFile[Foo](args("input"))
>.map(fooToRow)
>.setCoder(inSchema.getRowCoder)
>.applyTransform(SqlTransform.query("SELECT COUNT(1) FROM PCOLLECTION"))
>.saveAsTextFile(args("output"))
> {code}
> Running this pipeline results in a KryoException as follows:
> {code}
> com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
> Serialization trace:
> fieldIndices (org.apache.beam.sdk.schemas.Schema)
> schema (org.apache.beam.sdk.values.RowWithStorage)
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
> {code}
> However, inserting a RowCoder matching the SQL output, in this case a single 
> count int column:
> {code}
>...snip...
>.applyTransform(SqlTransform.query("SELECT COUNT(1) FROM PCOLLECTION"))
>.setCoder(Schema.builder().addInt64Field("count").build().getRowCoder)
>.saveAsTextFile(args("output"))
> {code}
> Now the pipeline runs just fine.
> Having to manually tell the pipeline how to encode the SQL output seems 
> unnecessary, given that we specify the input schema/coder(s) and a query. It 
> seems to me that we should be able to infer the output schema from that - but 
> I can't see how, other than maybe using Calcite directly?
> Before raising a ticket on the Beam Jira, I thought I'd check I wasn't 
> missing something obvious!
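The schema-inference question in the quoted report can be illustrated outside of Beam: the output row type of a query is a function of the input schema and the select list, and for an aggregate like COUNT(1) the output type is fixed regardless of input. A toy sketch with hypothetical names (Beam SQL performs this inference internally via Calcite; this is not its API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration of output-schema inference for a few select-list shapes.
// Real engines derive this from the validated query; names here are invented.
public class ToySchemaInference {
    // Infer the SQL type of one select-list item given the input schema.
    static String inferType(String expr, Map<String, String> inputSchema) {
        if (expr.startsWith("COUNT(")) {
            return "BIGINT";               // COUNT always yields a 64-bit count
        }
        if (expr.startsWith("SUM(")) {
            String arg = expr.substring(4, expr.length() - 1);
            return inputSchema.get(arg);   // SUM keeps its argument's type (simplified)
        }
        return inputSchema.get(expr);      // plain column reference
    }

    public static void main(String[] args) {
        Map<String, String> in = new LinkedHashMap<>();
        in.put("wikimedia_project", "VARCHAR");
        in.put("views", "BIGINT");
        System.out.println(inferType("COUNT(1)", in));          // BIGINT
        System.out.println(inferType("SUM(views)", in));        // BIGINT
        System.out.println(inferType("wikimedia_project", in)); // VARCHAR
    }
}
```

This is exactly the information a pipeline would need in order to build the output RowCoder automatically instead of hardcoding `addInt64Field("count")`.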



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-5335) [SQL] Output schema is not set incorrectly

2018-09-06 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-5335:
--
Summary: [SQL] Output schema is not set incorrectly  (was: [SQL] Output 
schema is set incorrectly)






[jira] [Commented] (BEAM-5335) [SQL] Output schema is set incorrectly

2018-09-06 Thread Anton Kedin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606168#comment-16606168
 ] 

Anton Kedin commented on BEAM-5335:
---

We also need a test that chains multiple queries.






[jira] [Commented] (BEAM-5335) [SQL] Output schema is set incorrectly

2018-09-06 Thread Anton Kedin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606167#comment-16606167
 ] 

Anton Kedin commented on BEAM-5335:
---

This is potentially Scio-specific, as the output schema should be set correctly: 
https://github.com/apache/beam/blob/9319ce18ac625b239c8cdc1b32e2b697d83c2504/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java#L80






[jira] [Created] (BEAM-5335) [SQL] Output schema is set incorrectly

2018-09-06 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-5335:
-

 Summary: [SQL] Output schema is set incorrectly
 Key: BEAM-5335
 URL: https://issues.apache.org/jira/browse/BEAM-5335
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Anton Kedin


*From: 
https://stackoverflow.com/questions/52181795/how-do-i-get-an-output-schema-for-an-apache-beam-sql-query
 :*

I've been playing with the Beam SQL DSL and I'm unable to use the output from a 
query without providing a code that's aware of the output schema manually. Can 
I infer the output schema rather than hardcoding it?

Neither the walkthrough or the examples actually use the output from a query. 
I'm using Scio rather than the plain Java API to keep the code relatively 
readable and concise, I don't think that makes a difference for this question.

Here's an example of what I mean.

Given an input schema inSchema and some data source that is mapped onto a Row 
as follows: (in this example, Avro-based, but again, I don't think that 
matters):

{code}
sc.avroFile[Foo](args("input"))
   .map(fooToRow)
   .setCoder(inSchema.getRowCoder)
   .applyTransform(SqlTransform.query("SELECT COUNT(1) FROM PCOLLECTION"))
   .saveAsTextFile(args("output"))
{code}

Running this pipeline results in a KryoException as follows:

{code}
com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
fieldIndices (org.apache.beam.sdk.schemas.Schema)
schema (org.apache.beam.sdk.values.RowWithStorage)
org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
{code}

However, inserting a RowCoder matching the SQL output, in this case a single 
int64 "count" column:

{code}
   ...snip...
   .applyTransform(SqlTransform.query("SELECT COUNT(1) FROM PCOLLECTION"))
   .setCoder(Schema.builder().addInt64Field("count").build().getRowCoder)
   .saveAsTextFile(args("output"))
{code}

Now the pipeline runs just fine.

Having to manually tell the pipeline how to encode the SQL output seems 
unnecessary, given that we specify the input schema/coder(s) and a query. It 
seems to me that we should be able to infer the output schema from that - but I 
can't see how, other than maybe using Calcite directly?

Before raising a ticket on the Beam Jira, I thought I'd check I wasn't missing 
something obvious!
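The premise above, that the output schema should be derivable from the input 
schema plus the query, can be illustrated with a toy stdlib-only sketch. This 
is not Beam's or Calcite's API; the field name "count" and the string type 
tags are illustrative assumptions (in Beam, Calcite is what actually derives 
the output row type).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OutputSchemaSketch {
    // Toy sketch (not Beam's or Calcite's API): the output schema of a
    // simple aggregate query is derivable from the query text alone.
    static Map<String, String> inferSchema(String query) {
        Map<String, String> schema = new LinkedHashMap<>();
        if (query.toUpperCase().contains("COUNT(")) {
            schema.put("count", "INT64"); // COUNT yields a non-null 64-bit int
        }
        return schema;
    }

    public static void main(String[] args) {
        System.out.println(inferSchema("SELECT COUNT(1) FROM PCOLLECTION")); // {count=INT64}
    }
}
```

In this toy, a COUNT aggregate always maps to a single non-null INT64 field, 
which matches the coder the workaround builds by hand.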







[jira] [Created] (BEAM-5127) [SQL] Avoid String parsing in BeamTableUtils

2018-08-10 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-5127:
-

 Summary: [SQL] Avoid String parsing in BeamTableUtils
 Key: BEAM-5127
 URL: https://issues.apache.org/jira/browse/BEAM-5127
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Anton Kedin


Looks like we're going through DateTime parsing on each projection: 

https://github.com/apache/beam/blob/9319ce18ac625b239c8cdc1b32e2b697d83c2504/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java#L125

https://github.com/apache/beam/blob/9ef5e8690871737cccfe513ac00481320f662c18/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/schema/BeamTableUtils.java#L110

We should avoid unnecessary casting and parsing.
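As a stdlib-only illustration of the cost being described (the Event type and 
its field are hypothetical, not Beam's code): parsing the timestamp string 
once at ingestion and keeping a typed value avoids re-parsing the same string 
on every projection.

```java
import java.time.Instant;

public class ParseOnce {
    // Anti-pattern: re-parses the raw string on every projection.
    static long projectNaive(String rawTimestamp) {
        return Instant.parse(rawTimestamp).toEpochMilli();
    }

    // Preferred: convert once at ingestion and keep a typed value, so each
    // projection is a cheap field read rather than a parse.
    static final class Event {
        final long epochMillis;

        Event(long epochMillis) {
            this.epochMillis = epochMillis;
        }
    }

    static Event ingest(String rawTimestamp) {
        return new Event(Instant.parse(rawTimestamp).toEpochMilli());
    }

    public static void main(String[] args) {
        System.out.println(ingest("2018-08-10T00:00:00Z").epochMillis); // 1533859200000
    }
}
```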







[jira] [Closed] (BEAM-5050) [SQL] NULLs are aggregated incorrectly

2018-08-03 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-5050.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> [SQL] NULLs are aggregated incorrectly
> --
>
> Key: BEAM-5050
> URL: https://issues.apache.org/jira/browse/BEAM-5050
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> For example, COUNT(field) should not count records with a NULL field. We 
> should also handle and test other aggregation functions (such as AVG, SUM, 
> MIN, MAX, VAR_POP, VAR_SAMP, etc.).
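The expected semantics can be sketched with plain Java; this mirrors standard 
SQL behavior (COUNT(*) counts rows, COUNT(field) skips NULLs), not Beam's 
internal CombineFn implementation:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class NullAggregation {
    // COUNT(*): counts all rows, including those with NULL fields.
    static long countStar(List<Integer> column) {
        return column.size();
    }

    // COUNT(field): standard SQL skips rows where the field is NULL.
    static long countField(List<Integer> column) {
        return column.stream().filter(Objects::nonNull).count();
    }

    public static void main(String[] args) {
        List<Integer> views = Arrays.asList(10, null, 5, null);
        System.out.println(countStar(views));  // 4
        System.out.println(countField(views)); // 2
    }
}
```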





[jira] [Closed] (BEAM-5056) [SQL] Nullability of aggregation expressions isn't inferred properly

2018-08-03 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-5056.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> [SQL] Nullability of aggregation expressions isn't inferred properly
> 
>
> Key: BEAM-5056
> URL: https://issues.apache.org/jira/browse/BEAM-5056
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Gleb Kanterov
>Assignee: Xu Mingmin
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Given schema and rows:
> {code:java}
> Schema schema =
> Schema.builder()
> .addNullableField("f_int1", Schema.FieldType.INT32)
> .addNullableField("f_int2", Schema.FieldType.INT32)
> .build();
> List<Row> rows =
> TestUtils.RowsBuilder.of(schema)
> .addRows(null, null)
> .getRows();
> {code}
> Following query fails:
> {code:sql}
> SELECT AVG(f_int1) FROM PCOLLECTION GROUP BY f_int2
> {code}
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Field EXPR$0 is not 
> nullable{code}
>  
>  
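For context, standard SQL defines AVG over a group with no non-NULL inputs as 
NULL, which is why the output field EXPR$0 must be inferred as nullable. A 
stdlib-only sketch of that semantics (not Beam's implementation):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class NullableAvg {
    // AVG ignores NULL inputs; with no non-NULL inputs the result is NULL,
    // so the inferred output field must itself be nullable.
    static Double avg(List<Integer> column) {
        List<Integer> present =
            column.stream().filter(Objects::nonNull).collect(Collectors.toList());
        if (present.isEmpty()) {
            return null;
        }
        double sum = 0;
        for (int v : present) {
            sum += v;
        }
        return sum / present.size();
    }

    public static void main(String[] args) {
        System.out.println(avg(Arrays.<Integer>asList(null, null))); // null
        System.out.println(avg(Arrays.asList(2, null, 4)));          // 3.0
    }
}
```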





[jira] [Created] (BEAM-5050) [SQL] NULLs are aggregated incorrectly

2018-07-31 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-5050:
-

 Summary: [SQL] NULLs are aggregated incorrectly
 Key: BEAM-5050
 URL: https://issues.apache.org/jira/browse/BEAM-5050
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Anton Kedin


For example, COUNT(field) should not count records with a NULL field.





[jira] [Updated] (BEAM-5049) [SQL] Batch Join results in two shuffles

2018-07-31 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-5049:
--
Description: 
A query like this:

{code}
SELECT a.*, b.*, c.* FROM a JOIN b ON a.some_id = b.some_id JOIN c ON a.some_id 
= c.some_id;
{code}

results in two shuffles. This can probably be optimized.

Relevant code:

 - BeamJoinRel implements Join in SQL: 
https://github.com/apache/beam/blob/1675b0f843ed34de8ba6f3676f794db80b40139d/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L194

- CoGBK Join implementation: 
https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L36



  was:
A query like this:

{code}
SELECT a.*, b.*, c.* FROM a JOIN b ON a.user_id = b.user_id JOIN c ON a.user_id 
= c.user_id;
{code}

results in two shuffles. This can probably be optimized.

Relevant code:

 - BeamJoinRel implements Join in SQL: 
https://github.com/apache/beam/blob/1675b0f843ed34de8ba6f3676f794db80b40139d/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L194

- CoGBK Join implementation: 
https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L36




> [SQL] Batch Join results in two shuffles
> 
>
> Key: BEAM-5049
> URL: https://issues.apache.org/jira/browse/BEAM-5049
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
>
> A query like this:
> {code}
> SELECT a.*, b.*, c.* FROM a JOIN b ON a.some_id = b.some_id JOIN c ON 
> a.some_id = c.some_id;
> {code}
> results in two shuffles. This can probably be optimized.
> Relevant code:
>  - BeamJoinRel implements Join in SQL: 
> https://github.com/apache/beam/blob/1675b0f843ed34de8ba6f3676f794db80b40139d/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L194
> - CoGBK Join implementation: 
> https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L36





[jira] [Created] (BEAM-5049) [SQL] Batch Join results in two shuffles

2018-07-31 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-5049:
-

 Summary: [SQL] Batch Join results in two shuffles
 Key: BEAM-5049
 URL: https://issues.apache.org/jira/browse/BEAM-5049
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Anton Kedin


A query like this:

{code}
SELECT a.*, b.*, c.* FROM a JOIN b ON a.user_id = b.user_id JOIN c ON a.user_id 
= c.user_id;
{code}

results in two shuffles. This can probably be optimized.

Relevant code:

 - BeamJoinRel implements Join in SQL: 
https://github.com/apache/beam/blob/1675b0f843ed34de8ba6f3676f794db80b40139d/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L194

- CoGBK Join implementation: 
https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L36
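The optimization opportunity can be illustrated with a toy one-pass hash join 
in plain Java: when all join conditions share the same key, a single grouping 
step suffices. In Beam terms this would roughly correspond to one CoGroupByKey 
over all three inputs instead of two chained binary joins; the table contents 
below are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ThreeWayJoin {
    // Inner join of a, b, c on a shared integer key in a single pass over a.
    static List<String> join(Map<Integer, String> a, Map<Integer, String> b,
                             Map<Integer, String> c) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> row : a.entrySet()) {
            String fromB = b.get(row.getKey());
            String fromC = c.get(row.getKey());
            if (fromB != null && fromC != null) {
                out.add(row.getValue() + "," + fromB + "," + fromC);
            }
        }
        return out;
    }

    static List<String> demo() {
        Map<Integer, String> a = new LinkedHashMap<>(); // deterministic order
        a.put(1, "a1");
        a.put(2, "a2");
        Map<Integer, String> b = new HashMap<>();
        b.put(1, "b1");
        Map<Integer, String> c = new HashMap<>();
        c.put(1, "c1");
        c.put(2, "c2");
        return join(a, b, c); // key 2 drops out: no match in b
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a1,b1,c1]
    }
}
```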







[jira] [Created] (BEAM-4815) [SQL] Document/fix text table (TextIO) insert behavior

2018-07-18 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4815:
-

 Summary: [SQL] Document/fix text table (TextIO) insert behavior
 Key: BEAM-4815
 URL: https://issues.apache.org/jira/browse/BEAM-4815
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Anton Kedin


Tables of type `text` are backed by TextIO. We don't do any extra 
configuration, so it results in behavior like this:

 - create text table with location 'test-file.csv'

 - if the file exists, you can select from it;

 - now insert values into that table;

     - this results in TextIO writing files with names like 
'test-file.csv-0-of-1';

     - selecting from the original table doesn't allow selecting from the newly 
created files;

We need to document this behavior properly, and possibly log a notice to stdout.





[jira] [Commented] (BEAM-4771) Update BeamSQL Walkthrough documentation to align with latest version

2018-07-17 Thread Anton Kedin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546989#comment-16546989
 ] 

Anton Kedin commented on BEAM-4771:
---

updated the doc: https://beam.apache.org/documentation/dsls/sql/walkthrough/

> Update BeamSQL Walkthrough documentation to align with latest version
> -
>
> Key: BEAM-4771
> URL: https://issues.apache.org/jira/browse/BEAM-4771
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, website
>Affects Versions: 2.5.0
>Reporter: Akanksha Sharma
>Assignee: Reuven Lax
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> As I see, in 2.5 BeamSQL had been changed to work with Schema.
> The sample code provided in 
> [https://beam.apache.org/documentation/dsls/sql/walkthrough/] does not 
> compile with Beam 2.5, and needs to be updated.
>  
> Row.withRowType(appType)
>  
> The above mentioned line needs to be adapted to use schema.





[jira] [Resolved] (BEAM-4771) Update BeamSQL Walkthrough documentation to align with latest version

2018-07-17 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin resolved BEAM-4771.
---
Resolution: Fixed
  Assignee: Anton Kedin  (was: Reuven Lax)

> Update BeamSQL Walkthrough documentation to align with latest version
> -
>
> Key: BEAM-4771
> URL: https://issues.apache.org/jira/browse/BEAM-4771
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java, website
>Affects Versions: 2.5.0
>Reporter: Akanksha Sharma
>Assignee: Anton Kedin
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> As I see, in 2.5 BeamSQL had been changed to work with Schema.
> The sample code provided in 
> [https://beam.apache.org/documentation/dsls/sql/walkthrough/] does not 
> compile with Beam 2.5, and needs to be updated.
>  
> Row.withRowType(appType)
>  
> The above mentioned line needs to be adapted to use schema.





[jira] [Closed] (BEAM-3191) [SQL] Implement Stateful Join

2018-06-18 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3191.
-
   Resolution: Won't Fix
Fix Version/s: Not applicable

A stateful join doesn't seem like a good path forward.

> [SQL] Implement Stateful Join
> -
>
> Key: BEAM-3191
> URL: https://issues.apache.org/jira/browse/BEAM-3191
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>
> For binary key-joins, we could probably implement a stateful join.
> Things to consider:
>  - identify primary keys for the inputs if possible;
>  - otherwise, buffer both collections and correctly discard collection 
> prefixes on timers;





[jira] [Closed] (BEAM-3190) [SQL] Join Windowing Semantics

2018-06-18 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3190.
-
   Resolution: Fixed
Fix Version/s: Not applicable

Joins of unbounded inputs only support non-global windows with default triggers 
at the moment. Everything else is rejected. Further improvements will be 
covered by retractions.

> [SQL] Join Windowing Semantics
> --
>
> Key: BEAM-3190
> URL: https://issues.apache.org/jira/browse/BEAM-3190
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>
> Should join implementation reject incorrect windowing strategies?
> Concerns: discarding mode + joins + multiple trigger firings might lead to 
> incorrect results, like missing join/data.





[jira] [Closed] (BEAM-3180) [Nexmark][SQL] Refactor NexmarkLauncher

2018-06-18 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3180.
-
   Resolution: Won't Fix
Fix Version/s: Not applicable

> [Nexmark][SQL] Refactor NexmarkLauncher
> ---
>
> Key: BEAM-3180
> URL: https://issues.apache.org/jira/browse/BEAM-3180
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>  Labels: nexmark
> Fix For: Not applicable
>
>
> NexmarkLauncher is a huge class which handles everything about running the 
> queries, from generating entities and managing sources and sinks to 
> monitoring performance. And it has zero test coverage.
> It needs to be split into a few components; at a minimum, event generation, 
> source and sink creation, and monitoring should move into their own 
> components.





[jira] [Closed] (BEAM-3176) BeamSqlCli: support DROP clause

2018-06-18 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3176.
-
   Resolution: Fixed
Fix Version/s: Not applicable

This is done

> BeamSqlCli: support DROP clause
> ---
>
> Key: BEAM-3176
> URL: https://issues.apache.org/jira/browse/BEAM-3176
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: James Xu
>Assignee: James Xu
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Closed] (BEAM-3175) BeamSqlCli: support SELECT CLAUSE

2018-06-18 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3175.
-
   Resolution: Fixed
Fix Version/s: Not applicable

This is done

> BeamSqlCli: support SELECT CLAUSE
> -
>
> Key: BEAM-3175
> URL: https://issues.apache.org/jira/browse/BEAM-3175
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: James Xu
>Priority: Major
> Fix For: Not applicable
>
>






[jira] [Closed] (BEAM-2998) add IT test for SQL

2018-06-18 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-2998.
-
   Resolution: Fixed
 Assignee: Anton Kedin
Fix Version/s: Not applicable

Integration tests added for Pubsub->BigQuery writes using Beam SQL. They cover 
DDL and sources/sinks.

> add IT test for SQL
> ---
>
> Key: BEAM-2998
> URL: https://issues.apache.org/jira/browse/BEAM-2998
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql, testing
>Reporter: Xu Mingmin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>
> Add IT test for SQL module
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/example/BeamSqlExample.java
>  is the base example.





[jira] [Closed] (BEAM-4173) [SQL] Refactor BeamSql, BeamSqlCli, BeamSqlEnv

2018-06-18 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-4173.
-
   Resolution: Fixed
 Assignee: Anton Kedin
Fix Version/s: Not applicable

Deleted BeamSql

Renamed QueryTransform -> SqlTransform

Hid the planner behind BeamSqlEnv

> [SQL] Refactor BeamSql, BeamSqlCli, BeamSqlEnv
> --
>
> Key: BEAM-4173
> URL: https://issues.apache.org/jira/browse/BEAM-4173
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>
> BeamSql is a single method which delegates to QueryTransform factory method 
> which creates BeamSqlEnv which creates BeamSqlPlanner which then configures 
> the parser and parses the query.
> It looks like we can squash together a lot of it by:
>  - replacing BeamSql invocations with direct QueryTransform invocations;
>  - combining BeamSqlEnv with BeamSqlPlanner or extracting a higher level 
> configuration object;
>  - exposing a few more QueryTransform builders to accept either a planner or 
> a configuration object;
>  - building QueryTransforms in BeamSqlCli;





[jira] [Commented] (BEAM-4174) [SQL] Add DDL Support for MockedBoundedTable

2018-06-18 Thread Anton Kedin (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516347#comment-16516347
 ] 

Anton Kedin commented on BEAM-4174:
---

There's a TestTableProvider which supports in-memory reads/writes/DDL.

> [SQL] Add DDL Support for MockedBoundedTable
> 
>
> Key: BEAM-4174
> URL: https://issues.apache.org/jira/browse/BEAM-4174
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>
> We have a MockedBoundedTable which is an in-memory implementation of 
> BeamSqlTable.
> We should support it in DDL and use it for testing of DDL/DML/CLI.





[jira] [Closed] (BEAM-4174) [SQL] Add DDL Support for MockedBoundedTable

2018-06-18 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-4174.
-
   Resolution: Fixed
 Assignee: Anton Kedin
Fix Version/s: Not applicable

> [SQL] Add DDL Support for MockedBoundedTable
> 
>
> Key: BEAM-4174
> URL: https://issues.apache.org/jira/browse/BEAM-4174
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>
> We have a MockedBoundedTable which is an in-memory implementation of 
> BeamSqlTable.
> We should support it in DDL and use it for testing of DDL/DML/CLI.





[jira] [Closed] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-06-18 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3773.
-
   Resolution: Fixed
Fix Version/s: Not applicable

JDBC Driver Implemented on top of Avatica

> [SQL] Investigate JDBC interface for Beam SQL
> -
>
> Key: BEAM-3773
> URL: https://issues.apache.org/jira/browse/BEAM-3773
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Andrew Pilloud
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 17h
>  Remaining Estimate: 0h
>
> JDBC allows integration with a lot of third-party tools, e.g. 
> [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], 
> [sqlline|https://github.com/julianhyde/sqlline]. We should look into how 
> feasible it is to implement a JDBC interface for Beam SQL.





[jira] [Closed] (BEAM-3983) BigQuery writes from pure SQL

2018-06-18 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3983.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> BigQuery writes from pure SQL
> -
>
> Key: BEAM-3983
> URL: https://issues.apache.org/jira/browse/BEAM-3983
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Andrew Pilloud
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 33h 20m
>  Remaining Estimate: 0h
>
> It would be nice if you could write to BigQuery in SQL without writing any 
> Java code. For example:
> {code:java}
> INSERT INTO bigquery SELECT * FROM PCOLLECTION{code}





[jira] [Updated] (BEAM-4580) [SQL] Package self-contained REPL

2018-06-18 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4580:
--
Description: Create a build target to package everything needed to launch a 
command-line tool. All runners and IOs  (was: Create a build target to package 
everything needed to launch a command-line tool)

> [SQL] Package self-contained REPL
> -
>
> Key: BEAM-4580
> URL: https://issues.apache.org/jira/browse/BEAM-4580
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
>
> Create a build target to package everything needed to launch a command-line 
> tool, including all runners and IOs.





[jira] [Created] (BEAM-4580) [SQL] Package self-contained REPL

2018-06-18 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4580:
-

 Summary: [SQL] Package self-contained REPL
 Key: BEAM-4580
 URL: https://issues.apache.org/jira/browse/BEAM-4580
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Anton Kedin


Create a build target to package everything needed to launch a command-line tool





[jira] [Created] (BEAM-4562) [SQL] Fix INSERT VALUES in JdbcDriver

2018-06-14 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4562:
-

 Summary: [SQL] Fix INSERT VALUES in JdbcDriver 
 Key: BEAM-4562
 URL: https://issues.apache.org/jira/browse/BEAM-4562
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Anton Kedin


Executing INSERT VALUES against JdbcDriver fails. Executing similar statements 
against BeamSqlEnv works fine. Example:

{code:java}
TestTableProvider tableProvider = new TestTableProvider();
Connection connection = JdbcDriver.connect(tableProvider);

connection
.createStatement()
.executeUpdate("CREATE TABLE person (id BIGINT, name VARCHAR) TYPE 
'test'");

connection.createStatement().executeQuery("INSERT INTO person VALUES(3, 
'yyy')");
{code}

 Output:

{code}
java.sql.SQLException: Error while executing SQL "INSERT INTO person VALUES(3, 
'yyy')": Node [rel#9:Subset#1.ENUMERABLE.[]] could not be implemented; planner 
state:

Root: rel#9:Subset#1.ENUMERABLE.[]
Original rel:
BeamIOSinkRel(subset=[rel#9:Subset#1.ENUMERABLE.[]], table=[[beam, person]], 
operation=[INSERT], flattened=[false]): rowcount = 1.0, cumulative cost = {1.0 
rows, 0.0 cpu, 0.0 io}, id = 6
  LogicalValues(subset=[rel#5:Subset#0.NONE.[]], tuples=[[{ 3, 'yyy' }]]): 
rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 0

Sets:
Set#0, type: RecordType(BIGINT id, VARCHAR name)
rel#5:Subset#0.NONE.[], best=null, importance=0.81
rel#0:LogicalValues.NONE.[[0, 1], [1]](type=RecordType(BIGINT 
id, VARCHAR name),tuples=[{ 3, 'yyy' }]), rowcount=1.0, cumulative cost={inf}
rel#14:Subset#0.BEAM_LOGICAL.[], best=null, importance=0.81
rel#20:Subset#0.ENUMERABLE.[], best=rel#19, importance=0.405
rel#19:EnumerableValues.ENUMERABLE.[[0, 1], 
[1]](type=RecordType(BIGINT id, VARCHAR name),tuples=[{ 3, 'yyy' }]), 
rowcount=1.0, cumulative cost={1.0 rows, 1.0 cpu, 0.0 io}
Set#1, type: RecordType(BIGINT ROWCOUNT)
rel#7:Subset#1.BEAM_LOGICAL.[], best=null, importance=0.9

rel#6:BeamIOSinkRel.BEAM_LOGICAL.[](input=rel#5:Subset#0.NONE.[],table=[beam, 
person],operation=INSERT,flattened=false), rowcount=1.0, cumulative cost={inf}

rel#15:BeamIOSinkRel.BEAM_LOGICAL.[](input=rel#14:Subset#0.BEAM_LOGICAL.[],table=[beam,
 person],operation=INSERT,flattened=false), rowcount=1.0, cumulative cost={inf}
rel#9:Subset#1.ENUMERABLE.[], best=null, importance=1.0

rel#10:AbstractConverter.ENUMERABLE.[](input=rel#7:Subset#1.BEAM_LOGICAL.[],convention=ENUMERABLE,sort=[]),
 rowcount=1.0, cumulative cost={inf}

rel#11:BeamEnumerableConverter.ENUMERABLE.[](input=rel#7:Subset#1.BEAM_LOGICAL.[]),
 rowcount=1.0, cumulative cost={inf}{code}





[jira] [Updated] (BEAM-4516) [SQL] Allow casting to complex types

2018-06-06 Thread Anton Kedin (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4516:
--
Description: 
Currently CAST is parsed in Calcite's main Parser.jj. When parsing the 
target type parameter, it only supports primitive types or MULTISET. It's 
impossible to cast to an inline, fully specified Row type at the moment (Calcite 
1.16).

CREATE TYPE in Calcite 1.17 is probably worth looking at; it should support CAST.

  was:
Currently CAST is parsed in Calcite's main Parser.jj. When parsing the 
target type parameter, it only supports primitive types or MULTISET. It's 
impossible to cast to a fully ad-hoc specified Row at the moment (Calcite 1.16).

CREATE TYPE in Calcite 1.17 is probably worth looking at; it should support CAST.


> [SQL] Allow casting to complex types
> 
>
> Key: BEAM-4516
> URL: https://issues.apache.org/jira/browse/BEAM-4516
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
>
> Currently CAST is parsed in Calcite's main Parser.jj. When parsing the 
> target type parameter, it only supports primitive types or MULTISET. It's 
> impossible to cast to an inline, fully specified Row type at the moment 
> (Calcite 1.16).
> CREATE TYPE in Calcite 1.17 is probably worth looking at; it should support 
> CAST.





[jira] [Created] (BEAM-4516) [SQL] Allow casting to complex types

2018-06-06 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4516:
-

 Summary: [SQL] Allow casting to complex types
 Key: BEAM-4516
 URL: https://issues.apache.org/jira/browse/BEAM-4516
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Anton Kedin


Currently CAST is parsed in Calcite's main Parser.jj. When parsing the 
target type parameter, it only supports primitive types or MULTISET. It's 
impossible to cast to a fully ad-hoc specified Row at the moment (Calcite 1.16).

CREATE TYPE in Calcite 1.17 is probably worth looking at; it should support CAST.





[jira] [Created] (BEAM-4515) [SQL] Support Row construction with named fields

2018-06-06 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4515:
-

 Summary: [SQL] Support Row construction with named fields
 Key: BEAM-4515
 URL: https://issues.apache.org/jira/browse/BEAM-4515
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Anton Kedin


Currently Calcite's ROW() constructor only accepts values and doesn't allow 
names, so `INSERT INTO f_row SELECT ROW('asdfasdf', 'sdfsdfsd')` will fail with 
something like this:


`Cannot assign to target field 'f_row' of type RecordType(VARCHAR f_str1, 
VARCHAR f_str2) from source field 'f_row' of type RecordType(VARCHAR EXPR$0, 
VARCHAR EXPR$1)`
 





[jira] [Created] (BEAM-4514) [SQL] Enable flattening configuration

2018-06-06 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4514:
-

 Summary: [SQL] Enable flattening configuration
 Key: BEAM-4514
 URL: https://issues.apache.org/jira/browse/BEAM-4514
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Anton Kedin


Currently the Calcite projection also flattens the output row in PlannerImpl. 
This causes a failure when inserting into a table with a ROW field.

E.g. `INSERT INTO table (f_row_destination) SELECT f_row_source` will not work, 
as f_row_source will be flattened into separate fields and won't match the 
destination schema.





[jira] [Created] (BEAM-4509) Implement ROW_NUMBER

2018-06-06 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4509:
-

 Summary: Implement ROW_NUMBER
 Key: BEAM-4509
 URL: https://issues.apache.org/jira/browse/BEAM-4509
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Anton Kedin


Design and implement the ROW_NUMBER() OVER window function. It is supported by 
Calcite, and we should look at the feasibility of supporting it in Beam SQL.

[StackOverflow 
Post|https://stackoverflow.com/questions/50724531/implement-row-number-in-beamsql]
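For reference, the semantics being requested, ROW_NUMBER() OVER (PARTITION BY 
key ORDER BY value) assigning 1-based positions per partition, can be sketched 
over in-memory data (a toy with made-up data, not a Beam SQL implementation):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RowNumberSketch {
    // rows are [partitionKey, value] pairs; returns key -> ["value#rownum", ...]
    static Map<String, List<String>> rowNumber(List<String[]> rows) {
        Map<String, List<String>> byKey = new TreeMap<>();
        for (String[] row : rows) {
            byKey.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        Map<String, List<String>> numbered = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : byKey.entrySet()) {
            List<String> sorted = new ArrayList<>(e.getValue());
            Collections.sort(sorted); // ORDER BY value
            List<String> out = new ArrayList<>();
            for (int i = 0; i < sorted.size(); i++) {
                out.add(sorted.get(i) + "#" + (i + 1)); // 1-based ROW_NUMBER
            }
            numbered.put(e.getKey(), out);
        }
        return numbered;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"en", "b"}, new String[] {"en", "a"}, new String[] {"fr", "x"});
        System.out.println(rowNumber(rows)); // {en=[a#1, b#2], fr=[x#1]}
    }
}
```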

 





[jira] [Created] (BEAM-4387) [SQL] Implement date types comparisons

2018-05-22 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4387:
-

 Summary: [SQL] Implement date types comparisons
 Key: BEAM-4387
 URL: https://issues.apache.org/jira/browse/BEAM-4387
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Anton Kedin


Implement and document datetime/timestamp (and other date type) comparisons.





[jira] [Closed] (BEAM-3785) [SQL] Add support for arrays

2018-05-22 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3785.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> [SQL] Add support for arrays
> 
>
> Key: BEAM-3785
> URL: https://issues.apache.org/jira/browse/BEAM-3785
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Support fields of Array type





[jira] [Closed] (BEAM-3789) [SQL] Support Nested Rows

2018-05-22 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3789.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> [SQL] Support Nested Rows
> -
>
> Key: BEAM-3789
> URL: https://issues.apache.org/jira/browse/BEAM-3789
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Add support for SqlTypeName.ROW





[jira] [Closed] (BEAM-4162) Wire up PubsubIO+JSON to Beam SQL

2018-05-22 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-4162.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Wire up PubsubIO+JSON to Beam SQL
> -
>
> Key: BEAM-4162
> URL: https://issues.apache.org/jira/browse/BEAM-4162
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up to 
> Beam SQL.
>  
> Use publication time as event timestamp





[jira] [Closed] (BEAM-4199) [SQL] Add a DLQ support for Pubsub tables

2018-05-22 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-4199.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> [SQL] Add a DLQ support for Pubsub tables
> -
>
> Key: BEAM-4199
> URL: https://issues.apache.org/jira/browse/BEAM-4199
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently we crash the pipeline on any error while processing a message 
> from Pubsub, including malformed JSON such as missing fields.
> The correct solution would be for the user to specify a way to handle the 
> errors, ideally pointing to a dead-letter queue where Beam should send the 
> messages it could not process.





[jira] [Closed] (BEAM-4386) [SQL] Implement string LIKE operator

2018-05-22 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-4386.
-
   Resolution: Duplicate
Fix Version/s: Not applicable

> [SQL] Implement string LIKE operator
> 
>
> Key: BEAM-4386
> URL: https://issues.apache.org/jira/browse/BEAM-4386
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>
> Implement string LIKE operator





[jira] [Created] (BEAM-4386) [SQL] Implement string LIKE operator

2018-05-22 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4386:
-

 Summary: [SQL] Implement string LIKE operator
 Key: BEAM-4386
 URL: https://issues.apache.org/jira/browse/BEAM-4386
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Anton Kedin


Implement string LIKE operator





[jira] [Updated] (BEAM-4358) Create test artifacts

2018-05-17 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4358:
--
Description: Currently things like TestPipeline and TestPubsub implement 
TestRule and thus require the project to depend on Junit. We need to create 
separate artifacts for these and depend on Junit only in test scope.  (was: 
Currently things like TestPipeline implement TestRule and thus require 
dependency on Junit. We need to create separate artifacts for these and depend 
on Junit only in test scope.)

> Create test artifacts
> -
>
> Key: BEAM-4358
> URL: https://issues.apache.org/jira/browse/BEAM-4358
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, testing
>Reporter: Anton Kedin
>Priority: Major
>
> Currently things like TestPipeline and TestPubsub implement TestRule and thus 
> require the project to depend on Junit. We need to create separate artifacts 
> for these and depend on Junit only in test scope.





[jira] [Updated] (BEAM-4358) Create test artifacts

2018-05-17 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4358:
--
Description: Currently things like TestPipeline and TestPubsub implement 
TestRule and thus require the project to depend on Junit. We need to create 
separate artifacts for these test utilities and depend on Junit only in test 
scope.  (was: Currently things like TestPipeline and TestPubsub implement 
TestRule and thus require the project to depend on Junit. We need to create 
separate artifacts for these and depend on Junit only in test scope.)

> Create test artifacts
> -
>
> Key: BEAM-4358
> URL: https://issues.apache.org/jira/browse/BEAM-4358
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, testing
>Reporter: Anton Kedin
>Priority: Major
>
> Currently things like TestPipeline and TestPubsub implement TestRule and thus 
> require the project to depend on Junit. We need to create separate artifacts 
> for these test utilities and depend on Junit only in test scope.





[jira] [Created] (BEAM-4358) Create test artifacts

2018-05-17 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4358:
-

 Summary: Create test artifacts
 Key: BEAM-4358
 URL: https://issues.apache.org/jira/browse/BEAM-4358
 Project: Beam
  Issue Type: Improvement
  Components: build-system, testing
Reporter: Anton Kedin


Currently things like TestPipeline implement TestRule and thus require 
dependency on Junit. We need to create separate artifacts for these and depend 
on Junit only in test scope.





[jira] [Closed] (BEAM-4201) Integration Tests for PubsubIO

2018-05-15 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-4201.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Integration Tests for PubsubIO
> --
>
> Key: BEAM-4201
> URL: https://issues.apache.org/jira/browse/BEAM-4201
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Add integration tests for PubsubIO





[jira] [Assigned] (BEAM-4199) [SQL] Add a DLQ support for Pubsub tables

2018-05-15 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin reassigned BEAM-4199:
-

Assignee: Anton Kedin

> [SQL] Add a DLQ support for Pubsub tables
> -
>
> Key: BEAM-4199
> URL: https://issues.apache.org/jira/browse/BEAM-4199
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Currently we crash the pipeline on any error while processing a message 
> from Pubsub, including malformed JSON such as missing fields.
> The correct solution would be for the user to specify a way to handle the 
> errors, ideally pointing to a dead-letter queue where Beam should send the 
> messages it could not process.





[jira] [Commented] (BEAM-4294) Join operator translator

2018-05-15 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476114#comment-16476114
 ] 

Anton Kedin commented on BEAM-4294:
---

You're probably aware of the Joins library in Beam, but I'm going to leave the 
link here anyway :) 
[https://github.com/apache/beam/blob/2eeeaa257c9336b2c48c3850c8508686a75be7b3/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java]
I don't know how helpful it is in the context of Euphoria.

> Join operator translator
> 
>
> Key: BEAM-4294
> URL: https://issues.apache.org/jira/browse/BEAM-4294
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-euphoria
>Affects Versions: Not applicable
>Reporter: David Moravek
>Assignee: Vaclav Plajt
>Priority: Major
>
> Implementation of basic LeftJoin, RightJoin, InnerJoin and FullJoin 
> translation using CoGroupByKey.





[jira] [Commented] (BEAM-4162) Wire up PubsubIO+JSON to Beam SQL

2018-05-10 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471491#comment-16471491
 ] 

Anton Kedin commented on BEAM-4162:
---

Main PR merged. Remaining work: integration test

> Wire up PubsubIO+JSON to Beam SQL
> -
>
> Key: BEAM-4162
> URL: https://issues.apache.org/jira/browse/BEAM-4162
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up to 
> Beam SQL.
>  
> Use publication time as event timestamp





[jira] [Closed] (BEAM-4200) [SQL] Support Pubsub publish time as event timestamp

2018-05-10 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-4200.
-
   Resolution: Fixed
 Assignee: Anton Kedin
Fix Version/s: Not applicable

> [SQL] Support Pubsub publish time as event timestamp
> 
>
> Key: BEAM-4200
> URL: https://issues.apache.org/jira/browse/BEAM-4200
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>
> [https://github.com/apache/beam/pull/5253] adds support for event timestamps 
> explicitly specified in the messages attributes. PubsubIO also supports using 
> messages publish timestamps instead. We need to wire this through.





[jira] [Commented] (BEAM-4200) [SQL] Support Pubsub publish time as event timestamp

2018-05-10 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470894#comment-16470894
 ] 

Anton Kedin commented on BEAM-4200:
---

Updated the PR to use ProcessContext.timestamp() to populate the 
event_timestamp field. This will work for timestamps extracted from both 
attributes and publish time.

> [SQL] Support Pubsub publish time as event timestamp
> 
>
> Key: BEAM-4200
> URL: https://issues.apache.org/jira/browse/BEAM-4200
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
>
> [https://github.com/apache/beam/pull/5253] adds support for event timestamps 
> explicitly specified in the messages attributes. PubsubIO also supports using 
> messages publish timestamps instead. We need to wire this through.





[jira] [Closed] (BEAM-4196) [SQL] Support Complex Types in DDL

2018-05-10 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-4196.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> [SQL] Support Complex Types in DDL
> --
>
> Key: BEAM-4196
> URL: https://issues.apache.org/jira/browse/BEAM-4196
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Neither the DDL parser we copied from calcite-server nor calcite-server 
> itself supports complex types in DDL. If we want to model something like 
> JSON objects we need to support at least Arrays and nested Rows.





[jira] [Assigned] (BEAM-4201) Integration Tests for PubsubIO

2018-05-03 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin reassigned BEAM-4201:
-

Assignee: Anton Kedin

> Integration Tests for PubsubIO
> --
>
> Key: BEAM-4201
> URL: https://issues.apache.org/jira/browse/BEAM-4201
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Add integration tests for PubsubIO





[jira] [Assigned] (BEAM-4196) [SQL] Support Complex Types in DDL

2018-05-03 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin reassigned BEAM-4196:
-

Assignee: Anton Kedin

> [SQL] Support Complex Types in DDL
> --
>
> Key: BEAM-4196
> URL: https://issues.apache.org/jira/browse/BEAM-4196
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Neither the DDL parser we copied from calcite-server nor calcite-server 
> itself supports complex types in DDL. If we want to model something like 
> JSON objects we need to support at least Arrays and nested Rows.





[jira] [Created] (BEAM-4201) Integration Tests for PubsubIO

2018-04-30 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4201:
-

 Summary: Integration Tests for PubsubIO
 Key: BEAM-4201
 URL: https://issues.apache.org/jira/browse/BEAM-4201
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Anton Kedin


Add integration tests for PubsubIO





[jira] [Created] (BEAM-4200) [SQL] Support Pubsub publish time as event timestamp

2018-04-30 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4200:
-

 Summary: [SQL] Support Pubsub publish time as event timestamp
 Key: BEAM-4200
 URL: https://issues.apache.org/jira/browse/BEAM-4200
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Anton Kedin


[https://github.com/apache/beam/pull/5253] adds support for event timestamps 
explicitly specified in the messages attributes. PubsubIO also supports using 
messages publish timestamps instead. We need to wire this through.





[jira] [Created] (BEAM-4199) [SQL] Add a DLQ support for Pubsub tables

2018-04-30 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4199:
-

 Summary: [SQL] Add a DLQ support for Pubsub tables
 Key: BEAM-4199
 URL: https://issues.apache.org/jira/browse/BEAM-4199
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Anton Kedin


Currently we crash the pipeline on any error while processing a message from 
Pubsub, including malformed JSON such as missing fields.

The correct solution would be for the user to specify a way to handle the 
errors, ideally pointing to a dead-letter queue where Beam should send the 
messages it could not process.
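A minimal plain-Java sketch of the routing idea, with no Beam dependency: messages that fail to parse go to a dead-letter collection instead of failing everything. The `DlqSketch` class and its toy `key=value` parser are illustrative stand-ins; in a real pipeline this would be a ParDo with a main output and a dead-letter output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DlqSketch {

    // Routes each raw message either to the parsed output or to the DLQ,
    // instead of failing the whole batch on the first bad record.
    static void route(List<String> raw,
                      List<Map<String, String>> parsed,
                      List<String> dlq) {
        for (String message : raw) {
            try {
                parsed.add(parseKeyValue(message));
            } catch (IllegalArgumentException e) {
                // A real pipeline would publish to a dead-letter topic here.
                dlq.add(message);
            }
        }
    }

    // Toy parser standing in for JSON parsing: expects "key=value".
    static Map<String, String> parseKeyValue(String message) {
        String[] parts = message.split("=", 2);
        if (parts.length != 2 || parts[0].isEmpty()) {
            throw new IllegalArgumentException("malformed: " + message);
        }
        Map<String, String> m = new HashMap<>();
        m.put(parts[0], parts[1]);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> ok = new ArrayList<>();
        List<String> dlq = new ArrayList<>();
        route(List.of("views=42", "not-json"), ok, dlq);
        System.out.println(ok.size() + " parsed, " + dlq.size() + " dead-lettered");
    }
}
```

The key property is that a single malformed message lands in the DLQ while the rest of the input is still processed.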





[jira] [Updated] (BEAM-4198) Automatically infer JSON schema from Pubsub messages

2018-04-30 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4198:
--
Summary: Automatically infer JSON schema from Pubsub messages  (was: 
Dynamically infer JSON schema from Pubsub messages)

> Automatically infer JSON schema from Pubsub messages
> 
>
> Key: BEAM-4198
> URL: https://issues.apache.org/jira/browse/BEAM-4198
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Anton Kedin
>Priority: Major
>
> JsonToRow transform allows JSON String->Row conversion but requires users to 
> know and specify the correct schema upfront. It would be great to be able to 
> infer the schema from a message sample.





[jira] [Created] (BEAM-4198) Dynamically infer JSON schema from Pubsub messages

2018-04-30 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4198:
-

 Summary: Dynamically infer JSON schema from Pubsub messages
 Key: BEAM-4198
 URL: https://issues.apache.org/jira/browse/BEAM-4198
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Anton Kedin


JsonToRow transform allows JSON String->Row conversion but requires users to 
know and specify the correct schema upfront. It would be great to be able to 
infer the schema from a message sample.
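A minimal sketch of the inference idea, assuming the sample message has already been parsed into a map. The class name and the value-class-to-SQL-type mapping are illustrative assumptions, not Beam's Schema API; a real implementation would parse a sample JSON message and handle nested objects and arrays.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaInferenceSketch {

    // Infers a field-name -> SQL-ish type name mapping from one sample
    // record, so users don't have to spell the schema out upfront.
    static Map<String, String> inferSchema(Map<String, Object> sample) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : sample.entrySet()) {
            Object v = e.getValue();
            String type;
            if (v instanceof Boolean) {
                type = "BOOLEAN";
            } else if (v instanceof Long || v instanceof Integer) {
                type = "BIGINT";
            } else if (v instanceof Double) {
                type = "DOUBLE";
            } else {
                type = "VARCHAR";  // fall back to string for anything else
            }
            schema.put(e.getKey(), type);
        }
        return schema;
    }

    public static void main(String[] args) {
        Map<String, Object> sample = new LinkedHashMap<>();
        sample.put("wikimedia_project", "en.wikipedia");
        sample.put("views", 42L);
        System.out.println(inferSchema(sample));
    }
}
```

One caveat this sketch makes visible: a single sample cannot distinguish an integer field from a double field whose sample value happens to be integral, which is why inference from one message can only be a best effort.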





[jira] [Created] (BEAM-4197) Add tests for PubsubIO.readXxx() -> Row

2018-04-30 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4197:
-

 Summary: Add tests for PubsubIO.readXxx() -> Row
 Key: BEAM-4197
 URL: https://issues.apache.org/jira/browse/BEAM-4197
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Anton Kedin


PubsubIO has methods like PubsubIO.readStrings(), PubsubIO.readAvros() and 
PubsubIO.readProtos(). We need to add tests to make sure it's easy to convert 
objects read from Pubsub to Rows, e.g. using JsonToRow transform, or 
InferredRowCoder.





[jira] [Created] (BEAM-4196) [SQL] Support Complex Types in DDL

2018-04-30 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4196:
-

 Summary: [SQL] Support Complex Types in DDL
 Key: BEAM-4196
 URL: https://issues.apache.org/jira/browse/BEAM-4196
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Anton Kedin


Neither the DDL parser we copied from calcite-server nor calcite-server itself 
supports complex types in DDL. If we want to model something like JSON objects 
we need to support at least Arrays and nested Rows.





[jira] [Created] (BEAM-4195) Support Pubsub emulator in Junit

2018-04-30 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4195:
-

 Summary: Support Pubsub emulator in Junit
 Key: BEAM-4195
 URL: https://issues.apache.org/jira/browse/BEAM-4195
 Project: Beam
  Issue Type: New Feature
  Components: io-java-gcp
Reporter: Anton Kedin
Assignee: Anton Kedin


Add support for unit testing of Pubsub-related functionality by, for example, 
wrapping the Pubsub emulator into a Junit test runner or a rule.
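A hedged sketch of the rule shape being proposed: acquire the emulator before the test body runs and always release it afterwards, mirroring the lifecycle of JUnit's ExternalResource without depending on JUnit itself. The `Emulator` interface here is hypothetical; a real rule would wrap the actual Pubsub emulator process.

```java
public class EmulatorRuleSketch {

    // Hypothetical stand-in for a managed Pubsub emulator process.
    interface Emulator {
        void start();
        void stop();
    }

    // Mirrors JUnit's before/after lifecycle: the emulator is started
    // before the test body and stopped even if the test body throws.
    static void runWithEmulator(Emulator emulator, Runnable testBody) {
        emulator.start();
        try {
            testBody.run();
        } finally {
            emulator.stop();  // released on both success and failure
        }
    }

    public static void main(String[] args) {
        Emulator noop = new Emulator() {
            public void start() { System.out.println("emulator started"); }
            public void stop()  { System.out.println("emulator stopped"); }
        };
        runWithEmulator(noop, () -> System.out.println("test body ran"));
    }
}
```

Packaged as a JUnit rule, the try/finally pair becomes the rule's before()/after() methods, so every test method gets a fresh emulator without boilerplate.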





[jira] [Created] (BEAM-4194) [SQL] Support LIMIT

2018-04-30 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4194:
-

 Summary: [SQL] Support LIMIT
 Key: BEAM-4194
 URL: https://issues.apache.org/jira/browse/BEAM-4194
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Anton Kedin


We need to support queries with "LIMIT xxx".

The problem is that we don't know when aggregates will trigger; they can 
potentially accumulate values in the global window and never trigger.

If we have some trigger syntax (BEAM-4193), then the use case becomes similar 
to what we have at the moment, where the user defines the trigger upstream for 
all inputs. In that case LIMIT can probably be implemented as sample.any(5) 
with a trigger at count.





[jira] [Updated] (BEAM-4193) [SQL] Support trigger definition in CLI

2018-04-30 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4193:
--
Description: 
We need to be able to define triggers for the sources in SQL CLI. Otherwise it 
is impossible to reason about outputs.

One approach is to come up with a simple JSON trigger definition syntax, along 
the lines of

 
{code:java}
SET OPT( '{ "trigger" : { "DefaultTrigger" }, "allowedLateness" : 0 }' )
{code}
 

  was:
We need to be able to define triggers for the sources in SQL CLI. Otherwise it 
is impossible to reason about outputs.

One approach is to come up with a simple JSON trigger definition syntax, along 
the lines of

SET OPT( '\{ "trigger" : { "DefaultTrigger" }, "allowedLateness" : 0 }' )


> [SQL] Support trigger definition in CLI
> ---
>
> Key: BEAM-4193
> URL: https://issues.apache.org/jira/browse/BEAM-4193
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
>
> We need to be able to define triggers for the sources in SQL CLI. Otherwise 
> it is impossible to reason about outputs.
> One approach is to come up with a simple JSON trigger definition syntax, 
> along the lines of
>  
> {code:java}
> SET OPT( '{ "trigger" : { "DefaultTrigger" }, "allowedLateness" : 0 }' )
> {code}
>  





[jira] [Created] (BEAM-4193) [SQL] Support trigger definition in CLI

2018-04-30 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4193:
-

 Summary: [SQL] Support trigger definition in CLI
 Key: BEAM-4193
 URL: https://issues.apache.org/jira/browse/BEAM-4193
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Anton Kedin


We need to be able to define triggers for the sources in SQL CLI. Otherwise it 
is impossible to reason about outputs.

One approach is to come up with a simple JSON trigger definition syntax, along 
the lines of

SET OPT( '\{ "trigger" : { "DefaultTrigger" }, "allowedLateness" : 0 }' )





[jira] [Created] (BEAM-4183) [SQL] Document supported DDL

2018-04-27 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4183:
-

 Summary: [SQL] Document supported DDL
 Key: BEAM-4183
 URL: https://issues.apache.org/jira/browse/BEAM-4183
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Anton Kedin


Document supported DDL with all supported types, parameters, and properties for 
each table type





[jira] [Updated] (BEAM-4174) [SQL] Add DDL Support for MockedBoundedTable

2018-04-25 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4174:
--
Description: 
We have a MockedBoundedTable which is an in-memory implementation of 
BeamSqlTable.

We should support it in DDL and use it for testing of DDL/DML/CLI.

  was:
We have a MockedBoundedTable which implements BeamSqlTable.

We should support it in DDL and use it for testing of DDL/DML/CLI.


> [SQL] Add DDL Support for MockedBoundedTable
> 
>
> Key: BEAM-4174
> URL: https://issues.apache.org/jira/browse/BEAM-4174
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
>
> We have a MockedBoundedTable which is an in-memory implementation of 
> BeamSqlTable.
> We should support it in DDL and use it for testing of DDL/DML/CLI.





[jira] [Created] (BEAM-4174) [SQL] Add DDL Support for MockedBoundedTable

2018-04-25 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4174:
-

 Summary: [SQL] Add DDL Support for MockedBoundedTable
 Key: BEAM-4174
 URL: https://issues.apache.org/jira/browse/BEAM-4174
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Anton Kedin


We have a MockedBoundedTable which implements BeamSqlTable.

We should support it in DDL and use it for testing of DDL/DML/CLI.





[jira] [Created] (BEAM-4173) [SQL] Refactor BeamSql, BeamSqlCli, BeamSqlEnv

2018-04-25 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4173:
-

 Summary: [SQL] Refactor BeamSql, BeamSqlCli, BeamSqlEnv
 Key: BEAM-4173
 URL: https://issues.apache.org/jira/browse/BEAM-4173
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Anton Kedin


BeamSql is a single method which delegates to a QueryTransform factory method, 
which creates a BeamSqlEnv, which creates a BeamSqlPlanner, which then 
configures the parser and parses the query.

It looks like we can squash together a lot of it by:

 - replacing BeamSql invocations with direct QueryTransform invocations;

 - combining BeamSqlEnv with BeamSqlPlanner, or extracting a higher-level 
configuration object;

 - exposing a few more QueryTransform builders that accept either a planner or 
a configuration object;

 - building QueryTransforms in BeamSqlCli.





[jira] [Closed] (BEAM-4160) Convert JSON objects to Rows

2018-04-25 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-4160.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> Convert JSON objects to Rows
> 
>
> Key: BEAM-4160
> URL: https://issues.apache.org/jira/browse/BEAM-4160
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, sdk-java-core
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Automate conversion of JSON objects to Rows to reduce overhead for querying 
> JSON-based sources





[jira] [Created] (BEAM-4168) [SQL] Create a "WordCount" example

2018-04-24 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4168:
-

 Summary: [SQL] Create a "WordCount" example
 Key: BEAM-4168
 URL: https://issues.apache.org/jira/browse/BEAM-4168
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Anton Kedin


Create a SQL example which we can execute and verify in runners other than 
DirectRunner





[jira] [Created] (BEAM-4167) Implement UNNEST

2018-04-24 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4167:
-

 Summary: Implement UNNEST
 Key: BEAM-4167
 URL: https://issues.apache.org/jira/browse/BEAM-4167
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Anton Kedin


We need to be able to convert collections to relations in the query to perform 
any meaningful operations on them. 





[jira] [Commented] (BEAM-3157) BeamSql transform should support other PCollection types

2018-04-24 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450456#comment-16450456
 ] 

Anton Kedin commented on BEAM-3157:
---

Added example: [https://github.com/apache/beam/pull/5215]

Jira for fixing SELECT *: BEAM-4163

> BeamSql transform should support other PCollection types
> 
>
> Key: BEAM-3157
> URL: https://issues.apache.org/jira/browse/BEAM-3157
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Ismaël Mejía
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Currently the Beam SQL transform only supports input and output data 
> represented as a BeamRecord. This seems to me like a usability limitation 
> (even if we can do a ParDo to prepare objects before and after the transform).
> I suppose this constraint comes from the fact that we need to map 
> name/type/value from an object field into Calcite so it is convenient to have 
> a specific data type (BeamRecord) for this. However we can accomplish the 
> same by using a PCollection of JavaBean (where we know the same information 
> via the field names/types/values) or by using Avro records where we also have 
> the Schema information. For the output PCollection we can map the object via 
> a Reference (e.g. a JavaBean to be filled with the names of an Avro object).
> Note: I am assuming for the moment simple mappings since the SQL does not 
> support composite types for the moment.
> A simple API idea would be something like this:
> A simple filter:
> PCollection col = BeamSql.query("SELECT * FROM  WHERE 
> ...").from(MyPojo.class);
> A projection:
> PCollection newCol = BeamSql.query("SELECT id, 
> name").from(MyPojo.class).as(MyNewPojo.class);
> A first approach could be to just add the extra ParDos + transform DoFns 
> however I suppose that for memory use reasons maybe mapping directly into 
> Calcite would make sense.





[jira] [Created] (BEAM-4163) Fix SELECT * for Pojo queries

2018-04-23 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4163:
-

 Summary: Fix SELECT * for Pojo queries
 Key: BEAM-4163
 URL: https://issues.apache.org/jira/browse/BEAM-4163
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Anton Kedin


Rows generated from Pojos are based on field indices, which means they can 
break if the Pojo fields are enumerated in a different order. That can produce 
a different generated Row on different runner instances, which in turn causes 
SELECT * to fail.

One solution is to make Pojo field ordering deterministic, e.g. sort the 
fields before generating field accessors.
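A minimal sketch of that fix, assuming plain reflection: Class.getDeclaredFields() makes no ordering guarantee, so sorting the names before generating accessors makes the derived schema deterministic across JVM instances. The class and field names below are illustrative, not Beam's actual accessor-generation code.

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class FieldOrderSketch {

    // Returns the Pojo's field names in a deterministic (sorted) order,
    // independent of the order the JVM happens to enumerate them in.
    static List<String> deterministicFieldNames(Class<?> clazz) {
        return Arrays.stream(clazz.getDeclaredFields())
            .map(Field::getName)
            .sorted(Comparator.naturalOrder())
            .collect(Collectors.toList());
    }

    // Example Pojo standing in for a user-defined record type.
    static class WikiRecord {
        public String wikimediaProject;
        public long views;
    }

    public static void main(String[] args) {
        // Sorted order is the same on every run and every JVM.
        System.out.println(deterministicFieldNames(WikiRecord.class));
    }
}
```

With accessors generated in this sorted order, two runner instances derive the same field indices for the same Pojo class, which is the property SELECT * depends on.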

 





[jira] [Updated] (BEAM-4161) Nested Rows flattening doesn't work

2018-04-23 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4161:
--
Issue Type: Bug  (was: New Feature)

> Nested Rows flattening doesn't work
> ---
>
> Key: BEAM-4161
> URL: https://issues.apache.org/jira/browse/BEAM-4161
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Calcite flattens nested rows. It updates the field indices of the flattened 
> row so the fields are referenced correctly in the Rel Nodes. But the fields 
> after the flattened row don't have the indices updated, they have the 
> previous ordinals before the flattening. There is no way to look up the 
> correct index at the point when it reaches Beam SQL Rel Nodes. It will be 
> fixed in Calcite 1.17.
> We need to update the Calcite as soon as it is released and add few 
> integration tests around nested Rows:
>  - basic nesting with fields before and after the row field;
>  - multi-level row nesting;
>  - multiple row fields;
>  
> Calcite JIRA: CALCITE-2220





[jira] [Updated] (BEAM-4162) Wire up PubsubIO+JSON to Beam SQL

2018-04-23 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4162:
--
Description: 
Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up to 
Beam SQL.

 

Use publication time as event timestamp

  was:Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up 
to Beam SQL


> Wire up PubsubIO+JSON to Beam SQL
> -
>
> Key: BEAM-4162
> URL: https://issues.apache.org/jira/browse/BEAM-4162
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up to 
> Beam SQL.
>  
> Use publication time as event timestamp





[jira] [Created] (BEAM-4162) Wire up PubsubIO+JSON to Beam SQL

2018-04-23 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4162:
-

 Summary: Wire up PubsubIO+JSON to Beam SQL
 Key: BEAM-4162
 URL: https://issues.apache.org/jira/browse/BEAM-4162
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Anton Kedin
Assignee: Anton Kedin


Read JSON messages from Pubsub, convert them to Rows (BEAM-4160), wire up to 
Beam SQL





[jira] [Comment Edited] (BEAM-4160) Convert JSON objects to Rows

2018-04-23 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449058#comment-16449058
 ] 

Anton Kedin edited comment on BEAM-4160 at 4/23/18 11:58 PM:
-

PR: [https://github.com/apache/beam/pull/5120]

Caveat: Calcite has a bug which prevents querying complex Rows: BEAM-4161. It 
is getting fixed in Calcite 1.17 (we're at 1.16 (latest) at the moment).


was (Author: kedin):
PR: [https://github.com/apache/beam/pull/5120
]

Caveat: Calcite has a bug which prevents querying complex Rows: BEAM-4161. It 
is getting fixed in Calcite 1.17 (we're at 1.16 (latest) at the moment).

> Convert JSON objects to Rows
> 
>
> Key: BEAM-4160
> URL: https://issues.apache.org/jira/browse/BEAM-4160
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, sdk-java-core
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Automate conversion of JSON objects to Rows to reduce overhead for querying 
> JSON-based sources





[jira] [Updated] (BEAM-4161) Nested Rows flattening doesn't work

2018-04-23 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4161:
--
Description: 
Calcite flattens nested rows. It updates the field indices of the flattened row 
so the fields are referenced correctly in the Rel Nodes. But the fields after 
the flattened row don't have the indices updated, they have the previous 
ordinals before the flattening. There is no way to look up the correct index at 
the point when it reaches Beam SQL Rel Nodes. It will be fixed in Calcite 1.17.

We need to update Calcite as soon as it is released and add a few integration 
tests around nested Rows:

 - basic nesting with fields before and after the row field;

 - multi-level row nesting;

 - multiple row fields;

 

Calcite JIRA: CALCITE-2220
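The ordinal shift described above can be sketched with plain lists (a toy model in Python for brevity, not Calcite's actual flattening code):

```python
def flatten(schema):
    # Flatten one level of nesting: [a, ROW(b, c), d] -> [a, b, c, d].
    flat = []
    for field in schema:
        if isinstance(field, list):  # a nested row
            flat.extend(field)
        else:
            flat.append(field)
    return flat

schema = ["a", ["b", "c"], "d"]
flat = flatten(schema)

# "d" sits at ordinal 2 before flattening but must be referenced at
# ordinal 3 afterwards; a rel node that keeps the old ordinal reads
# the wrong field.
print(schema.index("d"), flat.index("d"))  # 2 3
```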

  was:
Calcite flattens nested rows. It updates the field indices of the flattened row 
so the fields are referenced correctly in the Rel Nodes. But the fields after 
the flattened row don't have the indices updated, they have the previous 
ordinals before the flattening. There is no way to look up the correct index at 
the point when it reaches Beam SQL Rel Nodes. It will be fixed in Calcite 1.17.

We need to update Calcite as soon as it is released and add a few integration 
tests around nested Rows:

 - basic nesting with fields before and after the row field;

 - multi-level row nesting;

 - multiple row fields;


> Nested Rows flattening doesn't work
> ---
>
> Key: BEAM-4161
> URL: https://issues.apache.org/jira/browse/BEAM-4161
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Calcite flattens nested rows. It updates the field indices of the flattened 
> row so the fields are referenced correctly in the Rel Nodes. But the fields 
> after the flattened row don't have the indices updated, they have the 
> previous ordinals before the flattening. There is no way to look up the 
> correct index at the point when it reaches Beam SQL Rel Nodes. It will be 
> fixed in Calcite 1.17.
> We need to update Calcite as soon as it is released and add a few 
> integration tests around nested Rows:
>  - basic nesting with fields before and after the row field;
>  - multi-level row nesting;
>  - multiple row fields;
>  
> Calcite JIRA: CALCITE-2220





[jira] [Commented] (BEAM-4160) Convert JSON objects to Rows

2018-04-23 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449058#comment-16449058
 ] 

Anton Kedin commented on BEAM-4160:
---

PR: [https://github.com/apache/beam/pull/5120]

Caveat: Calcite has a bug which prevents querying complex Rows: BEAM-4161. It 
is getting fixed in Calcite 1.17 (we're at 1.16 (latest) at the moment).

> Convert JSON objects to Rows
> 
>
> Key: BEAM-4160
> URL: https://issues.apache.org/jira/browse/BEAM-4160
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, sdk-java-core
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Automate conversion of JSON objects to Rows to reduce overhead for querying 
> JSON-based sources





[jira] [Updated] (BEAM-4160) Convert JSON objects to Rows

2018-04-23 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4160:
--
Description: Automate conversion of JSON objects to Rows to reduce overhead 
for querying JSON-based sources
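A minimal sketch of the conversion this issue asks to automate, turning a JSON object into a flat row of field/value pairs (hypothetical field names, Python for brevity):

```python
import json

# A JSON message as it might arrive from a source; the "id"/"name"
# fields are made-up examples, not a schema from the issue.
payload = '{"id": 1, "name": "beam"}'

# The row is just the ordered (field, value) pairs of the object.
row = list(json.loads(payload).items())
print(row)  # [('id', 1), ('name', 'beam')]
```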

> Convert JSON objects to Rows
> 
>
> Key: BEAM-4160
> URL: https://issues.apache.org/jira/browse/BEAM-4160
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, sdk-java-core
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>
> Automate conversion of JSON objects to Rows to reduce overhead for querying 
> JSON-based sources





[jira] [Created] (BEAM-4161) Nested Rows flattening doesn't work

2018-04-23 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4161:
-

 Summary: Nested Rows flattening doesn't work
 Key: BEAM-4161
 URL: https://issues.apache.org/jira/browse/BEAM-4161
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Anton Kedin
Assignee: Anton Kedin


Calcite flattens nested rows. It updates the field indices of the flattened row 
so the fields are referenced correctly in the Rel Nodes. But the fields after 
the flattened row don't have the indices updated, they have the previous 
ordinals before the flattening. There is no way to look up the correct index at 
the point when it reaches Beam SQL Rel Nodes. It will be fixed in Calcite 1.17.

We need to update Calcite as soon as it is released and add a few integration 
tests around nested Rows:

 - basic nesting with fields before and after the row field;

 - multi-level row nesting;

 - multiple row fields;





[jira] [Created] (BEAM-4160) Convert JSON objects to Rows

2018-04-23 Thread Anton Kedin (JIRA)
Anton Kedin created BEAM-4160:
-

 Summary: Convert JSON objects to Rows
 Key: BEAM-4160
 URL: https://issues.apache.org/jira/browse/BEAM-4160
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql, sdk-java-core
Reporter: Anton Kedin
Assignee: Anton Kedin








[jira] [Updated] (BEAM-4076) Schema followups

2018-04-23 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-4076:
--
Component/s: dsl-sql

> Schema followups
> 
>
> Key: BEAM-4076
> URL: https://issues.apache.org/jira/browse/BEAM-4076
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, dsl-sql, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>
> This umbrella bug contains subtasks with followups for Beam schemas, which 
> were moved from SQL to the core Java SDK and made to be type-name-based 
> rather than coder-based.





[jira] [Closed] (BEAM-3449) [SQL] Document joins

2018-04-23 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3449.
-
   Resolution: Fixed
Fix Version/s: Not applicable

Doc updated: https://beam.apache.org/documentation/dsls/sql/#features-joins

> [SQL] Document joins
> 
>
> Key: BEAM-3449
> URL: https://issues.apache.org/jira/browse/BEAM-3449
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>
> Join behavior becomes complicated in the presence of non-trivial windowing on 
> the inputs, e.g. merging windows. We need to provide a complete description of 
> how each windowing mode works.





[jira] [Closed] (BEAM-3157) BeamSql transform should support other PCollection types

2018-04-13 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3157.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> BeamSql transform should support other PCollection types
> 
>
> Key: BEAM-3157
> URL: https://issues.apache.org/jira/browse/BEAM-3157
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Ismaël Mejía
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Currently the Beam SQL transform only supports input and output data 
> represented as a BeamRecord. This seems to me like a usability limitation 
> (even if we can do a ParDo to prepare objects before and after the transform).
> I suppose this constraint comes from the fact that we need to map 
> name/type/value from an object field into Calcite so it is convenient to have 
> a specific data type (BeamRecord) for this. However we can accomplish the 
> same by using a PCollection of JavaBean (where we know the same information 
> via the field names/types/values) or by using Avro records where we also have 
> the Schema information. For the output PCollection we can map the object via 
> a Reference (e.g. a JavaBean to be filled with the names of an Avro object).
> Note: I am assuming simple mappings for the moment, since the SQL does not 
> support composite types yet.
> A simple API idea would be something like this:
> A simple filter:
> PCollection col = BeamSql.query("SELECT * FROM  WHERE 
> ...").from(MyPojo.class);
> A projection:
> PCollection newCol = BeamSql.query("SELECT id, 
> name").from(MyPojo.class).as(MyNewPojo.class);
> A first approach could be to just add the extra ParDos + transform DoFns 
> however I suppose that for memory use reasons maybe mapping directly into 
> Calcite would make sense.
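The idea above, deriving a schema from a plain user type by introspection instead of requiring a hand-built record, can be sketched like this (a Python dataclass stands in for a JavaBean; `MyPojo` is a hypothetical example type, not Beam API):

```python
from dataclasses import dataclass, fields

@dataclass
class MyPojo:
    # Hypothetical user type; in the proposal this would be a JavaBean.
    id: int
    name: str

# The name/type pairs fall out of the class definition itself, which is
# what would let a SQL transform accept arbitrary user types.
schema = [(f.name, f.type) for f in fields(MyPojo)]
print(schema)  # [('id', <class 'int'>), ('name', <class 'str'>)]
```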





[jira] [Commented] (BEAM-3157) BeamSql transform should support other PCollection types

2018-04-13 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437561#comment-16437561
 ] 

Anton Kedin commented on BEAM-3157:
---

A PR is merged which allows Row generation for POJOs: 
[https://github.com/apache/beam/pull/4649/files]

Usage example: 
[https://github.com/apache/beam/blob/670c75e94795ad9da6a0690647e996dc97b60718/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/InferredRowCoderSqlTest.java#L82]
 

Caveat: `SELECT *` doesn't work correctly due to undefined order of fields.

This PR resolves this Jira for now. Bugs and extra features will go into 
separate Jiras.

PS: there's also work to convert JSON objects directly to Rows: 
https://github.com/apache/beam/pull/5120

> BeamSql transform should support other PCollection types
> 
>
> Key: BEAM-3157
> URL: https://issues.apache.org/jira/browse/BEAM-3157
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Ismaël Mejía
>Assignee: Anton Kedin
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Currently the Beam SQL transform only supports input and output data 
> represented as a BeamRecord. This seems to me like a usability limitation 
> (even if we can do a ParDo to prepare objects before and after the transform).
> I suppose this constraint comes from the fact that we need to map 
> name/type/value from an object field into Calcite so it is convenient to have 
> a specific data type (BeamRecord) for this. However we can accomplish the 
> same by using a PCollection of JavaBean (where we know the same information 
> via the field names/types/values) or by using Avro records where we also have 
> the Schema information. For the output PCollection we can map the object via 
> a Reference (e.g. a JavaBean to be filled with the names of an Avro object).
> Note: I am assuming simple mappings for the moment, since the SQL does not 
> support composite types yet.
> A simple API idea would be something like this:
> A simple filter:
> PCollection col = BeamSql.query("SELECT * FROM  WHERE 
> ...").from(MyPojo.class);
> A projection:
> PCollection newCol = BeamSql.query("SELECT id, 
> name").from(MyPojo.class).as(MyNewPojo.class);
> A first approach could be to just add the extra ParDos + transform DoFns 
> however I suppose that for memory use reasons maybe mapping directly into 
> Calcite would make sense.





[jira] [Closed] (BEAM-3360) [SQL] Do not assign triggers for HOP/TUMBLE

2018-04-13 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3360.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

fixed by https://github.com/apache/beam/pull/4546

> [SQL] Do not assign triggers for HOP/TUMBLE
> ---
>
> Key: BEAM-3360
> URL: https://issues.apache.org/jira/browse/BEAM-3360
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: 2.4.0
>
>
> Currently when parsing HOP/TUMBLE/SESSION expressions we create a repeating 
> trigger for the defined windows, see:
> {code:java|title=BeamAggregationRule.java}
>   private Trigger createTriggerWithDelay(GregorianCalendar delayTime) {
> return 
> Repeatedly.forever(AfterWatermark.pastEndOfWindow().withLateFirings(AfterProcessingTime
> 
> .pastFirstElementInPane().plusDelayOf(Duration.millis(delayTime.getTimeInMillis();
>   }
> {code}
> This will not work correctly with joins, as joins with multiple trigger 
> firings are currently broken: https://issues.apache.org/jira/browse/BEAM-3190 
> .
> Even if joins with multiple firings worked correctly, the SQL parsing stage is 
> still probably the wrong place to infer them.
> Better alternatives:
>  - inherit the user-defined triggers for the input pcollection without 
> modification;
>  - triggering at sinks ( https://s.apache.org/beam-sink-triggers ) might 
> define a way to backpropagate triggers with correct semantics;





[jira] [Assigned] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-04-13 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin reassigned BEAM-3773:
-

Assignee: Andrew Pilloud

> [SQL] Investigate JDBC interface for Beam SQL
> -
>
> Key: BEAM-3773
> URL: https://issues.apache.org/jira/browse/BEAM-3773
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Andrew Pilloud
>Priority: Major
>
> JDBC allows integration with a lot of third-party tools, e.g 
> [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], 
> [sqlline|https://github.com/julianhyde/sqlline]. We should look into how 
> feasible it is to implement a JDBC interface for Beam SQL





[jira] [Commented] (BEAM-3773) [SQL] Investigate JDBC interface for Beam SQL

2018-04-13 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437547#comment-16437547
 ] 

Anton Kedin commented on BEAM-3773:
---

Raw Beam JDBC Prototype: 
[https://github.com/akedin/beam/commit/096ca8d7185af6b6a01f8231cf85c40fa221051a]

[~apilloud] is looking at Calcite Avatica integration: 
[https://calcite.apache.org/avatica/docs/] 

 

> [SQL] Investigate JDBC interface for Beam SQL
> -
>
> Key: BEAM-3773
> URL: https://issues.apache.org/jira/browse/BEAM-3773
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Andrew Pilloud
>Priority: Major
>
> JDBC allows integration with a lot of third-party tools, e.g 
> [Zeppelin|https://zeppelin.apache.org/docs/0.7.0/manual/interpreters.html], 
> [sqlline|https://github.com/julianhyde/sqlline]. We should look into how 
> feasible it is to implement a JDBC interface for Beam SQL





[jira] [Updated] (BEAM-3547) [SQL] Nested Query Generates Incompatible Trigger

2018-04-13 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin updated BEAM-3547:
--
Fix Version/s: (was: Not applicable)
   2.4.0

> [SQL] Nested Query Generates Incompatible Trigger
> -
>
> Key: BEAM-3547
> URL: https://issues.apache.org/jira/browse/BEAM-3547
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: 2.4.0
>
>
> From 
> [https://stackoverflow.com/questions/48335383/nested-queries-in-beam-sql] :
>  
> SQL:
> {code:java}
> PCollection Query_Output = Query.apply(
> BeamSql.queryMulti("Select Orders.OrderID From Orders Where 
> Orders.CustomerID IN (Select Customers.CustomerID From Customers WHERE 
> Customers.CustomerID = 2)"));{code}
>  
> Error:
> {code:java}
> org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner 
> validateAndConvert
> INFO: SQL:
> SELECT `Orders`.`OrderID`
> FROM `Orders` AS `Orders`
> WHERE `Orders`.`CustomerID` IN (SELECT `Customers`.`CustomerID`
> FROM `Customers` AS `Customers`
> WHERE `Customers`.`CustomerID` = 2)
> Jan 19, 2018 11:56:36 AM 
> org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner 
> convertToBeamRel
> INFO: SQLPlan>
> LogicalProject(OrderID=[$0])
>   LogicalJoin(condition=[=($1, $3)], joinType=[inner])
> LogicalTableScan(table=[[Orders]])
> LogicalAggregate(group=[{0}])
>   LogicalProject(CustomerID=[$0])
> LogicalFilter(condition=[=($0, 2)])
>   LogicalTableScan(table=[[Customers]])
> Exception in thread "main" java.lang.IllegalStateException: 
> java.lang.IllegalStateException: Inputs to Flatten had incompatible triggers: 
> DefaultTrigger, Repeatedly.forever(AfterWatermark.pastEndOfWindow())
> at 
> org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:165)
> at 
> org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:116)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465)
> at 
> org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:160)
> at com.bitwise.cloud.ExampleOfJoins.main(ExampleOfJoins.java:91)
> Caused by: java.lang.IllegalStateException: Inputs to Flatten had 
> incompatible triggers: DefaultTrigger, 
> Repeatedly.forever(AfterWatermark.pastEndOfWindow())
> at 
> org.apache.beam.sdk.transforms.Flatten$PCollections.expand(Flatten.java:123)
> at 
> org.apache.beam.sdk.transforms.Flatten$PCollections.expand(Flatten.java:101)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465)
> at 
> org.apache.beam.sdk.values.PCollectionList.apply(PCollectionList.java:182)
> at 
> org.apache.beam.sdk.transforms.join.CoGroupByKey.expand(CoGroupByKey.java:124)
> at 
> org.apache.beam.sdk.transforms.join.CoGroupByKey.expand(CoGroupByKey.java:74)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465)
> at 
> org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple.apply(KeyedPCollectionTuple.java:107)
> at org.apache.beam.sdk.extensions.joinlibrary.Join.innerJoin(Join.java:59)
> at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel.standardJoin(BeamJoinRel.java:217)
> at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel.buildBeamPipeline(BeamJoinRel.java:161)
> at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamProjectRel.buildBeamPipeline(BeamProjectRel.java:68)
> at 
> org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:163)
> ... 5 more{code}





[jira] [Closed] (BEAM-3547) [SQL] Nested Query Generates Incompatible Trigger

2018-04-13 Thread Anton Kedin (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Kedin closed BEAM-3547.
-
   Resolution: Fixed
Fix Version/s: Not applicable

> [SQL] Nested Query Generates Incompatible Trigger
> -
>
> Key: BEAM-3547
> URL: https://issues.apache.org/jira/browse/BEAM-3547
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
> Fix For: Not applicable
>
>
> From 
> [https://stackoverflow.com/questions/48335383/nested-queries-in-beam-sql] :
>  
> SQL:
> {code:java}
> PCollection Query_Output = Query.apply(
> BeamSql.queryMulti("Select Orders.OrderID From Orders Where 
> Orders.CustomerID IN (Select Customers.CustomerID From Customers WHERE 
> Customers.CustomerID = 2)"));{code}
>  
> Error:
> {code:java}
> org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner 
> validateAndConvert
> INFO: SQL:
> SELECT `Orders`.`OrderID`
> FROM `Orders` AS `Orders`
> WHERE `Orders`.`CustomerID` IN (SELECT `Customers`.`CustomerID`
> FROM `Customers` AS `Customers`
> WHERE `Customers`.`CustomerID` = 2)
> Jan 19, 2018 11:56:36 AM 
> org.apache.beam.sdk.extensions.sql.impl.planner.BeamQueryPlanner 
> convertToBeamRel
> INFO: SQLPlan>
> LogicalProject(OrderID=[$0])
>   LogicalJoin(condition=[=($1, $3)], joinType=[inner])
> LogicalTableScan(table=[[Orders]])
> LogicalAggregate(group=[{0}])
>   LogicalProject(CustomerID=[$0])
> LogicalFilter(condition=[=($0, 2)])
>   LogicalTableScan(table=[[Customers]])
> Exception in thread "main" java.lang.IllegalStateException: 
> java.lang.IllegalStateException: Inputs to Flatten had incompatible triggers: 
> DefaultTrigger, Repeatedly.forever(AfterWatermark.pastEndOfWindow())
> at 
> org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:165)
> at 
> org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:116)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465)
> at 
> org.apache.beam.sdk.values.PCollectionTuple.apply(PCollectionTuple.java:160)
> at com.bitwise.cloud.ExampleOfJoins.main(ExampleOfJoins.java:91)
> Caused by: java.lang.IllegalStateException: Inputs to Flatten had 
> incompatible triggers: DefaultTrigger, 
> Repeatedly.forever(AfterWatermark.pastEndOfWindow())
> at 
> org.apache.beam.sdk.transforms.Flatten$PCollections.expand(Flatten.java:123)
> at 
> org.apache.beam.sdk.transforms.Flatten$PCollections.expand(Flatten.java:101)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465)
> at 
> org.apache.beam.sdk.values.PCollectionList.apply(PCollectionList.java:182)
> at 
> org.apache.beam.sdk.transforms.join.CoGroupByKey.expand(CoGroupByKey.java:124)
> at 
> org.apache.beam.sdk.transforms.join.CoGroupByKey.expand(CoGroupByKey.java:74)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:533)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:465)
> at 
> org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple.apply(KeyedPCollectionTuple.java:107)
> at org.apache.beam.sdk.extensions.joinlibrary.Join.innerJoin(Join.java:59)
> at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel.standardJoin(BeamJoinRel.java:217)
> at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamJoinRel.buildBeamPipeline(BeamJoinRel.java:161)
> at 
> org.apache.beam.sdk.extensions.sql.impl.rel.BeamProjectRel.buildBeamPipeline(BeamProjectRel.java:68)
> at 
> org.apache.beam.sdk.extensions.sql.BeamSql$QueryTransform.expand(BeamSql.java:163)
> ... 5 more{code}





[jira] [Comment Edited] (BEAM-3785) [SQL] Add support for arrays

2018-03-13 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396561#comment-16396561
 ] 

Anton Kedin edited comment on BEAM-3785 at 3/13/18 5:38 PM:


to go:

 - arrays of arrays

 - test complex indexing (nested expressions)

 - test aggregations, other complex operations

 


was (Author: kedin):
to go:

 - arrays of arrays

 - test complex indexing (nested expressions)

 - test aggregations, other complex operations

 - DOT operator

> [SQL] Add support for arrays
> 
>
> Key: BEAM-3785
> URL: https://issues.apache.org/jira/browse/BEAM-3785
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Support fields of Array type





[jira] [Comment Edited] (BEAM-3785) [SQL] Add support for arrays

2018-03-12 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396561#comment-16396561
 ] 

Anton Kedin edited comment on BEAM-3785 at 3/13/18 5:28 AM:


to go:

 - arrays of arrays

 - test complex indexing (nested expressions)

 - test aggregations, other complex operations

 - DOT operator


was (Author: kedin):
to go:

 - arrays of arrays

 - test complex indexing

 - DOT operator

> [SQL] Add support for arrays
> 
>
> Key: BEAM-3785
> URL: https://issues.apache.org/jira/browse/BEAM-3785
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Support fields of Array type





[jira] [Commented] (BEAM-3785) [SQL] Add support for arrays

2018-03-12 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396561#comment-16396561
 ] 

Anton Kedin commented on BEAM-3785:
---

to go:

 - arrays of arrays

 - test complex indexing

 - DOT operator

> [SQL] Add support for arrays
> 
>
> Key: BEAM-3785
> URL: https://issues.apache.org/jira/browse/BEAM-3785
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Support fields of Array type





[jira] [Commented] (BEAM-3785) [SQL] Add support for arrays

2018-03-12 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396555#comment-16396555
 ] 

Anton Kedin commented on BEAM-3785:
---

implemented arrays of rows

> [SQL] Add support for arrays
> 
>
> Key: BEAM-3785
> URL: https://issues.apache.org/jira/browse/BEAM-3785
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Anton Kedin
>Assignee: Anton Kedin
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Support fields of Array type





[jira] [Comment Edited] (BEAM-3417) Fix Calcite assertions

2018-03-12 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390732#comment-16390732
 ] 

Anton Kedin edited comment on BEAM-3417 at 3/12/18 8:10 PM:


*What fails?*

[The assert in question is in VolcanoPlanner 
|https://github.com/apache/calcite/blob/9ab47c732ec99c3162954e1eb74eaa30cddf/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L546].
 It checks whether [all traits are 
simple|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L517]
 by checking whether they're not instances of RelCompositeTrait.

*Why does it fail?*

In our case, when it fails, traitSet.allSimple() has 2 traits. One is 
BeamLogicalConvention (it's not a composite trait), and another is a 
collation-related composite trait which causes the assertion to fail.

*Where does the composite trait come from?*

We specify the collation trait def in 
[BeamQueryPlanner|https://github.com/apache/beam/blob/14b17ad574342a875c8f99278e18c605aa5b4bc3/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L89]
 before parsing. It then [gets replaced in 
LogicalTableScan|https://github.com/apache/calcite/blob/914b5cfbf978e796afeaff7b780e268ed39d8ec5/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L102]
 with the [composite 
trait|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L239]
 which causes the failure.

*Why does LogicalTableScan need to do the collation magic?*

Dunno, it seems that it adds the statistics information to the collation trait 
so that the engine can handle sorting correctly. It does so only when we ask it 
to by adding the collation trait def.

*Why doesn't VolcanoPlanner like CompositeTraitSet in that part?*

Dunno.

*Do we need the collation trait def?*

Dunno.

*What do we do?*

If we can, it probably makes sense to replace LogicalTableScan rel with some 
kind of BeamIllogicalPCollectionScan which doesn't do all the collation magic 
or makes it configurable.

 - (update) On second look, this assertion seems to happen before the rel 
replacement, so we probably won't be able to replace the logical table scan rel 
with our own logic.
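The failing check can be modeled with ordinary values (a toy model of the trait-set shape described above; these are not Calcite's actual classes):

```python
# A trait set is "all simple" when no member is a composite trait,
# modeled here as a plain list of sub-traits.
trait_set = [
    "BeamLogicalConvention",                 # a simple trait
    ["RelCollation[0]", "RelCollation[1]"],  # a composite collation trait
]

all_simple = not any(isinstance(t, list) for t in trait_set)
print(all_simple)  # False -> the VolcanoPlanner assertion fires
```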


> Fix Calcite assertions
> --
>
> Key: BEAM-3417
> URL: https://issues.apache.org/jira/browse/BEAM-3417
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
>
> Currently we disable assertions in tests for every project that depends on 
> Beam SQL / Calcite; otherwise Calcite fails assertions when it validates the 
> relational representation of the query. E.g. in projects 

[jira] [Comment Edited] (BEAM-3417) Fix Calcite assertions

2018-03-07 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390732#comment-16390732
 ] 

Anton Kedin edited comment on BEAM-3417 at 3/8/18 4:30 AM:
---

*What fails?*

[The assert in question is in VolcanoPlanner 
|https://github.com/apache/calcite/blob/9ab47c732ec99c3162954e1eb74eaa30cddf/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L546].
 It checks whether [all traits are 
simple|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L517],
 i.e. that none of them is an instance of RelCompositeTrait.

*Why does it fail?*

In our case, the trait set checked by traitSet.allSimple() has two traits. One 
is BeamLogicalConvention (not a composite trait); the other is a 
collation-related composite trait, which causes the assertion to fail.

*Where does the composite trait come from?*

We specify the collation trait def in 
[BeamQueryPlanner|https://github.com/apache/beam/blob/14b17ad574342a875c8f99278e18c605aa5b4bc3/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L89]
 before parsing. The collation trait then [gets replaced in 
LogicalTableScan|https://github.com/apache/calcite/blob/914b5cfbf978e796afeaff7b780e268ed39d8ec5/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L102]
 with a [composite 
trait|https://github.com/apache/calcite/blob/0938c7b6d767e3242874d87a30d9112512d9243a/core/src/main/java/org/apache/calcite/plan/RelTraitSet.java#L239],
 which causes the failure.
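That replacement can be modeled with a toy, self-contained example. The class names here (CompositeTraitDemo, SimpleTrait, CompositeTrait) are illustrative stand-ins, not Calcite's real classes: the scan starts out with simple traits, the collation slot gets rewritten into a composite, and an allSimple-style check then rejects the trait set.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative stand-ins for Calcite's trait classes, for demonstration only.
public class CompositeTraitDemo {
    interface Trait {}

    static class SimpleTrait implements Trait {
        final String name;
        SimpleTrait(String name) { this.name = name; }
    }

    // Role of RelCompositeTrait: several collations bundled into one trait.
    static class CompositeTrait implements Trait {
        final List<Trait> parts;
        CompositeTrait(List<Trait> parts) { this.parts = parts; }
    }

    // Mirrors RelTraitSet.allSimple(): true iff no trait is a composite.
    static boolean allSimple(List<Trait> traitSet) {
        for (Trait t : traitSet) {
            if (t instanceof CompositeTrait) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Trait set before LogicalTableScan rewrites it: convention plus a
        // single, simple collation trait.
        List<Trait> before = Arrays.asList(
            new SimpleTrait("BEAM_LOGICAL"),
            new SimpleTrait("collation[0]"));

        // After the rewrite: the collation slot now holds a composite of the
        // statistic-derived collations, so the planner's assertion fires.
        List<Trait> after = Arrays.asList(
            new SimpleTrait("BEAM_LOGICAL"),
            new CompositeTrait(Arrays.asList(
                new SimpleTrait("collation[0]"),
                new SimpleTrait("collation[1]"))));

        System.out.println("before rewrite: allSimple = " + allSimple(before)); // true
        System.out.println("after rewrite:  allSimple = " + allSimple(after));  // false
    }
}
```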

*Why does LogicalTableScan need to do the collation magic?*

Dunno; it seems to add the table's statistics information to the collation trait 
so that the engine can handle sorting correctly. It does so only when we ask it 
to by registering the collation trait def.

*Why doesn't VolcanoPlanner like the RelCompositeTrait in that part?*

Dunno.

*Do we need the collation trait def?*

Dunno.

*What do we do?*

If we can, it probably makes sense to replace the LogicalTableScan rel with some 
kind of BeamIllogicalPCollectionScan that either skips the collation magic 
entirely or makes it configurable.


> Fix Calcite assertions
> --
>
> Key: BEAM-3417
> URL: https://issues.apache.org/jira/browse/BEAM-3417
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Anton Kedin
>Priority: Major
>
> Currently we disable assertions in tests for every project that depends on 
> Beam SQL / Calcite; otherwise Calcite fails assertions when it validates the 
> relational representation of the query. E.g. in projects that depend on Beam 
> SQL / Calcite we have to specify 
> {code:java|title=build.gradle}
> test {
>  jvmArgs "-da" 
> }
> {code}
> We need to either update our relational conversion logic 
