[jira] [Created] (BEAM-9871) Continuous performance tests for BigQuery read are not displayed on the dashboard
Kirill Kozlov created BEAM-9871: --- Summary: Continuous performance tests for BigQuery read are not displayed on the dashboard Key: BEAM-9871 URL: https://issues.apache.org/jira/browse/BEAM-9871 Project: Beam Issue Type: Bug Components: testing Affects Versions: 2.20.0 Reporter: Kirill Kozlov Dashboard: [https://apache-beam-testing.appspot.com/explore?dashboard=5687798237495296] BigQuery reads in EXPORT and DIRECT_READ modes are not being displayed on the dashboard. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7609) SqlTransform#getSchema for "SELECT DISTINCT + JOIN" has invalid field names
[ https://issues.apache.org/jira/browse/BEAM-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066946#comment-17066946 ] Kirill Kozlov commented on BEAM-7609: - Not sure if this issue is still reproducible. Not actively working on this, will move to Unassigned. > SqlTransform#getSchema for "SELECT DISTINCT + JOIN" has invalid field names > --- > > Key: BEAM-7609 > URL: https://issues.apache.org/jira/browse/BEAM-7609 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.13.0 >Reporter: Gleb Kanterov >Priority: Major > > Works in sqlline shell: > {code} > Welcome to Beam SQL 2.14.0-SNAPSHOT (based on sqlline version 1.4.0) > 0: BeamSQL> CREATE EXTERNAL TABLE s1 (id BIGINT) TYPE 'test'; > No rows affected (0.507 seconds) > 0: BeamSQL> CREATE EXTERNAL TABLE s2 (id BIGINT) TYPE 'test'; > No rows affected (0.004 seconds) > 0: BeamSQL> SELECT DISTINCT s1.id as lhs, s2.id as rhs FROM s1 JOIN s2 USING > (id); > +-+-+ > | lhs | rhs | > +-+-+ > +-+-+ > No rows selected (2.568 seconds) > {code} > But doesn't work in the test: > {code} > Schema inputSchema = Schema.of( > Schema.Field.of("id", Schema.FieldType.INT32)); > PCollection i1 = p.apply(Create.of(ImmutableList.of()) > .withCoder(SchemaCoder.of(inputSchema))); > PCollection i2 = p.apply(Create.of(ImmutableList.of()) > .withCoder(SchemaCoder.of(inputSchema))); > Schema outputSchema = PCollectionTuple > .of("i1", i1) > .and("i2", i2) > .apply(SqlTransform.query("SELECT DISTINCT s1.id as lhs, s2.id as rhs > FROM i1 JOIN i2 USING (id)")) > .getSchema(); > assertEquals(ImmutableList.of("lhs", "rhs"), > outputSchema.getFieldNames()); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-7610) SELECT COALESCE(...) isn't inferred as non-nullable field
[ https://issues.apache.org/jira/browse/BEAM-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066942#comment-17066942 ] Kirill Kozlov edited comment on BEAM-7610 at 3/25/20, 6:10 PM: --- The underlying issue was fixed in Calcite. Updating vendored Calcite to version 1.22.0 or later should fix this issue. Not actively working on this, will move to Unassigned. was (Author: kirillkozlov): The underlying issue was fixed in Calcite. Updating vendored Calcite to version 1.22.0 or later should fix this issue. > SELECT COALESCE(...) isn't inferred as non-nullable field > - > > Key: BEAM-7610 > URL: https://issues.apache.org/jira/browse/BEAM-7610 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.13.0 >Reporter: Gleb Kanterov >Priority: Major > > In Calcite, Coalesce is described as: > {code} > ReturnTypes.cascade(ReturnTypes.LEAST_RESTRICTIVE, > SqlTypeTransforms.LEAST_NULLABLE) > {code} > However, giving non-null constant as an argument doesn't result in a > non-nullable expression: > {code} > Schema inputSchema = Schema.of( > Schema.Field.of("name", Schema.FieldType.STRING.withNullable(true))); > PCollection input = p.apply(Create.of(ImmutableList.of()) > .withCoder(SchemaCoder.of(inputSchema))); > Schema outputSchema = input > .apply(SqlTransform.query("SELECT COALESCE(name, 'unknown') as name > FROM PCOLLECTION")) > .getSchema(); > assertEquals( > Schema.builder().addStringField("name").build(), > outputSchema); > {code} > Not sure if it's a problem in Calcite or Beam SQL. > There are no other functions that can be used to produce a non-nullable field. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-7610) SELECT COALESCE(...) isn't inferred as non-nullable field
[ https://issues.apache.org/jira/browse/BEAM-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov reassigned BEAM-7610: --- Assignee: (was: Kirill Kozlov) > SELECT COALESCE(...) isn't inferred as non-nullable field > - > > Key: BEAM-7610 > URL: https://issues.apache.org/jira/browse/BEAM-7610 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.13.0 >Reporter: Gleb Kanterov >Priority: Major > > In Calcite, Coalesce is described as: > {code} > ReturnTypes.cascade(ReturnTypes.LEAST_RESTRICTIVE, > SqlTypeTransforms.LEAST_NULLABLE) > {code} > However, giving non-null constant as an argument doesn't result in a > non-nullable expression: > {code} > Schema inputSchema = Schema.of( > Schema.Field.of("name", Schema.FieldType.STRING.withNullable(true))); > PCollection input = p.apply(Create.of(ImmutableList.of()) > .withCoder(SchemaCoder.of(inputSchema))); > Schema outputSchema = input > .apply(SqlTransform.query("SELECT COALESCE(name, 'unknown') as name > FROM PCOLLECTION")) > .getSchema(); > assertEquals( > Schema.builder().addStringField("name").build(), > outputSchema); > {code} > Not sure if it's a problem in Calcite or Beam SQL. > There are no other functions that can be used to produce a non-nullable field. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-7609) SqlTransform#getSchema for "SELECT DISTINCT + JOIN" has invalid field names
[ https://issues.apache.org/jira/browse/BEAM-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov reassigned BEAM-7609: --- Assignee: (was: Kirill Kozlov) > SqlTransform#getSchema for "SELECT DISTINCT + JOIN" has invalid field names > --- > > Key: BEAM-7609 > URL: https://issues.apache.org/jira/browse/BEAM-7609 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.13.0 >Reporter: Gleb Kanterov >Priority: Major > > Works in sqlline shell: > {code} > Welcome to Beam SQL 2.14.0-SNAPSHOT (based on sqlline version 1.4.0) > 0: BeamSQL> CREATE EXTERNAL TABLE s1 (id BIGINT) TYPE 'test'; > No rows affected (0.507 seconds) > 0: BeamSQL> CREATE EXTERNAL TABLE s2 (id BIGINT) TYPE 'test'; > No rows affected (0.004 seconds) > 0: BeamSQL> SELECT DISTINCT s1.id as lhs, s2.id as rhs FROM s1 JOIN s2 USING > (id); > +-+-+ > | lhs | rhs | > +-+-+ > +-+-+ > No rows selected (2.568 seconds) > {code} > But doesn't work in the test: > {code} > Schema inputSchema = Schema.of( > Schema.Field.of("id", Schema.FieldType.INT32)); > PCollection i1 = p.apply(Create.of(ImmutableList.of()) > .withCoder(SchemaCoder.of(inputSchema))); > PCollection i2 = p.apply(Create.of(ImmutableList.of()) > .withCoder(SchemaCoder.of(inputSchema))); > Schema outputSchema = PCollectionTuple > .of("i1", i1) > .and("i2", i2) > .apply(SqlTransform.query("SELECT DISTINCT s1.id as lhs, s2.id as rhs > FROM i1 JOIN i2 USING (id)")) > .getSchema(); > assertEquals(ImmutableList.of("lhs", "rhs"), > outputSchema.getFieldNames()); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7610) SELECT COALESCE(...) isn't inferred as non-nullable field
[ https://issues.apache.org/jira/browse/BEAM-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066942#comment-17066942 ] Kirill Kozlov commented on BEAM-7610: - The underlying issue was fixed in Calcite. Updating vendored Calcite to version 1.22.0 or later should fix this issue. > SELECT COALESCE(...) isn't inferred as non-nullable field > - > > Key: BEAM-7610 > URL: https://issues.apache.org/jira/browse/BEAM-7610 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.13.0 >Reporter: Gleb Kanterov >Assignee: Kirill Kozlov >Priority: Major > > In Calcite, Coalesce is described as: > {code} > ReturnTypes.cascade(ReturnTypes.LEAST_RESTRICTIVE, > SqlTypeTransforms.LEAST_NULLABLE) > {code} > However, giving non-null constant as an argument doesn't result in a > non-nullable expression: > {code} > Schema inputSchema = Schema.of( > Schema.Field.of("name", Schema.FieldType.STRING.withNullable(true))); > PCollection input = p.apply(Create.of(ImmutableList.of()) > .withCoder(SchemaCoder.of(inputSchema))); > Schema outputSchema = input > .apply(SqlTransform.query("SELECT COALESCE(name, 'unknown') as name > FROM PCOLLECTION")) > .getSchema(); > assertEquals( > Schema.builder().addStringField("name").build(), > outputSchema); > {code} > Not sure if it's a problem in Calcite or Beam SQL. > There are no other functions that can be used to produce a non-nullable field. -- This message was sent by Atlassian Jira (v8.3.4#803005)
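[Editorial note] The Calcite `LEAST_NULLABLE` semantics quoted in the issue above — the derived type is nullable only if every operand is nullable, so COALESCE(name, 'unknown') should come out non-nullable — can be sketched in isolation. The class and method names below are illustrative, not Calcite's actual implementation:

```java
// Illustrative sketch of SqlTypeTransforms.LEAST_NULLABLE semantics:
// the derived type is nullable if and only if ALL operands are nullable.
// Names here are hypothetical, for illustration only.
public class LeastNullableSketch {

    // Returns the nullability of the result given each operand's nullability.
    static boolean leastNullable(boolean... operandsNullable) {
        for (boolean nullable : operandsNullable) {
            if (!nullable) {
                return false; // one non-nullable operand makes the result non-nullable
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // COALESCE(name, 'unknown'): nullable column + non-nullable literal
        System.out.println(leastNullable(true, false)); // false (non-nullable)
        // COALESCE(a, b) with both columns nullable
        System.out.println(leastNullable(true, true));  // true (nullable)
    }
}
```

Under these semantics the test in the issue should pass once the vendored Calcite carries the fix.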
[jira] [Closed] (BEAM-9358) BigQueryIO potential write speed regression
[ https://issues.apache.org/jira/browse/BEAM-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov closed BEAM-9358. --- Fix Version/s: 2.19.0 Resolution: Resolved > BigQueryIO potential write speed regression > --- > > Key: BEAM-9358 > URL: https://issues.apache.org/jira/browse/BEAM-9358 > Project: Beam > Issue Type: Task > Components: io-py-gcp >Affects Versions: 2.19.0 >Reporter: Kirill Kozlov >Priority: Minor > Fix For: 2.19.0 > > > There is a drastic decrease in Megabytes/second write speeds (from ~50 to ~5) > [1], as well as 10x increase in runtime [2] for python BigQueryIO in the > PerfKit dashboard. > Seems to be fairly recent, started on the Feb 20th and continued on the Feb > 21st. Maybe a flake, but still worth investigating. > [1] > [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=1939451611&container=847031938] > [2] > [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=2088160722&container=15365888] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9358) BigQueryIO potential write speed regression
[ https://issues.apache.org/jira/browse/BEAM-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043694#comment-17043694 ] Kirill Kozlov commented on BEAM-9358: - Looks like everything went back to normal on the Feb 23rd, closing this issue. > BigQueryIO potential write speed regression > --- > > Key: BEAM-9358 > URL: https://issues.apache.org/jira/browse/BEAM-9358 > Project: Beam > Issue Type: Task > Components: io-py-gcp >Affects Versions: 2.19.0 >Reporter: Kirill Kozlov >Priority: Minor > > There is a drastic decrease in Megabytes/second write speeds (from ~50 to ~5) > [1], as well as 10x increase in runtime [2] for python BigQueryIO in the > PerfKit dashboard. > Seems to be fairly recent, started on the Feb 20th and continued on the Feb > 21st. Maybe a flake, but still worth investigating. > [1] > [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=1939451611&container=847031938] > [2] > [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=2088160722&container=15365888] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9358) BigQueryIO potential write speed regression
[ https://issues.apache.org/jira/browse/BEAM-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-9358: Description: There is a drastic decrease in Megabytes/second write speeds (from ~50 to ~5) [1], as well as 10x increase in runtime [2] for python BigQueryIO in the PerfKit dashboard. Seems to be fairly recent, started on the Feb 20th and continued on the Feb 21st. Maybe a flake, but still worth investigating. [1] [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=1939451611&container=847031938] [2] [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=2088160722&container=15365888] was: There is a drastic decrease in Megabytes/second write speeds (from ~50 to ~5), as well as 10x increase in runtime [2] for python BigQueryIO in the PerfKit dashboard [1]. Seems to be fairly recent, started on the Feb 20th and continued on the Feb 21st. Maybe a flake, but still worth investigating. [1] [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=1939451611&container=847031938] [2] [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=2088160722&container=15365888] > BigQueryIO potential write speed regression > --- > > Key: BEAM-9358 > URL: https://issues.apache.org/jira/browse/BEAM-9358 > Project: Beam > Issue Type: Task > Components: io-py-gcp >Affects Versions: 2.19.0 >Reporter: Kirill Kozlov >Priority: Minor > > There is a drastic decrease in Megabytes/second write speeds (from ~50 to ~5) > [1], as well as 10x increase in runtime [2] for python BigQueryIO in the > PerfKit dashboard. > Seems to be fairly recent, started on the Feb 20th and continued on the Feb > 21st. Maybe a flake, but still worth investigating. 
> [1] > [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=1939451611&container=847031938] > [2] > [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=2088160722&container=15365888] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9358) BigQueryIO potential write speed regression
Kirill Kozlov created BEAM-9358: --- Summary: BigQueryIO potential write speed regression Key: BEAM-9358 URL: https://issues.apache.org/jira/browse/BEAM-9358 Project: Beam Issue Type: Task Components: io-py-gcp Affects Versions: 2.19.0 Reporter: Kirill Kozlov There is a drastic decrease in Megabytes/second write speeds (from ~50 to ~5), as well as a 10x increase in runtime [2] for Python BigQueryIO in the PerfKit dashboard [1]. Seems to be fairly recent: started on Feb 20th and continued on Feb 21st. Maybe a flake, but still worth investigating. [1] [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=1939451611&container=847031938] [2] [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=2088160722&container=15365888] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-4457) Analyze FieldAccessDescriptors and drop fields that are never accessed
[ https://issues.apache.org/jira/browse/BEAM-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030165#comment-17030165 ] Kirill Kozlov commented on BEAM-4457: - Definitely a cool idea! Yes, right now push-down heavily relies on the SQL optimizer rules, but this can help simplify them. > Analyze FieldAccessDescriptors and drop fields that are never accessed > -- > > Key: BEAM-4457 > URL: https://issues.apache.org/jira/browse/BEAM-4457 > Project: Beam > Issue Type: Sub-task > Components: io-java-gcp >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > > We can walk backwards through the graph, analyzing which fields are accessed. > When we find paths where many fields are never accessed, we can insert a > projection transform to drop those fields preemptively. This can save a lot > of resources in the case where many fields in the input are never accessed. > To do this, the FieldAccessDescriptor information must be added to the > portability protos. -- This message was sent by Atlassian Jira (v8.3.4#803005)
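[Editorial note] The push-down idea in the issue above — walk the graph backwards, collect which fields are accessed, and preemptively drop the rest — can be sketched with plain collections. This is a conceptual illustration, not Beam's FieldAccessDescriptor API:

```java
// Conceptual sketch of projection push-down: keep only the schema fields
// that downstream transforms actually access. Not Beam's real API.
public class ProjectionPushdownSketch {

    // Returns the projected schema: fields of schemaFields that appear in accessedFields.
    static String[] project(String[] schemaFields, String[] accessedFields) {
        java.util.List<String> kept = new java.util.ArrayList<>();
        for (String field : schemaFields) {
            for (String accessed : accessedFields) {
                if (field.equals(accessed)) {
                    kept.add(field);
                    break;
                }
            }
        }
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] schema = {"id", "name", "payload"};
        String[] accessed = {"id"};
        // Only "id" survives; "name" and "payload" are dropped preemptively,
        // saving the cost of reading and shuffling fields nobody uses.
        System.out.println(java.util.Arrays.toString(project(schema, accessed))); // [id]
    }
}
```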
[jira] [Resolved] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-8042. - Fix Version/s: 2.20.0 Resolution: Fixed > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Fix For: 2.20.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > {code} > @Rule > public TestPipeline pipeline = > TestPipeline.fromOptions(createPipelineOptions()); > private static PipelineOptions createPipelineOptions() { > BeamSqlPipelineOptions opts = > PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class); > opts.setPlannerName(ZetaSQLQueryPlanner.class.getName()); > return opts; > } > @Test > public void testAggregate() { > Schema inputSchema = Schema.builder() > .addByteArrayField("id") > .addInt64Field("has_f1") > .addInt64Field("has_f2") > .addInt64Field("has_f3") > .addInt64Field("has_f4") > .addInt64Field("has_f5") > .addInt64Field("has_f6") > .build(); > String sql = "SELECT \n" + > " id, \n" + > " COUNT(*) as count, \n" + > " SUM(has_f1) as f1_count, \n" + > " SUM(has_f2) as f2_count, \n" + > " SUM(has_f3) as f3_count, \n" + > " SUM(has_f4) as f4_count, \n" + > " SUM(has_f5) as f5_count, \n" + > " SUM(has_f6) as f6_count \n" + > "FROM PCOLLECTION \n" + > "GROUP BY id"; > pipeline > .apply(Create.empty(inputSchema)) > .apply(SqlTransform.query(sql)); > pipeline.run(); > } > {code} > {code} > Caused by: java.lang.RuntimeException: Error while applying rule > AggregateProjectMergeRule, args > [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), > > rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)] > at > 
org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637) > at > org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104) > at > ... 39 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205) > ... 48 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9027) [SQL] ZetaSQL unparsing should produce valid result
[ https://issues.apache.org/jira/browse/BEAM-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023346#comment-17023346 ] Kirill Kozlov commented on BEAM-9027: - Thanks! Yes, feel free to assign to yourself. > [SQL] ZetaSQL unparsing should produce valid result > --- > > Key: BEAM-9027 > URL: https://issues.apache.org/jira/browse/BEAM-9027 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > * ZetaSQL does not recognize keyword INTERVAL > * Calcite cannot unparse RexNode back to bytes literal > * Calcite cannot unparse some floating point literals correctly > * Calcite cannot unparse some string literals correctly > * Calcite cannot unparse types correctly for CAST function -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020681#comment-17020681 ] Kirill Kozlov edited comment on BEAM-9169 at 1/24/20 10:00 PM: --- [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to ensure that strings with special characters (like: {noformat} SELECT 'abc\\n'{noformat} ) get escaped. It looks like `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. was (Author: kirillkozlov): [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to insure that strings with special characters (like: {noformat} SELECT 'abc\\n'{noformat} ) get escaped. It looks like `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > Fix For: 2.20.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). 
> This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-9169. - Fix Version/s: 2.20.0 Resolution: Fixed > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > Fix For: 2.20.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8890) Update google-cloud-java version
[ https://issues.apache.org/jira/browse/BEAM-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8890: Description: google-cloud-java under v0.117.0 has a limit on gRPC message size (~11 MB), which is a problem when reading in DIRECT_READ (Read API) mode from BigQuery tables with large rows. Updating to v0.117.0 or above should fix the problem. [https://github.com/googleapis/google-cloud-java/blob/v0.117.0/google-cloud-clients/google-cloud-bigquerystorage/pom.xml] was: google-cloud-java under v0.117.0 has a limit on gRPC message size (~11 MB). Which is a problem when reading in DIRECT_READ (Read API) from BigQuery tables with large rows. Updating to v0.117.0 or above should fix the problem. > Update google-cloud-java version > > > Key: BEAM-8890 > URL: https://issues.apache.org/jira/browse/BEAM-8890 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Kirill Kozlov >Priority: Major > > google-cloud-java under v0.117.0 has a limit on gRPC message size (~11 MB), > which is a problem when reading in DIRECT_READ (Read API) mode from BigQuery > tables with large rows. > Updating to v0.117.0 or above should fix the problem. > [https://github.com/googleapis/google-cloud-java/blob/v0.117.0/google-cloud-clients/google-cloud-bigquerystorage/pom.xml] -- This message was sent by Atlassian Jira (v8.3.4#803005)
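[Editorial note] Pinning the BigQuery storage client at or above the fixed release might look like the following in a Maven build. The exact artifact version that corresponds to the google-cloud-java v0.117.0 release train is an assumption here; verify the coordinates against the linked pom.xml before use:

```xml
<!-- Illustrative only: confirm the artifact version matching the
     google-cloud-java v0.117.0 release train before depending on it. -->
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-bigquerystorage</artifactId>
  <version>0.117.0-beta</version>
</dependency>
```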
[jira] [Resolved] (BEAM-9072) [SQL] Add support for Datastore source
[ https://issues.apache.org/jira/browse/BEAM-9072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-9072. - Fix Version/s: 2.20.0 Resolution: Fixed > [SQL] Add support for Datastore source > -- > > Key: BEAM-9072 > URL: https://issues.apache.org/jira/browse/BEAM-9072 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Fix For: 2.20.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > * Create a Datastore table and table provider > * Conversion between Datastore and Beam data types > * Implement buildIOReader > * Implement buildIOWrite > * Implement getTableStatistics > Doc: > [https://docs.google.com/document/d/1FxuEGewJ3GPDl0IKglfOYf1edwa2m_wryFZYRMpRNbA/edit?pli=1] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9180) [ZetaSQL] Support 4-byte unicode in literal string unparsing
Kirill Kozlov created BEAM-9180: --- Summary: [ZetaSQL] Support 4-byte unicode in literal string unparsing Key: BEAM-9180 URL: https://issues.apache.org/jira/browse/BEAM-9180 Project: Beam Issue Type: Improvement Components: dsl-sql-zetasql Reporter: Kirill Kozlov When unparsing literal strings we need to escape special symbols (ex: `\n`, `\r`, `\u0012`). ZetaSQL supports some 4-byte (8 hex digit) unicode escapes via `\U`. As of [now|https://github.com/apache/beam/blob/8a35f408f640d04c38ad6e2a497d30410b3bff32/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L59] only 2-byte (4 hex digit) unicode is supported, escaped via `\u`. More about escape sequences here (need to scroll down a little): https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical -- This message was sent by Atlassian Jira (v8.3.4#803005)
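[Editorial note] The escaping the issue above asks for can be sketched as follows: code points beyond the BMP get ZetaSQL's 8-hex-digit `\U` form, control characters the 4-hex-digit `\u` form. A hedged sketch under those assumptions, not Beam's BeamSqlUnparseContext implementation:

```java
// Sketch of unicode escaping for literal strings: supplementary code points
// (above U+FFFF) use ZetaSQL's \U 8-hex-digit form, control characters use
// the \u 4-hex-digit form. Illustrative only, not Beam's implementation.
public class UnicodeEscapeSketch {

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp > 0xFFFF) {
                sb.append(String.format("\\U%08X", cp)); // 4-byte code point
            } else if (cp < 0x20) {
                sb.append(String.format("\\u%04X", cp)); // control character
            } else {
                sb.appendCodePoint(cp); // printable BMP character, leave as-is
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a\nb"));         // a\u000Ab
        System.out.println(escape("\uD83D\uDE00")); // \U0001F600
    }
}
```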
[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021459#comment-17021459 ] Kirill Kozlov commented on BEAM-9169: - Created a PR to use a custom escape method. The issue involving `/` should be fixed now, but I noticed that we might have some issues escaping single quotes. {code:java} SELECT 'it\'s' {code} A correctly escaped string with single quotes gets corrupted by SqlCharStringLiteral. > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
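[Editorial note] The "custom escape method" the comment above mentions might, in spirit, escape only what ZetaSQL requires (backslash, quote, a few control characters) and pass `/` through untouched; it would also need to handle the single-quote case the comment flags. A hypothetical sketch, not the actual PR's code:

```java
// Hypothetical escape method that only escapes what ZetaSQL expects and
// leaves '/' alone, avoiding the "Illegal escape sequence: \/" error.
// Not the code from the actual PR.
public class ZetaSqlEscapeSketch {

    static String escapeForZetaSql(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c); // '/' and other printable chars pass through
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeForZetaSql("America/Los_Angeles")); // unchanged
        System.out.println(escapeForZetaSql("it's"));                // it\'s
    }
}
```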
[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020690#comment-17020690 ] Kirill Kozlov commented on BEAM-9169: - Yes, we might be able to fix this by specifying what needs escaping and what does not (or using a different escaping method). Not sure how to achieve that, but I will look into it. > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020681#comment-17020681 ] Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:30 AM: --- [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to insure that strings with special characters (like: ' {noformat} SELECT 'abc\\n'{noformat} ') get escaped. It looks like `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. was (Author: kirillkozlov): [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to insure that strings with special characters (like: 'abc\\n') get escaped. It looks like `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020681#comment-17020681 ] Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:30 AM: --- [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to ensure that strings with special characters (like: {noformat} SELECT 'abc\\n'{noformat} ) get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. was (Author: kirillkozlov): [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to ensure that strings with special characters (like: ' {noformat} SELECT 'abc\\n'{noformat} ') get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Kirill Kozlov >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). 
> This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020681#comment-17020681 ] Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:28 AM: --- [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to ensure that strings with special characters (like: 'abc\\n') get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. was (Author: kirillkozlov): [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020681#comment-17020681 ] Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:28 AM: --- [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to ensure that strings with special characters (like: 'abc\\n') get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. was (Author: kirillkozlov): [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] Edit: This was added to ensure that strings with special characters (like: 'abc\\n') get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes characters ZetaSQL does not expect to be escaped. > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing
[ https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020681#comment-17020681 ] Kirill Kozlov commented on BEAM-9169: - [~robinyqiu] This might be the cause: [https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52] > Extra character introduced during Calcite unparsing > --- > > Key: BEAM-9169 > URL: https://issues.apache.org/jira/browse/BEAM-9169 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Minor > > When I am testing query string > {code:java} > "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", > \"America/Los_Angeles\")"{code} > on BeamZetaSqlCalcRel I found that the second string parameter to the > function is unparsed to > {code:java} > America\/Los_Angeles{code} > (note an extra backslash character is added). > This breaks the ZetaSQL evaluator with error > {code:java} > Syntax error: Illegal escape sequence: \/{code} > From what I can see now this character is introduced during the Calcite > unparsing step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
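The over-escaping failure above can be reproduced without any Beam or Calcite dependencies. The sketch below uses a hypothetical escaper (`escapeAggressively` is a stand-in, not Beam's actual `BeamSqlUnparseContext` code): the newline case shows why escaping is needed at all, and the `/` case shows how one extra backslash yields exactly the `\/` sequence ZetaSQL rejects.

```java
// Minimal sketch of the trade-off: escape enough (newline must survive unparsing)
// but not too much (ZetaSQL has no \/ escape sequence).
// escapeAggressively is hypothetical and only illustrates the failure mode.
public class OverEscapeSketch {
    static String escapeAggressively(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\n': sb.append("\\n");  break; // desired: 'abc\n' stays a valid literal
                case '\\': sb.append("\\\\"); break; // desired: backslashes are doubled
                case '/':  sb.append("\\/");  break; // problem: target dialect rejects \/
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAggressively("abc\n"));               // abc\n (correct)
        System.out.println(escapeAggressively("America/Los_Angeles")); // America\/Los_Angeles (rejected by ZetaSQL)
    }
}
```

The fix direction the comments point at is making the escaper emit only sequences the consuming dialect actually parses, rather than the full Java escape set.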
[jira] [Work started] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-8042 started by Kirill Kozlov. --- > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 1h 20m > Remaining Estimate: 0h > > {code} > @Rule > public TestPipeline pipeline = > TestPipeline.fromOptions(createPipelineOptions()); > private static PipelineOptions createPipelineOptions() { > BeamSqlPipelineOptions opts = > PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class); > opts.setPlannerName(ZetaSQLQueryPlanner.class.getName()); > return opts; > } > @Test > public void testAggregate() { > Schema inputSchema = Schema.builder() > .addByteArrayField("id") > .addInt64Field("has_f1") > .addInt64Field("has_f2") > .addInt64Field("has_f3") > .addInt64Field("has_f4") > .addInt64Field("has_f5") > .addInt64Field("has_f6") > .build(); > String sql = "SELECT \n" + > " id, \n" + > " COUNT(*) as count, \n" + > " SUM(has_f1) as f1_count, \n" + > " SUM(has_f2) as f2_count, \n" + > " SUM(has_f3) as f3_count, \n" + > " SUM(has_f4) as f4_count, \n" + > " SUM(has_f5) as f5_count, \n" + > " SUM(has_f6) as f6_count \n" + > "FROM PCOLLECTION \n" + > "GROUP BY id"; > pipeline > .apply(Create.empty(inputSchema)) > .apply(SqlTransform.query(sql)); > pipeline.run(); > } > {code} > {code} > Caused by: java.lang.RuntimeException: Error while applying rule > AggregateProjectMergeRule, args > [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), > > rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)] > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232) > at > 
org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637) > at > org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104) > at > ... 39 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205) > ... 48 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov reassigned BEAM-8042: --- Assignee: Kirill Kozlov > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Kirill Kozlov >Priority: Critical > Time Spent: 0.5h > Remaining Estimate: 0h > > {code} > @Rule > public TestPipeline pipeline = > TestPipeline.fromOptions(createPipelineOptions()); > private static PipelineOptions createPipelineOptions() { > BeamSqlPipelineOptions opts = > PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class); > opts.setPlannerName(ZetaSQLQueryPlanner.class.getName()); > return opts; > } > @Test > public void testAggregate() { > Schema inputSchema = Schema.builder() > .addByteArrayField("id") > .addInt64Field("has_f1") > .addInt64Field("has_f2") > .addInt64Field("has_f3") > .addInt64Field("has_f4") > .addInt64Field("has_f5") > .addInt64Field("has_f6") > .build(); > String sql = "SELECT \n" + > " id, \n" + > " COUNT(*) as count, \n" + > " SUM(has_f1) as f1_count, \n" + > " SUM(has_f2) as f2_count, \n" + > " SUM(has_f3) as f3_count, \n" + > " SUM(has_f4) as f4_count, \n" + > " SUM(has_f5) as f5_count, \n" + > " SUM(has_f6) as f6_count \n" + > "FROM PCOLLECTION \n" + > "GROUP BY id"; > pipeline > .apply(Create.empty(inputSchema)) > .apply(SqlTransform.query(sql)); > pipeline.run(); > } > {code} > {code} > Caused by: java.lang.RuntimeException: Error while applying rule > AggregateProjectMergeRule, args > [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), > > rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)] > at > 
org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637) > at > org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104) > at > ... 39 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205) > ... 48 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
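The stack trace already localizes the bug: the LogicalAggregate references input field $7, while the LogicalProject beneath it only emits seven fields ($0..$6), so AggregateProjectMergeRule.apply indexes past the end of the project's field list. A dependency-free sketch of that mismatch (field names taken from the test schema):

```java
import java.util.List;

public class AggregateProjectMismatch {
    // The project emits fields $0..$6 (key, f1..f6) — seven entries in total.
    static final List<String> PROJECT_FIELDS =
            List.of("key", "f1", "f2", "f3", "f4", "f5", "f6");

    // The merged aggregate references input field $7, which the project never
    // produced; resolving it is an out-of-bounds list access.
    static String resolveFieldRef(int index) {
        return PROJECT_FIELDS.get(index);
    }

    public static void main(String[] args) {
        System.out.println(resolveFieldRef(6)); // f6 — a valid reference
        try {
            resolveFieldRef(7); // mirrors the ArrayIndexOutOfBoundsException: 7 above
        } catch (IndexOutOfBoundsException e) {
            System.out.println("field $7 does not exist below the aggregate");
        }
    }
}
```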
[jira] [Updated] (BEAM-9164) [PreCommit_Java] [Flake] org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest
[ https://issues.apache.org/jira/browse/BEAM-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-9164: Summary: [PreCommit_Java] [Flake] org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest (was: Flaky test: org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest) > [PreCommit_Java] [Flake] > org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest > --- > > Key: BEAM-9164 > URL: https://issues.apache.org/jira/browse/BEAM-9164 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Kirill Kozlov >Priority: Major > Labels: flake > > Test: > org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest > >> testWatermarkEmission[numTasks = 1; numSplits=1] > Fails with the following exception: > {code:java} > org.junit.runners.model.TestTimedOutException: test timed out after 3 > milliseconds{code} > Affected Jenkins job: > [https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1665/] > Gradle build scan: [https://scans.gradle.com/s/nvgeb425fe63q] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9164) Flaky test: org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest
Kirill Kozlov created BEAM-9164: --- Summary: Flaky test: org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest Key: BEAM-9164 URL: https://issues.apache.org/jira/browse/BEAM-9164 Project: Beam Issue Type: Improvement Components: runner-flink Reporter: Kirill Kozlov Test: org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest >> testWatermarkEmission[numTasks = 1; numSplits=1] Fails with the following exception: {code:java} org.junit.runners.model.TestTimedOutException: test timed out after 3 milliseconds{code} Affected Jenkins job: [https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1665/] Gradle build scan: [https://scans.gradle.com/s/nvgeb425fe63q] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9072) [SQL] Add support for Datastore source
[ https://issues.apache.org/jira/browse/BEAM-9072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-9072: Description: * Create a Datastore table and table provider * Conversion between Datastore and Beam data types * Implement buildIOReader * Implement buildIOWrite * Implement getTableStatistics Doc: [https://docs.google.com/document/d/1FxuEGewJ3GPDl0IKglfOYf1edwa2m_wryFZYRMpRNbA/edit?pli=1] was: * Create a Datastore table and table provider * Conversion between Datastore and Beam data types * Implement buildIOReader * Implement buildIOWrite * Implement getTableStatistics > [SQL] Add support for Datastore source > -- > > Key: BEAM-9072 > URL: https://issues.apache.org/jira/browse/BEAM-9072 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > * Create a Datastore table and table provider > * Conversion between Datastore and Beam data types > * Implement buildIOReader > * Implement buildIOWrite > * Implement getTableStatistics > Doc: > [https://docs.google.com/document/d/1FxuEGewJ3GPDl0IKglfOYf1edwa2m_wryFZYRMpRNbA/edit?pli=1] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9027) [SQL] ZetaSQL unparsing should produce valid result
[ https://issues.apache.org/jira/browse/BEAM-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010124#comment-17010124 ] Kirill Kozlov edited comment on BEAM-9027 at 1/14/20 8:51 PM: -- Brief update on this issue. Following unparsing errors have been fixed: * Calcite cannot unparse RexNode back to bytes literal * Calcite cannot unparse some floating point literals correctly * Calcite cannot unparse some string literals correctly Other issues from the initial list (INTERVAL and CAST) still need to be fixed. was (Author: kirillkozlov): Brief update on this issue. Following unparsing error have been fixed: * Calcite cannot unparse RexNode back to bytes literal * Calcite cannot unparse some floating point literals correctly * Calcite cannot unparse some string literals correctly Other issues from the initial list (INTERVAL and CAST) still need to be fixed. > [SQL] ZetaSQL unparsing should produce valid result > --- > > Key: BEAM-9027 > URL: https://issues.apache.org/jira/browse/BEAM-9027 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > * ZetaSQL does not recognize keyword INTERVAL > * Calcite cannot unparse RexNode back to bytes literal > * Calcite cannot unparse some floating point literals correctly > * Calcite cannot unparse some string literals correctly > * Calcite cannot unparse types correctly for CAST function -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8844) [SQL] Create performance tests for BigQueryTable
[ https://issues.apache.org/jira/browse/BEAM-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-8844. - Fix Version/s: 2.19.0 Resolution: Fixed > [SQL] Create performance tests for BigQueryTable > > > Key: BEAM-8844 > URL: https://issues.apache.org/jira/browse/BEAM-8844 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Fix For: 2.19.0 > > Time Spent: 11.5h > Remaining Estimate: 0h > > They should measure read-time for: > * DIRECT_READ w/o push-down > * DIRECT_READ w/ push-down > * DEFAULT -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8844) [SQL] Create performance tests for BigQueryTable
[ https://issues.apache.org/jira/browse/BEAM-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014717#comment-17014717 ] Kirill Kozlov commented on BEAM-8844: - Results can be viewed here: [https://apache-beam-testing.appspot.com/explore?dashboard=5687798237495296] One problem I have noticed with the tests: they rely on public datasets, which are periodically updated with new data, making test results inconsistent. We should make a copy of the dataset and use it for performance tests. > [SQL] Create performance tests for BigQueryTable > > > Key: BEAM-8844 > URL: https://issues.apache.org/jira/browse/BEAM-8844 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 10h 50m > Remaining Estimate: 0h > > They should measure read-time for: > * DIRECT_READ w/o push-down > * DIRECT_READ w/ push-down > * DEFAULT -- This message was sent by Atlassian Jira (v8.3.4#803005)
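A fixed snapshot of the inputs can be taken with the BigQuery command-line tool. The project and dataset names below are placeholders, and the source table is just an example public table, not necessarily the one the tests read:

```shell
# One-time snapshot so benchmark inputs stop drifting with the public dataset.
# "my-project" and "beam_perf" are placeholder names.
bq mk --dataset my-project:beam_perf
bq cp bigquery-public-data:samples.shakespeare my-project:beam_perf.shakespeare
```

Pointing the performance jobs at the copied table would keep the measured read volume constant across runs.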
[jira] [Created] (BEAM-9114) BigQueryIO TableRowParser should support Arrow and Avro data formats
Kirill Kozlov created BEAM-9114: --- Summary: BigQueryIO TableRowParser should support Arrow and Avro data formats Key: BEAM-9114 URL: https://issues.apache.org/jira/browse/BEAM-9114 Project: Beam Issue Type: Improvement Components: io-java-gcp Reporter: Kirill Kozlov We should implement a function like BigQueryIO#readRows that encapsulates row conversion logic [1] for both Arrow and Avro data formats. BigQueryTable#buildIOReader from the SQL extension should call the function described above. [1] https://github.com/apache/beam/pull/10369#discussion_r365979515 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8993) [SQL] MongoDb should use predicate push-down
[ https://issues.apache.org/jira/browse/BEAM-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-8993. - Fix Version/s: 2.19.0 Resolution: Fixed > [SQL] MongoDb should use predicate push-down > > > Key: BEAM-8993 > URL: https://issues.apache.org/jira/browse/BEAM-8993 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Fix For: 2.19.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > * Add a MongoDbFilter class, implementing BeamSqlTableFilter. > ** Support simple comparison operations. > ** Support boolean field. > ** Support nested conjunction/disjunction. > * Update MongoDbTable#buildIOReader > ** Construct a push-down filter from RexNodes. > ** Set filter to FindQuery. -- This message was sent by Atlassian Jira (v8.3.4#803005)
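As an illustration of what the push-down produces (table and field names hypothetical, not taken from the Beam tests), a predicate such as `WHERE age > 21 AND active` rewritten from RexNodes into a single FindQuery filter could look roughly like this MongoDB filter document:

```
{ "$and": [ { "age": { "$gt": 21 } }, { "active": true } ] }
```

Evaluating the comparison and the boolean field inside MongoDB, instead of in a Beam Calc after the read, is what makes the push-down pay off.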
[jira] [Comment Edited] (BEAM-7865) Java precommit Samza tests opening RocksDB flaky
[ https://issues.apache.org/jira/browse/BEAM-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012208#comment-17012208 ] Kirill Kozlov edited comment on BEAM-7865 at 1/9/20 8:30 PM: - This PR [1] seems to be affected by this issue too. Jenkins tasks: [2, 3]. [1] [https://github.com/apache/beam/pull/10527] [2] [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PreCommit_Java_Commit/9530/] [3] [https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1629/] was (Author: kirillkozlov): This PR [1] seems to be affected. Jenkins tasks affected [2, 3]. [1] [https://github.com/apache/beam/pull/10527] [2] [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PreCommit_Java_Commit/9530/] [3] [https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1629/] > Java precommit Samza tests opening RocksDB flaky > > > Key: BEAM-7865 > URL: https://issues.apache.org/jira/browse/BEAM-7865 > Project: Beam > Issue Type: Bug > Components: runner-samza, test-failures >Reporter: Kyle Weaver >Priority: Major > > org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testProcessingTimeTimers > and > org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testRestore > fail with exception: > org.apache.samza.SamzaException: Error opening RocksDB store beamStore at > location /tmp/store2 > ... > Caused by: org.rocksdb.RocksDBException: Can't access /000359.sst: IO error: > while stat a file for size: /tmp/store2/000359.sst: No such file or directory > Can't access /000356.sst: IO error: while stat a file for size: > /tmp/store2/000356.sst: No such file or directory Can't access /000354.sst: > IO error: while stat a file for size: /tmp/store2/000354.sst: No such file or > directory > > Example failures: > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/6896/] > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/6906/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7865) Java precommit Samza tests opening RocksDB flaky
[ https://issues.apache.org/jira/browse/BEAM-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012208#comment-17012208 ] Kirill Kozlov commented on BEAM-7865: - This PR [1] seems to be affected. Jenkins tasks affected [2, 3]. [1] [https://github.com/apache/beam/pull/10527] [2] [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PreCommit_Java_Commit/9530/] [3] [https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1629/] > Java precommit Samza tests opening RocksDB flaky > > > Key: BEAM-7865 > URL: https://issues.apache.org/jira/browse/BEAM-7865 > Project: Beam > Issue Type: Bug > Components: runner-samza, test-failures >Reporter: Kyle Weaver >Priority: Major > > org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testProcessingTimeTimers > and > org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testRestore > fail with exception: > org.apache.samza.SamzaException: Error opening RocksDB store beamStore at > location /tmp/store2 > ... > Caused by: org.rocksdb.RocksDBException: Can't access /000359.sst: IO error: > while stat a file for size: /tmp/store2/000359.sst: No such file or directory > Can't access /000356.sst: IO error: while stat a file for size: > /tmp/store2/000356.sst: No such file or directory Can't access /000354.sst: > IO error: while stat a file for size: /tmp/store2/000354.sst: No such file or > directory > > Example failures: > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/6896/] > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/6906/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9072) [SQL] Add support for Datastore source
[ https://issues.apache.org/jira/browse/BEAM-9072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-9072: Summary: [SQL] Add support for Datastore source (was: [SQL] Add Datastore table read support) > [SQL] Add support for Datastore source > -- > > Key: BEAM-9072 > URL: https://issues.apache.org/jira/browse/BEAM-9072 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > > * Create a Datastore table and table provider > * Conversion between Datastore and Beam data types > * Implement buildIOReader > * Implement buildIOWrite > * Implement getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9072) [SQL] Add Datastore table read support
Kirill Kozlov created BEAM-9072: --- Summary: [SQL] Add Datastore table read support Key: BEAM-9072 URL: https://issues.apache.org/jira/browse/BEAM-9072 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Kirill Kozlov Assignee: Kirill Kozlov * Create a Datastore table and table provider * Conversion between Datastore and Beam data types * Implement buildIOReader * Implement buildIOWrite * Implement getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule
[ https://issues.apache.org/jira/browse/BEAM-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-8794. - Fix Version/s: 2.18.0 Resolution: Fixed > Projects should be handled by an IOPushDownRule before applying > AggregateProjectMergeRule > - > > Key: BEAM-8794 > URL: https://issues.apache.org/jira/browse/BEAM-8794 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Fix For: 2.18.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > It is more efficient to push down projected fields at an IO level (vs merging > with an Aggregate), when supported. > When running queries like: > {code:java} > select SUM(score) as total_score from group by name{code} > Projects get merged with an aggregate; as a result, the Calc (after an > IOSourceRel) projects all fields and the BeamIOPushDown rule does not know which > fields can be dropped, thus not dropping any. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9027) [SQL] ZetaSQL unparsing should produce valid result
[ https://issues.apache.org/jira/browse/BEAM-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010124#comment-17010124 ] Kirill Kozlov commented on BEAM-9027: - Brief update on this issue. Following unparsing error have been fixed: * Calcite cannot unparse RexNode back to bytes literal * Calcite cannot unparse some floating point literals correctly * Calcite cannot unparse some string literals correctly Other issues from the initial list (INTERVAL and CAST) still need to be fixed. > [SQL] ZetaSQL unparsing should produce valid result > --- > > Key: BEAM-9027 > URL: https://issues.apache.org/jira/browse/BEAM-9027 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > * ZetaSQL does not recognize keyword INTERVAL > * Calcite cannot unparse RexNode back to bytes literal > * Calcite cannot unparse some floating point literals correctly > * Calcite cannot unparse some string literals correctly > * Calcite cannot unparse types correctly for CAST function -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses
[ https://issues.apache.org/jira/browse/BEAM-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003733#comment-17003733 ] Kirill Kozlov commented on BEAM-9000: - I believe that the following PR [1] is somewhat relevant to this, so I decided to link it here just in case. [1] [https://github.com/apache/beam/pull/10094] CC: [~bhulette] > Java Test Assertions without toString for GenericJson subclasses > > > Key: BEAM-9000 > URL: https://issues.apache.org/jira/browse/BEAM-9000 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Minor > Fix For: 2.19.0 > > Time Spent: 2h > Remaining Estimate: 0h > > As of now, there are many tests that assert on {{toString()}} of objects. > {code:java} > CounterUpdate result = testObject.transform(monitoringInfo); > assertEquals( > "{cumulative=true, integer={highBits=0, lowBits=0}, " > + "nameAndKind={kind=SUM, " > + "name=transformedValue-ElementCount}}", > result.toString()); > {code} > This style is prone to unnecessary maintenance of the test code when > upgrading dependencies. Dependencies may change the internal ordering of > fields or make trivial changes in {{toString()}} output. In BEAM-8695, where I tried to > upgrade google-http-client, there are ~30 comparison failures due to these > {{toString}} assertions. > They are subclasses of {{com.google.api.client.json.GenericJson}}. > There are several options to enhance these assertions. > h1. Option 1: Assertion using Map > Leveraging the fact that GenericJson is a subclass of {{AbstractMap<String, Object>}}, the assertion can be written as > {code:java} > ImmutableMap<String, Object> expected = ImmutableMap.of("cumulative", > true, > "integer", ImmutableMap.of("highBits", 0, "lowBits", 0), > "nameAndKind", ImmutableMap.of("kind", "SUM", "name", > "transformedValue-ElementCount")); > assertEquals(expected, (Map)result); > {code} > Credit: Ben Whitehead. > h1. 
Option 2: Create assertEqualsOnJson > Leveraging the fact that an instance of GenericJson can be instantiated through > JSON, the assertion can be written as > {code:java} > assertEqualsOnJson( > "{\"cumulative\":true, \"integer\":{\"highBits\":0, \"lowBits\":0}, " > + "\"nameAndKind\":{\"kind\":\"SUM\", " > + "\"name\":\"transformedValue-ElementCount\"}}", > result); > {code} > > {{assertEqualsOnJson}} is implemented as below. The following field and > methods should go to a shared test utility class (sdks/testing?) > {code:java} > private static final JacksonFactory jacksonFactory = > JacksonFactory.getDefaultInstance(); > public static <T> void assertEqualsOnJson(String > expectedJsonText, T actual) { > CounterUpdate expected = parse(expectedJsonText, CounterUpdate.class); > assertEquals(expected, actual); > } > public static <T> T parse(String text, Class<T> clazz) { > try { > JsonParser parser = jacksonFactory.createJsonParser(text); > return parser.parse(clazz); > } catch (IOException ex) { > throw new IllegalArgumentException("Could not parse the text as " + > clazz, ex); > } > } > {code} > A feature request to handle escaping double quotes via JacksonFactory: > [https://github.com/googleapis/google-http-java-client/issues/923] > > h1. Option 3: Check JSON equality via JSONassert > * https://github.com/skyscreamer/JSONassert > * https://github.com/hertzsprung/hamcrest-json (Not using as last commit was > in 2012) > The JSONassert example does not require escaping double quote characters. The > implementation would convert the actual object into a JSON object and call > {{JSONAssert.assertEquals}}. > Credit: Luke Cwik > -- This message was sent by Atlassian Jira (v8.3.4#803005)
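The difference the options above make can be seen in miniature with plain JDK types. This is a toy sketch; {{FakeCounterUpdate}} is a hypothetical stand-in for a GenericJson subclass (not Beam or google-http-client code). It shows why Option 1's Map-based assertion survives a field reordering that would break a {{toString()}} literal:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for a GenericJson subclass: like GenericJson, it "is-a" Map,
// so Map.equals compares field-by-field, independent of insertion order.
class FakeCounterUpdate extends LinkedHashMap<String, Object> {}

public class MapAssertionDemo {

  // Option 1 from the ticket: compare against a Map instead of a toString() literal.
  static boolean matches(Map<String, Object> expected, FakeCounterUpdate actual) {
    return expected.equals(actual);
  }

  public static void main(String[] args) {
    FakeCounterUpdate result = new FakeCounterUpdate();
    result.put("cumulative", true);
    result.put("integer", Map.of("highBits", 0, "lowBits", 0));

    // The expected map lists fields in a different order than they were
    // inserted above; the assertion still passes, which is exactly what a
    // toString() comparison cannot guarantee across dependency upgrades.
    Map<String, Object> expected = Map.of(
        "integer", Map.of("lowBits", 0, "highBits", 0),
        "cumulative", true);
    if (!matches(expected, result)) {
      throw new AssertionError("maps differ");
    }
    System.out.println("equal despite different field ordering");
  }
}
```

A dependency upgrade that reorders fields changes {{toString()}} output but not map equality, so tests written this way do not need maintenance on upgrade.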
[jira] [Commented] (BEAM-9027) [SQL] ZetaSQL unparsing should produce valid result
[ https://issues.apache.org/jira/browse/BEAM-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002570#comment-17002570 ] Kirill Kozlov commented on BEAM-9027: - CC: [~apilloud] > [SQL] ZetaSQL unparsing should produce valid result > --- > > Key: BEAM-9027 > URL: https://issues.apache.org/jira/browse/BEAM-9027 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Kirill Kozlov >Priority: Major > > * ZetaSQL does not recognize keyword INTERVAL > * Calcite cannot unparse RexNode back to bytes literal > * Calcite cannot unparse some floating point literals correctly > * Calcite cannot unparse some string literals correctly > * Calcite cannot unparse types correctly for CAST function -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9027) [SQL] ZetaSQL unparsing should produce valid result
[ https://issues.apache.org/jira/browse/BEAM-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002569#comment-17002569 ] Kirill Kozlov commented on BEAM-9027: - CC: [~robinyqiu] > [SQL] ZetaSQL unparsing should produce valid result > --- > > Key: BEAM-9027 > URL: https://issues.apache.org/jira/browse/BEAM-9027 > Project: Beam > Issue Type: Improvement > Components: dsl-sql-zetasql >Reporter: Kirill Kozlov >Priority: Major > > * ZetaSQL does not recognize keyword INTERVAL > * Calcite cannot unparse RexNode back to bytes literal > * Calcite cannot unparse some floating point literals correctly > * Calcite cannot unparse some string literals correctly > * Calcite cannot unparse types correctly for CAST function -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9027) [SQL] ZetaSQL unparsing should produce valid result
Kirill Kozlov created BEAM-9027: --- Summary: [SQL] ZetaSQL unparsing should produce valid result Key: BEAM-9027 URL: https://issues.apache.org/jira/browse/BEAM-9027 Project: Beam Issue Type: Improvement Components: dsl-sql-zetasql Reporter: Kirill Kozlov * ZetaSQL does not recognize keyword INTERVAL * Calcite cannot unparse RexNode back to bytes literal * Calcite cannot unparse some floating point literals correctly * Calcite cannot unparse some string literals correctly * Calcite cannot unparse types correctly for CAST function -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-8933 started by Kirill Kozlov. --- > BigQuery IO should support read/write in Arrow format > - > > Key: BEAM-8933 > URL: https://issues.apache.org/jira/browse/BEAM-8933 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > As of right now BigQuery uses Avro format for reading and writing. > We should add a config to BigQueryIO to specify which format to use: Arrow or > Avro (with Avro as default). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8993) [SQL] MongoDb should use predicate push-down
[ https://issues.apache.org/jira/browse/BEAM-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8993: Description: * Add a MongoDbFilter class, implementing BeamSqlTableFilter. ** Support simple comparison operations. ** Support boolean field. ** Support nested conjunction/disjunction. * Update MongoDbTable#buildIOReader ** Construct a push-down filter from RexNodes. ** Set filter to FindQuery. was: * Add a MongoDbFilter class, implementing BeamSqlTableFilter. ** Support simple comparison operations. ** Support boolean field. ** Support nested conjunction/disjunction. * Update MongoDbTable#buildIOReader ** Construct a push-down filter from RexNodes. ** Set filter to FindQuery. > [SQL] MongoDb should use predicate push-down > > > Key: BEAM-8993 > URL: https://issues.apache.org/jira/browse/BEAM-8993 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > > * Add a MongoDbFilter class, implementing BeamSqlTableFilter. > ** Support simple comparison operations. > ** Support boolean field. > ** Support nested conjunction/disjunction. > * Update MongoDbTable#buildIOReader > ** Construct a push-down filter from RexNodes. > ** Set filter to FindQuery. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8993) [SQL] MongoDb should use predicate push-down
[ https://issues.apache.org/jira/browse/BEAM-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8993: Status: Open (was: Triage Needed) > [SQL] MongoDb should use predicate push-down > > > Key: BEAM-8993 > URL: https://issues.apache.org/jira/browse/BEAM-8993 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > > * Add a MongoDbFilter class, implementing BeamSqlTableFilter. > ** Support simple comparison operations. > ** Support boolean field. > ** Support nested conjunction/disjunction. > * Update MongoDbTable#buildIOReader > ** Construct a push-down filter from RexNodes. > ** Set filter to FindQuery. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8993) [SQL] MongoDb should use predicate push-down
Kirill Kozlov created BEAM-8993: --- Summary: [SQL] MongoDb should use predicate push-down Key: BEAM-8993 URL: https://issues.apache.org/jira/browse/BEAM-8993 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Kirill Kozlov Assignee: Kirill Kozlov * Add a MongoDbFilter class, implementing BeamSqlTableFilter. ** Support simple comparison operations. ** Support boolean field. ** Support nested conjunction/disjunction. * Update MongoDbTable#buildIOReader ** Construct a push-down filter from RexNodes. ** Set filter to FindQuery. -- This message was sent by Atlassian Jira (v8.3.4#803005)
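As a rough illustration of what the bullets above aim at, here is a toy, stdlib-only sketch of predicate push-down. The {{Comparison}} and {{toMongoFilter}} names are invented for this example; this is not Beam's BeamSqlTableFilter or MongoDbTable API:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of predicate push-down: supported comparisons from a WHERE
// clause are rewritten into a MongoDB-style filter document, so rows are
// filtered inside the store instead of inside the pipeline.
public class PushDownSketch {

  /** One simple comparison, e.g. field="age", op=">", value=21. */
  static final class Comparison {
    final String field; final String op; final Object value;
    Comparison(String field, String op, Object value) {
      this.field = field; this.op = op; this.value = value;
    }
  }

  /** ANDs simple comparisons into {"field": {"$op": value}, ...}. */
  static Map<String, Object> toMongoFilter(List<Comparison> predicates) {
    Map<String, Object> filter = new LinkedHashMap<>();
    for (Comparison c : predicates) {
      String mongoOp;
      if ("=".equals(c.op)) { mongoOp = "$eq"; }
      else if (">".equals(c.op)) { mongoOp = "$gt"; }
      else if ("<".equals(c.op)) { mongoOp = "$lt"; }
      else { throw new IllegalArgumentException("not pushable: " + c.op); }
      Map<String, Object> clause = new LinkedHashMap<>();
      clause.put(mongoOp, c.value);
      filter.put(c.field, clause);
    }
    return filter;
  }

  public static void main(String[] args) {
    System.out.println(toMongoFilter(Arrays.asList(
        new Comparison("age", ">", 21), new Comparison("active", "=", true))));
    // prints {age={$gt=21}, active={$eq=true}}
  }
}
```

A predicate the translator cannot handle (e.g. an arbitrary UDF) throws here; in a real table implementation it would instead stay in the pipeline as a post-read filter, which is what "support nested conjunction/disjunction" has to account for.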
[jira] [Resolved] (BEAM-8664) [SQL] MongoDb should use project push-down
[ https://issues.apache.org/jira/browse/BEAM-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-8664. - Fix Version/s: 2.18.0 Resolution: Fixed > [SQL] MongoDb should use project push-down > -- > > Key: BEAM-8664 > URL: https://issues.apache.org/jira/browse/BEAM-8664 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Fix For: 2.18.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > MongoDbTable should implement the following methods: > {code:java} > public PCollection<Row> buildIOReader( > PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames); > public ProjectSupport supportsProjects(); > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
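The {{fieldNames}} parameter above is what makes project push-down work: the table only materializes the columns the query asked for. A minimal, stdlib-only sketch of that idea follows; it is not the actual MongoDbTable implementation, which would instead set a projection on the FindQuery so unused columns never leave the store:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of project push-down: given the field names the query needs,
// keep only those columns of each document, in the requested order.
public class ProjectPushDownSketch {

  static Map<String, Object> project(Map<String, Object> document, List<String> fieldNames) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (String f : fieldNames) {
      if (document.containsKey(f)) {
        out.put(f, document.get(f));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> doc = new LinkedHashMap<>();
    doc.put("id", 1);
    doc.put("name", "a");
    doc.put("payload", "large blob the query never asked for");
    System.out.println(project(doc, Arrays.asList("id", "name"))); // prints {id=1, name=a}
  }
}
```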
[jira] [Updated] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8933: Description: As of right now BigQuery uses Avro format for reading and writing. We should add a config to BigQueryIO to specify which format to use: Arrow or Avro (with Avro as default). was: As of right now BigQuery uses Avro format for reading and writing. We should add a config to BigQueryIO to specify which format to use (with Avro as default). > BigQuery IO should support read/write in Arrow format > - > > Key: BEAM-8933 > URL: https://issues.apache.org/jira/browse/BEAM-8933 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > As of right now BigQuery uses Avro format for reading and writing. > We should add a config to BigQueryIO to specify which format to use: Arrow or > Avro (with Avro as default). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997475#comment-16997475 ] Kirill Kozlov commented on BEAM-8933: - [~iemejia], I meant to say Arrow [1] in the title; BigQuery currently supports Avro. As of right now, reading from BQ in Arrow does not support all the features the Avro format does, but there are people working on matching the functionality. [1] [https://github.com/apache/arrow] > BigQuery IO should support read/write in Arrow format > - > > Key: BEAM-8933 > URL: https://issues.apache.org/jira/browse/BEAM-8933 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > As of right now BigQuery uses Avro format for reading and writing. > We should add a config to BigQueryIO to specify which format to use (with > Avro as default). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov reassigned BEAM-8933: --- Assignee: Kirill Kozlov > BigQuery IO should support read/write in Arrow format > - > > Key: BEAM-8933 > URL: https://issues.apache.org/jira/browse/BEAM-8933 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > As of right now BigQuery uses Avro format for reading and writing. > We should add a config to BigQueryIO to specify which format to use (with > Avro as default). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8933) BigQuery IO should support read/write in Arrow format
Kirill Kozlov created BEAM-8933: --- Summary: BigQuery IO should support read/write in Arrow format Key: BEAM-8933 URL: https://issues.apache.org/jira/browse/BEAM-8933 Project: Beam Issue Type: Improvement Components: io-java-gcp Reporter: Kirill Kozlov As of right now BigQuery uses Avro format for reading and writing. We should add a config to BigQueryIO to specify which format to use (with Avro as default). -- This message was sent by Atlassian Jira (v8.3.4#803005)
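A minimal sketch of what such a config could look like; the names ({{DataFormat}}, {{withFormat}}, {{Read}}) are assumptions for illustration only, not the real BigQueryIO API. The point is an enum-valued option whose default keeps today's Avro behavior:

```java
// Hypothetical sketch of the proposed knob: an opt-in enum with a
// backward-compatible default, in the fluent style Beam IO builders use.
public class FormatConfigSketch {

  enum DataFormat { AVRO, ARROW }

  static final class Read {
    private DataFormat format = DataFormat.AVRO; // Avro stays the default

    Read withFormat(DataFormat f) {
      this.format = f;
      return this;
    }

    DataFormat format() { return format; }
  }

  public static void main(String[] args) {
    System.out.println(new Read().format());                              // prints AVRO
    System.out.println(new Read().withFormat(DataFormat.ARROW).format()); // prints ARROW
  }
}
```

Keeping Avro as the default means existing pipelines are unaffected, and users opt in to Arrow only where the Storage API feature gap mentioned above is acceptable.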
[jira] [Commented] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type
[ https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989329#comment-16989329 ] Kirill Kozlov commented on BEAM-8896: - [~amaliujia], yes, the join condition is pushed down to a Project, which serves as an input to a Join. > WITH query AS + SELECT query JOIN other throws invalid type > --- > > Key: BEAM-8896 > URL: https://issues.apache.org/jira/browse/BEAM-8896 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.16.0 >Reporter: fdiazgon >Assignee: Andrew Pilloud >Priority: Major > > The first one of the three following queries fails, despite the queries being > equivalent: > {code:java} > Pipeline p = Pipeline.create(); > Schema schemaA = > Schema.of( > Schema.Field.of("id", Schema.FieldType.BYTES), > Schema.Field.of("fA1", Schema.FieldType.STRING)); > Schema schemaB = > Schema.of( > Schema.Field.of("id", Schema.FieldType.STRING), > Schema.Field.of("fB1", Schema.FieldType.STRING)); > PCollection<Row> inputA = > > p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaA))); > PCollection<Row> inputB = > > p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaB))); > // Fails > String query1 = > "WITH query AS " > + "( " > + " SELECT id, fA1, fA1 AS fA1_2 " > + " FROM tblA" > + ") " > + "SELECT fA1, fB1, fA1_2 " > + "FROM query " > + "JOIN tblB ON (TO_HEX(query.id) = tblB.id)"; > // Ok > String query2 = > "WITH query AS " > + "( " > + " SELECT fA1, fB1, fA1 AS fA1_2 " > + " FROM tblA " > + " JOIN tblB " > + " ON (TO_HEX(tblA.id) = tblB.id) " > + ")" > + "SELECT fA1, fB1, fA1_2 " > + "FROM query "; > // Ok > String query3 = > "WITH query AS " > + "( " > + " SELECT TO_HEX(id) AS id, fA1, fA1 AS fA1_2 " > + " FROM tblA" > + ") " > + "SELECT fA1, fB1, fA1_2 " > + "FROM query " > + "JOIN tblB ON (query.id = tblB.id)"; > Schema transform3 = > PCollectionTuple.of("tblA", inputA) > .and("tblB", inputB) > .apply(SqlTransform.query(query3)) > .getSchema(); > 
System.out.println(transform3); > Schema transform2 = > PCollectionTuple.of("tblA", inputA) > .and("tblB", inputB) > .apply(SqlTransform.query(query2)) > .getSchema(); > System.out.println(transform2); > Schema transform1 = > PCollectionTuple.of("tblA", inputA) > .and("tblB", inputB) > .apply(SqlTransform.query(query1)) > .getSchema(); > System.out.println(transform1); > {code} > > The error is: > {noformat} > Exception in thread "main" java.lang.AssertionError: Field ordinal 2 is > invalid for type 'RecordType(VARBINARY id, VARCHAR fA1)'Exception in thread > "main" java.lang.AssertionError: Field ordinal 2 is invalid for type > 'RecordType(VARBINARY id, VARCHAR fA1)' at > org.apache.beam.repackaged.sql.org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197){noformat} > > If I change `schemaB.id` to `BYTES` (while also avoiding `TO_HEX`), all > queries work fine. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type
[ https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989162#comment-16989162 ] Kirill Kozlov edited comment on BEAM-8896 at 12/5/19 10:27 PM: --- After some investigation, I have a feeling that this problem has to do with the way leaves (Array in SqlToRelConverter class) are registered in Calcite. Before joins are created, the left and right paths are parsed first. For the 1st query above they are as follows: {code:java} Left: LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2) LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1) Right: LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code} As they are processed - they are registered as leaves (added to the Array). When the Join node is being created it knows what the `condition expression` is: {code:java} =(TO_HEX($0), $3) {code} Since TO_HEX is not computed anywhere - it modifies the left input to be as follows: {code:java} LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR $f3) {code} where `VARCHAR $f3` is the result of TO_HEX. Note that the list of leaves is not updated. [https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L2571] Finally, when the identifier "query.fA1_2" is being converted (via SqlToRelConverter#convertIdentifier) for the top-most node {code:java} top-most node: LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR id0, VARCHAR fB1) LogicalJoin with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR $f3, VARCHAR id0, VARCHAR fB1) LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR $f3) LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1) LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code} Blackboard performs a lookup (via SqlToRelConverter#lookupExp), in the process of which a LookupContext is created. 
In a constructor, LookupContext performs flatten, which recursively traverses the tree of nodes (from the above code block) and checks the leaves to see if they contain such an expression. When it does get to the modified left input of a join it does not get a match on it and continues further down to a TableScan. When it finally flattens the result, TableScan's RecordType knows nothing about a duplicated field `fA1_2`, causing the error above. I think a viable solution would be to modify Join creation to register the resulting join inputs as leaves (when they are modified). An alternative approach would be to add an additional Project when a Join needs to modify an input.
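The leaf-registration problem described in this comment can be modeled with a toy, stdlib-only sketch. This is not Calcite code: {{Node}} stands in for a RelNode, and an identity-keyed map stands in for SqlToRelConverter's leaves array:

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Toy model of the behavior described above: converted inputs are registered
// in a "leaves" table keyed by node identity, but when the Join rewrites its
// left input into a wider Project, the replacement node is never registered,
// so a later lookup misses it and falls through toward the scan.
public class LeavesSketch {

  static final class Node {
    final String rowType;
    Node(String rowType) { this.rowType = rowType; }
  }

  public static void main(String[] args) {
    Map<Node, Boolean> leaves = new IdentityHashMap<>();

    Node leftProject = new Node("RecordType(id, fA1, fA1_2)");
    leaves.put(leftProject, true); // registered during conversion

    // The Join needs TO_HEX(id), so it replaces its input with a wider
    // Project ... but 'leaves' is never told about the replacement:
    Node rewritten = new Node("RecordType(id, fA1, fA1_2, $f3)");

    System.out.println(leaves.containsKey(leftProject)); // prints true
    System.out.println(leaves.containsKey(rewritten));   // prints false -> lookup falls through
  }
}
```

Both proposed fixes map onto this model: either register {{rewritten}} in the table when the Join creates it, or wrap it in an extra Project so the registered node is left untouched.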
[jira] [Commented] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type
[ https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989162#comment-16989162 ] Kirill Kozlov commented on BEAM-8896: - After some investigating I have a feeling that this problem has to do with the way leaves (Array in SqlToRelConverter class) are registered in Calcite. Before joins are created left and right paths are pared first. For the 1st query above they are as follows: {code:java} Left: LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2) LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1) Right: LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code} As they are processed - they are registered as leaves (added to the Array). When Join node is being created it knows what the `condition expressions` is: {code:java} =(TO_HEX($0), $3) {code} Since TO_HEX is not computed anywhere - it modifies the left input to be as follows: {code:java} LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR $f3) {code} where `VARCHAR $f3` is a result of TO_HEX. Note that the list of leaves is not updated. Finally, when identifier "query.fA1_2" is being converted (via SqlToRelConverter#convertIdentifier) for the top-most node {code:java} top-most node: LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR id0, VARCHAR fB1) LogicalJoin with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR $f3, VARCHAR id0, VARCHAR fB1) LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR $f3) LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1) LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code} Blackboard perform a lookup (via SqlToRelConverter#lookupExp), in process of which LookupContext is created. 
In its constructor, LookupContext performs a flatten, which recursively traverses the node tree (from the code block above) and checks the leaves to see if they contain the expression. When it gets to the modified left input of the join, it does not find a match and continues further down to the TableScan. When it finally flattens the result, the TableScan's RecordType knows nothing about the duplicated field `fA1_2`, causing the error above. I think a viable solution would be to modify Join creation to register the resulting join inputs as leaves (when they are modified). An alternative approach would be to add an additional Project when a Join needs to modify an input. > WITH query AS + SELECT query JOIN other throws invalid type > --- > > Key: BEAM-8896 > URL: https://issues.apache.org/jira/browse/BEAM-8896 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.16.0 >Reporter: fdiazgon >Assignee: Andrew Pilloud >Priority: Major > > The first one of the three following queries fails, despite queries being > equivalent: > {code:java} > Pipeline p = Pipeline.create(); > Schema schemaA = > Schema.of( > Schema.Field.of("id", Schema.FieldType.BYTES), > Schema.Field.of("fA1", Schema.FieldType.STRING)); > Schema schemaB = > Schema.of( > Schema.Field.of("id", Schema.FieldType.STRING), > Schema.Field.of("fB1", Schema.FieldType.STRING)); > PCollection inputA = > > p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaA))); > PCollection inputB = > > p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaB))); > // Fails > String query1 = > "WITH query AS " > + "( " > + " SELECT id, fA1, fA1 AS fA1_2 " > + " FROM tblA" > + ") " > + "SELECT fA1, fB1, fA1_2 " > + "FROM query " > + "JOIN tblB ON (TO_HEX(query.id) = tblB.id)"; > // Ok > String query2 = > "WITH query AS " > + "( " > + " SELECT fA1, fB1, fA1 AS fA1_2 " > + " FROM tblA " > + " JOIN tblB " > + " ON (TO_HEX(tblA.id) = tblB.id) " > + ")" > + "SELECT fA1, fB1, fA1_2 " > + "FROM query 
"; > // Ok > String query3 = > "WITH query AS " > + "( " > + " SELECT TO_HEX(id) AS id, fA1, fA1 AS fA1_2 " > + " FROM tblA" > + ") " > + "SELECT fA1, fB1, fA1_2 " > + "FROM query " > + "JOIN tblB ON (query.id = tblB.id)"; > Schema transform3 = > PCollectionTuple.of("tblA", inputA) > .and("tblB", inputB) > .apply(SqlTransform.query(query3)) > .getSchema(); > System.out.println(transform3); > Schema transform2 = > PCollectionTuple.of("tblA", inputA) > .and("tblB", inputB) > .apply(SqlTransform.query(query2)) > .getSchema(); > System.out.println(transform2); > Schema transform1 = > PCollectionTuple.of("tblA", inputA) >
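The lookup failure described in the comment above can be sketched with a simplified model. This is hypothetical illustration code, not Calcite's actual classes: it only models the idea that a lookup matches registered leaves, so a join input swapped in without re-registration makes the search fall through to a table scan that lacks the duplicated field.

```java
// Simplified, hypothetical model of the leaf-registration problem (these
// are not Calcite's real classes): the lookup only matches registered
// leaves, so once Join creation swaps in a modified input without
// re-registering it, the search falls through to a table scan that knows
// nothing about the duplicated field fA1_2.
import java.util.*;

public class LeafLookupSketch {
    static class Node {
        final List<String> fields;
        final List<Node> inputs;
        Node(List<String> fields, Node... inputs) {
            this.fields = fields;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // Depth-first search that, mirroring the flatten step, only inspects
    // nodes registered as leaves.
    static Node lookup(Node node, String field, Set<Node> leaves) {
        if (leaves.contains(node)) {
            return node.fields.contains(field) ? node : null;
        }
        for (Node input : node.inputs) {
            Node found = lookup(input, field, leaves);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    public static String demo() {
        Node scanA = new Node(Arrays.asList("id", "fA1"));
        Node scanB = new Node(Arrays.asList("id0", "fB1"));
        // The original left Project is registered as a leaf...
        Node project = new Node(Arrays.asList("id", "fA1", "fA1_2"), scanA);
        Set<Node> leaves = new HashSet<>(Arrays.asList(project, scanB));
        // ...but Join creation replaces it with a modified Project (adding
        // $f3 for TO_HEX) that is never added to the leaves.
        Node modified = new Node(Arrays.asList("id", "fA1", "fA1_2", "$f3"), scanA);
        Node join = new Node(
            Arrays.asList("id", "fA1", "fA1_2", "$f3", "id0", "fB1"), modified, scanB);
        String before = lookup(join, "fA1_2", leaves) == null ? "no match" : "match";
        leaves.add(modified); // the proposed fix: register the modified input
        String after = lookup(join, "fA1_2", leaves) == modified ? "match" : "no match";
        return before + " -> " + after;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // no match -> match
    }
}
```

Registering the modified input (the proposed fix) is what flips the lookup from "no match" to "match" in this toy model.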
[jira] [Updated] (BEAM-8890) Update google-cloud-java version
[ https://issues.apache.org/jira/browse/BEAM-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8890: Summary: Update google-cloud-java version (was: Update google-cloud-java) > Update google-cloud-java version > > > Key: BEAM-8890 > URL: https://issues.apache.org/jira/browse/BEAM-8890 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Kirill Kozlov >Priority: Major > > google-cloud-java under v0.117.0 has a limit on gRPC message size (~11 MB). > Which is a problem when reading in DIRECT_READ (Read API) from BigQuery > tables with large rows. > Updating to v0.117.0 or above should fix the problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8890) Update google-cloud-java
Kirill Kozlov created BEAM-8890: --- Summary: Update google-cloud-java Key: BEAM-8890 URL: https://issues.apache.org/jira/browse/BEAM-8890 Project: Beam Issue Type: Bug Components: io-java-gcp Affects Versions: 2.16.0 Reporter: Kirill Kozlov google-cloud-java versions below v0.117.0 have a limit on gRPC message size (~11 MB), which is a problem when reading in DIRECT_READ (Read API) mode from BigQuery tables with large rows. Updating to v0.117.0 or above should fix the problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
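Purely as an illustration of the version constraint: a direct Gradle pin might look like the fragment below. The coordinates and placement are assumptions, not the actual change; Beam manages GCP client versions centrally in its build scripts rather than per-module like this.

```groovy
// Hypothetical Gradle fragment, not the actual Beam change. Beam pins
// google-cloud-java versions centrally, so the real fix is a one-line
// version bump there; this only illustrates "v0.117.0 or above".
dependencies {
    implementation platform('com.google.cloud:google-cloud-bom:0.117.0')
}
```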
[jira] [Commented] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987266#comment-16987266 ] Kirill Kozlov commented on BEAM-8564: - I do not have much experience working with the Go SDK, but the README [1] on GitHub has some relevant information. I would start there. [1] [https://github.com/apache/beam/tree/master/sdks/go] > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-8844) [SQL] Create performance tests for BigQueryTable
[ https://issues.apache.org/jira/browse/BEAM-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-8844 started by Kirill Kozlov. --- > [SQL] Create performance tests for BigQueryTable > > > Key: BEAM-8844 > URL: https://issues.apache.org/jira/browse/BEAM-8844 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > They should measure read-time for: > * DIRECT_READ w/o push-down > * DIRECT_READ w/ push-down > * DEFAULT -- This message was sent by Atlassian Jira (v8.3.4#803005)
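As a miniature sketch of what "measure read-time" for the three modes might look like: a generic timing helper under assumed structure, not Beam's actual performance-test framework, and with placeholder reads standing in for the real BigQuery pipelines.

```java
// Generic timing-helper sketch (not Beam's actual perf-test framework):
// run one read per configured mode and report elapsed milliseconds.
import java.util.*;
import java.util.function.Supplier;

public class ReadTimeSketch {
    // Times a single invocation of the supplied read and returns milliseconds.
    public static long timeMillis(Supplier<?> read) {
        long start = System.nanoTime();
        read.get();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // The three configurations the issue lists; the lambda bodies here
        // are placeholders standing in for the real BigQuery read pipelines.
        for (String mode : Arrays.asList(
                "DIRECT_READ w/o push-down", "DIRECT_READ w/ push-down", "DEFAULT")) {
            long ms = timeMillis(() -> 0); // placeholder read
            System.out.println(mode + ": " + ms + " ms");
        }
    }
}
```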
[jira] [Comment Edited] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986375#comment-16986375 ] Kirill Kozlov edited comment on BEAM-8564 at 12/2/19 9:14 PM: -- [~AmoghTiwari], thanks for working on this! Are you working on adding this just for the Java SDK? If so, there is no need to build other SDKs (the error above is from building GO SDK). You might want to use the following command: {code:java} ./gradlew -p sdks/java/ build {code} It should make your build go much faster. You could further specify the build target by updating the path. was (Author: kirillkozlov): [~AmoghTiwari], thanks for working on this! Are you working on adding this just for the Java SDK? If so, there is no need to build other SDKs. You might want to use the following command: {code:java} ./gradlew -p sdks/java/ build {code} It should make your build go much faster. You could further specify the build target by updating the path. > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986375#comment-16986375 ] Kirill Kozlov commented on BEAM-8564: - [~AmoghTiwari], thanks for working on this! Are you working on adding this just for the Java SDK? If so, there is no need to build other SDKs. You might want to use the following command: {code:java} ./gradlew -p sdks/java/ build {code} It should make your build go much faster. You could further specify the build target by updating the path. > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8844) [SQL] Create performance tests for BigQueryTable
[ https://issues.apache.org/jira/browse/BEAM-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8844: Summary: [SQL] Create performance tests for BigQueryTable (was: [SQL] Create performance tests for BigQuery) > [SQL] Create performance tests for BigQueryTable > > > Key: BEAM-8844 > URL: https://issues.apache.org/jira/browse/BEAM-8844 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > > They should measure read-time for: > * DIRECT_READ w/o push-down > * DIRECT_READ w/ push-down > * DEFAULT -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8844) [SQL] Create performance tests for BigQuery
Kirill Kozlov created BEAM-8844: --- Summary: [SQL] Create performance tests for BigQuery Key: BEAM-8844 URL: https://issues.apache.org/jira/browse/BEAM-8844 Project: Beam Issue Type: New Feature Components: dsl-sql Reporter: Kirill Kozlov Assignee: Kirill Kozlov They should measure read-time for: * DIRECT_READ w/o push-down * DIRECT_READ w/ push-down * DEFAULT -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8586) [SQL] Add a server for MongoDb Integration Test
[ https://issues.apache.org/jira/browse/BEAM-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-8586. - Fix Version/s: 2.18.0 Resolution: Fixed > [SQL] Add a server for MongoDb Integration Test > --- > > Key: BEAM-8586 > URL: https://issues.apache.org/jira/browse/BEAM-8586 > Project: Beam > Issue Type: Test > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Minor > Fix For: 2.18.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > We need to pass pipeline options with server information to the > MongoDbReadWriteIT. > For now that test is ignored and excluded from the build.gradle file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8586) [SQL] Add a server for MongoDb Integration Test
[ https://issues.apache.org/jira/browse/BEAM-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov reassigned BEAM-8586: --- Assignee: Kirill Kozlov > [SQL] Add a server for MongoDb Integration Test > --- > > Key: BEAM-8586 > URL: https://issues.apache.org/jira/browse/BEAM-8586 > Project: Beam > Issue Type: Test > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > We need to pass pipeline options with server information to the > MongoDbReadWriteIT. > For now that test is ignored and excluded from the build.gradle file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-8427. - Fix Version/s: 2.18.0 Resolution: Fixed > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Fix For: 2.18.0 > > Time Spent: 8h 10m > Remaining Estimate: 0h > > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule
[ https://issues.apache.org/jira/browse/BEAM-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8794: Status: Open (was: Triage Needed) > Projects should be handled by an IOPushDownRule before applying > AggregateProjectMergeRule > - > > Key: BEAM-8794 > URL: https://issues.apache.org/jira/browse/BEAM-8794 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > It is more efficient to push-down projected fields at an IO level (vs merging > with an Aggregate), when supported. > When running queries like: > {code:java} > select SUM(score) as total_score from group by name{code} > Projects get merged with an aggregate, as a result Calc (after an > IOSourceRel) projects all fields and BeamIOPushDown rule does know what > fields can be dropped, thus not dropping any. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule
Kirill Kozlov created BEAM-8794: --- Summary: Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule Key: BEAM-8794 URL: https://issues.apache.org/jira/browse/BEAM-8794 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Kirill Kozlov Assignee: Kirill Kozlov It is more efficient to push down projected fields at the IO level (vs merging with an Aggregate), when supported. When running queries like: {code:java} select SUM(score) as total_score from group by name{code} Projects get merged with an Aggregate; as a result, the Calc (after an IOSourceRel) projects all fields, and the BeamIOPushDown rule does not know which fields can be dropped, thus not dropping any. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-8343. - Fix Version/s: 2.18.0 Resolution: Fixed > Add means for IO APIs to support predicate and/or project push-down when > running SQL pipelines > -- > > Key: BEAM-8343 > URL: https://issues.apache.org/jira/browse/BEAM-8343 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Fix For: 2.18.0 > > Time Spent: 6h 50m > Remaining Estimate: 0h > > The objective is to create a universal way for Beam SQL IO APIs to support > predicate/project push-down. > A proposed way to achieve that is by introducing an interface responsible > for identifying what portion(s) of a Calc can be moved down to IO layer. > Also, adding following methods to a BeamSqlTable interface to pass necessary > parameters to IO APIs: > - BeamSqlTableFilter constructFilter(List filter) > - ProjectSupport supportsProjects() > - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, > List fieldNames) > > ProjectSupport is an enum with the following options: > * NONE > * WITHOUT_FIELD_REORDERING > * WITH_FIELD_REORDERING > > Design doc > [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
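To make the buildIOReader contract above concrete, here is a toy sketch of project push-down. It is an assumed simplification: rows are plain maps, whereas the real method operates on PCollections and Beam schemas; only the idea that the source itself restricts output to the pushed-down field names is carried over.

```java
// Toy sketch of project push-down (assumed simplification: rows are maps;
// the real buildIOReader deals in PCollections and Beam schemas). The
// source keeps only the requested fields, so dropped columns are never
// materialized downstream.
import java.util.*;

public class ProjectPushDownSketch {
    enum ProjectSupport { NONE, WITHOUT_FIELD_REORDERING, WITH_FIELD_REORDERING }

    // Analogue of buildIOReader(begin, filters, fieldNames): emit rows
    // restricted (and ordered) to the pushed-down field names.
    public static List<Map<String, Object>> read(
            List<Map<String, Object>> source, List<String> fieldNames) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : source) {
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String f : fieldNames) {
                projected.put(f, row.get(f));
            }
            out.add(projected);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("name", "a");
        row.put("score", 10);
        List<Map<String, Object>> projected =
            read(Collections.singletonList(row), Arrays.asList("score", "name"));
        System.out.println(projected); // [{score=10, name=a}]
    }
}
```

The field ordering in the output is what the WITH_FIELD_REORDERING / WITHOUT_FIELD_REORDERING distinction is about: a source that cannot reorder would only be asked for fields in their original order.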
[jira] [Resolved] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
[ https://issues.apache.org/jira/browse/BEAM-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-8583. - Fix Version/s: 2.18.0 Resolution: Fixed > [SQL] BigQuery should support predicate push-down in DIRECT_READ mode > - > > Key: BEAM-8583 > URL: https://issues.apache.org/jira/browse/BEAM-8583 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Fix For: 2.18.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > * Add BigQuery Dialect with TypeTranslation (since it is not implemented in > Calcite 1.20.0, but is present in unreleased versions). > * Create a BigQueryFilter class. > * BigQueryTable#buildIOReader should translate supported filters into a Sql > string and pass it to BigQueryIO. > > Potential improvements: > * After updating vendor Calcite, class > `BigQuerySqlDialectWithTypeTranslation` can be deleted and Calcite's > `BigQuerySqlDialect` can be utilized instead. > * Once BigQuery adds support for more filters, `BigQueryFilter#isSupported` > should be updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
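As an illustrative sketch of the "translate supported filters into a Sql string" step: the column/operator/literal triples below are a hypothetical simplification, since Beam's real BigQueryFilter works on Calcite RexNode expressions rather than strings.

```java
// Illustrative sketch of rendering supported filters as a row-restriction
// clause for the BigQuery read (hypothetical column/op/literal triples;
// the real implementation translates Calcite RexNode expressions).
import java.util.*;

public class FilterToSqlSketch {
    // Joins supported filters with AND into a single restriction string.
    public static String toRowRestriction(List<String[]> filters) {
        StringBuilder sb = new StringBuilder();
        for (String[] f : filters) { // f = {column, operator, literal}
            if (sb.length() > 0) {
                sb.append(" AND ");
            }
            sb.append(f[0]).append(' ').append(f[1]).append(' ').append(f[2]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> filters = Arrays.asList(
            new String[] {"score", ">", "90"},
            new String[] {"name", "=", "'foo'"});
        System.out.println(toRowRestriction(filters)); // score > 90 AND name = 'foo'
    }
}
```

Unsupported filters would simply be left out of this string and kept in the Calc for evaluation after the read.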
[jira] [Created] (BEAM-8664) [SQL] MongoDb should use project push-down
Kirill Kozlov created BEAM-8664: --- Summary: [SQL] MongoDb should use project push-down Key: BEAM-8664 URL: https://issues.apache.org/jira/browse/BEAM-8664 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Kirill Kozlov Assignee: Kirill Kozlov MongoDbTable should implement the following methods: {code:java} public PCollection buildIOReader( PBegin begin, BeamSqlTableFilter filters, List fieldNames); public ProjectSupport supportsProjects(); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
[ https://issues.apache.org/jira/browse/BEAM-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8583: Status: Open (was: Triage Needed) > [SQL] BigQuery should support predicate push-down in DIRECT_READ mode > - > > Key: BEAM-8583 > URL: https://issues.apache.org/jira/browse/BEAM-8583 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > * Add BigQuery Dialect with TypeTranslation (since it is not implemented in > Calcite 1.20.0, but is present in unreleased versions). > * Create a BigQueryFilter class. > * BigQueryTable#buildIOReader should translate supported filters into a Sql > string and pass it to BigQueryIO. > > Potential improvements: > * After updating vendor Calcite, class > `BigQuerySqlDialectWithTypeTranslation` can be deleted and Calcite's > `BigQuerySqlDialect` can be utilized instead. > * Once BigQuery adds support for more filters, `BigQueryFilter#isSupported` > should be updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8468) Add predicate/filter push-down capability to IO APIs
[ https://issues.apache.org/jira/browse/BEAM-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-8468. - Fix Version/s: 2.18.0 Resolution: Fixed > Add predicate/filter push-down capability to IO APIs > > > Key: BEAM-8468 > URL: https://issues.apache.org/jira/browse/BEAM-8468 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Fix For: 2.18.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > * InMemoryTable should implement a following methods: > {code:java} > public PCollection buildIOReader( > PBegin begin, BeamSqlTableFilter filters, List fieldNames); > public BeamSqlTableFilter constructFilter(List filter); > {code} > * Update a push-down rule to support predicate/filter push-down. > * Create a class > {code:java} > class TestTableFilter implements BeamSqlTableFilter{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-8508. - Fix Version/s: 2.18.0 Resolution: Fixed > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Fix For: 2.18.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973664#comment-16973664 ] Kirill Kozlov commented on BEAM-8042: - [~kanterov] After looking more into this, I suspect that this is a ResolvedAggregateScan bug: the aggregate list, which stores ResolvedComputedColumns, records wrong column indexes in its argumentList. The following test cases pass: {code:java} String sql = "SELECT \n" + " id, \n" + " COUNT(id) as count, \n" + " SUM(has_f1) as f1_count, \n" + " SUM(has_f2) as f2_count, \n" + " SUM(has_f3) as f3_count, \n" + " SUM(has_f4) as f4_count, \n" + " SUM(has_f5) as f5_count, \n" + " SUM(has_f6) as f6_count \n" + "FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 0 as has_f5, 0 as has_f6)\n" + "GROUP BY id"; {code} {code:java} String sql = "SELECT \n" + " id, \n" + " SUM(has_f1) as f1_count, \n" + " SUM(has_f2) as f2_count, \n" + " SUM(has_f3) as f3_count, \n" + " SUM(has_f4) as f4_count, \n" + " SUM(has_f5) as f5_count, \n" + " SUM(has_f6) as f6_count, \n" + " COUNT(*) as count \n" + "FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 0 as has_f5, 0 as has_f6)\n" + "GROUP BY id"; {code} When COUNT is last, the combined aggregates look like this (as they should; the field indexes used are correct): {code:java} aggCalls = {ArrayList@5898} size = 7 0 = {AggregateCall@5899} "SUM($1)" 1 = {AggregateCall@5947} "SUM($2)" 2 = {AggregateCall@5948} "SUM($3)" 3 = {AggregateCall@5949} "SUM($4)" 4 = {AggregateCall@5950} "SUM($5)" 5 = {AggregateCall@5951} "SUM($6)" 6 = {AggregateCall@5952} "COUNT()" {code} When COUNT is first, the field indexes used are offset by 1: {code:java} aggCalls = {ArrayList@5898} size = 7 0 = {AggregateCall@5899} "COUNT()" 1 = {AggregateCall@5947} "SUM($2)" 2 = {AggregateCall@5948} "SUM($3)" 3 = {AggregateCall@5949} "SUM($4)" 4 = {AggregateCall@5950} "SUM($5)" 5 = {AggregateCall@5951} "SUM($6)" 6 = {AggregateCall@5952} "SUM($7)" 
{code} This shifts the fields used by aggregates that come after COUNT by 1, causing an index-out-of-bounds exception when validating the last one. Even if the out-of-bounds exception were not thrown, this would produce wrong results, because the SUM aggregates would operate over the wrong fields. > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > {code} > @Rule > public TestPipeline pipeline = > TestPipeline.fromOptions(createPipelineOptions()); > private static PipelineOptions createPipelineOptions() { > BeamSqlPipelineOptions opts = > PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class); > opts.setPlannerName(ZetaSQLQueryPlanner.class.getName()); > return opts; > } > @Test > public void testAggregate() { > Schema inputSchema = Schema.builder() > .addByteArrayField("id") > .addInt64Field("has_f1") > .addInt64Field("has_f2") > .addInt64Field("has_f3") > .addInt64Field("has_f4") > .addInt64Field("has_f5") > .addInt64Field("has_f6") > .build(); > String sql = "SELECT \n" + > " id, \n" + > " COUNT(*) as count, \n" + > " SUM(has_f1) as f1_count, \n" + > " SUM(has_f2) as f2_count, \n" + > " SUM(has_f3) as f3_count, \n" + > " SUM(has_f4) as f4_count, \n" + > " SUM(has_f5) as f5_count, \n" + > " SUM(has_f6) as f6_count \n" + > "FROM PCOLLECTION \n" + > "GROUP BY id"; > pipeline > .apply(Create.empty(inputSchema)) > .apply(SqlTransform.query(sql)); > pipeline.run(); > } > {code} > {code} > Caused by: java.lang.RuntimeException: Error while applying rule > AggregateProjectMergeRule, args > [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), > > rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)] > at > 
org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637) > at > org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168) > at > org.apache.b
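The off-by-one described in the comment above can be reproduced with a toy model of the index assignment. This is hypothetical illustration code, not the actual ZetaSQL-to-Calcite converter; it only models the idea that a zero-argument COUNT() still "consumes" an input index.

```java
// Toy reconstruction of the indexing bug described above (hypothetical,
// not the actual ZetaSQL-to-Calcite converter): if index assignment
// advances once per aggregate call, the zero-argument COUNT() still
// "consumes" an index, so every SUM after it points one field too far.
import java.util.*;

public class AggIndexSketch {
    public static List<String> convert(List<String> aggs, int groupKeyCount) {
        List<String> calls = new ArrayList<>();
        int next = groupKeyCount; // first input field after the GROUP BY keys
        for (String agg : aggs) {
            if (agg.equals("COUNT")) {
                calls.add("COUNT()");
                next++; // BUG: zero-arg COUNT still advances the index
            } else {
                calls.add(agg + "($" + next++ + ")");
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // COUNT last: SUM($1)..SUM($6), matching the correct aggCalls above.
        System.out.println(convert(
            Arrays.asList("SUM", "SUM", "SUM", "SUM", "SUM", "SUM", "COUNT"), 1));
        // COUNT first: SUM($2)..SUM($7), reproducing the shifted aggCalls
        // and, with only fields $0..$6 available, the eventual
        // ArrayIndexOutOfBoundsException: 7.
        System.out.println(convert(
            Arrays.asList("COUNT", "SUM", "SUM", "SUM", "SUM", "SUM", "SUM"), 1));
    }
}
```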
[jira] [Comment Edited] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972928#comment-16972928 ] Kirill Kozlov edited comment on BEAM-8042 at 11/13/19 1:03 AM: --- I tried running a similar query, but without COUNT( * ), and it seems to construct a BeamRelNode successfully. {code:java} ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); String sql = "SELECT \n" + " id, \n" + " SUM(has_f1) as f1_count, \n" + " SUM(has_f2) as f2_count, \n" + " SUM(has_f3) as f3_count, \n" + " SUM(has_f4) as f4_count, \n" + " SUM(has_f5) as f5_count, \n" + " SUM(has_f6) as f6_count \n" + "FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 0 as has_f5, 0 as has_f6)\n" + "GROUP BY id"; BeamRelNode node = zetaSQLQueryPlanner.convertToBeamRel(sql); {code} https://jira.apache.org/jira/browse/BEAM-7609 may also be relevant to AggregateProjectMergeRule. was (Author: kirillkozlov): I tried running a similar query, but without COUNT(*) and it seems construct BeamRelNode successfully. {code:java} ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); String sql = "SELECT \n" + " id, \n" + " SUM(has_f1) as f1_count, \n" + " SUM(has_f2) as f2_count, \n" + " SUM(has_f3) as f3_count, \n" + " SUM(has_f4) as f4_count, \n" + " SUM(has_f5) as f5_count, \n" + " SUM(has_f6) as f6_count \n" + "FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 0 as has_f5, 0 as has_f6)\n" + "GROUP BY id"; BeamRelNode node = zetaSQLQueryPlanner.convertToBeamRel(sql); {code} https://jira.apache.org/jira/browse/BEAM-7609 may also be relevant to AggregateProjectMergeRule. 
> Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Priority: Critical > > {code} > @Rule > public TestPipeline pipeline = > TestPipeline.fromOptions(createPipelineOptions()); > private static PipelineOptions createPipelineOptions() { > BeamSqlPipelineOptions opts = > PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class); > opts.setPlannerName(ZetaSQLQueryPlanner.class.getName()); > return opts; > } > @Test > public void testAggregate() { > Schema inputSchema = Schema.builder() > .addByteArrayField("id") > .addInt64Field("has_f1") > .addInt64Field("has_f2") > .addInt64Field("has_f3") > .addInt64Field("has_f4") > .addInt64Field("has_f5") > .addInt64Field("has_f6") > .build(); > String sql = "SELECT \n" + > " id, \n" + > " COUNT(*) as count, \n" + > " SUM(has_f1) as f1_count, \n" + > " SUM(has_f2) as f2_count, \n" + > " SUM(has_f3) as f3_count, \n" + > " SUM(has_f4) as f4_count, \n" + > " SUM(has_f5) as f5_count, \n" + > " SUM(has_f6) as f6_count \n" + > "FROM PCOLLECTION \n" + > "GROUP BY id"; > pipeline > .apply(Create.empty(inputSchema)) > .apply(SqlTransform.query(sql)); > pipeline.run(); > } > {code} > {code} > Caused by: java.lang.RuntimeException: Error while applying rule > AggregateProjectMergeRule, args > [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), > > rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)] > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637) > at > org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340) > at > 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104) > at > ... 39 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58) >
[jira] [Comment Edited] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972928#comment-16972928 ] Kirill Kozlov edited comment on BEAM-8042 at 11/13/19 1:03 AM: --- I tried running a similar query, but without COUNT(*) and it seems construct BeamRelNode successfully. {code:java} ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); String sql = "SELECT \n" + " id, \n" + " SUM(has_f1) as f1_count, \n" + " SUM(has_f2) as f2_count, \n" + " SUM(has_f3) as f3_count, \n" + " SUM(has_f4) as f4_count, \n" + " SUM(has_f5) as f5_count, \n" + " SUM(has_f6) as f6_count \n" + "FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 0 as has_f5, 0 as has_f6)\n" + "GROUP BY id"; BeamRelNode node = zetaSQLQueryPlanner.convertToBeamRel(sql); {code} https://jira.apache.org/jira/browse/BEAM-7609 may also be relevant to AggregateProjectMergeRule. was (Author: kirillkozlov): I tried running a similar query, but without COUNT(*) and it seems construct BeamRelNode successfully. {code:java} ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config); String sql = "SELECT \n" + " id, \n" + " SUM(has_f1) as f1_count, \n" + " SUM(has_f2) as f2_count, \n" + " SUM(has_f3) as f3_count, \n" + " SUM(has_f4) as f4_count, \n" + " SUM(has_f5) as f5_count, \n" + " SUM(has_f6) as f6_count \n" + "FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 0 as has_f5, 0 as has_f6)\n" + "GROUP BY id"; BeamRelNode node = zetaSQLQueryPlanner.convertToBeamRel(sql); {code} https://jira.apache.org/jira/browse/BEAM-7609 may also be relevant to AggregateProjectMergeRule. 
> Parsing of aggregate query fails
> --------------------------------
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
> Issue Type: Sub-task
> Components: dsl-sql-zetasql
> Reporter: Rui Wang
> Priority: Critical
>
> {code}
> @Rule
> public TestPipeline pipeline = TestPipeline.fromOptions(createPipelineOptions());
>
> private static PipelineOptions createPipelineOptions() {
>   BeamSqlPipelineOptions opts =
>       PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class);
>   opts.setPlannerName(ZetaSQLQueryPlanner.class.getName());
>   return opts;
> }
>
> @Test
> public void testAggregate() {
>   Schema inputSchema = Schema.builder()
>       .addByteArrayField("id")
>       .addInt64Field("has_f1")
>       .addInt64Field("has_f2")
>       .addInt64Field("has_f3")
>       .addInt64Field("has_f4")
>       .addInt64Field("has_f5")
>       .addInt64Field("has_f6")
>       .build();
>   String sql = "SELECT \n"
>       + "  id, \n"
>       + "  COUNT(*) as count, \n"
>       + "  SUM(has_f1) as f1_count, \n"
>       + "  SUM(has_f2) as f2_count, \n"
>       + "  SUM(has_f3) as f3_count, \n"
>       + "  SUM(has_f4) as f4_count, \n"
>       + "  SUM(has_f5) as f5_count, \n"
>       + "  SUM(has_f6) as f6_count \n"
>       + "FROM PCOLLECTION \n"
>       + "GROUP BY id";
>   pipeline
>       .apply(Create.empty(inputSchema))
>       .apply(SqlTransform.query(sql));
>   pipeline.run();
> }
> {code}
> {code}
> Caused by: java.lang.RuntimeException: Error while applying rule AggregateProjectMergeRule, args [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>     at org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>     at org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>     at org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>     at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>     at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>     at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>     at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>     at org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>     ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>     at org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>     at org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
>     at org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
>     at org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
>     ... 48 more
> {code}
[jira] [Commented] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972928#comment-16972928 ] Kirill Kozlov commented on BEAM-8042:
---
I tried running a similar query, but without COUNT(*), and it seems to construct a BeamRelNode successfully.
{code:java}
ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);
String sql =
    "SELECT \n"
        + "  id, \n"
        + "  SUM(has_f1) as f1_count, \n"
        + "  SUM(has_f2) as f2_count, \n"
        + "  SUM(has_f3) as f3_count, \n"
        + "  SUM(has_f4) as f4_count, \n"
        + "  SUM(has_f5) as f5_count, \n"
        + "  SUM(has_f6) as f6_count \n"
        + "FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 0 as has_f5, 0 as has_f6)\n"
        + "GROUP BY id";
BeamRelNode node = zetaSQLQueryPlanner.convertToBeamRel(sql);
{code}
https://jira.apache.org/jira/browse/BEAM-7609 may also be relevant to AggregateProjectMergeRule.

> Parsing of aggregate query fails
> --------------------------------
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
> Issue Type: Sub-task
> Components: dsl-sql-zetasql
> Reporter: Rui Wang
> Priority: Critical
>
> {code}
> @Rule
> public TestPipeline pipeline = TestPipeline.fromOptions(createPipelineOptions());
>
> private static PipelineOptions createPipelineOptions() {
>   BeamSqlPipelineOptions opts =
>       PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class);
>   opts.setPlannerName(ZetaSQLQueryPlanner.class.getName());
>   return opts;
> }
>
> @Test
> public void testAggregate() {
>   Schema inputSchema = Schema.builder()
>       .addByteArrayField("id")
>       .addInt64Field("has_f1")
>       .addInt64Field("has_f2")
>       .addInt64Field("has_f3")
>       .addInt64Field("has_f4")
>       .addInt64Field("has_f5")
>       .addInt64Field("has_f6")
>       .build();
>   String sql = "SELECT \n"
>       + "  id, \n"
>       + "  COUNT(*) as count, \n"
>       + "  SUM(has_f1) as f1_count, \n"
>       + "  SUM(has_f2) as f2_count, \n"
>       + "  SUM(has_f3) as f3_count, \n"
>       + "  SUM(has_f4) as f4_count, \n"
>       + "  SUM(has_f5) as f5_count, \n"
>       + "  SUM(has_f6) as f6_count \n"
>       + "FROM PCOLLECTION \n"
>       + "GROUP BY id";
>   pipeline
>       .apply(Create.empty(inputSchema))
>       .apply(SqlTransform.query(sql));
>   pipeline.run();
> }
> {code}
> {code}
> Caused by: java.lang.RuntimeException: Error while applying rule AggregateProjectMergeRule, args [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>     at org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>     at org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>     at org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>     at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>     at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>     at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>     at org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>     at org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>     ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>     at org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>     at org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
>     at org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
>     at org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
>     ... 48 more
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
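The trace bottoms out in AggregateProjectMergeRule.apply, which remaps each aggregate input reference through the projection's output list. With seven projected fields (indices 0-6) plus an extra COUNT(*) aggregate, a lookup at index 7 overruns the list. The following plain-Java sketch illustrates only that suspected mechanism; it is not Calcite or Beam code, and all names in it are invented:

```java
import java.util.Arrays;
import java.util.List;

// Illustration of the suspected failure mode: an aggregate input index is
// resolved against the projection's 7-element field list, so index 7 throws.
public class AggregateMergeSketch {
    // The projection exposes 7 fields: indices 0..6 (id, has_f1..has_f6).
    static final List<String> PROJECT_FIELDS =
        Arrays.asList("id", "f1", "f2", "f3", "f4", "f5", "f6");

    // Remap an aggregate input reference through the projection.
    public static String remap(int aggInputIndex) {
        return PROJECT_FIELDS.get(aggInputIndex); // throws for index 7
    }

    public static void main(String[] args) {
        // SUM(has_f6) references projected field 6 - within bounds.
        System.out.println(remap(6));
        try {
            // An extra COUNT(*) output probes one slot past the projection,
            // mirroring the ArrayIndexOutOfBoundsException: 7 in the trace.
            remap(7);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("out of bounds at index 7");
        }
    }
}
```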
[jira] [Updated] (BEAM-8586) [SQL] Add a server for MongoDb Integration Test
[ https://issues.apache.org/jira/browse/BEAM-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8586:
Summary: [SQL] Add a server for MongoDb Integration Test (was: Add a server for MongoDb Integration Test)

> [SQL] Add a server for MongoDb Integration Test
> -----------------------------------------------
>
> Key: BEAM-8586
> URL: https://issues.apache.org/jira/browse/BEAM-8586
> Project: Beam
> Issue Type: Test
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We need to pass pipeline options with server information to the MongoDbReadWriteIT.
> For now that test is ignored and excluded from the build.gradle file.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
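Passing server information to an integration test through command-line options typically looks like the following. This is a simplified, hypothetical sketch in plain Java (Beam's real PipelineOptions/PipelineOptionsFactory machinery is omitted), and the --mongoHost/--mongoPort flag names are made up for illustration:

```java
// Hypothetical sketch: parse server information from test arguments so the
// integration test can reach a real MongoDb server. Not Beam's actual API.
public class MongoItOptionsSketch {
    static class MongoOptions {
        String host = "localhost"; // default when no flag is given
        int port = 27017;

        static MongoOptions fromArgs(String[] args) {
            MongoOptions o = new MongoOptions();
            for (String a : args) {
                if (a.startsWith("--mongoHost=")) {
                    o.host = a.substring("--mongoHost=".length());
                }
                if (a.startsWith("--mongoPort=")) {
                    o.port = Integer.parseInt(a.substring("--mongoPort=".length()));
                }
            }
            return o;
        }

        String uri() { return "mongodb://" + host + ":" + port; }
    }

    public static void main(String[] args) {
        MongoOptions opts = MongoOptions.fromArgs(
            new String[] {"--mongoHost=test-server", "--mongoPort=27018"});
        System.out.println(opts.uri()); // mongodb://test-server:27018
    }
}
```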
[jira] [Updated] (BEAM-8586) [SQL] Add a server for MongoDb Integration Test
[ https://issues.apache.org/jira/browse/BEAM-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8586:
Status: Open (was: Triage Needed)

> [SQL] Add a server for MongoDb Integration Test
> -----------------------------------------------
>
> Key: BEAM-8586
> URL: https://issues.apache.org/jira/browse/BEAM-8586
> Project: Beam
> Issue Type: Test
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We need to pass pipeline options with server information to the MongoDbReadWriteIT.
> For now that test is ignored and excluded from the build.gradle file.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8427:
Description:
* Create a MongoDB table and table provider.
* Implement buildIOReader.
* Support primitive types.
* Implement buildIOWrite.
* Improve getTableStatistics.

was:
In progress:
* Create a MongoDB table and table provider.
* Implement buildIOReader.
* Support primitive types.
Still needs to be done:
* Implement buildIOWrite.
* Improve getTableStatistics.

> [SQL] Add support for MongoDB source
> ------------------------------------
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
> Issue Type: New Feature
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Major
> Time Spent: 7.5h
> Remaining Estimate: 0h
>
> * Create a MongoDB table and table provider.
> * Implement buildIOReader.
> * Support primitive types.
> * Implement buildIOWrite.
> * Improve getTableStatistics.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
[ https://issues.apache.org/jira/browse/BEAM-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8583:
Description:
* Add a BigQuery dialect with type translation (since it is not implemented in Calcite 1.20.0, but is present in unreleased versions).
* Create a BigQueryFilter class.
* BigQueryTable#buildIOReader should translate supported filters into a SQL string and pass it to BigQueryIO.

Potential improvements:
* After updating the vendored Calcite, the class `BigQuerySqlDialectWithTypeTranslation` can be deleted and Calcite's `BigQuerySqlDialect` can be utilized instead.
* Once BigQuery adds support for more filters, `BigQueryFilter#isSupported` should be updated.

> [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
> ---------------------------------------------------------------------
>
> Key: BEAM-8583
> URL: https://issues.apache.org/jira/browse/BEAM-8583
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> * Add a BigQuery dialect with type translation (since it is not implemented in Calcite 1.20.0, but is present in unreleased versions).
> * Create a BigQueryFilter class.
> * BigQueryTable#buildIOReader should translate supported filters into a SQL string and pass it to BigQueryIO.
>
> Potential improvements:
> * After updating the vendored Calcite, the class `BigQuerySqlDialectWithTypeTranslation` can be deleted and Calcite's `BigQuerySqlDialect` can be utilized instead.
> * Once BigQuery adds support for more filters, `BigQueryFilter#isSupported` should be updated.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
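Translating the supported subset of filters into a SQL string for push-down could look roughly like the sketch below. This is a hypothetical illustration with invented names, not the actual BigQueryFilter/BigQueryTable implementation: unsupported predicates would stay in the Calc node above the IO, while supported comparisons are joined into a WHERE clause handed to the source:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of filter-to-SQL translation for predicate push-down.
public class FilterToSqlSketch {
    // A single supported comparison, e.g. `amount > 100`.
    public static class Comparison {
        final String field, op, literal;
        public Comparison(String field, String op, String literal) {
            this.field = field;
            this.op = op;
            this.literal = literal;
        }
        String toSql() { return field + " " + op + " " + literal; }
    }

    // Conjoin all supported comparisons into one WHERE clause.
    public static String toWhereClause(List<Comparison> supported) {
        if (supported.isEmpty()) {
            return ""; // nothing to push down; read the full table
        }
        return "WHERE " + supported.stream()
            .map(Comparison::toSql)
            .collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        System.out.println(toWhereClause(Arrays.asList(
            new Comparison("amount", ">", "100"),
            new Comparison("region", "=", "'US'"))));
        // prints: WHERE amount > 100 AND region = 'US'
    }
}
```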
[jira] [Closed] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov closed BEAM-8574.
Fix Version/s: 2.18.0
Resolution: Resolved

> [SQL] MongoDb PostCommit_SQL fails
> ----------------------------------
>
> Key: BEAM-8574
> URL: https://issues.apache.org/jira/browse/BEAM-8574
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Critical
> Fix For: 2.18.0
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Integration test for SQL MongoDb table read and write fails.
> Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
> Cause: [https://github.com/apache/beam/pull/9806]
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8586) Add a server for MongoDb Integration Test
Kirill Kozlov created BEAM-8586:
---
Summary: Add a server for MongoDb Integration Test
Key: BEAM-8586
URL: https://issues.apache.org/jira/browse/BEAM-8586
Project: Beam
Issue Type: Test
Components: dsl-sql
Reporter: Kirill Kozlov

We need to pass pipeline options with server information to the MongoDbReadWriteIT.
For now that test is ignored and excluded from the build.gradle file.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
Kirill Kozlov created BEAM-8583:
---
Summary: [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
Key: BEAM-8583
URL: https://issues.apache.org/jira/browse/BEAM-8583
Project: Beam
Issue Type: Improvement
Components: dsl-sql
Reporter: Kirill Kozlov
Assignee: Kirill Kozlov
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8574:
Description:
Integration test for SQL MongoDb table read and write fails.
Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
Cause: [https://github.com/apache/beam/pull/9806]
Revert PR: [https://github.com/apache/beam/pull/10018]

was:
Integration test for SQL MongoDb table read and write fails.
Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
Cause: [https://github.com/apache/beam/pull/9806]

> [SQL] MongoDb PostCommit_SQL fails
> ----------------------------------
>
> Key: BEAM-8574
> URL: https://issues.apache.org/jira/browse/BEAM-8574
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Critical
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Integration test for SQL MongoDb table read and write fails.
> Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
> Cause: [https://github.com/apache/beam/pull/9806]
> Revert PR: [https://github.com/apache/beam/pull/10018]
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8574:
Description:
Integration test for SQL MongoDb table read and write fails.
Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
Cause: [https://github.com/apache/beam/pull/9806]

was:
Integration test for SQL MongoDb table read and write fails.
Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
Cause: [https://github.com/apache/beam/pull/9806]
Revert PR: [https://github.com/apache/beam/pull/10018]

> [SQL] MongoDb PostCommit_SQL fails
> ----------------------------------
>
> Key: BEAM-8574
> URL: https://issues.apache.org/jira/browse/BEAM-8574
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Critical
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Integration test for SQL MongoDb table read and write fails.
> Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
> Cause: [https://github.com/apache/beam/pull/9806]
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8574:
Description:
Integration test for SQL MongoDb table read and write fails.
Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
Cause: [https://github.com/apache/beam/pull/9806]

was:
Integration test for SQL MongoDb table read and write fails.
Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]

> [SQL] MongoDb PostCommit_SQL fails
> ----------------------------------
>
> Key: BEAM-8574
> URL: https://issues.apache.org/jira/browse/BEAM-8574
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Critical
>
> Integration test for SQL MongoDb table read and write fails.
> Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
> Cause: [https://github.com/apache/beam/pull/9806]
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8574:
Description:
Integration test for SQL MongoDb table read and write fails.
Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]

was:
Integration test for SQL MongoDb table read and write fails.
[https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]

> [SQL] MongoDb PostCommit_SQL fails
> ----------------------------------
>
> Key: BEAM-8574
> URL: https://issues.apache.org/jira/browse/BEAM-8574
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Critical
>
> Integration test for SQL MongoDb table read and write fails.
> Jenkins: [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8574:
Description:
Integration test for SQL MongoDb table read and write fails.
[https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]

was:
Integration test for SQL MongoDb table read and write fails.

> [SQL] MongoDb PostCommit_SQL fails
> ----------------------------------
>
> Key: BEAM-8574
> URL: https://issues.apache.org/jira/browse/BEAM-8574
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Critical
>
> Integration test for SQL MongoDb table read and write fails.
> [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
Kirill Kozlov created BEAM-8574:
---
Summary: [SQL] MongoDb PostCommit_SQL fails
Key: BEAM-8574
URL: https://issues.apache.org/jira/browse/BEAM-8574
Project: Beam
Issue Type: Bug
Components: dsl-sql
Reporter: Kirill Kozlov
Assignee: Kirill Kozlov

Integration test for SQL MongoDb table read and write fails.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8343:
Description:
The objective is to create a universal way for Beam SQL IO APIs to support predicate/project push-down.
A proposed way to achieve that is by introducing an interface responsible for identifying what portion(s) of a Calc can be moved down to the IO layer.
Also, adding the following methods to the BeamSqlTable interface to pass the necessary parameters to IO APIs:
- BeamSqlTableFilter constructFilter(List<RexNode> filter)
- ProjectSupport supportsProjects()
- PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames)

ProjectSupport is an enum with the following options:
* NONE
* WITHOUT_FIELD_REORDERING
* WITH_FIELD_REORDERING

Design doc [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].

was:
The objective is to create a universal way for Beam SQL IO APIs to support predicate/project push-down.
A proposed way to achieve that is by introducing an interface responsible for identifying what portion(s) of a Calc can be moved down to the IO layer.
Also, adding the following methods to the BeamSqlTable interface to pass the necessary parameters to IO APIs:
- BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
- ProjectSupport supportsProjects()
- PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames)
* ProjectSupport is an enum with the following options:
* NONE
* WITHOUT_FIELD_REORDERING
* WITH_FIELD_REORDERING

Design doc [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
> Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
> ----------------------------------------------------------------------------------------------
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
> Issue Type: New Feature
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Major
> Time Spent: 5h
> Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support predicate/project push-down.
> A proposed way to achieve that is by introducing an interface responsible for identifying what portion(s) of a Calc can be moved down to the IO layer.
> Also, adding the following methods to the BeamSqlTable interface to pass the necessary parameters to IO APIs:
> - BeamSqlTableFilter constructFilter(List<RexNode> filter)
> - ProjectSupport supportsProjects()
> - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames)
>
> ProjectSupport is an enum with the following options:
> * NONE
> * WITHOUT_FIELD_REORDERING
> * WITH_FIELD_REORDERING
>
> Design doc [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
-- This message was sent by Atlassian Jira (v8.3.4#803005)
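The interface surface described in the issue can be illustrated with placeholder types. The real BeamSqlTable works with PBegin, PCollection, and Calcite RexNodes; the sketch below replaces those with plain-Java stand-ins and invented names, so it shows only the shape of the contract:

```java
import java.util.Arrays;
import java.util.List;

// Dependency-free sketch of the proposed push-down surface.
public class PushDownSketch {
    // Mirrors the ProjectSupport enum named in the description.
    public enum ProjectSupport {
        NONE, WITHOUT_FIELD_REORDERING, WITH_FIELD_REORDERING;

        public boolean isSupported() { return this != NONE; }
    }

    // Stand-in for BeamSqlTableFilter: reports predicates the IO cannot take.
    public interface TableFilter {
        List<String> getNotSupported();
    }

    // Stand-in for a table whose IO can read a reordered subset of fields.
    public static class ReorderingTable {
        public ProjectSupport supportsProjects() {
            return ProjectSupport.WITH_FIELD_REORDERING;
        }

        // The pushed-down projection: only these fields, in this order,
        // would be read from the source.
        public List<String> buildIOReader(List<String> fieldNames) {
            return fieldNames;
        }
    }

    public static void main(String[] args) {
        ReorderingTable table = new ReorderingTable();
        System.out.println(table.supportsProjects().isSupported()); // true
        System.out.println(table.buildIOReader(Arrays.asList("f2", "f1")));
    }
}
```

The NONE/WITHOUT_FIELD_REORDERING/WITH_FIELD_REORDERING split matters because an IO that preserves field order forces the planner to keep a reordering Calc above it, while one that can reorder lets the Calc be dropped entirely.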
[jira] [Updated] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8343:
Description:
The objective is to create a universal way for Beam SQL IO APIs to support predicate/project push-down.
A proposed way to achieve that is by introducing an interface responsible for identifying what portion(s) of a Calc can be moved down to the IO layer.
Also, adding the following methods to the BeamSqlTable interface to pass the necessary parameters to IO APIs:
- BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
- ProjectSupport supportsProjects()
- PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames)
* ProjectSupport is an enum with the following options:
* NONE
* WITHOUT_FIELD_REORDERING
* WITH_FIELD_REORDERING

Design doc [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].

was:
The objective is to create a universal way for Beam SQL IO APIs to support predicate/project push-down.
A proposed way to achieve that is by introducing an interface responsible for identifying what portion(s) of a Calc can be moved down to the IO layer.
Also, adding the following methods to the BeamSqlTable interface to pass the necessary parameters to IO APIs:
- BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
- Boolean supportsProjects()
- PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames)

Design doc [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
> Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
> ----------------------------------------------------------------------------------------------
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
> Issue Type: New Feature
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Major
> Time Spent: 5h
> Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support predicate/project push-down.
> A proposed way to achieve that is by introducing an interface responsible for identifying what portion(s) of a Calc can be moved down to the IO layer.
> Also, adding the following methods to the BeamSqlTable interface to pass the necessary parameters to IO APIs:
> - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
> - ProjectSupport supportsProjects()
> - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames)
>
> * ProjectSupport is an enum with the following options:
> * NONE
> * WITHOUT_FIELD_REORDERING
> * WITH_FIELD_REORDERING
>
> Design doc [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8514) ZetaSql should use cost-based optimization to take advantage of Join Reordering Rule and Push-Down Rule
[ https://issues.apache.org/jira/browse/BEAM-8514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968625#comment-16968625 ] Kirill Kozlov commented on BEAM-8514:
---
There are currently no ZetaSql tests to validate Join Reordering.

> ZetaSql should use cost-based optimization to take advantage of Join Reordering Rule and Push-Down Rule
> -------------------------------------------------------------------------------------------------------
>
> Key: BEAM-8514
> URL: https://issues.apache.org/jira/browse/BEAM-8514
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql-zetasql
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Default config should use BeamCostModel, as well as tests with custom configuration.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
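Why a cost model enables join reordering can be shown with a toy cost formula: given row-count estimates, joining the small tables first produces a much smaller intermediate result. The formula and numbers below are illustrative only and are not Beam's BeamCostModel:

```java
// Toy cost model: the planner picks the join order with the lower total cost.
public class JoinCostSketch {
    static final double SEL = 0.001; // assumed join selectivity

    // Naive cost: read both inputs and produce the join output.
    public static double joinCost(double leftRows, double rightRows) {
        return leftRows + rightRows + leftRows * rightRows * SEL;
    }

    public static double outRows(double leftRows, double rightRows) {
        return leftRows * rightRows * SEL;
    }

    // Compare two orders for (big JOIN small1 JOIN small2).
    public static String cheaperOrder(double big, double small1, double small2) {
        double bigFirst = joinCost(big, small1)
            + joinCost(outRows(big, small1), small2);
        double smallsFirst = joinCost(small1, small2)
            + joinCost(outRows(small1, small2), big);
        return bigFirst <= smallsFirst ? "big-first" : "smalls-first";
    }

    public static void main(String[] args) {
        // With a 1M-row table and two 100-row tables, joining the small
        // tables first wins under this toy model.
        System.out.println(cheaperOrder(1_000_000, 100, 100));
    }
}
```

A heuristic (rule-only) planner cannot make this choice, which is why the default config should plug in a cost model before the reordering and push-down rules can pay off.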
[jira] [Updated] (BEAM-8498) [SQL] MongoDb table reader should convert Documents directly to Row (and vice versa)
[ https://issues.apache.org/jira/browse/BEAM-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8498:
Summary: [SQL] MongoDb table reader should convert Documents directly to Row (and vice versa) (was: [SQL] MongoDb table reader should convert Documents directly to Row)

> [SQL] MongoDb table reader should convert Documents directly to Row (and vice versa)
> ------------------------------------------------------------------------------------
>
> Key: BEAM-8498
> URL: https://issues.apache.org/jira/browse/BEAM-8498
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Priority: Major
> Labels: performance
>
> As of right now, the process of reading from MongoDb in SQL is as follows:
> Read MongoDb Documents -> Convert Documents to JSON -> Convert JSON to Rows.
> It should be possible to get rid of the middle step by making use of RowWithGetters (vs RowWithStorage).
-- This message was sent by Atlassian Jira (v8.3.4#803005)
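The difference between eagerly copying a document into row storage and wrapping it with lazy getters can be sketched as follows. This is an illustration only, not Beam's actual RowWithStorage/RowWithGetters classes; a Map stands in for a MongoDb Document:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: eager storage-backed rows pay a full copy (the Document -> JSON ->
// Row path), while getter-backed rows read the source document on access.
public class LazyRowSketch {
    public interface Row {
        Object getValue(String field);
    }

    // Eager: copy every field into row storage up front.
    public static Row eager(Map<String, Object> doc) {
        Map<String, Object> storage = new HashMap<>(doc); // full copy
        return storage::get;
    }

    // Lazy: fields are fetched from the source document only when accessed,
    // skipping the intermediate conversion step entirely.
    public static Row lazy(Map<String, Object> doc) {
        return doc::get;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "beam");
        System.out.println(eager(doc).getValue("name"));
        System.out.println(lazy(doc).getValue("name"));
    }
}
```

Both rows answer the same queries; the getter-backed variant simply defers (and, for unread fields, avoids) the conversion work, which is the performance win the issue is after.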
[jira] [Resolved] (BEAM-8428) [SQL] BigQuery should support project push-down in DIRECT_READ mode
[ https://issues.apache.org/jira/browse/BEAM-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-8428.
Fix Version/s: 2.18.0
Resolution: Fixed

> [SQL] BigQuery should support project push-down in DIRECT_READ mode
> -------------------------------------------------------------------
>
> Key: BEAM-8428
> URL: https://issues.apache.org/jira/browse/BEAM-8428
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Major
> Fix For: 2.18.0
> Time Spent: 2h
> Remaining Estimate: 0h
>
> BigQuery should perform project push-down for read pipelines when applicable.
-- This message was sent by Atlassian Jira (v8.3.4#803005)