[jira] [Created] (BEAM-9871) Continuous performance tests for BigQuery read are not displayed on the dashboard

2020-05-01 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-9871:
---

 Summary: Continuous performance tests for BigQuery read are not 
displayed on the dashboard
 Key: BEAM-9871
 URL: https://issues.apache.org/jira/browse/BEAM-9871
 Project: Beam
  Issue Type: Bug
  Components: testing
Affects Versions: 2.20.0
Reporter: Kirill Kozlov


Dashboard: 
[https://apache-beam-testing.appspot.com/explore?dashboard=5687798237495296]

BigQuery reads in EXPORT and DIRECT_READ modes are not being displayed on the 
dashboard.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7609) SqlTransform#getSchema for "SELECT DISTINCT + JOIN" has invalid field names

2020-03-25 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066946#comment-17066946
 ] 

Kirill Kozlov commented on BEAM-7609:
-

Not sure if this issue is still reproducible.
Not actively working on this, will move to Unassigned.

> SqlTransform#getSchema for "SELECT DISTINCT + JOIN" has invalid field names
> ---
>
> Key: BEAM-7609
> URL: https://issues.apache.org/jira/browse/BEAM-7609
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.13.0
>Reporter: Gleb Kanterov
>Priority: Major
>
> Works in sqlline shell:
> {code}
> Welcome to Beam SQL 2.14.0-SNAPSHOT (based on sqlline version 1.4.0)
> 0: BeamSQL> CREATE EXTERNAL TABLE s1 (id BIGINT) TYPE 'test';
> No rows affected (0.507 seconds)
> 0: BeamSQL> CREATE EXTERNAL TABLE s2 (id BIGINT) TYPE 'test';
> No rows affected (0.004 seconds)
> 0: BeamSQL> SELECT DISTINCT s1.id as lhs, s2.id as rhs FROM s1 JOIN s2 USING 
> (id);
> +-+-+
> | lhs | rhs |
> +-+-+
> +-+-+
> No rows selected (2.568 seconds)
> {code}
> But doesn't work in the test:
> {code}
> Schema inputSchema = Schema.of(
>     Schema.Field.of("id", Schema.FieldType.INT32));
> PCollection<Row> i1 = p.apply(Create.of(ImmutableList.of())
>     .withCoder(SchemaCoder.of(inputSchema)));
> PCollection<Row> i2 = p.apply(Create.of(ImmutableList.of())
>     .withCoder(SchemaCoder.of(inputSchema)));
> Schema outputSchema = PCollectionTuple
>     .of("i1", i1)
>     .and("i2", i2)
>     .apply(SqlTransform.query(
>         "SELECT DISTINCT s1.id as lhs, s2.id as rhs FROM i1 JOIN i2 USING (id)"))
>     .getSchema();
> assertEquals(ImmutableList.of("lhs", "rhs"), outputSchema.getFieldNames());
> {code}





[jira] [Comment Edited] (BEAM-7610) SELECT COALESCE(...) isn't inferred as non-nullable field

2020-03-25 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066942#comment-17066942
 ] 

Kirill Kozlov edited comment on BEAM-7610 at 3/25/20, 6:10 PM:
---

The underlying issue was fixed in Calcite. Updating the vendored Calcite to 
version 1.22.0 or later should fix this issue.
Not actively working on this, will move to Unassigned.


was (Author: kirillkozlov):
The underlying issue was fixed in Calcite. Updating the vendored Calcite to 
version 1.22.0 or later should fix this issue.

> SELECT COALESCE(...) isn't inferred as non-nullable field
> -
>
> Key: BEAM-7610
> URL: https://issues.apache.org/jira/browse/BEAM-7610
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.13.0
>Reporter: Gleb Kanterov
>Priority: Major
>
> In Calcite, Coalesce is described as:
> {code}
> ReturnTypes.cascade(ReturnTypes.LEAST_RESTRICTIVE,
> SqlTypeTransforms.LEAST_NULLABLE)
> {code}
> However, giving non-null constant as an argument doesn't result in a 
> non-nullable expression:
> {code}
> Schema inputSchema = Schema.of(
>     Schema.Field.of("name", Schema.FieldType.STRING.withNullable(true)));
> PCollection<Row> input = p.apply(Create.of(ImmutableList.of())
>     .withCoder(SchemaCoder.of(inputSchema)));
> Schema outputSchema = input
>     .apply(SqlTransform.query(
>         "SELECT COALESCE(name, 'unknown') as name FROM PCOLLECTION"))
>     .getSchema();
> assertEquals(
>     Schema.builder().addStringField("name").build(),
>     outputSchema);
> {code}
> Not sure if it's a problem in Calcite or Beam SQL.
> There are no other functions that can be used to produce a non-nullable field.





[jira] [Assigned] (BEAM-7610) SELECT COALESCE(...) isn't inferred as non-nullable field

2020-03-25 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov reassigned BEAM-7610:
---

Assignee: (was: Kirill Kozlov)

> SELECT COALESCE(...) isn't inferred as non-nullable field
> -
>
> Key: BEAM-7610
> URL: https://issues.apache.org/jira/browse/BEAM-7610
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.13.0
>Reporter: Gleb Kanterov
>Priority: Major
>
> In Calcite, Coalesce is described as:
> {code}
> ReturnTypes.cascade(ReturnTypes.LEAST_RESTRICTIVE,
> SqlTypeTransforms.LEAST_NULLABLE)
> {code}
> However, giving non-null constant as an argument doesn't result in a 
> non-nullable expression:
> {code}
> Schema inputSchema = Schema.of(
>     Schema.Field.of("name", Schema.FieldType.STRING.withNullable(true)));
> PCollection<Row> input = p.apply(Create.of(ImmutableList.of())
>     .withCoder(SchemaCoder.of(inputSchema)));
> Schema outputSchema = input
>     .apply(SqlTransform.query(
>         "SELECT COALESCE(name, 'unknown') as name FROM PCOLLECTION"))
>     .getSchema();
> assertEquals(
>     Schema.builder().addStringField("name").build(),
>     outputSchema);
> {code}
> Not sure if it's a problem in Calcite or Beam SQL.
> There are no other functions that can be used to produce a non-nullable field.





[jira] [Assigned] (BEAM-7609) SqlTransform#getSchema for "SELECT DISTINCT + JOIN" has invalid field names

2020-03-25 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov reassigned BEAM-7609:
---

Assignee: (was: Kirill Kozlov)

> SqlTransform#getSchema for "SELECT DISTINCT + JOIN" has invalid field names
> ---
>
> Key: BEAM-7609
> URL: https://issues.apache.org/jira/browse/BEAM-7609
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.13.0
>Reporter: Gleb Kanterov
>Priority: Major
>
> Works in sqlline shell:
> {code}
> Welcome to Beam SQL 2.14.0-SNAPSHOT (based on sqlline version 1.4.0)
> 0: BeamSQL> CREATE EXTERNAL TABLE s1 (id BIGINT) TYPE 'test';
> No rows affected (0.507 seconds)
> 0: BeamSQL> CREATE EXTERNAL TABLE s2 (id BIGINT) TYPE 'test';
> No rows affected (0.004 seconds)
> 0: BeamSQL> SELECT DISTINCT s1.id as lhs, s2.id as rhs FROM s1 JOIN s2 USING 
> (id);
> +-+-+
> | lhs | rhs |
> +-+-+
> +-+-+
> No rows selected (2.568 seconds)
> {code}
> But doesn't work in the test:
> {code}
> Schema inputSchema = Schema.of(
>     Schema.Field.of("id", Schema.FieldType.INT32));
> PCollection<Row> i1 = p.apply(Create.of(ImmutableList.of())
>     .withCoder(SchemaCoder.of(inputSchema)));
> PCollection<Row> i2 = p.apply(Create.of(ImmutableList.of())
>     .withCoder(SchemaCoder.of(inputSchema)));
> Schema outputSchema = PCollectionTuple
>     .of("i1", i1)
>     .and("i2", i2)
>     .apply(SqlTransform.query(
>         "SELECT DISTINCT s1.id as lhs, s2.id as rhs FROM i1 JOIN i2 USING (id)"))
>     .getSchema();
> assertEquals(ImmutableList.of("lhs", "rhs"), outputSchema.getFieldNames());
> {code}





[jira] [Commented] (BEAM-7610) SELECT COALESCE(...) isn't inferred as non-nullable field

2020-03-25 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066942#comment-17066942
 ] 

Kirill Kozlov commented on BEAM-7610:
-

The underlying issue was fixed in Calcite. Updating the vendored Calcite to 
version 1.22.0 or later should fix this issue.

> SELECT COALESCE(...) isn't inferred as non-nullable field
> -
>
> Key: BEAM-7610
> URL: https://issues.apache.org/jira/browse/BEAM-7610
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.13.0
>Reporter: Gleb Kanterov
>Assignee: Kirill Kozlov
>Priority: Major
>
> In Calcite, Coalesce is described as:
> {code}
> ReturnTypes.cascade(ReturnTypes.LEAST_RESTRICTIVE,
> SqlTypeTransforms.LEAST_NULLABLE)
> {code}
> However, giving non-null constant as an argument doesn't result in a 
> non-nullable expression:
> {code}
> Schema inputSchema = Schema.of(
>     Schema.Field.of("name", Schema.FieldType.STRING.withNullable(true)));
> PCollection<Row> input = p.apply(Create.of(ImmutableList.of())
>     .withCoder(SchemaCoder.of(inputSchema)));
> Schema outputSchema = input
>     .apply(SqlTransform.query(
>         "SELECT COALESCE(name, 'unknown') as name FROM PCOLLECTION"))
>     .getSchema();
> assertEquals(
>     Schema.builder().addStringField("name").build(),
>     outputSchema);
> {code}
> Not sure if it's a problem in Calcite or Beam SQL.
> There are no other functions that can be used to produce a non-nullable field.





[jira] [Closed] (BEAM-9358) BigQueryIO potential write speed regression

2020-02-24 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov closed BEAM-9358.
---
Fix Version/s: 2.19.0
   Resolution: Resolved

> BigQueryIO potential write speed regression
> ---
>
> Key: BEAM-9358
> URL: https://issues.apache.org/jira/browse/BEAM-9358
> Project: Beam
>  Issue Type: Task
>  Components: io-py-gcp
>Affects Versions: 2.19.0
>Reporter: Kirill Kozlov
>Priority: Minor
> Fix For: 2.19.0
>
>
> There is a drastic decrease in Megabytes/second write speeds (from ~50 to ~5) 
> [1], as well as a 10x increase in runtime [2], for Python BigQueryIO in the 
> PerfKit dashboard.
> Seems fairly recent: it started on Feb 20th and continued on Feb 21st. Maybe 
> a flake, but still worth investigating.
> [1] 
> [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=1939451611&container=847031938]
> [2] 
> [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=2088160722&container=15365888]





[jira] [Commented] (BEAM-9358) BigQueryIO potential write speed regression

2020-02-24 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043694#comment-17043694
 ] 

Kirill Kozlov commented on BEAM-9358:
-

Looks like everything went back to normal on Feb 23rd, closing this issue.

> BigQueryIO potential write speed regression
> ---
>
> Key: BEAM-9358
> URL: https://issues.apache.org/jira/browse/BEAM-9358
> Project: Beam
>  Issue Type: Task
>  Components: io-py-gcp
>Affects Versions: 2.19.0
>Reporter: Kirill Kozlov
>Priority: Minor
>
> There is a drastic decrease in Megabytes/second write speeds (from ~50 to ~5) 
> [1], as well as a 10x increase in runtime [2], for Python BigQueryIO in the 
> PerfKit dashboard.
> Seems fairly recent: it started on Feb 20th and continued on Feb 21st. Maybe 
> a flake, but still worth investigating.
> [1] 
> [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=1939451611&container=847031938]
> [2] 
> [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=2088160722&container=15365888]





[jira] [Updated] (BEAM-9358) BigQueryIO potential write speed regression

2020-02-21 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-9358:

Description: 
There is a drastic decrease in Megabytes/second write speeds (from ~50 to ~5) 
[1], as well as a 10x increase in runtime [2], for Python BigQueryIO in the 
PerfKit dashboard.

Seems fairly recent: it started on Feb 20th and continued on Feb 21st. Maybe a 
flake, but still worth investigating.

[1] 
[https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=1939451611&container=847031938]

[2] 
[https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=2088160722&container=15365888]

  was:
There is a drastic decrease in Megabytes/second write speeds (from ~50 to ~5), 
as well as a 10x increase in runtime [2], for Python BigQueryIO in the PerfKit 
dashboard [1].

Seems fairly recent: it started on Feb 20th and continued on Feb 21st. Maybe a 
flake, but still worth investigating.

[1] 
[https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=1939451611&container=847031938]

[2] 
[https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=2088160722&container=15365888]


> BigQueryIO potential write speed regression
> ---
>
> Key: BEAM-9358
> URL: https://issues.apache.org/jira/browse/BEAM-9358
> Project: Beam
>  Issue Type: Task
>  Components: io-py-gcp
>Affects Versions: 2.19.0
>Reporter: Kirill Kozlov
>Priority: Minor
>
> There is a drastic decrease in Megabytes/second write speeds (from ~50 to ~5) 
> [1], as well as a 10x increase in runtime [2], for Python BigQueryIO in the 
> PerfKit dashboard.
> Seems fairly recent: it started on Feb 20th and continued on Feb 21st. Maybe 
> a flake, but still worth investigating.
> [1] 
> [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=1939451611&container=847031938]
> [2] 
> [https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=2088160722&container=15365888]





[jira] [Created] (BEAM-9358) BigQueryIO potential write speed regression

2020-02-21 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-9358:
---

 Summary: BigQueryIO potential write speed regression
 Key: BEAM-9358
 URL: https://issues.apache.org/jira/browse/BEAM-9358
 Project: Beam
  Issue Type: Task
  Components: io-py-gcp
Affects Versions: 2.19.0
Reporter: Kirill Kozlov


There is a drastic decrease in Megabytes/second write speeds (from ~50 to ~5), 
as well as a 10x increase in runtime [2], for Python BigQueryIO in the PerfKit 
dashboard [1].

Seems fairly recent: it started on Feb 20th and continued on Feb 21st. Maybe a 
flake, but still worth investigating.

[1] 
[https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=1939451611&container=847031938]

[2] 
[https://apache-beam-testing.appspot.com/explore?dashboard=5667383922393088&widget=2088160722&container=15365888]





[jira] [Commented] (BEAM-4457) Analyze FieldAccessDescriptors and drop fields that are never accessed

2020-02-04 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030165#comment-17030165
 ] 

Kirill Kozlov commented on BEAM-4457:
-

Definitely a cool idea! Yes, right now push-down heavily relies on the SQL 
optimizer rules, but this can help simplify them.

> Analyze FieldAccessDescriptors and drop fields that are never accessed
> --
>
> Key: BEAM-4457
> URL: https://issues.apache.org/jira/browse/BEAM-4457
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-gcp
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>
> We can walk backwards through the graph, analyzing which fields are accessed. 
> When we find paths where many fields are never accessed, we can insert a 
> projection transform to drop those fields preemptively. This can save a lot 
> of resources in the case where many fields in the input are never accessed.
> To do this, the FieldAccessDescriptor information must be added to the 
> portability protos. 
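The idea quoted above (walk the graph backwards, collect the fields each downstream stage accesses, and project away everything else) could be sketched as follows. This is a hypothetical, simplified illustration using only the JDK; `prunedFields` is an invented name and this is not Beam's actual FieldAccessDescriptor machinery:

```java
import java.util.*;

public class FieldPruneSketch {
    // Sketch of field pruning: given the fields each downstream stage accesses,
    // keep only their union and "project away" every other field of the schema.
    static List<String> prunedFields(List<String> schemaFields,
                                     List<Set<String>> accessedPerStage) {
        Set<String> accessed = new HashSet<>();
        for (Set<String> stage : accessedPerStage) {
            accessed.addAll(stage);
        }
        List<String> kept = new ArrayList<>();
        for (String f : schemaFields) {
            if (accessed.contains(f)) {
                kept.add(f); // field is read somewhere downstream, keep it
            }
        }
        return kept; // a projection transform would emit rows with only these fields
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name", "payload", "ts");
        List<Set<String>> stages = Arrays.asList(
                new HashSet<>(Arrays.asList("id")),
                new HashSet<>(Arrays.asList("id", "ts")));
        // "name" and "payload" are never accessed, so they can be dropped early
        System.out.println(prunedFields(schema, stages)); // [id, ts]
    }
}
```

In the real proposal the accessed-field information would come from FieldAccessDescriptors carried in the portability protos, not from hand-built sets.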





[jira] [Resolved] (BEAM-8042) Parsing of aggregate query fails

2020-01-30 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-8042.
-
Fix Version/s: 2.20.0
   Resolution: Fixed

> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Kirill Kozlov
>Priority: Critical
> Fix For: 2.20.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> {code}
>   @Rule
>   public TestPipeline pipeline =
>       TestPipeline.fromOptions(createPipelineOptions());
>
>   private static PipelineOptions createPipelineOptions() {
>     BeamSqlPipelineOptions opts =
>         PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class);
>     opts.setPlannerName(ZetaSQLQueryPlanner.class.getName());
>     return opts;
>   }
>
>   @Test
>   public void testAggregate() {
>     Schema inputSchema = Schema.builder()
>         .addByteArrayField("id")
>         .addInt64Field("has_f1")
>         .addInt64Field("has_f2")
>         .addInt64Field("has_f3")
>         .addInt64Field("has_f4")
>         .addInt64Field("has_f5")
>         .addInt64Field("has_f6")
>         .build();
>     String sql = "SELECT \n" +
>         "  id, \n" +
>         "  COUNT(*) as count, \n" +
>         "  SUM(has_f1) as f1_count, \n" +
>         "  SUM(has_f2) as f2_count, \n" +
>         "  SUM(has_f3) as f3_count, \n" +
>         "  SUM(has_f4) as f4_count, \n" +
>         "  SUM(has_f5) as f5_count, \n" +
>         "  SUM(has_f6) as f6_count  \n" +
>         "FROM PCOLLECTION \n" +
>         "GROUP BY id";
>     pipeline
>         .apply(Create.empty(inputSchema))
>         .apply(SqlTransform.query(sql));
>     pipeline.run();
>   }
> {code}
> {code}
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>   at 
>   ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>   at 
> org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
>   ... 48 more
> {code}





[jira] [Commented] (BEAM-9027) [SQL] ZetaSQL unparsing should produce valid result

2020-01-24 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023346#comment-17023346
 ] 

Kirill Kozlov commented on BEAM-9027:
-

Thanks! Yes, feel free to assign it to yourself.

> [SQL] ZetaSQL unparsing should produce valid result
> ---
>
> Key: BEAM-9027
> URL: https://issues.apache.org/jira/browse/BEAM-9027
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> * ZetaSQL does not recognize keyword INTERVAL
>  * Calcite cannot unparse RexNode back to bytes literal
>  * Calcite cannot unparse some floating point literals correctly
>  * Calcite cannot unparse some string literals correctly
>  * Calcite cannot unparse types correctly for CAST function





[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing

2020-01-24 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020681#comment-17020681
 ] 

Kirill Kozlov edited comment on BEAM-9169 at 1/24/20 10:00 PM:
---

[~robinyqiu] This might be the cause: 
[https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]

 

Edit:

This was added to ensure that strings with special characters (like: 
{noformat}
SELECT 'abc\\n'{noformat}
) get escaped.

It looks like the `StringEscapeUtils.escapeJava` method escapes characters that 
ZetaSQL does not expect to be escaped.


was (Author: kirillkozlov):
[~robinyqiu] This might be the cause: 
[https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]

 

Edit:

This was added to ensure that strings with special characters (like: 
{noformat}
SELECT 'abc\\n'{noformat}
) get escaped.

It looks like the `StringEscapeUtils.escapeJava` method escapes characters that 
ZetaSQL does not expect to be escaped.

> Extra character introduced during Calcite unparsing
> ---
>
> Key: BEAM-9169
> URL: https://issues.apache.org/jira/browse/BEAM-9169
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Kirill Kozlov
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When I am testing query string
> {code:java}
> "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", 
> \"America/Los_Angeles\")"{code}
> on BeamZetaSqlCalcRel I found that the second string parameter to the 
> function is unparsed to
> {code:java}
> America\/Los_Angeles{code}
> (note an extra backslash character is added).
> This breaks the ZetaSQL evaluator with error
> {code:java}
> Syntax error: Illegal escape sequence: \/{code}
> From what I can see now this character is introduced during the Calcite 
> unparsing step.





[jira] [Resolved] (BEAM-9169) Extra character introduced during Calcite unparsing

2020-01-24 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-9169.
-
Fix Version/s: 2.20.0
   Resolution: Fixed

> Extra character introduced during Calcite unparsing
> ---
>
> Key: BEAM-9169
> URL: https://issues.apache.org/jira/browse/BEAM-9169
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Kirill Kozlov
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When I am testing query string
> {code:java}
> "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", 
> \"America/Los_Angeles\")"{code}
> on BeamZetaSqlCalcRel I found that the second string parameter to the 
> function is unparsed to
> {code:java}
> America\/Los_Angeles{code}
> (note an extra backslash character is added).
> This breaks the ZetaSQL evaluator with error
> {code:java}
> Syntax error: Illegal escape sequence: \/{code}
> From what I can see now this character is introduced during the Calcite 
> unparsing step.





[jira] [Updated] (BEAM-8890) Update google-cloud-java version

2020-01-24 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8890:

Description: 
google-cloud-java versions below v0.117.0 have a limit on gRPC message size 
(~11 MB).

This is a problem when reading in DIRECT_READ (Read API) mode from BigQuery 
tables with large rows.

Updating to v0.117.0 or above should fix the problem.

[https://github.com/googleapis/google-cloud-java/blob/v0.117.0/google-cloud-clients/google-cloud-bigquerystorage/pom.xml]

  was:
google-cloud-java versions below v0.117.0 have a limit on gRPC message size 
(~11 MB).

This is a problem when reading in DIRECT_READ (Read API) mode from BigQuery 
tables with large rows.

Updating to v0.117.0 or above should fix the problem.


> Update google-cloud-java version
> 
>
> Key: BEAM-8890
> URL: https://issues.apache.org/jira/browse/BEAM-8890
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Kirill Kozlov
>Priority: Major
>
> google-cloud-java versions below v0.117.0 have a limit on gRPC message size 
> (~11 MB).
> This is a problem when reading in DIRECT_READ (Read API) mode from BigQuery 
> tables with large rows.
> Updating to v0.117.0 or above should fix the problem.
> [https://github.com/googleapis/google-cloud-java/blob/v0.117.0/google-cloud-clients/google-cloud-bigquerystorage/pom.xml]





[jira] [Resolved] (BEAM-9072) [SQL] Add support for Datastore source

2020-01-23 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-9072.
-
Fix Version/s: 2.20.0
   Resolution: Fixed

> [SQL] Add support for Datastore source
> --
>
> Key: BEAM-9072
> URL: https://issues.apache.org/jira/browse/BEAM-9072
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> * Create a Datastore table and table provider
>  * Conversion between Datastore and Beam data types
>  * Implement buildIOReader
>  * Implement buildIOWrite
>  * Implement getTableStatistics
> Doc: 
> [https://docs.google.com/document/d/1FxuEGewJ3GPDl0IKglfOYf1edwa2m_wryFZYRMpRNbA/edit?pli=1]





[jira] [Created] (BEAM-9180) [ZetaSQL] Support 4-byte unicode in literal string unparsing

2020-01-23 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-9180:
---

 Summary: [ZetaSQL] Support 4-byte unicode in literal string 
unparsing
 Key: BEAM-9180
 URL: https://issues.apache.org/jira/browse/BEAM-9180
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql-zetasql
Reporter: Kirill Kozlov


When unparsing literal strings we need to escape special symbols (ex: `\n`, 
`\r`, `\u0012`).

ZetaSQL supports some 4-byte (8 hex digit) unicode escapes via `\U`.

As of 
[now|https://github.com/apache/beam/blob/8a35f408f640d04c38ad6e2a497d30410b3bff32/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L59]
 only 2-byte (4 hex digit) unicode is supported, escaped via `\u`.

 

More about escape sequences here (need to scroll down a little): 

https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical
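For illustration, escaping supplementary code points in the `\U` form could look like the following sketch. This is hypothetical JDK-only code, not the actual Beam implementation; the class and method names are made up:

```java
public class UnicodeEscape {
    // Hypothetical sketch: escape supplementary (4-byte) code points as
    // \UXXXXXXXX and control characters as \uXXXX; pass everything else through.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp > 0xFFFF) {
                // 8-hex-digit form for code points outside the BMP
                sb.append(String.format("\\U%08X", cp));
            } else if (cp < 0x20) {
                // 4-hex-digit form for control characters
                sb.append(String.format("\\u%04x", cp));
            } else {
                sb.append((char) cp);
            }
            i += Character.charCount(cp); // advance 1 or 2 chars per code point
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F600 (a 4-byte emoji, two Java chars) becomes \U0001F600
        System.out.println(escape("smile \uD83D\uDE00"));
    }
}
```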





[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing

2020-01-22 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021459#comment-17021459
 ] 

Kirill Kozlov commented on BEAM-9169:
-

Created a PR to use a custom escape method. The issue involving `/` should be 
fixed now, but I noticed that we might have some issues escaping single quotes.
{code:java}
SELECT 'it\'s'
{code}
A correctly escaped string with single quotes gets corrupted by 
SqlCharStringLiteral.
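A custom escape method along these lines might look like the following sketch. This is a hypothetical illustration, not the actual PR: it leaves `/` untouched (avoiding the "Illegal escape sequence: \/" error) while still escaping backslashes, quotes, and control characters:

```java
public class CustomEscape {
    // Hypothetical sketch: escape only what ZetaSQL requires in a single-quoted
    // string literal; printable characters such as '/' pass through untouched.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c); // '/' and other printable chars unchanged
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // '/' is not escaped, so no "Illegal escape sequence: \/"
        System.out.println(escape("America/Los_Angeles"));
        // single quote is escaped for use inside a single-quoted literal
        System.out.println(escape("it's"));
    }
}
```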

> Extra character introduced during Calcite unparsing
> ---
>
> Key: BEAM-9169
> URL: https://issues.apache.org/jira/browse/BEAM-9169
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When I am testing query string
> {code:java}
> "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", 
> \"America/Los_Angeles\")"{code}
> on BeamZetaSqlCalcRel I found that the second string parameter to the 
> function is unparsed to
> {code:java}
> America\/Los_Angeles{code}
> (note an extra backslash character is added).
> This breaks the ZetaSQL evaluator with error
> {code:java}
> Syntax error: Illegal escape sequence: \/{code}
> From what I can see now this character is introduced during the Calcite 
> unparsing step.





[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing

2020-01-21 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020690#comment-17020690
 ] 

Kirill Kozlov commented on BEAM-9169:
-

Yes, we might be able to fix this by specifying what needs escaping and what 
does not (or using a different escaping method). Not sure how to achieve that, 
but I will look into it.

> Extra character introduced during Calcite unparsing
> ---
>
> Key: BEAM-9169
> URL: https://issues.apache.org/jira/browse/BEAM-9169
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Kirill Kozlov
>Priority: Minor
>
> When I am testing query string
> {code:java}
> "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", 
> \"America/Los_Angeles\")"{code}
> on BeamZetaSqlCalcRel I found that the second string parameter to the 
> function is unparsed to
> {code:java}
> America\/Los_Angeles{code}
> (note an extra backslash character is added).
> This breaks the ZetaSQL evaluator with error
> {code:java}
> Syntax error: Illegal escape sequence: \/{code}
> From what I can see now this character is introduced during the Calcite 
> unparsing step.





[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing

2020-01-21 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020681#comment-17020681
 ] 

Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:30 AM:
---

[~robinyqiu] This might be the cause: 
[https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]

 

Edit:

This was added to ensure that strings with special characters (like: 
{noformat}
SELECT 'abc\\n'{noformat}
) get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes 
characters that ZetaSQL does not expect to be escaped.


was (Author: kirillkozlov):
[~robinyqiu] This might be the cause: 
[https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]

 

Edit:

This was added to ensure that strings with special characters (like: 'abc\\n') 
get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes 
characters that ZetaSQL does not expect to be escaped.

> Extra character introduced during Calcite unparsing
> ---
>
> Key: BEAM-9169
> URL: https://issues.apache.org/jira/browse/BEAM-9169
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Kirill Kozlov
>Priority: Minor
>
> When I am testing query string
> {code:java}
> "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", 
> \"America/Los_Angeles\")"{code}
> on BeamZetaSqlCalcRel I found that the second string parameter to the 
> function is unparsed to
> {code:java}
> America\/Los_Angeles{code}
> (note an extra backslash character is added).
> This breaks the ZetaSQL evaluator with error
> {code:java}
> Syntax error: Illegal escape sequence: \/{code}
> From what I can see now this character is introduced during the Calcite 
> unparsing step.





[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing

2020-01-21 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020681#comment-17020681
 ] 

Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:30 AM:
---

[~robinyqiu] This might be the cause: 
[https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]

 

Edit:

This was added to ensure that strings with special characters (like: 
{noformat}
SELECT 'abc\\n'{noformat}
) get escaped.

It looks like the `StringEscapeUtils.escapeJava` method escapes characters that 
ZetaSQL does not expect to be escaped.


was (Author: kirillkozlov):
[~robinyqiu] This might be the cause: 
[https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]

 

Edit:

This was added to ensure that strings with special characters (like: '
{noformat}
SELECT 'abc\\n'{noformat}
') get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes 
characters that ZetaSQL does not expect to be escaped.

> Extra character introduced during Calcite unparsing
> ---
>
> Key: BEAM-9169
> URL: https://issues.apache.org/jira/browse/BEAM-9169
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Kirill Kozlov
>Priority: Minor
>
> When I am testing query string
> {code:java}
> "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", 
> \"America/Los_Angeles\")"{code}
> on BeamZetaSqlCalcRel I found that the second string parameter to the 
> function is unparsed to
> {code:java}
> America\/Los_Angeles{code}
> (note an extra backslash character is added).
> This breaks the ZetaSQL evaluator with error
> {code:java}
> Syntax error: Illegal escape sequence: \/{code}
> From what I can see now this character is introduced during the Calcite 
> unparsing step.





[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing

2020-01-21 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020681#comment-17020681
 ] 

Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:28 AM:
---

[~robinyqiu] This might be the cause: 
[https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]

 

Edit:

This was added to ensure that strings with special characters (like: 'abc\\n') 
get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes 
characters that ZetaSQL does not expect to be escaped.


was (Author: kirillkozlov):
[~robinyqiu] This might be the cause: 
[https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]

> Extra character introduced during Calcite unparsing
> ---
>
> Key: BEAM-9169
> URL: https://issues.apache.org/jira/browse/BEAM-9169
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Minor
>
> When I am testing query string
> {code:java}
> "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", 
> \"America/Los_Angeles\")"{code}
> on BeamZetaSqlCalcRel I found that the second string parameter to the 
> function is unparsed to
> {code:java}
> America\/Los_Angeles{code}
> (note an extra backslash character is added).
> This breaks the ZetaSQL evaluator with error
> {code:java}
> Syntax error: Illegal escape sequence: \/{code}
> From what I can see now this character is introduced during the Calcite 
> unparsing step.





[jira] [Comment Edited] (BEAM-9169) Extra character introduced during Calcite unparsing

2020-01-21 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020681#comment-17020681
 ] 

Kirill Kozlov edited comment on BEAM-9169 at 1/22/20 12:28 AM:
---

[~robinyqiu] This might be the cause: 
[https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]

 

Edit:

This was added to ensure that strings with special characters (like: 'abc\\n') 
get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes 
characters that ZetaSQL does not expect to be escaped.


was (Author: kirillkozlov):
[~robinyqiu] This might be the cause: 
[https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]

 

Edit:

This was added to ensure that strings with special characters (like: 'abc\\n') 
get escaped. It looks like the `StringEscapeUtils.escapeJava` method escapes 
characters that ZetaSQL does not expect to be escaped.

> Extra character introduced during Calcite unparsing
> ---
>
> Key: BEAM-9169
> URL: https://issues.apache.org/jira/browse/BEAM-9169
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Minor
>
> When I am testing query string
> {code:java}
> "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", 
> \"America/Los_Angeles\")"{code}
> on BeamZetaSqlCalcRel I found that the second string parameter to the 
> function is unparsed to
> {code:java}
> America\/Los_Angeles{code}
> (note an extra backslash character is added).
> This breaks the ZetaSQL evaluator with error
> {code:java}
> Syntax error: Illegal escape sequence: \/{code}
> From what I can see now this character is introduced during the Calcite 
> unparsing step.





[jira] [Commented] (BEAM-9169) Extra character introduced during Calcite unparsing

2020-01-21 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020681#comment-17020681
 ] 

Kirill Kozlov commented on BEAM-9169:
-

[~robinyqiu] This might be the cause: 
[https://github.com/11moon11/beam/blob/d7ef497b2360ea68f2bad9fb695a539b12cb1ddd/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BeamSqlUnparseContext.java#L52]

> Extra character introduced during Calcite unparsing
> ---
>
> Key: BEAM-9169
> URL: https://issues.apache.org/jira/browse/BEAM-9169
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Minor
>
> When I am testing query string
> {code:java}
> "SELECT STRING(TIMESTAMP \"2008-12-25 15:30:00\", 
> \"America/Los_Angeles\")"{code}
> on BeamZetaSqlCalcRel I found that the second string parameter to the 
> function is unparsed to
> {code:java}
> America\/Los_Angeles{code}
> (note an extra backslash character is added).
> This breaks the ZetaSQL evaluator with error
> {code:java}
> Syntax error: Illegal escape sequence: \/{code}
> From what I can see now this character is introduced during the Calcite 
> unparsing step.





[jira] [Work started] (BEAM-8042) Parsing of aggregate query fails

2020-01-21 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-8042 started by Kirill Kozlov.
---
> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Kirill Kozlov
>Priority: Critical
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code}
>   @Rule
>   public TestPipeline pipeline = 
> TestPipeline.fromOptions(createPipelineOptions());
>   private static PipelineOptions createPipelineOptions() {
> BeamSqlPipelineOptions opts = 
> PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class);
> opts.setPlannerName(ZetaSQLQueryPlanner.class.getName());
> return opts;
>   }
>   @Test
>   public void testAggregate() {
> Schema inputSchema = Schema.builder()
> .addByteArrayField("id")
> .addInt64Field("has_f1")
> .addInt64Field("has_f2")
> .addInt64Field("has_f3")
> .addInt64Field("has_f4")
> .addInt64Field("has_f5")
> .addInt64Field("has_f6")
> .build();
> String sql = "SELECT \n" +
> "  id, \n" +
> "  COUNT(*) as count, \n" +
> "  SUM(has_f1) as f1_count, \n" +
> "  SUM(has_f2) as f2_count, \n" +
> "  SUM(has_f3) as f3_count, \n" +
> "  SUM(has_f4) as f4_count, \n" +
> "  SUM(has_f5) as f5_count, \n" +
> "  SUM(has_f6) as f6_count  \n" +
> "FROM PCOLLECTION \n" +
> "GROUP BY id";
> pipeline
> .apply(Create.empty(inputSchema))
> .apply(SqlTransform.query(sql));
> pipeline.run();
>   }
> {code}
> {code}
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>   at 
>   ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>   at 
> org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
>   ... 48 more
> {code}





[jira] [Assigned] (BEAM-8042) Parsing of aggregate query fails

2020-01-21 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov reassigned BEAM-8042:
---

Assignee: Kirill Kozlov

> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Kirill Kozlov
>Priority: Critical
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
>   @Rule
>   public TestPipeline pipeline = 
> TestPipeline.fromOptions(createPipelineOptions());
>   private static PipelineOptions createPipelineOptions() {
> BeamSqlPipelineOptions opts = 
> PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class);
> opts.setPlannerName(ZetaSQLQueryPlanner.class.getName());
> return opts;
>   }
>   @Test
>   public void testAggregate() {
> Schema inputSchema = Schema.builder()
> .addByteArrayField("id")
> .addInt64Field("has_f1")
> .addInt64Field("has_f2")
> .addInt64Field("has_f3")
> .addInt64Field("has_f4")
> .addInt64Field("has_f5")
> .addInt64Field("has_f6")
> .build();
> String sql = "SELECT \n" +
> "  id, \n" +
> "  COUNT(*) as count, \n" +
> "  SUM(has_f1) as f1_count, \n" +
> "  SUM(has_f2) as f2_count, \n" +
> "  SUM(has_f3) as f3_count, \n" +
> "  SUM(has_f4) as f4_count, \n" +
> "  SUM(has_f5) as f5_count, \n" +
> "  SUM(has_f6) as f6_count  \n" +
> "FROM PCOLLECTION \n" +
> "GROUP BY id";
> pipeline
> .apply(Create.empty(inputSchema))
> .apply(SqlTransform.query(sql));
> pipeline.run();
>   }
> {code}
> {code}
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>   at 
>   ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>   at 
> org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
>   ... 48 more
> {code}





[jira] [Updated] (BEAM-9164) [PreCommit_Java] [Flake] org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest

2020-01-21 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-9164:

Summary: [PreCommit_Java] [Flake] 
org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest
  (was: Flaky test: 
org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest)

> [PreCommit_Java] [Flake] 
> org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest
> ---
>
> Key: BEAM-9164
> URL: https://issues.apache.org/jira/browse/BEAM-9164
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kirill Kozlov
>Priority: Major
>  Labels: flake
>
> Test:
> org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest
>  >> testWatermarkEmission[numTasks = 1; numSplits=1]
> Fails with the following exception:
> {code:java}
> org.junit.runners.model.TestTimedOutException: test timed out after 3 
> milliseconds{code}
> Affected Jenkins job: 
> [https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1665/]
> Gradle build scan: [https://scans.gradle.com/s/nvgeb425fe63q]





[jira] [Created] (BEAM-9164) Flaky test: org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest

2020-01-21 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-9164:
---

 Summary: Flaky test: 
org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest
 Key: BEAM-9164
 URL: https://issues.apache.org/jira/browse/BEAM-9164
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Kirill Kozlov


Test:

org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapperTest$ParameterizedUnboundedSourceWrapperTest
 >> testWatermarkEmission[numTasks = 1; numSplits=1]

Fails with the following exception:
{code:java}
org.junit.runners.model.TestTimedOutException: test timed out after 3 
milliseconds{code}
Affected Jenkins job: 
[https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1665/]

Gradle build scan: [https://scans.gradle.com/s/nvgeb425fe63q]





[jira] [Updated] (BEAM-9072) [SQL] Add support for Datastore source

2020-01-14 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-9072:

Description: 
* Create a Datastore table and table provider
 * Conversion between Datastore and Beam data types
 * Implement buildIOReader
 * Implement buildIOWrite
 * Implement getTableStatistics

Doc: 
[https://docs.google.com/document/d/1FxuEGewJ3GPDl0IKglfOYf1edwa2m_wryFZYRMpRNbA/edit?pli=1]

  was:
* Create a Datastore table and table provider
 * Conversion between Datastore and Beam data types
 * Implement buildIOReader
 * Implement buildIOWrite
 * Implement getTableStatistics


> [SQL] Add support for Datastore source
> --
>
> Key: BEAM-9072
> URL: https://issues.apache.org/jira/browse/BEAM-9072
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> * Create a Datastore table and table provider
>  * Conversion between Datastore and Beam data types
>  * Implement buildIOReader
>  * Implement buildIOWrite
>  * Implement getTableStatistics
> Doc: 
> [https://docs.google.com/document/d/1FxuEGewJ3GPDl0IKglfOYf1edwa2m_wryFZYRMpRNbA/edit?pli=1]





[jira] [Comment Edited] (BEAM-9027) [SQL] ZetaSQL unparsing should produce valid result

2020-01-14 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010124#comment-17010124
 ] 

Kirill Kozlov edited comment on BEAM-9027 at 1/14/20 8:51 PM:
--

Brief update on this issue.

Following unparsing errors have been fixed:
 * Calcite cannot unparse RexNode back to bytes literal
 * Calcite cannot unparse some floating point literals correctly
 * Calcite cannot unparse some string literals correctly

Other issues from the initial list (INTERVAL and CAST) still need to be fixed.


was (Author: kirillkozlov):
Brief update on this issue.

The following unparsing errors have been fixed:
 * Calcite cannot unparse RexNode back to bytes literal
 * Calcite cannot unparse some floating point literals correctly
 * Calcite cannot unparse some string literals correctly

Other issues from the initial list (INTERVAL and CAST) still need to be fixed.

> [SQL] ZetaSQL unparsing should produce valid result
> ---
>
> Key: BEAM-9027
> URL: https://issues.apache.org/jira/browse/BEAM-9027
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> * ZetaSQL does not recognize keyword INTERVAL
>  * Calcite cannot unparse RexNode back to bytes literal
>  * Calcite cannot unparse some floating point literals correctly
>  * Calcite cannot unparse some string literals correctly
>  * Calcite cannot unparse types correctly for CAST function





[jira] [Resolved] (BEAM-8844) [SQL] Create performance tests for BigQueryTable

2020-01-14 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-8844.
-
Fix Version/s: 2.19.0
   Resolution: Fixed

> [SQL] Create performance tests for BigQueryTable
> 
>
> Key: BEAM-8844
> URL: https://issues.apache.org/jira/browse/BEAM-8844
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> They should measure read-time for:
>  * DIRECT_READ w/o push-down
>  * DIRECT_READ w/ push-down
>  * DEFAULT





[jira] [Commented] (BEAM-8844) [SQL] Create performance tests for BigQueryTable

2020-01-13 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014717#comment-17014717
 ] 

Kirill Kozlov commented on BEAM-8844:
-

Results can be viewed here: 
[https://apache-beam-testing.appspot.com/explore?dashboard=5687798237495296]

The problem I have noticed with these tests: they rely on public datasets, which 
are periodically updated with new data, making test results inconsistent.

We should make a copy of the dataset and use it for performance tests.

> [SQL] Create performance tests for BigQueryTable
> 
>
> Key: BEAM-8844
> URL: https://issues.apache.org/jira/browse/BEAM-8844
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> They should measure read-time for:
>  * DIRECT_READ w/o push-down
>  * DIRECT_READ w/ push-down
>  * DEFAULT





[jira] [Created] (BEAM-9114) BigQueryIO TableRowParser should support Arrow and Avro data formats

2020-01-13 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-9114:
---

 Summary: BigQueryIO TableRowParser should support Arrow and Avro 
data formats
 Key: BEAM-9114
 URL: https://issues.apache.org/jira/browse/BEAM-9114
 Project: Beam
  Issue Type: Improvement
  Components: io-java-gcp
Reporter: Kirill Kozlov


We should implement a function like BigQueryIO#readRows that encapsulates the row 
conversion logic [1] for both the Arrow and Avro data formats.

BigQueryTable#buildIOReader in the SQL extension should call the function 
described above.

 

[1] https://github.com/apache/beam/pull/10369#discussion_r365979515
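A minimal sketch of the proposed shape (all names below are hypothetical placeholders, not the real BigQueryIO API; the real helper would return a parser over Beam's SchemaAndRecord and Row types): a single entry point that dispatches the row parser on the requested data format, so callers such as BigQueryTable#buildIOReader do not duplicate the conversion logic:

```java
import java.util.function.Function;

// Hypothetical stand-in for BigQueryIO's read format selector.
enum DataFormat { AVRO, ARROW }

public class ReadRowsSketch {
  // Placeholder parsers; the real ones would convert Avro records and
  // Arrow batches into Beam Rows.
  static String fromAvro(String record) { return "row(avro:" + record + ")"; }
  static String fromArrow(String record) { return "row(arrow:" + record + ")"; }

  // A readRows-like entry point: one place that knows how to turn a
  // record of either format into a row.
  public static Function<String, String> rowParser(DataFormat format) {
    switch (format) {
      case AVRO:  return ReadRowsSketch::fromAvro;
      case ARROW: return ReadRowsSketch::fromArrow;
      default:    throw new IllegalArgumentException("Unsupported format: " + format);
    }
  }
}
```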





[jira] [Resolved] (BEAM-8993) [SQL] MongoDb should use predicate push-down

2020-01-13 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-8993.
-
Fix Version/s: 2.19.0
   Resolution: Fixed

> [SQL] MongoDb should use predicate push-down
> 
>
> Key: BEAM-8993
> URL: https://issues.apache.org/jira/browse/BEAM-8993
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> * Add a MongoDbFilter class, implementing BeamSqlTableFilter.
>  ** Support simple comparison operations.
>  ** Support boolean field.
>  ** Support nested conjunction/disjunction.
>  * Update MongoDbTable#buildIOReader
>  ** Construct a push-down filter from RexNodes.
>  ** Set filter to FindQuery.
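The translation step listed above can be sketched dependency-free (the class and method names here are hypothetical; the real MongoDbFilter would emit Bson filters for MongoDbIO's FindQuery rather than JSON strings): simple comparisons and nested conjunction/disjunction map onto MongoDB query documents:

```java
import java.util.List;

// Sketch of how a push-down filter translation could render simple
// comparisons and nested AND/OR as MongoDB query-document JSON.
public class MongoFilterSketch {
  // field = value  ->  {"field": value}
  public static String eq(String field, Object value) {
    return "{\"" + field + "\": " + literal(value) + "}";
  }
  // field > value  ->  {"field": {"$gt": value}}
  public static String gt(String field, Object value) {
    return "{\"" + field + "\": {\"$gt\": " + literal(value) + "}}";
  }
  // Nested conjunction/disjunction via $and / $or arrays.
  public static String and(List<String> clauses) { return combine("$and", clauses); }
  public static String or(List<String> clauses)  { return combine("$or", clauses); }

  private static String combine(String op, List<String> clauses) {
    return "{\"" + op + "\": [" + String.join(", ", clauses) + "]}";
  }
  private static String literal(Object v) {
    // Strings are quoted; numbers and booleans are emitted verbatim.
    return (v instanceof String) ? "\"" + v + "\"" : String.valueOf(v);
  }
}
```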





[jira] [Comment Edited] (BEAM-7865) Java precommit Samza tests opening RocksDB flaky

2020-01-09 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012208#comment-17012208
 ] 

Kirill Kozlov edited comment on BEAM-7865 at 1/9/20 8:30 PM:
-

This PR [1] seems to be affected by this issue too. Jenkins tasks: [2, 3].

[1] [https://github.com/apache/beam/pull/10527]

[2] 
[https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PreCommit_Java_Commit/9530/]

[3] [https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1629/]


was (Author: kirillkozlov):
This PR [1] seems to be affected. Affected Jenkins tasks: [2, 3].

[1] [https://github.com/apache/beam/pull/10527]

[2] 
[https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PreCommit_Java_Commit/9530/]

[3] [https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1629/]

> Java precommit Samza tests opening RocksDB flaky
> 
>
> Key: BEAM-7865
> URL: https://issues.apache.org/jira/browse/BEAM-7865
> Project: Beam
>  Issue Type: Bug
>  Components: runner-samza, test-failures
>Reporter: Kyle Weaver
>Priority: Major
>
> org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testProcessingTimeTimers
>  and 
> org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testRestore
>  fail with exception: 
> org.apache.samza.SamzaException: Error opening RocksDB store beamStore at 
> location /tmp/store2
> ...
> Caused by: org.rocksdb.RocksDBException: Can't access /000359.sst: IO error: 
> while stat a file for size: /tmp/store2/000359.sst: No such file or directory 
> Can't access /000356.sst: IO error: while stat a file for size: 
> /tmp/store2/000356.sst: No such file or directory Can't access /000354.sst: 
> IO error: while stat a file for size: /tmp/store2/000354.sst: No such file or 
> directory
>  
> Example failures: 
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/6896/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/6906/]





[jira] [Commented] (BEAM-7865) Java precommit Samza tests opening RocksDB flaky

2020-01-09 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012208#comment-17012208
 ] 

Kirill Kozlov commented on BEAM-7865:
-

This PR [1] seems to be affected. Affected Jenkins tasks: [2, 3].

[1] [https://github.com/apache/beam/pull/10527]

[2] 
[https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PreCommit_Java_Commit/9530/]

[3] [https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1629/]

> Java precommit Samza tests opening RocksDB flaky
> 
>
> Key: BEAM-7865
> URL: https://issues.apache.org/jira/browse/BEAM-7865
> Project: Beam
>  Issue Type: Bug
>  Components: runner-samza, test-failures
>Reporter: Kyle Weaver
>Priority: Major
>
> org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testProcessingTimeTimers
>  and 
> org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testRestore
>  fail with exception: 
> org.apache.samza.SamzaException: Error opening RocksDB store beamStore at 
> location /tmp/store2
> ...
> Caused by: org.rocksdb.RocksDBException: Can't access /000359.sst: IO error: 
> while stat a file for size: /tmp/store2/000359.sst: No such file or directory 
> Can't access /000356.sst: IO error: while stat a file for size: 
> /tmp/store2/000356.sst: No such file or directory Can't access /000354.sst: 
> IO error: while stat a file for size: /tmp/store2/000354.sst: No such file or 
> directory
>  
> Example failures: 
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/6896/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/6906/]





[jira] [Updated] (BEAM-9072) [SQL] Add support for Datastore source

2020-01-08 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-9072:

Summary: [SQL] Add support for Datastore source  (was: [SQL] Add Datastore 
table read support)

> [SQL] Add support for Datastore source
> --
>
> Key: BEAM-9072
> URL: https://issues.apache.org/jira/browse/BEAM-9072
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>
> * Create a Datastore table and table provider
>  * Conversion between Datastore and Beam data types
>  * Implement buildIOReader
>  * Implement buildIOWrite
>  * Implement getTableStatistics





[jira] [Created] (BEAM-9072) [SQL] Add Datastore table read support

2020-01-08 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-9072:
---

 Summary: [SQL] Add Datastore table read support
 Key: BEAM-9072
 URL: https://issues.apache.org/jira/browse/BEAM-9072
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kirill Kozlov
Assignee: Kirill Kozlov


* Create a Datastore table and table provider
 * Conversion between Datastore and Beam data types
 * Implement buildIOReader
 * Implement buildIOWrite
 * Implement getTableStatistics





[jira] [Resolved] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule

2020-01-07 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-8794.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> Projects should be handled by an IOPushDownRule before applying 
> AggregateProjectMergeRule
> -
>
> Key: BEAM-8794
> URL: https://issues.apache.org/jira/browse/BEAM-8794
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> It is more efficient to push down projected fields at the IO level (vs merging 
> with an Aggregate), when supported.
> When running queries like:
> {code:java}
> select SUM(score) as total_score from  group by name{code}
> Projects get merged with an Aggregate; as a result, the Calc (after an 
> IOSourceRel) projects all fields and the BeamIOPushDownRule does not know which 
> fields can be dropped, thus not dropping any.





[jira] [Commented] (BEAM-9027) [SQL] ZetaSQL unparsing should produce valid result

2020-01-07 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010124#comment-17010124
 ] 

Kirill Kozlov commented on BEAM-9027:
-

Brief update on this issue.

The following unparsing errors have been fixed:
 * Calcite cannot unparse RexNode back to bytes literal
 * Calcite cannot unparse some floating point literals correctly
 * Calcite cannot unparse some string literals correctly

Other issues from the initial list (INTERVAL and CAST) still need to be fixed.

> [SQL] ZetaSQL unparsing should produce valid result
> ---
>
> Key: BEAM-9027
> URL: https://issues.apache.org/jira/browse/BEAM-9027
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> * ZetaSQL does not recognize keyword INTERVAL
>  * Calcite cannot unparse RexNode back to bytes literal
>  * Calcite cannot unparse some floating point literals correctly
>  * Calcite cannot unparse some string literals correctly
>  * Calcite cannot unparse types correctly for CAST function





[jira] [Commented] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses

2019-12-26 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003733#comment-17003733
 ] 

Kirill Kozlov commented on BEAM-9000:
-

I believe that the following PR [1] is somewhat relevant to this, so I decided 
to link it here just in case.

 [1] [https://github.com/apache/beam/pull/10094]

CC: [~bhulette]

> Java Test Assertions without toString for GenericJson subclasses
> 
>
> Key: BEAM-9000
> URL: https://issues.apache.org/jira/browse/BEAM-9000
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Minor
> Fix For: 2.19.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> As of now, there are many tests that assert on {{toString()}} of objects.
> {code:java}
> CounterUpdate result = testObject.transform(monitoringInfo);
> assertEquals(
> "{cumulative=true, integer={highBits=0, lowBits=0}, "
> + "nameAndKind={kind=SUM, "
> + "name=transformedValue-ElementCount}}",
> result.toString());
> {code}
> This style is prone to unnecessary maintenance of the test code when 
> upgrading dependencies. Dependencies may change the internal ordering of 
> fields and trivial change in {{toString()}}. In BEAM-8695, where I tried to 
> upgrade google-http-client, there are ~30 comparison failure due to this 
> {{toString}} assertions.
> They are subclasses of {{com.google.api.client.json.GenericJson}}. 
> Several options to enhance these assertions.
> h1. Option 1: Assertion using Map
> Leveraging the fact that GenericJson is a subclass of AbstractMap<String, 
> Object>, the assertion can be written as
> {code:java}
> ImmutableMap<String, Object> expected =
>     ImmutableMap.of(
>         "cumulative", true,
>         "integer", ImmutableMap.of("highBits", 0, "lowBits", 0),
>         "nameAndKind", ImmutableMap.of("kind", "SUM", "name", "transformedValue-ElementCount"));
> assertEquals(expected, (Map<String, Object>) result);
> {code}
> Credit: Ben Whitehead.
> h1. Option 2: Create assertEqualsOnJson
> Leveraging the fact that instance of GenericJson can be instantiated through 
> JSON, the assertion can be written as
> {code:java}
> assertEqualsOnJson(
> "{\"cumulative\":true, \"integer\":{\"highBits\":0, \"lowBits\":0}, "
> + "\"nameAndKind\":{\"kind\":\"SUM\", "
> + "\"name\":\"transformedValue-ElementCount\"}}",
> result);
> {code}
>  
> {{assertEqualsOnJson}} is implemented as below. The following field and 
> methods should go to a shared test utility class (sdks/testing?)
> {code:java}
>   private static final JacksonFactory jacksonFactory =
>       JacksonFactory.getDefaultInstance();
>
>   public static <T> void assertEqualsOnJson(String expectedJsonText, T actual) {
>     CounterUpdate expected = parse(expectedJsonText, CounterUpdate.class);
>     assertEquals(expected, actual);
>   }
>
>   public static <T> T parse(String text, Class<T> clazz) {
>     try {
>       JsonParser parser = jacksonFactory.createJsonParser(text);
>       return parser.parse(clazz);
>     } catch (IOException ex) {
>       throw new IllegalArgumentException("Could not parse the text as " + clazz, ex);
>     }
>   }
> {code}
> A feature request to handle escaping double quotes via JacksonFactory: 
> [https://github.com/googleapis/google-http-java-client/issues/923]
>  
> h1. Option3: Check JSON equality via JSONassert
> * https://github.com/skyscreamer/JSONassert
> * https://github.com/hertzsprung/hamcrest-json (Not using as last commit was 
> in 2012) 
> The JSONassert example does not carry quoted double quote characters. The 
> implementation would convert the actual object into a JSON object and call 
> {{JSONAssert.assertEquals}}.
> Credit: Luke Cwik
>  
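Option 1 above can be illustrated with a stdlib-only sketch of why Map-based equality is less brittle than toString comparison. LinkedHashMap/TreeMap stand in here for GenericJson (which ultimately extends AbstractMap); the class and variable names are illustrative, not from Beam:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapAssertionDemo {
    public static void main(String[] args) {
        // Same entries, different internal ordering: toString() differs,
        // but Map#equals compares entry sets and ignores ordering.
        Map<String, Object> actual = new LinkedHashMap<>();
        actual.put("nameAndKind", "SUM");   // insertion order: nameAndKind first
        actual.put("cumulative", true);

        Map<String, Object> expected = new TreeMap<>(actual); // sorted: cumulative first

        System.out.println(actual.toString().equals(expected.toString())); // false
        System.out.println(actual.equals(expected));                       // true
    }
}
```

This is exactly the failure mode seen in BEAM-8695: a dependency upgrade reorders fields, the string changes, but the logical content does not.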





[jira] [Commented] (BEAM-9027) [SQL] ZetaSQL unparsing should produce valid result

2019-12-23 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002570#comment-17002570
 ] 

Kirill Kozlov commented on BEAM-9027:
-

CC: [~apilloud]

> [SQL] ZetaSQL unparsing should produce valid result
> ---
>
> Key: BEAM-9027
> URL: https://issues.apache.org/jira/browse/BEAM-9027
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Kirill Kozlov
>Priority: Major
>
> * ZetaSQL does not recognize keyword INTERVAL
>  * Calcite cannot unparse RexNode back to bytes literal
>  * Calcite cannot unparse some floating point literals correctly
>  * Calcite cannot unparse some string literals correctly
>  * Calcite cannot unparse types correctly for CAST function





[jira] [Commented] (BEAM-9027) [SQL] ZetaSQL unparsing should produce valid result

2019-12-23 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002569#comment-17002569
 ] 

Kirill Kozlov commented on BEAM-9027:
-

CC: [~robinyqiu]

> [SQL] ZetaSQL unparsing should produce valid result
> ---
>
> Key: BEAM-9027
> URL: https://issues.apache.org/jira/browse/BEAM-9027
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Kirill Kozlov
>Priority: Major
>
> * ZetaSQL does not recognize keyword INTERVAL
>  * Calcite cannot unparse RexNode back to bytes literal
>  * Calcite cannot unparse some floating point literals correctly
>  * Calcite cannot unparse some string literals correctly
>  * Calcite cannot unparse types correctly for CAST function





[jira] [Created] (BEAM-9027) [SQL] ZetaSQL unparsing should produce valid result

2019-12-23 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-9027:
---

 Summary: [SQL] ZetaSQL unparsing should produce valid result
 Key: BEAM-9027
 URL: https://issues.apache.org/jira/browse/BEAM-9027
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql-zetasql
Reporter: Kirill Kozlov


* ZetaSQL does not recognize keyword INTERVAL
 * Calcite cannot unparse RexNode back to bytes literal
 * Calcite cannot unparse some floating point literals correctly
 * Calcite cannot unparse some string literals correctly
 * Calcite cannot unparse types correctly for CAST function





[jira] [Work started] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-18 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-8933 started by Kirill Kozlov.
---
> BigQuery IO should support read/write in Arrow format
> -
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use: Arrow or 
> Avro (with Avro as default).





[jira] [Updated] (BEAM-8993) [SQL] MongoDb should use predicate push-down

2019-12-18 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8993:

Description: 
* Add a MongoDbFilter class, implementing BeamSqlTableFilter.
 ** Support simple comparison operations.
 ** Support boolean field.
 ** Support nested conjunction/disjunction.

 * Update MongoDbTable#buildIOReader
 ** Construct a push-down filter from RexNodes.
 ** Set filter to FindQuery.

  was:
* Add a MongoDbFilter class, implementing BeamSqlTableFilter.

 ** Support simple comparison operations.
 ** Support boolean field.
 ** Support nested conjunction/disjunction.
 * Update MongoDbTable#buildIOReader
 ** Construct a push-down filter from RexNodes.
 ** Set filter to FindQuery.


> [SQL] MongoDb should use predicate push-down
> 
>
> Key: BEAM-8993
> URL: https://issues.apache.org/jira/browse/BEAM-8993
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>
> * Add a MongoDbFilter class, implementing BeamSqlTableFilter.
>  ** Support simple comparison operations.
>  ** Support boolean field.
>  ** Support nested conjunction/disjunction.
>  * Update MongoDbTable#buildIOReader
>  ** Construct a push-down filter from RexNodes.
>  ** Set filter to FindQuery.





[jira] [Updated] (BEAM-8993) [SQL] MongoDb should use predicate push-down

2019-12-18 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8993:

Status: Open  (was: Triage Needed)

> [SQL] MongoDb should use predicate push-down
> 
>
> Key: BEAM-8993
> URL: https://issues.apache.org/jira/browse/BEAM-8993
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>
> * Add a MongoDbFilter class, implementing BeamSqlTableFilter.
>  ** Support simple comparison operations.
>  ** Support boolean field.
>  ** Support nested conjunction/disjunction.
>  * Update MongoDbTable#buildIOReader
>  ** Construct a push-down filter from RexNodes.
>  ** Set filter to FindQuery.





[jira] [Created] (BEAM-8993) [SQL] MongoDb should use predicate push-down

2019-12-18 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-8993:
---

 Summary: [SQL] MongoDb should use predicate push-down
 Key: BEAM-8993
 URL: https://issues.apache.org/jira/browse/BEAM-8993
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Kirill Kozlov
Assignee: Kirill Kozlov


* Add a MongoDbFilter class, implementing BeamSqlTableFilter.

 ** Support simple comparison operations.
 ** Support boolean field.
 ** Support nested conjunction/disjunction.
 * Update MongoDbTable#buildIOReader
 ** Construct a push-down filter from RexNodes.
 ** Set filter to FindQuery.
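A rough shape of the intended translation, sketched with plain Maps rather than the actual Bson/FindQuery types (the helper names are hypothetical; {{$and}} and {{$gt}} are standard MongoDB query operators):

```java
import java.util.List;
import java.util.Map;

public class PushDownFilterSketch {
    // A predicate such as `score > 90 AND active = true` rendered as a
    // MongoDB-style filter document, modeled here with plain Maps.
    static Map<String, Object> gt(String field, Object value) {
        return Map.of(field, Map.of("$gt", value));
    }

    static Map<String, Object> eq(String field, Object value) {
        return Map.of(field, value);
    }

    static Map<String, Object> and(List<Map<String, Object>> clauses) {
        return Map.of("$and", clauses);
    }

    public static void main(String[] args) {
        Map<String, Object> filter =
            and(List.of(gt("score", 90), eq("active", true)));
        System.out.println(filter);
    }
}
```

In the real implementation these documents would be built from the RexNode tree and handed to FindQuery rather than printed.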





[jira] [Resolved] (BEAM-8664) [SQL] MongoDb should use project push-down

2019-12-18 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-8664.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> [SQL] MongoDb should use project push-down
> --
>
> Key: BEAM-8664
> URL: https://issues.apache.org/jira/browse/BEAM-8664
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> MongoDbTable should implement the following methods:
> {code:java}
> public PCollection<Row> buildIOReader(
>     PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);
> public ProjectSupport supportsProjects();
> {code}
>  





[jira] [Updated] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-17 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8933:

Description: 
As of right now BigQuery uses Avro format for reading and writing.

We should add a config to BigQueryIO to specify which format to use: Arrow or 
Avro (with Avro as default).

  was:
As of right now BigQuery uses Avro format for reading and writing.

We should add a config to BigQueryIO to specify which format to use (with Avro 
as default).


> BigQuery IO should support read/write in Arrow format
> -
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use: Arrow or 
> Avro (with Avro as default).





[jira] [Commented] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-16 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997475#comment-16997475
 ] 

Kirill Kozlov commented on BEAM-8933:
-

[~iemejia], I meant to say Arrow [1] in the title, BigQuery currently supports 
Avro.

As of right now, reading from BQ in Arrow does not support all the features that 
the Avro format does, but there are people working on matching the functionality.

[1] [https://github.com/apache/arrow]

> BigQuery IO should support read/write in Arrow format
> -
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use (with 
> Avro as default).





[jira] [Assigned] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-10 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov reassigned BEAM-8933:
---

Assignee: Kirill Kozlov

> BigQuery IO should support read/write in Arrow format
> -
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use (with 
> Avro as default).





[jira] [Created] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-09 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-8933:
---

 Summary: BigQuery IO should support read/write in Arrow format
 Key: BEAM-8933
 URL: https://issues.apache.org/jira/browse/BEAM-8933
 Project: Beam
  Issue Type: Improvement
  Components: io-java-gcp
Reporter: Kirill Kozlov


As of right now BigQuery uses Avro format for reading and writing.

We should add a config to BigQueryIO to specify which format to use (with Avro 
as default).





[jira] [Commented] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type

2019-12-05 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989329#comment-16989329
 ] 

Kirill Kozlov commented on BEAM-8896:
-

[~amaliujia], yes, the join condition is pushed down to a Project, which serves 
as an input to a Join.

> WITH query AS + SELECT query JOIN other throws invalid type
> ---
>
> Key: BEAM-8896
> URL: https://issues.apache.org/jira/browse/BEAM-8896
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.16.0
>Reporter: fdiazgon
>Assignee: Andrew Pilloud
>Priority: Major
>
> The first one of the three following queries fails, despite queries being 
> equivalent:
> {code:java}
> Pipeline p = Pipeline.create();
> Schema schemaA =
> Schema.of(
> Schema.Field.of("id", Schema.FieldType.BYTES),
> Schema.Field.of("fA1", Schema.FieldType.STRING));
> Schema schemaB =
> Schema.of(
> Schema.Field.of("id", Schema.FieldType.STRING),
> Schema.Field.of("fB1", Schema.FieldType.STRING));
> PCollection<Row> inputA =
> 
> p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaA)));
> PCollection<Row> inputB =
> 
> p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaB)));
> // Fails
> String query1 =
> "WITH query AS "
> + "( "
> + " SELECT id, fA1, fA1 AS fA1_2 "
> + " FROM tblA"
> + ") "
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query "
> + "JOIN tblB ON (TO_HEX(query.id) = tblB.id)";
> // Ok
> String query2 =
> "WITH query AS "
> + "( "
> + " SELECT fA1, fB1, fA1 AS fA1_2 "
> + " FROM tblA "
> + " JOIN tblB "
> + " ON (TO_HEX(tblA.id) = tblB.id) "
> + ")"
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query ";
> // Ok
> String query3 =
> "WITH query AS "
> + "( "
> + " SELECT TO_HEX(id) AS id, fA1, fA1 AS fA1_2 "
> + " FROM tblA"
> + ") "
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query "
> + "JOIN tblB ON (query.id = tblB.id)";
> Schema transform3 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query3))
> .getSchema();
> System.out.println(transform3);
> Schema transform2 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query2))
> .getSchema();
> System.out.println(transform2);
> Schema transform1 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query1))
> .getSchema();
> System.out.println(transform1);
> {code}
>  
> The error is:
> {noformat}
> Exception in thread "main" java.lang.AssertionError: Field ordinal 2 is 
> invalid for type 'RecordType(VARBINARY id, VARCHAR fA1)' at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197){noformat}
>  
> If I change `schemaB.id` to `BYTES` (while also avoid using `TO_HEX`), all 
> queries work fine. 





[jira] [Comment Edited] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type

2019-12-05 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989162#comment-16989162
 ] 

Kirill Kozlov edited comment on BEAM-8896 at 12/5/19 10:27 PM:
---

After some investigating, I have a feeling that this problem has to do with the 
way leaves (an Array in the SqlToRelConverter class) are registered in Calcite.

Before joins are created, the left and right paths are parsed first. For the 1st 
query above they are as follows:

 
{code:java}
Left:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)

Right:
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
As they are processed - they are registered as leaves (added to the Array).

 

When the Join node is being created, it knows what the `condition expression` is:
{code:java}
=(TO_HEX($0), $3)
{code}
Since TO_HEX is not computed anywhere, it modifies the left input to be as 
follows: 
{code:java}
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
{code}
where `VARCHAR $f3` is a result of TO_HEX. Note that the list of leaves is not 
updated.

[https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L2571]

 

Finally, when identifier "query.fA1_2" is being converted (via 
SqlToRelConverter#convertIdentifier) for the top-most node
{code:java}
top-most node:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR id0, VARCHAR fB1)
  LogicalJoin with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR 
$f3, VARCHAR id0, VARCHAR fB1)
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
The Blackboard performs a lookup (via SqlToRelConverter#lookupExp), in the 
process of which a LookupContext is created.

In its constructor, LookupContext performs a flatten, which recursively traverses 
the tree of nodes (from the code block above) and checks the leaves to see if 
they contain such an expression. When it gets to the modified left input of the 
join, it does not find a match and continues further down to the TableScan.

When it finally flattens the result, the TableScan's RecordType knows nothing 
about the duplicated field `fA1_2`, causing the error above.

 

I think a viable solution would be to modify Join creation to register the 
resulting join inputs as leaves (when they are modified). An alternative approach 
would be to add an additional Project when a Join needs to modify an input.
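As a loose, stdlib-only analogy for the failure mode described above (class and variable names are hypothetical, not Calcite API): the leaf record type only has ordinals 0 and 1, so resolving the projected field at ordinal 2 against the stale leaf cannot succeed.

```java
import java.util.List;

public class OrdinalLookupDemo {
    public static void main(String[] args) {
        // Leaf: RecordType(VARBINARY id, VARCHAR fA1) -- only ordinals 0 and 1.
        List<String> leafFields = List.of("id", "fA1");
        // The enclosing LogicalProject exposes fA1_2 at ordinal 2, but the
        // flatten step resolves the access against the unchanged leaf instead.
        int requestedOrdinal = 2;
        if (requestedOrdinal >= leafFields.size()) {
            System.out.println("Field ordinal " + requestedOrdinal
                + " is invalid for type RecordType(VARBINARY id, VARCHAR fA1)");
        }
    }
}
```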


was (Author: kirillkozlov):
After some investigating I have a feeling that this problem has to do with the 
way leaves (Array in SqlToRelConverter class) are registered in Calcite.

Before joins are created left and right paths are pared first. For the 1st 
query above they are as follows:

 
{code:java}
Left:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)

Right:
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
As they are processed - they are registered as leaves (added to the Array).

 

When Join node is being created it knows what the `condition expressions` is:
{code:java}
=(TO_HEX($0), $3)
{code}
Since TO_HEX is not computed anywhere - it modifies the left input to be as 
follows: 
{code:java}
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
{code}
where `VARCHAR $f3` is a result of TO_HEX. Note that the list of leaves is not 
updated.

[https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L2571]

 

Finally, when identifier "query.fA1_2" is being converted (via 
SqlToRelConverter#convertIdentifier) for the top-most node
{code:java}
top-most node:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR id0, VARCHAR fB1)
  LogicalJoin with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR 
$f3, VARCHAR id0, VARCHAR fB1)
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
Blackboard perform a lookup (via SqlToRelConverter#lookupExp), in process of 
which LookupContext is created.

In a constructor, LookupContext performs flatten, which recursively traverses  
tree of node (from above codeblock) and checks the leaves to see if they 
contain such expression. When it does get to the modified left input of a join 
it does not get a match on it and continues further down to a TableScan.

When it finally flattens the result, TableScan's RecordType knows nothing about 
a duplicated

[jira] [Comment Edited] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type

2019-12-05 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989162#comment-16989162
 ] 

Kirill Kozlov edited comment on BEAM-8896 at 12/5/19 10:24 PM:
---

After some investigating I have a feeling that this problem has to do with the 
way leaves (Array in SqlToRelConverter class) are registered in Calcite.

Before joins are created left and right paths are pared first. For the 1st 
query above they are as follows:

 
{code:java}
Left:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)

Right:
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
As they are processed - they are registered as leaves (added to the Array).

 

When Join node is being created it knows what the `condition expressions` is:
{code:java}
=(TO_HEX($0), $3)
{code}
Since TO_HEX is not computed anywhere - it modifies the left input to be as 
follows: 
{code:java}
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
{code}
where `VARCHAR $f3` is a result of TO_HEX. Note that the list of leaves is not 
updated.

[https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L2571]

 

Finally, when identifier "query.fA1_2" is being converted (via 
SqlToRelConverter#convertIdentifier) for the top-most node
{code:java}
top-most node:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR id0, VARCHAR fB1)
  LogicalJoin with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR 
$f3, VARCHAR id0, VARCHAR fB1)
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
Blackboard perform a lookup (via SqlToRelConverter#lookupExp), in process of 
which LookupContext is created.

In a constructor, LookupContext performs flatten, which recursively traverses  
tree of node (from above codeblock) and checks the leaves to see if they 
contain such expression. When it does get to the modified left input of a join 
it does not get a match on it and continues further down to a TableScan.

When it finally flattens the result, TableScan's RecordType knows nothing about 
a duplicated field `fA1_2`, causing an error above.

 

I think a viable solution would be to modify Join creation to register a 
resulting join inputs as leaves (when they are modified ). Alternative approach 
would be to add an additional Project when a Join needs to modify an input.


was (Author: kirillkozlov):
After some investigating I have a feeling that this problem has to do with the 
way leaves (Array in SqlToRelConverter class) are registered in Calcite.

Before joins are created left and right paths are pared first. For the 1st 
query above they are as follows:

 
{code:java}
Left:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)

Right:
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
As they are processed - they are registered as leaves (added to the Array).

 

When Join node is being created it knows what the `condition expressions` is:
{code:java}
=(TO_HEX($0), $3)
{code}
Since TO_HEX is not computed anywhere - it modifies the left input to be as 
follows:

 
{code:java}
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
{code}
where `VARCHAR $f3` is a result of TO_HEX. Note that the list of leaves is not 
updated.

 

Finally, when identifier "query.fA1_2" is being converted (via 
SqlToRelConverter#convertIdentifier) for the top-most node
{code:java}
top-most node:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR id0, VARCHAR fB1)
  LogicalJoin with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR 
$f3, VARCHAR id0, VARCHAR fB1)
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
Blackboard perform a lookup (via SqlToRelConverter#lookupExp), in process of 
which LookupContext is created.

In a constructor, LookupContext performs flatten, which recursively traverses  
tree of node (from above codeblock) and checks the leaves to see if they 
contain such expression. When it does get to the modified left input of a join 
it does not get a match on it and continues further down to a TableScan.

When it finally flattens the result, TableScan's RecordType knows nothing about 
a duplicated field `fA1_2`, casing an error above.

 

I think a viable solution would be to modify Join creation to register a 
resulti

[jira] [Comment Edited] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type

2019-12-05 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989162#comment-16989162
 ] 

Kirill Kozlov edited comment on BEAM-8896 at 12/5/19 10:23 PM:
---

After some investigating I have a feeling that this problem has to do with the 
way leaves (Array in SqlToRelConverter class) are registered in Calcite.

Before joins are created left and right paths are pared first. For the 1st 
query above they are as follows:

 
{code:java}
Left:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)

Right:
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
As they are processed - they are registered as leaves (added to the Array).

 

When Join node is being created it knows what the `condition expressions` is:
{code:java}
=(TO_HEX($0), $3)
{code}
Since TO_HEX is not computed anywhere - it modifies the left input to be as 
follows:

 
{code:java}
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
{code}
where `VARCHAR $f3` is the result of TO_HEX. Note that the list of leaves is not 
updated.

 

Finally, when the identifier "query.fA1_2" is converted (via 
SqlToRelConverter#convertIdentifier) for the top-most node:
{code:java}
top-most node:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR id0, VARCHAR fB1)
  LogicalJoin with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR 
$f3, VARCHAR id0, VARCHAR fB1)
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
The Blackboard performs a lookup (via SqlToRelConverter#lookupExp), in the 
process of which a LookupContext is created.

In its constructor, LookupContext performs a flatten, which recursively 
traverses the tree of nodes (from the code block above) and checks the leaves 
to see if they contain the expression. When it gets to the modified left input 
of the join, it does not find a match and continues further down to the TableScan.

When it finally flattens the result, the TableScan's RecordType knows nothing about 
the duplicated field `fA1_2`, causing the error above.

 

I think a viable solution would be to modify Join creation to register the 
resulting join inputs as leaves (when they are modified). An alternative approach 
would be to add an additional Project when a Join needs to modify an input.

[https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L2571]
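The mismatch can be sketched with a toy model (plain Java collections, not Calcite's actual classes; the node representation here is illustrative): the leaves registry is checked by value, so once Join creation rebuilds its input with the extra `$f3` field, the lookup no longer matches.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the "leaves" registry described above (not Calcite's real
// classes): a node is represented by its field list; the registry is checked
// by value, so a node rebuilt with an extra synthesized field ($f3) misses.
public class LeavesMismatch {
    public static boolean isRegisteredLeaf(List<List<String>> leaves, List<String> node) {
        return leaves.contains(node);
    }

    public static void main(String[] args) {
        List<List<String>> leaves = new ArrayList<>();
        List<String> left = List.of("id", "fA1", "fA1_2");
        leaves.add(left); // registered before the Join is created

        // Join creation rewrites the left input to also compute TO_HEX($0)
        // as $f3, but the leaves list is not updated to match.
        List<String> rewrittenLeft = List.of("id", "fA1", "fA1_2", "$f3");

        assert isRegisteredLeaf(leaves, left);
        assert !isRegisteredLeaf(leaves, rewrittenLeft); // the lookup misses
    }
}
```

Registering the rewritten input as a new leaf (or wrapping the rewrite in its own Project) would make the lookup succeed again, which is the shape of both fixes proposed above.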


was (Author: kirillkozlov):
After some investigation, I suspect this problem has to do with the way leaves 
(an Array in the SqlToRelConverter class) are registered in Calcite.

Before the join is created, the left and right inputs are parsed first. For the 
first query above they are as follows:

 
{code:java}
Left:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)

Right:
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
As they are processed - they are registered as leaves (added to the Array).

 

 

When the Join node is being created, it knows what the condition expression is:

 
{code:java}
=(TO_HEX($0), $3)
{code}
Since TO_HEX is not computed anywhere, it modifies the left input to be as 
follows:

 
{code:java}
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
{code}
where `VARCHAR $f3` is the result of TO_HEX. Note that the list of leaves is not 
updated.

 

Finally, when the identifier "query.fA1_2" is converted (via 
SqlToRelConverter#convertIdentifier) for the top-most node:
{code:java}
top-most node:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR id0, VARCHAR fB1)
  LogicalJoin with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR 
$f3, VARCHAR id0, VARCHAR fB1)
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
The Blackboard performs a lookup (via SqlToRelConverter#lookupExp), in the 
process of which a LookupContext is created.

In its constructor, LookupContext performs a flatten, which recursively 
traverses the tree of nodes (from the code block above) and checks the leaves 
to see if they contain the expression. When it gets to the modified left input 
of the join, it does not find a match and continues further down to the TableScan.

When it finally flattens the result, the TableScan's RecordType knows nothing about 
the duplicated field `fA1_2`, causing the error above.

 

I think a viable solution would be to modify Join creation to register the 
resulting join inputs as leaves (when they are modified). An alternative approach 
would be to add an additional Project when a Join needs to modify an input.

[jira] [Commented] (BEAM-8896) WITH query AS + SELECT query JOIN other throws invalid type

2019-12-05 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989162#comment-16989162
 ] 

Kirill Kozlov commented on BEAM-8896:
-

After some investigation, I suspect this problem has to do with the way leaves 
(an Array in the SqlToRelConverter class) are registered in Calcite.

Before the join is created, the left and right inputs are parsed first. For the 
first query above they are as follows:

 
{code:java}
Left:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)

Right:
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
As they are processed - they are registered as leaves (added to the Array).

 

 

When the Join node is being created, it knows what the condition expression is:

 
{code:java}
=(TO_HEX($0), $3)
{code}
Since TO_HEX is not computed anywhere, it modifies the left input to be as 
follows:

 
{code:java}
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
{code}
where `VARCHAR $f3` is the result of TO_HEX. Note that the list of leaves is not 
updated.

 

Finally, when the identifier "query.fA1_2" is converted (via 
SqlToRelConverter#convertIdentifier) for the top-most node:
{code:java}
top-most node:
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR id0, VARCHAR fB1)
  LogicalJoin with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, VARCHAR 
$f3, VARCHAR id0, VARCHAR fB1)
LogicalProject with RecordType(VARBINARY id, VARCHAR fA1, VARCHAR fA1_2, 
VARCHAR $f3)
  LogicalTableScan with RecordType(VARBINARY id, VARCHAR fA1)
LogicalTableScan with RecordType(VARCHAR id, VARCHAR fB1){code}
The Blackboard performs a lookup (via SqlToRelConverter#lookupExp), in the 
process of which a LookupContext is created.

In its constructor, LookupContext performs a flatten, which recursively 
traverses the tree of nodes (from the code block above) and checks the leaves 
to see if they contain the expression. When it gets to the modified left input 
of the join, it does not find a match and continues further down to the TableScan.

When it finally flattens the result, the TableScan's RecordType knows nothing about 
the duplicated field `fA1_2`, causing the error above.

 

I think a viable solution would be to modify Join creation to register the 
resulting join inputs as leaves (when they are modified). An alternative approach 
would be to add an additional Project when a Join needs to modify an input.

> WITH query AS + SELECT query JOIN other throws invalid type
> ---
>
> Key: BEAM-8896
> URL: https://issues.apache.org/jira/browse/BEAM-8896
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.16.0
>Reporter: fdiazgon
>Assignee: Andrew Pilloud
>Priority: Major
>
> The first of the following three queries fails, despite the queries being 
> equivalent:
> {code:java}
> Pipeline p = Pipeline.create();
> Schema schemaA =
> Schema.of(
> Schema.Field.of("id", Schema.FieldType.BYTES),
> Schema.Field.of("fA1", Schema.FieldType.STRING));
> Schema schemaB =
> Schema.of(
> Schema.Field.of("id", Schema.FieldType.STRING),
> Schema.Field.of("fB1", Schema.FieldType.STRING));
> PCollection<Row> inputA =
> 
> p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaA)));
> PCollection<Row> inputB =
> 
> p.apply(Create.of(ImmutableList.of()).withCoder(SchemaCoder.of(schemaB)));
> // Fails
> String query1 =
> "WITH query AS "
> + "( "
> + " SELECT id, fA1, fA1 AS fA1_2 "
> + " FROM tblA"
> + ") "
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query "
> + "JOIN tblB ON (TO_HEX(query.id) = tblB.id)";
> // Ok
> String query2 =
> "WITH query AS "
> + "( "
> + " SELECT fA1, fB1, fA1 AS fA1_2 "
> + " FROM tblA "
> + " JOIN tblB "
> + " ON (TO_HEX(tblA.id) = tblB.id) "
> + ")"
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query ";
> // Ok
> String query3 =
> "WITH query AS "
> + "( "
> + " SELECT TO_HEX(id) AS id, fA1, fA1 AS fA1_2 "
> + " FROM tblA"
> + ") "
> + "SELECT fA1, fB1, fA1_2 "
> + "FROM query "
> + "JOIN tblB ON (query.id = tblB.id)";
> Schema transform3 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query3))
> .getSchema();
> System.out.println(transform3);
> Schema transform2 =
> PCollectionTuple.of("tblA", inputA)
> .and("tblB", inputB)
> .apply(SqlTransform.query(query2))
> .getSchema();
> System.out.println(transform2);
> Schema transform1 =
> PCollectionTuple.of("tblA", inputA)
>

[jira] [Updated] (BEAM-8890) Update google-cloud-java version

2019-12-04 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8890:

Summary: Update google-cloud-java version  (was: Update google-cloud-java)

> Update google-cloud-java version
> 
>
> Key: BEAM-8890
> URL: https://issues.apache.org/jira/browse/BEAM-8890
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Kirill Kozlov
>Priority: Major
>
> google-cloud-java under v0.117.0 has a limit on gRPC message size (~11 MB), 
> which is a problem when reading in DIRECT_READ (Read API) mode from BigQuery 
> tables with large rows.
> Updating to v0.117.0 or above should fix the problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8890) Update google-cloud-java

2019-12-04 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-8890:
---

 Summary: Update google-cloud-java
 Key: BEAM-8890
 URL: https://issues.apache.org/jira/browse/BEAM-8890
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp
Affects Versions: 2.16.0
Reporter: Kirill Kozlov


google-cloud-java under v0.117.0 has a limit on gRPC message size (~11 MB), 
which is a problem when reading in DIRECT_READ (Read API) mode from BigQuery 
tables with large rows.

Updating to v0.117.0 or above should fix the problem.





[jira] [Commented] (BEAM-8564) Add LZO compression and decompression support

2019-12-03 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987266#comment-16987266
 ] 

Kirill Kozlov commented on BEAM-8564:
-

I do not have much experience working with the Go SDK, but the README [1] on 
GitHub has some relevant information. I would start there.

[1] [https://github.com/apache/beam/tree/master/sdks/go]

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.





[jira] [Work started] (BEAM-8844) [SQL] Create performance tests for BigQueryTable

2019-12-03 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-8844 started by Kirill Kozlov.
---
> [SQL] Create performance tests for BigQueryTable
> 
>
> Key: BEAM-8844
> URL: https://issues.apache.org/jira/browse/BEAM-8844
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> They should measure read-time for:
>  * DIRECT_READ w/o push-down
>  * DIRECT_READ w/ push-down
>  * DEFAULT





[jira] [Comment Edited] (BEAM-8564) Add LZO compression and decompression support

2019-12-02 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986375#comment-16986375
 ] 

Kirill Kozlov edited comment on BEAM-8564 at 12/2/19 9:14 PM:
--

[~AmoghTiwari], thanks for working on this!

Are you working on adding this just for the Java SDK? If so, there is no need 
to build other SDKs (the error above is from building the Go SDK).

You might want to use the following command:
{code:java}
./gradlew -p sdks/java/ build
{code}
It should make your build go much faster. You could further specify the build 
target by updating the path.


was (Author: kirillkozlov):
[~AmoghTiwari], thanks for working on this!

Are you working on adding this just for the Java SDK? If so, there is no need 
to build other SDKs.

You might want to use the following command:
{code:java}
./gradlew -p sdks/java/ build
{code}
It should make your build go much faster. You could further specify the build 
target by updating the path.

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.





[jira] [Commented] (BEAM-8564) Add LZO compression and decompression support

2019-12-02 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986375#comment-16986375
 ] 

Kirill Kozlov commented on BEAM-8564:
-

[~AmoghTiwari], thanks for working on this!

Are you working on adding this just for the Java SDK? If so, there is no need 
to build other SDKs.

You might want to use the following command:
{code:java}
./gradlew -p sdks/java/ build
{code}
It should make your build go much faster. You could further specify the build 
target by updating the path.

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.





[jira] [Updated] (BEAM-8844) [SQL] Create performance tests for BigQueryTable

2019-11-27 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8844:

Summary: [SQL] Create performance tests for BigQueryTable  (was: [SQL] 
Create performance tests for BigQuery)

> [SQL] Create performance tests for BigQueryTable
> 
>
> Key: BEAM-8844
> URL: https://issues.apache.org/jira/browse/BEAM-8844
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>
> They should measure read-time for:
>  * DIRECT_READ w/o push-down
>  * DIRECT_READ w/ push-down
>  * DEFAULT





[jira] [Created] (BEAM-8844) [SQL] Create performance tests for BigQuery

2019-11-27 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-8844:
---

 Summary: [SQL] Create performance tests for BigQuery
 Key: BEAM-8844
 URL: https://issues.apache.org/jira/browse/BEAM-8844
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Kirill Kozlov
Assignee: Kirill Kozlov


They should measure read-time for:
 * DIRECT_READ w/o push-down
 * DIRECT_READ w/ push-down
 * DEFAULT





[jira] [Resolved] (BEAM-8586) [SQL] Add a server for MongoDb Integration Test

2019-11-25 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-8586.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> [SQL] Add a server for MongoDb Integration Test
> ---
>
> Key: BEAM-8586
> URL: https://issues.apache.org/jira/browse/BEAM-8586
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Minor
> Fix For: 2.18.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We need to pass pipeline options with server information to the 
> MongoDbReadWriteIT.
> For now that test is ignored and excluded from the build.gradle file.





[jira] [Assigned] (BEAM-8586) [SQL] Add a server for MongoDb Integration Test

2019-11-25 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov reassigned BEAM-8586:
---

Assignee: Kirill Kozlov

> [SQL] Add a server for MongoDb Integration Test
> ---
>
> Key: BEAM-8586
> URL: https://issues.apache.org/jira/browse/BEAM-8586
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We need to pass pipeline options with server information to the 
> MongoDbReadWriteIT.
> For now that test is ignored and excluded from the build.gradle file.





[jira] [Resolved] (BEAM-8427) [SQL] Add support for MongoDB source

2019-11-21 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-8427.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
>  * Implement buildIOWrite
>  * improve getTableStatistics





[jira] [Updated] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule

2019-11-20 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8794:

Status: Open  (was: Triage Needed)

> Projects should be handled by an IOPushDownRule before applying 
> AggregateProjectMergeRule
> -
>
> Key: BEAM-8794
> URL: https://issues.apache.org/jira/browse/BEAM-8794
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It is more efficient to push down projected fields at the IO level (vs. merging 
> with an Aggregate), when supported.
> When running queries like:
> {code:java}
> select SUM(score) as total_score from  group by name{code}
> Projects get merged with an aggregate; as a result, the Calc (after an 
> IOSourceRel) projects all fields and the BeamIOPushDown rule does not know what 
> fields can be dropped, thus not dropping any.





[jira] [Created] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule

2019-11-20 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-8794:
---

 Summary: Projects should be handled by an IOPushDownRule before 
applying AggregateProjectMergeRule
 Key: BEAM-8794
 URL: https://issues.apache.org/jira/browse/BEAM-8794
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Kirill Kozlov
Assignee: Kirill Kozlov


It is more efficient to push down projected fields at the IO level (vs. merging 
with an Aggregate), when supported.

When running queries like:
{code:java}
select SUM(score) as total_score from  group by name{code}
Projects get merged with an aggregate; as a result, the Calc (after an IOSourceRel) 
projects all fields and the BeamIOPushDown rule does not know what fields can be 
dropped, thus not dropping any.





[jira] [Resolved] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-11-20 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-8343.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to the IO layer, 
> and adding the following methods to the BeamSqlTable interface to pass the 
> necessary parameters to IO APIs:
>  - BeamSqlTableFilter constructFilter(List<RexNode> filter)
>  - ProjectSupport supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>   
> ProjectSupport is an enum with the following options:
>  * NONE
>  * WITHOUT_FIELD_REORDERING
>  * WITH_FIELD_REORDERING
>  
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
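The interface surface described above can be sketched as follows. This is a minimal, self-contained sketch, not the actual Beam code: the names mirror the issue text, but PBegin, PCollection, and RexNode are replaced with Object/String stand-ins, and the `isSupported` helper and `getNotSupported` method are assumptions.

```java
import java.util.List;

// Sketch of the push-down surface described in the issue. PBegin, PCollection,
// and RexNode are replaced with Object/String stand-ins so the sketch compiles
// on its own; only the names come from the issue text.
public class PushDownSurface {
    public enum ProjectSupport {
        NONE, WITHOUT_FIELD_REORDERING, WITH_FIELD_REORDERING;

        // Illustrative helper: any value other than NONE allows push-down.
        public boolean isSupported() { return this != NONE; }
    }

    public interface BeamSqlTableFilter {
        // Filters the IO cannot evaluate itself (assumed method name).
        List<Object> getNotSupported();
    }

    public interface BeamSqlTable {
        BeamSqlTableFilter constructFilter(List<Object> filter);
        ProjectSupport supportsProjects();
        Object buildIOReader(Object begin, BeamSqlTableFilter filters, List<String> fieldNames);
    }

    public static void main(String[] args) {
        assert ProjectSupport.WITH_FIELD_REORDERING.isSupported();
        assert !ProjectSupport.NONE.isSupported();
    }
}
```

The split into `constructFilter` plus `getNotSupported` lets the planner keep a residual Calc for whatever the IO declines, which matches the design-doc approach of moving only part of a Calc down to the IO.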





[jira] [Resolved] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode

2019-11-15 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-8583.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
> -
>
> Key: BEAM-8583
> URL: https://issues.apache.org/jira/browse/BEAM-8583
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> * Add BigQuery Dialect with TypeTranslation (since it is not implemented in 
> Calcite 1.20.0, but is present in unreleased versions).
>  * Create a BigQueryFilter class.
> * BigQueryTable#buildIOReader should translate supported filters into a SQL 
> string and pass it to BigQueryIO.
>  
> Potential improvements:
> * After updating the vendored Calcite, the class 
> `BigQuerySqlDialectWithTypeTranslation` can be deleted and Calcite's 
> `BigQuerySqlDialect` can be utilized instead.
>  * Once BigQuery adds support for more filters, `BigQueryFilter#isSupported` 
> should be updated.
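As an illustration of the "translate supported filters into a SQL string" step above, here is a toy translation — not the actual BigQueryTable/BigQueryFilter code; the field/operator/literal triple representation is an assumption — that joins simple comparisons into a row-restriction string:

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy version of the "translate supported filters into a SQL string" step
// (not the actual BigQueryTable implementation): each supported filter is a
// {field, operator, literal} triple, joined with AND into one restriction.
public class FilterToSql {
    public static String rowRestriction(List<String[]> comparisons) {
        return comparisons.stream()
                .map(c -> c[0] + " " + c[1] + " " + c[2])
                .collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        String sql = rowRestriction(List.of(
                new String[] {"score", ">", "90"},
                new String[] {"name", "=", "'foo'"}));
        assert sql.equals("score > 90 AND name = 'foo'");
    }
}
```

In the real IO, a string like this would be handed to the read API as the row restriction so that filtering happens on the BigQuery side instead of in the pipeline.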





[jira] [Created] (BEAM-8664) [SQL] MongoDb should use project push-down

2019-11-14 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-8664:
---

 Summary: [SQL] MongoDb should use project push-down
 Key: BEAM-8664
 URL: https://issues.apache.org/jira/browse/BEAM-8664
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Kirill Kozlov
Assignee: Kirill Kozlov


MongoDbTable should implement the following methods:
{code:java}
public PCollection<Row> buildIOReader(
    PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);

public ProjectSupport supportsProjects();
{code}
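To illustrate what project push-down buys the IO, here is a toy model on plain maps — not the actual MongoDbTable code; MongoDB itself would express this as a $project aggregation stage — showing that only the requested field names survive per document:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of project push-down for a document store (not the actual
// MongoDbTable code): only the requested fieldNames are kept per document,
// in the requested order, mirroring what a MongoDB $project stage would do.
public class ProjectPushDown {
    public static Map<String, Object> project(Map<String, Object> doc, List<String> fieldNames) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String f : fieldNames) {
            if (doc.containsKey(f)) {
                out.put(f, doc.get(f));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("_id", 1, "name", "beam", "score", 42);
        Map<String, Object> projected = project(doc, List.of("name", "score"));
        assert projected.equals(Map.of("name", "beam", "score", 42));
        assert !projected.containsKey("_id"); // unreferenced field is dropped
    }
}
```

Dropping unused fields at the source cuts the amount of data the IO reads and deserializes, which is the point of implementing `supportsProjects()`.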
 





[jira] [Updated] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode

2019-11-13 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8583:

Status: Open  (was: Triage Needed)

> [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
> -
>
> Key: BEAM-8583
> URL: https://issues.apache.org/jira/browse/BEAM-8583
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> * Add BigQuery Dialect with TypeTranslation (since it is not implemented in 
> Calcite 1.20.0, but is present in unreleased versions).
>  * Create a BigQueryFilter class.
> * BigQueryTable#buildIOReader should translate supported filters into a SQL 
> string and pass it to BigQueryIO.
>  
> Potential improvements:
> * After updating the vendored Calcite, the class 
> `BigQuerySqlDialectWithTypeTranslation` can be deleted and Calcite's 
> `BigQuerySqlDialect` can be utilized instead.
>  * Once BigQuery adds support for more filters, `BigQueryFilter#isSupported` 
> should be updated.





[jira] [Resolved] (BEAM-8468) Add predicate/filter push-down capability to IO APIs

2019-11-13 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-8468.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> Add predicate/filter push-down capability to IO APIs
> 
>
> Key: BEAM-8468
> URL: https://issues.apache.org/jira/browse/BEAM-8468
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following methods:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);
> public BeamSqlTableFilter constructFilter(List<RexNode> filter);
> {code}
>  * Update a push-down rule to support predicate/filter push-down.
>  * Create a class
> {code:java}
> class TestTableFilter implements BeamSqlTableFilter{code}
>  





[jira] [Resolved] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-11-13 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-8508.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> In order to accomplish that, some checks need to be added so that certain 
> Calc and IO manipulations are not performed when only filter push-down is 
> needed.





[jira] [Commented] (BEAM-8042) Parsing of aggregate query fails

2019-11-13 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973664#comment-16973664
 ] 

Kirill Kozlov commented on BEAM-8042:
-

[~kanterov] After looking more into this, I suspect that it is a 
ResolvedAggregateScan bug: the aggregate list, which stores 
ResolvedComputedColumns, records wrong column indexes in its argumentList.

The following test cases pass:
{code:java}
String sql = "SELECT \n" +
"  id, \n" +
"  COUNT(id) as count, \n" +
"  SUM(has_f1) as f1_count, \n" +
"  SUM(has_f2) as f2_count, \n" +
"  SUM(has_f3) as f3_count, \n" +
"  SUM(has_f4) as f4_count, \n" +
"  SUM(has_f5) as f5_count, \n" +
"  SUM(has_f6) as f6_count  \n" +
"FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 
0 as has_f5, 0 as has_f6)\n" +
"GROUP BY id";
{code}
{code:java}
String sql = "SELECT \n" +
"  id, \n" +
"  SUM(has_f1) as f1_count, \n" +
"  SUM(has_f2) as f2_count, \n" +
"  SUM(has_f3) as f3_count, \n" +
"  SUM(has_f4) as f4_count, \n" +
"  SUM(has_f5) as f5_count, \n" +
"  SUM(has_f6) as f6_count, \n" +
"  COUNT(*) as count \n" +
"FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 
0 as has_f5, 0 as has_f6)\n" +
"GROUP BY id";
{code}
 

When COUNT is last, the combined aggregates look like this (as they should; 
the field indexes used are correct):
{code:java}
aggCalls = {ArrayList@5898}  size = 7
 0 = {AggregateCall@5899} "SUM($1)"
 1 = {AggregateCall@5947} "SUM($2)"
 2 = {AggregateCall@5948} "SUM($3)"
 3 = {AggregateCall@5949} "SUM($4)"
 4 = {AggregateCall@5950} "SUM($5)"
 5 = {AggregateCall@5951} "SUM($6)"
 6 = {AggregateCall@5952} "COUNT()"
{code}
When COUNT is first (the field indexes used are offset by 1):
{code:java}
aggCalls = {ArrayList@5898}  size = 7
 0 = {AggregateCall@5899} "COUNT()"
 1 = {AggregateCall@5947} "SUM($2)"
 2 = {AggregateCall@5948} "SUM($3)"
 3 = {AggregateCall@5949} "SUM($4)"
 4 = {AggregateCall@5950} "SUM($5)"
 5 = {AggregateCall@5951} "SUM($6)"
 6 = {AggregateCall@5952} "SUM($7)"
{code}
It increases by 1 the field indexes used by aggregates that come after COUNT, 
causing an index-out-of-bounds exception when validating the last one. Even if 
the out-of-bounds exception were not thrown, this would produce the wrong 
result, because the SUM aggregates would be computed over the wrong fields.
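The off-by-one can be checked arithmetically with a toy model (not the ZetaSQL resolver itself): the input row has 7 columns, indexes 0 through 6, so the shifted SUM($7) references a column that does not exist.

```java
import java.util.List;

// Toy reproduction of the index shift described above (not the ZetaSQL
// resolver): the input row has columns 0..6 (id, has_f1..has_f6). Correct
// SUM aggregates reference $1..$6; when COUNT(*) comes first, the SUM
// arguments end up shifted to $2..$7, and $7 is out of bounds.
public class AggregateIndexShift {
    public static boolean allInBounds(List<Integer> argIndexes, int fieldCount) {
        return argIndexes.stream().allMatch(i -> i >= 0 && i < fieldCount);
    }

    public static void main(String[] args) {
        int fieldCount = 7; // id, has_f1 .. has_f6
        List<Integer> countLast = List.of(1, 2, 3, 4, 5, 6);  // SUM($1)..SUM($6)
        List<Integer> countFirst = List.of(2, 3, 4, 5, 6, 7); // shifted SUM($2)..SUM($7)

        assert allInBounds(countLast, fieldCount);
        assert !allInBounds(countFirst, fieldCount); // $7 triggers the failure
    }
}
```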

> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
>   @Rule
>   public TestPipeline pipeline = 
> TestPipeline.fromOptions(createPipelineOptions());
>   private static PipelineOptions createPipelineOptions() {
> BeamSqlPipelineOptions opts = 
> PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class);
> opts.setPlannerName(ZetaSQLQueryPlanner.class.getName());
> return opts;
>   }
>   @Test
>   public void testAggregate() {
> Schema inputSchema = Schema.builder()
> .addByteArrayField("id")
> .addInt64Field("has_f1")
> .addInt64Field("has_f2")
> .addInt64Field("has_f3")
> .addInt64Field("has_f4")
> .addInt64Field("has_f5")
> .addInt64Field("has_f6")
> .build();
> String sql = "SELECT \n" +
> "  id, \n" +
> "  COUNT(*) as count, \n" +
> "  SUM(has_f1) as f1_count, \n" +
> "  SUM(has_f2) as f2_count, \n" +
> "  SUM(has_f3) as f3_count, \n" +
> "  SUM(has_f4) as f4_count, \n" +
> "  SUM(has_f5) as f5_count, \n" +
> "  SUM(has_f6) as f6_count  \n" +
> "FROM PCOLLECTION \n" +
> "GROUP BY id";
> pipeline
> .apply(Create.empty(inputSchema))
> .apply(SqlTransform.query(sql));
> pipeline.run();
>   }
> {code}
> {code}
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> org.apache.b

[jira] [Comment Edited] (BEAM-8042) Parsing of aggregate query fails

2019-11-12 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972928#comment-16972928
 ] 

Kirill Kozlov edited comment on BEAM-8042 at 11/13/19 1:03 AM:
---

I tried running a similar query, but without COUNT( * ), and it seems to construct 
a BeamRelNode successfully. 
{code:java}
ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);

String sql = "SELECT \n" +
"  id, \n" +
"  SUM(has_f1) as f1_count, \n" +
"  SUM(has_f2) as f2_count, \n" +
"  SUM(has_f3) as f3_count, \n" +
"  SUM(has_f4) as f4_count, \n" +
"  SUM(has_f5) as f5_count, \n" +
"  SUM(has_f6) as f6_count  \n" +
"FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 
0 as has_f5, 0 as has_f6)\n" +
"GROUP BY id";

BeamRelNode node = zetaSQLQueryPlanner.convertToBeamRel(sql);
{code}
 

https://jira.apache.org/jira/browse/BEAM-7609 may also be relevant to 
AggregateProjectMergeRule.


was (Author: kirillkozlov):
I tried running a similar query, but without COUNT(*), and it seems to construct 
a BeamRelNode successfully. 
{code:java}
ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);

String sql = "SELECT \n" +
"  id, \n" +
"  SUM(has_f1) as f1_count, \n" +
"  SUM(has_f2) as f2_count, \n" +
"  SUM(has_f3) as f3_count, \n" +
"  SUM(has_f4) as f4_count, \n" +
"  SUM(has_f5) as f5_count, \n" +
"  SUM(has_f6) as f6_count  \n" +
"FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 
0 as has_f5, 0 as has_f6)\n" +
"GROUP BY id";

BeamRelNode node = zetaSQLQueryPlanner.convertToBeamRel(sql);
{code}
 

https://jira.apache.org/jira/browse/BEAM-7609 may also be relevant to 
AggregateProjectMergeRule.

> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Critical
>
> {code}
>   @Rule
>   public TestPipeline pipeline = 
> TestPipeline.fromOptions(createPipelineOptions());
>   private static PipelineOptions createPipelineOptions() {
> BeamSqlPipelineOptions opts = 
> PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class);
> opts.setPlannerName(ZetaSQLQueryPlanner.class.getName());
> return opts;
>   }
>   @Test
>   public void testAggregate() {
> Schema inputSchema = Schema.builder()
> .addByteArrayField("id")
> .addInt64Field("has_f1")
> .addInt64Field("has_f2")
> .addInt64Field("has_f3")
> .addInt64Field("has_f4")
> .addInt64Field("has_f5")
> .addInt64Field("has_f6")
> .build();
> String sql = "SELECT \n" +
> "  id, \n" +
> "  COUNT(*) as count, \n" +
> "  SUM(has_f1) as f1_count, \n" +
> "  SUM(has_f2) as f2_count, \n" +
> "  SUM(has_f3) as f3_count, \n" +
> "  SUM(has_f4) as f4_count, \n" +
> "  SUM(has_f5) as f5_count, \n" +
> "  SUM(has_f6) as f6_count  \n" +
> "FROM PCOLLECTION \n" +
> "GROUP BY id";
> pipeline
> .apply(Create.empty(inputSchema))
> .apply(SqlTransform.query(sql));
> pipeline.run();
>   }
> {code}
> {code}
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>   at 
>   ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>   at 
> org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>   

[jira] [Comment Edited] (BEAM-8042) Parsing of aggregate query fails

2019-11-12 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972928#comment-16972928
 ] 

Kirill Kozlov edited comment on BEAM-8042 at 11/13/19 1:03 AM:
---

I tried running a similar query, but without COUNT(*), and it seems to construct 
a BeamRelNode successfully. 
{code:java}
ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);

String sql = "SELECT \n" +
"  id, \n" +
"  SUM(has_f1) as f1_count, \n" +
"  SUM(has_f2) as f2_count, \n" +
"  SUM(has_f3) as f3_count, \n" +
"  SUM(has_f4) as f4_count, \n" +
"  SUM(has_f5) as f5_count, \n" +
"  SUM(has_f6) as f6_count  \n" +
"FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 
0 as has_f5, 0 as has_f6)\n" +
"GROUP BY id";

BeamRelNode node = zetaSQLQueryPlanner.convertToBeamRel(sql);
{code}
 

https://jira.apache.org/jira/browse/BEAM-7609 may also be relevant to 
AggregateProjectMergeRule.


was (Author: kirillkozlov):
I tried running a similar query, but without COUNT(*), and it seems to construct 
a BeamRelNode successfully. 
{code:java}
ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);

String sql = "SELECT \n" +
"  id, \n" +
"  SUM(has_f1) as f1_count, \n" +
"  SUM(has_f2) as f2_count, \n" +
"  SUM(has_f3) as f3_count, \n" +
"  SUM(has_f4) as f4_count, \n" +
"  SUM(has_f5) as f5_count, \n" +
"  SUM(has_f6) as f6_count  \n" +
"FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 
0 as has_f5, 0 as has_f6)\n" +
"GROUP BY id";

BeamRelNode node = zetaSQLQueryPlanner.convertToBeamRel(sql);
{code}
 

https://jira.apache.org/jira/browse/BEAM-7609 may also be relevant to 
AggregateProjectMergeRule.

> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Critical
>
> {code}
>   @Rule
>   public TestPipeline pipeline = 
> TestPipeline.fromOptions(createPipelineOptions());
>   private static PipelineOptions createPipelineOptions() {
> BeamSqlPipelineOptions opts = 
> PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class);
> opts.setPlannerName(ZetaSQLQueryPlanner.class.getName());
> return opts;
>   }
>   @Test
>   public void testAggregate() {
> Schema inputSchema = Schema.builder()
> .addByteArrayField("id")
> .addInt64Field("has_f1")
> .addInt64Field("has_f2")
> .addInt64Field("has_f3")
> .addInt64Field("has_f4")
> .addInt64Field("has_f5")
> .addInt64Field("has_f6")
> .build();
> String sql = "SELECT \n" +
> "  id, \n" +
> "  COUNT(*) as count, \n" +
> "  SUM(has_f1) as f1_count, \n" +
> "  SUM(has_f2) as f2_count, \n" +
> "  SUM(has_f3) as f3_count, \n" +
> "  SUM(has_f4) as f4_count, \n" +
> "  SUM(has_f5) as f5_count, \n" +
> "  SUM(has_f6) as f6_count  \n" +
> "FROM PCOLLECTION \n" +
> "GROUP BY id";
> pipeline
> .apply(Create.empty(inputSchema))
> .apply(SqlTransform.query(sql));
> pipeline.run();
>   }
> {code}
> {code}
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>   at 
>   ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>   at 
> org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>   at

[jira] [Commented] (BEAM-8042) Parsing of aggregate query fails

2019-11-12 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972928#comment-16972928
 ] 

Kirill Kozlov commented on BEAM-8042:
-

I tried running a similar query, but without COUNT(*), and it seems to construct 
a BeamRelNode successfully. 
{code:java}
ZetaSQLQueryPlanner zetaSQLQueryPlanner = new ZetaSQLQueryPlanner(config);

String sql = "SELECT \n" +
"  id, \n" +
"  SUM(has_f1) as f1_count, \n" +
"  SUM(has_f2) as f2_count, \n" +
"  SUM(has_f3) as f3_count, \n" +
"  SUM(has_f4) as f4_count, \n" +
"  SUM(has_f5) as f5_count, \n" +
"  SUM(has_f6) as f6_count  \n" +
"FROM (select 0 as id, 0 as has_f1, 0 as has_f2, 0 as has_f3, 0 as has_f4, 
0 as has_f5, 0 as has_f6)\n" +
"GROUP BY id";

BeamRelNode node = zetaSQLQueryPlanner.convertToBeamRel(sql);
{code}
 

https://jira.apache.org/jira/browse/BEAM-7609 may also be relevant to 
AggregateProjectMergeRule.

> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Critical
>
> {code}
>   @Rule
>   public TestPipeline pipeline = 
> TestPipeline.fromOptions(createPipelineOptions());
>   private static PipelineOptions createPipelineOptions() {
> BeamSqlPipelineOptions opts = 
> PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class);
> opts.setPlannerName(ZetaSQLQueryPlanner.class.getName());
> return opts;
>   }
>   @Test
>   public void testAggregate() {
> Schema inputSchema = Schema.builder()
> .addByteArrayField("id")
> .addInt64Field("has_f1")
> .addInt64Field("has_f2")
> .addInt64Field("has_f3")
> .addInt64Field("has_f4")
> .addInt64Field("has_f5")
> .addInt64Field("has_f6")
> .build();
> String sql = "SELECT \n" +
> "  id, \n" +
> "  COUNT(*) as count, \n" +
> "  SUM(has_f1) as f1_count, \n" +
> "  SUM(has_f2) as f2_count, \n" +
> "  SUM(has_f3) as f3_count, \n" +
> "  SUM(has_f4) as f4_count, \n" +
> "  SUM(has_f5) as f5_count, \n" +
> "  SUM(has_f6) as f6_count  \n" +
> "FROM PCOLLECTION \n" +
> "GROUP BY id";
> pipeline
> .apply(Create.empty(inputSchema))
> .apply(SqlTransform.query(sql));
> pipeline.run();
>   }
> {code}
> {code}
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>   at 
>   ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>   at 
> org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
>   ... 48 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8586) [SQL] Add a server for MongoDb Integration Test

2019-11-11 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8586:

Summary: [SQL] Add a server for MongoDb Integration Test  (was: Add a 
server for MongoDb Integration Test)

> [SQL] Add a server for MongoDb Integration Test
> ---
>
> Key: BEAM-8586
> URL: https://issues.apache.org/jira/browse/BEAM-8586
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to pass pipeline options with server information to the 
> MongoDbReadWriteIT.
> For now that test is ignored and excluded from the build.gradle file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8586) [SQL] Add a server for MongoDb Integration Test

2019-11-11 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8586:

Status: Open  (was: Triage Needed)

> [SQL] Add a server for MongoDb Integration Test
> ---
>
> Key: BEAM-8586
> URL: https://issues.apache.org/jira/browse/BEAM-8586
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to pass pipeline options with server information to the 
> MongoDbReadWriteIT.
> For now that test is ignored and excluded from the build.gradle file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8427) [SQL] Add support for MongoDB source

2019-11-11 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8427:

Description: 
* Create a MongoDB table and table provider.
 * Implement buildIOReader
 * Support primitive types
 * Implement buildIOWrite
 * improve getTableStatistics

  was:
In progress:
 * Create a MongoDB table and table provider.
 * Implement buildIOReader
 * Support primitive types

Still needs to be done:
 * Implement buildIOWrite
 * improve getTableStatistics


> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
>  * Implement buildIOWrite
>  * improve getTableStatistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode

2019-11-07 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8583:

Description: 
* Add BigQuery Dialect with TypeTranslation (since it is not implemented in 
Calcite 1.20.0, but is present in unreleased versions).
 * Create a BigQueryFilter class.
 * BigQueryTable#buildIOReader should translate supported filters into a SQL 
string and pass it to BigQueryIO.

 

Potential improvements:
 * After updating the vendored Calcite, the class `BigQuerySqlDialectWithTypeTranslation` 
can be deleted and Calcite's `BigQuerySqlDialect` can be used instead.
 * Once BigQuery adds support for more filters, `BigQueryFilter#isSupported` 
should be updated.
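The filter translation described above amounts to rendering the supported predicates back into the dialect's SQL text. As a rough, self-contained sketch of the idea (hypothetical names, not the actual BigQueryTable code):

```java
import java.util.List;
import java.util.stream.Collectors;

public class FilterToSqlSketch {
    // Renders simple "field op literal" predicates into a WHERE fragment,
    // roughly what a table with predicate push-down would hand to the source IO.
    public static String toWhereClause(List<String> predicates) {
        if (predicates.isEmpty()) {
            return "";
        }
        return "WHERE " + predicates.stream()
                .map(p -> "(" + p + ")")
                .collect(Collectors.joining(" AND "));
    }
}
```

Unsupported predicates would be left out of the clause and kept in the Calc, so the pipeline still applies them after the read.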

> [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
> -
>
> Key: BEAM-8583
> URL: https://issues.apache.org/jira/browse/BEAM-8583
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Add BigQuery Dialect with TypeTranslation (since it is not implemented in 
> Calcite 1.20.0, but is present in unreleased versions).
>  * Create a BigQueryFilter class.
>  * BigQueryTable#buildIOReader should translate supported filters into a Sql 
> string and pass it to BigQueryIO.
>  
> Potential improvements:
>  * After updating vendor Calcite, class 
> `BigQuerySqlDialectWithTypeTranslation` can be deleted and Calcite's 
> `BigQuerySqlDialect` can be utilized instead.
>  * Once BigQuery adds support for more filters, `BigQueryFilter#isSupported` 
> should be updated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails

2019-11-07 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov closed BEAM-8574.
---
Fix Version/s: 2.18.0
   Resolution: Resolved

> [SQL] MongoDb PostCommit_SQL fails
> --
>
> Key: BEAM-8574
> URL: https://issues.apache.org/jira/browse/BEAM-8574
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Critical
> Fix For: 2.18.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Integration test for Sql MongoDb table read and write fails.
> Jenkins: 
> [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
> Cause: [https://github.com/apache/beam/pull/9806] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8586) Add a server for MongoDb Integration Test

2019-11-07 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-8586:
---

 Summary: Add a server for MongoDb Integration Test
 Key: BEAM-8586
 URL: https://issues.apache.org/jira/browse/BEAM-8586
 Project: Beam
  Issue Type: Test
  Components: dsl-sql
Reporter: Kirill Kozlov


We need to pass pipeline options with server information to the 
MongoDbReadWriteIT.

For now that test is ignored and excluded from the build.gradle file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode

2019-11-07 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-8583:
---

 Summary: [SQL] BigQuery should support predicate push-down in 
DIRECT_READ mode
 Key: BEAM-8583
 URL: https://issues.apache.org/jira/browse/BEAM-8583
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Kirill Kozlov
Assignee: Kirill Kozlov






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails

2019-11-06 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8574:

Description: 
Integration test for Sql MongoDb table read and write fails.

Jenkins: 
[https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]

Cause: [https://github.com/apache/beam/pull/9806] 

Revert PR: [https://github.com/apache/beam/pull/10018]

  was:
Integration test for Sql MongoDb table read and write fails.

Jenkins: 
[https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]

Cause: [https://github.com/apache/beam/pull/9806] 


> [SQL] MongoDb PostCommit_SQL fails
> --
>
> Key: BEAM-8574
> URL: https://issues.apache.org/jira/browse/BEAM-8574
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Critical
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Integration test for Sql MongoDb table read and write fails.
> Jenkins: 
> [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
> Cause: [https://github.com/apache/beam/pull/9806] 
> Revert PR: [https://github.com/apache/beam/pull/10018]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails

2019-11-06 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8574:

Description: 
Integration test for Sql MongoDb table read and write fails.

Jenkins: 
[https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]

Cause: [https://github.com/apache/beam/pull/9806] 

  was:
Integration test for Sql MongoDb table read and write fails.

Jenkins: 
[https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]

Cause: [https://github.com/apache/beam/pull/9806] 

Revert PR: [https://github.com/apache/beam/pull/10018]


> [SQL] MongoDb PostCommit_SQL fails
> --
>
> Key: BEAM-8574
> URL: https://issues.apache.org/jira/browse/BEAM-8574
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Critical
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Integration test for Sql MongoDb table read and write fails.
> Jenkins: 
> [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
> Cause: [https://github.com/apache/beam/pull/9806] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails

2019-11-06 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8574:

Description: 
Integration test for Sql MongoDb table read and write fails.

Jenkins: 
[https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]

Cause: [https://github.com/apache/beam/pull/9806] 

  was:
Integration test for Sql MongoDb table read and write fails.

Jenkins: 
[https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]

 


> [SQL] MongoDb PostCommit_SQL fails
> --
>
> Key: BEAM-8574
> URL: https://issues.apache.org/jira/browse/BEAM-8574
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Critical
>
> Integration test for Sql MongoDb table read and write fails.
> Jenkins: 
> [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
> Cause: [https://github.com/apache/beam/pull/9806] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails

2019-11-06 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8574:

Description: 
Integration test for Sql MongoDb table read and write fails.

Jenkins: 
[https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]

 

  was:
Integration test for Sql MongoDb table read and write fails.

[https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]


> [SQL] MongoDb PostCommit_SQL fails
> --
>
> Key: BEAM-8574
> URL: https://issues.apache.org/jira/browse/BEAM-8574
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Critical
>
> Integration test for Sql MongoDb table read and write fails.
> Jenkins: 
> [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails

2019-11-06 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8574:

Description: 
Integration test for Sql MongoDb table read and write fails.

[https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]

  was:Integration test for Sql MongoDb table read and write fails.


> [SQL] MongoDb PostCommit_SQL fails
> --
>
> Key: BEAM-8574
> URL: https://issues.apache.org/jira/browse/BEAM-8574
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Critical
>
> Integration test for Sql MongoDb table read and write fails.
> [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails

2019-11-06 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-8574:
---

 Summary: [SQL] MongoDb PostCommit_SQL fails
 Key: BEAM-8574
 URL: https://issues.apache.org/jira/browse/BEAM-8574
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Kirill Kozlov
Assignee: Kirill Kozlov


Integration test for Sql MongoDb table read and write fails.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-11-06 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8343:

Description: 
The objective is to create a universal way for Beam SQL IO APIs to support 
predicate/project push-down.
 A proposed way to achieve that is by introducing an interface responsible for 
identifying what portion(s) of a Calc can be moved down to IO layer. Also, 
adding following methods to a BeamSqlTable interface to pass necessary 
parameters to IO APIs:
 - BeamSqlTableFilter constructFilter(List filter)
 - ProjectSupport supportsProjects()
 - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
List fieldNames)
  

ProjectSupport is an enum with the following options:
 * NONE
 * WITHOUT_FIELD_REORDERING
 * WITH_FIELD_REORDERING

 

Design doc 
[link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
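As a self-contained illustration of the ProjectSupport semantics listed above (the names mirror the description; this is a sketch, not the actual Beam enum):

```java
public class ProjectSupportSketch {
    // Mirrors the three options from the description above.
    public enum ProjectSupport {
        NONE, WITHOUT_FIELD_REORDERING, WITH_FIELD_REORDERING;

        // A table supports project push-down unless it opted out entirely.
        public boolean supportsProjects() {
            return this != NONE;
        }

        // Only the strongest option lets the planner reorder output fields.
        public boolean supportsFieldReordering() {
            return this == WITH_FIELD_REORDERING;
        }
    }
}
```

A planner consulting such an enum would push projected field names down whenever supportsProjects() holds, and additionally skip the re-ordering Calc when supportsFieldReordering() holds.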

  was:
The objective is to create a universal way for Beam SQL IO APIs to support 
predicate/project push-down.
 A proposed way to achieve that is by introducing an interface responsible for 
identifying what portion(s) of a Calc can be moved down to IO layer. Also, 
adding following methods to a BeamSqlTable interface to pass necessary 
parameters to IO APIs:
 - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
 - ProjectSupport supportsProjects()
 - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
List<String> fieldNames)
  

 * ProjectSupport is an enum with the following options:
 * NONE
 * WITHOUT_FIELD_REORDERING
 * WITH_FIELD_REORDERING

 

Design doc 
[link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].


> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary 
> parameters to IO APIs:
>  - BeamSqlTableFilter constructFilter(List<RexNode> filter)
>  - ProjectSupport supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>   
> ProjectSupport is an enum with the following options:
>  * NONE
>  * WITHOUT_FIELD_REORDERING
>  * WITH_FIELD_REORDERING
>  
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-11-06 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8343:

Description: 
The objective is to create a universal way for Beam SQL IO APIs to support 
predicate/project push-down.
 A proposed way to achieve that is by introducing an interface responsible for 
identifying what portion(s) of a Calc can be moved down to IO layer. Also, 
adding following methods to a BeamSqlTable interface to pass necessary 
parameters to IO APIs:
 - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
 - ProjectSupport supportsProjects()
 - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
List<String> fieldNames)
  

 ProjectSupport is an enum with the following options:
 * NONE
 * WITHOUT_FIELD_REORDERING
 * WITH_FIELD_REORDERING

 

Design doc 
[link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].

  was:
The objective is to create a universal way for Beam SQL IO APIs to support 
predicate/project push-down.
 A proposed way to achieve that is by introducing an interface responsible for 
identifying what portion(s) of a Calc can be moved down to IO layer. Also, 
adding following methods to a BeamSqlTable interface to pass necessary 
parameters to IO APIs:
 - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
 - Boolean supportsProjects()
 - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
List<String> fieldNames)
  

Design doc 
[link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].


> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding following methods to a BeamSqlTable interface to pass necessary 
> parameters to IO APIs:
>  - BeamSqlTableFilter supportsFilter(RexProgram program, RexNode filter)
>  - ProjectSupport supportsProjects()
>  - PCollection<Row> buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List<String> fieldNames)
>   
>  ProjectSupport is an enum with the following options:
>  * NONE
>  * WITHOUT_FIELD_REORDERING
>  * WITH_FIELD_REORDERING
>  
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8514) ZetaSql should use cost-based optimization to take advantage of Join Reordering Rule and Push-Down Rule

2019-11-06 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968625#comment-16968625
 ] 

Kirill Kozlov commented on BEAM-8514:
-

There are currently no ZetaSql tests to validate Join Reordering.

> ZetaSql should use cost-based optimization to take advantage of Join 
> Reordering Rule and Push-Down Rule
> ---
>
> Key: BEAM-8514
> URL: https://issues.apache.org/jira/browse/BEAM-8514
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql-zetasql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Default config should use BeamCostModel, as well as tests with custom 
> configuration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8498) [SQL] MongoDb table reader should convert Documents directly to Row (and vice versa)

2019-11-06 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8498:

Summary: [SQL] MongoDb table reader should convert Documents directly to 
Row (and vice versa)  (was: [SQL] MongoDb table reader should convert Documents 
directly to Row)

> [SQL] MongoDb table reader should convert Documents directly to Row (and vice 
> versa)
> 
>
> Key: BEAM-8498
> URL: https://issues.apache.org/jira/browse/BEAM-8498
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Priority: Major
>  Labels: performance
>
> As of right now, the process of reading from MongoDb in SQL is as follows:
> Read MongoDb Documents -> Convert Documents to JSON -> Convert JSON to Rows.
> It should be possible to get rid of the middle step by making use of 
> RowWithGetters (vs RowWithStorage).
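
The difference between the two conversion strategies can be sketched as follows. RowWithStorage and RowWithGetters are real Beam classes, but this sketch uses a plain Map<String, Object> as a hypothetical stand-in for a BSON org.bson.Document so it stays self-contained; the class names StorageRow and GetterRow are illustrative, not Beam API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DirectRowSketch {

  // Eager conversion in the spirit of RowWithStorage: every field value is
  // copied out of the document up front (the current MongoDb code path even
  // routes this copy through an intermediate JSON string).
  public static class StorageRow {
    private final Map<String, Object> values = new LinkedHashMap<>();
    public StorageRow(Map<String, Object> document) {
      values.putAll(document); // copy happens here, once per row
    }
    public Object getValue(String field) { return values.get(field); }
  }

  // Lazy conversion in the spirit of RowWithGetters: the "row" just wraps the
  // document and fetches a field only when asked, skipping both the JSON
  // round trip and the up-front copy.
  public static class GetterRow {
    private final Map<String, Object> document;
    public GetterRow(Map<String, Object> document) { this.document = document; }
    public Object getValue(String field) { return document.get(field); }
  }

  public static void main(String[] args) {
    Map<String, Object> doc = new LinkedHashMap<>();
    doc.put("_id", "abc123");
    doc.put("count", 42);

    // Both rows expose the same values; only the conversion cost differs.
    System.out.println(new StorageRow(doc).getValue("count"));
    System.out.println(new GetterRow(doc).getValue("count"));
  }
}
```

The getter-backed approach is attractive for reads because fields that a downstream projection never touches are never materialized at all.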



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8428) [SQL] BigQuery should support project push-down in DIRECT_READ mode

2019-11-02 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-8428.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> [SQL] BigQuery should support project push-down in DIRECT_READ mode
> ---
>
> Key: BEAM-8428
> URL: https://issues.apache.org/jira/browse/BEAM-8428
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> BigQuery should perform project push-down for read pipelines when applicable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

