[jira] [Commented] (BEAM-4704) String operations yield incorrect results when executed through SQL shell

2018-09-05 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604072#comment-16604072
 ] 

Julian Hyde commented on BEAM-4704:
---

Calcite's implementation complies (mostly) with the SQL standard, which says 
that the trim string must be exactly one character long. The  standard says 
that the implementation must throw, whereas Calcite just takes the first 
character.

> String operations yield incorrect results when executed through SQL shell
> -
>
> Key: BEAM-4704
> URL: https://issues.apache.org/jira/browse/BEAM-4704
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Priority: Major
>
> {{TRIM}} is defined to trim _all_ the characters in the first string from the 
> string-to-be-trimmed. Calcite has an incorrect implementation of this. We use 
> our own fixed implementation. But when executed through the SQL shell, the 
> results do not match what we get from the PTransform path. Here two test 
> cases that pass on {{master}} but are incorrect in the shell:
> {code:sql}
> BeamSQL> select TRIM(LEADING 'eh' FROM 'hehe__hehe');
> ++
> | EXPR$0 |
> ++
> | hehe__hehe |
> ++
> {code}
> {code:sql}
> BeamSQL> select TRIM(TRAILING 'eh' FROM 'hehe__hehe');
> ++
> |   EXPR$0   |
> ++
> | hehe__heh  |
> ++
> {code}
> {code:sql}
> BeamSQL> select TRIM(BOTH 'eh' FROM 'hehe__hehe');
> ++
> |   EXPR$0   |
> ++
> | hehe__heh  |
> ++
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-2478) Distinct Aggregates

2017-06-25 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062450#comment-16062450
 ] 

Julian Hyde edited comment on BEAM-2478 at 6/25/17 10:45 PM:
-

Your rewrite for hierarchical calculation is slightly wrong.

{code}
select a, count(distinct b) from t group by a

becomes

select a, count(distinct_b) from (
  select a, b as distinct_b
  from t
  group by a, b)
group by a
{code}

This correctly ignores rows where b is null.

Calcite's 
[AggregateExpandDistinctAggregatesRule|https://insight.io/github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.java]
 does this rewrite; it can also do a more complex rewrite using GROUPING SETS 
if there are multiple distinct-counts in the same query. See also CALCITE-1588 
for approximate distinct-count.


was (Author: julianhyde):
Your rewrite for hierarchical calculation is slightly wrong.

{code}
select a, count(distinct b) from t group by a

becomes

select a, count(distinct_b) from (
  select a, b as distinct_b
  from t
  group by a, b)
group by a)
{code}

This correctly ignores rows where b is null.

Calcite's 
[AggregateExpandDistinctAggregatesRule|https://insight.io/github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.java]
 does this rewrite; it can also do a more complex rewrite using GROUPING SETS 
if there are multiple distinct-counts in the same query. See also CALCITE-1588 
for approximate distinct-count.

> Distinct Aggregates
> ---
>
> Key: BEAM-2478
> URL: https://issues.apache.org/jira/browse/BEAM-2478
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Jingsong Lee
>Assignee: Tarush Grover
>
> eg: COUNT(DISTINCT empno)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2478) Distinct Aggregates

2017-06-25 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062450#comment-16062450
 ] 

Julian Hyde commented on BEAM-2478:
---

Your rewrite for hierarchical calculation is slightly wrong.

{code}
select a, count(distinct b) from t group by a

becomes

select a, count(distinct_b) from (
  select a, b as distinct_b
  from t
  group by a, b)
group by a)
{code}

This correctly ignores rows where b is null.

Calcite's 
[AggregateExpandDistinctAggregatesRule|https://insight.io/github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/AggregateExpandDistinctAggregatesRule.java]
 does this rewrite; it can also do a more complex rewrite using GROUPING SETS 
if there are multiple distinct-counts in the same query. See also CALCITE-1588 
for approximate distinct-count.

> Distinct Aggregates
> ---
>
> Key: BEAM-2478
> URL: https://issues.apache.org/jira/browse/BEAM-2478
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Jingsong Lee
>Assignee: Tarush Grover
>
> eg: COUNT(DISTINCT empno)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-301) Add a Beam SQL DSL

2017-06-22 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060361#comment-16060361
 ] 

Julian Hyde commented on BEAM-301:
--

I'd like to track/participate in the SQL-DSL effort. But sorry, I don't want to 
subscribe to dev@beam (I'm already subscribed to dev@drill, phoenix, samza, 
calcite... the load is too heavy). I've been watching this JIRA case hoping 
that I'd hear about efforts.

> Add a Beam SQL DSL
> --
>
> Key: BEAM-301
> URL: https://issues.apache.org/jira/browse/BEAM-301
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql, sdk-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Xu Mingmin
>
> The SQL DSL helps developers to build a Beam pipeline from SQL statement in 
> String directly. 
> In Phase I, it starts to support INSERT/SELECT queries with FILTERs, one 
> example SQL as below:
> {code}
> INSERT INTO `SUB_USEREVENT` (`SITEID`, `PAGEID`, `PAGENAME`, `EVENTTIMESTAMP`)
> (SELECT STREAM `USEREVENT`.`SITEID`, `USEREVENT`.`PAGEID`, 
> `USEREVENT`.`PAGENAME`, `USEREVENT`.`EVENTTIMESTAMP`
> FROM `USEREVENT` AS `USEREVENT`
> WHERE `USEREVENT`.`SITEID` > 10)
> {code}
> A design doc is available at 
> https://docs.google.com/document/d/1Uc5xYTpO9qsLXtT38OfuoqSLimH_0a1Bz5BsCROMzCU/edit?usp=sharing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-301) Add a Beam SQL DSL

2017-04-17 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971504#comment-15971504
 ] 

Julian Hyde commented on BEAM-301:
--

[~atroche], For streaming SQL, Calcite's main role is to lead the discussion of 
semantics and provide SQL parsing/validation support. We have a "reference 
implementation" of streaming SQL (single node, not very efficient) but it lags 
behind what engines such as Flink can do.

> Add a Beam SQL DSL
> --
>
> Key: BEAM-301
> URL: https://issues.apache.org/jira/browse/BEAM-301
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Xu Mingmin
>
> The SQL DSL helps developers to build a Beam pipeline from SQL statement in 
> String directly. 
> In Phase I, it starts to support INSERT/SELECT queries with FILTERs, one 
> example SQL as below:
> {code}
> INSERT INTO `SUB_USEREVENT` (`SITEID`, `PAGEID`, `PAGENAME`, `EVENTTIMESTAMP`)
> (SELECT STREAM `USEREVENT`.`SITEID`, `USEREVENT`.`PAGEID`, 
> `USEREVENT`.`PAGENAME`, `USEREVENT`.`EVENTTIMESTAMP`
> FROM `USEREVENT` AS `USEREVENT`
> WHERE `USEREVENT`.`SITEID` > 10)
> {code}
> A design doc is available at 
> https://docs.google.com/document/d/1Uc5xYTpO9qsLXtT38OfuoqSLimH_0a1Bz5BsCROMzCU/edit?usp=sharing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-301) Add a Beam SQL DSL

2017-04-14 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969712#comment-15969712
 ] 

Julian Hyde commented on BEAM-301:
--

Yes. And "last year" is a rolling total based on the time each order was 
placed, not the time the query was started.

> Add a Beam SQL DSL
> --
>
> Key: BEAM-301
> URL: https://issues.apache.org/jira/browse/BEAM-301
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Xu Mingmin
>
> The SQL DSL helps developers to build a Beam pipeline from SQL statement in 
> String directly. 
> In Phase I, it starts to support INSERT/SELECT queries with FILTERs, one 
> example SQL as below:
> {code}
> INSERT INTO `SUB_USEREVENT` (`SITEID`, `PAGEID`, `PAGENAME`, `EVENTTIMESTAMP`)
> (SELECT STREAM `USEREVENT`.`SITEID`, `USEREVENT`.`PAGEID`, 
> `USEREVENT`.`PAGENAME`, `USEREVENT`.`EVENTTIMESTAMP`
> FROM `USEREVENT` AS `USEREVENT`
> WHERE `USEREVENT`.`SITEID` > 10)
> {code}
> A design doc is available at 
> https://docs.google.com/document/d/1Uc5xYTpO9qsLXtT38OfuoqSLimH_0a1Bz5BsCROMzCU/edit?usp=sharing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-301) Add a Beam SQL DSL

2017-04-13 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968460#comment-15968460
 ] 

Julian Hyde commented on BEAM-301:
--

[~takidau], +1 to that approach.

In the short term I think we will have multiple variants of streaming SQL. 
Calcite will support STREAM and using the same identifier for streams/tables, 
and will provide a 
[switch|https://calcite.apache.org/apidocs/org/apache/calcite/sql/validate/SqlConformance.html]
 so that Beam can disable them. Over the longer term I will try make the case 
that these features are useful (or I will fail, and these features will wither 
away). I can't really make the case until we have features like self-join of a 
stream to its own history.

The crucial query which illustrates this is this one:

{code}select stream *
from Orders as o
where units > (
  select avg(units)
  from Orders as h
  where h.productId = o.productId
  and h.rowtime > o.rowtime - interval ‘1’ year){code}

It combines the {{Orders}} stream with its own history. But after the query has 
been running for a while, the records that passed through the stream will have 
entered the history. The history relation {{h}} is neither bounded, nor 
unbounded (in Beam's terms), but time-varying.

> Add a Beam SQL DSL
> --
>
> Key: BEAM-301
> URL: https://issues.apache.org/jira/browse/BEAM-301
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Xu Mingmin
>
> The SQL DSL helps developers to build a Beam pipeline from SQL statement in 
> String directly. 
> In Phase I, it starts to support INSERT/SELECT queries with FILTERs, one 
> example SQL as below:
> {code}
> INSERT INTO `SUB_USEREVENT` (`SITEID`, `PAGEID`, `PAGENAME`, `EVENTTIMESTAMP`)
> (SELECT STREAM `USEREVENT`.`SITEID`, `USEREVENT`.`PAGEID`, 
> `USEREVENT`.`PAGENAME`, `USEREVENT`.`EVENTTIMESTAMP`
> FROM `USEREVENT` AS `USEREVENT`
> WHERE `USEREVENT`.`SITEID` > 10)
> {code}
> A design doc is available at 
> https://docs.google.com/document/d/1Uc5xYTpO9qsLXtT38OfuoqSLimH_0a1Bz5BsCROMzCU/edit?usp=sharing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-301) Add a Beam SQL DSL

2017-04-06 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959139#comment-15959139
 ] 

Julian Hyde commented on BEAM-301:
--

{{EMIT}} is just slide-ware at this point. I don't plan to start work on 
parsing/validating it until we have nailed down the specification. Can someone 
from the Beam side check that it captures the main triggering functionality?

I take your point about {{STREAM}}. Note that on the slide, {{EMIT WHEN 
COMPLETE}} is basically the converse of {{STREAM}}. So, we might end up keeping 
{{STREAM}} as an alternative way of specifying {{EMIT EARLY}} (or something).

Without the {{STREAM}} keyword to distinguish them, how would you phrase the 
queries {code}SELECT STREAM * FROM Orders WHERE units > 100{code} (a streaming 
query, returning only orders received from now on, and not terminating) and 
{code}SELECT COUNT(DISTINCT productId) FROM Orders WHERE EXTRACT(YEAR FROM 
rowtime) = 2015{code} (a historical query, which emits one row and terminates).

> Add a Beam SQL DSL
> --
>
> Key: BEAM-301
> URL: https://issues.apache.org/jira/browse/BEAM-301
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Xu Mingmin
>
> The SQL DSL helps developers to build a Beam pipeline from SQL statement in 
> String directly. 
> In Phase I, it starts to support INSERT/SELECT queries with FILTERs, one 
> example SQL as below:
> {code}
> INSERT INTO `SUB_USEREVENT` (`SITEID`, `PAGEID`, `PAGENAME`, `EVENTTIMESTAMP`)
> (SELECT STREAM `USEREVENT`.`SITEID`, `USEREVENT`.`PAGEID`, 
> `USEREVENT`.`PAGENAME`, `USEREVENT`.`EVENTTIMESTAMP`
> FROM `USEREVENT` AS `USEREVENT`
> WHERE `USEREVENT`.`SITEID` > 10)
> {code}
> A design doc is available at 
> https://docs.google.com/document/d/1Uc5xYTpO9qsLXtT38OfuoqSLimH_0a1Bz5BsCROMzCU/edit?usp=sharing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (BEAM-301) Add a Beam SQL DSL

2017-04-05 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957945#comment-15957945
 ] 

Julian Hyde edited comment on BEAM-301 at 4/5/17 10:47 PM:
---

A couple of recent developments in Calcite's streaming SQL:
* A proposed {{EMIT}} clause to match Beam's "triggering" (the term "trigger" 
has some unfortunate connotations in the SQL world). Appears on slide 15 of [my 
"Streaming SQL" talk at FlinkForward 
2016|https://www.slideshare.net/julianhyde/streaming-sql-at-flinkforward-berlin-20160912].
* A {{MATCH_RECOGNIZE}} clause, inspired by Oracle, good for non-streaming and 
streaming queries, but clearly good for "Complex Event Processing" (CEP) kinds 
of applications. Currently we have parser and validator support but no 
implementation; see CALCITE-1570.


was (Author: julianhyde):
A couple of recent developments in Calcite's streaming SQL:
* A proposed {{EMIT}} clause to match Beam's "triggering" (the term "trigger" 
has some unfortunate connotations in the SQL world). Appears on slide 15 of 
[https://www.slideshare.net/julianhyde/streaming-sql-at-flinkforward-berlin-20160912|my
 "Streaming SQL" talk at FlinkForward].
* A {{MATCH_RECOGNIZE}} clause, inspired by Oracle, good for non-streaming and 
streaming queries, but clearly good for "Complex Event Processing" (CEP) kinds 
of applications. Currently we have parser and validator support but no 
implementation; see CALCITE-1570.

> Add a Beam SQL DSL
> --
>
> Key: BEAM-301
> URL: https://issues.apache.org/jira/browse/BEAM-301
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Xu Mingmin
>
> The SQL DSL helps developers to build a Beam pipeline from SQL statement in 
> String directly. 
> In Phase I, it starts to support INSERT/SELECT queries with FILTERs, one 
> example SQL as below:
> {code}
> INSERT INTO `SUB_USEREVENT` (`SITEID`, `PAGEID`, `PAGENAME`, `EVENTTIMESTAMP`)
> (SELECT STREAM `USEREVENT`.`SITEID`, `USEREVENT`.`PAGEID`, 
> `USEREVENT`.`PAGENAME`, `USEREVENT`.`EVENTTIMESTAMP`
> FROM `USEREVENT` AS `USEREVENT`
> WHERE `USEREVENT`.`SITEID` > 10)
> {code}
> A design doc is available at 
> https://docs.google.com/document/d/1Uc5xYTpO9qsLXtT38OfuoqSLimH_0a1Bz5BsCROMzCU/edit?usp=sharing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (BEAM-301) Add a Beam SQL DSL

2017-04-05 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957945#comment-15957945
 ] 

Julian Hyde commented on BEAM-301:
--

A couple of recent developments in Calcite's streaming SQL:
* A proposed {{EMIT}} clause to match Beam's "triggering" (the term "trigger" 
has some unfortunate connotations in the SQL world). Appears on slide 15 of 
[https://www.slideshare.net/julianhyde/streaming-sql-at-flinkforward-berlin-20160912|my
 "Streaming SQL" talk at FlinkForward].
* A {{MATCH_RECOGNIZE}} clause, inspired by Oracle, good for non-streaming and 
streaming queries, but clearly good for "Complex Event Processing" (CEP) kinds 
of applications. Currently we have parser and validator support but no 
implementation; see CALCITE-1570.

> Add a Beam SQL DSL
> --
>
> Key: BEAM-301
> URL: https://issues.apache.org/jira/browse/BEAM-301
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Xu Mingmin
>
> The SQL DSL helps developers to build a Beam pipeline from SQL statement in 
> String directly. 
> In Phase I, it starts to support INSERT/SELECT queries with FILTERs, one 
> example SQL as below:
> {code}
> INSERT INTO `SUB_USEREVENT` (`SITEID`, `PAGEID`, `PAGENAME`, `EVENTTIMESTAMP`)
> (SELECT STREAM `USEREVENT`.`SITEID`, `USEREVENT`.`PAGEID`, 
> `USEREVENT`.`PAGENAME`, `USEREVENT`.`EVENTTIMESTAMP`
> FROM `USEREVENT` AS `USEREVENT`
> WHERE `USEREVENT`.`SITEID` > 10)
> {code}
> A design doc is available at 
> https://docs.google.com/document/d/1Uc5xYTpO9qsLXtT38OfuoqSLimH_0a1Bz5BsCROMzCU/edit?usp=sharing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)