[jira] [Commented] (CALCITE-2312) Support Partition By in sql select statement

Julian Hyde (JIRA) Thu, 17 May 2018 10:18:15 -0700

    [ 
https://issues.apache.org/jira/browse/CALCITE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479409#comment-16479409
 ]


Julian Hyde commented on CALCITE-2312:
--------------------------------------

That sounds like a plan. You probably want to represent other partitioning 
schemes (e.g. round robin, singleton) as well as partitioned by keys. I don't 
think we'd add this to Calcite's DDL parser but if you can share what you come 
up with, it might be interesting to other people.
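
As a rough illustration of representing several partitioning schemes side by 
side, here is a minimal sketch that maps the text of a PARTITION BY clause to a 
scheme plus optional keys. PartitionSpec and parse are hypothetical names, not 
Calcite classes; only the enum constant names are borrowed from Calcite's 
RelDistribution.Type. Range partitioning is left out because the issue does not 
settle on a syntax for it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for a parsed PARTITION BY clause; not part of Calcite. */
final class PartitionSpec {
  /** Constant names mirror Calcite's RelDistribution.Type. */
  enum Type {
    HASH_DISTRIBUTED, SINGLETON, BROADCAST_DISTRIBUTED,
    RANDOM_DISTRIBUTED, ROUND_ROBIN_DISTRIBUTED
  }

  private static final Pattern CLAUSE =
      Pattern.compile("\\s*PARTITION\\s+BY\\s+(.+?)\\s*", Pattern.CASE_INSENSITIVE);

  final Type type;
  final List<String> keys;  // empty unless the scheme is key-based

  private PartitionSpec(Type type, List<String> keys) {
    this.type = type;
    this.keys = Collections.unmodifiableList(keys);
  }

  /** Parses the clause text; anything that is not a known keyword is treated as hash keys. */
  static PartitionSpec parse(String clause) {
    Matcher m = CLAUSE.matcher(clause);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a PARTITION BY clause: " + clause);
    }
    String rest = m.group(1);
    List<String> none = Collections.emptyList();
    switch (rest.toUpperCase(Locale.ROOT)) {
      case "SINGLETON":
        return new PartitionSpec(Type.SINGLETON, none);
      case "BROADCAST":
        return new PartitionSpec(Type.BROADCAST_DISTRIBUTED, none);
      case "RANDOM":
        return new PartitionSpec(Type.RANDOM_DISTRIBUTED, none);
      case "ROUND_ROBIN":
        return new PartitionSpec(Type.ROUND_ROBIN_DISTRIBUTED, none);
      default:
        // Assumption: a comma-separated list of column references means hash partitioning.
        return new PartitionSpec(Type.HASH_DISTRIBUTED,
            Arrays.asList(rest.split("\\s*,\\s*")));
    }
  }
}
```

One limitation of this sketch is that a column literally named RANDOM or 
SINGLETON would be read as a keyword, so a real grammar would need to reserve 
or quote those words.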

> Support Partition By in sql select statement
> --------------------------------------------
>
>                 Key: CALCITE-2312
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2312
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>            Reporter: Fei Xu
>            Assignee: Julian Hyde
>            Priority: Major
>
> I noticed that Calcite already has an Exchange RelNode that represents the 
> distribution of a RelNode's input, but there is no SQL clause for it. 
>  
> We are using Calcite to build the SQL layer for our streaming platform. In 
> streaming computation, data shuffle is a very important function, not only 
> for our engine but also for our users.
> For example, in a stream-table join, the engine loads the table data into a 
> cache at runtime, so the join is actually a lookup from the stream into the 
> table. If the stream data could be partitioned by hash or range before the 
> lookup, it would be cache friendly, because particular stream data would 
> look up only particular table data. 
>  
> So I am considering some extensions to the SQL syntax to support Exchange, e.g. 
> {code:sql}
> -- shuffle data using hash distribution
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY o.productId;
> -- shuffle data to a singleton node
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY SINGLETON;
> -- shuffle data to all nodes
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY BROADCAST;
> -- shuffle data to a random node
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY RANDOM;
> -- shuffle data using a round-robin policy
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY ROUND_ROBIN;
> -- shuffle data using a range policy
> -- Currently I'm not sure about the appropriate clause to represent range 
> -- shuffle, so this is just a demo.
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY RANGE o.productId, 0, 4096;
> {code}
>  
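
To make the cache-locality argument in the description concrete, here is a 
minimal sketch of hash routing. It assumes a fixed number of workers and uses 
Java's hashCode as the hash function; HashRouting and workerFor are 
hypothetical names, and a real engine would likely use its own hash and 
rebalancing logic. The point is only that equal keys always land on the same 
worker, so each worker's cache needs just its own slice of the table.

```java
final class HashRouting {
  /**
   * Picks the worker for a key. floorMod keeps the result in [0, workers)
   * even when hashCode() is negative.
   */
  static int workerFor(Object key, int workers) {
    return Math.floorMod(key.hashCode(), workers);
  }

  public static void main(String[] args) {
    int workers = 4;  // assumed cluster size, for illustration only
    int[] productIds = {7, 12, 7, 4099, 12, 7};
    for (int id : productIds) {
      // Repeated productIds print the same worker index every time.
      System.out.println("productId " + id + " -> worker " + workerFor(id, workers));
    }
  }
}
```

With this routing, every record for productId 7 reaches the same worker, which 
therefore only ever looks up the table rows for the keys assigned to it.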



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
