[ https://issues.apache.org/jira/browse/CALCITE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479409#comment-16479409 ]
Julian Hyde commented on CALCITE-2312:
--------------------------------------

That sounds like a plan. You probably want to represent other partitioning schemes (e.g. round robin, singleton) as well as partitioning by keys.

I don't think we'd add this to Calcite's DDL parser, but if you can share what you come up with, it might be interesting to other people.

> Support Partition By in sql select statement
> --------------------------------------------
>
>                 Key: CALCITE-2312
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2312
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>            Reporter: Fei Xu
>            Assignee: Julian Hyde
>            Priority: Major
>
> I noticed that Calcite already has an Exchange RelNode that represents the
> distribution of a RelNode's input, but there is no SQL clause for it.
>
> We are using Calcite to build the SQL layer for our streaming platform, and
> in streaming computation, data shuffle is a very important function, not only
> for our engine, but also for our users.
> For example, in a stream-join-table, the engine loads the table data into a
> cache at runtime, so a stream join table is actually a stream looking up a table.
> If the stream data could be partitioned by hash or range before the table
> lookup, it would be cache-friendly, because particular stream data looks up
> particular table data.
>
> So I am considering some extensions to the SQL clause to support Exchange, e.g.
> {code:java}
> // shuffle data using hash_distribution
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY o.productId
>
> // shuffle data to a singleton node
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY SINGLETON
>
> // shuffle data to all nodes
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY BROADCAST
>
> // shuffle data to a random node
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY RANDOM
>
> // shuffle data using round-robin policy
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY ROUND_ROBIN
>
> // shuffle data using range policy
> // Currently I'm not sure about the appropriate clause to represent range
> // shuffle, so this is just a demo.
> INSERT INTO output
> SELECT
>   *
> FROM orders o
> PARTITION BY RANGE o.productId, 0, 4096
> {code}
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
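
To make the six PARTITION BY variants quoted above more concrete, here is a minimal Java sketch of how each scheme might pick the target node(s) for a row. It is only an illustration under stated assumptions: `PartitionRouter`, `Scheme` and `route` are hypothetical names and are not part of Calcite, nodes are simply numbered from 0, and for RANGE the interval [lower, upper) is assumed to be cut into equal slices, one per node, which may not be what the proposal intends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration only; none of these names come from Calcite. */
final class PartitionRouter {
  /** One constant per PARTITION BY variant in the proposal. */
  enum Scheme { HASH, SINGLETON, BROADCAST, RANDOM, ROUND_ROBIN, RANGE }

  private final int nodeCount;
  private final long rangeLower;   // inclusive lower bound used by RANGE
  private final long rangeUpper;   // exclusive upper bound used by RANGE
  private final Random random = new Random();
  private int nextRoundRobin = 0;

  PartitionRouter(int nodeCount, long rangeLower, long rangeUpper) {
    if (nodeCount <= 0 || rangeUpper <= rangeLower) {
      throw new IllegalArgumentException("need nodeCount > 0 and rangeUpper > rangeLower");
    }
    this.nodeCount = nodeCount;
    this.rangeLower = rangeLower;
    this.rangeUpper = rangeUpper;
  }

  /** Returns the indexes of the nodes that should receive a row with the given key. */
  List<Integer> route(Scheme scheme, long key) {
    List<Integer> targets = new ArrayList<>();
    switch (scheme) {
      case HASH:
        // Equal keys always land on the same node.
        targets.add(Math.floorMod(Long.hashCode(key), nodeCount));
        break;
      case SINGLETON:
        // Everything goes to one node (node 0 by assumption).
        targets.add(0);
        break;
      case BROADCAST:
        // Every node receives a copy of the row.
        for (int i = 0; i < nodeCount; i++) {
          targets.add(i);
        }
        break;
      case RANDOM:
        targets.add(random.nextInt(nodeCount));
        break;
      case ROUND_ROBIN:
        // Cycle through the nodes, ignoring the key.
        targets.add(nextRoundRobin);
        nextRoundRobin = (nextRoundRobin + 1) % nodeCount;
        break;
      case RANGE: {
        // Assumption: equal-width slices; keys outside the range are clamped.
        long clamped = Math.max(rangeLower, Math.min(key, rangeUpper - 1));
        targets.add((int) ((clamped - rangeLower) * nodeCount / (rangeUpper - rangeLower)));
        break;
      }
    }
    return targets;
  }

  public static void main(String[] args) {
    // Four nodes and the 0..4096 range from the RANGE example in the proposal.
    PartitionRouter router = new PartitionRouter(4, 0, 4096);
    for (Scheme scheme : Scheme.values()) {
      System.out.println(scheme + " -> " + router.route(scheme, 1500L));
    }
  }
}
```

Only HASH and RANGE look at the key, which is why they are the two schemes that give the cache locality described in the issue; the other four ignore the key entirely.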