[jira] [Commented] (HIVE-20720) Add partition column option to JDBC handler

Jesus Camacho Rodriguez (JIRA) Wed, 17 Oct 2018 21:13:39 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-20720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654615#comment-16654615
 ]


Jesus Camacho Rodriguez commented on HIVE-20720:
------------------------------------------------

[~daijy], thanks. Regarding pattern matching on the FROM clause, I believe it 
is quite safe with your latest change: Calcite will only set the splittable 
flag to 'true' for Select-Filter-Scan queries (no join, group by, or other 
statements), and if a user runs into issues with a hardcoded query, they can 
always rewrite it. As we move forward and split more complex computations, we 
may revisit that logic.

+1 (pending tests)

> Add partition column option to JDBC handler
> -------------------------------------------
>
>                 Key: HIVE-20720
>                 URL: https://issues.apache.org/jira/browse/HIVE-20720
>             Project: Hive
>          Issue Type: New Feature
>          Components: StorageHandler
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>            Priority: Major
>         Attachments: HIVE-20720.1.patch, HIVE-20720.2.patch, 
> HIVE-20720.3.patch, HIVE-20720.4.patch, HIVE-20720.5.patch, 
> HIVE-20720.6.patch, HIVE-20720.7.patch, HIVE-20720.8.patch
>
>
> Currently JdbcStorageHandler does not split input in Tez. The reason is that 
> the numSplit argument of JdbcInputFormat.getSplits can only be passed via 
> "mapreduce.job.maps" in Tez, and "mapreduce.job.maps" is not a valid param 
> if an authorizer (e.g. SQLStdAuth) is in use. The user always ends up with 
> 1 split.
> We need to rely on this new feature if we want to support multi-splits. Here 
> is my proposal:
> 1. Specify partitionColumn/numPartitions, and optionally 
> lowerBound/upperBound, in tblproperties if the user wants to split the JDBC 
> data source. If lowerBound/upperBound is not specified, JdbcStorageHandler 
> will run a max/min query in the planner to get them. For simplicity, we can 
> currently limit partitionColumn to numeric/date/timestamp columns
> 2. If partitionColumn/numPartitions are not specified, don't split input
> 3. Splits are equal intervals without respect to data distribution
> 4. There is also a "hive.sql.query.split" flag that vetoes the split (can be 
> set manually or automatically by Calcite)
> 5. If partitionColumn is not defined, but numPartitions is defined, use 
> original limit/offset logic (however, don't rely on numSplit).
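
The proposal above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code in the attached patches: the class JdbcSplitSketch, the Strategy enum, and both method names are hypothetical, the partition column is assumed to be an integral type, and the choice of half-open intervals with a closed last interval is one possible convention. A partitionColumn without numPartitions is treated here as "don't split", which is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposal; none of these names come from the Hive patch.
final class JdbcSplitSketch {

    enum Strategy { NO_SPLIT, RANGE_SPLIT, LIMIT_OFFSET }

    // Items 2, 4 and 5: decide how (or whether) to split.
    // splitAllowed stands in for the "hive.sql.query.split" flag.
    static Strategy chooseStrategy(String partitionColumn, Integer numPartitions,
                                   boolean splitAllowed) {
        if (!splitAllowed || numPartitions == null || numPartitions <= 1) {
            return Strategy.NO_SPLIT;
        }
        if (partitionColumn == null) {
            return Strategy.LIMIT_OFFSET; // item 5: fall back to limit/offset
        }
        return Strategy.RANGE_SPLIT;      // items 1 and 3
    }

    // Item 3: equal-width intervals over [lowerBound, upperBound],
    // ignoring the actual data distribution.
    // Assumes the span times numPartitions fits in a long (no overflow guard here).
    static List<String> rangePredicates(String column, long lowerBound,
                                        long upperBound, int numPartitions) {
        if (numPartitions < 1 || upperBound < lowerBound) {
            throw new IllegalArgumentException("invalid bounds or partition count");
        }
        List<String> predicates = new ArrayList<>();
        long span = upperBound - lowerBound;
        for (int i = 0; i < numPartitions; i++) {
            long start = lowerBound + span * i / numPartitions;
            long end = lowerBound + span * (i + 1) / numPartitions;
            boolean last = (i == numPartitions - 1);
            // The last interval is closed so that upperBound itself is covered.
            predicates.add(column + " >= " + start + " AND " + column
                    + (last ? " <= " : " < ") + end);
        }
        return predicates;
    }

    public static void main(String[] args) {
        System.out.println(chooseStrategy("id", 4, true));
        for (String p : rangePredicates("id", 0, 100, 4)) {
            System.out.println(p);
        }
    }
}
```

Each predicate would be appended to the WHERE clause of one split's query. Because the intervals have equal width, a skewed column can still produce very uneven splits, which is the trade-off item 3 accepts.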



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
