[jira] [Created] (HIVE-18049) Enable Hive on Tez to provide globally sorted clustered table

2017-11-12 Thread LingXiao Lan (JIRA)
LingXiao Lan created HIVE-18049:
---

 Summary: Enable Hive on Tez to provide globally sorted clustered table
 Key: HIVE-18049
 URL: https://issues.apache.org/jira/browse/HIVE-18049
 Project: Hive
  Issue Type: Improvement
  Components: Hive, Tez
Reporter: LingXiao Lan
 Fix For: 2.1.1


CREATE TABLE `test`(
   `time` int,
   `userid` bigint)
 CLUSTERED BY (
   userid)
 SORTED BY (
   userid ASC)
 INTO 4 BUCKETS
 ;
When data is inserted into this table, it is automatically sorted into 4 
buckets. But because Hive uses a hash partitioner by default, the data is 
sorted only within each bucket, not across buckets. Sometimes we need the data 
to be globally sorted, for example to optimize indexing.

If we sample the table first and use TotalOrderPartitioner, this can be done. 
The difficulty is deciding automatically when to use TotalOrderPartitioner and 
when not to, because an insertion query can be complex and result in a complex 
DAG in Tez.
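
To illustrate the difference, here is a minimal sketch in Python rather than 
Hive or Tez code. It assumes a simple modulo as a stand-in for the hash 
partitioner and picks split points from a random sample, which is the general 
idea behind a total-order partitioner. All function names and the sample size 
are hypothetical.

```python
import bisect
import random

NUM_BUCKETS = 4

def hash_bucket(userid, num_buckets=NUM_BUCKETS):
    # Stand-in for hash partitioning (assumption: plain modulo on the key).
    return userid % num_buckets

def sample_split_points(keys, num_buckets=NUM_BUCKETS, sample_size=100, seed=0):
    # Sample the keys, sort the sample, and take num_buckets - 1 cut points.
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[(i * len(sample)) // num_buckets] for i in range(1, num_buckets)]

def range_bucket(userid, split_points):
    # Keys below the first split point go to bucket 0, and so on.
    return bisect.bisect_right(split_points, userid)

def fill_buckets(keys, assign, num_buckets=NUM_BUCKETS):
    # Route each key to a bucket, then sort inside each bucket (SORTED BY).
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[assign(key)].append(key)
    return [sorted(bucket) for bucket in buckets]

def globally_sorted(buckets):
    # True if reading bucket 0, 1, 2, ... in order yields sorted data.
    flat = [key for bucket in buckets for key in bucket]
    return flat == sorted(flat)

userids = list(range(1, 1001))
random.Random(1).shuffle(userids)

hashed = fill_buckets(userids, hash_bucket)
splits = sample_split_points(userids)
ranged = fill_buckets(userids, lambda key: range_bucket(key, splits))
```

With hash partitioning each bucket is sorted internally but the buckets 
interleave; with sampled split points, reading the buckets in order gives a 
globally sorted sequence, at the cost of an extra sampling pass and possibly 
uneven bucket sizes.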

I have implemented a temporary version. It uses a custom partitioner that 
combines the hash partitioner and the total-order partitioner. A physical 
optimizer is added to Hive to decide which partitioner to use. However, to 
reduce the workload, this version modifies the Tez source code, which should 
not actually be necessary.
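
One possible shape for such a combined partitioner, sketched in Python. This 
is not the implementation described above; the class name, constructor 
arguments, and the rule "range-partition only when split points are supplied" 
are assumptions made for illustration. In this reading, the optimizer would 
supply split points only for the edge that feeds the sorted, bucketed table.

```python
import bisect

class HybridPartitioner:
    """Hypothetical partitioner: total-order when split points exist, else hash."""

    def __init__(self, num_partitions, split_points=None):
        if split_points is not None and len(split_points) != num_partitions - 1:
            raise ValueError("need exactly num_partitions - 1 split points")
        self.num_partitions = num_partitions
        # None means "no sample available": fall back to hash partitioning.
        self.split_points = None if split_points is None else sorted(split_points)

    def get_partition(self, key):
        if self.split_points is None:
            # Hash path (assumption: Python's built-in hash as a stand-in).
            return hash(key) % self.num_partitions
        # Total-order path: position of the key among the sorted split points.
        return bisect.bisect_right(self.split_points, key)
```

Keeping both paths behind one interface means the rest of the DAG does not 
need to know which strategy was chosen for a given edge.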

I'm wondering if we can implement a more general version that addresses this 
issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18048) Add qtests for Struct type with vectorization

2017-11-12 Thread Colin Ma (JIRA)
Colin Ma created HIVE-18048:
---

 Summary: Add qtests for Struct type with vectorization
 Key: HIVE-18048
 URL: https://issues.apache.org/jira/browse/HIVE-18048
 Project: Hive
  Issue Type: Sub-task
Reporter: Colin Ma
Assignee: Colin Ma


The Struct type is supported in vectorization, but there are no qtests that 
cover this case.





[jira] [Created] (HIVE-18047) Support dynamic service discovery for HiveMetaStore

2017-11-12 Thread Bing Li (JIRA)
Bing Li created HIVE-18047:
--

 Summary: Support dynamic service discovery for HiveMetaStore
 Key: HIVE-18047
 URL: https://issues.apache.org/jira/browse/HIVE-18047
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Bing Li
Assignee: Bing Li


Similar to what Hive does for HiveServer2 (HIVE-7935), a HiveMetaStore client 
can dynamically resolve a HiveMetaStore service to connect to via ZooKeeper.

*High Level Design:*
Whether dynamic service discovery is supported can be configured by setting 
HIVE_METASTORE_SUPPORT_DYNAMIC_SERVICE_DISCOVERY.
* This property should ONLY take effect when the HiveMetaStore service is in 
remote mode.
* When an instance of HiveMetaStore comes up, it adds itself as a znode to 
ZooKeeper under a configurable namespace (HIVE_METASTORE_ZOOKEEPER_NAMESPACE, 
e.g. hivemetastore).
* A thrift client specifies the ZooKeeper ensemble in its connection string, 
instead of pointing to a specific HiveMetaStore instance. The ZooKeeper 
ensemble will pick an instance of HiveMetaStore to connect to for the session.
* When an instance is removed from ZooKeeper, the existing client sessions 
continue until completion. When the last client session completes, the 
instance shuts down.
* All new client connections pick one of the available HiveMetaStore URIs from 
ZooKeeper.
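
The lifecycle above can be sketched as follows. This is a Python illustration 
only: an in-memory set stands in for the ZooKeeper namespace, and every class, 
method, and host name is hypothetical. It also assumes the client chooses at 
random among the registered URIs, which the design does not specify.

```python
import random

class InMemoryRegistry:
    """Stand-in for a ZooKeeper namespace; each URI plays the role of a znode."""

    def __init__(self, namespace):
        self.namespace = namespace
        self._uris = set()

    def register(self, uri):
        self._uris.add(uri)

    def deregister(self, uri):
        self._uris.discard(uri)

    def children(self):
        return sorted(self._uris)

class MetastoreInstance:
    """Hypothetical model of one HiveMetaStore instance and its sessions."""

    def __init__(self, uri, registry):
        self.uri = uri
        self.registry = registry
        self.open_sessions = 0
        self.running = True
        registry.register(uri)  # the instance announces itself on startup

    def open_session(self):
        self.open_sessions += 1

    def close_session(self):
        self.open_sessions -= 1
        self._maybe_shutdown()

    def remove_from_registry(self):
        self.registry.deregister(self.uri)
        self._maybe_shutdown()

    def _maybe_shutdown(self):
        # Shut down only once deregistered AND the last session has finished.
        if self.uri not in self.registry.children() and self.open_sessions == 0:
            self.running = False

def pick_metastore_uri(registry, rng=random):
    # New connections choose among the instances that are still registered.
    uris = registry.children()
    if not uris:
        raise RuntimeError("no HiveMetaStore instance registered")
    return rng.choice(uris)
```

For example, with two instances registered under the hivemetastore namespace, 
removing one from the registry sends all new connections to the other, while 
the removed instance keeps running until its open sessions are closed.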


