[jira] [Commented] (HIVE-18049) Enable Hive on Tez to provide globally sorted clustered table

Hive QA (JIRA) Sun, 12 Nov 2017 22:54:33 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-18049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249162#comment-16249162
 ]


Hive QA commented on HIVE-18049:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12897286/HIVE-18049.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7784/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7784/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7784/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-11-13 06:53:32.575
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-7784/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-11-13 06:53:32.578
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   67888cf..25a6f4c  master     -> origin/master
+ git reset --hard HEAD
HEAD is now at 67888cf HIVE-17995 Run checkstyle on standalone-metastore module with proper configuration (Adam Szita via Alan Gates)
+ git clean -f -d
Removing ${project.basedir}/
Removing ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
Removing ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 25a6f4c HIVE-17615: Task.executeTask has to be thread safe for parallel execution (Anishek Agarwal reviewed by Daniel Dai)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-11-13 06:53:37.768
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
fatal: git diff header lacks filename information when removing 0 leading pathname components (line 41)
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}
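
The last lines of the log show the patch being rejected at strip levels p0, p1, and p2. A minimal Python sketch of that kind of probing, useful for checking a patch locally before attaching it, might look like the following. It is not the code of smart-apply-patch.sh: the function names are hypothetical, and it assumes git is on the PATH and that the current directory is the root of the checkout.

```python
import subprocess
from typing import Callable, Optional, Sequence


def git_apply_check(patch_path: str, strip: int) -> bool:
    """Return True if `git apply --check -p<strip>` accepts the patch.

    Assumption: run from the repository root with git installed.
    `--check` only tests applicability; it does not modify the tree.
    """
    result = subprocess.run(
        ["git", "apply", "--check", f"-p{strip}", patch_path],
        capture_output=True,
    )
    return result.returncode == 0


def find_strip_level(
    patch_path: str,
    levels: Sequence[int] = (0, 1, 2),
    check: Callable[[str, int], bool] = git_apply_check,
) -> Optional[int]:
    """Return the first strip level at which the patch applies, else None."""
    for level in levels:
        if check(patch_path, level):
            return level
    return None
```

A result of None corresponds to the "does not appear to apply with p0, p1, or p2" outcome above. The check function is passed in as a parameter so the probing logic can be exercised without a repository.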

This message is automatically generated.

ATTACHMENT ID: 12897286 - PreCommit-HIVE-Build

> Enable Hive on Tez to provide globally sorted clustered table
> -------------------------------------------------------------
>
>                 Key: HIVE-18049
>                 URL: https://issues.apache.org/jira/browse/HIVE-18049
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive, Tez
>            Reporter: LingXiao Lan
>             Fix For: 2.1.1
>
>         Attachments: HIVE-18049.1.patch, HIVE-18049.2.patch, HIVE-18049.3.patch
>
>
> {code:sql}
> CREATE TABLE `test`(
>    `time` int,
>    `userid` bigint)
>  CLUSTERED BY (
>    userid)
>  SORTED BY (
>    userid ASC)
>  INTO 4 BUCKETS
>  ;
> {code}
> When data is inserted into this table, it is sorted into 4 buckets 
> automatically. But because Hive uses a hash partitioner by default, the data 
> is only sorted within each bucket and isn't sorted across buckets. 
> Sometimes we need the data to be globally sorted, to optimize indexing, for 
> example.
> If we can sample the table first and use TotalOrderPartitioner, this work 
> could be done. The difficulty is how to automatically decide when to use 
> TotalOrderPartitioner and when not to, because an insertion query can be 
> complex, which results in a complex DAG in Tez.
> I have implemented a temporary version. It uses a custom partitioner which 
> combines the hash partitioner and the total order partitioner. A physical 
> optimizer is added to Hive to decide which partitioner to use. But in order 
> to reduce the workload, this version changes the Tez source code, which is 
> in fact not necessary.
> I'm wondering if we can implement a more general version which addresses this 
> issue.
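
The difference between the two partitioning strategies in the description can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the modulo function stands in for Hive's bucket hash, the split-point selection imitates the general idea of sampling before range partitioning rather than the actual TotalOrderPartitioner implementation, and all function names are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def hash_bucket(userid: int, num_buckets: int) -> int:
    """Hash-style assignment (assumption: plain modulo as a stand-in)."""
    return userid % num_buckets


def sample_split_points(sample: Sequence[int], num_buckets: int) -> List[int]:
    """Pick num_buckets - 1 evenly spaced split points from a sorted sample."""
    ordered = sorted(sample)
    step = len(ordered) / num_buckets
    return [ordered[int(step * i)] for i in range(1, num_buckets)]


def range_bucket(userid: int, split_points: Sequence[int]) -> int:
    """Range-style assignment: bucket i holds keys between split points."""
    return bisect_right(split_points, userid)


def bucketize(
    keys: Sequence[int], num_buckets: int, assign: Callable[[int], int]
) -> List[List[int]]:
    """Distribute keys into buckets, then sort each bucket (SORTED BY)."""
    buckets: List[List[int]] = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[assign(key)].append(key)
    return [sorted(bucket) for bucket in buckets]


def globally_sorted(buckets: Sequence[Sequence[int]]) -> bool:
    """True if reading the buckets in order yields a fully sorted sequence."""
    flat = [key for bucket in buckets for key in bucket]
    return flat == sorted(flat)


keys = list(range(1, 17))
hashed = bucketize(keys, 4, lambda k: hash_bucket(k, 4))
splits = sample_split_points(keys, 4)
ranged = bucketize(keys, 4, lambda k: range_bucket(k, splits))
```

With hash assignment each bucket is sorted internally but the buckets interleave, so globally_sorted(hashed) is False. With split points taken from a sample, every key in bucket i is smaller than every key in bucket i + 1, so globally_sorted(ranged) is True.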



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
