[ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834436#comment-13834436 ]
Hive QA commented on HIVE-3286:
-------------------------------

{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12616156/HIVE-3286.13.patch.txt

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/472/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/472/console

Messages:
{noformat}
**** This message was trimmed, see log for full details ****
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:68:4: Decision can match input such as "LPAREN KW_CASE TinyintLiteral" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:68:4: Decision can match input such as "LPAREN KW_NULL GREATERTHAN" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:68:4: Decision can match input such as "LPAREN KW_NOT DecimalLiteral" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:68:4: Decision can match input such as "LPAREN KW_CASE DecimalLiteral" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:108:5: Decision can match input such as "KW_ORDER KW_BY LPAREN" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:121:5: Decision can match input such as "KW_CLUSTER KW_BY LPAREN" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:133:5: Decision can match input such as "KW_PARTITION KW_BY LPAREN" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:144:5: Decision can match input such as "KW_DISTRIBUTE KW_BY LPAREN" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:155:5: Decision can match input such as "KW_SORT KW_BY LPAREN" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:172:7: Decision can match input such as "STAR" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:185:5: Decision can match input such as "KW_ARRAY" using multiple alternatives: 2, 6
As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:185:5: Decision can match input such as "KW_UNIONTYPE" using multiple alternatives: 5, 6
As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:185:5: Decision can match input such as "KW_STRUCT" using multiple alternatives: 4, 6
As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:267:5: Decision can match input such as "KW_NULL" using multiple alternatives: 1, 8
As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:267:5: Decision can match input such as "KW_FALSE" using multiple alternatives: 3, 8
As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:267:5: Decision can match input such as "KW_TRUE" using multiple alternatives: 3, 8
As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:267:5: Decision can match input such as "KW_DATE StringLiteral" using multiple alternatives: 2, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): IdentifiersParser.g:399:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_SORT KW_BY" using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_GROUP KW_BY" using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_OVERWRITE" using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_MAP LPAREN" using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_ORDER KW_BY" using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: Decision can match input such as "KW_BETWEEN KW_MAP LPAREN" using multiple alternatives: 8, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_LATERAL KW_VIEW" using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_CLUSTER KW_BY" using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_INTO" using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_DISTRIBUTE KW_BY" using multiple alternatives: 2, 9
As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:524:5: Decision can match input such as "{AMPERSAND..BITWISEXOR, DIV..DIVIDE, EQUAL..EQUAL_NS, GREATERTHAN..GREATERTHANOREQUALTO, KW_AND, KW_ARRAY, KW_BETWEEN..KW_BOOLEAN, KW_CASE, KW_DOUBLE, KW_FLOAT, KW_IF, KW_IN, KW_INT, KW_LIKE, KW_MAP, KW_NOT, KW_OR, KW_REGEXP, KW_RLIKE, KW_SMALLINT, KW_STRING..KW_STRUCT, KW_TINYINT, KW_UNIONTYPE, KW_WHEN, LESSTHAN..LESSTHANOREQUALTO, MINUS..NOTEQUAL, PLUS, STAR, TILDE}" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
[INFO]
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ hive-exec ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-exec ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-exec ---
[INFO] Compiling 1400 source files to /data/hive-ptest/working/apache-svn-trunk-source/ql/target/classes
[INFO] -------------------------------------------------------------
[WARNING] COMPILATION WARNING :
[INFO] -------------------------------------------------------------
[WARNING] Note: Some input files use or override a deprecated API.
[WARNING] Note: Recompile with -Xlint:deprecation for details.
[WARNING] Note: Some input files use unchecked or unsafe operations.
[WARNING] Note: Recompile with -Xlint:unchecked for details.
[INFO] 4 warnings
[INFO] -------------------------------------------------------------
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/plan/SkewContext.java:[118,49] cannot find symbol
symbol : method getRandom()
location: class org.apache.hadoop.hive.ql.io.HiveKey
[INFO] 1 error
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Hive .............................................. SUCCESS [2.846s]
[INFO] Hive Ant Utilities ................................ SUCCESS [9.072s]
[INFO] Hive Shims Common ................................. SUCCESS [3.459s]
[INFO] Hive Shims 0.20 ................................... SUCCESS [2.210s]
[INFO] Hive Shims Secure Common .......................... SUCCESS [2.711s]
[INFO] Hive Shims 0.20S .................................. SUCCESS [1.397s]
[INFO] Hive Shims 0.23 ................................... SUCCESS [3.679s]
[INFO] Hive Shims ........................................ SUCCESS [3.073s]
[INFO] Hive Common ....................................... SUCCESS [13.388s]
[INFO] Hive Serde ........................................ SUCCESS [11.713s]
[INFO] Hive Metastore .................................... SUCCESS [26.188s]
[INFO] Hive Query Language ............................... FAILURE [30.091s]
[INFO] Hive Service ...................................... SKIPPED
[INFO] Hive JDBC ......................................... SKIPPED
[INFO] Hive Beeline ...................................... SKIPPED
[INFO] Hive CLI .......................................... SKIPPED
[INFO] Hive Contrib ...................................... SKIPPED
[INFO] Hive HBase Handler ................................ SKIPPED
[INFO] Hive HCatalog ..................................... SKIPPED
[INFO] Hive HCatalog Core ................................ SKIPPED
[INFO] Hive HCatalog Pig Adapter ......................... SKIPPED
[INFO] Hive HCatalog Server Extensions ................... SKIPPED
[INFO] Hive HCatalog Webhcat Java Client ................. SKIPPED
[INFO] Hive HCatalog Webhcat ............................. SKIPPED
[INFO] Hive HCatalog HBase Storage Handler ............... SKIPPED
[INFO] Hive HWI .......................................... SKIPPED
[INFO] Hive ODBC ......................................... SKIPPED
[INFO] Hive Shims Aggregator ............................. SKIPPED
[INFO] Hive TestUtils .................................... SKIPPED
[INFO] Hive Packaging .................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:52.444s
[INFO] Finished at: Wed Nov 27 20:19:46 EST 2013
[INFO] Final Memory: 51M/371M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-exec: Compilation failure
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/plan/SkewContext.java:[118,49] cannot find symbol
[ERROR] symbol : method getRandom()
[ERROR] location: class org.apache.hadoop.hive.ql.io.HiveKey
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :hive-exec
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12616156

> Explicit skew join on user provided condition
> ---------------------------------------------
>
> Key: HIVE-3286
> URL: https://issues.apache.org/jira/browse/HIVE-3286
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Navis
> Assignee: Navis
> Priority: Minor
> Attachments: D4287.11.patch, HIVE-3286.12.patch.txt,
> HIVE-3286.13.patch.txt, HIVE-3286.D4287.10.patch, HIVE-3286.D4287.5.patch,
> HIVE-3286.D4287.6.patch, HIVE-3286.D4287.7.patch, HIVE-3286.D4287.8.patch,
> HIVE-3286.D4287.9.patch
>
>
> A join operation on a table with skewed data spends most of its execution
> time handling the skewed keys. But usually we already know about the skew
> and even know what the skewed keys look like.
> If we can explicitly assign reducer slots for the skewed keys, total
> execution time could be greatly shortened.
> As a start, I've extended the join grammar like this.
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150);
> {code}
> which means that if the above query is executed by 20 reducers, one reducer
> is used for a.key+1 < 50, one reducer for 50 <= a.key+1 < 100, one reducer
> for 99 <= a.key < 150, and 17 reducers for the others (could be extended to
> assign more than one reducer later).
> This can only be used with common inner equi-joins, and the skew condition
> should be composed of join keys only.
> The work done so far will be posted shortly after code cleanup.
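The reducer split quoted above follows from evaluating the skew conditions in order and taking the first one that is true. A minimal Python sketch of that first-match rule, assuming integer keys; `SKEW_PREDICATES` and `skew_group` are hypothetical names for illustration and are not part of the patch:

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-ins for the three skew conditions in the quoted query:
# skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150)
SKEW_PREDICATES: Sequence[Callable[[int], bool]] = (
    lambda key: key + 1 < 50,
    lambda key: key + 1 < 100,
    lambda key: key < 150,
)


def skew_group(key: int,
               predicates: Sequence[Callable[[int], bool]] = SKEW_PREDICATES
               ) -> Optional[int]:
    """Return the index of the first predicate that holds, or None."""
    for index, predicate in enumerate(predicates):
        if predicate(key):
            return index
    # No skew condition matched: the row goes to the remaining reducers.
    return None
```

Because only rows that failed the earlier conditions reach a later one, the third condition `a.key < 150` effectively covers `99 <= a.key < 150`, which matches the ranges stated in the description.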
> ----------------------------
> Skew expressions* in "SKEW ON (expr, expr, ...)" are evaluated sequentially
> at runtime, and the first one that evaluates to true decides the skew group
> for the row. Each skew group has reserved partition slot(s), to which all
> rows in the group are assigned.
> The number of partition slots reserved for each group is also decided at
> runtime by a simple percentage calculation. If a skew group is "CLUSTER BY
> 20 PERCENT" and the total number of partition slots (= number of reducers)
> is 20, that group will reserve 4 partition slots, etc.
> "DISTRIBUTE BY" decides how the rows in a group are dispersed over the range
> of reserved slots (if there is only one slot for a group, this is
> meaningless). Currently, three distribution policies are available: RANDOM,
> KEYS, <expression>.
> 1. RANDOM : rows of the driver** alias are dispersed randomly and rows of
> non-driver aliases are duplicated to all the slots (default if not specified)
> 2. KEYS : determined by the hash value of the keys (same as the previous
> behavior)
> 3. expression : determined by the hash of the object evaluated by a
> user-provided expression
> Only possible with inner, equi, common joins. Join tree merging is not yet
> supported.
> Might be used by other RS users like "SORT BY" or "GROUP BY".
> If column statistics exist for the key, it could be possible to apply this
> automatically.
> For example, if 20 reducers are used for the query below,
> {code}
> select count(*) from src a join src b on a.key=b.key skew on (
> a.key = '0' CLUSTER BY 10 PERCENT,
> b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
> cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
> {code}
> group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19 and the others
> will reserve slots 0~5.
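The slot ranges quoted above can be reproduced with simple percentage arithmetic. The sketch below is illustrative only: the function name `reserve_slots` is hypothetical, the layout (non-skewed rows take the lowest slots, skew groups follow in declaration order) is inferred from the 20-reducer example, and rounding down when a percentage does not divide evenly is an assumption, since the description only says "simple calculation of percentage".

```python
from typing import List, Tuple


def reserve_slots(total_slots: int, percents: List[int]) -> Tuple[range, List[range]]:
    """Return (slots for non-skewed rows, slots for each skew group)."""
    # Assumption: fractional slot counts are rounded down.
    counts = [total_slots * percent // 100 for percent in percents]
    remaining = total_slots - sum(counts)
    if remaining < 0:
        raise ValueError("skew groups reserve more slots than are available")

    groups = []
    start = remaining  # layout inferred from the example: others occupy 0..remaining-1
    for count in counts:
        groups.append(range(start, start + count))
        start += count
    return range(0, remaining), groups


others, groups = reserve_slots(20, [10, 20, 40])
print(others)  # range(0, 6)
print(groups)  # [range(6, 8), range(8, 12), range(12, 20)]
```

With 20 slots and groups of 10, 20 and 40 percent, this gives 2, 4 and 8 slots for the groups and 6 for everything else, matching 6~7, 8~11, 12~19 and 0~5.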
> For a row with key='0' from alias a, the row is randomly assigned in the
> range of 6~7 (driver alias) : 6 or 7
> For a row with key='0' from alias b, the row is distributed to all slots in
> 6~7 (non-driver alias) : 6 and 7
> For a row with key='50', the row is assigned in the range of 8~11 by hashcode
> of upper(b.key) : 8 + (hash(upper(key)) % 4)
> For a row with key='500', the row is assigned in the range of 12~19 by
> hashcode of join key : 12 + (hash(key) % 8)
> For a row with key='200', it does not belong to any skew group : hash(key) % 6
> *expressions in skew condition :
> 1. all expressions should be made of expressions in the join condition, which
> means if the join condition is "a.key=b.key", the user can make any expression
> with "a.key" or "b.key". But if the join condition is a.key+1=b.key, the user
> cannot make an expression with "a.key" alone (should make the expression with
> "a.key+1").
> 2. all expressions should reference one and only one side of the aliases. For
> example, simple constant expressions or expressions referencing both sides of
> the join condition ("a.key+b.key<100") are not allowed.
> 3. all functions in an expression should be deterministic and stateless.
> 4. if "DISTRIBUTE BY expression" is used, the distribution expression should
> also reference the same alias as the skew expression.
> **driver alias :
> 1. driver alias means the sole alias referenced by the skew expression, which
> is important for RANDOM distribution. Rows of the driver alias are assigned to
> a single slot randomly, but rows of non-driver aliases are duplicated to all
> the slots. So, the driver alias should be the biggest one among the join
> aliases.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
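The per-row slot choices listed in the quoted description (6 or 7, 6 and 7, 8 + hash % 4, 12 + hash % 8, hash % 6) can be sketched in Python as follows. This is a rough illustration and not the patch's code: `assign_slots` is a hypothetical name, and the hash is passed in as a plain non-negative integer rather than computed the way Hive would.

```python
import random
from typing import List, Optional


def assign_slots(slots: range, policy: str, hash_value: int,
                 is_driver: bool = True,
                 rng: Optional[random.Random] = None) -> List[int]:
    """Return the partition slot(s) a row is sent to within its reserved range."""
    if policy == "RANDOM":
        if is_driver:
            # Driver alias: exactly one slot, picked at random.
            rng = rng or random.Random()
            return [rng.choice(slots)]
        # Non-driver alias: the row is duplicated to every reserved slot.
        return list(slots)
    # KEYS and <expression> differ only in what is hashed (join key vs. the
    # value of the user expression), so the caller supplies the hash.
    return [slots.start + hash_value % len(slots)]
```

For rows outside every skew group, the same hash rule applies over the remaining range, for example `assign_slots(range(0, 6), "KEYS", h)` yields `h % 6`. Duplicating non-driver rows is what allows a driver row sent to either slot to find its join partners, which is why the description recommends that the driver alias be the largest input.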