[
https://issues.apache.org/jira/browse/HIVE-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182826#comment-13182826
]
Phabricator commented on HIVE-2279:
-----------------------------------
zhenxiao has commented on the revision "HIVE-2279 [jira] Implement sort(array) UDF".
sort() is a better name than sort_array(). However, the parser/semantic analyzer currently seems to have a problem accepting a reserved keyword as a UDF name.
I tried the following changes in Hive.g:
[~/Code/hive]git diff ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g b/ql/src/java/org/apache/hadoop/h
index 888bf47..ec256de 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
+++ b/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
@@ -1816,7 +1816,7 @@ functionName
@init { msgs.push("function name"); }
@after { msgs.pop(); }
: // Keyword IF is also a function name
- Identifier | KW_IF | KW_ARRAY | KW_MAP | KW_STRUCT | KW_UNIONTYPE
+ Identifier | KW_IF | KW_ARRAY | KW_MAP | KW_STRUCT | KW_UNIONTYPE | KW_SORT
;
castExpression
@@ -2091,6 +2091,7 @@ sysFuncNames
| KW_MAP
| KW_STRUCT
| KW_UNIONTYPE
+ | KW_SORT
| EQUAL
| NOTEQUAL
| LESSTHANOREQUALTO
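The sketch below shows in plain Java why the keyword has to be listed explicitly. It is not Hive's ANTLR-generated parser. It assumes, as the diff implies, that the lexer turns `sort` into a dedicated keyword token (`KW_SORT`) rather than a generic `Identifier`. Under that assumption, a rule modeled on `functionName` only accepts `sort` once that token is one of its alternatives. The `Token` enum and the `lex` and `isFunctionName` helpers are hypothetical.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class FunctionNameSketch {
    // Hypothetical token types, loosely modeled on the token names used in Hive.g.
    enum Token { IDENTIFIER, KW_IF, KW_ARRAY, KW_MAP, KW_STRUCT, KW_UNIONTYPE, KW_SORT }

    // Assumption: reserved words are lexed as keyword tokens, never as identifiers.
    static final Map<String, Token> KEYWORDS = Map.of(
        "if", Token.KW_IF,
        "array", Token.KW_ARRAY,
        "map", Token.KW_MAP,
        "struct", Token.KW_STRUCT,
        "uniontype", Token.KW_UNIONTYPE,
        "sort", Token.KW_SORT);

    // Case-insensitive lookup; anything that is not a keyword is an identifier.
    static Token lex(String word) {
        return KEYWORDS.getOrDefault(word.toLowerCase(Locale.ROOT), Token.IDENTIFIER);
    }

    // functionName before the patch: Identifier | KW_IF | KW_ARRAY | KW_MAP | KW_STRUCT | KW_UNIONTYPE
    static final Set<Token> BEFORE = EnumSet.of(Token.IDENTIFIER, Token.KW_IF,
        Token.KW_ARRAY, Token.KW_MAP, Token.KW_STRUCT, Token.KW_UNIONTYPE);

    // functionName after the patch: the same alternatives plus KW_SORT
    static final Set<Token> AFTER = EnumSet.of(Token.IDENTIFIER, Token.KW_IF,
        Token.KW_ARRAY, Token.KW_MAP, Token.KW_STRUCT, Token.KW_UNIONTYPE, Token.KW_SORT);

    // A word is a valid function name if its token is one of the rule's alternatives.
    static boolean isFunctionName(String word, Set<Token> rule) {
        return rule.contains(lex(word));
    }

    public static void main(String[] args) {
        System.out.println(isFunctionName("sort", BEFORE));   // false: KW_SORT not allowed
        System.out.println(isFunctionName("sort", AFTER));    // true
        System.out.println(isFunctionName("concat", BEFORE)); // true: plain identifier
    }
}
```

This models only the parsing side. The log in this comment shows that parsing succeeds with the patch applied ("Parse Completed"). The failure happens later, during semantic analysis.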
However, the test case always fails during semantic analysis with an argument-length mismatch:
-- Evaluate function against STRING valued keys
EXPLAIN
SELECT sort(array("b", "d", "c", "a")) FROM src LIMIT 1
2012-01-09 11:31:55,134 INFO parse.ParseDriver (ParseDriver.java:parse(426)) - Parsing command:
-- Evaluate function against STRING valued keys
EXPLAIN
SELECT sort(array("b", "d", "c", "a")) FROM src LIMIT 1
2012-01-09 11:31:55,146 INFO parse.ParseDriver (ParseDriver.java:parse(443)) - Parse Completed
2012-01-09 11:31:55,147 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeInternal(7445)) - Starting Semantic Analysis
2012-01-09 11:31:55,148 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeInternal(7475)) - Completed phase 1 of Semantic Analysis
2012-01-09 11:31:55,148 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:getMetaData(942)) - Get metadata for source tables
2012-01-09 11:31:55,149 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(528)) - 0: get_table : db=default tbl=src
2012-01-09 11:31:55,200 INFO hive.log (MetaStoreUtils.java:getDDLFromFieldSchema(457)) - DDL: struct src { string key, string value}
2012-01-09 11:31:55,200 DEBUG lazy.LazySimpleSerDe (LazySimpleSerDe.java:initialize(195)) - org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with: columnNames=[key, value] columnTypes=[string, string] separator=[[B@3bb20e65] nullstring=\N lastColumnTakesRest=false
2012-01-09 11:31:55,200 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:getMetaData(1021)) - Get metadata for subqueries
2012-01-09 11:31:55,201 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:getMetaData(1035)) - Get metadata for destination tables
2012-01-09 11:31:55,201 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeInternal(7478)) - Completed getting MetaData in Semantic Analysis
2012-01-09 11:31:55,203 INFO hive.log (MetaStoreUtils.java:getDDLFromFieldSchema(457)) - DDL: struct src { string key, string value}
2012-01-09 11:31:55,203 DEBUG lazy.LazySimpleSerDe (LazySimpleSerDe.java:initialize(195)) - org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with: columnNames=[key, value] columnTypes=[string, string] separator=[[B@12e84396] nullstring=\N lastColumnTakesRest=false
2012-01-09 11:31:55,222 DEBUG parse.SemanticAnalyzer (SemanticAnalyzer.java:genTablePlan(6598)) - Created Table Plan for src org.apache.hadoop.hive.ql.exec.TableScanOperator@5e9ea579
2012-01-09 11:31:55,223 DEBUG parse.SemanticAnalyzer (SemanticAnalyzer.java:genSelectPlan(2117)) - tree: (TOK_SELECT (TOK_SELEXPR (TOK_FUNCTION sort (TOK_FUNCTION array "b" "d" "c" "a"))))
2012-01-09 11:31:55,225 DEBUG parse.SemanticAnalyzer (SemanticAnalyzer.java:genSelectPlan(2222)) - genSelectPlan: input = src{(key,key: string)(value,value: string)(block__offset__inside__file,BLOCK__OFFSET__INSIDE__FILE: bigint)(input__file__name,INPUT__FILE__NAME: string)}
2012-01-09 11:31:55,234 ERROR ql.Driver (SessionState.java:printError(380)) - FAILED: Error in semantic analysis: Line 5:7 Arguments length mismatch 'sort': The function SORT(array(obj1, obj2,...)) needs one argument.
org.apache.hadoop.hive.ql.parse.SemanticException: Line 5:7 Arguments length mismatch 'sort': The function SORT(array(obj1, obj2,...)) needs one argument.
	at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:810)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:125)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
	at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:161)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7708)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:2301
The same thing happened when I was working on format().
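The SemanticException above reports an argument-count failure for `sort`. Presumably the message comes from the UDF's own validation, which rejects any call that does not pass exactly one argument. The following dependency-free Java sketch shows that kind of arity check. `ArityCheckSketch`, `ArgumentLengthException`, `checkSortArity` and `accepts` are hypothetical names, not Hive's `GenericUDF` API. The sketch does not explain why the real analyzer reported a mismatch.

```java
import java.util.List;

public class ArityCheckSketch {
    /** Hypothetical stand-in for the exception a UDF throws on a wrong argument count. */
    static class ArgumentLengthException extends Exception {
        ArgumentLengthException(String message) {
            super(message);
        }
    }

    /**
     * Hypothetical validation for sort(array): exactly one argument is required.
     * Each element of argumentTypes is the type name of one argument expression.
     */
    static void checkSortArity(List<String> argumentTypes) throws ArgumentLengthException {
        if (argumentTypes.size() != 1) {
            throw new ArgumentLengthException(
                "The function SORT(array(obj1, obj2,...)) needs one argument.");
        }
    }

    /** Returns true if the arity check passes, false if it throws. */
    static boolean accepts(List<String> argumentTypes) {
        try {
            checkSortArity(argumentTypes);
            return true;
        } catch (ArgumentLengthException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // One array<string> argument, as in sort(array("b", "d", "c", "a")).
        System.out.println(accepts(List.of("array<string>")));      // true
        // Zero arguments or more than one argument are rejected.
        System.out.println(accepts(List.of()));                     // false
        System.out.println(accepts(List.of("string", "string")));   // false
    }
}
```

For comparison, consider the tree logged by genSelectPlan, (TOK_FUNCTION sort (TOK_FUNCTION array "b" "d" "c" "a")). It has a single child under `sort`, which a check like this would accept.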
REVISION DETAIL
https://reviews.facebook.net/D1125
> Implement sort(array) UDF
> -------------------------
>
> Key: HIVE-2279
> URL: https://issues.apache.org/jira/browse/HIVE-2279
> Project: Hive
> Issue Type: New Feature
> Components: UDF
> Reporter: Carl Steinbach
> Assignee: Zhenxiao Luo
> Attachments: HIVE-2279.D1059.1.patch, HIVE-2279.D1101.1.patch,
> HIVE-2279.D1107.1.patch, HIVE-2279.D1125.1.patch
>
>
--
This message is automatically generated by JIRA.