[ https://issues.apache.org/jira/browse/HIVE-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-14159:
----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0
           Status: Resolved  (was: Patch Available)

Committed to master:

{noformat}
% g log -1 --stat
commit 6e76ee3aef2210b2a1efa20d92ac997800cfcb75
Author: Carl Steinbach <cstei...@linkedin.com>
Date:   Wed Sep 7 11:28:35 2016 -0700

    HIVE-14159 : sorting of tuple array using multiple field[s] (Simanchal Das via Carl Steinbach)

 itests/src/test/resources/testconfiguration.properties                    |   1 +
 itests/src/test/resources/testconfiguration.properties.orig               |   8 +-
 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java          |   1 +
 .../org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java | 202 ++++++++++++++++++
 .../apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayByField.java | 228 ++++++++++++++++++++
 ql/src/test/queries/clientnegative/udf_sort_array_by_wrong1.q             |   2 +
 ql/src/test/queries/clientnegative/udf_sort_array_by_wrong2.q             |   2 +
 ql/src/test/queries/clientnegative/udf_sort_array_by_wrong3.q             |  16 ++
 ql/src/test/queries/clientpositive/udf_sort_array_by.q                    | 136 ++++++++++++
 ql/src/test/results/beelinepositive/show_functions.q.out                  |   1 +
 ql/src/test/results/clientnegative/udf_sort_array_by_wrong1.q.out         |   1 +
 ql/src/test/results/clientnegative/udf_sort_array_by_wrong2.q.out         |   1 +
 ql/src/test/results/clientnegative/udf_sort_array_by_wrong3.q.out         |  37 ++++
 ql/src/test/results/clientpositive/show_functions.q.out                   |   1 +
 ql/src/test/results/clientpositive/udf_sort_array_by.q.out                | 401 +++++++++++++++++++++++++++++++++++
 15 files changed, 1036 insertions(+), 2 deletions(-)
{noformat}

> sorting of tuple array using multiple field[s]
> ----------------------------------------------
>
>                 Key: HIVE-14159
>                 URL: https://issues.apache.org/jira/browse/HIVE-14159
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Simanchal Das
>            Assignee: Simanchal Das
>              Labels: patch
>             Fix For: 2.2.0
>
>         Attachments: HIVE-14159.1.patch, HIVE-14159.2.patch,
>                      HIVE-14159.3.patch, HIVE-14159.4.patch
>
>
> Problem Statement:
> When working with complex data structures such as Avro, we often encounter
> arrays that contain multiple tuples, where each tuple has a struct schema.
> Suppose the struct schema looks like this:
> {noformat}
> {
>   "name": "employee",
>   "type": [{
>     "type": "record",
>     "name": "Employee",
>     "namespace": "com.company.Employee",
>     "fields": [{
>       "name": "empId",
>       "type": "int"
>     }, {
>       "name": "empName",
>       "type": "string"
>     }, {
>       "name": "age",
>       "type": "int"
>     }, {
>       "name": "salary",
>       "type": "double"
>     }]
>   }]
> }
> {noformat}
> When a Hive query runs over such data, the complex array looks like an array
> of employee objects.
> {noformat}
> Example:
> //(array<struct<empId,empName,age,salary>>)
>
> Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> {noformat}
> When implementing day-to-day business use cases, we often need to sort a
> tuple array by one or more specific fields, such as empId, name, salary,
> etc., in ASC or DESC order.
> Proposal:
> I have developed a UDF, 'sort_array_by', which sorts a tuple array by one
> or more fields in the ASC or DESC order specified by the user; the default
> is ascending order.
> {noformat}
> Example:
> 1. Select
> sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
> output:
> array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
>
> 2. Select
> sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
> output:
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
>
> 3. Select
> sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age","ASC");
> output:
> array[struct(500,Boo,30,50990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
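
For readers who want to see the multi-field ordering described in the proposal in plain Java, here is a minimal sketch. It is not the code of GenericUDFSortArrayByField. The class SortArrayBySketch, the Employee holder, and the helpers comparatorFor, sortBy, and names are all hypothetical names introduced for illustration. It also assumes that a single ASC/DESC flag applies to the whole sort key, which is how the examples in the description read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only; NOT the implementation committed for HIVE-14159.
final class SortArrayBySketch {

    // Hypothetical stand-in for one struct<empId,empName,age,salary> element.
    static final class Employee {
        final int empId;
        final String empName;
        final int age;
        final double salary;

        Employee(int empId, String empName, int age, double salary) {
            this.empId = empId;
            this.empName = empName;
            this.age = age;
            this.salary = salary;
        }
    }

    // Maps a field name (case-insensitive) to a comparator on that field.
    static Comparator<Employee> comparatorFor(String field) {
        switch (field.toLowerCase(Locale.ROOT)) {
            case "empid":   return Comparator.comparingInt((Employee e) -> e.empId);
            case "empname": return Comparator.comparing((Employee e) -> e.empName);
            case "age":     return Comparator.comparingInt((Employee e) -> e.age);
            case "salary":  return Comparator.comparingDouble((Employee e) -> e.salary);
            default: throw new IllegalArgumentException("Unknown field: " + field);
        }
    }

    // Returns a sorted copy of rows. Earlier fields take precedence and later
    // fields only break ties. Assumption: one direction flag for the whole key.
    static List<Employee> sortBy(List<Employee> rows, List<String> fields, boolean ascending) {
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("At least one field is required");
        }
        Comparator<Employee> cmp = comparatorFor(fields.get(0));
        for (String field : fields.subList(1, fields.size())) {
            cmp = cmp.thenComparing(comparatorFor(field));
        }
        if (!ascending) {
            cmp = cmp.reversed();
        }
        List<Employee> sorted = new ArrayList<>(rows);
        sorted.sort(cmp);
        return sorted;
    }

    // Convenience helper: projects the empName column for easy inspection.
    static List<String> names(List<Employee> rows) {
        List<String> out = new ArrayList<>();
        for (Employee e : rows) {
            out.add(e.empName);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same data as example 2 in the description.
        List<Employee> rows = Arrays.asList(
            new Employee(100, "Foo", 20, 20990),
            new Employee(500, "Boo", 30, 80990),
            new Employee(500, "Boo", 30, 50990),
            new Employee(700, "Harry", 25, 40990),
            new Employee(100, "Tom", 35, 70990));
        System.out.println(names(sortBy(rows, Arrays.asList("salary"), true)));
        System.out.println(names(sortBy(rows, Arrays.asList("empName", "salary"), true)));
    }
}
```

Chaining comparators with thenComparing mirrors the behaviour shown in example 2: the two Boo rows tie on name, so salary decides their relative order. Reversing the combined comparator flips the whole key at once; per-field directions would need a different design.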