[GitHub] incubator-hivemall issue #111: [HIVEMALL-17] Support SLIM

2017-09-13 Thread nzw0301
Github user nzw0301 commented on the issue:

https://github.com/apache/incubator-hivemall/pull/111
  
@myui done.


---


[GitHub] incubator-hivemall issue #111: [HIVEMALL-17] Support SLIM

2017-09-13 Thread nzw0301
Github user nzw0301 commented on the issue:

https://github.com/apache/incubator-hivemall/pull/111
  
@myui OK, please give me some time.


---


[jira] [Closed] (HIVEMALL-132) Generalize f1score UDAF to support any Beta value

2017-09-13 Thread Makoto Yui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVEMALL-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Makoto Yui closed HIVEMALL-132.
---
Resolution: Fixed
  Assignee: Makoto Yui

> Generalize f1score UDAF to support any Beta value
> -------------------------------------------------
>
> Key: HIVEMALL-132
> URL: https://issues.apache.org/jira/browse/HIVEMALL-132
> Project: Hivemall
> Issue Type: Improvement
> Reporter: Makoto Yui
> Assignee: Makoto Yui
> Priority: Minor
>
> Currently, `f1score` supports only 1.0 as the β value of the F-measure computation.
> https://en.wikipedia.org/wiki/F1_score
> So, it would be better to provide an `f_measure` function that supports any β value.
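
For reference, the generalization requested here is the standard F-beta formula, F_β = (1 + β²)·P·R / (β²·P + R), which reduces to F1 when β = 1. Below is a minimal Java sketch of that formula. The class and method names are hypothetical and are not taken from Hivemall's UDAF implementation.

```java
// Hypothetical helper for illustration only; not Hivemall's actual f_measure UDAF.
final class FMeasureSketch {

    /** Computes F-beta from counts of true positives, false positives, and false negatives. */
    static double fMeasure(long tp, long fp, long fn, double beta) {
        if (beta < 0.d) {
            throw new IllegalArgumentException("beta must be non-negative: " + beta);
        }
        if (tp == 0L) {
            return 0.d; // no true positives: treat the score as zero
        }
        double precision = tp / (double) (tp + fp);
        double recall = tp / (double) (tp + fn);
        double squareBeta = beta * beta;
        return (1.d + squareBeta) * precision * recall / (squareBeta * precision + recall);
    }

    public static void main(String[] args) {
        // precision = 0.8, recall = 0.5
        System.out.println(fMeasure(8, 2, 8, 1.0)); // F1
        System.out.println(fMeasure(8, 2, 8, 2.0)); // F2 weights recall more heavily
    }
}
```

With β > 1 the score leans toward recall, and with β < 1 it leans toward precision.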



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] incubator-hivemall issue #107: [HIVEMALL-132] Generalize f1score UDAF to sup...

2017-09-13 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/107
  
@nzw0301 LGTM 👍 Merged. Well done! (thank you for your review @takuti )


---


[GitHub] incubator-hivemall pull request #107: [HIVEMALL-132] Generalize f1score UDAF...

2017-09-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hivemall/pull/107


---


[GitHub] incubator-hivemall issue #107: [HIVEMALL-132] Generalize f1score UDAF to sup...

2017-09-13 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/107
  
@nzw0301 I'll fix and merge it. No need to update this PR. 


---


[GitHub] incubator-hivemall pull request #107: [HIVEMALL-132] Generalize f1score UDAF...

2017-09-13 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/107#discussion_r138613348
  
--- Diff: docs/gitbook/eval/auc.md ---
@@ -100,7 +100,7 @@ Note that `floor(prob / 0.2)` means that the rows are distributed to 5 bins for
 
 # Difference between AUC and Logarithmic Loss
 
-Hivemall has another metric called [Logarithmic Loss](stat_eval.html#logarithmic-loss) for binary classification. Both AUC and Logarithmic Loss compute scores for probability-label pairs. 
+Hivemall has another metric called [Logarithmic Loss](stat_eval.html#logarithmic-loss) for binary classification. Both AUC and Logarithmic Loss compute scores for probability-label pairs.
--- End diff --

Broken link: `stat_eval.html` has been deleted.


---


[GitHub] incubator-hivemall pull request #110: [HIVEMALL-142] Implement SingularizeUD...

2017-09-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hivemall/pull/110


---


[GitHub] incubator-hivemall issue #114: [HIVEMALL-138-2] refactored to_ordered_map & ...

2017-09-13 Thread coveralls
Github user coveralls commented on the issue:

https://github.com/apache/incubator-hivemall/pull/114
  

[![Coverage Status](https://coveralls.io/builds/13251185/badge)](https://coveralls.io/builds/13251185)

Coverage increased (+0.3%) to 40.536% when pulling **df39bd5c8254db064ea2f54e7fce6d5e3f363961 on myui:HIVEMALL-138-2** into **3804789168dab5c5d43aac1fd4000e07688c6a06 on apache:master**.



---


[jira] [Closed] (HIVEMALL-140) Rename precision UDAF because it is reserved keyword of Hive

2017-09-13 Thread Makoto Yui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVEMALL-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Makoto Yui closed HIVEMALL-140.
---
Resolution: Fixed

> Rename precision UDAF because it is reserved keyword of Hive
> ------------------------------------------------------------
>
> Key: HIVEMALL-140
> URL: https://issues.apache.org/jira/browse/HIVEMALL-140
> Project: Hivemall
> Issue Type: Bug
> Reporter: Makoto Yui
> Assignee: Takuya Kitazawa
> Priority: Critical
>
> `drop temporary function if exists precision;` fails on Hive v2.2.0 or later
> because `precision` became a reserved keyword as of Hive v2.2.0.
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non-reservedKeywordsandReservedKeywords
> So, we need to rename the UDFs:
> {code}
> precision => precision_at
> recall => recall_at
> {code}





[GitHub] incubator-hivemall pull request #109: [HIVEMALL-140] Rename PrecisionUDAF an...

2017-09-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hivemall/pull/109


---


[GitHub] incubator-hivemall issue #109: [HIVEMALL-140] Rename PrecisionUDAF and Recal...

2017-09-13 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/109
  
👍  Merged! Thanks.


---


[jira] [Closed] (HIVEMALL-136) Support train_classifier and train_regressor for Spark

2017-09-13 Thread Makoto Yui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVEMALL-136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Makoto Yui closed HIVEMALL-136.
---
Resolution: Fixed
  Assignee: Takeshi Yamamuro

> Support train_classifier and train_regressor for Spark
> ------------------------------------------------------
>
> Key: HIVEMALL-136
> URL: https://issues.apache.org/jira/browse/HIVEMALL-136
> Project: Hivemall
> Issue Type: Improvement
> Reporter: Takeshi Yamamuro
> Assignee: Takeshi Yamamuro
>
> This ticket is to support GeneralRegressorUDTF and GeneralClassifierUDTF.





[GitHub] incubator-hivemall issue #113: [HIVEMALL-136][SPARK] Support train_classifie...

2017-09-13 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/113
  
👍  Merged!


---


[jira] [Closed] (HIVEMALL-133) Support spark-v2.2 in the hivemalls-spark module

2017-09-13 Thread Makoto Yui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVEMALL-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Makoto Yui closed HIVEMALL-133.
---
Resolution: Fixed
  Assignee: Takeshi Yamamuro

> Support spark-v2.2 in the hivemalls-spark module
> ------------------------------------------------
>
> Key: HIVEMALL-133
> URL: https://issues.apache.org/jira/browse/HIVEMALL-133
> Project: Hivemall
> Issue Type: Improvement
> Reporter: Takeshi Yamamuro
> Assignee: Takeshi Yamamuro
> Labels: spark
>
> Since Spark v2.2 is available now, we should support it in the /spark module.
> https://databricks.com/blog/2017/07/11/introducing-apache-spark-2-2.html





[GitHub] incubator-hivemall issue #112: [HIVEMALL-133][SPARK] Support spark-v2.2 in t...

2017-09-13 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/112
  
👍  Merged!


---


[jira] [Closed] (HIVEMALL-138) Implement to_top_k_ordered_map

2017-09-13 Thread Makoto Yui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVEMALL-138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Makoto Yui closed HIVEMALL-138.
---
Resolution: Fixed

> Implement to_top_k_ordered_map
> ------------------------------
>
> Key: HIVEMALL-138
> URL: https://issues.apache.org/jira/browse/HIVEMALL-138
> Project: Hivemall
> Issue Type: New Feature
> Reporter: Takuya Kitazawa
> Assignee: Takuya Kitazawa
> Priority: Minor
>
> As an alternative to the "each_top_k" functionality, let us implement a
> "to_top_k_ordered_map(int k, int key, int value)" UDAF. Compared to the
> CLUSTER BY + "each_top_k" option, a UDAF enables us to utilize mapper-side
> aggregation.
> According to [~myui]:
> A problem is that multiple to_top_k_ordered_map UDAFs are executed
> concurrently, so memory consumption is not reduced.
> to_top_k_ordered_map has O(|article_id|*k) space complexity (or
> O(|article_id|*k/reducers*combiner_effect_ratio) per reducer), while
> each_top_k has O(k) space complexity (or O(k/reducers) per reducer) in an
> operator. each_top_k internally uses a priority queue (not sorting),
> assuming the given inputs are sorted by a group key using CLUSTER BY. The
> shuffle involves a scalable external sort, so the memory space complexity
> issue can be avoided.
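
To illustrate the O(k) claim for the priority-queue approach, here is a minimal Java sketch of top-k selection over a single group using a bounded min-heap. It relies only on `java.util.PriorityQueue`; the class and method names are hypothetical, and this is not the code of `each_top_k` or of Hivemall's `BoundedPriorityQueue`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration; not Hivemall's each_top_k implementation.
final class TopKSketch {

    /** Returns the k largest values in descending order, holding at most k values in memory. */
    static List<Integer> topK(Iterable<Integer> values, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive: " + k);
        }
        PriorityQueue<Integer> heap = new PriorityQueue<>(k); // min-heap: smallest kept value on top
        for (Integer v : values) {
            if (heap.size() < k) {
                heap.offer(v);
            } else if (v > heap.peek()) {
                heap.poll(); // evict the current smallest of the top-k
                heap.offer(v);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topK(java.util.Arrays.asList(5, 1, 9, 3, 7), 3));
    }
}
```

When the input is already clustered by group key, one such heap can be reused for each group in turn, whereas a UDAF holds one buffer per group at the same time, which is where the |article_id| factor above comes from.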





[GitHub] incubator-hivemall issue #108: [HIVEMALL-138] `to_ordered_map` & `to_ordered...

2017-09-13 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/108
  
Merged. Thanks!


---


[GitHub] incubator-hivemall issue #114: [HIVEMALL-138-2] refactored to_ordered_map & ...

2017-09-13 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/114
  
@takuti reverted some changes by me and merged.


---


[GitHub] incubator-hivemall pull request #114: [HIVEMALL-138-2] refactored to_ordered...

2017-09-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hivemall/pull/114


---


[GitHub] incubator-hivemall pull request #108: [HIVEMALL-138] `to_ordered_map` & `to_...

2017-09-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hivemall/pull/108


---


[GitHub] incubator-hivemall pull request #108: [HIVEMALL-138] `to_ordered_map` & `to_...

2017-09-13 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/108#discussion_r138026120
  
--- Diff: core/src/main/java/hivemall/tools/list/UDAFToOrderedList.java ---
@@ -0,0 +1,535 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.tools.list;
+
+import hivemall.utils.collections.BoundedPriorityQueue;
+import hivemall.utils.hadoop.HiveUtils;
+import hivemall.utils.lang.CommandLineUtils;
+
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.HelpFormatter;
+import org.apache.commons.cli.Options;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFParameterInfo;
+import org.apache.hadoop.hive.serde2.objectinspector.*;
+import 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.io.BooleanWritable;
+import org.apache.hadoop.io.IntWritable;
+
+import javax.annotation.Nonnegative;
+import javax.annotation.Nonnull;
+import java.io.PrintWriter;
+import java.io.StringWriter;
+import java.util.*;
+
+/**
+ * Return list of values sorted by value itself or specific key.
+ */
+@Description(
+name = "to_ordered_list",
+value = "_FUNC_(value [, key, const string options]) - Return list 
of values sorted by value itself or specific key")
+public class UDAFToOrderedList extends AbstractGenericUDAFResolver {
+
+@Override
+public GenericUDAFEvaluator getEvaluator(GenericUDAFParameterInfo info)
+throws SemanticException {
+@SuppressWarnings("deprecation")
+TypeInfo[] typeInfo = info.getParameters();
+ObjectInspector[] argOIs = info.getParameterObjectInspectors();
+if ((typeInfo.length == 1) || (typeInfo.length == 2 && 
HiveUtils.isConstString(argOIs[1]))) {
+// sort values by value itself w/o key
+if (typeInfo[0].getCategory() != 
ObjectInspector.Category.PRIMITIVE) {
+throw new UDFArgumentTypeException(0,
+"Only primitive type arguments are accepted for value 
but "
++ typeInfo[0].getTypeName() + " was passed as 
the first parameter.");
+}
+} else if ((typeInfo.length == 2)
+|| (typeInfo.length == 3 && 
HiveUtils.isConstString(argOIs[2]))) {
+// sort values by key
+if (typeInfo[1].getCategory() != 
ObjectInspector.Category.PRIMITIVE) {
+throw new UDFArgumentTypeException(1,
+"Only primitive type arguments are accepted for key 
but "
++ typeInfo[1].getTypeName() + " was passed as 
the second parameter.");
+}
+} else {
+throw new UDFArgumentTypeException(typeInfo.length - 1,
+"Number of arguments must be in [1, 3] including constant 
string for options: "
++ typeInfo.length);
+}
+return new UDAFToOrderedListEvaluator();
+}
+
+public static class UDAFToOrderedListEvaluator extends 
GenericUDAFEvaluator {
+
+private ObjectInspector valueOI;
+private PrimitiveObjectInspector keyOI;
+
+private ListObjectInspector valueListOI;
+private ListObjectInspector keyListOI;
+
+private StructObjectInspector internalMergeOI;
+
+private 

[GitHub] incubator-hivemall pull request #108: [HIVEMALL-138] `to_ordered_map` & `to_...

2017-09-13 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/108#discussion_r138024907
  
--- Diff: core/src/main/java/hivemall/tools/list/UDAFToOrderedList.java ---

[GitHub] incubator-hivemall pull request #108: [HIVEMALL-138] `to_ordered_map` & `to_...

2017-09-13 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/108#discussion_r138024379
  
--- Diff: core/src/main/java/hivemall/tools/map/UDAFToOrderedMap.java ---
@@ -54,19 +68,35 @@ public GenericUDAFEvaluator getEvaluator(GenericUDAFParameterInfo info)
                 "Only primitive type arguments are accepted for the key but "
                         + typeInfo[0].getTypeName() + " was passed as parameter 1.");
         }
+
         boolean reverseOrder = false;
+        int size = 0;
         if (typeInfo.length == 3) {
-            if (HiveUtils.isBooleanTypeInfo(typeInfo[2]) == false) {
-                throw new UDFArgumentTypeException(2, "The three argument must be boolean type: "
-                        + typeInfo[2].getTypeName());
-            }
             ObjectInspector[] argOIs = info.getParameterObjectInspectors();
-            reverseOrder = HiveUtils.getConstBoolean(argOIs[2]);
+            if (HiveUtils.isBooleanTypeInfo(typeInfo[2])) {
+                reverseOrder = HiveUtils.getConstBoolean(argOIs[2]);
+            } else if (HiveUtils.isIntegerTypeInfo(typeInfo[2])) {
+                size = HiveUtils.getConstInt(argOIs[2]);
+                if (size == 0) {
+                    throw new UDFArgumentException("Map size must be nonzero: " + size);
+                }
+                reverseOrder = (size > 0); // positive size => top-k
+            } else {
+                throw new UDFArgumentTypeException(2,
+                    "The third argument must be boolean or integer type: "
+                            + typeInfo[2].getTypeName());
+            }
         }
 
-        if (reverseOrder) {
+        if (reverseOrder) { // descending
--- End diff --

Better to implement a `BoundedSortedMap` to avoid duplicated code and a
memory-inefficient top-k operation.
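
One possible shape for such a map, as a rough sketch only: a `TreeMap` wrapper that evicts the last entry in comparator order whenever it grows past a fixed bound. The class name, constructor, and methods below are assumptions for illustration and do not reflect what was eventually merged.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of a bounded sorted map; names and API are assumptions.
final class BoundedSortedMapSketch<K, V> {
    private final TreeMap<K, V> map;
    private final int bound;

    BoundedSortedMapSketch(int bound, Comparator<? super K> comparator) {
        if (bound <= 0) {
            throw new IllegalArgumentException("bound must be positive: " + bound);
        }
        this.bound = bound;
        this.map = new TreeMap<>(comparator);
    }

    /** Inserts the entry, then drops the last entry in comparator order if over capacity. */
    void put(K key, V value) {
        map.put(key, value);
        if (map.size() > bound) {
            map.pollLastEntry();
        }
    }

    /** Read-only view of the retained entries in comparator order. */
    SortedMap<K, V> asMap() {
        return Collections.unmodifiableSortedMap(map);
    }

    public static void main(String[] args) {
        BoundedSortedMapSketch<Integer, String> top2 =
                new BoundedSortedMapSketch<>(2, Comparator.<Integer>reverseOrder());
        top2.put(1, "a");
        top2.put(3, "c");
        top2.put(2, "b");
        System.out.println(top2.asMap().keySet());
    }
}
```

With a reversed comparator the map keeps the k largest keys (top-k), and with the natural order it keeps the k smallest, so one class could serve both the ascending and descending branches.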


---


[GitHub] incubator-hivemall pull request #108: [HIVEMALL-138] `to_ordered_map` & `to_...

2017-09-13 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/108#discussion_r138585976
  
--- Diff: core/src/main/java/hivemall/tools/map/UDAFToOrderedMap.java ---
@@ -92,4 +122,172 @@ public void reset(@SuppressWarnings("deprecation") 
AggregationBuffer agg)
 
 }
 
+public static class TopKOrderedMapEvaluator extends 
GenericUDAFEvaluator {
+
+protected PrimitiveObjectInspector inputKeyOI;
+protected ObjectInspector inputValueOI;
+protected StandardMapObjectInspector partialMapOI;
+protected PrimitiveObjectInspector sizeOI;
+
+protected StructObjectInspector internalMergeOI;
+
+protected StructField partialMapField;
+protected StructField sizeField;
+
+@Override
+public ObjectInspector init(Mode mode, ObjectInspector[] argOIs) 
throws HiveException {
+super.init(mode, argOIs);
+
+// initialize input
+if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {// from 
original data
+this.inputKeyOI = 
HiveUtils.asPrimitiveObjectInspector(argOIs[0]);
+this.inputValueOI = argOIs[1];
+this.sizeOI = HiveUtils.asIntegerOI(argOIs[2]);
+} else {// from partial aggregation
+StructObjectInspector soi = (StructObjectInspector) 
argOIs[0];
+this.internalMergeOI = soi;
+
+this.partialMapField = soi.getStructFieldRef("partialMap");
+// re-extract input key/value OIs
+StandardMapObjectInspector partialMapOI = 
(StandardMapObjectInspector) partialMapField.getFieldObjectInspector();
+this.inputKeyOI = 
HiveUtils.asPrimitiveObjectInspector(partialMapOI.getMapKeyObjectInspector());
+this.inputValueOI = 
partialMapOI.getMapValueObjectInspector();
+
+this.partialMapOI = 
ObjectInspectorFactory.getStandardMapObjectInspector(
+
ObjectInspectorUtils.getStandardObjectInspector(inputKeyOI),
+
ObjectInspectorUtils.getStandardObjectInspector(inputValueOI));
+
+this.sizeField = soi.getStructFieldRef("size");
+this.sizeOI = (PrimitiveObjectInspector) 
sizeField.getFieldObjectInspector();
+}
+
+// initialize output
+final ObjectInspector outputOI;
+if (mode == Mode.PARTIAL1 || mode == Mode.PARTIAL2) {// 
terminatePartial
+outputOI = internalMergeOI(inputKeyOI, inputValueOI);
+} else {// terminate
+outputOI = 
ObjectInspectorFactory.getStandardMapObjectInspector(
+
ObjectInspectorUtils.getStandardObjectInspector(inputKeyOI),
+
ObjectInspectorUtils.getStandardObjectInspector(inputValueOI));
+}
+return outputOI;
+}
+
+private static StructObjectInspector internalMergeOI(
+@Nonnull PrimitiveObjectInspector keyOI, @Nonnull 
ObjectInspector valueOI) {
+ArrayList<String> fieldNames = new ArrayList<String>();
+ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
+
+fieldNames.add("partialMap");
+
fieldOIs.add(ObjectInspectorFactory.getStandardMapObjectInspector(
+ObjectInspectorUtils.getStandardObjectInspector(keyOI),
+ObjectInspectorUtils.getStandardObjectInspector(valueOI)));
+
+fieldNames.add("size");
+
fieldOIs.add(PrimitiveObjectInspectorFactory.writableIntObjectInspector);
+
+return 
ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
+}
+
+static class MapAggregationBuffer extends 
AbstractAggregationBuffer {
+Map container;
+int size;
+
+MapAggregationBuffer() {
+super();
+}
+}
+
+@Override
+public void reset(@SuppressWarnings("deprecation") 
AggregationBuffer agg)
+throws HiveException {
+MapAggregationBuffer myagg = (MapAggregationBuffer) agg;
+myagg.container = new TreeMap(Collections.reverseOrder());
+myagg.size = Integer.MAX_VALUE;
+}
+
+@Override
+public MapAggregationBuffer getNewAggregationBuffer() throws 
HiveException {
+MapAggregationBuffer myagg = new MapAggregationBuffer();
+reset(myagg);
+return myagg;
+}
+
+@Override
+public void 

[GitHub] incubator-hivemall pull request #108: [HIVEMALL-138] `to_ordered_map` & `to_...

2017-09-13 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/108#discussion_r138586028
  
--- Diff: core/src/main/java/hivemall/tools/map/UDAFToOrderedMap.java ---
@@ -92,4 +122,172 @@ public void reset(@SuppressWarnings("deprecation") AggregationBuffer agg)
 
     }
 
+    public static class TopKOrderedMapEvaluator extends GenericUDAFEvaluator {
+
+        protected PrimitiveObjectInspector inputKeyOI;
+        protected ObjectInspector inputValueOI;
+        protected StandardMapObjectInspector partialMapOI;
+        protected PrimitiveObjectInspector sizeOI;
+
+        protected StructObjectInspector internalMergeOI;
+
+        protected StructField partialMapField;
+        protected StructField sizeField;
+
+        @Override
+        public ObjectInspector init(Mode mode, ObjectInspector[] argOIs) throws HiveException {
+            super.init(mode, argOIs);
+
+            // initialize input
+            if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {// from original data
+                this.inputKeyOI = HiveUtils.asPrimitiveObjectInspector(argOIs[0]);
+                this.inputValueOI = argOIs[1];
+                this.sizeOI = HiveUtils.asIntegerOI(argOIs[2]);

The parameter for `argOIs[2]` might be a boolean.
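
As a plain-Java illustration of the concern, the third argument has to be dispatched on its type before it is treated as an integer. The sketch below mirrors the boolean/integer branching shown in the `getEvaluator` diff but uses ordinary Java objects instead of Hive ObjectInspectors; the class and its fields are hypothetical.

```java
// Hypothetical model of the third-argument handling; it does not use Hive's ObjectInspector API.
final class ThirdArgSketch {
    final boolean reverseOrder;
    final int size; // 0 means no size was given (boolean form)

    private ThirdArgSketch(boolean reverseOrder, int size) {
        this.reverseOrder = reverseOrder;
        this.size = size;
    }

    /** Accepts either a Boolean (reverse order flag) or a nonzero Integer (signed map size). */
    static ThirdArgSketch parse(Object arg) {
        if (arg instanceof Boolean) {
            return new ThirdArgSketch((Boolean) arg, 0);
        }
        if (arg instanceof Integer) {
            int size = (Integer) arg;
            if (size == 0) {
                throw new IllegalArgumentException("Map size must be nonzero: " + size);
            }
            return new ThirdArgSketch(size > 0, size); // positive size => top-k (descending)
        }
        String type = (arg == null) ? "null" : arg.getClass().getSimpleName();
        throw new IllegalArgumentException(
            "The third argument must be boolean or integer type: " + type);
    }

    public static void main(String[] args) {
        System.out.println(parse(true).reverseOrder);
        System.out.println(parse(-3).reverseOrder);
    }
}
```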


---