[GitHub] incubator-hivemall pull request #52: [HIVEMALL-78] Implement AUC UDAF for bi...

2017-02-27 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/52#discussion_r103388346
  
--- Diff: core/src/main/java/hivemall/evaluation/AUCUDAF.java ---
@@ -49,35 +50,251 @@
 @SuppressWarnings("deprecation")
 @Description(
 name = "auc",
-value = "_FUNC_(array rankItems, array correctItems [, const int recommendSize = rankItems.size])"
+value = "_FUNC_(array rankItems | double score, array correctItems | double label "
++ "[, const int recommendSize = rankItems.size ])"
 + " - Returns AUC")
 public final class AUCUDAF extends AbstractGenericUDAFResolver {
 
-// prevent instantiation
-private AUCUDAF() {}
-
 @Override
 public GenericUDAFEvaluator getEvaluator(@Nonnull TypeInfo[] typeInfo) throws SemanticException {
 if (typeInfo.length != 2 && typeInfo.length != 3) {
 throw new UDFArgumentTypeException(typeInfo.length - 1,
 "_FUNC_ takes two or three arguments");
 }
 
-ListTypeInfo arg1type = HiveUtils.asListTypeInfo(typeInfo[0]);
-if (!HiveUtils.isPrimitiveTypeInfo(arg1type.getListElementTypeInfo())) {
-throw new UDFArgumentTypeException(0,
-"The first argument `array rankItems` is invalid form: " + typeInfo[0]);
+if (HiveUtils.isNumberTypeInfo(typeInfo[0]) && HiveUtils.isNumberTypeInfo(typeInfo[1])) {
--- End diff --

`&& HiveUtils.isIntegerTypeInfo(typeInfo[1])`
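
For illustration only, the stricter dispatch this suggestion implies could look like the sketch below. It does not use the Hive API: `ArgType`, `chooseEvaluator`, and the sets of "number" and "integer" types are hypothetical stand-ins for `TypeInfo`, `getEvaluator`, and the `HiveUtils` checks, and which types each check accepts is an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

final class AucDispatchSketch {
    // Hypothetical stand-in for Hive type categories; the real code inspects TypeInfo objects.
    enum ArgType { TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, ARRAY }

    // Assumed membership of the "integer" and "number" checks.
    private static final Set<ArgType> INTEGERS =
            EnumSet.of(ArgType.TINYINT, ArgType.SMALLINT, ArgType.INT, ArgType.BIGINT);
    private static final Set<ArgType> NUMBERS =
            EnumSet.range(ArgType.TINYINT, ArgType.DOUBLE);

    // Numeric score plus integer label selects classification; two arrays select ranking.
    static String chooseEvaluator(ArgType first, ArgType second) {
        if (NUMBERS.contains(first) && INTEGERS.contains(second)) {
            return "classification";
        }
        if (first == ArgType.ARRAY && second == ArgType.ARRAY) {
            return "ranking";
        }
        throw new IllegalArgumentException(
                "unsupported argument types: " + first + ", " + second);
    }

    public static void main(String[] args) {
        System.out.println(chooseEvaluator(ArgType.DOUBLE, ArgType.INT));
        System.out.println(chooseEvaluator(ArgType.ARRAY, ArgType.ARRAY));
    }
}
```

In this sketch a `double` label is rejected, so callers would pass labels as integers such as `0` and `1`.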




[GitHub] incubator-hivemall pull request #52: [HIVEMALL-78] Implement AUC UDAF for bi...

2017-02-27 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/52#discussion_r103387903
  
--- Diff: core/src/main/java/hivemall/evaluation/AUCUDAF.java ---
@@ -49,35 +50,251 @@
 @SuppressWarnings("deprecation")
 @Description(
 name = "auc",
-value = "_FUNC_(array rankItems, array correctItems [, const int recommendSize = rankItems.size])"
+value = "_FUNC_(array rankItems | double score, array correctItems | double label "
++ "[, const int recommendSize = rankItems.size ])"
 + " - Returns AUC")
 public final class AUCUDAF extends AbstractGenericUDAFResolver {
 
-// prevent instantiation
-private AUCUDAF() {}
-
 @Override
 public GenericUDAFEvaluator getEvaluator(@Nonnull TypeInfo[] typeInfo) throws SemanticException {
 if (typeInfo.length != 2 && typeInfo.length != 3) {
 throw new UDFArgumentTypeException(typeInfo.length - 1,
 "_FUNC_ takes two or three arguments");
 }
 
-ListTypeInfo arg1type = HiveUtils.asListTypeInfo(typeInfo[0]);
-if (!HiveUtils.isPrimitiveTypeInfo(arg1type.getListElementTypeInfo())) {
-throw new UDFArgumentTypeException(0,
-"The first argument `array rankItems` is invalid form: " + typeInfo[0]);
+if (HiveUtils.isNumberTypeInfo(typeInfo[0]) && HiveUtils.isNumberTypeInfo(typeInfo[1])) {
+return new ClassificationEvaluator();
+} else {
+ListTypeInfo arg1type = HiveUtils.asListTypeInfo(typeInfo[0]);
+if (!HiveUtils.isPrimitiveTypeInfo(arg1type.getListElementTypeInfo())) {
+throw new UDFArgumentTypeException(0,
+"The first argument `array rankItems` is invalid form: " + typeInfo[0]);
+}
+
+ListTypeInfo arg2type = HiveUtils.asListTypeInfo(typeInfo[1]);
+if (!HiveUtils.isPrimitiveTypeInfo(arg2type.getListElementTypeInfo())) {
+throw new UDFArgumentTypeException(1,
+"The second argument `array correctItems` is invalid form: " + typeInfo[1]);
+}
+
+return new RankingEvaluator();
+}
+}
+
+public static class ClassificationEvaluator extends GenericUDAFEvaluator {
+
+private PrimitiveObjectInspector scoreOI;
+private PrimitiveObjectInspector labelOI;
+
+private StructObjectInspector internalMergeOI;
+private StructField aField;
+private StructField scorePrevField;
+private StructField fpField;
+private StructField tpField;
+private StructField fpPrevField;
+private StructField tpPrevField;
+
+public ClassificationEvaluator() {}
+
+@Override
+public ObjectInspector init(Mode mode, ObjectInspector[] parameters) throws HiveException {
+assert (parameters.length == 2 || parameters.length == 3) : parameters.length;
+super.init(mode, parameters);
+
+// initialize input
+if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {// from original data
+this.scoreOI = (PrimitiveObjectInspector) parameters[0];
+this.labelOI = (PrimitiveObjectInspector) parameters[1];
+} else {// from partial aggregation
+StructObjectInspector soi = (StructObjectInspector) parameters[0];
+this.internalMergeOI = soi;
+this.aField = soi.getStructFieldRef("a");
+this.scorePrevField = soi.getStructFieldRef("scorePrev");
+this.fpField = soi.getStructFieldRef("fp");
+this.tpField = soi.getStructFieldRef("tp");
+this.fpPrevField = soi.getStructFieldRef("fpPrev");
+this.tpPrevField = soi.getStructFieldRef("tpPrev");
+}
+
+// initialize output
+final ObjectInspector outputOI;
+if (mode == Mode.PARTIAL1 || mode == Mode.PARTIAL2) {// terminatePartial
+outputOI = internalMergeOI();
+} else {// terminate
+outputOI = PrimitiveObjectInspectorFactory.writableDoubleObjectInspector;
+}
+}
+return outputOI;
+}
+
+private static StructObjectInspector internalMergeOI() {
+ArrayList<String> fieldNames = new ArrayList<String>();
+ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
+
+fieldNames.add("a");
+fieldOIs.add(PrimitiveObjectInspectorFactory.writableDoubleObjectInspector);
+

[GitHub] incubator-hivemall pull request #52: [HIVEMALL-78] Implement AUC UDAF for bi...

2017-02-27 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/52#discussion_r103386937
  
--- Diff: core/src/test/java/hivemall/evaluation/AUCUDAFTest.java ---
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.evaluation;
+
+import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
+import org.apache.hadoop.hive.ql.udf.generic.SimpleGenericUDAFParameterInfo;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+public class AUCUDAFTest {
+AUCUDAF auc;
+GenericUDAFEvaluator evaluator;
+ObjectInspector[] inputOIs;
+ObjectInspector[] partialOI;
+AUCUDAF.ClassificationAUCAggregationBuffer agg;
+
+@Before
+public void setUp() throws Exception {
+auc = new AUCUDAF();
+
+inputOIs = new ObjectInspector[] {
+PrimitiveObjectInspectorFactory.getPrimitiveJavaObjectInspector(
+PrimitiveObjectInspector.PrimitiveCategory.DOUBLE),
+PrimitiveObjectInspectorFactory.getPrimitiveJavaObjectInspector(
+PrimitiveObjectInspector.PrimitiveCategory.DOUBLE)};
+
+evaluator = auc.getEvaluator(new SimpleGenericUDAFParameterInfo(inputOIs, false, false));
+
+ArrayList<String> fieldNames = new ArrayList<String>();
+ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
+fieldNames.add("a");
+fieldOIs.add(PrimitiveObjectInspectorFactory.writableDoubleObjectInspector);
+fieldNames.add("scorePrev");
+fieldOIs.add(PrimitiveObjectInspectorFactory.writableDoubleObjectInspector);
+fieldNames.add("fp");
+fieldOIs.add(PrimitiveObjectInspectorFactory.writableLongObjectInspector);
+fieldNames.add("tp");
+fieldOIs.add(PrimitiveObjectInspectorFactory.writableLongObjectInspector);
+fieldNames.add("fpPrev");
+fieldOIs.add(PrimitiveObjectInspectorFactory.writableLongObjectInspector);
+fieldNames.add("tpPrev");
+fieldOIs.add(PrimitiveObjectInspectorFactory.writableLongObjectInspector);
+
+partialOI = new ObjectInspector[2];
+partialOI[0] = ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
+
+agg = (AUCUDAF.ClassificationAUCAggregationBuffer) evaluator.getNewAggregationBuffer();
+}
+
+@Test
+public void test() throws Exception {
+// should be sorted by scores in a descending order
+final double[] scores = new double[] {0.8, 0.7, 0.5, 0.3, 0.2};
+final double[] labels = new double[] {1, 1, 0, 1, 0};
+
+evaluator.init(GenericUDAFEvaluator.Mode.PARTIAL1, inputOIs);
+evaluator.reset(agg);
+
+for (int i = 0; i < scores.length; i++) {
+evaluator.iterate(agg, new Object[] {scores[i], labels[i]});
+}
+
+Assert.assertEquals(5.0 / 6.0, agg.get(), 1e-5);
+}
+
+@Test
+public void testAllTruePositive() throws Exception {
+final double[] scores = new double[] {0.8, 0.7, 0.5, 0.3, 0.2};
+final double[] labels = new double[] {1, 1, 1, 1, 1};
+
+evaluator.init(GenericUDAFEvaluator.Mode.PARTIAL1, inputOIs);
+evaluator.reset(agg);
+
+for (int i = 0; i < scores.length; i++) {
+evaluator.iterate(agg, new Object[] {scores[i], labels[i]});
+}
+
+// AUC for all TP scores 

[GitHub] incubator-hivemall pull request #52: [HIVEMALL-78] Implement AUC UDAF for bi...

2017-02-27 Thread takuti
GitHub user takuti opened a pull request:

https://github.com/apache/incubator-hivemall/pull/52

[HIVEMALL-78] Implement AUC UDAF for binary classification

## What changes were proposed in this pull request?

In addition to the existing `auc(array, array)` for ranking (myui/hivemall#326), 
this patch adds `auc(double, double)` for binary classification.

## What type of PR is it?

Feature

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-78

## How was this patch tested?

Added a unit test for the UDAF, and it passes:

```
$ mvn -Dtest=hivemall.evaluation.AUCUDAFTest test
```

I also ran manual tests with the following queries:

```sql
with data as (
  select 0.5 as prob, 0 as label
  union all
  select 0.3 as prob, 1 as label
  union all
  select 0.2 as prob, 0 as label
  union all
  select 0.8 as prob, 1 as label
  union all
  select 0.7 as prob, 1 as label
), data_ordered as (
  select prob, label
  from data
  order by prob desc
)
select auc(prob, label)
from (
  select prob, label
  from data_ordered
  distribute by floor(prob / 0.2)
) t;
```

```sql
with data as (
  select 0.5 as prob, 0 as label
  union all
  select 0.3 as prob, 1 as label
  union all
  select 0.2 as prob, 0 as label
  union all
  select 0.8 as prob, 1 as label
  union all
  select 0.7 as prob, 1 as label
), data_ordered as (
  select prob, label
  from data
  order by prob desc
)
select auc(prob, label)
from data_ordered;
```

Both showed `AUC=0.8`. This result is the same as [scikit-learn's roc_auc_score()](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html):

```
>>> roc_auc_score([0,1,0,1,1],[0.5,0.3,0.2,0.8,0.7])
0.83326
```
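
As a cross-check that needs neither Hive nor scikit-learn, AUC can also be computed from its pairwise definition: the fraction of (positive, negative) pairs in which the positive row has the higher score, counting ties as one half. The sketch below is illustrative only; `PairwiseAuc` and its `auc` method are hypothetical names and are not part of this patch. For the five rows above it counts 5 winning pairs out of 6, that is 5/6 ≈ 0.8333.

```java
final class PairwiseAuc {
    // Pairwise (Mann-Whitney) AUC: wins over all positive/negative pairs, ties count 0.5.
    static double auc(int[] labels, double[] scores) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] != 1) {
                continue; // i must be a positive row
            }
            for (int j = 0; j < labels.length; j++) {
                if (labels[j] != 0) {
                    continue; // j must be a negative row
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative row");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        // Same data as the roc_auc_score call above.
        int[] labels = {0, 1, 0, 1, 1};
        double[] scores = {0.5, 0.3, 0.2, 0.8, 0.7};
        System.out.println(auc(labels, scores));
    }
}
```

This quadratic version is only practical for small samples, but it does not depend on the input order.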

## How to use this feature?

See the queries above. Input data must be sorted by score in descending 
order.
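
To show why the ordering matters, here is a minimal single-pass sketch of the standard trapezoidal ROC accumulation. It borrows the field names `a`, `scorePrev`, `fp`, `tp`, `fpPrev`, and `tpPrev` from the partial-aggregation struct in the diff, but the class `StreamingAucSketch` and its logic are an assumption about how such a buffer might work, not the code in this patch.

```java
final class StreamingAucSketch {
    private double a = 0.0;                               // unnormalized area so far
    private double scorePrev = Double.POSITIVE_INFINITY;  // score of the previous threshold
    private long fp = 0, tp = 0;                          // running false/true positive counts
    private long fpPrev = 0, tpPrev = 0;                  // counts at the previous threshold

    // Rows are expected in descending score order.
    void iterate(double score, int label) {
        if (score != scorePrev) {
            // A new threshold closes the trapezoid opened at the previous one.
            a += trapezoid(fp, fpPrev, tp, tpPrev);
            scorePrev = score;
            fpPrev = fp;
            tpPrev = tp;
        }
        if (label == 1) {
            tp++;
        } else {
            fp++;
        }
    }

    // Adds the last trapezoid and normalizes by (#positives * #negatives).
    double get() {
        if (tp == 0 || fp == 0) {
            throw new IllegalStateException("AUC is undefined without both classes");
        }
        double total = a + trapezoid(fp, fpPrev, tp, tpPrev);
        return total / ((double) tp * fp);
    }

    private static double trapezoid(long x1, long x2, long y1, long y2) {
        return Math.abs(x1 - x2) * (y1 + y2) / 2.0;
    }

    public static void main(String[] args) {
        double[] scores = {0.8, 0.7, 0.5, 0.3, 0.2}; // already sorted descending
        int[] labels = {1, 1, 0, 1, 0};
        StreamingAucSketch agg = new StreamingAucSketch();
        for (int i = 0; i < scores.length; i++) {
            agg.iterate(scores[i], labels[i]);
        }
        System.out.println(agg.get());
    }
}
```

With the rows in descending order this sketch yields 5/6. Feeding the same rows in ascending order yields 1/6 instead, because the sweep then traces the ROC curve in the wrong direction.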


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/takuti/incubator-hivemall auc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/52.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #52


commit e60ff231e07aa515666ec7f4863ed1c8401e0e27
Author: Takuya Kitazawa 
Date:   2017-02-28T06:08:33Z

Implement AUCUDAF

commit 4756f463700740af0bd51ab7a25e383649a2d504
Author: Takuya Kitazawa 
Date:   2017-02-28T06:09:18Z

Add unit test of AUCUDAF for classification






[GitHub] incubator-hivemall issue #51: [WIP][HIVEMALL-75] Support Sparse Vector Forma...

2017-02-27 Thread coveralls
Github user coveralls commented on the issue:

https://github.com/apache/incubator-hivemall/pull/51
  

[![Coverage Status](https://coveralls.io/builds/10345031/badge)](https://coveralls.io/builds/10345031)

Coverage decreased (-0.02%) to 36.332% when pulling 
**d5dfe6c299406d0223d7cad15ebb96fcba432663 on myui:HIVEMALL-75** into 
**19d472b54d611273f6a88d0a8e17eb277fc0e729 on apache:master**.





[GitHub] incubator-hivemall issue #51: [WIP][HIVEMALL-75] Support Sparse Vector Forma...

2017-02-27 Thread coveralls
Github user coveralls commented on the issue:

https://github.com/apache/incubator-hivemall/pull/51
  

[![Coverage Status](https://coveralls.io/builds/10343530/badge)](https://coveralls.io/builds/10343530)

Coverage increased (+0.03%) to 36.381% when pulling 
**051e855d8e0bade6ce34173e2721021039eaebb6 on myui:HIVEMALL-75** into 
**19d472b54d611273f6a88d0a8e17eb277fc0e729 on apache:master**.


