[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM

2017-08-30 Thread nzw0301
Github user nzw0301 commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136113122
  
--- Diff: core/src/main/java/hivemall/recommend/SlimUDTF.java ---
@@ -217,13 +347,48 @@ private void train(int i, Map Ri, Map topKRatesOfI, int j, Map
 double eui = rui - predict(u, i, topKRatesOfI, j);
 gradSum += ruj * eui;
 rateSum += ruj * ruj;
-errs += eui * eui;
+
+if (this.numIterations > 1){
+this.A.unsafeSet((int) u, j, ruj); // need optimize
+}
 }
 
 gradSum /= N;
 rateSum /= N;
-errs /= N;
 
+this.W.unsafeSet(i, j, getUpdateTerm(gradSum, rateSum));
+}
+
+
+//private void train(int i, Map Ri, Map topKRatesOfI, int j, Map Rj) {
+//int N = Rj.size();
+//double gradSum = 0.d;
+//double rateSum = 0.d;
+//double errs = 0.d;
+//for (Map.Entry userRate : Rj.entrySet()) {
+//Object u = userRate.getKey();
+//double ruj = PrimitiveObjectInspectorUtils.getDouble(userRate.getValue(),
+//this.itemJRateValueOI);
+//double rui = 0.d;
+//if (Ri.containsKey(u)) {
+//rui = PrimitiveObjectInspectorUtils.getDouble(Ri.get(u), this.itemIRateValueOI);
+//}
+//
+//double eui = rui - predict(u, i, topKRatesOfI, j);
+//gradSum += ruj * eui;
+//rateSum += ruj * ruj;
+//errs += eui * eui;
+//}
+//
+//gradSum /= N;
+//rateSum /= N;
+//errs /= N;
+//
+//this.loss += errs;
+//this.W.unsafeSet(i, j, getUpdateTerm(gradSum, rateSum));
+//}
+
+private double getUpdateTerm(double gradSum, double rateSum){
--- End diff --

Sorry, I missed an error related to this definition.
This function refers to the class variables `l1` and `l2`, so I will change it to
`getUpdateTerm(final double gradSum, final double rateSum, final double l1, final double l2)`.
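For illustration, here is a minimal sketch of what the state-free helper could look like once `l1` and `l2` are passed in. Only the first few lines of the PR's method body are visible in the diffs, so the arithmetic is an assumption: it uses the usual elastic-net soft-thresholding step from coordinate descent, shrinking `gradSum` toward zero by `l1` and dividing by `rateSum + l2`, then clamps the result to zero to respect SLIM's non-negativity constraint on `W`. `SlimUpdateSketch` is a hypothetical class name.

```java
/**
 * Hypothetical sketch of a static, state-free update step.
 * Assumes elastic-net soft thresholding plus SLIM's non-negativity constraint.
 */
final class SlimUpdateSketch {

    private SlimUpdateSketch() {}

    static double getUpdateTerm(final double gradSum, final double rateSum,
            final double l1, final double l2) {
        double update = 0.d;
        if (l1 < Math.abs(gradSum)) {
            // shrink the gradient toward zero by l1 (soft thresholding)
            if (gradSum > 0.d) {
                update = (gradSum - l1) / (rateSum + l2);
            } else {
                update = (gradSum + l1) / (rateSum + l2);
            }
            // SLIM keeps the weights non-negative
            if (update < 0.d) {
                update = 0.d;
            }
        }
        return update;
    }
}
```

Because the method reads no instance fields, it can be declared `private static`, as suggested elsewhere in this thread.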


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM

2017-08-30 Thread nzw0301
Github user nzw0301 commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136111887
  
--- Diff: core/src/main/java/hivemall/math/matrix/sparse/DoKMatrix.java ---
@@ -309,6 +309,20 @@ public void eachNonZeroInColumn(@Nonnegative final int col,
 }
 }
 
+public void eachNonZeroCell(@Nonnull final VectorProcedure procedure) {
+if (nnz == 0) {
+return;
+}
+final IMapIterator itor = elements.entries();
+while (itor.next() != -1) {
+long k = itor.getKey();
+int row = Primitives.getHigh(k);
+int col = Primitives.getLow(k);
+double value = itor.getValue();
+procedure.apply(row, col, value);
--- End diff --

Oops, thanks!


[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM

2017-08-30 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136094514
  
--- Diff: core/src/main/java/hivemall/math/matrix/sparse/DoKMatrix.java ---
@@ -309,6 +309,20 @@ public void eachNonZeroInColumn(@Nonnegative final int col,
 }
 }
 
+public void eachNonZeroCell(@Nonnull final VectorProcedure procedure) {
+if (nnz == 0) {
+return;
+}
+final IMapIterator itor = elements.entries();
+while (itor.next() != -1) {
+long k = itor.getKey();
+int row = Primitives.getHigh(k);
+int col = Primitives.getLow(k);
+double value = itor.getValue();
+procedure.apply(row, col, value);
--- End diff --

`VectorProcedure#apply(@Nonnegative int row, @Nonnegative int col, double value)` should also be introduced.


[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM

2017-08-30 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136030564
  
--- Diff: core/src/main/java/hivemall/recommend/SlimUDTF.java ---
@@ -217,13 +347,48 @@ private void train(int i, Map Ri, Map topKRatesOfI, int j, Map
 double eui = rui - predict(u, i, topKRatesOfI, j);
 gradSum += ruj * eui;
 rateSum += ruj * ruj;
-errs += eui * eui;
+
+if (this.numIterations > 1){
+this.A.unsafeSet((int) u, j, ruj); // need optimize
+}
 }
 
 gradSum /= N;
 rateSum /= N;
-errs /= N;
 
+this.W.unsafeSet(i, j, getUpdateTerm(gradSum, rateSum));
+}
+
+
+//private void train(int i, Map Ri, Map topKRatesOfI, int j, Map Rj) {
+//int N = Rj.size();
+//double gradSum = 0.d;
+//double rateSum = 0.d;
+//double errs = 0.d;
+//for (Map.Entry userRate : Rj.entrySet()) {
+//Object u = userRate.getKey();
+//double ruj = PrimitiveObjectInspectorUtils.getDouble(userRate.getValue(),
+//this.itemJRateValueOI);
+//double rui = 0.d;
+//if (Ri.containsKey(u)) {
+//rui = PrimitiveObjectInspectorUtils.getDouble(Ri.get(u), this.itemIRateValueOI);
+//}
+//
+//double eui = rui - predict(u, i, topKRatesOfI, j);
+//gradSum += ruj * eui;
+//rateSum += ruj * ruj;
+//errs += eui * eui;
+//}
+//
+//gradSum /= N;
+//rateSum /= N;
+//errs /= N;
+//
+//this.loss += errs;
+//this.W.unsafeSet(i, j, getUpdateTerm(gradSum, rateSum));
+//}
+
+private double getUpdateTerm(double gradSum, double rateSum){
 double update = 0.d;
 if (this.l1 < Math.abs(gradSum)) {
 if (gradSum > 0.) {
--- End diff --

There are various representations of 0, such as `0.d` and `0.`. Use `0.d` for consistency.


[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM

2017-08-30 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136037465
  
--- Diff: core/src/main/java/hivemall/recommend/SlimUDTF.java ---
@@ -144,11 +208,76 @@ public void process(Object[] args) throws HiveException {
 Map topKRatesOfI = this.topKRatesOfIOI.getMap(args[2]);
 int j = PrimitiveObjectInspectorUtils.getInt(args[3], itemJOI);
 Map Rj = this.itemJRatesOI.getMap(args[4]);
-train(i, Ri, topKRatesOfI, j, Rj);
+trainAndStore(i, Ri, topKRatesOfI, j, Rj);
+
+if (this.numIterations == 1) {
+return;
+}
+
+if (this.previousItemId != i){
+this.previousItemId = i;
+
+for (Map.Entry userRate : ((Map) Ri).entrySet()) {
+Object u = userRate.getKey();
+double rui = PrimitiveObjectInspectorUtils.getDouble(userRate.getValue(), this.itemIRateValueOI);
+this.A.unsafeSet((int) u, i, rui); // need optimize
+}
+
+// save KNNi
+// count element size size: i, numKNN, [[u, numKNNu, [[item, rate], ...], ...]
+ByteBuffer buf = inputBuf;
+NioStatefullSegment dst = fileIO;
+
+int numElementOfKNNi = 0;
+Map knn = this.topKRatesOfIOI.getMap(topKRatesOfI);
+for (Map.Entry ri : knn.entrySet()) {
+numElementOfKNNi += this.topKRatesOfIValueOI.getMap(ri.getValue()).size();
+}
+
+int recordBytes = SizeOf.INT + SizeOf.INT + SizeOf.INT * 2 * knn.size() + (SizeOf.DOUBLE+SizeOf.INT) * numElementOfKNNi;
+int requiredBytes = SizeOf.INT + recordBytes; // need to allocate space for "recordBytes" itself
+
+int remain = buf.remaining();
+if (remain < requiredBytes) {
+writeBuffer(buf, dst);
+}
+
+buf.putInt(i);
+buf.putInt(knn.size());
+for (Map.Entry ri : this.topKRatesOfIOI.getMap(topKRatesOfI).entrySet()){
+int user = PrimitiveObjectInspectorUtils.getInt(ri.getKey(), this.topKRatesOfIKeyOI);
+Map userKNN = this.topKRatesOfIValueOI.getMap(ri.getValue());
+
+buf.putInt(user);
+buf.putInt(userKNN.size());
+
+for (Map.Entry ratings : userKNN.entrySet()) {
+int item = PrimitiveObjectInspectorUtils.getInt(ratings.getKey(), this.topKRatesOfIValueKeyOI);
+double rating = PrimitiveObjectInspectorUtils.getDouble(ratings.getValue(), this.topKRatesOfIValueValueOI);
+
+buf.putInt(item);
+buf.putDouble(rating);
+}
+}
+}
+}
+
+private static void writeBuffer(@Nonnull ByteBuffer srcBuf, @Nonnull NioStatefullSegment dst)
+throws HiveException {
+srcBuf.flip();
+try {
+dst.write(srcBuf);
+} catch (IOException e) {
+throw new HiveException("Exception causes while writing a buffer to file", e);
+}
+srcBuf.clear();
 }
 
 @Override
 public void close() throws HiveException {
+
+runIterativeTraining();
+
 int numItem = Math.max(this.W.numRows(), this.W.numColumns());
--- End diff --

Please add the following method in `DoKMatrix` and use it.

```java
public void eachNonZeroCell(@Nonnull final VectorProcedure procedure) {
    if (nnz == 0) {
        return;
    }
    final IMapIterator itor = elements.entries();
    while (itor.next() != -1) {
        long k = itor.getKey();
        int row = Primitives.getHigh(k);
        int col = Primitives.getLow(k);
        double value = itor.getValue();
        procedure.apply(row, col, value);
    }
}

@Override
public RowMajorMatrix toRowMajorMatrix() {
    // ... existing implementation ...
}
```

```java
public abstract class VectorProcedure {
    ...
    public void apply(@Nonnegative int row, @Nonnegative int col, double value) {}

}
```
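To see how the two suggestions fit together, the following self-contained sketch mimics them with a plain `HashMap<Long, Double>` instead of Hivemall's `IMapIterator`. It assumes `Primitives.getHigh`/`getLow` return the upper and lower 32 bits of the packed key; `DoKSketch`, `index`, `getHigh`, and `getLow` here are hypothetical stand-ins, not Hivemall APIs. Giving `VectorProcedure` empty default bodies lets an anonymous subclass override only the `(row, col, value)` overload it needs.

```java
import java.util.HashMap;
import java.util.Map;

/** Callback with no-op defaults, so a caller overrides only the overload it needs. */
abstract class VectorProcedure {
    public void apply(int i, double value) {}

    public void apply(int row, int col, double value) {}
}

/** Hypothetical stand-in for DoKMatrix: each (row, col) is packed into one long key. */
final class DoKSketch {
    private final Map<Long, Double> elements = new HashMap<>();

    /** Row in the high 32 bits, column in the low 32 bits. */
    static long index(int row, int col) {
        return ((long) row << 32) | (col & 0xFFFFFFFFL);
    }

    static int getHigh(long key) {
        return (int) (key >>> 32);
    }

    static int getLow(long key) {
        return (int) key;
    }

    void set(int row, int col, double value) {
        if (value == 0.d) {
            elements.remove(index(row, col)); // keep only non-zero cells
        } else {
            elements.put(index(row, col), value);
        }
    }

    void eachNonZeroCell(VectorProcedure procedure) {
        if (elements.isEmpty()) {
            return;
        }
        for (Map.Entry<Long, Double> e : elements.entrySet()) {
            long k = e.getKey();
            procedure.apply(getHigh(k), getLow(k), e.getValue());
        }
    }
}
```

Presumably the point is that `close()` can then visit only the non-zero entries of `W` rather than loop over all `numItem` rows and columns.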



[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM

2017-08-30 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136031740
  
--- Diff: core/src/main/java/hivemall/recommend/SlimUDTF.java ---
@@ -77,6 +106,26 @@ public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgu
 List fieldNames = new ArrayList<>();
 List fieldOIs = new ArrayList<>();
 
+// initialize temporary file to save knn for iterative training
+if (mapredContext != null && numIterations > 1) {
+// invoke only at task node (initialize is also invoked in compilation)
+final File file;
+try {
+file = File.createTempFile("hivemall_slim", ".sgmt"); // A, Knn and R
+file.deleteOnExit();
+if (!file.canWrite()) {
+throw new UDFArgumentException("Cannot write a temporary file: "
++ file.getAbsolutePath());
+}
+} catch (IOException ioe) {
+throw new UDFArgumentException(ioe);
+} catch (Throwable e) {
+throw new UDFArgumentException(e);
+}
+this.fileIO = new NioStatefullSegment(file,false);
+this.inputBuf = ByteBuffer.allocateDirect(1024*1024); // 1MB
--- End diff --

1MB might be too small to store KNNi; 4 to 8MB is probably enough. Estimate the required size from `recordBytes`.
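One way to act on the sizing suggestion is sketched below, assuming plain `java.nio` types in place of `NioStatefullSegment` and `SizeOf`. It reuses the `recordBytes` formula from the diff and additionally writes the `recordBytes` length prefix that `requiredBytes` reserves room for. The key point is that flushing, as `writeBuffer` does, only helps when a record fits in an empty buffer; if a single KNNi record is larger than the whole buffer, the buffer has to be grown (or its initial size chosen from `recordBytes`). `KnnRecordSketch` and `DEFAULT_BUFFER_BYTES` are hypothetical names, and the 4MB default merely follows the estimate above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.Map;

/**
 * Hypothetical helper mirroring the KNNi layout in the diff:
 * [recordBytes][i][numKNN], then per user [u][numKNNu], then per neighbor [item][rate].
 */
final class KnnRecordSketch {
    static final int INT = Integer.BYTES;
    static final int DOUBLE = Double.BYTES;
    /** Assumed default, following the 4~8MB estimate; tune as needed. */
    static final int DEFAULT_BUFFER_BYTES = 4 * 1024 * 1024;

    private KnnRecordSketch() {}

    /** Same formula as recordBytes in the diff. */
    static int recordBytes(final Map<Integer, Map<Integer, Double>> knn) {
        int numElements = 0;
        for (Map<Integer, Double> userKnn : knn.values()) {
            numElements += userKnn.size();
        }
        return INT + INT + INT * 2 * knn.size() + (DOUBLE + INT) * numElements;
    }

    /** Same flip/write/clear sequence as writeBuffer in the diff. */
    static void flush(final ByteBuffer buf, final WritableByteChannel dst) throws IOException {
        buf.flip();
        while (buf.hasRemaining()) {
            dst.write(buf);
        }
        buf.clear();
    }

    /** Appends one record and returns the buffer to keep using (possibly a larger one). */
    static ByteBuffer writeRecord(ByteBuffer buf, final WritableByteChannel dst, final int i,
            final Map<Integer, Map<Integer, Double>> knn) throws IOException {
        final int recordBytes = recordBytes(knn);
        final int requiredBytes = INT + recordBytes; // plus the length prefix itself
        if (buf.remaining() < requiredBytes) {
            flush(buf, dst);
            if (buf.capacity() < requiredBytes) {
                // one record exceeds the whole buffer, so flushing alone cannot make room
                buf = ByteBuffer.allocateDirect(requiredBytes);
            }
        }
        buf.putInt(recordBytes);
        buf.putInt(i);
        buf.putInt(knn.size());
        for (Map.Entry<Integer, Map<Integer, Double>> user : knn.entrySet()) {
            buf.putInt(user.getKey());
            buf.putInt(user.getValue().size());
            for (Map.Entry<Integer, Double> rating : user.getValue().entrySet()) {
                buf.putInt(rating.getKey());
                buf.putDouble(rating.getValue());
            }
        }
        return buf;
    }
}
```

A task would typically start from `ByteBuffer.allocateDirect(DEFAULT_BUFFER_BYTES)` and keep whatever buffer `writeRecord` returns.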


[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM

2017-08-30 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136027874
  
--- Diff: core/src/main/java/hivemall/recommend/SlimUDTF.java ---
@@ -235,16 +400,12 @@ private void train(int i, Map Ri, Map topKRatesOfI, int j, Map
 update = 0.;
 }
 }
-
-this.loss += errs;
-this.W.unsafeSet(i, j, update);
+return update;
 }
 
-public void resetLoss() {
-this.loss = 0.d;
-}
+private final void runIterativeTraining() throws HiveException {
--- End diff --

Remove `final` and make `SlimUDTF` itself final.


[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM

2017-08-30 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136031239
  
--- Diff: core/src/main/java/hivemall/recommend/SlimUDTF.java ---
@@ -144,11 +208,76 @@ public void process(Object[] args) throws HiveException {
 Map topKRatesOfI = this.topKRatesOfIOI.getMap(args[2]);
 int j = PrimitiveObjectInspectorUtils.getInt(args[3], itemJOI);
 Map Rj = this.itemJRatesOI.getMap(args[4]);
-train(i, Ri, topKRatesOfI, j, Rj);
+trainAndStore(i, Ri, topKRatesOfI, j, Rj);
+
+if (this.numIterations == 1) {
+return;
+}
+
+if (this.previousItemId != i){
--- End diff --

Extract a method for better readability:

```java
if (previousItemId != i && numIterations > 1) {
    recordTrainingInput(i, topKRatesOfI);
}
```
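A minimal sketch of the extracted-method shape, using a hypothetical, stripped-down `ProcessSketch` class in place of `SlimUDTF`. It assumes `previousItemId` starts at a sentinel value and that the assignment `previousItemId = i` moves into `recordTrainingInput`; the suggestion above does not show where that assignment goes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, stripped-down model of SlimUDTF#process with the extracted method. */
final class ProcessSketch {
    private final int numIterations;
    private int previousItemId = Integer.MIN_VALUE; // assumed "no item seen yet" marker
    final List<Integer> recordedItems = new ArrayList<>();

    ProcessSketch(final int numIterations) {
        this.numIterations = numIterations;
    }

    void process(final int i, final Object topKRatesOfI) {
        trainAndStore(i, topKRatesOfI);
        if (previousItemId != i && numIterations > 1) {
            recordTrainingInput(i, topKRatesOfI);
        }
    }

    private void trainAndStore(final int i, final Object topKRatesOfI) {
        // the single-pass update of W(i, j) would go here
    }

    private void recordTrainingInput(final int i, final Object topKRatesOfI) {
        this.previousItemId = i; // so a run of rows sharing item i is recorded once
        recordedItems.add(i);    // stands in for filling A and buffering KNNi
    }
}
```

The `previousItemId != i` check only skips consecutive duplicates, so this relies on the input rows arriving grouped by item `i` (for example via `CLUSTER BY` in the calling query); a non-adjacent repeat of the same item would be recorded again.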


[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM

2017-08-30 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136030303
  
--- Diff: core/src/main/java/hivemall/recommend/SlimUDTF.java ---
@@ -217,13 +347,48 @@ private void train(int i, Map Ri, Map topKRatesOfI, int j, Map
 double eui = rui - predict(u, i, topKRatesOfI, j);
 gradSum += ruj * eui;
 rateSum += ruj * ruj;
-errs += eui * eui;
+
+if (this.numIterations > 1){
+this.A.unsafeSet((int) u, j, ruj); // need optimize
+}
 }
 
 gradSum /= N;
 rateSum /= N;
-errs /= N;
 
+this.W.unsafeSet(i, j, getUpdateTerm(gradSum, rateSum));
+}
+
+
+//private void train(int i, Map Ri, Map topKRatesOfI, int j, Map Rj) {
+//int N = Rj.size();
+//double gradSum = 0.d;
+//double rateSum = 0.d;
+//double errs = 0.d;
+//for (Map.Entry userRate : Rj.entrySet()) {
+//Object u = userRate.getKey();
+//double ruj = PrimitiveObjectInspectorUtils.getDouble(userRate.getValue(),
+//this.itemJRateValueOI);
+//double rui = 0.d;
+//if (Ri.containsKey(u)) {
+//rui = PrimitiveObjectInspectorUtils.getDouble(Ri.get(u), this.itemIRateValueOI);
+//}
+//
+//double eui = rui - predict(u, i, topKRatesOfI, j);
+//gradSum += ruj * eui;
+//rateSum += ruj * ruj;
+//errs += eui * eui;
+//}
+//
+//gradSum /= N;
+//rateSum /= N;
+//errs /= N;
+//
+//this.loss += errs;
+//this.W.unsafeSet(i, j, getUpdateTerm(gradSum, rateSum));
+//}
+
+private double getUpdateTerm(double gradSum, double rateSum){
--- End diff --

Make it `private static double getUpdateTerm(final double gradSum, final double rateSum, final double l1) {`
for inline optimization.


[GitHub] incubator-hivemall issue #105: [WIP][HIVEMALL-24] Scalable field-aware facto...

2017-08-30 Thread coveralls
Github user coveralls commented on the issue:

https://github.com/apache/incubator-hivemall/pull/105
  

[![Coverage Status](https://coveralls.io/builds/13048972/badge)](https://coveralls.io/builds/13048972)

Coverage decreased (-0.3%) to 40.563% when pulling 
**784ce76cae6acc7682f8c7b90550eae6aee9cb65 on myui:HIVEMALL-24-2** into 
**7205de1e959f0d9b96ac756e415d8a8ada7e92af on apache:master**.



[GitHub] incubator-hivemall issue #105: [WIP][HIVEMALL-24] Scalable field-aware facto...

2017-08-30 Thread coveralls
Github user coveralls commented on the issue:

https://github.com/apache/incubator-hivemall/pull/105
  

[![Coverage Status](https://coveralls.io/builds/13048054/badge)](https://coveralls.io/builds/13048054)

Coverage decreased (-0.3%) to 40.56% when pulling 
**54116c53f3ea0d2263162f84c5dbd8795ae2510b on myui:HIVEMALL-24-2** into 
**7205de1e959f0d9b96ac756e415d8a8ada7e92af on apache:master**.


