[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM
Github user nzw0301 commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136113122

--- Diff: core/src/main/java/hivemall/recommend/SlimUDTF.java ---
@@ -217,13 +347,48 @@ private void train(int i, Map Ri, Map topKRatesOfI, int j, Map
double eui = rui - predict(u, i, topKRatesOfI, j);
gradSum += ruj * eui;
rateSum += ruj * ruj;
-errs += eui * eui;
+
+if (this.numIterations > 1){
+this.A.unsafeSet((int) u, j, ruj); // need optimize
+}
}
gradSum /= N;
rateSum /= N;
-errs /= N;
+this.W.unsafeSet(i, j, getUpdateTerm(gradSum, rateSum));
+}
+
+
+//private void train(int i, Map Ri, Map topKRatesOfI, int j, Map Rj) {
+//int N = Rj.size();
+//double gradSum = 0.d;
+//double rateSum = 0.d;
+//double errs = 0.d;
+//for (Map.Entry userRate : Rj.entrySet()) {
+//Object u = userRate.getKey();
+//double ruj = PrimitiveObjectInspectorUtils.getDouble(userRate.getValue(),
+//this.itemJRateValueOI);
+//double rui = 0.d;
+//if (Ri.containsKey(u)) {
+//rui = PrimitiveObjectInspectorUtils.getDouble(Ri.get(u), this.itemIRateValueOI);
+//}
+//
+//double eui = rui - predict(u, i, topKRatesOfI, j);
+//gradSum += ruj * eui;
+//rateSum += ruj * ruj;
+//errs += eui * eui;
+//}
+//
+//gradSum /= N;
+//rateSum /= N;
+//errs /= N;
+//
+//this.loss += errs;
+//this.W.unsafeSet(i, j, getUpdateTerm(gradSum, rateSum));
+//}
+
+private double getUpdateTerm(double gradSum, double rateSum){
--- End diff --

Sorry, I missed an error related to this definition. This function refers to the class variables `l1` and `l2`, so I will change it to `getUpdateTerm(final double gradSum, final double rateSum, final double l1, final double l2)`.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
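A minimal sketch of what the four-argument signature might look like once `l1` and `l2` are passed in rather than read from fields. Only the `l1 < Math.abs(gradSum)` test, the sign branch, and an `update = 0.` assignment are visible in the quoted diffs; the elastic-net formula (soft-threshold by `l1`, divide by `l2 + rateSum`), the non-negativity clip, and the class name `UpdateTermSketch` are assumptions for illustration, not the code from the pull request.

```java
// Hypothetical sketch; not the implementation from the pull request.
final class UpdateTermSketch {

    // Static and free of instance state: l1 and l2 arrive as parameters.
    static double getUpdateTerm(final double gradSum, final double rateSum,
            final double l1, final double l2) {
        double update = 0.d;
        if (l1 < Math.abs(gradSum)) {
            if (gradSum > 0.d) {
                // Assumed soft-thresholding step for a positive gradient sum.
                update = (gradSum - l1) / (l2 + rateSum);
            } else {
                update = (gradSum + l1) / (l2 + rateSum);
            }
            // Assumption: weights are constrained to be non-negative.
            if (update < 0.d) {
                update = 0.d;
            }
        }
        return update;
    }

    public static void main(String[] args) {
        // (3 - 1) / (1 + 1), prints 1.0
        System.out.println(getUpdateTerm(3.d, 1.d, 1.d, 1.d));
    }
}
```

Passing the regularization constants explicitly keeps the method a pure function of its arguments, which also makes it straightforward to unit-test in isolation.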
[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM
Github user nzw0301 commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136111887

--- Diff: core/src/main/java/hivemall/math/matrix/sparse/DoKMatrix.java ---
@@ -309,6 +309,20 @@ public void eachNonZeroInColumn(@Nonnegative final int col,
}
}
+public void eachNonZeroCell(@Nonnull final VectorProcedure procedure) {
+if (nnz == 0) {
+return;
+}
+final IMapIterator itor = elements.entries();
+while (itor.next() != -1) {
+long k = itor.getKey();
+int row = Primitives.getHigh(k);
+int col = Primitives.getLow(k);
+double value = itor.getValue();
+procedure.apply(row, col, value);
--- End diff --

Oops, thanks!
[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136094514

--- Diff: core/src/main/java/hivemall/math/matrix/sparse/DoKMatrix.java ---
@@ -309,6 +309,20 @@ public void eachNonZeroInColumn(@Nonnegative final int col,
}
}
+public void eachNonZeroCell(@Nonnull final VectorProcedure procedure) {
+if (nnz == 0) {
+return;
+}
+final IMapIterator itor = elements.entries();
+while (itor.next() != -1) {
+long k = itor.getKey();
+int row = Primitives.getHigh(k);
+int col = Primitives.getLow(k);
+double value = itor.getValue();
+procedure.apply(row, col, value);
--- End diff --

`VectorProcedure#apply(@Nonnegative int row, @Nonnegative int col, double value)` should also be introduced.
[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136030564

--- Diff: core/src/main/java/hivemall/recommend/SlimUDTF.java ---
@@ -217,13 +347,48 @@ private void train(int i, Map Ri, Map topKRatesOfI, int j, Map
double eui = rui - predict(u, i, topKRatesOfI, j);
gradSum += ruj * eui;
rateSum += ruj * ruj;
-errs += eui * eui;
+
+if (this.numIterations > 1){
+this.A.unsafeSet((int) u, j, ruj); // need optimize
+}
}
gradSum /= N;
rateSum /= N;
-errs /= N;
+this.W.unsafeSet(i, j, getUpdateTerm(gradSum, rateSum));
+}
+
+
+//private void train(int i, Map Ri, Map topKRatesOfI, int j, Map Rj) {
+//int N = Rj.size();
+//double gradSum = 0.d;
+//double rateSum = 0.d;
+//double errs = 0.d;
+//for (Map.Entry userRate : Rj.entrySet()) {
+//Object u = userRate.getKey();
+//double ruj = PrimitiveObjectInspectorUtils.getDouble(userRate.getValue(),
+//this.itemJRateValueOI);
+//double rui = 0.d;
+//if (Ri.containsKey(u)) {
+//rui = PrimitiveObjectInspectorUtils.getDouble(Ri.get(u), this.itemIRateValueOI);
+//}
+//
+//double eui = rui - predict(u, i, topKRatesOfI, j);
+//gradSum += ruj * eui;
+//rateSum += ruj * ruj;
+//errs += eui * eui;
+//}
+//
+//gradSum /= N;
+//rateSum /= N;
+//errs /= N;
+//
+//this.loss += errs;
+//this.W.unsafeSet(i, j, getUpdateTerm(gradSum, rateSum));
+//}
+
+private double getUpdateTerm(double gradSum, double rateSum){
double update = 0.d;
if (this.l1 < Math.abs(gradSum)) {
if (gradSum > 0.) {
--- End diff --

Zero is written in more than one way here, such as `0.d` and `0.`. Use `0.d` for consistency.
[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136037465

--- Diff: core/src/main/java/hivemall/recommend/SlimUDTF.java ---
@@ -144,11 +208,76 @@ public void process(Object[] args) throws HiveException {
Map topKRatesOfI = this.topKRatesOfIOI.getMap(args[2]);
int j = PrimitiveObjectInspectorUtils.getInt(args[3], itemJOI);
Map Rj = this.itemJRatesOI.getMap(args[4]);
-train(i, Ri, topKRatesOfI, j, Rj);
+trainAndStore(i, Ri, topKRatesOfI, j, Rj);
+
+if (this.numIterations == 1) {
+return;
+}
+
+if (this.previousItemId != i){
+this.previousItemId = i;
+
+for (Map.Entry userRate : ((Map) Ri).entrySet()) {
+Object u = userRate.getKey();
+double rui = PrimitiveObjectInspectorUtils.getDouble(userRate.getValue(), this.itemIRateValueOI);
+this.A.unsafeSet((int) u, i, rui); // need optimize
+}
+
+// save KNNi
+// count element size size: i, numKNN, [[u, numKNNu, [[item, rate], ...], ...]
+ByteBuffer buf = inputBuf;
+NioStatefullSegment dst = fileIO;
+
+int numElementOfKNNi = 0;
+Map knn = this.topKRatesOfIOI.getMap(topKRatesOfI);
+for (Map.Entry ri : knn.entrySet()) {
+numElementOfKNNi += this.topKRatesOfIValueOI.getMap(ri.getValue()).size();
+}
+
+int recordBytes = SizeOf.INT + SizeOf.INT + SizeOf.INT * 2 * knn.size() + (SizeOf.DOUBLE+SizeOf.INT) * numElementOfKNNi;
+int requiredBytes = SizeOf.INT + recordBytes; // need to allocate space for "recordBytes" itself
+
+int remain = buf.remaining();
+if (remain < requiredBytes) {
+writeBuffer(buf, dst);
+}
+
+buf.putInt(i);
+buf.putInt(knn.size());
+for (Map.Entry ri : this.topKRatesOfIOI.getMap(topKRatesOfI).entrySet()){
+int user = PrimitiveObjectInspectorUtils.getInt(ri.getKey(), this.topKRatesOfIKeyOI);
+Map userKNN = this.topKRatesOfIValueOI.getMap(ri.getValue());
+
+buf.putInt(user);
+buf.putInt(userKNN.size());
+
+for (Map.Entry ratings : userKNN.entrySet()) {
+int item = PrimitiveObjectInspectorUtils.getInt(ratings.getKey(), this.topKRatesOfIValueKeyOI);
+double rating = PrimitiveObjectInspectorUtils.getDouble(ratings.getValue(), this.topKRatesOfIValueValueOI);
+
+buf.putInt(item);
+buf.putDouble(rating);
+}
+}
+}
+}
+
+private static void writeBuffer(@Nonnull ByteBuffer srcBuf, @Nonnull NioStatefullSegment dst)
+throws HiveException {
+srcBuf.flip();
+try {
+dst.write(srcBuf);
+} catch (IOException e) {
+throw new HiveException("Exception causes while writing a buffer to file", e);
+}
+srcBuf.clear();
}
@Override
public void close() throws HiveException {
+
+runIterativeTraining();
+
int numItem = Math.max(this.W.numRows(), this.W.numColumns());
--- End diff --

Please add the following method in `DoKMatrix` and use it.

```java
public void eachNonZeroCell(@Nonnull final VectorProcedure procedure) {
    if (nnz == 0) {
        return;
    }
    final IMapIterator itor = elements.entries();
    while (itor.next() != -1) {
        long k = itor.getKey();
        int row = Primitives.getHigh(k);
        int col = Primitives.getLow(k);
        double value = itor.getValue();
        procedure.apply(row, col, value);
    }
}

@Override
public RowMajorMatrix toRowMajorMatrix() {
```

```java
public abstract class VectorProcedure {
    ...
    public void apply(@Nonnegative int row, @Nonnegative int col, double value) {}
}
```
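The iteration pattern suggested above can be illustrated with a small stand-alone sketch: a dictionary-of-keys matrix that packs `(row, col)` into one `long` key and hands each stored cell to a three-argument callback. Everything here is a hypothetical stand-in: `DokSketch`, `CellProcedure`, and the `java.util.HashMap` backing store are assumptions for illustration, whereas Hivemall's `DoKMatrix` uses its own map, `IMapIterator`, and `Primitives` helpers. The choice of row in the high 32 bits and column in the low 32 bits follows the `getHigh`/`getLow` usage quoted in the review.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a dictionary-of-keys sparse matrix.
final class DokSketch {

    // Callback with the three-argument shape requested in the review.
    interface CellProcedure {
        void apply(int row, int col, double value);
    }

    private final Map<Long, Double> elements = new HashMap<>();

    // Pack (row, col) into one long: row in the high 32 bits, col in the low 32 bits.
    static long toKey(final int row, final int col) {
        return ((long) row << 32) | (col & 0xFFFFFFFFL);
    }

    static int getHigh(final long key) {
        return (int) (key >>> 32);
    }

    static int getLow(final long key) {
        return (int) key;
    }

    void set(final int row, final int col, final double value) {
        elements.put(toKey(row, col), value);
    }

    // Visit every stored cell once, in unspecified order.
    void eachNonZeroCell(final CellProcedure procedure) {
        if (elements.isEmpty()) {
            return;
        }
        for (Map.Entry<Long, Double> e : elements.entrySet()) {
            final long k = e.getKey();
            procedure.apply(getHigh(k), getLow(k), e.getValue());
        }
    }

    public static void main(String[] args) {
        DokSketch m = new DokSketch();
        m.set(0, 2, 1.5d);
        m.set(3, 1, -2.d);
        m.eachNonZeroCell((row, col, value) ->
            System.out.println(row + "," + col + " = " + value));
    }
}
```

Iterating the key map directly touches only stored cells, so the cost is proportional to the number of non-zero entries rather than to `numRows * numColumns`.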
[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136031740

--- Diff: core/src/main/java/hivemall/recommend/SlimUDTF.java ---
@@ -77,6 +106,26 @@ public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgu
List fieldNames = new ArrayList<>();
List fieldOIs = new ArrayList<>();
+// initialize temporary file to save knn for iterative training
+if (mapredContext != null && numIterations > 1) {
+// invoke only at task node (initialize is also invoked in compilation)
+final File file;
+try {
+file = File.createTempFile("hivemall_slim", ".sgmt"); // A, Knn and R
+file.deleteOnExit();
+if (!file.canWrite()) {
+throw new UDFArgumentException("Cannot write a temporary file: "
++ file.getAbsolutePath());
+}
+} catch (IOException ioe) {
+throw new UDFArgumentException(ioe);
+} catch (Throwable e) {
+throw new UDFArgumentException(e);
+}
+this.fileIO = new NioStatefullSegment(file,false);
+this.inputBuf = ByteBuffer.allocateDirect(1024*1024); // 1MB
--- End diff --

1MB might be too small to store KNNi. 4~8MB may be enough (?). Estimate the required size from `recordBytes`.
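One way to act on this suggestion is to compute the record size up front and compare it with the buffer capacity. The sketch below reuses the `recordBytes` and `requiredBytes` formulas quoted from `SlimUDTF.process` elsewhere in this thread; the class `KnnBufferSketch`, the nested `Map<Integer, Map<Integer, Double>>` representation of KNNi, and the helper `maxUsers` are assumptions for illustration (the pull request reads these maps through Hive ObjectInspectors).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch for sizing the KNN buffer; not code from the pull request.
final class KnnBufferSketch {

    // Layout per record: i, numKNN, then for each user [u, numKNNu, (item, rate)*].
    static int recordBytes(final Map<Integer, Map<Integer, Double>> knn) {
        int numElements = 0;
        for (Map<Integer, Double> userKnn : knn.values()) {
            numElements += userKnn.size();
        }
        return Integer.BYTES + Integer.BYTES + Integer.BYTES * 2 * knn.size()
                + (Double.BYTES + Integer.BYTES) * numElements;
    }

    // The quoted diff reserves one extra int on top of the record itself.
    static int requiredBytes(final Map<Integer, Map<Integer, Double>> knn) {
        return Integer.BYTES + recordBytes(knn);
    }

    // Largest number of users whose record still fits, if each user has k neighbours.
    static int maxUsers(final int capacityBytes, final int k) {
        final int fixed = 3 * Integer.BYTES; // extra int + i + numKNN
        final int perUser = 2 * Integer.BYTES + (Double.BYTES + Integer.BYTES) * k;
        return (capacityBytes - fixed) / perUser;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> knn = new HashMap<>();
        Map<Integer, Double> user1 = new HashMap<>();
        user1.put(10, 1.d);
        user1.put(11, 2.d);
        Map<Integer, Double> user2 = new HashMap<>();
        user2.put(12, 3.d);
        knn.put(1, user1);
        knn.put(2, user2);
        System.out.println(recordBytes(knn));           // prints 60
        System.out.println(maxUsers(1024 * 1024, 10));  // prints 8191
    }
}
```

Under these assumptions, a 1 MiB buffer holds a single item's record only up to roughly eight thousand users at ten neighbours each, which is consistent with the concern that 1MB may be too small.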
[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136027874

--- Diff: core/src/main/java/hivemall/recommend/SlimUDTF.java ---
@@ -235,16 +400,12 @@ private void train(int i, Map Ri, Map topKRatesOfI, int j, Map
update = 0.;
}
}
-
-this.loss += errs;
-this.W.unsafeSet(i, j, update);
+return update;
}
-public void resetLoss() {
-this.loss = 0.d;
-}
+private final void runIterativeTraining() throws HiveException {
--- End diff --

Remove `final` and make `SlimUDTF` itself final.
[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136031239

--- Diff: core/src/main/java/hivemall/recommend/SlimUDTF.java ---
@@ -144,11 +208,76 @@ public void process(Object[] args) throws HiveException {
Map topKRatesOfI = this.topKRatesOfIOI.getMap(args[2]);
int j = PrimitiveObjectInspectorUtils.getInt(args[3], itemJOI);
Map Rj = this.itemJRatesOI.getMap(args[4]);
-train(i, Ri, topKRatesOfI, j, Rj);
+trainAndStore(i, Ri, topKRatesOfI, j, Rj);
+
+if (this.numIterations == 1) {
+return;
+}
+
+if (this.previousItemId != i){
--- End diff --

Extract a method for better readability.

```java
if (previousItemId != i && numIterations > 1) {
    recordTrainingInput(i, topKRatesOfI);
}
```
[GitHub] incubator-hivemall pull request #111: [WIP][HIVEMALL-17] Support SLIM
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/111#discussion_r136030303

--- Diff: core/src/main/java/hivemall/recommend/SlimUDTF.java ---
@@ -217,13 +347,48 @@ private void train(int i, Map Ri, Map topKRatesOfI, int j, Map
double eui = rui - predict(u, i, topKRatesOfI, j);
gradSum += ruj * eui;
rateSum += ruj * ruj;
-errs += eui * eui;
+
+if (this.numIterations > 1){
+this.A.unsafeSet((int) u, j, ruj); // need optimize
+}
}
gradSum /= N;
rateSum /= N;
-errs /= N;
+this.W.unsafeSet(i, j, getUpdateTerm(gradSum, rateSum));
+}
+
+
+//private void train(int i, Map Ri, Map topKRatesOfI, int j, Map Rj) {
+//int N = Rj.size();
+//double gradSum = 0.d;
+//double rateSum = 0.d;
+//double errs = 0.d;
+//for (Map.Entry userRate : Rj.entrySet()) {
+//Object u = userRate.getKey();
+//double ruj = PrimitiveObjectInspectorUtils.getDouble(userRate.getValue(),
+//this.itemJRateValueOI);
+//double rui = 0.d;
+//if (Ri.containsKey(u)) {
+//rui = PrimitiveObjectInspectorUtils.getDouble(Ri.get(u), this.itemIRateValueOI);
+//}
+//
+//double eui = rui - predict(u, i, topKRatesOfI, j);
+//gradSum += ruj * eui;
+//rateSum += ruj * ruj;
+//errs += eui * eui;
+//}
+//
+//gradSum /= N;
+//rateSum /= N;
+//errs /= N;
+//
+//this.loss += errs;
+//this.W.unsafeSet(i, j, getUpdateTerm(gradSum, rateSum));
+//}
+
+private double getUpdateTerm(double gradSum, double rateSum){
--- End diff --

Declare this as `private static double getUpdateTerm(final double gradSum, final double rateSum, final double l1) {` for inline optimization.
[GitHub] incubator-hivemall issue #105: [WIP][HIVEMALL-24] Scalable field-aware facto...
Github user coveralls commented on the issue:

https://github.com/apache/incubator-hivemall/pull/105

[![Coverage Status](https://coveralls.io/builds/13048972/badge)](https://coveralls.io/builds/13048972)

Coverage decreased (-0.3%) to 40.563% when pulling **784ce76cae6acc7682f8c7b90550eae6aee9cb65 on myui:HIVEMALL-24-2** into **7205de1e959f0d9b96ac756e415d8a8ada7e92af on apache:master**.
[GitHub] incubator-hivemall issue #105: [WIP][HIVEMALL-24] Scalable field-aware facto...
Github user coveralls commented on the issue:

https://github.com/apache/incubator-hivemall/pull/105

[![Coverage Status](https://coveralls.io/builds/13048054/badge)](https://coveralls.io/builds/13048054)

Coverage decreased (-0.3%) to 40.56% when pulling **54116c53f3ea0d2263162f84c5dbd8795ae2510b on myui:HIVEMALL-24-2** into **7205de1e959f0d9b96ac756e415d8a8ada7e92af on apache:master**.