[systemds] branch main updated: [SYSTEMDS-3149] Decision Tree Prediction Builtin DIA project WS2021/22 Closes #1506

2022-01-20 Thread baunsgaard
This is an automated email from the ASF dual-hosted git repository.

baunsgaard pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git


The following commit(s) were added to refs/heads/main by this push:
 new 741be73  [SYSTEMDS-3149] Decision Tree Prediction Builtin DIA project WS2021/22 Closes #1506
741be73 is described below

commit 741be739c8659e67105a6ba66a972b1b3f7d3d11
Author: Magdalena Hinterkoerner 
AuthorDate: Wed Jan 5 14:26:12 2022 +0100

[SYSTEMDS-3149] Decision Tree Prediction Builtin
DIA project WS2021/22
Closes #1506
---
 scripts/builtin/decisionTreePredict.dml        | 149 +
 .../java/org/apache/sysds/common/Builtins.java |   1 +
 .../part1/BuiltinDecisionTreePredictTest.java  |  87 +
 .../functions/builtin/decisionTreePredict.dml  |  25 +
 4 files changed, 262 insertions(+)

diff --git a/scripts/builtin/decisionTreePredict.dml b/scripts/builtin/decisionTreePredict.dml
new file mode 100644
index 0000000..48c7f6f
--- /dev/null
+++ b/scripts/builtin/decisionTreePredict.dml
@@ -0,0 +1,149 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+#
+# Builtin script implementing prediction based on classification trees with scale features using prediction methods of the
+# Hummingbird paper (https://www.usenix.org/system/files/osdi20-nakandala.pdf).
+#
+# INPUT   PARAMETERS:
+# ----------------------------------------------------------------------------------------------------
+#  NAME      TYPE            MEANING
+# ----------------------------------------------------------------------------------------------------
+#  M         Matrix[Double]  Decision tree matrix M, as generated by scripts/builtin/decisionTree.dml, where each column corresponds
+#                            to a node in the learned tree and each row contains the following information:
+#                            M[1,j]: id of node j (in a complete binary tree)
+#                            M[2,j]: Offset (no. of columns) to left child of j if j is an internal node, otherwise 0
+#                            M[3,j]: Feature index of the feature (scale feature id if the feature is scale or
+#                            categorical feature id if the feature is categorical)
+#                            that node j looks at if j is an internal node, otherwise 0
+#                            M[4,j]: Type of the feature that node j looks at if j is an internal node: holds
+#                            the same information as R input vector
+#                            M[5,j]: If j is an internal node: 1 if the feature chosen for j is scale,
+#                            otherwise the size of the subset of values
+#                            stored in rows 6,7,... if j is categorical
+#                            If j is a leaf node: number of misclassified samples reaching at node j
+#                            M[6:,j]: If j is an internal node: Threshold the example's feature value is compared
+#                            to is stored at M[6,j] if the feature chosen for j is scale,
+#                            otherwise if the feature chosen for j is categorical rows 6,7,... depict the value subset chosen for j
+#                            If j is a leaf node 1 if j is impure and the number of samples at j > threshold, otherwise 0
+#
+#  X         Matrix[Double]  Feature matrix X
+#
+#  strategy  String          Prediction strategy, can be one of ["GEMM", "TT", "PTT"], referring to "Generic matrix multiplication",
+#                            "Tree traversal", and "Perfect tree traversal", respectively
+# ----------------------------------------------------------------------------------------------------
+# OUTPUT:
+# ----------------------------------------------------------------------------------------------------
+#  NAME      TYPE            MEANING
+# 
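
The "TT" (tree traversal) strategy over the column-wise layout of M described in the header above can be sketched in Python as follows. This is only an illustration, not the code of decisionTreePredict.dml: the function name predict_tt is hypothetical, only scale features are handled, and three details that the header does not state are assumptions here (the right child sits in the column directly after the left child, a sample goes left when its feature value is below the threshold, and a leaf stores its predicted label in row 4).

```python
def predict_tt(M, X):
    """Tree-traversal sketch for a tree matrix M (6 rows, one column per node).

    M and X are lists of rows. Row numbers in the comments are 1-based, as in
    the header comment; Python indices are 0-based.
    """
    preds = []
    for x in X:
        j = 0  # start at the root, assumed to be the first column
        # Row 2 holds the offset to the left child; 0 marks a leaf.
        while M[1][j] != 0:
            feature = int(M[2][j]) - 1   # row 3: 1-based feature index
            threshold = M[5][j]          # row 6: threshold for a scale feature
            left = j + int(M[1][j])      # column of the left child
            # ASSUMPTION: go left if value < threshold, and the right child
            # is stored in the column directly after the left child.
            j = left if x[feature] < threshold else left + 1
        # ASSUMPTION: a leaf keeps its predicted label in row 4.
        preds.append(M[3][j])
    return preds


# Hypothetical tree: root splits on feature 1 at 5.0, leaves predict 1 and 2.
M = [
    [1, 2, 3],      # row 1: node ids
    [1, 0, 0],      # row 2: offset to left child (0 for leaves)
    [1, 0, 0],      # row 3: feature index
    [1, 1, 2],      # row 4: feature type (root) / assumed label (leaves)
    [1, 0, 0],      # row 5: 1 for a scale split
    [5.0, 0, 0],    # row 6: threshold
]
X = [[3.0], [7.0]]
print(predict_tt(M, X))
```

Per the header, the "GEMM" and "PTT" strategies are alternative evaluation schemes from the Hummingbird paper; this sketch covers only the plain traversal.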

[systemds] branch main updated (741be73 -> 8978e13)

2022-01-20 Thread baunsgaard

baunsgaard pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git.


from 741be73  [SYSTEMDS-3149] Decision Tree Prediction Builtin DIA project WS2021/22 Closes #1506
 add c690b78  [MINOR] Set language level to 11
 new 8978e13  [MINOR] Update spark and hadoop for security issues

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 pom.xml | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)


[systemds] 01/01: [MINOR] Update spark and hadoop for security issues

2022-01-20 Thread baunsgaard

baunsgaard pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git

commit 8978e135a7b6467c4796a82d2c7440f2e1ba6be2
Author: baunsgaard 
AuthorDate: Mon Nov 8 11:50:08 2021 +0100

[MINOR] Update spark and hadoop for security issues

spark 3.0.0 -> 3.2.0
hadoop 3.0.0 -> 3.3.1

The specific version changes are based on the spark release versions.
https://github.com/apache/spark/releases/tag/v3.2.0

Closes #1444
---
 pom.xml | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/pom.xml b/pom.xml
index 2449c83..51f38c2 100644
--- a/pom.xml
+++ b/pom.xml
@@ -39,9 +39,10 @@

 

-   3.0.0
-   4.5.3
-   3.0.0
+   3.3.1
+   
+   4.8
+   3.2.0
2.12.0
2.12
-MM-dd HH:mm:ss 
z
@@ -1045,7 +1046,7 @@

                <groupId>com.fasterxml.jackson.core</groupId>
                <artifactId>jackson-databind</artifactId>
-               <version>2.10.0</version>
+               <version>2.12.3</version>

 

@@ -1074,7 +1075,7 @@

                <groupId>org.codehaus.janino</groupId>
                <artifactId>janino</artifactId>
-               <version>3.0.8</version>
+               <version>3.0.16</version>
                <scope>provided</scope>

 
@@ -1107,7 +1108,7 @@

                <groupId>io.netty</groupId>
                <artifactId>netty-all</artifactId>
-               <version>4.1.47.Final</version>
+               <version>4.1.68.Final</version>
                <scope>provided</scope>

 


[systemds] branch main updated: [SYSTEMDS-2832] Refactoring of old performance benchmarks

2022-01-20 Thread baunsgaard

baunsgaard pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git


The following commit(s) were added to refs/heads/main by this push:
 new 79be9e9  [SYSTEMDS-2832] Refactoring of old performance benchmarks
79be9e9 is described below

commit 79be9e96b1891ef6be1e121b2fff91aed00dc4f0
Author: David Sandru 
AuthorDate: Tue Nov 30 13:47:33 2021 +0100

[SYSTEMDS-2832] Refactoring of old performance benchmarks

This commit extensively modifies the performance benchmarks to use
the builtin functions. It also adds arguments to execute the entire
benchmark within specific memory budgets.

DIA project WS2021/22

Closes #1481

In detail:

- Refactored old statistics benchmarks and changed them to use built-in functions.
- Improved logging management for benchmark outputs.
- Added ALS conjugate gradient and direct solve benchmarks with prediction.
- Added forced execution in a specific folder.
---
 scripts/datagen/genRandData4PCA.dml                |  4 +-
 scripts/perftest/CHANGES.md                        | 57 --
 scripts/perftest/MatrixMult.sh                     | 47 ++-
 scripts/perftest/MatrixTranspose.sh                | 59 --
 .../{scripts/transpose.dml => conf/env-variables}  | 11 +--
 scripts/perftest/conf/log4j-off.properties         | 10 +--
 scripts/perftest/conf/log4j.properties             | 56 ++---
 scripts/perftest/fed/genALS_FedData.sh             | 56 +
 scripts/perftest/fed/runALSFed.sh                  | 36 ++---
 .../perftest/{runALS.sh => fed/runALS_CG_Fed.sh}   | 13 ++-
 scripts/perftest/fed/runAllFed.sh                  |  7 +-
 scripts/perftest/genALSData.sh                     | 58 +-
 scripts/perftest/genBinomialData.sh                | 66 ++--
 scripts/perftest/genClusteringData.sh              | 66
 scripts/perftest/genDescriptiveStatisticsData.sh   | 60 ++
 .../{todo => }/genDimensionReductionData.sh        | 38 ++---
 scripts/perftest/genL2SVMData.sh                   |  6 ++
 scripts/perftest/genMultinomialData.sh             | 62 +--
 scripts/perftest/genStratStatisticsData.sh         | 59 ++
 scripts/perftest/{runALS.sh => runALS_CG.sh}       | 29 +--
 scripts/perftest/{runALS.sh => runALS_DS.sh}       | 31 ++--
 scripts/perftest/runAll.sh                         | 75 +++---
 .../{runAllMultinomial.sh => runAllALS.sh}         | 44 +--
 scripts/perftest/runAllBinomial.sh                 | 15 +++-
 scripts/perftest/{todo => }/runAllClustering.sh    | 37 +
 .../{todo => }/runAllDimensionReduction.sh         | 30 ---
 scripts/perftest/runAllMultinomial.sh              | 17 +++-
 scripts/perftest/runAllRegression.sh               | 17 +++-
 scripts/perftest/{todo => }/runAllStats.sh         | 43 +-
 scripts/perftest/{todo => }/runBivarStats.sh       | 21 +++--
 scripts/perftest/runGLM_binomial_probit.sh         |  7 +-
 scripts/perftest/runGLM_gamma_log.sh               |  7 +-
 scripts/perftest/runGLM_poisson_log.sh             |  7 +-
 .../perftest/{runNaiveBayes.sh => runKmeans.sh}    | 32
 scripts/perftest/runL2SVM.sh                       |  6 ++
 scripts/perftest/runLinearRegCG.sh                 |  7 +-
 scripts/perftest/runLinearRegDS.sh                 |  7 +-
 scripts/perftest/runMSVM.sh                        |  8 +-
 scripts/perftest/runMultiLogReg.sh                 |  7 +-
 scripts/perftest/runNaiveBayes.sh                  |  8 +-
 scripts/perftest/{todo => }/runPCA.sh              | 21 +++--
 scripts/perftest/{todo => }/runStratStats.sh       | 22 --
 scripts/perftest/{todo => }/runUnivarStats.sh      | 23 --
 .../scripts/{transpose.dml => Kmeans-predict.dml}  | 11 +--
 .../perftest/scripts/{transpose.dml => Kmeans.dml} | 15 ++--
 scripts/perftest/scripts/MM.dml                    |  2 +-
 scripts/perftest/scripts/{alsCG.dml => PCA.dml}    | 31
 .../scripts/{transpose.dml => Univar-Stats.dml}    | 10 +--
 .../scripts/{transpose.dml => als-predict.dml}     | 23 +-
 scripts/perftest/scripts/alsCG.dml                 | 10 +--
 scripts/perftest/scripts/{alsCG.dml => alsDS.dml}  | 11 ++-
 .../scripts/{alsCG.dml => bivar-stats.dml}         | 26 +++---
 .../scripts/{transpose.dml => stratstats.dml}      | 14 ++--
 scripts/perftest/scripts/transpose.dml             |  2 +-
 scripts/perftest/todo/genClusteringData.sh         | 52
 .../perftest/todo/genDescriptiveStatisticsData.sh  | 46 ---
 scripts/perftest/todo/genRandLogRegData_LTStats.sh |  0
 scripts/perftest/todo/genStratStatisticsData.sh    | 41 --
 scripts/perftest/todo/genTreeData.sh               | 15 ++--
 scripts/perftest/todo/runAllTrees.sh               |  2 +-
 scripts/perftest/todo/runDecTree.sh                |