[systemds] branch main updated: [SYSTEMDS-3149] Decision Tree Prediction Builtin DIA project WS2021/22 Closes #1506
This is an automated email from the ASF dual-hosted git repository.

baunsgaard pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git

The following commit(s) were added to refs/heads/main by this push:
     new 741be73  [SYSTEMDS-3149] Decision Tree Prediction Builtin DIA project WS2021/22 Closes #1506
741be73 is described below

commit 741be739c8659e67105a6ba66a972b1b3f7d3d11
Author: Magdalena Hinterkoerner
AuthorDate: Wed Jan 5 14:26:12 2022 +0100

    [SYSTEMDS-3149] Decision Tree Prediction Builtin
    DIA project WS2021/22
    Closes #1506
---
 scripts/builtin/decisionTreePredict.dml        | 149 +
 .../java/org/apache/sysds/common/Builtins.java |   1 +
 .../part1/BuiltinDecisionTreePredictTest.java  |  87
 .../functions/builtin/decisionTreePredict.dml  |  25
 4 files changed, 262 insertions(+)

diff --git a/scripts/builtin/decisionTreePredict.dml b/scripts/builtin/decisionTreePredict.dml
new file mode 100644
index 000..48c7f6f
--- /dev/null
+++ b/scripts/builtin/decisionTreePredict.dml
@@ -0,0 +1,149 @@
+#-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-
+
+#
+# Builtin script implementing prediction based on classification trees with scale features using prediction methods of the
+# Hummingbird paper (https://www.usenix.org/system/files/osdi20-nakandala.pdf).
+#
+# INPUT PARAMETERS:
+# -
+# NAME      TYPE            MEANING
+# -
+# M         Matrix[Double]  Decision tree matrix M, as generated by scripts/builtin/decisionTree.dml, where each column corresponds
+#                           to a node in the learned tree and each row contains the following information:
+#                           M[1,j]: id of node j (in a complete binary tree)
+#                           M[2,j]: Offset (no. of columns) to left child of j if j is an internal node, otherwise 0
+#                           M[3,j]: Feature index of the feature (scale feature id if the feature is scale or
+#                                   categorical feature id if the feature is categorical)
+#                                   that node j looks at if j is an internal node, otherwise 0
+#                           M[4,j]: Type of the feature that node j looks at if j is an internal node: holds
+#                                   the same information as R input vector
+#                           M[5,j]: If j is an internal node: 1 if the feature chosen for j is scale,
+#                                   otherwise the size of the subset of values
+#                                   stored in rows 6,7,... if j is categorical
+#                                   If j is a leaf node: number of misclassified samples reaching at node j
+#                           M[6:,j]: If j is an internal node: Threshold the example's feature value is compared
+#                                   to is stored at M[6,j] if the feature chosen for j is scale,
+#                                   otherwise if the feature chosen for j is categorical rows 6,7,... depict the value subset chosen for j
+#                                   If j is a leaf node 1 if j is impure and the number of samples at j > threshold, otherwise 0
+#
+# X         Matrix[Double]  Feature matrix X
+#
+# strategy  String          Prediction strategy, can be one of ["GEMM", "TT", "PTT"], referring to "Generic matrix multiplication",
+#                           "Tree traversal", and "Perfect tree traversal", respectively
+# ---
+# OUTPUT:
+# -
+# NAME      TYPE            MEANING
+#
[systemds] branch main updated (741be73 -> 8978e13)
This is an automated email from the ASF dual-hosted git repository.

baunsgaard pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git.

    from 741be73  [SYSTEMDS-3149] Decision Tree Prediction Builtin DIA project WS2021/22 Closes #1506
     add c690b78  [MINOR] Set language level to 11
     new 8978e13  [MINOR] Update spark and hadoop for security issues

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
 pom.xml | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)
[systemds] 01/01: [MINOR] Update spark and hadoop for security issues
This is an automated email from the ASF dual-hosted git repository.

baunsgaard pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git

commit 8978e135a7b6467c4796a82d2c7440f2e1ba6be2
Author: baunsgaard
AuthorDate: Mon Nov 8 11:50:08 2021 +0100

    [MINOR] Update spark and hadoop for security issues

    spark 3.0.0 -> 3.2.0
    hadoop 3.0.0 -> 3.3.1

    The specific version changes are based on the spark release versions.
    https://github.com/apache/spark/releases/tag/v3.2.0

    Closes #1444
---
 pom.xml | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/pom.xml b/pom.xml
index 2449c83..51f38c2 100644
--- a/pom.xml
+++ b/pom.xml
@@ -39,9 +39,10 @@
- 3.0.0
- 4.5.3
- 3.0.0
+ 3.3.1
+
+ 4.8
+ 3.2.0
  2.12.0
  2.12
  -MM-dd HH:mm:ss z
@@ -1045,7 +1046,7 @@
  com.fasterxml.jackson.core
  jackson-databind
- 2.10.0
+ 2.12.3
@@ -1074,7 +1075,7 @@
  org.codehaus.janino
  janino
- 3.0.8
+ 3.0.16
  provided
@@ -1107,7 +1108,7 @@
  io.netty
  netty-all
- 4.1.47.Final
+ 4.1.68.Final
  provided
[systemds] branch main updated: [SYSTEMDS-2832] Refactoring of old performance benchmarks
This is an automated email from the ASF dual-hosted git repository.

baunsgaard pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git

The following commit(s) were added to refs/heads/main by this push:
     new 79be9e9  [SYSTEMDS-2832] Refactoring of old performance benchmarks
79be9e9 is described below

commit 79be9e96b1891ef6be1e121b2fff91aed00dc4f0
Author: David Sandru
AuthorDate: Tue Nov 30 13:47:33 2021 +0100

    [SYSTEMDS-2832] Refactoring of old performance benchmarks

    This commit extensively modifies the performance benchmarks to use the
    builtin functions. Also added are arguments to execute the entire
    benchmark within specific memory budgets.

    DIA project WS2021/22
    Closes #1481

    In detail:
    - Refactored old statistics benchmarks and changed them to use built-in functions.
    - Improved logging management for benchmark outputs.
    - ALS conjugate gradient and direct solve benchmark with prediction.
    - Added forced execution in specific folder.
---
 scripts/datagen/genRandData4PCA.dml                |  4 +-
 scripts/perftest/CHANGES.md                        | 57 --
 scripts/perftest/MatrixMult.sh                     | 47 ++-
 scripts/perftest/MatrixTranspose.sh                | 59 --
 .../{scripts/transpose.dml => conf/env-variables}  | 11 +--
 scripts/perftest/conf/log4j-off.properties         | 10 +--
 scripts/perftest/conf/log4j.properties             | 56 ++---
 scripts/perftest/fed/genALS_FedData.sh             | 56 +
 scripts/perftest/fed/runALSFed.sh                  | 36 ++---
 .../perftest/{runALS.sh => fed/runALS_CG_Fed.sh}   | 13 ++-
 scripts/perftest/fed/runAllFed.sh                  |  7 +-
 scripts/perftest/genALSData.sh                     | 58 +-
 scripts/perftest/genBinomialData.sh                | 66 ++--
 scripts/perftest/genClusteringData.sh              | 66
 scripts/perftest/genDescriptiveStatisticsData.sh   | 60 ++
 .../{todo => }/genDimensionReductionData.sh        | 38 ++---
 scripts/perftest/genL2SVMData.sh                   |  6 ++
 scripts/perftest/genMultinomialData.sh             | 62 +--
 scripts/perftest/genStratStatisticsData.sh         | 59 ++
 scripts/perftest/{runALS.sh => runALS_CG.sh}       | 29 +--
 scripts/perftest/{runALS.sh => runALS_DS.sh}       | 31 ++--
 scripts/perftest/runAll.sh                         | 75 +++--
 .../{runAllMultinomial.sh => runAllALS.sh}         | 44 +--
 scripts/perftest/runAllBinomial.sh                 | 15 +++-
 scripts/perftest/{todo => }/runAllClustering.sh    | 37 +
 .../{todo => }/runAllDimensionReduction.sh         | 30 ---
 scripts/perftest/runAllMultinomial.sh              | 17 +++-
 scripts/perftest/runAllRegression.sh               | 17 +++-
 scripts/perftest/{todo => }/runAllStats.sh         | 43 +-
 scripts/perftest/{todo => }/runBivarStats.sh       | 21 +++--
 scripts/perftest/runGLM_binomial_probit.sh         |  7 +-
 scripts/perftest/runGLM_gamma_log.sh               |  7 +-
 scripts/perftest/runGLM_poisson_log.sh             |  7 +-
 .../perftest/{runNaiveBayes.sh => runKmeans.sh}    | 32
 scripts/perftest/runL2SVM.sh                       |  6 ++
 scripts/perftest/runLinearRegCG.sh                 |  7 +-
 scripts/perftest/runLinearRegDS.sh                 |  7 +-
 scripts/perftest/runMSVM.sh                        |  8 +-
 scripts/perftest/runMultiLogReg.sh                 |  7 +-
 scripts/perftest/runNaiveBayes.sh                  |  8 +-
 scripts/perftest/{todo => }/runPCA.sh              | 21 +++--
 scripts/perftest/{todo => }/runStratStats.sh       | 22 --
 scripts/perftest/{todo => }/runUnivarStats.sh      | 23 --
 .../scripts/{transpose.dml => Kmeans-predict.dml}  | 11 +--
 .../perftest/scripts/{transpose.dml => Kmeans.dml} | 15 ++--
 scripts/perftest/scripts/MM.dml                    |  2 +-
 scripts/perftest/scripts/{alsCG.dml => PCA.dml}    | 31
 .../scripts/{transpose.dml => Univar-Stats.dml}    | 10 +--
 .../scripts/{transpose.dml => als-predict.dml}     | 23 +-
 scripts/perftest/scripts/alsCG.dml                 | 10 +--
 scripts/perftest/scripts/{alsCG.dml => alsDS.dml}  | 11 ++-
 .../scripts/{alsCG.dml => bivar-stats.dml}         | 26 +++---
 .../scripts/{transpose.dml => stratstats.dml}      | 14 ++--
 scripts/perftest/scripts/transpose.dml             |  2 +-
 scripts/perftest/todo/genClusteringData.sh         | 52
 .../perftest/todo/genDescriptiveStatisticsData.sh  | 46 ---
 scripts/perftest/todo/genRandLogRegData_LTStats.sh |  0
 scripts/perftest/todo/genStratStatisticsData.sh    | 41 --
 scripts/perftest/todo/genTreeData.sh               | 15 ++--
 scripts/perftest/todo/runAllTrees.sh               |  2 +-
 scripts/perftest/todo/runDecTree.sh                |