Deron Eriksson created SYSTEMML-1379:
----------------------------------------

             Summary: Investigate script metadata to simplify MLContext script interaction
                 Key: SYSTEMML-1379
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1379
             Project: SystemML
          Issue Type: Improvement
          Components: Algorithms, APIs
            Reporter: Deron Eriksson
            Assignee: Deron Eriksson


Currently many scripts contain usage comments such as the following:
{code}
# THIS SCRIPT COMPUTES AN APPROXIMATE FACTORIZATION OF A LOW-RANK MATRIX X INTO TWO MATRICES U AND V
# USING ALTERNATING-LEAST-SQUARES (ALS) ALGORITHM WITH CONJUGATE GRADIENT
# MATRICES U AND V ARE COMPUTED BY MINIMIZING A LOSS FUNCTION (WITH REGULARIZATION)
#
# INPUT   PARAMETERS:
# 
---------------------------------------------------------------------------------------------
# NAME    TYPE     DEFAULT  MEANING
# 
---------------------------------------------------------------------------------------------
# X       String   ---      Location to read the input matrix X to be factorized
# U       String   ---      Location to write the factor matrix U
# V       String   ---      Location to write the factor matrix V
# rank    Int      10       Rank of the factorization
# reg     String   "L2"     Regularization: 
#                           "L2" = L2 regularization;
#                           "wL2" = weighted L2 regularization
# lambda  Double   0.000001 Regularization parameter, no regularization if 0.0
# maxi    Int      50       Maximum number of iterations
# check   Boolean  FALSE    Check for convergence after every iteration, i.e., updating U and V once
# thr     Double   0.0001   Assuming check is set to TRUE, the algorithm stops and convergence is declared
#                           if the decrease in loss in any two consecutive iterations falls below this threshold;
#                           if check is FALSE thr is ignored
# fmt     String   "text"   The output format of the factor matrices L and R, such as "text" or "csv"
# ---------------------------------------------------------------------------------------------
# OUTPUT: 
# 1- An m x r matrix U, where r is the factorization rank 
# 2- An r x n matrix V
#
# HOW TO INVOKE THIS SCRIPT - EXAMPLE:
# hadoop jar SystemML.jar -f ALS-CG.dml -nvargs X=INPUT_DIR/X U=OUTPUT_DIR/U V=OUTPUT_DIR/V rank=10 reg="L2" lambda=0.0001 fmt=csv
{code}

Comments like these are difficult to access programmatically from an
interactive environment such as the Spark Shell. If the same information were
provided in a parseable format, such as JSON or XML, it could be parsed and
exposed programmatically, for example through the MLContext API in the Spark
Shell.
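
As a rough illustration of the idea, the sketch below encodes the ALS-CG
parameter table above as JSON and reads it back in Python. The schema (the
keys "script", "parameters", "name", "type", "default", "meaning") and the
helper functions load_metadata, describe, and resolve_args are hypothetical;
none of them exist in SystemML. A null default is assumed to mark a required
parameter.

```python
import json

# Hypothetical metadata for ALS-CG.dml. The schema is an assumption made for
# illustration only; the values are copied from the usage comment in the script.
# A null default marks a parameter that the caller must supply.
ALS_CG_METADATA = """
{
  "script": "ALS-CG.dml",
  "parameters": [
    {"name": "X", "type": "String", "default": null,
     "meaning": "Location to read the input matrix X to be factorized"},
    {"name": "U", "type": "String", "default": null,
     "meaning": "Location to write the factor matrix U"},
    {"name": "V", "type": "String", "default": null,
     "meaning": "Location to write the factor matrix V"},
    {"name": "rank", "type": "Int", "default": 10,
     "meaning": "Rank of the factorization"},
    {"name": "reg", "type": "String", "default": "L2",
     "meaning": "Regularization: L2 or wL2 (weighted L2)"},
    {"name": "lambda", "type": "Double", "default": 0.000001,
     "meaning": "Regularization parameter, no regularization if 0.0"},
    {"name": "maxi", "type": "Int", "default": 50,
     "meaning": "Maximum number of iterations"},
    {"name": "check", "type": "Boolean", "default": false,
     "meaning": "Check for convergence after every iteration"},
    {"name": "thr", "type": "Double", "default": 0.0001,
     "meaning": "Convergence threshold, ignored if check is FALSE"},
    {"name": "fmt", "type": "String", "default": "text",
     "meaning": "Output format of the factor matrices, such as text or csv"}
  ]
}
"""


def load_metadata(text):
    """Parse the JSON metadata string into a dictionary."""
    return json.loads(text)


def describe(metadata):
    """Return a plain-text parameter summary, one line per parameter."""
    lines = [metadata["script"]]
    for p in metadata["parameters"]:
        if p["default"] is None:
            default = "required"
        else:
            default = "default=%r" % (p["default"],)
        lines.append("  %-7s %-8s %-16s %s"
                     % (p["name"], p["type"], default, p["meaning"]))
    return "\n".join(lines)


def resolve_args(metadata, supplied):
    """Merge user-supplied arguments with the declared defaults."""
    known = {p["name"]: p for p in metadata["parameters"]}
    unknown = sorted(set(supplied) - set(known))
    if unknown:
        raise ValueError("unknown parameter(s): " + ", ".join(unknown))
    resolved = {}
    for name, p in known.items():
        if name in supplied:
            resolved[name] = supplied[name]
        elif p["default"] is None:
            raise ValueError("missing required parameter: " + name)
        else:
            resolved[name] = p["default"]
    return resolved


if __name__ == "__main__":
    meta = load_metadata(ALS_CG_METADATA)
    print(describe(meta))
    print(resolve_args(meta, {"X": "INPUT_DIR/X",
                              "U": "OUTPUT_DIR/U",
                              "V": "OUTPUT_DIR/V"}))
```

With metadata in this form, an interactive session could list a script's
parameters, fill in defaults, and reject misspelled argument names before the
script runs. How such metadata would be stored alongside a script and
surfaced through MLContext is left open here.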




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
