Deron Eriksson created SYSTEMML-1379:
----------------------------------------

             Summary: Investigate script metadata to simplify MLContext script interaction
                 Key: SYSTEMML-1379
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1379
             Project: SystemML
          Issue Type: Improvement
          Components: Algorithms, APIs
            Reporter: Deron Eriksson
            Assignee: Deron Eriksson


Currently many scripts contain usage comments such as the following:

{code}
# THIS SCRIPT COMPUTES AN APPROXIMATE FACTORIZATION OF A LOW-RANK MATRIX X INTO TWO MATRICES U AND V
# USING ALTERNATING-LEAST-SQUARES (ALS) ALGORITHM WITH CONJUGATE GRADIENT
# MATRICES U AND V ARE COMPUTED BY MINIMIZING A LOSS FUNCTION (WITH REGULARIZATION)
#
# INPUT PARAMETERS:
# ---------------------------------------------------------------------------------------------
# NAME    TYPE     DEFAULT   MEANING
# ---------------------------------------------------------------------------------------------
# X       String   ---       Location to read the input matrix X to be factorized
# U       String   ---       Location to write the factor matrix U
# V       String   ---       Location to write the factor matrix V
# rank    Int      10        Rank of the factorization
# reg     String   "L2"      Regularization:
#                            "L2" = L2 regularization;
#                            "wL2" = weighted L2 regularization
# lambda  Double   0.000001  Regularization parameter, no regularization if 0.0
# maxi    Int      50        Maximum number of iterations
# check   Boolean  FALSE     Check for convergence after every iteration, i.e., updating U and V once
# thr     Double   0.0001    Assuming check is set to TRUE, the algorithm stops and convergence is declared
#                            if the decrease in loss in any two consecutive iterations falls below this threshold;
#                            if check is FALSE thr is ignored
# fmt     String   "text"    The output format of the factor matrices L and R, such as "text" or "csv"
# ---------------------------------------------------------------------------------------------
# OUTPUT:
# 1- An m x r matrix U, where r is the factorization rank
# 2- An r x n matrix V
#
# HOW TO INVOKE THIS SCRIPT - EXAMPLE:
# hadoop jar SystemML.jar -f ALS-CG.dml -nvargs X=INPUT_DIR/X U=OUTPUT_DIR/U V=OUTPUT_DIR/V rank=10 reg="L2" lambda=0.0001 fmt=csv
{code}

Comments such as these are difficult to refer to from a programmatic, interactive environment such as the Spark Shell. If similar information were provided in a parseable format, such as JSON or XML, it could potentially be parsed and exposed programmatically, for example through the MLContext API in the Spark Shell.
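
As a rough illustration of the idea, the parameter table above could be carried as JSON and queried from code. The sketch below is only one possible shape: the JSON layout, the field names ("script", "parameters", "name", "type", "default", "meaning") and the helper functions are all assumptions made for this example, not an existing SystemML format or MLContext API. The parameter names, types and defaults are copied from the ALS-CG comment.

```python
import json

# Hypothetical metadata for ALS-CG.dml. The layout and field names are
# assumptions for illustration; only the values come from the usage comment.
ALS_CG_METADATA = """
{
  "script": "ALS-CG.dml",
  "parameters": [
    {"name": "X", "type": "String", "default": null,
     "meaning": "Location to read the input matrix X to be factorized"},
    {"name": "U", "type": "String", "default": null,
     "meaning": "Location to write the factor matrix U"},
    {"name": "V", "type": "String", "default": null,
     "meaning": "Location to write the factor matrix V"},
    {"name": "rank", "type": "Int", "default": 10,
     "meaning": "Rank of the factorization"},
    {"name": "reg", "type": "String", "default": "L2",
     "meaning": "Regularization: L2 or wL2 (weighted L2)"},
    {"name": "lambda", "type": "Double", "default": 0.000001,
     "meaning": "Regularization parameter, no regularization if 0.0"},
    {"name": "maxi", "type": "Int", "default": 50,
     "meaning": "Maximum number of iterations"},
    {"name": "check", "type": "Boolean", "default": false,
     "meaning": "Check for convergence after every iteration"},
    {"name": "thr", "type": "Double", "default": 0.0001,
     "meaning": "Convergence threshold, ignored if check is FALSE"},
    {"name": "fmt", "type": "String", "default": "text",
     "meaning": "Output format of the factor matrices"}
  ]
}
"""


def load_metadata(text):
    """Parse the JSON text and index the parameter entries by name."""
    meta = json.loads(text)
    meta["by_name"] = {p["name"]: p for p in meta["parameters"]}
    return meta


def required_parameters(meta):
    """Names of parameters without a default ('---' in the comment table)."""
    return [p["name"] for p in meta["parameters"] if p["default"] is None]


def resolve_arguments(meta, supplied):
    """Merge user-supplied values over the defaults, rejecting bad input."""
    unknown = sorted(set(supplied) - set(meta["by_name"]))
    if unknown:
        raise ValueError("unknown parameter(s): " + ", ".join(unknown))
    missing = [n for n in required_parameters(meta) if n not in supplied]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    resolved = {p["name"]: p["default"] for p in meta["parameters"]}
    resolved.update(supplied)
    return resolved


def usage(meta):
    """Render a plain-text usage table similar to the script comment."""
    lines = ["%-8s %-8s %-10s %s" % ("NAME", "TYPE", "DEFAULT", "MEANING")]
    for p in meta["parameters"]:
        default = "---" if p["default"] is None else json.dumps(p["default"])
        lines.append("%-8s %-8s %-10s %s"
                     % (p["name"], p["type"], default, p["meaning"]))
    return "\n".join(lines)


if __name__ == "__main__":
    meta = load_metadata(ALS_CG_METADATA)
    print(usage(meta))
    print(resolve_arguments(meta, {"X": "INPUT_DIR/X", "U": "OUTPUT_DIR/U",
                                   "V": "OUTPUT_DIR/V", "fmt": "csv"}))
```

With metadata in this form, an interactive session could list a script's inputs, report which ones are mandatory, and fill in defaults before running it. How such information would actually be surfaced through MLContext is left open here.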