Hi SystemML folks,

I'm trying to pass some data from Spark to a DML script via the MLContext API. The data comes from a Parquet file containing a DataFrame with the schema [label: Integer, features: SparseVector]. I am doing the following:
val input_data = spark.read.parquet(inputPath)
val x = input_data.select("features")
val y = input_data.select("y")
val x_meta = new MatrixMetadata(DF_VECTOR)
val y_meta = new MatrixMetadata(DF_DOUBLES)
val script = dmlFromFile(s"${script_path}/script.dml").
    in("X", x, x_meta).
    in("Y", y, y_meta)
...

However, this results in an error from SystemML:

java.lang.ArrayIndexOutOfBoundsException: 0

I'm guessing this has something to do with Spark ML vectors being zero-indexed while SystemML matrices are one-indexed. Is there something I should be doing differently here?

Note that I also tried converting the DataFrame to a CoordinateMatrix and then creating an RDD[String] in IJV format (see the sketch at the end of this message); that too resulted in an ArrayIndexOutOfBoundsException. I'm guessing there's something simple I'm doing wrong here, but I haven't been able to figure out exactly what.

Please let me know if you need more information (I can send along the full error stacktrace if that would be helpful)!

Thanks,
Anthony
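
P.S. For concreteness, the CoordinateMatrix/IJV attempt looked roughly like the sketch below. This is only illustrative, not the exact code I ran: it assumes Spark 2.x, that the "features" column holds org.apache.spark.ml.linalg.SparseVector values, and that indices need to be shifted to 1-based for SystemML.

import org.apache.spark.ml.linalg.SparseVector
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// Build one MatrixEntry per non-zero value in each row's sparse feature vector.
// (Row index comes from zipWithIndex; column index comes from the vector itself.)
val entries = x.rdd.zipWithIndex().flatMap { case (row, i) =>
  val v = row.getAs[SparseVector]("features")  // assumes ml.linalg vectors
  v.indices.zip(v.values).map { case (j, value) => MatrixEntry(i, j, value) }
}

// Render entries as "i j v" lines, shifting to 1-based indices since SystemML
// matrices are 1-indexed (this shift is my assumption about what's required).
val coord = new CoordinateMatrix(entries)
val ijv = coord.entries.map(e => s"${e.i + 1} ${e.j + 1} ${e.value}")

The resulting ijv RDD[String] is what I then passed to the script, along with metadata giving the matrix dimensions.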