[ https://issues.apache.org/jira/browse/SPARK-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957345#comment-14957345 ]
Joseph K. Bradley commented on SPARK-10513:
-------------------------------------------

[~yanboliang] This is really helpful feedback. Thanks very much for taking the time! I'll try to list plans for addressing the various issues you found:

1. Here's the closest issue I could find for spark-csv: [https://github.com/databricks/spark-csv/issues/48] Would you mind commenting there to try to escalate the issue?
2. What would be your ideal way to write this in the DataFrame API? Something like {{train.withColumn(train("label").cast(DoubleType).as("label")).na.drop()}}? (I think that almost works now, but I'm not actually sure if the cast works or fails when it encounters empty Strings.)
3. Just made a JIRA: [SPARK-11108]
4. Do you mean a completely missing value? Or do you mean that StringIndexer should handle an empty String differently?
5. Multi-value support for transformers: [SPARK-8418]
6. Here's some more detailed discussion which I just wrote down: [SPARK-11106]

I haven't yet looked at your example code, but will try to soon. Thanks again for working on this!

> Springleaf Marketing Response
> -----------------------------
>
>                 Key: SPARK-10513
>                 URL: https://issues.apache.org/jira/browse/SPARK-10513
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Yanbo Liang
>            Assignee: Yanbo Liang
>
> Apply ML pipeline API to Springleaf Marketing Response
> (https://www.kaggle.com/c/springleaf-marketing-response)
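
For item 2 in the comment, here is a minimal sketch of the cast-then-drop idea written against the two-argument {{withColumn(colName, col)}} form that the DataFrame API provides (the one-argument form in the comment is the proposed "ideal" spelling, not an existing signature). It assumes a newer Spark with {{SparkSession}}, which did not exist when the comment was written. The object name {{CastLabelSketch}}, the helper {{castLabel}}, and the tiny in-memory {{train}} table are hypothetical stand-ins for the Springleaf data. Whether an empty String casts to null or raises an error may depend on the Spark version and its ANSI setting, so that part is worth checking on your own cluster rather than taking from this sketch.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.DoubleType

// Hypothetical sketch, not code from the JIRA issue.
object CastLabelSketch {

  // Replace the String "label" column with a Double column of the same name,
  // then drop rows whose label is null after the cast.
  def castLabel(train: DataFrame): DataFrame =
    train
      .withColumn("label", train("label").cast(DoubleType))
      .na.drop(Seq("label"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cast-label-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumed toy data: one row has an empty label, as in the question above.
    val train = Seq(("a", "1"), ("b", ""), ("c", "0")).toDF("id", "label")

    val cleaned = castLabel(train)
    cleaned.printSchema()
    cleaned.show()

    spark.stop()
  }
}
```

Restricting {{na.drop}} to the {{label}} column keeps rows that have nulls in other features; calling {{na.drop()}} with no arguments, as in the comment, would drop a row if any column is null.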