[ 
https://issues.apache.org/jira/browse/SPARK-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957345#comment-14957345
 ] 

Joseph K. Bradley commented on SPARK-10513:
-------------------------------------------

[~yanboliang]  This is really helpful feedback.  Thanks very much for taking 
the time!  I'll try to list plans for addressing the various issues you found:

1. Here's the closest issue I could find for spark-csv: 
[https://github.com/databricks/spark-csv/issues/48]  Would you mind commenting 
there to try to escalate the issue?

2. What would be your ideal way to write this in the DataFrame API?  Something 
like 
{{train.withColumn("label", train("label").cast(DoubleType)).na.drop()}}?  
(I think that almost works now, but I'm not actually sure if the cast works or 
fails when it encounters empty Strings.)

3. Just made a JIRA: [SPARK-11108]

4. Do you mean a completely missing value?  Or do you mean that StringIndexer 
should handle an empty String differently?

5. Multi-value support for transformers: [SPARK-8418]

6. Here's some more detailed discussion which I just wrote down: [SPARK-11106]
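
For item 2, here is a minimal sketch of the cast-then-drop idea. It assumes a Spark version that provides SparkSession, and it explicitly turns ANSI mode off, since with non-ANSI cast semantics a String that cannot be parsed as a number (including an empty String) becomes null rather than raising an error. The object name, the "id" column, and the toy rows are hypothetical stand-ins for the real training data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.DoubleType

object CastLabelSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cast-label-sketch")
      .master("local[1]")
      // Assumption: non-ANSI casts, so unparseable Strings become null.
      .config("spark.sql.ansi.enabled", "false")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the training data: labels arrive as Strings,
    // and one of them is empty.
    val train = Seq(("a", "1"), ("b", ""), ("c", "0")).toDF("id", "label")

    val cleaned = train
      // Passing an existing column name to withColumn replaces that column.
      .withColumn("label", train("label").cast(DoubleType))
      // Drop only the rows whose label is null after the cast.
      .na.drop(Seq("label"))

    cleaned.show()
    spark.stop()
  }
}
```

Restricting na.drop to the "label" column keeps rows that have nulls in other features; calling na.drop() with no arguments would instead drop any row containing a null in any column.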

I haven't yet looked at your example code, but will try to soon.  Thanks again 
for working on this!

> Springleaf Marketing Response
> -----------------------------
>
>                 Key: SPARK-10513
>                 URL: https://issues.apache.org/jira/browse/SPARK-10513
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Yanbo Liang
>            Assignee: Yanbo Liang
>
> Apply ML pipeline API to Springleaf Marketing Response 
> (https://www.kaggle.com/c/springleaf-marketing-response)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
