[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564828#comment-15564828 ] Sean Owen commented on SPARK-17219: --- Yeah, unless you return some complex object with normal buckets

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564786#comment-15564786 ] Apache Spark commented on SPARK-17219: -- User 'VinceShieh' has created a pull request for this issue:

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-10 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563247#comment-15563247 ] Joseph K. Bradley commented on SPARK-17219: --- True, there is an order created there, but the

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-09 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559597#comment-15559597 ] Vincent commented on SPARK-17219: - No problem. I will try to submit another PR based on above

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557755#comment-15557755 ] Sean Owen commented on SPARK-17219: --- @Vincent thanks for your perseverance. I know we discussed this at

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-07 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557034#comment-15557034 ] Vincent commented on SPARK-17219: - [~josephkb] [~srowen] [~timhunter] let me know what I can do to help

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-07 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557021#comment-15557021 ] Vincent commented on SPARK-17219: - In this PR (https://github.com/apache/spark/pull/14858) NaN values are

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-07 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556130#comment-15556130 ] Joseph K. Bradley commented on SPARK-17219: --- That does make sense, and I agree we should

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-07 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1968#comment-1968 ] Barry Becker commented on SPARK-17219: -- I'll make another attempt to clarify my use case. Nulls are

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-07 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1925#comment-1925 ] Sean Owen commented on SPARK-17219: --- OK. Let me just work up a patch to fix forward rather than bother

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-07 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1883#comment-1883 ] Joseph K. Bradley commented on SPARK-17219: --- I agree with [~thunterdb]. The 2 ways of handling

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-06 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553503#comment-15553503 ] Sean Owen commented on SPARK-17219: --- I could go this way too. I ended up sympathizing with trying to

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-10-06 Thread Timothy Hunter (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553490#comment-15553490 ] Timothy Hunter commented on SPARK-17219: --- If I understand the PR correctly, I am concerned by this

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-29 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445873#comment-15445873 ] Vincent commented on SPARK-17219: - Cool. I will refine the patch. Thanks [~srowen] :)

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445862#comment-15445862 ] Sean Owen commented on SPARK-17219: --- Agree, and that's a reasonable requirement for any implementation.

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-29 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445858#comment-15445858 ] Vincent commented on SPARK-17219: - yes, discretizer can do it easily, especially if only

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445820#comment-15445820 ] Sean Owen commented on SPARK-17219: --- No, the discretizer can do this easily, right? The discretizer

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-29 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445808#comment-15445808 ] Vincent commented on SPARK-17219: - then we have to shift this work to user, who needs to filter out the

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445779#comment-15445779 ] Sean Owen commented on SPARK-17219: --- No, there's no meaning to a split bounded by NaN. However, it's

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-29 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445768#comment-15445768 ] Vincent commented on SPARK-17219: - [~srowen] Hi all, per discussion, I thought we are going to handle NaN

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-29 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445198#comment-15445198 ] Apache Spark commented on SPARK-17219: -- User 'VinceShieh' has created a pull request for this issue:

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436873#comment-15436873 ] Vincent commented on SPARK-17219: - Okay, thanks. So, meaning we will have no options for users actually.

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436870#comment-15436870 ] Vincent commented on SPARK-17219: - yes, if we wanna make this scenario more general to all bucketizer

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436863#comment-15436863 ] Sean Owen commented on SPARK-17219: --- Yes, agree with that. I think it will involve a change to anything

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436856#comment-15436856 ] Vincent commented on SPARK-17219: - [~srowen] Sorry Owen, by saying 'keep it to one behavior', do you mean

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436819#comment-15436819 ] Barry Becker commented on SPARK-17219: -- In my opinion, yes. It is something that applies to all

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436804#comment-15436804 ] Vincent commented on SPARK-17219: - If so, we have to add this option within Bucketizer, right?

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436776#comment-15436776 ] Sean Owen commented on SPARK-17219: --- Let's keep it to one behavior, an extra bucket. The caller can
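
The "one behavior, an extra bucket" idea in the comment above can be illustrated with a small sketch. This is not Spark's Bucketizer code; the function name `bucketize_with_nan_bucket` and the choice to number the NaN bucket after the regular ones are assumptions made purely for illustration.

```python
import math
from bisect import bisect_right


def bucketize_with_nan_bucket(splits, x):
    """Map x to a bucket index; NaN goes to one extra bucket at the end.

    `splits` is a sorted list of n+1 boundaries defining n regular buckets
    [splits[i], splits[i+1]), with the last bucket closed on the right.
    Hypothetical helper, not an API from any library.
    """
    n_buckets = len(splits) - 1
    if math.isnan(x):
        # NaN cannot be compared against the splits, so it is checked first
        # and sent to a dedicated bucket numbered after the regular ones.
        return n_buckets
    if x < splits[0] or x > splits[-1]:
        raise ValueError(f"{x} is outside the range covered by splits")
    if x == splits[-1]:
        # The upper bound belongs to the last regular bucket.
        return n_buckets - 1
    return bisect_right(splits, x) - 1


splits = [float("-inf"), 0.0, 10.0, float("inf")]
print([bucketize_with_nan_bucket(splits, v) for v in (-5.0, 0.0, 10.0, float("nan"))])
# prints [0, 1, 2, 3]
```

With a single fixed behavior like this, a caller who does not want NaN rows can drop the rows that landed in the last bucket after the transform.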

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436767#comment-15436767 ] Barry Becker commented on SPARK-17219: -- If you support the different strategies as R does, please

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-25 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436553#comment-15436553 ] Vincent commented on SPARK-17219: - I can work on this issue if no one else is on it :)

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436143#comment-15436143 ] Vincent commented on SPARK-17219: - for this scenario, we can add a new parameter for QuantileDiscretizer,

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436136#comment-15436136 ] Vincent commented on SPARK-17219: - for cases where only null and non-null buckets are needed, I guess we

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435651#comment-15435651 ] Barry Becker commented on SPARK-17219: -- If the decision is to have an additional null/NaN bucket,

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435588#comment-15435588 ] Sean Owen commented on SPARK-17219: --- Yes, those seem like the 3 options. Hm, I'm reluctant to introduce

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435484#comment-15435484 ] Barry Becker commented on SPARK-17219: -- Nulls were not accepted in the column. I had to change them

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435469#comment-15435469 ] Sean Owen commented on SPARK-17219: --- These aren't null though, but NaN. There's no meaningful way to

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435403#comment-15435403 ] Barry Becker commented on SPARK-17219: -- There needs to be some way to handle null values when

[jira] [Commented] (SPARK-17219) QuantileDiscretizer does strange things with NaN values

2016-08-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435391#comment-15435391 ] Sean Owen commented on SPARK-17219: --- Ah this is because NaN != NaN. Where it ends up is pretty
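
The "NaN != NaN" point above can be made concrete with a short sketch. It uses plain Python floats and the standard `bisect` module rather than Spark, and the helper names are hypothetical; it only shows why a comparison-based search over split points has no well-defined place for NaN.

```python
from bisect import bisect_left, bisect_right

NAN = float("nan")


def nan_comparisons(x):
    """Results of ==, < and > between NaN and x (all False for any float x)."""
    return (NAN == x, NAN < x, NAN > x)


def nan_insertion_points(splits):
    """Positions that two equally valid binary searches assign to NaN."""
    return bisect_left(splits, NAN), bisect_right(splits, NAN)


splits = [float("-inf"), 0.0, 10.0, float("inf")]

print(nan_comparisons(NAN))          # prints (False, False, False)
print(nan_insertion_points(splits))  # prints (0, 4)
```

Because every comparison involving NaN is false, the left-biased search puts NaN before all splits and the right-biased search puts it after all of them, so where NaN ends up depends on an implementation detail rather than on the data.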