[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564828#comment-15564828
]
Sean Owen commented on SPARK-17219:
---
Yeah, unless you return some complex object with normal buckets
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564786#comment-15564786
]
Apache Spark commented on SPARK-17219:
--
User 'VinceShieh' has created a pull request for this issue:
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563247#comment-15563247
]
Joseph K. Bradley commented on SPARK-17219:
---
True, there is an order created there, but the
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559597#comment-15559597
]
Vincent commented on SPARK-17219:
-
No problem. I will try to submit another PR based on above
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557755#comment-15557755
]
Sean Owen commented on SPARK-17219:
---
@Vincent thanks for your perseverance. I know we discussed this at
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557034#comment-15557034
]
Vincent commented on SPARK-17219:
-
[~josephkb] [~srowen] [~timhunter] let me know what I can do to help
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557021#comment-15557021
]
Vincent commented on SPARK-17219:
-
In this PR (https://github.com/apache/spark/pull/14858) NaN values are
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556130#comment-15556130
]
Joseph K. Bradley commented on SPARK-17219:
---
That does make sense, and I agree we should
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1968#comment-1968
]
Barry Becker commented on SPARK-17219:
--
I'll make another attempt to clarify my use case.
Nulls are
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1925#comment-1925
]
Sean Owen commented on SPARK-17219:
---
OK. Let me just work up a patch to fix forward rather than bother
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1883#comment-1883
]
Joseph K. Bradley commented on SPARK-17219:
---
I agree with [~thunterdb]. The 2 ways of handling
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553503#comment-15553503
]
Sean Owen commented on SPARK-17219:
---
I could go this way too. I ended up sympathizing with trying to
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553490#comment-15553490
]
Timothy Hunter commented on SPARK-17219:
If I understand the PR correctly, I am concerned by this
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445873#comment-15445873
]
Vincent commented on SPARK-17219:
-
Cool. I will refine the patch. thanks [~srowen] :)
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445862#comment-15445862
]
Sean Owen commented on SPARK-17219:
---
Agree, and that's a reasonable requirement for any implementation.
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445858#comment-15445858
]
Vincent commented on SPARK-17219:
-
yes, discretizer can do it easily, especially if only
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445820#comment-15445820
]
Sean Owen commented on SPARK-17219:
---
No, the discretizer can do this easily, right? The discretizer
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445808#comment-15445808
]
Vincent commented on SPARK-17219:
-
then we have to shift this work to user, who needs to filter out the
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445779#comment-15445779
]
Sean Owen commented on SPARK-17219:
---
No, there's no meaning to a split bounded by NaN. However, it's
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445768#comment-15445768
]
Vincent commented on SPARK-17219:
-
[~srowen] Hi all, per discussion, I thought we were going to handle NaN
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445198#comment-15445198
]
Apache Spark commented on SPARK-17219:
--
User 'VinceShieh' has created a pull request for this issue:
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436873#comment-15436873
]
Vincent commented on SPARK-17219:
-
Okay, thanks.
So, meaning we will have no options for users actually.
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436870#comment-15436870
]
Vincent commented on SPARK-17219:
-
yes, if we wanna make this scenario more general to all bucketizer
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436863#comment-15436863
]
Sean Owen commented on SPARK-17219:
---
Yes, agree with that. I think it will involve a change to anything
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436856#comment-15436856
]
Vincent commented on SPARK-17219:
-
[~srowen] sorry Owen, by saying 'keep it to one behavior', do you mean
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436819#comment-15436819
]
Barry Becker commented on SPARK-17219:
--
In my opinion, yes. It is something that applies to all
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436804#comment-15436804
]
Vincent commented on SPARK-17219:
-
if so, we have to add this option within Bucketizer, right?
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436776#comment-15436776
]
Sean Owen commented on SPARK-17219:
---
Let's keep it to one behavior, an extra bucket. The caller can
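The "extra bucket" behavior mentioned above can be sketched in plain Python. This is only an illustration of the idea, not Spark's Bucketizer code: the function name `bucketize`, the choice of placing NaN in the index right after the regular buckets, and the closed right edge on the last bucket are all assumptions made for this sketch.

```python
import math
from bisect import bisect_right


def bucketize(splits, x):
    """Hypothetical helper: map x to a bucket index given sorted split points.

    Regular buckets are [splits[i], splits[i+1]); the last one also
    includes its upper bound. NaN goes to one extra bucket whose index
    is the number of regular buckets (an assumption of this sketch).
    """
    n_buckets = len(splits) - 1
    if math.isnan(x):
        # NaN cannot be ordered against the splits, so give it its own bucket.
        return n_buckets
    if x < splits[0] or x > splits[-1]:
        raise ValueError(f"{x} is outside the split range")
    # bisect_right finds the first split strictly greater than x;
    # min() folds the upper bound into the last regular bucket.
    return min(bisect_right(splits, x) - 1, n_buckets - 1)


splits = [float("-inf"), 0.0, 10.0, float("inf")]
indices = [bucketize(splits, v) for v in (-3.0, 5.0, 10.0, float("nan"))]
```

With a single fixed behavior like this, a caller who does not want the extra bucket can filter on that one index afterwards.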
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436767#comment-15436767
]
Barry Becker commented on SPARK-17219:
--
If you support the different strategies as R does, please
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436553#comment-15436553
]
Vincent commented on SPARK-17219:
-
I can work on this issue if no one else is on it :)
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436143#comment-15436143
]
Vincent commented on SPARK-17219:
-
for this scenario, we can add a new parameter for QuantileDiscretizer,
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436136#comment-15436136
]
Vincent commented on SPARK-17219:
-
for cases where only null and non-null buckets are needed, I guess we
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435651#comment-15435651
]
Barry Becker commented on SPARK-17219:
--
If the decision is to have an additional null/NaN bucket,
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435588#comment-15435588
]
Sean Owen commented on SPARK-17219:
---
Yes, those seem like the 3 options. Hm, I'm reluctant to introduce
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435484#comment-15435484
]
Barry Becker commented on SPARK-17219:
--
Nulls were not accepted in the column. I had to change them
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435469#comment-15435469
]
Sean Owen commented on SPARK-17219:
---
These aren't null though, but NaN. There's no meaningful way to
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435403#comment-15435403
]
Barry Becker commented on SPARK-17219:
--
There needs to be some way to handle null values when
[
https://issues.apache.org/jira/browse/SPARK-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435391#comment-15435391
]
Sean Owen commented on SPARK-17219:
---
Ah this is because NaN != NaN. Where it ends up is pretty
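The NaN != NaN point can be reproduced outside Spark. The following is a minimal Python sketch, not the Spark code path: `naive_bucket` is a hypothetical helper that looks up a bucket with an ordinary binary search and no NaN check, and the split values are made up for illustration.

```python
from bisect import bisect_right

nan = float("nan")

# IEEE 754 treats NaN as unordered: every one of these comparisons is False.
comparisons = [nan == nan, nan < 1.0, nan > 1.0]


def naive_bucket(splits, x):
    """Hypothetical helper: bucket lookup by binary search, with no NaN check."""
    return bisect_right(splits, x) - 1


splits = [float("-inf"), 0.0, 10.0, float("inf")]  # three regular buckets: 0, 1, 2

# The search only asks "is x < split?", which is always False for NaN,
# so where NaN ends up is an accident of the search, not a meaningful bucket.
nan_index = naive_bucket(splits, nan)
```

Because the result depends purely on how the search happens to walk the splits, any code that sorts or searches values needs an explicit NaN check before it compares.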
38 matches