[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-12-01 Thread Maciej Bryński (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034641#comment-15034641 ] Maciej Bryński commented on SPARK-12030: Will the fix be included in 1.6.0?

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-12-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034615#comment-15034615 ] Davies Liu commented on SPARK-12030: I also figured out the root cause last night, that's an

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-12-01 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034652#comment-15034652 ] Yin Huai commented on SPARK-12030: -- Yes, it will be in 1.6.0.

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-12-01 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034936#comment-15034936 ] Yin Huai commented on SPARK-12030: -- I also merged the patch to branch 1.5. Please note that, in 1.5, we

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-12-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035164#comment-15035164 ] Xiao Li commented on SPARK-12030: - I did verify the fix using my test cases. It works! I posted a

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-12-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034141#comment-15034141 ] Apache Spark commented on SPARK-12030: -- User 'nongli' has created a pull request for this issue:

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-12-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033356#comment-15033356 ] Xiao Li commented on SPARK-12030: - [~nongli] Thank you very much! Your finding sounds reasonable. I

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-12-01 Thread Nong Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033321#comment-15033321 ] Nong Li commented on SPARK-12030: - I think I tracked it down. The bug is from this PR which exposed

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-30 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032887#comment-15032887 ] Xiao Li commented on SPARK-12030: - [SPARK-7542][SQL] Support off-heap index/sort buffer

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-30 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032939#comment-15032939 ] Xiao Li commented on SPARK-12030: - Let me post a simple case that can trigger the data corruption. The

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-30 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032724#comment-15032724 ] Xiao Li commented on SPARK-12030: - I believe I already found which PRs introduced the regression.

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-30 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032857#comment-15032857 ] Davies Liu commented on SPARK-12030: [~smilegator] Could you post the related PRs here? So we can

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-30 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032902#comment-15032902 ] Xiao Li commented on SPARK-12030: - I already excluded Exchange and Partitioning. It should be caused by

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-30 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032927#comment-15032927 ] Yin Huai commented on SPARK-12030: -- [~smilegator] Can you post the case that triggers the problem? Also,

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-29 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031206#comment-15031206 ] Xiao Li commented on SPARK-12030: - I can reproduce a similar issue in a Sort. I think the impact could

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-29 Thread Maciej Bryński (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030883#comment-15030883 ] Maciej Bryński commented on SPARK-12030: [~smilegator] Problem is not only with distinct but with

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-29 Thread Maciej Bryński (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030887#comment-15030887 ] Maciej Bryński commented on SPARK-12030: [~smilegator] I tested 1.5.2 (binaries from spark page)

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-29 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031060#comment-15031060 ] Xiao Li commented on SPARK-12030: - [~maver1ck] Yeah, the problem was introduced in 1.6.0. So far, I think

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-28 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030430#comment-15030430 ] Xiao Li commented on SPARK-12030: - What is the data type of id1?

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-28 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030618#comment-15030618 ] Xiao Li commented on SPARK-12030: - If you cache `joined`, can you see the same issue?

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-28 Thread Maciej Bryński (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030612#comment-15030612 ] Maciej Bryński commented on SPARK-12030: id1, id2 and fk1 are integers.

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-28 Thread Xiu(Joe) Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030615#comment-15030615 ] Xiu(Joe) Guo commented on SPARK-12030: -- I tried your scenario with some TPCDS table last night,

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-28 Thread Maciej Bryński (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030632#comment-15030632 ] Maciej Bryński commented on SPARK-12030: When I cache joined the result of distinct(id) is always

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-28 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030652#comment-15030652 ] Xiao Li commented on SPARK-12030: - Thank you! [~maver1ck] That will be great if we can know if this is

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-28 Thread Maciej Bryński (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030633#comment-15030633 ] Maciej Bryński commented on SPARK-12030: And spark-defaults.conf: {code} spark.master

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-28 Thread Maciej Bryński (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030643#comment-15030643 ] Maciej Bryński commented on SPARK-12030: I tried following things: - disable kryoserializer -

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-28 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030638#comment-15030638 ] Xiao Li commented on SPARK-12030: - Trying to reproduce it using your parquet files. Thanks!

[jira] [Commented] (SPARK-12030) Incorrect results when aggregate joined data

2015-11-28 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030694#comment-15030694 ] Xiao Li commented on SPARK-12030: - I can reproduce it now. Will take a look at it and try to fix it.