[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2019-04-11 Thread Victor Tso (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815763#comment-16815763 ] Victor Tso commented on SPARK-20144: This one was clearly decided against. I ended up writing my own

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2019-04-11 Thread David Greenberg (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815755#comment-16815755 ] David Greenberg commented on SPARK-20144: Hello, this issue is also a major one for me. Almost

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-11-23 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697410#comment-16697410 ] Dongjoon Hyun commented on SPARK-20144: Sorry, [~darabos]. IMHO, the proposed way is not

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-11-23 Thread Daniel Darabos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697319#comment-16697319 ] Daniel Darabos commented on SPARK-20144: So where do we go from here? Should I try to find a

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-10-15 Thread Daniel Darabos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650817#comment-16650817 ] Daniel Darabos commented on SPARK-20144: Thanks, those are good questions. # The global option

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-10-15 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650779#comment-16650779 ] Dongjoon Hyun commented on SPARK-20144: [~silvermast] and [~darabos]. 1. The proposed

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-10-15 Thread Daniel Darabos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650722#comment-16650722 ] Daniel Darabos commented on SPARK-20144: Yeah, I'm not too happy about the alphabetical ordering

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-10-15 Thread Victor Tso (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650704#comment-16650704 ] Victor Tso commented on SPARK-20144: It should, because by convention the parquet files are 0-padded 

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-10-15 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650642#comment-16650642 ] Dongjoon Hyun commented on SPARK-20144: For me, I don't think that PR resolves this issue,

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-10-15 Thread Daniel Darabos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650492#comment-16650492 ] Daniel Darabos commented on SPARK-20144: Thanks Victor! I've expanded the test with a case where

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-10-15 Thread Victor Tso (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650303#comment-16650303 ] Victor Tso commented on SPARK-20144: I looked at the PR and liked what I saw. I would only suggest

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-10-08 Thread Daniel Darabos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642401#comment-16642401 ] Daniel Darabos commented on SPARK-20144: Sorry, I had an idea for a quick fix for this and sent

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642038#comment-16642038 ] Apache Spark commented on SPARK-20144: User 'darabos' has created a pull request for this issue:

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-10-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642036#comment-16642036 ] Apache Spark commented on SPARK-20144: User 'darabos' has created a pull request for this issue:

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-05-30 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494852#comment-16494852 ] sam commented on SPARK-20144: Regarding the original issue of sorting, I agree with [~srowen] in that it

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2018-05-30 Thread Unai Sarasola (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494810#comment-16494810 ] Unai Sarasola commented on SPARK-20144: But if you want to have an exact copy of your data in

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-10-13 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203882#comment-16203882 ] sam commented on SPARK-20144: I think this is a regression. We used to be able to easily control the number

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-05-04 Thread Bill (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996517#comment-15996517 ] Bill commented on SPARK-20144: Increasing {{spark.sql.files.openCostInBytes}} prevents the individual

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-04-07 Thread Andrew Ash (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961170#comment-15961170 ] Andrew Ash commented on SPARK-20144: This is a regression from 1.6 to the 2.x line. [~marmbrus]

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-04-04 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956030#comment-15956030 ] Li Jin commented on SPARK-20144: > When you save the sorted data into Parquet, only the data in

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-04-03 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954559#comment-15954559 ] Liang-Chi Hsieh commented on SPARK-20144: I don't think the API has the guarantee about the data

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-03-31 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951202#comment-15951202 ] Li Jin commented on SPARK-20144: Thanks Sean! I appreciate your time and help very much. >

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-03-31 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951095#comment-15951095 ] Sean Owen commented on SPARK-20144: Probably best to wait for an informed opinion but I would assume

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-03-31 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951084#comment-15951084 ] Li Jin commented on SPARK-20144: Also, I am not sure about "If the data were sorted, sorting would be

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-03-31 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951073#comment-15951073 ] Li Jin commented on SPARK-20144: I totally agree that correctness takes precedence. If sorting is the only

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-03-31 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950988#comment-15950988 ] Sean Owen commented on SPARK-20144: If the data were sorted, sorting would be pretty cheap, in

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-03-31 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950979#comment-15950979 ] Li Jin commented on SPARK-20144: Thanks for getting back to me. Sorting in this case will just add extra

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-03-31 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950627#comment-15950627 ] Sean Owen commented on SPARK-20144: If you need a particular ordering, I think you need to sort. I am

[jira] [Commented] (SPARK-20144) spark.read.parquet no long maintains ordering of the data

2017-03-30 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950221#comment-15950221 ] Li Jin commented on SPARK-20144: Ping, anyone? This is a pretty big blocker for us. > spark.read.parquet