[ https://issues.apache.org/jira/browse/SPARK-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-16207:
------------------------------
    Priority: Minor  (was: Major)

Generally, things like RDD and DataFrame don't guarantee any order at all, unless they are the product of an ordering operation like sort. I don't think blogs/SO are relevant as much as Spark docs, and they do cover this in places. If you have a specific suggestion, make a PR, but if this is a question, then it should be closed.

> order guarantees for DataFrames
> -------------------------------
>
>                 Key: SPARK-16207
>                 URL: https://issues.apache.org/jira/browse/SPARK-16207
>             Project: Spark
>          Issue Type: Documentation
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: Max Moroz
>            Priority: Minor
>
> There's no clear explanation in the documentation about what guarantees are
> available for the preservation of order in DataFrames. Different blogs, SO
> answers, and posts on course websites suggest different things. It would be
> good to provide clarity on this.
> Examples of questions on which I could not find clarification:
> 1) Does groupby() preserve order?
> 2) Does take() preserve order?
> 3) Is DataFrame guaranteed to have the same order of lines as the text file
> it was read from? (Or as the json file, etc.)
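
To illustrate the point in the comment (no order is guaranteed unless the result is the product of an ordering operation like sort), here is a minimal sketch in plain Python. It is a toy model, not Spark code: the helper names scatter and gather are hypothetical, and the "partitions" are ordinary lists. It only shows why a result assembled from partitions that arrive in an arbitrary order cannot be relied on, while sorting on an explicit position key always restores the input order.

```python
from itertools import permutations


def scatter(rows, num_partitions):
    """Split rows into contiguous chunks, tagging each row with its original position."""
    indexed = list(enumerate(rows))
    size = -(-len(indexed) // num_partitions)  # ceiling division
    return [indexed[i:i + size] for i in range(0, len(indexed), size)]


def gather(partitions):
    """Concatenate partitions in whatever order they are handed over."""
    return [row for part in partitions for row in part]


lines = ["a", "b", "c", "d", "e", "f"]
parts = scatter(lines, 3)

# Toy assumption: partitions may be gathered in any order.
arrival_orders = list(permutations(parts))

# Without a sort, each arrival order produces a different sequence of values.
unordered = {tuple(value for _, value in gather(order)) for order in arrival_orders}

# With an explicit sort on the position key, every arrival order gives the same result.
ordered = {tuple(value for _, value in sorted(gather(order))) for order in arrival_orders}

print(len(unordered), "distinct results without a sort")
print(len(ordered), "distinct result with a sort")
```

In DataFrame terms, the analogous approach would be to carry an explicit ordering column in the data and sort on it before relying on row order, rather than assuming that the order of the input file survives.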