[ https://issues.apache.org/jira/browse/SPARK-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900220#comment-15900220 ]

Chris Rogers commented on SPARK-16207:
--------------------------------------

[~srowen] since there is no documentation yet, I don't know whether a clear, coherent generalization can be made. I would be happy with "most of the methods DO NOT preserve order, with these specific exceptions", or "most of the methods DO preserve order, with these specific exceptions". Failing a generalization, I'd also be happy with method-by-method documentation of ordering semantics, which seems like a very minimal amount of copy-pasting ("Preserves ordering: yes", "Preserves ordering: no"). Maybe that's a good place to start, since there seems to be some confusion about what the generalization would be.

> order guarantees for DataFrames
> -------------------------------
>
> Key: SPARK-16207
> URL: https://issues.apache.org/jira/browse/SPARK-16207
> Project: Spark
> Issue Type: Documentation
> Components: Spark Core
> Affects Versions: 1.6.1
> Reporter: Max Moroz
> Priority: Minor
>
> There's no clear explanation in the documentation about what guarantees are
> available for the preservation of order in DataFrames. Different blogs, SO
> answers, and posts on course websites suggest different things. It would be
> good to provide clarity on this.
> Examples of questions on which I could not find clarification:
> 1) Does groupby() preserve order?
> 2) Does take() preserve order?
> 3) Is DataFrame guaranteed to have the same order of lines as the text file
> it was read from? (Or as the json file, etc.)