[ https://issues.apache.org/jira/browse/SPARK-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900220#comment-15900220 ]

Chris Rogers edited comment on SPARK-16207 at 3/7/17 9:52 PM:
--------------------------------------------------------------

[~srowen] since there is no documentation yet, I don't know whether a clear, 
coherent generalization can be made.  I would be happy with "most of the 
methods DO NOT preserve order, with these specific exceptions", or "most of the 
methods DO preserve order, with these specific exceptions".

Failing a generalization, I'd also be happy with method-by-method documentation 
of ordering semantics, which would take only a minimal amount of copy-pasting 
("Preserves ordering: yes", "Preserves ordering: no"). Maybe that's a good 
place to start, since there seems to be some confusion about what the 
generalization would be.

I'm new to Scala, so I'm not sure whether this is practical, but maybe the 
order-preserving methods could be moved to an `RDDPreservesOrdering` class with 
an implicit conversion, akin to `PairRDDFunctions`?
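A minimal sketch of what that could look like follows. To keep it runnable without Spark, it uses a hypothetical in-memory `ToyRDD` in place of Spark's `RDD`. `ToyRDD`, `mapOrdered`, `takeInOrder` and `toOrderingOps` are illustrative names only, not Spark API. The implicit conversion sits in the companion object, the same place Spark keeps `rddToPairRDDFunctions` (in `object RDD`), so callers get the extra methods without an import:

```scala
import scala.language.implicitConversions

object OrderingSketch {
  // Hypothetical stand-in for an RDD: elements split into partitions,
  // keeping both the partition order and the order inside each partition.
  final case class ToyRDD[T](partitions: Seq[Seq[T]]) {
    // Methods without a documented ordering guarantee stay on the class itself.
    def count: Long = partitions.map(_.size.toLong).sum
  }

  // Methods whose ordering behaviour is documented are grouped here,
  // similar in spirit to how PairRDDFunctions groups key-value methods.
  final class RDDPreservesOrdering[T](self: ToyRDD[T]) {
    /** Preserves ordering: yes. Applies `f` element by element within each partition. */
    def mapOrdered[U](f: T => U): ToyRDD[U] =
      ToyRDD(self.partitions.map(_.map(f)))

    /** Preserves ordering: yes. Returns the first `n` elements, scanning partitions in order. */
    def takeInOrder(n: Int): Seq[T] =
      self.partitions.iterator.flatMap(_.iterator).take(n).toList
  }

  object ToyRDD {
    // Living in the companion object puts the conversion in implicit scope,
    // so `rdd.takeInOrder(n)` compiles without an explicit import.
    implicit def toOrderingOps[T](rdd: ToyRDD[T]): RDDPreservesOrdering[T] =
      new RDDPreservesOrdering(rdd)
  }
}
```

With this sketch, `ToyRDD(Seq(Seq(1, 2), Seq(3, 4), Seq(5))).takeInOrder(3)` returns `List(1, 2, 3)`. Grouping the methods this way also makes the "Preserves ordering: yes" guarantee visible from the type a method is defined on, not only from its doc comment.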



> order guarantees for DataFrames
> -------------------------------
>
>                 Key: SPARK-16207
>                 URL: https://issues.apache.org/jira/browse/SPARK-16207
>             Project: Spark
>          Issue Type: Documentation
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: Max Moroz
>            Priority: Minor
>
> There's no clear explanation in the documentation of what guarantees are 
> available for the preservation of order in DataFrames. Different blogs, Stack 
> Overflow answers, and posts on course websites suggest different things. It 
> would be good to provide clarity on this.
> Examples of questions on which I could not find clarification:
> 1) Does groupby() preserve order?
> 2) Does take() preserve order?
> 3) Is DataFrame guaranteed to have the same order of lines as the text file 
> it was read from? (Or as the json file, etc.)



