[ 
https://issues.apache.org/jira/browse/SPARK-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-13299.
-------------------------------
    Resolution: Not A Problem

Unless your DataFrame has a defined ordering, you shouldn't expect the first N
rows to be a particular set. First with respect to what? Without an explicit
sort, you get some default ordering that is a function of however the data was
partitioned and read.
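
To make the point concrete, here is a minimal sketch in plain Java. It is a toy model, not Spark's implementation: the class `LimitOrderSketch` and its methods `limit` and `sortedLimit` are hypothetical names, and each partition is modelled as a list of `mobileNumber`-like keys. It shows that taking the first N rows depends on the order in which partitions happen to arrive, while sorting on a key before taking N does not. In Spark itself the analogous change would be to sort before limiting, for example `df.orderBy("mobileNumber").limit(999)`, assuming `mobileNumber` is the order the reporter actually wants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class LimitOrderSketch {

    // Toy stand-in for limit(n) without a sort: concatenate the partitions
    // in whatever order they arrive and keep the first n rows.
    public static List<Long> limit(List<List<Long>> partitions, int n) {
        return partitions.stream()
                .flatMap(List::stream)
                .limit(n)
                .collect(Collectors.toList());
    }

    // Toy stand-in for sort-then-limit: order all rows by key first,
    // so the result no longer depends on partition arrival order.
    public static List<Long> sortedLimit(List<List<Long>> partitions, int n) {
        return partitions.stream()
                .flatMap(List::stream)
                .sorted()
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The same six rows, split into two partitions.
        List<List<Long>> readA = Arrays.asList(
                Arrays.asList(1L, 2L, 3L),
                Arrays.asList(4L, 5L, 6L));

        // A second read of the same data where the partitions arrive reversed.
        List<List<Long>> readB = new ArrayList<>(readA);
        Collections.reverse(readB);

        System.out.println(limit(readA, 4)); // prints [1, 2, 3, 4]
        System.out.println(limit(readB, 4)); // prints [4, 5, 6, 1]

        // With an explicit ordering, both reads agree.
        System.out.println(sortedLimit(readA, 4).equals(sortedLimit(readB, 4))); // prints true
    }
}
```

The sorted variant costs a full ordering of the data, which is the price of asking for "the first N" in a well-defined sense.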

> DataFrame limit operation is not consistent
> -------------------------------------------
>
>                 Key: SPARK-13299
>                 URL: https://issues.apache.org/jira/browse/SPARK-13299
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.3.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
>            Reporter: Nazarii Balkovskyi
>              Labels: SparkSQL, dataframe
>         Attachments: SparkLimitIssue.png
>
>
> I ran into a problem using the limit method from the DataFrame API.
> I am trying to get the first 999 records from an Avro source that contains
> about 3.5K records.
> {code:java}
> DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");
> df = df.limit(999);
> {code}
> After the save operation, the rows are not in the same order as in the input
> data set. Sometimes the order is preserved, but usually it is not.
> {code:java}
> df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
> {code}
> Here is the Spark plan (it may help to identify the cause of the issue):
> {code}
> == Parsed Logical Plan ==
> Limit 999
>  Relation[mobileNumber#0L,tariff#1,debit#2] AvroRelation(hdfs://<server_name>:8020/user/hdfs/dataset.avro,None,0)
> == Analyzed Logical Plan ==
> mobileNumber: bigint, tariff: string, debit: float
> Limit 999
>  Relation[mobileNumber#0L,tariff#1,debit#2] AvroRelation(hdfs://<server_name>:8020/user/hdfs/dataset.avro,None,0)
> == Optimized Logical Plan ==
> Limit 999
>  Relation[mobileNumber#0L,tariff#1,debit#2] AvroRelation(hdfs://<server_name>:8020/user/hdfs/dataset.avro,None,0)
> == Physical Plan ==
> Limit 999
>  Scan AvroRelation(hdfs://<server_name>:8020/user/hdfs/dataset.avro,None,0)[mobileNumber#0L,tariff#1,debit#2]
> Code Generation: true
> {code}
