[jira] [Updated] (SPARK-13299) DataFrame limit operation is not consistent

2016-02-12 Thread Nazarii Balkovskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nazarii Balkovskyi updated SPARK-13299:
---
Description: 
I ran into a problem with the limit method of the DataFrame API. 
I am trying to take the first 999 records from an Avro source that contains 
about 3.5K records. 

{code:java}
DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");

df = df.limit(999);
{code}

After the save operation, the rows are not in the same order as in the 
input data set. Sometimes the order is correct, but usually it is not. 

{code:java}
df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
{code}

Here is the Spark plan (it may help in finding the cause of the 
issue):
{code}
== Parsed Logical Plan ==
Limit 999
 Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Analyzed Logical Plan ==
mobileNumber: bigint, tariff: string, debit: float
Limit 999
 Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Optimized Logical Plan ==
Limit 999
 Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Physical Plan ==
Limit 999
 Scan 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)[mobileNumber#0L,tariff#1,debit#2]

Code Generation: true
{code}
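As background on why the order can change (an illustrative JDK-only analogy, not Spark code; the class name and values below are made up for the example): a limit on an ordered source preserves encounter order, while a limit on an unordered parallel source only guarantees how many elements come back, not which ones. java.util.stream shows the same contrast:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class LimitOrderDemo {

    // Ordered sequential stream: limit(n) keeps the first n elements
    // in encounter order, like a single ordered scan of the source.
    static List<Integer> orderedTake(int n) {
        return IntStream.range(0, 1000).boxed()
                .limit(n)
                .collect(Collectors.toList());
    }

    // Parallel unordered stream: limit(n) still returns exactly n
    // elements, but which n (and in what order) is unspecified --
    // analogous to a Limit pulling rows from whichever partitions
    // happen to respond first.
    static List<Integer> unorderedTake(int n) {
        return IntStream.range(0, 1000).boxed()
                .parallel().unordered()
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(orderedTake(5));          // [0, 1, 2, 3, 4]
        System.out.println(unorderedTake(5).size()); // 5
    }
}
```

The sequential call always yields the first five values in order; the parallel unordered call only guarantees the count, which mirrors the behavior described above.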

  was:
I ran into a problem with the limit method of the DataFrame API. 
I am trying to take the first 999 records from an Avro source that contains 
about 3.5K records. 

{code:java}
DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");

df = df.limit(999);
{code}

After the save operation, the rows are not in the same order as in the 
input data set. Sometimes the order is correct, but usually it is not. 

{code:java}
df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
{code}

Here is the Spark plan (it may help in finding the cause of the 
issue):
{code}
== Parsed Logical Plan ==
Limit 999
 Filter (1 = 1)
  Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Analyzed Logical Plan ==
mobileNumber: bigint, tariff: string, debit: float
Limit 999
 Filter (1 = 1)
  Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Optimized Logical Plan ==
Limit 999
 Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Physical Plan ==
Limit 999
 Scan 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)[mobileNumber#0L,tariff#1,debit#2]

Code Generation: true
{code}


> DataFrame limit operation is not consistent
> ---
>
> Key: SPARK-13299
> URL: https://issues.apache.org/jira/browse/SPARK-13299
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
>Reporter: Nazarii Balkovskyi
>  Labels: SparkSQL, dataframe
> Attachments: SparkLimitIssue.png
>
>
> I ran into a problem with the limit method of the DataFrame API. 
> I am trying to take the first 999 records from an Avro source that contains about 3.5K records. 
> {code:java}
> DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");
> df = df.limit(999);
> {code}
> After the save operation, the rows are not in the same order as in the input data set. Sometimes the order is correct, but usually it is not. 
> {code:java}
> df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
> {code}
> Here is the Spark plan (it may help in finding the cause of the issue):
> {code}
> == Parsed Logical Plan ==
> Limit 999
>  Relation[mobileNumber#0L,tariff#1,debit#2] 
> AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)
> == Analyzed Logical Plan ==
> mobileNumber: bigint, tariff: string, debit: float
> Limit 999
>  Relation[mobileNumber#0L,tariff#1,debit#2] 
> AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)
> == Optimized Logical Plan ==
> Limit 999
>  Relation[mobileNumber#0L,tariff#1,debit#2] 
> AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)
> == Physical Plan ==
> Limit 999
>  Scan 
> AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)[mobileNumber#0L,tariff#1,debit#2]
> Code Generation: true
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13299) DataFrame limit operation is not consistent

2016-02-12 Thread Nazarii Balkovskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nazarii Balkovskyi updated SPARK-13299:
---
Description: 
I ran into a problem with the limit method of the DataFrame API. 
I am trying to take the first 999 records from an Avro source that contains 
about 3.5K records. 

{code:java}
DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");

df = df.limit(999);
{code}

After the save operation, the rows are not in the same order as in the 
input data set. Sometimes the order is correct, but usually it is not. 

{code:java}
df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
{code}

Here is the Spark plan (it may help in finding the cause of the 
issue):
{code}
== Parsed Logical Plan ==
Limit 999
 Filter (1 = 1)
  Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Analyzed Logical Plan ==
mobileNumber: bigint, tariff: string, debit: float
Limit 999
 Filter (1 = 1)
  Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Optimized Logical Plan ==
Limit 999
 Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Physical Plan ==
Limit 999
 Scan 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)[mobileNumber#0L,tariff#1,debit#2]

Code Generation: true
{code}

  was:
I ran into a problem with the limit method of the DataFrame API. 
I am trying to take the first 999 records from an Avro source that contains 
about 3.5K records. 

{code:java}
DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");

df = df.limit(999);
{code}

After the save operation, the rows are not in the same order as in the 
input data set. Sometimes the order is correct, but usually it is not. 

{code:java}
df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
{code}

Here is the Spark plan (it may help in finding the cause of the 
issue):
{code}
== Parsed Logical Plan ==
Limit 999
 Filter (1 = 1)
  Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Analyzed Logical Plan ==
mobileNumber: bigint, tariff: string, debit: float
Limit 999
 Filter (1 = 1)
  Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Optimized Logical Plan ==
Limit 999
 Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Physical Plan ==
Limit 999
 Scan 
AvroRelation(hdfs://:8020/user/hdfs/clientsENG10M.avro,None,0)[mobileNumber#0L,tariff#1,debit#2]

Code Generation: true
{code}


> DataFrame limit operation is not consistent
> ---
>
> Key: SPARK-13299
> URL: https://issues.apache.org/jira/browse/SPARK-13299
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
>Reporter: Nazarii Balkovskyi
>  Labels: SparkSQL, dataframe
> Attachments: SparkLimitIssue.png
>
>
> I ran into a problem with the limit method of the DataFrame API. 
> I am trying to take the first 999 records from an Avro source that contains about 3.5K records. 
> {code:java}
> DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");
> df = df.limit(999);
> {code}
> After the save operation, the rows are not in the same order as in the input data set. Sometimes the order is correct, but usually it is not. 
> {code:java}
> df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
> {code}
> Here is the Spark plan (it may help in finding the cause of the issue):
> {code}
> == Parsed Logical Plan ==
> Limit 999
>  Filter (1 = 1)
>   Relation[mobileNumber#0L,tariff#1,debit#2] 
> AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)
> == Analyzed Logical Plan ==
> mobileNumber: bigint, tariff: string, debit: float
> Limit 999
>  Filter (1 = 1)
>   Relation[mobileNumber#0L,tariff#1,debit#2] 
> AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)
> == Optimized Logical Plan ==
> Limit 999
>  Relation[mobileNumber#0L,tariff#1,debit#2] 
> AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)
> == Physical Plan ==
> Limit 999
>  Scan 
> AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)[mobileNumber#0L,tariff#1,debit#2]
> Code Generation: true
> {code}






[jira] [Updated] (SPARK-13299) DataFrame limit operation is not consistent

2016-02-12 Thread Nazarii Balkovskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nazarii Balkovskyi updated SPARK-13299:
---
Description: 
I ran into a problem with the limit method of the DataFrame API. 
I am trying to take the first 999 records from an Avro source that contains 
about 3.5K records. 

{code:java}
DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");

df = df.limit(999);
{code}

After the save operation, the rows are not in the same order as in the 
input data set. Sometimes the order is correct, but usually it is not. 

{code:java}
df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
{code}

Here is the Spark plan (it may help in finding the cause of the 
issue):
{code}
== Parsed Logical Plan ==
Limit 999
 Filter (1 = 1)
  Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Analyzed Logical Plan ==
mobileNumber: bigint, tariff: string, debit: float
Limit 999
 Filter (1 = 1)
  Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Optimized Logical Plan ==
Limit 999
 Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Physical Plan ==
Limit 999
 Scan 
AvroRelation(hdfs://:8020/user/hdfs/clientsENG10M.avro,None,0)[mobileNumber#0L,tariff#1,debit#2]

Code Generation: true
{code}

  was:
I ran into a problem with the limit method of the DataFrame API. 
I am trying to take the first 999 records from an Avro source that contains 
about 3.5K records. 

{code:java}
DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");

df = df.limit(999);
{code}

After the save operation, the rows are not in the same order as in the 
input data set. Sometimes the order is correct, but usually it is not. 

{code:java}
df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
{code}

Here is the Spark plan (it may help in finding the cause of the 
issue):
{code}
== Parsed Logical Plan ==
Limit 999
 Filter (1 = 1)
  Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Analyzed Logical Plan ==
mobileNumber: bigint, tariff: string, debit: float
Limit 999
 Filter (1 = 1)
  Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Optimized Logical Plan ==
Limit 999
 Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Physical Plan ==
Limit 999
 Scan 
AvroRelation(hdfs://lssparkmaster.edvantis.com:8020/user/hdfs/clientsENG10M.avro,None,0)[mobileNumber#0L,tariff#1,debit#2]

Code Generation: true
{code}


> DataFrame limit operation is not consistent
> ---
>
> Key: SPARK-13299
> URL: https://issues.apache.org/jira/browse/SPARK-13299
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
>Reporter: Nazarii Balkovskyi
>  Labels: SparkSQL, dataframe
> Attachments: SparkLimitIssue.png
>
>
> I ran into a problem with the limit method of the DataFrame API. 
> I am trying to take the first 999 records from an Avro source that contains about 3.5K records. 
> {code:java}
> DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");
> df = df.limit(999);
> {code}
> After the save operation, the rows are not in the same order as in the input data set. Sometimes the order is correct, but usually it is not. 
> {code:java}
> df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
> {code}
> Here is the Spark plan (it may help in finding the cause of the issue):
> {code}
> == Parsed Logical Plan ==
> Limit 999
>  Filter (1 = 1)
>   Relation[mobileNumber#0L,tariff#1,debit#2] 
> AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)
> == Analyzed Logical Plan ==
> mobileNumber: bigint, tariff: string, debit: float
> Limit 999
>  Filter (1 = 1)
>   Relation[mobileNumber#0L,tariff#1,debit#2] 
> AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)
> == Optimized Logical Plan ==
> Limit 999
>  Relation[mobileNumber#0L,tariff#1,debit#2] 
> AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)
> == Physical Plan ==
> Limit 999
>  Scan 
> AvroRelation(hdfs://:8020/user/hdfs/clientsENG10M.avro,None,0)[mobileNumber#0L,tariff#1,debit#2]
> Code Generation: true
> {code}






[jira] [Updated] (SPARK-13299) DataFrame limit operation is not consistent

2016-02-12 Thread Nazarii Balkovskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nazarii Balkovskyi updated SPARK-13299:
---
Description: 
I ran into a problem with the limit method of the DataFrame API. 
I am trying to take the first 999 records from an Avro source that contains 
about 3.5K records. 

{code:java}
DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");

df = df.limit(999);
{code}

After the save operation, the rows are not in the same order as in the 
input data set. Sometimes the order is correct, but usually it is not. 

{code:java}
df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
{code}

Here is the Spark plan (it may help in finding the cause of the 
issue):
{code}
== Parsed Logical Plan ==
Limit 999
 Filter (1 = 1)
  Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Analyzed Logical Plan ==
mobileNumber: bigint, tariff: string, debit: float
Limit 999
 Filter (1 = 1)
  Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Optimized Logical Plan ==
Limit 999
 Relation[mobileNumber#0L,tariff#1,debit#2] 
AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)

== Physical Plan ==
Limit 999
 Scan 
AvroRelation(hdfs://lssparkmaster.edvantis.com:8020/user/hdfs/clientsENG10M.avro,None,0)[mobileNumber#0L,tariff#1,debit#2]

Code Generation: true
{code}

  was:
I ran into a problem with the limit method of the DataFrame API. 
I am trying to take the first 999 records from an Avro source that contains 
about 3.5K records. 

{code:java}
DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");

df = df.limit(999);
{code}

After the save operation, the rows are not in the same order as in the 
input data set. Sometimes the order is correct, but usually it is not. 

{code:java}
df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
{code}

Here is the Spark plan (it may help in finding the cause of the 
issue):

{code}
== Parsed Logical Plan ==
Limit 999
 Relation[color#0,id#1,type#2,rand#3,junk#4] 
AvroRelation(hdfs://:8020/tmp/hdfs.2016-02-12--10-18-55-171-488/hdfs.2016-02-12--10-19-05-109-895.avro,None,0)

== Analyzed Logical Plan ==
color: string, id: int, type: string, rand: int, junk: string
Limit 999
 Relation[color#0,id#1,type#2,rand#3,junk#4] 
AvroRelation(hdfs://:8020/tmp/hdfs.2016-02-12--10-18-55-171-488/hdfs.2016-02-12--10-19-05-109-895.avro,None,0)

== Optimized Logical Plan ==
InMemoryRelation [color#0,id#1,type#2,rand#3,junk#4], true, 1, 
StorageLevel(true, true, false, true, 1), (Limit 999), None

== Physical Plan ==
InMemoryColumnarTableScan [color#0,id#1,type#2,rand#3,junk#4], 
(InMemoryRelation [color#0,id#1,type#2,rand#3,junk#4], true, 1, 
StorageLevel(true, true, false, true, 1), (Limit 999), None)

Code Generation: true
{code}




> DataFrame limit operation is not consistent
> ---
>
> Key: SPARK-13299
> URL: https://issues.apache.org/jira/browse/SPARK-13299
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
>Reporter: Nazarii Balkovskyi
>  Labels: SparkSQL, dataframe
> Attachments: SparkLimitIssue.png
>
>
> I ran into a problem with the limit method of the DataFrame API. 
> I am trying to take the first 999 records from an Avro source that contains about 3.5K records. 
> {code:java}
> DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");
> df = df.limit(999);
> {code}
> After the save operation, the rows are not in the same order as in the input data set. Sometimes the order is correct, but usually it is not. 
> {code:java}
> df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
> {code}
> Here is the Spark plan (it may help in finding the cause of the issue):
> {code}
> == Parsed Logical Plan ==
> Limit 999
>  Filter (1 = 1)
>   Relation[mobileNumber#0L,tariff#1,debit#2] 
> AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)
> == Analyzed Logical Plan ==
> mobileNumber: bigint, tariff: string, debit: float
> Limit 999
>  Filter (1 = 1)
>   Relation[mobileNumber#0L,tariff#1,debit#2] 
> AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)
> == Optimized Logical Plan ==
> Limit 999
>  Relation[mobileNumber#0L,tariff#1,debit#2] 
> AvroRelation(hdfs://:8020/user/hdfs/dataset.avro,None,0)
> == Physical Plan ==
> Limit 999
>  Scan 
> AvroRelation(hdfs://lssparkmaster.edvantis.com:8020/user/hdfs/clientsENG10M.avro,None,0)[mobileNumber#0L,tariff#1,debit#2]
> Code Generation: true
> {code}




[jira] [Updated] (SPARK-13299) DataFrame limit operation is not consistent

2016-02-12 Thread Nazarii Balkovskyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nazarii Balkovskyi updated SPARK-13299:
---
Attachment: SparkLimitIssue.png

> DataFrame limit operation is not consistent
> ---
>
> Key: SPARK-13299
> URL: https://issues.apache.org/jira/browse/SPARK-13299
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.3.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
>Reporter: Nazarii Balkovskyi
>  Labels: SparkSQL, dataframe
> Attachments: SparkLimitIssue.png
>
>
> I ran into a problem with the limit method of the DataFrame API. 
> I am trying to take the first 999 records from an Avro source that contains about 3.5K records. 
> {code:java}
> DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");
> df = df.limit(999);
> {code}
> After the save operation, the rows are not in the same order as in the input data set. Sometimes the order is correct, but usually it is not. 
> {code:java}
> df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
> {code}
> Here is the Spark plan (it may help in finding the cause of the issue):
> {code}
> == Parsed Logical Plan ==
> Limit 999
>  Relation[color#0,id#1,type#2,rand#3,junk#4] 
> AvroRelation(hdfs://:8020/tmp/hdfs.2016-02-12--10-18-55-171-488/hdfs.2016-02-12--10-19-05-109-895.avro,None,0)
> == Analyzed Logical Plan ==
> color: string, id: int, type: string, rand: int, junk: string
> Limit 999
>  Relation[color#0,id#1,type#2,rand#3,junk#4] 
> AvroRelation(hdfs://:8020/tmp/hdfs.2016-02-12--10-18-55-171-488/hdfs.2016-02-12--10-19-05-109-895.avro,None,0)
> == Optimized Logical Plan ==
> InMemoryRelation [color#0,id#1,type#2,rand#3,junk#4], true, 1, 
> StorageLevel(true, true, false, true, 1), (Limit 999), None
> == Physical Plan ==
> InMemoryColumnarTableScan [color#0,id#1,type#2,rand#3,junk#4], 
> (InMemoryRelation [color#0,id#1,type#2,rand#3,junk#4], true, 1, 
> StorageLevel(true, true, false, true, 1), (Limit 999), None)
> Code Generation: true
> {code}






[jira] [Created] (SPARK-13299) DataFrame limit operation is not consistent

2016-02-12 Thread Nazarii Balkovskyi (JIRA)
Nazarii Balkovskyi created SPARK-13299:
--

 Summary: DataFrame limit operation is not consistent
 Key: SPARK-13299
 URL: https://issues.apache.org/jira/browse/SPARK-13299
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.6.0, 1.5.2, 1.5.1, 1.5.0, 1.3.1
Reporter: Nazarii Balkovskyi


I ran into a problem with the limit method of the DataFrame API. 
I am trying to take the first 999 records from an Avro source that contains 
about 3.5K records. 

{code:java}
DataFrame df = sqlContext.load(inputSource, "com.databricks.spark.avro");

df = df.limit(999);
{code}

After the save operation, the rows are not in the same order as in the 
input data set. Sometimes the order is correct, but usually it is not. 

{code:java}
df.save(filepathToSave, "com.databricks.spark.avro", SaveMode.ErrorIfExists);
{code}

Here is the Spark plan (it may help in finding the cause of the 
issue):

{code}
== Parsed Logical Plan ==
Limit 999
 Relation[color#0,id#1,type#2,rand#3,junk#4] 
AvroRelation(hdfs://:8020/tmp/hdfs.2016-02-12--10-18-55-171-488/hdfs.2016-02-12--10-19-05-109-895.avro,None,0)

== Analyzed Logical Plan ==
color: string, id: int, type: string, rand: int, junk: string
Limit 999
 Relation[color#0,id#1,type#2,rand#3,junk#4] 
AvroRelation(hdfs://:8020/tmp/hdfs.2016-02-12--10-18-55-171-488/hdfs.2016-02-12--10-19-05-109-895.avro,None,0)

== Optimized Logical Plan ==
InMemoryRelation [color#0,id#1,type#2,rand#3,junk#4], true, 1, 
StorageLevel(true, true, false, true, 1), (Limit 999), None

== Physical Plan ==
InMemoryColumnarTableScan [color#0,id#1,type#2,rand#3,junk#4], 
(InMemoryRelation [color#0,id#1,type#2,rand#3,junk#4], true, 1, 
StorageLevel(true, true, false, true, 1), (Limit 999), None)

Code Generation: true
{code}

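A possible workaround (an assumption on my part, not verified against the reporter's environment): impose an explicit ordering before taking the limit, e.g. df.sort("mobileNumber").limit(999) in the DataFrame API, so that the 999 rows taken are well defined regardless of partitioning. The same idea in a JDK-only sketch (the class name and values are made up for the example):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SortedLimitDemo {

    // Sorting first makes the subsequent limit deterministic: after
    // sorted() the stream is ordered, so limit(n) always takes the n
    // smallest elements, regardless of parallelism.
    static List<Integer> sortedTake(int n) {
        return IntStream.range(0, 1000).boxed()
                .parallel().unordered()
                .sorted()
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sortedTake(5)); // [0, 1, 2, 3, 4]
    }
}
```

The trade-off is an extra sort over the input, but the result no longer depends on which partitions answer first.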



