[jira] [Commented] (SPARK-26699) Dataset column output discrepancies

2019-01-24 Thread Praveena (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751159#comment-16751159
 ] 

Praveena commented on SPARK-26699:
--

I am trying to understand why it behaves differently in local mode and in 
cluster mode. 

Please point me to the right mailing list so I can ask there. Thanks in advance.

> Dataset column output discrepancies
>
> Key: SPARK-26699
> URL: https://issues.apache.org/jira/browse/SPARK-26699
> Project: Spark
> Issue Type: Question
> Components: Input/Output
> Affects Versions: 2.3.2
> Reporter: Praveena
> Priority: Major
>
> Hi,
>
> When I run my job in local mode (that is, standalone in Eclipse) with the same
> Parquet input files, the output is:
>
>  locations
>
>  [[[true, [[, phys...
>  [[[true, [[, phys...
>  [[[true, [[, phys...
>  null
>  [[[true, [[, phys...
>  [[[true, [[, phys...
>  [[[true, [[, phys...
>  [[[true, [[, phys...
>  [[[true, [[, phys...
>  [[[true, [[, phys...
>
> But when I run the same code base with the same Parquet input files in YARN
> cluster mode, my output is as below:
>
>  locations
>
>  [WrappedArray([tr...
>  [WrappedArray([tr...
>  [WrappedArray([tr...
>  null
>  [WrappedArray([tr...
>  [WrappedArray([tr...
>  [WrappedArray([tr...
>  [WrappedArray([tr...
>  [WrappedArray([tr...
>  [WrappedArray([tr...
>
> It's adding WrappedArray :(
>
> I am using Apache Spark 2.3.2, and the EMR version is 5.19.0. What could be
> the reason for the discrepancy in the output of certain table columns?
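
The thread does not establish a cause. One reading consistent with the two listings is that the underlying rows are identical and only the code that turns a nested cell into text differs between the two environments (for instance, if the Spark jars on the Eclipse classpath are not the same build as the ones on the cluster; that is an assumption, not something the report confirms). Below is a minimal sketch in plain Java, without Spark, of how the same nested value can print both ways. `Struct` and `WrappedList` are hypothetical stand-ins for a row and for Scala's `WrappedArray`, and neither renderer is Spark's actual code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class RenderDemo {

    /** Hypothetical stand-in for a collection wrapper whose toString() prints its own class name. */
    static final class WrappedList {
        final List<Object> items;
        WrappedList(Object... items) { this.items = Arrays.asList(items); }
        @Override public String toString() {
            return items.stream().map(x -> String.valueOf(x))
                        .collect(Collectors.joining(", ", "WrappedList(", ")"));
        }
    }

    /** Hypothetical stand-in for a struct (row) value; toString() joins the fields with commas. */
    static final class Struct {
        final List<Object> fields;
        Struct(Object... fields) { this.fields = Arrays.asList(fields); }
        @Override public String toString() {
            return fields.stream().map(x -> String.valueOf(x))
                         .collect(Collectors.joining(",", "[", "]"));
        }
    }

    /** Renderer A: delegate to each value's own toString(), so wrapper class names leak into the text. */
    static String renderWithToString(Object cell) {
        return String.valueOf(cell);
    }

    /** Renderer B: walk the structure and print every nesting level with plain brackets. */
    static String renderStructurally(Object cell) {
        if (cell == null) return "null";
        if (cell instanceof Struct) return bracket(((Struct) cell).fields);
        if (cell instanceof WrappedList) return bracket(((WrappedList) cell).items);
        return cell.toString();
    }

    private static String bracket(List<Object> values) {
        return values.stream().map(RenderDemo::renderStructurally)
                     .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        // One struct holding a list of structs: the same data for both renderers.
        Object cell = new Struct(new WrappedList(new Struct(true, "phys")));
        System.out.println(renderWithToString(cell));  // [WrappedList([true,phys])]
        System.out.println(renderStructurally(cell));  // [[[true, phys]]]
    }
}
```

If that reading applies, comparing the Spark version and the column schema reported in both environments would be a reasonable first check.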



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26699) Dataset column output discrepancies

2019-01-23 Thread Praveena (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Praveena updated SPARK-26699:
-
Issue Type: Question  (was: Bug)




[jira] [Updated] (SPARK-26699) Dataset column output discrepancies

2019-01-23 Thread Praveena (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Praveena updated SPARK-26699:
-
Description: 
Hi,

When I run my job in local mode (that is, standalone in Eclipse) with the same 
Parquet input files, the output is:

 locations

 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 null
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...

But when I run the same code base with the same Parquet input files in YARN 
cluster mode, my output is as below:

 locations

 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 null
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...

It's adding WrappedArray :(

I am using Apache Spark 2.3.2, and the EMR version is 5.19.0. What could be 
the reason for the discrepancy in the output of certain table columns?

  was:
Hi,

When I run my job in local mode (that is, standalone in Eclipse) with the same 
Parquet input files, the output is:

 locations

 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 null
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...

But when I run the same code base with the same Parquet input files in YARN 
cluster mode, my output is as below:

 locations

 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 null
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...

It's adding WrappedArray :(

I am using Apache Spark 2.3.2, and the EMR version on the cluster is 5.19.0. 
What could be the reason for the discrepancy in the output of certain table 
columns?





[jira] [Updated] (SPARK-26699) Dataset column output discrepancies

2019-01-23 Thread Praveena (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Praveena updated SPARK-26699:
-
Description: 
Hi,

When I run my job in local mode (that is, standalone in Eclipse) with the same 
Parquet input files, the output is:

 locations

 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 null
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...

But when I run the same code base with the same Parquet input files in YARN 
cluster mode, my output is as below:

 locations

 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 null
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...

It's adding WrappedArray :(

I am using Apache Spark 2.3.2, and the EMR version on the cluster is 5.19.0. 
What could be the reason for the discrepancy in the output of certain table 
columns?

  was:
Hi,

When I run my job in local mode with the same Parquet input files, the output 
is:

 locations

 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 null
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...

But when I run the same code base with the same Parquet input files in YARN 
cluster mode, my output is as below:

 locations

 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 null
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...

It's adding WrappedArray :(

I am using Apache Spark 2.3.2, and the EMR version on the cluster is 5.19.0. 
What could be the reason for the discrepancy in the output of certain table 
columns?





[jira] [Created] (SPARK-26699) Dataset column discrepancies between Parquet

2019-01-22 Thread Lakshmi Praveena (JIRA)
Lakshmi Praveena created SPARK-26699:


 Summary: Dataset column discrepancies between Parquet 
 Key: SPARK-26699
 URL: https://issues.apache.org/jira/browse/SPARK-26699
 Project: Spark
  Issue Type: Bug
  Components: Input/Output
Affects Versions: 2.3.2
Reporter: Lakshmi Praveena


Hi,

When I run my job in local mode with the same Parquet input files, the output 
is:

 locations

 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 null
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...
 [[[true, [[, phys...

But when I run the same code base with the same Parquet input files in YARN 
cluster mode, my output is as below:

 locations

 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 null
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...
 [WrappedArray([tr...

It's adding WrappedArray :(

I am using Apache Spark 2.3.2, and the EMR version on the cluster is 5.19.0. 
What could be the reason for the discrepancy in the output of certain table 
columns?






[jira] [Updated] (SPARK-26699) Dataset column output discrepancies

2019-01-22 Thread Lakshmi Praveena (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lakshmi Praveena updated SPARK-26699:
-
Summary: Dataset column output discrepancies (was: Dataset column discrepancies between Parquet)
