[jira] [Commented] (SPARK-12778) Use of Java Unsafe should take endianness into account
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094404#comment-15094404 ]

Apache Spark commented on SPARK-12778:
--------------------------------------

User 'tedyu' has created a pull request for this issue:
https://github.com/apache/spark/pull/10725

> Use of Java Unsafe should take endianness into account
> ------------------------------------------------------
>
>          Key: SPARK-12778
>          URL: https://issues.apache.org/jira/browse/SPARK-12778
>      Project: Spark
>   Issue Type: Bug
>   Components: Input/Output
>     Reporter: Ted Yu
>
> In Platform.java, methods of Java Unsafe are called directly without
> considering endianness.
> In the thread 'Tungsten in a mixed endian environment', Adam Roberts reported
> data corruption when "spark.sql.tungsten.enabled" is enabled in a mixed endian
> environment.
> Platform.java should take endianness into account.
> Below is a copy of Adam's report:
> I've been experimenting with DataFrame operations in a mixed endian
> environment - a big endian master with little endian workers. With tungsten
> enabled I'm encountering data corruption issues.
> For example, with this simple test code:
> {code}
> import org.apache.spark.SparkContext
> import org.apache.spark._
> import org.apache.spark.sql.SQLContext
>
> object SimpleSQL {
>   def main(args: Array[String]): Unit = {
>     if (args.length != 1) {
>       println("Not enough args, you need to specify the master url")
>     }
>     val masterURL = args(0)
>     println("Setting up Spark context at: " + masterURL)
>     val sparkConf = new SparkConf
>     val sc = new SparkContext(masterURL, "Unsafe endian test", sparkConf)
>     println("Performing SQL tests")
>     val sqlContext = new SQLContext(sc)
>     println("SQL context set up")
>     val df = sqlContext.read.json("/tmp/people.json")
>     df.show()
>     println("Selecting everyone's age and adding one to it")
>     df.select(df("name"), df("age") + 1).show()
>     println("Showing all people over the age of 21")
>     df.filter(df("age") > 21).show()
>     println("Counting people by age")
>     df.groupBy("age").count().show()
>   }
> }
> {code}
> Instead of getting
> {code}
> +----+-----+
> | age|count|
> +----+-----+
> |null|    1|
> |  19|    1|
> |  30|    1|
> +----+-----+
> {code}
> I get the following with my mixed endian set up:
> {code}
> +-------------------+-----------------+
> |                age|            count|
> +-------------------+-----------------+
> |               null|                1|
> |1369094286720630784|72057594037927936|
> |                 30|                1|
> +-------------------+-----------------+
> {code}
> and on another run:
> {code}
> +---+-----------------+
> |age|            count|
> +---+-----------------+
> |  0|72057594037927936|
> | 19|                1|
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
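An observation, not part of the original report: the corrupted counts are exactly what a byte-swapped 64-bit long produces, which points at endianness rather than arbitrary memory corruption. A minimal sketch:

```java
// Sketch: byte-swapping the expected values reproduces the corrupted
// output in the report above, so the garbage is an endianness artifact.
public class EndianSwapDemo {
    public static void main(String[] args) {
        // age 19 (0x0000000000000013) written on one architecture and
        // read on the other becomes 0x1300000000000000:
        System.out.println(Long.reverseBytes(19L));  // 1369094286720630784
        // count 1 byte-swapped is 2^56:
        System.out.println(Long.reverseBytes(1L));   // 72057594037927936
    }
}
```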
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094185#comment-15094185 ]

Tim Preece commented on SPARK-12778:
------------------------------------

The testcase in SPARK-12555 only fails on big-endian (even though I can see the problem in a debugger on little endian). So perhaps it's best if I create a testcase which explicitly fails on both BE and LE and then update SPARK-12555.
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094128#comment-15094128 ]

Tim Preece commented on SPARK-12778:
------------------------------------

https://issues.apache.org/jira/browse/SPARK-12555 is not related, and in fact is not an endianness problem. However, whilst investigating SPARK-12555 we did wonder how (or whether) a mixed endian Spark cluster could work, given that an unsafe row mixes writing integers and reading bytes.
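The hazard of mixing integer writes with byte reads can be illustrated with plain `ByteBuffer`s standing in for the `Unsafe` calls in Platform.java (this sketch is not from the thread):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MixedAccessDemo {
    public static void main(String[] args) {
        // The same int stored under the two byte orders:
        ByteBuffer le = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer be = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN);
        le.putInt(0, 19);
        be.putInt(0, 19);
        // Little endian lays out 13 00 00 00; big endian 00 00 00 13.
        // If one machine writes the int natively and another reads the raw
        // bytes back as an int in its own order, the value comes back permuted.
        System.out.printf("LE first byte: 0x%02x%n", le.get(0));
        System.out.printf("BE first byte: 0x%02x%n", be.get(0));
    }
}
```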
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094135#comment-15094135 ]

Sean Owen commented on SPARK-12778:
-----------------------------------

Could you update it then? It is certainly described as a problem that manifests on big-endian machines and seems to involve Tungsten. If you know its cause is different, that's good to know. I also do not expect Tungsten to work in such a mixed environment and would assume you have to disable it.
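Under that assumption, the workaround would be to turn Tungsten off via the property quoted in the description. The spark-submit invocation below is illustrative only; the jar name and master URL are placeholders, not from the thread:

```shell
# Illustrative: run the reporter's SimpleSQL test app with Tungsten disabled
# on a mixed endian cluster. Jar name and master URL are placeholders.
spark-submit --class SimpleSQL \
  --conf spark.sql.tungsten.enabled=false \
  SimpleSQL.jar spark://master:7077
```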
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094090#comment-15094090 ]

Ted Yu commented on SPARK-12778:
--------------------------------

I have seen SPARK-12555, but it didn't identify where in the codebase the problem is. I checked unsafe/src/main/java/org/apache/spark/unsafe/Platform.java in the master branch, where endianness is not taken care of.
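One possible shape for such a fix, sketched here as an editorial aside: Platform.java's real accessors go through sun.misc.Unsafe, and the choice of little-endian as the normalized on-wire order below is an assumption for illustration, not what any eventual patch necessarily does.

```java
import java.nio.ByteOrder;

public class EndianAwareAccess {
    private static final boolean BIG_ENDIAN =
        ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);

    // Normalize every long to a fixed (here: little-endian) byte layout
    // before it leaves the JVM, and swap back on the way in, so bytes
    // exchanged between mixed endian machines decode identically everywhere.
    static long toWire(long nativeValue) {
        return BIG_ENDIAN ? Long.reverseBytes(nativeValue) : nativeValue;
    }

    static long fromWire(long wireValue) {
        return BIG_ENDIAN ? Long.reverseBytes(wireValue) : wireValue;
    }

    public static void main(String[] args) {
        // Round-trip is the identity on either architecture.
        System.out.println(fromWire(toWire(19L)));  // 19
    }
}
```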
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094062#comment-15094062 ]

Sean Owen commented on SPARK-12778:
-----------------------------------

Can you include some detail here? Also, there are related JIRAs already resolved on this topic. What version? Set a component while you're at it, please.
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094082#comment-15094082 ]

Sean Owen commented on SPARK-12778:
-----------------------------------

See for example https://issues.apache.org/jira/browse/SPARK-12555

Checked master for what? There is more detail in the thread that needs to be reproduced here, please; by itself this does not describe the problem.
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094078#comment-15094078 ]

Ted Yu commented on SPARK-12778:
--------------------------------

I did perform a search in JIRA for Unsafe-related Spark JIRAs but didn't find any unresolved ones. I checked the master branch code base before logging this JIRA.