[jira] [Commented] (SPARK-12778) Use of Java Unsafe should take endianness into account
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094404#comment-15094404 ]

Apache Spark commented on SPARK-12778:
--------------------------------------

User 'tedyu' has created a pull request for this issue:
https://github.com/apache/spark/pull/10725

> Use of Java Unsafe should take endianness into account
> ------------------------------------------------------
>
>          Key: SPARK-12778
>          URL: https://issues.apache.org/jira/browse/SPARK-12778
>      Project: Spark
>   Issue Type: Bug
>   Components: Input/Output
>     Reporter: Ted Yu
>
> In Platform.java, methods of Java Unsafe are called directly without
> considering endianness.
> In the thread 'Tungsten in a mixed endian environment', Adam Roberts reported
> data corruption when "spark.sql.tungsten.enabled" is enabled in a mixed endian
> environment.
> Platform.java should take endianness into account.
> Below is a copy of Adam's report:
> I've been experimenting with DataFrame operations in a mixed endian
> environment - a big endian master with little endian workers. With tungsten
> enabled I'm encountering data corruption issues.
> For example, with this simple test code:
> {code}
> import org.apache.spark.SparkContext
> import org.apache.spark._
> import org.apache.spark.sql.SQLContext
>
> object SimpleSQL {
>   def main(args: Array[String]): Unit = {
>     if (args.length != 1) {
>       println("Not enough args, you need to specify the master url")
>     }
>     val masterURL = args(0)
>     println("Setting up Spark context at: " + masterURL)
>     val sparkConf = new SparkConf
>     val sc = new SparkContext(masterURL, "Unsafe endian test", sparkConf)
>     println("Performing SQL tests")
>     val sqlContext = new SQLContext(sc)
>     println("SQL context set up")
>     val df = sqlContext.read.json("/tmp/people.json")
>     df.show()
>     println("Selecting everyone's age and adding one to it")
>     df.select(df("name"), df("age") + 1).show()
>     println("Showing all people over the age of 21")
>     df.filter(df("age") > 21).show()
>     println("Counting people by age")
>     df.groupBy("age").count().show()
>   }
> }
> {code}
> Instead of getting
> {code}
> +----+-----+
> | age|count|
> +----+-----+
> |null|    1|
> |  19|    1|
> |  30|    1|
> +----+-----+
> {code}
> I get the following with my mixed endian set up:
> {code}
> +-------------------+-----------------+
> |                age|            count|
> +-------------------+-----------------+
> |               null|                1|
> |1369094286720630784|72057594037927936|
> |                 30|                1|
> +-------------------+-----------------+
> {code}
> and on another run:
> {code}
> +---+-----------------+
> |age|            count|
> +---+-----------------+
> |  0|72057594037927936|
> | 19|                1|
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
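An observation, not part of the original report: the corrupted counts are exactly what a byte-swapped 64-bit long produces, which points at endianness rather than arbitrary memory corruption. A minimal sketch:

```java
// Sketch: byte-swapping the expected values reproduces the corrupted
// output in the report above, so the garbage is an endianness artifact.
public class EndianSwapDemo {
    public static void main(String[] args) {
        // age 19 (0x0000000000000013) written on one architecture and
        // read on the other becomes 0x1300000000000000:
        System.out.println(Long.reverseBytes(19L));  // 1369094286720630784
        // count 1 byte-swapped is 2^56:
        System.out.println(Long.reverseBytes(1L));   // 72057594037927936
    }
}
```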
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094185#comment-15094185 ]

Tim Preece commented on SPARK-12778:
------------------------------------

The testcase in SPARK-12555 only fails on big-endian (even though I can see the problem in a debugger on little endian). So perhaps it's best if I create a testcase which explicitly fails on both BE and LE and then update SPARK-12555.
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094128#comment-15094128 ]

Tim Preece commented on SPARK-12778:
------------------------------------

https://issues.apache.org/jira/browse/SPARK-12555 is not related, and in fact is not an endianness problem. However, whilst investigating SPARK-12555 we did wonder how (or whether) a mixed endian Spark cluster could work, given that an unsafe row mixes writing integers and reading bytes.
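The hazard of mixing integer writes with byte reads can be illustrated with plain `ByteBuffer`s standing in for the `Unsafe` calls in Platform.java (this sketch is not from the thread):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MixedAccessDemo {
    public static void main(String[] args) {
        // The same int stored under the two byte orders:
        ByteBuffer le = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer be = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN);
        le.putInt(0, 19);
        be.putInt(0, 19);
        // Little endian lays out 13 00 00 00; big endian 00 00 00 13.
        // If one machine writes the int natively and another reads the raw
        // bytes back as an int in its own order, the value comes back permuted.
        System.out.printf("LE first byte: 0x%02x%n", le.get(0));
        System.out.printf("BE first byte: 0x%02x%n", be.get(0));
    }
}
```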
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094135#comment-15094135 ]

Sean Owen commented on SPARK-12778:
-----------------------------------

Could you update it then? It is certainly described as a problem that manifests on big-endian machines and seems to involve Tungsten. If you know its cause is different, that's good to know. I also do not expect Tungsten to work in such a mixed environment and would assume you have to disable it.
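Under that assumption, the workaround would be to turn Tungsten off via the property quoted in the description. The spark-submit invocation below is illustrative only; the jar name and master URL are placeholders, not from the thread:

```shell
# Illustrative: run the reporter's SimpleSQL test app with Tungsten disabled
# on a mixed endian cluster. Jar name and master URL are placeholders.
spark-submit --class SimpleSQL \
  --conf spark.sql.tungsten.enabled=false \
  SimpleSQL.jar spark://master:7077
```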
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094090#comment-15094090 ]

Ted Yu commented on SPARK-12778:
--------------------------------

I have seen SPARK-12555, but it didn't identify where in the codebase the problem is. I checked unsafe/src/main/java/org/apache/spark/unsafe/Platform.java in the master branch, where endianness is not taken care of.
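One possible shape for such a fix, sketched here as an editorial aside: Platform.java's real accessors go through sun.misc.Unsafe, and the choice of little-endian as the normalized on-wire order below is an assumption for illustration, not what any eventual patch necessarily does.

```java
import java.nio.ByteOrder;

public class EndianAwareAccess {
    private static final boolean BIG_ENDIAN =
        ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);

    // Normalize every long to a fixed (here: little-endian) byte layout
    // before it leaves the JVM, and swap back on the way in, so bytes
    // exchanged between mixed endian machines decode identically everywhere.
    static long toWire(long nativeValue) {
        return BIG_ENDIAN ? Long.reverseBytes(nativeValue) : nativeValue;
    }

    static long fromWire(long wireValue) {
        return BIG_ENDIAN ? Long.reverseBytes(wireValue) : wireValue;
    }

    public static void main(String[] args) {
        // Round-trip is the identity on either architecture.
        System.out.println(fromWire(toWire(19L)));  // 19
    }
}
```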
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094062#comment-15094062 ]

Sean Owen commented on SPARK-12778:
-----------------------------------

Can you include some detail here? Also, there are related JIRAs already resolved on this topic. What version? Set a component while you're at it, please.
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094082#comment-15094082 ]

Sean Owen commented on SPARK-12778:
-----------------------------------

See for example https://issues.apache.org/jira/browse/SPARK-12555

Checked master for what? There is more detail in the thread that needs to be reproduced here, please; by itself this does not describe the problem.
[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094078#comment-15094078 ]

Ted Yu commented on SPARK-12778:
--------------------------------

I did perform a search in JIRA for Unsafe-related Spark JIRAs but didn't find any unresolved ones. I checked the master branch code base before logging this JIRA.