[ https://issues.apache.org/jira/browse/SPARK-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094404#comment-15094404 ]
Apache Spark commented on SPARK-12778:
--------------------------------------

User 'tedyu' has created a pull request for this issue:
https://github.com/apache/spark/pull/10725

> Use of Java Unsafe should take endianness into account
> ------------------------------------------------------
>
>                 Key: SPARK-12778
>                 URL: https://issues.apache.org/jira/browse/SPARK-12778
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>            Reporter: Ted Yu
>
> In Platform.java, methods of Java Unsafe are called directly without
> considering endianness.
> In the thread 'Tungsten in a mixed endian environment', Adam Roberts reported
> data corruption when "spark.sql.tungsten.enabled" is enabled in a mixed endian
> environment.
> Platform.java should take endianness into account.
> Below is a copy of Adam's report:
> I've been experimenting with DataFrame operations in a mixed endian
> environment - a big endian master with little endian workers. With tungsten
> enabled I'm encountering data corruption issues.
> For example, with this simple test code:
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.SQLContext
>
> object SimpleSQL {
>   def main(args: Array[String]): Unit = {
>     // Exactly one argument is expected: the master URL.
>     if (args.length != 1) {
>       println("Wrong number of args, you need to specify the master url")
>       sys.exit(1)
>     }
>     val masterURL = args(0)
>     println("Setting up Spark context at: " + masterURL)
>     val sparkConf = new SparkConf
>     val sc = new SparkContext(masterURL, "Unsafe endian test", sparkConf)
>
>     println("Performing SQL tests")
>     val sqlContext = new SQLContext(sc)
>     println("SQL context set up")
>     val df = sqlContext.read.json("/tmp/people.json")
>     df.show()
>
>     println("Selecting everyone's age and adding one to it")
>     df.select(df("name"), df("age") + 1).show()
>
>     println("Showing all people over the age of 21")
>     df.filter(df("age") > 21).show()
>
>     // The aggregation below is where the corrupted values show up.
>     println("Counting people by age")
>     df.groupBy("age").count().show()
>   }
> }
> {code}
> Instead of getting
> {code}
> +----+-----+
> | age|count|
> +----+-----+
> |null|    1|
> |  19|    1|
> |  30|    1|
> +----+-----+
> {code}
> I get the following with my mixed endian set up:
> {code}
> +-------------------+-----------------+
> |                age|            count|
> +-------------------+-----------------+
> |               null|                1|
> |1369094286720630784|72057594037927936|
> |                 30|                1|
> +-------------------+-----------------+
> {code}
> and on another run:
> {code}
> +-------------------+-----------------+
> |                age|            count|
> +-------------------+-----------------+
> |                  0|72057594037927936|
> |                 19|                1|
> {code}
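
The corrupted values in the report are numerically consistent with 64-bit byte swaps: 72057594037927936 is 2^56, which is the long value 1 with its bytes reversed, and 1369094286720630784 is 19 * 2^56, which is 19 with its bytes reversed. The following is a minimal standalone sketch of that arithmetic, not code from Spark or from the pull request. The class name `EndianDemo` and the helper `writeThenRead` are hypothetical, and `ByteBuffer` is used here only to stand in for one machine writing a long in its native byte order and another machine reading it in a different one.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    // Hypothetical helper: encode a long using one byte order and decode the
    // same 8 bytes using another, imitating a value written natively on one
    // machine and read natively on a machine with different endianness.
    static long writeThenRead(long value, ByteOrder writer, ByteOrder reader) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.order(writer).putLong(0, value);
        return buf.order(reader).getLong(0);
    }

    public static void main(String[] args) {
        // Age 19 written big endian, read little endian.
        System.out.println(
            writeThenRead(19L, ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN));
        // Count 1 written big endian, read little endian.
        System.out.println(
            writeThenRead(1L, ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN));
        // Reversing the bytes of the corrupted values recovers the originals.
        System.out.println(Long.reverseBytes(1369094286720630784L));
        System.out.println(Long.reverseBytes(72057594037927936L));
    }
}
```

When writer and reader use the same byte order the helper returns the value unchanged, which matches the report that the problem appears only in the mixed endian setup.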