[GitHub] spark pull request: [SPARK-10914] UnsafeRow serialization breaks w...

rxin Thu, 08 Oct 2015 11:36:18 -0700

GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/9030


    [SPARK-10914] UnsafeRow serialization breaks when two machines have different Oops size.

    UnsafeRow holds three pieces of information when pointing to data in
    memory: an object, a base offset, and a length. When the row is serialized
    with Java or Kryo serialization, the object layout in memory can change if
    the two machines have different pointer widths (Oops in the JVM), so a base
    offset recorded on one machine may not be valid on the other.
    
    To reproduce, launch Spark using
    
    MASTER=local-cluster[2,1,1024] bin/spark-shell --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"
    
    And then run the following
    
    scala> sql("select 1 xx").collect()
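
    The failure mode described above can be illustrated with a small,
    self-contained model. This is only a sketch and not the patch itself: the
    Row class is a simplified, hypothetical stand-in for UnsafeRow, memory is
    modeled as a flat byte array, and the two header sizes (16 and 24) are
    assumed values chosen for illustration rather than measured JVM constants.

```java
import java.util.Arrays;

public class OopsOffsetSketch {

    // Hypothetical array-header sizes for two JVMs with different pointer widths.
    // Real values depend on the JVM and its flags; these are illustrative only.
    public static final int SENDER_HEADER = 16;
    public static final int RECEIVER_HEADER = 24;

    /** Simplified stand-in for UnsafeRow: backing memory, base offset, length. */
    public static final class Row {
        final byte[] memory;   // models "object header + payload" as one flat region
        final long baseOffset; // where the payload starts inside memory
        final int length;      // payload size in bytes

        Row(byte[] memory, long baseOffset, int length) {
            this.memory = memory;
            this.baseOffset = baseOffset;
            this.length = length;
        }

        /** Reads the payload the way the row believes it is laid out. */
        public byte[] payload() {
            int start = (int) baseOffset;
            return Arrays.copyOfRange(memory, start, start + length);
        }
    }

    /** Lays out a payload behind a header of the given size, as a local JVM would. */
    public static Row allocate(byte[] payload, int headerSize) {
        byte[] memory = new byte[headerSize + payload.length];
        Arrays.fill(memory, 0, headerSize, (byte) 0x7F); // mark header bytes as non-payload
        System.arraycopy(payload, 0, memory, headerSize, payload.length);
        return new Row(memory, headerSize, payload.length);
    }

    /** Naive transfer: the sender's base offset is reused on the receiver. */
    public static Row naiveTransfer(Row sent, int receiverHeader) {
        Row local = allocate(sent.payload(), receiverHeader);
        return new Row(local.memory, sent.baseOffset, sent.length);
    }

    /** Layout-independent transfer: ship only the bytes, recompute the offset locally. */
    public static Row portableTransfer(Row sent, int receiverHeader) {
        return allocate(sent.payload(), receiverHeader);
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
        Row original = allocate(data, SENDER_HEADER);
        Row naive = naiveTransfer(original, RECEIVER_HEADER);
        Row portable = portableTransfer(original, RECEIVER_HEADER);
        System.out.println("naive matches:    " + Arrays.equals(data, naive.payload()));
        System.out.println("portable matches: " + Arrays.equals(data, portable.payload()));
    }
}
```

    In this model the naive transfer reads header bytes as if they were row
    data, while the portable transfer round-trips the payload unchanged
    because the offset is derived on the receiving side.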


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-10914

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9030.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9030
    
----
commit 465fc8e18147b9e8cf34e0f5bcbc338d03ad4f95
Author: Reynold Xin <r...@databricks.com>
Date:   2015-10-08T18:34:14Z

    [SPARK-10914] UnsafeRow serialization breaks when two machines have different Oops size.
    
    The problem is that UnsafeRow holds three pieces of information when
    pointing to data in memory: an object, a base offset, and a length. When
    the row is serialized with Java or Kryo serialization, the object layout in
    memory can change if the two machines have different pointer widths (Oops
    in the JVM), so a base offset recorded on one machine may not be valid on
    the other.
    
    To reproduce, launch Spark using
    
    MASTER=local-cluster[2,1,1024] bin/spark-shell --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"
    
    And then run the following
    
    scala> sql("select 1 xx").collect()
    
    (cherry picked from commit 157b2a818d3993b1321cc41fb7b30407bd13490b)
    Signed-off-by: Reynold Xin <r...@databricks.com>

----


