Matt Cheah created SPARK-10877:
----------------------------------

             Summary: Assertions fail straightforward DataFrame job
                 Key: SPARK-10877
                 URL: https://issues.apache.org/jira/browse/SPARK-10877
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.0
            Reporter: Matt Cheah


I have some code running in a unit test suite that is failing with an 
assertion error.

I have translated the failing JUnit test into a Scala script that I will 
attach to the ticket. The assertion error is the following:

{code}
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.AssertionError: lengthInBytes must be a multiple of 8 (word-aligned)
    at org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeWords(Murmur3_x86_32.java:53)
    at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.hashCode(UnsafeArrayData.java:289)
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.hashCode(rows.scala:149)
    at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.hashCode(rows.scala:247)
    at org.apache.spark.HashPartitioner.getPartition(Partitioner.scala:85)
    at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180)
    at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
{code}

However, the code runs normally and computes the correct result when 
assertions are turned off.
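
As background for this observation: a Java `assert` statement is only 
evaluated when the JVM runs with assertions enabled (for example via `-ea`). 
The following is a minimal sketch of that behaviour. The class and method 
names are hypothetical and this is not Spark's code; only the assertion 
message is copied from the stack trace above.

```java
final class AssertionToggleDemo {
    // Hypothetical stand-in for a method guarded by an alignment assertion.
    static int checkedLength(int lengthInBytes) {
        assert lengthInBytes % 8 == 0
            : "lengthInBytes must be a multiple of 8 (word-aligned)";
        return lengthInBytes;
    }

    // Reports whether this JVM evaluates assert statements for this class.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // the assignment only runs when assertions are on
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
        try {
            // With assertions off, the check is skipped and 12 is returned.
            System.out.println("accepted length: " + checkedLength(12));
        } catch (AssertionError e) {
            // With assertions on, the same call fails.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running this with `java -ea AssertionToggleDemo` should take the "rejected" 
branch, while running it without `-ea` should take the "accepted" branch.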

I traced the code and found that hashUnsafeWords was called with a byte 
length of 12, which is clearly not a multiple of 8. The job nevertheless 
seems to compute the correct result. Of course, I can’t just disable 
assertions for my unit tests.
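
To make the alignment condition concrete, here is a minimal sketch of the 
check, assuming 8-byte words. The class and method names are hypothetical and 
are not Spark's API. The round-up helper only shows what the nearest aligned 
length for 12 would be; it is not a proposed fix.

```java
final class WordAlignment {
    // Assumption: a "word" is 8 bytes, as the assertion message implies.
    static final int WORD_SIZE_BYTES = 8;

    // True when the length is a whole number of 8-byte words.
    static boolean isWordAligned(int lengthInBytes) {
        return lengthInBytes % WORD_SIZE_BYTES == 0;
    }

    // Smallest multiple of 8 that is >= lengthInBytes (non-negative input).
    static int roundUpToWord(int lengthInBytes) {
        int remainder = lengthInBytes % WORD_SIZE_BYTES;
        return remainder == 0
            ? lengthInBytes
            : lengthInBytes + (WORD_SIZE_BYTES - remainder);
    }

    public static void main(String[] args) {
        int observed = 12; // the byte length seen in the failing call
        System.out.println(isWordAligned(observed)); // false: 12 % 8 == 4
        System.out.println(roundUpToWord(observed)); // 16
    }
}
```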

A few things we need to understand:

1. Why is lengthInBytes equal to 12?
2. Is it actually a problem that the byte length is not word-aligned? If so, 
how should we fix the byte length? If it's not a problem, why is the assertion 
flagging a false positive?



