[ https://issues.apache.org/jira/browse/SPARK-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943956#comment-14943956 ]
Reynold Xin edited comment on SPARK-10877 at 10/16/15 10:25 PM:
----------------------------------------------------------------

I ran your SparkFilterByKeyTest.scala from spark-shell but did not run into the problem you stated above. At which statement did you run into the exception? randomUUID()?

{code}
scala> df1.printSchema
root
 |-- col1: integer (nullable = true)
 |-- col2: integer (nullable = true)
 |-- col3: string (nullable = true)
 |-- col4: double (nullable = true)
 |-- col5: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- indexfordefaultordering: integer (nullable = true)

scala> df2.printSchema
root
 |-- col1: integer (nullable = true)
 |-- col2: integer (nullable = true)
 |-- col3: string (nullable = true)
 |-- col4: double (nullable = true)
 |-- col5: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- indexfordefaultordering: integer (nullable = true)

scala> renamedRightDf.printSchema
root
 |-- cole9cf9a80b37641a1957509ad61e1f823: integer (nullable = true)
 |-- col3e0904a3138e4f2ba987b3d282d61927: integer (nullable = true)
 |-- colb92786c9d3944106a536038604bcb4ee: string (nullable = true)
 |-- coled77bd68bd634a328787c1489564e7ac: double (nullable = true)
 |-- colf8b2385adf014d7cb215ddbffb8d1b24: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- col0d1102e414e5441faab32b617a27ac60: integer (nullable = true)

scala> joinedDf.collect
res6: Array[org.apache.spark.sql.Row] = Array()
{code}
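The renamed columns above are "col" followed by 32 hex digits, which is consistent with names built from {{UUID.randomUUID()}} with the dashes removed. A hypothetical sketch of how such names could be produced (the actual SparkFilterByKeyTest.scala attachment may do this differently):

```java
import java.util.UUID;

public class UuidColumnNames {
    // Build a column name like "colf8b2385adf014d7cb215ddbffb8d1b24":
    // the prefix "col" plus a random UUID with its dashes stripped.
    static String uuidColumnName() {
        return "col" + UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        String name = uuidColumnName();
        System.out.println(name);
        // "col" + 32 hex digits = 35 characters total
        assert name.length() == 35;
    }
}
```

Since the generated names are random, a rename like this is fresh on every run, which is worth keeping in mind when comparing schemas across runs.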
> Assertions fail straightforward DataFrame job due to word alignment
> -------------------------------------------------------------------
>
>                 Key: SPARK-10877
>                 URL: https://issues.apache.org/jira/browse/SPARK-10877
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Matt Cheah
>        Attachments: SparkFilterByKeyTest.scala
>
> I have some code that I’m running in a unit test suite, but the code I’m
> running is failing with an assertion error.
> I have translated the JUnit test that was failing to a Scala script that I
> will attach to the ticket.
> The assertion error is the following:
> {code}
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due
> to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure:
> Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.AssertionError:
> lengthInBytes must be a multiple of 8 (word-aligned)
> 	at org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeWords(Murmur3_x86_32.java:53)
> 	at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.hashCode(UnsafeArrayData.java:289)
> 	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.hashCode(rows.scala:149)
> 	at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.hashCode(rows.scala:247)
> 	at org.apache.spark.HashPartitioner.getPartition(Partitioner.scala:85)
> 	at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180)
> 	at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180)
> 	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> {code}
> However, it turns out that this code actually works normally and computes the
> correct result if assertions are turned off.
> I traced the code and found that when hashUnsafeWords was called, it was
> given a byte length of 12, which is clearly not a multiple of 8; yet the
> job seems to compute correctly regardless. Of course, I can’t simply
> disable assertions for my unit tests.
> A few things we need to understand:
> 1. Why is lengthInBytes 12?
> 2. Is it actually a problem that the byte length is not word-aligned? If so,
> how should we fix the byte length? If it's not a problem, why is the
> assertion flagging a false positive?
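For reference, the check that fires and the usual remedy can be sketched as follows. This is an illustrative reimplementation of the alignment logic, not Spark's actual code: a word-by-word hash requires the length to be a multiple of 8, and the conventional fix is to round variable-length data up to the next word boundary with zero padding so the extra bytes hash deterministically.

```java
public class WordAlignment {
    // The condition the assertion enforces: a "word-aligned" byte length
    // must be a multiple of 8 (one 64-bit word).
    static boolean isWordAligned(int lengthInBytes) {
        return (lengthInBytes & 0x7) == 0;
    }

    // The conventional remedy: round a byte length up to the next multiple
    // of 8, so a word-at-a-time hash can consume the (zero-padded) tail.
    static int roundUpToWord(int lengthInBytes) {
        return (lengthInBytes + 7) & ~7;
    }

    public static void main(String[] args) {
        // The ticket's failing case: a 12-byte payload.
        assert !isWordAligned(12);            // 12 trips the assertion
        assert roundUpToWord(12) == 16;       // padded length is word-aligned
        assert isWordAligned(roundUpToWord(12));
        System.out.println("12 bytes rounds up to " + roundUpToWord(12));
    }
}
```

Note that padding only gives consistent hashes if the pad bytes are guaranteed to be zero; hashing uninitialized trailing bytes would make equal values hash differently.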
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org