[jira] [Commented] (SPARK-10877) Assertions fail straightforward DataFrame job due to word alignment
[ https://issues.apache.org/jira/browse/SPARK-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962093#comment-14962093 ] Davies Liu commented on SPARK-10877: Yes

> Assertions fail straightforward DataFrame job due to word alignment
> ---
>
> Key: SPARK-10877
> URL: https://issues.apache.org/jira/browse/SPARK-10877
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0
> Reporter: Matt Cheah
> Assignee: Davies Liu
> Attachments: SparkFilterByKeyTest.scala
>
> I have some code that I’m running in a unit test suite, and it is failing with an assertion error.
> I have translated the failing JUnit test to a Scala script, which I will attach to the ticket. The assertion error is the following:
> {code}
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.AssertionError: lengthInBytes must be a multiple of 8 (word-aligned)
>     at org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeWords(Murmur3_x86_32.java:53)
>     at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.hashCode(UnsafeArrayData.java:289)
>     at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.hashCode(rows.scala:149)
>     at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.hashCode(rows.scala:247)
>     at org.apache.spark.HashPartitioner.getPartition(Partitioner.scala:85)
>     at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180)
>     at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1$$anonfun$4$$anonfun$apply$4.apply(Exchange.scala:180)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> {code}
> However, it turns out that this code works normally and computes the correct result if assertions are turned off.
> I traced the code and found that when hashUnsafeWords was called, it was given a byte length of 12, which is clearly not a multiple of 8. However, the job seems to compute correctly regardless. Of course, I can’t just disable assertions for my unit test.
> A few things we need to understand:
> 1. Why is lengthInBytes 12?
> 2. Is it actually a problem that the byte length is not word-aligned? If so, how should we fix the byte length? If it's not a problem, why is the assertion flagging a false negative?

-- 
This message was sent by Atlassian JIRA (v6.3.4#6332)
- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
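To make the alignment requirement in question 2 concrete, here is a small self-contained sketch. It is not Spark's `Murmur3_x86_32`: the object name `WordAlignment` and its methods are hypothetical, and the mixing step is a toy. It only shows what "a multiple of 8 (word-aligned)" means for a length such as 12, and one way a caller could zero-pad a buffer up to the next word boundary.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.util.Arrays

// Hypothetical illustration of a word-aligned length requirement.
// NOT Spark's Murmur3_x86_32: the mixing step below is a toy.
object WordAlignment {
  val WordSize = 8 // bytes per 64-bit word

  def isWordAligned(lengthInBytes: Int): Boolean = lengthInBytes % WordSize == 0

  // Round a byte count up to the next multiple of 8, e.g. 12 -> 16.
  def roundUpToWord(lengthInBytes: Int): Int =
    ((lengthInBytes + WordSize - 1) / WordSize) * WordSize

  // Copy the bytes into a zero-filled array whose length is word-aligned.
  def padToWord(bytes: Array[Byte]): Array[Byte] =
    Arrays.copyOf(bytes, roundUpToWord(bytes.length))

  // Toy hash that consumes whole 8-byte words. Like the assertion in the
  // stack trace, it rejects inputs whose length is not a multiple of 8.
  def hashWords(bytes: Array[Byte]): Long = {
    require(isWordAligned(bytes.length),
      s"lengthInBytes must be a multiple of 8 (word-aligned), got ${bytes.length}")
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    var h = 17L
    while (buf.remaining() >= WordSize) {
      h = (h ^ buf.getLong()) * 1099511628211L // 64-bit FNV prime used as a multiplier
    }
    h
  }

  def main(args: Array[String]): Unit = {
    val payload = Array.fill[Byte](12)(1.toByte) // 12 bytes, as in the report
    println(isWordAligned(payload.length))       // false
    println(roundUpToWord(payload.length))       // 16
    println(hashWords(padToWord(payload)))       // hash of the zero-padded copy
  }
}
```

Zero-padding is only one option and has a cost: two inputs that differ only by trailing zero bytes hash identically unless the original length is also mixed into the result.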
[jira] [Commented] (SPARK-10877) Assertions fail straightforward DataFrame job due to word alignment
[ https://issues.apache.org/jira/browse/SPARK-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961505#comment-14961505 ] Davies Liu commented on SPARK-10877: This is already fixed in master and 1.5 branch.
[jira] [Commented] (SPARK-10877) Assertions fail straightforward DataFrame job due to word alignment
[ https://issues.apache.org/jira/browse/SPARK-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961550#comment-14961550 ] Matt Cheah commented on SPARK-10877: Is it this? https://github.com/apache/spark/pull/8987
[jira] [Commented] (SPARK-10877) Assertions fail straightforward DataFrame job due to word alignment
[ https://issues.apache.org/jira/browse/SPARK-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961507#comment-14961507 ] Matt Cheah commented on SPARK-10877: [~davies] can you link the PR that resolved this?
[jira] [Commented] (SPARK-10877) Assertions fail straightforward DataFrame job due to word alignment
[ https://issues.apache.org/jira/browse/SPARK-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961457#comment-14961457 ] Matt Cheah commented on SPARK-10877: The exception occurs on the executor side, not in any code that I have written. It's manifesting deep within the shuffle partitioning logic.
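The stack trace places the failure in `HashPartitioner.getPartition`, which calls the row's `hashCode`. As a rough sketch (the object name is hypothetical; the logic is modeled on, not copied from, Spark's hash partitioner), hash partitioning maps every shuffled key's hash into a partition index. That is why an assertion inside a row's `hashCode` surfaces on the executor during the shuffle rather than in user code.

```scala
// Sketch of hash partitioning: each record's key is hashed and the hash is
// mapped into [0, numPartitions). In the reported stack trace, the key is an
// internal row whose hashCode() ends up in UnsafeArrayData.hashCode().
object HashPartitioningSketch {
  // The JVM's % can return a negative value for a negative hash; shift it
  // back into range so the result is always a valid partition index.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val rawMod = x % mod
    rawMod + (if (rawMod < 0) mod else 0)
  }

  def getPartition(key: Any, numPartitions: Int): Int = key match {
    case null => 0
    case _    => nonNegativeMod(key.hashCode, numPartitions)
  }

  def main(args: Array[String]): Unit = {
    val keys = Seq("a", "b", Seq(1, 2, 3), null)
    keys.foreach(k => println(s"$k -> partition ${getPartition(k, 4)}"))
  }
}
```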
[jira] [Commented] (SPARK-10877) Assertions fail straightforward DataFrame job due to word alignment
[ https://issues.apache.org/jira/browse/SPARK-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943956#comment-14943956 ] Jason C Lee commented on SPARK-10877: I ran your SparkFilterByKeyTest.scala from spark-shell but did not run into the problem you described above. At which statement did you run into the exception? randomUUID()?

scala> df1.printSchema
root
 |-- col1: integer (nullable = true)
 |-- col2: integer (nullable = true)
 |-- col3: string (nullable = true)
 |-- col4: double (nullable = true)
 |-- col5: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- indexfordefaultordering: integer (nullable = true)

scala> df2.printSchema
root
 |-- col1: integer (nullable = true)
 |-- col2: integer (nullable = true)
 |-- col3: string (nullable = true)
 |-- col4: double (nullable = true)
 |-- col5: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- indexfordefaultordering: integer (nullable = true)

scala> renamedRightDf.printSchema
root
 |-- cole9cf9a80b37641a1957509ad61e1f823: integer (nullable = true)
 |-- col3e0904a3138e4f2ba987b3d282d61927: integer (nullable = true)
 |-- colb92786c9d3944106a536038604bcb4ee: string (nullable = true)
 |-- coled77bd68bd634a328787c1489564e7ac: double (nullable = true)
 |-- colf8b2385adf014d7cb215ddbffb8d1b24: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- col0d1102e414e5441faab32b617a27ac60: integer (nullable = true)

scala> joinedDf.collect
res6: Array[org.apache.spark.sql.Row] = Array()
[jira] [Commented] (SPARK-10877) Assertions fail straightforward DataFrame job due to word alignment
[ https://issues.apache.org/jira/browse/SPARK-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944002#comment-14944002 ] Matt Cheah commented on SPARK-10877: Can you turn on assertions when you spawn the shell? Assertions are off by default for all JVMs but are turned on for unit tests.
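Whether assertions are actually on can be checked from inside the JVM that runs the job. The sketch below (the object and method names are hypothetical; it uses only the JDK and the Scala standard library) calls `java.lang.Class.desiredAssertionStatus()` to report whether Java `assert` statements, like the one in `Murmur3_x86_32` in the stack trace, are enabled for a class. One caveat: Scala's `Predef.assert` is an ordinary method call, so the JVM's `-ea`/`-da` flags do not switch it on or off.

```scala
import scala.util.Try

// Hypothetical helper: reports whether JVM assertion flags (-ea / -da)
// enable Java `assert` statements for a given class.
object AssertionStatus {
  // desiredAssertionStatus() returns the status the class would be given if
  // it were initialized now, according to the -ea/-da (and -esa) settings.
  def javaAssertionsEnabledFor(cls: Class[_]): Boolean = cls.desiredAssertionStatus()

  // Scala's Predef.assert is a normal method; -ea/-da do not affect it.
  // It is only removed if elided at compile time.
  def scalaAssertActive: Boolean = Try(assert(false)).isFailure

  def main(args: Array[String]): Unit = {
    println(s"Java assertions enabled for this class: ${javaAssertionsEnabledFor(getClass)}")
    println(s"Scala Predef.assert active: $scalaAssertActive")
  }
}
```

Running the same program once with `java -ea` and once without should flip only the first line of output.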
[jira] [Commented] (SPARK-10877) Assertions fail straightforward DataFrame job due to word alignment
[ https://issues.apache.org/jira/browse/SPARK-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944010#comment-14944010 ] Matt Cheah commented on SPARK-10877: Is it possible that this error is JVM- or platform-dependent?
[jira] [Commented] (SPARK-10877) Assertions fail straightforward DataFrame job due to word alignment
[ https://issues.apache.org/jira/browse/SPARK-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944141#comment-14944141 ] Jason C Lee commented on SPARK-10877: I enabled assertions by specifying the following in my build.sbt:

javaOptions += "-ea"

and used sbt package to build. I also ran it with spark-submit instead of spark-shell, and I still don't see what you see:

$SPARK_HOME/bin/spark-submit --class "SparkFilterByKeyTest" --master local[2] target/scala-2.10/simple-project_2.10-1.0.jar
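A hedged note on the sbt setting above: sbt passes `javaOptions` only to a JVM that it forks, so without `fork := true` the `-ea` flag is ignored by `sbt run` and `sbt test`. The setting also has no effect on a packaged jar launched later with spark-submit, which starts its own JVM. There the flag would have to be supplied at launch, for example through spark-submit's `--driver-java-options` (with `--master local[2]`, tasks run inside the driver JVM). A minimal build.sbt fragment, assuming sbt 1.x:

```scala
// build.sbt (sketch; assumes sbt 1.x)
// Run `run` and `test` in a separate JVM so that javaOptions are applied.
fork := true
// Enable Java `assert` statements (such as the one in Murmur3_x86_32) in that JVM.
javaOptions += "-ea"
```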
[jira] [Commented] (SPARK-10877) Assertions fail straightforward DataFrame job due to word alignment
[ https://issues.apache.org/jira/browse/SPARK-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944150#comment-14944150 ] Matt Cheah commented on SPARK-10877: Does spark-submit enable assertions? I'm not sure how SBT passes these kinds of assertion options along. Also, what JDK / Java Version are you using, and what OS?
[jira] [Commented] (SPARK-10877) Assertions fail straightforward DataFrame job due to word alignment
[ https://issues.apache.org/jira/browse/SPARK-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936259#comment-14936259 ] Matt Cheah commented on SPARK-10877: Also, the error doesn't occur if I turn code generation off while keeping Tungsten on.