[ 
https://issues.apache.org/jira/browse/SPARK-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591245#comment-14591245
 ] 

Nathan McCarthy edited comment on SPARK-6009 at 6/18/15 4:48 AM:
-----------------------------------------------------------------

Doesn't seem to affect just ORDER BY RAND().

Seeing the same issue in Spark 1.4 with a query that has a join and uses 
something like CLUSTER BY (which implicitly does a sortBy). Filed a separate 
issue for this: https://issues.apache.org/jira/browse/SPARK-8428


was (Author: nemccarthy):
Doesn't seem to affect just ORDER BY RAND().

Seeing the same issue in Spark 1.4 with a query that has a join and uses 
something like CLUSTER BY (which implicitly does a sortBy). Will raise another 
issue for this one.

> IllegalArgumentException thrown by TimSort when SQL ORDER BY RAND ()
> --------------------------------------------------------------------
>
>                 Key: SPARK-6009
>                 URL: https://issues.apache.org/jira/browse/SPARK-6009
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0, 1.2.1, 1.3.0, 1.4.0
>         Environment: Centos 7, Hadoop 2.6.0, Hive 0.15.0
> java version "1.7.0_75"
> OpenJDK Runtime Environment (rhel-2.5.4.2.el7_0-x86_64 u75-b13)
> OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
>            Reporter: Paul Barber
>
> Running the following SparkSQL query over JDBC:
> {noformat}
>    SELECT *
>     FROM FAA
>   WHERE Year >= 1998 AND Year <= 1999
>     ORDER BY RAND () LIMIT 100000
> {noformat}
> This results in one or more workers throwing the following exception, with 
> variations for {{mergeLo}} and {{mergeHi}}. 
> {noformat}
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeCollapse(TimSort.java:410)
>     at java.util.TimSort.sort(TimSort.java:214)
>     at java.util.Arrays.sort(Arrays.java:727)
>     at org.spark-project.guava.common.collect.Ordering.leastOf(Ordering.java:708)
>     at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>     at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1.apply(RDD.scala:1138)
>     at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1.apply(RDD.scala:1135)
>     at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:601)
>     at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:601)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>     at org.apache.spark.scheduler.Task.run(Task.scala:56)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We have tested with both Spark 1.2.0 and Spark 1.2.1 and have seen the same 
> error in both. The query sometimes succeeds, but fails more often than not. 
> Whilst this sounds similar to SPARK-3032 and SPARK-3656, we believe it is 
> not the same issue.
> The {{ORDER BY RAND ()}} uses TimSort to produce the random ordering by 
> sorting on a list of random values. Having spent some time looking at the 
> issue with jdb, the problem appears to be that the random values change 
> during the sort. The code that triggers this is in 
> {{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Row.scala}}
>  - class RowOrdering, function compare, line 250 - where a new random number 
> is drawn each time the same row is compared.
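
The root cause described above is a comparator that is not deterministic: compare(a, b) can return different answers for the same pair of rows, which breaks the invariants TimSort relies on. A minimal sketch of the standard workaround in plain Java (the class and method names here are hypothetical, and this is not Spark's actual fix) is to draw each row's random key exactly once before sorting, then compare only the cached keys:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class RandomOrderDemo {

    // Drawing a fresh random number inside compare() (as the RowOrdering code
    // path does) makes compare(a, b) inconsistent across calls -- exactly the
    // contract violation TimSort detects. The safe pattern: draw one random
    // key per row up front, then sort with a deterministic comparator.
    public static <T> List<T> orderByRand(List<T> rows, long seed) {
        Random rng = new Random(seed);
        // Decorate: pair each row with a random key drawn exactly once.
        List<Map.Entry<Double, T>> keyed = new ArrayList<>(rows.size());
        for (T row : rows) {
            keyed.add(new AbstractMap.SimpleEntry<>(rng.nextDouble(), row));
        }
        // Sort on the cached keys -- deterministic, so TimSort's invariants hold.
        keyed.sort((a, b) -> Double.compare(a.getKey(), b.getKey()));
        // Undecorate: strip the keys, keeping the randomized row order.
        List<T> out = new ArrayList<>(rows.size());
        for (Map.Entry<Double, T> e : keyed) {
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(i);
        }
        List<Integer> shuffled = orderByRand(rows, 42L);
        // The result is a permutation of the input, and no
        // IllegalArgumentException is thrown during the sort.
        if (shuffled.size() != rows.size()) throw new AssertionError("size");
        if (!new HashSet<>(shuffled).equals(new HashSet<>(rows))) throw new AssertionError("not a permutation");
        System.out.println("ok");
    }
}
```

This is the decorate-sort-undecorate pattern: because every comparison reads pre-computed keys, compare(a, b) is consistent across calls and TimSort's "general contract" holds.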



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
