Unfortunately I can't share a snippet quickly, since the code is generated, but for now I can at least share the plan. (See it here - http://pastebin.dqd.cz/RAhm/)
After I increased spark.sql.autoBroadcastJoinThreshold from 100000 to 300000, it went through without any problems. With 100000 it always failed during the "planning" phase with the exception above.

2016-03-17 22:05 GMT+01:00 Jakob Odersky <ja...@odersky.com>:
> Can you share a snippet that reproduces the error? What was
> spark.sql.autoBroadcastJoinThreshold before your last change?
>
> On Thu, Mar 17, 2016 at 10:03 AM, Jiří Syrový <syrovy.j...@gmail.com> wrote:
> > Hi,
> >
> > any idea what could be causing this issue? It started appearing after
> > changing the parameter
> >
> > spark.sql.autoBroadcastJoinThreshold to 100000
> >
> > Caused by: java.lang.IllegalArgumentException: Can't zip RDDs with unequal
> > numbers of partitions
> >     at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:57)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> >     at scala.Option.getOrElse(Option.scala:120)
> >     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> >     at org.apache.spark.rdd.PartitionCoalescer.<init>(CoalescedRDD.scala:172)
> >     at org.apache.spark.rdd.CoalescedRDD.getPartitions(CoalescedRDD.scala:85)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> >     at scala.Option.getOrElse(Option.scala:120)
> >     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> >     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> >     at scala.Option.getOrElse(Option.scala:120)
> >     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> >     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> >     at scala.Option.getOrElse(Option.scala:120)
> >     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> >     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> >     at scala.Option.getOrElse(Option.scala:120)
> >     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> >     at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
> >     at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:220)
> >     at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
> >     at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
> >     at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
> >     ... 28 more
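For anyone else hitting this, the workaround above comes down to changing the broadcast-join threshold. A minimal sketch of how that setting can be applied, assuming an existing SparkContext named `sc` and the Spark 1.x-era SQLContext API this thread is about (the 300000-byte value is simply what worked here, not a general recommendation):

```scala
// Sketch only: assumes `sc` is an already-created SparkContext (Spark 1.x era).
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Tables whose estimated size falls below this many bytes are broadcast to
// all executors instead of shuffled. 300000 is the value that made the job
// above go through; 100000 triggered the planning-phase failure.
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "300000")

// Setting the threshold to -1 disables broadcast joins entirely, which can
// help confirm whether the broadcast path is what triggers the error:
// sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "-1")
```

The same key can also be passed at submit time (e.g. via `--conf spark.sql.autoBroadcastJoinThreshold=300000`), which avoids touching the generated code at all.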