[jira] [Created] (SPARK-2578) RIGHT OUTER JOIN causes ClassCastException

2014-07-18 Thread Christian Wuertz (JIRA)
Christian Wuertz created SPARK-2578:
---

 Summary: RIGHT OUTER JOIN causes ClassCastException
 Key: SPARK-2578
 URL: https://issues.apache.org/jira/browse/SPARK-2578
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0
Reporter: Christian Wuertz


When I run an HQL query that contains a right outer join, I always get this 
exception:

{code}
org.apache.spark.SparkDriverExecutionException: Execution error
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:849)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1231)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassCastException: scala.collection.mutable.HashSet 
cannot be cast to scala.collection.mutable.BitSet
at 
org.apache.spark.sql.execution.BroadcastNestedLoopJoin$$anonfun$7.apply(joins.scala:338)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:811)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:808)
at 
org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:845)
... 10 more
{code} 
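
The `Caused by` line is the informative part of this trace: a value whose runtime class is `scala.collection.mutable.HashSet` was downcast to `scala.collection.mutable.BitSet`. The sketch below is not Spark's code and makes no claim about how `BroadcastNestedLoopJoin` ends up holding a `HashSet`. It only illustrates, with the Scala standard library and a hypothetical `asBitSet` helper, how such a downcast compiles and then fails at runtime.

```scala
import scala.collection.mutable

object CastSketch {
  // Hypothetical helper (not part of Spark): downcasts a general
  // mutable.Set[Int] to mutable.BitSet. The compiler accepts this,
  // but the JVM checks the runtime class when the cast executes.
  def asBitSet(s: mutable.Set[Int]): mutable.BitSet =
    s.asInstanceOf[mutable.BitSet]

  def main(args: Array[String]): Unit = {
    // Same static type, different runtime classes.
    val bits: mutable.Set[Int] = mutable.BitSet(1, 2)
    val hashed: mutable.Set[Int] = mutable.HashSet(1, 2)

    // Succeeds: the runtime class really is BitSet.
    println(asBitSet(bits))

    // Fails: the runtime class is HashSet, so the cast throws.
    try {
      asBitSet(hashed)
    } catch {
      case e: ClassCastException => println("cast failed: " + e.getMessage)
    }
  }
}
```

Because the static type hides the concrete class, this kind of failure only appears on the code path that produces the unexpected collection, which would be consistent with the exception showing up for some join types and not others.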

This can easily be reproduced using the queries and data from this 
[tutorial|http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/].
 Only the last query was modified to use a right outer join.

{code}
hiveContext.hql("""SELECT a.year, a.player_id, a.runs FROM batting a JOIN (SELECT
year, max(runs) runs FROM batting GROUP BY year ) b ON (a.year = b.year AND
a.runs = b.runs)""").collect.foreach(println)
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (SPARK-2578) RIGHT OUTER JOIN causes ClassCastException

2014-07-18 Thread Christian Wuertz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Wuertz updated SPARK-2578:


Description: 
When I run an HQL query that contains a right outer join, I always get this 
exception:

{code}
org.apache.spark.SparkDriverExecutionException: Execution error
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:849)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1231)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassCastException: scala.collection.mutable.HashSet 
cannot be cast to scala.collection.mutable.BitSet
at 
org.apache.spark.sql.execution.BroadcastNestedLoopJoin$$anonfun$7.apply(joins.scala:338)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:811)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:808)
at 
org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:845)
... 10 more
{code} 

This can easily be reproduced using the queries and data from this 
[tutorial|http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/].
 Only the last query was modified to use a right outer join.

{code}
hiveContext.hql("""SELECT a.year, a.player_id, a.runs FROM batting a RIGHT OUTER
JOIN (SELECT year, max(runs) runs FROM batting GROUP BY year ) b ON (a.year =
b.year AND a.runs = b.runs)""").collect.foreach(println)
{code}



[jira] [Updated] (SPARK-2578) RIGHT OUTER JOIN causes ClassCastException

2014-07-18 Thread Christian Wuertz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Wuertz updated SPARK-2578:


Description: 
When I run an HQL query that contains a right outer join, I always get this 
exception:

{code}
org.apache.spark.SparkDriverExecutionException: Execution error
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:849)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1231)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassCastException: scala.collection.mutable.HashSet 
cannot be cast to scala.collection.mutable.BitSet
at 
org.apache.spark.sql.execution.BroadcastNestedLoopJoin$$anonfun$7.apply(joins.scala:338)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:811)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:808)
at 
org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:845)
... 10 more
{code} 

This can easily be reproduced using the queries and data from this 
[tutorial|http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/].
 Only the last query was modified to use a right outer join.

{code}
hiveContext.hql("""SELECT a.year, a.player_id, a.runs FROM batting a RIGHT OUTER
JOIN (SELECT year, max(runs) runs FROM batting GROUP BY year ) b ON (a.year =
b.year AND a.runs = b.runs)""").collect.foreach(println)
{code}

I compiled the 1.0.1 release myself with the following command: 
./make-distribution.sh --hadoop=2.2.0 --with-yarn --with-hive --tgz



[jira] [Updated] (SPARK-2578) OUTER JOINs cause ClassCastException

2014-07-18 Thread Christian Wuertz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Wuertz updated SPARK-2578:


Summary: OUTER JOINs cause ClassCastException  (was: RIGHT OUTER JOIN 
causes ClassCastException)





[jira] [Updated] (SPARK-2578) OUTER JOINs cause ClassCastException

2014-07-18 Thread Christian Wuertz (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Wuertz updated SPARK-2578:


Description: 
When I run an HQL query that contains a right or a left outer join, I always get 
this exception:

{code}
org.apache.spark.SparkDriverExecutionException: Execution error
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:849)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1231)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassCastException: scala.collection.mutable.HashSet 
cannot be cast to scala.collection.mutable.BitSet
at 
org.apache.spark.sql.execution.BroadcastNestedLoopJoin$$anonfun$7.apply(joins.scala:338)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:811)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:808)
at 
org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:845)
... 10 more
{code} 

This can easily be reproduced using the queries and data from this 
[tutorial|http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/].
 Only the last query was modified to use a right outer join.

{code}
hiveContext.hql("""SELECT a.year, a.player_id, a.runs FROM batting a RIGHT OUTER
JOIN (SELECT year, max(runs) runs FROM batting GROUP BY year ) b ON (a.year =
b.year AND a.runs = b.runs)""").collect.foreach(println)
{code}
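
The description now says that both right and left outer joins fail, but only the right outer variant is shown. As a rough sketch, the two variants could be generated from one template. The `OuterJoinQueries` object and `maxRunsQuery` helper are hypothetical names introduced here for illustration, and the left outer query is an assumption about what was run, not something quoted from this report.

```scala
object OuterJoinQueries {
  // Hypothetical helper: builds the tutorial's "max runs per year" query
  // with the given join keyword between the table and the subquery.
  def maxRunsQuery(joinType: String): String =
    s"SELECT a.year, a.player_id, a.runs FROM batting a $joinType " +
      "(SELECT year, max(runs) runs FROM batting GROUP BY year) b " +
      "ON (a.year = b.year AND a.runs = b.runs)"

  def main(args: Array[String]): Unit = {
    // Print both variants mentioned in the description.
    Seq("RIGHT OUTER JOIN", "LEFT OUTER JOIN").foreach { joinType =>
      println(maxRunsQuery(joinType))
    }
  }
}
```

With a `hiveContext` in scope as in the snippet above, each string could be passed to `hiveContext.hql(...)` followed by `.collect.foreach(println)` to check which join types trigger the exception.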

I compiled the 1.0.1 release myself with the following command: 
./make-distribution.sh --hadoop=2.2.0 --with-yarn --with-hive --tgz

