Re: can't union two rdds

2015-03-31 Thread ankurjain.nitrr
An RDD union will result in:

  1 2 
  3 4 
  5 6 
  7 8 
  9 10 
  11 12

What you are trying to do is a join.
A join needs some logic or key to match rows on.

I think in your case you want the order (index) of the rows to be the join key.
An RDD is a distributed data structure and is not well suited to your case.

If the amount of data is small, you can call rdd.collect on both RDDs, then
iterate over the two lists together and produce the desired result.
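
A minimal sketch of that collect-and-iterate idea, in plain Python. The two
lists stand in for what rdd.collect would return on each RDD; their contents,
the names left_rows and right_rows, and the helper pair_by_position are all
assumptions made for illustration, since the original data is not shown in
this thread. It also assumes the desired result is each row of the first RDD
placed next to the row at the same position in the second.

```python
# Hypothetical stand-ins for rdd1.collect() and rdd2.collect().
# The real data is not shown in the thread; these values are assumed.
left_rows = [(1, 2), (3, 4), (5, 6)]
right_rows = [(7, 8), (9, 10), (11, 12)]


def pair_by_position(left, right):
    """Combine two collected lists row by row, using position as the key."""
    if len(left) != len(right):
        # Pairing by position only makes sense when both sides line up.
        raise ValueError("both lists must have the same number of rows")
    # zip walks both lists in step; tuple concatenation merges each pair.
    return [l + r for l, r in zip(left, right)]


combined = pair_by_position(left_rows, right_rows)
for row in combined:
    print(*row)
```

This only works when both datasets fit in driver memory, which is why it is
suggested for small data only.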



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/can-t-union-two-rdds-tp22320p22323.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark SQL query fails with executor lost / out of memory exception while caching a table

2015-03-31 Thread ankurjain.nitrr
Hi,

I am using Spark 1.2.1, and I query my data through the Thrift server.

Executing the query CACHE TABLE tablename fails with this exception:

Error: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 10.0 failed 4 times, most recent failure: Lost task 0.3 in
stage 10.0 (TID 41, bbr-dev178): ExecutorLostFailure (executor 12 lost)

and sometimes

Error: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 8.0 failed 4 times, most recent failure: Lost task 0.3 in
stage 8.0 (TID 33, bbr-dev178): java.lang.OutOfMemoryError: Java heap space


I understand that my executors are running out of memory during caching
and are therefore getting killed.

My questions are:

Is there a way to make the Thrift server spill the data to disk if it is not
able to keep the entire dataset in memory?
Can I change the storage level that the Spark SQL Thrift server uses for caching?

I don't want my executors to be lost or my cache queries to fail.

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-sql-query-fails-with-executor-lost-out-of-memory-expection-while-caching-a-table-tp22322.html