Re: Spark SQL - Exception only when using cacheTable

2014-10-13 Thread poiuytrez
This is how the table was created:

transactions = parts.map(lambda p: Row(customer_id=long(p[0]),
    chain=int(p[1]), dept=int(p[2]), category=int(p[3]), company=int(p[4]),
    brand=int(p[5]), date=str(p[6]), productsize=float(p[7]),
    productmeasure=str(p[8]), purchasequantity=int(p[9]),
    purchaseamount=float(p[10])))

# Infer the schema and register the SchemaRDD as a table
schemaTransactions = sqlContext.inferSchema(transactions)
schemaTransactions.registerTempTable("transactions")
sqlContext.cacheTable("transactions")

t = sqlContext.sql("SELECT * FROM transactions WHERE purchaseamount = 50")
t.count()
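
If schema inference is the suspect, one alternative worth trying is to pin the
column types down explicitly before caching. A minimal sketch, assuming the
Spark 1.1 Python API and the same parts RDD and column names as above:

from pyspark.sql import StructType, StructField, \
    LongType, IntegerType, StringType, DoubleType

# Explicit schema; the fields must line up positionally with the tuples below.
schema = StructType([
    StructField("customer_id", LongType(), True),
    StructField("chain", IntegerType(), True),
    StructField("dept", IntegerType(), True),
    StructField("category", IntegerType(), True),
    StructField("company", IntegerType(), True),
    StructField("brand", IntegerType(), True),
    StructField("date", StringType(), True),
    StructField("productsize", DoubleType(), True),
    StructField("productmeasure", StringType(), True),
    StructField("purchasequantity", IntegerType(), True),
    StructField("purchaseamount", DoubleType(), True),
])

# applySchema takes an RDD of tuples rather than Row objects.
rows = parts.map(lambda p: (long(p[0]), int(p[1]), int(p[2]), int(p[3]),
                            int(p[4]), int(p[5]), str(p[6]), float(p[7]),
                            str(p[8]), int(p[9]), float(p[10])))

schemaTransactions = sqlContext.applySchema(rows, schema)
schemaTransactions.registerTempTable("transactions")
sqlContext.cacheTable("transactions")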


Thank you,
poiuytrez







Re: Spark SQL - Exception only when using cacheTable

2014-10-10 Thread visakh
Can you check whether the table is actually being cached? You can use the
isCached method. More details are here:
http://spark.apache.org/docs/1.0.2/api/java/org/apache/spark/sql/SQLContext.html






Re: Spark SQL - Exception only when using cacheTable

2014-10-10 Thread Cheng Lian
Hi Poiuytrez, what version of Spark are you using? Exception details
like the stack trace are really needed to investigate this issue. You can
find them in the executor logs, or just browse the application
stderr/stdout links from the Spark Web UI.
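
A quick way to report the exact version from the PySpark shell, assuming your
build exposes the version property (otherwise it is printed in the shell
startup banner):

# Prints the running Spark version, e.g. '1.1.0'.
print(sc.version)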


On 10/9/14 9:37 PM, poiuytrez wrote:

Hello,

I have a weird issue; this query works fine:
sqlContext.sql("SELECT customer_id FROM transactions WHERE purchaseamount = 200").count()

However, when I cache the table before making the request:
sqlContext.cacheTable("transactions")
sqlContext.sql("SELECT customer_id FROM transactions WHERE purchaseamount = 200").count()

I am getting an exception on one of the tasks:
: org.apache.spark.SparkException: Job aborted due to stage failure: Task
120 in stage 104.0 failed 4 times, most recent failure: Lost task 120.3 in
stage 104.0 (TID 20537, spark-w-0.c.internal): java.lang.ClassCastException:

(I have no details after the ':')

Any ideas of what could be wrong?







Re: Spark SQL - Exception only when using cacheTable

2014-10-10 Thread poiuytrez
I am using the Python API. Unfortunately, I cannot find an isCached
equivalent in the SQLContext section of the documentation:
https://spark.apache.org/docs/1.1.0/api/python/index.html
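
One workaround, for reference, is to reach into the JVM SQLContext through
PySpark's internal handle. A sketch relying on the private _ssql_ctx
attribute, which is not public API and may change between releases:

# isCached is only exposed on the Scala/Java SQLContext; _ssql_ctx is
# PySpark-internal, so treat this as an unsupported workaround.
print(sqlContext._ssql_ctx.isCached("transactions"))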






Re: Spark SQL - Exception only when using cacheTable

2014-10-10 Thread poiuytrez
Lost task 120.3 in stage 7.0 (TID 2248, spark-w-0.c.db.internal): java.lang.ClassCastException:

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


Thank you






Spark SQL - Exception only when using cacheTable

2014-10-09 Thread poiuytrez
Hello, 

I have a weird issue; this query works fine:
sqlContext.sql("SELECT customer_id FROM transactions WHERE purchaseamount = 200").count()

However, when I cache the table before making the request:
sqlContext.cacheTable("transactions")
sqlContext.sql("SELECT customer_id FROM transactions WHERE purchaseamount = 200").count()

I am getting an exception on one of the tasks:
: org.apache.spark.SparkException: Job aborted due to stage failure: Task
120 in stage 104.0 failed 4 times, most recent failure: Lost task 120.3 in
stage 104.0 (TID 20537, spark-w-0.c.internal): java.lang.ClassCastException:

(I have no details after the ':')

Any ideas of what could be wrong?
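
A quick way to confirm that the cache itself is the trigger is to drop the
in-memory copy and re-run the query. A sketch, assuming the PySpark 1.1 API,
where uncacheTable is available:

# If the query succeeds again after uncaching, the fault lies in the
# in-memory columnar representation rather than in the query itself.
sqlContext.uncacheTable("transactions")
sqlContext.sql("SELECT customer_id FROM transactions WHERE purchaseamount = 200").count()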



