Re: Spark SQL takes unexpected time

Shailesh Birari Mon, 03 Nov 2014 17:54:07 -0800

Yes, I am using Spark 1.1.0 and have registered the RDD with rdd.registerTempTable().
I tried adding sqlContext.cacheTable(), but the query then took 59 seconds (longer
than before).


I also tried changing the schema to use the Long data type for some fields, but
the conversion seems to take more time.
Is there any way to specify an index? I checked and didn't find one, but I
just want to confirm.

For your reference here is the snippet of code.

-----------------------------------------------------------------------------------------------------------------
// Assumes sc is the SparkContext, sqlCntxt is a SQLContext built on it, and
// `import sqlCntxt._` is in scope (needed for registerTempTable and sql).
case class EventDataTbl(EventUID: Long,
                        ONum: Long,
                        RNum: Long,
                        Timestamp: java.sql.Timestamp,
                        Duration: String,
                        Type: String,
                        Source: String,
                        OName: String,
                        RName: String)

// Note: "hh" is the 12-hour clock field; 24-hour timestamps would need "HH".
val format = new java.text.SimpleDateFormat("yyyy-MM-dd hh:mm:ss")
val cedFileName = "hdfs://hadoophost:8020/demo/poc/JoinCsv/output_2"

// Split each CSV line (keeping trailing empty fields) and map it to the case class.
val cedRdd = sc.textFile(cedFileName).map(_.split(",", -1)).map(p =>
  EventDataTbl(p(0).toLong, p(1).toLong, p(2).toLong,
    new java.sql.Timestamp(format.parse(p(3)).getTime()),
    p(4), p(5), p(6), p(7), p(8)))

cedRdd.registerTempTable("EventDataTbl")
sqlCntxt.cacheTable("EventDataTbl")

val t1 = System.nanoTime()
println("\n\n10 Most frequent conversations between the Originators and Recipients\n")
sql("SELECT COUNT(*) AS Frequency,ONum,OName,RNum,RName FROM EventDataTbl " +
    "GROUP BY ONum,OName,RNum,RName ORDER BY Frequency DESC LIMIT 10").collect().foreach(println)
val t2 = System.nanoTime()
println("Time taken " + (t2-t1)/1000000000.0 + " Seconds")

-----------------------------------------------------------------------------------------------------------------
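
A note on the timing in the snippet above: cacheTable() is generally understood to be lazy in Spark 1.1, so the first query after it may also pay for reading the file, parsing each line, and building the in-memory cache. If that assumption holds, timing the same query twice would separate the one-time caching cost from the steady-state query cost. The following is only a minimal sketch of that measurement in plain Scala; the `timed` helper and the `runQuery` stand-in are hypothetical names, not part of the original code or of Spark.

```scala
// Hypothetical helper (not from the original post): runs a block and
// returns its result together with the elapsed wall-clock time in seconds.
def timed[A](block: => A): (A, Double) = {
  val start = System.nanoTime()
  val result = block
  val seconds = (System.nanoTime() - start) / 1e9
  (result, seconds)
}

// Stand-in for the real query so the sketch runs without a cluster.
// In the Spark shell this would be something like:
//   () => sql("SELECT ... FROM EventDataTbl ...").collect()
val runQuery: () => Array[Int] = () => Array(3, 1, 2).sorted

// Run the same query twice. Under the laziness assumption, the first run
// would include cache materialization and the second would not.
val (firstRows, firstSecs) = timed(runQuery())
val (secondRows, secondSecs) = timed(runQuery())

println(f"First run:  $firstSecs%.3f seconds")
println(f"Second run: $secondSecs%.3f seconds")
```

If the second run is much faster than the first, the 59 seconds would mostly reflect loading and caching rather than the GROUP BY itself.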

Thanks,
  Shailesh



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-takes-unexpected-time-tp17925p18017.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
