Michael, I should probably look more closely myself at the design of 1.2 vs 1.1, but I've been curious why Spark keeps its in-memory data on the JVM heap instead of storing it off-heap. Was that the optimization made in 1.2 to alleviate GC pressure?
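For what it's worth, even in 1.1/1.2 you can push cached RDD blocks off the JVM heap with StorageLevel.OFF_HEAP, which in these versions is backed by an external Tachyon store. A minimal sketch, assuming a Tachyon store is configured (spark.tachyonStore.url) and reusing the input path from the snippet below:

    import org.apache.spark.storage.StorageLevel

    // OFF_HEAP in Spark 1.x keeps cached blocks in Tachyon, outside the JVM
    // heap and away from the garbage collector.
    val events = sc.textFile("hdfs://hadoophost:8020/demo/poc/JoinCsv/output_2")
    events.persist(StorageLevel.OFF_HEAP)

    // An on-heap alternative that also eases GC: cache serialized, so each
    // partition is one big byte array instead of many small objects.
    // (An RDD can only have one storage level, so pick one or the other.)
    // events.persist(StorageLevel.MEMORY_ONLY_SER)

    events.count()  // first action materializes the cache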
On Mon, Nov 3, 2014 at 8:52 PM, Shailesh Birari <sbir...@wynyardgroup.com> wrote:
> Yes, I am using Spark 1.1.0 and have used rdd.registerTempTable().
> I tried adding sqlContext.cacheTable(), but it took 59 seconds (more than earlier).
>
> I also tried changing the schema to use the Long data type in some fields, but the conversion seems to take more time.
> Is there any way to specify an index? I checked and didn't find any; I just want to confirm.
>
> For your reference, here is the snippet of code.
>
> -----------------------------------------------------------------------------------------------------------------
> case class EventDataTbl(EventUID: Long,
>                         ONum: Long,
>                         RNum: Long,
>                         Timestamp: java.sql.Timestamp,
>                         Duration: String,
>                         Type: String,
>                         Source: String,
>                         OName: String,
>                         RName: String)
>
> val format = new java.text.SimpleDateFormat("yyyy-MM-dd hh:mm:ss")
> val cedFileName = "hdfs://hadoophost:8020/demo/poc/JoinCsv/output_2"
> val cedRdd = sc.textFile(cedFileName).map(_.split(",", -1)).map(p =>
>   EventDataTbl(p(0).toLong, p(1).toLong, p(2).toLong,
>     new java.sql.Timestamp(format.parse(p(3)).getTime()),
>     p(4), p(5), p(6), p(7), p(8)))
>
> cedRdd.registerTempTable("EventDataTbl")
> sqlCntxt.cacheTable("EventDataTbl")
>
> val t1 = System.nanoTime()
> println("\n\n10 Most frequent conversations between the Originators and Recipients\n")
> sql("SELECT COUNT(*) AS Frequency,ONum,OName,RNum,RName FROM EventDataTbl GROUP BY ONum,OName,RNum,RName ORDER BY Frequency DESC LIMIT 10").collect().foreach(println)
> val t2 = System.nanoTime()
> println("Time taken " + (t2-t1)/1000000000.0 + " Seconds")
> -----------------------------------------------------------------------------------------------------------------
>
> Thanks,
> Shailesh
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-takes-unexpected-time-tp17925p18017.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
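One note on the timing in the snippet above: in Spark 1.1, sqlContext.cacheTable() is lazy, so the first query after it also pays for scanning the text file, parsing, and building the in-memory columnar cache, which is likely why the cached run looked slower. A minimal sketch of separating those costs, reusing the names from the snippet (the warm-up COUNT(*) query is just illustrative):

    // Register and mark the table for columnar caching (lazy in 1.1).
    cedRdd.registerTempTable("EventDataTbl")
    sqlCntxt.cacheTable("EventDataTbl")

    // Force materialization once so later timings measure only query execution.
    sqlCntxt.sql("SELECT COUNT(*) FROM EventDataTbl").collect()

    val t1 = System.nanoTime()
    sqlCntxt.sql("SELECT COUNT(*) AS Frequency, ONum, OName, RNum, RName " +
      "FROM EventDataTbl GROUP BY ONum, OName, RNum, RName " +
      "ORDER BY Frequency DESC LIMIT 10").collect().foreach(println)
    val t2 = System.nanoTime()
    println("Time taken " + (t2 - t1) / 1000000000.0 + " Seconds")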