Re: Measure Bytes Read and Peak Memory Usage for Query

2015-03-24 Thread anamika gupta
Yeah thanks, I can now see the memory usage.

Could you also verify whether Bytes Read == the combined size of all RDDs?

Actually, all my RDDs are completely cached in memory, so the combined size of
my RDDs equals the memory used (verified from the WebUI).
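
For the archive: Bytes Read comes from the input metrics and counts data as read
from the source (e.g. serialized files on HDFS), so it generally differs from the
deserialized in-memory size of cached RDDs. A minimal spark-shell sketch for
checking the cached size programmatically, assuming a Spark 1.x-era SparkContext
available as sc:

// Sum the in-memory footprint of every currently cached RDD
val cachedBytes = sc.getRDDStorageInfo.map(_.memSize).sum
println(f"Cached RDD memory: ${cachedBytes / 1024.0 / 1024.0}%.1f MB")

// Per-executor storage memory: (max available for caching, remaining)
sc.getExecutorMemoryStatus.foreach { case (executor, (max, remaining)) =>
  println(f"$executor used ${(max - remaining) / 1024.0 / 1024.0}%.1f of ${max / 1024.0 / 1024.0}%.1f MB")
}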


On Fri, Mar 20, 2015 at 12:07 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 You could cache the RDDs and check the memory usage under the Storage tab in
 the driver UI (it runs on port 4040).

 Thanks
 Best Regards

 On Fri, Mar 20, 2015 at 12:02 PM, anu anamika.guo...@gmail.com wrote:

 Hi All

 I would like to measure Bytes Read and Peak Memory Usage for a Spark SQL
 Query.

 Please clarify whether Bytes Read equals the aggregate size of all RDDs.
 All my RDDs are in memory, with 0 B spilled to disk.

 I am also unsure how to measure Peak Memory Usage.
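
One way to collect Bytes Read per query programmatically is a SparkListener that
aggregates task metrics while the query runs; Spark 1.x exposes no direct
peak-memory metric per task, so bytes spilled is used as a rough proxy here.
A sketch, assuming the Spark 1.x listener API (inputMetrics is an Option in
those versions):

import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Aggregate input bytes and memory spill across all tasks of the jobs that run next
val bytesRead = new AtomicLong(0L)
val memorySpilled = new AtomicLong(0L)

sc.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {
      m.inputMetrics.foreach(im => bytesRead.addAndGet(im.bytesRead)) // Option[InputMetrics] in Spark 1.x
      memorySpilled.addAndGet(m.memoryBytesSpilled)
    }
  }
})

// ... run the SQL query here, then inspect the counters:
// println(s"bytes read = ${bytesRead.get}, memory spilled = ${memorySpilled.get}")

Register the listener before running the query, then read the counters once the
job finishes.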








Re: Optimizing SQL Query

2015-03-09 Thread anamika gupta
Please find the query plan below:

scala> val q2 = sqlContext.sql("SELECT dw.DAY_OF_WEEK, dw.HOUR, avg(dw.SDP_USAGE) AS
AVG_SDP_USAGE FROM (SELECT sdp.WID, DAY_OF_WEEK, HOUR, SUM(INTERVAL_VALUE)
AS SDP_USAGE FROM (SELECT * FROM date_d AS dd JOIN interval_f AS intf ON
intf.DATE_WID = dd.WID WHERE intf.DATE_WID >= 20150101 AND intf.DATE_WID <=
20150110 AND CAST(INTERVAL_END_TIME AS STRING) >= '2015-01-01 00:00:00.000'
AND CAST(INTERVAL_END_TIME AS STRING) <= '2015-01-10 00:00:00.000' AND
MEAS_WID = 3) AS test JOIN sdp_d AS sdp ON test.SDP_WID = sdp.WID WHERE
sdp.UDC_ID = 'SP-168451834' GROUP BY sdp.WID, DAY_OF_WEEK, HOUR) AS dw
GROUP BY dw.DAY_OF_WEEK, dw.HOUR")


q2: org.apache.spark.sql.SchemaRDD = SchemaRDD[36] at RDD at
SchemaRDD.scala:103

== Query Plan ==
== Physical Plan ==

Aggregate false, [DAY_OF_WEEK#3,HOUR#43L], [DAY_OF_WEEK#3,HOUR#43L,(CAST(SUM(PartialSum#133), DoubleType) / CAST(SUM(PartialCount#134L), DoubleType)) AS AVG_SDP_USAGE#126]
 Exchange (HashPartitioning [DAY_OF_WEEK#3,HOUR#43L], 200)
  Aggregate true, [DAY_OF_WEEK#3,HOUR#43L], [DAY_OF_WEEK#3,HOUR#43L,COUNT(SDP_USAGE#130) AS PartialCount#134L,SUM(SDP_USAGE#130) AS PartialSum#133]
   Project [DAY_OF_WEEK#3,HOUR#43L,SDP_USAGE#130]
    Aggregate false, [WID#49,DAY_OF_WEEK#3,HOUR#43L], [WID#49,DAY_OF_WEEK#3,HOUR#43L,SUM(PartialSum#136) AS SDP_USAGE#130]
     Exchange (HashPartitioning [WID#49,DAY_OF_WEEK#3,HOUR#43L], 200)
      Aggregate true, [WID#49,DAY_OF_WEEK#3,HOUR#43L], [...
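
For anyone profiling a plan like the one above: caching the scanned tables and
trimming the default 200 shuffle partitions are cheap first steps, and
queryExecution prints the full plan pipeline. A sketch, assuming the Spark 1.x
SQLContext and the table names registered earlier in this thread; the COUNT
query is only a stand-in:

// Cache the tables the query scans repeatedly, then re-run and compare timings
sqlContext.cacheTable("date_d")
sqlContext.cacheTable("interval_f")
sqlContext.cacheTable("sdp_d")

// Inspect the parsed, analyzed, optimized and physical plans of any query
val probe = sqlContext.sql("SELECT COUNT(*) FROM interval_f WHERE MEAS_WID = 3")
println(probe.queryExecution)

// 200 shuffle partitions (the default) is often too many for small aggregates
sqlContext.setConf("spark.sql.shuffle.partitions", "50")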


Re: Facing error: java.lang.ArrayIndexOutOfBoundsException while executing SparkSQL join query

2015-02-28 Thread anamika gupta
The issue is now resolved.

One of the CSV files had a malformed record at the end.
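
For anyone hitting the same thing, a guard on the field count before
constructing objects avoids the ArrayIndexOutOfBoundsException in the first
place. A minimal sketch, assuming the 29-column Sdp_d.csv layout used later in
this thread:

// Drop rows that do not have the expected number of fields before building objects
val expectedFields = 29
val rows = sc.textFile("/home/cdhuser/Desktop/Sdp_d.csv")
  .map(_.split(",", -1))                // limit -1 keeps trailing empty fields
  .filter(_.length == expectedFields)   // silently drops malformed records; count them with an accumulator if needed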

On Fri, Feb 27, 2015 at 4:24 PM, anamika gupta anamika.guo...@gmail.com
wrote:

 I have three tables with the following schema:

 case class date_d(WID: Int, CALENDAR_DATE: java.sql.Timestamp,
 DATE_STRING: String, DAY_OF_WEEK: String, DAY_OF_MONTH: Int, DAY_OF_YEAR: Int,
 END_OF_MONTH_FLAG: String, YEARWEEK: Int, CALENDAR_MONTH: String,
 MONTH_NUM: Int, YEARMONTH: Int, QUARTER: Int, YEAR: Int)



 case class interval_f(ORG_ID: Int, CHANNEL_WID: Int, SDP_WID: Int,
 MEAS_WID: Int, DATE_WID: Int, TIME_WID: Int, VALIDATION_STATUS_CD: Int,
 VAL_FAIL_CD: Int, INTERVAL_FLAG_CD: Int, CHANGE_METHOD_WID: Int,
 SOURCE_LAST_UPD_TIME: java.sql.Timestamp, INTERVAL_END_TIME: java.sql.Timestamp,
 LOCKED: String, EXT_VERSION_TIME: java.sql.Timestamp, INTERVAL_VALUE: Double,
 INSERT_TIME: java.sql.Timestamp, LAST_UPD_TIME: java.sql.Timestamp)



 class sdp_d(WID: Option[Int], BATCH_ID: Option[Int], SRC_ID: Option[String],
 ORG_ID: Option[Int], CLASS_WID: Option[Int], DESC_TEXT: Option[String],
 PREMISE_WID: Option[Int], FEED_LOC: Option[String], GPS_LAT: Option[Double],
 GPS_LONG: Option[Double], PULSE_OUTPUT_BLOCK: Option[String], UDC_ID: Option[String],
 UNIVERSAL_ID: Option[String], IS_VIRTUAL_FLG: Option[String], SEAL_INFO: Option[String],
 ACCESS_INFO: Option[String], ALT_ACCESS_INFO: Option[String], LOC_INFO: Option[String],
 ALT_LOC_INFO: Option[String], TYPE: Option[String], SUB_TYPE: Option[String],
 TIMEZONE_ID: Option[Int], GIS_ID: Option[String], BILLED_UPTO_TIME: Option[java.sql.Timestamp],
 POWER_STATUS: Option[String], LOAD_STATUS: Option[String], BILLING_HOLD_STATUS: Option[String],
 INSERT_TIME: Option[java.sql.Timestamp], LAST_UPD_TIME: Option[java.sql.Timestamp]) extends Product {

   @throws(classOf[IndexOutOfBoundsException])
   override def productElement(n: Int) = n match {
     case 0 => WID; case 1 => BATCH_ID; case 2 => SRC_ID; case 3 => ORG_ID
     case 4 => CLASS_WID; case 5 => DESC_TEXT; case 6 => PREMISE_WID; case 7 => FEED_LOC
     case 8 => GPS_LAT; case 9 => GPS_LONG; case 10 => PULSE_OUTPUT_BLOCK; case 11 => UDC_ID
     case 12 => UNIVERSAL_ID; case 13 => IS_VIRTUAL_FLG; case 14 => SEAL_INFO; case 15 => ACCESS_INFO
     case 16 => ALT_ACCESS_INFO; case 17 => LOC_INFO; case 18 => ALT_LOC_INFO; case 19 => TYPE
     case 20 => SUB_TYPE; case 21 => TIMEZONE_ID; case 22 => GIS_ID; case 23 => BILLED_UPTO_TIME
     case 24 => POWER_STATUS; case 25 => LOAD_STATUS; case 26 => BILLING_HOLD_STATUS
     case 27 => INSERT_TIME; case 28 => LAST_UPD_TIME
     case _ => throw new IndexOutOfBoundsException(n.toString())
   }

   override def productArity: Int = 29
   override def canEqual(that: Any): Boolean = that.isInstanceOf[sdp_d]
 }



 Non-join queries work fine:

 val q1 = sqlContext.sql("SELECT YEAR, DAY_OF_YEAR, MAX(WID), MIN(WID),
 COUNT(*) FROM date_d GROUP BY YEAR, DAY_OF_YEAR ORDER BY YEAR, DAY_OF_YEAR")

 res4: Array[org.apache.spark.sql.Row] =
 Array([2014,305,20141101,20141101,1], [2014,306,20141102,20141102,1],
 [2014,307,20141103,20141103,1], [2014,308,20141104,20141104,1],
 [2014,309,20141105,20141105,1], [2014,310,20141106,20141106,1],
 [2014,311,20141107,20141107,1], [2014,312,20141108,20141108,1],
 [2014,313,20141109,20141109,1], [2014,314,20141110,20141110,1],
 [2014,315,20141111,20141111,1], [2014,316,20141112,20141112,1],
 [2014,317,20141113,20141113,1], [2014,318,20141114,20141114,1],
 [2014,319,20141115,20141115,1], [2014,320,20141116,20141116,1],
 [2014,321,20141117,20141117,1], [2014,322,20141118,20141118,1],
 [2014,323,20141119,20141119,1], [2014,324,20141120,20141120,1],
 [2014,325,20141121,20141121,1], [2014,326,20141122,20141122,1],
 [2014,327,20141123,20141123,1], [2014,328,20141...



 But the join queries throw this error:
 java.lang.ArrayIndexOutOfBoundsException

 scala> val q = sqlContext.sql("select * from date_d dd join interval_f
 intf on intf.DATE_WID = dd.WID WHERE intf.DATE_WID >= 20141101 AND
 intf.DATE_WID <= 20141110")

 q: org.apache.spark.sql.SchemaRDD =
 SchemaRDD[38] at RDD at SchemaRDD.scala:103
 == Query Plan ==
 == Physical Plan ==
 Project
 [WID#0,CALENDAR_DATE#1,DATE_STRING#2,DAY_OF_WEEK#3,DAY_OF_MONTH#4,DAY_OF_YEAR#5,END_OF_MONTH_FLAG#6,YEARWEEK#7,CALENDAR_MONTH#8,MONTH_NUM#9,YEARMONTH#10,QUARTER#11,YEAR#12,ORG_ID#13,CHANNEL_WID#14,SDP_WID#15,MEAS_WID#16,DATE_WID#17,TIME_WID#18,VALIDATION_STATUS_CD#19,VAL_FAIL_CD#20,INTERVAL_FLAG_CD#21,CHANGE_METHOD_WID#22,SOURCE_LAST_UPD_TIME#23,INTERVAL_END_TIME#24,LOCKED#25,EXT_VERSION_TIME#26,INTERVAL_VALUE#27,INSERT_TIME#28,LAST_UPD_TIME#29]
  ShuffledHashJoin [WID#0], [DATE_WID#17], BuildRight
   Exchange (HashPartitioning [WID#0], 200)
    InMemoryColumnarTableScan
 [WID#0,CALENDAR_DATE#1,DATE_STRING#2,DAY_OF_WEEK#3,DAY_OF_MONTH#4,DAY_OF_YEAR#5,END_OF_MONTH_FLA...


 scala> q.take(5).foreach(println)

 15/02/27 15:50:26 INFO SparkContext: Starting job: runJob at
 basicOperators.scala:136
 15/02/27 15:50:26 INFO DAGScheduler: Registering RDD 46

Facing error: java.lang.ArrayIndexOutOfBoundsException while executing SparkSQL join query

2015-02-27 Thread anamika gupta
I have three tables with the following schema:

case class date_d(WID: Int, CALENDAR_DATE: java.sql.Timestamp,
DATE_STRING: String, DAY_OF_WEEK: String, DAY_OF_MONTH: Int, DAY_OF_YEAR: Int,
END_OF_MONTH_FLAG: String, YEARWEEK: Int, CALENDAR_MONTH: String,
MONTH_NUM: Int, YEARMONTH: Int, QUARTER: Int, YEAR: Int)



case class interval_f(ORG_ID: Int, CHANNEL_WID: Int, SDP_WID: Int,
MEAS_WID: Int, DATE_WID: Int, TIME_WID: Int, VALIDATION_STATUS_CD: Int,
VAL_FAIL_CD: Int, INTERVAL_FLAG_CD: Int, CHANGE_METHOD_WID: Int,
SOURCE_LAST_UPD_TIME: java.sql.Timestamp, INTERVAL_END_TIME: java.sql.Timestamp,
LOCKED: String, EXT_VERSION_TIME: java.sql.Timestamp, INTERVAL_VALUE: Double,
INSERT_TIME: java.sql.Timestamp, LAST_UPD_TIME: java.sql.Timestamp)



class sdp_d(WID: Option[Int], BATCH_ID: Option[Int], SRC_ID: Option[String],
ORG_ID: Option[Int], CLASS_WID: Option[Int], DESC_TEXT: Option[String],
PREMISE_WID: Option[Int], FEED_LOC: Option[String], GPS_LAT: Option[Double],
GPS_LONG: Option[Double], PULSE_OUTPUT_BLOCK: Option[String], UDC_ID: Option[String],
UNIVERSAL_ID: Option[String], IS_VIRTUAL_FLG: Option[String], SEAL_INFO: Option[String],
ACCESS_INFO: Option[String], ALT_ACCESS_INFO: Option[String], LOC_INFO: Option[String],
ALT_LOC_INFO: Option[String], TYPE: Option[String], SUB_TYPE: Option[String],
TIMEZONE_ID: Option[Int], GIS_ID: Option[String], BILLED_UPTO_TIME: Option[java.sql.Timestamp],
POWER_STATUS: Option[String], LOAD_STATUS: Option[String], BILLING_HOLD_STATUS: Option[String],
INSERT_TIME: Option[java.sql.Timestamp], LAST_UPD_TIME: Option[java.sql.Timestamp]) extends Product {

  @throws(classOf[IndexOutOfBoundsException])
  override def productElement(n: Int) = n match {
    case 0 => WID; case 1 => BATCH_ID; case 2 => SRC_ID; case 3 => ORG_ID
    case 4 => CLASS_WID; case 5 => DESC_TEXT; case 6 => PREMISE_WID; case 7 => FEED_LOC
    case 8 => GPS_LAT; case 9 => GPS_LONG; case 10 => PULSE_OUTPUT_BLOCK; case 11 => UDC_ID
    case 12 => UNIVERSAL_ID; case 13 => IS_VIRTUAL_FLG; case 14 => SEAL_INFO; case 15 => ACCESS_INFO
    case 16 => ALT_ACCESS_INFO; case 17 => LOC_INFO; case 18 => ALT_LOC_INFO; case 19 => TYPE
    case 20 => SUB_TYPE; case 21 => TIMEZONE_ID; case 22 => GIS_ID; case 23 => BILLED_UPTO_TIME
    case 24 => POWER_STATUS; case 25 => LOAD_STATUS; case 26 => BILLING_HOLD_STATUS
    case 27 => INSERT_TIME; case 28 => LAST_UPD_TIME
    case _ => throw new IndexOutOfBoundsException(n.toString())
  }

  override def productArity: Int = 29
  override def canEqual(that: Any): Boolean = that.isInstanceOf[sdp_d]
}



Non-join queries work fine:

val q1 = sqlContext.sql("SELECT YEAR, DAY_OF_YEAR, MAX(WID), MIN(WID),
COUNT(*) FROM date_d GROUP BY YEAR, DAY_OF_YEAR ORDER BY YEAR, DAY_OF_YEAR")

res4: Array[org.apache.spark.sql.Row] =
Array([2014,305,20141101,20141101,1], [2014,306,20141102,20141102,1],
[2014,307,20141103,20141103,1], [2014,308,20141104,20141104,1],
[2014,309,20141105,20141105,1], [2014,310,20141106,20141106,1],
[2014,311,20141107,20141107,1], [2014,312,20141108,20141108,1],
[2014,313,20141109,20141109,1], [2014,314,20141110,20141110,1],
[2014,315,20141111,20141111,1], [2014,316,20141112,20141112,1],
[2014,317,20141113,20141113,1], [2014,318,20141114,20141114,1],
[2014,319,20141115,20141115,1], [2014,320,20141116,20141116,1],
[2014,321,20141117,20141117,1], [2014,322,20141118,20141118,1],
[2014,323,20141119,20141119,1], [2014,324,20141120,20141120,1],
[2014,325,20141121,20141121,1], [2014,326,20141122,20141122,1],
[2014,327,20141123,20141123,1], [2014,328,20141...



But the join queries throw this error:
java.lang.ArrayIndexOutOfBoundsException

scala> val q = sqlContext.sql("select * from date_d dd join interval_f
intf on intf.DATE_WID = dd.WID WHERE intf.DATE_WID >= 20141101 AND
intf.DATE_WID <= 20141110")

q: org.apache.spark.sql.SchemaRDD =
SchemaRDD[38] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project
[WID#0,CALENDAR_DATE#1,DATE_STRING#2,DAY_OF_WEEK#3,DAY_OF_MONTH#4,DAY_OF_YEAR#5,END_OF_MONTH_FLAG#6,YEARWEEK#7,CALENDAR_MONTH#8,MONTH_NUM#9,YEARMONTH#10,QUARTER#11,YEAR#12,ORG_ID#13,CHANNEL_WID#14,SDP_WID#15,MEAS_WID#16,DATE_WID#17,TIME_WID#18,VALIDATION_STATUS_CD#19,VAL_FAIL_CD#20,INTERVAL_FLAG_CD#21,CHANGE_METHOD_WID#22,SOURCE_LAST_UPD_TIME#23,INTERVAL_END_TIME#24,LOCKED#25,EXT_VERSION_TIME#26,INTERVAL_VALUE#27,INSERT_TIME#28,LAST_UPD_TIME#29]
 ShuffledHashJoin [WID#0], [DATE_WID#17], BuildRight
  Exchange (HashPartitioning [WID#0], 200)
   InMemoryColumnarTableScan
[WID#0,CALENDAR_DATE#1,DATE_STRING#2,DAY_OF_WEEK#3,DAY_OF_MONTH#4,DAY_OF_YEAR#5,END_OF_MONTH_FLA...


scala> q.take(5).foreach(println)

15/02/27 15:50:26 INFO SparkContext: Starting job: runJob at
basicOperators.scala:136
15/02/27 15:50:26 INFO DAGScheduler: Registering RDD 46 (mapPartitions at
Exchange.scala:48)
15/02/27 15:50:26 INFO FileInputFormat: Total input paths to process : 1
15/02/27 15:50:26 INFO DAGScheduler: Registering RDD 42 (mapPartitions at
Exchange.scala:48)
15/02/27 15:50:26 INFO DAGScheduler: Got job 2 

Re: Facing error while extending scala class with Product interface to overcome limit of 22 fields in spark-shell

2015-02-26 Thread anamika gupta
Hi Patrick

Thanks a ton for your in-depth answer. The compilation error is now
resolved.

Thanks a lot again !!

On Thu, Feb 26, 2015 at 2:40 PM, Patrick Varilly 
patrick.vari...@dataminded.be wrote:

 Hi, Akhil,

 In your definition of sdp_d
 (http://stackoverflow.com/questions/28689186/facing-error-while-extending-scala-class-with-product-interface-to-overcome-limi),
 all your fields are of type Option[X]. In Scala, a value of type Option[X]
 can hold one of two things:

 1. None
 2. Some(x), where x is of type X

 So to fix your immediate problem, wrap all your parameters to the sdp_d
 constructor in Some(...), as follows:

new sdp_d(Some(r(0).trim.toInt), Some(r(1).trim.toInt),
 Some(r(2).trim), ...
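
To keep the 29-argument call readable, one option is a small helper that parses
a field and lifts it into an Option; the opt helper below is made up for
illustration and is not from the original thread:

// Hypothetical helper: parse a field and wrap it, turning blank strings into None
def opt[T](raw: String)(parse: String => T): Option[T] = {
  val s = raw.trim
  if (s.isEmpty) None else Some(parse(s))
}

// Example on a few fields of a split CSV record r
val r = Array("42", "", "SP-168451834")
val wid: Option[Int]      = opt(r(0))(_.toInt)   // Some(42)
val batchId: Option[Int]  = opt(r(1))(_.toInt)   // None: the field was blank
val srcId: Option[String] = opt(r(2))(identity)  // Some("SP-168451834")

Applying the same pattern to all 29 positions yields arguments of the Option[...]
types the sdp_d constructor expects, and blank CSV fields become None instead of
throwing.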

 As for your earlier question, why sdp_d(...) works for a case class but you
 need to write new sdp_d(...) for an explicit class, there's a simple answer.
 When you create a case class X in Scala, the compiler also generates a
 companion object X behind the scenes with an apply method that calls new
 (see below). Scala's rules invoke this apply method automatically, so when
 you write X(...) you're really calling X.apply(...), which in turn calls
 new X(...). (This is the same trick behind writing things like List(1,2,3).)
 If you don't use a case class, you have to write the companion object
 yourself explicitly.

 For reference, this statement:

case class X(a: A, b: B)

 is conceptually equivalent to

class X(val a: A, val b: B) extends ... {

   override def toString: String = // Auto-generated
   override def hashCode: Int = // Auto-generated
   override def equals(that: Any): Boolean = // Auto-generated

   ... more convenience methods ...
}

object X {
   def apply(a: A, b: B) = new X(a, b)
   ... more convenience methods ...
}
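
As a quick check of that equivalence, a toy case class shows both construction
forms side by side (Point is just an example, not a class from this thread):

case class Point(x: Int, y: Double)

val p1 = Point(1, 2.0)      // sugar for Point.apply(1, 2.0)
val p2 = new Point(1, 2.0)  // explicit constructor call
println(p1 == p2)           // true: case classes also get a structural equals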

 If you want to peek under the hood, try compiling a simple X.scala file
 with the line case class X(a: Int, b: Double), then taking apart the
 generated X.class and X$.class (e.g., javap X.class).

 More info here: http://docs.scala-lang.org/tutorials/tour/case-classes.html,
 in the Scala Language Reference (http://www.scala-lang.org/docu/files/ScalaReference.pdf),
 and in Programming in Scala (http://www.artima.com/shop/programming_in_scala_2ed), ch. 15.

 Hope that helps!

 Best,

 Patrick

 On Thu, Feb 26, 2015 at 6:37 AM, anamika gupta anamika.guo...@gmail.com
 wrote:

 I am now getting the following error. I cross-checked my types and
 corrected three of them, i.e. r26 -> String, r27 -> Timestamp,
 r28 -> Timestamp. The error still persists.

 scala> sc.textFile("/home/cdhuser/Desktop/Sdp_d.csv").map(_.split(",")).map { r =>
  | val upto_time = sdf.parse(r(23).trim);
  | calendar.setTime(upto_time);
  | val r23 = new java.sql.Timestamp(upto_time.getTime)
  | val insert_time = sdf.parse(r(27).trim)
  | calendar.setTime(insert_time)
  | val r27 = new java.sql.Timestamp(insert_time.getTime)
  | val last_upd_time = sdf.parse(r(28).trim)
  | calendar.setTime(last_upd_time)
  | val r28 = new java.sql.Timestamp(last_upd_time.getTime)
  | new sdp_d(r(0).trim.toInt, r(1).trim.toInt, r(2).trim,
 r(3).trim.toInt, r(4).trim.toInt, r(5).trim, r(6).trim.toInt, r(7).trim,
 r(8).trim.toDouble, r(9).trim.toDouble, r(10).trim, r(11).trim, r(12).trim,
 r(13).trim, r(14).trim, r(15).trim, r(16).trim, r(17).trim, r(18).trim,
 r(19).trim, r(20).trim, r(21).trim.toInt, r(22).trim, r23, r(24).trim,
 r(25).trim, r(26).trim, r27, r28)
  | }.registerAsTable("sdp_d")

 <console>:26: error: type mismatch;
  found   : Int
  required: Option[Int]
   new sdp_d(r(0).trim.toInt, r(1).trim.toInt, r(2).trim,
 r(3).trim.toInt, r(4).trim.toInt, r(5).trim, r(6).trim.toInt, r(7).trim,
 r(8).trim.toDouble, r(9).trim.toDouble, r(10).trim, r(11).trim, r(12).trim,
 r(13).trim, r(14).trim, r(15).trim, r(16).trim, r(17).trim, r(18).trim,
 r(19).trim, r(20).trim, r(21).trim.toInt, r(22).trim, r23, r(24).trim,
 r(25).trim, r(26).trim, r27, r28)

 On Wed, Feb 25, 2015 at 2:32 PM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 It says sdp_d not found. Since it is a class, you need to instantiate it
 with new, like:

 sc.textFile("derby.log").map(_.split(",")).map( r => {
   val upto_time = sdf.parse(r(23).trim);
   calendar.setTime(upto_time);
   val r23 = new java.sql.Timestamp(upto_time.getTime);

   val insert_time = sdf.parse(r(26).trim);
   calendar.setTime(insert_time);
   val r26 = new java.sql.Timestamp(insert_time.getTime);

   val last_upd_time = sdf.parse(r(27).trim);
   calendar.setTime(last_upd_time);
   val r27 = new java.sql.Timestamp(last_upd_time.getTime);

   new sdp_d(r(0).trim.toInt, r(1).trim.toInt, r(2).trim,
 r(3).trim.toInt, r(4).trim.toInt, r(5).trim, r(6).trim.toInt, r(7).trim,
 r(8).trim.toDouble, r(9).trim.toDouble, r(10).trim, r(11).trim, r(12).trim,
 r(13).trim, r(14).trim, r(15).trim, r(16).trim, r(17).trim, r(18).trim,
 r(19).trim, r(20).trim, r(21

Re: Facing error while extending scala class with Product interface to overcome limit of 22 fields in spark-shell

2015-02-25 Thread anamika gupta
I am now getting the following error. I cross-checked my types and
corrected three of them, i.e. r26 -> String, r27 -> Timestamp,
r28 -> Timestamp. The error still persists.

scala> sc.textFile("/home/cdhuser/Desktop/Sdp_d.csv").map(_.split(",")).map
{ r =>
 | val upto_time = sdf.parse(r(23).trim);
 | calendar.setTime(upto_time);
 | val r23 = new java.sql.Timestamp(upto_time.getTime)
 | val insert_time = sdf.parse(r(27).trim)
 | calendar.setTime(insert_time)
 | val r27 = new java.sql.Timestamp(insert_time.getTime)
 | val last_upd_time = sdf.parse(r(28).trim)
 | calendar.setTime(last_upd_time)
 | val r28 = new java.sql.Timestamp(last_upd_time.getTime)
 | new sdp_d(r(0).trim.toInt, r(1).trim.toInt, r(2).trim,
r(3).trim.toInt, r(4).trim.toInt, r(5).trim, r(6).trim.toInt, r(7).trim,
r(8).trim.toDouble, r(9).trim.toDouble, r(10).trim, r(11).trim, r(12).trim,
r(13).trim, r(14).trim, r(15).trim, r(16).trim, r(17).trim, r(18).trim,
r(19).trim, r(20).trim, r(21).trim.toInt, r(22).trim, r23, r(24).trim,
r(25).trim, r(26).trim, r27, r28)
 | }.registerAsTable("sdp_d")

<console>:26: error: type mismatch;
 found   : Int
 required: Option[Int]
  new sdp_d(r(0).trim.toInt, r(1).trim.toInt, r(2).trim,
r(3).trim.toInt, r(4).trim.toInt, r(5).trim, r(6).trim.toInt, r(7).trim,
r(8).trim.toDouble, r(9).trim.toDouble, r(10).trim, r(11).trim, r(12).trim,
r(13).trim, r(14).trim, r(15).trim, r(16).trim, r(17).trim, r(18).trim,
r(19).trim, r(20).trim, r(21).trim.toInt, r(22).trim, r23, r(24).trim,
r(25).trim, r(26).trim, r27, r28)

On Wed, Feb 25, 2015 at 2:32 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 It says sdp_d not found. Since it is a class, you need to instantiate it
 with new, like:

 sc.textFile("derby.log").map(_.split(",")).map( r => {
   val upto_time = sdf.parse(r(23).trim);
   calendar.setTime(upto_time);
   val r23 = new java.sql.Timestamp(upto_time.getTime);

   val insert_time = sdf.parse(r(26).trim);
   calendar.setTime(insert_time);
   val r26 = new java.sql.Timestamp(insert_time.getTime);

   val last_upd_time = sdf.parse(r(27).trim);
   calendar.setTime(last_upd_time);
   val r27 = new java.sql.Timestamp(last_upd_time.getTime);

   new sdp_d(r(0).trim.toInt, r(1).trim.toInt, r(2).trim,
 r(3).trim.toInt, r(4).trim.toInt, r(5).trim, r(6).trim.toInt, r(7).trim,
 r(8).trim.toDouble, r(9).trim.toDouble, r(10).trim, r(11).trim, r(12).trim,
 r(13).trim, r(14).trim, r(15).trim, r(16).trim, r(17).trim, r(18).trim,
 r(19).trim, r(20).trim, r(21).trim.toInt, r(22).trim, r23, r(24).trim,
 r(25).trim, r26, r27, r(28).trim)
   }).registerAsTable("sdp")

 Thanks
 Best Regards

 On Wed, Feb 25, 2015 at 2:14 PM, anamika gupta anamika.guo...@gmail.com
 wrote:

 The link has proved helpful. I have been able to load data, register it
 as a table and perform simple queries. Thanks Akhil !!

 Though, I still look forward to knowing where I was going wrong with my
 previous technique of extending the Product Interface to overcome case
 class's limit of 22 fields.

 On Wed, Feb 25, 2015 at 9:45 AM, anamika gupta anamika.guo...@gmail.com
 wrote:

 Hi Akhil

 I guess it skipped my attention. I would definitely give it a try.

 While I would still like to know what is the issue with the way I have
 created schema?

 On Tue, Feb 24, 2015 at 4:35 PM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 Did you happen to have a look at
 https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
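
For the archive, the approach behind that link amounts to building a StructType
and applying it to an RDD[Row], which sidesteps the 22-field case-class limit
entirely. A condensed sketch, assuming the Spark 1.2-era API (applySchema was
later renamed createDataFrame) and showing only the first three sdp_d columns:

import org.apache.spark.sql._

// Describe the columns explicitly instead of relying on a >22-field case class
val schema = StructType(Seq(
  StructField("WID", IntegerType, nullable = true),
  StructField("BATCH_ID", IntegerType, nullable = true),
  StructField("SRC_ID", StringType, nullable = true)))

// Build Rows whose values match the schema (first three sdp_d columns only)
val rowRDD = sc.textFile("/home/cdhuser/Desktop/Sdp_d.csv")
  .map(_.split(","))
  .map(r => Row(r(0).trim.toInt, r(1).trim.toInt, r(2).trim))

val sdp = sqlContext.applySchema(rowRDD, schema)
sdp.registerTempTable("sdp_d")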

 Thanks
 Best Regards

 On Tue, Feb 24, 2015 at 3:39 PM, anu anamika.guo...@gmail.com wrote:

 My issue is posted here on stack-overflow. What am I doing wrong here?


 http://stackoverflow.com/questions/28689186/facing-error-while-extending-scala-class-with-product-interface-to-overcome-limi









Re: Facing error while extending scala class with Product interface to overcome limit of 22 fields in spark-shell

2015-02-25 Thread anamika gupta
The link has proved helpful. I have been able to load data, register it as
a table and perform simple queries. Thanks Akhil !!

Though, I still look forward to knowing where I was going wrong with my
previous technique of extending the Product Interface to overcome case
class's limit of 22 fields.

On Wed, Feb 25, 2015 at 9:45 AM, anamika gupta anamika.guo...@gmail.com
wrote:

 Hi Akhil

 I guess it skipped my attention. I would definitely give it a try.

 While I would still like to know what is the issue with the way I have
 created schema?

 On Tue, Feb 24, 2015 at 4:35 PM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 Did you happen to have a look at
 https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema

 Thanks
 Best Regards

 On Tue, Feb 24, 2015 at 3:39 PM, anu anamika.guo...@gmail.com wrote:

 My issue is posted here on stack-overflow. What am I doing wrong here?


 http://stackoverflow.com/questions/28689186/facing-error-while-extending-scala-class-with-product-interface-to-overcome-limi







Re: Facing error while extending scala class with Product interface to overcome limit of 22 fields in spark-shell

2015-02-24 Thread anamika gupta
Hi Akhil

I guess it skipped my attention. I would definitely give it a try.

While I would still like to know what is the issue with the way I have
created schema?

On Tue, Feb 24, 2015 at 4:35 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 Did you happen to have a look at
 https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema

 Thanks
 Best Regards

 On Tue, Feb 24, 2015 at 3:39 PM, anu anamika.guo...@gmail.com wrote:

 My issue is posted here on stack-overflow. What am I doing wrong here?


 http://stackoverflow.com/questions/28689186/facing-error-while-extending-scala-class-with-product-interface-to-overcome-limi
