parquet file not loading (spark v 1.1.0)

2014-12-10 Thread Rahul Bindlish
Hi,

I have created a Parquet file from a case class using saveAsParquetFile, and then tried to reload it using parquetFile, but it fails.

Sample code is attached.

Any help would be appreciated.
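For reference, a minimal sketch of this save/reload round trip under the Spark 1.1.0 SchemaRDD API; the Person schema and the "person.parquet" path are made up for illustration, and the Spark calls are shown as spark-shell comments since they need a live SparkContext:

```scala
// Hypothetical schema standing in for the attached case class.
case class Person(id: Int, name: String)

// In spark-shell (sc is provided by the shell):
//   val sqlContext = new org.apache.spark.sql.SQLContext(sc)
//   import sqlContext.createSchemaRDD            // implicit RDD -> SchemaRDD
//   sc.parallelize(Seq(Person(1, "a"))).saveAsParquetFile("person.parquet")
//   val people = sqlContext.parquetFile("person.parquet") // reload as SchemaRDD
//   people.count
```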

Regards,
Rahul

rahul@...
sample_parquet.sample_parquet
http://apache-spark-user-list.1001560.n3.nabble.com/file/n20618/sample_parquet.sample_parquet
  



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/parquet-file-not-loading-spark-v-1-1-0-tp20618.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: SPARK LIMITATION - more than one case class is not allowed !!

2014-12-05 Thread Rahul Bindlish
Tobias,

Understood, and thanks for the quick resolution of the problem.

Thanks
~Rahul



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/serialization-issue-in-case-of-case-class-is-more-than-1-tp20334p20446.html



SPARK LIMITATION - more than one case class is not allowed !!

2014-12-04 Thread Rahul Bindlish
Is it a limitation that Spark does not support more than one case class at a
time?

Regards,
Rahul



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/serialization-issue-in-case-of-case-class-is-more-than-1-tp20334p20415.html



Re: SPARK LIMITATION - more than one case class is not allowed !!

2014-12-04 Thread Rahul Bindlish
Hi Tobias,

Thanks for your response.

I have created object files [person_obj, office_obj] from the CSV files
[person_csv, office_csv] using case classes [person, office] with the
saveAsObjectFile API.

I then restarted spark-shell and loaded the object files using the objectFile API.

*Once any one object class is loaded successfully, the rest of the object
classes give a serialization error.*

So my understanding is that more than one case class is not allowed.

I hope I have been able to clarify myself.
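A minimal sketch of the cycle described above; the office fields follow the code in the first post, and the Spark calls are shown as comments because they need a live SparkContext:

```scala
// Save/load cycle for one case class via object files, as described above.
case class office(id: Int, name: String, landmark: String, areacode: String)

// First spark-shell session:
//   sc.parallelize(Seq(office(1, "hq", "mall", "011"))).saveAsObjectFile("office_obj")
// After restarting spark-shell, redefine the case class first, then:
//   sc.objectFile[office]("office_obj").count
```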

Regards,
Rahul





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/serialization-issue-in-case-of-case-class-is-more-than-1-tp20334p20421.html



Re: SPARK LIMITATION - more than one case class is not allowed !!

2014-12-04 Thread Rahul Bindlish
Tobias,

Thanks for quick reply.

Definitely, after a restart the case classes need to be defined again.

I have done so; that is why Spark is able to load the object file [e.g.
person_obj] and has recorded its serialVersionUID.

The next time, when I try to load another object file [e.g. office_obj], I
think Spark is matching its serialVersionUID against the previous one
[person_obj] and giving a mismatch error.

In my first post, I have given statements which can be executed easily to
replicate this issue.
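For what it's worth, the default serialVersionUID of a serializable class can be inspected directly with java.io.ObjectStreamClass; a small standalone sketch (the two-field case classes are hypothetical, not the ones from the posts):

```scala
import java.io.ObjectStreamClass

// Scala case classes are Serializable, so Java serialization assigns each
// one a serialVersionUID computed from its class definition.
case class person(id: Int, name: String)
case class office(id: Int, name: String)

val personUid = ObjectStreamClass.lookup(classOf[person]).getSerialVersionUID
val officeUid = ObjectStreamClass.lookup(classOf[office]).getSerialVersionUID

// Distinct classes get distinct UIDs, so a mismatch error indicates the
// reader resolved a different class definition than the writer used.
println(s"person: $personUid, office: $officeUid")
```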

Thanks
~Rahul








--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/serialization-issue-in-case-of-case-class-is-more-than-1-tp20334p20428.html



Re: SPARK LIMITATION - more than one case class is not allowed !!

2014-12-04 Thread Rahul Bindlish
Tobias,

Find attached the CSV and Scala files; the steps are below:

1. Copy the CSV files into the current directory.
2. Open spark-shell from this directory.
3. Run the one_scala file, which will create object files from the CSV files
in the current directory.
4. Restart spark-shell.
5. a. Run the two_scala file; while running, it gives an error during loading
of office_obj.
   b. If we edit the two_scala file to the contents below:

---
case class person(id: Int, name: String, fathername: String, officeid: Int)
case class office(id: Int, name: String, landmark: String, areacode: String)
sc.objectFile[office]("office_obj").count
sc.objectFile[person]("person_obj").count
---

then while running, it gives an error during loading of person_obj.

Regards,
Rahul

sample.gz
http://apache-spark-user-list.1001560.n3.nabble.com/file/n20435/sample.gz  



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/serialization-issue-in-case-of-case-class-is-more-than-1-tp20334p20435.html



serialization issue in case of case class is more than 1

2014-12-03 Thread Rahul Bindlish
Hi,

I am a newbie in Spark and performed the following steps during a POC:

1. Map each CSV file to an object file, after some transformations, once.
2. Load the object file back into an RDD for operations, as needed.

In the case of 2 CSV/object files, the first object file is loaded into an RDD
successfully, but during deserialization of the second object file an error
appears. This error occurs only when spark-shell is restarted between step 1
and step 2.

Please suggest how to load 2 object files.

Also find below the code executed on spark-shell:
***
//#1// Start spark-shell; create object files from the CSV files
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

case class person(id: Int, name: String, fathername: String, officeid: Int)
val baseperson = sc.textFile("person_csv").flatMap(line =>
  line.split("\n")).map(_.split(","))
baseperson.map(p => person(p(0).trim.toInt, p(1), p(2),
  p(3).trim.toInt)).saveAsObjectFile("person_obj")

case class office(id: Int, name: String, landmark: String, areacode: String)
val baseoffice = sc.textFile("office_csv").flatMap(line =>
  line.split("\n")).map(_.split(","))
baseoffice.map(p => office(p(0).trim.toInt, p(1), p(2),
  p(3))).saveAsObjectFile("office_obj")

//#2// Stop spark-shell
//#3// Start spark-shell and load the object files
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
case class person(id: Int, name: String, fathername: String, officeid: Int)
case class office(id: Int, name: String, landmark: String, areacode: String)

sc.objectFile[person]("person_obj").count // [OK]
sc.objectFile[office]("office_obj").count // *[FAILS]*
***
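The per-line CSV-to-case-class mapping used above can be factored into a plain function and checked without a SparkContext; a hypothetical sketch with a made-up input row:

```scala
// Pure version of the mapping applied in baseperson.map(...) above.
case class person(id: Int, name: String, fathername: String, officeid: Int)

def parsePerson(line: String): person = {
  val p = line.split(",")
  person(p(0).trim.toInt, p(1), p(2), p(3).trim.toInt)
}

println(parsePerson("1,Ram,Shyam,10")) // -> person(1,Ram,Shyam,10)
```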
stack trace is attached
stacktrace.txt
http://apache-spark-user-list.1001560.n3.nabble.com/file/n20334/stacktrace.txt
 

Regards,
Rahul   







--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/serialization-issue-in-case-of-case-class-is-more-than-1-tp20334.html