parquet file not loading (spark v 1.1.0)
Hi,

I have created a Parquet file from a case class using saveAsParquetFile, then tried to reload it using parquetFile, but it fails. Sample code is attached. Any help would be appreciated.

Regards,
Rahul
rahul@...

sample_parquet.sample_parquet http://apache-spark-user-list.1001560.n3.nabble.com/file/n20618/sample_parquet.sample_parquet

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/parquet-file-not-loading-spark-v-1-1-0-tp20618.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
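For reference, a minimal Parquet round trip in Spark 1.1.0 would look roughly like the sketch below. The Person class, the data, and the "people.parquet" path are illustrative placeholders, not taken from the attached sample:

```scala
// Sketch of a Parquet save/reload in spark-shell on Spark 1.1.0.
// Assumes `sc` (SparkContext) is already defined, as in spark-shell.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit RDD[case class] -> SchemaRDD

case class Person(id: Int, name: String)

// Save: the RDD of a case class is implicitly converted to a SchemaRDD.
val people = sc.parallelize(Seq(Person(1, "alice"), Person(2, "bob")))
people.saveAsParquetFile("people.parquet")

// Reload: parquetFile returns a SchemaRDD; the schema comes from the file.
val loaded = sqlContext.parquetFile("people.parquet")
loaded.registerTempTable("people")
sqlContext.sql("SELECT name FROM people WHERE id = 1").collect()
```

If the reload fails with a shell-defined case class, it may be related to the same shell-wrapper serialization behaviour discussed in the thread below.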
Re: SPARK LIMITATION - more than one case class is not allowed !!
Tobias,

Understood, and thanks for the quick resolution of the problem.

Thanks,
~Rahul

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/serialization-issue-in-case-of-case-class-is-more-than-1-tp20334p20446.html
SPARK LIMITATION - more than one case class is not allowed !!
Is it a limitation that Spark does not support more than one case class at a time?

Regards,
Rahul

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/serialization-issue-in-case-of-case-class-is-more-than-1-tp20334p20415.html
Re: SPARK LIMITATION - more than one case class is not allowed !!
Hi Tobias,

Thanks for your response. I created object files [person_obj, office_obj] from csv files [person_csv, office_csv] using case classes [person, office] with the saveAsObjectFile API. I then restarted spark-shell and loaded the object files using the objectFile API. *Once any one object class is loaded successfully, the remaining object classes give a serialization error.* So my understanding is that more than one case class is not allowed. I hope I have been able to clarify myself.

Regards,
Rahul

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/serialization-issue-in-case-of-case-class-is-more-than-1-tp20334p20421.html
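Restated as spark-shell code, the workflow described above is roughly the following sketch (the sample data rows are placeholders; the file names follow the ones mentioned in the message):

```scala
// Session 1: save two case-class RDDs as object files.
case class person(id: Int, name: String, fathername: String, officeid: Int)
case class office(id: Int, name: String, landmark: String, areacode: String)

sc.parallelize(Seq(person(1, "a", "b", 1))).saveAsObjectFile("person_obj")
sc.parallelize(Seq(office(1, "hq", "x", "01"))).saveAsObjectFile("office_obj")

// Session 2 (after restarting spark-shell): redefine the case classes, then load.
// The first objectFile call succeeds; the second reportedly fails to deserialize.
sc.objectFile[person]("person_obj").count
sc.objectFile[office]("office_obj").count  // serialization error reported here
```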
Re: SPARK LIMITATION - more than one case class is not allowed !!
Tobias,

Thanks for the quick reply. Certainly, after a restart the case classes need to be defined again, and I have done so; that is why Spark is able to load the first object file [e.g. person_obj], having recorded its serialVersionUID. When I then try to load another object file [e.g. office_obj], I think Spark matches its serialVersionUID against the previous one [person_obj] and reports a mismatch error. My first post contains statements that can be executed easily to replicate this issue.

Thanks,
~Rahul

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/serialization-issue-in-case-of-case-class-is-more-than-1-tp20334p20428.html
Re: SPARK LIMITATION - more than one case class is not allowed !!
Tobias,

Find the csv and scala files attached; the steps are:

1. Copy the csv files into the current directory.
2. Open spark-shell from this directory.
3. Run the one_scala file, which creates object files from the csv files in the current directory.
4. Restart spark-shell.
5. a. Run the two_scala file; while running, it gives an error during loading of office_obj.
   b. If we edit the two_scala file to swap the load order, as below:

   case class person(id: Int, name: String, fathername: String, officeid: Int)
   case class office(id: Int, name: String, landmark: String, areacode: String)
   sc.objectFile[office]("office_obj").count
   sc.objectFile[person]("person_obj").count

   then while running it gives an error during loading of person_obj instead.

Regards,
Rahul

sample.gz http://apache-spark-user-list.1001560.n3.nabble.com/file/n20435/sample.gz

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/serialization-issue-in-case-of-case-class-is-more-than-1-tp20334p20435.html
serialization issue in case of case class is more than 1
Hi,

I am a newbie to Spark and performed the following steps during a POC:

1. Map each csv file to an object file, after some transformations, once.
2. Deserialize the object file back into an RDD for operations, as needed.

With 2 csv/object files, the first object file is deserialized into an RDD successfully, but deserializing the second object file raises an error. The error occurs only when spark-shell is restarted between step 1 and step 2. Please suggest how to load both object files.

Below is the code executed in spark-shell:

***
//#1// Start spark-shell; create object files from the csv files
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

case class person(id: Int, name: String, fathername: String, officeid: Int)
val baseperson = sc.textFile("person_csv").flatMap(line => line.split("\n")).map(_.split(","))
baseperson.map(p => person(p(0).trim.toInt, p(1), p(2), p(3).trim.toInt)).saveAsObjectFile("person_obj")

case class office(id: Int, name: String, landmark: String, areacode: String)
val baseoffice = sc.textFile("office_csv").flatMap(line => line.split("\n")).map(_.split(","))
baseoffice.map(p => office(p(0).trim.toInt, p(1), p(2), p(3))).saveAsObjectFile("office_obj")

//#2// Stop spark-shell

//#3// Start spark-shell; load the object files
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
case class person(id: Int, name: String, fathername: String, officeid: Int)
case class office(id: Int, name: String, landmark: String, areacode: String)
sc.objectFile[person]("person_obj").count   [OK]
sc.objectFile[office]("office_obj").count   *[FAILS]*
***

The stack trace is attached: stacktrace.txt http://apache-spark-user-list.1001560.n3.nabble.com/file/n20334/stacktrace.txt

rahul@...

Regards,
Rahul

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/serialization-issue-in-case-of-case-class-is-more-than-1-tp20334.html
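[Editor's note: one commonly suggested workaround for this class of spark-shell problem, not confirmed anywhere in this thread, is to define the case classes in compiled code rather than in the shell, so that deserialization does not depend on the shell session's synthetic wrapper classes. A sketch, with illustrative file and jar names:]

```scala
// model.scala -- compile outside the shell, e.g.:
//   scalac model.scala && jar cf model.jar person*.class office*.class
// then start the shell with the jar on the classpath:
//   spark-shell --jars model.jar
// The class definitions match the ones used in the thread.
case class person(id: Int, name: String, fathername: String, officeid: Int)
case class office(id: Int, name: String, landmark: String, areacode: String)
```

With the classes coming from the same compiled jar in every session, objectFile[person] and objectFile[office] would deserialize against identical class definitions across spark-shell restarts.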