Rahul,

On Fri, Dec 5, 2014 at 3:51 PM, Rahul Bindlish <
rahul.bindl...@nectechnologies.in> wrote:
>
> 1. Copy csv files in current directory.
> 2. Open spark-shell from this directory.
> 3. Run "one_scala" file which will create object-files from csv-files in
>    current directory.
> 4. Restart spark-shell
> 5. a. Run "two_scala" file, while running it is giving error during
>       loading of office_csv
>    b. If we edit "two_scala" file by below contents
>
> -----------------------------------------------------------------------------------
> case class person(id: Int, name: String, fathername: String, officeid: Int)
> case class office(id: Int, name: String, landmark: String, areacode: String)
> sc.objectFile[office]("office_obj").count
> sc.objectFile[person]("person_obj").count
> --------------------------------------------------------------------------------
>
> while running it is giving error during loading of person_csv
>
One piece of good news: I can reproduce the error you see. More good news: I
can tell you how to fix it. In your one.scala file, define all case classes
*before* you use saveAsObjectFile() for the first time. With

case class person(id: Int, name: String, fathername: String, officeid: Int)
case class office(id: Int, name: String, landmark: String, areacode: String)
val baseperson = sc.textFile("person_csv")....saveAsObjectFile("person_obj")
val baseoffice = sc.textFile("office_csv")....saveAsObjectFile("office_obj")

I can deserialize the object files (in any order).

The bad news: I have no idea of the reason for this. I blame it on the
REPL/shell and assume it would not happen for a compiled application.

Tobias
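For reference, here is a minimal sketch of what a corrected one.scala could
look like, with all case class definitions up front. The csv parsing steps
(split and field conversion) are assumed placeholders for whatever the
"...." above elides, and the comma delimiter is a guess:

```scala
// Hypothetical sketch of one.scala for the Spark shell.
// All case classes are defined before the first saveAsObjectFile() call.
case class person(id: Int, name: String, fathername: String, officeid: Int)
case class office(id: Int, name: String, landmark: String, areacode: String)

// Assumed parsing: split each csv line on commas and build a case class
// instance per record, then write the RDD as a Hadoop SequenceFile of
// serialized objects.
sc.textFile("person_csv")
  .map(_.split(","))
  .map(f => person(f(0).toInt, f(1), f(2), f(3).toInt))
  .saveAsObjectFile("person_obj")

sc.textFile("office_csv")
  .map(_.split(","))
  .map(f => office(f(0).toInt, f(1), f(2), f(3)))
  .saveAsObjectFile("office_obj")
```

In a fresh shell, two.scala would then redeclare both case classes and read
the files back with sc.objectFile[person]("person_obj").count and
sc.objectFile[office]("office_obj").count, in either order.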