Re: spark sql writing in avro

Kevin Peng Thu, 12 Mar 2015 20:58:24 -0700

Dale,

I basically have the same Maven dependency as above, but my code will not
compile because it cannot reference AvroSaver, though the saveAsAvro
reference compiles fine, which is weird. Even though saveAsAvro compiles
for me, it errors out when running the Spark job because it is not
implemented (the job quits and says "not implemented method" or something
along those lines).
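Why a call can compile and still fail at run time: in Scala, `???` has type `Nothing` (which conforms to any return type) and throws `scala.NotImplementedError` when evaluated. The self-contained sketch below mirrors the shape of the old spark-avro stub quoted further down in this thread (an implicit class adding a `saveAsAvroFile` method whose body is `???`). `FakeSchemaRDD` and `AvroStub` are hypothetical stand-ins, not Spark or spark-avro classes.

```scala
// Minimal sketch with hypothetical names: shows why a method whose body is
// `???` type-checks but throws when it is actually called.
object NotImplementedDemo {
  // Stand-in for Spark's SchemaRDD; this is NOT the real Spark class.
  final case class FakeSchemaRDD(rows: Seq[String])

  // Same shape as the old stub: an implicit class adds an extension method
  // whose body is `???` (type Nothing, so it conforms to Unit).
  implicit class AvroStub(rdd: FakeSchemaRDD) {
    def saveAsAvroFile(path: String): Unit = ???
  }

  def main(args: Array[String]): Unit = {
    val rdd = FakeSchemaRDD(Seq("a", "b"))
    try {
      // Compiles: the implicit class supplies saveAsAvroFile.
      rdd.saveAsAvroFile("/tmp/out.avro")
    } catch {
      // Fails at run time: `???` throws scala.NotImplementedError.
      case e: NotImplementedError =>
        println(s"Caught ${e.getClass.getName}: ${e.getMessage}")
    }
  }
}
```

So a clean compile only proves the method signature exists in the jar on the classpath, not that the jar contains a working implementation.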


I will try going through the Spark shell and passing in the jar built from
GitHub, since I haven't tried that yet.
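When both the Maven Central 0.1 jar and a locally built jar are around, it can help to confirm which one the JVM actually loaded a class from. A minimal diagnostic sketch, assuming only the standard JDK calls `Class.getProtectionDomain`, `ProtectionDomain.getCodeSource` and `CodeSource.getLocation`; the object name `WhichJar` is hypothetical.

```scala
// Diagnostic sketch (hypothetical helper): report the jar or directory a
// class was loaded from, using only standard JDK reflection calls.
object WhichJar {
  /** Code-source location of `cls`, or None when the JVM exposes none
    * (e.g. JDK classes loaded by the bootstrap class loader). */
  def locationOf(cls: Class[_]): Option[String] =
    Option(cls.getProtectionDomain)
      .flatMap(pd => Option(pd.getCodeSource))
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)

  def main(args: Array[String]): Unit = {
    println(locationOf(getClass))        // where this object itself came from
    println(locationOf(classOf[String])) // typically None for JDK classes
  }
}
```

From spark-shell, something like `WhichJar.locationOf(Class.forName("com.databricks.spark.avro.package$"))` should show whether the repo.maven.apache.org jar or a locally built one is on the classpath. The class name `package$` is an assumption based on the `import com.databricks.spark.avro._` used in this thread; substitute any class you know is in the jar.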

On Thu, Mar 12, 2015 at 6:44 PM, M. Dale <medal...@yahoo.com> wrote:

> Short answer: if you downloaded spark-avro from the repo.maven.apache.org
> repo, you might be using an old version (pre-November 14, 2014); see the
> timestamps at http://repo.maven.apache.org/maven2/com/databricks/spark-avro_2.10/0.1/
> There have been lots of changes at https://github.com/databricks/spark-avro
> since then.
>
> Databricks, thank you for sharing the Avro code!!!
>
> Could you please push out the latest version or update the version
> number and republish to repo.maven.apache.org (I have no idea how jars get
> there). Or is there a different repository that users should point to for
> this artifact?
>
> Workaround: Download the code from https://github.com/databricks/spark-avro,
> build it with the latest functionality (still version 0.1), and add it to
> your local Maven or Ivy repo.
>
> Long version:
> I used a default Maven build and declared my dependency on:
>
>         <dependency>
>             <groupId>com.databricks</groupId>
>             <artifactId>spark-avro_2.10</artifactId>
>             <version>0.1</version>
>         </dependency>
>
> Maven downloaded the 0.1 version from
> http://repo.maven.apache.org/maven2/com/databricks/spark-avro_2.10/0.1/
> and included it in my app code jar.
>
> From spark-shell:
>
> import com.databricks.spark.avro._
> import org.apache.spark.sql.SQLContext
> val sqlContext = new SQLContext(sc)
>
> // This schema includes LONG for time in millis
> // (https://github.com/medale/spark-mail/blob/master/mailrecord/src/main/avro/com/uebercomputing/mailrecord/MailRecord.avdl)
> val recordsSchema = sqlContext.avroFile("/opt/rpm1/enron/enron-tiny.avro")
> java.lang.RuntimeException: Unsupported type LONG
>
> However, when I checked out the spark-avro code from its GitHub repo and
> added a test case against the MailRecord Avro file, everything ran fine.
>
> So I built the Databricks spark-avro locally on my box and then put it in
> my local Maven repo; everything worked from spark-shell when adding that
> jar as a dependency.
>
> Hope this helps for the "save" case as well. On the pre-November 14
> version, avro.scala says:
>
>   // TODO: Implement me.
>   implicit class AvroSchemaRDD(schemaRDD: SchemaRDD) {
>     def saveAsAvroFile(path: String): Unit = ???
>   }
>
> Markus
>
> On 03/12/2015 07:05 PM, kpeng1 wrote:
>
>> Hi All,
>>
>> I am currently trying to write out a SchemaRDD to Avro. I noticed that
>> there is a Databricks spark-avro library and I have included it in my
>> dependencies, but it looks like I am not able to access the AvroSaver
>> object. On compilation of the job I get this:
>> error: not found: value AvroSaver
>> [ERROR]     AvroSaver.save(resultRDD, args(4))
>>
>> I also tried calling saveAsAvro on resultRDD (the actual RDD with the
>> results), and that passes compilation, but when I run the code I get an
>> error that says saveAsAvro is not implemented. I am using version 0.1 of
>> spark-avro_2.10.
>>
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.
>> 1001560.n3.nabble.com/spark-sql-writing-in-avro-tp22021.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
