Re: Spark SQL 1.0.0 - RDD from snappy compress avro file
Ideas? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-0-0-RDD-from-snappy-compress-avro-file-tp19998p20267.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark SQL 1.0.0 - RDD from snappy compress avro file
By the way, the same error also happens on 1.1.0 (just tested).
Re: Spark SQL 1.0.0 - RDD from snappy compress avro file
Hi Vikas and Simone, thanks for the replies. Yeah, I understand this would be easier with 1.2, but this is completely out of my control; I really have to work with 1.0.0. About Simone's approach, during the imports I get:

    scala> import org.apache.avro.mapreduce.{ AvroJob, AvroKeyInputFormat, AvroKeyOutputFormat }
    :17: error: object mapreduce is not a member of package org.apache.avro

    scala> import org.apache.avro.mapred.AvroKey
    :17: error: object mapred is not a member of package org.apache.avro

    scala> import com.twitter.chill.avro.AvroSerializer
    :18: error: object avro is not a member of package com.twitter.chill
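An "is not a member of package" error at import time usually means the classes are not on the shell's classpath. The package `org.apache.avro` itself resolves, but `org.apache.avro.mapred` and `org.apache.avro.mapreduce` ship in the separate avro-mapred artifact, and `com.twitter.chill.avro` ships in chill-avro. Neither is necessarily bundled with a Spark 1.0.0 shell. Passing the jars with `--jars` when starting `spark-shell` is one way to add them. Which artifact versions match a given CDH build is an assumption you would need to check. Below is a minimal sketch for checking visibility before importing. `ClasspathCheck` and `isOnClasspath` are hypothetical names. The class names are the ones from the failing imports.

```scala
// Hypothetical helper: reports whether a class can be loaded by the current
// class loader (e.g. inside the spark-shell REPL) without importing it.
object ClasspathCheck {
  def isOnClasspath(className: String): Boolean = {
    val loader = Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    try {
      Class.forName(className, false, loader) // false: do not run static initializers
      true
    } catch {
      case _: ClassNotFoundException => false
      case _: NoClassDefFoundError   => false
    }
  }

  // Classes taken from the imports that failed in the REPL session above.
  val avroClasses: Seq[String] = Seq(
    "org.apache.avro.mapreduce.AvroKeyInputFormat",
    "org.apache.avro.mapred.AvroKey",
    "com.twitter.chill.avro.AvroSerializer"
  )

  // Pairs each class name with whether it is currently loadable.
  def report(): Seq[(String, Boolean)] =
    avroClasses.map(name => name -> isOnClasspath(name))
}
```

In the shell, `ClasspathCheck.report().foreach(println)` would show which of the three are missing. Loading by name avoids the compile error that a plain `import` raises.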
Re: Spark SQL 1.0.0 - RDD from snappy compress avro file
Did you have a look at my reply in this thread? http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-read-this-avro-file-using-spark-amp-scala-td19400.html I am using 1.1.0, though, so I am not sure whether that code will work entirely with 1.0.0, but you can try. Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini On Sat, Nov 29, 2014 at 5:43 AM, Vikas Agarwal wrote: > Just in case it helps: https://github.com/databricks/spark-avro
Re: Spark SQL 1.0.0 - RDD from snappy compress avro file
Just in case it helps: https://github.com/databricks/spark-avro On Fri, Nov 28, 2014 at 8:48 PM, cjdc wrote: > To make it simpler, for now forget the snappy compression. Just assume they are binary Avro files... -- Regards, Vikas Agarwal, InfoObjects, Inc. http://www.infoobjects.com
Re: Spark SQL 1.0.0 - RDD from snappy compress avro file
To make it simpler, forget the Snappy compression for now; just assume they are binary Avro files...
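Setting the Snappy compression aside is a reasonable simplification. According to the Avro specification, an object container file starts with the four magic bytes `O`, `b`, `j`, `0x01`. The codec (for example `snappy`) is recorded in the file header and applied per data block, so a container-aware reader deals with it. This is also why `sc.textFile`, which reads input as newline-delimited text, is not a fit for these files. Below is a minimal stdlib-only sketch that checks for the magic bytes. `AvroMagic`, `hasAvroMagic`, and `fileHasAvroMagic` are hypothetical names.

```scala
import java.io.{BufferedInputStream, FileInputStream, InputStream}
import java.util.Arrays

object AvroMagic {
  // Magic bytes at the start of every Avro object container file: 'O', 'b', 'j', 1.
  val Magic: Array[Byte] = Array[Byte]('O'.toByte, 'b'.toByte, 'j'.toByte, 1)

  // True if the first four bytes of the stream match the container-file magic.
  def hasAvroMagic(in: InputStream): Boolean = {
    val buf = new Array[Byte](Magic.length)
    var read = 0
    while (read < buf.length) {
      val n = in.read(buf, read, buf.length - read)
      if (n < 0) return false // stream ended before four bytes were read
      read += n
    }
    Arrays.equals(buf, Magic)
  }

  // Convenience wrapper for a local file path (hypothetical usage).
  def fileHasAvroMagic(path: String): Boolean = {
    val in = new BufferedInputStream(new FileInputStream(path))
    try hasAvroMagic(in) finally in.close()
  }
}
```

`hasAvroMagic` accepts any `java.io.InputStream`, so a stream opened on HDFS through Hadoop's `FileSystem.open` should work the same way. That HDFS usage is an assumption and is not exercised here.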
Spark SQL 1.0.0 - RDD from snappy compress avro file
Hi everyone, I am using Spark 1.0.0 and I am facing some issues handling binary, Snappy-compressed Avro files which I get from HDFS. I know there are improved mechanisms for handling these files in more recent versions of Spark, but updating is not an option since I am operating on a Cloudera cluster with no admin privileges. I would simply like to get some of these Avro files, create the RDD and then run simple SQL queries on their content. Following the Spark SQL 1.0.0 Programming Guide, we have:

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext._

    val myData = sc.textFile("/example/mydir/MyFile1.avro")

    // ### QUESTION ###
    // ### How to dynamically define the schema from the Avro header?? ###
    // val Schema =

    myData.registerAsTable("MyDB")
    val query = sql("SELECT * FROM MyDB")
    query.collect().foreach(println)

So, how would you modify this to make it work (considering the Spark version)? Thanks
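On the schema question: an Avro container file is self-describing. According to the Avro specification, the magic bytes are followed by a metadata map. Its `avro.schema` entry holds the writer schema as JSON, and its `avro.codec` entry names the codec (`null`, `deflate`, `snappy`, ...). A 16-byte sync marker follows the map. Below is a stdlib-only sketch that decodes that header. `AvroHeader` and `AvroHeaderReader.readHeader` are hypothetical names and are not the Avro library's API; with the Avro jar available, `org.apache.avro.file.DataFileReader` exposes the schema through `getSchema`. The sketch only recovers the schema JSON. Mapping that schema to a Spark SQL 1.0.0 table would typically still mean writing a matching case class, as in the reflection-based examples of the 1.0.0 programming guide.

```scala
import java.io.{DataInputStream, InputStream}
import java.nio.charset.StandardCharsets
import scala.collection.mutable

// Hypothetical container for the decoded header of an Avro object container file.
final case class AvroHeader(metadata: Map[String, Array[Byte]], sync: Array[Byte]) {
  private def utf8(key: String): Option[String] =
    metadata.get(key).map(bytes => new String(bytes, StandardCharsets.UTF_8))

  // Writer schema, as the JSON text stored under "avro.schema".
  def schemaJson: Option[String] = utf8("avro.schema")

  // Per the spec, a missing "avro.codec" entry means "null" (uncompressed blocks).
  def codec: String = utf8("avro.codec").getOrElse("null")
}

object AvroHeaderReader {
  private val Magic = Array[Byte]('O'.toByte, 'b'.toByte, 'j'.toByte, 1)

  // Avro "long": zig-zag encoded varint (7 bits per byte, high bit = more bytes).
  private def readLong(in: DataInputStream): Long = {
    var raw = 0L
    var shift = 0
    var more = true
    while (more) {
      val b = in.readUnsignedByte() // throws EOFException on truncated input
      raw |= (b & 0x7FL) << shift
      shift += 7
      more = (b & 0x80) != 0
    }
    (raw >>> 1) ^ -(raw & 1L)
  }

  // Avro "bytes" / "string": a long length followed by that many bytes.
  private def readBytes(in: DataInputStream): Array[Byte] = {
    val len = readLong(in)
    require(len >= 0 && len <= Int.MaxValue, s"invalid length $len")
    val buf = new Array[Byte](len.toInt)
    in.readFully(buf)
    buf
  }

  def readHeader(input: InputStream): AvroHeader = {
    val in = new DataInputStream(input)
    val magic = new Array[Byte](Magic.length)
    in.readFully(magic)
    require(java.util.Arrays.equals(magic, Magic), "not an Avro object container file")

    // File metadata is a map<bytes> written as blocks of key/value pairs,
    // terminated by a block whose count is 0.
    val meta = mutable.LinkedHashMap.empty[String, Array[Byte]]
    var count = readLong(in)
    while (count != 0) {
      if (count < 0) {
        // Negative count: its absolute value is the pair count, and it is
        // followed by the block size in bytes, which this sketch skips.
        count = -count
        readLong(in)
      }
      var i = 0L
      while (i < count) {
        val key = new String(readBytes(in), StandardCharsets.UTF_8)
        meta(key) = readBytes(in)
        i += 1
      }
      count = readLong(in)
    }

    // 16-byte sync marker; it also separates the data blocks that follow.
    val sync = new Array[Byte](16)
    in.readFully(sync)
    AvroHeader(meta.toMap, sync)
  }
}
```

Printing `readHeader(stream).schemaJson` for one of the files would show the record's field names and types, which is the information a hand-written case class would need to mirror.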