Querying over multiple (avro) files using Spark SQL

2015-01-13 Thread thomas j
Hi, I have a program that loads a single Avro file using Spark SQL, queries it, transforms it and then outputs the data. The file is loaded with: val records = sqlContext.avroFile(filePath) val data = records.registerTempTable("data") ... Now I want to run it over tens of thousands of Avro files

Re: How can I read this avro file using spark scala?

2014-11-21 Thread thomas j
Thanks for the pointer Michael. I've downloaded Spark 1.2.0 from https://people.apache.org/~pwendell/spark-1.2.0-snapshot1/ and cloned and built the spark-avro repo you linked to. When I run it against the example avro file linked to in the documentation it works. However, when I try to load my

Re: How can I read this avro file using spark scala?

2014-11-21 Thread thomas j
= (r.getInt(2), r)).take(4).collect() Is there any way to be able to specify the column name (user_id) instead of needing to know/calculate the offset somehow? Thanks again On Fri, Nov 21, 2014 at 11:48 AM, thomas j beanb...@googlemail.com wrote: Thanks for the pointer Michael. I've downloaded