Re: SQL query over (Long, JSON string) tuples

2015-01-29 Thread Michael Armbrust
Eventually it would be nice for us to have some sort of function to do the
conversion you are talking about on a single column, but for now I usually
hack it as you suggested:

val withId = origRDD.map { case (id, str) =>
  s"""{"id":$id,${str.trim.drop(1)}"""
}
val table = sqlContext.jsonRDD(withId)
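The splice itself can be checked without Spark. Here is a minimal plain-Scala sketch of the same trick; the helper name `withId` and the empty-object guard are my own additions, not part of the snippet above:

```scala
// Plain-Scala sketch of the splice: drop the JSON object's opening brace
// and prepend an "id" field. Assumes each input string is a JSON object.
def withId(id: Long, json: String): String = {
  val body = json.trim.drop(1) // everything after the leading '{'
  if (body == "}") s"""{"id":$id}""" // empty object: avoid a trailing comma
  else s"""{"id":$id,$body"""
}
```

Note this is pure string splicing: it only works when the payload really is a JSON object, and jsonRDD's schema inference still does the real work afterwards.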

On Thu, Jan 29, 2015 at 1:29 AM, Tobias Pfeiffer  wrote:

> Hi Ayoub,
>
> thanks for your mail!
>
> On Thu, Jan 29, 2015 at 6:23 PM, Ayoub 
> wrote:
>>
>> SQLContext and HiveContext have a "jsonRDD" method which accepts an
>> RDD[String], where each string is a JSON string, and returns a SchemaRDD;
>> it extends RDD[Row], which is the type you want.
>>
>> Afterwards you should be able to do a join to keep your tuple.
>>
>
> I'm afraid that's not so easy, because you can only join on a certain key,
> and the key is exactly what I have to drop in order to infer the schema.
>
> Thanks
> Tobias
>
>


Re: SQL query over (Long, JSON string) tuples

2015-01-29 Thread Tobias Pfeiffer
Hi Ayoub,

thanks for your mail!

On Thu, Jan 29, 2015 at 6:23 PM, Ayoub  wrote:
>
> SQLContext and HiveContext have a "jsonRDD" method which accepts an
> RDD[String], where each string is a JSON string, and returns a SchemaRDD;
> it extends RDD[Row], which is the type you want.
>
> Afterwards you should be able to do a join to keep your tuple.
>

I'm afraid that's not so easy, because you can only join on a certain key,
and the key is exactly what I have to drop in order to infer the schema.

Thanks
Tobias


Re: SQL query over (Long, JSON string) tuples

2015-01-29 Thread Ayoub
Hello,

SQLContext and HiveContext have a "jsonRDD" method which accepts an
RDD[String], where each string is a JSON string, and returns a SchemaRDD;
it extends RDD[Row], which is the type you want.

Afterwards you should be able to do a join to keep your tuple.

Best,
Ayoub.

2015-01-29 10:12 GMT+01:00 Tobias Pfeiffer :

> Hi,
>
> I have data as RDD[(Long, String)], where the Long is a timestamp and the
> String is a JSON-encoded string. I want to infer the schema of the JSON and
> then do a SQL statement on the data (no aggregates, just column selection
> and UDF application), but still have the timestamp associated with each row
> of the result. I completely fail to see how that would be possible. Any
> suggestions?
>
> I can't even see how I would get an RDD[(Long, Row)] so that I *might* be
> able to add the timestamp to the row after schema inference. Is there *any*
> way other than string-manipulating the JSON string and adding the timestamp
> to it?
>
> Thanks
> Tobias
>




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Re-SQL-query-over-Long-JSON-string-tuples-tp21419.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

SQL query over (Long, JSON string) tuples

2015-01-29 Thread Tobias Pfeiffer
Hi,

I have data as RDD[(Long, String)], where the Long is a timestamp and the
String is a JSON-encoded string. I want to infer the schema of the JSON and
then do a SQL statement on the data (no aggregates, just column selection
and UDF application), but still have the timestamp associated with each row
of the result. I completely fail to see how that would be possible. Any
suggestions?

I can't even see how I would get an RDD[(Long, Row)] so that I *might* be
able to add the timestamp to the row after schema inference. Is there *any*
way other than string-manipulating the JSON string and adding the timestamp
to it?

Thanks
Tobias