Re: udf StructField to JSON String

2016-03-11 Thread Tristan Nixon
So I think in your case you’d do something more like:

val jsontrans = new JsonSerializationTransformer[StructType]
  .setInputCol("event")
  .setOutputCol("eventJSON")


> On Mar 11, 2016, at 3:51 PM, Tristan Nixon  wrote:
> 
> val jsontrans = new JsonSerializationTransformer[Document]
>   .setInputCol("myEntityColumn")
>   .setOutputCol("myOutputColumn")



Re: udf StructField to JSON String

2016-03-11 Thread Tristan Nixon
It’s pretty simple, really:

import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

/**
 * A SparkML Transformer that will transform an
 * entity of type T into a JSON-formatted string.
 * Created by Tristan Nixon  on 3/11/16.
 */
class JsonSerializationTransformer[T](override val uid: String)
  extends UnaryTransformer[T, String, JsonSerializationTransformer[T]] {

  def this() = this(Identifiable.randomUID("JsonSerializationTransformer"))

  val mapper = new ObjectMapper
  // add additional mapper configuration code here, like this:
  // mapper.setAnnotationIntrospector(new JaxbAnnotationIntrospector)
  // or this:
  // mapper.getSerializationConfig.withFeatures(
  //   SerializationFeature.WRITE_DATES_AS_TIMESTAMPS )

  override protected def createTransformFunc: T => String =
    mapper.writeValueAsString

  // StringType is a singleton object; `new StringType` would not compile
  override protected def outputDataType: DataType = StringType
}
and you would use it like any other transformer:

val jsontrans = new JsonSerializationTransformer[Document]
  .setInputCol("myEntityColumn")
  .setOutputCol("myOutputColumn")

val dfWithJson = jsontrans.transform( entityDF )

Note that this implementation is for Jackson 2.x. If you want to use Jackson 
1.x, it’s a bit trickier because the ObjectMapper class is not Serializable, 
and so you need to initialize it per-partition rather than having it just be a 
standard property.
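For the Jackson 1.x route, a common alternative to explicit per-partition initialization is to mark the mapper `@transient lazy`, so it is rebuilt on first use in each executor JVM instead of being serialized with the transformer. A minimal sketch (the class name and structure here are hypothetical, mirroring the Jackson 2.x transformer above):

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

class Jackson1xSerializationTransformer[T](override val uid: String)
  extends UnaryTransformer[T, String, Jackson1xSerializationTransformer[T]] {

  def this() = this(Identifiable.randomUID("Jackson1xSerializationTransformer"))

  // Jackson 1.x ObjectMapper (org.codehaus.jackson) is not Serializable;
  // @transient keeps it out of the serialized transformer, and lazy
  // rebuilds it on first use in each executor JVM.
  @transient private lazy val mapper =
    new org.codehaus.jackson.map.ObjectMapper

  override protected def createTransformFunc: T => String =
    mapper.writeValueAsString

  override protected def outputDataType: DataType = StringType
}
```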

> On Mar 11, 2016, at 12:49 PM, Jacek Laskowski  wrote:
> 
> Hi Tristan,
> 
> Mind sharing the relevant code? I'd like to learn the way you use Transformer 
> to do so. Thanks!
> 
> Jacek
> 
> On 11.03.2016 at 7:07 PM, "Tristan Nixon" wrote:
> I have a similar situation in an app of mine. I implemented a custom ML 
> Transformer that wraps the Jackson ObjectMapper - this gives you full control 
> over how your custom entities / structs are serialized.
> 
>> On Mar 11, 2016, at 11:53 AM, Caires Vinicius wrote:
>> 
>> Hmm. I think my problem is a little more complex. I'm using 
>> https://github.com/databricks/spark-redshift 
>>  and when I read from JSON 
>> file I got this schema.
>> 
>> root
>>  |-- app: string (nullable = true)
>>  |-- ct: long (nullable = true)
>>  |-- event: struct (nullable = true)
>>  |    |-- attributes: struct (nullable = true)
>>  |    |    |-- account: string (nullable = true)
>>  |    |    |-- accountEmail: string (nullable = true)
>>  |    |    |-- accountId: string (nullable = true)
>> 
>> I want to transform the Column event into String (formatted as JSON). 
>> 
>> I was trying to use udf but without success.
>> 
>> 
>> On Fri, Mar 11, 2016 at 1:53 PM Tristan Nixon wrote:
>> Have you looked at DataFrame.write.json( path )?
>> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter
>> 
>> > On Mar 11, 2016, at 7:15 AM, Caires Vinicius wrote:
>> >
>> > I have one DataFrame with nested StructField and I want to convert to JSON
>> > String. Is there any way to accomplish this?
>> 
> 



Re: udf StructField to JSON String

2016-03-11 Thread Caires Vinicius
I would like to see the code as well Tristan!

On Fri, Mar 11, 2016 at 1:53 PM Tristan Nixon  wrote:

> Have you looked at DataFrame.write.json( path )?
>
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter
>
> > On Mar 11, 2016, at 7:15 AM, Caires Vinicius  wrote:
> >
> > I have one DataFrame with nested StructField and I want to convert to
> JSON String. Is there any way to accomplish this?
>
>


Re: udf StructField to JSON String

2016-03-11 Thread Michael Armbrust
df.select("event").toJSON
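A minimal sketch of that one-liner in context (assuming a DataFrame named df with the schema above; in Spark 1.6 toJSON returns an RDD[String], and Spark 2.1+ also offers a to_json column function in org.apache.spark.sql.functions):

```scala
// Select only the struct column, then serialize each row to JSON.
// Each output record has the shape {"event":{"attributes":{...}}}.
val eventJson = df.select("event").toJSON

// inspect a few serialized rows
eventJson.take(5).foreach(println)
```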

On Fri, Mar 11, 2016 at 9:53 AM, Caires Vinicius  wrote:

> Hmm. I think my problem is a little more complex. I'm using
> https://github.com/databricks/spark-redshift and when I read from JSON
> file I got this schema.
>
> root
>  |-- app: string (nullable = true)
>  |-- ct: long (nullable = true)
>  |-- event: struct (nullable = true)
>  |    |-- attributes: struct (nullable = true)
>  |    |    |-- account: string (nullable = true)
>  |    |    |-- accountEmail: string (nullable = true)
>  |    |    |-- accountId: string (nullable = true)
>
> I want to transform the Column *event* into String (formatted as JSON).
>
> I was trying to use udf but without success.
>
> On Fri, Mar 11, 2016 at 1:53 PM Tristan Nixon 
> wrote:
>
>> Have you looked at DataFrame.write.json( path )?
>>
>> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter
>>
>> > On Mar 11, 2016, at 7:15 AM, Caires Vinicius 
>> wrote:
>> >
>> > I have one DataFrame with nested StructField and I want to convert to
>> JSON String. Is there any way to accomplish this?
>>
>>


Re: udf StructField to JSON String

2016-03-11 Thread Jacek Laskowski
Hi Tristan,

Mind sharing the relevant code? I'd like to learn the way you use
Transformer to do so. Thanks!

Jacek
On 11.03.2016 at 7:07 PM, "Tristan Nixon" wrote:

> I have a similar situation in an app of mine. I implemented a custom ML
> Transformer that wraps the Jackson ObjectMapper - this gives you full
> control over how your custom entities / structs are serialized.
>
> On Mar 11, 2016, at 11:53 AM, Caires Vinicius  wrote:
>
> Hmm. I think my problem is a little more complex. I'm using
> https://github.com/databricks/spark-redshift and when I read from JSON
> file I got this schema.
>
> root
>  |-- app: string (nullable = true)
>  |-- ct: long (nullable = true)
>  |-- event: struct (nullable = true)
>  |    |-- attributes: struct (nullable = true)
>  |    |    |-- account: string (nullable = true)
>  |    |    |-- accountEmail: string (nullable = true)
>  |    |    |-- accountId: string (nullable = true)
>
> I want to transform the Column *event* into String (formatted as JSON).
>
> I was trying to use udf but without success.
>
> On Fri, Mar 11, 2016 at 1:53 PM Tristan Nixon 
> wrote:
>
>> Have you looked at DataFrame.write.json( path )?
>>
>> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter
>>
>> > On Mar 11, 2016, at 7:15 AM, Caires Vinicius 
>> wrote:
>> >
>> > I have one DataFrame with nested StructField and I want to convert to
>> JSON String. Is there any way to accomplish this?
>>
>>
>


Re: udf StructField to JSON String

2016-03-11 Thread Tristan Nixon
I have a similar situation in an app of mine. I implemented a custom ML 
Transformer that wraps the Jackson ObjectMapper - this gives you full control 
over how your custom entities / structs are serialized.

> On Mar 11, 2016, at 11:53 AM, Caires Vinicius  wrote:
> 
> Hmm. I think my problem is a little more complex. I'm using 
> https://github.com/databricks/spark-redshift 
>  and when I read from JSON file 
> I got this schema.
> 
> root
>  |-- app: string (nullable = true)
>  |-- ct: long (nullable = true)
>  |-- event: struct (nullable = true)
>  |    |-- attributes: struct (nullable = true)
>  |    |    |-- account: string (nullable = true)
>  |    |    |-- accountEmail: string (nullable = true)
>  |    |    |-- accountId: string (nullable = true)
> 
> I want to transform the Column event into String (formatted as JSON). 
> 
> I was trying to use udf but without success.
> 
> 
> On Fri, Mar 11, 2016 at 1:53 PM Tristan Nixon wrote:
> Have you looked at DataFrame.write.json( path )?
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter
> 
> 
> > On Mar 11, 2016, at 7:15 AM, Caires Vinicius wrote:
> >
> > I have one DataFrame with nested StructField and I want to convert to JSON
> > String. Is there any way to accomplish this?
> 



Re: udf StructField to JSON String

2016-03-11 Thread Caires Vinicius
Hmm. I think my problem is a little more complex. I'm using
https://github.com/databricks/spark-redshift and when I read from a JSON file
I get this schema.

root
 |-- app: string (nullable = true)
 |-- ct: long (nullable = true)
 |-- event: struct (nullable = true)
 |    |-- attributes: struct (nullable = true)
 |    |    |-- account: string (nullable = true)
 |    |    |-- accountEmail: string (nullable = true)
 |    |    |-- accountId: string (nullable = true)


I want to transform the column *event* into a String (formatted as JSON).

I was trying to use a udf but without success.

On Fri, Mar 11, 2016 at 1:53 PM Tristan Nixon  wrote:

> Have you looked at DataFrame.write.json( path )?
>
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter
>
> > On Mar 11, 2016, at 7:15 AM, Caires Vinicius  wrote:
> >
> > I have one DataFrame with nested StructField and I want to convert to
> JSON String. Is there any way to accomplish this?
>
>


Re: udf StructField to JSON String

2016-03-11 Thread Tristan Nixon
Have you looked at DataFrame.write.json( path )?
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter

> On Mar 11, 2016, at 7:15 AM, Caires Vinicius  wrote:
> 
> I have one DataFrame with nested StructField and I want to convert to JSON
> String. Is there any way to accomplish this?


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org