Re: Scala types to StructType

2016-02-15 Thread Ted Yu
Please see the last line of convertToCatalyst(a: Any):

   case other => other

FYI
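
That fall-through case means values which are already valid Catalyst values
(Int, Long, Short, ...) are returned unchanged, so those types are not really
missing. A minimal sketch of what that implies, assuming you call the
(internal) object directly:

import org.apache.spark.sql.catalyst.CatalystTypeConverters

// Primitives hit the final `case other => other` and come back as-is.
CatalystTypeConverters.convertToCatalyst(12)   // Int  -> 12
CatalystTypeConverters.convertToCatalyst(12L)  // Long -> 12L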

On Mon, Feb 15, 2016 at 12:09 AM, Fabian Böhnlein <
fabian.boehnl...@gmail.com> wrote:

> Interesting, thanks.
>
> The (only) publicly accessible method seems to be *convertToCatalyst*:
>
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L425
>
> It seems to be missing some types like Integer, Short, Long... I'll give it a
> try.
>
> Thanks,
> Fabian
>
>
> On 12/02/16 05:53, Yogesh Mahajan wrote:
>
> Right, Thanks Ted.
>
> On Fri, Feb 12, 2016 at 10:21 AM, Ted Yu  wrote:
>
>> Minor correction: the class is CatalystTypeConverters.scala
>>
>> On Thu, Feb 11, 2016 at 8:46 PM, Yogesh Mahajan <
>> ymaha...@snappydata.io> wrote:
>>
>>> CatatlystTypeConverters.scala has all kinds of utility methods to
>>> convert from Scala types to Row and vice versa.
>>>
>>>
>>> On Fri, Feb 12, 2016 at 12:21 AM, Rishabh Wadhawan <
>>> rishabh...@gmail.com> wrote:
>>>
 I had the same issue. I resolved it in Java, but I am pretty sure it would
 work with Scala too. It's kind of a gross hack. Say I had a table in MySQL
 with 1000 columns: what I did is run a JDBC query to extract the schema of
 the table. I stored that schema and wrote a map function to create
 StructFields using StructType and RowFactory. Then I loaded that table as a
 DataFrame, even though it had a schema, and converted that DataFrame into an
 RDD, which is when it lost the schema. I then did some work on that RDD and
 converted it back using those StructFields.
 If your source is a structured type it is better to load it directly as a
 DataFrame, that way you preserve the schema. In your case, however, you
 could do something like this:

 // map holds the key -> value pairs; one nullable StringType column per key
 List<StructField> fields = new ArrayList<>();
 for (String key : map.keySet()) {
   fields.add(DataTypes.createStructField(key, DataTypes.StringType, true));
 }

 StructType schemaOfDataFrame = DataTypes.createStructType(fields);

 sqlContext.createDataFrame(rdd, schemaOfDataFrame);

 This is how I would do it in Java; not sure about the Scala syntax. Please
 tell me if that helped.

 On Feb 11, 2016, at 7:20 AM, Fabian Böhnlein <
 fabian.boehnl...@gmail.com> wrote:

 Hi all,

 is there a way to create a Spark SQL Row schema based on Scala data
 types without creating a manual mapping?

 That's the only example I can find which doesn't require
 spark.sql.types.DataType instances as input, but it requires defining the
 types as Strings.

 val struct = (new StructType)
   .add("a", "int")
   .add("b", "long")
   .add("c", "string")



 Specifically I have an RDD where each element is a Map of 100s of
 variables with different data types which I want to transform to a 
 DataFrame
 where the keys should end up as the column names:

 Map ("Amean" -> 20.3, "Asize" -> 12, "Bmean" -> )


 Is there a different possibility than building a mapping from the
 values' .getClass to the Spark SQL DataTypes?


 Thanks,
 Fabian





>>>
>>
>
>


Re: Scala types to StructType

2016-02-15 Thread Fabian Böhnlein

Interesting, thanks.

The (only) publicly accessible method seems to be /convertToCatalyst/:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L425

It seems to be missing some types like Integer, Short, Long... I'll give it
a try.


Thanks,
Fabian

On 12/02/16 05:53, Yogesh Mahajan wrote:

Right, Thanks Ted.

On Fri, Feb 12, 2016 at 10:21 AM, Ted Yu wrote:


Minor correction: the class is CatalystTypeConverters.scala

On Thu, Feb 11, 2016 at 8:46 PM, Yogesh Mahajan wrote:

CatatlystTypeConverters.scala has all kinds of utility methods
to convert from Scala types to Row and vice versa.


On Fri, Feb 12, 2016 at 12:21 AM, Rishabh Wadhawan wrote:

I had the same issue. I resolved it in Java, but I am pretty sure it
would work with Scala too. It's kind of a gross hack. Say I had a table
in MySQL with 1000 columns: what I did is run a JDBC query to extract
the schema of the table. I stored that schema and wrote a map function
to create StructFields using StructType and RowFactory. Then I loaded
that table as a DataFrame, even though it had a schema, and converted
that DataFrame into an RDD, which is when it lost the schema. I then
did some work on that RDD and converted it back using those
StructFields.
If your source is a structured type it is better to load it directly
as a DataFrame, that way you preserve the schema. In your case,
however, you could do something like this:

// map holds the key -> value pairs; one nullable StringType column per key
List<StructField> fields = new ArrayList<>();
for (String key : map.keySet()) {
  fields.add(DataTypes.createStructField(key, DataTypes.StringType, true));
}

StructType schemaOfDataFrame = DataTypes.createStructType(fields);

sqlContext.createDataFrame(rdd, schemaOfDataFrame);

This is how I would do it in Java; not sure about the Scala syntax.
Please tell me if that helped.

On Feb 11, 2016, at 7:20 AM, Fabian Böhnlein wrote:

Hi all,

is there a way to create a Spark SQL Row schema based on
Scala data types without creating a manual mapping?

That's the only example I can find which doesn't require
spark.sql.types.DataType instances as input, but it requires
defining the types as Strings.

val struct = (new StructType)
  .add("a", "int")
  .add("b", "long")
  .add("c", "string")


Specifically I have an RDD where each element is a Map of
100s of variables with different data types which I want
to transform to a DataFrame
where the keys should end up as the column names:
Map ("Amean" -> 20.3, "Asize" -> 12, "Bmean" -> )

Is there a different possibility than building a mapping
from the values' .getClass to the Spark SQL DataTypes?


Thanks,
Fabian










Re: Scala types to StructType

2016-02-11 Thread Rishabh Wadhawan
I had the same issue. I resolved it in Java, but I am pretty sure it would work
with Scala too. It's kind of a gross hack. Say I had a table in MySQL with 1000
columns: what I did is run a JDBC query to extract the schema of the table. I
stored that schema and wrote a map function to create StructFields using
StructType and RowFactory. Then I loaded that table as a DataFrame, even though
it had a schema, and converted that DataFrame into an RDD, which is when it lost
the schema. I then did some work on that RDD and converted it back using those
StructFields.
If your source is a structured type it is better to load it directly as a
DataFrame, that way you preserve the schema. In your case, however, you could do
something like this:

// map holds the key -> value pairs; one nullable StringType column per key
List<StructField> fields = new ArrayList<>();
for (String key : map.keySet()) {
  fields.add(DataTypes.createStructField(key, DataTypes.StringType, true));
}

StructType schemaOfDataFrame = DataTypes.createStructType(fields);

sqlContext.createDataFrame(rdd, schemaOfDataFrame);

This is how I would do it in Java; not sure about the Scala syntax.
Please tell me if that helped.
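
A rough Scala equivalent of the same idea, assuming the same map of column
names, an SQLContext called sqlContext, and an existing RDD[Row] called rowRdd
whose value order matches the fields (all of these names are placeholders):

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// One nullable StringType column per key of the map.
val fields = map.keys.toSeq.map(key => StructField(key, StringType, nullable = true))
val schemaOfDataFrame = StructType(fields)

val df = sqlContext.createDataFrame(rowRdd, schemaOfDataFrame)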
> On Feb 11, 2016, at 7:20 AM, Fabian Böhnlein  
> wrote:
> 
> Hi all,
> 
> is there a way to create a Spark SQL Row schema based on Scala data types 
> without creating a manual mapping? 
> 
> That's the only example I can find which doesn't require
> spark.sql.types.DataType instances as input, but it requires defining the
> types as Strings.
> 
> val struct = (new StructType)
>   .add("a", "int")
>   .add("b", "long")
>   .add("c", "string")
> 
> 
> Specifically I have an RDD where each element is a Map of 100s of variables 
> with different data types which I want to transform to a DataFrame
> where the keys should end up as the column names:
> Map ("Amean" -> 20.3, "Asize" -> 12, "Bmean" -> )
> 
> Is there a different possibility than building a mapping from the values' 
> .getClass to the Spark SQL DataTypes?
> 
> 
> Thanks,
> Fabian
> 
> 
> 



Scala types to StructType

2016-02-11 Thread Fabian Böhnlein

Hi all,

is there a way to create a Spark SQL Row schema based on Scala data 
types without creating a manual mapping?


That's the only example I can find which doesn't require
spark.sql.types.DataType instances as input, but it requires defining
the types as Strings.


val struct = (new StructType)
  .add("a", "int")
  .add("b", "long")
  .add("c", "string")




Specifically I have an RDD where each element is a Map of 100s of 
variables with different data types which I want to transform to a DataFrame

where the keys should end up as the column names:

Map ("Amean" -> 20.3, "Asize" -> 12, "Bmean" -> )


Is there a different possibility than building a mapping from the 
values' .getClass to the Spark SQL DataTypes?
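
For example, this is roughly the hand-written mapping I would like to avoid
(the covered types and the fallback here are arbitrary):

import org.apache.spark.sql.types._

// Map a sample value's runtime type to a Spark SQL DataType.
def dataTypeOf(value: Any): DataType = value match {
  case _: Int    => IntegerType
  case _: Long   => LongType
  case _: Double => DoubleType
  case _: String => StringType
  case other     => throw new IllegalArgumentException(s"Unhandled type: ${other.getClass}")
}

// Build the schema from one representative element.
val sample: Map[String, Any] = Map("Amean" -> 20.3, "Asize" -> 12)
val schema = StructType(sample.toSeq.map { case (name, v) =>
  StructField(name, dataTypeOf(v), nullable = true)
})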



Thanks,
Fabian




Re: Scala types to StructType

2016-02-11 Thread Yogesh Mahajan
CatatlystTypeConverters.scala has all kinds of utility methods to convert
from Scala types to Row and vice versa.
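
For example, a sketch only, since these converters live in an internal package
and their exact behaviour may differ between versions:

import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.CatalystTypeConverters
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("a", IntegerType),
  StructField("b", StringType)))

// Converter functions for this schema: external Row -> Catalyst value and back.
val toCatalyst = CatalystTypeConverters.createToCatalystConverter(schema)
val toScala    = CatalystTypeConverters.createToScalaConverter(schema)

val internal = toCatalyst(Row(1, "x"))
val external = toScala(internal)   // back to an external Row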

On Fri, Feb 12, 2016 at 12:21 AM, Rishabh Wadhawan 
wrote:

> I had the same issue. I resolved it in Java, but I am pretty sure it would
> work with Scala too. It's kind of a gross hack. Say I had a table in MySQL
> with 1000 columns: what I did is run a JDBC query to extract the schema of
> the table. I stored that schema and wrote a map function to create
> StructFields using StructType and RowFactory. Then I loaded that table as a
> DataFrame, even though it had a schema, and converted that DataFrame into an
> RDD, which is when it lost the schema. I then did some work on that RDD and
> converted it back using those StructFields.
> If your source is a structured type it is better to load it directly as a
> DataFrame, that way you preserve the schema. In your case, however, you could
> do something like this:
>
> // map holds the key -> value pairs; one nullable StringType column per key
> List<StructField> fields = new ArrayList<>();
> for (String key : map.keySet()) {
>   fields.add(DataTypes.createStructField(key, DataTypes.StringType, true));
> }
>
> StructType schemaOfDataFrame = DataTypes.createStructType(fields);
>
> sqlContext.createDataFrame(rdd, schemaOfDataFrame);
>
> This is how I would do it in Java; not sure about the Scala syntax. Please
> tell me if that helped.
>
> On Feb 11, 2016, at 7:20 AM, Fabian Böhnlein 
> wrote:
>
> Hi all,
>
> is there a way to create a Spark SQL Row schema based on Scala data types
> without creating a manual mapping?
>
> That's the only example I can find which doesn't require
> spark.sql.types.DataType instances as input, but it requires defining the
> types as Strings.
>
> val struct = (new StructType)
>   .add("a", "int")
>   .add("b", "long")
>   .add("c", "string")
>
>
>
> Specifically I have an RDD where each element is a Map of 100s of
> variables with different data types which I want to transform to a DataFrame
> where the keys should end up as the column names:
>
> Map ("Amean" -> 20.3, "Asize" -> 12, "Bmean" -> )
>
>
> Is there a different possibility than building a mapping from the values'
> .getClass to the Spark SQL DataTypes?
>
>
> Thanks,
> Fabian
>
>
>
>
>


Re: Scala types to StructType

2016-02-11 Thread Yogesh Mahajan
Right, Thanks Ted.

On Fri, Feb 12, 2016 at 10:21 AM, Ted Yu  wrote:

> Minor correction: the class is CatalystTypeConverters.scala
>
> On Thu, Feb 11, 2016 at 8:46 PM, Yogesh Mahajan 
> wrote:
>
>> CatatlystTypeConverters.scala has all kinds of utility methods to convert
>> from Scala types to Row and vice versa.
>>
>>
>> On Fri, Feb 12, 2016 at 12:21 AM, Rishabh Wadhawan 
>> wrote:
>>
>>> I had the same issue. I resolved it in Java, but I am pretty sure it would
>>> work with Scala too. It's kind of a gross hack. Say I had a table in MySQL
>>> with 1000 columns: what I did is run a JDBC query to extract the schema of
>>> the table. I stored that schema and wrote a map function to create
>>> StructFields using StructType and RowFactory. Then I loaded that table as a
>>> DataFrame, even though it had a schema, and converted that DataFrame into
>>> an RDD, which is when it lost the schema. I then did some work on that RDD
>>> and converted it back using those StructFields.
>>> If your source is a structured type it is better to load it directly as a
>>> DataFrame, that way you preserve the schema. In your case, however, you
>>> could do something like this:
>>>
>>> // map holds the key -> value pairs; one nullable StringType column per key
>>> List<StructField> fields = new ArrayList<>();
>>> for (String key : map.keySet()) {
>>>   fields.add(DataTypes.createStructField(key, DataTypes.StringType, true));
>>> }
>>>
>>> StructType schemaOfDataFrame = DataTypes.createStructType(fields);
>>>
>>> sqlContext.createDataFrame(rdd, schemaOfDataFrame);
>>>
>>> This is how I would do it in Java; not sure about the Scala syntax. Please
>>> tell me if that helped.
>>>
>>> On Feb 11, 2016, at 7:20 AM, Fabian Böhnlein 
>>> wrote:
>>>
>>> Hi all,
>>>
>>> is there a way to create a Spark SQL Row schema based on Scala data
>>> types without creating a manual mapping?
>>>
>>> That's the only example I can find which doesn't require
>>> spark.sql.types.DataType instances as input, but it requires defining the
>>> types as Strings.
>>>
>>> val struct = (new StructType)
>>>   .add("a", "int")
>>>   .add("b", "long")
>>>   .add("c", "string")
>>>
>>>
>>>
>>> Specifically I have an RDD where each element is a Map of 100s of
>>> variables with different data types which I want to transform to a DataFrame
>>> where the keys should end up as the column names:
>>>
>>> Map ("Amean" -> 20.3, "Asize" -> 12, "Bmean" -> )
>>>
>>>
>>> Is there a different possibility than building a mapping from the
>>> values' .getClass to the Spark SQL DataTypes?
>>>
>>>
>>> Thanks,
>>> Fabian
>>>
>>>
>>>
>>>
>>>
>>
>


Re: Scala types to StructType

2016-02-11 Thread Ted Yu
Minor correction: the class is CatalystTypeConverters.scala

On Thu, Feb 11, 2016 at 8:46 PM, Yogesh Mahajan 
wrote:

> CatatlystTypeConverters.scala has all kinds of utility methods to convert
> from Scala types to Row and vice versa.
>
>
> On Fri, Feb 12, 2016 at 12:21 AM, Rishabh Wadhawan 
> wrote:
>
>> I had the same issue. I resolved it in Java, but I am pretty sure it would
>> work with Scala too. It's kind of a gross hack. Say I had a table in MySQL
>> with 1000 columns: what I did is run a JDBC query to extract the schema of
>> the table. I stored that schema and wrote a map function to create
>> StructFields using StructType and RowFactory. Then I loaded that table as a
>> DataFrame, even though it had a schema, and converted that DataFrame into an
>> RDD, which is when it lost the schema. I then did some work on that RDD and
>> converted it back using those StructFields.
>> If your source is a structured type it is better to load it directly as a
>> DataFrame, that way you preserve the schema. In your case, however, you
>> could do something like this:
>>
>> // map holds the key -> value pairs; one nullable StringType column per key
>> List<StructField> fields = new ArrayList<>();
>> for (String key : map.keySet()) {
>>   fields.add(DataTypes.createStructField(key, DataTypes.StringType, true));
>> }
>>
>> StructType schemaOfDataFrame = DataTypes.createStructType(fields);
>>
>> sqlContext.createDataFrame(rdd, schemaOfDataFrame);
>>
>> This is how I would do it in Java; not sure about the Scala syntax. Please
>> tell me if that helped.
>>
>> On Feb 11, 2016, at 7:20 AM, Fabian Böhnlein 
>> wrote:
>>
>> Hi all,
>>
>> is there a way to create a Spark SQL Row schema based on Scala data types
>> without creating a manual mapping?
>>
>> That's the only example I can find which doesn't require
>> spark.sql.types.DataType instances as input, but it requires defining the
>> types as Strings.
>>
>> val struct = (new StructType)
>>   .add("a", "int")
>>   .add("b", "long")
>>   .add("c", "string")
>>
>>
>>
>> Specifically I have an RDD where each element is a Map of 100s of
>> variables with different data types which I want to transform to a DataFrame
>> where the keys should end up as the column names:
>>
>> Map ("Amean" -> 20.3, "Asize" -> 12, "Bmean" -> )
>>
>>
>> Is there a different possibility than building a mapping from the values'
>> .getClass to the Spark SQL DataTypes?
>>
>>
>> Thanks,
>> Fabian
>>
>>
>>
>>
>>
>