Right, Thanks Ted.

On Fri, Feb 12, 2016 at 10:21 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Minor correction: the class is CatalystTypeConverters.scala
>
> On Thu, Feb 11, 2016 at 8:46 PM, Yogesh Mahajan <ymaha...@snappydata.io>
> wrote:
>
>> CatalystTypeConverters.scala has all kinds of utility methods to convert
>> from Scala types to Row and vice versa.
>>
>>
>> On Fri, Feb 12, 2016 at 12:21 AM, Rishabh Wadhawan <rishabh...@gmail.com>
>> wrote:
>>
>>> I had the same issue. I resolved it in Java, but I am pretty sure it
>>> would work with Scala too. It's kind of a gross hack, but here is what I
>>> did: I had a table in MySQL with 1000 columns, so I ran a JDBC query to
>>> extract the schema of the table. I stored that schema and wrote a map
>>> function to create StructFields using StructType and RowFactory. Then I
>>> loaded that table as a DataFrame, which already had a schema, and
>>> converted it into an RDD; that is when it lost the schema. I performed
>>> some transformations on that RDD and then converted it back to a
>>> DataFrame using the StructFields.
>>> If your source is a structured type, it is better to load it directly as
>>> a DataFrame so that you preserve the schema. In your case, however, you
>>> could do something like this:
>>> import java.util.ArrayList;
>>> import java.util.List;
>>> import org.apache.spark.sql.types.*;
>>>
>>> // One nullable StringType field per key in the map of column names.
>>> List<StructField> fields = new ArrayList<StructField>();
>>> for (String key : map.keySet()) {
>>>     fields.add(DataTypes.createStructField(key, DataTypes.StringType, true));
>>> }
>>>
>>> StructType schemaOfDataFrame = DataTypes.createStructType(fields);
>>>
>>> // Re-attach the schema to the RDD of Rows.
>>> sqlcontext.createDataFrame(rdd, schemaOfDataFrame);
>>>
>>> This is how I would do it in Java; I am not sure about the Scala syntax.
>>> Please tell me if that helped.
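>>>
>>> A rough Scala equivalent might look like this (an untested sketch: the
>>> column names, rowRDD, and sqlContext below are placeholders, and every
>>> column gets StringType just like in the Java version above):
>>>
>>> import org.apache.spark.sql.types.{StringType, StructField, StructType}
>>>
>>> // Placeholder: the column names extracted from the source table.
>>> val columnNames = Seq("col1", "col2")
>>>
>>> // One nullable StringType field per column name.
>>> val schemaOfDataFrame = StructType(
>>>   columnNames.map(name => StructField(name, StringType, nullable = true)))
>>>
>>> // Re-attach the schema to the RDD[Row].
>>> val df = sqlContext.createDataFrame(rowRDD, schemaOfDataFrame)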
>>>
>>> On Feb 11, 2016, at 7:20 AM, Fabian Böhnlein <fabian.boehnl...@gmail.com>
>>> wrote:
>>>
>>> Hi all,
>>>
>>> is there a way to create a Spark SQL Row schema based on Scala data
>>> types without creating a manual mapping?
>>>
>>> That's the only example I can find which doesn't already require
>>> spark.sql.types.DataType as input, but it requires defining the types as
>>> strings:
>>>
>>> val struct = (new StructType)
>>>   .add("a", "int")
>>>   .add("b", "long")
>>>   .add("c", "string")
>>>
>>>
>>>
>>> Specifically, I have an RDD where each element is a Map of hundreds of
>>> variables with different data types, which I want to transform into a
>>> DataFrame where the keys end up as the column names:
>>>
>>> Map ("Amean" -> 20.3, "Asize" -> 12, "Bmean" -> ....)
>>>
>>>
>>> Is there another possibility besides building a mapping from the values'
>>> .getClass to the Spark SQL DataTypes?
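>>>
>>> (For illustration, such a getClass-based mapping might look roughly like
>>> the following sketch; it only covers a few value types and falls back to
>>> StringType for everything else:)
>>>
>>> import org.apache.spark.sql.types._
>>>
>>> // Lookup table from a value's runtime class to a Spark SQL DataType.
>>> val typeMap: Map[Class[_], DataType] = Map(
>>>   classOf[java.lang.Double]  -> DoubleType,
>>>   classOf[java.lang.Integer] -> IntegerType,
>>>   classOf[String]            -> StringType)
>>>
>>> def dataTypeOf(value: Any): DataType =
>>>   typeMap.getOrElse(value.getClass, StringType)
>>>
>>> // Build the schema from one sample element of the RDD.
>>> val sample = Map("Amean" -> 20.3, "Asize" -> 12)
>>> val schema = StructType(sample.toSeq.map { case (name, value) =>
>>>   StructField(name, dataTypeOf(value), nullable = true)
>>> })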
>>>
>>>
>>> Thanks,
>>> Fabian
>>>
>>>
>>>
>>>
>>>
>>
>
