Right, thanks Ted.

On Fri, Feb 12, 2016 at 10:21 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Minor correction: the class is CatalystTypeConverters.scala
>
> On Thu, Feb 11, 2016 at 8:46 PM, Yogesh Mahajan <ymaha...@snappydata.io> wrote:
>
>> CatatlystTypeConverters.scala has all kinds of utility methods to convert
>> from Scala types to Row and vice versa.
>>
>> On Fri, Feb 12, 2016 at 12:21 AM, Rishabh Wadhawan <rishabh...@gmail.com> wrote:
>>
>>> I had the same issue. I resolved it in Java, but I am pretty sure it
>>> would work with Scala too. It's kind of a gross hack, but here is what I
>>> did. Say I had a table in MySQL with 1000 columns. I ran a JDBC query to
>>> extract the schema of the table, stored that schema, and wrote a map
>>> function to create StructFields using StructType and RowFactory. Then I
>>> loaded the table as a DataFrame (which had a schema) and converted it
>>> into an RDD, which is where it lost the schema. I performed my
>>> transformations on that RDD and then converted it back using the
>>> StructFields.
>>> If your source is structured, it is better to load it directly as a
>>> DataFrame so you can preserve the schema. In your case, though, you
>>> could do something like this:
>>>
>>> List<StructField> fields = new ArrayList<StructField>();
>>> // Every key becomes a nullable String column.
>>> for (String key : map.keySet()) {
>>>     fields.add(DataTypes.createStructField(key, DataTypes.StringType, true));
>>> }
>>>
>>> StructType schemaOfDataFrame = DataTypes.createStructType(fields);
>>>
>>> sqlContext.createDataFrame(rdd, schemaOfDataFrame);
>>>
>>> This is how I would do it in Java; I am not sure about the Scala syntax.
>>> Please tell me if that helped.
>>>
>>> On Feb 11, 2016, at 7:20 AM, Fabian Böhnlein <fabian.boehnl...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> is there a way to create a Spark SQL Row schema based on Scala data
>>> types without creating a manual mapping?
>>>
>>> The example below is the only one I can find that doesn't already
>>> require spark.sql.types.DataType as input, but it requires defining the
>>> types as Strings:
>>>
>>> val struct = (new StructType)
>>>   .add("a", "int")
>>>   .add("b", "long")
>>>   .add("c", "string")
>>>
>>> Specifically, I have an RDD where each element is a Map of 100s of
>>> variables with different data types, which I want to transform to a
>>> DataFrame where the keys end up as the column names:
>>>
>>> Map("Amean" -> 20.3, "Asize" -> 12, "Bmean" -> ....)
>>>
>>> Is there a different possibility than building a mapping from the
>>> values' .getClass to the Spark SQL DataTypes?
>>>
>>> Thanks,
>>> Fabian
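
For reference, here is a minimal Scala sketch of the .getClass-style mapping Fabian describes: it inspects one element of the RDD to infer a DataType per key, builds a StructType, and converts each Map into a Row in a fixed column order. The names rdd (an RDD[Map[String, Any]]) and sqlContext are assumptions, as is the premise that every Map shares the same keys and value types; the match would need extending for any further types.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // Map a runtime Scala value to a Spark SQL DataType; extend as needed.
    def dataTypeOf(value: Any): DataType = value match {
      case _: Int     => IntegerType
      case _: Long    => LongType
      case _: Double  => DoubleType
      case _: Boolean => BooleanType
      case _: String  => StringType
      case other      => throw new IllegalArgumentException(
        s"Unsupported type: ${other.getClass}")
    }

    // Infer the schema from one representative element (assumes all Maps
    // share the same keys and value types).
    val sample: Map[String, Any] = rdd.first()
    val fieldNames = sample.keys.toSeq.sorted
    val schema = StructType(fieldNames.map(name =>
      StructField(name, dataTypeOf(sample(name)), nullable = true)))

    // Convert each Map to a Row with values in the same column order.
    val rowRdd: RDD[Row] = rdd.map(m => Row.fromSeq(fieldNames.map(m(_))))

    val df = sqlContext.createDataFrame(rowRdd, schema)

Sampling a single element keeps the sketch simple; if keys or value types can vary across elements, you would need to aggregate over the whole RDD, or fall back to StringType for everything as in Rishabh's Java example.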