Re: how to create a DataType Object using the String representation in Java using Spark 2.2.0?

2018-01-26 Thread Rick Moritz
Hi,

We solved this the ugly way, when parsing external column definitions:

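import org.apache.spark.sql.types._

// Maps a type name from an external column definition to a Spark DataType.
private def columnTypeToFieldType(columnType: String): DataType = {
  columnType match {
    case "IntegerType" => IntegerType
    case "StringType" => StringType
    case "DateType" => DateType
    case "FloatType" => FloatType
    // A bare "DecimalType" carries no precision or scale, so fall back to the system default.
    case "DecimalType" => DecimalType.SYSTEM_DEFAULT
    // "TimeStampType" is the spelling from our column definitions; Spark's own name is "TimestampType".
    case "TimeStampType" | "TimestampType" => TimestampType
    case "BooleanType" => BooleanType
    case _ => throw new IllegalArgumentException(
      s"ColumnType $columnType is not known, " +
        s"please add it in the ${this.getClass.getName} class!")
  }
}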

There may be a prettier solution than this, but especially with
DecimalType there are limitations: even with reflection and
Class.forName (i.e.
Class.forName(s"org.apache.spark.sql.types.$columnType")), it's not
trivial, because a decimal also needs a precision and a scale.
Furthermore, getting a companion object for a class name is a bit uglier
than getting just the class, see
https://stackoverflow.com/questions/11020746/get-companion-object-instance-with-new-scala-reflection-api
Since the number of types can be expected to be roughly constant, you only
pay the overhead of Scala's matching engine in the ugly solution. In our
case, the effort of engineering something more elaborate was outweighed by
a simple method that might rarely fail, but then does so in a mostly
understandable way.

N.B.: mapping isn't complete -- complex types weren't in our scope.
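
For what it's worth, here is a possible shortcut that avoids both the match
and reflection. It is only a sketch, under the assumption that the incoming
names follow Spark's own toString convention ("IntegerType", "StringType",
...): for the simple atomic types, stripping the "Type" suffix and
lower-casing gives the name that DataType.fromJson accepts once it is wrapped
in JSON quotes. The object name TypeNameLookup and the method fromTypeName are
made up for illustration. Parameterised names such as "DecimalType(10,2)" and
complex types are not handled, and as far as I can tell a bare "DecimalType"
resolves to a different default precision than DecimalType.SYSTEM_DEFAULT, so
check that case before relying on it.

```scala
import java.util.Locale

import org.apache.spark.sql.types._

// Hypothetical helper, not part of Spark: converts a toString-style type name
// such as "IntegerType" into a DataType by delegating to DataType.fromJson.
object TypeNameLookup {

  def fromTypeName(columnType: String): DataType = {
    // "IntegerType" -> "integer", "TimestampType" -> "timestamp", ...
    val jsonName = columnType.stripSuffix("Type").toLowerCase(Locale.ROOT)
    // fromJson expects valid JSON, so the name has to be a quoted JSON string.
    DataType.fromJson("\"" + jsonName + "\"")
  }

  def main(args: Array[String]): Unit = {
    Seq("IntegerType", "StringType", "DateType", "BooleanType").foreach { name =>
      println(s"$name -> ${fromTypeName(name)}")
    }
  }
}
```

Unknown names make fromJson throw, so the failure mode stays about as
understandable as with the explicit match.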

On Fri, Jan 26, 2018 at 8:11 AM, Kurt Fehlhauer  wrote:

> Can you share your code and a sample of your data? Without seeing it, I
> can't give a definitive answer. I can offer some hints. If you have a
> column of strings, you should be able to create a new column cast
> to Integer. This can be accomplished in two ways:
>
> df.withColumn("newColumn", df("currentColumn").cast(IntegerType))
>
> or
>
> val df2 = df.selectExpr("cast(currentColumn as int) as newColumn")
>
>
> Without seeing your JSON, I really can't offer further assistance.
>
>
> On Thu, Jan 25, 2018 at 11:39 PM, kant kodali  wrote:
>
>> It seems like it's hard to construct a DataType given its String literal
>> representation.
>>
>> dataframe.dtypes() returns column names and their corresponding types. For
>> example, say I have an integer column named "sum": dataframe.dtypes()
>> would return "sum" and "IntegerType", but this string representation
>> "IntegerType" doesn't seem to be very useful because I cannot do
>> DataType.fromJson("IntegerType"); this throws an error. So I am not
>> quite sure how to construct a DataType given its String representation.
>>
>> On Thu, Jan 25, 2018 at 4:22 PM, kant kodali  wrote:
>>
>>> Hi All,
>>>
>>> I have a datatype "IntegerType" represented as a String and now I want
>>> to create a DataType object out of that. I couldn't find anything in the
>>> DataType or DataTypes API on how to do that.
>>>
>>> Thanks!
>>>
>>
>>
>


Re: how to create a DataType Object using the String representation in Java using Spark 2.2.0?

2018-01-25 Thread Kurt Fehlhauer
Can you share your code and a sample of your data? Without seeing it, I
can't give a definitive answer. I can offer some hints. If you have a
column of strings, you should be able to create a new column cast
to Integer. This can be accomplished in two ways:

df.withColumn("newColumn", df("currentColumn").cast(IntegerType))

or

val df2 = df.selectExpr("cast(currentColumn as int) as newColumn")


Without seeing your JSON, I really can't offer further assistance.
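
To make the two hints above concrete, here is a small self-contained sketch.
The data, the column names currentColumn and newColumn, and the object
CastColumnSketch are all placeholders of mine, and it assumes a local
SparkSession is acceptable for trying things out.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

// Hypothetical example object; column names are placeholders.
object CastColumnSketch {

  // Variant 1: cast through the Column API.
  def castWithColumn(df: DataFrame): DataFrame =
    df.withColumn("newColumn", col("currentColumn").cast(IntegerType))

  // Variant 2: cast through a SQL expression string.
  def castWithExpr(df: DataFrame): DataFrame =
    df.selectExpr("currentColumn", "cast(currentColumn as int) as newColumn")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-column-sketch")
      .getOrCreate()
    import spark.implicits._

    // A single string column standing in for the real data.
    val df = Seq("1", "2", "3").toDF("currentColumn")

    castWithColumn(df).printSchema()
    castWithExpr(df).printSchema()

    spark.stop()
  }
}
```

Both variants should leave newColumn typed as an integer; strings that cannot
be parsed as integers end up as null rather than raising an error.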


On Thu, Jan 25, 2018 at 11:39 PM, kant kodali  wrote:

> It seems like it's hard to construct a DataType given its String literal
> representation.
>
> dataframe.dtypes() returns column names and their corresponding types. For
> example, say I have an integer column named "sum": dataframe.dtypes()
> would return "sum" and "IntegerType", but this string representation
> "IntegerType" doesn't seem to be very useful because I cannot do
> DataType.fromJson("IntegerType"); this throws an error. So I am not
> quite sure how to construct a DataType given its String representation.
>
> On Thu, Jan 25, 2018 at 4:22 PM, kant kodali  wrote:
>
>> Hi All,
>>
>> I have a datatype "IntegerType" represented as a String and now I want to
>> create a DataType object out of that. I couldn't find anything in the
>> DataType or DataTypes API on how to do that.
>>
>> Thanks!
>>
>
>


Re: how to create a DataType Object using the String representation in Java using Spark 2.2.0?

2018-01-25 Thread kant kodali
It seems like it's hard to construct a DataType given its String literal
representation.

dataframe.dtypes() returns column names and their corresponding types. For
example, say I have an integer column named "sum": dataframe.dtypes()
would return "sum" and "IntegerType", but this string representation
"IntegerType" doesn't seem to be very useful because I cannot do
DataType.fromJson("IntegerType"); this throws an error. So I am not
quite sure how to construct a DataType given its String representation.
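
One way around the problem described above, sketched here under the
assumption that the DataFrame itself is still at hand, is to read the types
from its schema instead of from dtypes: each StructField already carries a
DataType object. If a string form is needed for storage, dataType.json
produces text that DataType.fromJson can parse back. The schema below is a
hand-built stand-in for dataframe.schema(), and SchemaRoundTrip and roundTrip
are illustrative names only.

```scala
import org.apache.spark.sql.types._

// Hypothetical example object showing a string round trip that fromJson accepts.
object SchemaRoundTrip {

  // Serialises a DataType to its JSON text and parses that text back.
  def roundTrip(dt: DataType): DataType = DataType.fromJson(dt.json)

  def main(args: Array[String]): Unit = {
    // Stand-in for dataframe.schema(): an integer column named "sum"
    // plus a string column, built by hand so no SparkSession is needed.
    val schema = StructType(Seq(
      StructField("sum", IntegerType),
      StructField("label", StringType)
    ))

    schema.fields.foreach { field =>
      val asJson = field.dataType.json   // string form suitable for fromJson
      val restored = roundTrip(field.dataType)
      println(s"${field.name}: json=$asJson restored=$restored")
      assert(restored == field.dataType)
    }
  }
}
```

The string that dtypes reports is the toString form, which is not JSON, and
that is why passing it to fromJson fails.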

On Thu, Jan 25, 2018 at 4:22 PM, kant kodali  wrote:

> Hi All,
>
> I have a datatype "IntegerType" represented as a String and now I want to
> create a DataType object out of that. I couldn't find anything in the
> DataType or DataTypes API on how to do that.
>
> Thanks!
>