I found the reason why it did not work:

When returning the Spark data type, I was calling new StringType(). After 
changing it to DataTypes.StringType, the cast worked. 
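
A plausible explanation, which is an assumption and not confirmed anywhere in this thread: the cast check may recognise the target type by comparing it against one shared StringType instance, so a separately constructed object is not treated as a string type. The sketch below models that behaviour in plain Java. It is not Spark's code; the names TextType, INSTANCE and canCastToText are hypothetical and exist only for illustration.

```java
/**
 * Simplified model (NOT Spark's actual implementation) of a cast rule that
 * recognises the target type only when it is the one shared instance.
 */
final class SingletonTypePitfall {

    /** Hypothetical stand-in for a data type class whose constructor is callable. */
    static class TextType {
        /** The single shared instance that the cast rule knows about. */
        static final TextType INSTANCE = new TextType();

        // equals() is not overridden, so two TextType objects are equal
        // only if they are the very same object.
        TextType() {}
    }

    /** Hypothetical cast rule: only the shared instance counts as "text". */
    static boolean canCastToText(Object targetType) {
        return targetType.equals(TextType.INSTANCE);
    }

    public static void main(String[] args) {
        // A freshly constructed instance is rejected by the rule.
        System.out.println(canCastToText(new TextType()));
        // The shared instance is accepted.
        System.out.println(canCastToText(TextType.INSTANCE));
    }
}
```

In Spark's Java API, the shared type instances are exposed as constants on org.apache.spark.sql.types.DataTypes (for example DataTypes.StringType), which is what the fix above uses.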

Regards,
Rico. 

> On 17.02.2022 at 14:13, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
> 
> 
> Hi,
> 
> Can you please post a screenshot of the exact CAST statement that you are 
> using? Did you use the SQL method I mentioned earlier?
> 
> Regards,
> Gourav Sengupta
> 
>> On Thu, Feb 17, 2022 at 12:17 PM Rico Bergmann <i...@ricobergmann.de> wrote:
>> Hi!
>> 
>> Casting another int column that is not a partition column fails with the 
>> same error. 
>> 
>> The schema before the cast (column names are anonymized):
>> 
>> root
>> |-- valueObject: struct (nullable = true)
>> |    |-- value1: string (nullable = true)
>> |    |-- value2: string (nullable = true)
>> |    |-- value3: timestamp (nullable = true)
>> |    |-- value4: string (nullable = true)
>> |-- partitionColumn2: string (nullable = true)
>> |-- partitionColumn3: timestamp (nullable = true)
>> |-- partitionColumn1: integer (nullable = true)
>> 
>> I wanted to cast partitionColumn1 to String, which gives me the described 
>> error. 
>> 
>> Best,
>> Rico
>> 
>> 
>>>> On 17.02.2022 at 09:56, ayan guha <guha.a...@gmail.com> wrote:
>>>> 
>>> 
>>> Can you try to cast any other Int field which is NOT a partition column? 
>>> 
>>>> On Thu, 17 Feb 2022 at 7:34 pm, Gourav Sengupta 
>>>> <gourav.sengu...@gmail.com> wrote:
>>>> Hi,
>>>> 
>>>> This is interesting; casting INT to STRING has never been an issue 
>>>> for me.
>>>> 
>>>> Can you share the output of df.printSchema()?
>>>> 
>>>> I prefer to use SQL, and the method I use for casting is:
>>>> CAST(<<column name>> AS STRING) <<alias>>.
>>>> 
>>>> Regards,
>>>> Gourav
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Thu, Feb 17, 2022 at 6:02 AM Rico Bergmann <i...@ricobergmann.de> 
>>>>> wrote:
>>>>> Here is the code snippet:
>>>>> 
>>>>> var df = session.read().parquet(basepath);
>>>>> for (Column partition : partitionColumnsList) {
>>>>>   df = df.withColumn(partition.getName(),
>>>>>       df.col(partition.getName()).cast(partition.getType()));
>>>>> }
>>>>> 
>>>>> Column is a class containing schema information, for example the 
>>>>> name and the data type of the column. 
>>>>> 
>>>>> Best, Rico.
>>>>> 
>>>>> > On 17.02.2022 at 03:17, Morven Huang <morven.hu...@gmail.com> wrote:
>>>>> > 
>>>>> > Hi Rico, do you have a code snippet? I have no problem casting int to 
>>>>> > string.
>>>>> > 
>>>>> >> On 17 Feb 2022, at 12:26 AM, Rico Bergmann <i...@ricobergmann.de> wrote:
>>>>> >> 
>>>>> >> Hi!
>>>>> >> 
>>>>> >> I am reading a partitioned DataFrame into Spark using automatic type 
>>>>> >> inference for the partition columns. For one partition column the data 
>>>>> >> contains an integer, therefore Spark uses IntegerType for this column. 
>>>>> >> In general this is supposed to be a StringType column, so I tried to 
>>>>> >> cast this column to StringType. But this fails with an AnalysisException: 
>>>>> >> “cannot cast int to string”.
>>>>> >> 
>>>>> >> Is this a bug? Or is it really not allowed to cast an int to a string?
>>>>> >> 
>>>>> >> I’m using Spark 3.1.1.
>>>>> >> 
>>>>> >> Best regards
>>>>> >> 
>>>>> >> Rico. 
>>>>> >> 
>>>>> >> ---------------------------------------------------------------------
>>>>> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>>> >> 
>>>>> > 
>>>>> > 
>>>>> > 
>>>>> 
>>>>> 
>>>>> 
>>> -- 
>>> Best Regards,
>>> Ayan Guha
