Understood, thanks Evyatar.

On Mon, Nov 7, 2022, 17:42 Evy M <evya...@gmail.com> wrote:
> TBH I'm not sure why there is an issue casting the int to BigInt, and I'm
> also not sure about the Jira ticket; I hope someone else can help here.
> Regarding the solution: IMO the more correct approach here would be to
> modify the Hive table to use INT, since it seems there is no need for
> BIGINT (Long). This approach is also far simpler, since it won't require
> any rewrite of the data, which might be a costly operation; changing the
> table in the metastore is a pretty effortless operation.
>
> Best,
> Evyatar
>
> On Mon, 7 Nov 2022 at 13:37, Naresh Peshwe <nareshpeshwe12...@gmail.com> wrote:
>
>> Hi Evyatar,
>> Yes, directly reading the Parquet data works. Since we use the Hive
>> metastore to obfuscate the underlying datastore details, we want to
>> avoid accessing the files directly.
>> I guess then the only option is to either change the data or change the
>> schema in the Hive metastore, as you suggested, right?
>> But int to long/bigint seems like a reasonable evolution (correct me if
>> I'm wrong). Is it possible to reopen the Jira I mentioned earlier? Any
>> reason for it getting closed?
>>
>> Regards,
>> Naresh
>>
>> On Mon, Nov 7, 2022, 16:55 Evy M <evya...@gmail.com> wrote:
>>
>>> Hi Naresh,
>>>
>>> Have you tried any of the following to resolve your issue?
>>>
>>> 1. Reading the Parquet files directly, not via Hive (i.e.,
>>>    spark.read.parquet(<path>)), casting the column to LongType, and
>>>    creating the Hive table based on that DataFrame? Hive's BIGINT and
>>>    Spark's Long should have the same value range, as seen here: Hive Types
>>>    <https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-IntegralTypes(TINYINT,SMALLINT,INT/INTEGER,BIGINT)>;
>>>    Spark Types <https://spark.apache.org/docs/latest/sql-ref-datatypes.html>.
>>> 2. Modifying the Hive table to declare the column as INT? If the
>>>    underlying data is an INT, I guess there is no reason to have a BIGINT
>>>    definition for that column.
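[Editor's note: a rough PySpark sketch of option 1 above, for reference. The path, database/table name, and column name ("id") are placeholders, not anything from the thread; this assumes a Hive-enabled Spark session and rewrites the data, which may be costly for large tables.]

```python
# Sketch of option 1: bypass the metastore definition by reading the
# Parquet files directly, widening the int column to long (Hive BIGINT),
# and re-registering the table from the resulting DataFrame.
# Placeholder names: "hdfs:///path/to/table", "db.my_table", column "id".
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import LongType

spark = (SparkSession.builder
         .appName("widen-int-to-bigint")
         .enableHiveSupport()
         .getOrCreate())

# Read the files directly (not via the Hive table definition).
df = spark.read.parquet("hdfs:///path/to/table")

# Cast the mismatched column: Parquet int -> Spark LongType (Hive BIGINT).
df = df.withColumn("id", col("id").cast(LongType()))

# Rewrite the data with the wider type and recreate the table definition.
df.write.mode("overwrite").saveAsTable("db.my_table")
```

Option 2, by contrast, is metadata-only: a statement along the lines of `ALTER TABLE db.my_table CHANGE COLUMN id id INT;` run in Hive changes the table schema in the metastore without touching the files (Spark 2.4's own SQL parser restricts column type changes, so running it through Hive may be necessary).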
>>>
>>> I hope this helps.
>>>
>>> Best,
>>> Evyatar
>>>
>>> On Sun, 6 Nov 2022 at 15:21, Naresh Peshwe <nareshpeshwe12...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>> I am trying to read data (using Spark SQL) via a Hive metastore table
>>>> that has a column of type bigint. The underlying Parquet data has int
>>>> as the datatype for the same column. I am getting the following error
>>>> while trying to read the data using Spark SQL:
>>>>
>>>> java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be
>>>> cast to org.apache.hadoop.io.LongWritable
>>>>   at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableLongObjectInspector.get(WritableLongObjectInspector.java:36)
>>>>   at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$6.apply(TableReader.scala:418)
>>>>   ...
>>>>
>>>> I believe it is related to
>>>> https://issues.apache.org/jira/browse/SPARK-17477. Any suggestions on
>>>> how I can work around this issue?
>>>>
>>>> Spark version: 2.4.5
>>>>
>>>> Regards,
>>>> Naresh