Re: ClassCastException while reading parquet data via Hive metastore

2022-11-07 Thread Naresh Peshwe
Understood, thanks Evyatar.


Re: ClassCastException while reading parquet data via Hive metastore

2022-11-07 Thread Naresh Peshwe
Hi Evyatar,
Yes, directly reading the Parquet data works. Since we use the Hive metastore
to abstract away the underlying datastore details, we want to avoid accessing
the files directly.
I guess the only option, then, is to either change the data or change the
schema in the Hive metastore as you suggested, right?
But int to long / bigint seems like a reasonable evolution (correct me if I'm
wrong). Is it possible to reopen the Jira I mentioned earlier? Any reason it
was closed?


Regards,
Naresh



Re: ClassCastException while reading parquet data via Hive metastore

2022-11-07 Thread Evy M
To be honest, I'm not sure why the cast from int to BigInt fails here, and I'm
also not sure about the Jira ticket; I hope someone else can help with that.
Regarding the solution - in my opinion the better fix would be to modify the
Hive table to use INT, since it seems there is no need for BigInt (Long). This
approach is also far simpler: it avoids rewriting the data, which could be a
costly operation, whereas changing the table definition in the metastore is a
fairly effortless operation.
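
For illustration, a rough sketch of that metastore-only change (the database,
table, and column names are just examples, not the actual ones from this
thread):

// Redefine the column as INT so the table definition matches the physical
// INT type in the Parquet files. If Spark's SQL parser rejects the type
// change (older versions restrict ALTER TABLE CHANGE COLUMN), run the same
// DDL in beeline / the Hive CLI against the same metastore.
spark.sql("ALTER TABLE mydb.events CHANGE COLUMN user_id user_id INT")

// Afterwards, reading through the metastore should no longer hit the
// IntWritable -> LongWritable cast.
spark.sql("SELECT user_id FROM mydb.events LIMIT 10").show()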

Best,
Evyatar


Re: ClassCastException while reading parquet data via Hive metastore

2022-11-07 Thread Evy M
Hi Naresh,

Have you tried either of the following to resolve your issue?

   1. Reading the Parquet files directly (not via Hive, i.e.
   spark.read.parquet()), casting the column to LongType, and creating the
   Hive table from that DataFrame? Hive's BIGINT and Spark's LongType cover
   the same value range (see the Hive and Spark data type documentation).
   A rough sketch of this option follows the list.
   2. Modifying the Hive table to declare the column as INT? If the
   underlying data is an INT, there is probably no reason to define that
   column as BIGINT.
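
A rough sketch of option 1 (the path, database, table, and column names are
just examples; this assumes a Hive-enabled SparkSession, e.g. spark-shell
with Hive support):

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.LongType

// Read the Parquet files directly, bypassing the Hive table definition.
val df = spark.read.parquet("hdfs:///data/events")  // user_id is INT here

// Cast the INT column to LongType so it matches a BIGINT column definition.
val casted = df.withColumn("user_id", col("user_id").cast(LongType))

// Rewrite the data and register it as a Hive table. Note that this rewrites
// the files, which can be a costly operation for large datasets.
casted.write
  .mode("overwrite")
  .format("parquet")
  .saveAsTable("mydb.events_bigint")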

I hope this might help.

Best,
Evyatar



ClassCastException while reading parquet data via Hive metastore

2022-11-06 Thread Naresh Peshwe
Hi all,
I am trying to read data (using Spark SQL) via a Hive metastore table which
has a column of type bigint. The underlying Parquet data has int as the
datatype for the same column. I am getting the following error while trying
to read the data using Spark SQL:

java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot
be cast to org.apache.hadoop.io.LongWritable
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableLongObjectInspector.get(WritableLongObjectInspector.java:36)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$6.apply(TableReader.scala:418)
...

I believe it is related to
https://issues.apache.org/jira/browse/SPARK-17477. Any suggestions on
how I can work around this issue?
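
For reference, a minimal sketch of the kind of setup involved (all paths,
database, table, and column names below are made up, not the actual ones):

import spark.implicits._  // already available in spark-shell

// Parquet files written with a 32-bit INT column.
Seq((1, "a"), (2, "b")).toDF("user_id", "name")
  .write.mode("overwrite").parquet("hdfs:///data/events")

// Hive table over the same files, but declaring the column as BIGINT.
spark.sql("CREATE DATABASE IF NOT EXISTS mydb")
spark.sql("""
  CREATE EXTERNAL TABLE mydb.events (user_id BIGINT, name STRING)
  STORED AS PARQUET
  LOCATION 'hdfs:///data/events'
""")

// Reading the files directly works:
spark.read.parquet("hdfs:///data/events").show()

// Reading through the Hive table hits the ClassCastException above when the
// read goes through the Hive serde path (HadoopTableReader):
spark.sql("SELECT user_id FROM mydb.events").show()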

Spark version: 2.4.5

Regards,

Naresh