Understood, thanks Evyatar.

On Mon, Nov 7, 2022, 17:42 Evy M <evya...@gmail.com> wrote:
> TBH I'm not sure why there is an issue casting the int to BigInt, and I'm
> also not sure about the Jira ticket; I hope someone else can help here.
> Regarding the solution: IMO the more correct approach here would be to
> modify the Hive table to use INT, since it seems there is no need for
> BIGINT (Long). This approach is also far simpler, since it won't require
> any rewrite of the data, which might be a costly operation; changing the
> table in the metastore is a pretty effortless operation.
>
> Best,
> Evyatar
>
> On Mon, 7 Nov 2022 at 13:37, Naresh Peshwe <nareshpeshwe12...@gmail.com> wrote:
>
>> Hi Evyatar,
>> Yes, directly reading the Parquet data works. Since we use the Hive
>> metastore to obfuscate the underlying datastore details, we want to
>> avoid accessing the files directly.
>> I guess then the only option is to either change the data or change the
>> schema in the Hive metastore, as you suggested, right?
>> But int to long/bigint seems like a reasonable evolution (correct me if
>> I'm wrong). Is it possible to reopen the Jira I mentioned earlier? Any
>> reason for it getting closed?
>>
>> Regards,
>> Naresh
>>
>> On Mon, Nov 7, 2022, 16:55 Evy M <evya...@gmail.com> wrote:
>>
>>> Hi Naresh,
>>>
>>> Have you tried any of the following to resolve your issue?
>>>
>>> 1. Reading the Parquet files directly, not via Hive (i.e.,
>>>    spark.read.parquet(<path>)), casting the column to LongType, and
>>>    creating the Hive table based on that DataFrame? Hive's BIGINT and
>>>    Spark's Long should have the same value range, as seen here: Hive Types
>>>    <https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-IntegralTypes(TINYINT,SMALLINT,INT/INTEGER,BIGINT)>;
>>>    Spark Types <https://spark.apache.org/docs/latest/sql-ref-datatypes.html>.
>>> 2. Modifying the Hive table to declare the column as INT? If the
>>>    underlying data is an INT, I guess there is no reason to have a BIGINT
>>>    definition for that column.
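[Editor's note: a rough PySpark sketch of option 1 above, for reference. The path, database/table name, and column name ("id") are placeholders, not anything from the thread; this assumes a Hive-enabled Spark session and rewrites the data, which may be costly for large tables.]

```python
# Sketch of option 1: bypass the metastore definition by reading the
# Parquet files directly, widening the int column to long (Hive BIGINT),
# and re-registering the table from the resulting DataFrame.
# Placeholder names: "hdfs:///path/to/table", "db.my_table", column "id".
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import LongType

spark = (SparkSession.builder
         .appName("widen-int-to-bigint")
         .enableHiveSupport()
         .getOrCreate())

# Read the files directly (not via the Hive table definition).
df = spark.read.parquet("hdfs:///path/to/table")

# Cast the mismatched column: Parquet int -> Spark LongType (Hive BIGINT).
df = df.withColumn("id", col("id").cast(LongType()))

# Rewrite the data with the wider type and recreate the table definition.
df.write.mode("overwrite").saveAsTable("db.my_table")
```

Option 2, by contrast, is metadata-only: a statement along the lines of `ALTER TABLE db.my_table CHANGE COLUMN id id INT;` run in Hive changes the table schema in the metastore without touching the files (Spark 2.4's own SQL parser restricts column type changes, so running it through Hive may be necessary).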
>>>
>>> I hope this helps.
>>>
>>> Best,
>>> Evyatar
>>>
>>> On Sun, 6 Nov 2022 at 15:21, Naresh Peshwe <nareshpeshwe12...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>> I am trying to read data (using Spark SQL) via a Hive metastore table
>>>> that has a column of type bigint. The underlying Parquet data has int
>>>> as the datatype for the same column. I am getting the following error
>>>> while trying to read the data using Spark SQL:
>>>>
>>>> java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be
>>>> cast to org.apache.hadoop.io.LongWritable
>>>>   at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableLongObjectInspector.get(WritableLongObjectInspector.java:36)
>>>>   at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$6.apply(TableReader.scala:418)
>>>>   ...
>>>>
>>>> I believe it is related to
>>>> https://issues.apache.org/jira/browse/SPARK-17477. Any suggestions on
>>>> how I can work around this issue?
>>>>
>>>> Spark version: 2.4.5
>>>>
>>>> Regards,
>>>> Naresh