Hello Team,
We are using Spark 3.3.0.
We have created external HDFS tables through Beeline, connected to the Spark Thrift Server.
We have multiple Parquet files in a single partition that need to be attached to
the external HDFS table. Note that the HDFS data is stored in a distributed setup.
- In one scenario there are multiple Parquet files in the same HDFS
partition. After that partition is added to the external table (created via
Beeline) and a query is run against the external HDFS table, the query
appears to hang. If we then open another Beeline session and run any query
(not necessarily against the same table), the previously hung query against
the external HDFS table returns its results.
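For reference, a minimal sketch of the failing sequence; the table, partition column, and path below are placeholders, not our actual names:

```sql
-- Register the partition directory, which already contains several Parquet files
ALTER TABLE my_external_table
  ADD IF NOT EXISTS PARTITION (dt='2023-01-01')
  LOCATION 'hdfs:///data/my_external_table/dt=2023-01-01';

-- This query hangs until a query is run from a second Beeline session
SELECT COUNT(*) FROM my_external_table WHERE dt='2023-01-01';
```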
- In the other scenario we add those Parquet files one by one, refreshing
the table for that same partition after each file. In this case, queries
against the external HDFS table complete without any issue.
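A sketch of the working one-file-at-a-time sequence, again with placeholder names:

```sql
-- After copying each individual Parquet file into the partition directory,
-- refresh the table so Spark invalidates its cached file listing
REFRESH TABLE my_external_table;

-- Queries then return normally
SELECT COUNT(*) FROM my_external_table WHERE dt='2023-01-01';
```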
- Also, if only one Parquet file exists in that HDFS partition, queries
against the external HDFS table return without the behavior described
above.
Is there any solution to avoid this abnormal behavior?
Thank you and regards,
Kalhara Gurugamage