Re: Parallel write to different partitions

2023-09-21 Thread Shrikant Prasad
I found this issue reported earlier, but it was bulk closed:
https://issues.apache.org/jira/browse/SPARK-27030

Regards,
Shrikant

On Fri, 22 Sep 2023 at 12:03 AM, Shrikant Prasad wrote:

> Hi all,
>
> We have multiple Spark jobs running in parallel that write into the same
> Hive table, each to a different partition. This worked fine with
> Spark 2.3 and Hadoop 2.7.
>
> After upgrading to Spark 3.2 and Hadoop 3.2.2, these parallel jobs fail
> with FileNotFound exceptions for files under the
> /warehouse/db/table/temporary/0/ directory.
>
> It seems that the temporary directory was previously created under the
> partition being written, but it is now created directly under the table
> directory. This causes concurrency issues, with multiple jobs trying to
> clean up the same temporary directory.
>
> Is there now a way to achieve parallel writes to different partitions of
> the same table? Any insight into what caused the change in temporary
> directory creation would also be helpful.
>
> Thanks and regards,
> Shrikant
>


Parallel write to different partitions

2023-09-21 Thread Shrikant Prasad
Hi all,

We have multiple Spark jobs running in parallel that write into the same
Hive table, each to a different partition. This worked fine with
Spark 2.3 and Hadoop 2.7.

After upgrading to Spark 3.2 and Hadoop 3.2.2, these parallel jobs fail
with FileNotFound exceptions for files under the
/warehouse/db/table/temporary/0/ directory.

It seems that the temporary directory was previously created under the
partition being written, but it is now created directly under the table
directory. This causes concurrency issues, with multiple jobs trying to
clean up the same temporary directory.
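
To make the suspected race concrete, here is a toy simulation in plain
Python. It is only a sketch of the behavior described above, not Spark's or
Hadoop's committer code: the helper names (start_job, finish_job, simulate)
and the partition values are hypothetical, and the temporary/0 layout is
taken from the path in the error. It assumes each job deletes its whole
staging root once it finishes.

```python
import shutil
import tempfile
from pathlib import Path


def staging_root(table_dir: Path, partition: str, shared: bool) -> Path:
    # Shared layout:        <table>/temporary
    # Per-partition layout: <table>/<partition>/temporary
    base = table_dir if shared else table_dir / partition
    return base / "temporary"


def start_job(table_dir: Path, partition: str, shared: bool) -> Path:
    """Write one uncommitted output file into the job's staging directory."""
    attempt_dir = staging_root(table_dir, partition, shared) / "0"
    attempt_dir.mkdir(parents=True, exist_ok=True)
    part_file = attempt_dir / f"part-{partition}.out"
    part_file.write_text("rows")
    return part_file


def finish_job(table_dir: Path, partition: str, part_file: Path, shared: bool) -> bool:
    """Move the output into its partition, then delete the staging root.

    Returns False if the staged file has already disappeared.
    """
    final_dir = table_dir / partition
    final_dir.mkdir(parents=True, exist_ok=True)
    try:
        part_file.rename(final_dir / part_file.name)
        return True
    except FileNotFoundError:
        return False
    finally:
        # Assumed cleanup step: remove the entire staging root, which in the
        # shared layout also holds the other job's uncommitted files.
        shutil.rmtree(staging_root(table_dir, partition, shared), ignore_errors=True)


def simulate(shared: bool) -> list:
    """Run two overlapping jobs; return whether each one committed."""
    with tempfile.TemporaryDirectory() as root:
        table = Path(root) / "warehouse" / "db" / "table"
        file_a = start_job(table, "dt=2023-09-20", shared)
        file_b = start_job(table, "dt=2023-09-21", shared)
        # Job A finishes while job B still has uncommitted output.
        ok_a = finish_job(table, "dt=2023-09-20", file_a, shared)
        ok_b = finish_job(table, "dt=2023-09-21", file_b, shared)
        return [ok_a, ok_b]


if __name__ == "__main__":
    print("shared staging:       ", simulate(shared=True))
    print("per-partition staging:", simulate(shared=False))
```

With the shared layout, the second job loses its staged file as soon as the
first job cleans up, which matches the FileNotFound failures we see. With
per-partition staging, both jobs commit.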

Is there now a way to achieve parallel writes to different partitions of
the same table? Any insight into what caused the change in temporary
directory creation would also be helpful.

Thanks and regards,
Shrikant