[
https://issues.apache.org/jira/browse/HIVE-29225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sercan Tekin updated HIVE-29225:
--------------------------------
Status: Patch Available (was: In Progress)
> Premature deletion of scratch directories during output streaming
> -----------------------------------------------------------------
>
> Key: HIVE-29225
> URL: https://issues.apache.org/jira/browse/HIVE-29225
> Project: Hive
> Issue Type: Bug
> Reporter: Sercan Tekin
> Assignee: Sercan Tekin
> Priority: Critical
> Labels: pull-request-available
> Fix For: 4.2.0
>
>
> Once a job or application finishes, the corresponding lock file is released,
> and YARN no longer reports any active jobs or applications. At this point,
> Hive assumes the associated scratch directory is no longer needed and
> deletes it the next time the *ClearDanglingScratchDir* service is invoked.
> However, in some cases, Hive may still be streaming output to the client
> after the application is marked as finished. This causes the scratch
> directory to be deleted prematurely, even though it is still required for
> ongoing output.
> As a result, queries can fail with *IOException* errors because the scratch
> directory is removed while Hive is still using it.
> {code:java}
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException:
> java.io.IOException: 2049.323.265264
> /user/mapr/tmp/hive/mapr/ecfdf7a4-91ba-4832-a408-8d459f90ac4b/hive_2025-07-23_12-57-08_793_7535333864129536266-1/-mr-10001/.hive-staging_hive_2025-07-23_12-57-08_793_7535333864129536266-1/-ext-10002/000008_0
> (Input/output error)
> {code}
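
The race described above can be illustrated with a small sketch. It is not the patch attached to this issue and does not use any Hive API: `ScratchDirGuard`, `beginStreaming`, `endStreaming`, and `isSafeToDelete` are hypothetical names, and the sketch assumes the cleanup check and the result streaming can see the same in-process state. The idea is that "lock released and no running YARN application" alone is not enough to delete a scratch directory; there must also be no reader still streaming from it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive: tracks which scratch directories
// still have a reader streaming results to a client.
final class ScratchDirGuard {
    // Number of active readers per scratch directory path.
    private final Map<String, Integer> activeReaders = new HashMap<>();

    // Called when result streaming from a scratch directory starts.
    synchronized void beginStreaming(String scratchDir) {
        activeReaders.merge(scratchDir, 1, Integer::sum);
    }

    // Called when streaming finishes or the client closes the operation.
    // Returning null from the remapping function removes the entry.
    synchronized void endStreaming(String scratchDir) {
        activeReaders.computeIfPresent(scratchDir, (dir, n) -> n > 1 ? n - 1 : null);
    }

    // A directory is treated as dangling only if the lock is released,
    // no application is running, and no reader is still streaming from it.
    synchronized boolean isSafeToDelete(String scratchDir,
                                        boolean lockReleased,
                                        boolean appRunning) {
        return lockReleased && !appRunning && !activeReaders.containsKey(scratchDir);
    }
}
```

With such a guard, a directory whose job has finished but whose output is still being fetched would report `isSafeToDelete(...) == false` until `endStreaming` is called. In a real deployment the cleanup may run in a separate process, so the "still streaming" signal would have to be shared some other way, for example by holding the lock file until the fetch completes; that choice is outside the scope of this sketch.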