.pipeout)

Marco Gaido (JIRA) Thu, 18 Jan 2018 01:07:42 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-23130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330269#comment-16330269
 ]


Marco Gaido commented on SPARK-23130:
-------------------------------------

[~seano] there is no JIRA for the pipeout issue, and there cannot be one: it 
is a problem in the Hive codebase, not in Spark's, so no fix can be provided 
in Spark. SPARK-20202 tracks moving away from Spark's fork of Hive to proper 
Hive releases. That will solve the issue, since the fix for the pipeout 
problem is included in newer Hive versions.

> Spark Thrift does not clean-up temporary files (/tmp/*_resources and 
> /tmp/hive/*.pipeout)
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-23130
>                 URL: https://issues.apache.org/jira/browse/SPARK-23130
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.3, 2.1.0, 2.2.0
>         Environment: * Hadoop distributions: HDP 2.5 - 2.6.3.0
>  * OS: Seen on SLES12, RHEL 7.3 & RHEL 7.4
>            Reporter: Sean Roberts
>            Priority: Major
>              Labels: thrift
>
> Spark Thrift is not cleaning up /tmp for files & directories named like:
>  /tmp/hive/*.pipeout
>  /tmp/*_resources
> There are so many of them that /tmp quickly runs out of inodes, *causing 
> the partition to be unusable and many services to crash*. This happens even 
> when the only jobs submitted are routine service checks.
> I used `strace` to confirm that Spark Thrift is responsible:
> {code:java}
> strace.out.118864:04:53:49 
> open("/tmp/hive/55ad7fc1-f79a-4ad8-8e02-26bbeaa86bbc7288010135864174970.pipeout",
>  O_RDWR|O_CREAT|O_EXCL, 0666) = 134
> strace.out.118864:04:53:49 
> mkdir("/tmp/b6dfbf9e-2f7c-4c25-95a1-73c44318ecf4_resources", 0777) = 0
> {code}
> *Those files were left behind, even days later.*
> ----
> Example files:
> {code:java}
> # stat 
> /tmp/hive/55ad7fc1-f79a-4ad8-8e02-26bbeaa86bbc7288010135864174970.pipeout
>   File: 
> ‘/tmp/hive/55ad7fc1-f79a-4ad8-8e02-26bbeaa86bbc7288010135864174970.pipeout’
>   Size: 0             Blocks: 0          IO Block: 4096   regular empty file
> Device: fe09h/65033d  Inode: 678         Links: 1
> Access: (0644/-rw-r--r--)  Uid: ( 1000/    hive)   Gid: ( 1002/  hadoop)
> Access: 2017-12-19 04:53:49.126777260 -0600
> Modify: 2017-12-19 04:53:49.126777260 -0600
> Change: 2017-12-19 04:53:49.126777260 -0600
>  Birth: -
> # stat /tmp/b6dfbf9e-2f7c-4c25-95a1-73c44318ecf4_resources
>   File: ‘/tmp/b6dfbf9e-2f7c-4c25-95a1-73c44318ecf4_resources’
>   Size: 4096          Blocks: 8          IO Block: 4096   directory
> Device: fe09h/65033d  Inode: 668         Links: 2
> Access: (0700/drwx------)  Uid: ( 1000/    hive)   Gid: ( 1002/  hadoop)
> Access: 2017-12-19 04:57:38.458937635 -0600
> Modify: 2017-12-19 04:53:49.062777216 -0600
> Change: 2017-12-19 04:53:49.066777218 -0600
>  Birth: -
> {code}
> Showing the large number:
> {code:java}
> # find /tmp/ -name '*_resources' | wc -l
> 68340
> # find /tmp/hive -name "*.pipeout" | wc -l
> 51837
> {code}
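
Since the comment above says no fix can be provided in Spark itself, one possible stopgap is a periodic cleanup of the two path patterns named in the report. The following is only a sketch under stated assumptions, not anything proposed in the issue: the function names `find_stale` and `remove_stale` are hypothetical, the 7-day modification-time threshold is an arbitrary choice, and it assumes that entries this old no longer belong to a live Thrift session, which should be verified before deleting anything.

```python
import shutil
import time
from pathlib import Path


def find_stale(tmp_root, min_age_days=7.0, now=None):
    """Return leftover temp paths under tmp_root older than min_age_days.

    Matches the two patterns from the report: <tmp_root>/hive/*.pipeout
    and <tmp_root>/*_resources. Nothing is deleted here.
    """
    root = Path(tmp_root)
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # age threshold is an assumption
    candidates = list(root.glob("hive/*.pipeout")) + list(root.glob("*_resources"))
    stale = []
    for path in candidates:
        try:
            mtime = path.lstat().st_mtime
        except FileNotFoundError:
            continue  # vanished between glob and stat
        if mtime < cutoff:
            stale.append(path)
    return sorted(stale)


def remove_stale(paths, dry_run=True):
    """Delete the given paths unless dry_run is set; return the paths handled."""
    handled = []
    for path in paths:
        if not dry_run:
            if path.is_dir() and not path.is_symlink():
                shutil.rmtree(path, ignore_errors=True)
            else:
                path.unlink(missing_ok=True)  # requires Python 3.8+
        handled.append(path)
    return handled


if __name__ == "__main__":
    # Dry run by default: only print what would be removed from /tmp.
    for p in remove_stale(find_stale("/tmp"), dry_run=True):
        print("would remove", p)
```

Running the script as-is only lists candidates. Passing `dry_run=False` deletes them, so it is safer to run it as the user that owns the files (`hive` in the `stat` output above) and to review the dry-run listing first.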



