[jira] [Created] (IMPALA-8569) Periodically scrub deleted files from the file handle cache

2019-05-21 Thread Todd Lipcon (JIRA)
Todd Lipcon created IMPALA-8569:
---

 Summary: Periodically scrub deleted files from the file handle cache
 Key: IMPALA-8569
 URL: https://issues.apache.org/jira/browse/IMPALA-8569
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Todd Lipcon


Currently, if you query a file and later delete it (e.g. by dropping the 
partition or table), the file stays in the impalad's file handle cache. Because 
the handle is still open, the disk space can't be reclaimed until the impalad 
restarts or churns through its cache enough to drop the handle.

Typically this isn't a big deal in practice, since most files don't get deleted 
shortly after being read, and the FH cache should cycle through after 6 hours 
by default. Additionally, fixing it would be a bit of a pain, since we'd need to 
add HDFS and libhdfs hooks so HDFS can tell us whether the underlying 
short-circuit FD is unlinked, which probably also means adding JNI code to let 
Java call fstat() and check st_nlink. Given that, I'm not sure it's worth 
fixing, or whether we should just consider a shorter default expiry on the FH 
cache.
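
For illustration only, the st_nlink check mentioned above can be sketched on a 
plain local file descriptor. This is not Impala or libhdfs code: the cache here 
is a hypothetical dict mapping paths to raw FDs, and the function names 
is_unlinked, scrub_deleted and demo are made up for the example. The real cache 
holds HDFS handles, which is why the hooks described above would be needed 
before anything like this could run against the short-circuit FD.

```python
import os
import tempfile


def is_unlinked(fd):
    """Return True if the open file behind fd has no directory entries left."""
    # fstat() works on the descriptor, so it still succeeds after the path
    # is gone; st_nlink drops to 0 once the last link has been removed.
    return os.fstat(fd).st_nlink == 0


def scrub_deleted(cache):
    """Close and evict every cached descriptor whose file has been deleted.

    `cache` is a hypothetical {path: fd} dict standing in for the file
    handle cache. Returns the list of evicted paths.
    """
    evicted = []
    for path, fd in list(cache.items()):
        if is_unlinked(fd):
            os.close(fd)  # closing the last handle lets the space be reclaimed
            del cache[path]
            evicted.append(path)
    return evicted


def demo():
    """Open a temp file, delete it while it is cached, then scrub the cache."""
    fd, path = tempfile.mkstemp()
    cache = {path: fd}
    before = is_unlinked(fd)  # file still linked
    os.unlink(path)           # simulate dropping the partition or table
    after = is_unlinked(fd)   # handle is open, but the file is unlinked
    evicted = scrub_deleted(cache)
    return before, after, evicted == [path], len(cache)
```

A periodic scrubber would call something like scrub_deleted() on a timer. The 
alternative raised above, a shorter default expiry, avoids the check entirely 
at the cost of reopening files more often.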



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



