[ https://issues.apache.org/jira/browse/KAFKA-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14963632#comment-14963632 ]

Jay Kreps edited comment on KAFKA-2580 at 10/19/15 5:21 PM:
------------------------------------------------------------

10TB of space with 1GB segment files means about 10k FDs (though probably a 
bit more, since each partition's last segment would be, on average, only half 
full at ~512MB). A file descriptor is pretty cheap, and performance seems 
reasonable even with a lot of them open. So just keeping the files open 
shouldn't be a huge blocker--raising your FD max isn't a bad thing. Let's only 
do this if we can do it in a way that makes the code better and cleaner.
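
The arithmetic, spelled out (a quick sketch in Java, using the numbers above):

    // Back-of-the-envelope FD count for the numbers above.
    long totalLogBytes = 10L * 1024 * 1024 * 1024 * 1024; // 10 TB of log data
    long segmentBytes = 1024L * 1024 * 1024;              // 1 GB per segment
    long segmentCount = totalLogBytes / segmentBytes;     // = 10,240 FDs
    // Each partition's newest segment is on average only ~half full (~512 MB),
    // so the true file count runs slightly above this estimate.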

If we do do it, I really think we have to provide a hard bound on the total 
number of FDs. I agree that it could be a bit simpler and more efficient to 
just have a timeout after which idle FDs are closed, but since you still have 
to set a hard limit on FDs, that doesn't quite solve the problem--you have to 
model which timeout will keep you under that limit. And if you're doing that 
modeling anyway, you might as well bound the total FD count directly, which is 
simpler to reason about, and just raise the FD limit itself.
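
To make that concrete, here is a back-of-the-envelope model (the rate and 
timeout below are made-up numbers for illustration, not from this ticket):

    // Hypothetical model: steady-state open FDs ~= distinct segments touched
    // per second * idle timeout. Both inputs are assumptions.
    double segmentTouchRate = 100.0; // distinct segments read per second
    double idleTimeoutSec = 600.0;   // close FDs idle for 10 minutes
    double expectedOpenFds = segmentTouchRate * idleTimeoutSec; // ~60,000 FDs

You still need that estimate to land below the process FD limit, so you end up 
modeling the count anyway; bounding the count directly skips the indirection.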

So a lot of this comes down to the implementation. A naive 10k-item LRU cache 
could easily use far more memory than 50k open FDs would, and being on-heap it 
would add a large number of long-lived objects for the GC to manage.
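
For illustration, here is a minimal sketch of the kind of bounded LRU cache 
under discussion, built on java.util.LinkedHashMap; the class and names are 
hypothetical, not the actual Kafka implementation:

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical sketch, not Kafka code: an access-ordered map that closes
    // and evicts the least-recently-used channel once maxOpen is exceeded,
    // giving a hard bound on the total FD count.
    final class SegmentChannelCache {
        private final Map<Path, FileChannel> cache;

        SegmentChannelCache(final int maxOpen) {
            // accessOrder=true orders entries least- to most-recently used.
            this.cache = new LinkedHashMap<Path, FileChannel>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(
                        Map.Entry<Path, FileChannel> eldest) {
                    if (size() > maxOpen) {
                        try {
                            eldest.getValue().close(); // release FD on eviction
                        } catch (IOException e) {
                            // best effort: eviction must not fail the caller
                        }
                        return true;
                    }
                    return false;
                }
            };
        }

        synchronized FileChannel channelFor(Path segment) throws IOException {
            FileChannel ch = cache.get(segment); // get() refreshes LRU position
            if (ch == null) {
                ch = FileChannel.open(segment, StandardOpenOption.READ);
                cache.put(segment, ch); // may evict (and close) the eldest
            }
            return ch;
        }
    }

Note that a real implementation would also have to cope with a channel being 
evicted and closed while an in-flight read is still using it, which is where 
most of the complexity would live.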

The only concern with this approach is that your active set of FDs could be 
larger than the cache size, in which case you'd end up opening and closing a 
file on every request. That could be a real performance problem for 
pathological cache settings (e.g. a limit of 0). In general, though, opening 
and closing a file isn't too expensive (maybe 1-3 disk accesses), so as long 
as it doesn't happen too frequently it should be okay. A default of 10k should 
generally be very safe, since access tends to be concentrated on the active 
segments.


> Kafka Broker keeps file handles open for all log files (even if they are 
> not written to/read from)
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2580
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2580
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.2.1
>            Reporter: Vinoth Chandar
>            Assignee: Grant Henke
>
> We noticed this in one of our clusters, where we retain logs for a longer 
> period of time. It appears that the Kafka broker keeps file handles open 
> even for non-active files (ones not being written to or read from); in fact, 
> there are threads on this going back to 2013: 
> http://grokbase.com/t/kafka/users/132p65qwcn/keeping-logs-forever 
> Needless to say, this is a problem and forces us to either artificially bump 
> up ulimit (it's already at 100K) or expand the cluster (even when we have 
> sufficient IO and everything else). 
> Filing this ticket since I could not find anything similar. Very interested 
> to know if there are plans to address this (given that Samza's changelog 
> topic is meant to be a large persistent-state use case).


