I have a background worker process (on a server, not in a browser) that kicks off every minute or so and issues some queries sequentially to the REST query endpoint. In 1.4 with no authentication this worked fine, except that in one case I need to issue a CTAS query with a different output format (JSON).
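The JSON-format CTAS needs two requests that share one session: one that switches the output format and one that runs the CTAS. Below is a minimal sketch of the two request bodies. It assumes the `query.json` body shape quoted later in this thread and Drill's `store.format` session option set with `ALTER SESSION SET`. `DrillRestBodies` and its methods are hypothetical names, and sending the requests is left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: builds query.json request bodies of the form
 * {"queryType": "SQL", "query": "..."}. It does not send anything.
 */
final class DrillRestBodies {
    private DrillRestBodies() {}

    /** Escapes a string so it can sit inside a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters must be \\u-escaped in JSON.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    /** One query.json request body for a SQL statement. */
    static String queryBody(String sql) {
        return "{\"queryType\": \"SQL\", \"query\": \"" + jsonEscape(sql) + "\"}";
    }

    /**
     * The two requests behind a JSON-format CTAS. The first only changes the
     * session's store.format option, so the second must be sent on the same
     * REST session for that setting to apply.
     */
    static List<String> jsonCtasBodies(String targetTable, String selectSql) {
        List<String> bodies = new ArrayList<>();
        bodies.add(queryBody("ALTER SESSION SET `store.format` = 'json'"));
        bodies.add(queryBody("CREATE TABLE " + targetTable + " AS " + selectSql));
        return bodies;
    }
}
```

To keep both requests on one session, the client has to replay the session cookie, for example by sending them with `HttpURLConnection` after installing a `java.net.CookieManager` as the default `CookieHandler`.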
I upgraded to 1.5-SNAPSHOT (commit bb3fc15216d9cab804fc9a6f0e5bd34597dd4394). Since the upgrade I am getting a resource starvation problem, with or without authentication: the drillbit process stays up for an hour or less, then becomes unresponsive and eats up the CPU. It is definitely resource starvation; I'm not sure whether it is a resource leak. A stack trace is below.

Also, when I run lsof on the pid, it lists a lot (more than a thousand) of entries like the one below, which are used by NIO selectors, so it smells like a resource leak:

COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    2931 root  288u  0000   0,11        0 7705 anon_inode

2016-02-02 21:56:26,520 [qtp1250890858-11590] ERROR o.a.d.e.s.r.a.AnonymousLoginService - Login failed.
java.lang.IllegalStateException: failed to create a child event loop
	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:68) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
	at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49) ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:61) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:49) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
	at org.apache.drill.exec.rpc.TransportCheck.createEventLoopGroup(TransportCheck.java:73) ~[drill-rpc-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
	at org.apache.drill.exec.client.DrillClient.createEventLoop(DrillClient.java:239) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
	at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:220) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
	at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:178) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
	at org.apache.drill.exec.server.rest.auth.AbstractDrillLoginService.createDrillClient(AbstractDrillLoginService.java:56) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
	at org.apache.drill.exec.server.rest.auth.AnonymousLoginService.login(AnonymousLoginService.java:47) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
	at org.apache.drill.exec.server.rest.auth.AnonymousAuthenticator.validateRequest(AnonymousAuthenticator.java:71) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:503) [jetty-security-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:478) [jetty-servlet-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.Server.handle(Server.java:462) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:279) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:232) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534) [jetty-io-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607) [jetty-util-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536) [jetty-util-9.1.5.v20140505.jar:9.1.5.v20140505]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_91]
Caused by: java.lang.RuntimeException: epoll_create1() failed: Too many open files
	at io.netty.channel.epoll.Native.epollCreate(Native Method) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
	at io.netty.channel.epoll.EpollEventLoop.<init>(EpollEventLoop.java:74) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
	at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:76) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
	... 25 common frames omitted
2016-02-02 21:56:30,130 [qtp1250890858-11591] ERROR o.a.d.e.s.r.a.AnonymousLoginService - Login failed.
java.lang.IllegalStateException: failed to create a child event loop
	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:68) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
	at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49) ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:61) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
	at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:49) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
	at org.apache.drill.exec.rpc.TransportCheck.createEventLoopGroup(TransportCheck.java:73) ~[drill-rpc-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
	at org.apache.drill.exec.client.DrillClient.createEventLoop(DrillClient.java:239) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
	at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:220) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
	at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:178) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
	at org.apache.drill.exec.server.rest.auth.AbstractDrillLoginService.createDrillClient(AbstractDrillLoginService.java:56) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
	at org.apache.drill.exec.server.rest.auth.AnonymousLoginService.login(AnonymousLoginService.java:47) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
	at org.apache.drill.exec.server.rest.auth.AnonymousAuthenticator.validateRequest(AnonymousAuthenticator.java:71) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:503) [jetty-security-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:478) [jetty-servlet-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.Server.handle(Server.java:462) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:279) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:232) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534) [jetty-io-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607) [jetty-util-9.1.5.v20140505.jar:9.1.5.v20140505]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536) [jetty-util-9.1.5.v20140505.jar:9.1.5.v20140505]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_91]
Caused by: java.lang.RuntimeException: epoll_create1() failed: Too many open files
	at io.netty.channel.epoll.Native.epollCreate(Native Method) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
	at io.netty.channel.epoll.EpollEventLoop.<init>(EpollEventLoop.java:74) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
	at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:76) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
	at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
	... 25 common frames omitted

> On Feb 2, 2016, at 7:40 AM, Venki Korukanti <venki.koruka...@gmail.com> wrote:
>
> Currently we keep one DrillClient per session. All of the state is in the
> server, and the DrillClient is the reference used to reuse that state. The
> DrillClient is automatically closed when the session expires (by default,
> 1 hour after the last activity on the session) or when the user explicitly
> logs out. I am trying to understand whether there is a resource leak. Do
> you have too many sessions open when the system load is at its max, or just
> a few sessions on which you have already run many queries? If it is the
> former, it is understandable, since each session keeps its own connection
> for as long as it lives. Also, are the resources not freed after logout?
>
> If you need multiple simultaneous sessions, it is better to connect to
> different Drillbits (maybe in a round-robin fashion) than to always connect
> to a single Drillbit.
>
> Thanks
> Venki
>
> On Mon, Feb 1, 2016 at 11:51 PM, Josh Schlesser <j...@spoutable.com> wrote:
>
>> First: I'm a total newb at contributing to Apache projects, so please
>> excuse any indiscretions. Feel free to comment on style or anything else;
>> I take feedback well. Thick skin too.
>>
>> I'll give some background first and then a proposal.
>>
>> Background:
>> I recently switched to using authentication in the 1.5 snapshot because I
>> need a session via the REST API, so that I can set the session storage
>> options in an initial query and have them apply to a subsequent CTAS
>> query. Previously all REST calls seemed to be completely independent.
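Venki's suggestion above, spreading simultaneous sessions over several Drillbits instead of always connecting to one, can be as simple as a client-side round-robin picker. This is a sketch: the class name and the Drillbit URLs are hypothetical (8047 is Drill's default web port). Since the session state lives on the Drillbit that created it, the picker is meant to be called once per new session, with that session's later requests going to the same Drillbit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hands out Drillbit base URLs in round-robin order; safe to share between threads. */
final class DrillbitRoundRobin {
    private final List<String> drillbits;
    private final AtomicInteger counter = new AtomicInteger();

    DrillbitRoundRobin(List<String> drillbits) {
        if (drillbits == null || drillbits.isEmpty()) {
            throw new IllegalArgumentException("need at least one Drillbit");
        }
        // Defensive copy so later changes to the caller's list have no effect here.
        this.drillbits = Collections.unmodifiableList(new ArrayList<>(drillbits));
    }

    /** Next Drillbit in turn; call once per new session, not once per query. */
    String next() {
        // floorMod keeps the index non-negative once the counter wraps past Integer.MAX_VALUE.
        int i = Math.floorMod(counter.getAndIncrement(), drillbits.size());
        return drillbits.get(i);
    }
}
```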
>>
>> Since the change I have started seeing 'too many open files' errors in my
>> drillbit.log, and the drillbit java process becomes effectively hung
>> waiting for free file descriptor slots. top shows the machine running at
>> max load because of the drillbit process, and the drillbit becomes
>> effectively unresponsive; even the simple pages in the web console don't
>> respond. Investigating further, it seems that the drillbit process might
>> keep a file open per session for the life of the session. I ran lsof on
>> the drillbit process and found a lot of unix pipes. Looking at the code,
>> these pipes could be for the communication between the web process and
>> the RPC server, with one allocated per session. I haven't validated this;
>> it's just a guess after scanning the code. I had 1.4 running without this
>> requirement and never saw the error. It seems that without authentication
>> the number of open files is a non-issue for me, possibly because no
>> sessions are involved.
>>
>> Is my guess about what is causing the 'too many open files' error
>> plausible? Does anybody with a deeper understanding of the architecture
>> have any comments on this?
>>
>> Proposal:
>> Assuming sessions are the issue, I am making some changes to my REST
>> client so that sessions are used more effectively, and I can raise the
>> ulimit for the drillbit process's linux user in hopes of mitigating this.
>> I am effectively creating a REST-client-based session pool that resets
>> session variables to defaults when a session gets reused. However, it
>> seems hacky.
>>
>> Below is an idea for per-request settings which seems less hacky in the
>> long term.
>>
>> Can I add a new member to the query.json REST method, in a
>> backwards-compatible way, to set session-level parameters in a single
>> request?
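Josh's interim workaround, a REST-client-side session pool that resets session options before a session is reused, might look roughly like this sketch. `S` stands for whatever the client uses to identify a session (for example, a session cookie). The `open`, `reset` and `close` callbacks are hypothetical hooks that would issue the actual login, option-reset and logout requests; per Venki's reply, logging out is what lets the Drillbit free the session's DrillClient.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Client-side pool of REST sessions. A session is reset when handed back,
 * so the next borrower never inherits options (e.g. store.format) from the
 * previous one.
 */
final class SessionPool<S> {
    private final Deque<S> idle = new ArrayDeque<>();
    private final Supplier<S> open;   // hypothetical hook: log in, return the new session
    private final Consumer<S> reset;  // hypothetical hook: put changed options back to defaults
    private final Consumer<S> close;  // hypothetical hook: log out so the server frees the session
    private final int maxIdle;

    SessionPool(Supplier<S> open, Consumer<S> reset, Consumer<S> close, int maxIdle) {
        if (maxIdle < 0) {
            throw new IllegalArgumentException("maxIdle must be >= 0");
        }
        this.open = open;
        this.reset = reset;
        this.close = close;
        this.maxIdle = maxIdle;
    }

    /** Reuses the most recently returned session, or opens a new one. */
    S borrow() {
        S session;
        synchronized (this) {
            session = idle.pollFirst();
        }
        // Opening may be a network call, so it happens outside the lock.
        return session != null ? session : open.get();
    }

    /** Resets the session, then keeps it for reuse or closes it if the pool is full. */
    void release(S session) {
        reset.accept(session);
        boolean keep;
        synchronized (this) {
            keep = idle.size() < maxIdle;
            if (keep) {
                idle.addFirst(session);
            }
        }
        if (!keep) {
            close.accept(session);
        }
    }
}
```

Reusing the most recently returned session (LIFO) keeps a few sessions busy rather than many half-idle. A fuller version would also track last-use time, since a session left idle longer than the server's timeout (an hour by default, per Venki) will have expired.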
>> Currently a REST request via the API has a body like so:
>>
>> { "queryType": "SQL", "query": "<drill query>" }
>>
>> I'd like to do the following:
>>
>> { "queryType": "SQL", "query": "<drill query>",
>>   "sessionSettings": { "option_1_name": "option_1_value", "option_2_name": "option_2_value" } }
>>
>> or even:
>>
>> { "queryType": "SQL", "query": "<drill query>",
>>   "sessionSettings": ["SET `option_name` = value", "SET `option_name1` = value1",
>>                       "SET `option_name2` = value2", "SET `option_name3` = value3"] }
>>
>> As far as I can tell, Drill is essentially stateless between queries right
>> now, except for session-level system parameters and authentication. There
>> aren't any in-memory temp tables, cursors or variables like in PL/SQL,
>> PSQL or other SQL dialects that would make it stateful.
>>
>> Given the stateless assumption, being able to set session-level params on
>> a per-request basis would cover all of the cases that I might need. It
>> looks relatively straightforward to add something to QueryWrapper to
>> accept an optional session settings section of the JSON packet and
>> execute those 'SET' commands before the final query. This will work for
>> me, as I can run without authentication in a 'secure' backend
>> environment, which will remove sessions and hence file descriptors,
>> assuming my assumptions about file descriptors and sessions are correct.
>>
>> My Java is rusty (circa 2003), but some casual googling implies that if
>> this were added as a third @FormParam to submitQuery in QueryResources, it
>> would magically be null if it weren't present and could easily be
>> ignored. If it's present, an alternative constructor of QueryWrapper could
>> be called with the extra param, and it would be easy to alter its run
>> method to execute the SET commands. There would need to be some error
>> handling, of course, if the SET commands were illegal or failed to run
>> for some reason.
>>
>> If this seems reasonable, how do I go about contributing?
>> I looked through the links in the docs to Apache Foundation incubator
>> projects, but the links to Drill were broken :(
>> http://drill.apache.org/team.html
>> I read http://drill.apache.org/docs/apache-drill-contribution-guidelines/
>> and I have subscribed to the dev mailing list (obviously, since you are
>> getting this). It said to post here before creating a JIRA. Am I missing
>> anything in my assumptions? Comments? Should I just submit a JIRA and a
>> patch, submit a JIRA and a comment, or wait for comments before coding
>> something up as an example?
>>
>> Thanks for taking the time to read and respond.
>>
>> Josh
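To make the `sessionSettings` proposal concrete, here is a sketch of how the optional member might be expanded on the server side, under stated assumptions: it takes the object form (option name to value), treats a missing member (null, as an absent optional parameter would be) as "no settings", emits `ALTER SESSION SET` statements (which Drill accepts for session options) followed by the user's query, and stops at the first statement that fails. `SessionSettingsPlan` and `StatementRunner` are hypothetical names, not Drill's actual QueryWrapper or QueryResources code. Values are assumed to arrive as SQL literals the caller has already quoted (e.g. `'json'`), and the semicolon check is only a crude guard, not a real validator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical expansion of an optional "sessionSettings" member into statements. */
final class SessionSettingsPlan {
    private SessionSettingsPlan() {}

    /** Hypothetical hook that executes one SQL statement on the request's connection. */
    interface StatementRunner {
        void run(String sql) throws Exception;
    }

    /** Option statements first (in map order), then the user's query. */
    static List<String> statements(String query, Map<String, String> sessionSettings) {
        List<String> out = new ArrayList<>();
        if (sessionSettings != null) {
            for (Map.Entry<String, String> e : sessionSettings.entrySet()) {
                String name = e.getKey();
                String value = e.getValue();
                // Names are backtick-quoted, so a backtick would break out of the identifier.
                if (name == null || name.isEmpty() || name.indexOf('`') >= 0) {
                    throw new IllegalArgumentException("bad option name: " + name);
                }
                // Crude guard only: reject values that could chain a second statement.
                if (value == null || value.indexOf(';') >= 0) {
                    throw new IllegalArgumentException("bad value for option " + name);
                }
                out.add("ALTER SESSION SET `" + name + "` = " + value);
            }
        }
        out.add(query);
        return Collections.unmodifiableList(out);
    }

    /** Runs statements in order; the first failure stops everything after it. */
    static void runAll(List<String> statements, StatementRunner runner) {
        for (String sql : statements) {
            try {
                runner.run(sql);
            } catch (Exception ex) {
                // Don't run the final query with a setting missing (e.g. CTAS in the wrong format).
                throw new IllegalStateException("statement failed: " + sql, ex);
            }
        }
    }
}
```

For the CTAS case in this thread, the request would carry `"sessionSettings": {"store.format": "'json'"}` and the CTAS as the query, so no session has to outlive the request.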