I have a background worker process (on a server, not a browser) that kicks off
every minute or so and issues some queries sequentially to the REST query
endpoint. In 1.4 with no authentication this worked fine, except that in one
instance I need to issue a CTAS query with a different format (json).

I upgraded to 1.5-SNAPSHOT commit bb3fc15216d9cab804fc9a6f0e5bd34597dd4394

Since the upgrade I am getting a resource starvation problem, with or without
authentication.
The drillbit process stays up for an hour or less and then becomes
unresponsive and eats up the CPU.

It is definitely a resource starvation issue; I'm not sure whether it's a
resource leak.
Below is a stack trace.
Also, when I run lsof on the pid there are a lot (more than a thousand) of
entries like this one, which are used by NIO selectors, so it smells like a
resource leak.

COMMAND  PID USER   FD   TYPE             DEVICE  SIZE/OFF    NODE NAME
java    2931 root  288u  0000               0,11         0    7705 anon_inode
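
For anyone who wants to tally these without reading raw lsof output, a rough Python sketch follows. It assumes Linux, where /proc/<pid>/fd holds one symlink per open descriptor; the function names are my own and are not part of Drill or lsof.

```python
import os
from collections import Counter


def classify_target(target):
    """Reduce a /proc/<pid>/fd symlink target to a coarse kind."""
    if target.startswith("anon_inode:"):
        # e.g. "anon_inode:[eventpoll]" -> "anon_inode:eventpoll"
        return "anon_inode:" + target[len("anon_inode:"):].strip("[]")
    if target.startswith(("socket:", "pipe:")):
        # e.g. "socket:[12345]" -> "socket"
        return target.split(":", 1)[0]
    # Anything else is treated as a regular path.
    return "file"


def count_fd_kinds(pid):
    """Count the open descriptors of `pid` by kind (Linux /proc only)."""
    fd_dir = "/proc/{}/fd".format(pid)
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir and readlink.
            continue
        counts[classify_target(target)] += 1
    return counts
```

Calling `count_fd_kinds(2931)` every few minutes and watching whether the `anon_inode:eventpoll` count only ever grows would be one way to tell a leak apart from plain load. It needs to run as the same user as the drillbit (or as root).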

2016-02-02 21:56:26,520 [qtp1250890858-11590] ERROR o.a.d.e.s.r.a.AnonymousLoginService - Login failed.
java.lang.IllegalStateException: failed to create a child event loop
        at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:68) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49) ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:61) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:49) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at org.apache.drill.exec.rpc.TransportCheck.createEventLoopGroup(TransportCheck.java:73) ~[drill-rpc-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.client.DrillClient.createEventLoop(DrillClient.java:239) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:220) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:178) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.server.rest.auth.AbstractDrillLoginService.createDrillClient(AbstractDrillLoginService.java:56) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.server.rest.auth.AnonymousLoginService.login(AnonymousLoginService.java:47) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.server.rest.auth.AnonymousAuthenticator.validateRequest(AnonymousAuthenticator.java:71) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:503) [jetty-security-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:478) [jetty-servlet-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.Server.handle(Server.java:462) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:279) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:232) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534) [jetty-io-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607) [jetty-util-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536) [jetty-util-9.1.5.v20140505.jar:9.1.5.v20140505]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_91]
Caused by: java.lang.RuntimeException: epoll_create1() failed: Too many open files
        at io.netty.channel.epoll.Native.epollCreate(Native Method) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoop.<init>(EpollEventLoop.java:74) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:76) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
        ... 25 common frames omitted
2016-02-02 21:56:30,130 [qtp1250890858-11591] ERROR o.a.d.e.s.r.a.AnonymousLoginService - Login failed.
java.lang.IllegalStateException: failed to create a child event loop
        at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:68) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49) ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:61) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoopGroup.<init>(EpollEventLoopGroup.java:49) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at org.apache.drill.exec.rpc.TransportCheck.createEventLoopGroup(TransportCheck.java:73) ~[drill-rpc-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.client.DrillClient.createEventLoop(DrillClient.java:239) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:220) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:178) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.server.rest.auth.AbstractDrillLoginService.createDrillClient(AbstractDrillLoginService.java:56) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.server.rest.auth.AnonymousLoginService.login(AnonymousLoginService.java:47) ~[drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.apache.drill.exec.server.rest.auth.AnonymousAuthenticator.validateRequest(AnonymousAuthenticator.java:71) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:503) [jetty-security-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:478) [jetty-servlet-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.Server.handle(Server.java:462) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:279) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:232) [jetty-server-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534) [jetty-io-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607) [jetty-util-9.1.5.v20140505.jar:9.1.5.v20140505]
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536) [jetty-util-9.1.5.v20140505.jar:9.1.5.v20140505]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_91]
Caused by: java.lang.RuntimeException: epoll_create1() failed: Too many open files
        at io.netty.channel.epoll.Native.epollCreate(Native Method) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoop.<init>(EpollEventLoop.java:74) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.channel.epoll.EpollEventLoopGroup.newChild(EpollEventLoopGroup.java:76) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
        at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
        ... 25 common frames omitted



> On Feb 2, 2016, at 7:40 AM, Venki Korukanti <venki.koruka...@gmail.com> wrote:
> 
> Currently we keep the DrillClient per session. All the state is in Server
> and DrillClient is the reference to reuse the state. DrillClient is
> automatically closed when the session expires (default value is 1hr after
> the last activity on the session) or the user explicitly logs out. I am
> trying to understand if there is a resource leak. Do you have too many
> sessions open when the system load is at its max, or just a few sessions on
> which you have already run many queries? If it is the former, it is
> understandable, since each session holds a connection for its lifetime.
> Also, are the resources not freed after logout?
> 
> If you need to have multiple simultaneous sessions, it is better to connect
> to different Drillbits (maybe in a round-robin fashion) than always
> connecting to a single Drillbit.
> 
> Thanks
> Venki
> 
> On Mon, Feb 1, 2016 at 11:51 PM, Josh Schlesser <j...@spoutable.com> wrote:
> 
>> First: I'm a total newb at contributing to Apache projects so please excuse
>> any indiscretions, feel free to give comments on style or whatever, I take
>> feedback well.  Thick skin too.
>> 
>> 
>> I'll give some background next and then a proposal.
>> 
>> Background:
>> I recently changed over to using authentication in the 1.5 snapshot
>> because I need to have a session via the REST API so that I can set the
>> session storage options in an initial query for a subsequent CTAS query.
>> Previously all REST calls seemed to be completely independent.
>> 
>> Since the change I have started seeing ‘too many files open’ errors in my
>> drillbit.log, and the drillbit Java process becomes effectively hung waiting
>> for open file descriptor slots.  When running the top command, the machine
>> is at max load due to the drillbit process, and the drillbit becomes
>> effectively unresponsive; even the simple pages in the web console don’t
>> respond.  Investigating further, it seems that there might be a file kept
>> open per session by the drillbit process for the life of the session.  I
>> used the lsof Unix command on the drillbit process and found a lot of Unix
>> pipes.  Looking at the code, it looks like these pipes could be for the
>> communication between the web process and the RPC server, with one being
>> allocated per session.  I haven’t validated this; it’s just a guess after
>> scanning the code.  I had 1.4 running without this requirement and without
>> ever seeing the error.  It seems that without authentication the number of
>> open files is a non-issue for me, possibly because there are no sessions.
>> 
>> I'm wondering if my guess about what is causing the ‘too many open files’
>> error is plausible?   Does anybody with a deeper understanding of the
>> architecture have any comments on this?
>> 
>> Proposal:
>> Assuming sessions are the issue, I am making some changes to my REST
>> client so that sessions are used more effectively, and I can raise the
>> ulimit for the drillbit process's Linux user in hopes of mitigating this.
>> I am effectively creating a session pool in the REST client that resets
>> session variables to defaults when a session gets reused.  However, it
>> seems hacky.
>> 
>> Below is an idea for per-request settings, which seems less hacky in the
>> long term.
>> 
>> Can I add a new optional member to the query.json REST method in a
>> backwards-compatible way to set session-level parameters in a single
>> request?  Currently a REST request via the API has a body like so:
>> { "queryType": "SQL", "query": "<drill query>" }
>> 
>> I'd like to do the following (an object of option names and values):
>> 
>> { "queryType": "SQL", "query": "<drill query>", "sessionSettings":
>> { "option_1_name": "option_1_value", "option_2_name": "option_2_value" } }
>> 
>> or even (an array of SET statements):
>> 
>> { "queryType": "SQL", "query": "<drill query>", "sessionSettings": [
>> "SET `option_name` = value", "SET `option_name1` = value1",
>> "SET `option_name2` = value2", "SET `option_name3` = value3" ] }
>> 
>> As far as I can tell, Drill is essentially stateless between queries right
>> now except for session-level system parameters and authentication.  There
>> aren’t any in-memory temp tables or cursors or variables, as in PL/SQL or
>> PSQL or other SQL dialects, that would make it stateful.
>> 
>> Given the stateless assumption, being able to set session-level params on
>> a per-request basis would cover all of the cases that I might need.  It
>> looks relatively straightforward to add something to QueryWrapper to
>> accept an optional session settings section of the JSON packet and
>> execute those 'SET' commands before the final query.  This will work for
>> me, as I can run without authentication in a 'secure' backend environment,
>> which will remove sessions and hence file descriptors, assuming my
>> assumptions about file descriptors and sessions are correct.
>> 
>> 
>> My Java is rusty (circa 2003), but some casual googling implies that if
>> this were added as a third @FormParam to submitQuery in QueryResources it
>> would magically be null if it weren't present and could easily be
>> ignored.  If it's present, then an alternative constructor of QueryWrapper
>> could be called with the extra param, and it would be easy to alter its run
>> method to execute the SET commands.  There would need to be some error
>> handling, of course, if the SET commands were illegal or failed to run for
>> some reason.
>> 
>> If this seems reasonable, how do I go about contributing?  I looked
>> through the links in the docs to Apache Foundation incubator projects but
>> the links to Drill were broken :(   http://drill.apache.org/team.html
>> I read this:
>> http://drill.apache.org/docs/apache-drill-contribution-guidelines/
>> and I have subscribed to the dev mailing list (obvious since you are
>> getting this).  It said to post here before creating a JIRA.  Am I missing
>> anything in my assumptions?  Comments?  Should I just submit a JIRA and a
>> patch, or submit a JIRA and a comment, or wait for comments before coding
>> stuff up as an example?
>> 
>> Thanks for taking the time to read and respond.
>> 
>> Josh
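
A rough sketch of the per-request settings idea from the quoted proposal, written in Python for brevity rather than as a Drill patch. The "sessionSettings" field is the proposed addition and is not something Drill's query.json accepts today; the function name and the sample query and option are illustrative assumptions only.

```python
import json


def statements_for_request(body):
    """Return the statements to run, in order, for one query.json body.

    "sessionSettings" is the optional field proposed in the quoted message;
    it is hypothetical. When it is absent, only the query is returned, so
    existing request bodies keep working unchanged.
    """
    settings = body.get("sessionSettings") or []
    for stmt in settings:
        # Accept only SET statements so the field cannot be used to run
        # arbitrary queries ahead of the real one.
        if not stmt.strip().upper().startswith("SET "):
            raise ValueError("not a SET statement: {!r}".format(stmt))
    return list(settings) + [body["query"]]


# Example body: a CTAS that needs JSON output for this request only.
request_body = json.loads("""
{
  "queryType": "SQL",
  "query": "CREATE TABLE dfs.tmp.`out` AS SELECT * FROM cp.`employee.json`",
  "sessionSettings": ["SET `store.format` = 'json'"]
}
""")

for statement in statements_for_request(request_body):
    print(statement)
```

The settings come out first and the query last, which is the order the proposal asks QueryWrapper to execute them in. Rejecting anything that is not a SET statement is one possible form of the error handling mentioned in the proposal.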
