By itself, the CLOSE_WAIT state does not indicate a problem. Take a jstack
dump of the Drillbit and check the Jetty worker threads in particular. Try
increasing the drill.exec.http.jetty.server.selector setting.
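
As a rough illustration of the jstack check, here is a minimal Python sketch that counts threads by state in a saved thread dump. The "qtp" name filter assumes Jetty's default thread pool naming, which may not match how your Drillbit names its web server threads, and the sample dump below is made up for the example.

```python
import re
from collections import Counter

# A thread header line in a jstack dump starts with the quoted thread name.
THREAD_RE = re.compile(r'^"(?P<name>[^"]+)"')
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (?P<state>\w+)")


def thread_states(dump_text, name_filter=""):
    """Count thread states for threads whose name contains name_filter."""
    counts = Counter()
    current = None
    for line in dump_text.splitlines():
        header = THREAD_RE.match(line)
        if header:
            current = header.group("name")
            continue
        state = STATE_RE.search(line)
        if state and current is not None:
            if name_filter in current:
                counts[state.group("state")] += 1
            current = None
    return counts


# Hypothetical dump fragment; real input would come from `jstack <pid>`.
SAMPLE_DUMP = '''\
"qtp1234-40" #40 prio=5 os_prio=0 tid=0x01 nid=0x10 runnable [0x01]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)

"qtp1234-41" #41 prio=5 os_prio=0 tid=0x02 nid=0x11 waiting on condition [0x02]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)

"main" #1 prio=5 os_prio=0 tid=0x03 nid=0x12 in Object.wait() [0x03]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
'''

for state, n in sorted(thread_states(SAMPLE_DUMP, "qtp").items()):
    print(state, n)
```

If most of the web server threads turn out to be blocked or busy, that would suggest the HTTP side is starved rather than the sockets themselves being the problem.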
Thank you,
Vlad
On 6/26/18 01:26, Ken Qi (Guangquan) wrote:
Hi Team,
Hope all is good.
We need your help.
Here is the Apache Drill process running on our server:
drill 19220 1 17 16:48 ? 00:15:32 /usr/java/jdk/bin/java
-Xms8G -Xmx8G -XX:MaxDirectMemorySize=96G -XX:ReservedCodeCacheSize=1024m
-Ddrill.exec.enable-epoll=false -XX:+CMSClassUnloadingEnabled -XX:+UseG1GC
-Dlog.path=/var/log/drill/drillbit.log
-Dlog.query.path=/var/log/drill/drillbit_queries.json -cp
/usr/local/apache-drill-1.13.1/conf:/usr/local/apache-drill-1.13.1/jars/*:/usr/local/apache-drill-1.13.1/jars/ext/*:/usr/local/apache-drill-1.13.1/jars/3rdparty/*:/usr/local/apache-drill-1.13.1/jars/classb/*:/usr/local/apache-drill-1.13.1/jars/3rdparty/linux/*
org.apache.drill.exec.server.Drillbit
root 23651 23227 0 18:16 pts/1 00:00:00 grep --color=auto java
Question 1:
There are a lot of connections in the CLOSE_WAIT state when I access Apache
Drill at https://ip address:8047 <https://theremin.digitalalchemy.net.au:8047/>
(I have replaced our server IP with xxxx below for security reasons). Because
of this, we can't access Apache Drill at https://ip address:8047
<https://theremin.digitalalchemy.net.au:8047/>, so we can't check which SQL
queries failed.
tcp6    0  0  :::8047            :::*                   LISTEN       19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54132  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.100.222:52986  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:53009  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54131  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.3.119:61202    CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54366  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54129  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:58627  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:58486  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54134  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:53008  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.3.119:56226    CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:52991  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.3.119:51172    CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.3.119:36136    CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54133  CLOSE_WAIT   19220/java
tcp6   24  0  192.168.xxxx:8047  192.168.100.131:57474  ESTABLISHED  19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54069  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54130  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:53001  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:52985  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:52990  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54212  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.100.131:58628  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.100.131:53955  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:57391  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.3.119:41219    CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54307  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:53000  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:52984  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54308  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.3.119:46189    CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54211  CLOSE_WAIT   19220/java
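
A listing like the one above is easier to read when grouped by state and remote host. The following is only a sketch in Python: it assumes netstat-style rows with one connection per line (protocol, Recv-Q, Send-Q, local address, remote address, state), and the local address 192.168.0.1 in the sample is a placeholder, not our real server IP.

```python
from collections import Counter


def summarize_connections(netstat_text, port=8047):
    """Count connections on the given local port by (state, remote host)."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local remote state [pid/program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if not local.endswith(f":{port}"):
            continue
        remote_host = remote.rsplit(":", 1)[0]
        counts[(state, remote_host)] += 1
    return counts


# Sample rows modelled on the listing; 192.168.0.1 is a placeholder address.
SAMPLE = """\
tcp6 0 0 :::8047 :::* LISTEN 19220/java
tcp6 518 0 192.168.0.1:8047 192.168.100.131:54132 CLOSE_WAIT 19220/java
tcp6 1 0 192.168.0.1:8047 192.168.100.222:52986 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.0.1:8047 192.168.100.131:54131 CLOSE_WAIT 19220/java
tcp6 24 0 192.168.0.1:8047 192.168.100.131:57474 ESTABLISHED 19220/java
"""

summary = summarize_connections(SAMPLE)
for (state, host), n in sorted(summary.items()):
    print(f"{state:12} {host:18} {n}")
```

Grouping this way shows at a glance which clients leave the most half-closed connections behind.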
Question 2:
Our Apache Drill instance goes down frequently, apparently because of a
memory leak, even though we have configured 96G of memory for it. Could you
please advise how we can identify which SQL queries use a lot of memory, and
how we can improve performance?
Error Id: 40d789a6-91ee-4e0b-bfc9-a26358a43df3 on theremin.root.digitalalchemy:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (67043328)
Allocator(op:14:0:0:HashPartitionSender) 1000000/67043328/101535744/10000000000 (res/actual/peak/limit)
Fragment 14:0
[Error Id: 40d789a6-91ee-4e0b-bfc9-a26358a43df3 on theremin.root.digitalalchemy:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.13.0.jar:1.13.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:300) [drill-java-exec-1.13.0.jar:1.13.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.13.0.jar:1.13.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:266) [drill-java-exec-1.13.0.jar:1.13.0]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.13.0.jar:1.13.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
)
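
To make the allocator figures in this error easier to read, here is a small Python sketch that pulls the four numbers out of the message and prints them in MiB. The field names come from the "(res/actual/peak/limit)" label in the message itself; the regular expression is only an assumption about the message layout based on this one example.

```python
import re

# Matches e.g. "Allocator(op:14:0:0:HashPartitionSender) 1000000/67043328/..."
ALLOCATOR_RE = re.compile(
    r"Allocator\((?P<name>[^)]+)\)\s+"
    r"(?P<res>\d+)/(?P<actual>\d+)/(?P<peak>\d+)/(?P<limit>\d+)"
)


def parse_allocator(text):
    """Return the allocator name and its byte counts, or None if not found."""
    match = ALLOCATOR_RE.search(text)
    if match is None:
        return None
    info = {key: int(match.group(key)) for key in ("res", "actual", "peak", "limit")}
    info["name"] = match.group("name")
    return info


# Text copied from the error message above.
MESSAGE = """\
IllegalStateException: Memory was leaked by query. Memory leaked: (67043328)
Allocator(op:14:0:0:HashPartitionSender)
1000000/67043328/101535744/10000000000 (res/actual/peak/limit)
"""

info = parse_allocator(MESSAGE)
print(info["name"])
for key in ("res", "actual", "peak", "limit"):
    print(f"{key:7} {info[key] / 2**20:10.2f} MiB")
```

Here the "actual" value equals the leaked amount reported in the message, and it is far below the allocator's limit.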
Thank you.
Regards.
Ken Qi
System Operations Department Leader
Digital Alchemy (Nanjing) Limited Company
T : +86 25 83177103 (Ext:2003)
M: +86 13913876298
https://www.digitalalchemy.asia/
<https://www.linkedin.com/pulse/digital-alchemy-expands-north-america-regan-yan/>