Hi Team,

Hope all is good.
We need your help. Below is the Apache Drill process that we installed on our server:

drill 19220 1 17 16:48 ? 00:15:32 /usr/java/jdk/bin/java -Xms8G -Xmx8G -XX:MaxDirectMemorySize=96G -XX:ReservedCodeCacheSize=1024m -Ddrill.exec.enable-epoll=false -XX:+CMSClassUnloadingEnabled -XX:+UseG1GC -Dlog.path=/var/log/drill/drillbit.log -Dlog.query.path=/var/log/drill/drillbit_queries.json -cp /usr/local/apache-drill-1.13.1/conf:/usr/local/apache-drill-1.13.1/jars/*:/usr/local/apache-drill-1.13.1/jars/ext/*:/usr/local/apache-drill-1.13.1/jars/3rdparty/*:/usr/local/apache-drill-1.13.1/jars/classb/*:/usr/local/apache-drill-1.13.1/jars/3rdparty/linux/* org.apache.drill.exec.server.Drillbit
root 23651 23227 0 18:16 pts/1 00:00:00 grep --color=auto java

Question 1:

There are a lot of connections in the CLOSE_WAIT state when I access Apache Drill at https://ip address:8047 <https://theremin.digitalalchemy.net.au:8047/> (I have changed our server IP to xxxx for security reasons). Because of this, we can't access Apache Drill at https://ip address:8047 <https://theremin.digitalalchemy.net.au:8047/>, so we can't check which SQL queries failed.

tcp6 0 0 :::8047 :::* LISTEN 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:54132 CLOSE_WAIT 19220/java
tcp6 1 0 192.168.xxxx:8047 192.168.100.222:52986 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.222:53009 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:54131 CLOSE_WAIT 19220/java
tcp6 1 0 192.168.xxxx:8047 192.168.3.119:61202 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:54366 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:54129 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:58627 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:58486 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:54134 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.222:53008 CLOSE_WAIT 19220/java
tcp6 1 0 192.168.xxxx:8047 192.168.3.119:56226 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.222:52991 CLOSE_WAIT 19220/java
tcp6 1 0 192.168.xxxx:8047 192.168.3.119:51172 CLOSE_WAIT 19220/java
tcp6 1 0 192.168.xxxx:8047 192.168.3.119:36136 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:54133 CLOSE_WAIT 19220/java
tcp6 24 0 192.168.xxxx:8047 192.168.100.131:57474 ESTABLISHED 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:54069 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:54130 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.222:53001 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.222:52985 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.222:52990 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:54212 CLOSE_WAIT 19220/java
tcp6 1 0 192.168.xxxx:8047 192.168.100.131:58628 CLOSE_WAIT 19220/java
tcp6 1 0 192.168.xxxx:8047 192.168.100.131:53955 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:57391 CLOSE_WAIT 19220/java
tcp6 1 0 192.168.xxxx:8047 192.168.3.119:41219 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:54307 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.222:53000 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.222:52984 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:54308 CLOSE_WAIT 19220/java
tcp6 1 0 192.168.xxxx:8047 192.168.3.119:46189 CLOSE_WAIT 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:54211 CLOSE_WAIT 19220/java

Question 2:

Our Apache Drill goes down frequently, and it seems to be due to a memory leak. However, we have configured 96G of memory for Apache Drill, so can you please advise how we can identify which SQL queries use a lot of memory, and how we can improve our performance?
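For context on the 96G figure, here is a minimal sketch that reads the memory flags off the Drillbit command line shown in the ps output at the top. The helper names (`to_bytes`, `memory_flags`) are ours and purely illustrative; the flag values are copied from that command line. As far as we understand, 96G is the direct memory limit and the Java heap is 8G, but please correct us if we are reading the flags wrongly.

```python
import re

# Memory-related flags copied from the Drillbit command line in the ps output.
CMDLINE = "-Xms8G -Xmx8G -XX:MaxDirectMemorySize=96G -XX:ReservedCodeCacheSize=1024m"

# Binary multipliers for the JVM size suffixes k, m and g.
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def to_bytes(size):
    """Convert a JVM size string such as '8G' or '1024m' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", size)
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    number, unit = match.groups()
    return int(number) * UNITS.get(unit.lower(), 1)


def memory_flags(cmdline):
    """Return {label: bytes} for the memory flags found in cmdline."""
    # The labels on the left are hypothetical names chosen for this sketch.
    patterns = {
        "initial_heap": r"-Xms(\S+)",
        "max_heap": r"-Xmx(\S+)",
        "max_direct": r"-XX:MaxDirectMemorySize=(\S+)",
        "code_cache": r"-XX:ReservedCodeCacheSize=(\S+)",
    }
    found = {}
    for label, pattern in patterns.items():
        match = re.search(pattern, cmdline)
        if match:
            found[label] = to_bytes(match.group(1))
    return found


flags = memory_flags(CMDLINE)
for label, value in flags.items():
    print(f"{label}: {value / 1024**3:.0f} GiB")
```
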
Error Id: 40d789a6-91ee-4e0b-bfc9-a26358a43df3 on theremin.root.digitalalchemy:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (67043328)
Allocator(op:14:0:0:HashPartitionSender) 1000000/67043328/101535744/10000000000 (res/actual/peak/limit)

Fragment 14:0

[Error Id: 40d789a6-91ee-4e0b-bfc9-a26358a43df3 on theremin.root.digitalalchemy:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.13.0.jar:1.13.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:300) [drill-java-exec-1.13.0.jar:1.13.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.13.0.jar:1.13.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:266) [drill-java-exec-1.13.0.jar:1.13.0]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.13.0.jar:1.13.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
)

Thank you.

Regards.
Ken Qi
System Operations Department Leader
Digital Alchemy (Nanjing) Limited Company
T : +86 25 83177103 (Ext:2003)
M: +86 13913876298
https://www.digitalalchemy.asia/ <https://www.linkedin.com/pulse/digital-alchemy-expands-north-america-regan-yan/>
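
P.S. In case it helps with Question 2, below is a rough sketch of how we are reading the allocator line in the error message above. It only splits the four numbers using the labels the log itself prints (res/actual/peak/limit). The function name `parse_allocator` is hypothetical, and our reading that "actual" is the amount still held when the fragment finished is an assumption, based on it matching the "Memory leaked" figure.

```python
import re

# Allocator line copied from the error message above.
LINE = (
    "Allocator(op:14:0:0:HashPartitionSender) "
    "1000000/67043328/101535744/10000000000 (res/actual/peak/limit)"
)


def parse_allocator(line):
    """Split an allocator line into its name and its four byte counts."""
    match = re.search(r"Allocator\(([^)]*)\)\s+(\d+)/(\d+)/(\d+)/(\d+)", line)
    if match is None:
        raise ValueError("no allocator summary found in line")
    res, actual, peak, limit = (int(group) for group in match.groups()[1:])
    # Keys follow the labels printed in the log: (res/actual/peak/limit).
    return {
        "name": match.group(1),
        "res": res,
        "actual": actual,
        "peak": peak,
        "limit": limit,
    }


info = parse_allocator(LINE)
mib = 1024**2
print(f"allocator: {info['name']}")
for key in ("res", "actual", "peak", "limit"):
    print(f"{key}: {info[key]} bytes ({info[key] / mib:.1f} MiB)")
```

If this reading is right, the operator was still holding about 64 MiB when the fragment ended, far below both its own limit and the 96G we configured, which is why we are unsure whether this error is the cause of the crashes or only a symptom.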