LompleZ opened a new issue, #64138: URL: https://github.com/apache/doris/issues/64138
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

doris 1.2+

### What's Wrong?

The Apache HDFS broker leaks threads. The longer the broker runs, the more threads accumulate, and it eventually fails with an OOM. The broker log looks like this:

```java
[INFO ] 2025-05-25 00:33:57,190 method:org.apache.hadoop.lite.client.LiteClientImpl.batchPing(LiteClientImpl.java:1158) batch ping size: 3, first 3 sessions: 5796d47541baad0c, 3784319f7320ff12, 3c9338822793ba8c, lparam: 3d1129a5242ff0dd, current ping num:1
Exception in thread "TThreadPoolServer WorkerProcess-319" java.lang.OutOfMemoryError: Java heap space
Exception in thread "TThreadPoolServer WorkerProcess-343" java.lang.OutOfMemoryError: Java heap space
Exception in thread "TThreadPoolServer WorkerProcess-264" java.lang.OutOfMemoryError: Java heap space
[WARN ] 2025-05-25 00:34:49,256 method:org.apache.hadoop.lite.client.LiteClientImpl.sendRequest(LiteClientImpl.java:381) failed to send message: DFS_OPEN, lparam: e2065745f259d106, from: /10.138.71.133:36690:1102: java.lang.OutOfMemoryError: Java heap space
[ERROR] 2025-05-25 00:34:49,257 method:org.apache.hadoop.lite.client.LiteClientImpl.sendMsg(LiteClientImpl.java:243) failed to send message: DFS_OPEN, lparam: e2065745f259d106, from: /10.138.71.133:36690:1102: java.lang.OutOfMemoryError: Java heap space
[ERROR] 2025-05-25 00:34:49,257 method:org.apache.hadoop.lite.client.LiteClientImpl.sendMsg(LiteClientImpl.java:243) failed to send request, will try again, lparm: e2065745f259d106: java.io.IOException: failed to send from: /10.138.71.133:36690:1102:
    at org.apache.hadoop.lite.client.LiteClientImpl.sendRequest(LiteClientImpl.java:384)
    at org.apache.hadoop.lite.client.LiteClientImpl.sendMsg(LiteClientImpl.java:241)
    at org.apache.hadoop.lite.client.LiteClientImpl.openInternal(LiteClientImpl.java:924)
    at org.apache.hadoop.lite.client.LiteClientImpl.open(LiteClientImpl.java:905)
    at org.apache.hadoop.fs.lite.file.LiteFileStreamWrapperImpl.open(LiteFileStreamWrapperImpl.java:27)
    at org.apache.hadoop.fs.LibDFileSystemImpl.openFile(LibDFileSystemImpl.java:454)
    at org.apache.hadoop.fs.LibDFSInputStream.<init>(LibDFSInputStream.java:26)
    at org.apache.hadoop.fs.LiteFileSystem.open(LiteFileSystem.java:133)
    at org.apache.doris.broker.hdfs.FileSystemManager.openReader(FileSystemManager.java:1224)
    at org.apache.doris.broker.hdfs.HDFSBrokerServiceImpl.openReader(HDFSBrokerServiceImpl.java:184)
failed to send request, will try again, lparm: e2065745f259d106: java.io.IOException: failed to send from: /10.138.71.133:36690:1102:
    at org.apache.hadoop.lite.client.LiteClientImpl.sendRequest(LiteClientImpl.java:384)
    at org.apache.hadoop.lite.client.LiteClientImpl.sendMsg(LiteClientImpl.java:241)
    at org.apache.hadoop.lite.client.LiteClientImpl.openInternal(LiteClientImpl.java:924)
    at org.apache.hadoop.lite.client.LiteClientImpl.open(LiteClientImpl.java:905)
    at org.apache.hadoop.fs.lite.file.LiteFileStreamWrapperImpl.open(LiteFileStreamWrapperImpl.java:27)
[ERROR] 2025-05-25 00:34:49,257 method:org.apache.hadoop.lite.client.LiteClientImpl.sendMsg(LiteClientImpl.java:243) failed to send request, will try again, lparm: e2065745f259d106: java.io.IOException: failed to send from: /10.138.71.133:36690:1102:
    at org.apache.hadoop.lite.client.LiteClientImpl.sendRequest(LiteClientImpl.java:384)
    at org.apache.hadoop.lite.client.LiteClientImpl.sendMsg(LiteClientImpl.java:241)
    at org.apache.hadoop.lite.client.LiteClientImpl.openInternal(LiteClientImpl.java:924)
    at org.apache.hadoop.lite.client.LiteClientImpl.open(LiteClientImpl.java:905)
[ERROR] 2025-05-25 00:34:49,257 method:org.apache.hadoop.lite.client.LiteClientImpl.sendMsg(LiteClientImpl.java:243)
Exception in thread "TThreadPoolServer WorkerProcess-264" java.lang.OutOfMemoryError: Java heap space
[WARN ] 2025-05-25 00:34:49,256 method:org.apache.hadoop.lite.client.LiteClientImpl.sendRequest(LiteClientImpl.java:381) failed to send message: DFS_OPEN, lparam: e2065745f259d106, from: /10.138.71.133:36690:1102: java.lang.OutOfMemoryError: Java heap space
[ERROR] 2025-05-25 00:34:49,257 method:org.apache.hadoop.lite.client.LiteClientImpl.sendMsg(LiteClientImpl.java:243) failed to send request, will try again, lparm: e2065745f259d106: java.io.IOException: failed to send from: /10.138.71.133:36690:1102:
    at org.apache.hadoop.lite.client.LiteClientImpl.sendRequest(LiteClientImpl.java:384)
    at org.apache.hadoop.lite.client.LiteClientImpl.sendMsg(LiteClientImpl.java:241)
    at org.apache.hadoop.lite.client.LiteClientImpl.openInternal(LiteClientImpl.java:924)
    at org.apache.hadoop.lite.client.LiteClientImpl.open(LiteClientImpl.java:905)
    at org.apache.hadoop.fs.lite.file.LiteFileStreamWrapperImpl.open(LiteFileStreamWrapperImpl.java:27)
    at org.apache.hadoop.fs.LibDFileSystemImpl.openFile(LibDFileSystemImpl.java:454)
    at org.apache.hadoop.fs.LibDFSInputStream.<init>(LibDFSInputStream.java:26)
    at org.apache.hadoop.fs.LiteFileSystem.open(LiteFileSystem.java:133)
    at org.apache.doris.broker.hdfs.FileSystemManager.openReader(FileSystemManager.java:1224)
    at org.apache.doris.broker.hdfs.HDFSBrokerServiceImpl.openReader(HDFSBrokerServiceImpl.java:184)
    at org.apache.doris.thrift.TPaloBrokerService$Processor$openReader.getResult(TPaloBrokerService.java:1145)
    at org.apache.doris.thrift.TPaloBrokerService$Processor$openReader.getResult(TPaloBrokerService.java:1125)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:250)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.handler.codec.EncoderException: java.lang.OutOfMemoryError: Java heap space
    at io.netty.handler.codec.MessageToByteEncoder.write(MessageToByteEncoder.java:125)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
    at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    ... 1 more
Caused by: java.lang.OutOfMemoryError: Java heap space
```

Cause: commit [6fe207eb4b85c92c2b11c266de90bd57e23c9922](https://github.com/apache/doris/commit/6fe207eb4b85c92c2b11c266de90bd57e23c9922) commented out the code that explicitly calls `FileSystem.close()`.

I went through the history of the Baidu code repository. At that time the broker did not yet depend on hadoop-common.jar, so commenting out the call really had no negative effect. Later, when Doris was open-sourced, the broker switched to hadoop-common.jar. That jar creates a netty thread pool (EventLoopGroup) whenever a FileSystem is created. Thread objects are GC roots, so the JVM cannot reclaim them unless `.close()` is called manually.

Simply reverting this commit is not feasible. With the broker's current multi-threaded behavior there is a race: thread A calling `.close()` can cause a read in thread B to fail.

In addition, the `updateCachedFileSystem()` function has a bug under multi-threaded race conditions.

### What You Expected?

I have already fixed the code and will submit it soon.

### How to Reproduce?

When BEs frequently run failing imports and exports through the broker, the broker keeps creating new FileSystem objects and thread pool objects. The following command shows the large number of unreclaimed thread pool threads in the JVM:

```bash
jps | awk '/BrokerBootstrap/{print $1}' | xargs jstack | grep -P "pool-\d+-thread-\d+"
```

### Anything Else?

_No response_

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
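To illustrate the close-versus-read race described under "What's Wrong?", here is a minimal sketch of one common way to close a shared resource safely: reference counting. It is not the fix mentioned in this issue, which has not been posted yet, and it is not code from the Doris broker. The names `RefCountedResource`, `acquire`, `release`, and `retire` are hypothetical, and a plain `java.io.Closeable` stands in for the Hadoop `FileSystem` so that the example stays self-contained.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (an assumption for illustration, not Doris code).
 * Wraps a shared Closeable and closes it once, after the cache has retired
 * it and the last in-flight user has released it.
 */
final class RefCountedResource<T extends Closeable> {
    private final T resource;
    // Starts at 1: the cache itself holds one reference until retire().
    private final AtomicInteger refs = new AtomicInteger(1);
    private final AtomicBoolean retired = new AtomicBoolean(false);

    RefCountedResource(T resource) {
        this.resource = resource;
    }

    /** Returns the resource, or null if it has already been closed. */
    T acquire() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                return null; // already closed; the caller must create a new one
            }
            if (refs.compareAndSet(n, n + 1)) {
                return resource;
            }
        }
    }

    /** Must be called exactly once for every successful acquire(). */
    void release() throws IOException {
        if (refs.decrementAndGet() == 0) {
            resource.close();
        }
    }

    /** Drops the cache's own reference. Safe to call more than once. */
    void retire() throws IOException {
        if (retired.compareAndSet(false, true)) {
            release();
        }
    }
}

final class RefCountDemo {
    public static void main(String[] args) throws IOException {
        AtomicInteger closes = new AtomicInteger();
        // The lambda stands in for FileSystem.close() and counts invocations.
        RefCountedResource<Closeable> cached =
                new RefCountedResource<>(closes::incrementAndGet);

        Closeable inUse = cached.acquire(); // "thread B" starts a read
        cached.retire();                    // "thread A" evicts the cache entry
        System.out.println("closed while in use: " + (closes.get() > 0));
        cached.release();                   // "thread B" finishes; close runs now
        System.out.println("close calls: " + closes.get());
        System.out.println("usable afterwards: " + (cached.acquire() != null && inUse != null));
    }
}
```

With this pattern the thread that evicts a cached entry only drops the cache's reference, so the underlying `close()` is deferred until the last reader has finished. Whether this fits the broker depends on how `FileSystemManager` hands out and tracks `FileSystem` instances, which this sketch does not model.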
