[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014960#comment-17014960 ]
benj commented on DRILL-7449: ----------------------------- Hi [~IhorHuzenko] The problem doesn't appears for each run. Sometimes (with exactly the same data) it will works 5 times before to crash. With the official 1.17 on a small cluster 3 node (for each ~ 48 proc / 128 Go (DRILL_HEAP=15G, DRILL_MAX_DIRECT_MEMORY=80G)) With a file of 688Mo / 1 118 320 JSON records On cluster When comparing profile of correct and crashed executions I can see that : - crash appears at "02-xx-02 - EXTERNAL_SORT" level - on "02-xx-03 - UNORDERED_RECEIVER" : - correct execution : 99% of the Max Records are concentrated on 1 of the 8 Minor fragment, and the cumulative total is correct - on crash execution : Max Record are ~ evenly/homogeneously distributed on the 8 Minor fragment and the cumulative total is incorrect (lower) (already incorrect in 03-xx-02 - PROJECT and 03-xx-00 - JSON_SUB_SCAN ) On my local Machine (1.17 too 8 Proc / 32Go), in embedded mode, When comparing profile of correct and crashed executions I can see that : - crash appears at "02-xx-02 - EXTERNAL_SORT" level - The difference is on 03-xx-00 - JSON_SUB_SCAN, crash execution doesn't have the good number for Max Records - for 02-xx-03 - UNORDERED_RECEIVER , in correct and crash Max Records are ~ evenly/homogeneously distributed on the 6 Minor fragment Example of log data from crash execution on cluster: {noformat} 2020-01-14 08:22:33,681 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query with id 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a issued by anonymous: CREATE TABLE dfs.test.`output_pqt` AS ( SELECT R.parsed.host AS D FROM (SELECT parse_url(T.Url) AS parsed FROM dfs.test.`demo2.big.json` AS T) AS R ORDER BY D ) 2020-01-14 08:22:33,724 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:foreman] INFO o.a.d.e.p.s.h.CreateTableHandler - Creating persistent table [output_pqt]. 2020-01-14 08:22:33,779 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:3] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:3: State change requested AWAITING_ALLOCATION --> RUNNING 2020-01-14 08:22:33,779 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:7] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:7: State change requested AWAITING_ALLOCATION --> RUNNING 2020-01-14 08:22:33,779 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:5] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:5: State change requested AWAITING_ALLOCATION --> RUNNING 2020-01-14 08:22:33,780 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:7] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:7: State to report: RUNNING 2020-01-14 08:22:33,780 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:3] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:3: State to report: RUNNING 2020-01-14 08:22:33,780 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:5] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:5: State to report: RUNNING 2020-01-14 08:22:33,782 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:1:2] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:1:2: State change requested AWAITING_ALLOCATION --> RUNNING 2020-01-14 08:22:33,782 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:1:2] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:1:2: State to report: RUNNING 2020-01-14 08:22:33,787 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:0:0: State change requested AWAITING_ALLOCATION --> RUNNING 2020-01-14 08:22:33,787 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:0:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:0:0: State to report: RUNNING 2020-01-14 08:22:41,672 [BitServer-2] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:0:0: State change requested RUNNING --> CANCELLATION_REQUESTED 2020-01-14 08:22:41,673 [BitServer-2] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:0:0: State to report: CANCELLATION_REQUESTED 2020-01-14 08:22:41,674 [BitServer-2] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:1:2: State change requested RUNNING --> CANCELLATION_REQUESTED 2020-01-14 08:22:41,674 [BitServer-2] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:1:2: State to report: CANCELLATION_REQUESTED 2020-01-14 08:22:41,675 [BitServer-2] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:3: State change requested RUNNING --> CANCELLATION_REQUESTED 2020-01-14 08:22:41,675 [BitServer-2] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:3: State to report: CANCELLATION_REQUESTED 2020-01-14 08:22:41,677 [BitServer-2] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:5: State change requested RUNNING --> CANCELLATION_REQUESTED 2020-01-14 08:22:41,677 [BitServer-2] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:5: State to report: CANCELLATION_REQUESTED 2020-01-14 08:22:41,677 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:1:2] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:1:2: State change requested CANCELLATION_REQUESTED --> FINISHED 2020-01-14 08:22:41,677 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:1:2] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:1:2: State to report: CANCELLED 2020-01-14 08:22:41,678 [BitServer-2] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:7: State change requested RUNNING --> CANCELLATION_REQUESTED 2020-01-14 08:22:41,678 [BitServer-2] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:7: State to report: CANCELLATION_REQUESTED 2020-01-14 08:22:41,679 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:7] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:7: State change requested CANCELLATION_REQUESTED --> FINISHED 2020-01-14 08:22:41,679 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:7] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:7: State to report: CANCELLED 2020-01-14 08:22:41,679 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:7] WARN o.a.d.exec.rpc.control.WorkEventBus - A fragment message arrived but there was no registered listener for that message: profile ... (some other rows with A fragment message arrived but there was no registered listener for that message: profile )... {noformat} > memory leak parse_url function > ------------------------------ > > Key: DRILL-7449 > URL: https://issues.apache.org/jira/browse/DRILL-7449 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill > Affects Versions: 1.16.0 > Reporter: benj > Assignee: Igor Guzenko > Priority: Major > > Requests with *parse_url* works well when the number of treated rows is low > but produce memory leak when number of rows grows (~ between 500 000 and 1 > million) (and for certain number of row sometimes the request works and > sometimes it failed with memory leaks) > Extract from dataset tested: > {noformat} > {"Attributable":true,"Description":"Website has been identified as malicious > by > Bing","FirstReportedDateTime":"2018-03-12T18:49:38Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:49:38Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"172.217.8.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/beginilah-cara-orang-jepang-berpacaran.html","Version":1.5} > {"Attributable":true,"Description":"Website has been identified as malicious > by > Bing","FirstReportedDateTime":"2018-03-12T18:14:51Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:14:51Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"216.58.192.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/cara-membuat-widget-slideshow-postingan.html","Version":1.5} > {noformat} > Request tested: > {code:sql} > ALTER SESSION SET `store.format`='parquet'; > ALTER SESSION SET `store.parquet.use_new_reader` = true; > ALTER SESSION SET `store.parquet.compression` = 'snappy'; > ALTER SESSION SET `drill.exec.functions.cast_empty_string_to_null`= true; > ALTER SESSION SET `store.json.all_text_mode` = true; > ALTER SESSION SET `exec.enable_union_type` = true; > ALTER SESSION SET `store.json.all_text_mode` = true; > CREATE TABLE dfs.test.`output_pqt` AS > ( > SELECT R.parsed.host AS Domain > FROM ( > SELECT parse_url(T.Url) AS parsed > FROM dfs.test.`file.json` AS T > ) AS R > ORDER BY Domain > ); > {code} > > Result when memory leak: > {noformat} > Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. > Memory leaked: (256) > Allocator(frag:3:0) 3000000/256/9337280/30000000000 (res/actual/peak/limit) > Fragment 3:0 > Please, refer to logs for more information. > [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010] > (java.lang.IllegalStateException) Memory was leaked by query. Memory > leaked: (256) > Allocator(frag:3:0) 3000000/256/9337280/30000000000 (res/actual/peak/limit) > org.apache.drill.exec.memory.BaseAllocator.close():520 > org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552 > org.apache.drill.exec.ops.FragmentContextImpl.close():546 > > org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386 > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():329 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 > java.lang.Thread.run():748 (state=,code=0) > java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Memory was leaked > by query. Memory leaked: (256) > Allocator(frag:3:0) 3000000/256/9337280/30000000000 (res/actual/peak/limit) > Fragment 3:0 > Please, refer to logs for more information. > [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010] > (java.lang.IllegalStateException) Memory was leaked by query. Memory > leaked: (256) > Allocator(frag:3:0) 3000000/256/9337280/30000000000 (res/actual/peak/limit) > org.apache.drill.exec.memory.BaseAllocator.close():520 > org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552 > org.apache.drill.exec.ops.FragmentContextImpl.close():546 > > org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386 > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():329 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 > java.lang.Thread.run():748 > at > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:538) > at > org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:610) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1278) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:58) > at > org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:667) > at > org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1107) > at > org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1118) > at > org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675) > at > org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:200) > at > org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156) > at > org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217) > at sqlline.Commands.execute(Commands.java:971) > at sqlline.Commands.sql(Commands.java:906) > at sqlline.SqlLine.dispatch(SqlLine.java:740) > at sqlline.SqlLine.runCommands(SqlLine.java:1691) > at sqlline.Commands.run(Commands.java:1513) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38) > at sqlline.SqlLine.dispatch(SqlLine.java:736) > at sqlline.SqlLine.initArgs(SqlLine.java:475) > at sqlline.SqlLine.begin(SqlLine.java:531) > at sqlline.SqlLine.start(SqlLine.java:270) > at sqlline.SqlLine.main(SqlLine.java:201) > Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM > ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (256) > Allocator(frag:3:0) 3000000/256/9337280/30000000000 (res/actual/peak/limit) > Fragment 3:0 > Please, refer to logs for more information. > [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010] > (java.lang.IllegalStateException) Memory was leaked by query. Memory > leaked: (256) > Allocator(frag:3:0) 3000000/256/9337280/30000000000 (res/actual/peak/limit) > org.apache.drill.exec.memory.BaseAllocator.close():520 > org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552 > org.apache.drill.exec.ops.FragmentContextImpl.close():546 > > org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386 > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():329 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1149 > java.util.concurrent.ThreadPoolExecutor$Worker.run():624 > java.lang.Thread.run():748 > at > org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123) > at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422) > at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96) > at > org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273) > at > org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > at > io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.IllegalStateException: Memory was leaked by query. > Memory leaked: (256) > Allocator(frag:3:0) 3000000/256/9337280/30000000000 (res/actual/peak/limit) > at > org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:520) > at > org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:552) > at > org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:546) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:386) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:214) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:329) > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > at .......(:0 > {noformat} > Note that same request including some other field and excluding use of > _parse_url_ works perfectly well. > -- This message was sent by Atlassian Jira (v8.3.4#803005)