[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016064#comment-17016064
 ] 

benj commented on DRILL-7449:
-----------------------------

[~IhorHuzenko], please find in attachment (execution with leak from my local 
machine in embedded 1.17 on xubuntu 18.04):
 - [^embedded_FullJsonProfile.txt]
 - [^embedded_sqlline.log.txt]

The Physical plan:
{noformat}
00-00    Screen : rowType = RecordType(VARCHAR(255) Fragment, BIGINT Number of 
records written): rowcount = 5845417.0, cumulative cost = {7.07295457E7 rows, 
7.424585516618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 
memory}, id = 739
00-01      Project(Fragment=[$0], Number of records written=[$1]) : rowType = 
RecordType(VARCHAR(255) Fragment, BIGINT Number of records written): rowcount = 
5845417.0, cumulative cost = {7.0145004E7 rows, 7.418740099618018E8 cpu, 
5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 738
00-02        Writer : rowType = RecordType(VARCHAR(255) Fragment, BIGINT Number 
of records written): rowcount = 5845417.0, cumulative cost = {6.4299587E7 rows, 
7.301831759618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 
memory}, id = 737
00-03          ProjectAllowDup(Domain=[$0]) : rowType = RecordType(ANY Domain): 
rowcount = 5845417.0, cumulative cost = {5.845417E7 rows, 7.243377589618018E8 
cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 736
00-04            Project(Domain=[$0]) : rowType = RecordType(ANY Domain): 
rowcount = 5845417.0, cumulative cost = {5.2608753E7 rows, 7.184923419618018E8 
cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 735
00-05              SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY 
Domain): rowcount = 5845417.0, cumulative cost = {4.6763336E7 rows, 
7.126469249618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 
memory}, id = 734
01-01                OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY 
Domain): rowcount = 5845417.0, cumulative cost = {4.0917919E7 rows, 
6.658835889618018E8 cpu, 5.985707595E9 io, 2.3942828032E10 network, 4.6763336E7 
memory}, id = 733
02-01                  SelectionVectorRemover : rowType = RecordType(ANY 
Domain): rowcount = 5845417.0, cumulative cost = {3.5072502E7 rows, 
6.600381719618018E8 cpu, 5.985707595E9 io, 2.3942828032E10 network, 4.6763336E7 
memory}, id = 732
02-02                    Sort(sort0=[$0], dir0=[ASC]) : rowType = 
RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {2.9227085E7 
rows, 6.541927549618018E8 cpu, 5.985707595E9 io, 2.3942828032E10 network, 
4.6763336E7 memory}, id = 731
02-03                      HashToRandomExchange(dist0=[[$0]]) : rowType = 
RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {2.3381668E7 
rows, 1.28599174E8 cpu, 5.985707595E9 io, 2.3942828032E10 network, 0.0 memory}, 
id = 730
03-01                        Project(Domain=[ITEM($0, 'host')]) : rowType = 
RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {1.7536251E7 
rows, 3.5072502E7 cpu, 5.985707595E9 io, 0.0 network, 0.0 memory}, id = 729
03-02                          Project(parsed=[PARSE_URL($0)]) : rowType = 
RecordType(ANY parsed): rowcount = 5845417.0, cumulative cost = {1.1690834E7 
rows, 2.9227085E7 cpu, 5.985707595E9 io, 0.0 network, 0.0 memory}, id = 728
03-03                            Scan(table=[[dfs, tmp, 
fbingredagg.bigcopy.json]], groupscan=[EasyGroupScan 
[selectionRoot=file:/tmp/fbingredagg.bigcopy.json, numFiles=1, columns=[`Url`], 
files=[file:/tmp/fbingredagg.bigcopy.json], schema=null]]) : rowType = 
RecordType(ANY Url): rowcount = 5845417.0, cumulative cost = {5845417.0 rows, 
5845417.0 cpu, 5.985707595E9 io, 0.0 network, 0.0 memory}, id = 727
{noformat}
And the Operator profile
 Note that Rows are 8 695 808 although in the file there is 8 999 940 rows
{noformat}
Operator ID     Type    Avg Setup Time  Max Setup Time  Avg Process Time        
Max Process Time        Min Wait Time   Avg Wait Time   Max Wait Time   % 
Fragment Time % Query Time    Rows    Avg Peak Memory Max Peak Memory
00-xx-00        SCREEN  0,000s  0,000s  0,000s  0,000s  0,000s  0,000s  0,000s  
0,94%   0,00%   0       -       -
00-xx-01        PROJECT 0,000s  0,000s  0,000s  0,000s  0,000s  0,000s  0,000s  
2,37%   0,00%   0       -       -
00-xx-02        PARQUET_WRITER  0,000s  0,000s  0,000s  0,000s  0,000s  0,000s  
0,000s  6,08%   0,00%   0       -       -
00-xx-03        PROJECT_ALLOW_DUP       0,000s  0,000s  0,000s  0,000s  0,000s  
0,000s  0,000s  16,61%  0,00%   0       52KB    52KB
00-xx-04        PROJECT 0,001s  0,001s  0,000s  0,000s  0,000s  0,000s  0,000s  
35,03%  0,00%   0       52KB    52KB
00-xx-05        MERGING_RECEIVER        0,000s  0,000s  0,000s  0,000s  40,382s 
40,382s 40,382s 38,96%  0,00%   0       52KB    52KB
01-xx-00        SINGLE_SENDER   0,000s  0,000s  0,000s  0,000s  0,001s  0,001s  
0,001s  27,14%  0,00%   0       -       -
01-xx-01        MERGING_RECEIVER        0,000s  0,000s  0,000s  0,000s  40,390s 
40,390s 40,390s 72,86%  0,00%   0       52KB    52KB
02-xx-00        SINGLE_SENDER   0,000s  0,000s  0,000s  0,000s  0,000s  0,000s  
0,000s  0,03%   0,00%   0       -       -
02-xx-01        SELECTION_VECTOR_REMOVER        0,000s  0,000s  0,000s  0,000s  
0,000s  0,000s  0,000s  0,02%   0,00%   0       52KB    52KB
02-xx-02        EXTERNAL_SORT   0,000s  0,000s  0,151s  0,204s  0,000s  0,000s  
0,000s  90,81%  2,46%   8 999 940       63MB    78MB
02-xx-03        UNORDERED_RECEIVER      0,000s  0,000s  0,015s  0,021s  40,116s 
40,174s 40,206s 9,14%   0,25%   8 999 940       64KB    64KB
03-xx-00        JSON_SUB_SCAN   0,000s  0,000s  9,860s  19,691s 0,440s  1,094s  
1,749s  54,79%  53,30%  17 391 616      3MB     6MB
03-xx-01        PROJECT 0,001s  0,001s  0,124s  0,124s  0,000s  0,000s  0,000s  
0,34%   0,34%   8 695 808       292KB   292KB
03-xx-02        PROJECT 0,001s  0,001s  16,149s 16,149s 0,000s  0,000s  0,000s  
44,87%  43,65%  8 695 808       2MB     2MB
{noformat}

hoping it would help

> memory leak parse_url function
> ------------------------------
>
>                 Key: DRILL-7449
>                 URL: https://issues.apache.org/jira/browse/DRILL-7449
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.16.0
>            Reporter: benj
>            Assignee: Igor Guzenko
>            Priority: Major
>         Attachments: embedded_FullJsonProfile.txt, embedded_sqlline.log.txt
>
>
> Requests with *parse_url* works well when the number of treated rows is low 
> but produce memory leak when number of rows grows (~ between 500 000 and 1 
> million) (and for certain number of row sometimes the request works and 
> sometimes it failed with memory leaks)
> Extract from dataset tested:
> {noformat}
> {"Attributable":true,"Description":"Website has been identified as malicious 
> by 
> Bing","FirstReportedDateTime":"2018-03-12T18:49:38Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:49:38Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"172.217.8.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/beginilah-cara-orang-jepang-berpacaran.html","Version":1.5}
> {"Attributable":true,"Description":"Website has been identified as malicious 
> by 
> Bing","FirstReportedDateTime":"2018-03-12T18:14:51Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:14:51Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"216.58.192.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/cara-membuat-widget-slideshow-postingan.html","Version":1.5}
> {noformat}
> Request tested:
> {code:sql}
> ALTER SESSION SET `store.format`='parquet';
> ALTER SESSION SET `store.parquet.use_new_reader` = true;
> ALTER SESSION SET `store.parquet.compression` = 'snappy';
> ALTER SESSION SET `drill.exec.functions.cast_empty_string_to_null`= true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> ALTER SESSION SET `exec.enable_union_type` = true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> CREATE TABLE dfs.test.`output_pqt` AS
> (
> SELECT R.parsed.host AS Domain
> FROM ( 
>   SELECT parse_url(T.Url) AS parsed
>   FROM dfs.test.`file.json` AS T
> ) AS R 
> ORDER BY Domain
> );
> {code}
>  
>  Result when memory leak:
> {noformat}
> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. 
> Memory leaked: (256)
> Allocator(frag:3:0) 3000000/256/9337280/30000000000 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory 
> leaked: (256)
> Allocator(frag:3:0) 3000000/256/9337280/30000000000 (res/actual/peak/limit)
>     org.apache.drill.exec.memory.BaseAllocator.close():520
>     org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
>     org.apache.drill.exec.ops.FragmentContextImpl.close():546
>     
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
>     org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():329
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748 (state=,code=0)
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Memory was leaked 
> by query. Memory leaked: (256)
> Allocator(frag:3:0) 3000000/256/9337280/30000000000 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory 
> leaked: (256)
> Allocator(frag:3:0) 3000000/256/9337280/30000000000 (res/actual/peak/limit)
>     org.apache.drill.exec.memory.BaseAllocator.close():520
>     org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
>     org.apache.drill.exec.ops.FragmentContextImpl.close():546
>     
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
>     org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():329
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748
>       at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:538)
>       at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:610)
>       at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1278)
>       at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:58)
>       at 
> org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:667)
>       at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1107)
>       at 
> org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1118)
>       at 
> org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675)
>       at 
> org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:200)
>       at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
>       at 
> org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217)
>       at sqlline.Commands.execute(Commands.java:971)
>       at sqlline.Commands.sql(Commands.java:906)
>       at sqlline.SqlLine.dispatch(SqlLine.java:740)
>       at sqlline.SqlLine.runCommands(SqlLine.java:1691)
>       at sqlline.Commands.run(Commands.java:1513)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38)
>       at sqlline.SqlLine.dispatch(SqlLine.java:736)
>       at sqlline.SqlLine.initArgs(SqlLine.java:475)
>       at sqlline.SqlLine.begin(SqlLine.java:531)
>       at sqlline.SqlLine.start(SqlLine.java:270)
>       at sqlline.SqlLine.main(SqlLine.java:201)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
> ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (256)
> Allocator(frag:3:0) 3000000/256/9337280/30000000000 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory 
> leaked: (256)
> Allocator(frag:3:0) 3000000/256/9337280/30000000000 (res/actual/peak/limit)
>     org.apache.drill.exec.memory.BaseAllocator.close():520
>     org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
>     org.apache.drill.exec.ops.FragmentContextImpl.close():546
>     
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
>     org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():329
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748
>       at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>       at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422)
>       at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96)
>       at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273)
>       at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243)
>       at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>       at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>       at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>       at 
> io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
>       at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>       at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>       at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>       at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>       at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
>       at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>       at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>       at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>       at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>       at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Memory was leaked by query. 
> Memory leaked: (256)
> Allocator(frag:3:0) 3000000/256/9337280/30000000000 (res/actual/peak/limit)
>       at 
> org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:520)
>       at 
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:552)
>       at 
> org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:546)
>       at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:386)
>       at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:214)
>       at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:329)
>       at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>       at .......(:0
> {noformat}
> Note that same request including some other field and excluding use of 
> _parse_url_ works perfectly well.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to