[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019540#comment-17019540 ]

benj commented on DRILL-7449:
-----------------------------

[~arina], I would like to, but it's not possible: it's not a problem of size but a regulatory content issue.

> memory leak parse_url function
> ------------------------------
>
>                 Key: DRILL-7449
>                 URL: https://issues.apache.org/jira/browse/DRILL-7449
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.16.0
>            Reporter: benj
>            Assignee: Igor Guzenko
>            Priority: Major
>         Attachments: embedded_FullJsonProfile.txt, embedded_sqlline.log.txt, embedded_sqlline_with_enable_debug_logging.log.txt
>
>
> Requests with *parse_url* work well when the number of processed rows is low but produce a memory leak when the number of rows grows (roughly between 500 000 and 1 million). For certain row counts the request sometimes works and sometimes fails with a memory leak.
> Extract from the dataset tested:
> {noformat}
> {"Attributable":true,"Description":"Website has been identified as malicious by Bing","FirstReportedDateTime":"2018-03-12T18:49:38Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:49:38Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"172.217.8.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/beginilah-cara-orang-jepang-berpacaran.html","Version":1.5}
> {"Attributable":true,"Description":"Website has been identified as malicious by Bing","FirstReportedDateTime":"2018-03-12T18:14:51Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:14:51Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"216.58.192.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/cara-membuat-widget-slideshow-postingan.html","Version":1.5}
> {noformat}
> Request tested:
> {code:sql}
> ALTER SESSION SET `store.format`='parquet';
> ALTER SESSION SET `store.parquet.use_new_reader` = true;
> ALTER SESSION SET `store.parquet.compression` = 'snappy';
> ALTER SESSION SET `drill.exec.functions.cast_empty_string_to_null`= true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> ALTER SESSION SET `exec.enable_union_type` = true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> CREATE TABLE dfs.test.`output_pqt` AS
> (
>     SELECT R.parsed.host AS Domain
>     FROM (
>         SELECT parse_url(T.Url) AS parsed
>         FROM dfs.test.`file.json` AS T
>     ) AS R
>     ORDER BY Domain
> );
> {code}
>
> Result when the memory leak occurs:
> {noformat}
> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
>     org.apache.drill.exec.memory.BaseAllocator.close():520
>     org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
>     org.apache.drill.exec.ops.FragmentContextImpl.close():546
>     org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
>     org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():329
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748 (state=,code=0)
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
>     org.apache.drill.exec.memory.BaseAllocator.close():520
>     org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
>     org.apache.drill.exec.ops.FragmentContextImpl.close():546
>
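The allocator line in the error above packs four counters into one slash-separated field. As a reading aid, here is a minimal Python sketch that splits such a line into named values. It is not part of Drill; the function name and the regular expression are assumptions made for illustration, and the field labels are taken verbatim from the `(res/actual/peak/limit)` suffix of the message.

```python
import re

# Matches e.g. "Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)".
# The pattern is an assumption based only on the error text quoted in this issue.
ALLOCATOR_RE = re.compile(
    r"Allocator\((?P<name>[^)]+)\)\s+"
    r"(?P<res>\d+)/(?P<actual>\d+)/(?P<peak>\d+)/(?P<limit>\d+)"
)


def parse_allocator_line(line):
    """Return the allocator name and its four counters, or None if no match."""
    match = ALLOCATOR_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    result = {"name": fields["name"]}
    for key in ("res", "actual", "peak", "limit"):
        result[key] = int(fields[key])
    return result
```

Applied to the message in this report, the `actual` value (256) equals the "Memory leaked: (256)" figure, while `peak` is 9337280.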
[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019510#comment-17019510 ]

Arina Ielchiieva commented on DRILL-7449:
-----------------------------------------

[~benj641] why not provide the file itself? If it's too big to be attached, you can always use file share system.
[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019508#comment-17019508 ]

benj commented on DRILL-7449:
-----------------------------

Hi [~IhorHuzenko],
I realized that the problem may come from the input passed to _parse_url_. With a strict repetition of the 2 rows extracted at the beginning, I can't reproduce the problem. But I have isolated typical rows (from the big original data) that can produce the problem when there are many of them.
Other examples of possible rows:
{noformat}
{"Attributable":true,"Description":"Website has been identified as malicious by Bing","FirstReportedDateTime":"2018-03-12T17:40:01Z","IndicatorExpirationDateTime":"2018-04-11T23:39:23Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T17:40:01Z","NetworkDestinationAsn":0,"NetworkDestinationIPv4":"255.255.255.255","NetworkDestinationPort":80,"Tags":["??"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://www.guruvittal.org/lzp/gets.php?hl=Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%83Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%82Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¹Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%83Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%9AÃ%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%82Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82©Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%83Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%82Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82³-Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%83Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%9AÃ%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%82Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82©Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%83Ã%C2%83Â%C2%83
Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%82Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82³-Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%83Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%82Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82²Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%83Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%99Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%82Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%86","Version":1.5} {"Attributable":true,"Description":"Website has been identified as malicious by Bing","FirstReportedDateTime":"2018-03-12T17:54:33Z","IndicatorExpirationDateTime":"2018-04-11T23:39:23Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T17:54:33Z","NetworkDestinationAsn":0,"NetworkDestinationIPv4":"255.255.255.255","NetworkDestinationPort":80,"Tags":["??"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://www.guruvittal.org/lzp/gets.php?hl=Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82·Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82¯Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¿Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82½?Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82²-Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82¯Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¿Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82½?Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¨Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82¯Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¿Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82½?Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â
%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82±Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82¯Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¿Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82½?-Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¹Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82±Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¨","Version":1.5} {"Attributable":true,"Description":"Website has been identified as malicious by
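Since the comment above says the leak only shows up when many such rows are present, a test input can be built by repeating a few isolated rows. The following Python sketch shows one way to do that; the function name, the output path and the row count are hypothetical, and nothing here has been confirmed to reproduce the leak.

```python
import itertools
import json


def write_repeated_rows(sample_rows, total_rows, path):
    """Write total_rows JSON lines to path by cycling through sample_rows.

    sample_rows is a list of dicts (the isolated rows); returns the number
    of lines written.
    """
    if not sample_rows:
        raise ValueError("sample_rows must contain at least one row")
    written = 0
    with open(path, "w", encoding="utf-8") as out:
        for row in itertools.islice(itertools.cycle(sample_rows), total_rows):
            # ensure_ascii=False keeps non-ASCII characters in the URLs as they are
            out.write(json.dumps(row, ensure_ascii=False))
            out.write("\n")
            written += 1
    return written
```

For example, `write_repeated_rows(rows, 1_000_000, "/tmp/repro.json")` would produce a file in the size range mentioned in the issue description, where `rows` holds the rows quoted in the comment and the path is an assumption.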
[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018156#comment-17018156 ]

Igor Guzenko commented on DRILL-7449:
-------------------------------------

Hello [~benj641], from the attached log I found that the thread for fragment 3.0 failed while closing the allocator for HashPartitionSender. I guess it's something between the second project and the sort operator in your physical plan (maybe HashToRandomExchange...).
{code}
2020-01-16 10:34:11,457 [21dfc400-1fab-4eca-ce7d-babb333b1ce6:frag:3:0] DEBUG o.a.d.exec.ops.OperatorContextImpl - Closing context for org.apache.drill.exec.physical.config.Project
2020-01-16 10:34:11,457 [21dfc400-1fab-4eca-ce7d-babb333b1ce6:frag:3:0] DEBUG o.a.d.exec.ops.OperatorContextImpl - Closing context for org.apache.drill.exec.physical.config.Project
2020-01-16 10:34:11,457 [21dfc400-1fab-4eca-ce7d-babb333b1ce6:frag:3:0] DEBUG o.a.d.exec.ops.OperatorContextImpl - Closing context for org.apache.drill.exec.physical.config.HashPartitionSender
2020-01-16 10:34:11,457 [21dfc400-1fab-4eca-ce7d-babb333b1ce6:frag:3:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 21dfc400-1fab-4eca-ce7d-babb333b1ce6:3:0: State change requested RUNNING --> FAILED
{code}
Since I'm not an expert in Drill's custom memory management (aka BufferAllocator and related things), I can't guarantee that I'll fix the issue quickly without a repro on my machine. I hope I'll have some time to spend on the issue and find the potential causes of the problem.
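The log reading described in the comment above (find which operator contexts a fragment thread closed just before it switched to FAILED) can be automated for larger log files. The Python sketch below is one possible way, assuming the line layout of the excerpt quoted in that comment; the function name and the matching strings are illustrative choices, not a Drill tool.

```python
import re
from collections import defaultdict

# First bracketed token on a line is taken to be the thread name,
# e.g. "[21dfc400-...:frag:3:0]" (an assumption based on the quoted excerpt).
THREAD_RE = re.compile(r"\[(?P<thread>[^\]]+)\]")
CLOSING = "Closing context for "
FAILED = "--> FAILED"

SAMPLE_LOG = [
    "2020-01-16 10:34:11,457 [21dfc400-1fab-4eca-ce7d-babb333b1ce6:frag:3:0] DEBUG o.a.d.exec.ops.OperatorContextImpl - Closing context for org.apache.drill.exec.physical.config.Project",
    "2020-01-16 10:34:11,457 [21dfc400-1fab-4eca-ce7d-babb333b1ce6:frag:3:0] DEBUG o.a.d.exec.ops.OperatorContextImpl - Closing context for org.apache.drill.exec.physical.config.Project",
    "2020-01-16 10:34:11,457 [21dfc400-1fab-4eca-ce7d-babb333b1ce6:frag:3:0] DEBUG o.a.d.exec.ops.OperatorContextImpl - Closing context for org.apache.drill.exec.physical.config.HashPartitionSender",
    "2020-01-16 10:34:11,457 [21dfc400-1fab-4eca-ce7d-babb333b1ce6:frag:3:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 21dfc400-1fab-4eca-ce7d-babb333b1ce6:3:0: State change requested RUNNING --> FAILED",
]


def closed_before_failure(lines):
    """Map each failed thread to the operator contexts it closed beforehand."""
    closed = defaultdict(list)
    failed = {}
    for line in lines:
        match = THREAD_RE.search(line)
        if match is None:
            continue
        thread = match.group("thread")
        if thread in failed:
            continue  # only the first FAILED transition per thread is recorded
        if CLOSING in line:
            class_name = line.split(CLOSING, 1)[1].strip()
            closed[thread].append(class_name.rsplit(".", 1)[-1])
        elif FAILED in line:
            failed[thread] = list(closed[thread])
    return failed
```

On the four sample lines, the last context closed on thread `frag:3:0` before the state change is `HashPartitionSender`, which matches the observation in the comment.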
[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016797#comment-17016797 ]

benj commented on DRILL-7449:
-----------------------------

Hi [~IhorHuzenko], I have enabled debug logging and the result is here: [^embedded_sqlline_with_enable_debug_logging.log.txt]
[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016168#comment-17016168 ]

Igor Guzenko commented on DRILL-7449:
-------------------------------------

Hi [~benj641], unfortunately the attached log contains very little information about the prerequisites of the error. From the query profile it seems that at some point major fragment 3.0 failed and then a cancellation request was sent to the other operators above. Could you please enable debug logging in your embedded Drill and share the new log file with detailed error info?
[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016064#comment-17016064 ]

benj commented on DRILL-7449:
-----------------------------

[~IhorHuzenko], please find attached (execution with the leak, from my local machine running embedded 1.17 on xubuntu 18.04):
 - [^embedded_FullJsonProfile.txt]
 - [^embedded_sqlline.log.txt]

The physical plan:
{noformat}
00-00    Screen : rowType = RecordType(VARCHAR(255) Fragment, BIGINT Number of records written): rowcount = 5845417.0, cumulative cost = {7.07295457E7 rows, 7.424585516618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 739
00-01      Project(Fragment=[$0], Number of records written=[$1]) : rowType = RecordType(VARCHAR(255) Fragment, BIGINT Number of records written): rowcount = 5845417.0, cumulative cost = {7.0145004E7 rows, 7.418740099618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 738
00-02        Writer : rowType = RecordType(VARCHAR(255) Fragment, BIGINT Number of records written): rowcount = 5845417.0, cumulative cost = {6.4299587E7 rows, 7.301831759618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 737
00-03          ProjectAllowDup(Domain=[$0]) : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {5.845417E7 rows, 7.243377589618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 736
00-04            Project(Domain=[$0]) : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {5.2608753E7 rows, 7.184923419618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 735
00-05              SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {4.6763336E7 rows, 7.126469249618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 734
01-01                OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {4.0917919E7 rows, 6.658835889618018E8 cpu, 5.985707595E9 io, 2.3942828032E10 network, 4.6763336E7 memory}, id = 733
02-01                  SelectionVectorRemover : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {3.5072502E7 rows, 6.600381719618018E8 cpu, 5.985707595E9 io, 2.3942828032E10 network, 4.6763336E7 memory}, id = 732
02-02                    Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {2.9227085E7 rows, 6.541927549618018E8 cpu, 5.985707595E9 io, 2.3942828032E10 network, 4.6763336E7 memory}, id = 731
02-03                      HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {2.3381668E7 rows, 1.28599174E8 cpu, 5.985707595E9 io, 2.3942828032E10 network, 0.0 memory}, id = 730
03-01                        Project(Domain=[ITEM($0, 'host')]) : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {1.7536251E7 rows, 3.5072502E7 cpu, 5.985707595E9 io, 0.0 network, 0.0 memory}, id = 729
03-02                          Project(parsed=[PARSE_URL($0)]) : rowType = RecordType(ANY parsed): rowcount = 5845417.0, cumulative cost = {1.1690834E7 rows, 2.9227085E7 cpu, 5.985707595E9 io, 0.0 network, 0.0 memory}, id = 728
03-03                            Scan(table=[[dfs, tmp, fbingredagg.bigcopy.json]], groupscan=[EasyGroupScan [selectionRoot=file:/tmp/fbingredagg.bigcopy.json, numFiles=1, columns=[`Url`], files=[file:/tmp/fbingredagg.bigcopy.json], schema=null]]) : rowType = RecordType(ANY Url): rowcount = 5845417.0, cumulative cost = {5845417.0 rows, 5845417.0 cpu, 5.985707595E9 io, 0.0 network, 0.0 memory}, id = 727
{noformat}

And the operator profile.
Note that the Rows column shows 8 695 808 although the file contains 8 999 940 rows.
{noformat}
Operator ID  Type               Avg Setup Time  Max Setup Time  Avg Process Time  Max Process Time  Min Wait Time  Avg Wait Time  Max Wait Time  % Fragment Time  % Query Time  Rows  Avg Peak Memory  Max Peak Memory
00-xx-00     SCREEN             0,000s          0,000s          0,000s            0,000s            0,000s         0,000s         0,000s         0,94%            0,00%         0     -                -
00-xx-01     PROJECT            0,000s          0,000s          0,000s            0,000s            0,000s         0,000s         0,000s         2,37%            0,00%         0     -                -
00-xx-02     PARQUET_WRITER     0,000s          0,000s          0,000s            0,000s            0,000s         0,000s         0,000s         6,08%            0,00%         0     -                -
00-xx-03     PROJECT_ALLOW_DUP  0,000s          0,000s          0,000s            0,000s            0,000s         0,000s         0,000s         16,61%           0,00%         0     52KB             52KB
00-xx-04     PROJECT            0,001s          0,001s          0,000s            0,000s            0,000s         0,000s         0,000s         35,03%           0,00%         0     52KB             52KB
00-xx-05     MERGING_RECEIVER   0,000s          0,000s          0,000s            0,000s            40,382s        40,382s        40,382s        38,96%           0,00%         0     52KB             52KB
01-xx-00     SINGLE_SENDER      0,000s          0,000s          0,000s            0,000s            0,001s
[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016011#comment-17016011 ]

Igor Guzenko commented on DRILL-7449:
-------------------------------------

Hello [~benj641],

I'm still trying to reproduce the leak; I've made about 40 unsuccessful attempts so far. Could you please attach all available log files and the full query profile JSON to this Jira?
[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014960#comment-17014960 ]

benj commented on DRILL-7449:
-----------------------------

Hi [~IhorHuzenko],

The problem doesn't appear on every run. Sometimes, with exactly the same data, the query works 5 times before it crashes.

Setup: the official 1.17 release on a small 3-node cluster (each node ~48 procs / 128 GB; DRILL_HEAP=15G, DRILL_MAX_DIRECT_MEMORY=80G), with a file of 688 MB / 1 118 320 JSON records.

On the cluster, when comparing the profiles of correct and crashed executions, I can see that:
- the crash appears at the "02-xx-02 - EXTERNAL_SORT" level
- on "02-xx-03 - UNORDERED_RECEIVER":
  - correct execution: 99% of the Max Records are concentrated on 1 of the 8 minor fragments, and the cumulative total is correct
  - crashed execution: the Max Records are roughly evenly distributed over the 8 minor fragments, and the cumulative total is incorrect (lower); it is already incorrect in 03-xx-02 - PROJECT and 03-xx-00 - JSON_SUB_SCAN

On my local machine (also 1.17; 8 procs / 32 GB), in embedded mode, when comparing the profiles of correct and crashed executions, I can see that:
- the crash appears at the "02-xx-02 - EXTERNAL_SORT" level
- the difference is in 03-xx-00 - JSON_SUB_SCAN: the crashed execution doesn't have the right number of Max Records
- for 02-xx-03 - UNORDERED_RECEIVER, in both correct and crashed executions the Max Records are roughly evenly distributed over the 6 minor fragments

Example of log data from a crashed execution on the cluster:
{noformat}
2020-01-14 08:22:33,681 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query with id 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a issued by anonymous: CREATE TABLE dfs.test.`output_pqt` AS ( SELECT R.parsed.host AS D FROM (SELECT parse_url(T.Url) AS parsed FROM dfs.test.`demo2.big.json` AS T) AS R ORDER BY D )
2020-01-14 08:22:33,724 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:foreman] INFO o.a.d.e.p.s.h.CreateTableHandler - Creating persistent table [output_pqt].
2020-01-14 08:22:33,779 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:3] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:3: State change requested AWAITING_ALLOCATION --> RUNNING
2020-01-14 08:22:33,779 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:7] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:7: State change requested AWAITING_ALLOCATION --> RUNNING
2020-01-14 08:22:33,779 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:5] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:5: State change requested AWAITING_ALLOCATION --> RUNNING
2020-01-14 08:22:33,780 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:7] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:7: State to report: RUNNING
2020-01-14 08:22:33,780 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:3] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:3: State to report: RUNNING
2020-01-14 08:22:33,780 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:5] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:5: State to report: RUNNING
2020-01-14 08:22:33,782 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:1:2] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:1:2: State change requested AWAITING_ALLOCATION --> RUNNING
2020-01-14 08:22:33,782 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:1:2] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:1:2: State to report: RUNNING
2020-01-14 08:22:33,787 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:0:0: State change requested AWAITING_ALLOCATION --> RUNNING
2020-01-14 08:22:33,787 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:0:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:0:0: State to report: RUNNING
2020-01-14 08:22:41,672 [BitServer-2] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:0:0: State change requested RUNNING --> CANCELLATION_REQUESTED
2020-01-14 08:22:41,673 [BitServer-2] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:0:0: State to report: CANCELLATION_REQUESTED
2020-01-14 08:22:41,674 [BitServer-2] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:1:2: State change requested RUNNING --> CANCELLATION_REQUESTED
2020-01-14 08:22:41,674 [BitServer-2] INFO o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:1:2: State to report: CANCELLATION_REQUESTED
2020-01-14 08:22:41,675 [BitServer-2] INFO o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:3: State change requested RUNNING --> CANCELLATION_REQUESTED
2020-01-14 08:22:41,675
[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014414#comment-17014414 ]

Igor Guzenko commented on DRILL-7449:
-------------------------------------

Hello [~benj641],

Unfortunately, I wasn't able to reproduce the issue from the description. I generated 500k, 1 million, and 10 million JSON rows with unique URLs and used Drill 1.16 and the latest master build in embedded mode. Both versions of Drill handled the query from the description without problems.

Could you please share more details about your query (query profile) and cluster topology?

Thank you in advance,
Igor
[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014216#comment-17014216 ]

benj commented on DRILL-7449:
-----------------------------

I did a full check, and in reality we haven't used drill-url-tools, because it sometimes produces incorrect values on big datasets (perhaps due to a memory problem caught inside the UDF?).

After some further tests, the standard Drill *parse_url* works well (no memory leak) +if the ORDER BY clause is removed+. Note that the memory leak can also appear with url_parse (from drill-url-tools) when an ORDER BY clause is used.

The only code that doesn't cause any critical problem for our use is a regexp of this type:
{code:sql}
SELECT REGEXP_REPLACE(Activity,'^(?:.*:.*@)?([^:]*)(?::.*)?$','$1') AS Host
FROM (SELECT REGEXP_REPLACE(NULLIF(Url, ''),'^(?:(?:[^:/?#]+):)?(?://([^/?#]*))(?:[^?#]*)?(?:.*)?','$1') AS Activity
      FROM ...)
{code}

I don't know why, but from what we observe, the ORDER BY clause produces a number of errors in different contexts with complex queries, and it's sometimes necessary to split the query into 2 distinct queries (one for the SELECT with computations and one for the SELECT with ORDER BY).

Note that with the regexp there is no error, even with the ORDER BY clause.
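For readers who want to see what the two nested regexps extract, here is a minimal sketch outside Drill. It is written in Python and assumes that Python's `re` module treats these two patterns the same way as the Java regex engine behind Drill's REGEXP_REPLACE; the helper name `extract_host` and the sample URL with credentials are hypothetical and only serve the illustration.

```python
import re

# Step 1 (inner REGEXP_REPLACE): keep only the authority part after "//".
AUTHORITY = re.compile(r'^(?:(?:[^:/?#]+):)?(?://([^/?#]*))(?:[^?#]*)?(?:.*)?')
# Step 2 (outer REGEXP_REPLACE): drop an optional "user:password@" and ":port".
HOST = re.compile(r'^(?:.*:.*@)?([^:]*)(?::.*)?$')


def extract_host(url):
    """Hypothetical helper mirroring the two nested REGEXP_REPLACE calls."""
    if url is None or url == '':
        # Mirrors NULLIF(Url, ''): an empty string is treated as NULL.
        return None
    authority = AUTHORITY.sub(r'\1', url)  # Python writes Java's $1 as \1
    return HOST.sub(r'\1', authority)


# URL taken from the dataset extract in the issue description.
print(extract_host('http://pasuruanbloggers.blogspot.ru/2012/12/beginilah-cara-orang-jepang-berpacaran.html'))
# prints: pasuruanbloggers.blogspot.ru

# Made-up URL with credentials and a port, to exercise the second pattern.
print(extract_host('http://user:secret@example.com:8080/path?q=1#top'))
# prints: example.com
```

A value without `//` does not match the first pattern, so the first step leaves it unchanged before the second pattern is applied.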
[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012893#comment-17012893 ]

Arina Ielchiieva commented on DRILL-7449:
-----------------------------------------

Thanks, this is a really interesting issue. [~benj641], please confirm that you don't see the memory leak when using url_parse from the external source.
[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012874#comment-17012874 ]

benj commented on DRILL-7449:
-----------------------------

In the meantime, in case it helps someone: I found this page [https://www.r-bloggers.com/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names/] and am now using the function _url_parse_ from [https://github.com/hrbrmstr/drill-url-tools], which uses [http://galimatias.mola.io/].
[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012841#comment-17012841 ]

Arina Ielchiieva commented on DRILL-7449:
-----------------------------------------

This should be investigated.

> memory leak parse_url function
> ------------------------------
>
>                 Key: DRILL-7449
>                 URL: https://issues.apache.org/jira/browse/DRILL-7449
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.16.0
>            Reporter: benj
>            Assignee: Igor Guzenko
>            Priority: Major
>
> Requests with *parse_url* work well when the number of processed rows is low
> but produce a memory leak when the number of rows grows (roughly between
> 500 000 and 1 million). For certain row counts, the request sometimes works
> and sometimes fails with a memory leak.
> Extract from the dataset tested:
> {noformat}
> {"Attributable":true,"Description":"Website has been identified as malicious by Bing","FirstReportedDateTime":"2018-03-12T18:49:38Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:49:38Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"172.217.8.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/beginilah-cara-orang-jepang-berpacaran.html","Version":1.5}
> {"Attributable":true,"Description":"Website has been identified as malicious by Bing","FirstReportedDateTime":"2018-03-12T18:14:51Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:14:51Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"216.58.192.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/cara-membuat-widget-slideshow-postingan.html","Version":1.5}
> {noformat}
> Request tested:
> {code:sql}
> ALTER SESSION SET `store.format`='parquet';
> ALTER SESSION SET `store.parquet.use_new_reader` = true;
> ALTER SESSION SET `store.parquet.compression` = 'snappy';
> ALTER SESSION SET `drill.exec.functions.cast_empty_string_to_null`= true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> ALTER SESSION SET `exec.enable_union_type` = true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> CREATE TABLE dfs.test.`output_pqt` AS
> (
> SELECT R.parsed.host AS Domain
> FROM (
>   SELECT parse_url(T.Url) AS parsed
>   FROM dfs.test.`file.json` AS T
> ) AS R
> ORDER BY Domain
> );
> {code}
>
> Result when the memory leak occurs:
> {noformat}
> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
>     org.apache.drill.exec.memory.BaseAllocator.close():520
>     org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
>     org.apache.drill.exec.ops.FragmentContextImpl.close():546
>     org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
>     org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():329
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>     java.lang.Thread.run():748 (state=,code=0)
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
>     org.apache.drill.exec.memory.BaseAllocator.close():520
>     org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
>     org.apache.drill.exec.ops.FragmentContextImpl.close():546
>     org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
>     org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
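
Since the reporter cannot share the original dataset, one possible way to attempt a reproduction is to generate a synthetic file with the same shape as the extract quoted in the issue. The sketch below is only an illustration: the helper names (`make_record`, `write_dataset`), the `example.com` hosts, and the reduced set of fields are assumptions, and nothing in this thread confirms that synthetic URLs trigger the same leak.

```python
import json
import random


def make_record(i: int, rng: random.Random) -> dict:
    """Build one synthetic record shaped like the extract in the issue."""
    return {
        "Attributable": True,
        "IndicatorProvider": "Bing",
        "IndicatorThreatType": "MaliciousUrl",
        "NetworkDestinationPort": 80,
        "Tags": ["us"],
        # Hypothetical host and path; example.com is reserved for documentation.
        "Url": f"http://host{rng.randrange(10_000)}.example.com/2012/12/page-{i}.html",
        "Version": 1.5,
    }


def write_dataset(path: str, rows: int, seed: int = 0) -> None:
    """Write `rows` records as newline-delimited JSON, one object per line."""
    rng = random.Random(seed)
    with open(path, "w", encoding="utf-8") as out:
        for i in range(rows):
            out.write(json.dumps(make_record(i, rng)) + "\n")
```

Calling `write_dataset("file.json", 1_000_000)` would produce a file in the row range where the issue reports failures; it could then be placed in the `dfs.test` workspace and queried with the request quoted above.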
[jira] [Commented] (DRILL-7449) memory leak parse_url function
[ https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012838#comment-17012838 ]

Charles Givre commented on DRILL-7449:
--------------------------------------

Do we know the underlying cause for this?