[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-20 Thread benj (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019540#comment-17019540
 ] 

benj commented on DRILL-7449:
-

[~arina], I would like to, but it's not possible: the problem is not the size of 
the file but its content, which is subject to regulatory restrictions.

> memory leak parse_url function
> --
>
> Key: DRILL-7449
> URL: https://issues.apache.org/jira/browse/DRILL-7449
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.16.0
>Reporter: benj
>Assignee: Igor Guzenko
>Priority: Major
> Attachments: embedded_FullJsonProfile.txt, embedded_sqlline.log.txt, 
> embedded_sqlline_with_enable_debug_logging.log.txt
>
>
> Requests with *parse_url* work well when the number of processed rows is low 
> but produce a memory leak when the number of rows grows (roughly between 
> 500 000 and 1 million). For some row counts the request sometimes succeeds 
> and sometimes fails with a memory leak.
> Extract from the tested dataset:
> {noformat}
> {"Attributable":true,"Description":"Website has been identified as malicious by Bing","FirstReportedDateTime":"2018-03-12T18:49:38Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:49:38Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"172.217.8.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/beginilah-cara-orang-jepang-berpacaran.html","Version":1.5}
> {"Attributable":true,"Description":"Website has been identified as malicious by Bing","FirstReportedDateTime":"2018-03-12T18:14:51Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:14:51Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"216.58.192.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/cara-membuat-widget-slideshow-postingan.html","Version":1.5}
> {noformat}
> Request tested:
> {code:sql}
> ALTER SESSION SET `store.format`='parquet';
> ALTER SESSION SET `store.parquet.use_new_reader` = true;
> ALTER SESSION SET `store.parquet.compression` = 'snappy';
> ALTER SESSION SET `drill.exec.functions.cast_empty_string_to_null`= true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> ALTER SESSION SET `exec.enable_union_type` = true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> CREATE TABLE dfs.test.`output_pqt` AS
> (
> SELECT R.parsed.host AS Domain
> FROM ( 
>   SELECT parse_url(T.Url) AS parsed
>   FROM dfs.test.`file.json` AS T
> ) AS R 
> ORDER BY Domain
> );
> {code}
>  
>  Result when the memory leak occurs:
> {noformat}
> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. 
> Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory 
> leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> org.apache.drill.exec.memory.BaseAllocator.close():520
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
> org.apache.drill.exec.ops.FragmentContextImpl.close():546
> 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():329
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748 (state=,code=0)
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Memory was leaked 
> by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory 
> leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> org.apache.drill.exec.memory.BaseAllocator.close():520
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
> org.apache.drill.exec.ops.FragmentContextImpl.close():546
> 
> 
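
The allocator line in the error above packs four numbers into one slash-separated field. As a rough aid for reading such messages, here is a minimal Python sketch that splits the line into named values. The field names are taken verbatim from the `(res/actual/peak/limit)` label in the message; the function name `parse_allocator_line` is hypothetical and not part of Drill.

```python
import re

# Matches e.g. "Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)".
ALLOCATOR_RE = re.compile(
    r"Allocator\((?P<name>[^)]*)\)\s+"
    r"(?P<res>\d+)/(?P<actual>\d+)/(?P<peak>\d+)/(?P<limit>\d+)"
)


def parse_allocator_line(line):
    """Return the allocator name and its four counters, or None if no match."""
    match = ALLOCATOR_RE.search(line)
    if match is None:
        return None
    fields = {key: int(value)
              for key, value in match.groupdict().items()
              if key != "name"}
    fields["name"] = match.group("name")
    return fields
```

Applied to the line reported here, the `actual` value is 256, which is the same figure as in "Memory leaked: (256)".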

[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-20 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019510#comment-17019510
 ] 

Arina Ielchiieva commented on DRILL-7449:
-

[~benj641] why not provide the file itself? If it's too big to be attached, you 
can always use a file-sharing service.


[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-20 Thread benj (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019508#comment-17019508
 ] 

benj commented on DRILL-7449:
-

Hi [~IhorHuzenko]
I realized that the problem may come from the input passed to _parse_url_.
With a strict repetition of the 2 rows extracted at the beginning, I can't reproduce the 
problem.  
But I have isolated typical rows (from the big original data) that can produce the 
problem when there are many of them.
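
One way to turn a handful of isolated rows into a large test file is to cycle through them until the target row count is reached. The following is only a sketch under the assumption that the input is stored as one JSON object per line; the helper names `repro_lines` and `write_repro_file` are hypothetical.

```python
import itertools
import json


def repro_lines(sample_rows, target_rows):
    """Yield `target_rows` JSON lines by cycling through `sample_rows`."""
    if not sample_rows:
        raise ValueError("need at least one sample row")
    cycled = itertools.islice(itertools.cycle(sample_rows), target_rows)
    for row in cycled:
        # ensure_ascii=False keeps non-ASCII characters in the URL as they are
        # instead of rewriting them as \uXXXX escapes.
        yield json.dumps(row, ensure_ascii=False)


def write_repro_file(sample_rows, target_rows, path):
    """Write the generated lines to `path`, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as out:
        for line in repro_lines(sample_rows, target_rows):
            out.write(line + "\n")
```

Calling `write_repro_file(rows, 1_000_000, "repro.json")` with a few representative rows would give a file in the size range where the leak was reported. Whether it actually triggers the leak is not something this sketch can guarantee.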

Other examples of possible rows:
{noformat}
{"Attributable":true,"Description":"Website has been identified as malicious by 
Bing","FirstReportedDateTime":"2018-03-12T17:40:01Z","IndicatorExpirationDateTime":"2018-04-11T23:39:23Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T17:40:01Z","NetworkDestinationAsn":0,"NetworkDestinationIPv4":"255.255.255.255","NetworkDestinationPort":80,"Tags":["??"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://www.guruvittal.org/lzp/gets.php?hl=Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%83Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%82Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¹Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%83Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%9AÃ%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%82Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82©Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%83Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%82Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82³-Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%83Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%9AÃ%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%82Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82©Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%83Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%82Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82³-Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%83Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%82Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82²Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%83Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82Â%C2%99Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%82Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2
%82Â%C2%86","Version":1.5}
{"Attributable":true,"Description":"Website has been identified as malicious by 
Bing","FirstReportedDateTime":"2018-03-12T17:54:33Z","IndicatorExpirationDateTime":"2018-04-11T23:39:23Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T17:54:33Z","NetworkDestinationAsn":0,"NetworkDestinationIPv4":"255.255.255.255","NetworkDestinationPort":80,"Tags":["??"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://www.guruvittal.org/lzp/gets.php?hl=Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82·Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82¯Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¿Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82½?Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82²-Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82¯Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¿Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82½?Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¨Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82¯Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¿Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82½?Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82±Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82¯Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¿Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82½?-Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¹Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82±Ã%C2%83Â%C2%83Ã%C2%82Â%C2%83Ã%C2%83Â%C2%82Ã%C2%82Â%C2%98Ã%C2%83Â%C2%83Ã%C2%82Â%C2%82Ã%C2%83Â%C2%82Ã%C2%82¨","Version":1.5}
{"Attributable":true,"Description":"Website has been identified as malicious by 

[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-17 Thread Igor Guzenko (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018156#comment-17018156
 ] 

Igor Guzenko commented on DRILL-7449:
-

Hello [~benj641], from the attached log I found that the thread for fragment 3.0 
failed while closing the allocator for HashPartitionSender. I guess it's something 
between the second Project and the Sort operator in your physical plan (maybe 
HashToRandomExchange...). 
{code}
2020-01-16 10:34:11,457 [21dfc400-1fab-4eca-ce7d-babb333b1ce6:frag:3:0] DEBUG o.a.d.exec.ops.OperatorContextImpl - Closing context for org.apache.drill.exec.physical.config.Project
2020-01-16 10:34:11,457 [21dfc400-1fab-4eca-ce7d-babb333b1ce6:frag:3:0] DEBUG o.a.d.exec.ops.OperatorContextImpl - Closing context for org.apache.drill.exec.physical.config.Project
2020-01-16 10:34:11,457 [21dfc400-1fab-4eca-ce7d-babb333b1ce6:frag:3:0] DEBUG o.a.d.exec.ops.OperatorContextImpl - Closing context for org.apache.drill.exec.physical.config.HashPartitionSender

2020-01-16 10:34:11,457 [21dfc400-1fab-4eca-ce7d-babb333b1ce6:frag:3:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 21dfc400-1fab-4eca-ce7d-babb333b1ce6:3:0: State change requested RUNNING --> FAILED
{code}
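
The same reading of the log can be done mechanically on a larger file. Below is a minimal Python sketch that collects, per thread, the operator contexts closed before a "--> FAILED" state change. It assumes each log entry sits on a single line and that the thread name is the first bracketed token, as in the excerpt above; the function name `closed_before_failure` is hypothetical.

```python
import re

# The first [...] token on a log line is taken to be the thread name,
# e.g. "21dfc400-...:frag:3:0".
THREAD_RE = re.compile(r"\[([^\]]+)\]")
CLOSE_MARKER = "Closing context for"


def closed_before_failure(lines):
    """Map each failed thread to the operators it closed before failing."""
    closed = {}   # thread name -> operator class names seen so far
    failed = {}   # thread name -> snapshot taken at the FAILED transition
    for line in lines:
        match = THREAD_RE.search(line)
        if match is None:
            continue
        thread = match.group(1)
        if CLOSE_MARKER in line:
            class_name = line.rsplit(CLOSE_MARKER, 1)[1].strip()
            # Keep only the simple class name, e.g. "HashPartitionSender".
            closed.setdefault(thread, []).append(class_name.rsplit(".", 1)[-1])
        elif "--> FAILED" in line:
            failed[thread] = list(closed.get(thread, []))
    return failed
```

The last entry in each list is the operator whose context was closed immediately before the failure was logged, which is only a hint about where to look, not proof of the leaking operator.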

Since I'm not an expert in Drill's custom memory management (aka 
BufferAllocator and related things), I can't guarantee that I'll fix the issue 
quickly without a repro on my machine.  I hope I'll have some time to 
spend on the issue and find potential causes of the problem. 


[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-16 Thread benj (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016797#comment-17016797
 ] 

benj commented on DRILL-7449:
-

Hi [~IhorHuzenko], I have enabled debug logging and the result is here:  
[^embedded_sqlline_with_enable_debug_logging.log.txt]


[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-15 Thread Igor Guzenko (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016168#comment-17016168
 ] 

Igor Guzenko commented on DRILL-7449:
-

Hi [~benj641], unfortunately the attached log contains very little information 
about what led up to the error. From the query profile it seems that at some point 
major fragment 3.0 failed and then a cancellation request was sent to the other 
operators above.  Could you please enable debug logging in your embedded Drill 
and share the new log file with detailed error info? 


[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-15 Thread benj (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016064#comment-17016064
 ] 

benj commented on DRILL-7449:
-

[~IhorHuzenko], please find attached (execution with the leak on my local 
machine, embedded Drill 1.17 on Xubuntu 18.04):
 - [^embedded_FullJsonProfile.txt]
 - [^embedded_sqlline.log.txt]

The Physical plan:
{noformat}
00-00    Screen : rowType = RecordType(VARCHAR(255) Fragment, BIGINT Number of records written): rowcount = 5845417.0, cumulative cost = {7.07295457E7 rows, 7.424585516618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 739
00-01      Project(Fragment=[$0], Number of records written=[$1]) : rowType = RecordType(VARCHAR(255) Fragment, BIGINT Number of records written): rowcount = 5845417.0, cumulative cost = {7.0145004E7 rows, 7.418740099618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 738
00-02        Writer : rowType = RecordType(VARCHAR(255) Fragment, BIGINT Number of records written): rowcount = 5845417.0, cumulative cost = {6.4299587E7 rows, 7.301831759618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 737
00-03          ProjectAllowDup(Domain=[$0]) : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {5.845417E7 rows, 7.243377589618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 736
00-04            Project(Domain=[$0]) : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {5.2608753E7 rows, 7.184923419618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 735
00-05              SingleMergeExchange(sort0=[0]) : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {4.6763336E7 rows, 7.126469249618018E8 cpu, 5.985707595E9 io, 4.7885656064E10 network, 4.6763336E7 memory}, id = 734
01-01                OrderedMuxExchange(sort0=[0]) : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {4.0917919E7 rows, 6.658835889618018E8 cpu, 5.985707595E9 io, 2.3942828032E10 network, 4.6763336E7 memory}, id = 733
02-01                  SelectionVectorRemover : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {3.5072502E7 rows, 6.600381719618018E8 cpu, 5.985707595E9 io, 2.3942828032E10 network, 4.6763336E7 memory}, id = 732
02-02                    Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {2.9227085E7 rows, 6.541927549618018E8 cpu, 5.985707595E9 io, 2.3942828032E10 network, 4.6763336E7 memory}, id = 731
02-03                      HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {2.3381668E7 rows, 1.28599174E8 cpu, 5.985707595E9 io, 2.3942828032E10 network, 0.0 memory}, id = 730
03-01                        Project(Domain=[ITEM($0, 'host')]) : rowType = RecordType(ANY Domain): rowcount = 5845417.0, cumulative cost = {1.7536251E7 rows, 3.5072502E7 cpu, 5.985707595E9 io, 0.0 network, 0.0 memory}, id = 729
03-02                          Project(parsed=[PARSE_URL($0)]) : rowType = RecordType(ANY parsed): rowcount = 5845417.0, cumulative cost = {1.1690834E7 rows, 2.9227085E7 cpu, 5.985707595E9 io, 0.0 network, 0.0 memory}, id = 728
03-03                            Scan(table=[[dfs, tmp, fbingredagg.bigcopy.json]], groupscan=[EasyGroupScan [selectionRoot=file:/tmp/fbingredagg.bigcopy.json, numFiles=1, columns=[`Url`], files=[file:/tmp/fbingredagg.bigcopy.json], schema=null]]) : rowType = RecordType(ANY Url): rowcount = 5845417.0, cumulative cost = {5845417.0 rows, 5845417.0 cpu, 5.985707595E9 io, 0.0 network, 0.0 memory}, id = 727
{noformat}
And the operator profile.
 Note that the Rows count is 8 695 808 although the file contains 8 999 940 rows.
{noformat}
Operator ID | Type | Avg Setup Time | Max Setup Time | Avg Process Time | Max Process Time | Min Wait Time | Avg Wait Time | Max Wait Time | % Fragment Time | % Query Time | Rows | Avg Peak Memory | Max Peak Memory
00-xx-00 | SCREEN | 0,000s | 0,000s | 0,000s | 0,000s | 0,000s | 0,000s | 0,000s | 0,94% | 0,00% | 0 | - | -
00-xx-01 | PROJECT | 0,000s | 0,000s | 0,000s | 0,000s | 0,000s | 0,000s | 0,000s | 2,37% | 0,00% | 0 | - | -
00-xx-02 | PARQUET_WRITER | 0,000s | 0,000s | 0,000s | 0,000s | 0,000s | 0,000s | 0,000s | 6,08% | 0,00% | 0 | - | -
00-xx-03 | PROJECT_ALLOW_DUP | 0,000s | 0,000s | 0,000s | 0,000s | 0,000s | 0,000s | 0,000s | 16,61% | 0,00% | 0 | 52KB | 52KB
00-xx-04 | PROJECT | 0,001s | 0,001s | 0,000s | 0,000s | 0,000s | 0,000s | 0,000s | 35,03% | 0,00% | 0 | 52KB | 52KB
00-xx-05 | MERGING_RECEIVER | 0,000s | 0,000s | 0,000s | 0,000s | 40,382s | 40,382s | 40,382s | 38,96% | 0,00% | 0 | 52KB | 52KB
01-xx-00 | SINGLE_SENDER | 0,000s | 0,000s | 0,000s | 0,000s | 0,001s

[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-15 Thread Igor Guzenko (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016011#comment-17016011
 ] 

Igor Guzenko commented on DRILL-7449:
-

Hello [~benj641], I'm still trying to reproduce the leak and have made about 40 
unsuccessful attempts so far. Could you please attach all available log files and 
the full query profile JSON to this Jira?

> memory leak parse_url function
> --
>
> Key: DRILL-7449
> URL: https://issues.apache.org/jira/browse/DRILL-7449
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.16.0
> Reporter: benj
> Assignee: Igor Guzenko
> Priority: Major
>
> Queries with *parse_url* work well when the number of processed rows is low 
> but produce a memory leak when the number of rows grows (roughly between 
> 500 000 and 1 million); for some row counts the query sometimes works and 
> sometimes fails with a memory leak.
> Extract from the tested dataset:
> {noformat}
> {"Attributable":true,"Description":"Website has been identified as malicious by Bing","FirstReportedDateTime":"2018-03-12T18:49:38Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:49:38Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"172.217.8.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/beginilah-cara-orang-jepang-berpacaran.html","Version":1.5}
> {"Attributable":true,"Description":"Website has been identified as malicious by Bing","FirstReportedDateTime":"2018-03-12T18:14:51Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:14:51Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"216.58.192.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/cara-membuat-widget-slideshow-postingan.html","Version":1.5}
> {noformat}
> Request tested:
> {code:sql}
> ALTER SESSION SET `store.format`='parquet';
> ALTER SESSION SET `store.parquet.use_new_reader` = true;
> ALTER SESSION SET `store.parquet.compression` = 'snappy';
> ALTER SESSION SET `drill.exec.functions.cast_empty_string_to_null`= true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> ALTER SESSION SET `exec.enable_union_type` = true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> CREATE TABLE dfs.test.`output_pqt` AS
> (
> SELECT R.parsed.host AS Domain
> FROM ( 
>   SELECT parse_url(T.Url) AS parsed
>   FROM dfs.test.`file.json` AS T
> ) AS R 
> ORDER BY Domain
> );
> {code}
>  
>  Result when memory leak:
> {noformat}
> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. 
> Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory 
> leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> org.apache.drill.exec.memory.BaseAllocator.close():520
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
> org.apache.drill.exec.ops.FragmentContextImpl.close():546
> 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():329
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748 (state=,code=0)
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Memory was leaked 
> by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory 
> leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> org.apache.drill.exec.memory.BaseAllocator.close():520
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
> org.apache.drill.exec.ops.FragmentContextImpl.close():546
> 
> 

[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-14 Thread benj (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014960#comment-17014960
 ] 

benj commented on DRILL-7449:
-

Hi [~IhorHuzenko],

The problem doesn't appear on every run. Sometimes (with exactly the same 
data) the query works 5 times before it crashes.

Tested with the official 1.17 release on a small 3-node cluster (each node 
~ 48 procs / 128 GB, DRILL_HEAP=15G, DRILL_MAX_DIRECT_MEMORY=80G),
 with a file of 688 MB / 1 118 320 JSON records.

On the cluster, when comparing the profiles of correct and crashed executions, 
I can see that:
 - the crash appears at the "02-xx-02 - EXTERNAL_SORT" level
 - on "02-xx-03 - UNORDERED_RECEIVER":
   - correct execution: 99% of the Max Records are concentrated on 1 of the 8 
minor fragments, and the cumulative total is correct
   - crashed execution: Max Records are roughly evenly distributed across 
the 8 minor fragments and the cumulative total is incorrect (too low) (it is 
already incorrect in 03-xx-02 - PROJECT and 03-xx-00 - JSON_SUB_SCAN)

On my local machine (also 1.17, 8 procs / 32 GB), in embedded mode, when 
comparing the profiles of correct and crashed executions, I can see that:
 - the crash appears at the "02-xx-02 - EXTERNAL_SORT" level
 - the difference is in 03-xx-00 - JSON_SUB_SCAN: the crashed execution doesn't 
have the correct number of Max Records
 - for 02-xx-03 - UNORDERED_RECEIVER, in both correct and crashed executions 
Max Records are roughly evenly distributed across the 6 minor fragments
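
The per-fragment comparison described above could be scripted roughly as follows. This is only a sketch: it assumes the exported profile JSON nests records as `fragmentProfile` -> `minorFragmentProfile` -> `operatorProfile` -> `inputProfile[].records`, and the function names `records_by_minor_fragment` and `summarize` are hypothetical helpers, not part of Drill. Adjust the keys if the profile you export looks different.

```python
from typing import Dict


def records_by_minor_fragment(profile: dict, major_id: int, operator_id: int) -> Dict[int, int]:
    """Sum input records per minor fragment for one operator.

    The key names below are an assumption about the profile JSON layout.
    """
    counts: Dict[int, int] = {}
    for major in profile.get("fragmentProfile", []):
        if major.get("majorFragmentId", 0) != major_id:
            continue
        for minor in major.get("minorFragmentProfile", []):
            minor_id = minor.get("minorFragmentId", 0)
            for op in minor.get("operatorProfile", []):
                if op.get("operatorId", 0) != operator_id:
                    continue
                # An operator may have several input streams; add them up.
                records = sum(s.get("records", 0) for s in op.get("inputProfile", []))
                counts[minor_id] = counts.get(minor_id, 0) + records
    return counts


def summarize(counts: Dict[int, int]) -> dict:
    """Return the cumulative total and the share held by the busiest fragment."""
    total = sum(counts.values())
    largest = max(counts.values(), default=0)
    return {"total": total, "largest_share": largest / total if total else 0.0}
```

For an operator shown as "02-xx-03", one would call `records_by_minor_fragment(profile, 2, 3)` on the profile of a correct run and of a crashed run, then compare the two `summarize` results: a `largest_share` close to 1 corresponds to the "concentrated on 1 minor fragment" case, and differing `total` values correspond to the incorrect cumulative count.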

Example of log data from a crashed execution on the cluster:
{noformat}
  2020-01-14 08:22:33,681 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query with id 
21e285b6-4d53-58fd-8a4d-dedc0cbfb86a issued by anonymous: CREATE TABLE 
dfs.test.`output_pqt` AS (
SELECT R.parsed.host AS D FROM (SELECT parse_url(T.Url) AS parsed FROM 
dfs.test.`demo2.big.json` AS T) AS R ORDER BY D
)
2020-01-14 08:22:33,724 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:foreman] INFO  
o.a.d.e.p.s.h.CreateTableHandler - Creating persistent table [output_pqt].
2020-01-14 08:22:33,779 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:3] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:3: 
State change requested AWAITING_ALLOCATION --> RUNNING
2020-01-14 08:22:33,779 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:7] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:7: 
State change requested AWAITING_ALLOCATION --> RUNNING
2020-01-14 08:22:33,779 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:5] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:5: 
State change requested AWAITING_ALLOCATION --> RUNNING
2020-01-14 08:22:33,780 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:7] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:7: 
State to report: RUNNING
2020-01-14 08:22:33,780 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:3] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:3: 
State to report: RUNNING
2020-01-14 08:22:33,780 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:2:5] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:5: 
State to report: RUNNING
2020-01-14 08:22:33,782 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:1:2] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:1:2: 
State change requested AWAITING_ALLOCATION --> RUNNING
2020-01-14 08:22:33,782 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:1:2] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:1:2: 
State to report: RUNNING
2020-01-14 08:22:33,787 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:0:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:0:0: 
State change requested AWAITING_ALLOCATION --> RUNNING
2020-01-14 08:22:33,787 [21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:frag:0:0] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:0:0: 
State to report: RUNNING
2020-01-14 08:22:41,672 [BitServer-2] INFO  o.a.d.e.w.fragment.FragmentExecutor 
- 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:0:0: State change requested RUNNING --> 
CANCELLATION_REQUESTED
2020-01-14 08:22:41,673 [BitServer-2] INFO  o.a.d.e.w.f.FragmentStatusReporter 
- 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:0:0: State to report: 
CANCELLATION_REQUESTED
2020-01-14 08:22:41,674 [BitServer-2] INFO  o.a.d.e.w.fragment.FragmentExecutor 
- 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:1:2: State change requested RUNNING --> 
CANCELLATION_REQUESTED
2020-01-14 08:22:41,674 [BitServer-2] INFO  o.a.d.e.w.f.FragmentStatusReporter 
- 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:1:2: State to report: 
CANCELLATION_REQUESTED
2020-01-14 08:22:41,675 [BitServer-2] INFO  o.a.d.e.w.fragment.FragmentExecutor 
- 21e285b6-4d53-58fd-8a4d-dedc0cbfb86a:2:3: State change requested RUNNING --> 
CANCELLATION_REQUESTED
2020-01-14 08:22:41,675 

[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-13 Thread Igor Guzenko (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014414#comment-17014414
 ] 

Igor Guzenko commented on DRILL-7449:
-

Hello [~benj641], 

Unfortunately, I wasn't able to reproduce the issue from the description. I've 
generated 500k, 1 million and 10 million JSON rows with unique URLs and used 
Drill 1.16 and the latest master build in embedded mode. Both versions of Drill 
ran the query from the description without problems. Could you please share 
more details about your query (query profile) and cluster topology?

Thank you in advance, 
Igor
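
For anyone who wants to attempt a similar reproduction, test data with unique URLs might be generated along these lines. This is a sketch and not the generator actually used above: only the `Url` field name comes from the issue description, while the host pattern, the function names and the output path are assumptions.

```python
import json
from typing import Iterator


def generate_rows(count: int) -> Iterator[str]:
    """Yield `count` JSON lines, each with a distinct (made-up) URL."""
    for i in range(count):
        # Hypothetical URL pattern; only the "Url" key matches the issue's data.
        row = {"Url": f"http://host{i}.example.com/2012/12/page-{i}.html"}
        yield json.dumps(row)


def write_rows(path: str, count: int) -> None:
    """Write the generated rows to `path`, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as out:
        for line in generate_rows(count):
            out.write(line + "\n")
```

Calling `write_rows("/tmp/file.json", 1_000_000)` would produce a one-million-row file; the path is a placeholder.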



[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-13 Thread benj (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014216#comment-17014216
 ] 

benj commented on DRILL-7449:
-

I have done a full check, and in reality we haven't used drill-url-tools 
because it sometimes produces incorrect values on big datasets (due to a memory 
problem caught inside the UDF?).

After some other tests, the standard Drill *parse_url* works well (no memory 
leak) +if the ORDER BY clause is removed+.
Note also that the memory leak can appear with url_parse (from 
drill-url-tools) as well when the ORDER BY clause is used.

The only code that does not cause any critical problem for our use is a regexp 
of this type:
{code:sql}
SELECT REGEXP_REPLACE(Activity, '^(?:.*:.*@)?([^:]*)(?::.*)?$', '$1') AS Host
FROM (
  SELECT REGEXP_REPLACE(NULLIF(Url, ''), '^(?:(?:[^:/?#]+):)?(?://([^/?#]*))(?:[^?#]*)?(?:.*)?', '$1') AS Activity
  FROM ...
)
{code}

I don't know why, but from what I have observed, the ORDER BY clause produces a 
number of errors in different contexts with complex queries, and it is sometimes 
necessary to split the query into 2 distinct queries (one for the SELECT with 
the computations and one for the SELECT with ORDER BY).

Note that with the regexp there is no error, even with the ORDER BY clause.
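
To see what the two regular expressions do outside of Drill, they can be tried in Python. This is only an illustrative sketch: the patterns are copied from the query above, the function name `extract_host` is hypothetical, and Python's `re.sub` takes `\1` where the Drill query uses `$1`. Returning `None` for an empty URL is an assumption meant to mirror `NULLIF(Url, '')`.

```python
import re

# Step 1: keep only the authority part (everything between "//" and the next "/", "?" or "#").
AUTHORITY = re.compile(r'^(?:(?:[^:/?#]+):)?(?://([^/?#]*))(?:[^?#]*)?(?:.*)?')
# Step 2: drop optional "user:password@" in front and ":port" behind the host.
HOST = re.compile(r'^(?:.*:.*@)?([^:]*)(?::.*)?$')


def extract_host(url):
    """Apply both substitutions in the same order as the nested SQL query."""
    if not url:
        return None  # assumption: stands in for NULLIF(Url, '') yielding NULL
    authority = AUTHORITY.sub(r'\1', url)
    return HOST.sub(r'\1', authority)


print(extract_host("http://pasuruanbloggers.blogspot.ru/2012/12/beginilah-cara-orang-jepang-berpacaran.html"))
```

Note that a string without a `//` part does not match the first pattern and is passed through unchanged, so inputs that are not full URLs need separate handling.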


[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-10 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012893#comment-17012893
 ] 

Arina Ielchiieva commented on DRILL-7449:
-

Thanks, this is a really interesting issue.
[~benj641], could you please confirm that you don't see the memory leak when 
using url_parse from the external source?


[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-10 Thread benj (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012874#comment-17012874
 ] 

benj commented on DRILL-7449:
-

In the meantime, in case it helps someone:

I found this page 
[https://www.r-bloggers.com/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names/]
 and am now using the function _url_parse_ from 
[https://github.com/hrbrmstr/drill-url-tools],
 which uses [http://galimatias.mola.io/]

 


[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-10 Thread Arina Ielchiieva (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012841#comment-17012841
 ] 

Arina Ielchiieva commented on DRILL-7449:
-

This should be investigated.

> memory leak parse_url function
> --
>
> Key: DRILL-7449
> URL: https://issues.apache.org/jira/browse/DRILL-7449
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.16.0
> Reporter: benj
> Assignee: Igor Guzenko
> Priority: Major
>
> Queries with *parse_url* work well when the number of processed rows is low 
> but produce a memory leak when the number of rows grows (roughly between 
> 500 000 and 1 million); for some row counts the query sometimes works and 
> sometimes fails with a memory leak.
> Extract from the tested dataset:
> {noformat}
> {"Attributable":true,"Description":"Website has been identified as malicious by Bing","FirstReportedDateTime":"2018-03-12T18:49:38Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:49:38Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"172.217.8.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/beginilah-cara-orang-jepang-berpacaran.html","Version":1.5}
> {"Attributable":true,"Description":"Website has been identified as malicious by Bing","FirstReportedDateTime":"2018-03-12T18:14:51Z","IndicatorExpirationDateTime":"2018-04-11T23:33:13Z","IndicatorProvider":"Bing","IndicatorThreatType":"MaliciousUrl","IsPartnerShareable":true,"IsProductLicensed":true,"LastReportedDateTime":"2018-03-12T18:14:51Z","NetworkDestinationAsn":15169,"NetworkDestinationIPv4":"216.58.192.193","NetworkDestinationPort":80,"Tags":["us"],"ThreatDetectionProduct":"ES","TLPLevel":"Amber","Url":"http://pasuruanbloggers.blogspot.ru/2012/12/cara-membuat-widget-slideshow-postingan.html","Version":1.5}
> {noformat}
> Request tested:
> {code:sql}
> ALTER SESSION SET `store.format` = 'parquet';
> ALTER SESSION SET `store.parquet.use_new_reader` = true;
> ALTER SESSION SET `store.parquet.compression` = 'snappy';
> ALTER SESSION SET `drill.exec.functions.cast_empty_string_to_null` = true;
> ALTER SESSION SET `store.json.all_text_mode` = true;
> ALTER SESSION SET `exec.enable_union_type` = true;
> CREATE TABLE dfs.test.`output_pqt` AS
> (
> SELECT R.parsed.host AS Domain
> FROM ( 
>   SELECT parse_url(T.Url) AS parsed
>   FROM dfs.test.`file.json` AS T
> ) AS R 
> ORDER BY Domain
> );
> {code}
> 
> Result when the memory leak occurs:
> {noformat}
> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> org.apache.drill.exec.memory.BaseAllocator.close():520
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
> org.apache.drill.exec.ops.FragmentContextImpl.close():546
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():329
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1149
> java.util.concurrent.ThreadPoolExecutor$Worker.run():624
> java.lang.Thread.run():748 (state=,code=0)
> java.sql.SQLException: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> Fragment 3:0
> Please, refer to logs for more information.
> [Error Id: 3ffa5b43-0dde-4518-bb5a-ea3aab97f3d4 on servor01:31010]
>   (java.lang.IllegalStateException) Memory was leaked by query. Memory leaked: (256)
> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)
> org.apache.drill.exec.memory.BaseAllocator.close():520
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose():552
> org.apache.drill.exec.ops.FragmentContextImpl.close():546
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources():386
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup():214
> 
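
When comparing several failing runs, it can help to pull the numbers out of the "Allocator(...)" summary line in the error output quoted above. The following is a minimal sketch, not part of Drill: the names AllocatorStats and parse_allocator_line are hypothetical, and the field names are taken directly from the "(res/actual/peak/limit)" legend printed in the log, without assuming anything further about their meaning.

```python
import re
from typing import NamedTuple


class AllocatorStats(NamedTuple):
    """Fields named after the "(res/actual/peak/limit)" legend in the log line."""
    name: str
    res: int
    actual: int
    peak: int
    limit: int


# Matches e.g. "Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)".
# A leading "> " quote marker is tolerated because search() scans the whole line.
_ALLOCATOR_RE = re.compile(
    r"Allocator\((?P<name>[^)]*)\)\s+"
    r"(?P<res>\d+)/(?P<actual>\d+)/(?P<peak>\d+)/(?P<limit>\d+)"
)


def parse_allocator_line(line: str) -> AllocatorStats:
    """Extract the allocator name and its four counters from one log line."""
    match = _ALLOCATOR_RE.search(line)
    if match is None:
        raise ValueError(f"not an allocator summary line: {line!r}")
    return AllocatorStats(
        name=match["name"],
        res=int(match["res"]),
        actual=int(match["actual"]),
        peak=int(match["peak"]),
        limit=int(match["limit"]),
    )


# Line copied verbatim from the error output in this issue.
stats = parse_allocator_line(
    "> Allocator(frag:3:0) 300/256/9337280/300 (res/actual/peak/limit)"
)
print(stats)
```

In the output quoted in this issue, the "actual" counter (256) equals the value reported in "Memory leaked: (256)", so tracking that field across runs may show whether the leaked amount varies with the number of rows.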

[jira] [Commented] (DRILL-7449) memory leak parse_url function

2020-01-10 Thread Charles Givre (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012838#comment-17012838
 ] 

Charles Givre commented on DRILL-7449:
--

Do we know the underlying cause for this?  
