[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-22 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113114#comment-15113114
 ] 

Jacques Nadeau commented on DRILL-4266:
---

netty?

> Possible memory leak (fragmentation ?)  in rpc layer
> 
>
> Key: DRILL-4266
> URL: https://issues.apache.org/jira/browse/DRILL-4266
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Jacques Nadeau
> Attachments: WebUI_500_iterations.txt, drill.log.2016-01-12-16, 
> memComsumption.txt, 
> memComsumption_framework.output_Fri_Jan_15_width_per_node=4.log, 
> memComsumption_framework.output_Mon_Jan_18_15_500_iterations.txt, 
> memComsumption_framework.output_Sun_Jan_17_04_jacques_branch_drill-4131, 
> test.tar
>
>
> I executed 5 tests from the Advanced/mondrian test suite in a loop overnight.
> My observation is that direct memory steadily grew from 117MB to 1.8GB and 
> remained at that level for 14875 iterations of the tests.
> My question is: why do 5 queries that were able to execute with 117MB of 
> memory require 1.8GB of memory after 5 hours of execution?
> Attached:
> * Memory used after each test iteration : memComsumption.txt
> * Log of the framework run: drill.log.2016-01-12-16
> * Tests: test.tar
> Setup:
> {noformat}
> Single node 32 core box. 
> DRILL_MAX_DIRECT_MEMORY="4G"
> DRILL_HEAP="1G"
> 0: jdbc:drill:schema=dfs> select * from sys.options where status like 
> '%CHANGED%';
> +------------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> |                name                |   kind   |  type   |  status  | num_val  | string_val  | bool_val  | float_val  |
> +------------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> | planner.enable_decimal_data_type   | BOOLEAN  | SYSTEM  | CHANGED  | null     | null        | true      | null       |
> +------------------------------------+----------+---------+----------+----------+-------------+-----------+------------+
> 1 row selected (1.309 seconds)
> {noformat}
> {noformat}
> Reproduction:
> * tar xvf test.tar into Functional/test directory 
> * ./run.sh -s Functional/test -g regression -t 180 -n 5 -i 1000 -m
> {noformat}
> This is very similar to the behavior Hakim and I observed a long time ago with 
> window functions. Now that the new allocator is in place, we reran this test and 
> see similar behavior, and the allocator does not seem to think that we have a 
> memory leak. Hence the speculation that memory is leaked in the RPC layer.
> I'm going to reduce planner.width.max_per_node and see if it has any effect 
> on memory allocation (speculating again ...)





[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-22 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113105#comment-15113105
 ] 

Victoria Markman commented on DRILL-4266:
-

I should change the title. After [~jacq...@dremio.com] added RPC metrics, we 
confirmed that it has nothing to do with RPC. It's most likely the netty business 
we debugged a couple of months ago.



[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-22 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113060#comment-15113060
 ] 

Victoria Markman commented on DRILL-4266:
-

Download "wget http://apache-drill.s3.amazonaws.com/files/mondrian_parquet.tgz; 
into /drill/testdata/mondrian and untar 
it.
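
For anyone reproducing this, a minimal sketch of those steps (assuming the 
/drill/testdata/mondrian directory already exists and is writable on the node):

{noformat}
# Sketch: fetch and unpack the mondrian parquet data used by the tests.
# URL and target directory are taken from the comment above; the tar flags assume a gzipped tarball.
cd /drill/testdata/mondrian
wget http://apache-drill.s3.amazonaws.com/files/mondrian_parquet.tgz
tar xzf mondrian_parquet.tgz
{noformat}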



[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-22 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113141#comment-15113141
 ] 

Victoria Markman commented on DRILL-4266:
-

netty netty :)




[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-22 Thread Deneche A. Hakim (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112962#comment-15112962
 ] 

Deneche A. Hakim commented on DRILL-4266:
-

[~vicky] the tests require the following schema: dfs.drillTestDirMondrian. How 
do I generate it?



[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-19 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107208#comment-15107208
 ] 

Victoria Markman commented on DRILL-4266:
-

What I've seen in the past is this behavior:

1. You start the server and execute a query that requires 2GB of direct memory to 
run. It succeeds.
2. You run a bunch of concurrent queries like in this bug: memory grows, but you 
don't get an OOM.
3. You run query #1 again and now it fails with an OOM.

To answer your direct question: I never tried setting direct memory to 1GB and 
running queries to see if I get an OOM. Will give it a try.
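
A minimal sketch of that experiment, reusing the variables from the Setup section 
above (the drill-env.sh location and the restart command are assumptions about a 
standard install, not something confirmed in this ticket):

{noformat}
# Sketch: cap direct memory at 1G and rerun the loop to see whether it OOMs.
# Assumes DRILL_HOME points at the install and that drill-env.sh is where these
# variables are exported.
#
# In $DRILL_HOME/conf/drill-env.sh:
export DRILL_MAX_DIRECT_MEMORY="1G"
export DRILL_HEAP="1G"

# Restart the drillbit so the new limit takes effect, then rerun the tests:
$DRILL_HOME/bin/drillbit.sh restart
./run.sh -s Functional/test -g regression -t 180 -n 5 -i 1000 -m
{noformat}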



[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-19 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107133#comment-15107133
 ] 

Jacques Nadeau commented on DRILL-4266:
---

[~vicky], can you confirm whether this ultimately results in an OOM or is simply 
seen as memory growth?



[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105411#comment-15105411
 ] 

Victoria Markman commented on DRILL-4266:
-

Important detail: no memory leak was detected on shutdown. 



[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105414#comment-15105414
 ] 

Jacques Nadeau commented on DRILL-4266:
---

In these attachments, I don't see information on what the Web UI is reporting 
in terms of memory allocation. Were you able to capture that, or did I just 
miss it? The UI will tell us how much memory is owned by the various RPC layers, 
as well as other key metrics.
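
One way to capture that going forward (a sketch, assuming the default web UI port 
8047 and the JSON metrics endpoint exposed by the drillbit; adjust host and port 
for your setup):

{noformat}
# Sketch: snapshot the Web UI metrics after each test iteration and append them to a file.
curl -s http://localhost:8047/status/metrics >> WebUI_metrics.txt
echo "" >> WebUI_metrics.txt   # newline separator between snapshots
{noformat}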



[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105469#comment-15105469
 ] 

Victoria Markman commented on DRILL-4266:
-

Of course I forgot about the WebUI ... Attaching 500 iterations of the same: 
WebUI_500_iterations.txt, 
memComsumption_framework.output_Mon_Jan_18_15_500_iterations.txt



[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105674#comment-15105674
 ] 

Jacques Nadeau commented on DRILL-4266:
---

The output provided shows minimal use of the RPC layer. Do any of the tests 
run in more than one fragment?



[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105741#comment-15105741
 ] 

Victoria Markman commented on DRILL-4266:
-

No, none of the queries run in more than one fragment.



[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105979#comment-15105979
 ] 

Jacques Nadeau commented on DRILL-4266:
---

Based on these metrics, the leak isn't in the RPC layer. Let me add some more 
metrics and we'll get a better snapshot of the memory allocation caching layer.



[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-18 Thread Victoria Markman (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106033#comment-15106033
 ] 

Victoria Markman commented on DRILL-4266:
-

If you look at the memory output, it looks very strange: it's not growing 
constantly. It grows and drops, then grows a bit again, and drops. I thought 
that it had stabilized somewhere around 1.8GB, but no, in one of the runs it 
went to 2GB. This looks more like fragmentation of some sort. Adding more 
metrics will help a lot. It would be nice if we could graph that stuff as well.
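
For graphing, a rough sketch of how the per-iteration numbers could be collected 
in one place (assumes sqlline is on the PATH, a local drillbit, and that 
sys.memory exposes the direct memory counters; column names may differ between 
builds):

{noformat}
# Sketch: poll memory once a minute while the framework loop runs and log it with
# a timestamp, so the output can later be turned into a graph.
while true; do
  date +%s >> mem_graph.log
  sqlline -u "jdbc:drill:schema=dfs" -e "select * from sys.memory;" >> mem_graph.log
  sleep 60
done
{noformat}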



[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-15 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102752#comment-15102752
 ] 

Jacques Nadeau commented on DRILL-4266:
---

I have a patch in DRILL-4131 that may give us some more information here. Once 
I run some more tests, maybe you can try it here. It doesn't address allocation 
but it does have better metrics. After seeing the growth, it would be 
interesting to look at the drill.alloc.* metrics in the web ui to see where the 
growth is coming from.

I'm running tests on my branch now and will let you know once it is ready to 
try.

[1] https://github.com/apache/drill/pull/327
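
Once that patch is in, a sketch of how the allocator gauges could be pulled out 
of the metrics JSON (assumes the default web UI port 8047, the /status/metrics 
endpoint, jq being available, and that the new metrics are prefixed drill.alloc 
as mentioned above):

{noformat}
# Sketch: filter just the drill.alloc.* gauges out of the drillbit metrics JSON.
curl -s http://localhost:8047/status/metrics | \
  jq '.gauges | with_entries(select(.key | startswith("drill.alloc")))'
{noformat}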



[jira] [Commented] (DRILL-4266) Possible memory leak (fragmentation ?) in rpc layer

2016-01-15 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102863#comment-15102863
 ] 

Jacques Nadeau commented on DRILL-4266:
---

[~vicky], looks like this patch is pretty clean. Can you see what the metrics 
report once you run your test again?
