[ http://issues.apache.org/jira/browse/DERBY-634?page=all ]

Andrew McIntyre reopened DERBY-634:
-----------------------------------

             
reopening to change Fix In version.

> Subquery materialization can cause stack overflow
> -------------------------------------------------
>
>                 Key: DERBY-634
>                 URL: http://issues.apache.org/jira/browse/DERBY-634
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.1.1
>            Reporter: Jeff Lichtman
>         Assigned To: Satheesh Bandaram
>             Fix For: 10.2.1.0
>
>         Attachments: DERBY-634.diff
>
>
> A performance optimization in subquery processing can cause a stack overflow.
> The optimization materializes a subquery ResultSet in memory where it thinks 
> the rows will fit in memory. The materialization is done as a  set of  nested 
> unions of constant rows (UnionResultSets and RowResultSets). If there are a 
> lot of rows this can cause a stack overflow when fetching a row.
> The obvious fix is to make it use an iterative technique rather than a 
> recursive one for storing and returning the rows. See the method 
> BaseActivation.materializeResultSetIfPossible() in the language execution 
> code.
> There are some other issues with this performance optimization that should be 
> looked at:
> 1) The optimization can backfire, making the query run much slower. For 
> example, in the query:
>     select * from one_row_table where column1 not in
>         (select column2 from million_row_table)
> reading million_row_table into memory is an expensive operation. If there is 
> an index on million_row_table.column2, the query should return a result very 
> quickly despite the large size of million_row_table by doing a single probe 
> into million_row_table via the index.
> Since in-memory materialization can be an expensive operation, the decision 
> about whether to do it should be made based on query optimizer cost 
> estimates. See SubqueryNode.generateExpression().
> 2) It may not be wise to cache partial query results in memory at all. 
> Although this can help performance in some cases, it also chews up memory. 
> This is different from a limited-size cache with a backing store (like what 
> the store uses for page caching). The language has no way to limit the total 
> amount of memory used in this type of processing. Note that hash joins 
> originally used in-memory hash tables with no backing store, and that a 
> backing store was added later.
> 3) The implementation of this optimization has some problems. The decision to 
> materialize the subquery results in memory is made during code generation - 
> all such decisions should be made during the compilation phase. It's not 
> clear to me why materializeResultSetIfPossible() is in BaseActivation - I 
> would expect the of materialization to be done by a type of ResultSet, not by 
> a method in BaseActivation. Also, this method calls getMaxMemoryPerTable() in 
> the OptimizerFactory - nothing in the execution code should refer to anything 
> in the compilation code (perhaps getMaxMemoryPerTable() should be moved 
> somewhere else).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to