[ http://issues.apache.org/jira/browse/DERBY-634?page=all ]
Andrew McIntyre closed DERBY-634. --------------------------------- > Subquery materialization can cause stack overflow > ------------------------------------------------- > > Key: DERBY-634 > URL: http://issues.apache.org/jira/browse/DERBY-634 > Project: Derby > Issue Type: Bug > Components: SQL > Affects Versions: 10.1.1.1 > Reporter: Jeff Lichtman > Assigned To: Satheesh Bandaram > Fix For: 10.2.1.0 > > Attachments: DERBY-634.diff > > > A performance optimization in subquery processing can cause a stack overflow. > The optimization materializes a subquery ResultSet in memory where it thinks > the rows will fit in memory. The materialization is done as a set of nested > unions of constant rows (UnionResultSets and RowResultSets). If there are a > lot of rows this can cause a stack overflow when fetching a row. > The obvious fix is to make it use an iterative technique rather than a > recursive one for storing and returning the rows. See the method > BaseActivation.materializeResultSetIfPossible() in the language execution > code. > There are some other issues with this performance optimization that should be > looked at: > 1) The optimization can backfire, making the query run much slower. For > example, in the query: > select * from one_row_table where column1 not in > (select column2 from million_row_table) > reading million_row_table into memory is an expensive operation. If there is > an index on million_row_table.column2, the query should return a result very > quickly despite the large size of million_row_table by doing a single probe > into million_row_table via the index. > Since in-memory materialization can be an expensive operation, the decision > about whether to do it should be made based on query optimizer cost > estimates. See SubqueryNode.generateExpression(). > 2) It may not be wise to cache partial query results in memory at all. > Although this can help performance in some cases, it also chews up memory. > This is different from a limited-size cache with a backing store (like what > the store uses for page caching). The language has no way to limit the total > amount of memory used in this type of processing. Note that hash joins > originally used in-memory hash tables with no backing store, and that a > backing store was added later. > 3) The implementation of this optimization has some problems. The decision to > materialize the subquery results in memory is made during code generation - > all such decisions should be made during the compilation phase. It's not > clear to me why materializeResultSetIfPossible() is in BaseActivation - I > would expect the of materialization to be done by a type of ResultSet, not by > a method in BaseActivation. Also, this method calls getMaxMemoryPerTable() in > the OptimizerFactory - nothing in the execution code should refer to anything > in the compilation code (perhaps getMaxMemoryPerTable() should be moved > somewhere else). -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
