[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ https://issues.apache.org/jira/browse/DERBY-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dag H. Wanvik updated DERBY-827: Derby Categories: [Performance] Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: https://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Reporter: Daniel John Debrunner Assignee: Dyre Tjeldvoll Fix For: 10.3.1.4 Attachments: candidate.diff, candidate.stat, close_nofinish.txt, d827-close-cleanup.diff, d827_execute_method_cleanup.txt, derby-827.extra.diff, derby-827.snapshot.diff, derby827_draft_reuse_result_sets.txt, derby827_update920.txt, MiscResultSetConstantAction.diff, multiprobe_notTested.patch, noclose_finish.txt, noclose_nofinish.txt, resetMembersScrollInsensitive.diff, RowChanger.diff, rsfromps.v1.diff, rsfromps.v1.stat, rsfromps_prelim.diff, rsfromps_prelim2.diff, TempTableToExecute.diff, test-isolation.diff, test_inbetween.sql Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running the Derby tests and there are some (maybe 25-30) failures, I think because not all the language ResultSet implementations are correctly written to be re-opened. Interestingly, the first failure I saw was in an aggregrate test, which goes back to the issue Manish saw. Even if derbyall passed I would be nervous about submitting this patch for real, because I don't think there's a lot of testing using repeat executions of PreparedStatements in the tests. The ij tests mainly use Statement, this is a single use of an activation so this change would not affect them. Thus such a patch could regress Derby by making it more likely existing bugs would be exposed. Given the performance gains, I think we need to start re-using ResultSets from Activation, and devise a way to ensure the testing covers the re-use. The main issue is there is a large number of ResultSet implementations to cover. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ https://issues.apache.org/jira/browse/DERBY-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Knut Anders Hatlen updated DERBY-827: - Attachment: candidate.stat candidate.diff I think the candidate row fix proposed in derby-827.extra.diff is correct, although I think it is better to place it in the base class where candidate is declared and instantiated. I also noticed that HashScanResultSet, TableScanResultSet, LastIndexKeyResultSet and DependentResultSet all use candidate rows the same way and probably all need the same fix. The first three of them share a common super class (ScanResultSet), and the last one looks like it should have been a sub-class of ScanResultSet. I have therefore created a patch (candidate.diff) which makes DependentResultSet extend ScanResultSet and pushes the creation/resetting of the candidate row into the base class. Also, Dyre mentioned that he would have preferred to use DataValueDescriptor.setToNull() instead of getNewNull(). To avoid the asserts that prevented him from doing it that way, I added a new recycle() method to the DataValueDescriptor interface. This method simply invokes restoreToNull() and returns itself if possible. If it can't set its value to null, it allocates a new object which it returns. suites.All ran cleanly with this patch. Derbyall has not finished yet. Will report back if it fails. Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: https://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: candidate.diff, candidate.stat, close_nofinish.txt, d827-close-cleanup.diff, d827_execute_method_cleanup.txt, derby-827.extra.diff, derby-827.snapshot.diff, derby827_draft_reuse_result_sets.txt, derby827_update920.txt, MiscResultSetConstantAction.diff, multiprobe_notTested.patch, noclose_finish.txt, noclose_nofinish.txt, resetMembersScrollInsensitive.diff, RowChanger.diff, rsfromps.v1.diff, rsfromps.v1.stat, rsfromps_prelim.diff, rsfromps_prelim2.diff, TempTableToExecute.diff, test-isolation.diff, test_inbetween.sql Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running
[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ https://issues.apache.org/jira/browse/DERBY-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dyre Tjeldvoll updated DERBY-827: - Attachment: RowChanger.diff Attaching a patch (RowChanger.diff). An ASSERT is triggered in RowChanger.open() when open() is called multiple times without an intervening close(). This happens in lang/refActions.sql when re-using result sets. The patch makes two changes: 1) Adds a call to RowChanger.close() in DeleteResultSet.cleanUp() 2) In DeleteCascadeResultSet.open() it moves the call to cleanUp() (which in turn calls DeleteResultSet.cleanUp()) into the finally block so that it gets called even when an exception is thrown. suites.All and derbyall both run without errors. The patch can be committed independently. Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: https://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: close_nofinish.txt, d827-close-cleanup.diff, d827_execute_method_cleanup.txt, derby-827.extra.diff, derby-827.snapshot.diff, derby827_draft_reuse_result_sets.txt, derby827_update920.txt, MiscResultSetConstantAction.diff, multiprobe_notTested.patch, noclose_finish.txt, noclose_nofinish.txt, resetMembersScrollInsensitive.diff, RowChanger.diff, rsfromps.v1.diff, rsfromps.v1.stat, rsfromps_prelim.diff, rsfromps_prelim2.diff, TempTableToExecute.diff, test-isolation.diff, test_inbetween.sql Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running the Derby tests and there are some (maybe 25-30) failures, I think because not all the language ResultSet implementations are correctly written to be re-opened. Interestingly, the first failure I saw was in an aggregrate test, which goes back to the issue Manish saw. Even if derbyall passed I would be nervous about submitting this patch for real, because I don't think there's a lot of testing using repeat executions of PreparedStatements in the tests. The ij tests mainly use Statement, this is a single use of an activation so this change would not affect them. Thus such a patch could regress Derby by making it more likely existing bugs would be
[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ https://issues.apache.org/jira/browse/DERBY-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dyre Tjeldvoll updated DERBY-827: - Attachment: resetMembersScrollInsensitive.diff Attaching a patch (resetMembersScrollInsensitive.diff) which fixes the failure in lang/ScrollCursors1Test, which can be applied independently. derbyall and suites.All pass. Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: https://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: close_nofinish.txt, d827-close-cleanup.diff, d827_execute_method_cleanup.txt, derby-827.extra.diff, derby-827.snapshot.diff, derby827_draft_reuse_result_sets.txt, derby827_update920.txt, MiscResultSetConstantAction.diff, multiprobe_notTested.patch, noclose_finish.txt, noclose_nofinish.txt, resetMembersScrollInsensitive.diff, rsfromps.v1.diff, rsfromps.v1.stat, rsfromps_prelim.diff, rsfromps_prelim2.diff, test-isolation.diff, test_inbetween.sql Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running the Derby tests and there are some (maybe 25-30) failures, I think because not all the language ResultSet implementations are correctly written to be re-opened. Interestingly, the first failure I saw was in an aggregrate test, which goes back to the issue Manish saw. Even if derbyall passed I would be nervous about submitting this patch for real, because I don't think there's a lot of testing using repeat executions of PreparedStatements in the tests. The ij tests mainly use Statement, this is a single use of an activation so this change would not affect them. Thus such a patch could regress Derby by making it more likely existing bugs would be exposed. Given the performance gains, I think we need to start re-using ResultSets from Activation, and devise a way to ensure the testing covers the re-use. The main issue is there is a large number of ResultSet implementations to cover. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ https://issues.apache.org/jira/browse/DERBY-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Knut Anders Hatlen updated DERBY-827: - Attachment: test-isolation.diff Attaching a patch (test-isolation.diff) which modifies these test cases in ResultSetsFromPreparedStatementTests: testScalarAggregateResultSet, testLastIndexKeyResultSet, testDistinctScanResultSet, testDistinctScalarAggregateResultSet, testDistinctGroupedAggregateResultSet, testGroupedAggregateResultSet, testNestedLoopResultSet, testHashTableResultSet, testNestedLoopLeftOuterJoinResultSet, testHashLeftOuterJoinResultSet They now try to re-execute the statements after dropping and recreating the tables and after changing the transaction isolation level. The test still passes on trunk, but all the modified test cases fail with a lock timeout when derby827_update920.txt is applied. Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: https://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: close_nofinish.txt, d827-close-cleanup.diff, d827_execute_method_cleanup.txt, derby-827.extra.diff, derby-827.snapshot.diff, derby827_draft_reuse_result_sets.txt, derby827_update920.txt, MiscResultSetConstantAction.diff, multiprobe_notTested.patch, noclose_finish.txt, noclose_nofinish.txt, rsfromps.v1.diff, rsfromps.v1.stat, rsfromps_prelim.diff, rsfromps_prelim2.diff, test-isolation.diff, test_inbetween.sql Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running the Derby tests and there are some (maybe 25-30) failures, I think because not all the language ResultSet implementations are correctly written to be re-opened. Interestingly, the first failure I saw was in an aggregrate test, which goes back to the issue Manish saw. Even if derbyall passed I would be nervous about submitting this patch for real, because I don't think there's a lot of testing using repeat executions of PreparedStatements in the tests. The ij tests mainly use Statement, this is a single use of an activation so this change would not affect them. Thus such a patch could regress Derby by making it more likely
[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ https://issues.apache.org/jira/browse/DERBY-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dyre Tjeldvoll updated DERBY-827: - Attachment: MiscResultSetConstantAction.diff Attaching MiscResultSetConstantAction.diff. derbyall and suites.All pass. I believe I know the why lang/declareGlobalTempTableJava is failing. It turns out that a call to 'markTempTableAsModifiedInUnitOfWork' gets compiled into the fillResultSet() method when accessing a (global?) temporary table. Since fillResultSet() now is only called on the first execution the temp table isn't marked as modified in later executions, and rollback doesn't clean it up as it should. Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: https://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: close_nofinish.txt, d827-close-cleanup.diff, d827_execute_method_cleanup.txt, derby-827.extra.diff, derby-827.snapshot.diff, derby827_draft_reuse_result_sets.txt, derby827_update920.txt, MiscResultSetConstantAction.diff, multiprobe_notTested.patch, noclose_finish.txt, noclose_nofinish.txt, rsfromps.v1.diff, rsfromps.v1.stat, rsfromps_prelim.diff, rsfromps_prelim2.diff, test_inbetween.sql Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running the Derby tests and there are some (maybe 25-30) failures, I think because not all the language ResultSet implementations are correctly written to be re-opened. Interestingly, the first failure I saw was in an aggregrate test, which goes back to the issue Manish saw. Even if derbyall passed I would be nervous about submitting this patch for real, because I don't think there's a lot of testing using repeat executions of PreparedStatements in the tests. The ij tests mainly use Statement, this is a single use of an activation so this change would not affect them. Thus such a patch could regress Derby by making it more likely existing bugs would be exposed. Given the performance gains, I think we need to start re-using ResultSets from Activation, and devise a way to ensure the testing covers the re-use. The main issue is there is a large number of ResultSet
[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ https://issues.apache.org/jira/browse/DERBY-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Knut Anders Hatlen updated DERBY-827: - Attachment: d827-close-cleanup.diff I had a look at Dyre's snapshot of his sandbox (derby-827.snapshot.diff) and I think some of the changes could be committed independently. I applied the changes to the close methods of ProjectRestrictResultSet, LastIndexKeyResultSet and InsertResultSet on a clean sandbox, as these changes looked simple and harmless to me, and ran suites.All and derbyall with no failures. (I removed the check for source != null in ProjectRestrictResultSet.close() since its constructor asserts that source is not null.) Attached a new patch containing only these changes (d827-close-cleanup.diff) which I plan to commit unless anyone objects. Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: https://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: close_nofinish.txt, d827-close-cleanup.diff, d827_execute_method_cleanup.txt, derby-827.extra.diff, derby-827.snapshot.diff, derby827_draft_reuse_result_sets.txt, derby827_update920.txt, multiprobe_notTested.patch, noclose_finish.txt, noclose_nofinish.txt, rsfromps.v1.diff, rsfromps.v1.stat, rsfromps_prelim.diff, rsfromps_prelim2.diff, test_inbetween.sql Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running the Derby tests and there are some (maybe 25-30) failures, I think because not all the language ResultSet implementations are correctly written to be re-opened. Interestingly, the first failure I saw was in an aggregrate test, which goes back to the issue Manish saw. Even if derbyall passed I would be nervous about submitting this patch for real, because I don't think there's a lot of testing using repeat executions of PreparedStatements in the tests. The ij tests mainly use Statement, this is a single use of an activation so this change would not affect them. Thus such a patch could regress Derby by making it more likely existing bugs would be exposed. Given the performance gains, I think we need to start re-using ResultSets from
Re: [jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
Dyre Tjeldvoll (JIRA) wrote: Here is the repro (test_inbetween.sql) requested by AB as well as a snapshot patch of my sandbox (includes Dan's changes and my fixups) (derby-827.snapshot.diff). Feel free to comment on the patch, but please note that it is work in progress, (that means I will ignore any comments about missing comments and/or poor indentation). Hi Dyre, I tried the repro it works as expected for me!!! ---snip--- ij -- Only parameter markers (no constants). prepare p1 as 'select * from bt1 where i in (?, ?)'; ij execute p1 using 'values (2, 4)'; I |C|DE 2 |two |22.2 1 row selected ij execute p1 using 'values (-2, -4)'; I |C|DE 0 rows selected -end snip--- I had tried this repro on the development trunk only (10.3) Updated to revision 523546. Am I missing any thing ? Saurabh The patch traces creation of all language result sets, and this creates a lot of output. This means that no diff-based tests will pass with this patch. Most of this output can be removed by reverting GenericResultSetFactory.java after applying the patch. Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: https://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: close_nofinish.txt, d827_execute_method_cleanup.txt, derby-827.extra.diff, derby-827.snapshot.diff, derby827_draft_reuse_result_sets.txt, derby827_update920.txt, noclose_finish.txt, noclose_nofinish.txt, rsfromps.v1.diff, rsfromps.v1.stat, rsfromps_prelim.diff, rsfromps_prelim2.diff, test_inbetween.sql Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running the Derby tests and there are some (maybe 25-30) failures, I think because not all the language ResultSet implementations are correctly written to be re-opened. Interestingly, the first failure I saw was in an aggregrate test, which goes back to the issue Manish saw. Even if derbyall passed I would be nervous about submitting this patch for real, because I don't think there's a lot of testing using repeat executions of PreparedStatements in the tests. The ij tests mainly use Statement, this is a single use of an
Re: [jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
Sorry for not being clear in previous reply, I haven't applied you patch before running the repro!! but still I got 0 rows selected Saurabh Vyas wrote: Dyre Tjeldvoll (JIRA) wrote: Here is the repro (test_inbetween.sql) requested by AB as well as a snapshot patch of my sandbox (includes Dan's changes and my fixups) (derby-827.snapshot.diff). Feel free to comment on the patch, but please note that it is work in progress, (that means I will ignore any comments about missing comments and/or poor indentation). Hi Dyre, I tried the repro it works as expected for me!!! ---snip--- ij -- Only parameter markers (no constants). prepare p1 as 'select * from bt1 where i in (?, ?)'; ij execute p1 using 'values (2, 4)'; I |C|DE 2 |two |22.2 1 row selected ij execute p1 using 'values (-2, -4)'; I |C|DE 0 rows selected -end snip--- I had tried this repro on the development trunk only (10.3) Updated to revision 523546. Am I missing any thing ? Saurabh
[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ https://issues.apache.org/jira/browse/DERBY-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] A B updated DERBY-827: -- Attachment: multiprobe_notTested.patch In a thread on derby-dev I wrote the following: I think the approach would be to generate the array of probe values as a saved object, which can then be retrieved from the activation just like any other saved object. My perhaps slightly uninformed guess is that this shouldn't be too difficult--maybe a day or two of coding? It turns out my guess was indeed misinformed. I looked at the various places where existing code calls addSavedObject() and in all cases the object being passed in is a specific compile-time object. That's also what the javadoc for addSavedObject() indicates. But in the case of IN list probing values the object of interest (namely, an array of DataValueDescriptors) does not exist until execution time because we create it as part of code generation. So use of saved objects is not going to work here. That said, the generated values for parameters are pluggable, meaning that if we generate some DataValueDescriptor for a parameter p1, then whatever value is bound to p1 will end up in the generated DataValueDescriptor at execution time. If we re-execute the statement with a different value assigned to p1, then the generated DataValueDescriptor will end up with new value, as well. Given that, I think all we need to do is save off the DataValueDescriptor array that we receive in the MultiProbeTableScanResultSet constructor, and then reload from that array on every call to openCore(). The relevant code changes are pretty small; I'm attaching them as multiprobe_notTested.patch. With these simple changes to MultiProbeTableScanResultSet, the test_inbetween.sql script that you attached to DERBY-827 returned the correct results (even after applying derby-827.snapshot.diff). I haven't tested this at all, though (aside from running test_inbetween.sql), so please take this suggestion with caution. Just something I thought of when I realized that the savedObject approach wasn't going to work... Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: https://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: close_nofinish.txt, d827_execute_method_cleanup.txt, derby-827.extra.diff, derby-827.snapshot.diff, derby827_draft_reuse_result_sets.txt, derby827_update920.txt, multiprobe_notTested.patch, noclose_finish.txt, noclose_nofinish.txt, rsfromps.v1.diff, rsfromps.v1.stat, rsfromps_prelim.diff, rsfromps_prelim2.diff, test_inbetween.sql Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two
[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ https://issues.apache.org/jira/browse/DERBY-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dyre Tjeldvoll updated DERBY-827: - Attachment: close_nofinish.txt noclose_finish.txt noclose_nofinish.txt Inspired by the initial discussion in this issue and my own observations, I have compiled and attached 3 lists: noclose_nofinish.txt: Lang ResultSets that neither override close() nor finish() noclose_finish.txt: Lang ResultSets that don't override close() but overrides finish() close_nofinish.txt: Lang ResultSets that override close() but not finish() The explanation of the difference between close() and finish() makes sense to me, but how does cleanUp() fit into this? Looking at DeleteResultSet as an example, it seems like cleanUp() gets called at the end of open(). Presumably the author of the class knew that all the cleanup could be done at the end of open, and by calling the method 'cleanUp()' rather than 'close()' no extra work would have to be done when the ResultSet was closed. Maybe. I'm just guessing here... Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: https://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: close_nofinish.txt, d827_execute_method_cleanup.txt, derby-827.extra.diff, derby827_draft_reuse_result_sets.txt, derby827_update920.txt, noclose_finish.txt, noclose_nofinish.txt, rsfromps.v1.diff, rsfromps.v1.stat, rsfromps_prelim.diff, rsfromps_prelim2.diff Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running the Derby tests and there are some (maybe 25-30) failures, I think because not all the language ResultSet implementations are correctly written to be re-opened. Interestingly, the first failure I saw was in an aggregrate test, which goes back to the issue Manish saw. Even if derbyall passed I would be nervous about submitting this patch for real, because I don't think there's a lot of testing using repeat executions of PreparedStatements in the tests. The ij tests mainly use Statement, this is a single use of an activation so this change would not affect them. Thus such a patch could
[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ https://issues.apache.org/jira/browse/DERBY-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel John Debrunner updated DERBY-827: Attachment: d827_execute_method_cleanup.txt d827_execute_cleanup is a patch that cleans up the generation of the execute method and cleans up comments to reflect reality. E.g. that fillResultSet() is required to create a result set, not just an artifact of a code generation limit (which no longer exists anyway). I think this will fix the problem with current_timestamp you are seeing, I've run limited testing on this, will run the complete tests, this can be committed independent of any re-use of ResultSet change, it's basically just cleanup that clarifies how execute fillResultSet work. Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: https://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: d827_execute_method_cleanup.txt, derby-827.extra.diff, derby827_draft_reuse_result_sets.txt, derby827_update920.txt, rsfromps.v1.diff, rsfromps.v1.stat, rsfromps_prelim.diff, rsfromps_prelim2.diff Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running the Derby tests and there are some (maybe 25-30) failures, I think because not all the language ResultSet implementations are correctly written to be re-opened. Interestingly, the first failure I saw was in an aggregrate test, which goes back to the issue Manish saw. Even if derbyall passed I would be nervous about submitting this patch for real, because I don't think there's a lot of testing using repeat executions of PreparedStatements in the tests. The ij tests mainly use Statement, this is a single use of an activation so this change would not affect them. Thus such a patch could regress Derby by making it more likely existing bugs would be exposed. Given the performance gains, I think we need to start re-using ResultSets from Activation, and devise a way to ensure the testing covers the re-use. The main issue is there is a large number of ResultSet implementations to cover. -- This message is automatically generated by JIRA. - You can reply to this
[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ https://issues.apache.org/jira/browse/DERBY-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dyre Tjeldvoll updated DERBY-827: - Attachment: rsfromps.v1.stat rsfromps.v1.diff I've attached a complete patch (v1) for the new test. It runs cleanly on trunk, but two fixtures fail when running with derby827_update920.txt applied. The new test has been added to lang._Suite. Please comment/review. Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: https://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: derby827_draft_reuse_result_sets.txt, derby827_update920.txt, rsfromps.v1.diff, rsfromps.v1.stat, rsfromps_prelim.diff, rsfromps_prelim2.diff Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running the Derby tests and there are some (maybe 25-30) failures, I think because not all the language ResultSet implementations are correctly written to be re-opened. Interestingly, the first failure I saw was in an aggregrate test, which goes back to the issue Manish saw. Even if derbyall passed I would be nervous about submitting this patch for real, because I don't think there's a lot of testing using repeat executions of PreparedStatements in the tests. The ij tests mainly use Statement, this is a single use of an activation so this change would not affect them. Thus such a patch could regress Derby by making it more likely existing bugs would be exposed. Given the performance gains, I think we need to start re-using ResultSets from Activation, and devise a way to ensure the testing covers the re-use. The main issue is there is a large number of ResultSet implementations to cover. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ https://issues.apache.org/jira/browse/DERBY-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dyre Tjeldvoll updated DERBY-827: - Attachment: rsfromps_prelim.diff I've attached a preliminary (hence no ASF license) patch that creates a new junit test that attempts to fill the mentioned gap in the existing test suites. The new test instantiates all ResultSets returned from GenericResultSetFactory from a PreparedStatement that is executed repeatedly. At this point all ResultSets except InsertVTIResultSet, DeleteVTIResultSet, UpdateVTIResultSet and MaterializedResultSet are created. According to the EMMA reports these ResultSets are never created when running derbyall or the junit tests, so I don't know how to write a test case that does. The preliminary version of this patch adds a println to the ResultSetFactory which shows the ResultSets that are instantiated and the Statement ID and query that did so. This makes it easier to see if the test does what it is supposed to, but would obviously have to be removed in a final version of the patch. Reviews/comments would be much appreciated. Thanks. Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: https://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: derby827_draft_reuse_result_sets.txt, derby827_update920.txt, rsfromps_prelim.diff Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running the Derby tests and there are some (maybe 25-30) failures, I think because not all the language ResultSet implementations are correctly written to be re-opened. Interestingly, the first failure I saw was in an aggregrate test, which goes back to the issue Manish saw. Even if derbyall passed I would be nervous about submitting this patch for real, because I don't think there's a lot of testing using repeat executions of PreparedStatements in the tests. The ij tests mainly use Statement, this is a single use of an activation so this change would not affect them. Thus such a patch could regress Derby by making it more likely existing bugs would be exposed. Given the performance gains, I think we need to start
[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ https://issues.apache.org/jira/browse/DERBY-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dyre Tjeldvoll updated DERBY-827: - Attachment: rsfromps_prelim2.diff Attaching a modified test with better result checking, and that deletes rows from the underlying tables while executing PreparedStatements. (with ASF license this time, but it still isn't ready for inclusion). Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: https://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: derby827_draft_reuse_result_sets.txt, derby827_update920.txt, rsfromps_prelim.diff, rsfromps_prelim2.diff Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running the Derby tests and there are some (maybe 25-30) failures, I think because not all the language ResultSet implementations are correctly written to be re-opened. Interestingly, the first failure I saw was in an aggregrate test, which goes back to the issue Manish saw. Even if derbyall passed I would be nervous about submitting this patch for real, because I don't think there's a lot of testing using repeat executions of PreparedStatements in the tests. The ij tests mainly use Statement, this is a single use of an activation so this change would not affect them. Thus such a patch could regress Derby by making it more likely existing bugs would be exposed. Given the performance gains, I think we need to start re-using ResultSets from Activation, and devise a way to ensure the testing covers the re-use. The main issue is there is a large number of ResultSet implementations to cover. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ http://issues.apache.org/jira/browse/DERBY-827?page=all ] Daniel John Debrunner updated DERBY-827: Attachment: derby827_update920.txt Updated version of the patch, accounting for the fact some of the changes were made under DERBY-1142. This is still an example patch, much more work is needed to ensure ResultSet implementations work as expected and can safely be re-used. Plus more testing of repeated use of PreparedStatements. Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: http://issues.apache.org/jira/browse/DERBY-827 Project: Derby Issue Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: derby827_draft_reuse_result_sets.txt, derby827_update920.txt Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running the Derby tests and there are some (maybe 25-30) failures, I think because not all the language ResultSet implementations are correctly written to be re-opened. Interestingly, the first failure I saw was in an aggregrate test, which goes back to the issue Manish saw. Even if derbyall passed I would be nervous about submitting this patch for real, because I don't think there's a lot of testing using repeat executions of PreparedStatements in the tests. The ij tests mainly use Statement, this is a single use of an activation so this change would not affect them. Thus such a patch could regress Derby by making it more likely existing bugs would be exposed. Given the performance gains, I think we need to start re-using ResultSets from Activation, and devise a way to ensure the testing covers the re-use. The main issue is there is a large number of ResultSet implementations to cover. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (DERBY-827) Performance can be improved by re-using language ResultSets across Activation executions.
[ http://issues.apache.org/jira/browse/DERBY-827?page=all ] Daniel John Debrunner updated DERBY-827: Attachment: derby827_draft_reuse_result_sets.txt Patch that re-uses language ResultSets across activation executions. This is a draft patch to enable further testing only, around 25-30 tests fail with this patch. Some possibly due to incorrect close logic in ResultSet implementations, Some probably due to changes in runtime statistics output for items like the execution count. M java\engine\org\apache\derby\impl\sql\execute\NoPutResultSetImpl.java M java\engine\org\apache\derby\impl\sql\execute\BaseActivation.java M java\engine\org\apache\derby\impl\sql\GenericActivationHolder.java M java\engine\org\apache\derby\impl\jdbc\EmbedResultSet.java M java\engine\org\apache\derby\impl\jdbc\EmbedStatement.java M java\engine\org\apache\derby\iapi\sql\Activation.java Performance can be improved by re-using language ResultSets across Activation executions. - Key: DERBY-827 URL: http://issues.apache.org/jira/browse/DERBY-827 Project: Derby Type: Improvement Components: Performance Reporter: Daniel John Debrunner Attachments: derby827_draft_reuse_result_sets.txt Shouldn't DistinctScalarAggregateRS implement a close or a finish method (not sure what the difference is) and close the scan controller there. The close() and finish() methods are actually explained in their javadoc in the language org.apache.derby.iapi.sql.ResultSet class. [note this is not a JDBC java.sql.ResultSet object] close() - Tells the system that there will be no more calls to getNextRow() (until the next open() call) finish() - Tells the system that there will be no more access to any database information via this result set So close means the ResultSet may be opened again for more access, while finish means it will not be used again. However, their use in the code always doesn't match that, and that does cause confusion, at least to me. Language ResultSets (not JDBC ones) can be and are opened multiple times, for example when scanning a table multiple times within a join. An Activation, which represents the internal state of java.sql.PreparedStatement object has the lifetime of the java.sql.PreparedStatement, contains a top-level language ResultSet. This top-level language ResultSet provides the execution of the SQL statement, DML, DDL or a query. The top-level ResultSet may contain other ResultSets and could be seen as a tree structure. For the simple case of a primary key lookup query like: select name from customer where id = ? The activation would contain this: top result set ProjectRestrictRS IndexRowToBaseRowRS TableScanRS Now for some reason, even though the api of ResultSet say they can be re-used, and in some cases they are, this result set tree is thrown away after each execution. That is, the top result set has its finish() method called and then the activation removes its reference to it. Then on the next execution a new (identical) tree is set up. There is potential for a huge performance gain if this top level result set and its tree are re-used and have the same lifetime as the Activation. The saving comes in two forms, not having to create many objects on each execution, and not creating short-lived objects for the garbage collector to handle. I made a simple fix, it's a couple of lines of code, just calling close finish at the correct times, and for the above simple primary key lookup query, the performance went from 17,300 to 24,000 selects per second (cached data, single user). I'll post a patch shortly as an indication of the direction, once I can separate it from other changes in my client. However, I'm running the Derby tests and there are some (maybe 25-30) failures, I think because not all the language ResultSet implementations are correctly written to be re-opened. Interestingly, the first failure I saw was in an aggregrate test, which goes back to the issue Manish saw. Even if derbyall passed I would be nervous about submitting this patch for real, because I don't think there's a lot of testing using repeat executions of PreparedStatements in the tests. The ij tests mainly use Statement, this is a single use of an activation so this change would not affect them. Thus such a patch could regress Derby by making it more likely existing bugs would be exposed. Given the performance gains, I think we need to start re-using ResultSets from Activation, and devise a way to ensure the testing covers the re-use. The main issue is there is a large number of ResultSet implementations to cover. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: