[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mamta A. Satoor updated DERBY-3788:
-----------------------------------

    Attachment: DERBY3788_patch3_stat.txt
                DERBY3788_patch3_diff.txt

I am attaching a new patch which incorporates following comments from Knut.
2)Are the calls to EmbedConnection30.setupContextStack() and 
restoreContextStack() needed around the call to execute()? I thought execute() 
would call setup/restoreContextStack() itself. 
I removed the calls to setup and restore the stack since as Knut pointed out, 
they are unnecessary.
3)Creating an EmbedConnection30 object directly breaks the modularity. Unless 
the calls to the internal methods are necessary, it may be better to use 
InternalDriver.activeDriver().connect(url, info) instead. 
I removed the direct creation of EmbedConnection30 and instead use 
InternalDriver.activeDriver().connect(url, info) 
5)There's a comment in DDImpl5.updateStatisticsInBackGround() saying that "cm 
is null the very first time, and whenever we aren't actually nested." I'm not 
sure I understand that comment. Why is it null the first time? And isn't the 
method always called in a nested context? And if it is null, wouldn't that 
cause a NullPointerException in EmbedConnection's constructor when url=null is 
passed in? 
In the case that ContextManager may be null, I do not fire statistics creation 
in background because that will cause NPE later on because url is null.
6) DDImpl5.updateStatisticsInBackGround() updates the shared variable 
executorForUpdateStatistics if it is null. But it is not protected by 
synchronization, so race conditions are possible. 
I now have synchronization code around the code that can result in race 
conditions.
7) DDImpl5.stop() should call super.stop(). 
Did this.
8) In BackgroundUpdateStatisticsTask, using a prepared statement with the table 
name and the index name parametrized would be better because it would handle 
quoting special characters correctly (not handled in 
the current patch) and it would reduce the number of entries in the statement 
cache. 
Now using PreparedStatement rather than Statement.

The 2 comments from Knut that are not addressed in this patch are as follows
4) I think that the creation of a new connection will reboot the database if it 
has been shut down in the user thread, which may lead to unpredictable 
behaviour. It also seems like it will preserve all connection attributes, like 
attributes to reencrypt the database or to start replication master. 
I think there might be an issue as raised by Knut's comment 4). I still have to 
trace down completely why some of the junit test failures with my patch say 
Database "Failed to start(/create) database". It might be because of the 
properties that are used to create the new Connection object to run the 
statistics in background. I am not sure yet if it happens during every run of 
the test jdbcapi.DriverMgrAuthenticationTest but I have seen that test to be 
one of the tests that fails with Failed to create database. I will work on 
trying to figure out at what point does the test fail with these errors. If 
anyone has any ideas on the failures or how we should create the new Connection 
object,
I will highly appreciate that.
9) As to the locking issues, I would have tried to call 
Connection.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED) in 
the background thread to see if that solved/reduced the issues. 
I am not 100% about this but I this using isolation 
Connection.TRANSACTION_READ_UNCOMMITTED had slowed down test runs on my 
machine. I would like to address number 4 first so that at least the test 
failures go down and then I can focus on usage of isolation level 
Connection.TRANSACTION_READ_UNCOMMITTED. There ARE few tests that fail because 
of lock time out and may be using the isolation level recommended by Knut will 
help.

So, at this point, I will focus on 4) to try to narrow down where are the 
failures coming from. In the mean time, if community has time to run the junit 
tests to see if they notice noticeable performance problems with my patch, I 
will appreciate that feedback. Thanks

> Provide a zero-admin way of updating the statisitcs of an index
> ---------------------------------------------------------------
>
>                 Key: DERBY-3788
>                 URL: https://issues.apache.org/jira/browse/DERBY-3788
>             Project: Derby
>          Issue Type: New Feature
>          Components: Performance
>    Affects Versions: 10.5.0.0
>            Reporter: Mamta A. Satoor
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, 
> DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, DERBY_3788_Mgr.java, 
> DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *********************
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 100000 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *********************

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to