[ 
https://issues.apache.org/jira/browse/JENA-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398580#comment-13398580
 ] 

Andy Seaborne commented on JENA-256:
------------------------------------

This is not a regression.  This is the effect of one of a number of fixes, based on 
code analysis, that make the system more robust to sudden-death failures 
(external factors, e.g. loss of power or an operating-system-level process kill 
such as kill -9).  Such failures are more likely to cause database corruption in direct 
mode, where caching is separate from the filing system.  That these failure modes 
have not been reported in the wild suggests that they are rare (and not tested for 
extensively).

Had the system been acting correctly in the first place, there would not be a 
change in performance.

Optimization of write-back can be done but, I believe, needs careful testing. 
The obvious technique is to batch write-backs, but then the system must be 
careful about (1) increased memory usage due to delays in write-back and (2) a lack 
of activity causing the queue never to be flushed.  Crude prototyping, setting a 
write-back batch size of 10 transactions, shows that it works on the test case, but 
issues (1) and (2) need to be understood before putting it into the main codebase.
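To make the trade-off concrete, here is a minimal sketch of batched write-back. The class `BatchedWriteBack`, its methods, and the in-memory list standing in for the main database files are all hypothetical, not TDB code. Committed transactions queue up (issue 1: memory grows with the batch) and are written back once 10 have accumulated; if commits simply stop, the threshold is never reached, so something else must call `flush()` (issue 2).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of batched write-back; not TDB's implementation. */
class BatchedWriteBack {
    private final int batchSize;
    // Issue (1): committed-but-not-written-back transactions are held in memory.
    private final Deque<String> pending = new ArrayDeque<>();
    // Stand-in for the main database files: records each batch written back.
    private final List<List<String>> flushedBatches = new ArrayList<>();

    BatchedWriteBack(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Called after a write transaction has committed to the journal. */
    public synchronized void onCommit(String txnId) {
        pending.addLast(txnId);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /**
     * Issue (2): if no further commits arrive, onCommit never triggers this,
     * so an idle timer, a reader finishing, or shutdown must call it instead.
     */
    public synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // A real system would apply the journaled changes to the main files and sync them here.
        flushedBatches.add(new ArrayList<>(pending));
        pending.clear();
    }

    public synchronized int pendingCount() { return pending.size(); }

    public synchronized int flushCount() { return flushedBatches.size(); }

    public static void main(String[] args) {
        BatchedWriteBack wb = new BatchedWriteBack(10);
        for (int i = 0; i < 25; i++) {
            wb.onCommit("txn-" + i);
        }
        System.out.println("flushes=" + wb.flushCount() + " pending=" + wb.pendingCount()); // flushes=2 pending=5
        wb.flush(); // explicit flush, e.g. on idle or close
        System.out.println("flushes=" + wb.flushCount() + " pending=" + wb.pendingCount()); // flushes=3 pending=0
    }
}
```

The five transactions left pending after the loop are exactly the situation issue (2) describes: without an explicit trigger they would sit in memory indefinitely.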

A partial improvement is to sync only files that have actually been updated, but 
it seems the OS (Linux at least) is quite efficient here anyway.
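A sketch of selective syncing, assuming a hypothetical `SelectiveSync` helper that keeps one `FileChannel` per file and records which files were written since the last sync. Only `FileChannel.open`, `write` and `force` are real JDK calls; the dirty-set bookkeeping around them is illustrative. Since the OS may already make syncing an untouched file cheap, the saving may be small in practice.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: sync only the files written since the previous sync. */
class SelectiveSync implements AutoCloseable {
    private final Map<Path, FileChannel> channels = new LinkedHashMap<>();
    private final Set<Path> dirty = new LinkedHashSet<>();

    public void open(Path p) throws IOException {
        channels.put(p, FileChannel.open(p, StandardOpenOption.CREATE, StandardOpenOption.WRITE));
    }

    public void write(Path p, String data) throws IOException {
        FileChannel ch = channels.get(p);
        ByteBuffer buf = ByteBuffer.wrap(data.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
        dirty.add(p); // remember that this file now needs a sync
    }

    /** Forces only the modified files to the storage device; returns how many were synced. */
    public int sync() throws IOException {
        int synced = 0;
        for (Path p : dirty) {
            channels.get(p).force(false); // file content only, not necessarily metadata
            synced++;
        }
        dirty.clear();
        return synced;
    }

    @Override
    public void close() throws IOException {
        for (FileChannel ch : channels.values()) {
            ch.close();
        }
    }
}
```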

Batching effects can also be seen by starting a read transaction before the main 
loop and ending it after the loop.  In this case, the read blocks immediate 
write-back and all non-journal changes happen at once at the end, but then 
there is a clear point at which to flush the delayed write-back queue.
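The effect of a long-running read transaction can be modelled with a small, hypothetical simulation; the class `ReaderBlockedWriteBack` and its counters are assumptions for illustration, not TDB internals. A commit is written back immediately when no reader is active; otherwise it waits in a delayed queue that is drained when the last reader ends. Counting write-back batches shows 100 separate write-backs without a reader, versus one when a reader spans the whole loop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of write-back being held up by active read transactions. */
class ReaderBlockedWriteBack {
    private int activeReaders = 0;
    private final List<String> delayed = new ArrayList<>();
    private int writeBackBatches = 0;

    synchronized void beginRead() {
        activeReaders++;
    }

    synchronized void endRead() {
        activeReaders--;
        if (activeReaders == 0) {
            drain(); // the clear point at which to flush the delayed queue
        }
    }

    synchronized void commitWrite(String txnId) {
        delayed.add(txnId);
        if (activeReaders == 0) {
            drain(); // no reader active: write back immediately
        }
    }

    /** One write-back (and sync) per drain, however many transactions are queued. */
    private void drain() {
        if (delayed.isEmpty()) {
            return;
        }
        writeBackBatches++;
        delayed.clear();
    }

    synchronized int writeBackBatches() {
        return writeBackBatches;
    }

    public static void main(String[] args) {
        ReaderBlockedWriteBack noReader = new ReaderBlockedWriteBack();
        for (int i = 0; i < 100; i++) {
            noReader.commitWrite("txn-" + i);
        }
        System.out.println(noReader.writeBackBatches()); // 100

        ReaderBlockedWriteBack withReader = new ReaderBlockedWriteBack();
        withReader.beginRead(); // read transaction started before the loop...
        for (int i = 0; i < 100; i++) {
            withReader.commitWrite("txn-" + i);
        }
        withReader.endRead(); // ...and ended after it
        System.out.println(withReader.writeBackBatches()); // 1
    }
}
```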

Oddly, as the percentage of time a read transaction is active increases under 
load, the performance may well improve because batching effects emerge.  This 
may explain why initial reports saw less of a change: they involved read/write 
mixes, not single-threaded, write-only access patterns.

                
> Significant performance regression (TDB?) on 2.7.1 RC compared to May 15 build
> ------------------------------------------------------------------------------
>
>                 Key: JENA-256
>                 URL: https://issues.apache.org/jira/browse/JENA-256
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB
>    Affects Versions: Jena 2.7.1
>         Environment: Windows 7, 64 bit, tested against 2.7.1 RC from June 9 
> versus a build from May 15
>            Reporter: Simon Helsen
>            Priority: Critical
>         Attachments: 271May15Output1.txt, 271May15Output2.txt, 
> 271May15Output3.txt, 271May15Output4.txt, 271RCOutput1.txt, 271RCOutput2.txt, 
> 271RCOutput3.txt, 271RCOutput4.txt, PerformanceRegressionTest.java
>
>
> See also 
> http://mail-archives.apache.org/mod_mbox/jena-dev/201206.mbox/%3C4FD9F19E.3080904%40bristol.ac.uk%3E
> I was able to reproduce the performance regression with an isolated test 
> scenario. So I recreated the components ARQ, CORE, IRI, and TDB with the SVN 
> state of May 15 9 (so svn update -r {2012-05-15} and then mvn clean install)
> I then created a simple test program (attached as 
> PerformanceRegressionTest.java) which I ran 4 times in a row for each version 
> of Jena. Note that I deleted the TDB directory after the first 4 runs before 
> using the other Jena version. Attached are the files with the output. The 
> regression is obvious.
