Re: svn commit: r990537 - /subversion/branches/performance/subversion/libsvn_subr/svn_temp_serializer.c
Johan Corveleyn wrote: On Mon, Sep 6, 2010 at 2:10 AM, Stefan Fuhrmann stefanfuhrm...@alice-dsl.de wrote: The only way to get closer to understanding the nature of the problem, I've added extensive logging code to the cache. Please run svnserve -M1000 -d --forground -r ... svnserve.log, zip the log file after the programm crashed got terminated and send it to me directly. Ok, here it is in attachment, zipped together with the .log and .dmp files from the crash. The exact log command I executed was: $ svn log svn://localhost/trunk/sourcesEJB/uz/s9server/services/toegang/driver/settings.xml Cheers, Thanks Johan! I finally found the problem and it actually was more likely to hit 32 bit systems then 64 bit ones. The relevant part from the log is this: create cache 1 prefix = fsfs:8d52877e-3a3d-4b59-a976-c8364f526eee/C:/research/svn/experiment/repos/db:RRI create cache 2 prefix = fsfs:8d52877e-3a3d-4b59-a976-c8364f526eee/C:/research/svn/experiment/repos/db:DAG create cache 3 prefix = fsfs:8d52877e-3a3d-4b59-a976-c8364f526eee/C:/research/svn/experiment/repos/db:DIR create cache 4 prefix = fsfs:8d52877e-3a3d-4b59-a976-c8364f526eee/C:/research/svn/experiment/repos/db:PACK-MANIFEST create cache 5 prefix = fsfs:8d52877e-3a3d-4b59-a976-c8364f526eee/C:/research/svn/experiment/repos/db:TEXT create cache 6 prefix = fsfs:8d52877e-3a3d-4b59-a976-c8364f526eee/C:/research/svn/experiment/repos/db:NODEREVS get 6: key=P+, Y3 MD5=52408b02cc58866e204010717f31c0ef hash=42680402 (0,257607) not found set 6: key=P+, Y3 MD5=52408b02cc58866e204010717f31c0ef hash=42680402 (0,257607) 4...@2113312 get 6: key=P+, Y3 MD5=52408b02cc58866e204010717f31c0ef hash=42680402 (0,257607) 4...@2113312 get 2: key=2( /trunk/sourcesEJB/uz/s9server MD5=52408b029f15368c2de222f27705f679 hash=42680402 (0,257607) 4...@2113312 crash As you can see, bucket (0,257607) contains a NODEREV the last get will read it as a DAG node. It only tested the first 4 bytes of the MD5 key hash because the to_find variable in find_entry has been a local array once and has become a pointer later on. sizeof(to_find) was now 4 .. 8 instead of 16. r994956 should fix the problem. -- Stefan^2.
Re: svn commit: r990537 - /subversion/branches/performance/subversion/libsvn_subr/svn_temp_serializer.c
On Wed, Sep 8, 2010 at 12:41 PM, Johan Corveleyn jcor...@gmail.com wrote: On Wed, Sep 8, 2010 at 12:00 PM, Stefan Fuhrmann stefanfuhrm...@alice-dsl.de wrote: Johan Corveleyn wrote: On Mon, Sep 6, 2010 at 2:10 AM, Stefan Fuhrmann stefanfuhrm...@alice-dsl.de wrote: The only way to get closer to understanding the nature of the problem, I've added extensive logging code to the cache. Please run svnserve -M1000 -d --forground -r ... svnserve.log, zip the log file after the programm crashed got terminated and send it to me directly. Ok, here it is in attachment, zipped together with the .log and .dmp files from the crash. The exact log command I executed was: $ svn log svn://localhost/trunk/sourcesEJB/uz/s9server/services/toegang/driver/settings.xml Cheers, Thanks Johan! I finally found the problem and it actually was more likely to hit 32 bit systems then 64 bit ones. The relevant part from the log is this: create cache 1 prefix = fsfs:8d52877e-3a3d-4b59-a976-c8364f526eee/C:/research/svn/experiment/repos/db:RRI create cache 2 prefix = fsfs:8d52877e-3a3d-4b59-a976-c8364f526eee/C:/research/svn/experiment/repos/db:DAG create cache 3 prefix = fsfs:8d52877e-3a3d-4b59-a976-c8364f526eee/C:/research/svn/experiment/repos/db:DIR create cache 4 prefix = fsfs:8d52877e-3a3d-4b59-a976-c8364f526eee/C:/research/svn/experiment/repos/db:PACK-MANIFEST create cache 5 prefix = fsfs:8d52877e-3a3d-4b59-a976-c8364f526eee/C:/research/svn/experiment/repos/db:TEXT create cache 6 prefix = fsfs:8d52877e-3a3d-4b59-a976-c8364f526eee/C:/research/svn/experiment/repos/db:NODEREVS get 6: key=P+, Y3 MD5=52408b02cc58866e204010717f31c0ef hash=42680402 (0,257607) not found set 6: key=P+, Y3 MD5=52408b02cc58866e204010717f31c0ef hash=42680402 (0,257607) 4...@2113312 get 6: key=P+, Y3 MD5=52408b02cc58866e204010717f31c0ef hash=42680402 (0,257607) 4...@2113312 get 2: key=2( /trunk/sourcesEJB/uz/s9server MD5=52408b029f15368c2de222f27705f679 hash=42680402 (0,257607) 4...@2113312 crash As you can see, bucket (0,257607) contains a NODEREV the last get will read it as a DAG node. It only tested the first 4 bytes of the MD5 key hash because the to_find variable in find_entry has been a local array once and has become a pointer later on. sizeof(to_find) was now 4 .. 8 instead of 16. r994956 should fix the problem. That's great news! I'll give it a spin tonight. Yes! My tests pass now, there are no more crashes. It all works fine. Cheers, -- Johan
Re: svn commit: r990537 - /subversion/branches/performance/subversion/libsvn_subr/svn_temp_serializer.c
On Tue, Aug 31, 2010 at 11:35 PM, Johan Corveleyn jcor...@gmail.com wrote: Stefan, On Tue, Aug 31, 2010 at 10:07 PM, Johan Corveleyn jcor...@gmail.com wrote: Some additional info: - I couldn't reproduce the crash with a narrow range. Not even 9:0 would crash it (right after startup). - BUT: if after 9:0 I run log without -r arguments, I get an error on the client side: [[[ ..\..\..\subversion\svn\log-cmd.c:746: (apr_err=160013) ..\..\..\subversion\libsvn_client\log.c:606: (apr_err=160013) ..\..\..\subversion\libsvn_repos\log.c:1474: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_repos\log.c:372: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_fs_fs\tree.c:3313: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_fs_fs\tree.c:3313: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_fs_fs\tree.c:3159: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_fs_fs\tree.c:668: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ]]] - This also happens when the first run is 6:0 or even 42000:0. If the first run is 41000:0, then the second run doesn't get the client-side error, but the server crashes on the expected spot (after rev 42100). - The above client-side error also happens if the second run is 96000:9 instead of a log without -r argument. - However, if I run log -r96000:9 right after startup, no problem. - Other than that, it crashes reproducibly after 42100 if I run log with no -r arguments right after startup. I experimented some more. There must be something strange with that revision 90799 that also causes the client-side error. Some log runs immediately after startup: - svn log -r90798:0 svn://localhost/path/bigfile.xml: no crash - svn log -r90799:0 svn://localhost/path/bigfile.xml: crash (last log entry: 42104 (one before 42100 of the regular crash)) - svn log -r90921:0 svn://localhost/path/bigfile.xml: crash (last log entry: 42130 (two before 42100 of the regular crash)). r90921 is one before 90799. - svn log -r90998:0 svn://localhost/path/bigfile.xml: crash (last log entry: 42149 (three before 42100 of the regular crash)). r90998 is two before 90799. - svn log svn://localhost/path/bigfile.xml: still crashes consistently with last log entry 42100. Still r90799 itself seems a very normal commit, with only text modifications to bigfile.xml. Stefan, If needed, I could send some individual rev files (like r90779 and 42100) via private email. If that would help in finding the problem ... Cheers, -- Johan
Re: svn commit: r990537 - /subversion/branches/performance/subversion/libsvn_subr/svn_temp_serializer.c
On Tue, Aug 31, 2010 at 12:37 AM, Stefan Fuhrmann stefanfuhrm...@alice-dsl.de wrote: Johan Corveleyn wrote: On Mon, Aug 30, 2010 at 9:32 PM, Johan Corveleyn jcor...@gmail.commailto: jcor...@gmail.com wrote: On Mon, Aug 30, 2010 at 12:05 PM, Stefan Fuhrmann stefanfuhrm...@alice-dsl.de mailto:stefanfuhrm...@alice-dsl.de wrote: I thought so ;) To narrow down the nature of the problem, I added some checks that should be able to discern plain data corruption from (de-)serialization issues. Please apply either the patch or replace the original files with the versions in the .zip file. A debug build should then, hopefully, trigger a different and more specific assertion. Ok, will try that now. :-(, I updated to r990759 and applied your attached debug.patch, but I just get the same assertion (offset by 4 lines), with no additional information: [[[ $ svnserve -d -r c:/research/svn/experiment/repos Assertion failed: *ptr buffer, file ..\..\..\subversion\libsvn_subr\svn_temp_serializer.c, line 286 ]]] Anything else I can try? The failure was expected. With the patched code, I just hope to get more context information. Could you send me the call stack dump? My suspicion is that the DAG node is already (partially) garbage before it actually gets added to the cache. Ok, here it is in attachment (I removed the assert that you added in this commit, so that the app would really crash; otherwise I couldn't get a crash dump (or can I?)). (I hope the attachments (file svn-crash...log.txt and svn-crash...dmp) get through; if not, let me know and I'll zip them or something) Some additional info: - I couldn't reproduce the crash with a narrow range. Not even 9:0 would crash it (right after startup). - BUT: if after 9:0 I run log without -r arguments, I get an error on the client side: [[[ ..\..\..\subversion\svn\log-cmd.c:746: (apr_err=160013) ..\..\..\subversion\libsvn_client\log.c:606: (apr_err=160013) ..\..\..\subversion\libsvn_repos\log.c:1474: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_repos\log.c:372: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_fs_fs\tree.c:3313: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_fs_fs\tree.c:3313: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_fs_fs\tree.c:3159: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_fs_fs\tree.c:668: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ]]] - This also happens when the first run is 6:0 or even 42000:0. If the first run is 41000:0, then the second run doesn't get the client-side error, but the server crashes on the expected spot (after rev 42100). - The above client-side error also happens if the second run is 96000:9 instead of a log without -r argument. - However, if I run log -r96000:9 right after startup, no problem. - Other than that, it crashes reproducibly after 42100 if I run log with no -r arguments right after startup. Hmm, I hope you can figure this out ... -- Johan Process info: Cmd line: C:\research\svn\client_build\svn_branch_performance\dist\bin\svnserve.exe -d -r c:/research/svn/experiment/repos Working Dir: C:\research\svn\experiment\repos Version: 1.7.0 (dev build), compiled Aug 19 2010, 00:50:40 Platform: Windows OS version 5.1 build 2600 Service Pack 3 Exception: ACCESS_VIOLATION Registers: eax=de12e56f ebx=00d31370 ecx=00dd3573 edx=00dd3573 esi=00f0f7c4 edi=00f0f810 eip=003baffc esp=00f0f784 ebp=00f0f784 efl=00010286 cs=001b ss=0023 ds=0023 es=0023 fs=003b gs= Stacktrace: #1 0x003baffc in svn_fs_fs__id_deserialize() buffer = 0x00dd356f id = #2 0x003bb598 in svn_fs_fs__noderev_deserialize() buffer = 0x00dd3570 noderev_p = noderev = #3 0x003bfa4e in svn_fs_fs__dag_deserialize() out = data = data_len = 1e0 pool = node = #4 0x004bc6fa in membuffer_cache_get() cache = key = key_len = 31 item = deserializer = pool = buffer = 0x00dd3570 group_index = 6587 to_find = entry = size = 1e0 #5 0x004bc3b4 in svn_membuffer_cache_get() svn_err__temp = value_p = found = 0x00f0f8c8 cache_void = 0x00d6bc40 key = pool = full_key = 0x00dd3520 cache = full_key_len = 31 #6 0x004bdf93 in svn_cache__get() value_p = found = 0x00f0f8c8 cache = key = pool = #7 0x003b1158 in dag_node_cache_get() svn_err__temp = node_p = root = path = pool = node = cache = key = found = 0 #8
Re: svn commit: r990537 - /subversion/branches/performance/subversion/libsvn_subr/svn_temp_serializer.c
Stefan, On Tue, Aug 31, 2010 at 10:07 PM, Johan Corveleyn jcor...@gmail.com wrote: Some additional info: - I couldn't reproduce the crash with a narrow range. Not even 9:0 would crash it (right after startup). - BUT: if after 9:0 I run log without -r arguments, I get an error on the client side: [[[ ..\..\..\subversion\svn\log-cmd.c:746: (apr_err=160013) ..\..\..\subversion\libsvn_client\log.c:606: (apr_err=160013) ..\..\..\subversion\libsvn_repos\log.c:1474: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_repos\log.c:372: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_fs_fs\tree.c:3313: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_fs_fs\tree.c:3313: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_fs_fs\tree.c:3159: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ..\..\..\subversion\libsvn_fs_fs\tree.c:668: (apr_err=160013) svn: File not found: revision 90799, path '?\239' ]]] - This also happens when the first run is 6:0 or even 42000:0. If the first run is 41000:0, then the second run doesn't get the client-side error, but the server crashes on the expected spot (after rev 42100). - The above client-side error also happens if the second run is 96000:9 instead of a log without -r argument. - However, if I run log -r96000:9 right after startup, no problem. - Other than that, it crashes reproducibly after 42100 if I run log with no -r arguments right after startup. I experimented some more. There must be something strange with that revision 90799 that also causes the client-side error. Some log runs immediately after startup: - svn log -r90798:0 svn://localhost/path/bigfile.xml: no crash - svn log -r90799:0 svn://localhost/path/bigfile.xml: crash (last log entry: 42104 (one before 42100 of the regular crash)) - svn log -r90921:0 svn://localhost/path/bigfile.xml: crash (last log entry: 42130 (two before 42100 of the regular crash)). r90921 is one before 90799. - svn log -r90998:0 svn://localhost/path/bigfile.xml: crash (last log entry: 42149 (three before 42100 of the regular crash)). r90998 is two before 90799. - svn log svn://localhost/path/bigfile.xml: still crashes consistently with last log entry 42100. Still r90799 itself seems a very normal commit, with only text modifications to bigfile.xml. One more note: the repository was once created by converting our CVS repository with cvs2svn (it's an old conversion that we did as an experiment, after which we did the real conversion; but I still use the old converted repo to test things). I just now notice that we did that old conversion with the cvs-revnum option, i.e. updating the cvs2svn:cvs-rev property on every commit, to make it contain the cvs revision number of the file. So every commit also contains prop changes. Probably not relevant, but you never know :-). Cheers, -- Johan
Re: svn commit: r990537 - /subversion/branches/performance/subversion/libsvn_subr/svn_temp_serializer.c
On Mon, Aug 30, 2010 at 12:05 PM, Stefan Fuhrmann stefanfuhrm...@alice-dsl.de wrote: Johan Corveleyn wrote: On Sun, Aug 29, 2010 at 12:32 PM, stefan2_at_apache.org wrote: / Author: stefan2 / / Date: Sun Aug 29 10:32:08 2010 / / New Revision: 990537 / / / / URL: http://svn.apache.org/viewvc?rev=990537view=rev http://svn.apache.org/viewvc?rev=990537view=rev / / Log: / / Looking for the cause of Johan Corveleyn's crash (see / / http://svn.haxx.se/dev/archive-2010-08/0652.shtml), it / / seems that wrong / corrupted data contains backward / / pointers, i.e. negative offsets. That cannot happen if / / everything works as intended. / I've just retried my test after this change (actually with performance-branch_at_990579, so updated just 10 minutes ago). Now I get the assertion error, after running log or blame on that particular file: [[[ $ svnserve -d -r c:/research/svn/experiment/repos Assertion failed: *ptr buffer, file ..\..\..\subversion\libsvn_subr\svn_temp_serializer.c, line 282 This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. ]]] That is what I expected looking at the call stacks you posted. My preliminary analysis goes as follows: * The error seems to be limited to relatively rare occasions. That sufficiently excludes alignment issues and plainly wrong parameters / function calls. * It might be a (still rare) 32-bit-only issue. * There seems to be no miscast of types, i.e. the DAG node being read and causing the PF is actually a DAG node. Even if conflicting keys were used, the structure could still be read from the cache and would lead to some logic failure elsewhere. What else could it be? Most of the following are rather * concurrency issue * data corruption within the cache itself * some strange serialization issue that needs very specific data and / or 32 bit pointers to show up Is there any way I can find more information about this failure, so I can help you diagnose the problem? In fact there is. Just some questions: * You are the only one accessing the server and you use a single client process? Yes. All on the same machine actually (my laptop). Accessing the server with svn://localhost. * Does it happen if you log / blame the file for the first time and no other requests have been made to the server before? Yes * Does a command line svn log produce some output before the crash? If so, is there something unusual happening around these revisions (branch replacement or so)? Yes. Running svn log svn://localhost/trunk/some/path/bigfile.xml yields 969 of the 2279 log entries. From r95849 (last change to this file) down to r42100. Then it suddenly stops. I've checked r42100 with log -v, and it only mentions text modification of bigfile.xml. Same goes for the previous and next revisions in which bigfile.xml was affected (r42104 and r42042). Also, please verify that the crash gets triggered if the server is started with the following extra parameters: * -c0 -M0 -F0 No crash * -c0 -M0 No crash * -c0 -M1500 -F0 Crash (actually I did it with -M1000, because M1500 would give me an Out of memory immediately). * -c0 -M1500 Crash (with -M1000 that is) Just to be clear: the very same repos does not have this problem when accessed by a trunk svnserve. I thought so ;) To narrow down the nature of the problem, I added some checks that should be able to discern plain data corruption from (de-)serialization issues. Please apply either the patch or replace the original files with the versions in the .zip file. A debug build should then, hopefully, trigger a different and more specific assertion. Ok, will try that now. -- Johan
Re: svn commit: r990537 - /subversion/branches/performance/subversion/libsvn_subr/svn_temp_serializer.c
On Mon, Aug 30, 2010 at 9:32 PM, Johan Corveleyn jcor...@gmail.com wrote: On Mon, Aug 30, 2010 at 12:05 PM, Stefan Fuhrmann stefanfuhrm...@alice-dsl.de wrote: I thought so ;) To narrow down the nature of the problem, I added some checks that should be able to discern plain data corruption from (de-)serialization issues. Please apply either the patch or replace the original files with the versions in the .zip file. A debug build should then, hopefully, trigger a different and more specific assertion. Ok, will try that now. :-(, I updated to r990759 and applied your attached debug.patch, but I just get the same assertion (offset by 4 lines), with no additional information: [[[ $ svnserve -d -r c:/research/svn/experiment/repos Assertion failed: *ptr buffer, file ..\..\..\subversion\libsvn_subr\svn_temp_serializer.c, line 286 ]]] Anything else I can try? FWIW, I do see a lot of warnings during compilation of cache-membuffer.c (this is after applying your patch, so line-numbers should be interpreted accordingly): [[[ ..\..\..\subversion\libsvn_subr\cache-membuffer.c(353): warning C4245: '=' : conversion from 'int' to 'apr_uint64_t', signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(384): warning C4245: '=' : conversion from 'int' to 'apr_uint32_t', signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(433): warning C4245: 'return' : conversion from 'int' to 'apr_uint32_t', signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(546): warning C4244: 'initializing' : conversion from 'apr_uint64_t' to 'apr_size_t', possible loss of data ..\..\..\subversion\libsvn_subr\cache-membuffer.c(656): warning C4244: 'initializing' : conversion from 'apr_uint64_t' to 'int', possible loss of data ..\..\..\subversion\libsvn_subr\cache-membuffer.c(663): warning C4018: '=' : signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(731): warning C4245: '=' : conversion from 'int' to 'apr_uint32_t', signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(732): warning C4245: '=' : conversion from 'int' to 'apr_uint32_t', signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(733): warning C4245: '=' : conversion from 'int' to 'apr_uint32_t', signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(736): warning C4244: 'function' : conversion from 'apr_uint64_t' to 'apr_size_t', possible loss of data ..\..\..\subversion\libsvn_subr\cache-membuffer.c(737): warning C4305: 'type cast' : truncation from 'apr_uint64_t' to 'unsigned char *' ..\..\..\subversion\libsvn_subr\cache-membuffer.c(771): warning C4018: '' : signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(773): warning C4245: '=' : conversion from 'int' to 'apr_uint64_t', signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(944): warning C4244: '=' : conversion from 'apr_uint64_t' to 'apr_size_t', possible loss of data ..\..\..\subversion\libsvn_subr\cache-membuffer.c(946): warning C4305: 'type cast' : truncation from 'apr_uint64_t' to 'char *' ]]] Cheers, -- Johan
Re: svn commit: r990537 - /subversion/branches/performance/subversion/libsvn_subr/svn_temp_serializer.c
Johan Corveleyn wrote: On Mon, Aug 30, 2010 at 12:05 PM, Stefan Fuhrmann stefanfuhrm...@alice-dsl.de mailto:stefanfuhrm...@alice-dsl.de wrote: Is there any way I can find more information about this failure, so I can help you diagnose the problem? In fact there is. Just some questions: * You are the only one accessing the server and you use a single client process? Yes. All on the same machine actually (my laptop). Accessing the server with svn://localhost. * Does it happen if you log / blame the file for the first time and no other requests have been made to the server before? Yes * Does a command line svn log produce some output before the crash? If so, is there something unusual happening around these revisions (branch replacement or so)? Yes. Running svn log svn://localhost/trunk/some/path/bigfile.xml yields 969 of the 2279 log entries. From r95849 (last change to this file) down to r42100. Then it suddenly stops. I've checked r42100 with log -v, and it only mentions text modification of bigfile.xml. Same goes for the previous and next revisions in which bigfile.xml was affected (r42104 and r42042). Good, we should then be able to find the root cause since this is a simple setup and the crash is deterministic. Have you tried svn log with a range like -r42104:42042 (or larger)? Perhaps, we can narrow it down to a single revision (probably svn log -r42042). Also, please verify that the crash gets triggered if the server is started with the following extra parameters: * -c0 -M0 -F0 No crash That disabled all membuffer as well as the file handle caches but the serialization code is still active. * -c0 -M0 No crash So, its not an issue with file handles kept open, positions not being tracked properly or something. * -c0 -M1500 -F0 Crash (actually I did it with -M1000, because M1500 would give me an Out of memory immediately). Good. -c0 is just the standard setup I use and it does not prevent the crash. -M1000 is certainly large enough to keep all data in cache, i.e. the entry eviction / cache compaction code won't get triggered. * -c0 -M1500 Crash (with -M1000 that is) Again, the file handle cache has no impact. -- Stefan^2.
Re: svn commit: r990537 - /subversion/branches/performance/subversion/libsvn_subr/svn_temp_serializer.c
Johan Corveleyn wrote: On Mon, Aug 30, 2010 at 9:32 PM, Johan Corveleyn jcor...@gmail.com mailto:jcor...@gmail.com wrote: On Mon, Aug 30, 2010 at 12:05 PM, Stefan Fuhrmann stefanfuhrm...@alice-dsl.de mailto:stefanfuhrm...@alice-dsl.de wrote: I thought so ;) To narrow down the nature of the problem, I added some checks that should be able to discern plain data corruption from (de-)serialization issues. Please apply either the patch or replace the original files with the versions in the .zip file. A debug build should then, hopefully, trigger a different and more specific assertion. Ok, will try that now. :-(, I updated to r990759 and applied your attached debug.patch, but I just get the same assertion (offset by 4 lines), with no additional information: [[[ $ svnserve -d -r c:/research/svn/experiment/repos Assertion failed: *ptr buffer, file ..\..\..\subversion\libsvn_subr\svn_temp_serializer.c, line 286 ]]] Anything else I can try? The failure was expected. With the patched code, I just hope to get more context information. Could you send me the call stack dump? My suspicion is that the DAG node is already (partially) garbage before it actually gets added to the cache. FWIW, I do see a lot of warnings during compilation of cache-membuffer.c (this is after applying your patch, so line-numbers should be interpreted accordingly): [[[ ..\..\..\subversion\libsvn_subr\cache-membuffer.c(353): warning C4245: '=' : conversion from 'int' to 'apr_uint64_t', signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(384): warning C4245: '=' : conversion from 'int' to 'apr_uint32_t', signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(433): warning C4245: 'return' : conversion from 'int' to 'apr_uint32_t', signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(546): warning C4244: 'initializing' : conversion from 'apr_uint64_t' to 'apr_size_t', possible loss of data ..\..\..\subversion\libsvn_subr\cache-membuffer.c(656): warning C4244: 'initializing' : conversion from 'apr_uint64_t' to 'int', possible loss of data ..\..\..\subversion\libsvn_subr\cache-membuffer.c(663): warning C4018: '=' : signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(731): warning C4245: '=' : conversion from 'int' to 'apr_uint32_t', signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(732): warning C4245: '=' : conversion from 'int' to 'apr_uint32_t', signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(733): warning C4245: '=' : conversion from 'int' to 'apr_uint32_t', signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(736): warning C4244: 'function' : conversion from 'apr_uint64_t' to 'apr_size_t', possible loss of data ..\..\..\subversion\libsvn_subr\cache-membuffer.c(737): warning C4305: 'type cast' : truncation from 'apr_uint64_t' to 'unsigned char *' None of those should affect the behavior. ..\..\..\subversion\libsvn_subr\cache-membuffer.c(771): warning C4018: '' : signed/unsigned mismatch That one is a theoretical issue on LLP64. ..\..\..\subversion\libsvn_subr\cache-membuffer.c(773): warning C4245: '=' : conversion from 'int' to 'apr_uint64_t', signed/unsigned mismatch ..\..\..\subversion\libsvn_subr\cache-membuffer.c(944): warning C4244: '=' : conversion from 'apr_uint64_t' to 'apr_size_t', possible loss of data ..\..\..\subversion\libsvn_subr\cache-membuffer.c(946): warning C4305: 'type cast' : truncation from 'apr_uint64_t' to 'char *' ]]] Uncritical, again. I will fix them all later this week. Thank you for your time helping me to debug that code! -- Stefan^2.
Re: svn commit: r990537 - /subversion/branches/performance/subversion/libsvn_subr/svn_temp_serializer.c
Hi Stefan, On Sun, Aug 29, 2010 at 12:32 PM, stef...@apache.org wrote: Author: stefan2 Date: Sun Aug 29 10:32:08 2010 New Revision: 990537 URL: http://svn.apache.org/viewvc?rev=990537view=rev Log: Looking for the cause of Johan Corveleyn's crash (see http://svn.haxx.se/dev/archive-2010-08/0652.shtml), it seems that wrong / corrupted data contains backward pointers, i.e. negative offsets. That cannot happen if everything works as intended. I've just retried my test after this change (actually with performance-bra...@990579, so updated just 10 minutes ago). Now I get the assertion error, after running log or blame on that particular file: [[[ $ svnserve -d -r c:/research/svn/experiment/repos Assertion failed: *ptr buffer, file ..\..\..\subversion\libsvn_subr\svn_temp_serializer.c, line 282 This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. ]]] Is there any way I can find more information about this failure, so I can help you diagnose the problem? Just to be clear: the very same repos does not have this problem when accessed by a trunk svnserve. -- Johan