Re: [sqlite] corrupt database recovery
We weren't logging the encryption failure. >>>> The feof() does return true until you attempt to read PAST the end of a >> file. The "does" in the above line should be "doesn't" >> while code Yeah, this code was written as a 'do...while', instead of a 'while do.' -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Machin Sent: Thursday, May 28, 2009 7:19 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] corrupt database recovery On 29/05/2009 9:34 AM, Gene Allen wrote: > Yeah. > > Since my code works in blocks, read/compress/encrypt/write, loop. Almost > all the real data was being written to the compressed file, however any > finalization and flushing of the stream wasn't occurring (since the encrypt > was failing) and the encrypt failure wasn't logged? > so the last bit of any SQLite database wouldn't be written. If so, pragma integrity_check should report that some of the pages actually written contain pointers to pages that are past the end of the file, shouldn't it? [snip] >> Well...a more structured test exposed the problem and it was this: >> >> The feof() does return true until you attempt to read PAST the end of a >> file. If feof doesn't continue to return true, it is broken. > So the code worked great until the file's length was a multiple of > the >> buffer size (in my case 262,144 bytes). As you can imagine that doesn't >> happen too often in the real world. >> >> Since I assumed that a feof would return true where there wasn't any more >> data in the file, I would start another pass at reading a chunk of data >> (which wouldn't find anything) and run thru the compression/encryption > code. >> The compression code worked handled it correctly, but the encryption >> required that a DWORD boundary (blowfish) and since 0 is on such a > boundary >> but at the wrong end...it would fail. Silently? Unlogged? In any case, I would have thought using feof() was not needed ... long time since I've written C in earnest, but isn't something like this the standard idiom: #define BUFSIZ 262144 buff char[BUFSIZ]; size_t nbytes; FILE *f; f = fopen("filename", "rb"); while ((nbytes = fread(buff, 1, BUFSIZ, f)) { do_something(buff, nbytes); } ?? HTH, John ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
On 29/05/2009 10:18 AM, John Machin wrote: > On 29/05/2009 9:34 AM, Gene Allen wrote: >> Yeah. >> >> Since my code works in blocks, read/compress/encrypt/write, loop. Almost >> all the real data was being written to the compressed file, however any >> finalization and flushing of the stream wasn't occurring (since the encrypt >> was failing) > > and the encrypt failure wasn't logged? > > > so the last bit of any SQLite database wouldn't be written. > > If so, pragma integrity_check should report that some of the pages > actually written contain pointers to pages that are past the end of the > file, shouldn't it? Your output from the integrity_check shows complaints about invalid page numbers in the range 462 to 773. At the default page size of 1024, those page numbers span (773-462+1)*1024 = 319488 bytes so you are missing more than a 262144-byte chunk [unless your page size is 512!]. This doesn't seem to gel with the combination of "almost all the real data was being written" and the hypothesis that the database was corrupted merely by truncation. What are the page size, the expected size of the database, and the actual (truncated) size of the database? What evidence do you have that the feof problem actually happened in this case? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
On 29/05/2009 9:34 AM, Gene Allen wrote: > Yeah. > > Since my code works in blocks, read/compress/encrypt/write, loop. Almost > all the real data was being written to the compressed file, however any > finalization and flushing of the stream wasn't occurring (since the encrypt > was failing) and the encrypt failure wasn't logged? > so the last bit of any SQLite database wouldn't be written. If so, pragma integrity_check should report that some of the pages actually written contain pointers to pages that are past the end of the file, shouldn't it? [snip] >> Well...a more structured test exposed the problem and it was this: >> >> The feof() does return true until you attempt to read PAST the end of a >> file. If feof doesn't continue to return true, it is broken. > So the code worked great until the file's length was a multiple of > the >> buffer size (in my case 262,144 bytes). As you can imagine that doesn't >> happen too often in the real world. >> >> Since I assumed that a feof would return true where there wasn't any more >> data in the file, I would start another pass at reading a chunk of data >> (which wouldn't find anything) and run thru the compression/encryption > code. >> The compression code worked handled it correctly, but the encryption >> required that a DWORD boundary (blowfish) and since 0 is on such a > boundary >> but at the wrong end...it would fail. Silently? Unlogged? In any case, I would have thought using feof() was not needed ... long time since I've written C in earnest, but isn't something like this the standard idiom: #define BUFSIZ 262144 buff char[BUFSIZ]; size_t nbytes; FILE *f; f = fopen("filename", "rb"); while ((nbytes = fread(buff, 1, BUFSIZ, f)) { do_something(buff, nbytes); } ?? HTH, John ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
> > Since my code works in blocks, read/compress/encrypt/write, loop. Almost > all the real data was being written to the compressed file, however any > finalization and flushing of the stream wasn't occurring (since the encrypt > was failing) so the last bit of any SQLite database wouldn't be written. > > I'm guessing that also contributes to how infrequent I saw a problem. > What size blocks where you using? So, what _is_ stored in the final "block size" length of the DB file? It must be something not-so-critical as you are able to dump the DB, right? Perhaps adding a full-file verify pass to your compress/encrypt code would be a good idea. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
Yeah. Since my code works in blocks, read/compress/encrypt/write, loop. Almost all the real data was being written to the compressed file, however any finalization and flushing of the stream wasn't occurring (since the encrypt was failing) so the last bit of any SQLite database wouldn't be written. I'm guessing that also contributes to how infrequent I saw a problem. -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Allen Fowler Sent: Thursday, May 28, 2009 5:17 PM To: General Discussion of SQLite Database; kennethinbox-sql...@yahoo.com Subject: Re: [sqlite] corrupt database recovery > I think I found my defect: my old stress tests was based on doing > compression/encryptions/decryption/decompression passes on files of random > sizes; so I would do about a 10 million passes or so and say...that's pretty > good. > > Well...a more structured test exposed the problem and it was this: > > The feof() does return true until you attempt to read PAST the end of a > file. So the code worked great until the file's length was a multiple of the > buffer size (in my case 262,144 bytes). As you can imagine that doesn't > happen too often in the real world. > > Since I assumed that a feof would return true where there wasn't any more > data in the file, I would start another pass at reading a chunk of data > (which wouldn't find anything) and run thru the compression/encryption code. > The compression code worked handled it correctly, but the encryption > required that a DWORD boundary (blowfish) and since 0 is on such a boundary > but at the wrong end...it would fail. > But was causing the DB to be corrupt, but a dump to work fine? What structure was getting damaged? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
> I think I found my defect: my old stress tests was based on doing > compression/encryptions/decryption/decompression passes on files of random > sizes; so I would do about a 10 million passes or so and say...that's pretty > good. > > Well...a more structured test exposed the problem and it was this: > > The feof() does return true until you attempt to read PAST the end of a > file. So the code worked great until the file's length was a multiple of the > buffer size (in my case 262,144 bytes). As you can imagine that doesn't > happen too often in the real world. > > Since I assumed that a feof would return true where there wasn't any more > data in the file, I would start another pass at reading a chunk of data > (which wouldn't find anything) and run thru the compression/encryption code. > The compression code worked handled it correctly, but the encryption > required that a DWORD boundary (blowfish) and since 0 is on such a boundary > but at the wrong end...it would fail. > But was causing the DB to be corrupt, but a dump to work fine? What structure was getting damaged? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
I have not! but I just might...thanks for the tip. I think I found my defect: my old stress tests was based on doing compression/encryptions/decryption/decompression passes on files of random sizes; so I would do about a 10 million passes or so and say...that's pretty good. Well...a more structured test exposed the problem and it was this: The feof() does return true until you attempt to read PAST the end of a file. So the code worked great until the file's length was a multiple of the buffer size (in my case 262,144 bytes). As you can imagine that doesn't happen too often in the real world. Since I assumed that a feof would return true where there wasn't any more data in the file, I would start another pass at reading a chunk of data (which wouldn't find anything) and run thru the compression/encryption code. The compression code worked handled it correctly, but the encryption required that a DWORD boundary (blowfish) and since 0 is on such a boundary but at the wrong end...it would fail. I fixed the code to test for 0 bytes being read, instead of solely relying on feof() and all seems well. :) I would like to say thank you for all your help and advice. Gene -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Ken Sent: Thursday, May 28, 2009 11:32 AM To: General Discussion of SQLite Database Subject: Re: [sqlite] corrupt database recovery Gene, Im sure others have suggested, but have you tried running your code through valgrind? Can you remove the custom VFS ? --- On Wed, 5/27/09, Gene Allen wrote: > From: Gene Allen > Subject: Re: [sqlite] corrupt database recovery > To: mgr...@medcom-online.de, "'General Discussion of SQLite Database'" > Date: Wednesday, May 27, 2009, 4:51 PM > I've reviewed the code over and over > again, but am yet to find it. But it > is a good bit of very complicated code (blowfish and > compression code). > > That's why I was asking about the format of the > database. Since I'm able to > successfully do a .dump but the integrity_check whines, I'm > thinking that > maybe if I focus on where what could cause that, it might > give me a clue. > > For example, if the tree is stored at the end of file maybe > I'm falling out > the loop without writing the final bytes. But if the > tree is not localized > to one area of the file, I'll have to rethink my approach. > > > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] > On Behalf Of Marcus Grimm > Sent: Wednesday, May 27, 2009 2:44 PM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] corrupt database recovery > > so, if you think it is a coding error on your side it will > be a hard work to find the source. I can only image side > effects > on an used API like sqlite3 with the classics: > > - stack frame overload in a function that calls sqlite > functions > - using a local stack variable as a return pointer and > reuse > this external. > - memory overwriting on malloc'ed pointers or reusing of > allready > free'd memory pages. > > one and two might be found using a code review and > identify > local variables or particular arrays that are written: > So wherever you call anything from sqlite check the stack > declaration and review the usage of these variables... > of course just don't do number two... ;) > > memory errors might be detected using special debug > versions of the > malloc/free library, by code review, or manually by adding > some > test variables whereever you malloc or free a pointer. > > I'm sure you allready did some of these... anyway, good > luck > > are you using threads? would be another beautiful issue to > trace...;) > > Marcus > > > > > It is a server class machine running Windows 2003 with > 4 GB. No, it's a > > local drive with 20GB free on it. > > > > I'm sure that it's a coding error on my part. > SQLite is very stable, in > > my > > opinion. I'm just trying to get a rough idea on > where I'm screwing up the > > database. > > > > > > -Original Message- > > From: sqlite-users-boun...@sqlite.org > > [mailto:sqlite-users-boun...@sqlite.org] > On Behalf Of John Elrick > > Sent: Wednesday, May 27, 2009 12:58 PM > > To: General Discussion of SQLite Database > > Subject: Re: [sqlite] corrupt database recovery > > > > What platform? Any chance they are using a > network drive? > > > > > > John > > > > Gene wrote: > >> My code is outside th
Re: [sqlite] corrupt database recovery
Gene, Im sure others have suggested, but have you tried running your code through valgrind? Can you remove the custom VFS ? --- On Wed, 5/27/09, Gene Allen wrote: > From: Gene Allen > Subject: Re: [sqlite] corrupt database recovery > To: mgr...@medcom-online.de, "'General Discussion of SQLite Database'" > > Date: Wednesday, May 27, 2009, 4:51 PM > I've reviewed the code over and over > again, but am yet to find it. But it > is a good bit of very complicated code (blowfish and > compression code). > > That's why I was asking about the format of the > database. Since I'm able to > successfully do a .dump but the integrity_check whines, I'm > thinking that > maybe if I focus on where what could cause that, it might > give me a clue. > > For example, if the tree is stored at the end of file maybe > I'm falling out > the loop without writing the final bytes. But if the > tree is not localized > to one area of the file, I'll have to rethink my approach. > > > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] > On Behalf Of Marcus Grimm > Sent: Wednesday, May 27, 2009 2:44 PM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] corrupt database recovery > > so, if you think it is a coding error on your side it will > be a hard work to find the source. I can only image side > effects > on an used API like sqlite3 with the classics: > > - stack frame overload in a function that calls sqlite > functions > - using a local stack variable as a return pointer and > reuse > this external. > - memory overwriting on malloc'ed pointers or reusing of > allready > free'd memory pages. > > one and two might be found using a code review and > identify > local variables or particular arrays that are written: > So wherever you call anything from sqlite check the stack > declaration and review the usage of these variables... > of course just don't do number two... ;) > > memory errors might be detected using special debug > versions of the > malloc/free library, by code review, or manually by adding > some > test variables whereever you malloc or free a pointer. > > I'm sure you allready did some of these... anyway, good > luck > > are you using threads? would be another beautiful issue to > trace...;) > > Marcus > > > > > It is a server class machine running Windows 2003 with > 4 GB. No, it's a > > local drive with 20GB free on it. > > > > I'm sure that it's a coding error on my part. > SQLite is very stable, in > > my > > opinion. I'm just trying to get a rough idea on > where I'm screwing up the > > database. > > > > > > -Original Message- > > From: sqlite-users-boun...@sqlite.org > > [mailto:sqlite-users-boun...@sqlite.org] > On Behalf Of John Elrick > > Sent: Wednesday, May 27, 2009 12:58 PM > > To: General Discussion of SQLite Database > > Subject: Re: [sqlite] corrupt database recovery > > > > What platform? Any chance they are using a > network drive? > > > > > > John > > > > Gene wrote: > >> My code is outside the database layer. So I > do all my database work, > >> then > >> compress and encrypt it. No errors are > returned anywhere. I'm guessing > >> that it's going to be an uninitialized variable or > byte alignment > >> problems > >> somewhere. > >> > >> This code is running on hundreds of machines > without a problem and I've > >> never reproduced it but every now and again I get > a support ticket > >> showing > > a > >> corrupt database. So I'm trying to figure > out WHERE to look. > >> > >> -Original Message- > >> From: sqlite-users-boun...@sqlite.org > >> [mailto:sqlite-users-boun...@sqlite.org] > On Behalf Of John Elrick > >> Sent: Wednesday, May 27, 2009 10:59 AM > >> To: General Discussion of SQLite Database > >> Subject: Re: [sqlite] corrupt database recovery > >> > >> Gene Allen wrote: > >> > >>> Ok...it's happened again and I've decided that > I need to track this > >>> down > >>> once and for all! > >>> > >>> Here is what I'm seeing: I get errors when I > do a integrity_check (see > >>> below), but I can .dump it to a text file and > then .read it into > >>> another > >
Re: [sqlite] corrupt database recovery
I might have missed something, but: What are you compressing? ehm... Are compressing the database file while sqlite is running ? > The problem is happening in the field and the program deals with a corrupt > database by renaming it, logging the error and creating a new database. > So > I only end up with the corrupt one. > > I would GUESS that the database doesn't have any free pages except those > at > the end since it's an auditing product so we never delete anything...but I > could be wrong. > > The only reason I think I'm 'dropping a byte' is that all the > compression/encryption code is byte based and the database gets corrupt > (invalid pages) but I'm able to .dump it successfully.That and I try > to > take the blame first, since it's usually my fault. (just ask my wife!) > > Right now I'm running a large test on the encryption/compression code to > see > if I can get it to break. > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Marcus Grimm > Sent: Wednesday, May 27, 2009 5:09 PM > To: 'General Discussion of SQLite Database' > Subject: Re: [sqlite] corrupt database recovery > > sorry, I don't know where sqlite stores any kind of data like > index or raw table, my suspect is that, since sqlite > usually looks for a free page, the pages are randomly > distributed over the file or added at the end if no free > page has been found. > > what I don't understand: > what makes you thinking that you may miss some bytes of a tree > (and thus of a page) ? > That is sqlite internal, except you are using your own > vfs, I guess, ...do you ? > > by the way: did your application crash prior or nearby you had this > corrupted DB ? > >> I've reviewed the code over and over again, but am yet to find it. But >> it >> is a good bit of very complicated code (blowfish and compression code). >> >> That's why I was asking about the format of the database. Since I'm >> able >> to >> successfully do a .dump but the integrity_check whines, I'm thinking >> that >> maybe if I focus on where what could cause that, it might give me a >> clue. >> >> For example, if the tree is stored at the end of file maybe I'm falling >> out >> the loop without writing the final bytes. But if the tree is not >> localized >> to one area of the file, I'll have to rethink my approach. >> >> >> -Original Message- >> From: sqlite-users-boun...@sqlite.org >> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Marcus Grimm >> Sent: Wednesday, May 27, 2009 2:44 PM >> To: General Discussion of SQLite Database >> Subject: Re: [sqlite] corrupt database recovery >> >> so, if you think it is a coding error on your side it will >> be a hard work to find the source. I can only image side effects >> on an used API like sqlite3 with the classics: >> >> - stack frame overload in a function that calls sqlite functions >> - using a local stack variable as a return pointer and reuse >> this external. >> - memory overwriting on malloc'ed pointers or reusing of allready >> free'd memory pages. >> >> one and two might be found using a code review and identify >> local variables or particular arrays that are written: >> So wherever you call anything from sqlite check the stack >> declaration and review the usage of these variables... >> of course just don't do number two... ;) >> >> memory errors might be detected using special debug versions of the >> malloc/free library, by code review, or manually by adding some >> test variables whereever you malloc or free a pointer. >> >> I'm sure you allready did some of these... anyway, good luck >> >> are you using threads? would be another beautiful issue to trace...;) >> >> Marcus >> >> >> >>> It is a server class machine running Windows 2003 with 4 GB. No, it's >>> a >>> local drive with 20GB free on it. >>> >>> I'm sure that it's a coding error on my part. SQLite is very stable, >>> in >>> my >>> opinion. I'm just trying to get a rough idea on where I'm screwing up >>> the >>> database. >>> >>> >>> -Original Message- >>> From: sqlite-users-boun...@sqlite.org >>> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick >>> Sent: Wednesday, May 27, 2009 12:58 PM >>> To: G
Re: [sqlite] corrupt database recovery
That's pretty much my test harness. Right now, I'm running a test on every file size from 0 to a 10 MB. If that works correctly (which I expect it will), I think I'm going to have to spend the time and effort setting up a massive test environment and stress the mess out of the whole system. I'm trying to avoid that since it would take a least a week to set that up. Thank you for the support, I need it. -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Doug Sent: Wednesday, May 27, 2009 5:20 PM To: 'General Discussion of SQLite Database' Subject: Re: [sqlite] corrupt database recovery I'm sorry that I can't help with the SQLite part, but couldn't you write a simple test harness that would create a file (of random size?) with random contents, encrypt/compress to a secondary file, then decompress/decrypt to a third file. Compare first and third files. If they match, delete and run test again. If they don't match you now have your debugging input. I'd think letting something like that run for a day or so would flush out boundary/buffer errors. It's not deterministic, but should be easy enough to bang out and let run on a secondary machine while you continue your main search. Good luck either way. Doug > -Original Message- > From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users- > boun...@sqlite.org] On Behalf Of Gene Allen > Sent: Wednesday, May 27, 2009 4:52 PM > To: mgr...@medcom-online.de; 'General Discussion of SQLite Database' > Subject: Re: [sqlite] corrupt database recovery > > I've reviewed the code over and over again, but am yet to find it. But > it > is a good bit of very complicated code (blowfish and compression code). > > That's why I was asking about the format of the database. Since I'm > able to > successfully do a .dump but the integrity_check whines, I'm thinking > that > maybe if I focus on where what could cause that, it might give me a > clue. > > For example, if the tree is stored at the end of file maybe I'm falling > out > the loop without writing the final bytes. But if the tree is not > localized > to one area of the file, I'll have to rethink my approach. > > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Marcus Grimm > Sent: Wednesday, May 27, 2009 2:44 PM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] corrupt database recovery > > so, if you think it is a coding error on your side it will > be a hard work to find the source. I can only image side effects > on an used API like sqlite3 with the classics: > > - stack frame overload in a function that calls sqlite functions > - using a local stack variable as a return pointer and reuse > this external. > - memory overwriting on malloc'ed pointers or reusing of allready > free'd memory pages. > > one and two might be found using a code review and identify > local variables or particular arrays that are written: > So wherever you call anything from sqlite check the stack > declaration and review the usage of these variables... > of course just don't do number two... ;) > > memory errors might be detected using special debug versions of the > malloc/free library, by code review, or manually by adding some > test variables whereever you malloc or free a pointer. > > I'm sure you allready did some of these... anyway, good luck > > are you using threads? would be another beautiful issue to trace...;) > > Marcus > > > > > It is a server class machine running Windows 2003 with 4 GB. No, > it's a > > local drive with 20GB free on it. > > > > I'm sure that it's a coding error on my part. SQLite is very stable, > in > > my > > opinion. I'm just trying to get a rough idea on where I'm screwing > up the > > database. > > > > > > -Original Message- > > From: sqlite-users-boun...@sqlite.org > > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick > > Sent: Wednesday, May 27, 2009 12:58 PM > > To: General Discussion of SQLite Database > > Subject: Re: [sqlite] corrupt database recovery > > > > What platform? Any chance they are using a network drive? > > > > > > John > > > > Gene wrote: > >> My code is outside the database layer. So I do all my database > work, > >> then > >> compress and encrypt it. No errors are returned anywhere. I'm > guessing > >> that it's going to be an uninitialized variab
Re: [sqlite] corrupt database recovery
The problem is happening in the field and the program deals with a corrupt database by renaming it, logging the error and creating a new database. So I only end up with the corrupt one. I would GUESS that the database doesn't have any free pages except those at the end since it's an auditing product so we never delete anything...but I could be wrong. The only reason I think I'm 'dropping a byte' is that all the compression/encryption code is byte based and the database gets corrupt (invalid pages) but I'm able to .dump it successfully.That and I try to take the blame first, since it's usually my fault. (just ask my wife!) Right now I'm running a large test on the encryption/compression code to see if I can get it to break. -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Marcus Grimm Sent: Wednesday, May 27, 2009 5:09 PM To: 'General Discussion of SQLite Database' Subject: Re: [sqlite] corrupt database recovery sorry, I don't know where sqlite stores any kind of data like index or raw table, my suspect is that, since sqlite usually looks for a free page, the pages are randomly distributed over the file or added at the end if no free page has been found. what I don't understand: what makes you thinking that you may miss some bytes of a tree (and thus of a page) ? That is sqlite internal, except you are using your own vfs, I guess, ...do you ? by the way: did your application crash prior or nearby you had this corrupted DB ? > I've reviewed the code over and over again, but am yet to find it. But it > is a good bit of very complicated code (blowfish and compression code). > > That's why I was asking about the format of the database. Since I'm able > to > successfully do a .dump but the integrity_check whines, I'm thinking that > maybe if I focus on where what could cause that, it might give me a clue. > > For example, if the tree is stored at the end of file maybe I'm falling > out > the loop without writing the final bytes. But if the tree is not > localized > to one area of the file, I'll have to rethink my approach. > > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Marcus Grimm > Sent: Wednesday, May 27, 2009 2:44 PM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] corrupt database recovery > > so, if you think it is a coding error on your side it will > be a hard work to find the source. I can only image side effects > on an used API like sqlite3 with the classics: > > - stack frame overload in a function that calls sqlite functions > - using a local stack variable as a return pointer and reuse > this external. > - memory overwriting on malloc'ed pointers or reusing of allready > free'd memory pages. > > one and two might be found using a code review and identify > local variables or particular arrays that are written: > So wherever you call anything from sqlite check the stack > declaration and review the usage of these variables... > of course just don't do number two... ;) > > memory errors might be detected using special debug versions of the > malloc/free library, by code review, or manually by adding some > test variables whereever you malloc or free a pointer. > > I'm sure you allready did some of these... anyway, good luck > > are you using threads? would be another beautiful issue to trace...;) > > Marcus > > > >> It is a server class machine running Windows 2003 with 4 GB. No, it's a >> local drive with 20GB free on it. >> >> I'm sure that it's a coding error on my part. SQLite is very stable, in >> my >> opinion. I'm just trying to get a rough idea on where I'm screwing up >> the >> database. >> >> >> -Original Message- >> From: sqlite-users-boun...@sqlite.org >> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick >> Sent: Wednesday, May 27, 2009 12:58 PM >> To: General Discussion of SQLite Database >> Subject: Re: [sqlite] corrupt database recovery >> >> What platform? Any chance they are using a network drive? >> >> >> John >> >> Gene wrote: >>> My code is outside the database layer. So I do all my database work, >>> then >>> compress and encrypt it. No errors are returned anywhere. I'm >>> guessing >>> that it's going to be an uninitialized variable or byte alignment >>> problems >>> somewhere. >>> >>> This code is running on hundreds of machines without a problem and I
Re: [sqlite] corrupt database recovery
I'm sorry that I can't help with the SQLite part, but couldn't you write a simple test harness that would create a file (of random size?) with random contents, encrypt/compress to a secondary file, then decompress/decrypt to a third file. Compare first and third files. If they match, delete and run test again. If they don't match you now have your debugging input. I'd think letting something like that run for a day or so would flush out boundary/buffer errors. It's not deterministic, but should be easy enough to bang out and let run on a secondary machine while you continue your main search. Good luck either way. Doug > -Original Message- > From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users- > boun...@sqlite.org] On Behalf Of Gene Allen > Sent: Wednesday, May 27, 2009 4:52 PM > To: mgr...@medcom-online.de; 'General Discussion of SQLite Database' > Subject: Re: [sqlite] corrupt database recovery > > I've reviewed the code over and over again, but am yet to find it. But > it > is a good bit of very complicated code (blowfish and compression code). > > That's why I was asking about the format of the database. Since I'm > able to > successfully do a .dump but the integrity_check whines, I'm thinking > that > maybe if I focus on where what could cause that, it might give me a > clue. > > For example, if the tree is stored at the end of file maybe I'm falling > out > the loop without writing the final bytes. But if the tree is not > localized > to one area of the file, I'll have to rethink my approach. > > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Marcus Grimm > Sent: Wednesday, May 27, 2009 2:44 PM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] corrupt database recovery > > so, if you think it is a coding error on your side it will > be a hard work to find the source. I can only image side effects > on an used API like sqlite3 with the classics: > > - stack frame overload in a function that calls sqlite functions > - using a local stack variable as a return pointer and reuse > this external. > - memory overwriting on malloc'ed pointers or reusing of allready > free'd memory pages. > > one and two might be found using a code review and identify > local variables or particular arrays that are written: > So wherever you call anything from sqlite check the stack > declaration and review the usage of these variables... > of course just don't do number two... ;) > > memory errors might be detected using special debug versions of the > malloc/free library, by code review, or manually by adding some > test variables whereever you malloc or free a pointer. > > I'm sure you allready did some of these... anyway, good luck > > are you using threads? would be another beautiful issue to trace...;) > > Marcus > > > > > It is a server class machine running Windows 2003 with 4 GB. No, > it's a > > local drive with 20GB free on it. > > > > I'm sure that it's a coding error on my part. SQLite is very stable, > in > > my > > opinion. I'm just trying to get a rough idea on where I'm screwing > up the > > database. > > > > > > -Original Message- > > From: sqlite-users-boun...@sqlite.org > > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick > > Sent: Wednesday, May 27, 2009 12:58 PM > > To: General Discussion of SQLite Database > > Subject: Re: [sqlite] corrupt database recovery > > > > What platform? Any chance they are using a network drive? > > > > > > John > > > > Gene wrote: > >> My code is outside the database layer. So I do all my database > work, > >> then > >> compress and encrypt it. No errors are returned anywhere. I'm > guessing > >> that it's going to be an uninitialized variable or byte alignment > >> problems > >> somewhere. > >> > >> This code is running on hundreds of machines without a problem and > I've > >> never reproduced it but every now and again I get a support ticket > >> showing > > a > >> corrupt database. So I'm trying to figure out WHERE to look. > >> > >> -Original Message- > >> From: sqlite-users-boun...@sqlite.org > >> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick > >> Sent: Wednesday, May 27, 2009 10:59 AM > >> To: General Discussion of SQLite Database > >> Subject: Re: [
Re: [sqlite] corrupt database recovery
sorry, I don't know where sqlite stores any kind of data like index or raw table, my suspect is that, since sqlite usually looks for a free page, the pages are randomly distributed over the file or added at the end if no free page has been found. what I don't understand: what makes you thinking that you may miss some bytes of a tree (and thus of a page) ? That is sqlite internal, except you are using your own vfs, I guess, ...do you ? by the way: did your application crash prior or nearby you had this corrupted DB ? > I've reviewed the code over and over again, but am yet to find it. But it > is a good bit of very complicated code (blowfish and compression code). > > That's why I was asking about the format of the database. Since I'm able > to > successfully do a .dump but the integrity_check whines, I'm thinking that > maybe if I focus on where what could cause that, it might give me a clue. > > For example, if the tree is stored at the end of file maybe I'm falling > out > the loop without writing the final bytes. But if the tree is not > localized > to one area of the file, I'll have to rethink my approach. > > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Marcus Grimm > Sent: Wednesday, May 27, 2009 2:44 PM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] corrupt database recovery > > so, if you think it is a coding error on your side it will > be a hard work to find the source. I can only image side effects > on an used API like sqlite3 with the classics: > > - stack frame overload in a function that calls sqlite functions > - using a local stack variable as a return pointer and reuse > this external. > - memory overwriting on malloc'ed pointers or reusing of allready > free'd memory pages. > > one and two might be found using a code review and identify > local variables or particular arrays that are written: > So wherever you call anything from sqlite check the stack > declaration and review the usage of these variables... > of course just don't do number two... ;) > > memory errors might be detected using special debug versions of the > malloc/free library, by code review, or manually by adding some > test variables whereever you malloc or free a pointer. > > I'm sure you allready did some of these... anyway, good luck > > are you using threads? would be another beautiful issue to trace...;) > > Marcus > > > >> It is a server class machine running Windows 2003 with 4 GB. No, it's a >> local drive with 20GB free on it. >> >> I'm sure that it's a coding error on my part. SQLite is very stable, in >> my >> opinion. I'm just trying to get a rough idea on where I'm screwing up >> the >> database. >> >> >> -Original Message- >> From: sqlite-users-boun...@sqlite.org >> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick >> Sent: Wednesday, May 27, 2009 12:58 PM >> To: General Discussion of SQLite Database >> Subject: Re: [sqlite] corrupt database recovery >> >> What platform? Any chance they are using a network drive? >> >> >> John >> >> Gene wrote: >>> My code is outside the database layer. So I do all my database work, >>> then >>> compress and encrypt it. No errors are returned anywhere. I'm >>> guessing >>> that it's going to be an uninitialized variable or byte alignment >>> problems >>> somewhere. >>> >>> This code is running on hundreds of machines without a problem and I've >>> never reproduced it but every now and again I get a support ticket >>> showing >> a >>> corrupt database. So I'm trying to figure out WHERE to look. >>> >>> -Original Message- >>> From: sqlite-users-boun...@sqlite.org >>> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick >>> Sent: Wednesday, May 27, 2009 10:59 AM >>> To: General Discussion of SQLite Database >>> Subject: Re: [sqlite] corrupt database recovery >>> >>> Gene Allen wrote: >>> >>>> Ok...it's happened again and I've decided that I need to track this >>>> down >>>> once and for all! >>>> >>>> Here is what I'm seeing: I get errors when I do a integrity_check (see >>>> below), but I can .dump it to a text file and then .read it into >>>> another >>>> database ok. >>>> >>>> It seems to me
Re: [sqlite] corrupt database recovery
I've reviewed the code over and over again, but am yet to find it. But it is a good bit of very complicated code (blowfish and compression code). That's why I was asking about the format of the database. Since I'm able to successfully do a .dump but the integrity_check whines, I'm thinking that maybe if I focus on where what could cause that, it might give me a clue. For example, if the tree is stored at the end of file maybe I'm falling out the loop without writing the final bytes. But if the tree is not localized to one area of the file, I'll have to rethink my approach. -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Marcus Grimm Sent: Wednesday, May 27, 2009 2:44 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] corrupt database recovery so, if you think it is a coding error on your side it will be a hard work to find the source. I can only image side effects on an used API like sqlite3 with the classics: - stack frame overload in a function that calls sqlite functions - using a local stack variable as a return pointer and reuse this external. - memory overwriting on malloc'ed pointers or reusing of allready free'd memory pages. one and two might be found using a code review and identify local variables or particular arrays that are written: So wherever you call anything from sqlite check the stack declaration and review the usage of these variables... of course just don't do number two... ;) memory errors might be detected using special debug versions of the malloc/free library, by code review, or manually by adding some test variables whereever you malloc or free a pointer. I'm sure you allready did some of these... anyway, good luck are you using threads? would be another beautiful issue to trace...;) Marcus > It is a server class machine running Windows 2003 with 4 GB. No, it's a > local drive with 20GB free on it. > > I'm sure that it's a coding error on my part. SQLite is very stable, in > my > opinion. I'm just trying to get a rough idea on where I'm screwing up the > database. > > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick > Sent: Wednesday, May 27, 2009 12:58 PM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] corrupt database recovery > > What platform? Any chance they are using a network drive? > > > John > > Gene wrote: >> My code is outside the database layer. So I do all my database work, >> then >> compress and encrypt it. No errors are returned anywhere. I'm guessing >> that it's going to be an uninitialized variable or byte alignment >> problems >> somewhere. >> >> This code is running on hundreds of machines without a problem and I've >> never reproduced it but every now and again I get a support ticket >> showing > a >> corrupt database. So I'm trying to figure out WHERE to look. >> >> -Original Message----- >> From: sqlite-users-boun...@sqlite.org >> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick >> Sent: Wednesday, May 27, 2009 10:59 AM >> To: General Discussion of SQLite Database >> Subject: Re: [sqlite] corrupt database recovery >> >> Gene Allen wrote: >> >>> Ok...it's happened again and I've decided that I need to track this >>> down >>> once and for all! >>> >>> Here is what I'm seeing: I get errors when I do a integrity_check (see >>> below), but I can .dump it to a text file and then .read it into >>> another >>> database ok. >>> >>> It seems to me that I'm screwing up an index or something. Are indexes >>> stored at the end of the database file? All I can think of is that my >>> compression/encryption routines are messing something up and I'm trying > to >>> figure out 'where' to look. >>> >>> I guess the real question is, what would I have to do to make an >>> integrity_check fail, but still let a dump work correctly? >>> >>> Many thanks for any advice on tracking down this ugliness. >>> >>> >> >> SNIP >> >> Personally, I'd refactor the code to allow me to verify the operation of >> the compression/encryption routines independently of the database >> operation. How are you injecting the compression/encryption into the >> database layer? >> >> >> John >> ___ >> sqlite-users mailing list >> sqli
Re: [sqlite] corrupt database recovery
so, if you think it is a coding error on your side it will be a hard work to find the source. I can only image side effects on an used API like sqlite3 with the classics: - stack frame overload in a function that calls sqlite functions - using a local stack variable as a return pointer and reuse this external. - memory overwriting on malloc'ed pointers or reusing of allready free'd memory pages. one and two might be found using a code review and identify local variables or particular arrays that are written: So wherever you call anything from sqlite check the stack declaration and review the usage of these variables... of course just don't do number two... ;) memory errors might be detected using special debug versions of the malloc/free library, by code review, or manually by adding some test variables whereever you malloc or free a pointer. I'm sure you allready did some of these... anyway, good luck are you using threads? would be another beautiful issue to trace...;) Marcus > It is a server class machine running Windows 2003 with 4 GB. No, it's a > local drive with 20GB free on it. > > I'm sure that it's a coding error on my part. SQLite is very stable, in > my > opinion. I'm just trying to get a rough idea on where I'm screwing up the > database. > > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick > Sent: Wednesday, May 27, 2009 12:58 PM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] corrupt database recovery > > What platform? Any chance they are using a network drive? > > > John > > Gene wrote: >> My code is outside the database layer. So I do all my database work, >> then >> compress and encrypt it. No errors are returned anywhere. I'm guessing >> that it's going to be an uninitialized variable or byte alignment >> problems >> somewhere. >> >> This code is running on hundreds of machines without a problem and I've >> never reproduced it but every now and again I get a support ticket >> showing > a >> corrupt database. So I'm trying to figure out WHERE to look. >> >> -Original Message----- >> From: sqlite-users-boun...@sqlite.org >> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick >> Sent: Wednesday, May 27, 2009 10:59 AM >> To: General Discussion of SQLite Database >> Subject: Re: [sqlite] corrupt database recovery >> >> Gene Allen wrote: >> >>> Ok...it's happened again and I've decided that I need to track this >>> down >>> once and for all! >>> >>> Here is what I'm seeing: I get errors when I do a integrity_check (see >>> below), but I can .dump it to a text file and then .read it into >>> another >>> database ok. >>> >>> It seems to me that I'm screwing up an index or something. Are indexes >>> stored at the end of the database file? All I can think of is that my >>> compression/encryption routines are messing something up and I'm trying > to >>> figure out 'where' to look. >>> >>> I guess the real question is, what would I have to do to make an >>> integrity_check fail, but still let a dump work correctly? >>> >>> Many thanks for any advice on tracking down this ugliness. >>> >>> >> >> SNIP >> >> Personally, I'd refactor the code to allow me to verify the operation of >> the compression/encryption routines independently of the database >> operation. How are you injecting the compression/encryption into the >> database layer? >> >> >> John >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> >> >> >> >> No virus found in this incoming message. >> Checked by AVG - www.avg.com >> Version: 8.0.238 / Virus Database: 270.12.39/2133 - Release Date: >> 05/25/09 > 08:16:00 >> >> > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
It is a server class machine running Windows 2003 with 4 GB. No, it's a local drive with 20GB free on it. I'm sure that it's a coding error on my part. SQLite is very stable, in my opinion. I'm just trying to get a rough idea on where I'm screwing up the database. -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick Sent: Wednesday, May 27, 2009 12:58 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] corrupt database recovery What platform? Any chance they are using a network drive? John Gene wrote: > My code is outside the database layer. So I do all my database work, then > compress and encrypt it. No errors are returned anywhere. I'm guessing > that it's going to be an uninitialized variable or byte alignment problems > somewhere. > > This code is running on hundreds of machines without a problem and I've > never reproduced it but every now and again I get a support ticket showing a > corrupt database. So I'm trying to figure out WHERE to look. > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick > Sent: Wednesday, May 27, 2009 10:59 AM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] corrupt database recovery > > Gene Allen wrote: > >> Ok...it's happened again and I've decided that I need to track this down >> once and for all! >> >> Here is what I'm seeing: I get errors when I do a integrity_check (see >> below), but I can .dump it to a text file and then .read it into another >> database ok. >> >> It seems to me that I'm screwing up an index or something. Are indexes >> stored at the end of the database file? All I can think of is that my >> compression/encryption routines are messing something up and I'm trying to >> figure out 'where' to look. >> >> I guess the real question is, what would I have to do to make an >> integrity_check fail, but still let a dump work correctly? >> >> Many thanks for any advice on tracking down this ugliness. >> >> > > SNIP > > Personally, I'd refactor the code to allow me to verify the operation of > the compression/encryption routines independently of the database > operation. How are you injecting the compression/encryption into the > database layer? > > > John > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > > > > No virus found in this incoming message. > Checked by AVG - www.avg.com > Version: 8.0.238 / Virus Database: 270.12.39/2133 - Release Date: 05/25/09 08:16:00 > > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
What platform? Any chance they are using a network drive? John Gene wrote: > My code is outside the database layer. So I do all my database work, then > compress and encrypt it. No errors are returned anywhere. I'm guessing > that it's going to be an uninitialized variable or byte alignment problems > somewhere. > > This code is running on hundreds of machines without a problem and I've > never reproduced it but every now and again I get a support ticket showing a > corrupt database. So I'm trying to figure out WHERE to look. > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick > Sent: Wednesday, May 27, 2009 10:59 AM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] corrupt database recovery > > Gene Allen wrote: > >> Ok...it's happened again and I've decided that I need to track this down >> once and for all! >> >> Here is what I'm seeing: I get errors when I do a integrity_check (see >> below), but I can .dump it to a text file and then .read it into another >> database ok. >> >> It seems to me that I'm screwing up an index or something. Are indexes >> stored at the end of the database file? All I can think of is that my >> compression/encryption routines are messing something up and I'm trying to >> figure out 'where' to look. >> >> I guess the real question is, what would I have to do to make an >> integrity_check fail, but still let a dump work correctly? >> >> Many thanks for any advice on tracking down this ugliness. >> >> > > SNIP > > Personally, I'd refactor the code to allow me to verify the operation of > the compression/encryption routines independently of the database > operation. How are you injecting the compression/encryption into the > database layer? > > > John > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > > > > No virus found in this incoming message. > Checked by AVG - www.avg.com > Version: 8.0.238 / Virus Database: 270.12.39/2133 - Release Date: 05/25/09 > 08:16:00 > > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
What platform is this? I just posted a note today that my Linux box running 2.6-25 and ext3 isn't doing fsyncs like it should, so I would be susceptible to DB corruption if my machine crashed during DB I/O. I posted a C program you could run on the specific machine with a corrupt DB to see if it's really doing synchronous I/O. The other thing you might want to check is hardware. Running memtest on the machine overnight could show something, unless you are on server-class hardware with ECC memory. Jim On 5/27/09, Gene wrote: > My code is outside the database layer. So I do all my database work, then > compress and encrypt it. No errors are returned anywhere. I'm guessing > that it's going to be an uninitialized variable or byte alignment problems > somewhere. > > This code is running on hundreds of machines without a problem and I've > never reproduced it but every now and again I get a support ticket showing a > corrupt database. So I'm trying to figure out WHERE to look. > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick > Sent: Wednesday, May 27, 2009 10:59 AM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] corrupt database recovery > > Gene Allen wrote: >> Ok...it's happened again and I've decided that I need to track this down >> once and for all! >> >> Here is what I'm seeing: I get errors when I do a integrity_check (see >> below), but I can .dump it to a text file and then .read it into another >> database ok. >> >> It seems to me that I'm screwing up an index or something. Are indexes >> stored at the end of the database file? All I can think of is that my >> compression/encryption routines are messing something up and I'm trying to >> figure out 'where' to look. >> >> I guess the real question is, what would I have to do to make an >> integrity_check fail, but still let a dump work correctly? >> >> Many thanks for any advice on tracking down this ugliness. >> > > SNIP > > Personally, I'd refactor the code to allow me to verify the operation of > the compression/encryption routines independently of the database > operation. How are you injecting the compression/encryption into the > database layer? > > > John > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Software first. Software lasts! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
My code is outside the database layer. So I do all my database work, then compress and encrypt it. No errors are returned anywhere. I'm guessing that it's going to be an uninitialized variable or byte alignment problems somewhere. This code is running on hundreds of machines without a problem and I've never reproduced it but every now and again I get a support ticket showing a corrupt database. So I'm trying to figure out WHERE to look. -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick Sent: Wednesday, May 27, 2009 10:59 AM To: General Discussion of SQLite Database Subject: Re: [sqlite] corrupt database recovery Gene Allen wrote: > Ok...it's happened again and I've decided that I need to track this down > once and for all! > > Here is what I'm seeing: I get errors when I do a integrity_check (see > below), but I can .dump it to a text file and then .read it into another > database ok. > > It seems to me that I'm screwing up an index or something. Are indexes > stored at the end of the database file? All I can think of is that my > compression/encryption routines are messing something up and I'm trying to > figure out 'where' to look. > > I guess the real question is, what would I have to do to make an > integrity_check fail, but still let a dump work correctly? > > Many thanks for any advice on tracking down this ugliness. > SNIP Personally, I'd refactor the code to allow me to verify the operation of the compression/encryption routines independently of the database operation. How are you injecting the compression/encryption into the database layer? John ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
Gene Allen wrote: > Ok...it's happened again and I've decided that I need to track this down > once and for all! > > Here is what I'm seeing: I get errors when I do a integrity_check (see > below), but I can .dump it to a text file and then .read it into another > database ok. > > It seems to me that I'm screwing up an index or something. Are indexes > stored at the end of the database file? All I can think of is that my > compression/encryption routines are messing something up and I'm trying to > figure out 'where' to look. > > I guess the real question is, what would I have to do to make an > integrity_check fail, but still let a dump work correctly? > > Many thanks for any advice on tracking down this ugliness. > SNIP Personally, I'd refactor the code to allow me to verify the operation of the compression/encryption routines independently of the database operation. How are you injecting the compression/encryption into the database layer? John ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
Ok...it's happened again and I've decided that I need to track this down once and for all! Here is what I'm seeing: I get errors when I do a integrity_check (see below), but I can .dump it to a text file and then .read it into another database ok. It seems to me that I'm screwing up an index or something. Are indexes stored at the end of the database file? All I can think of is that my compression/encryption routines are messing something up and I'm trying to figure out 'where' to look. I guess the real question is, what would I have to do to make an integrity_check fail, but still let a dump work correctly? Many thanks for any advice on tracking down this ugliness. Gene >>Output details sqlite> pragma integrity_check; *** in database main *** On tree page 3 cell 26: invalid page number 469 On tree page 3 cell 26: Child page depth differs On tree page 3 cell 27: invalid page number 490 On tree page 3 cell 28: invalid page number 511 On tree page 3 cell 29: invalid page number 533 On tree page 3 cell 30: invalid page number 554 On tree page 3 cell 31: invalid page number 576 On tree page 3 cell 32: invalid page number 598 On tree page 3 cell 33: invalid page number 620 On tree page 3 cell 34: invalid page number 642 On tree page 3 cell 35: invalid page number 666 On tree page 3 cell 36: invalid page number 688 On tree page 3 cell 37: invalid page number 709 On tree page 3 cell 38: invalid page number 730 On tree page 3 cell 39: invalid page number 752 On page 3 at right child: invalid page number 773 On tree page 419 cell 86: invalid page number 462 On tree page 419 cell 86: Child page depth differs On tree page 419 cell 87: invalid page number 463 I said...h...so I do a .dump to see if I can sqlite> .output c:\\a.sql sqlite> .dump sqlite> .quit resulting in: 05/27/2009 10:08 AM 496,051 a.sql 1 File(s)496,051 bytes 0 Dir(s) 104,974,647,296 bytes free -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Machin Sent: Saturday, April 25, 2009 8:15 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] corrupt database recovery On 26/04/2009 5:47 AM, Gene wrote: > Every now and again, we have a database that gets corrupt in the field (bad > coding on our end, not sqlite). > Hi Gene, The obvious question: why not fix the bad code? What does 'PRAGMA integrity_check;' say about these corrupt databases? > When we get one of these corrupt databases, we recover what we can by get > one row at a time by rowid, like 'select * from mytable where rowid = 1' and > we inc the row number every time. Each row we successfully recover we > insert into a clean database. How do you know when to stop incrementing the row number? Does 'select min(rowid), max(rowid) from mytable' give you believable answers? What happens when you do 'select * from mytable' ? Approximately how many rows are there? How many 'select * from mytable where rowid = ' attempts fail, and for what reasons (previously deleted, some other result code(s))? Are the failures restricted to a relatively small range of rowids? > Works pretty well, except over time we've added more and more columns, each > one that has to be pulled and bound to get it into the new table. What is causing this "pull and bind" problem, the fact that some columns weren't present initially? or just the sheer number of columns i.e. you need to "pull and bind" all columns (not just the later additions)? In any case, please explain what you mean by "pulled" and "bound". > We tried > to do an 'attach' so we could do a 'select into' the clean database directly > from the corrupt one. But as soon as we attempt to 'attach' the corrupt > database, we understandable get a 'database is malformed' error. It's not quite so understandable why 'select * from mytable where rowid = 1' doesn't get an error. > Is there an easier way to pull the good records out of a corrupt database > and put them into a new one without binding each column by hand? Can you give us an example of a row or two of (a) what you get from the 'select * from mytable where rowid = ' (b) the insert statement that you need to do to insert that data into the clean database? Doesn't have to be real data -- e.g. assume 3 columns initially, now grown to 5. What rules/procedure/recipe do you follow when producing (b) from (a) by hand? Assuming that 'select * from mytable' doesn't work, and subject to understanding the pulling and binding by hand thing, I would have thought the solution would look something like this: Write a script that loops around doing 's
Re: [sqlite] corrupt database recovery
Cool solution. On Sun, Apr 26, 2009 at 6:39 AM, Jim Wilcoxson wrote: > You could do a binary search to find the highest accessible rowid: > > select rowid where rowid = 2147483647 (fails) > select rowid where rowid = 1073741824 (fails) > ... > select rowid where rowid = 65536 (fails) > select rowid where rowid = 32768 (works!) > select rowid where rowid = 49152 (works!) > > Within 32 selects, you will find the highest accessible rowid. Then > do an insert from select * where rowid <= XXX. > > Jim > > On 4/25/09, Gene wrote: >> You are exactly right John...that is indeed what the code looks >> like...except we have over 25 columns (it's a flat table). >> >> We've already fixed the bad code, but there are some customers who have old >> versions...it didn't break very often with the old code, but it does still >> did. >> >> I haven't tried a select Min or max on the row id but a select count(*) >> returns an error...that's how I know I need to do the row by row recovery >> method. Select * from mytable also returns an error. >> >> The tables usually have tens of thousands of rows, sometimes over a couple >> hundred thousand but that's rare. >> >> What seems to work is that I do a select * from myTable where rowId = 'X' >> incing X until I get an error. After I get the error, every row higher then >> X also returns an error. So as soon as I get an error, I stop trying to >> recover more rows. > > -- > Software first. Software lasts! > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Jim Dodgen j...@dodgen.us ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
You could do a binary search to find the highest accessible rowid: select rowid where rowid = 2147483647 (fails) select rowid where rowid = 1073741824 (fails) ... select rowid where rowid = 65536 (fails) select rowid where rowid = 32768 (works!) select rowid where rowid = 49152 (works!) Within 32 selects, you will find the highest accessible rowid. Then do an insert from select * where rowid <= XXX. Jim On 4/25/09, Gene wrote: > You are exactly right John...that is indeed what the code looks > like...except we have over 25 columns (it's a flat table). > > We've already fixed the bad code, but there are some customers who have old > versions...it didn't break very often with the old code, but it does still > did. > > I haven't tried a select Min or max on the row id but a select count(*) > returns an error...that's how I know I need to do the row by row recovery > method. Select * from mytable also returns an error. > > The tables usually have tens of thousands of rows, sometimes over a couple > hundred thousand but that's rare. > > What seems to work is that I do a select * from myTable where rowId = 'X' > incing X until I get an error. After I get the error, every row higher then > X also returns an error. So as soon as I get an error, I stop trying to > recover more rows. -- Software first. Software lasts! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
Hi Gene, On Apr 25, 2009, at 3:47 PM, Gene wrote: > Every now and again, we have a database that gets corrupt in the > field (bad > coding on our end, not sqlite). How do you corrupt a database with bad coding? Just curious... -- Tito ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
It's not hard to update it...I'm just whining about that it's out of date and I hate having to deal with matching everything up so I was hoping that I could get past the Attach. I think your python script below might ease my maintenance whoas a bit. I'll port it to C# and just build my insert on the fly. Thank you. Gene -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Machin Sent: Saturday, April 25, 2009 9:30 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] corrupt database recovery On 26/04/2009 11:28 AM, Gene wrote: > You are exactly right John...that is indeed what the code looks > like...except we have over 25 columns (it's a flat table). I presume that you are referring to this: """ Write a script that loops around doing 'select * from mytable where rowid = ?' on a connection to your corrupt database and doing 'insert into mytable values(?,?,?,?,? etc etc etc)' on a connection to your clean database. """ If that's correct, then surely the only maintenance you need to do to the above when an extra column is added to your table is to add an extra two characters ',?' to the insert statement ... you don't even have to do that e.g. # Python TABLE_NAME = "mytable" NUMBER_OF_COLS = 25 question_marks = ",".join("?" * NUMBER_OF_COLS) insert_sql = "insert into %s values (%s)" % (TABLE_NAME, question_marks) AND the output from the select should be able to be pumped straight into the insert with no changes at all. AND there might even be a pragma or suchlike that will enable you to easily find the number of columns on the fly in your script ... > > We've already fixed the bad code, but there are some customers who have old > versions...it didn't break very often with the old code, but it does still > did. > > I haven't tried a select Min or max on the row id but a select count(*) > returns an error...that's how I know I need to do the row by row recovery > method. Select * from mytable also returns an error. > > The tables usually have tens of thousands of rows, sometimes over a couple > hundred thousand but that's rare. > > What seems to work is that I do a select * from myTable where rowId = 'X' > incing X until I get an error. After I get the error, every row higher then > X also returns an error. So as soon as I get an error, I stop trying to > recover more rows. Does this usually mean that you are able to recover almost all of the rows? > The pull and bind code is just ugly and we don't update our 'recovery > utility' as quickly as we make changes to the database so it tends to get > out of data. That's all. I don't understand what is "the pull and bind" code and why you would need anything other that what I've outlined. Cheers, John ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
On 26/04/2009 11:28 AM, Gene wrote: > You are exactly right John...that is indeed what the code looks > like...except we have over 25 columns (it's a flat table). I presume that you are referring to this: """ Write a script that loops around doing 'select * from mytable where rowid = ?' on a connection to your corrupt database and doing 'insert into mytable values(?,?,?,?,? etc etc etc)' on a connection to your clean database. """ If that's correct, then surely the only maintenance you need to do to the above when an extra column is added to your table is to add an extra two characters ',?' to the insert statement ... you don't even have to do that e.g. # Python TABLE_NAME = "mytable" NUMBER_OF_COLS = 25 question_marks = ",".join("?" * NUMBER_OF_COLS) insert_sql = "insert into %s values (%s)" % (TABLE_NAME, question_marks) AND the output from the select should be able to be pumped straight into the insert with no changes at all. AND there might even be a pragma or suchlike that will enable you to easily find the number of columns on the fly in your script ... > > We've already fixed the bad code, but there are some customers who have old > versions...it didn't break very often with the old code, but it does still > did. > > I haven't tried a select Min or max on the row id but a select count(*) > returns an error...that's how I know I need to do the row by row recovery > method. Select * from mytable also returns an error. > > The tables usually have tens of thousands of rows, sometimes over a couple > hundred thousand but that's rare. > > What seems to work is that I do a select * from myTable where rowId = 'X' > incing X until I get an error. After I get the error, every row higher then > X also returns an error. So as soon as I get an error, I stop trying to > recover more rows. Does this usually mean that you are able to recover almost all of the rows? > The pull and bind code is just ugly and we don't update our 'recovery > utility' as quickly as we make changes to the database so it tends to get > out of data. That's all. I don't understand what is "the pull and bind" code and why you would need anything other that what I've outlined. Cheers, John ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
You are exactly right John...that is indeed what the code looks like...except we have over 25 columns (it's a flat table). We've already fixed the bad code, but there are some customers who have old versions...it didn't break very often with the old code, but it does still did. I haven't tried a select Min or max on the row id but a select count(*) returns an error...that's how I know I need to do the row by row recovery method. Select * from mytable also returns an error. The tables usually have tens of thousands of rows, sometimes over a couple hundred thousand but that's rare. What seems to work is that I do a select * from myTable where rowId = 'X' incing X until I get an error. After I get the error, every row higher then X also returns an error. So as soon as I get an error, I stop trying to recover more rows. The pull and bind code is just ugly and we don't update our 'recovery utility' as quickly as we make changes to the database so it tends to get out of data. That's all. I haven't tried a PRAGMA integrity_check; in a long time so I can't remember what it tells me. I'll run it again. Thanks for you comments John! -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Machin Sent: Saturday, April 25, 2009 8:15 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] corrupt database recovery On 26/04/2009 5:47 AM, Gene wrote: > Every now and again, we have a database that gets corrupt in the field (bad > coding on our end, not sqlite). > Hi Gene, The obvious question: why not fix the bad code? What does 'PRAGMA integrity_check;' say about these corrupt databases? > When we get one of these corrupt databases, we recover what we can by get > one row at a time by rowid, like 'select * from mytable where rowid = 1' and > we inc the row number every time. Each row we successfully recover we > insert into a clean database. How do you know when to stop incrementing the row number? Does 'select min(rowid), max(rowid) from mytable' give you believable answers? What happens when you do 'select * from mytable' ? Approximately how many rows are there? How many 'select * from mytable where rowid = ' attempts fail, and for what reasons (previously deleted, some other result code(s))? Are the failures restricted to a relatively small range of rowids? > Works pretty well, except over time we've added more and more columns, each > one that has to be pulled and bound to get it into the new table. What is causing this "pull and bind" problem, the fact that some columns weren't present initially? or just the sheer number of columns i.e. you need to "pull and bind" all columns (not just the later additions)? In any case, please explain what you mean by "pulled" and "bound". > We tried > to do an 'attach' so we could do a 'select into' the clean database directly > from the corrupt one. But as soon as we attempt to 'attach' the corrupt > database, we understandable get a 'database is malformed' error. It's not quite so understandable why 'select * from mytable where rowid = 1' doesn't get an error. > Is there an easier way to pull the good records out of a corrupt database > and put them into a new one without binding each column by hand? Can you give us an example of a row or two of (a) what you get from the 'select * from mytable where rowid = ' (b) the insert statement that you need to do to insert that data into the clean database? Doesn't have to be real data -- e.g. assume 3 columns initially, now grown to 5. What rules/procedure/recipe do you follow when producing (b) from (a) by hand? Assuming that 'select * from mytable' doesn't work, and subject to understanding the pulling and binding by hand thing, I would have thought the solution would look something like this: Write a script that loops around doing 'select * from mytable where rowid = ?' on a connection to your corrupt database and doing 'insert into mytable values(?,?,?,?,? etc etc etc)' on a connection to your clean database. HTH, John ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] corrupt database recovery
On 26/04/2009 5:47 AM, Gene wrote: > Every now and again, we have a database that gets corrupt in the field (bad > coding on our end, not sqlite). > Hi Gene, The obvious question: why not fix the bad code? What does 'PRAGMA integrity_check;' say about these corrupt databases? > When we get one of these corrupt databases, we recover what we can by get > one row at a time by rowid, like 'select * from mytable where rowid = 1' and > we inc the row number every time. Each row we successfully recover we > insert into a clean database. How do you know when to stop incrementing the row number? Does 'select min(rowid), max(rowid) from mytable' give you believable answers? What happens when you do 'select * from mytable' ? Approximately how many rows are there? How many 'select * from mytable where rowid = ' attempts fail, and for what reasons (previously deleted, some other result code(s))? Are the failures restricted to a relatively small range of rowids? > Works pretty well, except over time we've added more and more columns, each > one that has to be pulled and bound to get it into the new table. What is causing this "pull and bind" problem, the fact that some columns weren't present initially? or just the sheer number of columns i.e. you need to "pull and bind" all columns (not just the later additions)? In any case, please explain what you mean by "pulled" and "bound". > We tried > to do an 'attach' so we could do a 'select into' the clean database directly > from the corrupt one. But as soon as we attempt to 'attach' the corrupt > database, we understandable get a 'database is malformed' error. It's not quite so understandable why 'select * from mytable where rowid = 1' doesn't get an error. > Is there an easier way to pull the good records out of a corrupt database > and put them into a new one without binding each column by hand? Can you give us an example of a row or two of (a) what you get from the 'select * from mytable where rowid = ' (b) the insert statement that you need to do to insert that data into the clean database? Doesn't have to be real data -- e.g. assume 3 columns initially, now grown to 5. What rules/procedure/recipe do you follow when producing (b) from (a) by hand? Assuming that 'select * from mytable' doesn't work, and subject to understanding the pulling and binding by hand thing, I would have thought the solution would look something like this: Write a script that loops around doing 'select * from mytable where rowid = ?' on a connection to your corrupt database and doing 'insert into mytable values(?,?,?,?,? etc etc etc)' on a connection to your clean database. HTH, John ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users