Re: [firebird-support] internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 239)

2016-10-05 Thread Robert martin r...@chreos.com [firebird-support]

Hi

I am no expert in these things, but I thought that accessing the DB file 
other than 'through' the FB server while the FB server is running could 
cause DB file corruption.  With our systems we always exclude the FDB from 
virus scanning and external backup applications. We run an FB backup 
(gbak) and then have the external backup tool back up the .fbk file.  
It seems like your external backup could be causing this, which would 
explain the regularity of the issue.
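
For illustration, the arrangement described above (gbak writes a dump through the server, and only the dump is given to the external backup tool) could be sketched as follows. This is a minimal sketch, not the poster's actual script: the function name `backup_via_gbak`, the `.partial` suffix and the example paths are assumptions, and credentials are assumed to come from the environment as in the original thread.

```shell
#!/bin/sh
# Sketch: dump the database with gbak, then expose only the finished dump
# to the external backup tool. Names and paths here are hypothetical.

# backup_via_gbak DB_CONN DUMP_FILE
# Runs gbak through the server and returns gbak's exit status.
backup_via_gbak() {
    db_conn=$1
    dump_file=$2
    tmp_file="$dump_file.partial"

    # Write to a temporary name first so the external tool never picks up
    # a half-written dump.
    if gbak -b -g "$db_conn" "$tmp_file"; then
        mv "$tmp_file" "$dump_file"
    else
        status=$?
        rm -f "$tmp_file"
        return "$status"
    fi
}

# Example (hypothetical paths):
#   backup_via_gbak localhost:/customers/example.gdb /backupdata/example.fbk
```

The external tool would then be pointed at the dump directory only, with the live database file excluded from its file list.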


Thanks
Rob




Re: [firebird-support] internal Firebird consistency check (decompression overran buffer (179), file: sqz.cpp line: 239)

2016-10-04 Thread Alexey Kovyazin a...@ib-aid.com [firebird-support]

Hi Ian,

It seems that you have a multi-volume database, is that correct?

Regards,
Alexey Kovyazin
IBSurgeon


Hi FB Support!


We have been running FB since IB4.2 days and have only previously had 
any database corruption when we've had hardware issues...until 
recently, when I've seen this error on 2 different servers that I'm 
reasonably confident are both fine! However, our setup is probably 
fairly unique. I'm not panicking at the moment as we recovered all the 
data from the backup, but as I've had this twice in a month on 
different machines I'm keen to adjust the setup so it doesn't happen 
again.



We are running about 20 customers on a server, each with their own set 
of Firebird/PHP-FPM/Nginx/custom python servers running as services 
bound to separate IP addresses – this is (in theory!) to keep each 
customer isolated from the others whilst sharing most of the resources 
on the box without too much overhead from containers or full 
virtualisation.



FB is listening as a tcpsvd service but with a shared lock folder:


exec tcpsvd -c 60 -u firebird:firebird -l $remote_bind_addr 
$remote_bind_addr $remote_bind_port /usr/sbin/fb_inet_server -i -e 
"/srv/$CUSTOMER/interbase/" -el "/tmp/firebird/tmpfs"



and with a “localhost” for our own convenience:


exec tcpsvd -c 60 -u firebird:firebird -l 127.0.0.1 127.0.0.1 3050 
/usr/sbin/fb_inet_server -i -e "/var/interbase/" -el "/tmp/firebird/tmpfs"



I had assumed that, because they were using the shared lock folder, this 
is allowed.



Each night we loop through all the customers and run:


for database in /customers/*.gdb
do
    echo "Doing $database:"

    # Back up through the server; -g skips garbage collection during backup
    echo -n "gbak "
    gbak -b -g "localhost:$database" "/backupdata/$database.gbak"
    echo $?

    # Sweep the database
    echo -n "gfix "
    gfix -sweep "localhost:$database"
    echo $?

    # Rebuild the nightly summaries
    echo -n "isql "
    echo "EXECUTE PROCEDURE nightlySQL;" | isql "localhost:$database"
    echo $?
done
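
As a hedged sketch only, the same loop could be restructured so that a failing step is reported per database and stops the remaining steps for that database. The helper names `run_step` and `nightly_for_db` are hypothetical, and the use of `basename` for the dump file name is an assumption. `-bail` is the isql switch that asks it to stop on the first error and return a non-zero status in non-interactive use; whether that would have changed the output in this particular incident is not known.

```shell
#!/bin/sh
# Sketch of the nightly loop with per-step status capture.
# Helper names are hypothetical; gbak/gfix/isql flags mirror the original loop.

# run_step LABEL COMMAND [ARGS...]
# Runs the command, prints "LABEL <status>" and returns that status.
run_step() {
    label=$1
    shift
    status=0
    "$@" || status=$?
    echo "$label $status"
    return "$status"
}

# nightly_for_db DATABASE_PATH BACKUP_DIR
# Returns 0 on success, or 1/2/3 if gbak/gfix/isql failed respectively.
nightly_for_db() {
    database=$1
    backup_dir=$2
    echo "Doing $database:"
    run_step gbak gbak -b -g "localhost:$database" \
        "$backup_dir/$(basename "$database").gbak" || return 1
    run_step gfix gfix -sweep "localhost:$database" || return 2
    # -bail: stop at the first SQL error and report it in the exit status.
    echo "EXECUTE PROCEDURE nightlySQL;" |
        run_step isql isql -bail "localhost:$database" || return 3
}

# Example driver (paths as in the original loop):
#   for database in /customers/*.gdb; do
#       nightly_for_db "$database" /backupdata || echo "FAILED: $database" >&2
#   done
```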


Whilst this backup is running we don't disable any external access and 
have a mixture of python and php clients accessing throughout the 
night using the $remote_bind_addr:$database connection string rather 
than localhost:.



Over the weekend a 12GB backup finished successfully, but errors 
started appearing in the firebird2.5.log file at the same time as the 
next database was being backed up – suggesting that either the gfix or 
isql had tripped up before the script moved on. However, the output 
from the backup was:



Doing :

gbak 0

gfix 0

isql 0


The firebird log has the next customer's sweep starting immediately 
after the first “internal Firebird consistency check (decompression 
overran buffer (179), file: sqz.cpp line: 239)” on the now corrupt 
database which suggests the isql line is “to blame” - but the 
nightlySQL doesn't do a lot, just deletes from a table and 
re-populates a load of summaries from a big table. The firebird log 
error coincides with an NGINX request for the data that the nightlySQL 
is building, but I've repeated this today and it doesn't by itself 
kill the database. I've got nothing else in either syslog or dmesg.
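
To line up the first consistency-check entry with the backup script's timing, the relevant log entries can be pulled out together with the lines just before them. The sketch below assumes the usual Firebird log layout, where a host/timestamp line and a database line precede the message; the function name `bugcheck_entries` is hypothetical and the log path is passed in, since the full path is not given here.

```shell
#!/bin/sh
# Sketch: list consistency-check entries from a Firebird log together with
# the two lines preceding each one (assumed to hold the timestamp and the
# database name).

# bugcheck_entries LOG_FILE
bugcheck_entries() {
    awk '
        /internal Firebird consistency check/ {
            if (prev2 != "") print prev2
            if (prev1 != "") print prev1
            print
            print "--"
        }
        { prev2 = prev1; prev1 = $0 }
    ' "$1"
}

# Example (the path is an assumption):
#   bugcheck_entries /var/log/firebird2.5.log | head -n 4
```

The first block printed gives the timestamp to compare against the "Doing ..." lines written by the nightly loop.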



I've run IBSurgeon against the file. It gives me the following output:


03/10/2016 11:23:53 INFO: Open database files: Z:\home\**-bad.gdb


03/10/2016 11:23:53 INFO: Analyzing database low-level structures...

03/10/2016 11:27:42 INFO: Actual PageCount: 1534112 found in database

03/10/2016 11:27:42 ERROR: Found 18 undefined or unrecognized pages.

03/10/2016 11:27:42 INFO: == DATABASE IS READY FOR DIAGNOSING AND REPAIRING.


03/10/2016 11:27:42 INFO: == Now choose "Diagnose" or "Repair". 

03/10/2016 11:51:31 INFO: --- Starting diagnose

03/10/2016 11:51:31 INFO: Running procedure: Header page check

03/10/2016 11:51:31 INFO: ODS Major = 11 (32779)

03/10/2016 11:51:31 INFO: ODS Minor = 2

03/10/2016 11:51:31 INFO: Next transaction = 13910909

03/10/2016 11:51:31 INFO: Oldest transaction = 13910907

03/10/2016 11:51:31 INFO: Oldest active = 13910908

03/10/2016 11:51:31 INFO: Oldest snapshot = 13910908

03/10/2016 11:51:31 INFO: PageSize is Ok = 8192

03/10/2016 11:51:31 INFO: Running procedure: Checking of RDB$Pages consistency


03/10/2016 11:53:42 INFO: Checking of RDB$Pages consistency: Ok

03/10/2016 11:53:42 INFO: Running procedure: Low-level check of all relations


03/10/2016 11:53:43 INFO: Relation RDB$DATABASE (1) is OK

03/10/2016 11:53:44 INFO: Relation RDB$FIELDS (2) is OK

03/10/2016 11:53:46 INFO: Relation RDB$INDEX_SEGMENTS (3) is OK

03/10/2016 11:53:47 INFO: Relation RDB$INDICES (4) is OK

03/10/2016 11:53:47 INFO: Relation RDB$RELATION_FIELDS (5) is OK

03/10/2016 11:53:48 INFO: Relation RDB$RELATIONS (6) is OK

03/10/2016 11:53:48 INFO: Relation RDB$VIEW_RELATIONS (7) is OK

03/10/2016 11:53:48 INFO: Relation RDB$FORMATS (8) is OK

03/10/2016 11:53:48 INFO: Relation RDB$SECURITY_CLASSES (9) is OK

03/10/2016 11:53:48 ERROR: DP#1533855 has wrong rel#:136

03/10/2016 11:53:48 ERROR: Found 1 record errors on datapage#1533855

03/10/2016 11:53:48 ERROR: Error on