On Mon, Jan 28, 2019 at 4:40 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Mon, Jan 28, 2019 at 10:03 AM John Naylor
> <john.nay...@2ndquadrant.com> wrote:
> >
> > On Mon, Jan 28, 2019 at 4:53 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > There are a few buildfarm failures due to this commit, see my email on
> > > pgsql-committers.  If you have time, you can also once look into
> > > those.
> >
> > I didn't see anything in common with the configs of the failed
> > members. None have a non-default BLCKSZ that I can see.
> >
>
> I have done an analysis of the different failures on buildfarm.
>
>
> 2.
> @@ -15,13 +15,9 @@
>  SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
>  ERROR:  block number 100 is out of range for relation "test_rel_forks"
>  SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
> - fsm_0
> --------
> -  8192
> -(1 row)
> -
> +ERROR:  could not open file "base/50769/50798_fsm": No such file or directory
>  SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
> -ERROR:  block number 10 is out of range for relation "test_rel_forks"
> +ERROR:  could not open file "base/50769/50798_fsm": No such file or directory
>
> This indicates that even though VACUUM was executed, the FSM did not
> get created.  This could be due to a different BLCKSZ, but the failed
> machines don't seem to have a non-default value for it.  I am not
> sure why this could happen; maybe we need to check the size of the
> relation in the failed regression database?
>

This symptom shows up on the following buildfarm critters:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2019-01-28%2005%3A05%3A22
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2019-01-28%2003%3A20%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2019-01-28%2003%3A13%3A47
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-01-28%2003%3A07%3A39

All of these seem to run with fsync=off.  Is it possible that vacuum
updated the FSM, but the change was not synced to disk, so that when we
tried to read it, we didn't get the required page?  This is just a guess.
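
A rough way to check this on one of the failed regression databases might
be the sketch below.  It assumes the test_rel_forks table from the test
above still exists there, and it only uses the fsync setting and the
standard pg_relation_size() function; a zero size for the 'fsm' fork would
suggest that no FSM file is present on disk.

```sql
-- Show whether fsync is enabled on this instance.
SHOW fsync;

-- Compare the on-disk size of the main and FSM forks, in bytes.
SELECT pg_relation_size('test_rel_forks', 'main') AS main_bytes,
       pg_relation_size('test_rel_forks', 'fsm')  AS fsm_bytes;
```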

I have checked all the buildfarm failures and see only 4 symptoms, for
each of which I have sent some initial analysis.  It would be good if you
could cross-verify it as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
