Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-16 Thread Gregory Stark

"Stefan Kaltenbrunner" <[EMAIL PROTECTED]> writes:

> Tom Lane wrote:
>> Teodor Sigaev <[EMAIL PROTECTED]> writes:
>>> It seems to me that the last run
>>> (http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=dugong&dt=2007-09-11%2016:05:01)
>>> points to a problem with the hash implementation.
>>
>>>   SELECT to_tsvector('thesaurus_tst', 'one postgres one two one two three one');
>>> + NOTICE:  thesaurus word-sample "the" is recognized as stop-word, assign any stop-word (rule 8)

FWIW what does this message even mean? Is "assign any stop-word" just a
description of where this message is coming from? Is it imperative,
instructing the user to do something?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-16 Thread Stefan Kaltenbrunner

Tom Lane wrote:
> Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> It seems to me that the last run
>> (http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=dugong&dt=2007-09-11%2016:05:01)
>> points to a problem with the hash implementation.
>
>>   SELECT to_tsvector('thesaurus_tst', 'one postgres one two one two three one');
>> + NOTICE:  thesaurus word-sample "the" is recognized as stop-word, assign any stop-word (rule 8)
>
>> At this point in the tsdicts test the thesaurus dictionary should already be
>> loaded and initialized, but this NOTICE indicates that the thesaurus was
>> initialized here.
> 
> I just realized what that probably is actually from: there was a cache
> invalidation event sometime between when thesaurus was initially loaded
> and when this statement tried to use it, so the ts_cache entry had to
> be reloaded.  The parallel regression tests are quite capable of
> provoking sinval queue overflows (and ensuing cache resets) at fairly
> random places.
> 
> It is not good design to have any user-visible behavior that occurs during
> a cache load, because you can't predict when those will happen.  Perhaps
> this NOTICE should not be emitted, or should be emitted from some other
> place.
> 
> (In fact, this test ought to be failing right now on whichever buildfarm
> machine is supposed to be testing CLOBBER_CACHE_ALWAYS.  Which animal
> was that again?)

fwiw - I managed to trigger that exact same regression failure during my
testing of the pltcl patch on quagga too ...


Stefan


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-16 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
> It seems to me that the last run
> (http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=dugong&dt=2007-09-11%2016:05:01)
> points to a problem with the hash implementation.

>   SELECT to_tsvector('thesaurus_tst', 'one postgres one two one two three one');
> + NOTICE:  thesaurus word-sample "the" is recognized as stop-word, assign any stop-word (rule 8)

> At this point in the tsdicts test the thesaurus dictionary should already be
> loaded and initialized, but this NOTICE indicates that the thesaurus was
> initialized here.

I just realized what that probably is actually from: there was a cache
invalidation event sometime between when thesaurus was initially loaded
and when this statement tried to use it, so the ts_cache entry had to
be reloaded.  The parallel regression tests are quite capable of
provoking sinval queue overflows (and ensuing cache resets) at fairly
random places.

It is not good design to have any user-visible behavior that occurs during
a cache load, because you can't predict when those will happen.  Perhaps
this NOTICE should not be emitted, or should be emitted from some other
place.

(In fact, this test ought to be failing right now on whichever buildfarm
machine is supposed to be testing CLOBBER_CACHE_ALWAYS.  Which animal
was that again?)
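
The design concern above can be illustrated with a toy model. This is only a sketch: DictCache, cache_lookup, and run_queries are hypothetical names, not PostgreSQL's ts_cache code, and the "notice" is just a counter. It shows that when a message is emitted as a side effect of loading a cache entry, the number of messages a session sees depends on how often the entry happens to be invalidated, not on what the user ran.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-entry cache. All names are hypothetical. */
typedef struct DictCache
{
    bool valid;            /* false after an invalidation event */
    int  notices_emitted;  /* user-visible messages produced so far */
} DictCache;

/* Models a cache reset caused by an invalidation message. */
void
cache_invalidate(DictCache *c)
{
    c->valid = false;
}

/* Loading the entry re-parses the dictionary and emits its NOTICE. */
void
cache_load(DictCache *c)
{
    c->notices_emitted++;
    c->valid = true;
}

/* A lookup reloads the entry only when it has been invalidated. */
void
cache_lookup(DictCache *c)
{
    if (!c->valid)
        cache_load(c);
}

/*
 * Run n identical queries. With clobber_always set, the entry is
 * invalidated before every lookup, which is the worst case.
 * Returns how many notices the "user" saw.
 */
int
run_queries(int n, bool clobber_always)
{
    DictCache c = {false, 0};

    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        if (clobber_always)
            cache_invalidate(&c);
        cache_lookup(&c);
    }
    return c.notices_emitted;
}
```

In this model five identical queries produce one notice in the quiet case and five when the cache is clobbered before each lookup, so expected test output cannot be stable.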

regards, tom lane


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-13 Thread Tom Lane

"Sergey E. Koposov" <[EMAIL PROTECTED]> writes:
> It turned out that the offending assert is
> Assert(BgWriterShmem != NULL); in bgwriter.c:990
> After commenting it out everything works.

That's simply bizarre ...

> Also, I tried to add 'volatile' to the declaration of BgWriterShmem. After 
> that the problem disappears too.

Hm.  I don't see any very good reason in the code to add the "volatile",
and I see at least one place where we'd have to cast it away (the MemSet
at line 836).  My inclination is just to remove the Assert at line 990.
It's not proving anything, since if indeed BgWriterShmem was NULL there,
we'd dump core on the dereferences just a couple lines below.
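
A minimal sketch of why such an Assert proves little, assuming a function that dereferences the pointer right after checking it. ToyShmem, toy_shmem, and toy_forward_request are hypothetical stand-ins and do not reproduce the code in bgwriter.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared-memory struct. */
typedef struct ToyShmem
{
    int num_requests;
    int max_requests;
} ToyShmem;

ToyShmem *toy_shmem = NULL;

/* Queue one request; returns 1 on success, 0 if the queue is full. */
int
toy_forward_request(void)
{
    /*
     * This check adds little: if toy_shmem were NULL, the dereference
     * on the next line would crash the process anyway.
     */
    assert(toy_shmem != NULL);

    if (toy_shmem->num_requests >= toy_shmem->max_requests)
        return 0;
    toy_shmem->num_requests++;
    return 1;
}

/* Point toy_shmem at a two-slot queue and count accepted requests. */
int
toy_demo(void)
{
    static ToyShmem area = {0, 2};
    int accepted = 0;

    area.num_requests = 0;
    toy_shmem = &area;
    for (int i = 0; i < 5; i++)
        accepted += toy_forward_request();
    return accepted;
}
```

Removing the assert line changes nothing about what the function does when the pointer is valid, and a NULL pointer still fails immediately at the first dereference.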

Do you want this patched any further back than HEAD?  The buildfarm
status page doesn't show dugong doing any back branches ...

regards, tom lane


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-13 Thread Sergey E. Koposov


On Tue, 11 Sep 2007, Tom Lane wrote:


> Well, the first thing I'd suggest is trying to localize which Assert
> makes it fail.  From the bug's behavior I think it is highly probable
> that the problem is in fsync signalling, which puts it either in
> bgwriter.c or md.c.  Try recompiling those modules separately without
> cassert (leaving all else enabled) and see if the problem comes and
> goes; if so, comment out one Assert at a time till you find which one.


It turned out that the offending assert is
Assert(BgWriterShmem != NULL); in bgwriter.c:990
After commenting it out everything works.

Also, I tried to add 'volatile' to the declaration of BgWriterShmem. After 
that the problem disappears too.


I'm not sure this demonstrates that it's not an ICC bug, because
obviously the 'volatile' qualifier can change how the compiler works...


I tried adding the volatile keyword to BgWriterShmem in PG 8.2.4, and
indeed it solved the problem with PG 8.2.4 too.


From what I see in bgwriter.c, the volatile keyword for BgWriterShmem
seems very reasonable to me, although I'm not sure that it's really
required there.


regards,
Sergey

PS I'm sorry for the wrong information about anti-aliasing flags for ICC. 
I was obviously confused by the ICC docs.


***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Sergey E. Koposov


On Tue, 11 Sep 2007, Andrew Dunstan wrote:


> Your buildfarm member must be seriously misconfigured if you can get the logs
> from different postmasters comingled. Every run gets its own logfile in its
> own inst directory.


No, everything I'm doing about that bug now, I'm doing in a place entirely
separate from the buildfarm. The logs were mixed there.


regards,
Sergey

***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Andrew Dunstan




Sergey E. Koposov wrote:


> With hash_seq_search ERROR, it was partially a false alarm. I've had
> some old postgres daemon hanging around and writing that to the log.
> Although I remember seeing that hash_seq_search message recently when
> dealing with this bug, it does not show up in the course of standard
> regression tests.



Your buildfarm member must be seriously misconfigured if you can get the 
logs from different postmasters comingled. Every run gets its own 
logfile in its own inst directory.


cheers

andrew


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Tom Lane

"Sergey E. Koposov" <[EMAIL PROTECTED]> writes:
> Yes, indeed. After several make installcheck's
> I get
> ERROR:  too many active hash_seq_search scans, cannot start one on "smgr relation table"
> ERROR:  too many active hash_seq_search scans, cannot start one on "smgr relation table"

Hm, so that must be coming from smgrcloseall(), which is the only user
of hash_seq_search on SMgrRelationHash.  I bet that's popping up once a
second and the bgwriter is getting nothing done, because it's failing
again at the bottom of error recovery :-(.  It's a good thing you
happened to notice those messages, because this is a pretty bad bug.

Anyway, I've committed a fix for that, so we can get back to the main
question, which is why you're getting the fsync error in the first place.

regards, tom lane


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Sergey E. Koposov


On Tue, 11 Sep 2007, Tom Lane wrote:

> "Sergey E. Koposov" <[EMAIL PROTECTED]> writes:
>> On Tue, 11 Sep 2007, Tom Lane wrote:
>> NOTICE:  database "contrib_regression" does not exist, skipping
>> ERROR:  too many active hash_seq_search scans
>> ERROR:  too many active hash_seq_search scans
>> ERROR:  too many active hash_seq_search scans
>> ERROR:  too many active hash_seq_search scans
>
>> With hash_seq_search ERROR, it was partially a false alarm. I've had some
>> old postgres daemon hanging around and writing that to the log.
>> Although I remember seeing that hash_seq_search message recently when
>> dealing with this bug, it does not show up in the course of standard
>> regression tests.
>
> Yeah, it's not there on your buildfarm reports, but that's not totally
> surprising.  I would expect it to start showing up after 100 failed
> checkpoint attempts, which is how long it'd take the bgwriter's
> hash_seq_search table to overflow ...


Yes, indeed. After several make installcheck's
I get
ERROR:  too many active hash_seq_search scans, cannot start one on "smgr relation table"
ERROR:  too many active hash_seq_search scans, cannot start one on "smgr relation table"


Sergey

***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Tom Lane

"Sergey E. Koposov" <[EMAIL PROTECTED]> writes:
> On Tue, 11 Sep 2007, Tom Lane wrote:
> NOTICE:  database "contrib_regression" does not exist, skipping
> ERROR:  too many active hash_seq_search scans
> ERROR:  too many active hash_seq_search scans
> ERROR:  too many active hash_seq_search scans
> ERROR:  too many active hash_seq_search scans

> With hash_seq_search ERROR, it was partially a false alarm. I've had some 
> old postgres daemon hanging around and writing that to the log. 
> Although I remember seeing that hash_seq_search message recently when 
> dealing with this bug, it does not show up in the course of standard 
> regression tests.

Yeah, it's not there on your buildfarm reports, but that's not totally
surprising.  I would expect it to start showing up after 100 failed
checkpoint attempts, which is how long it'd take the bgwriter's
hash_seq_search table to overflow ...

regards, tom lane


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Sergey E. Koposov


On Tue, 11 Sep 2007, Tom Lane wrote:

NOTICE:  database "contrib_regression" does not exist, skipping
ERROR:  too many active hash_seq_search scans
ERROR:  too many active hash_seq_search scans
ERROR:  too many active hash_seq_search scans
ERROR:  too many active hash_seq_search scans


With hash_seq_search ERROR, it was partially a false alarm. I've had some 
old postgres daemon hanging around and writing that to the log. 
Although I remember seeing that hash_seq_search message recently when 
dealing with this bug, it does not show up in the course of standard 
regression tests.


regards,
Sergey
***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
> It seems to me that the last run
> (http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=dugong&dt=2007-09-11%2016:05:01)
> points to a problem with the hash implementation.

dynahash.c is used all over the system, though.  If it were broken by a
compiler issue, it's hard to credit that we'd be getting through all but
one or two regression tests ...

regards, tom lane


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Teodor Sigaev

It seems to me that the last run
(http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=dugong&dt=2007-09-11%2016:05:01)
points to a problem with the hash implementation.



*** ./expected/tsdicts.out  Tue Sep 11 20:05:23 2007
--- ./results/tsdicts.out   Tue Sep 11 20:18:38 2007
***************
*** 301,306 ****
--- 301,307 ----
  lword, lpart_hword, lhword
  WITH synonym, thesaurus, english_stem;
  SELECT to_tsvector('thesaurus_tst', 'one postgres one two one two three one');
+ NOTICE:  thesaurus word-sample "the" is recognized as stop-word, assign any stop-word (rule 8)


At this point in the tsdicts test the thesaurus dictionary should already be
loaded and initialized, but this NOTICE indicates that the thesaurus was
initialized here.



ERROR:  too many active hash_seq_search scans


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Tom Lane

"Sergey E. Koposov" <[EMAIL PROTECTED]> writes:
> Actually, in the log file I also see some messages about hash_seq_search:

> ERROR:  too many active hash_seq_search scans
> ERROR:  too many active hash_seq_search scans
> ERROR:  too many active hash_seq_search scans

BTW, I just made a commit to include the hash table name in this
message.  Could you update src/backend/utils/hash/dynahash.c and retry
the test?  I suspect it'll say the bgwriter's pending-ops table, but
we should verify that.

regards, tom lane


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Tom Lane

"Sergey E. Koposov" <[EMAIL PROTECTED]> writes:
> Actually, in the log file I also see some messages about hash_seq_search:

> STATEMENT:  CREATE DATABASE "contrib_regression" TEMPLATE=template0
> NOTICE:  database "contrib_regression" does not exist, skipping
> ERROR:  too many active hash_seq_search scans
> ERROR:  too many active hash_seq_search scans
> ERROR:  too many active hash_seq_search scans
> ERROR:  too many active hash_seq_search scans
> ERROR:  could not fsync segment 0 of relation 1663/16384/2617: No such file or directory
> ERROR:  checkpoint request failed

That could be a consequent effect I think --- bgwriter is lacking an
AtEOXact_HashTables call in error recovery (something I will go fix)
and so after enough fsync errors we'd start getting these.
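
The mechanism described here can be modelled in a few lines. This is a toy sketch, not dynahash.c: the TOY_ names are hypothetical, and the limit of 100 is an assumption taken from the "100 failed checkpoint attempts" figure mentioned elsewhere in this thread. Each failed checkpoint starts a scan and errors out before ending it; without a cleanup call in error recovery the registrations pile up until new scans are refused.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed limit, chosen to match the figure quoted in the thread. */
#define TOY_MAX_SEQ_SCANS 100

static int toy_num_seq_scans = 0;

/* Register a scan; false models "too many active hash_seq_search scans". */
bool
toy_register_seq_scan(void)
{
    if (toy_num_seq_scans >= TOY_MAX_SEQ_SCANS)
        return false;
    toy_num_seq_scans++;
    return true;
}

/* What an end-of-transaction cleanup call would do: forget all scans. */
void
toy_at_eoxact_cleanup(void)
{
    toy_num_seq_scans = 0;
}

/*
 * Simulate a number of checkpoint attempts that each start a scan and
 * then fail before finishing it. Returns how many attempts could not
 * even start their scan.
 */
int
toy_failed_checkpoints(int attempts, bool cleanup_on_error)
{
    int refused = 0;

    assert(attempts >= 0);
    toy_num_seq_scans = 0;
    for (int i = 0; i < attempts; i++)
    {
        if (!toy_register_seq_scan())
        {
            refused++;
            continue;
        }
        /* ... the fsync error is raised here, abandoning the scan ... */
        if (cleanup_on_error)
            toy_at_eoxact_cleanup();
    }
    return refused;
}
```

With cleanup on the error path the table never fills; without it, every attempt past the limit is refused, which matches the picture of the message appearing only after many failures.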

Anyway it seems we need to cast the net a bit wider for where the
troublesome Assert is.  I'd suggest rebuilding the whole system with
--enable-cassert, then comment out the USE_ASSERT_CHECKING #define
in pg_config.h, and "make clean/make" in one backend subdirectory
at a time till you see where it stops failing.  Then repeat at the
file level.  Divide and conquer...

regards, tom lane


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Sergey E. Koposov



Actually, in the log file I also see some messages about hash_seq_search:

STATEMENT:  CREATE DATABASE "contrib_regression" TEMPLATE=template0
NOTICE:  database "contrib_regression" does not exist, skipping
ERROR:  too many active hash_seq_search scans
ERROR:  too many active hash_seq_search scans
ERROR:  too many active hash_seq_search scans
ERROR:  too many active hash_seq_search scans
ERROR:  could not fsync segment 0 of relation 1663/16384/2617: No such file or directory
ERROR:  checkpoint request failed

I also tried to turn off asserting for dynahash.c, but it didn't help...

regards,
Sergey

***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]

Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Sergey E. Koposov



> Well, the first thing I'd suggest is trying to localize which Assert
> makes it fail.  From the bug's behavior I think it is highly probable
> that the problem is in fsync signalling, which puts it either in
> bgwriter.c or md.c.  Try recompiling those modules separately without
> cassert (leaving all else enabled) and see if the problem comes and
> goes; if so, comment out one Assert at a time till you find which one.
>
> Actually ... another possibility is that it's not directly an Assert,
> but CLOBBER_FREED_MEMORY that exposes the bug.  (This would suggest
> that the compiler is trying to re-order memory accesses around a pfree.)
> So before you get into the one-assert-at-a-time test, try with
> --enable-cassert but modify pg_config_manual.h to not define
> CLOBBER_FREED_MEMORY.


It seems that neither undefining CLOBBER_FREED_MEMORY nor disabling
casserting for md.c and bgwriter.c helps. The contrib-installcheck
still fails.


I disabled casserting for md.c and bgwriter.c by inserting
#undef USE_ASSERT_CHECKING
at the top of md.c and bgwriter.c, right after the inclusion of
postgres.h but before the other includes. (I think that is the right
way to do it.)
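
One thing worth checking with this approach, offered as an assumption about how the headers are laid out rather than something this thread confirms: if the Assert macro is defined inside postgres.h according to whether USE_ASSERT_CHECKING is set at that moment, then an #undef placed after the include no longer changes what Assert() expands to. It would only affect later #ifdef USE_ASSERT_CHECKING blocks. The sketch below shows that preprocessor behavior with hypothetical TOY_ names standing in for the real ones.

```c
/* Stands in for the configuration header: checking is switched on. */
#define TOY_ASSERT_CHECKING 1

/* Counts failed checks instead of aborting, so the effect is visible. */
int toy_assert_failures = 0;

/*
 * Stands in for the main header: the expansion of TOY_ASSERT is chosen
 * here, based on whether TOY_ASSERT_CHECKING is defined right now.
 */
#ifdef TOY_ASSERT_CHECKING
#define TOY_ASSERT(cond) ((cond) ? (void) 0 : (void) (toy_assert_failures++))
#else
#define TOY_ASSERT(cond) ((void) 0)
#endif

/* Stands in for the source file: the symbol is undefined after the header. */
#undef TOY_ASSERT_CHECKING

int
toy_check_after_undef(void)
{
    int x = 1;

    toy_assert_failures = 0;

    /* Still an active check: the macro body was fixed above. */
    TOY_ASSERT(x == 2);

#ifdef TOY_ASSERT_CHECKING
    /* Only conditional blocks like this one are switched off. */
    toy_assert_failures += 100;
#endif

    return toy_assert_failures;
}
```

Under that assumption, placing the #undef before the header that defines the macro, or turning the symbol off in the configuration header itself, would be needed to compile the checks out of a single file.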



My configure flags:
./configure --enable-cassert --enable-depend --enable-debug --enable-nls 
--enable-integer-datetimes --with-libxml LDFLAGS='-lirc -limf' 
--enable-depend --prefix=/home/math/cvs/install/ CC=ic


Regards,
Sergey

***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Tom Lane

"Sergey E. Koposov" <[EMAIL PROTECTED]> writes:
>> BTW, does ICC have any switch corresponding to gcc's -fno-strict-aliasing?
>> I see that configure tries to feed that switch to it, but it might
>> want some other spelling.

> Apparently none of the ICC manuals describe -fno-strict-aliasing,
> but ICC accepts the flag and produces the same code as with the
> '-fno-alias' flag (described in the ICC manuals).

Well, since configure.in has a separate code path for ICC anyway,
it seems like we might as well provide it the official spelling.
Any objections to a patch like this?

 if test "$GCC" = yes -a "$ICC" = no; then
   CFLAGS="$CFLAGS -Wall -Wmissing-prototypes -Wpointer-arith -Winline"
   # These work in some but not all gcc versions
   PGAC_PROG_CC_CFLAGS_OPT([-Wdeclaration-after-statement])
   PGAC_PROG_CC_CFLAGS_OPT([-Wendif-labels])
   # Disable strict-aliasing rules; needed for gcc 3.3+
   PGAC_PROG_CC_CFLAGS_OPT([-fno-strict-aliasing])
 elif test "$ICC" = yes; then
   # Intel's compiler has a bug/misoptimization in checking for
   # division by NAN (NaN == 0), -mp1 fixes it, so add it to the CFLAGS.
   PGAC_PROG_CC_CFLAGS_OPT([-mp1])
-  # Not clear if this is needed, but seems like a good idea
-  PGAC_PROG_CC_CFLAGS_OPT([-fno-strict-aliasing])
+  # ICC prefers to spell the no-strict-aliasing switch like this
+  PGAC_PROG_CC_CFLAGS_OPT([-fno-alias])
 elif test x"${CC}" = x"xlc"; then
   # AIX xlc has to have strict aliasing turned off too
   PGAC_PROG_CC_CFLAGS_OPT([-qnoansialias])
 fi

regards, tom lane


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Sergey E. Koposov


> This could be a compiler bug, or it could be our fault --- might need
> a "volatile" on some pointer or other, for example, to prevent the
> compiler from making an otherwise legitimate assumption.  So it seems
> worth chasing it down.


Tom, thank you for the directions, I'll try to do what you recommended.


> BTW, does ICC have any switch corresponding to gcc's -fno-strict-aliasing?
> I see that configure tries to feed that switch to it, but it might
> want some other spelling.


Apparently none of the ICC manuals describe -fno-strict-aliasing,
but ICC accepts the flag and produces the same code as with the
'-fno-alias' flag (described in the ICC manuals).

regards,
Sergey


***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Tom Lane

"Sergey E. Koposov" <[EMAIL PROTECTED]> writes:
> On Tue, 11 Sep 2007, Tom Lane wrote:
>> dugong has been failing contribcheck repeatably for the last day or so,
>> with a very interesting symptom: CREATE DATABASE is failing with

> The reason for that is that I've been trying to switch from 9.1 to 10.0 
> version of the ICC compiler.

Hah, interesting.

> A few notes:
> 1) without --enable-cassert everything works
> 2) with --enable-cassert, the only thing that fails in the tests is
> contrib-installcheck...
> 3) And recently I also tried compiling PG with the -O0 flag, and it actually
> worked.
> 4) Also, just now I tried to compile PG 8.2.4 and the same problem occurs.

> So, I can either switch back to 9.1 completely and forget it, or we
> can try to find or at least localize this bug (if it is ICC's fault). But to
> do that, I need some advice on how best to go about it...

Well, the first thing I'd suggest is trying to localize which Assert
makes it fail.  From the bug's behavior I think it is highly probable
that the problem is in fsync signalling, which puts it either in
bgwriter.c or md.c.  Try recompiling those modules separately without
cassert (leaving all else enabled) and see if the problem comes and
goes; if so, comment out one Assert at a time till you find which one.

Actually ... another possibility is that it's not directly an Assert,
but CLOBBER_FREED_MEMORY that exposes the bug.  (This would suggest
that the compiler is trying to re-order memory accesses around a pfree.)
So before you get into the one-assert-at-a-time test, try with
--enable-cassert but modify pg_config_manual.h to not define
CLOBBER_FREED_MEMORY.
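
For readers unfamiliar with the idea, here is a minimal sketch of what a clobber-on-free debugging option does. It is a toy under stated assumptions: toy_pfree, toy_arena, and the 0x7F fill byte are choices made for this illustration and are not PostgreSQL's allocator code. The arena is a static buffer, so reading it after the "free" is well defined in this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy switch playing the role of a clobber-freed-memory option. */
#define TOY_CLOBBER_FREED_MEMORY 1

/* Static buffer standing in for an allocated chunk. */
unsigned char toy_arena[64];

/* "Free" a chunk; with the option on, overwrite it with a marker byte. */
void
toy_pfree(unsigned char *p, size_t len)
{
    assert(len <= sizeof toy_arena);
#ifdef TOY_CLOBBER_FREED_MEMORY
    memset(p, 0x7F, len);
#else
    (void) p;
#endif
    /* A real allocator would return the chunk to a free list here. */
}

/* Write a value, free the chunk, then read through the stale pointer. */
int
toy_stale_read(void)
{
    unsigned char *chunk = toy_arena;

    chunk[0] = 42;
    toy_pfree(chunk, sizeof toy_arena);
    return chunk[0];
}
```

With clobbering enabled the stale read sees the marker byte instead of the old value, so code that touches memory after freeing it, or a compiler that moves a load past the free, misbehaves visibly instead of appearing to work.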

This could be a compiler bug, or it could be our fault --- might need
a "volatile" on some pointer or other, for example, to prevent the
compiler from making an otherwise legitimate assumption.  So it seems
worth chasing it down.

BTW, does ICC have any switch corresponding to gcc's -fno-strict-aliasing?
I see that configure tries to feed that switch to it, but it might
want some other spelling.

regards, tom lane


Re: [HACKERS] What is happening on buildfarm member dugong

2007-09-11 Thread Sergey E. Koposov


On Tue, 11 Sep 2007, Tom Lane wrote:


> dugong has been failing contribcheck repeatably for the last day or so,
> with a very interesting symptom: CREATE DATABASE is failing with


The reason for that is that I've been trying to switch from version 9.1 to
version 10.0 of the ICC compiler. A month ago I tried for the first time,
discovered a segfault due to a bug in ICC, and submitted it to Intel. Intel
recently fixed it in icc 10.0.026, and over the last several days I tried to
make the new version work with postgres. At first glance it worked, so I
upgraded the compiler for the buildfarm, but the buildfarm failed.


A few notes:
1) without --enable-cassert everything works
2) with --enable-cassert, the only thing that fails in the tests is
contrib-installcheck...
3) And recently I also tried compiling PG with the -O0 flag, and it actually
worked.
4) Also, just now I tried to compile PG 8.2.4 and the same problem occurs.

So for me the most probable explanation is an ICC bug, but
unfortunately, since it is not a pure segfault, it is a bit hard for me to
tackle...


So, I can either switch back to 9.1 completely and forget it, or we
can try to find or at least localize this bug (if it is ICC's fault). But to
do that, I need some advice on how best to go about it...


regards,
Sergey
***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]


[HACKERS] What is happening on buildfarm member dugong?

2007-09-10 Thread Tom Lane

dugong has been failing contribcheck repeatably for the last day or so,
with a very interesting symptom: CREATE DATABASE is failing with

ERROR:  could not fsync segment 0 of relation 1663/40960/41403: No such file or directory
ERROR:  checkpoint request failed
HINT:  Consult recent messages in the server log for details.

I'd think we broke something in fsync cancellation signalling, except
that AFAICS there were no CVS commits at all between the last working
build at 2007-09-09 23:05:01 UTC and the first failure at 2007-09-10
00:45:27 UTC (and even if I got the timezone conversion wrong, there are
no nearby commits in the backend/storage area).  Has this machine had
any system-software updates around then?  Can anyone suggest another
theory?

regards, tom lane

