Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-26 Thread Patrick Welche

On Thu, Jan 25, 2001 at 10:13:29PM -0500, Tom Lane wrote:
 Frank Joerdens [EMAIL PROTECTED] writes:
  I just did that and ran make check 4 times. 3 times went completely
  smoothly, once I had random fail. This is the same behaviour that I saw
  when running make installcheck (76 successful most of the time,
  sometimes you get 75 out of 76 with random being the one that fails).
 
 Er, you do realize that the random test is *supposed* to fail every so
 often?  (Else it'd not be random...)  See the pages on interpreting
 regression test results in the admin guide.
 
 What troubles me is the nonrepeatable failures you saw on other tests.
 As Peter says, if "make installcheck" (serial tests) is perfectly solid
 and "make check" (parallel tests) is not, that suggests some kind of
 interprocess locking problem.  But we haven't heard about any such issue
 on Solaris.

Or simply running out of processes? Check maxproc. (I deleted the beginning of
this thread, so I may have missed something.)

Cheers,

Patrick



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-25 Thread Frank Joerdens

On Thu, Jan 25, 2001 at 12:42:45AM +0100, Peter Eisentraut wrote:
 Frank Joerdens writes:
 
 [randomly varying set of regression tests fail]
 
  Running the tests on my Linux box gives no failed tests. Must I assume
  that those failed tests indicate some issue that is detrimental to
  the proper functioning of the server on this Solaris installation? Do
  you want the regression.diffs?
 
 Could you go into src/test/regress/pg_regress.sh and edit around line 162
 
 #case $host_platform in
 #*-*-qnx* | *beos*)
 unix_sockets=no
 #*)
 #unix_sockets=yes;;
 #esac
 
 (i.e., ensure that unix_sockets is set to 'no'), and rerun 'make check'.

I just did that and ran make check 4 times. 3 times went completely
smoothly, once I had random fail. This is the same behaviour that I saw
when running make installcheck (76 successful most of the time,
sometimes you get 75 out of 76 with random being the one that fails).
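Because the failures only show up on some runs, it can help to repeat the suite a fixed number of times and keep every run's output for later comparison. The following is a minimal POSIX sh sketch; repeat_runs is a hypothetical helper, not part of the PostgreSQL tree, and it assumes the wrapped command (for example make check) exits with a non-zero status when a test fails. It uses backquotes and expr instead of $( ) and $(( )) so that it should also run under the old Bourne /bin/sh on Solaris 7.

```sh
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): run a command N times,
# save each run's combined stdout/stderr in run-<i>.log, print a one-line
# verdict per run followed by a summary line.
# Returns 0 only if every run succeeded.
repeat_runs() {
    runs=$1
    shift
    i=1
    failures=0
    while [ "$i" -le "$runs" ]; do
        if "$@" > "run-$i.log" 2>&1; then
            echo "run $i: ok"
        else
            echo "run $i: FAILED (see run-$i.log)"
            failures=`expr $failures + 1`
        fi
        i=`expr $i + 1`
    done
    echo "$failures of $runs runs failed"
    [ "$failures" -eq 0 ]
}
```

Usage, from the top of a built source tree: repeat_runs 4 make check. Since each make check run presumably rewrites src/test/regress/regression.diffs, copy that file away after a failing run if you want to keep the diffs as well.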
 
 I have experienced before that Unix sockets will cause random connection
 abortions on Solaris [ . . . ]

Isn't that _really_ bad? Random connection abortions when going over
Unix sockets?? My app does _all_ the connecting over Unix sockets?!

  I also tried using the Sun compiler, which didn't work at all.
 
 details on "didn't work" requested...

-- begin details --
$ export CC=CC
$ echo $CC
CC
$ ./configure
creating cache ./config.cache
checking host system type... sparc-sun-solaris2.7
checking which template to use... solaris
checking whether to build with locale support... no
checking whether to build with recode support... no
checking whether to build with multibyte character support... no
checking whether to build with Unicode conversion support... no
checking for default port number... 5432
checking for default soft limit on number of connections... 32
checking for gcc... CC
checking whether the C compiler (CC  ) works... yes
checking whether the C compiler (CC  ) is a cross-compiler... no
checking whether we are using GNU C... no
checking whether CC accepts -g... yes
using CFLAGS=-v
checking whether the C compiler (CC -Xa -v ) works... no
configure: error: installation or configuration problem: C compiler cannot create executables.
-- end details --

Cheers, Frank



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-25 Thread Frank Joerdens

On Thu, Jan 25, 2001 at 05:12:02PM +0100, Peter Eisentraut wrote:
 Frank Joerdens writes:
 
   I have experienced before that Unix sockets will cause random connection
   abortions on Solaris [ . . . ]
 
  Isn't that _really_ bad? Random connection abortions when going over
  Unix sockets?? My app does _all_ the connecting over Unix sockets?!
 
 That's bad, for sure.  Maybe you can check for odd conditions surrounding
 the /tmp directory, like is it on NFS, permission problems, mount options.

I have neither root nor physical access to this machine, so my options
are rather limited. However, the sysadmin told me that most of the
storage space on this box is mounted over Fibre Channel (I only have a
very hazy notion of what exactly that is) from a "storage server" that
is allegedly as fast as a local SCSI disk.

 Or is there something odd in the kernel configuration?  If I'm counting
 correctly this is the third independent report of this problem, which is
 scary.

I'll question the sysadmin about that. But why does make installcheck
work? Because it goes over TCP/IP sockets by default?

 
I also tried using the Sun compiler, which didn't work at all.
  
   details on "didn't work" requested...
 
  -- begin details --
  $ export CC=CC
 
 Using a C++ compiler to compile C code won't work.  You probably meant
 CC=cc and CXX=CC.
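In other words, cc is Sun's C compiler and CC is its C++ compiler. A minimal sketch of the invocation Peter suggests, wrapped in a hypothetical helper function so that the compiler settings apply only to the configure run; the --prefix value is the one used later in this thread, and any other configure options are passed through unchanged.

```sh
#!/bin/sh
# Hypothetical helper: run configure with Sun's C compiler (cc) as CC and
# Sun's C++ compiler (CC) as CXX. The assignments prefix the command, so
# they apply only to this configure run; extra arguments are passed through.
sun_configure() {
    CC=cc CXX=CC ./configure "$@"
}

# Usage, from the top of the source tree:
#   sun_configure --prefix=/usr/db/pgsql
```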

When I do that, make fails with the following error (after giving lots
of warnings):

"pg_dump.c", line 1063: warning: Function has no return statement : main
cc -Xa -v  -I../../../src/include -I../../../src/interfaces/libpq  -c -o common.o common.c
cc -Xa -v  -I../../../src/include -I../../../src/interfaces/libpq  -c -o pg_backup_archiver.o pg_backup_archiver.c
cc -Xa -v  -I../../../src/include -I../../../src/interfaces/libpq  -c -o pg_backup_db.o pg_backup_db.c
cc -Xa -v  -I../../../src/include -I../../../src/interfaces/libpq  -c -o pg_backup_custom.o pg_backup_custom.c
cc -Xa -v  -I../../../src/include -I../../../src/interfaces/libpq  -c -o pg_backup_files.o pg_backup_files.c
cc -Xa -v  -I../../../src/include -I../../../src/interfaces/libpq  -c -o pg_backup_null.o pg_backup_null.c
"pg_backup_null.c", line 90: controlling expressions must have scalar type
cc: acomp failed for pg_backup_null.c
make[3]: *** [pg_backup_null.o] Error 2
make[3]: Leaving directory `/usr/users/fjoerde/postgres/postgresql-7.1beta3_test/src/bin/pg_dump'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/usr/users/fjoerde/postgres/postgresql-7.1beta3_test/src/bin'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/users/fjoerde/postgres/postgresql-7.1beta3_test/src'
make: *** [all] Error 2

Regards, Frank



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-25 Thread Frank Joerdens

On Thu, Jan 25, 2001 at 05:12:02PM +0100, Peter Eisentraut wrote:
 Frank Joerdens writes:
 
   I have experienced before that Unix sockets will cause random connection
   abortions on Solaris [ . . . ]
 
  Isn't that _really_ bad? Random connection abortions when going over
  Unix sockets?? My app does _all_ the connecting over Unix sockets?!
 
 That's bad, for sure.  Maybe you can check for odd conditions surrounding
 the /tmp directory, like is it on NFS, permission problems, mount options.

I just typed

$ mount

and I get

/tmp on swap read/write/setuid on Mon Jan 22 16:39:32 2001

for the /tmp directory, which looks distinctly odd to me. What kind of
device is swap (I know what swap is normally but I didn't know you could
mount stuff there . . . )??

Regards, Frank



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-25 Thread Frank Joerdens

On Thu, Jan 25, 2001 at 12:04:40PM -0800, Ian Lance Taylor wrote:
[ . . . ]
  for the /tmp directory, which looks distinctly odd to me. What kind of
  device is swap (I know what swap is normally but I didn't know you could
  mount stuff there . . . )??
 
 That is a tmpfs file system which uses swap space for /tmp storage.
 Both swap usage and /tmp compete for the same partition on the disk.
 If you have a lot of swapping programs, you don't get to put much in
 /tmp.  If you have a lot of files in /tmp, you don't get to run many
 programs.
 
 As far as I can recall, this is a Sun specific thing.
 
 It's a reasonable idea on a stable system.  It's a pretty crummy idea
 on a development system, or one with unpredictable loads.  My
 experience is that either something goes crazy and fills up /tmp and
 then you can't run anything else and you have to reboot, or something
 goes crazy and fills up swap and then you can't write any /tmp files
 and daemon processes start to silently die and you have to reboot.
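To confirm the setup Ian describes without root access, one can look for /tmp being mounted on swap in the output of mount and then check how much room is left. Below is a minimal sketch; tmp_on_swap is a hypothetical helper that parses the Solaris mount output format quoted earlier ("/tmp on swap read/write/setuid on ..."), and it assumes a POSIX awk (on Solaris probably nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```sh
#!/bin/sh
# Hypothetical helper: read Solaris-style `mount` output on stdin, where each
# line looks like "<mount point> on <device> <options> on <date>", and exit 0
# if /tmp is mounted on swap (i.e. is a tmpfs sharing space with swap).
tmp_on_swap() {
    awk '$1 == "/tmp" && $2 == "on" && $3 == "swap" { found = 1 }
         END { exit !found }'
}

# Usage on the Solaris box:
#   if mount | tmp_on_swap; then
#       df -k /tmp    # space left in /tmp, which is shared with swap
#       swap -s       # summary of swap space allocated, reserved, available
#   fi
```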

Very peculiar, or crummy, indeed. As far as I can tell, this system is
not used by anyone besides myself at the moment (because it's still
being set up), and it is ludicrously overpowered (3 CPUs, 768 MB RAM)
for the mundane uses I am subjecting it to (installing and testing
PostgreSQL).

Regards, Frank 



Re: [HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-25 Thread Tom Lane

Frank Joerdens [EMAIL PROTECTED] writes:
 I just did that and ran make check 4 times. 3 times went completely
 smoothly, once I had random fail. This is the same behaviour that I saw
 when running make installcheck (76 successful most of the time,
 sometimes you get 75 out of 76 with random being the one that fails).

Er, you do realize that the random test is *supposed* to fail every so
often?  (Else it'd not be random...)  See the pages on interpreting
regression test results in the admin guide.

What troubles me is the nonrepeatable failures you saw on other tests.
As Peter says, if "make installcheck" (serial tests) is perfectly solid
and "make check" (parallel tests) is not, that suggests some kind of
interprocess locking problem.  But we haven't heard about any such issue
on Solaris.

regards, tom lane



[HACKERS] beta3 Solaris 7 (SPARC) port report [ Was: Looking for . . . ]

2001-01-24 Thread Frank Joerdens

On Tue, Jan 23, 2001 at 11:57:52AM -0500, Tom Lane wrote:
[ . . . ]
 After you build PG and test it, send us a port report, and we'll add
 Solaris 7 to the list of recently tested platforms.  That's how it
 works ...

The installation by simply running configure, make, make install went
completely smoothly, no hassle whatsoever (except for the
flex-is-not-present warning which I think you can ignore)! 

The system is, to be precise:

$ uname -a 

SunOS [hostname] 5.7 Generic_106541-12 sun4u sparc SUNW,Ultra-4

I did encounter some _weird_ stuff with the regression tests. Does make
check (the 'standalone' variety) not work when you've already typed make
install? (On Linux it does!) Make installcheck seems to pass
semi-reliably (why does the random test not fail on the 1st try, but on
the 2nd, and then again not on the 3rd???). Below are the dirty details.

As to what the Admin Guide says about Solaris' default shared memory
settings being too low: at least on the machine I am testing on, the
maximum segment size (shmmax) is set to 4 GB!

$ cat /etc/system |grep shm
*   exclude: sys/shmsys
set shmsys:shminfo_shmmax = 4294967295
set shmsys:shminfo_shmmin = 1
set shmsys:shminfo_shmmni = 100
set shmsys:shminfo_shmseg = 10
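For reference, 4294967295 is 2^32 - 1 bytes, i.e. one byte short of 4 GB (4096 MB of 1048576 bytes each). A minimal sketch that pulls the shmmax setting out of /etc/system and prints it in whole megabytes; shmmax_mb is a hypothetical helper, and it relies on lines beginning with * being comments in /etc/system, so that only the set shmsys:shminfo_shmmax line matches.

```sh
#!/bin/sh
# Hypothetical helper: read /etc/system-style lines on stdin and print the
# shminfo_shmmax value in whole megabytes (1 MB = 1048576 bytes, rounded down).
# Comment lines in /etc/system start with '*', so they do not match.
shmmax_mb() {
    awk -F'=' '/^set shmsys:shminfo_shmmax/ {
        gsub(/[ \t]/, "", $2)
        print int($2 / 1048576)
    }'
}

# Usage: shmmax_mb < /etc/system
# With the settings shown above this prints 4095.
```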


Cheers, Frank

-- begin dirty details --
I can start the server, connect, create databases, etc. However, running
the regression tests gives 4 failures out of 76:

     reltime              ... FAILED
     tinterval            ... FAILED
test horology             ... FAILED
test misc                 ... FAILED

I checked the timezone issue mentioned in the src/test/regress/README
file. The command

$ env TZ=PST8PDT date

returns 'Wed Jan 24 11:19:02 PST 2001', 9 hrs back, which is the time
difference between here and California, so I guess that is OK.
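The same check can be scripted so that it passes or fails on its own instead of requiring the date to be read by eye. This is a minimal sketch under the assumption that a correctly recognized PST8PDT zone reports the abbreviation PST or PDT through date +%Z; pacific_tz_ok is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: succeed if TZ=PST8PDT is reported as a Pacific zone
# (abbreviation PST in winter, PDT in summer); otherwise print what the
# system reported and fail.
pacific_tz_ok() {
    abbrev=`TZ=PST8PDT date +%Z`
    case $abbrev in
        PST|PDT) return 0 ;;
        *) echo "TZ=PST8PDT reports zone '$abbrev'" >&2
           return 1 ;;
    esac
}

# Usage: pacific_tz_ok && echo "PST8PDT looks fine"
```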

Running the tests on my Linux box gives no failed tests. Must I assume
that those failed tests indicate some issue that is detrimental to the
proper functioning of the server on this Solaris installation? Do you
want the regression.diffs?

I also tried using the Sun compiler, which didn't work at all. 

 . . . [ goes away to do more testing ] . . .

What's really weird: I just ran ./configure, make, make install and make
check again, and again 4 tests failed, but different ones!


     tinterval            ... FAILED
     inet                 ... FAILED
     comments             ... FAILED
test misc                 ... FAILED


2 things were different: a) I set the compiler explicitly to
/usr/local/bin/gcc via the CC environment variable and b) I used the
default prefix this time. I'll try again with the old settings. 

 . . . [ goes away to do more testing ] . . .

make distclean
./configure --prefix=/usr/db/pgsql
make
make check

produces 6 failures out of 76 this time! They are:

     date                 ... FAILED
     type_sanity          ... FAILED
     opr_sanity           ... FAILED
     arrays               ... FAILED
     btree_index          ... FAILED
test misc                 ... FAILED

It looks progressively worse. I'll remove the source tree and start from scratch.

 . . . [ goes away to do more testing ] . . .

6 failures out of 76 again, but different ones . . .

     interval             ... FAILED
     abstime              ... FAILED
     comments             ... FAILED
     oidjoins             ... FAILED
test horology             ... FAILED
test misc                 ... FAILED
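Since the set of failing tests changes from run to run, it is easier to compare runs once the failure lists are extracted mechanically. A minimal sketch, assuming the console output of each run was saved to a file (for instance with make check > run-1.log 2>&1); failed_tests is a hypothetical helper that looks for pg_regress lines containing "... FAILED", so a test reported as "failed (ignored)", like random, is left out.

```sh
#!/bin/sh
# Hypothetical helper: print, one per line and sorted, the names of tests that
# pg_regress reported as FAILED. Reads the files named as arguments, or stdin.
# Parallel-group lines look like "  reltime ... FAILED"; serial ones look like
# "test misc ... FAILED". Lowercase "failed (ignored)" lines do not match.
failed_tests() {
    awk '/\.\.\. *FAILED/ {
        name = ($1 == "test") ? $2 : $1
        sub(/\.+$/, "", name)   # handle "tinterval... FAILED" with no space
        print name
    }' "$@" | sort
}

# Usage: list tests that failed in only one of two saved runs
#   failed_tests run-1.log > run-1.failed
#   failed_tests run-2.log > run-2.failed
#   comm -3 run-1.failed run-2.failed
```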

 . . . [ goes away to do more testing ] . . .

This time with the already installed database after initdb:

$ make installcheck

now I get scary stuff like:

--- begin scary stuff ---
test int2 ... ERROR:  pg_atoi: error in "34.5": can't parse ".5"
ERROR:  pg_atoi: error reading "10": Result too large
ERROR:  pg_atoi: error in "asdf": can't parse "asdf"
ok
test int4 ... ERROR:  pg_atoi: error in "34.5": can't parse ".5"
ERROR:  pg_atoi: error reading "1": Result too large
ERROR:  pg_atoi: error in "asdf": can't parse "asdf"
ok
test int8 ... ok
test oid  ... ERROR:  oidin: error in "asdfasd": can't parse "asdfasd"
ERROR:  oidin: error in "99asdfasd": can't parse "asdfasd"
ok
test float4   ... ERROR:  Bad float4 input format -- overflow
--- end scary stuff ---

However, it works! All 76 tests pass.

 . . . [ goes away to do more testing ] . . .

running make installcheck again gives:

test random   ... failed (ignored)

 . . . [ goes away to do more testing ] . . .

All 76 tests pass.
-- end dirty details --