Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-06 Thread Greywolf
On Wed, 5 Feb 2003, Tom Lane wrote:

[TL: Could be.  By heritage I meant BSD-without-any-adjective.  It is
[TL: perfectly clear from Leffler, McKusick et al. (_The Design and
[TL: Implementation of the 4.3BSD UNIX Operating System_) that back then,
[TL: 8K was the standard filesystem block size.

FS block size !=  Disk Buffer Size.  Though 8k might have been the
standard FS block size, it was possible -- and occasionally practiced
-- to do 4k/512 filesystems, or 16k/2k filesystems, or M/N filesystems
where { 4k <= M <= 16k (maybe 32k), log2(M) == int(log2(M)),
log2(N) == int(log2(N)) and M/N = 8 }.
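
For readers who want to check a block/fragment pair against the constraint
above, here is a minimal sketch. It reads the bounds as inclusive (the 4k/512
and 16k/2k examples both have to pass) and takes the rule exactly as stated in
this post, not from any filesystem source code. The function names and the
max_block parameter are made up for illustration.

```python
def is_power_of_two(n: int) -> bool:
    """True when log2(n) is an integer, i.e. n has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def is_valid_block_frag(block: int, frag: int, max_block: int = 16 * 1024) -> bool:
    """Check an M/N (block/fragment) pair against the rule quoted in the post.

    Assumptions: 4k <= M <= max_block, M and N are powers of two, M/N == 8.
    Pass max_block=32 * 1024 to allow the "maybe 32k" case.
    """
    return (
        4 * 1024 <= block <= max_block
        and is_power_of_two(block)
        and is_power_of_two(frag)
        and block == 8 * frag
    )


if __name__ == "__main__":
    for block, frag in [(4096, 512), (8192, 1024), (16384, 2048), (8192, 2048)]:
        print(f"{block}/{frag}: {is_valid_block_frag(block, frag)}")
```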


--*greywolf;
--
NetBSD: making all computer hardware a commodity.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-06 Thread Ian Fry
On Wed, Feb 05, 2003 at 12:18:29PM -0500, Tom Lane wrote:
 D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
  On Wednesday 05 February 2003 11:49, Tom Lane wrote:
  I wonder if it is possible that, every so often,
  you are losing just the last few bytes of an NFS transfer?
  Yah, that's kind of what it looked like when I tried this before
  Christmas too although the actual errors differed.
 Wild thought here: can you reduce the MTU on the LAN linking the NFS
 server to the NetBSD box?  If so, does it help?

How about adjusting the read and write-size used by the NetBSD machine? I think
the default is 32k for both read and write on i386 machines now. Perhaps try
setting them back to 8k (it's the -r and -w flags to mount_nfs, IIRC)
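
A small sketch of what that invocation might look like, written as a helper
that only builds the argument list and does not run anything. The -r and -w
flags are the ones named above (from memory, so check them against the
mount_nfs man page on your system); the server name and paths are made-up
placeholders.

```python
import shlex


def mount_nfs_command(remote: str, mountpoint: str,
                      read_size: int = 8192, write_size: int = 8192) -> list[str]:
    """Build (but do not execute) a mount_nfs argument list.

    -r sets the read size and -w the write size, as suggested in the post.
    """
    if read_size <= 0 or write_size <= 0:
        raise ValueError("transfer sizes must be positive")
    return ["mount_nfs", "-r", str(read_size), "-w", str(write_size),
            remote, mountpoint]


if __name__ == "__main__":
    # "filer:/vol/pgdata" and "/mnt/pgdata" are hypothetical names.
    print(shlex.join(mount_nfs_command("filer:/vol/pgdata", "/mnt/pgdata")))
```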

Ian.





Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-06 Thread Greywolf
On Wed, 5 Feb 2003, D'Arcy J.M. Cain wrote:

[DJC: This feels rather fragile.  I doubt that it is hardware related because I had
[DJC: tried it on the other ethernet interface in the machine which was on a
[DJC: completely different network than the one I am on now.

All I can offer up is that at one point I had to reduce to 16k NFSIO
when I replaced a switch (you didn't replace a switch, did you?) between
my i386 and my sparc (my le0 and the switch didn't play nicely together;
once I got the hme0 in, everything was happy as a clam).

[DJC: What is the implication of smaller read and write size?  Will I
[DJC: necessarily take a performance hit?

I didn't start noticing observable degradation across 100TX until I
dropped NFSIO to 4k (which I did purely for benchmarking statistics).

The differences between 8k, 16k and 32k have not been noticeable
to me.  32k IO would hang my system at one point; since that time,
something appears to have been fixed.



--*greywolf;
--
NetBSD: Servers' choice!





Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-06 Thread Thor Lancelot Simon
On Wed, Feb 05, 2003 at 03:09:09PM -0500, Tom Lane wrote:
 D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
  On Wednesday 05 February 2003 13:04, Ian Fry wrote:
  How about adjusting the read and write-size used by the NetBSD machine? I
  think the default is 32k for both read and write on i386 machines now.
  Perhaps try setting them back to 8k (it's the -r and -w flags to mount_nfs,
  IIRC)
 
  Hey!  That did it.
 
 Hot diggety!
 
  So, why does this fix it?

Who knows.  One thing that I'd be interested to know is whether Darcy is
using NFSv2 or NFSv3 -- 32k requests are not, strictly speaking, within
the bounds of the v2 specification.  If he is using UDP rather than TCP
as the transport layer, another potential issue is that 32K requests will
end up as IP packets with a very large number of fragments, potentially
exposing some kind of network stack bug in which the last fragment is
dropped or corrupted (I would suspect that the likelihood of such a bug
in the NetApp stack is quite low, however).  If feasible, it is probably
better to use TCP as the transport and let it handle segmentation whether
the request size is 8K or 32K.

 I think now you file a bug report with the NetBSD kernel folk.  My
 thoughts are running in the direction of a bug having to do with
 scattering a 32K read into multiple kernel disk-cache buffers or
 gathering together multiple cache buffer contents to form a 32K write.

That doesn't make much sense to me.  Pages on i386 are 4K, so whether he
does 8K writes or 32K writes, it will always come from multiple pages in
the pagecache.

 Unless NetBSD has changed from its heritage, the kernel disk cache
 buffers are 8K, and so an 8K NFS read or write would never cross a
 cache buffer boundary.  But 32K would.

I don't know what heritage you're referring to, but it has never been
the case that NetBSD's buffer cache has used fixed-size 8K disk buffers,
and I don't believe that it was ever the case for any Net2 or 4.4-derived
system.

 Or it could be a similar bug on the NFS server's side?

That's conceivable.  Of course, a client bug is quite possible, as well,
but I don't think the mechanism you suggest is likely.

-- 
 Thor Lancelot Simon  [EMAIL PROTECTED]
   But as he knew no bad language, he had called him all the names of common
 objects that he could think of, and had screamed: You lamp!  You towel!  You
 plate! and so on.  --Sigmund Freud




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-06 Thread Andrew Gillham
On Wed, Feb 05, 2003 at 09:24:48PM +, David Laight wrote:
  If he is using UDP rather than TCP
  as the transport layer, another potential issue is that 32K requests will
  end up as IP packets with a very large number of fragments, potentially
  exposing some kind of network stack bug in which the last fragment is
  dropped or corrupted.
 
 Actually it is worse than that, and IMHO 32k UDP requests are asking for
 trouble.
 
 A 32k UDP datagram is about 22 ethernet packets.  If ANY of them is
 lost on the network, then the entire datagram is lost.  NFS must
 regenerate the request on a timeout.  The receiving system won't
 report that it is missing a fragment.

As he stated several times, he has tested with TCP mounts and observed
the same issue.  So the above issue shouldn't be related.
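
For reference, the fragment arithmetic quoted above can be checked with a
short sketch. It assumes a 1500-byte Ethernet MTU, a 20-byte IPv4 header
without options and an 8-byte UDP header, ignores the RPC/NFS headers, and
treats frame losses as independent. All of those are simplifying assumptions,
and the function names are made up.

```python
IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
UDP_HEADER = 8   # bytes


def fragment_count(payload: int, mtu: int = 1500) -> int:
    """Number of IP fragments for one UDP datagram carrying `payload` bytes."""
    # Fragment data must be a multiple of 8 bytes (except in the last one).
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    total = payload + UDP_HEADER
    return (total + per_fragment - 1) // per_fragment  # integer ceiling


def datagram_loss(frame_loss: float, fragments: int) -> float:
    """Chance that at least one fragment is lost, so the whole datagram is lost."""
    return 1.0 - (1.0 - frame_loss) ** fragments


if __name__ == "__main__":
    for size in (8192, 32768):
        n = fragment_count(size)
        print(f"{size} bytes: {n} fragments, "
              f"loss at 0.1% per frame: {datagram_loss(0.001, n):.4f}")
```

Under these assumptions a 32k request needs 23 fragments, in line with the
"about 22" figure, against 6 for an 8k request.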

 There are also a lot of ethernet cards out there which don't have
 enough buffer space for 32k of receive data.   Not to mention the
 fact that NFS can easily (at least on some systems) generate
 concurrent requests for different parts of the same file.
 
 I would suggest reducing the size back to 8k, even that causes
 trouble with some cards.

If NetBSD as an NFS client is this fragile we have problems.  The default
read/write size shouldn't be 32kB if that is not going to work reliably.

 It should also be realised that transmitting 22 full sized, back
 to back frames on the ethernet doesn't do anything for sharing
 the bandwidth between different users.  The MAC layer has to be very
 aggressive in order to get a packet in edgeways (so to speak).

So what?  If it is a switched network, which I assume it is since he was
talking to the NetApp gigabit port earlier, then this is irrelevant.  Even
the $40 Fry's switches are more or less non-blocking. 

Even if he is saturating the local *hub*, it shouldn't cause NetBSD to fail,
it would just be rude. :-)

There could be some packet mangling on the network; checking the number
of retransmissions on either end of the TCP connection should give you an
idea about that.

-Andrew




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-06 Thread Byron Servies
On February 06, 2003 at 03:50, Justin Clift wrote:
 Tom Lane wrote:
 <snip>
 Hoo boy.  I was already suspecting data corruption in the index, and
 this looks like more of the same.  My thoughts are definitely straying
 in the direction of the NFS server is dropping bits, somehow.
 
 Both this and the (admittedly unproven) bt_moveright loop suggest
 corrupted values in the cross-page links that exist at the very end of
 each btree index page.  I wonder if it is possible that, every so often,
 you are losing just the last few bytes of an NFS transfer?
 
 Hmmm... does anyone remember the name of that NFS testing tool the 
 FreeBSD guys were using?  Think it came from Apple.  They used it to 
 find and isolate bugs in the FreeBSD code a while ago.
 
 Sounds like it might be useful here.
 
 :-)
 

fsx.  See also http://www.connectathon.org

hth,

Byron




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-06 Thread Greg A. Woods
[ On Friday, January 31, 2003 at 11:54:27 (-0500), D'Arcy J.M. Cain wrote: ]
 Subject: Re: PostgreSQL, NetBSD and NFS

 On Thursday 30 January 2003 18:32, Simon J. Gerraty wrote:
  Is postgreSQL trying to lock a file perhaps?  Would seem a sensible thing
  for it to be doing...
 
 Is that a problem?  FWIW I am running statd and lockd on the NetBSD box.

NetBSD's NFS implementation only supports locking as a _server_, not a
client.

http://www.unixcircle.com/features/nfs.php

   Optional for file locking (lockd+statd):

   lockd:

   Rpc.lockd is a daemon which provides file and record-locking services
   in an NFS environment.

   FreeBSD, NetBSD and OpenBSD file locking is only supported on server
   side.

NFS server support for locking was introduced in NetBSD-1.5:

http://www.netbsd.org/Releases/formal-1.5/NetBSD-1.5.html

 * Server part of NFS locking (implemented by rpc.lockd(8)) now works.  

and as you can also see from rpc.lockd/lockd.c:


revision 1.5
date: 2000/06/07 14:34:40;  author: bouyer;  state: Exp;  lines: +67 -25
Implement file locking in lockd. All the stuff is done in userland, using
fhopen() and flock(). This means that if you kill lockd, all locks will
be relased (but you're supposed to kill statd at the same time, so
remote hosts will know it and re-establish the lock).
Tested against solaris 2.7 and linux 2.2.14 clients.
Shared lock are not handled efficiently, they're serialised in lockd when they
could be granted.



Terry Lambert has some proposed fixes to add NFS client level locking to
the FreeBSD kernel:

http://www.freebsd.org/~terry/DIFF.LOCKS.txt
http://www.freebsd.org/~terry/DIFF.LOCKS.MAN
http://www.freebsd.org/~terry/DIFF.LOCKS

-- 
Greg A. Woods

+1 416 218-0098;[EMAIL PROTECTED];   [EMAIL PROTECTED]
Planix, Inc. [EMAIL PROTECTED]; VE3TCP; Secrets of the Weird [EMAIL PROTECTED]




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-06 Thread Manuel Bouyer
On Thu, Jan 30, 2003 at 01:27:59PM -0600, Greg Copeland wrote:
 That was going to be my question too.
 
 I thought NFS didn't have some of the requisite file system behaviors
 (locking, flushing, etc. IIRC) for PostgreSQL to function correctly or
 reliably.

I don't know what locking scheme PostgreSQL uses, but in theory it should
be possible to use it over NFS:
- fflush()/msync() should work the same way on an NFS filesystem as on a
  local filesystem, provided the client and server implement the NFS
  protocol properly
- locking via temp files works over NFS, again provided the client and server
  implement the NFS protocol properly (this is why you can safely read your
  mailbox over NFS, for example). If PostgreSQL uses flock or fcntl, it's
  a problem.
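
To make "locking via temp files" concrete, here is a minimal sketch of one
well-known variant, the hard-link lock file used by mail software. It is an
illustration only: the post does not say which variant is meant, nothing here
is taken from PostgreSQL, and the function names are made up.

```python
import os
import socket
import tempfile


def acquire_lock(lock_path: str) -> bool:
    """Try to take `lock_path` with the hard-link technique.

    A uniquely named file is created next to the lock and hard-linked to the
    lock name. The link count of the unique file then tells us whether the
    link really happened, even if the reply to link() was lost on the way.
    """
    directory = os.path.dirname(os.path.abspath(lock_path))
    prefix = f".lock.{socket.gethostname()}.{os.getpid()}."
    fd, unique = tempfile.mkstemp(prefix=prefix, dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(f"{os.getpid()}\n")
        try:
            os.link(unique, lock_path)
        except OSError:
            pass  # do not trust the error alone; the link count decides
        return os.stat(unique).st_nlink == 2
    finally:
        os.unlink(unique)


def release_lock(lock_path: str) -> None:
    """Drop the lock by removing the lock file."""
    os.unlink(lock_path)
```

A caller would retry acquire_lock() with a delay until it returns True, do
its work, then call release_lock().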

-- 
Manuel Bouyer [EMAIL PROTECTED]
 NetBSD: 24 years of experience will always make the difference
--




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread James Hubbard
Justin Clift wrote:

 Hmmm... does anyone remember the name of that NFS testing tool the
 FreeBSD guys were using?  Think it came from Apple.  They used it to
 find and isolate bugs in the FreeBSD code a while ago.

 Sounds like it might be useful here.

 :-)

You can find a write-up about it here:
http://kerneltrap.org/node.php?id=327

The actual link to the source:
http://www.freebsd.org/cgi/cvsweb.cgi/src/tools/regression/fsx/

James






Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread Kevin Brown
Tom Lane wrote:
 Greg Copeland [EMAIL PROTECTED] writes:
  On Wed, 2003-02-05 at 11:18, Tom Lane wrote:
  Wild thought here: can you reduce the MTU on the LAN linking the NFS
  server to the NetBSD box?  If so, does it help?
 
  I'm curious as to why you think adjusting the MTU may have an effect on
  this.  Lowering the MTU may actually increase fragmentation, lower
  efficiency, and even exacerbate the situation.
 
 I'm thinking maybe one or both LAN cards have a problem with packets
 exceeding a certain size.

But he's using NFS over TCP, so any traffic that gets truncated or
dropped should simply result in a TCP retransmit (since the packet's
data won't match its checksum anymore, and it'll get dropped on the
floor).

Of course, if the NFS layer is actually transferring data via UDP
despite explicitly being told to mount via TCP, that's something else.
It might be useful to verify via netstat that an actual TCP connection
to the NFS server is being established and used.


Makes me wonder if this might be a problem at the NFS protocol
layer...



-- 
Kevin Brown   [EMAIL PROTECTED]




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread Justin Clift
James Hubbard wrote:
 Justin Clift wrote:
  Hmmm... does anyone remember the name of that NFS testing tool the
  FreeBSD guys were using?  Think it came from Apple.  They used it to
  find and isolate bugs in the FreeBSD code a while ago.

  Sounds like it might be useful here.

  :-)

 You can find a write-up about it here:
 http://kerneltrap.org/node.php?id=327

 The actual link to the source:
 http://www.freebsd.org/cgi/cvsweb.cgi/src/tools/regression/fsx/


Thanks James.

That's definitely the one.

D'Arcy, if you want to test if your NFS layer is stable, this might
really help.  It's a single C file that gets compiled, and you run it
against a remote NFS file.

This is supposed to be one of those tools that will try to trip up the 
NFS layer in every possible way, without violating the spec, etc.
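
As a much cruder first check than fsx, a write-then-read-back test along the
following lines might already show lost trailing bytes. This is only a
sketch, not fsx and not a substitute for it: it does none of fsx's random
truncate/mmap operations, and a read-back on the same client may be served
from the client cache, so reading from a second client or after a remount
would be a stronger test. The function names and the block pattern are made up.

```python
import os


def pattern_block(index: int, size: int) -> bytes:
    """Deterministic block content with the block index in the last 8 bytes."""
    return bytes([index % 256]) * (size - 8) + index.to_bytes(8, "big")


def verify_file(path: str, blocks: int = 64, size: int = 32768) -> list[int]:
    """Write `blocks` blocks of `size` bytes, read them back, and return the
    indexes of blocks whose contents differ from what was written."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(pattern_block(i, size))
        f.flush()
        os.fsync(f.fileno())  # push the data to the server before reading
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(size) != pattern_block(i, size):
                bad.append(i)
    return bad


if __name__ == "__main__":
    import sys
    # Usage: python verify.py /path/on/nfs/testfile
    print("mismatched blocks:", verify_file(sys.argv[1]))
```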

Hope this is useful.

:)

Regards and best wishes,

Justin Clift

 James



--
My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there.
- Indira Gandhi





Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread Tom Lane
D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
 Well, it does appear to be working but it never finishes.  Here are two 
 backtraces.  One was taken while it was running and the other after a kill 
 -9.  The primary key file should have had 322846720 bytes based on the 
 database that I was copying in but it only had 4603904 after running the 
 restore for 12 hours.  The file seems to get to a static size and just stays 
 there.  I am running another test to confirm that.

Hmm --- seems like it must be getting into an infinite loop, but where
and why?  Here is a test plan:

1. Run it, let it reach the point where the file size stops growing.

2. Attach to process with gdb.  Repeatedly do 'fin' to finish out current
function call, until the prompt doesn't come back any more.  Whichever
level of function didn't finish reasonably quickly is the one that's
looping.

3. Control-C to get control back in gdb.  Do 'fin' enough times to get
back to the looping function, but not the extra time to let it run.
Now, use 'next' repeatedly to see just what lines it's circling around
in, and print out the values of its local variables as it does so.

That info should move the investigation forward ...

From looking at your existing dumps I will hazard a guess that
_bt_moveright is looping ... but why?  And why should that happen
only with NFS?

regards, tom lane




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread D'Arcy J.M. Cain
On Wednesday 05 February 2003 10:12, Tom Lane wrote:
 D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
  Well, it does appear to be working but it never finishes.  Here are two
  backtraces.  One was taken while it was running and the other after a
  kill -9.  The primary key file should have had 322846720 bytes based on
  the database that I was copying in but it only had 4603904 after running
  the restore for 12 hours.  The file seems to get to a static size and
  just stays there.  I am running another test to confirm that.

 Hmm --- seems like it must be getting into an infinite loop, but where
 and why?  Here is a test plan:

Hmm.  This time it passed that point but this happened:

COPY certificate FROM stdin;
NOTICE:  copy: line 253677, bt_insertonpg[certificate_pkey]: parent page 
unfound - fixing branch
ERROR:  copy: line 253677, bt_fixlevel[certificate_pkey]: invalid item 
order(1) (need to recreate index)
lost synchronization with server, resetting connection

It then continued on.  It is currently stuck on the next largest table in our 
system.  I will try this if it hangs on that other table.

-- 
D'Arcy J.M. Cain darcy@{druid|vex}.net   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread Tom Lane
D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
 Hmm.  This time it passed that point but this happened:

 COPY certificate FROM stdin;
 NOTICE:  copy: line 253677, bt_insertonpg[certificate_pkey]: parent page 
 unfound - fixing branch
 ERROR:  copy: line 253677, bt_fixlevel[certificate_pkey]: invalid item 
 order(1) (need to recreate index)

Hoo boy.  I was already suspecting data corruption in the index, and
this looks like more of the same.  My thoughts are definitely straying
in the direction of the NFS server is dropping bits, somehow.

Both this and the (admittedly unproven) bt_moveright loop suggest
corrupted values in the cross-page links that exist at the very end of
each btree index page.  I wonder if it is possible that, every so often,
you are losing just the last few bytes of an NFS transfer?

regards, tom lane




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread D'Arcy J.M. Cain
On Wednesday 05 February 2003 11:49, Tom Lane wrote:
 D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
  Hmm.  This time it passed that point but this happened:
 
  COPY certificate FROM stdin;
  NOTICE:  copy: line 253677, bt_insertonpg[certificate_pkey]: parent page
  unfound - fixing branch
  ERROR:  copy: line 253677, bt_fixlevel[certificate_pkey]: invalid item
  order(1) (need to recreate index)

 Hoo boy.  I was already suspecting data corruption in the index, and
 this looks like more of the same.  My thoughts are definitely straying
 in the direction of the NFS server is dropping bits, somehow.

 Both this and the (admittedly unproven) bt_moveright loop suggest
 corrupted values in the cross-page links that exist at the very end of
 each btree index page.  I wonder if it is possible that, every so often,
 you are losing just the last few bytes of an NFS transfer?

Yah, that's kind of what it looked like when I tried this before Christmas too 
although the actual errors differed.  At that time I got a PostgreSQL error
that implied that something that was just written was not there when it went 
back.  Almost like a flushing issue.

-- 
D'Arcy J.M. Cain darcy@{druid|vex}.net   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread Tom Lane
D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
 On Wednesday 05 February 2003 11:49, Tom Lane wrote:
 I wonder if it is possible that, every so often,
 you are losing just the last few bytes of an NFS transfer?

 Yah, that's kind of what it looked like when I tried this before
 Christmas too although the actual errors differed.

The observed behavior could vary wildly depending on what data happened to
get read.

Wild thought here: can you reduce the MTU on the LAN linking the NFS
server to the NetBSD box?  If so, does it help?

regards, tom lane




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread Justin Clift
Tom Lane wrote:
 <snip>

 Hoo boy.  I was already suspecting data corruption in the index, and
 this looks like more of the same.  My thoughts are definitely straying
 in the direction of the NFS server is dropping bits, somehow.

 Both this and the (admittedly unproven) bt_moveright loop suggest
 corrupted values in the cross-page links that exist at the very end of
 each btree index page.  I wonder if it is possible that, every so often,
 you are losing just the last few bytes of an NFS transfer?


Hmmm... does anyone remember the name of that NFS testing tool the 
FreeBSD guys were using?  Think it came from Apple.  They used it to 
find and isolate bugs in the FreeBSD code a while ago.

Sounds like it might be useful here.

:-)

Regards and best wishes,

Justin Clift


			regards, tom lane



--
My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there.
- Indira Gandhi





Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread Greg Copeland
On Wed, 2003-02-05 at 11:18, Tom Lane wrote:
 D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
  On Wednesday 05 February 2003 11:49, Tom Lane wrote:
  I wonder if it is possible that, every so often,
  you are losing just the last few bytes of an NFS transfer?
 
  Yah, that's kind of what it looked like when I tried this before
  Christmas too although the actual errors differed.
 
 The observed behavior could vary wildly depending on what data happened to
 get read.
 
 Wild thought here: can you reduce the MTU on the LAN linking the NFS
 server to the NetBSD box?  If so, does it help?
 

Tom,

I'm curious as to why you think adjusting the MTU may have an effect on
this.  Lowering the MTU may actually increase fragmentation, lower
efficiency, and even exacerbate the situation.

Is this purely a diagnostic suggestion?


Regards,

-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting





Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread Tom Lane
Greg Copeland [EMAIL PROTECTED] writes:
 On Wed, 2003-02-05 at 11:18, Tom Lane wrote:
 Wild thought here: can you reduce the MTU on the LAN linking the NFS
 server to the NetBSD box?  If so, does it help?

 I'm curious as to why you think adjusting the MTU may have an effect on
 this.  Lowering the MTU may actually increase fragmentation, lower
 efficiency, and even exacerbate the situation.

I'm thinking maybe one or both LAN cards have a problem with packets
exceeding a certain size.

 Is this purely a diagnostic suggestion?

Well, if it changes anything then it would definitely show there's a
hardware problem to fix...

regards, tom lane




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread D'Arcy J.M. Cain
On Wednesday 05 February 2003 13:04, Ian Fry wrote:
  Wild thought here: can you reduce the MTU on the LAN linking the NFS
  server to the NetBSD box?  If so, does it help?

 How about adjusting the read and write-size used by the NetBSD machine? I
 think the default is 32k for both read and write on i386 machines now.
 Perhaps try setting them back to 8k (it's the -r and -w flags to mount_nfs,
 IIRC)

Hey!  That did it.  I hadn't tried that before because I had tried using the 
tcp option to mount and the docs suggested that that would do more than 
reducing the block size.  Besides, the man page didn't give the defaults and 
I was uncomfortable changing something when I didn't know from what.

So, why does this fix it?  It seems to me that it should have worked anyway.  
This feels rather fragile.  I doubt that it is hardware related because I had
tried it on the other ethernet interface in the machine which was on a 
completely different network than the one I am on now.

What is the implication of smaller read and write size?  Will I necessarily 
take a performance hit?

-- 
D'Arcy J.M. Cain darcy@{druid|vex}.net   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread Tom Lane
D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
 On Wednesday 05 February 2003 13:04, Ian Fry wrote:
 How about adjusting the read and write-size used by the NetBSD machine? I
 think the default is 32k for both read and write on i386 machines now.
 Perhaps try setting them back to 8k (it's the -r and -w flags to mount_nfs,
 IIRC)

 Hey!  That did it.

Hot diggety!

 So, why does this fix it?

I think now you file a bug report with the NetBSD kernel folk.  My
thoughts are running in the direction of a bug having to do with
scattering a 32K read into multiple kernel disk-cache buffers or
gathering together multiple cache buffer contents to form a 32K write.
Unless NetBSD has changed from its heritage, the kernel disk cache
buffers are 8K, and so an 8K NFS read or write would never cross a
cache buffer boundary.  But 32K would.

Or it could be a similar bug on the NFS server's side?

regards, tom lane




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-05 Thread Christopher Kings-Lynne
 Hmmm... does anyone remember the name of that NFS testing tool the 
 FreeBSD guys were using?  Think it came from Apple.  They used it to 
 find and isolate bugs in the FreeBSD code a while ago.

fsx

Chris





Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-02 Thread D'Arcy J.M. Cain
On Saturday 01 February 2003 15:48, Tom Lane wrote:
 More and more bizarre.  What is the hardware platform --- does it have TAS?

NetBSD on a Pentium (i386 port) so yes, it does have TAS.  I assume you were 
thinking about the spinlock emulation.

I have been looking through backend/storage/lmgr/lwlock.c and 
backend/storage/lmgr/spin.c myself and can't find any place that it can get 
into an infinite loop without making a system call within the loop.  It's 
very odd.  Also odd, why would running over NFS have any bearing on it if we 
could find such a place?

-- 
D'Arcy J.M. Cain darcy@{druid|vex}.net   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-02 Thread Tom Lane
D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
 Also odd, why would running over NFS have any bearing on it if we 
 could find such a place?

Yup, 'tis the question.  The only theory I have been able to come up
with is that there's something flaky about your network hardware,
such that Postgres sometimes reads bad data from the NFS server.
But the glaring problem with that theory is that bad data coming
from a regular disk drive generally results in error messages or
core dumps.  Silent hangs would be a new behavior AFAIR.

At this point I think you need to rebuild with --enable-debug and
--enable-cassert (if you didn't already) and then capture some
stack traces from the stuck backend.  We have to find out what the
backend thinks it's doing.

BTW: *are* we certain it's associated with NFS, and not a hardware
problem on your NetBSD box?  Can you perform the same tests running
the database off a local disk?

regards, tom lane




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-02 Thread D'Arcy J.M. Cain
On Sunday 02 February 2003 12:26, Tom Lane wrote:
 D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
  Also odd, why would running over NFS have any bearing on it if we
  could find such a place?

 Yup, 'tis the question.  The only theory I have been able to come up
 with is that there's something flaky about your network hardware,

Possible but two separate networks?

 At this point I think you need to rebuild with --enable-debug and
 --enable-cassert (if you didn't already) and then capture some
 stack traces from the stuck backend.  We have to find out what the
 backend thinks it's doing.

That was going to be my next step.

 BTW: *are* we certain it's associated with NFS, and not a hardware
 problem on your NetBSD box?  Can you perform the same tests running
 the database off a local disk?

That box is running 5 production database engines on 5 different ports.  This 
is the 6th one and the only difference is that it is running from the NFS 
mounted drive.

-- 
D'Arcy J.M. Cain darcy@{druid|vex}.net   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-01 Thread D'Arcy J.M. Cain
On Saturday 01 February 2003 13:09, Tom Lane wrote:
 Very bizarre.  Looks like the last page it read was block 104
 (851968/8192) in file /source/data/cert/base/16556/17063.  Could you
 provide a formatted dump of that page?  I'm partial to pg_filedump which
 you can get from http://sources.redhat.com/rhdb/tools.html.  Use
 switches -f -i to get a reasonably complete dump.

That's a 4.7 MB file.  The dump might be quite huge.  I can send you the file 
itself (privately) if you want.  Wouldn't that be even better?

I can tell you what the file is.  It is the primary key file for the 
certificate database which is the 8 million record table that I am trying to 
load.

-- 
D'Arcy J.M. Cain darcy@{druid|vex}.net   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-01 Thread Tom Lane
D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
 That's a 4.7 MB file.  The dump might be quite huge.

I really just want to see the dump of that one page, and maybe the pages
before and after it for comparison's sake.

regards, tom lane
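
For anyone following along, the request above can be sketched in a few lines of Python. This is only an illustration, not pg_filedump: it assumes the 8192-byte block size implied by the 851968/8192 arithmetic quoted earlier, and the helper names (block_for_offset, read_blocks) are made up for this example. It returns the raw bytes of the target page and its neighbours so they can be compared or handed to a dump tool.

```python
BLCKSZ = 8192  # assumed block size, taken from the 851968/8192 arithmetic in the thread


def block_for_offset(offset, blcksz=BLCKSZ):
    """Map a byte offset within a relation file to its block number."""
    return offset // blcksz


def read_blocks(path, first, last, blcksz=BLCKSZ):
    """Return {block_number: raw_bytes} for blocks first..last inclusive.

    Blocks past the end of the file are simply left out of the result.
    """
    pages = {}
    with open(path, "rb") as f:
        for blkno in range(first, last + 1):
            f.seek(blkno * blcksz)
            data = f.read(blcksz)
            if not data:
                break
            pages[blkno] = data
    return pages
```

With the offset from the trace, block_for_offset(851968) gives 104, so read_blocks(path, 103, 105) would fetch that page plus the ones before and after it.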




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-01 Thread Tom Lane
What was the query it failed on, exactly?  That last page it read
seems to be an empty index page --- it should have moved on to the
next index page, I'd think, rather than doing anything that could
hang up.

regards, tom lane




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-01 Thread D'Arcy J.M. Cain
On Saturday 01 February 2003 14:00, Tom Lane wrote:
 What was the query it failed on, exactly?  That last page it read
 seems to be an empty index page --- it should have moved on to the
 next index page, I'd think, rather than doing anything that could
 hang up.

Here's the log.  As you can see, nothing was logged after the COPY command.

It's possible that the file was corrupted.  I will do a new test from scratch 
now that I am not switching speeds.

-- 
D'Arcy J.M. Cain darcy@{druid|vex}.net   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

Feb  1 04:21:17 matrix pg_test[7432]: [3] DEBUG:  connection: host=192.168.10.75 user=darcy database=cert
Feb  1 04:21:17 matrix pg_test[7432]: [4] DEBUG:  InitPostgres
Feb  1 04:21:17 matrix pg_test[7432]: [5] DEBUG:  StartTransactionCommand
Feb  1 04:21:17 matrix pg_test[7432]: [6] DEBUG:  query: select getdatabaseencoding()
Feb  1 04:21:17 matrix pg_test[7432]: [7] DEBUG:  ProcessQuery
Feb  1 04:21:17 matrix pg_test[7432]: [8] DEBUG:  CommitTransactionCommand
Feb  1 04:21:17 matrix pg_test[7432]: [9] DEBUG:  StartTransactionCommand
Feb  1 04:21:17 matrix pg_test[7432]: [10] DEBUG:  query: SELECT usesuper FROM pg_user WHERE usename = 'darcy'
Feb  1 04:21:17 matrix pg_test[7432]: [11] DEBUG:  ProcessQuery
Feb  1 04:21:17 matrix pg_test[7432]: [12] DEBUG:  CommitTransactionCommand
Feb  1 04:21:17 matrix pg_test[7432]: [13] DEBUG:  StartTransactionCommand
Feb  1 04:21:17 matrix pg_test[7432]: [14] DEBUG:  query: UPDATE pg_class SET reltriggers = 0 WHERE relname = 'certificate';
Feb  1 04:21:17 matrix pg_test[7432]: [15] DEBUG:  ProcessQuery
Feb  1 04:21:17 matrix pg_test[7432]: [16] DEBUG:  CommitTransactionCommand
Feb  1 04:21:17 matrix pg_test[7432]: [17] DEBUG:  StartTransactionCommand
Feb  1 04:21:17 matrix pg_test[7432]: [18] DEBUG:  query: COPY certificate FROM stdin;
Feb  1 04:21:17 matrix pg_test[7432]: [19] DEBUG:  ProcessUtility: COPY certificate FROM stdin;
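
A small Python sketch for checking a log like the one above: it finds the last entry each backend logged before going quiet. The regular expression is an assumption fitted to the syslog lines shown here (timestamp, host, ident[pid], a bracketed sequence number, level, message); other syslog or PostgreSQL logging setups may need a different pattern, and the function name is made up for this example.

```python
import re

# Pattern fitted to the lines above, e.g.
# "Feb  1 04:21:17 matrix pg_test[7432]: [18] DEBUG:  query: COPY certificate FROM stdin;"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ [\d:]{8}) (?P<host>\S+) "
    r"(?P<ident>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"\[(?P<seq>\d+)\] (?P<level>\w+):\s+(?P<msg>.*)$"
)


def last_entry_per_backend(lines):
    """Return {pid: (sequence_number, message)} for the final line seen per backend."""
    last = {}
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            last[int(m.group("pid"))] = (int(m.group("seq")), m.group("msg"))
    return last
```

Run over the excerpt above, the only backend (pid 7432) ends on entry 19, the ProcessUtility line for the COPY, which matches the observation that nothing was logged afterwards.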





Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-01 Thread D'Arcy J.M. Cain
On Saturday 01 February 2003 14:43, Tom Lane wrote:
 D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
  Here's the log.  As you can see, nothing was logged after the COPY
  command.

 What else was going on?  As far as I can see, the code never does a
 semop unless it's waiting for some other backend process.

Nothing except the standard background processes is running.  The ktrace.out 
I gave the ftp address for has everything that that instance of PostgreSQL 
was doing.

-- 
D'Arcy J.M. Cain darcy@{druid|vex}.net   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-01 Thread Tom Lane
D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
 On Saturday 01 February 2003 14:43, Tom Lane wrote:
 What else was going on?  As far as I can see, the code never does a
 semop unless it's waiting for some other backend process.

 Nothing except the standard background processes is running.

More and more bizarre.  What is the hardware platform --- does it have TAS?

regards, tom lane




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-02-01 Thread Curt Sampson
On Fri, 31 Jan 2003, mlw wrote:

 . There are always issues with file locking across various
 platforms. I recall reading about mmap issues across NFS a while ago...

Postgres uses neither of these, IIRC, so that should be fine. (Actually,
postgres does effectively use mmap for shared memory on NetBSD, but
that's not mapping data on the NFS filesystem, so it's not an issue.)

 The NFS client may also have issues with locking, fsync, and mmap.

Any fsync problems would affect data integrity during a crash, but
nothing otherwise.

(Of course, I'm happy to be corrected on any of these issues, if someone
can point out particular parts of postgres that would fail over NFS.)
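
To make the fsync point concrete, here is a minimal Python sketch of the write-then-fsync pattern that crash safety depends on. It is an illustration only, not PostgreSQL's actual code, and durable_write is a made-up name. The assumption is the usual one: a successful fsync means the data has reached stable storage, and over NFS that promise is only as good as the client and server that implement it.

```python
import os


def durable_write(path, data):
    """Write data to path and return only after fsync has succeeded."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the OS (and, on NFS, the server) to push the data to stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)
```

If fsync returns early or lies, nothing goes wrong during normal operation; the damage only shows up after a crash, which is why this would not explain a hang.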

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-01-31 Thread D'Arcy J.M. Cain
On Thursday 30 January 2003 18:32, Simon J. Gerraty wrote:
 Is postgreSQL trying to lock a file perhaps?  Would seem a sensible thing
 for it to be doing...

Is that a problem?  FWIW I am running statd and lockd on the NetBSD box.

-- 
D'Arcy J.M. Cain darcy@{druid|vex}.net   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-01-31 Thread Bill Studenmund
On Fri, 31 Jan 2003, D'Arcy J.M. Cain wrote:

 On Thursday 30 January 2003 12:07, Tom Lane wrote:
  Perhaps the next thing to do is to strace (ktrace, trace, truss,
  whatever system-call tracing utility you got) the postmaster and
  child processes.  If we could determine what system call is hanging up,
  we might be a little closer to solving the mystery.

 Ktrace.  Yes, am doing another test at the moment - using 100Mb to 100Mb and
 TCP option to the mount.  Before I was using the default UDP and going 100Mb
 to 1000 Mb.  If this works I will try my guaranteed fail next and will add
 ktrace.  In fact, I will do that regardless.

Look at the -t option to ktrace. It controls what ktrace looks at
(syscalls, NAMEI lookups, etc.). Most importantly, you might want to NOT
include the 'i' option in there, which is in there by default. It logs the
data of all I/O transfers, which balloons the logs. While you may need the
data in the end, tracing w/o 'i' could show you the syscalls around the
failure, which might be enough.
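
As a rough sketch of the suggestion above, this Python helper just assembles a ktrace command line that leaves out the 'i' trace point unless asked for. The helper name and defaults are made up for illustration. The trace-point letters are assumed to be c (system calls), n (namei translations), s (signals) and i (I/O data); check ktrace(1) on your release before relying on them.

```python
def ktrace_command(pid, include_io=False, outfile="ktrace.out"):
    """Build an argv list for attaching ktrace to a running process.

    Assumed trace points: c = system calls, n = namei, s = signals,
    i = I/O data (left out by default because it bloats the trace file).
    """
    points = "cns"
    if include_io:
        points += "i"
    return ["ktrace", "-f", outfile, "-t", points, "-p", str(pid)]
```

For the backend in this thread that would be ktrace_command(7432), i.e. "ktrace -f ktrace.out -t cns -p 7432"; the resulting file is then read with kdump.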

Take care,

Bill





Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-01-31 Thread mlw

D'Arcy J.M. Cain wrote:
 On Thursday 30 January 2003 14:02, mlw wrote:
  Forgive my stupidity, are you running PostgreSQL with the data on an NFS
  share?

 Yes, sorry.  PostgreSQL is running from the local disk but the data is on the 
 mounted drive.

I'm not sure, I guess it could work, but NFS is a pretty poor file 
system. There are always issues with file locking across various 
platforms. I recall reading about mmap issues across NFS a while ago 
(forget the platform, sorry). Depending on the NFS server, there may be 
problems there. The NFS client may also have issues with locking, fsync, 
and mmap.

If possible, look for a network block device protocol. The file level 
NFS is probably inadequate for PostgreSQL.




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-01-31 Thread D'Arcy J.M. Cain
On Thursday 30 January 2003 12:07, Tom Lane wrote:
 D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
  I have posted before about this but I am now posting to both NetBSD and
  PostgreSQL since it seems to be some sort of interaction between the two.
   I have a NetAPP filer on which I am putting a PostgreSQL database.  I
  run PostgreSQL on a NetBSD box.  I used rsync to get the database onto
  the filer with no problem whatsoever but as soon as I try to open the
  database the NFS mount hangs and I can't do any operations on that
  mounted drive without hanging.

 That's darn odd.  But please be more specific: what's "open the
 database"?  Start the postmaster?  Start a psql?  Issue a query?

Start the postmaster.  It is possible that I have a corrupted database but I 
was using that as a debugging tool because I still don't think that the whole 
NFS subsystem should lock up.  The other time I tested it took hours to fail 
and I found it useful to have an immediate fail.

 Perhaps the next thing to do is to strace (ktrace, trace, truss,
 whatever system-call tracing utility you got) the postmaster and
 child processes.  If we could determine what system call is hanging up,
 we might be a little closer to solving the mystery.

Ktrace.  Yes, am doing another test at the moment - using 100Mb to 100Mb and 
TCP option to the mount.  Before I was using the default UDP and going 100Mb 
to 1000 Mb.  If this works I will try my guaranteed fail next and will add 
ktrace.  In fact, I will do that regardless.

-- 
D'Arcy J.M. Cain darcy@{druid|vex}.net   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-01-31 Thread D'Arcy J.M. Cain
On Thursday 30 January 2003 14:27, Greg Copeland wrote:
 That was going to be my question too.

 I thought NFS didn't have some of the requisite file system behaviors
 (locking, flushing, etc. IIRC) for PostgreSQL to function correctly or
 reliably.

 Please correct as needed.

Yes, doubly so here please.  I think I remember someone else saying that they 
use PostgreSQL over NFS so hopefully this is not the situation.

-- 
D'Arcy J.M. Cain darcy@{druid|vex}.net   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-01-31 Thread D'Arcy J.M. Cain
On Thursday 30 January 2003 14:02, mlw wrote:
 Forgive my stupidity, are you running PostgreSQL with the data on an NFS
 share?

Yes, sorry.  PostgreSQL is running from the local disk but the data is on the 
mounted drive.

-- 
D'Arcy J.M. Cain darcy@{druid|vex}.net   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-01-30 Thread Tom Lane
D'Arcy J.M. Cain [EMAIL PROTECTED] writes:
 I have posted before about this but I am now posting to both NetBSD and 
 PostgreSQL since it seems to be some sort of interaction between the two.  I 
 have a NetAPP filer on which I am putting a PostgreSQL database.  I run 
 PostgreSQL on a NetBSD box.  I used rsync to get the database onto the filer 
 with no problem whatsoever but as soon as I try to open the database the NFS 
 mount hangs and I can't do any operations on that mounted drive without 
 hanging.

That's darn odd.  But please be more specific: what's "open the
database"?  Start the postmaster?  Start a psql?  Issue a query?

 Does the shared memory stuff use disk at all?

No, I can't see that there would be any connection there.

Perhaps the next thing to do is to strace (ktrace, trace, truss,
whatever system-call tracing utility you got) the postmaster and
child processes.  If we could determine what system call is hanging up,
we might be a little closer to solving the mystery.

regards, tom lane




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-01-30 Thread Tom Lane
Greg Copeland [EMAIL PROTECTED] writes:
 That was going to be my question too.
 I thought NFS didn't have some of the requisite file system behaviors
 (locking, flushing, etc. IIRC) for PostgreSQL to function correctly or
 reliably.

Whether the thing is trustworthy is a different issue ;-).  I was just
surprised that it didn't seem to work at all.

In practice, if the NFS server never goes down then you probably haven't
got a problem.  I'm not sure you could count on the database not getting
scrambled if the NFS server crashes.  But that wasn't the question...

regards, tom lane




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-01-30 Thread Larry Rosenman


--On Thursday, January 30, 2003 16:02:17 -0500 Tom Lane [EMAIL PROTECTED] 
wrote:

Greg Copeland [EMAIL PROTECTED] writes:

That was going to be my question too.
I thought NFS didn't have some of the requisite file system behaviors
(locking, flushing, etc. IIRC) for PostgreSQL to function correctly or
reliably.


Whether the thing is trustworthy is a different issue ;-).  I was just
surprised that it didn't seem to work at all.

In practice, if the NFS server never goes down then you probably haven't
got a problem.  I'm not sure you could count on the database not getting
scrambled if the NFS server crashes.  But that wasn't the question...

FWIW I use a netapp filer for my databases here for traffic analysis and IP 
management.

The NETAPP has battery backed NVRAM and will replay the right stuff on its 
own.

Just another datapoint.

LER







--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749







[HACKERS] PostgreSQL, NetBSD and NFS

2003-01-30 Thread D'Arcy J.M. Cain
I have posted before about this but I am now posting to both NetBSD and 
PostgreSQL since it seems to be some sort of interaction between the two.  I 
have a NetAPP filer on which I am putting a PostgreSQL database.  I run 
PostgreSQL on a NetBSD box.  I used rsync to get the database onto the filer 
with no problem whatsoever but as soon as I try to open the database the NFS 
mount hangs and I can't do any operations on that mounted drive without 
hanging.  Other things continue to run but the minute I do a df or an ls on 
that drive that terminal is lost.

On the NetBSD side I get a server not responding error.  On the filer I see 
no problems at all.  A reboot of the filer doesn't correct anything.
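
A small Python sketch for checking whether the mount is wedged without losing the terminal: it runs ls in a child process and gives up after a timeout instead of blocking. The function name and the 5-second default are made up for illustration. If the child is stuck in an uninterruptible NFS wait it cannot be killed, so the sketch deliberately leaves it behind and only reports the result.

```python
import subprocess


def probe_mount(path, timeout=5.0):
    """Return "ok", "error" or "hung" for a directory listing of path."""
    proc = subprocess.Popen(
        ["ls", "-d", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Do not wait for the child here: a process blocked on a dead NFS
        # mount may never exit, and waiting would hang this process too.
        return "hung"
    return "ok" if proc.returncode == 0 else "error"
```

Calling probe_mount on the mount point before and after starting the postmaster would show when the mount stops answering.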

Since NetBSD works just fine with this until I start PostgreSQL and 
PostgreSQL, from all reports, works well with the NetApp filer, I assume that 
there is something out of the ordinary about PostgreSQL's disk access that is 
triggering some subtle bug in NetBSD.  Does the shared memory stuff use disk 
at all?  Perhaps that's the difference between PostgreSQL and other 
applications.

The NetApp people are being very helpful and are willing to follow up any 
leads people might have and may even suggest fixes if necessary.  I have 
Bcc'd the engineer on this message and will send anything I get to them.

-- 
D'Arcy J.M. Cain darcy@{druid|vex}.net   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.




Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-01-30 Thread mlw
Forgive my stupidity, are you running PostgreSQL with the data on an NFS 
share?





Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-01-30 Thread Greg Copeland
That was going to be my question too.

I thought NFS didn't have some of the requisite file system behaviors
(locking, flushing, etc. IIRC) for PostgreSQL to function correctly or
reliably.

Please correct as needed.

Regards,

Greg


On Thu, 2003-01-30 at 13:02, mlw wrote:
 Forgive my stupidity, are you running PostgreSQL with the data on an NFS 
 share?
 
 
-- 
Greg Copeland [EMAIL PROTECTED]
Copeland Computer Consulting





Re: [HACKERS] PostgreSQL, NetBSD and NFS

2003-01-30 Thread Curt Sampson
On Thu, 30 Jan 2003, D'Arcy J.M. Cain wrote:

 Does the shared memory stuff use disk at all? Perhaps that's the
 difference between PostgreSQL and other applications.

Shared memory in NetBSD is just an interface to mmap'd pages, so it can
be swapped to disk. But I assume your swap is not on NFS.

A ktrace would be helpful. Also, it would be helpful if you tried doing
an initdb to a directory on the filer to see if you can even create a
database cluster, and tried doing that or rsyncing and accessing your
data over NFS with a NetBSD system as the NFS server.

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC
