* Bruce Momjian <[EMAIL PROTECTED]> [010124 07:58] wrote:
>
> I have added this email to TODO.detail and a mention in the TODO list.
The bug mentioned here is long gone; however, the problem with
issuing non-blocking COPY commands is still present (the 8k limit
on the buffer size). I hope to fix this sometime soon, but you
shouldn't worry about the "normal" path.
There's also a bug with PQCopyEnd(sp?) where it can still block because
it automagically calls into a routine that select()'s waiting for data.
It's on my TODO list as well, but a little behind the several-thousand-line
server I'm almost finished with.
-Alfred
>
> > >> Um, I didn't have any trouble at all reproducing Patrick's complaint.
> > >> pg_dump any moderately large table (I used tenk1 from the regress
> > >> database) and try to load the script with psql. Kaboom.
> >
> > > This is after or before my latest patch?
> >
> > Before. I haven't updated since yesterday...
> >
> > > I can't seem to reproduce this problem,
> >
> > Odd. Maybe there is something different about the kernel's timing of
> > message sending on your platform. I see it very easily on HPUX 10.20,
> > and Patrick sees it very easily on whatever he's using (netbsd I think).
> > You might try varying the situation a little, say
> > 	psql mydb <dumpfile
> > 	psql -f dumpfile mydb
> > 	psql mydb
> > 		\i dumpfile
> > and the same with -h localhost (to get a TCP/IP connection instead of
> > Unix domain). At the moment (pre-patch) I see failures with the
> > first two of these, but not with the \i method. -h doesn't seem to
> > matter for me, but it might for you.
> >
> > > Telling me something is wrong, without suggestions on how
> > > to fix it or direct pointers to where it fails, doesn't help me
> > > one bit. You're not offering constructive criticism, you're not
> > > even offering valid criticism, you're just waving your finger at
> > > "problems" that you say exist but don't pin down to anything specific.
> >
> > I have been explaining it as clearly as I could. Let's try it
> > one more time.
> >
> > > I spent hours looking over what I did to pqFlush and pqPutnBytes
> > > because of what you said earlier, when all the bug seems to
> > > come down to is that I missed that the socket is now set to
> > > non-blocking in all cases.
> >
> > Letting the socket mode default to blocking will hide the problems from
> > existing clients that don't care about non-block mode. But people who
> > try to actually use the nonblock mode are going to see the same kinds of
> > problems that psql is exhibiting.
> >
> > > The old sequence of events that happened was as follows:
> >
> > > user sends data almost filling the output buffer...
> > > user sends another line of text overflowing the buffer...
> > > pqFlush is invoked blocking the user until the output pipe clears...
> > > and repeat.
> >
> > Right.
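In miniature, the old blocking sequence looks something like this (a toy
sketch with invented names and a fake flush; the real pqPutBytes/pqFlush
differ in detail):

```c
#include <assert.h>
#include <string.h>

#define OUTBUFSIZE 8192              /* toy stand-in for the 8k buffer */

static char outbuf[OUTBUFSIZE];
static int  outlen;
static int  flush_count;             /* how often the caller got blocked */

/* Stand-in for the old pqFlush: write()s until the kernel pipe
 * drains, blocking the caller the whole time. */
static void blocking_flush(void)
{
    flush_count++;
    outlen = 0;                      /* pretend everything was written */
}

/* Old-style put: if the new data would overflow the buffer,
 * block in a flush first, then queue the data. */
static int put_bytes_blocking(const char *s, int n)
{
    if (n > OUTBUFSIZE)
        return -1;                   /* could never fit at all */
    if (outlen + n > OUTBUFSIZE)
        blocking_flush();            /* caller is stuck here until the pipe clears */
    memcpy(outbuf + outlen, s, n);
    outlen += n;
    return 0;
}
```

The caller never sees the flush happen; it just stops making progress
until the backend drains the pipe.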
> >
> > > The nonblocking code allows sends to fail so the user can abort
> > > sending stuff to the backend in order to process other work:
> >
> > > user sends data almost filling the output buffer...
> > > user sends another line of text that may overflow the buffer...
> > > pqFlush is invoked,
> > > if the pipe can't be cleared an error is returned allowing the user to
> > > retry the send later.
> > > if the flush succeeds then more data is queued and success is returned
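The nonblocking variant, again in miniature (made-up names, and a flag
standing in for what select() would report), turns the blocking flush
into one that can refuse and hands control back to the application:

```c
#include <assert.h>

static int pipe_ready;               /* 1 once select() says the socket is writable */
static int queued = 100;             /* bytes waiting in the output buffer */

/* Stand-in for a nonblocking pqFlush: 0 = pipe cleared,
 * 1 = would have blocked; nothing was sent, retry later. */
static int nonblocking_flush(void)
{
    if (!pipe_ready)
        return 1;
    queued = 0;
    return 0;
}

/* Caller-side retry loop: on a would-block return the application
 * is free to do other work instead of sleeping inside the library. */
static int drain(void)
{
    int tries = 0;
    while (nonblocking_flush() == 1) {
        tries++;
        /* ... do other work, then poll the socket again ... */
        if (tries == 3)
            pipe_ready = 1;          /* simulate the pipe finally clearing */
    }
    return tries;
}
```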
> >
> > But you haven't thought through the mechanics of the "error is returned
> > allowing the user to retry" code path clearly enough. Let's take
> > pqPutBytes for an example. If it returns EOF, is that a hard error or
> > does it just mean that the application needs to wait a while? The
> > application *must* distinguish these cases, or it will do the wrong
> > thing: for example, if it mistakes a hard error for "wait a while",
> > then it will wait forever without making any progress or producing
> > an error report.
> >
> > You need to provide a different return convention that indicates
> > what happened, say
> > 	EOF (-1) => hard error (same as old code)
> > 	0        => OK
> > 	1        => no data was queued due to risk of blocking
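A sketch of that three-way convention, including the all-or-none
guarantee discussed next (toy code, not the actual libpq implementation;
`struct conn` and all names here are invented):

```c
#include <assert.h>
#include <string.h>

#define OUTBUFSIZE 8192

struct conn {
    char buf[OUTBUFSIZE];
    int  len;                        /* bytes currently queued */
    int  writable;                   /* 1 when a flush would not block */
};

/* Return convention:
 *   -1  hard error (connection unusable)
 *    0  all n bytes were queued
 *    1  no bytes were queued; retry the same call later
 * The all-or-none rule keeps the caller's state simple: after a
 * 1 return it can safely retry with the same arguments. */
static int put_bytes(struct conn *c, const char *s, int n)
{
    if (n < 0 || n > OUTBUFSIZE)
        return -1;                   /* could never fit: hard error */
    if (c->len + n > OUTBUFSIZE) {
        if (!c->writable)
            return 1;                /* would block: queued nothing */
        c->len = 0;                  /* flush succeeded; buffer now empty */
    }
    memcpy(c->buf + c->len, s, n);
    c->len += n;
    return 0;
}
```

A caller can then loop cleanly: on 1, wait for the socket to become
writable and retry the identical call; on -1, report the error and stop;
only 0 means the data was accepted.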
> > And you need to guarantee that the application knows what the state is
> > when the can't-do-it-yet return is made; note that I specified "no data
> > was queued" above. If pqPutBytes might queue some of the data before
> > returning 1, the application is in trouble again. While you apparently
> > foresaw that in recoding pqPutBytes, your code doesn't actually work.
> > There is the minor code bug that you fail to update "avail" after the
> > first pqFlush call, and the much more fundamental problem that you
> > cannot guarantee to have queued all or none of the data. Think about
> > what happens if the passed nbytes is larger than the output buffer size.
> > You may pass the first pqFlush successfully, then get into the loop and
> > get a won't-block return from pqFlush