Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-09-18 Thread Tom Lane
Kris Jurka bo...@ejurka.com writes:
 On Fri, 6 Aug 2010, James William Pye wrote:
 I think there's a snag in the patch:

 Oh, duh.  It's a server side copy not going through the client at all. 
 Here's a hopefully final patch.

Applied with a correction: this would've totally broken binary copy in
old-style protocol, because there is no other EOF marker except the -1
in that case.

BTW, it strikes me that we could reduce the backwards-compatibility
impact of this patch if we made it ignore, rather than throw error for,
any extra data after the EOF marker.  I left it as-is since ISTM the
more error checking you can have in a binary data format, the better.
But a case could be made for doing the other thing, especially if
somebody wanted to argue for back-patching this.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-08-28 Thread James William Pye
On Aug 9, 2010, at 11:49 AM, Kris Jurka wrote:
 Oh, duh.  It's a server side copy not going through the client at all. Here's 
 a hopefully final patch.

Trying it out... Works for me.

I understand the resistance to the patch, but it would be
quite nice to see this wart in the rear view. =\
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-08-09 Thread Kris Jurka



On Sat, 7 Aug 2010, Kris Jurka wrote:


On Fri, 6 Aug 2010, James William Pye wrote:


I think there's a snag in the patch:

postgres=# COPY data FROM '/Users/jwp/DATA.bcopy' WITH BINARY;
ERROR:  row field count is -1, expected 1
CONTEXT:  COPY data, line 4

Probably a quick/small fix away, I imagine.


Hmm, not quite sure why that is.  That seems to imply that it's not using V3 
protocol, but I thought binary copy could only be used with the V3 protocol. 
In any case, I think this new patch is more bulletproof.




Oh, duh.  It's a server side copy not going through the client at all. 
Here's a hopefully final patch.


Kris Jurka*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***
*** 2058,2069  CopyFrom(CopyState cstate)
int16   fld_count;
ListCell   *cur;
  
!   if (!CopyGetInt16(cstate, fld_count) ||
!   fld_count == -1)
{
done = true;
break;
}
  
if (fld_count != attr_count)
ereport(ERROR,
--- 2058,2088 
int16   fld_count;
ListCell   *cur;
  
!   if (!CopyGetInt16(cstate, fld_count))
{
done = true;
break;
}
+   
+   if (fld_count == -1)
+   {
+   /*
+* Received EOF marker.  Check to see if this 
is really
+* the EOF and complain if we find any more 
data.
+* In a V3 protocol copy, this also enforces 
that we wait
+* for the protocol end of copy (CopyDone/Fail).
+*/
+   int8 unused;
+   if (CopyGetData(cstate, unused, 
sizeof(unused), sizeof(unused)))
+   {
+   ereport(ERROR,
+   
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+errmsg(received copy 
data after EOF marker)));
+   }
+ 
+   done = true;
+   break;
+   }
  
if (fld_count != attr_count)
ereport(ERROR,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-08-07 Thread Kris Jurka



On Fri, 6 Aug 2010, James William Pye wrote:


On Aug 6, 2010, at 4:31 PM, Kris Jurka wrote:

binary-copy-end-v2.patch


I think there's a snag in the patch:

postgres=# COPY data FROM '/Users/jwp/DATA.bcopy' WITH BINARY;
ERROR:  row field count is -1, expected 1
CONTEXT:  COPY data, line 4

Probably a quick/small fix away, I imagine.


Hmm, not quite sure why that is.  That seems to imply that it's not using 
V3 protocol, but I thought binary copy could only be used with the V3 
protocol.  In any case, I think this new patch is more bulletproof.


Kris Jurka
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***
*** 2058,2069  CopyFrom(CopyState cstate)
int16   fld_count;
ListCell   *cur;
  
!   if (!CopyGetInt16(cstate, fld_count) ||
!   fld_count == -1)
{
done = true;
break;
}
  
if (fld_count != attr_count)
ereport(ERROR,
--- 2058,2090 
int16   fld_count;
ListCell   *cur;
  
!   if (!CopyGetInt16(cstate, fld_count))
{
done = true;
break;
}
+   
+   if (fld_count == -1)
+   {
+   /*
+* Reached EOF.  In protocol version 3, we must 
wait for
+* the protocol end of copy (CopyDone/Fail).  
If we
+* receive any more copy data after EOF, 
complain.
+*/
+   if (cstate-copy_dest == COPY_NEW_FE)
+   {
+   int8 unused;
+   if (CopyGetData(cstate, unused, 
sizeof(unused), sizeof(unused)))
+   {
+   ereport(ERROR,
+   
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+
errmsg(received copy data after EOF marker)));
+   }
+   }
+ 
+   done = true;
+   break;
+   }
  
if (fld_count != attr_count)
ereport(ERROR,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-08-06 Thread Kris Jurka



On Wed, 28 Jul 2010, James William Pye wrote:

Not directly regarding your patch, but while the discussion is in the 
general area. I think it would be wise to throw an error when non-empty 
CopyData messages are received after CopyData(EOF). Chances are that the 
user is making a mistake and should be notified of it.




As this is also the direction that Tom Lane indicated we should go, here 
is a patch which errors out after receiving any more copy data past the 
EOF marker.  This also fixes the protocol problem I previously brought up 
because the act of checking to see if there is any more data does ensure 
that if there isn't any more data in the current buffer, that we wait for 
the client to provide CopyDone/Fail.


Kris Jurka*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***
*** 2058,2069  CopyFrom(CopyState cstate)
int16   fld_count;
ListCell   *cur;
  
!   if (!CopyGetInt16(cstate, fld_count) ||
!   fld_count == -1)
{
done = true;
break;
}
  
if (fld_count != attr_count)
ereport(ERROR,
--- 2058,2090 
int16   fld_count;
ListCell   *cur;
  
!   if (!CopyGetInt16(cstate, fld_count))
{
done = true;
break;
}
+   
+   if (fld_count == -1)
+   {
+   /*
+* Reached EOF.  In protocol version 3, we must 
wait for
+* the protocol end of copy (CopyDone/Fail).  
If we
+* receive any more copy data after EOF, 
complain.
+*/
+   if (cstate-copy_dest == COPY_NEW_FE)
+   {
+   int8 unused;
+   if (CopyGetData(cstate, unused, 
sizeof(unused), sizeof(unused)))
+   {
+   ereport(ERROR,
+   
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+
errmsg(received copy data after EOF marker)));
+   } else {
+   done = true;
+   break;
+   }
+   }
+   }
  
if (fld_count != attr_count)
ereport(ERROR,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-08-06 Thread James William Pye
On Aug 6, 2010, at 4:31 PM, Kris Jurka wrote:
 binary-copy-end-v2.patch

I think there's a snag in the patch:

postgres=# COPY data FROM '/Users/jwp/DATA.bcopy' WITH BINARY;
ERROR:  row field count is -1, expected 1
CONTEXT:  COPY data, line 4

Probably a quick/small fix away, I imagine.


But, I was able to trigger the new ERROR with py-postgresql:

 import postgresql as pg
 db=pg.open('localhost/postgres')
 q=db.prepare('copy data FROM STDIN WITH BINARY')
 from itertools import chain
 import sys
 db.pq.tracer = sys.stderr.write
 q.load_rows(chain(open('/Users/jwp/DATA.bcopy', 'rb'), (b'EXTRA',)))
↑ B(25): b'B\x00\x00\x00\x18\x00py:0x1268b30\x00\x00\x00\x00\x00\x00\x00'
↑ E(10): b'E\x00\x00\x00\t\x00\x00\x00\x00\x01'
↑ S(5): b'S\x00\x00\x00\x04'
↓ b'2'(0): b''
↓ b'G'(5): b'\x01\x00\x01\x00\x01'
↑__(7): b'PGCOPY\n'
↑__(3): b'\xff\r\n'
↑__(41): 
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x04\x00\x00\x00\x01\x00\x01\x00\x00\x00\x04\x00\x00\x00\x02\x00\x01\x00\x00\x00\x04\x00\x00\x00\x03\xff\xff'
↑__(5): b'EXTRA'
↑ c(5): b'c\x00\x00\x00\x04'
↑ S(5): b'S\x00\x00\x00\x04'
↓ b'E'(95): b'SERROR\x00C22P04\x00Mreceived copy data after EOF marker\x00WCOPY 
data, line 4\x00Fcopy.c\x00L2081\x00RCopyFrom\x00\x00'
↓ b'Z'(1): b'I'
Traceback (most recent call last):
  File stdin, line 1, in module
  snip
  File 
/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/postgresql/driver/pq3.py,
 line 462, in raise_server_error
raise server_error
postgresql.exceptions.BadCopyError: received copy data after EOF marker
  CODE: 22P04
  LOCATION: File 'copy.c', line 2081, in CopyFrom from SERVER
  CONTEXT: COPY data, line 4
STATEMENT: [prepared]
  sql_parameter_types: []
  statement_id: py:0x1268b30
  string: copy data FROM STDIN WITH BINARY
CONNECTION: [idle]
  client_address: ::1/128
  client_port: 63922
  version:
PostgreSQL 9.1devel on x86_64-apple-darwin10.4.0, compiled by GCC 
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664), 64-bit
CONNECTOR: [Host] pq://jwp:*...@localhost:5432/postgres
  category: None
DRIVER: postgresql.driver.pq3.Driver
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-29 Thread Matthew Wakeling
(Yes, I know I'm not on the hackers list. Most interested parties should 
get this directly anyway.)


Additionally the interface exposed by the JDBC driver lets the user 
write arbitrary CopyData bytes to the server, so without parsing all of 
that we don't know whether they've issued CopyData(EOF) or not.


Okay, so you can't know with absolute certainty without parsing the 
data, but the usual case would be handled by holding onto the last-N 
bytes or so. Enough to fit the EOF and perhaps a little more for 
paranoia's sake.


That's not to say that I'm missing the problem. When (not if, when) 
the user feeds data past a CopyData(EOF), it's going to get interesting.


This is the reason why the patch to the JDBC driver that I sent in is very 
fragile. In the case where a user provides a binary copy with lots of data 
after the EOF, the processCopyData method *will* get called after the 
CommandComplete and ReadyForQuery messages have been received, even if we 
try to delay processing of the ReadyForQuery message.


[Thinking about the logic necessary to handle such a case and avoid 
network buffer deadlock...] I would think the least invasive way to 
handle it would be to set the CommandComplete and ReadyForQuery messages 
aside when they are received if CopyDone hasn't been sent, continue the 
COPY operation as usual until it is shutdown, send CopyDone and, 
finally, reinstate CommandComplete and RFQ as if they were just 
received..


Basically, yes. We need to introduce a little more state into the JDBC 
driver. Currently, the driver is in one of two states:


1. In the middle of a copy.
2. Not in a copy.

These states are recorded in the lock system. We need to introduce a new 
state, where the copy is still locked, but we know that the 
CommandComplete and ReadyForQuery messages have been received. We can no 
longer unlock the copy in processCopyData - we need to do that in endCopy 
instead, after calling processCopyData to ensure that we wait for a valid 
CommandComplete and ReadyForQuery message first.


Matthew

--
Terrorists evolve but security is intelligently designed?  -- Jake von Slatt

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-28 Thread James William Pye
On Jul 25, 2010, at 8:01 AM, Kris Jurka wrote:
 The JDBC driver reads server messages for multiple reasons.

 One of them is indeed to do early failure detection.

That's high quality. =)

 Another is to pickup NoticeResponse messages to avoid a network buffer 
 deadlock.

That's a good catch. I don't think psql/restore would often run into this as 
when COPY IN is in play, it's normally restoring a database. However, with 
JDBC, I imagine COPY would more often be used to do bulk loading into live 
tables that may very well cause a NOTICE. [Well, I reference psql/libpq because 
I don't recall it recognizing failure during COPY IN in the past, so I assume 
it's not receiving any data in that state.]

hrm, I suppose a lazy way around that problem would be to suspend all client 
messages(client_min_messages) during COPY IN. Tho, I guess one would still have 
to contend with NotificationResponse, and ParameterStatus..

 So this is possible to work around driver side by peeking into the network 
 stream and delaying processing of the end of copy until the driver agrees 
 that the copy is done, but

I don't think you would have to peek in. If the interface were to always hold 
onto the last message or last n-bytes submitted to be sent, it would be able to 
send the possible CopyData(EOF) and CopyDone once the COPY operation (at the 
interface level) is closed/shutdown/terminated. Granted, this is dependent on 
CopyData(EOF) not being in the middle of regular CopyData, but I gather that 
that would end in an ErrorResponse anyways.

 I still maintain that this is a server bug. It is not OK for the server to 
 assume that the client is done and move on, the client must tell the server 
 what it wants done.


I'm a bit torn here. While it would seem to be either a bug in the spec or a 
bug in the server, I'm inclined to call it a wart in the server's 
implementation of the spec.

I don't see the fix as being dangerous, but I imagine an implementor would want 
to have the workaround in place regardless. I certainly would.

I'd be in favor of seeing this fixed in 9.x, and the documentation updated to 
warn implementors about the wart in the older versions.. That is, I don't see 
any reason why we can't get rid of this unsightly thing considering the 
workarounds would still work with a wart-free server.

cheers, jwp
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-28 Thread Kris Jurka



On Wed, 28 Jul 2010, James William Pye wrote:



hrm, I suppose a lazy way around that problem would be to suspend all 
client messages(client_min_messages) during COPY IN. Tho, I guess one 
would still have to contend with NotificationResponse, and 
ParameterStatus..


Technically you won't get NotificationResponse until transaction end, so 
you don't need to worry about that mid copy.


I don't think you would have to peek in. If the interface were to always 
hold onto the last message or last n-bytes submitted to be sent, it 
would be able to send the possible CopyData(EOF) and CopyDone once the 
COPY operation (at the interface level) is closed/shutdown/terminated. 
Granted, this is dependent on CopyData(EOF) not being in the middle of 
regular CopyData, but I gather that that would end in an ErrorResponse 
anyways.


One of the key points of confusion is that CopyData(EOF) does not result 
in an error.  It results in ignoring any futher data.  The problem I have 
is that for text mode it waits for CopyDone, but in binary mode it ends 
the copy sequence immediately.  Additionally the interface exposed by the 
JDBC driver lets the user write arbitrary CopyData bytes to the server, so 
without parsing all of that we don't know whether they've issued 
CopyData(EOF) or not.


Kris Jurka

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-28 Thread James William Pye
On Jul 28, 2010, at 9:53 AM, Kris Jurka wrote:
 Technically you won't get NotificationResponse until transaction end, so you 
 don't need to worry about that mid copy.

Ah, thanks for noting that. It would appear my original reading of the async 
section didn't get far enough beyond Frontends must be prepared to deal with 
these messages at any time, even when not engaged in a query.. I see the note 
below clarifying NotificationResponse.

 One of the key points of confusion is that CopyData(EOF) does not result in 
 an error.
 It results in ignoring any futher data.
 The problem I have is that for text mode it waits for CopyDone, but in binary 
 mode it ends the copy sequence immediately.

That is bothersome. :\

 Additionally the interface exposed by the JDBC driver lets the user write 
 arbitrary CopyData bytes to the server, so without parsing all of that we 
 don't know whether they've issued CopyData(EOF) or not.

Okay, so you can't know with absolute certainty without parsing the data, but 
the usual case would be handled by holding onto the last-N bytes or so. Enough 
to fit the EOF and perhaps a little more for paranoia's sake.

That's not to say that I'm missing the problem. When (not if, when) the 
user feeds data past a CopyData(EOF), it's going to get interesting.

[Thinking about the logic necessary to handle such a case and avoid network 
buffer deadlock...]
I would think the least invasive way to handle it would be to set the 
CommandComplete and ReadyForQuery messages aside when they are received if 
CopyDone hasn't been sent, continue the COPY operation as usual until it is 
shutdown, send CopyDone and, finally, reinstate CommandComplete and RFQ as if 
they were just received.. I don't think that really accommodates for CopyFail 
as the status in RFQ will need to be adjusted to match what was actually 
done..? Well, I'm not sure you would need to worry about NoticeResponse after a 
premature CommandComplete as INSERTs are no longer happening. ugh.


+1 for a fix.


Not directly regarding your patch, but while the discussion is in the general 
area.
I think it would be wise to throw an error when non-empty CopyData messages are 
received after CopyData(EOF). Chances are that the user is making a mistake and 
should be notified of it.

cheers, jwp
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-25 Thread Kris Jurka



On Sat, 24 Jul 2010, James William Pye wrote:


On Jul 23, 2010, at 7:11 AM, Tom Lane wrote:

I can't help thinking that the JDBC driver must be being overly cute
if this breaks it ...


I was wondering the same thing when I first saw Kris' message. However, 
iff I understand what JDBC is trying to achieve, I don't think I would 
call it overly.


Is this a problem because JDBC is trying to detect failures as early as 
possible during a COPY IN? Or, is it just JDBC's normal MO to always be 
reading?


The JDBC driver reads server messages for multiple reasons.  One of them 
is indeed to do early failure detection.  Another is to pickup 
NoticeResponse messages to avoid a network buffer deadlock.  If someone 
puts a trigger on the table you're copying data into that does RAISE 
NOTICE 'received row X' for each row, to avoid a full network buffer 
deadlock, the client must regularly read from the backend.  So as we are 
reading along, supposing that we're still mid-copy, we get a command 
complete message.  So this is possible to work around driver side by 
peeking into the network stream and delaying processing of the end of copy 
until the driver agrees that the copy is done, but I still maintain that 
this is a server bug.  It is not OK for the server to assume that the 
client is done and move on, the client must tell the server what it wants 
done.


Kris Jurka

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-24 Thread James William Pye
On Jul 23, 2010, at 7:11 AM, Tom Lane wrote:
 I can't help thinking that the JDBC driver must be being overly cute
 if this breaks it ...

I was wondering the same thing when I first saw Kris' message. However, iff I 
understand what JDBC is trying to achieve, I don't think I would call it 
overly.

@Kris

Is this a problem because JDBC is trying to detect failures as early as 
possible during a COPY IN?
Or, is it just JDBC's normal MO to always be reading?


Well, I've wanted to do the former (early error detection) with py-postgresql's 
COPY support, and I imagine getting a read event marking completion prior to 
emitting done/fail could be a snag.

cheers, jwp
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-23 Thread Kris Jurka



On Thu, 22 Jul 2010, Robert Haas wrote:


On Thu, Jul 22, 2010 at 5:34 PM, Kris Jurka bo...@ejurka.com wrote:


Attached is a patch to make the server continue to consume protocol data
until instructed to stop by the client in the same way as copying text data
to the server currently works.



I guess the obvious question is whether we shouldn't instead change
the docs to match the behavior.  I suspect there's almost no chance
we'd consider back-patching a change of this type, since it is a clear
behavior change.  And even if we did, there would still be people
running servers with the old behavior with which JDBC and other
drivers would have to cope.  Having two different behaviors might be
worse than the status quo.



It is a clear behavior change, but that's what bug fixes are.  I would 
advocate back-patching this because I doubt many people would be affected 
by this change and I think it would be awkward trying to document how 
things work differently in binary mode when sending a file end marker than 
in text mode or without a file end marker.  If this was fixed server side 
and backpatched, I would not modify the JDBC driver to work with older 
server versions.


The copy documentation is clear that you must call PQputCopyEnd or 
equivalent to end the copy sequence, so this would only affect people who 
are not doing that and using binary copy mode.  I doubt many people are 
using binary copy at all because of the additional difficulty in 
generating binary format data and the potential for portability problems.


Kris Jurka

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-23 Thread Andrew Dunstan



Kris Jurka wrote:


I guess the obvious question is whether we shouldn't instead change
the docs to match the behavior. I suspect there's almost no chance
we'd consider back-patching a change of this type, since it is a clear
behavior change. And even if we did, there would still be people
running servers with the old behavior with which JDBC and other
drivers would have to cope. Having two different behaviors might be
worse than the status quo.



It is a clear behavior change, but that's what bug fixes are.


That was my first reaction. I don't think we're in the business if 
redefining bugs out of existence.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-23 Thread Tom Lane
Kris Jurka bo...@ejurka.com writes:
 Attached is a patch to make the server continue to consume protocol data 
 until instructed to stop by the client in the same way as copying text 
 data to the server currently works.

I believe this is a misunderstanding of the protocol spec.  The spec is
(intended to say that) we'll continue to accept data after reporting an
error, not that we will silently swallow an incorrect data stream and
not complain about it.  Which is what this patch will do.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-23 Thread Robert Haas
On Fri, Jul 23, 2010 at 9:32 AM, Andrew Dunstan and...@dunslane.net wrote:
 Kris Jurka wrote:

 I guess the obvious question is whether we shouldn't instead change
 the docs to match the behavior. I suspect there's almost no chance
 we'd consider back-patching a change of this type, since it is a clear
 behavior change. And even if we did, there would still be people
 running servers with the old behavior with which JDBC and other
 drivers would have to cope. Having two different behaviors might be
 worse than the status quo.


 It is a clear behavior change, but that's what bug fixes are.

 That was my first reaction. I don't think we're in the business if
 redefining bugs out of existence.

I certainly understand that reaction - I just worry that there might
be people depending on the current behavior.  We really don't want to
get a reputation for breaking things in minor releases.  But this is
not an area of the code I'm very familiar with, and I'm not in a good
position to judge the likelihood of breakage, so I'll defer to those
who are...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-23 Thread Kris Jurka

On 7/23/2010 6:40 AM, Tom Lane wrote:

Kris Jurkabo...@ejurka.com  writes:

Attached is a patch to make the server continue to consume protocol data
until instructed to stop by the client in the same way as copying text
data to the server currently works.


I believe this is a misunderstanding of the protocol spec.  The spec is
(intended to say that) we'll continue to accept data after reporting an
error, not that we will silently swallow an incorrect data stream and
not complain about it.  Which is what this patch will do.



All this does is make binary mode match text mode.  Are you planning on 
changing text mode to match binary mode instead?  Currently text mode 
parsing ends at the data end marker (\.) and throws away any further 
data which may or may not be ill formatted.  For example there's no 
complaint about copying the following data file into a single column 
integer table even though there is bogus data after the file end marker


3
4
\.
asdf
aff
5
qq

Kris Jurka



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-23 Thread Tom Lane
Kris Jurka bo...@ejurka.com writes:
 On 7/23/2010 6:40 AM, Tom Lane wrote:
 I believe this is a misunderstanding of the protocol spec.  The spec is
 (intended to say that) we'll continue to accept data after reporting an
 error, not that we will silently swallow an incorrect data stream and
 not complain about it.  Which is what this patch will do.

 All this does is make binary mode match text mode.

The fact that text mode eats data after \. is a backwards-compatibility
kluge to match the behavior of pre-7.4 COPY.  It could very legitimately
be argued to be a bug in itself.  I don't think that we should make
binary mode match it.  The main concrete reason why not is that binary
mode has almost no redundancy.  It would be really easy for the code
change you suggest to result in data being silently discarded with no
hint of what went wrong.

After some reflection, I think the real issue here is that the JDBC
driver is depending on a behavior not stated in the protocol, which
is the relative timing of FE-to-BE and BE-to-FE messages.  Once you've
sent the EOF marker, the only correct follow-on for a spec-compliant
frontend is a CopyEnd message.  So the backend is just sending its
response a bit sooner.  There's nothing in the protocol spec forbidding
that.

I would be willing to accept a patch that avoided sending CopyEnd
immediately after receiving EOF, so long as it still threw an error
for extra data; but this is not that patch.  The larger issue though
is whether it wouldn't be better to change the driver behavior instead.
I can't help thinking that the JDBC driver must be being overly cute
if this breaks it ... and you're never going to get everybody to
upgrade their server version, even if we were willing to back-patch
the change.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-23 Thread Kris Jurka



On Fri, 23 Jul 2010, Tom Lane wrote:


Kris Jurka bo...@ejurka.com writes:

On 7/23/2010 6:40 AM, Tom Lane wrote:

I believe this is a misunderstanding of the protocol spec.  The spec is
(intended to say that) we'll continue to accept data after reporting an
error, not that we will silently swallow an incorrect data stream and
not complain about it.  Which is what this patch will do.



All this does is make binary mode match text mode.


The fact that text mode eats data after \. is a backwards-compatibility
kluge to match the behavior of pre-7.4 COPY.  It could very legitimately
be argued to be a bug in itself.  I don't think that we should make
binary mode match it.  The main concrete reason why not is that binary
mode has almost no redundancy.  It would be really easy for the code
change you suggest to result in data being silently discarded with no
hint of what went wrong.


Binary copy mode already does this (eat data silently after -1 field 
count).  The patch I sent just made it follow the fe/be protocol while it 
does so.


jurka=# create table copytest (a int);
CREATE TABLE
jurka=# insert into copytest values (1);
INSERT 0 1
jurka=# \copy copytest to copydata with binary
jurka=# \! echo garbage  copydata
jurka=# \copy copytest from copydata with binary
jurka=# select * from copytest;
 a
---
 1
 1
(2 rows)



After some reflection, I think the real issue here is that the JDBC
driver is depending on a behavior not stated in the protocol, which
is the relative timing of FE-to-BE and BE-to-FE messages.  Once you've
sent the EOF marker, the only correct follow-on for a spec-compliant
frontend is a CopyEnd message.  So the backend is just sending its
response a bit sooner.  There's nothing in the protocol spec forbidding
that.


What about CopyFail?  The protocol docs say nothing about the message 
contents only about the messages themselves.


Kris Jurka

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-22 Thread Kris Jurka


Per discussion and investigation on the -jdbc list, the server appears to 
violate the frontend/backend protocol when binary copy data is sent to the 
server.  Upon receiving the binary copy end of data marker (a -1 field 
count), the server immediately responds with CommandComplete and 
ReadyForQuery without waiting for the frontend to issue CopyDone or 
CopyFail.  This confuses the JDBC driver as it doesn't think the command 
sequence should have finished yet.


Attached is a patch to make the server continue to consume protocol data 
until instructed to stop by the client in the same way as copying text 
data to the server currently works.


http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-COPY
http://www.postgresql.org/docs/8.4/static/sql-copy.html

Kris Jurka*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***
*** 2058,2069  CopyFrom(CopyState cstate)
int16   fld_count;
ListCell   *cur;
  
!   if (!CopyGetInt16(cstate, fld_count) ||
!   fld_count == -1)
{
done = true;
break;
}
  
if (fld_count != attr_count)
ereport(ERROR,
--- 2058,2087 
int16   fld_count;
ListCell   *cur;
  
!   if (!CopyGetInt16(cstate, fld_count))
{
done = true;
break;
}
+   
+   if (fld_count == -1)
+   {
+   /*
+* Reached EOF.  In protocol version 3, we 
should ignore
+* anything after the end of copy data marker 
up to the
+* protocol end of copy data (CopyDone/Fail).
+*/
+   if (cstate-copy_dest == COPY_NEW_FE)
+   {
+   do
+   {
+   cstate-raw_buf_index = 
cstate-raw_buf_len;
+   } while (CopyLoadRawBuf(cstate));
+   continue;
+   }
+   done = true;
+   break;
+   }
  
if (fld_count != attr_count)
ereport(ERROR,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-22 Thread Robert Haas
On Thu, Jul 22, 2010 at 5:34 PM, Kris Jurka bo...@ejurka.com wrote:
 Per discussion and investigation on the -jdbc list, the server appears to
 violate the frontend/backend protocol when binary copy data is sent to the
 server.  Upon receiving the binary copy end of data marker (a -1 field
 count), the server immediately responds with CommandComplete and
 ReadyForQuery without waiting for the frontend to issue CopyDone or
 CopyFail.  This confuses the JDBC driver as it doesn't think the command
 sequence should have finished yet.

 Attached is a patch to make the server continue to consume protocol data
 until instructed to stop by the client in the same way as copying text data
 to the server currently works.

 http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-COPY
 http://www.postgresql.org/docs/8.4/static/sql-copy.html

 Kris Jurka

I guess the obvious question is whether we shouldn't instead change
the docs to match the behavior.  I suspect there's almost no chance
we'd consider back-patching a change of this type, since it is a clear
behavior change.  And even if we did, there would still be people
running servers with the old behavior with which JDBC and other
drivers would have to cope.  Having two different behaviors might be
worse than the status quo.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers