Re: [GENERAL] COPY command character set
As Tom says, this doesn't really address my original issue, which was not that I read the material on encoding and misunderstood it, but that I didn't even see that material because it was mixed in with a bunch of other notes on all sorts of random subjects. To address this issue in the documentation at large, I would like to see every command that has I18N/L10N-related behavior have a separate sub-head for the explanation of that behavior. That way, anyone who needs to know about that aspect (which should be everyone), just has to look for the sub-head to be sure they have found what they need to know. Whilst I know we can't do that for every single cross-command topic, it seems to me that I18N/L10N is sufficiently important to users of pg that it merits this treatment. FWIW, I think error handling/behavior also merits its own sub-heads throughout. And there should be links within the pages to sub-heads (cf. DB2's online doc). Of course, all of this would be a substantial project. Note that for the COPY command the I18N/L10N material covers both DateStyle and encoding. In respect of Bruce's proposed changes, I prefer the original wording (for the same reasons as Tom), but with the addition of the mention of the server - "... read from or written to a file directly by the server". -- Peter Headland Architect Actuate Corporation -Original Message- From: Bruce Momjian [mailto:br...@momjian.us] Sent: Monday, February 22, 2010 22:01 To: Tom Lane Cc: Peter Headland; Adrian Klaver; pgsql-general@postgresql.org Subject: Re: [GENERAL] COPY command character set Tom Lane wrote: > Bruce Momjian writes: > > I have updated the documentation to be more direct about COPY encoding > > behavior. Patch attached and applied. > > Uh, why exactly do you find that better? "Processes data" seems a lot > vaguer to me than the previous wording. I certainly don't think that > this does much to address Peter's original complaint. I thought the problem was that we said "input", then "output" and then got to the point about the server, and I thought the reader just stopped reading that far, so I tried to shorten it so the idea was sooner, and I mentioned "server" at the end. It might not be better, but I tried. We don't want to highlight the input/output, we want to highlight that all input and output are controlled by the client encoding. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] COPY command character set
> Maybe the link might help? > > http://www.postgresql.org/docs/8.4/interactive/multibyte.html That page is too generic; what would be helpful is a section in the doc for each command that is affected by I18N/L10N considerations, that identifies how that specific command behaves. Now that I have grasped the behavior, I'm more than happy to edit the COPY doc page, if people think that would be helpful/worthwhile. -- Peter Headland Architect Actuate Corporation -Original Message- From: Adrian Klaver [mailto:akla...@comcast.net] Sent: Thursday, September 10, 2009 11:06 To: Peter Headland Cc: pgsql-general@postgresql.org; Tom Lane Subject: Re: [GENERAL] COPY command character set - "Peter Headland" wrote: > > The COPY command reference page saith > > > >Input data is interpreted according to the current client > encoding, > >and output data is encoded in the the current client encoding, > even > >if the data does not pass through the client but is read from or > >written to a file. > > Rats - I read the manual page twice and that didn't register on my > feeble consciousness. I suspect that I didn't look beyond the word > "client", since I knew I wasn't interested in client behavior and I > was > speed-reading. On the assumption that I am not uniquely stupid, maybe > we > could re-phrase this slightly, with a "for example", and add a > heading > "Localization"? > > As a general comment, I18N/L10N is a hairy enough topic that it > merits > its own heading in any commands where it is an issue. > > How about my suggestion to add a means (extend COPY syntax) to > specify > encoding explicitly and handle UTF lead bytes - would that be of > interest? > > -- > Peter Headland > Architect > Actuate Corporation > > > The COPY command reference page saith > > Input data is interpreted according to the current client > encoding, > and output data is encoded in the the current client encoding, > even > if the data does not pass through the client but is read from or > written to a file. > > Seems clear enough to me. > > regards, tom lane Maybe the link might help? http://www.postgresql.org/docs/8.4/interactive/multibyte.html Adrian Klaver akla...@comcast.net -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] COPY command character set
> There are no lead bytes in UTF-8 Sorry, sloppy use of terminology. I should have said "UTF signatures" aka the "byte order mark". IOW, the "magic number" bytes commonly found at the front of UTF encoded files: UTF-16 little-endian FF FE UTF-16 big-endian FE FF UTF-8 EF BB BF These tend to be inserted automatically by text editors, so it would be advantageous to have them handled automatically by COPY (at least as an option). Right now, if I edit a UTF-8 file then load it with COPY, I get errors or bad data if the editor chose to add the 3 signature bytes. Whilst UTF-16 is not supported internally, COPY seems to be a legitimate special case, because it is used for migration to/from other tools that may emit or expect UTF-16. ISTR that Postgres uses UCI? If so it would be near-trivial to allow COPY to read and write UTF-16. If done via a syntax extension to COPY (which I think is the most desirable implementation), this would have no adverse effect on any other capability. It also seems sufficiently isolated from sensitive/complex areas of the code that it might make a suitable first project for someone who is interested in becoming a contributor... -- Peter Headland Architect Actuate Corporation -Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: Thursday, September 10, 2009 11:13 To: Peter Headland Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] COPY command character set "Peter Headland" writes: > How about my suggestion to add a means (extend COPY syntax) to specify > encoding explicitly and handle UTF lead bytes - would that be of > interest? There are no lead bytes in UTF-8, and we make no pretense of handling UTF-16, so I don't think we'd be interested in some hack that cleans up misencoding problems. The idea of overriding client_encoding has been suggested before. I don't remember if it was rejected or is just languishing on the TODO list. I'd be a little worried about sending clients data in an encoding they aren't expecting, but if it only works for I/O to a file it might be okay. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] COPY command character set
> The COPY command reference page saith > >Input data is interpreted according to the current client encoding, >and output data is encoded in the the current client encoding, even >if the data does not pass through the client but is read from or >written to a file. Rats - I read the manual page twice and that didn't register on my feeble consciousness. I suspect that I didn't look beyond the word "client", since I knew I wasn't interested in client behavior and I was speed-reading. On the assumption that I am not uniquely stupid, maybe we could re-phrase this slightly, with a "for example", and add a heading "Localization"? As a general comment, I18N/L10N is a hairy enough topic that it merits its own heading in any commands where it is an issue. How about my suggestion to add a means (extend COPY syntax) to specify encoding explicitly and handle UTF lead bytes - would that be of interest? -- Peter Headland Architect Actuate Corporation -Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: Thursday, September 10, 2009 10:38 To: Peter Headland Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] COPY command character set "Peter Headland" writes: >> set client_encoding = 'utf8'; >> copy from stdin/to stdout; > What if I want to do this on the server side (because it's much, much > faster)? Does COPY use the default encoding of the database? If not, > what? > If this is a restrictive as it appears, and there are no outstanding > enhancements planned in this area, I might be interested in improving > this command to allow specifying the encoding and to have it do obvious > stuff like recognize UTF lead bytes automatically. At the very least, > the documentation needs some work to explain these subtleties. The COPY command reference page saith Input data is interpreted according to the current client encoding, and output data is encoded in the the current client encoding, even if the data does not pass through the client but is read from or written to a file. Seems clear enough to me. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] COPY command character set
> set client_encoding = 'utf8'; > copy from stdin/to stdout; What if I want to do this on the server side (because it's much, much faster)? Does COPY use the default encoding of the database? If not, what? If this is a restrictive as it appears, and there are no outstanding enhancements planned in this area, I might be interested in improving this command to allow specifying the encoding and to have it do obvious stuff like recognize UTF lead bytes automatically. At the very least, the documentation needs some work to explain these subtleties. -- Peter Headland Architect Actuate Corporation -Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: Wednesday, September 09, 2009 19:14 To: Peter Headland Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] COPY command character set "Peter Headland" writes: > The documentation of the COPY command does not state what character > set(s) are recognized or written. I need to import and export UTF-8 > data; how can I do that? set client_encoding = 'utf8'; copy from stdin/to stdout; regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] COPY command character set
The documentation of the COPY command does not state what character set(s) are recognized or written. I need to import and export UTF-8 data; how can I do that? -- Peter Headland Architect Actuate Corporation
Re: [GENERAL] Race hazard deleting using CTID?
> Well, the CTID of a row you can see can't be changed by another > transaction while your transaction is still live. However, if > someone else does modify/delete one of those rows concurrently, > it will fail the outer WHERE check and thus silently not be > deleted. Net effect is that you might delete fewer than 10 > rows. Not sure if you'd consider that a race hazard or not. Thanks for the explanation. My actual use-case is an UPDATE, and updating fewer rows than I wanted to is definitely not the desired behavior. It's a race hazard condition by definition if the behavior of a system is non-deterministic due to timing of another process. This is all in a pl/pgsql function. I was trying to avoid a cursor and loop solution in the hope of better performance. Sounds like I need to separate the SELECT into a top-level statement and get the list of ctids into an array variable that I then use in the DELETE (actually an UPDATE in my case, but that's beside the point). I know this question would be better on new users, but since we already have all the context established, I'd be grateful to know how to get the ctids into an array variable in pl/pgsql. -- Peter Headland Architect Actuate Corporation -Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: Tuesday, August 11, 2009 12:01 To: Peter Headland Cc: pgsql-general@postgresql.org; Pavel Stehule Subject: Re: Race hazard deleting using CTID? "Peter Headland" writes: > My question is, does this code contain a race hazard, because the list from the SELECT might get changed by another session before the DELETE uses it? > delete from del where ctid = any(array(select ctid from del limit 10)) Well, the CTID of a row you can see can't be changed by another transaction while your transaction is still live. However, if someone else does modify/delete one of those rows concurrently, it will fail the outer WHERE check and thus silently not be deleted. Net effect is that you might delete fewer than 10 rows. Not sure if you'd consider that a race hazard or not. > If so, am I correct to think that adding FOR UPDATE to create the version below would eliminate the hazard? > delete from del where ctid = any(array(select ctid from del limit 10 for update)) If you'd bothered to try that before asking the list, you'd know the system won't take it --- FOR UPDATE is only supported at top level. You could probably do something equivalent using a plpgsql loop, or pulling the CTIDs back to the client side. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Race hazard deleting using CTID?
I believe that the following statement was originally suggested by Tom Lane; I got it from Pavel Stehule's PostgreSQL Tips page. My question is, does this code contain a race hazard, because the list from the SELECT might get changed by another session before the DELETE uses it? delete from del where ctid = any(array(select ctid from del limit 10)) If so, am I correct to think that adding FOR UPDATE to create the version below would eliminate the hazard? delete from del where ctid = any(array(select ctid from del limit 10 for update)) -- Peter Headland Architect Actuate Corporation -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Does PERFORM hold a lock?
If I do the following in a pl/pgsql function, will an update lock be held until the end of the transaction? PERFORM c FROM t FOR UPDATE; -- Peter Headland Architect Actuate Corporation