Re: [PATCHES] COPY for CSV documentation

2004-04-11 Thread Andrew Dunstan
Bruce Momjian said:
>> >Yes, my worry is that someone will use a multibyte character that the
>> >system sees as several bytes and enters CSV mode.
>> >
>>
>>
>> How about if we specify it explicitly, like BINARY, instead of it
>> being  implied by the length of DELIMITER?
>>
>> COPY a FROM stdin CSV DELIMITER ',"';
>>
>> That would make the patch somewhat more extensive, but maybe not
>> hugely  more invasive (I tried to keep it as uninvasive as possible).
>> I could do  that, I think.
>
> That's what I was wondering.  Is triggering CSV for multi-character
> delimiters a little too clever?  This reminds me of the use of LIMIT
> X,Y with no indication which is limit and which is offset.
>
> We certainly could code to prevent the multibyte problem I mentioned,
> but should we?


I confess that in my anglocentric world I have remained lamentably
ignorant of how MBCS works. Just reading up a little, and looking over
some of our code (e.g. the scanner) it looks like the simple solution
would be to check that the delimiter was 8-bit clean. (I assume that ASCII
is a subset of every MBCS we support - is that correct?)

However ...

>
> I am thinking just:
>
>> COPY a FROM stdin WITH CSV ',"';
>
> or
>
>> COPY a FROM stdin WITH DELIMITER "," QUOTE '"' EQUOTE '"';
>
> EQUOTE for embedded quote.  These are used in very limited situations
> and don't have to be reserved words or anything.
>
> I can help with these changes if folks like them.
>


I prefer either the first, because it ensures things are specified
together.

If you want to do that I will work on some regression tests.

cheers

andrew





---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PATCHES] Russian FAQ update

2004-04-11 Thread Bruce Momjian
Serguei Mokhov wrote:
> Hello,
> 
> This update fixes a few small typos in names,
> pronouns and formatting in the Russian FAQ.
> 
> To Victor:
> 
> Why do you use the translit encoding for FAQ
> instead of KOI8-R as the rest of the translations
> in pg?

Would you please send a patch against the HTML version:

/doc/src/FAQ/FAQ_russian.html

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PATCHES] COPY for CSV documentation

2004-04-11 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> That's what I was wondering.  Is triggering CSV for multi-character
> delimiters a little too clever?  This reminds me of the use of LIMIT X,Y
> with no indication which is limit and which is offset.

I agree, this seems risky and not at all readable to someone who doesn't
remember exactly how the parameter is defined.

>> COPY a FROM stdin WITH DELIMITER "," QUOTE '"' EQUOTE '"';
> EQUOTE for embedded quote.

ESCAPE would be better no?  It's already a keyword ...

BTW, don't forget that the syntax options have to be provided in psql's
\copy as well.  Did the patch cover that?

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PATCHES] COPY for CSV documentation

2004-04-11 Thread Bruce Momjian
Andrew Dunstan wrote:
> > I am thinking just:
> >
> >> COPY a FROM stdin WITH CSV ',"';
> >
> > or
> >
> >> COPY a FROM stdin WITH DELIMITER "," QUOTE '"' EQUOTE '"';
> >
> > EQUOTE for embedded quote.  These are used in very limited situations
> > and don't have to be reserved words or anything.
> >
> > I can help with these changes if folks like them.
> >
> 
> 
> I prefer either the first, because it ensures things are specified
> together.
> 
> If you want to do that I will work on some regression tests.


( Jump to the bottom for my final solution.)


I thought about this today.  I am thinking of:

> COPY a FROM stdin WITH CSV

or

> COPY a FROM stdin WITH CSV '""' DELIMITER ','

In other words, the string after CSV is optional.  However, looking at
the COPY syntax, there isn't any case where we have an optional string
after a keyword.  Is that OK?

In this case, CVS is a mode that makes the delimiter ',' and the quote
and quote escape '"'.  This way, CVS doesn't require any special quote
if you want the default.

However, this still has CSV using a two-character string with special
meaning for the first and second characters.  What if we call it QUOTE
mode:

> COPY a FROM stdin WITH QUOTE

that enables CVS with comma delimiters and '"' for quote escape, so the
above would be equavalent to:

> COPY a FROM stdin WITH QUOTE '"' ESCAPE '"' DELIMITER ','

(I have used ESCAPE because Tom suggested it and it is already a
keyword.)

I am a little worried that ESCAPE only has meaning with QUOTE, and QUOTE
sets defaults for all the others.  This makes the syntax addition pretty
confusing for users.

---

Thinking further, maybe we need to add CSV, QUOTE, and ESCAPE to COPY.
QUOTE and ESCAPE are only available in CVS mode, so you can say:

> COPY a FROM stdin WITH CSV

or

> COPY a FROM stdin WITH CSV ESCAPE '\\'

This means that there is no optional string for keywords.  Here is the
line at the bottom we have to add to the COPY syntax.

   COPY tablename [ ( column [, ...] ) ]
   TO { 'filename' | STDOUT }
   [ [ WITH ]
 [ BINARY ]
 [ OIDS ]
 [ DELIMITER [ AS ] 'delimiter' ]
 [ NULL [ AS ] 'null string' ] ]

 [ CSV [ QUOTE 'quote' ] [ ESCAPE 'escape' ] ]

DELIMITER default to tab, except in CSV mode, where it is a comma.

That sounds very clear to me.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PATCHES] COPY for CSV documentation

2004-04-11 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > That's what I was wondering.  Is triggering CSV for multi-character
> > delimiters a little too clever?  This reminds me of the use of LIMIT X,Y
> > with no indication which is limit and which is offset.
> 
> I agree, this seems risky and not at all readable to someone who doesn't
> remember exactly how the parameter is defined.

Yep.

> >> COPY a FROM stdin WITH DELIMITER "," QUOTE '"' EQUOTE '"';
> > EQUOTE for embedded quote.
> 
> ESCAPE would be better no?  It's already a keyword ...

Yep.

> BTW, don't forget that the syntax options have to be provided in psql's
> \copy as well.  Did the patch cover that?

Oh, yea.  That needs to be done too.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] Russian FAQ update

2004-04-11 Thread Serguei Mokhov
- Original Message - 
From: "Bruce Momjian" <[EMAIL PROTECTED]>
Sent: April 11, 2004 10:25 PM

> Serguei Mokhov wrote:
> > Hello,
> > 
> > This update fixes a few small typos in names,
> > pronouns and formatting in the Russian FAQ.
> > 
> > To Victor:
> > 
> > Why do you use the translit encoding for FAQ
> > instead of KOI8-R as the rest of the translations
> > in pg?
> 
> Would you please send a patch against the HTML version:
> 
> /doc/src/FAQ/FAQ_russian.html

Diff attached.

-s



FAQ_russian.diff
Description: Binary data

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] Russian FAQ update

2004-04-11 Thread Bruce Momjian


Patch applied.  Thanks.

---


Serguei Mokhov wrote:
> - Original Message - 
> From: "Bruce Momjian" <[EMAIL PROTECTED]>
> Sent: April 11, 2004 10:25 PM
> 
> > Serguei Mokhov wrote:
> > > Hello,
> > > 
> > > This update fixes a few small typos in names,
> > > pronouns and formatting in the Russian FAQ.
> > > 
> > > To Victor:
> > > 
> > > Why do you use the translit encoding for FAQ
> > > instead of KOI8-R as the rest of the translations
> > > in pg?
> > 
> > Would you please send a patch against the HTML version:
> > 
> > /doc/src/FAQ/FAQ_russian.html
> 
> Diff attached.
> 
> -s
> 

[ Attachment, skipping... ]

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PATCHES] COPY for CSV documentation

2004-04-11 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> In other words, the string after CSV is optional.  However, looking at
> the COPY syntax, there isn't any case where we have an optional string
> after a keyword.  Is that OK?

Seems better to avoid it.

> However, this still has CSV using a two-character string with special
> meaning for the first and second characters.

One point that I don't think was made before is that if we do any such
thing, we'll be forever foreclosing any chance of allowing
multi-character delimiters.  ISTM that would not be forward-looking.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PATCHES] COPY for CSV documentation

2004-04-11 Thread Andrew Dunstan
Tom Lane said:
>>> COPY a FROM stdin WITH DELIMITER "," QUOTE '"' EQUOTE '"';
>> EQUOTE for embedded quote.
>
> ESCAPE would be better no?  It's already a keyword ...

much better.

>
> BTW, don't forget that the syntax options have to be provided in psql's
> \copy as well.  Did the patch cover that?
>

No, because it didn't make a syntax change. I admit I didn't test that
though.

cheers

andrew



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] COPY for CSV documentation

2004-04-11 Thread Andrew Dunstan
Bruce Momjian said:
>
>   COPY tablename [ ( column [, ...] ) ]
>   TO { 'filename' | STDOUT }
>   [ [ WITH ]
> [ BINARY ]
> [ OIDS ]
> [ DELIMITER [ AS ] 'delimiter' ]
> [ NULL [ AS ] 'null string' ] ]
>
> [ CSV [ QUOTE 'quote' ] [ ESCAPE 'escape' ] ]
>
> DELIMITER default to tab, except in CSV mode, where it is a comma.
>
> That sounds very clear to me.
>


TIMTOWTDI, as we say in the perl world, and we could debate it endlessly.
This looks fine to me.

a few points:

. ESCAPE should default to same-as-quote
. in CSV mode, NULL should default to '' - that was in what I sent in.
. I assume that for now we will enforce the one byte rule for QUOTE and
ESCAPE as well as for DELIMITER.

cheers

andrew





---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]