Re: [HACKERS] invalidly encoded strings

2007-09-11 Thread db
 Try the sequence below. Then, try to dump and then reload the database.
 When you try to reload it, you will get an error:

 ERROR:  invalid byte sequence for encoding UTF8: 0xbd

 I know this could be a problem (like chr() with invalid byte pattern).

And that's enough of a problem already. We don't need more problems.

 What I really want to know is whether a read query something like this:

 SELECT * FROM japanese_table ORDER BY convert(japanese_text using
 utf8_to_euc_jp);

 could be a problem (I assume we use C locale).

If convert() produces a sequence of bytes that can't be interpreted as a
string in the server encoding then it's broken. IMHO convert() should
return a bytea value. If we had good encoding/charset support we could do
better, but we can't today.

The above example would work fine if convert() returned a bytea. In the C
locale the string would be compared byte for byte and that's what you get
with bytea values as well.

Strings are not sequences of bytes that can be interpreted in different
ways. That's what bytea values are. Strings are in a specific encoding
always, and in pg that encoding is fixed to a single one for a whole
cluster at initdb time. We should not confuse text with bytea.

/Dennis
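
The distinction drawn above can be illustrated outside the database. The following is only a sketch in Python, not PostgreSQL code: the helper names `is_valid_utf8` and `euc_jp_sort_key` and the sample rows are made up for illustration. It shows that EUC-JP output is generally not a valid UTF-8 string, while the same bytes work fine as a byte-for-byte sort key, which is what a bytea-returning convert() would give in the C locale.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def euc_jp_sort_key(text: str) -> bytes:
    """Hypothetical stand-in for a convert() that returns bytea:
    the EUC-JP bytes of the text, to be compared byte for byte."""
    return text.encode("euc_jp")


# A lone 0xbd byte, as in the error message, is not valid UTF-8.
print(is_valid_utf8(b"\xbd"))

# EUC-JP bytes are not a valid string in a UTF-8 database...
print(is_valid_utf8("日本".encode("euc_jp")))

# ...but they are perfectly good bytes to order by.
rows = ["日本", "東京", "大阪"]  # sample data, made up for this sketch
ordered = sorted(rows, key=euc_jp_sort_key)
print(ordered)
```

Python compares `bytes` objects byte by byte, so the sort above mirrors the C-locale comparison without ever pretending the converted value is text.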


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] invalidly encoded strings

2007-09-10 Thread db
 I think the concern is when they use only one slash, like:
   E'\377\000\377'::bytea
 which, as I mentioned before, is not correct anyway.

 Wait, why would this be wrong? How would you enter the three-byte bytea
 consisting of those three bytes described above?

Either as

E'\\377\\000\\377'

or

'\377\000\377'

/Dennis
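
The two forms above differ only in how many layers of backslash processing the text goes through before it reaches the bytea input routine. Here is a rough Python sketch of those two layers, under simplifying assumptions: `strip_e_literal_escapes` is a hypothetical helper that handles only doubled backslashes (the real E'' literal syntax supports more escapes), and `decode_bytea_escape` models only the `\\` and `\ooo` forms of the bytea escape format for ASCII input.

```python
def strip_e_literal_escapes(body: str) -> str:
    """Layer 1 (simplified): an E'' literal turns each doubled
    backslash into a single backslash. Other escapes are ignored here."""
    return body.replace("\\\\", "\\")


def decode_bytea_escape(text: str) -> bytes:
    """Layer 2 (simplified): bytea escape-format input.
    Accepts plain ASCII characters, a doubled backslash, and \\ooo octal."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.extend(ch.encode("ascii"))
            i += 1
        elif text[i + 1:i + 2] == "\\":
            out.append(0x5C)  # a literal backslash byte
            i += 2
        else:
            digits = text[i + 1:i + 4]
            is_octal = (len(digits) == 3
                        and digits[0] in "0123"
                        and all(d in "01234567" for d in digits))
            if not is_octal:
                raise ValueError("invalid input syntax for bytea")
            out.append(int(digits, 8))
            i += 4
    return bytes(out)


# E'\\377\\000\\377': the literal layer halves the backslashes first.
via_e_literal = decode_bytea_escape(strip_e_literal_escapes(r"\\377\\000\\377"))

# '\377\000\377': assuming the literal layer leaves backslashes alone.
via_plain_literal = decode_bytea_escape(r"\377\000\377")

print(via_e_literal.hex(), via_plain_literal.hex())
```

In this model both spellings hand the same text, `\377\000\377`, to the bytea layer. The single-slash E'' form is the odd one out because the literal layer would consume the escapes itself, before bytea input ever sees them.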





Re: [HACKERS] like/ilike improvements

2007-05-23 Thread db
 And Dennis said:

 It is only when you have a pattern like '%_' when this is a problem
 and we could detect this and do byte by byte when it's not. Now we
 check (*p == '\\') || (*p == '_') in each iteration when we scan over
 characters for '%', and we could do it once and have different loops
 for the two cases.

 That's pretty much what the patch does now - it never tries to match a
 single byte when it sees _, whether or not preceded by %.

My comment was about UTF-8 since I thought we were making a special
version for UTF-8. I don't know what properties other multibyte encodings
have.

/Dennis
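
The UTF-8 property in question is that continuation bytes (10xxxxxx) can never be mistaken for the first byte of a character. Below is a small Python sketch of a matcher that relies on this. It is not the patch under discussion: `like_utf8` and its helpers are hypothetical names, and the recursion is chosen for brevity, not speed. Literals and % are handled byte by byte, and only _ needs to know where a character starts and ends.

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return byte & 0xC0 == 0x80


def utf8_char_len(lead: int) -> int:
    """Length in bytes of the UTF-8 character starting with this lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def like_utf8(text: bytes, pattern: bytes) -> bool:
    """Match UTF-8 encoded text against a LIKE pattern (%, _, backslash escape)."""
    if not pattern:
        return not text
    c = pattern[0]
    if c == ord("%"):
        # Try every byte offset; no character-boundary bookkeeping needed here.
        rest = pattern[1:]
        return any(like_utf8(text[i:], rest) for i in range(len(text) + 1))
    if not text:
        return False
    if c == ord("_"):
        # _ must consume one whole character, never a single stray byte.
        if is_continuation(text[0]):
            return False
        return like_utf8(text[utf8_char_len(text[0]):], pattern[1:])
    if c == ord("\\") and len(pattern) > 1:
        # Escaped character: compare the next pattern byte literally.
        pattern = pattern[1:]
        c = pattern[0]
    return text[0] == c and like_utf8(text[1:], pattern[1:])


print(like_utf8("日本語".encode("utf-8"), "日_語".encode("utf-8")))
print(like_utf8("é".encode("utf-8"), b"__"))
```

A literal in a valid UTF-8 pattern always begins with a lead byte, so it cannot match at a continuation byte in the text. That is why the byte-wise scan after % is safe, and why the '%_' combination is the only place where the scan can land mid-character and has to be rejected.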




Re: [HACKERS] Money type todos?

2007-03-21 Thread db
 Dennis Bjorklund writes:

 What is the reason to keep it?

 The words-of-one-syllable answer is that D'Arcy Cain is still willing
 to put work into supporting the money type, and if it still gets the
 job done for him then it probably gets the job done for some other
 people too.

 Personally, as a former currency trader I've not seen any proposals on
 this list for a money type that I'd consider 100% feature complete.
 The unit-identification part of it is interesting, but pales into
 insignificance compared to the problem that the unit values vary
 constantly

The unit (currency) part is what I don't like about the money type.

A fast, size-limited fixed-point type is something I think is good. It
could very well be called money if we want, or we can give it a more
neutral name.

/Dennis
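
To make the idea of a unit-free fixed-point type concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the name `Fixed2`, the scale of two decimal places, and the 64-bit range are not taken from the money type's actual implementation. The value is just an integer count of hundredths, with no currency attached.

```python
from dataclasses import dataclass
from decimal import Decimal

SCALE = 100                      # assumed: two digits after the decimal point
INT64_MIN = -2**63               # assumed: storage is one signed 64-bit integer
INT64_MAX = 2**63 - 1


@dataclass(frozen=True, order=True)
class Fixed2:
    """Fixed-point number stored as an integer count of hundredths."""
    raw: int

    def __post_init__(self):
        # Size-limited: reject anything that would not fit in 64 bits.
        if not INT64_MIN <= self.raw <= INT64_MAX:
            raise OverflowError("value out of range for 64-bit fixed point")

    @classmethod
    def parse(cls, text: str) -> "Fixed2":
        scaled = Decimal(text) * SCALE
        if scaled != scaled.to_integral_value():
            raise ValueError("more than two fractional digits")
        return cls(int(scaled))

    def __add__(self, other: "Fixed2") -> "Fixed2":
        # Addition is plain integer arithmetic, which is what makes it fast.
        return Fixed2(self.raw + other.raw)

    def __str__(self) -> str:
        sign = "-" if self.raw < 0 else ""
        whole, frac = divmod(abs(self.raw), SCALE)
        return f"{sign}{whole}.{frac:02d}"


total = Fixed2.parse("19.99") + Fixed2.parse("0.01")
print(total)
```

Nothing here knows about dollars or yen, which is the point: the arithmetic and storage are useful on their own, and the name of the type is a separate question.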

