Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Tatsuo Ishii
 OK, I've been spreading rumours about fixing the internationalization
 problems, so let me make it a bit more clear.  Here are the problems that
 need to be fixed:
 
 - Only one locale per process possible.
 
 - Only one gettext-language per process possible.
 
 - lc_collate and lc_ctype need to be held fixed in the entire cluster.
 
 - Gettext relies on iconv character set conversion, which relies on
   lc_ctype, which leads to a complete screw-up in the server because of
   the previous item.
 
 - Locale is fixed per cluster but encoding is fixed per database; the two
   are unaware of each other and don't get along.
 
 - No support for upper/lower with multibyte encoding.
 
 - Implementation of Unicode horribly incomplete.
 
 These are all dependent on each other and sort of flow into each other.
 
 Here is a proposed ordering of steps toward improving the situation:
 
 1. Take out the character set conversion routines from the backend and
 make them a library of their own.  This could possibly be modelled after
 iconv, but not necessarily.  Or we might conclude that we can just use
 iconv in the first place.

How do you handle user-defined conversions?

 2. Reimplement gettext to use 1. and allow switching of language and
 encoding at run-time.
 
 3. Implement Unicode collation algorithm and character classification
 routines that are aware of 1.  Use that in place of system locale
 routines.

I don't see the relationship between Unicode and whatever you are going
to replace the system locale routines with. If you are heading toward a
Unicode-centric implementation, I will object.

 4. Allow choice of locale per database.  (This should be fairly easy after
 3.)
 
 5. Allow choice of locale per column and implement collation coercion
 according to SQL standard.
 
 This could easily take a long time, but I feel that even if we have to
 stop after 2., 3., or 4. at feature freeze, we'd be a lot farther.
 
 Comments?  Anything else that needs fixing?
 
 -- 
 Peter Eisentraut   [EMAIL PROTECTED]
 
 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
http://archives.postgresql.org
 



Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Dennis Bjorklund
On Mon, 24 Nov 2003, Peter Eisentraut wrote:

 1. Take out the character set conversion routines from the backend and
 make them a library of their own.  This could possibly be modelled after
 iconv, but not necessarily.  Or we might conclude that we can just use
 iconv in the first place.
 
 2. Reimplement gettext to use 1. and allow switching of language and
 encoding at run-time.

Force all translations to be in unicode and convert to other client
encodings if needed. There is no need to support translations stored using
different encodings.

 3. Implement Unicode collation algorithm and character classification
 routines that are aware of 1.  Use that in place of system locale
 routines.

Couldn't we use some library that already has this, like glib (or
something else)? If it's not up to what we need, then fix that library
instead.

--
/Dennis




Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Peter Eisentraut
Tatsuo Ishii writes:

  3. Implement Unicode collation algorithm and character classification
  routines that are aware of 1.  Use that in place of system locale
  routines.

 I don't see the relationship between Unicode and whatever you are going
 to replace the system locale routines with. If you are heading toward a
 Unicode-centric implementation, I will object.

The Unicode collation algorithm works for any character set, not only for
Unicode.  It just happens to be published by the Unicode consortium.  So
basically this is just a concrete alternative to making up our own out of
thin air.  Also, the Unicode collation algorithm gives us the flexibility
to define customizations of collations that users frequently want, such as
ignoring or not ignoring punctuation.

Actually, what will more likely happen is that we'll define a collation as
a collection of one or more support functions, the equivalents of
strxfrm() and possibly a few more.  Then it will be up to those functions
to define the collation order.  The server will provide utility functions
that will facilitate implementing a collation order that follows the
Unicode collation algorithm, but you could just as well implement one
using memcmp() or whatever you like.
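
A minimal sketch of that idea, written in Python purely for illustration and not taken from any patch: a collation is modelled as a function that turns a string into a sort key, which is the role strxfrm() plays in C. The names `bytewise`, `ignore_punctuation`, `COLLATIONS` and `sort_with` are hypothetical, and `bytewise` stands in for a memcmp()-style ordering.

```python
from typing import Callable, Dict, List

# A "collation" here is one support function: string in, sort key out.
Collation = Callable[[str], bytes]

def bytewise(s: str) -> bytes:
    """memcmp()-style ordering: compare the raw UTF-8 bytes."""
    return s.encode("utf-8")

def ignore_punctuation(s: str) -> bytes:
    """Toy customization: fold case and drop everything that is not a letter or digit."""
    return "".join(ch for ch in s.casefold() if ch.isalnum()).encode("utf-8")

# Hypothetical registry mapping a collation name to its support function.
COLLATIONS: Dict[str, Collation] = {
    "bytewise": bytewise,
    "nopunct": ignore_punctuation,
}

def sort_with(name: str, values: List[str]) -> List[str]:
    """Sort values using the key function registered under the given collation name."""
    return sorted(values, key=COLLATIONS[name])

words = ["co-op", "Coop", "cone", "coop"]
by_bytes = sort_with("bytewise", words)
without_punctuation = sort_with("nopunct", words)
```

With the byte-wise function the capitalized and hyphenated spellings sort apart from "coop"; with the punctuation-ignoring one all three compare equal and only "cone" moves ahead of them.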

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Peter Eisentraut
Dennis Bjorklund writes:

 Force all translations to be in unicode and convert to other client
 encodings if needed. There is no need to support translations stored using
 different encodings.

Tell that to the Japanese.

 Couldn't we use some library that already has this, like glib (or
 something else)? If it's not up to what we need, then fix that library
 instead.

I wasn't aware that glib had this.  I'll look.

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Zeugswetter Andreas SB SD
Have you looked at what is available from 
http://oss.software.ibm.com/icu/ ?

Seems they have a compatible license, but use some C++.

Andreas



Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Dennis Bjorklund
On Tue, 25 Nov 2003, Peter Eisentraut wrote:

  Force all translations to be in unicode and convert to other client
  encodings if needed. There is no need to support translations stored using
  different encodings.
 
 Tell that to the Japanese.

I've always thought Unicode was enough to represent even Japanese. Then
the client encoding can be something else that we convert to. In any
case, the encoding of the message catalog has to be known to the system so
it can be converted to the correct encoding for the client.
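
A minimal sketch of that conversion step, assuming the catalog is stored in UTF-8 and the client wants EUC-JP. The function name `to_client_encoding` is hypothetical, and Python's codecs merely stand in for whatever conversion routines the server would use.

```python
def to_client_encoding(raw: bytes, catalog_encoding: str, client_encoding: str) -> bytes:
    """Decode a catalog message using its declared encoding, then re-encode it for the client."""
    text = raw.decode(catalog_encoding)
    # errors="replace" substitutes '?' for characters the client encoding cannot represent.
    return text.encode(client_encoding, errors="replace")

# A message as it might be stored in a UTF-8 catalog ("error" in Japanese).
catalog_entry = "エラー".encode("utf-8")
for_client = to_client_encoding(catalog_entry, "utf-8", "euc_jp")

# A character that the client encoding lacks degrades to '?' instead of failing.
degraded = to_client_encoding("é".encode("utf-8"), "utf-8", "ascii")
```

The point of the sketch is only that the conversion needs both encodings as inputs; without the catalog's encoding the first decode step cannot be done reliably.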

  Couldn't we use some library that already has this, like glib (or
  something else)? If it's not up to what we need, then fix that library
  instead.
 
 I wasn't aware that glib had this.  I'll look.

And I don't really know what demands pg has, but glib has a lot of support 
functions for utf-8. At least we should take a look at it.

-- 
/Dennis




Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Tatsuo Ishii
 On Tue, 25 Nov 2003, Peter Eisentraut wrote:
 
   Force all translations to be in unicode and convert to other client
   encodings if needed. There is no need to support translations stored using
   different encodings.
  
  Tell that to the Japanese.
 
 I've always thought Unicode was enough to represent even Japanese. Then
 the client encoding can be something else that we convert to. In any
 case, the encoding of the message catalog has to be known to the system so
 it can be converted to the correct encoding for the client.

I'm tired of explaining that Unicode is not that perfect. Another gotcha
with Unicode is that the UTF-8 encoding (which we currently use) consumes 3
bytes for each Kanji character, while other encodings consume only 2
bytes. IMO a 3/2 storage ratio cannot be neglected for database use.
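
The size difference can be checked directly. This is a small illustration using Python's built-in codecs; the sample string and the choice of EUC-JP and Shift_JIS as the "other encodings" are assumptions made for the example.

```python
text = "漢字"  # two Kanji characters

# Encoded length in bytes under UTF-8 and two common Japanese legacy encodings.
sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "euc_jp", "shift_jis")}

# Storage ratio of UTF-8 relative to EUC-JP for this string.
ratio = sizes["utf-8"] / sizes["euc_jp"]
```

For this string UTF-8 needs 6 bytes and each legacy encoding needs 4, which is the 3/2 ratio mentioned above.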
--
Tatsuo Ishii



Re: [HACKERS] ObjectWeb/Clustered JDBC

2003-11-25 Thread Dave Cramer
Hans,

I don't understand the statement about DECLARE CURSOR being missing. The
backend supports it, doesn't it?

Dave
On Sun, 2003-11-23 at 12:12, Hans-Jürgen Schönig wrote:
 Peter Eisentraut wrote:
  I was at the ObjectWeb Conference today; ObjectWeb
  (http://www.objectweb.org) being a consortium that has amassed quite an
  impressive array of open-source, Java-based middleware under their
  umbrella, including for instance our old friend Enhydra.  And they
  regularly kept mentioning PostgreSQL in their presentations.
  
  To those that are interested in distributed transactions/two-phase commit,
  I recommend taking a look at Clustered JDBC
  (http://c-jdbc.objectweb.org/).  While this is not exactly the same thing,
  it looks to be a pretty neat solution for a similar class of applications.
  In particular, it provides redundancy, load balancing, caching, and even
  database independence.
  
 
 
 It is indeed a nice solution but it is far from ready yet.
 Especially the disaster recovery mechanism and things such as adding new 
 masters need some more work.
 What I really miss is DECLARE CURSOR. Maybe it will be in there some 
 day :).
 However, we have done some real testing with sync replication (4 x pg, 1 
 x oracle). It performed surprisingly well (the JDBC part, not the Oracle 
 one ;) ).
 Maybe this will be something really useful within the next few months.
 
   Cheers,
 
   Hans




Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Dennis Bjorklund
On Tue, 25 Nov 2003, Tatsuo Ishii wrote:

 I'm tired of explaining that Unicode is not that perfect. Another gotcha
 with Unicode is that the UTF-8 encoding (which we currently use) consumes 3
 bytes for each Kanji character, while other encodings consume only 2
 bytes. IMO a 3/2 storage ratio cannot be neglected for database use.

I'm aware of how UTF-8 works, and I was talking about the message
catalogs. That does not affect what you store in the database in any way.

-- 
/Dennis




Re: [HACKERS] ObjectWeb/Clustered JDBC

2003-11-25 Thread Peter Eisentraut
Hans-Jürgen Schönig writes:

 Especially the disaster recovery mechanism and things such as adding new
 masters need some more work.

Yes, someone is working on automatic recovery (which would extend to
adding new masters by starting recovery from zero).  In fact, they're just
across town from you (together.at).

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Dennis Bjorklund
On Tue, 25 Nov 2003, Tatsuo Ishii wrote:

 I'm tired of explaining that Unicode is not that perfect. Another gotcha
 with Unicode is that the UTF-8 encoding (which we currently use) consumes 3
 bytes for each Kanji character, while other encodings consume only 2
 bytes. IMO a 3/2 storage ratio cannot be neglected for database use.

The rest of the world seems to select unicode as the way to handle
different languages in the UI of programs. For example gnome supports
nothing but unicode. How is that handled in your country? I know that you
are tired of people who don't understand how difficult it is for you, but
I really would like to know. Is gnome not used over there because of this?

About storing data in the database, I would expect it to work with any
encoding, just like I would expect pg to be able to store images in any
format.

I'll try not to mention unicode near you in the future :-)

-- 
/Dennis




Re: [HACKERS] Build farm

2003-11-25 Thread Bruce Momjian
Peter Eisentraut wrote:
 Bruce Momjian writes:
 
  FYI, the HP testdrive farm, http://www.testdrive.hp.com, has shared
  directories for most of the machines, meaning you can CVS update once
  and telnet in to compile for each platform.
 
 Except that you can't open connections to the outside from these machines.

Oh, yea.  You can connect to the machines with ftp, so I guess you would
have to CVS update on your local machine, then push the changes to the
farm.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Updates for RPMS.

2003-11-25 Thread Lamar Owen
On Monday 24 November 2003 03:34 pm, David Fetter wrote:
 I just tried building 0.3PGDG on Redhat 9, and got:

 # rpmbuild --rebuild postgresql-7.4-0.3PGDG.src.rpm
 [snip]
 checking krb5.h usability... no
 checking krb5.h presence... no
 checking for krb5.h... no
 configure: error: header file krb5.h is required for Kerberos 5
 error: Bad exit status from /var/tmp/rpm-tmp.97860 (%build)
 [snip]

 Is there some way to tell the .spec file where to look for krb5.h?

Yes.  Try
rpmbuild --define 'build89 1' --rebuild postgresql-7.4-0.3PGDG.src.rpm

I'm looking at automating this; but at the moment it is manual.  The spec file 
does check the build89 macro, and, if defined, throws in the right value for 
kerberos.
-- 
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC  28772
(828)862-5554
www.pari.edu




Re: [HACKERS] Build farm

2003-11-25 Thread Andrew Dunstan
Bruce Momjian wrote:

 FYI, the HP testdrive farm, http://www.testdrive.hp.com, has shared
 directories for most of the machines, meaning you can CVS update once
 and telnet in to compile for each platform.
 

As Peter pointed out, these machines are firewalled. But presumably one could upload a snapshot to them. What I had in mind was a more distributed system, though. 

Of course, these things are not mutually exclusive - using the HP testdrive farm looks like it might be nice. But it would be hard to automate, I suspect.

cheers

andrew







[HACKERS] fairly serious bug with pg_autovacuum in pg7.4

2003-11-25 Thread Brian Hirt
Hello,

I've run across a pretty serious problem with pg_autovacuum.
pg_autovacuum loses track of any table that has ever been truncated
(and possibly in other situations too). When I truncate a table, it gets a
new relfilenode in pg_class. This is a problem because pg_autovacuum
assumes pg_class.relfilenode will join to pg_stat_all_tables.relid.
pg_stat_all_tables.relid is actually the oid from pg_class, not the
relfilenode. These two values start out equal, so pg_autovacuum works
initially, but it fails later on because of this incorrect assumption.

Here is one query that pg_autovacuum uses (from pg_autovacuum.h) to get
tables, and that breaks:

select a.relfilenode, a.relname, a.relnamespace, a.relpages, a.relisshared,
       a.reltuples, b.schemaname, b.n_tup_ins, b.n_tup_upd, b.n_tup_del
from pg_class a, pg_stat_all_tables b
where a.relfilenode = b.relid and a.relkind = 'r'

here's a little test case you can use to see what happens:

basement=# create table test_table ( id int4 );
CREATE TABLE
basement=# select relname, relfilenode from pg_class where relkind = 'r' and relname = 'test_table';
  relname   | relfilenode
------------+-------------
 test_table |    28814151
(1 row)

basement=# select relid,relname from pg_stat_all_tables where relname = 'test_table';
  relid   |  relname
----------+------------
 28814151 | test_table
(1 row)

basement=# select a.relfilenode,a.relname,a.relnamespace,a.relpages,a.relisshared,a.reltuples,b.schemaname,
basement-# b.n_tup_ins,b.n_tup_upd,b.n_tup_del from pg_class a, pg_stat_all_tables b
basement-# where a.relfilenode=b.relid and a.relkind = 'r' and a.relname = 'test_table';
 relfilenode |  relname   | relnamespace | relpages | relisshared | reltuples | schemaname | n_tup_ins | n_tup_upd | n_tup_del
-------------+------------+--------------+----------+-------------+-----------+------------+-----------+-----------+-----------
    28814151 | test_table |         2200 |       10 | f           |      1000 | public     |         0 |         0 |         0
(1 row)

basement=#
basement=# truncate table test_table;
TRUNCATE TABLE
basement=# select relname, relfilenode from pg_class where relkind = 'r' and relname = 'test_table';
  relname   | relfilenode
------------+-------------
 test_table |    28814153
(1 row)

basement=# select relid,relname from pg_stat_all_tables where relname = 'test_table';
  relid   |  relname
----------+------------
 28814151 | test_table
(1 row)

basement=# select a.relfilenode,a.relname,a.relnamespace,a.relpages,a.relisshared,a.reltuples,b.schemaname,
basement-# b.n_tup_ins,b.n_tup_upd,b.n_tup_del from pg_class a, pg_stat_all_tables b
basement-# where a.relfilenode=b.relid and a.relkind = 'r' and a.relname = 'test_table';
 relfilenode | relname | relnamespace | relpages | relisshared | reltuples | schemaname | n_tup_ins | n_tup_upd | n_tup_del
-------------+---------+--------------+----------+-------------+-----------+------------+-----------+-----------+-----------
(0 rows)

basement=# drop table test_table;
DROP TABLE
basement=#

PS: i'm running pg-7.4 and pg_autovacuum from contrib.
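
The failure mode can be reproduced without a server. The following is a toy model in Python, not pg_autovacuum code: two in-memory lists stand in for pg_class and pg_stat_all_tables, using the values from the session above, and the helper `join` is hypothetical. It assumes, as the report states, that relid is the pg_class oid.

```python
# Toy stand-ins for the two catalogs; one row each, values from the report.
pg_class = [{"oid": 28814151, "relname": "test_table", "relfilenode": 28814151}]
pg_stat_all_tables = [{"relid": 28814151, "relname": "test_table"}]

def join(class_column: str):
    """Join the toy tables on pg_class.<class_column> = pg_stat_all_tables.relid."""
    return [
        (c["relname"], s["relid"])
        for c in pg_class
        for s in pg_stat_all_tables
        if c[class_column] == s["relid"]
    ]

# Before TRUNCATE the two identifiers are equal, so either join finds the table.
before = join("relfilenode")

# TRUNCATE assigns a new relfilenode; the oid, and therefore relid, is unchanged.
pg_class[0]["relfilenode"] = 28814153

after_by_relfilenode = join("relfilenode")  # the join pg_autovacuum uses
after_by_oid = join("oid")                  # joining on the oid instead
```

In this model the relfilenode join returns nothing after the truncate while the oid join still finds the table, which suggests that joining on a.oid = b.relid is the direction a fix would take.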



[HACKERS] Providing anonymous mmap as an option of sharing memory

2003-11-25 Thread Shridhar Daithankar
Hello All,

I was looking through the source and thought it would be worth seeking opinions on
this proposal.

From what I understood so far, the core shared memory handling is done in 
pgsql/src/backend/port/sysv_shmem.c. It is linked by configure as per the 
runtime environment.

So I would need to write another source file that exports the same APIs as
the one above (i.e. all non-static functions in that file) but uses mmap; that
would be enough to use anonymous mmap instead of SysV shared memory.

It might seem unnecessary to provide mmap-based shared memory, but this is just
one step I was thinking of.
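
For readers unfamiliar with the mechanism, here is a minimal sketch of memory shared through an anonymous mapping, written in Python rather than C and unrelated to the actual sysv_shmem.c interface. It assumes a Unix system with fork(); the function name `share_via_anonymous_mmap` is hypothetical.

```python
import mmap
import os

def share_via_anonymous_mmap(message: bytes) -> bytes:
    """Let a forked child write into an anonymous shared mapping; return what the parent reads."""
    # fileno -1 requests an anonymous mapping. On Unix the default is MAP_SHARED,
    # so the region stays shared with children created by fork().
    region = mmap.mmap(-1, mmap.PAGESIZE)
    pid = os.fork()
    if pid == 0:
        # Child: write into the shared region, then exit without running cleanup handlers.
        region[: len(message)] = message
        os._exit(0)
    # Parent: wait for the child, then read back what it wrote.
    os.waitpid(pid, 0)
    data = region[: len(message)]
    region.close()
    return data

result = share_via_anonymous_mmap(b"written by child")
```

Note that such a mapping is only inherited through fork(); an unrelated process cannot attach to it later, which is one way it differs from a SysV segment.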

In pgsql/src/backend/storage/ipc/shmem.c, all the shared memory allocations are 
done. I was thinking of creating a structure of all global variables in that 
file. The global variables would still be in place so that existing code would 
not break. But the structure would hold database specific buffering information. 
Let's call that structure database context.

That way we can assign different mmapped (anonymous, of course) regions per database.
In the backend, we could just switch database contexts, i.e. assign the global
variables from the database context and let the backend write to the appropriate
shared memory region. Every database would need at least two shared memory
regions: one for operating on its own buffers and another for the system, where it
could write to shared catalogs etc. It can close the shared memory regions
belonging to other databases on startup.

Of course, buffer management alone would not cover database contexts altogether.
WAL needs to be lumped in as well (not necessarily, though: if all WAL buffering goes
through the system shared region, everything will still work). I don't know if clog and
data file handling are affected by this. If WAL goes into the database context, we can
probably provide per-database WAL, which could go well with tablespaces as well.

In case of WAL per database, the operations done on a shared catalog from a 
backend would need flushing system WAL and database WAL to ensure such 
transaction commit. Otherwise only flushing database WAL would do.

This way we can provide a background writer process per database and a common
buffer per database, significantly reducing the impact of cross-database load. E.g.
vacuum full on one database would not hog another database through buffer cache
pollution. (I/O can still saturate, though.) This way we can push the hardware to its
limit, which might not be possible right now in some cases.

I was looking for the reason a large number of buffers degrades performance,
and the source code browsing spiralled into this thought. So far I haven't figured
out any reason why a large number of buffers would degrade performance. Still
looking for it.

Comments?

 Shridhar



Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Dennis Bjorklund writes:
 Couldn't we use some library that already has this, like glib (or
 something else)? If it's not up to what we need, then fix that library
 instead.

 I wasn't aware that glib had this.  I'll look.

Of course the trouble with relying on glibc is that we'd have no solution
for platforms that don't use glibc.

It might be okay to rely on glibc for a first-cut implementation,
realizing that we couldn't do everything at once anyway.

regards, tom lane



Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Actually, what will more likely happen is that we'll define a collation as
 a collection of one or more support functions, the equivalents of
 strxfrm() and possibly a few more.  Then it will be up to those functions
 to define the collation order.  The server will provide utility functions
 that will facilitate implementing a collation order that follows the
 Unicode collation algorithm, but you could just as well implement one
 using memcmp() or whatever you like.

That sounds like a good plan to me.  Personally I'd want a
memcmp()-based collation implementation available, so that people who
don't care about sorting anything beyond 7-bit ASCII don't need to pay
a lot of overhead.

We have seen over and over that strcoll() is depressingly slow in some
locales (at least on some platforms).  Do you have any feeling for the
real-world performance of the Unicode algorithm?

regards, tom lane



Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Doug McNaught
Tom Lane [EMAIL PROTECTED] writes:

 Peter Eisentraut [EMAIL PROTECTED] writes:
 
  I wasn't aware that glib had this.  I'll look.
 
 Of course the trouble with relying on glibc is that we'd have no solution
 for platforms that don't use glibc.

glib != glibc.  glib is the low-level library used by GTK and GNOME
for basic data structures, character handling etc.  It's LGPL AFAIK,
which would seem to rule out direct use from a licensing perspective.

-Doug



Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Hannu Krosing
Dennis Bjorklund wrote on Tue, 25.11.2003 at 14:51:
 On Tue, 25 Nov 2003, Tatsuo Ishii wrote:
 
  I'm tired of explaining that Unicode is not that perfect.

Of course not, but neither is the current multibyte support, with only marginal
support for Unicode (many people actually need upper()/lower()).
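
To illustrate why upper()/lower() need to know about the encoding, here is a small Python sketch. The helper `ascii_only_upper` is hypothetical and only imitates a byte-at-a-time conversion that knows nothing beyond ASCII; it is not a copy of the server's code.

```python
def ascii_only_upper(raw: bytes) -> bytes:
    """Upper-case only the bytes a-z, as a byte-at-a-time loop unaware of multibyte text would."""
    return bytes(b - 32 if 0x61 <= b <= 0x7A else b for b in raw)

word = "\u00e9lan"  # "élan", starting with a precomposed e-acute

# Byte-wise conversion of the UTF-8 form leaves the two-byte character untouched.
bytewise = ascii_only_upper(word.encode("utf-8")).decode("utf-8")

# Character-aware conversion handles it.
by_character = word.upper()
```

The byte-wise version yields "éLAN" while the character-aware one yields "ÉLAN".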

  Another gotcha
  with Unicode is that the UTF-8 encoding (which we currently use) consumes 3
  bytes for each Kanji character, while other encodings consume only 2
  bytes.

I think that for *storage* we should use SCSU (the Standard Compression
Scheme for Unicode).

  IMO a 3/2 storage ratio cannot be neglected for database use.

SCSU should solve that (actually, it should use fewer than 2 bytes per character
when encoding any single language).

 The rest of the world seems to select unicode as the way to handle
 different languages in the UI of programs. For example gnome supports
 nothing but unicode. How is that handled in your country? I know that you
 are tired of people who don't understand how difficult it is for you, but
 I really would like to know. Is gnome not used over there because of this?
 
 About storing data in the database, I would expect it to work with any
 encoding, just like I would expect pg to be able to store images in any
 format.
 
 I'll try not to mention unicode near you in the future :-)

---
Hannu








Re: [HACKERS] Providing anonymous mmap as an option of sharing memory

2003-11-25 Thread Tom Lane
Shridhar Daithankar [EMAIL PROTECTED] writes:
 I was looking through the source and thought it would be worth seeking
 opinions on this proposal.

This has been discussed and rejected before.  See the archives.

regards, tom lane



Re: [HACKERS] Build farm

2003-11-25 Thread Bruce Momjian
Andrew Dunstan wrote:
 Bruce Momjian wrote:
 
 FYI, the HP testdrive farm, http://www.testdrive.hp.com, has shared
 directories for most of the machines, meaning you can CVS update once
 and telnet in to compile for each platform.
 
 
 
 
 As Peter pointed out, these machines are firewalled. But presumably
 one could upload a snapshot to them. What I had in mind was a
 more distributed system, though.
 
 Of course, these things are not mutually exclusive - using the
 HP testdrive farm looks like it might be nice. But it would be
 hard to automate, I suspect.

I figured you could just upload once and telnet and build on each
machine.

--
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Greg Stark

Peter Eisentraut [EMAIL PROTECTED] writes:

 2. Reimplement gettext to use 1. and allow switching of language and
 encoding at run-time.
 
 3. Implement Unicode collation algorithm and character classification
 routines that are aware of 1.  Use that in place of system locale
 routines.

This sounds like you want to completely reimplement all of the locale handling
provided by the OS? That seems like a dead-end approach to me. There's no way
your handling will ever be as complete or as well optimized as some OS's.

Better to find ways to use the OS gettext and locale handling on platforms
that provide good interfaces. On platforms that don't provide good interfaces
either don't support the features or use some third party library to provide
a good implementation.

The only thing you really need in the database is a second parameter on all
the collation functions like strxfrm(col,locale) etc. Then functional indexes
take care of almost everything.
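
A rough sketch of what that could look like, in Python and with everything invented for the example: the two-argument `strxfrm` below is hypothetical, its two "locales" are toys rather than real locale definitions, and `FunctionalIndex` is just a sorted list standing in for an index on the expression strxfrm(col, locale).

```python
import bisect

def strxfrm(value: str, locale_name: str) -> bytes:
    """Hypothetical two-argument transform; both 'locales' here are toys."""
    if locale_name == "C":
        return value.encode("utf-8")             # plain byte order
    if locale_name == "caseless":
        return value.casefold().encode("utf-8")  # ignore case differences
    raise ValueError(f"unknown locale: {locale_name}")

class FunctionalIndex:
    """Sorted list of (strxfrm(col, locale), col) pairs, standing in for an expression index."""

    def __init__(self, values, locale_name):
        self.locale_name = locale_name
        self.entries = sorted((strxfrm(v, locale_name), v) for v in values)

    def less_than(self, bound: str):
        """Return the values ordered before `bound` under this index's locale."""
        key = strxfrm(bound, self.locale_name)
        # (key,) sorts before every (key, value) pair, so this finds the first entry >= key.
        cut = bisect.bisect_left(self.entries, (key,))
        return [v for _, v in self.entries[:cut]]

names = ["apple", "Banana", "cherry"]
c_index = FunctionalIndex(names, "C")
caseless_index = FunctionalIndex(names, "caseless")

below_b_in_c = c_index.less_than("b")
below_b_caseless = caseless_index.less_than("b")
```

The same column answers the same range question differently depending on which index is consulted, which is the sense in which one index per locale of interest would cover most needs.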

The only advantage to adding locales per-column and/or per-index is the
notational simplicity. Queries could do simple standard expressions and not
have to worry about calling strxfrm or other locale-specific functions all the
time. I'm not sure it's worth the complexity of having to deal with 
WHERE x < y where x and y are in different locales though.


-- 
greg




Re: [HACKERS] Commercial binary support?

2003-11-25 Thread Josh Berkus
Josh, Hans, et al.

Please take this thread OFF LIST IMMEDIATELY.

Its content is no longer appropriate for the Hackers mailing list, and we get 
enough traffic.  Flamewars are not a part of our community.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Build farm

2003-11-25 Thread Andrew Dunstan
Bruce Momjian wrote:
 Andrew Dunstan wrote:
  Bruce Momjian wrote:
   FYI, the HP testdrive farm, http://www.testdrive.hp.com, has shared
   directories for most of the machines, meaning you can CVS update once
   and telnet in to compile for each platform.
  
  As Peter pointed out, these machines are firewalled. But presumably
  one could upload a snapshot to them. What I had in mind was a
  more distributed system, though.
  
  Of course, these things are not mutually exclusive - using the
  HP testdrive farm looks like it might be nice. But it would be
  hard to automate, I suspect.
 
 I figured you could just upload once and telnet and build on each
 machine.

What I'm working on (slowly - I'm quite busy right now, and about to be 
away from home for 5 days) is a system which would (or could) run from 
cron on every member of the farm, and upload its results to a central 
server where it could be displayed, in a somewhat similar way to the way 
the Samba build farm works - see http://build.samba.org/ - so we'd be 
able to see at a glance when something is broken and where and why. We 
could also incorporate email notification of breakage, as a refinement.

I have a few pieces of this working but not a full suite yet - it will 
essentially be 3 perl scripts - one on the client (to run the update(s), 
build(s) and upload the results) and two on the central server (one for 
upload and one for display). When I get a demo page done I'll show it 
working with a couple of hosts.
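
As a rough idea of what the client side might do, here is a sketch in Python rather than the Perl described above. It only covers running the steps and collecting a report; the upload to a central server is left out, and the function name `run_steps`, the report layout and the placeholder commands are all assumptions made for the example.

```python
import subprocess
import sys

def run_steps(steps):
    """Run (name, argv) steps in order, stopping at the first failure; return a report dict."""
    report = {"status": "ok", "failed_step": None, "logs": {}}
    for name, argv in steps:
        proc = subprocess.run(argv, capture_output=True, text=True)
        # Keep the combined output of every step that ran, for display on the server.
        report["logs"][name] = proc.stdout + proc.stderr
        if proc.returncode != 0:
            report["status"] = "broken"
            report["failed_step"] = name
            break
    return report

# Placeholder commands so the sketch runs anywhere; a real farm member would put
# its own update, build and test commands here instead.
demo_steps = [
    ("update", [sys.executable, "-c", "print('sources updated')"]),
    ("build", [sys.executable, "-c", "import sys; sys.exit('compile error')"]),
    ("check", [sys.executable, "-c", "print('never reached')"]),
]
report = run_steps(demo_steps)
```

A script like this could be run from cron, with the resulting report sent to the central server, so that the display page can show which step broke on which host.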

Of course, you can automate (almost) anything, including telnet, but 
right now I'm assuming the farm members will have internet connectivity.

cheers

andrew





splitting WAL (was RE: [HACKERS] Providing anonymous mmap as an option of sharing memory)

2003-11-25 Thread Zeugswetter Andreas SB SD

 In case of WAL per database, the operations done on a shared catalog from a 
 backend would need flushing system WAL and database WAL to ensure such 
 transaction commit. Otherwise only flushing database WAL would do.

I don't think that is a good idea. If you want databases separated you should 
install more than one instance. That gives you way more flexibility.

Imho per database WAL is a deficiency, not a feature.

Andreas

PS: the problem with mmap was that it has no attached-process count



Re: [HACKERS] Considerations for lib64

2003-11-25 Thread Alvaro Herrera
On Mon, Nov 24, 2003 at 07:25:56PM +0100, Peter Eisentraut wrote:

 Currently, when you specify --with-openssl, then configure automatically
 adds -L/usr/local/ssl/lib to LDFLAGS if that directory exists.  This would
 pick up the wrong directory if you are in 64-bit mode.  Analogous behavior
 exists for --with-krb5 with /usr/athena.

 I think these default installation directories of OpenSSL and Kerberos are
 mostly obsolete these days, so I'd rather get rid of that behavior
 altogether and let people specify the necessary directories with
 --with-libraries and --with-includes like for any of the other optional
 packages that PostgreSQL supports.

Both default directories are wrong according to my installation.  In
fact, both libraries are just in /usr/lib.  Certainly it will be a mess
trying to compile with a different LDFLAGS if things are added randomly.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"How we put our fingers in each other's clay. That is friendship; playing
the potter and seeing what shapes can be drawn out of the other" (C. Halloway in
Something Wicked This Way Comes, R. Bradbury)



Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Kurt Roeckx
On Tue, Nov 25, 2003 at 08:40:57PM +0900, Tatsuo Ishii wrote:
  On Tue, 25 Nov 2003, Peter Eisentraut wrote:
  
  I've always thought unicode was enough to even represent Japanese. Then 
  the client encoding can be something else that we can convert to. In any 
  way, the encoding of the message catalog has to be known to the system so 
  it can be converted to the correct encoding for the client.
 
 I'm tired of telling that Unicode is not that perfect.

Maybe it should be explained what the problems really are,
instead of saying it isn't perfect?

From what I understand there is only a problem converting from
the legacy encoding to unicode, and the other way around, and
no problem if you stop doing the conversion.

The conversion problem arises because a character that has a single
representation in a legacy encoding can map to several different
characters in Unicode.

Some examples people might understand are:
- µ: In ISO 8859-1 it's char 0xB5.  In Unicode it can be U+00B5 (micro
sign) or U+03BC (greek small letter mu)
- Å: ISO 8859-1: 0xC5. Unicode: U+00C5 (latin capital letter a
with ring above) or U+212B (angstrom sign)
- The ohm sign (U+2126) vs the greek capital letter omega (U+03A9).
- Quotation marks: You have left double quote, right double
  quote, and a few others.
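
The pairs listed above can be checked mechanically. What follows is only a
minimal sketch using Python's standard unicodedata module, not anything from
the PostgreSQL tree; the helper name same_after is hypothetical. It shows
that each pair consists of distinct code points that compare equal only
after Unicode normalization, and that the micro sign needs the looser
compatibility form (NFKC) while the angstrom and ohm signs are canonical
equivalents (NFC).

```python
import unicodedata

# ISO 8859-1 maps every byte to the code point with the same value,
# so 0xB5 decodes to U+00B5 (MICRO SIGN) and 0xC5 to U+00C5.
micro = b"\xb5".decode("iso-8859-1")
a_ring = b"\xc5".decode("iso-8859-1")

greek_mu = "\u03bc"   # GREEK SMALL LETTER MU
angstrom = "\u212b"   # ANGSTROM SIGN
ohm = "\u2126"        # OHM SIGN
omega = "\u03a9"      # GREEK CAPITAL LETTER OMEGA


def same_after(form, left, right):
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, left) == unicodedata.normalize(form, right)


print(micro == greek_mu)                     # the raw code points differ
print(same_after("NFC", micro, greek_mu))    # not canonically equivalent
print(same_after("NFKC", micro, greek_mu))   # compatibility equivalent
print(same_after("NFC", angstrom, a_ring))   # canonically equivalent
print(same_after("NFC", ohm, omega))         # canonically equivalent
```

A round trip legacy -> Unicode -> legacy is therefore only lossless if the
converter always picks the same member of each pair.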

 Another gotcha
 with Unicode is that the UTF-8 encoding (which we currently use) consumes 3
 bytes for each Kanji character, while other encodings consume only 2
 bytes. IMO a 3/2 storage ratio cannot be neglected for database use.

You can encode unicode in different ways, and UTF-8 is only one
of them.  Is there a problem with using UCS-2 (except that it
would require more storage for ASCII)?
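
To put numbers on the storage question, here is a small sketch that counts
bytes for the same text under several encodings. The sample strings are
arbitrary and the codec names are Python's, not PostgreSQL encoding names.
Python has no codec called UCS-2, so utf-16-le is used as a stand-in; for
characters in the Basic Multilingual Plane the two produce the same bytes.

```python
# Codec names are Python's; they stand in for the encodings discussed here.
CODECS = ("utf-8", "euc_jp", "shift_jis", "utf-16-le")


def encoded_sizes(text):
    """Return a dict mapping each codec name to the encoded length in bytes."""
    return {codec: len(text.encode(codec)) for codec in CODECS}


kanji = "\u6f22\u5b57"   # two Kanji characters
ascii_text = "ab"        # two ASCII characters

for label, text in (("kanji", kanji), ("ascii", ascii_text)):
    print(label, encoded_sizes(text))
```

For the Kanji sample UTF-8 needs 6 bytes against 4 for the others, which is
the 3/2 ratio mentioned above; for the ASCII sample the 16-bit form needs 4
bytes against 2.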


Kurt




Re: [HACKERS] Function parameter names

2003-11-25 Thread Dennis Bjorklund
On Sun, 23 Nov 2003, Tom Lane wrote:

 Actually I'd suggest text[], as there is no good reason to pad the
 array entries to a fixed length.

I've implemented this part now and it stores the parameter names in the 
pg_proc table as a text[] field.

However, in the parser I use IDENT to get the parameter names and already
in the lexer the IDENT tokens are truncated to length NAMEDATALEN.

So I've got 3 options:

 1) Leave it as is now where the system table allows any length
but the parser only lets you insert short identifiers.

 2) Change the type to name[]

 3) Change the parser to accept identifiers of any length and add
the length check in a later phase for the identifiers that need
to be shorter.
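
To make the difference between options 1 and 3 concrete, here is a rough
sketch in Python rather than in the real lexer. Everything in it is
hypothetical: the function names, the exception class, and the value of
NAMEDATALEN, which is assumed to be 64 and depends on how the server was
built. Identifiers are measured in UTF-8 bytes for the sake of the example.

```python
NAMEDATALEN = 64  # assumed build-time value; adjust to match the server


class IdentifierTooLong(ValueError):
    """Hypothetical error for an identifier that does not fit in a name."""


def truncate_in_lexer(ident):
    """Option 1: silently clip to NAMEDATALEN - 1 bytes, as the lexer does today.

    Bytes of a multibyte character cut in half by the clip are dropped.
    """
    clipped = ident.encode("utf-8")[: NAMEDATALEN - 1]
    return clipped.decode("utf-8", errors="ignore")


def check_later(ident):
    """Option 3: accept any length in the lexer and reject too-long names later."""
    if len(ident.encode("utf-8")) > NAMEDATALEN - 1:
        raise IdentifierTooLong(ident)
    return ident


print(len(truncate_in_lexer("x" * 100)))
print(check_later("param_name"))
```

With option 1 an over-long parameter name is shortened without complaint;
with option 3 the caller decides which identifiers must obey the limit.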

Any opinions or should I just make a choice myself?

-- 
/Dennis




Re: [HACKERS] Function parameter names

2003-11-25 Thread Tom Lane
Dennis Bjorklund [EMAIL PROTECTED] writes:
 However, in the parser I use IDENT to get the parameter names and already
 in the lexer the IDENT tokens are truncated to length NAMEDATALEN.

Right.  What's the problem?

regards, tom lane



Re: [HACKERS] Commercial binary support?

2003-11-25 Thread Josh Berkus
Hans, Josh,

 
 Please take this thread OFF LIST IMMEDIATELY.
 

Sorry.  Not enough coffee this AM -- should know better than to send e-mail 
when I'm short beans.

Overreacted a bit, there.  Apologies.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Peter Eisentraut
Greg Stark writes:

 This sounds like you want to completely reimplement all of the locale handling
 provided by the OS? That seems like a dead-end approach to me. There's no way
 your handling will ever be as complete or as well optimized as some OS's.

Actually, I'm pretty sure it will be more complete.  About the
optimization, we'll have to see.

 Better to find ways to use the OS gettext and locale handling on platforms
 that provide good interfaces.

There are no such platforms to my knowledge.  The exception is some
version of glibc that provides undocumented interfaces to functionality
that is rumoured to do something that may or may not be relevant to what
we're doing.

 On platforms that don't provide good interfaces either don't support the
 features or use some third party library to provide a good
 implementation.

There are no such libraries.  I keep hearing ICU, but that is much too
bloated.

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] Function parameter names

2003-11-25 Thread Dennis Bjorklund
On Tue, 25 Nov 2003, Tom Lane wrote:

 Dennis Bjorklund [EMAIL PROTECTED] writes:
  However, in the parser I use IDENT to get the parameter names and already
  in the lexer the IDENT tokens are truncated to length NAMEDATALEN.
 
 Right.  What's the problem?

It's strange to allow identifiers to be of any length in the system table 
when there is no way to create it using normal syntax. The parser accepts 
this kind of input:

CREATE FUNCTION foo (x int) RETURNS int AS ...

and the identifier x (as all identifiers) can not be too long. Still, one 
can create the function and update the system table by hand to change x to 
a longer name. Doesn't that sound ugly to you?

It's not a technical problem, but a matter of style. Everything works as 
it is now, but "works" is not always enough.

-- 
/Dennis




Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Hannu Krosing
Peter Eisentraut wrote on Tue, 25.11.2003 at 21:13:
 Greg Stark writes:
 
  This sounds like you want to completely reimplement all of the locale handling
  provided by the OS? That seems like a dead-end approach to me. There's no way
  your handling will ever be as complete or as well optimized as some OS's.
 
 Actually, I'm pretty sure it will be more complete.  About the
 optimization, we'll have to see.
 
  Better to find ways to use the OS gettext and locale handling on platforms
  that provide good interfaces.
 
 There are no such platforms to my knowledge. 

Unless you consider ICU (http://oss.software.ibm.com/icu/) as a
platform ;)

We will hardly ever be more complete than it.

 There are no such libraries.  I keep hearing ICU, but that is much too
 bloated.

At least it is kind of standard and also something that will be
maintained for the foreseeable future. It also has a compatible license and
is available on all platforms of interest to postgresql.

And I am not sure that this bloat will affect us too much unless we
want to start maintaining a parallel copy - glibc is much more bloated
than ICU.

But if you insist on rolling your own library, you can always use ICU to
write regression tests to compare yours with ...

-
Hannu




Re: [HACKERS] Function parameter names

2003-11-25 Thread Tom Lane
Dennis Bjorklund [EMAIL PROTECTED] writes:
 and the identifier x (as all identifiers) can not be too long. Still, one 
 can create the function and update the system table by hand to change x to 
 a longer name. Doesn't that sound ugly to you?

It has always been, and likely always will be, possible to use manual
updating of the system catalogs to arrive at states that you could not
get into otherwise, and which might or might not work correctly, for
whatever value of "correctly" you think is correct.  This doesn't
particularly bother me, since we have always told people that manual
updates are unsupported and are strictly for people who know exactly
what they're doing.

If it really bugs you, possibly the column could be declared as
varchar(NAMEDATALEN-1)[] rather than text[], but I think the amount
of effort needed to make that happen within the .bki file would be well
out of proportion to the usefulness.  (Actually, it'd still not be 100%
right, since varchar(N) counts characters not bytes ...)
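
The characters-versus-bytes point can be seen with a few lines of Python.
This is only an illustration: the helper names are hypothetical,
NAMEDATALEN is assumed to be 64 (it is a build-time constant), and UTF-8 is
assumed as the database encoding.

```python
NAMEDATALEN = 64  # assumed build-time value; adjust to match the server


def fits_name(ident, encoding="utf-8"):
    """True if ident fits in NAMEDATALEN - 1 bytes, the limit a name imposes."""
    return len(ident.encode(encoding)) <= NAMEDATALEN - 1


def fits_varchar(ident, n=NAMEDATALEN - 1):
    """True if ident passes a varchar(n) length check, which counts characters."""
    return len(ident) <= n


ascii_name = "x" * 63
kanji_name = "\u6f22" * 63   # 63 characters, but 189 bytes in UTF-8

print(fits_varchar(ascii_name), fits_name(ascii_name))
print(fits_varchar(kanji_name), fits_name(kanji_name))
```

The second identifier passes the varchar-style check yet is three times too
long for the byte limit, so the two limits coincide only for single-byte
text.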

regards, tom lane



Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Tom Lane
Kurt Roeckx [EMAIL PROTECTED] writes:
 You can encode unicode in different ways, and UTF-8 is only one
 of them.  Is there a problem with using UCS-2 (except that it
 would require more storage for ASCII)?

UCS-2 is impractical without some *extremely* wide-ranging changes in
the backend.  To take just the most obvious point, doesn't it require
allowing embedded zero bytes in text strings?
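
A short illustration of the zero-byte point, as a sketch only. Python's
utf-16-le codec stands in for UCS-2 (the bytes are identical for characters
in the Basic Multilingual Plane), and c_string_view is a hypothetical helper
that mimics what code relying on a terminating zero byte would see.

```python
text = "abc"
utf8 = text.encode("utf-8")
ucs2 = text.encode("utf-16-le")   # same bytes as UCS-2 for these characters


def c_string_view(data):
    """Hypothetical helper: the bytes seen by code that stops at the first zero byte."""
    return data.split(b"\x00", 1)[0]


print(b"\x00" in utf8)        # UTF-8 of non-NUL characters has no zero bytes
print(b"\x00" in ucs2)        # every ASCII character carries one
print(c_string_view(utf8))
print(c_string_view(ucs2))
```

Any routine that treats a zero byte as end-of-string would see only the
first character of the 16-bit form.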

regards, tom lane



[HACKERS] 7.4final regression failure on uw713

2003-11-25 Thread ohp
Hi,

Don't know if it's bad, but make check reports a regression failure on
join.

Here's the regression.diffs

*** ./expected/join.out Thu Sep 25 08:58:06 2003
--- ./results/join.out  Tue Nov 25 23:46:27 2003
***************
*** 1732,1739 ****
   | 6 | 6 | six   |
   | 7 | 7 | seven |
   | 8 | 8 | eight |
-  |   |   | null  |
   |   | 0 | zero  |
  (13 rows)

  SELECT '' AS xxx, *
--- 1732,1739 ----
   | 6 | 6 | six   |
   | 7 | 7 | seven |
   | 8 | 8 | eight |
   |   | 0 | zero  |
+  |   |   | null  |
  (13 rows)

  SELECT '' AS xxx, *
***************
*** 1752,1759 ****
   | 6 | 6 | six   |
   | 7 | 7 | seven |
   | 8 | 8 | eight |
-  |   |   | null  |
   |   | 0 | zero  |
  (13 rows)

  SELECT '' AS xxx, *
--- 1752,1759 ----
   | 6 | 6 | six   |
   | 7 | 7 | seven |
   | 8 | 8 | eight |
   |   | 0 | zero  |
+  |   |   | null  |
  (13 rows)

  SELECT '' AS xxx, *
***************
*** 1793,1800 ****
  -+---+---+---+
   | 0 |   | zero  |
   | 1 | 4 | one   | -1
-  | 2 | 3 | two   |  2
   | 2 | 3 | two   |  4
   | 3 | 2 | three | -3
   | 4 | 1 | four  |
   | 5 | 0 | five  | -5
--- 1793,1800 ----
  -+---+---+---+
   | 0 |   | zero  |
   | 1 | 4 | one   | -1
   | 2 | 3 | two   |  4
+  | 2 | 3 | two   |  2
   | 3 | 2 | three | -3
   | 4 | 1 | four  |
   | 5 | 0 | five  | -5
***************
*** 1815,1822 ****
  -+---+---+---+
   | 0 |   | zero  |
   | 1 | 4 | one   | -1
-  | 2 | 3 | two   |  2
   | 2 | 3 | two   |  4
   | 3 | 2 | three | -3
   | 4 | 1 | four  |
   | 5 | 0 | five  | -5
--- 1815,1822 ----
  -+---+---+---+
   | 0 |   | zero  |
   | 1 | 4 | one   | -1
   | 2 | 3 | two   |  4
+  | 2 | 3 | two   |  2
   | 3 | 2 | three | -3
   | 4 | 1 | four  |
   | 5 | 0 | five  | -5

==


-- 
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou   +33-5-61-50-97-01 (Fax)
31190 AUTERIVE   +33-6-07-63-80-64 (GSM)
FRANCE  Email: [EMAIL PROTECTED]
--
Make your life a dream, make your dream a reality. (St Exupery)



Re: [HACKERS] PANIC: rename from /data/pg_xlog/0000002200000009

2003-11-25 Thread Tom Lane
Yurgis Baykshtis [EMAIL PROTECTED] writes:
 I just noticed that the rename panic errors like this one:

 PANIC:  rename from /data/pg_xlog/0003001F to
 /data/pg_xlog/0003002C (initialization of log file 3, segment 44)
 failed: No such file or directory

 come shortly AFTER the following messages

 LOG:  recycled transaction log file 0003001B
 LOG:  recycled transaction log file 0003001C
 LOG:  recycled transaction log file 0003001D
 LOG:  recycled transaction log file 0003001E
 LOG:  removing transaction log file 0003001F
 LOG:  removing transaction log file 00030020
 LOG:  removing transaction log file 00030021
 LOG:  removing transaction log file 00030022

 So, you can see that 0003001F file was previously deleted by the
 logic in MoveOfflineLogs() function.

Interesting ...

 Now what I can see is that MoveOfflineLogs() does not seem to be
 synchronized between backends.

It's certainly supposed to be, because the only place it is called from
holds the CheckPointLock while it's doing it.  If more than one backend
is able to run MoveOfflineLogs at a time, then the LWLock code is simply
broken.  That seems unlikely, as just about nothing would work reliably
if LWLock failed to lock out concurrent operations.

What I suspect at this point is a cygwin bug: somehow, its
implementation of readdir() is able to retrieve a stale view of a
directory.  I'd suggest pinging the cygwin developers to see if that
idea strikes a chord or not.

[ thinks for a bit... ]  It might be that it isn't even a stale-data
issue, but that readdir() misbehaves if there are concurrent insert,
rename or delete operations carried out in the same directory.  (The
renames or deletes would be coming from MoveOfflineLogs itself, the
inserts, if any, from concurrent backends finding that they need more
WAL space.)  Again I would call that a cygwin bug, as we've not seen
reports of comparable behavior anywhere else.
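
The symptom is easy to reproduce in miniature with a deliberately stale
listing. The sketch below is not how MoveOfflineLogs is written and says
nothing about cygwin itself; the function name and the file names in the
usage lines are hypothetical. It only shows that a rename driven by an
out-of-date directory listing fails with "No such file or directory", which
is the error in the PANIC message.

```python
import os
import tempfile


def recycle_segments(directory, old_names, new_names):
    """Rename each old file to its new name; return the old names that had vanished."""
    missing = []
    for old, new in zip(old_names, new_names):
        try:
            os.rename(os.path.join(directory, old), os.path.join(directory, new))
        except FileNotFoundError:
            # The listing was stale: the file was removed after it was read.
            missing.append(old)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    for name in ("seg_a", "seg_b"):
        open(os.path.join(tmp, name), "w").close()
    listing = sorted(os.listdir(tmp))       # snapshot of the directory
    os.remove(os.path.join(tmp, "seg_a"))   # file disappears after the snapshot
    print(recycle_segments(tmp, listing, ["new_a", "new_b"]))
```

Here the stale entry is merely reported; the backend instead treats the
failed rename as fatal, hence the PANIC.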

 Also, we have a suspicion that the problem happens even with only one client
 connected to postgres.

Unless the clients are issuing explicit CHECKPOINT operations, that
wouldn't matter, because MoveOfflineLogs is called only from
checkpointing, and the postmaster never creates more than one background
checkpoint process at a time.  (So there are actually two levels of
protection in place against concurrent execution of this code.)

regards, tom lane



Re: [HACKERS] 7.4final regression failure on uw713

2003-11-25 Thread Tom Lane
[EMAIL PROTECTED] writes:
 Don't know if it's bad, but make check reports a regression failure on
 join.

I believe we'd determined that this is an acceptable platform-specific
behavior.
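
In the posted diff the expected and actual output contain the same rows and
differ only in the order of adjacent rows. One way to confirm that for a
whole result set is an order-insensitive comparison; the following is a
minimal sketch with a hypothetical helper name, using two rows copied from
the diff as sample data.

```python
from collections import Counter


def same_rows_any_order(expected, actual):
    """True if both lists hold the same rows with the same multiplicity."""
    return Counter(expected) == Counter(actual)


expected = ["| 2 | 3 | two   |  2", "| 2 | 3 | two   |  4"]
actual = ["| 2 | 3 | two   |  4", "| 2 | 3 | two   |  2"]

print(expected == actual)                      # order-sensitive comparison
print(same_rows_any_order(expected, actual))   # order-insensitive comparison
```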

regards, tom lane



Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Christopher Kings-Lynne
About storing data in the database, I would expect it to work with any
encoding, just like I would expect pg to be able to store images in any
format.
What's stopping us supporting the other Unicode encodings, e.g. UTF-16,
which could save Japanese storage space?

Chris





Re: [HACKERS] Function parameter names

2003-11-25 Thread Neil Conway
Dennis Bjorklund [EMAIL PROTECTED] writes:
 It's strange to allow identifiers to be of any length in the system
 table when there is no way to create it using normal syntax.

I agree with Tom -- that doesn't seem strange to me at all.

-Neil




Re: [HACKERS] Release cycle length

2003-11-25 Thread Andrew Sullivan
On Mon, Nov 24, 2003 at 11:08:44PM -0600, Jim C. Nasby wrote:

 Has anyone looked at using replication as a migration method? If

Looked at?  Sure.  Heck, I've done it.  Yes, it works.  Is it
painless?  Well, that depends on whether you think using erserver is
painless. ;-)  It's rather less downtime than pg_dump | psql, I'll
tell you.

A

-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
[EMAIL PROTECTED]                                     M2P 2A8
                                         +1 416 646 3304 x110




Re: [HACKERS] [PERFORM] More detail on settings for pgavd?

2003-11-25 Thread Andrew Sullivan
On Fri, Nov 21, 2003 at 07:51:17PM -0500, Greg Stark wrote:
 The second vacuum waits for the lock to become available. If the
 situation got really bad there could end up being a growing queue
 of vacuums waiting.

Those of us who have run into this know that "the situation got
really bad" is earlier than one might think.  And it can indeed cause
some pretty pathological behaviour.

A


-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
[EMAIL PROTECTED]                                     M2P 2A8
                                         +1 416 646 3304 x110




Re: [HACKERS] A rough roadmap for internationalization fixes

2003-11-25 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 The only advantage to adding locales per-column and/or per-index is the
 notational simplicity.

Well, actually, the reason we are interested in doing it is the SQL spec
demands it.

regards, tom lane
