[HACKERS] array_to_json re-encodes ARRAY of json type

2012-02-20 Thread Itagaki Takahiro
If we pass an ARRAY of json type to the array_to_json() function, the
function seems to re-encode the JSON text. But shouldn't the following
examples produce the same result? I'm not sure why we don't have a special
case for the json type in datum_to_json() -- do we need to pass json values
through unchanged there?

=# \x
=# SELECT '["A"]'::json,
  array_to_json(ARRAY['A']),
  array_to_json(ARRAY['"A"'::json]);
-[ RECORD 1 ]-+----------
json          | ["A"]
array_to_json | ["A"]
array_to_json | ["\"A\""]

-- 
Itagaki Takahiro



Re: [HACKERS] dblink: enable parameters

2011-11-22 Thread Itagaki Takahiro
On Tue, Nov 22, 2011 at 20:09, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> It would be nice to add support for SQL/MED passthrough mode... That would
> allow you to do any queries/updates to foreign servers. It wouldn't sound
> very difficult at first glance, though I'm not sure what it would mean to
> our parser, for example.

I think passthrough mode is almost impossible, or of very limited use,
because we cannot pass query texts that our parser doesn't accept;
we cannot support foreign-table-specific queries.

> On 22.11.2011 06:55, Pavel Stehule wrote:
>> I know so dblink is deprecated interface - but it has necessary
>> functionality still -  it support a writable statements.

So, dblink over an FDW connection seems to be a possible solution;
we pass the query as plain text.
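
As a rough sketch (the connection name and remote table are hypothetical;
dblink_connect() should also accept the name of a foreign server in place
of the connection string):

=# SELECT dblink_connect('myconn', 'dbname=remote');
=# SELECT dblink_exec('myconn', 'UPDATE remote_tab SET x = 1 WHERE id = 2');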

-- 
Itagaki Takahiro



Re: [HACKERS] Dash in Extension Names

2011-11-10 Thread Itagaki Takahiro
On Fri, Nov 11, 2011 at 14:40, David E. Wheeler da...@kineticode.com wrote:
> one might use "-" in the name itself, but probably not "--" -- it seems that
> the dash is not actually allowed:
>
>    create extension pgtap-core;
>    ERROR:  syntax error at or near "-"
>    LINE 1: create extension pgtap-core;
>
> Parser error?

You need double-quotes around the name:
=# CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION

-- 
Itagaki Takahiro



Re: [HACKERS] Support UTF-8 files with BOM in COPY FROM

2011-09-26 Thread Itagaki Takahiro
On Mon, Sep 26, 2011 at 20:12, Magnus Hagander mag...@hagander.net wrote:
> I like it in general. But if we're looking at the BOM, shouldn't we
> also look and *reject* the file if it's a BOM for a non-UTF8 file? Say
> if the BOM claims it's UTF16?

-1, because we're depending on manual configuration for now.
It would be reasonable if we used automatic detection of
character encoding, but we don't. In addition, some crazy
encoding might use the BOM bytes as a valid character.

-- 
Itagaki Takahiro



[HACKERS] Support UTF-8 files with BOM in COPY FROM

2011-09-25 Thread Itagaki Takahiro
Hi,

I'd like to support UTF-8 text or CSV files that have a BOM (byte order
mark) in the COPY FROM command. The BOM is automatically detected and
ignored if the file encoding is UTF-8. WIP patch attached.

I'm thinking about only COPY FROM for reads, but if someone wants to add
a BOM in COPY TO, we might also support COPY TO WITH BOM for writes.
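
A sketch of the intended behavior, assuming the patch is applied (the
table and path are hypothetical); the leading BOM bytes EF BB BF are
detected and skipped because the file encoding is UTF-8:

=# CREATE TABLE t (v text);
=# COPY t FROM '/tmp/utf8_with_bom.csv' WITH (FORMAT csv, ENCODING 'utf8');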

Comments welcome.

-- 
Itagaki Takahiro


copy_from_bom.patch
Description: Binary data



Re: [HACKERS] Adding Japanese README

2011-06-29 Thread Itagaki Takahiro
On Thu, Jun 30, 2011 at 09:42, Tatsuo Ishii is...@postgresql.org wrote:
> BTW I will talk to some Japanese speaking developers about my idea if
> community agree to add Japanese README to the source tree so that I am
> not the only one who are contributing this project
>
> Now, if someone wanted to set up a web site or Wiki page with
> translations, that would be up to them.

IMHO, the Wiki approach seems more reasonable than a README file.
It is suitable for adding non-Japanese translations, and
non-core developers can join in translating or fixing the docs.

If we promote those README files to documentation, we could
move them into the internals section of the SGML doc tree. If so,
the translated READMEs would become part of the doc translation project.

-- 
Itagaki Takahiro



Re: [HACKERS] pgbench--new transaction type

2011-06-19 Thread Itagaki Takahiro
On Mon, Jun 20, 2011 at 07:30, Greg Smith g...@2ndquadrant.com wrote:
> I applied Jeff's patch but changed this to address concerns about the
> program getting stuck running for too long in the function:
>
> #define plpgsql_loops   512

Is it OK to define the value as a constant?

Also, it executes many more queries if the -t option (transactions) is
specified; of course it runs the specified number of transactions, but it
actually runs plpgsql_loops times more queries than the other modes.

> I think this is a really nice new workload to demonstrate.  One of the
> things we tell people is that code works much faster when moved server-side,

What is the most important part of the changes? The proposal includes
3 improvements, which might defocus the most valuable tuning point:

 #1 Execute multiple queries in one transaction.
 #2 Run multiple queries in the server with a stored procedure.
 #3 Return only one value instead of 512.

Anyway, I'm not sure we need to include the query mode in pgbench's code.
Instead, how about providing a sample script as a separate SQL file?
pgbench can execute any script file with the -f option.
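
For example, a hypothetical bench.sql run with "pgbench -n -f bench.sql"
could wrap the workload in a server-side function (the function name is an
assumption, not part of the patch; \setrandom is the meta-command syntax
of that era):

\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
SELECT plpgsql_query_bench(:aid);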

-- 
Itagaki Takahiro



Re: [HACKERS] possible connection leak in dblink?

2011-06-14 Thread Itagaki Takahiro
On Wed, Jun 15, 2011 at 11:41, Fujii Masao masao.fu...@gmail.com wrote:
> ISTM that the root problem is that dblink_send_query calls DBLINK_GET_CONN
> though it doesn't accept the connection string as an argument.

+1 for the fix. I came to the same conclusion at first glance.

> Since the first
> argument in dblink_send_query must be the connection name, dblink_send_query
> should call DBLINK_GET_NAMED_CONN instead. The variable 'freeconn' is used
> only when DBLINK_GET_CONN is called. So, if dblink_send_query uses
> DBLINK_GET_NAMED_CONN instead, the variable 'freeconn' is no longer necessary.
>
> The similar problem exists in dblink_get_result and dblink_record_internal.
> Attached patch fixes those problems.

-- 
Itagaki Takahiro



Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Itagaki Takahiro
On Tue, Jun 14, 2011 at 09:27, Jeff Janes jeff.ja...@gmail.com wrote:
> pgbench sends each query (per connection) and waits for the reply
> before sending another.

We can use the -j option to run pgbench in multiple threads to avoid
request starvation. What setting did you use, Stefan?

> for those curious - the profile for pgbench looks like:
> samples  %        symbol name
> 29378    41.9087  doCustom
> 17502    24.9672  threadRun
> 7629     10.8830  pg_strcasecmp

If pgbench itself is the bottleneck, it would be better to reduce the
pg_strcasecmp calls by holding meta-command names as integer values of a
sub-META_COMMAND, instead of doing a string comparison in each loop.

-- 
Itagaki Takahiro



Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Itagaki Takahiro
On Tue, Jun 14, 2011 at 13:09, Alvaro Herrera
alvhe...@commandprompt.com wrote:
> I noticed that pgbench's doCustom (the function highest in the profile
> posted) returns doing nothing if the connection is supposed to be
> sleeping; seems an open door for busy waiting.

pgbench uses select() with or without a timeout in those cases, no?

-- 
Itagaki Takahiro



Re: [HACKERS] heap vacuum cleanup locks

2011-06-05 Thread Itagaki Takahiro
On Sun, Jun 5, 2011 at 12:03, Robert Haas robertmh...@gmail.com wrote:
> If other buffer pins do exist, then we can't
> defragment the page, but that doesn't mean no useful work can be done:
> we can still mark used line pointers dead, or dead line pointers
> unused.  We cannot defragment, but that can be done either by the next
> VACUUM or by a HOT cleanup.

This is just an idea -- is it possible to use a copy-on-write technique?
VACUUM would allocate a duplicate page for the pinned page and copy the
valid tuples into the new page. Buffer readers arriving after the VACUUM
would see the cloned page instead of the old pinned one.

Of course, copy-on-write is more complex than skipping pinned pages,
but I worry that we cannot vacuum at all in some edge cases with the
skipping method.

-- 
Itagaki Takahiro



Re: [HACKERS] Lock problem with autovacuum truncating heap

2011-03-26 Thread Itagaki Takahiro
On Sun, Mar 27, 2011 at 01:12, Simon Riggs si...@2ndquadrant.com wrote:
> At the same time I would
> change count_nondeletable_pages() so that it uses a forward scan direction
> (if that leads to a speedup).

+1.

> Do we need that? Linux readahead works in both directions doesn't it?
> Guess it wouldn't hurt too much.

Yes, probably. AFAIK, RHEL 5 cannot do readahead in backward scans.
It might be improved in the latest kernels, but it would be safe
not to rely on the kernel for anything except simple forward scans.

-- 
Itagaki Takahiro



[HACKERS] KEEPONLYALNUM for pg_trgm is not documented

2011-03-11 Thread Itagaki Takahiro
contrib/pg_trgm in 9.1 becomes a more attractive feature thanks to index
support for LIKE operators, but only alphabetic and numeric characters are
indexed by default. We can, however, modify KEEPONLYALNUM in the source
code to keep all characters in the n-gram words.

However, the limitation and KEEPONLYALNUM are not documented on the page:
  http://developer.postgresql.org/pgdocs/postgres/pgtrgm.html

Are additional documentation patches acceptable? The issue would be a FAQ
for non-English users. I heard that pg_trgm will be one of the *killer
features* of 9.1 in Japan, where N-gram based text search is preferred.
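
For context, a minimal sketch of the 9.1 feature in question (the table
and column are hypothetical):

=# CREATE EXTENSION pg_trgm;
=# CREATE INDEX docs_body_trgm ON docs USING gin (body gin_trgm_ops);
=# SELECT * FROM docs WHERE body LIKE '%postgres%';

With the default KEEPONLYALNUM, words made of other characters -- such as
Japanese text -- are not indexed.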

-- 
Itagaki Takahiro



[HACKERS] Header comments in the recently added files

2011-03-09 Thread Itagaki Takahiro
I found trivial mistakes in the recently added files.
Will they be fixed by some automated batch, or manually?

- Copyright (c) xxx-*2010*, PostgreSQL Global Development Group
  in pg_collation.h, pg_foreign_table.h, basebackup.h, syncrep.h,
  pg_backup_directory.c and auth_delay.c.
- IDENTIFICATION $PostgreSQL$ in pg_collation.h, syncrep.h, and syncrep.c.
  Other files have their actual paths in the same place.

-- 
Itagaki Takahiro



Re: [HACKERS] Header comments in the recently added files

2011-03-09 Thread Itagaki Takahiro
On Thu, Mar 10, 2011 at 12:55, Robert Haas robertmh...@gmail.com wrote:
> On Wed, Mar 9, 2011 at 8:33 PM, Itagaki Takahiro
> itagaki.takah...@gmail.com wrote:
>> I found trivial mistakes in the recently added files.
>> Will they be fixed by some automated batch, or manually?
>
> I think these should be fixed manually.  The first might eventually
> get fixed automatically, but perhaps not until 2012, and I'm not sure
> the second would ever get fixed automatically.

OK, I'll do that.

-- 
Itagaki Takahiro



[HACKERS] aggregate version of first_value function?

2011-03-03 Thread Itagaki Takahiro
We have a window-function version of first_value(), but an aggregate
version looks useful for writing queries like:

=# CREATE TABLE obj (id integer, pos point);
=# SELECT X.id,
          first_value(Y.id ORDER BY X.pos <-> Y.pos) AS neighbor
     FROM obj X, obj Y
    GROUP BY X.id;

Is it reasonable? Or, do we have alternative ways for the same purpose?
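
For comparison, one possible alternative with existing features is
DISTINCT ON, assuming the same obj table as above:

=# SELECT DISTINCT ON (X.id) X.id, Y.id AS neighbor
     FROM obj X, obj Y
    ORDER BY X.id, X.pos <-> Y.pos;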

-- 
Itagaki Takahiro



Re: [HACKERS] UNLOGGED tables in psql \d

2011-02-22 Thread Itagaki Takahiro
On Tue, Feb 22, 2011 at 18:11, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
> 2011/2/22 Itagaki Takahiro itagaki.takah...@gmail.com:
>> psql \d(+) doesn't show any information about UNLOGGED and TEMP attributes
>> for the table. So, we cannot know whether the table is unlogged unless
>> we directly select from pg_class.relpersistence.  Is this a TODO item?
>
> I believe it is in the title of the table presented by \d ("Table",
> "Unlogged table", "Temp table")

Ah, I see. Thank you.  They are shown as "Unlogged Table" and
"Unlogged Index".

- We don't have "Temp" for temp tables. Should we have it?
- The second word should start with a lowercase letter;
  we describe other objects as "Composite type", "Foreign table", and so on.

-- 
Itagaki Takahiro



[HACKERS] psql tab-completion for CREATE UNLOGGED TABLE

2011-02-22 Thread Itagaki Takahiro
Here is a patch to support CREATE UNLOGGED TABLE in psql tab-completion.
It also fixes a bug where DROP was unexpectedly completed with TEMP and
UNIQUE, and cleans up the code for DROP OWNED BY in drop_command_generator().

-- 
Itagaki Takahiro


psql_tab_completion_for_unlogged-20110223.patch
Description: Binary data



Re: [HACKERS] validating foreign tables

2011-02-21 Thread Itagaki Takahiro
On Tue, Feb 22, 2011 at 10:12, Andrew Dunstan and...@dunslane.net wrote:
>>> The API for FDW validators doesn't appear to have any way that the
>>> validator function can check that the defined foreign table shape
>>> matches the FDW options sanely.
>>
>> Huh?  The options ought to be orthogonal to the table column info.
>> If they're not, maybe you need to rethink your option definitions.
>
> Well, let's take a couple of cases.
>
> 1. My old favorite, file as a text array.
> 2. A hypothetical RSS feed, where the options specify which RSS fields we
> want.

I think we need to overhaul the validators in 9.2, listening to FDW
developers' opinions anyway. The text array is one example, but there should
be many other requirements. Personally, I'd like to have a method to list
the available options from SQL. We should also consider column-level options
for foreign tables then.

-- 
Itagaki Takahiro



[HACKERS] UNLOGGED tables in psql \d

2011-02-21 Thread Itagaki Takahiro
psql \d(+) doesn't show any information about the UNLOGGED and TEMP
attributes of a table. So, we cannot know whether the table is unlogged
unless we directly select from pg_class.relpersistence.  Is this a TODO item?

The same issue exists for TEMP tables, but we can identify them by their
schema; they are always created in a pg_temp_N schema.
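
For now, the workaround looks like this sketch (the table name is
hypothetical; 'p' = permanent, 'u' = unlogged, 't' = temporary):

=# SELECT relname, relpersistence FROM pg_class WHERE relname = 'mytable';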

-- 
Itagaki Takahiro



[HACKERS] contrib/unaccent regression test failed

2011-02-20 Thread Itagaki Takahiro
I found that "make installcheck" for contrib/unaccent on master fails
on my machine; the regression.diffs is attached.

I'm not sure of the reason, because the BuildFarm machines are all OK,
but we might need some checks in addition to the database encoding.

* OS:
Linux 2.6.35.10-74.fc14.x86_64 #1 SMP Thu Dec 23 16:04:50 UTC 2010
x86_64 x86_64 x86_64 GNU/Linux
* PG DB:
        Name        |  Owner   | Encoding | Collate | Ctype | Access privileges
--------------------+----------+----------+---------+-------+-------------------
 contrib_regression | postgres | UTF8     | C       | C     |

-- 
Itagaki Takahiro


regression.diffs
Description: Binary data



Re: [HACKERS] COPY ENCODING revisited

2011-02-20 Thread Itagaki Takahiro
On Fri, Feb 18, 2011 at 20:12, Itagaki Takahiro
itagaki.takah...@gmail.com wrote:
> + extern char *pg_any_to_server(const char *s, int len, int encoding);
> + extern char *pg_server_to_any(const char *s, int len, int encoding);

I applied the version with additional code for file_fdw.

-- 
Itagaki Takahiro



Re: [HACKERS] COPY ENCODING revisited

2011-02-18 Thread Itagaki Takahiro
On Fri, Feb 18, 2011 at 03:57, Robert Haas robertmh...@gmail.com wrote:
> On Wed, Feb 16, 2011 at 10:45 PM, Itagaki Takahiro
> I am not qualified to fully review this patch because I'm not all that
> familiar with the encoding stuff, but it looks reasonably sensible on
> a quick read-through.  I am supportive of making a change in this area
> even at this late date, because it seems to me that if we're not going
> to change this then we're pretty much giving up on having a usable
> file_fdw in 9.1.  And since postgresql_fdw isn't in very good shape
> either, that would mean we may as well give up on SQL/MED.  We might
> have to do that anyway, but I don't think we should do it just because
> of this issue, if there's a reasonable fix.

One design issue is the new function names:
  extern char *pg_client_to_server(const char *s, int len);
  extern char *pg_server_to_client(const char *s, int len);
+ extern char *pg_any_to_server(const char *s, int len, int encoding);
+ extern char *pg_server_to_any(const char *s, int len, int encoding);

They don't contain any words related to encoding or conversion.

Ishii-san, do you have comments? I guess you designed this area.
Please let me know if there are better alternatives.

-- 
Itagaki Takahiro



[HACKERS] Assertion failure on UNLOGGED VIEW and SEQUENCE

2011-02-18 Thread Itagaki Takahiro
UNLOGGED is defined in OptTemp in gram.y, so the parser accepts
CREATE UNLOGGED VIEW and CREATE UNLOGGED SEQUENCE, but they are
actually not supported.

=# CREATE UNLOGGED VIEW uv AS SELECT 1;
TRAP: FailedAssertion("!(relkind == 'r' || relkind == 't')", File:
"heap.c", Line: 1246)

=# CREATE UNLOGGED SEQUENCE us;
TRAP: FailedAssertion("!(relkind == 'r' || relkind == 't')", File:
"heap.c", Line: 1246)

The easiest fix would be to prevent them at the parser level.

-- 
Itagaki Takahiro



Re: [HACKERS] COPY ENCODING revisited

2011-02-17 Thread Itagaki Takahiro
On Fri, Feb 18, 2011 at 04:04, Hitoshi Harada umi.tan...@gmail.com wrote:
> FWIW, I finally found the good example to cache miscellaneous data in
> file local, namely regexp.c. It allocates compiled regular expressions
> up to 32 by using malloc().

I'm not exactly sure about the cache usage in mbutils.c, because it has
no routine for cache invalidation at all.  Conversion procs are rarely
replaced, but we probably ought not to keep cached functions forever.

The regexp cache is OK because the algorithm cannot be modified at runtime.

> We need only 4 or so for encoding
> conversion cache, in which cache search doesn't seem like overhead.

We need to research what we should cache for conversion procs.
We will need 4 bytes per conversion pair if we cache only OIDs, but
sizeof(FmgrInfo) bytes if we use the same approach as the ToXXXConvProc cache.

-- 
Itagaki Takahiro



Re: [HACKERS] Transaction-scope advisory locks

2011-02-17 Thread Itagaki Takahiro
On Thu, Feb 17, 2011 at 17:05, Itagaki Takahiro
itagaki.takah...@gmail.com wrote:
> I did a few cosmetic fixes, mainly lmgr/README and make a subroutine
> ReleaseLockForOwner() for LockReleaseSession and LockReleaseCurrentOwner.

Committed with a few typo fixes. Thanks, Marko!

-- 
Itagaki Takahiro



Re: [HACKERS] CommitFest 2011-01 as of 2011-02-04

2011-02-17 Thread Itagaki Takahiro
On Fri, Feb 18, 2011 at 13:15, Shigeru HANADA han...@metrosystems.co.jp wrote:
> When I've used COPY TO for testing file_fdw, I got wrong result.
> It would be because DR_copy's processed is not initialized in
> CreateCopyDestReceiver().  Please see attached patch.

Oops, thanks, applied.

-- 
Itagaki Takahiro



[HACKERS] COPY ENCODING revisited

2011-02-16 Thread Itagaki Takahiro
The COPY ENCODING patch was returned with feedback,
  https://commitfest.postgresql.org/action/patch_view?id=501
but we still need it for file_fdw.  Using client_encoding at runtime
is reasonable for a one-time COPY command, but makes no logical sense
for persistent file_fdw tables.

Based on the latest patch,
  http://archives.postgresql.org/pgsql-hackers/2011-01/msg02903.php
I added pg_any_to_server() and pg_server_to_any() functions instead of
exposing FmgrInfo in pg_wchar.h.  They are the same as pg_client_to_server()
and pg_server_to_client(), but accept any encoding. They use the cached
conversion procs only if the specified encoding matches the client encoding.

According to Harada's research,
  http://archives.postgresql.org/pgsql-hackers/2011-01/msg02397.php
non-cached conversions are slower than cached ones. This version provides
the same performance as before when the file and client encodings are the
same, but would be a bit slower in other cases. We could improve the
performance in future versions, for example by caching each used conversion
proc in pg_do_encoding_conversion().

file_fdw will support an ENCODING option. Also, if it is not specified,
file_fdw might have to store the client_encoding at CREATE FOREIGN TABLE
time. Even if we use a different client_encoding at SELECT, the encoding
at definition time is used.

The ENCODING 'quoted name' issue is also fixed; it now always requires
quoted names. I think we should accept unquoted text only for identifier
names: unquoted text is treated as if double-quoted, but encoding names
are not identifiers.
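
With the patch, the usage would look like this sketch (the table and path
are hypothetical); the file is read as Latin-1 regardless of the session's
client_encoding:

=# COPY mytable FROM '/tmp/data.csv' WITH (FORMAT csv, ENCODING 'latin1');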

-- 
Itagaki Takahiro


copy_encoding-20110217.patch
Description: Binary data



Re: [HACKERS] CommitFest 2011-01 as of 2011-02-04

2011-02-15 Thread Itagaki Takahiro
On Tue, Feb 15, 2011 at 01:27, Robert Haas robertmh...@gmail.com wrote:
> However, file_fdw is in pretty serious trouble because (1) the copy
> API patch that it depends on still isn't committed and (2) it's going
> to be utterly broken if we don't do something about the
> client_encoding vs. file_encoding problem; there was a patch to do
> that in this CF, but we gave up on it.

Will we include the COPY API patch in 9.1 even if we won't have file_fdw?
Personally, I want to include the APIs, because someone could then develop
file_fdw as a third-party extension for 9.1 using the infrastructure. The
extension would lack file encoding support, but would still be useful in
many cases.

-- 
Itagaki Takahiro



Re: [HACKERS] CommitFest 2011-01 as of 2011-02-04

2011-02-15 Thread Itagaki Takahiro
On Wed, Feb 16, 2011 at 02:14, Andrew Dunstan and...@dunslane.net wrote:
>> I've been kind of wondering why you haven't already committed it.  If
>> you're confident that the code is in good shape, I don't particularly
>> see any benefit to holding off.
>
> +10. The sooner the better.

Thanks for the comments. I've applied the COPY API patch.

-- 
Itagaki Takahiro



Re: [HACKERS] SQL/MED - file_fdw

2011-02-15 Thread Itagaki Takahiro
On Tue, Feb 8, 2011 at 00:30, Shigeru HANADA han...@metrosystems.co.jp wrote:
> Fixed version is attached.

I reviewed your latest git version, which is a bit newer than the attached
patch:
http://git.postgresql.org/gitweb?p=users/hanada/postgres.git;a=commit;h=0e1a1e1b0e168cb3d8ff4d637747d0ba8f7b8d55

The code still works with small adjustments, but it needs to be rebased on
the latest master, especially for the extension support and COPY API changes.

Here is a list of comments and suggestions:

* You might have forgotten some combinations of options, or unspecified
  options, in file_fdw_validator().
  For example, the format == NULL or (!csv && header) cases. I've not tested
  all cases, but please recheck the validations used in BeginCopy().

* estimate_costs() needs its own row estimation rather than using baserel.
>> What value does baserel->tuples have?
>> Foreign tables are never analyzed for now. Is the number correct?
> No, baserel->tuples is always 0, and baserel->pages is 0 too.
> In addition, width and rows are set to 100 and 1, as default values.

It means baserel is not reliable at all, right? If so, we need an alternative
solution in estimate_costs(). We adjust statistics with the runtime relation
size in estimate_rel_size(). Also, we use get_rel_data_width() for tables
that have not been analyzed. We could use a similar technique in file_fdw,
too.

* Should use int64 for the file byte size (or uint32, in blocks).
  unsigned long might be 32-bit; ulong is not portable.

* An Oid List (list_make1_oid) might be handier than Const to hold the relid
  in FdwPlan.fdw_private.

* I prefer FileFdwExecutionState to FileFdwPrivate, because there are
  two different 'fdw_private' variables, in FdwPlan and FdwExecutionState.

* The comment in fileIterate seems wrong. It should be
  /* Store the next tuple as a virtual tuple. */, right?

* #include <sys/stat.h> is needed.

-- 
Itagaki Takahiro



Re: [HACKERS] Add support for logging the current role

2011-02-14 Thread Itagaki Takahiro
On Mon, Feb 14, 2011 at 11:51, Stephen Frost sfr...@snowman.net wrote:
>> It would be awfully nice if we could inject something into the csvlog
>> output format that would let client programs know which fields are
>> present.  This would be especially useful for people writing tools
>> that are intended to work on ANY PG installation, rather than just,
>> say, their own.  I'm not sure if there's a clean way to do that,
>> though.
>
> Added csvlog_header GUC to allow the user to ask for the header to be
> printed at the top of each log file.  If and when an option is added to
> PG's COPY to respect the header, this should resolve that issue.

We need to design csvlog_header more carefully. csvlog_header won't work
if log_filename is low-resolution, e.g. one log per day.  It's still useful
when a DBA reads the file manually, but the documentation should say so.
Or, should we skip writing headers when the open log file is not empty?
That works unless csvlog_fields is modified before a restart, and also on
a logger crash and restart, though those are rare cases.

A few comments and trivial issues:

* It might be my misunderstanding, but there was a short description of %U
for log_line_prefix in postgresql.conf, right? Did you remove it in the
latest version?

* The long csvlog_fields default is a problem in postgresql.conf too, but
we have no solution for that issue... The current code is the best for now.

* In assign_csvlog_fields(), we need to clean up memory and the memory
context before returning on error.
+   /* check if no option matched, and if so, return error */
+   if (curr_option == MAX_CSVLOG_OPTS)
+   return NULL;

* An added needless return should be removed.
*** 2139,2144  write_csvlog(ErrorData *edata)
--- 2379,2386 
write_pipe_chunks(buf.data, buf.len, LOG_DESTINATION_CSVLOG);

pfree(buf.data);
+
+   return;
  }

  /*

-- 
Itagaki Takahiro



Re: [HACKERS] SQL/MED - file_fdw

2011-02-14 Thread Itagaki Takahiro
On Mon, Feb 14, 2011 at 22:06, Noah Misch n...@leadboat.com wrote:
> On Sun, Feb 13, 2011 at 12:41:11PM -0600, Kevin Grittner wrote:
>> Unpatched:
>> real    17m24.171s
>> real    16m52.892s
>> real    16m40.624s
>> real    16m41.700s
>>
>> Patched:
>> real    15m56.249s
>> real    15m47.001s
>> real    15m3.018s
>> real    17m16.157s
>>
>> Since you said that a cursory test, or no test at all, should be
>> good enough given the low risk of performance regression, I didn't
>> book a machine and script a large test run, but if anyone feels
>> that's justified, I can arrange something.
>
> Based on this, I've taken the liberty of marking the patch Ready for
> Committer.

Thank you very much for the performance testing and reviewing!

The result is interesting, because I didn't intend any performance
optimization. At least "no performance regression" is good enough for
the purpose.

-- 
Itagaki Takahiro



Re: [HACKERS] Add support for logging the current role

2011-02-14 Thread Itagaki Takahiro
On Mon, Feb 14, 2011 at 23:30, Stephen Frost sfr...@snowman.net wrote:
>> * In assign_csvlog_fields(), we need to clean up memory and the memory
>>   context before returning on error.
> Fixed this and a couple of similar issues.

Not yet fixed: the switched memory context is not restored on error.

> Updated patch attached, git log below.

Now I mark the patch as Ready for Committer,
because I don't have any more suggestions.

For reference, I note my previous questions. Some of them might be TODO
items, or might not. We can add the basic feature in 9.1 and improve it
in 9.2 or later versions.

* csvlog_fields and csvlog_header won't work with a non-default log_filename
  when it doesn't contain seconds in the format. They expect to always be
  able to open empty log files.

* The long default value for csvlog_fields leads to a long text line in
  postgresql.conf, SHOW ALL, and the pg_settings view, but there were no
  better alternative solutions in the past discussion.

* csvlog_fields is marked as PGC_POSTMASTER. That can protect against mixed
  formats in a csv file with the default log_filename, but other similar GUC
  variables are usually marked as PGC_SIGHUP.

-- 
Itagaki Takahiro



Re: [HACKERS] multiset patch review

2011-02-11 Thread Itagaki Takahiro
On Sat, Feb 12, 2011 at 00:50, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> Right, but making the parser slower has a cost, too.
>> ScanKeywordLookup() is already a hotspot in some workloads, and
>> there's overhead buried in the bison parser, too.
>
> Yeah.  Keep in mind that a bison parser fundamentally runs off a
> two-dimensional array: one axis is parser state and the other is token
> number.  They have some tricks to compress the array a bit, but adding
> keywords contributes directly to a bigger array, which means slower
> parsing (more L1 cache misses).  The parser's inner loop frequently
> shows up as a hotspot in profiles I do, and I think that has to be more
> about the amount of data it's touching than the cost of the loop per se.

Did you measure the actual cost in the real world? If the parser is that
sensitive, it would be a problem even without the patch.

> Adding unnecessary keywords is something to be avoided.

So our conclusion is that we will never support the multiset syntax in the
SQL standard, right?  If we think the keywords are unnecessary, we cannot
support it.

I will remove the parser changes from the patch; it will add only a few
array functions. Then, please let me know which functions you don't want
to include in the core, if any. I'll remove them at the same time.

-- 
Itagaki Takahiro



Re: [HACKERS] SQL/MED - file_fdw

2011-02-11 Thread Itagaki Takahiro
On Sat, Feb 12, 2011 at 01:12, Robert Haas robertmh...@gmail.com wrote:
> On Wed, Feb 9, 2011 at 2:03 AM, Noah Misch n...@leadboat.com wrote:
>> From a functional and code structure perspective, I find this ready to
>> commit. (I assume you'll drop the XXX: indent only comments on commit.)
>> Kevin, did you want to do that performance testing you spoke of?
>
> OK, so is this Ready for Committer, or we're still working on it?

Basically, we have no more tasks until the FDW core API is applied.
The COPY API and file_fdw patches are waiting for it.

If we extend them a little more, I'd raise two items:
* Should we print foreign table names in error messages?
  http://archives.postgresql.org/pgsql-hackers/2011-02/msg00427.php
* The COPY encoding patch was rejected, but using client_encoding is
  logically wrong for file_fdw. We might need a subset of the patch
  for file_fdw.

-- 
Itagaki Takahiro



Re: [HACKERS] multiset patch review

2011-02-11 Thread Itagaki Takahiro
On Sat, Feb 12, 2011 at 05:01, Stephen Frost sfr...@snowman.net wrote:
>> Input arrays are always flattened into one-dimensional arrays.
> That just strikes me as completely broken when it comes to PG Arrays.

The contains operators (@>, &&, <@) ignore multi-dimensionality.
The array slice operator ([lo:hi]) always resets the index offsets.

> What does the spec say about this, if anything?  Is that required by
> spec, or is the spec not relevant to this because MULTISETs are only one
> dimensional..?

Yes. The SQL standard has only one-dimensional ARRAYs and MULTISETs,
though it supports collections of collections, which we don't have.

> I would think that we would have a way to define what dimension or piece
> of the array that would be sorted or returned as a set, etc.

It would be consistent if (ARRAY[][])[] returned ARRAY[],
but we actually throw errors.

> In my view, we should be throwing an error if we get called on a
> multi-dim array and we can't perform the operation on that in an
> obviously correct way, not forcing the input to match something we can
> make work.

Agreed, I'll do so. We could also keep the lower bounds of arrays,
at least on sort.

> I'm not thrilled with called ArrayGetNItems array_cardinality().  Why
> not just provide a function with a name like array_getnitems()
> instead?

We must use that name, because it is in the SQL standard:
  FUNCTION cardinality(collection) = number
We would have two overloaded versions, for ARRAYs and MULTISETs.
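
For example, the array version would behave like this sketch, assuming
the patch is applied:

=# SELECT cardinality(ARRAY[1, 2, 3]);  -- returns 3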

> trim_array() suffers from two problems: lack of comments, and no spell
> checking done on those that are there.

What comments do you want?

> Should get_type_cache() really live in array_userfuncs.c?

Do you see code doing the same operation in other files?

> There's more, primarily lack of comments and what I consider poor
> function naming (sort_or_unique() ?  really?),

Could you suggest better names?

-- 
Itagaki Takahiro



Re: [HACKERS] Transaction-scope advisory locks

2011-02-10 Thread Itagaki Takahiro
On Thu, Feb 10, 2011 at 08:36, Marko Tiikkaja
marko.tiikk...@cs.helsinki.fi wrote:
>> One issue might be in pg_locks
> Robert suggested not doing this for 9.1, and I don't have anything against
> that.

Agreed.

> Updated patch attached.

Looks good to commit. I note a few minor issues for the committer:

* The functions listed in Table 9-62 "Advisory Lock Functions" might need
  to be sorted in alphabetical order.

* We could extend LockReleaseAll() to have a third mode
  instead of LockReleaseSession().  The existing behavior is:
| LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
|   allLocks == true: release all locks including session locks.
|   allLocks == false: release all non-session locks.

* Or, we might have one subroutine for LockReleaseSession() and
  LockReleaseCurrentOwner(). They have similar code.
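
For reference, a minimal usage sketch of the new transaction-scope variant
(the key 42 is arbitrary):

=# BEGIN;
=# SELECT pg_advisory_xact_lock(42);
-- ... critical section ...
=# COMMIT;  -- the lock is released here; there is no explicit unlock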

-- 
Itagaki Takahiro



Re: [HACKERS] Add support for logging the current role

2011-02-10 Thread Itagaki Takahiro
On Mon, Feb 7, 2011 at 04:10, Stephen Frost sfr...@snowman.net wrote:
> Yeah, doesn't seem to work for me (missing '/bin/collateindex.pl',
> apparently..).

You might need "yum install openjade stylesheets" or similar packages,
and then re-configure.

> Ok, I've cleaned up that part of the documentation to be a table instead
> of the listings that were there, seems like a better approach anyway.

Yeah, that's a good job!

>> I agree that it's logically good design, but we could not accept it
>> as long as it breaks tools in the real world...
> If it does, I think it's pretty clear that those tools are themselves
> broken..

The word "break" was my wrong choice, but your new parameter still
requires very wide monitors to display SHOW ALL and pg_settings.
I'd like to solve that issue even though the feature itself is useful.
One fast and snappy solution might be to set the default value to
"default", meaning the compatible set of columns.
Any other, better ideas?

For the implementation, write_csvlog() has many lines like the following:
  if (curr_field != num_fields) appendStringInfoChar(buf, ',');
It would be cleaner if we added a first_col flag and moved this out of
the switch statement.

The other questions I raised before might be matters of preference.
I'd like to hear about them from a third person:
 * name: log_csv_fields vs. csvlog_fields
 * when to assign: PGC_POSTMASTER vs. PGC_SIGHUP

-- 
Itagaki Takahiro



Re: [HACKERS] psql patch: tab-complete :variables also at buffer start

2011-02-10 Thread Itagaki Takahiro
On Thu, Feb 10, 2011 at 19:37, Christoph Berg c...@df7cb.de wrote:
> Currently, tab-completing :variable names in psql does not work at the
> beginning of the line. Fix this by moving the code block before the
> empty buffer case.

Seems reasonable to me.

-- 
Itagaki Takahiro



Re: [HACKERS] Transaction-scope advisory locks

2011-02-09 Thread Itagaki Takahiro
On Thu, Feb 3, 2011 at 00:24, Marko Tiikkaja
marko.tiikk...@cs.helsinki.fi wrote:
> .. and here's the patch.  I'm not too confident with the code I added to
> storage/lmgr/lock.c, but it seems to be working.

Sorry for the delayed review.

The patch needs an adjustment of OIDs for recent commits, but it still
works well. See the attached small fix.  The patch looks almost ready to
commit, unless we want to fix the pg_locks issue below.

=== Features ===
Now the unlock functions only release session-level locks, and the behavior
is documented, so there is no confusion here. We don't actually have an
upgrade method for advisory locks -- session and xact locks block each
other, but they are acquired and released independently.

One issue might be in pg_locks, as you pointed out in the previous mail:
> if a session holds both a transaction level and a session level lock
> on the same resource, only one of them will appear in pg_locks.
Also, we cannot distinguish transaction-level locks from session-level
locks in pg_locks.

It was not an issue before, because session locks were only used in the
internal implementation; to users they looked like part of the transaction.
However, this feature exposes the status in public. We might need to add
some bits to the shared lock state to show which locks are session-level.
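
A sketch of how the ambiguity looks from SQL -- both flavors appear with
the same locktype and cannot be told apart from this view alone:

=# SELECT locktype, classid, objid, pid, granted
     FROM pg_locks WHERE locktype = 'advisory';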

=== Implementation ===
* pg_advisory_unlock_all() calls LockReleaseSession(), and it releases
not only advisory locks but all session-level locks.
We use session-level locks in some places, but there is no chance for
users to send SQL commands while those locks are held. The behavior is
safe as of now, but it might break something in the future.
So I'd recommend keeping the locktype checks in it.

* user_lockmethod.transactional was changed to 'true', so we don't have
any differences between it and default_lockmethod except trace_flag.
LockMethodData is now almost useless, but we could keep it for
compatibility.

> Earlier there was some discussion about adding regression tests for advisory
> locks.  However, I don't see where they would fit in our current .sql files
> and adding a new one just for a few tests didn't seem right.  Anyone have an
> idea where they should go or should I just add a new one?

I think you can add advisory_lock.sql for the test.

-- 
Itagaki Takahiro


advisory4fix.patch
Description: Binary data



Re: [HACKERS] another mvcc.sgml typo

2011-02-09 Thread Itagaki Takahiro
On Thu, Feb 10, 2011 at 09:30, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
> Trivial patch attached.

Applied. Thanks!

-- 
Itagaki Takahiro



Re: [HACKERS] exposing COPY API

2011-02-08 Thread Itagaki Takahiro
On Tue, Feb 8, 2011 at 09:38, Andrew Dunstan and...@dunslane.net wrote:
> Here is a patch against the latest revision of file_fdw to exercise this
> API. It includes some regression tests, and I think apart from one or two
> small details plus a requirement for documentation, is complete.

The patch contains a few fixes for typos in the original patch.
Hanada-san, could you take them into the core file_fdw patch?

   CREATE FOREIGN TABLE jagged_text (
        t   text[]
   ) SERVER file_server
   OPTIONS (format 'csv', filename
   '/home/andrew/pgl/pg_head/contrib/file_fdw/data/jagged.csv', header
   'true', textarray 'true');

There might be another approach -- we could have a jagged_file_fdw alongside
file_fdw, because we cannot support some features in textarray mode,
like the force_not_null option and multiple non-text[] columns.

I'll include NextCopyFromRawFields() in the COPY API patch to complete
the raw_fields support in CopyState. (Otherwise, we should revert the
changes related to raw_fields.)  However, we'd better postpone jagged CSV
support to 9.2. The design is still under discussion.

-- 
Itagaki Takahiro



Re: [HACKERS] [GENERAL] Issues with generate_series using integer boundaries

2011-02-08 Thread Itagaki Takahiro
On Mon, Feb 7, 2011 at 20:38, Thom Brown t...@linux.com wrote:
> Yes, of course, int8 functions are separate.  I attach an updated
> patch, although I still think there's a better way of doing this.

Thanks. Please add the patch to the *current* commitfest,
because it's a bugfix:
https://commitfest.postgresql.org/action/commitfest_view?id=9

I've not tested the patch yet, but if we could drop the following
line in the patch, the code could be much cleaner:
  /* ensure first value in series should exist */

> I'm not sure how this should be handled.  Should there just be a check
> for either kind of infinity and return an error if that's the case?  I

Maybe so. It would also work if we had infinity on timestamp overflow,
but I've not tested that yet.  Anyway, we need a similar fix for the
timestamp versions.

-- 
Itagaki Takahiro



Re: [HACKERS] [GENERAL] Issues with generate_series using integer boundaries

2011-02-08 Thread Itagaki Takahiro
On Wed, Feb 9, 2011 at 10:17, Andrew Dunstan and...@dunslane.net wrote:
> Isn't this all really a bug fix that should be backpatched, rather than a
> commitfest item?

Sure, but we don't have any bug trackers...

-- 
Itagaki Takahiro



Re: [HACKERS] pg_dump directory archive format / parallel pg_dump

2011-02-08 Thread Itagaki Takahiro
On Tue, Feb 8, 2011 at 13:34, Robert Haas robertmh...@gmail.com wrote:
> So how close are we to having a committable version of this?  Should
> we push this out to 9.2?

I think so. The feature is pretty attractive, but more work is required:
 * Rebase on the synchronized snapshots patch.
 * Consider using a pipe on Windows as well.
 * Research the libpq + fork() issue. We have a warning in the docs:
http://developer.postgresql.org/pgdocs/postgres/libpq-connect.html
| On Unix, forking a process with open libpq connections can lead to
| unpredictable results

-- 
Itagaki Takahiro



Re: [HACKERS] Error attribution in foreign scans

2011-02-08 Thread Itagaki Takahiro
On Mon, Feb 7, 2011 at 22:47, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> On Mon, Feb 7, 2011 at 21:17, Noah Misch n...@leadboat.com wrote:
>> The message does not show which foreign table yielded the error.  We could
>> evade the problem in this case by adding a file name to the error message
>> in the COPY code,
>
> Yeah, an error context callback like that makes sense. In the case of the
> file FDW, though, just including the filename in the error message seems
> even better. Especially if the error is directly related to failure in
> reading the file.

What do you think about filenames in terms of security? We will allow
non-superusers to use existing file_fdw foreign tables.
For reference, we hide some path settings in GUC variables.

We should also reconsider the privileges of fdwoptions, umoptions, etc.
They could contain passwords or server-side paths, but all users can
retrieve the values. It's an existing issue, but it will be more serious
in 9.1.

-- 
Itagaki Takahiro



Re: [HACKERS] Range Type constructors

2011-02-08 Thread Itagaki Takahiro
On Wed, Feb 9, 2011 at 14:50, Jeff Davis pg...@j-davis.com wrote:
> 1.
> The obvious constructor would be:
>   range(1, 10)
> But is that [1, 10), (1, 10], (1, 10), or [1, 10]? We need to support
> all 4, and it's not obvious how to do that easily.

There is the same issue in table partitioning, and we might use this
syntax for our partitioning in the future.  Just for reference,
DB2 uses EXCLUSIVE and INCLUSIVE keywords to specify boundaries:

  CREATE TABLE ... PARTITION BY RANGE (...)
    (STARTING 0 EXCLUSIVE ENDING 100 INCLUSIVE)

http://publib.boulder.ibm.com/infocenter/db2luw/v9r8/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r927.html

I'm not sure it is the best syntax, but at least it's easy for beginners
to read, and it works with parenthesis completion in text editors.

-- 
Itagaki Takahiro



Re: [HACKERS] [GENERAL] Issues with generate_series using integer boundaries

2011-02-07 Thread Itagaki Takahiro
On Fri, Feb 4, 2011 at 21:32, Thom Brown t...@linux.com wrote:
> The issue is that generate_series will not return if the series hits
> either the upper or lower boundary during increment, or goes beyond
> it.  The attached patch fixes this behaviour, but should probably be
> done a better way.  The first 3 examples above will not return.

The same bug exists in the int8 and timestamp[tz] versions;
we need a fix for them as well:
=# SELECT x FROM generate_series(9223372036854775807::int8,
9223372036854775807::int8) AS a(x);
=# SELECT x FROM generate_series('infinity'::timestamp, 'infinity', '1
sec') AS a(x);
=# SELECT x FROM generate_series('infinity'::timestamptz, 'infinity',
'1 sec') AS a(x);

> postgres=# SELECT x FROM generate_series(1, 9,-1) AS a(x);
> postgres=# SELECT x FROM generate_series(1, 9,3) AS a(x);
These work as expected in 9.1dev.

-- 
Itagaki Takahiro



Re: [HACKERS] SQL/MED - file_fdw

2011-02-07 Thread Itagaki Takahiro
On Mon, Feb 7, 2011 at 16:01, Shigeru HANADA han...@metrosystems.co.jp wrote:
> This patch is based on latest FDW API patches which are posted in
> another thread SQL/MED FDW API, and copy_export-20110104.patch which
> was posted by Itagaki-san.

I have questions about estimate_costs().

* What value does baserel->tuples have?
  Foreign tables are never analyzed for now. Is the number correct?

* Your previous measurement showed a much higher startup_cost.
  When you removed ReScan, it took a long time, but the planner didn't
  choose materialized plans. That might come from the lower startup costs.

* Why do you use lstat() here?
  Even if the file is a symlink, we will read the linked file in the
  succeeding copy. So, I think it should be stat() rather than lstat().

+estimate_costs(const char *filename, RelOptInfo *baserel,
+  double *startup_cost, double *total_cost)
+{
...
+   /* get size of the file */
+   if (lstat(filename, &stat) == -1)
+   {
+       ereport(ERROR,
+               (errcode_for_file_access(),
+                errmsg("could not stat file \"%s\": %m", filename)));
+   }
+
+   /*
+    * The way to estimate costs is almost the same as cost_seqscan(), but
+    * there are some differences:
+    * - DISK costs are estimated from the file size.
+    * - CPU costs are 10x those of a seq scan, for the overhead of parsing
+    *   records.
+    */
+   pages = stat.st_size / BLCKSZ + (stat.st_size % BLCKSZ > 0 ? 1 : 0);
+   run_cost += seq_page_cost * pages;
+
+   *startup_cost += baserel->baserestrictcost.startup;
+   cpu_per_tuple = cpu_tuple_cost + baserel->baserestrictcost.per_tuple;
+   run_cost += cpu_per_tuple * 10 * baserel->tuples;
+   *total_cost = *startup_cost + run_cost;
+
+   return stat.st_size;
+}

-- 
Itagaki Takahiro



Re: [HACKERS] Add support for logging the current role

2011-02-06 Thread Itagaki Takahiro
On Sun, Feb 6, 2011 at 23:31, Stephen Frost sfr...@snowman.net wrote:
> * Itagaki Takahiro (itagaki.takah...@gmail.com) wrote:
>> I think we need to improve postgresql.conf.sample a bit more, especially
>> the long line for #log_csv_fields = '...'. 330 characters in it!
>>   #1. Leave the long line because it is needed.
> It's needed to match what the current default is..

I agree that it's logically good design, but we could not accept it
as long as it breaks tools in the real world...
Will it break SHOW ALL and SELECT * FROM pg_settings output?
I'm worried that they are not designed to display such a long value.

>> I think it depends on the default log filename, which contains a %S
>> (seconds) suffix. We can remove %S from log_filename; if we use one log
>> per day, those logs might contain different columns even after a restart.
>> If we cannot avoid jagged csv fields completely, SIGHUP seems reasonable
>> for it.
>
> This problem is bigger than SIGHUP, but at least with a restart
> required, the default will work properly.  The default configuration
> wouldn't work w/ a change to the log line and a SIGHUP done.

"Only works with the default settings" is just wrong design.
If we cannot provide a perfect solution, we should allow users to
control everything as they like. I still think PGC_SIGHUP is the
best mode for the parameter, with a note of caution in the docs.

-- 
Itagaki Takahiro



Re: [HACKERS] exposing COPY API

2011-02-04 Thread Itagaki Takahiro
Here is a demonstration of supporting jagged input files. It's a patch
on top of the latest patch. The newly added API is:

  bool NextLineCopyFrom(
    [IN] CopyState cstate,
    [OUT] char ***fields, [OUT] int *nfields, [OUT] Oid *tupleOid)

It just returns the separated fields of the next line. Fortunately, I
needed no extra code for it, because it is simply extracted from
NextCopyFrom().

I'm willing to include the change in the COPY APIs,
but we still have a few issues. See below.

On Fri, Feb 4, 2011 at 16:53, Andrew Dunstan and...@dunslane.net wrote:
> The problem with COPY FROM is that nobody's come up with a good syntax for
> allowing it as a FROM target. Doing what I want via FDW neatly gets us
> around that problem. But I'm quite OK with doing the hard work inside the
> COPY code - that's what my working prototype does in fact.

I think it is not only a syntax issue. I found that it is hard to support
the FORCE_NOT_NULL option for extra fields. See the FIXME in the patch.
It is a fundamental problem with supporting jagged fields.

> One thing I'd like is to have file_fdw do something we can't do another
> way. Currently it doesn't, so it's nice but uninteresting.

BTW, how do you determine which field is shifted in your broken CSV file?
For example, the case where you find AB,CD,EF for a 2-column table.
I could provide a raw CSV reader for jagged files, but you would still
have to cook the returned fields into a proper tuple...

-- 
Itagaki Takahiro


jagged_csv_api-20110204.patch
Description: Binary data



Re: [HACKERS] multiset patch review

2011-02-04 Thread Itagaki Takahiro
On Sat, Feb 5, 2011 at 02:29, Robert Haas robertmh...@gmail.com wrote:
> I am still not in favor of adding this syntax.

I also don't like the syntax, but unfortunately, almost all of it is
in the SQL standard :-(.  In addition, Oracle already provides the
same feature with this special syntax, though true multiset data
types are supported there.

> [It sure is awkward that trim_array has the word
> array second and all of our other array functions have it first - is
> this required by the spec?]

Yes, again, but we could have array_trim() as an alias.

> array_to_set() and array_is_set() seem possibly useful, but I probably
> would have called them array_remove_dups() and array_has_dups().  I
> might be in the minority on that one, though.

array_to_set is the only function we can rename freely.
Another candidate might be array_unique (contrib/intarray uses uniq).

array_is_set() is the internal implementation of the IS A SET operator,
so its name is not so important (and not documented).

 I think array_subset(), array_union(), array_intersect(), and
 array_except() are useful, but I think they should just be regular
 functions, without any special syntax.

All of the special syntax is in the spec, except that the argument
types should be multisets rather than arrays.

 fusion() and intersection() also seem useful, but maybe it would be
 more consistent to all them array_fusion() and array_intersection().

Both of the names are in the standard...  We could have array_fusion()
for array types and fusion() for multiset types, but I prefer an
overloaded fusion() to having both names.
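As a sketch of what the overloaded aggregate would do over arrays,
assuming the standard's bag-union semantics for FUSION:

  SELECT fusion(a) FROM (VALUES (ARRAY[1,2]), (ARRAY[2,3])) v(a);
  -- would yield the multiset union {1,2,2,3}, represented as an array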

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multiset patch review

2011-02-04 Thread Itagaki Takahiro
On Sat, Feb 5, 2011 at 03:04, Robert Haas robertmh...@gmail.com wrote:
 I am still not in favor of adding this syntax.

 I also don't like the syntax, but unfortunately, almost all of
 them are in the SQL standard :-(.

 The standard specifies this syntax for arrays, or for multisets?

Multisets. But I chose the same function name and syntax because
arrays *are* multisets by definition.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multiset patch review

2011-02-04 Thread Itagaki Takahiro
On Sat, Feb 5, 2011 at 04:24, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 In math class, maybe.  But in programming, no.  Multiset is a
 datatype.  Array is a different datatype.  There is no reason why we
 need to clutter our parser with extra keywords to support a
 non-standard feature extension.

 My understanding is that we will have to have those functions defined
 and user visible, and that we benefit from function overloading which is
 not in the standard.  So there's no reason not to provide those function
 for arrays already, then extend to full multiset support.

 Given PostgreSQL overloading, yes, arrays are multisets as far as
 defining those standard compliant APIs is concerned.  AFAIUI.

Yes, I'd like to use overloading.
Choosing arbitrary names increases learning costs for users.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-03 Thread Itagaki Takahiro
On Fri, Feb 4, 2011 at 09:48, Andrew Dunstan and...@dunslane.net wrote:
 I'd like to be able to add a callback function to construct the values for
 the tuple. So it would become something like:
   typedef void (*copy_make_values) (CopyState cstate, NumFieldsRead int);

You probably cannot do anything interesting in the callback
because the details of CopyState are not exported yet.
Also, we should pass a user context through for that kind of callback;
the prototype would need a void *userdata argument.

 Of course, I want this so I could construct a text array from the read in
 data, but I could also imagine a foreign data wrapper wanting to mangle the
 data before handing it to postgres, say by filling in a field or hashing it.

Could you explain the actual use cases with examples?  I think we need
SQL-level extensibility if we provide such flexibility. I guess typical
users don't want to write C functions for each kind of input file.

Note that pg_bulkload has a similar feature like as:
  CREATE FUNCTION my_function(...) RETURNS record AS ...;
  COPY tbl FROM 'file' WITH (make_record_from_line = my_function)

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-03 Thread Itagaki Takahiro
On Fri, Feb 4, 2011 at 11:32, Andrew Dunstan and...@dunslane.net wrote:
 Umm, where? I can't find this in the documentation
 http://pgbulkload.projects.postgresql.org/pg_bulkload.html

Here:
http://pgbulkload.projects.postgresql.org/pg_bulkload.html#filter

 The object, as I have explained previously, is to have a FDW that returns a
 text array from a (possibly irregularly shaped) file.

I remember the text array proposal, but if the extension is written in C,
it can only handle one kind of input file. If another file is broken
in a different way, you need to rewrite the C code, no?

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing COPY API

2011-02-03 Thread Itagaki Takahiro
On Fri, Feb 4, 2011 at 12:17, Andrew Dunstan and...@dunslane.net wrote:
 http://pgbulkload.projects.postgresql.org/pg_bulkload.html#filter
 AFAICT, this doesn't support ragged tables with too many columns, which is
 my prime use case. If it supported variadic arguments in filter functions it
 might come closer.

It would be a good improvement for pg_bulkload, but it's off-topic ;-)

 Also, a FDW allows the COPY to be used as a FROM target, giving it great
 flexibility. AFAICT this does not.

BTW, which do you want to improve, FDW or COPY FROM?  If FDW, the better
API would be a raw version of NextCopyFrom(). For example:
  bool NextRawFields(CopyState cstate, char **fields, int *nfields)
The calling FDW has the responsibility to form tuples from the raw fields.
If you need to customize how tuples are formed from varying fields,
the FDW also needs to have such extensibility.

If we improve COPY FROM, we should implement all the features in copy.c
rather than in exported APIs.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump directory archive format / parallel pg_dump

2011-02-03 Thread Itagaki Takahiro
On Wed, Feb 2, 2011 at 13:32, Joachim Wieland j...@mcknight.de wrote:
 Here is a rebased version with some minor changes as well.

As I read it, the patch works as below. Am I understanding correctly?
  1. Open all connections in a parent process.
  2. Start transactions for each connection in the parent.
  3. Spawn child processes with fork().
  4. Each child process uses one of the inherited connections.

I think we have 2 important technical issues here:
 * The consistency is not perfect. Each transaction is started
   with a small delay in step 2, and we cannot guarantee that no other
   transaction starts between them.
 * Can we inherit connections to child processes with fork()?
   Moreover, we also need to pass the running transactions to the children.
   I wonder whether libpq is designed for such usage.

To solve both issues, we might want a way to control visibility
in a database server instead of client programs. Don't we need
server-side support like [1] before developing parallel dump?
 [1] 
http://wiki.postgresql.org/wiki/ClusterFeatures#Export_snapshots_to_other_sessions

 I haven't
 tested it on Windows now but will do so as soon as the Unix part has
 been reviewed.

It might be better to remove the Windows-specific code from the first try.
I doubt the Windows message queue is the best API for such a console-based
application. I hope we can use the same implementation on all
platforms for inter-process/thread communication.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add ENCODING option to COPY

2011-02-03 Thread Itagaki Takahiro
On Tue, Feb 1, 2011 at 13:08, Hitoshi Harada umi.tan...@gmail.com wrote:
 The third patch is attached, modifying mb routines so that they can
 receive conversion procedures as FmgrInof * and save the function
 pointer in CopyState.
 I tested it with encoding option and could not see performance slowdown.
 Hmm, sorry, the patch was wrong. Correct version is attached.

Here is a brief review for the patch.

* Syntax: ENCODING encoding vs. ENCODING 'encoding'
We don't have to quote encoding names with the patch, but we always need
quotes for CREATE DATABASE WITH ENCODING. I think we should modify
CREATE DATABASE to accept unquoted encoding names, aside from this patch.
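For example, with the patch both spellings would work for COPY, while
CREATE DATABASE still requires the quoted form, something like:

  COPY tbl FROM '/path/to/data.csv' WITH (FORMAT csv, ENCODING latin1);
  COPY tbl FROM '/path/to/data.csv' WITH (FORMAT csv, ENCODING 'latin1');
  CREATE DATABASE db WITH ENCODING 'UTF8';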


Changes in pg_wchar.h are the most debatable part of the patch.
The patch adds pg_cached_encoding_conversion(). We normally use
pg_do_encoding_conversion(), but it is slower than the proposed
function because it looks up the conversion proc from the catalog on
every call.

* Can we use #ifndef FRONTEND in the header?
Usage of fmgr.h members will break client applications without the #ifdef,
but I guess client apps don't always have definitions of FRONTEND.
If we don't want to change pg_wchar.h, pg_conversion_fn.h might be
a better header for the new API because FindDefaultConversion() is in it.


Changes in copy.c look good except for a few trivial cosmetic issues:

* encoding_option could be a local variable instead of a cstate member.
* cstate->client_encoding is renamed to target_encoding,
  but I prefer file_encoding or remote_encoding.
* CopyState could have the conv_proc entity as a member instead of a pointer.
* The need_transcoding checks could be replaced with a conv_proc IS NULL check.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER EXTENSION UPGRADE, v3

2011-02-02 Thread Itagaki Takahiro
On Wed, Feb 2, 2011 at 03:21, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 PFA version 3 of the ALTER EXTENSION PATCH, cleaned and merged against
 recent HEAD and extension's branch from which I just produced the v30
 patch.

Excuse me for asking, but could you explain what the purpose is?
Which is it: upgrading to 9.1 from past versions, or upgrading
from 9.1 to future versions?  Also, how much advantage will we
have compared with uninstall_MODULE.sql + CREATE EXTENSION?

In my understanding, the patch does two things:
 1. Add ALTER object SET EXTENSION
 2. Add MODULE.upgrade.sql script for each contrib module

#1 seems reasonable as a supplement for the CREATE EXTENSION patch,
but we might need not only attach but also detach.

I guess #2 is much more difficult than expected because we might
upgrade databases from older versions. Will we need an upgrade script
for each supported version? -- if so, it would be a nightmare...

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log_checkpoints and restartpoint

2011-02-02 Thread Itagaki Takahiro
On Mon, Jan 31, 2011 at 05:17, Robert Haas robertmh...@gmail.com wrote:
 The attached patch changes LogCheckpointEnd so that it logs
 the number of WAL files created/deleted/recycled even in
 restartpoint.

 This patch looks good to me.  For now, I'm marking it Ready for
 Committer.  Absent objections, I will also commit it.

I don't have any objections to the patch, but we might also need
to add a description of restartpoints to the log_checkpoints option.

http://developer.postgresql.org/pgdocs/postgres/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add support for logging the current role

2011-02-01 Thread Itagaki Takahiro
 Updated patch attached.

I think we need to improve postgresql.conf.sample a bit more, especially
the long line for #log_csv_fields = '...'. 330 characters in it!
  #1. Leave the long line because it is needed.
  #2. Hide the variable from the default conf.
  #3. Use short %x mnemonics both in log_line_prefix and log_csv_fields.
  (It might require many additional mnemonics.)
Which is better, or is there another idea?

On Sat, Jan 29, 2011 at 13:06, Stephen Frost sfr...@snowman.net wrote:
 * log_csv_fields's GUC context is PGC_POSTMASTER. Is it by design?

 Doing SIGHUP would require addressing how to get all of the backends to
 close the old log file and open the new one, because we don't want to
 have a given log file which has two different CSV formats in it (we
 wouldn't be able to load it into the database...).  This was
 specifically addressed in the thread leading up to this patch...

I think it depends on the default log filename, which contains a %S
(seconds) suffix. We can remove %S from log_filename; if we use one log
per day, those logs might contain different columns even after a restart.
If we cannot avoid jagged csv fields completely, SIGHUP seems reasonable
for it.

 * What objects do you want to allocate in TopMemoryContext in
   assign_log_csv_fields() ?
 I just moved the switch to Top to be after those are allocated.

How about changing the type of csv_log_fields from List* to a fixed
array of LogCSVFields?  Then we could use an array initializer instead
of build_default_csvlog_list(), and the code would be simplified.
A fixed length won't be a problem because it would be rare for the
same field to be specified many times.

 * Docs need some tags for itemized elements or pre-formatted codes.
   They looks itemized in the sgml files, but will be flattened in
   complied HTML files.

 Not sure what you're referring to here...?  Can you elaborate?  I'm not
 great with the docs. :/

Could you try make html in the doc directory?
Your new documentation after
| These columns may be included in the CSV output:
will be unaligned plain text without some tags.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction-scope advisory locks

2011-02-01 Thread Itagaki Takahiro
On Fri, Jan 28, 2011 at 17:12, Marko Tiikkaja
marko.tiikk...@cs.helsinki.fi wrote:
 I still didn't address
 the issue with pg_advisory_unlock_all() releasing transaction scoped locks,

I guess you don't want independent locks, right? If a user object
is locked by session locks, it also blocks backends trying to lock it
with transaction locks.

If so, I think the ideal behavior is as below:
- The transaction-or-session property is overwritten by the last lock
  function call. We can promote session locks to/from transaction locks.
- Shared and exclusive locks are managed independently.
  We could have a shared session lock and an exclusive transaction
  lock on the same resource in a transaction.
- Unlock functions release both transaction and session locks.
- unlock_all() releases both kinds of locks.
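
For example, under this proposal (a hypothetical session):

  SELECT pg_advisory_lock(1);        -- session-scope exclusive lock
  BEGIN;
  SELECT pg_advisory_xact_lock(1);   -- same key: promoted to transaction scope
  COMMIT;                            -- the lock would be released here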

Those might be odd from a DBMS perspective, but would be natural in
programming languages. I guess advisory locks are often used in
ordinary programming-style flows.

 Another issue I found while testing the behaviour here:
 http://archives.postgresql.org/pgsql-hackers/2011-01/msg01939.php
 is that if a session holds both a transaction level and a session level lock
 on the same resource, only one of them will appear in pg_locks.  Is that
 going to be a problem from the user's perspective?  Could it be an
 indication of a well-hidden bug?  Based on my tests it seems to work, but
 I'm not at all confident with the code.

In the above proposal, we won't have both a session and a transaction lock
on the same resource at the same time, though we still need to show
exclusive and shared locks on different lines.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spread checkpoint sync

2011-01-31 Thread Itagaki Takahiro
On Mon, Jan 31, 2011 at 13:41, Robert Haas robertmh...@gmail.com wrote:
 1. Absorb fsync requests a lot more often during the sync phase.
 2. Still try to run the cleaning scan during the sync phase.
 3. Pause for 3 seconds after every fsync.

 So if we want the checkpoint
 to finish in, say, 20 minutes, we can't know whether the write phase
 needs to be finished by minute 10 or 15 or 16 or 19 or only by 19:59.

We probably need deadline-based scheduling, like the one already used in
the write() phase. If we want to sync 100 files in 20 minutes, each file
should be sync'ed within 12 seconds, assuming each fsync takes the same
time. If we had a better estimation algorithm (file size? dirty ratio?),
each fsync could have some weight factor, but deadline-based scheduling
would still be needed then.

BTW, we should not sleep in a full-speed checkpoint. The CHECKPOINT
command, shutdown, pg_start_backup(), and some of the checkpoints during
recovery might not want to sleep.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add reference to client_encoding parameter

2011-01-31 Thread Itagaki Takahiro
On Tue, Feb 1, 2011 at 00:37, Thom Brown t...@linux.com wrote:
 I've attached a small patch for the docs which adds a reference to the
 client_encoding parameter description.  This is in response to someone
 attempting to submit a comment which explains where available
 encodings can be found.

Thanks. It's a reasonable reference.
But I reworded it as below, which we already use in a few other places.

The character sets supported by the PostgreSQL server are described in ...

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] setlocale and gettext in Postgres

2011-01-31 Thread Itagaki Takahiro
2011/1/27 Hiroshi Inoue in...@tpf.co.jp:
 I see now the following lines in libintl.h of version
 0.18.1.1 which didn't exist in 0.17 version.

 The macro may cause a trouble especially on Windows.
 Attached is a patch to disable the macro on Windows.

Can anyone test the fix?

I added the patch to the current commitfest as a reminder.
https://commitfest.postgresql.org/action/patch_view?id=528

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] off-by-one mistake in array code error reporting

2011-01-31 Thread Itagaki Takahiro
On Mon, Jan 31, 2011 at 17:19, Alexey Klyukin al...@commandprompt.com wrote:
 While working on PL/Perl patch for arrays as input arguments I've noticed that
 PostgreSQL reports one less dimension in the 'number of array dimensions (%d)
 exceeds the maximum allowed (%d), i.e.

 select '{{{{{{{1,2},{3,4}},{{5,6},{7,8}}}}}}}'::int[];

 ERROR:  number of array dimensions (6) exceeds the maximum allowed (6)

 Attached is the simple fix for that.

Thanks. I found one more instance of the same bug, in PL/pgSQL.

s=# DO $$ DECLARE a int[]; BEGIN a = (ARRAY[1])[1][1][1][1][1][1][1]; END; $$;
ERROR:  number of array dimensions (6) exceeds the maximum allowed (6)

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multiset patch review

2011-01-30 Thread Itagaki Takahiro
On Mon, Jan 31, 2011 at 02:34, Robert Haas robertmh...@gmail.com wrote:
 I'm in favor of rejecting this patch in its entirety.  The
 functionality looks useful, but once you remove the syntax support, it
 could just as easily be distributed as a contrib module rather than in
 core.

 +1 ... if we're going to provide nonstandard behavior, it should be with
 a different syntax.  Also, with a contrib module we could keep on
 providing the nonstandard behavior for people who still need it, even
 after implementing the standard properly.

 Good point.

I agree for the collect() function; that is the only function for which
we cannot preserve compatibility once we have MULTISET. But the others
are still reasonable because they won't introduce nonstandard behavior.

The SQL standard seems to have an abstract COLLECTION data type as a
superclass of ARRAY and MULTISET. So, it's reasonable that functions
and operators that accept MULTISETs also accept ARRAYs.
For example, we will have cardinality(ARRAY) even if we later have
cardinality(MULTISET). Also, trim_array() is in the SQL standard.
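For illustration, assuming cardinality() counts all element positions
regardless of dimensionality, as the standard suggests:

  SELECT cardinality(ARRAY[[1,2],[3,4]]);  -- 4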

I can remove some parts of the patch, especially the parser changes,
but the others should still be in the core.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multiset patch review

2011-01-30 Thread Itagaki Takahiro
On Mon, Jan 31, 2011 at 04:12, Robert Haas robertmh...@gmail.com wrote:
 Well, do you want to revise this and submit a stripped-down version?
 I'm not averse to adding things that are required by the standard and
 won't cause backward compatibility problems later.

Sure. I'll remove the collect() function. I can also remove the syntax
support, but that requires reworking the documentation (e.g., from the
IS A SET operator to an is_a_set function), so I'd like to wait a bit
more for the final consensus.

 The documentation for trim_array() in the current patch version is
 pretty terrible.  The documentation describe it as having the prototype
 trim_array(anyarray), but it's called in the example as
 trim(integer[], integer) - two arguments vs. one.

Oops, it's just my mistake. trim(anyarray, integer) is correct.

 Also the docs don't
 say how it decides how many elements to remove, or what happens to a
 multi-dimensional array.

I wrote the description below in the patch:
+ In functionarray_sort/, functionset/, and functiontrim_array/
+ functions, input arrays are always flattened into one-dimensional arrays.
+ In addition, the lower bounds of the arrays are adjusted to 1.

I'm not sure what the consistent behavior for MD arrays is. For example,
array_concat() is very strict, but the @> and <@ operators don't care
about the dimensions. I interpreted the second argument of trim_array()
as a number of elements, but of course we could redefine it as a number
of rows for MD arrays.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: pg_ctl failover Re: [HACKERS] Latches, signals, and waiting

2011-01-30 Thread Itagaki Takahiro
On Mon, Jan 31, 2011 at 11:52, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Jan 29, 2011 at 1:11 AM, Tatsuo Ishii is...@postgresql.org wrote:
 Ok. I will write a C user function and add to pgpool source tree. I
 think it will be fairly easy to create a trigger file in the function.

 If the pg_ctl promote patch will have been committed, I recommend that
 the C function should send the signal to the startup process rather than
 creating the trigger file.

The C function needs to create a trigger file in $PGDATA/promote
before sending the signal, no?  system("pg_ctl promote") seems
the easiest way if you use an external module.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions support for pg_dump, patch v27

2011-01-30 Thread Itagaki Takahiro
On Fri, Jan 28, 2011 at 18:03, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 After review, I included all your proposed changes, thanks a lot!
 Please find attached a new version of the patch, v29,

Additional questions and discussions:

* relocatable and schema seem to be duplicated options.
We could treat an extension as relocatable when schema is not
specified, instead of having a relocatable option. Or, if we keep the
schema option, it could be used as the default schema to install into
when WITH SCHEMA is not specified.

* The version field in pg_available_extensions might be misleading when
a newer version of an extension is available but an older version
is installed. How about returning the installed version in the installed
field instead of a boolean? The field would be NULL if not installed.

* I want to remove the O(n^2) behavior in pg_extensions(). It scans the
pg_extension catalog to return whether each extension is installed,
but it would be better to change the function to return just the whole
set of extensions and JOIN it with pg_extension in pg_available_extensions.
(it's the same technique used in pg_stat_replication)
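
A sketch of that view definition (the names and result columns here are
illustrative, not the patch's actual ones):

  CREATE VIEW pg_available_extensions AS
    SELECT e.name, e.version, x.extversion AS installed_version
    FROM pg_extensions() AS e(name text, version text)
    LEFT JOIN pg_extension x ON x.extname = e.name;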

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions support for pg_dump, patch v27

2011-01-27 Thread Itagaki Takahiro
On Thu, Jan 27, 2011 at 20:01, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Ok, done.  Of course, it solves the whole problem Itagaki had with
 adminpack because we stop relying on dependencies to get it right now.

I found that pg_restore with the -c option fails when an extension is
created in pg_catalog. Since pg_catalog is an implicit search target,
we might need to add the catalog to search_path explicitly.
Note that I can restore adminpack with no errors because it has an
explicit pg_catalog. prefix in the installer script.


postgres=# CREATE EXTENSION cube WITH SCHEMA pg_catalog;
$ pg_dump -Fc postgres  postgres.dump
$ pg_restore -d postgres -c postgres.dump
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 1; 3996 16678 EXTENSION cube
pg_restore: [archiver (db)] could not execute query: ERROR:  no schema
has been selected to create in


BTW, I have a minor comment on the code:
  extern bool extension_relocatable_p(Oid ext_oid);
What is _p ?  Also, we could make the function static
because it is used only in extension.c.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions support for pg_dump, patch v27

2011-01-27 Thread Itagaki Takahiro
On Thu, Jan 27, 2011 at 22:48, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Itagaki Takahiro itagaki.takah...@gmail.com writes:
 I found pg_restore with -c option fails when an extension is created
 in pg_catalog.
 Nice catch, thank you very much (again) for finding those :)

Seems good.

   extern bool extension_relocatable_p(Oid ext_oid);
 predicate.  Maybe I've done too much Emacs Lisp coding at the time I
 added that function, but it looked natural (enough) to me :)

Hmmm, I like extension_is_relocatable() or so...
Including the above, I wrote a patch on top of your patch for minor
cleanup. Please merge the reasonable parts of it.

* access() is not portable.
The pre-checking with access() doesn't seem needed because
the same error will be raised in parse_extension_control_file().

* There is some dead code in the patch.
For example, you exported ObjectAddresses to public, but it is not
actually used in extension.c. I reverted some of those changes.

* Should we support absolute control file paths?
Absolute paths are supported by get_extension_absolute_path(),
but I'm not sure about the actual use cases.

* Each ereport(ERROR) should have a reasonable errcode unless it is an
internal logic error, and each error message should follow our
guideline (starting with a lower-case character, etc.)

* Changed create_extension_with_user_data to a static variable.
CreateExtensionAddress and create_extension are still exported.
We could have better names for them -- CurrentExtensionAddress
and in_create_extension?  Or, in_create_extension might be
replaced with CreateExtensionAddress.objectId != InvalidOid.

* Added psql tab completion for CREATE/DROP/ALTER EXTENSION.
* Use palloc0() instead of palloc() and memset(0).
* Several code cleanup.

-- 
Itagaki Takahiro


extension-diff-on.v28a.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: WIP: plpgsql - foreach in

2011-01-26 Thread Itagaki Takahiro
On Mon, Jan 24, 2011 at 13:05, Stephen Frost sfr...@snowman.net wrote:
 FOR var in ARRAY array_expression ...

 I like that a lot more than inventing a new top-level keyword,

AFAIR, the syntax does not work well with an array literal:
  FOR var IN ARRAY ARRAY[1,2,5] LOOP ...
And that was the only drawback compared with FOREACH var IN expr.
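With the FOREACH form discussed here, the same literal parses cleanly
because no keyword follows IN:

  FOREACH var IN ARRAY[1,2,5] LOOP ...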

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions support for pg_dump, patch v27

2011-01-25 Thread Itagaki Takahiro
On Mon, Jan 24, 2011 at 18:22, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Following recent commit of the hstore Makefile cleaning, that I included
 into my extension patch too, I have a nice occasion to push version 27
 of the patch.  Please find it enclosed.

Hi, I read the patch and tested it to some degree.  It works as expected
generally, but still needs a bit more development in edge cases.

* commands/extension.h needs cleanup.
- Missing extern declarations for create_extension and
create_extension_with_user_data variables.
- ExtensionControlFile and ExtensionList can be hidden from the header.
- extension_objects_fctx is not used at all.

* There are many recordDependencyOn(myself, CreateExtensionAddress, ...)
calls in various places, but I'm not sure whether we have forgotten
something -- TRIGGERs?

* Will we still need uninstaller scripts?
DROP EXTENSION relies on object dependencies to drop objects,
but we have a few configurations that cannot be described by
dependencies. For example, rows inserted into pg_am for user AMs,
or reverting default settings modified by the extension.

* Extensions installed in pg_catalog:
If we install an extension into pg_catalog,
  CREATE EXTENSION xxx WITH SCHEMA pg_catalog
pg_dump dumps nothing about the extension. We might need special care
for modules that install functions only in pg_catalog.

* I can install all extensions successfully, but there is a warning
in hstore. The warning was not reported to the client, but the whole
contents of hstore.sql.in were recorded in the server log.

  WARNING:  = is deprecated as an operator name

We set client_min_messages for this issue in hstore.sql.in, but I think
we also need an additional SET log_min_messages TO ERROR around it.

* Does it support nested CREATE EXTENSION calls?
It's a rare case, but we should have some error checks for it.

* It passes all regression tests for both the backend and contrib modules,
but there are a couple of warnings during build and installation.

Three compiler warnings found. Please check.
pg_proc.c: In function 'ProcedureCreate':
pg_proc.c:110:8: warning: 'extensionOid' may be used uninitialized in
this function
pg_shdepend.c: In function 'shdepReassignOwned':
pg_shdepend.c:1335:6: warning: implicit declaration of function
'AlterOperatorOwner_oid'
operatorcmds.c:369:1: warning: no previous prototype for
'AlterOperatorOwner_oid'

* Five warnings also found during make install in contrib directory.
../../config/install-sh: ./uninstall_btree_gist.sql does not exist.
../../config/install-sh: ./uninstall_pageinspect.sql does not exist.
../../config/install-sh: ./uninstall_pgcrypto.sql does not exist.
../../config/install-sh: ./uninstall_pgrowlocks.sql does not exist.
../../config/install-sh: ./uninstall_pgstattuple.sql does not exist.

* You use const == expr in some places, e.g. if (InvalidOid != ...),
  but we seem to prefer expr == const.

* [off topic] I found some installer scripts are inconsistent.
They use CREATE FUNCTION for some functions, but CREATE OR REPLACE
FUNCTION for others. We'd better write documentation about how to write
installer scripts, because CREATE EXTENSION makes some implicit
assumptions about them. For example, don't use transactions, etc.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions support for pg_dump, patch v27

2011-01-25 Thread Itagaki Takahiro
On Wed, Jan 26, 2011 at 02:57, David E. Wheeler da...@kineticode.com wrote:
 Other than adminpack, I think it makes sense to say that extensions
 are not going into pg_catalog…

 Given this, we should maybe see about either making adminpack part of
 PostgreSQL's core distribution (probably a good idea) or moving it out
 of pg_catalog so we don't have an exception to the rule.

 +1

I doubt it can solve the real problem.
It is my understanding that we install them in pg_catalog because
they are designed to be installed in template1 for pgAdmin, right?

We have a problem in pg_dump when we install extension modules in
template1. If we create a database, the installed functions are copied
from template1. However, they are also dumped by pg_dump unless
they are in pg_catalog. So, we encounter function already exists
errors at pg_restore time.

Since pg_dump won't dump user objects in pg_catalog, adminpack can
avoid the above errors by installing its functions in pg_catalog.
CREATE EXTENSION might have the same issue -- can EXTENSION work
without errors when we install extensions in template databases?
To avoid errors, pg_dump might need to dump extensions as
CREATE OR REPLACE EXTENSION or CREATE EXTENSION IF NOT EXISTS
rather than CREATE EXTENSION.
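For example, the dumped output might then look like this (hypothetical):

  CREATE EXTENSION IF NOT EXISTS adminpack WITH SCHEMA pg_catalog;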

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: WIP: plpgsql - foreach in

2011-01-25 Thread Itagaki Takahiro
On Mon, Jan 24, 2011 at 20:10, Pavel Stehule pavel.steh...@gmail.com wrote:
 we have to iterate over array's items because it allow seq. access to
 array's data. I need a global index for function array_get_isnull. I
 can't to use a buildin functions like array_slize_size or
 array_get_slice, because there is high overhead of array_seek
 function. I redesigned the initial part of main cycle, but code is
 little bit longer :(, but I hope, it is more readable.

The attached patch only includes changes in plpgsql. I tested it
combined with the previous one, that includes other directories.

* src/pl/plpgsql/src/gram.y
| for_variable K_SLICE ICONST
The parameter for SLICE must be an integer literal. Is that by design?
I got a syntax error when I wrote a function like the one below. However,
FOREACH returns different types for SLICE = 0 and SLICE >= 1
(the element type for 0, the array type for >= 1).  So, we cannot write
a truly generic function that accepts all SLICE values even if the
syntax supported an expr for the parameter.

CREATE FUNCTION foreach_test(int[], int) RETURNS SETOF int[] AS
$$
DECLARE
  a integer[];
BEGIN
  FOREACH a SLICE $2 IN $1 LOOP -- syntax error
RETURN NEXT a;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

* /doc/src/sgml/plpgsql.sgml
+ FOREACH <replaceable>target</replaceable> <optional> SCALE
s/SCALE/SLICE/ ?

* Why do you use foreach_a for some identifiers?
We use FOREACH only for arrays, so the _a suffix doesn't seem needed.

* accumArrayResult() for SLICE has a bit of overhead.
If we want to optimize it, we could use memcpy() because slices are
placed in contiguous memory. But I'm not sure it's worth it; I guess
FOREACH will be used with SLICE = 0 in most cases.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: FDW API

2011-01-24 Thread Itagaki Takahiro
On Sat, Jan 22, 2011 at 07:20, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 * I wonder if CREATE FOREIGN DATA WRAPPER should automatically create
 the handler function, if it doesn't exist yet.

 Doing that would require the equivalent of pg_pltemplate for FDWs, no?
 I think we're a long way from wanting to do that.  Also, it seems to me
 that add-on FDWs are likely to end up getting packaged as extensions,

The proposed file_fdw.sql actually creates a default FDW on installation.
So I think the installation scripts work as a template even if we don't
have FDW template catalogs.

+ /* contrib/file_fdw/file_fdw.sql.in */
+ -- create wrapper with validator and handler
+ CREATE OR REPLACE FUNCTION file_fdw_validator (text[], oid)
+ CREATE OR REPLACE FUNCTION file_fdw_handler ()
+ CREATE FOREIGN DATA WRAPPER file_fdw
+ VALIDATOR file_fdw_validator HANDLER file_fdw_handler;
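
With that wrapper in place, one would define a server and a foreign
table on top of it, along these lines (names and paths are illustrative):

  CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;
  CREATE FOREIGN TABLE words (w text) SERVER file_server
    OPTIONS (format 'csv', filename '/path/to/words.csv');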

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add support for logging the current role

2011-01-24 Thread Itagaki Takahiro
On Wed, Jan 19, 2011 at 14:36, Stephen Frost sfr...@snowman.net wrote:
 Alright, here's the latest on this patch.  I've added a log_csv_fields
 GUC along with the associated magic to make it work (at least for me).
 Also added 'role_name' and '%U' options.  Requires postmaster restart to
 change, didn't include any 'short-cut' field options, though I don't
 think it'd be hard to do if we can decide on it.  Default remains the
 same as what was in 9.0.

Hi, I reviewed log_csv_options.patch. It roughly looks good,
but I'd like to discuss the design on a few points.

 Features 
The patch adds a log_csv_fields GUC variable. It allows customizing the
columns in csvlog. The default setting matches what we were writing in
9.0 and earlier versions.

It also adds role_name for log_csv_fields and %U for log_line_prefix
to record role names. They show the name set by SET ROLE. OTOH, user_name
and %u show the authorization user name.
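For example, a hypothetical prefix using both escapes:

  log_line_prefix = '%u (as %U): '  # authorization user vs. current role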

 Discussions 
* How about csvlog_fields rather than log_csv_fields?
  Since we have variables with the syslog_ prefix, a csvlog_ prefix
  seems better.

* We use the %what syntax to specify fields in log_line_prefix, but
  will use long field names for log_csv_fields. Do you have any
  better idea to share the names between both options?  At least I want
  to share the short descriptions for them in postgresql.conf.

* log_csv_fields's GUC context is PGC_POSTMASTER. Is it by design?
  PGC_SIGHUP would be more consistent with log_line_prefix.
  However, the csv format would then not be valid because the column
  definitions could change in the middle of a file.

* none is not so useful for the initial role_name field.
  Should we use user_name instead of the empty role_name?

 Comments for codes 
Some of the items are trivial, though.

* What objects do you want to allocate in TopMemoryContext in
  assign_log_csv_fields()?  AFAICS, we don't have to keep rawstring
  and column_list in a long-lived context. Or, if you need TopMemoryContext,
  those variables should be pfree'd at the end of the function.

* appendStringInfoChar() calls in write_csvlog() seem to be wrong
  when the last field is not application_name.

* Docs need more cross-reference hyper-links for see also items.

* Docs need some tags for itemized elements or pre-formatted codes.
  They looks itemized in the sgml files, but will be flattened in
  complied HTML files.

* A declaration of assign_log_csv_fields() at the top of elog.c
  needs extern.
* There is a duplicated declaration for build_default_csvlog_list().
* list_free() is NIL-safe. You don't have to check whether the list
  is NIL before calling the function.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multiset patch review

2011-01-24 Thread Itagaki Takahiro
On Mon, Jan 24, 2011 at 20:49, Robert Haas robertmh...@gmail.com wrote:
 I notice that this is adding keywords and syntax support for what is
 basically a PostgreSQL extension (since we certainly can't possibly be
 following the SQL standards given that we're not implementing a new
 datatype.  Is that really a good idea?

As I wrote here,
http://archives.postgresql.org/pgsql-hackers/2011-01/msg00829.php
I think we can follow the SQL standard incrementally because we
have function overloading.

One exception is the result type of collect() aggregate function.
It returns an array for now, but will return a multiset when we
support true multiset data type.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Per-column collation, the finale

2011-01-24 Thread Itagaki Takahiro
On Sat, Jan 15, 2011 at 06:41, Peter Eisentraut pete...@gmx.net wrote:
 I've been going over this patch with a fine-tooth comb for the last two
 weeks, fixed some small bugs, added comments, made initdb a little
 friendlier, but no substantial changes.

What can I do to test the patch?
No regression tests are included in it, and I have an almost empty
pg_collation catalog with it:

=# SELECT * FROM pg_collation;
 collname | collnamespace | collencoding | collcollate | collctype
----------+---------------+--------------+-------------+-----------
 default  |            11 |            0 |             |
(1 row)
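
For example, I expected to be able to run something like this (assuming
initdb populates pg_collation from the system locales; the collation
name is platform-dependent):

  CREATE TABLE t (x text COLLATE "en_US");
  SELECT * FROM t ORDER BY x;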

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add ENCODING option to COPY

2011-01-24 Thread Itagaki Takahiro
On Sat, Jan 15, 2011 at 02:25, Hitoshi Harada umi.tan...@gmail.com wrote:
 The patch overrides client_encoding by the added ENCODING option, and
 restores it as soon as copy is done.

We cannot do that, because error messages should be encoded in the original
encoding even during COPY commands with the encoding option. Error messages
can contain non-ASCII characters if lc_messages is set.

 I see some complaints ask to use
 pg_do_encoding_conversion() instead of
 pg_client_to_server/server_to_client(), but the former will surely add
 slight overhead per reading line

If we want to reduce the overhead, we should cache the conversion procedure
in CopyState. How about adding something like FmgrInfo file_to_server_conv
to it?

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] add __attribute__((noreturn)) to suppress a waring

2011-01-23 Thread Itagaki Takahiro
I found the following warning with Fedora 14 / gcc 4.5.1.

pg_backup_archiver.c: In function ‘_discoverArchiveFormat’:
pg_backup_archiver.c:1736:11: warning: ‘fh’ may be used uninitialized
in this function

To suppress it, I'm thinking of adding noreturn to die_horribly().
Any objections?  Another solution might be adding a dummy assignment
after the calls of die_horribly().

-- 
Itagaki Takahiro


noreturn.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to know killed by pg_terminate_backend

2011-01-21 Thread Itagaki Takahiro
On Fri, Jan 21, 2011 at 13:56, Tatsuo Ishii is...@postgresql.org wrote:
 Here is the patch to implement the feature.

 1) pg_terminate_backend() sends SIGUSR1 signal rather than SIGTERM to
    the target backend.
 2) The infrastructure used for message passing is
    storage/ipc/procsignal.c The new message type for ProcSignalReason
    is PROCSIG_TERMNINATE_BACKEND_INTERRUPT
  3) I assign new error code 57P04 which is returned from the backend
       killed by pg_terminate_backend().

 #define ERRCODE_TERMINATE_BACKEND                     MAKE_SQLSTATE('5','7', 
 'P','0','4')

 Anyone has better idea? Tom dislikes my patch but I don't know how to
 deal with it.

There was another design in the past discussion:
 One idea is postmaster sets a flag in the shared memory area
  indicating it received SIGTERM before forwarding the signal to
 backends.

Is it enough for your purpose, and do we think it is a more robust way?

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - file_fdw

2011-01-21 Thread Itagaki Takahiro
On Tue, Jan 18, 2011 at 09:33, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Review for CF:

Thank your for the review!

 Since it doesn't appear to be intended to change any user-visible
 behavior, I don't see any need for docs or changes to the regression
 tests.

There might be some user-visible behavior changes in error messages
because I rearranged some of the error-checking code. But we can see
the difference only if there are two or more errors in a COPY command.
They should not be serious issues.

 So far everything I've done has been with asserts enabled, so I
 haven't tried to get serious benchmarks, but it seems fast.  I will
 rebuild without asserts and do performance tests before I change the
 status on the CF page.

 I'm wondering if it would make more sense to do the benchmarking with
 just this patch or the full fdw patch set?  Both?

I tested the performance on my desktop PC, but I could not see any
differences. It would be welcome if any of you could test on
high-performance servers.

A comparison with file_fdw would be more interesting.
If they have similar performance, we could replace COPY FROM with
CREATE TABLE AS SELECT FROM foreign_table, which is more flexible.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - file_fdw

2011-01-21 Thread Itagaki Takahiro
On Fri, Jan 21, 2011 at 22:12, Shigeru HANADA han...@metrosystems.co.jp wrote:
 My concern is that the explainInfo interface is not ideal for the purpose
 and will therefore be an unstable interface. If we support nested plans
 in FDWs, each FDW should receive the tree writer used internally in
 explain.c. explainInfo, which is a plan text, is not enough for complex
 FdwPlans. However, since we don't have any better solution for now,
 we can have the variable for 9.1. It's much better than nothing.

 When I was writing file_fdw, I hoped to use static functions in
 explain.c such as ExplainProperty() to handle complex information.
 Even for single plan node, I think that filename and size (currently
 they are printed in a plain text together) should be separated in the
 output of explain, especially when the format was XML or JSON.

Just an idea -- we could return complex node trees with explainInfo
if we use XML or JSON for the format. For example, pgsql_fdw can
return the result from EXPLAIN (FORMAT json) without modification.
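For example, the remote side can already produce a machine-readable
plan with:

  EXPLAIN (FORMAT json) SELECT * FROM pg_class;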

It might be one of the reasons we should support JSON in the core :)

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - file_fdw

2011-01-20 Thread Itagaki Takahiro
On Wed, Jan 19, 2011 at 00:34, Shigeru HANADA han...@metrosystems.co.jp wrote:
 Attached patch requires FDW API patches and copy_export-20110114.patch.

Some minor comments:

* Can you pass slot->tts_values and slot->tts_isnull directly to
NextCopyFrom()?  It won't allocate the arrays; it just fills the array buffers.

* You can pass NULL for the 4th argument for NextCopyFrom().
| Oid tupleoid; /* just for required parameter */

* file_fdw_validator still has code duplicated from BeginCopy,
but I have no idea how to share the validation code in a clean way...

* Try strVal() instead of DefElem->val.str
* FdwEPrivate seems too abbreviated to me. How about FileFdwPrivate?
* private is a bad identifier name because it's a C++ keyword.
We should rename FdwExecutionState->private.


 In that message, you also pointed out that FDW must generate
 explainInfo in every PlanRelScan call even if the planning is not for
 EXPLAIN.  I'll try to defer generating explainInfo until EXPLAIN
 VERBOSE really uses it.  It might need new hook point in expalain.c,
 though.

I complained about the overhead, but it won't be a problem for
file_fdw and pgsql_fdw. file_fdw can easily generate the text,
and pgsql_fdw needs to generate a SQL query anyway.

My concern is that the explainInfo interface is not ideal for the purpose
and will therefore be an unstable interface. If we support nested plans
in FDWs, each FDW should receive the tree writer used internally in
explain.c. explainInfo, which is a plan text, is not enough for complex
FdwPlans. However, since we don't have any better solution for now,
we can have the variable for 9.1. It's much better than nothing.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON data type status?

2011-01-20 Thread Itagaki Takahiro
On Fri, Jan 21, 2011 at 09:11, Bruce Momjian br...@momjian.us wrote:
 What happened to our work to add a JSON data type for PG 9.1?

Nothing will happen in 9.1.
I assume we are in competition status:

http://archives.postgresql.org/pgsql-hackers/2010-11/msg00481.php

Also, if PGXN works well, we might not have to include JSON
in the core. We can download any JSON implementation from the
site after installing the core server.  Of course, if we use
JSON types in the core (EXPLAIN JSON output?), we have to
include one of them.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: EXPLAIN and nfiltered

2011-01-19 Thread Itagaki Takahiro
On Thu, Jan 20, 2011 at 12:16, Stephen Frost sfr...@snowman.net wrote:
 This patch looked good, in general, to me.  I added a few documentation
 updates and a comment, but it's a very straight-forward patch as far as
 I can tell.  Passes all regressions and my additional testing.

Looks good and useful to me, too.

We need to adjust a bit more documentation. The patch changes all
EXPLAIN ANALYZE outputs. When I grep'ed the docs for loops=,
EXPLAIN ANALYZE is also used in perform.sgml and auto-explain.sgml
in addition to explain.sgml.

It would be good to have documentation for the nfiltered parameter. The
best place would be around the description of loops in the Using EXPLAIN
page:
http://developer.postgresql.org/pgdocs/postgres/using-explain.html

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-18 Thread Itagaki Takahiro
On Tue, Jan 18, 2011 at 05:39, Tom Lane t...@sss.pgh.pa.us wrote:
 I haven't looked at this patch, but it seems to me that it would be
 reasonable to conclude A != B if the va_extsize values in the toast
 pointers don't agree.

It's a very lightweight alternative to memcmp'ing the byte data,
but there is still the same issue -- we might get different
compressed results if we use a different algorithm for TOASTing.

So, it would be better to apply the present patch as-is.
We can improve the comparison logic beyond the patch in another
development cycle if possible.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-17 Thread Itagaki Takahiro
2011/1/17 KaiGai Kohei kai...@ak.jp.nec.com:
 Are you talking about an idea to apply toast id as an alternative key?

No, probably not. I'm just talking about whether diff -q A.txt B.txt and
diff -q A.gz B.gz always return the same result or not.

... I found that it depends on the version of gzip. So, if we used such
logic, we could not improve the TOAST compression logic, because the data
is migrated by pg_upgrade.

 I did similar idea to represent security label on user tables for row
 level security in the v8.4/9.0 based implementation. Because a small
 number of security labels are shared by massive tuples, it is waste of
 space if we have all the text data being toasted individually, not only
 performance loss in toast/detoast.

It looks like the same issue as large objects, rather than the discussion
here. We have vacuumlo in contrib to solve that issue. So, we could have
vacuumlo-like special sweeping logic for the security label table.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_replication security

2011-01-17 Thread Itagaki Takahiro
On Mon, Jan 17, 2011 at 19:51, Magnus Hagander mag...@hagander.net wrote:
 Here's a patch that limits it to superuser only. We can't easily match
 it to the user of the session given the way the walsender data is
 returned - it doesn't contain the user information. But limiting it to
 superuser only seems perfectly reasonable and in line with the
 encouragement not to use the replication user for login.

 Objections?

It hides all fields of pg_stat_wal_senders(). Instead, can we just
revoke usage of the function and view?  Or do we have some plans
to add fields which normal users can see?

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - file_fdw

2011-01-17 Thread Itagaki Takahiro
On Sat, Jan 15, 2011 at 08:35, Shigeru HANADA han...@metrosystems.co.jp wrote:
 Interface of NextCopyFrom() is fixed to return HeapTuple, to support
 tableoid without any change to TupleTableSlot.

 3) 20110114-copy_export_HeapTupe.patch
 This patch fixes interface of NextCopyFrom() to return results as
 HeapTuple.

I think file_fdw can return tuples in virtual tuple form,
and ForeignNext() can call ExecMaterializeSlot() to store tableoid.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-16 Thread Itagaki Takahiro
On Mon, Jan 17, 2011 at 04:05, Andy Colson a...@squeakycode.net wrote:
 This is a review of:
 https://commitfest.postgresql.org/action/patch_view?id=468

 Purpose:
 
 Equal and not-equal _may_ be quickly determined if their lengths are
 different.   This _may_ be a huge speed up if we don't have to detoast.

With the patch we can skip detoasting to compare the lengths of two
text/bytea values, but we still need to detoast to compare the contents
of the values.

If we always generated the same toasted byte sequences from the same raw
values, we would not need to detoast at all to compare the contents.
Is that possible or not?

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction-scope advisory locks

2011-01-16 Thread Itagaki Takahiro
On Sun, Jan 16, 2011 at 06:20, Marko Tiikkaja
marko.tiikk...@cs.helsinki.fi wrote:
 Here's the latest version of the patch.  It now uses the API proposed by
 Simon, but still lacks documentation changes, which I'm going to send
 tomorrow.

Here is a short review for Transaction scoped advisory locks:
https://commitfest.postgresql.org/action/patch_view?id=518

== Features ==
The patch adds pg_[try_]advisory_xact_lock[_shared] functions.
The function names follow the past discussion -- they are better than
a bool isXact argument or changing the existing behavior.

== Coding ==
The patch itself is well-formed and applies cleanly.
I expect documentation will come soon.
There is no regression test, but we have no regression tests for
advisory locks even now. Tests for lock conflicts might be difficult,
but we could have a single-threaded test for lock/unlock and the
pg_locks view.

== Questions ==
I have a question about unlocking transaction-scope advisory locks.
We cannot unlock them with pg_advisory_unlock(), but we can unlock them
with pg_advisory_unlock_all(). That is inconsistent behavior.
Furthermore, I wonder whether we should allow unlocking transaction-scope
locks at all -- we have LOCK TABLE but no UNLOCK TABLE.

postgres=# BEGIN;
BEGIN
postgres=# SELECT pg_advisory_xact_lock(1);
 pg_advisory_xact_lock
------------------------
 
(1 row)

postgres=# SELECT pg_advisory_unlock(1);
WARNING:  you don't own a lock of type ExclusiveLock
 pg_advisory_unlock
--------------------
 f
(1 row)

postgres=# SELECT pg_advisory_unlock_all();
 pg_advisory_unlock_all
------------------------
 
(1 row)

postgres=# ROLLBACK;
ROLLBACK

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] texteq/byteaeq: avoid detoast [REVIEW]

2011-01-16 Thread Itagaki Takahiro
On Mon, Jan 17, 2011 at 16:13, Pavel Stehule pavel.steh...@gmail.com wrote:
 If we always generate same toasted byte sequences from the same raw
 values, we don't need to detoast at all to compare the contents.
 Is it possible or not?

 For bytea, it seems it would be possible.

 For text, I think locales may make that impossible. Aren't there
 locale rules where two different characters can behave the same when
 comparing them? I know in Swedish at least w and v behave the same
 when sorting (but not when comparing) in some variants of the locale.

 Some string comparison operations are binary now too. But it is a
 question what will change with collate support.

Right. We are using memcmp() in texteq and textne now. We consider
collations only in <, <=, >=, > and the comparison support functions.
So I think there is no regression here as long as raw values and
toasted byte sequences have a one-to-one correspondence.
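
For illustration (a sketch; the COLLATE clause assumes the in-progress
per-expression collation support and an installed Swedish locale):

  SELECT 'v' = 'w';                   -- false in any locale: a byte compare
  SELECT 'v' < 'w' COLLATE "sv_SE";   -- depends on the locale's sort rules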

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] textarray option for file FDW

2011-01-15 Thread Itagaki Takahiro
On Sun, Jan 16, 2011 at 02:29, Andrew Dunstan and...@dunslane.net wrote:
 textarray. This option would require that the foreign table have exactly
 one field, of type text[], and would compose all the field strings read from
 the file for each record into the array (however many there are).

   CREATE FOREIGN TABLE arr_text (
        t text[]
   )  SERVER file_server
   OPTIONS (format 'csv', filename '/path/to/ragged.csv', textarray 'true');

A FOREIGN TABLE has a persistent definition, so there are some issues
that don't exist in the COPY command, for example:
- when the type of the last column is not text[]
- when the last column is changed by ALTER FOREIGN TABLE ADD/DROP COLUMN
  (see the sketch below)
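
A sketch of that second case (assuming the ALTER FOREIGN TABLE syntax
from the SQL/MED patch):

  ALTER FOREIGN TABLE arr_text ADD COLUMN extra text;
  -- the text[] column is no longer the only/last one; which column
  -- should now absorb the remaining fields of each record?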

BTW, those options could be specified as column options rather than table
options, but we don't have a column option syntax in the current proposal:

  CREATE FOREIGN TABLE arr_text (
       i integer OPTION (column 1), -- column position in csv file
       t text[]  OPTION (column 'all the rest'),
       d dateOPTION (column 2)
  )  SERVER file_server
  OPTIONS (format 'csv', filename '/path/to/ragged.csv');

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - file_fdw

2011-01-13 Thread Itagaki Takahiro
On Fri, Jan 14, 2011 at 14:20, Shigeru HANADA han...@metrosystems.co.jp wrote:
 After copying statistics of the pgbench_xxx tables into the csv_xxx
 tables, the planner generates the same plans as for local tables, but the
 costs of ForeignScan nodes are a little lower than those of SeqScan nodes.
 A forced Nested Loop uses a Materialize node as expected.

Interesting. It means we need per-column statistics for foreign
tables in addition to cost values.

 ISTM that a new interface called from ANALYZE would help to
 update statistics of foreign tables. If we could leave the sampling
 algorithm to FDWs, acquire_sample_rows() might fit that purpose.

We will discuss how to collect statistics from foreign tables
in the next development cycle. I think we have two choices here:

 #1. Retrieve sample rows from remote foreign tables and
 store stats in the local pg_statistic.
 #2. Use remote statistics for each foreign table directly.

acquire_sample_rows() would be a method for #1. For #2, one approach
is to use remote statistics directly: we already provide hooks to
generate virtual statistics with get_relation_stats_hook() and its
relatives, and we could treat statistics for foreign tables in a
similar way through those hooks.

file_fdw likes #1 because there are no external storage to store
statistics for CSV files, but pgsql_fdw might prefer #2 because
the remote server already has stats for the underlying table.
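
From the user's side, choice #1 might look like this (a sketch, assuming
ANALYZE learns to ask the FDW for sample rows; csv_accounts is an example
foreign table name):

  ANALYZE csv_accounts;                 -- the FDW supplies the sample rows
  SELECT staattnum, stadistinct
    FROM pg_statistic
   WHERE starelid = 'csv_accounts'::regclass;  -- stats stored locally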

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multiset patch review

2011-01-12 Thread Itagaki Takahiro
On Wed, Jan 12, 2011 at 15:21, Peter Eisentraut pete...@gmx.net wrote:
 You may want to read this thread about the cardinality function you are
 trying to add:

 http://archives.postgresql.org/pgsql-hackers/2009-02/msg01388.php

Since our archive is split per month, this might be more readable:
http://postgresql.1045698.n5.nabble.com/cardinality-td2003172.html

We've discussed what number cardinality() should return:
 #1. The total number of elements. (This is what the patch currently
     implements.)
 #2. The length of the first dimension,
     i.e. the same as array_length(array, 1).

I prefer #1 because we have no easy way to retrieve that number otherwise.
array_dims() returns related numbers, but calculating the total count
from them is a bit complex.
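
For example (the first query assumes the patch's #1 semantics):

  SELECT cardinality(ARRAY[[1,2,3],[4,5,6]]);      -- 6: total elements (#1)
  SELECT array_length(ARRAY[[1,2,3],[4,5,6]], 1);  -- 2: first dimension (#2)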

If we ever support arrays of arrays (jagged arrays), cardinality()
can return the number of elements in the outermost array.
That is similar to the definition for multi-dimensional arrays in C#,
which has both arrays of arrays and multi-dimensional arrays.

http://msdn.microsoft.com/library/system.array.length(v=VS.100).aspx

We can compare those SQL functions with the C# array members:
  * cardinality(array) -- array.Length
  * array_length(array, dim) -- array.GetLength(dim)


 Also, what happened to the idea of a separate MULTISET type?

I don't have any plans to implement a dedicated MULTISET type for now,
because almost all of the functions and operators also work for arrays.
If we get a true MULTISET data type, we can overload them with
MULTISET arguments.

One exception might be the collect() aggregate function, because
we might need to change its result type from array to multiset:
    collect(anyelement) => anyarray for now
Note that fusion() won't be an issue because we can overload it:
    fusion(anyarray) => anyarray and fusion(anymultiset) => anymultiset

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

