[HACKERS] PATCH: CITEXT 2.0 v4

2008-07-15 Thread David E. Wheeler

Howdy,

I've attached a new patch with the latest revisions of the citext
contrib module patch. The changes include:


* Using strlen() to pass string lengths to the comparison function,
  since lowercasing the value can change the length. Per Tom Lane.
* Made citextcmp consistently return int32, per Tom Lane.
* Made the hash index function return the proper value, per Tom Lane.
* Removed the COMMENTs and GRANTs from citext.sql.in.
* Added a cast function from bpchar to citext, as suggested by Tom Lane.
* Set the storage type for CITEXT to "extended", to ensure that it will
  be toastable. Per Tom Lane.
* Fixed the COMMUTATOR of <=.
* Changed the cast from citext to bpchar from implicit to assignment.
  This eliminates ambiguous function resolutions.
* Eliminated superfluous functions, per Tom Lane.
* Removed unnecessary `OPERATOR()` calls in NEGATORs and the like.
* Added binary in/out functions, per Tom Lane.
* Added an explicit shell type to make the output a bit quieter.
* Converted tests to pure SQL and omitted multibyte tests (though a
  few remain commented-out).
* Reorganized and expanded the documentation a bit.

This version is far better than the one I started with, and I'm very
grateful for the feedback.


Now, I have a few remaining questions to ask, mostly just to get your  
opinions:


* The README for citext 1.0 on pgFoundry says:

I had to make a decision on casting between types for regular expressions and
decided that if any parameter is of citext type then case insensitive applies.
For example, applying regular expressions with a varchar and a citext will
produce a case-insensitive result.

Having thought about this afterwards I realised that since we have the option
to use case-insensitive results with regular expressions I should have left the
behaviour exactly as text and then you have the best of both worlds... oh well,
not hard to change for any of you perfectionists!


I followed the original and made all the regex and LIKE comparisons
case-insensitive. But maybe I should not have? Especially since the
regular expression functions (e.g., regexp_replace()) and a few
non-regex functions (e.g., replace()) still don't behave
case-insensitively.
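For reference, the core case-insensitive comparison can be sketched roughly like this in C. This is illustrative only: the real citext code lower-cases with the database locale and operates on text datums rather than C strings, and the helper names `fold_lower()` and `ci_cmp()` are invented here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Rough sketch of the citext comparison idea: fold both inputs to
 * lower case, then compare the folded copies.  Lower-casing can change
 * the string length in some locales, which is why the module passes
 * strlen() of the folded value to the comparison (per the list above).
 * These helper names are invented for illustration. */
static char *fold_lower(const char *s)
{
    size_t len = strlen(s);
    char *out = malloc(len + 1);

    for (size_t i = 0; i < len; i++)
        out[i] = (char) tolower((unsigned char) s[i]);
    out[len] = '\0';
    return out;
}

static int ci_cmp(const char *a, const char *b)
{
    char *la = fold_lower(a);
    char *lb = fold_lower(b);
    int r = strcmp(la, lb);

    free(la);
    free(lb);
    return r;
}
```

With folding applied before every comparison, `=`, `<`, ORDER BY, and hashing all become case-insensitive for free; the open question above is whether LIKE and the regex operators should go through the same fold.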


* If the answer is "no", how can I make those functions behave
  case-insensitively? (See the "TODO" tests.)


* Should there be any other casts? To and from name, perhaps?

Thanks!

David


citext4.patch.gz
Description: GNU Zip compressed data



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] WITH RECURSIVE updated to CVS TIP

2008-07-15 Thread David Fetter
On Wed, Jul 16, 2008 at 01:57:04PM +0900, Tatsuo Ishii wrote:
> > > No idea.  What do you think about allowing only one query name at
> > > the moment?
> > 
> > I'm not sure I understand what that has to do with sorting.
> > 
> > Please find attached a place where I've found some problems sorting
> > the tree by array as Asaba-san suggested.
> 
> Hmm, your problem seems to have nothing to do with the problem I'm
> referring to.

Sorry about that.  Is my problem reproducible?  Is there maybe some
way to include regression tests around it?

> What I have in my mind is something like:
> 
> WITH RECURSIVE foo(a, b) AS
> (SELECT ... UNION SELECT...),
> 
> bar(c, d) AS
> (SELECT ... FROM foo WHERE ...UNION...)
> SELECT * FROM foo;
> 
> In this example there are two query names (foo and bar), and we need to
> detect that bar relies on foo before processing the query.

I think mutually recursive queries may have been dropped from
SQL:2008.

> However, as I said earlier, this kind of use case would be rare in
> the real world, and I'd like to limit ourselves to having only one
> query name at the moment.
> 
> Also, I suggest concentrating on reviewing the WITH RECURSIVE
> implementation itself now, rather than discussing how to use the git
> repository or how to write interesting WITH RECURSIVE
> applications.
> 
> Don't get me wrong. I believe git is a great tool. But we have
> limited time and need to think about the priority.

Fair enough :)

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Sorting writes during checkpoint

2008-07-15 Thread Greg Smith

On Mon, 7 Jul 2008, ITAGAKI Takahiro wrote:


I plan to test it on RAID-5 disks, where sequential writes
are much better than random writes. I'll send the results as evidence.


If you're running more tests here, please turn on log_checkpoints and 
collect the logs while the test is running.  I'm really curious if there's 
any significant difference in what that reports here in the sorted case 
vs. the regular one.


The smoothed checkpoints in 8.3 spread out the write() calls but issue
the fsync() calls all at once. With sorted writes, we can call fsync()
segment by segment, once each segment's dirty pages have been written.
That could improve worst-case response time during checkpoints.


Further decreasing the amount of data that is fsync'd at any point in time 
might be a bigger improvement than just the sorting itself is doing (so 
far I haven't seen anything really significant just from the sort but am 
still testing).
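To illustrate why sorting matters for the fsync side at all, here is a toy C sketch of counting how many per-segment fsync calls a sorted write pass would issue. The constant and names are invented for illustration; real relation segments are RELSEG_SIZE blocks, not 4.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy sketch of write sorting with per-segment fsync: sort the dirty
 * block numbers, then walk them once; each time the segment number
 * (block / BLOCKS_PER_SEG) changes, the finished segment could be
 * fsync'd immediately.  BLOCKS_PER_SEG is made up for this example. */
#define BLOCKS_PER_SEG 4

static int cmp_uint(const void *a, const void *b)
{
    unsigned x = *(const unsigned *) a;
    unsigned y = *(const unsigned *) b;

    return (x > y) - (x < y);
}

/* Returns how many fsync calls the sorted write pass would issue:
 * one per distinct segment touched, rather than one big fsync storm
 * at the end of the checkpoint. */
static int count_segment_fsyncs(unsigned *blocks, size_t n)
{
    if (n == 0)
        return 0;
    qsort(blocks, n, sizeof(unsigned), cmp_uint);

    int fsyncs = 1;
    for (size_t i = 1; i < n; i++)
        if (blocks[i] / BLOCKS_PER_SEG != blocks[i - 1] / BLOCKS_PER_SEG)
            fsyncs++;
    return fsyncs;
}
```

Once the writes are sorted, each segment's dirty pages are contiguous in the pass, so the fsync for a segment covers only data written recently, which is the property that might bound worst-case latency.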


One thing I didn't see any comments from you on is how/if the sorted 
writes patch lowers worst-case latency.  That's the area I'd hope an 
improved fsync protocol would help most with, rather than TPS, which might 
even go backwards because writes won't be as bunched and therefore will 
involve more seeking.  It's easy enough to analyze the data coming from 
"pgbench -l" to figure that out; an example shell snippet that shows just 
the worst ones:


pgbench -l -N &   # run in the background so $! is set
p=$!
wait $p
mv pgbench_log.${p} pgbench.log
cut -f 3 -d " " pgbench.log | sort -n | tail

Actually graphing the latencies can be even more instructive; I have some 
examples of that on my web page, which you may have seen before.



In addition, the current smgr layer is completely useless because
it cannot be extended dynamically and cannot handle multiple md-layer
modules. I would rather merge the current smgr and part of bufmgr into
a new smgr and add an smgr_hook() than add a bulk_io_hook().


I don't know the code well enough to have a firm opinion on this 
specific suggestion, but I will say that I've found the amount of layering 
in this area makes it difficult to understand just what's going on 
sometimes (especially when new to it).  A lot of that abstraction felt a 
bit pass-through to me, and anything that would collapse it a bit would 
be helpful for streamlining the code instrumentation going on with things 
like DTrace.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PATCHES] [HACKERS] WITH RECURSIVE updated to CVS TIP

2008-07-15 Thread Tatsuo Ishii
> On Wed, Jul 16, 2008 at 09:37:25AM +0900, Tatsuo Ishii wrote:
> > > On Tue, Jul 08, 2008 at 06:01:05PM +0900, Tatsuo Ishii wrote:
> > > > Here is the patches he made against CVS HEAD (as of today).
> > > > 
> > > > According to him followings are fixed with the patches:
> > > > 
> > > > - fix crash with DISTINCT
> > > > - fix creating VIEW
> > > > - fix the case when a recursion plan has another recursion plan under it
> > > > - fix WITH RECURSIVE ...(..) SELECT ...WHERE.. returning wrong results
> > > > - fix infinite recursion with OUTER JOIN
> > > 
> > > Great!
> > > 
> > > I've patched psql for some partial support of WITH [RECURSIVE].
> > > 
> > > http://git.postgresql.org/?p=~davidfetter/postgresql/.git;a=commit;h=da63f9a82b9e902b5542f788b2e6e6bc95221793
> > 
> > Thanks. I will incorporate them into the proposed patches.
> 
> Part of the point of this exercise is to make git the way to do this.
> Can you please point me to a git repository where your latest changes
> are so I can publish them?
> 
> > > > Not yet fixed:
> > > > 
> > > > - detect certain queries that are not valid according to the standard
> > > > - sort query names according to the dependency
> > > 
> > > Is there something in the standard on how to do this?  How else
> > > might the nodes be sorted?
> > 
> > No idea.  What do you think about allowing only one query name at
> > the moment?
> 
> I'm not sure I understand what that has to do with sorting.
> 
> Please find attached a place where I've found some problems sorting
> the tree by array as Asaba-san suggested.

Hmm, your problem seems to have nothing to do with the problem I'm
referring to.

What I have in my mind is something like:

WITH RECURSIVE foo(a, b) AS
(SELECT ... UNION SELECT...),

bar(c, d) AS
(SELECT ... FROM foo WHERE ...UNION...)
SELECT * FROM foo;

In this example there are two query names (foo and bar), and we need to
detect that bar relies on foo before processing the query.
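That dependency detection amounts to a topological sort of the query names. A toy C sketch follows; the names, dependency matrix, and helpers are all invented for illustration, and note that real WITH RECURSIVE items may also reference themselves, which a real implementation must allow rather than reject as a cycle.

```c
#include <assert.h>
#include <string.h>

/* Toy sketch of the dependency ordering the parser would need: given
 * which query names each WITH item references, emit an order in which
 * every name is defined before the names that use it. */
#define NNAMES 2

static const char *names[NNAMES] = {"foo", "bar"};

/* depends[i][j] != 0 means names[i] references names[j] */
static const int depends[NNAMES][NNAMES] = {
    {0, 0},                     /* foo references nothing */
    {1, 0},                     /* bar references foo     */
};

static int visited[NNAMES];
static int order[NNAMES];
static int norder = 0;

/* Depth-first visit: define everything names[i] depends on first,
 * then emit names[i] itself. */
static void visit(int i)
{
    if (visited[i])
        return;
    visited[i] = 1;
    for (int j = 0; j < NNAMES; j++)
        if (depends[i][j])
            visit(j);
    order[norder++] = i;
}
```

Running visit() over all names yields foo before bar, i.e. the order in which the executor would have to materialize the two query names.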

However, as I said earlier, this kind of use case would be rare in the
real world, and I'd like to limit ourselves to having only one query
name at the moment.

Also, I suggest concentrating on reviewing the WITH RECURSIVE
implementation itself now, rather than discussing how to use the git
repository or how to write interesting WITH RECURSIVE applications.

Don't get me wrong. I believe git is a great tool. But we have limited
time and need to think about the priority.
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread David E. Wheeler

On Jul 15, 2008, at 20:26, Tom Lane wrote:


"David E. Wheeler" <[EMAIL PROTECTED]> writes:

So I guess my question is: what is wrong with the properties for
citextsend/citextrecv


[ checks catalogs... ] textsend and textrecv are marked STABLE not
IMMUTABLE.  I am not totally sure about the reasoning offhand --- it
might be because their behavior depends on client_encoding.


Thanks. Looks like maybe the xtypes docs need to be updated?

  http://www.postgresql.org/docs/8.3/static/xtypes.html

Anyway, changing them to "STABLE STRICT" appears to have done the  
trick (diff attached).



and what else might these failures be indicating
is wrong?


I think the other diffs are okay, they just reflect the fact that you're
depending on binary equivalence of text and citext.


Great, thanks. And with that, I think I'm just about ready to submit a  
new version of the patch, coming up shortly.


Best,

David


regression.diffs
Description: Binary data





Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-15 Thread Sushant Sinha
Attached are two patches for headline with fragments:

1. documentation
2. regression tests

-Sushant.

On Tue, 2008-07-15 at 13:29 +0400, Teodor Sigaev wrote:
> > Attached a new patch that:
> > 
> > 1. fixes previous bug
> > 2. better handles the case when the cover size is greater than MaxWords. 
> 
> Looks good, I'll make some tests with a real-world application.
> 
> > I have not yet added the regression tests. The regression test suite 
> > seemed to be only ensuring that the function works. How many tests 
> > should I be adding? Is there any other place that I need to add 
> > different test cases for the function?
> 
> Just add 3-5 selects to src/test/regress/sql/tsearch.sql checking basic 
> functionality and corner cases like
>   - there are no covers in the text
>   - a cover is too big
>   - and so on
> 
> Please add some words to the documentation too.
> 
> 
Index: doc/src/sgml/textsearch.sgml
===
RCS file: /home/postgres/devel/pgsql-cvs/pgsql/doc/src/sgml/textsearch.sgml,v
retrieving revision 1.44
diff -c -r1.44 textsearch.sgml
*** doc/src/sgml/textsearch.sgml	16 May 2008 16:31:01 -	1.44
--- doc/src/sgml/textsearch.sgml	16 Jul 2008 02:37:28 -
***
*** 1100,1105 
--- 1100,1117 
   
   

+MaxFragments: maximum number of text excerpts
+or fragments that match the query words. It also triggers a
+different headline generation function than the default one. This
+function finds text fragments with as many query words as possible.
+Each fragment will be at most MaxWords long and will not have words
+of length less than or equal to ShortWord at the start or end of a
+fragment. If not all query words are found in the document, then
+a single fragment of MinWords will be displayed.
+   
+  
+  
+   
 HighlightAll: Boolean flag;  if
 true the whole document will be highlighted.

***
*** 1109,1115 
  Any unspecified options receive these defaults:
  
  
! StartSel=, StopSel=, MaxWords=35, MinWords=15, ShortWord=3, HighlightAll=FALSE
  
 
  
--- 1121,1127 
  Any unspecified options receive these defaults:
  
  
! StartSel=, StopSel=, MaxFragments=0, MaxWords=35, MinWords=15, ShortWord=3, HighlightAll=FALSE
  
 
  
Index: src/test/regress/sql/tsearch.sql
===
RCS file: /home/postgres/devel/pgsql-cvs/pgsql/src/test/regress/sql/tsearch.sql,v
retrieving revision 1.9
diff -c -r1.9 tsearch.sql
*** src/test/regress/sql/tsearch.sql	16 May 2008 16:31:02 -	1.9
--- src/test/regress/sql/tsearch.sql	16 Jul 2008 03:45:24 -
***
*** 208,213 
--- 208,253 
  ',
  to_tsquery('english', 'sea&foo'), 'HighlightAll=true');
  
+ --Check if headline fragments work 
+ SELECT ts_headline('english', '
+ Day after day, day after day,
+   We stuck, nor breath nor motion,
+ As idle as a painted Ship
+   Upon a painted Ocean.
+ Water, water, every where
+   And all the boards did shrink;
+ Water, water, every where,
+   Nor any drop to drink.
+ S. T. Coleridge (1772-1834)
+ ', to_tsquery('english', 'ocean'), 'MaxFragments=1');
+ 
+ --Check if more than one fragment is displayed
+ SELECT ts_headline('english', '
+ Day after day, day after day,
+   We stuck, nor breath nor motion,
+ As idle as a painted Ship
+   Upon a painted Ocean.
+ Water, water, every where
+   And all the boards did shrink;
+ Water, water, every where,
+   Nor any drop to drink.
+ S. T. Coleridge (1772-1834)
+ ', to_tsquery('english', 'Coleridge & stuck'), 'MaxFragments=2');
+ 
+ --Fragments when not all query words are in the document
+ SELECT ts_headline('english', '
+ Day after day, day after day,
+   We stuck, nor breath nor motion,
+ As idle as a painted Ship
+   Upon a painted Ocean.
+ Water, water, every where
+   And all the boards did shrink;
+ Water, water, every where,
+   Nor any drop to drink.
+ S. T. Coleridge (1772-1834)
+ ', to_tsquery('english', 'ocean & seahorse'), 'MaxFragments=1');
+ 
+ 
  --Rewrite sub system
  
  CREATE TABLE test_tsquery (txtkeyword TEXT, txtsample TEXT);
Index: src/test/regress/expected/tsearch.out
===
RCS file: /home/postgres/devel/pgsql-cvs/pgsql/src/test/regress/expected/tsearch.out,v
retrieving revision 1.14
diff -c -r1.14 tsearch.out
*** src/test/regress/expected/tsearch.out	16 May 2008 16:31:02 -	1.14
--- src/test/regress/expected/tsearch.out	16 Jul 2008 03:47:46 -
***
*** 632,637 
--- 632,705 
   
  (1 row)
  
+ --Check if headline fragments work 
+ SELECT ts_headline('english', '
+ Day after day, day after day,
+   We stuck, nor breath nor motion,
+ As idle as a painted Ship
+   Upon a painted Ocean.
+ Water, water, every where
+   And all the boards did shrink;
+ Water, water,

Re: [HACKERS] Lookup penalty for VARIADIC patch

2008-07-15 Thread Decibel!

On Jul 15, 2008, at 4:58 PM, Tom Lane wrote:

There are two ways we
could do it: a bool column that is TRUE if the function is variadic,
or an oid column that is the variadic array's element type, or zero
if the function isn't variadic.  The second would take more space but
would avoid having to do a catalog lookup to get the element type in
the case that the function is indeed variadic.  I'm leaning to the
second way but wanted to know if anyone objected?


If you go the second route, I'd vote for it being NULL if the  
function isn't variadic, unless that would play hell with the C side  
of the catalog code...



Also, it occurs to me that we could buy back a good part of the extra
space if we allowed pg_proc.probin to be NULL for internal functions.
Right now it's always "-" in that case, which is useless ...



I'd vote for that being NULL in any case... magic values should be  
avoided when possible.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Lookup penalty for VARIADIC patch

2008-07-15 Thread Tom Lane
Decibel! <[EMAIL PROTECTED]> writes:
> On Jul 15, 2008, at 4:58 PM, Tom Lane wrote:
>> There are two ways we
>> could do it: a bool column that is TRUE if the function is variadic,
>> or an oid column that is the variadic array's element type, or zero
>> if the function isn't variadic.  The second would take more space but
>> would avoid having to do a catalog lookup to get the element type in
>> the case that the function is indeed variadic.  I'm leaning to the
>> second way but wanted to know if anyone objected?

> If you go the second route, I'd vote for it being NULL if the  
> function isn't variadic, unless that would play hell with the C side  
> of the catalog code...

Getting rid of the check for null is *exactly* the point here --- AFAICT
that's what's eating all the time in the existing code.

regards, tom lane



Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes:
> So I guess my question is: what is wrong with the  properties for  
> citextsend/citextrecv

[ checks catalogs... ] textsend and textrecv are marked STABLE not
IMMUTABLE.  I am not totally sure about the reasoning offhand --- it
might be because their behavior depends on client_encoding.

> and what else might these failures be indicating  
> is wrong?

I think the other diffs are okay, they just reflect the fact that you're
depending on binary equivalence of text and citext.

regards, tom lane



Re: [HACKERS] [PATCHES] WIP: executor_hook for pg_stat_statements

2008-07-15 Thread ITAGAKI Takahiro

Tom Lane <[EMAIL PROTECTED]> wrote:

> >> That raises the question of whether we should have ExecutorStart() and
> >> ExecutorEnd() hooks as well, to round things off.
> > Yeah, and also ExecutorRewind() hook.
> 
> I'm happy to put in hooks that there's a demonstrated need for,

Hmm, ok. I just want to hook ExecutorRun, so I'll just propose to
add ExecutorRun_hook now.

The attached patch is the proposal. It adds two global symbols:
  * ExecutorRun_hook - replacing behavior of ExecutorRun()
  * standard_ExecutorRun() - default behavior of ExecutorRun()

It also modifies one function:
  * ExecuteQuery() - passes the prepared query's text to the portal so
    that it is available at the executor level. This change is almost
    free because it copies only the string pointer, not the string buffer.
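The hook mechanics follow the usual pattern of a global function pointer plus an exported standard implementation that hooks can chain to. A self-contained C sketch of just that pattern is below; the signatures are simplified stand-ins, not the real declarations (the actual ExecutorRun takes a QueryDesc and more).

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the hook pattern the patch follows: if the global pointer
 * is set, it replaces the standard routine; the standard routine stays
 * exported so an installed hook can delegate to it. */
typedef int (*ExecutorRun_hook_type) (int arg);

static ExecutorRun_hook_type ExecutorRun_hook = NULL;

static int
standard_ExecutorRun(int arg)
{
    return arg * 2;             /* stands in for actually running the plan */
}

static int
ExecutorRun(int arg)
{
    if (ExecutorRun_hook != NULL)
        return (*ExecutorRun_hook) (arg);
    return standard_ExecutorRun(arg);
}

/* What an add-in like pg_stat_statements would install: gather its
 * statistics, then chain to the standard implementation. */
static int calls_seen = 0;

static int
counting_hook(int arg)
{
    calls_seen++;
    return standard_ExecutorRun(arg);
}
```

Assigning `ExecutorRun_hook = counting_hook;` at module load time is all an add-in needs to do; unhooking restores the saved previous value.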


The attached archive pg_stat_statements.tar.gz is a demonstration of
ExecutorRun_hook. It collects per-statement statistics: the number of
times planned and executed, plan cost, execution time, and buffer
gets/reads/writes. I'll be happy if the add-in is accepted as a contrib
module, but if it is not suitable, I'm willing to move it to pgFoundry.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



ExecutorRun_hook.patch
Description: Binary data


pg_stat_statements.tar.gz
Description: Binary data



Re: [PATCHES] [HACKERS] WITH RECURSIVE updated to CVS TIP

2008-07-15 Thread David Fetter
On Wed, Jul 16, 2008 at 09:37:25AM +0900, Tatsuo Ishii wrote:
> > On Tue, Jul 08, 2008 at 06:01:05PM +0900, Tatsuo Ishii wrote:
> > > Here is the patches he made against CVS HEAD (as of today).
> > > 
> > > According to him followings are fixed with the patches:
> > > 
> > > - fix crash with DISTINCT
> > > - fix creating VIEW
> > > - fix the case when a recursion plan has another recursion plan under it
> > > - fix WITH RECURSIVE ...(..) SELECT ...WHERE.. returning wrong results
> > > - fix infinite recursion with OUTER JOIN
> > 
> > Great!
> > 
> > I've patched psql for some partial support of WITH [RECURSIVE].
> > 
> > http://git.postgresql.org/?p=~davidfetter/postgresql/.git;a=commit;h=da63f9a82b9e902b5542f788b2e6e6bc95221793
> 
> Thanks. I will incorporate them into the proposed patches.

Part of the point of this exercise is to make git the way to do this.
Can you please point me to a git repository where your latest changes
are so I can publish them?

> > > Not yet fixed:
> > > 
> > > - detect certain queries that are not valid according to the standard
> > > - sort query names according to the dependency
> > 
> > Is there something in the standard on how to do this?  How else
> > might the nodes be sorted?
> 
> No idea.  What do you think about allowing only one query name at the
> moment?

I'm not sure I understand what that has to do with sorting.

Please find attached a place where I've found some problems sorting the
tree by array as Asaba-san suggested.

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
CREATE TABLE region (
region_id SERIAL PRIMARY KEY,
region_name TEXT NOT NULL,
region_type TEXT NOT NULL,
parent_region_id INTEGER REFERENCES region(region_id),
CHECK (
CASE WHEN region_type = 'Country' THEN 1 ELSE 0 END +
CASE WHEN parent_region_id IS NULL THEN 0 ELSE 1 END = 1
)
);

CREATE UNIQUE INDEX country_top_idx
ON region(region_name) WHERE region_type = 'Country';

CREATE UNIQUE INDEX must_have_parents_idx ON 
region(region_name, region_type, parent_region_id)
WHERE parent_region_id IS NOT NULL;

INSERT INTO region (region_name, region_type)
VALUES
('Brasil', 'Country'),
('United States', 'Country');

INSERT INTO region (parent_region_id, region_name, region_type)
SELECT r.region_id, v.*
FROM
region r
CROSS JOIN
(
VALUES 
('Centro-Oeste','Região'),
('Nordeste','Região'),
('Norte','Região'),
('Sudeste','Região'),
('Sul','Região')
) AS v
WHERE (r.region_name, r.region_type) = ('Brasil','Country');


INSERT INTO region (parent_region_id, region_name, region_type)
SELECT r.region_id, v.*
FROM
region r
CROSS JOIN
(
VALUES
('Goiás','Estado'),
('Mato Grosso','Estado'),
('Mato Grosso do Sul','Estado'),
('Distrito Federal','Distrito Federal')
) AS v
WHERE (r.region_name, r.region_type) = ('Centro-Oeste','Região');

INSERT INTO region (parent_region_id, region_name, region_type)
SELECT r.region_id, v.*
FROM
region r
CROSS JOIN
(
VALUES
('Maranhão','Estado'),
('Piauí','Estado'),
('Ceará','Estado'),
('Rio Grande do Norte','Estado'),
('Paraíba','Estado'),
('Pernambuco','Estado'),
('Alagoas','Estado'),
('Sergipe','Estado'),
('Bahia','Estado')
) AS v
WHERE (r.region_name, r.region_type) = ('Nordeste','Região');

INSERT INTO region (parent_region_id, region_name, region_type)
SELECT r.region_id, v.*
FROM
region r
CROSS JOIN
(
VALUES
('Acre','Estado'),
('Amazonas','Estado'),
('Roraima','Estado'),
('Rondônia','Estado'),
('Pará','Estado'),
('Amapá','Estado'),
('Tocantins','Estado')
) AS v
WHERE (r.region_name, r.region_type) = ('Norte','Região');

INSERT INTO region (parent_region_id, region_name, region_type)
SELECT r.region_id, v.*
FROM
region r
CROSS JOIN
(
VALUES
('Minas Gerais','Estado'),
('Espírito Santo','Estado'),
('Rio de Janeiro','Estado'),
('São Paulo','Estado')
) AS v
WHERE (r.region_name, r.region_type) = ('Sudeste','Região');

INSERT INTO region (parent_region_id, region_name, region_type)
SELECT r.region_id, v.*
FROM
region r
CROSS JOIN
(
VALUES
('Paraná','Estado'),
('Santa Catarina','Estado'),
('Rio Grande do Sul','Estado')
) AS v
WHERE (r.region_name, r.region_type) = ('Sul','Região');


INSERT INTO region (parent_region_id, region_name, region_type)
SELECT r.region_id, v.*
FROM
region r
CROSS JOIN
(
VALUES
('Goiânia','Cidade'),
('Aparecida de Goiânia','Cidade'),
('Anápolis','Cidade'),
('Luziânia','Cidade'),
('Á

Re: [HACKERS] rfc: add pg_dump options to dump output

2008-07-15 Thread Bruce Momjian

Is this a TODO item?

---

Greg Smith wrote:
> On Tue, 3 Jun 2008, Tom Lane wrote:
> 
> > Well, the stuff included into the dump by pg_dump -v is informative,
> > too.  But we stopped doing that by default because of complaints.
> > I remain unconvinced that this proposal won't suffer the same fate.
> 
> I think it would be reasonable to only include the list of options used in 
> the dump if you use one that changes what appears in the dump.  That way, 
> you wouldn't see anything by default.  But if you make a modification that 
> will likely break a diff with an existing dump done with the default 
> parameters, the option change that introduced that should show at the very 
> beginning.
> 
> --
> * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
> 

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread David E. Wheeler

On Jul 15, 2008, at 12:56, Tom Lane wrote:


Don't run the tests in a read-only directory, perhaps.


Yes, I changed the owner to the postgres system user and that did the  
trick.





Or do they matter for sanity-checking citext?


Hard to tell --- I'd suggest trying to get a clean run.  As for what you
have, the first diff hunk suggests you've got the wrong function
properties for citextsend/citextrecv.


Here's the new diff:

*** ./expected/opr_sanity.out   Mon Jul 14 21:55:49 2008
--- ./results/opr_sanity.outTue Jul 15 17:41:03 2008
***
*** 87,94 
   p1.provolatile != p2.provolatile OR
   p1.pronargs != p2.pronargs);
   oid | proname | oid | proname
! -+-+-+-
! (0 rows)

  -- Look for uses of different type OIDs in the argument/result type  
fields

  -- for different aliases of the same built-in function.
--- 87,96 
   p1.provolatile != p2.provolatile OR
   p1.pronargs != p2.pronargs);
   oid  | proname  |  oid  |  proname
! --+--+---+
!  2414 | textrecv | 87258 | citextrecv
!  2415 | textsend | 87259 | citextsend
! (2 rows)

  -- Look for uses of different type OIDs in the argument/result type  
fields

  -- for different aliases of the same built-in function.
***
*** 110,117 
   prorettype | prorettype
  +
   25 |   1043
 1114 |   1184
! (2 rows)

  SELECT DISTINCT p1.proargtypes[0], p2.proargtypes[0]
  FROM pg_proc AS p1, pg_proc AS p2
--- 112,120 
   prorettype | prorettype
  +
   25 |   1043
+  25 |  87255
 1114 |   1184
! (3 rows)

  SELECT DISTINCT p1.proargtypes[0], p2.proargtypes[0]
  FROM pg_proc AS p1, pg_proc AS p2
***
*** 124,133 
  -+-
25 |1042
25 |1043
  1114 |1184
  1560 |1562
  2277 |2283
! (5 rows)

  SELECT DISTINCT p1.proargtypes[1], p2.proargtypes[1]
  FROM pg_proc AS p1, pg_proc AS p2
--- 127,138 
  -+-
25 |1042
25 |1043
+   25 |   87255
+ 1042 |   87255
  1114 |1184
  1560 |1562
  2277 |2283
! (7 rows)

  SELECT DISTINCT p1.proargtypes[1], p2.proargtypes[1]
  FROM pg_proc AS p1, pg_proc AS p2
***
*** 139,148 
   proargtypes | proargtypes
  -+-
23 |  28
  1114 |1184
  1560 |1562
  2277 |2283
! (4 rows)

  SELECT DISTINCT p1.proargtypes[2], p2.proargtypes[2]
  FROM pg_proc AS p1, pg_proc AS p2
--- 144,154 
   proargtypes | proargtypes
  -+-
23 |  28
+   25 |   87255
  1114 |1184
  1560 |1562
  2277 |2283
! (5 rows)

  SELECT DISTINCT p1.proargtypes[2], p2.proargtypes[2]
  FROM pg_proc AS p1, pg_proc AS p2
***
*** 305,311 
  142 | 25 |0 | a
  142 |   1043 |0 | a
  142 |   1042 |0 | a
! (6 rows)

  --  pg_operator 
  -- Look for illegal values in pg_operator fields.
--- 311,318 
  142 | 25 |0 | a
  142 |   1043 |0 | a
  142 |   1042 |0 | a
!   87255 |   1042 |0 | a
! (7 rows)

  --  pg_operator 
  -- Look for illegal values in pg_operator fields.

==

So I guess my question is: what is wrong with the properties for
citextsend/citextrecv, and what else might these failures be indicating
is wrong?


CREATE OR REPLACE FUNCTION citextrecv(internal)
RETURNS citext
AS 'textrecv'
LANGUAGE 'internal' IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION citextsend(citext)
RETURNS bytea
AS 'textsend'
LANGUAGE 'internal' IMMUTABLE STRICT;

CREATE TYPE citext (
INPUT  = citextin,
OUTPUT = citextout,
RECEIVE= citextrecv,
SEND   = citextsend,
INTERNALLENGTH = VARIABLE,
STORAGE= extended
);

Thanks,

David



Re: [PATCHES] [HACKERS] WITH RECURSIVE updated to CVS TIP

2008-07-15 Thread Tatsuo Ishii
> On Tue, Jul 08, 2008 at 06:01:05PM +0900, Tatsuo Ishii wrote:
> > Here is the patches he made against CVS HEAD (as of today).
> > 
> > According to him followings are fixed with the patches:
> > 
> > - fix crash with DISTINCT
> > - fix creating VIEW
> > - fix the case when a recursion plan has another recursion plan under it
> > - fix WITH RECURSIVE ...(..) SELECT ...WHERE.. returning wrong results
> > - fix infinite recursion with OUTER JOIN
> 
> Great!
> 
> I've patched psql for some partial support of WITH [RECURSIVE].
> 
> http://git.postgresql.org/?p=~davidfetter/postgresql/.git;a=commit;h=da63f9a82b9e902b5542f788b2e6e6bc95221793

Thanks. I will incorporate them into the proposed patches.

> > Not yet fixed:
> > 
> > - detect certain queries that are not valid according to the standard
> > - sort query names according to the dependency
> 
> Is there something in the standard on how to do this?  How else might
> the nodes be sorted?

No idea. What do you think about allowing only one query name at the
moment? I guess most WITH RECURSIVE use cases are fine with a single
query name.
--
Tatsuo Ishii
SRA OSS, Inc. Japan



[HACKERS] gsoc, store hash index tuple with hash code only

2008-07-15 Thread Xiao Meng
Hi, hackers.
I'm working on my gsoc project of improving hash index.
I posted a thread a few hours ago asking some questions about the implementation.
http://archives.postgresql.org/pgsql-hackers/2008-07/msg00721.php

Someone's advice reminded me of the design decision problem again.
Currently I store the hash index tuple as a regular index tuple.  An
alternative design is to totally change the layout of the hash index
tuple (using a different structure).
Here is my design.
We convert the hash code into a datum using UINT32GetDatum() and store
it in a regular index tuple.
When we check the tuple, we get the hash key back using DatumGetUINT32().
We also need a TupleDesc with a uint32 attribute; I don't know how to
create one, so for now I create a TupleDesc with an int32 attribute
instead.
The hash items should be kept in order to support binary search later.
I haven't implemented that yet because I'd like to finish the first
working version ASAP.

I post a patch below. It hasn't passed the regression tests yet; I'm
posting it only to show the design.
Any comments are welcome.

diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 6a5c000..fbbd7dc 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -129,7 +129,11 @@ hashbuildCallback(Relation index,
IndexTuple itup;

/* form an index tuple and point it at the heap tuple */
+#ifdef HASHVALUE_ONLY
+itup = _hash_form_tuple(index, values,isnull);
+#else
itup = index_form_tuple(RelationGetDescr(index), values, isnull);
+#endif
itup->t_tid = htup->t_self;

/* Hash indexes don't index nulls, see notes in hashinsert */
@@ -171,7 +175,12 @@ hashinsert(PG_FUNCTION_ARGS)
IndexTuple itup;

/* generate an index tuple */
+#ifdef HASHVALUE_ONLY
+itup = _hash_form_tuple(rel,values,isnull);
+#else
itup = index_form_tuple(RelationGetDescr(rel), values, isnull);
+#endif
+
itup->t_tid = *ht_ctid;

/*
@@ -212,7 +221,11 @@ hashgettuple(PG_FUNCTION_ARGS)
bool res;

/* Hash indexes are never lossy (at the moment anyway) */
- scan->xs_recheck = false;
+#ifdef HASHVALUE_ONLY
+ scan->xs_recheck = true;
+#else
+ scan->xs_recheck = false;
+#endif

/*
* We hold pin but not lock on current buffer while outside the hash AM.
diff --git a/src/backend/access/hash/hashpage.c
b/src/backend/access/hash/hashpage.c
index b0b5874..068337e 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -785,7 +785,7 @@ _hash_splitbucket(Relation rel,
OffsetNumber omaxoffnum;
Page opage;
Page npage;
- TupleDesc itupdesc = RelationGetDescr(rel);
+ TupleDesc itupdesc = _create_hash_desc();

/*
* It should be okay to simultaneously write-lock pages from each bucket,
@@ -854,9 +854,13 @@ _hash_splitbucket(Relation rel,
itup = (IndexTuple) PageGetItem(opage, PageGetItemId(opage, ooffnum));
datum = index_getattr(itup, 1, itupdesc, &null);
Assert(!null);
-
+#ifdef HASHVALUE_ONLY
+ bucket = _hash_hashkey2bucket(DatumGetUInt32(datum),
+  maxbucket, highmask, lowmask);
+#else
bucket = _hash_hashkey2bucket(_hash_datum2hashkey(rel, datum),
 maxbucket, highmask, lowmask);
+#endif

if (bucket == nbucket)
{
diff --git a/src/backend/access/hash/hashsearch.c
b/src/backend/access/hash/hashsearch.c
index 258526b..5e7668d 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -177,7 +177,7 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
else
hashkey = _hash_datum2hashkey_type(rel, cur->sk_argument,
  cur->sk_subtype);
-
+so->hashso_sk_hash = hashkey;
/*
* Acquire shared split lock so we can compute the target bucket safely
* (see README).
diff --git a/src/backend/access/hash/hashutil.c
b/src/backend/access/hash/hashutil.c
index 41e2eef..4be814e 100644
--- a/src/backend/access/hash/hashutil.c
+++ b/src/backend/access/hash/hashutil.c
@@ -20,6 +20,8 @@
#include "executor/execdebug.h"
#include "storage/bufmgr.h"
#include "utils/lsyscache.h"
+#include "utils/typcache.h"
+#include "catalog/pg_type.h"


/*
@@ -28,16 +30,29 @@
bool
_hash_checkqual(IndexScanDesc scan, IndexTuple itup)
{
- TupleDesc tupdesc = RelationGetDescr(scan->indexRelation);
+ TupleDesc tupdesc = _create_hash_desc();
ScanKey key = scan->keyData;
int scanKeySize = scan->numberOfKeys;
+Datum datum;
+bool isNull;
+HashScanOpaque  so = scan->opaque;

IncrIndexProcessed();

+#ifdef HASHVALUE_ONLY
+datum = index_getattr(itup,
+  key->sk_attno,
+  tupdesc,
+  &isNull);
+if( so->hashso_sk_hash != DatumGetInt32(datum) )
+return false;
+
+key++;
+scanKeySize--;
+#endif
+
while (scanKeySize > 0)
{
- Datum datum;
- bool isNull;
Datum test;

datum = index_getattr(itup,
@@ -50,7 +65,7 @@ _hash_checkqual(IndexScanDesc scan, IndexTuple itup)
return false;
if (key->sk_flags & SK_ISNULL)
return false;
-
+
test = FunctionCall2(&key->sk_func, datum, key->sk_argument);

if (!DatumGetBool(test))
@@ -222,3 +237,31 @@ hashoptions(PG_FUNCTION_ARGS)
PG_RETURN_BYTEA

Re: [HACKERS] Lookup penalty for VARIADIC patch

2008-07-15 Thread Tom Lane
"Pavel Stehule" <[EMAIL PROTECTED]> writes:
> 2008/7/15 Tom Lane <[EMAIL PROTECTED]>:
>> Also, it occurs to me that we could buy back a good part of the extra
>> space if we allowed pg_proc.probin to be NULL for internal functions.
>> Right now it's always "-" in that case, which is useless ...

> probin is used in some unofficial PL hacks, so this space is
> sometimes used.

Sure, if you want to use it you can.  I'm just saying we should allow it
to be really NULL, instead of a dummy value, when it isn't being used.

regards, tom lane



Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes:
> Well now that was cool to see. I got some failures, of course, but  
> nothing stands out to me as an obvious bug. I attach the diffs file  
> (with the citext.sql failure removed) for your perusal. What would be  
> the best way for me to resolve those permission issues?

Don't run the tests in a read-only directory, perhaps.

> Or do they matter for sanity-checking citext?

Hard to tell --- I'd suggest trying to get a clean run.  As for what you
have, the first diff hunk suggests you've got the wrong function
properties for citextsend/citextrecv.

regards, tom lane



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-07-15 Thread Bruce Momjian
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > Alvaro Herrera wrote:
> > > Bruce Momjian wrote:
> > > 
> > > > Added to TODO:
> > > > 
> > > > o Reduce PITR WAL file size by removing full page writes and
> > > >   by removing trailing bytes to improve compression
> > > 
> > > If we remove full page writes, how does hint bit setting get propagated
> > > to the slave?
> > 
> > We would remove full page writes that are needed for crash recovery, but
> > perhaps keep other full pages.
> 
> How do you tell which is which?

The WAL format would have to be modified to indicate which entries can
be discarded.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-07-15 Thread Alvaro Herrera
Bruce Momjian wrote:
> Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > 
> > > Added to TODO:
> > > 
> > > o Reduce PITR WAL file size by removing full page writes and
> > >   by removing trailing bytes to improve compression
> > 
> > If we remove full page writes, how does hint bit setting get propagated
> > to the slave?
> 
> We would remove full page writes that are needed for crash recovery, but
> perhaps keep other full pages.

How do you tell which is which?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-07-15 Thread Bruce Momjian
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > Added to TODO:
> > 
> > o Reduce PITR WAL file size by removing full page writes and
> >   by removing trailing bytes to improve compression
> 
> If we remove full page writes, how does hint bit setting get propagated
> to the slave?

We would remove full page writes that are needed for crash recovery, but
perhaps keep other full pages.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Lookup penalty for VARIADIC patch

2008-07-15 Thread Pavel Stehule
2008/7/15 Tom Lane <[EMAIL PROTECTED]>:
> The proposed variadic-functions patch inserts some none-too-cheap code
> into FuncnameGetCandidates (it's deconstructing the proargmodes column
> to see if the function is variadic or not) which gets executed whether
> or not there are any variadic functions involved.  I checked whether
> this would cause a noticeable slowdown in practice, and got a
> discouraging answer:
>
> $ cat functest.sql
> select sin(5), cos(45);
> $ pgbench -c 1 -t 1 -n -f functest.sql regression
> transaction type: Custom query
> scaling factor: 1
> query mode: simple
> number of clients: 1
> number of transactions per client: 1
> number of transactions actually processed: 1/1
> tps = 927.418555 (including connections establishing)
> tps = 928.953281 (excluding connections establishing)
>
> That's with the patch.  CVS HEAD gets
> tps = 1017.901218 (including connections establishing)
> tps = 1019.724948 (excluding connections establishing)
>
> so that code is adding about 10% to the total round-trip execution time
> for the select --- considering all the other overhead involved there,
> that means the actual cost of FuncnameGetCandidates has gone up probably
> by an order of magnitude.  And that's for the *best* case, where
> proargmodes is null so SysCacheGetAttr will fall out without returning
> an array to examine.  This doesn't seem acceptable to me.
>
> What I'm thinking of doing is adding a column to pg_proc that provides
> the needed info in a trivial-to-get-at format.  There are two ways we
> could do it: a bool column that is TRUE if the function is variadic,
> or an oid column that is the variadic array's element type, or zero
> if the function isn't variadic.  The second would take more space but
> would avoid having to do a catalog lookup to get the element type in
> the case that the function is indeed variadic.  I'm leaning to the
> second way but wanted to know if anyone objected?
>
> Also, it occurs to me that we could buy back a good part of the extra
> space if we allowed pg_proc.probin to be NULL for internal functions.
> Right now it's always "-" in that case, which is useless ...

probin is used in some unofficial PL hacks, so this space is sometimes
used. I vote for a special column that contains the variadic element
type.

Regards
Pavel Stehule
>
>regards, tom lane



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-07-15 Thread Alvaro Herrera
Bruce Momjian wrote:

> Added to TODO:
> 
> o Reduce PITR WAL file size by removing full page writes and
>   by removing trailing bytes to improve compression

If we remove full page writes, how does hint bit setting get propagated
to the slave?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Lookup penalty for VARIADIC patch

2008-07-15 Thread Tom Lane
The proposed variadic-functions patch inserts some none-too-cheap code
into FuncnameGetCandidates (it's deconstructing the proargmodes column
to see if the function is variadic or not) which gets executed whether
or not there are any variadic functions involved.  I checked whether
this would cause a noticeable slowdown in practice, and got a
discouraging answer:

$ cat functest.sql
select sin(5), cos(45);
$ pgbench -c 1 -t 1 -n -f functest.sql regression
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of transactions per client: 1
number of transactions actually processed: 1/1
tps = 927.418555 (including connections establishing)
tps = 928.953281 (excluding connections establishing)

That's with the patch.  CVS HEAD gets
tps = 1017.901218 (including connections establishing)
tps = 1019.724948 (excluding connections establishing)

so that code is adding about 10% to the total round-trip execution time
for the select --- considering all the other overhead involved there,
that means the actual cost of FuncnameGetCandidates has gone up probably
by an order of magnitude.  And that's for the *best* case, where
proargmodes is null so SysCacheGetAttr will fall out without returning
an array to examine.  This doesn't seem acceptable to me.

What I'm thinking of doing is adding a column to pg_proc that provides
the needed info in a trivial-to-get-at format.  There are two ways we
could do it: a bool column that is TRUE if the function is variadic,
or an oid column that is the variadic array's element type, or zero
if the function isn't variadic.  The second would take more space but
would avoid having to do a catalog lookup to get the element type in
the case that the function is indeed variadic.  I'm leaning to the
second way but wanted to know if anyone objected?

Also, it occurs to me that we could buy back a good part of the extra
space if we allowed pg_proc.probin to be NULL for internal functions.
Right now it's always "-" in that case, which is useless ...

regards, tom lane



Re: [HACKERS] Fwd: Proposal - UUID data type

2008-07-15 Thread Andrew Dunstan



Kless wrote:

I'm sorry, but it was necessary that certain answers be given by
someone with deep knowledge of databases, and above all of their own
database. This was the only way, and I believe it has been positive on
balance, at least for the end users - everyone who chooses their own
database. At least it clarifies how each community works, and what is
true and what is not.



Nonsense. It was not at all necessary.

If someone wants to post on this mailing list they should do it 
themselves. If not, you shouldn't cross-post for them.


cheers

andrew






Re: [HACKERS] Fwd: Proposal - UUID data type

2008-07-15 Thread Kless
I'm sorry, but it was necessary that certain answers be given by
someone with deep knowledge of databases, and above all of their own
database. This was the only way, and I believe it has been positive on
balance, at least for the end users - everyone who chooses their own
database. At least it clarifies how each community works, and what is
true and what is not.

On Jul 15, 6:45 pm, [EMAIL PROTECTED] (Abhijit Menon-Sen) wrote:
> At 2008-07-15 08:34:01 -0700, [EMAIL PROTECTED] wrote:
>
>
>
> > An answer of Jerry Stuckle:
>
> Please stop cross-posting messages from this list to whatever MySQL list
> you're on. It's a boring, pointless waste of time at best, and at worst
> will get you written off as a troll in both places pretty soon.
>



Re: [HACKERS] Fwd: Proposal - UUID data type

2008-07-15 Thread Mark Mielke
First - please stop copying this list - this is not the "convince Jerry 
to include UUID in MySQL" mailing list.


Really - I don't care what he thinks. But, on the subjects themselves 
and how they apply to *PostgreSQL*:



> Non-standard features just force people to stick with that one product.
> In the long run, the only people who benefit are the product developers.


I chose PostgreSQL over MySQL because it provided numerous features - 
both standard and non-standard - that I needed on the day I made my 
decision. As a user, I don't care about the long run. One might as well 
say 90% of the world is wrong for using Microsoft products, because it 
locks one into Microsoft. One can say this - and people do say this - 
but none of it changes the fact that 90% of the world is relatively 
happy with their choice. They voted with their dollars. All decisions 
should be made on a cost-benefit analysis; they should not be based on 
some arbitrary creed like "I will not choose a solution that locks me in".


Additionally - in the context of MySQL - the main reason I chose 
PostgreSQL over MySQL is that it provided things like CREATE VIEW, 
which MySQL did not at the time. People such as Jerry can pretend that 
standards guarantee that a feature is in all products, but it seems 
quite clear that just because something is a standard does NOT mean it 
is implemented the same everywhere, or even at all. At the time I chose 
PostgreSQL, it was my opinion that PostgreSQL was far more 
standards-compliant than MySQL was going to be for at least a few years. 
I am glad I came to the correct conclusion. MySQL implemented ACID as an 
afterthought. I mean - come on.



> This is incorrect. UUID at 16 bytes is already "long" in terms of being
> used as a primary index. In an 8K page, one can only fit 512 UUIDs
> (forgetting the requirement for headers) - if it was stored as 32 bytes
> - or 36 bytes, or 40 bytes (with punctuation), it would be at less than
> 256 UUIDs per page. For a join table joining one set of UUIDs to
> another set, that's < 256 vs < 128. Doubling the size of an index row
> roughly doubles the time to look up the value.



> Incorrect. Doubling the size of the index has very little effect on how
> long it takes to look up a value. Intelligent databases use a binary
> search, so doubling the size only means one additional comparison need
> be done. And heavily used indexes are generally cached in memory anyway.


Wrong. A binary search that must read double the number of pages, and 
compare double the number of bytes, will take double the amount of time. 
There are factors that reduce this: if you assume most of the pages are 
in memory or cache memory, the time to read a page is effectively zero, 
leaving only the time to compare bytes - but at that point the majority 
of the time is spent comparing bytes, and the claim is still wrong. Add 
in the fact that a fixed-size UUID can be compared with a possibly 
inlined memcmp(), whereas comparing it as a variable-sized string is 
much harder to inline (double the number of operations), and it's pretty 
clear that anyone who would make the statement above is wrong.


As another poster wrote - why not double the size of all other data 
structures too. It costs nothing, right?


Why does MySQL have a 3-byte integer support if they truly believe that 
saving 1 byte in 4 doesn't result in a savings for keys?


Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>



Re: [HACKERS] Postgres-R source code release

2008-07-15 Thread Markus Wanner

Hi,

Alvaro Herrera wrote:

I think the
way to go here is to have Markus open up his Monotone repo, or convince
him to migrate it to Git, but I really doubt that's ever going to
happen.


He he... good guess ;-)

However, as much as I personally like monotone, and as much as I dislike
git for being a bad rip-off of monotone, git certainly has its merits:
it has a much larger user base and is faster than monotone for some 
operations.


Having used Subversion, CVS and monotone for Postgres-R, I think I'm now
giving git a try. I'm about to set up the necessary repositories, but I'm 
still having a hard time grokking this tool set. (And yes, I'm missing 
some features compared to monotone. In our case, the most obvious one is 
'mtn diff --context' ;-) )


Regards

Markus



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-07-15 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Agreed.  I realize why we are not zeroing those bytes (for performance),
> > but can't we have the archiver zero those bytes before calling the
> > 'archive_command'?
> 
> The archiver doesn't know any more about where the end-of-data is than
> the archive_command does.  Moreover, the archiver doesn't know whether
> the archive_command cares.  I think the separate module is a fine
> solution.
> 
> It should also be pointed out that the whole thing becomes uninteresting
> if we get real-time log shipping implemented.  So I see absolutely no
> point in spending time integrating pg_clearxlogtail now.

People doing PITR are still going to be saving these files, and for a
long time, so I think this is still something we should try to address.

Added to TODO:

o Reduce PITR WAL file size by removing full page writes and
  by removing trailing bytes to improve compression

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread David E. Wheeler

On Jul 15, 2008, at 07:09, Tom Lane wrote:


Yeah, probably.  I don't think the "make check" path will support it
because it doesn't install contrib into the temp installation.
(You'd also need to have put the extra entry in parallel_schedule
not serial_schedule, but it's not gonna work anyway.)


Well now that was cool to see. I got some failures, of course, but  
nothing stands out to me as an obvious bug. I attach the diffs file  
(with the citext.sql failure removed) for your perusal. What would be  
the best way for me to resolve those permission issues? Or do they  
matter for sanity-checking citext?


Thanks,

David


regression.diffs
Description: Binary data






Re: [HACKERS] Fwd: Proposal - UUID data type

2008-07-15 Thread Abhijit Menon-Sen
At 2008-07-15 08:34:01 -0700, [EMAIL PROTECTED] wrote:
>
> An answer of Jerry Stuckle:

Please stop cross-posting messages from this list to whatever MySQL list
you're on. It's a boring, pointless waste of time at best, and at worst
will get you written off as a troll in both places pretty soon.

-- ams



Re: [HACKERS] Fwd: Proposal - UUID data type

2008-07-15 Thread Kless
An answer of Jerry Stuckle:

---
>> Yes, they must be managed by the language.  Which is why it should be
>> part of the standard.  That way, changing databases does not require
>> changing code.

> You are correct that putting widely used features into a standard that
> is implemented by everyone is good.

> This does not extend to the conclusion that one should never put in a
> feature until it is standard. Look at any successful software product
> and see how it usually leads the standard rather than follows it.
> People only tend to make standards once they realize things are getting
> out of control, which is long after the products are in use.

Non-standard features just force people to stick with that one product.
In the long run, the only people who benefit are the product developers.

>>> In PostgreSQL they're stored as 16 binary bytes [2], and the core
>>> database does not include any function for generating UUIDs

>> Yep, which in the grand scheme of things, probably makes zero
>> difference.  The difference between 16 and 32 bytes in any single row
>> is minuscule.

> This is incorrect. UUID at 16 bytes is already "long" in terms of being
> used as a primary index. In an 8K page, one can only fit 512 UUIDs
> (forgetting the requirement for headers) - if it was stored as 32 bytes
> - or 36 bytes, or 40 bytes (with punctuation), it would be at less than
> 256 UUIDs per page. For a join table joining one set of UUIDs to
> another set, that's < 256 vs < 128. Doubling the size of an index row
> roughly doubles the time to look up the value.

Incorrect.  Doubling the size of the index has very little effect on how
long it takes to look up a value.  Intelligent databases use a binary
search, so doubling the size only means one additional comparison need
be done.  And heavily used indexes are generally cached in memory anyway.

>> I am not in favor of adding more database-specific types to ANY
>> database
>> - and I think PostGres doing it was a mistake.

> As somebody who wrote his own module to do UUID for PostgreSQL when I
> needed it in PostgreSQL 8.0, I don't agree. Just as you think defining
> it in a standard is better than each vendor doing it their own way, I
> think doing it in one product is better than each user of the product
> doing it their own way.

Fine.  Whatever you want for your code.  But don't expect the rest of
the world to jump because you want it.

>> If there is a demand for it, then it should be added to the SQL
>> standard.  That is the correct way to propose a change.  That's why
>> there are standards.

> Provide a real example of any similar product doing this. Exactly
> which enhancement to a standard was defined without even a prototype
> existing in a product that purports to implement the standard?

> I'm sure one or two examples must exist, but I cannot think of any.
> Every enhancement I can think of that eventually made it into a
> standard was first implemented within a popular product, and then
> demanded as a standard to be applied to all other products.


Most features added to the SQL standard, for instance.  Like explicit
JOINs, recursive SQL and a bunch more.  Also, changes to the C++
standard such as exceptions were at least in the process of being
evaluated and approved before they were in any product.

There's a reason for having a process to propose features to a product.
And it does not require the proposed change to be in any product.

--
==
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
[EMAIL PROTECTED]
==
---



[HACKERS] Postgres-R: current state of development

2008-07-15 Thread Markus Wanner

Hi,

After having published the source code, I'd like to add some words about 
the current state of the project.


Postgres-R is currently capable of replicating tuples via binary change 
sets, and it does proper conflict detection and prevention. It offers 
three different timing methods: sync, eager and lazy. Of those, the 
eager method is the most advanced, and the one I've been focusing on. 
However, for fully synchronous replication, we mainly need to be able 
to handle multiple change sets per transaction. This will be necessary 
anyway to support long-running transactions, because it simply doesn't 
make sense to hold back a huge change set and send it just before the 
commit.


A pretty general framework for helper processes is provided. I think 
this framework could be used for parallel querying or data loading as 
well. The helper processes are ordinary backends which process a single 
transaction at a time, but they don't have a client connection; instead 
they communicate with a manager via a messaging module based on shared 
memory and signals. Within Postgres-R, these helper backends are mostly 
called 'remote backends', which is a somewhat misleading name. It's just 
a short name for a helper backend that processes a remote transaction.


I've written interfaces to Ensemble, Spread and an emulated GCS for 
testing purposes. The Spread interface is still lacking functionality; 
the other two should work fine. None of the interfaces depends on 
external libraries, because I have added asynchronous clients, which 
none of the available libraries for Ensemble or Spread offered, but 
which the replication manager requires.


Sequence increments are replicated just fine and sequences feature an 
additional per-node cache. The setval() functionality is still missing, 
though.


Recovery and initialization must still be done manually, although I've 
already done much of the work to synchronize table data. A daunting task 
will be the synchronization of the system catalogs; Postgres-R currently 
cannot replicate any DDL command.


Compared with the WAL log shipping method mentioned in the core team 
statement about built-in replication, this is certainly the longer way 
to go. But on the other hand it isn't limited to single-master 
replication and certainly offers more options for future extensions.


Regards

Markus



Re: [HACKERS] [PATCH] "\ef " in psql

2008-07-15 Thread Abhijit Menon-Sen
At 2008-07-15 10:33:02 -0400, [EMAIL PROTECTED] wrote:
>
> I doubt we'd consider accepting a patch done this way.

Yes, it's much too ugly to live. I was posting it only for the record,
I should have made that clear.

-- ams



Re: [HACKERS] [PATCHES] WIP: executor_hook for pg_stat_statements

2008-07-15 Thread Tom Lane
ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> Simon Riggs <[EMAIL PROTECTED]> wrote:
>>> Also, after looking at the patch more closely, was there a good reason
>>> for making the hook intercept ExecutePlan rather than ExecutorRun?
>> 
>> That raises the question of whether we should have ExecutorStart() and
>> ExecutorEnd() hooks as well, to round things off.

> Yeah, and also ExecutorRewind() hook.

I'm not impressed by this line of argument.  If we start putting in
hooks just because someone might need 'em someday, we'd soon end up with
hundreds or thousands of mostly-useless hooks.  I'm happy to put in
hooks that there's a demonstrated need for, but I don't believe that
"replace the executor without touching the core code" is a sane goal.
Even if it were, the API of the executor to the rest of the system
is a whole lot wider than four functions.

regards, tom lane



Re: [HACKERS] Is there anyway to create a TupleDesc with uint32 attribute easily?

2008-07-15 Thread Jonah H. Harris
On Tue, Jul 15, 2008 at 5:56 AM, Xiao Meng <[EMAIL PROTECTED]> wrote:
> Hi, hackers.
> I'm working on my gsoc project - improving hash index.

Given the amount of time left for SoC, can you please post your
proposed design to -hackers for review and comments (under a new topic)?

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
Edison, NJ 08837 | http://www.enterprisedb.com/



Re: [HACKERS] [PATCH] "\ef " in psql

2008-07-15 Thread Tom Lane
Abhijit Menon-Sen <[EMAIL PROTECTED]> writes:
> Refactoring pg_dump was more work than I had time to do right now, and I
> wanted \ef to work, so I hacked up the attached (by copying dumpFunc and
> its dependencies to src/bin/psql/dumpfunc.[ch]).

I doubt we'd consider accepting a patch done this way.

regards, tom lane



Re: [HACKERS] Is there anyway to create a TupleDesc with uint32 attribute easily?

2008-07-15 Thread Jonah H. Harris
On Tue, Jul 15, 2008 at 10:22 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
>> I would actually suggest ditching the normal page layout with line
>> pointers and IndexTuples altogether.
>
> It would be a bad idea to remove the page header.  You need the LSN
> (someday hash indexes will obey WAL logging), and removing the page
> version number seems pretty imprudent also.  I'd be inclined to
> stick with the regular header and the existing "special space"
> definition.  But yeah, what's in between need not necessarily be
> regular tuples.

Agreed.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
Edison, NJ 08837 | http://www.enterprisedb.com/



Re: [HACKERS] Is there anyway to create a TupleDesc with uint32 attribute easily?

2008-07-15 Thread Tom Lane
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> I would actually suggest ditching the normal page layout with line 
> pointers and IndexTuples altogether.

It would be a bad idea to remove the page header.  You need the LSN
(someday hash indexes will obey WAL logging), and removing the page
version number seems pretty imprudent also.  I'd be inclined to
stick with the regular header and the existing "special space"
definition.  But yeah, what's in between need not necessarily be
regular tuples.

regards, tom lane



Re: [HACKERS] PATCH: CITEXT 2.0 v3

2008-07-15 Thread Tom Lane
"David E. Wheeler" <[EMAIL PROTECTED]> writes:
> Okay, I copied citext.sql into src/test/regress/sql and then added  
> "test: citext" to the top of src/test/regress/serial_schedule. Then I  
> ran `make check`. All tests passed, but I don't think that citext was  
> tested.
> Do I need to install the server, build a cluster, and run `make  
> installcheck`?

Yeah, probably.  I don't think the "make check" path will support it
because it doesn't install contrib into the temp installation.
(You'd also need to have put the extra entry in parallel_schedule
not serial_schedule, but it's not gonna work anyway.)

regards, tom lane



Re: [HACKERS] Hint Bits and Write I/O

2008-07-15 Thread Bruce Momjian

Added to TODO:

* Consider decreasing the I/O caused by updating tuple hint bits

  http://archives.postgresql.org/pgsql-hackers/2008-05/msg00847.php


---

Simon Riggs wrote:
> After some discussions at PGCon, I'd like to make some proposals for
> hint bit setting with the aim to reduce write overhead.
> 
> Currently, when we see an un-hinted row we set the bit, if possible and
> then dirty the block.
> 
> If we were to set the bit but *not* dirty the block we may be able to
> find a reduction in I/O. In many cases this would make no difference at
> all, since we often set hints on an already dirty block. In other cases,
> particularly random INSERTs, UPDATEs and DELETEs against large tables
> this would reduce I/O, though possibly increase accesses to clog.
> 
> My proposal is to have this as a two-stage process. When we set the hint
> on a tuple in a clean buffer we mark it BM_DIRTY_HINTONLY, if not
> already dirty. If we set a hint on a buffer that is BM_DIRTY_HINTONLY
> then we mark it BM_DIRTY.
> 
> The objective of this is to remove effects of single index accesses.
> 
> If the bgwriter has time, it will write out BM_DIRTY_HINTONLY buffers,
> though on a consistently busy server this should not occur.
> 
> This new behaviour should reduce the effects of random hint bit setting
> on tables with a low cache hit ratio. This can occur when a table is
> written/read fairly randomly and is much larger than shared_buffers.
> 
> This won't change the behaviour of first-read-after-copy. To improve
> that behaviour, I suggest that we only move from BM_DIRTY_HINTONLY to
> BM_DIRTY when we are setting the hint for a new xid. If we are just
> setting the same xid over-and-over again then we should avoid setting
> the page dirty. So when data has been loaded via COPY, we will just
> check the status of the xid once, then scan the whole page using the
> single-item transaction cache.
> 
> Let's discuss.
> 
> -- 
>  Simon Riggs   www.2ndQuadrant.com
>  PostgreSQL Training, Services and Support
> 
> 

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Is there anyway to create a TupleDesc with uint32 attribute easily?

2008-07-15 Thread Xiao Meng
Oh, I almost forgot the problem of item size.
My intent was just to modify the source code as little as possible ;-(
Thank you for reminding me, Heikki.
I can see that an older version of PostgreSQL removed a HashItem
structure and just used IndexTuple to store the item.
Now I would reintroduce a HashItem structure to store it ;-)

On Tue, Jul 15, 2008 at 8:04 PM, Heikki Linnakangas
<[EMAIL PROTECTED]> wrote:
> Xiao Meng wrote:
>>
>> Hi, hackers.
>> I'm working on my gsoc project - improving hash index.
>> I need to create a TupleDesc with a uint32 attribute.
>> Here is the reason why I need it -
>> If we store hash code only in the tuple, then the TupleDesc's
>> attribute should be uint32
>
> Something along the lines of:
>
> TupleDesc tupdesc = CreateTemplateTupleDesc(1, false);
> TupleDescInitEntry(tupdesc, 1, "hashcode", INT4OID, -1, 0);
>
>> In this way, we can use index_form_tuple() to create a tuple and
>> needn't write a function for hash specially.
>
> I would actually suggest ditching the normal page layout with line pointers
> and IndexTuples altogether. All the entries are fixed size, so you can get
> away with a much simpler structure, and store more entries per page. Storing
> an IndexTuple with one int32 column takes
>
> (sizeof(IndexTupleData) + sizeof(uint32) + sizeof(ItemIdData)) = 16 bytes,
>
> but you only need a pointer to the heap tuple, and the hash code, which is
> just 10 bytes in total. Or 12 if you have to align the hash code to 4 byte
> boundary.
>
> --
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>



-- 
Best Regards,
Xiao Meng

DKERC, Harbin Institute of Technology, China
Gtalk: [EMAIL PROTECTED]
MSN: [EMAIL PROTECTED]
http://xiaomeng.yo2.cn



Re: [HACKERS] Is there anyway to create a TupleDesc with uint32 attribute easily?

2008-07-15 Thread Heikki Linnakangas

Xiao Meng wrote:

Hi, hackers.
I'm working on my gsoc project - improving hash index.
I need to create a TupleDesc with a uint32 attribute.
Here is the reason why I need it -
If we store hash code only in the tuple, then the TupleDesc's
attribute should be uint32


Something along the lines of:

TupleDesc tupdesc = CreateTemplateTupleDesc(1, false);
TupleDescInitEntry(tupdesc, 1, "hashcode", INT4OID, -1, 0);


In this way, we can use index_form_tuple() to create a tuple and
needn't write a function for hash specially.


I would actually suggest ditching the normal page layout with line 
pointers and IndexTuples altogether. All the entries are fixed size, so 
you can get away with a much simpler structure, and store more entries 
per page. Storing an IndexTuple with one int32 column takes


(sizeof(IndexTupleData) + sizeof(uint32) + sizeof(ItemIdData)) = 16 bytes,

but you only need a pointer to the heap tuple, and the hash code, which 
is just 10 bytes in total. Or 12 if you have to align the hash code to 4 
byte boundary.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] [PATCH] "\ef " in psql

2008-07-15 Thread Abhijit Menon-Sen
Refactoring pg_dump was more work than I had time to do right now, and I
wanted \ef to work, so I hacked up the attached (by copying dumpFunc and
its dependencies to src/bin/psql/dumpfunc.[ch]).

-- ams
*** a/src/bin/psql/Makefile
--- b/src/bin/psql/Makefile
***
*** 21,27  override CPPFLAGS := -I$(srcdir) -I$(libpq_srcdir) -I$(top_srcdir)/src/bin/pg_du
  
  OBJS=	command.o common.o help.o input.o stringutils.o mainloop.o copy.o \
  	startup.o prompt.o variables.o large_obj.o print.o describe.o \
! 	psqlscan.o tab-complete.o mbprint.o dumputils.o $(WIN32RES)
  
  EXTRA_OBJS = $(top_builddir)/src/backend/parser/keywords.o
  
--- 21,27 
  
  OBJS=	command.o common.o help.o input.o stringutils.o mainloop.o copy.o \
  	startup.o prompt.o variables.o large_obj.o print.o describe.o \
! 	psqlscan.o tab-complete.o mbprint.o dumputils.o dumpfunc.o $(WIN32RES)
  
  EXTRA_OBJS = $(top_builddir)/src/backend/parser/keywords.o
  
*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
***
*** 38,43 
--- 38,44 
  #include "libpq-fe.h"
  #include "pqexpbuffer.h"
  #include "dumputils.h"
+ #include "dumpfunc.h"
  
  #include "common.h"
  #include "copy.h"
***
*** 56,62 
  static backslashResult exec_command(const char *cmd,
  			 PsqlScanState scan_state,
  			 PQExpBuffer query_buf);
! static bool do_edit(const char *filename_arg, PQExpBuffer query_buf);
  static bool do_connect(char *dbname, char *user, char *host, char *port);
  static bool do_shell(const char *command);
  
--- 57,64 
  static backslashResult exec_command(const char *cmd,
  			 PsqlScanState scan_state,
  			 PQExpBuffer query_buf);
! static bool do_edit(const char *filename_arg, PQExpBuffer query_buf,
! 	bool *edited);
  static bool do_connect(char *dbname, char *user, char *host, char *port);
  static bool do_shell(const char *command);
  
***
*** 444,454  exec_command(const char *cmd,
  			expand_tilde(&fname);
  			if (fname)
  canonicalize_path(fname);
! 			status = do_edit(fname, query_buf) ? PSQL_CMD_NEWEDIT : PSQL_CMD_ERROR;
  			free(fname);
  		}
  	}
  
  	/* \echo and \qecho */
  	else if (strcmp(cmd, "echo") == 0 || strcmp(cmd, "qecho") == 0)
  	{
--- 446,521 
  			expand_tilde(&fname);
  			if (fname)
  canonicalize_path(fname);
! 			if (do_edit(fname, query_buf, NULL))
! status = PSQL_CMD_NEWEDIT;
! 			else
! status = PSQL_CMD_ERROR;
  			free(fname);
  		}
  	}
  
+ 	/*
+ 	 * \ef -- edit the named function in $EDITOR.
+ 	 */
+ 
+ 	else if (strcmp(cmd, "ef") == 0)
+ 	{
+ 		Oid foid;
+ 		char *func;
+ 
+ 		func = psql_scan_slash_option(scan_state, OT_WHOLE_LINE, NULL, true);
+ 		if (!func)
+ 		{
+ 			psql_error("no function name specified\n");
+ 			status = PSQL_CMD_ERROR;
+ 		}
+ 		else if (!lookup_function_oid(pset.db, func, &foid))
+ 		{
+ 			psql_error(PQerrorMessage(pset.db));
+ 			status = PSQL_CMD_ERROR;
+ 		}
+ 		else {
+ 			termPQExpBuffer(query_buf);
+ 			if (foid)
+ 			{
+ char *s = create_or_replace_function_text(pset.db, foid);
+ if (s)
+ {
+ 	appendPQExpBufferStr(query_buf, s);
+ 	free(s);
+ }
+ else
+ 	status = PSQL_CMD_ERROR;
+ 			}
+ 			else
+ 			{
+ printfPQExpBuffer(query_buf,
+   "CREATE FUNCTION %s%s RETURNS ... AS $$\n"
+   "...\n"
+   "$$ LANGUAGE '...'\n",
+   func, strchr(func,'(') ? "" : "(...)" );
+ 			}
+ 		}
+ 
+ 		if (status != PSQL_CMD_ERROR)
+ 		{
+ 			bool edited = false;
+ 			if (!do_edit(0, query_buf, &edited))
+ 			{
+ status = PSQL_CMD_ERROR;
+ 			}
+ 			else if (!edited)
+ 			{
+ printf("No changes\n");
+ 			}
+ 			else
+ 			{
+ status = PSQL_CMD_SEND;
+ 			}
+ 			free(func);
+ 		}
+ 	}
+ 
  	/* \echo and \qecho */
  	else if (strcmp(cmd, "echo") == 0 || strcmp(cmd, "qecho") == 0)
  	{
***
*** 1410,1416  editFile(const char *fname)
  
  /* call this one */
  static bool
! do_edit(const char *filename_arg, PQExpBuffer query_buf)
  {
  	char		fnametmp[MAXPGPATH];
  	FILE	   *stream = NULL;
--- 1477,1483 
  
  /* call this one */
  static bool
! do_edit(const char *filename_arg, PQExpBuffer query_buf, bool *edited)
  {
  	char		fnametmp[MAXPGPATH];
  	FILE	   *stream = NULL;
***
*** 1532,1537  do_edit(const char *filename_arg, PQExpBuffer query_buf)
--- 1599,1608 
  psql_error("%s: %s\n", fname, strerror(errno));
  error = true;
  			}
+ 			else if (edited)
+ 			{
+ *edited = true;
+ 			}
  
  			fclose(stream);
  		}
*** /dev/null
--- b/src/bin/psql/dumpfunc.c
***
*** 0 
--- 1,496 
+ #include "dumpfunc.h"
+ 
+ #include "libpq-fe.h"
+ #include "pqexpbuffer.h"
+ #include "dumputils.h"
+ #include "common.h"
+ #include "catalog/pg_proc.h"
+ 
+ #define atooid(x)  ((Oid) strtoul((x), NULL, 10))
+ 
+ /*
+  * This function takes a function description, e.g. "x" or "x(int)", and
+  * issues a query on the given connection to retrieve the function's oid
+  * using a

[HACKERS] Is there anyway to create a TupleDesc with uint32 attribute easily?

2008-07-15 Thread Xiao Meng
Hi, hackers.
I'm working on my gsoc project - improving hash index.
I need to create a TupleDesc with a uint32 attribute.
Here is the reason why I need it -
If we store hash code only in the tuple, then the TupleDesc's
attribute should be uint32.
In this way, we can use index_form_tuple() to create a tuple and
needn't write a function for hash specially.

I've tried lookup_rowtype_tupdesc() with INT4OID, but it doesn't work,
since int is not a composite type.
Is there any easy way to do it, or should I change my design?
Hope to hear from you. Thanks!

--
Best Regards,
Xiao Meng

DKERC, Harbin Institute of Technology, China
Gtalk: [EMAIL PROTECTED]
MSN: [EMAIL PROTECTED]
http://xiaomeng.yo2.cn



Re: [HACKERS] [GENERAL] Fragments in tsearch2 headline

2008-07-15 Thread Teodor Sigaev

Attached a new patch that:

1. fixes previous bug
2. better handles the case when cover size is greater than the MaxWords. 


Looks good, I'll make some tests with a real-world application.

I have not yet added the regression tests. The regression test suite 
seemed to be only ensuring that the function works. How many tests 
should I be adding? Is there any other place that I need to add 
different test cases for the function?


Just add 3-5 selects to src/test/regress/sql/tsearch.sql checking basic
functionality and corner cases like:

 - there are no covers in the text
 - a cover is too big
 - and so on

Please add some words to the documentation too.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] [PATCHES] WIP: executor_hook for pg_stat_statements

2008-07-15 Thread Simon Riggs

On Tue, 2008-07-15 at 16:25 +0900, ITAGAKI Takahiro wrote:
> > > Also, after looking at the patch more closely, was there a good
> reason
> > > for making the hook intercept ExecutePlan rather than ExecutorRun?
> > 
> > That raises the question of whether we should have ExecutorStart()
> and
> > ExecutorEnd() hooks as well, to round things off.
> 
> Yeah, and also an ExecutorRewind() hook. There are 4 interface functions
> in the executor. My add-in only needs the Run hook because it doesn't
> modify the executor's actual behavior. However, when someone hopes to
> replace the behavior, they need all of the hooks. (Is the multi-threaded
> executor project still alive?)

No plans here, just thinking: if we do it, do it once.

The reason I wasn't thinking about the rewind part, though, was that it
seemed someone might want to set up or tear down something at appropriate
times, so adding Start/End felt "obvious". Yes, let's have Rewind also.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] [PATCHES] VACUUM Improvements - WIP Patch

2008-07-15 Thread Heikki Linnakangas

Gregory Stark wrote:

I thought
we only pruned when we wanted to insert a new tuple and found not enough
space.


Nope, we prune on any access to the page, if the page is "full enough", 
and the pd_prune_xid field suggests that there is something to prune.


The problem with only pruning on inserts is that by the time we get to 
heap_insert/heap_update, we're already holding a pin on the page, which 
prevents us from acquiring the vacuum lock.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [PATCHES] WIP: executor_hook for pg_stat_statements

2008-07-15 Thread ITAGAKI Takahiro

Simon Riggs <[EMAIL PROTECTED]> wrote:

> > I wonder whether we ought to change things so that the real query
> > source text is available at the executor level.  Since we are (at least
> > usually) storing the query text in cached plans, I think this might just
> > require some API refactoring, not extra space and copying.  It would
> > amount to a permanent decision that we're willing to pay the overhead
> > of keeping the source text around, though.
> 
> I think its a reasonable decision to do that. Knowing what you're doing
> while you do it is pretty important.

I looked into it, and found I can use ActivePortal->sourceText in some
situations. But there are still some problems:

 - SQL functions:
They don't modify ActivePortal->sourceText, but we could get the
source from SQLFunctionCache->src. If it is required, we might
need to add a new field in QueryDesc and copy the src to the field.
 - Multiple queries:
Query text is not divided into each query.
Only the original combined text is available.
 - RULEs:
There are similar issues with multiple queries.
Also, they don't have original query texts.

The same can be said for planner_hook(). The only query text available
there is debug_query_string, and that is the top-level query. We cannot
get the actual SQL text the Query object came from. The true query text
might be an SQL function used in the top-level query, part of a multiple
query, or another query rewritten by a RULE.

For these reasons, I'm now thinking of collecting only top-query-level
statistics, not per-planner+executor-level statistics. I.e., when we
receive a multiple query "SELECT 1; SELECT 2;", pg_stat_statements uses
the original combined text as a key. The consumed resources associated
with the key are the sum of the resources used in both "SELECT 1" and
"SELECT 2".


> > Also, after looking at the patch more closely, was there a good reason
> > for making the hook intercept ExecutePlan rather than ExecutorRun?
> 
> That raises the question of whether we should have ExecutorStart() and
> ExecutorEnd() hooks as well, to round things off.

Yeah, and also an ExecutorRewind() hook. There are 4 interface functions in
the executor. My add-in only needs the Run hook because it doesn't modify the
executor's actual behavior. However, when someone hopes to replace the behavior,
they need all of the hooks. (Is the multi-threaded executor project still alive?)

How about adding a new Executor class, where ExecutorStart() returns an
instance of Executor?

typedef struct Executor
{
ExecutorRunFunc run;
ExecutorEndFunc end;
ExecutorRewindFunc  rewind;
/* there might be private fields. */
} Executor;

Executor *e = ExecutorStart_hook(...);
ExecutorRun(e, ...) => { e->run(e, ...); }
ExecutorEnd(e, ...) => { e->end(e, ...); }

It could make the APIs cleaner, because QueryDesc has 3 fields only for
the executor (tupDesc, estate, planstate). We could move those fields to
the Executor's private fields. Is this modification acceptable?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


