Re: [HACKERS] Security and Data Protection Issues

2008-07-10 Thread Stuart Gundry
Thank you, I'm also curious as to whether the data folder is already in some
way encrypted and if so, what encryption/obfuscation is being used. There
doesn't seem to be anything about this on the web.


Re: [HACKERS] Security and Data Protection Issues

2008-07-10 Thread Richard Huxton

Stuart Gundry wrote:

Thank you, I'm also curious as to whether the data folder is already in some
way encrypted and if so, what encryption/obfuscation is being used. There
doesn't seem to be anything about this on the web.


No encryption, although large text fields may be compressed (read up on 
TOAST), so they are not readable as plain text.


--
  Richard Huxton
  Archonet Ltd



Re: [HACKERS] Auto-explain patch

2008-07-10 Thread Simon Riggs

On Wed, 2008-07-09 at 09:11 +, Dean Rasheed wrote:
 Simon, I like your proposal, and I think I can see how to code it
 fairly easily.
 
 There is one thing that it doesn't allow, however, which the debug_xxx
 parameters do, and that is for a non-superuser to trace SQL used in
 functions, from an interactive client session. For me, this is quite a
 big thing, because I find it most convenient to turn these parameters
 on while writing and tweaking stored procedures, and have the output
 go straight to my psql window, without having to connect as a superuser
 and trawl through log files.
 
 So I suggest grouping these parameters in their own category
 (eg. sql_trace) and then having additional parameters to control
 where the output would go. So the sql_trace parameters would be:
 
 * sql_trace_min_planner_duration
 * sql_trace_min_executor_duration
 * sql_trace_explain_plan
 
 and they would work exactly as you describe, except they would be
 settable by normal users. Then the destination(s) for the statement
 and EXPLAIN logging would be controlled by:
 
 * client_sql_trace = on | off - settable by a normal user to allow a
 client session to see the sql_trace output. If this parameter is on,
 the sql_trace will be logged as NOTICE output.

After sleeping on this, I think we should follow your idea.

If it's possible to do the sql_trace_* parameters as a single one, I
would prefer it, since it makes it more practical to use dynamically.
Not sure how...maybe with a wrapper function?

sql_trace(integer) sets just sql_trace_min_executor_duration
sql_trace(integer, boolean) sets executor and explain
sql_trace(integer, integer, boolean) sets all 3

I think you probably need to drop the sql_ off the front, if only to
keep the parameter names from getting too long.

No need for the other log_... parameter though.
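
(A minimal sketch of such wrappers, assuming the three GUCs exist as
proposed and using the standard set_config() function; the names here
are illustrative only:

  CREATE FUNCTION sql_trace(min_exec integer) RETURNS text AS $$
      SELECT set_config('sql_trace_min_executor_duration', $1::text, false);
  $$ LANGUAGE sql;

  CREATE FUNCTION sql_trace(min_exec integer, do_explain boolean) RETURNS text AS $$
      -- intermediate SELECT results in a SQL function are discarded
      SELECT set_config('sql_trace_explain_plan', $2::text, false);
      SELECT set_config('sql_trace_min_executor_duration', $1::text, false);
  $$ LANGUAGE sql;

so SELECT sql_trace(100, true); would enable explained tracing of
statements running longer than 100ms.)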

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Auto-explain patch

2008-07-10 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes:

 On Wed, 2008-07-09 at 09:11 +, Dean Rasheed wrote:
 
 So I suggest grouping these parameters in their own category
 (eg. sql_trace) and then having additional parameters to control
 where the output would go. So the sql_trace parameters would be:
 
 * sql_trace_min_planner_duration
 * sql_trace_min_executor_duration
 * sql_trace_explain_plan
 
 If it's possible to do the sql_trace_* parameters as a single one, I
 would prefer it, since it makes it more practical to use dynamically.
 Not sure how...maybe with a wrapper function?

 sql_trace(integer) sets just sql_trace_min_executor_duration
 sql_trace(integer, boolean) sets executor and explain
 sql_trace(integer, integer, boolean) sets all 3

FWIW, it seems to me trace_sql_* would be nicer, much as we have
the track_* GUC parameters.

Though I also wonder if there's really any distinction here between tracing
and logging; why not log_explain_plan and so on? Perhaps we should reserve
the word trace for a more detailed facility.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: [HACKERS] Security and Data Protection Issues

2008-07-10 Thread Jan Urbański

Stuart Gundry wrote:

Been looking into TrueCrypt but can't seem to get it to play nice with
the postgres silent installer. When I try to set BASEDIR=M:\, which is
where I mounted my encrypted volume, it gives the following error in the log:

The Cacls command can be run only on disk drives that use the NTFS file
system.


Hmm, and are you sure that the encrypted partition is seen by the system as 
an NTFS partition?



It sounded like you've done this before, so I was hoping you could give me
some pointers. I know it's not the rest of my install command, since I've used
that many times before.


I've done that, but not on Windows, so I'm not really sure how that'll 
work. Maybe someone with more Windows experience could help here?


BTW: you could try and install Postgres as usual, and just keep your WAL 
logs directory and all data from the tables on the encrypted partition.
Just read the documentation on CREATE TABLESPACE and about moving the 
pg_xlog directory.
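
(A sketch, with M:/pgdata standing in for a directory on the mounted
encrypted volume:

  CREATE TABLESPACE enc_space LOCATION 'M:/pgdata';
  CREATE TABLE secret_stuff (id int, payload text) TABLESPACE enc_space;

and pg_xlog can be relocated onto the same volume, with the server
stopped, as the documentation describes.)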



Thank you for your time


You're welcome ;)

Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin



Re: [HACKERS] WITH RECURSIVE updated to CVS TIP

2008-07-10 Thread Abhijit Menon-Sen
At 2008-07-09 17:06:19 -0700, [EMAIL PROTECTED] wrote:

 I'm really new to this git thing, but I now have access to create
 git-shell accounts, etc. on git.postgresql.org. Any ideas you can
 offer on how better to handle this would really help me. :)

The question is: what is your objective in providing this repository?

I've only just cloned your repository, so I can only guess at how it is
managed, but you seem to be rebasing your changes on top of the current
Postgres source and responding to upstream changes by throwing away the
old patches and applying the new ones. (By the way, your origin/master
appears to be lagging the current HEAD by 71 commits, i.e. a month.)

That has several problems:

- There is no indication of how the WITH RECURSIVE patches have changed
  over time or in response to feedback. For example, the bugs recently
  fixed are indistinguishable from earlier changes. This would be very
  valuable information to have during review (and that's really what I
  was expecting when I cloned).

- One has to clone a 250MB repository (over HTTP, with almost no
  progress indication) to see what is essentially identical to the
  posted patch.

- Rebasing isn't appropriate for a public branch, since you're
  rewriting history that people have pulled already.

If your objective is only to make an up-to-date patch always available,
then it is unnecessary to publicise your repository. You could just use
git-rebase to stay abreast of significant changes in origin/master and
run git-format-patch to generate a patch... but then you still end up
with essentially the same thing that Tatsuo Ishii posted to the list
the other day anyway.
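
(Concretely, something like this, assuming origin tracks the CVS mirror
and the work lives on a with-recursive branch:

  git fetch origin
  git rebase origin/master with-recursive
  git format-patch origin/master

which produces exactly the patch files you would post anyway.)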

I agree with Alvaro. If the developers aren't committing to this
repository that the patches are generated from, there's really no
point to using the repository for review. It's very much simpler
to just read the patch as posted to the list.

The only real benefit to review that I can imagine would be if full
change history were available, which would be possible if a) changes were
committed separately with proper comments and b) the branch were
*NOT* rebased, but instead periodically merged with origin/master.

That way I could pull from the repository and run e.g.
git-log --stat origin/master..with-recursive or similar.
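
(I.e., roughly, with the branch names assumed as above:

  git checkout with-recursive
  git merge origin/master       # preserve history instead of rewriting it
  git log --stat origin/master..with-recursive

so each bugfix stays visible as its own commit.)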

Hope this helps.

-- ams



Re: [HACKERS] WITH RECURSIVE updated to CVS TIP

2008-07-10 Thread Gregory Stark

Abhijit Menon-Sen [EMAIL PROTECTED] writes:

 The only real benefit to review that I can imagine would be if full
 change history were available, which it could do if a) changes were
 committed separately with proper comments and b) if the branch were
 *NOT* rebased, but instead periodically merged with origin/master.

 That way I could pull from the repository and run e.g.
 git-log --stat origin/master..with-recursive or similar.

Additionally if other people could commit change patches to the repository or
submit patches which upstream could apply then git would be able to update
submitters trees with just the patches they're missing (ie, skipping the
patches they submitted upstream or merging them cleanly)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [HACKERS] Follow-up on replication hooks for PostgreSQL

2008-07-10 Thread Marko Kreen
On 7/10/08, Robert Hodges [EMAIL PROTECTED] wrote:
  This is a quick update on a promise I made early in June to suggest
 requirements as well as ways to add replication hooks that would support
 logical replication, as opposed to the physical replication work currently
 underway based on NTT's code.

  Well, June was a pretty busy month, so it has taken a while to get back to
 this.  However, we are now beginning to examine options for PostgreSQL
 logical replication.  To make a long story short we are willing to commit
 resources to this problem or fund other people to do it for us.  If you are
 interested please contact me directly.  Meanwhile, we are quite serious
 about this problem and intend to work on helpful additions to PostgreSQL in
 this area.  I will post more as we make progress.

Well, I'm not exactly sure what you are planning.  It's OK to do
the draft design privately, but before coding actually starts, the
design should be discussed on -hackers.

And I'm not exactly sure what you mean by logical replication?  A way
to log DDL statements?  Do you want to log DML also?

FWIW, here's a very rough draft design for functionality that could be
used to let current Slony-I/Londiste-like solutions replicate DDL also.

1. A CREATE DDL TRIGGER statement that allows a function to be called
   for all DDL statements.

   The only filtering that makes sense here is filtering by area:
   tables/functions/views/etc.

   It must be possible to do an AFTER trigger.  Whether a BEFORE
   trigger makes sense for DDL, I'm not sure.

2. When the function is called, the following information is given:

   - Object type the event was for (table/view/function)
   - Array of object names.
   - SQL statement as text.

   The trigger function can filter further, based on the object names,
   to decide whether it wants to log the event or not.

Trying to make the trigger run on only a subset of events is complex,
and parsing the SQL into pieces so the trigger can understand it better
is also complex; neither is needed.  Unless there is some common
situation where such a simple design fails to work, I would not make
the scheme more complex.

Also, the design should be based on the assumption that the target side
is exactly in sync.  E.g. DROP CASCADE should be replicated as DROP CASCADE.
We should not make the scheme more complex to survive cases where the target
is not in sync.  That way madness lies.  The effect should be as if the
same SQL statements were applied to the target by hand, no more, no less.
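
(As a concrete illustration, in purely hypothetical syntax; nothing like
this exists today, and ddl_log is an assumed queue table:

  CREATE FUNCTION log_ddl(p_objtype text, p_objnames text[], p_stmt text)
  RETURNS void AS $$
  BEGIN
      -- filter on p_objnames here if desired, then queue the statement
      INSERT INTO ddl_log (objtype, objnames, stmt)
          VALUES (p_objtype, p_objnames, p_stmt);
  END;
  $$ LANGUAGE plpgsql;

  CREATE DDL TRIGGER replicate_ddl AFTER DDL ON TABLE, VIEW
      EXECUTE PROCEDURE log_ddl();
)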

-- 
marko



[HACKERS] initdb in current cvs head broken?

2008-07-10 Thread Sushant Sinha
I am trying to generate a patch with respect to the current CVS head. So
I rsynced the tree, then did cvs up and installed the db. However, when
I did initdb on a data directory, it got stuck:

It is stuck after printing:
creating template1 database in /home/postgres/data/base/1 ... 

I did strace:

$ strace -p 9852
Process 9852 attached - interrupt to quit
waitpid(9864,

then I straced 9864:

$ strace -p 9864
Process 9864 attached - interrupt to quit
semop(8060958, 0xbff36fee,

 $ ps aux|grep 9864   
postgres  9864  1.5  1.3  37296  6816 pts/1S+   07:51
0:02 /usr/local/pgsql/bin/postgres --boot -x1 -F


Seems like a bug to me. Is the tree stable only after commit fests, and
should I not use the unstable tree for generating patches?

Thanks,
-Sushant.




Re: [HACKERS] Auto-explain patch

2008-07-10 Thread Dean Rasheed

 After sleeping on this, I think we should follow your idea.


Hmm. I preferred your idea ;-) It reduces the number of new parameters
back down to 3, which makes it easier to use:

* trace_min_planner_duration - int, PGC_USERSET
* trace_min_executor_duration - int, PGC_USERSET
* trace_explain_plan - bool, PGC_USERSET

(I already decided to drop the sql_), with all output going to a new level,
TRACE, as you described. So output goes to the client and not the server log
by default, as soon as either of the first 2 parameters is enabled.
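
(For example, a session might then look like this; my_func is just a
stand-in for whatever procedure is being tweaked:

  SET trace_min_executor_duration = 0;   -- trace every statement
  SET trace_explain_plan = on;
  SELECT my_func(42);                    -- statement text and EXPLAIN plan
                                         -- arrive in the psql window
)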

I've attached what I've done so far, which works according to the above
description. I've not done much testing or written any docs yet. It would be
good to know if this is along the right lines.

Changing parameter names is easy, although admittedly very important to
get right. You didn't say why you changed your mind on this?

I'm more concerned about any possible security holes it opens up.
For SQL in SQL functions, it just gives the function name and statement
number. For EXECUTEd queries, it doesn't have access to the SQL, so
it just logs the other stuff and any context info. For plpgsql functions it will
log values which appear as literals in any queries, but actually
debug_print_parse exposes far more info in this case.

*** ./src/backend/commands/copy.c.orig	2008-07-09 16:14:54.000000000 +0100
--- ./src/backend/commands/copy.c	2008-07-09 16:15:54.000000000 +0100
***************
*** 1042,1047 ****
--- 1042,1048 ----
  
  		/* plan the query */
  		plan = planner(query, 0, NULL);
+ 		plan->query_string = queryString;
  
  		/*
  		 * Use a snapshot with an updated command ID to ensure this query sees
*** ./src/backend/commands/explain.c.orig	2008-07-08 13:55:29.000000000 +0100
--- ./src/backend/commands/explain.c	2008-07-09 16:14:28.000000000 +0100
***************
*** 40,65 ****
  explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
  
  
- typedef struct ExplainState
- {
- 	/* options */
- 	bool		printTList;		/* print plan targetlists */
- 	bool		printAnalyze;	/* print actual times */
- 	/* other states */
- 	PlannedStmt *pstmt;			/* top of plan */
- 	List	   *rtable;			/* range table */
- } ExplainState;
- 
  static void ExplainOneQuery(Query *query, ExplainStmt *stmt,
  				const char *queryString,
  				ParamListInfo params, TupOutputState *tstate);
  static void report_triggers(ResultRelInfo *rInfo, bool show_relname,
  				StringInfo buf);
- static double elapsed_time(instr_time *starttime);
- static void explain_outNode(StringInfo str,
- 				Plan *plan, PlanState *planstate,
- 				Plan *outer_plan,
- 				int indent, ExplainState *es);
  static void show_plan_tlist(Plan *plan,
  			StringInfo str, int indent, ExplainState *es);
  static void show_scan_qual(List *qual, const char *qlabel,
--- 40,50 ----
***************
*** 170,175 ****
--- 155,161 ----
  
  		/* plan the query */
  		plan = planner(query, 0, params);
+ 		plan->query_string = queryString;
  
  		/* run it (if needed) and produce output */
  		ExplainOnePlan(plan, params, stmt, tstate);
***************
*** 384,390 ****
  }
  
  /* Compute elapsed time in seconds since given timestamp */
! static double
  elapsed_time(instr_time *starttime)
  {
  	instr_time	endtime;
--- 370,376 ----
  }
  
  /* Compute elapsed time in seconds since given timestamp */
! double
  elapsed_time(instr_time *starttime)
  {
  	instr_time	endtime;
***************
*** 406,412 ****
   * side of a join with the current node.  This is only interesting for
   * deciphering runtime keys of an inner indexscan.
   */
! static void
  explain_outNode(StringInfo str,
  				Plan *plan, PlanState *planstate,
  				Plan *outer_plan,
--- 392,398 ----
   * side of a join with the current node.  This is only interesting for
   * deciphering runtime keys of an inner indexscan.
   */
! void
  explain_outNode(StringInfo str,
  				Plan *plan, PlanState *planstate,
  				Plan *outer_plan,
*** ./src/backend/executor/execMain.c.orig	2008-07-08 14:03:34.000000000 +0100
--- ./src/backend/executor/execMain.c	2008-07-10 10:34:26.000000000 +0100
***************
*** 39,44 ****
--- 39,45 ----
  #include "catalog/heap.h"
  #include "catalog/namespace.h"
  #include "catalog/toasting.h"
+ #include "commands/explain.h"
  #include "commands/tablespace.h"
  #include "commands/trigger.h"
  #include "executor/execdebug.h"
***************
*** 52,57 ****
--- 53,59 ----
  #include "storage/lmgr.h"
  #include "storage/smgr.h"
  #include "utils/acl.h"
+ #include "utils/guc.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
  #include "utils/snapmgr.h"
***************
*** 184,189 ****
--- 186,199 ----
  	}
  
+ 	/*
+ 	 * If we are tracing and explaining slow-running queries,
+ 	 * enable instrumentation
+ 	 */
+ 	if (trace_explain_plan &&
+ 		(trace_min_planner_duration > -1 || trace_min_executor_duration > -1))

Re: [HACKERS] initdb in current cvs head broken?

2008-07-10 Thread Pavan Deolasee
On Thu, Jul 10, 2008 at 5:36 PM, Sushant Sinha [EMAIL PROTECTED] wrote:



 Seems like a bug to me. Is the tree stable only after commit fests, and
 should I not use the unstable tree for generating patches?


I quickly tried on my repo and it's working fine. (Well, it could be a
bit out of sync with the head.)

Usually, the tree may get a bit inconsistent during the active period,
but it's not very common. I've seen committers do a good job, before
checking in any code, of making sure it works fine (at least initdb and
the regression tests).

I would suggest doing a clean build at your end once again.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



Re: [HACKERS] initdb in current cvs head broken?

2008-07-10 Thread Andrew Dunstan



Sushant Sinha wrote:

I am trying to generate a patch with respect to the current CVS head. So
I rsynced the tree, then did cvs up and installed the db. However, when
I did initdb on a data directory, it got stuck:


  

[snip]

Seems like a bug to me. Is the tree stable only after commit fests, and
should I not use the unstable tree for generating patches?


  



If it were really broken then the buildfarm would be showing red:
http://www.pgbuildfarm.org/cgi-bin/show_status.pl

It isn't, so I suspect the problem is at your end.

cheers

andrew





Re: [HACKERS] initdb in current cvs head broken?

2008-07-10 Thread Sushant Sinha
You are right. I did not do make clean last time. After make clean, make
all, and make install, it works fine.

-Sushant.





Re: [HACKERS] CREATE CAST too strict?

2008-07-10 Thread Peter Eisentraut
Am Mittwoch, 9. Juli 2008 schrieb Peter Eisentraut:
 I propose that we relax these two checks to test for binary-coercibility
 instead, which is effectively what is expected and required here anyway.

Here is the corresponding patch.
diff -ur ../cvs-pgsql/doc/src/sgml/ref/create_cast.sgml ./doc/src/sgml/ref/create_cast.sgml
--- ../cvs-pgsql/doc/src/sgml/ref/create_cast.sgml	2007-07-10 14:57:00.000000000 +0200
+++ ./doc/src/sgml/ref/create_cast.sgml	2008-07-10 14:07:27.000000000 +0200
@@ -205,7 +205,7 @@
 
   <para>
    Cast implementation functions can have one to three arguments.
-   The first argument type must be identical to the cast's source type.
+   The first argument type must be identical to or binary-compatible with the cast's source type.
    The second argument,
    if present, must be type <type>integer</>; it receives the type
    modifier associated with the destination type, or <literal>-1</>
diff -ur ../cvs-pgsql/src/backend/commands/functioncmds.c ./src/backend/commands/functioncmds.c
--- ../cvs-pgsql/src/backend/commands/functioncmds.c	2008-07-03 16:53:04.000000000 +0200
+++ ./src/backend/commands/functioncmds.c	2008-07-10 13:42:26.000000000 +0200
@@ -48,6 +48,7 @@
 #include "commands/defrem.h"
 #include "commands/proclang.h"
 #include "miscadmin.h"
+#include "parser/parse_coerce.h"
 #include "parser/parse_func.h"
 #include "parser/parse_type.h"
 #include "utils/acl.h"
@@ -1403,10 +1404,10 @@
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 					 errmsg("cast function must take one to three arguments")));
-		if (procstruct->proargtypes.values[0] != sourcetypeid)
+		if (!IsBinaryCoercible(sourcetypeid, procstruct->proargtypes.values[0]))
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
-					 errmsg("argument of cast function must match source data type")));
+					 errmsg("argument of cast function must match or be binary-compatible with source data type")));
 		if (nargs > 1 && procstruct->proargtypes.values[1] != INT4OID)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -1415,10 +1416,10 @@
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 					 errmsg("third argument of cast function must be type boolean")));
-		if (procstruct->prorettype != targettypeid)
+		if (!IsBinaryCoercible(procstruct->prorettype, targettypeid))
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
-					 errmsg("return data type of cast function must match target data type")));
+					 errmsg("return data type of cast function must match or be binary-compatible with target data type")));
 
 		/*
 		 * Restricting the volatility of a cast function may or may not be a



Re: [HACKERS] CommitFest rules

2008-07-10 Thread Tom Lane
Robert Treat [EMAIL PROTECTED] writes:
 On Monday 07 July 2008 21:56:34 Bruce Momjian wrote:
 Right now you advance the static link to the next commit fest once the
 current one starts --- I was hoping for a link that advances when the
 commit fest is done so I could make it a permanent tab in Firefox.

 This is what I had proposed/changed, but see up thread where Tom disagreed 
 with this idea. 

I surely do not have an objection to having a link defined as above ---
I just wanted to be clear on what we meant by current commitfest.
We probably need two separate terms for the place to submit new
patches and the place we are trying to commit patches from.

regards, tom lane



Re: [HACKERS] WITH RECURSIVE updated to CVS TIP

2008-07-10 Thread David Fetter
On Thu, Jul 10, 2008 at 04:12:34PM +0530, Abhijit Menon-Sen wrote:
 At 2008-07-09 17:06:19 -0700, [EMAIL PROTECTED] wrote:
 
  I'm really new to this git thing, but I now have access to create
  git-shell accounts, etc. on git.postgresql.org. Any ideas you can
  offer on how better to handle this would really help me. :)
 
 The question is: what is your objective in providing this repository?

Here are my objectives:

1.  Make a repository that keeps up with CVS HEAD.

2.  Allow people who are not currently committers on CVS HEAD to make
needed changes.

 I've only just cloned your repository, so I can only guess at how it
 is managed, but you seem to be rebasing your changes on top of the
 current Postgres source and responding to upstream changes by
 throwing away the old patches and applying the new ones. (By the
 way, your origin/master appears to be lagging the current HEAD by 71
 commits, i.e. a month.)

If you know a better way to do this, I'm all ears :)  I'm completely
new to git and pretty fuzzy on CVS.

 If your objective is only to make an up-to-date patch always
 available, then it is unnecessary to publicise your repository. You
 could just use git-rebase to stay abreast of significant changes in
 origin/master and run git-format-patch to generate a patch... but
 then you still end up with essentially the same thing that Tatsuo
 Ishii posted to the list the other day anyway.
 
 I agree with Alvaro. If the developers aren't committing to this
 repository that the patches are generated from, there's really no
 point to using the repository for review. It's very much simpler to
 just read the patch as posted to the list.

They aren't committing, at least in part, because they did not have
any way to do so.  I'm fixing things so that they do by creating
git-shell accounts on git.postgresql.org which will have write access
to that repository.

To get such an account, please send me your public key and preferred
user name so I can move forward on this.

 The only real benefit to review that I can imagine would be if full
 change history were available, which it could do if a) changes were
 committed separately with proper comments and b) if the branch were
 *NOT* rebased, but instead periodically merged with origin/master.

Great idea!  I'd be happy to wipe this repository and start over just
as you propose.  It would be even nicer if we can put together a
standard procedure for new patches.  Would you be willing to write it
up?

 That way I could pull from the repository and run e.g.
 git-log --stat origin/master..with-recursive or similar.

Excellent :)

 Hope this helps.

It does indeed.

Cheers,
David.
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Protocol 3, Execute, maxrows to return, impact?

2008-07-10 Thread Tom Lane
Stephen R. van den Berg [EMAIL PROTECTED] writes:
 Then, from a client perspective, there is no use at all, because the
 client can actually pause reading the results at any time it wants,
 when it wants to avoid storing all of the result rows.  The network
 will perform the cursor/fetch facility for it.

[ shrug... ]  In principle you could write a client library that would
act that way, but I think you'll find that none of the extant ones
will hand back an incomplete query result to the application.

A possibly more convincing argument is that with that approach, the
connection is completely tied up --- you cannot issue additional
database commands based on what you just read, nor pull rows from
multiple portals in an interleaved fashion.

regards, tom lane



Re: [HACKERS] CommitFest rules

2008-07-10 Thread Dave Page
On Thu, Jul 10, 2008 at 3:16 PM, Tom Lane [EMAIL PROTECTED] wrote:

 I surely do not have an objection to having a link defined as above ---
 I just wanted to be clear on what we meant by current commitfest.
 We probably need two separate terms for the place to submit new
 patches and the place we are trying to commit patches from.

Well we have two separate links now, with hints as to their usage:

# Upcoming CommitFest - add new patches here
# In-progress CommitFest - patch review underway here


-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com



Re: [HACKERS] Protocol 3, Execute, maxrows to return, impact?

2008-07-10 Thread Andrew Dunstan



Tom Lane wrote:

Stephen R. van den Berg [EMAIL PROTECTED] writes:
  

Then, from a client perspective, there is no use at all, because the
client can actually pause reading the results at any time it wants,
when it wants to avoid storing all of the result rows.  The network
will perform the cursor/fetch facility for it.



[ shrug... ]  In principle you could write a client library that would
act that way, but I think you'll find that none of the extant ones
will hand back an incomplete query result to the application.

A possibly more convincing argument is that with that approach, the
connection is completely tied up --- you cannot issue additional
database commands based on what you just read, nor pull rows from
multiple portals in an interleaved fashion.

  


I really think we need to get something like this into libpq. It's on my 
TODO list after notification payloads and libpq support for arrays and 
composites. We'll need to come up with an API before we do much else.


cheers

andrew



Re: [HACKERS] Follow-up on replication hooks for PostgreSQL

2008-07-10 Thread Robert Hodges
Hi Marko,

No fear, we definitely will discuss this on pgsql-hackers.  I just wanted to make 
sure that people understood we are still committed to solving this problem and 
will one way or another commit resources to help.

Just to be clear, by logical replication I mean replication based on sending 
SQL or near-SQL (e.g., generic DML events) between servers.  Physical 
replication, on the other hand, uses internal formats to replicate changes 
without an intervening conversion to SQL, for example by shipping WAL records.  
There are advantages to each for different applications.  BTW, I heard this 
nomenclature from Simon Riggs.  It seems quite helpful.

The DDL trigger proposal is interesting and would be a very useful feature 
addition to PostgreSQL.  To execute correctly it may also be necessary to know 
which database you were using at the time the SQL was issued.

For our part we are looking for ways to replicate most or all data on a server 
as efficiently as possible.  Generic call-outs at commit time or reading the 
log directly are attractive approaches.  Depending on the implementation you 
can avoid double writes of replicated data on the master host.  Also, it avoids 
the management headache of ensuring that triggers are correctly installed.  It 
seems as if one of these generic approaches could hook into WAL record 
transport.

Cheers, Robert




--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm


Re: [HACKERS] CommitFest rules

2008-07-10 Thread Tom Lane
Dave Page [EMAIL PROTECTED] writes:
 On Thu, Jul 10, 2008 at 3:16 PM, Tom Lane [EMAIL PROTECTED] wrote:
 I surely do not have an objection to having a link defined as above ---
 I just wanted to be clear on what we meant by current commitfest.
 We probably need two separate terms for the place to submit new
 patches and the place we are trying to commit patches from.

 Well we have two separate links now, with hints as to their usage:

 #  Upcoming CommitFest - add new patches here
 # In-progress CommitFest - patch review underway here

That's fine, but what Bruce asked for was bookmarkable links defined in
both ways --- right now, only the first one has a permanent alias.

regards, tom lane



[HACKERS] UUID - Data type inefficient

2008-07-10 Thread Kless
The new data type, UUID, is stored as a string -char(16)-:


struct pg_uuid_t
{
unsigned char data[UUID_LEN];
};
#define UUID_LEN 16


but this is very inefficient, as you can read here [1].

The ideal would be to use bit(128), but today that isn't possible. One
possible solution would be to create a structure with 2 fields, each one
bit(64).


[1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/



Re: [HACKERS] WITH RECURSIVE updated to CVS TIP

2008-07-10 Thread Aidan Van Dyk
* David Fetter [EMAIL PROTECTED] [080710 10:19]:

  The question is: what is your objective in providing this repository?
 
 Here are my objectives:
 
 1.  Make a repository that keeps up with CVS HEAD.

There are already at least 2 public ones that do:
git://repo.or.cz/PostgreSQL.git
git.postgresql.org (which seems to work best on http only!?!?!)

 2.  Allow people who are not currently committers on CVS HEAD to make
 needed changes.

Uh, the point of git is that it's distributed, so you don't need to be
involved for them to do that...

 If you know a better way to do this, I'm all ears :)  I'm completely
 new to git and pretty fuzzy on CVS.

Well, if you want to use git to its fullest extent, you really need to 
unlearn *all* of the ideas and restrictions your fuzzy CVS knowledge is
handcuffing you with.  Or at least be able to stuff it in a box in some
dark recess of your brain, so as to not let it control the way you try
and use Git.

Git is *inherently* distributed.  And it's *only* a Content tracker.
Because of this, it works spectacularly well as 2 quite different tools:

1) A developer tool to manage their ideas, developments, and code (track)
2) A content distribution tool (publish)

 They aren't committing, at least in part, because they did not have
 any way to do so.  I'm fixing things so that they do by creating
 git-shell accounts on git.postgresql.org which will have write access
 to that repository.

Committing in GIT has *nothing* to do with an account on
git.postgresql.org.  It's a local operation, and if they are using git
already, they can publish to any number of places.  And if they aren't
using git already, then a git-shell account, or some special single repo
on git.postgresql.org, isn't going to change that.

You seem to be trying to set up git as a centralized distribution tool
(#2 above), without it being used as a developer tool (#1).  I really
don't see the point of that.  The only difference from CVS that such use
will provide is that you can serve up CVS+patch easily.  But anybody who's
going to use git to get CVS+patch already has git, and thus has CVS+patch
locally available in 1 command anyways...

I think Git's use as #1 *has* to come first.  And, because of its
inherently distributed nature, #2 just happens once people are using
it...

a.

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] UUID - Data type inefficient

2008-07-10 Thread Andrew Dunstan



Kless wrote:

The new data type, UUID, is stored as a string -char(16)-:


struct pg_uuid_t
{
unsigned char data[UUID_LEN];
};
#define UUID_LEN 16

  



No it is not. It is stored as 16 binary bytes. As text it won't fit into 
16 bytes.
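
(Easy to verify from psql, since pg_column_size() reports the storage
size of a value:

  SELECT pg_column_size('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'::uuid);
  -- 16
  SELECT pg_column_size('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'::text);
  -- 37 here: 36 characters plus a length header
)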




but this is very inefficient, as you can read here [1].
  



What on earth makes you assume that MySQL performance characteristics 
apply to PostgreSQL? If you want to show that our implementation is 
inefficient, you need to produce PostgreSQL performance tests to 
demonstrate it.


cheers

andrew




Re: [HACKERS] initdb in current cvs head broken?

2008-07-10 Thread Tom Lane
Sushant Sinha [EMAIL PROTECTED] writes:
 You are right. I did not do make clean last time. After make clean, make
 all, and make install it works fine. 

My ironclad rule for syncing with CVS is

make distclean
cvs update
reconfigure, rebuild

The cycles you save by taking shortcuts all vanish in trying to debug
the occasional problem.  Especially since that's human time, not machine
cycles (which can easily be overlapped with, say, reading mail).

Using configure --enable-depend (which is not the default) will prevent
certain categories of problems, but I don't trust it to solve them all.

regards, tom lane



Re: [HACKERS] WITH RECURSIVE updated to CVS TIP

2008-07-10 Thread Alvaro Herrera
Aidan Van Dyk wrote:
 * David Fetter [EMAIL PROTECTED] [080710 10:19]:

  2.  Allow people who are not currently committers on CVS HEAD to make
  needed changes.
 
 Uh, the point of git is it's distributed, so you don't need to be
 involved for them to do that

Yep.  People can already clone the master Pg trunk, and start from there
to build patches.  If they use their *private* repos for this, awesome
-- they have complete history.  If they want other developers to chime
in with further patches, they just need to publish their repos, and let
others clone them.  Then, they can pull from those other repos, or allow
others to push.

If you want to keep updating to trunk as it moves forward, I guess you'd
need to propagate the changes from trunk to your RECURSIVE repo.  And if
upstream changes the patch to fix some bug, you really need that bugfix
to show as a separate (and probably very small) patch.

Unapplying the patch and applying it back seems the worst way to
proceed.

Like Aidan, I think that trying to centralize the GIT repo is trying to
circumvent GIT's design ideas rather than working with them.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] CREATE CAST too strict?

2008-07-10 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Am Mittwoch, 9. Juli 2008 schrieb Peter Eisentraut:
 I propose that we relax these two checks to test for binary-coercibility
 instead, which is effectively what is expected and required here anyway.

 Here is the corresponding patch.

Looks good, but you might want to add a sentence to the documentation
pointing out that the function result type has to match the cast target
type; that seems not to be explicitly stated anywhere.

I also notice that the CREATE CAST page is still worded as if binary
compatible were a symmetric relationship, which it is not.  We probably
need to rework it a bit; but that's not really related to this patch.

regards, tom lane



Re: [HACKERS] WITH RECURSIVE updated to CVS TIP

2008-07-10 Thread David Fetter
On Thu, Jul 10, 2008 at 11:31:00AM -0400, Alvaro Herrera wrote:
 Aidan Van Dyk wrote:
  * David Fetter [EMAIL PROTECTED] [080710 10:19]:
 
   2.  Allow people who are not currently committers on CVS HEAD to
   make needed changes.
  
  Uh, the point of git is it's distributed, so you don't need to be
  involved for them to do that
 
 Yep.  People can already clone the master Pg trunk, and start from
 there to build patches.  If they use their *private* repos for this,
 awesome -- they have complete history.  If they want other
 developers to chime in with further patches, they just need to
 publish their repos,

Publishing those repos is easiest on git.postgresql.org.

 and let other clone them.  Then, they can pull from those other
 repos, or allow others to push.

Again, git.postgresql.org is good for this and other places are not
for reasons I've mentioned before.

 If you want to keep updating to trunk as it moves forward, I guess
 you'd need to propagate the changes from trunk to your RECURSIVE
 repo.  And if upstream changes the patch to fix some bug, you really
 need that bugfix to show as a separate (and probably very small)
 patch.
 
 Unapplying the patch and applying it back seems the worst way to
 proceed.

Fine.  I proceeded in ignorance and will fix.  I'm more than delighted
to start the whole thing over based on this.

 Like Aidan, I think that trying to centralize the GIT repo is trying to
 circumvent GIT's design ideas rather than working with them.

It's not about centralizing, but about letting a bunch of people
publish their changes to the same spot without being committers on the
Postgres project.

Cheers,
David.
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] UUID - Data type inefficient

2008-07-10 Thread Mark Mielke

Kless wrote:

The new data type, UUID, is stored as a string -char(16)-:


struct pg_uuid_t
{
unsigned char data[UUID_LEN];
};
#define UUID_LEN 16


but this is very inefficient, as you can read here [1].

The ideal would be to use bit(128), but today that isn't possible. One
possible solution would be to create a structure with 2 fields, each one
bit(64).


[1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/

  


That's a general page about UUID vs serial integers.

What is the complaint? Do you have evidence that it would be noticeably 
faster as two 64-bit fields? Note that a UUID is broken into several 
non-64-bit elements, and managing it as bytes, as 64-bit integers, or as a 
union with the bit lengths specified is probably equally efficient or 
inefficient depending on the operation being performed. The hope should 
be that the optimizer will generate similarly good code for each.


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]




Re: [HACKERS] UUID - Data type inefficient

2008-07-10 Thread Mark Mielke

Mark Mielke wrote:

Kless wrote:

The new data type, UUID, is stored as a string -char(16)-:
struct pg_uuid_t
{
unsigned char data[UUID_LEN];
};
#define UUID_LEN 16



What is the complaint? Do you have evidence that it would be 
noticeably faster as two 64-bits? Note that a UUID is broken into 
several non-64 bit elements, and managing it as bytes or 64-bit 
integers, or as a union with the bit-lengths specified, are probably 
all efficient or inefficient depending on the operation being 
performed. The hope should be that the optimizer will generate similar 
best code for each.


I didn't notice that he put 16. Now I'm looking at uuid.c in PostgreSQL 
8.3.3 and I see that it does use 16, and the struct pg_uuid_t is length 
16. I find myself confused now - why does PostgreSQL define UUID_LEN as 16?


I will investigate if I have time tonight. There MUST be some mistake or 
misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16.


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]




Re: [HACKERS] UUID - Data type inefficient

2008-07-10 Thread Tom Lane
Mark Mielke [EMAIL PROTECTED] writes:
 Kless wrote:
 [1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/

 That's a general page about UUID vs serial integers.

AFAICT the author of that page thinks that UUIDs are stored in ASCII
form (32 hex digits), which would indeed be inefficient.  I have no
idea whether he knows what he's talking about with respect to mysql,
but it's certainly 100% irrelevant to the Postgres datatype.

regards, tom lane



Re: [HACKERS] UUID - Data type inefficient

2008-07-10 Thread Mark Mielke

Mark Mielke wrote:
I didn't notice that he put 16. Now I'm looking at uuid.c in 
PostgreSQL 8.3.3 and I see that it does use 16, and the struct 
pg_uuid_t is length 16. I find myself confused now - why does 
PostgreSQL define UUID_LEN as 16?


I will investigate if I have time tonight. There MUST be some mistake 
or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16.


Kless, you've confused me.  32-bit numbers = 4 bytes, 64-bit 
numbers = 8 bytes, 128-bit numbers = 16 bytes.


You are out to lunch and you dragged me with you. Did we have beer at 
least? :-)


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]




Re: [HACKERS] WITH RECURSIVE updated to CVS TIP

2008-07-10 Thread Aidan Van Dyk
* David Fetter [EMAIL PROTECTED] [080710 11:34]:

  Yep.  People can already clone the master Pg trunk, and start from
  there to build patches.  If they use their *private* repos for this,
  awesome -- they have complete history.  If they want other
  developers to chime in with further patches, they just need to
  publish their repos,
 
 Publishing those repos is easiest on git.postgresql.org.

Obviously not, but let's hope that the current situation changes ;-)

But that has nothing to do with the original point of this repo
question WRT the recursive patch.

  and let other clone them.  Then, they can pull from those other
  repos, or allow others to push.
 
 Again, git.postgresql.org is good for this and other places are not
 for reasons I've mentioned before.
 
And I've not been convinced by them before either ;-)

 It's not about centralizing, but about letting a bunch of people
 publish their changes to the same spot without being committers on the
 Postgres project.

But thanks to GIT, that same spot is irrelevant ;-)

I do think that git.postgresql.org can have value.  But I think its
value is directly related (or more correctly *derived*) from the
work that's happening by developers *using* git, and the use of git for
development is a necessary prerequisite for publishing that development.
They don't even have to be publishing to git.postgresql.org for
git.postgresql.org to benefit from that development (because git is
distributed)!
 
If development isn't happening with git, and git.postgresql.org is only
a set of mirrors of CVS+patches, then I don't see *any* value of
git.postgresql.org. 

I'm pretty confident that git.postgresql.org won't *remain* valueless,
because I'm pretty confident that git really is a tool that many
developers will come to use...

But if you're new to git, *I* think your time would be better spent
doing actual development using git than trying to manage git
repositories and just mirror CVS+patches.  As you learn GIT, tracking
CVS+patches will be something you pretty much just do, and forget you're
actually doing.  Publishing repos will also become something you just
do and forget you're actually doing...

But I really do think you need to worry about *using* git before you
worry about *publishing* with git, especially if you're new to git and
still have fuzzy ideas from CVS as your idea/framework for that
publishing ;-)

But all that's only my opinion, so take it with a grain of salt, or
another pint ;-)

a.

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] UUID - Data type inefficient

2008-07-10 Thread Joshua D. Drake


On Thu, 2008-07-10 at 12:05 -0400, Mark Mielke wrote:
 Mark Mielke wrote:
  I didn't notice that he put 16. Now I'm looking at uuid.c in 
  PostgreSQL 8.3.3 and I see that it does use 16, and the struct 
  pg_uuid_t is length 16. I find myself confused now - why does 
  PostgreSQL define UUID_LEN as 16?
 
  I will investigate if I have time tonight. There MUST be some mistake 
  or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16.
 
 G Kless you've confused me.  32-bit numbers = 4 bytes, 64-bit 
 numbers = 8 bytes, 128-bit numbers = 16 bytes.
 
 You are out to lunch and you dragged me with you. Did we have beer at 
 least? :-)

Sounds like at least 4 and a couple of chasers.

 
 Cheers,
 mark
 
 -- 
 Mark Mielke [EMAIL PROTECTED]
 
 




Re: [HACKERS] UUID - Data type inefficient

2008-07-10 Thread David E. Wheeler

On Jul 10, 2008, at 09:13, Joshua D. Drake wrote:


You are out to lunch and you dragged me with you. Did we have beer at
least? :-)


Sounds like at least 4 and a couple of chasers.


Next time I'd like to be invited to the party, too! :-P

David



Re: [HACKERS] Protocol 3, Execute, maxrows to return, impact?

2008-07-10 Thread Stephen R. van den Berg
Tom Lane wrote:
Stephen R. van den Berg [EMAIL PROTECTED] writes:
 Then, from a client perspective, there is no use at all, because the
 client can actually pause reading the results at any time it wants,
 when it wants to avoid storing all of the result rows.  The network
 will perform the cursor/fetch facility for it.

[ shrug... ]  In principle you could write a client library that would
act that way, but I think you'll find that none of the extant ones
will hand back an incomplete query result to the application.

True.  But I have written one just now.  The language is called Pike,
it's a C/C++/Java lookalike.  And I start returning rows as they arrive,
and pause reading from the network when the application wants to pause.

A possibly more convincing argument is that with that approach, the
connection is completely tied up --- you cannot issue additional
database commands based on what you just read, nor pull rows from
multiple portals in an interleaved fashion.

Interleaved retrieval using multiple portals is not what most libraries
support, I'd guess.
It can be supported at the application layer using multiple cursors, but
that works with my approach as well.
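
(E.g., with explicit cursors; the table names here are hypothetical:

  BEGIN;
  DECLARE c1 CURSOR FOR SELECT * FROM big_a;
  DECLARE c2 CURSOR FOR SELECT * FROM big_b;
  FETCH 100 FROM c1;
  FETCH 100 FROM c2;   -- interleave fetches as needed
  COMMIT;
)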

In practice, most applications that need that open multiple
connections to the same database (I'd think).

The only thing I could imagine is that *if* at the server end, the
notifications that arrive during the retrieval of one long running
Execute, are queued *after* all the data, instead of inserted into
the datastream, then it might be worth doing it differently.

Incidentally, the nice thing about my library is that it automatically
handles in binary those arguments which are easily processed in binary
(TEXT/BYTEA and all those others I mentioned earlier).
It automatically uses binary for *both* arguments and
row results; i.e. in one row I can have both text and binary columns,
without the application needing to specify which is which.
-- 
Sincerely,
   Stephen R. van den Berg.

If you can't explain it to an 8-year-old, you don't understand it.



Re: [HACKERS] UUID - Data type inefficient

2008-07-10 Thread Kaare Rasmussen
 You are out to lunch and you dragged me with you. Did we have beer at
 least? :-)

A bit, and you had a byte of bread.

-- 

Med venlig hilsen
Kaare Rasmussen, Jasonic

Jasonic Telefon: +45 3816 2582
Nordre Fasanvej 12
2000 Frederiksberg  Email: [EMAIL PROTECTED]



Re: [HACKERS] UUID - Data type inefficient

2008-07-10 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Mark Mielke [EMAIL PROTECTED] writes:
 Kless wrote:
 [1] http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/

 That's a general page about UUID vs serial integers.

 AFAICT the author of that page thinks that UUIDs are stored in ASCII
 form (32 hex digits), which would indeed be inefficient.  

Well, he does say "In fact if you store UUID in binary form you can bring it
down to 16 bytes so size is not really the problem". Though I'm unclear why he
thinks a 4x increase in space usage is not really a problem. 

If you have a highly relational database you can easily have half or more of
your columns in large tables consisting of foreign keys. If your database is
I/O-bandwidth limited, that would be a huge effect.

 I have no idea whether he knows what he's talking about with respect to
 mysql, but it's certainly 100% irrelevant to the Postgres datatype.

The rest of it seems to be pretty mysql-specific. Some of the problems are
universal such as making index inserts especially random and making clustering
impossible, but how much they hurt on different databases is going to be very
different.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Protocol 3, Execute, maxrows to return, impact?

2008-07-10 Thread Abhijit Menon-Sen
(I don't really have much to add to the discussion here; I'm just
posting for the record on the question of client behaviour, since
I also wrote and maintain a client library in C++.)

At 2008-07-10 18:40:03 +0200, [EMAIL PROTECTED] wrote:

 I start returning rows as they arrive, and pause reading from the
 network when the application wants to pause.

My library also starts returning rows as they arrive, and in fact my
application makes heavy use of that feature. The data rows are read
from a non-blocking socket and the caller either does something for
each one, or waits until they've all arrived before proceeding.

 Interleaved retrieval using multiple portals is not what most
 libraries support, I'd guess.

My code did support that mode of operation in theory, but in practice
in the few situations where I have needed to use something like it, I
found it more convenient to open explicit cursors and FETCH from them
(but I usually needed this inside a transaction, and so did not open
multiple connections).

Thus my code always sets maxrows to 0 at the moment, and so...

 The only thing I could imagine is that *if*, at the server end, the
 notifications that arrive during the retrieval of one long-running
 Execute are queued *after* all the data instead of inserted into
 the datastream, then it might be worth doing it differently.

...I can't comment on this interesting observation.

 i.e. in one row I can have both text and binary columns, without the
 application needing to specify which is which.

Yes, that's nice. My first attempt to define an API for bind variables
set the data format to text by default and allowed it to be overridden,
but that was much too troublesome. Now the code decides by itself what
format is best to use for a given query.

(Again, though my library certainly supports mixing text and binary
format columns, my application has not needed to use this feature.)

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Protocol 3, Execute, maxrows to return, impact?

2008-07-10 Thread Gregory Stark
Stephen R. van den Berg [EMAIL PROTECTED] writes:

A possibly more convincing argument is that with that approach, the
connection is completely tied up --- you cannot issue additional
database commands based on what you just read, nor pull rows from
multiple portals in an interleaved fashion.

 Interleaved retrieval using multiple portals is not what most libraries
 support, I'd guess. It can be supported at the application layer using
 multiple cursors, but that works with my approach as well.

 In practice, most applications that need that, open multiple
 connections to the same database (I'd think).

Er? There's nothing particularly unusual about application logic like:

$sth->execute('huge select');
while ($sth->fetch('foreign_key')) {
  ... do some processing which is hard to do in a server-side language ...
  $sth->execute('insert resulting data');
}

Most drivers do support this kind of interface, but they may be reading the
entire result set for "huge select" in advance. However, if it's large enough
then this is only going to really work if you can start a new portal while the
outer portal is actually running on the backend. If the driver tries to cache
the whole result set the programmer will be sad.

Back when I was doing PHP programming and I discovered that PHP's Postgres
driver didn't support this I thought it was an outrageous bug. (It didn't help
that the behaviour was to misbehave randomly rather than throw a meaningful
error.)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Protocol 3, Execute, maxrows to return, impact?

2008-07-10 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Stephen R. van den Berg [EMAIL PROTECTED] writes:
 In practice, most applications that need that, open multiple
 connections to the same database (I'd think).

 Er? There's nothing particularly unusual about application logic like:

 $sth->execute('huge select');
 while ($sth->fetch('foreign_key')) {
   ... do some processing which is hard to do in a server-side language ...
   $sth->execute('insert resulting data');
 }

Moreover, there are often good reasons to do it all within one
transaction, which is impossible if you rely on a separate connection
to issue the inserts.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] digest

2008-07-10 Thread Fabrízio de Royes Mello

set pgsql-hackers digest

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITH RECURSIVE updated to CVS TIP

2008-07-10 Thread Abhijit Menon-Sen
At 2008-07-10 07:18:28 -0700, [EMAIL PROTECTED] wrote:

 Here are my objectives:
 
 1.  Make a repository that keeps up with CVS HEAD.
 
 2.  Allow people who are not currently committers on CVS HEAD to
 make needed changes.

OK. Then, to begin with, I think it is very important to make the
repository available via the git protocol. HTTP just won't cut it.

 It would be even nicer if we can put together a standard procedure
 for new patches.  Would you be willing to write it up?

The standard procedure for new patches would be the standard procedure
for *any* patches when you use git. You have a branch that tracks the
upstream (by which I mean the Postgres source) and a branch where you
work (apply individual changes), and you merge with the origin every
now and then (either in your working branch or in another branch).
And once you've published a branch, you try never to rebase it.

The "apply individual changes" part could be done by hand (git-apply,
git-commit), or by accepting individual patches via email (git-am) or
pulling from a remote repository, or by having others push into your
repository. It doesn't matter.

 Again, git.postgresql.org is good for this and other places are not
 for reasons I've mentioned before.

I haven't seen your reasons, but frankly, I would be suspicious of them
even if git.postgresql.org filled me with confidence, which it doesn't.
It seems to lag some way behind CVS and, as Alvaro pointed out earlier,
may be missing some patches. (I realise those might have been teething
troubles, and it may even be fixed now, but I just use the mirror on
repo.or.cz instead.)

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Protocol 3, Execute, maxrows to return, impact?

2008-07-10 Thread Gregory Stark
Abhijit Menon-Sen [EMAIL PROTECTED] writes:

 Interleaved retrieval using multiple portals is not what most
 libraries support, I'd guess.

 My code did support that mode of operation in theory, but in practice
 in the few situations where I have needed to use something like it, I
 found it more convenient to open explicit cursors and FETCH from them

Note that using FETCH for each record means a round trip to the server for
each record. If you're dealing with a lot of records that could be a lot
slower than streaming them to the client as quickly as it can consume them.

Now I'm not sure anyone's actually done any experiments to optimize libpq or
other drivers to stream data efficiently, so I'm not sure how much you would
really lose in practice today.
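
For the record, a sketch of the batched-FETCH variant in libpq, where each
round trip covers many rows rather than one. The cursor name, table name and
process_batch() helper are invented for illustration, and error handling is
elided.

#include <libpq-fe.h>

extern void process_batch(PGresult *res);   /* application-defined */

static void
fetch_in_batches(PGconn *conn)
{
    PGresult   *res;

    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn,
                   "DECLARE c NO SCROLL CURSOR FOR SELECT * FROM bigtable"));

    for (;;)
    {
        /* one round trip per 1000 rows instead of one per row */
        res = PQexec(conn, "FETCH 1000 FROM c");
        if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
        {
            PQclear(res);
            break;
        }
        process_batch(res);
        PQclear(res);
    }

    PQclear(PQexec(conn, "CLOSE c"));
    PQclear(PQexec(conn, "COMMIT"));
}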

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest: how does handoff work for non-committer reviewers?

2008-07-10 Thread Josh Berkus

Tom Lane wrote:

Josh Berkus [EMAIL PROTECTED] writes:
Well, one thing I think we want to do by having non-committer reviewers is 
to not involve a committer at all if the patch is going to be sent back.  
So one thing I was thinking of is:



1) change status to ready for committer
2) post message to -hackers detailing the review and calling for a 
committer to check the patch

3) a committer picks it up


Well, the key point there is just the sign-off in the review message.


On the wiki, or on -hackers?

--Josh

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] digest

2008-07-10 Thread A.M.


On Jul 10, 2008, at 1:20 PM, Fabrízio de Royes Mello wrote:


set pgsql-hackers digest


Postgresql hackers have been successfully digested. *burp*

-M
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UUID - Data type inefficient

2008-07-10 Thread Kless
On Jul 10, 5:05 pm, [EMAIL PROTECTED] (Mark Mielke) wrote:
 Mark Mielke wrote:
  I didn't notice that he put 16. Now I'm looking at uuid.c in
  PostgreSQL 8.3.3 and I see that it does use 16, and the struct
  pg_uuid_t is length 16. I find myself confused now - why does
  PostgreSQL define UUID_LEN as 16?

  I will investigate if I have time tonight. There MUST be some mistake
  or misunderstanding. 128-bit numbers should be stored as 8 bytes, not 16.

 Kless, you've confused me.  32-bit numbers = 4 bytes, 64-bit
 numbers = 8 bytes, 128-bit numbers = 16 bytes.

 You are out to lunch and you dragged me with you. Did we have beer at
 least? :-)

 Cheers,
 mark

 --
xD I see that the PostgreSQL developers have a sense of humor :) I
like it.

The mistake was mine. I asked about this on IRC; somebody pointed me to
that structure, but also told me to look here:

pgsql/src/backend/utils/adt/uuid.c:45:uuid_out

which I didn't do.

But it's clear that this problem has been resolved properly.
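
For the archives: the storage really is the minimal 16 bytes. The definition
in that file amounts to the following (paraphrased from the 8.3 sources, so
treat it as a sketch rather than a verbatim quote):

/* 128 bits = 16 bytes on disk */
#define UUID_LEN 16

struct pg_uuid_t
{
    unsigned char data[UUID_LEN];
};

So a uuid column costs 16 bytes, versus 36 characters plus varlena overhead
if the hex form "a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11" were stored as text.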


Greetings!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest: how does handoff work for non-committer reviewers?

2008-07-10 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Josh Berkus [EMAIL PROTECTED] writes:
 1) change status to ready for committer
 2) post message to -hackers detailing the review and calling for a 
 committer to check the patch
 3) a committer picks it up
 
 Well, the key point there is just the sign-off in the review message.

 On the wiki, or on -hackers?

On -hackers.  All the substantive stuff should be in the mail archives;
the wiki page is only a current-status display.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Generating code coverage reports

2008-07-10 Thread Michelle Caisse
I have a patch that I will be submitting to add to the build system the 
capability of reporting on test code coverage metrics for the test 
suite. Actually it can show coverage for any application run against 
PostgreSQL. Download Coverage.tar.gz 
(http://wiki.postgresql.org/wiki/Image:Coverage.tar.gz) to see an 
example report. Gunzip and un-tar the file and click on 
coverage/index.html. I had to delete most of the files to decrease the 
file size for upload, so only the links for "access" work.


gcov reports line, branch, and function coverage, but lcov only reports 
on line coverage. I've added to the html a link to the gcov output, which 
shows summary statistics for each file for line, branch, and function-call 
coverage.


The report gives a very clear and browseable view of what parts of the 
system might benefit from more extensive testing. It's obviously useful 
for planning future testing, but also can be used in conjunction with 
debugging to see what lines and functions are being exercised or missed 
by existing tests of the functionality under investigation. It could 
even be helpful to give a static view of lines hit by a bug test case in 
lieu of using a debugger. Also, when you're writing a unit test for new 
functionality, it would be good to check what you're actually hitting 
with the test.


It uses gcov together with gcc to generate the statistics, and the lcov 
suite to create the html report. Both of these would obviously have to 
be installed to get a coverage report, but this would be an optional 
feature of the build. It only works with gcc.


To generate coverage statistics, you run configure with 
--enable-coverage and, after building and running tests, you do "make 
coverage". The process generates data files in the same directories as 
source and object files and produces a coverage directory at the top 
level with the html files. I've also set it up so a tar file with the 
html is generated.


More information on gcov at http://gcc.gnu.org/onlinedocs/gcc/Gcov.html, 
lcov at http://ltp.sourceforge.net/documentation/how-to/ltp.php 
(coverage/lcov tabs).


-- Michelle

--
Michelle Caisse   Sun Microsystems
California, U.S. http://sun.com/postgresql




Re: [HACKERS] [PATCHES] WIP: executor_hook for pg_stat_statements

2008-07-10 Thread Tom Lane
ITAGAKI Takahiro [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] wrote:
 I don't want the tag there at all, much less converted to a pointer.
 What would the semantics be of copying the node, and why?
 
 Please justify why you must have this and can't do what you want some
 other way.

 In my pg_stat_statements plugin, the tag is used to cache hash values of
 SQL strings in PlannedStmt. It is not strictly needed, because the hash
 value can be recomputed from debug_query_string; it just avoids that
 work. In addition, we see different SQL in debug_query_string for
 PREPARE/EXECUTE and DECLARE/FETCH, and the hashed SQL cache still works
 for those commands.

Actually, that aspect of the plugin is 100% broken anyway, because it
assumes that debug_query_string has got something to do with the query
being executed.  There are any number of scenarios where this is a bad
assumption.

I wonder whether we ought to change things so that the real query
source text is available at the executor level.  Since we are (at least
usually) storing the query text in cached plans, I think this might just
require some API refactoring, not extra space and copying.  It would
amount to a permanent decision that we're willing to pay the overhead
of keeping the source text around, though.

Also, after looking at the patch more closely, was there a good reason
for making the hook intercept ExecutePlan rather than ExecutorRun?
ExecutePlan was never intended to have a stable public API --- its
argument list is just a happenstance of what ExecutorRun needs to
fetch for its own purposes.  I think we should keep it private and
have ExecutorRun do

    if (hook)
        hook(...);
    else
        standard_ExecutorRun(...);
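
If it helps, here is a hedged sketch of what the plugin side of that
arrangement might look like, assuming the hook variable ends up being called
ExecutorRun_hook with a matching typedef. All of these names are illustrative
only, since no such hook exists yet; QueryDesc, ScanDirection and
standard_ExecutorRun would come from the executor headers.

typedef void (*ExecutorRun_hook_type) (QueryDesc *queryDesc,
                                       ScanDirection direction,
                                       long count);

extern ExecutorRun_hook_type ExecutorRun_hook;

static ExecutorRun_hook_type prev_ExecutorRun_hook = NULL;

static void
my_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
{
    /* per-query bookkeeping (e.g. hashing the source text) goes here */

    if (prev_ExecutorRun_hook)
        prev_ExecutorRun_hook(queryDesc, direction, count);
    else
        standard_ExecutorRun(queryDesc, direction, count);
}

void
_PG_init(void)
{
    /* chain to any hook installed before us */
    prev_ExecutorRun_hook = ExecutorRun_hook;
    ExecutorRun_hook = my_ExecutorRun;
}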

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Jan Urbański

Tom Lane wrote:

=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= [EMAIL PROTECTED] writes:
Do you think it's worthwhile to implement the LC algorithm in C and send 
it out, so others could try it out? Heck, maybe it's worthwhile to 
replace the current compute_minimal_stats() algorithm with LC and see 
how that compares?


Very possibly.  I repeat that the current implementation of
compute_minimal_stats is very ad-hoc code and wasn't written with an eye
to high performance.  Replacing it with an algorithm that someone
actually thought about might well be worth doing.


Here's a patch that combines both patches included here:
http://archives.postgresql.org/message-id/[EMAIL PROTECTED]
and adds a C implementation of the Lossy Counting algorithm.

It defines two typanalyze functions: ts_typanalyze_std and 
ts_typanalyze_lc, so you can see what statistics are gathered by each of 
them. It's meant to be applied easily to HEAD, after which you update 
pg_type and run ANALYZE on a few tables with tsvectors (i.e. testing, 
not committing).


My observations are: the LC algorithm beats the previous one by a fairly 
large margin (20-30%) timewise. The results are almost identical; I got 
discrepancies of about 0.05 for some lexemes' frequencies. I intend to 
stick with LC for tsvectors, and that'll allow me to throw away the 
Dllist changes.


If I want to keep my GSoC schedule I won't be able to implement LC for 
general statistics gathering, but it's trivial. If no one gets around to 
it, I can do it after the Summer of Code (if only to see how it'll work).


Oh, one important thing. You need to choose a bucket width for the LC 
algorithm, that is, decide after how many elements you will prune your 
data structure. I chose to prune after every twenty tsvectors. You might 
consider:

 - picking some other arbitrary value
 - making it depend on the largest tsvector size
 - making it depend on the statistics_target
 - pruning after every X lexemes instead of after every Y tsvectors, 
because now the buckets will vary in width and you can argue that the 
order of input makes a difference again. OTOH the situation here is a 
bit different: you get streams of mutually different elements (lexemes 
inside a tsvector are all different) and pruning in the middle of such 
a stream might be unfair to lexemes that are still to be processed. 
Hmm, dunno.


Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin


gsoc08-tss-03-with-dllist.diff.gz
Description: application/gzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Generating code coverage reports

2008-07-10 Thread Tom Lane
Michelle Caisse [EMAIL PROTECTED] writes:
 I have a patch that I will be submitting to add to the build system the 
 capability of reporting on test code coverage metrics for the test 
 suite.

Cool.

 To generate coverage statistics, you run configure with 
 --enable-coverage and, after building and running tests, you do "make 
 coverage". The process generates data files in the same directories as 
 source and object files and produces a coverage directory at the top level 
 with the html files.

How does that work with a VPATH build?  Are you trying to say that you
still have to have the original build tree around in order to collect
coverage data?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Alvaro Herrera
Jan Urbański wrote:

 Oh, one important thing. You need to choose a bucket width for the LC  
 algorithm, that is, decide after how many elements you will prune your  
 data structure. I chose to prune after every twenty tsvectors.

Do you prune after X tsvectors regardless of the numbers of lexemes in
them?  I don't think that preserves the algorithm properties; if there's
a bunch of very short tsvectors and then long tsvectors, the pruning
would take place too early for the initial lexemes.  I think you should
count lexemes, not tsvectors.


-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Jan Urbański

Alvaro Herrera wrote:

Jan Urbański wrote:

Oh, one important thing. You need to choose a bucket width for the LC  
algorithm, that is, decide after how many elements you will prune your  
data structure. I chose to prune after every twenty tsvectors.


Do you prune after X tsvectors regardless of the numbers of lexemes in
them?  I don't think that preserves the algorithm properties; if there's
a bunch of very short tsvectors and then long tsvectors, the pruning
would take place too early for the initial lexemes.  I think you should
count lexemes, not tsvectors.


Yes, that's what I was afraid of. I'm not sure why I was reluctant to 
prune in the middle of a tsvector, maybe it's just in my head.
Still, there's a decision to be made: after how many lexemes should the 
pruning occur?


--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Jan Urbański wrote:
 Oh, one important thing. You need to choose a bucket width for the LC  
 algorithm, that is, decide after how many elements you will prune your  
 data structure. I chose to prune after every twenty tsvectors.

 Do you prune after X tsvectors regardless of the numbers of lexemes in
 them?  I don't think that preserves the algorithm properties; if there's
 a bunch of very short tsvectors and then long tsvectors, the pruning
 would take place too early for the initial lexemes.  I think you should
 count lexemes, not tsvectors.

Yeah.  I haven't read the Lossy Counting paper in detail yet, but I
suspect that the mathematical proof of limited error doesn't work if the
pruning is done on a variable spacing.  I don't see anything very wrong
with pruning intra-tsvector; the effects ought to average out, since the
point where you prune is going to move around with respect to the
tsvector boundaries.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= [EMAIL PROTECTED] writes:
 Still, there's a decision to be made: after how many lexemes should the 
 pruning occur?

The way I think it ought to work is that the number of lexemes stored in
the final pg_statistic entry is statistics_target times a constant
(perhaps 100).  I don't like having it vary depending on tsvector width
--- why for example should a column having a few wide tsvectors get a
bigger stats entry than one with many narrow ones?  (Not to mention the
issue of having to estimate the average or max width before you can
start the counting run.)

But in any case, given a target number of lexemes to accumulate,
I'd suggest pruning with that number as the bucket width (pruning
distance).   Or perhaps use some multiple of the target number, but
the number itself seems about right.  The LC paper says that the
bucket width w is equal to ceil(1/e) where e is the maximum frequency
estimation error, and that the maximum number of table entries needed
is log(eN)/e after N lexemes have been scanned.  For the values of e
and N we are going to be dealing with, this is likely to work out to
a few times 1/e, in other words the table size is a few times w.
(They prove it's at most 7w given reasonable assumptions about data
distribution, regardless of how big N gets; though I think our values
for N aren't large enough for that to matter.)
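
To make the mechanics concrete, here is a toy C sketch of that counting loop
(invented for illustration, not taken from the patch): each incoming lexeme
either increments its existing entry or is inserted with delta = bucket - 1,
and at every bucket boundary any entry with count + delta <= bucket is
pruned.

#include <string.h>
#include <stdbool.h>

#define MAXITEMS 1024           /* toy fixed-size table */

typedef struct
{
    const char *key;            /* the lexeme */
    int         count;          /* estimated frequency f */
    int         delta;          /* maximum possible undercount */
} LCEntry;

static LCEntry lc_table[MAXITEMS];
static int     lc_nentries = 0;

/*
 * Process the nseen'th element (1-based); w is the bucket width,
 * i.e. ceil(1/e) for error bound e.
 */
static void
lc_process(const char *key, long nseen, int w)
{
    int         bucket = (int) ((nseen - 1) / w) + 1;
    int         i,
                j;
    bool        found = false;

    for (i = 0; i < lc_nentries; i++)
    {
        if (strcmp(lc_table[i].key, key) == 0)
        {
            lc_table[i].count++;
            found = true;
            break;
        }
    }

    if (!found && lc_nentries < MAXITEMS)
    {
        /* a new element may have been missed in earlier buckets */
        lc_table[lc_nentries].key = key;
        lc_table[lc_nentries].count = 1;
        lc_table[lc_nentries].delta = bucket - 1;
        lc_nentries++;
    }

    if (nseen % w == 0)
    {
        /* bucket boundary: prune entries that cannot be frequent */
        for (i = 0, j = 0; i < lc_nentries; i++)
            if (lc_table[i].count + lc_table[i].delta > bucket)
                lc_table[j++] = lc_table[i];
        lc_nentries = j;
    }
}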

The existing compute_minimal_stats code uses a table size of twice the
target number of values, so setting w to maybe a half or a third of the
target number would reproduce the current space usage.  I don't see a
problem with letting it get a little bigger though, especially since we
can expect that the lexemes aren't very long.  (compute_minimal_stats
can't assume that for arbitrary data types...)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [WIP] collation support revisited (phase 1)

2008-07-10 Thread Radek Strnad
Hi,

after long discussion with Mr. Kotala, we've decided to redesign our
collation support proposal.
For those of you who aren't familiar with my WIP patch and comments from
other hackers, here's the original mail:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg00019.php

In a few sentences - I'm writing collation support for PostgreSQL that is
almost independent of the collating function used. I will implement POSIX
locales, but a switch to ICU will be quite easy. Collations and character sets
defined by the SQL standard will be hard-coded, so we avoid depending on their
presence in any particular collating function.

The whole project will be divided into two phases:

phase 1
Implement a sort of framework so that PostgreSQL will have the basic guts
(pg_collation & pg_charset catalogs, CREATE COLLATION, collation support
added for each type that needs it) and will support collation at the database
level. This phase has been accepted as a Google Summer of Code project.

phase 2
Implement the rest - full collation at column level. I will continue working
on this after finishing phase one, and it will be my master's thesis.

How will the first part work?

Catalogs
- new catalogs pg_collation and pg_charset will be defined
- pg_collation and pg_charset will contain the SQL standard collations, plus
an optional default collation (when one other than the SQL standard ones is
set)
- pg_type, pg_attribute, pg_namespace will be extended with references to
the default records in pg_collation and pg_charset

initdb
- pg_collation & pg_charset will contain the pre-defined records required by
the SQL standard, plus optionally one non-standard record set at initdb time
(the one using system locales)
- these two records will be referenced by pg_type, pg_attribute,
pg_namespace in the relevant columns and will be considered the default
collation, which will be inherited

CREATE DATABASE ... COLLATE ...
- after copying, the new database's collation will either be the default
(same as the cluster collation) or changed by the COLLATE clause; then we
update the pg_type, pg_attribute and pg_namespace catalogs
- reindex the database

When switching databases, the database collation will be retrieved from the
type text entry in pg_type. This part should be the only one that will be
deleted when proceeding with phase 2. But that will take a while :-)

Thanks for all your comments

Regards

Radek Strnad


Re: [HACKERS] Generating code coverage reports

2008-07-10 Thread Michelle Caisse
It should be possible to make it work for a VPATH build with 
appropriate arguments to gcov and lcov, but currently it expects the 
object files and generated data files to be in the build directory.


You need access to the build tree to generate coverage statistics and to 
generate the report with "make coverage" after running the tests or 
application.


-- Michelle

Tom Lane wrote:

Michelle Caisse [EMAIL PROTECTED] writes:

I have a patch that I will be submitting to add to the build system the 
capability of reporting on test code coverage metrics for the test 
suite.

Cool.

To generate coverage statistics, you run configure with 
--enable-coverage and, after building and running tests, you do "make 
coverage". The process generates data files in the same directories as 
source and object files and produces a coverage directory at the top 
level with the html files.

How does that work with a VPATH build?  Are you trying to say that you
still have to have the original build tree around in order to collect
coverage data?

regards, tom lane

--
Michelle Caisse   Sun Microsystems
California, U.S. http://sun.com/postgresql




Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Jan Urbański

Tom Lane wrote:

The way I think it ought to work is that the number of lexemes stored in
the final pg_statistic entry is statistics_target times a constant
(perhaps 100).  I don't like having it vary depending on tsvector width


I think the existing code puts at most statistics_target elements in a 
pg_statistic tuple. In compute_minimal_stats() num_mcv starts with 
stats->attr->attstattarget and is adjusted only downwards.
My original thought was to keep that property for tsvectors (i.e. store 
at most statistics_target lexemes) and advise people to set it high for 
their tsvector columns (e.g. 100x their default).
Also, the existing code decides which elements are worth storing as most 
common ones by discarding those that are not frequent enough (that's 
where num_mcv can get adjusted downwards). I mimicked that for lexemes 
but maybe it just doesn't make sense?



But in any case, given a target number of lexemes to accumulate,
I'd suggest pruning with that number as the bucket width (pruning
distance).   Or perhaps use some multiple of the target number, but
the number itself seems about right. 


Fine with me, I'm too tired to do the math now, so I'll take your word 
for it :)


Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding variables for segment_size, wal_segment_size and block sizes

2008-07-10 Thread Tom Lane
Abhijit Menon-Sen [EMAIL PROTECTED] writes:
 At 2008-07-03 16:36:02 +0200, [EMAIL PROTECTED] wrote:
 Here's a patch for this.

 I reviewed the patch, it basically looks fine. A few quibbles with the
 provided documentation:

Applied, with ams' doc changes and some further wordsmithing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 The way I think it ought to work is that the number of lexemes stored in
 the final pg_statistic entry is statistics_target times a constant
 (perhaps 100).  I don't like having it vary depending on tsvector width

 I think the existing code puts at most statistics_target elements in a 
 pg_statistic tuple. In compute_minimal_stats() num_mcv starts with 
 stats->attr->attstattarget and is adjusted only downwards. 
 My original thought was to keep that property for tsvectors (i.e. store 
 at most statistics_target lexemes) and advise people to set it high for 
 their tsvector columns (e.g. 100x their default).

Well, (1) the normal measure would be statistics_target *tsvectors*,
and we'd have to translate that to lexemes somehow; my proposal is just
to use a fixed constant instead of tsvector width as in your original
patch.  And (2) storing only statistics_target lexemes would be
uselessly small and would guarantee that people *have to* set a custom
target on tsvector columns to get useful results.  Obviously broken
defaults are not my bag.

 Also, the existing code decides which elements are worth storing as most 
 common ones by discarding those that are not frequent enough (that's 
 where num_mcv can get adjusted downwards). I mimicked that for lexemes 
 but maybe it just doesn't make sense?

Well, that's not unreasonable either, if you can come up with a
reasonable definition of "not frequent enough"; but that adds another
variable to the discussion.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, text search selectivity and dllist enhancments

2008-07-10 Thread Oleg Bartunov

On Wed, 9 Jul 2008, Jan Urbański wrote:


Jan Urbański wrote:



Do you think it's worthwhile to implement the LC algorithm in C and send it 
out, so others could try it out? Heck, maybe it's worthwhile to replace the 
current compute_minimal_stats() algorithm with LC and see how that compares?


Teodor and I are using LC for phrase estimation in one application, and 
from our understanding of the original paper this algorithm might not be
good for sampling, since all the theory behind it was about streaming the 
FULL data. As for the technique, we use a suffix tree, which should be 
fine for a typical sample size.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Postgres 8.1 doesn't like pg_standby's -l option

2008-07-10 Thread Gurjeet Singh
I was trying to set up warm standby for an 8.1.11 instance, and was using
pg_standby's -l option so that it creates links and does not actually copy
files. After struggling for a few hours, I found two problems; one big, one
small.

The smaller issue is that even if we do not end the restore_command's value
with a ' (quote), the restore process still runs just fine!

The bigger issue: Postgres seems not to recognize links as valid files, even
though pg_standby was successfully able to restore them! I did not dig into
the issue any deeper, so I cannot say whether it was a misbehaving stat() or
whether Postgres 8.1 did not know how to handle symlinks.

Just an FYI for all, and for the archives.

Best regards,
-- 
[EMAIL PROTECTED]
[EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB  http://www.enterprisedb.com

Mail sent from my BlackLaptop device