[HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma
Hi all,

Just a curiosity I couldn't control. I was recently reading about
fractal tree indexing
(http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/) and
how the TokuDB engine for MySQL is really working nicely with big data.

I was wondering, do we have support for fractal tree indexing? I mean,
it really does seem to help manage big data, so we could think of
supporting it in some form for our large data set clients (if it is
not already happening someplace that I have missed).

Regards,

Atri

--
Regards,

Atri
l'apprenant




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 09:46, Kyotaro HORIGUCHI wrote:

In this case, the FINAL consistency point is at the
XLOG_SMGR_TRUNCATE record, but the current implementation does not
record the consistency point (checkpoint, commit, or smgr_truncate)
itself, so we cannot predict the final consistency point at the
start of recovery.


Hmm, what you did was basically:

1. Run server normally.
2. Kill it with pg_ctl stop -m immediate.
3. Create a recovery.conf file, turning the server into a hot standby.

Without step 3, the server would perform crash recovery, and it would 
work. But because of the recovery.conf file, the server goes into 
archive recovery, and because minRecoveryPoint is not set, it assumes 
that the system is consistent from the start.


Aside from the immediate issue with truncation, the system really isn't 
consistent until the WAL has been replayed far enough, so it shouldn't 
open for hot standby queries. There might be other, later, changes 
already flushed to data files. The system has no way of knowing how far 
it needs to replay the WAL to become consistent.


At least in back-branches, I'd call this a pilot error. You can't turn a 
master into a standby just by creating a recovery.conf file. At least 
not if the master was not shut down cleanly first.


If there's a use case for doing that, maybe we can do something better 
in HEAD. If the control file says that the system was running 
(DB_IN_PRODUCTION), but there is a recovery.conf file, we could do crash 
recovery first, until we reach the end of WAL, and go into archive 
recovery mode after that. We'd recover all the WAL files in pg_xlog as 
far as we can, same as in crash recovery, and only start restoring files 
from the archive once we reach the end of WAL in pg_xlog. At that point, 
we'd also consider the system as consistent, and start up for hot standby.


I'm not sure that's worth the trouble, though. Perhaps it would be 
better to just throw an error if the control file state is 
DB_IN_PRODUCTION and a recovery.conf file exists. The admin can always 
start the server normally first, shut it down cleanly, and then create 
the recovery.conf file.
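
A minimal sketch of that check, as it might sit near the top of recovery; the placement and the recoveryConfExists() helper (just a stat() of recovery.conf in the data directory) are assumptions for illustration, not part of any posted patch:

	/* Refuse to enter archive recovery from a crashed master. */
	if (ControlFile->state == DB_IN_PRODUCTION && recoveryConfExists())
		ereport(FATAL,
				(errmsg("recovery.conf exists but the database was not shut down cleanly"),
				 errhint("Start the server normally, shut it down cleanly, "
						 "and then create the recovery.conf file.")));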



On the other hand, updating the control file on every commit or
smgr_truncate would slow down the transactions..


To be precise, we'd need to update the control file on every 
XLogFlush(), like we do during archive recovery. That would indeed be 
unacceptable from a performance point of view. Updating the control file 
that often would also be bad for robustness.


- Heikki




[HACKERS] Btrfs clone WIP patch

2013-02-13 Thread Jonathan Rogers
This patch against PostgreSQL 9.1.8 takes advantage of efficient file
cloning on Linux Btrfs file systems to make CREATE DATABASE operations
extremely fast regardless of the size of the database used as a
template. On my system, I can create a database from a multi-gibibyte
template in a second or less. This is very useful for automated testing
as well as in a development environment where reverting to a baseline
database is frequently required. As an added bonus, newly created
databases require very little additional disk storage until they diverge
from the template.

The efficient cloning is accomplished by a Btrfs-specific ioctl() call.
On non-Linux systems or if the ioctl() call fails, file contents are
copied in the conventional way, so no configuration is needed. This has
been tested on a Linux system on both Btrfs and XFS file systems, as well
as on an OS X system.

The clone_file() function was originally copied from GNU coreutils, which
is under GPL v3. The function is currently only about ten lines long and
contains little essential information beyond the magic values needed for
the ioctl() call, so I'm not sure if the license is a problem.
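
For illustration, here is a minimal standalone program using the same ioctl() the patch relies on; the magic values are taken from the patch below, everything else is just scaffolding:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>

/* Same magic values as in the patch; Linux/Btrfs only. */
#define BTRFS_IOCTL_MAGIC 0x94
#define BTRFS_IOC_CLONE _IOW(BTRFS_IOCTL_MAGIC, 9, int)

int
main(int argc, char **argv)
{
	int		src_fd, dest_fd;

	if (argc != 3)
	{
		fprintf(stderr, "usage: %s source dest\n", argv[0]);
		return 1;
	}
	src_fd = open(argv[1], O_RDONLY);
	dest_fd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (src_fd < 0 || dest_fd < 0)
	{
		perror("open");
		return 1;
	}
	/* O(1) reflink: dest shares source's extents until one side is modified */
	if (ioctl(dest_fd, BTRFS_IOC_CLONE, src_fd) < 0)
	{
		perror("ioctl(BTRFS_IOC_CLONE)");
		return 1;
	}
	close(src_fd);
	close(dest_fd);
	return 0;
}

On a file system that doesn't support the operation the ioctl() fails (typically with ENOTTY or EINVAL), which is exactly the case where the patch falls back to a conventional copy.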
-- 
Jonathan Ross Rogers
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index 6cfb816..719a5c1 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -22,6 +22,10 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
+#ifdef HAVE_SYS_IOCTL_H
+#include <sys/ioctl.h>
+#endif
+
 #include "storage/copydir.h"
 #include "storage/fd.h"
 #include "miscadmin.h"
@@ -139,6 +143,24 @@ copydir(char *fromdir, char *todir, bool recurse)
 }
 
 /*
+ * Perform the O(1) btrfs clone operation, if possible.
+ * Upon success, return 0.  Otherwise, return -1.
+ */
+static inline int
+clone_file (int dest_fd, int src_fd)
+{
+#ifdef __linux__
+# define BTRFS_IOCTL_MAGIC 0x94
+# define BTRFS_IOC_CLONE _IOW (BTRFS_IOCTL_MAGIC, 9, int)
+	return ioctl (dest_fd, BTRFS_IOC_CLONE, src_fd);
+#else
+	(void) dest_fd;
+	(void) src_fd;
+	return -1;
+#endif
+}
+
+/*
  * copy one file
  */
 void
@@ -150,11 +172,6 @@ copy_file(char *fromfile, char *tofile)
 	int			nbytes;
 	off_t		offset;
 
-	/* Use palloc to ensure we get a maxaligned buffer */
-#define COPY_BUF_SIZE (8 * BLCKSZ)
-
-	buffer = palloc(COPY_BUF_SIZE);
-
 	/*
 	 * Open the files
 	 */
@@ -171,38 +188,54 @@ copy_file(char *fromfile, char *tofile)
 (errcode_for_file_access(),
  errmsg("could not create file \"%s\": %m", tofile)));
 
-	/*
-	 * Do the data copying.
-	 */
-	for (offset = 0;; offset += nbytes)
+	if (clone_file (dstfd, srcfd) == 0)
+		ereport(DEBUG1, (errmsg("Cloned \"%s\" to \"%s\".", fromfile, tofile)));
+
+	else
 	{
-		/* If we got a cancel signal during the copy of the file, quit */
-		CHECK_FOR_INTERRUPTS();
+		/*
+		 * Do the data copying.
+		 */
 
-		nbytes = read(srcfd, buffer, COPY_BUF_SIZE);
-		if (nbytes < 0)
-			ereport(ERROR,
-	(errcode_for_file_access(),
-	 errmsg("could not read file \"%s\": %m", fromfile)));
-		if (nbytes == 0)
-			break;
-		errno = 0;
-		if ((int) write(dstfd, buffer, nbytes) != nbytes)
+		/* Use palloc to ensure we get a maxaligned buffer */
+#define COPY_BUF_SIZE (8 * BLCKSZ)
+
+		buffer = palloc(COPY_BUF_SIZE);
+
+		ereport(DEBUG1, (errmsg("Copying \"%s\" to \"%s\" in userspace.",
+fromfile, tofile)));
+		for (offset = 0;; offset += nbytes)
 		{
-			/* if write didn't set errno, assume problem is no disk space */
-			if (errno == 0)
-errno = ENOSPC;
-			ereport(ERROR,
-	(errcode_for_file_access(),
-	 errmsg("could not write to file \"%s\": %m", tofile)));
+			/* If we got a cancel signal during the copy of the file, quit */
+			CHECK_FOR_INTERRUPTS();
+
+			nbytes = read(srcfd, buffer, COPY_BUF_SIZE);
+			if (nbytes < 0)
+ereport(ERROR,
+		(errcode_for_file_access(),
+		 errmsg("could not read file \"%s\": %m", fromfile)));
+			if (nbytes == 0)
+break;
+			errno = 0;
+			if ((int) write(dstfd, buffer, nbytes) != nbytes)
+			{
+/* if write didn't set errno, assume problem is no disk space */
+if (errno == 0)
+	errno = ENOSPC;
+ereport(ERROR,
+		(errcode_for_file_access(),
+		 errmsg("could not write to file \"%s\": %m", tofile)));
+			}
+
+			/*
+			 * We fsync the files later but first flush them to avoid spamming the
+			 * cache and hopefully get the kernel to start writing them out before
+			 * the fsync comes.
+			 */
+			pg_flush_data(dstfd, offset, nbytes);
 		}
 
-		/*
-		 * We fsync the files later but first flush them to avoid spamming the
-		 * cache and hopefully get the kernel to start writing them out before
-		 * the fsync comes.
-		 */
-		pg_flush_data(dstfd, offset, nbytes);
+		pfree(buffer);
 	}
 
 	if (close(dstfd))
@@ -212,7 +245,6 @@ copy_file(char *fromfile, char *tofile)
 
 	close(srcfd);
 
-	pfree(buffer);
 }
 
 



Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 11:01, Atri Sharma wrote:

Hi all,

Just a curiosity I couldn't control. I was recently reading about
Fractal tree indexing
(http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/) and
how the TokuDB engine for MySQL is really working nicely with big data.


Hmm, sounds very similar to the GiST buffering build work Alexander 
Korotkov did for 9.2. Only the buffers are for B-trees rather than GiST, 
and the buffers are permanent, rather than used only during index build. 
It's also somewhat similar to the fast insert mechanism in GIN, except 
that the gin fast insert buffer is just a single buffer, rather than a 
buffer at each node.



I was wondering, do we have support for fractal tree indexing? I mean,
it really does seem to help manage big data, so we could think of
supporting it in some form for our large data set clients (if it is
not already happening someplace that I have missed).


There are no fractal trees in PostgreSQL today. Patches are welcome ;-).

- Heikki




Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma
On Wed, Feb 13, 2013 at 3:08 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 13.02.2013 11:01, Atri Sharma wrote:

 Hi all,

 Just a curiosity I couldn't control. I was recently reading about
 Fractal tree indexing
 (http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/) and
 how the TokuDB engine for MySQL is really working nicely with big data.


 Hmm, sounds very similar to the GiST buffering build work Alexander Korotkov
 did for 9.2. Only the buffers are for B-trees rather than GiST, and the
 buffers are permanent, rather than used only during index build. It's also
 somewhat similar to the fast insert mechanism in GIN, except that the gin
 fast insert buffer is just a single buffer, rather than a buffer at each
 node.


 I was wondering, do we have support for fractal tree indexing? I mean,
 it really does seem to help manage big data, so we could think of
 supporting it in some form for our large data set clients (if it is
 not already happening someplace that I have missed).


 There are no fractal trees in PostgreSQL today. Patches are welcome ;-).

 - Heikki

 Hi Heikki,

Yeah, it is pretty close to GiST but, as you said, it still works on B-trees.

On the other hand, one thing I really liked about fractal trees is
that they attempt to address the problems with B-trees. I feel fractal
trees can provide us with a whole new way to handle new data,
rather than building on top of B-trees.

I would love to chip in, but I would require lots of help :)

Do you think building a new index in Postgres with fractal trees as
the basis would serve the purpose? Or is there something else we
should think of?

Atri

-- 
Regards,

Atri
l'apprenant




Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Alexander Korotkov
On Wed, Feb 13, 2013 at 1:38 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 13.02.2013 11:01, Atri Sharma wrote:

 Hi all,

  Just a curiosity I couldn't control. I was recently reading about
 Fractal tree indexing
  (http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/)
 and
  how the TokuDB engine for MySQL is really working nicely with big data.


 Hmm, sounds very similar to the GiST buffering build work Alexander
 Korotkov did for 9.2. Only the buffers are for B-trees rather than GiST,
 and the buffers are permanent, rather than used only during index build.
 It's also somewhat similar to the fast insert mechanism in GIN, except that
 the gin fast insert buffer is just a single buffer, rather than a buffer at
 each node.


  I was wondering, do we have support for fractal tree indexing? I mean,
 it really does seem to help manage big data, so we could think of
  supporting it in some form for our large data set clients (if it is
  not already happening someplace that I have missed).


 There are no fractal trees in PostgreSQL today. Patches are welcome ;-).


I remember we have already discussed fractal trees privately. In short, the
conclusions are:
1) Fractal tree indexes are patented. They are distributed as a commercial
extension to MySQL, so we can't include them in the PostgreSQL core.
2) Tokutek can't provide full-fledged fractal tree indexes as a PostgreSQL
extension because of the lack of WAL extensibility.
We could think about WAL extensibility, which would help other applications
as well.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma
On Wed, Feb 13, 2013, Alexander Korotkov wrote:


 I remember we have already discussed fractal trees privately. In short,
 the conclusions are:
 1) Fractal tree indexes are patented. They are distributed as a commercial
 extension to MySQL, so we can't include them in the PostgreSQL core.
 2) Tokutek can't provide full-fledged fractal tree indexes as a PostgreSQL
 extension because of the lack of WAL extensibility.
 We could think about WAL extensibility, which would help other applications
 as well.


Sounds nice. WAL extensibility can help.

Atri



--
Regards,

Atri
l'apprenant




Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Greg Stark
On Wed, Feb 13, 2013 at 10:19 AM, Atri Sharma atri.j...@gmail.com wrote:
 2) Tokutek can't provide full-fledged fractal tree indexes as a PostgreSQL
 extension because of the lack of WAL extensibility.
 We could think about WAL extensibility, which would help other applications
 as well.


 Sounds nice. WAL extensibility can help.

The problem with WAL extensibility is that extensions can come and go
and change over time. If the database can't interpret some WAL record,
or misinterprets it because a module is missing or has changed since that
record was written, then you could lose your whole database. I think a
fundamental part of extensibility is isolating the effects of the
extensions from the rest of the system, so that problem would have to
be tackled. Perhaps by making each file owned by a single resource
manager and having the database be able to deal with individual files
being corrupted. But that doesn't deal with all record types, and there
are situations where you really want part of a file to contain
data managed by another resource manager.

Heikki was talking about a generic WAL record type that would just
store a binary delta between the version of the block when it was
locked and when it was unlocked. That would handle any extension
cleanly as far as data modification goes as long as the extension was
working through our buffer manager. It seems like an attractive idea
to me.
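
A rough sketch of what computing such a delta could look like; everything here (names, chunk layout) is hypothetical, just to make the idea concrete:

#include <stdint.h>
#include <string.h>

typedef struct
{
	uint16_t	offset;		/* start of a changed run within the page */
	uint16_t	length;		/* number of changed bytes */
} DeltaChunk;

/*
 * Diff the page image captured when the buffer was locked against the
 * image at unlock time.  Emits one chunk per changed run; the new bytes
 * go into "payload" in chunk order.  Redo would memcpy() each chunk's
 * payload back at chunk->offset.
 */
static int
page_delta(const uint8_t *before, const uint8_t *after, size_t pagesz,
		   DeltaChunk *chunks, uint8_t *payload)
{
	int			nchunks = 0;
	size_t		i = 0;
	size_t		out = 0;

	while (i < pagesz)
	{
		size_t		start;

		if (before[i] == after[i])
		{
			i++;
			continue;
		}
		/* extend the run while bytes keep differing */
		start = i;
		while (i < pagesz && before[i] != after[i])
			i++;
		chunks[nchunks].offset = (uint16_t) start;
		chunks[nchunks].length = (uint16_t) (i - start);
		memcpy(payload + out, after + start, i - start);
		out += i - start;
		nchunks++;
	}
	return nchunks;
}

int
main(void)
{
	uint8_t		before[8192] = {0};
	uint8_t		after[8192] = {0};
	DeltaChunk	chunks[16];
	uint8_t		payload[8192];

	memcpy(after + 100, "hello", 5);	/* simulate one in-page change */
	/* expect: 1 chunk, offset 100, length 5 */
	return page_delta(before, after, sizeof(before), chunks, payload) == 1 ? 0 : 1;
}

The appeal is that the extension never writes a WAL record of its own; the core system records the before/after difference mechanically, so a missing or changed module can't produce an uninterpretable record.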

-- 
greg




Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-02-13 Thread Amit Kapila
On Tuesday, February 12, 2013 9:02 PM Andres Freund wrote:
 On 2013-02-12 20:19:43 +0530, Amit Kapila wrote:
  On Tuesday, February 12, 2013 4:55 PM Andres Freund wrote:
   On 2013-02-12 14:57:51 +0530, Amit Kapila wrote:
On Tuesday, February 12, 2013 11:24 AM Boszormenyi Zoltan wrote:
 This mail lists this order for the single file approach:

  1) exclusive lock
  2) reload config file (to update in memory structures)
  3) check new variable
  4) write config file (needs to be done atomically)
  5) optionally reload config file
  6) release lock

 The patch does it this way:
 1. check new variable. No problem with that, validation for proper
 GUC name, type of the value, etc. can be done outside the lock.
 2. grab lock
 3. parse the config file
 4. write the new config file
 5. release lock
  
   1) You need to grab the lock before the value is checked since some
   variables are interdependent (e.g. log_statement_stats, wal_level,
   archive_mode) and thus the check needs to be made after preventing
   concurrent changes.
 
  This can happen if we do any SIGHUP after the command; otherwise it
  will have the old value only.
 
 Yes, and that's a problem imo.
 
 SET PERSISTENT log_statement_stats = true;
 restart;
 SET PERSISTENT log_statement_stats = false;
 SET PERSISTENT log_parser_stats = true;
 ERROR...
 
 that's why I think the config file needs to be processed.
 
   2) You need to apply the current config file for exactly the same
   reason before checking the new value. Also
   validate_conf_option/find_option doesn't work appropriately without
   an up2date config file. E.g. CURRENT does absurd things without it,
   but I think there are other cases as well.
 
  At this moment, I am not able to think through this; could you
  explain with a small example?
 
 The most trivial one I can think of is:
 
 Transaction A: SET PERSISTENT blub = 'bar';
 Transaction B: SET PERSISTENT blub FROM CURRENT;
 
  I am thinking, shall we remove the check hook function and do other
  validation only, as this will be done at the time of reload, similar
  to what happens when the user manually edits the postgresql.conf file.
 
 Why? The user isn't editing the file by hand for a reason.
 
   I am not saying it's impossible to do the one-file approach
   correctly, I just think it's harder while not being more useful.
  
 Reloading the config file is intentionally not done, it's even
 documented.  You can do SELECT pg_reload_conf() after SET PERSISTENT
 if you need it.
  
   Not being done and it being documented as not doing so doesn't make
   it correct :P
   I think a SET having no immediate results is confusing. Especially
   if I am right and we do need to apply previous config changes before
   doing the next SET. But I don't have *that* strong feelings about it.
 
  I don't think any such expectation should be there, as with this
  feature (SET PERSISTENT) we will allow the user to change the
  settings in the file with a command instead of manually editing
  the file.
 
 I don't see why that follows. The users *do* want something different,
 otherwise they would hand-edit the file.
 
  Specifically, LWLockAcquire() is called first, then ParseConfigFp()
  in a PG_TRY() block, so reading the original postgresql.auto.conf
  is serialized. No chance to lose changes done in parallel.
  
   Not a fundamental issue, but I just noticed LWLockRelease isn't
   called in the PG_CATCH branch in set_config_file. There's also an
   ereport() which needs to be moved into the PG_TRY to avoid exiting
   with a held lock.
 
  I think rollback/abort of the transaction due to the error will
  handle release of locks.
 
 Yes, in a proper transaction abort this is done but for a utility
 command it might be possible to get there without a StartTransaction
 being done. I don't immediately see how, but I wouldn't want to rely on
 it, especially as doing it is simple.

We cannot directly call LWLockRelease in the catch block, as it will lead
to an Assert failure.
The reason is that errfinish will set InterruptHoldOffCount to 0, and
then, when LWLockRelease calls RESUME_INTERRUPTS, the check on
InterruptHoldOffCount leads to the assertion failure.

To handle it, we would need to call HOLD_INTERRUPTS before it, similar to
LWLockReleaseAll().
So, since we are not sure of the scenario, I am not sure we should add
such calls in the catch block.
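
In code, the pattern that would be needed is roughly this (a sketch only,
not the actual patch; "lock" stands for whatever LWLock set_config_file
holds):

	PG_TRY();
	{
		/* ... parse and rewrite the config file ... */
	}
	PG_CATCH();
	{
		/*
		 * errfinish has reset InterruptHoldOffCount to 0 by now, so take
		 * a fresh hold before releasing, the way LWLockReleaseAll() does;
		 * otherwise the RESUME_INTERRUPTS inside LWLockRelease asserts.
		 */
		HOLD_INTERRUPTS();
		LWLockRelease(lock);
		PG_RE_THROW();
	}
	PG_END_TRY();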

   I think you also forgot to adjust copyfuncs.c et al for your
   VariableSetStmt change (addition of is_persistent).
 
  It is there in _copyVariableSetStmt() function.
 
 Oh, sorry, skipped over it somehow.
 
   What do you mean by "framing" a variable? Surrounding it by ""?
 
  Sorry, I am not able to find "framing" in quotes.
 
 The quotes were just there to quote the word ;). I was referring to the
 following comment:
 
 + /*
 +  * The auto.conf.d directory should follow the postgresql.conf
 file
 +  

Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma


Sent from my iPad

On 13-Feb-2013, at 18:21, Greg Stark st...@mit.edu wrote:
 
 
 Heikki was talking about a generic WAL record type that would just
 store a binary delta between the version of the block when it was
 locked and when it was unlocked. That would handle any extension
 cleanly as far as data modification goes as long as the extension was
 working through our buffer manager. It seems like an attractive idea
 to me.
 
 
How do we handle the case you mentioned, maybe a module that has been removed 
since a record was made? Is the solution to insulate the WAL from those 
kinds of changes, and keep the WAL records the same for everyone, irrespective of 
whether they use an external module or not? (I inferred this from Heikki's 
idea, or am I missing something here?)

Atri



Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-02-13 Thread Heikki Linnakangas

On 04.01.2013 10:42, Alexander Korotkov wrote:

/*
 * Calculate selectivity of && operator using histograms of range lower bounds
 * and histogram of range lengths.
 */
static double
calc_hist_selectivity_overlap(TypeCacheEntry *typcache, RangeBound *lower,
RangeBound *upper, RangeBound 
*hist_lower, int hist_nvalues,
Datum 
*length_hist_values, int length_hist_nvalues)


We already have code to estimate &&, based on the lower and upper bound 
histograms:



case OID_RANGE_OVERLAP_OP:
case OID_RANGE_CONTAINS_ELEM_OP:
/*
 * A && B = NOT (A << B OR A >> B).
 *
 * range @> elem is equivalent to range && 
[elem,elem]. The
 * caller already constructed the singular range from 
the element
 * constant, so just treat it the same as &&.
 */
hist_selec =
calc_hist_selectivity_scalar(typcache, 
const_lower, hist_upper,

 nhist, false);
hist_selec +=
(1.0 - calc_hist_selectivity_scalar(typcache, 
const_upper, hist_lower,

  nhist, true));
hist_selec = 1.0 - hist_selec;
break;


I don't think the method based on lower bound and length histograms is 
any better. In fact, my gut feeling is that it's less accurate. I'd 
suggest dropping that part of the patch.


- Heikki




Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Alexander Korotkov
On Wed, Feb 13, 2013 at 4:51 PM, Greg Stark st...@mit.edu wrote:

 Heikki was talking about a generic WAL record type that would just
 store a binary delta between the version of the block when it was
 locked and when it was unlocked. That would handle any extension
 cleanly as far as data modification goes as long as the extension was
 working through our buffer manager. It seems like an attractive idea
 to me.


It will, for sure, work well when atomic page changes are enough for us.
However, some operations, for example page splits, contain changes to
multiple pages. Replaying changes to only some of the pages is not safe. For
now, it's hard for me to imagine how to generalize it into a generic WAL
record type.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 15:31, Alexander Korotkov wrote:

On Wed, Feb 13, 2013 at 4:51 PM, Greg Stark st...@mit.edu wrote:


Heikki was talking about a generic WAL record type that would just
store a binary delta between the version of the block when it was
locked and when it was unlocked. That would handle any extension
cleanly as far as data modification goes as long as the extension was
working through our buffer manager. It seems like an attractive idea
to me.


It will, for sure, work well when atomic page changes are enough for us.
However, some operations, for example page splits, contain changes to
multiple pages. Replaying changes to only some of the pages is not safe. For
now, it's hard for me to imagine how to generalize it into a generic WAL
record type.


You could have a generic WAL record that applies changes to multiple 
pages atomically.
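
Roughly, such a record could carry a list of per-page deltas that redo
applies as one unit; a hypothetical layout in plain C, just to illustrate
(none of these names exist in the tree):

#include <stdint.h>

/* one entry per modified page */
typedef struct
{
	uint32_t	spc_node;	/* tablespace */
	uint32_t	db_node;	/* database */
	uint32_t	rel_node;	/* relation */
	uint32_t	blkno;		/* block within the relation */
	uint16_t	delta_len;	/* bytes of delta payload that follow */
} MultiPageDeltaEntry;

/* record header: npages entries follow back to back, each with payload */
typedef struct
{
	uint16_t	npages;
} MultiPageDeltaRecord;

/*
 * Redo sketch: pin and lock all npages buffers first, apply every delta,
 * then release, so a half-done split is never visible.
 */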


- Heikki




Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma


Sent from my iPad

On 13-Feb-2013, at 19:05, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 On 13.02.2013 15:31, Alexander Korotkov wrote:
 On Wed, Feb 13, 2013 at 4:51 PM, Greg Stark st...@mit.edu wrote:
 
 Heikki was talking about a generic WAL record type that would just
 store a binary delta between the version of the block when it was
 locked and when it was unlocked. That would handle any extension
 cleanly as far as data modification goes as long as the extension was
 working through our buffer manager. It seems like an attractive idea
 to me.
 
 It will, for sure, work well when atomic page changes are enough for us.
 However, some operations, for example page splits, contain changes to
 multiple pages. Replaying changes to only some of the pages is not safe. For
 now, it's hard for me to imagine how to generalize it into a generic WAL
 record type.
 
 You could have a generic WAL record that applies changes to multiple pages 
 atomically.
 
 

Sounds extremely interesting and fun. How would we go about implementing it?

Atri



Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-02-13 Thread Alexander Korotkov
On Wed, Feb 13, 2013 at 5:28 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 04.01.2013 10:42, Alexander Korotkov wrote:

 /*
  * Calculate selectivity of && operator using histograms of range lower
 bounds
  * and histogram of range lengths.
  */
 static double
 calc_hist_selectivity_overlap(TypeCacheEntry *typcache, RangeBound
 *lower,
 RangeBound *upper, RangeBound
 *hist_lower, int hist_nvalues,
 Datum
 *length_hist_values, int length_hist_nvalues)


 We already have code to estimate &&, based on the lower and upper bound
 histograms:

  case OID_RANGE_OVERLAP_OP:
 case OID_RANGE_CONTAINS_ELEM_OP:
 /*
  * A && B = NOT (A << B OR A >> B).
  *
  * range @> elem is equivalent to range &&
 [elem,elem]. The
  * caller already constructed the singular range
 from the element
  * constant, so just treat it the same as &&.
  */
 hist_selec =
 calc_hist_selectivity_scalar(typcache,
 const_lower, hist_upper,

nhist, false);
 hist_selec +=
 (1.0 - 
 calc_hist_selectivity_scalar(typcache,
 const_upper, hist_lower,

 nhist, true));
 hist_selec = 1.0 - hist_selec;
 break;


 I don't think the method based on lower bound and length histograms is any
 better. In fact, my gut feeling is that it's less accurate. I'd suggest
 dropping that part of the patch.


Right. This estimation has the accuracy of the histogram, while the estimation
based on lower bound and length histograms relies on an additional assumption
about the independence of lower bound and length. We can sum the A << B and
A >> B probabilities because they are mutually exclusive. It's pretty evident,
but I would like to mention it in the comments, because the typical assumption
about events in statistics calculations is their independence.
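
To make the arithmetic concrete with made-up numbers: if the histograms give
P(A << B) = 0.30 and P(A >> B) = 0.25, then, because a pair of ranges can
satisfy at most one of the two operators,

    P(A && B) = 1 - (0.30 + 0.25) = 0.45

If the two events were merely independent rather than mutually exclusive,
simple addition would overstate the combined probability and the product
rule would be needed instead.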

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Simon Riggs
On 13 February 2013 13:35, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 You could have a generic WAL record that applies changes to multiple pages
 atomically.

I think it's a good idea, the best idea even, but we still have no idea
what the requirements are without a clear case for an external index.
It could easily turn out that we invent a plausible API that's not
actually of use because of requirements for locking. Whoever wants
that can do the legwork.

IIRC each of the new index types has required some changes to the
generic APIs, which makes sense.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma


Sent from my iPad

On 13-Feb-2013, at 19:31, Simon Riggs si...@2ndquadrant.com wrote:
.
 
 I think it's a good idea, the best idea even, but we still have no idea
 what the requirements are without a clear case for an external index.
 It could easily turn out that we invent a plausible API that's not
 actually of use because of requirements for locking. Whoever wants
 that can do the legwork.
 
 IIRC each of the new index types has required some changes to the
 generic APIs, which makes sense.
 
 

Does that mean we can add support for fractal tree indexes (or something along 
similar lines) in the regular way, by changing the generic APIs?

IMO, we could design the fractal tree index and use it as the use case for 
the generic WAL record. (I am kind of obsessed with the idea of seeing fractal 
indexes supported in Postgres.)

Atri



Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Andrew Dunstan


On 02/13/2013 09:13 AM, Atri Sharma wrote:


Sent from my iPad

On 13-Feb-2013, at 19:31, Simon Riggs si...@2ndquadrant.com wrote:
.

I think it's a good idea, the best idea even, but we still have no idea
what the requirements are without a clear case for an external index.
It could easily turn out that we invent a plausible API that's not
actually of use because of requirements for locking. Whoever wants
that can do the legwork.

IIRC each of the new index types has required some changes to the
generic APIs, which makes sense.



Does that mean we can add support for fractal tree indexes (or something 
along similar lines) in the regular way, by changing the generic APIs?

IMO, we could design the fractal tree index and use it as the use 
case for the generic WAL record. (I am kind of obsessed with the idea of 
seeing fractal indexes supported in Postgres.)




If they are patented as Alexander says upthread, then surely the idea is 
dead in the water.


cheers

andrew




Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma
 
 
 
 If they are patented as Alexander says upthread, then surely the idea is dead 
 in the water.
 
 
True, I think so too.

But,the generic WAL seems an awesome idea and I would love to help.

Atri



Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Craig Ringer
On 02/13/2013 10:43 PM, Andrew Dunstan wrote:

 On 02/13/2013 09:13 AM, Atri Sharma wrote:

 Sent from my iPad

 On 13-Feb-2013, at 19:31, Simon Riggs si...@2ndquadrant.com wrote:
 .
  I think it's a good idea, the best idea even, but we still have no idea
 what the requirements are without a clear case for an external index.
 It could easily turn out that we invent a plausible API that's not
 actually of use because of requirements for locking. Whoever wants
 that can do the legwork.

 IIRC each of the new index types has required some changes to the
 generic APIs, which makes sense.


  Does that mean we can add support for fractal tree indexes (or something
  along similar lines) in the regular way, by changing the generic APIs?

  IMO, we could design the fractal tree index and use it as the use
  case for the generic WAL record. (I am kind of obsessed with the idea of
  seeing fractal indexes supported in Postgres.)


 If they are patented as Alexander says upthread, then surely the idea
 is dead in the water.
Isn't practically everything patented, with varying degrees of validity
and patent defensibility? Particularly the trick of renewing expired
patents by submitting tiny variations.

I realise that this general situation is different to knowing about a
specific patent applying to a specific proposed technique, particularly
regarding the USA's insane triple-damages-for-knowing-about-it thing,
and that a patent can well and truly block the adoption of a technique
into Pg core. It might not prevent its implementation as an out-of-tree
extension though, even if that requires some enhancements to core APIs
to make it possible.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Tom Lane
Atri Sharma atri.j...@gmail.com writes:
 Do you think building a new index in postgres with fractal trees as
 the basis would serve the purpose? or is there something else we
 should think of?

First explain why you couldn't build it as an opclass for gist or
spgist ...

regards, tom lane




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 At least in back-branches, I'd call this a pilot error. You can't turn a 
 master into a standby just by creating a recovery.conf file. At least 
 not if the master was not shut down cleanly first.
 ...
 I'm not sure that's worth the trouble, though. Perhaps it would be 
 better to just throw an error if the control file state is 
 DB_IN_PRODUCTION and a recovery.conf file exists.

+1 for that approach, at least until it's clear there's a market for
doing this sort of thing.  I think the error check could be
back-patched, too.

regards, tom lane




[HACKERS] pg_xlogdump

2013-02-13 Thread Alvaro Herrera
Here's an updated version of pg_xlogdump.  This is rebased on top of the
committed xlogreader, palloc restructuring and libpgcommon, PG_RMGR
stuff, and is basically a revamped version of what Andres submitted in
http://www.postgresql.org/message-id/1357672187-7693-5-git-send-email-and...@2ndquadrant.com

I also attach a patch to move the relpathbackend() function to
src/common.  On top of that, it's trivial to change pg_xlogdump and
remove the unimplemented stub we're currently using.  (I also tried it
with a real implementation of relpath() that was mostly a duplicate of
the backend's relpath(), but I didn't like the duplication at all even
though I stripped the unnecessary bits).

(The more adventurous might think about moving timestamp_to_str to
src/common, as well, but this isn't a simple task: it depends on some
backend global state variables such as GUC vars, so it requires detailed
surgery.  I think it's a worthy goal nonetheless, because it'd allow us
to reduce useless duplication in ECPG, but it's not a 9.3 project)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
*** /dev/null
--- b/contrib/pg_xlogdump/Makefile
***************
*** 0 ****
--- 1,32 ----
+ # contrib/pg_xlogdump/Makefile
+ 
+ PGFILEDESC = pg_xlogdump
+ PGAPPICON=win32
+ 
+ PROGRAM = pg_xlogdump
+ OBJS = pg_xlogdump.o compat.o xlogreader.o rmgrdesc.o \
+ 	$(RMGRDESCOBJS) $(WIN32RES)
+ 
+ RMGRDESCSOURCES = $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc.c))
+ RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
+ 
+ EXTRA_CLEAN = $(RMGRDESCSOURCES) xlogreader.c rmgrdesc.c
+ 
+ ifdef USE_PGXS
+ PG_CONFIG = pg_config
+ PGXS := $(shell $(PG_CONFIG) --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/pg_xlogdump
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
+ 
+ override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+ 
+ rmgrdesc.c xlogreader.c: % : $(top_srcdir)/src/backend/access/transam/%
+ 	rm -f $@ && $(LN_S) $< .
+ 
+ $(RMGRDESCSOURCES): % : $(top_srcdir)/src/backend/access/rmgrdesc/%
+ 	rm -f $@ && $(LN_S) $< .
*** /dev/null
--- b/contrib/pg_xlogdump/compat.c
***************
*** 0 ****
--- 1,83 ----
+ /*-
+  *
+  * compat.c
+  *		Reimplementations of various backend functions.
+  *
+  * Portions Copyright (c) 2012, PostgreSQL Global Development Group
+  *
+  * IDENTIFICATION
+  *		contrib/pg_xlogdump/compat.c
+  *
+  * This file contains client-side implementations for various backend
+  * functions that the rm_desc functions in *desc.c files rely on.
+  *
+  *-
+  */
+ 
+ /* ugly hack, same as in e.g pg_controldata */
+ #define FRONTEND 1
+ #include "postgres.h"
+ 
+ #include "catalog/catalog.h"
+ #include "catalog/pg_tablespace.h"
+ #include "common/fe_memutils.h"
+ #include "datatype/timestamp.h"
+ #include "lib/stringinfo.h"
+ #include "storage/relfilenode.h"
+ 
+ const char *
+ timestamptz_to_str(TimestampTz dt)
+ {
+ 	return "unimplemented-timestamp";
+ }
+ 
+ /*
+  * Table of fork names.
+  *
+  * needs to be synced with src/backend/catalog/catalog.c
+  */
+ const char *forkNames[] = {
+ 	"main",		/* MAIN_FORKNUM */
+ 	"fsm",		/* FSM_FORKNUM */
+ 	"vm",		/* VISIBILITYMAP_FORKNUM */
+ 	"init"		/* INIT_FORKNUM */
+ };
+ #define FORKNAMECHARS	4
+ 
+ /*
+  * relpathbackend
+  *
+  * A pg_xlogdump implementation of the backend function of the same name.
+  * This one doesn't accept non-permanent relfilenodes.
+  */
+ char *
+ relpathbackend(RelFileNode rnode, BackendId backend, ForkNumber forknum)
+ {
+ 	return pg_strdup("unimplemented-relpath");
+ }
+ 
+ /*
+  * Provide a hacked up compat layer for StringInfos so xlog desc functions can
+  * be linked/called.
+  */
+ void
+ appendStringInfo(StringInfo str, const char *fmt, ...)
+ {
+ 	va_list		args;
+ 
+ 	va_start(args, fmt);
+ 	vprintf(fmt, args);
+ 	va_end(args);
+ }
+ 
+ void
+ appendStringInfoString(StringInfo str, const char *string)
+ {
+ 	appendStringInfo(str, "%s", string);
+ }
+ 
+ void
+ appendStringInfoChar(StringInfo str, char ch)
+ {
+ 	appendStringInfo(str, "%c", ch);
+ }
*** /dev/null
--- b/contrib/pg_xlogdump/pg_xlogdump.c
***************
*** 0 ****
--- 1,674 ----
+ /*-
+  *
+  * pg_xlogdump.c - decode and display WAL
+  *
+  * Copyright (c) 2012, PostgreSQL Global Development Group
+  *
+  * IDENTIFICATION
+  *		  contrib/pg_xlogdump/pg_xlogdump.c
+  *-
+  */
+ 
+ #define FRONTEND 1
+ #include "postgres.h"
+ 
+ #include <unistd.h>
+ 
+ #include "rmgrdesc.h"
+ 
+ #include "access/xlog.h"
+ #include "access/xlogreader.h"
+ #include "access/transam.h"
+ #include "catalog/catalog.h"
+ #include "common/fe_memutils.h"
+ #include "getopt_long.h"
+ 
+ 
+ static 

Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used

2013-02-13 Thread Pavel Stehule
Hello

2013/2/13 Dean Rasheed dean.a.rash...@gmail.com:
 On 11 February 2013 14:29, Pavel Stehule pavel.steh...@gmail.com wrote:
 Hello

 updated patch

 * merged Dean's doc
 * allow NULL as width


 Hi,
 I have not had time to look at this properly, but it doesn't look as
 though you have fixed the other problem I mentioned up-thread, with %s
 for NULL values:

 SELECT format('|%s|', NULL);
 Result: ||
 SELECT format('|%5s|', NULL);
 Result: ||

 In the second case, I think it should produce |     |.

fixed

Regards

Pavel Stehule


 Regards,
 Dean


format-width-20130213.patch
Description: Binary data



Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks

2013-02-13 Thread Kohei KaiGai
Sorry for my late updates.

I tried to update the list of permissions that sepgsql expects, even though
the description might still be a bit rough...
https://wiki.postgresql.org/wiki/SEPostgreSQL_Permissions

A set of permissions is defined for each object class that represents
a particular database object. This list summarizes all the defined
permissions and introduces the cases in which each shall be checked.

Right now, the list of permissions is based on the latest selinux
policy release from 20120725, but a db_materialized_view class will
(probably) be added in a future release somewhere in 2013.
So, I added a short mention of this.

My 2/3 and 3/3 patches try to add support for the search permission of
the db_schema class and the execute permission of the db_procedure class.
They try to implement the relevant checks, which are not supported yet.

Does the permission list help to understand what these
patches try to tackle?

Thanks,

2013/1/29 Simon Riggs si...@2ndquadrant.com:
 On 29 January 2013 14:39, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 2013/1/29 Simon Riggs si...@2ndquadrant.com:
 On 29 January 2013 13:30, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 It makes it impossible to control execution of
 functions from the viewpoint of selinux, and there is no way for selinux
 to prevent execution of functions defined by other domains, or
 by others not being permitted.
 Also, what we want to do is almost the same as the existing permission
 checks, except for the criteria used to make the access control decision.

 Do you have a roadmap of all the things this relates to?

 If selinux has a viewpoint, I'd like to be able to see a list of
 capabilities and then which ones are currently missing. I guess I'm
 looking for external assurance that someone somewhere needs this and
 that it fits into a complete overall plan of what we should do. Just
 like we are able to use SQLStandard as a guide as to what we need to
 implement, we would like something to refer back to. Does this have a
 request id, specification document page number or whatever?

 I previously made several wiki pages as a reference for the permissions
 to be checked, but they need maintenance work to reflect the latest
 state, such as newly added permissions.
   http://wiki.postgresql.org/wiki/SEPostgreSQL_References

 Even though selinuxproject.org hosts a permission list, it is rougher
 than what I described at wiki.postgresql.org.
   
 http://www.selinuxproject.org/page/ObjectClassesPerms#Database_Object_Classes

 Unlike the SQL standard, we have fewer resources to document a spec
 validated by third parties. However, it is a reasonable solution
 to write up which permission shall be checked at which time.

 Let me revise the above wikipage to show my overall plan.

 OK, that's looking like a good and useful set of info.

 What we need to do is to give the SELinux API a spec/version number
 (yes, the SELinux one) and then match what PostgreSQL implements
 against that, so we can say we are moving towards spec compliance with
 1.0 and we have a list of unimplemented features...

 That puts this in a proper context, so we know what we are doing, why
 we are doing it and also when we've finished it. And also, how to know
 what future external changes will cause additional work.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



-- 
KaiGai Kohei kai...@kaigai.gr.jp




[HACKERS] Re: ports/174020: initdb will not run on a mounted FS with a snapshot directory

2013-02-13 Thread Allen Landsidel

Forgive the top-reply.

After reading the discussion, I'm in favor of the philosophically 
correct approach rather than the usually-technically-correct approach.  
That is, display the warning, but let the sysadmin/dba do what they 
need/want to do and trust that, most of the time, they know what they're 
doing.


As the most interesting man might say, I don't always have PGDATA as a 
mount point for another filesystem, but when I do...


On 2/6/2013 15:29, Kubilay Kocak wrote:

I had a chat with a couple of the folks on IRC (#PostgreSQL/FreeNode)
to get their feedback, and seanc passed it on to the upstream mailing
lists with a slightly more refined version of the patch ignoring all
dot dirs. The thread is here for reference:

http://www.postgresql.org/message-id/a37dc83f-75aa-4850-8261-ad16b104f...@chittenden.org







Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma


Sent from my iPad

On 13-Feb-2013, at 20:30, Tom Lane t...@sss.pgh.pa.us wrote:

 
 First explain why you couldn't build it as an opclass for gist or
 spgist ...
 
 

That needs thinking about a bit. I was confused about the current indexes 
because they all build on B-trees. But building an opclass with GiST should be a 
good solution.

Atri



Re: [HACKERS] JSON Function Bike Shedding

2013-02-13 Thread Merlin Moncure
On Tue, Feb 12, 2013 at 1:18 PM, David E. Wheeler da...@justatheory.com wrote:

couple other suggestions:

 Existing Name  Proposed Name
 -- 
 json_array_length() array_length() or length() or size()

very much prefer without 'array_' prefix as this leads to semantic
confusion with our (generally badly named) array manipulation API.
So, length() -- also I see no reason why this can't be made to run if
the outermost container is an object but that's an acceptable
implementation detail.

 json_each() each_json()

why not each().  Assumption to return json is reasonable and doesn't
need decoration IMO.

 json_each_as_text() each_text()

like this

 json_get()  get_json()

prefer get()

 json_get_as_text()  get_text()

like this

 json_get_path() get_json()

get() please

 json_get_path_as_text() get_text()

like this

 json_object_keys()  get_keys()

like this

 json_populate_record()  record() or row()

to_record()

 json_populate_recordset()   records() or rows()

to_recordset()

 json_unnest()   get_values()

greatly prefer unwrap()

 json_agg()  collect_json()

prefer to leave as json_agg() -- we have string_agg, array_agg, etc.

merlin




Re: [HACKERS] pg_xlogdump

2013-02-13 Thread Andres Freund
On 2013-02-13 12:09:37 -0300, Alvaro Herrera wrote:
 Here's an updated version of pg_xlogdump.  This is rebased on top of the
 committed xlogreader, palloc restructuring and libpgcommon, PG_RMGR
 stuff, and is basically a revamped version of what Andres submitted in
 http://www.postgresql.org/message-id/1357672187-7693-5-git-send-email-and...@2ndquadrant.com

A few tiny followup bits I had fixed since:
* one copy-and-paste-o in an error message
* replace stupid directory verification implementation
* fix include in compat.c to include utils/timestamp.h instead of
  datatype/

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/contrib/pg_xlogdump/compat.c b/contrib/pg_xlogdump/compat.c
index a3c98c6..dc54bad 100644
--- a/contrib/pg_xlogdump/compat.c
+++ b/contrib/pg_xlogdump/compat.c
@@ -21,7 +21,7 @@
 #include "catalog/catalog.h"
 #include "catalog/pg_tablespace.h"
 #include "common/fe_memutils.h"
-#include "datatype/timestamp.h"
+#include "utils/timestamp.h"
 #include "lib/stringinfo.h"
 #include "storage/relfilenode.h"
 
diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index c2ed1b8..9d8597f 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -12,6 +12,7 @@
 #define FRONTEND 1
 #include postgres.h
 
+#include <dirent.h>
 #include <unistd.h>
 
 #include "rmgrdesc.h"
@@ -64,19 +65,16 @@ fatal_error(const char *fmt,...)
 }
 
 /*
- * Check whether directory exists and whether we can open it.
- * errno is kept set so that the caller can report errors.
+ * Check whether directory exists and whether we can open it. Keep errno set so
+ * that the caller can report errors somewhat more accurately.
  */
 static bool
 verify_directory(const char *directory)
 {
-	int			fd = open(directory, O_DIRECTORY | O_RDONLY);
-
-	if (fd < 0)
+	DIR *dir = opendir(directory);
+	if (dir == NULL)
 		return false;
-
-	close(fd);
-	errno = 0;	/* ignore errors in this case */
+	closedir(dir);
 	return true;
 }
 
@@ -560,7 +558,7 @@ main(int argc, char **argv)
 		else if (!XLByteInSeg(private.startptr, segno))
 		{
 			fprintf(stderr,
-	"%s: end log position %X/%X is not inside file \"%s\"\n",
+	"%s: start log position %X/%X is not inside file \"%s\"\n",
 	progname,
	(uint32) (private.startptr >> 32),
 	(uint32) private.startptr,



Re: [HACKERS] Setting visibility map in VACUUM's second phase

2013-02-13 Thread Heikki Linnakangas

On 03.02.2013 08:24, Pavan Deolasee wrote:

On Sun, Feb 3, 2013 at 2:31 AM, Jeff Janes jeff.ja...@gmail.com wrote:

I've attached a patch with these changes made.  Does this look OK?


Looks good to me. I also repeated pgbench and make check and they work
as expected. I'll add it to the CF and also mark the patch ready for
committer.


Looks good to me too. Committed, thanks.

- Heikki




Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 17:49, Atri Sharma wrote:

On 13-Feb-2013, at 20:30, Tom Lane t...@sss.pgh.pa.us wrote:


First explain why you couldn't build it as an opclass for gist or
spgist ...


That needs thinking about a bit. I was confused about the current indexes 
because they all build on B-trees. But building an opclass with GiST should be a 
good solution.


That makes no sense. I don't see any way to implement this in an 
opclass, and it wouldn't make sense to re-implement this for every 
opclass anyway.


The basic idea of a fractal tree index is to attach a buffer to every 
non-leaf page. On insertion, instead of descending all the way down to 
the correct leaf page, the new tuple is put on the buffer at the root 
page. When that buffer fills up, all the tuples in the buffer are 
cascaded down to the buffers on the next level pages. And recursively, 
whenever a buffer fills up at any level, it's flushed to the next level. 
This speeds up insertions, as you don't need to fetch and update the 
right leaf page on every insert; the lower-level pages are updated in 
batch as a buffer fills up.
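
As a toy illustration of that cascade (self-contained, fixed fanout, no page
splits or on-disk format, so purely schematic):

#include <stdio.h>
#include <stdlib.h>

#define FANOUT		4
#define BUF_CAP		8
#define LEAF_CAP	128

typedef struct Node Node;
struct Node
{
	int			is_leaf;
	int			nbuf;				/* pending inserts (internal nodes) */
	int			buf[BUF_CAP];
	int			sep[FANOUT - 1];	/* separator keys (internal nodes) */
	Node	   *child[FANOUT];
	int			nkeys;				/* stored keys (leaves) */
	int			keys[LEAF_CAP];
};

static Node *
child_for(Node *node, int key)
{
	int			i = 0;

	while (i < FANOUT - 1 && key >= node->sep[i])
		i++;
	return node->child[i];
}

static void
insert(Node *node, int key)
{
	if (node->is_leaf)
	{
		node->keys[node->nkeys++] = key;
		return;
	}
	/* park the tuple in this node's buffer instead of descending */
	node->buf[node->nbuf++] = key;
	if (node->nbuf == BUF_CAP)
	{
		/* buffer full: cascade everything one level down */
		for (int i = 0; i < node->nbuf; i++)
			insert(child_for(node, node->buf[i]), node->buf[i]);
		node->nbuf = 0;
	}
}

int
main(void)
{
	Node		leaves[FANOUT] = {{0}};
	Node		root = {0};

	for (int i = 0; i < FANOUT; i++)
		leaves[i].is_leaf = 1;
	root.sep[0] = 25;
	root.sep[1] = 50;
	root.sep[2] = 75;
	for (int i = 0; i < FANOUT; i++)
		root.child[i] = &leaves[i];

	for (int k = 0; k < 100; k++)
		insert(&root, rand() % 100);

	/* whatever arrived after the last flush is still parked in the root */
	printf("%d keys pending in root buffer\n", root.nbuf);
	return 0;
}

In a real implementation the buffers live on the index pages themselves, and
a search has to check the buffers on its way down; the toy leaves that part
out.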


As I said earlier, this is very similar to the way the GiST buffering 
build algorithm works. It could be applied to any tree-structured access 
method, including b-tree, GiST and SP-GiST.


- Heikki




[HACKERS] Re: [COMMITTERS] pgsql: Add noreturn attributes to some error reporting functions

2013-02-13 Thread Peter Eisentraut
On 2/12/13 7:19 AM, Andres Freund wrote:
On 2013-02-12 12:14:06 +0000, Peter Eisentraut wrote:
 Add noreturn attributes to some error reporting functions
 
 I wonder if it's time to add a macro for this instead of slapping
 __attribute__((noreturn)) everywhere. That way msvc would have a chance of
 adding __declspec(noreturn) or whatever its magic incantation is
 transparently. And perhaps other compilers in the future.

Sure, if someone who has access to both msvc and gcc wanted to put in
the leg work, it would be worth a try.

One possible problem is whether the placement of these modifiers is the
same across all potential compilers.  But we currently use these in
enough places that it would be easy to find that out by just converting
everything and building it.
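
A quick sketch of what such a macro could look like (the PG_NORETURN name is
hypothetical; only the two compilers named above are handled):

/* Prefix form works for both compilers when placed before the declaration. */
#if defined(_MSC_VER)
#define PG_NORETURN __declspec(noreturn)
#elif defined(__GNUC__) || defined(__clang__)
#define PG_NORETURN __attribute__((noreturn))
#else
#define PG_NORETURN
#endif

/* usage */
PG_NORETURN void fatal_error(const char *fmt, ...);

Whether every declaration form accepts the modifier in that position is
exactly the placement question above, so it would still need the
convert-everything-and-build test.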







Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma


Sent from my iPad
.
 
 That makes no sense. I don't see any way to implement this in an opclass, and 
 it wouldn't make sense to re-implement this for every opclass anyway.
 
 The basic idea of a fractal tree index is to attach a buffer to every 
 non-leaf page. On insertion, instead of descending all the way down to the 
 correct leaf page, the new tuple is put on the buffer at the root page. When 
 that buffer fills up, all the tuples in the buffer are cascaded down to the 
 buffers on the next level pages. And recursively, whenever a buffer fills up 
 at any level, it's flushed to the next level. This speeds up insertions, as 
 you don't need to fetch and update the right leaf page on every insert; the 
 lower-level pages are updated in batch as a buffer fills up.
 
 As I said earlier, this is very similar to the way the GiST buffering build 
 algorithm works. It could be applied to any tree-structured access method, 
 including b-tree, GiST and SP-GiST.
 

Can we implement it in a generic manner then? I mean, irrespective of the tree 
it is being applied to, be it B-tree, GiST or SP-GiST?

Another thing: in the case of a large tree which is split over multiple pages, how 
do we reduce the cost of I/O to fetch and rewrite all those pages?

Atri



Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 The basic idea of a fractal tree index is to attach a buffer to every 
 non-leaf page. On insertion, instead of descending all the way down to 
 the correct leaf page, the new tuple is put on the buffer at the root 
 page. When that buffer fills up, all the tuples in the buffer are 
 cascaded down to the buffers on the next level pages. And recursively, 
 whenever a buffer fills up at any level, it's flushed to the next level. 

[ scratches head... ]  What's fractal about that?  Or is that just a
content-free marketing name for this technique?

regards, tom lane




Re: [HACKERS] JSON Function Bike Shedding

2013-02-13 Thread Andrew Dunstan


On 02/12/2013 02:18 PM, David E. Wheeler wrote:

Hello Hackers,

If you dislike bike-shedding (and who does?), delete this email and the ensuing 
thread right now. You have been warned!

I have been playing with Andrew’s JSON enhancements and really enjoying them. I 
am already using them in code I’m developing for production deployment in a 
month or two. Kudos!

However, I am not so keen on the function names. They all start with json_! 
This mostly feels redundant to me, since the types of the parameters are part 
of the function signature.


I don't have any problem getting rid of the json_ prefixes, except for 
json_agg which I think should keep it (c.f. string_agg, array_agg).



Therefore, I would like to propose different names:

Existing Name               Proposed Name
-------------------------   -------------------------------------
json_array_length()         array_length() or length() or size()
json_each()                 each_json()
json_each_as_text()         each_text()
json_get()                  get_json()
json_get_as_text()          get_text()
json_get_path()             get_json()
json_get_path_as_text()     get_text()
json_object_keys()          get_keys()
json_populate_record()      record() or row()
json_populate_recordset()   records() or rows()
json_unnest()               get_values()
json_agg()                  collect_json()

Note that I have given json_get() and json_get_path() the same names, as it 
seems to me that the former is the same as the latter, with only one parameter. 
Same for json_get_as_text() and json_get_path_as_text().


I will take some of this under advisement. Note that 
json_populate_record's name was taken from hstore's populate_record, so 
if we're trying to use similar names then it should possibly be just 
populate_record. Or if that's still a bit long I would accept to_record.





One nice thing about get_values() as opposed to json_unnest(), is that it could 
be used to fetch the values from a JSON object as well as an array. (BTW, I 
think unnest is not a good name at all, since unlike the SQL unnest() function, 
it doesn't actually unnest (flatten) the entire array).


I think Merlin's suggestion of unwrap might be good. Or simply 
elements() might work.




As for the operators, as previously discussed, I'm happy with either -> or ~> (and ->> or 
~>>, of course). But I'm wondering if the same operator couldn't be used when an array is on the RHS. I 
mean, having #> so that it doesn't have to be cast is nice, too, but I think it'd be nice if an array 
would work with -> and ->>, too.


The point of using different operator names is that if there's an array 
literal postgres will convert it to an array. If the operator names are 
the same it will treat it as a text key instead. Being able to type


my_json #> '{f1,0,f2,3,f3}'

is nice. Of course, we could duplicate the operators, but I generally 
prefer not to do that.




As for #> and #>>, what about @> and @>> instead? Or am I just too much the Perl hacker 
for thinking that @ is a nice mnemonic for array?


Probably. I deliberately avoided @ because it's used elsewhere to mean 
contains and using it for something quite different here might be 
confusing.




And finally, a couple of feature requests, which can be taken with a shaker of 
salt -- or as ideas for 9.4 -- and are mostly stolen from hstore:

* An exists() function (and ? operator) similar to hstore
* A defined() function
* A delete() function
* A slice() function
* A concatenation function and operator
* union, intersect, and except operators and/or functions
* Perhaps some set-returning functions (select_keys(), select_values())



I think this is beyond bikeshedding. Apparently you have missed the 
existence of json_object_keys().


The new API makes many or all of these things possible to do with 
relative ease as extensions (See my possibly upcoming talk on the subject.)


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Andrew Dunstan


On 02/13/2013 11:20 AM, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.

[ scratches head... ]  What's fractal about that?  Or is that just a
content-free marketing name for this technique?





And if that's all it is then I have some doubt about its patentability. 
For one thing I'd be mildly surprised if there weren't prior art. But of 
course, IANAL :-)


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 18:20, Tom Lane wrote:

 Heikki Linnakangas hlinnakan...@vmware.com writes:

The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.


[ scratches head... ]  What's fractal about that?  Or is that just a
content-free marketing name for this technique?


I'd call it out as a marketing name. I guess it's fractal in the sense 
that all levels of the tree can hold leaf tuples in the buffers; the 
structure looks the same no matter how deep you zoom, like a fractal.. 
But Buffered would be more appropriate IMO.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 18:43, Andrew Dunstan wrote:


On 02/13/2013 11:20 AM, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.

[ scratches head... ] What's fractal about that? Or is that just a
content-free marketing name for this technique?


And if that's all it is then I have some doubt about its patentability.
For one thing I'd be mildly surprised if there weren't prior art. But of
course, IANAL :-)


Agreed, but IANAL either. The papers the GiST buffering build algorithm 
was based on pre-date Tokutek's fractal indexes, for starters. Of course, 
all I know about fractal indexes is what I've read on some presentation 
slides on the 'net, so I might be missing something.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma

Yeah, it is just a fancy name for something that has nothing to do with 
fractals. I guess everything suave these days is fractal!

That said, the buffered concept itself looks really cool and should help us with 
large data sets. I am eager to get off the mark with it.

Will we be building the index from scratch? And how would we go about it?

We will need to be careful about the buffer size per node, in order to ensure 
that the pushing of tuples from nodes in large data sets does not become a 
substantial overhead.

Atri
Sent from my iPad

On 13-Feb-2013, at 22:21, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 On 13.02.2013 18:43, Andrew Dunstan wrote:
 
 On 02/13/2013 11:20 AM, Tom Lane wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
 The basic idea of a fractal tree index is to attach a buffer to every
 non-leaf page. On insertion, instead of descending all the way down to
 the correct leaf page, the new tuple is put on the buffer at the root
 page. When that buffer fills up, all the tuples in the buffer are
 cascaded down to the buffers on the next level pages. And recursively,
 whenever a buffer fills up at any level, it's flushed to the next level.
 [ scratches head... ] What's fractal about that? Or is that just a
 content-free marketing name for this technique?
 
 And if that's all it is then I have some doubt about its patentability.
 For one thing I'd be mildly surprised if there weren't prior art. But of
 course, IANAL :-)
 
 Agreed, but IANAL either. The papers the GiST buffering build algorithm was 
 based on pre-date Tokutek's fractal indexes, for starters. Of course, all I 
 know about fractal indexes is what I've read on some presentation slides on 
 the 'net, so I might be missing something.
 
 - Heikki


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON Function Bike Shedding

2013-02-13 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 I will take some of this under advisement. Note that 
 json_populate_record's name was taken from hstore's populate_record, so 
 if we're trying to use similar names then it should possibly be just 
 populate_record. Or if that's still a bit long I would accept to_record.

+1 for following precedent whenever there is some --- so let's go with
populate_record.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON Function Bike Shedding

2013-02-13 Thread David E . Wheeler
On Feb 13, 2013, at 8:36 AM, Andrew Dunstan and...@dunslane.net wrote:

 I don't have any problem getting rid of the json_ prefixes, except for 
 json_agg which I think should keep it (c.f. string_agg, array_agg).

I think that's an unfortunate naming forced on us by the SQL standard, and it 
doesn't mean we have to use it anyway.

 I will take some of this under advisement. Note that json_populate_record's 
 name was taken from hstore's populate_record, so if we're trying to use 
 similar names then it should possibly be just populate_record. Or if that's 
 still a bit long I would accept to_record.

to_record()++

 I think Merlin's suggestion of unwrap might be good. Or simply elements() 
 might work.

Perhaps unwrap() returns a set and elements() returns an array?

 As for #> and #>>, what about @> and @>> instead? Or am I just too much the 
 Perl hacker for thinking that @ is a nice mnemonic for array?
 
 Probably. I deliberately avoided @ because it's used elsewhere to mean 
 contains and using it for something quite different here might be confusing.

I can see that, especially if you end up adding exists(): @ could be its 
operator.

 I think this is beyond bikeshedding. Apparently you have missed the existence 
 of json_object_keys().

Oh, I forgot it returned a set rather than an array. So I suggest:

   values() - Returns an array
   keys()   - Returns an array

And:

   unwrap() - Returns a set
   skeys()  - Returns a set

Er, okay, so skeys() sucks alongside the others here. If we were to steal from 
hstore, these would be:

   svals() - Returns a set
   skeys() - Returns a set
   avals() - Returns an array
   akeys() - Returns an array

I don’t love those, but if we want to follow precedent…

 The new API makes many or all of these things possible to do with relative 
 ease as extensions (See my possibly upcoming talk on the subject.)

I’ll be there, yo!

David




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alias hstore's ? to ~ so that it works with JDBC

2013-02-13 Thread Seamus Abshere

On 2/11/13 1:35 PM, Heikki Linnakangas wrote:

I agree it's pretty dumb that there's currently no such escape. I think JDBC 
inherited that design mistake from ODBC. Fixing that would be a good idea.


Lance Anderson, Oracle's JDBC spec lead, says [1] we can implement 
something like:


SELECT * FROM tbl WHERE data {postgres qm} 'abc'

Thanks to Mark Rotteveel for driving the discussion. [2]


That said, it occurs to me that there's one fairly simple thing we could also 
do in the backend. At the moment, unlike function and table names, operators 
cannot be quoted. It would be easy to allow this in the grammar:

select 1 operator(+) 1;


I guess I see 2 simple options and 1 complex option:

a) [simple] operator(+) per Heikki
b) [simple, but not popular] alias ? to ~ per Seamus
c) [complex] {postgres blah} per Lance/Mark

How to decide?

Best,
Seamus


[1] 
http://mail.openjdk.java.net/pipermail/jdbc-spec-discuss/2013-February/58.html
[2] 
http://mail.openjdk.java.net/pipermail/jdbc-spec-discuss/2013-February/date.html#51 
(threaded view gets it out of order)



--
Seamus Abshere
sea...@abshere.net
https://github.com/seamusabshere


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alias hstore's ? to ~ so that it works with JDBC

2013-02-13 Thread Dave Cramer
Since we already do escape processing much like C, that might not be so
complex. However I haven't looked at the code, so I could be way off base.

The question I would pose is how palatable it is to use ?. In other words, is
it worth pursuing?

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca


On Wed, Feb 13, 2013 at 12:10 PM, Seamus Abshere sea...@abshere.net wrote:

 On 2/11/13 1:35 PM, Heikki Linnakangas wrote:

 I agree it's pretty dumb that there's currently no such escape. I think
 JDBC inherited that design mistake from ODBC. Fixing that would be a good
 idea.


 Lance Anderson, Oracle's JDBC spec lead, says [1] we can implement
 something like:

 SELECT * FROM tbl WHERE data {postgres qm} 'abc'

 Thanks to Mark Rotteveel for driving the discussion. [2]


  That said, it occurs to me that there's one fairly simple thing we could
 also do in the backend. At the moment, unlike function and table names,
 operators cannot be quoted. It would be easy to allow this in the grammar:

 select 1 operator(+) 1;


 I guess I see 2 simple options and 1 complex option:

 a) [simple] operator(+) per Heikki
 b) [simple, but not popular] alias ? to ~ per Seamus
 c) [complex] {postgres blah} per Lance/Mark

 How to decide?

 Best,
 Seamus


 [1] http://mail.openjdk.java.net/pipermail/jdbc-spec-discuss/2013-February/58.html
 [2] http://mail.openjdk.java.net/pipermail/jdbc-spec-discuss/2013-February/date.html#51
 (threaded view gets it out of order)



 --
 Seamus Abshere
 sea...@abshere.net
 https://github.com/seamusabshere



Re: [HACKERS] JSON Function Bike Shedding

2013-02-13 Thread Pavel Stehule
2013/2/13 David E. Wheeler da...@justatheory.com:
 On Feb 13, 2013, at 8:36 AM, Andrew Dunstan and...@dunslane.net wrote:

 I don't have any problem getting rid of the json_ prefixes, except for 
 json_agg which I think should keep it (c.f. string_agg, array_agg).

 I think that's an unfortunate naming forced on us by the SQL standard, and 
 it doesn't mean we have to use it anyway.

 I will take some of this under advisement. Note that json_populate_record's 
 name was taken from hstore's populate_record, so if we're trying to use 
 similar names then it should possibly be just populate_record. Or if that's 
 still a bit long I would accept to_record.

 to_record()++

 I think Merlin's suggestion of unwrap might be good. Or simply elements() 
 might work.

 Perhaps unwrap() returns a set and elements() returns an array?

 As for #> and #>>, what about @> and @>> instead? Or am I just too much the 
 Perl hacker for thinking that @ is a nice mnemonic for array?

 Probably. I deliberately avoided @ because it's used elsewhere to mean 
 contains and using it for something quite different here might be 
 confusing.

 I can see that, especially if you end up adding exists(): @ could be its 
 operator.

 I think this is beyond bikeshedding. Apparently you have missed the 
 existence of json_object_keys().

 Oh, I forgot it returned a set rather than an array. So I suggest:

values() - Returns an array
keys()   - Returns an array

values is a keyword and keys is relatively high-risk too

Regards

Pavel


 And:

unwrap() - Returns a set
skeys()  - Returns a set

 Er, okay, so skeys() sucks alongside the others here. If we were to steal 
 from hstore, these would be:

svals() - Returns a set
skeys() - Returns a set
avals() - Returns an array
akeys() - Returns an array

 I don’t love those, but if we want to follow precedent…

 The new API makes many or all of these things possible to do with relative 
 ease as extensions (See my possibly upcoming talk on the subject.)

 I’ll be there, yo!

 David




 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON Function Bike Shedding

2013-02-13 Thread Andrew Dunstan


On 02/13/2013 12:07 PM, David E. Wheeler wrote:

On Feb 13, 2013, at 8:36 AM, Andrew Dunstan and...@dunslane.net wrote:


I don't have any problem getting rid of the json_ prefixes, except for json_agg 
which I think should keep it (c.f. string_agg, array_agg).

I think that's an unfortunate naming forced on us by the SQL standard, and it 
doesn't mean we have to use it anyway.


Regardless of that, I'd prefer to be consistent.





I think Merlin's suggestion of unwrap might be good. Or simply elements() 
might work.

Perhaps unwrap() returns a set and elements() returns an array?



Now you're adding functionality. Let's just keep this to the question of 
names.






I think this is beyond bikeshedding. Apparently you have missed the existence 
of json_object_keys().

Oh, I forgot it returned a set rather than an array. So I suggest:

values() - Returns an array
keys()   - Returns an array

And:

unwrap() - Returns a set
skeys()  - Returns a set

Er, okay, so skeys() sucks alongside the others here. If we were to steal from 
hstore, these would be:

svals() - Returns a set
skeys() - Returns a set
avals() - Returns an array
akeys() - Returns an array

I don’t love those, but if we want to follow precedent…



Ditto. I think we're a bit late to be adding functionality.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma


Sent from my iPad

On 13-Feb-2013, at 22:21, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 On 13.02.2013 18:43, Andrew Dunstan wrote:
 
 On 02/13/2013 11:20 AM, Tom Lane wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
 [...] all I know about fractal indexes is what I've read on some presentation 
 slides on the 'net, so I might be missing something.
 

I think that buffering can be applied to B-tree, R-tree and GiST in more or less 
the same manner.

Is there a way we can abstract the buffering part out of them all?

Atri

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON Function Bike Shedding

2013-02-13 Thread David E. Wheeler
On Feb 13, 2013, at 9:31 AM, Andrew Dunstan and...@dunslane.net wrote:

 I don’t love those, but if we want to follow precedent…
 
 Ditto. I think we're a bit late to be adding functionality.

Well, how about having just keys() and vals() return arrays? Then one can just 
wrap them in unnest() to get sets.

Best,

David

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON Function Bike Shedding

2013-02-13 Thread Alvaro Herrera
Andrew Dunstan wrote:
 
 On 02/13/2013 12:07 PM, David E. Wheeler wrote:
 On Feb 13, 2013, at 8:36 AM, Andrew Dunstan and...@dunslane.net wrote:

 I think Merlin's suggestion of unwrap might be good. Or simply elements() 
 might work.
 Perhaps unwrap() returns a set and elements() returns an array?
 
 Now you're adding functionality. Let's just keep this to the
 question of names.

I agree with that, but it seems a good idea to leave names available for
future functionality, where reasonable.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alias hstore's ? to ~ so that it works with JDBC

2013-02-13 Thread Alvaro Herrera
Seamus Abshere wrote:

 [1] 
 http://mail.openjdk.java.net/pipermail/jdbc-spec-discuss/2013-February/58.html
 [2] 
 http://mail.openjdk.java.net/pipermail/jdbc-spec-discuss/2013-February/date.html#51
 (threaded view gets it out of order)

Ooh, how archaic --- they're still using Mhonarc to webify list
archives! ;-)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Simon Riggs
On 13 February 2013 16:48, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 13.02.2013 18:20, Tom Lane wrote:

 Heikki Linnakangas hlinnakan...@vmware.com writes:

 The basic idea of a fractal tree index is to attach a buffer to every
 non-leaf page. On insertion, instead of descending all the way down to
 the correct leaf page, the new tuple is put on the buffer at the root
 page. When that buffer fills up, all the tuples in the buffer are
 cascaded down to the buffers on the next level pages. And recursively,
 whenever a buffer fills up at any level, it's flushed to the next level.


 [ scratches head... ]  What's fractal about that?  Or is that just a
 content-free marketing name for this technique?


 I'd call it out as a marketing name. I guess it's fractal in the sense that
 all levels of the tree can hold leaf tuples in the buffers; the structure
 looks the same no matter how deep you zoom, like a fractal.. But Buffered
 would be more appropriate IMO.

I hope for their sake there is more to it than that. It's hard to see
how buffering can be patented.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Joshua D. Drake


On 02/13/2013 09:54 AM, Simon Riggs wrote:


I'd call it out as a marketing name. I guess it's fractal in the sense that
all levels of the tree can hold leaf tuples in the buffers; the structure
looks the same no matter how deep you zoom, like a fractal.. But Buffered
would be more appropriate IMO.


I hope for their sake there is more to it than that. It's hard to see
how buffering can be patented.


Talk to Apple about that. It only needs to be worded correctly.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Simon Riggs
On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 To be precise, we'd need to update the control file on every XLogFlush(),
 like we do during archive recovery. That would indeed be unacceptable from a
 performance point of view. Updating the control file that often would also
 be bad for robustness.

If those arguments make sense, then why don't they apply to recovery as well?

It sounds like we need to look at something better for use during
archive recovery.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Josh Berkus
On 02/13/2013 01:01 AM, Atri Sharma wrote:
 Hi all,
 
 Just a curiosity I couldnt control. I was recently reading about
 Fractal tree indexing
 (http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/) and
 how TokuDB engine for MySQL is really working nicely with big data.
 
 I was wondering, do we have support for fractal tree indexing? I mean,
 it really does seem to help manage big data, so we could think of
 supporting it in some form for our large data set clients( if it is
 not happening already someplace which I have missed).

Sadly, fractal trees are an invention of Tokutek and are heavily and
publicly patented.  That's why I haven't pursued the idea, and indeed
why Tokutek hasn't developed a PostgreSQL plug-in.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 To be precise, we'd need to update the control file on every XLogFlush(),
 like we do during archive recovery. That would indeed be unacceptable from a
 performance point of view. Updating the control file that often would also
 be bad for robustness.

 If those arguments make sense, then why don't they apply to recovery as well?

In plain old crash recovery, don't the checks on whether to apply WAL
records based on LSN take care of this?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 20:25, Simon Riggs wrote:

 On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote:


To be precise, we'd need to update the control file on every XLogFlush(),
like we do during archive recovery. That would indeed be unacceptable from a
performance point of view. Updating the control file that often would also
be bad for robustness.


If those arguments make sense, then why don't they apply to recovery as well?


To some degree, they do. The big difference is that during normal 
operation, every commit is XLogFlushed(). During recovery, XLogFlush() 
happens much less frequently - certainly not after replaying each commit 
record.



It sounds like we need to look at something better for use during
archive recovery.


Well, no-one's complained about the performance. From a robustness point 
of view, it might be good to keep the minRecoveryPoint value in a 
separate file, for example, to avoid rewriting the control file that 
often. Then again, why fix it when it's not broken.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 Well, no-one's complained about the performance. From a robustness point 
 of view, it might be good to keep the minRecoveryPoint value in a 
 separate file, for example, to avoid rewriting the control file that 
 often. Then again, why fix it when it's not broken.

It would only be broken if someone interrupted a crash recovery
mid-flight and tried to establish a recovery stop point before the end
of WAL, no?  Why don't we just forbid that case?  This would either be
the same as, or a small extension of, the pg_control state vs existence
of recovery.conf error check that was just discussed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 21:21, Tom Lane wrote:

 Heikki Linnakangas hlinnakan...@vmware.com writes:

Well, no-one's complained about the performance. From a robustness point
of view, it might be good to keep the minRecoveryPoint value in a
separate file, for example, to avoid rewriting the control file that
often. Then again, why fix it when it's not broken.


It would only be broken if someone interrupted a crash recovery
mid-flight and tried to establish a recovery stop point before the end
of WAL, no?  Why don't we just forbid that case? This would either be
the same as, or a small extension of, the pg_control state vs existence
of recovery.conf error check that was just discussed.


The problem is when you interrupt archive recovery (kill -9), and 
restart. After restart, the system needs to know how far the WAL was 
replayed before the crash, because it must not open for hot standby 
queries, or allow the database to be started up in master-mode, until 
it's replayed the WAL up to that same point again.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 21:03, Tom Lane wrote:

 Simon Riggs si...@2ndquadrant.com writes:

 On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote:

To be precise, we'd need to update the control file on every XLogFlush(),
like we do during archive recovery. That would indeed be unacceptable from a
performance point of view. Updating the control file that often would also
be bad for robustness.



If those arguments make sense, then why don't they apply to recovery as well?


In plain old crash recovery, don't the checks on whether to apply WAL
records based on LSN take care of this?


The problem we're trying to solve is determining how much WAL needs to 
be replayed until the database is consistent again. In crash recovery, 
the answer is all of it. That's why the CRC in the WAL is essential; 
it's required to determine where the WAL ends. But if we had some other 
mechanism, like if we updated minRecoveryPoint after every XLogFlush() 
like Simon suggested, we wouldn't necessarily need the CRC to detect end 
of WAL (not that I'd suggest removing it anyway), and we could throw an 
error if there is a corrupt bit somewhere in the WAL before the true end 
of WAL.


In archive recovery, we can't just say replay all the WAL, because the 
whole idea of PITR is to not recover all the WAL. So we use 
minRecoveryPoint to keep track of how far the WAL needs to be replayed 
at a minimum, for the database to be consistent.
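
As a rough sketch of that gating, in C for illustration only (the function and 
variable names here are invented; the real check lives in xlog.c and tracks 
more state than this):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* simplified stand-in for the real type */

static XLogRecPtr minRecoveryPoint; /* from pg_control in archive recovery */
static bool reachedConsistency = false;

/*
 * Called as WAL records are replayed.  Until replay has passed
 * minRecoveryPoint, data files on disk may already contain changes whose
 * WAL has not been reapplied yet, so hot standby must not be started.
 */
static void
check_recovery_consistency(XLogRecPtr replayedUpTo)
{
	if (!reachedConsistency && replayedUpTo >= minRecoveryPoint)
	{
		reachedConsistency = true;
		/* only now is it safe to admit read-only queries */
	}
}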


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 13.02.2013 21:21, Tom Lane wrote:
 It would only be broken if someone interrupted a crash recovery
 mid-flight and tried to establish a recovery stop point before the end
 of WAL, no?  Why don't we just forbid that case? This would either be
 the same as, or a small extension of, the pg_control state vs existence
 of recovery.conf error check that was just discussed.

 The problem is when you interrupt archive recovery (kill -9), and 
 restart. After restart, the system needs to know how far the WAL was 
 replayed before the crash, because it must not open for hot standby 
 queries, or allow the database to be started up in master-mode, until 
 it's replayed the WAL up to that same point again.

Well, archive recovery is a different scenario --- Simon was questioning
whether we need a minRecoveryPoint mechanism in crash recovery, or at
least that's what I thought he asked.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 21:30, Tom Lane wrote:

 Heikki Linnakangas hlinnakan...@vmware.com writes:

On 13.02.2013 21:21, Tom Lane wrote:

It would only be broken if someone interrupted a crash recovery
mid-flight and tried to establish a recovery stop point before the end
of WAL, no?  Why don't we just forbid that case? This would either be
the same as, or a small extension of, the pg_control state vs existence
of recovery.conf error check that was just discussed.



The problem is when you interrupt archive recovery (kill -9), and
restart. After restart, the system needs to know how far the WAL was
replayed before the crash, because it must not open for hot standby
queries, or allow the database to be started up in master-mode, until
it's replayed the WAL up to that same point again.


Well, archive recovery is a different scenario --- Simon was questioning
whether we need a minRecoveryPoint mechanism in crash recovery, or at
least that's what I thought he asked.


Ah, ok. The short answer to that is no, because in crash recovery, we 
just replay the WAL all the way to the end. I thought he was questioning 
updating the control file at every XLogFlush() during archive recovery. 
The answer to that is that it's not so bad, because XLogFlush() is 
called so infrequently during recovery.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Daniel Farina
On Wed, Feb 13, 2013 at 8:20 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
 The basic idea of a fractal tree index is to attach a buffer to every
 non-leaf page. On insertion, instead of descending all the way down to
 the correct leaf page, the new tuple is put on the buffer at the root
 page. When that buffer fills up, all the tuples in the buffer are
 cascaded down to the buffers on the next level pages. And recursively,
 whenever a buffer fills up at any level, it's flushed to the next level.

 [ scratches head... ]  What's fractal about that?  Or is that just a
 content-free marketing name for this technique?

The name in the literature is Cache Oblivious Lookahead Array, aka
COLA.  The authors also are founders of TokuTek, and seem to have
taken pains to ring-fence mentions of the algorithm with reference to
its patent.

Well, at least nobody can blame them for submarine patent action.

-- 
fdr


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 The problem we're trying to solve is determining how much WAL needs to 
 be replayed until the database is consistent again. In crash recovery, 
 the answer is all of it. That's why the CRC in the WAL is essential; 
 it's required to determine where the WAL ends. But if we had some other 
 mechanism, like if we updated minRecoveryPoint after every XLogFlush() 
 like Simon suggested, we wouldn't necessarily need the CRC to detect end 
 of WAL (not that I'd suggest removing it anyway), and we could throw an 
 error if there is a corrupt bit somewhere in the WAL before the true end 
 of WAL.

Meh.  I think that would be disastrous from both performance and
reliability standpoints.  (Performance because the whole point of WAL is
to commit with only one disk write in one place, and reliability because
of greatly increasing the number of writes to the utterly-critical
pg_control file.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 13.02.2013 21:30, Tom Lane wrote:
 Well, archive recovery is a different scenario --- Simon was questioning
 whether we need a minRecoveryPoint mechanism in crash recovery, or at
 least that's what I thought he asked.

 Ah, ok. The short answer to that is no, because in crash recovery, we 
 just replay the WAL all the way to the end. I thought he was questioning 
 updating the control file at every XLogFlush() during archive recovery. 
 The answer to that is that it's not so bad, because XLogFlush() is 
 called so infrequently during recovery.

Right, and it's not so evil from a reliability standpoint either, partly
because of that and partly because, by definition, this isn't your only
copy of the data.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proposal or just idea for psql - show first N rows from relation backslash statement

2013-02-13 Thread Pavel Stehule
Hello

probably one of my top ten SQL statements will be

SELECT * FROM some_relation LIMIT 10

what do you think about creating a special statement for this purpose?

possible syntax

-- ViewTable
\vt table_name [rows]

or

\sample table_name [rows]

an implementation with autocomplete is terribly simple

Regards

Pavel


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alias hstore's ? to ~ so that it works with JDBC

2013-02-13 Thread Kevin Grittner
Seamus Abshere sea...@abshere.net wrote:
 On 2/11/13 1:35 PM, Heikki Linnakangas wrote:
 I agree it's pretty dumb that there's currently no such escape.
 I think JDBC inherited that design mistake from ODBC. Fixing
 that would be a good idea.

 Lance Anderson, Oracle's JDBC spec lead

Wow, there's a name I haven't seen in years.  I used to deal with
him often when he was at Sybase, and once or twice after he moved
to Sun.  Sharp guy.

 says we can implement something like:

 SELECT * FROM tbl WHERE data {postgres qm} 'abc'

That suggestion makes a lot of sense to me.  The curly-brace
escapes are in JDBC for portability, so this seems like a totally
appropriate use; it's surprising it hasn't come up before.

 I guess I see 2 simple options and 1 complex option:

 a) [simple] operator(+) per Heikki
 b) [simple, but not popular] alias ? to ~ per Seamus
 c) [complex] {postgres blah} per Lance/Mark

 How to decide?

We already have code in the JDBC driver to recognize curly-brace
escape sequences.  We just need to add postgres as another
keyword for such escapes and we can do what we like for the rest of
the escape sequence.  A mnemonic, such as Lance suggests, does seem
like a good approach.
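
A minimal sketch of the substitution such an escape implies, assuming the 
simple {postgres qm} spelling (illustrative C with an invented helper name; 
the real driver is Java, and its lexer must also skip string literals and 
comments rather than do a blind textual replace):

#include <stdio.h>
#include <string.h>

/* Replace every "{postgres qm}" escape with a literal '?' operator. */
static void
rewrite_qm_escape(const char *in, char *out, size_t outsz)
{
	const char *esc = "{postgres qm}";
	size_t		esclen = strlen(esc);
	size_t		o = 0;

	while (*in && o + 1 < outsz)
	{
		if (strncmp(in, esc, esclen) == 0)
		{
			out[o++] = '?';
			in += esclen;
		}
		else
			out[o++] = *in++;
	}
	out[o] = '\0';
}

int
main(void)
{
	char		buf[256];

	rewrite_qm_escape("SELECT * FROM tbl WHERE data {postgres qm} 'abc'",
					  buf, sizeof(buf));
	puts(buf);					/* SELECT * FROM tbl WHERE data ? 'abc' */
	return 0;
}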

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal or just idea for psql - show first N rows from relation backslash statement

2013-02-13 Thread Stephen Frost
* Pavel Stehule (pavel.steh...@gmail.com) wrote:
 SELECT * FROM some_relation LIMIT 10
 
 what do you think about creating a special statement for this purpose?

I'd rather extend TABLE to support a limit clause or something.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 17:02, Tom Lane wrote:

 Heikki Linnakangas hlinnakan...@vmware.com writes:

At least in back-branches, I'd call this a pilot error. You can't turn a
master into a standby just by creating a recovery.conf file. At least
not if the master was not shut down cleanly first.
...
I'm not sure that's worth the trouble, though. Perhaps it would be
better to just throw an error if the control file state is
DB_IN_PRODUCTION and a recovery.conf file exists.


+1 for that approach, at least until it's clear there's a market for
doing this sort of thing.  I think the error check could be
back-patched, too.


Hmm, I just realized a little problem with that approach. If you take a 
base backup using an atomic filesystem backup from a running server, and 
start archive recovery from that, that's essentially the same thing as 
Kyotaro's test case.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal or just idea for psql - show first N rows from relation backslash statement

2013-02-13 Thread Pavel Stehule
2013/2/13 Stephen Frost sfr...@snowman.net:
 * Pavel Stehule (pavel.steh...@gmail.com) wrote:
 SELECT * FROM some_relation LIMIT 10

 what do you think about creating a special statement for this purpose?

 I'd rather extend TABLE to support a limit clause or something.

??

Pavel


 Thanks,

 Stephen


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alias hstore's ? to ~ so that it works with JDBC

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 22:17, Kevin Grittner wrote:

 Seamus Abshere sea...@abshere.net wrote:

  SELECT * FROM tbl WHERE data {postgres qm} 'abc'


That suggestion makes a lot of sense to me.  The curly-brace
escapes are in JDBC for portability, so this seems like a totally
appropriate use; it's surprising it hasn't come up before.


+1 for something like this. Rather than a straight {postgres qm} to 
represent a question mark, though, perhaps it would be more readable to 
allow something like this:


{postgres SELECT * FROM tbl WHERE data ? 'abc' AND column = $1}

Ie. within {postgres ... }, ? means a question mark, not a parameter 
marker, and for parameter markers, you use the PostgreSQL syntax, $n.


Also, should we consider how to escape { and } characters? They're not a 
problem at the moment, but while we're at it...



I guess I see 2 simple options and 1 complex option:

a) [simple] operator(+) per Heikki
b) [simple, but not popular] alias ? to ~ per Seamus
c) [complex] {postgres blah} per Lance/Mark

How to decide?


We already have code in the JDBC driver to recognize curly-brace
escape sequences.  We just need to add postgres as another
keyword for such escapes and we can do what we like for the rest of
the escape sequence.  A mnemonic, such as Lance suggests, does seem
like a good approach.


We could support operator(?) as well; belt and suspenders. That would 
help ODBC clients too.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal or just idea for psql - show first N rows from relation backslash statement

2013-02-13 Thread Erik Rijkers
On Wed, February 13, 2013 21:23, Stephen Frost wrote:
 * Pavel Stehule (pavel.steh...@gmail.com) wrote:
 SELECT * FROM some_relation LIMIT 10

 what do you think about creating a special statement for this purpose?

 I'd rather extend TABLE to support a limit clause or something.


No need; that already does work, e.g.:

testdb=# table pg_database limit 3;

Not in the documentation, but I hope it won't get removed -- it's quite handy





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal or just idea for psql - show first N rows from relation backslash statement

2013-02-13 Thread Stephen Frost
* Erik Rijkers (e...@xs4all.nl) wrote:
 No need; that already does work, e.g.:
 
 testdb=# table pg_database limit 3;

Oh.

 Not in the documentation, but I hope it won't get removed -- it's quite handy

Perhaps we should add it. :)

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] proposal or just idea for psql - show first N rows from relation backslash statement

2013-02-13 Thread Pavel Stehule
2013/2/13 Stephen Frost sfr...@snowman.net:
 * Erik Rijkers (e...@xs4all.nl) wrote:
 No need; that already does work, e.g.:

 testdb=# table pg_database limit 3;

 Oh.

 Not in the documentation, but I hope it won't get removed -- it's quite handy

 Perhaps we should add it. :)

my proposal is a little bit shorter

just

\vt table

and I expect that LIMIT 10 is the default

the TABLE statement is a little bit different creature (and mainly it is server-side)

Regards

Pavel


 Thanks!

 Stephen


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal or just idea for psql - show first N rows from relation backslash statement

2013-02-13 Thread Stephen Frost
* Pavel Stehule (pavel.steh...@gmail.com) wrote:
 and I expect that LIMIT 10 is the default
 
 the TABLE statement is a little bit different creature (and mainly it is 
 server-side)

I don't really see the value in this.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-02-13 Thread Simon Riggs
On 13 February 2013 09:04, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 Without step 3, the server would perform crash recovery, and it would work.
 But because of the recovery.conf file, the server goes into archive
 recovery, and because minRecoveryPoint is not set, it assumes that the
 system is consistent from the start.

 Aside from the immediate issue with truncation, the system really isn't
 consistent until the WAL has been replayed far enough, so it shouldn't open
 for hot standby queries. There might be other, later, changes already
 flushed to data files. The system has no way of knowing how far it needs to
 replay the WAL to become consistent.

 At least in back-branches, I'd call this a pilot error. You can't turn a
 master into a standby just by creating a recovery.conf file. At least not if
 the master was not shut down cleanly first.

 If there's a use case for doing that, maybe we can do something better in
 HEAD. If the control file says that the system was running
 (DB_IN_PRODUCTION), but there is a recovery.conf file, we could do crash
 recovery first, until we reach the end of WAL, and go into archive recovery
 mode after that. We'd recover all the WAL files in pg_xlog as far as we can,
 same as in crash recovery, and only start restoring files from the archive
 once we reach the end of WAL in pg_xlog. At that point, we'd also consider
 the system as consistent, and start up for hot standby.

 I'm not sure that's worth the trouble, though. Perhaps it would be better to
 just throw an error if the control file state is DB_IN_PRODUCTION and a
 recovery.conf file exists. The admin can always start the server normally
 first, shut it down cleanly, and then create the recovery.conf file.

Now I've read the whole thing...

The problem is that we startup Hot Standby before we hit the min
recovery point because that isn't recorded. For me, the thing to do is
to make the min recovery point == end of WAL when state is
DB_IN_PRODUCTION. That way we don't need to do any new writes and we
don't need to risk people seeing inconsistent results if they do this.

But I think that still gives you a timeline problem when putting a
master back into a standby.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal or just idea for psql - show first N rows from relation backslash statement

2013-02-13 Thread Pavel Stehule
2013/2/13 Stephen Frost sfr...@snowman.net:
 * Pavel Stehule (pavel.steh...@gmail.com) wrote:
 and I expect that LIMIT 10 is the default

 the TABLE statement is a little bit different creature (and mainly it is 
 server-side)

 I don't really see the value in this.

it is just a shortcut for an often-used query - nothing more

and with larger tables you don't risk forgetting the LIMIT clause

Regards

Pavel


 Thanks,

 Stephen


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal or just idea for psql - show first N rows from relation backslash statement

2013-02-13 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Pavel Stehule (pavel.steh...@gmail.com) wrote:
 SELECT * FROM some_relation LIMIT 10
 
 what do you think about creating a special statement for this purpose?

 I'd rather extend TABLE to support a limit clause or something.

Can't you pretty much do this already in psql with FETCH_COUNT?  I see
no good reason to invent more SQL syntax.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: PATCH: Split stats file per database WAS: [HACKERS] autovacuum stress-testing our system

2013-02-13 Thread Alvaro Herrera
Here's an updated version of this patch that takes care of the issues I
reported previously: no more repalloc() of the requests array; it's now
an slist, which makes the code much more natural IMV.  And no more
messing around with doing sprintf to create a separate sprintf pattern
for the per-db stats file; instead have a function to return the name
that uses just the pgstat dir as stored by GUC.  I think this can be
further simplified still.

I haven't reviewed the rest yet; please do give this a try to confirm
that the speedups previously reported are still there (i.e. I didn't
completely blow it).

Thanks

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***
*** 38,43 
--- 38,44 
  #include access/xact.h
  #include catalog/pg_database.h
  #include catalog/pg_proc.h
+ #include lib/ilist.h
  #include libpq/ip.h
  #include libpq/libpq.h
  #include libpq/pqsignal.h
***
*** 66,73 
   * Paths for the statistics files (relative to installation's $PGDATA).
   * --
   */
! #define PGSTAT_STAT_PERMANENT_FILENAME		"global/pgstat.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"global/pgstat.tmp"
  
  /* --
   * Timer definitions.
--- 67,75 
   * Paths for the statistics files (relative to installation's $PGDATA).
   * --
   */
! #define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
! #define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
! #define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"
  
  /* --
   * Timer definitions.
***
*** 115,120  int			pgstat_track_activity_query_size = 1024;
--- 117,123 
   * Built from GUC parameter
   * --
   */
+ char	   *pgstat_stat_directory = NULL;
  char	   *pgstat_stat_filename = NULL;
  char	   *pgstat_stat_tmpname = NULL;
  
***
*** 219,229  static int	localNumBackends = 0;
   */
  static PgStat_GlobalStats globalStats;
  
! /* Last time the collector successfully wrote the stats file */
! static TimestampTz last_statwrite;
  
! /* Latest statistics request time from backends */
! static TimestampTz last_statrequest;
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
--- 222,237 
   */
  static PgStat_GlobalStats globalStats;
  
! /* Write request info for each database */
! typedef struct DBWriteRequest
! {
! 	Oid			databaseid;		/* OID of the database to write */
! 	TimestampTz request_time;	/* timestamp of the last write request */
! 	slist_node	next;
! } DBWriteRequest;
  
! /* Latest statistics request time from backends for each DB */
! static slist_head	last_statrequests = SLIST_STATIC_INIT(last_statrequests);
  
  static volatile bool need_exit = false;
  static volatile bool got_SIGHUP = false;
***
*** 252,262  static void pgstat_sighup_handler(SIGNAL_ARGS);
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  	 Oid tableoid, bool create);
! static void pgstat_write_statsfile(bool permanent);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  
  static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
  static void pgstat_send_funcstats(void);
  static HTAB *pgstat_collect_oids(Oid catalogid);
--- 260,276 
  static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
  static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,
  	 Oid tableoid, bool create);
! static void pgstat_write_statsfile(bool permanent, bool force);
! static void pgstat_write_db_statsfile(PgStat_StatDBEntry * dbentry, bool permanent);
! static void pgstat_write_db_dummyfile(Oid databaseid);
! static HTAB *pgstat_read_statsfile(Oid onlydb, bool permanent, bool onlydbs);
! static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
  static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);
  
+ static bool pgstat_write_statsfile_needed(void);
+ static bool pgstat_db_requested(Oid databaseid);
+ 
  static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
  static void pgstat_send_funcstats(void);
  static HTAB *pgstat_collect_oids(Oid catalogid);
***
*** 285,291  static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int le
  static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
  static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
  
- 
  /* 
   * Public functions called from postmaster follow
   * 
--- 299,304 
***
*** 549,556  

Re: [HACKERS] Btrfs clone WIP patch

2013-02-13 Thread Tom Lane
Jonathan Rogers jrog...@socialserve.com writes:
 This patch against PostgreSQL 9.1.8 takes advantage of efficient file
 cloning on Linux Btrfs file systems to make CREATE DATABASE operations
 extremely fast regardless of the size of the database used as a
 template.

It would be easier to review this patch if the bulk of it weren't simple
reindentation of existing code.  (Or at least it ought to be that ---
I object to your having moved the buffer palloc inside the loop.  A
patch that is trying to optimize a minority case can expect to be
rejected if it makes things worse for everyone else.)

Consider whether you can't phrase the patch to avoid that, perhaps by
use of continue instead of an else-block.  Alternatively, enclose the
existing code in braces but don't reindent it, ie,

+   if (whatever)
+   ... new code ...
+   else
+   {
... existing code ...
+   }

The next pgindent run will fix the funny indentation, or the committer
can do it if he wishes after reviewing.
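
As a toy illustration (not taken from the patch), the continue phrasing keeps 
the pre-existing loop body at its original indentation, which is what keeps 
the diff small:

#include <stdio.h>

int
main(void)
{
	for (int i = 0; i < 6; i++)
	{
		if (i % 2 == 0)			/* the new, special case */
		{
			printf("%d handled by new code\n", i);
			continue;
		}
		/* existing code stays untouched and unreindented below */
		printf("%d handled by existing code\n", i);
	}
	return 0;
}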

 The efficient cloning is accomplished by a Btrfs-specific ioctl() call.

The big-picture question of course is whether we want to carry and
maintain a filesystem-specific hack.  I don't have a sense that btrfs
is so widely used as to justify this.

 +#ifdef __linux__
 +# define BTRFS_IOCTL_MAGIC 0x94
 +# define BTRFS_IOC_CLONE _IOW (BTRFS_IOCTL_MAGIC, 9, int)
 + return ioctl (dest_fd, BTRFS_IOC_CLONE, src_fd);
 +#else

This seems to me to be unacceptable on its face.  If we can't get these
constants out of a system header file, it's unlikely that the feature is
stable enough to depend on, if indeed it's meant for general-purpose use
at all.  We could easily end up invoking unexpected behaviors.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Btrfs clone WIP patch

2013-02-13 Thread Jonathan Rogers
Tom Lane wrote:
 Jonathan Rogers jrog...@socialserve.com writes:
 This patch against PostgreSQL 9.1.8 takes advantage of efficient file
 cloning on Linux Btrfs file systems to make CREATE DATABASE operations
 extremely fast regardless of the size of the database used as a
 template.
 
 It would be easier to review this patch if the bulk of it weren't simple
 reindentation of existing code.  (Or at least it ought to be that ---
 I object to your having moved the buffer palloc inside the loop.  A
 patch that is trying to optimize a minority case can expect to be
 rejected if it makes things worse for everyone else.)

The buffer allocation is actually not inside the loop, but inside the if
branch for ordinary copying behavior since the buffer is unnecessary in
the case of a successful clone.

 
 Consider whether you can't phrase the patch to avoid that, perhaps by
 use of continue instead of an else-block.  Alternatively, enclose the
 existing code in braces but don't reindent it, ie,
 
 + if (whatever)
 + ... new code ...
 + else
 + {
   ... existing code ...
 + }
 

Indeed, I was bothered by the need to reindent so much as well. I'll see
if I can do better.

 The next pgindent run will fix the funny indentation, or the committer
 can do it if he wishes after reviewing.
 
 The efficient cloning is accomplished by a Btrfs-specific ioctl() call.
 
 The big-picture question of course is whether we want to carry and
 maintain a filesystem-specific hack.  I don't have a sense that btrfs
 is so widely used as to justify this.

Yes, this is a problem I considered. I think the basic problem is the
lack of any kind of generic interface to copy or clone a file. A system
call for Linux to copy or clone has been proposed more than once, but so
far nothing has been accepted. I believe there are a few file systems
that support some kind of efficient cloning, but I haven't investigated
it deeply.

 
 +#ifdef __linux__
 +# define BTRFS_IOCTL_MAGIC 0x94
 +# define BTRFS_IOC_CLONE _IOW (BTRFS_IOCTL_MAGIC, 9, int)
 +return ioctl (dest_fd, BTRFS_IOC_CLONE, src_fd);
 +#else
 
 This seems to me to be unacceptable on its face.  If we can't get these
 constants out of a system header file, it's unlikely that the feature is
 stable enough to depend on, if indeed it's meant for general-purpose use
 at all.  We could easily end up invoking unexpected behaviors.

Of course you're right that defining values right there is no good. It
looks like the values are in the Linux headers since 2.6.32 when Btrfs
was merged into mainline. I guess I'll need to brush up on CPP to figure
out how to use the Linux header values if they exist.
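
Something like this, perhaps (a sketch, with the caveat that I haven't
verified in which kernel version linux/btrfs.h started shipping
BTRFS_IOC_CLONE; HAVE_LINUX_BTRFS_H would be a new configure probe):

#include <errno.h>
#include <sys/ioctl.h>

#ifdef HAVE_LINUX_BTRFS_H
#include <linux/btrfs.h>		/* defines BTRFS_IOC_CLONE */
#endif

/*
 * Attempt a Btrfs clone of src_fd into dest_fd.  Returns 0 on success;
 * on failure (non-Btrfs filesystem, old kernel, header not available at
 * build time) returns -1 and the caller falls back to an ordinary copy.
 */
static int
clone_file_fd(int dest_fd, int src_fd)
{
#ifdef BTRFS_IOC_CLONE
	return ioctl(dest_fd, BTRFS_IOC_CLONE, src_fd);
#else
	errno = ENOTSUP;
	return -1;
#endif
}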

Would it be better to move clone_file() into its own module where
implementations for other file system types might eventually be added?

My first implementation called cp with the --reflink=auto option since
that seems to be the closest thing to a file system agnostic interface.
The above snippet comes directly from the GNU cp source and I'm not sure
why that code defines the values instead of taking them from Linux headers.

-- 
Jonathan Ross Rogers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Btrfs clone WIP patch

2013-02-13 Thread Josh Berkus
On 02/13/2013 02:13 PM, Tom Lane wrote:
 The big-picture question of course is whether we want to carry and
 maintain a filesystem-specific hack.  I don't have a sense that btrfs
 is so widely used as to justify this.

If this is a valuable hack, it seems like it could work on ZFS as well.
If we could make it work for any snapshot-capable filesystem, and not just
BTRFS, then it would make more sense.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Btrfs clone WIP patch

2013-02-13 Thread Jonathan Rogers
Josh Berkus wrote:
 On 02/13/2013 02:13 PM, Tom Lane wrote:
 The big-picture question of course is whether we want to carry and
 maintain a filesystem-specific hack.  I don't have a sense that btrfs
 is so widely used as to justify this.
 
 If this is a valuable hack, it seems like it could work on ZFS as well.
  If we could make it work for any snapshot-capable filesystem, and not just
 BTRFS, then it would make more sense.
 

Yes, that's exactly what I hope will be possible. I need to investigate
ZFS to see if it's feasible.

-- 
Jonathan Ross Rogers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Btrfs clone WIP patch

2013-02-13 Thread Phil Sorber
On Wed, Feb 13, 2013 at 5:48 PM, Josh Berkus j...@agliodbs.com wrote:
 On 02/13/2013 02:13 PM, Tom Lane wrote:
 The big-picture question of course is whether we want to carry and
 maintain a filesystem-specific hack.  I don't have a sense that btrfs
 is so widely used as to justify this.

 If this is a valuable hack, it seems like it could work on ZFS as well.
  If we could make it work for any snapshot-capable filesystem, and not just
 BTRFS, then it would make more sense.

I was thinking that too, but I think this is a file-level clone, not a
whole-filesystem snapshot. As far as I can tell, you can't clone
individual files in ZFS.


 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Btrfs clone WIP patch

2013-02-13 Thread Tom Lane
Jonathan Rogers jrog...@socialserve.com writes:
 Would it be better to move clone_file() into its own module where
 implementations for other file system types might eventually be added?

Yeah, possibly.  I considered suggesting that the current code be
treated as a fallback implementation of clone_file, but I'm not sure
if there's a convenient way to manage the run-time fallback if we try
to do it like that.  In any case, +1 for leaving the door open for
easy addition of other cloning techniques.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal or just idea for psql - show first N rows from relation backslash statement

2013-02-13 Thread Ian Lawrence Barwick
2013/2/14 Tom Lane t...@sss.pgh.pa.us:
 Stephen Frost sfr...@snowman.net writes:
 * Pavel Stehule (pavel.steh...@gmail.com) wrote:
 SELECT * FROM some_relation LIMIT 10

 what do you think about creating a special statement for this purpose?

 I'd rather extend TABLE to support a limit clause or something.

 Can't you pretty much do this already in psql with FETCH_COUNT?  I see
 no good reason to invent more SQL syntax.

Doesn't that just split up the retrieval of the result set into blocks of
FETCH_COUNT rows, i.e. it does not limit the result set?

Personally I set commonly-used queries as a psql variable, though
what Pavel suggests sounds useful and AFAICT is not additional SQL
syntax, just (yet another) psql slash command.

Ian Barwick


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal or just idea for psql - show first N rows from relation backslash statement

2013-02-13 Thread Tom Lane
Ian Lawrence Barwick barw...@gmail.com writes:
 2013/2/14 Tom Lane t...@sss.pgh.pa.us:
 Can't you pretty much do this already in psql with FETCH_COUNT?  I see
 no good reason to invent more SQL syntax.

 Doesn't that just split up the retrieval of the result set into blocks of
 FETCH_COUNT rows, i.e. it does not limit the result set?

Well, you'll get a page worth of data before your pager blocks it,
and then you can continue, or not, for relatively little cost.
I can't see that fetching a fixed number of rows is more useful
than that.

[ experiments... ]  Although I notice that psql's FETCH_COUNT logic
doesn't realize that it could be applied to a TABLE foo query.
That seems like it might be worth fixing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal or just idea for psql - show first N rows from relation backslash statement

2013-02-13 Thread Fabrízio de Royes Mello
On Wed, Feb 13, 2013 at 6:07 PM, Pavel Stehule pavel.steh...@gmail.com wrote:

 Hello

 probably one of my top-ten SQL statements is

 SELECT * FROM some_relation LIMIT 10

 what do you think about creating a special statement for this purpose?

 possible syntax

 -- ViewTable
 \vt table_name [rows]

 or

 \sample table_name [rows]

 an implementation with autocompletion is terribly simple



I liked this idea, but thinking about it more, we could implement a way for
users to create their own meta-commands to run:

* another meta commands (like an alias)
* SRFs
* arbitrary SQLs

All of them must accept arguments... something like this:

\mset vt :table :rows 'select * from :table limit :rows'

Then we can do exactly what you need:

\vt foo 10

\unset vt

I don't know if it's a desired feature, but I would love it if it existed.

Best regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Greg Stark
On Wed, Feb 13, 2013 at 1:31 PM, Alexander Korotkov
aekorot...@gmail.com wrote:
 On Wed, Feb 13, 2013 at 4:51 PM, Greg Stark st...@mit.edu wrote:

 Heikki was talking about a generic WAL record type that would just
 store a binary delta between the version of the block when it was
 locked and when it was unlocked. That would handle any extension
 cleanly as far as data modification goes as long as the extension was
 working through our buffer manager. It seems like an attractive idea
 to me.


 It will, for sure, work well when atomic page changes are enough for us.
 However, some operations, for example page splits, involve changes to
 multiple pages. Replaying changes to only some of those pages is not
 correct. For now, it's hard for me to imagine how to generalize that into
 a generic WAL record type.

I think multiple-page operations where you have all the pages locked
at the same time, like tuple updates for example, are fairly simple.
The existing WAL format can handle at most three such buffers in a
single record, so we can just have a fixed-size buffer large enough to
hold three page images and perform the diff on them when the
extension says it has completed an atomic update. The simplest API
which I suspect would suffice for virtually all users would be to tie
this to buffer locking and unlocking.
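
To make that concrete, the delta itself could be as dumb as a list of
changed-byte runs between the two page images; a sketch (ignoring
alignment, the page header, and the fallback to a full-page image that a
real implementation would need):

#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

/*
 * Emit (offset, length, bytes) runs for every region that differs
 * between the before and after images of a page.  'out' must have
 * worst-case room (about 2.5 * BLCKSZ).  Returns the delta size.
 */
static size_t
page_delta(const char *before, const char *after, char *out)
{
	size_t		n = 0;
	uint32_t	i = 0;

	while (i < BLCKSZ)
	{
		uint16_t	start;
		uint16_t	len;

		if (before[i] == after[i])
		{
			i++;
			continue;
		}
		start = (uint16_t) i;
		while (i < BLCKSZ && before[i] != after[i])
			i++;
		len = (uint16_t) (i - start);

		memcpy(out + n, &start, sizeof(start));
		n += sizeof(start);
		memcpy(out + n, &len, sizeof(len));
		n += sizeof(len);
		memcpy(out + n, after + start, len);
		n += len;
	}
	return n;
}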

The types of records which this would not suffice for would be

a) things that need extra book-keeping during recovery in case some
later record was not recorded. We have only a few such records -- page
splits as you say -- and hopefully extensions wouldn't need them
because frankly they're pretty scary.

b) records that are needed not just to maintain consistency of the
data in the database but to provide some other behaviour -- for
instance the WAL records for locks that 2PC needs or the snapshot
records that hot standby needs. But then these types of records might
be more amenable to an extensible WAL format. Since the loss of them
wouldn't leave the database corrupted, just prevent that feature from
operating correctly.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG #7493: Postmaster messages unreadable in a Windows console

2013-02-13 Thread Noah Misch
On Tue, Feb 12, 2013 at 03:22:17AM +, Greg Stark wrote:
 But that said I'm not sure saying the whole file is in an encoding is
 the right approach. Paths are actually binary strings. any encoding is
 purely for display purposes anyways.

For Unix, yes.  On Windows, they're ultimately UTF16 strings; some system APIs
accept paths in the Windows ANSI code page and convert to UTF16 internally.
Nonetheless, good point.

 What parts of postgresql.conf are actually encoded strings that need
 to be (and can be) manipulated as encoded strings?

Mainly the ones that refer to arbitrary database objects.  At least these:

default_tablespace
default_text_search_config
search_path
temp_tablespaces


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FDW for PostgreSQL

2013-02-13 Thread Tom Lane
Shigeru Hanada shigeru.han...@gmail.com writes:
 [ postgres_fdw.v5.patch ]

I started to look at this patch today.  There seems to be quite a bit
left to do to make it committable.  I'm willing to work on it, but
there are some things that need discussion:

* The code seems to always use GetOuterUserId() to select the foreign
user mapping to use.  This seems entirely wrong.  For instance it will
do the wrong thing inside a SECURITY DEFINER function, where surely the
relevant privileges should be those of the function owner, not the
session user.  I would also argue that if Alice has access to a foreign
table owned by Bob, and Alice creates a view that selects from that
table and grants select privilege on the view to Charlie, then when
Charlie selects from the view the user mapping to use ought to be
Alice's.  (If anyone thinks differently about that, speak up!)
To implement that for queries, we need code similar to what
ExecCheckRTEPerms does, ie rte->checkAsUser ? rte->checkAsUser :
GetUserId().  It's a bit of a pain to get hold of the RTE from
postgresGetForeignRelSize or postgresBeginForeignScan, but it's doable.
(Should we modify the APIs for these functions to make that easier?)
I think possibly postgresAcquireSampleRowsFunc should use the foreign
table's owner regardless of the current user ID - if the user has
permission to run ANALYZE then we don't really want the command to
succeed or fail depending on exactly who the user is.  That's perhaps
debatable, anybody have another theory?
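
For concreteness, the lookup I have in mind is roughly this (a sketch;
actually getting the RTE to this point is the part that may need API
changes):

#include "postgres.h"
#include "foreign/foreign.h"	/* GetUserMapping() */
#include "miscadmin.h"			/* GetUserId() */
#include "nodes/parsenodes.h"	/* RangeTblEntry */

/*
 * Pick the user whose mapping applies, honoring checkAsUser the same
 * way ExecCheckRTEPerms does, then fetch the mapping for that user.
 */
static UserMapping *
select_user_mapping(RangeTblEntry *rte, Oid serverid)
{
	Oid			userid;

	userid = OidIsValid(rte->checkAsUser) ? rte->checkAsUser : GetUserId();
	return GetUserMapping(userid, serverid);
}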

* AFAICT, the patch expects to use a single connection for all
operations initiated under one foreign server + user mapping pair.
I don't think this can possibly be workable.  For instance, we don't
really want postgresIterateForeignScan executing the entire remote query
to completion and stashing the results locally -- what if that's many
megabytes?  It ought to be pulling the rows back a few at a time, and
that's not going to work well if multiple scans are sharing the same
connection.  (We might be able to dodge that by declaring a cursor
for each scan, but I'm not convinced that such a solution will scale up
to writable foreign tables, nested queries, subtransactions, etc.)
I think we'd better be prepared to allow multiple similar connections.
The main reason I'm bringing this up now is that it breaks the
assumption embodied in postgres_fdw_get_connections() and 
postgres_fdw_disconnect() that foreign server + user mapping can
constitute a unique key for identifying connections.  However ...

* I find postgres_fdw_get_connections() and postgres_fdw_disconnect()
to be a bad idea altogether.  These connections ought to be a hidden
implementation matter, not something that the user has a view of, much
less control over.  Aside from the previous issue, I believe it's a
trivial matter to crash the patch as it now stands by applying
postgres_fdw_disconnect() to a connection that's in active use.  I can
see the potential value in being able to shut down connections when a
session has stopped using them, but this is a pretty badly-designed
approach to that.  I suggest that we just drop these functions for now
and revisit that problem later.  (One idea is some sort of GUC setting
to control how many connections can be held open speculatively for
future use.)

* deparse.c contains a depressingly large amount of duplication of logic
from ruleutils.c, and can only need more as we expand the set of
constructs that can be pushed to the remote end.  This doesn't seem like
a maintainable approach.  Was there a specific reason not to try to use
ruleutils.c for this?  I'd much rather tweak ruleutils to expose some
additional APIs, if that's what it takes, than have all this redundant
logic.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Temporal features in PostgreSQL

2013-02-13 Thread Vlad Arkhipov

On 02/04/2013 07:40 PM, Miroslav Šimulčík wrote:

Hi Vlad,

I'm also interested in this topic and work on system-time temporal 
extension. Here I wrote down design of my solution few months ago 
https://wiki.postgresql.org/wiki/SQL2011Temporal. The idea is 
basically the same as in your solution with some minor differences. 
For example:
 - I use AFTER triggers to store old versions of rows in the history
table, so a row is archived only if the modification is actually executed.
Then other BEFORE triggers are not able to see the time that is going to be
inserted into the table. I considered using two triggers, a BEFORE trigger
for setting the period and an AFTER trigger for archiving rows into the
history table, but did not find any use case where that would be better
than just a simple BEFORE trigger.
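
For reference, the single-BEFORE-trigger shape I mean is roughly this (a
sketch: the column name is made up, error handling is minimal, and
archiving the old version into the history table is elided):

#include "postgres.h"

#include "access/htup_details.h"	/* heap_modify_tuple(); access/htup.h on 9.2 */
#include "commands/trigger.h"
#include "executor/spi.h"			/* SPI_fnumber() */
#include "fmgr.h"
#include "utils/timestamp.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(stamp_system_time);

/*
 * BEFORE INSERT/UPDATE row trigger: stamp a "sys_period_start"
 * timestamptz column with the current time, so that later BEFORE
 * triggers already see the value the row will be written with.
 */
Datum
stamp_system_time(PG_FUNCTION_ARGS)
{
	TriggerData *trigdata = (TriggerData *) fcinfo->context;
	TupleDesc	tupdesc;
	HeapTuple	tuple;
	int			attnum;
	Datum	   *values;
	bool	   *nulls;
	bool	   *replaces;

	if (!CALLED_AS_TRIGGER(fcinfo))
		elog(ERROR, "stamp_system_time: not called by trigger manager");

	tupdesc = trigdata->tg_relation->rd_att;
	tuple = TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event) ?
		trigdata->tg_newtuple : trigdata->tg_trigtuple;

	attnum = SPI_fnumber(tupdesc, "sys_period_start");
	if (attnum <= 0)
		elog(ERROR, "column \"sys_period_start\" not found");

	values = (Datum *) palloc0(tupdesc->natts * sizeof(Datum));
	nulls = (bool *) palloc0(tupdesc->natts * sizeof(bool));
	replaces = (bool *) palloc0(tupdesc->natts * sizeof(bool));

	values[attnum - 1] = TimestampTzGetDatum(GetCurrentTimestamp());
	replaces[attnum - 1] = true;

	tuple = heap_modify_tuple(tuple, tupdesc, values, nulls, replaces);

	return PointerGetDatum(tuple);
}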


 - I don't need to deal with update conflicts, because I use
clock_timestamp() instead of current_timestamp.
You can still come across a conflict even with clock_timestamp(). What
if the clock goes backwards during time synchronization? Even if you have
absolutely precise clocks, there may be clock skew between different
CPUs, low system clock resolution, etc.



 Although my solution needs parser changes to stick with the SQL:2011
 standard, maybe you can find something that can help you.
I believe the SQL:2011 temporal features are not too abstract for
PostgreSQL to implement as a core feature. They define only two kinds of
temporal period: the application period (controlled by the
application/user) and system time (controlled by the system/database,
though you cannot specify *how* the system controls it); they do not use
a special type for storing periods (which is inefficient); and they are
tied to the DATE/TIMESTAMP types (what if you need to store revision
numbers instead of times?).




Regards,
Miro


2012/12/25 Vlad Arkhipov arhi...@dc.baikal.ru


Hi all,

Currently I'm working on a large enterprise project that heavily
uses temporal features. We are using a PostgreSQL database for data
storage. For now we are using PL/pgSQL trigger-based and
application-based solutions to handle temporal data. However,
we would like to see this functionality in the PostgreSQL core,
especially with SQL:2011 syntax. There were some discussions several
months ago on temporal support and audit logs:

http://archives.postgresql.org/pgsql-hackers/2012-05/msg00765.php
http://archives.postgresql.org/pgsql-hackers/2012-08/msg00680.php

But currently it seems that there is no active work in this area
(am I wrong?). Now I'm rewriting our temporal solutions into an
extension based on C-language triggers to get a better
sense of the problem space and various use cases. There are two
aspects that temporal features usually include: system-time (aka
transaction-time) and application-time (aka valid-time or
business-time). The topics above discussed only the first one.
However there is also another one, which includes application-time
periods, partial updated/deletes queries, querying for a portion
of application time etc. Details can be found here


http://metadata-standards.org/Document-library/Documents-by-number/WG2-N1501-N1550/WG2_N1536_koa046-Temporal-features-in-SQL-standard.pdf

or in the SQL:2011 standard draft, which is freely available on the
network. It's hard to create a convenient extension for
application-time periods because the parser needs to be changed
(however, an extension may be useful for referential integrity
checks on application-time period temporal tables).

I created a simple solution for system-time period temporal
tables that consists of only one trigger (it resembles the
SPI/timetravel trigger but is based on the new range types
introduced in PostgreSQL 9.2 and is closer to the SQL:2011
approach to temporal features).

http://pgxn.org/dist/temporal_tables/1.0.0/

I'm not a PostgreSQL expert, so I would appreciate it if someone
could review the code briefly. There are some places where I'm not
sure I use certain functions properly. Also, there are some slight
design problems that I would like to discuss if anyone is
interested.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers






[HACKERS] pg_upgrade old cluster delete script

2013-02-13 Thread Bruce Momjian
You might remember this pg_upgrade bug report where the user complained
that user-defined tablespaces _inside_ the old cluster directory were
deleted by the old cluster delete script:


http://www.postgresql.org/message-id/e1thpdm-00018c...@wrigleys.postgresql.org

and my reply that we don't want to fix this:

http://www.postgresql.org/message-id/20121004191201.ga11...@momjian.us

However, creating a script that deletes data is never a good idea, so I
have created the attached patch to avoid creating the script in these
cases and issue a message to the user instead.

The test isn't perfect but it is probably close enough.  I don't think
it is worth unlinking a possibly matching filename in such cases.
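
(For what it's worth, the imperfection is that a plain strncmp() prefix
test also matches a sibling path such as /pgdata2 when the old cluster
lives in /pgdata. A stricter test would insist on a path separator after
the prefix, as in the sketch below, but even that misses symlinks and the
like, so I don't think it's worth chasing further.)

#include <stdbool.h>
#include <string.h>

/* Sketch of a tighter test than a plain strncmp() prefix match. */
static bool
path_is_prefix_of_path(const char *prefix, const char *path)
{
	size_t		len = strlen(prefix);

	return strncmp(path, prefix, len) == 0 &&
		(path[len] == '/' || path[len] == '\0');
}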
 
-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index a7d4a68..0be90ae
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** output_completion_banner(char *analyze_s
*** 246,255 
  		"by pg_upgrade so, once you start the new server, consider running:\n"
  			   "    %s\n\n", analyze_script_file_name);
  
! 	pg_log(PG_REPORT,
! 		   "Running this script will delete the old cluster's data files:\n"
! 		   "    %s\n",
! 		   deletion_script_file_name);
  }
  
  
--- 246,262 
  		"by pg_upgrade so, once you start the new server, consider running:\n"
  			   "    %s\n\n", analyze_script_file_name);
  
! 
! 	if (deletion_script_file_name)
! 		pg_log(PG_REPORT,
! 			   "Running this script will delete the old cluster's data files:\n"
! 			   "    %s\n",
! 			   deletion_script_file_name);
! 	else
! 		pg_log(PG_REPORT,
! 			   "Could not create a script to delete the old cluster's data files\n"
! 			   "because user-defined tablespaces exist in the old cluster directory.\n"
! 			   "The old cluster's contents must be deleted manually.\n");
  
  
*** create_script_for_old_cluster_deletion(c
*** 585,590 
--- 592,609 
  	FILE	   *script = NULL;
  	int			tblnum;
  
+ 	/*
+ 	 *	Some users (oddly) create tablespaces inside the cluster data
+ 	 *	directory.  We can't create a proper old cluster delete script
+ 	 *	in that case.
+ 	 */
+ 	for (tblnum = 0; tblnum < os_info.num_old_tablespaces; tblnum++)
+ 	{
+ 		if (strncmp(os_info.old_tablespaces[tblnum], old_cluster.pgdata,
+ 			strlen(old_cluster.pgdata)) == 0)
+ 			return;
+ 	}
+ 
  	*deletion_script_file_name = pg_malloc(MAXPGPATH);
  
  	prep_status(Creating script to delete old cluster);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FDW for PostgreSQL

2013-02-13 Thread Kohei KaiGai
2013/2/14 Tom Lane t...@sss.pgh.pa.us:
 * deparse.c contains a depressingly large amount of duplication of logic
 from ruleutils.c, and can only need more as we expand the set of
 constructs that can be pushed to the remote end.  This doesn't seem like
 a maintainable approach.  Was there a specific reason not to try to use
 ruleutils.c for this?  I'd much rather tweak ruleutils to expose some
 additional APIs, if that's what it takes, than have all this redundant
 logic.

The original pgsql_fdw design utilized the ruleutils.c logic.
Previously, you suggested implementing its own logic for query deparsing,
and Hanada-san then rewrote the relevant code.
  http://www.postgresql.org/message-id/12181.1331223...@sss.pgh.pa.us

Indeed, most of the logic is duplicated. However, that was done to avoid
bugs in some corner cases; for instance, a function name would not be
schema-qualified even though the function lives in a different schema on
the remote side. Should we add a flag to deparse_expression() to indicate
that the call intends to construct a remotely executable query? It may be
reasonable, but such case-branching generally complicates the code.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: FILTER for aggregates [was Re: [HACKERS] Department of Redundancy Department: makeNode(FuncCall) division]

2013-02-13 Thread David Fetter
On Wed, Feb 13, 2013 at 06:45:31AM -0800, David Fetter wrote:
 On Sat, Feb 09, 2013 at 11:59:22PM -0800, David Fetter wrote:
  Folks,
  
  Per suggestions and lots of help from Andrew Gierth, please find
  attached a patch to clean up the call sites for FuncCall nodes, which
  I'd like to expand centrally rather than in each of the 37 (or 38, but
  I only redid 37) places where it's called.  The remaining one is in
  src/backend/nodes/copyfuncs.c, which has to be modified for any
  changes in the that struct anyhow.
  
  The immediate purpose is two-fold: to reduce some redundancies, which
  I believe is worth doing in and of itself, and to prepare for adding
  FILTER on aggregates from the spec, and possibly other things in
  the aggregate function part of the spec.
  
  Cheers,
  David.
 
 Folks,
 
 Please find attached two versions of a patch which provides optional
 FILTER clause for aggregates (T612, Advanced OLAP operations).
 
 The first is intended to be applied on top of the previous patch, the
 second without it.

I'll find a brown paper bag to wear over my head at some point, but
meanwhile, here's a cleaned-up version of the patch that doesn't use
makeFuncArgs, now without merge artifacts and with the ability to
actually compile.  It's still WIP in the sense previously mentioned.
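
For anyone following along, the centralized constructor from that earlier
patch is essentially the following (a sketch from memory, not the patch
text; agg_filter is the field this patch adds):

#include "postgres.h"
#include "nodes/parsenodes.h"
#include "nodes/pg_list.h"

/*
 * Build a FuncCall node with default values in one place, instead of
 * open-coding makeNode(FuncCall) at each of the 37-odd call sites.
 */
static FuncCall *
makeFuncCall(List *name, List *args, int location)
{
	FuncCall   *n = makeNode(FuncCall);

	n->funcname = name;
	n->args = args;
	n->agg_order = NIL;
	n->agg_filter = NULL;		/* new: FILTER (WHERE ...) expression */
	n->agg_star = false;
	n->agg_distinct = false;
	n->func_variadic = false;
	n->over = NULL;
	n->location = location;
	return n;
}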

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

*** a/src/backend/executor/execQual.c
--- b/src/backend/executor/execQual.c
***
*** 4395,4400  ExecInitExpr(Expr *node, PlanState *parent)
--- 4395,4401 
  
  				astate->args = (List *) ExecInitExpr((Expr *) aggref->args,
  													 parent);
+ 				astate->agg_filter = ExecInitExpr(aggref->agg_filter, parent);
  
  				/*
  				 * Complain if the aggregate's arguments contain any
***
*** 4433,4438  ExecInitExpr(Expr *node, PlanState *parent)
--- 4434,4440 
  
  				wfstate->args = (List *) ExecInitExpr((Expr *) wfunc->args,
  													  parent);
+ 				wfstate->agg_filter = ExecInitExpr(wfunc->agg_filter, parent);
  
  				/*
  				 * Complain if the windowfunc's arguments contain any
*** a/src/backend/executor/functions.c
--- b/src/backend/executor/functions.c
***
*** 364,370  sql_fn_post_column_ref(ParseState *pstate, ColumnRef *cref, Node *var)
  
  								 list_make1(subfield),
  								 list_make1(param),
  								 NIL, false, false, false,
! 								 NULL, true, cref->location);
  	}
  
  	return param;
--- 364,370 
  
  								 list_make1(subfield),
  								 list_make1(param),
  								 NIL, false, false, false,
! 								 NULL, NULL, true, cref->location);
  	}
  
  	return param;
*** a/src/backend/executor/nodeAgg.c
--- b/src/backend/executor/nodeAgg.c
***
*** 487,492  advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
--- 487,504 
  		int			i;
  		TupleTableSlot *slot;
  
+ 		/* Skip anything FILTERed out */
+ 		ExprState  *filter = peraggstate->aggrefstate->agg_filter;
+ 		if (filter)
+ 		{
+ 			MemoryContext oldcontext = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory);
+ 			bool		isnull;
+ 			Datum		res = ExecEvalExpr(filter, aggstate->tmpcontext, &isnull, NULL);
+ 
+ 			MemoryContextSwitchTo(oldcontext);
+ 			if (isnull || !DatumGetBool(res))
+ 				continue;
+ 		}
+ 
  		/* Evaluate the current input expressions for this aggregate */
  		slot = ExecProject(peraggstate->evalproj, NULL);
  
*** a/src/backend/executor/nodeWindowAgg.c
--- b/src/backend/executor/nodeWindowAgg.c
***
*** 226,234  

Re: [HACKERS] BUG #7493: Postmaster messages unreadable in a Windows console

2013-02-13 Thread Alexander Law

Hello,

Alexander Law exclus...@gmail.com writes:

Please look at the following l10n bug:
http://www.postgresql.org/message-id/502a26f1.6010...@gmail.com
and the proposed patch.

With your proposed change, the problem will resurface in an actual SQL_ASCII
database.  At the problem's root is write_console()'s assumption that messages
are in the database encoding.  pg_bind_textdomain_codeset() tries to make that
so, but it only works for encodings with a pg_enc2gettext_tbl entry.  That
excludes SQL_ASCII, MULE_INTERNAL, and others.  write_console() needs to
behave differently in such cases.
Thank you for the notice. So it seems that the DatabaseEncoding variable
alone can't represent both the database encoding (for communication with a
client) and the current process's message encoding (for logging) at
once. There should be another variable, something like
CurrentProcessEncoding, that is set to the OS encoding at start and can
be changed to the encoding of a connected database (if
bind_textdomain_codeset succeeds).
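
In other words, something along these lines (a sketch with invented
names; bind_textdomain_codeset_ok() stands in for whatever
pg_bind_textdomain_codeset() actually reports):

#include "postgres.h"
#include "mb/pg_wchar.h"		/* pg_enc, GetPlatformEncoding() */

extern bool bind_textdomain_codeset_ok(int encoding);	/* hypothetical */

/* encoding of this process's own messages, as opposed to the
 * database encoding used on the client connection */
static int	CurrentProcessEncoding = PG_SQL_ASCII;

void
InitProcessEncoding(void)
{
	/* at startup, messages are in the OS encoding */
	CurrentProcessEncoding = GetPlatformEncoding();
}

void
SwitchProcessEncoding(int db_encoding)
{
	/* switch only if gettext can actually deliver that encoding */
	if (bind_textdomain_codeset_ok(db_encoding))
		CurrentProcessEncoding = db_encoding;
}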



On Tue, Feb 12, 2013 at 03:22:17AM +, Greg Stark wrote:

But that said I'm not sure saying the whole file is in an encoding is
the right approach. Paths are actually binary strings. any encoding is
purely for display purposes anyways.

For Unix, yes.  On Windows, they're ultimately UTF16 strings; some system APIs
accept paths in the Windows ANSI code page and convert to UTF16 internally.
Nonetheless, good point.
Yes, and if postgresql.conf is not going to be UTF-16 encoded, it seems
natural to use the ANSI code page on Windows to write such paths in it.
So the paths should be written in the OS encoding, which is accepted by OS
functions such as fopen. (This is what we have now.)
And it seems too complicated to have different encodings in one file. Or
maybe path parameters should be separated from the others, for which the
OS encoding is undesirable.

If we knew that postgresql.conf was stored in, say, UTF8, then it would
probably be possible to perform encoding conversion to get string
variables into the database encoding.  Perhaps we should allow some
magic syntax to tell us the encoding of a config file?

 file_encoding = 'utf8'  # must precede any non-ASCII in the file

If we're going to do that we might as well use the Emacs standard
-*-coding: latin-1;-*-


Explicit encoding specifications such as these (or even <?xml
version="1.0" encoding="utf-8"?>) can be useful, but what encoding should
be assumed without one? For XML (without a BOM) it's UTF-8; for Emacs it
depends on its language environment.
If postgresql.conf doesn't have to be portable (as XML does), then IMO the
OS encoding is the right choice for it.



Best regards,
Alexander


Re: [HACKERS] proposal or just idea for psql - show first N rows from relation backslash statement

2013-02-13 Thread Pavel Stehule
Hello




 I liked this idea, but thinking about it more, we could implement a way for
 users to create their own meta-commands to run:

 * another meta commands (like an alias)
 * SRFs
 * arbitrary SQLs

 All of them must accept arguments... something like this:

 \mset vt :table :rows 'select * from :table limit :rows'

 Then we can do exactly what you need:

 \vt foo 10

 \unset vt

 I don't know if it's a desired feature, but I would love it if it existed.


A few years ago I proposed an implementation of macros and wrote a
prototype, an enhanced psql:

http://okbob.blogspot.cz/search?q=epsql

but now I don't think enhancing psql in this direction is the right way.
An enhanced console needs to be created from scratch. It should be based
on some interpreted language (Lua or JavaScript), could probably be
better integrated with a modern UI, and could do better autocompletion,
but this design is beyond my available time and outside my goals.

A native implementation of \vt is terribly simple, and it is a generic
and common task.

check it, please

Regards

Pavel

p.s. Some simple form of a bash-style alias command could be implemented,
but probably without autocompletion support.

p.s.2. Variable substitution doesn't work inside single or double quotes.


 Best regards,

 --
 Fabrízio de Royes Mello
 Consultoria/Coaching PostgreSQL
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


vt.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers