Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-03 Thread Robert Haas
On Tue, Feb 2, 2016 at 10:42 PM, Corey Huinker  wrote:
>> I don't see how.  There really is no declaration in there for a
>> variable called server.
>
> Absolutely correct. My only guess is that it was from failing to make clean
> after a checkout/re-checkout. A good reason to have even boring regression
> tests.
>
> Regression tests added.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-03 Thread Robert Haas
On Wed, Feb 3, 2016 at 11:17 AM, Ashutosh Bapat  wrote:
> There seems to be a double successive assignment to fdw_private in a recent
> commit. Here's a patch to remove the first one.

Committed along with a fix for another problem I noted along the way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-03 Thread Ashutosh Bapat
There seems to be a double successive assignment to fdw_private in a recent
commit. Here's a patch to remove the first one.

On Wed, Feb 3, 2016 at 7:40 PM, Robert Haas  wrote:

> On Tue, Feb 2, 2016 at 10:42 PM, Corey Huinker  wrote:
> >> I don't see how.  There really is no declaration in there for a
> >> variable called server.
> >
> > Absolutely correct. My only guess is that it was from failing to make
> > clean after a checkout/re-checkout. A good reason to have even boring
> > regression tests.
> >
> > Regression tests added.
>
> Committed.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d5c0383..c95ac05 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1009,22 +1009,20 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 * expressions to be sent as parameters.
 	 */
 	initStringInfo(&sql);
 	deparseSelectStmtForRel(&sql, root, baserel, remote_conds,
 							best_path->path.pathkeys, &retrieved_attrs,
 							&params_list);
 
 	/*
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwScanPrivateIndex, above.
 	 */
-	fdw_private = list_make2(makeString(sql.data),
-							 retrieved_attrs);
 	fdw_private = list_make3(makeString(sql.data),
 							 retrieved_attrs,
 							 makeInteger(fpinfo->fetch_size));
 
 	/*
 	 * Create the ForeignScan node from target list, filtering expressions,
 	 * remote parameter expressions, and FDW private information.
 	 *
 	 * Note that the remote parameter expressions are stored in the fdw_exprs
 	 * field of the finished plan node; we can't keep them in private state


Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-02 Thread Robert Haas
On Wed, Jan 27, 2016 at 11:19 PM, Corey Huinker  wrote:
> The possible tests might be:
> - creating a server/table with fetch_size set
> - altering an existing server/table to have fetch_size set
> - verifying that the value exists in pg_foreign_server.srvoptions and
> pg_foreign_table.ftoptions.
> - attempting to set fetch_size to a non-integer value

I'd add a test that does one of the first two and skip the others.
I'm not wedded to that exact thing; that's just a suggestion.

> Enjoy.

I'd enjoy it more if, heh heh, it compiled.

postgres_fdw.c:2642:16: error: use of undeclared identifier 'server'
foreach(lc, server->options)
^
../../src/include/nodes/pg_list.h:153:26: note: expanded from macro 'foreach'
for ((cell) = list_head(l); (cell) != NULL; (cell) = lnext(cell))
^
1 error generated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-02 Thread Corey Huinker
>
>
> postgres_fdw.c:2642:16: error: use of undeclared identifier 'server'
> foreach(lc, server->options)
>

Odd. Compiled for me. Maybe I messed up the diff. Will get back to you on
that with the tests suggested.



> ^
> ../../src/include/nodes/pg_list.h:153:26: note: expanded from macro
> 'foreach'
> for ((cell) = list_head(l); (cell) != NULL; (cell) = lnext(cell))
> ^


Didn't modify this file on this or any other work of mine. Possible garbage
from a git pull. Will investigate.


Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-02 Thread Robert Haas
On Tue, Feb 2, 2016 at 5:57 PM, Corey Huinker  wrote:
>> postgres_fdw.c:2642:16: error: use of undeclared identifier 'server'
>> foreach(lc, server->options)
>
>
> Odd. Compiled for me. Maybe I messed up the diff. Will get back to you on
> that with the tests suggested.

I don't see how.  There really is no declaration in there for a
variable called server.

>> ../../src/include/nodes/pg_list.h:153:26: note: expanded from macro
>> 'foreach'
>> for ((cell) = list_head(l); (cell) != NULL; (cell) = lnext(cell))
>> ^
>
> Didn't modify this file on this or any other work of mine. Possible garbage
> from a git pull. Will investigate.

This is context information for the same error, not a separate problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-02 Thread Corey Huinker
>
>
> I don't see how.  There really is no declaration in there for a
> variable called server.
>

Absolutely correct. My only guess is that it was from failing to make clean
after a checkout/re-checkout. A good reason to have even boring regression
tests.

Regression tests added.
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2390e61..e28cf77 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3951,3 +3951,63 @@ QUERY:  CREATE FOREIGN TABLE t5 (
 OPTIONS (schema_name 'import_source', table_name 't5');
 CONTEXT:  importing foreign table "t5"
 ROLLBACK;
+BEGIN;
+CREATE SERVER fetch101 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( fetch_size '101' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'fetch101'
+AND srvoptions @> array['fetch_size=101'];
+ count 
+---
+ 1
+(1 row)
+
+ALTER SERVER fetch101 OPTIONS( SET fetch_size '202' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'fetch101'
+AND srvoptions @> array['fetch_size=101'];
+ count 
+---
+ 0
+(1 row)
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'fetch101'
+AND srvoptions @> array['fetch_size=202'];
+ count 
+---
+ 1
+(1 row)
+
+CREATE FOREIGN TABLE table3 ( x int ) SERVER fetch101 OPTIONS ( fetch_size '3' );
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table3'::regclass
+AND ftoptions @> array['fetch_size=3'];
+ count 
+---
+ 1
+(1 row)
+
+ALTER FOREIGN TABLE table3 OPTIONS ( SET fetch_size '6');
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table3'::regclass
+AND ftoptions @> array['fetch_size=3'];
+ count 
+---
+ 0
+(1 row)
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table3'::regclass
+AND ftoptions @> array['fetch_size=6'];
+ count 
+---
+ 1
+(1 row)
+
+ROLLBACK;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 86a5789..f89de2f 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -131,6 +131,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 			/* check list syntax, warn about uninstalled extensions */
 			(void) ExtractExtensionList(defGetString(def), true);
 		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+		{
+			int		fetch_size;
+
+			fetch_size = strtol(defGetString(def), NULL,10);
+			if (fetch_size <= 0)
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("%s requires a positive integer value",
+def->defname)));
+		}
 	}
 
 	PG_RETURN_VOID();
@@ -162,6 +173,9 @@ InitPgFdwOptions(void)
 		/* updatable is available on both server and table */
 		{"updatable", ForeignServerRelationId, false},
 		{"updatable", ForeignTableRelationId, false},
+		/* fetch_size is available on both server and table */
+		{"fetch_size", ForeignServerRelationId, false},
+		{"fetch_size", ForeignTableRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2ab85f6..8f72ff7 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -68,7 +68,9 @@ enum FdwScanPrivateIndex
 	/* SQL statement to execute remotely (as a String node) */
 	FdwScanPrivateSelectSql,
 	/* Integer list of attribute numbers retrieved by the SELECT */
-	FdwScanPrivateRetrievedAttrs
+	FdwScanPrivateRetrievedAttrs,
+	/* Integer representing the desired fetch_size */
+	FdwScanPrivateFetchSize
 };
 
 /*
@@ -126,6 +128,8 @@ typedef struct PgFdwScanState
 	/* working memory contexts */
 	MemoryContext batch_cxt;	/* context holding current batch of tuples */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+
+	int			fetch_size;		/* number of tuples per fetch */
 } PgFdwScanState;
 
 /*
@@ -380,6 +384,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	fpinfo->fdw_startup_cost = DEFAULT_FDW_STARTUP_COST;
 	fpinfo->fdw_tuple_cost = DEFAULT_FDW_TUPLE_COST;
 	fpinfo->shippable_extensions = NIL;
+	fpinfo->fetch_size = 100;
 
 	foreach(lc, fpinfo->server->options)
 	{
@@ -394,16 +399,17 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		else if (strcmp(def->defname, "extensions") == 0)
 			fpinfo->shippable_extensions =
 ExtractExtensionList(defGetString(def), false);
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtol(defGetString(def), NULL,10);
 	}
 	foreach(lc, fpinfo->table->options)
 	{
 		DefElem*def = (DefElem *) lfirst(lc);
 
 		if (strcmp(def->defname, "use_remote_estimate") == 0)
-		{
 			fpinfo->use_remote_estimate = defGetBoolean(def);
-			break;/* only need the one value */
-		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtol(defGetString(def), NULL,10);
 	}
 
 	/*
@@ -1012,6 +1018,9 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 */
 	fdw_private = 

Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-01 Thread Alvaro Herrera
Corey Huinker wrote:

> Enjoy.

Didn't actually look into the patch but looks like there's progress
here and this might be heading for commit.  Moving to next one.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [POC] FETCH limited by bytes.

2016-01-27 Thread Robert Haas
On Mon, Jan 25, 2016 at 4:18 PM, Corey Huinker  wrote:
> Revised in patch v3:
> * get_option() and get_fetch_size() removed, fetch_size searches added to
> existing loops.
> * Move fetch_size <= 0 tests into postgres_fdw_validator() routine in
> option.c
> * DEBUG1 message removed, never intended that to live beyond the proof of
> concept.
> * Missing regression test mentioned in makefile de-mentioned, as there's
> nothing to see without the DEBUG1 message.
> * Multi-line comment shrunk

Looks pretty good.  You seem to have added a blank line at the end of
postgres_fdw.c, which should be stripped back out.

> I'm not too keen on having *no* new regression tests, but defer to your
> judgement.

So... I'm not *opposed* to regression tests.  I just don't see a
reasonable way to write one.  If you've got an idea, I'm all ears.

> Still not sure what you mean by remote execution options. But it might be
> simpler now that the patch is closer to your expectations.

I'm talking about the documentation portion of the patch, which
regrettably seems to have disappeared between v2 and v3.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] FETCH limited by bytes.

2016-01-27 Thread Corey Huinker
>
>
> Looks pretty good.  You seem to have added a blank line at the end of
> postgres_fdw.c, which should be stripped back out.
>

Done.


>
> > I'm not too keen on having *no* new regression tests, but defer to your
> > judgement.
>
> So... I'm not *opposed* to regression tests.  I just don't see a
> reasonable way to write one.  If you've got an idea, I'm all ears.
>

The possible tests might be:
- creating a server/table with fetch_size set
- altering an existing server/table to have fetch_size set
- verifying that the value exists in pg_foreign_server.srvoptions and
pg_foreign_table.ftoptions.
- attempting to set fetch_size to a non-integer value

None of which are very interesting, and none of which tests what
fetch_size was actually used.

But I'm happy to add any of those that seem worthwhile.


> > Still not sure what you mean by remote execution options. But it might be
> > simpler now that the patch is closer to your expectations.
>
> I'm talking about the documentation portion of the patch, which
> regrettably seems to have disappeared between v2 and v3.
>

Ah, didn't realize you were speaking about the documentation, and since
that section was new, I wasn't familiar with the name. Moved.

...and not sure why the doc change didn't make it into the last patch, but
it's in this one.

I also moved the paragraph starting "When using the extensions option, *it
is the user's responsibility* that..." such that it is in the same
varlistentry as "extensions", though maybe that change should be delegated
to the patch that created the extensions option.

Enjoy.
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 86a5789..f89de2f 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -131,6 +131,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 			/* check list syntax, warn about uninstalled extensions */
 			(void) ExtractExtensionList(defGetString(def), true);
 		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+		{
+			int		fetch_size;
+
+			fetch_size = strtol(defGetString(def), NULL,10);
+			if (fetch_size <= 0)
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("%s requires a positive integer value",
+def->defname)));
+		}
 	}
 
 	PG_RETURN_VOID();
@@ -162,6 +173,9 @@ InitPgFdwOptions(void)
 		/* updatable is available on both server and table */
 		{"updatable", ForeignServerRelationId, false},
 		{"updatable", ForeignTableRelationId, false},
+		/* fetch_size is available on both server and table */
+		{"fetch_size", ForeignServerRelationId, false},
+		{"fetch_size", ForeignTableRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 374faf5..f71bf61 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -68,7 +68,9 @@ enum FdwScanPrivateIndex
 	/* SQL statement to execute remotely (as a String node) */
 	FdwScanPrivateSelectSql,
 	/* Integer list of attribute numbers retrieved by the SELECT */
-	FdwScanPrivateRetrievedAttrs
+	FdwScanPrivateRetrievedAttrs,
+	/* Integer representing the desired fetch_size */
+	FdwScanPrivateFetchSize
 };
 
 /*
@@ -126,6 +128,8 @@ typedef struct PgFdwScanState
 	/* working memory contexts */
 	MemoryContext batch_cxt;	/* context holding current batch of tuples */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+
+	int			fetch_size;		/* number of tuples per fetch */
 } PgFdwScanState;
 
 /*
@@ -380,6 +384,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	fpinfo->fdw_startup_cost = DEFAULT_FDW_STARTUP_COST;
 	fpinfo->fdw_tuple_cost = DEFAULT_FDW_TUPLE_COST;
 	fpinfo->shippable_extensions = NIL;
+	fpinfo->fetch_size = 100;
 
 	foreach(lc, fpinfo->server->options)
 	{
@@ -394,16 +399,17 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		else if (strcmp(def->defname, "extensions") == 0)
 			fpinfo->shippable_extensions =
 ExtractExtensionList(defGetString(def), false);
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtol(defGetString(def), NULL,10);
 	}
 	foreach(lc, fpinfo->table->options)
 	{
 		DefElem*def = (DefElem *) lfirst(lc);
 
 		if (strcmp(def->defname, "use_remote_estimate") == 0)
-		{
 			fpinfo->use_remote_estimate = defGetBoolean(def);
-			break;/* only need the one value */
-		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtol(defGetString(def), NULL,10);
 	}
 
 	/*
@@ -1069,6 +1075,9 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 */
 	fdw_private = list_make2(makeString(sql.data),
 			 retrieved_attrs);
+	fdw_private = list_make3(makeString(sql.data),
+			 retrieved_attrs,
+			 makeInteger(fpinfo->fetch_size));
 
 	/*
 	 * Create the ForeignScan node from target list, filtering expressions,
@@ -1147,6 +1156,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 FdwScanPrivateSelectSql));
 

Re: [HACKERS] [POC] FETCH limited by bytes.

2016-01-25 Thread Robert Haas
On Mon, Jan 25, 2016 at 1:21 PM, Corey Huinker  wrote:
>> - We could consider folding fetch_size into "Remote Execution
>> Options", but maybe that's too clever.
>
> If you care to explain, I'm listening. Otherwise I'm going forward with the
> other suggestions you've made.

It's just a little unfortunate to have multiple sections with only a
single option in each.  It would be nice to avoid that somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] FETCH limited by bytes.

2016-01-25 Thread Robert Haas
On Thu, Dec 31, 2015 at 1:10 AM, Corey Huinker  wrote:
> Here's a re-based patch. Notable changes since the last patch are:
>
> - some changes had to move to postgres_fdw.h
> - the regression tests are in their own script fetch_size.sql /
> fetch_size.out. That should make things easier for the reviewer(s), even if
> we later merge it into postgres_fdw.sql/.out.
> - I put braces around a multi-line error ereport(). That might be a style
> violation, but the statement seemed confusing without it.
> - The fetch sql is reported as a DEBUG1 level ereport(), and confirms that
> server level options are respected, and table level options are used in
> favor of server-level options.
>
> I left the DEBUG1-level message in this patch so that others can see the
> output, but I assume we would omit that for production code, with
> corresponding deletions from fetch_size.sql and fetch_size.out.

Review:

- There is an established idiom in postgres_fdw for how to extract
data from lists of DefElems, and this patch does something else
instead.  Please make it look like postgresIsForeignRelUpdatable,
postgresGetForeignRelSize, and postgresImportForeignSchema instead of
inventing something new.  (Note that your approach would require
multiple passes over the list if you needed information on multiple
options, whereas the existing approach does not.)

- I think the comment in InitPgFdwOptions() could be reduced to a
one-line comment similar to those already present.

- I would reduce the debug level on the fetch command to something
lower than DEBUG1, or drop it entirely.  If you keep it, you need to
fix the formatting so that the line breaks look like other ereport()
calls.

- We could consider folding fetch_size into "Remote Execution
Options", but maybe that's too clever.

I would not worry about trying to get this into the regression tests.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] FETCH limited by bytes.

2016-01-25 Thread Corey Huinker
>
>
> Review:
>
> - There is an established idiom in postgres_fdw for how to extract
> data from lists of DefElems, and this patch does something else
> instead.  Please make it look like postgresIsForeignRelUpdatable,
> postgresGetForeignRelSize, and postgresImportForeignSchema instead of
> inventing something new.  (Note that your approach would require
> multiple passes over the list if you needed information on multiple
> options, whereas the existing approach does not.)
>

I will look into that. The original patch pre-dates import foreign schema,
so I'm not surprised that I missed the established pattern.


>
> - I think the comment in InitPgFdwOptions() could be reduced to a
> one-line comment similar to those already present.
>

Aye.


> - I would reduce the debug level on the fetch command to something
> lower than DEBUG1, or drop it entirely.  If you keep it, you need to
> fix the formatting so that the line breaks look like other ereport()
> calls.
>

As I mentioned, I plan to drop it entirely.  It was only there to show a
reviewer that it works as advertised. There's not much to see without it.


> - We could consider folding fetch_size into "Remote Execution
> Options", but maybe that's too clever.
>

If you care to explain, I'm listening. Otherwise I'm going forward with the
other suggestions you've made.


>
> I would not worry about trying to get this into the regression tests.
>
>
Happy to hear that.


Re: [HACKERS] [POC] FETCH limited by bytes.

2016-01-25 Thread Corey Huinker
On Mon, Jan 25, 2016 at 3:22 PM, Robert Haas  wrote:

> On Mon, Jan 25, 2016 at 1:21 PM, Corey Huinker  wrote:
> >> - We could consider folding fetch_size into "Remote Execution
> >> Options", but maybe that's too clever.
> >
> > If you care to explain, I'm listening. Otherwise I'm going forward with
> > the other suggestions you've made.
>
> It's just a little unfortunate to have multiple sections with only a
> single option in each.  It would be nice to avoid that somehow.
>
>
Revised in patch v3:
* get_option() and get_fetch_size() removed, fetch_size searches added to
existing loops.
* Move fetch_size <= 0 tests into postgres_fdw_validator() routine in
option.c
* DEBUG1 message removed, never intended that to live beyond the proof of
concept.
* Missing regression test mentioned in makefile de-mentioned, as there's
nothing to see without the DEBUG1 message.
* Multi-line comment shrunk
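The fetch_size <= 0 test moved into postgres_fdw_validator() can be sketched as follows. This is an illustrative Python analogue, not the patch's C code (which uses strtol() and ereport()):

```python
def validate_fetch_size(value: str) -> int:
    """Reject anything that is not a positive integer, mirroring the
    check the patch adds to postgres_fdw_validator() in option.c.
    (Illustrative sketch; the C code uses strtol() and ereport().)"""
    try:
        n = int(value)
    except ValueError:
        raise ValueError("fetch_size requires a positive integer value")
    if n <= 0:
        raise ValueError("fetch_size requires a positive integer value")
    return n
```

Note that strtol() silently returns 0 for a non-numeric string, so in the C version the non-integer case also falls through to the <= 0 error.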

(There's a v2 patch that is prior to the change to postgres_fdw_validator()
in option.c, but in retrospect that's not interesting to you).

I'm not too keen on having *no* new regression tests, but defer to your
judgement.

Still not sure what you mean by remote execution options. But it might be
simpler now that the patch is closer to your expectations.


v3.Make-fetch_size-settable-per-server-and-table.patch
Description: Binary data



Re: [HACKERS] [POC] FETCH limited by bytes.

2015-12-30 Thread Corey Huinker
>
>
>> I believe it won't be needed as a regression test, but DEBUGn
>> messages could be used to see "what was sent to the remote
>> side". They can be shown on the console, so this is easily done in a
>> regression test. See the documentation for client_min_messages
>> for details. (It could receive far more messages than expected,
>> though, and may be fragile against changes in other parts.)
>>
>> regards,
>>
>> --
>> Kyotaro Horiguchi
>> NTT Open Source Software Center
>>
>>
>>
> Ok, I'll see what debug-level messages reveal.
>
>
>
Here's a re-based patch. Notable changes since the last patch are:

- some changes had to move to postgres_fdw.h
- the regression tests are in their own script fetch_size.sql /
fetch_size.out. That should make things easier for the reviewer(s), even if
we later merge it into postgres_fdw.sql/.out.
- I put braces around a multi-line error ereport(). That might be a style
violation, but the statement seemed confusing without it.
- The fetch sql is reported as a DEBUG1 level ereport(), and confirms that
server level options are respected, and table level options are used in
favor of server-level options.

I left the DEBUG1-level message in this patch so that others can see the
output, but I assume we would omit that for production code, with
corresponding deletions from fetch_size.sql and fetch_size.out.
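For context, the command the DEBUG1 message reports is the per-batch FETCH that postgres_fdw issues against its remote cursor. A minimal sketch of how it is built (illustrative Python; the C code formats it with snprintf, and the "cN" cursor naming follows fetch_more_data()):

```python
def fetch_sql(fetch_size: int, cursor_number: int) -> str:
    # Each batch is pulled with a FETCH against the remote cursor;
    # mirrors snprintf(sql, sizeof(sql), "FETCH %d FROM c%u", ...)
    return f"FETCH {fetch_size} FROM c{cursor_number}"
```

For example, fetch_sql(100, 1) produces the default-sized batch command "FETCH 100 FROM c1".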
diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index 3543312..5d50b30 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -10,7 +10,7 @@ SHLIB_LINK = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESS = postgres_fdw fetch_size
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 380ac80..f482dcd 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -162,6 +162,12 @@ InitPgFdwOptions(void)
 		/* updatable is available on both server and table */
 		{"updatable", ForeignServerRelationId, false},
 		{"updatable", ForeignTableRelationId, false},
+		/*
+		 * fetch_size is available on both server and table, the table setting
+		 * overrides the server setting.
+		 */
+		{"fetch_size", ForeignServerRelationId, false},
+		{"fetch_size", ForeignTableRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f501c6c..d2595f6 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -68,7 +68,9 @@ enum FdwScanPrivateIndex
 	/* SQL statement to execute remotely (as a String node) */
 	FdwScanPrivateSelectSql,
 	/* Integer list of attribute numbers retrieved by the SELECT */
-	FdwScanPrivateRetrievedAttrs
+	FdwScanPrivateRetrievedAttrs,
+	/* Integer representing the desired fetch_size */
+	FdwScanPrivateFetchSize
 };
 
 /*
@@ -126,6 +128,8 @@ typedef struct PgFdwScanState
 	/* working memory contexts */
 	MemoryContext batch_cxt;	/* context holding current batch of tuples */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+
+	int			fetch_size;		/* number of tuples per fetch */
 } PgFdwScanState;
 
 /*
@@ -303,6 +307,8 @@ static HeapTuple make_tuple_from_result_row(PGresult *res,
 		   List *retrieved_attrs,
 		   MemoryContext temp_context);
 static void conversion_error_callback(void *arg);
+static int get_fetch_size(ForeignTable	*table,
+	ForeignServer *server);
 
 
 /*
@@ -371,6 +377,8 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	/* Look up foreign-table catalog info. */
 	fpinfo->table = GetForeignTable(foreigntableid);
 	fpinfo->server = GetForeignServer(fpinfo->table->serverid);
+	/* Look up any table-specific fetch size */
+	fpinfo->fetch_size = get_fetch_size(fpinfo->table,fpinfo->server);
 
 	/*
 	 * Extract user-settable option values.  Note that per-table setting of
@@ -1069,6 +1077,9 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 */
 	fdw_private = list_make2(makeString(sql.data),
 			 retrieved_attrs);
+	fdw_private = list_make3(makeString(sql.data),
+			 retrieved_attrs,
+			 makeInteger(fpinfo->fetch_size));
 
 	/*
 	 * Create the ForeignScan node from target list, filtering expressions,
@@ -1147,6 +1158,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	 FdwScanPrivateSelectSql));
 	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
 			   FdwScanPrivateRetrievedAttrs);
+	fsstate->fetch_size = intVal(list_nth(fsplan->fdw_private,
+ FdwScanPrivateFetchSize));
 
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
 	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
@@ -2275,15 +2288,13 @@ fetch_more_data(ForeignScanState *node)
 	{
 		PGconn	   *conn = fsstate->conn;
 		char		sql[64];
-		int			fetch_size;
 		int			numrows;
 		int			i;
 
-		/* The fetch size is arbitrary, but shouldn't be enormous. */
-		fetch_size = 100;
-

Re: [HACKERS] [POC] FETCH limited by bytes.

2015-12-26 Thread Vladimir Sitnikov
>Have you got numbers showing any actual performance win for postgres_fdw?

For JDBC purposes, it would be nice to have the ability to ask the
backend "to stop fetching once it has produced more than X MiB of
response data".
For a small table (4 int4 fields), having a decent fetchSize (~1000)
makes result processing 7 times faster than a fetchSize of 50 rows
(14 ms -> 2 ms for 2000 rows).
Here are the measurements: [1] and [2].

Note: it is not required to precisely follow the given "max fetch bytes"
limit. It would be enough just to stop after a certain amount of data
was sent.
The whole point of using a limited fetch size is to avoid running out of
memory on the client side.
I do not think developers care how many rows are fetched at once. If
they do, they should rather use "limit X" SQL syntax.

Do you have a suggestion for such a use case?

For fixed-size data types, the JDBC driver can estimate a "max sane
fetch size"; however:
1) In order to know the data types, a round trip is required. This means
the first fetch must be conservative, so small queries would be
penalized.
2) For variable-length types there is no way to estimate a "sane number
of rows", except by using the average row size of already-received data.
This is not reliable, especially if the first rows have nulls and
subsequent ones contain non-empty strings.
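The adaptive estimate described in (2) can be sketched as follows. This is a hypothetical client-side heuristic, not code from the patch; the names and clamping bounds are assumptions:

```python
def next_fetch_size(max_bytes, rows_seen, bytes_seen,
                    first_fetch=50, max_rows=10000):
    """Estimate how many rows should fit in max_bytes, from the average
    size of rows already received (hypothetical heuristic)."""
    if rows_seen == 0:
        # No data yet: the first fetch must be conservative (point 1).
        return first_fetch
    avg_row = bytes_seen / rows_seen
    # Clamp to a sane range; early rows (e.g. all NULLs) can skew
    # avg_row badly, which is exactly the unreliability noted in (2).
    return max(1, min(max_rows, int(max_bytes // max(avg_row, 1))))
```

If early rows are unrepresentatively small, the estimate overshoots until larger rows pull the average back up, which is why a hard byte limit on the server side would be more robust.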

[1]: https://github.com/pgjdbc/pgjdbc/issues/292#issue-82595473
[2]: https://github.com/pgjdbc/pgjdbc/issues/292#issuecomment-107019387

Vladimir




Re: [HACKERS] [POC] FETCH limited by bytes.

2015-12-26 Thread Corey Huinker
On Fri, Dec 25, 2015 at 12:31 AM, Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hello,
>
> At Thu, 24 Dec 2015 18:31:42 -0500, Corey Huinker 
> wrote in 

Re: [HACKERS] [POC] FETCH limited by bytes.

2015-12-26 Thread Corey Huinker
On Sat, Dec 26, 2015 at 6:16 AM, Vladimir Sitnikov <sitnikov.vladi...@gmail.com> wrote:

> >Have you got numbers showing any actual performance win for postgres_fdw?
>
> For JDBC purposes, it would be nice to have an ability of asking
> backend "to stop fetching if produced more than X MiB of response
> data".
> For small table (4 int4 fields), having decent fetchSize (~1000) makes
> result processing 7 times faster than with fetchSize of 50 rows (14 ms
> -> 2 ms for 2000 rows).
> Here are the measurements: [1] and [2].
>
> Note: it is not required to precisely follow given "max fetch bytes"
> limit. It would be enough just to stop after certain amount of data
> was sent.
> The whole thing of using limited fetch size is to avoid running out of
> memory at client side.
> I do not think developers care how many rows is fetched at once. It
> they do, they should rather use "limit X" SQL syntax.
>
> Do you have a suggestion for such a use case?
>
> For fixed-size data types, JDBC driver can estimate "max sane fetch
> size", however:
> 1) In order to know data types, a roundtrip is required. This means
> the first fetch must be conservative, thus small queries would be
> penalized.
> 2) For variable length types there is no way to estimate "sane number
> of rows", except by using "average row size of already received data".
> This is not reliable, especially if the first rows have nulls, and
> subsequent ones contain non-empty strings.
>
> [1]: https://github.com/pgjdbc/pgjdbc/issues/292#issue-82595473
> [2]: https://github.com/pgjdbc/pgjdbc/issues/292#issuecomment-107019387
>
> Vladimir
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

I believe that Kyotaro proposed something like that, wherein the FDW would
be more adaptive based on the amount of memory available, and fetch a
number of rows that, by its estimation, would fit in the memory available.
I don't know the progress of that patch.

This patch is a far less complicated solution and puts the burden on the
DBA to figure out approximately how many rows would fit in memory based on
the average row size, and set the per-table option accordingly. If it is
later determined that the rows are now too heavy to fit into the space
allotted, the fetch size can be altered for that table as needed.


Re: [HACKERS] [POC] FETCH limited by bytes.

2015-12-26 Thread Vladimir Sitnikov
>and fetch a number of rows that, by its estimation, would fit in the memory 
>available

What's wrong with having a size limit in the first place? It seems to
make much more sense.

Vladimir




Re: [HACKERS] [POC] FETCH limited by bytes.

2015-12-24 Thread Corey Huinker
On Wed, Dec 23, 2015 at 3:08 PM, Jim Nasby  wrote:

> On 12/23/15 12:15 PM, Corey Huinker wrote:
>
>> That's fair. I'm still at a loss for how to show that the fetch size was
>> respected by the remote server, suggestions welcome!
>>
>
> A combination of repeat() and generate_series()?
>
> I'm guessing it's not that obvious and that I'm missing something; can you
> elaborate?
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


I'll try. So the basic test of whether the FDW respected the fetch limit is
this:

1. create foreign server using postgres_fdw, create foreign table.
2. run a query against that table. it works. great.
3. alter server set fetch size option to 101 (or any number different from
100)
4. run same query against the table. the server side should show that the
result set was fetched in 101 row chunks[1].
5. alter table set fetch size option to 102 (or any number different from
100 and the number you picked in #3)
6. run same query against the table. the server side should show that the
result set was fetched in 102 row chunks[1].

The parts marked [1] are the problem...because the way I know it works is
looking at the query console on the remote redshift cluster where the query
column reads "FETCH 101 in c1" or somesuch rather than the query text.
That's great, *I* know it works, but I don't know how to capture that
information from a vanilla postgres server, and I don't know if we can do
the regression with a loopback connection, or if we'd need to set up a
second pg instance for the regression test scaffolding.
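For what it's worth, one plausible way to observe the FETCH batches on a vanilla PostgreSQL server (an assumption on my part, not what any committed regression test does) is to enable statement logging on the remote or loopback instance:

```sql
-- On the server being queried by the FDW:
ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();
-- Run the query through the foreign table, then the remote server log
-- should contain lines such as:
--   LOG:  statement: FETCH 101 FROM c1
```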


Re: [HACKERS] [POC] FETCH limited by bytes.

2015-12-24 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 24 Dec 2015 18:31:42 -0500, Corey Huinker  
wrote in 

Re: [HACKERS] [POC] FETCH limited by bytes.

2015-12-23 Thread Michael Paquier
On Mon, Sep 21, 2015 at 1:52 PM, Corey Huinker  wrote:
> Attached is the patch / diff against current master.

I have moved this patch to the next CF; there is a new patch, but no reviews.
-- 
Michael




Re: [HACKERS] [POC] FETCH limited by bytes.

2015-12-23 Thread Corey Huinker
On Wed, Dec 23, 2015 at 8:43 AM, Michael Paquier 
wrote:

> On Mon, Sep 21, 2015 at 1:52 PM, Corey Huinker 
> wrote:
> > Attached is the patch / diff against current master.
>
> I have moved this patch to the next CF; there is a new patch, but no reviews.
> --
> Michael
>


That's fair. I'm still at a loss for how to show that the fetch size was
respected by the remote server, suggestions welcome!


Re: [HACKERS] [POC] FETCH limited by bytes.

2015-12-23 Thread Jim Nasby

On 12/23/15 12:15 PM, Corey Huinker wrote:

That's fair. I'm still at a loss for how to show that the fetch size was
respected by the remote server, suggestions welcome!


A combination of repeat() and generate_series()?

I'm guessing it's not that obvious and that I'm missing something; can 
you elaborate?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [POC] FETCH limited by bytes.

2015-09-20 Thread Corey Huinker
On Fri, Sep 4, 2015 at 2:45 AM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hello,
>
> > > > @@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,
> > > int eflags)
> ...
> > > > + def = get_option(table->options, "fetch_size");
> >
> > > I don't think it's a good idea to make such checks at runtime - and
> > > either way it's something that should be reported back using an
> > > ereport(), not an elog.
> >
> > > Also, it seems somewhat wrong to determine this at execution
> > > time. Shouldn't this rather be done when creating the foreign scan
> node?
> > > And be a part of the scan state?
> >
> > I agree, that was my original plan, but I wasn't familiar enough with the
> > FDW architecture to know where the table struct and the scan struct were
> > both exposed in the same function.
> >
> > What I submitted incorporated some of Kyotaro's feedback (and working
> > patch) to my original incomplete patch.
>
> Sorry, that certainly wasn't a good place to do such a thing. I
> simply picked that spot in order to avoid adding a similar new
> member to multiple existing structs (PgFdwRelationInfo and
> PgFdwScanState).
>
> Given that a new member fetch_size is added to PgFdwRelationInfo and
> PgFdwScanState, I think postgresGetForeignRelSize is the best place
> to do that, since it collects the basic information needed to
> calculate scan costs.
>
> # Fetch sizes of foreign join would be the future issue..
>
> > typedef struct PgFdwRelationInfo
> > {
>   ...
> + intfetch_size;   /* fetch size for this remote table */
>
> 
> > postgresGetForeignRelSize()
> > {
>   ...
> > fpinfo->table = GetForeignTable(foreigntableid);
> > fpinfo->server = GetForeignServer(fpinfo->table->serverid);
> >
> + def = get_option(table->options, "fetch_size");
> + ..
> + fpinfo->fetch_size = strtod(defGetString...
>
> It is also doable in postgresGetForeignPlan, and doing it there
> removes the redundant copy of fetch_size from PgFdwRelationInfo to
> PgFdwScanState, but the theoretical basis would be weaker.
>
> regards,
>
> > > On 2015-02-27 13:50:22 -0500, Corey Huinker wrote:
> > > > +static DefElem*
> > > > +get_option(List *options, char *optname)
> > > > +{
> > > > + ListCell *lc;
> > > > +
> > > > + foreach(lc, options)
> > > > + {
> > > > + DefElem *def = (DefElem *) lfirst(lc);
> > > > +
> > > > + if (strcmp(def->defname, optname) == 0)
> > > > + return def;
> > > > + }
> > > > + return NULL;
> > > > +}
> > >
> > >
> > > >   /*
> > > >* Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state
> stays
> > > NULL.
> > > > @@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,
> > > int eflags)
> > > >   server = GetForeignServer(table->serverid);
> > > >   user = GetUserMapping(userid, server->serverid);
> > > >
> > > > + /* Reading table options */
> > > > + fsstate->fetch_size = -1;
> > > > +
> > > > + def = get_option(table->options, "fetch_size");
> > > > + if (!def)
> > > > + def = get_option(server->options, "fetch_size");
> > > > +
> > > > + if (def)
> > > > + {
> > > > + fsstate->fetch_size = strtod(defGetString(def), NULL);
> > > > + if (fsstate->fetch_size < 0)
> > > > + elog(ERROR, "invalid fetch size for foreign
> table
> > > \"%s\"",
> > > > +  get_rel_name(table->relid));
> > > > + }
> > > > + else
> > > > + fsstate->fetch_size = 100;
> > >
> > > I don't think it's a good idea to make such checks at runtime - and
> > > either way it's something that should be reported back using an
> > > ereport(), not an elog.
> >
> > > Also, it seems somewhat wrong to determine this at execution
> > > time. Shouldn't this rather be done when creating the foreign scan
> node?
> > > And be a part of the scan state?
> >
> > I agree, that was my original plan, but I wasn't familiar enough with the
> > FDW architecture to know where the table struct and the scan struct were
> > both exposed in the same function.
> >
> > What I submitted incorporated some of Kyotaro's feedback (and working
> > patch) to my original incomplete patch.
> >
> > > Have you thought about how this option should cooperate with join
> > > pushdown once implemented?
> > >
> >
> > I hadn't until now, but I think the only sensible thing would be to
> > disregard table-specific settings once a second foreign table is
> detected,
> > and instead consider only the server-level setting.
> >
> > I suppose one could argue that if ALL the tables in the join had the same
> > table-level setting, we should go with that, but I think that would be
> > complicated, expensive, and generally a good argument for changing the
> > server setting instead.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>
Ok, with some guidance from RhodiumToad (thanks!) I was able to get 

Re: [HACKERS] [POC] FETCH limited by bytes.

2015-09-04 Thread Kyotaro HORIGUCHI
Hello,

> > > @@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,
> > int eflags)
...
> > > + def = get_option(table->options, "fetch_size");
> 
> > I don't think it's a good idea to make such checks at runtime - and
> > either way it's something that should be reported back using an
> > ereport(), not an elog.
> 
> > Also, it seems somewhat wrong to determine this at execution
> > time. Shouldn't this rather be done when creating the foreign scan node?
> > And be a part of the scan state?
> 
> I agree, that was my original plan, but I wasn't familiar enough with the
> FDW architecture to know where the table struct and the scan struct were
> both exposed in the same function.
> 
> What I submitted incorporated some of Kyotaro's feedback (and working
> patch) to my original incomplete patch.

Sorry, that certainly wasn't a good place to do such a thing. I
simply picked that spot in order to avoid adding a similar new
member to multiple existing structs (PgFdwRelationInfo and
PgFdwScanState).

Given that a new member fetch_size is added to PgFdwRelationInfo and
PgFdwScanState, I think postgresGetForeignRelSize is the best place
to do that, since it collects the basic information needed to
calculate scan costs.

# Fetch sizes of foreign join would be the future issue..

> typedef struct PgFdwRelationInfo
> {
  ...
+ intfetch_size;   /* fetch size for this remote table */


> postgresGetForeignRelSize()
> {
  ...
> fpinfo->table = GetForeignTable(foreigntableid);
> fpinfo->server = GetForeignServer(fpinfo->table->serverid);
>  
+ def = get_option(table->options, "fetch_size");
+ ..
+ fpinfo->fetch_size = strtod(defGetString...

It is also doable in postgresGetForeignPlan, and doing it there
removes the redundant copy of fetch_size from PgFdwRelationInfo to
PgFdwScanState, but the theoretical basis would be weaker.

regards,

> > On 2015-02-27 13:50:22 -0500, Corey Huinker wrote:
> > > +static DefElem*
> > > +get_option(List *options, char *optname)
> > > +{
> > > + ListCell *lc;
> > > +
> > > + foreach(lc, options)
> > > + {
> > > + DefElem *def = (DefElem *) lfirst(lc);
> > > +
> > > + if (strcmp(def->defname, optname) == 0)
> > > + return def;
> > > + }
> > > + return NULL;
> > > +}
> >
> >
> > >   /*
> > >* Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays
> > NULL.
> > > @@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,
> > int eflags)
> > >   server = GetForeignServer(table->serverid);
> > >   user = GetUserMapping(userid, server->serverid);
> > >
> > > + /* Reading table options */
> > > + fsstate->fetch_size = -1;
> > > +
> > > + def = get_option(table->options, "fetch_size");
> > > + if (!def)
> > > + def = get_option(server->options, "fetch_size");
> > > +
> > > + if (def)
> > > + {
> > > + fsstate->fetch_size = strtod(defGetString(def), NULL);
> > > + if (fsstate->fetch_size < 0)
> > > + elog(ERROR, "invalid fetch size for foreign table
> > \"%s\"",
> > > +  get_rel_name(table->relid));
> > > + }
> > > + else
> > > + fsstate->fetch_size = 100;
> >
> > I don't think it's a good idea to make such checks at runtime - and
> > either way it's something that should be reported back using an
> > ereport(), not an elog.
> 
> > Also, it seems somewhat wrong to determine this at execution
> > time. Shouldn't this rather be done when creating the foreign scan node?
> > And be a part of the scan state?
> 
> I agree, that was my original plan, but I wasn't familiar enough with the
> FDW architecture to know where the table struct and the scan struct were
> both exposed in the same function.
> 
> What I submitted incorporated some of Kyotaro's feedback (and working
> patch) to my original incomplete patch.
>
> > Have you thought about how this option should cooperate with join
> > pushdown once implemented?
> >
> 
> I hadn't until now, but I think the only sensible thing would be to
> disregard table-specific settings once a second foreign table is detected,
> and instead consider only the server-level setting.
> 
> I suppose one could argue that if ALL the tables in the join had the same
> table-level setting, we should go with that, but I think that would be
> complicated, expensive, and generally a good argument for changing the
> server setting instead.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] [POC] FETCH limited by bytes.

2015-09-03 Thread Corey Huinker
On Wed, Sep 2, 2015 at 9:08 AM, Andres Freund  wrote:

> On 2015-02-27 13:50:22 -0500, Corey Huinker wrote:
> > +static DefElem*
> > +get_option(List *options, char *optname)
> > +{
> > + ListCell *lc;
> > +
> > + foreach(lc, options)
> > + {
> > + DefElem *def = (DefElem *) lfirst(lc);
> > +
> > + if (strcmp(def->defname, optname) == 0)
> > + return def;
> > + }
> > + return NULL;
> > +}
>
>
> >   /*
> >* Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays
> NULL.
> > @@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,
> int eflags)
> >   server = GetForeignServer(table->serverid);
> >   user = GetUserMapping(userid, server->serverid);
> >
> > + /* Reading table options */
> > + fsstate->fetch_size = -1;
> > +
> > + def = get_option(table->options, "fetch_size");
> > + if (!def)
> > + def = get_option(server->options, "fetch_size");
> > +
> > + if (def)
> > + {
> > + fsstate->fetch_size = strtod(defGetString(def), NULL);
> > + if (fsstate->fetch_size < 0)
> > + elog(ERROR, "invalid fetch size for foreign table
> \"%s\"",
> > +  get_rel_name(table->relid));
> > + }
> > + else
> > + fsstate->fetch_size = 100;
>
> I don't think it's a good idea to make such checks at runtime - and
> either way it's something that should be reported back using an
> ereport(), not an elog.


> Also, it seems somewhat wrong to determine this at execution
> time. Shouldn't this rather be done when creating the foreign scan node?
> And be a part of the scan state?
>

I agree, that was my original plan, but I wasn't familiar enough with the
FDW architecture to know where the table struct and the scan struct were
both exposed in the same function.

What I submitted incorporated some of Kyotaro's feedback (and working
patch) to my original incomplete patch.


>
> Have you thought about how this option should cooperate with join
> pushdown once implemented?
>

I hadn't until now, but I think the only sensible thing would be to
disregard table-specific settings once a second foreign table is detected,
and instead consider only the server-level setting.

I suppose one could argue that if ALL the tables in the join had the same
table-level setting, we should go with that, but I think that would be
complicated, expensive, and generally a good argument for changing the
server setting instead.


Re: [HACKERS] [POC] FETCH limited by bytes.

2015-09-02 Thread Andres Freund
On 2015-02-27 13:50:22 -0500, Corey Huinker wrote:
> +static DefElem*
> +get_option(List *options, char *optname)
> +{
> + ListCell *lc;
> +
> + foreach(lc, options)
> + {
> + DefElem *def = (DefElem *) lfirst(lc);
> +
> + if (strcmp(def->defname, optname) == 0)
> + return def;
> + }
> + return NULL;
> +}


>   /*
>* Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
> @@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node, int 
> eflags)
>   server = GetForeignServer(table->serverid);
>   user = GetUserMapping(userid, server->serverid);
>  
> + /* Reading table options */
> + fsstate->fetch_size = -1;
> +
> + def = get_option(table->options, "fetch_size");
> + if (!def)
> + def = get_option(server->options, "fetch_size");
> +
> + if (def)
> + {
> + fsstate->fetch_size = strtod(defGetString(def), NULL);
> + if (fsstate->fetch_size < 0)
> + elog(ERROR, "invalid fetch size for foreign table 
> \"%s\"",
> +  get_rel_name(table->relid));
> + }
> + else
> + fsstate->fetch_size = 100;

I don't think it's a good idea to make such checks at runtime - and
either way it's something that should be reported back using an
ereport(), not an elog.

Also, it seems somewhat wrong to determine this at execution
time. Shouldn't this rather be done when creating the foreign scan node?
And be a part of the scan state?

Have you thought about how this option should cooperate with join
pushdown once implemented?

Greetings,

Andres Freund




Re: [HACKERS] [POC] FETCH limited by bytes.

2015-02-27 Thread Corey Huinker
Attached is a diff containing the original (working) patch from my
(incomplete) patch, plus regression test changes and documentation changes.

While it's easy to regression-test the persistence of the fetch_size
options, I am confounded as to how we would show that the fetch_size
setting was respected. I've seen it with my own eyes viewing the query log
on redshift, but I see no way to automate that. Suggestions welcome.

On Fri, Feb 6, 2015 at 3:11 AM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:

 Hello,

  Redshift has a table, stv_inflight, which serves about the same purpose
 as
  pg_stat_activity. Redshift seems to perform better with very high fetch
  sizes (10,000 is a good start), so the default foreign data wrapper
 didn't
  perform so well.

 I agree with you.

  I was able to confirm that the first query showed FETCH 150 FROM c1 as
  the query, which is normally highly unhelpful, but in this case it confirms
  that tables created in redshift_server do by default use the fetch_size
  option given during server creation.
 
  I was also able to confirm that the second query showed FETCH 151 FROM
 c1
  as the query, which shows that per-table overrides also work.
 
  I'd be happy to stop here, but Kyotaro might feel differently.

 This is enough in its own way, of course.

  With this
  limited patch, one could estimate the number of rows that would fit into
  the desired memory block based on the row width and set fetch_size
  accordingly.

 Users, including me, will be happy with it as long as they know
 how to determine the fetch size - especially when the remote tables
 are very few or the configuration is stable enough.

 On widely distributed systems it would be very difficult to tune the
 fetch size manually for every foreign table, so in the end it would
 be left at the safe default size, which is 100 or so.

 This is similar to the max_wal_size discussion on another thread.
 Calculating a fetch size is far tougher for users than setting an
 expected memory usage, I think.

  But we could go further, and have a fetch_max_memory option only at the
  table level, and the fdw could do that same memory / estimated_row_size
  calculation and derive fetch_size that way *at table creation time*, not
  query time.

 We cannot know the real length of text-type data in advance.
 Beyond that, even for a column defined as char(n), the n is only the
 theoretical (by-design) maximum size of the field; in most cases the
 mean length of the real data will be far smaller than that. For that
 reason, calculating the ratio at table creation time seems difficult.

 However, I agree with Tom's suggestion that the changes to the FETCH
 statement are definitely ugly; the overhead argument in particular
 is prohibitive even for me :(

  Thanks to Kyotaro and Bruce Momjian for their help.

 Not at all.

 regards,


 At Wed, 4 Feb 2015 18:06:02 -0500, Corey Huinker corey.huin...@gmail.com
 wrote in CADkLM=eTpKYX5VOfjLr0VvfXhEZbC2UeakN=
 p6mxmg7s86c...@mail.gmail.com
  I applied this patch to REL9_4_STABLE, and I was able to connect to a
  foreign database (redshift, actually).
 
  the basic outline of the test is below, names changed to protect my
  employment.
 
  create extension if not exists postgres_fdw;
 
  create server redshift_server foreign data wrapper postgres_fdw
  options ( host 'some.hostname.ext', port '5439', dbname 'redacted',
  fetch_size '150' );
 
  create user mapping for public server redshift_server options ( user
  'redacted_user', password 'comeonyouarekiddingright' );
 
  create foreign table redshift_tab150 ( colspecs )
  server redshift_server options (table_name 'redacted_table', schema_name
  'redacted_schema' );
 
  create foreign table redshift_tab151 ( colspecs )
  server redshift_server options (table_name 'redacted_table', schema_name
  'redacted_schema', fetch_size '151' );
 
  -- i don't expect the fdw to push the aggregate, this is just a test to
 see
  what query shows up in stv_inflight.
  select count(*) from redshift_ccp150;
 
  -- i don't expect the fdw to push the aggregate, this is just a test to
 see
  what query shows up in stv_inflight.
  select count(*) from redshift_ccp151;
 
 
  For those of you that aren't familiar with Redshift, it's a columnar
  database that seems to be a fork of postgres 8.cough. You can connect to
 it
  with modern libpq programs (psql, psycopg2, etc).
  Redshift has a table, stv_inflight, which serves about the same purpose
 as
  pg_stat_activity. Redshift seems to perform better with very high fetch
  sizes (10,000 is a good start), so the default foreign data wrapper
 didn't
  perform so well.
 
  I was able to confirm that the first query showed FETCH 150 FROM c1 as
  the query, which is normally highly unhelpful, but in this case it confirms
  that tables created in redshift_server do by default use the fetch_size
  option given during server creation.
 
  I was also able to confirm that the second query showed FETCH 151 FROM
 c1
 

Re: [HACKERS] [POC] FETCH limited by bytes.

2015-02-06 Thread Kyotaro HORIGUCHI
Hello,

 Redshift has a table, stv_inflight, which serves about the same purpose as
 pg_stat_activity. Redshift seems to perform better with very high fetch
 sizes (10,000 is a good start), so the default foreign data wrapper didn't
 perform so well.

I agree with you.

 I was able to confirm that the first query showed FETCH 150 FROM c1 as
 the query, which is normally highly unhelpful, but in this case it confirms
 that tables created in redshift_server do by default use the fetch_size
 option given during server creation.
 
 I was also able to confirm that the second query showed FETCH 151 FROM c1
 as the query, which shows that per-table overrides also work.
 
 I'd be happy to stop here, but Kyotaro might feel differently.

This is enough in its own way, of course.

 With this
 limited patch, one could estimate the number of rows that would fit into
 the desired memory block based on the row width and set fetch_size
 accordingly.

Users, including me, will be happy with it as long as they know
how to determine the fetch size - especially when the remote tables
are very few or the configuration is stable enough.

On widely distributed systems it would be very difficult to tune the
fetch size manually for every foreign table, so in the end it would
be left at the safe default size, which is 100 or so.

This is similar to the max_wal_size discussion on another thread.
Calculating a fetch size is far tougher for users than setting an
expected memory usage, I think.
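The manual calculation being described could be sketched as follows (a hypothetical table name; uses the average column widths from the pg_stats view):

```sql
-- Turn a ~10 MB per-batch memory budget into a row-count fetch size:
SELECT (10 * 1024 * 1024) / greatest(sum(avg_width), 1) AS suggested_fetch_size
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'remote_tab';
```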

 But we could go further, and have a fetch_max_memory option only at the
 table level, and the fdw could do that same memory / estimated_row_size
 calculation and derive fetch_size that way *at table creation time*, not
 query time.

We cannot know the real length of text-type data in advance.
Beyond that, even for a column defined as char(n), the n is only the
theoretical (by-design) maximum size of the field; in most cases the
mean length of the real data will be far smaller than that. For that
reason, calculating the ratio at table creation time seems difficult.

However, I agree with Tom's suggestion that the changes to the FETCH
statement are definitely ugly; the overhead argument in particular
is prohibitive even for me :(

 Thanks to Kyotaro and Bruce Momjian for their help.

Not at all.

regards,


At Wed, 4 Feb 2015 18:06:02 -0500, Corey Huinker corey.huin...@gmail.com 
wrote in CADkLM=eTpKYX5VOfjLr0VvfXhEZbC2UeakN=p6mxmg7s86c...@mail.gmail.com
 I applied this patch to REL9_4_STABLE, and I was able to connect to a
 foreign database (redshift, actually).
 
 the basic outline of the test is below, names changed to protect my
 employment.
 
 create extension if not exists postgres_fdw;
 
 create server redshift_server foreign data wrapper postgres_fdw
 options ( host 'some.hostname.ext', port '5439', dbname 'redacted',
 fetch_size '150' );
 
 create user mapping for public server redshift_server options ( user
 'redacted_user', password 'comeonyouarekiddingright' );
 
 create foreign table redshift_tab150 ( colspecs )
 server redshift_server options (table_name 'redacted_table', schema_name
 'redacted_schema' );
 
 create foreign table redshift_tab151 ( colspecs )
 server redshift_server options (table_name 'redacted_table', schema_name
 'redacted_schema', fetch_size '151' );
 
 -- i don't expect the fdw to push the aggregate, this is just a test to see
 what query shows up in stv_inflight.
 select count(*) from redshift_ccp150;
 
 -- i don't expect the fdw to push the aggregate, this is just a test to see
 what query shows up in stv_inflight.
 select count(*) from redshift_ccp151;
 
 
 For those of you that aren't familiar with Redshift, it's a columnar
 database that seems to be a fork of postgres 8.cough. You can connect to it
 with modern libpq programs (psql, psycopg2, etc).
 Redshift has a table, stv_inflight, which serves about the same purpose as
 pg_stat_activity. Redshift seems to perform better with very high fetch
 sizes (10,000 is a good start), so the default foreign data wrapper didn't
 perform so well.
 
 I was able to confirm that the first query showed FETCH 150 FROM c1 as
 the query, which is normally highly unhelpful, but in this case it confirms
 that tables created in redshift_server do by default use the fetch_size
 option given during server creation.
 
 I was also able to confirm that the second query showed FETCH 151 FROM c1
 as the query, which shows that per-table overrides also work.
 
 I'd be happy to stop here, but Kyotaro might feel differently. With this
 limited patch, one could estimate the number of rows that would fit into
 the desired memory block based on the row width and set fetch_size
 accordingly.
 
 But we could go further, and have a fetch_max_memory option only at the
 table level, and the fdw could do that same memory / estimated_row_size
 calculation and derive fetch_size that way *at table creation time*, not
 query time.
 
 Thanks to Kyotaro and Bruce Momjian for their help.
 
 
 
 
 
 
 On Mon, 

Re: [HACKERS] [POC] FETCH limited by bytes.

2015-02-04 Thread Corey Huinker
I applied this patch to REL9_4_STABLE, and I was able to connect to a
foreign database (redshift, actually).

the basic outline of the test is below, names changed to protect my
employment.

create extension if not exists postgres_fdw;

create server redshift_server foreign data wrapper postgres_fdw
options ( host 'some.hostname.ext', port '5439', dbname 'redacted',
fetch_size '150' );

create user mapping for public server redshift_server options ( user
'redacted_user', password 'comeonyouarekiddingright' );

create foreign table redshift_tab150 ( colspecs )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema' );

create foreign table redshift_tab151 ( colspecs )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema', fetch_size '151' );

-- i don't expect the fdw to push the aggregate, this is just a test to see
what query shows up in stv_inflight.
select count(*) from redshift_ccp150;

-- i don't expect the fdw to push the aggregate, this is just a test to see
what query shows up in stv_inflight.
select count(*) from redshift_ccp151;


For those of you that aren't familiar with Redshift, it's a columnar
database that seems to be a fork of postgres 8.cough. You can connect to it
with modern libpq programs (psql, psycopg2, etc).
Redshift has a table, stv_inflight, which serves about the same purpose as
pg_stat_activity. Redshift seems to perform better with very high fetch
sizes (10,000 is a good start), so the default foreign data wrapper didn't
perform so well.

I was able to confirm that the first query showed FETCH 150 FROM c1 as
the query, which is normally highly unhelpful, but in this case it confirms
that tables created in redshift_server do by default use the fetch_size
option given during server creation.

I was also able to confirm that the second query showed FETCH 151 FROM c1
as the query, which shows that per-table overrides also work.

I'd be happy to stop here, but Kyotaro might feel differently. With this
limited patch, one could estimate the number of rows that would fit into
the desired memory block based on the row width and set fetch_size
accordingly.

But we could go further, and have a fetch_max_memory option only at the
table level, and the fdw could do that same memory / estimated_row_size
calculation and derive fetch_size that way *at table creation time*, not
query time.

Thanks to Kyotaro and Bruce Momjian for their help.






On Mon, Feb 2, 2015 at 2:27 AM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:

 Hmm, somehow I removed some recipients, especially the
 list. Sorry for the duplicate.

 -
 Sorry, I've been back. Thank you for the comment.

  Do you have any insight into where I would pass the custom row fetches
 from
  the table struct to the scan struct?

 Yeah it's one simple way to tune it, if the user knows the
 appropriate value.

  Last year I was working on a patch to postgres_fdw where the fetch_size
  could be set at the table level and the server level.
 
  I was able to get the settings parsed and they would show up in
  pg_foreign_table
  and pg_foreign_servers. Unfortunately, I'm not very familiar with how
  foreign data wrappers work, so I wasn't able to figure out how to get
 these
  custom values passed from the PgFdwRelationInfo struct into the
  query's PgFdwScanState
  struct.

 Directly answering: the states that need to be shared among several
 stages are held within fdw_private. Your new variable
 fpinfo->fetch_size can be read in postgresGetForeignPlan, which
 newly creates another fdw_private. You can pass your values to
 the ForeignPlan by making it hold the value there. Finally, you
 will get it in postgresBeginForeignScan and can set it into
 PgFdwScanState.

  I bring this up only because it might be a simpler solution, in that the
  table designer could set the fetch size very high for narrow tables, and
  lower or default for wider tables. It's also a very clean syntax, just
  another option on the table and/or server creation.
 
  My incomplete patch is attached.

 However, the fetch_size is not needed by the planner (so far), so we
 can simply read the options in postgresBeginForeignScan() and set
 them into PgFdwScanState. This runs once per execution.

 Finally, the attached patch will work as you intended.

 What do you think about this?

 regards,

 --
 Kyotaro Horiguchi
 NTT Open Source Software Center


  On Tue, Jan 27, 2015 at 4:24 AM, Kyotaro HORIGUCHI 
  horiguchi.kyot...@lab.ntt.co.jp wrote:
 
   Thank you for the comment.
  
   The automatic way to determine the fetch_size looks like too
   much for the purpose. An example of a non-automatic way is a new
   foreign table option like 'fetch_size', but this exposes the
   internals too much... Which do you think is preferable?
  
   Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane t...@sss.pgh.pa.us wrote
 in 
   24503.1421943...@sss.pgh.pa.us
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
 Hello, as the 

Re: [HACKERS] [POC] FETCH limited by bytes.

2015-02-01 Thread Kyotaro HORIGUCHI
Hmm, somehow I removed some recipients, especially the
list. Sorry for the duplicate.

-
Sorry, I've been back. Thank you for the comment.

 Do you have any insight into where I would pass the custom row fetches from
 the table struct to the scan struct?

Yeah, it's one simple way to tune it, if the user knows the
appropriate value.

 Last year I was working on a patch to postgres_fdw where the fetch_size
 could be set at the table level and the server level.
 
 I was able to get the settings parsed and they would show up in
 pg_foreign_table
 and pg_foreign_servers. Unfortunately, I'm not very familiar with how
 foreign data wrappers work, so I wasn't able to figure out how to get these
 custom values passed from the PgFdwRelationInfo struct into the
 query's PgFdwScanState
 struct.

Directly answering: the states that need to be shared among several
stages are held within fdw_private. Your new variable
fpinfo->fetch_size can be read in postgresGetForeignPlan, which
newly creates another fdw_private. You can pass your values to
the ForeignPlan by making it hold the value there. Finally, you
will get it in postgresBeginForeignScan and can set it into
PgFdwScanState.

 I bring this up only because it might be a simpler solution, in that the
 table designer could set the fetch size very high for narrow tables, and
 lower or default for wider tables. It's also a very clean syntax, just
 another option on the table and/or server creation.
 
 My incomplete patch is attached.

However, the fetch_size is not needed by the planner (so far), so we
can simply read the options in postgresBeginForeignScan() and set
them into PgFdwScanState. This runs once per execution.
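The option precedence this implies (a table-level fetch_size overriding a server-level one, which overrides the built-in default) can be sketched in plain Python, standing in for the real DefElem list walk in C; the names and values here are illustrative:

```python
def resolve_fetch_size(server_options, table_options, default=100):
    """Pick fetch_size: table option beats server option beats default.

    Options are (name, value) pairs, mimicking the catalog's text options.
    """
    fetch_size = default
    for name, value in server_options:
        if name == "fetch_size":
            fetch_size = int(value)
    for name, value in table_options:  # table-level option wins
        if name == "fetch_size":
            fetch_size = int(value)
    return fetch_size

print(resolve_fetch_size([("fetch_size", "150")],
                         [("fetch_size", "151")]))  # -> 151
```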

Finally, the attached patch will work as you intended.

What do you think about this?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


 On Tue, Jan 27, 2015 at 4:24 AM, Kyotaro HORIGUCHI 
 horiguchi.kyot...@lab.ntt.co.jp wrote:
 
  Thank you for the comment.
 
 The automatic way to determine the fetch_size looks like too
 much for the purpose. An example of a non-automatic way is a new
 foreign table option like 'fetch_size', but this exposes the
 internals too much... Which do you think is preferable?
 
  Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane t...@sss.pgh.pa.us wrote in 
  24503.1421943...@sss.pgh.pa.us
   Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
Hello, per the discussion on async fetching in postgres_fdw, FETCH
with a data-size limitation would be useful for the memory usage
stability of postgres_fdw.
  
Could such a feature and syntax be allowed to be added?
  
   This seems like a lot of work, and frankly an incredibly ugly API,
   for a benefit that is entirely hypothetical.  Have you got numbers
   showing any actual performance win for postgres_fdw?
 
 The API is rushed work to make a path for the new parameter
 (but, yes, I did too much for the purpose of use from
 postgres_fdw..) and it could be any saner syntax, but it's not
 the time to do so yet.
 
 The data-size limitation, whatever size the limit is, would give
 a significant gain, especially for small-sized rows.
 
 This patch began from the fact that it runs about twice as fast
 when fetch_size = 10000 rather than 100.
 
 
  http://www.postgresql.org/message-id/20150116.171849.109146500.horiguchi.kyot...@lab.ntt.co.jp
 
  I took exec times to get 1M rows from localhost via postgres_fdw
  and it showed the following numbers.
 
  =# SELECT a from ft1;
  fetch_size   avg row size(*1)   time    alloced_mem/fetch (Mbytes)(*1)
  (local)                         0.75s
  100          60                 6.2s       6000 (0.006)
  10000        60                 2.7s     600000 (0.6)
  30000        60                 2.2s    1800000 (2.0)
  60000        60                 2.4s    3600000 (4.0)

  =# SELECT a, b, c from ft1;
  fetch_size   avg row size(*1)   time    alloced_mem/fetch (Mbytes)(*1)
  (local)                         0.8s
  100          204                12s       20400 (0.02)
  1000         204                10s      204000 (0.2)
  10000        204                5.8s    2040000 (2)
  20000        204                5.9s    4080000 (4)

  =# SELECT a, b, d from ft1;
  fetch_size   avg row size(*1)   time    alloced_mem/fetch (Mbytes)(*1)
  (local)                         0.8s
  100          1356               17s      135600 (0.136)
  1000         1356               15s     1356000 (1.356)
  1475         1356               13s     2000100 (2.0)
  2950         1356               13s     4000200 (4.0)
 
  The definitions of the environment are the following.
 
  CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host
  'localhost', dbname 'postgres');
  CREATE USER MAPPING FOR PUBLIC SERVER sv1;
  CREATE TABLE lt1 (a int, b timestamp, c text, d text);
  CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text) SERVER sv1
  OPTIONS (table_name 'lt1');
  INSERT INTO lt1 

Re: [HACKERS] [POC] FETCH limited by bytes.

2015-01-27 Thread Kyotaro HORIGUCHI
Thank you for the comment.

The automatic way to determine the fetch_size looks like too
much for the purpose. An example of a non-automatic way is a new
foreign table option like 'fetch_size', but this exposes the
internals too much... Which do you think is preferable?

Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane t...@sss.pgh.pa.us wrote in 
24503.1421943...@sss.pgh.pa.us
 Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
  Hello, as the discuttion on async fetching on postgres_fdw, FETCH
  with data-size limitation would be useful to get memory usage
  stability of postgres_fdw.
 
  Could such a feature and syntax be allowed to be added?
 
 This seems like a lot of work, and frankly an incredibly ugly API,
 for a benefit that is entirely hypothetical.  Have you got numbers
 showing any actual performance win for postgres_fdw?

The API is rushed work to make a path for the new parameter
(but, yes, I did too much for the purpose of use from
postgres_fdw..) and it could be any saner syntax, but it's not
the time to do so yet.

The data-size limitation, whatever size the limit is, would give
a significant gain, especially for small-sized rows.

This patch began from the fact that it runs about twice as fast
when fetch_size = 10000 rather than 100.

http://www.postgresql.org/message-id/20150116.171849.109146500.horiguchi.kyot...@lab.ntt.co.jp

I took exec times to get 1M rows from localhost via postgres_fdw
and it showed the following numbers.

=# SELECT a from ft1;
fetch_size   avg row size(*1)   time    alloced_mem/fetch (Mbytes)(*1)
(local)                         0.75s
100          60                 6.2s       6000 (0.006)
10000        60                 2.7s     600000 (0.6)
30000        60                 2.2s    1800000 (2.0)
60000        60                 2.4s    3600000 (4.0)

=# SELECT a, b, c from ft1;
fetch_size   avg row size(*1)   time    alloced_mem/fetch (Mbytes)(*1)
(local)                         0.8s
100          204                12s       20400 (0.02)
1000         204                10s      204000 (0.2)
10000        204                5.8s    2040000 (2)
20000        204                5.9s    4080000 (4)

=# SELECT a, b, d from ft1;
fetch_size   avg row size(*1)   time    alloced_mem/fetch (Mbytes)(*1)
(local)                         0.8s
100          1356               17s      135600 (0.136)
1000         1356               15s     1356000 (1.356)
1475         1356               13s     2000100 (2.0)
2950         1356               13s     4000200 (4.0)

The definitions of the environment are the following.

CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', 
dbname 'postgres');
CREATE USER MAPPING FOR PUBLIC SERVER sv1;
CREATE TABLE lt1 (a int, b timestamp, c text, d text);
CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text) SERVER sv1 
OPTIONS (table_name 'lt1');
INSERT INTO lt1 (SELECT a, now(), repeat('x', 128), repeat('x', 1280) FROM 
generate_series(0, 999999) a);

The avg row size is alloced_mem/fetch_size, and the alloced_mem
is the sum of HeapTuple[fetch_size] and (HEAPTUPLESIZE +
tup->t_len) for all stored tuples on the receiver side, in
fetch_more_data() in postgres_fdw.

They are about a 50% gain for the smaller tuple size and 25% for
the larger. They look to be optimal where alloced_mem is
around 2MB, for reasons unknown to me. Anyway, the difference
seems to be significant.

 Even if we wanted to do something like this, I strongly object to
 measuring size by heap_compute_data_size.  That's not a number that users
 would normally have any direct knowledge of; nor does it have anything
 at all to do with the claimed use-case, where what you'd really need to
 measure is bytes transmitted down the wire.  (The difference is not small:
 for instance, toasted values would likely still be toasted at the point
 where you're measuring.)

Sure. Finally, see the attached patch #1, which does the following
things.

 - Sender limits the number of tuples using the sum of the net
   length of the column values to be sent, not including protocol
   overhead. It is calculated in the added function
   slot_compute_attr_size(), using raw length for compressed
   values.

 - postgres_fdw calculates fetch limit bytes by the following
   formula,

   MAX_FETCH_MEM - MAX_FETCH_SIZE * (estimated overhead per tuple);
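Numerically, that formula works out as below (a toy rendering; the per-tuple overhead figure here is an assumed placeholder, not the patch's actual constant):

```python
MAX_FETCH_MEM = 2 * 1024 * 1024   # 2MiB budget on the receiver side
MAX_FETCH_SIZE = 30000            # upper bound on rows per FETCH
est_overhead_per_tuple = 24       # assumed, illustrative only

# Bytes of net column data the sender may accumulate before cutting
# off the batch, leaving headroom for per-tuple overhead.
fetch_limit_bytes = MAX_FETCH_MEM - MAX_FETCH_SIZE * est_overhead_per_tuple
print(fetch_limit_bytes)  # -> 1377152
```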

The result of the patch is as follows, with MAX_FETCH_MEM = 2MiB and
MAX_FETCH_SIZE = 30000.

fetch_size   avg row size(*1)   time   max alloced_mem/fetch (Mbytes)
(auto)       60                 2.4s   1080000 (1.08)
(auto)       204                7.3s    536400 (0.54)
(auto)       1356               15s     430236 (0.43)

This is meaningfully fast but the patch looks too big and the
meaning of the new parameter is hard to understand..:(


On the other hand, the cause of the variations in alloced_mem
shown above is the per-tuple overhead, the sum of which is unknown
before execution.  

Re: [HACKERS] [POC] FETCH limited by bytes.

2015-01-27 Thread Corey Huinker
Last year I was working on a patch to postgres_fdw where the fetch_size
could be set at the table level and the server level.

I was able to get the settings parsed and they would show up in
pg_foreign_table
and pg_foreign_servers. Unfortunately, I'm not very familiar with how
foreign data wrappers work, so I wasn't able to figure out how to get these
custom values passed from the PgFdwRelationInfo struct into the
query's PgFdwScanState
struct.

I bring this up only because it might be a simpler solution, in that the
table designer could set the fetch size very high for narrow tables, and
lower or default for wider tables. It's also a very clean syntax, just
another option on the table and/or server creation.

My incomplete patch is attached.



On Tue, Jan 27, 2015 at 4:24 AM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:

 Thank you for the comment.

  The automatic way to determine the fetch_size looks like too
  much for the purpose. An example of a non-automatic way is a new
  foreign table option like 'fetch_size', but this exposes the
  internals too much... Which do you think is preferable?

 Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane t...@sss.pgh.pa.us wrote in 
 24503.1421943...@sss.pgh.pa.us
  Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
    Hello, per the discussion on async fetching in postgres_fdw, FETCH
    with a data-size limitation would be useful for the memory usage
    stability of postgres_fdw.
 
    Could such a feature and syntax be allowed to be added?
 
  This seems like a lot of work, and frankly an incredibly ugly API,
  for a benefit that is entirely hypothetical.  Have you got numbers
  showing any actual performance win for postgres_fdw?

 The API is rushed work to make a path for the new parameter
 (but, yes, I did too much for the purpose of use from
 postgres_fdw..) and it could be any saner syntax, but it's not
 the time to do so yet.

 The data-size limitation, whatever size the limit is, would give
 a significant gain, especially for small-sized rows.

 This patch began from the fact that it runs about twice as fast
 when fetch_size = 10000 rather than 100.


 http://www.postgresql.org/message-id/20150116.171849.109146500.horiguchi.kyot...@lab.ntt.co.jp

 I took exec times to get 1M rows from localhost via postgres_fdw
 and it showed the following numbers.

 =# SELECT a from ft1;
 fetch_size   avg row size(*1)   time    alloced_mem/fetch (Mbytes)(*1)
 (local)                         0.75s
 100          60                 6.2s       6000 (0.006)
 10000        60                 2.7s     600000 (0.6)
 30000        60                 2.2s    1800000 (2.0)
 60000        60                 2.4s    3600000 (4.0)

 =# SELECT a, b, c from ft1;
 fetch_size   avg row size(*1)   time    alloced_mem/fetch (Mbytes)(*1)
 (local)                         0.8s
 100          204                12s       20400 (0.02)
 1000         204                10s      204000 (0.2)
 10000        204                5.8s    2040000 (2)
 20000        204                5.9s    4080000 (4)

 =# SELECT a, b, d from ft1;
 fetch_size   avg row size(*1)   time    alloced_mem/fetch (Mbytes)(*1)
 (local)                         0.8s
 100          1356               17s      135600 (0.136)
 1000         1356               15s     1356000 (1.356)
 1475         1356               13s     2000100 (2.0)
 2950         1356               13s     4000200 (4.0)

 The definitions of the environment are the following.

 CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host
 'localhost', dbname 'postgres');
 CREATE USER MAPPING FOR PUBLIC SERVER sv1;
 CREATE TABLE lt1 (a int, b timestamp, c text, d text);
 CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text) SERVER sv1
 OPTIONS (table_name 'lt1');
 INSERT INTO lt1 (SELECT a, now(), repeat('x', 128), repeat('x', 1280) FROM
 generate_series(0, 999999) a);

 The avg row size is alloced_mem/fetch_size, and the alloced_mem
 is the sum of HeapTuple[fetch_size] and (HEAPTUPLESIZE +
 tup->t_len) for all stored tuples on the receiver side, in
 fetch_more_data() in postgres_fdw.

 They are about a 50% gain for the smaller tuple size and 25% for
 the larger. They look to be optimal where alloced_mem is
 around 2MB, for reasons unknown to me. Anyway, the difference
 seems to be significant.

  Even if we wanted to do something like this, I strongly object to
  measuring size by heap_compute_data_size.  That's not a number that users
  would normally have any direct knowledge of; nor does it have anything
  at all to do with the claimed use-case, where what you'd really need to
  measure is bytes transmitted down the wire.  (The difference is not
 small:
  for instance, toasted values would likely still be toasted at the point
  where you're measuring.)

 Sure. Finally, see the attached patch #1, which does the following
 things.

  - Sender limits the number of tuples using the sum of the net
length of 

Re: [HACKERS] [POC] FETCH limited by bytes.

2015-01-22 Thread Tom Lane
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
 Hello, per the discussion on async fetching in postgres_fdw, FETCH
 with a data-size limitation would be useful for the memory usage
 stability of postgres_fdw.

 Could such a feature and syntax be allowed to be added?

This seems like a lot of work, and frankly an incredibly ugly API,
for a benefit that is entirely hypothetical.  Have you got numbers
showing any actual performance win for postgres_fdw?

Even if we wanted to do something like this, I strongly object to
measuring size by heap_compute_data_size.  That's not a number that users
would normally have any direct knowledge of; nor does it have anything
at all to do with the claimed use-case, where what you'd really need to
measure is bytes transmitted down the wire.  (The difference is not small:
for instance, toasted values would likely still be toasted at the point
where you're measuring.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers