Re: [GENERAL] Retrieving query results

2017-08-28 Thread Michael Paquier
On Sun, Aug 27, 2017 at 12:12 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Fri, Aug 25, 2017 at 8:10 AM, Tom Lane  wrote:
>>> I think the real problem occurs where we realloc the array bigger.
>
>> Looking at the surroundings, I think that it would be nice to have
>> pqAddTuple and PQsetvalue set an error message with this patch.
>
> Yeah, I was thinking about that myself - the existing design presumes
> that the only possible reason for failure of pqAddTuple is OOM, but
> it'd be better to distinguish "too many tuples" from true OOM.  So
> we should also refactor to make pqAddTuple responsible for setting
> the error message.  Might be best to do the refactoring first.

Attached are two patches:
1) 0001 refactors the code around pqAddTuple to be able to handle
error messages and assign them in PQsetvalue particularly.
2) 0002 adds sanity checks in pqAddTuple for overflows, maximizing the
size of what is allocated to INT_MAX but now more.

pqRowProcessor() still has errmsgp, but it is never used on HEAD. At
least with this set of patches it comes to be useful. We could rework
check_field_number() to use as well an error message string, but I
have left that out to keep things simple. Not sure if any complication
is worth compared to just copying the error message in case of an
unmatching column number.

Attached is as well a small program I have used to test PQsetvalue
through PQcopyResult to see if results get correctly allocated at each
call, looking at the error message stacks on the way.
-- 
Michael
/*
 * Script to test PQcopyResult and subsequently PQsetvalue.
 * Compile with for example:
 * gcc -lpq -g -o pg_copy_res pg_copy_res.c
 */

#include 
#include 
#include "libpq-fe.h"

#define DEFAULT_PORT	"5432"
#define DEFAULT_HOST	"/tmp"
#define DEFAULT_DB		"postgres"

int
main()
{
	char *port = getenv("PGPORT");
	char *host = getenv("PGHOST");
	char *dbname = getenv("PGDATABASE");
	char connstr[512];
	PGconn *conn;
	PGresult *res, *res_copy;

	if (port == NULL)
		port = DEFAULT_PORT;
	if (host == NULL)
		host = DEFAULT_HOST;
	if (dbname == NULL)
		dbname = DEFAULT_DB;

	snprintf(connstr, sizeof(connstr), "port=%s host=%s dbname=%s",
			 port, host, dbname);

	conn = PQconnectdb(connstr);

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "Connection to database failed: %s",
PQerrorMessage(conn));
		return 1;
	}

	res = PQexec(conn, "SELECT 1");

	/* Copy the resuld wanted, who care what that is... */
	res_copy = PQcopyResult(res, PG_COPYRES_TUPLES | PG_COPYRES_ATTRS);

	PQclear(res);
	PQclear(res_copy);

	PQfinish(conn);
	return 0;
}


0001-Refactor-error-message-handling-in-pqAddTuple.patch
Description: Binary data


0002-Improve-overflow-checks-of-pqAddTuple-in-libpq.patch
Description: Binary data

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Retrieving query results

2017-08-26 Thread Tom Lane
Michael Paquier  writes:
> On Fri, Aug 25, 2017 at 8:10 AM, Tom Lane  wrote:
>> I think the real problem occurs where we realloc the array bigger.

> Looking at the surroundings, I think that it would be nice to have
> pqAddTuple and PQsetvalue set an error message with this patch.

Yeah, I was thinking about that myself - the existing design presumes
that the only possible reason for failure of pqAddTuple is OOM, but
it'd be better to distinguish "too many tuples" from true OOM.  So
we should also refactor to make pqAddTuple responsible for setting
the error message.  Might be best to do the refactoring first.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Retrieving query results

2017-08-26 Thread Michael Paquier
On Fri, Aug 25, 2017 at 8:10 AM, Tom Lane  wrote:
> I think the real problem occurs where we realloc the array bigger.
> tupArrSize needs to be kept to no more than INT_MAX --- and, ideally,
> it should reach that value rather than dying on the iteration after
> it reaches 2^30 (so that we support resultsets as large as we possibly
> can).  Without a range-check, it's not very clear what realloc will think
> it's being asked for.  Also, on 32-bit machines, we could overflow size_t
> before tupArrSize even gets that big, so a test against
> SIZE_MAX/sizeof(pointer) may be needed as well.
>
> As long as we constrain tupArrSize to be within bounds, we don't
> have to worry about overflow of ntups per se.

I just poked more seriously at this code, and we could use something like that:
@@ -868,6 +868,16 @@ pqAddTuple(PGresult *res, PGresAttValue *tup)
int newSize = (res->tupArrSize > 0) ? res->tupArrSize * 2 : 128;
PGresAttValue **newTuples;

+   if (res->tupArrSize == INT_MAX)
+   return FALSE;
+   if (new_size == INT_MIN)
+   new_size = INT_MAX;
+   if (newSize > SIZE_MAX / sizeof(PGresAttValue *))
+   return FALSE;

Looking at the surroundings, I think that it would be nice to have
pqAddTuple and PQsetvalue set an error message with this patch. The
user can see now that those would only properly report on OOM, but if
we add more types of errors proper error messages would be nice for
users.
-- 
Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Retrieving query results

2017-08-24 Thread Igor Korot
Hi,

On Thu, Aug 24, 2017 at 7:18 PM, Tom Lane  wrote:
> Igor Korot  writes:
>> So there is no way to retrieve an arbitrary number of rows from the query?
>> That sucks...
>
> The restriction is on the number of rows in one PGresult, not the total
> size of the query result.  You could use single-row mode, or use a cursor
> and fetch some reasonable number of rows at a time.  If you try to inhale
> all of a many-gigarow result at once, you're going to have OOM problems
> anyway, even if you had the patience to wait for it.  So I don't think the
> existence of a limit is a problem.  Failure to check it *is* a problem,
> certainly.

Is there a sample of using single-row mode?
How to turn it on and turn it off?

Is there a cursor example with the Prepared Statements?
The one in the documentation doesn't use them - it uses PQexec().

Thank you.

>
> regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Retrieving query results

2017-08-24 Thread Tom Lane
Igor Korot  writes:
> So there is no way to retrieve an arbitrary number of rows from the query?
> That sucks...

The restriction is on the number of rows in one PGresult, not the total
size of the query result.  You could use single-row mode, or use a cursor
and fetch some reasonable number of rows at a time.  If you try to inhale
all of a many-gigarow result at once, you're going to have OOM problems
anyway, even if you had the patience to wait for it.  So I don't think the
existence of a limit is a problem.  Failure to check it *is* a problem,
certainly.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Retrieving query results

2017-08-24 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Aug 24, 2017 at 11:56 PM, Tom Lane  wrote:
>> I haven't tried it, but it sure looks like it would, if you don't hit
>> OOM first.  pqAddTuple() isn't doing anything to guard against integer
>> overflow.  The lack of reports implies that no one has ever tried to
>> retrieve even 1G rows, let alone more ...

> Yeah, looking at the code we would just need to check if ntups gets
> negative (well, equal to INT_MIN) after being incremented.

I think the real problem occurs where we realloc the array bigger.
tupArrSize needs to be kept to no more than INT_MAX --- and, ideally,
it should reach that value rather than dying on the iteration after
it reaches 2^30 (so that we support resultsets as large as we possibly
can).  Without a range-check, it's not very clear what realloc will think
it's being asked for.  Also, on 32-bit machines, we could overflow size_t
before tupArrSize even gets that big, so a test against
SIZE_MAX/sizeof(pointer) may be needed as well.

As long as we constrain tupArrSize to be within bounds, we don't
have to worry about overflow of ntups per se.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Retrieving query results

2017-08-24 Thread Igor Korot
Michael et al,

On Thu, Aug 24, 2017 at 6:57 PM, Michael Paquier
 wrote:
> On Thu, Aug 24, 2017 at 11:56 PM, Tom Lane  wrote:
>> I haven't tried it, but it sure looks like it would, if you don't hit
>> OOM first.  pqAddTuple() isn't doing anything to guard against integer
>> overflow.  The lack of reports implies that no one has ever tried to
>> retrieve even 1G rows, let alone more ...
>
> Yeah, looking at the code we would just need to check if ntups gets
> negative (well, equal to INT_MIN) after being incremented.

So there is no way to retrieve an arbitrary number of rows from the query?
That sucks...

Thank you.

> --
> Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Retrieving query results

2017-08-24 Thread Michael Paquier
On Thu, Aug 24, 2017 at 11:56 PM, Tom Lane  wrote:
> I haven't tried it, but it sure looks like it would, if you don't hit
> OOM first.  pqAddTuple() isn't doing anything to guard against integer
> overflow.  The lack of reports implies that no one has ever tried to
> retrieve even 1G rows, let alone more ...

Yeah, looking at the code we would just need to check if ntups gets
negative (well, equal to INT_MIN) after being incremented.
-- 
Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Retrieving query results

2017-08-24 Thread Tom Lane
Igor Korot  writes:
> On Thu, Aug 24, 2017 at 8:51 AM, Tom Lane  wrote:
>> I think what we need is to (1) introduce some error checking in libpq so
>> that it reports an error if the resultset exceeds 2G rows --- right now
>> it'll just crash, I fear, and (2) change the documentation so that this
>> is explained as a library-wide limitation and not just a problem with
>> PQntuples.

> Does this mean that querying a table with a big number of rows will
> crash the psql?

I haven't tried it, but it sure looks like it would, if you don't hit
OOM first.  pqAddTuple() isn't doing anything to guard against integer
overflow.  The lack of reports implies that no one has ever tried to
retrieve even 1G rows, let alone more ...

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Retrieving query results

2017-08-24 Thread Igor Korot
Hi, guys,


On Thu, Aug 24, 2017 at 8:51 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Wed, Aug 23, 2017 at 3:19 AM, Igor Korot  wrote:
>>> [quote]
>>> PQntuples
>>>
>>> Returns the number of rows (tuples) in the query result. Because it
>>> returns an integer result, large result sets might overflow the return
>>> value on 32-bit operating systems.
>>>
>>> int PQntuples(const PGresult *res);
>>> [/quote]
>>>
>>> Is there another way to not to overflow the result?
>
>> Not really with the existing API.
>
> Actually, that documentation note is pretty beside-the-point, if not
> outright wrong.  The real issue here is that libpq's internal row counter
> is also a plain int.  As are the rownumber arguments to PQgetvalue and so
> on.  While we could widen that internal counter, it's useless to do so
> as long as these API choices prevent applications from dealing with
> resultsets of more than 2G rows.
>
> I think what we need is to (1) introduce some error checking in libpq so
> that it reports an error if the resultset exceeds 2G rows --- right now
> it'll just crash, I fear, and (2) change the documentation so that this
> is explained as a library-wide limitation and not just a problem with
> PQntuples.

Does this mean that querying a table with a big number of rows will
crash the psql?

Thank you.

>
> regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Retrieving query results

2017-08-24 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Aug 23, 2017 at 3:19 AM, Igor Korot  wrote:
>> [quote]
>> PQntuples
>> 
>> Returns the number of rows (tuples) in the query result. Because it
>> returns an integer result, large result sets might overflow the return
>> value on 32-bit operating systems.
>> 
>> int PQntuples(const PGresult *res);
>> [/quote]
>> 
>> Is there another way to not to overflow the result?

> Not really with the existing API.

Actually, that documentation note is pretty beside-the-point, if not
outright wrong.  The real issue here is that libpq's internal row counter
is also a plain int.  As are the rownumber arguments to PQgetvalue and so
on.  While we could widen that internal counter, it's useless to do so
as long as these API choices prevent applications from dealing with
resultsets of more than 2G rows.

I think what we need is to (1) introduce some error checking in libpq so
that it reports an error if the resultset exceeds 2G rows --- right now
it'll just crash, I fear, and (2) change the documentation so that this
is explained as a library-wide limitation and not just a problem with
PQntuples.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Retrieving query results

2017-08-22 Thread Igor Korot
Hi, Michael,

On Tue, Aug 22, 2017 at 8:32 PM, Michael Paquier
 wrote:
> On Wed, Aug 23, 2017 at 3:19 AM, Igor Korot  wrote:
>> [quote]
>> PQntuples
>>
>> Returns the number of rows (tuples) in the query result. Because it
>> returns an integer result, large result sets might overflow the return
>> value on 32-bit operating systems.
>>
>> int PQntuples(const PGresult *res);
>>  [/quote]
>>
>> Is there another way to not to overflow the result?
>
> Not really with the existing API.


What do you mean "not really"

> Note that getting at 2 billion rows
> is really a lot, and would cause performance issues on the application
> side because a bunch of data would need to be processed, and getting
> out this much data is not network-wise anyway.

That's OK, As long as my program works with arbitrary number of rows.

Thank you.

> --
> Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Retrieving query results

2017-08-22 Thread Michael Paquier
On Wed, Aug 23, 2017 at 3:19 AM, Igor Korot  wrote:
> [quote]
> PQntuples
>
> Returns the number of rows (tuples) in the query result. Because it
> returns an integer result, large result sets might overflow the return
> value on 32-bit operating systems.
>
> int PQntuples(const PGresult *res);
>  [/quote]
>
> Is there another way to not to overflow the result?

Not really with the existing API. Note that getting at 2 billion rows
is really a lot, and would cause performance issues on the application
side because a bunch of data would need to be processed, and getting
out this much data is not network-wise anyway.
-- 
Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Retrieving query results

2017-08-22 Thread Igor Korot
 Hi, ALL,
[quote]
PQntuples

Returns the number of rows (tuples) in the query result. Because it
returns an integer result, large result sets might overflow the return
value on 32-bit operating systems.

int PQntuples(const PGresult *res);
 [/quote]

Is there another way to not to overflow the result?

Thank you.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general