Re: [GENERAL] Retrieving query results
On Sun, Aug 27, 2017 at 12:12 AM, Tom Lane wrote:
> Michael Paquier writes:
>> On Fri, Aug 25, 2017 at 8:10 AM, Tom Lane wrote:
>>> I think the real problem occurs where we realloc the array bigger.
>
>> Looking at the surroundings, I think that it would be nice to have
>> pqAddTuple and PQsetvalue set an error message with this patch.
>
> Yeah, I was thinking about that myself - the existing design presumes
> that the only possible reason for failure of pqAddTuple is OOM, but
> it'd be better to distinguish "too many tuples" from true OOM.  So
> we should also refactor to make pqAddTuple responsible for setting
> the error message.  Might be best to do the refactoring first.

Attached are two patches:
1) 0001 refactors the code around pqAddTuple to be able to handle error
messages, and to assign them in PQsetvalue in particular.
2) 0002 adds sanity checks in pqAddTuple for overflows, capping the size
of what is allocated at INT_MAX but not more.

pqRowProcessor() still has errmsgp, but it is never used on HEAD. At
least with this set of patches it becomes useful. We could rework
check_field_number() to use an error message string as well, but I have
left that out to keep things simple. I am not sure the extra
complication would be worth it compared to just copying the error
message in the case of a mismatched column number.

Attached as well is a small program I have used to test PQsetvalue
through PQcopyResult, to check that results get correctly allocated at
each call, looking at the error message stacks on the way.
--
Michael

/*
 * Script to test PQcopyResult and subsequently PQsetvalue.
 * Compile with for example:
 * gcc -lpq -g -o pg_copy_res pg_copy_res.c
 */
#include <stdio.h>
#include <stdlib.h>
#include "libpq-fe.h"

#define DEFAULT_PORT "5432"
#define DEFAULT_HOST "/tmp"
#define DEFAULT_DB   "postgres"

int
main()
{
	char	   *port = getenv("PGPORT");
	char	   *host = getenv("PGHOST");
	char	   *dbname = getenv("PGDATABASE");
	char		connstr[512];
	PGconn	   *conn;
	PGresult   *res,
			   *res_copy;

	if (port == NULL)
		port = DEFAULT_PORT;
	if (host == NULL)
		host = DEFAULT_HOST;
	if (dbname == NULL)
		dbname = DEFAULT_DB;

	snprintf(connstr, sizeof(connstr),
			 "port=%s host=%s dbname=%s", port, host, dbname);
	conn = PQconnectdb(connstr);
	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "Connection to database failed: %s",
				PQerrorMessage(conn));
		return 1;
	}

	res = PQexec(conn, "SELECT 1");
	/* Copy the result wanted, who cares what it is... */
	res_copy = PQcopyResult(res, PG_COPYRES_TUPLES | PG_COPYRES_ATTRS);
	PQclear(res);
	PQclear(res_copy);
	PQfinish(conn);
	return 0;
}

0001-Refactor-error-message-handling-in-pqAddTuple.patch
Description: Binary data

0002-Improve-overflow-checks-of-pqAddTuple-in-libpq.patch
Description: Binary data

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Retrieving query results
Michael Paquier writes:
> On Fri, Aug 25, 2017 at 8:10 AM, Tom Lane wrote:
>> I think the real problem occurs where we realloc the array bigger.

> Looking at the surroundings, I think that it would be nice to have
> pqAddTuple and PQsetvalue set an error message with this patch.

Yeah, I was thinking about that myself - the existing design presumes
that the only possible reason for failure of pqAddTuple is OOM, but
it'd be better to distinguish "too many tuples" from true OOM.  So
we should also refactor to make pqAddTuple responsible for setting
the error message.  Might be best to do the refactoring first.

			regards, tom lane
Re: [GENERAL] Retrieving query results
On Fri, Aug 25, 2017 at 8:10 AM, Tom Lane wrote:
> I think the real problem occurs where we realloc the array bigger.
> tupArrSize needs to be kept to no more than INT_MAX --- and, ideally,
> it should reach that value rather than dying on the iteration after
> it reaches 2^30 (so that we support resultsets as large as we possibly
> can).  Without a range-check, it's not very clear what realloc will think
> it's being asked for.  Also, on 32-bit machines, we could overflow size_t
> before tupArrSize even gets that big, so a test against
> SIZE_MAX/sizeof(pointer) may be needed as well.
>
> As long as we constrain tupArrSize to be within bounds, we don't
> have to worry about overflow of ntups per se.

I just poked more seriously at this code, and we could use something
like that:

@@ -868,6 +868,16 @@ pqAddTuple(PGresult *res, PGresAttValue *tup)
 		int			newSize = (res->tupArrSize > 0) ?
 					res->tupArrSize * 2 : 128;
 		PGresAttValue **newTuples;
+
+		if (res->tupArrSize == INT_MAX)
+			return FALSE;
+		if (newSize == INT_MIN)
+			newSize = INT_MAX;
+		if (newSize > SIZE_MAX / sizeof(PGresAttValue *))
+			return FALSE;

Looking at the surroundings, I think that it would be nice to have
pqAddTuple and PQsetvalue set an error message with this patch. Those
currently report properly only on OOM, but if we add more types of
errors, proper error messages would be nice for users.
--
Michael
Re: [GENERAL] Retrieving query results
Hi,

On Thu, Aug 24, 2017 at 7:18 PM, Tom Lane wrote:
> Igor Korot writes:
>> So there is no way to retrieve an arbitrary number of rows from the query?
>> That sucks...
>
> The restriction is on the number of rows in one PGresult, not the total
> size of the query result.  You could use single-row mode, or use a cursor
> and fetch some reasonable number of rows at a time.  If you try to inhale
> all of a many-gigarow result at once, you're going to have OOM problems
> anyway, even if you had the patience to wait for it.  So I don't think the
> existence of a limit is a problem.  Failure to check it *is* a problem,
> certainly.

Is there a sample of using single-row mode? How do I turn it on and off?

Is there a cursor example that uses prepared statements? The one in the
documentation doesn't use them - it uses PQexec().

Thank you.

>
>			regards, tom lane
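For reference, single-row mode is enabled per query by calling PQsetSingleRowMode() right after sending the query and before the first PQgetResult(); it switches itself off once the query's results are consumed. A minimal sketch, assuming an already-open connection (the generate_series query, parameter value, and error handling are illustrative only; PQsendQueryParams is used so the query is parameterized, close to the prepared-statement case asked about):

```c
#include <stdio.h>
#include "libpq-fe.h"

/* Sketch only: stream a large result one row at a time so that no
 * single PGresult has to hold the whole result set. */
static int
fetch_single_rows(PGconn *conn)
{
	PGresult   *res;
	const char *paramValues[1] = {"1000000"};	/* made-up row count */

	if (!PQsendQueryParams(conn,
						   "SELECT generate_series(1, $1::int)",
						   1, NULL, paramValues, NULL, NULL, 0))
		return -1;
	/* Must be called before the first PQgetResult() for this query */
	if (!PQsetSingleRowMode(conn))
		return -1;

	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
		{
			/* exactly one row in this PGresult */
			printf("%s\n", PQgetvalue(res, 0, 0));
		}
		/* the final, zero-row PGresult has status PGRES_TUPLES_OK */
		PQclear(res);
	}
	return 0;
}
```

There is no explicit "turn it off" call: the mode applies only to the currently executing query, so the next query starts in ordinary mode again.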
Re: [GENERAL] Retrieving query results
Igor Korot writes:
> So there is no way to retrieve an arbitrary number of rows from the query?
> That sucks...

The restriction is on the number of rows in one PGresult, not the total
size of the query result.  You could use single-row mode, or use a cursor
and fetch some reasonable number of rows at a time.  If you try to inhale
all of a many-gigarow result at once, you're going to have OOM problems
anyway, even if you had the patience to wait for it.  So I don't think the
existence of a limit is a problem.  Failure to check it *is* a problem,
certainly.

			regards, tom lane
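The cursor approach mentioned above could look like the following sketch: declare a cursor inside a transaction and FETCH a bounded batch at a time, so no single PGresult ever holds more rows than the batch size. The table name "big_table", the batch size of 10000, and the lack of error checking are all illustrative assumptions:

```c
#include <stdio.h>
#include "libpq-fe.h"

/* Sketch only: stream a large table in batches through a cursor.
 * Real code should check PQresultStatus() after every call. */
static void
fetch_in_batches(PGconn *conn)
{
	PGresult   *res;
	int			ntups;

	/* cursors only live inside a transaction block */
	PQclear(PQexec(conn, "BEGIN"));
	PQclear(PQexec(conn,
				   "DECLARE big_cur CURSOR FOR SELECT id FROM big_table"));

	for (;;)
	{
		res = PQexec(conn, "FETCH 10000 FROM big_cur");
		ntups = PQntuples(res);	/* at most 10000, so no int overflow */
		for (int i = 0; i < ntups; i++)
			printf("%s\n", PQgetvalue(res, i, 0));
		PQclear(res);
		if (ntups == 0)			/* an empty FETCH means the cursor is done */
			break;
	}

	PQclear(PQexec(conn, "CLOSE big_cur"));
	PQclear(PQexec(conn, "COMMIT"));
}
```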
Re: [GENERAL] Retrieving query results
Michael Paquier writes:
> On Thu, Aug 24, 2017 at 11:56 PM, Tom Lane wrote:
>> I haven't tried it, but it sure looks like it would, if you don't hit
>> OOM first.  pqAddTuple() isn't doing anything to guard against integer
>> overflow.  The lack of reports implies that no one has ever tried to
>> retrieve even 1G rows, let alone more ...

> Yeah, looking at the code we would just need to check if ntups gets
> negative (well, equal to INT_MIN) after being incremented.

I think the real problem occurs where we realloc the array bigger.
tupArrSize needs to be kept to no more than INT_MAX --- and, ideally,
it should reach that value rather than dying on the iteration after
it reaches 2^30 (so that we support resultsets as large as we possibly
can).  Without a range-check, it's not very clear what realloc will think
it's being asked for.  Also, on 32-bit machines, we could overflow size_t
before tupArrSize even gets that big, so a test against
SIZE_MAX/sizeof(pointer) may be needed as well.

As long as we constrain tupArrSize to be within bounds, we don't
have to worry about overflow of ntups per se.

			regards, tom lane
Re: [GENERAL] Retrieving query results
Michael et al,

On Thu, Aug 24, 2017 at 6:57 PM, Michael Paquier wrote:
> On Thu, Aug 24, 2017 at 11:56 PM, Tom Lane wrote:
>> I haven't tried it, but it sure looks like it would, if you don't hit
>> OOM first.  pqAddTuple() isn't doing anything to guard against integer
>> overflow.  The lack of reports implies that no one has ever tried to
>> retrieve even 1G rows, let alone more ...
>
> Yeah, looking at the code we would just need to check if ntups gets
> negative (well, equal to INT_MIN) after being incremented.

So there is no way to retrieve an arbitrary number of rows from the query?
That sucks...

Thank you.

> --
> Michael
Re: [GENERAL] Retrieving query results
On Thu, Aug 24, 2017 at 11:56 PM, Tom Lane wrote:
> I haven't tried it, but it sure looks like it would, if you don't hit
> OOM first.  pqAddTuple() isn't doing anything to guard against integer
> overflow.  The lack of reports implies that no one has ever tried to
> retrieve even 1G rows, let alone more ...

Yeah, looking at the code we would just need to check if ntups gets
negative (well, equal to INT_MIN) after being incremented.
--
Michael
Re: [GENERAL] Retrieving query results
Igor Korot writes:
> On Thu, Aug 24, 2017 at 8:51 AM, Tom Lane wrote:
>> I think what we need is to (1) introduce some error checking in libpq so
>> that it reports an error if the resultset exceeds 2G rows --- right now
>> it'll just crash, I fear, and (2) change the documentation so that this
>> is explained as a library-wide limitation and not just a problem with
>> PQntuples.

> Does this mean that querying a table with a big number of rows will
> crash the psql?

I haven't tried it, but it sure looks like it would, if you don't hit
OOM first.  pqAddTuple() isn't doing anything to guard against integer
overflow.  The lack of reports implies that no one has ever tried to
retrieve even 1G rows, let alone more ...

			regards, tom lane
Re: [GENERAL] Retrieving query results
Hi, guys,

On Thu, Aug 24, 2017 at 8:51 AM, Tom Lane wrote:
> Michael Paquier writes:
>> On Wed, Aug 23, 2017 at 3:19 AM, Igor Korot wrote:
>>> [quote]
>>> PQntuples
>>>
>>> Returns the number of rows (tuples) in the query result. Because it
>>> returns an integer result, large result sets might overflow the return
>>> value on 32-bit operating systems.
>>>
>>> int PQntuples(const PGresult *res);
>>> [/quote]
>>>
>>> Is there another way to not to overflow the result?
>
>> Not really with the existing API.
>
> Actually, that documentation note is pretty beside-the-point, if not
> outright wrong.  The real issue here is that libpq's internal row counter
> is also a plain int.  As are the rownumber arguments to PQgetvalue and so
> on.  While we could widen that internal counter, it's useless to do so
> as long as these API choices prevent applications from dealing with
> resultsets of more than 2G rows.
>
> I think what we need is to (1) introduce some error checking in libpq so
> that it reports an error if the resultset exceeds 2G rows --- right now
> it'll just crash, I fear, and (2) change the documentation so that this
> is explained as a library-wide limitation and not just a problem with
> PQntuples.

Does this mean that querying a table with a big number of rows will
crash the psql?

Thank you.

>
>			regards, tom lane
Re: [GENERAL] Retrieving query results
Michael Paquier writes:
> On Wed, Aug 23, 2017 at 3:19 AM, Igor Korot wrote:
>> [quote]
>> PQntuples
>>
>> Returns the number of rows (tuples) in the query result. Because it
>> returns an integer result, large result sets might overflow the return
>> value on 32-bit operating systems.
>>
>> int PQntuples(const PGresult *res);
>> [/quote]
>>
>> Is there another way to not to overflow the result?

> Not really with the existing API.

Actually, that documentation note is pretty beside-the-point, if not
outright wrong.  The real issue here is that libpq's internal row counter
is also a plain int.  As are the rownumber arguments to PQgetvalue and so
on.  While we could widen that internal counter, it's useless to do so
as long as these API choices prevent applications from dealing with
resultsets of more than 2G rows.

I think what we need is to (1) introduce some error checking in libpq so
that it reports an error if the resultset exceeds 2G rows --- right now
it'll just crash, I fear, and (2) change the documentation so that this
is explained as a library-wide limitation and not just a problem with
PQntuples.

			regards, tom lane
Re: [GENERAL] Retrieving query results
Hi, Michael,

On Tue, Aug 22, 2017 at 8:32 PM, Michael Paquier wrote:
> On Wed, Aug 23, 2017 at 3:19 AM, Igor Korot wrote:
>> [quote]
>> PQntuples
>>
>> Returns the number of rows (tuples) in the query result. Because it
>> returns an integer result, large result sets might overflow the return
>> value on 32-bit operating systems.
>>
>> int PQntuples(const PGresult *res);
>> [/quote]
>>
>> Is there another way to not to overflow the result?
>
> Not really with the existing API.

What do you mean "not really"?

> Note that getting at 2 billion rows
> is really a lot, and would cause performance issues on the application
> side because a bunch of data would need to be processed, and getting
> out this much data is not friendly network-wise anyway.

That's OK, as long as my program works with an arbitrary number of rows.

Thank you.

> --
> Michael
Re: [GENERAL] Retrieving query results
On Wed, Aug 23, 2017 at 3:19 AM, Igor Korot wrote:
> [quote]
> PQntuples
>
> Returns the number of rows (tuples) in the query result. Because it
> returns an integer result, large result sets might overflow the return
> value on 32-bit operating systems.
>
> int PQntuples(const PGresult *res);
> [/quote]
>
> Is there another way to not to overflow the result?

Not really with the existing API. Note that getting at 2 billion rows
is really a lot, and would cause performance issues on the application
side because a bunch of data would need to be processed, and getting
out this much data is not friendly network-wise anyway.
--
Michael
[GENERAL] Retrieving query results
Hi, ALL,

[quote]
PQntuples

Returns the number of rows (tuples) in the query result. Because it
returns an integer result, large result sets might overflow the return
value on 32-bit operating systems.

int PQntuples(const PGresult *res);
[/quote]

Is there another way to not to overflow the result?

Thank you.