Hi Anzai-san, The latest patch is fair enough for me, so let me hand over its reviewing for comitters.
Thanks, 2012/10/1 Nozomi Anzai <an...@sraoss.co.jp>: > Here is 64-bit API for large object version 3 patch. > >> I checked this patch. It looks good, but here are still some points to be >> discussed. >> >> * I have a question. What is the meaning of INT64_IS_BUSTED? >> It seems to me a marker to indicate a platform without 64bit support. >> However, the commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce >> says as follows: >> | Remove all the special-case code for INT64_IS_BUSTED, per decision that >> | we're not going to support that anymore. > > Removed INT64_IS_BUSTED. > > >> * At inv_seek(), it seems to me it checks offset correctness with wrong way, >> as follows: >> | case SEEK_SET: >> | if (offset < 0) >> | elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset); >> | obj_desc->offset = offset; >> | break; >> It is a right assumption, if large object size would be restricted to 2GB. >> But the largest positive int64 is larger than expected limitation. >> So, it seems to me it should be compared with (INT_MAX * PAGE_SIZE) >> instead. > > Fixed. > > >> * At inv_write(), it definitely needs a check to prevent data-write upper >> 4TB. >> In case when obj_desc->offset is a bit below 4TB, an additional 1GB write >> will break head of the large object because of "pageno" overflow. > > Added a such check. > > >> * Please also add checks on inv_read() to prevent LargeObjectDesc->offset >> unexpectedly overflows 4TB boundary. > > Added a such check. > > >> * At inv_truncate(), variable "off" is re-defined to int64. Is it really >> needed >> change? All its usage is to store the result of "len % LOBLKSIZE". > > Fixed and back to int32. > > >> Thanks, >> >> 2012/9/24 Nozomi Anzai <an...@sraoss.co.jp>: >> > Here is 64-bit API for large object version 2 patch. >> > >> >> I checked this patch. It can be applied onto the latest master branch >> >> without any problems. My comments are below. >> >> >> >> 2012/9/11 Tatsuo Ishii <is...@postgresql.org>: >> >> > Ok, here is the patch to implement 64-bit API for large object, to >> >> > allow to use up to 4TB large objects(or 16TB if BLCKSZ changed to >> >> > 32KB). The patch is based on Jeremy Drake's patch posted on September >> >> > 23, 2005 >> >> > (http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php) >> >> > and reasonably updated/edited to adopt PostgreSQL 9.3 by Nozomi Anzai >> >> > for the backend part and Yugo Nagata for the rest(including >> >> > documentation patch). >> >> > >> >> > Here are changes made in the patch: >> >> > >> >> > 1) Frontend lo_* libpq functions(fe-lobj.c)(Yugo Nagata) >> >> > >> >> > lo_initialize() gathers backend 64-bit large object handling >> >> > function's oid, namely lo_lseek64, lo_tell64, lo_truncate64. >> >> > >> >> > If client calls lo_*64 functions and backend does not support them, >> >> > lo_*64 functions return error to caller. There might be an argument >> >> > since calls to lo_*64 functions can automatically be redirected to >> >> > 32-bit older API. I don't know this is worth the trouble though. >> >> > >> >> I think it should definitely return an error code when user tries to >> >> use lo_*64 functions towards the backend v9.2 or older, because >> >> fallback to 32bit API can raise unexpected errors if application >> >> intends to seek the area over than 2GB. >> >> >> >> > Currently lo_initialize() throws an error if one of oids are not >> >> > available. I doubt we do the same way for 64-bit functions since this >> >> > will make 9.3 libpq unable to access large objects stored in pre-9.2 >> >> > PostgreSQL servers. >> >> > >> >> It seems to me the situation to split the case of pre-9.2 and post-9.3 >> >> using a condition of "conn->sversion >= 90300". >> > >> > Fixed so, and tested it by deleteing the lo_tell64's row from pg_proc. >> > >> > >> >> > To pass 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr >> >> > is a pointer to 64-bit integer and actual data is placed somewhere >> >> > else. There might be other way: add new member to union u to store >> >> > 64-bit integer: >> >> > >> >> > typedef struct >> >> > { >> >> > int len; >> >> > int isint; >> >> > union >> >> > { >> >> > int *ptr; /* can't use >> >> > void (dec compiler barfs) */ >> >> > int integer; >> >> > int64 bigint; /* 64-bit integer */ >> >> > } u; >> >> > } PQArgBlock; >> >> > >> >> > I'm a little bit worried about this way because PQArgBlock is a public >> >> > interface. >> >> > >> >> I'm inclined to add a new field for the union; that seems to me straight >> >> forward approach. >> >> For example, the manner in lo_seek64() seems to me confusable. >> >> It set 1 on "isint" field even though pointer is delivered actually. >> >> >> >> + argv[1].isint = 1; >> >> + argv[1].len = 8; >> >> + argv[1].u.ptr = (int *) &len; >> > >> > Your proposal was not adopted per discussion. >> > >> > >> >> > Also we add new type "pg_int64": >> >> > >> >> > #ifndef NO_PG_INT64 >> >> > #define HAVE_PG_INT64 1 >> >> > typedef long long int pg_int64; >> >> > #endif >> >> > >> >> > in postgres_ext.h per suggestion from Tom Lane: >> >> > http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php >> >> > >> >> I'm uncertain about context of this discussion. >> >> >> >> Does it make matter if we include <stdint.h> and use int64_t instead >> >> of the self defined data type? >> > >> > Your proposal was not adopted per discussion. >> > Per discussion, endiannness translation was moved to fe-lobj.c. >> > >> > >> >> > 2) Backend lo_* functions (be-fsstubs.c)(Nozomi Anzai) >> >> > >> >> > Add lo_lseek64, lo_tell64, lo_truncate64 so that they can handle >> >> > 64-bit seek position and data length. loread64 and lowrite64 are not >> >> > added because if a program tries to read/write more than 2GB at once, >> >> > it would be a sign that the program need to be re-designed anyway. >> >> > >> >> I think it is a reasonable. >> >> >> >> > 3) Backend inv_api.c functions(Nozomi Anzai) >> >> > >> >> > No need to add new functions. Just extend them to handle 64-bit data. >> >> > >> >> > BTW , what will happen if older 32-bit libpq accesses large objects >> >> > over 2GB? >> >> > >> >> > lo_read and lo_write: they can read or write lobjs using 32-bit API as >> >> > long as requested read/write data length is smaller than 2GB. So I >> >> > think we can safely allow them to access over 2GB lobjs. >> >> > >> >> > lo_lseek: again as long as requested offset is smaller than 2GB, there >> >> > would be no problem. >> >> > >> >> > lo_tell:if current seek position is beyond 2GB, returns an error. >> >> > >> >> Even though iteration of lo_lseek() may move the offset to 4TB, it also >> >> makes unavailable to use lo_tell() to obtain the current offset, so I >> >> think >> >> it is reasonable behavior. >> >> >> >> However, error code is not an appropriate one. >> >> >> >> + if (INT_MAX < offset) >> >> + { >> >> + ereport(ERROR, >> >> + (errcode(ERRCODE_UNDEFINED_OBJECT), >> >> + errmsg("invalid large-object >> >> descriptor: %d", fd))); >> >> + PG_RETURN_INT32(-1); >> >> + } >> >> >> >> According to the manpage of lseek(2) >> >> EOVERFLOW >> >> The resulting file offset cannot be represented in an off_t. >> >> >> >> Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW. >> > >> > Changed the error code and error message. We added a new error code >> > "ERRCODE_UNDEFINED_OBJECT (22P07)". >> > >> > >> >> > 4) src/test/examples/testlo64.c added for 64-bit API example(Yugo >> >> > Nagata) >> >> > >> >> > Comments and suggestions are welcome. >> >> > >> >> miscellaneous comments are below. >> >> >> >> Regression test is helpful. Even though no need to try to create 4TB large >> >> object, it is helpful to write some chunks around the design boundary. >> >> Could you add some test cases that writes some chunks around 4TB offset. >> > >> > Added 64-bit lobj test items into regression test and confirmed it worked >> > rightly. >> > >> > >> >> Thanks, >> >> -- >> >> KaiGai Kohei <kai...@kaigai.gr.jp> >> >> >> >> >> >> -- >> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> >> To make changes to your subscription: >> >> http://www.postgresql.org/mailpref/pgsql-hackers >> > >> > >> > -- >> > Nozomi Anzai >> > SRA OSS, Inc. Japan >> > >> > >> > -- >> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> > To make changes to your subscription: >> > http://www.postgresql.org/mailpref/pgsql-hackers >> > >> >> >> >> -- >> KaiGai Kohei <kai...@kaigai.gr.jp> >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers > > > -- > Nozomi Anzai > SRA OSS, Inc. Japan > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- KaiGai Kohei <kai...@kaigai.gr.jp> -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers