Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
Kevin Grittner kevin.gritt...@wicourts.gov writes: Leonardo F wrote: New version of the patch, let me know if I can fix/change something else. All issues addressed, with one tiny nit-pick -- the get_bit and set_bit methods are not part of the SQL standard. I took the liberty of removing SQL-standard from the documentation of these functions so that I can mark this Ready for Committer. Applied with some further editorialization. When I looked at how OVERLAY(text) was implemented, I didn't like it at all, so I took the liberty of transforming it to C code and then duplicating that implementation for bit and bytea. I doubt this would make any performance difference in simple cases, but it will have less surprise factor. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
New version of the patch, let me know if I can fix/change something else. Leonardo getsetbit.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
Leonardo F wrote: New version of the patch, let me know if I can fix/change something else. All issues addressed, with one tiny nit-pick -- the get_bit and set_bit methods are not part of the SQL standard. I took the liberty of removing SQL-standard from the documentation of these functions so that I can mark this Ready for Committer. Thanks for the patch! -Kevin getsetbit.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
All issues addressed, with one tiny nit-pick -- the get_bit and set_bit methods are not part of the SQL standard. Damn! I completely forgot to mention that I had no idea if what I wrote in the docs made any sense... Well thank you for your thorough review. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
In the documentation, the get_bit and set_bit methods are added to a list where we state The following SQL-standard functions work on bit strings as well as character strings; however they are not SQL-standard functions and are implemented on binary strings, not character strings. Ok, I didn't change that, but I guess I can fix it. The tests don't include any examples where the bit string lines up with a byte boundary or is operating on the first or last bit of a byte, and the overlay function is not tested with values which change the length of the bit string. (All of these cases seem to work in the current version, but seem like the sort of thing which could get broken in revision.) Ok, I'll add some corner-cases tests. There is a peculiar omission from the patch -- it adds supportfor the overlay function to bit strings but not binary strings. Sincethe standard drops the bit string as a standard type in the 2003standard, in favor of boolean type and binary string types, it seemsodd to omit binary string support (which is defined in the standard)but include bit string support (which isn't). What the patch adds seemsuseful, and I don't think the omission should be considered fatal, butit would be nice to see us also cover binary strings if we can. Mmh, don't know how complicated that would be, but I can give it a try: I guess it could be a simple SQL replacement (as for char strings and bit strings?) Well, there was one thing which is beyond my ken in this regard: I'm not sure that the cases from bits8 to (unsigned char *) are needed or appropriate, although on my machine they don't seem to hurt. Perhaps someone with a better grasp of such issues can comment. That's some kind of copypaste from varlena.c byteaGetBit: I don't know if they can be avoided. I'm not sure the leading whitespace in pg_proc.h is as it should be. Do you mean the tab in: _null_(tab)select ... ? Yes, I think I should remove it. I'd prefer to see the comments for get_bit and set_bit mention that the location is specified left-to-right in a zero-based fashion consistent with the other get_bit and set_bit functions, but inconsistent with the standard substring, position, overlay functions. Personally, I find it unfortunate that our extensions introduced this inconsistency, but let's keep it consistent within the similarly named functions. Ok; I found them bizarre too, but I kept it consistent. It seems odd to me that you specify the bit to set as a 32 bit integer, and that that is what get_bit returns, but again, this is consistent with what we do for the bytea functions, so it's probably the right thing to match that. Same as above: I tried to be consistent. Using a ERRCODE_ARRAY_SUBSCRIPT_ERROR for a bit position which is out-of-bounds seems somewhat bizarre to me -- I'd probably have chosen ERRCODE_INVALID_PARAMETER_VALUE in a green field -- but again, it matches the bytea implementation. Again, same as above: consistency; but I can change it if you want This last point is probably more a matter of C coding style than anything, and I'm way to rusty in C to want to be the final arbiter of something like this, but it seems to me that oldByte and newByte local variables are just cluttering things up rather than clarifying: intoldByte, newByte; [...] oldByte = ((unsigned char *) r)[byteNo]; if (newBit == 0) newByte = oldByte (~(1 bitNo)); else newByte = oldByte | (1 bitNo); ((unsigned char *) r)[byteNo] = newByte; vs if (newBit == 0) ((unsigned char *) r)[byteNo] = (~(1 bitNo)); else ((unsigned char *) r)[byteNo] |= (1 bitNo); I agree, you version is cleaner. Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
Leonardo F wrote: In the documentation, the get_bit and set_bit methods are added to a list where we state The following SQL-standard functions work on bit strings as well as character strings; however they are not SQL-standard functions and are implemented on binary strings, not character strings. Ok, I didn't change that, but I guess I can fix it. The items already on that list, and substring function that you added, *do* fit the description; get_bit and set_bit don't fit that description, so they don't belong on that list. They must be described in some other way. There is a peculiar omission from the patch -- it adds support for the overlay function to bit strings but not binary strings. Since the standard drops the bit string as a standard type in the 2003 standard, in favor of boolean type and binary string types, it seems odd to omit binary string support (which is defined in the standard) but include bit string support (which isn't). What the patch adds seems useful, and I don't think the omission should be considered fatal, but it would be nice to see us also cover binary strings if we can. Mmh, don't know how complicated that would be, but I can give it a try: I guess it could be a simple SQL replacement (as for char strings and bit strings?) The standard explicitly defines it as being equivalent to the substring and concatenation technique, so that should work. Well, there was one thing which is beyond my ken in this regard: I'm not sure that the cases from bits8 to (unsigned char *) are needed or appropriate, although on my machine they don't seem to hurt. Perhaps someone with a better grasp of such issues can comment. That's some kind of copypaste from varlena.c byteaGetBit: I don't know if they can be avoided. Ah, but there it's casting the result of VARDATA_ANY. Here you already have the value in a bits8 variable. I looked it over some more and I'm pretty confident we can omit those casts in bitsetbit. [pointer is zero-based] Ok; I found them bizarre too, but I kept it consistent. [int32 used for bit values] Same as above: I tried to be consistent. [ERRCODE_ARRAY_SUBSCRIPT_ERROR used when query has no array] Again, same as above: consistency; but I can change it if you want I'm not suggesting the above require changes; I mentioned those primarily for the benefit of whoever eventually commits this, to try to save time chasing down the issues. [questions use of oldByte and newByte local variables] I agree, you version is cleaner. Cool. Should look even tidier without those casts. :-) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
This patch no longer applies. Could you rebase it? Done (I think). Added a couple of simple tests for bit overlay. I didn't include the catversion.h changes: obviously the CATALOG_VERSION_NO has to be changed. Leonardo Index: src/backend/utils/adt/varbit.c === RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/varbit.c,v retrieving revision 1.63 diff -c -r1.63 varbit.c *** src/backend/utils/adt/varbit.c 7 Jan 2010 20:17:43 - 1.63 --- src/backend/utils/adt/varbit.c 18 Jan 2010 09:05:50 - *** *** 1606,1608 --- 1606,1704 } PG_RETURN_INT32(0); } + + + /* bitsetbit + * Given an instance of type 'bit' creates a new one with + * the Nth bit set to the given value. + */ + Datum + bitsetbit(PG_FUNCTION_ARGS) + { + VarBit *arg1 = PG_GETARG_VARBIT_P(0); + int32 n = PG_GETARG_INT32(1); + int32 newBit = PG_GETARG_INT32(2); + VarBit *result; + int len, + bitlen; + bits8 *r, + *p; + int byteNo, + bitNo; + int oldByte, + newByte; + + bitlen = VARBITLEN(arg1); + if (n 0 || n = bitlen) + ereport(ERROR, + (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR), +errmsg(index %d out of valid range, 0..%d, + n, bitlen - 1))); + /* +* sanity check! +*/ + if (newBit != 0 newBit != 1) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), +errmsg(new bit must be 0 or 1))); + + len = VARSIZE(arg1); + result = (VarBit *) palloc(len); + SET_VARSIZE(result, len); + VARBITLEN(result) = bitlen; + + p = VARBITS(arg1); + r = VARBITS(result); + + memcpy(r, p, VARBITBYTES(arg1)); + + byteNo = n / BITS_PER_BYTE; + bitNo = BITS_PER_BYTE - 1 - (n % BITS_PER_BYTE); + + /* +* Update the byte. +*/ + oldByte = ((unsigned char *) r)[byteNo]; + + if (newBit == 0) + newByte = oldByte (~(1 bitNo)); + else + newByte = oldByte | (1 bitNo); + + ((unsigned char *) r)[byteNo] = newByte; + + PG_RETURN_VARBIT_P(result); + } + + /* bitgetbit + * returns the value of the Nth bit (0 or 1). + */ + Datum + bitgetbit(PG_FUNCTION_ARGS) + { + VarBit *arg1 = PG_GETARG_VARBIT_P(0); + int32 n = PG_GETARG_INT32(1); + int bitlen; + int byteNo, + bitNo; + int byte; + + bitlen = VARBITLEN(arg1); + + if (n 0 || n = bitlen) + ereport(ERROR, + (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR), +errmsg(index %d out of valid range, 0..%d, + n, bitlen - 1))); + + byteNo = n / BITS_PER_BYTE; + bitNo = BITS_PER_BYTE - 1 - n % BITS_PER_BYTE; + + byte = ((unsigned char *) VARBITS(arg1))[byteNo]; + + if (byte (1 bitNo)) + PG_RETURN_INT32(1); + else + PG_RETURN_INT32(0); + } + Index: src/include/catalog/pg_proc.h === RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v retrieving revision 1.562 diff -c -r1.562 pg_proc.h *** src/include/catalog/pg_proc.h 15 Jan 2010 09:19:07 - 1.562 --- src/include/catalog/pg_proc.h 18 Jan 2010 09:05:58 - *** *** 2405,2410 --- 2405,2418 DATA(insert OID = 1699 ( substring PGNSP PGUID 12 1 0 0 f f f t f i 2 0 1560 1560 23 _null_ _null_ _null_ _null_ bitsubstr_no_len _null_ _null_ _null_ )); DESCR(return portion of bitstring); + DATA(insert OID = 3030 ( overlay PGNSP PGUID 14 1 0 0 f f f t f i 4 0 1560 1560 1560 23 23 _null_ _null_ _null_ _null_ select pg_catalog.substring($1, 1, ($3 - 1)) || $2 || pg_catalog.substring($1, ($3 + $4)) _null_ _null_ _null_ )); + DESCR(substitute portion of bit string); + DATA(insert OID = 3031 ( overlay PGNSP PGUID 14 1 0 0 f f f t f i 3 0 1560 1560 1560 23 _null_ _null_ _null_ _null_select pg_catalog.substring($1, 1, ($3 - 1)) || $2 || pg_catalog.substring($1, ($3 + pg_catalog.length($2))) _null_ _null_ _null_ )); + DESCR(substitute portion of bit string); + DATA(insert OID = 3032 ( get_bitPGNSP PGUID 12 1 0 0 f f f t f i 2 0 23 1560 23 _null_ _null_ _null_ _null_ bitgetbit _null_ _null_ _null_ )); + DESCR(get bit); + DATA(insert
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
Leonardo F wrote: Done (I think). Added a couple of simple tests for bit overlay. I didn't include the catversion.h changes: obviously the CATALOG_VERSION_NO has to be changed. [Following the Reviewing A Patch wiki, more or less] The patch is in context diff format and applies cleanly. It builds and passes regression tests. There are documentation changes and regression tests, although there is a must-fix problem in the docs and I would prefer to see better coverage of corner cases in the regression tests. In the documentation, the get_bit and set_bit methods are added to a list where we state The following SQL-standard functions work on bit strings as well as character strings; however they are not SQL-standard functions and are implemented on binary strings, not character strings. The tests don't include any examples where the bit string lines up with a byte boundary or is operating on the first or last bit of a byte, and the overlay function is not tested with values which change the length of the bit string. (All of these cases seem to work in the current version, but seem like the sort of thing which could get broken in revision.) It seems to *me* like the patch does something useful; also, it was on the TODO list, the author felt it was worth the effort, the overlay support for bit strings seems to comply with the spirit of the standard by extending a standard feature to a legacy data type, the get_bit and set_bit functions extend custom PostgreSQL behavior we have on some other data types to an additional type where it seems useful, and there is an incidental fix for a documentation gap. So I think we want it. We don't already have it. As mentioned above, I believe it follows the SQL spec where appropriate and PostgreSQL community-agreed behavior where appropriate. I don't see where pg_dump support is needed. I see no dangers. There is a peculiar omission from the patch -- it adds supportfor the overlay function to bit strings but not binary strings. Sincethe standard drops the bit string as a standard type in the 2003standard, in favor of boolean type and binary string types, it seemsodd to omit binary string support (which is defined in the standard)but include bit string support (which isn't). What the patch adds seemsuseful, and I don't think the omission should be considered fatal, butit would be nice to see us also cover binary strings if we can. The features worked as advertised in all the tests I could think up. There were no assertion failures or crashes. Simple tests performed reasonably well -- a SELECT adding the get_bit and set_bit around a bit literal typically ran in under 0.05 ms longer than a SELECT of the literal by itself. SELECT of an overlay function typically ran about 0.25 ms longer than the literal by itself. Performance of the overlay function could probably be improved with a C implementation, rather than it being declared as the concatenation of the results of two substring functions with a third value. Unless someone actually has unacceptable performance with this implementation, I'm inclined to think that optimization isn't worth the effort. This patch adds new functions rather than aiming to improve peformance of anything; extensive performance tests were not done. No slowdowns of other things seems likely and none was observed. I didn't see any portability issues; the techniques used seemed consistent with the related code, so it should be as portable as what already is in use. Well, there was one thing which is beyond my ken in this regard: I'm not sure that the cases from bits8 to (unsigned char *) are needed or appropriate, although on my machine they don't seem to hurt. Perhaps someone with a better grasp of such issues can comment. No new warning issues were generated in the build. I saw no problem interdependencies. In the extremely nit-picky department: I'm not sure the leading whitespace in pg_proc.h is as it should be. I'd prefer to see the comments for get_bit and set_bit mention that the location is specified left-to-right in a zero-based fashion consistent with the other get_bit and set_bit functions, but inconsistent with the standard substring, position, overlay functions. Personally, I find it unfortunate that our extensions introduced this inconsistency, but let's keep it consistent within the similarly named functions. This patch uses all-lowercase function names for the internal functions, which is consistent with the rest of the varbit.c file, but inconsistent with the similar functions in varlena.c. I'm not sure which it is more important that they match. There is bound to be some surprise factor in the mismatch either way. I'm not sure which way would be least astonishing. It seems odd to me that you specify the bit to set as a 32 bit integer, and that that is what get_bit returns, but again, this is consistent with what we do for the bytea functions, so it's probably the right thing to
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
Leonardo F 01/07/10 6:03 AM wrote: attached a patch Leonardo, This patch no longer applies. Could you rebase it? Thanks, -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers