Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-25 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Leonardo F wrote:
 New version of the patch, let me know if I can fix/change something
 else.

 All issues addressed, with one tiny nit-pick -- the get_bit and
 set_bit methods are not part of the SQL standard. I took the liberty
 of removing SQL-standard from the documentation of these functions
 so that I can mark this Ready for Committer.

Applied with some further editorialization.  When I looked at how
OVERLAY(text) was implemented, I didn't like it at all, so I took the
liberty of transforming it to C code and then duplicating that
implementation for bit and bytea.  I doubt this would make any
performance difference in simple cases, but it will have less surprise
factor.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-20 Thread Leonardo F
New version of the patch, let me know if I can fix/change something else.



Leonardo



  

getsetbit.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-20 Thread Kevin Grittner
Leonardo F wrote:
 New version of the patch, let me know if I can fix/change something
 else.

All issues addressed, with one tiny nit-pick -- the get_bit and
set_bit methods are not part of the SQL standard. I took the liberty
of removing SQL-standard from the documentation of these functions
so that I can mark this Ready for Committer.
 
Thanks for the patch!

-Kevin




getsetbit.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-20 Thread Leonardo F
 All issues addressed, with one tiny nit-pick -- the get_bit and
 set_bit methods are not part of the SQL standard. 


Damn! I completely forgot to mention that I had no idea if what I wrote
in the docs made any sense...

Well thank you for your thorough review.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-19 Thread Leonardo F
 In the
 documentation, the get_bit and set_bit methods are added to a list
 where we state The following SQL-standard functions work on bit
 strings as well as character strings; however they are not
 SQL-standard functions and are implemented on binary strings, not
 character strings. 

Ok, I didn't change that, but I guess I can fix it.


 The tests don't include any examples where the bit
 string lines up with a byte boundary or is operating on the first or
 last bit of a byte, and the overlay function is not tested with
 values which change the length of the bit string. (All of these
 cases seem to work in the current version, but seem like the sort of
 thing which could get broken in revision.)


Ok, I'll add some corner-cases tests.

 There is a peculiar omission from the patch -- it adds supportfor the
 overlay function to bit strings but not binary strings. Sincethe
 standard drops the bit string as a standard type in the 2003standard,
 in favor of boolean type and binary string types, it seemsodd to omit
 binary string support (which is defined in the standard)but include
 bit string support (which isn't). What the patch adds seemsuseful,
 and I don't think the omission should be considered fatal, butit
 would be nice to see us also cover binary strings if we can.


Mmh, don't know how complicated that would be, but I can give it a try:
I guess it could be a simple SQL replacement (as for char strings and bit
strings?)

 Well, there was one thing which is beyond my ken
 in this regard: I'm not sure that the cases from bits8 to (unsigned
 char *) are needed or appropriate, although on my machine they don't
 seem to hurt.  Perhaps someone with a better grasp of such issues can
 comment.


That's some kind of copypaste from varlena.c byteaGetBit: I don't know
if they can be avoided.

 I'm not sure the leading whitespace in pg_proc.h is as it should be.


Do you mean the tab in:
 
_null_(tab)select ...

?

Yes, I think I should remove it.

 I'd prefer to see the comments for get_bit and set_bit mention that
 the location is specified left-to-right in a zero-based fashion
 consistent with the other get_bit and set_bit functions, but
 inconsistent with the standard substring, position, overlay
 functions.  Personally, I find it unfortunate that our extensions
 introduced this inconsistency, but let's keep it consistent within
 the similarly named functions.

Ok; I found them bizarre too, but I kept it consistent.

 It seems odd to me that you specify the bit to set as a 32 bit
 integer, and that that is what get_bit returns, but again, this is
 consistent with what we do for the bytea functions, so it's probably
 the right thing to match that.


Same as above: I tried to be consistent.

 Using a ERRCODE_ARRAY_SUBSCRIPT_ERROR for a bit position which is
 out-of-bounds seems somewhat bizarre to me -- I'd probably have
 chosen ERRCODE_INVALID_PARAMETER_VALUE in a green field -- but again,
 it matches the bytea implementation.


Again, same as above: consistency; but I can change it if you want

 This last point is probably more a matter of C coding style than
 anything, and I'm way to rusty in C to want to be the final arbiter
 of something like this, but it seems to me that oldByte and newByte
 local variables are just cluttering things up rather than clarifying:
 
 intoldByte,
 newByte;
 [...]
 oldByte = ((unsigned char *) r)[byteNo];
 
 if (newBit == 0)
 newByte = oldByte  (~(1  bitNo));
 else
 newByte = oldByte | (1  bitNo);
 
 ((unsigned char *) r)[byteNo] = newByte;
 
 vs
 
 if (newBit == 0)
 ((unsigned char *) r)[byteNo] = (~(1  bitNo));
 else
 ((unsigned char *) r)[byteNo] |= (1  bitNo);
 


I agree, you version is cleaner.


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-19 Thread Kevin Grittner
Leonardo F  wrote:
 
 In the documentation, the get_bit and set_bit methods are added
 to a list where we state The following SQL-standard functions
 work on bit strings as well as character strings; however they
 are not SQL-standard functions and are implemented on binary
 strings, not character strings. 
 
 Ok, I didn't change that, but I guess I can fix it.
 
The items already on that list, and substring function that you
added, *do* fit the description; get_bit and set_bit don't fit that
description, so they don't belong on that list.  They must be
described in some other way.
 
 There is a peculiar omission from the patch -- it adds support
 for the overlay function to bit strings but not binary strings.
 Since the standard drops the bit string as a standard type in the
 2003 standard, in favor of boolean type and binary string types,
 it seems odd to omit binary string support (which is defined in
 the standard) but include bit string support (which isn't).  What
 the patch adds seems useful, and I don't think the omission
 should be considered fatal, but it would be nice to see us also
 cover binary strings if we can.

 Mmh, don't know how complicated that would be, but I can give it a
 try: I guess it could be a simple SQL replacement (as for char
 strings and bit strings?)
 
The standard explicitly defines it as being equivalent to the
substring and concatenation technique, so that should work.
 
 Well, there was one thing which is beyond my ken in this regard:
 I'm not sure that the cases from bits8 to (unsigned char *) are
 needed or appropriate, although on my machine they don't seem to
 hurt.  Perhaps someone with a better grasp of such issues can
 comment.
 
 That's some kind of copypaste from varlena.c byteaGetBit: I don't
 know if they can be avoided.
 
Ah, but there it's casting the result of VARDATA_ANY.  Here you
already have the value in a bits8 variable.  I looked it over some
more and I'm pretty confident we can omit those casts in bitsetbit.
 
 [pointer is zero-based]

 Ok; I found them bizarre too, but I kept it consistent.
 
 [int32 used for bit values]
 
 Same as above: I tried to be consistent.
 
 [ERRCODE_ARRAY_SUBSCRIPT_ERROR used when query has no array]
 
 Again, same as above: consistency; but I can change it if you want
 
I'm not suggesting the above require changes; I mentioned those
primarily for the benefit of whoever eventually commits this, to try
to save time chasing down the issues.
 
 [questions use of oldByte and newByte local variables]
 
 I agree, you version is cleaner.
 
Cool.  Should look even tidier without those casts.  :-)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-18 Thread Leonardo F
 This patch no longer applies.  Could you rebase it?


Done (I think). Added a couple of simple tests for bit overlay.

I didn't include the catversion.h changes: obviously the CATALOG_VERSION_NO has 
to be changed.


Leonardo


  Index: src/backend/utils/adt/varbit.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/varbit.c,v
retrieving revision 1.63
diff -c -r1.63 varbit.c
*** src/backend/utils/adt/varbit.c  7 Jan 2010 20:17:43 -   1.63
--- src/backend/utils/adt/varbit.c  18 Jan 2010 09:05:50 -
***
*** 1606,1608 
--- 1606,1704 
}
PG_RETURN_INT32(0);
  }
+ 
+ 
+ /* bitsetbit
+  * Given an instance of type 'bit' creates a new one with
+  * the Nth bit set to the given value.
+  */
+ Datum
+ bitsetbit(PG_FUNCTION_ARGS)
+ {
+   VarBit *arg1 = PG_GETARG_VARBIT_P(0);
+   int32   n = PG_GETARG_INT32(1);
+   int32   newBit = PG_GETARG_INT32(2);
+   VarBit *result;
+   int len,
+   bitlen;
+   bits8  *r,
+  *p;
+   int byteNo,
+   bitNo;
+   int oldByte,
+   newByte;
+ 
+   bitlen = VARBITLEN(arg1);
+   if (n  0 || n = bitlen)
+   ereport(ERROR,
+   (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+errmsg(index %d out of valid range, 0..%d,
+   n, bitlen - 1)));
+   /*
+* sanity check!
+*/
+   if (newBit != 0  newBit != 1)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg(new bit must be 0 or 1)));
+ 
+   len = VARSIZE(arg1);
+   result = (VarBit *) palloc(len);
+   SET_VARSIZE(result, len);
+   VARBITLEN(result) = bitlen;
+ 
+   p = VARBITS(arg1);
+   r = VARBITS(result);
+ 
+   memcpy(r, p, VARBITBYTES(arg1));
+ 
+   byteNo = n / BITS_PER_BYTE;
+   bitNo = BITS_PER_BYTE - 1 - (n % BITS_PER_BYTE);
+ 
+   /*
+* Update the byte.
+*/
+   oldByte = ((unsigned char *) r)[byteNo];
+ 
+   if (newBit == 0)
+   newByte = oldByte  (~(1  bitNo));
+   else
+   newByte = oldByte | (1  bitNo);
+ 
+   ((unsigned char *) r)[byteNo] = newByte;
+ 
+   PG_RETURN_VARBIT_P(result);
+ }
+ 
+ /* bitgetbit
+  * returns the value of the Nth bit (0 or 1).
+  */
+ Datum
+ bitgetbit(PG_FUNCTION_ARGS)
+ {
+   VarBit *arg1 = PG_GETARG_VARBIT_P(0);
+   int32   n = PG_GETARG_INT32(1);
+   int bitlen;
+   int byteNo,
+   bitNo;
+   int byte;
+ 
+   bitlen = VARBITLEN(arg1);
+ 
+   if (n  0 || n = bitlen)
+   ereport(ERROR,
+   (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+errmsg(index %d out of valid range, 0..%d,
+   n, bitlen - 1)));
+ 
+   byteNo = n / BITS_PER_BYTE;
+   bitNo = BITS_PER_BYTE - 1 - n % BITS_PER_BYTE;
+ 
+   byte = ((unsigned char *) VARBITS(arg1))[byteNo];
+ 
+   if (byte (1  bitNo))
+   PG_RETURN_INT32(1);
+   else
+   PG_RETURN_INT32(0);
+ }
+ 
Index: src/include/catalog/pg_proc.h
===
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.562
diff -c -r1.562 pg_proc.h
*** src/include/catalog/pg_proc.h   15 Jan 2010 09:19:07 -  1.562
--- src/include/catalog/pg_proc.h   18 Jan 2010 09:05:58 -
***
*** 2405,2410 
--- 2405,2418 
  DATA(insert OID = 1699 (  substring   PGNSP PGUID 12 1 0 0 f 
f f t f i 2 0 1560 1560 23 _null_ _null_ _null_ _null_ bitsubstr_no_len 
_null_ _null_ _null_ ));
  DESCR(return portion of bitstring);
  
+ DATA(insert OID = 3030 (  overlay PGNSP PGUID 14 1 0 0 f 
f f t f i 4 0 1560 1560 1560 23 23 _null_ _null_ _null_ _null_ select 
pg_catalog.substring($1, 1, ($3 - 1)) || $2 || pg_catalog.substring($1, ($3 + 
$4)) _null_ _null_ _null_ ));
+ DESCR(substitute portion of bit string);
+ DATA(insert OID = 3031 (  overlay PGNSP PGUID 14 1 0 0 f 
f f t f i 3 0 1560 1560 1560 23 _null_ _null_ _null_ _null_select 
pg_catalog.substring($1, 1, ($3 - 1)) || $2 || pg_catalog.substring($1, ($3 + 
pg_catalog.length($2))) _null_ _null_ _null_ ));
+ DESCR(substitute portion of bit string);
+ DATA(insert OID = 3032 (  get_bitPGNSP PGUID 12 1 0 0 f f f t 
f i 2 0 23 1560 23 _null_ _null_ _null_ _null_ bitgetbit _null_ _null_ _null_ 
));
+ DESCR(get bit);
+ DATA(insert 

Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-18 Thread Kevin Grittner
Leonardo F wrote:
 
 Done (I think). Added a couple of simple tests for bit overlay.

 I didn't include the catversion.h changes: obviously the
 CATALOG_VERSION_NO has to be changed.
 
[Following the Reviewing A Patch wiki, more or less]
 
The patch is in context diff format and applies cleanly.  It builds
and passes regression tests.
 
There are documentation changes and regression tests, although there
is a must-fix problem in the docs and I would prefer to see better
coverage of corner cases in the regression tests. In the
documentation, the get_bit and set_bit methods are added to a list
where we state The following SQL-standard functions work on bit
strings as well as character strings; however they are not
SQL-standard functions and are implemented on binary strings, not
character strings. The tests don't include any examples where the bit
string lines up with a byte boundary or is operating on the first or
last bit of a byte, and the overlay function is not tested with
values which change the length of the bit string. (All of these
cases seem to work in the current version, but seem like the sort of
thing which could get broken in revision.)
 
It seems to *me* like the patch does something useful; also, it was
on the TODO list, the author felt it was worth the effort, the
overlay support for bit strings seems to comply with the spirit of
the standard by extending a standard feature to a legacy data type,
the get_bit and set_bit functions extend custom PostgreSQL behavior
we have on some other data types to an additional type where it seems
useful, and there is an incidental fix for a documentation gap. So I
think we want it.
 
We don't already have it.
 
As mentioned above, I believe it follows the SQL spec where
appropriate and PostgreSQL community-agreed behavior where
appropriate.
 
I don't see where pg_dump support is needed.
 
I see no dangers.
 
There is a peculiar omission from the patch -- it adds supportfor the
overlay function to bit strings but not binary strings. Sincethe
standard drops the bit string as a standard type in the 2003standard,
in favor of boolean type and binary string types, it seemsodd to omit
binary string support (which is defined in the standard)but include
bit string support (which isn't). What the patch adds seemsuseful,
and I don't think the omission should be considered fatal, butit
would be nice to see us also cover binary strings if we can.
 
The features worked as advertised in all the tests I could think up.
 
There were no assertion failures or crashes.
 
Simple tests performed reasonably well -- a SELECT adding the get_bit
and set_bit around a bit literal typically ran in under 0.05 ms
longer than a SELECT of the literal by itself.  SELECT of an overlay
function typically ran about 0.25 ms longer than the literal by
itself.  Performance of the overlay function could probably be
improved with a C implementation, rather than it being declared as
the concatenation of the results of two substring functions with a
third value.  Unless someone actually has unacceptable performance
with this implementation, I'm inclined to think that optimization
isn't worth the effort.
 
This patch adds new functions rather than aiming to improve
peformance of anything; extensive performance tests were not done.
No slowdowns of other things seems likely and none was observed.
 
I didn't see any portability issues; the techniques used seemed
consistent with the related code, so it should be as portable as what
already is in use.  Well, there was one thing which is beyond my ken
in this regard: I'm not sure that the cases from bits8 to (unsigned
char *) are needed or appropriate, although on my machine they don't
seem to hurt.  Perhaps someone with a better grasp of such issues can
comment.
 
No new warning issues were generated in the build.  I saw no problem
interdependencies.
 
In the extremely nit-picky department:
 
I'm not sure the leading whitespace in pg_proc.h is as it should be.
 
I'd prefer to see the comments for get_bit and set_bit mention that
the location is specified left-to-right in a zero-based fashion
consistent with the other get_bit and set_bit functions, but
inconsistent with the standard substring, position, overlay
functions.  Personally, I find it unfortunate that our extensions
introduced this inconsistency, but let's keep it consistent within
the similarly named functions.
 
This patch uses all-lowercase function names for the internal
functions, which is consistent with the rest of the varbit.c file,
but inconsistent with the similar functions in varlena.c.  I'm not
sure which it is more important that they match.  There is bound to
be some surprise factor in the mismatch either way.  I'm not sure
which way would be least astonishing.
 
It seems odd to me that you specify the bit to set as a 32 bit
integer, and that that is what get_bit returns, but again, this is
consistent with what we do for the bytea functions, so it's probably
the right thing to 

Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-16 Thread Kevin Grittner
 Leonardo F  01/07/10 6:03 AM wrote:
 attached a patch
 
Leonardo,
 
This patch no longer applies.  Could you rebase it?
 
Thanks,
 
-Kevin
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers