Re: [SQL] how to concatenate in PostgreSQL

2012-03-24 Thread Allan Kamau
On 3/24/12, Andreas Kretschmer  wrote:
> Rehan Saleem  wrote:
>
>> hi ,
>> how can we concatenate these lines and execute the sql command
>>
>>set sql = 'select user,username, firstname '
>>   set sql += ' lastname, cardno from table1 where userid=' + 5
>
> sql = sql || ' bla fasel';
>
> || is the concatenation operator.
>
>
>
> Andreas
> --
> Really, I'm not out to destroy Microsoft. That will just be a completely
> unintentional side effect.  (Linus Torvalds)
> "If I was god, I would recompile penguin with --enable-fly."   (unknown)
> Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°
>
>


You may want to ensure that none of the variables to be concatenated is
NULL, otherwise the end result will be NULL.
The "coalesce" function provides a way to supply a substitute value for a
variable that may be NULL.
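For instance (a small illustration only; the literal 5 stands in for
whatever variable is being spliced into the statement):

SELECT 'select user, username, firstname, lastname, cardno '
       || 'from table1 where userid = ' || coalesce(5::text, '0');

-- If any operand of || is NULL, the whole result is NULL:
--   SELECT 'abc' || NULL;                 -- NULL
--   SELECT 'abc' || COALESCE(NULL, '');   -- 'abc'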

Allan.



Re: [SQL] How to compare two tables in PostgreSQL

2012-11-12 Thread Allan Kamau
If you would like to compare their contents, perhaps this may help.
Write a SELECT statement listing the fields whose data you would like to
compare; you may want to leave out fields whose values are provided by
default, for example fields populated from a sequence object and/or
timestamp fields.
You may need to trim leading and trailing spaces from the text-based
fields if such whitespace is not relevant to your definition of
similarity.
The same may apply to rounding and formatting numeric data; for example,
9.900 could be equivalent to 9.9 in the other table depending on your
application of the data.
Include an ORDER BY clause to ensure you get the records in a predictable
order.
Output these data to a CSV file without the CSV header.
Now rewrite the same query for the other table; this is only required if
the table definitions are not common between the two tables.
Remember to substitute the table name accordingly.
Output these data to another CSV file without the CSV header.

Now run sha1sum on the first file and compare the returned value with the
value sha1sum returns for the second file.
Alternatively, use the "diff" tool to see the actual differences.
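A rough sketch of the above in psql (table, column and file names here are
placeholders):

\copy (SELECT id, name, trim(note) FROM table_a ORDER BY id) TO 'table_a.csv' WITH CSV
\copy (SELECT id, name, trim(note) FROM table_b ORDER BY id) TO 'table_b.csv' WITH CSV

-- then, at the shell:
--   sha1sum table_a.csv table_b.csv    (identical checksums => identical exports)
--   diff table_a.csv table_b.csv       (or inspect the actual differences)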

Allan.


On Mon, Nov 12, 2012 at 10:23 AM, Rob Sargentg wrote:

> On 11/10/2012 08:13 PM, saikiran mothe wrote:
>
>> Hi,
>>
>> How can i compare two tables in PostgreSQL.
>>
>> Thanks,
>> Sai
>>
> Compare their content or their definition?
>
>
>


Re: [SQL] How to compare two tables in PostgreSQL

2012-11-12 Thread Allan Kamau
Thanks Willem for the advice to stick to SQL solutions where possible.

A simple SQL solution would be to perform a full outer join on the two tables.
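For example (a sketch; it assumes the two tables share a comparable key,
here called "id", plus a "name" column to check):

SELECT a.id AS id_in_a, b.id AS id_in_b
FROM table_a a
FULL OUTER JOIN table_b b
  ON a.id = b.id AND a.name IS NOT DISTINCT FROM b.name
WHERE a.id IS NULL OR b.id IS NULL;

-- Any row returned exists (with those values) in only one of the two tables;
-- IS NOT DISTINCT FROM treats two NULLs as equal during the comparison.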



On Mon, Nov 12, 2012 at 12:13 PM, Willem Leenen
wrote:

>
> My advice: for comparing databases, tables, data, etc., don't go scripting
> yourself. There are already tools on the market for that, and they give nice
> reports on differences in constraints, indexes, column names, data, etc.
> I used dbdiff from dkgas.com, but it seems the website is down.
>
> I would try to stick to SQL solutions as much as possible, instead of
> creating files and comparing them. (got that from Joe Celko ;) )
>
>
>
>  --
> Date: Mon, 12 Nov 2012 11:00:32 +0300
> Subject: Re: [SQL] How to compare two tables in PostgreSQL
> From: kamaual...@gmail.com
> To: pgsql-sql@postgresql.org
>
>
>  If you would like to compare their contents, perhaps this may help.
> Write a SELECT statement listing the fields whose data you would like to
> compare; you may want to leave out fields whose values are provided by
> default, for example fields populated from a sequence object and/or
> timestamp fields.
> You may need to trim leading and trailing spaces from the text-based
> fields if such whitespace is not relevant to your definition of
> similarity.
> The same may apply to rounding and formatting numeric data; for example,
> 9.900 could be equivalent to 9.9 in the other table depending on your
> application of the data.
> Include an ORDER BY clause to ensure you get the records in a predictable
> order.
> Output these data to a CSV file without the CSV header.
> Now rewrite the same query for the other table; this is only required if
> the table definitions are not common between the two tables.
> Remember to substitute the table name accordingly.
> Output these data to another CSV file without the CSV header.
>
> Now run sha1sum on the first file and compare the returned value with the
> value sha1sum returns for the second file.
> Alternatively, use the "diff" tool to see the actual differences.
>
> Allan.
>
>
> On Mon, Nov 12, 2012 at 10:23 AM, Rob Sargentg wrote:
>
>  On 11/10/2012 08:13 PM, saikiran mothe wrote:
>
> Hi,
>
> How can i compare two tables in PostgreSQL.
>
> Thanks,
> Sai
>
> Compare their content or their definition?
>
>
>
>
>


[SQL] Removing redundant itemsets

2008-03-31 Thread Allan Kamau

Hi all,
I have a list of purchases (market baskets) and I would like to select
the non-redundant, longest possible patterns by eliminating
(creating/populating another table to contain only non-redundant itemsets)
purchases whose item lists are fully included in at least one
other purchase.

(Am assuming all the items of all the purchases have met the minimum
support, currently set at 1.)

Below is a sample case, table schema and data (DDL and DML):

Transaction   Itemset
'100'   'a','b','c','d'
'200'   'c','d'
'300'   'a','c','e'
'400'   'e','d'

On successful removal of the 'redundant' (smaller) purchases whose
items are contained in totality by at least one other purchase, the
purchase '200' would be weeded out as its itemset {'c','d'} is
contained in purchase '100' {'a','b','c','d'}.


drop sequence if exists togo_seq cascade;
create sequence togo_seq;
drop table if exists togo cascade;
create table togo
(
id integer not null default nextval('togo_seq')
,tid char(3) not null
,item char(1) not null
,primary key(id)
,unique(tid,item)
)
;
insert into togo(tid,item)values('100','b');
insert into togo(tid,item)values('100','a');
insert into togo(tid,item)values('100','c');
insert into togo(tid,item)values('100','d');
insert into togo(tid,item)values('200','c');
insert into togo(tid,item)values('200','d');
insert into togo(tid,item)values('300','a');
insert into togo(tid,item)values('300','c');
insert into togo(tid,item)values('300','e');
insert into togo(tid,item)values('400','e');
insert into togo(tid,item)values('400','d');

Allan.



Re: [SQL] Removing redundant itemsets

2008-03-31 Thread Allan Kamau

Craig,
Thank you so much for the solution. I have spent many hours since
Thursday last week, including the weekend (and it took you just a few
minutes), trying to figure out a solution that does not involve procedural
programming and looping (as the number of items and even the number of
"purchases" in the datasets I may be working with may be large). I was
looking for a solution that would take (almost) polynomial time (and
resources) and also make use of PostgreSQL's refined and efficient engine.
Your solution satisfies these requirements. Thanks.


Allan.

Craig Ringer wrote:

Allan Kamau wrote:
  

Hi all,
I have a list of purchases (market basket) and I would like to select
non redundant longest possible patterns by eliminating
(creating/populating other table to contain only non redandant itemsets)
purchases having item lists which are fully included in at least one
other purchase.



Here's a possibly slow and surely ugly solution (I think it's right,
though I haven't done more than passing testing):



CREATE VIEW togo_as_arr AS
  SELECT a.tid,
ARRAY(SELECT item FROM togo b WHERE b.tid = a.tid ORDER BY item)
AS items
  FROM togo a GROUP BY tid;

SELECT arr_a.tid AS redundant_tid, arr_b.tid AS contained_by
FROM togo_as_arr arr_a CROSS JOIN togo_as_arr arr_b
WHERE arr_a.tid <> arr_b.tid AND arr_a.items <@ arr_b.items;



(the view isn't necessary, but does improve the readability of the query).

It groups the purchases up with item lists as arrays, then finds any
purchases with items arrays wholly contained by other item arrays from
other purchases.
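
Against the sample data posted earlier it should report a single row (a
quick check on paper rather than output from a live run):

 redundant_tid | contained_by
---------------+--------------
 200           | 100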

I'm *sure* there's a smarter way to do this that avoids the use of
arrays, but I don't seem to be able to come up with one right now. It's
interesting, though, so I might keep fiddling.

--
Craig Ringer
  





[SQL] Weeks elapsed to Months elapsed conversion

2008-05-30 Thread Allan Kamau

Hi all,
I have a simple question (tried googling but found no answers). How do I 
convert weeks elapsed into months elapsed?
I have data that contains durations in weeks (without any other date
values such as year and so on), for example a week value of 14, and I
would like to convert the 14 weeks to 3 months (some loss of accuracy
is expected).

Expected tests may be:
14 weeks yields 3 months.
1 week yields 0 months.

Is there such a function? The rudimentary solution (which may reduce the
loss of accuracy) I am thinking of is to add the weeks to the value
returned by clock_timestamp() and then subtract clock_timestamp() from
it. The question then is how to convert the returned value to months
elapsed as opposed to days elapsed.
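
One possible way to do it directly in SQL (a sketch; justify_days() treats
a month as 30 days, so the result is approximate by design):

SELECT extract(month FROM justify_days(interval '14 weeks'));   -- 3
SELECT extract(month FROM justify_days(interval '1 week'));     -- 0

-- or purely arithmetically, using an average month of about 30.44 days:
SELECT floor(14 * 7 / 30.44);                                    -- 3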


Allan.



Re: [SQL] Sequential event query

2008-06-25 Thread Allan Kamau

Hi Steve,
Am having difficulties (there is a chance I could be the only one)
trying to see how the results you've listed under the "I would want to
get:" section can be generated from the information you have provided in
your implicit problem statement. Please restate your question, showing
simple and clear sample data along with the desired results.


For example, where do the event_type and sequential_events fields get
their values from, given that the sample data you've provided is in a
single field?


Allan.

Steve Crawford wrote:

I have a table that includes the following columns:
event_time timestamptz
device_id integer
event_type integer
...

There are hundreds of unique device_ids, about ten event_types and 
millions of records in the table. Devices can run the gamut from idle 
to fully utilized so for any given time-period a device might have 
anywhere from zero to thousands of events. I am trying to concoct two 
queries.


1: Analysis query to determine the distribution of sequential 
event_types. For example, if the event_types, in chronological order, 
were:

1
3
1
4
4
5
4
2
2
2
4
4
7
4
4

I would want to get:
event_type, sequential_events, occurrences
1,1,2
2,3,1
3,1,1
4,1,1
4,2,3
5,1,1
7,1,1



Maybe a SELECT event_type, sequential_events, count(*) AS occurrences
FROM blah GROUP BY ..., along the lines sketched below.
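
Fleshing that out a little (a sketch only; it assumes the table is called
"events", uses the columns from the original post, and relies on window
functions, i.e. PostgreSQL 8.4 or later):

SELECT event_type,
       run_length AS sequential_events,
       count(*)   AS occurrences
FROM (
    SELECT event_type, count(*) AS run_length
    FROM (
        SELECT event_type,
               row_number() OVER (ORDER BY event_time)
             - row_number() OVER (PARTITION BY event_type ORDER BY event_time) AS grp
        FROM events
    ) AS numbered
    GROUP BY event_type, grp
) AS runs
GROUP BY event_type, run_length
ORDER BY event_type, run_length;

-- The difference of the two row_number() values is constant within each
-- unbroken run of the same event_type, so grouping on it isolates the runs.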




2: Listing of all devices where the most recent N events are all 
identical. As noted above, the varying load on the devices means that 
for device 1, the last N might be the last 2 minutes but for device 3 
it might be a day or two. I am looking for a query that will list any 
device having no variation in the recent events.


Cheers,
Steve
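
For the second query, a rough sketch in the same spirit (window functions
again, and N = 5 is just a placeholder):

SELECT device_id
FROM (
    SELECT device_id, event_type,
           row_number() OVER (PARTITION BY device_id ORDER BY event_time DESC) AS rn
    FROM events
) AS recent
WHERE rn <= 5
GROUP BY device_id
HAVING count(*) = 5 AND count(DISTINCT event_type) = 1;

-- Devices with fewer than 5 recent events are excluded by the count(*) = 5 check.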








[SQL] Efficiently determining the number of bits set in the contents of a VARBIT field

2008-07-26 Thread Allan Kamau

Hi all,
Am looking for a fast and efficient way to count the number of bits set 
(to 1) in a VARBIT field. I am currently using 
"LENGTH(REGEXP_REPLACE(CAST(a.somefield_bit_code AS TEXT),'0','','g'))".


Allan.




Re: [SQL] Re: Efficiently determining the number of bits set in the contents of a VARBIT field

2008-08-21 Thread Allan Kamau
Thank you TJ and everyone else for the advice and the C code. Today I
did finally return to the 'number of bits set' challenge and managed to
compile and link the nbits C function, which went smoothly. However the
function does crash my postgres server installation (8.3.3) with a
segmentation fault each time I call it, for example SELECT
nbits_set(B'1101');
My C skills are very sparse and I am unable to debug the function; I have
included the C code of this function below. Is there something I may have
left out?




#include "postgres.h"
#include "utils/varbit.h"
#include "fmgr.h"
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

PG_FUNCTION_INFO_V1(nbits_set);
Datum
nbits_set(PG_FUNCTION_ARGS)
{
/* how many bits are set in a bitstring? */
   VarBit *a = PG_GETARG_VARBIT_P(0);
   int n=0;
   int i;
   unsigned char *ap = VARBITS(a);
   unsigned char aval;
   for (i=0; i < VARBITBYTES(a); ++i) {
   aval = *ap; ++ap;
   if (aval == 0) continue;
   if (aval & 1) ++n;
   if (aval & 2) ++n;
   if (aval & 4) ++n;
   if (aval & 8) ++n;
   if (aval & 16) ++n;
   if (aval & 32) ++n;
   if (aval & 64) ++n;
   if (aval & 128) ++n;
   }
   PG_RETURN_INT32(n);
}

Allan

Bruce Momjian wrote:

Jean-David Beyer wrote:
  

TJ O'Donnell wrote:


I use a c function, nbits_set that will do what you need.
I've posted the code in this email.

TJ O'Donnell
http://www.gnova.com

#include "postgres.h"
#include "utils/varbit.h"

Datum   nbits_set(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(nbits_set);
Datum
nbits_set(PG_FUNCTION_ARGS)
{
/* how many bits are set in a bitstring? */

 VarBit *a = PG_GETARG_VARBIT_P(0);
 int n=0;
 int i;
 unsigned char *ap = VARBITS(a);
 unsigned char aval;
 for (i=0; i < VARBITBYTES(a); ++i) {
 aval = *ap; ++ap;
 if (aval == 0) continue;
 if (aval & 1) ++n;
 if (aval & 2) ++n;
 if (aval & 4) ++n;
 if (aval & 8) ++n;
 if (aval & 16) ++n;
 if (aval & 32) ++n;
 if (aval & 64) ++n;
 if (aval & 128) ++n;
 }
 PG_RETURN_INT32(n);
}



  

Hi all,
Am looking for a fast and efficient way to count the number of bits set 
(to 1) in a VARBIT field. I am currently using 
"LENGTH(REGEXP_REPLACE(CAST(a.somefield_bit_code AS TEXT),'0','','g'))".


Allan.

  

When I had to do that, in days with smaller amounts of RAM, but very long
bit-vectors, I used a faster function sort-of like this:

static char table[256] = {
0,1,1,2,1,2,2,3,1,.
};

Then like above, but instead of the loop,

n+= table[aval];


You get the idea.



Uh, I was kind of confused by this, even when I saw a full
implementation:

http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable

Actually, this looks even better:


http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan

  





Re: [SQL] Re: Efficiently determining the number of bits set in the contents of a VARBIT field

2008-08-22 Thread Allan Kamau
All was well with the code below; apologies to all who read my previous
email. The error (an oversight) was on my part: in the "CREATE FUNCTION
..." statement I had FLOAT as the return type instead of INTEGER.
Now the function runs smoothly. Preliminary results show it is orders of
magnitude faster than the LENGTH(REGEXP_REPLACE(CAST(myVarBit AS
TEXT),'0','','g')) solution.
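
In other words, the working declaration looks roughly like this (the
shared-library path below is only a placeholder):

CREATE OR REPLACE FUNCTION nbits_set(VARBIT) RETURNS INTEGER
    AS '/path/to/nbits_set.so', 'nbits_set'
    LANGUAGE C IMMUTABLE STRICT;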

Thanks again TJ and the rest of the team.

Allan

Allan Kamau wrote:
Thank you TJ and everyone else for the advice and the C code. Today I
did finally return to the 'number of bits set' challenge and managed
to compile and link the nbits C function, which went smoothly. However
the function does crash my postgres server installation (8.3.3) with a
segmentation fault each time I call it, for example SELECT
nbits_set(B'1101');
My C skills are very sparse and I am unable to debug the function; I
have included the C code of this function below. Is there something I may
have left out?




#include "postgres.h"
#include "utils/varbit.h"
#include "fmgr.h"
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

PG_FUNCTION_INFO_V1(nbits_set);
Datum
nbits_set(PG_FUNCTION_ARGS)
{
/* how many bits are set in a bitstring? */
   VarBit *a = PG_GETARG_VARBIT_P(0);
   int n=0;
   int i;
   unsigned char *ap = VARBITS(a);
   unsigned char aval;
   for (i=0; i < VARBITBYTES(a); ++i) {
   aval = *ap; ++ap;
   if (aval == 0) continue;
   if (aval & 1) ++n;
   if (aval & 2) ++n;
   if (aval & 4) ++n;
   if (aval & 8) ++n;
   if (aval & 16) ++n;
   if (aval & 32) ++n;
   if (aval & 64) ++n;
   if (aval & 128) ++n;
   }
   PG_RETURN_INT32(n);
}

Allan

Bruce Momjian wrote:

Jean-David Beyer wrote:
 

TJ O'Donnell wrote:
   

I use a c function, nbits_set that will do what you need.
I've posted the code in this email.

TJ O'Donnell
http://www.gnova.com

#include "postgres.h"
#include "utils/varbit.h"

Datum   nbits_set(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(nbits_set);
Datum
nbits_set(PG_FUNCTION_ARGS)
{
/* how many bits are set in a bitstring? */

 VarBit *a = PG_GETARG_VARBIT_P(0);
 int n=0;
 int i;
 unsigned char *ap = VARBITS(a);
 unsigned char aval;
 for (i=0; i < VARBITBYTES(a); ++i) {
 aval = *ap; ++ap;
 if (aval == 0) continue;
 if (aval & 1) ++n;
 if (aval & 2) ++n;
 if (aval & 4) ++n;
 if (aval & 8) ++n;
 if (aval & 16) ++n;
 if (aval & 32) ++n;
 if (aval & 64) ++n;
 if (aval & 128) ++n;
 }
 PG_RETURN_INT32(n);
}



 

Hi all,
Am looking for a fast and efficient way to count the number of 
bits set (to 1) in a VARBIT field. I am currently using 
"LENGTH(REGEXP_REPLACE(CAST(a.somefield_bit_code AS 
TEXT),'0','','g'))".


Allan.

  
When I had to do that, in days with smaller amounts of RAM, but very 
long

bit-vectors, I used a faster function sort-of like this:

static char table[256] = {
0,1,1,2,1,2,2,3,1,.
};

Then like above, but instead of the loop,

n+= table[aval];


You get the idea.



Uh, I was kind of confused by this, even when I saw a full
implementation:

http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable 



Actually, this looks even better:

http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan 



  








[SQL] Collapsing (select) row values into single text field.

2008-12-10 Thread Allan Kamau

Hi all,
I would like to concatenate the field values of several rows in a table
that meet some similarity criteria based on the values of some other
field (much like a GROUP BY). I would also like to include the lowest
value of another associated field.


I have a table that contains 3 fields of interest.
create table temp
(id INTEGER NOT NULL
,location TEXT NOT NULL --this will hold the zip code
,lowest_temp NUMERIC(5,2) NOT NULL --The lowest temperature at some 
given night

,location_bit_data VARBIT NOT NULL
,PRIMARY KEY(id)
);

There will usually be more than one record for a location
(location+lowest_temp is not unique either).
Now I would like to collapse the data in this table (and populate another
table) as follows.

Lets assume this table has the structure below.

create table temp_major
(id INTEGER NOT NULL
,location TEXT NOT NULL --this will hold the zip code
,lowest_overall_temp NUMERIC(5,2) NOT NULL --The lowest temperature at 
some given night

,overall_location_bit_data VARBIT NOT NULL
,PRIMARY KEY(id)
,UNIQUE(location)
);

The new table (temp_major) is populated as follows: the
"location_bit_data" values for a given location are "grouped" into one
entry (to create a concatenation effect), the lowest_temp reading across
all the records for that location is noted, and the location itself is
noted; this data is used to populate the table.


The solution I have so far involves using a stored procedure and cursors 
(on Select .. order by location) to continuously "grow" the data for a 
given location's "overall_location_bit_data" field.


Allan.



Re: [SQL] Collapsing (select) row values into single text field.

2008-12-10 Thread Allan Kamau
Thanks Filip for the user-defined aggregates information; I will look
into it and compare its performance with another possible
solution (explained next).
I do recall making use of arrays (then array_to_string()) to do this
kind of collapsing as a subquery in the SELECT clause of a GROUP BY
query, but I cannot recall the actual syntax.
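
The pattern I had in mind is roughly this (a sketch for a text column; the
"some_text_field" column is hypothetical, and for the VARBIT case a custom
aggregate as in the quoted reply below is probably still needed):

SELECT t1.location,
       MIN(t1.lowest_temp) AS lowest_overall_temp,
       ARRAY_TO_STRING(ARRAY(SELECT t2.some_text_field
                             FROM temp t2
                             WHERE t2.location = t1.location
                             ORDER BY t2.id), '') AS collapsed
FROM temp t1
GROUP BY t1.location;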

Allan.

On Wed, Dec 10, 2008 at 4:45 PM, Filip Rembiałkowski
<[EMAIL PROTECTED]> wrote:
>
>
> 2008/12/10 Allan Kamau <[EMAIL PROTECTED]>
>>
>> Hi all,
>> I would like to concatenate the field values of several rows in a table
>> that meet some similarity criteria based on the values of some other
>> field (much like a GROUP BY). I would also like to include the lowest
>> value of another associated field.
>>
>> I have a table that contains 3 fields of interest.
>> create table temp
>> (id INTEGER NOT NULL
>> ,location TEXT NOT NULL --this will hold the zip code
>> ,lowest_temp NUMERIC(5,2) NOT NULL --The lowest temperature at some given
>> night
>> ,location_bit_data VARBIT NOT NULL
>> ,PRIMARY KEY(id)
>> );
>>
>> There will usually be more than one record for a location
>> (location+lowest_temp is not unique either).
>> Now I would like to collapse the data in this table (and populate another
>> table) as follows.
>> Lets assume this table has the structure below.
>>
>> create table temp_major
>> (id INTEGER NOT NULL
>> ,location TEXT NOT NULL --this will hold the zip code
>> ,lowest_overall_temp NUMERIC(5,2) NOT NULL --The lowest temperature at
>> some given night
>> ,overall_location_bit_data VARBIT NOT NULL
>> ,PRIMARY KEY(id)
>> ,UNIQUE(location)
>> );
>>
>> The new table (temp_major) is populated as follows: the
>> "location_bit_data" values for a given location are "grouped" into one entry
>> (to create a concatenation effect), the lowest_temp reading across all the
>> records for that location is noted, and the location itself is noted; this
>> data is used to populate the table.
>>
>> The solution I have so far involves using a stored procedure and cursors
>> (on Select .. order by location) to continuously "grow" the data for a given
>> location's "overall_location_bit_data" field.
>>
>> Allan.
>
>
> sounds like you need a custom aggregate function.
> http://www.postgresql.org/docs/current/static/xaggr.html
>
> however it's not clear how you want to aggregate; what does your actual
> grouping function do?
>
> general pattern is:
>
> CREATE FUNCTION varbit_concat(varbit,varbit)
> returns varbit
> as 'whatever you need' language 'of your choice' immutable;
>
> CREATE AGGREGATE agg_varbit_concat ( varbit ) (
> SFUNC = varbit_concat,
> STYPE = varbit
> -- check CREATE AGGREGATE syntax, maybe you need something fancy here
> );
>
>
> -- and finally:
>
> SELECT
>  location,
>  min(lowest_temp) as lowest_overall_temp,
>  agg_varbit_concat(location_bit_data) as overall_location_bit_data
> FROM temp
> GROUP BY location;
>
>
> --
> Filip Rembiałkowski
>
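
(For completeness, one possible SQL-level body for the "whatever you need"
placeholder above, assuming plain bit-string concatenation is the behaviour
wanted; only a sketch:

CREATE OR REPLACE FUNCTION varbit_concat(varbit, varbit) RETURNS varbit
    AS 'SELECT $1 || $2' LANGUAGE sql IMMUTABLE STRICT;

The || operator already concatenates bit strings, and marking the transition
function STRICT makes the aggregate simply skip NULL inputs.)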



Re: [SQL] Re: Efficiently determining the number of bits set in the contents of a VARBIT field

2009-03-12 Thread Allan Kamau

Hi all,
I am now looking for a function to return the position of the left-most
set bit, and optionally another to return the position of the right-most
set bit.


I have been looking at
http://graphics.stanford.edu/~seander/bithacks.html#OperationCounting
but it seems it will take me a while to figure out bit manipulation.
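
In the meantime a plain-SQL workaround seems possible by going through the
text form of the bit string (a sketch; note REVERSE() is not available in
8.3, so the right-most case would need a small helper function there):

SELECT POSITION('1' IN CAST(B'00101100' AS TEXT));                 -- 3, left-most set bit
SELECT LENGTH(CAST(B'00101100' AS TEXT))
       - POSITION('1' IN REVERSE(CAST(B'00101100' AS TEXT))) + 1;  -- 6, right-most set bit

-- POSITION() returns 0 when no bit is set at all.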


Allan.

Allan Kamau wrote:
All was well with the code below, apologies to all who read my 
previous email. The error (an oversight) was on my part. In the 
"CREATE FUNCTION ..." statement I had FLOAT as the return type instead 
of INTEGER.
Now the function runs smoothly. Preliminary results show it is orders
of magnitude faster than the LENGTH(REGEXP_REPLACE(CAST(myVarBit AS
TEXT),'0','','g')) solution.

Thanks again TJ and the rest of the team.

Allan

Allan Kamau wrote:
Thank you TJ and everyone else for the advice and the C code. Today I
did finally return to the 'number of bits set' challenge and managed
to compile and link the nbits C function, which went smoothly. However
the function does crash my postgres server installation (8.3.3) with
a segmentation fault each time I call it, for example SELECT
nbits_set(B'1101');
My C skills are very sparse and I am unable to debug the function; I
have included the C code of this function below. Is there something I may
have left out?




#include "postgres.h"
#include "utils/varbit.h"
#include "fmgr.h"
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

PG_FUNCTION_INFO_V1(nbits_set);
Datum
nbits_set(PG_FUNCTION_ARGS)
{
/* how many bits are set in a bitstring? */
   VarBit *a = PG_GETARG_VARBIT_P(0);
   int n=0;
   int i;
   unsigned char *ap = VARBITS(a);
   unsigned char aval;
   for (i=0; i < VARBITBYTES(a); ++i) {
   aval = *ap; ++ap;
   if (aval == 0) continue;
   if (aval & 1) ++n;
   if (aval & 2) ++n;
   if (aval & 4) ++n;
   if (aval & 8) ++n;
   if (aval & 16) ++n;
   if (aval & 32) ++n;
   if (aval & 64) ++n;
   if (aval & 128) ++n;
   }
   PG_RETURN_INT32(n);
}

Allan

Bruce Momjian wrote:

Jean-David Beyer wrote:
 

TJ O'Donnell wrote:
  

I use a c function, nbits_set that will do what you need.
I've posted the code in this email.

TJ O'Donnell
http://www.gnova.com

#include "postgres.h"
#include "utils/varbit.h"

Datum   nbits_set(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(nbits_set);
Datum
nbits_set(PG_FUNCTION_ARGS)
{
/* how many bits are set in a bitstring? */

 VarBit *a = PG_GETARG_VARBIT_P(0);
 int n=0;
 int i;
 unsigned char *ap = VARBITS(a);
 unsigned char aval;
 for (i=0; i < VARBITBYTES(a); ++i) {
 aval = *ap; ++ap;
 if (aval == 0) continue;
 if (aval & 1) ++n;
 if (aval & 2) ++n;
 if (aval & 4) ++n;
 if (aval & 8) ++n;
 if (aval & 16) ++n;
 if (aval & 32) ++n;
 if (aval & 64) ++n;
 if (aval & 128) ++n;
 }
 PG_RETURN_INT32(n);
}





Hi all,
Am looking for a fast and efficient way to count the number of 
bits set (to 1) in a VARBIT field. I am currently using 
"LENGTH(REGEXP_REPLACE(CAST(a.somefield_bit_code AS 
TEXT),'0','','g'))".


Allan.

  
When I had to do that, in days with smaller amounts of RAM, but 
very long

bit-vectors, I used a faster function sort-of like this:

static char table[256] = {
0,1,1,2,1,2,2,3,1,.
};

Then like above, but instead of the loop,

n+= table[aval];


You get the idea.



Uh, I was kind of confused by this, even when I saw a full
implementation:


http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable


Actually, this looks even better:


http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan 



  











Re: [SQL] Re: Efficiently determining the number of bits set in the contents of a VARBIT field

2009-03-12 Thread Allan Kamau
Seems I have a solution based on the code TJ had provided for counting
the bits set; for those interested, below are the two functions.



#include "postgres.h"
#include "utils/varbit.h"
#include "fmgr.h"
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

PG_FUNCTION_INFO_V1(last_bit_set);
Datum
last_bit_set(PG_FUNCTION_ARGS)
{
/* position of last set bit of a bitstring? */
int n=0;
VarBit *a = PG_GETARG_VARBIT_P(0);
int i;
unsigned char *ap = VARBITS(a);
unsigned char aval;
int b=0;
int byte_cnt=0;
int last_bit_set_position=0;

for(i=0;i0)
first_bit_set_position=(8*byte_cnt)+b;
else
first_bit_set_position=0;
PG_RETURN_INT32(first_bit_set_position);
}


Allan.

On Thu, Mar 12, 2009 at 2:53 PM, Allan Kamau  wrote:
> Hi all,
> I am now looking for a function to return the position of the left-most set
> bit, and optionally another to return the position of the right-most set
> bit.
>
> I have been looking at
> http://graphics.stanford.edu/~seander/bithacks.html#OperationCounting but
> it seems it will take me a while to figure out bit manipulation.
>
> Allan.
>
> Allan Kamau wrote:
>>
>> All was well with the code below, apologies to all who read my previous
>> email. The error (an oversight) was on my part. In the "CREATE FUNCTION ..."
>> statement I had FLOAT as the return type instead of INTEGER.
>> Now the function runs smoothly. Preliminary results show it is orders of
>> magnitude faster than the LENGTH(REGEXP_REPLACE(CAST(myVarBit AS
>> TEXT),'0','','g')) solution.
>> Thanks again TJ and the rest of the team.
>>
>> Allan
>>
>> Allan Kamau wrote:
>>>
>>> Thank you TJ and everyone else for the advice and the C code. Today I did
>>> finally return to the 'number of bits set' challenge and managed to compile
>>> and link the nbits C function, which went smoothly. However the function does
>>> crash my postgres server installation (8.3.3) with a segmentation fault each
>>> time I call it, for example SELECT nbits_set(B'1101');
>>> My C skills are very sparse and I am unable to debug the function; I have
>>> included the C code of this function below. Is there something I may have left
>>> out?
>>>
>>>
>>>
>>> #include "postgres.h"
>>> #include "utils/varbit.h"
>>> #include "fmgr.h"
>>> #ifdef PG_MODULE_MAGIC
>>> PG_MODULE_MAGIC;
>>> #endif
>>>
>>> PG_FUNCTION_INFO_V1(nbits_set);
>>> Datum
>>> nbits_set(PG_FUNCTION_ARGS)
>>> {
>>> /* how many bits are set in a bitstring? */
>>>   VarBit *a = PG_GETARG_VARBIT_P(0);
>>>   int n=0;
>>>   int i;
>>>   unsigned char *ap = VARBITS(a);
>>>   unsigned char aval;
>>>   for (i=0; i < VARBITBYTES(a); ++i) {
>>>   aval = *ap; ++ap;
>>>   if (aval == 0) continue;
>>>   if (aval & 1) ++n;
>>>   if (aval & 2) ++n;
>>>   if (aval & 4) ++n;
>>>   if (aval & 8) ++n;
>>>   if (aval & 16) ++n;
>>>   if (aval & 32) ++n;
>>>   if (aval & 64) ++n;
>>>   if (aval & 128) ++n;
>>>   }
>>>   PG_RETURN_INT32(n);
>>> }
>>>
>>> Allan
>>>
>>> Bruce Momjian wrote:
>>>>
>>>> Jean-David Beyer wrote:
>>>>
>>>>>
>>>>> TJ O'Donnell wrote:
>>>>>
>>>>>>
>>>>>> I use a c function, nbits_set that will do what you need.
>>>>>> I've posted the code in this email.
>>>>>>
>>>>>> TJ O'Donnell
>>>>>> http://www.gnova.com
>>>>>>
>>>>>> #include "postgres.h"
>>>>>> #include "utils/varbit.h"
>>>>>>
>>>>>> Datum   nbits_set(PG_FUNCTION_ARGS);
>>>>>> PG_FUNCTION_INFO_V1(nbits_set);
>>>>>> Datum
>>>>>> nbits_set(PG_FUNCTION_ARGS)
>>>>>> {
>>>>>> /* how many bits are set in a bitstring? */
>>>>>>
>>>>>> VarBit *a = PG_GETARG_VARBIT_P(0);
>>>>>> int n=0;
>>>>>> int i;
>>>>>> unsigned char *ap = VARBITS(a);
>>>&g