[GENERAL] aggregate hash function

2008-01-30 Thread Matthew Dennis
I'm in need of an aggregate hash function, something like select
md5_agg(someTextColumn) from (select someTextColumn from someTable order by
someOrderingColumn) as t.  I know that there is an existing MD5 function, but
it is not an aggregate.  I have thought about writing a concat aggregate
function that would concatenate the input into one long string and then
calling MD5() on that, but that seems likely to perform badly (memory
consumption, possibly spilling to disk, many large memory copies, etc.),
since it would build up the entire concatenated string before hashing it.

I also thought about making an aggregate function that works by keeping the
MD5 result as a string in the state, then concatenating the new input with
the current state, hashing that and using it as the new state.  This avoids
building up a giant string only to traverse it once at the end to get the
MD5 sum.  This approach would actually work for me, but it doesn't give me
the actual MD5 sum of the data, which is what I really want.
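
For illustration, here is a minimal Python sketch of the chained approach
described above, next to the plain MD5 of the concatenated input.  The names
chained_md5 and concat_md5 and the empty-string initial state are assumptions
made for this example; they are not existing PostgreSQL functions.

```python
import hashlib


def chained_md5(values):
    """Fold values into a running digest: state = md5(state + value)."""
    state = ""  # assumed initial state: empty string before the first row
    for value in values:
        # Only the 32-character hex digest is carried forward, so the
        # state stays small no matter how many rows are processed.
        state = hashlib.md5((state + value).encode("utf-8")).hexdigest()
    return state


def concat_md5(values):
    """Reference result: MD5 of all values concatenated in order."""
    return hashlib.md5("".join(values).encode("utf-8")).hexdigest()


rows = ["alpha", "beta", "gamma"]
print(chained_md5(rows))  # order-sensitive fingerprint of the rows
print(concat_md5(rows))   # the actual MD5 of the data
```

With more than one row the two digests differ, so the chained value works as
an order-sensitive fingerprint but cannot be compared with an MD5 sum computed
elsewhere over the same data.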

comments/ideas/suggestions?


Re: [GENERAL] aggregate hash function

2008-01-30 Thread Vyacheslav Kalinin
Most implementations of MD5 internally consist of three functions: md5_init,
which initializes the internal context; md5_update, which accepts portions of
data and processes them; and md5_final, which finalizes the hash and
releases the context. These map roughly onto an aggregate's internal
functions (SFUNC and FINALFUNC; md5_init would probably be called on the
first actual input). Since performance is important to you, the functions
should be written in a low-level language such as C. To me it doesn't look
difficult to take some C MD5 module and adapt it to be an aggregate...
though it's not like I would do this easily myself :)
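
The mapping above can be sketched in Python, with hashlib standing in for a C
MD5 module.  This is only an illustration of the init/update/final shape, not
PostgreSQL code: md5_sfunc, md5_finalfunc and md5_agg are hypothetical names,
and skipping NULL (None) inputs is an assumption.

```python
import hashlib


def md5_sfunc(state, value):
    """State transition: create the context on first input, then feed one value."""
    if value is None:        # assumption: NULL inputs are skipped
        return state
    if state is None:        # first actual input: plays the role of md5_init
        state = hashlib.md5()
    state.update(value.encode("utf-8"))  # plays the role of md5_update
    return state


def md5_finalfunc(state):
    """Final step (md5_final): hex digest, or None if no input was seen."""
    return None if state is None else state.hexdigest()


def md5_agg(values):
    """Drive the two functions the way an aggregate would, one row at a time."""
    state = None
    for value in values:
        state = md5_sfunc(state, value)
    return md5_finalfunc(state)


rows = ["alpha", "beta", "gamma"]
# Same digest as hashing the concatenation, without building the big string.
print(md5_agg(rows) == hashlib.md5("".join(rows).encode("utf-8")).hexdigest())
```

The state here is the hash context itself, so memory use stays constant
regardless of how many rows are fed in.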


Re: [GENERAL] aggregate hash function

2008-01-30 Thread Matthew Dennis
On Jan 30, 2008 4:40 PM, Vyacheslav Kalinin [EMAIL PROTECTED] wrote:

> Most implementations of md5 internally consist of 3 functions: md5_init -
> which initializes internal context, md5_update - which accepts portions of
> data and processes them and md5_final - which finalizes the hash and
> releases the context. These roughly suit aggregate's internal functions
> (SFUNC and FINALFUNC, md5_init is probably to be called on first actual
> input). Since performance is important for you the functions should be
> written in low-level language as C, to me it doesn't look difficult to take
> some C md5 module and adapt it to be an aggregate... though it's not like I
> would do this easily myself :)


Yes, thank you, I'm aware of how MD5 works; that's precisely why I don't
like the idea of concatenating everything together first.  I was hoping
that, because PG already exposes an MD5 function, it was built on a standard
library and also exposed the constituent functions, and that I just wasn't
looking in the right place for them.  If it did, it would be pretty trivial
to use them as the SFUNC and FINALFUNC when creating an aggregate.

Thanks for the help.