Hi, Just for testing... is there a fast (i.e. written in C) crc32 or a similar small hash function for PostgreSQL?
On 15 June 2016 at 16:00, Ivan Voras <ivo...@gmail.com> wrote: > Hi, > > This idea is similar to the substring one, and while it does give > excellent performance and small size, it requires application code > modifications, so it's out. > > > On 15 June 2016 at 15:58, julyanto SUTANDANG <julya...@equnix.co.id> > wrote: > >> Hi Ivan, >> >> How about using crc32 ? and then index the integer as the result of crc32 >> function? you can split the hash into 2 part and do crc32 2x ? and then >> create composite index on both integer (the crc32 result) >> instead of using 64 char, you only employ 2 integer as index key. >> >> Regards, >> >> Jul >> >> On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras <ivo...@gmail.com> wrote: >> >>> Hi, >>> >>> I understand your idea, and have also been thinking about it. Basically, >>> existing applications would need to be modified, however slightly, and that >>> wouldn't be good. >>> >>> >>> >>> >>> On 15 June 2016 at 15:38, hubert depesz lubaczewski <dep...@depesz.com> >>> wrote: >>> >>>> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote: >>>> > I have an application which stores a large amounts of hex-encoded hash >>>> > strings (nearly 100 GB of them), which means: >>>> >>>> Why do you keep them hex encoded, and not use bytea? >>>> >>>> I made a sample table with 1 million rows, looking like this: >>>> >>>> Table "public.new" >>>> Column | Type | Modifiers >>>> ---------+-------+----------- >>>> texthex | text | >>>> a_bytea | bytea | >>>> >>>> values are like: >>>> >>>> $ select * from new limit 10; >>>> texthex | >>>> a_bytea >>>> >>>> ------------------------------------------------------------------+-------------------------------------------------------------------- >>>> c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f | >>>> \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f >>>> 61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db | >>>> \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db >>>> 757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 | >>>> \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 >>>> fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 | >>>> \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 >>>> ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a | >>>> \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a >>>> 11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea | >>>> \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea >>>> 5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 | >>>> \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 >>>> 2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c | >>>> \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c >>>> 2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 | >>>> \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 >>>> 2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa | >>>> \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa >>>> (10 rows) >>>> >>>> created two indexes: >>>> create index i1 on new (texthex); >>>> create index i2 on new (a_bytea); >>>> >>>> i1 is 91MB, and i2 is 56MB. >>>> >>>> Index creation was also much faster - best out of 3 runs for i1 was >>>> 4928.982 >>>> ms, best out of 3 runs for i2 was 2047.648 ms >>>> >>>> Best regards, >>>> >>>> depesz >>>> >>>> >>> >> >> >> -- >> >> >> Julyanto SUTANDANG >> >> Equnix Business Solutions, PT >> (An Open Source an Open Mind Company) >> >> Pusat Niaga ITC Roxy Mas Blok C2/42. Jl. KH Hasyim Ashari 125, Jakarta >> Pusat >> T: +6221 22866662 F: +62216315281 M: +628164858028 >> >> >> Caution: The information enclosed in this email (and any attachments) may >> be legally privileged and/or confidential and is intended only for the use >> of the addressee(s). No addressee should forward, print, copy, or otherwise >> reproduce this message in any manner that would allow it to be viewed by >> any individual not originally listed as a recipient. If the reader of this >> message is not the intended recipient, you are hereby notified that any >> unauthorized disclosure, dissemination, distribution, copying or the taking >> of any action in reliance on the information herein is strictly prohibited. >> If you have received this communication in error, please immediately notify >> the sender and delete this message.Unless it is made by the authorized >> person, any views expressed in this message are those of the individual >> sender and may not necessarily reflect the views of PT Equnix Business >> Solutions. >> > >