Hi,

Just for testing... is there a fast (i.e. written in C) crc32 or a similar
small hash function for PostgreSQL?


On 15 June 2016 at 16:00, Ivan Voras <ivo...@gmail.com> wrote:

> Hi,
>
> This idea is similar to the substring one, and while it does give
> excellent performance and small size, it requires application code
> modifications, so it's out.
>
>
> On 15 June 2016 at 15:58, julyanto SUTANDANG <julya...@equnix.co.id>
> wrote:
>
>> Hi Ivan,
>>
>> How about using crc32 ? and then index the integer as the result of crc32
>> function? you can split the hash into 2 part and do crc32 2x ? and then
>> create composite index on both integer (the crc32 result)
>> instead of using 64 char, you only employ 2 integer as index key.
>>
>> Regards,
>>
>> Jul
>>
>> On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras <ivo...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I understand your idea, and have also been thinking about it. Basically,
>>> existing applications would need to be modified, however slightly, and that
>>> wouldn't be good.
>>>
>>>
>>>
>>>
>>> On 15 June 2016 at 15:38, hubert depesz lubaczewski <dep...@depesz.com>
>>> wrote:
>>>
>>>> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
>>>> > I have an application which stores a large amounts of hex-encoded hash
>>>> > strings (nearly 100 GB of them), which means:
>>>>
>>>> Why do you keep them hex encoded, and not use bytea?
>>>>
>>>> I made a sample table with 1 million rows, looking like this:
>>>>
>>>>      Table "public.new"
>>>>  Column  | Type  | Modifiers
>>>> ---------+-------+-----------
>>>>  texthex | text  |
>>>>  a_bytea | bytea |
>>>>
>>>> values are like:
>>>>
>>>> $ select * from new limit 10;
>>>>                              texthex                              |
>>>>                           a_bytea
>>>>
>>>> ------------------------------------------------------------------+--------------------------------------------------------------------
>>>>  c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f |
>>>> \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
>>>>  61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db |
>>>> \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
>>>>  757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 |
>>>> \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
>>>>  fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 |
>>>> \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
>>>>  ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a |
>>>> \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
>>>>  11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea |
>>>> \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
>>>>  5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 |
>>>> \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
>>>>  2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c |
>>>> \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
>>>>  2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 |
>>>> \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
>>>>  2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa |
>>>> \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
>>>> (10 rows)
>>>>
>>>> created two indexes:
>>>> create index i1 on new (texthex);
>>>> create index i2 on new (a_bytea);
>>>>
>>>> i1 is 91MB, and i2 is 56MB.
>>>>
>>>> Index creation was also much faster - best out of 3 runs for i1 was
>>>> 4928.982
>>>> ms, best out of 3 runs for i2 was 2047.648 ms
>>>>
>>>> Best regards,
>>>>
>>>> depesz
>>>>
>>>>
>>>
>>
>>
>> --
>>
>>
>> Julyanto SUTANDANG
>>
>> Equnix Business Solutions, PT
>> (An Open Source an Open Mind Company)
>>
>> Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta
>> Pusat
>> T: +6221 22866662 F: +62216315281 M: +628164858028
>>
>>
>> Caution: The information enclosed in this email (and any attachments) may
>> be legally privileged and/or confidential and is intended only for the use
>> of the addressee(s). No addressee should forward, print, copy, or otherwise
>> reproduce this message in any manner that would allow it to be viewed by
>> any individual not originally listed as a recipient. If the reader of this
>> message is not the intended recipient, you are hereby notified that any
>> unauthorized disclosure, dissemination, distribution, copying or the taking
>> of any action in reliance on the information herein is strictly prohibited.
>> If you have received this communication in error, please immediately notify
>> the sender and delete this message.Unless it is made by the authorized
>> person, any views expressed in this message are those of the individual
>> sender and may not necessarily reflect the views of PT Equnix Business
>> Solutions.
>>
>
>

Reply via email to