> Presumably the data you are inserting isn't actually random. Please
> describe the use case you are considering in more detail and some view
> on how frequent that is, with some examples. Once we understand the
> use case and agree it is important, we might be able to solve the problem.


Collecting call data for mobile network operators (and no, I don't work for 
the NSA...)
Easily 5000-10000 inserts per second. Indexes on timestamp and ID (not a 
problem: they're always increasing, so no btree issues) and on called #, 
calling #, IMSI, IMEI. The last four are effectively random, drawn from 
millions of possible values.
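
For concreteness, a minimal sketch of what such a table might look like 
(the table and column names here are made up for illustration, not our 
actual schema):

    CREATE TABLE cdr (
        id          bigserial,            -- always increasing: btree-friendly
        ts          timestamptz NOT NULL, -- always increasing: btree-friendly
        called_no   text NOT NULL,        -- effectively random
        calling_no  text NOT NULL,        -- effectively random
        imsi        text NOT NULL,        -- effectively random
        imei        text NOT NULL         -- effectively random
    );
    CREATE INDEX ON cdr (id);
    CREATE INDEX ON cdr (ts);
    CREATE INDEX ON cdr (called_no);
    CREATE INDEX ON cdr (calling_no);
    CREATE INDEX ON cdr (imsi);
    CREATE INDEX ON cdr (imei);

Every insert touches all six indexes; the first two append to the 
rightmost leaf page, while the other four each dirty an essentially 
random leaf page.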
After the first few million records, the disks can't keep up with the 
amount of random writing the indexes generate. Workaround: the table is 
partitioned every 15 minutes, and indexes are created in bulk after we 
"start" the new 15-minute partition. Searches on the current 15 minutes 
are not allowed (it isn't indexed yet), and searches on older data cost 
K*log(N), where K is the number of partitions, since each partition's 
index has to be probed separately.
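
Roughly, the rotation looks like this (just a sketch; the naming scheme 
and the switchover mechanics are simplified):

    -- every 15 minutes: close the current partition and open a fresh,
    -- index-less one that receives the new inserts
    ALTER TABLE cdr_current RENAME TO cdr_p0042;
    CREATE TABLE cdr_current (LIKE cdr INCLUDING DEFAULTS);

    -- then index the closed partition in one pass: a bulk index build
    -- is one big sort, not millions of random leaf-page writes
    CREATE INDEX ON cdr_p0042 (called_no);
    CREATE INDEX ON cdr_p0042 (calling_no);
    CREATE INDEX ON cdr_p0042 (imsi);
    CREATE INDEX ON cdr_p0042 (imei);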
Yes, I could throw more disks at it, use SSDs, shard more, etc. etc. But I 
still think btrees just aren't a fit for this kind of problem. I don't 
delete data, I don't update data, and there isn't much concurrency going 
on. I would gladly sacrifice search speed (K*log(N) is already much slower 
than "regular" btree usage) for real-time insertion.

I don't think I'm the only one with a big system that has to be indexed 
on random values.

In fact, I didn't want to turn this thread into a "help me with this 
workload" thread. I just wanted to know whether there are any known issues 
with these "different indexes" beyond "not enough time to implement them 
correctly": I was afraid someone had already dismissed them as "good in 
theory, bad in practice"...


