On 30 October 2013 10:35, Leonardo Francalanci <m_li...@yahoo.it> wrote:
>> Presumably the data you are inserting isn't actually random. Please
>> describe the use case you are considering in more detail, with some
>> view on how frequent it is and some examples. Once we understand the
>> use case and agree it is important, we may be able to solve the
>> problem.
>
>
> Collecting call data for mobile network operators (and no, I don't
> work for the NSA...)
> Easily 5000-10000 inserts per second. Indexes on timestamp and ID
> (not a problem: always increasing, so no btree issues) and on called
> #, calling #, IMSI, and IMEI. The last four are effectively random,
> out of millions of possible values.
> After the first few million records, the disks can't keep up with
> the amount of random writing in the indexes. Workaround: the table
> is partitioned every 15 minutes, and the indexes are created in bulk
> once we "start" the next 15-minute partition. Searches on the
> current 15 minutes are not allowed (that partition is not yet
> indexed), and searches on older data cost K*log(N) (where K is the
> number of partitions).
> Yes, I could throw more disks at it, use SSDs, shard more, etc. But
> I still think that btrees just aren't fit for this kind of problem.
> I don't delete data, I don't update data, and there's not that much
> concurrency going on. I would sacrifice search speed (K*log(N) is
> already much slower than "regular" btree usage) for real-time
> insertion.
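
For concreteness, a minimal sketch of the rotation scheme you describe,
with hypothetical table and column names (plain tables; any actual
partitioning mechanism would work the same way):

  -- Open a fresh, index-less partition for the next 15-minute window;
  -- all inserts go here, so the writes stay sequential.
  CREATE TABLE calls_20131030_1045 (
      call_time  timestamptz NOT NULL,
      call_id    bigint      NOT NULL,
      called_no  text        NOT NULL,
      calling_no text        NOT NULL,
      imsi       text        NOT NULL,
      imei       text        NOT NULL
  );

  -- Once the window closes and inserts move to the next partition,
  -- index the now-static data in bulk.
  CREATE INDEX ON calls_20131030_1030 (called_no);
  CREATE INDEX ON calls_20131030_1030 (calling_no);
  CREATE INDEX ON calls_20131030_1030 (imsi);
  CREATE INDEX ON calls_20131030_1030 (imei);

  -- A search on older data then probes each closed partition's index:
  -- K partitions times a log(N) lookup in each.
  SELECT * FROM calls_20131030_1030 WHERE imsi = '123456789012345'
  UNION ALL
  SELECT * FROM calls_20131030_1015 WHERE imsi = '123456789012345';
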
>
> I don't think I'm the only one with a big system that needs to be
> indexed on random values.
>
> In fact, I didn't want to turn this thread into a "help me with this
> workload" thread. I just wanted to know whether there were other known
> issues with these "different indexes" beyond "not enough time to
> implement them correctly": I was afraid that someone had already
> dismissed them as "good in theory, bad in practice"...

What is the reason for needing such fast access to individual groups
of records? Sure sounds like the NSA or similar ;-)

Sacrificing timeliness for efficiency is a common solution. I'm seeing
lots of areas where being able to specify the timeliness that is
acceptable in a query opens up this and similar optimisations.
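
To make that idea concrete, a purely hypothetical syntax (nothing like
this exists in PostgreSQL today):

  -- The caller declares it can tolerate results up to 15 minutes
  -- stale, so the executor may skip the unindexed current partition.
  SELECT * FROM calls
  WHERE imsi = '123456789012345'
  WITH TOLERANCE INTERVAL '15 minutes';  -- hypothetical clause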

Indexes are a declarative solution. We would need to be able to
specify such tolerances declaratively to make this work. (You can
write your own index...)
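
Again purely hypothetical, the same tolerance attached declaratively to
the index rather than the query:

  -- Hypothetical storage parameter: the index may lag the heap by up
  -- to 15 minutes, letting inserts defer index maintenance and batch
  -- the random writes.
  CREATE INDEX ON calls (imsi) WITH (lag_tolerance = '15 min');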

In terms of generality, do you think it's worth a man-year of developer
effort to replicate what you have already achieved? Who would pay?

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

