On 21 June 2012 11:40, Florian Pflug <f...@phlo.org> wrote:
> At this point, my theory is that your choice of "random" strings prevents
> strxfrm() from ever winning over strcoll(). The reason being that you pick
> each letter uniformly distributed from a-z, resulting in a probability of
> two string differing in the first of 1 - 1/26 =~ 96%. Thus, even for
> extremely complex collation rules, strcoll() probably only needs to compare
> a few characters to determine the order. Whereas strxfrm() has to compute
> the whole sort key, no matter what.

Good point.

> The question is thus, how good a model are your "random" strings for the
> input of a typical sorting step in postgres? My guess is, a quite good one
> actually, since people probably don't deal with a lot of very similar strings
> very often. Which makes we wonder if using strxfrm() during sorting wouldn't
> be a net loss, all things considered…

Well, I think the answer to that has to be no. I posted a simple
postgres text -> strxfrm blob bytea SQL-callable wrapper function a
few days ago that clearly wins by quite a bit on some sample queries
against the dellstore sample database, which has a representative
sample set. Completely uniformly-distributed data actually isn't
representative of the real world at all. Normal distributions abound.

This C++ program was never intended to justify the general utility of
using strxfrm() for sorting, which I don't believe is in question. I
just wanted to get some idea about the performance characteristics of
using strxfrm() to traverse an index.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to