On Thu, Aug 7, 2014 at 1:16 PM, Peter Geoghegan <p...@heroku.com> wrote: > On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas <robertmh...@gmail.com> wrote: >> So here. You may not agree that the mitigation strategies for which >> others are asking for are worthwhile, but you can't expect everyone >> else to agree with your assessment of which cases are likely to occur >> in practice. The case of a cohort of strings to be sorted which share >> a long fixed prefix and have different stuff at the end does not seem >> particularly pathological to me. It doesn't, in other words, require >> an adversary: some real-world data sets will look like that. I will >> forebear repeating examples I've given before, but I've seen that kind >> of thing more than once in real data sets that people (well, me, >> anyway) actually wanted to put into a PostgreSQL database. So I'm >> personally of the opinion that the time you've put into trying to >> protect against those cases is worthwhile. I understand that you may >> disagree with that, and that's fine: we're not all going to agree on >> everything. > > I actually agree with you here.
/me pops cork. > Sorting text is important, so we > should spend a lot of time avoiding regressions. Your example is > reasonable - I believe that people do that to a non-trivial degree. > The fact is, I probably can greatly ameliorate that case. However, to > give an example of a case I have less sympathy for, I'll name the case > you mention, *plus* the strings are already in logical order, and so > our qsort() can (dubiously) take advantage of that, so that there is a > 1:1 ratio of wasted strxfrm() calls and strcoll() tie-breaker calls. > There might be other cases like that that crop up. I think that's actually not a very unrealistic case at all. In general, I think that if a particular data distribution is a reasonable scenario, that data distribution plus "it's already sorted" is also reasonable. Data gets presorted in all kinds of ways: sometimes it gets loaded in sorted (or nearly-sorted) order from another data source; sometimes we do an index scan that produces data in order by column A and then later sort by A, B; sometimes somebody clusters the table. Respecting the right of other people to have different opinions, I don't think I'll ever be prepared to concede the presorted case as either uncommon or unimportant. That's not to prejudge anything that may or may not be in your patch, which I have not studied in enormous detail. It's just what I think about the subject in general. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers