> From: Robert Haas [mailto:robertmh...@gmail.com]
> On Mon, Aug 6, 2012 at 10:33 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmh...@gmail.com> writes:
> >> On Sun, Aug 5, 2012 at 10:41 PM, Etsuro Fujita
> >> <fujita.ets...@lab.ntt.co.jp> wrote:
> >>> I think file_fdw is useful for managing log files such as PG CSV logs. Since
> >>> often, such files are sorted by timestamp, I think the patch can improve the
> >>> performance of log analysis, though I have to admit my demonstration was not
> >>> realistic.
> >
> >> Hmm, I guess I could buy that as a plausible use case.
> >
> > In the particular case of PG log files, I'd bet good money against them
> > being *exactly* sorted by timestamp. Clock skew between backends, or
> > varying amounts of time to construct and send messages, will result in
> > small inconsistencies. This would generally not matter, until the
> > planner relied on the claim of sortedness for something like a mergejoin
> > ... and then it would matter a lot.
>
> Hmm, true.
>
> > In general I'm quite suspicious of the idea of believing that externally
> > supplied data is sorted in exactly the way that PG thinks it should
> > sort. If we implement this you can bet that people will screw up, for
> > instance by using the wrong locale/collation to sort text data.
>
> I think that optimizations like this are going to be essential for
> things like pgsql_fdw (or other_rdms_fdw). Despite the thorny
> semantic issues, we're just not going to be able to get around it.
> There will even be people who want SELECT * FROM ft ORDER BY 1 to
> order by the remote side's notion of ordering rather than ours,
> despite the fact that the remote side has some insane-by-PG-standards
> definition of ordering. People are going to find ways to do that kind
> of thing whether we condone it or not, so we might as well start
> thinking now about how we're going to live with it.
> But that doesn't
> answer the question of whether or not we ought to support it for
> file_fdw in particular, which seems like a more arguable point.

For file_fdw, I am inclined to simply have it (1) verify that the key column
is sorted in the specified way at execution time, i.e., during the (first)
scan of a data file, and only when pathkeys are set, and (2) abort the
transaction if it detects that the data file is not sorted.

Thanks,

Best regards,
Etsuro Fujita

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
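
To make the proposed check concrete, here is a minimal, language-neutral sketch in Python. It is not file_fdw code. The names `scan_verifying_order`, `UnsortedInputError`, and the `check_order` flag (standing in for "pathkeys are set") are hypothetical, and raising an exception stands in for aborting the transaction. The comparison uses Python's own ordering, so it does not address the locale/collation concern raised in the quoted text.

```python
from typing import Any, Callable, Iterable, Iterator


class UnsortedInputError(Exception):
    """Hypothetical stand-in for aborting the transaction."""


def scan_verifying_order(
    rows: Iterable[Any],
    key: Callable[[Any], Any],
    check_order: bool = True,
    descending: bool = False,
) -> Iterator[Any]:
    """Yield rows in file order, checking the claimed sort order as we go.

    check_order plays the role of "pathkeys are set": when it is False the
    scan does no verification at all.
    """
    have_previous = False
    previous = None
    for lineno, row in enumerate(rows, start=1):
        if check_order:
            current = key(row)
            if have_previous:
                # Equal keys are allowed; only a strict inversion is an error.
                if descending:
                    out_of_order = current > previous
                else:
                    out_of_order = current < previous
                if out_of_order:
                    raise UnsortedInputError(
                        f"row {lineno}: key {current!r} breaks the declared "
                        f"order after {previous!r}"
                    )
            previous = current
            have_previous = True
        yield row
```

Because the check runs row by row during the scan, an out-of-order row is only detected when the scan reaches it, so rows before that point may already have been returned to the caller. That is why the proposal pairs the check with aborting the whole transaction rather than merely stopping the scan.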