On Mon, Dec 5, 2016 at 7:44 AM, Peter Geoghegan <p...@heroku.com> wrote:
> On Sat, Dec 3, 2016 at 7:23 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
> > I do share your concerns about unpredictable behavior - that's
> > particularly worrying for pg_restore, which may be used for
> > time-sensitive use cases (DR, migrations between versions), so
> > unpredictable changes in behavior / duration are unwelcome.
>
> Right.
>
> > But isn't this more a deficiency in pg_restore, than in CREATE INDEX?
> > The issue seems to be that the reltuples value may or may not get
> > updated, so maybe forcing ANALYZE (even very low statistics_target
> > values would do the trick, I think) would be more appropriate solution?
> > Or maybe it's time add at least some rudimentary statistics into the
> > dumps (the reltuples field seems like a good candidate).
>
> I think that there is a number of reasonable ways of looking at it. It
> might also be worthwhile to have a minimal ANALYZE performed by CREATE
> INDEX directly, iff there are no preexisting statistics (there is
> definitely going to be something pg_restore-like that we cannot fix --
> some ETL tool, for example). Perhaps, as an additional condition to
> proceeding with such an ANALYZE, it should also only happen when there
> is any chance at all of parallelism being used (but then you get into
> having to establish the relation size reliably in the absence of any
> pg_class.relpages, which isn't very appealing when there are many tiny
> indexes).
>
> In summary, I would really like it if a consensus emerged on how
> parallel CREATE INDEX should handle the ecosystem of tools like
> pg_restore, reindexdb, and so on. Personally, I'm neutral on which
> general approach should be taken. Proposals from other hackers about
> what to do here are particularly welcome.

Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia