On Fri, Aug 5, 2016 at 8:55 PM, Claudio Freire <klaussfre...@gmail.com> wrote:
> On Fri, Aug 5, 2016 at 1:27 AM, Pavan Deolasee <pavan.deola...@gmail.com>
> wrote:
>>
>> I don't see why it is hard. Trying to find the index entry for the old
>> update row seems odd, and updating an index row seems odd, but skipping
>> an index addition for the new row seems simple. Is the problem
>> concurrent activity? I assumed already had that ability to add to HOT
>> chains because we have to lock the update chain.
>
> Think of an index over a 1TB table whose keys have only 2 values: true
> and false.
>
> Sure, it's a horrible index. But I've seen things like that, and I've
> seen cases when they're useful too.
>
> So, conceptually, for each key you have circa N/2 tids on the index.
> nbtree finds the leftmost valid insert point comparing keys, it
> doesn't care about tids, so to find the index entries that point to
> the page where the new tuple is, you'd have to scan the N/2 tids in
> the index, an extremely expensive operation.

Well, it's always going to be extremely hard to solve for all use cases.
This is one such extreme case, and we should just give up and do a cold
update. I think we can look at the index type (unique vs non-unique) along
with the table statistics to find what fraction of the column values are
distinct, and then estimate whether it's worthwhile to look for a
duplicate (key, CTID) or to just do a cold update. In addition, we can
put a cap on how hard we try once we decide to check for duplicates, and
give up after we cross that threshold.

Thanks,
Pavan

--
Pavan Deolasee            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
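The decision heuristic sketched above could look roughly like this. This is a minimal illustration only, not actual PostgreSQL code: the function names and the cap value are invented for the example, and the only borrowed convention is the `pg_stats.n_distinct` sign rule (positive = absolute count of distinct values, negative = fraction of the row count).

```python
# Hypothetical sketch of the heuristic described above: decide whether a
# duplicate (key, CTID) check is worth attempting, based on the expected
# number of index TIDs per key value estimated from table statistics.

DUPLICATE_SCAN_CAP = 100  # illustrative cap: give up past this many TIDs


def expected_tids_per_key(reltuples, n_distinct):
    """Estimate how many index TIDs share a single key value.

    Follows the pg_stats convention for n_distinct: positive values are
    an absolute count of distinct values; negative values are a fraction
    of the row count (e.g. -0.5 means half the rows are distinct).
    """
    if n_distinct < 0:
        n_distinct = -n_distinct * reltuples
    return reltuples / max(n_distinct, 1.0)


def should_check_for_duplicates(is_unique, reltuples, n_distinct):
    # A unique index has at most one live entry per key, so the lookup
    # is always cheap.
    if is_unique:
        return True
    # Non-unique index: only attempt the duplicate check when the
    # expected scan length stays under the cap; otherwise do a cold
    # update up front.
    return expected_tids_per_key(reltuples, n_distinct) <= DUPLICATE_SCAN_CAP


# The boolean-index example from the thread: 2 distinct values over a
# huge table means scanning ~N/2 TIDs, so fall back to a cold update.
print(should_check_for_duplicates(False, 100_000_000, 2))  # False
print(should_check_for_duplicates(True, 100_000_000, 2))   # True
```

The unique-index shortcut reflects the point in the reply that index type alone already settles the easy half of the question; the statistics only matter for the non-unique case.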