On Wed, Dec 9, 2015 at 1:31 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> On 12/09/2015 12:24 AM, Berend Tober wrote: > >> Adrian Klaver wrote: >> >>> On 12/05/2015 08:08 AM, Berend Tober wrote: >>> >>>> /* >>>> >>>> Deletion Challenge >>>> >>>> I want to delete all but the most recent transaction, per person, from a >>>> table that records a transaction history because at some point the >>>> transaction history grows large enough to adversely effect performance, >>>> and also becomes less relevant for retention. >>>> >>>> ... >>>> >>>> >>> test=> delete from cash_journal where ARRAY[click, cash_journal_id] >>> NOT in (select max(ARRAY[click,cash_journal_id]) from cash_journal >>> group by fairian_id); >>> DELETE 7 >>> >>> test=> SELECT * FROM cash_journal order by fairian_id, click, >>> cash_journal_id; >>> click | cash_journal_id | fairian_id | debit | credit | balance >>> | description >>> >>> -------+-----------------+------------+-------+--------+---------+---------------------------------- >>> >>> 412 | 1 | 7 | 5 | | 14 | >>> Sold food quantity 7 units. >>> 37 | 7 | 8 | 8 | | 8 | >>> Ratified contract f1abd670358e03 >>> 37 | 9 | 9 | 7 | | 7 | >>> Ratified contract 1574bddb75c78a >>> 36 | 14 | 18 | 0 | 0 | 0 | >>> initial cash balance >>> 413 | 1 | 25 | | 995 | 0 | >>> Redeemed bond 7719a1c782a1ba >>> (5 rows) >>> >>> >> Nice. >> >> The idea of a NOT IN query had occurred to me briefly, but I failed to >> pursue it because at some point in the distant past I had gained the >> impression that NOT IN queries were not computationally efficient. >> During one round of testing I had like a million rows. I'll have to run >> some EXPLAIN query testing with a larger data sample for comparison. >> Thanks! >> > > Plan B: > > WITH d AS > (SELECT * FROM > cash_journal > LEFT JOIN > (SELECT > MAX(ARRAY[click,cash_journal_id]) AS mx > FROM > cash_journal > GROUP BY > fairian_id) > AS > mxa > ON > mxa.mx=ARRAY[click, cash_journal_id] > WHERE > mx IS NULL) > DELETE FROM > cash_journal > USING > d > WHERE > d.click = cash_journal.click > AND > d.cash_journal_id = cash_journal.cash_journal_id; > > Couldn't the LEFT JOIN relation in the CTE be better written using "SELECT DISTINCT ON (click, cash_journal_id) click, cash_journal_id [...] ORDER BY click DESC, cash_journal_id" or something similar? It doesn't seem like you should need to introduce an array and an aggregate here. It does have the negative property of only providing a single row; which excludes using it for the "last 5" part but I suspect it will be considerably faster for the single version. David J.