After bumping up work_mem from 12MB to 25MB that last materialize is indeed hashing and this cut the query time by about 60%. Thanks, this was very helpful and gives me something else to look for when troubleshooting explains.
On Tue, Jul 7, 2015 at 11:10 AM, Mike Broers <mbro...@gmail.com> wrote: > Thanks, very informative! I'll experiment with work_mem settings and > report back. > > On Tue, Jul 7, 2015 at 11:02 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: > >> Mike Broers <mbro...@gmail.com> writes: >> > I had a query that was filtering with a wildcard search of a text field >> for >> > %SUCCESS%. The query took about 5 seconds and was running often so I >> wanted >> > to improve it. I suggested that the engineers include a new boolean >> column >> > for successful status. They implemented the requested field, but the >> query >> > that filters on that new column runs very long (i kill it after letting >> it >> > run for about an hour). Can someone help me understand why that is the >> > case and how to resolve it? >> >> It's hashing the subplan output in the first case and not the second: >> >> > Seq Scan on lead (cost=130951.81..158059.21 rows=139957 width=369) >> (actual >> > time=4699.619..4699.869 rows=1 loops=1) >> > Filter: ((NOT (hashed SubPlan 1)) AND (("ReferenceNumber")::text <> >> > ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (hashed SubPlan 3)))) >> ^^^^^^^^^^^^^^^^ >> vs >> >> > Seq Scan on lead (cost=85775.78..9005687281.12 rows=139957 >> width=369) >> > Filter: ((NOT (hashed SubPlan 1)) AND (("ReferenceNumber")::text <> >> > ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (SubPlan 3)))) >> ^^^^^^^^^ >> >> Presumably, the new more-accurate rows count causes the planner to realize >> that the hash table will exceed work_mem so it doesn't choose to hash ... >> but for your situation, you'd rather it did, because what you're getting >> instead is a Materialize node that spills to disk (again, because the data >> involved exceeds work_mem) and that's a killer for this query. You should >> be able to get back the old behavior if you raise work_mem enough. >> >> Another idea you might think about is changing the OR'd IN conditions >> to a single IN over a UNION ALL of the subselects. I'm not really sure if >> that would produce a better plan, but it's worth trying if it wouldn't >> require too much app-side contortion. >> >> regards, tom lane >> > >