On 5/7/20 4:19 AM, Amarendra Konda wrote:
Hi,

PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit

We have noticed huge difference interms of execution plan ( response time) , When we pass the direct values  Vs  inner query to IN clause.

High level details of the use case are as follows

  * As part of the SQL there are 2 tables named Process_instance
    (master) and Process_activity ( child)
  * Wanted to fetch TOP 50 rows from  Process_activity table for the
    given values of the Process_instance.
  * When we used Inner Join / Inner query ( query1)  between parent
    table and child table , LIMIT is not really taking in to account.
    Instead it is fetching more rows and columns that required, and
    finally limiting the result

It is doing what you told it to do which is SELECT all process_instance_i's for user_id='317079413683604' and app_id = '427380312000560' and then filtering further. I am going to guess that if you run the inner query alone you will find it returns ~23496 rows. You might have better results if you an actual join between process_activity and process_instance. Something like below(obviously not tested):

SELECT
    pa.process_activity_id
FROM
    process_activity pa
JOIN
    process_instance pi
ON
    pa.process_instance_id = pi.process_instance_id
WHERE
    pa.app_id = '427380312000560'
    AND
         pa.created > '1970-01-01 00:00:00'
    AND
         pi.user_id = '317079413683604'
ORDER BY
    pa.process_instance_id,
    pa.created
LIMIT 50;

The second query is not equivalent as you are not filtering on user_id and you are filtering on only three process_instance_id's.


  *


*Query1*

web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id  FROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in *_(SELECT pi.process_instance_id FROM process_instance pi WHERE pi.user_id = '317079413683604' AND pi.app_id = '427380312000560')_* ORDER BY pa.process_instance_id,pa.created limit 50;                                                                 QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  Limit  (cost=1071.47..1071.55 rows=31 width=24) (actual time=85.958..85.991 rows=50 loops=1)
    Output: pa.process_activity_id, pa.process_instance_id, pa.created
    Buffers: shared hit=43065
   ->  Sort  (cost=1071.47..1071.55 rows=31 width=24) (actual time=85.956..85.971 rows=50 loops=1)
          Output: pa.process_activity_id, pa.process_instance_id, pa.created
          Sort Key: pa.process_instance_id, pa.created
          Sort Method: top-N heapsort  Memory: 28kB
          Buffers: shared hit=43065
         ->  Nested Loop  (cost=1.14..1070.70 rows=31 width=24) (actual time=0.031..72.183 rows=46992 loops=1)                Output: pa.process_activity_id, pa.process_instance_id, pa.created
                Buffers: shared hit=43065
               ->  Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi  (cost=0.43..2.66 rows=1 width=8) (actual time=0.010..0.013 rows=2 loops=1)
                      Output: pi.process_instance_id
                      Index Cond: (pi.user_id = '317079413683604'::bigint)
                      Filter: (pi.app_id = '427380312000560'::bigint)
                      Buffers: shared hit=5
               ->  Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa  (cost=0.70..1053.80 rows=1425 width=24) (actual time=0.015..20.702 rows=*23496* loops=2) * Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_response_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.market ing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_translate, pa.in_reply_to*                      Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
                      Buffers: shared hit=43060
  Planning time: 0.499 ms
  Execution time: 86.040 ms
(22 rows)

*_Query 2_*

web_1=>  EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS m_process_activity_id FROM process_activity m WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (*240117466018927,325820556706970,433008275197305*) ORDER BY pa.process_instance_id,pa.created limit 50;                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.70..37.66 rows=50 width=24) (actual time=0.023..0.094 rows=50 loops=1)
    Output: process_activity_id, process_instance_id, created
    Buffers: shared hit=50
   ->  Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa  (cost=0.70..3124.97 rows=4226 width=24) (actual time=0.022..0.079 *rows=50* loops=1)
          Output: process_activity_id, process_instance_id, created
         Index Cond: ((pa.process_instance_id = ANY ('{140117466018927,225820556706970,233008275197305}'::bigint[])) AND (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
          Buffers: shared hit=50
  Planning time: 0.167 ms
  Execution time: 0.137 ms
(9 rows)


Can someone explain

  * Why It is fetching more columns and more rows, incase of inner query ?
  * Is there any option to really limit values with INNER JOIN, INNER
    query ? If yes, can you please share information on this ?

Thanks in advance for your time and suggestions.

Regards, Amar


--
Adrian Klaver
adrian.kla...@aklaver.com


Reply via email to