Hi Virendra, Thanks for your time.
Here is the table and index structure * process_activity* Table "public.process_activity" Column | Type | Modifiers --------------------+-----------------------------+---------------------------- process_activity_id | bigint | not null default next_id() process_activity_type | smallint | not null voice_url | text | process_activity_user_id | bigint | not null app_id | bigint | not null process_instance_id | bigint | not null alias | text | not null read_by_user | smallint | default 0 source | smallint | default 0 label_category_id | bigint | label_id | bigint | csat_response_id | bigint | process_activity_fragments | jsonb | created | timestamp without time zone | not null updated | timestamp without time zone | rule_id | bigint | marketing_reply_id | bigint | delivered_at | timestamp without time zone | reply_fragments | jsonb | status_fragment | jsonb | internal_meta | jsonb | interaction_id | text | do_not_translate | boolean | should_translate | integer | in_reply_to | jsonb | Indexes: "process_activity_pkey" PRIMARY KEY, btree (process_activity_id) "fki_process_activity_konotor_user_user_id" btree (process_activity_user_id) WITH (fillfactor='70') "*process_activity_process_instance_id_app_id_created_idx*" btree (process_instance_id, app_id, created) WITH (fillfactor='70') "process_activity_process_instance_id_app_id_read_by_user_created_idx" btree (process_instance_id, app_id, read_by_user, created) WITH (fillfactor='70') "process_activity_process_instance_id_idx" btree (process_instance_id) WITH (fillfactor='70') *process_instance* Table "public.process_instance" Column | Type | Modifiers -------------------------+-----------------------------+----------------------------- process_instance_id | bigint | not null default next_id() process_instance_alias | text | not null app_id | bigint | not null user_id | bigint | not null Indexes: "process_instance_pkey" PRIMARY KEY, btree (process_instance_id) "*fki_conv_konotor_user_user_id*" btree (user_id) WITH (fillfactor='70') Regards, Amarendra On Fri, May 8, 2020 at 12:01 AM Virendra Kumar <viru_7...@yahoo.com> wrote: > Sending table structure with indexes might help little further in > understanding. > > Regards, > Virendra > > On Thursday, May 7, 2020, 11:08:14 AM PDT, Amarendra Konda < > amar.vij...@gmail.com> wrote: > > > Hi David, > > In earlier reply, Over looked another condition, hence please ignore that > one > > Here is the correct one with all the needed conditions. According to the > latest one, exists also not limiting rows from the process_activity table. > > > EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id > AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = > '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( > SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND > *pi.process_instance_id > = pa.process_instance_id * AND pi.user_id = '137074931866340') ORDER BY > pa.process_instance_id, pa.created limit 50; > > > > QUERY PLAN > > > > > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ----------------------- > Limit (cost=1079.44..1079.52 rows=32 width=24) (actual > time=85.747..85.777 rows=50 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, pa.created > Buffers: shared hit=43070 > -> Sort (cost=1079.44..1079.52 rows=32 width=24) (actual > time=85.745..85.759 rows=50 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, pa.created > Sort Key: pa.process_instance_id, pa.created > Sort Method: top-N heapsort Memory: 28kB > Buffers: shared hit=43070 > -> Nested Loop (cost=1.14..1078.64 rows=32 width=24) (actual > time=0.025..72.115 rows=47011 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, > pa.created > Buffers: shared hit=43070 > -> Index Scan using fki_conv_konotor_user_user_id on > public.process_instance pi (cost=0.43..2.66 rows=1 width=16) (actual > time=0.010..0.015 rows=2 loops=1) > Output: pi.app_id, pi.process_instance_id > Index Cond: (c.user_id = '137074931866340'::bigint) > Filter: (c.app_id = '126502930200650'::bigint) > Buffers: shared hit=5 > -> Index Scan using > process_activity_process_instance_id_app_id_created_idx on > public.process_activity pa (cost=0.70..1061.62 rows=1436 width=32) *(actual > time=0.011..20.320 rows=23506 loops=2)* > Output: pa.process_activity_id, > pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, > pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, > pa.label_category_id, pa.label_id, pa.csat_respons > e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, > pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, > pa.status_fragment, pa.internal_meta, pa.interaction_id, > pa.do_not_translate, pa.should_tr > anslate, pa.in_reply_to > Index Cond: ((m.process_instance_id = > pi.process_instance_id) AND (m.app_id = '126502930200650'::bigint) AND > (m.created > '1970-01-01 00:00:00'::timestamp without time zone)) > Buffers: shared hit=43065 > Planning time: 0.455 ms > Execution time: 85.830 ms > > On Thu, May 7, 2020 at 11:19 PM Amarendra Konda <amar.vij...@gmail.com> > wrote: > > Hi David, > > Thanks for the reply.This has optimized number of rows. > > Can you please explain, why it is getting more columns in output, even > though we have asked for only one column ? > > > EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id > AS pa_process_activity_id FROM process_activity pa WHERE pa.app_id = > '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND EXISTS ( > SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND > pi.user_id = '137074931866340') ORDER BY pa.process_instance_id,m.created > limit 50; > > > > QUERY PLAN > > > > > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ----------------- > Limit (cost=1.14..37.39 rows=50 width=24) (actual time=821.283..891.629 > rows=50 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, pa.created > Buffers: shared hit=274950 > -> Nested Loop Semi Join (cost=1.14..266660108.78 rows=367790473 > width=24) (actual time=821.282..891.607 rows=50 loops=1) > Output: pa.process_activity_id, pa.process_instance_id, pa.created > Buffers: shared hit=274950 > -> Index Scan using > process_activity_process_instance_id_app_id_created_idx on > public.process_activity pa (cost=0.70..262062725.21 rows=367790473 > width=32) (actual time=821.253..891.517 rows=50 loops=1) > > > * Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, > pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, > pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, > pa.csat_response_id, m.process_activity_fragments, pa.created, pa.updated, > pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, > pa.status_fragment, pa.internal_meta, pa.interaction_id, > pa.do_not_translate, pa.should_translate, pa.in_reply_to* > Index Cond: ((m.app_id = '126502930200650'::bigint) AND > (m.created > '1970-01-01 00:00:00'::timestamp without time zone)) > Buffers: shared hit=274946 > -> Materialize (cost=0.43..2.66 rows=1 width=8) (actual > time=0.001..0.001 rows=1 loops=50) > Output: pi.app_id > Buffers: shared hit=4 > -> Index Scan using fki_conv_konotor_user_user_id on > public.process_instance pi (cost=0.43..2.66 rows=1 width=8) (actual > time=0.020..0.020 rows=1 loops=1) > Output: pi.app_id > Index Cond: (pi.user_id = '137074931866340'::bigint) > Filter: (pi.app_id = '126502930200650'::bigint) > Buffers: shared hit=4 > Planning time: 0.297 ms > Execution time: 891.686 ms > (20 rows) > > On Thu, May 7, 2020 at 9:17 PM David G. Johnston < > david.g.johns...@gmail.com> wrote: > > On Thu, May 7, 2020 at 7:40 AM Adrian Klaver <adrian.kla...@aklaver.com> > wrote: > > On 5/7/20 4:19 AM, Amarendra Konda wrote: > > Hi, > > > > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled > > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit > > > > We have noticed huge difference interms of execution plan ( response > > time) , When we pass the direct values Vs inner query to IN clause. > > > > High level details of the use case are as follows > > > > * As part of the SQL there are 2 tables named Process_instance > > (master) and Process_activity ( child) > > * Wanted to fetch TOP 50 rows from Process_activity table for the > > given values of the Process_instance. > > * When we used Inner Join / Inner query ( query1) between parent > > table and child table , LIMIT is not really taking in to account. > > Instead it is fetching more rows and columns that required, and > > finally limiting the result > > It is doing what you told it to do which is SELECT all > process_instance_i's for user_id='317079413683604' and app_id = > '427380312000560' and then filtering further. I am going to guess that > if you run the inner query alone you will find it returns ~23496 rows. > You might have better results if you an actual join between > process_activity and process_instance. Something like below(obviously > not tested): > > > What the OP seems to want is a semi-join: > > (not tested) > > SELECT pa.process_activity_id > FROM process_activity pa WHERE pa.app_id = '427380312000560' AND > pa.created > '1970-01-01 00:00:00' > AND EXISTS ( > SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND > pi.user_id = '317079413683604' > ) > ORDER BY > pa.process_instance_id, > pa.created limit 50; > > I'm unsure exactly how this will impact the plan choice but it should be > an improvement, and in any case more correctly defines what it is you are > looking for. > > David J. > >