Re: How to solve my slow disk i/o throughput during index scan

2024-07-11 Thread David G. Johnston
On Thursday, July 11, 2024, FREYBURGER Simon (SNCF VOYAGEURS / DIRECTION
GENERALE TGV / DM RMP YIELD MANAGEMENT)  wrote:

> Also, it might not be related, but I have suspiciously similar slow reads
> when I am inserting into the database. Could it be related?
> I'm using a 3-step process to insert my lines into the table:
>
>- COPY into a temporary table
>- DELETE FROM on the perimeter I will be inserting into
>- INSERT … INTO mytable SELECT … FROM temporarytable ON CONFLICT DO
>NOTHING
>
>
>
> Is it possible to parallelize the scans during the modify step ?
>
>
>
This tells you when parallelism is used:


https://www.postgresql.org/docs/current/when-can-parallel-query-be-used.html
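
For reference, the load pattern described above is roughly this (untested,
table and key names invented):

CREATE TEMP TABLE staging (LIKE mytable INCLUDING ALL);
COPY staging FROM STDIN;
-- "the perimeter I will be inserting into":
DELETE FROM mytable m USING staging s WHERE m.key = s.key;
INSERT INTO mytable SELECT * FROM staging ON CONFLICT DO NOTHING;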

David J.


Re: Specific objects backup in PostgreSQL

2024-07-10 Thread David G. Johnston
On Wed, Jul 10, 2024 at 11:05 AM nikhil kumar 
wrote:

>
> We received a request from client. They required all functions, stored
> procedures and triggers backup. can anyone please let me know. How to take
> backup only above objects.
>

This hardly qualifies as a performance question.

You might try the -general list if you want to brainstorm workarounds
because pg_dump itself doesn't provide any command line options to give you
this specific subset of your database.
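
That said, a rough way to pull those definitions out of the catalogs
yourself (untested; filter schemas to taste):

-- function and procedure bodies:
SELECT pg_get_functiondef(p.oid)
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE p.prokind IN ('f', 'p')
  AND n.nspname NOT IN ('pg_catalog', 'information_schema');

-- trigger definitions:
SELECT pg_get_triggerdef(t.oid) FROM pg_trigger t WHERE NOT t.tgisinternal;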

David J.


Re: Need help on configuration SMTP

2024-06-12 Thread David G. Johnston
On Wednesday, June 12, 2024, nikhil kumar  wrote:

>
> Can anyone please help on SMTP configuration for send gmail. If any
> document please let me know.
>

This seems like an exceedingly unusual place to be asking for such help…

David J.


Re: Parallel hints in PostgreSQL with consistent performance

2024-01-02 Thread David G. Johnston
On Tue, Jan 2, 2024 at 8:12 AM mohini mane  wrote:

>
> I have executed the SELECT query with 2,4 & 6 parallel degree however
>>> every time only 4 workers launched & there was a slight increase in
>>> Execution time as well,
>>>
>>
>> Adding an ignored comment to your SQL would not be expected to do
>> anything.  So it is not surprising that it does not do anything about
>> the number of workers launched.  It is just a comment.  A note to the human
>> who is reading the code.
>> As I am using the pg_hint_plan extension, as expected hints should not
>> get ignored by the optimizer.
>>
>
Sounds like a bug you should go tell the pg_hint_plan authors about then.

David J.


Re: Parallel hints in PostgreSQL with consistent performance

2023-12-28 Thread David G. Johnston
On Thursday, December 28, 2023, mohini mane 
wrote:

> Thank you for your response !!
> I am experimenting with SQL query performance for SELECT queries on large
> tables and I observed that changing/increasing the degree of parallel hint
> doesn't give the expected performance improvement.
>
> I have executed the SELECT query with 2,4 & 6 parallel degree however
> every time only 4 workers launched & there was a slight increase in
> Execution time as well, why there is an increase in execution time with
> parallel degree 6 as compared to 2 or 4?
>

Random environmental effects.

Also, analyzing a performance test without understanding how “buffers” are
used is largely pointless.

Whatever told you about that comment syntax is hallucinating.

Please don’t reply by top-posting. Inline reply to the comments others make
directly and trim as needed.  Simply restating your first email isn’t
productive.

You cannot enforce the number of workers used, only the maximum.  That is
your knob.
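
For example:

SET max_parallel_workers_per_gather = 6;  -- a ceiling, not a demand
-- workers actually launched still depend on the plan, table size, and the
-- server-wide max_parallel_workers / max_worker_processes pools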

David J.


Re: Awkward Join between generate_series and long table

2023-11-08 Thread David G. Johnston
On Wed, Nov 8, 2023 at 6:26 PM Lincoln Swaine-Moore 
wrote:

> SELECT
> s at time zone 'utc'  AS period_start,
> LEAD(s) OVER (ORDER BY s) at time zone 'utc' AS period_end
>

Maybe it doesn't help overall, but this can be equivalently written as:
s + '1 day'::interval AS period_end

Resorting to a window function here is an expensive waste; the lead() value
can be computed, not queried.


> SELECT
> p.period_start,
> p.period_end,
> COUNT (distinct d.id)
> FROM
> periods p
> LEFT JOIN data d
> ON
> d.timestamp >= (p.period_start)
> AND d."timestamp" < (p.period_end)
> AND d.sn = 'BLAH'
>

This seems better written (semantically, not sure about execution dynamics)
as:

FROM periods AS p
LEFT JOIN LATERAL (SELECT count(distinct? d.id) FROM data AS d WHERE
d.timestamp >= p.period_start AND d.timestamp < p.period_end AND d.sn =
'BLAH') AS cnt_d
-- NO grouping required at this query level
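
Fleshed out (still untested), that is:

SELECT p.period_start, p.period_end, cnt_d.ct
FROM periods AS p
LEFT JOIN LATERAL (
    SELECT count(DISTINCT d.id) AS ct
    FROM data AS d
    WHERE d.timestamp >= p.period_start
      AND d.timestamp < p.period_end
      AND d.sn = 'BLAH'
) AS cnt_d ON true;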

David J.


Re: Function call very slow from JDBC/java but super fast from DBear

2023-08-09 Thread David G. Johnston
On Tuesday, August 8, 2023, An, Hongguo (CORP)  wrote:

> Hi:
>
> I have a function, if I call it from DBeaver, it returns within a minute.
>
> *call* commonhp.run_unified_profile_load_script_work_assignment_details(
> 'BACDHP', 'G3XPM6YE2JHMSQA2');
>
> but if I called it from spring jdbc template, it never comes back:
>

If you are passing in exactly the same text values and running this within
the same database, then executing the CALL command via any method should
result in exactly the same outcome in terms of execution.  You can get
shared buffer cache variations, but that should be it.  I'd want to exhaust
the possibility that those preconditions are not met before investigating
this as some sort of bug - at least, a PostgreSQL bug.  I suppose the
relative novelty of using CALL within the JDBC driver leaves open the
possibility of an issue there.  Removing the driver by using PREPARE may
help to move things forward (as part of a well-written bug report).  In any
case, this is not looking to be on-topic for the performance list.  Do some
more research and proving out, then send it to either the jdbc list or
maybe -bugs; if not, then -general.

You may want to install the auto-explain extension to get the inner query
explain plan(s) into the logs for examination.
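
For example (session-level here; the module can also be preloaded
server-wide):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '30s';
SET auto_explain.log_nested_statements = on;  -- statements inside the procedure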

David J.


Re: Index on (fixed size) bytea value

2023-06-19 Thread David G. Johnston
On Mon, Jun 19, 2023 at 1:05 PM Les  wrote:

> AFAIK PostgreSQL does not allow a row to occupy multiple blocks.
>

Your plan is going to heavily involve out-of-band storage.  Please read up
on it here:

https://www.postgresql.org/docs/current/storage-toast.html

> I'm not looking for a definitive answer, just trying to get some hints from
> more experienced users before I fill up the drives with terabytes of data.
>
>
Store a hash (and other metadata, like the hashing algorithm) as well as
the path to some system better designed for object storage and retrieval
instead.
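
Something along these lines (untested, names invented):

CREATE TABLE stored_file (
    id bigserial PRIMARY KEY,
    hash_algorithm text NOT NULL DEFAULT 'sha256',
    content_hash bytea NOT NULL,
    storage_path text NOT NULL,  -- e.g. an object-store key
    UNIQUE (hash_algorithm, content_hash)
);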

David J.


Performance issues in query with multiple joins

2023-04-28 Thread David G. Johnston
On Friday, April 28, 2023, Παρασκευη Πασσαρη 
wrote:

> Dear all,
>
> We are facing a performance issue with the following query. Executing this
> query takes about 20 seconds.
> (the database version is 14.1)
>

Given the possibility of this working better in the supported 14.7, I
suggest starting with a minor version update, then posting again if still
having problems.

David J.


Re: Domain check taking place unnecessarily?

2023-02-08 Thread David G. Johnston
On Wed, Feb 8, 2023 at 11:01 AM Mark Hills  wrote:

>
> CREATE DOMAIN hash AS text
> CHECK (VALUE ~ E'^[a-zA-Z0-9]{8,32}$');
>
> devstats=> ALTER TABLE invite ADD COLUMN test hash;
> ALTER TABLE
> Time: 30923.380 ms (00:30.923)
>

Necessarily, I presume, because if you decided that the check on the domain
should be "value is not null" (don't do this though...) the column addition
would have to fail for existing rows (modulo defaults...).

David J.


Re: alter table xxx set unlogged take long time

2022-07-26 Thread David G. Johnston
On Tue, Jul 26, 2022 at 5:45 AM James Pang (chaolpan) 
wrote:

> Without step 3, the COPY takes a long time.  Can wal_level=minimal help
> make COPY load data without logging?
>
>
I believe you are referring to:

https://www.postgresql.org/docs/current/populate.html#POPULATE-COPY-FROM

Since the final state of your table will be "logged", relying on the above
optimization is the correct path.  If you instead enable "logged" at the
end, even with wal_level=minimal, you do not benefit from the optimization
and thus your data ends up being written to WAL.
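
The optimization itself looks like this (untested; it requires creating or
truncating the table in the same transaction as the COPY, with wal_level =
minimal):

BEGIN;
TRUNCATE my_big_table;  -- or CREATE TABLE ... in this same transaction
COPY my_big_table FROM '/path/to/file.csv' WITH (FORMAT csv);
COMMIT;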

Either way, it is overall time that matters; it's no use boasting that the
COPY is fast if you end up spending hours waiting for the ALTER TABLE at
the end.

David J.


Re: Missed query planner optimization: `n in (select q)` -> `n in (q)`

2022-06-12 Thread David G. Johnston
On Sun, Jun 12, 2022 at 2:47 PM Josh  wrote:

>
> This was only possible because I was dealing with arrays though, and an
> operation such as `in (select unnest...)` can be easily converted to `=
> any(...)`. However for the general case,


In the general case you don't have subqueries inside join conditions.
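
For reference, the array-specific rewrite described above, against an
invented table:

SELECT * FROM items WHERE id IN (SELECT unnest('{1,2,3}'::bigint[]));
-- hand-converted equivalent:
SELECT * FROM items WHERE id = ANY ('{1,2,3}'::bigint[]);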


> I believe an optimization in this area may provide benefit as there may
> exist a circumstance that does not have an alternative to a sub-query
> select (`= any()` was my alternative)


I think we'd want a concrete example of a non-poorly written query (or at
least a poorly written one that, say, is generated by a framework, not just
inexperienced human SQL writers) before we'd want to even entertain
spending time on something like this.


> - Is this an issue that should be fixed?


Probably not worth the effort.

> I'm fascinated by the work y'all do, and submitting a patch to Postgres
> that makes it into production would make my week.
>
>
Maybe you'll find almost as much good is done helping others get their
patches committed.  There are many in need of reviewers.

https://commitfest.postgresql.org/


REINDEXdb performance degrading gradually PG13.4

2022-05-31 Thread David G. Johnston
On Tuesday, May 31, 2022, Praneel Devisetty 
wrote:

>
> Initially it was processing 1000 tables per minute. Performance is
>> gradually dropping and now after 24 hr it was processing 90 tables per
>> minute.
>>
>
That seems like a fairly problematic metric given the generally vast
disparities in table sizes.

Building indexes is so IO-heavy that the non-IO bottlenecks that exist
likely have minimal impact on the overall time this rebuild-everything will
take.  That said, I’ve never done anything at this scale before.  I
wouldn’t be too surprised if per-session cache effects are coming into play
given the number of objects involved and the assumption that each session
used for parallelism is persistent.  I’m not sure how the parallelism works
for managing the work queue though, as it isn’t documented and I haven’t
inspected the source code.


Re: Query Tuning related to function

2022-04-17 Thread David G. Johnston
On Sun, Apr 17, 2022 at 8:53 AM Kumar, Mukesh 
wrote:

> We request you to please provide some assistance on below issue and it is
> impacting the migration project.
>

I suggest you try and re-write the loop-based function into a set-oriented
view.

Specifically, I think doing: "array_agg(DISTINCT paymenttype)" and then
checking for various array results will be considerably more efficient.

Or do a combination: write the set-oriented query in an SQL function.  You
should not need pl/pgsql for this and avoiding it should improve
performance.
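
An untested sketch of that combination (all names invented):

CREATE FUNCTION order_payment_kind(p_order_id bigint) RETURNS text
LANGUAGE sql STABLE AS $$
    SELECT CASE
             WHEN types = ARRAY['CASH']::text[] THEN 'cash only'
             WHEN types = ARRAY['CREDIT']::text[] THEN 'credit only'
             ELSE 'mixed'
           END
    FROM (SELECT array_agg(DISTINCT paymenttype ORDER BY paymenttype) AS types
          FROM payments WHERE order_id = p_order_id) s;
$$;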

David J.

p.s., The convention on these lists is to inline post and remove unneeded
context.  Or at least bottom post.


Re: Slow Planning Times

2022-04-06 Thread David G. Johnston
On Wed, Apr 6, 2022 at 5:27 PM Saurabh Sehgal  wrote:

>
> I have the following query:
>
>  explain (analyze, costs, timing)
> SELECT rr.*
> FROM rpc rpc
> INNER JOIN rr rr ON rr.uuid = rpc.rr_id
> INNER JOIN rs rs ON rs.r_id = rpc.r_id
> INNER JOIN role r ON r.uuid = rs.r_id
> LEFT JOIN spc spc ON spc.rr_id = rpc.rr_id
> WHERE rs.s_id = 'caa767b8-8371-43a3-aa11-d1dba1893601'
> and spc.s_id = 'caa767b8-8371-43a3-aa11-d1dba1893601'
> and spc.rd_id = '9f33c45a-90c2-4e05-a42e-048ec1f2b2fa'
> AND rpc.rd_id = '9f33c45a-90c2-4e05-a42e-048ec1f2b2fa'
> AND rpc.c_id = '9fd29fdc-15fd-40bb-b85d-8cfe99734987'
> and spc.c_id = '9fd29fdc-15fd-40bb-b85d-8cfe99734987'
> AND rr.b_id = 'xyz'
> AND (('GLOBAL' = ' NO_PROJECT_ID + "' ) OR (rr.p_id = 'GLOBAL'))
> AND spc.permission_type IS null and spc.is_active = true
> AND rpc.is_active = true AND rr.is_active = true AND rs.is_active = true
> AND r.is_active = true
>
>
> I don't think it is super complex. But when I run explain analyze on this
> I get the following:
>
> Planning Time: 578.068 ms
> Execution Time: 0.113 ms
>
> This is a huge deviation in planning vs. execution times. The explain plan
> looks good since the execution time is < 1ms. It doesn't matter though
> since the planning time is high. I don't see anything in the explain
> analyze output that tells me why the planning time is high. On average, the
> tables being joined have 3 indexes/table. How can I debug this?
>
> Been stuck on this for weeks. Any help is appreciated. Thank you!
>
>
The fundamental issue here is that you have basically 12 conditions across
5 tables that need to be evaluated to determine which one of the 1,680
possible join orders is the most efficient.  The fact that you have 5
is_active checks and 3 pairs of matching UUID checks seems odd and if you
could reduce those 11 to 4 I suspect you'd get a better planning time.
Though it also may produce an inferior plan...thus consider the following
option:

Assuming the ideal plan shape for your data doesn't change you can read the
following and basically tell the planner to stop trying so hard and just
trust the join order that exists in the query.

https://www.postgresql.org/docs/current/explicit-joins.html
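
In short (session-level; this assumes the join order as written is already
good):

SET join_collapse_limit = 1;  -- and from_collapse_limit, if subqueries are folded in
-- then run the query with the joins listed in the desired order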

Lastly, if you can leverage prepared statements you can at least amortize
the cost (depending on whether a generic plan performs sufficiently
quickly).

I'll admit I'm no expert at this.  I'd probably just follow the
join_collapse_limit advice and move on if it works.  Maybe add a periodic
check to see if anything has changed.

David J.


Re: Simple task with partitioning which I can't realize

2022-03-01 Thread David G. Johnston
On Tue, Mar 1, 2022 at 9:37 AM Andrew Zakharov  wrote:

> David, - yes, creation composite foreign/primary key is not a problem. But
> the main question is what method should I use for partitioning by composite
> key gid, region_code?
>

The convention here is to inline or bottom-post responses.

Your original plan - list partitions by region_code.  You couldn't do that
before because you weren't seeing the region_code as being part of your PK
and all partition columns must be part of the PK.  My suggestion is that
instead of figuring out how to work around that limitation (not that I
think there is a good one to be had) you accept it and just add region_code
to the PK.

David J.


Re: Simple task with partitioning which I can't realize

2022-03-01 Thread David G. Johnston
On Tue, Mar 1, 2022 at 8:37 AM Andrew Zakharov  wrote:

> create table region_hierarchy(
>
>   gid uuid not null default uuid_generate_v1mc(),
>
>   parent_gid uuid null,
>
>   region_code int2,
>
>
>


> I’ve carefully looked thru docs/faqs/google/communities and found out that
> I must include “gid” field into partition key because a primary key field.
>

Yes, you are coming up against the following limitation:

"Unique constraints (and hence primary keys) on partitioned tables must
include all the partition key columns. This limitation exists because the
individual indexes making up the constraint can only directly enforce
uniqueness within their own partitions; therefore, the partition structure
itself must guarantee that there are not duplicates in different
partitions."

https://www.postgresql.org/docs/current/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE

That limitation is independent of partitioning; i.e., the legacy
inheritance option doesn't bypass it.

Thus, your true "key" is composite: (region, identifier).  You therefore
need to add a "parent_region_code" column as well, redefine the PK as
(region_code, gid), and change the REFERENCES clause to link the two paired
fields.
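
Roughly (untested, building on your definitions; note that foreign keys
referencing a partitioned table require v12 or later):

CREATE TABLE region_hierarchy (
    gid uuid NOT NULL DEFAULT uuid_generate_v1mc(),
    parent_gid uuid NULL,
    parent_region_code int2 NULL,
    region_code int2 NOT NULL,
    PRIMARY KEY (region_code, gid),
    FOREIGN KEY (parent_region_code, parent_gid)
        REFERENCES region_hierarchy (region_code, gid)
) PARTITION BY LIST (region_code);

CREATE TABLE region_hierarchy_r77 PARTITION OF region_hierarchy
    FOR VALUES IN (77);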

You can decide whether that is sufficient or if you want some added comfort
in ensuring that a gid cannot appear in multiple regions by creating a
single non-partitioned table containing all gid values and add a unique
constraint there.

Or maybe allow for duplicates across region codes and save space by using a
smaller data type (int or bigint - while renaming the column to "rid" or
some such) - combined with having the non-partitioned reference table being
defined as (region_code, rid, gid).

David J.


Re: PostgreSQL and Linux CPU's

2022-01-20 Thread David G. Johnston
On Thu, Jan 20, 2022 at 4:22 PM Sbob  wrote:

> I am looking for information on how PostgreSQL leverages or interfaces
> with CPU's on Linux. Does PostgreSQL let Linux do the work? Does it
> bypass the OS? Any information or docs you can send my way would be much
> appreciated.
>
>
PostgreSQL is a user land process in Linux.  Linux doesn't allow itself to
be bypassed by user land processes when dealing with the CPU.  That is kind
of its main reason for existing...

PostgreSQL uses a process forking model and each process runs on a single
thread.

You can probably verify all of that by perusing the PostgreSQL
documentation.  I don't know what to recommend regarding Linux, user land,
kernel mode, and CPUs...

David J.


Re: PostgreSQL 12.8 Same Query Same Execution Plan Different Time

2022-01-19 Thread David G. Johnston
On Wed, Jan 19, 2022 at 7:59 AM Ludwig Isaac Lim  wrote:

>
> I noticed that the difference is actually in the Nested Loop join. One is
> taking 2 minutes, the other is taking 12 seconds. I find this puzzling as
> I assume the nested loop should be done in memory.
>

Everything is done in memory, but the data has to get there first (hence
BUFFERS as you figured out below).


> The disk is gp2 SDD so I'm even more baffled by this. What could be the
> factors that affect the speed of nested loop. I notice for that both loops
> the rows is 7780 and loops is 1. I don't think those are big numbers
>

The loops are ~= 400 and 6,000


>
> It was only after running the 2 queries that I realized I could
> do EXPLAIN (ANALYZE, BUFFERS), but I couldn't reproduce the slowness.
>

Did you (can you even in RDS) attempt to clear those buffers?  If the first
query ran slowly because none of the data was in memory (which you don't
know for certain because you didn't run with BUFFERS option then) then
subsequent runs would indeed be faster (the implementation of shared
buffers having fulfilled one of its major purposes in life).

I'll agree buffers for that query does not seem to account for nearly two
minutes...though as RDS is a shared resource I'd probably chalk at least
some of it to contention on the underlying hardware (disk likely being more
problematic than memory).

David J.


Re: Unique constraint blues

2022-01-18 Thread David G. Johnston
On Tue, Jan 18, 2022 at 10:13 AM Mladen Gogala 
wrote:

>
> mgogala=# create unique index test1_uq on test1(col1,coalesce(col2,'***
> EMPTY ***'));
>
> ->  Bitmap Index Scan on test1_uq  (cost=0.00..1.70 rows=6 width=0)
> ...
>    Index Cond: (test1.col1 = 1)
>
> How come that the index is used for search without the "coalesce"
> function?


Only the second column is an expression.  The first (leading) column is
perfectly usable all by itself.  It is less efficient, hence the parent
node's:

Recheck Cond: (test1.col1 = 1)
Filter: ((test1.col2)::text = 'test1'::text)

but usable.

If you are willing to create partial unique indexes you probably should
just create two of them.  One where col2 is null and one where it isn't.
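
That is (untested):

CREATE UNIQUE INDEX test1_uq_notnull ON test1 (col1, col2)
    WHERE col2 IS NOT NULL;
CREATE UNIQUE INDEX test1_uq_null ON test1 (col1)
    WHERE col2 IS NULL;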

If the coalesce version is acceptable you should consider declaring the
column not null and put the sentinel value directly into the record.

David J.


Re: Query is slower with a large proportion of NULLs in several columns

2021-12-21 Thread David G. Johnston
On Tue, Dec 21, 2021 at 4:07 PM Tom Lane  wrote:

> Lars Bergeson  writes:
> > I'm running PostgreSQL under AWS Aurora, and I didn't set it up or
> install
> > it, so I'm not sure about the OS version.
>
> Oh!  Aurora is not Postgres.  My admittedly-not-well-informed
> understanding is that they stuck a Postgres front end on their
> existing storage engine, so it's not surprising if storage-level
> behaviors are quite different from stock Postgres.
>
>
I do wish Amazon would be more considerate and modify what version()
outputs to include "AWS Aurora" somewhere in the human readable string.
Though the lack really isn't an excuse for reports of this nature to omit
such a crucial hardware/hosting detail.  The rest of the problem statement,
even with the "newbie to PostgreSQL" qualifier, was written well enough I
hadn't really considered that it would be anything but stock PostgreSQL on
a personal VM setup for testing.

David J.


Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread David G. Johnston
On Monday, December 20, 2021, Tom Lane  wrote:

> Justin Pryzby  writes:
> > On Mon, Dec 20, 2021 at 08:11:42PM -0800, Lars Bergeson wrote:
> >> Still taking 10X more I/O to read the smaller table. Very odd.
>
> > If I'm not wrong, it's even worse than that ?
> > It takes 20 or 30sec to run the query - but it says the associated I/O
> times
> > are ~500sec or ~6000sec ?
>
> It would help if somebody had labeled the units of I/O Time
> ... but I'm guessing those are microsec vs. the millisec
> of the other times, because otherwise it's completely wrong.
>
>
Related to my preceding observation, from the explain (buffers) docs:

“…and the time spent reading and writing data file blocks (in milliseconds)
if track_io_timing is enabled.“

David J.


Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread David G. Johnston
On Monday, December 20, 2021, Justin Pryzby  wrote:

> On Mon, Dec 20, 2021 at 08:11:42PM -0800, Lars Bergeson wrote:
> > ok, here are results after I did:
> > set max_parallel_workers_per_gather = 0;
> >
> > HashAggregate  (cost=1676432.13..1676432.16 rows=3 width=15) (actual
> time=19908.343..19908.345 rows=5 loops=1)
> >   I/O Timings: read=532369.898
> > Execution Time: 19908.383 ms
>
> > HashAggregate  (cost=1390580.70..1390580.72 rows=2 width=15) (actual
> time=30369.758..30369.761 rows=5 loops=1)
> >   I/O Timings: read=6440851.540
> > Execution Time: 30369.796 ms
>
> > Still taking 10X more I/O to read the smaller table. Very odd.
>
> If I'm not wrong, it's even worse than that ?
> It takes 20 or 30sec to run the query - but it says the associated I/O
> times
> are ~500sec or ~6000sec ?
>
> What architecture and OS/version are you running ?
> How did you install postgres?  From a package or compiled from source ?
>

The docs indicate you’ll only see I/O Timing information if using EXPLAIN
BUFFERS but I’m not seeing any of the other buffer-related information in
these plans.  Thoughts?

David J.


Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread David G. Johnston
On Monday, December 20, 2021, Lars Bergeson  wrote:

>
> What is it about null values in the table that slows down the full table
> scan?
>
> If I populate blank/zero for all of the unused values in columns that are
> NULLable, the query is fast again. So just defining the columns as NULLable
> isn't what slows it down -- it's actually the NULL values in the rows that
> seems to degrade performance.
>

The presence or absence of the constraint has zero effect on the contents
of the page/tuple.  As soon as you have a single null in a row you are
adding a null bitmap [1] to the stored tuple.  And now for every single
column the system has to check whether a specific column’s value is null or
not.  Given the number of columns in your table, that this is noticeable is
not surprising.

David J.

[1]  https://www.postgresql.org/docs/current/storage-page-layout.html


Re: Views don't seem to use indexes?

2021-10-27 Thread David G. Johnston
On Wed, Oct 27, 2021 at 7:31 PM Tim Slechta  wrote:

>
> == Point 2. The equivalent query on the VL10N_OBJECT_NAME view executes a
> Seq Scan on the underlying pl10n_object_name. Why?
> tc=# EXPLAIN ANALYZE select pval_0 from VL10N_OBJECT_NAME where pval_0 =
> '';
>

Just to confirm and simplify, the question boils down to:

Why does:

SELECT * FROM view WHERE view.view_column = ?;

And view is:

CREATE VIEW AS
SELECT ..., view_column
FROM tbl1
UNION ALL
SELECT ..., view_column
FROM tbl2
;

Where tbl1 has an index on view_column AND tbl2 does not have an index on
view_column

Result in a plan where both tbl1 and tbl2 are sequentially scanned and the
filter applied to the unioned result

Instead of a plan where the index lookup rows of tbl1 are supplied to the
union and only tbl2 is sequentially scanned

?

I don't have an answer to offer up here.  I'm pretty sure we do handle
predicate pushdown into UNION ALL generally.  I'm unclear exactly what the
equivalently rewritten query would be in this case - but demonstrating that
a query that doesn't use union all applies the index while the direct
access of the view doesn't isn't sufficient to narrow down the problem.  It
could still be either the rule processing or the union processing that
seems to be making a wrong plan choice.

That isn't meant to discount the possibility that this case is actually
correct - or at least the best we do presently for one or more technical
reasons that I'm not familiar with...
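
A self-contained way to probe it (untested):

CREATE TABLE tbl1 (view_column text);
CREATE TABLE tbl2 (view_column text);
CREATE INDEX ON tbl1 (view_column);
CREATE VIEW v AS
    SELECT view_column FROM tbl1
    UNION ALL
    SELECT view_column FROM tbl2;
EXPLAIN SELECT * FROM v WHERE view_column = 'x';
-- if pushdown works: index scan on tbl1, seq scan on tbl2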

David J.


Re: Postgres views cannot use both union and join/where

2021-10-20 Thread David G. Johnston
On Wed, Oct 20, 2021 at 6:58 AM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Tuesday, October 19, 2021, Michael Lewis  wrote:
> >> On Tue, Oct 19, 2021 at 3:48 PM Mithran Kulasekaran <
> >> mithranakulaseka...@gmail.com> wrote:
> >>> create  view template_view (id, name, description, is_staged) as
> >>> select t.id,t.name, t.description, false as is_staged
> >>> from template t
> >>> left join template_staging ts on t.name = ts.name and ts.name is null
>
> >> Does that work? I've only seen that type of logic written as-
> >> left join template_staging ts on t.name = ts.name
> >> where ts.name is null
>
> > They are functionally equivalent, though the timing of the expression
> > evaluation differs slightly.
>
> No, not at all.  Michael's version correctly implements an anti-join,
> where the first version does not.  The reason is that the WHERE clause
> "sees" the column value post-JOIN, whereas the JOIN/ON clause "sees"
> values pre-JOIN.
>

Yeah, my bad.  I was actually thinking this but then figured the OP
wouldn't have written an anti-join that didn't actually work.

My original email was going to be:

Adding the single table expression to the ON clause is shorthand for
writing:

SELECT t.* FROM template AS t LEFT JOIN (SELECT * FROM template_staging
WHERE template_staging.name IS NULL) AS ts ON t.name = ts.name;

David J.


Re: Postgres views cannot use both union and join/where

2021-10-19 Thread David G. Johnston
On Tuesday, October 19, 2021, Michael Lewis  wrote:

> On Tue, Oct 19, 2021 at 3:48 PM Mithran Kulasekaran <
> mithranakulaseka...@gmail.com> wrote:
>
>> create  view template_view (id, name, description, is_staged) as
>> select t.id,t.name, t.description, false as is_staged
>> from template t
>>  left join template_staging ts on t.name = ts.name and ts.name is 
>> null
>>
>>
> Does that work? I've only seen that type of logic written as-
>
> left join template_staging ts on t.name = ts.name
> where ts.name is null
>

They are functionally equivalent, though the timing of the expression
evaluation differs slightly.

It could also be written as an anti-join:

Select * from template as t where not exists (select 1 from
template_staging as ts where t.name = ts.name)

David J.


Re: Postgres views cannot use both union and join/where

2021-10-19 Thread David G. Johnston
On Tue, Oct 19, 2021 at 2:48 PM Mithran Kulasekaran <
mithranakulaseka...@gmail.com> wrote:

> i think the only problem is when we try to use both union and where/join
> the issue starts to happen
>

I'm unconvinced this is actually an issue based upon what is presented
here.  All I'm seeing is two decidedly different queries resulting in
different query plans.  That the "problem one" isn't using an index isn't
surprising given the volume of data involved and the change from specifying
a literal value in the where clause to letting a join determine which
results to return.

Assuming you have a real scenario you are testing with being able to
demonstrate (probably through the use of the query planner GUCs) that
PostgreSQL can produce a better plan but doesn't by default would be a more
compelling case.  More generally, you probably need to either use your real
scenario's data to help demonstrate the issue or create a self-contained
test that is at least closer to what it produces (this approach still
benefits from seeing what is happening for real).

David J.


Re: pg_restore schema dump to schema with different name

2021-08-24 Thread David G. Johnston
On Mon, Aug 23, 2021 at 2:46 AM Nagaraj Raj  wrote:

>
> Currently this is not something you can do; this functionality is there in
> Oracle.
>
> Is this feature being considered for the future?  (It would really help
> for creating test schemas without disturbing the current schema.)
>
>
I find this to be not all that useful.  Current practice is to avoid
relying on search_path and, in general, to schema-qualify object references
(yes, attaching a local SET search_path to a function works, not sure how
it would play out in this context).  Performing a dependency and contextual
rename of one schema name to another is challenging given all of that, and
impossible if the schema name is hard-coded into a function body.

I won't say we wouldn't accept such a patch, but as this isn't exactly a
new problem or realization, and the feature doesn't presently exist, that
for whatever reasons individuals may have no one has chosen to volunteer or
fund such development.  I don't even remember seeing a proposal in the past
5 or so years.

David J.


Re: difference between pg_triggers and information_schema.triggers

2021-08-11 Thread David G. Johnston
On Wednesday, August 11, 2021, aditya desai  wrote:

> Hi All,
> What is the difference between pg_triggers and
> information_schema.triggers? I want to list all triggers in the database.
>

Read the docs for information_schema.triggers.


> What is the best way to list all objects in PostgreSQL?(similar to
> all_objects in Oracle).
>
>
With pg_catalog tables.  But I’m not aware of anything that combines all
object types into a single result.  Seems like an easy enough query to put
together though.
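
For triggers specifically, something like this (untested) works against the
catalogs:

SELECT t.tgrelid::regclass AS table_name, t.tgname
FROM pg_trigger t
WHERE NOT t.tgisinternal
ORDER BY 1, 2;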

David J.


Re: Big performance slowdown from 11.2 to 13.3

2021-07-27 Thread David G. Johnston
On Tue, Jul 27, 2021 at 7:57 PM l...@laurent-hasson.com <
l...@laurent-hasson.com> wrote:

> hash_mem_multiplier is an upper-bound right: it doesn't reserve memory
> ahead of time correct?
>

Yes, that is what the phrasing "maximum amount" in the docs is trying to
convey.

https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY

But also note that it is "each operation" that gets access to that limit.
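
For example:

SET hash_mem_multiplier = 2.0;  -- each hash-based node may now use up to
                                -- work_mem * 2, not reserved up front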

David J.


Re: Users grants with setting options

2021-03-08 Thread David G. Johnston
On Mon, Mar 8, 2021 at 4:30 PM Nagaraj Raj  wrote:

> I have a quick question, does role custom parameters settings will be
> granted to users well?
>

Parameters are not inherited - the role credentials that are logging in are
the ones that are used to check for defaults.  This "no" is not explicitly
documented that I can find; though easy enough to test.
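
An untested sketch of such a test (names invented):

ALTER ROLE parent_role SET work_mem = '256MB';
CREATE ROLE member_login LOGIN IN ROLE parent_role;
-- log in as member_login, then:
SHOW work_mem;  -- the server default, not 256MB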

David J.


Re: Low cost query - running longer

2020-11-04 Thread David G. Johnston
On Wednesday, November 4, 2020, Koteswara Rao Daliparthi 
wrote:

> Hi Guys, Sorry for the bold below. I just feel it helps others identify
> my question easily.
>

You stopped following the reporting form too soon.  You also need to
provide the “questions about queries” section.  Or “slow queries”:

 https://wiki.postgresql.org/wiki/Slow_Query_Questions

David J.


Re: CPU Consuming query. Sequential scan despite indexing.

2020-10-21 Thread David G. Johnston
On Wed, Oct 21, 2020 at 10:22 PM aditya desai  wrote:

> As per application team, it is business requirement to show last 60 days
>> worth data.
>>
>
I didn't look deeply but it sounds like you are looking backwards into 60
days worth of detail every single time you perform the query and computing
an aggregate directly from the detail.  Stop doing that.  By way of
example, at the end of every day compute the aggregates on the relevant
dimensions and save them.  Then query the saved aggregates from previous
days and add them to the computed aggregate from the current day's detail.
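
Untested sketch (names invented):

CREATE TABLE daily_counts (day date PRIMARY KEY, ct bigint NOT NULL);

-- run once after each day closes:
INSERT INTO daily_counts
SELECT created::date, count(*)
FROM detail
WHERE created >= current_date - 1 AND created < current_date
GROUP BY 1;

-- the report: saved history plus today's live detail
SELECT sum(ct) FROM (
    SELECT ct FROM daily_counts WHERE day >= current_date - 60
    UNION ALL
    SELECT count(*) FROM detail WHERE created >= current_date
) s;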

David J.


Re: Query performance

2020-10-21 Thread David G. Johnston
On Wed, Oct 21, 2020 at 5:32 PM Nagaraj Raj  wrote:

> Hi, I have long running query which running for long time and its planner
> always performing sequnce scan the table2.
>

 FROM sor_t.transfer_order_header hed,sor_t.transfer_order_item itm
>  where hed.eventid=itm.eventid group by 1,2,3,4,5,6
>
> Any suggestions would greatly appretiated.
>

You aren't filtering out any rows so it is unsurprising that a sequential
scan was chosen to fulfil the request that the entire detail table be
consulted.  The good news is you have access to parallelism - see if you
can increase that factor.

Any other suggestions probably requires more knowledge of your problem
domain than you've provided here.

Finding a way to add a where clause or compute your desired result during
record insertion or updating are two other potential avenues of
consideration.

David J.


Performance issue when we use policies for Row Level Security along with functions

2020-10-12 Thread David G. Johnston
On Sunday, October 11, 2020, Gopisetty, Ramesh 
wrote:

>
> to sch USING  ( key =
> f_sel_1(key)
> );
>

As Tom said it doesn’t matter what you classify the function as (stable,
etc) if your function call accepts a column reference as an input and
compares its output to another column reference.  With a column reference
you need a row to find a value and if you already have a row the index
serves no purpose.

David J.


Re: Unclamped row estimates whith OR-ed subplans

2020-06-19 Thread David G. Johnston
On Friday, June 19, 2020, Laurenz Albe  wrote:

>
> > I am absolutely aware that those are hashed sub plans below a seqscan
> and that Postgres therefore has to scan all tuples of the table. But the
> problem is that upper nodes (which are excluded from
> > this example for simplicity) think they will receive 1301316 rows from
> the seqscan, when in fact they will probably only see a hand full, which
> the planner could have (easily?) deduced by taking the
> > greater of the two subplan row estimates.
> >
> > What am I missing, or is this perhaps a shortfall of the planner?
>
> The subplans are executed *for each row* found in "transactions",
> and the estimate on the subplans is *per execution*.
>

I understood Tom’s nearby answer but not this one.  This seems to be
referring to correlated subplans which the examples are not.

David J.


Re: When to use PARTITION BY HASH?

2020-06-02 Thread David G. Johnston
On Tue, Jun 2, 2020 at 10:17 AM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> That *might* turn out to be the case with a small number of distinct
> values in the partitioning column(s), but then why rely on hash
> assignment instead of using PARTITION BY LIST in the first place?
>
> [1] https://www.postgresql.org/docs/12/ddl-partitioning.html
>

Why the cross-posting? (-performance is oriented toward problem solving,
not theory, so -general is the one and only PostgreSQL list this should
have been sent to)

Anyway, quoting the documentation you linked to:

"When choosing how to partition your table, it's also important to consider
what changes may occur in the future. For example, if you choose to have
one partition per customer and you currently have a small number of large
customers, consider the implications if in several years you instead find
yourself with a large number of small customers. In this case, it may be
better to choose to partition by HASH and choose a reasonable number of
partitions rather than trying to partition by LIST and hoping that the
number of customers does not increase beyond what it is practical to
partition the data by."

Hashing does indeed preclude some of the benefits and introduces others.

I suspect that having a hash function that turns its input into a different
output and checking for equality on the output would be better than trying
to "OR" a partition list together in order to combine multiple inputs onto
the same table.
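
For reference, the hash flavor looks like this (untested):

CREATE TABLE orders (
    customer_id bigint NOT NULL,
    order_id bigint NOT NULL
) PARTITION BY HASH (customer_id);

CREATE TABLE orders_p0 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ...and likewise for remainders 1 through 3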

David J.


Re: Suggestion to improve query performance for GIS query.

2020-05-22 Thread David G. Johnston
On Thursday, May 21, 2020, postgann2020 s  wrote:

>
> SELECT seq_no+1 INTO pair_seq_no FROM SCHEMA.TABLE WHERE (Column1 like
> '%,sheath--'||cable_seq_id ||',%' or Column1 like 'sheath--'||cable_seq_id
> ||',%' or Column1 like '%,sheath--'||cable_seq_id  or
> Column1='sheath--'||cable_seq_id) order by seq_no desc limit 1 ;
>
>
> Could you please suggest a better way to execute the query
>

Add a trigger to the table to normalize the contents of column1 upon insert
and then rewrite your query to reference the newly created normalized
fields.
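
Untested sketch (names invented): one row per sheath reference instead of a
delimited string, after which the search is a plain indexed lookup:

CREATE TABLE sheath_refs (
    seq_no bigint NOT NULL,
    cable_seq_id bigint NOT NULL
);
CREATE INDEX ON sheath_refs (cable_seq_id, seq_no DESC);

SELECT seq_no + 1
FROM sheath_refs
WHERE cable_seq_id = 12345
ORDER BY seq_no DESC
LIMIT 1;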

David J.


Re: Suggestion on index creation for TEXT data field

2020-05-21 Thread David G. Johnston
On Thu, May 21, 2020 at 7:45 AM postgann2020 s 
wrote:

> >And what type of data exactly are we talking about.  ==> Column is
> storing GIS data.
>

GIS data isn't really TEXT and isn't a core datatype of PostgreSQL so this
is maybe better posted to the PostGIS community directly...

David J.


Re: Suggestion on index creation for TEXT data field

2020-05-21 Thread David G. Johnston
On Thu, May 21, 2020 at 7:28 AM postgann2020 s 
wrote:

> which is having an avg width of 149bytes.
>

The average is meaningless if your maximum value exceeds a limit.

2. What type of index is best suited for this type of data?
>

And what type of data exactly are we talking about.  "TEXT" is not a useful
answer.

If the raw data is too large no index is going to be "best" -  as the hint
suggests you either need to drop the idea of indexing the column altogether
or apply some function to the raw data and then index the result.
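
The expression-index approach looks like this (untested, names invented):

CREATE INDEX ON big_table (md5(big_column));
-- equality searches must then use the same expression:
SELECT * FROM big_table WHERE md5(big_column) = md5('needle');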

David J.


Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )

2020-05-07 Thread David G. Johnston
On Thu, May 7, 2020 at 11:07 AM Amarendra Konda 
wrote:

> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)  SELECT pa.process_activity_id
> AS pa_process_activity_id  FROM process_activity pa WHERE pa.app_id =
> '126502930200650' AND pa.created > '1970-01-01 00:00:00'  AND EXISTS (
> SELECT 1 FROM process_instance pi where pi.app_id = pa.app_id AND
> pi.process_instance_id = pa.process_instance_id AND pi.user_id =
> '137074931866340') ORDER BY pa.process_instance_id,  pa.created limit 50;
>
>
>
>
> ->  Index Scan using
> process_activity_process_instance_id_app_id_created_idx on
> public.process_activity pa  (cost=0.70..1061.62 rows=1436 width=32)
> (actual time=0.011..20.320 rows=23506 loops=2)
>
> Index Cond: ((m.process_instance_id = pi.process_instance_id) AND
> (m.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01
> 00:00:00'::timestamp without time zone))

I suppose during the nested loop the inner index scan could limit itself to
the first 50 entries it finds (since the first two index columns are being
held constant on each scan, m.created should define the traversal order...)
so that the output of the nested loop ends up being (max 2 x 50) 100
entries which are then sorted and only the top 50 returned.

Whether the executor could but isn't doing that here or isn't programmed to
do that (or my logic is totally off) I do not know.

David J.


Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )

2020-05-07 Thread David G. Johnston
On Thu, May 7, 2020 at 10:49 AM Amarendra Konda 
wrote:

> Can you please explain, why it is getting more columns in output, even
> though we have asked for only one column ?
>
>
>
> Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url,
> pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias,
> pa.read_by_user, pa.source, pa.label_category_id, pa.label_id,
> pa.csat_response_id, m.process_activity_fragments, pa.created, pa.updated,
> pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments,
> pa.status_fragment, pa.internal_meta, pa.interaction_id,
> pa.do_not_translate, pa.should_translate, pa.in_reply_to
>
Not knowing the source code in this area at all...

I'm pretty sure it's because it doesn't matter.  The executor retrieves
data "pages", 8k blocks containing multiple records, then extracts specific
full tuples from there.  At that point it's probably just data pointers
being passed around.  It's not until the end, when the planner/executor has
to decide which subset of columns to return to the user, or when a new
tuple structure has to be created anyway (say, because of joining), that it
takes the effort of constructing a minimally necessary output column set.

David J.


Re: pg_attribute, pg_class, pg_depend grow huge in count and size with multiple tenants.

2020-05-07 Thread David G. Johnston
On Thu, May 7, 2020 at 1:05 PM samhitha g 
wrote:

> Our application serves multiple tenants. Each tenant has the schema with a
> few hundreds of tables and few functions.
> We have 2000 clients so we have to create 2000 schemas in a single
> database.
>

That is one option but I wouldn't say you must.  If you cannot get
individual tables to be multi-tenant you are probably better off having one
database per client on a shared cluster - at least given the size of the
schema and number of clients.

David J.


Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNER Query )

2020-05-07 Thread David G. Johnston
On Thu, May 7, 2020 at 7:40 AM Adrian Klaver 
wrote:

> On 5/7/20 4:19 AM, Amarendra Konda wrote:
> > Hi,
> >
> > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
> >
> > We have noticed huge difference interms of execution plan ( response
> > time) , When we pass the direct values  Vs  inner query to IN clause.
> >
> > High level details of the use case are as follows
> >
> >   * As part of the SQL there are 2 tables named Process_instance
> > (master) and Process_activity ( child)
> >   * Wanted to fetch TOP 50 rows from  Process_activity table for the
> > given values of the Process_instance.
> >   * When we used Inner Join / Inner query ( query1)  between parent
> > table and child table , LIMIT is not really taking in to account.
> > Instead it is fetching more rows and columns that required, and
> > finally limiting the result
>
> It is doing what you told it to do which is SELECT all
> process_instance_i's for user_id='317079413683604' and app_id =
> '427380312000560' and then filtering further. I am going to guess that
> if you run the inner query alone you will find it returns ~23496 rows.
> You might have better results if you an actual join between
> process_activity and process_instance. Something like below(obviously
> not tested):
>

What the OP seems to want is a semi-join:

(not tested)

SELECT pa.process_activity_id
FROM process_activity pa WHERE pa.app_id = '427380312000560' AND pa.created
> '1970-01-01 00:00:00'
AND EXISTS (
  SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND
pi.user_id = '317079413683604'
)
ORDER BY
pa.process_instance_id,
pa.created limit 50;

I'm unsure exactly how this will impact the plan choice but it should be an
improvement, and in any case more correctly defines what it is you are
looking for.

David J.


Re: Recursive query slow on strange conditions

2020-05-04 Thread David G. Johnston
On Mon, May 4, 2020 at 9:12 AM Jean-Christophe Boggio <
postgre...@thefreecat.org> wrote:

> Is there a way to disable JIT (I use the apt.postgresql.org repository)
> in both 11.6 and 12.2 ? I would have liked to disable it on this
> particular query but maybe I could live with disabling JIT everywhere.
>
>
https://www.postgresql.org/docs/12/jit-decision.html
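
In short, jit can be turned off per session or server-wide:

SET jit = off;               -- current session only
ALTER SYSTEM SET jit = off;  -- whole server; then SELECT pg_reload_conf();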

David J.


Re: PostgreSQL does not choose my indexes well

2020-04-23 Thread David G. Johnston
On Thu, Apr 23, 2020 at 1:33 PM Tom Lane  wrote:

> I find the apparently
> unnecessary cast in the partial-index predicate to be suspicious ---
> maybe that's blocking matching to the WHERE clause?
>

I noticed that too...I suspect it's related to the ANALYZE result:

Index Scan using idx_tabla_entidad on entidad (cost=0.56..51121.41
rows=1405216 width=20) (actual time=0.037..242.609 rows=1409985 loops=1)
  Index Cond: ((cod_tabla)::bigint = 4)

Since the index condition ended up cast to bigint the OP probably wrote the
predicate to match.

David J.


Re: PostgreSQL does not choose my indexes well

2020-04-23 Thread David G. Johnston
On Thu, Apr 23, 2020 at 8:29 AM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Thursday, April 23, 2020, Thomas Kellerer  wrote:
> >> Plus: scanning idx_tabla_entidad is more efficient because that index is
> >> smaller.
>
> > Really?  The absence of 33 million rows in the partial index seems like
> it
> > would compensate fully and then some for the extra included columns.
>
> On the other hand, an indexscan is likely to end up being effectively
> random-access rather than the purely sequential access involved in
> a seqscan.
>

I feel like I'm missing something, as the OP's query is choosing an index
scan - just it is choosing to scan the full index containing the
searched-upon field instead of a partial index that doesn't contain the
field but whose predicate matches the where condition - in furtherance of a
count(*) computation where the columns don't really matter.

I do get "it's going to perform 1.4 million random index entries and heap
lookups anyway - so it doesn't really matter" - but the first answer was
"the full index is smaller than the partial" which goes against my
intuition.

The sequential scan that isn't being used would have to touch 25x the
number of records - so its non-preference seems reasonable.

David J.


Re: PostgreSQL does not choose my indexes well

2020-04-23 Thread David G. Johnston
On Thursday, April 23, 2020, Thomas Kellerer  wrote:

> > CREATE INDEX idx_tabla_entidad
> > ON public.entidad USING btree
> > (cod_tabla ASC NULLS LAST);
> >
> > CREATE INDEX idx_entidad_tabla_4
> > ON public.entidad USING btree
> > (cod_entidad_tabla ASC NULLS LAST)
> > INCLUDE(cod_entidad, cod_tabla, cod_entidad_tabla)
> > WHERE cod_tabla::bigint = 4;
> >
> >
> > SELECT count(*) from entidad;
> > 34.413.354
> >
> > SELECT count(*) from entidad where cod_tabla = 4;
> > 1.409.985
> >
> >
> > explain (analyze, buffers, format text) select * from entidad where
> cod_tabla = 4
> > Index Scan using idx_tabla_entidad on entidad (cost=0.56..51121.41
> rows=1405216 width=20) (actual time=0.037..242.609 rows=1409985 loops=1)
> >   Index Cond: ((cod_tabla)::bigint = 4)
> >   Buffers: shared hit=12839
> > Planning Time: 0.158 ms
> > Execution Time: 311.828 ms
> >
> >
> > Why postgresql doesnt use the index idx_entidad_tabla_4?
>
> Because that index does not contain the column from the WHERE clause as an
> "indexed" column (only as an included column).


But it does match the partial index’s predicate


> Plus: scanning idx_tabla_entidad is more efficient because that index is
> smaller.
>

Really?  The absence of 33 million rows in the partial index seems like it
would compensate fully and then some for the extra included columns.

David J.


Postgres not using index on views

2020-04-17 Thread David G. Johnston
On Friday, April 17, 2020, Rick Vincent  wrote:

> Hi,
>
> I was wondering if anyone can explain the below problem.  Should a bug be
> logged for this?
>
> Kind regards,
> Rick
>
> _
> *From:* Rick Vincent
> *Sent:* Tuesday, April 7, 2020 11:08 AM
> *To:* 'Tom Lane' ; Justin Pryzby 
> *Cc:* pgsql-performa...@postgresql.org; Manoj Kumar <
> manojku...@temenos.com>; Herve Aubert 
> *Subject:* RE: Postgres not using index on views
>
>
> Hi Tom,
>
> The function is defined as below, so no use of VOLATILE.  Let me know if
> you need any other information.  I am hoping the below will further clarify
> the issue.
>
>

IIUC, as Tom wrote, you have volatile functions (implied/default, as Thomas
wrote) attached to view column outputs, and the planner will not optimize
those away.

Mark your function immutable (assuming it is) and retry your experiment
with the where clause query.
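
e.g. (hypothetical name and signature):

ALTER FUNCTION your_function(text) IMMUTABLE;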

David J.


Re: Random function

2020-03-24 Thread David G. Johnston
How is this a performance-related question?

On Tue, Mar 24, 2020 at 11:10 AM Luis Roberto Weck <
luisrobe...@siscobra.com.br> wrote:

> However, I am getting the same result over multiple rows. This is a
> sample of the SQL I am using:
>
> select (select string_agg(random()::text,';')
>from pg_catalog.generate_series(1,3,1) )
>from generate_series(1,10,1)
>
> And I am getting something like:
>
> |string_agg |
> +--+
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
> |0.243969671428203583;0.69257879443434;0.291524752043187618|
>
> If this is the expected output,


Yes, you've asked it to compute a value, assign it to a column, then
generate 10 rows of that value.

> is there a way to always generate random
> numbers?
>

Don't use a scalar subquery in the main target list.

One possible answer:

select format('%s;%s;%s', random(), random(), random()) from
generate_series(1, 10)

David J.


Re: Can we have multiple tablespaces with in a database.

2020-02-20 Thread David G. Johnston
Please pick a single list to post to.  Performance seems like the
unnecessary one here.

On Thu, Feb 20, 2020 at 10:34 PM Daulat Ram 
wrote:

> Can we have multiple tablespaces with in a database in postgres?
>

I feel as if I'm missing something in your question given the presence of
the "CREATE TABLESPACE" SQL command and the related non-command
documentation covered here:

https://www.postgresql.org/docs/12/manage-ag-tablespaces.html


> Can we have a table on different tablespace same as Oracle?
>

There is no provision to assign two tablespaces to a single physical
table.  To the benefit of those who don't use the other product you may
wish to say exactly what you want to do instead of comparing it to
something that many people likely have never used.

David J.


Re: How to avoid UPDATE performance degradation in a transaction

2020-02-13 Thread David G. Johnston
On Thu, Feb 13, 2020 at 1:42 PM Karl Düüna  wrote:

> It really isn't noticeable until about 5k UPDATEs on a single row.
>

I don't know why, and I've never dealt with a scenario where this would
even come up, but that this doesn't perform well inside a transaction isn't
surprising to me.  Kinda surprised it works well at all, actually.  I'd
probably try to rework the processing algorithm: create an unlogged
temporary table with data from the row's initial state, manipulate it to my
heart's content, then take the final result and update the single live row
with the final state.
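
Untested sketch (names invented):

CREATE TEMP TABLE work_copy AS
    SELECT * FROM live_table WHERE id = 1;
-- ...run the thousands of UPDATEs against work_copy instead...
UPDATE live_table l
SET    some_col = w.some_col
FROM   work_copy w
WHERE  l.id = w.id;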

David J.


Re: Searching in varchar column having 100M records

2019-07-17 Thread David G. Johnston
On Wed, Jul 17, 2019 at 4:04 AM mayank rupareliya 
wrote:

> create table fields(user_id varchar(64), field varchar(64));
> CREATE INDEX index_field ON public.fields USING btree (field);
>
> Any suggestions for improvement?
>

Reduce the number of rows by constructing a relationally normalized data
model.

David J.


Re: Oracle to postgres migration

2019-04-08 Thread David G. Johnston
On Mon, Apr 8, 2019 at 3:31 PM Gavin Flower 
wrote:

> I think having a specific migration group would also be likely to
> improve the visibility of pg, and the idea of migrating to pg.  As it
> help pg to appear in more search results.
>
>
I presently have qualms retaining novice, sql, performance, and probably
some others.  I don't think adding yet another specialized low-volume list
is of particular benefit.  Nor do I think it behooves the core project to
be in the center of migration support anyway.

This discussion can and should be moved to -general

David J.


Re: How Do You Associate a Query With its Invoking Procedure?

2018-09-14 Thread David G. Johnston
On Thu, Sep 13, 2018 at 12:49 PM, Fd Habash  wrote:

> An API function may invoke 10 queries. Ideally, I would like to know what
> queries are invoked by it and how long each took.
>
>
>
> I’m using pg_stat_statement. I can see the API function statement, but how
> do I deterministically identify all queries invoked by it?
>

pg_stat_statement is a global tracker that throws away execution context,
in this case the process id, needed to track the level of detail you
desire.  I think the best you can do is log all statements and durations to
the log file and parse that.
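
Untested sketch of that logging setup (the PID in the prefix lets you group
statements per session):

ALTER SYSTEM SET log_min_duration_statement = 0;      -- log everything, with durations
ALTER SYSTEM SET log_line_prefix = '%m [%p] %u@%d ';  -- %p is the backend PID
SELECT pg_reload_conf();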

For the "what queries are invoked by it" you can just read the source
code...

As there is no canned solution to provide the answer you seek the final
solution you come up with will be influenced by your access patterns,
specific needs, and (in)ability to write C code (though maybe there is an
extension out there you could leverage...).

David J.


Re: Help me in reducing the CPU cost for the high cost query below, as it is hitting production seriously!!

2018-05-21 Thread David G. Johnston
On Mon, May 21, 2018 at 7:43 AM, pavan95 
wrote:

> Hi David,
>
> Thank you so much for your valuable inputs.  Is there anything that I need
> to look from Indexes perspective or Join order ??
>
> Kindly let me know if  it can be tuned further.
>

What I've got to give here is what you've received.

David J.


Re: Help me in reducing the CPU cost for the high cost query below, as it is hitting production seriously!!

2018-05-21 Thread David G. Johnston
On Mon, May 21, 2018 at 6:39 AM, pavan95 
wrote:

> Hi Abbas,
>
> Thanks for your valuable suggestions. To my surprise I got the same output
> as what I have executed before.
>
> But unfortunately I'm unable to understand the logic of the code, in
> specific what is base 60 number? The used data type for "effort_hours"
> column is 'double precision'.
>
> Kindly help me in understanding the logic. Thanks in advance.


This is not converting a "base 60 number to base 10" - this is computing a
percentage, which is indeed what you want to do.

Since 0.60 is the maximum value of the fraction in this encoding scheme,
dividing the fractional part by 0.60 tells you what percentage (between 0
and 1) your value is of the maximum.  But you have to get rid of the hours
component first: floor() truncates the minutes leaving just the hours,
which you can subtract out from the original, leaving only the minutes.
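
Worked example, with 7.30 meaning 7 hours 30 minutes:

SELECT floor(eh) + (eh - floor(eh)) / 0.60 AS decimal_hours
FROM (VALUES (7.30)) AS t(eh);
-- => 7.5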

David J.

P.S. You could consider adding a new column to the table, along with a
trigger, and compute and store the derived value upon insert.


Re: Query optimiser is not using 'not null' constraint when 'order by nulls last' clause is used

2018-02-02 Thread David G. Johnston
On Fri, Feb 2, 2018 at 8:49 AM, Nandakumar M  wrote:

> But, for the PK column we are not in control of the index that is created.
>

You probably can (I assume the nulls aspect of the index doesn't prevent
PK usage), but you must add the PK to the table after creating the index
and not let the system auto-generate the index for you.

https://www.postgresql.org/docs/10/static/sql-altertable.html

ALTER TABLE name ADD PRIMARY KEY USING INDEX index_name;

David J.


Re: 8.2 Autovacuum BUG ?

2018-01-24 Thread David G. Johnston
On Wed, Jan 24, 2018 at 7:48 AM, Pavan Teja 
wrote:

>
>
> Yes, but why is doing a vacuum freeze of a table causing rapid
> archiving??
> Any idea??
>
>
IIUC, freezing involves physically altering those pages that are not frozen
to make them frozen.  Those changes are logged just like any (most?) other
physical changes to pages.  The rapidness is because freezing is not that
difficult, so lots of pages can be changed in a relatively short period of
time.

David J.