Re: cutting down the TODO list thread

2020-10-28 Thread Oleksandr Shulgin
On Tue, Oct 27, 2020 at 8:25 PM John Naylor 
wrote:

> As I mentioned in [1], I've volunteered to clear out the TODO list of
> items that appear to be too difficult, controversial, or otherwise not
> worth doing to warrant being listed there. I'll be working a few sections
> at a time, and every so often I'll have a list of proposed items for
> removal. If I don't hear objections, I'll remove the items after a few days
> while going through the next set.
>

I'm totally on board with cleaning the list up, but how about marking as
"won't fix" (or similar) instead of actually removing the items?  That
should help to prevent the same exact items from appearing on the list
again, which they eventually would, I believe.

--
Alex


Re: Concurrency issue in pg_rewind

2020-09-18 Thread Oleksandr Shulgin
On Fri, Sep 18, 2020 at 8:10 AM Michael Paquier  wrote:

> On Thu, Sep 17, 2020 at 10:20:16AM +0200, Oleksandr Shulgin wrote:
> > Ouch.  I think pg_rewind shouldn't try to remove any random files in
> pg_wal
> > that it doesn't know about.
> > What if the administrator made a backup of some WAL segments there?
>
> IMO, this would be a rather bad strategy anyway, so just don't do
> that, because that could also mean that this is on the same partition
> as pg_wal/ which would crash the server if the partition has the idea
> to get full even if max_wal_size is set correctly.


To clarify my point, I don't mean to backup WAL segments in the background
when the server is running, but precisely when the server is down and you
need to intervene, such as running pg_rewind.  You might want to "stash"
some of the latest segments in case you need to start over (name it
pg_wal/00840A76001E.backup, or
pg_wal/backup/00840A76001E).  It is surprising that pg_rewind
would decide to remove those.

--
Alex


Re: Concurrency issue in pg_rewind

2020-09-17 Thread Oleksandr Shulgin
On Wed, Sep 16, 2020 at 2:55 PM Alexander Kukushkin 
wrote:

>
> The second time pg_rewind also failed, but the error looked differently:
> servers diverged at WAL location A76/39E55338 on timeline 132
> rewinding from last common checkpoint at A76/1EF254B8 on timeline 132
>
> could not remove file
>
> "/home/postgres/pgdata/pgroot/data/pg_wal/.wal-g/prefetch/running/00840A760024":
>

Ouch.  I think pg_rewind shouldn't try to remove any random files in pg_wal
that it doesn't know about.
What if the administrator made a backup of some WAL segments there?

--
Alex


Re: builtin functions, parameter names and psql's \df

2020-09-02 Thread Oleksandr Shulgin
On Wed, Sep 2, 2020 at 7:35 AM Andres Freund  wrote:

> Hi,
>
> on a regular basis I remember a builtin function's name, or can figure it
> out
> using \df etc, but can't remember the argument order. A typical example is
> regexp_*, where I never remember whether the pattern or the input string
> comes
> first.
>
> Unfortunately, \df does not really help with that:
>
> =# \df regexp_split_to_table
>
> ┌────────────┬───────────────────────┬──────────────────┬─────────────────────┬──────┐
> │   Schema   │         Name          │ Result data type │ Argument data types │ Type │
> ├────────────┼───────────────────────┼──────────────────┼─────────────────────┼──────┤
> │ pg_catalog │ regexp_split_to_table │ SETOF text       │ text, text          │ func │
> │ pg_catalog │ regexp_split_to_table │ SETOF text       │ text, text, text    │ func │
> └────────────┴───────────────────────┴──────────────────┴─────────────────────┴──────┘
>
> If the parameters were named however, it'd be clear:
>
> =# CREATE OR REPLACE FUNCTION pg_catalog.regexp_split_to_table(string
> text, pattern text)
>  RETURNS SETOF text
>  LANGUAGE internal
>  IMMUTABLE PARALLEL SAFE STRICT
> AS $function$regexp_split_to_table_no_flags$function$
>
> =# \df regexp_split_to_table
>
> ┌────────────┬───────────────────────┬──────────────────┬───────────────────────────┬──────┐
> │   Schema   │         Name          │ Result data type │    Argument data types    │ Type │
> ├────────────┼───────────────────────┼──────────────────┼───────────────────────────┼──────┤
> │ pg_catalog │ regexp_split_to_table │ SETOF text       │ string text, pattern text │ func │
> │ pg_catalog │ regexp_split_to_table │ SETOF text       │ text, text, text          │ func │
> └────────────┴───────────────────────┴──────────────────┴───────────────────────────┴──────┘
>
> (I intentionally left the three parameter version unchanged, to show the
> difference)
>
>
> In the docs we already name the parameters using SQL like syntax, see [1].
> How
> about we actually do so for at least the more common / complicated
> functions?
>

+many

I find myself in the same situation a lot.
I had never realized that this is merely an implementation detail, and that
there is nothing fundamental preventing the parameters of the built-in
functions from being named.

--
Alex


Re: libpq copy error handling busted

2020-06-04 Thread Oleksandr Shulgin
On Thu, Jun 4, 2020 at 5:37 AM Thomas Munro  wrote:

> On Thu, Jun 4, 2020 at 1:53 PM Thomas Munro 
> wrote:
> > On Thu, Jun 4, 2020 at 1:35 PM Tom Lane  wrote:
> > > Ah, it's better if I put the pqReadData call into *both* the paths
> > > where 1f39a1c06 made pqSendSome give up.  The attached patch seems
> > > to fix the issue for the "pgbench -i" scenario, with either fast-
> > > or immediate-mode server stop.  I tried it with and without SSL too,
> > > just to see.  Still, it's not clear to me whether this might worsen
> > > any of the situations we discussed in the lead-up to 1f39a1c06 [1].
> > > Thomas, are you in a position to redo any of that testing?
>
> It seems to behave correctly in that scenario.
>
> Here's what I tested.  First, I put this into pgdata/postgresql.conf:
>
> ssl=on
> ssl_ca_file='root+client_ca.crt'
> ssl_cert_file='server-cn-only.crt'
> ssl_key_file='server-cn-only.key'
> ssl_crl_file='root+client.crl'
> ssl_min_protocol_version='TLSv1.2'
> ssl_max_protocol_version='TLSv1.1'
> ssl_min_protocol_version='TLSv1.2'
> ssl_max_protocol_version=''
>
> I copied the named files from src/test/ssl/ssl/ into pgdata, and I ran
> chmod 600 on the .key file.
>
> I put this into pgdata/pg_hba.conf at the top:
>
> hostssl all all 127.0.0.1/32 cert clientcert=verify-full
>
> I made a copy of src/test/ssl/ssl/client-revoked.key and ran chmod 600 on
> it.
>

Would it be feasible to capture this in a sort of a regression (TAP?) test?

--
Alex


Re: feature idea: use index when checking for NULLs before SET NOT NULL

2020-05-29 Thread Oleksandr Shulgin
On Fri, May 29, 2020 at 8:56 AM Sergei Kornilov  wrote:

> Hello
>
> Correct index lookup is a difficult task. I tried to implement this
> previously...
>
> But the answer in SO is a bit incomplete for recent postgresql releases.
> Seqscan is not the only possible way to set not null in pg12+. My patch was
> commited ( https://commitfest.postgresql.org/22/1389/ ) and now it's
> possible to do this way:
>
> alter table foos
>  add constraint foos_not_null
>  check (bar1 is not null) not valid; -- short-time exclusive lock
>
> alter table foos validate constraint foos_not_null; -- still seqscan
> entire table but without exclusive lock
>
> An then another short lock:
> alter table foos alter column bar1 set not null;
> alter table foos drop constraint foos_not_null;
>

That's really good to know, Sergei!

John, I think it's worth pointing out that Postgres most likely does a full
table scan to validate a constraint by design, and not as an optimization
oversight.  Consider what would happen if the index used for the check were
corrupted.

Cheers,
--
Alex


Re: [PATCH] Add support to psql for edit-and-execute-command

2020-05-18 Thread Oleksandr Shulgin
On Mon, May 18, 2020 at 1:30 AM Joe Wildish  wrote:

>
> Attached is a small patch for adding "edit-and-execute-command" readline
> support to psql. Bash has this concept and I miss it when using psql. It
> allows you to amend the current line in an editor by pressing "v" (when
> in vi mode) or "C-x C-e" (when in emacs mode). Those are the default
> bindings from bash although of course they can be amended in inputrc.
>

The only difference from \e is that you don't need to jump to the end of
input first, I guess?

--
Alex


Re: Poll: are people okay with function/operator table redesign?

2020-05-05 Thread Oleksandr Shulgin
On Mon, May 4, 2020 at 11:22 PM Tom Lane  wrote:

> I've now completed updating chapter 9 for the new layout,
> and the results are visible at
> https://www.postgresql.org/docs/devel/functions.html
> There is more to do --- for instance, various contrib modules
> have function/operator tables that should be synced with this
> design.  But this seemed like a good place to pause and reflect.
>

Would it be premature to complain about the not-that-great look of Table
9.1 now?

Compare the two attached images: the screenshot from
https://www.postgresql.org/docs/devel/functions-comparison.html
vs the GIMP-assisted pipe dream of mine to align it to the right edge of
the table cell.

I don't have the faintest idea how to achieve that using SGML at the
moment, but it just looks so much nicer to me. ;-)

Regards,
--
Alex


Re: do {} while (0) nitpick

2020-05-04 Thread Oleksandr Shulgin
On Fri, May 1, 2020 at 3:52 AM Bruce Momjian  wrote:
>
> On Thu, Apr 30, 2020 at 09:51:10PM -0400, Tom Lane wrote:
> > John Naylor  writes:
> > > As I understand it, the point of having "do {} while (0)" in a
> > > multi-statement macro is to turn it into a simple statement.
> >
> > Right.
> >
> > > As such,
> > > ending with a semicolon in both the macro definition and the
> > > invocation will turn it back into multiple statements, creating
> > > confusion if someone were to invoke the macro in an "if" statement.
> >
> > Yeah.  I'd call these actual bugs, and perhaps even back-patch worthy.
>
> Agreed.  Those semicolons could easily create bugs.
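
The hazard being discussed can be sketched in a few lines (names here are
illustrative, not from the PostgreSQL tree):

```c
#include <stdio.h>

/* A correctly written multi-statement macro: the do { } while (0)
 * wrapper, with NO trailing semicolon in the definition, makes the
 * whole expansion a single statement. */
#define SWAP(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Because SWAP(...) expands to one statement, the invocation below is
 * legal in an if/else.  Had the macro definition ended with ';', the
 * invocation plus its own ';' would become two statements, the 'else'
 * would no longer attach, and the code would fail to compile. */
static void
order_ascending(int *a, int *b)
{
    if (*a > *b)
        SWAP(*a, *b);
    else
        printf("already ordered\n");
}
```

For example, calling order_ascending() on the pair (5, 3) swaps it into
(3, 5); with the buggy trailing semicolon, the function would not even build.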

It was a while ago that I last checked our Developer guide over at
PostgreSQL wiki website, but I wonder if this is a sort of issue that
modern linters would be able to recognize?

The only hit for "linting" search on the wiki is this page referring to the
developer meeting in Ottawa about a year ago:
https://wiki.postgresql.org/wiki/PgCon_2019_Developer_Meeting

> Other major projects include:
>   ...
>   Code linting

Anybody aware what's the current status of that effort?

Cheers,
--
Alex


Re: Proposing WITH ITERATIVE

2020-04-28 Thread Oleksandr Shulgin
On Tue, Apr 28, 2020 at 5:49 AM Jonah H. Harris 
wrote:

> On Mon, Apr 27, 2020 at 11:33 PM David Fetter  wrote:
>
>>
>> Have the authors agreed to make it available to the project under a
>> compatible license?
>
>
> If there’s interest, obviously. Otherwise I wouldn’t be asking.
>
> I said from the start why I wasn’t attaching a patch and that I was seeking
> feedback. Honestly, David, stop wasting my, and the list's, time asking
> pointless off-topic questions.
>

Jonah,

I see it the other way round—it could end up as a waste of everyone's time
discussing the details, if the authors don't agree to publish their code in
the first place.  Of course, it could also be written from scratch, in
which case I guess someone else from the community (who hasn't seen that
private code) would have to take a stab at it, but I believe it helps to
know this in advance.

I also don't see how it "obviously" follows from your two claims: "I've
found this functionality" and "I'll clean-up their patch and submit", that
you have even asked (or, for that matter—found) the authors of that code.

Finally, I'd like to suggest you adopt a more constructive tone and become
familiar with the project's Code of Conduct[1], if you haven't yet.  I am
certain that constructive, respectful communication from your side will
help the community to focus on the actual details of your proposal.

Kind regards,
-- 
Alex

[1] https://www.postgresql.org/about/policies/coc/


Re: Make java client lib accept same connection strings as psql

2020-02-24 Thread Oleksandr Shulgin
On Sat, Feb 22, 2020 at 4:05 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Fri, Feb 21, 2020 at 6:21 PM Michael Leonhard 
> wrote:
>
>> How about making the Java client library accept the same connection
>> strings as psql and other command-line tools?  [...]
>>
>
> That falls outside the scope of this list/project.  The separate pgJDBC
> project team would need to decide to take that up.
>

Michael,

While your proposed end goal sounds like a desirable thing to me, there are
certain obstacles to it, unfortunately.

First, consider that the URL support appeared in libpq after, not before,
the support in the JDBC driver.
Second, the common subset of allowed connection parameters between the two
is only as big as "host", "port", "database", "user" and "password".

Additionally, libpq understands the "ssl=true" parameter for JDBC
compatibility, though I don't think that was a good idea in the end.  For
one, in the JDBC world "ssl=false" is treated exactly the same as
"ssl=true" or any other value, which is questionable design in the first
place.  And even if you could use exactly the same URL in both libpq-based
and JDBC clients, without running into syntax errors, the semantics of
"ssl=true" is subtly different between the two: in the former case, the
client does not attempt to validate certificate, nor the hostname, as
opposed to the latter.

As to your actual example, the part of the syntax that is treated
differently in libpq is the "userinfo":
https://tools.ietf.org/html/rfc3986#section-3.2.1
The JDBC driver could be extended to support this as well, as such a change
is backwards-compatible.  As David has pointed out, this question should be
asked to the PgJDBC project.

Lastly, the RFC provides some good arguments as to why providing username
and, especially, password in the connection URL is undesirable.  While the
"user:password@host" or "?user=fred=secret" syntax can be handy
for local testing, this is definitely not something that should be used in
production.  Both libpq and the JDBC driver provide ways to pass login
credentials in a more secure manner.

Kind regards,
--
Alex


Re: POC: converting Lists into arrays

2019-07-03 Thread Oleksandr Shulgin
On Tue, Jul 2, 2019 at 5:12 PM Tom Lane  wrote:

> Oleksandr Shulgin  writes:
> > Not related to the diff v6..v7, but shouldn't we additionally do a
> > memset() with '\0' before calling pfree():
>
> I don't see the point of that.  In debug builds CLOBBER_FREED_MEMORY will
> take care of it, and in non-debug builds I don't see why we'd expend
> the cycles.
>

This is what I was wondering about, thanks for providing a reference.

--
Alex


Re: POC: converting Lists into arrays

2019-07-02 Thread Oleksandr Shulgin
On Tue, Jul 2, 2019 at 1:27 AM Tom Lane  wrote:

>
> So I think this is a win, and attached is v7.
>

Not related to the diff v6..v7, but shouldn't we additionally do a
memset() with '\0' before calling pfree():

+ListCell   *newelements;
+
+newelements = (ListCell *)
+MemoryContextAlloc(GetMemoryChunkContext(list),
+   new_max_len * sizeof(ListCell));
+memcpy(newelements, list->elements,
+   list->length * sizeof(ListCell));
+pfree(list->elements);
+list->elements = newelements;

Or is this somehow ensured by debug pfree() implementation or does it work
differently together with Valgrind?

Otherwise it seems that the calling code can still be hanging onto a list
element from a freed chunk and (rather) happily accessing it, as opposed to
almost ensured crash if that is zeroed before returning from enlarge_list().
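
A minimal standalone sketch of the idea, with plain malloc/free standing in
for the memory-context calls and all names purely illustrative:

```c
#include <stdlib.h>
#include <string.h>

/* Grow an array by allocating a new one, copying the old contents over,
 * and zeroing the old storage before freeing it.  The memset() is the
 * point of the question above: a caller still holding a pointer into
 * the old array would then read zeroes and (hopefully) crash early,
 * instead of reading stale-but-plausible data. */
typedef struct
{
    int datum;
} Cell;

static Cell *
enlarge(Cell *old, size_t old_len, size_t new_len)
{
    Cell *new_cells = malloc(new_len * sizeof(Cell));

    memcpy(new_cells, old, old_len * sizeof(Cell));
    memset(old, 0, old_len * sizeof(Cell)); /* poison before freeing */
    free(old);
    return new_cells;
}
```

In a debug build of PostgreSQL, CLOBBER_FREED_MEMORY achieves a similar
effect by overwriting freed chunks, which is the point Tom makes in the
follow-up above.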

Cheers,
--
Alex


Re: Analyze all plans

2019-01-23 Thread Oleksandr Shulgin
On Wed, Jan 23, 2019 at 9:44 AM Donald Dong  wrote:

>
> 1. Enumerate all the plans
>

Not sure this is going to work, because the total number of possible plans
is somewhere around O(n!), if I'm not mistaken: roughly the number of join
orders of the relations, times the available access methods, times the
possible join strategies.

So enumerating all possible plans stops being practical for even slightly
complicated queries.
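
As a rough illustration of the growth rate (a back-of-the-envelope sketch
that counts only permutations, ignoring access methods and join strategies):

```c
/* Number of possible join orders of n relations, if every permutation
 * were considered: n!.  The real search space is larger still, since
 * each ordering multiplies with the available access methods and join
 * strategies. */
static unsigned long long
join_orders(unsigned int n)
{
    unsigned long long f = 1;

    while (n > 1)
        f *= n--;
    return f;
}
```

Already at ten relations this gives 3,628,800 orderings, which is why
planners fall back to heuristics rather than exhaustive enumeration for
larger joins.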

Regards,
--
Alex


Re: How can we submit code patches that implement our (pending) patents?

2018-07-26 Thread Oleksandr Shulgin
On Wed, Jul 25, 2018 at 8:35 PM Nico Williams  wrote:

>
> What are you proposing anyways?  That every commit come with a patent
> search?
>

I would propose that everyone wasting their time and effort to discuss this
issue here, would rather spend that time working towards putting an end to
software patents instead.
That would be a much better use of the time, IMHO.

Just my 2c,
--
Alex


Re: psql's \d versus included-index-column feature

2018-07-19 Thread Oleksandr Shulgin
On Thu, Jul 19, 2018 at 1:11 AM Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Wed, Jul 18, 2018 at 11:14 PM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>> On Wed, Jul 18, 2018 at 12:55 PM, Tom Lane  wrote:
>>
>>>
>>> regression=# \d tbl_include_reg_idx
>>> Index "public.tbl_include_reg_idx"
>>>  Column |  Type   | Key | Definition
>>> --------+---------+-----+------------
>>>  c1     | integer | t   | c1
>>>  c2     | integer | t   | c2
>>>  c3     | integer | f   | c3
>>>  c4     | box     | f   | c4
>>> btree, for table "public.tbl_include_reg"
>>>
>>
>> ​+1 for the additional column indicating whether the column is being
>> treated as key data or supplemental included data.​
>>
>
> +1
> And especially I don't think we should place word "INCLUDE" to the
> definition column.
>
> ​-1 for printing a boolean t/f; would rather spell it out:
>>
>
> IMHO, t/f have advantage of brevity.  From my point of view, covering
> indexes are not so evident feature.  So, users need to spend some time
> reading documentation before realizing what they are and how to use them.
> So, I don't expect that short designation of INCLUDE columns as "non-key"
> (Key == false) columns could be discouraging here.
>

I don't think there is an established practice on how to display this sort
of info, but I see that both styles already exist, namely:

=# \dL
                          List of languages
   Name    |  Owner   | Trusted |         Description
-----------+----------+---------+------------------------------
 plpgsql   | postgres | t       | PL/pgSQL procedural language
 plproxy   | postgres | f       |
...

and

=# \dC
                                   List of casts
 Source type |         Target type         |      Function      |   Implicit?
-------------+-----------------------------+--------------------+---------------
 abstime     | date                        | date               | in assignment
 abstime     | integer                     | (binary coercible) | no
 abstime     | timestamp without time zone | timestamp          | yes
...

I like the second option more, for readability reasons and because it is
easier to extend if ever needed.

Given that the documentation refers to included columns as "non-key
columns", it seems natural to me to name the \d output column "Key?" and
use "yes/no" as the values.

Regards,
--
Alex


Re: Setting libpq TCP keepalive parameters from environment

2018-05-09 Thread Oleksandr Shulgin
On Wed, May 9, 2018 at 1:58 AM, Craig Ringer  wrote:

> >
> > It would be much more convenient to just set the environment variable
> when
> > running the script and let it affect the whole process and its children.
> >
> > Would a patch be welcome?
>
> I can't really comment on that part, but:
>
> PGOPTIONS="-c tcp_keepalives_count=5 -c tcp_keepalives_interval=60"
> psql 'host=localhost'
>
> should enable server-side keepalives. Unsure how much that helps you
> if you need client-side keepalives too.
>

Good point, in our specific case it appears to work as well if it's the
server who sends the keepalives.

Thank you,
--
Alex


Re: [HACKERS] [PATCH] Generic type subscripting

2018-03-20 Thread Oleksandr Shulgin
On Tue, Mar 6, 2018 at 6:21 PM, Dmitry Dolgov <9erthali...@gmail.com> wrote:

>
> One more small update after fd1a421fe6 in attachments.
>

Before looking at the code I have a few comments about documentation:

in json.sgml:

+-- Extract value by key
+SELECT ('{"a": 1}'::jsonb)['a'];

What is the result of running this query?  What is the resulting data type?

+-- Extract nested value by key path
+SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c'];
+
+-- Extract element by index
+SELECT ('[1, "2", null]'::jsonb)['1'];

What is the result here?  Why is the subscript a string and not a number?
Are subscript indexes 0- or 1-based?

+-- Update value by key
+UPDATE table_name set jsonb_field['key'] = 1;
+
+-- Select records using where clause with subscripting
+SELECT * from table_name where jsonb_field['key'] = '"value"';

Please capitalize: SET, FROM, WHERE.

Use of double quotes around "value" requires some explanation, I think.
Should the user expect that a suitable index is used by the query planner
for this query?

In other words, I would like to see this part of documentation to be
extended beyond just showcasing the syntax.

Regards,
-- 
Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176
127-59-707


Re: Estimate maintenance_work_mem for CREATE INDEX

2017-12-19 Thread Oleksandr Shulgin
On Tue, Dec 19, 2017 at 10:47 AM, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> (cross-posting admin and hackers)
>
> Hello,
>
> I wonder if I'm alone in my wish to have a way for estimating how much
> maintenance work memory would suffice to allocate for a session when
> creating an index and avoid spilling to disk?
>
> Recently I had to re-create some indexes on a 9.6 server and I had some
> input on the on-disk index size: one was around 30 GB, the other -- a bit
> over 60 GB according to \di+ output.  The total number of live tuples in
> the table itself was close to 1.3e+9, the table had an estimated 25% bloat.
>
> I had some spare memory on the machine so I've given it 60 GB for
> maintenance_work_mem and expected that at least the smaller of the two will
> fit in memory completely.  To my surprise that didn't suffice and both
> indexes were building with some disk spill.
>
> Is anyone aware of a query to estimate the memory requirements for CREATE
> INDEX [CONCURRENTLY]?
>
> I've looked in the postgres wiki, but didn't find anything to that end.
> Nor searching the archives of pgsql-admin did help.
>
> I understand that there were some changes in recent releases related to
> memory allocation (e.g. allowing huge allocation in 9.4), but at least
> targeting 9.6 or 10 would make sense.  There are also a lot of ways how one
> CREATE INDEX can be different from the other, but in the most simple case
> where you have fixed-width columns and building the full index (i.e. no
> WHERE clause), it should be possible.
>

Now I see I fail to mention this is the default btree index with all
default options.  Obviously other indexes can be very different in memory
requirements.

> Not rushing to look in the source to calculate all the sizeof()s yet:
> waiting on your reply and suggestions. ;-)
>

If there would be an option in the database itself to provide those
estimation, we wouldn't even need to figure out estimation queries.
"EXPLAIN CREATE INDEX" anyone?

Regards,
-- 
Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176
127-59-707


Estimate maintenance_work_mem for CREATE INDEX

2017-12-19 Thread Oleksandr Shulgin
(cross-posting admin and hackers)

Hello,

I wonder if I'm alone in my wish to have a way for estimating how much
maintenance work memory would suffice to allocate for a session when
creating an index and avoid spilling to disk?

Recently I had to re-create some indexes on a 9.6 server and I had some
input on the on-disk index size: one was around 30 GB, the other -- a bit
over 60 GB according to \di+ output.  The total number of live tuples in
the table itself was close to 1.3e+9, the table had an estimated 25% bloat.

I had some spare memory on the machine so I've given it 60 GB for
maintenance_work_mem and expected that at least the smaller of the two will
fit in memory completely.  To my surprise that didn't suffice and both
indexes were building with some disk spill.
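
For a sense of scale, a back-of-the-envelope estimate with purely assumed
per-entry constants (the bookkeeping and payload sizes below are
illustrative guesses, not figures taken from the PostgreSQL source) already
lands near the configured limit:

```c
/* Crude lower bound on the memory needed to sort n index entries
 * in-memory: per-entry sort bookkeeping plus the entry payload.
 * Both byte counts are assumptions for illustration only. */
static double
estimate_sort_bytes(double n_tuples, double overhead_bytes, double key_bytes)
{
    return n_tuples * (overhead_bytes + key_bytes);
}
```

With the 1.3e+9 live tuples mentioned above and, say, an assumed 24 bytes of
bookkeeping plus 16 bytes of payload per entry, this yields on the order of
48 GiB, uncomfortably close to the 60 GB that was configured, before any
other allocations are counted.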

Is anyone aware of a query to estimate the memory requirements for CREATE
INDEX [CONCURRENTLY]?

I've looked in the postgres wiki, but didn't find anything to that end.
Nor searching the archives of pgsql-admin did help.

I understand that there were some changes in recent releases related to
memory allocation (e.g. allowing huge allocation in 9.4), but at least
targeting 9.6 or 10 would make sense.  There are also a lot of ways how one
CREATE INDEX can be different from the other, but in the most simple case
where you have fixed-width columns and building the full index (i.e. no
WHERE clause), it should be possible.

Not rushing to look in the source to calculate all the sizeof()s yet:
waiting on your reply and suggestions. ;-)

Cheers!
-- 
Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176
127-59-707