Re: cfbot update: Using GitHub for patch review

2024-06-21 Thread Josef Šimánek
On Fri, Jun 21, 2024 at 16:36, Jelte Fennema-Nio wrote:
>
> I recently got write access to the cfbot repo[1] and machine from
> Thomas. And I deployed a few improvements this week. The most
> significant one is that it is now much easier to use GitHub as part of
> your patch review workflow.
>
> On the cfbot website[2] there's now a "D" (diff) link next to each
> commit fest entry. A good example of such a link would be the one for
> my most recent commitfest entry[3]. There is a separate commit for
> each patch file and those commits contain the "git format-patch"
> metadata. (this is not done using git am, but using git mailinfo +
> patch + sed, because git am is horrible at resolving conflicts)

This is brilliant!
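For anyone curious, the "git mailinfo + patch" approach mentioned above can be sketched in a throwaway repository. This is a minimal illustration assuming git and patch are installed; cfbot's real scripts also use sed and handle conflicts differently, and all file names here are made up:

```shell
# Sketch: apply a "git format-patch" file without "git am",
# using git mailinfo + patch instead.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email demo@example.com
git config user.name demo
printf 'one\n' > f.txt
git add f.txt && git commit -qm "init"
printf 'two\n' >> f.txt
git commit -qam "add two" -m "append a second line to f.txt"
git format-patch -1 --stdout > ../change.patch
git reset -q --hard HEAD~1            # pretend we only have the patch file
# mailinfo splits the mail: commit message body -> ../msg, diff -> ../diff,
# author/subject/date lines -> stdout
git mailinfo ../msg ../diff < ../change.patch > ../author
patch -p1 < ../diff                   # apply the diff without "git am"
git add -A && git commit -qF ../msg   # recreate the commit
```

Note that git mailinfo writes only the commit message body to the message file; the subject and author lines end up on its stdout, which a real script would parse and pass along to git commit.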

> The killer feature (imho) of GitHub diffs over looking at patch files:
> You can press the "Expand up"/"Expand down" buttons on the left of the
> diff to see some extra context that the patch file doesn't contain.
>
> You can also add the cfbot repo as a remote to your local git
> repository. That way you don't have to manually download patches and
> apply them to your local checkout anymore:
>
> # Add the remote
> git remote add -f cfbot https://github.com/postgresql-cfbot/postgresql.git
> # make future git pulls much quicker (optional)
> git maintenance start
> # check out a commitfest entry
> git checkout cf/5065
>
> P.S. Suggestions for further improvements are definitely appreciated.
> We're currently already working on better integration between the
> commitfest app website and the cfbot website.
>
> P.P.S The "D" links don't work for patches that need to be rebased
> since before I deployed this change, but that problem should fix
> itself with time.
>
> [1]: https://github.com/macdice/cfbot
> [2]: http://cfbot.cputube.org/
> [3]: https://github.com/postgresql-cfbot/postgresql/compare/cf/5065~1...cf/5065
>
>




Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Josef Šimánek
On Sat, May 18, 2024 at 23:29, Yasir wrote:
>
>
>
> On Sun, May 19, 2024 at 2:23 AM Josef Šimánek  wrote:
>>
>> On Sat, May 18, 2024 at 23:16, Tom Lane wrote:
>> >
>> > Josef Šimánek  writes:
>> > > But this is different. If I understand it well, just by following
>> > > https://www.postgresql.org/docs/16/install-windows-full.html you'll
>> > > get those files no matter what your specific environment (or
>> > > specific set of tools) is.
>> >
>> > Hm?  Visual Studio seems like quite a specific tool from here.
>>
>> I initially thought the .vs folder is created just by compiling
>> PostgreSQL using build.bat (like without opening Visual Studio at
>> all). But I'm not 100% sure, I'll take a look and report back.
>
>
> The .vs folder is not created just by compiling PG. It is created if you open 
> any of the .sln, .vcproj or .vcxproj files.
> I have verified this.

Yes, I can confirm. Just running build.bat doesn't create .vs. I'm
sorry for the confusion, and I agree that ignoring the ".vs" directory
is a local environment concern that doesn't belong in the Postgres
.gitignore.

>>
>>
>> > I did some googling around the question of project .gitignore
>> > files ignoring .vs/, and was amused to come across this:
>> >
>> > https://github.com/github/gitignore/blob/main/VisualStudio.gitignore
>> >
>> > which seems like a mighty fine example of where we *don't*
>> > want to go.
>>
>> That's clearly a nightmare to maintain. But in this case it should be
>> all hidden within one .vs folder.
>>
>> > regards, tom lane




Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Josef Šimánek
On Sat, May 18, 2024 at 23:16, Tom Lane wrote:
>
> Josef Šimánek  writes:
> > But this is different. If I understand it well, just by following
> > https://www.postgresql.org/docs/16/install-windows-full.html you'll
> > get those files no matter what your specific environment (or
> > specific set of tools) is.
>
> Hm?  Visual Studio seems like quite a specific tool from here.

I initially thought the .vs folder is created just by compiling
PostgreSQL using build.bat (i.e. without opening Visual Studio at
all). But I'm not 100% sure, so I'll take a look and report back.

> I did some googling around the question of project .gitignore
> files ignoring .vs/, and was amused to come across this:
>
> https://github.com/github/gitignore/blob/main/VisualStudio.gitignore
>
> which seems like a mighty fine example of where we *don't*
> want to go.

That's clearly a nightmare to maintain. But in this case it should be
all hidden within one .vs folder.

> regards, tom lane




Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Josef Šimánek
On Sat, May 18, 2024 at 22:36, Tom Lane wrote:
>
> Yasir  writes:
> > We could add it to "~/.config/git/ignore", but that would ignore it
> > globally on Windows, which we don't want. Also, we don't have
> > ".git/info/exclude" in the PG project, so the best place left is the
> > project's .gitignore. That's what was patched.
>
> As Peter said, we're not going to do that.  The intention with
> the project's .gitignore files is to ignore files that are
> intentionally built by our "make" targets (and, hopefully, will be
> removed by "make maintainer-clean").  Anything else that you want
> git to ignore should be in a personal ignore list; especially
> files that are platform-specific.  The fact that it's reasonable
> to ignore ".vs" files when working with your toolset doesn't mean
> that it's reasonable to ignore them when working on some other
> platform.
>
> If we used some other policy, we'd have tons of debates about
> which files were reasonable to exclude.  For myself, for example,
> I exclude "*~" (Emacs backup files) and "*.orig" (patch(1)
> backup files) but those choices are very much dependent on the
> set of tools I choose to use.  Other developers have other
> personal exclusion lists.  If we tried to make the project's
> files be the union of all those lists, we'd be at serious risk
> of ignoring stuff we absolutely shouldn't ignore in some contexts.

But this is different. If I understand it well, just by following
https://www.postgresql.org/docs/16/install-windows-full.html you'll
get those files no matter what your specific environment (or
specific set of tools) is.

> regards, tom lane




Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-18 Thread Josef Šimánek
On Fri, May 17, 2024 at 8:09, Yasir wrote:
>
> Hi Hackers,
>
> I have been playing with PG on the Windows platform recently. An annoying 
> thing I faced is that a lot of Visual Studio's temp files kept appearing as 
> changed files in git. Therefore, I am submitting this very trivial patch to 
> ignore these temp files.

See
https://docs.github.com/en/get-started/getting-started-with-git/ignoring-files#configuring-ignored-files-for-all-repositories-on-your-computer
for various strategies.

Anyway, if those are not files specific to your setup (like editor
ones), but files that every PG hacker on Windows will generate as
well (which is the case here, IMHO), it makes sense to add them to
the project's .gitignore.
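For reference, the per-clone and per-user alternatives covered by that link can be sketched like this (a minimal demo in a throwaway repository; the `.vs/` pattern is just the example from this thread):

```shell
# Demo: ignore local-only files without touching the project's .gitignore.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
mkdir .vs && touch .vs/solution.db    # stand-in for Visual Studio temp files

# Option 1: per-clone ignore list, lives only in this checkout, never committed
echo ".vs/" >> .git/info/exclude

git check-ignore -q .vs/solution.db && echo "ignored in this clone"

# Option 2: per-user ignore list, applies to every repository of this user
# (commented out here so the demo doesn't touch your global config):
#   git config --global core.excludesFile ~/.config/git/ignore
#   echo ".vs/" >> ~/.config/git/ignore
```

Both options keep platform- or editor-specific patterns out of the shared repository, which is exactly the policy Tom describes below for backup files and other tool-specific clutter.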

> Looking forward to the PG guru's guidance!
>
> Regards...
>
> Yasir Hussain
> Principal Software Engineer
> Bitnine Global Inc.
>




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Josef Šimánek
On Wed, May 15, 2024 at 21:33, David G. Johnston wrote:
>
> On Wed, May 15, 2024 at 12:18 PM  wrote:
>>
>> Tom Lane:
>> >> This is really what is missing for the ecosystem. A libpqparser for
>> >> tools to use: Formatters, linters, query rewriters, simple syntax
>> >> checkers... they are all missing access to postgres' own parser.
>> >
>> > To get to that, you'd need some kind of agreement on what the syntax
>> > tree is.  I doubt our existing implementation would be directly useful
>> > to very many tools, and even if it is, do they want to track constant
>> > version-to-version changes?
>>
>> Correct, on top of what the syntax tree currently has, one would
>> probably need:
>> - comments
>> - locations (line number / character) for everything, including those of
>> comments
>>
>
> I'm with the original patch idea at this point.  A utility that simply runs 
> text through the parser, not parse analysis, and answers the question: "Were 
> you able to parse this?" has both value and seems like something that can be 
> patched into core in a couple of hundred lines, not thousands, as has already 
> been demonstrated.
>
> Sure, other questions are valid and other goals exist in the ecosystem, but 
> that doesn't diminish this one which is sufficiently justified for my +1 on 
> the idea.
>
> Now, in my ideal world something like this could be made as an extension so 
> that it can work on older versions and not have to be maintained by core.  
> And likely grow more features over time.  Is there anything fundamental about 
> this that prevents it being implemented in an extension and, if so, what can 
> we add to core in v18 to alleviate that limitation?

Like an extension providing an additional binary? Or what kind of
extension do you mean? One of the original ideas was to be able to
parse a query without running postgres itself. Could an extension be
useful without a running postgres backend?

> David J.
>




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Josef Šimánek
On Wed, May 15, 2024 at 21:28, Tom Lane wrote:
>
> Robert Haas  writes:
> > On Wed, May 15, 2024 at 2:39 PM Tom Lane  wrote:
> >> The thing that was bothering me most about this is that I don't
> >> understand why that's a useful check. ...
>
> > But I don't understand the idea that the concept doesn't make sense.
>
> Sorry: "make sense" was a poorly chosen phrase there.  What I was
> doubting, and continue to doubt, is that 100% checking of what
> you can check without catalog access and 0% checking of the rest
> is a behavior that end users will find especially useful.

But that's a completely different feature; the two are not mutually
exclusive, and it shouldn't block this feature from doing exactly
what is specified.

> > I think it is perfectly reasonable to imagine a world in which the
> > initial parsing takes care of reporting everything that can be
> > determined by static analysis without knowing anything about catalog
> > contents, and parse analysis checks all the things that require
> > catalog access, and you can run the first set of checks and then
> > decide whether to proceed further. I think if I were designing a new
> > system from scratch, I'd definitely want to set it up that way, and I
> > think moving our existing system in that direction would probably let
> > us clean up a variety of warts along the way. Really, the only
> > argument I see against it is that getting from where we are to there
> > would be a gigantic amount of work for the value we'd derive.
>
> I'm less enthusiastic about this than you are.  I think it would likely
> produce a slower and less maintainable system.  Slower because we'd
> need more passes over the query: what parse analysis does today would
> have to be done in at least two separate steps.  Less maintainable
> because knowledge about certain things would end up being more spread
> around the system.  Taking your example of getting syntax checking to
> recognize invalid EXPLAIN keywords: right now there's just one piece
> of code that knows what those options are, but this'd require two.
> Also, "run the first set of checks and then decide whether to proceed
> further" seems like optimizing the system for broken queries over
> valid ones, which I don't think is an appropriate goal.
>
> Now, I don't say that there'd be *no* benefit to reorganizing the
> system that way.  But it wouldn't be an unalloyed win, and so I
> share your bottom line that the costs would be out of proportion
> to the benefits.
>
> regards, tom lane




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Josef Šimánek
On Wed, May 15, 2024 at 20:39, Tom Lane wrote:
>
> Josef Šimánek  writes:
> > I'm not sure everyone in this thread understands the reason for this
> > patch, which is clearly my fault, since I have failed to explain. The
> > main idea is to make a tool that validates that a query can be parsed,
> > that's all. Similar to running "EXPLAIN query", but not caring about
> > the result and not caring about the DB structure (ignoring missing
> > tables, ...), just checking that it was parsed successfully. This
> > definitely belongs on the server side and not the client side; it is
> > just a tool to validate that, for this running PostgreSQL backend, a
> > query is "parseable".
>
> The thing that was bothering me most about this is that I don't
> understand why that's a useful check.  If I meant to type
>
> UPDATE mytab SET mycol = 42;
>
> and instead I type
>
> UPDATEE mytab SET mycol = 42;
>
> your proposed feature would catch that; great.  But if I type
>
> UPDATE mytabb SET mycol = 42;
>
> it won't.  How does that make sense?  I'm not entirely sure where
> to draw the line about what a "syntax check" should catch, but this
> seems a bit south of what I'd want in a syntax-checking editor.

This is exactly where the line is drawn. My motivation is not to use
this feature for syntax checking in an editor (even though it could be
used to find those banalities). I'm looking to improve automation to
detect those banalities as early as possible. Let's say there is a
complex CI pipeline that configures and starts a PostgreSQL backend,
loads some data, ... and in the end all of this is useless, since
there is a simple mistake like UPDATEE. I would like to detect this
problem as early as possible and stop the complex CI pipeline, saving
time and also resources (= money) by failing super early and reporting
back. This kind of mistake can easily be introduced by a wrongly
resolved git conflict, a human typing error, ...

This kind of mistake (a typo, ...) can usually be spotted super early:
in compiled languages during compilation, and in interpreted languages
(like Ruby) at program start (since the code is parsed as one of the
first steps). No such early detection is currently possible for
PostgreSQL, IMHO.
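As an illustration of that CI use case, a fail-fast stage might look like the following hypothetical GitHub Actions fragment. Note that `postgres --syntax` is the flag proposed in this thread, not an existing option, and the `sql/*.sql` path is made up:

```yaml
# Hypothetical fail-fast stage: reject unparseable SQL before the
# expensive steps (provisioning PostgreSQL, loading data, ...) start.
jobs:
  sql-parse-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check that all SQL files parse
        run: |
          for f in sql/*.sql; do
            postgres --syntax < "$f" || exit 1   # proposed flag, not in core
          done
  full-pipeline:
    needs: sql-parse-check   # only runs if the cheap check passed
    runs-on: ubuntu-latest
    steps:
      - run: echo "expensive setup, data load and tests here"
```

The point of the sketch is simply the `needs:` dependency: a one-second parse check gates the expensive jobs, so a typo like UPDATEE fails the pipeline immediately.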

> BTW, if you do feel that a pure grammar check is worthwhile, you
> should look at the ecpg preprocessor, which does more or less that
> with the SQL portions of its input.  ecpg could be a better model
> to follow because it doesn't bring all the dependencies of the server
> and so is much more likely to appear in a client-side installation.
> It's kind of an ugly, unmaintained mess at the moment, sadly.
>
> The big knock on doing this client-side is that there might be
> version skew compared to the server you're using --- but if you
> are not doing anything beyond a grammar-level check then your
> results are pretty approximate anyway, ISTM.  We've not heard
> anything suggesting that version skew is a huge problem for
> ecpg users.

Thanks for the info, I'll check.

> regards, tom lane




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Josef Šimánek
On Fri, Dec 15, 2023 at 15:50, Tom Lane wrote:
>
> Laurenz Albe  writes:
> > On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:
> >> Inspired by Simon Riggs' keynote talk at PGConf.eu 2023 sharing a list
> >> of ideas for PostgreSQL
> >> (https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
> >> quick patch to do SQL syntax validation.
> >>
> >> What do you think?
>
> > I like the idea.  But what will happen if the SQL statement references
> > tables or other objects, since we have no database?
>
> This seems like a fairly useless wart to me.

this hurts :'(

>
> In the big picture a command line switch in the postgres executable
> doesn't feel like the right place for this.  There's no good reason
> to assume that the server executable will be installed where you want
> this capability; not to mention the possibility of version skew
> between that executable and whatever installation you're actually
> running on.
>
> Another thing I don't like is that this exposes to the user what ought
> to be purely an implementation detail, namely the division of labor
> between gram.y (raw_parser()) and the rest of the parser.  There are
> checks that a user would probably see as "syntax checks" that don't
> happen in gram.y, and conversely there are some things we reject there
> that seem more like semantic than syntax issues.
>
> regards, tom lane




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-15 Thread Josef Šimánek
On Wed, May 15, 2024 at 19:43, Robert Haas wrote:
>
> On Sun, Feb 25, 2024 at 5:24 PM Tomas Vondra
>  wrote:
> > I think there's about 0% chance we'd add this to "postgres" binary.
>
> Several people have taken this position, so I'm going to mark
> https://commitfest.postgresql.org/48/4704/ as Rejected.
>
> My own opinion is that syntax checking is a useful thing to expose,
> but I don't believe that this is a useful way to expose it. I think
> this comment that Tom made upthread is right on target:
>
> # Another thing I don't like is that this exposes to the user what ought
> # to be purely an implementation detail, namely the division of labor
> # between gram.y (raw_parser()) and the rest of the parser.  There are
> # checks that a user would probably see as "syntax checks" that don't
> # happen in gram.y, and conversely there are some things we reject there
> # that seem more like semantic than syntax issues.
>
> I think that what that means in practice is that, while this patch may
> seem to give reasonable results in simple tests, as soon as you try to
> do slightly more complicated things with it, it's going to give weird
> results, either failing to flag things that you'd expect it to flag,
> or flagging things you'd expect it not to flag. Fixing that would be
> either impossible or a huge amount of work depending on your point of
> view. If you take the point of view that we need to adjust things so
> that the raw parser reports all the things that ought to be reported
> by a tool like this and none of the things that it shouldn't, then
> it's probably just a huge amount of work. If you take the point of
> view that what goes into the raw parser and what goes into parse
> analysis ought to be an implementation decision untethered to what a
> tool like this ought to report, then fixing the problems would be
> impossible, or at least, it would amount to throwing away this patch
> and starting over. I think the latter point of view, which Tom has
> already taken, would be the more prevalent view among hackers by far,
> but even if the former view prevailed, who is going to do all that
> work?
>
> I strongly suspect that doing something useful in this area requires
> about two orders of magnitude more code than are included in this
> patch, and a completely different design. If it actually worked well
> to do something this simple, somebody probably would have done it
> already. In fact, there are already tools out there for validation,
> like https://github.com/okbob/plpgsql_check for example. That tool
> doesn't do exactly the same thing as this patch is trying to do, but
> it does do other kinds of validation, and it's 19k lines of code, vs.
> the 45 lines of code in this patch, which I think reinforces the point
> that you need to do something much more complicated than this to
> create real value.
>
> Also, the patch as proposed, besides being 45 lines, also has zero
> lines of comments, no test cases, no documentation, and doesn't follow
> the PostgreSQL coding standards. I'm not saying that to be mean, nor
> am I suggesting that Josef should go fix that stuff. It's perfectly
> reasonable to propose a small patch without a lot of research to see
> what people think -- but now we have the answer to that question:
> people think it won't work. So Josef should now decide to either give
> up, or try a new approach, or if he's really sure that all the
> feedback that has been given so far is completely wrong, he can try to
> demonstrate that the patch does all kinds of wonderful things with
> very few disadvantages. But I think if he does that last, he's going
> to find that it's not really possible, because I'm pretty sure that
> Tom is right.

I'm totally OK with marking this as rejected; indeed, the 45 lines
were just a minimal patch to create a PoC (I coded it during the last
PGConf.eu lunch break) and mainly to start a discussion.

I'm not sure everyone in this thread understands the reason for this
patch, which is clearly my fault, since I have failed to explain. The
main idea is to make a tool that validates that a query can be parsed,
that's all. Similar to running "EXPLAIN query", but not caring about
the result and not caring about the DB structure (ignoring missing
tables, ...), just checking that it was parsed successfully. This
definitely belongs on the server side and not the client side; it is
just a tool to validate that, for this running PostgreSQL backend, a
query is "parseable".

I'm not giving up on this, but I hear there are various problems to
explore. If I understand it well, just running the parser on a query
doesn't guarantee the query is valid, since it can fail later for
various reasons (other than a missing table, ...). I wasn't aware of
that. Also, exposing this inside the postgres binary seems
controversial. I had an idea to expose the parser result at the SQL
level with a new command (similar to EXPLAIN), but you'd need a
running PostgreSQL backend to be able to use this capability, which is 

Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-02-26 Thread Josef Šimánek
On Mon, Feb 26, 2024 at 8:20, Jelte Fennema-Nio wrote:
>
> On Sun, 25 Feb 2024 at 23:34, Josef Šimánek  wrote:
> > Exposing parser to frontend tools makes no sense to me
>
> Not everyone seems to agree with that, it's actually already done by
> Lukas from pganalyze: https://github.com/pganalyze/libpg_query

I took a quick look. That's clearly amazing work, but it is not the
parser being exposed to the frontend (refactored). It is (per my
understanding) backend code isolated down to the minimum needed to
parse a query. It is still bound to an individual backend version and
to the backend source code. And it is close to my effort (I was about
to start with a simpler version providing not the tokens, just the
result), but instead of copying files from the backend into a separate
project and shaving them down to a minimum, the idea was to provide
the same tool with the backend utils directly.




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-02-25 Thread Josef Šimánek
On Sun, Feb 25, 2024 at 23:24, Tomas Vondra wrote:
>
>
>
> On 12/15/23 16:38, Josef Šimánek wrote:
> > On Fri, Dec 15, 2023 at 16:32, David G. Johnston wrote:
> >>
> >> On Fri, Dec 15, 2023 at 8:20 AM Josef Šimánek  
> >> wrote:
> >>>
> >>> (parser is not available
> >>> in public APIs of postgres_fe.h or libpq).
> >>
> >>
> >> What about building "libpg" that does expose and exports some public APIs 
> >> for the parser?  We can include a reference CLI implementation for basic 
> >> usage of the functionality while leaving the actual language server 
> >> project outside of core.
> >
> > A language server (LSP) can benefit from this feature, but it doesn't
> > cover all possibilities, since an LSP is meant for one purpose: running
> > in a developer's code editor. An actual syntax check is more generic,
> > able to cover CI checks and more. I would not couple this feature to
> > LSP; LSP can simply benefit from it (an LSP is usually built in a way
> > that uses other tools and packs them together). Exposing this kind of
> > SQL check doesn't mean implementing something LSP-related in core.
> >
>
> I don't know enough about LSP to have a good idea how to implement this
> for PG, but my assumption would be we'd have some sort of library
> exposing "parser" to frontend tools, and also an in-core binary using
> that library (say, src/bin/pg_syntax_check). And LSP would use that
> parser library too ...
>
> I think there's about 0% chance we'd add this to "postgres" binary.

Exposing the parser to frontend tools makes no sense to me, and even
if it did, it would be a huge project not worth doing just for a
syntax check. I can update the patch to provide a new binary, but
still on the backend side. This syntax check should be equivalent to
running a server locally, running a query, and caring only about
whether the parsing part finished successfully. In my thinking, this
belongs with the backend tools.

> > Exposing the parser via a libpg seems like a good idea, but I'm not
> > sure how simply that could be done: during my journey I found out there
> > are a lot of dependencies that are not present in usual frontend code
> > (per my understanding), like memory contexts, and removing those
> > (decoupling) would be a huge project IMHO.
> >
>
> You're right the grammar/parser expects a lot of backend infrastructure,
> so making it available to frontend is going to be challenging. But I
> doubt there's a better way than starting with gram.y and either removing
> or adding the missing pieces (maybe only a mock version of it).
>
> I'm not a bison expert, but considering your goal seems to be a basic
> syntax check, maybe you could simply rip out most of the bits depending
> on backend stuff, or maybe replace them with some trivial no-op code?
>
> But as Tom mentioned, the question is how far gram.y gets you. There's
> plenty of ereport(ERROR) calls in src/backend/parser/*.c our users might
> easily consider as parse errors ...
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2023-12-15 Thread Josef Šimánek
On Fri, Dec 15, 2023 at 16:32, David G. Johnston wrote:
>
> On Fri, Dec 15, 2023 at 8:20 AM Josef Šimánek  wrote:
>>
>> (parser is not available
>> in public APIs of postgres_fe.h or libpq).
>
>
> What about building "libpg" that does expose and exports some public APIs for 
> the parser?  We can include a reference CLI implementation for basic usage of 
> the functionality while leaving the actual language server project outside of 
> core.

A language server (LSP) can benefit from this feature, but it doesn't
cover all possibilities, since an LSP is meant for one purpose:
running in a developer's code editor. An actual syntax check is more
generic, able to cover CI checks and more. I would not couple this
feature to LSP; LSP can simply benefit from it (an LSP is usually
built in a way that uses other tools and packs them together).
Exposing this kind of SQL check doesn't mean implementing something
LSP-related in core.

Exposing the parser via a libpg seems like a good idea, but I'm not
sure how simply that could be done: during my journey I found out
there are a lot of dependencies that are not present in usual frontend
code (per my understanding), like memory contexts, and removing those
(decoupling) would be a huge project IMHO.

> David J.




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2023-12-15 Thread Josef Šimánek
On Fri, Dec 15, 2023 at 16:16, David G. Johnston wrote:
>
> On Fri, Dec 15, 2023 at 8:05 AM Josef Šimánek  wrote:
>>
>> On Fri, Dec 15, 2023 at 15:50, Tom Lane wrote:
>> >
>> > Laurenz Albe  writes:
>> > > On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:
>> > >> Inspired by Simon Riggs' keynote talk at PGConf.eu 2023 sharing a list
>> > >> of ideas for PostgreSQL
>> > >> (https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
>> > >> quick patch to do SQL syntax validation.
>> > >>
>> > >> What do you think?
>> >
>> > > I like the idea.  But what will happen if the SQL statement references
>> > > tables or other objects, since we have no database?
>> >
>> > This seems like a fairly useless wart to me.  What does it do that
>> > you can't do better with existing facilities (psql etc)?
>>
>> Per my experience, this is mostly useful during development to catch
>> syntax errors super early.
>
>
> I'd suggest helping this project get production capable since it already is 
> trying to integrate with the development ecosystem you describe here.
>
> https://github.com/supabase/postgres_lsp

Indeed, LSP is one of the targets of this feature. Currently it uses
https://github.com/pganalyze/libpg_query under the hood, probably for
the same reasons I have described (the parser is not available in the
public APIs of postgres_fe.h or libpq). Exposing basic parser
capabilities in the postgres binary itself, and also at the SQL level
via a pg_check_syntax function, could remove the need to "hack" the PG
parser to make it accessible outside the server binary.

> I agree that developing this as a new executable for the purpose is needed in 
> order to best conform to existing conventions.
>
> David J.
>




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2023-12-15 Thread Josef Šimánek
On Fri, Dec 15, 2023 at 15:50, Tom Lane wrote:
>
> Laurenz Albe  writes:
> > On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:
> >> Inspired by Simon Riggs' keynote talk at PGConf.eu 2023 sharing a list
> >> of ideas for PostgreSQL
> >> (https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
> >> quick patch to do SQL syntax validation.
> >>
> >> What do you think?
>
> > I like the idea.  But what will happen if the SQL statement references
> > tables or other objects, since we have no database?
>
> This seems like a fairly useless wart to me.  What does it do that
> you can't do better with existing facilities (psql etc)?

In my experience, this is mostly useful during development to catch
syntax errors super early. For SELECT/INSERT/UPDATE/DELETE queries, it
is usually enough to prepend EXPLAIN and run them. But EXPLAIN doesn't
support all SQL, such as DDL statements. Let's say I have a long SQL
script I'm working on and there is a typo in the middle, like ALTERR
instead of ALTER. Is there any simple way to detect this without
actually running the statement in psql or other existing facilities?
This check could simply be combined with editor capabilities and run
on each SQL file save, giving quick feedback on this kind of mistake
for a great developer experience.

> In the big picture a command line switch in the postgres executable
> doesn't feel like the right place for this.  There's no good reason
> to assume that the server executable will be installed where you want
> this capability; not to mention the possibility of version skew
> between that executable and whatever installation you're actually
> running on.

This is mostly intended for SQL developers and CI systems, where a
PostgreSQL backend is usually installed, and in the exact version
needed. I agree postgres is not the best place (even if it partially
makes sense to me), but as I mentioned, I wasn't able to craft a
quick patch with a better place to put this. What would you
recommend? A separate executable like pg_syntaxcheck?

> Another thing I don't like is that this exposes to the user what ought
> to be purely an implementation detail, namely the division of labor
> between gram.y (raw_parser()) and the rest of the parser.  There are
> checks that a user would probably see as "syntax checks" that don't
> happen in gram.y, and conversely there are some things we reject there
> that seem more like semantic than syntax issues.

I don't have deep insight into SQL parsing. Could you please share
examples of the concerns you mention? Is there anything better than
raw_parser() for this purpose?

> regards, tom lane




Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2023-12-15 Thread Josef Šimánek
On Fri, Dec 15, 2023 at 14:09, Laurenz Albe wrote:

> On Fri, 2023-12-15 at 13:21 +0100, Josef Šimánek wrote:
> > Inspired by Simon Riggs' keynote talk at PGConf.eu 2023 sharing a list
> > of ideas for PostgreSQL
> > (https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
> > quick patch to do SQL syntax validation.
> >
> > What do you think?
>
> I like the idea.  But what will happen if the SQL statement references
> tables or other objects, since we have no database?
>

It just checks that the identifier is valid from the parser's
perspective, e.g. that it is a valid table name.


> Yours,
> Laurenz Albe
>


[PATCH] Add --syntax to postgres for SQL syntax checking

2023-12-15 Thread Josef Šimánek
Hello!

Inspired by Simon Riggs' keynote talk at PGConf.eu 2023 sharing a list
of ideas for PostgreSQL
(https://riggs.business/blog/f/postgresql-todo-2023), I have crafted a
quick patch to do SQL syntax validation.

It is also heavily inspired by the "ruby -c" command, which is useful
for checking the syntax of Ruby programs without executing them.

For now, to keep it simple and to open the discussion, I have added a new
"--syntax" option to the "postgres" command, since it is currently the
only one with the needed parser dependency (at least per my
understanding). I tried to add this to psql or a separate pg_syntax
command, but the parser is not exposed in "postgres_fe.h", and including
the backend in those tools would most likely not make sense. Also, since
the syntax can vary per backend, it makes sense to keep it there.

It expects input on STDIN, prints an error if any, and prints a
summary message (Valid SQL/Invalid SQL). On valid input it exits with
0 (success); otherwise it exits with 1 (error).

Example usage:

$ echo "SELECT 1" | src/backend/postgres --syntax
Valid SQL

$ echo "SELECT 1abc" | src/backend/postgres --syntax
ERROR:  trailing junk after numeric literal at or near "1a" at character 8
Invalid SQL

$ cat ../src/test/regress/sql/alter_operator.sql | src/backend/postgres --syntax
Valid SQL

$ cat ../src/test/regress/sql/advisory_lock.sql | src/backend/postgres --syntax
ERROR:  syntax error at or near "\" at character 99
Invalid SQL

This could be useful for manual script checks, automated script checks
and code editor integrations.

Notice it just parses the SQL; it doesn't detect any "runtime"
problems like nonexistent table names, etc.

I have various ideas around this (like exposing similar functionality
directly in SQL using custom function like pg_check_syntax), but I
would like to get some feedback first.

What do you think?
PS: I wasn't able to find any automated tests for the "postgres"
command to extend; are there any?

PS2: Patch could be found at https://github.com/simi/postgres/pull/8 as well.
From 6ec6cc599678c0ac76f4559039ff3399f843b9b1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Fri, 15 Dec 2023 13:00:23 +0100
Subject: [PATCH] Add --syntax option to postgres.

- it validates SQL on STDIN and reports back if valid/invalid
---
 src/backend/main/main.c | 45 +
 1 file changed, 45 insertions(+)

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index ed11e8be7fab..eb7266f9d5d3 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -40,6 +40,7 @@
 #include "utils/memutils.h"
 #include "utils/pg_locale.h"
 #include "utils/ps_status.h"
+#include "parser/parser.h"
 
 
 const char *progname;
@@ -50,6 +51,7 @@ static void startup_hacks(const char *progname);
 static void init_locale(const char *categoryname, int category, const char *locale);
 static void help(const char *progname);
 static void check_root(const char *progname);
+static void check_syntax(void);
 
 
 /*
@@ -153,6 +155,10 @@ main(int argc, char *argv[])
 			fputs(PG_BACKEND_VERSIONSTR, stdout);
 			exit(0);
 		}
+		if (strcmp(argv[1], "--syntax") == 0)
+		{
+			check_syntax();
+		}
 
 		/*
 		 * In addition to the above, we allow "--describe-config" and "-C var"
@@ -347,6 +353,7 @@ help(const char *progname)
 	printf(_("  -s show statistics after each query\n"));
 	printf(_("  -S WORK-MEMset amount of memory for sorts (in kB)\n"));
 	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  --syntax   checks SQL on STDIN is valid, then exit\n"));
 	printf(_("  --NAME=VALUE   set run-time parameter\n"));
 	printf(_("  --describe-config  describe configuration parameters, then exit\n"));
 	printf(_("  -?, --help show this help, then exit\n"));
@@ -422,6 +429,45 @@ check_root(const char *progname)
 #endif			/* WIN32 */
 }
 
+static void
+check_syntax() {
+	List *parsetree_list;
+	bool valid = false;
+	char buffer[1024 * 1024];
+	int ch;					/* int, not char: getchar() signals EOF out of band */
+	int i = 0;
+
+	// TODO: check for buffer overflow
+	while ((ch = getchar()) != EOF)
+	{
+		buffer[i] = ch;
+		i++;
+	}
+	buffer[i] = '\0';		/* raw_parser() expects a NUL-terminated string */
+
+	PG_TRY();
+	{
+		parsetree_list = raw_parser(buffer, RAW_PARSE_DEFAULT);
+		valid = true;
+	}
+	PG_CATCH();
+	{
+		EmitErrorReport();
+		FlushErrorState();
+		valid = false;
+	}
+	PG_END_TRY();
+
+	// TODO: translate output
+	if (valid && parsetree_list) {
+		printf("Valid SQL\n");
+		exit(0);
+	} else {
+		printf("Invalid SQL\n");
+		exit(1);
+	}
+}
+
 /*
  * At least on linux, set_ps_display() breaks /proc/$pid/environ. The
  * sanitizer library uses /proc/$pid/environ to implement getenv() as it wants


Re: GTIN14 support for contrib/isn

2023-06-08 Thread Josef Šimánek
On Thu, Jun 8, 2023 at 17:20 Michael Kefeder  wrote:
>
>
> Am 15.03.19 um 17:27 schrieb Tom Lane:
> > Michael Kefeder  writes:
> >> For a project of ours we need GTIN14 data type support.
> >
> > Hm, what is that and where would a reviewer find the specification for it?
> >
> specs are from GS1, here: https://www.gs1.org/standards/id-keys/gtin
> Side note: EAN13 is actually called GTIN-13 now. Wikipedia has a quick
> overview: https://en.wikipedia.org/wiki/Global_Trade_Item_Number
>
> >> Looking at the code I saw every format that isn-extension supports is
> >> stored as an EAN13. Theoretically that can be changed to be GTIN14, but
> >> that would mean quite a lot of rewrite I feared, so I chose to code only
> >> GTIN14 I/O separetely to not interfere with any existing conversion
> >> magic. This yields an easier to understand patch and doesn't touch
> >> existing functionality. However it introduces redundancy to a certain
> >> extent.
> >
> > Yeah, you certainly don't get to change the on-disk format of the existing
> > types, unfortunately.  Not sure what the least messy way of dealing with
> > that is.  I guess we do want this to be part of contrib/isn rather than
> > an independent module, if there are sane datatype conversions with the
> > existing isn types.
> >
> the on-disk format does not change (it would support even longer codes;
> it's just an integer where one bit is used as a valid/invalid flag, I did
> not touch that at all). Putting GTIN14 into isn makes sense to me and is
> backward/forward compatible.
>
> >> Find my patch attached. Please let me know if there are things that need
> >> changes, I'll do my best to get GTIN support into postgresql.
> >
> > Well, two comments immediately:
> >
> > * where's the documentation changes?
> >
> > * simply editing the .sql file in-place is not acceptable; that breaks
> > the versioning conventions for extensions, and leaves users with no
> > easy upgrade path.  What you need to do is create a version upgrade
> > script that adds the new objects.  For examples look for other recent
> > patches that have added features to contrib modules, eg
> >
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=eb6f29141bed9dc95cb473614c30f470ef980705
> >
> > Also, I'm afraid you've pretty much missed the deadline to get this
> > into PG v12; we've already got more timely-submitted patches than
> > we're likely to be able to finish reviewing.  Please add it to the
> > first v13 commit fest,
> >
> > https://commitfest.postgresql.org/23/
> >
> > so that we don't forget about it when the time does come to look at it.
> >
> >   regards, tom lane
> >
>
> thanks for the feedback! will do mentioned documentation changes and
> create a separate upgrade sql file. Making it into v13 is fine by me.

Hello!

If I understand it correctly, this patch wasn't finished and submitted
after this discussion. If there is still interest, I can try to polish
the patch, rebase it and submit it. I'm interested in GTIN14 support.

> br
>   mike
>
>
>




Re: Ignoring BRIN for HOT updates (was: -udpates seems broken)

2023-03-20 Thread Josef Šimánek
On Mon, Mar 20, 2023 at 11:24 Matthias van de Meent
 wrote:
>
> On Mon, 20 Mar 2023 at 11:11, Tomas Vondra
>  wrote:
> >
> > On 3/14/23 15:41, Matthias van de Meent wrote:
> > > On Tue, 14 Mar 2023 at 14:49, Tomas Vondra
> > >  wrote:
> > >>
> > >>> ...
> > >
> > >> If you agree with these changes, I'll get it committed.
> > >
> > > Yes, thanks!
> > >
> >
> > I've tweaked the patch per the last round of comments, cleaned up the
> > commit message a bit (it still talked about unused bit in tuple header
> > and so on), and pushed it.
> >
> > Thanks for fixing the issues that got the patch reverted last year!
>
> Thanks for helping getting this in!

Thanks for fixing the problems!

>
> Kind regards,
>
> Matthias van de Meent.
>
>




Re: How about a psql backslash command to show GUCs?

2022-04-07 Thread Josef Šimánek
On Wed, Apr 6, 2022 at 19:49 Tom Lane  wrote:
>
> It's not difficult to get psql to show you the current value
> of a single GUC --- "SHOW" does that fine, and it has tab
> completion support for the GUC name.  However, I very often
> find myself resorting to the much more tedious
>
> select * from pg_settings where name like '%foo%';
>
> when I want to see some related parameters, or when I'm a bit
> fuzzy on the exact name of the parameter.  Not only is this
> a lot of typing, but unless I'm willing to type even more to
> avoid using "*", I'll get a wall of mostly unreadable text,
> because pg_settings is far too wide and cluttered with
> low-grade information.
>
> In the discussion about adding privileges for GUCs [1], there
> was a proposal to add a new psql backslash command to show GUCs,
> which could reduce this problem to something like
>
> \dcp *foo*
>
> (The version proposed there was not actually useful for this
> purpose because it was too narrowly focused on GUCs with
> privileges, but that's easily fixed.)
>
> So does anyone else like this idea?

I like this idea. Also, I'm interested in contributing it. Feel free
to ping me if that's welcome; I can try to prepare at least an initial
patch. Currently the discussion seems to be mostly about the
command name, which can be changed at any time.

> In detail, I'd imagine this command showing the name, setting, unit,
> and vartype fields of pg_setting by default, and if you add "+"
> then it should add the context field, as well as applicable
> privileges when server version >= 15.  However, there's plenty
> of room for bikeshedding that list of columns, not to mention
> the precise name of the command.  (I'm not that thrilled with
> "\dcp" myself, as it looks like it might be a sub-form of "\dc".)
> So I thought I'd solicit comments before working on a patch
> not after.
>
> I view this as being at least in part mop-up for commit a0ffa885e,
> especially since a form of this was discussed in that thread.
> So I don't think it'd be unreasonable to push into v15, even
> though it's surely a new feature.
>
> regards, tom lane
>
> [1] 
> https://www.postgresql.org/message-id/flat/3d691e20-c1d5-4b80-8ba5-6beb63af3...@enterprisedb.com
>
>




Re: faulty link

2022-02-10 Thread Josef Šimánek
On Thu, Feb 10, 2022 at 15:35 Erik Rijkers  wrote:
>
> The provided link
>https://www.postgresql.org/docs/release/
>
> leads to
>https://www.postgresql.org/docs/release/14.2/
>
> which gives 'Not Found' for me (Netherlands)

Thinking about that again, the 14.2 release just happened. Could it be
just a matter of propagating the new release info to mirrors?

>
> At least one person on IRC reports it 'works' for them but it seems
> there still something wrong..
>
>
> Erik Rijkers
>
>
>
>
>




Re: faulty link

2022-02-10 Thread Josef Šimánek
It works well here (Czech Republic).

On Thu, Feb 10, 2022 at 15:35 Erik Rijkers  wrote:
>
> The provided link
>https://www.postgresql.org/docs/release/
>
> leads to
>https://www.postgresql.org/docs/release/14.2/
>
> which gives 'Not Found' for me (Netherlands)
>
>
> At least one person on IRC reports it 'works' for them but it seems
> there still something wrong..
>
>
> Erik Rijkers
>
>
>
>
>




Re: New developer papercut - Makefile references INSTALL

2022-01-21 Thread Josef Šimánek
On Fri, Jan 21, 2022 at 16:31 Tom Lane  wrote:
>
> Josef Šimánek  writes:
> > There is README.git explaining this. README itself is meant to be used
> > for distributed source code. You can generate INSTALL locally for
> > example by running make dist (INSTALL will be present in
> > postgresql-15devel directory).
>
> > Anyway I do agree this is confusing. Maybe we can actually rename
> > README.git to README and current README to README.dist or similar.
> > README.dist can be copied to distribution package as README during
> > Makefile magic.
>
> IIRC, we discussed that when README.git was invented, and concluded
> that it would just create different sorts of confusion.  I might
> be biased, as the person who is generally checking created tarballs
> for sanity, but I really do not want any situation where a file
> appearing in the tarball is different from the same-named file in
> the git tree.
>
> Perhaps it could be sane to not have *any* file named README in
> the git tree, only README.git and README.dist, with the tarball
> preparation process copying README.dist to README.  However,
> if I'm understanding what github does, that would leave us with
> no automatically-displayed documentation for the github repo.
> So I'm not sure that helps with samay's concern.

Especially for the GitHub use case, it is possible to add a separate
README into the .github directory. But the problem with local clones
will not be solved anyway.

From GitHub docs:

"If you put your README file in your repository's root, docs, or
hidden .github directory, GitHub will recognize and automatically
surface your README to repository visitors."

Another solution would be to merge both README files together and add a
separate section for the development/git-based codebase.

> regards, tom lane




Re: New developer papercut - Makefile references INSTALL

2022-01-21 Thread Josef Šimánek
On Fri, Jan 21, 2022 at 1:29 samay sharma  wrote:
>
>
>
> On Wed, Jan 19, 2022 at 4:58 PM Tim McNamara  wrote:
>>
>> Hello,
>>
>> I encountered a minor road bump when checking out the pg source today. The 
>> Makefile's all target includes the following help message if GNUmakefile 
>> isn't available:
>>
>>   echo "You need to run the 'configure' program first. See the file"; \
>>   echo "'INSTALL' for installation instructions." ; \
>>
>> After consulting README.git, it looks as though INSTALL isn't created unless 
>> the source is bundled into a release or snapshot tarball. I'm happy to 
>> submit a patch to update the wording, but wanted to check on the preferred 
>> approach.
>>
>> Perhaps this would be sufficient?
>>
>>   echo "You need to run the 'configure' program first. See the file"; \
>>   echo "'INSTALL' for installation instructions, or visit" ; \
>>   echo "" ; \
>>
>> -Tim
>
>
> I noticed a similar thing in the README of the github repository. It asks to 
> see the INSTALL file for build and installation instructions but I couldn't 
> find that file and that confused me. This might confuse other new developers 
> as well. So, maybe we should update the text in the README too?

There is README.git explaining this. README itself is meant for the
distributed source code. You can generate INSTALL locally, for
example by running make dist (INSTALL will be present in the
postgresql-15devel directory).

Anyway, I do agree this is confusing. Maybe we could actually rename
README.git to README and the current README to README.dist or similar.
README.dist could be copied into the distribution package as README
during the Makefile magic.

I can try to provide a patch if welcomed.

> Regards,
> Samay




Re: Add Boolean node

2021-12-28 Thread Josef Šimánek
On Mon, Dec 27, 2021 at 16:10 Alvaro Herrera
 wrote:
>
> On 2021-Dec-27, Peter Eisentraut wrote:
>
> > This patch adds a new node type Boolean, to go alongside the "value" nodes
> > Integer, Float, String, etc.  This seems appropriate given that Boolean
> > values are a fundamental part of the system and are used a lot.
>
> I like the idea.  I'm surprised that there is no notational savings in
> the patch, however.
>
> > diff --git a/src/test/regress/expected/create_function_3.out 
> > b/src/test/regress/expected/create_function_3.out
> > index 3a4fd45147..e0c4bee893 100644
> > --- a/src/test/regress/expected/create_function_3.out
> > +++ b/src/test/regress/expected/create_function_3.out
> > @@ -403,7 +403,7 @@ SELECT pg_get_functiondef('functest_S_13'::regproc);
> >LANGUAGE sql+
> >   BEGIN ATOMIC +
> >SELECT 1;   +
> > -  SELECT false AS bool;   +
> > +  SELECT false;   +
> >   END  +
>
> Hmm, interesting side-effect: we no longer assign a column name in this
> case so it remains "?column?", just like it happens for other datatypes.
> This seems okay to me.  (This is also what causes the changes in the
> isolationtester expected output.)

This seems to be caused by the change to the makeBoolAConst function. I
was thinking for a while about potential backward compatibility
problems, but I wasn't able to find any.

> --
> Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
> "Ni aún el genio muy grande llegaría muy lejos
> si tuviera que sacarlo todo de su propio interior" (Goethe)
>
>




Re: [PATCH] Don't block HOT update by BRIN index

2021-12-01 Thread Josef Šimánek
On Tue, Nov 30, 2021 at 20:11 Tomas Vondra
 wrote:
>
> OK,
>
> I've polished the last version of the patch a bit (added a regression
> test with update of attribute in index predicate and docs about the new
> flag into indexam.sgml) and pushed.

Thanks a lot for taking this over, improving it and pushing!

> I wonder if we could/should improve handling of index predicates. In
> particular, it seems to me we could simply ignore indexes when the new
> row does not match the index predicate. For example, if there's an index
>
>CREATE INDEX ON t (a) WHERE b = 1;
>
> and the update does:
>
>UPDATE t SET b = 2 WHERE ...;
>
> then we'll not add the tuple pointer to this index anyway, and we could
> simply ignore this index when considering HOT. But I might be missing
> something important about HOT ...
>
> The main problem I see with this is it requires evaluating the index
> predicate for each tuple, which makes it incompatible with the caching
> in RelationGetIndexAttrBitmap. Just ditching the caching seems like a
> bad idea, so we'd probably have to do this in two phases:
>
> 1) Do what we do now, i.e. RelationGetIndexAttrBitmap considering all
> indexes / attributes. If this says HOT is possible, great - we're done.
>
> 2) If (1) says HOT is not possible, we need to look whether it's because
> of regular or partial index. For regular indexes it's clear, for partial
> indexes we could ignore this if the predicate evaluates to false for the
> new row.
>
> But even if such optimization is possible, it's way out of scope of this
> patch and it's not clear to me it's actually a sensible trade-off.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: [RFC] building postgres with meson

2021-10-14 Thread Josef Šimánek
On Thu, Oct 14, 2021 at 19:24 Andres Freund  wrote:
>
> Hi,
>
> On 2021-10-14 10:29:42 -0300, Alvaro Herrera wrote:
> > On 2021-Oct-14, Josef Šimánek wrote:
> >
> > > I'm using Fedora 34 and I still see perl-Opcode.x86_64 as a separate
> > > package. Anyway it behaves differently with autoconf tools and the
> > > meson build system. Is perl disabled by default in the current build
> > > system?
>
> Hm, so it seems we should make the test separately verify that perl -M{Opcode,
> ExtUtils::Embed, ExtUtils::ParseXS} doesn't fail, so that we can fail perl
> detection with a useful message?

I can confirm "perl -MOpcode" fails. ExtUtils::Embed and
ExtUtils::ParseXS are present. Looking at the local system history of
the perl-interpreter package, it seems to be installed by default on
Fedora 34. A friendly error message would be welcome.

>
> > Yes, you have to use --with-perl in order to get it.
>
> With the meson prototype I set most optional features to "auto", except for
> LLVM, as that increases compile times noticeably.
>
> For configure we didn't/don't want to do much auto-detection, because that
> makes life harder for distributors. But meson has one switch controlling all
> features set to 'auto' and not explicitly enabled/disabled:
>   --auto-features {enabled,disabled,auto}   
> Override value of all 'auto' features (default: auto).
> so the argument doesn't apply to the same degree there. We could default
> auto-features to something else too.
>
> There were two other reasons:
>
> 1) I got tired of needing to disable zlib, readline to be able to build on
>windows.
> 2) Exercising all the dependency detection / checking seems important at this
>stage

Clear, thanks for the info.

> Greetings,
>
> Andres Freund




Re: [RFC] building postgres with meson

2021-10-14 Thread Josef Šimánek
On Thu, Oct 14, 2021 at 15:32 Dagfinn Ilmari Mannsåker
 wrote:
>
> Josef Šimánek  writes:
>
> > On Thu, Oct 14, 2021 at 15:14 Dagfinn Ilmari Mannsåker
> >  wrote:
> >>
> >> Josef Šimánek  writes:
> >>
> >> > The only problem I do have currently is auto-detection of perl. I'm
> >> > getting error related to missing "Opcode.pm". PERL is autodetected and
> >> > enabled (https://pastebin.com/xfRRrDcU).
> >>
> >> Your Perl (not PERL) installation seems to be incomplete. Opcode.pm is a
> >> core module, and should be in /usr/lib64/perl5, judging by the paths in
> >> the error message.
> >>
> >> Which OS is this?  Some Linux distributions have separate packages for
> >> the interpreter itself and the included modules, and the packages can be
> >> named confusingly.  E.g. on older Redhat/Fedora versions you have to
> >> install the 'perl-core' package to get all the modules, 'perl' is just
> >> the interpreter and the bare minimum set of strictly necessary modules.
> >>
> >> They've fixed this in recent versions (Fedora 34 and Redhat 8, IIRC), so
> >> that 'perl' gives you the whole bundle, and 'perl-interpreter' is the
> >> minimal one.
> >
> > I'm using Fedora 34 and I still see perl-Opcode.x86_64 as a separate
> > package.
>
> Yes, it's a separate package, but the 'perl' package depends on all the
> core module packages, so installing that should fix things.  You appear
> to only have 'perl-interpreter' installed.

You're right. Installing "perl" or "perl-Opcode" manually fixes this
problem. Currently I only have "perl-interpreter" installed.

> > Anyway it behaves differently with autoconf tools and the meson build
> > system. Is perl disabled by default in the current build system?
>
> configure doesn't auto-detect any optional features, they have to be
> explicitly enabled using --with-foo switches.
>
> - ilmari




Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-14 Thread Josef Šimánek
On Thu, Oct 14, 2021 at 13:17 Gilles Darold  wrote:
>
> Hi,
>
>
> Here is a proposal to implement HIDDEN columns feature in PostgreSQL.
>
> User-defined columns are always visible in PostgreSQL. If a user
> wants to hide some column(s) from the values returned by SELECT * then the
> hidden columns feature is useful. A hidden column can always be used and
> returned by explicitly referencing it in the query.
>
> I agree that views are done for that, or that using a SELECT * is a bad
> practice,
> but sometimes we may need to "technically" prevent some columns from being
> part of a star expansion and not be forced to use view+rules.

Just to remind everyone here, there was recently a proposal to handle
this problem another way: provide a list of columns to skip in the "star
selection", aka "SELECT * EXCEPT col1 ...".

https://postgrespro.com/list/id/d51371a2-f221-1cf3-4a7d-b2242d4da...@gmail.com

> For example when
> upgrading a database schema where a column have been added to a table,
> this will break any old version of the application that is using a
> SELECT * on
> this table. Being able to "hide" this column to such query will make
> migration
> easier.
>
> An other common use case for this feature is to implements temporal tables
> or row versionning. On my side I see a direct interest in Oracle to
> PostgreSQL
> migration to emulate the ROWID system column without the hassle of creating
> views, it will save lot of time.
>
> The other advantage over views is that the hidden column can still be used
> in JOIN, WHERE, ORDER BY or GROUP BY clause which is not possible otherwise.
> I don't talk about writing to complex view which would require a RULE.
>
> A hidden column is not part of the SQL standard but is implemented in all
> other
> RDBMS, where it is also called an invisible column [1] [2] [3] [4]. In all
> these RDBMS
> the feature is much the same.
>
>[1] https://www.ibm.com/docs/en/db2/10.5?topic=concepts-hidden-columns
>[2] https://oracle-base.com/articles/12c/invisible-columns-12cr1
>[3]
> https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql?view=sql-server-ver15
>[4] https://dev.mysql.com/doc/refman/8.0/en/invisible-columns.html
>
>
> Here is the full description of the proposal with a patch attached that
> implements
> the feature:
>
>1) Creating hidden columns:
>
>   A column visibility attribute is added to the column definition
>   of CREATE TABLE and ALTER TABLE statements. For example:
>
>   CREATE TABLE htest1 (a bigserial HIDDEN, b text);
>
>   ALTER TABLE htest1 ADD COLUMN c integer HIDDEN;
>
>   Columns are visible by default.
>
>2) Altering column visibility attribute:
>
>   The ALTER TABLE statement can be used to change hidden columns to not
>   hidden and the opposite. Example:
>
>   ALTER TABLE htest1 ALTER COLUMN c DROP HIDDEN;
>
>3) Insert and hidden columns:
>
>   If the column list of INSERT or COPY statements is empty
>   then while expanding column list hidden columns are NOT
>   included. DEFAULT or NULL values are inserted for hidden
>   columns in this case. Hidden column should be explicitly
>   referenced in the column list of INSERT and COPY statement
>   to insert a value.
>
>   Example:
>
> -- Value 'one' is stored in column b and 1 in hidden column a.
> INSERT INTO htest1 VALUES ('one');
>
> -- Value 2 is stored in hidden column and 'two' in b.
> INSERT INTO htest1 (a, b) VALUES (2, 'two');
>
>4) Star expansion for SELECT * statements:
>
>   Hidden columns are not included in a column list while
>   expanding wild card '*' in the SELECT statement.
>
>   Example:
>
>   SELECT * FROM htest1;
> b
>   --
>one
>two
>
>Hidden columns are accessible when explicitly referenced
>in the query.
>
>Example:
>   SELECT a, b FROM htest1;
>  a  |  b
>   --+--
> 1   | one
> 2   | two
>
>5) psql extended describe lists hidden columns.
>
>postgres=# \d+ htest1
>Table "public.htest1"
> Column |  Type  | Collation | Nullable |  Default   | Visible | ...
> ++---+--++-+ ...
> a  | bigint |   | not null | nextval... | hidden  | ...
> b  | text   |   |  | | | ...
>
>6) When a column is flagged as hidden the attishidden column value of
>   table pg_attribute is set to true.
>
>7) For hidden attributes, column is_hidden of table
> information_schema.columns
>   is set to YES. By default the column is visible and the value is 'NO'.
>
> For a complete description of the feature, see chapter "Hidden columns" in
> file doc/src/sgml/ddl.sgml after applying the patch.
>
>
The patch is a full implementation of this feature except 

Re: [RFC] building postgres with meson

2021-10-14 Thread Josef Šimánek
On Thu, Oct 14, 2021 at 15:14 Dagfinn Ilmari Mannsåker
 wrote:
>
> Josef Šimánek  writes:
>
> > The only problem I do have currently is auto-detection of perl. I'm
> > getting error related to missing "Opcode.pm". PERL is autodetected and
> > enabled (https://pastebin.com/xfRRrDcU).
>
> Your Perl (not PERL) installation seems to be incomplete. Opcode.pm is a
> core module, and should be in /usr/lib64/perl5, judging by the paths in
> the error message.
>
> Which OS is this?  Some Linux distributions have separate packages for
> the interpreter itself and the included modules, and the packages can be
> named confusingly.  E.g. on older Redhat/Fedora versions you have to
> install the 'perl-core' package to get all the modules, 'perl' is just
> the interpreter and the bare minimum set of strictly necessary modules.
>
> They've fixed this in recent versions (Fedora 34 and Redhat 8, IIRC), so
> that 'perl' gives you the whole bundle, and 'perl-interpreter' is the
> minimal one.

I'm using Fedora 34 and I still see perl-Opcode.x86_64 as a separate
package. Anyway, it behaves differently with the autoconf tooling and
the meson build system. Is perl disabled by default in the current build
system?

>
> - ilmari




Re: [RFC] building postgres with meson

2021-10-13 Thread Josef Šimánek
On Wed, Oct 13, 2021 at 1:54 Andres Freund  wrote:
>
> Hi,
>
> On 2021-10-13 01:19:27 +0200, Josef Šimánek wrote:
> > I tried to clean and start from scratch, but I'm getting different
> > error probably related to wrongly configured JIT (LLVM wasn't found
> > during meson setup). I'll debug on my side to provide more info.
>
> ../src/backend/jit/jit.c:91:73: error: ‘DLSUFFIX’ undeclared (first use in 
> this function)
>91 | snprintf(path, MAXPGPATH, "%s/%s%s", pkglib_path, 
> jit_provider, DLSUFFIX);
>   |   
>   ^~~~
>
> This *very* likely is related to building in a source tree that also contains
> a "non-meson" build "in place". The problem is that the meson build picks up
> the pg_config.h generated by ./configure in the "normal" build, rather than
> the one meson generated itself.
>
> You'd need to execute make distclean or such, or use a separate git checkout.
>
> I forgot about this issue because I only ever build postgres from outside the
> source-tree (by invoking configure from a separate directory), so there's
> never build products in it. I think at least I need to make the build emit a
> warning / error if there's a pg_config.h in the source tree...

Hello, thanks for the hint. I can finally build using meson and run
regress tests.

The only problem I do have currently is auto-detection of perl. I'm
getting error related to missing "Opcode.pm". PERL is autodetected and
enabled (https://pastebin.com/xfRRrDcU).

I do get the same error when I enforce perl for current master build
(./configure --with-perl). Using ./configure with perl autodetection
skips plperl extension on my system.

Disabling perl manually for meson build (meson setup build
--reconfigure --buildtype debug -Dperl=disabled) works for me.

>
> This is the part of the jit code that's built regardless of llvm availability
> - you'd get the same error in a few other places unrelated to jit.
>
> Greetings,
>
> Andres Freund




Re: [RFC] building postgres with meson

2021-10-12 Thread Josef Šimánek
On Tue, Oct 12, 2021 at 19:17 Andres Freund  wrote:
>
> Hi,
>
> On 2021-10-12 17:21:50 +0200, Josef Šimánek wrote:
> > > # build (uses automatically as many cores as available)
> > > ninja
> >
> > I'm getting errors at this step. You can find my output at
> > https://pastebin.com/Ar5VqfFG. Setup went well without errors. Is that
> > expected for now?
>
> Thanks, that's helpful. And no, that's not expected (*), it should be fixed.
>
> What OS / distribution / version is this?

Fedora 34 (64 bit)

> Can you build postgres "normally" with --with-gss? Seems like we're ending up
> with a version of gssapi that we're not compatible with.

Yes, I can.

> You should be able to get past this by disabling gss using meson configure
> -Dgssapi=disabled.

I tried to clean and start from scratch, but I'm getting a different
error, probably related to a misconfigured JIT (LLVM wasn't found
during meson setup). I'll debug on my side to provide more info.

The whole build error can be found at https://pastebin.com/hCFqcPvZ.
The setup log can be found at https://pastebin.com/wjbE1w56.

> Greetings,
>
> Andres Freund
>
> * except kinda, in the sense that I'd expect it to be buggy, given that I've
>   run it only on a few machines and it's very, uh, bleeding edge




Re: [RFC] building postgres with meson

2021-10-12 Thread Josef Šimánek
.

On Tue, Oct 12, 2021 at 10:37, Andres Freund wrote:
>
> Hi,
>
> For the last year or so I've on and off tinkered with $subject.  I think
> it's in a state worth sharing now.  First, let's look at a little
> comparison.
>
> My workstation:
>
> non-cached configure:
> current:11.80s
> meson:   6.67s
>
> non-cached build (world-bin):
> current:40.46s
> ninja:   7.31s
>
> no-change build:
> current: 1.17s
> ninja:   0.06s
>
> test world:
> current:  105s
> meson: 63s
>
>
> What actually started to motivate me, however, were the long times Windows
> builds took to come back with test results. On CI, with the same machine
> config:
>
> build:
> current:  202s (doesn't include genbki etc)
> meson+ninja:  140s
> meson+msbuild:206s
>
>
> test:
> current: 1323s (many commands)
> meson:903s (single command)
>
> (note that the test comparison isn't quite fair - there's a few tests
> missing, but it's just small contrib ones afaik)
>
>
> The biggest difference to me however is not the speed, but how readable
> the output is.
>
> Running the tests with meson in a terminal shows the number of tests
> completed out of the total, how much time has passed, and how long the
> currently running tests have already been running.
>
> At the end of a testrun a count of tests is shown:
>
> 188/189 postgresql:tap+pg_basebackup / pg_basebackup/t/010_pg_basebackup.pl   
>   OK   39.51s   110 subtests passed
> 189/189 postgresql:isolation+snapshot_too_old / snapshot_too_old/isolation
>   OK   62.93s
>
>
> Ok: 188
> Expected Fail:  0
> Fail:   1
> Unexpected Pass:0
> Skipped:0
> Timeout:0
>
> Full log written to /tmp/meson/meson-logs/testlog.txt
>
>
> The log has the output of the tests and ends with:
>
> Summary of Failures:
> 120/189 postgresql:tap+recovery / recovery/t/007_sync_rep.pl  
>  ERROR 7.16s   (exit status 255 or signal 127 
> SIGinvalid)
>
>
> Quite the difference to make check-world -jnn output.
>
>
> So, now that the teasing is done, let me explain a bit what led me down
> this path:
>
> Autoconf + make is not being actively developed. Especially autoconf is
> *barely* in maintenance mode - despite many shortcomings and bugs. It's
> also technology that very few want to use - autoconf m4 is scary, and
> it's scarier for people that started more recently than a lot of us
> committers for example.
>
> Recursive make as we use it is hard to get right. One reason the clean
> make build is so slow compared to meson is that we had to resort to
> .NOTPARALLEL to handle dependencies in a bunch of places. And despite
> that, I quite regularly see incremental build failures that can be
> resolved by retrying the build.
>
> While we have incremental builds via --enable-depend, they don't work
> that reliably (i.e. they miss necessary rebuilds) and yet are often too
> aggressive. More modern build systems can keep track of the precise
> command used to build a target and rebuild it when that command changes.
>
>
> We also don't just have the autoconf / make buildsystem, there's also
> the msvc project generator - something most of us unix-y folks do not
> like to touch. I think that, combined with there being no easy way to
> run all tests, and it being just different, really hurt our windows
> developer appeal (and subsequently the quality of postgres on
> windows). I'm not saying this to ding the project generator - that was
> well before there were decent "meta" buildsystems out there (and in some
> ways it is a small one itself).
>
>
> The last big issue I have with the current situation is that there's no
> good test integration. make check-world output is essentially unreadable
> / not automatically parseable. Which led to the buildfarm having a
> separate list of things it needs to test, so that failures can be
> pinpointed and paired with appropriate logs. That approach unfortunately
> doesn't scale well to multi-core CPUs, slowing down the buildfarm by a
> fair bit.
>
>
> This all led me to experiment with improvements. I tried a few
> somewhat crazy but incremental things like converting our buildsystem to
> non-recursive make (I got it to build the backend, but it's too hard to
> do manually I think), or to not run tests during the recursive make
> check-world, but to append commands to a list of tests, that then is run
> by a helper (can kinda be made to work).  In the end I concluded that
> the amount of time we'd need to invest to maintain our more-and-more
> custom buildsystem going forward doesn't make sense.
>
>
> Which led me to look around and analyze which other buildsystems there
> are that could make some sense for us. The halfway decent list includes,
> I think:
> 1) cmake
> 2) bazel
> 3) meson
>
>
> cmake would be a decent 

Re: [PATCH] Don't block HOT update by BRIN index

2021-10-10 Thread Josef Šimánek
On Mon, Oct 4, 2021 at 16:17, Tomas Vondra wrote:
>
> Hi,
>
> I took a look at this patch again to see if I can get it polished and
> fixed. Per the discussion, I've removed the rd_indexattr list and
> replaced it with a simple flag. While doing so, I noticed a couple of
> places that should have considered (init or free) rd_hotblockingattr.

Thanks for finishing this. I can confirm both patches do apply without
problems. I did some simple testing locally and everything worked as
intended.

> Patch 0001 is the v2, 0002 removes the rd_indexattr etc.
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: Git revision in tarballs

2021-07-15 Thread Josef Šimánek
On Thu, Jul 15, 2021 at 10:33, Magnus Hagander wrote:
>
> I think it'd be useful to be able to identify exactly which git commit
> was used to produce a tarball. This would be especially useful when
> downloading snapshot tarballs where that's not entirely clear, but can
> also be used to verify that the release tarballs matches what's
> expected (in the extremely rare case that a tarball is rewrapped for
> example).
>
> What do people think of the attached?

The only problem I see is adding "git" as a new dependency. That
could potentially cause trouble.

For the file name, I have seen the GIT_VERSION or REVISION file names
used before in other projects. Using ".gitrevision" doesn't make sense
to me since it will be hidden on Unix by default and I'm not sure that
is intended.
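A minimal sketch of what recording the revision could look like at tarball-wrapping time; the GIT_VERSION file name is one of the options mentioned above, and the throwaway repository below only stands in for the real checkout:

```shell
# Illustrative only: create a throwaway repo standing in for the postgres
# checkout, then record the commit hash the tarball was produced from.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git rev-parse HEAD > GIT_VERSION           # file to ship inside the tarball
grep -Eq '^[0-9a-f]{40}$' GIT_VERSION && echo "GIT_VERSION written"
```

The tarball build would run the `git rev-parse` step from the real checkout; consumers could then compare the recorded hash against the public repository to verify what the tarball was built from.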

> --
>  Magnus Hagander
>  Me: https://www.hagander.net/
>  Work: https://www.redpill-linpro.com/




Re: [PATCH] Don't block HOT update by BRIN index

2021-07-12 Thread Josef Šimánek
On Mon, Jul 12, 2021 at 23:15, Tomas Vondra wrote:
>
>
>
> On 7/12/21 11:02 PM, Alvaro Herrera wrote:
> > On 2021-Jul-12, Josef Šimánek wrote:
> >
> >>> 2) Do we actually need to calculate and store hotblockingattrs
> >>> separately in RelationGetIndexAttrBitmap? It seems to me it's either
> >>> NULL (with amhotblocking=false) or equal to indexattrs. So why not to
> >>> just get rid of hotblockingattr and rd_hotblockingattr, and do something
> >>> like
> >>>
> >>>   case INDEX_ATTR_BITMAP_HOT_BLOCKING:
> >>> return (amhotblocking) ? bms_copy(rel->rd_hotblockingattr) : NULL;
> >>>
> >>> I haven't tried, so maybe I'm missing something?
> >>
> >> relation->rd_indexattr is currently not used for anything (at least I
> >> have not found any usage), except for checking whether the other
> >> values are already loaded.
> >
> > Oh, that's interesting.  What this means is that INDEX_ATTR_BITMAP_ALL
> > is no longer used; its uses must have all been replaced by something
> > else.  It seems the only one that currently exists is for HOT in
> > heap_update, which this patch replaces with the new one.  In a quick
> > search, no external code depends on it, so I'd be inclined to just
> > remove it ...
> >
> > I think a boolean is much simpler.  Consider a table with 1600 columns :-)
> >
>
> I'm not sure how to verify no external code depends on that flag. I have
> no idea if there's a plausible use case for it, though.

I tried a GitHub search before to ensure it is at least not a widely
used "API". There were no results outside of PostgreSQL code itself in
the first 10 pages of results.
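The same kind of usage check can also be sketched locally with grep against a checkout; the tiny stand-in tree below is only for illustration:

```shell
# Create a tiny stand-in source tree and search it for uses of the flag,
# the way one would grep a real postgres checkout for external dependents.
tmp=$(mktemp -d)
printf 'return INDEX_ATTR_BITMAP_ALL;\n' > "$tmp/relcache.c"
printf 'int unrelated = 0;\n' > "$tmp/other.c"
matches=$(grep -rl 'INDEX_ATTR_BITMAP_ALL' "$tmp" | wc -l)
echo "files using the symbol: $matches"
```

Of course this only covers code you have locally; extension code in the wild can still depend on the symbol, which is the limitation the GitHub search tries to work around.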


> Even with 1600 columns the amount of wasted memory is only about 200B,
> which is not that bad I think. Not great, not terrible.
>
> OTOH most tables won't have any BRIN indexes, in which case indexattr
> and hotblockingattr are guaranteed to be exactly the same. So maybe
> that's something we could leverage - we need to calculate the "hot"
> bitmap, and in most cases we can use it for indexattr too.
>
> Maybe let's leave that for a separate patch, though?
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: [PATCH] Don't block HOT update by BRIN index

2021-07-12 Thread Josef Šimánek
On Mon, Jul 12, 2021 at 22:31, Tomas Vondra wrote:
>
> On 6/30/21 1:43 AM, Josef Šimánek wrote:
> > On Wed, Jun 30, 2021 at 1:20, Tomas Vondra wrote:
> >>
> >>
> >>
> >> On 6/30/21 12:53 AM, Josef Šimánek wrote:
> >>> On Wed, Jun 30, 2021 at 0:31, Josef Šimánek wrote:
> >>>>
> >>>> Hello!
> >>>>
> >>>> Tomáš Vondra shared a few ideas to improve the BRIN index on the Czech
> >>>> PostgreSQL mailing list some time ago [1, in Czech only]. This is a
> >>>> first try at implementing one of those ideas.
> >>>>
> >>>> Currently a BRIN index blocks HOT updates even though tuples are not
> >>>> linked to it directly. I'm attaching an initial patch allowing HOT
> >>>> updates even on BRIN-indexed columns. This patch went through an
> >>>> initial review on the Czech PostgreSQL mailing list [1].
> >>>
> >>> I just found out the current patch breaks the partial-index isolation
> >>> test. I'm looking into this problem.
> >>>
> >>
> >> The problem is in RelationGetIndexAttrBitmap - the existing code first
> >> walks indnatts, and builds the indexattrs / hotblockingattrs. But then
> >> it also inspects expressions and the predicate (by pull_varattnos), and
> >> the patch fails to do that for hotblockingattrs. Which is why it fails
> >> for partial-index, because that uses an index with a predicate.
> >>
> >> So there needs to be something like:
> >>
> >>  if (indexDesc->rd_indam->amhotblocking)
> >>  pull_varattnos(indexExpressions, 1, &hotblockingattrs);
> >>
> >>  if (indexDesc->rd_indam->amhotblocking)
> >>  pull_varattnos(indexPredicate, 1, &hotblockingattrs);
> >>
> >> This fixes the failure for me.
> >
> > Thanks for the hint. I'm attaching a fixed standalone patch.
> >
>
> Thanks, this version seems to be working fine and passes check-world. So
> I did another round of review, and all I have are some simple comments:
>
>
> 1) naming stuff (this is very subjective, feel free to disagree)
>
> I wonder if we should rename 'amhotblocking' to 'amblockshot' which
> seems more natural to me?
>
> Similarly, maybe rename rd_hotblockingattr to rd_hotattr

OK, I wasn't sure about the naming.

>
> 2) Do we actually need to calculate and store hotblockingattrs
> separately in RelationGetIndexAttrBitmap? It seems to me it's either
> NULL (with amhotblocking=false) or equal to indexattrs. So why not to
> just get rid of hotblockingattr and rd_hotblockingattr, and do something
> like
>
>   case INDEX_ATTR_BITMAP_HOT_BLOCKING:
> return (amhotblocking) ? bms_copy(rel->rd_hotblockingattr) : NULL;
>
> I haven't tried, so maybe I'm missing something?

relation->rd_indexattr is currently not used for anything (at least I
have not found any usage), except for checking whether the other values
are already loaded.

/* Quick exit if we already computed the result. */
if (relation->rd_indexattr != NULL)

I think it could be replaced with a boolean to make it clear the other
values (rd_keyattr, rd_pkattr, rd_idattr, rd_hotblockingattr) are
already loaded.
>
> 3) The patch should update indexam.sgml with description of the new
> field, amhotblocking or how it'll end up named.

Will do.

>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: [PATCH] Don't block HOT update by BRIN index

2021-06-29 Thread Josef Šimánek
On Wed, Jun 30, 2021 at 1:20, Tomas Vondra wrote:
>
>
>
> On 6/30/21 12:53 AM, Josef Šimánek wrote:
> > On Wed, Jun 30, 2021 at 0:31, Josef Šimánek wrote:
> >>
> >> Hello!
> >>
> >> Tomáš Vondra shared a few ideas to improve the BRIN index on the Czech
> >> PostgreSQL mailing list some time ago [1, in Czech only]. This is a
> >> first try at implementing one of those ideas.
> >>
> >> Currently a BRIN index blocks HOT updates even though tuples are not
> >> linked to it directly. I'm attaching an initial patch allowing HOT
> >> updates even on BRIN-indexed columns. This patch went through an
> >> initial review on the Czech PostgreSQL mailing list [1].
> >
> > I just found out the current patch breaks the partial-index isolation
> > test. I'm looking into this problem.
> >
>
> The problem is in RelationGetIndexAttrBitmap - the existing code first
> walks indnatts, and builds the indexattrs / hotblockingattrs. But then
> it also inspects expressions and the predicate (by pull_varattnos), and
> the patch fails to do that for hotblockingattrs. Which is why it fails
> for partial-index, because that uses an index with a predicate.
>
> So there needs to be something like:
>
>  if (indexDesc->rd_indam->amhotblocking)
>  pull_varattnos(indexExpressions, 1, &hotblockingattrs);
>
>  if (indexDesc->rd_indam->amhotblocking)
>  pull_varattnos(indexPredicate, 1, &hotblockingattrs);
>
> This fixes the failure for me.

Thanks for the hint. I'm attaching a fixed standalone patch.

> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
From a997caa436757000970e2794cb44901b9e7cf6d8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Sun, 13 Jun 2021 20:12:40 +0200
Subject: [PATCH 1/2] Allow HOT for BRIN indexed columns.

---
 src/backend/access/brin/brin.c|  1 +
 src/backend/access/gin/ginutil.c  |  1 +
 src/backend/access/gist/gist.c|  1 +
 src/backend/access/hash/hash.c|  1 +
 src/backend/access/heap/heapam.c  |  2 +-
 src/backend/access/nbtree/nbtree.c|  1 +
 src/backend/access/spgist/spgutils.c  |  1 +
 src/backend/utils/cache/relcache.c| 14 ++
 src/include/access/amapi.h|  2 +
 src/include/utils/rel.h   |  1 +
 src/include/utils/relcache.h  |  3 +-
 .../modules/dummy_index_am/dummy_index_am.c   |  1 +
 src/test/regress/expected/brin.out| 49 +++
 src/test/regress/sql/brin.sql | 46 +
 14 files changed, 122 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index ccc9fa0959a9..f521bb963567 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -108,6 +108,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->amcanparallel = false;
 	amroutine->amcaninclude = false;
 	amroutine->amusemaintenanceworkmem = false;
+	amroutine->amhotblocking = false;
 	amroutine->amparallelvacuumoptions =
 		VACUUM_OPTION_PARALLEL_CLEANUP;
 	amroutine->amkeytype = InvalidOid;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index cdd626ff0a44..878a041be073 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -56,6 +56,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->amcanparallel = false;
 	amroutine->amcaninclude = false;
 	amroutine->amusemaintenanceworkmem = true;
+	amroutine->amhotblocking = true;
 	amroutine->amparallelvacuumoptions =
 		VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_CLEANUP;
 	amroutine->amkeytype = InvalidOid;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 0683f42c2588..d96ce1c0a99b 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -77,6 +77,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->amcanparallel = false;
 	amroutine->amcaninclude = true;
 	amroutine->amusemaintenanceworkmem = false;
+	amroutine->amhotblocking = true;
 	amroutine->amparallelvacuumoptions =
 		VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
 	amroutine->amkeytype = InvalidOid;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 0752fb38a924..b30b255e021c 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -74,6 +74,7 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->amcanparallel = false;
 	amroutine->amcaninclude = false;
 	amroutine->amusemaintenanceworkmem = false;
+	amroutine->amhotblocking = true;
 	amroutine->amparallelvacuumoptions =
 		VACUUM_OPTION_PARALLEL_BULKDEL;
 	amrout

Re: [PATCH] Don't block HOT update by BRIN index

2021-06-29 Thread Josef Šimánek
On Wed, Jun 30, 2021 at 0:31, Josef Šimánek wrote:
>
> Hello!
>
> Tomáš Vondra shared a few ideas to improve the BRIN index on the Czech
> PostgreSQL mailing list some time ago [1, in Czech only]. This is a
> first try at implementing one of those ideas.
>
> Currently a BRIN index blocks HOT updates even though tuples are not
> linked to it directly. I'm attaching an initial patch allowing HOT
> updates even on BRIN-indexed columns. This patch went through an
> initial review on the Czech PostgreSQL mailing list [1].

I just found out the current patch breaks the partial-index isolation
test. I'm looking into this problem.

> It can be viewed online (latest version) on GitHub [2] as well.
>
> - small overview
>
> 1. I have added an "amhotblocking" flag to the index AM descriptor, set
> to "true" for all index types except BRIN. Later, in the heap_update
> function (heapam.c), I filter attributes based on this new flag instead
> of checking for any existing index.
>
> 2. I had to enhance the "RelationGetIndexAttrBitmap" function to be
> able to return a bitmap of index attribute numbers related to the new
> AM flag using the "INDEX_ATTR_BITMAP_HOT_BLOCKING" filter.
> PS: Originally the "INDEX_ATTR_BITMAP_ALL" filter was used for the HOT
> update check and most likely could be removed (including all related
> logic in RelationGetIndexAttrBitmap), since I have not found any
> other usage.
>
> 3. I have created an initial regression test using
> "pg_stat_get_tuples_hot_updated" to verify HOT was successful on the
> BRIN-indexed column. Unfortunately "pg_stat_get_tuples_hot_updated" is
> not updated immediately and I have not found any way to force the
> update. Thus (at least for now) I have used an approach similar to
> stats.sql's "wait_for_stats" function (waiting up to 30 seconds and
> checking every 100 ms for a change).
>
> I'm attaching the patch to CF 2021-07.
>
> [1] https://groups.google.com/g/postgresql-cz/c/oxA_v3H17Qg
> [2] https://github.com/simi/postgres/pull/7




[PATCH] Don't block HOT update by BRIN index

2021-06-29 Thread Josef Šimánek
Hello!

Tomáš Vondra shared a few ideas to improve the BRIN index on the Czech
PostgreSQL mailing list some time ago [1, in Czech only]. This is a
first try at implementing one of those ideas.

Currently a BRIN index blocks HOT updates even though tuples are not
linked to it directly. I'm attaching an initial patch allowing HOT
updates even on BRIN-indexed columns. This patch went through an
initial review on the Czech PostgreSQL mailing list [1].

It can be viewed online (latest version) on GitHub [2] as well.

- small overview

1. I have added an "amhotblocking" flag to the index AM descriptor, set
to "true" for all index types except BRIN. Later, in the heap_update
function (heapam.c), I filter attributes based on this new flag instead
of checking for any existing index.

2. I had to enhance the "RelationGetIndexAttrBitmap" function to be
able to return a bitmap of index attribute numbers related to the new
AM flag using the "INDEX_ATTR_BITMAP_HOT_BLOCKING" filter.
PS: Originally the "INDEX_ATTR_BITMAP_ALL" filter was used for the HOT
update check and most likely could be removed (including all related
logic in RelationGetIndexAttrBitmap), since I have not found any
other usage.

3. I have created an initial regression test using
"pg_stat_get_tuples_hot_updated" to verify HOT was successful on the
BRIN-indexed column. Unfortunately "pg_stat_get_tuples_hot_updated" is
not updated immediately and I have not found any way to force the
update. Thus (at least for now) I have used an approach similar to
stats.sql's "wait_for_stats" function (waiting up to 30 seconds and
checking every 100 ms for a change).
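The wait-and-retry pattern described in point 3 can be sketched in shell; `check_counter` below is a hypothetical stand-in for querying pg_stat_get_tuples_hot_updated, and here it simply succeeds on its third call so the sketch is self-contained:

```shell
# Poll a condition every 100 ms, giving up after a 30 s budget -- the same
# shape as stats.sql's wait_for_stats.
calls=0
check_counter() {                   # hypothetical stand-in for the SQL check
    calls=$((calls + 1))
    [ "$calls" -ge 3 ]              # pretend the stats arrive on the 3rd poll
}
i=0
max=300                             # 300 polls * 0.1 s = 30 s budget
until check_counter || [ "$i" -ge "$max" ]; do
    i=$((i + 1))
    sleep 0.1
done
echo "polls used: $calls"           # prints: polls used: 3
```

The real test does the same thing in SQL/psql rather than shell; the point is only the bounded retry loop, which keeps the test fast when stats arrive quickly but tolerant of the delayed stats collector.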

I'm attaching the patch to CF 2021-07.

[1] https://groups.google.com/g/postgresql-cz/c/oxA_v3H17Qg
[2] https://github.com/simi/postgres/pull/7
From a997caa436757000970e2794cb44901b9e7cf6d8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Sun, 13 Jun 2021 20:12:40 +0200
Subject: [PATCH] Allow HOT for BRIN indexed columns.

---
 src/backend/access/brin/brin.c|  1 +
 src/backend/access/gin/ginutil.c  |  1 +
 src/backend/access/gist/gist.c|  1 +
 src/backend/access/hash/hash.c|  1 +
 src/backend/access/heap/heapam.c  |  2 +-
 src/backend/access/nbtree/nbtree.c|  1 +
 src/backend/access/spgist/spgutils.c  |  1 +
 src/backend/utils/cache/relcache.c| 14 ++
 src/include/access/amapi.h|  2 +
 src/include/utils/rel.h   |  1 +
 src/include/utils/relcache.h  |  3 +-
 .../modules/dummy_index_am/dummy_index_am.c   |  1 +
 src/test/regress/expected/brin.out| 49 +++
 src/test/regress/sql/brin.sql | 46 +
 14 files changed, 122 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index ccc9fa0959a9..f521bb963567 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -108,6 +108,7 @@ brinhandler(PG_FUNCTION_ARGS)
 	amroutine->amcanparallel = false;
 	amroutine->amcaninclude = false;
 	amroutine->amusemaintenanceworkmem = false;
+	amroutine->amhotblocking = false;
 	amroutine->amparallelvacuumoptions =
 		VACUUM_OPTION_PARALLEL_CLEANUP;
 	amroutine->amkeytype = InvalidOid;
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index cdd626ff0a44..878a041be073 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -56,6 +56,7 @@ ginhandler(PG_FUNCTION_ARGS)
 	amroutine->amcanparallel = false;
 	amroutine->amcaninclude = false;
 	amroutine->amusemaintenanceworkmem = true;
+	amroutine->amhotblocking = true;
 	amroutine->amparallelvacuumoptions =
 		VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_CLEANUP;
 	amroutine->amkeytype = InvalidOid;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 0683f42c2588..d96ce1c0a99b 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -77,6 +77,7 @@ gisthandler(PG_FUNCTION_ARGS)
 	amroutine->amcanparallel = false;
 	amroutine->amcaninclude = true;
 	amroutine->amusemaintenanceworkmem = false;
+	amroutine->amhotblocking = true;
 	amroutine->amparallelvacuumoptions =
 		VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
 	amroutine->amkeytype = InvalidOid;
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 0752fb38a924..b30b255e021c 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -74,6 +74,7 @@ hashhandler(PG_FUNCTION_ARGS)
 	amroutine->amcanparallel = false;
 	amroutine->amcaninclude = false;
 	amroutine->amusemaintenanceworkmem = false;
+	amroutine->amhotblocking = true;
 	amroutine->amparallelvacuumoptions =
 		VACUUM_OPTION_PARALLEL_BULKDEL;
 	amroutine->amkeytype = INT4OID;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 

Re: Improvements and additions to COPY progress reporting

2021-03-09 Thread Josef Šimánek
On Tue, Mar 9, 2021 at 6:34, Michael Paquier wrote:
>
> On Mon, Mar 08, 2021 at 05:33:40PM +0100, Matthias van de Meent wrote:
> > Seems reasonable. PFA updated patches. I've renamed the previous 0003
> > to 0002 to keep git-format-patch easy.
>
> Thanks for updating the patch.  0001 has been applied, after tweaking
> a bit comments, indentation and the docs.
>
> > This is keeping current behaviour of the implementation as committed
> > with 8a4f618e, with the rationale of that patch being that this number
> > should mirror the number returned by the copy command.
> >
> > I am not opposed to adding another column for `tuples_inserted` and
> > changing the logic accordingly (see prototype 0003), but that was not
> > in the intended scope of this patchset. Unless you think that this
> > should be included in this current patchset, I'll spin that patch out
> > into a different thread, but I'm not sure that would make it into
> > pg14.
>
> Okay, point taken.  If there is demand for it in the future, we could
> extend the existing set of columns.  After thinking more about it, the
> use case is not completely clear to me from a monitoring point of
> view.
>
> I have not looked at 0002 in details yet, but I am wondering first if
> the size estimations in the expected output are actually portable.
> Second, I doubt a bit that the extra cycles spent on that are actually
> worth the coverage, even if the trick with an AFTER INSERT trigger is
> interesting.

Those extra cycles are there to protect at least part of the COPY
progress reporting from being accidentally broken. There are currently
various patches in progress that modify the COPY command. It would be
nice to ensure at least the basic functionality works in an automated
way. On my machine the tests add no significant overhead (they finish
almost instantly).

> --
> Michael




Re: Improvements and additions to COPY progress reporting

2021-03-04 Thread Josef Šimánek
On Thu, Mar 4, 2021 at 12:32, Matthias van de Meent wrote:
>
> On Thu, 4 Mar 2021 at 11:38, Michael Paquier  wrote:
> >
> > On Thu, Mar 04, 2021 at 12:16:17PM +0530, Bharath Rupireddy wrote:
> > > IMO, the phrasing proposed by Justin upthread looks good. It's like this:
> > >
> > > > +Each backend running ANALYZE will report its 
> > > > progress in
> > > > +the pg_stat_progress_analyze view. See
> >
> > No objections to just go with that.  As a new patch set is needed, I
> > am switching the CF entry to "Waiting on Author".
>
> Thanks for all your comments, and sorry for the delayed response.
> Please find attached a new version of the patch set, that is rebased
> and contains the requested changes:
>
> 1/3:
> Docs:
> - on which the COPY command is executed
> + on which the COPY command is being executed
> Reworded existing commment:
> - /* Increment amount of processed tuples and update the progress */
> + /* Increment the number of processed tuples, and report the progress */
>
> 2/3:
> Docs:
> - ... report its progress to ...
> + ... report its progress in ...
> - report its progress to the >pg_stat_progress_cluster< ...
> + report its progress in the >pg_stat_progress_cluster< view ...
>
> 3/3:
> No changes
>
> I believe that that was the extent of the not-yet-resolved comments
> and suggestions.

LGTM, special thanks for taking over the work on initial COPY progress
regression tests.

>
> With regards,
>
> Matthias van de Meent.




Re: Improvements and additions to COPY progress reporting

2021-02-20 Thread Josef Šimánek
On Sat, Feb 20, 2021 at 7:09, Bharath Rupireddy wrote:
>
> On Fri, Feb 19, 2021 at 2:34 AM Tomas Vondra
>  wrote:
> > > On Mon, 15 Feb 2021 at 17:07, Tomas Vondra
> > >  wrote:
> > >>
> > >> - The blocks in copyfrom.c/copyto.c should be reworked - I don't think
> > >> we do this in our codebase.
> > >
> > > I saw this being used in (re)index progress reporting, that's where I
> > > took inspiration from. It has been fixed in the attached version.
> > >
> >
> > Hmmm, good point. I haven't looked at the other places reporting
> > progress and I only ever saw this pattern in old code. I kinda dislike
> > these blocks, but admittedly that's a rather subjective view. So if
> > other similar places do this when reporting progress, this probably
> > should too. What's your opinion on this?
>
> Actually in the code base the style of that variable declaration and
> usage of pgstat_progress_update_multi_param is a mix. For instance, in
> lazy_scan_heap, ReindexRelationConcurrently, the variables are
> declared at the start of the function. And in _bt_spools_heapscan,
> index_build, validate_index, perform_base_backup, the variables are
> declared within a separate block.
>
> IMO, we can have the arrays declared at the start of the functions
> i.e. the way it's done in v8-0001, because we can extend them for
> reporting some other parameter(maybe in future).
>
> > >> - I find the "io_target" name misleading, because in some cases it's
> > >> actually the *source*.
> > >
> > > Yes, I was also not quite happy with this, but couldn't find a better
> > > one at the point of writing the initial patchset. Would
> > > "io_operations", "io_port", "operates_through" or "through" maybe be
> > > better?
> > >
> >
> > No idea. Let's see if someone has a better proposal ...
>
>  For COPY TO the name "destination_type" and for COPY FROM the name
> "source_type" would make sense. To have a combined column name for
> both, how about naming that column "io_type"?

+1 on "io_type", that is my best candidate as well

> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com




Re: Improvements and additions to COPY progress reporting

2021-02-11 Thread Josef Šimánek
On Thu, Feb 11, 2021 at 15:27, Matthias van de Meent wrote:
>
> On Wed, 10 Feb 2021 at 07:43, Bharath Rupireddy
>  wrote:
> >
> > On Tue, Feb 9, 2021 at 6:02 PM Matthias van de Meent
> >  wrote:
> > > > Also, you can add this to the current commitfest.
> > >
> > > See https://commitfest.postgresql.org/32/2977/
> > >
> > > On Tue, 9 Feb 2021 at 12:53, Josef Šimánek  
> > > wrote:
> > > >
> > > > OK, would you mind integrating my initial regression test patch in
> > > > v3 as well, or should I submit it separately later?
> > >
> > > Attached, with minor fixes
> >
> > Why do we need to have a new test file progress.sql for the test
> > cases? Can't we add them into existing copy.sql or copy2.sql? Or do
> > you have a plan to add test cases into progress.sql for other progress
> > reporting commands?
>
> I don't mind moving the test into copy or copy2, but the main reason
> to put it in a separate file is to test the 'copy' component of the
> feature called 'progress reporting'. If the feature instead is 'copy'
> and 'progress reporting' is part of that feature, then I'd put it in
> the copy tests, but because this feature has its own documentation
> page, 'progress reporting', and because 'copy' is a subsection of
> that, I do think that this feature warrants its own regression test
> file.
>
> There are no other tests for the progress reporting feature yet,
> because COPY ... FROM is the only command that is progress-reported
> _and_ that can fire triggers while running the command, so checking
> the progress view during the progress-reported command is only
> feasible in COPY progress reporting. To test the other progress
> reporting views, we would need multiple sessions, which I believe is
> impossible in this test format. Please correct me if I'm wrong; I'd
> love to add tests for the other components. That will not be in this
> patchset, though.
>
> > IMO, it's better not add any new test file but add the tests to existing 
> > files.
>
> In general I agree, but in some cases (e.g. new system component, new
> full-fledged feature), new test files are needed. I think that this
> could be one of those cases.

I split it out since it is meant to be the start of progress reporting
testing in general. If you consider it part of COPY testing instead,
feel free to move it to the existing COPY-related test files. There's
no real reason to keep it separate if not needed.

>
> With regards,
>
> Matthias van de Meent




Re: Improvements and additions to COPY progress reporting

2021-02-09 Thread Josef Šimánek
út 9. 2. 2021 v 12:51 odesílatel 0010203112132233  napsal:
>
> On Tue, 9 Feb 2021 at 09:32, Josef Šimánek  wrote:
> >
> > po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent
> >  napsal:
> > > Lastly, 0005 adds 'io_target' to the reported information, that is,
> > > FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
> > > be determined based on the commands in pg_stat_activity, it is
> > > reasonably something that a user would want to query on, as the
> > > origin/target of COPY has security and performance implications,
> > > whereas other options (e.g. format) are less interesting for clients
> > > that are not executing that specific COPY command.
> >
> > I took a deeper look and I'm not sure I understand FILE and STDIO.
> > I have finally managed to put together some initial regression
> > testing of COPY command progress using triggers. I have attached the
> > initial patch, applicable on top of your changes. As you can see,
> > COPY FROM STDIN is reported as FILE. That's probably expected, but
> > it is a little confusing to me since STDIN and STDIO sound similar.
> > What is the purpose of STDIO? When is the COPY command reported with
> > an io_target of STDIO?
>
> I checked for the type of the copy_src before it was correctly set,
> therefore only reporting FILE type, but this will be fixed shortly in
> v3.

OK, would you mind integrating my initial regression test patch in v3
as well, or should I submit it separately later?

> Matthias




Re: Improvements and additions to COPY progress reporting

2021-02-09 Thread Josef Šimánek
po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent
 napsal:
>
> Hi,
>
> With [0] we got COPY progress reporting. Before the column names of
> this newly added view are effectively set in stone with the release of
> pg14, I propose the following set of relatively small patches. These
> are v2, because it is a patchset that is based on a set of patches
> that I previously posted in [0].
>
> 0001 Adds a column to pg_stat_progress_copy which details the amount
> of tuples that were excluded from insertion by the WHERE clause of the
> COPY FROM command.
>
> 0002 alters pg_stat_progress_copy to use 'tuple'-terminology instead
> of 'line'-terminology. 'Line' doesn't make sense in the binary copy
> case, and only for the 'text' copy format there can be a guarantee
> that the source / output file actually contains the reported amount of
> lines, whereas the amount of data tuples (which is also what it's
> called internally) is guaranteed to equal for all data types.
>
> There was some discussion about this in [0] where the author thought
> 'line' is more consistent with the CSV documentation, and where I
> argued that 'tuple' is both more consistent with the rest of the
> progress reporting tables and more consistent with the actual counted
> items: these are the tuples serialized / inserted (as noted in the CSV
> docs; "Thus the files are not strictly one line per table row like
> text-format files.").
>
> Patch 0003 adds backlinks to the progress reporting docs from the docs
> of the commands that have progress reporting (re/index, cluster,
> vacuum, etc.) such that progress reporting is better discoverable from
> the relevant commands, and removes the datname column from the
> progress_copy view (that column was never committed). This too should
> be fairly trivial and uncontroversial.
>
> 0004 adds the 'command' column to the progress_copy view; which
> distinguishes between COPY FROM and COPY TO. The two commands are (in
> my opinion) significantly different enough to warrant this column;
> similar to the difference between CREATE INDEX/REINDEX [CONCURRENTLY]
> which also report that information. I believe that this change is
> appropriate; as the semantics of the columns change depending on the
> command being executed.
>
> Lastly, 0005 adds 'io_target' to the reported information, that is,
> FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
> be determined based on the commands in pg_stat_activity, it is
> reasonably something that a user would want to query on, as the
> origin/target of COPY has security and performance implications,
> whereas other options (e.g. format) are less interesting for clients
> that are not executing that specific COPY command.

I took a deeper look and I'm not sure I understand FILE and STDIO. I
have finally managed to put together some initial regression testing
of COPY command progress using triggers. I have attached the initial
patch, applicable on top of your changes. As you can see, COPY FROM
STDIN is reported as FILE. That's probably expected, but it is a
little confusing to me since STDIN and STDIO sound similar. What is
the purpose of STDIO? When is the COPY command reported with an
io_target of STDIO?

> Of special interest in 0005 is that it reports the io_target for the
> logical replications' initial tablesyncs' internal COPY. This would
> otherwise be measured, but no knowledge about the type of copy (or its
> origin) would be available on the worker's side. I'm not married to
> this patch 0005, but I believe it could be useful, and therefore
> included it in the patchset.
>
>
> With regards,
>
> Matthias van de Meent.
>
>
> [0] 
> https://www.postgresql.org/message-id/flat/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg%2BFEKet4XaTGSW%3DLitKQ%40mail.gmail.com
From d91fbe3a80c1858e9cbc126e4ade44945cea9ae9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Tue, 9 Feb 2021 09:27:21 +0100
Subject: [PATCH] Add initial regress test for COPY progress reporting.

---
 src/test/regress/expected/progress.out | 43 ++
 src/test/regress/parallel_schedule |  2 +-
 src/test/regress/sql/progress.sql  | 39 +++
 3 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 src/test/regress/expected/progress.out
 create mode 100644 src/test/regress/sql/progress.sql

diff --git a/src/test/regress/expected/progress.out b/src/test/regress/expected/progress.out
new file mode 100644
index ..2c745b946366
--- /dev/null
+++ b/src/test/regress/expected/progress.out
@@ -0,0 +1,43 @@
+-- setup for COPY progress testing
+CREATE TEMP TABLE test_progress_with_trigger (
+  a int,
+  b text
+) ;
+CREATE OR REPLACE function notice_after_progress_reporting() RETURNS trigger AS
+$$
+DECLARE report record;
+BEGIN
+  SELECT INTO report * FROM pg_stat_progress_copy report WHERE pid = pg_backend_pid();
+  raise info 'progress datname: %', report.datname;
+  raise info 'progress command: %', report.command;
+  

Re: Improvements and additions to COPY progress reporting

2021-02-08 Thread Josef Šimánek
po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent
 napsal:
>
> Hi,
>
> With [0] we got COPY progress reporting. Before the column names of
> this newly added view are effectively set in stone with the release of
> pg14, I propose the following set of relatively small patches. These
> are v2, because it is a patchset that is based on a set of patches
> that I previously posted in [0].

Hello. Revisiting this feature before the release was in my backlog as
well. Thanks for picking this up.

> 0001 Adds a column to pg_stat_progress_copy which details the amount
> of tuples that were excluded from insertion by the WHERE clause of the
> COPY FROM command.
>
> 0002 alters pg_stat_progress_copy to use 'tuple'-terminology instead
> of 'line'-terminology. 'Line' doesn't make sense in the binary copy
> case, and only for the 'text' copy format there can be a guarantee
> that the source / output file actually contains the reported amount of
> lines, whereas the amount of data tuples (which is also what it's
> called internally) is guaranteed to equal for all data types.
>
> There was some discussion about this in [0] where the author thought
> 'line' is more consistent with the CSV documentation, and where I
> argued that 'tuple' is both more consistent with the rest of the
> progress reporting tables and more consistent with the actual counted
> items: these are the tuples serialized / inserted (as noted in the CSV
> docs; "Thus the files are not strictly one line per table row like
> text-format files.").

As the mentioned author, I have no preference between 'line' and
'tuple' terminology here. In some cases the 'line' terminology fits
better, in others 'tuple' does. The docs can be improved later if
needed to make it clear that in some cases (for example, probably the
most common one, CSV import/export) one tuple equals one line of the
CSV file.

> Patch 0003 adds backlinks to the progress reporting docs from the docs
> of the commands that have progress reporting (re/index, cluster,
> vacuum, etc.) such that progress reporting is better discoverable from
> the relevant commands, and removes the datname column from the
> progress_copy view (that column was never committed). This too should
> be fairly trivial and uncontroversial.
>
> 0004 adds the 'command' column to the progress_copy view; which
> distinguishes between COPY FROM and COPY TO. The two commands are (in
> my opinion) significantly different enough to warrant this column;
> similar to the difference between CREATE INDEX/REINDEX [CONCURRENTLY]
> which also report that information. I believe that this change is
> appropriate; as the semantics of the columns change depending on the
> command being executed.

This was part of my initial patch as well, but I decided to strip it
out to keep the final patch as small as possible, so it could be
merged quickly without further discussion. From my side, this is
useful to have directly in the progress report as well.

> Lastly, 0005 adds 'io_target' to the reported information, that is,
> FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
> be determined based on the commands in pg_stat_activity, it is
> reasonably something that a user would want to query on, as the
> origin/target of COPY has security and performance implications,
> whereas other options (e.g. format) are less interesting for clients
> that are not executing that specific COPY command.

Similar (simplified, without CALLBACK support) info was also part of
the initial patch and was stripped out later. I'm also +1 on this info
being useful to have directly in the progress report.

> Of special interest in 0005 is that it reports the io_target for the
> logical replications' initial tablesyncs' internal COPY. This would
> otherwise be measured, but no knowledge about the type of copy (or its
> origin) would be available on the worker's side. I'm not married to
> this patch 0005, but I believe it could be useful, and therefore
> included it in the patchset.

All patches seem good to me. I was able to apply them to a clean
current master, and "make check" succeeded without problems.

>
> With regards,
>
> Matthias van de Meent.
>
>
> [0] 
> https://www.postgresql.org/message-id/flat/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg%2BFEKet4XaTGSW%3DLitKQ%40mail.gmail.com




Re: proposal: schema variables

2021-01-14 Thread Josef Šimánek
I did some testing locally. Everything runs smoothly and compiles without warnings.

Later on (once merged) it would be nice to write a documentation page
covering the whole feature as a set, next to the documentation of the
individual commands. It took me a few moments to understand how this
works.

I was looking for a way to create a variable with a non-default value
in a single command, but if I understand it correctly, that's not the
purpose. A variable lives in a schema with a default value, and you
can set it per session via LET.

Thus the "CREATE VARIABLE" statement should not usually be part of
"classic" queries; it should be treated more like CREATE TABLE or
other DDL statements.

On the other side, LET is there to use the variable in "classic" queries.
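As a sketch of that distinction (using the syntax proposed in the patchset under review, so details may differ from what eventually gets committed; the table and column names are made up for illustration):

```sql
-- DDL: the variable is a persistent schema object with a default value,
-- created once, like a table
CREATE VARIABLE public.current_tenant AS integer DEFAULT 0;

-- per-session: LET assigns a session-local value to the variable
LET public.current_tenant = 42;

-- the variable can then be referenced in "classic" queries
SELECT * FROM orders WHERE tenant_id = public.current_tenant;
```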

Does that make sense? Feel free to ping me if any help with the
documentation is needed. I can try to prepare an initial variables
guide once I'm sure I understand this feature well.

PS: Now it is clear to me why it is called "schema variables", not
"session variables".

čt 14. 1. 2021 v 7:36 odesílatel Pavel Stehule  napsal:
>
> Hi
>
> rebase
>
> Regards
>
> Pavel
>




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Josef Šimánek
pá 8. 1. 2021 v 5:03 odesílatel Amit Kapila  napsal:
>
> On Fri, Jan 8, 2021 at 8:42 AM Josef Šimánek  wrote:
> >
> > pá 8. 1. 2021 v 3:55 odesílatel Amit Kapila  
> > napsal:
> > >
> > >
> > > Can't we display the entire COPY command? I checked that
> > > pg_stat_statements display the query so there shouldn't be a problem
> > > to display the entire command.
> >
> > In previous discussions it was mentioned that this doesn't make
> > sense, since you can join with pg_stat_statements on the pid column
> > if needed. What would be the reason to duplicate that info?
> >
>
> But pg_stat_statements won't be present by default. Also, the same
> argument could be applied for the command to be present in
> stat_progress views. It occurred to me only when I was trying to
> compare what we display in all the progress views. I think there is
> some merit in being consistent.

Sorry, I meant pg_stat_activity (not pg_stat_statements). That one is
present by default (at least in my installations).
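For reference, the join in question could look roughly like this (a sketch only; at the time of this thread the view's counter columns were still named lines_processed etc., so the exact column names may differ):

```sql
-- show the full COPY command text next to its progress counters
SELECT a.query,
       p.bytes_processed,
       p.bytes_total,
       p.lines_processed
FROM pg_stat_progress_copy AS p
JOIN pg_stat_activity AS a USING (pid);
```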

> --
> With Regards,
> Amit Kapila.




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Josef Šimánek
pá 8. 1. 2021 v 3:55 odesílatel Amit Kapila  napsal:
>
> On Thu, Jan 7, 2021 at 7:02 PM Josef Šimánek  wrote:
> >
> > čt 7. 1. 2021 v 14:08 odesílatel Amit Kapila  
> > napsal:
> > >
> > > On Thu, Jan 7, 2021 at 3:15 AM Tomas Vondra
> > >  wrote:
> > > >
> > > > On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > > > > I'm attaching the whole patch since commitfest failed to ingest the
> > > > > last incremental on CI.
> > > > >
> > > >
> > > > Yeah, the whole patch needs to be attached for the commitfest tester to
> > > > work correctly - it can't apply pieces from multiple messages, etc.
> > > >
> > > > Anyway, I pushed this last version of patch, after a couple more tweaks,
> > > > mainly to the docs - one place used pg_stat_copy_progress, the section
> > > > was not indexed properly, and so on.
> > > >
> > >
> > > How about including a column for command similar to
> > > pg_stat_progress_create_index and pg_stat_progress_cluster? It seems
> > > that command will be useful in the context of COPY as there are many
> > > variants of COPY.
> > >
> >
> > From pg_stat_progress_create_index docs:
> >
> > The command that is running: CREATE INDEX, CREATE INDEX CONCURRENTLY,
> > REINDEX, or REINDEX CONCURRENTLY.
> >
> > Which values should be present in copy progress? Just COPY FROM or COPY TO?
> >
>
> Can't we display the entire COPY command? I checked that
> pg_stat_statements display the query so there shouldn't be a problem
> to display the entire command.

In previous discussions it was mentioned that this doesn't make sense,
since you can join with pg_stat_statements on the pid column if
needed. What would be the reason to duplicate that info?

> --
> With Regards,
> Amit Kapila.




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Josef Šimánek
čt 7. 1. 2021 v 22:37 odesílatel Tomas Vondra
 napsal:
>
>
>
> On 1/7/21 7:56 PM, Josef Šimánek wrote:
> > čt 7. 1. 2021 v 19:51 odesílatel Matthias van de Meent
> >  napsal:
> >>
> >> On Wed, 6 Jan 2021 at 22:45, Tomas Vondra  
> >> wrote:
> >>>
> >>> On 1/5/21 11:02 AM, Josef Šimánek wrote:
> >>>> I'm attaching the whole patch since commitfest failed to ingest the
> >>>> last incremental on CI.
> >>>>
> >>>
> >>> Yeah, the whole patch needs to be attached for the commitfest tester to
> >>> work correctly - it can't apply pieces from multiple messages, etc.
> >>>
> >>> Anyway, I pushed this last version of patch, after a couple more tweaks,
> >>> mainly to the docs - one place used pg_stat_copy_progress, the section
> >>> was not indexed properly, and so on.
> >>
> >> Thank you all, I'd love to use this in the future to keep track of
> >> (e.g.) my backup/restore progress.
> >>
> >> For my previously mentioned extension to keep track of filtered tuples
> >> that are excluded by the WHERE-clause, PFA patch 0001 that keeps track
> >> of that, in line with the current column name style of lines.
> >
> > If I understand it correctly, this column could be used on COPY TO
> > to track lines skipped because of a BEFORE trigger, right? I can
> > include this in my following patch, keeping lines_processed
> > incremented for skipped lines as well.
> >
>
> That's my understanding too.
>
> >> If so desired, I'll split this off into a different thread & CF entry.
> >>
> >>> I see Matthias proposed to change "lines" to "tuples" - I only saw the
> >>> message after pushing, but I probably wouldn't make that change anyway.
> >>> The CSV docs seem to talk about lines, newlines etc. so it seems fine.
> >>> If not, we can change that.
> >>
> >> The CSV docs, sure. But copy doesn't only process CSVs; it also has
> >> text (which does have a # lines = # tuples / rows guarantee) and
> >> binary (in which the 'line' vocabulary doesn't make sense, and in
> >> which the 'tuples' vocabulary is used). Additionally, most mentions of
> >> postgres' logical rows/tuples in the COPY documentation use the 'rows'
> >> terminology ('tuples' for FORMAT BINARY), and use 'line' only for
> >> external format's textual representation's strings delimited by
> >> newlines (which I believe is not exactly what we're counting).
> >>
> >> One common user of COPY is the pg_dump tool, and that uses binary
> >> format by default (iirc).
> >>
> >> Additionally, all comments surrounding the *LINES_PROCESSED updates
> >> only mention 'tuples', so I'd like to strongly suggest (a variant of)
> >> attached patch 0002 to keep the vocabulary consistent by using
> >> 'tuples' instead of 'lines'.
> >>
>
> I'm not particularly attached to the "lines" naming, it just seemed OK
> to me. So if there's consensus to rename this somehow, I'm OK with it.

The problem I see here is that it depends on the direction of COPY. If
you're copying from a CSV file to a table, those are actually lines
(since 1 line = 1 tuple). But copying from the DB to a file is copying
tuples (though 1 tuple = 1 file line). 'Line' works better here for me
personally.

Once I fix the problem with triggers (and any other cases, if found),
I think we can consider them lines. The counter will represent the
number of lines processed from the file on COPY FROM and the number of
lines written to the file on COPY TO (at least in CSV format). I'm not
sure how the BINARY format works; I'll check.
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Josef Šimánek
čt 7. 1. 2021 v 19:51 odesílatel Matthias van de Meent
 napsal:
>
> On Wed, 6 Jan 2021 at 22:45, Tomas Vondra  
> wrote:
> >
> > On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > > I'm attaching the whole patch since commitfest failed to ingest the
> > > last incremental on CI.
> > >
> >
> > Yeah, the whole patch needs to be attached for the commitfest tester to
> > work correctly - it can't apply pieces from multiple messages, etc.
> >
> > Anyway, I pushed this last version of patch, after a couple more tweaks,
> > mainly to the docs - one place used pg_stat_copy_progress, the section
> > was not indexed properly, and so on.
>
> Thank you all, I'd love to use this in the future to keep track of
> (e.g.) my backup/restore progress.
>
> For my previously mentioned extension to keep track of filtered tuples
> that are excluded by the WHERE-clause, PFA patch 0001 that keeps track
> of that, in line with the current column name style of lines.

If I understand it correctly, this column could be used on COPY TO to
track lines skipped because of a BEFORE trigger, right? I can include
this in my following patch, keeping lines_processed incremented for
skipped lines as well.

> If so desired, I'll split this off into a different thread & CF entry.
>
> > I see Matthias proposed to change "lines" to "tuples" - I only saw the
> > message after pushing, but I probably wouldn't make that change anyway.
> > The CSV docs seem to talk about lines, newlines etc. so it seems fine.
> > If not, we can change that.
>
> The CSV docs, sure. But copy doesn't only process CSVs; it also has
> text (which does have a # lines = # tuples / rows guarantee) and
> binary (in which the 'line' vocabulary doesn't make sense, and in
> which the 'tuples' vocabulary is used). Additionally, most mentions of
> postgres' logical rows/tuples in the COPY documentation use the 'rows'
> terminology ('tuples' for FORMAT BINARY), and use 'line' only for
> external format's textual representation's strings delimited by
> newlines (which I believe is not exactly what we're counting).
>
> One common user of COPY is the pg_dump tool, and that uses binary
> format by default (iirc).
>
> Additionally, all comments surrounding the *LINES_PROCESSED updates
> only mention 'tuples', so I'd like to strongly suggest (a variant of)
> attached patch 0002 to keep the vocabulary consistent by using
> 'tuples' instead of 'lines'.
>
>
> With regards,
>
> Matthias van de Meent




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Josef Šimánek
čt 7. 1. 2021 v 16:54 odesílatel Tomas Vondra
 napsal:
>
>
>
> On 1/7/21 12:06 PM, Josef Šimánek wrote:
> > st 6. 1. 2021 v 22:44 odesílatel Tomas Vondra
> >  napsal:
> >>
> >> On 1/5/21 11:02 AM, Josef Šimánek wrote:
> >>> I'm attaching the whole patch since commitfest failed to ingest the
> >>> last incremental on CI.
> >>>
> >>
> >> Yeah, the whole patch needs to be attached for the commitfest tester to
> >> work correctly - it can't apply pieces from multiple messages, etc.
> >>
> >> Anyway, I pushed this last version of patch, after a couple more tweaks,
> >> mainly to the docs - one place used pg_stat_copy_progress, the section
> >> was not indexed properly, and so on.
> >>
> >> I see Matthias proposed to change "lines" to "tuples" - I only saw the
> >> message after pushing, but I probably wouldn't make that change anyway.
> >> The CSV docs seem to talk about lines, newlines etc. so it seems fine.
> >> If not, we can change that.
> >>
> >> One more question, though - I now realize the lines_processed ignores
> >> rows skipped because of BEFORE INSERT triggers. I wonder if that's the
> >> right thing to do? Imagine you know the number of lines in a file. You
> >> can't really use (lines_processed / total_lines) to measure progress,
> >> because that may ignore many "skipped" rows. So maybe this should be
> >> changed to count all rows. OTOH we still have bytes_processed.
> >
> > I think that should be fixed. It is called "lines_processed" not
> > "lines_inserted". I'll take a look.
> >
>
> So we may either rename the column to "lines_inserted", or tweak the
> code to count all processed lines. Or track both and have two columns.

First I'll ensure lines_processed represents the actual number of
processed lines. When reading from a file, if some lines are skipped
due to a BEFORE INSERT trigger, I consider those processed as well,
even though they are not inserted. If welcome, I can add
lines_inserted later as well, but I'm not sure about the use case.

Also, thanks for mentioning triggers: I think those could be used to
test COPY progress (at least some variants). I'll check whether I can
add some test cases as well.
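The trigger-based approach can be sketched roughly as follows (hypothetical table and function names; the counter column name follows the view as committed at the time, so it may differ in later revisions):

```sql
CREATE TABLE progress_test (a int, b text);

-- a BEFORE INSERT trigger fires while COPY FROM is still running, so it
-- can observe the backend's own row in pg_stat_progress_copy
CREATE FUNCTION report_copy_progress() RETURNS trigger AS $$
DECLARE r record;
BEGIN
  SELECT * INTO r FROM pg_stat_progress_copy WHERE pid = pg_backend_pid();
  RAISE INFO 'lines_processed: %', r.lines_processed;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER check_copy_progress BEFORE INSERT ON progress_test
  FOR EACH ROW EXECUTE FUNCTION report_copy_progress();

-- each row copied now emits an INFO line with the current counter
COPY progress_test FROM stdin;
```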

> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Josef Šimánek
čt 7. 1. 2021 v 14:08 odesílatel Amit Kapila  napsal:
>
> On Thu, Jan 7, 2021 at 3:15 AM Tomas Vondra
>  wrote:
> >
> > On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > > I'm attaching the whole patch since commitfest failed to ingest the
> > > last incremental on CI.
> > >
> >
> > Yeah, the whole patch needs to be attached for the commitfest tester to
> > work correctly - it can't apply pieces from multiple messages, etc.
> >
> > Anyway, I pushed this last version of patch, after a couple more tweaks,
> > mainly to the docs - one place used pg_stat_copy_progress, the section
> > was not indexed properly, and so on.
> >
>
> How about including a column for command similar to
> pg_stat_progress_create_index and pg_stat_progress_cluster? It seems
> that command will be useful in the context of COPY as there are many
> variants of COPY.
>

From the pg_stat_progress_create_index docs:

The command that is running: CREATE INDEX, CREATE INDEX CONCURRENTLY,
REINDEX, or REINDEX CONCURRENTLY.

Which values should be present in copy progress? Just COPY FROM or COPY TO?

> With Regards,
> Amit Kapila.




Re: [PATCH] Simple progress reporting for COPY command

2021-01-07 Thread Josef Šimánek
st 6. 1. 2021 v 22:44 odesílatel Tomas Vondra
 napsal:
>
> On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > I'm attaching the whole patch since commitfest failed to ingest the
> > last incremental on CI.
> >
>
> Yeah, the whole patch needs to be attached for the commitfest tester to
> work correctly - it can't apply pieces from multiple messages, etc.
>
> Anyway, I pushed this last version of patch, after a couple more tweaks,
> mainly to the docs - one place used pg_stat_copy_progress, the section
> was not indexed properly, and so on.
>
> I see Matthias proposed to change "lines" to "tuples" - I only saw the
> message after pushing, but I probably wouldn't make that change anyway.
> The CSV docs seem to talk about lines, newlines etc. so it seems fine.
> If not, we can change that.
>
> One more question, though - I now realize the lines_processed ignores
> rows skipped because of BEFORE INSERT triggers. I wonder if that's the
> right thing to do? Imagine you know the number of lines in a file. You
> can't really use (lines_processed / total_lines) to measure progress,
> because that may ignore many "skipped" rows. So maybe this should be
> changed to count all rows. OTOH we still have bytes_processed.

I think that should be fixed. It is called "lines_processed", not
"lines_inserted". I'll take a look.

>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: [PATCH] Simple progress reporting for COPY command

2021-01-05 Thread Josef Šimánek
I'm attaching the whole patch since commitfest failed to ingest the
last incremental on CI.


út 5. 1. 2021 v 2:32 odesílatel Josef Šimánek  napsal:
>
> út 5. 1. 2021 v 0:46 odesílatel Tomas Vondra
>  napsal:
> >
> > Hi,
> >
> > I did take a quick look today, and I have a couple minor comments:
>
> Hi! Thanks for your time.
>
> > 1) The catalog sgml docs seem to mention bytes_processed twice (one of
> > that should be bytes_total), and line_processed (should be "lines_").
>
> Fixed in attached patch.
>
> >
> > 2) I'm not quite sure about not including any info about the command.
> > For example pg_stat_progress_create_index includes info about the
> > command, although it's not very detailed. Not sure how useful would it
> > be to show just COPY FROM / COPY TO, without more details.
> >
> > It's probably possible to extract this from pg_stat_activity, but that
> > may be rather cumbersome (parsing arbitrary SQL and all that). Also,
> > what if the COPY is called from a function etc.?
>
> Any idea where to discuss this? My use case is really simple: I would
> like to be able to see the progress of a COPY command by pid. There is
> a lot of COPY info inside the CopyToStateData and CopyFromStateData
> structs. The only limitation I see is that progress reporting columns
> support only int64 values; I'm not sure it is easily possible to
> expose, for example, the filename this way.
>
> >
> > 3) I wonder if we should have something like lines_estimated. For COPY
> > TO it's pretty simple to calculate it as
> >
> >  bytes_total / (bytes_processed / lines_processed)
> >
> > but what about
> >
> >  COPY (query) TO file
> >
> > In that case we don't know the total amount of bytes / rows, so we can't
> > calculate any estimate. So maybe we could peek into the query plan? But
> > I agree this is something we can add later.
>
> If I remember correctly, one of the original ideas was to be able to
> pass a custom bytes_total (or lines_total) via COPY options, to be
> stored in the copy progress. I can add this in a follow-up patch if
> still welcome.
>
> For estimates I would prefer to add an additional column so the two
> are not mixed together (or at least a boolean estimated = true/false
> reusing the bytes_total column). If query estimates are welcome, I can
> take a look at how to reach the query plan and expose those numbers in
> estimated_lines and potentially estimated_bytes columns when a query
> is used. It would probably be a little tricky to calculate
> estimated_bytes for some column types.
>
> Also, currently only COPY FROM file supports bytes_total (it is 0 in
> all other scenarios). I think it should be possible for PostgreSQL to
> know the actual number of lines a query returns for some kinds of
> queries, but I have no idea where to look for this. If that's possible
> to get, introducing an additional lines_total column would be one of
> the next steps.
>
> >
> > 4) This comment is a bit confusing, as it mixes "total" and "processed".
> > I'd just say "number of bytes processed so far" instead.
> >
> Fixed in attached patch.
> >
> > Other than that, it seems fine. I'm sure we could add more features, but
> > it seems like a good start - I plan to get this committed once I get a
> > patch fixing the docs issues.
>
> Patch is attached, it should be applied to the top of the previous
> patch. Overall patch (having both patches merged together) could be
> found at 
> https://patch-diff.githubusercontent.com/raw/simi/postgres/pull/6.patch.
>
> > regards
> >
> > --
> > Tomas Vondra
> > EnterpriseDB: http://www.enterprisedb.com
> > The Enterprise PostgreSQL Company
From 847b227dac1ce2e9554a32ff95b8d618f8725843 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Fri, 1 Jan 2021 01:14:47 +0100
Subject: [PATCH 1/2] Add pg_stat_progress_copy view with COPY progress report.

---
 doc/src/sgml/monitoring.sgml | 101 +++
 src/backend/catalog/system_views.sql |  11 +++
 src/backend/commands/copyfrom.c  |  16 +++-
 src/backend/commands/copyfromparse.c |   4 +
 src/backend/commands/copyto.c|  21 -
 src/backend/utils/adt/pgstatfuncs.c  |   2 +
 src/include/commands/copyfrom_internal.h |   1 +
 src/include/commands/progress.h  |   5 ++
 src/include/pgstat.h |   3 +-
 src/test/regress/expected/rules.out  |   9 ++
 10 files changed, 168 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3d6c90130677..51d261defd94 100644
--- a/doc/src/sgml/monitor

Re: [PATCH] Simple progress reporting for COPY command

2021-01-04 Thread Josef Šimánek
út 5. 1. 2021 v 0:46 odesílatel Tomas Vondra
 napsal:
>
> Hi,
>
> I did take a quick look today, and I have a couple minor comments:

Hi! Thanks for your time.

> 1) The catalog sgml docs seem to mention bytes_processed twice (one of
> that should be bytes_total), and line_processed (should be "lines_").

Fixed in attached patch.

>
> 2) I'm not quite sure about not including any info about the command.
> For example pg_stat_progress_create_index includes info about the
> command, although it's not very detailed. Not sure how useful would it
> be to show just COPY FROM / COPY TO, without more details.
>
> It's probably possible to extract this from pg_stat_activity, but that
> may be rather cumbersome (parsing arbitrary SQL and all that). Also,
> what if the COPY is called from a function etc.?

Any idea where to discuss this? My use case is really simple: I would
like to be able to see the progress of a COPY command by pid. There is a lot
of COPY info inside the CopyToStateData and CopyFromStateData structs. The
only limitation I see is that progress reporting columns support only int64
values. I'm not sure it is easily possible to expose, for example, the
filename this way.

>
> 3) I wonder if we should have something like lines_estimated. For COPY
> TO it's pretty simple to calculate it as
>
>  bytes_total / (bytes_processed / lines_processed)
>
> but what about
>
>  COPY (query) TO file
>
> In that case we don't know the total amount of bytes / rows, so we can't
> calculate any estimate. So maybe we could peek into the query plan? But
> I agree this is something we can add later.

If I remember correctly, one of the original ideas was to be able to pass
a custom bytes_total (or lines_total) via COPY options to be stored in the
copy progress. I can add this in a follow-up patch if it is still welcome.

For estimates I would prefer to add an additional column so the two are
not mixed together (or at least a boolean estimated = true/false that
reuses the bytes_total column). If query estimates are welcome, I can look
at how to reach the query plan and expose those numbers, when a query is
used, as estimated_lines and potentially estimated_bytes columns.
Calculating estimated_bytes would probably be a little tricky for some
column types.

Also, currently only COPY FROM file supports bytes_total (it is 0 in all
other scenarios). I think it should be possible for PostgreSQL to know the
actual number of lines a query returns for some kinds of queries, but I
have no idea where to look for this. If that is possible to get,
introducing an additional lines_total column would be one of the next
steps.

>
> 4) This comment is a bit confusing, as it mixes "total" and "processed".
> I'd just say "number of bytes processed so far" instead.
>
Fixed in attached patch.
>
> Other than that, it seems fine. I'm sure we could add more features, but
> it seems like a good start - I plan to get this committed once I get a
> patch fixing the docs issues.

Patch is attached, it should be applied to the top of the previous
patch. Overall patch (having both patches merged together) could be
found at 
https://patch-diff.githubusercontent.com/raw/simi/postgres/pull/6.patch.

> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
From 6d2ee68b227c05ce4d1eb95a4c4a9c4f7dd6fbfa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Tue, 5 Jan 2021 02:07:03 +0100
Subject: [PATCH] Fix docs and comment.

---
 doc/src/sgml/monitoring.sgml | 4 ++--
 src/include/commands/copyfrom_internal.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 51d261defd94..875133303e19 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6477,7 +6477,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
   
-   bytes_processed bigint
+   bytes_total bigint
   
   
Size of source file for COPY FROM command in bytes. It is set to 0 if not available.
@@ -6486,7 +6486,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
   
-   line_processed bigint
+   lines_processed bigint
   
   
Number of lines already processed by COPY command.
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index ae76be295a9b..80fac1e58a12 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -154,7 +154,7 @@ typedef struct CopyFromStateData
 	char	   *raw_buf;
 	int			raw_buf_index;	/* next byte to process */
 	int			raw_buf_len;	/* total # of bytes stored */
-	uint64  bytes_processed;/* total # of bytes processed, used for progress reporting */
+	uint64  bytes_processed;/* number of bytes processed so far */
 	/* Shorthand for number of unconsumed bytes available in raw_buf */
 #define RAW_BUF_BYTES(cstate) 

Re: [PATCH] Simple progress reporting for COPY command

2021-01-01 Thread Josef Šimánek
On Fri, Jan 1, 2021 at 11:16 Bharath Rupireddy wrote:
>
> On Fri, Jan 1, 2021 at 6:55 AM Josef Šimánek  wrote:
> > finally I had some time to revisit patch and all comments from
> > https://www.postgresql.org/message-id/CAFp7QwqMGEi4OyyaLEK9DR0%2BE%2BoK3UtA4bEjDVCa4bNkwUY2PQ%40mail.gmail.com
> > and I have prepared simple version of COPY command progress reporting.
>
> Thanks for the patch.
>
> > To keep the patch small as possible, I have introduced only a minimum
> > set of columns. It could be extended later if needed.
> >
> > Columns are inspired by CREATE INDEX progress report system view.
> >
> > pid - integer - PID of backend
> > datid - oid - OID of related database
> > datname - name - name of related database (this seems redundant, since
> > oid should be enough, but it is the same in CREATE INDEX)
> > relid - oid - oid of table related to COPY command, when not known
> > (for example when copying to file, it is 0)
> > bytes_processed - bigint - amount of bytes processed
> > bytes_total - bigint - file size in bytes if COPY FROM file (0 if not
> > COPY FROM file)
>
> It means that, for COPY FROM STDIN, bytes_total will be 0. IIUC, so
> having this parameter as 0 is an indication of the COPY command being
> from STDIN? I'm not sure whether it's discussed or not previously, but
> I personally prefer to have it as a separate column, say a varchar or
> text column with values STDIN, FILE or PROGRAM may be. And also it
> will be better if we have the format of the data streaming in, as a
> separate column, with possible values may be CSV, TEXT, BINARY.
>
> Since this progress reporting is for both COPY FROM and COPY TO,
> wouldn't it be good to have that also as a separate column, say
> command type with values FROM and TO?
>
> Thoughts?

My first patch had more columns (direction - FROM/TO, file bool,
program bool), but their purpose was a subject of discussion.
You can check the query by looking at pg_stat_activity by pid to get
more info. To keep it simple I decided to go with a minimal set of
columns for now. It can be extended later. I'm OK with continuing to
improve this with more feedback once merged.

> I'm sure some of the points would have already been discussed. I just
> shared my thoughts here.
>
> I have not looked into the patch in detail, but it's missing tests.
> I'm sure we can not add the tests into copy.sql or copy2.sql, can we
> think of adding test cases to TAP or under isolation? I'm not sure
> whether other progress reporting commands have test cases.

I raised the question in an old thread as well, since there are no
tests for the other progress commands either. I was told it is OK for now
to keep it untested, since there is no easy way to do that currently.

https://www.postgresql.org/message-id/20200615001828.GA52676%40paquier.xyz

> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com




[PATCH] Simple progress reporting for COPY command

2020-12-31 Thread Josef Šimánek
Hello,

Finally I had some time to revisit the patch and all the comments from
https://www.postgresql.org/message-id/CAFp7QwqMGEi4OyyaLEK9DR0%2BE%2BoK3UtA4bEjDVCa4bNkwUY2PQ%40mail.gmail.com
and I have prepared a simple version of COPY command progress reporting.

To keep the patch as small as possible, I have introduced only a minimal
set of columns. It can be extended later if needed.

Columns are inspired by CREATE INDEX progress report system view.

pid - integer - PID of the backend
datid - oid - OID of the related database
datname - name - name of the related database (this seems redundant, since
the OID should be enough, but it is the same in CREATE INDEX)
relid - oid - OID of the table related to the COPY command; when not known
(for example when copying to a file), it is 0
bytes_processed - bigint - number of bytes processed
bytes_total - bigint - file size in bytes for COPY FROM file (0 if not
COPY FROM file)
lines_processed - bigint - number of tuples processed

Example output for a common use case (import from a CSV file):

first console:
yr=# COPY test FROM '/home/retro/test.csv' (FORMAT CSV);

second console:
yr=# SELECT * FROM pg_stat_progress_copy;
  pid   | datid | datname | relid | bytes_processed | bytes_total | lines_processed
--------+-------+---------+-------+-----------------+-------------+-----------------
 803148 | 16384 | yr      | 16394 |       998965248 |      177796 |        56730126
(1 row)

It is simple to get the progress as a percentage, for example:

yr=# SELECT (bytes_processed/bytes_total::decimal)*100 FROM
pg_stat_progress_copy WHERE pid = 803148;
        ?column?
-------------------------
 50.04287948706048525800

^ ~50% of file processed already

I did some dead simple benchmarking as well. The difference is not
huge. Each command works with 100 million tuples. Times are in
seconds.

   test                      with progress   master (32d6287)   difference
  ------------------------- --------------- ------------------ ------------
   COPY table TO             46.102          47.499             -1.397
   COPY query TO             52.168          49.822              2.346
   COPY table TO PROGRAM     52.345          51.882              0.463
   COPY query TO PROGRAM     54.141          52.763              1.378
   COPY table FROM           88.970          85.161              3.809
   COPY table FROM PROGRAM   94.393          90.346              4.047

A properly formatted table (since I'm not sure everyone here will see
the table formatted well) and the benchmark source are present at
https://github.com/simi/postgres/pull/6. I have also included example
output there.

I'll add this to the current commitfest as well.
From 847b227dac1ce2e9554a32ff95b8d618f8725843 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Fri, 1 Jan 2021 01:14:47 +0100
Subject: [PATCH] Add pg_stat_progress_copy view with COPY progress report.

---
 doc/src/sgml/monitoring.sgml | 101 +++
 src/backend/catalog/system_views.sql |  11 +++
 src/backend/commands/copyfrom.c  |  16 +++-
 src/backend/commands/copyfromparse.c |   4 +
 src/backend/commands/copyto.c|  21 -
 src/backend/utils/adt/pgstatfuncs.c  |   2 +
 src/include/commands/copyfrom_internal.h |   1 +
 src/include/commands/progress.h  |   5 ++
 src/include/pgstat.h |   3 +-
 src/test/regress/expected/rules.out  |   9 ++
 10 files changed, 168 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3d6c90130677..51d261defd94 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -399,6 +399,12 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_progress_copypg_stat_progress_copy
+  One row for each backend running COPY, showing current progress.
+   See .
+  
+ 
 

   
@@ -5247,6 +5253,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
which support progress reporting are ANALYZE,
CLUSTER,
CREATE INDEX, VACUUM,
+   COPY,
and  (i.e., replication
command that  issues to take
a base backup).
@@ -6396,6 +6403,100 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
 
  
+
+ 
+  COPY Progress Reporting
+
+  
+   Whenever COPY is running, the
+   pg_stat_progress_copy view will contain one row
+   for each backend that is currently running a COPY command.
+   The table below describes the information that will be reported and provides
+   information about how to interpret it.
+  
+
+  
+   pg_stat_progress_copy View
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   pid integer
+  
+  
+   Process ID of backend.
+  
+ 
+
+ 
+  
+   datid text
+  
+  
+   OID of the database to which this backend is 

Re: Github Actions (CI)

2020-12-03 Thread Josef Šimánek
On Thu, Dec 3, 2020 at 7:34 Thomas Munro wrote:
>
> Hi hackers,
>
> I'm looking for more horsepower for testing commitfest entries
> automatically, and today I tried out $SUBJECT.  The attached is a
> rudimentary first attempt, for show-and-tell.  If you have a Github
> account, you just have to push it to a branch there and look at the
> Actions tab on the web page for the results.  Does anyone else have
> .github files and want to share, to see if we can combine efforts
> here?
>
> The reason for creating three separate "workflows" for Linux, Windows
> and macOS rather than three separate "jobs" inside one workflow is so
> that cfbot.cputube.org could potentially get separate pass/fail
> results for each OS out of the API rather than one combined result.  I
> rather like that feature of cfbot's results.  (I could be wrong about
> needing to do that, this is the first time I've ever looked at this
> stuff.)
>
> The Windows test actually fails right now, exactly as reported by
> Ranier[1].  It is a release build on a recent MSVC, so I guess that is
> expected and off-topic for this thread.  But generally,
> .github/workflows/ci-windows.yml is the weakest part of this.  It'd be
> great to get a debug/assertion build, show backtraces when it crashes,
> run more of the tests, etc etc, but I don't know nearly enough about
> Windows to do that myself.  Another thing is that it uses Choco for
> flex and bison; it'd be better to find those on the image, if
> possible.  Also, for all 3 OSes, it's not currently attempting to
> cache build results or anything like that.

Any chance to also share links to failing/passing test builds?

> I'm a bit sad that GH doesn't have FreeBSD build runners.  Those are
> now popping up on other CIs, but I'm not sure if their free/open
> source tiers have enough resources for cfbot.
>
> [1] 
> https://www.postgresql.org/message-id/flat/CAEudQArhn8bH836OB%2B3SboiaeEcgOtrJS58Bki4%3D5yeVqToxgw%40mail.gmail.com




Re: Commitfest 2020-11 is closed

2020-12-02 Thread Josef Šimánek
On Wed, Dec 2, 2020 at 23:36 Andrew Dunstan wrote:
>
>
> On 12/2/20 5:13 PM, Thomas Munro wrote:
> >
> > I'm experimenting with Github's built in CI.  All other ideas welcome.
>
>
>
> I'd look very closely at gitlab.

I was about to respond with the same idea. Feel free to ping me
regarding another CI. There is also the possibility of moving to GitHub
Actions (a CI free for open source projects). I have some experience with
that as well.

>
> cheers
>
>
> andrew
>
>
>




Re: [PATCH] Initial progress reporting for COPY command

2020-09-17 Thread Josef Šimánek
On Thu, Sep 17, 2020 at 7:09 Michael Paquier wrote:
>
> On Mon, Sep 07, 2020 at 01:13:10PM +0900, Michael Paquier wrote:
> > Josef, the patch has been waiting on author for a bit more than one
> > month, so could you send a rebased version please?  It looks that you
> > are still a bit confused by the commit fest process, and as a first
> > step we need a clean version to be able to review it.  This would also
> > allow the commit fest bot to check it at http://commitfest.cputube.org/.
>
> This feature has some appeal, but there is no activity lately, so I am
> marking it as returned with feedback.  Please feel free to send a new
> patch once you have time to do so, and I would recommend to register a
> new entry in the commit fest app when done.

Thanks for the info. I hope I'll be able to revisit this patch soon,
rebase it, and post it again. I'm still interested in this.

> --
> Michael




Re: [PATCH] Initial progress reporting for COPY command

2020-07-28 Thread Josef Šimánek
Thanks for the info. I am waiting for review. Is there a summary of the
requested changes?

On Tue, Jul 28, 2020 at 19:00 Fujii Masao wrote:

>
>
> On 2020/07/02 21:51, Daniel Gustafsson wrote:
> >> On 2 Jul 2020, at 14:42, Josef Šimánek  wrote:
> >> On Thu, Jul 2, 2020 at 14:25 Daniel Gustafsson <dan...@yesql.se> wrote:
> >
> >> The automated patchtester for the Commitfest gets confused when there
> are two
> >> versions of the same changeset attached to the email, as it tries to
> apply them
> >> both which obviously results in an application failure.  I've attached
> just the
> >> previously submitted patch version to this email to see if we can get a
> test
> >> run of it.
> >>
> >> Thanks, I'm new to commitfest and I was confused as well.
> >
> > No worries, we're here to help.
> >
> >> I tried to reattach the thread there. I'll prepare a new patch soon,
> what should I do? Just attach it again?
>
> New patch has not been sent yet.
> So I marked this as "Waiting on Author" at Commit Fest.
>
> Regards,
>
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>


Re: [PATCH] Initial progress reporting for COPY command

2020-07-02 Thread Josef Šimánek
On Thu, Jul 2, 2020 at 14:25 Daniel Gustafsson wrote:

> The automated patchtester for the Commitfest gets confused when there are
> two
> versions of the same changeset attached to the email, as it tries to apply
> them
> both which obviously results in an application failure.  I've attached
> just the
> previously submitted patch version to this email to see if we can get a
> test
> run of it.
>

Thanks, I'm new to the commitfest and I was confused as well. I tried to
reattach the thread there. I'll prepare a new patch soon; what should I do?
Just attach it again?


> cheers ./daniel
>
>


Re: [PATCH] Initial progress reporting for COPY command

2020-06-23 Thread Josef Šimánek
On Tue, Jun 23, 2020 at 13:15 Tomas Vondra wrote:

> On Tue, Jun 23, 2020 at 03:40:08PM +0530, vignesh C wrote:
> >On Mon, Jun 22, 2020 at 4:28 PM Josef Šimánek 
> wrote:
> >>
> >> Thanks for the hint regarding "CopyReadLineText". I'll take a look.
> >>
> >> For now I have tested those cases:
> >>
> >> CREATE TABLE test(id int);
> >> INSERT INTO test SELECT 1 FROM generate_series(1, 100);
> >> COPY (SELECT * FROM test) TO '/tmp/ids';
> >> COPY test FROM '/tmp/ids';
> >>
> >> psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,1))
> TO STDOUT;' > /tmp/ryba.txt
> >> cat /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'
> >>
> >> It is easy to check lines count and bytes count are in sync (since 1
> line is 2 bytes here - "1" and newline character).
> >> I'll try to check more complex COPY commands to ensure everything is in
> sync.
> >>
> >> If you have any ideas for testing queries, feel free to suggest.
> >
> >For copy from statement you could attach the session, put a breakpoint
> >at CopyReadLineText, execution will hit this breakpoint for every
> >record it is doing COPY FROM and parallely check if
> >pg_stat_progress_copy is getting updated correctly. I noticed it was
> >showing the file read size instead of the actual processed bytes.
> >
> >>>  +pg_stat_progress_copy| SELECT s.pid,
> >>> +s.datid,
> >>> +d.datname,
> >>> +s.relid,
> >>> +CASE s.param1
> >>> +WHEN 0 THEN 'TO'::text
> >>> +WHEN 1 THEN 'FROM'::text
> >>> +ELSE NULL::text
> >>> +END AS direction,
> >>> +((s.param2)::integer)::boolean AS file,
> >>> +((s.param3)::integer)::boolean AS program,
> >>> +s.param4 AS lines_processed,
> >>> +s.param5 AS file_bytes_processed
> >>>
> >>> You could include pg_size_pretty for s.param5 like
> >>> pg_size_pretty(S.param5) AS bytes_processed, it will be easier for
> >>> users to understand bytes_processed when the data size increases.
> >>
> >>
> >> I was looking at the rest of reporting views and for me those seem to
> be just basic ones providing just raw data to be used later in custom nice
> friendly human-readable views built on the client side.
> >> For example "pg_stat_progress_basebackup" also reports
> "backup_streamed" in raw form.
> >>
> >> Anyway if you would like to make this view more user-friendly, I can
> add that. Just ping me.
> >
> >I felt we could add pg_size_pretty to make the view more user friendly.
> >
>
> Please no. That'd make processing of the data (say, computing progress
> as processed/total) impossible. It's easy to add pg_size_pretty if you
> want it, it's impossible to undo it. I don't see a single pg_size_pretty
> call in system_views.sql.
>
>
I think the same.


> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [PATCH] Initial progress reporting for COPY command

2020-06-22 Thread Josef Šimánek
On Mon, Jun 22, 2020 at 14:14 Tomas Vondra wrote:

> On Sun, Jun 21, 2020 at 01:40:34PM +0200, Josef Šimánek wrote:
> >Thanks for all comments. I have updated code to support more options
> >(including STDIN/STDOUT) and added some documentation.
> >
> >Patch is attached and can be found also at
> >https://github.com/simi/postgres/pull/5.
> >
> >Diff version: https://github.com/simi/postgres/pull/5.diff
> >Patch version: https://github.com/simi/postgres/pull/5.patch
> >
> >I'm also attaching screenshot of HTML documentation and html documentation
> >file.
> >
> >I'll do my best to get this to commitfest now.
> >
>
> I see we're not showing the total number of bytes the COPY is expected
> to process, which makes it hard to estimate how far we actually are.
> Clearly there are cases when we really don't know that (exports, import
> from stdin/program), but why not to show file size for imports from a
> file? I'd expect that to be the most common case.
>

For COPY FROM file, an fstat() is already done and the info is available at
https://github.com/postgres/postgres/blob/fe186b4c200b76a5c0f03379fe8645ed1c70a844/src/backend/commands/copy.c#L1934.
It should be easy to update some param (param6, for example) with the file
size and expose it in the report view. When not available, this column can
be NULL.

Would that be enough?

On the other hand, clients can check the file size manually to get the
expected total and compare it to the reported bytes_processed.
Alternatively, "wc -l" can be used to get the number of lines, to be
checked against the lines_processed column. Should COPY itself count the
lines and populate another column with the total (using the configured
separator)? AFAIK that would need a full file scan, which can be slow for
huge files.


> I wonder if it made sense to show some estimates in the other cases. For
> example when exporting query result, maybe we could show the estimated
> number of rows and size? Of course, that's prone to estimation errors
> and it's more a wild idea for the future, I don't expect this patch to
> implement that.
>

My plan here was to expose numbers that are not currently available and let
clients gather the rest of the info on their own.

For example:
- for "COPY (query) TO file" - an EXPLAIN or COUNT variant of the query
could be executed beforehand to get the number of expected rows
- for "COPY table FROM file" - the file size or the number of lines in the
file can be inspected first to get the number of expected rows or bytes to
be processed

I see the current system view in my patch (and also all the other report
views currently available) more as a scaffold to build your own tools on.

For example, CLI tools can use this to provide some kind of progress
display.


> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [PATCH] Initial progress reporting for COPY command

2020-06-22 Thread Josef Šimánek
On Mon, Jun 22, 2020 at 9:15 vignesh C wrote:

> On Sun, Jun 21, 2020 at 5:11 PM Josef Šimánek 
> wrote:
> >
> > Thanks for all comments. I have updated code to support more options
> (including STDIN/STDOUT) and added some documentation.
> >
> > Patch is attached and can be found also at
> https://github.com/simi/postgres/pull/5.
> >
> > Diff version: https://github.com/simi/postgres/pull/5.diff
> > Patch version: https://github.com/simi/postgres/pull/5.patch
> >
> > I'm also attaching screenshot of HTML documentation and html
> documentation file.
> >
> > I'll do my best to get this to commitfest now.
> >
> > On Sun, Jun 14, 2020 at 14:32 Josef Šimánek wrote:
> >>
> >> Hello, as proposed by Pavel Stěhule and discussed on the local Czech
> PostgreSQL mailing list (
> https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email_source=footer),
> I have prepared an initial patch for COPY command progress reporting.
> >>
> >> Few examples first:
> >>
> >> "COPY (SELECT * FROM test) TO '/tmp/ids';"
> >>
> >> yr=# SELECT * from pg_stat_progress_copy;
> >>pid   | datid | datname | relid | direction | file | program |
> lines_processed | file_bytes_processed
> >>
> -+---+-+---+---+--+-+-+--
> >>  3347126 | 16384 | yr  | 0 | TO| t| f   |
>3529943 | 24906226
> >> (1 row)
> >>
> >> "COPY test FROM '/tmp/ids';
> >>
> >> yr=# SELECT * from pg_stat_progress_copy;
> >>pid   | datid | datname | relid | direction | file | program |
> lines_processed | file_bytes_processed
> >>
> -+---+-+---+---+--+-+-+--
> >>  3347126 | 16384 | yr  | 16385 | FROM  | t| f   |
>  121591999 |957218816
> >> (1 row)
> >>
> >> Columns are inspired by CREATE INDEX progress report system view.
> >>
> >> pid - integer - PID of backend
> >> datid - oid - OID of related database
> >> datname - name - name of related database (this seems redundant, since
> oid should be enough, but it is the same in CREATE INDEX)
> >> relid - oid - oid of table related to COPY command, when not known (for
> example when copying to file, it is 0)
> >> direction - text - one of "FROM" or "TO" depends on command used
> >> file - bool - is file is used?
> >> program - bool - is program used?
> >> lines_processed - bigint - amount of processed lines, works for both
> directions (FROM/TO)
> >> file_bytes_processed - amount of bytes processed when file is used
> (otherwise 0), works for both direction (
> >> FROM/TO) when file is used (file = t)
> >>
> >> Patch is attached and can be found also at
> https://github.com/simi/postgres/pull/5.
> >>
>
> Few comments:
> @@ -713,6 +714,8 @@ CopyGetData(CopyState cstate, void *databuf, int
> minread, int maxread)
>   break;
>   }
>
> + CopyUpdateBytesProgress(cstate, bytesread);
> +
>   return bytesread;
>  }
>
> This is actually the read data, actual processing will happen later
> like in CopyReadLineText, it would be better if
> CopyUpdateBytesProgress is done later, if not it will give the same
> value even though it does multiple inserts on the table.
> lines_processed will keep getting updated but file_bytes_processed
> will not be updated.
>

First I would like to explain what is reported (or at least what I'm trying
to get reported) in the bytes_processed column.

When exporting to a file it should start at 0 and end up at the actual
final file size.
When importing from a file, it should do the same. You can check the file
size before you start COPY FROM and track actual progress by looking at
bytes_processed.

This column is just a counter of bytes read from the input on COPY FROM,
or of bytes going through COPY TO.

Thanks for the hint regarding "CopyReadLineText". I'll take a look.

For now I have tested those cases:

CREATE TABLE test(id int);
INSERT INTO test SELECT 1 FROM generate_series(1, 100);
COPY (SELECT * FROM test) TO '/tmp/ids';
COPY test FROM '/tmp/ids';

psql -h /tmp yr -c 'COPY (SELECT 1 from generate_series(1,1)) TO
STDOUT;' > /tmp/ryba.txt
cat /tmp/ryba.txt | psql -h /tmp yr -c 'COPY test FROM STDIN'

It is easy to check lines count and bytes count are in sync (since 1 line
is 2 bytes here - "1" and newline character).
I'll try to ch

Re: [PATCH] Initial progress reporting for COPY command

2020-06-22 Thread Josef Šimánek
On Mon, Jun 22, 2020 at 4:48 Fujii Masao wrote:

>
>
> On 2020/06/21 20:33, Josef Šimánek wrote:
> >
> >
> > On Mon, Jun 15, 2020 at 6:39 Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
> >
> >
> >
> > On 2020/06/14 21:32, Josef Šimánek wrote:
> >  > Hello, as proposed by Pavel Stěhule and discussed on the local Czech
> PostgreSQL mailing list (
> https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email_source=footer),
> I have prepared an initial patch for COPY command progress reporting.
> >
> > Sounds nice!
> >
> >
> >  > file - bool - is file is used?
> >  > program - bool - is program used?
> >
> > Are these fields really necessary in a progress view?
> > What values are reported when STDOUT/STDIN is specified in COPY
> command?
> >
> >
> > For STDOUT and STDIN file is true and program is false.
>
> Could you tell me why these columns are necessary in *progress* view?
> If we want to see what copy command is actually running, we can see
> pg_stat_activity, instead. For example,
>
>  SELECT pc.*, a.query FROM pg_stat_progress_copy pc, pg_stat_activity
> a WHERE pc.pid = a.pid;
>

If those don't make sense, I can remove them. I have no strong opinion
about those values. They were just around when I was looking for possible
values to include in the progress report.

>
> >  > file_bytes_processed - amount of bytes processed when file is
> used (otherwise 0), works for both direction (
> >  > FROM/TO) when file is used (file = t)
> >
> > What value is reported when STDOUT/STDIN is specified in COPY
> command?
> >
> >
> > For my first patch nothing was reported on STDOUT/STDIN usage. I'll
> attach new patch soon supporting those as well.
>
> Thanks for the patch!
>
> With the patch, pg_stat_progress_copy seems to report the progress of
> the processing on file_fdw. Is this intentional?
>

Every action that internally uses COPY will be included in the progress
report view. I have noticed, for example, that pg_dump does this and is
reported there as well. I do not see any problem with this. For pg_dump it
is consistent with pg_stat_activity reporting the COPY command in the query
field.


> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>


Re: [PATCH] Initial progress reporting for COPY command

2020-06-21 Thread Josef Šimánek
On Mon, Jun 15, 2020 at 7:34 Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:

> > I'm using ftell to get current position in file to populate
> file_bytes_processed without error handling (ftell can return -1L and also
> populate errno on problems).
> >
> > 1. Is that a good way to get progress of file processing?
>
> IMO, it's better to handle the error cases. One possible case where
> ftell can return -1 and set errno is when the total bytes processed is
> more than LONG_MAX.
>
> Will your patch handle file_bytes_processed reporting for COPY FROM
> STDIN cases? For this case, ftell can't be used.
>
> Instead of using ftell and worrying about the errors, a simple
> approach could be to have a uint64 variable in CopyStateData to track
> the number of bytes read whenever CopyGetData is called. This approach
> can also handle the case of COPY FROM STDIN.
>

Thanks for the suggestion. I used this approach, and the latest patch now
supports both STDIN and STDOUT.


> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: [PATCH] Initial progress reporting for COPY command

2020-06-21 Thread Josef Šimánek
On Mon, Jun 15, 2020 at 2:18 Michael Paquier wrote:

> Hi Josef,
>
> On Sun, Jun 14, 2020 at 02:32:33PM +0200, Josef Šimánek wrote:
> > Hello, as proposed by Pavel Stěhule and discussed on the local Czech
> PostgreSQL
> > mailing list (
> >
> https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email_source=footer
> ),
> > I have prepared an initial patch for COPY command progress reporting.
>
> Sounds like a good idea to me.
>
> > I have not found any tests for progress reporting. Are there any? It
> would
> > need two backends running (one running COPY, one checking output of
> report
> > view). Is there any similar test I can inspire at? In theory, it should
> be
> > possible to use dblink_send_query to run async COPY command in the
> > background.
>
> We don't have any tests in core.  I think that making deterministic
> test cases is rather tricky here as long as we don't have a more
> advanced testing framework that allows is to lock certain code paths
> and keep around an expected state until a second session comes around
> and looks at the progress catalog (even that would need adding more
> code to core to mark the extra point looked at).  So I think that it is
> fine to not focus on that for this feature.  The important parts are
> the choice of the progress points and the data sent to MyProc, and
> both should be chosen wisely.
>
> > My initial (attached) patch also doesn't introduce documentation for this
> > system view. I can add that later once this patch is finalized (if that
> > happens).
>
> You may want to add it to the next commit fest:
> https://commitfest.postgresql.org/28/
> Documentation is necessary, and having some would ease reviews.
>

I have added documentation and more code comments, and I'll upload the patch
to the commitfest.


> --
> Michael
>


Re: [PATCH] Initial progress reporting for COPY command

2020-06-21 Thread Josef Šimánek
po 15. 6. 2020 v 6:39 odesílatel Fujii Masao 
napsal:

>
>
> On 2020/06/14 21:32, Josef Šimánek wrote:
> > Hello, as proposed by Pavel Stěhule and discussed on local czech
> PostgreSQL maillist (
> https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email_source=footer),
> I have prepared an initial patch for COPY command progress reporting.
>
> Sounds nice!
>
>
> > file - bool - is a file used?
> > program - bool - is a program used?
>
> Are these fields really necessary in a progress view?
> What values are reported when STDOUT/STDIN is specified in COPY command?
>

For STDOUT and STDIN, file is true and program is false.


> > file_bytes_processed - number of bytes processed when a file is used
> (otherwise 0), works for both directions (FROM/TO) when a file is used
> (file = t)
>
> What value is reported when STDOUT/STDIN is specified in COPY command?


In my first patch, nothing was reported for STDOUT/STDIN usage. I'll attach a
new patch supporting those as well soon.


>
>


> Regards,
>
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>


Re: [PATCH] Initial progress reporting for COPY command

2020-06-21 Thread Josef Šimánek
po 15. 6. 2020 v 2:18 odesílatel Michael Paquier 
napsal:

> Hi Josef,
>
> On Sun, Jun 14, 2020 at 02:32:33PM +0200, Josef Šimánek wrote:
> > Hello, as proposed by Pavel Stěhule and discussed on local czech
> PostgreSQL
> > maillist (
> >
> https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email_source=footer
> ),
> > I have prepared an initial patch for COPY command progress reporting.
>
> Sounds like a good idea to me.
>

Great. I will continue working on this.


> > I have not found any tests for progress reporting. Are there any? It
> would
> > need two backends running (one running COPY, one checking output of
> report
> > view). Is there any similar test I can inspire at? In theory, it should
> be
> > possible to use dblink_send_query to run async COPY command in the
> > background.
>
> We don't have any tests in core.  I think that making deterministic
> test cases is rather tricky here as long as we don't have a more
> advanced testing framework that allows us to lock certain code paths
> and keep around an expected state until a second session comes around
> and looks at the progress catalog (even that would need adding more
> code to core to mark the extra point looked at).  So I think that it is
> fine to not focus on that for this feature.  The important parts are
> the choice of the progress points and the data sent to MyProc, and
> both should be chosen wisely.
>

Thanks for the info. I'm focusing exactly on finding the right spots to
report the progress. I'll attach a new patch soon with better placement,
supporting more reporting options (including STDIN and STDOUT), and I'll also
try to add it to the commitfest.


>
> > My initial (attached) patch also doesn't introduce documentation for this
> > system view. I can add that later once this patch is finalized (if that
> > happens).
>
> You may want to add it to the next commit fest:
> https://commitfest.postgresql.org/28/
> Documentation is necessary, and having some would ease reviews.
> --
> Michael
>


[PATCH] Initial progress reporting for COPY command

2020-06-14 Thread Josef Šimánek
Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL
maillist (
https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email_source=footer),
I have prepared an initial patch for COPY command progress reporting.

Few examples first:

"COPY (SELECT * FROM test) TO '/tmp/ids';"

yr=# SELECT * from pg_stat_progress_copy;
   pid   | datid | datname | relid | direction | file | program |
lines_processed | file_bytes_processed
-+---+-+---+---+--+-+-+--
 3347126 | 16384 | yr  | 0 | TO| t| f   |
3529943 | 24906226
(1 row)

"COPY test FROM '/tmp/ids';

yr=# SELECT * from pg_stat_progress_copy;
   pid   | datid | datname | relid | direction | file | program |
lines_processed | file_bytes_processed
-+---+-+---+---+--+-+-+--
 3347126 | 16384 | yr  | 16385 | FROM  | t| f   |
121591999 |957218816
(1 row)

Columns are inspired by CREATE INDEX progress report system view.

pid - integer - PID of the backend
datid - oid - OID of the related database
datname - name - name of the related database (this seems redundant, since the
OID should be enough, but it is the same in CREATE INDEX)
relid - oid - OID of the table related to the COPY command; 0 when not known
(for example when copying to a file)
direction - text - one of "FROM" or "TO", depending on the command used
file - bool - is a file used?
program - bool - is a program used?
lines_processed - bigint - number of processed lines, works for both
directions (FROM/TO)
file_bytes_processed - number of bytes processed when a file is used
(otherwise 0), works for both directions (FROM/TO) when a file is used
(file = t)

Patch is attached and can be found also at
https://github.com/simi/postgres/pull/5.

Diff version: https://github.com/simi/postgres/pull/5.diff
Patch version: https://github.com/simi/postgres/pull/5.patch

I have a few initial notes and questions.

I'm using ftell to get the current position in the file to populate
file_bytes_processed, without error handling (ftell can return -1L and also
set errno on problems).

1. Is that a good way to get the progress of file processing?
2. Is it safe in the given context not to care about errors? If not, what
should be done on error?

Some columns are not populated for certain COPY commands. For example, when a
file is not used, file_bytes_processed is set to 0. Would it be better to use
NULL instead when the column is not related to the current command? The same
problem applies to the relid column.

I have not found any tests for progress reporting. Are there any? It would
need two backends running (one running COPY, one checking the output of the
report view). Is there any similar test I can take inspiration from? In
theory, it should be possible to use dblink_send_query to run an async COPY
command in the background.

My initial (attached) patch also doesn't introduce documentation for this
system view. I can add that later once this patch is finalized (if that
happens).


copy-progress-v1.patch
Description: Binary data


Re: Another modest proposal for docs formatting: catalog descriptions

2020-06-01 Thread Josef Šimánek
út 2. 6. 2020 v 0:30 odesílatel Tom Lane  napsal:

> As of HEAD, building the PDF docs for A4 paper draws 538 "contents
> ... exceed the available area" warnings.  While this is a nice step
> forward from where we were (v12 has more than 1500 such warnings),
> we're far from done fixing that issue.
>
> A large chunk of the remaining warnings are about tables that describe
> the columns of system catalogs, system views, and information_schema
> views.  The typical contents of a row in such a table are a field name,
> a field data type, possibly a "references" link, and then a description.
> Unsurprisingly, this does not work very well for descriptions of more
> than a few words.  And not infrequently, we *need* more than a few words.
>
> ISTM this is more or less the same problem we have/had with function
> descriptions, and so I'm tempted to solve it in more or less the same
> way.  Let's redefine the table layout to look like, say, this for
> pg_attrdef [1]:
>
> oid oid
> Row identifier
>
> adrelid oid (references pg_class.oid)
> The table this column belongs to
>
> adnum int2 (references pg_attribute.attnum)
> The number of the column
>
> adbin pg_node_tree
> The column default value, in nodeToString() representation. Use
> pg_get_expr(adbin, adrelid) to convert it to an SQL expression.
>
> That is, let's go over to something that's more or less like a table-ized
> , with the fixed items for an entry all written on the first
> line, and then as much description text as we need.  The actual markup
> would be closely modeled on what we did for function-table entries.
>
> Thoughts?
>

I have spotted this change recently in the progress monitoring devel docs (
https://www.postgresql.org/docs/devel/progress-reporting.html#CREATE-INDEX-PROGRESS-REPORTING).
The current version seems a little chaotic, since there are multiple tables on
the same page with two mixed layouts. The older layout (for example the v12
one -
https://www.postgresql.org/docs/12/progress-reporting.html#CREATE-INDEX-PROGRESS-REPORTING)
is much easier to read for me.

Is this the final change? I do not see any problem on this (progress
monitoring) page in the old layout. Is there any example of a problematic
page? Maybe there's a different way to solve this. For example, instead of
in-lining long text as a column description, it should be possible to link to
a detailed description in a custom paragraph or table. See the description
column of table 27.22 on the progress monitoring page, for the column
"phase", for a similar approach.


>
> regards, tom lane
>
> [1] https://www.postgresql.org/docs/devel/catalog-pg-attrdef.html
>
>
>
>
>


Re: [PATCH] Improve documentation of REINDEX options

2019-12-17 Thread Josef Šimánek
út 17. 12. 2019 v 6:36 odesílatel Michael Paquier 
napsal:

> On Fri, Dec 13, 2019 at 10:28:33AM +0100, Josef Šimánek wrote:
> > I have prepared patch to improve documentation for REINDEX. It
> > should be more inline with another documentation pages.
> >
> > You can see the change applied in attached file. Patch can be found at
> > https://github.com/simi/postgres/pull/3 (diff -
> > https://github.com/simi/postgres/pull/3.diff, patch -
> > https://github.com/simi/postgres/pull/3.patch).
>
> Please, always attach your patches to emails sent on this mailing
> list.  If for one reason or another the data located at the external
> link is lost (imagine for example that your github account is gone or
> that github is reduced to ashes), then such patches would be lost, and
> anybody looking at this email 10 years from now would not know what
> you have been writing about here.  I am attaching it here for the
> archive's sake.
>
> +where option can
> be:
> +
> +VERBOSE
> Why not...  We did that in the docs of ANALYZE for v11 when
> introducing the parenthesized grammar flavor for the options
> available.
>
> -   Rebuild all the indexes on the table my_table:
> +   Rebuild all the indexes on the table my_table
> with progress report per index:
>
>  
> -REINDEX TABLE my_table;
> +REINDEX (VERBOSE) TABLE my_table;
> Not sure if this part brings much to the reader though.  It is not
> like the command description of REINDEX is complicated with dozens
> of option choices.
>

For me this is the default way to reindex a whole table manually in psql,
since you get some "progress". Anyway, I can remove it if you don't see any
benefit in extending this example.


> --
> Michael
>


Re: [PATCH] Improve documentation of REINDEX options

2019-12-17 Thread Josef Šimánek
út 17. 12. 2019 v 6:36 odesílatel Michael Paquier 
napsal:

> On Fri, Dec 13, 2019 at 10:28:33AM +0100, Josef Šimánek wrote:
> > I have prepared patch to improve documentation for REINDEX. It
> > should be more inline with another documentation pages.
> >
> > You can see the change applied in attached file. Patch can be found at
> > https://github.com/simi/postgres/pull/3 (diff -
> > https://github.com/simi/postgres/pull/3.diff, patch -
> > https://github.com/simi/postgres/pull/3.patch).
>
> Please, always attach your patches to emails sent on this mailing
> list.  If for one reason or another the data located at the external
> link is lost (imagine for example that your github account is gone or
> that github is reduced to ashes), then such patches would be lost, and
> anybody looking at this email 10 years from now would not know what
> you have been writing about here.  I am attaching it here for the
> archive's sake.
>

Sorry, I'm attaching the same patch now for future reference.


>
> +where option can
> be:
> +
> +VERBOSE
> Why not...  We did that in the docs of ANALYZE for v11 when
> introducing the parenthesized grammar flavor for the options
> available.
>
> -   Rebuild all the indexes on the table my_table:
> +   Rebuild all the indexes on the table my_table
> with progress report per index:
>
>  
> -REINDEX TABLE my_table;
> +REINDEX (VERBOSE) TABLE my_table;
> Not sure if this part brings much to the reader though.  It is not
> like the command description of REINDEX is complicated with dozens
> of option choices.
> --
> Michael
>
From 992fbb946daf0e5483319665c74a1a75dfea503b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Fri, 13 Dec 2019 01:59:07 +0100
Subject: [PATCH] Unify REINDEX options documentation and extend examples.

---
 doc/src/sgml/ref/reindex.sgml | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 10881ab03a85..007cceb320ee 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,11 @@ PostgreSQL documentation
 
  
 
-REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
+REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name
+
+where option can be:
+
+VERBOSE
 
  
 
@@ -422,10 +426,10 @@ REINDEX INDEX my_index;
   
 
   
-   Rebuild all the indexes on the table my_table:
+   Rebuild all the indexes on the table my_table with progress report per index:
 
 
-REINDEX TABLE my_table;
+REINDEX (VERBOSE) TABLE my_table;
 
   
 


[PATCH] Improve documentation of REINDEX options

2019-12-13 Thread Josef Šimánek
Hello!

According to discussion at pgsql-general (
https://www.postgresql.org/message-id/flat/CAFp7QwqFYcHiARfT91rOQj%3DmFT0MWBE%2BkxEmjfQh3QmRN1UBiw%40mail.gmail.com#05b75be4fd11c0e6216f0b329c808f72)
I have prepared a patch to improve the documentation for REINDEX. It should
be more in line with other documentation pages.

You can see the change applied in attached file. Patch can be found at
https://github.com/simi/postgres/pull/3 (diff -
https://github.com/simi/postgres/pull/3.diff, patch -
https://github.com/simi/postgres/pull/3.patch).

This change is based on an idea of Pavel Stěhule, thanks a lot for that!
A similar approach was used recently in
https://www.postgresql.org/docs/devel/sql-dropdatabase.html.
Title: REINDEX


REINDEX — rebuild indexes

Synopsis

REINDEX [ ( option [, ...] ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM } [ CONCURRENTLY ] name

where option can be:

VERBOSE
Description
   REINDEX rebuilds an index using the data
   stored in the index's table, replacing the old copy of the index. There are
   several scenarios in which to use REINDEX:

   
  An index has become corrupted, and no longer contains valid
  data. Although in theory this should never happen, in
  practice indexes can become corrupted due to software bugs or
  hardware failures.  REINDEX provides a
  recovery method.
 
  An index has become “bloated”, that is it contains many
  empty or nearly-empty pages.  This can occur with B-tree indexes in
  PostgreSQL under certain uncommon access
  patterns. REINDEX provides a way to reduce
  the space consumption of the index by writing a new version of
  the index without the dead pages. See Section 24.2 for more information.
 
  You have altered a storage parameter (such as fillfactor)
  for an index, and wish to ensure that the change has taken full effect.
 
  If an index build fails with the CONCURRENTLY option,
  this index is left as “invalid”. Such indexes are useless
  but it can be convenient to use REINDEX to rebuild
  them. Note that only REINDEX INDEX is able
  to perform a concurrent build on an invalid index.
 Parameters

INDEX
  Recreate the specified index.
 TABLE
  Recreate all indexes of the specified table.  If the table has a
  secondary “TOAST” table, that is reindexed as well.
 SCHEMA
  Recreate all indexes of the specified schema.  If a table of this
  schema has a secondary “TOAST” table, that is reindexed as
  well. Indexes on shared system catalogs are also processed.
  This form of REINDEX cannot be executed inside a
  transaction block.
 DATABASE
  Recreate all indexes within the current database.
  Indexes on shared system catalogs are also processed.
  This form of REINDEX cannot be executed inside a
  transaction block.
 SYSTEM
  Recreate all indexes on system catalogs within the current database.
  Indexes on shared system catalogs are included.
  Indexes on user tables are not processed.
  This form of REINDEX cannot be executed inside a
  transaction block.
 name
  The name of the specific index, table, or database to be
  reindexed.  Index and table names can be schema-qualified.
  Presently, REINDEX DATABASE and REINDEX SYSTEM
  can only reindex the current database, so their parameter must match
  the current database's name.
 CONCURRENTLY
  When this option is used, PostgreSQL will rebuild the
  index without taking any locks that prevent concurrent inserts,
  updates, or deletes on the table; whereas a standard index rebuild
  locks out writes (but not reads) on the table until it's done.
  There are several caveats to be aware of when using this option
  — see Rebuilding Indexes Concurrently.
 VERBOSE
  Prints a progress report as each index is reindexed.
 Notes
   If you suspect corruption of an index on a user table, you can
   simply rebuild that index, or all indexes on the table, using
   REINDEX INDEX or REINDEX TABLE.
  
   Things are more difficult if you need to recover from corruption of
   an index on a system table.  In this case it's important for the
   system to not have used any of the suspect indexes itself.
   (Indeed, in this sort of scenario you might find that server
   processes are crashing immediately at start-up, due to reliance on
   the corrupted indexes.)  To recover safely, the server must be started
   with the -P option, which prevents it from using
   indexes for system catalog lookups.
  
   One way to do this is to shut down the server and start a single-user
   PostgreSQL server
   with the -P option included on its command line.
   Then, REINDEX DATABASE, REINDEX SYSTEM,
   REINDEX TABLE, or REINDEX INDEX can be
   issued, depending on how much you want to reconstruct.  If in
   doubt, use REINDEX SYSTEM to select
   reconstruction 

Re: [PATCH] Include triggers in EXPLAIN

2019-11-04 Thread Josef Šimánek
Hello Tom.

Thanks for the quick response. As I was testing this feature, it shows all
"possible" triggers that could be executed when running the given query. The
benefit of having this information in EXPLAIN as well is that you do not need
to execute the query (as EXPLAIN ANALYZE does). My use case is to take a look
at a query before it is executed to get some idea about the plan with
EXPLAIN.

Do you have an idea of a case where an actual trigger would be missing in
EXPLAIN with the current implementation, but present in EXPLAIN ANALYZE? I
can take a look if there's any way to handle those cases as well.

ne 3. 11. 2019 v 22:49 odesílatel Tom Lane  napsal:

> =?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?=  writes:
> > Recently I got few times into situation where I was trying to find out
> what
> > is blocking DELETE queries. Running EXPLAIN (even VERBOSE one) wasn't
> > useful, since the reason was slow trigger (missing index on foreign key
> > column). I had to create testing entry and run EXPLAIN ANALYZE DELETE to
> > get this information.
>
> > It will be really valuable for me to show triggers in EXPLAIN query since
> > it will make clear for me there will be any trigger "activated" during
> > execution of DELETE query and that can be the reason for slow DELETE.
>
> I don't really see the point of this patch?  You do get the trigger
> times during EXPLAIN ANALYZE, and I don't believe that a plain EXPLAIN
> is going to have full information about what triggers might fire or
> not fire.
>
> regards, tom lane
>


[PATCH] Include triggers in EXPLAIN

2019-11-03 Thread Josef Šimánek
Hello!

Recently I got a few times into a situation where I was trying to find out
what was blocking DELETE queries. Running EXPLAIN (even the VERBOSE one)
wasn't useful, since the reason was a slow trigger (a missing index on a
foreign key column). I had to create a testing entry and run EXPLAIN ANALYZE
DELETE to get this information.

It would be really valuable for me to show triggers in EXPLAIN, since it
would make clear that some trigger will be "activated" during execution of
the DELETE query, which can be the reason for a slow DELETE.

I have seen the initial discussion at
https://www.postgresql.org/message-id/flat/20693.732761%40sss.pgh.pa.us
about showing time spent in triggers in EXPLAIN ANALYZE, including a quick
discussion about possibly showing triggers during plain EXPLAIN. Anyway,
since it doesn't show any additional cost and just informs about the
possibilities, I still consider this feature useful. This is probably an
implementation of the idea mentioned at
https://www.postgresql.org/message-id/21221.736869%40sss.pgh.pa.us by
Tom Lane.

After an initial discussion with Pavel Stěhule and Tomáš Vondra on the Czech
PostgreSQL mailing list (
https://groups.google.com/forum/#!topic/postgresql-cz/Dq1sT7huVho), I was
able to prepare an initial version of this patch. I have added an EXPLAIN
option called TRIGGERS, enabled by default. (There's already an auto_explain
property for this.) I understand it is not possible to show only the
triggers which will really be activated unless the query is actually
executed. EXPLAIN ANALYZE remains untouched by this patch.

- patch with examples can be found at
https://github.com/simi/postgres/pull/2
- DIFF format https://github.com/simi/postgres/pull/2.diff
- PATCH format (also attached) https://github.com/simi/postgres/pull/2.patch

All regression tests pass with this change locally on the latest git master.
I would like to cover this patch with more regression tests, but I wasn't
sure where to place them, since there's no "EXPLAIN"-related test group.
Is "src/test/regress/sql/triggers.sql" the best place to add tests related
to this change?

PS: This is my first try at contributing to the postgresql codebase. The
quality of the patch is probably not the best, but I will be more than happy
to make any requested change if needed.

Regards,
Josef Šimánek
From 18578e3d07aa159631e0903abae496a6482fa279 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= 
Date: Sat, 27 Jul 2019 09:27:41 +0200
Subject: [PATCH] Show possible triggers in EXPLAIN.

---
 doc/src/sgml/ref/explain.sgml |  9 
 src/backend/commands/explain.c| 43 +++
 src/include/commands/explain.h|  1 +
 src/test/regress/expected/foreign_key.out |  6 ++-
 src/test/regress/expected/insert_conflict.out |  4 +-
 src/test/regress/expected/updatable_views.out | 10 -
 6 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 385d10411fa0..ba0d8df88c19 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -43,6 +43,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] statementboolean ]
 TIMING [ boolean ]
 SUMMARY [ boolean ]
+TRIGGERS [ boolean ]
 FORMAT { TEXT | XML | JSON | YAML }
 
  
@@ -224,6 +225,14 @@ ROLLBACK;
 

 
+   
+TRIGGERS
+
+ 
+   TODO
+ 
+
+   

 FORMAT
 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 62fb3434a32f..93845bcaad36 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -149,6 +149,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
 	ListCell   *lc;
 	bool		timing_set = false;
 	bool		summary_set = false;
+	bool		triggers_set = false;
 
 	/* Parse options list. */
 	foreach(lc, stmt->options)
@@ -175,6 +176,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
 			summary_set = true;
 			es->summary = defGetBoolean(opt);
 		}
+		else if (strcmp(opt->defname, "triggers") == 0)
+		{
+			triggers_set = true;
+			es->triggers = defGetBoolean(opt);
+		}
 		else if (strcmp(opt->defname, "format") == 0)
 		{
 			char	   *p = defGetString(opt);
@@ -210,6 +216,9 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString,
 	/* if the timing was not set explicitly, set default value */
 	es->timing = (timing_set) ? es->timing : es->analyze;
 
+	/* if the triggers was not set explicitly, set default value */
+	es->triggers = (triggers_set) ? es->triggers : true;
+
 	/* check that timing is used with EXPLAIN ANALYZE */
 	if (es->timing && !es->analyze)
 		ereport(ERROR,
@@ -556,7 +565,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	}
 
 	/* Print info about runtime of triggers */
-	if (es->analyze)
+	if (es->triggers)
 		ExplainPrintTriggers(es, quer