Re: An improved README experience for PostgreSQL

2024-02-28 Thread Andrew Atkinson
Looks good!

Presentation of Markdown file as HTML on mirrors I know of:
https://github.com/postgres/postgres/blob/master/README.md
https://gitlab.com/postgres/postgres/-/blob/master/README.md



On Wed, Feb 28, 2024 at 2:59 PM Nathan Bossart 
wrote:

> On Wed, Feb 28, 2024 at 02:21:49PM -0600, Nathan Bossart wrote:
> > On Wed, Feb 28, 2024 at 02:07:34PM -0600, Andrew Atkinson wrote:
> >> I agree that starting with the direct conversion is reasonable. Markdown
> >> "modernizes" the file using a popular plain text file format that's
> >> renderable.
> >
> > Thanks.  I'll commit this shortly.
>
> Committed.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
>


Re: An improved README experience for PostgreSQL

2024-02-28 Thread Andrew Atkinson
I've grabbed Nathan's patch, and pushed it to GitHub simply to preview the
rendered Markdown there. This isn't intended to be reviewed, this is just
for anyone else that's interested in easily seeing the HTML version of the
Markdown file compared with the earlier one.

Nathan's direct conversion:
https://github.com/postgres/postgres/blob/9c0f1dd350ee29ad3ade2816c4338b7ca5186bba/README.md

Original email version with more sections and content:
https://github.com/andyatkinson/postgres/blob/e88138765750b6f7898089b4016641eee01bf616/README.md

I agree that starting with the direct conversion is reasonable. Markdown
"modernizes" the file using a popular plain text file format that's
renderable.

However, I also think it would be cool to get some input on what the most
useful 2-3 content items are for new developers and make any additions
possible there. In writing this, I had an idea to ask about whether this
topic could be covered as an upcoming PostgreSQL community blog post
series. In theory, we could gather a variety of perspectives that way. That
could make it less subjective if we see several people independently
suggesting a particular wiki page for example, for inclusion in the README.
I'll pursue that outside the mailing list and report back!



On Wed, Feb 28, 2024 at 1:36 PM Tom Lane  wrote:

> Nathan Bossart  writes:
> > On Wed, Feb 28, 2024 at 01:08:03PM -0600, Nathan Bossart wrote:
> >> -PostgreSQL Database Management System
> >> -=
> >> +# PostgreSQL Database Management System
>
> > This change can be omitted, which makes the conversion even simpler.
>
> That's a pretty convincing proof-of-concept.  Let's just do this,
> and then make sure to keep the file legible as plain text.
>
> regards, tom lane
>


Re: Add publisher and subscriber to glossary documentation.

2024-02-26 Thread Andrew Atkinson
If there's a movement towards "node" to refer to the database which has the
Subscription object, then perhaps the documentation for

31.2. Subscription, Chapter 31. Logical Replication

should be updated as well, since it uses both the "database" and "node"
terms on the same page, and to me referring to the same thing (I could be
missing a subtlety).

See:

"The subscriber database..."

"A subscriber node may..."

Also, the word "database" in this sentence: "A subscription defines the
connection to another database" to me works, but I think using "node" there
could be more consistent if it’s referring to the server instance running
the database that holds the PUBLICATION. The connection string information
example later on the page shows "host" and "dbname" configured in the
CONNECTION value for the SUBSCRIPTION. This sentence seems like the use of
"database" in casual style to mean the "server instance" (or "node").

Also, the "The node where a subscription is defined". That one actually
feels to me like "The database where a subscription is defined", but then
that contradicts what I just said, and "node" is fine here but I think
"node" should be on the preceding sentence too.

Anyway, hopefully these examples show “node” and “database” are mixed and
perhaps others agree using one consistently might help the goals of the
docs.


Thanks!



On Mon, Feb 26, 2024 at 1:08 AM Peter Smith  wrote:

> Hi, the patch v4 LGTM.
>
> ==
> Kind Regards,
> Peter Smith.
> Fujitsu Australia
>
>
>


Re: An improved README experience for PostgreSQL

2024-02-26 Thread Andrew Atkinson
Thanks for the feedback Nathan and Tom. Samay also suggested adding the
patch. I've added a .patch with the file for consideration.

On Mon, Feb 26, 2024 at 2:30 PM Tom Lane  wrote:

> Nathan Bossart  writes:
> > I think this would be nice.  If the Markdown version is reasonably
> readable
> > as plain-text, maybe we could avoid maintaining two READMEs files, too.
> > But overall, +1 to modernizing the README a bit.
>
> Per past track record, we change the top-level README only once every
> three years or so, so I doubt it'd be too painful to maintain two
> versions of it.
>
> Having said that, any proposal for this ought to be submitted as
> a patch, rather than expecting people to go digging around on
> some other repo.
>
> regards, tom lane
>


0001-Adds-modernized-README.md.patch
Description: Binary data


An improved README experience for PostgreSQL

2024-02-26 Thread Andrew Atkinson
Hello Hackers. We’re proposing an improved README for PostgreSQL that
includes more helpful links for prospective PostgreSQL contributors and has
a nicer presentation.

Although development does not take place on GitHub or GitLab for
PostgreSQL, many developers might view the PostgreSQL source code using one
of those mirrors (I do myself). Since both support Markdown files, a
Markdown version of the README (as README.md) gets presentational benefits
that I think are helpful.

For a head-to-head comparison of what that looks like, review the current
README and a proposed README.md version below:

Current version:

https://github.com/andyatkinson/postgres/blob/master/README

Markdown README.md version on GitHub:

https://github.com/andyatkinson/postgres/blob/e88138765750b6f7898089b4016641eee01bf616/README.md


 Feedback Requested 

Samay Sharma are both interested in the initial developer experience for
PostgreSQL. We had a chat about the role the README plays in that, while
it's a small role, we thought this might be a place to start.


We'd love some feedback.

Prospective contributors need to know about compilation, the mailing lists,
and how the commitfest events work. This information is scattered around on
wiki pages, but we're wondering if more could be brought into the README
and whether that would help?

If you do check out the new file, we'd love to know whether you think
there's useful additions, or there's content that's missing.

If there's any kind of feedback or consensus on this thread, I'm happy to
create and send a patch.

Thanks for taking a look!

Andrew Atkinson w/ reviews from Samay Sharma


Re: [PATCH] pgbench log file headers

2023-11-13 Thread Andrew Atkinson
Hi Adam. Column headers in pgbench log files seem helpful. Besides
programs, it seems helpful for humans to understand the column data as
well. I was able to apply your patch and verify that the headers are added
to the log file:

andy@MacBook-Air-4 ~/P/postgres (master)> rm pgbench_log.*

andy@MacBook-Air-4 ~/P/postgres (master)> src/bin/pgbench/pgbench
postgres://andy:@localhost:5432/postgres --log --log-header

pgbench (17devel)




andy@MacBook-Air-4 ~/P/postgres (master)> cat pgbench_log.*

client_id transaction_no time script_no time_epoch time_us

0 1 8435 0 1699902315 902700

0 2 1130 0 1699902315 903973

...



The generated pgbench_log.62387 log file showed headers "client_id
transaction_no time script_no time_epoch time_us". Hope that helps with
your patch acceptance journey.


Good luck!


Andrew Atkinson

On Mon, Nov 13, 2023 at 11:55 AM Adam Hendel  wrote:

> Hello Hackers!
>
> Currently, pgbench will log individual transactions to a logfile when the
> `--log` parameter flag is provided. The logfile, however, does not include
> column header. It has become a fairly standard expectation of users to have
> column headers present in flat files. Without the header in the pgbench log
> files, new users must navigate to the docs and piece together the column
> headers themselves. Most industry leading frameworks have tooling built in
> to read column headers though, for example python/pandas read_csv().
>
> We can improve the experience for users by adding column headers to
> pgbench logfiles with an optional command line flag, `--log-header`. This
> will keep the API backwards compatible by making users opt-in to the column
> headers. It follows the existing pattern of having conditional flags in
> pgbench’s API; the `--log` option would have both –log-prefix and
> –log-header if this work is accepted.
>
> The implementation considers the column headers only when the
> `--log-header` flag is present. The values for the columns are taken
> directly from the “Per-Transaction Logging” section in
> https://www.postgresql.org/docs/current/pgbench.html and takes into
> account the conditional columns `schedule_lag` and `retries`.
>
>
> Below is an example of what that logfile will look like:
>
>
> pgbench  postgres://postgres:postgres@localhost:5432/postgres --log
> --log-header
>
> client_id transaction_no time script_no time_epoch time_us
>
> 0 1 1863 0 169988 791102
>
> 0 2 706 0 169988 791812
>
>
> If the interface and overall approach makes sense, I will work on adding
> documentation and tests for this too.
>
> Respectfully,
>
> Adam Hendel
>
>


Re: [Doc] Glossary Term Definitions Edits

2023-10-14 Thread Andrew Atkinson
> It would depend on how you pronounce SQL.
Got it, makes sense.

> We've standardised our docs
Makes sense. This "a vs. an" could be a nice thing to add to a
"conventions" or "doc standards" if it's not there already. I checked
https://www.postgresql.org/docs/current/notation.html and
https://wiki.postgresql.org/wiki/Main_Page Is there a docs page that has
that information? If there's an existing page where it could be added, I'd
be happy to add it.

> That's a suspended hyphen and is common usage.
Sounds good, reset it back.

 > "Has been" means that something happened just now.
Sounds good, reset it back. "has been" is also used in the materialized
term, "has been pre-computed".

> I think "option to" is not wrong
Ok, don't feel strongly. Reset it back.

> That's a suspended hyphen and is common usage.
Ok, reset it back.


Curious what people think about this. I thought the first phrase was
possibly redundant.


- On operating systems with a root user,

- said user is not allowed to be the cluster owner.

+ The user root is not allowed to be the cluster
owner.



I reviewed the definitions of assure vs. ensure, and I think ensure fits
better, but I also noticed elsewhere the word “assurances” is used, as in
an assurance about durability.



- makes it visible to other transactions and assures its

+ makes it visible to other transactions and ensures its



Re: that/which, I put this into ChatGPT :) and apparently there is a
“relative clause” vs. non-relative clause. My understanding was a
non-relative clause would typically be inside commas, and could be removed
without changing the meaning.


Since this section is talking about Bloat, and the space in data pages with
non-current row versions is part of bloat, I don’t think it could be
removed. So I think it’s a “relative clause” and “that” makes more sense.

This is another situation though where if there’s English majors or
documentation experts, I’m happy to learn why I’m wrong. :)



- Space in data pages which does not contain current row versions,

+ Space in data pages that does not contain current row versions,




Smaller patch attached!

Thanks.






On Sat, Oct 14, 2023 at 12:55 AM Erik Wienhold  wrote:

> On 2023-10-14 06:16 +0200, Andrew Atkinson write:
> >- When describing options for a command, changed to “option of”
> instead
> >of “option to”
>
> I think "option to" is not wrong (maybe less common).  I've seen this
> in other texts and took it as "the X option [that applies] to Y".
>
> >- “system- or user-supplied”, removed the dash after system. Or I’d
> >suggest system-supplied or user-supplied, to hyphenate both.
>
> That's a suspended hyphen and is common usage.
>
> >- Changed “volume of records has been written” to “volume of records
> >were written”
>
> "Has been" means that something happened just now.  This is perfectly
> fine when talking about checkpoints IMO.
>
> >- Many examples of “an SQL”. I changed those to “a SQL...”. For
> example
> >I changed “An SQL command which” to “A SQL command that”. I'm not an
> >English major so maybe I'm missing something here.
>
> Depends on how you pronounce SQL (ess-cue-el or sequel).  "An SQL"
> is more common in the docs whereas "a SQL" is more common in code
> comments.
>
> --
> Erik
>


glossary_terms_grammar_edits_v2.patch
Description: Binary data


[Doc] Glossary Term Definitions Edits

2023-10-13 Thread Andrew Atkinson
Hello. I started reading through the Glossary[^1] terms to learn from the
definitions, and to double check them against what I'd written elsewhere. I
found myself making edits. :)

I've put the edits together into a patch. My goal was to focus on wording
simplifications that are smoother to read, without big changes.

I realized I should check with others though, so this is a mid-point
check-in. For now I went through terms from "A" through "I".


Here's a recap of the changes:



   - Changed places like “to make” to use the verb directly, i.e. “make”
   - When describing options for a command, changed to “option of” instead
   of “option to”
   - “system- or user-supplied”, removed the dash after system. Or I’d
   suggest system-supplied or user-supplied, to hyphenate both.
   - Changed “will access”  to “access”
   - Changed “helps to prevent” to “helps prevent”
   - Changed “volume of records has been written” to “volume of records
   were written”
   - “It is required that this user exist” changed to “This user is
   required to exist...” (I’d also suggest “This user must exist before”) as a
   simplification, but that’s a bigger difference from what’s there now.
   - Changed “operating-system” to remove the hyphen, which is how it’s
   written elsewhere in the Glossary.
   - Many examples of “an SQL”. I changed those to “a SQL...”. For example
   I changed “An SQL command which” to “A SQL command that”. I'm not an
   English major so maybe I'm missing something here.
   - I often thought “that” was easier to read than “which”, and there are
   several examples in the patch. For example “Space in data pages that…”
   replaced “Space in data pages which…”
   - Simplifications like: “There also exist two secondary forks” to “There
   are two secondary forks”

I was able to build the documentation locally and preview the HTML version.


If these types of changes are helpful, and can continue a consistent style
through all the terms and provide a new (larger) v2 patch.



Thanks for taking a look.

Andrew Atkinson

[^1]: https://www.postgresql.org/docs/current/glossary.html


glossary_terms_grammar_edits_v1.patch
Description: Binary data


Re: Doc: Minor update for enable_partitionwise_aggregate

2023-10-10 Thread Andrew Atkinson
Thank you for the feedback and clarifications Ashutosh. How about this?

"which allows grouping or aggregation on partitioned tables to be performed
separately for each partition."

Attached a v2 patch file with this change.

Here is a gist w/ a partitioned table and 2 partitions, comparing execution
plans after enabling the parameter, and reading the plan nodes up from the
bottom. With enable_partitionwise_aggregate = on, I can see the Partial
HashAggregate/"Group Key" on each of the two partitions (of the partitioned
table) where I don't see that when the setting is off (by default).
https://gist.github.com/andyatkinson/7af81fb8a5b9e677af6049e29ab2cb73

For the terms partitioned table vs. partitions, I used how they're
described here:
https://www.postgresql.org/docs/current/ddl-partitioning.html
- partitioned table (virtual table)
- partitions (of a partitioned table)

Thanks!
Andrew


On Tue, Oct 3, 2023 at 4:33 AM Ashutosh Bapat 
wrote:

> On Sun, Oct 1, 2023 at 7:38 AM Andy Atkinson 
> wrote:
> >
> > Hello. While reading the docs for the enable_partitionwise_aggregate
> parameter on the Query Planning page, I thought the description had a small
> mistake that could be improved.
> >
> > The current wording is: "which allows grouping or aggregation on a
> partitioned tables performed separately "
> >
> > Page: https://www.postgresql.org/docs/current/runtime-config-query.html
> >
> > I think possible better alternatives could be:
> >
> > (Option 1) a "partitioned table's partitions" (the possessive form of
> "it's"). The "enable_partition_pruning" parameter uses "the partitioned
> table's partitions" in this form. I think this option is good, but I had a
> slight preference for option 2.
> > (Option 2) Or to just cut out the first part and say "to be performed
> separately for each partition", which seemed simpler. So the sentence
> reads: "which allows grouping or aggregation to be performed separately for
> each partition"
>
> I would leave "on a partitioned table". Notice that I have removed "s"
> from tables.
>
> > (Option 3) dropping the "a" so it says "which allows grouping or
> aggregation on partitioned tables performed separately". I don't think this
> is as good though because the aggregation happens on the partitions, so it
> feels slightly off to me to say the "partitioned tables" instead of the
> partitions.
>
> It's technically incorrect as well. Aggregation is performed on a
> single relation always - a join or subquery or simple relation. A join
> may have multiple tables in it but the aggregation is performed on its
> result and not individual tables and hence not on partitions of
> individual tables.
>
> --
> Best Wishes,
> Ashutosh Bapat
>


query-planning-enable-partitionwise-aggregate-improved-wording_v2.patch
Description: Binary data