Re: [HACKERS] backup.sgml-cmd-v003.patch

Karl O. Pinc Thu, 26 Sep 2013 20:28:32 -0700

On 09/26/2013 12:15:25 PM, Ivan Lezhnjov IV wrote:
> 
> On Sep 3, 2013, at 6:56 AM, Karl O. Pinc <k...@meme.com> wrote:
> 
> > On 07/31/2013 12:08:12 PM, Ivan Lezhnjov IV wrote:
> > 
> >> Patch filename: backup.sgml-cmd-v003.patch
> >> 
> >> The third version of this patch takes into consideration feedback
> >> received after original submission (it can be read starting from
> >> this message
> >> http://www.postgresql.org/message-id/CA+Tgmoaq-9D_mst113TdW=ar8mpgbc+x6t61azk3emhww9g...@mail.gmail.com)
> >> 
> >> Essentially, it addresses the points that were raised in community
> >> feedback and offers better worded statements that avoid implying
> >> that some features are being deprecated when it isn't the case. We
> >> also spent some more time polishing other details, like making
> >> adjustments to the tone of the text so that it sounds more like a
> >> manual, and less like a blog post. More importantly, this chapter
> >> now makes it clear that superuser privileges are not always
> >> required to perform a successful backup because in practice as long
> >> as the role used to make a backup has sufficient read privileges on
> >> all of the objects a user is interested in it's going to work just
> >> fine. We also mention and show examples of usage for pg_restore and
> >> pigz alongside with gzip, and probably something else too.


> > ---
> > 
> > Cleaned up and clarified here and there.
> > 
> > The bit about OIDs being deprecated might properly belong in
> > a separate patch.  The same might be said about adding mention of
> > pigz.  If you submit these as separate patch file attachments
> > they can always be applied in a single commit, but the reverse is
> > more work for the committer.  (Regardless, I see no reason to
> > have separate commitfest entries or anything other than multiple
> > attachments to the email that finalizes our discussion.)
> 
> Hello,
> 
> took me a while to get here, but a lot has been going on...

No worries.
 
> Okay, I'm new and I don't know why a single patch like this is more
> work for a committer. Just so I understand and know.

Different committers have different preferences, but the general rule
is that it's work to split a patch into pieces if you don't like the
whole thing, while it's easy to apply a bunch of small patches and
commit them all at once.  Further, each commit should represent
a single "feature" or conceptual change.  Again, preferences vary,
but I like to think that a good rule is that 1 commit should
be describable in a sentence, and not a run-on sentence
that says I did this and I also did that and something else.
So if there's a question in your mind about whether a committer
will want your entire change, or if your patch changes unrelated
things, then it does not hurt to submit it as separate patches.
All the patches can usually be attached to a single email and be
part of a single commitfest tracking entry.  No need to get
crazy.  These are just things to think about.

In your case I see 3 things happening:

OID deprecation

custom format explanation

pigz promotion
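
One way to end up with separate patch files for changes like these
is to make one commit per change and export them.  The following is
only a sketch in a throwaway repository; the file contents, commit
messages, and directory names are all made up for illustration and
are not the actual patch.

```shell
#!/bin/sh
# Sketch: one commit per conceptual change, exported as separate patches.
# Everything here (repo name, file contents, messages) is hypothetical.
work=$(mktemp -d)
cd "$work" || exit 1
git init -q demo
cd demo || exit 1
git config user.name "Example Author"
git config user.email "author@example.com"

# A stand-in for the existing documentation file.
printf 'base text\n' > backup.sgml
git add backup.sgml
git commit -q -m "Base documentation"

# Each change gets its own commit, describable in one sentence.
for change in "Note that OIDs are deprecated" \
              "Explain the custom dump format" \
              "Mention pigz as a gzip alternative"
do
    printf '%s\n' "$change" >> backup.sgml
    git commit -q -a -m "$change"
done

# Export the last three commits as numbered patch files.
git format-patch -q -3 -o ../patches
ls ../patches
```

All of the resulting files can go out as attachments to one email.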


> > My thought is that the part beginning with "The options in detail
> > are:" should not describe all the possibilities for the --format
> > option, that being better left to the reference section.  Likewise,
> > this being prose, it might be best to describe all the options
> > in-line, instead of presented as a list.  I have left it as-is
> > for you to improve as seen fit.
> 
> Agreed, it probably looks better as a sentence.

Looks good.

> 
> > 
> > I have frobbed your <programlisting> to adjust the indentation and
> > line-wrap style.  I submit it here for consideration in case this
> > style is attractive.  This is nothing but conceit.  We should use
> > the same style used elsewhere in the documentation.

> Looks good to me.

I fixed the missing \ I messed up on last time
and slightly re-worded the previous sentence.

I've grepped through the sgml looking for multi-line shell scripts
and found only 1 (in sepgsql.sgml).  I don't see a conflict with
the formatting/line-break convention used in the patch, although
it does differ slightly in indentation.  I'm leaving the shell
script formatting in the patch as-is for the committer to judge.
(I like the way it looks but it is not a traditional style.)

> 
> > 
> > I don't know that it's necessary to include pigz examples, because
> > it sounds like pigz is a drop-in gzip replacement.  I've left your
> > examples in, in case you feel they are necessary.
> 
> We do. We believe it can encourage more people to consider using it.
> The way we see it, most people seem to be running multicore systems
> these days, yet many simply are not aware of pigz.

Ok.  It's your patch.
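
For what it's worth, the "drop-in" property can be checked without
a database.  This is a rough sketch only: pick_compressor is a
made-up helper name, and the printf stands in for pg_dump output.

```shell
#!/bin/sh
# Sketch: use pigz when it is installed, otherwise fall back to gzip.
# pick_compressor is a hypothetical helper, not part of either tool.
pick_compressor() {
    if command -v pigz >/dev/null 2>&1; then
        echo pigz
    else
        echo gzip
    fi
}

compressor=$(pick_compressor)
tmpfile=$(mktemp)

# Stand-in for "pg_dump dbname": any text stream is compressed the same way.
printf 'CREATE TABLE t (id integer);\n' | "$compressor" > "$tmpfile"

# Either tool writes a gzip stream, so plain gunzip reads it back.
roundtrip=$(gunzip -c < "$tmpfile")
echo "$roundtrip"
rm -f "$tmpfile"
```

The reload side needs no change whichever compressor wrote the file.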

> > 
> > The existing text of the SQL Dump section could use some alteration
> > to reduce redundancy and add clarity.  I'm thinking specifically of
> > mention of pg_restore as being required to restore custom format
> > backups and of the default pg_dump output being not just "plain
> > text" but being a collection of SQL commands.  Yes, the latter is
> > obvious upon reflection, psql being what it is, but I think it would
> > be helpful to spell this out.  Especially in the context of the
> > current patch.  There could well be other areas like this to be
> > addressed.
> 
> I don't quite follow you here. I mean, I kinda understand what you
> mean in general, but when I look at the text I fail to see what you
> had in mind specifically.

Specifically, I was waving my hands about in a general fashion.  ;-)

> For example, pg_restore is mentioned only 3 times in section 24.1.
> Each mention seems pretty essential to me. And the text flow is
> pretty natural.

Ok.

> 
> Also, about plain text format being a collection of SQL commands. The
> very first paragraph of the section 24.1 reads "The idea behind this
> dump method is to generate a text file with SQL commands that, when
> fed back to the server, will recreate the database in the same state
> as it was at the time of the dump. PostgreSQL provides the utility
> program pg_dump for this purpose."

I have added/changed some text at the first use of "plain text"
to be more explicit.  (Usually I would use <wordasword>
markup but the pg docs always seem to use <quote> instead,
which is fine so I've done that.)

I also removed some duplicate-ish text towards the end of the
section regarding the nature of custom-format dumps in the
context of restoring same with pg_restore.
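
To illustrate the plaintext versus custom-format distinction, here
is a small sketch that guesses which tool a dump file needs.  The
helper name restore_tool is made up, and it relies on my
understanding that custom-format archives begin with the bytes
"PGDMP"; tar and directory format dumps are not handled, and the
sample files below are fakes created just for the demonstration.

```shell
#!/bin/sh
# Sketch: choose a restore tool by peeking at the start of a dump file.
# restore_tool is a hypothetical helper.  Assumption: a custom-format
# archive starts with "PGDMP"; anything else is treated as an SQL script.
restore_tool() {
    magic=$(head -c 5 "$1")
    if [ "$magic" = "PGDMP" ]; then
        echo pg_restore
    else
        echo psql
    fi
}

# Fake sample files, only so the helper has something to inspect.
samples=$(mktemp -d)
printf 'PGDMP%s' 'rest-of-archive' > "$samples/custom.dump"
printf '%s\n' '--' '-- PostgreSQL database dump' '--' > "$samples/plain.sql"

restore_tool "$samples/custom.dump"
restore_tool "$samples/plain.sql"
```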

> Thanks for a detailed response. I attached a patch file that builds
> on your corrections and introduces some of the edits discussed above.

Upon reflection it bothered me to have text in the dump section
talking about restore.  I've moved this into the restore
section and tweaked the text leading into the paste.
This seems to work well but feel free to disagree.

I've one more thought: is the business of
confusing --file with <filename> really important enough
to be marked <important> and included
in-line?  In general, too many of these
out-of-band comments detract from the flow of the document.
I've left the markup as-is, but you might consider changing
this into a <footnote>.  (You'll need to move it into the 
back of the preceding <para>.)  The document currently contains very
few footnotes but this might be an appropriate case.
(I've not looked at the other footnotes to see what they
contain.)
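
The distinction itself is easy to show side by side.  The sketch
below only prints the two command lines rather than running them;
restore_cmd and convert_cmd are made-up helper names, and the
printed strings are not quoted safely for odd file names.

```shell
#!/bin/sh
# Sketch: print (not run) two pg_restore invocations to contrast the
# archive file name argument with the --file option.
# restore_cmd and convert_cmd are hypothetical helpers.

# Restore into a database: the archive is the trailing file name argument.
restore_cmd() {    # usage: restore_cmd ARCHIVE DBNAME
    printf 'pg_restore --dbname=%s %s\n' "$2" "$1"
}

# Convert to an SQL script: --file names the output, the archive is
# still the trailing argument.
convert_cmd() {    # usage: convert_cmd ARCHIVE SCRIPT
    printf 'pg_restore --file=%s %s\n' "$2" "$1"
}

restore_cmd mydatabase.sqlc mydb
convert_cmd mydatabase.sqlc mydatabase.sql
```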

I also see inconsistent use of the SQL term.
In some cases it's marked up with <acronym> and in
some cases not.  I am ignoring the issue.

Attached is a patch for your review (applies against HEAD):
backup.sgml-cmd-v003_3.patch  

If you like it, let me know and I'll get it sent to the committers.
Or, if you want to make some more changes based on the comments
above, break it into 3 patches, or whatever, get those back to me.

Regards,

Karl <k...@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index ccb76d8..7e748b5 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -38,7 +38,7 @@
 pg_dump <replaceable class="parameter">dbname</replaceable> &gt; <replaceable class="parameter">outfile</replaceable>
 </synopsis>
    As you see, <application>pg_dump</> writes its result to the
-   standard output. We will see below how this can be useful.
+   standard output.
   </para>
 
   <para>
@@ -47,8 +47,12 @@ pg_dump <replaceable class="parameter">dbname</replaceable> &gt; <replaceable cl
    that you can perform this backup procedure from any remote host that has
    access to the database. But remember that <application>pg_dump</>
    does not operate with special permissions. In particular, it must
-   have read access to all tables that you want to back up, so in
-   practice you almost always have to run it as a database superuser.
+   have read access to all tables that you want to back up. Whether
+   it has the proper access permissions is determined by the
+   privileges granted to the role used to connect to the database server.
+   Although a superuser role will always have the necessary permissions,
+   the role need not be a superuser as long as it has read access to
+   the tables, sequences, etc. being dumped.
   </para>
 
   <para>
@@ -95,13 +99,38 @@ pg_dump <replaceable class="parameter">dbname</replaceable> &gt; <replaceable cl
 
   <important>
    <para>
-    If your database schema relies on OIDs (for instance, as foreign
-    keys) you must instruct <application>pg_dump</> to dump the OIDs
-    as well. To do this, use the <option>-o</option> command-line
-    option.
+    The use of OIDs on user objects is deprecated. If your database schema
+    relies on OIDs (for instance, as foreign keys) you must instruct
+    <application>pg_dump</> to dump the OIDs as well. To do this, use 
+    the <option>-o</option> command-line option.
    </para>
   </important>
 
+  <para>
+   The above example creates a <quote>plaintext</quote> backup -- a file that
+   contains SQL commands which, when executed, restore the database.  This
+   type of dump can be used to restore a database in full. However, there are
+   more sophisticated
+   <productname>PostgreSQL</> backup formats which allow for far greater
+   control when working with backups.  One of these is
+   the <quote>custom</quote> format, which the following more elaborate
+   example creates:
+
+<synopsis>
+pg_dump -U <replaceable class="parameter">username</replaceable> --format=c --file=<replaceable class="parameter">mydatabase.sqlc</replaceable> <replaceable class="parameter">dbname</replaceable>
+</synopsis>
+   where <option>-U</option> instructs <application>pg_dump</> to connect as the specified database user,
+   <option>--format</option> sets the output file format
+   to custom (the other supported formats are directory, tar, and plain text) and
+   <option>--file</option> specifies the output file name.
+
+   The most interesting of these is
+   <option>--format</option>.  By default
+   <application>pg_dump</> creates a plaintext backup.  You may be
+   better off creating a custom format backup, since the custom format is much
+   more flexible.
+  </para>
+
   <sect2 id="backup-dump-restore">
    <title>Restoring the Dump</title>
 
@@ -112,8 +141,8 @@ pg_dump <replaceable class="parameter">dbname</replaceable> &gt; <replaceable cl
 <synopsis>
 psql <replaceable class="parameter">dbname</replaceable> &lt; <replaceable class="parameter">infile</replaceable>
 </synopsis>
-    where <replaceable class="parameter">infile</replaceable> is the
-    file output by the <application>pg_dump</> command. The database <replaceable
+    where <replaceable class="parameter">infile</replaceable> is a plaintext
+    backup output by the <application>pg_dump</> command. The database <replaceable
     class="parameter">dbname</replaceable> will not be created by this
     command, so you must create it yourself from <literal>template0</>
     before executing <application>psql</> (e.g., with
@@ -176,6 +205,46 @@ pg_dump -h <replaceable>host1</> <replaceable>dbname</> | psql -h <replaceable>h
    </important>
 
    <para>
+    The <application>psql</> command is a way, as shown above, to restore
+    plaintext backups.  To restore a custom format backup
+    the <application>pg_restore</> command must be used. It has options
+    similar to those of <application>pg_dump</>. A simple use
+    of <application>pg_restore</> to restore an entire backup is:
+<synopsis>
+pg_restore -U <replaceable class="parameter">username</replaceable> --dbname=<replaceable class="parameter">databasename</replaceable> <replaceable class="parameter">filename</replaceable>
+</synopsis>
+    where <replaceable class="parameter">filename</replaceable> is the name of
+    the backup file.
+   </para>
+
+   <important>
+    <para>
+     Do not confuse <option>--file</> with <replaceable
+     class="parameter">filename</replaceable>. The
+     <option>--file</> option can be used when converting backups from one
+     form to another, the value of <option>--file</> specifying the name of
+     the output file.
+    </para>
+   </important>
+
+  <para>
+   Using the custom format you are able to restore single objects from a
+   backup. For example, to restore only a specified index from a backup
+   file:
+<synopsis>
+pg_restore -U <replaceable class="parameter">username</replaceable> --dbname=<replaceable class="parameter">dbname</replaceable> --index=<replaceable class="parameter">indexname</replaceable> <replaceable class="parameter">filename</replaceable>
+</synopsis>
+   To restore only a single function:
+<synopsis>
+pg_restore -U <replaceable class="parameter">username</replaceable> --dbname=<replaceable class="parameter">dbname</replaceable> --function=<replaceable class="parameter">functionname(args)</replaceable> <replaceable class="parameter">filename</replaceable>
+</synopsis>
+   To restore only a single table:
+<synopsis>
+pg_restore -U <replaceable class="parameter">username</replaceable> --dbname=<replaceable class="parameter">dbname</replaceable> --table=<replaceable class="parameter">tablename</replaceable> <replaceable class="parameter">filename</replaceable>
+</synopsis>
+  </para>
+
+   <para>
     After restoring a backup, it is wise to run <xref
     linkend="sql-analyze"> on each
     database so the query optimizer has useful statistics;
@@ -222,6 +291,21 @@ psql -f <replaceable class="parameter">infile</replaceable> postgres
     each database will be internally consistent, the snapshots of
     different databases might not be exactly in-sync.
    </para>
+
+   <para>
+    Unfortunately, <application>pg_dumpall</> can only create plaintext
+    backups. However, it is currently the only way to back up the globals in your
+    cluster. So, a reasonable strategy to back up your globals and
+    produce a flexible backup of every database in the cluster might be:
+<programlisting>
+pg_dumpall -g -Uusername --file=globals.sql;
+psql -AtU postgres \
+     -c "SELECT datname FROM pg_database WHERE NOT datistemplate" \
+  | while read f;
+      do pg_dump -Upostgres --format=c --file=$f.sqlc $f;
+    done;
+</programlisting>
+   </para>
   </sect2>
 
   <sect2 id="backup-dump-large">
@@ -239,12 +323,21 @@ psql -f <replaceable class="parameter">infile</replaceable> postgres
     <title>Use compressed dumps.</title>
     <para>
      You can use your favorite compression program, for example
-     <application>gzip</application>:
+     <application>gzip</application> or <application>pigz</application>
+     (a parallel implementation of <application>gzip</application> for modern
+     multi-processor, multi-core machines):
 
 <programlisting>
 pg_dump <replaceable class="parameter">dbname</replaceable> | gzip &gt; <replaceable class="parameter">filename</replaceable>.gz
 </programlisting>
 
+     or:
+
+<programlisting>
+pg_dump <replaceable class="parameter">dbname</replaceable> | pigz &gt; <replaceable class="parameter">filename</replaceable>.gz
+</programlisting>
+
+
      Reload with:
 
 <programlisting>
@@ -254,7 +347,7 @@ gunzip -c <replaceable class="parameter">filename</replaceable>.gz | psql <repla
      or:
 
 <programlisting>
-cat <replaceable class="parameter">filename</replaceable>.gz | gunzip | psql <replaceable class="parameter">dbname</replaceable>
+pigz -dc <replaceable class="parameter">filename</replaceable>.gz | psql <replaceable class="parameter">dbname</replaceable>
 </programlisting>
     </para>
    </formalpara>
@@ -293,8 +386,8 @@ cat <replaceable class="parameter">filename</replaceable>* | psql <replaceable c
 pg_dump -Fc <replaceable class="parameter">dbname</replaceable> &gt; <replaceable class="parameter">filename</replaceable>
 </programlisting>
 
-     A custom-format dump is not a script for <application>psql</>, but
-     instead must be restored with <application>pg_restore</>, for example:
+     A custom-format dump must be restored with <application>pg_restore</>,
+     for example:
 
 <programlisting>
 pg_restore -d <replaceable class="parameter">dbname</replaceable> <replaceable class="parameter">filename</replaceable>
