Re: [HACKERS] [Proposal] Progress bar for pg_dump/pg_restore

2015-06-24 Thread Taiki Kondo
Hi, Merlin.

Thank you for your comment, and sorry for the late response.

> *) how do you estimate %done and ETA when dumping?

As I mentioned in my reply to Andres, I think %done and ETA can be 
estimated from the number of tuples in pg_class.reltuples.
As you may know, pg_dump writes each tuple to the output file as it reads it 
while executing COPY TO.
Therefore pg_dump can calculate %done and ETA by fetching pg_class.reltuples 
and measuring the number of tuples dumped per second.
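
A minimal sketch of the estimate described above, written in Python for brevity (pg_dump itself is C). The function name and parameters are hypothetical; total_tuples stands for the sum of pg_class.reltuples over the tables being dumped, which is itself only a planner estimate.

```python
def estimate_progress(total_tuples, dumped_tuples, elapsed_seconds):
    """Return (fraction_done, eta_seconds) from tuple counts and elapsed time.

    eta_seconds is None while no dump rate can be measured yet.
    """
    if total_tuples <= 0:
        return 0.0, None
    # reltuples is an estimate, so the real count may exceed it; clamp at 100%.
    fraction = min(dumped_tuples / total_tuples, 1.0)
    if dumped_tuples <= 0 or elapsed_seconds <= 0:
        return fraction, None
    rate = dumped_tuples / elapsed_seconds          # tuples per second so far
    remaining = max(total_tuples - dumped_tuples, 0)
    return fraction, remaining / rate


# 250 of an estimated 1000 tuples dumped in 10 seconds
print(estimate_progress(1000, 250, 10))
```

The sketch assumes a roughly constant tuple rate, which does not hold when row sizes differ a lot between tables.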

> *) what's the benefit of doing this instead of using a utility like 'pv'?

Thank you for the new point of view. I had not known about the 'pv' 
utility. :)
I tried pg_dump with pv, and found that this approach measures progress by the 
number of characters passed through the pipe.
From my point of view, using 'pv' has the following problems.
I think at least points 1) to 4) below are benefits of my proposal.

1) %done and ETA are calculated from the number of characters passed through 
the pipe (as mentioned above), and the total number of characters must be 
specified by hand.
   Therefore, if the specified total is badly wrong, %done and ETA will be far 
from their true values.
2) Since 'pv' works on a pipe, pg_dump/pg_restore can't be used together with 
the '-j' option.
   This forces pg_dump/pg_restore to run with only one process even when 
running with two or more processes is possible.
3) For the same reason, the command line for pg_dump/pg_restore becomes longer 
and less easy to use.
   This may spoil the user experience.
4) To pass data through a pipe, pg_dump can't be used together with the '-f' 
option, and pg_restore can't be used together with the '-d' option.
   This may also spoil the user experience because the command line becomes 
longer and less easy to use.
5) Neither this approach nor my proposal resolves the concern about CREATE 
INDEX.
   We need to discuss this further.



regards,
--
Taiki Kondo



-----Original Message-----
From: Merlin Moncure [mailto:mmonc...@gmail.com] 
Sent: Friday, June 12, 2015 10:42 PM
To: Taiki Kondo
Cc: pgsql-hackers@postgresql.org; Akio Iwaasa
Subject: Re: [HACKERS] [Proposal] Progress bar for pg_dump/pg_restore

On Fri, Jun 12, 2015 at 7:45 AM, Taiki Kondo tai-ko...@yk.jp.nec.com wrote:
> Hi, all.
>
> I am a newbie on hackers.
> I have an idea from my point of view as a user, and I would like to propose 
> the following.
>
>
> Progress bar for pg_dump / pg_restore
> =====================================
>
> Motivation
> ----------
> pg_dump and pg_restore show nothing if users don't specify the verbose (-v) 
> option.
> When a table is too large to finish in a few minutes, this silence makes some 
> users worry about whether everything is all right.
>
> I propose this feature to free these users from worrying.
>
>
> Design & API
> ------------
> When pg_dump / pg_restore is running, a progress bar and the estimated time to 
> finish are shown on screen like the following.
>
>
> =   (50%)  15:50
>
> The bar (= above) and the percentage value (50% above) show the percentage 
> of progress, and the time (15:50 above) shows the estimated time to finish.
> (This percentage is the ratio for the whole processing.)
>
> Percentage and time are recalculated and shown every second.
>
> In pg_dump, the information required for calculating the percentage and 
> time comes from pg_class.
>
> In pg_restore, to calculate the same things, I want to record the total number 
> of command lines in the pg_dump file, so I would like to add a new element to 
> the Archive structure.
> (This means that the version number of the archive format changes.)
>
>
> Usage
> -----
> To use this feature, the user must specify the -P option on the command line.
> (This definition is also temporary, so it can be changed if it causes 
> problems.)
>
> $ pg_dump -Fc -P -f foo.pgdump foo
>
> I also think it would be better if this feature were enabled by default and did 
> not force users to specify any options, but that means changing the default 
> behavior, and can cause problems for programs that expect no output on stdout.
>
>
> I will implement this feature if this proposal is accepted by hackers.
> (I will probably not use ncurses to implement this feature, because ncurses 
> cannot be mixed with the standard printf family of functions.)
>
>
> Any comments are welcome.

*) how do you estimate %done and ETA when dumping?

*) what's the benefit of doing this instead of using a utility like 'pv'?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Proposal] Progress bar for pg_dump/pg_restore

2015-06-22 Thread Jim Nasby

On 6/21/15 9:45 PM, Craig Ringer wrote:

>> And, I also understand your concern about CREATE INDEX, but we have no way 
>> to get progress information for CREATE INDEX.
>> At present, I think it may be good to use the same time as the estimated 
>> time to execute COPY TO.
>
> You could probably get a handwave-quality estimate by looking at the
> average column widths for the cols included in the index plus the
> number of tuples in the table. It'd be useless for expression indexes,
> partial indexes, etc, but you can't have everything...


Jan Urbański demonstrated[1] getting progress stats for long running 
queries[2] at pgCon 2013. Perhaps some of that code would be useful here 
(or better yet perhaps we could include at least the measuring portion 
of his stuff in core... ;)


[1] https://www.pgcon.org/2013/schedule/events/576.en.html
[2] https://github.com/wulczer/pg-progress
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: [HACKERS] [Proposal] Progress bar for pg_dump/pg_restore

2015-06-21 Thread Craig Ringer
On 19 June 2015 at 16:45, Taiki Kondo tai-ko...@yk.jp.nec.com wrote:
> Hi, andres
>
> Thank you for your comment, and sorry for the late response.
>
>> The question is how to actually get useful estimates. As there's no
>> progress report for individual COPY and CREATE INDEX commands you'll, in
>> many cases, have very irregular progress updates. In many many cases
>> most of the time is spent on a very small subset of the total objects.
>
> When dumping, I think the number of tuples can be obtained from 
> pg_class.reltuples, so I want pg_dump to run a query that selects reltuples, 
> and then pg_dump will estimate the time needed for the COPY TO command as it 
> fetches each tuple.

It'd need to be a bit smarter than that, since it'd have to take some
account of average tuple size, etc, but it's an interesting idea to
use the stats to guesstimate copy times.
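
One way to take tuple size into account, sketched in Python under stated assumptions: each table's share of the total work is taken to be proportional to reltuples times an average row width in bytes. The function name and the input tuples are hypothetical; the width could come from, for example, summing pg_stats.avg_width over the table's columns, which is an assumption and not something proposed in this thread.

```python
def table_weights(tables):
    """tables: iterable of (name, reltuples, avg_row_bytes).

    Returns {name: share of total estimated dump work}, shares summing to 1.
    """
    costs = {
        name: max(reltuples, 0) * max(avg_row_bytes, 0)
        for name, reltuples, avg_row_bytes in tables
    }
    total = sum(costs.values())
    if total == 0:
        # No statistics available (e.g. never-analyzed tables).
        return {name: 0.0 for name in costs}
    return {name: cost / total for name, cost in costs.items()}


# Two tables: counting objects would give each 50% of the progress bar,
# weighting by estimated bytes gives the larger table its real share.
print(table_weights([("a", 100, 10), ("b", 300, 10)]))
```

Weighting by estimated bytes rather than by object count is aimed at the irregular-progress concern raised earlier in the thread, where most of the time is spent on a small subset of objects.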

> For restoring, I think it's better to record the above information (number 
> of tuples) in the pg_dump file to estimate the time to restore tables.

Since we generally suggest that people use a pg_dump and pg_restore
from the server version they're going to be restoring to, that should
be OK. It'd create some new entries in the pg_restore file manifest
that older pg_restore versions wouldn't understand.

> And, I also understand your concern about CREATE INDEX, but we have no way 
> to get progress information for CREATE INDEX.
> At present, I think it may be good to use the same time as the estimated 
> time to execute COPY TO.

You could probably get a handwave-quality estimate by looking at the
average column widths for the cols included in the index plus the
number of tuples in the table. It'd be useless for expression indexes,
partial indexes, etc, but you can't have everything...

Interesting idea to explore.
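
The handwave-quality estimate could look something like the Python sketch below. The function name and the per_entry_overhead parameter are hypothetical, and the inputs are assumed to be pg_class.reltuples for the table and the average widths of the indexed columns. As noted above, it says nothing useful about expression or partial indexes.

```python
def estimate_index_bytes(reltuples, indexed_column_widths, per_entry_overhead=0):
    """Rough index size: tuples x (sum of average column widths + overhead).

    per_entry_overhead is a placeholder for per-entry bookkeeping bytes;
    the default of 0 is an assumption, not a measured value.
    """
    entry_bytes = sum(indexed_column_widths) + per_entry_overhead
    return int(max(reltuples, 0) * entry_bytes)


# 1000 tuples, index on an int4 (4 bytes) and an int8 (8 bytes) column
print(estimate_index_bytes(1000, [4, 8]))
```

The result is only a relative weight for placing CREATE INDEX steps on the progress bar; it does not predict build time directly.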


Re: [HACKERS] [Proposal] Progress bar for pg_dump/pg_restore

2015-06-19 Thread Taiki Kondo
Hi, andres

Thank you for your comment, and sorry for the late response.

> The question is how to actually get useful estimates. As there's no
> progress report for individual COPY and CREATE INDEX commands you'll, in
> many cases, have very irregular progress updates. In many many cases
> most of the time is spent on a very small subset of the total objects.

When dumping, I think the number of tuples can be obtained from 
pg_class.reltuples, so I want pg_dump to run a query that selects reltuples, 
and then pg_dump will estimate the time needed for the COPY TO command as it 
fetches each tuple.

For restoring, I think it's better to record the above information (number of 
tuples) in the pg_dump file to estimate the time to restore tables.

And, I also understand your concern about CREATE INDEX, but we have no way to 
get progress information for CREATE INDEX.
At present, I think it may be good to use the same time as the estimated time 
to execute COPY TO.
But it would be better to get the information from pg_stat_activity, as 
proposed in another thread by Anzai-san:

http://www.postgresql.org/message-id/116262cf971c844fb6e793f8809b51c6ea6...@bpxm02gp.gisp.nec.co.jp

What do you think?


regards,
--
Taiki Kondo


-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Andres Freund
Sent: Friday, June 12, 2015 10:48 PM
To: Taiki Kondo
Cc: pgsql-hackers@postgresql.org; Akio Iwaasa
Subject: Re: [HACKERS] [Proposal] Progress bar for pg_dump/pg_restore

Hi,

On 2015-06-12 12:45:50 +0000, Taiki Kondo wrote:
> Design & API
> ------------
> When pg_dump / pg_restore is running, a progress bar and the estimated time to 
> finish are shown on screen like the following.
>
>
> =   (50%)  15:50
>
> The bar (= above) and the percentage value (50% above) show the percentage 
> of progress, and the time (15:50 above) shows the estimated time to finish.
> (This percentage is the ratio for the whole processing.)
>
> Percentage and time are recalculated and shown every second.
>
> In pg_dump, the information required for calculating the percentage and 
> time comes from pg_class.
>
> In pg_restore, to calculate the same things, I want to record the total number 
> of command lines in the pg_dump file, so I would like to add a new element to 
> the Archive structure.
> (This means that the version number of the archive format changes.)

The question is how to actually get useful estimates. As there's no progress 
report for individual COPY and CREATE INDEX commands you'll, in many cases, 
have very irregular progress updates. In many many cases most of the time is 
spent on a very small subset of the total objects.

Greetings,

Andres Freund


[HACKERS] [Proposal] Progress bar for pg_dump/pg_restore

2015-06-12 Thread Taiki Kondo
Hi, all.

I am a newbie on hackers.
I have an idea from my point of view as a user, and I would like to propose 
the following.


Progress bar for pg_dump / pg_restore
=====================================

Motivation
----------
pg_dump and pg_restore show nothing if users don't specify the verbose (-v) 
option.
When a table is too large to finish in a few minutes, this silence makes some 
users worry about whether everything is all right.

I propose this feature to free these users from worrying.


Design & API
------------
When pg_dump / pg_restore is running, a progress bar and the estimated time to 
finish are shown on screen like the following.


=   (50%)  15:50

The bar (= above) and the percentage value (50% above) show the percentage 
of progress, and the time (15:50 above) shows the estimated time to finish.
(This percentage is the ratio for the whole processing.)

Percentage and time are recalculated and shown every second.
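
A sketch of how one such status line could be formatted, in Python for brevity. The function name, the bar width, and the reading of "15:50" as remaining minutes and seconds (rather than a wall-clock finish time) are all assumptions made for illustration.

```python
def format_progress(fraction, eta_seconds, width=40):
    """Build one status line: a bar of '=' characters, a percentage, and an ETA."""
    fraction = min(max(fraction, 0.0), 1.0)      # keep within 0%..100%
    filled = int(width * fraction)
    bar = "=" * filled + " " * (width - filled)
    if eta_seconds is None:
        eta = "--:--"                            # no estimate available yet
    else:
        minutes, seconds = divmod(int(eta_seconds), 60)
        eta = f"{minutes:02d}:{seconds:02d}"
    return f"{bar} ({fraction:.0%}) {eta}"


# Half done with 950 seconds remaining
print(format_progress(0.5, 950, width=10))
```

A caller would redraw this line once per second; which stream it is written to is left open here, as it is in the proposal.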

In pg_dump, the information required for calculating the percentage and 
time comes from pg_class.

In pg_restore, to calculate the same things, I want to record the total number 
of command lines in the pg_dump file, so I would like to add a new element to 
the Archive structure.
(This means that the version number of the archive format changes.)


Usage
-----
To use this feature, the user must specify the -P option on the command line.
(This definition is also temporary, so it can be changed if it causes 
problems.)

$ pg_dump -Fc -P -f foo.pgdump foo

I also think it would be better if this feature were enabled by default and did 
not force users to specify any options, but that means changing the default 
behavior, and can cause problems for programs that expect no output on stdout.


I will implement this feature if this proposal is accepted by hackers.
(I will probably not use ncurses to implement this feature, because ncurses 
cannot be mixed with the standard printf family of functions.)


Any comments are welcome.



Best Regards,

--
Taiki Kondo




Re: [HACKERS] [Proposal] Progress bar for pg_dump/pg_restore

2015-06-12 Thread Andres Freund
Hi,

On 2015-06-12 12:45:50 +0000, Taiki Kondo wrote:
> Design & API
> ------------
> When pg_dump / pg_restore is running, a progress bar and the estimated time to 
> finish are shown on screen like the following.
>
>
> =   (50%)  15:50
>
> The bar (= above) and the percentage value (50% above) show the percentage 
> of progress, and the time (15:50 above) shows the estimated time to finish.
> (This percentage is the ratio for the whole processing.)
>
> Percentage and time are recalculated and shown every second.
>
> In pg_dump, the information required for calculating the percentage and 
> time comes from pg_class.
>
> In pg_restore, to calculate the same things, I want to record the total number 
> of command lines in the pg_dump file, so I would like to add a new element to 
> the Archive structure.
> (This means that the version number of the archive format changes.)

The question is how to actually get useful estimates. As there's no
progress report for individual COPY and CREATE INDEX commands you'll, in
many cases, have very irregular progress updates. In many many cases
most of the time is spent on a very small subset of the total objects.

Greetings,

Andres Freund

