Re: [HACKERS] Question / requests.
On Mon, Oct 10, 2016 at 4:51 AM, Jim Nasby wrote: > On 10/5/16 9:58 AM, Francisco Olarte wrote: >> Is the system catalog a bottleneck for people who has real use for >> paralell vacuum? I mean, to me someone who does this must have a very >> big db on a big iron. If that does not consist of thousands and >> thousands of smallish relations, it will normally be some very big >> tables and a much smaller catalog. > Not necessarily. Anyone that makes extensive use of temp tables can end up > with a very large (and bloated) pg_attribute. AFAIK you can actually create > "temp" versions of any object that lives in a schema by specifying pg_temp > as the schema, but in practice I don't think you'll really see anything > other than pg_attribute get really large. So it would be nice if > pg_attribute could be done in parallel, but I suspect it's one of the > catalog tables that could be causing these problems. This I see, but if you crunch on temp tables I'm not sure you should do full vacuum on the catalog ( I fear full catalog vacuum is going to lock DDL, and this kind of situation is better served by autovacuum maintaning free space in the catalog so it gets to an stable size ). I do not think it is neccessary to make every operation as fast as possible, I prefer a simpler system. I feel someone having a multi terabyte database which needs full vacuums due to its use patterns, in paralell, and also crunchs on a lots of temporarary tables, which a strange use pattern which mandates full vacuums of pg_attribut ( I could concoct a situtation for these, but not easily ) is a specialized power DBA which should be able to easily script vacuum tasks taking into account the usage pattern much better than any reasonable simple alternative. After all -t is there and can be repeated for something. Francisco Olarte. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question / requests.
On 10/5/16 9:58 AM, Francisco Olarte wrote: Is the system catalog a bottleneck for people who has real use for paralell vacuum? I mean, to me someone who does this must have a very big db on a big iron. If that does not consist of thousands and thousands of smallish relations, it will normally be some very big tables and a much smaller catalog. Not necessarily. Anyone that makes extensive use of temp tables can end up with a very large (and bloated) pg_attribute. AFAIK you can actually create "temp" versions of any object that lives in a schema by specifying pg_temp as the schema, but in practice I don't think you'll really see anything other than pg_attribute get really large. So it would be nice if pg_attribute could be done in parallel, but I suspect it's one of the catalog tables that could be causing these problems. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question / requests.
Robert Haas wrote: > On Wed, Oct 5, 2016 at 10:58 AM, Francisco Olarte > wrote: > > On Tue, Oct 4, 2016 at 7:50 PM, Robert Haas wrote: > >> On Mon, Oct 3, 2016 at 5:44 PM, Alvaro Herrera > >> wrote: > > ... > >>> I wonder if the real answer isn't just to disallow -f with parallel > >>> vacuuming. > >> Seems like we should figure out which catalog tables are needed in > >> order to perform a VACUUM, and force those to be done last and one at > >> a time. > > > > Is the system catalog a bottleneck for people who has real use for > > paralell vacuum? > > I don't know, but it seems like the documentation for vacuumdb > currently says, more or less, "Hey, if you use -j with -f, it may not > work!", which seems unacceptable to me. It should be the job of the > person writing the feature to make it work in all cases, not the job > of the person using the feature to work around the problem when it > doesn't. The most interesting use case of vacuumdb is lazy vacuuming, I think, so committing that patch as it was submitted previously was a good step forward even if it didn't handle VACUUM FULL 100%. I agree that it's better to have both modes Just Work in parallel, which is the point of this subsequent patch. So let's move forward. I support Francisco's effort to make -f work with -j. I don't have a strong opinion on which of the various proposals presented so far is the best way to implement it, but let's figure that out and get it done. If you want to argue that vacuumdb -f -j not working properly is a bug in the vacuumdb -j commit, ISTM you're arguing that we should backpatch whatever we come up with as a bug fix, but I would disagree with that. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question / requests.
Robert: On Fri, Oct 7, 2016 at 3:20 PM, Robert Haas wrote: > On Wed, Oct 5, 2016 at 10:58 AM, Francisco Olarte > wrote: >> On Tue, Oct 4, 2016 at 7:50 PM, Robert Haas wrote: >>> On Mon, Oct 3, 2016 at 5:44 PM, Alvaro Herrera >>> wrote: >> ... I wonder if the real answer isn't just to disallow -f with parallel vacuuming. >>> Seems like we should figure out which catalog tables are needed in >>> order to perform a VACUUM, and force those to be done last and one at >>> a time. >> >> Is the system catalog a bottleneck for people who has real use for >> paralell vacuum? > I don't know, but it seems like the documentation for vacuumdb > currently says, more or less, "Hey, if you use -j with -f, it may not > work!", which seems unacceptable to me. It should be the job of the > person writing the feature to make it work in all cases, not the job > of the person using the feature to work around the problem when it > doesn't. That may be the case, but the only ways to solve it seems to be disallow full paralell as suggested. OTOH what I was asking was just if people think the time gained by minimizing the part of pg_catalog serially processed on a full-paralell case would be enough to warrant the increased code complexity and bug surface. Anyway, I'll stick to my original plan even if someone decides to fix or disallow full paralell as I think it has it uses. Francisco Olarte. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question / requests.
On Wed, Oct 5, 2016 at 10:58 AM, Francisco Olarte wrote: > On Tue, Oct 4, 2016 at 7:50 PM, Robert Haas wrote: >> On Mon, Oct 3, 2016 at 5:44 PM, Alvaro Herrera >> wrote: > ... >>> I wonder if the real answer isn't just to disallow -f with parallel >>> vacuuming. >> Seems like we should figure out which catalog tables are needed in >> order to perform a VACUUM, and force those to be done last and one at >> a time. > > Is the system catalog a bottleneck for people who has real use for > paralell vacuum? I don't know, but it seems like the documentation for vacuumdb currently says, more or less, "Hey, if you use -j with -f, it may not work!", which seems unacceptable to me. It should be the job of the person writing the feature to make it work in all cases, not the job of the person using the feature to work around the problem when it doesn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question / requests.
On Tue, Oct 4, 2016 at 7:50 PM, Robert Haas wrote: > On Mon, Oct 3, 2016 at 5:44 PM, Alvaro Herrera > wrote: ... >> I wonder if the real answer isn't just to disallow -f with parallel >> vacuuming. > Seems like we should figure out which catalog tables are needed in > order to perform a VACUUM, and force those to be done last and one at > a time. Is the system catalog a bottleneck for people who has real use for paralell vacuum? I mean, to me someone who does this must have a very big db on a big iron. If that does not consist of thousands and thousands of smallish relations, it will normally be some very big tables and a much smaller catalog. Then you can vacuum paralell everything but system catalogs and then vaccum serial those. I do not have that big dbs, but in my modest case system catalogs are very fast to vacuum. If 99% of the time is spent vacuuming non-system it does not make much sense to spend effort on speeding maybe one half of the system catalogs vacuum ( I mean, 99% of the time is non-system, half of system can be done in paralell, with a ten-fold speed up we go from 99+0.5+0.5 to 9.9+0.5+0.5 = 10.9 with full serial system catalogs and to 9.9+0.5+0.05=10.45 with hybrid system vacuum and with a 100 fold speedup, in the realm of SF for me, to 0.99+0.5+0.5=1.99 and 0.99+0.5+0.05=1.54, not that much to be gained ) ( Just asking. ) OTOH while I dig into the code I will take a look to see how complex it will be to build to lists, paralell + serial, and loop on them. This could be used on a first approach to split on !pg_catalog + pg_catalog and used as a base for having and explicit list or some flag in the catalog later. Francisco Olarte. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question / requests.
On Mon, Oct 3, 2016 at 5:44 PM, Alvaro Herrera wrote: > Robert Haas wrote: >> On Fri, Sep 30, 2016 at 11:20 AM, Francisco Olarte >> wrote: >> > After some messages due to vacuumdb auto-deadlocking itself on the >> > system tables when doing paralell vacuum of a full database I >> > suggested adding some flags to make vacuumdb process schemas. I was >> > asked wether I could write a patch for that and I am thinking on doing >> > it. >> >> What messages are you seeing, exactly? "auto-deadlocking" isn't a thing. > > https://www.postgresql.org/message-id/57EBC9AE.2060302%40163.com > > I wonder if the real answer isn't just to disallow -f with parallel > vacuuming. Seems like we should figure out which catalog tables are needed in order to perform a VACUUM, and force those to be done last and one at a time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question / requests.
On Mon, Oct 3, 2016 at 11:44 PM, Alvaro Herrera wrote: > Robert Haas wrote: >> On Fri, Sep 30, 2016 at 11:20 AM, Francisco Olarte >> What messages are you seeing, exactly? "auto-deadlocking" isn't a thing. > https://www.postgresql.org/message-id/57EBC9AE.2060302%40163.com Besides that even the docs for -j state "Note that using this mode together with the -f (FULL) option might cause deadlock failures if certain system catalogs are processed in parallel." So the only "safe" way to do -j -f seems to be using a table list. My proposed patch just makes it easy to build that by doing schema filtering. > I wonder if the real answer isn't just to disallow -f with parallel > vacuuming. It may be. I do not feel it is necessary, the docs are clear, this may be like disallowing knifes because you can cut yourself. IMO vacuumdb -f and -j are for people who know what they are doing, a warning may be nice anyway. Anyway, even if the combo is disallowed I feel schema filtering has its use. As an example, in some of my systems I have CDR tables partitioned by timestamp, either monthly or other period. As most of the data does not change I routinely coalesce many partitions and move them to a historic schema ( like I make a yearly partition and zap monthly ones, I still inherit from it ). This let me dump the historic schema when I change it and dump without the historic schema daily, greatly reducing dump times as the historic schema typically contains >90% of the data and it only changes when I have to do a back-fix to the data, which is very rare. Being able to do the same thing with vacuumdb could be useful. So, I'll just follow on executing my plan, but I'm prepared to abort it anytime if people feel it does not hold its weight. Most of the work is going to be learning how to submit a patch so it is reusable for me. Francisco Olarte. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question / requests.
Robert Haas wrote: > On Fri, Sep 30, 2016 at 11:20 AM, Francisco Olarte > wrote: > > After some messages due to vacuumdb auto-deadlocking itself on the > > system tables when doing paralell vacuum of a full database I > > suggested adding some flags to make vacuumdb process schemas. I was > > asked wether I could write a patch for that and I am thinking on doing > > it. > > What messages are you seeing, exactly? "auto-deadlocking" isn't a thing. https://www.postgresql.org/message-id/57EBC9AE.2060302%40163.com I wonder if the real answer isn't just to disallow -f with parallel vacuuming. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question / requests.
On Fri, Sep 30, 2016 at 11:20 AM, Francisco Olarte wrote: > After some messages due to vacuumdb auto-deadlocking itself on the > system tables when doing paralell vacuum of a full database I > suggested adding some flags to make vacuumdb process schemas. I was > asked wether I could write a patch for that and I am thinking on doing > it. What messages are you seeing, exactly? "auto-deadlocking" isn't a thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question / requests.
On 9/30/16, Francisco Olarte wrote: > Hello everyone. Hello, Francisco! > Also, although I feel confident in my coding I have zero knowledge of > developing for postgres, It is easy enough and all important steps are documented in the wiki. Also some interesting things can be found in presentations from hackers about how to hack PostgreSQL. > and I am not confident in my git or testing habilities. > I can develop it, as it is just modifying a single libpq > client program and only in the easy part of the string lists and may > be emitting a new error (as this can introduce a new failure mode of > 'no tables to vacuum'), I can test it locally and produce a patch for > that file, but I'm not confident on integrating it, making git patchs > or going further, so I would like to know if doing that would be > enough and then I can give the code to someone to review or integrate > it. Do your best and send a patch. No one is good enough at understanding all the code base at once. There are lot of people who know different parts of the code and who have ability to test patches in different ways. You can be sure you get a feedback, your code will not be merged to the code base without deep review and independent testing. Just be ready to improve your patch according to a feedback and be ready that usually it takes several rounds of sending-review before patch is committed. Also you can follow a discussion from one of simple patches in a commitfest to be familiar with the process. -- Best regards, Vitaly Burovoy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers