Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 1/23/15 3:26 AM, Abhijit Menon-Sen wrote: > At 2014-12-24 08:10:46 -0500, pete...@gmx.net wrote: >> >> As a demo for how this might look, attached is a wildly incomplete >> patch to produce GNU long-link headers. > > Hi Peter. > > In what way exactly is this patch wildly incomplete? (I ask because it's > been added to the current CF). This patch is not in the commit fest. It's just the most recently posted patch-like attachment in this thread, which confuses the new CP app. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 2/2/15 8:58 AM, Robert Haas wrote: > I think we should commit this, where by "this" I mean your patch to > error-check the length of filenames and symlinks instead of truncating > them. done -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On Fri, Nov 7, 2014 at 9:03 PM, Peter Eisentraut wrote: > On 11/4/14 3:52 PM, Peter Eisentraut wrote: >> Here are patches to address that. First, it reports errors when >> attempting to create a tar header that would truncate file or symlink >> names. Second, it works around the problem in the tests by creating a >> symlink from the short-name tempdir that we had set up for the >> Unix-socket directory case. > > I ended up splitting this up differently. I applied to part of the > second patch that works around the length issue in tablespaces. So the > tests now pass in 9.4 and up even in working directories with long > names. This clears up the regression in 9.4. > > The remaining, not applied patch is attached. It errors when the file > name is too long and adds tests for that. This could be applied to 9.5 > and backpatched, if we so choose. It might become obsolete if > https://commitfest.postgresql.org/action/patch_view?id=1512 is accepted. > If that patch doesn't get accepted, I might add my patch to a future > commit fest. I think we should commit this, where by "this" I mean your patch to error-check the length of filenames and symlinks instead of truncating them. I don't know what will become of Amit's patch, but I think this is a good idea anyway. We should perhaps even consider back-patching it, because silently eating people's data is generally not cool. It's possible that there are people out there who know that their filenames and links are being truncated and don't care, and those people would be unhappy to see this back-patched. However, it's also possible that there are people who don't know that this is happening and do care, and those people would be happy about a back-patch. I don't know which group is larger. At the least, I think we should apply it to master; because whatever we end up doing about Amit's patch, adding error checks for conditions where we're chewing up somebody's filenames and spitting out what's left over has got to be a good thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
At 2014-12-24 08:10:46 -0500, pete...@gmx.net wrote: > > As a demo for how this might look, attached is a wildly incomplete > patch to produce GNU long-link headers. Hi Peter. In what way exactly is this patch wildly incomplete? (I ask because it's been added to the current CF). -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On Tue, Jan 13, 2015 at 4:41 PM, Peter Eisentraut wrote: > I think the key point I'm approaching is that the information should > only ever be in one place, all the time. This is not dissimilar from > why we took the tablespace location out of the system catalogs. Users > might have all kinds of workflows for how they back up, restore, and > move their tablespaces. This works pretty well right now, because the > authoritative configuration information is always in plain view. The > proposal is essentially that we add another location for this > information, because the existing location is incompatible with some > operating system tools. And, when considered by a user, that second > location might or might not collide with or overwrite the first location > at some mysterious times. > > So I think the preferable fix is not to add a second location, but to > make the first location compatible with said operating system tools, > possibly in the way I propose above. I see. I'm a little concerned that following symlinks may be cheaper than whatever system we would come up with for caching the tablespace-name-to-file-name mappings. But that concern might be unfounded, and apart from it I have no reason to oppose your proposal, if you want to do the work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 1/7/15 3:19 PM, Robert Haas wrote: > On Tue, Jan 6, 2015 at 4:33 PM, Peter Eisentraut wrote: >> Currently, when you unpack a tarred basebackup with tablespaces, the >> symlinks will tell you whether you have unpacked the tablespace tars at >> the right place. Otherwise, how do you know? Secondly, you also have >> the option of putting the tablespaces somewhere else by changing the >> symlinks. > > That's a good argument for making the tablespace-map file > human-readable and human-editable, but I don't think it's an argument > for duplicating its contents inaccurately in the filesystem. > >> One way to address this would be to do away with the symlinks altogether >> and have pg_tblspc/12345 be a text file that contains the tablespace >> location. Kind of symlinks implemented in user space. > > Well, that's just spreading the tablespace-map file out into several > files, and maybe keeping it around after we've restored from backup. I think the key point I'm approaching is that the information should only ever be in one place, all the time. This is not dissimilar from why we took the tablespace location out of the system catalogs. Users might have all kinds of workflows for how they back up, restore, and move their tablespaces. This works pretty well right now, because the authoritative configuration information is always in plain view. The proposal is essentially that we add another location for this information, because the existing location is incompatible with some operating system tools. And, when considered by a user, that second location might or might not collide with or overwrite the first location at some mysterious times. So I think the preferable fix is not to add a second location, but to make the first location compatible with said operating system tools, possibly in the way I propose above. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On Tue, Jan 6, 2015 at 4:33 PM, Peter Eisentraut wrote: > Currently, when you unpack a tarred basebackup with tablespaces, the > symlinks will tell you whether you have unpacked the tablespace tars at > the right place. Otherwise, how do you know? Secondly, you also have > the option of putting the tablespaces somewhere else by changing the > symlinks. That's a good argument for making the tablespace-map file human-readable and human-editable, but I don't think it's an argument for duplicating its contents inaccurately in the filesystem. > One way to address this would be to do away with the symlinks altogether > and have pg_tblspc/12345 be a text file that contains the tablespace > location. Kind of symlinks implemented in user space. Well, that's just spreading the tablespace-map file out into several files, and maybe keeping it around after we've restored from backup. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On Wed, Jan 7, 2015 at 3:03 AM, Peter Eisentraut wrote: > > On 12/27/14 8:02 PM, Robert Haas wrote: > > On Wed, Dec 24, 2014 at 8:12 AM, Peter Eisentraut wrote: > >> On 12/22/14 10:00 PM, Amit Kapila wrote: > >>> There is already a patch [1] in this CF which will handle both cases, so > >>> I am > >>> not sure if it is very good idea to go with a new tar format to handle this > >>> issue. > >> > >> I think it would still make sense to have proper symlinks in the > >> basebackup if possible, for clarity. > > > > I guess I would have assumed it would be more clear to omit the > > symlinks if we're expecting the server to put them in. Otherwise, the > > server has to remove the existing symlinks and create new ones, which > > introduces various possibilities for failure and confusion. > > Currently, when you unpack a tarred basebackup with tablespaces, the > symlinks will tell you whether you have unpacked the tablespace tars at > the right place. Otherwise, how do you know? via some kind of tablespace map file which will tell us the exact location where symlink need to be pointed and the same will be used to create a symlink. So after you unpack a tarred basebackup with tablespaces, there will be no symlinks; when you start the server (archive recovery) using base backup, it will create the appropriate symlinks. > Secondly, you also have > the option of putting the tablespaces somewhere else by changing the > symlinks. Under the new scheme, the existing symlinks would be > overwritten (or not?). If that is actually correct, then the proposed > fix doesn't really replicate the required functionality on Windows. > > One way to address this would be to do away with the symlinks altogether > and have pg_tblspc/12345 be a text file that contains the tablespace > location. Kind of symlinks implemented in user space. > I think this is somewhat similar to what existing patch [1] does with the different that there is just one file for all the tablespace locations rather than individual file in each tablespace directory. [1] : https://commitfest.postgresql.org/action/patch_view?id=1512 With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 12/27/14 8:02 PM, Robert Haas wrote: > On Wed, Dec 24, 2014 at 8:12 AM, Peter Eisentraut wrote: >> On 12/22/14 10:00 PM, Amit Kapila wrote: >>> There is already a patch [1] in this CF which will handle both cases, so >>> I am >>> not sure if it is very good idea to go with a new tar format to handle this >>> issue. >> >> I think it would still make sense to have proper symlinks in the >> basebackup if possible, for clarity. > > I guess I would have assumed it would be more clear to omit the > symlinks if we're expecting the server to put them in. Otherwise, the > server has to remove the existing symlinks and create new ones, which > introduces various possibilities for failure and confusion. Currently, when you unpack a tarred basebackup with tablespaces, the symlinks will tell you whether you have unpacked the tablespace tars at the right place. Otherwise, how do you know? Secondly, you also have the option of putting the tablespaces somewhere else by changing the symlinks. Under the new scheme, the existing symlinks would be overwritten (or not?). If that is actually correct, then the proposed fix doesn't really replicate the required functionality on Windows. One way to address this would be to do away with the symlinks altogether and have pg_tblspc/12345 be a text file that contains the tablespace location. Kind of symlinks implemented in user space. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On Wed, Dec 24, 2014 at 8:12 AM, Peter Eisentraut wrote: > On 12/22/14 10:00 PM, Amit Kapila wrote: >> There is already a patch [1] in this CF which will handle both cases, so >> I am >> not sure if it is very good idea to go with a new tar format to handle this >> issue. > > I think it would still make sense to have proper symlinks in the > basebackup if possible, for clarity. I guess I would have assumed it would be more clear to omit the symlinks if we're expecting the server to put them in. Otherwise, the server has to remove the existing symlinks and create new ones, which introduces various possibilities for failure and confusion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 12/22/14 10:00 PM, Amit Kapila wrote: > There is already a patch [1] in this CF which will handle both cases, so > I am > not sure if it is very good idea to go with a new tar format to handle this > issue. I think it would still make sense to have proper symlinks in the basebackup if possible, for clarity. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 12/22/14 5:40 PM, Oskari Saarenmaa wrote: > I think we should just use the UStar tar format > (http://en.wikipedia.org/wiki/Tar_%28computing%29#UStar_format) and > allow long file names; all actively used tar implementations should be > able to handle them. I'll try to write a patch for that soonish. UStar doesn't handle long link targets, only long file names (and then only up to 255 characters, which doesn't seem satisfactory). AFAICT, to allow long link targets, the available solutions are either pax extended headers or GNU-specific long-link extra headers. When I create a symlink with a long target and call tar on it, GNU tar by default creates the GNU long-link header and BSD tar by default creates a pax header. But they are both able to extract either one. As a demo for how this might look, attached is a wildly incomplete patch to produce GNU long-link headers. diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index fbcecbb..f0cffcf 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1233,13 +1233,15 @@ static void _tarWriteHeader(const char *filename, const char *linktarget, struct stat * statbuf) { - char h[512]; + char *h; - tarCreateHeader(h, filename, linktarget, statbuf->st_size, + h = tarCreateHeader(filename, linktarget, statbuf->st_size, statbuf->st_mode, statbuf->st_uid, statbuf->st_gid, statbuf->st_mtime); pq_putmessage('d', h, 512); + + free(h); } /* diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index e7c2939..490dd98 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -911,10 +911,10 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum) if (basetablespace && writerecoveryconf) { -char header[512]; +char *header; int padding; -tarCreateHeader(header, "recovery.conf", NULL, +header = tarCreateHeader("recovery.conf", NULL, recoveryconfcontents->len, 0600, 04000, 02000, time(NULL)); @@ -925,6 +925,8 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum) WRITE_TAR_DATA(recoveryconfcontents->data, recoveryconfcontents->len); if (padding) WRITE_TAR_DATA(zerobuf, padding); + +free(header); } /* 2 * 512 bytes empty data at end of file */ diff --git a/src/bin/pg_dump/pg_backup_tar.c b/src/bin/pg_dump/pg_backup_tar.c index f48d369..60ae013 100644 --- a/src/bin/pg_dump/pg_backup_tar.c +++ b/src/bin/pg_dump/pg_backup_tar.c @@ -1300,11 +1300,13 @@ _tarGetHeader(ArchiveHandle *AH, TAR_MEMBER *th) static void _tarWriteHeader(TAR_MEMBER *th) { - char h[512]; + char *h; - tarCreateHeader(h, th->targetFile, NULL, th->fileLen, 0600, 04000, 02000, time(NULL)); + h = tarCreateHeader(th->targetFile, NULL, th->fileLen, 0600, 04000, 02000, time(NULL)); /* Now write the completed header. */ if (fwrite(h, 1, 512, th->tarFH) != 512) WRITE_ERROR_EXIT; + + free(h); } diff --git a/src/include/pgtar.h b/src/include/pgtar.h index 2dd6f6b..c3e0eb5 100644 --- a/src/include/pgtar.h +++ b/src/include/pgtar.h @@ -11,5 +11,5 @@ * *- */ -extern void tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime); +extern char * tarCreateHeader(const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime); extern int tarChecksum(char *header); diff --git a/src/port/tar.c b/src/port/tar.c index 8ef4f9c..d8183d7 100644 --- a/src/port/tar.c +++ b/src/port/tar.c @@ -49,10 +49,26 @@ tarChecksum(char *header) * must always have space for 512 characters, which is a requirement by * the tar format. */ -void -tarCreateHeader(char *h, const char *filename, const char *linktarget, +char * +tarCreateHeader(const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime) { + char *h; + char *ext_header = NULL; + char *ext_content = NULL; + + if (0 && linktarget && strlen(linktarget) > 99) + { + ext_header = calloc(512, 1); + print_val(&ext_header[124], 512, 8, 11); + sprintf(&ext_header[156], "K"); + sprintf(&ext_header[148], "%06o ", tarChecksum(ext_header)); + + ext_content = calloc(512, 1); + strcpy(ext_content, linktarget); + } + h = malloc(512); + /* * Note: most of the fields in a tar header are not supposed to be * null-terminated. We use sprintf, which will write a null after the @@ -141,4 +157,18 @@ tarCreateHeader(char *h, const char *filename, const char *linktarget, * 6 digits, a space, and a null, which is legal per POSIX. */ sprintf(&h[148], "%06o ", tarChecksum(h)); + + if (ext_header || ext_content) + { + char *ret = malloc(3 * 512); + memcpy(ret, ext_header, 512); + memcpy(ret + 512, ext_content, 512); + memcpy(ret + 1024, h
Re: [HACKERS] pg_basebackup fails with long tablespace paths
23.12.2014, 05:00, Amit Kapila kirjoitti: > On Tue, Dec 23, 2014 at 4:10 AM, Oskari Saarenmaa wrote: >> 08.11.2014, 04:03, Peter Eisentraut kirjoitti: >> > It errors when the file >> > name is too long and adds tests for that. This could be applied to 9.5 >> > and backpatched, if we so choose. It might become obsolete if >> > https://commitfest.postgresql.org/action/patch_view?id=1512 is accepted. >> > If that patch doesn't get accepted, I might add my patch to a future >> > commit fest. >> >> I think we should just use the UStar tar format >> (http://en.wikipedia.org/wiki/Tar_%28computing%29#UStar_format) and >> allow long file names; all actively used tar implementations should be >> able to handle them. I'll try to write a patch for that soonish. >> > > I think even using UStar format won't make it work for Windows where > the standard utilities are not able to understand the symlinks in tar. > There is already a patch [1] in this CF which will handle both cases, so > I am > not sure if it is very good idea to go with a new tar format to handle this > issue. > > [1] : https://commitfest.postgresql.org/action/patch_view?id=1512 That patch makes sense for 9.5, but I don't think it's going to be backpatched to previous releases? I think we should also apply Peter's patch to master and backbranches to avoid creating invalid tar files anywhere. And optionally implement and backpatch long filename support in tar even if 9.5 no longer creates tar files with long names. / Oskari -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On Tue, Dec 23, 2014 at 4:10 AM, Oskari Saarenmaa wrote: > > 08.11.2014, 04:03, Peter Eisentraut kirjoitti: > > On 11/4/14 3:52 PM, Peter Eisentraut wrote: > >> > Here are patches to address that. First, it reports errors when > >> > attempting to create a tar header that would truncate file or symlink > >> > names. Second, it works around the problem in the tests by creating a > >> > symlink from the short-name tempdir that we had set up for the > >> > Unix-socket directory case. > > I ended up splitting this up differently. I applied to part of the > > second patch that works around the length issue in tablespaces. So the > > tests now pass in 9.4 and up even in working directories with long > > names. This clears up the regression in 9.4. > > > > The remaining, not applied patch is attached. It errors when the file > > name is too long and adds tests for that. This could be applied to 9.5 > > and backpatched, if we so choose. It might become obsolete if > > https://commitfest.postgresql.org/action/patch_view?id=1512 is accepted. > > If that patch doesn't get accepted, I might add my patch to a future > > commit fest. > > I think we should just use the UStar tar format > (http://en.wikipedia.org/wiki/Tar_%28computing%29#UStar_format) and > allow long file names; all actively used tar implementations should be > able to handle them. I'll try to write a patch for that soonish. > I think even using UStar format won't make it work for Windows where the standard utilities are not able to understand the symlinks in tar. There is already a patch [1] in this CF which will handle both cases, so I am not sure if it is very good idea to go with a new tar format to handle this issue. [1] : https://commitfest.postgresql.org/action/patch_view?id=1512 With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pg_basebackup fails with long tablespace paths
08.11.2014, 04:03, Peter Eisentraut kirjoitti: > On 11/4/14 3:52 PM, Peter Eisentraut wrote: >> > Here are patches to address that. First, it reports errors when >> > attempting to create a tar header that would truncate file or symlink >> > names. Second, it works around the problem in the tests by creating a >> > symlink from the short-name tempdir that we had set up for the >> > Unix-socket directory case. > I ended up splitting this up differently. I applied to part of the > second patch that works around the length issue in tablespaces. So the > tests now pass in 9.4 and up even in working directories with long > names. This clears up the regression in 9.4. > > The remaining, not applied patch is attached. It errors when the file > name is too long and adds tests for that. This could be applied to 9.5 > and backpatched, if we so choose. It might become obsolete if > https://commitfest.postgresql.org/action/patch_view?id=1512 is accepted. > If that patch doesn't get accepted, I might add my patch to a future > commit fest. I think we should just use the UStar tar format (http://en.wikipedia.org/wiki/Tar_%28computing%29#UStar_format) and allow long file names; all actively used tar implementations should be able to handle them. I'll try to write a patch for that soonish. Until UStar format is used we should raise an error if a filename is being truncated by tar instead of creating invalid archives. Also note that Posix tar format allows 100 byte file names as the name doesn't have to be zero terminated, but we may want to stick to 99 bytes in old type tar anyway as using 100 byte filenames has shown bugs in other tar implementations, for example https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=689582 - and truncating at 100 bytes instead of 99 doesn't help us too much anyway. / Oskari -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 11/4/14 3:52 PM, Peter Eisentraut wrote: > Here are patches to address that. First, it reports errors when > attempting to create a tar header that would truncate file or symlink > names. Second, it works around the problem in the tests by creating a > symlink from the short-name tempdir that we had set up for the > Unix-socket directory case. I ended up splitting this up differently. I applied to part of the second patch that works around the length issue in tablespaces. So the tests now pass in 9.4 and up even in working directories with long names. This clears up the regression in 9.4. The remaining, not applied patch is attached. It errors when the file name is too long and adds tests for that. This could be applied to 9.5 and backpatched, if we so choose. It might become obsolete if https://commitfest.postgresql.org/action/patch_view?id=1512 is accepted. If that patch doesn't get accepted, I might add my patch to a future commit fest. From bd15c32714c9292a93763c7e74508c3643a7b6cd Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 4 Nov 2014 15:19:18 -0500 Subject: [PATCH] Error when creating names too long for tar format The tar format (at least the version we are using), does not support file names or symlink targets longer than 99 bytes. Until now, the tar creation code would silently truncate any names that are too long. (Its original application was pg_dump, where this never happens.) This creates problems when running base backups over the replication protocol. The most important problem is when a tablespace path is longer than 99 bytes, which will result in a truncated tablespace path being backed up. Less importantly, the basebackup protocol also promises to back up any other files it happens to find in the data directory, which would also lead to file name truncation if someone put a file with a long name in there. Now both of these cases result in an error during the backup. Add tests that fail when a too-long file name or symlink is attempted to be backed up. --- src/backend/replication/basebackup.c | 21 - src/bin/pg_basebackup/t/010_pg_basebackup.pl | 15 ++- src/include/pgtar.h | 10 +- src/port/tar.c | 10 +- 4 files changed, 52 insertions(+), 4 deletions(-) diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index fbcecbb..5835725 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1234,11 +1234,30 @@ _tarWriteHeader(const char *filename, const char *linktarget, struct stat * statbuf) { char h[512]; + enum tarError rc; - tarCreateHeader(h, filename, linktarget, statbuf->st_size, + rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size, statbuf->st_mode, statbuf->st_uid, statbuf->st_gid, statbuf->st_mtime); + switch (rc) + { + case TAR_OK: + break; + case TAR_NAME_TOO_LONG: + ereport(ERROR, + (errmsg("file name too long for tar format: \"%s\"", + filename))); + break; + case TAR_SYMLINK_TOO_LONG: + ereport(ERROR, + (errmsg("symbolic link target too long for tar format: file name \"%s\", target \"%s\"", + filename, linktarget))); + break; + default: + elog(ERROR, "unrecognized tar error: %d", rc); + } + pq_putmessage('d', h, 512); } diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index c966de0..7e9a776 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -2,7 +2,7 @@ use warnings; use Cwd; use TestLib; -use Test::More tests => 33; +use Test::More tests => 35; program_help_ok('pg_basebackup'); program_version_ok('pg_basebackup'); @@ -49,6 +49,13 @@ 'tar format'); ok(-f "$tempdir/tarbackup/base.tar", 'backup tar was created'); +my $superlongname = "superlongname_" . ("x"x100); + +system_or_bail 'touch', "$tempdir/pgdata/$superlongname"; +command_fails([ 'pg_basebackup', '-D', "$tempdir/tarbackup_l1", '-Ft' ], + 'pg_basebackup tar with long name fails'); +unlink "$tempdir/pgdata/$superlongname"; + # Create a temporary directory in the system location and symlink it # to our physical temp location. That way we can use shorter names # for the tablespace directories, which hopefully won't run afoul of @@ -117,3 +124,9 @@ command_fails( [ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo" ], '-T with invalid format fails'); + +mkdir "$tempdir/$superlongname"; +psql 'postgres', "CREATE TABLESPACE tblspc3 LOCATION '$tempdir/$superlongname';"; +command_fails([ 'pg_basebackup', '-D', "$tempdir/tarbackup_l3", '-Ft' ], + 'pg_basebackup tar with long symlink target fails'); +psql 'postgres', "DROP TABLESPACE tblspc3;"; diff --git a/src/include/pgtar.h b/src/include/pgtar.h index 2dd6f6b..6b6c1e1 100644 --- a/src/include/pgtar.h +++ b/src/include/pgtar.h @@
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 10/20/14 4:51 PM, Peter Eisentraut wrote: > On 10/20/14 2:59 PM, Tom Lane wrote: >> What do we want to do about this? I think a minimum expectation would be >> for pg_basebackup to notice and complain when it's trying to create an >> unworkably long symlink entry, but it would be far better if we found a >> way to cope instead. > > Isn't it the backend that should error out before sending truncated > files names? > > src/port/tar.c: > > /* Name 100 */ > sprintf(&h[0], "%.99s", filename); Here are patches to address that. First, it reports errors when attempting to create a tar header that would truncate file or symlink names. Second, it works around the problem in the tests by creating a symlink from the short-name tempdir that we had set up for the Unix-socket directory case. The first patch can be backpatched to 9.3. The tar code before that is different and would need manual adjustments. If someone has a too-long tablespace path, I think they can work around that after this patch by creating a shorter symlink and updating the pg_tblspc symlinks to point there. From 16d5eb9514c89391e6c2361212363a7ac2f16e0b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 4 Nov 2014 15:19:18 -0500 Subject: [PATCH 1/2] Error when creating names too long for tar format The tar format (at least the version we are using), does not support file names or symlink targets longer than 99 bytes. Until now, the tar creation code would silently truncate any names that are too long. (Its original application was pg_dump, where this never happens.) This creates problems when running base backups over the replication protocol. The most important problem is when a tablespace path is longer than 99 bytes, which will result in a truncated tablespace path being backed up. Less importantly, the basebackup protocol also promises to back up any other files it happens to find in the data directory, which would also lead to file name truncation if someone put a file with a long name in there. Now both of these cases result in an error during the backup. --- src/backend/replication/basebackup.c | 21 - src/include/pgtar.h | 10 +- src/port/tar.c | 10 +- 3 files changed, 38 insertions(+), 3 deletions(-) diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index fbcecbb..5835725 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1234,11 +1234,30 @@ _tarWriteHeader(const char *filename, const char *linktarget, struct stat * statbuf) { char h[512]; + enum tarError rc; - tarCreateHeader(h, filename, linktarget, statbuf->st_size, + rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size, statbuf->st_mode, statbuf->st_uid, statbuf->st_gid, statbuf->st_mtime); + switch (rc) + { + case TAR_OK: + break; + case TAR_NAME_TOO_LONG: + ereport(ERROR, + (errmsg("file name too long for tar format: \"%s\"", + filename))); + break; + case TAR_SYMLINK_TOO_LONG: + ereport(ERROR, + (errmsg("symbolic link target too long for tar format: file name \"%s\", target \"%s\"", + filename, linktarget))); + break; + default: + elog(ERROR, "unrecognized tar error: %d", rc); + } + pq_putmessage('d', h, 512); } diff --git a/src/include/pgtar.h b/src/include/pgtar.h index 2dd6f6b..6b6c1e1 100644 --- a/src/include/pgtar.h +++ b/src/include/pgtar.h @@ -11,5 +11,13 @@ * *- */ -extern void tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime); + +enum tarError +{ + TAR_OK = 0, + TAR_NAME_TOO_LONG, + TAR_SYMLINK_TOO_LONG +}; + +extern enum tarError tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime); extern int tarChecksum(char *header); diff --git a/src/port/tar.c b/src/port/tar.c index 09fd6c1..74168e8 100644 --- a/src/port/tar.c +++ b/src/port/tar.c @@ -49,10 +49,16 @@ tarChecksum(char *header) * must always have space for 512 characters, which is a requirement by * the tar format. */ -void +enum tarError tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime) { + if (strlen(filename) > 99) + return TAR_NAME_TOO_LONG; + + if (linktarget && strlen(linktarget) > 99) + return TAR_SYMLINK_TOO_LONG; + /* * Note: most of the fields in a tar header are not supposed to be * null-terminated. We use sprintf, which will write a null after the @@ -141,4 +147,6 @@ tarCreateHeader(char *h, const char *filename, const char *linktarget, * 6 digits, a space, and a null, which is legal per POSIX. */ sprintf(&h[148], "%06o ", tarChecksum(h)); + + return TAR_OK; } -- 2.1.3 From 25f6daded3a4be8
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 10/29/14 10:48 AM, Robert Haas wrote: > On Tue, Oct 28, 2014 at 8:29 PM, Peter Eisentraut wrote: >> On 10/20/14 2:59 PM, Tom Lane wrote: >>> My Salesforce colleague Thomas Fanghaenel observed that the TAP tests >>> for pg_basebackup fail when run in a sufficiently deeply-nested directory >>> tree. >> >> As for the test, we can do something like the attached to mark the test >> as "TODO". > > What does this actually do? It doesn't appear that it's just > disabling the test. It still runs the tests, but doesn't count the results in whether the suite passes. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On Tue, Oct 28, 2014 at 8:29 PM, Peter Eisentraut wrote: > On 10/20/14 2:59 PM, Tom Lane wrote: >> My Salesforce colleague Thomas Fanghaenel observed that the TAP tests >> for pg_basebackup fail when run in a sufficiently deeply-nested directory >> tree. > > As for the test, we can do something like the attached to mark the test > as "TODO". What does this actually do? It doesn't appear that it's just disabling the test. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 10/20/14 2:59 PM, Tom Lane wrote: > My Salesforce colleague Thomas Fanghaenel observed that the TAP tests > for pg_basebackup fail when run in a sufficiently deeply-nested directory > tree. As for the test, we can do something like the attached to mark the test as "TODO". diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index 597fb60..695fd98 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -68,6 +68,8 @@ "-T$tempdir/tblspc1=$tempdir/tbackup/tblspc1" ], 'plain format with tablespaces succeeds with tablespace mapping'); ok(-d "$tempdir/tbackup/tblspc1", 'tablespace was relocated'); +TODO: { + local $TODO = 'symlinks >99 chars not supported'; opendir(my $dh, "$tempdir/pgdata/pg_tblspc") or die; ok( ( grep { @@ -77,6 +79,7 @@ } readdir($dh)), "tablespace symlink was updated"); closedir $dh; +} mkdir "$tempdir/tbl=spc2"; psql 'postgres', "DROP TABLE test1;"; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On Tue, Oct 21, 2014 at 12:29 AM, Tom Lane wrote: > > My Salesforce colleague Thomas Fanghaenel observed that the TAP tests > for pg_basebackup fail when run in a sufficiently deeply-nested directory > tree. The cause appears to be that we rely on standard "tar" format > to represent the symlink for a tablespace, and POSIX tar format has a > hard-wired restriction of 99 bytes in a symlink's expansion. > > What do we want to do about this? I think a minimum expectation would be > for pg_basebackup to notice and complain when it's trying to create an > unworkably long symlink entry, but it would be far better if we found a > way to cope instead. One way to cope with such a situation could be that during backup we create a backup symlink file which contains listing of symlinks and then archive recovery recreates it. Basically this is the solution (patch), I have proposed for Windows [1]. [1] - https://commitfest.postgresql.org/action/patch_view?id=1512 With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 10/20/14 2:59 PM, Tom Lane wrote: > What do we want to do about this? I think a minimum expectation would be > for pg_basebackup to notice and complain when it's trying to create an > unworkably long symlink entry, but it would be far better if we found a > way to cope instead. Isn't it the backend that should error out before sending truncated files names? src/port/tar.c: /* Name 100 */ sprintf(&h[0], "%.99s", filename); And then do we need to prevent the creation of tablespaces that can't be backed up? > One thing we could possibly do without reinventing "tar" is to avoid > using > absolute path names if a PGDATA-relative one would do. Maybe we could hack up the tar format to store the symlink target as the file body, like cpio does. Of course then we'd lose the property of this actually being tar. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_basebackup fails with long tablespace paths
My Salesforce colleague Thomas Fanghaenel observed that the TAP tests for pg_basebackup fail when run in a sufficiently deeply-nested directory tree. The cause appears to be that we rely on standard "tar" format to represent the symlink for a tablespace, and POSIX tar format has a hard-wired restriction of 99 bytes in a symlink's expansion. What do we want to do about this? I think a minimum expectation would be for pg_basebackup to notice and complain when it's trying to create an unworkably long symlink entry, but it would be far better if we found a way to cope instead. One thing we could possibly do without reinventing "tar" is to avoid using absolute path names if a PGDATA-relative one would do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers