Re: Can we avoid chdir'ing in resolve_symlinks() ?

2023-03-23 Thread Tom Lane
I wrote:
> I think that changing pg_config's behavior would be enough to resolve
> the complaints you listed, but perhaps I'm missing some fine points.

Meanwhile I've gone ahead and pushed my v1 patch (plus Munro's
recommendation about _fullpath error handling), so we can see if
the buildfarm blows up.  The question of whether we can sometimes
skip replacement of symlinks seems like material for a second patch
in any case.

regards, tom lane




Re: Can we avoid chdir'ing in resolve_symlinks() ?

2023-03-22 Thread Tom Lane
Peter Eisentraut  writes:
> On 05.10.22 15:59, Tom Lane wrote:
>> What did you think of the compromise proposal to change only
>> the paths that pg_config outputs?  I've not tried to code that,
>> but I think it should be feasible.

> I don't think I understand what this proposal actually means.  What 
> would be the behavior of pg_config and how would it be different from 
> before?

What I had in mind was:

* server and most frontend programs keep the same behavior, that is
fully resolve their executable's path to an absolute path (and then
navigate to the rest of the installation from there); but now they'll
use realpath() to avoid chdir'ing while they do that.

* pg_config applies realpath() if its initial PATH search produces a
relative path to the executable, or if the last component of that path
is a symlink.  Otherwise leave it alone, which would have the effect of
not expanding directory-level symlinks.

I think that changing pg_config's behavior would be enough to resolve
the complaints you listed, but perhaps I'm missing some fine points.

regards, tom lane




Re: Can we avoid chdir'ing in resolve_symlinks() ?

2023-03-22 Thread Peter Eisentraut

On 05.10.22 15:59, Tom Lane wrote:

What did you think of the compromise proposal to change only
the paths that pg_config outputs?  I've not tried to code that,
but I think it should be feasible.


I don't think I understand what this proposal actually means.  What 
would be the behavior of pg_config and how would it be different from 
before?






Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-10-05 Thread Tom Lane
Peter Eisentraut  writes:
> To clarify, this instance is not at all the fault of any code in 
> PostgreSQL.  But it's another instance where resolving symlinks just 
> because we can causing problems.

[ shrug... ]  *Not* resolving symlinks when we can causes its
own set of problems, which maybe we don't see very clearly
because we have been doing it like that for a couple of decades.
I remain pretty hesitant to change this behavior.

What did you think of the compromise proposal to change only
the paths that pg_config outputs?  I've not tried to code that,
but I think it should be feasible.

regards, tom lane




Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-10-05 Thread Peter Eisentraut

On 15.09.22 16:43, Tom Lane wrote:

That seems ... a tad far-fetched, and even more to the point,
it'd be the other package's fault not ours.  We have never promised
that those directories point to anyplace that's not PG-specific.
I certainly do not buy that that's a good argument for breaking
Postgres installation setups that work today.

Also, there is nothing in that scenario that is in any way dependent
on the use of symlinks, or even absolute paths, so I don't quite
see the relevance to the current discussion.


Here is another variant of the same problem:

I have

$ which meson
/usr/local/bin/meson

Meson records its own path (somewhere under meson-info/ AFAICT), so it 
can re-run itself when any of the meson.build files change.  But since 
the above is a symlink, it records its own location as 
"/usr/local/Cellar/meson/0.63.1/bin/meson".  So now, whenever the meson 
package updates (even if it's just 0.63.0 -> 0.63.1), my build tree is 
broken.


To clarify, this instance is not at all the fault of any code in 
PostgreSQL.  But it's another instance where resolving symlinks just 
because we can causing problems.






Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-10-04 Thread Tom Lane
Thomas Munro  writes:
> I tried lots of crazy stuff[1] to try to get an error out of this
> thing, but came up empty handed.  Unlike realpath(), _fullpath()
> doesn't resolve symlinks (or junctions), so I guess there's less to go
> wrong.  It still needs the present working directory, which is a
> per-drive concept on this OS, but even bogus drives don't seem to
> produce an error (despite what the manual says).

Interesting.

> I'd still lean towards assuming errno is set, given that the manual
> references errno and not GetLastError().

Agreed.  In the attached, I drop the _dosmaperr() step and instead
just do "errno = 0" before the call.  That way, if we ever do manage
to hit a _fullpath() failure, we can at least tell whether the errno
that's reported is real or not.

In this version I've attempted to resolve Peter's complaint by only
applying realpath() when the executable path we've obtained is relative
or has a symlink as the last component.  Things will definitely not
work right if either of those is true and we make no effort to get
a more trustworthy path.  I concede that things will usually work okay
without resolving a symlink that's two or more levels up the path,
but I wonder how much we want to trust that.  Suppose somebody changes
such a symlink while the server is running --- nothing very good is
likely to happen if it suddenly starts consulting some other libdir
or sharedir.  Maybe we need to add a flag telling whether we want
this behavior?  TBH I think that pg_config is the only place I'd
be comfortable with doing it like this.  Peter, would your concerns
be satisfied if we just made pg_config do it?

regards, tom lane

diff --git a/src/common/exec.c b/src/common/exec.c
index 22f04aafbe..06a8601b85 100644
--- a/src/common/exec.c
+++ b/src/common/exec.c
@@ -14,6 +14,15 @@
  *-
  */
 
+/*
+ * On macOS, "man realpath" avers:
+ *Defining _DARWIN_C_SOURCE or _DARWIN_BETTER_REALPATH before including
+ *stdlib.h will cause the provided implementation of realpath() to use
+ *F_GETPATH from fcntl(2) to discover the path.
+ * This should be harmless everywhere else.
+ */
+#define _DARWIN_BETTER_REALPATH
+
 #ifndef FRONTEND
 #include "postgres.h"
 #else
@@ -58,11 +67,8 @@ extern int	_CRT_glob = 0;		/* 0 turns off globbing; 1 turns it on */
 	(fprintf(stderr, __VA_ARGS__), fputc('\n', stderr))
 #endif
 
-#ifdef _MSC_VER
-#define getcwd(cwd,len)  GetCurrentDirectory(len, cwd)
-#endif
-
-static int	resolve_symlinks(char *path);
+static int	normalize_exec_path(char *path);
+static char *pg_realpath(const char *fname);
 
 #ifdef WIN32
 static BOOL GetTokenUser(HANDLE hToken, PTOKEN_USER *ppTokenUser);
@@ -87,7 +93,7 @@ validate_exec(const char *path)
 	char		path_exe[MAXPGPATH + sizeof(".exe") - 1];
 
 	/* Win32 requires a .exe suffix for stat() */
-	if (strlen(path) >= strlen(".exe") &&
+	if (strlen(path) < strlen(".exe") ||
 		pg_strcasecmp(path + strlen(path) - strlen(".exe"), ".exe") != 0)
 	{
 		strlcpy(path_exe, path, sizeof(path_exe) - 4);
@@ -151,30 +157,16 @@ validate_exec(const char *path)
 int
 find_my_exec(const char *argv0, char *retpath)
 {
-	char		cwd[MAXPGPATH],
-test_path[MAXPGPATH];
 	char	   *path;
 
-	if (!getcwd(cwd, MAXPGPATH))
-	{
-		log_error(errcode_for_file_access(),
-  _("could not identify current directory: %m"));
-		return -1;
-	}
-
 	/*
 	 * If argv0 contains a separator, then PATH wasn't used.
 	 */
-	if (first_dir_separator(argv0) != NULL)
+	strlcpy(retpath, argv0, MAXPGPATH);
+	if (first_dir_separator(retpath) != NULL)
 	{
-		if (is_absolute_path(argv0))
-			strlcpy(retpath, argv0, MAXPGPATH);
-		else
-			join_path_components(retpath, cwd, argv0);
-		canonicalize_path(retpath);
-
 		if (validate_exec(retpath) == 0)
-			return resolve_symlinks(retpath);
+			return normalize_exec_path(retpath);
 
 		log_error(errcode(ERRCODE_WRONG_OBJECT_TYPE),
   _("invalid binary \"%s\": %m"), retpath);
@@ -183,9 +175,8 @@ find_my_exec(const char *argv0, char *retpath)
 
 #ifdef WIN32
 	/* Win32 checks the current directory first for names without slashes */
-	join_path_components(retpath, cwd, argv0);
 	if (validate_exec(retpath) == 0)
-		return resolve_symlinks(retpath);
+		return normalize_exec_path(retpath);
 #endif
 
 	/*
@@ -208,21 +199,15 @@ find_my_exec(const char *argv0, char *retpath)
 			if (!endp)
 endp = startp + strlen(startp); /* point to end */
 
-			strlcpy(test_path, startp, Min(endp - startp + 1, MAXPGPATH));
+			strlcpy(retpath, startp, Min(endp - startp + 1, MAXPGPATH));
 
-			if (is_absolute_path(test_path))
-join_path_components(retpath, test_path, argv0);
-			else
-			{
-join_path_components(retpath, cwd, test_path);
-join_path_components(retpath, retpath, argv0);
-			}
+			join_path_components(retpath, retpath, argv0);
 			canonicalize_path(retpath);
 
 			switch (validate_exec(retpath))
 			{
 case 0:			/* found ok */
-			

Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-26 Thread Thomas Munro
On Sun, Sep 4, 2022 at 2:42 PM Tom Lane  wrote:
> Here's a draft patch for this.  It seems to work on Linux,
> but the Windows code is just speculation.  In particular,
> I did
>
> path = _fullpath(NULL, fname, 0);
> if (path == NULL)
> _dosmaperr(GetLastError());
>
> but I'm not really sure that the _dosmaperr bit is needed,
> because the _fullpath man page I found makes reference to
> setting "errno" [1].  It's likely to be hard to test, because
> most of the possible error cases should be nigh unreachable
> in our usage; we already know the input is a valid reference
> to an executable file.

I tried lots of crazy stuff[1] to try to get an error out of this
thing, but came up empty handed.  Unlike realpath(), _fullpath()
doesn't resolve symlinks (or junctions), so I guess there's less to go
wrong.  It still needs the present working directory, which is a
per-drive concept on this OS, but even bogus drives don't seem to
produce an error (despite what the manual says).

I'd still lean towards assuming errno is set, given that the manual
references errno and not GetLastError().  Typical manual pages
explicitly tell you when GetLastError() has the error (example:
GetFullPathName(), for which this might be intended as a more Unix-y
wrapper, but even if so there's nothing to say that _fullpath() can't
set errno directly itself, in which case you might clobber it that
way).

[1] https://cirrus-ci.com/task/4935917730267136?logs=main




Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-15 Thread Tom Lane
Peter Eisentraut  writes:
> On 13.09.22 17:16, Tom Lane wrote:
>> What about it does not work?

> The problem is if another package or extension uses pg_config to find, 
> say, libdir, includedir, or bindir and integrates it into its own build 
> system or its own build products.  If those directories point to 
> /usr/local/pgsql/{bin,include,lib}, then there is no problem.  But if 
> they point to /usr/local/pgsql-14.5/{bin,include,lib}, then the next 
> minor update will break those other packages.

That seems ... a tad far-fetched, and even more to the point,
it'd be the other package's fault not ours.  We have never promised
that those directories point to anyplace that's not PG-specific.
I certainly do not buy that that's a good argument for breaking
Postgres installation setups that work today.

Also, there is nothing in that scenario that is in any way dependent
on the use of symlinks, or even absolute paths, so I don't quite
see the relevance to the current discussion.

regards, tom lane




Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-15 Thread Peter Eisentraut

On 13.09.22 17:16, Tom Lane wrote:

Peter Eisentraut  writes:

2) You configure and install with prefix=/usr/local/pgsql-14, and then
symlink /usr/local/pgsql -> /usr/local/pgsql-14; hoping that you can
then use /usr/local/pgsql as if that's where it actually is.  We don't
currently support that.  (Note that it would work if you made a copy of
the tree instead of using the symlink.)


What about it does not work?


The problem is if another package or extension uses pg_config to find, 
say, libdir, includedir, or bindir and integrates it into its own build 
system or its own build products.  If those directories point to 
/usr/local/pgsql/{bin,include,lib}, then there is no problem.  But if 
they point to /usr/local/pgsql-14.5/{bin,include,lib}, then the next 
minor update will break those other packages.






Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-13 Thread Tom Lane
Peter Eisentraut  writes:
> 2) You configure and install with prefix=/usr/local/pgsql-14, and then 
> symlink /usr/local/pgsql -> /usr/local/pgsql-14; hoping that you can 
> then use /usr/local/pgsql as if that's where it actually is.  We don't 
> currently support that.  (Note that it would work if you made a copy of 
> the tree instead of using the symlink.)

What about it does not work?

regards, tom lane




Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-13 Thread Peter Eisentraut

On 13.09.22 01:26, Tom Lane wrote:

Andrew Dunstan  writes:

I think the discussion here is a bit tangential to the original topic.


Indeed, because I just wanted to reimplement *how* we resolve the
executable path to absolute, not question whether we should do it at all.


Well, if we decided not to do it, then we could just delete the code and 
not have to think about how to change it.



I'm not familiar with how homebrew sets up the installation
layout, but I'm suspicious that the situation Peter refers to
has a similar problem, only with a symlink for the bin directory
not the individual executable.


I think the two contradicting use cases are:

1) You configure and install with prefix=/usr/local/pgsql, and then 
symlink ~/bin/pg_ctl -> /usr/local/pgsql/bin/pg_ctl; hoping that that 
will allow pg_ctl to find the other programs it needs in 
/usr/local/pgsql/bin.  This is what we currently support.


2) You configure and install with prefix=/usr/local/pgsql-14, and then 
symlink /usr/local/pgsql -> /usr/local/pgsql-14; hoping that you can 
then use /usr/local/pgsql as if that's where it actually is.  We don't 
currently support that.  (Note that it would work if you made a copy of 
the tree instead of using the symlink.)


I don't know if anyone uses #1 or what the details of such use are.

#2 is how Homebrew and some other packaging systems work.





Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-12 Thread Tom Lane
Andrew Dunstan  writes:
> I think the discussion here is a bit tangential to the original topic.

Indeed, because I just wanted to reimplement *how* we resolve the
executable path to absolute, not question whether we should do it at all.

> The point you make is reasonable, but it seems a bit more likely that in
> the case Peter cites the symlink is one level higher in the tree, in
> which case there's probably little value in resolving the symlink. Maybe
> we could compromise and check if a path exists and only resolve symlinks
> if it does not?

It's non-negotiable that we apply realpath() or a handmade equivalent
if the path we find to the executable turns out to be relative, ie
you did "../../postgres/bin/psql" or the equivalent.  In the case of
the server, we *will* chdir to someplace else, rendering the original
path useless.  psql might chdir in response to a user command, so it
likewise had better resolve the installation location while it can.

We could maybe skip realpath() if we find what appears to be an
absolute path to the executable.  However, I think that fails in
too many situations.  As an example, if I do
ln -s /path/to/psql ~/bin
and then invoke psql using that symlink, we're not going to be
able to find any of the installation's supporting files unless
we resolve the symlink.  The executable path we'd deduce after
examining PATH is /home/tgl/bin/psql, which is plenty absolute,
but it doesn't help us find the rest of the PG installation.
That case works today, and I think a lot of people will be
sad if we break it.

I'm not familiar with how homebrew sets up the installation
layout, but I'm suspicious that the situation Peter refers to
has a similar problem, only with a symlink for the bin directory
not the individual executable.

I think the only potentially-workable alternative design is
to forget about relocatable installations and insist that the
supporting files be found at the installation path designated
at configure time.  But, again, that seems likely to break a
lot of setups that work today.  And I've still not heard a
positive reason why we should change it.

regards, tom lane




Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-12 Thread Andrew Dunstan


On 2022-09-12 Mo 16:07, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 12.09.22 17:33, Tom Lane wrote:
>>> Are you proposing we give up the support for relocatable installations?
>>> I'm not here to defend that feature, but I bet somebody will.  (And
>>> doesn't "make check" depend on it?)
>> I'm complaining specifically about the resolving of symlinks.  Why does
>> $ /usr/local/opt/postgresql@13/bin/pg_config --bindir
>> print
>> /usr/local/Cellar/postgresql@13/13.8/bin
>> when it clearly should print
>> /usr/local/opt/postgresql@13/bin
> I'm not sure about your setup there, but if you mean that
> /usr/local/opt/postgresql@13/bin is a symlink reading more or less
> "./13.8/bin", I doubt that failing to canonicalize that is a good idea.
> The point of finding the bindir is mainly to be able to navigate to its
> sibling directories such as lib/, etc/, share/.  There's no certainty
> that a symlink leading to the bin directory will have sibling symlinks
> to those other directories.
>
>   


I think the discussion here is a bit tangential to the original topic.

The point you make is reasonable, but it seems a bit more likely that in
the case Peter cites the symlink is one level higher in the tree, in
which case there's probably little value in resolving the symlink. Maybe
we could compromise and check if a path exists and only resolve symlinks
if it does not?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-12 Thread Tom Lane
Peter Eisentraut  writes:
> On 12.09.22 17:33, Tom Lane wrote:
>> Are you proposing we give up the support for relocatable installations?
>> I'm not here to defend that feature, but I bet somebody will.  (And
>> doesn't "make check" depend on it?)

> I'm complaining specifically about the resolving of symlinks.  Why does

> $ /usr/local/opt/postgresql@13/bin/pg_config --bindir
> print
> /usr/local/Cellar/postgresql@13/13.8/bin
> when it clearly should print
> /usr/local/opt/postgresql@13/bin

I'm not sure about your setup there, but if you mean that
/usr/local/opt/postgresql@13/bin is a symlink reading more or less
"./13.8/bin", I doubt that failing to canonicalize that is a good idea.
The point of finding the bindir is mainly to be able to navigate to its
sibling directories such as lib/, etc/, share/.  There's no certainty
that a symlink leading to the bin directory will have sibling symlinks
to those other directories.

regards, tom lane




Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-12 Thread Peter Eisentraut

On 12.09.22 17:33, Tom Lane wrote:

Peter Eisentraut  writes:

On 02.09.22 01:39, Tom Lane wrote:

find_my_exec() wants to obtain an absolute, symlink-free path
to the program's own executable, for what seem to me good
reasons.



I still think they are bad reasons, and we should kill all that code.
Just sayin' ...


Are you proposing we give up the support for relocatable installations?
I'm not here to defend that feature, but I bet somebody will.  (And
doesn't "make check" depend on it?)


I'm complaining specifically about the resolving of symlinks.  Why does

$ /usr/local/opt/postgresql@13/bin/pg_config --bindir

print

/usr/local/Cellar/postgresql@13/13.8/bin

when it clearly should print

/usr/local/opt/postgresql@13/bin

This is unrelated to the support for relocatable installations, AFAICT.





Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-12 Thread Tom Lane
Peter Eisentraut  writes:
> On 02.09.22 01:39, Tom Lane wrote:
>> find_my_exec() wants to obtain an absolute, symlink-free path
>> to the program's own executable, for what seem to me good
>> reasons.

> I still think they are bad reasons, and we should kill all that code. 
> Just sayin' ...

Are you proposing we give up the support for relocatable installations?
I'm not here to defend that feature, but I bet somebody will.  (And
doesn't "make check" depend on it?)

regards, tom lane




Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-12 Thread Peter Eisentraut

On 02.09.22 01:39, Tom Lane wrote:

find_my_exec() wants to obtain an absolute, symlink-free path
to the program's own executable, for what seem to me good
reasons.


I still think they are bad reasons, and we should kill all that code. 
Just sayin' ...





Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-03 Thread Tom Lane
Here's a draft patch for this.  It seems to work on Linux,
but the Windows code is just speculation.  In particular,
I did

path = _fullpath(NULL, fname, 0);
if (path == NULL)
_dosmaperr(GetLastError());

but I'm not really sure that the _dosmaperr bit is needed,
because the _fullpath man page I found makes reference to
setting "errno" [1].  It's likely to be hard to test, because
most of the possible error cases should be nigh unreachable
in our usage; we already know the input is a valid reference
to an executable file.

BTW, I noticed what seems a flat-out bug in validate_exec:

/* Win32 requires a .exe suffix for stat() */
-   if (strlen(path) >= strlen(".exe") &&
+   if (strlen(path) < strlen(".exe") ||
pg_strcasecmp(path + strlen(path) - strlen(".exe"), ".exe") != 
0)

Nobody's noticed because none of our executables have base names
shorter than 4 characters, but it's still a bug.

regards, tom lane

[1] 
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fullpath-wfullpath?view=msvc-170

diff --git a/src/common/exec.c b/src/common/exec.c
index 22f04aafbe..fa01ff7b32 100644
--- a/src/common/exec.c
+++ b/src/common/exec.c
@@ -14,6 +14,15 @@
  *-
  */
 
+/*
+ * On macOS, "man realpath" avers:
+ *Defining _DARWIN_C_SOURCE or _DARWIN_BETTER_REALPATH before including
+ *stdlib.h will cause the provided implementation of realpath() to use
+ *F_GETPATH from fcntl(2) to discover the path.
+ * This should be harmless everywhere else.
+ */
+#define _DARWIN_BETTER_REALPATH
+
 #ifndef FRONTEND
 #include "postgres.h"
 #else
@@ -58,11 +67,8 @@ extern int	_CRT_glob = 0;		/* 0 turns off globbing; 1 turns it on */
 	(fprintf(stderr, __VA_ARGS__), fputc('\n', stderr))
 #endif
 
-#ifdef _MSC_VER
-#define getcwd(cwd,len)  GetCurrentDirectory(len, cwd)
-#endif
-
-static int	resolve_symlinks(char *path);
+static int	normalize_exec_path(char *path);
+static char *pg_realpath(const char *fname);
 
 #ifdef WIN32
 static BOOL GetTokenUser(HANDLE hToken, PTOKEN_USER *ppTokenUser);
@@ -87,7 +93,7 @@ validate_exec(const char *path)
 	char		path_exe[MAXPGPATH + sizeof(".exe") - 1];
 
 	/* Win32 requires a .exe suffix for stat() */
-	if (strlen(path) >= strlen(".exe") &&
+	if (strlen(path) < strlen(".exe") ||
 		pg_strcasecmp(path + strlen(path) - strlen(".exe"), ".exe") != 0)
 	{
 		strlcpy(path_exe, path, sizeof(path_exe) - 4);
@@ -151,30 +157,16 @@ validate_exec(const char *path)
 int
 find_my_exec(const char *argv0, char *retpath)
 {
-	char		cwd[MAXPGPATH],
-test_path[MAXPGPATH];
 	char	   *path;
 
-	if (!getcwd(cwd, MAXPGPATH))
-	{
-		log_error(errcode_for_file_access(),
-  _("could not identify current directory: %m"));
-		return -1;
-	}
-
 	/*
 	 * If argv0 contains a separator, then PATH wasn't used.
 	 */
-	if (first_dir_separator(argv0) != NULL)
+	strlcpy(retpath, argv0, MAXPGPATH);
+	if (first_dir_separator(retpath) != NULL)
 	{
-		if (is_absolute_path(argv0))
-			strlcpy(retpath, argv0, MAXPGPATH);
-		else
-			join_path_components(retpath, cwd, argv0);
-		canonicalize_path(retpath);
-
 		if (validate_exec(retpath) == 0)
-			return resolve_symlinks(retpath);
+			return normalize_exec_path(retpath);
 
 		log_error(errcode(ERRCODE_WRONG_OBJECT_TYPE),
   _("invalid binary \"%s\": %m"), retpath);
@@ -183,9 +175,8 @@ find_my_exec(const char *argv0, char *retpath)
 
 #ifdef WIN32
 	/* Win32 checks the current directory first for names without slashes */
-	join_path_components(retpath, cwd, argv0);
 	if (validate_exec(retpath) == 0)
-		return resolve_symlinks(retpath);
+		return normalize_exec_path(retpath);
 #endif
 
 	/*
@@ -208,21 +199,15 @@ find_my_exec(const char *argv0, char *retpath)
 			if (!endp)
 endp = startp + strlen(startp); /* point to end */
 
-			strlcpy(test_path, startp, Min(endp - startp + 1, MAXPGPATH));
+			strlcpy(retpath, startp, Min(endp - startp + 1, MAXPGPATH));
 
-			if (is_absolute_path(test_path))
-join_path_components(retpath, test_path, argv0);
-			else
-			{
-join_path_components(retpath, cwd, test_path);
-join_path_components(retpath, retpath, argv0);
-			}
+			join_path_components(retpath, retpath, argv0);
 			canonicalize_path(retpath);
 
 			switch (validate_exec(retpath))
 			{
 case 0:			/* found ok */
-	return resolve_symlinks(retpath);
+	return normalize_exec_path(retpath);
 case -1:		/* wasn't even a candidate, keep looking */
 	break;
 case -2:		/* found but disqualified */
@@ -241,105 +226,90 @@ find_my_exec(const char *argv0, char *retpath)
 
 
 /*
- * resolve_symlinks - resolve symlinks to the underlying file
+ * normalize_exec_path - resolve symlinks and convert to absolute path
  *
- * Replace "path" by the absolute path to the referenced file.
+ * Given a path that refers to an executable, chase through any symlinks
+ * to find the real file 

Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-03 Thread Tom Lane
I wrote:
> Andrew Dunstan  writes:
>> These days there seem to be library functions that do this, realpath(3)
>> and canonicalize_file_name(3). The latter is what seems to be called by
>> readlink(1). Should we be using one of those?

> Oh!  I see realpath() in POSIX, but not canonicalize_file_name().
> It does look like realpath() would be helpful here, although if
> it's not present on Windows that's a problem.

After some surveying of man pages, I conclude that

(1) realpath() exists on all platforms of interest except Windows,
where it looks like we can use _fullpath() instead.

(2) AIX and Solaris 10 only implement the SUSv2 semantics,
where the caller must supply a buffer that it has no good way
to determine a safe size for.  Annoying.

(3) The Solaris 10 man page has this interesting disclaimer:

 The realpath() function might fail to return to the current
 directory if an error occurs.

which implies that on that platform it's basically implemented
in the same way as our current code.  Sigh.

I think we can ignore (3) though.  Solaris 11 seems to have an
up-to-speed implementation of realpath(), and 10 will be EOL
in January 2024 according to Wikipedia.

As for (2), both systems promise to report EINVAL for a null
pointer, which is also what SUSv2 says.  So I think what we
can do is approximately

ptr = realpath(fname, NULL);
if (ptr == NULL && errno == EINVAL)
{
ptr = pg_malloc(MAXPGPATH);
ptr = realpath(fname, ptr);
}

and just take it on faith that MAXPGPATH is enough on those
platforms.

regards, tom lane




Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-03 Thread Tom Lane
Andrew Dunstan  writes:
> On 2022-09-01 Th 19:39, Tom Lane wrote:
>> find_my_exec() wants to obtain an absolute, symlink-free path
>> to the program's own executable, for what seem to me good
>> reasons.  However, chasing down symlinks is left to its
>> subroutine resolve_symlinks(), which does this:

> These days there seem to be library functions that do this, realpath(3)
> and canonicalize_file_name(3). The latter is what seems to be called by
> readlink(1). Should we be using one of those?

Oh!  I see realpath() in POSIX, but not canonicalize_file_name().
It does look like realpath() would be helpful here, although if
it's not present on Windows that's a problem.

Quick googling suggests that _fullpath() could be used as a substitute.

regards, tom lane




Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-03 Thread Andrew Dunstan


On 2022-09-01 Th 19:39, Tom Lane wrote:
> find_my_exec() wants to obtain an absolute, symlink-free path
> to the program's own executable, for what seem to me good
> reasons.  However, chasing down symlinks is left to its
> subroutine resolve_symlinks(), which does this:
>
>  * To resolve a symlink properly, we have to chdir into its directory and
>  * then chdir to where the symlink points; otherwise we may fail to
>  * resolve relative links correctly (consider cases involving mount
>  * points, for example).  After following the final symlink, we use
>  * getcwd() to figure out where the heck we're at.
>
> and then afterwards it has to chdir back to the original cwd.
> That last step is a bit of a sore spot, because sometimes
> (especially in sudo situations) we may not have the privileges
> necessary to do that; I think this is the cause of the complaint
> at [1].  Anyway the whole thing seems a bit excessively Rube
> Goldbergian.  I'm wondering why we couldn't just read the
> symlink(s), concatenate them together, and use canonicalize_path()
> to clean up any mess.
>
> This code was mine originally (336969e49), but I sure don't
> remember why I wrote it like that.  I know we didn't have a
> robust version of canonicalize_path() then, and that may have
> been the main issue, but that offhand comment about mount
> points bothers me.  But I can't reconstruct precisely what
> I was worried about there.  The only contemporaneous discussion
> thread I can find is [2], which doesn't go into coding details.
>
> Thoughts?
>
>   regards, tom lane
>
> [1] 
> https://www.postgresql.org/message-id/flat/CAH8yC8kOj0pmHF1RbK2Gb2t4YCcNG-5h0TwZ7yxk3Hzw6C0Otg%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/flat/4973.1099605411%40sss.pgh.pa.us
>
>

These days there seem to be library functions that do this, realpath(3)
and canonicalize_file_name(3). The latter is what seems to be called by
readlink(1). Should we be using one of those? I don't know how portable
they are. I don't see them here :-(



cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-01 Thread Tom Lane
Isaac Morland  writes:
> On Thu, 1 Sept 2022 at 19:39, Tom Lane  wrote:
>> This code was mine originally (336969e49), but I sure don't
>> remember why I wrote it like that.

> Does this happen in a context where we need to worried about the directory
> structure changing under us, either accidentally or maliciously?

Well, one of the reasons it'd be a good idea to not change cwd is
that then you don't have to worry about that moving while you're
messing around.  But everything else that we're considering here is
either a component of PATH or a directory/symlink associated with
the PG installation.  If $badguy has control of any of that,
you've already lost, so I'm not excited about worrying about it.

> I'm wondering because I understand cd'ing through the structure can avoid
> some of the related problems and might be the reason for doing it that way
> originally.

Pretty sure I was not thinking about that.  I might have been
thinking about AFS installations, which IIRC often have two nominal
paths associated with them.  But I don't recall any details about how
that works, and anyway the comment says nothing about AFS.

> My impression is that the modern equivalent would be to use
> openat() with O_PATH to step through the hierarchy. But then I'm not clear
> on how to get back to the absolute path, given a file descriptor for the
> final directory.

Yeah.  The point here is not to open a particular file, but to derive
a pathname string for where the file is.

What I'm thinking right at the moment is that we don't necessarily
have to have the exact path that getcwd() would report.  We need
*some* path-in-absolute-form that works.  This leads me to think
that both the AFS case and the mount-point case are red herrings.
But I can't shake the feeling that I'm missing something.

regards, tom lane




Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-01 Thread Isaac Morland
On Thu, 1 Sept 2022 at 19:39, Tom Lane  wrote:

This code was mine originally (336969e49), but I sure don't
> remember why I wrote it like that.  I know we didn't have a
> robust version of canonicalize_path() then, and that may have
> been the main issue, but that offhand comment about mount
> points bothers me.  But I can't reconstruct precisely what
> I was worried about there.  The only contemporaneous discussion
> thread I can find is [2], which doesn't go into coding details.
>

Does this happen in a context where we need to worried about the directory
structure changing under us, either accidentally or maliciously?

I'm wondering because I understand cd'ing through the structure can avoid
some of the related problems and might be the reason for doing it that way
originally. My impression is that the modern equivalent would be to use
openat() with O_PATH to step through the hierarchy. But then I'm not clear
on how to get back to the absolute path, given a file descriptor for the
final directory.