Re: make dist using git archive

2024-04-15 Thread Tristan Partin

On Wed Jan 24, 2024 at 11:57 AM CST, Tristan Partin wrote:

On Wed Jan 24, 2024 at 10:18 AM CST, Tristan Partin wrote:
> On Tue Jan 23, 2024 at 3:30 AM CST, Peter Eisentraut wrote:
> > On 22.01.24 21:04, Tristan Partin wrote:
> > 3. Meson does not support tar.bz2 archives.

Submitted https://github.com/mesonbuild/meson/pull/12770.


This has now been merged. It will be in 1.5, so we will probably see it 
in RHEL in a decade :P.


> > 4. Meson uses git archive internally, but then unpacks and repacks the 
> > archive, which loses the ability to use git get-tar-commit-id.


Because Meson allows projects to distribute arbitrary files via 
meson.add_dist_script(), and can include subprojects via `meson dist 
--include-subprojects`, this doesn't seem like an easily solvable 
problem.


> Thanks Peter. I will bring these up with upstream!

I think the solution to point 4 is to not unpack/repack if there are no 
dist scripts and/or subprojects to distribute. I can take a look at 
this later. I think this would also solve points 1, 2, 5, and 6 because 
at that point meson is just calling git-archive.


I think implementing a solution to point 4 is a little bit more pressing 
given that reproducible tarballs are more important after the xz 
debaucle. I will try to give it some effort soon.


--
Tristan Partin
Neon (https://neon.tech)




Re: make dist using git archive

2024-03-26 Thread Tristan Partin

On Tue Mar 26, 2024 at 2:56 AM CDT, Andres Freund wrote:

Hi,

On 2024-03-26 08:36:58 +0100, Peter Eisentraut wrote:
> On 26.03.24 01:23, Andres Freund wrote:
> > On 2024-03-25 06:44:33 +0100, Peter Eisentraut wrote:
> > > Done and committed.
> > 
> > This triggered a new warning for me:
> > 
> > ../../../../../home/andres/src/postgresql/meson.build:3422: WARNING: Project targets '>=0.54' but uses feature introduced in '0.55.0': Passing executable/found program object to script parameter of add_dist_script.
> 
> Hmm, I don't see that.  Is there another version dependency that controls

> when you see version dependency warnings? ;-)

Sometimes an incompatibility is later noticed and a warning is introduced at
that point.

> We could trivially remove this particular line, or perhaps put a
> 
> if meson.version().version_compare('>=0.55')
> 
> around it.  (But would that still warn?)


It shouldn't, no. As long as the code is actually executed within the check,
it avoids the warning. However if you just set a variable inside the version
gated block and then later use the variable outside that, it will
warn. Probably hard to avoid...


The following change also makes the warning go away, but the version 
comparison seems better to me due to how we choose not to use machine 
files for overriding programs[0]. :(


- meson.add_dist_script(perl, ...)
+ meson.add_dist_script('perl', ...)

Aside, but I think since we dropped AIX, we can bump the required Meson 
version. My last analysis of the situation told me that the AIX 
buildfarm animals were the only machines which didn't have a Python 
version capable of running a newer version. I would need to look at the 
situation again though.


[0]: If someone wants to make a plea here: 
https://github.com/mesonbuild/meson/pull/12623


--
Tristan Partin
Neon (https://neon.tech)




Re: make dist using git archive

2024-03-26 Thread Andres Freund
Hi,

On 2024-03-26 08:36:58 +0100, Peter Eisentraut wrote:
> On 26.03.24 01:23, Andres Freund wrote:
> > On 2024-03-25 06:44:33 +0100, Peter Eisentraut wrote:
> > > Done and committed.
> > 
> > This triggered a new warning for me:
> > 
> > ../../../../../home/andres/src/postgresql/meson.build:3422: WARNING: 
> > Project targets '>=0.54' but uses feature introduced in '0.55.0': Passing 
> > executable/found program object to script parameter of add_dist_script.
> 
> Hmm, I don't see that.  Is there another version dependency that controls
> when you see version dependency warnings? ;-)

Sometimes an incompatibility is later noticed and a warning is introduced at
that point.

> We could trivially remove this particular line, or perhaps put a
> 
> if meson.version().version_compare('>=0.55')
> 
> around it.  (But would that still warn?)

It shouldn't, no. As long as the code is actually executed within the check,
it avoids the warning. However if you just set a variable inside the version
gated block and then later use the variable outside that, it will
warn. Probably hard to avoid...

Greetings,

Andres Freund




Re: make dist using git archive

2024-03-26 Thread Peter Eisentraut

On 26.03.24 01:23, Andres Freund wrote:

On 2024-03-25 06:44:33 +0100, Peter Eisentraut wrote:

Done and committed.


This triggered a new warning for me:

../../../../../home/andres/src/postgresql/meson.build:3422: WARNING: Project 
targets '>=0.54' but uses feature introduced in '0.55.0': Passing 
executable/found program object to script parameter of add_dist_script.


Hmm, I don't see that.  Is there another version dependency that 
controls when you see version dependency warnings? ;-)


We could trivially remove this particular line, or perhaps put a

if meson.version().version_compare('>=0.55')

around it.  (But would that still warn?)





Re: make dist using git archive

2024-03-25 Thread Andres Freund
Hi,

On 2024-03-25 06:44:33 +0100, Peter Eisentraut wrote:
> Done and committed.

This triggered a new warning for me:

../../../../../home/andres/src/postgresql/meson.build:3422: WARNING: Project 
targets '>=0.54' but uses feature introduced in '0.55.0': Passing 
executable/found program object to script parameter of add_dist_script.

Greetings,

Andres




Re: make dist using git archive

2024-03-24 Thread Peter Eisentraut

On 24.03.24 16:42, Tristan Partin wrote:
You might find Meson's string formatting syntax creates a more readable 
command string:


'tar.tar.bz2.command=@0@ -c'.format(bzip2.path())

And then 'install: false' is the default if you feel like leaving it out.

Otherwise, let's get this in!


Done and committed.





Re: make dist using git archive

2024-03-24 Thread Tristan Partin

3 comments left that are inconsequential. Feel free to ignore.


+# Meson has its own distribution building command (meson dist), but we
+# are not using that at this point.  The main problem is that the way
+# they have implemented it, it is not deterministic.  Also, we want it
+# to be equivalent to the "make" version for the time being.  But the
+# target name "dist" in meson is reserved for that reason, so we call
+# the custom target "pgdist".


The second sentence is a run-on.


+if bzip2.found()
+  tar_bz2 = custom_target('tar.bz2',
+build_always_stale: true,
+command: [git, '-C', '@SOURCE_ROOT@',
+  '-c', 'core.autocrlf=false',
+  '-c', 'tar.tar.bz2.command="' + bzip2.path() + '" -c',
+  'archive',
+  '--format', 'tar.bz2',
+  '--prefix', distdir + '/',
+  '-o', join_paths(meson.build_root(), '@OUTPUT@'),
+  'HEAD', '.'],
+install: false,
+output: distdir + '.tar.bz2',
+  )


You might find Meson's string formatting syntax creates a more readable 
command string:


'tar.tar.bz2.command=@0@ -c'.format(bzip2.path())

And then 'install: false' is the default if you feel like leaving it 
out.


Otherwise, let's get this in!

--
Tristan Partin
Neon (https://neon.tech)




Re: make dist using git archive

2024-03-24 Thread Peter Eisentraut

On 22.03.24 18:29, Tristan Partin wrote:

On Thu Mar 21, 2024 at 3:44 AM CDT, Peter Eisentraut wrote:

Here is an updated version of this patch set.


You should add 'disabler: true' to the git find_program in Meson. If Git 
doesn't exist on the system with the way your patch is currently 
written, the targets would be defined, even though they would never 
succeed.


Ok, added.  (I had it in there in an earlier version, but I think I 
misread one of your earlier messages and removed it.)


You may also want to make sure that we are actually in a Git repository. 
I don't think git-archive works outside one.


Then git archive will print an error.  That seems ok.

Re the autoclrf, is this something we could throw in a .gitattributes 
files?


We don't want to apply it to all git commands, just this one in this 
context.



I would suggest poisoning `meson dist` in the following way:

if not meson.is_subproject()

[...]

 meson.add_dist_script(perl, '-e', 'exit 1')
endif


Good idea, added that.

I have extracted the freebsd CI script fix into a separate patch 
(0002).   I think this is useful even if we don't take the full CI 
patch (0003).


0002 looks pretty reasonable to me.


Committed that one in the meantime.
From 680967a621083756e1e182df7394a739008a21d6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sun, 11 Feb 2024 23:58:42 +0100
Subject: [PATCH v5 1/2] make dist uses git archive

This changes "make dist" to directly use "git archive", rather than
the custom shell script it currently runs.

This is to make the creation of the distribution tarball more directly
traceable to the git repository.  That is why we removed the "make
distprep" step.

"make dist" continues to produce a .gz and a .bz2 tarball as before.

The archives produced this way are deterministic and reproducible,
meaning for a given commit the result file should always be
bit-for-bit identical.  The exception is that if you use a git version
older than 2.38.0, gzip records the platform in the archive, so you'd
get a different output on Windows vs. macOS vs. "UNIX" (everything
else).  In git 2.38.0, this was changed so that everything is recorded
as "UNIX" now.  This is just something to keep in mind.  This issue is
specific to the gzip format, it does not affect other compression
formats.

Meson has its own distribution building command (meson dist), but we
are not using that at this point.  The main problem is that the way
they have implemented it, it is not deterministic in the above sense.
Also, we want a "make" version for the time being.  But the target
name "dist" in meson is reserved for that reason, so we call the
custom target "pgdist" (so call something like "meson compile -C build
pgdist").

Discussion: 
https://www.postgresql.org/message-id/flat/40e80f77-a294-4f29-a16f-e21bc7bc75fc%40eisentraut.org
---
 GNUmakefile.in | 35 ++
 meson.build| 66 ++
 2 files changed, 79 insertions(+), 22 deletions(-)

diff --git a/GNUmakefile.in b/GNUmakefile.in
index 4d8fc794bbb..4dc13a3e2ea 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -87,29 +87,20 @@ update-unicode: | submake-generated-headers 
submake-libpgport
 distdir= postgresql-$(VERSION)
 dummy  = =install=
 
+GIT = git
+
 dist: $(distdir).tar.gz $(distdir).tar.bz2
-   rm -rf $(distdir)
-
-$(distdir).tar: distdir
-   $(TAR) chf $@ $(distdir)
-
-.INTERMEDIATE: $(distdir).tar
-
-distdir-location:
-   @echo $(distdir)
-
-distdir:
-   rm -rf $(distdir)* $(dummy)
-   for x in `cd $(top_srcdir) && find . \( -name CVS -prune \) -o \( -name 
.git -prune \) -o -print`; do \
- file=`expr X$$x : 'X\./\(.*\)'`; \
- if test -d "$(top_srcdir)/$$file" ; then \
-   mkdir "$(distdir)/$$file" && chmod 777 "$(distdir)/$$file"; \
- else \
-   ln "$(top_srcdir)/$$file" "$(distdir)/$$file" >/dev/null 2>&1 \
- || cp "$(top_srcdir)/$$file" "$(distdir)/$$file"; \
- fi || exit; \
-   done
-   $(MAKE) -C $(distdir) distclean
+
+# Note: core.autocrlf=false is needed to avoid line-ending conversion
+# in case the environment has a different setting.  Without this, a
+# tarball created on Windows might be different than on, and unusable
+# on, Unix machines.
+
+$(distdir).tar.gz:
+   $(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz 
--prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@
+
+$(distdir).tar.bz2:
+   $(GIT) -C $(srcdir) -c core.autocrlf=false -c 
tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ 
HEAD -o $(abs_top_builddir)/$@
 
 distcheck: dist
rm -rf $(dummy)
diff --git a/meson.build b/meson.build
index c8fdfeb0ec3..4981447414f 100644
--- a/meson.build
+++ b/meson.build
@@ -3359,6 +3359,72 @@ run_target('help',
 
 
 
+###
+# Distribution archive
+###

Re: make dist using git archive

2024-03-22 Thread Tristan Partin

On Thu Mar 21, 2024 at 3:44 AM CDT, Peter Eisentraut wrote:

Here is an updated version of this patch set.


You should add 'disabler: true' to the git find_program in Meson. If Git 
doesn't exist on the system with the way your patch is currently 
written, the targets would be defined, even though they would never 
succeed.


You may also want to make sure that we are actually in a Git repository. 
I don't think git-archive works outside one.


Re the autoclrf, is this something we could throw in a .gitattributes 
files?


I have removed the "dirty check" stuff.  It didn't really work well/was 
buggy under meson, and it failed mysteriously on the Linux CI tasks.  So 
let's just drop that functionality for now.


I have also added a more complete commit message and some more code 
comments.



Meson has its own distribution building command (meson dist), but we
are not using that at this point.  The main problem is that the way
they have implemented it, it is not deterministic in the above sense.
Also, we want a "make" version for the time being.  But the target
name "dist" in meson is reserved for that reason, so we call the
custom target "pgdist" (so call something like "meson compile -C build
pgdist").


I would suggest poisoning `meson dist` in the following way:

if not meson.is_subproject()
	# Maybe edit the comment...Maybe tell perl to print this message 
	# instead and then exit non-zero?

#
# Meson has its own distribution building command (meson dist), but we
# are not using that at this point.  The main problem is that the way
# they have implemented it, it is not deterministic in the above sense.
# Also, we want a "make" version for the time being.  But the target
# name "dist" in meson is reserved for that reason, so we call the
# custom target "pgdist" (so call something like "meson compile -C build
# pgdist").
#
	# We don't poison the dist if we are a subproject because it is 
	# possible that the parent project may want to create a dist using 
	# the builtin Meson method.

meson.add_dist_script(perl, '-e', 'exit 1')
endif

I have extracted the freebsd CI script fix into a separate patch (0002). 
  I think this is useful even if we don't take the full CI patch (0003).


0002 looks pretty reasonable to me.

About the 0003 patch: It seems useful in principle to test these things 
continuously.  The dist script runs about 10 seconds in each task, and 
takes a bit of disk space for the artifacts.  I'm not sure to what 
degree this might bother someone.


0003 works for me :).

--
Tristan Partin
Neon (https://neon.tech)




Re: make dist using git archive

2024-03-21 Thread Peter Eisentraut

Here is an updated version of this patch set.

I have removed the "dirty check" stuff.  It didn't really work well/was 
buggy under meson, and it failed mysteriously on the Linux CI tasks.  So 
let's just drop that functionality for now.


I have also added a more complete commit message and some more code 
comments.


I have extracted the freebsd CI script fix into a separate patch (0002). 
 I think this is useful even if we don't take the full CI patch (0003).


About the 0003 patch: It seems useful in principle to test these things 
continuously.  The dist script runs about 10 seconds in each task, and 
takes a bit of disk space for the artifacts.  I'm not sure to what 
degree this might bother someone.
From d7c11e243d46e7e4ca0c4d0397680c3cce003687 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sun, 11 Feb 2024 23:58:42 +0100
Subject: [PATCH v4 1/3] make dist uses git archive

This changes "make dist" to directly use "git archive", rather than
the custom shell script it currently runs.

This is to make the creation of the distribution tarball more directly
traceable to the git repository.  That is why we removed the "make
distprep" step.

"make dist" continues to produce a .gz and a .bz2 tarball as before.

The archives produced this way are deterministic and reproducible,
meaning for a given commit the result file should always be
bit-for-bit identical.  The exception is that if you use a git version
older than 2.38.0, gzip records the platform in the archive, so you'd
get a different output on Windows vs. macOS vs. "UNIX" (everything
else).  In git 2.38.0, this was changed so that everything is recorded
as "UNIX" now.  This is just something to keep in mind.  This issue is
specific to the gzip format, it does not affect other compression
formats.

Meson has its own distribution building command (meson dist), but we
are not using that at this point.  The main problem is that the way
they have implemented it, it is not deterministic in the above sense.
Also, we want a "make" version for the time being.  But the target
name "dist" in meson is reserved for that reason, so we call the
custom target "pgdist" (so call something like "meson compile -C build
pgdist").

Discussion: 
https://www.postgresql.org/message-id/flat/40e80f77-a294-4f29-a16f-e21bc7bc75fc%40eisentraut.org
---
 GNUmakefile.in | 35 +
 meson.build| 52 ++
 2 files changed, 65 insertions(+), 22 deletions(-)

diff --git a/GNUmakefile.in b/GNUmakefile.in
index 4d8fc794bbb..4dc13a3e2ea 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -87,29 +87,20 @@ update-unicode: | submake-generated-headers 
submake-libpgport
 distdir= postgresql-$(VERSION)
 dummy  = =install=
 
+GIT = git
+
 dist: $(distdir).tar.gz $(distdir).tar.bz2
-   rm -rf $(distdir)
-
-$(distdir).tar: distdir
-   $(TAR) chf $@ $(distdir)
-
-.INTERMEDIATE: $(distdir).tar
-
-distdir-location:
-   @echo $(distdir)
-
-distdir:
-   rm -rf $(distdir)* $(dummy)
-   for x in `cd $(top_srcdir) && find . \( -name CVS -prune \) -o \( -name 
.git -prune \) -o -print`; do \
- file=`expr X$$x : 'X\./\(.*\)'`; \
- if test -d "$(top_srcdir)/$$file" ; then \
-   mkdir "$(distdir)/$$file" && chmod 777 "$(distdir)/$$file"; \
- else \
-   ln "$(top_srcdir)/$$file" "$(distdir)/$$file" >/dev/null 2>&1 \
- || cp "$(top_srcdir)/$$file" "$(distdir)/$$file"; \
- fi || exit; \
-   done
-   $(MAKE) -C $(distdir) distclean
+
+# Note: core.autocrlf=false is needed to avoid line-ending conversion
+# in case the environment has a different setting.  Without this, a
+# tarball created on Windows might be different than on, and unusable
+# on, Unix machines.
+
+$(distdir).tar.gz:
+   $(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz 
--prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@
+
+$(distdir).tar.bz2:
+   $(GIT) -C $(srcdir) -c core.autocrlf=false -c 
tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ 
HEAD -o $(abs_top_builddir)/$@
 
 distcheck: dist
rm -rf $(dummy)
diff --git a/meson.build b/meson.build
index c8fdfeb0ec3..7e233317075 100644
--- a/meson.build
+++ b/meson.build
@@ -3359,6 +3359,58 @@ run_target('help',
 
 
 
+###
+# Distribution archive
+###
+
+git = find_program('git', required: false, native: true)
+bzip2 = find_program('bzip2', required: false, native: true)
+
+distdir = meson.project_name() + '-' + meson.project_version()
+
+# Note: core.autocrlf=false is needed to avoid line-ending conversion
+# in case the environment has a different setting.  Without this, a
+# tarball created on Windows might be different than on, and unusable
+# on, Unix machines.
+
+tar_gz = custom_target('tar.gz',
+  build_always_stale: true,
+  command

Re: make dist using git archive

2024-02-12 Thread Peter Eisentraut

On 12.02.24 18:26, Tristan Partin wrote:

On Sun Feb 11, 2024 at 5:09 PM CST, Peter Eisentraut wrote:
Small update: I noticed that on Windows (at least the one that is 
running the CI job), I need to use git -c core.autocrlf=false, 
otherwise git archive does line-ending conversion for the files it 
puts into the archive.  With this fix, all the archives produced by 
all the CI jobs across the different platforms match, except the 
.tar.gz archive from the Linux job, which I suspect suffers from an 
old git version.  We should get the Linux images updated to a newer 
Debian version soon anyway, so I think that issue will go away.


I think with this change, it is unlikely I will be able to upstream 
anything to Meson that would benefit Postgres here since setting this 
option seems project dependent.


Meson is vulnerable to the same problem: If the person who makes the 
release had some crlf-related git setting activated in their 
environment, then that would affect the tarball.  And such a tarball 
would be genuinely broken for non-Windows users, because at least some 
parts of Unix systems can't process such CRLF files correctly.


(This is easy to test: Run meson dist with core.autocrlf=true on the 
postgresql tree on a non-Windows system.  It will fail during dist check.)






Re: make dist using git archive

2024-02-12 Thread Tristan Partin

On Sun Feb 11, 2024 at 5:09 PM CST, Peter Eisentraut wrote:
Small update: I noticed that on Windows (at least the one that is 
running the CI job), I need to use git -c core.autocrlf=false, otherwise 
git archive does line-ending conversion for the files it puts into the 
archive.  With this fix, all the archives produced by all the CI jobs 
across the different platforms match, except the .tar.gz archive from 
the Linux job, which I suspect suffers from an old git version.  We 
should get the Linux images updated to a newer Debian version soon 
anyway, so I think that issue will go away.


I think with this change, it is unlikely I will be able to upstream 
anything to Meson that would benefit Postgres here since setting this 
option seems project dependent.


--
Tristan Partin
Neon (https://neon.tech)




Re: make dist using git archive

2024-02-11 Thread Peter Eisentraut
Small update: I noticed that on Windows (at least the one that is 
running the CI job), I need to use git -c core.autocrlf=false, otherwise 
git archive does line-ending conversion for the files it puts into the 
archive.  With this fix, all the archives produced by all the CI jobs 
across the different platforms match, except the .tar.gz archive from 
the Linux job, which I suspect suffers from an old git version.  We 
should get the Linux images updated to a newer Debian version soon 
anyway, so I think that issue will go away.
From e6447b8a9ffd354549827e90e8bec007e703374d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sun, 11 Feb 2024 23:58:42 +0100
Subject: [PATCH v3 1/2] make dist uses git archive

Discussion: 
https://www.postgresql.org/message-id/flat/40e80f77-a294-4f29-a16f-e21bc7bc75fc%40eisentraut.org
---
 GNUmakefile.in | 34 -
 meson.build| 52 ++
 2 files changed, 64 insertions(+), 22 deletions(-)

diff --git a/GNUmakefile.in b/GNUmakefile.in
index eba569e930e..73560ea7d0a 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -87,29 +87,19 @@ update-unicode: | submake-generated-headers 
submake-libpgport
 distdir= postgresql-$(VERSION)
 dummy  = =install=
 
+GIT = git
+
 dist: $(distdir).tar.gz $(distdir).tar.bz2
-   rm -rf $(distdir)
-
-$(distdir).tar: distdir
-   $(TAR) chf $@ $(distdir)
-
-.INTERMEDIATE: $(distdir).tar
-
-distdir-location:
-   @echo $(distdir)
-
-distdir:
-   rm -rf $(distdir)* $(dummy)
-   for x in `cd $(top_srcdir) && find . \( -name CVS -prune \) -o \( -name 
.git -prune \) -o -print`; do \
- file=`expr X$$x : 'X\./\(.*\)'`; \
- if test -d "$(top_srcdir)/$$file" ; then \
-   mkdir "$(distdir)/$$file" && chmod 777 "$(distdir)/$$file"; \
- else \
-   ln "$(top_srcdir)/$$file" "$(distdir)/$$file" >/dev/null 2>&1 \
- || cp "$(top_srcdir)/$$file" "$(distdir)/$$file"; \
- fi || exit; \
-   done
-   $(MAKE) -C $(distdir) distclean
+
+.PHONY: check-dirty-index
+check-dirty-index:
+   $(GIT) -C $(srcdir) diff-index --quiet HEAD
+
+$(distdir).tar.gz: check-dirty-index
+   $(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz 
--prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@
+
+$(distdir).tar.bz2: check-dirty-index
+   $(GIT) -C $(srcdir) -c core.autocrlf=false -c 
tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ 
HEAD -o $(abs_top_builddir)/$@
 
 distcheck: dist
rm -rf $(dummy)
diff --git a/meson.build b/meson.build
index 8ed51b6aae8..5d68c66ac0a 100644
--- a/meson.build
+++ b/meson.build
@@ -3352,6 +3352,58 @@ run_target('help',
 
 
 
+###
+# Distribution archive
+###
+
+git = find_program('git', required: false, native: true)
+bzip2 = find_program('bzip2', required: false, native: true)
+
+distdir = meson.project_name() + '-' + meson.project_version()
+
+check_dirty_index = run_target('check-dirty-index',
+  command: [git, '-C', '@SOURCE_ROOT@', 'diff-index', '--quiet', 'HEAD'])
+
+tar_gz = custom_target('tar.gz',
+  build_always_stale: true,
+  command: [git, '-C', '@SOURCE_ROOT@',
+'-c', 'core.autocrlf=false',
+'archive',
+'--format', 'tar.gz',
+'--prefix', distdir + '/',
+'-o', join_paths(meson.build_root(), '@OUTPUT@'),
+'HEAD', '.'],
+  install: false,
+  output: distdir + '.tar.gz',
+)
+
+if bzip2.found()
+  tar_bz2 = custom_target('tar.bz2',
+build_always_stale: true,
+command: [git, '-C', '@SOURCE_ROOT@',
+  '-c', 'core.autocrlf=false',
+  '-c', 'tar.tar.bz2.command="' + bzip2.path() + '" -c',
+  'archive',
+  '--format', 'tar.bz2',
+  '--prefix', distdir + '/',
+  '-o', join_paths(meson.build_root(), '@OUTPUT@'),
+  'HEAD', '.'],
+install: false,
+output: distdir + '.tar.bz2',
+  )
+else
+  tar_bz2 = custom_target('tar.bz2',
+command: ['sh', '-c', 'exit 1'],
+output: distdir + '.tar.bz2',
+  )
+endif
+
+# FIXME: would like to add check_dirty_index here, broken(?) before
+# meson 0.60.0
+alias_target('pgdist', [tar_gz, tar_bz2])
+
+
+
 ###
 # The End, The End, My Friend
 ###

base-commit: ce571434ae7027462565706236a0c6fbdf603734
-- 
2.43.1

From 0227fa494dab9302290c63afc04d0efc9c7d175d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sun, 11 Feb 2024 23:58:42 +0100
Subject: [PATCH v3 2/2] WIP: Add dist building to CI

Note: freebsd repartition script didn't copy .git directory
---
 .cirrus.tasks.yml   | 28 +
 src/tools/ci/gcp_freebsd_repartition.sh |  2 +-
 2 file

Re: make dist using git archive

2024-01-31 Thread Robert Haas
On Wed, Jan 31, 2024 at 10:50 AM Eli Schwartz  wrote:
> Ideally git commits should be signed, but that requires large numbers of
> people to have security-minded git commit habits. From a quick check of
> the postgres commit logs, only one person seems to be regularly signing
> commits, which does provide a certain measure of protection -- an
> attacker cannot attack via `git push --force` across that boundary, and
> those commits serve as verifiable states that multiple people have seen.
>
> The tags aren't signed either, which is a big issue for verifiably
> identifying the release artifacts published by the release manager. Even
> if not every commit is signed, having signed tags provides a known
> coordination point of code that has been broadly tested and code-signed
> for mass use.
>
> In summary, my opinion is that using git-get-tar-commit-id provides zero
> security guarantees, and if that's not something you are worried about
> then that's one thing, but if you were expecting it to *replace* signing
> the tarball, then that's very much another thing entirely, and not
> one I can agree at all with.

I read this part with interest. I think there's definitely something
to be said for strengthening some of our practices in this area. At
the same time, I think it's reasonable for Peter to want to pursue the
limited goal he stated in the original post, namely reproducible
tarball generation, without getting tangled up in possible policy
changes that might be controversial and might require a bunch of
planning and coordination. "GPG signatures are good" can be true
without "reproducible tarball generation is good" being false; and if
"git archive" allows for that and "meson dist" doesn't, then we're
unlikely to adopt "meson dist".

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: make dist using git archive

2024-01-31 Thread Eli Schwartz
On 1/31/24 3:03 AM, Peter Eisentraut wrote:
>> What do you use this for? IMO a more robust way to track the commit used
>> is to use gitattributes export-subst to write a `.git_archival.txt` file
>> containing the commit sha1 and other info -- this can be read even after
>> the file is extracted, which means it can also be used to bake the ID
>> into the built binaries e.g. as part of --version output.
> 
> It's a marginal use case, for sure.  But it is something that git
> provides tooling for that is universally available.  Any alternative
> would be an ad-hoc solution that is specific to our project and would be
> different for the next project.


mercurial has the "archivemeta" config setting that exports similar
information, but forces the filename ".hg_archival.txt".

The setuptools-scm project follows this pattern by requiring the git
file to be called ".git_archival.txt" with a set pattern mimicking the
hg one:

https://setuptools-scm.readthedocs.io/en/latest/usage/#git-archives


So, I guess you could use this and then it would not be specific to your
project. :)


>> Overall I feel like much of this is about requiring dist tarballs to be
>> byte-identical to other dist tarballs, although reproducible builds is
>> mainly about artifacts, not sources, and for sources it doesn't
>> generally matter unless the sources are ephemeral and generated
>> on-demand (in which case it is indeed very important to produce the same
>> tarball each time).
> 
> The source tarball is, in a way, also an artifact.
> 
> I think it's useful that others can easily independently verify that the
> produced tarball matches what they have locally.  It's not an absolute
> requirement, but given that it is possible, it seems useful to take
> advantage of it.
> 
> In a way, this also avoids the need for signing the tarball, which we
> don't do.  So maybe that contributes to a different perspective.


Since you mention signing and not as a simple "aside"...

That's a fascinating perspective. I wonder how people independently
verify that what they have locally (I assume from git clones) matches
what the postgres committers have authorized.

I'm a bit skeptical that you can avoid the need to perform code-signing
at some stage, somewhere, somehow, by suggesting that people can simply
git clone, run some commands and compare the tarball. The point of
signing is to verify that no one has acquired an untraceable API token
they should not have and gotten write access to the authoritative server
then uploaded malicious code under various forged identities, possibly
overwriting previous versions, either in git or out of git.

Ideally git commits should be signed, but that requires large numbers of
people to have security-minded git commit habits. From a quick check of
the postgres commit logs, only one person seems to be regularly signing
commits, which does provide a certain measure of protection -- an
attacker cannot attack via `git push --force` across that boundary, and
those commits serve as verifiable states that multiple people have seen.

The tags aren't signed either, which is a big issue for verifiably
identifying the release artifacts published by the release manager. Even
if not every commit is signed, having signed tags provides a known
coordination point of code that has been broadly tested and code-signed
for mass use.

...

In summary, my opinion is that using git-get-tar-commit-id provides zero
security guarantees, and if that's not something you are worried about
then that's one thing, but if you were expecting it to *replace* signing
the tarball, then that's very much another thing entirely, and not
one I can agree at all with.



-- 
Eli Schwartz


OpenPGP_0x84818A6819AF4A9B.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: make dist using git archive

2024-01-31 Thread Peter Eisentraut

On 26.01.24 22:18, Eli Schwartz wrote:

Hello, meson developer here.


Hello, and thanks for popping in!


3. Meson does not support tar.bz2 archives.


Simple enough to add, but I'm a bit surprised as usually people seem to
want either gzip for portability or xz for efficient compression.


We may very well end up updating our requirements here before too long, 
so I wouldn't bother with this on the meson side.  Last time we 
discussed this, there were still platforms under support that didn't 
have xz easily available.



4. Meson uses git archive internally, but then unpacks and repacks the
archive, which loses the ability to use git get-tar-commit-id.


What do you use this for? IMO a more robust way to track the commit used
is to use gitattributes export-subst to write a `.git_archival.txt` file
containing the commit sha1 and other info -- this can be read even after
the file is extracted, which means it can also be used to bake the ID
into the built binaries e.g. as part of --version output.


It's a marginal use case, for sure.  But it is something that git 
provides tooling for that is universally available.  Any alternative 
would be an ad-hoc solution that is specific to our project and would be 
different for the next project.



5. I have found that the tar archives created by meson and git archive
include the files in different orders.  I suspect that the Python
tarfile module introduces some either randomness or platform dependency.


Different orders is meaningless, the question is whether the order is
internally consistent. Python uses sorted() to guarantee a stable order,
which may be a different algorithm than the one git-archive uses to
guarantee a stable order. But the order should be stable and that is
what matters.


(FWIW, I couldn't reproduce this anymore, so maybe it's not actually an 
issue.)



Overall I feel like much of this is about requiring dist tarballs to be
byte-identical to other dist tarballs, although reproducible builds is
mainly about artifacts, not sources, and for sources it doesn't
generally matter unless the sources are ephemeral and generated
on-demand (in which case it is indeed very important to produce the same
tarball each time).


The source tarball is, in a way, also an artifact.

I think it's useful that others can easily independently verify that the 
produced tarball matches what they have locally.  It's not an absolute 
requirement, but given that it is possible, it seems useful to take 
advantage of it.


In a way, this also avoids the need for signing the tarball, which we 
don't do.  So maybe that contributes to a different perspective.






Re: make dist using git archive

2024-01-26 Thread Eli Schwartz
Hello, meson developer here.


On 1/23/24 4:30 AM, Peter Eisentraut wrote:
> On 22.01.24 21:04, Tristan Partin wrote:
>> I am not really following why we can't use the builtin Meson dist
>> command. The only difference from my testing is it doesn't use a
>> --prefix argument.
> 
> Here are some problems I have identified:
> 
> 1. meson dist internally runs gzip without the -n option.  That makes
> the tar.gz archive include a timestamp, which in turn makes it not
> reproducible.


Well, it uses python tarfile which uses python gzip support under the
hood, but yes, that is true, python tarfile doesn't expose this tunable.


> 2. Because gzip includes a platform indicator in the archive, the
> produced tar.gz archive is not reproducible across platforms.  (I don't
> know if gzip has an option to avoid that.  git archive uses an internal
> gzip implementation that handles this.)


This appears to be https://github.com/python/cpython/issues/112346


> 3. Meson does not support tar.bz2 archives.


Simple enough to add, but I'm a bit surprised as usually people seem to
want either gzip for portability or xz for efficient compression.


> 4. Meson uses git archive internally, but then unpacks and repacks the
> archive, which loses the ability to use git get-tar-commit-id.


What do you use this for? IMO a more robust way to track the commit used
is to use gitattributes export-subst to write a `.git_archival.txt` file
containing the commit sha1 and other info -- this can be read even after
the file is extracted, which means it can also be used to bake the ID
into the built binaries e.g. as part of --version output.


> 5. I have found that the tar archives created by meson and git archive
> include the files in different orders.  I suspect that the Python
> tarfile module introduces some either randomness or platform dependency.


Different orders is meaningless, the question is whether the order is
internally consistent. Python uses sorted() to guarantee a stable order,
which may be a different algorithm than the one git-archive uses to
guarantee a stable order. But the order should be stable and that is
what matters.


> 6. meson dist is also slower because of the additional work.


I'm amenable to skipping the extraction/recombination of subprojects and
running of dist scripts in the event that neither exist, as Tristan
offered to do, but...


> 7. meson dist produces .sha256sum files but we have called them .sha256.
>  (This is obviously trivial, but it is something that would need to be
> dealt with somehow nonetheless.)
> 
> Most or all of these issues are fixable, either upstream in Meson or by
> adjusting our own requirements.  But for now this route would have some
> significant disadvantages.


Overall I feel like much of this is about requiring dist tarballs to be
byte-identical to other dist tarballs, although reproducible builds is
mainly about artifacts, not sources, and for sources it doesn't
generally matter unless the sources are ephemeral and generated
on-demand (in which case it is indeed very important to produce the same
tarball each time). A tarball is usually generated once, signed, and
uploaded to release hosting. Meson already guarantees the contents are
strictly based on the built tag.


-- 
Eli Schwartz


OpenPGP_0x84818A6819AF4A9B.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: make dist using git archive

2024-01-26 Thread Tristan Partin

On Fri Jan 26, 2024 at 12:28 AM CST, Peter Eisentraut wrote:

On 25.01.24 17:25, Tristan Partin wrote:
> For what it's worth, I run Meson 1.3, and the behavior of generating the 
> tarballs even though it is a dirty tree still occurred. In the new patch 
> you seem to say it was fixed in 0.60.


The problem I'm referring to is that before 0.60, alias_target cannot 
depend on run_target (only "build target").  This is AFAICT not 
documented and might not have been an intentional change, but you can 
trace it in the meson source code, and it shows in the PostgreSQL CI. 
That's also why for the above bzip2 issue I have to use custom_target in 
place of your run_target.


https://github.com/mesonbuild/meson/pull/12783

Thanks for finding these issues.

--
Tristan Partin
Neon (https://neon.tech)




Re: make dist using git archive

2024-01-25 Thread Peter Eisentraut

On 25.01.24 17:25, Tristan Partin wrote:
The way that this currently works is that you will fail at configure 
time if bz2 doesn't exist on the system. Meson will try to resolve a 
.path() method on a NotFoundProgram. You might want to define the bz2 
target to just call `exit 1` in this case.


if bzip2.found()
  # do your current target
else
  bz2 = run_target('tar.bz2', command: ['exit', 1])
endif

This should cause pgdist to appropriately fail at run time when 
generating the bz2 tarball.


Ok, done that way.

For what it's worth, I run Meson 1.3, and the behavior of generating the 
tarballs even though it is a dirty tree still occurred. In the new patch 
you seem to say it was fixed in 0.60.


The problem I'm referring to is that before 0.60, alias_target cannot 
depend on run_target (only "build target").  This is AFAICT not 
documented and might not have been an intentional change, but you can 
trace it in the meson source code, and it shows in the PostgreSQL CI. 
That's also why for the above bzip2 issue I have to use custom_target in 
place of your run_target.
From 48ddd079c1530baaabf92a5650ff9a7bfa7ac3d6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 25 Jan 2024 12:28:28 +0100
Subject: [PATCH v2 1/2] make dist uses git archive

Discussion: 
https://www.postgresql.org/message-id/flat/40e80f77-a294-4f29-a16f-e21bc7bc75fc%40eisentraut.org
---
 GNUmakefile.in | 34 --
 meson.build| 47 +++
 2 files changed, 59 insertions(+), 22 deletions(-)

diff --git a/GNUmakefile.in b/GNUmakefile.in
index eba569e930e..680c765dd73 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -87,29 +87,19 @@ update-unicode: | submake-generated-headers 
submake-libpgport
 distdir= postgresql-$(VERSION)
 dummy  = =install=
 
+GIT = git
+
 dist: $(distdir).tar.gz $(distdir).tar.bz2
-   rm -rf $(distdir)
-
-$(distdir).tar: distdir
-   $(TAR) chf $@ $(distdir)
-
-.INTERMEDIATE: $(distdir).tar
-
-distdir-location:
-   @echo $(distdir)
-
-distdir:
-   rm -rf $(distdir)* $(dummy)
-   for x in `cd $(top_srcdir) && find . \( -name CVS -prune \) -o \( -name 
.git -prune \) -o -print`; do \
- file=`expr X$$x : 'X\./\(.*\)'`; \
- if test -d "$(top_srcdir)/$$file" ; then \
-   mkdir "$(distdir)/$$file" && chmod 777 "$(distdir)/$$file"; \
- else \
-   ln "$(top_srcdir)/$$file" "$(distdir)/$$file" >/dev/null 2>&1 \
- || cp "$(top_srcdir)/$$file" "$(distdir)/$$file"; \
- fi || exit; \
-   done
-   $(MAKE) -C $(distdir) distclean
+
+.PHONY: check-dirty-index
+check-dirty-index:
+   $(GIT) -C $(srcdir) diff-index --quiet HEAD
+
+$(distdir).tar.gz: check-dirty-index
+   $(GIT) -C $(srcdir) archive --format tar.gz --prefix $(distdir)/ HEAD 
-o $(abs_top_builddir)/$@
+
+$(distdir).tar.bz2: check-dirty-index
+   $(GIT) -C $(srcdir) -c tar.tar.bz2.command='$(BZIP2) -c' archive 
--format tar.bz2 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@
 
 distcheck: dist
rm -rf $(dummy)
diff --git a/meson.build b/meson.build
index 55184db2488..c245a1818e0 100644
--- a/meson.build
+++ b/meson.build
@@ -3348,6 +3348,53 @@ run_target('help',
 
 
 
+###
+# Distribution archive
+###
+
+git = find_program('git', required: false, native: true)
+bzip2 = find_program('bzip2', required: false, native: true)
+
+distdir = meson.project_name() + '-' + meson.project_version()
+
+check_dirty_index = run_target('check-dirty-index',
+  command: [git, '-C', '@SOURCE_ROOT@', 'diff-index', '--quiet', 'HEAD'])
+
+tar_gz = custom_target('tar.gz',
+  build_always_stale: true,
+  command: [git, '-C', '@SOURCE_ROOT@', 'archive',
+'--format', 'tar.gz',
+'--prefix', distdir + '/',
+'-o', join_paths(meson.build_root(), '@OUTPUT@'),
+'HEAD', '.'],
+  install: false,
+  output: distdir + '.tar.gz',
+)
+
+if bzip2.found()
+  tar_bz2 = custom_target('tar.bz2',
+build_always_stale: true,
+command: [git, '-C', '@SOURCE_ROOT@', '-c', 'tar.tar.bz2.command="' + 
bzip2.path() + '" -c', 'archive',
+  '--format', 'tar.bz2',
+  '--prefix', distdir + '/',
+  '-o', join_paths(meson.build_root(), '@OUTPUT@'),
+  'HEAD', '.'],
+install: false,
+output: distdir + '.tar.bz2',
+  )
+else
+  tar_bz2 = custom_target('tar.bz2',
+command: ['sh', '-c', 'exit 1'],
+output: distdir + '.tar.bz2',
+  )
+endif
+
+# FIXME: would like to add check_dirty_index here, broken(?) before
+# meson 0.60.0
+alias_target('pgdist', [tar_gz, tar_bz2])
+
+
+
 ###
 # The End, The End, My Friend
 ###

base-commit: 46d8587b504170ca14f064890bc7ccbbd7523f81
-- 
2.43.0

From 

Re: make dist using git archive

2024-01-25 Thread Tristan Partin

On Thu Jan 25, 2024 at 10:04 AM CST, Peter Eisentraut wrote:

On 22.01.24 21:04, Tristan Partin wrote:
>> +    'HEAD', '.'],
>> +  install: false,
>> +  output: distdir + '.tar.bz2',
>> +)
> 
> The bz2 target should be wrapped in an `if bzip2.found()`.


The way that this currently works is that you will fail at configure 
time if bz2 doesn't exist on the system. Meson will try to resolve 
a .path() method on a NotFoundProgram. You might want to define the bz2 
target to just call `exit 1` in this case.


if bzip2.found()
 # do your current target
else
 bz2 = run_target('tar.bz2', command: ['exit', 1])
endif

This should cause pgdist to appropriately fail at run time when 
generating the bz2 tarball.


Well, I think we want the dist step to fail if bzip2 is not there.  At 
least that is the current expectation.


>> +alias_target('pgdist', [check_dirty_index, tar_gz, tar_bz2])
> 
> Are you intending for the check_dirty_index target to prohibit the other 
> two targets from running? Currently that is not the case.


Yes, that was the hope, and that's how the make dist variant works.  But 
I couldn't figure this out with meson.  Also, the above actually also 
doesn't work with older meson versions, so I had to comment this out to 
get CI to work.


> If it is what 
> you intend, use a stamp file or something to indicate a relationship. 
> Alternatively, inline the git diff-index into the other commands. These 
> might also do better as external scripts. It would reduce duplication 
> between the autotools and Meson builds.


Yeah, maybe that's a direction.


For what it's worth, I run Meson 1.3, and the behavior of generating the 
tarballs even though it is a dirty tree still occurred. In the new patch 
you seem to say it was fixed in 0.60.


--
Tristan Partin
Neon (https://neon.tech)




Re: make dist using git archive

2024-01-25 Thread Peter Eisentraut

On 22.01.24 21:04, Tristan Partin wrote:

+git = find_program('git', required: false, native: true, disabler: true)
+bzip2 = find_program('bzip2', required: false, native: true, 
disabler: true)


This doesn't need to be a disabler. git is fine as-is. See later 
comment. Disablers only work like you are expecting when they are used 
like how git is used. Once you call a method like .path(), all bets are 
off.


ok, fixed


+distdir = meson.project_name() + '-' + meson.project_version()
+
+check_dirty_index = run_target('check-dirty-index',
+   command: [git, 'diff-index', 
'--quiet', 'HEAD'])


Seems like you might want to add -C here too?


done


+tar_bz2 = custom_target('tar.bz2',
+  build_always_stale: true,
+  command: [git, '-C', '@SOURCE_ROOT@', '-c', 'tar.tar.bz2.command=' 
+ bzip2.path() + ' -c', 'archive',

+    '--format', 'tar.bz2',
+    '--prefix', distdir + '/',


-    '-o', '@BUILD_ROOT@/@OUTPUT@',
+    '-o', join_paths(meson.build_root(), '@OUTPUT@'),

This will generate the tarballs in the build directory. Do the same for 
the previous target. Tested locally.


Fixed, thanks.  I had struggled with this one.


+    'HEAD', '.'],
+  install: false,
+  output: distdir + '.tar.bz2',
+)


The bz2 target should be wrapped in an `if bzip2.found()`.


Well, I think we want the dist step to fail if bzip2 is not there.  At 
least that is the current expectation.



+alias_target('pgdist', [check_dirty_index, tar_gz, tar_bz2])


Are you intending for the check_dirty_index target to prohibit the other 
two targets from running? Currently that is not the case.


Yes, that was the hope, and that's how the make dist variant works.  But 
I couldn't figure this out with meson.  Also, the above actually also 
doesn't work with older meson versions, so I had to comment this out to 
get CI to work.


If it is what 
you intend, use a stamp file or something to indicate a relationship. 
Alternatively, inline the git diff-index into the other commands. These 
might also do better as external scripts. It would reduce duplication 
between the autotools and Meson builds.


Yeah, maybe that's a direction.

The updated patch also supports vpath builds with make now.

I have also added a CI patch, for amusement.  Maybe we'll want to keep 
it, though.From 13612f9a1c486e8acfe4156a7bc069a66fd30e77 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 25 Jan 2024 12:28:28 +0100
Subject: [PATCH v1 1/2] make dist uses git archive

Discussion: 
https://www.postgresql.org/message-id/flat/40e80f77-a294-4f29-a16f-e21bc7bc75fc%40eisentraut.org
---
 GNUmakefile.in | 34 --
 meson.build| 41 +
 2 files changed, 53 insertions(+), 22 deletions(-)

diff --git a/GNUmakefile.in b/GNUmakefile.in
index eba569e930e..680c765dd73 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -87,29 +87,19 @@ update-unicode: | submake-generated-headers 
submake-libpgport
 distdir= postgresql-$(VERSION)
 dummy  = =install=
 
+GIT = git
+
 dist: $(distdir).tar.gz $(distdir).tar.bz2
-   rm -rf $(distdir)
-
-$(distdir).tar: distdir
-   $(TAR) chf $@ $(distdir)
-
-.INTERMEDIATE: $(distdir).tar
-
-distdir-location:
-   @echo $(distdir)
-
-distdir:
-   rm -rf $(distdir)* $(dummy)
-   for x in `cd $(top_srcdir) && find . \( -name CVS -prune \) -o \( -name 
.git -prune \) -o -print`; do \
- file=`expr X$$x : 'X\./\(.*\)'`; \
- if test -d "$(top_srcdir)/$$file" ; then \
-   mkdir "$(distdir)/$$file" && chmod 777 "$(distdir)/$$file"; \
- else \
-   ln "$(top_srcdir)/$$file" "$(distdir)/$$file" >/dev/null 2>&1 \
- || cp "$(top_srcdir)/$$file" "$(distdir)/$$file"; \
- fi || exit; \
-   done
-   $(MAKE) -C $(distdir) distclean
+
+.PHONY: check-dirty-index
+check-dirty-index:
+   $(GIT) -C $(srcdir) diff-index --quiet HEAD
+
+$(distdir).tar.gz: check-dirty-index
+   $(GIT) -C $(srcdir) archive --format tar.gz --prefix $(distdir)/ HEAD 
-o $(abs_top_builddir)/$@
+
+$(distdir).tar.bz2: check-dirty-index
+   $(GIT) -C $(srcdir) -c tar.tar.bz2.command='$(BZIP2) -c' archive 
--format tar.bz2 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@
 
 distcheck: dist
rm -rf $(dummy)
diff --git a/meson.build b/meson.build
index 55184db2488..43884e7cfdd 100644
--- a/meson.build
+++ b/meson.build
@@ -3348,6 +3348,47 @@ run_target('help',
 
 
 
+###
+# Distribution archive
+###
+
+git = find_program('git', required: false, native: true)
+bzip2 = find_program('bzip2', required: false, native: true)
+
+distdir = meson.project_name() + '-' + meson.project_version()
+
+check_dirty_index = run_target('check-dirty-index',
+   command: [git, '-C', '@SOURCE_ROOT@',
+  

Re: make dist using git archive

2024-01-24 Thread Peter Eisentraut

On 24.01.24 18:57, Tristan Partin wrote:
> 4. Meson uses git archive internally, but then unpacks and repacks 
the > archive, which loses the ability to use git get-tar-commit-id.


Because Meson allows projects to distribute arbitrary files via 
meson.add_dist_script(), and can include subprojects via `meson dist 
--include-subprojects`, this doesn't seem like an easily solvable problem.


git archive has the --add-file option, which can probably do the same 
thing.  Subprojects are another thing, but obviously are more rarely used.


I think the solution to point 4 is to not unpack/repack if there are no 
dist scripts and/or subprojects to distribute. I can take a look at this 
later. I think this would also solve points 1, 2, 5, and 6 because at 
that point meson is just calling git-archive.


I think that would be a useful direction.





Re: make dist using git archive

2024-01-24 Thread Tristan Partin

On Wed Jan 24, 2024 at 10:18 AM CST, Tristan Partin wrote:

On Tue Jan 23, 2024 at 3:30 AM CST, Peter Eisentraut wrote:
> On 22.01.24 21:04, Tristan Partin wrote:
> > I am not really following why we can't use the builtin Meson dist 
> > command. The only difference from my testing is it doesn't use a 
> > --prefix argument.

>
> Here are some problems I have identified:
>
> 1. meson dist internally runs gzip without the -n option.  That makes 
> the tar.gz archive include a timestamp, which in turn makes it not 
> reproducible.


It doesn't look like Python provides the facilities to affect this.

> 2. Because gzip includes a platform indicator in the archive, the 
> produced tar.gz archive is not reproducible across platforms.  (I don't 
> know if gzip has an option to avoid that.  git archive uses an internal 
> gzip implementation that handles this.)


Same reason as above.


> 3. Meson does not support tar.bz2 archives.


Submitted https://github.com/mesonbuild/meson/pull/12770.

> 4. Meson uses git archive internally, but then unpacks and repacks the 
> archive, which loses the ability to use git get-tar-commit-id.


Because Meson allows projects to distribute arbitrary files via 
meson.add_dist_script(), and can include subprojects via `meson dist 
--include-subprojects`, this doesn't seem like an easily solvable 
problem.


> 5. I have found that the tar archives created by meson and git archive 
> include the files in different orders.  I suspect that the Python 
> tarfile module introduces some either randomness or platform dependency.


Seems likely.


> 6. meson dist is also slower because of the additional work.


Not easily solvable due to 4.

> 7. meson dist produces .sha256sum files but we have called them .sha256. 
>   (This is obviously trivial, but it is something that would need to be 
> dealt with somehow nonetheless.)

>
> Most or all of these issues are fixable, either upstream in Meson or by 
> adjusting our own requirements.  But for now this route would have some 
> significant disadvantages.


Thanks Peter. I will bring these up with upstream!


I think the solution to point 4 is to not unpack/repack if there are no 
dist scripts and/or subprojects to distribute. I can take a look at 
this later. I think this would also solve points 1, 2, 5, and 6 because 
at that point meson is just calling git-archive.


--
Tristan Partin
Neon (https://neon.tech)




Re: make dist using git archive

2024-01-24 Thread Tristan Partin

On Tue Jan 23, 2024 at 3:30 AM CST, Peter Eisentraut wrote:

On 22.01.24 21:04, Tristan Partin wrote:
> I am not really following why we can't use the builtin Meson dist 
> command. The only difference from my testing is it doesn't use a 
> --prefix argument.


Here are some problems I have identified:

1. meson dist internally runs gzip without the -n option.  That makes 
the tar.gz archive include a timestamp, which in turn makes it not 
reproducible.


2. Because gzip includes a platform indicator in the archive, the 
produced tar.gz archive is not reproducible across platforms.  (I don't 
know if gzip has an option to avoid that.  git archive uses an internal 
gzip implementation that handles this.)


3. Meson does not support tar.bz2 archives.

4. Meson uses git archive internally, but then unpacks and repacks the 
archive, which loses the ability to use git get-tar-commit-id.


5. I have found that the tar archives created by meson and git archive 
include the files in different orders.  I suspect that the Python 
tarfile module introduces some either randomness or platform dependency.


6. meson dist is also slower because of the additional work.

7. meson dist produces .sha256sum files but we have called them .sha256. 
  (This is obviously trivial, but it is something that would need to be 
dealt with somehow nonetheless.)


Most or all of these issues are fixable, either upstream in Meson or by 
adjusting our own requirements.  But for now this route would have some 
significant disadvantages.


Thanks Peter. I will bring these up with upstream!

--
Tristan Partin
Neon (https://neon.tech)




Re: make dist using git archive

2024-01-23 Thread Peter Eisentraut

On 22.01.24 21:04, Tristan Partin wrote:
I am not really following why we can't use the builtin Meson dist 
command. The only difference from my testing is it doesn't use a 
--prefix argument.


Here are some problems I have identified:

1. meson dist internally runs gzip without the -n option.  That makes 
the tar.gz archive include a timestamp, which in turn makes it not 
reproducible.


2. Because gzip includes a platform indicator in the archive, the 
produced tar.gz archive is not reproducible across platforms.  (I don't 
know if gzip has an option to avoid that.  git archive uses an internal 
gzip implementation that handles this.)


3. Meson does not support tar.bz2 archives.

4. Meson uses git archive internally, but then unpacks and repacks the 
archive, which loses the ability to use git get-tar-commit-id.


5. I have found that the tar archives created by meson and git archive 
include the files in different orders.  I suspect that the Python 
tarfile module introduces some either randomness or platform dependency.


6. meson dist is also slower because of the additional work.

7. meson dist produces .sha256sum files but we have called them .sha256. 
 (This is obviously trivial, but it is something that would need to be 
dealt with somehow nonetheless.)


Most or all of these issues are fixable, either upstream in Meson or by 
adjusting our own requirements.  But for now this route would have some 
significant disadvantages.






Re: make dist using git archive

2024-01-22 Thread Junwang Zhao
On Tue, Jan 23, 2024 at 2:36 AM Peter Eisentraut  wrote:
>
> On 22.01.24 13:10, Junwang Zhao wrote:
> > I played this with meson build on macOS, the packages are generated
> > in source root but not build root, I'm sure if this is by design but I think
> > polluting *working directory* is not good.
>
> Yes, it's not good, but I couldn't find a way to make it work.
>
> This is part of the complications with meson I referred to.  The
> @BUILD_ROOT@ placeholder in custom_target() is apparently always a
> relative path, but it doesn't know that git -C changes the current
> directory.
>
> > Another thing I'd like to point out is, should we also introduce *git 
> > commit*
> > or maybe *git tag* to package name, something like:
> >
> > git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o
> > postgresql-17devel-`git rev-parse --short HEAD`.tar.gz
> > git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o
> > postgresql-`git describe --tags`.tar.gz
>
> I'm not sure why we would need it built-in.  It can be done by hand, of
> course.

If this is only used by the release phase, one can do this by hand.

*commit id/tag* in package name can be used to identify the git source,
which might be useful for cooperation between QA and dev team,
but surely there are better ways for this, so I do not have a strong
opinion here.

>


-- 
Regards
Junwang Zhao




Re: make dist using git archive

2024-01-22 Thread Tristan Partin

On Mon Jan 22, 2024 at 1:31 AM CST, Peter Eisentraut wrote:

From 4b128faca90238d0a0bb6949a8050c2501d1bd67 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 20 Jan 2024 21:54:36 +0100
Subject: [PATCH v0] make dist uses git archive

---
 GNUmakefile.in | 34 --
 meson.build| 38 ++
 2 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/GNUmakefile.in b/GNUmakefile.in
index eba569e930e..3e04785ada2 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -87,29 +87,19 @@ update-unicode: | submake-generated-headers 
submake-libpgport
 distdir= postgresql-$(VERSION)
 dummy  = =install=
 
+GIT = git

+
 dist: $(distdir).tar.gz $(distdir).tar.bz2
-   rm -rf $(distdir)
-
-$(distdir).tar: distdir
-   $(TAR) chf $@ $(distdir)
-
-.INTERMEDIATE: $(distdir).tar
-
-distdir-location:
-   @echo $(distdir)
-
-distdir:
-   rm -rf $(distdir)* $(dummy)
-   for x in `cd $(top_srcdir) && find . \( -name CVS -prune \) -o \( -name 
.git -prune \) -o -print`; do \
- file=`expr X$$x : 'X\./\(.*\)'`; \
- if test -d "$(top_srcdir)/$$file" ; then \
-   mkdir "$(distdir)/$$file" && chmod 777 "$(distdir)/$$file"; \
- else \
-   ln "$(top_srcdir)/$$file" "$(distdir)/$$file" >/dev/null 2>&1 \
- || cp "$(top_srcdir)/$$file" "$(distdir)/$$file"; \
- fi || exit; \
-   done
-   $(MAKE) -C $(distdir) distclean
+
+.PHONY: check-dirty-index
+check-dirty-index:
+   $(GIT) diff-index --quiet HEAD
+
+$(distdir).tar.gz: check-dirty-index
+   $(GIT) archive --format tar.gz --prefix $(distdir)/ HEAD -o $@
+
+$(distdir).tar.bz2: check-dirty-index
+   $(GIT) -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 
--prefix $(distdir)/ HEAD -o $@
 
 distcheck: dist

rm -rf $(dummy)
diff --git a/meson.build b/meson.build
index c317144b6bc..f0d870c5192 100644
--- a/meson.build
+++ b/meson.build
@@ -3347,6 +3347,44 @@ run_target('help',
 
 
 
+###

+# Distribution archive
+###
+
+git = find_program('git', required: false, native: true, disabler: true)
+bzip2 = find_program('bzip2', required: false, native: true, disabler: true)


This doesn't need to be a disabler. git is fine as-is. See later 
comment. Disablers only work like you are expecting when they are used 
like how git is used. Once you call a method like .path(), all bets are 
off.



+distdir = meson.project_name() + '-' + meson.project_version()
+
+check_dirty_index = run_target('check-dirty-index',
+   command: [git, 'diff-index', '--quiet', 'HEAD'])


Seems like you might want to add -C here too?


+
+tar_gz = custom_target('tar.gz',
+  build_always_stale: true,
+  command: [git, '-C', '@SOURCE_ROOT@', 'archive',
+'--format', 'tar.gz',
+'--prefix', distdir + '/',
+'-o', '@BUILD_ROOT@/@OUTPUT@',
+'HEAD', '.'],
+  install: false,
+  output: distdir + '.tar.gz',
+)
+
+tar_bz2 = custom_target('tar.bz2',
+  build_always_stale: true,
+  command: [git, '-C', '@SOURCE_ROOT@', '-c', 'tar.tar.bz2.command=' + 
bzip2.path() + ' -c', 'archive',
+'--format', 'tar.bz2',
+'--prefix', distdir + '/',


-'-o', '@BUILD_ROOT@/@OUTPUT@',
+'-o', join_paths(meson.build_root(), '@OUTPUT@'),

This will generate the tarballs in the build directory. Do the same for 
the previous target. Tested locally.



+'HEAD', '.'],
+  install: false,
+  output: distdir + '.tar.bz2',
+)


The bz2 target should be wrapped in an `if bzip2.found()`. It is 
possible for git to be found, but not bzip2. I might also define the bz2 
command out of line. Also, you may want to add 
these programs to meson_options.txt for overriding, even though the 
"meson-ic" way is to use a machine file.



+
+alias_target('pgdist', [check_dirty_index, tar_gz, tar_bz2])


Are you intending for the check_dirty_index target to prohibit the other 
two targets from running? Currently that is not the case. If it is what 
you intend, use a stamp file or something to indicate a relationship. 
Alternatively, inline the git diff-index into the other commands. These 
might also do better as external scripts. It would reduce duplication 
between the autotools and Meson builds.



+
+
+
 ###
 # The End, The End, My Friend
 ###


I am not really following why we can't use the builtin Meson dist 
command. The only difference from my testing is it doesn't use 
a --prefix argument.


--
Tristan Partin
Neon (https://neon.tech)




Re: make dist using git archive

2024-01-22 Thread Peter Eisentraut

On 22.01.24 13:10, Junwang Zhao wrote:

I played this with meson build on macOS, the packages are generated
in source root but not build root, I'm sure if this is by design but I think
polluting *working directory* is not good.


Yes, it's not good, but I couldn't find a way to make it work.

This is part of the complications with meson I referred to.  The 
@BUILD_ROOT@ placeholder in custom_target() is apparently always a 
relative path, but it doesn't know that git -C changes the current 
directory.



Another thing I'd like to point out is, should we also introduce *git commit*
or maybe *git tag* to package name, something like:

git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o
postgresql-17devel-`git rev-parse --short HEAD`.tar.gz
git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o
postgresql-`git describe --tags`.tar.gz


I'm not sure why we would need it built-in.  It can be done by hand, of 
course.






Re: make dist using git archive

2024-01-22 Thread Junwang Zhao
Hi,

On Mon, Jan 22, 2024 at 3:32 PM Peter Eisentraut  wrote:
>
> One of the goals is to make the creation of the distribution tarball
> more directly traceable to the git repository.  That is why we removed
> the "make distprep" step.
>
> Here I want to take another step in that direction, by changing "make
> dist" to directly use "git archive", rather than the custom shell script
> it currently runs.
>
> The simple summary is that it would run
>
> git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o
> postgresql-17devel.tar.gz
>
> (with appropriate version numbers, of course), and that's the tarball we
> would ship.
>
> There are analogous commands for other compression formats.
>
> The actual command gets subtly more complicated if you need to run this
> in a separate build directory.  In my attached patch, the make version
> doesn't support vpath at the moment, just so that it's easier to
> understand for now.  The meson version is a bit hairy.
>
> I have studied and tested this quite a bit, and I have found that the
> archives produced this way are deterministic and reproducible, meaning
> for a given commit the result file should always be bit-for-bit identical.
>
> The exception is that if you use a git version older than 2.38.0, gzip
> records the platform in the archive, so you'd get a different output on
> Windows vs. macOS vs. "UNIX" (everything else).  In git 2.38.0, this was
> changed so that everything is recorded as "UNIX" now.  This is just
> something to keep in mind.  This issue is specific to the gzip format,
> it does not affect other compression formats.
>
> Meson has its own distribution building command (meson dist), but opted
> against using that.  The main problem is that the way they have
> implemented it, it is not deterministic in the above sense.  (Another
> point is of course that we probably want a "make" version for the time
> being.)
>
> But the target name "dist" in meson is reserved for that reason, so I
> needed to call the custom target "pgdist".
>
> I did take one idea from meson: It runs a check before git archive that
> the checkout is clean.  That way, you avoid mistakes because of
> uncommitted changes.  This works well in my "make" implementation.  In
> the meson implementation, I had to find a workaround, because a
> custom_target cannot have a dependency on a run_target.  As also
> mentioned above, the whole meson implementation is a bit ugly.
>
> Anyway,  with the attached patch you can do
>
>  make dist
>
> or
>
>  meson compile -C build pgdist

I played this with meson build on macOS, the packages are generated
in source root but not build root, I'm sure if this is by design but I think
polluting *working directory* is not good.

Another thing I'd like to point out is, should we also introduce *git commit*
or maybe *git tag* to package name, something like:

git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o
postgresql-17devel-`git rev-parse --short HEAD`.tar.gz
git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o
postgresql-`git describe --tags`.tar.gz

>
> and it produces the same set of tarballs as before, except it's done
> differently.
>
> The actual build scripts need some fine-tuning, but the general idea is
> correct, I think.

I think this is a good idea, thanks for working on this.


-- 
Regards
Junwang Zhao




make dist using git archive

2024-01-21 Thread Peter Eisentraut
One of the goals is to make the creation of the distribution tarball 
more directly traceable to the git repository.  That is why we removed 
the "make distprep" step.


Here I want to take another step in that direction, by changing "make 
dist" to directly use "git archive", rather than the custom shell script 
it currently runs.


The simple summary is that it would run

git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o 
postgresql-17devel.tar.gz


(with appropriate version numbers, of course), and that's the tarball we 
would ship.


There are analogous commands for other compression formats.

The actual command gets subtly more complicated if you need to run this 
in a separate build directory.  In my attached patch, the make version 
doesn't support vpath at the moment, just so that it's easier to 
understand for now.  The meson version is a bit hairy.


I have studied and tested this quite a bit, and I have found that the 
archives produced this way are deterministic and reproducible, meaning 
for a given commit the result file should always be bit-for-bit identical.


The exception is that if you use a git version older than 2.38.0, gzip 
records the platform in the archive, so you'd get a different output on 
Windows vs. macOS vs. "UNIX" (everything else).  In git 2.38.0, this was 
changed so that everything is recorded as "UNIX" now.  This is just 
something to keep in mind.  This issue is specific to the gzip format, 
it does not affect other compression formats.


Meson has its own distribution building command (meson dist), but opted 
against using that.  The main problem is that the way they have 
implemented it, it is not deterministic in the above sense.  (Another 
point is of course that we probably want a "make" version for the time 
being.)


But the target name "dist" in meson is reserved for that reason, so I 
needed to call the custom target "pgdist".


I did take one idea from meson: It runs a check before git archive that 
the checkout is clean.  That way, you avoid mistakes because of 
uncommitted changes.  This works well in my "make" implementation.  In 
the meson implementation, I had to find a workaround, because a 
custom_target cannot have a dependency on a run_target.  As also 
mentioned above, the whole meson implementation is a bit ugly.


Anyway,  with the attached patch you can do

make dist

or

meson compile -C build pgdist

and it produces the same set of tarballs as before, except it's done 
differently.


The actual build scripts need some fine-tuning, but the general idea is 
correct, I think.From 4b128faca90238d0a0bb6949a8050c2501d1bd67 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 20 Jan 2024 21:54:36 +0100
Subject: [PATCH v0] make dist uses git archive

---
 GNUmakefile.in | 34 --
 meson.build| 38 ++
 2 files changed, 50 insertions(+), 22 deletions(-)

diff --git a/GNUmakefile.in b/GNUmakefile.in
index eba569e930e..3e04785ada2 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -87,29 +87,19 @@ update-unicode: | submake-generated-headers 
submake-libpgport
 distdir= postgresql-$(VERSION)
 dummy  = =install=
 
+GIT = git
+
 dist: $(distdir).tar.gz $(distdir).tar.bz2
-   rm -rf $(distdir)
-
-$(distdir).tar: distdir
-   $(TAR) chf $@ $(distdir)
-
-.INTERMEDIATE: $(distdir).tar
-
-distdir-location:
-   @echo $(distdir)
-
-distdir:
-   rm -rf $(distdir)* $(dummy)
-   for x in `cd $(top_srcdir) && find . \( -name CVS -prune \) -o \( -name 
.git -prune \) -o -print`; do \
- file=`expr X$$x : 'X\./\(.*\)'`; \
- if test -d "$(top_srcdir)/$$file" ; then \
-   mkdir "$(distdir)/$$file" && chmod 777 "$(distdir)/$$file"; \
- else \
-   ln "$(top_srcdir)/$$file" "$(distdir)/$$file" >/dev/null 2>&1 \
- || cp "$(top_srcdir)/$$file" "$(distdir)/$$file"; \
- fi || exit; \
-   done
-   $(MAKE) -C $(distdir) distclean
+
+.PHONY: check-dirty-index
+check-dirty-index:
+   $(GIT) diff-index --quiet HEAD
+
+$(distdir).tar.gz: check-dirty-index
+   $(GIT) archive --format tar.gz --prefix $(distdir)/ HEAD -o $@
+
+$(distdir).tar.bz2: check-dirty-index
+   $(GIT) -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 
--prefix $(distdir)/ HEAD -o $@
 
 distcheck: dist
rm -rf $(dummy)
diff --git a/meson.build b/meson.build
index c317144b6bc..f0d870c5192 100644
--- a/meson.build
+++ b/meson.build
@@ -3347,6 +3347,44 @@ run_target('help',
 
 
 
+###
+# Distribution archive
+###
+
+git = find_program('git', required: false, native: true, disabler: true)
+bzip2 = find_program('bzip2', required: false, native: true, disabler: true)
+
+distdir = meson.project_name() + '-' + meson.project_version()
+
+check_dirty_index = run_target('check-dirty-index'