Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-05 Thread Junio C Hamano
René Scharfe  writes:

>> One practical problem is that users who do this
>> 
>>  $ git archive HEAD Documentation/ | tar tf -
>> 
>> would be expecting (at least) two different things, depending on the
>> situation they are in.
>> 
>> So at least you'd need an "--include-untracked" option, I guess.
>
> Right, this breaks down with directories -- most build artifacts (e.g.
> .o files) are probably not meant to end up in archives.

I agree that it is unwise to overload the pathspec for this purpose.
Perhaps the bulk of the documentation of a project is in javadoc in its
source code and extracted into some directory, where the user would
want to include untracked things as well as tracked ones, while
untracked contents of other directories are all not meant to be
packaged.  As "git archive" is primarily about freezing the contents
of a set of paths in a single revision into an archive, and
including untracked things is secondary, perhaps the right way to do
so would be to:

 (1) leave pathspec as-is---they mean "only this area of the named
 revision goes into the resulting archive", and 

 (2) introduce a new "--add-untracked=" option that can be given
 multiple times, is cumulative, and specifies which untracked
 paths from the working tree contents are to be included in the
 result.

So

git archive \
--add-untracked=./configure \
--add-untracked='Documentation/**/*.html' \
--add-untracked='Documentation/*.[1-9]' \
HEAD -- . ':!contrib/' ':!t/'

might be a way to package up sources we use without tests but
include the built documentation files.
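
A minimal sketch of how the accumulated patterns could select untracked paths, assuming plain fnmatch(3) with no flags (so '*' also crosses '/'); Git's real pathspec matching is more elaborate. The names matches_any_pattern() and select_untracked() are hypothetical and do not exist in Git:

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Return 1 if `path` matches at least one of the `n` glob patterns.
 * Assumption: fnmatch(3) with flags == 0, so '*' also matches '/'.
 * This only approximates Git's own '**' pathspec rules.
 */
static int matches_any_pattern(const char *path,
			       const char *const *patterns, size_t n)
{
	size_t i;

	assert(path != NULL);
	for (i = 0; i < n; i++)
		if (fnmatch(patterns[i], path, 0) == 0)
			return 1;
	return 0;
}

/*
 * Copy into `out` the candidate paths selected by the accumulated
 * patterns and return how many were selected.  `out` must have room
 * for `n_paths` entries.
 */
static size_t select_untracked(const char *const *paths, size_t n_paths,
			       const char *const *patterns, size_t n_patterns,
			       const char **out)
{
	size_t i, kept = 0;

	for (i = 0; i < n_paths; i++)
		if (matches_any_pattern(paths[i], patterns, n_patterns))
			out[kept++] = paths[i];
	return kept;
}
```

With the three patterns from the command above, a candidate list containing a stray object file would keep configure and the built documentation and drop the object file.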


Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-05 Thread René Scharfe
Am 04.01.2018 um 19:22 schrieb Junio C Hamano:
> René Scharfe  writes:
> 
>> I don't know if it's a good idea, but perhaps we don't even need a new
>> option.  We could change how pathspecs of untracked files are handled:
>> Instead of aborting we could include them in the archive.  (Sounds like
>> the simplest possible interface, but may have practical problems.)
> 
> One practical problem is that users who do this
> 
>  $ git archive HEAD Documentation/ | tar tf -
> 
> would be expecting (at least) two different things, depending on the
> situation they are in.
> 
> So at least you'd need an "--include-untracked" option, I guess.

Right, this breaks down with directories -- most build artifacts (e.g.
.o files) are probably not meant to end up in archives.  We could still
do it for regular files and symlinks.  Perhaps that's too confusing,
though, and an --add-untracked-file parameter (or whatever we want to
call it) is the way to go.

René


Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-04 Thread suzuki toshiya

Dear Junio,

Could you tell me your thoughts on which way I should go?
Do you agree with his suggestion that "--uid etc. is not the right
solution; --include-untracked is better and more generic"? Or
should I keep working on "--uid etc."?

Regards,
mpsuzuki

Junio C Hamano wrote:

René Scharfe  writes:


I don't know if it's a good idea, but perhaps we don't even need a new
option.  We could change how pathspecs of untracked files are handled:
Instead of aborting we could include them in the archive.  (Sounds like
the simplest possible interface, but may have practical problems.)


One practical problem is that users who do this

$ git archive HEAD Documentation/ | tar tf -

would be expecting (at least) two different things, depending on the
situation they are in.

So at least you'd need an "--include-untracked" option, I guess.





Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-04 Thread Junio C Hamano
René Scharfe  writes:

> I don't know if it's a good idea, but perhaps we don't even need a new
> option.  We could change how pathspecs of untracked files are handled:
> Instead of aborting we could include them in the archive.  (Sounds like
> the simplest possible interface, but may have practical problems.)

One practical problem is that users who do this

$ git archive HEAD Documentation/ | tar tf -

would be expecting (at least) two different things, depending on the
situation they are in.

So at least you'd need an "--include-untracked" option, I guess.


Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-04 Thread René Scharfe
Am 04.01.2018 um 03:25 schrieb suzuki toshiya:
> Glancing at parse-options.h, I could not find an
> existing facility that collects the operands of multiple
> "--xxx=yyy" options into an array (or linked list). The
> closest thing might be the collecting of pathnames into a
> pathspec structure. Should I write something with OPTION_CALLBACK?

There is OPT_STRING_LIST; Documentation/technical/api-parse-options.txt
says:

  `OPT_STRING_LIST(short, long,  string_list, arg_str, description)`::
  Introduce an option with string argument.
  The string argument is stored as an element in `string_list`.
  Use of `--no-option` will clear the list of preceding values.

I don't know if it's a good idea, but perhaps we don't even need a new
option.  We could change how pathspecs of untracked files are handled:
Instead of aborting we could include them in the archive.  (Sounds like
the simplest possible interface, but may have practical problems.)

René


Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-03 Thread suzuki toshiya

Hi,


Hmm, it seems reasonable to assume that --append-file
would serve more cases than the --uid and --gid options. There
may be many people who don't care about multiple UIDs/GIDs in
the source tarball, but who want to append some files to the
archive generated by git-archive. I will look into how
to do that. One thing I'm afraid of is that some people may
ask to pass a file listing the pathnames instead of
giving many --append-file options (and a few people might
want a built-in default list specified by GNU
convention :-)).


Glancing at parse-options.h, I could not find an
existing facility that collects the operands of multiple
"--xxx=yyy" options into an array (or linked list). The
closest thing might be the collecting of pathnames into a
pathspec structure. Should I write something with OPTION_CALLBACK?

Regards,
mpsuzuki

suzuki toshiya wrote:

Dear René ,

Having overlooked your response, I was writing a patch to add
uid/gid to zip archives X-D (not finished yet)
https://github.com/mpsuzuki/git/tree/add-zip-uid-gid
However, I found that most Unix platforms use Info-ZIP's
extension to store uid/gid instead of PKZIP's extension...


So this is in the context of generating release tarballs that contain
untracked files as well.  That's done in Git's own Makefile, too:


Oh, I should check other software's tarballs :-)


The generated archive leaks the IDs of the user preparing the archive in
the appended entries for untracked files.  I think that's more of a
concern.  Publishing a valid non-root username on your build system may
invite attackers.


Hmm, I was not aware of such a security concern about
tarballs that include the developer's username.


So how about making it possible to append untracked files using git
archive?  This could simplify the dist target for Git as well.  It's
orthogonal to adding the ability to explicitly specify owner and group,
but might suffice in most (all?) cases.


Hmm, it seems reasonable to assume that --append-file
would serve more cases than the --uid and --gid options. There
may be many people who don't care about multiple UIDs/GIDs in
the source tarball, but who want to append some files to the
archive generated by git-archive. I will look into how
to do that. One thing I'm afraid of is that some people may
ask to pass a file listing the pathnames instead of
giving many --append-file options (and a few people might
want a built-in default list specified by GNU
convention :-)).

I would like to hear other experts' comments: is there no need for
me to work on "--uid" and "--gid" anymore, and should I switch to
an "--append-file" option?

Regards,
mpsuzuki

René Scharfe wrote:

Am 02.01.2018 um 07:58 schrieb suzuki toshiya:

Dear René ,

René Scharfe wrote:

Am 29.12.2017 um 15:05 schrieb suzuki toshiya:

The ownership of files created by git-archive is always
root:root. Add --owner and --group options which work
like the GNU tar equivalent to allow overriding these
defaults.

In which situations do you use the new options?

(The sender would need to know the names and/or IDs on the receiving
end.  And the receiver would need to be root to set both IDs, or be a
group member to set the group ID; I guess the latter is more common.)

Thank you for asking the background.

In the case that additional contents are appended to the tar file
generated by git-archive, the part by git-archive and the part
appended by common tar would have different UID/GID, because common
tar preserves the UID/GID of the original files.

Of course, both GNU tar and bsdtar have options to set
UID/GID manually, but their syntax differs.

In a recent source package of poppler (poppler.freedesktop.org),
2 sets of UID/GIDs are found:
https://poppler.freedesktop.org/poppler-0.62.0.tar.xz

I've discussed with the maintainers of poppler, and there was a
suggestion to propose a feature to git.
https://lists.freedesktop.org/archives/poppler/2017-December/012739.html

So this is in the context of generating release tarballs that contain
untracked files as well.  That's done in Git's own Makefile, too:

  dist: git-archive$(X) configure
  ./git-archive --format=tar \
  --prefix=$(GIT_TARNAME)/ HEAD^{tree} > $(GIT_TARNAME).tar
  @mkdir -p $(GIT_TARNAME)
  @cp configure $(GIT_TARNAME)
  @echo $(GIT_VERSION) > $(GIT_TARNAME)/version
  @$(MAKE) -C git-gui TARDIR=../$(GIT_TARNAME)/git-gui dist-version
  $(TAR) rf $(GIT_TARNAME).tar \
  $(GIT_TARNAME)/configure \
  $(GIT_TARNAME)/version \
  $(GIT_TARNAME)/git-gui/version
  @$(RM) -r $(GIT_TARNAME)
  gzip -f -9 $(GIT_TARNAME).tar

Having files with different owners and groups is a non-issue when
extracting with --no-same-owner, which is the default for regular users.
I assume this covers most use cases in the wild.

The generated archive leaks the IDs of the user preparing the archive in
the appended entries for untracked files.  I think that's more 

Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-03 Thread suzuki toshiya

Dear René ,

Having overlooked your response, I was writing a patch to add
uid/gid to zip archives X-D (not finished yet)
https://github.com/mpsuzuki/git/tree/add-zip-uid-gid
However, I found that most Unix platforms use Info-ZIP's
extension to store uid/gid instead of PKZIP's extension...


So this is in the context of generating release tarballs that contain
untracked files as well.  That's done in Git's own Makefile, too:


Oh, I should check other software's tarballs :-)


The generated archive leaks the IDs of the user preparing the archive in
the appended entries for untracked files.  I think that's more of a
concern.  Publishing a valid non-root username on your build system may
invite attackers.


Hmm, I was not aware of such a security concern about
tarballs that include the developer's username.


So how about making it possible to append untracked files using git
archive?  This could simplify the dist target for Git as well.  It's
orthogonal to adding the ability to explicitly specify owner and group,
but might suffice in most (all?) cases.


Hmm, it seems reasonable to assume that --append-file
would serve more cases than the --uid and --gid options. There
may be many people who don't care about multiple UIDs/GIDs in
the source tarball, but who want to append some files to the
archive generated by git-archive. I will look into how
to do that. One thing I'm afraid of is that some people may
ask to pass a file listing the pathnames instead of
giving many --append-file options (and a few people might
want a built-in default list specified by GNU
convention :-)).

I would like to hear other experts' comments: is there no need for
me to work on "--uid" and "--gid" anymore, and should I switch to
an "--append-file" option?

Regards,
mpsuzuki

René Scharfe wrote:

Am 02.01.2018 um 07:58 schrieb suzuki toshiya:

Dear René ,

René Scharfe wrote:

Am 29.12.2017 um 15:05 schrieb suzuki toshiya:

The ownership of files created by git-archive is always
root:root. Add --owner and --group options which work
like the GNU tar equivalent to allow overriding these
defaults.

In which situations do you use the new options?

(The sender would need to know the names and/or IDs on the receiving
end.  And the receiver would need to be root to set both IDs, or be a
group member to set the group ID; I guess the latter is more common.)

Thank you for asking the background.

In the case that additional contents are appended to the tar file
generated by git-archive, the part by git-archive and the part
appended by common tar would have different UID/GID, because common
tar preserves the UID/GID of the original files.

Of course, both GNU tar and bsdtar have options to set
UID/GID manually, but their syntax differs.

In a recent source package of poppler (poppler.freedesktop.org),
2 sets of UID/GIDs are found:
https://poppler.freedesktop.org/poppler-0.62.0.tar.xz

I've discussed with the maintainers of poppler, and there was a
suggestion to propose a feature to git.
https://lists.freedesktop.org/archives/poppler/2017-December/012739.html


So this is in the context of generating release tarballs that contain
untracked files as well.  That's done in Git's own Makefile, too:

  dist: git-archive$(X) configure
  ./git-archive --format=tar \
  --prefix=$(GIT_TARNAME)/ HEAD^{tree} > $(GIT_TARNAME).tar
  @mkdir -p $(GIT_TARNAME)
  @cp configure $(GIT_TARNAME)
  @echo $(GIT_VERSION) > $(GIT_TARNAME)/version
  @$(MAKE) -C git-gui TARDIR=../$(GIT_TARNAME)/git-gui dist-version
  $(TAR) rf $(GIT_TARNAME).tar \
  $(GIT_TARNAME)/configure \
  $(GIT_TARNAME)/version \
  $(GIT_TARNAME)/git-gui/version
  @$(RM) -r $(GIT_TARNAME)
  gzip -f -9 $(GIT_TARNAME).tar

Having files with different owners and groups is a non-issue when
extracting with --no-same-owner, which is the default for regular users.
I assume this covers most use cases in the wild.

The generated archive leaks the IDs of the user preparing the archive in
the appended entries for untracked files.  I think that's more of a
concern.  Publishing a valid non-root username on your build system may
invite attackers.

Changing the build procedure to set owner and group to root as well as
UID and GID to zero seems like a better idea.  This is complicated by
the inconsistent command line options for GNU tar and bsdtar, as you
mentioned.

So how about making it possible to append untracked files using git
archive?  This could simplify the dist target for Git as well.  It's
orthogonal to adding the ability to explicitly specify owner and group,
but might suffice in most (all?) cases.

Not sure what kind of file name transformation abilities would be
needed and how to package them nicely.  The --transform option of GNU
tar with its sed replace expressions seems quite heavy for me.  With
poppler it's only used to add the --prefix string; I'd expect that to
be done for all 

Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-03 Thread René Scharfe
[replying only to the list because emails to per...@pluto.rain.com
 are rejected by my mail server with the following error message:
 "Requested action not taken: mailbox unavailable
  invalid DNS MX or A/AAAA resource record."]

Am 02.01.2018 um 01:32 schrieb Perry Hutchison:
> René Scharfe  wrote:
>> Am 29.12.2017 um 15:05 schrieb suzuki toshiya:
>>> The ownership of files created by git-archive is always
>>> root:root. Add --owner and --group options which work
>>> like the GNU tar equivalent to allow overriding these
>>> defaults.
>> ... the receiver would need to be root to set both IDs, or be a
>> group member to set the group ID; I guess the latter is more common.
> 
> If the received files are owned by root:root as stated, I guess the
> receiver must be running as root, no?

That depends on what you mean by "must".  Users who want the files
they extract to be owned by root need root permissions on Unix and
Linux.  If they are OK with owning the files themselves then regular
user accounts suffice.  I assume the latter is much more common.

René


Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-02 Thread René Scharfe
Am 02.01.2018 um 07:58 schrieb suzuki toshiya:
> Dear René ,
> 
> René Scharfe wrote:
>> Am 29.12.2017 um 15:05 schrieb suzuki toshiya:
>>> The ownership of files created by git-archive is always
>>> root:root. Add --owner and --group options which work
>>> like the GNU tar equivalent to allow overriding these
>>> defaults.
>>
>> In which situations do you use the new options?
>>
>> (The sender would need to know the names and/or IDs on the receiving
>> end.  And the receiver would need to be root to set both IDs, or be a
>> group member to set the group ID; I guess the latter is more common.)
> 
> Thank you for asking the background.
> 
> In the case that additional contents are appended to the tar file
> generated by git-archive, the part by git-archive and the part
> appended by common tar would have different UID/GID, because common
> tar preserves the UID/GID of the original files.
> 
> Of course, both GNU tar and bsdtar have options to set
> UID/GID manually, but their syntax differs.
> 
> In a recent source package of poppler (poppler.freedesktop.org),
> 2 sets of UID/GIDs are found:
> https://poppler.freedesktop.org/poppler-0.62.0.tar.xz
> 
> I've discussed with the maintainers of poppler, and there was a
> suggestion to propose a feature to git.
> https://lists.freedesktop.org/archives/poppler/2017-December/012739.html

So this is in the context of generating release tarballs that contain
untracked files as well.  That's done in Git's own Makefile, too:

  dist: git-archive$(X) configure
  ./git-archive --format=tar \
  --prefix=$(GIT_TARNAME)/ HEAD^{tree} > $(GIT_TARNAME).tar
  @mkdir -p $(GIT_TARNAME)
  @cp configure $(GIT_TARNAME)
  @echo $(GIT_VERSION) > $(GIT_TARNAME)/version
  @$(MAKE) -C git-gui TARDIR=../$(GIT_TARNAME)/git-gui dist-version
  $(TAR) rf $(GIT_TARNAME).tar \
  $(GIT_TARNAME)/configure \
  $(GIT_TARNAME)/version \
  $(GIT_TARNAME)/git-gui/version
  @$(RM) -r $(GIT_TARNAME)
  gzip -f -9 $(GIT_TARNAME).tar

Having files with different owners and groups is a non-issue when
extracting with --no-same-owner, which is the default for regular users.
I assume this covers most use cases in the wild.

The generated archive leaks the IDs of the user preparing the archive in
the appended entries for untracked files.  I think that's more of a
concern.  Publishing a valid non-root username on your build system may
invite attackers.

Changing the build procedure to set owner and group to root as well as
UID and GID to zero seems like a better idea.  This is complicated by
the inconsistent command line options for GNU tar and bsdtar, as you
mentioned.

So how about making it possible to append untracked files using git
archive?  This could simplify the dist target for Git as well.  It's
orthogonal to adding the ability to explicitly specify owner and group,
but might suffice in most (all?) cases.

Not sure what kind of file name transformation abilities would be
needed and how to package them nicely.  The --transform option of GNU
tar with its sed replace expressions seems quite heavy for me.  With
poppler it's only used to add the --prefix string; I'd expect that to
be done for all appended files anyway.

Perhaps something like --append-file= with no transformation
feature is already enough for most cases?

>> Would it make sense to support the new options for ZIP files as well?
> 
> I was not aware that UID/GID could be stored in the pkzip file format...
> Oh, checking APPNOTE.TXT ( 
> https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT ),
> there is a place for them! (see 4.5.7-Unix Extra Field). But it seems
> that current git-archive emits pkzip without the field.

Indeed.  Git doesn't track owners and groups, so it doesn't make
sense to emit that kind of information so far.  If git archive grows
options to specify such meta-information then it should be supported
by all archive formats (or documented to be tar-specific).

However, the UNIX Extra Field in ZIP archives only allows to store
UID and GID (no names), which is useless unless the sender knows the
ID range of the receiver -- which is unlikely when distributing
software on the Internet.  And even then it won't work with Windows,
which has long Security Identifiers (SIDs) instead.
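
A minimal sketch of what emitting that field could look like, assuming the layout in APPNOTE.TXT section 4.5.7 as I read it (tag 0x000d, a two-byte size, four-byte access and modification times, then two-byte UID and GID, all little-endian). write_unix_extra() is a hypothetical helper, not something archive-zip.c provides:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZIP_EXTRA_UNIX_TAG  0x000d	/* header ID, per APPNOTE.TXT 4.5.7 (assumed) */
#define ZIP_EXTRA_UNIX_SIZE 16		/* 4-byte header + 12 bytes of fixed data */

static void put_le16(unsigned char *p, uint16_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
}

static void put_le32(unsigned char *p, uint32_t v)
{
	put_le16(p, v & 0xffff);
	put_le16(p + 2, v >> 16);
}

/*
 * Write a UNIX extra field without the optional variable-length data.
 * There is room for numeric IDs only, and only 16 bits each; no names.
 * Returns the number of bytes written, or 0 if an ID does not fit.
 */
static size_t write_unix_extra(unsigned char *buf, uint32_t atime,
			       uint32_t mtime, unsigned long uid,
			       unsigned long gid)
{
	assert(buf != NULL);
	if (uid > 0xffff || gid > 0xffff)
		return 0;
	put_le16(buf, ZIP_EXTRA_UNIX_TAG);
	put_le16(buf + 2, 12);		/* size of the data that follows */
	put_le32(buf + 4, atime);
	put_le32(buf + 8, mtime);
	put_le16(buf + 12, (uint16_t)uid);
	put_le16(buf + 14, (uint16_t)gid);
	return ZIP_EXTRA_UNIX_SIZE;
}
```

The 16-bit limit on the IDs and the absence of any name field are what make this field of little use for software distributed to unknown receivers.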

So these are more advantages for letting git archive append untracked
files: It's format-agnostic and more portable.


[snipped interesting history of security-related tar options]

Btw. I like how bsdtar has --insecure as a synonym for -p (preserve
file permissions when extracting).  It's a bit sad that this is still
the default for root, though.  OpenBSD cut that behavior out of their
tar almost 20 years ago.  (An evil tar archive could be used to fill
the quota of unsuspecting users, or add setuid executables.)

>>> +#if ULONG_MAX > 0xUL
>>> +    /*
>>> + * --owner, --group rejects uid/gid 

Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-01 Thread suzuki toshiya

Dear René ,

René Scharfe wrote:

Am 29.12.2017 um 15:05 schrieb suzuki toshiya:

The ownership of files created by git-archive is always
root:root. Add --owner and --group options which work
like the GNU tar equivalent to allow overriding these
defaults.


In which situations do you use the new options?

(The sender would need to know the names and/or IDs on the receiving
end.  And the receiver would need to be root to set both IDs, or be a
group member to set the group ID; I guess the latter is more common.)


Thank you for asking the background.

In the case that additional contents are appended to the tar file
generated by git-archive, the part by git-archive and the part
appended by common tar would have different UID/GID, because common
tar preserves the UID/GID of the original files.

Of course, both GNU tar and bsdtar have options to set
UID/GID manually, but their syntax differs.

In a recent source package of poppler (poppler.freedesktop.org),
2 sets of UID/GIDs are found:
https://poppler.freedesktop.org/poppler-0.62.0.tar.xz

I've discussed with the maintainers of poppler, and there was a
suggestion to propose a feature to git.
https://lists.freedesktop.org/archives/poppler/2017-December/012739.html

So now I'm trying.


Would it make sense to support the new options for ZIP files as well?


I was not aware that UID/GID could be stored in the pkzip file format...
Oh, checking APPNOTE.TXT ( 
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT ),
there is a place for them! (see 4.5.7-Unix Extra Field). But it seems
that current git-archive emits pkzip without the field.

The reason I proposed the options for the tar format is described
above. Do pkzip users want something similar? If it's needed,
I will try.


+--owner=[:]::
+   Force  as owner and  as uid for the files in the tar
+   archive.  If  is not supplied,  can be either a user
+   name or numeric UID.  In this case the missing part (UID or
+   name) will be inferred from the current host's user database.
+
+--group=[:]::
+   Force  as group and  as gid for the files in the tar
+   archive.  If  is not supplied,  can be either a group
+   name or numeric GID.  In this case the missing part (GID or
+   name) will be inferred from the current host's group database.
+


IIUC the default behavior is kept, i.e. without these options the
archive entries appear to be owned by root:root.  I think it's a good
idea to mention this here.


Indeed. The default behaviour of git-archive without these options
(root:root) differs from that of (common) tar (which preserves the
uid/gid of the files being archived), so it should be clarified.


bsdtar has --uname, --uid, --gname, and --gid, which seem simpler.  At
least you could use OPT_STRING and OPT_INTEGER with them (plus a range
check).  And they should be easier to explain.


Thank you very much for proposing a good alternative. Indeed, such
well-separated options make the code simple & stable. However, according
to the manual search system of FreeBSD ( https://www.freebsd.org/cgi/man.cgi ),
the options for this functionality are not the same in every release.

FreeBSD 8.2 and earlier: --uname, --gname, --uid, --gid are unavailable
(it seems that using "mtree" was the preferred way to specify them).

FreeBSD 8.3 and later: --uname, --gname, --uid, --gid are available.
The manual says the following:

 --uid id
 Use the provided user id number and ignore the user name from the
 archive.  On create, if --uname is not also specified, the user
 name will be set to match the user id.

 --uname name
 Use the provided user name.  On extract, this overrides the user
 name in the archive; if the provided user name does not exist on
 the system, it will be ignored and the user id (from the archive
 or from the --uid option) will be used instead.  On create, this
 sets the user name that will be stored in the archive; the name
 is not verified against the system user database.

Thus, to emulate (post-2012) bsdtar perfectly, getpwnam(), getpwuid() etc.
would still be needed to implement "--uid" X-(.
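
A minimal sketch of the create-time lookup this implies, assuming the bsdtar behaviour quoted above (with --uid alone, the stored name is taken from the local user database). uname_from_uid() is a hypothetical helper name; getpwuid(3) is the standard POSIX call:

```c
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Fill `uname` for a create-time "--uid": use the name the local user
 * database has for `uid`, or leave it empty when there is none.
 * Returns 1 if a name was found, 0 otherwise.
 */
static int uname_from_uid(uid_t uid, char *uname, size_t size)
{
	const struct passwd *pw;

	assert(uname != NULL && size > 0);
	pw = getpwuid(uid);
	if (!pw) {
		uname[0] = '\0';
		return 0;
	}
	/* snprintf() truncates safely if the name is longer than the buffer */
	snprintf(uname, size, "%s", pw->pw_name);
	return 1;
}
```

The result depends on the host that builds the archive, which is exactly the coupling to the local user database that separate --uname and --uid options were hoped to avoid.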

Having tracked the history of bsdtar, maybe I should track the history of GNU
tar too. According to the ChangeLog, even --owner and --group are relatively
new options, available since 1.13.18 (released on 2000-10-29). The original
syntax was like this.

`--owner=USER'
 Specifies that `tar' should use USER as the owner of members when
 creating archives, instead of the user associated with the source
 file.  USER is first decoded as a user symbolic name, but if this
 interpretation fails, it has to be a decimal numeric user ID.

 There is no value indicating a missing number, and `0' usually
 means `root'.  Some people like to force `0' as the value to offer
 in their distributions for the owner of files, because the `root'
 user is anonymous anyway, so 

Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-01 Thread Perry Hutchison
René Scharfe  wrote:
> Am 29.12.2017 um 15:05 schrieb suzuki toshiya:
> > The ownership of files created by git-archive is always
> > root:root. Add --owner and --group options which work
> > like the GNU tar equivalent to allow overriding these
> > defaults.
> ... the receiver would need to be root to set both IDs, or be a
> group member to set the group ID; I guess the latter is more common.

If the received files are owned by root:root as stated, I guess the
receiver must be running as root, no?


Re: [PATCH] git-archive: accept --owner and --group like GNU tar

2018-01-01 Thread René Scharfe
Am 29.12.2017 um 15:05 schrieb suzuki toshiya:
> The ownership of files created by git-archive is always
> root:root. Add --owner and --group options which work
> like the GNU tar equivalent to allow overriding these
> defaults.

In which situations do you use the new options?

(The sender would need to know the names and/or IDs on the receiving
end.  And the receiver would need to be root to set both IDs, or be a
group member to set the group ID; I guess the latter is more common.)

> Signed-off-by: suzuki toshiya 
> ---
>   Documentation/git-archive.txt |  13 +++
>   archive-tar.c |   8 +-
>   archive.c | 224 ++
>   archive.h |   4 +
>   t/t5005-archive-uid-gid.sh| 140 ++
>   t/t5005/parse-tar-file.py |  60 +++
>   tar.h |   2 +
>   7 files changed, 447 insertions(+), 4 deletions(-)
>   create mode 100755 t/t5005-archive-uid-gid.sh
>   create mode 100755 t/t5005/parse-tar-file.py

Would it make sense to support the new options for ZIP files as well?

> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> index cfa1e4ebe..0d156f6c1 100644
> --- a/Documentation/git-archive.txt
> +++ b/Documentation/git-archive.txt
> @@ -11,6 +11,7 @@ SYNOPSIS
>   [verse]
>   'git archive' [--format=] [--list] [--prefix=/] []
> [-o  | --output=] [--worktree-attributes]
> +   [--owner [username[:uid]] [--group [groupname[:gid]]
> [--remote= [--exec=]] 
> [...]
>   
> @@ -63,6 +64,18 @@ OPTIONS
>   This can be any options that the archiver backend understands.
>   See next section.
>   
> +--owner=[:]::
> + Force  as owner and  as uid for the files in the tar
> + archive.  If  is not supplied,  can be either a user
> + name or numeric UID.  In this case the missing part (UID or
> + name) will be inferred from the current host's user database.
> +
> +--group=[:]::
> + Force  as group and  as gid for the files in the tar
> + archive.  If  is not supplied,  can be either a group
> + name or numeric GID.  In this case the missing part (GID or
> + name) will be inferred from the current host's group database.
> +

IIUC the default behavior is kept, i.e. without these options the
archive entries appear to be owned by root:root.  I think it's a good
idea to mention this here.
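
A minimal sketch of what "owned by root:root" means at the header level, assuming the POSIX ustar field sizes (8 bytes each for uid and gid, 32 bytes each for uname and gname). struct owner_fields and set_owner() are hypothetical and cover only the owner-related part of the header, not Git's struct ustar_header:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Only the owner-related ustar fields; sizes follow the POSIX ustar layout. */
struct owner_fields {
	char uid[8];	/* seven octal digits plus NUL */
	char gid[8];
	char uname[32];
	char gname[32];
};

/*
 * Store the given owner and group in the header fields.
 * Returns 0 on success, -1 if a value does not fit its field.
 */
static int set_owner(struct owner_fields *h, unsigned long uid,
		     unsigned long gid, const char *uname, const char *gname)
{
	assert(h && uname && gname);
	/* seven octal digits hold at most 07777777 */
	if (uid > 07777777UL || gid > 07777777UL)
		return -1;
	if (strlen(uname) >= sizeof(h->uname) ||
	    strlen(gname) >= sizeof(h->gname))
		return -1;

	memset(h, 0, sizeof(*h));
	snprintf(h->uid, sizeof(h->uid), "%07lo", uid);
	snprintf(h->gid, sizeof(h->gid), "%07lo", gid);
	strcpy(h->uname, uname);	/* lengths were checked above */
	strcpy(h->gname, gname);
	return 0;
}
```

Calling set_owner(&h, 0, 0, "root", "root") reproduces the current default; any other arguments correspond to what the proposed options would override.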

bsdtar has --uname, --uid, --gname, and --gid, which seem simpler.  At
least you could use OPT_STRING and OPT_INTEGER with them (plus a range
check).  And they should be easier to explain.

> diff --git a/archive.c b/archive.c
> index 0b7b62af0..aa4b16b75 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -8,6 +8,7 @@
>   #include "parse-options.h"
>   #include "unpack-trees.h"
>   #include "dir.h"
> +#include "tar.h"
>   
>   static char const * const archive_usage[] = {
>   N_("git archive []  [...]"),
> @@ -417,6 +418,223 @@ static void parse_treeish_arg(const char **argv,
>   { OPTION_SET_INT, (s), NULL, (v), NULL, "", \
> PARSE_OPT_NOARG | PARSE_OPT_NONEG | PARSE_OPT_HIDDEN, NULL, (p) }
>   
> +/*
> + * GNU tar --owner, --group options reject hexdigit, signed int values.
> + * strtol(), atoi() are too permissive to simulate the behaviour.
> + */
> +#define STR_IS_DIGIT_OK 0
> +#define STR_IS_NOT_DIGIT -1
> +#define STR_IS_DIGIT_TOO_LARGE -2
> +
> +static int try_as_simple_digit(const char *s, unsigned long *dst)
> +{
> + unsigned long ul;
> + char *endptr;
> +
> + if (strlen(s) != strspn(s, "0123456789"))
> + return STR_IS_NOT_DIGIT;
> +
> + errno = 0;
> + ul = strtoul(s, , 10);
> +
> + /* catch ERANGE */
> + if (errno) {
> + errno = 0;
> + return STR_IS_DIGIT_TOO_LARGE;
> + }
> +
> +#if ULONG_MAX > 0xUL
> + /*
> +  * --owner, --group rejects uid/gid greater than 32-bit
> +  * limits, even on 64-bit platforms.
> +  */
> + if (ul > 0xUL)
> + return STR_IS_DIGIT_TOO_LARGE;
> +#endif

The #if is not really necessary, is it?  Compilers should be able to
optimize the conditional out on 32-bit platforms.
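
A minimal sketch of the same check written without the preprocessor guard. The bound 0xFFFFFFFFUL is an assumption taken from the "32-bit limits" comment, since the constant is not visible in the quoted hunk; parse_decimal_id() and the ID_* codes are hypothetical names, and unlike the quoted code this version also rejects an empty string:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define ID_OK          0
#define ID_NOT_DIGITS -1
#define ID_TOO_LARGE  -2

/* Assumed bound, from the "32-bit limits" comment in the patch. */
#define ID_MAX 0xFFFFFFFFUL

static int parse_decimal_id(const char *s, unsigned long *dst)
{
	unsigned long ul;

	assert(s != NULL && dst != NULL);
	/* reject empty strings, signs, whitespace and hex digits up front */
	if (!*s || strlen(s) != strspn(s, "0123456789"))
		return ID_NOT_DIGITS;

	errno = 0;
	ul = strtoul(s, NULL, 10);
	if (errno == ERANGE)
		return ID_TOO_LARGE;

	/*
	 * Where unsigned long is 32 bits wide this is always false and
	 * the compiler can drop it, so no #if is needed.
	 */
	if (ul > ID_MAX)
		return ID_TOO_LARGE;

	*dst = ul;
	return ID_OK;
}
```

On a 32-bit platform an input such as "4294967296" is already caught by the ERANGE branch; on a 64-bit platform it is caught by the explicit comparison, so both report the value as too large.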

> +static int set_args_uname_uid(struct archiver_args *args,
> + const char *tar_owner)
> +{
> + int r;
> + struct passwd *pw = NULL;
> +
> + if (!args || !tar_owner)
> + return NAME_ID_ERR_PARAMS;
> +
> + r = try_as_name_colon_digit(tar_owner, &(args->uname),
> + &(args->uid));
> + switch (r) {
> + case STR_IS_NAME_COLON_DIGIT:
> + return NAME_ID_BOTH_GIVEN;
> + case STR_HAS_DIGIT_TOO_LARGE:
> + return NAME_ID_ERR_ID_TOO_LARGE;
> + case STR_HAS_DIGIT_BROKEN:
> + return NAME_ID_ERR_SYNTAX;
> + }
> +
> + /* the operand is known to be single token */
> +
> + r = try_as_simple_digit(tar_owner, &(args->uid));
> 

[PATCH] git-archive: accept --owner and --group like GNU tar

2017-12-29 Thread suzuki toshiya
The ownership of files created by git-archive is always
root:root. Add --owner and --group options which work
like the GNU tar equivalent to allow overriding these
defaults.

Signed-off-by: suzuki toshiya 
---
 Documentation/git-archive.txt |  13 +++
 archive-tar.c |   8 +-
 archive.c | 224 ++
 archive.h |   4 +
 t/t5005-archive-uid-gid.sh| 140 ++
 t/t5005/parse-tar-file.py |  60 +++
 tar.h |   2 +
 7 files changed, 447 insertions(+), 4 deletions(-)
 create mode 100755 t/t5005-archive-uid-gid.sh
 create mode 100755 t/t5005/parse-tar-file.py

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index cfa1e4ebe..0d156f6c1 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git archive' [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>]
  [-o <file> | --output=<file>] [--worktree-attributes]
+ [--owner [username[:uid]]] [--group [groupname[:gid]]]
  [--remote=<repo> [--exec=<git-upload-archive>]] <tree-ish>
  [<path>...]
 
@@ -63,6 +64,18 @@ OPTIONS
This can be any options that the archiver backend understands.
See next section.
 
+--owner=<name>[:<uid>]::
+   Force <name> as owner and <uid> as uid for the files in the tar
+   archive.  If <uid> is not supplied, <name> can be either a user
+   name or numeric UID.  In this case the missing part (UID or
+   name) will be inferred from the current host's user database.
+
+--group=<name>[:<gid>]::
+   Force <name> as group and <gid> as gid for the files in the tar
+   archive.  If <gid> is not supplied, <name> can be either a group
+   name or numeric GID.  In this case the missing part (GID or
+   name) will be inferred from the current host's group database.
+
 --remote=<repo>::
Instead of making a tar archive from the local repository,
retrieve a tar archive from a remote repository. Note that the
diff --git a/archive-tar.c b/archive-tar.c
index c6ed96ee7..ca6471870 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -204,10 +204,10 @@ static void prepare_header(struct archiver_args *args,
xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? size : 0);
xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time);
 
-   xsnprintf(header->uid, sizeof(header->uid), "%07o", 0);
-   xsnprintf(header->gid, sizeof(header->gid), "%07o", 0);
-   strlcpy(header->uname, "root", sizeof(header->uname));
-   strlcpy(header->gname, "root", sizeof(header->gname));
+   xsnprintf(header->uid, sizeof(header->uid), "%07lo", args->uid);
+   xsnprintf(header->gid, sizeof(header->gid), "%07lo", args->gid);
+   strlcpy(header->uname, args->uname, sizeof(header->uname));
+   strlcpy(header->gname, args->gname, sizeof(header->gname));
xsnprintf(header->devmajor, sizeof(header->devmajor), "%07o", 0);
xsnprintf(header->devminor, sizeof(header->devminor), "%07o", 0);
 
diff --git a/archive.c b/archive.c
index 0b7b62af0..aa4b16b75 100644
--- a/archive.c
+++ b/archive.c
@@ -8,6 +8,7 @@
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "dir.h"
+#include "tar.h"
 
 static char const * const archive_usage[] = {
N_("git archive [<options>] <tree-ish> [<path>...]"),
@@ -417,6 +418,223 @@ static void parse_treeish_arg(const char **argv,
{ OPTION_SET_INT, (s), NULL, (v), NULL, "", \
  PARSE_OPT_NOARG | PARSE_OPT_NONEG | PARSE_OPT_HIDDEN, NULL, (p) }
 
+/*
+ * GNU tar --owner, --group options reject hexdigit, signed int values.
+ * strtol(), atoi() are too permissive to simulate the behaviour.
+ */
+#define STR_IS_DIGIT_OK 0
+#define STR_IS_NOT_DIGIT -1
+#define STR_IS_DIGIT_TOO_LARGE -2
+
+static int try_as_simple_digit(const char *s, unsigned long *dst)
+{
+   unsigned long ul;
+   char *endptr;
+
+   if (strlen(s) != strspn(s, "0123456789"))
+   return STR_IS_NOT_DIGIT;
+
+   errno = 0;
+   ul = strtoul(s, &endptr, 10);
+
+   /* catch ERANGE */
+   if (errno) {
+   errno = 0;
+   return STR_IS_DIGIT_TOO_LARGE;
+   }
+
+#if ULONG_MAX > 0xFFFFFFFFUL
+   /*
+* --owner, --group rejects uid/gid greater than 32-bit
+* limits, even on 64-bit platforms.
+*/
+   if (ul > 0xFFFFFFFFUL)
+   return STR_IS_DIGIT_TOO_LARGE;
+#endif
+
+   if (dst)
+   *dst = ul;
+   return STR_IS_DIGIT_OK;
+}
+
+static const char *skip_leading_colon(const char *s)
+{
+   const char *col_pos;
+
+   col_pos = strchr(s, ':');
+   if (!col_pos)
+   return s;
+
+   return (col_pos + 1);
+}
+
+#define STR_IS_NAME_COLON_DIGIT 0
+#define STR_HAS_NO_COLON -1
+#define STR_HAS_DIGIT_BROKEN -2
+#define STR_HAS_DIGIT_TOO_LARGE -3
+
+static int try_as_name_colon_digit(const char *s, const char **dst_s,
+   unsigned