[issue44262] tarfile: some content different output

2021-06-01 Thread Filipe Laíns

Change by Filipe Laíns:


--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44262] tarfile: some content different output

2021-05-31 Thread Vasco Gervasi


Vasco Gervasi  added the comment:

Yes, you can close it.

For future reference:

import tarfile

tar_reset = "/tmp/py_tar_reset.tar"

def reset(tarinfo):
    # Normalize ownership and timestamp so the archive does not depend on them.
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    tarinfo.mtime = 1
    return tarinfo

with tarfile.open(tar_reset, "w:xz") as tar_obj:
    tar_obj.add("/tmp/a", arcname="a", filter=reset)
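A minimal sketch of how one might check that the filter above makes the archive independent of file timestamps. The helper name `archive_digest`, the temporary directory layout, and the timestamp passed to `os.utime` are illustrative assumptions, not part of the attached script.

```python
import hashlib
import os
import tarfile
import tempfile
from pathlib import Path


def reset(tarinfo):
    # Same normalization as the filter above.
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    tarinfo.mtime = 1
    return tarinfo


def archive_digest(src, dest):
    # Hypothetical helper: build the archive, then hash the resulting file.
    with tarfile.open(str(dest), "w:xz") as tar_obj:
        tar_obj.add(str(src), arcname="a", filter=reset)
    return hashlib.sha256(Path(dest).read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "a"
    src.mkdir()
    fil0 = src / "eph0"
    fil0.write_text("Text 0", encoding="UTF-8")
    first = archive_digest(src, Path(tmp) / "first.tar.xz")
    # Simulate a later pipeline run: same content, different timestamp.
    os.utime(fil0, (2_000_000_000, 2_000_000_000))
    second = archive_digest(src, Path(tmp) / "second.tar.xz")

print(first == second)
```

Without `filter=reset`, the two digests would be expected to differ, because the changed mtime ends up in the member header.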

--




[issue44262] tarfile: some content different output

2021-05-30 Thread Filipe Laíns

Filipe Laíns  added the comment:

Yeah, I understand. What you want is achieved by making sure the mtime of the
tar archive, and of the files in the archive, is reproducible, like I
demonstrated here.

Can this be closed?

--




[issue44262] tarfile: some content different output

2021-05-30 Thread Vasco Gervasi


Vasco Gervasi  added the comment:

Dear Filipe,
sorry I did not explain the use case; obviously this is a toy example to show
my problem.
I have a pipeline that generates a tar file from a repository using a Python
script; if the hash of the tar file changes, it triggers other things.
As you can imagine, each time the pipeline runs the content is the same (for
the same commit), but the file timestamps are different, and so the tar file
is different.

Thanks for pointing out those examples, I will check and let you know.

Thanks

--




[issue44262] tarfile: some content different output

2021-05-30 Thread Filipe Laíns

Filipe Laíns  added the comment:

tarfile keeps the mtime from the file; the issue is that you are touching
the files at the beginning of the script. When you write to the files, you
change their mtime (last modified time), which produces a different TarInfo.
If you comment out the code that writes to the files, you get the exact same
output.


#dir0 = Path("/tmp/a")
#dir0.mkdir(parents=True, exist_ok=True)
#fil0 = dir0 / "eph0"
#fil0.write_text("Text 0", encoding="UTF-8")
#fil1 = dir0 / "eph1"
#fil1.write_text("Text 1", encoding="UTF-8")
#fil2 = dir0 / "eph2"
#fil2.write_text("Text 2", encoding="UTF-8")


$ python compress.py
b'cc3bd1bf99edc4f0796e1c466d251b0f808db790cbdd55bc920c041fb405e535  /tmp/py_gzip.tgz\n'
b'cc3bd1bf99edc4f0796e1c466d251b0f808db790cbdd55bc920c041fb405e535  /tmp/py_gzip.tgz\n'
$ python compress.py
b'cc3bd1bf99edc4f0796e1c466d251b0f808db790cbdd55bc920c041fb405e535  /tmp/py_gzip.tgz\n'
b'cc3bd1bf99edc4f0796e1c466d251b0f808db790cbdd55bc920c041fb405e535  /tmp/py_gzip.tgz\n'


If you are in a situation where the mtime may change, but you want the same 
output, you can reset it. See the last example in 
https://docs.python.org/3/library/tarfile.html#tar-examples.

--




[issue44262] tarfile: some content different output

2021-05-30 Thread Vasco Gervasi

Vasco Gervasi  added the comment:

Dear Filipe,
thanks for your answer.
Following your suggestion, I have tried the attached file.

The output is:
$ python /data/compress.py
b'68963e137ced6ee2aa5a93e155b201a3c172e2683d4b15a0eab7c1d8d43e48b4  /tmp/py_gzip.tgz\n'
b'68963e137ced6ee2aa5a93e155b201a3c172e2683d4b15a0eab7c1d8d43e48b4  /tmp/py_gzip.tgz\n'
$ rm -rf a/
$ mv py_gzip.tgz py_gzip.tgz0
$ python /data/compress.py
b'9c897d82c332f0d5443fe66112abe5f318bf6e6574e44c5c3c385f398784ac35  /tmp/py_gzip.tgz\n'
b'9c897d82c332f0d5443fe66112abe5f318bf6e6574e44c5c3c385f398784ac35  /tmp/py_gzip.tgz\n'
$ diffoscope py_gzip.tgz0 py_gzip.tgz
--- py_gzip.tgz0
+++ py_gzip.tgz
│   --- py_gzip.tgz0-content
├── +++ py_gzip.tgz-content
│ ├── file list
│ │ @@ -1,4 +1,4 @@
│ │ -drwxr-xr-x   0 root (0) root (0)        0 2021-05-30 15:32:56.566535 a/
│ │ --rw-r--r--   0 root (0) root (0)        6 2021-05-30 15:32:56.566535 a/eph0
│ │ --rw-r--r--   0 root (0) root (0)        6 2021-05-30 15:32:56.566535 a/eph1
│ │ --rw-r--r--   0 root (0) root (0)        6 2021-05-30 15:32:56.566535 a/eph2
│ │ +drwxr-xr-x   0 root (0) root (0)        0 2021-05-30 15:33:16.956535 a/
│ │ +-rw-r--r--   0 root (0) root (0)        6 2021-05-30 15:33:16.956535 a/eph0
│ │ +-rw-r--r--   0 root (0) root (0)        6 2021-05-30 15:33:16.956535 a/eph1
│ │ +-rw-r--r--   0 root (0) root (0)        6 2021-05-30 15:33:16.966535 a/eph2

Even though I am specifying an mtime, it is not applied correctly.

Thanks

--
Added file: https://bugs.python.org/file50073/compress.py




[issue44262] tarfile: some content different output

2021-05-29 Thread Filipe Laíns

Filipe Laíns  added the comment:

I modified the script to keep both Python-generated tarballs and ran
diffoscope, which presents the issue very clearly:


$ diffoscope py.gz py2.gz
--- py.gz
+++ py2.gz
├── filetype from file(1)
│ @@ -1 +1 @@
│ -gzip compressed data, was "py", last modified: Sat May 29 23:24:02 2021, max compression
│ +gzip compressed data, was "py2", last modified: Sat May 29 23:24:03 2021, max compression


The issue is that by default, when writing gzip files, the mtime field in the
gzip header is set to the current time. This is helpful, but might be unwanted
in some cases. You can change the mtime as shown in [1].
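As a rough sketch of that approach (not the code from [1] verbatim), one can open the gzip stream explicitly with a fixed `mtime` and let tarfile write an uncompressed tar into it. The helper name `build_tgz` and the in-memory buffer are assumptions made for illustration.

```python
import gzip
import hashlib
import io
import tarfile


def build_tgz(payload, gzip_mtime=0):
    """Return a .tar.gz as bytes with a fixed gzip header timestamp."""
    buf = io.BytesIO()
    # mtime pins the timestamp field of the gzip header.
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=gzip_mtime) as gz:
        # Plain "w" mode: the compression is done by the GzipFile above.
        with tarfile.open(fileobj=gz, mode="w") as tar:
            info = tarfile.TarInfo(name="a/eph0")
            info.size = len(payload)
            info.mtime = 0  # member timestamp, fixed as well
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


first = build_tgz(b"Text 0")
second = build_tgz(b"Text 0")
print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
```

Note that this only fixes the archive-level timestamp; the mtime of each member still has to be made reproducible separately, as done here by setting `info.mtime`.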

Now let's take a look at the difference between the file Python generated and 
the one the `tar` command generated.


$ diffoscope py.gz tar_a0.tgz
--- py.gz
+++ tar_a0.tgz
├── filetype from file(1)
│ @@ -1 +1 @@
│ -gzip compressed data, was "py", last modified: Sat May 29 23:24:02 2021, max compression
│ +gzip compressed data, from Unix


It seems like it generates the same output here because the `tar` command does 
not set any mtime on the archive by default.


[1] 
https://github.com/FFY00/trampolim/blob/dbd03c90eaa2cc732e1a01268786b491dc872fb7/trampolim/_build.py#L354

--
nosy: +FFY00




[issue44262] tarfile: some content different output

2021-05-28 Thread Vasco Gervasi


New submission from Vasco Gervasi :

Hi,
I am seeing some irregularities in the tar files created using Python.

Consider the attached script.
This is the output from the script:
```
  # gz
b'0f2eb7b3cac63267b1cf51d2bd5e3144f53cc5b172bbad3dccd5adf4ffb2d220  /tmp/py.gz\n'
9bde8fdb44d98c5a838a9fedaff6e66cd536d91022f8a64a6ecc514f38ce01af
b'e37c3d30ae3c12e872c6aade55ac0a40da8b3f357ce8ed77287bc9f8f024e587  /tmp/py.gz\n'
7ac976e3c94b90abff3c4138a2d153e9be9cc87e2b5a97baf2be95ca04029936

  # bz2
b'd04678e749491e4de1065d3f72ba66395d6bd8ffba3d6360ed9ca2c514586fd3  /tmp/py.bz2\n'
9aa293624df8c40f47614045602af41cc603ca92c97c94926296ef38396d6e3f
b'd04678e749491e4de1065d3f72ba66395d6bd8ffba3d6360ed9ca2c514586fd3  /tmp/py.bz2\n'
9aa293624df8c40f47614045602af41cc603ca92c97c94926296ef38396d6e3f

  # xz
b'a050baa1ab765fa037524ff061d59f62ad37bc6d1bacf98f9bff2f4b4c312fab  /tmp/py.xz\n'
ca39f034d7812d2420573218c69313ac31fd516ffebe1a57f4e41a32e1e840b9
b'a050baa1ab765fa037524ff061d59f62ad37bc6d1bacf98f9bff2f4b4c312fab  /tmp/py.xz\n'
ca39f034d7812d2420573218c69313ac31fd516ffebe1a57f4e41a32e1e840b9

b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /tmp/tar_a0.tgz\n'
b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /tmp/tar_a1.tgz\n'
b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /tmp/gzp_a0.tgz\n'
b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /tmp/gzp_a1.tgz\n'
```

As you can see, the tar files generated using the `tar` command are always the
same, whereas the ones generated using Python are not.

Am I missing some arguments?

Thanks

--
components: Library (Lib)
files: compress.py
messages: 394666
nosy: yellowhat
priority: normal
severity: normal
status: open
title: tarfile: some content different output
type: behavior
Added file: https://bugs.python.org/file50070/compress.py
