this adds both the Python library libmat2 and a command-line tool called
mat2, which remove metadata from various file types.

https://0xacab.org/jvoisin/mat2

tests are disabled because the tarball on https://pypi.org/project/mat2/
doesn't include the test documents.

the test documents are, however, present in
https://0xacab.org/jvoisin/mat2, so cloning that repository separately
and running the tests yields the attached test-results.txt file. Of the
125 tests, 7 fail and 10 error out; some of those involve video files,
which I'll look into, but it mostly works, at least on my own personal
files!
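
For a rough sense of how much of the suite passes, the summary lines at the end of test-results.txt can be tallied. This is only a minimal sketch, assuming the standard unittest text summary ("Ran N tests in ..." followed by "OK" or "FAILED (...)"); the function name tally is made up for illustration.

```python
import re


def tally(results_text):
    """Return (ran, failures, errors, passed) from a unittest text summary.

    "passed" here just means "neither failed nor errored"; skipped tests,
    if any, are not subtracted.
    """
    ran = re.search(r"^Ran (\d+) tests? in", results_text, re.MULTILINE)
    if ran is None:
        raise ValueError("no 'Ran N tests' line found")
    total = int(ran.group(1))

    counts = {"failures": 0, "errors": 0}
    failed = re.search(r"^FAILED \((.*)\)$", results_text, re.MULTILINE)
    if failed:
        for key, value in re.findall(r"(\w+)=(\d+)", failed.group(1)):
            if key in counts:
                counts[key] = int(value)

    passed = total - counts["failures"] - counts["errors"]
    return total, counts["failures"], counts["errors"], passed


# Summary lines copied from the attached test-results.txt
summary = "Ran 125 tests in 97.346s\n\nFAILED (failures=7, errors=10)\n"
print(tally(summary))  # (125, 7, 10, 108)
```

So 108 of the 125 tests neither fail nor error on this machine.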

the library can also serve as a building block for apps built on mat2,
such as https://gitlab.com/rmnvgr/metadata-cleaner.
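
As a rough idea of what using the library from an app might look like, here is a small sketch. It assumes libmat2's parser_factory.get_parser(), get_meta(), remove_all() and output_filename behave the way I understand them from upstream; the wrapper clean_file is a made-up name, and it simply returns None when libmat2 isn't installed or the file type isn't supported.

```python
def clean_file(path):
    """Write a cleaned copy of path and return its filename, or None."""
    try:
        # Assumption: libmat2 is installed (e.g. from this port).
        from libmat2 import parser_factory
    except ImportError:
        return None

    try:
        parser, mimetype = parser_factory.get_parser(path)
    except ValueError:
        # Upstream documents ValueError for files the parser can't load.
        return None

    if parser is None:
        # Unknown or unsupported format.
        return None

    # Show what would be removed before cleaning.
    print(mimetype, parser.get_meta())

    if not parser.remove_all():
        return None
    return parser.output_filename
```

Calling clean_file("photo.jpg") would then, if everything works, hand back the name of the cleaned copy that mat2 writes next to the original.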

the library also needs several runtime dependencies to be installed;
they can be checked by running mat2 with the --check-dependencies option:

$ mat2 --check-dependencies
Dependencies for mat2 0.13.4:
- Cairo: yes
- Exiftool: yes (optional)
- Ffmpeg: yes (optional)
- GLib from PyGobject: yes
- GdkPixbuf from PyGobject: yes
- Mutagen: yes
- Poppler from PyGobject: yes
- PyGobject: yes

please test! works on my files on current/amd64. OK?

-- 
jagtalon.net
weirder.earth/@jag

Attachment: py3-mat2.tar.gz
Description: application/gzip

jag@big ~/D/mat2 (master)> coverage run --branch -m unittest discover -s tests/
...E.....FF..FF..........EEERROR:root:Something went wrong during the processing of ./tests/data/clean.avi: Command '['/usr/local/bin/ffmpeg', '-i', './tests/data/clean.avi', '-y', '-map', '0', '-codec', 'copy', '-loglevel', 'panic', '-hide_banner', '-map_metadata', '-1', '-map_chapters', '-1', '-disposition', '0', '-fflags', '+bitexact', '-flags:v', '+bitexact', '-flags:a', '+bitexact', './tests/data/clean.cleaned.avi']' returned non-zero exit status 1.
.ERROR:root:Something went wrong during the processing of ./tests/data/--output.avi: Command '['/usr/local/bin/ffmpeg', '-i', './tests/data/--output.avi', '-y', '-map', '0', '-codec', 'copy', '-loglevel', 'panic', '-hide_banner', '-map_metadata', '-1', '-map_chapters', '-1', '-disposition', '0', '-fflags', '+bitexact', '-flags:v', '+bitexact', '-flags:a', '+bitexact', './tests/data/--output.cleaned.avi']' returned non-zero exit status 1.
...ERROR:root:Unable to parse /tmp/tmp5je1k6bq/OEBPS/content.opf in ./tests/data/clean.epub.
WARNING:root:Something went wrong during deep cleaning of OEBPS/content.opf in ./tests/data/clean.epub
..........FWARNING:root:Not a valid bencoded string: 137
WARNING:root:Not a valid bencoded string: 137
WARNING:root:Not a valid bencoded string: 
WARNING:root:Not a valid bencoded string: 
WARNING:root:Not a valid bencoded string: 
WARNING:root:Invalid bencoded value (data after valid prefix)
..F............................[+] Testing pdf
[+] Testing png
[+] Testing jpg
[+] Testing wav
[+] Testing aiff
[+] Testing mp3
[+] Testing ogg
[+] Testing flac
[+] Testing docx
[+] Testing odt
[+] Testing tiff
Warning: [minor] Can't delete IFD0 from TIFF - ./tests/data/clean.tiff
[+] Testing bmp
[+] Testing torrent
[+] Testing odf
[+] Testing odg
[+] Testing txt
[+] Testing gif
[+] Testing css
[+] Testing svg
[+] Testing ppm
[+] Testing avi
[+] Testing mp4
WARNING:root:The format of "./tests/data/clean.mp4" (video/mp4) has some mandatory metadata fields; mat2 filled them with standard data.
WARNING:root:The format of "./tests/data/clean.cleaned.mp4" (video/mp4) has some mandatory metadata fields; mat2 filled them with standard data.
[+] Testing wmv
WARNING:root:The format of "./tests/data/clean.wmv" (video/x-ms-wmv) has some mandatory metadata fields; mat2 filled them with standard data.
WARNING:root:The format of "./tests/data/clean.cleaned.wmv" (video/x-ms-wmv) has some mandatory metadata fields; mat2 filled them with standard data.
[+] Testing heic
Warning: ICC_Profile deleted. Image colors may be affected - ./tests/data/clean.heic
Warning: ICC_Profile deleted. Image colors may be affected - ./tests/data/clean.cleaned.heic
...EEEEEWARNING:root:./tests/data/clean.pptx contains invalid cNvPr: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 20, 22, 24}
................E....FE..........ERROR:root:In file ./tests/data/clean.docx, element word/media/setup.py's format (text/x-python) isn't supported
.ERROR:root:In file ./tests/data/clean.odt, element Pictures/setup.py's format (text/x-python) isn't supported
.....Warning: [minor] Can't delete IFD0 from TIFF - ./tests/data/clean.tiff
..WARNING:root:In file ./tests/data/clean.docx, keeping unknown element word/media/setup.py (format: text/x-python)
.WARNING:root:In file ./tests/data/clean.docx, omitting unknown element word/media/setup.py (format: text/x-python)
..
======================================================================
ERROR: test_different (test_climat2.TestCommandLineParallel.test_different)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_climat2.py", line 269, in test_different
    shutil.copytree(src, dst)
  File "/usr/local/lib/python3.11/shutil.py", line 573, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/shutil.py", line 471, in _copytree
    os.makedirs(dst, exist_ok=dirs_exist_ok)
  File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: './tests/data/parallel'

======================================================================
ERROR: test_docx (test_corrupted_files.TestCorruptedEmbedded.test_docx)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_corrupted_files.py", line 69, in test_docx
    parser.remove_all()
    ^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'remove_all'

======================================================================
ERROR: test_odt (test_corrupted_files.TestCorruptedEmbedded.test_odt)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_corrupted_files.py", line 77, in test_odt
    self.assertFalse(parser.remove_all())
                     ^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'remove_all'

======================================================================
ERROR: test_tar (test_libmat2.TestCleaningArchives.test_tar)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 679, in test_tar
    self.assertEqual(meta['./tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'word/media/image1.png'

======================================================================
ERROR: test_tarbz2 (test_libmat2.TestCleaningArchives.test_tarbz2)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 749, in test_tarbz2
    self.assertEqual(meta['./tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'word/media/image1.png'

======================================================================
ERROR: test_targz (test_libmat2.TestCleaningArchives.test_targz)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 714, in test_targz
    self.assertEqual(meta['./tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'word/media/image1.png'

======================================================================
ERROR: test_tarxz (test_libmat2.TestCleaningArchives.test_tarxz)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 784, in test_tarxz
    self.assertEqual(meta['./tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'word/media/image1.png'

======================================================================
ERROR: test_zip (test_libmat2.TestCleaningArchives.test_zip)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 649, in test_zip
    self.assertEqual(meta['tests/data/dirty.docx']['word/media/image1.png']['Comment'], 'This is a comment, be careful!')
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'word/media/image1.png'

======================================================================
ERROR: test_tar (test_libmat2.TestGetMeta.test_tar)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 241, in test_tar
    self.assertEqual(meta['./tests/data/dirty.flac']['comments'], 'Thank you for using MAT !')
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'comments'

======================================================================
ERROR: test_zip (test_libmat2.TestGetMeta.test_zip)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 189, in test_zip
    self.assertEqual(meta['tests/data/dirty.flac']['comments'], 'Thank you for using MAT !')
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'comments'

======================================================================
FAIL: test_docx (test_climat2.TestGetMeta.test_docx)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_climat2.py", line 203, in test_docx
    self.assertIn(b'Application: LibreOffice/5.4.5.1$Linux_X86_64', stdout)
AssertionError: b'Application: LibreOffice/5.4.5.1$Linux_X86_64' not found in b"[-] ./tests/data/dirty.docx's format (None) is not supported\n"

======================================================================
FAIL: test_flac (test_climat2.TestGetMeta.test_flac)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_climat2.py", line 226, in test_flac
    self.assertIn(b'comments: Thank you for using MAT !', stdout)
AssertionError: b'comments: Thank you for using MAT !' not found in b"[-] ./tests/data/dirty.flac's format (None) is not supported\n"

======================================================================
FAIL: test_odt (test_climat2.TestGetMeta.test_odt)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_climat2.py", line 211, in test_odt
    self.assertIn(b'generator: LibreOffice/3.3$Unix', stdout)
AssertionError: b'generator: LibreOffice/3.3$Unix' not found in b"[-] ./tests/data/dirty.odt's format (None) is not supported\n"

======================================================================
FAIL: test_ogg (test_climat2.TestGetMeta.test_ogg)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_climat2.py", line 234, in test_ogg
    self.assertIn(b'comments: Thank you for using MAT !', stdout)
AssertionError: b'comments: Thank you for using MAT !' not found in b"[-] ./tests/data/dirty.ogg's format (None) is not supported\n"

======================================================================
FAIL: test_tar (test_corrupted_files.TestCorruptedFiles.test_tar)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_corrupted_files.py", line 320, in test_tar
    with self.assertRaises(ValueError):
AssertionError: ValueError not raised

======================================================================
FAIL: test_zip (test_corrupted_files.TestCorruptedFiles.test_zip)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_corrupted_files.py", line 242, in test_zip
    with self.assertRaises(ValueError):
AssertionError: ValueError not raised

======================================================================
FAIL: test_wmv (test_libmat2.TestGetMeta.test_wmv)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jag/Downloads/mat2/tests/test_libmat2.py", line 206, in test_wmv
    self.assertEqual(mimetype, 'video/x-ms-wmv')
AssertionError: None != 'video/x-ms-wmv'

----------------------------------------------------------------------
Ran 125 tests in 97.346s

FAILED (failures=7, errors=10)
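
Several of the failures above report a format of None for docx, flac, odt, ogg and wmv files. One possible explanation, which is only a guess and not confirmed anywhere in this log, is that Python's mimetypes module has no mapping for those extensions on this system (I believe mat2 picks its parser from the guessed MIME type). A quick way to check, with a made-up helper name:

```python
import mimetypes


def unmapped_extensions(filenames):
    """Return the filenames whose MIME type mimetypes.guess_type can't determine."""
    return [name for name in filenames if mimetypes.guess_type(name)[0] is None]


# File names taken from the failing tests in the log
samples = ["dirty.docx", "dirty.flac", "dirty.odt", "dirty.ogg", "clean.wmv"]
print(unmapped_extensions(samples))
```

Any name printed here is one that Python itself can't map to a MIME type on the machine running the check, which would point at the system's MIME tables rather than at mat2.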
