Bug#838860: Version 32+ Flash (SWF) files detected as 'application/octet-stream' (data), not 'application/x-shockwave-flash' by file (libmagic1)

2017-01-25 Thread Christoph Biedl
Laurence Parry wrote...

> Perhaps, though the SWF format does not make it easy . . .

Thanks a lot for your input, and sorry for the long delay. I'll try to
find a solution that covers at least the vast majority of the files that
are out there. Frankly speaking, file(1) cannot be perfect and never will
be. But we can at least aim for it.

> == FWS ==
(...)

> On the plus side, "FrameSize RECT always has Xmin and Ymin value of 0." So
> we could create 31 cases depending on the value of the ninth octet equating
> to a particular bitmask and then check for 0-values for Xmin and Ymin [which
> vary in length and, for Ymin, position, depending on their length].
> 
> In other words, in this particular case, we check in bitstream order for:
> [01011|00000000000|xxxxxxxxxxx|00000000000]
> [mask |   Xmin    |   Xmax    |   Ymin    ]
> 
> I foresee lots of & and ^, unfortunately. But it should be possible. Could
> short-cut it a bit, since for all but the 1- and 2-bit cases, the rest of
> the ninth octet must be 0 in order to match Xmin, so it's not necessary to
> mask the ninth octet to match the first five bits.

That is something to work on. Most notably, a mask length of six or above
requires the following octet to have a value of 0x1f at most, i.e.
non-printable. That leaves six cases to examine, which is feasible.
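As a rough check of that count, here is a small Python sketch. The function name is made up for illustration, and it assumes the layout described in the quoted text: five length bits followed by Xmin, so three bits of Xmin sit in the ninth octet and the rest spill into the tenth.

```python
def max_tenth_octet(nbits: int) -> int:
    """Largest value the tenth octet can take when Xmin (nbits wide) is zero."""
    # The ninth octet holds the 5-bit length plus the first 3 bits of Xmin;
    # any remaining Xmin bits (at most 8 of them) start the tenth octet.
    spill = max(0, min(nbits - 3, 8))
    return 0xFF >> spill


# Lengths for which the tenth octet could still be a printable character.
unconstrained = [n for n in range(32) if max_tenth_octet(n) > 0x1F]
print(unconstrained)
```

Under these assumptions only lengths 0 to 5 leave room for a printable tenth octet; everything from 6 upward caps it at 0x1f.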


> == ZLIB  (CWS) ==

> CM (compression method) nibble is always 8, and the CINFO (compression info)
> nibble which defines the base-2 logarithm of the LZ77 window size, minus
> eight, must be 7 or below. In all the files I have examined, it is 7;
> however it could theoretically be something else. This means the ninth byte
> of a CWS file is 0xN8 , where N <= 7; and commonly it is 0x78 ('x'). [Note:
> it is perfectly possible for an uncompressed FWS file to have an 0x78 in the
> 9th position.]

You brought back old memories. I remember I had to detect compressed
files before, might have been git's packed files. However, this is one
of the places where I'd sacrifice perfection for a solution that is good
enough for most cases.

> == LZMA (ZWS) ==

> I don't have any of these SWF files to hand, but the specification above
> notes that LZMA Utils only creates files with lc/lp/pb values 3/0/2. This
> would correspond to a properties byte of 0x5d (9th octet). There is also a
> little-endian dictionary size and a file length, which may be all FF if it
> is unknown. For comparison, one bare .lzma file looks like this:
> 
>   5d 00 00 80 00 ff ff ff  ff ff ff ff ff 00 16 e9  |]...|

So we'll have to guess here anyway. For all three I'll try to come up
with something suitable within the next few hours (uploads targeting
stretch should be done by tomorrow). Upstreaming them will be my job,
too.

> Perhaps it's possible to delegate to the LZMA and ZLIB magic to test this?

I'll keep that in mind. It might require a major change in file's
architecture.

Christoph





2017-01-20 Thread Laurence Parry
Should this be submitted upstream via https://bugs.gw.com? I have not done 
so myself because the FAQ suggests that the maintainer should, if necessary.

I appreciate that it'd be nice to have a Debian-developed resolution, as 
this issue was triggered by a fix for another Debian issue. However, it'd 
also be nice to resolve this upstream before Debian 9 is released, as there 
will be an increasing number of Flash files with such versions over its 
lifetime.

Ideally all three styles of SWF file would be able to be distinguished from 
regular text files - but failing that, reverting the version test but 
testing for the presence of ZLIB compression bytes for CWS files, as 
documented above, would at least avoid a regression of the original issue 
with CWSDPMI.TXT in openttd:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745546

If more sample files are required for testing, I suspect Newgrounds would 
provide a fertile source.
-- 
Laurence "GreenReaper" Parry 




2016-10-17 Thread Laurence Parry

Perhaps, though the SWF format does not make it easy . . .

== FWS ==
https://www.adobe.com/content/dam/Adobe/en/devnet/swf/pdf/swf-file-format-spec.pdf
(See Appendix A for another walkthrough)

All integer values are little-endian byte order, but big-endian bit order 
within bytes. Signed integers have typical twos-complement arithmetic 
including sign-extension.


To start with, FrameSize is a RECT - a variable-length structure starting 
with an _unsigned_ five-bit value determining how many bits the other four 
_signed_ bit-values (Xmin, Xmax, Ymin, Ymax) each have. If it starts 01011 
in bitstream order, the next eleven bits are Xmin, and so on.


On the plus side, "FrameSize RECT always has Xmin and Ymin value of 0." So 
we could create 31 cases depending on the value of the ninth octet equating 
to a particular bitmask and then check for 0-values for Xmin and Ymin [which 
vary in length and, for Ymin, position, depending on their length].


In other words, in this particular case, we check in bitstream order for:
[01011|00000000000|xxxxxxxxxxx|00000000000]
[mask |   Xmin    |   Xmax    |   Ymin    ]

I foresee lots of & and ^, unfortunately. But it should be possible. Could 
short-cut it a bit, since for all but the 1- and 2-bit cases, the rest of 
the ninth octet must be 0 in order to match Xmin, so it's not necessary to 
mask the ninth octet to match the first five bits.
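
For illustration only, the same check can be written outside of magic syntax. This is a Python sketch, not something libmagic can run; the function names are hypothetical, and it assumes an uncompressed FWS file whose RECT starts at byte offset 8 (the ninth octet).

```python
def parse_rect(data: bytes, offset: int = 8):
    """Decode the RECT at `offset`; returns (nbits, xmin, xmax, ymin, ymax)."""
    if len(data) <= offset:
        raise ValueError("no RECT present")
    nbits = data[offset] >> 3                 # unsigned 5-bit field width
    total_bits = 5 + 4 * nbits
    nbytes = (total_bits + 7) // 8            # RECT is zero-padded to a byte
    raw = data[offset:offset + nbytes]
    if len(raw) < nbytes:
        raise ValueError("truncated RECT")
    # Bits are big-endian within the stream; drop the trailing padding.
    bits = int.from_bytes(raw, "big") >> (nbytes * 8 - total_bits)
    fields = []
    for i in range(4):
        value = (bits >> ((3 - i) * nbits)) & ((1 << nbits) - 1)
        if nbits and value >> (nbits - 1):    # sign-extend two's complement
            value -= 1 << nbits
        fields.append(value)
    return (nbits, *fields)


def frame_size_is_plausible(data: bytes) -> bool:
    """True if Xmin and Ymin are both zero, as the spec says they always are."""
    _, xmin, _, ymin, _ = parse_rect(data)
    return xmin == 0 and ymin == 0
```

A loop like this is trivial in a general-purpose language; the difficulty discussed here is expressing the equivalent per-length masks as static magic entries.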


FrameRate and FrameCount might be useful, too. Note that as integers, they 
are byte-aligned, with zero-padding at the end of the preceding RECT if 
necessary.
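
A sketch of where those two fields end up, again in Python and again assuming an uncompressed FWS file with the RECT at offset 8. The function name is hypothetical, and the 8.8 fixed-point reading of FrameRate is my understanding of the specification rather than something I have verified against many files.

```python
import struct


def frame_rate_and_count(data: bytes, rect_offset: int = 8):
    """Return (frames per second, frame count) read from just after the RECT."""
    nbits = data[rect_offset] >> 3
    rect_len = (5 + 4 * nbits + 7) // 8       # RECT length after zero-padding
    pos = rect_offset + rect_len              # FrameRate is byte-aligned here
    rate_raw, count = struct.unpack_from("<HH", data, pos)
    # Assumption: FrameRate is 8.8 fixed point, low byte = fractional part.
    return rate_raw / 256.0, count
```

Because the offset depends on the five length bits, a magic entry would need one branch per RECT length to reach these fields.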


--

There is another problem: those octets are only guaranteed to be available 
for FWS. In the case of CWS or ZWS, the files are compressed after Length 
with ZLIB (introduced in SWF 6) or LZMA (SWF 13) respectively.


The file in question was CWS, and I understand this to be the default option 
in current versions of Adobe software, which are also the ones most likely 
to be saving files in the latest versions. Reviewing an assortment of the 
latest SWF files uploaded to our website, the division is 60%/40% CWS/FWS.


The compressed length relates to the actual length of the file, but I don't 
think libmagic can use that. However, the files must be in the corresponding 
compressed formats, which have their own headers that may be of use.


== ZLIB  (CWS) ==
https://www.ietf.org/rfc/rfc1950.txt
CM (compression method) nibble is always 8, and the CINFO (compression info) 
nibble which defines the base-2 logarithm of the LZ77 window size, minus 
eight, must be 7 or below. In all the files I have examined, it is 7; 
however it could theoretically be something else. This means the ninth byte 
of a CWS file is 0xN8 , where N <= 7; and commonly it is 0x78 ('x'). [Note: 
it is perfectly possible for an uncompressed FWS file to have an 0x78 in the 
9th position.]


The flag octet after it is commonly 0x9C ('œ' in Windows-1252), but this is 
not guaranteed; I have also seen 0xDA ('Ú'), and other values are possible, 
so I would not rely on it. Beyond that is the possible dictionary ID and 
then the compressed data.
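
A minimal Python sketch of such a header test follows. The function names are made up for illustration. Besides the CM and CINFO constraints above, it uses the RFC 1950 rule that the first two bytes, read as a big-endian 16-bit number, must be a multiple of 31; that rule is not mentioned above but narrows the flag octet without fixing it to one value.

```python
def plausible_zlib_header(cmf: int, flg: int) -> bool:
    """Check the two-byte zlib header per RFC 1950."""
    cm = cmf & 0x0F                           # compression method, 8 = deflate
    cinfo = cmf >> 4                          # log2(window size) - 8
    return cm == 8 and cinfo <= 7 and ((cmf << 8) | flg) % 31 == 0


def cws_header_plausible(data: bytes) -> bool:
    """True if data starts with 'CWS' and a plausible zlib header at offset 8."""
    return (len(data) >= 10 and data[:3] == b"CWS"
            and plausible_zlib_header(data[8], data[9]))
```

Both flag values mentioned above, 0x9C and 0xDA, pass this check when paired with 0x78.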


== LZMA (ZWS) ==
http://www.7-zip.org/a/lzma-specification.7z
with a summary at
https://svn.python.org/projects/external/xz-5.0.3/doc/lzma-file-format.txt

I don't have any of these SWF files to hand, but the specification above 
notes that LZMA Utils only creates files with lc/lp/pb values 3/0/2. This 
would correspond to a properties byte of 0x5d (9th octet). There is also a 
little-endian dictionary size and a file length, which may be all FF if it 
is unknown. For comparison, one bare .lzma file looks like this:


  5d 00 00 80 00 ff ff ff  ff ff ff ff ff 00 16 e9  |]...|


But it is technically possible to create a valid LZMA stream with other 
property bytes, and presumably these would be valid SWF files as well. 
Perhaps it's possible to delegate to the LZMA and ZLIB magic to test this?
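
To show how the properties byte decomposes, here is a Python sketch that decodes the 13-byte header of a bare .lzma file such as the one dumped above. The function names are hypothetical. It says nothing about where these bytes sit inside a ZWS file, which I have not been able to check against a sample.

```python
def decode_lzma_props(prop: int):
    """Split an LZMA properties byte into (lc, lp, pb)."""
    if prop >= 9 * 5 * 5:
        raise ValueError("invalid LZMA properties byte")
    lc = prop % 9                             # literal context bits
    lp = (prop // 9) % 5                      # literal position bits
    pb = prop // 45                           # position bits
    return lc, lp, pb


def parse_lzma_alone_header(header: bytes):
    """Decode the properties, dictionary size and uncompressed size."""
    lc, lp, pb = decode_lzma_props(header[0])
    dict_size = int.from_bytes(header[1:5], "little")
    size = int.from_bytes(header[5:13], "little")
    # All FF means the uncompressed size is unknown.
    uncompressed = None if size == 0xFFFFFFFFFFFFFFFF else size
    return {"lc": lc, "lp": lp, "pb": pb,
            "dict_size": dict_size, "uncompressed_size": uncompressed}
```

Any properties byte below 225 decodes to a valid combination, which is why matching on 0x5d alone would only ever be a heuristic.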


--
Laurence "GreenReaper" Parry
http://greenreaper.co.uk - https://inkbunny.net 




2016-10-17 Thread Christoph Biedl
Laurence Parry wrote...

> It was assumed that the version number would remain below 32 "for the time
> being". This time has passed. Version 32 was published in May 2016, and it
> is already up to 34:
> http://www.adobe.com/devnet/articles/flashplayer-air-feature-list.html
> We detected this issue when our web application refused an SWF file created
> by an animator.

Thanks for the catch, although this is rather bad news for the file
program. As any value from 32 on is a printable character, there will
always be a risk of mis-detection.

> Alternatives which would preserve the fix for #745546 might be to permit
> versions below 48 ('0') or 65 ('A'), and/or to test for a sane length, e.g.
> 
> 0   string  CWS Macromedia Flash data (compressed),
> >3  byte    x   version %d,
> >>4 lelong  <0x20000000 length %d bytes
> !:mime  application/x-shockwave-flash
> 
> This refuses a 512MB compressed Flash file. I am not aware of anyone who's
> created such a file, but it is technically possible (e.g. Flash games with
> very large embedded flash videos).

I'm not really happy about this and could use more ideas. Assuming you
have a large collection of such files, is there anything in the
following header octets (FrameSize, FrameRate, FrameCount) that is
almost certainly not printable?

Christoph





2016-09-25 Thread Laurence Parry
Package: file
Version: 1:5.22+15-2+deb8u2
Tags: upstream

Flash files compiled with -swf-version=32 or above are being recognized as
'application/octet-stream' (data) rather than
'application/x-shockwave-flash' due to a restriction in the magic
definition file.

Command:
file -b --mime-type test.swf

Expected output:
application/x-shockwave-flash

Actual output (with jessie and testing):
application/octet-stream

Hex dump of first 16 bytes:
hd -n 16 test.swf
  43 57 53 20 f9 27 53 00  78 9c 94 9a 55 50 1d d1  |CWS .'S.x...UP..|
(the full file has not yet been publicly released by the creator)

This bug was introduced in 2014 in version 1.10 of the flash magic
definition used by file (via libmagic1 / libmagic-mgc) in an attempt to fix
Debian bug #745546
https://github.com/file/file/commit/281578a58328ed76ea2b00c03c3e45f36203c354#diff-ea5efd5565ac4dfd72536c835cab977c
This appears to be the current upstream version. The version in wheezy is
not affected.

It was assumed that the version number would remain below 32 "for the time
being". This time has passed. Version 32 was published in May 2016, and it
is already up to 34:
http://www.adobe.com/devnet/articles/flashplayer-air-feature-list.html
We detected this issue when our web application refused an SWF file created
by an animator.

It may be prudent to assume that the full version byte may be used.
However, this would trigger the issue mentioned in #745546:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=745546
i.e. misdetection of this file as a 516MB SWF:
http://git.openttd.org/?p=trunk.git;a=blob;f=os/dos/cwsdpmi/cwsdpmi.txt

Alternatives which would preserve the fix for #745546 might be to permit
versions below 48 ('0') or 65 ('A'), and/or to test for a sane length, e.g.

0   string  CWS Macromedia Flash data (compressed),
>3  byte    x   version %d,
>>4 lelong  <0x20000000 length %d bytes
!:mime  application/x-shockwave-flash

This refuses a 512MB compressed Flash file. I am not aware of anyone who's
created such a file, but it is technically possible (e.g. Flash games with
very large embedded flash videos).
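
To illustrate what the length test sees, here is a rough Python sketch with hypothetical names, assuming a 512 MiB cutoff (0x20000000). It reads the version byte and the little-endian length from the first eight bytes. The text sample assumes the openttd file begins with "CWSDPMI " followed by a space, which I have inferred from the 516MB figure rather than checked.

```python
import struct

MAX_LENGTH = 0x20000000  # assumed cutoff: 512 MiB


def swf_version_and_length(header: bytes):
    """Return (version byte, declared length) from an SWF-style header."""
    version = header[3]
    (length,) = struct.unpack_from("<I", header, 4)
    return version, length


def passes_length_test(header: bytes) -> bool:
    """True for a 'CWS' signature whose declared length is under the cutoff."""
    _, length = swf_version_and_length(header)
    return header[:3] == b"CWS" and length < MAX_LENGTH


real_swf = bytes.fromhex("43575320f9275300")  # from the hex dump above
text_file = b"CWSDPMI "                       # assumed start of the text file
```

With these inputs the real file reports version 32 and a length of about 5.2 MiB and passes, while the text bytes decode to version 68 and a length of about 516 MiB and are rejected.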

We've worked around this bug by adding a previous version of the magic
definition to /etc/magic for now.

-- 
Laurence "GreenReaper" Parry
http://www.greenreaper.co.uk/ - https://inkbunny.net