Bug#788364: libmagic1: misdetect some Coreboot images as text

2015-09-05 Thread Christoph Biedl
Jérémy Bobbio wrote...

> diffoscope is the tool that we have created as part of the “reproducible
> builds” effort to understand differences between two builds. (...)

Eh, no worries :)  I saw your presentation at DebConf, and of course
I'm interested in supporting diffoscope.

> diffoscope uses libmagic (through its Python bindings) to identify the
> format of the files it's trying to compare. Some coreboot images are
> misdetected as text files which results in garbled diffoscope output.

At a quick glance (I might be wrong) I failed to find the magic
0x4F524243 in the images attached to the initial report. OTOH, these
images start with a huge amount (7.3 Mbyte) of \xff octets.
file/libmagic don't look that far into files anyway, so it might be
impossible to detect coreboot image files properly. Reiner also
provided some information that points in the same direction.
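
A check along these lines can be sketched in Python. This is only a
rough illustration: the function names are hypothetical, and it assumes
the magic 0x4F524243 is stored big-endian, i.e. as the bytes "ORBC".

```python
# Assumption: the magic 0x4F524243 is stored big-endian, i.e. b"ORBC".
CBFS_MAGIC = (0x4F524243).to_bytes(4, "big")


def leading_fill_length(data: bytes, fill: int = 0xFF) -> int:
    """Return how many bytes at the start of data equal the fill byte."""
    stripped = data.lstrip(bytes([fill]))
    return len(data) - len(stripped)


def describe_image(data: bytes) -> dict:
    """Report the leading fill run and the first position of the magic."""
    return {
        "leading_ff": leading_fill_length(data),
        # bytes.find() returns -1 when the magic does not occur at all.
        "magic_offset": data.find(CBFS_MAGIC),
    }
```

Running describe_image() on the contents of an attached image would
show both how long the leading \xff run is and whether the magic occurs
anywhere in the file.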

Still, such a sequence must not be detected as text. I'll prepare
a patch for upstream.

Christoph





Bug#788364: libmagic1: misdetect some Coreboot images as text

2015-09-03 Thread Jérémy Bobbio
retitle 788364 diffoscope: garbled output when comparing some Coreboot images
clone 788364 -1
reassign -1 libmagic1
severity -1 normal
retitle -1 libmagic1: misdetect Coreboot images as text files
thanks

Hi Christoph,

diffoscope is the tool that we have created as part of the “reproducible
builds” effort to understand differences between two builds. We now also
use it to compare builds of Coreboot images.

diffoscope uses libmagic (through its Python bindings) to identify the
format of the files it's trying to compare. Some coreboot images are
misdetected as text files which results in garbled diffoscope output.

The proper way to detect Coreboot images is probably to look for a CBFS
header. cbfs_find_header() is how upstream does it:
http://review.coreboot.org/gitweb?p=coreboot.git;a=blob;f=util/cbfstool/cbfs_image.c;h=c40bd6641

I could tell diffoscope to detect Coreboot images with a similar
mechanism but it would probably be better to teach libmagic to do it.
Is that easily doable?
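
A minimal sketch of what such a check could look like on the diffoscope
side, in Python. It is not a port of cbfs_find_header(): the function
name is hypothetical, and the code assumes that the last four bytes of
the image hold a signed little-endian offset (relative to the end of
the file) pointing at the header, and that the header starts with the
magic 0x4F524243 stored big-endian ("ORBC"). If the pointer does not
lead to the magic, it falls back to scanning the whole image.

```python
import struct

# Assumption: 0x4F524243 stored big-endian at the start of the header.
CBFS_MAGIC = b"ORBC"


def find_cbfs_header(data: bytes) -> int:
    """Return the offset of a CBFS header magic in data, or -1 if none."""
    size = len(data)
    if size >= 8:
        # Assumption: the trailing 4 bytes are a signed little-endian
        # offset, relative to the end of the image, to the header.
        (rel_offset,) = struct.unpack_from("<i", data, size - 4)
        offset = size + rel_offset
        if 0 <= offset <= size - 4 and data[offset:offset + 4] == CBFS_MAGIC:
            return offset
    # Fallback: the pointer was unusable, so scan the whole image.
    return data.find(CBFS_MAGIC)
```

A caller would read the file into memory and treat any result other
than -1 as "probably a Coreboot image".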

Reiner Herrmann:
> file detects them as plain-text:
> 
> > /tmp/b1_coreboot.rom: ISO-8859 text, with very long lines, with no line terminators
> > /tmp/b2_coreboot.rom: ISO-8859 text, with very long lines, with no line terminators
> 
> That's why diffoscope also treats them as text.
> I'm not sure this can/should be fixed inside diffoscope, as we rely on
> libmagic detecting them correctly.

Reiner, I remember you had a look into this during DebConf. Have you
made any progress?

-- 
Lunar                                .''`. 
lu...@debian.org                    : :Ⓐ  :  # apt-get install anarchism
                                    `. `'` 
                                      `-   




Bug#788364: libmagic1: misdetect some Coreboot images as text

2015-09-03 Thread Reiner Herrmann
Hi Lunar,

On Thu, Sep 03, 2015 at 04:26:57PM +0200, Jérémy Bobbio wrote:
> diffoscope uses libmagic (through its Python bindings) to identify the
> format of the files it's trying to compare. Some coreboot images are
> misdetected as text files which results in garbled diffoscope output.
> 
> Proper way to detect Coreboot images is probably to look for a CBFS
> header. cbfs_find_header() is how upstream does it:
> http://review.coreboot.org/gitweb?p=coreboot.git;a=blob;f=util/cbfstool/cbfs_image.c;h=c40bd6641
> 
> I could tell diffoscope to detect Coreboot images with a similar
> mechanism but it would probably be better to teach libmagic to do it.
> Is that easily doable?
> 
> Reiner Herrmann:
> > file detects them as plain-text:
> > 
> > > /tmp/b1_coreboot.rom: ISO-8859 text, with very long lines, with no line terminators
> > > /tmp/b2_coreboot.rom: ISO-8859 text, with very long lines, with no line terminators
> > 
> > That's why diffoscope also treats them as text.
> > I'm not sure this can/should be fixed inside diffoscope, as we rely on
> > libmagic detecting them correctly.
> 
> Reiner, I remember you had a look into this during DebConf. Have you
> made any progress?

Unfortunately I haven't found any easy solution to it.
It looked like magic(5) files require constant offsets for
checking magic numbers.
And I also didn't see a way to look at an offset backwards from the end
of a file (where CBFS images have an offset to the header).

I just had another look and saw that the "type" field can also be a
search. So it could be possible to detect them via pattern files.
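
To illustrate what a "search" test implies, here is a small Python
emulation. It is a sketch based on the magic(5) description of the
search type (the pattern is tried at a fixed number of consecutive
positions from a start offset). The function name and parameters are
hypothetical, and any limit on how much of the file libmagic reads is
not modelled.

```python
def search_matches(data: bytes, offset: int, span: int, pattern: bytes) -> bool:
    """Emulate a magic(5)-style search: try pattern at `span` consecutive
    positions, starting at `offset`."""
    if span <= 0 or not pattern:
        return False
    # The last allowed start position is offset + span - 1, so the slice
    # searched must extend len(pattern) bytes beyond that position.
    end = offset + span - 1 + len(pattern)
    return data.find(pattern, offset, end) != -1
```

With a header sitting behind several megabytes of \xff padding, the
range would have to be at least that large for the pattern to match.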


