> On June 4, 2014, 11:25 p.m., Matthias Krueger wrote:
> > The Matlab MIME types used seem to be application/x-matlab-data or 
> > application/matlab-mat.
> > 
> > Would it make sense to add them to the mime XML for detection?
> > 
> > <mime-type type="application/x-matlab-data">
> >   <comment>MATLAB data file</comment>
> >   <alias type="application/matlab-mat"/>
> >   <magic priority="50">
> >     <match value="MATLAB" type="string" offset="0"/>
> >   </magic>
> >   <glob pattern="*.mat"/>
> > </mime-type>
> > 
> >
> 
> Chris Mattmann wrote:
>     +1 this makes a ton of sense to add IMO.
> 
> Nick Burch wrote:
>     There's some odd whitespace going on - we normally use 4 spaces and no 
>     tabs.
>     
>     When outputting the variables, it would probably make sense to put each 
>     one into either a paragraph or a list, so that we get helpful output in 
>     HTML mode as well as text mode.
>     
>     With that in place, it would then be possible to have a unit test that 
>     checks the HTML output, as well as the current text one.
>     
>     Also on testing, I think at least some of the tests have an 
>     implementation of assertContains, which generally gives a more helpful 
>     failure message than assertTrue(s.contains(...)) does. Might be worth 
>     looking into that?
> 
> Ann Burgess wrote:
>     Great input - thank you! I will integrate both and upload the diff.
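
On the assertContains point quoted above, here is a minimal sketch of why such a helper tends to fail more helpfully than assertTrue(s.contains(...)). The class name, the helper body, and the sample string are all hypothetical; this is not the implementation that exists in the Tika tests, just an illustration of the idea.

```java
public class AssertContainsSketch {

    // Hypothetical helper: fails with a message that names the missing
    // substring and shows the text that was searched.
    public static void assertContains(String needle, String haystack) {
        if (haystack == null || !haystack.contains(needle)) {
            throw new AssertionError(
                    "'" + needle + "' not found in:\n" + haystack);
        }
    }

    public static void main(String[] args) {
        // Made-up sample text, not real parser output.
        String content = "double_array:[1x3]";

        // Passes silently.
        assertContains("double_array", content);

        // Fails, and the message says what was expected and what was seen,
        // whereas assertTrue(content.contains(...)) would only report that
        // the condition was false.
        try {
            assertContains("missing_var", content);
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```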

This is coming along well. Some quick additional comments:
* I tested with the files in 
https://github.com/scipy/scipy/tree/master/scipy/io/matlab/tests/data. JMatIO 
only supports MATLAB 5 files. This could be noted in a short comment or in the 
javadoc.
* I think Tika is based on JDK 1.6, so I don't see a reason for the test to 
special-case JDK 1.5 and simply return as passing there.
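
To illustrate the MATLAB 5 limitation, here is a rough sketch of a header check that could run before handing a file to JMatIO. It assumes the Level 5 MAT-file layout as I understand it: a 128-byte header whose descriptive text starts with "MATLAB 5.0 MAT-file" and whose last two bytes are the endian indicator "MI" or "IM". The class and method names are made up for illustration and are not part of the patch or of JMatIO.

```java
import java.nio.charset.Charset;

public class MatHeaderSketch {

    // Assumed Level 5 layout: 128-byte header, endian indicator in the
    // last two bytes.
    static final int HEADER_LENGTH = 128;
    static final String LEVEL5_PREFIX = "MATLAB 5.0 MAT-file";

    // Hypothetical check: true if the bytes look like a Level 5 header.
    public static boolean looksLikeLevel5(byte[] header) {
        if (header == null || header.length < HEADER_LENGTH) {
            return false;
        }
        String text = new String(header, 0, LEVEL5_PREFIX.length(),
                Charset.forName("US-ASCII"));
        if (!LEVEL5_PREFIX.equals(text)) {
            return false;
        }
        // The indicator holds the characters 'M' and 'I'; their order
        // depends on the byte order of the writer.
        byte a = header[126];
        byte b = header[127];
        return (a == 'M' && b == 'I') || (a == 'I' && b == 'M');
    }

    public static void main(String[] args) {
        // Build a synthetic header rather than reading a real file.
        byte[] header = new byte[HEADER_LENGTH];
        byte[] prefix = LEVEL5_PREFIX.getBytes(Charset.forName("US-ASCII"));
        System.arraycopy(prefix, 0, header, 0, prefix.length);
        header[126] = 'I';
        header[127] = 'M';
        System.out.println(looksLikeLevel5(header));
    }
}
```

A check like this is stricter than matching "MATLAB" at offset 0, so it would also turn away files whose header text names a different version.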


- Matthias


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22246/#review44773
-----------------------------------------------------------


On June 4, 2014, 10:23 p.m., Ann Burgess wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22246/
> -----------------------------------------------------------
> 
> (Updated June 4, 2014, 10:23 p.m.)
> 
> 
> Review request for tika and Chris Mattmann.
> 
> 
> Repository: tika
> 
> 
> Description
> -------
> 
> This is a new parser for Matlab .mat files.  The parser uses JMatIO, a 
> MAT-file I/O API for Java, which is available through Maven Central.  The 
> text output from this parser provides the names and dimensions of variables, 
> both inside and outside of data structures, but does NOT provide the actual 
> data values within each .mat file. 
> 
> 
> Diffs
> -----
> 
> 
> Diff: https://reviews.apache.org/r/22246/diff/
> 
> 
> Testing
> -------
> 
> Successfully ran a basic unit test that checks both --text and --metadata 
> parser output.  
> 
> 
> File Attachments
> ----------------
> 
> Parser File
>   
> https://reviews.apache.org/media/uploaded/files/2014/06/04/cb39636d-ec53-4fbc-b348-6a4db8907f6b__MatParser.java
> Unit Test
>   
> https://reviews.apache.org/media/uploaded/files/2014/06/04/bbff8c6b-caa1-4830-b441-532c28c3c78e__MatParserTest.java
> 
> 
> Thanks,
> 
> Ann Burgess
> 
>
