tika-trunk-jdk1.7 - Build # 37 - Failure

2014-06-10 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-trunk-jdk1.7 (build #37)

Status: Failure

Check console output at https://builds.apache.org/job/tika-trunk-jdk1.7/37/ to 
view the results.

tika-trunk-jdk1.6 - Build # 37 - Failure

2014-06-10 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-trunk-jdk1.6 (build #37)

Status: Failure

Check console output at https://builds.apache.org/job/tika-trunk-jdk1.6/37/ to 
view the results.

[jira] [Commented] (TIKA-1327) New parser for Matlab .mat files

2014-06-10 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026403#comment-14026403
 ] 

Chris A. Mattmann commented on TIKA-1327:
-

Hey [~annieburgess] can you attach the .mat file here?

> New parser for Matlab .mat files
> 
>
> Key: TIKA-1327
> URL: https://issues.apache.org/jira/browse/TIKA-1327
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.5
> Reporter: Ann Burgess
> Assignee: Chris A. Mattmann
> Labels: parser
>
> New parser for Matlab .mat files. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: tika-trunk-jdk1.7 - Build # 37 - Failure

2014-06-10 Thread Ken Krugler
I'm curious - I went to take a look at Jenkins output, and from the UI it would 
appear that the parser build failed, but the build messages show no failures.

How does one go about determining why exactly a Jenkins build has failed?

Thanks,

-- Ken


On Jun 10, 2014, at 1:24am, Apache Jenkins Server wrote:

> The Apache Jenkins build system has built tika-trunk-jdk1.7 (build #37)
> 
> Status: Failure
> 
> Check console output at https://builds.apache.org/job/tika-trunk-jdk1.7/37/ 
> to view the results.

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr







Re: tika-trunk-jdk1.7 - Build # 37 - Failure

2014-06-10 Thread Jukka Zitting
Hi,

On Tue, Jun 10, 2014 at 9:06 AM, Ken Krugler wrote:
> I'm curious - I went to take a look at Jenkins output, and from the UI it 
> would appear that the parser build failed, but the build messages show no 
> failures.
>
> How does one go about determining why exactly a Jenkins build has failed?

The console output at
https://builds.apache.org/job/tika-trunk-jdk1.7/37/console is helpful:

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-deploy-plugin:2.6:deploy
(default-deploy) on project tika-parsers: Failed to retrieve remote
metadata org.apache.tika:tika-parsers:1.6-SNAPSHOT/maven-metadata.xml:
Could not transfer metadata
org.apache.tika:tika-parsers:1.6-SNAPSHOT/maven-metadata.xml from/to
apache.snapshots.https
(https://repository.apache.org/content/repositories/snapshots): Failed
to transfer file:
https://repository.apache.org/content/repositories/snapshots/org/apache/tika/tika-parsers/1.6-SNAPSHOT/maven-metadata.xml.
Return code is: 503, ReasonPhrase:Service Temporarily Unavailable. ->
[Help 1]

Looks like a temporary problem with repository.apache.org.
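
As a rough illustration of the approach above, the console log can be scanned for Maven's [ERROR] lines. This is a minimal sketch, assuming the console text has already been saved locally; the function name `error_lines` and the shortened sample log are made up for the example and are not part of Jenkins or Maven.

```python
def error_lines(console_text):
    """Return the lines of a Maven console log that start with [ERROR]."""
    return [line.strip() for line in console_text.splitlines()
            if line.lstrip().startswith("[ERROR]")]


# Hypothetical, shortened console excerpt modeled on the error quoted above.
sample_log = """\
[INFO] Building Apache Tika parsers 1.6-SNAPSHOT
[INFO] BUILD FAILURE
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-deploy-plugin:2.6:deploy
[ERROR] Return code is: 503, ReasonPhrase:Service Temporarily Unavailable.
"""

errors = error_lines(sample_log)
# An HTTP 503 from the repository hints at a transient infrastructure problem.
looks_transient = any("503" in line for line in errors)
print(len(errors), looks_transient)
```

A failure that consists only of a 503 during deploy, with no test or compile errors, usually points at the repository rather than at the code.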

BR,

Jukka Zitting


[jira] [Commented] (TIKA-411) Generate list of supported and detected types automatically

2014-06-10 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026518#comment-14026518
 ] 

Tyler Palsulich commented on TIKA-411:
--

I'm interested in working on this. Should the list be generated on the fly (if 
so, how?), or once, statically, when the website is generated for each new 
version? Getting the list of types isn't difficult (cat tika-mimetypes.xml | 
grep "mime-type type=" | awk -F '"' '{ print $2 }' > types), but there are 1441 
different types. So... we can't just make a big, unwieldy list. The list 
should be browsable, searchable, and [ctrl + f]able. One idea for the full list 
is to have expandable sublists -- application, audio, etc. Any ideas?
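
The sublist idea can be sketched in a few lines of Python. This is only an illustration, assuming the file uses un-namespaced `mime-type` elements with a `type` attribute (as the grep above implies); `SAMPLE_XML` is a tiny stand-in for tika-mimetypes.xml and `group_mime_types` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny stand-in for tika-mimetypes.xml; the real file is far larger.
SAMPLE_XML = """\
<mime-info>
  <mime-type type="application/pdf"><glob pattern="*.pdf"/></mime-type>
  <mime-type type="application/x-hdf"/>
  <mime-type type="audio/midi"/>
  <mime-type type="image/png"/>
</mime-info>
"""


def group_mime_types(xml_text):
    """Map each top-level media type to a sorted list of its full type names."""
    groups = defaultdict(list)
    root = ET.fromstring(xml_text)
    for element in root.iter("mime-type"):
        full_type = element.get("type")
        if full_type and "/" in full_type:
            top_level = full_type.split("/", 1)[0]
            groups[top_level].append(full_type)
    return {top: sorted(types) for top, types in sorted(groups.items())}


grouped = group_mime_types(SAMPLE_XML)
for top_level, types in grouped.items():
    # One heading per sublist (application, audio, ...), then its members.
    print(f"{top_level} ({len(types)})")
    for full_type in types:
        print(f"  {full_type}")
```

Each top-level key could then become one expandable section of the generated page.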

> Generate list of supported and detected types automatically
> ---
>
> Key: TIKA-411
> URL: https://issues.apache.org/jira/browse/TIKA-411
> Project: Tika
> Issue Type: Improvement
> Components: documentation
> Reporter: Jukka Zitting
> Priority: Minor
>
> Currently we edit the list of supported types 
> (http://lucene.apache.org/tika/0.7/formats.html) manually, which is bound to 
> leave the list outdated and incomplete. It would be better if the list was 
> automatically generated from the tika-mimetypes.xml file and the 
> getSupportedTypes() response of the AutoDetectParser class.





[jira] [Commented] (TIKA-411) Generate list of supported and detected types automatically

2014-06-10 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026526#comment-14026526
 ] 

Nick Burch commented on TIKA-411:
-

I'd suggest just using the Tika App, as the --list- type methods on that 
should provide most of what you need. Or ask the Tika server nicely: it offers 
the list as plain text, HTML, or JSON, and the latter should be fairly easy to 
process in code!

However, I'm not sure about generating all of the page automatically. The 
current formats page has quite a lot of manually written text in it around the 
support for each format, and manually groups related formats together along 
with links to the relevant parsers.

Maybe it would be better to have something which calls the Tika App list 
parsers method, then warns you if that parser doesn't get mentioned in the 
formats page?
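
A check along those lines might look like the following Python sketch. It assumes the parser list and the formats page text have already been obtained somehow (for example from the app's list output and the page source); `unmentioned_parsers` and both sample inputs are hypothetical.

```python
def unmentioned_parsers(parser_names, formats_page_text):
    """Return parser class names that never appear in the formats page text."""
    missing = []
    for name in parser_names:
        simple_name = name.rsplit(".", 1)[-1]  # e.g. "HDFParser"
        if simple_name not in formats_page_text:
            missing.append(name)
    return missing


# Hypothetical inputs: a parser list as the app might report it, and page text.
parsers = [
    "org.apache.tika.parser.hdf.HDFParser",
    "org.apache.tika.parser.dwg.DWGParser",
]
page = "HDF files are handled by the HDFParser class."

for name in unmentioned_parsers(parsers, page):
    print(f"WARNING: {name} is not mentioned in the formats page")
```

The page itself stays hand-written; the script only flags parsers the text has not caught up with.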



[jira] [Commented] (TIKA-411) Generate list of supported and detected types automatically

2014-06-10 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026626#comment-14026626
 ] 

Tyler Palsulich commented on TIKA-411:
--

That's definitely a nicer list. But, what is the "something" which calls Tika 
App? We don't know if/where trunk/the jar is. So, just have the user call Tika 
App, then add the generated list to formats.apt? Still seems clunky... But, 
what do you think of adding another option to Tika App: 
--list-parser-details-apt? This would output the list with the proper apt list 
format, with links to the JavaDoc for each parser. I'll add a patch with what I 
mean in a few minutes.



[jira] [Updated] (TIKA-411) Generate list of supported and detected types automatically

2014-06-10 Thread Tyler Palsulich (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Palsulich updated TIKA-411:
-

Attachment: TIKA-411.patch

To use this patch, package Tika App (mvn package), then run `java -jar 
tika-app.jar --list-parser-details-apt`, append (using >>, if you want) that 
list to tika.site/src/site/apt/[version]/formats.apt, run `mvn site:run` from 
tika.site/, and go to 0.0.0.0:8080/1.6/formats.html.



[jira] [Commented] (TIKA-411) Generate list of supported and detected types automatically

2014-06-10 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026654#comment-14026654
 ] 

Nick Burch commented on TIKA-411:
-

I'm not sure I'm a fan of having the Tika App output a slightly odd apt format, 
which is only of slight use for generating the site (given how much extra work 
is needed on the text), and no use to anyone else... Happy to see an example 
though!

We're about to have an always-on copy of the Tika Server. I'd probably rather 
point people there to get an auto-generated list of what parsers, detectors and 
types we have, or point them to grab the Tika App and ask it. I'd see the 
website version as being a friendly, human-written and grouped intro, with the 
server and app providing up-to-the-minute details as required.



[jira] [Commented] (TIKA-1319) Translation

2014-06-10 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026662#comment-14026662
 ] 

Nick Burch commented on TIKA-1319:
--

Thinking about this, what about having the translator be a wrapping 
(decorating) parser? You initialise it with a language and a "real" parser (e.g. 
auto detect), then call parse on it as normal. It gets the real parser to 
handle the content, then it calls out to the translation API on the things 
which need it, and skips the bits that don't. 

That might let us translate certain metadata values but not others, with 
control, along with some of the body content but not all (e.g. an XML handler 
which passes through the characters but not the tags).
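
The decorator idea can be sketched independently of Tika. The Python below is only an illustration of the pattern: every class name is hypothetical, the "translator" merely upper-cases text, and the `parse` signature (returning metadata and body) is an assumption rather than Tika's Parser API.

```python
class UppercaseTranslator:
    """Stand-in for a translation service: it only upper-cases the text."""

    def translate(self, text, target_language):
        return text.upper()


class FakeParser:
    """Stand-in for a "real" parser returning (metadata, body text)."""

    def parse(self, document):
        return {"title": "hello", "created": "2014-06-10"}, "hello world"


class TranslatingParser:
    """Decorator: delegates parsing, then translates selected parts."""

    def __init__(self, wrapped, translator, target_language, translatable_keys):
        self.wrapped = wrapped
        self.translator = translator
        self.target_language = target_language
        self.translatable_keys = set(translatable_keys)

    def parse(self, document):
        # Let the wrapped parser do the real work first.
        metadata, body = self.wrapped.parse(document)
        translated = {}
        for key, value in metadata.items():
            # Only the chosen metadata keys are sent to the translator.
            if key in self.translatable_keys:
                value = self.translator.translate(value, self.target_language)
            translated[key] = value
        return translated, self.translator.translate(body, self.target_language)


parser = TranslatingParser(FakeParser(), UppercaseTranslator(), "fr", ["title"])
metadata, body = parser.parse("ignored input")
print(metadata, body)
```

Because the wrapper only depends on the `parse` call, any parser could be wrapped without changes to the parser itself.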

> Translation
> ---
>
> Key: TIKA-1319
> URL: https://issues.apache.org/jira/browse/TIKA-1319
> Project: Tika
> Issue Type: New Feature
> Reporter: Tyler Palsulich
> Assignee: Chris A. Mattmann
> Priority: Minor
> Fix For: 1.6
>
>
> I just opened up a review on reviews.apache.org -- 
> https://reviews.apache.org/r/22219/. I copied the description below. 
> This patch adds basic language translation functionality to Tika. Translation 
> is provided by a Microsoft API, but accessed through Apache 2 licensed 
> com.memetix.microsoft-translator-java-api 
> (https://code.google.com/p/microsoft-translator-java-api/ ). If a user wants 
> to use the translation feature, they have to add a client id and client 
> secret to the 
> tika-core/src/main/resources/org/apache/tika/language/translator.properties 
> file (see http://msdn.microsoft.com/en-us/library/hh454950.aspx ). I added 
> com.memetix as a dependency in tika-core. I put the Translator class in 
> org.apache.tika.language. There is no integration with the server or CLI, 
> yet. Further, only Strings are translated right now -- if you pass in a full 
> document with xml tags, the structure will be mangled. But, I think that 
> would be a cool feature -- translate the body, title, subtitle, etc, but not 
> the structural elements. 
> There is still more work to do, but I wanted some more eyes on this to make 
> sure I'm heading in the right direction and this is a desired feature. Let me 
> know what you think!
> There are two simple unit tests for now which translate "hello" to French 
> ("salut"). One for inputting the source and target languages, one for 
> inputting just the target language (and detecting the source language 
> automatically).





[jira] [Commented] (TIKA-411) Generate list of supported and detected types automatically

2014-06-10 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026664#comment-14026664
 ] 

Nick Burch commented on TIKA-411:
-

For the lazy, any chance you could post the output it generates?



[jira] [Commented] (TIKA-411) Generate list of supported and detected types automatically

2014-06-10 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026669#comment-14026669
 ] 

Tyler Palsulich commented on TIKA-411:
--

Yep! I'm not particularly attached to this.

{code}
   * org.apache.tika.parser.{{{./api/org/apache/tika/parser/AutoDetectParser}AutoDetectParser}} (Composite Parser):
      * org.apache.tika.parser.{{{./api/org/apache/tika/parser/DefaultParser}DefaultParser}} (Composite Parser):
         * org.apache.tika.parser.asm.{{{./api/org/apache/tika/parser/asm/ClassParser}ClassParser}}
            * application/java-vm
         * org.apache.tika.parser.audio.{{{./api/org/apache/tika/parser/audio/AudioParser}AudioParser}}
            * audio/x-wav
            * audio/x-aiff
            * audio/basic
         * org.apache.tika.parser.audio.{{{./api/org/apache/tika/parser/audio/MidiParser}MidiParser}}
            * application/x-midi
            * audio/midi
         * org.apache.tika.parser.chm.{{{./api/org/apache/tika/parser/chm/ChmParser}ChmParser}}
            * application/vnd.ms-htmlhelp
            * application/chm
            * application/x-chm
         * org.apache.tika.parser.code.{{{./api/org/apache/tika/parser/code/SourceCodeParser}SourceCodeParser}}
            * text/x-java-source
            * text/x-c++src
            * text/x-groovy
         * org.apache.tika.parser.crypto.{{{./api/org/apache/tika/parser/crypto/Pkcs7Parser}Pkcs7Parser}}
            * application/pkcs7-signature
            * application/pkcs7-mime
         * org.apache.tika.parser.dwg.{{{./api/org/apache/tika/parser/dwg/DWGParser}DWGParser}}
            * image/vnd.dwg
         * org.apache.tika.parser.epub.{{{./api/org/apache/tika/parser/epub/EpubParser}EpubParser}}
            * application/x-ibooks+zip
            * application/epub+zip
         * org.apache.tika.parser.executable.{{{./api/org/apache/tika/parser/executable/ExecutableParser}ExecutableParser}}
            * application/x-elf
            * application/x-sharedlib
            * application/x-executable
            * application/x-msdownload
            * application/x-coredump
            * application/x-object
         * org.apache.tika.parser.feed.{{{./api/org/apache/tika/parser/feed/FeedParser}FeedParser}}
            * application/atom+xml
            * application/rss+xml
         * org.apache.tika.parser.font.{{{./api/org/apache/tika/parser/font/AdobeFontMetricParser}AdobeFontMetricParser}}
            * application/x-font-adobe-metric
         * org.apache.tika.parser.font.{{{./api/org/apache/tika/parser/font/TrueTypeParser}TrueTypeParser}}
            * application/x-font-ttf
         * org.apache.tika.parser.hdf.{{{./api/org/apache/tika/parser/hdf/HDFParser}HDFParser}}
            * application/x-hdf
         * org.apache.tika.parser.html.{{{./api/org/apache/tika/parser/html/HtmlParser}HtmlParser}}
            * application/x-asp
            * application/xhtml+xml
            * application/vnd.wap.xhtml+xml
            * text/html
         * org.apache.tika.parser.image.{{{./api/org/apache/tika/parser/image/ImageParser}ImageParser}}
            * image/x-ms-bmp
            * image/png
            * image/x-icon
            * image/vnd.wap.wbmp
            * image/gif
            * image/bmp
            * image/x-xcf
         * org.apache.tika.parser.image.{{{./api/org/apache/tika/parser/image/PSDParser}PSDParser}}
            * image/vnd.adobe.photoshop
         * org.apache.tika.parser.image.{{{./api/org/apache/tika/parser/image/TiffParser}TiffParser}}
            * image/tiff
         * org.apache.tika.parser.iptc.{{{./api/org/apache/tika/parser/iptc/IptcAnpaParser}IptcAnpaParser}}
            * text/vnd.iptc.anpa
         * org.apache.tika.parser.iwork.{{{./api/org/apache/tika/parser/iwork/IWorkPackageParser}IWorkPackageParser}}
            * application/vnd.apple.iwork
            * application/vnd.apple.numbers
            * application/vnd.apple.keynote
            * application/vnd.apple.pages
         * org.apache.tika.parser.jpeg.{{{./api/org/apache/tika/parser/jpeg/JpegParser}JpegParser}}
            * image/jpeg
         * org.apache.tika.parser.mail.{{{./api/org/apache/tika/parser/mail/RFC822Parser}RFC822Parser}}
            * message/rfc822
         * org.apache.tika.parser.mbox.{{{./api/org/apache/tika/parser/mbox/MboxParser}MboxParser}}
            * application/mbox
         * org.apache.tika.parser.mbox.{{{./api/org/apache/tika/parser/mbox/OutlookPSTParser}OutlookPSTParser}}
            * application/vnd.ms-outlook-pst
         * org.apache.tika.parser.microsoft.{{{./api/org/apache/tika/parser/microsoft/OfficeParser}OfficeParser}}
            * application/x-mspublisher
            * application/x-tika-msoffice
            * application/vnd.ms-excel
            * application/sldworks
  

[jira] [Commented] (TIKA-1319) Translation

2014-06-10 Thread Ray Gauss II (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026703#comment-14026703
 ] 

Ray Gauss II commented on TIKA-1319:


[~gagravarr], that comment seems to be more closely related to TIKA-1328.

Should we combine these issues?



[jira] [Commented] (TIKA-1319) Translation

2014-06-10 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026748#comment-14026748
 ] 

Chris A. Mattmann commented on TIKA-1319:
-

Hey Ray, yeah I think Nick's comments are related to TIKA-1328. Also I resolved 
this issue and committed this as a first start, incremental patch. Let's head 
over to TIKA-1328 and think about how to flesh this out more.



[jira] [Commented] (TIKA-1328) Translate Metadata and Content

2014-06-10 Thread Ray Gauss II (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026783#comment-14026783
 ] 

Ray Gauss II commented on TIKA-1328:


Leaning towards the whitelist approach, perhaps we could add an 
{{isTranslatable}} field / method and corresponding constructor to the 
{{Property}} class (with a default of false) and update the properties we want 
to support translation on?
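
The whitelist approach could be sketched as follows. This is a Python illustration only: the `Property` class here is a hypothetical stand-in, not Tika's `Property` class, and the field name `is_translatable` simply mirrors the suggestion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """Hypothetical metadata property; not Tika's actual Property class."""
    name: str
    is_translatable: bool = False  # whitelist: properties must opt in


TITLE = Property("title", is_translatable=True)
CREATED = Property("created")  # dates keep the default and are never translated


def values_to_translate(metadata):
    """Return only the values whose property is whitelisted for translation."""
    return {prop.name: value for prop, value in metadata.items()
            if prop.is_translatable}


selected = values_to_translate({TITLE: "hello", CREATED: "2014-06-10"})
print(selected)
```

With a default of false, a newly added property is left untranslated until someone deliberately marks it.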

> Translate Metadata and Content
> --
>
> Key: TIKA-1328
> URL: https://issues.apache.org/jira/browse/TIKA-1328
> Project: Tika
> Issue Type: New Feature
> Reporter: Tyler Palsulich
> Fix For: 1.7
>
>
> Right now, Translation is only done on Strings. Ideally, users would be able 
> to "turn on" translation while parsing. I can think of a couple options:
> - Make a TranslateAutoDetectParser. Automatically detect the file type, parse 
> it, then translate the content.
> - Make a Context switch. When true, translate the content regardless of the 
> parser used. I'm not sure the best way to go about this method, but I prefer 
> it over another Parser.
> Regardless, we need a black or white list for translation. I think black list 
> would be the way to go -- which fields should not be translated (dates, 
> versions, ...) Any ideas? Also, somewhat unrelated, does anyone know of any 
> other open source translation libraries? If we were really lucky, it wouldn't 
> depend on an online service.





Re: Review Request 22246: New parser for Matlab .mat files

2014-06-10 Thread Ann Burgess

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22246/
---

(Updated June 10, 2014, 6:17 p.m.)


Review request for tika and Chris Mattmann.


Changes
---

.mat file used for unit test. 


Bugs: tika-1327
https://issues.apache.org/jira/browse/tika-1327


Repository: tika


Description
---

This is a new parser for Matlab .mat files.  The parser uses JMatIO, a Java 
API for Matlab MAT-file I/O, which is available through Maven Central.  
The text output from this parser provides the names and dimensions of variables 
both inside and outside of data structures, but does NOT provide the actual 
data values within each .mat file. 


Diffs
-

  trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml 1601492 
  trunk/tika-parsers/pom.xml 1601492 
  trunk/tika-parsers/src/main/java/org/apache/tika/parser/mat/MatParser.java PRE-CREATION 
  trunk/tika-parsers/src/test/java/org/apache/tika/parser/mat/MatParserTest.java PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/breidamerkurjokull_radar_profiles_2009.mat UNKNOWN 

Diff: https://reviews.apache.org/r/22246/diff/


Testing
---

Successfully ran a basic unit test that checks both --text and --metadata 
parser output.  


File Attachments (updated)


TIKA-1327.aburgess.140606.patch.txt
  https://reviews.apache.org/media/uploaded/files/2014/06/06/3babeb42-6e15-4d31-ae7d-9dc7ef4c5f65__TIKA-1327.aburgess.140606.patch.txt
.mat test file
  https://reviews.apache.org/media/uploaded/files/2014/06/10/43092452-6890-42cc-8254-fcbb1c8e07c6__breidamerkurjokull_radar_profiles_2009.mat


Thanks,

Ann Burgess



[jira] [Commented] (TIKA-1327) New parser for Matlab .mat files

2014-06-10 Thread Ann Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026821#comment-14026821
 ] 

Ann Burgess commented on TIKA-1327:
---

The .mat unit test file is too large for JIRA; it is attached on Review Board here: 
https://reviews.apache.org/media/uploaded/files/2014/06/10/43092452-6890-42cc-8254-fcbb1c8e07c6__breidamerkurjokull_radar_profiles_2009.mat



[jira] [Created] (TIKA-1329) Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

2014-06-10 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1329:
-

Summary: Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser
Key: TIKA-1329
URL: https://issues.apache.org/jira/browse/TIKA-1329
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Tim Allison
Priority: Minor
Fix For: 1.7


Jukka and Nick have a great demo of parsing metadata recursively on the 
[wiki|http://wiki.apache.org/tika/RecursiveMetadata].  For TIKA-1302, I'd like 
to use something similar, and I think that others may find it useful for 
tika-app and tika-server.

I took the code from the wiki and made some modifications.  I'm not sure if we 
should put this in parsers or in a new module for "examples."  Given that I 
think this would be useful for tika-app and tika-server, I'd prefer parsers, 
but I'm open to any input...including "let's not."

I opened up a review board issue here: 
[ rb|http://reviews.apache.org/r/22433/]
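
To illustrate what "parsing metadata recursively" means, here is a small Python sketch. It does not reproduce the wiki code or the proposed RecursiveParserWrapper; it assumes a document is a nested dictionary with `metadata` and optional `embedded` entries, and the `embedded_path` key and `collect_metadata` name are invented for the example.

```python
def collect_metadata(document, path="/"):
    """Return metadata for a document and all embedded documents, depth-first."""
    # Record this document's metadata together with where it sits in the tree.
    results = [dict(document["metadata"], embedded_path=path)]
    for index, child in enumerate(document.get("embedded", [])):
        child_path = f"{path.rstrip('/')}/{index}"
        results.extend(collect_metadata(child, child_path))
    return results


# Hypothetical container: a .docx holding a spreadsheet (with a chart) and a photo.
sample = {
    "metadata": {"name": "report.docx"},
    "embedded": [
        {"metadata": {"name": "table.xlsx"},
         "embedded": [{"metadata": {"name": "chart.png"}}]},
        {"metadata": {"name": "photo.jpg"}},
    ],
}

all_metadata = collect_metadata(sample)
for entry in all_metadata:
    print(entry["embedded_path"], entry["name"])
```

The result is one flat list with an entry per document, container first, so a caller such as an app or server could serialize it without walking the tree again.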






[jira] [Updated] (TIKA-1329) Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

2014-06-10 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1329:
--

Description: 
Jukka and Nick have a great demo of parsing metadata recursively on the 
[wiki|http://wiki.apache.org/tika/RecursiveMetadata].  For TIKA-1302, I'd like 
to use something similar, and I think that others may find it useful for 
tika-app and tika-server.

I took the code from the wiki and made some modifications.  I'm not sure if we 
should put this in parsers or in a new module for "examples."  Given that I 
think this would be useful for tika-app and tika-server, I'd prefer parsers, 
but I'm open to any input...including "let's not."

I opened up a review board issue here: 
[rb|http://reviews.apache.org/r/22433]


  was:
Jukka and Nick have a great demo of parsing metadata recursively on the 
[wiki|http://wiki.apache.org/tika/RecursiveMetadata].  For TIKA-1302, I'd like 
to use something similar, and I think that others may find it useful for 
tika-app and tika-server.

I took the code from the wiki and made some modifications.  I'm not sure if we 
should put this in parsers or in a new module for "examples."  Given that I 
think this would be useful for tika-app and tika-server, I'd prefer parsers, 
but I'm open to any input...including "let's not."

I opened up a review board issue here: 
[ rb|http://reviews.apache.org/r/22433/]





Review Request 22433: Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

2014-06-10 Thread Tim Allison

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22433/
---

Review request for tika.


Repository: tika


Description
---

Jukka and Nick have a great demo of parsing metadata recursively on the wiki. 
For TIKA-1302, I'd like to use something similar, and I think that others may 
find it useful for tika-app and tika-server.

I took the code from the wiki and made some modifications. I'm not sure if we 
should put this in parsers or in a new module for "examples." Given that I 
think this would be useful for tika-app and tika-server, I'd prefer parsers, 
but I'm open to any input...including "let's not."


Diffs
-

  
/trunk/tika-parsers/src/main/java/org/apache/tika/parser/RecursiveParserWrapper.java
 PRE-CREATION 
  
/trunk/tika-parsers/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/22433/diff/


Testing
---

Included basic unit tests.


File Attachments


test file
  https://reviews.apache.org/media/uploaded/files/2014/06/10/8cd34908-0edf-47d8-8eda-b33ef4e47e7d__test_recursive_embedded.docx


Thanks,

Tim Allison



[jira] [Commented] (TIKA-1329) Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

2014-06-10 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027122#comment-14027122
 ] 

Nick Burch commented on TIKA-1329:
--

Possibly related is the "Example code in documentation?" thread - 
http://mail-archives.apache.org/mod_mbox/tika-dev/201406.mbox/%3Calpine.DEB.2.02.1406022014580.13115%40urchin.earth.li%3E






[jira] [Commented] (TIKA-1319) Translation

2014-06-10 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027126#comment-14027126
 ] 

Nick Burch commented on TIKA-1319:
--

Sorry, was an idea that came to me when walking home, looks like I grabbed the 
wrong jira out of my inbox when posting it later... For someone with karma, 
please feel free to move over!

> Translation
> ---
>
> Key: TIKA-1319
> URL: https://issues.apache.org/jira/browse/TIKA-1319
> Project: Tika
> Issue Type: New Feature
> Reporter: Tyler Palsulich
> Assignee: Chris A. Mattmann
> Priority: Minor
> Fix For: 1.6
>
>
> I just opened up a review on reviews.apache.org -- 
> https://reviews.apache.org/r/22219/. I copied the description below. 
> This patch adds basic language translation functionality to Tika. Translation 
> is provided by a Microsoft API, but accessed through Apache 2 licensed 
> com.memetix.microsoft-translator-java-api 
> (https://code.google.com/p/microsoft-translator-java-api/ ). If a user wants 
> to use the translation feature, they have to add a client id and client 
> secret to the 
> tika-core/src/main/resources/org/apache/tika/language/translator.properties 
> file (see http://msdn.microsoft.com/en-us/library/hh454950.aspx ). I added 
> com.memetix as a dependency in tika-core. I put the Translator class in 
> org.apache.tika.language. There is no integration with the server or CLI, 
> yet. Further, only Strings are translated right now -- if you pass in a full 
> document with xml tags, the structure will be mangled. But, I think that 
> would be a cool feature -- translate the body, title, subtitle, etc, but not 
> the structural elements. 
> There is still more work to do, but I wanted some more eyes on this to make 
> sure I'm heading in the right direction and this is a desired feature. Let me 
> know what you think!
> There are two simple unit tests for now which translate "hello" to French 
> ("salut"). One for inputting the source and target languages, one for 
> inputting just the target language (and detecting the source language 
> automatically).
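
As a rough illustration of the configuration step described in the quoted text, the sketch below checks whether a properties file carries both credentials. The key names `translator.client-id` and `translator.client-secret` are assumptions made for this example; the patch may use different ones.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not taken from the patch: decides whether the
// translator has been given a client id and a client secret.
final class TranslatorConfigSketch {

    // Assumed key names; the real translator.properties may differ.
    static final String ID_KEY = "translator.client-id";
    static final String SECRET_KEY = "translator.client-secret";

    static boolean isConfigured(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail.
            throw new IllegalStateException(e);
        }
        String id = props.getProperty(ID_KEY, "").trim();
        String secret = props.getProperty(SECRET_KEY, "").trim();
        return !id.isEmpty() && !secret.isEmpty();
    }

    public static void main(String[] args) {
        String filled = ID_KEY + "=my-id\n" + SECRET_KEY + "=my-secret\n";
        String blank = ID_KEY + "=\n" + SECRET_KEY + "=\n";
        System.out.println(isConfigured(filled));
        System.out.println(isConfigured(blank));
    }
}
```

A check like this would let the translation feature fail early with a clear message when the credentials have been left blank.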





[jira] [Commented] (TIKA-93) OCR support

2014-06-10 Thread Luis Filipe Nassif (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027139#comment-14027139
 ] 

Luis Filipe Nassif commented on TIKA-93:


Hi [~tpalsulich],

I think the option to configure the tesseract path is very useful. For example, 
I can distribute tesseract binaries together with my app and do not need to 
change environment variables on the end user's OS.
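
To make the use case concrete, here is a minimal sketch of how an application might build the command line for a bundled tesseract binary. `TesseractCommandSketch` and `buildCommand` are hypothetical names and not part of any attached patch; the argument order follows tesseract's usual `tesseract imagename outputbase [-l lang]` form.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the TIKA-93 patches: builds the command
// for a tesseract binary that may live in an application-specific folder.
final class TesseractCommandSketch {

    // If tesseractDir is null or empty, the binary is resolved via PATH.
    static List<String> buildCommand(String tesseractDir, String image,
                                     String outputBase, String language) {
        String exe = (tesseractDir == null || tesseractDir.isEmpty())
                ? "tesseract"
                : new File(tesseractDir, "tesseract").getPath();
        List<String> cmd = new ArrayList<String>();
        cmd.add(exe);
        cmd.add(image);      // input image
        cmd.add(outputBase); // tesseract appends ".txt" for plain-text output
        if (language != null && !language.isEmpty()) {
            cmd.add("-l");
            cmd.add(language);
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("vendor/tesseract", "scan.png", "scan", "eng");
        System.out.println(cmd);
        // To actually run it:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

With a configurable directory, the same code works whether tesseract is installed system-wide or shipped next to the application.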

> OCR support
> ---
>
> Key: TIKA-93
> URL: https://issues.apache.org/jira/browse/TIKA-93
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
> Assignee: Chris A. Mattmann
> Priority: Minor
> Fix For: 1.7
>
> Attachments: TIKA-93.patch, TIKA-93.patch, TIKA-93.patch, 
> TIKA-93.patch, TesseractOCRParser.patch, TesseractOCRParser.patch, 
> TesseractOCR_Tyler.patch, TesseractOCR_Tyler_v2.patch, testOCR.docx, 
> testOCR.pdf, testOCR.pptx
>
>
> I don't know of any decent open source pure Java OCR libraries, but there are 
> command line OCR tools like Tesseract 
> (http://code.google.com/p/tesseract-ocr/) that could be invoked by Tika to 
> extract text content (where available) from image files.





[jira] [Commented] (TIKA-1329) Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

2014-06-10 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027283#comment-14027283
 ] 

Tim Allison commented on TIKA-1329:
---

Yes, that was on my mind. For basic snippets, I think that is the way to go.  
I've found this particular chunk of code to be useful enough (and non-trivial 
enough) that I'd like to promote it to something somewhere in Tika, and I'd 
like to make the output available as JSON at least via tika-server (and 
probably tika-app, too).  






[jira] [Commented] (TIKA-1327) New parser for Matlab .mat files

2014-06-10 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027318#comment-14027318
 ] 

Chris A. Mattmann commented on TIKA-1327:
-

Got Annie's patch integrated, and tested, works awesome! Also managed to fix 
the tika-bundle issue with Felix (simple 1 liner, but since I had to update 
Annie's matlab parser in the tika-bundle Felix magic, I went ahead and fixed it 
too as part of this patch). Here comes the commit!

{noformat}
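[INFO] Apache Tika parent ............................... SUCCESS [2.083s]
[INFO] Apache Tika core ................................. SUCCESS [25.944s]
[INFO] Apache Tika parsers .............................. SUCCESS [1:26.269s]
[INFO] Apache Tika XMP .................................. SUCCESS [4.044s]
[INFO] Apache Tika serialization ........................ SUCCESS [4.441s]
[INFO] Apache Tika application .......................... SUCCESS [20.374s]
[INFO] Apache Tika OSGi bundle .......................... SUCCESS [25.724s]
[INFO] Apache Tika server ............................... SUCCESS [24.425s]
[INFO] Apache Tika translate ............................ SUCCESS [2.302s]
[INFO] Apache Tika Java-7 Components .................... SUCCESS [3.327s]
[INFO] Apache Tika ...................................... SUCCESS [0.020s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3:19.889s
[INFO] Finished at: Tue Jun 10 21:27:53 EDT 2014
[INFO] Final Memory: 86M/213M
[INFO] ------------------------------------------------------------------------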
{noformat}







[jira] [Resolved] (TIKA-1327) New parser for Matlab .mat files

2014-06-10 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann resolved TIKA-1327.
-

   Resolution: Fixed
Fix Version/s: 1.6

- patch applied in r1601805. The test file associated with this commit is quite 
big - [~annieburgess] you may want to try and find a smaller one. Thanks so 
much Annie!

> New parser for Matlab .mat files
> 
>
> Key: TIKA-1327
> URL: https://issues.apache.org/jira/browse/TIKA-1327
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.5
> Reporter: Ann Burgess
> Assignee: Chris A. Mattmann
> Labels: parser
> Fix For: 1.6
>
>
> New parser for Matlab .mat files. 





[jira] [Commented] (TIKA-1327) New parser for Matlab .mat files

2014-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027381#comment-14027381
 ] 

Hudson commented on TIKA-1327:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #38 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/38/])
Fix for TIKA-1327: New parser for Matlab .mat files contributed by Annie 
Burgess. (mattmann: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1601805)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-bundle/pom.xml
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* /tika/trunk/tika-parsers/pom.xml
* /tika/trunk/tika-parsers/src/test/resources/test-documents/breidamerkurjokull_radar_profiles_2009.mat







[jira] [Commented] (TIKA-1327) New parser for Matlab .mat files

2014-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027387#comment-14027387
 ] 

Hudson commented on TIKA-1327:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #38 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/38/])
Add in the Java code to go along with r1601805: TIKA-1327: Matlab parser from 
Annie Burgess. (mattmann: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1601807)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mat
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mat/MatParser.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mat
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mat/MatParserTest.java
Fix for TIKA-1327: New parser for Matlab .mat files contributed by Annie 
Burgess. (mattmann: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1601805)
* /tika/trunk/CHANGES.txt
* /tika/trunk/tika-bundle/pom.xml
* /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* /tika/trunk/tika-parsers/pom.xml
* /tika/trunk/tika-parsers/src/test/resources/test-documents/breidamerkurjokull_radar_profiles_2009.mat







tika-trunk-jdk1.7 - Build # 39 - Failure

2014-06-10 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-trunk-jdk1.7 (build #39)

Status: Failure

Check console output at https://builds.apache.org/job/tika-trunk-jdk1.7/39/ to 
view the results.

[jira] [Commented] (TIKA-1327) New parser for Matlab .mat files

2014-06-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027407#comment-14027407
 ] 

Hudson commented on TIKA-1327:
--

FAILURE: Integrated in tika-trunk-jdk1.7 #39 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/39/])
Add in the Java code to go along with r1601805: TIKA-1327: Matlab parser from 
Annie Burgess. (mattmann: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1601807)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mat
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mat/MatParser.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mat
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/mat/MatParserTest.java




