[jira] [Commented] (TIKA-1699) Integrate the GROBID PDF extractor in Tika

2015-08-19 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703387#comment-14703387 ] Nick Burch commented on TIKA-1699: -- Quick one - the wiki mentions needing to do a 6

[jira] [Commented] (TIKA-1711) Modify tika-bundle profile activation to require Java 7

2015-08-19 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703392#comment-14703392 ] Nick Burch commented on TIKA-1711: -- [~bobpaulin] is our resident OSGi guru, happ

[jira] [Commented] (TIKA-1710) Replace usages of classes in org.apache.tika.io with current alternatives

2015-08-20 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704631#comment-14704631 ] Nick Burch commented on TIKA-1710: -- Thanks for this, applied in smaller chunk

[jira] [Commented] (TIKA-1717) Tika throws exception on detecting content-type of a zip file

2015-08-20 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704694#comment-14704694 ] Nick Burch commented on TIKA-1717: -- This looks to be a Commons Compress bug, so migh

[jira] [Commented] (TIKA-1717) Tika throws exception on detecting content-type of a zip file

2015-08-20 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705270#comment-14705270 ] Nick Burch commented on TIKA-1717: -- Once the commons community confirm if it's a

[jira] [Resolved] (TIKA-1711) Remove java6-activated profile from tika-bundle and move its plugins to default build

2015-08-20 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1711. -- Resolution: Fixed Thanks for this, committed in r1696817. > Remove java6-activated profile from t

[jira] [Created] (TIKA-1718) Upgrade to Commons Compress 1.10

2015-08-20 Thread Nick Burch (JIRA)
Nick Burch created TIKA-1718: Summary: Upgrade to Commons Compress 1.10 Key: TIKA-1718 URL: https://issues.apache.org/jira/browse/TIKA-1718 Project: Tika Issue Type: Improvement

[jira] [Resolved] (TIKA-1718) Upgrade to Commons Compress 1.10

2015-08-20 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1718. -- Resolution: Fixed Fix Version/s: 1.11 Upgraded, and various TODOs fixed. If someone's

[jira] [Resolved] (TIKA-1710) Replace usages of classes in org.apache.tika.io with current alternatives

2015-08-20 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1710. -- Resolution: Fixed Thanks for the explanation, Guava dependency removed in r1696860, and

[jira] [Commented] (TIKA-1706) Bring back commons-io to tika-core

2015-08-27 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717510#comment-14717510 ] Nick Burch commented on TIKA-1706: -- There are a non-zero number of parsers out t

[jira] [Commented] (TIKA-1728) Detection is not working properly for detecting HWP 5.0 file

2015-09-03 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728863#comment-14728863 ] Nick Burch commented on TIKA-1728: -- The issue is that the v3 files (and earlier?) ar

[jira] [Commented] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-03 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729467#comment-14729467 ] Nick Burch commented on TIKA-1657: -- I don't think we want a full flat list o

[jira] [Commented] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-03 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729507#comment-14729507 ] Nick Burch commented on TIKA-1657: -- Maybe we should make the options be something

[jira] [Comment Edited] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-03 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729573#comment-14729573 ] Nick Burch edited comment on TIKA-1657 at 9/3/15 6:54 PM: -- L

[jira] [Commented] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-03 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729573#comment-14729573 ] Nick Burch commented on TIKA-1657: -- Let's consider this co

[jira] [Commented] (TIKA-1728) Detection is not working properly for detecting HWP 5.0 file

2015-09-04 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730605#comment-14730605 ] Nick Burch commented on TIKA-1728: -- Whoops, I'd set the wrong parent. Can you

[jira] [Commented] (TIKA-1728) Detection is not working properly for detecting HWP 5.0 file

2015-09-08 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734531#comment-14734531 ] Nick Burch commented on TIKA-1728: -- Detection of the v5 file is handled by the

[jira] [Commented] (TIKA-1728) Detection is not working properly for detecting HWP 5.0 file

2015-09-09 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736533#comment-14736533 ] Nick Burch commented on TIKA-1728: -- {quote}And the v5 file stores "HWP Document

[jira] [Commented] (TIKA-1728) Detection is not working properly for detecting HWP 5.0 file

2015-09-10 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738454#comment-14738454 ] Nick Burch commented on TIKA-1728: -- That's the header of one of the OLE2 stream

[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027

2015-09-18 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14824513#comment-14824513 ] Nick Burch commented on TIKA-1735: -- Any chance you could produce a small DWG file in

[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027

2015-09-18 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14847341#comment-14847341 ] Nick Burch commented on TIKA-1735: -- The test DWG files (from other autocad versions

[jira] [Commented] (TIKA-1740) RecursiveParserWrapper returning ContentHandler-s

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902585#comment-14902585 ] Nick Burch commented on TIKA-1740: -- You might be better off writing your own Recur

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902601#comment-14902601 ] Nick Burch commented on TIKA-1739: -- I can't actually use the cTAKES parser on m

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903340#comment-14903340 ] Nick Burch commented on TIKA-1739: -- I'm not sure that the cTAKES parser

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903389#comment-14903389 ] Nick Burch commented on TIKA-1739: -- We explicitly don't let you set an {{AutoDet

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903391#comment-14903391 ] Nick Burch commented on TIKA-1739: -- We explicitly don't let you set an {{AutoDet

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903390#comment-14903390 ] Nick Burch commented on TIKA-1739: -- We explicitly don't let you set an {{AutoDet

[jira] [Issue Comment Deleted] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-1739: - Comment: was deleted (was: We explicitly don't let you set an {{AutoDetectParser}} in the config,

[jira] [Issue Comment Deleted] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-1739: - Comment: was deleted (was: We explicitly don't let you set an {{AutoDetectParser}} in the config,

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-23 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904156#comment-14904156 ] Nick Burch commented on TIKA-1739: -- My view is that {{AutoDetectParser}} is a spe

[jira] [Resolved] (TIKA-1750) CachedTranslator.isAvailable() throws NPE when underlying translator is null

2015-09-24 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1750. -- Resolution: Fixed Thanks for this, fixed in r1705107. > CachedTranslator.isAvailable() throws NPE w

[jira] [Commented] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-24 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907217#comment-14907217 ] Nick Burch commented on TIKA-1657: -- A couple of things to say up front: * I agree

[jira] [Commented] (TIKA-1085) PDF header and mime detection

2015-09-28 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933108#comment-14933108 ] Nick Burch commented on TIKA-1085: -- I think we're still waiting for you to confi

[jira] [Resolved] (TIKA-1085) PDF header and mime detection

2015-09-29 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1085. -- Resolution: Fixed Fix Version/s: 1.9 1.10 > PDF header and mime detect

[jira] [Commented] (TIKA-1763) StringIndexOutOfBoundsException in ImageMetadataExtractor

2015-10-05 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943693#comment-14943693 ] Nick Burch commented on TIKA-1763: -- Do you have a small test file you could share w

[jira] [Commented] (TIKA-1766) Insecure repository reference

2015-10-11 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952277#comment-14952277 ] Nick Burch commented on TIKA-1766: -- I can't see any reference to the specified u

[jira] [Commented] (TIKA-1762) Create Executor Service from TikaConfig

2015-10-15 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958796#comment-14958796 ] Nick Burch commented on TIKA-1762: -- I'm wondering if the Parser Context is

[jira] [Commented] (TIKA-1762) Create Executor Service from TikaConfig

2015-10-15 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959732#comment-14959732 ] Nick Burch commented on TIKA-1762: -- I'd say we have a small but non-zero number

[jira] [Commented] (TIKA-1773) No XML Metadata output for JP2 files

2015-10-16 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960444#comment-14960444 ] Nick Burch commented on TIKA-1773: -- Are you able to convert an existing Tika test i

[jira] [Commented] (TIKA-1772) Mimetype of VTT files

2015-10-16 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960454#comment-14960454 ] Nick Burch commented on TIKA-1772: -- Thanks for the patch! Couple of minor points

[jira] [Commented] (TIKA-1773) No XML Metadata output for JP2 files

2015-10-16 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960599#comment-14960599 ] Nick Burch commented on TIKA-1773: -- Ah, I think I've found the issue. Based

[jira] [Resolved] (TIKA-1772) Mimetype of VTT files

2015-10-16 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1772. -- Resolution: Fixed Fix Version/s: 1.11 Thanks for that. Looks like we can also do mime magic

[jira] [Comment Edited] (TIKA-1772) Mimetype of VTT files

2015-10-16 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960722#comment-14960722 ] Nick Burch edited comment on TIKA-1772 at 10/16/15 1:4

[jira] [Commented] (TIKA-1774) org.xml.sax.SAXException: Namespace http://www.w3.org/1999/xhtml not declared

2015-10-18 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962489#comment-14962489 ] Nick Burch commented on TIKA-1774: -- This looks like a duplicate of TIKA-1215. See [

[jira] [Commented] (TIKA-1785) Move Tika Parser Configuration Files to org/apache/tika/parser/config directory

2015-11-02 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986274#comment-14986274 ] Nick Burch commented on TIKA-1785: -- Potentially we could change the default location

[jira] [Commented] (TIKA-1788) message/rfc822 parser doesn't identify attachment filenames from Content-Disposition header

2015-11-06 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993539#comment-14993539 ] Nick Burch commented on TIKA-1788: -- Any chance you could write a short junit test sho

[jira] [Commented] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-10 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998814#comment-14998814 ] Nick Burch commented on TIKA-1791: -- There seem to be quite a few changes in the p

[jira] [Commented] (TIKA-1792) Add ASiC-E and ASiC-S mime types

2015-11-10 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998818#comment-14998818 ] Nick Burch commented on TIKA-1792: -- I don't think we can use the mime magic

[jira] [Commented] (TIKA-1792) Add ASiC-E and ASiC-S mime types

2015-11-10 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998841#comment-14998841 ] Nick Burch commented on TIKA-1792: -- Luckily these files use the same mimetype sto

[jira] [Resolved] (TIKA-1792) Add ASiC-E and ASiC-S mime types

2015-11-10 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1792. -- Resolution: Fixed Fix Version/s: (was: 2.0) > Add ASiC-E and ASiC-S mime ty

[jira] [Commented] (TIKA-1792) Add ASiC-E and ASiC-S mime types

2015-11-10 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998898#comment-14998898 ] Nick Burch commented on TIKA-1792: -- Requiring it to be first + uncompressed does in

[jira] [Commented] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-11 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000462#comment-15000462 ] Nick Burch commented on TIKA-1791: -- Thanks for the explanation. Next question -

[jira] [Commented] (TIKA-980) MicrodataContentHandler for Apache Tika

2015-11-11 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000680#comment-15000680 ] Nick Burch commented on TIKA-980: - Taking a look at {{TIKA-980-1.3-5.patch}}, there

[jira] [Resolved] (TIKA-1793) Email file (.eml extension - "message/rfc822") detected as text/html

2015-11-14 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1793. -- Resolution: Fixed Fix Version/s: 1.12 Thanks for that, it looks like thunderbird sometimes puts

[jira] [Commented] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-14 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005618#comment-15005618 ] Nick Burch commented on TIKA-1791: -- Longer term, we want to move instance-specific co

[jira] [Resolved] (TIKA-1791) URI is not hierarchical exception when location model resource is inside a jar in classpath

2015-11-15 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1791. -- Resolution: Fixed Fix Version/s: 1.12 Thanks, applied in r1714492 (with a few other little tweaks

[jira] [Commented] (TIKA-1796) Issues with tika jar and Microsoft documents like doc.,ppt, xls etc

2015-11-17 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15008971#comment-15008971 ] Nick Burch commented on TIKA-1796: -- Firstly, please don't post to the dev lis

[jira] [Commented] (TIKA-1796) Issues with tika jar and Microsoft documents like doc.,ppt, xls etc

2015-11-17 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009146#comment-15009146 ] Nick Burch commented on TIKA-1796: -- All the container aware functionality is

[jira] [Commented] (TIKA-1706) Bring back commons-io to tika-core

2015-11-26 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15028642#comment-15028642 ] Nick Burch commented on TIKA-1706: -- Does anyone have any objections to us going a

[jira] [Commented] (TIKA-1804) Tika use no free json.org

2015-11-30 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031681#comment-15031681 ] Nick Burch commented on TIKA-1804: -- The JSON license has been approved for use by Ap

[jira] [Created] (TIKA-1805) Default parser/detector loading should warn on missing/empty classes

2015-12-01 Thread Nick Burch (JIRA)
Nick Burch created TIKA-1805: Summary: Default parser/detector loading should warn on missing/empty classes Key: TIKA-1805 URL: https://issues.apache.org/jira/browse/TIKA-1805 Project: Tika

[jira] [Resolved] (TIKA-1805) Default parser/detector loading should warn on missing/empty classes

2015-12-01 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1805. -- Resolution: Fixed Changed as of r1717560, along with an additional handler method to alert if a service

[jira] [Commented] (TIKA-1806) Bouncy Castle conflict

2015-12-03 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038105#comment-15038105 ] Nick Burch commented on TIKA-1806: -- I've just tried that file with the Tika A

[jira] [Commented] (TIKA-1813) Figure out file types for several unknown OLE files in Common Crawl

2015-12-16 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060203#comment-15060203 ] Nick Burch commented on TIKA-1813: -- My best guess is that these have been trunc

[jira] [Comment Edited] (TIKA-1813) Figure out file types for several unknown OLE files in Common Crawl

2015-12-16 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060203#comment-15060203 ] Nick Burch edited comment on TIKA-1813 at 12/16/15 3:58 PM:

[jira] [Commented] (TIKA-1773) No XML Metadata output for JP2 files

2015-12-18 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063957#comment-15063957 ] Nick Burch commented on TIKA-1773: -- We can't depend on a LGPL library -

[jira] [Commented] (TIKA-1817) Extracts entire file content for ASCII DXF files

2015-12-21 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067319#comment-15067319 ] Nick Burch commented on TIKA-1817: -- Any chance you could upload a small sample DXF

[jira] [Commented] (TIKA-1817) Extracts entire file content for ASCII DXF files

2015-12-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067943#comment-15067943 ] Nick Burch commented on TIKA-1817: -- There might be license issues with just taking ra

[jira] [Commented] (TIKA-1817) Extracts entire file content for ASCII DXF files

2015-12-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068077#comment-15068077 ] Nick Burch commented on TIKA-1817: -- I've had a go at adding mime subtypes for b

[jira] [Commented] (TIKA-1817) Extracts entire file content for ASCII DXF files

2015-12-23 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070169#comment-15070169 ] Nick Burch commented on TIKA-1817: -- Thanks for that. Test file from JustCAD adde

[jira] [Commented] (TIKA-1821) Problem in Tika().detect for xml file signed in CADES

2016-01-07 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15087613#comment-15087613 ] Nick Burch commented on TIKA-1821: -- Hopefully fixed in r1723581 - the length is par

[jira] [Commented] (TIKA-1824) Tika 2.0 - Create Initial Parser Modules

2016-01-14 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097884#comment-15097884 ] Nick Burch commented on TIKA-1824: -- Tika already supports using a custom classloader

[jira] [Commented] (TIKA-1823) Support detecting DWF format

2016-01-18 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105081#comment-15105081 ] Nick Burch commented on TIKA-1823: -- Do you have a very small sample DWF file (ide

[jira] [Created] (TIKA-1839) Update website inclusion of Examples for Git

2016-01-22 Thread Nick Burch (JIRA)
Nick Burch created TIKA-1839: Summary: Update website inclusion of Examples for Git Key: TIKA-1839 URL: https://issues.apache.org/jira/browse/TIKA-1839 Project: Tika Issue Type: Task

[jira] [Reopened] (TIKA-1840) No way to link slide notes to slide in PPT output.

2016-01-25 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch reopened TIKA-1840: -- Re-opening as the applied patch causes the notes text to be included twice, which isn't ideal, so fu

[jira] [Commented] (TIKA-1841) Different XML output structure for PPT and PPTX

2016-01-25 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115857#comment-15115857 ] Nick Burch commented on TIKA-1841: -- I think it would be good to have the PPT and

[jira] [Resolved] (TIKA-1823) Support detecting DWF format

2016-01-26 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1823. -- Resolution: Fixed Fix Version/s: 1.13 Thanks, I've added this magic, along with a unit test

[jira] [Commented] (TIKA-1843) Tika parser for SEG-Y files and new MIME type application/segy

2016-01-29 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123552#comment-15123552 ] Nick Burch commented on TIKA-1843: -- Looks like Sigrun is an active project, so best

[jira] [Commented] (TIKA-1843) Tika parser for SEG-Y files and new MIME type application/segy

2016-01-29 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123612#comment-15123612 ] Nick Burch commented on TIKA-1843: -- Getting a maven-built project into the Sonatype

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126317#comment-15126317 ] Nick Burch commented on TIKA-1845: -- Near the top of the jira page are some but

[jira] [Commented] (TIKA-1843) Tika parser for SEG-Y files and new MIME type application/segy

2016-02-01 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126450#comment-15126450 ] Nick Burch commented on TIKA-1843: -- Ideally you'd work with the Sigrun owner to

[jira] [Commented] (TIKA-1841) Different XML output structure for PPT and PPTX

2016-02-01 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126532#comment-15126532 ] Nick Burch commented on TIKA-1841: -- Ideally we would break out the header and footer

[jira] [Commented] (TIKA-1848) Address issues with Tika 1.12rc#1

2016-02-03 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130376#comment-15130376 ] Nick Burch commented on TIKA-1848: -- I'm not sure if our test files should hav

[jira] [Resolved] (TIKA-1821) Problem in Tika().detect for xml file signed in CADES

2016-02-03 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1821. -- Resolution: Fixed Fix Version/s: 1.13 Thanks for these, I've used them to add unit tests

[jira] [Commented] (TIKA-1850) Tika erroneously detects some versions of jQuery as "text/html"

2016-02-03 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130678#comment-15130678 ] Nick Burch commented on TIKA-1850: -- Looks like a duplicate to me, are you happy to c

[jira] [Commented] (TIKA-1141) javascript files that contain "

2016-02-03 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130723#comment-15130723 ] Nick Burch commented on TIKA-1141: -- I've tweaked the mime magic for HTML, s

[jira] [Commented] (TIKA-1850) Tika erroneously detects some versions of jQuery as "text/html"

2016-02-03 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130860#comment-15130860 ] Nick Burch commented on TIKA-1850: -- Please grab a nightly build / build from git,

[jira] [Commented] (TIKA-1850) Tika erroneously detects some versions of jQuery as "text/html"

2016-02-04 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132249#comment-15132249 ] Nick Burch commented on TIKA-1850: -- It's showing up for me in the snapshots r

[jira] [Commented] (TIKA-1856) Error while parsing an ogg file

2016-02-16 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148629#comment-15148629 ] Nick Burch commented on TIKA-1856: -- Picking one of those files to look at, {{oggz-

[jira] [Commented] (TIKA-1858) Unable to extract content from chunked portion of large file

2016-02-17 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150370#comment-15150370 ] Nick Burch commented on TIKA-1858: -- Other than a handful of text-based file types,

[jira] [Commented] (TIKA-1859) file poi reads tika does not bring the content

2016-02-17 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150596#comment-15150596 ] Nick Burch commented on TIKA-1859: -- Which file? How isn't it working? How are yo

[jira] [Resolved] (TIKA-1856) Error while parsing an ogg file

2016-02-17 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1856. -- Resolution: Fixed Fix Version/s: 1.13 The fix was fairly quick in the end, but the process of

[jira] [Resolved] (TIKA-1862) Exception in thread "Thread-9" java.lang.UnsatisfiedLinkError: /usr/lib/jvm/jre/lib/amd64/headless/libmawt.so: libcups.so.2: cannot open shared object file: No such file

2016-02-19 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1862. -- Resolution: Invalid This isn't a Tika issue. You either need to fix your JVM installation, or tal

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-02-19 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154208#comment-15154208 ] Nick Burch commented on TIKA-1607: -- We have generally required those developing a pa

[jira] [Commented] (TIKA-1864) org.apache.poi.hssf.record.formula.UnaryPlusPtg package for tika-app-1.10

2016-02-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156790#comment-15156790 ] Nick Burch commented on TIKA-1864: -- First up, I'd suggest you upgrade to Apache

[jira] [Commented] (TIKA-1867) Tika external parsers cannot be turned off without patching the tika-app-XX.jar

2016-02-23 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159042#comment-15159042 ] Nick Burch commented on TIKA-1867: -- You should be able to exclude

[jira] [Commented] (TIKA-1868) create clean tika-server jar and shaded classifier jar

2016-02-24 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15162915#comment-15162915 ] Nick Burch commented on TIKA-1868: -- As explained by several people on the mailing

[jira] [Commented] (TIKA-1869) Jackson update to latest version

2016-02-24 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15162917#comment-15162917 ] Nick Burch commented on TIKA-1869: -- Could you try bumping the version in your

[jira] [Commented] (TIKA-1868) create clean tika-server jar and shaded classifier jar

2016-02-24 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163016#comment-15163016 ] Nick Burch commented on TIKA-1868: -- I'm not sure why you'd want to be u

[jira] [Commented] (TIKA-1870) Relocating RichTextContentHandler into tika-core from tika-server

2016-02-24 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163117#comment-15163117 ] Nick Burch commented on TIKA-1870: -- Currently the class lacks javadocs to explain wha
