[jira] [Commented] (TIKA-4210) Not able to identify tika extension

    [ https://issues.apache.org/jira/browse/TIKA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827193#comment-17827193 ]

Tim Allison commented on TIKA-4210:
-----------------------------------

Those files look like this in the rtf file:
{code:java}
{\pict\wbitmap0\picw14\pich26\wbmbitspixel1\wbmplanes1\wbmwidthbytes2\picwGoal210\pichGoal390
fffcbffc9ffc8ffc87fc83fc81fc80fc807c803c801c800c8004800c801c803c807c80fc81fc83fc87fc8ffc9ffcbffcfffcfffc}\
{code}
and
{code:java}
{\pict\wbitmap0\picw173\pich7\wbmbitspixel1\wbmplanes1\wbmwidthbytes22\picwGoal2076\pichGoal84
fff8c7f8ff1fe3fc7f8ff1fe3fc7f8ff1fe3fc7f8ff1fe38b7f6fedfdbfb7f6fedfdbfb7f6fedfdbfb7f6fedfdb893f27e4fc9f93f27e4fc9f93f27e4fc9f93f27e4fc98b7f6fedfdbfb7f6fedfdbfb7f6fedfdbfb7f6fedfdb8cff9ff3fe7fcff9ff3fe7fcff9ff3fe7fcff9ff3fe78fff8}
{code}

> Not able to identify tika extension
> -----------------------------------
>
>                 Key: TIKA-4210
>                 URL: https://issues.apache.org/jira/browse/TIKA-4210
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tika User
>            Priority: Major
>         Attachments: sample.DOC
>
>
> Hi Team,
> The attached embedded file contain .MPGA attachments which tika is not able
> to identify its extension. Tried in in tika versions 2.9.0 and 2.9.1 still
> showing it as empty. Please look into this.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
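
To make the first {{\pict}} group above easier to inspect, here is a minimal sketch that turns its hex payload into rows of text using only the JDK. The class {{RawBitmapSketch}} and its methods are hypothetical helpers, not part of Tika. The sketch assumes one bit per pixel (as {{\wbmbitspixel1}} suggests), that the most significant bit of each byte is the leftmost pixel, and that a 0 bit is ink; the last two are guesses that happen to fit the reporter's description of a black arrow.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Tika: renders \wbitmap hex data as text rows. */
final class RawBitmapSketch {

    /** Hex payload of the first \pict group quoted above (\picw14, \pich26, \wbmwidthbytes2). */
    static final String ARROW_HEX =
            "fffcbffc9ffc8ffc87fc83fc81fc80fc807c803c801c800c8004"
          + "800c801c803c807c80fc81fc83fc87fc8ffc9ffcbffcfffcfffc";

    /** Converts a hex string (two characters per byte) into bytes. */
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string has odd length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /**
     * Renders a 1-bit-per-pixel bitmap, one string per row.
     * Assumptions: MSB is the leftmost pixel; a 0 bit is drawn as '#', a 1 bit as '.'.
     */
    static List<String> render(byte[] data, int width, int height, int widthBytes) {
        if (data.length != height * widthBytes) {
            throw new IllegalArgumentException(
                    "expected " + (height * widthBytes) + " bytes, got " + data.length);
        }
        List<String> rows = new ArrayList<>();
        for (int y = 0; y < height; y++) {
            StringBuilder row = new StringBuilder(width);
            for (int x = 0; x < width; x++) {
                int b = data[y * widthBytes + x / 8] & 0xFF;
                boolean bitSet = (b & (0x80 >> (x % 8))) != 0;
                row.append(bitSet ? '.' : '#');
            }
            rows.add(row.toString());
        }
        return rows;
    }

    public static void main(String[] args) {
        // Width, height and row stride are taken from the \pict header above.
        for (String row : render(hexToBytes(ARROW_HEX), 14, 26, 2)) {
            System.out.println(row);
        }
    }
}
```

Running {{main}} prints 26 rows of 14 characters each. If the bit order or polarity assumption is wrong for a given file, the picture comes out mirrored or inverted rather than failing.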
[jira] [Commented] (TIKA-4210) Not able to identify tika extension

    [ https://issues.apache.org/jira/browse/TIKA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827191#comment-17827191 ]

Tim Allison commented on TIKA-4210:
-----------------------------------

Nick is right. The file is an RTF file. Tika does find two embedded files identified as x-rtf-raw-bitmap. We don't have a parser for that format, I don't think.
{code:java}
[
  {
    "Content-Length": "19619",
    "Content-Type": "application/rtf",
    "X-TIKA:Parsed-By": [
      "org.apache.tika.parser.DefaultParser",
      "org.apache.tika.parser.microsoft.rtf.RTFParser"
    ],
    "X-TIKA:Parsed-By-Full-Set": [
      "org.apache.tika.parser.DefaultParser",
      "org.apache.tika.parser.microsoft.rtf.RTFParser",
      "org.apache.tika.parser.EmptyParser"
    ],
    "X-TIKA:content": "...",
    "X-TIKA:content_handler": "ToTextContentHandler",
    "X-TIKA:embedded_depth": "0",
    "X-TIKA:parse_time_millis": "143",
    "resourceName": "sample.DOC.rtf"
  },
  {
    "Content-Length": "52",
    "Content-Type": "image/x-rtf-raw-bitmap",
    "Content-Type-Parser-Override": "image/x-rtf-raw-bitmap",
    "X-TIKA:Parsed-By": "org.apache.tika.parser.EmptyParser",
    "X-TIKA:embedded_depth": "1",
    "X-TIKA:embedded_id": "1",
    "X-TIKA:embedded_id_path": "/1",
    "X-TIKA:embedded_resource_path": "/file_0",
    "X-TIKA:parse_time_millis": "1",
    "resourceName": "file_0",
    "rtf_meta:thumbnail": "false"
  },
  {
    "Content-Length": "154",
    "Content-Type": "image/x-rtf-raw-bitmap",
    "Content-Type-Parser-Override": "image/x-rtf-raw-bitmap",
    "X-TIKA:Parsed-By": "org.apache.tika.parser.EmptyParser",
    "X-TIKA:embedded_depth": "1",
    "X-TIKA:embedded_id": "2",
    "X-TIKA:embedded_id_path": "/2",
    "X-TIKA:embedded_resource_path": "/file_1",
    "X-TIKA:parse_time_millis": "0",
    "resourceName": "file_1",
    "rtf_meta:thumbnail": "false"
  }
]
{code}
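
As a cross-check on the metadata in this comment, the two {{\pict}} headers quoted elsewhere in this thread give {{\wbmwidthbytes2}} with {{\pich26}} and {{\wbmwidthbytes22}} with {{\pich7}}. Multiplying row stride by height gives 52 and 154, the same values as the two embedded {{Content-Length}} entries. The sketch below does that arithmetic with a small regex over the control words. {{PictHeaderSketch}} is a hypothetical helper, not a Tika class, and it assumes the payload size is simply stride times height for a single-plane bitmap.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Tika: reads numeric control words from a \pict header. */
final class PictHeaderSketch {

    // An RTF control word: a backslash, ASCII letters, then an optional signed integer.
    private static final Pattern CONTROL_WORD =
            Pattern.compile("\\\\([a-zA-Z]+)(-?\\d+)?");

    /** Maps each control word to its numeric parameter (null when it has none). */
    static Map<String, Integer> parseControlWords(String header) {
        Map<String, Integer> words = new LinkedHashMap<>();
        Matcher m = CONTROL_WORD.matcher(header);
        while (m.find()) {
            String param = m.group(2);
            words.put(m.group(1), param == null ? null : Integer.valueOf(param));
        }
        return words;
    }

    /** Assumed payload size in bytes: row stride (\wbmwidthbytes) times height (\pich). */
    static int expectedByteCount(String header) {
        Map<String, Integer> words = parseControlWords(header);
        Integer stride = words.get("wbmwidthbytes");
        Integer height = words.get("pich");
        if (stride == null || height == null) {
            throw new IllegalArgumentException("header lacks \\wbmwidthbytes or \\pich");
        }
        return stride * height;
    }

    public static void main(String[] args) {
        String first = "\\pict\\wbitmap0\\picw14\\pich26\\wbmbitspixel1\\wbmplanes1"
                + "\\wbmwidthbytes2\\picwGoal210\\pichGoal390";
        String second = "\\pict\\wbitmap0\\picw173\\pich7\\wbmbitspixel1\\wbmplanes1"
                + "\\wbmwidthbytes22\\picwGoal2076\\pichGoal84";
        System.out.println(expectedByteCount(first));
        System.out.println(expectedByteCount(second));
    }
}
```

Matching sizes suggest that the embedded resources Tika extracts are the raw pixel bytes with no image header, which fits the EmptyParser result above.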
[jira] [Commented] (TIKA-4210) Not able to identify tika extension

    [ https://issues.apache.org/jira/browse/TIKA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827036#comment-17827036 ]

Tika User commented on TIKA-4210:
---------------------------------

The attached file has a .doc extension, and Tika should detect two more files embedded in it. For those two files, the extension Tika reports is empty.
first image : black arrow symbol
second image : dotted symbol
[jira] [Commented] (TIKA-4210) Not able to identify tika extension

    [ https://issues.apache.org/jira/browse/TIKA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827017#comment-17827017 ]

Nick Burch commented on TIKA-4210:
----------------------------------

The attached file seems to be an RTF file. I'm not sure what a ".mega attachment" is, but this file doesn't seem to be one of them...

tika-app-2.9.1.jar is able to correctly identify this file as RTF

> Not able to identify tika extension
> -----------------------------------
>
>                 Key: TIKA-4210
>                 URL: https://issues.apache.org/jira/browse/TIKA-4210
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tika User
>            Priority: Major
>         Attachments: sample.DOC
>
>
> Hi Team,
> The attached embedded file contain .mega attachments which tika is not able
> to identify its extension. Tried in in tika versions 2.9.0 and 2.9.1 still
> showing it as empty. Please look into this.