[jira] [Commented] (TIKA-4210) Not able to identify tika extension

2024-03-14 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827193#comment-17827193
 ] 

Tim Allison commented on TIKA-4210:
---

In the RTF file, those two embedded files look like this:

{code:java}
{\pict\wbitmap0\picw14\pich26\wbmbitspixel1\wbmplanes1\wbmwidthbytes2\picwGoal210\pichGoal390
 
fffcbffc9ffc8ffc87fc83fc81fc80fc807c803c801c800c8004800c801c803c807c80fc81fc83fc87fc8ffc9ffcbffcfffcfffc}\
{code}
and

{code:java}
 
{\pict\wbitmap0\picw173\pich7\wbmbitspixel1\wbmplanes1\wbmwidthbytes22\picwGoal2076\pichGoal84
 
fff8c7f8ff1fe3fc7f8ff1fe3fc7f8ff1fe3fc7f8ff1fe38b7f6fedfdbfb7f6fedfdbfb7f6fedfdbfb7f6fedfdb893f27e4fc9f93f27e4fc9f93f27e4fc9f93f27e4fc98b7f6fedfdbfb7f6fedfdbfb7f6fedfdbfb7f6fedfdb8cff9ff3fe7fcff9ff3fe7fcff9ff3fe7fcff9ff3fe78fff8}
 {code}
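
For illustration, the first fragment can be decoded by hand. This is only a sketch under two assumptions: that the hex is packed 1-bit-per-pixel rows of \wbmwidthbytes bytes each (2 bytes x 26 rows = 52 bytes), and that a 0 bit is a dark pixel. `render_raw_bitmap` is a hypothetical helper, not part of Tika.

```python
def render_raw_bitmap(hex_data, width_px, height_px, width_bytes):
    """Render 1-bit-per-pixel rows as text: '#' for a 0 bit, '.' for a 1 bit.

    Assumption: 0 bits are dark pixels and rows are stored top to bottom.
    """
    data = bytes.fromhex(hex_data)
    if len(data) != width_bytes * height_px:
        raise ValueError("hex length does not match width_bytes * height_px")
    rows = []
    for y in range(height_px):
        row = data[y * width_bytes:(y + 1) * width_bytes]
        # Expand each byte to 8 bits, then drop the padding bits past width_px.
        bits = "".join(format(byte, "08b") for byte in row)[:width_px]
        rows.append("".join("#" if bit == "0" else "." for bit in bits))
    return rows


# Hex payload of the first \pict group quoted above
# (\picw14 \pich26 \wbmwidthbytes2).
ARROW_HEX = (
    "fffcbffc9ffc8ffc87fc83fc81fc80fc807c803c801c800c8004"
    "800c801c803c807c80fc81fc83fc87fc8ffc9ffcbffcfffcfffc"
)

arrow_rows = render_raw_bitmap(ARROW_HEX, width_px=14, height_px=26, width_bytes=2)
print("\n".join(arrow_rows))
```

The same call with width_px=173, height_px=7, width_bytes=22 should apply to the second fragment, since 22 x 7 = 154 bytes.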

 

> Not able to identify tika extension
> ---
>
> Key: TIKA-4210
> URL: https://issues.apache.org/jira/browse/TIKA-4210
> Project: Tika
> Issue Type: Bug
> Reporter: Tika User
> Priority: Major
> Attachments: sample.DOC
>
>
> Hi Team,
> The attached file contains embedded .MPGA attachments whose extension Tika is
> not able to identify. Tried in Tika versions 2.9.0 and 2.9.1; the extension
> still shows as empty. Please look into this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TIKA-4210) Not able to identify tika extension

2024-03-14 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827191#comment-17827191
 ] 

Tim Allison commented on TIKA-4210:
---

Nick is right: the file is an RTF file. Tika does find two embedded files,
both identified as image/x-rtf-raw-bitmap. I don't think we have a parser for
that format.
{code:java}
[
    {
        "Content-Length": "19619",
        "Content-Type": "application/rtf",
        "X-TIKA:Parsed-By": [
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.microsoft.rtf.RTFParser"
        ],
        "X-TIKA:Parsed-By-Full-Set": [
            "org.apache.tika.parser.DefaultParser",
            "org.apache.tika.parser.microsoft.rtf.RTFParser",
            "org.apache.tika.parser.EmptyParser"
        ],
        "X-TIKA:content": "...",
        "X-TIKA:content_handler": "ToTextContentHandler",
        "X-TIKA:embedded_depth": "0",
        "X-TIKA:parse_time_millis": "143",
        "resourceName": "sample.DOC.rtf"
    },
    {
        "Content-Length": "52",
        "Content-Type": "image/x-rtf-raw-bitmap",
        "Content-Type-Parser-Override": "image/x-rtf-raw-bitmap",
        "X-TIKA:Parsed-By": "org.apache.tika.parser.EmptyParser",
        "X-TIKA:embedded_depth": "1",
        "X-TIKA:embedded_id": "1",
        "X-TIKA:embedded_id_path": "/1",
        "X-TIKA:embedded_resource_path": "/file_0",
        "X-TIKA:parse_time_millis": "1",
        "resourceName": "file_0",
        "rtf_meta:thumbnail": "false"
    },
    {
        "Content-Length": "154",
        "Content-Type": "image/x-rtf-raw-bitmap",
        "Content-Type-Parser-Override": "image/x-rtf-raw-bitmap",
        "X-TIKA:Parsed-By": "org.apache.tika.parser.EmptyParser",
        "X-TIKA:embedded_depth": "1",
        "X-TIKA:embedded_id": "2",
        "X-TIKA:embedded_id_path": "/2",
        "X-TIKA:embedded_resource_path": "/file_1",
        "X-TIKA:parse_time_millis": "0",
        "resourceName": "file_1",
        "rtf_meta:thumbnail": "false"
    }
] {code}
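
As a rough way to spot such entries in output like the above, the sketch below filters a metadata list for embedded documents that were handled only by EmptyParser. It uses only keys that appear in the output above; `unparsed_embedded` is a hypothetical helper and the sample is an abbreviated copy of that output, not a Tika API.

```python
import json


def unparsed_embedded(metadata_list):
    """Return (resourceName, Content-Type) for embedded entries that only
    EmptyParser handled, i.e. entries no real parser was available for."""
    found = []
    for entry in metadata_list:
        # Depth 0 is the container document itself; skip it.
        if int(entry.get("X-TIKA:embedded_depth", "0")) < 1:
            continue
        parsed_by = entry.get("X-TIKA:Parsed-By", [])
        # The value is a plain string when there is a single parser.
        if isinstance(parsed_by, str):
            parsed_by = [parsed_by]
        if parsed_by and all(p.endswith(".EmptyParser") for p in parsed_by):
            found.append((entry.get("resourceName"), entry.get("Content-Type")))
    return found


# Abbreviated copy of the metadata shown above (only the keys used here).
sample = json.loads("""
[
  {"Content-Type": "application/rtf",
   "X-TIKA:Parsed-By": ["org.apache.tika.parser.DefaultParser",
                        "org.apache.tika.parser.microsoft.rtf.RTFParser"],
   "X-TIKA:embedded_depth": "0",
   "resourceName": "sample.DOC.rtf"},
  {"Content-Type": "image/x-rtf-raw-bitmap",
   "X-TIKA:Parsed-By": "org.apache.tika.parser.EmptyParser",
   "X-TIKA:embedded_depth": "1",
   "resourceName": "file_0"},
  {"Content-Type": "image/x-rtf-raw-bitmap",
   "X-TIKA:Parsed-By": "org.apache.tika.parser.EmptyParser",
   "X-TIKA:embedded_depth": "1",
   "resourceName": "file_1"}
]
""")

for name, content_type in unparsed_embedded(sample):
    print(name, content_type)
```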



[jira] [Commented] (TIKA-4210) Not able to identify tika extension

2024-03-14 Thread Tika User (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827036#comment-17827036
 ] 

Tika User commented on TIKA-4210:
-

The attached file has a .doc extension, and Tika should detect two more files
embedded in it. For those embedded files, the extension Tika reports is empty.

First image: black arrow symbol

Second image: dotted symbol



[jira] [Commented] (TIKA-4210) Not able to identify tika extension

2024-03-14 Thread Nick Burch (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827017#comment-17827017
 ] 

Nick Burch commented on TIKA-4210:
--

The attached file seems to be an RTF file. I'm not sure what a ".mega
attachment" is, but this file doesn't seem to be one...

tika-app-2.9.1.jar correctly identifies this file as RTF.
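
One likely reason a file named sample.DOC is reported as RTF is that detection looks at the content rather than the file name: RTF files begin with the bytes `{\rtf`, while binary Word .doc files are OLE2 containers with a different signature. The check below is a deliberately simplified sketch of that idea, not Tika's actual detector; `looks_like_rtf` is a hypothetical name.

```python
RTF_MAGIC = b"{\\rtf"
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # header of binary .doc files


def looks_like_rtf(head):
    """Return True if the leading bytes carry the RTF signature."""
    return head.startswith(RTF_MAGIC)


# The file name plays no part in the decision, only the leading bytes do.
print(looks_like_rtf(b"{\\rtf1\\ansi"))        # RTF content
print(looks_like_rtf(OLE2_MAGIC + b"\x00" * 8))  # OLE2 content
```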

> Not able to identify tika extension
> ---
>
> Key: TIKA-4210
> URL: https://issues.apache.org/jira/browse/TIKA-4210
> Project: Tika
> Issue Type: Bug
> Reporter: Tika User
> Priority: Major
> Attachments: sample.DOC
>
>
> Hi Team,
> The attached file contains embedded .mega attachments whose extension Tika is
> not able to identify. Tried in Tika versions 2.9.0 and 2.9.1; the extension
> still shows as empty. Please look into this.


