[ 
https://issues.apache.org/jira/browse/TIKA-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Palsulich updated TIKA-1310:
----------------------------------

    Attachment: multi_valued_test.html
                multi_metadata_output.json
                multi_metadata_expected.json
                cli_json_test.patch

This also affects the current revision. The issue is that {{<meta 
property="multiple_values" content="1,2,3,4" />}} is serialized as 
{{"multiple_values":1,2,3,4}}, which is not valid JSON. I'm pretty sure 
(correct me if I'm wrong) the proper output should be 
{{"multiple_values":\[1,2,3,4\]}}. The JSON formatter at 
org.apache.tika.cli.TikaCLI.NoDocumentJSONMetHandler is supposed to handle 
multiple values, but there are two problems: the CLI's JSON option is not 
tested at all, and {{1,2,3,4}} is parsed as a single value, so no brackets are 
printed around the list.
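
To make the expected behavior concrete, here is a minimal, standalone sketch of 
the serialization described above. It does not use Tika's API: the class name 
{{MultiValuedJson}} and the helpers {{splitContent}}, {{quote}}, and {{toJson}} 
are hypothetical, and splitting the content attribute on commas is an 
assumption about the desired behavior, not something Tika is known to do. 
Values are written as JSON strings here, since metadata values are strings, so 
the output is {{\["1","2","3","4"\]}} rather than the unquoted list shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiValuedJson {

    // Hypothetical helper: split a comma-separated content attribute
    // into individual trimmed values, dropping empty pieces.
    public static List<String> splitContent(String content) {
        List<String> values = new ArrayList<>();
        for (String part : content.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    // Quote a string as a JSON string literal, escaping quotes,
    // backslashes, and control characters.
    public static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"') {
                sb.append("\\\"");
            } else if (c == '\\') {
                sb.append("\\\\");
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serialize metadata so that a single value becomes a JSON string
    // and any other number of values becomes a bracketed JSON array.
    public static String toJson(Map<String, List<String>> metadata) {
        StringBuilder sb = new StringBuilder("{");
        boolean firstKey = true;
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            if (!firstKey) {
                sb.append(",");
            }
            firstKey = false;
            sb.append(quote(entry.getKey())).append(":");
            List<String> values = entry.getValue();
            if (values.size() == 1) {
                sb.append(quote(values.get(0)));
            } else {
                sb.append("[");
                for (int i = 0; i < values.size(); i++) {
                    if (i > 0) {
                        sb.append(",");
                    }
                    sb.append(quote(values.get(i)));
                }
                sb.append("]");
            }
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", Arrays.asList("Test"));
        metadata.put("multiple_values", splitContent("1,2,3,4"));
        System.out.println(toJson(metadata));
    }
}
```

The point of the sketch is only that the brackets depend on the value count 
seen by the formatter: if {{1,2,3,4}} arrives as one value, no array is 
emitted, which matches the behavior reported here.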

I attached a simple HTML file with this issue, the current JSON output, the 
expected JSON output, and a patch with a (currently failing) unit test.

> Parse error - fb:admins property
> --------------------------------
>
>                 Key: TIKA-1310
>                 URL: https://issues.apache.org/jira/browse/TIKA-1310
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata
>    Affects Versions: 1.5
>         Environment: java version "1.7.0_60-ea"
> Java(TM) SE Runtime Environment (build 1.7.0_60-ea-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 24.60-b03, mixed mode)
>            Reporter: Vitor Oliveira
>            Priority: Critical
>         Attachments: cli_json_test.patch, multi_metadata_expected.json, 
> multi_metadata_output.json, multi_valued_test.html
>
>
> Steps to reproduce the problem:
> 1) Download the HTML file:
> curl --output default.html 
> "http://techcrunch.com/2014/05/01/snapchat-adds-text-chat-and-video-calls/"
> 2) Extract the metadata
> java -jar tika-app-1.5.jar --json default.html --encoding=UTF-8 > 
> metadata.json
> There is a problem with the "fb:admins" property that does not allow the JSON 
> file to be parsed properly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
