[ https://issues.apache.org/jira/browse/TIKA-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tyler Palsulich updated TIKA-1310:
----------------------------------
Attachment: multi_valued_test.html
multi_metadata_output.json
multi_metadata_expected.json
cli_json_test.patch
This also affects the current revision. The issue is that {{<meta
property="multiple_values" content="1,2,3,4" />}} is parsed into
{{"multiple_values":1,2,3,4}}, which is not valid JSON. I'm pretty sure (correct
me if I'm wrong) the proper output should be {{"multiple_values":\[1,2,3,4\]}}.
The JSON formatter in org.apache.tika.cli.TikaCLI.NoDocumentJSONMetHandler should
be able to handle multiple values, but there are two problems: the JSON option of
the CLI is not tested at all, and {{1,2,3,4}} is parsed as a single value, so no
brackets are printed around the list.
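For illustration only, here is a minimal sketch of a JSON writer that always produces valid output: every value is written as an escaped, quoted string, and a name with more than one value becomes an array. This is not the NoDocumentJSONMetHandler code or the attached patch; the class {{MetadataJsonSketch}}, the key {{split_values}}, and the plain {{Map<String, List<String>>}} standing in for Tika's metadata object are all hypothetical. It also does not split the single content string {{1,2,3,4}}; deciding whether that is one value or four is left to the parser.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: serializes metadata (name -> list of values) as JSON.
 * Every value is written as a quoted, escaped JSON string, and names with
 * more than one value are written as a JSON array.
 */
public class MetadataJsonSketch {

    // Wrap a string in double quotes, escaping characters that must not
    // appear raw inside a JSON string (quotes, backslashes, control chars).
    public static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters use the 4-hex-digit escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Build a JSON object; insertion order of the map is preserved.
    public static String toJson(Map<String, List<String>> metadata) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append(quote(e.getKey())).append(':');
            List<String> values = e.getValue();
            if (values.size() == 1) {
                // Single value: always a quoted string, even if it looks numeric.
                sb.append(quote(values.get(0)));
            } else {
                // Zero or several values: emit a JSON array of strings.
                sb.append('[');
                for (int i = 0; i < values.size(); i++) {
                    if (i > 0) sb.append(',');
                    sb.append(quote(values.get(i)));
                }
                sb.append(']');
            }
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> md = new LinkedHashMap<>();
        // One content string containing commas stays one quoted value.
        md.put("multiple_values", Arrays.asList("1,2,3,4"));
        // Separately stored values (hypothetical key) become an array.
        md.put("split_values", Arrays.asList("1", "2", "3", "4"));
        System.out.println(toJson(md));
    }
}
```

Running {{main}} prints {{{"multiple_values":"1,2,3,4","split_values":\["1","2","3","4"\]}}}. Because every value is quoted, the array holds strings rather than the unquoted {{\[1,2,3,4\]}} proposed above; the trade-off is that the writer never has to guess whether a value is numeric, so the output stays valid JSON whatever the value contains.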
I attached a simple HTML file that reproduces the issue, the current JSON output,
the expected JSON output, and a patch with a (currently failing) unit test.
> Parse error - fb:admins property
> --------------------------------
>
> Key: TIKA-1310
> URL: https://issues.apache.org/jira/browse/TIKA-1310
> Project: Tika
> Issue Type: Bug
> Components: metadata
> Affects Versions: 1.5
> Environment: java version "1.7.0_60-ea"
> Java(TM) SE Runtime Environment (build 1.7.0_60-ea-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 24.60-b03, mixed mode)
> Reporter: Vitor Oliveira
> Priority: Critical
> Attachments: cli_json_test.patch, multi_metadata_expected.json,
> multi_metadata_output.json, multi_valued_test.html
>
>
> Steps to reproduce the problem:
> 1) Download the HTML file:
> curl --output default.html "http://techcrunch.com/2014/05/01/snapchat-adds-text-chat-and-video-calls/"
> 2) Extract the metadata
> java -jar tika-app-1.5.jar --json default.html --encoding=UTF-8 > metadata.json
> The "fb:admins" property is written out in a way that prevents the resulting
> JSON file from being parsed.
--
This message was sent by Atlassian JIRA
(v6.2#6252)