I have a PDF which was created using Apache PDF Box 2.0.0-SNAPSHOT.
Unfortunately, Tika 1.10 seems unable to extract any text from the PDF, and
I don't get any exceptions or errors. The code is as simple as:
new Tika().parseToString(new FileInputStream(f))
Tika is always returning just the empty
Thanks
I have created an issue.
metadata.set(RESOURCE_NAME_KEY, filename) also did not work. For now I am
explicitly telling the parser that these are plain-text files. But it would be
really nice to have this addressed, because I would like to use the
auto-detection ability in my app.
regards
> On
File works with Tika trunk. What's on your classpath: tika-app or just
tika-core? Is there a chance that you don't have tika-parsers on your cp?
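
A quick way to answer the classpath question is to probe for a parser class at runtime. The sketch below uses only the JDK; the helper name isOnClasspath is made up for illustration, and it assumes the PDF parser lives at org.apache.tika.parser.pdf.PDFParser, as it does in the Tika 1.x tika-parsers jar.

```java
class ClasspathCheck {
    // Hypothetical helper: true if the named class can be found on the
    // current classpath, false otherwise. The class is not initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Parser (the interface) ships in tika-core; PDFParser ships in tika-parsers.
        System.out.println("tika-core present:    "
                + isOnClasspath("org.apache.tika.parser.Parser"));
        System.out.println("tika-parsers present: "
                + isOnClasspath("org.apache.tika.parser.pdf.PDFParser"));
    }
}
```

If the first line prints true and the second prints false, only tika-core is on the classpath, which would be consistent with an empty result and no exception.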
-----Original Message-----
From: Adam Retter [mailto:adam.ret...@googlemail.com]
Sent: Wednesday, October 14, 2015 12:14 PM
To:
On Wed, 14 Oct 2015, Ziqi Zhang wrote:
My apologies, here are the testing files attached.
Any chance you could open a bug in bugzilla, and attach these files there?
At first glance, it looks like those files have certain text patterns
near the start which are causing them to be
Hi
There might be a bug with the AutoDetectParser, which fails to recognise some
plain-text files as plain text.
Attached are three test files; as you can see, they are all plain
text.
The following code is used for my testing:
AutoDetectParser parser = new
This is a result of false-positive mime-type detection. In the first case, the
file starts with "ID3", which is usually present in mp3 (audio/mpeg) files. The
other two files start with "P1" or "P4", which appear at the start of
image/x-portable-bitmap files.
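
To illustrate why this happens, here is a toy prefix sniffer written against the JDK only. It is not Tika's detector; the class name, the three-entry magic table, and the text/plain fallback are assumptions made for the sketch. It only shows how a plain-text file that happens to begin with one of these byte sequences gets matched before anything looks at the rest of the content.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class MagicSniffSketch {
    // Hypothetical magic table: leading bytes -> media type they usually indicate.
    static final Map<String, String> MAGIC = new LinkedHashMap<>();
    static {
        MAGIC.put("ID3", "audio/mpeg");
        MAGIC.put("P1", "image/x-portable-bitmap");
        MAGIC.put("P4", "image/x-portable-bitmap");
    }

    // Returns the first media type whose magic prefix matches the start of the
    // content, falling back to text/plain when nothing matches.
    static String sniff(byte[] content) {
        int n = Math.min(content.length, 8);
        String head = new String(content, 0, n, StandardCharsets.ISO_8859_1);
        for (Map.Entry<String, String> e : MAGIC.entrySet()) {
            if (head.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return "text/plain";
    }

    public static void main(String[] args) {
        String[] samples = {
            "ID3 tags are described in this note.",
            "P4 is the fourth item on the list.",
            "Just an ordinary sentence."
        };
        for (String s : samples) {
            System.out.println(sniff(s.getBytes(StandardCharsets.US_ASCII)) + " <- " + s);
        }
    }
}
```

The first two samples are ordinary sentences, yet a prefix-only check classifies them as audio and image data.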
You can either use the text parser directly or pass
Many thanks
As for bugzilla, I was unable to create a new bug, as it is saying “first you
must pick a product…” and there is no tika in the list.
> On 14 Oct 2015, at 10:40, Konstantin Gribov wrote:
>
> This is a result of false positive mime-type detection. In first case
On Wed, 14 Oct 2015, Ziqi Zhang wrote:
As for bugzilla, I was unable to create a new bug, as it is saying
“first you must pick a product…” and there is no tika in the list.
Sorry, wrong project - POI uses Bugzilla, Tika uses JIRA, I wasn't paying
enough attention!
The starting point for