Dear PDFBox Dev Team,

After searching online
<https://stackoverflow.com/search?page=5&tab=Relevance&q=pdfbox%20order>, I
am certain that using setSortByPosition(true) would help. However, I am
struggling to get the config file right. Could you please offer any advice
on it?
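
For context, the setting in question is usually switched on through a Tika
config file along the following lines. This is only a minimal sketch, assuming
Tika 1.x-style XML configuration and the stock
org.apache.tika.parser.pdf.PDFParser class. It is not the attached file, and
the exact parameter handling may differ between Tika versions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Keep the default parsers for all other formats, but exclude the
         PDF parser so it can be configured separately below. -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <!-- Configure the PDF parser; sortByPosition is passed through to
         PDFBox's text stripper as setSortByPosition(true). -->
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="sortByPosition" type="bool">true</param>
      </params>
    </parser>
  </parsers>
</properties>
```

Sorting by position orders text by its coordinates on the page, so it may
interleave text from multi-column layouts.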

Thanks so much in advance.

Regards,
Luke

On Fri, 20 Dec 2019 at 18:06, Lu Sun <vistax...@gmail.com> wrote:

> Dear PDFBox Dev Team,
>
> Hope this message finds you well.
>
> I just wanted to bring this to your attention. Could you please suggest a
> solution to the parsing order issue? Attached are my config file, an
> example PDF file, and my parsing results.
>
> Thanks so much in advance. Wish you and your team a Merry Christmas and
> Happy New Year.
>
> Regards,
> Luke
>
> On Tue, 17 Dec 2019 at 12:34, Tim Allison <talli...@apache.org> wrote:
>
>> PDFBox Colleagues,
>>   Any recommendations?
>>
>> On Mon, Dec 16, 2019 at 7:05 AM Lu Sun <vistax...@gmail.com> wrote:
>>
>>> Dear Tika Dev Team,
>>>
>>> Hope this email finds you well.
>>>
>>> I have been actively using Tika to read PDF files. One issue I have found
>>> is the parsing order. As shown in the attached image, the text of the PDF
>>> file is not parsed in order of its position on the page.
>>>
>>> As suggested in this GitHub issue
>>> <https://github.com/chrismattmann/tika-python/issues/266>, I used a
>>> customized config file (see attached), hoping to solve the issue, but this
>>> has not worked. If possible, could you please review this issue and
>>> provide any insights or solutions?
>>>
>>> Thanks so much in advance.
>>>
>>> Regards,
>>> Luke
>>>
>>
