Hi Oleg,

UIMA could be useful for extracting text from XML (I'm not familiar
enough with it...), but I think we should still fix Tika's own XML
extraction.
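
To make the issue in TIKA-1048 (quoted below) concrete, here is a rough
sketch using plain JDK SAX. It is not Tika's XMLParser: the class name
CompactXmlText and the extract helper are hypothetical, and appending a
space at each end tag is only one possible approach, not necessarily
where or how the real fix should go.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CompactXmlText {

    // Hypothetical helper (not Tika code): collects character data from the
    // XML and, when padElements is true, appends a space after each end tag.
    static String extract(String xml, boolean padElements) {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (padElements) {
                    out.append(' ');
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException("could not parse XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Compact XML: no whitespace between the elements.
        String xml = "<doc><a>hello</a><b>world</b></doc>";
        System.out.println(extract(xml, false));       // helloworld
        System.out.println(extract(xml, true).trim()); // hello world
    }
}
```

Without padding, the text of adjacent elements runs together, which is
what the failing test in the patch is about.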

Mike McCandless

http://blog.mikemccandless.com

On Thu, Dec 20, 2012 at 6:14 AM, Oleg Tikhonov <o...@apache.org> wrote:
> Hi Mike,
>
> Maybe consider using UIMA ("the rule engine")?
>
> BR,
> Oleg
>
>
>
> On Thu, Dec 20, 2012 at 1:05 PM, Michael McCandless (JIRA)
> <j...@apache.org> wrote:
>
>>
>>      [
>> https://issues.apache.org/jira/browse/TIKA-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>>
>> Michael McCandless updated TIKA-1048:
>> -------------------------------------
>>
>>     Attachment: TIKA-1048.patch
>>
>> Patch w/ failing test ... I'm not sure where/how to best fix this yet ...
>>
>> > XMLParser should add whitespace between elements
>> > ------------------------------------------------
>> >
>> >                 Key: TIKA-1048
>> >                 URL: https://issues.apache.org/jira/browse/TIKA-1048
>> >             Project: Tika
>> >          Issue Type: Bug
>> >          Components: parser
>> >            Reporter: Michael McCandless
>> >             Fix For: 1.3
>> >
>> >         Attachments: TIKA-1048.patch
>> >
>> >
>> > If the incoming XML is compact (i.e. it has no whitespace between
>> > elements), I think we should somehow add whitespace between elements
>> > when extracting text.
>>
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA
>> administrators
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>
