[ https://issues.apache.org/jira/browse/TIKA-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905636#comment-16905636 ]
Sergey Beryozkin commented on TIKA-2910:
----------------------------------------

Hi [~talli...@apache.org], IMHO it should be fixed in the 1.x branch as well, maybe with a property that lets users enable or disable this fix at runtime.

> Text extraction using Tika command line and Tika server differs
> ---------------------------------------------------------------
>
>                 Key: TIKA-2910
>                 URL: https://issues.apache.org/jira/browse/TIKA-2910
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.21
>            Reporter: Walter
>            Priority: Major
>              Labels: newbie
>         Attachments: CorpusP_25471990.xml
>
>
> When extracting TXT from the very same XML file using either the Tika
> command-line utility or Tika in server mode, the results differ.
> It looks as if PCDATA in more deeply nested XML structures is simply ignored
> and only an empty line is returned.
> I assume both use the same base code. Are there any default settings that may
> differ or can be set?

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)