[ https://issues.apache.org/jira/browse/JCR-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved JCR-1894.
--------------------------------
Resolution: Incomplete
Without an example document, there's little we can do about this. See the Tika
project (http://tika.apache.org/) for the text extraction functionality
Jackrabbit nowadays uses, and file an issue at
https://issues.apache.org/jira/browse/TIKA if the problem still occurs with
Tika.
> Word doc extraction problem
> ---------------------------
>
> Key: JCR-1894
> URL: https://issues.apache.org/jira/browse/JCR-1894
> Project: Jackrabbit Content Repository
> Issue Type: Bug
> Components: jackrabbit-text-extractors
> Affects Versions: core 1.4.3
> Environment: OS: Windows 2003 SP2, MyEclipse 6.0 / Tomcat 5.5 and
> Athlon 500+
> Reporter: Rajesh Upadhyay
>
> Hi,
> I have a .doc file that contains data inside a table, and I want to parse
> the table to get its values. Ordinary parsing (I mean using a
> StringTokenizer) does not work for the table, because it produces some
> unwanted special characters. So I just want to convert the .doc to a .txt
> file, which would make it easy to split the values, but I can't get that to
> work. Can anyone please tell me how to parse MS Word table values?
> We need a way to index a .doc file without these special characters:
> when we show the excerpt, they make them unreadable.
> Thanks in advance.
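A minimal sketch of the clean-up the reporter asks about, in Java to match Jackrabbit and Tika. It assumes the raw text comes straight from the binary .doc text stream, where (per the MS-DOC format) 0x07 ends each table cell and each table row, 0x0D ends a paragraph, and 0x13/0x14/0x15 delimit field codes. The class WordTextCleaner and its clean method are hypothetical and not part of Jackrabbit or Tika. If you extract through Tika instead, check its output first, since the parser may already deal with these characters.

```java
// Hypothetical helper, not part of Jackrabbit or Tika.
final class WordTextCleaner {

    // Assumption: in the binary .doc text stream, 0x07 marks the end of a
    // table cell and also the end of a table row.
    private static final char CELL_MARK = '\u0007';

    /**
     * Turns raw .doc text into readable text for indexing and excerpts.
     * Cell marks become tabs, paragraph marks (CR) become newlines, and every
     * other control character (for example the 0x13-0x15 field delimiters)
     * is dropped, while the characters between them are kept.
     */
    static String clean(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == CELL_MARK) {
                sb.append('\t');            // cell or row end -> tab separator
            } else if (c == '\r') {
                sb.append('\n');            // paragraph mark -> newline
            } else if (c == '\t' || c == '\n' || !Character.isISOControl(c)) {
                sb.append(c);               // ordinary text, tabs and newlines pass through
            }
            // any remaining control character is skipped
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample input: a two-cell row followed by a PAGE field with result "1".
        String raw = "Name\u0007Age\u0007\u0007\u0013 PAGE \u00141\u0015\r";
        System.out.print(clean(raw));
    }
}
```

Because the same mark ends both cells and rows (and an empty cell also produces two marks in a row), this sketch does not try to tell row ends from cell ends, so each row comes out as tab-separated values followed by extra tabs. Recovering rows reliably needs the table structure that a full parser reports.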
--
This message is automatically generated by JIRA.