[jira] Created: (JCR-1894) Word doc extraction problem

Rajesh Upadhyay (JIRA) Wed, 03 Dec 2008 02:30:08 -0800

Word doc extraction problem
---------------------------

                 Key: JCR-1894
                 URL: https://issues.apache.org/jira/browse/JCR-1894
             Project: Jackrabbit
          Issue Type: Bug
          Components: jackrabbit-text-extractors
    Affects Versions: core 1.4.3
         Environment: OS: Windows 2003 SP2, MyEclipse 6.0 / Tomcat 5.5 and Athlon 500+
            Reporter: Rajesh Upadhyay



Hi,
I have a .doc file that contains data inside a table, and I want to parse the 
table to get its cell values. Normal parsing (I mean using StringTokenizer) 
does not work for the table because it returns unwanted special characters. 
I would therefore like to convert the .doc file to a .txt file first, since 
the values would then be easy to split, but I have not managed to do that. 
Can anyone please tell me how to parse the values of an MS Word table?

We also need to know how to index a .doc file so that these special characters 
are excluded. When we show the excerpt, the special characters make it 
unreadable.
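
To illustrate the kind of clean-up being asked for, here is a minimal sketch 
in plain Java. It is not Jackrabbit's extractor code. It assumes the unwanted 
characters are control characters left in the extracted text, such as the 
U+0007 mark that the Word binary format places at the end of a table cell. 
The class and method names (WordTextCleaner, clean, cells) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class WordTextCleaner {
    // Assumption: U+0007 terminates each table cell in the raw extracted text.
    private static final char CELL_MARK = '\u0007';

    /** Replaces every ISO control character with a space, then collapses whitespace runs. */
    public static String clean(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            sb.append(Character.isISOControl(c) ? ' ' : c);
        }
        return sb.toString().replaceAll("\\s+", " ").trim();
    }

    /** Splits raw table text on the cell mark and returns the non-empty, cleaned cells. */
    public static List<String> cells(String raw) {
        List<String> result = new ArrayList<String>();
        for (String part : raw.split(String.valueOf(CELL_MARK))) {
            String cell = clean(part);
            if (!cell.isEmpty()) {
                result.add(cell);
            }
        }
        return result;
    }
}
```

Under these assumptions, clean() could be applied to extracted text before it 
is indexed or shown as an excerpt, and cells() could stand in for the 
StringTokenizer step when reading table values.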

Thanks in advance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
