https://bz.apache.org/bugzilla/show_bug.cgi?id=60556

            Bug ID: 60556
           Summary: IllegalArgumentException: The end () must not be
                    before the start ()
           Product: POI
           Version: 3.15-FINAL
          Hardware: PC
            Status: NEW
          Severity: major
          Priority: P2
         Component: HWPF
          Assignee: dev@poi.apache.org
          Reporter: ismaelgom...@gmail.com
  Target Milestone: ---

Created attachment 34596
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34596&action=edit
File which the code fails

I'm extracting text with the WordExtractor class (Apache POI), but I get an
error for some .doc files. Here is the code:

"
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.OfficeXmlFileException;

public class [class name]
{
    public static void main(String... args) throws FileNotFoundException,
            IOException, NullPointerException, OfficeXmlFileException {

        File[] files = new File("[input path]").listFiles();
        showFiles(files);
    }

    // Walks the directory tree and appends the text of every .docx/.doc file to out.tsv
    public static void showFiles(File[] files) throws FileNotFoundException,
            IOException, NullPointerException, OfficeXmlFileException {

        File log = new File("[output name]/out.tsv");

        for (File file : files) {
            if (file.isDirectory()) {
                //System.out.println("Directory/" + file.getName());
                showFiles(file.listFiles()); // Calls same method again.
            } else {
                String N = file.getName();

                // .docx case
                if (N.toLowerCase().endsWith(".docx") && !N.toLowerCase().startsWith("~"))
                {
                    System.out.println(file.getAbsolutePath());
                    XWPFDocument docx = new XWPFDocument(new FileInputStream(file));
                    XWPFWordExtractor we = new XWPFWordExtractor(docx);
                    String T = we.getText().replaceAll("\\n", " ").replaceAll("\\r", " ");

                    // Write to the output file
                    try {
//                      if(!log.exists()){
//                          System.out.println("We had to make a new file.");
//                          log.createNewFile();
//                      }

                        FileWriter fileWriter = new FileWriter(log, true);
                        BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
                        bufferedWriter.write(file.getAbsolutePath()+"\t"+T+"\n");
                        bufferedWriter.close();

                    } catch (IOException e) {
                        System.err.println("Problem writing .DOCX to the file out.txt " + e.getMessage());
                    }
                }
                else {

                    // .doc case (this is the branch that fails for some files)
                    if (N.toLowerCase().endsWith(".doc") && !N.toLowerCase().startsWith("~"))
                    {
                        System.out.println(file.getAbsolutePath());

                        HWPFDocument doc = new HWPFDocument(new FileInputStream(file));
                        WordExtractor we = new WordExtractor(doc);
                        //WordExtractor we = new WordExtractor(new FileInputStream(file));
                        String T = we.getText().replaceAll("\\n", " ").replaceAll("\\r", " ");

                        // Write to the output file
                        try {
//                          if(!log.exists()){
//                              log.createNewFile();
//                          }

                            FileWriter fileWriter = new FileWriter(log, true);
                            BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
                            bufferedWriter.write(file.getAbsolutePath()+"\t"+T+"\n");
                            bufferedWriter.close();

                        } catch (IOException e) {
                            System.err.println("Problem writing .DOC to the file out.txt " + e.getMessage());
                        }
                    }
                }
            }
        }
    }
}
"

It works fine for most .docx and .doc files.

The error message is:

Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: The end (4958) must not be before the start (4990)
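
For readers unfamiliar with this kind of message: it reads like a bounds check on a character range, where an end offset smaller than the start offset is rejected. The sketch below is purely illustrative and is not POI's actual code; the class name RangeCheck and the method checkRange are hypothetical.

```java
public class RangeCheck {
    /** Hypothetical check: rejects a range whose end offset precedes its start offset. */
    public static void checkRange(int start, int end) {
        if (end < start) {
            throw new IllegalArgumentException(
                "The end (" + end + ") must not be before the start (" + start + ")");
        }
    }

    public static void main(String[] args) {
        checkRange(10, 20); // valid range, no exception

        try {
            checkRange(4990, 4958); // same offsets as in the reported message
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If that reading is right, the failing .doc files contain at least one text range whose recorded end offset (4958) lies before its start offset (4990).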

How can I fix it?
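
A possible stopgap while the bug is open, sketched under assumptions: because the exception is unchecked, it aborts the whole directory walk at the first bad file. Wrapping each extraction so that a failure is logged and skipped would at least let the remaining files be processed. The class SafeExtract and the method extractOrNull below are hypothetical names, and the failing extractor in main only simulates the reported error. This does not fix the underlying problem in the .doc parsing.

```java
import java.util.concurrent.Callable;

public class SafeExtract {
    /**
     * Runs one extraction. On any exception, logs the root cause
     * to stderr and returns null so the caller can skip the file.
     */
    public static String extractOrNull(String label, Callable<String> extractor) {
        try {
            return extractor.call();
        } catch (Exception e) {
            // Unwrap to the innermost cause (here: the IllegalArgumentException).
            Throwable root = e;
            while (root.getCause() != null) {
                root = root.getCause();
            }
            System.err.println("Skipping " + label + ": " + root);
            return null;
        }
    }

    public static void main(String[] args) {
        // Simulated extractors; real code would call the POI extractor instead.
        String good = extractOrNull("good.doc", () -> "some text");
        String bad = extractOrNull("bad.doc", () -> {
            throw new RuntimeException(new IllegalArgumentException(
                "The end (4958) must not be before the start (4990)"));
        });
        System.out.println(good);
        System.out.println(bad == null ? "bad.doc was skipped" : bad);
    }
}
```

In the code above, the .doc branch could then obtain its text with something like `SafeExtract.extractOrNull(file.getAbsolutePath(), () -> new WordExtractor(new HWPFDocument(new FileInputStream(file))).getText())` and write to out.tsv only when the result is not null.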
