[Bug 69314] New: Header content from .doc not extracted

bugzilla Tue, 10 Sep 2024 05:26:43 -0700

https://bz.apache.org/bugzilla/show_bug.cgi?id=69314


            Bug ID: 69314
           Summary: Header content from .doc not extracted
           Product: POI
           Version: 5.2.3-FINAL
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HWPF
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

Over on https://issues.apache.org/jira/browse/TIKA-4307, August Valera shared a
.doc file whose header content is not being extracted.

The content is extracted when he converts the .doc to a .docx, and I can see
the content when I open the file in LibreOffice.

The debug log that August shared shows POI reporting some issues during the
initial parse, so the file itself may simply be malformed.

I can confirm through the debugger that the content is in the document string,
but the ranges for the HeaderStories do not seem to include the header content.
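
To make that check concrete, here is a minimal stand-alone sketch of what I
compared in the debugger. It does not call POI at all: the class name
HeaderRangeCheck, the method isCovered, and the sample text and offsets are
hypothetical and only illustrate the idea. In practice the text and the
[start, end) offsets would be whatever the debugger shows for the document
string and the header story ranges.

```java
public class HeaderRangeCheck {

    /**
     * Returns true if some occurrence of phrase in text lies entirely inside
     * one of the half-open [start, end) character ranges in spans.
     * Hypothetical helper for illustration, not part of POI.
     */
    public static boolean isCovered(String text, String phrase, int[][] spans) {
        if (phrase.isEmpty()) {
            return false; // nothing meaningful to look for
        }
        int from = text.indexOf(phrase);
        while (from >= 0) {
            int to = from + phrase.length();
            for (int[] span : spans) {
                if (span[0] <= from && to <= span[1]) {
                    return true;
                }
            }
            // Try the next occurrence, if any.
            from = text.indexOf(phrase, from + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up document string: body, header and footer text in one buffer.
        String text = "Body text.\rHEADER TEXT\rFooter text.\r";

        // Made-up ranges that skip the header text, as in the failing case.
        int[][] reportedRanges = { { 23, 36 } };

        System.out.println("in document string: " + text.contains("HEADER TEXT"));
        System.out.println("covered by ranges:  "
                + isCovered(text, "HEADER TEXT", reportedRanges));
    }
}
```

With the made-up values above, the phrase is present in the string but not
covered by any range, which is the same shape as the problem in this file.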

Any help would be appreciated. Thank you!

