ybg163yx opened a new issue, #980:
URL: https://github.com/apache/poi/issues/980

   ### Description
   Calling `XWPFDocument#setParagraph(XWPFParagraph paragraph, int pos)` may cause
   an inconsistency between the internal `bodyElements` and `paragraphs` lists.
   
   After calling `setParagraph`, the element stored at the same position in
   `bodyElements` and `paragraphs` may no longer refer to the same paragraph
   instance.
   
   ### Impact
   This inconsistency breaks `XWPFDocument#removeBodyElement(int pos)`.
   
   `removeBodyElement` relies on `getParagraphPos(int bodyPos)` to locate the
   corresponding paragraph index. When the paragraph was previously replaced via
   `setParagraph`, `getParagraphPos` may return `-1`, in which case
   `paragraphs.remove(paraPos)` is called with `-1` and throws an
   `IndexOutOfBoundsException`.
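   
   The failure mode can be sketched without POI. The class below is a
   hypothetical stand-in, not POI code: it assumes that the position lookup
   compares elements by reference and that `setParagraph` only touches the
   `paragraphs` list, as described in this report.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Hypothetical model of the two internal lists; names only mirror XWPFDocument.
   final class BodyListDesync {
       final List<Object> bodyElements = new ArrayList<>();
       final List<Object> paragraphs = new ArrayList<>();
   
       BodyListDesync(int count) {
           for (int i = 0; i < count; i++) {
               Object p = new Object();
               bodyElements.add(p);
               paragraphs.add(p);
           }
       }
   
       // Assumed behaviour: only `paragraphs` is updated, `bodyElements` is not.
       void setParagraph(Object paragraph, int pos) {
           paragraphs.set(pos, paragraph);
       }
   
       // Assumed behaviour: search `paragraphs` for the body element by reference.
       int getParagraphPos(int bodyPos) {
           Object needle = bodyElements.get(bodyPos);
           for (int i = Math.min(bodyPos, paragraphs.size() - 1); i >= 0; i--) {
               if (paragraphs.get(i) == needle) {
                   return i;
               }
           }
           return -1;
       }
   
       void removeBodyElement(int bodyPos) {
           int paraPos = getParagraphPos(bodyPos);
           paragraphs.remove(paraPos); // throws IndexOutOfBoundsException for -1
           bodyElements.remove(bodyPos);
       }
   
       public static void main(String[] args) {
           BodyListDesync doc = new BodyListDesync(8);
           doc.setParagraph(doc.paragraphs.get(5), 6);
           System.out.println(doc.getParagraphPos(6)); // prints -1
           try {
               doc.removeBodyElement(6);
           } catch (IndexOutOfBoundsException e) {
               System.out.println("removeBodyElement failed: " + e);
           }
       }
   }
   ```
   
   In this model the old element at position 6 is still in `bodyElements` but no
   longer in `paragraphs`, so the reference search finds nothing.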
   
   ### Root Cause Analysis
   In `setParagraph`, two different update mechanisms are used:
   
   * The `paragraphs` list is updated via `ArrayList#set`, directly replacing the
     paragraph reference.
   * The underlying XML (`CTDocument`) is updated via
     `ctDocument.getBody().setPArray(...)`.
   
   During XML processing, the generated XMLBeans code eventually calls
   `Xobj.copy_contents_from`, which **copies** the XML contents into the target
   instead of reusing the existing `CTP` / `XWPFParagraph` instance.
   
   As a result, the paragraph object referenced by `paragraphs` differs from the
   one created and stored in `bodyElements`, leading to inconsistent internal
   state.
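   
   The difference between replacing a reference and copying contents can be
   illustrated with a small stand-alone sketch. `Node` and `Body` are
   hypothetical stand-ins for a `CTP` node and the body's paragraph array; they
   are not XMLBeans classes and only model the copy semantics described above.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   final class CopyOnSetSketch {
       // Hypothetical stand-in for a CTP XML node.
       static final class Node {
           String text;
           Node(String text) { this.text = text; }
       }
   
       // Hypothetical stand-in for the body's paragraph array.
       static final class Body {
           final List<Node> nodes = new ArrayList<>();
   
           // Copy semantics: the slot keeps its own Node and only receives the contents.
           void setP(int pos, Node source) {
               nodes.get(pos).text = source.text;
           }
       }
   
       public static void main(String[] args) {
           Body body = new Body();
           body.nodes.add(new Node("five"));
           body.nodes.add(new Node("six"));
   
           Node source = body.nodes.get(0);
           body.setP(1, source);
   
           System.out.println(body.nodes.get(1).text.equals(source.text)); // prints true
           System.out.println(body.nodes.get(1) == source);                // prints false
       }
   }
   ```
   
   The contents match after the call, but the slot still holds a different
   object, so any wrapper that was bound to the source node is not the one the
   body ends up holding.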
   
   ### Steps to Reproduce
   A sample DOCX file is attached.
   
   ```java
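   import java.io.FileInputStream;
   import java.io.IOException;
   import java.util.List;
   
   import org.apache.poi.xwpf.usermodel.XWPFDocument;
   import org.apache.poi.xwpf.usermodel.XWPFParagraph;
   
   // The class name is arbitrary; it only makes the snippet compilable.
   public class SetParagraphRepro {
       public static void main(String[] args) throws IOException {
           // Both the stream and the document are closed by try-with-resources.
           try (FileInputStream fis = new FileInputStream("test_1989242873218412545.docx");
                XWPFDocument document = new XWPFDocument(fis)) {
               List<XWPFParagraph> paragraphs = document.getParagraphs();
               // Put the paragraph instance from index 5 at index 6.
               document.setParagraph(paragraphs.get(5), 6);
   
               // For debugging: inspect internal state after setParagraph
               System.out.println("--");
           }
       }
   }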
   ```
   
   ### Expected Behavior
   After calling `setParagraph`, the internal `bodyElements` and `paragraphs`
   collections should remain consistent, and subsequent calls to
   `removeBodyElement` should work correctly.
   
   ### Actual Behavior
   `bodyElements` and `paragraphs` become inconsistent, causing
   `removeBodyElement` to fail when removing a paragraph.
   
   ### Additional Information
   I have identified the cause and implemented a local fix.
   A Pull Request will be submitted shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

