ybg163yx opened a new issue, #980:
URL: https://github.com/apache/poi/issues/980
### Description
Calling `XWPFDocument#setParagraph(XWPFParagraph paragraph, int pos)` may cause
an inconsistency between the internal `bodyElements` and `paragraphs` lists.
After calling `setParagraph`, the element stored at the same position in
`bodyElements` and `paragraphs` may no longer refer to the same paragraph
instance.
### Impact
This inconsistency breaks `XWPFDocument#removeBodyElement(int pos)`.
`removeBodyElement` relies on `getParagraphPos(int bodyPos)` to locate the
corresponding paragraph index. When the paragraph was previously replaced via
`setParagraph`, `getParagraphPos` may return `-1`, causing
`paragraphs.remove(paraPos)` to fail with an exception.
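
The failure mode can be illustrated with a small, self-contained model that does not use POI at all. It assumes, as described above, that the position lookup matches elements by object identity. The names `IdentityLookupSketch`, `specificPos`, and `removeFails` are hypothetical and only stand in for the real `getParagraphPos` / `removeBodyElement` logic.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not POI code: two lists that are expected to share element instances.
class IdentityLookupSketch {

    // Hypothetical stand-in for getParagraphPos: looks up the element stored at
    // bodyPos in bodyElements inside paragraphs, comparing by identity (==).
    static int specificPos(List<Object> bodyElements, List<Object> paragraphs, int bodyPos) {
        Object needle = bodyElements.get(bodyPos);
        for (int i = 0; i < paragraphs.size(); i++) {
            if (paragraphs.get(i) == needle) {
                return i;
            }
        }
        return -1; // no entry in paragraphs is the same instance
    }

    // Returns true if removing by the looked-up position throws.
    static boolean removeFails(List<Object> bodyElements, List<Object> paragraphs, int bodyPos) {
        int paraPos = specificPos(bodyElements, paragraphs, bodyPos);
        try {
            paragraphs.remove(paraPos); // remove(int): an index of -1 is rejected
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object shared = new Object();
        List<Object> bodyElements = new ArrayList<>();
        List<Object> paragraphs = new ArrayList<>();
        bodyElements.add(shared);
        paragraphs.add(shared);

        // Simulate the inconsistency: paragraphs now holds a different instance.
        paragraphs.set(0, new Object());

        System.out.println(specificPos(bodyElements, paragraphs, 0));
        System.out.println(removeFails(bodyElements, paragraphs, 0));
    }
}
```

In this model the lookup yields `-1` once the two lists stop sharing the instance, and the subsequent `remove` call throws, which mirrors the behavior reported for `removeBodyElement`.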
### Root Cause Analysis
In `setParagraph`, two different update mechanisms are used:
* The `paragraphs` list is updated via `ArrayList#set`, directly replacing the
paragraph reference.
* The underlying XML (`CTDocument`) is updated via
`ctDocument.getBody().setPArray(...)`.
During XML processing, the generated XMLBeans code eventually calls
`XObj.copy_contents_from`, which **copies** the XML contents instead of
reusing the existing `CTP` / `XWPFParagraph` instance.
As a result, the paragraph object referenced by `paragraphs` differs from the
one created and stored in `bodyElements`, leading to inconsistent internal
state.
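
The difference between the two update mechanisms can be sketched with a simplified model. This is not POI or XMLBeans code: `Para`, `setByCopy`, and `countMismatches` are hypothetical names, and `setByCopy` only assumes that the XML-side setter stores a copy of the contents rather than the caller's instance, as described above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not POI or XMLBeans code.
class CopyVersusReferenceSketch {

    // Hypothetical stand-in for a paragraph wrapper around some XML content.
    static final class Para {
        final String text;
        Para(String text) { this.text = text; }
    }

    // Mimics a setter that copies contents into a fresh object instead of
    // storing the caller's instance.
    static void setByCopy(List<Para> target, int pos, Para source) {
        target.set(pos, new Para(source.text));
    }

    // Counts positions where the two lists hold different instances.
    static int countMismatches(List<Para> a, List<Para> b) {
        int mismatches = 0;
        for (int i = 0; i < Math.min(a.size(), b.size()); i++) {
            if (a.get(i) != b.get(i)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<Para> paragraphs = new ArrayList<>();
        List<Para> bodyElements = new ArrayList<>();
        for (String s : new String[] {"a", "b", "c"}) {
            Para p = new Para(s);
            paragraphs.add(p);
            bodyElements.add(p);
        }

        Para replacement = paragraphs.get(0);
        paragraphs.set(2, replacement);          // reference-based update
        setByCopy(bodyElements, 2, replacement); // copy-based update

        // Position 2 now holds equal content but different instances.
        System.out.println(countMismatches(paragraphs, bodyElements));
    }
}
```

Both lists end up with the same text at position 2, yet an identity comparison reports a mismatch there, which is the kind of divergence an identity-based lookup cannot recover from.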
### Steps to Reproduce
A sample DOCX file is attached.
```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class SetParagraphRepro {
    public static void main(String[] args) throws IOException {
        // Both the stream and the document are closed by try-with-resources.
        try (FileInputStream fis = new FileInputStream("test_1989242873218412545.docx");
             XWPFDocument document = new XWPFDocument(fis)) {
            List<XWPFParagraph> paragraphs = document.getParagraphs();
            // Set the paragraph at position 6 to the one currently at index 5.
            document.setParagraph(paragraphs.get(5), 6);
            // For debugging: inspect internal state after setParagraph
            System.out.println("--");
        }
    }
}
```
### Expected Behavior
After calling `setParagraph`, the internal `bodyElements` and `paragraphs`
collections should remain consistent, and subsequent calls to
`removeBodyElement` should work correctly.
### Actual Behavior
`bodyElements` and `paragraphs` become inconsistent, causing
`removeBodyElement` to fail when removing a paragraph.
### Additional Information
I have identified the cause and implemented a local fix.
A Pull Request will be submitted shortly.