https://issues.apache.org/bugzilla/show_bug.cgi?id=52285

             Bug #: 52285
           Summary: [Patch] Enhance XWPF Paragraph to parse (nested) smart
                    tags
           Product: POI
           Version: unspecified
          Platform: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XWPF
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified


Created attachment 28026
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=28026
patch tar.gz and xml file

Word sometimes adds smart tags to text entered by the user.

They might be simple, like this:
            <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags"
                w:element="country-region">
                <w:r>
                    <w:rPr>
                        <w:lang w:val="en-US" />
                    </w:rPr>
                    <w:t>India</w:t>
                </w:r>
            </w:smartTag>

or even nested:

            <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags"
                w:element="PersonName">
                <w:smartTag w:uri="urn:schemas:contacts" w:element="GivenName">
                    <w:r>
                        <w:rPr>
                            <w:lang w:val="en-US" />
                        </w:rPr>
                        <w:t>Marilyn</w:t>
                    </w:r>
                </w:smartTag>
                <w:r>
                    <w:rPr>
                        <w:lang w:val="en-US" />
                    </w:rPr>
                    <w:t xml:space="preserve"> </w:t>
                </w:r>
                <w:smartTag w:uri="urn:schemas:contacts" w:element="Sn">
                    <w:r>
                        <w:rPr>
                            <w:lang w:val="en-US" />
                        </w:rPr>
                        <w:t>Monroe</w:t>
                    </w:r>
                </w:smartTag>
            </w:smartTag>

The current implementation for a paragraph simply ignores instances of
CTSmartTagRun.
My proposed patch introduces recursive parsing for CTSmartTagRun.
I did consider making all tags recursive, but that caused other tests to
fail. I think this might be an option for further improvement.
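
To illustrate the idea of the recursion (this is not the code from the patch),
here is a minimal sketch that uses only the JDK DOM API instead of the POI /
XMLBeans types. The class name SmartTagText and its methods are hypothetical
and exist only for this example. It descends into w:smartTag elements at any
depth and collects the text of the w:r runs it finds, in document order:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of POI or of the patch.
final class SmartTagText {
    // WordprocessingML main namespace (the one bound to the "w" prefix).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Parses the XML of a single w:p element and returns the text of its runs.
    static String extractText(String paragraphXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(paragraphXml)));
            StringBuilder out = new StringBuilder();
            collect(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse paragraph XML", e);
        }
    }

    // Walks the direct children: runs are read, smart tags are recursed into.
    private static void collect(Element parent, StringBuilder out) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!isWordElement(n)) {
                continue;
            }
            Element child = (Element) n;
            if ("smartTag".equals(child.getLocalName())) {
                collect(child, out);            // nested smart tags end up here
            } else if ("r".equals(child.getLocalName())) {
                appendRunText(child, out);
            }
            // All other elements are skipped, as only smart tags are recursive.
        }
    }

    // Appends the content of every w:t child of a run.
    private static void appendRunText(Element run, StringBuilder out) {
        for (Node n = run.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isWordElement(n) && "t".equals(n.getLocalName())) {
                out.append(n.getTextContent());
            }
        }
    }

    private static boolean isWordElement(Node n) {
        return n.getNodeType() == Node.ELEMENT_NODE
                && W_NS.equals(n.getNamespaceURI());
    }
}
```

As in the patch, the sketch keeps only the runs and drops the w:uri and
w:element attributes of the smart tags it passes through.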

This makes test cases checking for smart tags pass and fixes two issues in
Tika.

Note that my implementation discards the information carried by the smart tag
itself; only the runs it contains are kept.

The patch also contains a minor cleanup of the mixed tabs/spaces in this
class, and removes a duplicate document != null check.

