https://issues.apache.org/bugzilla/show_bug.cgi?id=51901

             Bug #: 51901
           Summary: [PATCH] StringChunk.parseAs7BitData - Encoding not
                    found - US-ASCII; format=flowed
           Product: POI
           Version: 3.8-dev
          Platform: PC
            Status: NEW
          Severity: major
          Priority: P2
         Component: HSMF
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified


Created attachment 27616
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27616
Patch for issue

Some message files carry additional parameters after the charset name in the
Content-Type header, at least for some US-ASCII messages.

Patch attached. It looks for a semicolon and, if one is present, truncates the
string there.  NOTE: this won't work if a valid charset name can itself contain
a semicolon.  Another option would be to modify the Pattern used to extract
the charset.
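A minimal sketch of the semicolon-truncation idea described above, in plain Java. It is not the attached patch. The class and method names (`CharsetParamStrip`, `stripCharsetParameters`) are hypothetical, used only to show that the name resolves once the trailing parameters are removed:

```java
import java.nio.charset.Charset;

public class CharsetParamStrip {
    // Hypothetical helper (not the attached patch): cut the raw value at the
    // first ';' and trim, so "US-ASCII; format=flowed" becomes "US-ASCII".
    static String stripCharsetParameters(String raw) {
        int semi = raw.indexOf(';');
        String name = (semi >= 0) ? raw.substring(0, semi) : raw;
        return name.trim();
    }

    public static void main(String[] args) {
        String raw = "US-ASCII; format=flowed; delsp=yes";
        String name = stripCharsetParameters(raw);
        // Charset.forName succeeds once the MIME parameters are gone.
        Charset cs = Charset.forName(name);
        System.out.println(name + " -> " + cs.name());
    }
}
```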

The actual m.group(1) string captured from the Content-Type header is:
"US-ASCII; format=flowed; delsp=yes"
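For the alternative of changing the Pattern, a rough sketch follows. The regex below is hypothetical and is not the Pattern POI actually uses. It also assumes the header value looks like `text/plain; charset=US-ASCII; format=flowed; delsp=yes`, which is inferred from the captured string above rather than taken from the (unavailable) sample file:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetRegex {
    // Hypothetical pattern: capture only the charset token, stopping at the
    // first ';', quote or whitespace, so parameters such as "format=flowed"
    // never end up in group(1).
    static final Pattern CHARSET =
        Pattern.compile("charset=[\"']?([^;'\"\\s]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset name, or null if no charset= parameter is present.
    static String extractCharset(String contentTypeValue) {
        Matcher m = CHARSET.matcher(contentTypeValue);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String header = "text/plain; charset=US-ASCII; format=flowed; delsp=yes";
        System.out.println(extractCharset(header));
    }
}
```

Excluding `;` from the captured character class means no post-processing of group(1) is needed, which is the main difference from the substring approach in the patch.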

I am unable to attach a sample file because of its sensitive nature.

Exception message and stack trace (POI 3.8-beta4):


BaseTextExtractionService - Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2ddd595d
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2ddd595d

Caused by: java.lang.RuntimeException: Encoding not found - US-ASCII; format=flowed
    at org.apache.poi.hsmf.datatypes.StringChunk.parseAs7BitData(StringChunk.java:155)
    at org.apache.poi.hsmf.datatypes.StringChunk.parseString(StringChunk.java:86)
    at org.apache.poi.hsmf.datatypes.StringChunk.set7BitEncoding(StringChunk.java:74)
    at org.apache.poi.hsmf.MAPIMessage.set7BitEncoding(MAPIMessage.java:413)
    at org.apache.poi.hsmf.MAPIMessage.guess7BitEncoding(MAPIMessage.java:373)
    at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:73)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:219)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 49 more
Caused by: java.io.UnsupportedEncodingException: US-ASCII; format=flowed
    at java.lang.StringCoding.decode(StringCoding.java:170)
    at java.lang.String.<init>(String.java:443)
    at java.lang.String.<init>(String.java:515)
    at org.apache.poi.hsmf.datatypes.StringChunk.parseAs7BitData(StringChunk.java:153)
    ... 56 more
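The root cause at the bottom of the trace can be reproduced without POI: passing the unstripped value as a charset name to the `String(byte[], String)` constructor throws `UnsupportedEncodingException`. A small standalone sketch (the class and method names are hypothetical):

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class ReproEncodingNotFound {
    // Returns true if decoding with the given charset name throws
    // UnsupportedEncodingException, the same failure seen in the trace.
    static boolean failsToDecode(byte[] data, String charsetName) {
        try {
            new String(data, charsetName);
            return false;
        } catch (UnsupportedEncodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(failsToDecode(data, "US-ASCII; format=flowed")); // true
        System.out.println(failsToDecode(data, "US-ASCII"));                // false
    }
}
```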
