[jira] [Updated] (TIKA-1611) Allow RecursiveParserWrapper to catch exceptions from embedded documents

2015-04-21 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1611:
--
Description: 
Currently, if a parser hits an EncryptedDocumentException or anything wrapped in 
a TikaException while parsing an embedded document, the exception is swallowed 
by {{ParsingEmbeddedDocumentExtractor}}:
{noformat}
    DELEGATING_PARSER.parse(
            newStream,
            new EmbeddedContentHandler(new BodyContentHandler(handler)),
            metadata, context);
} catch (EncryptedDocumentException ede) {
    // TODO: can we log a warning that we lack the password?
    // For now, just skip the content
} catch (TikaException e) {
    // TODO: can we log a warning somehow?
    // Could not parse the entry, just skip the content
} finally {
    tmp.close();
}
{noformat}


For some applications, it might be better to store the stack trace of the 
attachment that caused an exception.

The proposal would be to include the stack trace in the metadata object for 
that particular attachment.

The user will be able to specify whether or not to store stack traces, and the 
default will be to store stack traces.  This will be a small change to the 
legacy behavior.
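
As a rough illustration of the proposal above, the following is a minimal sketch and not the actual Tika change. It assumes a plain {{Map}} as a stand-in for Tika's {{Metadata}} object; the class name, the {{EmbeddedParse}} interface, the {{storeStackTraces}} flag and the metadata key are all hypothetical names chosen for this example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Map;

/**
 * Sketch only: a plain Map stands in for Tika's Metadata object, and the
 * key name below is hypothetical, not a real Tika constant.
 */
class EmbeddedExceptionSketch {

    /** Hypothetical metadata key under which the stack trace is stored. */
    static final String EMBEDDED_EXCEPTION_KEY = "X-Sketch:embedded_exception";

    /** Hypothetical stand-in for a parser call that may fail on an attachment. */
    interface EmbeddedParse {
        void run() throws Exception;
    }

    /** Renders a Throwable's stack trace as a String. */
    static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw);
        t.printStackTrace(pw);
        pw.flush();
        return sw.toString();
    }

    /**
     * Runs the parse of one attachment. Any exception is caught here so the
     * caller can move on to the next attachment. If storeStackTraces is true,
     * the trace is recorded in this attachment's metadata; otherwise the
     * content is skipped silently, as in the legacy behavior.
     */
    static void parseEmbedded(EmbeddedParse parse,
                              Map<String, String> metadata,
                              boolean storeStackTraces) {
        try {
            parse.run();
        } catch (Exception e) {
            if (storeStackTraces) {
                metadata.put(EMBEDDED_EXCEPTION_KEY, stackTraceOf(e));
            }
        }
    }
}
```

A caller would pass {{true}} for the flag to get the proposed default, then check each attachment's metadata for the key to find the embedded documents that failed.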

  was:
While parsing embedded documents, currently, if a parser hits an Exception, the 
Exception is swallowed by {{ParsingEmbeddedDocumentExtractor}}:
{noformat}
    DELEGATING_PARSER.parse(
            newStream,
            new EmbeddedContentHandler(new BodyContentHandler(handler)),
            metadata, context);
} catch (EncryptedDocumentException ede) {
    // TODO: can we log a warning that we lack the password?
    // For now, just skip the content
} catch (TikaException e) {
    // TODO: can we log a warning somehow?
    // Could not parse the entry, just skip the content
} finally {
    tmp.close();
}
{noformat}


For some applications, it might be better to store the stack trace of the 
attachment that caused an exception.

The proposal would be to include the stack trace in the metadata object for 
that particular attachment.

The user will be able to specify whether or not to store stack traces, and the 
default will be to store stack traces.  This will be a small change to the 
legacy behavior.


 Allow RecursiveParserWrapper to catch exceptions from embedded documents
 

 Key: TIKA-1611
 URL: https://issues.apache.org/jira/browse/TIKA-1611
 Project: Tika
  Issue Type: Improvement
  Components: core
Reporter: Tim Allison
Assignee: Tim Allison
Priority: Minor
 Fix For: 1.9





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TIKA-1611) Allow RecursiveParserWrapper to catch exceptions from embedded documents

2015-04-21 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1611:
--
Description: 
While parsing embedded documents, currently, if a parser hits an Exception, the 
Exception is swallowed by {{ParsingEmbeddedDocumentExtractor}}:
{noformat}
    DELEGATING_PARSER.parse(
            newStream,
            new EmbeddedContentHandler(new BodyContentHandler(handler)),
            metadata, context);
} catch (EncryptedDocumentException ede) {
    // TODO: can we log a warning that we lack the password?
    // For now, just skip the content
} catch (TikaException e) {
    // TODO: can we log a warning somehow?
    // Could not parse the entry, just skip the content
} finally {
    tmp.close();
}
{noformat}


For some applications, it might be better to store the stack trace of the 
attachment that caused an exception.

The proposal would be to include the stack trace in the metadata object for 
that particular attachment.

The user will be able to specify whether or not to store stack traces, and the 
default will be to store stack traces.  This will be a small change to the 
legacy behavior.

  was:
While parsing embedded documents, currently, if a parser hits an Exception, the 
parsing of the entire document comes to a grinding halt.  For some 
applications, it might be better to catch the exception at the attachment level.

The proposal would be to include the stack trace in the metadata object for 
that particular attachment.

The user will be able to specify whether or not to catch embedded exceptions, 
and the default will be to catch embedded exceptions.  This will be a small 
change to the legacy behavior.


 Allow RecursiveParserWrapper to catch exceptions from embedded documents
 

 Key: TIKA-1611
 URL: https://issues.apache.org/jira/browse/TIKA-1611
 Project: Tika
  Issue Type: Improvement
  Components: core
Reporter: Tim Allison
Assignee: Tim Allison
Priority: Minor
 Fix For: 1.9




