[ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746737#comment-16746737 ]
Roberto Benedetti edited comment on TIKA-1997 at 1/18/19 11:14 PM:
-------------------------------------------------------------------

Updated references are:
* [RFC-5652, Cryptographic Message Syntax (CMS)|https://tools.ietf.org/html/rfc5652]
* [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 Message Specification|https://tools.ietf.org/html/rfc5751]
* [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for any "pkcs7" OID at the beginning of the file and, if it finds one, returns "application/pkcs7-signature". The OIDs it should actually look for are "pkcs7-signedData", "pkcs7-envelopedData" and "id-smime-ct-compressedData".

Three media types begin with "pkcs7-signedData", namely:
* "application/pkcs7-signature", extension ".p7s", when the signed content is not present (detached signature)
* "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the signed content is present
* "application/pkcs7-mime; smime-type=certs-only", extension ".p7c", when there are only certificates and (optionally) CRLs

When the OID is "pkcs7-envelopedData", the media type is "application/pkcs7-mime; smime-type=enveloped-data" and the extension is ".p7m".

When the OID is "id-smime-ct-compressedData", the media type is "application/pkcs7-mime; smime-type=compressed-data" and the extension is ".p7z".

Extension ".p7b" is registered in Tika with media type "application/x-pkcs7-certificates", but I think the content of such files is the same as that of ".p7c" files. Furthermore, the label in the textual encoding is always PKCS7 (i.e. the file begins with "-----BEGIN PKCS7").
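As a rough illustration of the OID-based detection described above, here is a Java sketch. It is a hypothetical helper, not Tika's actual detector code: it reads the contentType OID at the start of a DER- or BER-encoded CMS ContentInfo and maps it to the media types listed above. For "pkcs7-signedData" it assumes the caller has already determined, by parsing the SignedData structure, whether encapsulated content is present and whether there are any signers; that parsing is not shown.

```java
/**
 * Hypothetical helper (not Tika's API): sniff the contentType OID of a CMS
 * ContentInfo and map it to the media types discussed in this issue.
 */
final class CmsContentTypeSniffer {

    // contentType OIDs from RFC 5652 (signedData, envelopedData) and RFC 3274 (compressedData)
    static final String SIGNED_DATA = "1.2.840.113549.1.7.2";
    static final String ENVELOPED_DATA = "1.2.840.113549.1.7.3";
    static final String COMPRESSED_DATA = "1.2.840.113549.1.9.16.1.9";

    /** Returns the contentType OID in dotted form, or null if the bytes don't look like a ContentInfo. */
    static String contentTypeOid(byte[] der) {
        int pos = 0;
        // ContentInfo ::= SEQUENCE { contentType OBJECT IDENTIFIER, content [0] EXPLICIT ANY }
        if (der.length < 2 || (der[pos++] & 0xFF) != 0x30) return null;
        int lenByte = der[pos++] & 0xFF;
        if (lenByte > 0x80) {
            pos += lenByte & 0x7F; // long-form length: skip the length octets
        }
        // lenByte == 0x80 is a BER indefinite length (used by streaming encoders): nothing to skip
        if (pos + 2 > der.length || (der[pos++] & 0xFF) != 0x06) return null;
        int oidLen = der[pos++] & 0xFF;
        // the OIDs of interest are short, so only short-form OID lengths are accepted
        if (oidLen == 0 || oidLen >= 0x80 || pos + oidLen > der.length) return null;
        return decodeOid(der, pos, oidLen);
    }

    /** Decodes BER/DER OID content octets (base-128 sub-identifiers) into dotted notation. */
    static String decodeOid(byte[] b, int off, int len) {
        StringBuilder sb = new StringBuilder();
        long value = 0;
        boolean first = true;
        for (int i = off; i < off + len; i++) {
            value = (value << 7) | (b[i] & 0x7F);
            if ((b[i] & 0x80) == 0) { // high bit clear: last octet of this sub-identifier
                if (first) {
                    // the first sub-identifier packs the first two arcs as 40 * X + Y
                    long x = Math.min(value / 40, 2);
                    sb.append(x).append('.').append(value - 40 * x);
                    first = false;
                } else {
                    sb.append('.').append(value);
                }
                value = 0;
            }
        }
        return sb.toString();
    }

    /**
     * Maps an OID to a media type. The two flags only matter for signedData and
     * must come from parsing SignedData (not done here).
     */
    static String mediaType(String oid, boolean hasEncapsulatedContent, boolean hasSigners) {
        if (oid == null) return null;
        switch (oid) {
            case ENVELOPED_DATA:
                return "application/pkcs7-mime; smime-type=enveloped-data";
            case COMPRESSED_DATA:
                return "application/pkcs7-mime; smime-type=compressed-data";
            case SIGNED_DATA:
                if (!hasSigners) return "application/pkcs7-mime; smime-type=certs-only";
                if (hasEncapsulatedContent) return "application/pkcs7-mime; smime-type=signed-data";
                return "application/pkcs7-signature"; // detached signature
            default:
                return null;
        }
    }
}
```

Obtaining the two flags would mean walking into SignedData: skipping version and digestAlgorithms, checking whether encapContentInfo carries the optional [0] eContent, and checking whether the signerInfos SET is empty. With BER indefinite lengths that walk also has to handle end-of-contents octets, which is why it is left out of this sketch.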
I can provide examples built with openssl, but to support these media types Tika would need to:
* return parameters in the media type when detecting streams
* return different extensions based on media type parameters
* inspect streams further when "-----BEGIN PKCS7" or "pkcs7-signedData" is found (as it already does for XML streams)
* register "application/pkcs7-signature" as a sub-class of "application/pkcs7-mime"
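Two of the points above, unwrapping the "-----BEGIN PKCS7" textual encoding and choosing an extension from the smime-type parameter, could look roughly like the following Java sketch. The class and method names are hypothetical; inside Tika one would more likely read the parameter through `org.apache.tika.mime.MediaType#getParameters()` than parse the string by hand. The extension table simply mirrors the mapping given earlier in this comment.

```java
import java.util.Base64;
import java.util.Locale;

/**
 * Hypothetical helpers (not Tika's API): unwrap an RFC 7468 "PKCS7" textual
 * encoding and pick a file extension from the smime-type media type parameter.
 */
final class Pkcs7Helpers {

    static final String BEGIN = "-----BEGIN PKCS7-----";
    static final String END = "-----END PKCS7-----";

    /** Returns the binary bytes inside a "PKCS7" block, or null if the labels are absent. */
    static byte[] unwrapPem(String text) {
        int begin = text.indexOf(BEGIN);
        if (begin < 0) return null;
        int start = begin + BEGIN.length();
        int end = text.indexOf(END, start);
        if (end < 0) return null;
        // The MIME decoder ignores line breaks (and any other non-Base64 characters);
        // it throws IllegalArgumentException on otherwise malformed Base64.
        return Base64.getMimeDecoder().decode(text.substring(start, end));
    }

    /** Picks an extension from the base type and its smime-type parameter, or null if unknown. */
    static String extensionFor(String mediaType) {
        String[] parts = mediaType.split(";");
        String base = parts[0].trim().toLowerCase(Locale.ROOT);
        String smimeType = null;
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("smime-type")) {
                // parameter values may be quoted, e.g. smime-type="signed-data"
                smimeType = kv[1].trim().replace("\"", "").toLowerCase(Locale.ROOT);
            }
        }
        if (base.equals("application/pkcs7-signature")) return ".p7s";
        if (!base.equals("application/pkcs7-mime") || smimeType == null) return null;
        switch (smimeType) {
            case "signed-data":
            case "enveloped-data":
                return ".p7m";
            case "certs-only":
                return ".p7c";
            case "compressed-data":
                return ".p7z";
            default:
                return null;
        }
    }
}
```

The bytes returned by `unwrapPem` are the same binary structure a ".p7m"/".p7s"/".p7c" file would contain, so after unwrapping, the textual and binary forms could go through the same OID inspection.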

> Problem in Tika().detect for xml file signed in CADES
> -----------------------------------------------------
>
> Key: TIKA-1997
> URL: https://issues.apache.org/jira/browse/TIKA-1997
> Project: Tika
> Issue Type: Sub-task
> Components: detector
> Affects Versions: 1.13
> Environment: JDK 1.7
> Reporter: Michele Andreano
> Priority: Blocker
> Attachments: test.xml.p7m
>
> When I submit an XML file signed in P7M format to Tika, I expect Tika to return
> the MIME type application/pkcs7-mime, but instead it gives me
> application/pkcs7-signature.
> How is it possible?