[ 
https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816811#comment-17816811
 ] 

Lonzak commented on TIKA-3784:
------------------------------

PKCS12 is not the easiest format :-|

The OID arc for PKCS#12 starts with "1.2.840.113549.1.12"
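
For reference, that prefix has a fixed DER encoding (2A 86 48 86 F7 0D 01 0C), so in principle it can be searched for at byte level. Below is a minimal sketch, not a finished detector: the class and method names are made up for illustration, it only handles short-form OID lengths, and it cannot see OIDs that sit inside encrypted content.

```java
final class Pkcs12OidScan {
    // DER content bytes of the OID prefix 1.2.840.113549.1.12 (PKCS#12 arc).
    private static final byte[] PKCS12_ARC = {
        0x2A, (byte) 0x86, 0x48, (byte) 0x86, (byte) 0xF7, 0x0D, 0x01, 0x0C
    };

    /** Returns true if the data contains an OBJECT IDENTIFIER under the PKCS#12 arc. */
    static boolean containsPkcs12Oid(byte[] data) {
        int n = PKCS12_ARC.length;
        for (int i = 0; i + 2 + n <= data.length; i++) {
            if (data[i] != 0x06) {
                continue; // not an OBJECT IDENTIFIER tag
            }
            int len = data[i + 1] & 0xFF;
            if (len < n || len > 0x7F) {
                continue; // too short for the arc, or not a short-form length
            }
            if (startsWithArc(data, i + 2)) {
                return true;
            }
        }
        return false;
    }

    private static boolean startsWithArc(byte[] data, int offset) {
        for (int j = 0; j < PKCS12_ARC.length; j++) {
            if (data[offset + j] != PKCS12_ARC[j]) {
                return false;
            }
        }
        return true;
    }
}
```

A plain byte scan like this can produce false positives on unrelated binary data, so it would only be a hint and not proof.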

I decoded one pkcs12 example (from redhat) and got the following:
{code:java}
<SEQUENCE>
 <INTEGER>3</INTEGER>
 <SEQUENCE>
  <OBJECT_IDENTIFIER Comment="PKCS #7" 
Description="data">1.2.840.113549.1.7.1</OBJECT_IDENTIFIER>
  <NODE Sign="a0">
   <OCTET_STRING>
    <OCTET_STRING>
     <SEQUENCE>
      <SEQUENCE>
       <OBJECT_IDENTIFIER Comment="PKCS #7" 
Description="data">1.2.840.113549.1.7.1</OBJECT_IDENTIFIER>
       <NODE Sign="a0">
        <OCTET_STRING>
         <OCTET_STRING>
          <SEQUENCE>
           <SEQUENCE>
            <OBJECT_IDENTIFIER Comment="PKCS #12 BagIds" 
Description="pkcs-12-pkcs-8ShroudedKeyBag">1.2.840.113549.1.12.10.1.2</OBJECT_IDENTIFIER>
            <NODE Sign="a0">
             <SEQUENCE>
              <SEQUENCE>
               <OBJECT_IDENTIFIER Comment="PKCS #12 PbeIds" 
Description="pbeWithSHAAnd3-KeyTripleDES-CBC">1.2.840.113549.1.12.1.3</OBJECT_IDENTIFIER>
               <SEQUENCE>
                <OCTET_STRING>0xC8CCE579B6DE5B393F7C4885714C04BA</OCTET_STRING>
                <INTEGER>2000</INTEGER>
               </SEQUENCE>
              </SEQUENCE>
              <OCTET_STRING>0x...(shortened for readability)</OCTET_STRING>
             </SEQUENCE>
            </NODE>
            <SET>
             <SEQUENCE>
              <OBJECT_IDENTIFIER Comment="PKCS #9 via PKCS #12" Description="friendlyName (for PKCS #12)">1.2.840.113549.1.9.20</OBJECT_IDENTIFIER>
              <SET>
               <BMP_STRING Incomplete="true">0x00630061</BMP_STRING>
              </SET>
             </SEQUENCE>
             <SEQUENCE>
              <OBJECT_IDENTIFIER Comment="PKCS #9 via PKCS #12" Description="localKeyID (for PKCS #12)">1.2.840.113549.1.9.21</OBJECT_IDENTIFIER>
              <SET>
               <OCTET_STRING>0x0CDA92EB395D4697A9D178352AF6B2BF06947888</OCTET_STRING>
              </SET>
             </SEQUENCE>
            </SET>
           </SEQUENCE>
          </SEQUENCE>
         </OCTET_STRING>
        </OCTET_STRING>
       </NODE>
      </SEQUENCE>
      <SEQUENCE>
       <OBJECT_IDENTIFIER Comment="PKCS #7" 
Description="encryptedData">1.2.840.113549.1.7.6</OBJECT_IDENTIFIER>
       <NODE Sign="a0">
        <SEQUENCE>
         <INTEGER/>
         <SEQUENCE>
          <OBJECT_IDENTIFIER Comment="PKCS #7" 
Description="data">1.2.840.113549.1.7.1</OBJECT_IDENTIFIER>
          <SEQUENCE>
           <OBJECT_IDENTIFIER Comment="PKCS #12 PbeIds" 
Description="pbeWithSHAAnd40BitRC2-CBC">1.2.840.113549.1.12.1.6</OBJECT_IDENTIFIER>
           <SEQUENCE>
            <OCTET_STRING>0x7F432D60BCD2888476E6CB9CD2BC69F1</OCTET_STRING>
            <INTEGER>2000</INTEGER>
           </SEQUENCE>
          </SEQUENCE>
          <NODE Sign="a0">
           <OCTET_STRING>0x...(shortened for readability)</OCTET_STRING>
           <OCTET_STRING>0x0E8E4C15DCB1D87F</OCTET_STRING>
          </NODE>
         </SEQUENCE>
        </SEQUENCE>
       </NODE>
      </SEQUENCE>
     </SEQUENCE>
    </OCTET_STRING>
   </OCTET_STRING>
  </NODE>
 </SEQUENCE>
 <SEQUENCE>
  <SEQUENCE>
   <SEQUENCE>
    <OBJECT_IDENTIFIER Comment="OIW" 
Description="sha1">1.3.14.3.2.26</OBJECT_IDENTIFIER>
    <NULL/>
   </SEQUENCE>
   <OCTET_STRING>0x6DFFA14B5A8A32A87DAD2CFCE1EAEBDAFB89C897</OCTET_STRING>
  </SEQUENCE>
  <OCTET_STRING>0x826699C21B9A4C9E3E608D3C8FBD2310</OCTET_STRING>
  <INTEGER>2000</INTEGER>
 </SEQUENCE>
</SEQUENCE> {code}
The following things point to the PKCS#12 format:
 # Presence of PKCS#12-specific object identifiers (OIDs):
 ## PKCS#12 Bag Types: presence of OIDs such as 1.2.840.113549.1.12.10.1.x, 
which indicate different types of key and certificate bags (KeyBags, CertBags, 
etc.).
 ## PKCS#12 PbeIds: encryption and hashing OIDs such as 
1.2.840.113549.1.12.1.x, which indicate the use of specific password-based 
encryption mechanisms.
 # Use of encryption schemes:
 ## Recognize encryption schemes that are typical for PKCS#12, such as 
pbeWithSHAAnd3-KeyTripleDES-CBC and pbeWithSHAAnd40BitRC2-CBC. These schemes 
protect the contents of PKCS#12 files, so their presence is a clear 
indicator of the format.
 # Structure of the file:
 ## Look for multi-level nested SEQUENCE and OCTET_STRING elements, which 
are typically used to store encrypted private keys and certificates. This 
deep nesting is characteristic of PKCS#12 files.
 # Specific attributes:
 ## PKCS#9 attributes such as friendlyName (OID 1.2.840.113549.1.9.20) and 
localKeyID (OID 1.2.840.113549.1.9.21) are commonly used to provide metadata 
for keys and certificates within the container.
 #  ... and more
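
To illustrate the structural point, the outermost layers of the dump above (SEQUENCE, INTEGER 3, SEQUENCE, PKCS #7 "data" OID) could be checked with a few byte comparisons. This is only a rough sketch under several assumptions: the class and method names are invented for this example, it accepts only the "data" content type seen in the dump, and it does not validate any lengths.

```java
final class PfxHeaderCheck {
    // DER encoding (tag, length, content) of OID 1.2.840.113549.1.7.1 (PKCS #7 data).
    private static final byte[] PKCS7_DATA_OID = {
        0x06, 0x09, 0x2A, (byte) 0x86, 0x48, (byte) 0x86, (byte) 0xF7, 0x0D, 0x01, 0x07, 0x01
    };

    /** Returns the offset just past a SEQUENCE tag and its length octets, or -1. */
    static int skipSequenceHeader(byte[] d, int pos) {
        if (pos + 2 > d.length || d[pos] != 0x30) {
            return -1; // not a SEQUENCE
        }
        int first = d[pos + 1] & 0xFF;
        if (first <= 0x80) {
            return pos + 2; // short form, or indefinite length (BER)
        }
        int numLenBytes = first & 0x7F; // long form: number of length octets
        if (numLenBytes > 4) {
            return -1;
        }
        int next = pos + 2 + numLenBytes;
        return next <= d.length ? next : -1;
    }

    /** Checks for: SEQUENCE { INTEGER 3, SEQUENCE { OID pkcs7-data, ... }, ... } */
    static boolean looksLikePfx(byte[] d) {
        int pos = skipSequenceHeader(d, 0); // outer PFX SEQUENCE
        if (pos < 0 || pos + 3 > d.length) {
            return false;
        }
        // version INTEGER with value 3, as in the dump above
        if (d[pos] != 0x02 || d[pos + 1] != 0x01 || d[pos + 2] != 0x03) {
            return false;
        }
        pos = skipSequenceHeader(d, pos + 3); // authSafe ContentInfo
        if (pos < 0 || pos + PKCS7_DATA_OID.length > d.length) {
            return false;
        }
        for (int i = 0; i < PKCS7_DATA_OID.length; i++) {
            if (d[pos + i] != PKCS7_DATA_OID[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The version check is what separates this from the existing x509-key magic, which accepts version integers 0 to 3 at the same position.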

 

However, since we are already talking about libraries: the standard Java 
crypto API and BouncyCastle already have all of this built in. They parse the 
structures, analyze them and use them, so using one of these two would be the 
easiest solution imho. I have never written a Detector, so please excuse my 
ignorance:
{code:java}
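import java.io.IOException;
import java.io.InputStream;
import java.security.KeyStore;

import org.apache.tika.detect.Detector;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

public class PKCS12Detector implements Detector {
    private static final long serialVersionUID = -8414458255467101503L;
    private static final MediaType PKCS12_MEDIA_TYPE = MediaType.application("x-pkcs12");
    // Assumed upper bound on the bytes read before reset(); not tuned.
    private static final int MARK_LIMIT = 1024 * 1024;

    @Override
    public MediaType detect(InputStream input, Metadata metadata) throws IOException {
        if (input == null) {
            return MediaType.OCTET_STREAM;
        }
        input.mark(MARK_LIMIT); // a detector must not consume the stream
        try {
            KeyStore keyStore = KeyStore.getInstance("PKCS12");
            keyStore.load(input, null); // null password: structure is parsed, integrity is not checked
            return PKCS12_MEDIA_TYPE; // success
        } catch (Exception e) {
            return MediaType.OCTET_STREAM; // something else
        } finally {
            input.reset(); // fails if more than MARK_LIMIT bytes were read
        }
    }
} {code}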
A BouncyCastle-based one would look quite similar...

> Detector returns "application/x-x509-key" when scanning a .p12 file
> -------------------------------------------------------------------
>
>                 Key: TIKA-3784
>                 URL: https://issues.apache.org/jira/browse/TIKA-3784
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.26
>            Reporter: Matthias Hofbauer
>            Priority: Critical
>         Attachments: dump_p12s.txt
>
>
> We are using tika to check if the MIME type of the file extensions matches 
> with the MIME type of the file content.
> After our upgrade from tika-core 1.22 to 1.26 our logic does not work anymore 
> for certificates of type .p12, .pfx, .cer, .der.
> For the .p12 and .pfx extension the MIME type is "application/x-pkcs12" but 
> the tika detector returns "application/x-x509-key" instead.
> After checking the tika-mimetype.xml and comparing it to my .p12 file I found 
> the following MIME magic which explains why I got these types back.
> {code:xml}
> <mime-type type="application/x-x509-key;format=der">
>     <sub-class-of type="application/x-x509-key"/>
>     <!-- These are just a bunch of magic integers as defined by the key 
> format... -->
>     <!-- Always seem to have a version integer as their first entry, -->
>     <!--  normally 00, 01 or 02, check for that -->
>     <magic priority="40">
>       <match value="0x3081FF020100" type="string"
>               mask="0xFFFF00FFFFFC" offset="0"/>
>       <match value="0x3082FFFF020100" type="string"
>               mask="0xFFFF0000FFFFFC" offset="0"/>
>     </magic>
> </mime-type> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
