[jira] [Comment Edited] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816122#comment-17816122 ] Lonzak edited comment on TIKA-4194 at 2/9/24 4:15 PM: -- I investigated a bit further (although my knowledge in this area is quite limited): Tika is indeed looking at the bytes. A working keystore is detected by the following "Magic" matcher: [40/application/x-x509-key; format=der string 0 0x3082020100 0xFC] If I open that file in a hex editor I can see:
{code:java}
0x 30 82 10 29 02 01 03 30 82   (bytes from a working keystore)
0x 30 82 FF FF 02 01 00         (magic bytes from the Magic class)
{code}
This seems to match except for the FF and the last 00 values. (Maybe these bytes are ignored?) If I open a non-working one I get:
{code:java}
0x 30 80 02 01 03 30 80         (bytes from a non-working keystore)
0x 30 82 FF FF 02 01 00         (magic bytes from the Magic class)
{code}
The second byte differs (0x80 instead of 0x82), so I would guess it is not a match. The bytes also seem to be shifted by two positions:
{code:java}
0x 30 80 02 01 03 30 80         (bytes from a non-working keystore)
0x 30 82 10 29 02 01 03 30 82   (bytes from a working keystore)
0x 30 82 FF FF 02 01 00         (magic bytes from the Magic class)
{code}
So could one approach be to add the missing magic bytes to an existing or new Magic class? Maybe a matcher such as {{magic=0x3080FF3080}} would work? was (Author: tom_1st): I investigated a bit further (although my knowledge in this area is quite limited): Tika is indeed looking at the bytes. A working keystore is detected by the following "Magic" matcher: [40/application/x-x509-key; format=der string 0 0x3082020100 0xFC] If I open that file in a hex editor I can see:
{code:java}
0x 30 82 10 29 02 01 03 30 82   (bytes from a working keystore)
0x 30 82 FF FF 02 01 00         (magic bytes from the Magic class)
{code}
This seems to match except for the FF and the last 00 values. (Maybe these bytes are ignored?)
If I open a non-working one I get:
{code:java}
0x 30 80 02 01 03 30 80         (bytes from a non-working keystore)
0x 30 82 FF FF 02 01 00         (magic bytes from the Magic class)
{code}
The second byte differs (0x80 instead of 0x82), so I would guess it is not a match. The bytes also seem to be shifted by two positions:
{code:java}
0x 30 80 02 01 03 30 80         (bytes from a non-working keystore)
0x 30 82 10 29 02 01 03 30 82   (bytes from a working keystore)
0x 30 82 FF FF 02 01 00         (magic bytes from the Magic class)
{code}
So could one approach be to add the missing magic bytes to an existing or new Magic class?
> tika fails to detect certain pkcs12 keystores types p12 pfx
> ---
>
> Key: TIKA-4194
> URL: https://issues.apache.org/jira/browse/TIKA-4194
> Project: Tika
> Issue Type: Bug
> Components: detector
> Affects Versions: 2.9.1
> Reporter: Lonzak
> Priority: Major
>
> We use Tika to detect the type of uploaded files. In most cases
> this works quite well. However, recently some files were rejected because Tika
> reports an invalid file type. We get
> {code:java}
> APPLICATION/OCTET-STREAM{code}
> instead of
> {code:java}
> APPLICATION/X-X509-KEY{code}
> I did an analysis and found that Tika doesn't recognize certain types of
> PKCS#12 keystores. The test keystores can be found
> [here|https://github.com/redhat-qe-security/keyfile-corpus/tree/master].
> I created a list to show which ones are affected. Out of 157 keystores, 132
> are correctly detected and 25 are not.
>
> ||#||correct?||type||filename||
> |1|OK|APPLICATION/X-X509-KEY; FORMAT=DER|dsa(1024,sha1),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |2|OK|APPLICATION/X-X509-KEY; FORMAT=DER|dsa(1024,sha1),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |3|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |4|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |5|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none).p12|
> |6|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |7|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(0),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048
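To illustrate the masked comparison speculated about in the first comment (whether the FF FF positions are simply ignored), here is a minimal Java sketch of a masked prefix match. It assumes the common convention that a byte is only compared where the mask has 1-bits, so a 0x00 mask byte skips the two length bytes. The class name, the `hex`/`matches` helpers and the pattern/mask values are hypothetical; they are not Tika's actual rule or API, and only show why `30 82 10 29 02 01` passes while `30 80 02 01` fails.

{code:java}
/**
 * Hypothetical sketch of a masked "magic bytes" prefix match.
 * A byte is only compared where the mask has 1-bits, so a 0x00 mask byte
 * means "ignore this position" (here: the two DER length bytes).
 */
class MaskedMagic {

    /** Parses space-separated hex such as "30 82 10 29" into a byte array. */
    static byte[] hex(String s) {
        String[] parts = s.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    /** True if (data[i] & mask[i]) == (pattern[i] & mask[i]) for every pattern position. */
    static boolean matches(byte[] data, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have the same length");
        }
        if (data.length < pattern.length) {
            return false; // not enough bytes to compare
        }
        for (int i = 0; i < pattern.length; i++) {
            if ((data[i] & mask[i]) != (pattern[i] & mask[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical rule: SEQUENCE with a 2-byte length (30 82 ?? ??), then an INTEGER header (02 01).
        byte[] pattern = hex("30 82 00 00 02 01");
        byte[] mask = hex("FF FF 00 00 FF FF");

        byte[] working = hex("30 82 10 29 02 01 03 30 82");
        byte[] nonWorking = hex("30 80 02 01 03 30 80");

        System.out.println(matches(working, pattern, mask));    // true
        System.out.println(matches(nonWorking, pattern, mask)); // false: 0x80 != 0x82 at position 1
    }
}
{code}

Because the non-working keystore has no length bytes after `30 80`, everything that follows sits two positions earlier, so no fixed-offset pattern written for `30 82` can line up with it.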
[jira] [Comment Edited] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816132#comment-17816132 ] Lonzak edited comment on TIKA-4194 at 2/9/24 5:52 PM: -- I read a bit [more|https://stackoverflow.com/a/31451808/2311528]. The whole context is ASN.1 encoding (BER/DER), so these are not really magic bytes but ASN.1 tag and length octets.

"30 82" is followed by two further bytes that state the length of the SEQUENCE explicitly. This allows objects with a length of up to 65535 (0xFFFF) bytes. "30 80", on the other hand, signals the start of a SEQUENCE with an indefinite length: the final length is not specified in advance. Instead, the end of the SEQUENCE is marked by the end-of-contents (EOC) octets "00 00". This form is typically used when the total length of the SEQUENCE is not known at encoding time or when it is practical to treat the data as a stream. Note that the indefinite form is only permitted in BER; DER always requires a definite length, so keystores starting with "30 80" are BER-encoded rather than strictly DER-encoded.

To cover both cases, one could define an additional rule or make the existing rule more flexible. Directly adapting the current rule to include {{0x3080}} could be challenging, because the length indication and the position of the following content differ. Instead, we might need a new rule specifically targeting keystores that start with {{0x3080}}. Note, however, that detecting content with indefinite length is harder, as one may not be able to simply check for a specific byte sequence after {{0x3080}}.
{code:java}
[40/application/x-x509-key; format=der string 0 0x3080??]
{code}
In this hypothetical rule, {{??}} is a placeholder, because content with indefinite length needs different handling, possibly logic that recognizes the end of the stream instead of relying on fixed byte patterns. was (Author: tom_1st): I read a bit more. The whole context is ASN.1 encoding (BER/DER). "30 82" is followed by two further bytes that state the length of the SEQUENCE explicitly. This allows objects with a length of up to 65535 (0xFFFF) bytes. "30 80", on the other hand, signals the start of a SEQUENCE with an indefinite length: the final length is not specified in advance. Instead, the end of the SEQUENCE is marked by the end-of-contents (EOC) octets "00 00". This form is typically used when the total length of the SEQUENCE is not known at encoding time or when it is practical to treat the data as a stream. To cover both cases, one could define an additional rule or make the existing rule more flexible. Directly adapting the current rule to include {{0x3080}} could be challenging, because the length indication and the position of the following content differ. Instead, we might need a new rule specifically targeting keystores that start with {{0x3080}}. Note, however, that detecting content with indefinite length is harder, as one may not be able to simply check for a specific byte sequence after {{0x3080}}.
{code:java}
[40/application/x-x509-key; format=der string 0 0x3080??]
{code}
In this hypothetical rule, {{??}} is a placeholder, because content with indefinite length needs different handling, possibly logic that recognizes the end of the stream instead of relying on fixed byte patterns.
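The length rules described in this comment (short form, long form such as `30 82 xx xx`, and the BER-only indefinite form `30 80` terminated by `00 00`) can be sketched in Java as follows. This is a minimal illustration, not Tika code: `Asn1Header` and `parse` are hypothetical names, and the sketch only supports single-byte tags and length fields of up to three bytes.

{code:java}
/**
 * Minimal sketch of decoding an ASN.1 BER/DER tag-and-length header.
 * Only single-byte tags and length fields of up to three bytes are supported.
 */
class Asn1Header {
    static final int INDEFINITE = -1;

    final int tag;           // e.g. 0x30 = SEQUENCE (constructed)
    final int headerLength;  // number of bytes taken by the tag and length octets
    final int contentLength; // INDEFINITE for "30 80 ... 00 00" (BER only)

    Asn1Header(int tag, int headerLength, int contentLength) {
        this.tag = tag;
        this.headerLength = headerLength;
        this.contentLength = contentLength;
    }

    static Asn1Header parse(byte[] b, int offset) {
        if (b.length < offset + 2) {
            throw new IllegalArgumentException("truncated header");
        }
        int tag = b[offset] & 0xFF;
        if ((tag & 0x1F) == 0x1F) {
            throw new IllegalArgumentException("multi-byte tags are not supported in this sketch");
        }
        int first = b[offset + 1] & 0xFF;
        if (first < 0x80) {
            return new Asn1Header(tag, 2, first);      // short form: length 0..127
        }
        if (first == 0x80) {
            return new Asn1Header(tag, 2, INDEFINITE); // indefinite form, content ends with 00 00
        }
        int numBytes = first & 0x7F;                   // long form: 0x81 -> 1 byte, 0x82 -> 2 bytes
        if (numBytes > 3 || b.length < offset + 2 + numBytes) {
            throw new IllegalArgumentException("unsupported or truncated length field");
        }
        int length = 0;
        for (int i = 0; i < numBytes; i++) {
            length = (length << 8) | (b[offset + 2 + i] & 0xFF);
        }
        return new Asn1Header(tag, 2 + numBytes, length);
    }

    @Override
    public String toString() {
        String len = contentLength == INDEFINITE ? "indefinite" : Integer.toString(contentLength);
        return String.format("tag=0x%02X header=%d length=%s", tag, headerLength, len);
    }

    public static void main(String[] args) {
        byte[] working = {0x30, (byte) 0x82, 0x10, 0x29, 0x02, 0x01, 0x03};
        byte[] nonWorking = {0x30, (byte) 0x80, 0x02, 0x01, 0x03};

        System.out.println(parse(working, 0));    // tag=0x30 header=4 length=4137
        System.out.println(parse(nonWorking, 0)); // tag=0x30 header=2 length=indefinite
    }
}
{code}

The `headerLength` field is what differs between the two keystores: 4 bytes for `30 82 10 29` versus 2 bytes for `30 80`, which is exactly the two-position shift observed in the hex dumps.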
[jira] [Comment Edited] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816607#comment-17816607 ] Lonzak edited comment on TIKA-4194 at 2/12/24 1:47 PM: --- Interestingly, the "application/pkcs7-signature" type looks quite similar:
{code:java}
{code}
I just had to adapt the offset a bit, and it worked:
{code:java}
{code}
However, I didn't find a keystore starting with 0x3081, so the offset is unclear in that case. My solution now looks like this and works for all the cases:
{code:java}
...
{code}
was (Author: tom_1st): Interestingly, the "application/pkcs7-signature" type looks quite similar:
{code:java}
{code}
I just had to adapt the offset a bit, and it worked:
{code:java}
{code}
However, I didn't find a keystore starting with 0x3081, so the offset is unclear in that case. My solution now looks like this and works for all the cases:
{code:java}
...
{code}
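Regarding the unclear offset for 0x3081: the position of the content after the outer SEQUENCE follows directly from its length form (a 2-byte header for `30 80`, 3 bytes for `30 81 xx`, 4 bytes for `30 82 xx xx`). The Java sketch below is a hypothetical heuristic, not necessarily the rule that ended up in Tika. It assumes the PKCS#12 layout from RFC 7292, `PFX ::= SEQUENCE { version INTEGER {v3(3)}, authSafe ContentInfo, macData MacData OPTIONAL }`, and checks for `02 01 03` (INTEGER 3) followed by `30` at the computed offset. `Pkcs12Sniffer`, `contentOffset` and `looksLikePkcs12` are made-up names for illustration.

{code:java}
/**
 * Hypothetical heuristic (not Tika's actual rule) for the start of a PKCS#12 file,
 * based on RFC 7292:
 *   PFX ::= SEQUENCE { version INTEGER {v3(3)}, authSafe ContentInfo, macData MacData OPTIONAL }
 * The outer SEQUENCE may use an indefinite length (30 80, BER) or a long-form
 * definite length (30 81 xx, 30 82 xx xx, ...), which moves the version INTEGER.
 */
class Pkcs12Sniffer {

    /** Offset of the first content byte of the outer SEQUENCE, or -1 if it is not a SEQUENCE. */
    static int contentOffset(byte[] b) {
        if (b.length < 2 || (b[0] & 0xFF) != 0x30) {
            return -1;
        }
        int first = b[1] & 0xFF;
        if (first <= 0x80) {
            return 2;                 // short form or indefinite form (30 80): 2-byte header
        }
        int numBytes = first & 0x7F;  // long form: 30 81 -> 1 length byte, 30 82 -> 2, ...
        return numBytes <= 4 ? 2 + numBytes : -1;
    }

    /** True if "02 01 03" (INTEGER 3) and then "30" (ContentInfo SEQUENCE) follow the outer header. */
    static boolean looksLikePkcs12(byte[] b) {
        int off = contentOffset(b);
        if (off < 0 || b.length < off + 4) {
            return false;
        }
        return (b[off] & 0xFF) == 0x02
                && (b[off + 1] & 0xFF) == 0x01
                && (b[off + 2] & 0xFF) == 0x03
                && (b[off + 3] & 0xFF) == 0x30;
    }

    public static void main(String[] args) {
        byte[] definite = {0x30, (byte) 0x82, 0x10, 0x29, 0x02, 0x01, 0x03, 0x30, (byte) 0x82};
        byte[] indefinite = {0x30, (byte) 0x80, 0x02, 0x01, 0x03, 0x30, (byte) 0x80};
        byte[] oneByteLength = {0x30, (byte) 0x81, (byte) 0xC8, 0x02, 0x01, 0x03, 0x30};
        byte[] versionZero = {0x30, (byte) 0x82, 0x04, (byte) 0xBE, 0x02, 0x01, 0x00, 0x30};

        System.out.println(contentOffset(oneByteLength));   // 3
        System.out.println(looksLikePkcs12(definite));      // true
        System.out.println(looksLikePkcs12(indefinite));    // true
        System.out.println(looksLikePkcs12(oneByteLength)); // true
        System.out.println(looksLikePkcs12(versionZero));   // false
    }
}
{code}

Checking the version value as well as the offset keeps such a heuristic from matching other structures whose first INTEGER is 0, such as the start of an unencrypted PKCS#8 private key.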
[jira] [Comment Edited] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867029#comment-17867029 ] Lonzak edited comment on TIKA-4194 at 7/18/24 2:48 PM: --- The ticket was resolved in 2.9.2, right? If so, the fix version can be set and the ticket closed. I would do that myself, but I don't have the permissions. was (Author: tom_1st): The ticket was resolved in 2.9.2, right? If so, the fix version can be set and the ticket closed.
> tika fails to detect certain pkcs12 keystores types p12 pfx
> ---
>
> Key: TIKA-4194
> URL: https://issues.apache.org/jira/browse/TIKA-4194
> Project: Tika
> Issue Type: Bug
> Components: detector
> Affects Versions: 2.9.1
> Reporter: Lonzak
> Priority: Major
>
> We use Tika to detect the type of uploaded files. In most cases
> this works quite well. However, recently some files were rejected because Tika
> reports an invalid file type. We get
> {code:java}
> APPLICATION/OCTET-STREAM{code}
> instead of
> {code:java}
> APPLICATION/X-X509-KEY{code}
> (As pointed out in TIKA-3784, the MIME type should really be
> "application/x-pkcs12", but for us "application/x-x509-key" works for now.)
>
> I did an analysis and found that Tika doesn't recognize certain types of
> PKCS#12 keystores. The test keystores can be found
> [here|https://github.com/redhat-qe-security/keyfile-corpus/tree/master].
> I created a list to show which ones are affected. Out of 157 keystores, 132
> are correctly detected and 25 are not.
>
> ||#||correct?||type||filename||
> |1|OK|APPLICATION/X-X509-KEY; FORMAT=DER|dsa(1024,sha1),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |2|OK|APPLICATION/X-X509-KEY; FORMAT=DER|dsa(1024,sha1),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |3|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |4|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |5|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none).p12|
> |6|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |7|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(0),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |8|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(0),iter(2048),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |9|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(16),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |10|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(16),iter(2048),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |11|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(64),iter(100),keyLen(default),prf(hmacWithSHA512)),aes-256-cbc(IV(16,mac(sha512,salt(64),iter(100)),pass(ascii).p12|
> |12|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(1),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |13|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(1),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |14|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(100),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |15|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(100),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |16|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(16),prf(default)),rc2-cbc(keyBits(56=128bit),IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |17|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),key