[jira] [Comment Edited] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx

2024-02-09 Thread Lonzak (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816122#comment-17816122
 ] 

Lonzak edited comment on TIKA-4194 at 2/9/24 4:15 PM:
--

I investigated a bit further (though my knowledge in this area is quite 
limited).

Tika is indeed looking at the bytes. A working keystore is matched by the 
following "Magic" matcher:

[40/application/x-x509-key; format=der string 0 0x3082020100 
0xFC]

If I open that file in a hex editor I can see:

 
{code:java}
0x 30 82 10 29 02 01 03 30 82 (bytes from a working keystore)
0x 30 82 FF FF 02 01 00   (magic bytes from the Magic class)
{code}
This seems to match except for the two FF bytes and the final 00. (Maybe these 
bytes are ignored?)

 

If I open a non-working one I get:
{code:java}
0x 30 80 02 01 03 30 80 (bytes from a non-working keystore)
0x 30 82 FF FF 02 01 00 (magic bytes from the Magic class){code}
The second byte differs (0x80 instead of 0x82), so I would guess it is not a 
match. The following bytes also seem to be shifted:
{code:java}
0x 30 80       02 01 03 30 80 (bytes from a non-working keystore)
0x 30 82 10 29 02 01 03 30 82 (bytes from a working keystore)
0x 30 82 FF FF 02 01 00   (magic bytes from the Magic class){code}
So an approach could be to add the missing magic bytes to an existing/new Magic 
class?

 

So maybe a matcher:

{{magic=0x3080FF3080}}

would work?
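
On the "maybe these bytes are ignored?" question: Tika magic definitions can carry a mask, and with a mask only the selected bits are compared. Below is a minimal Java sketch of such a masked comparison; it is not Tika's actual implementation. The pattern bytes and the two headers are the ones quoted above. The mask is an assumption, chosen so that the two length bytes and the low two bits of the last byte are ignored, and the class and method names are made up for the example.

```java
/** Sketch of a masked byte comparison at offset 0 (hypothetical, not Tika code). */
final class MaskedMagicSketch {

    // Pattern bytes as quoted in the comment above.
    static final int[] PATTERN = {0x30, 0x82, 0xFF, 0xFF, 0x02, 0x01, 0x00};

    // ASSUMPTION: a mask that ignores the two length bytes entirely and the
    // low two bits of the last byte. The real mask is not fully visible above.
    static final int[] MASK = {0xFF, 0xFF, 0x00, 0x00, 0xFF, 0xFF, 0xFC};

    // First bytes of the two keystores from the hex dumps above.
    static final int[] WORKING = {0x30, 0x82, 0x10, 0x29, 0x02, 0x01, 0x03, 0x30, 0x82};
    static final int[] NON_WORKING = {0x30, 0x80, 0x02, 0x01, 0x03, 0x30, 0x80};

    /** True if header and pattern agree on every bit selected by the mask. */
    static boolean matches(int[] header, int[] pattern, int[] mask) {
        if (header.length < pattern.length || pattern.length != mask.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if ((header[i] & mask[i]) != (pattern[i] & mask[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches(WORKING, PATTERN, MASK));     // prints true
        System.out.println(matches(NON_WORKING, PATTERN, MASK)); // prints false
    }
}
```

Under that assumed mask, the working header matches even though its length bytes are 10 29 and its seventh byte is 03, while the non-working header already fails on the second byte (80 versus 82).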


> tika fails to detect certain pkcs12 keystores types p12 pfx
> ---
>
> Key: TIKA-4194
> URL: https://issues.apache.org/jira/browse/TIKA-4194
> Project: Tika
>  Issue Type: Bug
>  Components: detector
>Affects Versions: 2.9.1
>Reporter: Lonzak
>Priority: Major
>
> We use tika to detect the type of a file which is uploaded. In most cases 
> this works quite well. However recently some files were rejected because tika 
> reports an invalid file type. We'll get
> {code:java}
> APPLICATION/OCTET-STREAM{code}
> instead of
> {code:java}
> APPLICATION/X-X509-KEY{code}
> I did an analysis and found that tika doesn't recognize certain types of 
> pkcs12 keystores. The test keystores can be found 
> [here|https://github.com/redhat-qe-security/keyfile-corpus/tree/master].
> I created a list to show which ones are affected.  Out of 157 keystores 132 
> are correctly detected and 25 are not.
>  
> ||#||correct?||type||filename||
> |1|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|dsa(1024,sha1),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |2|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|dsa(1024,sha1),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |3|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|ecdsa(P-256,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |4|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |5|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none).p12|
> |6|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|ecdsa(P-256,sha256),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |7|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(0),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048

[jira] [Comment Edited] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx

2024-02-09 Thread Lonzak (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816132#comment-17816132
 ] 

Lonzak edited comment on TIKA-4194 at 2/9/24 5:52 PM:
--

I read a bit [more|https://stackoverflow.com/a/31451808/2311528]. The whole 
context is ASN.1 DER encoding. So it is not magic bytes but ASN.1 encoding...

"30 82" is followed by two further bytes that specify the length of the 
SEQUENCE in an explicit number. This enables the coding of objects with a 
length of up to 65535 (0xFFFF) bytes.

"30 80", on the other hand, signals the start of a SEQUENCE with an indefinite 
length. The final length of the SEQUENCE is not specified in advance. Instead, 
the end of the SEQUENCE is marked by a special end-of-contents (EOC) marker 
pair "00 00". This encoding method is typically used when the total length of 
the SEQUENCE is not known at the time of encoding or when it is practical to 
treat the data as a stream.
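
The length rules described above can be sketched in a few lines of Java. This is only an illustration with made-up class and method names, not code from Tika. It reads the octet after the tag and distinguishes the short form, the long form (0x81, 0x82, ...) and the indefinite form (0x80). Strictly speaking the indefinite form belongs to BER rather than DER, which always requires a definite length.

```java
/** Sketch: decode the length octets that follow an ASN.1 tag (hypothetical helper). */
final class Asn1LengthSketch {

    /** Marker returned for the indefinite form (length octet 0x80). */
    static final long INDEFINITE = -1;

    /**
     * data[0] is the tag (0x30 for SEQUENCE), data[1] the first length octet.
     * Returns the content length, or INDEFINITE for the 0x80 form.
     */
    static long contentLength(int[] data) {
        if (data.length < 2) {
            throw new IllegalArgumentException("need a tag and a length octet");
        }
        int first = data[1];
        if (first < 0x80) {
            return first;                 // short form: the octet is the length
        }
        if (first == 0x80) {
            return INDEFINITE;            // content ends at the 00 00 marker
        }
        int count = first & 0x7F;         // long form: 'count' length octets follow
        // Sketch limitation: only up to four length octets are handled here.
        if (count > 4 || data.length < 2 + count) {
            throw new IllegalArgumentException("unsupported or truncated length");
        }
        long length = 0;
        for (int i = 0; i < count; i++) {
            length = (length << 8) | data[2 + i];
        }
        return length;
    }

    public static void main(String[] args) {
        // Header of the working keystore: 30 82 10 29
        System.out.println(contentLength(new int[]{0x30, 0x82, 0x10, 0x29})); // prints 4137
        // Header of the non-working keystore: 30 80
        System.out.println(contentLength(new int[]{0x30, 0x80, 0x02, 0x01})); // prints -1
    }
}
```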

 

To cover both cases, one could define an additional rule or adjust the existing 
rule to be more flexible. Directly adapting the current rule to include 
{{0x3080}} could be challenging because the structure and logic behind the 
length indication and subsequent content are different. Instead, we might need 
to add a new rule specifically targeting keystores with {{0x3080}}. Note, 
however, that detecting content with indefinite length is more challenging, as 
one may not be able to straightforwardly check for a specific byte sequence 
after {{0x3080}}.
{code:java}
[40/application/x-x509-key; format=der string 0 0x3080??]{code}
In this hypothetical rule, {{??}} stands for a placeholder, as the 
specific handling for content with indefinite length needs to be adjusted, 
possibly by implementing logic that recognizes the end of the stream instead 
of relying on fixed byte patterns.



[jira] [Comment Edited] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx

2024-02-12 Thread Lonzak (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816607#comment-17816607
 ] 

Lonzak edited comment on TIKA-4194 at 2/12/24 1:47 PM:
---

Interestingly the "application/pkcs7-signature" type looks quite similar:
{code:java}

  
         
      
      
         
      
      
         
      
      
         
      
      
         
      
 
{code}
I just had to adapt the offset a bit and it worked:
{code:java}
  
         
      
      
         
      {code}
However, I didn't find a keystore with 0x3081, so the offset is unclear in that 
case. My solution now looks like this and works for all the cases...
{code:java}
    
      
     ...
    {code}
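
As for the offset in the 0x3081 case: in ASN.1, a length octet of 0x81 means that exactly one length byte follows, so the content would start at offset 3 (versus 2 for 0x3080 and 4 for 0x3082). The Java sketch below illustrates that offset logic; it is not the Tika fix itself, and the class and method names are invented for the example. It computes where the SEQUENCE content starts and then checks for the bytes 02 01 03 that both hex dumps earlier in this ticket show at that position. The 0x3081 input in the demo is a made-up header, since no such keystore was found.

```java
/** Sketch: locate the SEQUENCE content and check for the bytes 02 01 03 (hypothetical). */
final class Pkcs12PrefixSketch {

    /** Offset of the first content byte, or -1 if data does not start with a SEQUENCE. */
    static int contentOffset(int[] data) {
        if (data.length < 2 || data[0] != 0x30) {
            return -1;
        }
        int lengthOctet = data[1];
        if (lengthOctet <= 0x80) {
            return 2;                     // short form (< 0x80) or indefinite form (0x80)
        }
        return 2 + (lengthOctet & 0x7F);  // 0x81 -> 3, 0x82 -> 4, ...
    }

    /** True if the content starts with INTEGER, length 1, value 3. */
    static boolean looksLikePkcs12(int[] data) {
        int off = contentOffset(data);
        if (off < 0 || data.length < off + 3) {
            return false;
        }
        return data[off] == 0x02 && data[off + 1] == 0x01 && data[off + 2] == 0x03;
    }

    public static void main(String[] args) {
        int[] definite = {0x30, 0x82, 0x10, 0x29, 0x02, 0x01, 0x03, 0x30, 0x82};
        int[] indefinite = {0x30, 0x80, 0x02, 0x01, 0x03, 0x30, 0x80};
        int[] oneLengthByte = {0x30, 0x81, 0xC8, 0x02, 0x01, 0x03}; // made-up 0x3081 header
        System.out.println(looksLikePkcs12(definite));      // prints true
        System.out.println(looksLikePkcs12(indefinite));    // prints true
        System.out.println(looksLikePkcs12(oneLengthByte)); // prints true
    }
}
```

A check this short is weak on its own, because other ASN.1 structures can begin the same way, so a real rule would probably want to match more of the following bytes.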



[jira] [Comment Edited] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx

2024-07-18 Thread Lonzak (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867029#comment-17867029
 ] 

Lonzak edited comment on TIKA-4194 at 7/18/24 2:48 PM:
---

The ticket was resolved with 2.9.2 - right?

So the version can be set and the ticket be closed... I would do that but don't 
have the rights...


was (Author: tom_1st):
The ticket was resolved with 2.9.2 - right?

So the version can be set and the ticket be closed...

> tika fails to detect certain pkcs12 keystores types p12 pfx
> ---
>
> Key: TIKA-4194
> URL: https://issues.apache.org/jira/browse/TIKA-4194
> Project: Tika
>  Issue Type: Bug
>  Components: detector
>Affects Versions: 2.9.1
>Reporter: Lonzak
>Priority: Major
>
> We use tika to detect the type of a file which is uploaded. In most cases 
> this works quite well. However recently some files were rejected because tika 
> reports an invalid file type. We'll get
> {code:java}
> APPLICATION/OCTET-STREAM{code}
> instead of
> {code:java}
> APPLICATION/X-X509-KEY{code}
> (As pointed out in TIKA-3784 the mimetype should really be 
> "application/x-pkcs12" but for us "application/x-x509-key" works for now)
>  
> I did an analysis and found that tika doesn't recognize certain types of 
> pkcs12 keystores. The test keystores can be found 
> [here|https://github.com/redhat-qe-security/keyfile-corpus/tree/master].
> I created a list to show which ones are affected.  Out of 157 keystores 132 
> are correctly detected and 25 are not.
>  
> ||#||correct?||type||filename||
> |1|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|dsa(1024,sha1),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |2|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|dsa(1024,sha1),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |3|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|ecdsa(P-256,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |4|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |5|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none).p12|
> |6|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|ecdsa(P-256,sha256),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |7|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(0),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |8|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(0),iter(2048),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |9|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(16),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |10|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(16),iter(2048),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |11|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(64),iter(100),keyLen(default),prf(hmacWithSHA512)),aes-256-cbc(IV(16,mac(sha512,salt(64),iter(100)),pass(ascii).p12|
> |12|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(1),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |13|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(1),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |14|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(100),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |15|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(100),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |16|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(16),prf(default)),rc2-cbc(keyBits(56=128bit),IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |17|OK|APPLICATION/X-X509-KEY; 
> FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),key