Re: [PR] Bump aws.version from 1.12.656 to 1.12.657 [tika]
THausherr merged PR #1592: URL: https://github.com/apache/tika/pull/1592 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Bump aws.version from 1.12.656 to 1.12.657 [tika]
dependabot[bot] opened a new pull request, #1592: URL: https://github.com/apache/tika/pull/1592

Bumps `aws.version` from 1.12.656 to 1.12.657.

Updates `com.amazonaws:aws-java-sdk-s3` from 1.12.656 to 1.12.657

Changelog, sourced from com.amazonaws:aws-java-sdk-s3's changelog (https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md):

1.12.657 2024-02-12
- AWS AppSync, Features: Adds support for new options on GraphqlAPIs, Resolvers and Data Sources for emitting Amazon CloudWatch metrics for enhanced monitoring of AppSync APIs.
- Amazon CloudWatch, Features: This release enables PutMetricData API request payload compression by default.
- Amazon Neptune Graph, Features: Adding a new option "parameters" for data plane api ExecuteQuery to support running parameterized query via SDK.
- Amazon Route 53 Domains, Features: This release adds bill contact support for RegisterDomain, TransferDomain, UpdateDomainContact and GetDomainDetail API.

Commits:
- 3d1c7de AWS SDK for Java 1.12.657 (https://github.com/aws/aws-sdk-java/commit/3d1c7de71d5fbb74d542f6634778d61254ba0667)
- 4ce0ae3 Update GitHub version number to 1.12.657-SNAPSHOT (https://github.com/aws/aws-sdk-java/commit/4ce0ae3633f217e6b461cfd03b76e206782058a6)
- See full diff in compare view (https://github.com/aws/aws-sdk-java/compare/1.12.656...1.12.657)

Updates `com.amazonaws:aws-java-sdk-transcribe` from 1.12.656 to 1.12.657

The changelog and commit list quoted for com.amazonaws:aws-java-sdk-transcribe are identical to those for aws-java-sdk-s3.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
[jira] [Commented] (TIKA-3784) Detector returns "application/x-x509-key" when scanning a .p12 file
[ https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816827#comment-17816827 ] Tim Allison commented on TIKA-3784: --- Well, sure, if you want to make it easy! Y, let's go with something like that! I'll see what I can do tomorrow. Thank you! > Detector returns "application/x-x509-key" when scanning a .p12 file > --- > > Key: TIKA-3784 > URL: https://issues.apache.org/jira/browse/TIKA-3784 > Project: Tika > Issue Type: Bug > Components: detector >Affects Versions: 1.26 >Reporter: Matthias Hofbauer >Priority: Critical > Attachments: dump_p12s.txt > > > We are using tika to check if the MIME type of the file extensions matches > with the MIME type of the file content. > After our upgrade from tika-core 1.22 to 1.26 our logic does not work anymore > for certificates of type .p12, .pfx, .cer, .der. > For the .p12 and .pfx extension the MIME type is "application/x-pkcs12" but > the tika detector returns "application/x-x509-key" instead. > After checking the tika-mimetype.xml and comparing it to my .p12 file I found > the following MIME magic which explains why I got these types back. > {code:xml} > > > > > > > mask="0x00FC" offset="0"/> > mask="0xFC" offset="0"/> > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
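The quoted magic entries rely on a `mask` attribute, so a short illustration of masked matching may help explain why such a rule can match more files than intended. The sketch below is a generic masked byte comparison, not Tika's own matching code; the class name `MaskedMagic` is hypothetical, and the pattern values in the usage note are invented for illustration because the `value` attributes did not survive in the quoted XML.

```java
/** Illustrative only: generic masked comparison of a byte pattern at a given offset. */
final class MaskedMagic {

    /** Returns true if (data[offset + i] & mask[i]) == (value[i] & mask[i]) for every i. */
    static boolean matches(byte[] data, int offset, byte[] value, byte[] mask) {
        if (value.length != mask.length) {
            throw new IllegalArgumentException("value and mask must have equal length");
        }
        if (offset < 0 || data.length - offset < value.length) {
            return false; // not enough bytes to compare
        }
        for (int i = 0; i < value.length; i++) {
            // XOR exposes the differing bits; the mask keeps only the bits that matter.
            if (((data[offset + i] ^ value[i]) & mask[i]) != 0) {
                return false;
            }
        }
        return true;
    }
}
```

With a two-byte mask of `0x00FC` at offset 0, the first byte is ignored entirely and only the top six bits of the second byte are compared, so a pattern such as the made-up `{0x30, 0x80}` also matches `{0x12, 0x82}`.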
[jira] [Comment Edited] (TIKA-3784) Detector returns "application/x-x509-key" when scanning a .p12 file
[ https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816811#comment-17816811 ]

Lonzak edited comment on TIKA-3784 at 2/12/24 11:16 PM:

PKCS12 is not the easiest format :-| The OID for PKCS12 starts with "1.2.840.113549.1.12". I decoded one PKCS12 example (from Red Hat) and got the following:
{code:java}
3 1.2.840.113549.1.7.1 1.2.840.113549.1.7.1 1.2.840.113549.1.12.10.1.2 1.2.840.113549.1.12.1.3 0xC8CCE579B6DE5B393F7C4885714C04BA 2000 0x...(shortened for readability) 1.2.840.113549.1.9.20 0x00630061 1.2.840.113549.1.9.21 0x0CDA92EB395D4697A9D178352AF6B2BF06947888 1.2.840.113549.1.7.6 1.2.840.113549.1.7.1 1.2.840.113549.1.12.1.6 0x7F432D60BCD2888476E6CB9CD2BC69F1 2000 0x...(shortened for readability) 0x0E8E4C15DCB1D87F 1.3.14.3.2.26 0x6DFFA14B5A8A32A87DAD2CFCE1EAEBDAFB89C897 0x826699C21B9A4C9E3E608D3C8FBD2310 2000
{code}
The PKCS12 OID prefix appears three times in there, among other OIDs. The following things point to a PKCS12 format:
# Presence of PKCS#12-specific object identifiers (OIDs):
## PKCS#12 Bag Types: presence of OIDs such as 1.2.840.113549.1.12.10.1.x, which indicate different types of key and certificate bags (KeyBags, CertBags, etc.).
## PKCS#12 PbeIds: encryption and hashing OIDs such as 1.2.840.113549.1.12.1.x, which indicate the use of specific encryption mechanisms.
# Use of encryption schemes:
## Recognize encryption schemes, especially those that are typical for PKCS#12, such as pbeWithSHAAnd3-KeyTripleDES-CBC and pbeWithSHAAnd40BitRC2-CBC. These schemes are crucial for the security of PKCS#12 files, and their presence is a clear indication of the format.
# Structure of the file:
## Analyze the file structure for multi-level nested SEQUENCE and OCTET_STRING elements, which are typically used to store encrypted private keys and certificates. The complexity of this structure is characteristic of PKCS#12 files.
# Specific attributes:
## PKCS#9 attributes such as friendlyName (OID 1.2.840.113549.1.9.20) and localKeyID (OID 1.2.840.113549.1.9.21) are commonly used to provide metadata for keys and certificates within the container.
# ... and more

However, since we are already talking about libraries: standard Java Crypto and BouncyCastle already have all of this built in. They parse the structures, analyze them and use them. So using one of these two would be the easiest solution, IMHO. I have never written a Detector, so please excuse my ignorance:
{code:java}
import java.io.InputStream;
import java.security.KeyStore;

import org.apache.tika.detect.Detector;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

public class PKCS12Detector implements Detector {
    private static final long serialVersionUID = -8414458255467101503L;
    private static final MediaType PKCS12_MEDIA_TYPE = MediaType.application("x-pkcs12");

    @Override
    public MediaType detect(InputStream input, Metadata metadata) {
        // KeyStore.load(null, null) creates an empty keystore and would "succeed".
        if (input == null) {
            return MediaType.OCTET_STREAM;
        }
        // Note: Tika expects detectors to leave the stream position unchanged;
        // mark/reset handling is omitted in this sketch.
        try {
            KeyStore keyStore = KeyStore.getInstance("PKCS12");
            keyStore.load(input, null); // null password: no integrity check
            return PKCS12_MEDIA_TYPE; // parsed as PKCS12
        } catch (Exception e) {
            return MediaType.OCTET_STREAM; // something else
        }
    }
}
{code}
A BouncyCastle one would look quite similar. Note also that loading the KeyStore takes quite some time.
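The first indicator in the list above (PKCS#12-specific OIDs) can be sketched as a plain byte scan, without any crypto library. This is a minimal illustration under stated assumptions, not a proposed Tika detector: the class and method names are hypothetical, only short-form DER lengths are handled, and a hit inside unrelated or encrypted bytes is possible, so a count is a hint rather than proof.

```java
import java.util.Arrays;

/** Illustrative only: counts DER-encoded OIDs under the PKCS#12 arc 1.2.840.113549.1.12. */
final class Pkcs12OidScan {

    // DER content bytes of the arc 1.2.840.113549.1.12:
    // 1.2 -> 0x2A, 840 -> 0x86 0x48, 113549 -> 0x86 0xF7 0x0D, 1 -> 0x01, 12 -> 0x0C
    private static final byte[] PKCS12_ARC = {
        0x2A, (byte) 0x86, 0x48, (byte) 0x86, (byte) 0xF7, 0x0D, 0x01, 0x0C
    };

    private static final int OID_TAG = 0x06; // ASN.1 OBJECT IDENTIFIER

    static int countPkcs12Oids(byte[] data) {
        int count = 0;
        for (int i = 0; i + 2 + PKCS12_ARC.length <= data.length; i++) {
            if ((data[i] & 0xFF) != OID_TAG) {
                continue;
            }
            int length = data[i + 1] & 0xFF;
            // Short-form length only, and long enough to contain the whole arc.
            if (length > 0x7F || length < PKCS12_ARC.length) {
                continue;
            }
            byte[] candidate = Arrays.copyOfRange(data, i + 2, i + 2 + PKCS12_ARC.length);
            if (Arrays.equals(candidate, PKCS12_ARC)) {
                count++;
            }
        }
        return count;
    }
}
```

For example, the bag-type OID 1.2.840.113549.1.12.10.1.2 from the dump encodes as `06 0B 2A 86 48 86 F7 0D 01 0C 0A 01 02` and is counted, while the PKCS#7 OID 1.2.840.113549.1.7.1 is not.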
[jira] [Updated] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lonzak updated TIKA-4194:
Description:
We use tika to detect the type of an uploaded file. In most cases this works quite well. However, recently some files were rejected because tika reported an invalid file type. We get
{code:java}
APPLICATION/OCTET-STREAM{code}
instead of
{code:java}
APPLICATION/X-X509-KEY{code}
(As pointed out in TIKA-3784, the MIME type should really be "application/x-pkcs12", but for us "application/x-x509-key" works for now.)

I did an analysis and found that tika doesn't recognize certain types of pkcs12 keystores. The test keystores can be found [here|https://github.com/redhat-qe-security/keyfile-corpus/tree/master].
I created a list to show which ones are affected. Out of 157 keystores, 132 are correctly detected and 25 are not.

||#||correct?||type||filename||
|1|OK|APPLICATION/X-X509-KEY; FORMAT=DER|dsa(1024,sha1),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|2|OK|APPLICATION/X-X509-KEY; FORMAT=DER|dsa(1024,sha1),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|3|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|4|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|5|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none).p12|
|6|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|7|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(0),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|8|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(0),iter(2048),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|9|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(16),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|10|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(16),iter(2048),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|11|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(64),iter(100),keyLen(default),prf(hmacWithSHA512)),aes-256-cbc(IV(16,mac(sha512,salt(64),iter(100)),pass(ascii).p12|
|12|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(1),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|13|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(1),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|14|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(100),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|15|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(100),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|16|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(16),prf(default)),rc2-cbc(keyBits(56=128bit),IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|17|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(16),prf(hmacWithSHA256)),rc2-cbc(keyBits(56=128bit),IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|18|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(5),prf(default)),rc2-cbc(keyBits(160=40bit),IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|19|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(5),prf(hmacWithSHA256)),rc2-cbc(keyBits(160=40bit),IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|20|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(8),prf(default)),rc2-cbc(keyBits(120=64bit),IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
|21|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(defaul
[jira] [Commented] (TIKA-4191) tika-core and other deps should be "provided" in non-app contexts
[ https://issues.apache.org/jira/browse/TIKA-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816804#comment-17816804 ]

Hudson commented on TIKA-4191:

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1505 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1505/])
TIKA-4191 -- reduce tika-core's scope to "provided" where possible (#1575) (github: [https://github.com/apache/tika/commit/fb6ba1a33a225d91de1e2d162317ae629ee8c3ab])
* (edit) tika-app/pom.xml
* (edit) tika-eval/tika-eval-app/pom.xml
* (edit) tika-server/tika-server-core/pom.xml
* (edit) tika-eval/tika-eval-core/pom.xml
* (edit) tika-fuzzing/pom.xml
* (edit) tika-translate/pom.xml
* (edit) tika-xmp/pom.xml
* (edit) tika-java7/pom.xml
* (edit) CHANGES.txt
* (edit) tika-batch/pom.xml

> tika-core and other deps should be "provided" in non-app contexts
>
> Key: TIKA-4191
> URL: https://issues.apache.org/jira/browse/TIKA-4191
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Trivial
[jira] [Commented] (TIKA-3784) Detector returns "application/x-x509-key" when scanning a .p12 file
[ https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816788#comment-17816788 ]

Nick Burch commented on TIKA-3784:

From [https://datatracker.ietf.org/doc/rfc7292/] it looks like PKCS12 is based on PKCS7, so that's expected. There are a few more types defined in [https://www.rfc-editor.org/rfc/rfc7292.html#appendix-D] - not sure if we can find any of those to match on? Though [https://www.cs.auckland.ac.nz/~pgut001/pubs/pfx.html] does suggest this isn't an ideal format...

> Detector returns "application/x-x509-key" when scanning a .p12 file
>
> Key: TIKA-3784
> URL: https://issues.apache.org/jira/browse/TIKA-3784
> Project: Tika
> Issue Type: Bug
> Components: detector
> Affects Versions: 1.26
> Reporter: Matthias Hofbauer
> Priority: Critical
> Attachments: dump_p12s.txt
[jira] [Commented] (TIKA-4196) Add a BOM charset detector
[ https://issues.apache.org/jira/browse/TIKA-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816779#comment-17816779 ]

Hudson commented on TIKA-4196:

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1504 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1504/])
TIKA-4196 -- add a bom EncodingDetector (#1590) (github: [https://github.com/apache/tika/commit/7c758c31e6e3f52b4c5f8ad2ac8169dc0f8b310a])
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-text-module/src/test/java/org/apache/tika/parser/txt/BOMDetectorTest.java
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-text-module/src/main/java/org/apache/tika/parser/txt/BOMDetector.java

> Add a BOM charset detector
>
> Key: TIKA-4196
> URL: https://issues.apache.org/jira/browse/TIKA-4196
> Project: Tika
> Issue Type: New Feature
> Reporter: Tim Allison
> Priority: Trivial
>
> The ICU4j and the StandardHtmlEncodingDetector detectors include a bom detector, but for some use cases it would be useful to factor that out and allow users to configure bom detection on their own.
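For context on what stand-alone BOM detection involves, here is a minimal sketch in plain Java. It is not the `BOMDetector` added by the commit above, whose behavior is not shown in this thread; the class name `BomSniffer` and the choice to return charset names as strings are assumptions made for illustration.

```java
/** Illustrative only: maps a leading byte order mark to a charset name. */
final class BomSniffer {

    /** Returns the charset name implied by a leading BOM, or null if none is present. */
    static String charsetFromBom(byte[] head) {
        // UTF-32LE must be tested before UTF-16LE: FF FE 00 00 also starts with FF FE.
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null; // no BOM: the caller decides how to fall back
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

One design caveat: a UTF-16LE stream whose first character after the BOM is U+0000 is indistinguishable from UTF-32LE by BOM alone, so the ordering above is a heuristic rather than a guarantee.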
[jira] [Commented] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816781#comment-17816781 ]

Hudson commented on TIKA-4194:

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1504 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1504/])
[TIKA-4194] Fix for unrecognized pkcs12 keystores (#1589) (github: [https://github.com/apache/tika/commit/c2acd713bb31b88419ebc70dd31c4bfb23bd390f])
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

> tika fails to detect certain pkcs12 keystores types p12 pfx
>
> Key: TIKA-4194
> URL: https://issues.apache.org/jira/browse/TIKA-4194
> Project: Tika
> Issue Type: Bug
> Components: detector
> Affects Versions: 2.9.1
> Reporter: Lonzak
> Priority: Major
[jira] [Commented] (TIKA-4195) JSoupParser conceals null from the EncodingDetector
[ https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816780#comment-17816780 ]

Hudson commented on TIKA-4195:

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1504 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1504/])
TIKA-4195 -- jsoup parser shouldn't conceal backoff to default encoding (#1591) (github: [https://github.com/apache/tika/commit/455409bf80801152e7c855ddc994fedc32c4cfcf])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-text-module/src/test/java/org/apache/tika/parser/txt/TXTParserTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-module/src/test/java/org/apache/tika/parser/html/HtmlParserTest.java
* (edit) tika-core/src/main/java/org/apache/tika/detect/AutoDetectReader.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java
* (edit) tika-core/src/main/java/org/apache/tika/metadata/TikaCoreProperties.java
* (edit) tika-core/src/main/java/org/apache/tika/detect/CompositeEncodingDetector.java

> JSoupParser conceals null from the EncodingDetector
>
> Key: TIKA-4195
> URL: https://issues.apache.org/jira/browse/TIKA-4195
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
> Fix For: 3.0.0
>
> The JSoupParser runs encoding detection on the InputStream. If the result is null, the parser applies the default charset -- US-ASCII. This behavior is ok.
> The problem is that there is no way to distinguish a faulty encoding detector alleging 'US-ASCII' from the default behavior of the JSoupParser. I don't think the JSoupParser should report the fallback encoding as if it were detected.
> I'm not sure how best to report this in the metadata, but we need to be able to differentiate detection and fallback encoding.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4191) tika-core and other deps should be "provided" in non-app contexts
[ https://issues.apache.org/jira/browse/TIKA-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816715#comment-17816715 ] ASF GitHub Bot commented on TIKA-4191: -- tballison merged PR #1575: URL: https://github.com/apache/tika/pull/1575 > tika-core and other deps should be "provided" in non-app contexts > - > > Key: TIKA-4191 > URL: https://issues.apache.org/jira/browse/TIKA-4191 > Project: Tika > Issue Type: Improvement >Reporter: Tim Allison >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] TIKA-4191 -- reduce tika-core's scope to "provided" where possible [tika]
tballison merged PR #1575: URL: https://github.com/apache/tika/pull/1575 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (TIKA-4197) Downgrade jackrabbit in 2.x
[ https://issues.apache.org/jira/browse/TIKA-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4197. --- Fix Version/s: 2.9.2 Resolution: Fixed > Downgrade jackrabbit in 2.x > --- > > Key: TIKA-4197 > URL: https://issues.apache.org/jira/browse/TIKA-4197 > Project: Tika > Issue Type: Bug >Reporter: Tim Allison >Priority: Major > Fix For: 2.9.2 > > > Looks like the latest jackrabbit requires Java 11: > https://github.com/apache/tika/actions/runs/7875864667/job/21488695827 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (TIKA-4197) Downgrade jackrabbit in 2.x
Tim Allison created TIKA-4197: - Summary: Downgrade jackrabbit in 2.x Key: TIKA-4197 URL: https://issues.apache.org/jira/browse/TIKA-4197 Project: Tika Issue Type: Bug Reporter: Tim Allison Looks like the latest jackrabbit requires Java 11: https://github.com/apache/tika/actions/runs/7875864667/job/21488695827 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (TIKA-4195) JSoupParser conceals null from the EncodingDetector
[ https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4195: -- Description: The JSoupParser runs encoding detection on the InputStream. If the result is null, the parser applies the default charset -- US-ASCII. This behavior is ok. The problem is that there is no way to distinguish when a faulty encoding detector alleges 'US-ASCII' and the default behavior of the JSoupParser. I don't think the JSoupParser should report the fallback encoding as if it were detected. I'm not sure how best to report this in the metadata, but we need to be able to differentiate detection and fallback encoding. was: The JSoupParser is runs encoding detection on the inputstream. If the result is null, the parser applies the default charset -- US-ASCII. This behavior is ok. The problem is that there is no way to distinguish when a faulty encoding detector alleges 'US-ASCII' and the default behavior of the JSoupParser. I don't think the JSoupParser should report the fallback encoding as if it were detected. I'm not sure how best to report this in the metadata, but we need to be able to differentiate detection and fallback encoding. > JSoupParser conceals null from the EncodingDetector > --- > > Key: TIKA-4195 > URL: https://issues.apache.org/jira/browse/TIKA-4195 > Project: Tika > Issue Type: Improvement >Reporter: Tim Allison >Priority: Minor > Fix For: 3.0.0 > > > The JSoupParser runs encoding detection on the InputStream. If the result is > null, the parser applies the default charset -- US-ASCII. This behavior is > ok. > The problem is that there is no way to distinguish when a faulty encoding > detector alleges 'US-ASCII' and the default behavior of the JSoupParser. I > don't think the JSoupParser should report the fallback encoding as if it were > detected. > I'm not sure how best to report this in the metadata, but we need to be able > to differentiate detection and fallback encoding. 
[jira] [Resolved] (TIKA-4195) JSoupParser conceals null from the EncodingDetector
[ https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4195. --- Fix Version/s: 3.0.0 Resolution: Fixed > JSoupParser conceals null from the EncodingDetector > --- > > Key: TIKA-4195 > URL: https://issues.apache.org/jira/browse/TIKA-4195 > Project: Tika > Issue Type: Improvement >Reporter: Tim Allison >Priority: Minor > Fix For: 3.0.0 > > > The JSoupParser is runs encoding detection on the inputstream. If the result > is null, the parser applies the default charset -- US-ASCII. This behavior is > ok. > The problem is that there is no way to distinguish when a faulty encoding > detector alleges 'US-ASCII' and the default behavior of the JSoupParser. I > don't think the JSoupParser should report the fallback encoding as if it were > detected. > I'm not sure how best to report this in the metadata, but we need to be able > to differentiate detection and fallback encoding. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] TIKA-4195 -- jsoup parser shouldn't conceal backoff to default encoding [tika]
tballison merged PR #1591: URL: https://github.com/apache/tika/pull/1591
[jira] [Commented] (TIKA-4195) JSoupParser conceals null from the EncodingDetector
[ https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816694#comment-17816694 ] ASF GitHub Bot commented on TIKA-4195: -- tballison merged PR #1591: URL: https://github.com/apache/tika/pull/1591 > JSoupParser conceals null from the EncodingDetector > --- > > Key: TIKA-4195 > URL: https://issues.apache.org/jira/browse/TIKA-4195 > Project: Tika > Issue Type: Improvement >Reporter: Tim Allison >Priority: Minor > > The JSoupParser is runs encoding detection on the inputstream. If the result > is null, the parser applies the default charset -- US-ASCII. This behavior is > ok. > The problem is that there is no way to distinguish when a faulty encoding > detector alleges 'US-ASCII' and the default behavior of the JSoupParser. I > don't think the JSoupParser should report the fallback encoding as if it were > detected. > I'm not sure how best to report this in the metadata, but we need to be able > to differentiate detection and fallback encoding. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816689#comment-17816689 ] Tim Allison commented on TIKA-4194: --- Merged and cherry-picked into branch_2x. [~tom_1st] if you do have time to look at TIKA-3784, I'd be interested if you see any value in parsing these files with bouncycastle as a detector. Thank you! > tika fails to detect certain pkcs12 keystores types p12 pfx > --- > > Key: TIKA-4194 > URL: https://issues.apache.org/jira/browse/TIKA-4194 > Project: Tika > Issue Type: Bug > Components: detector >Affects Versions: 2.9.1 >Reporter: Lonzak >Priority: Major > > We use tika to detect the type of a file which is uploaded. In most cases > this works quite well. However recently some files were rejected because tika > reports an invalid file type. We'll get > {code:java} > APPLICATION/OCTET-STREAM{code} > instead of > {code:java} > APPLICATION/X-X509-KEY{code} > I did an analysis and found that tika doesn't recognize certain types of > pkcs12 keystores. The test keystores can be found > [here|https://github.com/redhat-qe-security/keyfile-corpus/tree/master]. > I created a list to show which ones are affected. Out of 157 keystores 132 > are correctly detected and 25 are not. 
[jira] [Commented] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816687#comment-17816687 ] ASF GitHub Bot commented on TIKA-4194: -- tballison merged PR #1589: URL: https://github.com/apache/tika/pull/1589 > tika fails to detect certain pkcs12 keystores types p12 pfx > --- > > Key: TIKA-4194 > URL: https://issues.apache.org/jira/browse/TIKA-4194 > Project: Tika > Issue Type: Bug > Components: detector >Affects Versions: 2.9.1 >Reporter: Lonzak >Priority: Major > > We use tika to detect the type of a file which is uploaded. In most cases > this works quite well. However recently some files were rejected because tika > reports an invalid file type. We'll get > {code:java} > APPLICATION/OCTET-STREAM{code} > instead of > {code:java} > APPLICATION/X-X509-KEY{code} > I did an analysis and found that tika doesn't recognize certain types of > pkcs12 keystores. The test keystores can be found > [here|https://github.com/redhat-qe-security/keyfile-corpus/tree/master]. > I created a list to show which ones are affected. Out of 157 keystores 132 > are correctly detected and 25 are not. 
>
> ||#||correct?||type||filename||
> |1|OK|APPLICATION/X-X509-KEY; FORMAT=DER|dsa(1024,sha1),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |2|OK|APPLICATION/X-X509-KEY; FORMAT=DER|dsa(1024,sha1),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |3|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |4|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |5|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none).p12|
> |6|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |7|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(0),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |8|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(0),iter(2048),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |9|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(16),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |10|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(16),iter(2048),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |11|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(64),iter(100),keyLen(default),prf(hmacWithSHA512)),aes-256-cbc(IV(16,mac(sha512,salt(64),iter(100)),pass(ascii).p12|
> |12|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(1),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |13|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(1),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |14|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(100),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |15|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(100),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |16|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(16),prf(default)),rc2-cbc(keyBits(56=128bit),IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |17|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(16),prf(hmacWithSHA256)),rc2-cbc(keyBits(56=128bit),IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |18|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(5),prf(default)),rc2-cbc(keyBits(160=40bit),IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |19|OK|APPLICATION/X-X509-KEY; FORMAT=DER
Re: [PR] [TIKA-4194] Fix for unrecognized pkcs12 keystores [tika]
tballison merged PR #1589: URL: https://github.com/apache/tika/pull/1589
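The TIKA-4194 thread turns on where the magic bytes sit when the outer DER length is encoded differently (the 0x3081 case). As a rough illustration only, and not the pattern that was merged in PR #1589, the offset can be computed from the length octet instead of being hard-coded. The class and method names below are hypothetical. The sketch relies only on the PKCS #12 rule that a PFX is a SEQUENCE whose first element is INTEGER 3, so it can also match unrelated DER structures that happen to start the same way.

```java
public class Pkcs12Sniffer {

    /**
     * Heuristic check: does the prefix look like a PKCS #12 PFX?
     * Hypothetical helper for illustration; not part of Tika.
     */
    public static boolean looksLikePkcs12(byte[] head) {
        if (head == null || head.length < 2 || (head[0] & 0xFF) != 0x30) {
            return false; // must start with a SEQUENCE tag
        }
        int lenByte = head[1] & 0xFF;
        int pos;
        if (lenByte <= 0x80) {
            // short form (one length octet) or indefinite form (0x80):
            // the content starts right after the length octet
            pos = 2;
        } else {
            // long form: the low 7 bits give the number of length octets
            int n = lenByte & 0x7F;
            if (n > 4) {
                return false; // implausibly large for a keystore
            }
            pos = 2 + n;
        }
        // First element must be the version: INTEGER (0x02), length 1, value 3
        return head.length >= pos + 3
                && (head[pos] & 0xFF) == 0x02
                && (head[pos + 1] & 0xFF) == 0x01
                && (head[pos + 2] & 0xFF) == 0x03;
    }
}
```

A stricter variant could additionally require that the element after the version is a ContentInfo SEQUENCE, which would reduce false positives at the cost of a longer prefix read.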
[jira] [Commented] (TIKA-4196) Add a BOM charset detector
[ https://issues.apache.org/jira/browse/TIKA-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816680#comment-17816680 ] ASF GitHub Bot commented on TIKA-4196: -- tballison merged PR #1590: URL: https://github.com/apache/tika/pull/1590 > Add a BOM charset detector > -- > > Key: TIKA-4196 > URL: https://issues.apache.org/jira/browse/TIKA-4196 > Project: Tika > Issue Type: New Feature >Reporter: Tim Allison >Priority: Trivial > > The ICU4j and the StandardHtmlEncodingDetector detectors include a bom > detector, but for some use cases it would be useful to factor that out and > allow users to configure bom detection on their own. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] TIKA-4196 [tika]
tballison merged PR #1590: URL: https://github.com/apache/tika/pull/1590
[jira] [Commented] (TIKA-4195) JSoupParser conceals null from the EncodingDetector
[ https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816679#comment-17816679 ] ASF GitHub Bot commented on TIKA-4195: -- tballison opened a new pull request, #1591: URL: https://github.com/apache/tika/pull/1591 Thanks for your contribution to [Apache Tika](https://tika.apache.org/)! Your help is appreciated! Before opening the pull request, please verify that * there is an open issue on the [Tika issue tracker](https://issues.apache.org/jira/projects/TIKA) which describes the problem or the improvement. We cannot accept pull requests without an issue because the change wouldn't be listed in the release notes. * the issue ID (`TIKA-`) - is referenced in the title of the pull request - and placed in front of your commit messages surrounded by square brackets (`[TIKA-] Issue or pull request title`) * commits are squashed into a single one (or few commits for larger changes) * Tika is successfully built and unit tests pass by running `mvn clean test` * there should be no conflicts when merging the pull request branch into the *recent* `main` branch. If there are conflicts, please try to rebase the pull request branch on top of a freshly pulled `main` branch * if you add new module that downstream users will depend upon add it to relevant group in `tika-bom/pom.xml`. We will be able to faster integrate your pull request if these conditions are met. If you have any questions how to fix your problem or about using Tika in general, please sign up for the [Tika mailing list](http://tika.apache.org/mail-lists.html). Thanks! > JSoupParser conceals null from the EncodingDetector > --- > > Key: TIKA-4195 > URL: https://issues.apache.org/jira/browse/TIKA-4195 > Project: Tika > Issue Type: Improvement >Reporter: Tim Allison >Priority: Minor > > The JSoupParser is runs encoding detection on the inputstream. If the result > is null, the parser applies the default charset -- US-ASCII. 
This behavior is > ok. > The problem is that there is no way to distinguish when a faulty encoding > detector alleges 'US-ASCII' and the default behavior of the JSoupParser. I > don't think the JSoupParser should report the fallback encoding as if it were > detected. > I'm not sure how best to report this in the metadata, but we need to be able > to differentiate detection and fallback encoding. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] TIKA-4195 -- jsoup parser shouldn't conceal backoff to default encoding [tika]
tballison opened a new pull request, #1591: URL: https://github.com/apache/tika/pull/1591
[jira] [Updated] (TIKA-4196) Add a BOM charset detector
[ https://issues.apache.org/jira/browse/TIKA-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4196: -- Description: The ICU4j and the StandardHtmlEncodingDetector detectors include a bom detector, but for some use cases it would be useful to factor that out and allow users to configure bom detection on their own. (was: The ICU4j detector uses a bom detector, but for some use cases it would be useful to factor that out and allow users to configure bom detection on their own.) > Add a BOM charset detector > -- > > Key: TIKA-4196 > URL: https://issues.apache.org/jira/browse/TIKA-4196 > Project: Tika > Issue Type: New Feature >Reporter: Tim Allison >Priority: Trivial > > The ICU4j and the StandardHtmlEncodingDetector detectors include a bom > detector, but for some use cases it would be useful to factor that out and > allow users to configure bom detection on their own. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-4196) Add a BOM charset detector
[ https://issues.apache.org/jira/browse/TIKA-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816668#comment-17816668 ] ASF GitHub Bot commented on TIKA-4196: -- tballison opened a new pull request, #1590: URL: https://github.com/apache/tika/pull/1590 > Add a BOM charset detector > -- > > Key: TIKA-4196 > URL: https://issues.apache.org/jira/browse/TIKA-4196 > Project: Tika > Issue Type: New Feature >Reporter: Tim Allison >Priority: Trivial > > The ICU4j detector uses a bom detector, but for some use cases it would be > useful to factor that out and allow users to configure bom detection on their > own. 
[PR] TIKA-4196 [tika]
tballison opened a new pull request, #1590: URL: https://github.com/apache/tika/pull/1590
[jira] [Created] (TIKA-4196) Add a BOM charset detector
Tim Allison created TIKA-4196: - Summary: Add a BOM charset detector Key: TIKA-4196 URL: https://issues.apache.org/jira/browse/TIKA-4196 Project: Tika Issue Type: New Feature Reporter: Tim Allison The ICU4j detector uses a bom detector, but for some use cases it would be useful to factor that out and allow users to configure bom detection on their own. -- This message was sent by Atlassian Jira (v8.20.10#820010)
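For readers unfamiliar with BOM detection, here is a minimal standalone sketch of what a factored-out BOM check could look like. It is not the detector added in PR #1590; the class name `BomSniffer` and its `detect` method are hypothetical, and it returns charset names as plain strings so that it does not depend on which charsets a given JVM ships.

```java
public class BomSniffer {

    /**
     * Returns the charset name implied by a leading byte order mark,
     * or null when the prefix carries no recognised BOM.
     * Hypothetical helper for illustration; not part of Tika.
     */
    public static String detect(byte[] prefix) {
        if (prefix == null) {
            return null;
        }
        if (startsWith(prefix, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        // UTF-32 is checked before UTF-16 because FF FE 00 00 also begins
        // with the UTF-16LE BOM FF FE. This sketch prefers the longer match.
        if (startsWith(prefix, 0xFF, 0xFE, 0x00, 0x00)) {
            return "UTF-32LE";
        }
        if (startsWith(prefix, 0x00, 0x00, 0xFE, 0xFF)) {
            return "UTF-32BE";
        }
        if (startsWith(prefix, 0xFF, 0xFE)) {
            return "UTF-16LE";
        }
        if (startsWith(prefix, 0xFE, 0xFF)) {
            return "UTF-16BE";
        }
        return null;
    }

    // Compares the leading bytes of data against the given unsigned values.
    private static boolean startsWith(byte[] data, int... bom) {
        if (data.length < bom.length) {
            return false;
        }
        for (int i = 0; i < bom.length; i++) {
            if ((data[i] & 0xFF) != bom[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Returning null rather than a default keeps "no BOM found" distinguishable from a positive detection, which lets a caller chain other detectors behind it.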
[jira] [Commented] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816661#comment-17816661 ] Tim Allison commented on TIKA-4194: --- Thank you for this! I'll try to take a look later today. Is there anything in https://issues.apache.org/jira/browse/TIKA-3784?focusedCommentId=17816191&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17816191 that could be useful? IIUC, magic is doubtful for this file type. On that comment I ran the bouncycastle parser on the files and pulled out some info. Can we use that info for detection? Again, thank you, and I'm not necessarily against a "better than what we have" magic guessing approach. > tika fails to detect certain pkcs12 keystores types p12 pfx > --- > > Key: TIKA-4194 > URL: https://issues.apache.org/jira/browse/TIKA-4194 > Project: Tika > Issue Type: Bug > Components: detector >Affects Versions: 2.9.1 >Reporter: Lonzak >Priority: Major > > We use tika to detect the type of a file which is uploaded. In most cases > this works quite well. However recently some files were rejected because tika > reports an invalid file type. We'll get > {code:java} > APPLICATION/OCTET-STREAM{code} > instead of > {code:java} > APPLICATION/X-X509-KEY{code} > I did an analysis and found that tika doesn't recognize certain types of > pkcs12 keystores. The test keystores can be found > [here|https://github.com/redhat-qe-security/keyfile-corpus/tree/master]. > I created a list to show which ones are affected. Out of 157 keystores 132 > are correctly detected and 25 are not. 
[jira] [Created] (TIKA-4195) JSoupParser conceals null from the EncodingDetector
Tim Allison created TIKA-4195:
---------------------------------

Summary: JSoupParser conceals null from the EncodingDetector
Key: TIKA-4195
URL: https://issues.apache.org/jira/browse/TIKA-4195
Project: Tika
Issue Type: Improvement
Reporter: Tim Allison

The JSoupParser runs encoding detection on the input stream. If the result is null, the parser applies the default charset, US-ASCII. This behavior is OK. The problem is that there is no way to distinguish a faulty encoding detector that alleges 'US-ASCII' from the default behavior of the JSoupParser.

I don't think the JSoupParser should report the fallback encoding as if it were detected. I'm not sure how best to report this in the metadata, but we need to be able to differentiate between a detected encoding and a fallback encoding.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
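One way to picture the distinction asked for above is a small result holder that records where the charset came from. This is only a sketch: `EncodingResult`, `Source` and `of` are hypothetical names, not part of Tika, and the issue leaves open how the information would actually be reported in the metadata.

```java
import java.nio.charset.Charset;

/** Hypothetical holder: the charset to use plus whether a detector really found it. */
final class EncodingResult {
    enum Source { DETECTED, FALLBACK }

    final Charset charset;
    final Source source;

    private EncodingResult(Charset charset, Source source) {
        this.charset = charset;
        this.source = source;
    }

    /**
     * @param detected the detector's answer, or null if it found nothing
     * @param fallback the parser default (US-ASCII according to the report above)
     */
    static EncodingResult of(Charset detected, Charset fallback) {
        return detected != null
                ? new EncodingResult(detected, Source.DETECTED)
                : new EncodingResult(fallback, Source.FALLBACK);
    }
}
```

With something like this, a detector that really alleges US-ASCII yields `DETECTED`, while a null result yields `FALLBACK` with the same charset, so the two cases stay distinguishable downstream.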
[jira] [Comment Edited] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816607#comment-17816607 ]

Lonzak edited comment on TIKA-4194 at 2/12/24 1:47 PM:
-------------------------------------------------------

Interestingly, the "application/pkcs7-signature" type looks quite similar:
{code:java} {code}
I just had to adapt the offset a bit and it worked:
{code:java} {code}
However, I didn't find a keystore with 0x3081, so the offset is unclear in that case.

My solution would now look like this, and it works for all the cases...
{code:java} ... {code}

was (Author: tom_1st):
Interestingly the "application/pkcs7-signature" type looks quite similar:
{code:java} {code}
Just had to adapt the offset a bit and it did work:
{code:java} {code}
However I didn't find a keystore with 0x3081 so the offset is unclear in that case.

My solution would look like this now and works for all the cases...
{code:java} ... {code}

> tika fails to detect certain pkcs12 keystores types p12 pfx
> -----------------------------------------------------------
>
>                 Key: TIKA-4194
>                 URL: https://issues.apache.org/jira/browse/TIKA-4194
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 2.9.1
>            Reporter: Lonzak
>            Priority: Major
>
> We use Tika to detect the type of an uploaded file. In most cases
> this works quite well. However, recently some files were rejected because Tika
> reported an invalid file type. We get
> {code:java}
> APPLICATION/OCTET-STREAM{code}
> instead of
> {code:java}
> APPLICATION/X-X509-KEY{code}
> I did an analysis and found that Tika doesn't recognize certain types of
> PKCS#12 keystores. The test keystores can be found
> [here|https://github.com/redhat-qe-security/keyfile-corpus/tree/master].
> I created a list to show which ones are affected. Out of 157 keystores, 132
> are correctly detected and 25 are not.
>
> ||#||correct?||type||filename||
> |1|OK|APPLICATION/X-X509-KEY; FORMAT=DER|dsa(1024,sha1),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |2|OK|APPLICATION/X-X509-KEY; FORMAT=DER|dsa(1024,sha1),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |3|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |4|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |5|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none).p12|
> |6|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |7|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(0),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |8|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(0),iter(2048),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |9|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(16),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |10|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(16),iter(2048),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |11|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(64),iter(100),keyLen(default),prf(hmacWithSHA512)),aes-256-cbc(IV(16,mac(sha512,salt(64),iter(100)),pass(ascii).p12|
> |12|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(1),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |13|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(1),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |14|OK|APPLICATION/X
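On the open 0x3081 question in the comment above: in BER/DER the number of length octets after the SEQUENCE tag (0x30) depends on the first length octet, so the position of whatever follows shifts with it. The sketch below is not the code from the pull request; `Pkcs12Sniffer` and its methods are hypothetical names. It assumes the check is "outer SEQUENCE followed by INTEGER 3", which matches the PFX version field defined in RFC 7292, and it is a heuristic only, since other ASN.1 structures could begin the same way.

```java
/** Hypothetical helper, not Tika code: locates the PFX version field behind the outer SEQUENCE header. */
final class Pkcs12Sniffer {

    /** Returns the size in bytes of the SEQUENCE tag plus its length octets, or -1 if not a SEQUENCE. */
    static int sequenceHeaderLength(byte[] b) {
        if (b.length < 2 || (b[0] & 0xFF) != 0x30) {
            return -1;                       // not an ASN.1 SEQUENCE
        }
        int first = b[1] & 0xFF;
        if (first <= 0x80) {
            return 2;                        // short form (< 0x80) or BER indefinite length (0x80)
        }
        int lengthOctets = first & 0x7F;     // 0x81 -> 1, 0x82 -> 2, ...
        if (lengthOctets > 4) {
            return -1;                       // implausibly large for a keystore file
        }
        return 2 + lengthOctets;
    }

    /** True if the bytes start with SEQUENCE { INTEGER 3, i.e. 02 01 03 right after the header. */
    static boolean looksLikePkcs12(byte[] b) {
        int off = sequenceHeaderLength(b);
        if (off < 0 || b.length < off + 3) {
            return false;
        }
        return (b[off] & 0xFF) == 0x02
                && (b[off + 1] & 0xFF) == 0x01
                && (b[off + 2] & 0xFF) == 0x03;
    }
}
```

Under these assumptions a file starting with 0x3082 has its version bytes at offset 4, and one starting with 0x3081 would have them at offset 3.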
[jira] [Commented] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816607#comment-17816607 ]

Lonzak commented on TIKA-4194:
------------------------------

Interestingly, the "application/pkcs7-signature" type looks quite similar:
{code:java} {code}
I just had to adapt the offset a bit and it worked:
{code:java} {code}
However, I didn't find a keystore with 0x3081, so the offset is unclear in that case.

My solution would now look like this, and it works for all the cases...
{code:java} ... {code}

> tika fails to detect certain pkcs12 keystores types p12 pfx
> -----------------------------------------------------------
>
>                 Key: TIKA-4194
>                 URL: https://issues.apache.org/jira/browse/TIKA-4194
[jira] [Commented] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816608#comment-17816608 ]

Lonzak commented on TIKA-4194:
------------------------------

Added a pull request: https://github.com/apache/tika/pull/1589

> tika fails to detect certain pkcs12 keystores types p12 pfx
> -----------------------------------------------------------
>
>                 Key: TIKA-4194
>                 URL: https://issues.apache.org/jira/browse/TIKA-4194
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 2.9.1
>            Reporter: Lonzak
>            Priority: Major
>
> We use Tika to detect the type of an uploaded file. In most cases
> this works quite well. However, recently some files were rejected because Tika
> reported an invalid file type. We get
> {code:java}
> APPLICATION/OCTET-STREAM{code}
> instead of
> {code:java}
> APPLICATION/X-X509-KEY{code}
> I did an analysis and found that Tika doesn't recognize certain types of
> PKCS#12 keystores. The test keystores can be found
> [here|https://github.com/redhat-qe-security/keyfile-corpus/tree/master].
> I created a list to show which ones are affected. Out of 157 keystores, 132
> are correctly detected and 25 are not.
>
> ||#||correct?||type||filename||
> |1|OK|APPLICATION/X-X509-KEY; FORMAT=DER|dsa(1024,sha1),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |2|OK|APPLICATION/X-X509-KEY; FORMAT=DER|dsa(1024,sha1),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |3|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |4|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |5|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(none),key(none).p12|
> |6|OK|APPLICATION/X-X509-KEY; FORMAT=DER|ecdsa(P-256,sha256),cert(pbeWithSHAAnd40BitRC2-CBC,salt(8),iter(2048)),key(pbeWithSHAAnd3-KeyTripleDES-CBC,salt(8),iter(2048)),mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |7|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(0),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |8|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(0),iter(2048),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |9|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(16),iter(2048),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |10|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(16),iter(2048),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |11|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(64),iter(100),keyLen(default),prf(hmacWithSHA512)),aes-256-cbc(IV(16,mac(sha512,salt(64),iter(100)),pass(ascii).p12|
> |12|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(1),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |13|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(1),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |14|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(100),keyLen(default),prf(default)),aes-128-cbc(IV(16,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |15|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(100),keyLen(default),prf(default)),des-ede3-cbc(IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |16|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(16),prf(default)),rc2-cbc(keyBits(56=128bit),IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |17|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(16),prf(hmacWithSHA256)),rc2-cbc(keyBits(56=128bit),IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |18|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PBES2(PBKDF2(salt(8),iter(2048),keyLen(5),prf(default)),rc2-cbc(keyBits(160=40bit),IV(8,mac(sha1,salt(8),iter(2048)),pass(ascii).p12|
> |19|OK|APPLICATION/X-X509-KEY; FORMAT=DER|rsa(2048,sha256),cert&key(PB
[jira] [Commented] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816606#comment-17816606 ]

ASF GitHub Bot commented on TIKA-4194:
--------------------------------------

Lonzak commented on PR #1589:
URL: https://github.com/apache/tika/pull/1589#issuecomment-1938709322

It would be appreciated if the change could go into 2.9.X ([branch_2x](https://github.com/apache/tika/tree/branch_2x))

> tika fails to detect certain pkcs12 keystores types p12 pfx
> -----------------------------------------------------------
>
>                 Key: TIKA-4194
>                 URL: https://issues.apache.org/jira/browse/TIKA-4194
Re: [PR] [TIKA-4194] Fix for unrecognized pkcs12 keystores [tika]
Lonzak commented on PR #1589:
URL: https://github.com/apache/tika/pull/1589#issuecomment-1938709322

It would be appreciated if the change could go into 2.9.X ([branch_2x](https://github.com/apache/tika/tree/branch_2x))
[jira] [Commented] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816605#comment-17816605 ]

ASF GitHub Bot commented on TIKA-4194:
--------------------------------------

Lonzak opened a new pull request, #1589:
URL: https://github.com/apache/tika/pull/1589

Fixes the issue that some PKCS#12 keystores are not correctly detected. Tested with 157 p12 keystores (kindly provided by [redhat](https://github.com/redhat-qe-security/keyfile-corpus/tree/master)).

> tika fails to detect certain pkcs12 keystores types p12 pfx
> -----------------------------------------------------------
>
>                 Key: TIKA-4194
>                 URL: https://issues.apache.org/jira/browse/TIKA-4194