[jira] [Updated] (TIKA-1748) Upgrade to POI 3.13-final when available
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1748: -- Attachment: TIKA-1748.patch Y, not too much work. All tests pass; what could possibly go wrong? I added table markup for pptx, parallel to what we already do for ppt. Happy to break that into a separate issue if desired. [~gagravarr], if you have a chance, would you mind reviewing this before I commit? > Upgrade to POI 3.13-final when available > > > Key: TIKA-1748 > URL: https://issues.apache.org/jira/browse/TIKA-1748 > Project: Tika > Issue Type: Task >Reporter: Tim Allison >Priority: Minor > Attachments: TIKA-1748.patch > > > Upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: SSL configuration in tika server 1.10
Hi, Is it possible to configure the Tika server to handle traffic over HTTPS? Currently I am running tika-server.jar, and the server is reachable via http://localhost:/tika, but my requirement is to have the server running on HTTPS. On Fri, Sep 25, 2015 at 6:44 PM, Rahul Khandelwal wrote: > Hi, > > I need help configuring the Tika server for HTTPS requests. > > -- > *Thanks,* > *Rahul Khandelwal* -- *Regards,* *Rahul Khandelwal* *Software Engineer,* *Druva Data Solutions, Pune*
[jira] [Comment Edited] (TIKA-1748) Upgrade to POI 3.13-final when available
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933186#comment-14933186 ] Tim Allison edited comment on TIKA-1748 at 9/28/15 11:40 AM: - As [~kunda] pointed out, you're using a future version of POI. :) But seriously, 3.13-final includes a major refactoring of HSLF (sl branch), and we need to modify Tika (slightly, I hope*) to work with this new branch. * Famous last words... was (Author: talli...@mitre.org): As [~kunda] pointed out, you're using a future version of POI. :) But seriously, 3.13-final includes a major refactoring of HSLF (sl branch), and we need to modify Tika (slightly, I hope) to work with this new branch.
[jira] [Commented] (TIKA-1748) Upgrade to POI 3.13-final when available
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933186#comment-14933186 ] Tim Allison commented on TIKA-1748: --- As [~kunda] pointed out, you're using a future version of POI. :) But seriously, 3.13-final includes a major refactoring of HSLF (sl branch), and we need to modify Tika (slightly, I hope) to work with this new branch.
[jira] [Commented] (TIKA-1736) Bouncy Castle version binary incompatibility
[ https://issues.apache.org/jira/browse/TIKA-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933175#comment-14933175 ] Tim Allison commented on TIKA-1736: --- Should be fixed when [2.1.1|https://sourceforge.net/p/jackcessencrypt/feature-requests/2/] is released. Thank you, James Ahlborn! > Bouncy Castle version binary incompatibility > > > Key: TIKA-1736 > URL: https://issues.apache.org/jira/browse/TIKA-1736 > Project: Tika > Issue Type: Bug >Reporter: Tim Allison >Priority: Minor > > One file in our Common Crawl stash demonstrates a Bouncy Castle version > conflict...incompatible binaries with Jackcess and our current version of > Bouncy Castle. > java.lang.NoSuchMethodError: > org.bouncycastle.crypto.StreamCipher.processBytes([BII[BI)V > at > com.healthmarketscience.jackcess.impl.BaseCryptCodecHandler.streamDecrypt(BaseCryptCodecHandler.java:91) > at > com.healthmarketscience.jackcess.impl.BaseJetCryptCodecHandler.decodePage(BaseJetCryptCodecHandler.java:62) > at > com.healthmarketscience.jackcess.impl.PageChannel.readPage(PageChannel.java:224) > at com.healthmarketscience.jackcess.impl.UsageMap.read(UsageMap.java:130) > at > com.healthmarketscience.jackcess.impl.PageChannel.initialize(PageChannel.java:117) > at > com.healthmarketscience.jackcess.impl.DatabaseImpl.<init>(DatabaseImpl.java:516) > at > com.healthmarketscience.jackcess.impl.DatabaseImpl.open(DatabaseImpl.java:389) > at > com.healthmarketscience.jackcess.DatabaseBuilder.open(DatabaseBuilder.java:248) > at TestIt.testIt(TestIt.java:19) > A full description and test file are attached > [here|https://sourceforge.net/p/jackcessencrypt/feature-requests/2/#b65d]. > There was an API change in 1.51 that causes this problem. 1.50 works with > the one test file, and 1.51 does not work. We're currently using 1.52. > It looks like POI is using 1.51 in trunk, now. According to PDFBox trunk's > build.xml, they're using 1.50, but their pom.xml has 1.51.
> Two options that I see: > 1) close our eyes and hope it doesn't affect too many people before Jackcess > Encrypt upgrades... perhaps add a try/catch for this one version conflict? > Is there any shade magic we can do on our end, or (I'm assuming) would > that have to be done by Jackcess (or an upgrade, of course)? > 2) downgrade our bcprov to 1.50 (from 1.52). > Other options?
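One quick way to see which side of the 1.51 API change a given classpath is on is to inspect `StreamCipher.processBytes` reflectively. The missing descriptor in the trace, `([BII[BI)V`, decodes to `processBytes(byte[], int, int, byte[], int)` returning `void`, so if the loaded class reports a non-`void` return type, that would explain the `NoSuchMethodError`. This is a hypothetical diagnostic sketch, not part of Tika, Jackcess, or Bouncy Castle: the names `BcSignatureCheck` and `processBytesReturnType` are made up for illustration, and it assumes only the standard Java reflection API. Run it with the same classpath as the failing code.

```java
import java.lang.reflect.Method;

public class BcSignatureCheck {

    /**
     * Hypothetical helper: reports the return type of
     * processBytes(byte[], int, int, byte[], int) on the named class,
     * or "missing-class" / "missing-method" if it cannot be found.
     */
    static String processBytesReturnType(String className) {
        Class<?> cls;
        try {
            cls = Class.forName(className);
        } catch (ClassNotFoundException e) {
            return "missing-class";
        }
        try {
            // Parameter list matches the ([BII[BI) part of the JVM descriptor.
            Method m = cls.getMethod("processBytes",
                    byte[].class, int.class, int.class, byte[].class, int.class);
            // getName() yields "void" or "int" for primitive return types.
            return m.getReturnType().getName();
        } catch (NoSuchMethodException e) {
            return "missing-method";
        }
    }

    public static void main(String[] args) {
        String ret = processBytesReturnType("org.bouncycastle.crypto.StreamCipher");
        System.out.println("StreamCipher.processBytes returns: " + ret);
        if (!"void".equals(ret) && !ret.startsWith("missing")) {
            // Code compiled against the void-returning signature (descriptor ending in V)
            // will not link against this version.
            System.out.println("Callers expecting ([BII[BI)V will fail with NoSuchMethodError.");
        }
    }
}
```

Because the JVM links methods by full descriptor, including the return type, a catch of `NoSuchMethodError` (option 1 above) would only mask the failure at the call site; it would not make the decryption path work.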
[jira] [Commented] (TIKA-1085) PDF header and mime detection
[ https://issues.apache.org/jira/browse/TIKA-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933108#comment-14933108 ] Nick Burch commented on TIKA-1085: -- I think we're still waiting for you to confirm whether the fix I applied back in May works or not! (The fix was in Tika 1.9 and 1.10.) > PDF header and mime detection > - > > Key: TIKA-1085 > URL: https://issues.apache.org/jira/browse/TIKA-1085 > Project: Tika > Issue Type: Improvement > Components: mime >Affects Versions: 1.3 >Reporter: Marco Quaranta >Priority: Minor > Labels: detection, header, mime, pdf > Attachments: hello-world-bom.pdf, hello-world.pdf, test.pdf > > > I've found some PDF files that Tika recognizes as application/octet-stream. > These files differ from regularly identified PDFs in having a different header: > the %PDF-N.n string isn't at the beginning (zero offset) of the file but within > the first 1024 bytes. > The PDF reference states that "The first line of a PDF file shall be a header > consisting of the 5 characters %PDF- followed by a version > number of the form 1.N, where N is a digit between 0 and 7" > (http://tinyurl.com/8vnzm3c "p. 7.5.2 File Header"). > Looking further at the implementation notes by Adobe (http://tinyurl.com/cbqpb24 > p. 3.4.1 File Header), I've discovered that: "Acrobat viewers require only that > the header appear somewhere within the first 1024 bytes of the file" > What do you think about a PDF magic match with an offset 0:1024? > > Thank you, > Marco
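The proposed relaxation (accept `%PDF-` anywhere in the first 1024 bytes instead of only at offset 0) amounts to a bounded scan of the leading bytes. The sketch below is a standalone illustration, not Tika's detector API or the fix that was actually applied; `PdfHeaderScan` and `findPdfHeader` are hypothetical names, and it assumes the whole five-byte marker must fit inside the 1024-byte window.

```java
import java.nio.charset.StandardCharsets;

public class PdfHeaderScan {

    // The marker from the PDF reference; the version digits after it are not checked here.
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Window size taken from the Adobe implementation note quoted in the issue.
    private static final int WINDOW = 1024;

    /**
     * Hypothetical helper: returns the offset at which "%PDF-" starts if the whole
     * marker lies within the first WINDOW bytes of data, or -1 otherwise.
     */
    static int findPdfHeader(byte[] data) {
        int limit = Math.min(data.length, WINDOW);
        for (int i = 0; i + MAGIC.length <= limit; i++) {
            boolean match = true;
            for (int j = 0; j < MAGIC.length; j++) {
                if (data[i + j] != MAGIC[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] plain = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        // UTF-8 byte order mark in front of the header, as in hello-world-bom.pdf's name suggests.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '%', 'P', 'D', 'F', '-', '1', '.', '4'};
        System.out.println(findPdfHeader(plain));   // 0
        System.out.println(findPdfHeader(withBom)); // 3
    }
}
```

Under this rule a header preceded by a 3-byte BOM matches at offset 3, while a marker that only starts at byte 1030 is rejected. A wider window catches more malformed files at the cost of more false positives on non-PDF data that happens to contain `%PDF-`.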
[jira] [Commented] (TIKA-1085) PDF header and mime detection
[ https://issues.apache.org/jira/browse/TIKA-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933104#comment-14933104 ] Matthew Buckett commented on TIKA-1085: --- Is there anything I can do to help this get fixed?