[jira] [Updated] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-28 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-1748:
--
Attachment: TIKA-1748.patch

Yes, not too much work.  All tests pass, what could possibly go wrong?

I added markup for tables in pptx parallel to ppt.  Happy to break that into 
a separate issue if desired.

[~gagravarr], if you have a chance, would you mind reviewing this before I 
commit?

> Upgrade to POI 3.13-final when available
> 
>
> Key: TIKA-1748
> URL: https://issues.apache.org/jira/browse/TIKA-1748
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Minor
> Attachments: TIKA-1748.patch
>
>
> Upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: SSL configuration in tika server 1.10

2015-09-28 Thread Rahul Khandelwal
Hi,


Is it possible to configure Tika server to handle traffic over HTTPS?

Currently I am running tika-server.jar, and the server is reachable via
http://localhost:/tika
But my requirement is to have the server running on HTTPS.


On Fri, Sep 25, 2015 at 6:44 PM, Rahul Khandelwal 
wrote:

> Hi,
>
>
> I need help configuring Tika server for HTTPS requests.
>
>
>
> --
>
> *Thanks,*
>
> *Rahul Khandelwal*
>



-- 
*Regards,*
*Rahul Khandelwal*
*Software Engineer,*
*Druva Data Solutions, Pune*


[jira] [Comment Edited] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-28 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933186#comment-14933186
 ] 

Tim Allison edited comment on TIKA-1748 at 9/28/15 11:40 AM:
-

As [~kunda] pointed out, you're using a future version of POI. :)  But 
seriously, 3.13-final includes a major refactoring of HSLF (sl branch), and we 
need to modify Tika (slightly, I hope*) to work with this new branch.

* Famous last words...


was (Author: talli...@mitre.org):
As [~kunda] pointed out, you're using a future version of POI. :)  But 
seriously, 3.13-final includes a major refactoring of HSLF (sl branch), and we 
need to modify Tika (slightly, I hope) to work with this new branch.

> Upgrade to POI 3.13-final when available
> 
>
> Key: TIKA-1748
> URL: https://issues.apache.org/jira/browse/TIKA-1748
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Minor
>
> Upgrade.





[jira] [Commented] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-28 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933186#comment-14933186
 ] 

Tim Allison commented on TIKA-1748:
---

As [~kunda] pointed out, you're using a future version of POI. :)  But 
seriously, 3.13-final includes a major refactoring of HSLF (sl branch), and we 
need to modify Tika (slightly, I hope) to work with this new branch.

> Upgrade to POI 3.13-final when available
> 
>
> Key: TIKA-1748
> URL: https://issues.apache.org/jira/browse/TIKA-1748
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Minor
>
> Upgrade.





[jira] [Commented] (TIKA-1736) Bouncy Castle version binary incompatibility

2015-09-28 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933175#comment-14933175
 ] 

Tim Allison commented on TIKA-1736:
---

Should be fixed when 
[2.1.1|https://sourceforge.net/p/jackcessencrypt/feature-requests/2/] is 
released.  Thank you, James Ahlborn!

> Bouncy Castle version binary incompatibility
> 
>
> Key: TIKA-1736
> URL: https://issues.apache.org/jira/browse/TIKA-1736
> Project: Tika
>  Issue Type: Bug
>Reporter: Tim Allison
>Priority: Minor
>
> One file in our Common Crawl stash demonstrates a Bouncy Castle version 
> conflict...incompatible binaries with Jackcess and our current version of 
> Bouncy Castle.
> java.lang.NoSuchMethodError: org.bouncycastle.crypto.StreamCipher.processBytes([BII[BI)V
>  at com.healthmarketscience.jackcess.impl.BaseCryptCodecHandler.streamDecrypt(BaseCryptCodecHandler.java:91)
>  at com.healthmarketscience.jackcess.impl.BaseJetCryptCodecHandler.decodePage(BaseJetCryptCodecHandler.java:62)
>  at com.healthmarketscience.jackcess.impl.PageChannel.readPage(PageChannel.java:224)
>  at com.healthmarketscience.jackcess.impl.UsageMap.read(UsageMap.java:130)
>  at com.healthmarketscience.jackcess.impl.PageChannel.initialize(PageChannel.java:117)
>  at com.healthmarketscience.jackcess.impl.DatabaseImpl.<init>(DatabaseImpl.java:516)
>  at com.healthmarketscience.jackcess.impl.DatabaseImpl.open(DatabaseImpl.java:389)
>  at com.healthmarketscience.jackcess.DatabaseBuilder.open(DatabaseBuilder.java:248)
>  at TestIt.testIt(TestIt.java:19)
> A full description and test file are attached 
> [here|https://sourceforge.net/p/jackcessencrypt/feature-requests/2/#b65d].
> There was an API change in 1.51 that causes this problem.  1.50 works with 
> the one test file, and 1.51 does not work.  We're currently using 1.52.
> It looks like POI is using 1.51 in trunk, now. According to PDFBox trunk's 
> build.xml, they're using 1.50, but their pom.xml has 1.51.
> Two options that I see:
> 1) close our eyes and hope it doesn't affect too many people before Jackcess 
> Encrypt upgrades... perhaps add a try/catch for this one version conflict?  
> Is there any shade magic we can do on our end ... or (I'm assuming) would 
> that have to be done by Jackcess (or an upgrade, of course)?
> 2) downgrade our bc-prov to 1.50 (from 1.52).
> Other options?
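
The try/catch idea in option 1 could look roughly like the sketch below. It is only an illustration, not Tika's actual code: BinaryConflictGuard and DependencyConflictException are hypothetical names, and the caller is assumed to wrap whatever call opens the database in a Callable. The guard converts the NoSuchMethodError seen in the stack trace into an ordinary checked exception, so one mismatched jar need not abort the whole parse.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Tika or Jackcess.
final class BinaryConflictGuard {

    /** Hypothetical checked exception signalling a dependency version conflict. */
    static final class DependencyConflictException extends Exception {
        DependencyConflictException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the task and translates a NoSuchMethodError (a symptom of
     * binary-incompatible jars on the classpath) into a checked exception.
     * Any other exception or error propagates unchanged.
     */
    static <T> T call(Callable<T> task) throws Exception {
        try {
            return task.call();
        } catch (NoSuchMethodError e) {
            throw new DependencyConflictException(
                    "Binary incompatibility between dependency versions: "
                            + e.getMessage(), e);
        }
    }
}
```

This only hides the symptom for the affected files; the file still cannot be decrypted until the library versions agree, which is why the downgrade or an upstream fix remains the real remedy.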





[jira] [Commented] (TIKA-1085) PDF header and mime detection

2015-09-28 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933108#comment-14933108
 ] 

Nick Burch commented on TIKA-1085:
--

I think we're still waiting for you to confirm if the fix I applied back in May 
works or not! (The fix was in Tika 1.9 and 1.10)

> PDF header and mime detection
> -
>
> Key: TIKA-1085
> URL: https://issues.apache.org/jira/browse/TIKA-1085
> Project: Tika
>  Issue Type: Improvement
>  Components: mime
>Affects Versions: 1.3
>Reporter: Marco Quaranta
>Priority: Minor
>  Labels: detection, header, mime, pdf
> Attachments: hello-world-bom.pdf, hello-world.pdf, test.pdf
>
>
> I've found some PDF files that Tika recognizes as application/octet-stream.
> These files differ from regularly identified PDFs in having a different header: 
> the %PDF-N.n string isn't at the beginning (zero offset) of the file but 
> somewhere in the first 1024 bytes.
> The PDF reference states that "The first line of a PDF file shall be a header 
> consisting of the 5 characters %PDF– followed by a version 
> number of the form 1.N, where N is a digit between 0 and 7" 
> (http://tinyurl.com/8vnzm3c "p. 7.5.2 File Header"). 
> Looking further at the implementation notes by Adobe (http://tinyurl.com/cbqpb24 
> p. 3.4.1 File Header), I've discovered that "Acrobat viewers require only that 
> the header appear somewhere within the first 1024 bytes of the file".
> What do you think about a PDF magic match with an offset of 0:1024?
> 
> Thank you,
> Marco
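
For illustration, the relaxed check proposed above can be sketched in plain Java. This is not how Tika's detector is implemented (Tika's magic rules live in its mime-types configuration); PdfHeaderSniffer and findPdfHeader are hypothetical names, and the sketch assumes the whole "%PDF-" marker must lie inside the first 1024 bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika.
final class PdfHeaderSniffer {

    // Marker that opens a PDF header line, e.g. "%PDF-1.4".
    private static final byte[] MARKER =
            "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Window size from the Acrobat implementation note quoted above.
    private static final int WINDOW = 1024;

    /**
     * Returns the offset of the first "%PDF-" marker that lies entirely
     * within the first 1024 bytes of data, or -1 if there is none.
     */
    static int findPdfHeader(byte[] data) {
        int lastStart = Math.min(data.length, WINDOW) - MARKER.length;
        for (int i = 0; i <= lastStart; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }
}
```

A file that starts with a byte-order mark followed by "%PDF-1.4" would be found at offset 3 here, whereas a strict offset-0 match would miss it. The trade-off is a higher chance of false positives on non-PDF files that merely contain the marker near the start.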





[jira] [Commented] (TIKA-1085) PDF header and mime detection

2015-09-28 Thread Matthew Buckett (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933104#comment-14933104
 ] 

Matthew Buckett commented on TIKA-1085:
---

Is there anything I can do to help this get fixed?



