[jira] [Created] (TIKA-3721) DGN parser

2022-04-19 Thread Dan Coldrick (Jira)
Dan Coldrick created TIKA-3721:
--

 Summary: DGN parser
 Key: TIKA-3721
 URL: https://issues.apache.org/jira/browse/TIKA-3721
 Project: Tika
  Issue Type: New Feature
  Components: parser
Affects Versions: 2.3.0
Reporter: Dan Coldrick


Does anyone have any experience with the DGN file format by MicroStation? I see 
TIKA doesn't have a parser so would it be possible to create one? 

https://docs.fileformat.com/cad/dgn/



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-19 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524588#comment-17524588
 ] 

Dan Coldrick commented on TIKA-3719:


Hi [~tallison] 

I'm far from being a Java developer, so I'm not sure how much further I can 
help, but how about adding some parameters to the XML config file? Something like:
{code:java}

    
        
            
                true
                JKS
                1
                c:/temp/keystore.jks
                JKS
                1
                c:/temp/keystore.jks
            
        
    

{code}
Also, holding keystore passwords in clear text doesn't feel right to me, so we 
might have to do something about encrypting them.

The next step would be to add some authorization (Basic Auth would be a good 
start :) ) to the server, but maybe that would be a separate feature? Would that 
be worth raising?

> Tika Server Ability to Run HTTPs
> 
>
> Key: TIKA-3719
> URL: https://issues.apache.org/jira/browse/TIKA-3719
> Project: Tika
>  Issue Type: Wish
>  Components: tika-server
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
>
> We need the ability to run TIKA server as a https end point, I can't see 
> anything in the config that allows for this. 
> Looks like I'm not the only one:
> [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https]
>  
> If anyone can point to some documentation on how it might be possible it 
> would be really appreciated.
>  
> Thanks





[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-20 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525225#comment-17525225
 ] 

Dan Coldrick commented on TIKA-3719:


Hi [~tallison] 

Yes, that sounds like a great idea. I think it should be optional: the password 
is either hard-coded in the config or read from an environment variable.
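To make the idea concrete, here is a minimal sketch of how a configured value could be either a literal or a pointer to an environment variable. The `env:` prefix, the `SecretResolver` class and the `KEYSTORE_PASSWORD` name are all hypothetical and only for illustration; nothing here is an existing Tika option.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical helper: resolves a secret from a literal config value or an environment variable. */
final class SecretResolver {
    // Assumed convention (not a Tika feature): "env:NAME" means "read NAME from the environment".
    private static final String ENV_PREFIX = "env:";

    private SecretResolver() {}

    static String resolve(String configured, Map<String, String> env) {
        if (configured == null || !configured.startsWith(ENV_PREFIX)) {
            return configured; // hard-coded value (or nothing configured)
        }
        String name = configured.substring(ENV_PREFIX.length());
        String value = env.get(name);
        if (value == null) {
            throw new IllegalStateException("Environment variable not set: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // A literal value is returned unchanged.
        System.out.println(resolve("changeit", System.getenv()));
        // Looked up in a supplied map so the example does not depend on the real environment.
        System.out.println(resolve("env:KEYSTORE_PASSWORD",
                Collections.singletonMap("KEYSTORE_PASSWORD", "from-env")));
    }
}
```

Passing the environment in as a map keeps the lookup easy to test; a real server would pass `System.getenv()`.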

Any thoughts about adding authorization? Separate JIRA? Personally, I think it 
would be really good to get to a stage where Tika server can be a fully hosted 
REST service with auth and some security around it.



[jira] [Created] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)

2022-04-20 Thread Dan Coldrick (Jira)
Dan Coldrick created TIKA-3725:
--

 Summary: Add Authorization to Tika Server (Suggest Basic to start 
off with)
 Key: TIKA-3725
 URL: https://issues.apache.org/jira/browse/TIKA-3725
 Project: Tika
  Issue Type: New Feature
  Components: tika-server
Affects Versions: 2.3.0
Reporter: Dan Coldrick


It would be good to get some authentication/authorization added to Tika server, 
to add another layer of security around the Tika server REST service.

This could become a rabbit hole given the number of options available for 
authentication/authorization (OAuth, OpenID etc.), so I suggest adding Basic 
auth as a starter.

As for how to store users and passwords, I suggest looking at how other Apache 
products do the same.
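For reference, this is roughly what checking a Basic `Authorization` header involves, using only the JDK. It is a sketch under assumptions: the `BasicAuthCheck` class and its single user/password pair are hypothetical, and a real server would hook something like this into its request pipeline and would not keep the password in clear text.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

/** Hypothetical helper: validates an HTTP Basic Authorization header against one user/password pair. */
final class BasicAuthCheck {
    private static final String SCHEME = "Basic ";

    private BasicAuthCheck() {}

    static boolean isAuthorized(String header, String user, String password) {
        // The header must look like "Basic <base64(user:password)>"; the scheme name is case-insensitive.
        if (header == null || !header.regionMatches(true, 0, SCHEME, 0, SCHEME.length())) {
            return false;
        }
        byte[] supplied;
        try {
            supplied = Base64.getDecoder().decode(header.substring(SCHEME.length()).trim());
        } catch (IllegalArgumentException e) {
            return false; // credentials were not valid Base64
        }
        byte[] expected = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual is used instead of Arrays.equals to reduce timing side channels.
        return MessageDigest.isEqual(expected, supplied);
    }
}
```

Basic auth only encodes the credentials, it does not encrypt them, so it is only worth having once the server is running over HTTPS.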





[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-20 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525254#comment-17525254
 ] 

Dan Coldrick commented on TIKA-3719:


Hi [~tallison] 

Unfortunately I probably can't share our full use case :(

What I can say is that it won't be public facing, but our security guys will 
pick up that Tika server isn't HTTPS and doesn't have any authorization around 
it. We have been using Tika server for the last 2 years with a .NET wrapper 
round it (basically the .NET wrapper spawns Tika server and passes it files to 
process, with some extra logic around DWG files). 



[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-20 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525286#comment-17525286
 ] 

Dan Coldrick commented on TIKA-3719:


[~tallison] 

I don't, but maybe it is something like this, which would have to be combined 
with what I posted earlier?
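{code:java}
import org.apache.cxf.configuration.jsse.TLSParameterJaxBUtils;
import org.apache.cxf.configuration.jsse.TLSServerParameters;
import org.apache.cxf.configuration.security.KeyStoreType;
import org.apache.cxf.configuration.security.TrustManagersType;

// Describe the keystore that backs the trust managers.
KeyStoreType keystore = new KeyStoreType();
keystore.setType("JKS");
keystore.setPassword("1");
keystore.setResource("keystore.jks");
// Wrap the keystore in a trust manager configuration.
TrustManagersType tmt = new TrustManagersType();
tmt.setKeyStore(keystore);
// Build the trust managers and attach them to the TLS server parameters.
TLSServerParameters parameters = new TLSServerParameters();
parameters.setTrustManagers(TLSParameterJaxBUtils.getTrustManagers(tmt, true));
{code}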



[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-20 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525293#comment-17525293
 ] 

Dan Coldrick commented on TIKA-3719:


Epic stuff [~tallison] 

I'll pull it down tomorrow and check it out (it's very late in the UK now). 
Thank you  :)



[jira] [Updated] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-21 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3719:
---
Attachment: image-2022-04-21-18-52-50-706.png



[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-21 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525914#comment-17525914
 ] 

Dan Coldrick commented on TIKA-3719:


Hi [~tallison] 

I don't think the trust store code I provided works: as soon as I set a trust 
store, it errors.

INFO  [main] 18:50:17,420 org.apache.tika.server.core.TikaServerProcess Starting Apache Tika server
INFO  [main] 18:50:17,496 org.apache.tika.server.core.TikaServerProcess Using custom config: G:\git\tika-server-config-default.xml
ERROR [main] 18:50:23,995 org.apache.cxf.configuration.jsse.TLSParameterJaxBUtils Could not load keystore resource C:/Program Files/Java/jdk1.8.0_301/jre/lib/security/cacerts
ERROR [main] 18:50:23,995 org.apache.tika.server.core.TikaServerProcess Can't start: 
java.io.IOException: Could not load keystore resource C:/Program Files/Java/jdk1.8.0_301/jre/lib/security/cacerts
    at org.apache.cxf.configuration.jsse.TLSParameterJaxBUtils.getKeyStore(TLSParameterJaxBUtils.java:161) ~[cxf-core-3.5.1.jar:3.5.1]
    at org.apache.cxf.configuration.jsse.TLSParameterJaxBUtils.getTrustManagers(TLSParameterJaxBUtils.java:395) ~[cxf-core-3.5.1.jar:3.5.1]
    at org.apache.tika.server.core.TikaServerProcess.getTlsParams(TikaServerProcess.java:304) ~[classes/:?]
    at org.apache.tika.server.core.TikaServerProcess.initServer(TikaServerProcess.java:265) ~[classes/:?]
    at org.apache.tika.server.core.TikaServerProcess.main(TikaServerProcess.java:133) ~[classes/:?]

 

If I ignore the trust manager the server starts. I'll do some digging around 
the trust store, but I'm not 100% sure I'll be able to solve it.
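One way to narrow this kind of failure down is to check, outside the server, that the keystore file exists and loads with the expected type and password. The sketch below uses only the JDK `KeyStore` API; the `KeystoreCheck` class is hypothetical and is not part of Tika or CXF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

/** Hypothetical helper for sanity-checking a keystore or truststore file. */
final class KeystoreCheck {
    private KeystoreCheck() {}

    /** Returns true if the file exists and loads as a keystore of the given type with the given password. */
    static boolean canLoad(Path file, String type, char[] password) {
        if (!Files.isRegularFile(file)) {
            return false;
        }
        try (InputStream in = Files.newInputStream(file)) {
            KeyStore ks = KeyStore.getInstance(type);
            ks.load(in, password);
            return true;
        } catch (IOException | GeneralSecurityException e) {
            return false; // wrong type, wrong password, or a corrupt file
        }
    }

    /** Writes an empty keystore, only so the example has something to load. */
    static void createEmpty(Path file, String type, char[] password) {
        try (OutputStream out = Files.newOutputStream(file)) {
            KeyStore ks = KeyStore.getInstance(type);
            ks.load(null, password); // initialise an empty in-memory keystore
            ks.store(out, password);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If `canLoad` succeeds for the path in the error message, the file itself is fine and the problem is more likely in how the path is handed to the TLS configuration.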



[jira] [Comment Edited] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-21 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525947#comment-17525947
 ] 

Dan Coldrick edited comment on TIKA-3719 at 4/21/22 6:27 PM:
-

[~tallison] 

I switched to trustKeyStore.setFile and now it looks like the truststore works 
(well, it loads anyway).

 
{code:java}
trustKeyStore.setFile(tlsConfig.getTrustStoreFile()); {code}
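A plausible reading of why this helps (an assumption based on the "Could not load keystore resource" message, not something confirmed in this thread) is that a "resource" is looked up on the classpath while a "file" is opened from the file system. The plain-JDK sketch below shows that difference; the `ResourceVsFile` class is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical demo: the same string can be valid as a file path but not as a classpath resource. */
final class ResourceVsFile {
    private ResourceVsFile() {}

    /** True if the name can be opened as a classpath resource. */
    static boolean onClasspath(String name) {
        try (InputStream in = ResourceVsFile.class.getClassLoader().getResourceAsStream(name)) {
            return in != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the name exists as a path on the file system. */
    static boolean onFileSystem(String name) {
        return Files.exists(Paths.get(name));
    }

    public static void main(String[] args) {
        // An absolute file system path; the JDK home directory is used because it always exists.
        String path = System.getProperty("java.home");
        System.out.println("as file:     " + onFileSystem(path));
        System.out.println("as resource: " + onClasspath(path));
    }
}
```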
 

 


was (Author: monkmachine):
[~tallison] 

trustKeyStore.setFile and now it looks like the truststore works (well it loads 
anyway).



[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-21 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525947#comment-17525947
 ] 

Dan Coldrick commented on TIKA-3719:


[~tallison] 

I switched to trustKeyStore.setFile and now it looks like the truststore works 
(well, it loads anyway).



[jira] [Updated] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-21 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3719:
---
Attachment: localhost.jks



[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-21 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525950#comment-17525950
 ] 

Dan Coldrick commented on TIKA-3719:


Tim I've attached my noddy localhost keystore which contains a localhost 
private cert if it's of any use to you. Password is "1"

i.e. the number 1



[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-21 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525960#comment-17525960
 ] 

Dan Coldrick commented on TIKA-3719:


I could get it working with my .pfx, which is a PKCS12 keystore. I used 
PowerShell to generate it; in case the command is of any use, here it is:

New-SelfSignedCertificate -CertStoreLocation Cert:\LocalMachine\My -DnsName 
"localhost" -FriendlyName "localhost" -NotAfter (Get-Date).AddYears(10)

I could then export it from the Windows certificate store to put in the JKS I 
attached.
 



[jira] [Comment Edited] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-21 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525947#comment-17525947
 ] 

Dan Coldrick edited comment on TIKA-3719 at 4/21/22 6:35 PM:
-

[~tallison] 

I switched to trustKeyStore.setFile and now it looks like the truststore works 
(well, it loads anyway).
{code:java}
trustKeyStore.setFile(tlsConfig.getTrustStoreFile()); {code}
 


was (Author: monkmachine):
[~tallison] 

trustKeyStore.setFile and now it looks like the truststore works (well it loads 
anyway).

 
{code:java}
trustKeyStore.setFile(tlsConfig.getTrustStoreFile()); {code}
 

 



[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-21 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525982#comment-17525982
 ] 

Dan Coldrick commented on TIKA-3719:


Super(*)



[jira] [Commented] (TIKA-3721) DGN parser

2022-04-21 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526006#comment-17526006
 ] 

Dan Coldrick commented on TIKA-3721:


So would it be possible to do it with Bentley viewer? Convert to PDF then parse 
with the PDF parser? 

[https://stackoverflow.com/questions/2560706/convert-dgn-to-pdf]

[https://communities.bentley.com/products/microstation/microstation_printing/f/printing-and-plotting-forum/104978/bentley-view-v8i-how-to-get-pdf-from-dgn-using-command-line]

 

> DGN parser
> --
>
> Key: TIKA-3721
> URL: https://issues.apache.org/jira/browse/TIKA-3721
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
>
> Does anyone have any experience with the DGN file format by MicroStation? I 
> see TIKA doesn't have a parser so would it be possible to create one? 
> https://docs.fileformat.com/cad/dgn/





[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-21 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526010#comment-17526010
 ] 

Dan Coldrick commented on TIKA-3719:


Ok Tim, will get it checked over and let you know in the next day or so if 
that's ok. Many thanks for your effort, very much appreciated.

I'll open a separate ticket for the Key Store passwords as well :)

Thanks again :) (*)



[jira] [Commented] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)

2022-04-21 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526098#comment-17526098
 ] 

Dan Coldrick commented on TIKA-3725:


[~tallison] [~nick] 

I definitely think Basic authorization is a good starting point. At least Tika 
server would then have some security around it, which from a consumer's point 
of view would allow hosting Tika server more securely than is currently possible.

[~nick] [~tallison] is it possible to reach out to the CXF devs in your Apache 
capacity to review the way Tika server is currently set up? Almost like a code 
review for best practice, so that it would be possible to use the CXF 
configuration files? While having a go with the SSL stuff, I noticed that if 
you drop a CXF.xml in the resources folder it appeared to spawn a separate 
Jetty server, but I don't have any idea how it works.

[https://cxf.apache.org/docs/secure-jax-rs-services.html]

 

> Add Authorization to Tika Server (Suggest Basic to start off with)
> --
>
> Key: TIKA-3725
> URL: https://issues.apache.org/jira/browse/TIKA-3725
> Project: Tika
>  Issue Type: New Feature
>  Components: tika-server
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
>
> It would be good to get some Authentication/Authorization added to TIKA server 
> to be able to add another layer of security around the Tika Server Rest 
> service.
> This could become a rabbit hole with the number of options available around 
> Authentication/Authorization (Oauth, OpenId etc) so suggest as a starter 
> basic Auth is added. 
> How to store user(s)/password suggest looking at how other apache products do 
> the same?  





[jira] [Commented] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)

2022-04-21 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526106#comment-17526106
 ] 

Dan Coldrick commented on TIKA-3725:


[~tallison] 

as per lots of previous comments you're a super(*)

:)



[jira] [Commented] (TIKA-3721) DGN parser

2022-04-21 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526113#comment-17526113
 ] 

Dan Coldrick commented on TIKA-3721:


Would it be possible to add the DGN detector? It looks like it's under the Apache license.

https://mvnrepository.com/artifact/com.github.peeveen/tika-dgn-detector/0.4

https://github.com/peeveen/tika-dgn-detector



[jira] [Commented] (TIKA-3721) DGN parser

2022-04-22 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526618#comment-17526618
 ] 

Dan Coldrick commented on TIKA-3721:


I looked at using Bentley viewer to convert via PDF, but it looks like a no-go 
due to licensing :(

Well done, Tim, on getting the detector working.



[jira] [Commented] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)

2022-04-22 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526620#comment-17526620
 ] 

Dan Coldrick commented on TIKA-3725:


I thought Basic auth was a good start; JWT will require a bit more 
configuration than just Basic.



[jira] [Updated] (TIKA-3721) DGN parser

2022-04-22 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3721:
---
Attachment: image-2022-04-22-20-00-45-704.png



[jira] [Updated] (TIKA-3721) DGN parser

2022-04-22 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3721:
---
Attachment: image-2022-04-22-20-01-09-564.png



[jira] [Updated] (TIKA-3721) DGN parser

2022-04-22 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3721:
---
Attachment: image-2022-04-22-20-02-24-180.png

> DGN parser
> --
>
> Key: TIKA-3721
> URL: https://issues.apache.org/jira/browse/TIKA-3721
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: image-2022-04-22-20-00-45-704.png, 
> image-2022-04-22-20-01-09-564.png, image-2022-04-22-20-02-24-180.png
>
>
> Does anyone have any experience with the DGN file format by MicroStation? I 
> see TIKA doesn't have a parser so would it be possible to create one? 
> https://docs.fileformat.com/cad/dgn/





[jira] [Commented] (TIKA-3721) DGN parser

2022-04-22 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526633#comment-17526633
 ] 

Dan Coldrick commented on TIKA-3721:


Yeah, I think our company already owns the Aspose .NET libraries, but we've found 
them to be quite hit and miss in how well they work. I was hoping to get 
something in Tika but it doesn't look possible. I think even adding the detector is 
a good start.

Tim, I'm assuming we still won't be able to pull the metadata out of the file without a 
proper parser? I'm not 100% sure about this, so am I right in thinking (is this how 
other metadata parsers work?) that these are the properties that could be pulled out of 
the file:

!image-2022-04-22-20-00-45-704.png!

If so I've set a couple:

!image-2022-04-22-20-01-09-564.png!

Then if I unzip the DGN (V8 version) I can see them in the unzipped 
document summary file:

!image-2022-04-22-20-02-24-180.png!

 

Do you think that would be along the right lines for getting them out?

> DGN parser
> --
>
> Key: TIKA-3721
> URL: https://issues.apache.org/jira/browse/TIKA-3721
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: image-2022-04-22-20-00-45-704.png, 
> image-2022-04-22-20-01-09-564.png, image-2022-04-22-20-02-24-180.png
>
>
> Does anyone have any experience with the DGN file format by MicroStation? I 
> see TIKA doesn't have a parser so would it be possible to create one? 
> https://docs.fileformat.com/cad/dgn/





[jira] [Commented] (TIKA-3721) DGN parser

2022-04-22 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526644#comment-17526644
 ] 

Dan Coldrick commented on TIKA-3721:


[~tallison]  You're above my ability with this lol. POI is the library for 
pulling stuff out of Microsoft formats like docx, xlsx etc? It looks like these 
might be similar in that they are zipped, but I'm not 100% sure how this all 
works; I can just see the patterns. 

In Peeveen's GitHub repo he has loads of examples. V7 doesn't look to work the 
same as V8 (where V8 is a zip, a bit like docx/xlsx etc):

[https://github.com/peeveen/tika-dgn-detector/tree/master/src/test/resources/dgn/dgn8]

[https://github.com/peeveen/tika-dgn-detector/tree/master/src/test/resources/dgn/dgn7]
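
A minimal sketch of how the V7/V8 split could be checked before a file is handed to POI. It assumes, as the patterns above suggest, that V8 DGN files are OLE2 compound documents (the container POI reads) and V7 files are not. The class and method names are hypothetical and not part of Tika or POI; only the 8-byte OLE2 signature is a known constant.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Tika or POI. */
public class Ole2Sniffer {

    // Signature found at the start of every OLE2 compound document.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /**
     * Returns true if the given leading bytes of a file start with the
     * OLE2 signature. Assumption: a V8 DGN would pass, a V7 DGN would not.
     */
    public static boolean looksLikeOle2(byte[] head) {
        if (head == null || head.length < OLE2_MAGIC.length) {
            return false;
        }
        // Compare only the first eight bytes against the signature.
        return Arrays.equals(Arrays.copyOf(head, OLE2_MAGIC.length), OLE2_MAGIC);
    }
}
```

A parser could call `looksLikeOle2` on the first eight bytes of the stream and only open a POI file system when it returns true, leaving the V7 files for a separate code path.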

 

> DGN parser
> --
>
> Key: TIKA-3721
> URL: https://issues.apache.org/jira/browse/TIKA-3721
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: image-2022-04-22-20-00-45-704.png, 
> image-2022-04-22-20-01-09-564.png, image-2022-04-22-20-02-24-180.png
>
>
> Does anyone have any experience with the DGN file format by MicroStation? I 
> see TIKA doesn't have a parser so would it be possible to create one? 
> https://docs.fileformat.com/cad/dgn/





[jira] [Commented] (TIKA-3721) DGN parser

2022-04-22 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526651#comment-17526651
 ] 

Dan Coldrick commented on TIKA-3721:


Thanks Tim, I might have a play at the weekend if I'm bored. 

Have a nice weekend.

> DGN parser
> --
>
> Key: TIKA-3721
> URL: https://issues.apache.org/jira/browse/TIKA-3721
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: image-2022-04-22-20-00-45-704.png, 
> image-2022-04-22-20-01-09-564.png, image-2022-04-22-20-02-24-180.png
>
>
> Does anyone have any experience with the DGN file format by MicroStation? I 
> see TIKA doesn't have a parser so would it be possible to create one? 
> https://docs.fileformat.com/cad/dgn/





[jira] [Commented] (TIKA-3721) DGN parser

2022-04-23 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526803#comment-17526803
 ] 

Dan Coldrick commented on TIKA-3721:


Is this along the right lines?
{code:java}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.tika.parser.dgn;

import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Set;

import org.apache.commons.io.IOUtils;
import org.apache.commons.io.input.CloseShieldInputStream;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.microsoft.SummaryExtractor;
import org.apache.tika.sax.XHTMLContentHandler;

/**
 * DGN (CAD Drawing) parser. This is a very basic parser, which only reads the
 * summary properties of V8 files; V7 files are not handled yet.
 */
public class DGNParser extends AbstractParser {

    private static final long serialVersionUID = 311571157668507304L;
    private static final MediaType TYPE = MediaType.image("vnd.dgn");

    @Override
    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return Collections.singleton(TYPE);
    }

    @Override
    public void parse(InputStream stream, ContentHandler handler, Metadata metadata,
                      ParseContext context) throws IOException, SAXException, TikaException {
        XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
        xhtml.startDocument();
        SummaryExtractor summaryExtractor = new SummaryExtractor(metadata);
        final DirectoryNode root;
        TikaInputStream tstream = TikaInputStream.cast(stream);
        POIFSFileSystem mustCloseFs = null;
        try {
            if (tstream == null) {
                // Plain stream: open our own file system and close it in the finally block
                mustCloseFs = new POIFSFileSystem(new CloseShieldInputStream(stream));
                root = mustCloseFs.getRoot();
            } else {
                // Reuse a container that an earlier detector or parser already opened
                final Object container = tstream.getOpenContainer();
                if (container instanceof POIFSFileSystem) {
                    root = ((POIFSFileSystem) container).getRoot();
                } else if (container instanceof DirectoryNode) {
                    root = (DirectoryNode) container;
                } else {
                    POIFSFileSystem fs;
                    if (tstream.hasFile()) {
                        // Open the backing file read-only
                        fs = new POIFSFileSystem(tstream.getFile(), true);
                    } else {
                        fs = new POIFSFileSystem(new CloseShieldInputStream(tstream));
                    }
                    // tstream will close the fs, no need to close this below
                    tstream.setOpenContainer(fs);
                    root = fs.getRoot();
                }
            }
            // Copy the summary properties into the Tika metadata
            summaryExtractor.parseSummaries(root);
        } finally {
            IOUtils.closeQuietly(mustCloseFs);
        }
        xhtml.endDocument();
    }
}
  {code}
I know I'm not handling V7s yet, but it does appear to output the V8 metadata at 
least. If we have it in its own parser, there is the option to extend it for V7s. 
Again, I'm not really a proper Java developer and just hack my way around to 
get stuff working, so any feedback would be good.

> DGN parser
> --
>
> Key: TIKA-3721
> URL: https://issues.apache.org/jira/browse/TIKA-3721
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: Screenshot from 2022-04-22 16-03-44.png, 
> dgn8s-dumped.txt, image-2022-04-22-20-00-45-704.png, 
> image-2022-04-22-20-01-09-564.png, image-2022-04-22-20-02-24-180.png
>
>
> Does anyone have any experience with the DGN file format by MicroStation? I 
> see TIKA doesn't have a parser so would it be possible to create one? 
> https://doc

[jira] [Commented] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)

2022-04-25 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527706#comment-17527706
 ] 

Dan Coldrick commented on TIKA-3725:


[~tallison]  I see you've got some responses from the CXF guys :) Great news

> Add Authorization to Tika Server (Suggest Basic to start off with)
> --
>
> Key: TIKA-3725
> URL: https://issues.apache.org/jira/browse/TIKA-3725
> Project: Tika
>  Issue Type: New Feature
>  Components: tika-server
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
>
> It would be good to get some Authentication/Authorization added to TIKA server 
> to be able to add another layer of security around the Tika Server Rest 
> service.
> This could become a rabbit hole with the number of options available around 
> Authentication/Authorization (Oauth, OpenId etc) so suggest as a starter 
> basic Auth is added. 
> How to store user(s)/password suggest looking at how other apache products do 
> the same?  
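
For the Basic scheme suggested in the quoted issue, the server-side check comes down to decoding the `Authorization` header and comparing the credentials. The sketch below shows only that step. It is not how Tika Server or CXF implement it, the class and method names are hypothetical, and how the expected user and password are stored (the open question above) is deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

/** Hypothetical illustration of HTTP Basic credential checking; not Tika or CXF code. */
public class BasicAuthSketch {

    /** Returns {user, password} from a "Basic <base64>" header value, or null if malformed. */
    public static String[] parse(String headerValue) {
        if (headerValue == null || !headerValue.regionMatches(true, 0, "Basic ", 0, 6)) {
            return null;
        }
        byte[] decoded;
        try {
            decoded = Base64.getDecoder().decode(headerValue.substring(6).trim());
        } catch (IllegalArgumentException e) {
            return null; // not valid Base64
        }
        String pair = new String(decoded, StandardCharsets.UTF_8);
        int colon = pair.indexOf(':'); // the user name cannot contain ':', the password can
        if (colon < 0) {
            return null;
        }
        return new String[] {pair.substring(0, colon), pair.substring(colon + 1)};
    }

    /** Compares the header credentials with the expected ones using MessageDigest.isEqual. */
    public static boolean matches(String headerValue, String user, String password) {
        String[] creds = parse(headerValue);
        if (creds == null) {
            return false;
        }
        boolean userOk = MessageDigest.isEqual(utf8(creds[0]), utf8(user));
        boolean passOk = MessageDigest.isEqual(utf8(creds[1]), utf8(password));
        return userOk & passOk; // non-short-circuit so both comparisons always run
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Basic auth sends the password in an easily decoded form on every request, so it only adds real protection when the server is also running over HTTPS.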





[jira] [Comment Edited] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)

2022-04-25 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527706#comment-17527706
 ] 

Dan Coldrick edited comment on TIKA-3725 at 4/25/22 6:34 PM:
-

[~tallison]  I see you've got some responses from the CXF guys :) Great news

Quick question: is that thread only for Apache people, i.e. not open to the public?


was (Author: monkmachine):
[~tallison]  I see you've got some responses from the CXF guys :) Great news

> Add Authorization to Tika Server (Suggest Basic to start off with)
> --
>
> Key: TIKA-3725
> URL: https://issues.apache.org/jira/browse/TIKA-3725
> Project: Tika
>  Issue Type: New Feature
>  Components: tika-server
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
>
> It would be good to get some Authentication/Authorization added to TIKA server 
> to be able to add another layer of security around the Tika Server Rest 
> service.
> This could become a rabbit hole with the number of options available around 
> Authentication/Authorization (Oauth, OpenId etc) so suggest as a starter 
> basic Auth is added. 
> How to store user(s)/password suggest looking at how other apache products do 
> the same?  





[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-25 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527711#comment-17527711
 ] 

Dan Coldrick commented on TIKA-3719:


[~tallison]  Just stick something in Confluence; that's where I (as a user) get 
all my info about Tika server from.

> Tika Server Ability to Run HTTPs
> 
>
> Key: TIKA-3719
> URL: https://issues.apache.org/jira/browse/TIKA-3719
> Project: Tika
>  Issue Type: Wish
>  Components: tika-server
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Assignee: Tim Allison
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: image-2022-04-21-18-52-50-706.png, localhost.jks
>
>
> We need the ability to run TIKA server as a https end point, I can't see 
> anything in the config that allows for this. 
> Looks like I'm not the only one:
> [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https]
>  
> If anyone can point to some documentation on how it might be possible it 
> would be really appreciated.
>  
> Thanks





[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-25 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527713#comment-17527713
 ] 

Dan Coldrick commented on TIKA-3719:


I'd also say that if you want help with documenting stuff in Confluence, I'd be 
happy to help.

> Tika Server Ability to Run HTTPs
> 
>
> Key: TIKA-3719
> URL: https://issues.apache.org/jira/browse/TIKA-3719
> Project: Tika
>  Issue Type: Wish
>  Components: tika-server
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Assignee: Tim Allison
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: image-2022-04-21-18-52-50-706.png, localhost.jks
>
>
> We need the ability to run TIKA server as a https end point, I can't see 
> anything in the config that allows for this. 
> Looks like I'm not the only one:
> [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https]
>  
> If anyone can point to some documentation on how it might be possible it 
> would be really appreciated.
>  
> Thanks





[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-25 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527783#comment-17527783
 ] 

Dan Coldrick commented on TIKA-3719:


Hi [~tallison] 

Yes, happy with beta. It'd be really good if the CXF guys can review it (which 
it looks like they are going to) and extend it to take cxf.xml files with all that 
entails. Honestly can't thank you enough for the help you've provided. :)

My Confluence name is Dan Coldrick

> Tika Server Ability to Run HTTPs
> 
>
> Key: TIKA-3719
> URL: https://issues.apache.org/jira/browse/TIKA-3719
> Project: Tika
>  Issue Type: Wish
>  Components: tika-server
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Assignee: Tim Allison
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: image-2022-04-21-18-52-50-706.png, localhost.jks
>
>
> We need the ability to run TIKA server as a https end point, I can't see 
> anything in the config that allows for this. 
> Looks like I'm not the only one:
> [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https]
>  
> If anyone can point to some documentation on how it might be possible it 
> would be really appreciated.
>  
> Thanks





[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-25 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527784#comment-17527784
 ] 

Dan Coldrick commented on TIKA-3719:


Also, is it possible to link to Confluence from the main Tika page and make it 
stand out more? Confluence has a lot more detail than the main Tika page, and 
I've always found it more useful (it might also help that I'm a massive fan of 
Confluence).

> Tika Server Ability to Run HTTPs
> 
>
> Key: TIKA-3719
> URL: https://issues.apache.org/jira/browse/TIKA-3719
> Project: Tika
>  Issue Type: Wish
>  Components: tika-server
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Assignee: Tim Allison
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: image-2022-04-21-18-52-50-706.png, localhost.jks
>
>
> We need the ability to run TIKA server as a https end point, I can't see 
> anything in the config that allows for this. 
> Looks like I'm not the only one:
> [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https]
>  
> If anyone can point to some documentation on how it might be possible it 
> would be really appreciated.
>  
> Thanks





[jira] [Created] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files

2022-04-25 Thread Dan Coldrick (Jira)
Dan Coldrick created TIKA-3731:
--

 Summary: Tika CAD DWG reader not pulling meta data from new cad 
files
 Key: TIKA-3731
 URL: https://issues.apache.org/jira/browse/TIKA-3731
 Project: Tika
  Issue Type: Bug
  Components: metadata
Affects Versions: 2.3.0
Reporter: Dan Coldrick


 

The Tika DWG reader only pulls metadata from drawing formats up to AC1024 
(see code snippet), but it looks like AC1027 & AC1032 can also be read by 
the same get2007and2010Props metadata extractor.
{code:java}

 switch (version) {
            case "AC1015":
                metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
                if (skipTo2000PropertyInfoSection(stream, header)) {
                    get2000Props(stream, metadata, xhtml);
                }
                break;
            case "AC1018":
                metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
                if (skipToPropertyInfoSection(stream, header)) {
                    get2004Props(stream, metadata, xhtml);
                }
                break;
            case "AC1021":
            case "AC1024":
                metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
                if (skipToPropertyInfoSection(stream, header)) {
                    get2007and2010Props(stream, metadata, xhtml);
                }
                break;
            default:
                throw new TikaException("Unsupported AutoCAD drawing version: " + version);
        }
{code}
Looks like the case statement just needs extending, and example files need to be 
created for AC1027/AC1032. 

Current versions of AutoCAD can be found here:

https://knowledge.autodesk.com/support/autocad/learn-explore/caas/sfdcarticles/sfdcarticles/drawing-version-codes-for-autocad.html
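
A sketch of what the extended case statement could look like, assuming (as suggested above, but not yet confirmed against real files) that AC1027 and AC1032 use the same property layout as AC1021/AC1024. The class, enum and method names are hypothetical stand-ins so the dispatch can run on its own; this is not the actual Tika parser code.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the version dispatch; not the actual Tika DWG parser. */
public class DwgVersionSketch {

    /** Which property extractor a drawing version would be routed to. */
    public enum PropsReader { PROPS_2000, PROPS_2004, PROPS_2007_AND_2010 }

    /** DWG files begin with a six-character ASCII version code such as "AC1027". */
    public static String versionOf(byte[] header) {
        if (header == null || header.length < 6) {
            throw new IllegalArgumentException("Header too short for a DWG version code");
        }
        return new String(header, 0, 6, StandardCharsets.US_ASCII);
    }

    public static PropsReader readerFor(String version) {
        switch (version) {
            case "AC1015":
                return PropsReader.PROPS_2000;
            case "AC1018":
                return PropsReader.PROPS_2004;
            case "AC1021":
            case "AC1024":
            case "AC1027": // assumption from this issue: same layout as 2007/2010
            case "AC1032": // assumption from this issue: same layout as 2007/2010
                return PropsReader.PROPS_2007_AND_2010;
            default:
                throw new IllegalArgumentException(
                        "Unsupported AutoCAD drawing version: " + version);
        }
    }
}
```

Grouping the two new codes with the existing AC1021/AC1024 labels keeps the change to two extra lines in the real switch, with the new example files covering each added label.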

 





[jira] [Commented] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files

2022-04-25 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527786#comment-17527786
 ] 

Dan Coldrick commented on TIKA-3731:


Related to https://issues.apache.org/jira/browse/TIKA-1735, but that one also 
looked to include a parser, so I thought it would be good to split the two 
issues and get the bug fixed. 

> Tika CAD DWG reader not pulling meta data from new cad files
> 
>
> Key: TIKA-3731
> URL: https://issues.apache.org/jira/browse/TIKA-3731
> Project: Tika
>  Issue Type: Bug
>  Components: metadata
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Major
>
>  
> The tika DWG reader is only pulling meta data from up to drawing format 
> AC1024  (see code snippet) where it looks to be AC1027 & AC1032 can also be 
> read from the same get2007and2010Props meta data extractor.
> {code:java}
>  switch (version) {
>             case "AC1015":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipTo2000PropertyInfoSection(stream, header)) {
>                     get2000Props(stream, metadata, xhtml);
>                 }
>                 break;
>             case "AC1018":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipToPropertyInfoSection(stream, header)) {
>                     get2004Props(stream, metadata, xhtml);
>                 }
>                 break;
>             case "AC1021":
>             case "AC1024":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipToPropertyInfoSection(stream, header)) {
>                     get2007and2010Props(stream, metadata, xhtml);
>                 }
>                 break;
>             default:
>                 throw new TikaException("Unsupported AutoCAD drawing version: " + version);
>         } {code}
> Looks like the case statement just needs extending and for examples files to 
> be created for AC1027/AC1032. 
> Current versions of auto cad can be found here:
> https://knowledge.autodesk.com/support/autocad/learn-explore/caas/sfdcarticles/sfdcarticles/drawing-version-codes-for-autocad.html
>  





[jira] [Updated] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files

2022-04-25 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3731:
---
Attachment: testDWG-AC1027.dwg

> Tika CAD DWG reader not pulling meta data from new cad files
> 
>
> Key: TIKA-3731
> URL: https://issues.apache.org/jira/browse/TIKA-3731
> Project: Tika
>  Issue Type: Bug
>  Components: metadata
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Major
> Attachments: AutoCAD 2018 format (1).dwg, testDWG-AC1027.dwg
>
>
>  
> The tika DWG reader is only pulling meta data from up to drawing format 
> AC1024  (see code snippet) where it looks to be AC1027 & AC1032 can also be 
> read from the same get2007and2010Props meta data extractor.
> {code:java}
>  switch (version) {
>             case "AC1015":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipTo2000PropertyInfoSection(stream, header)) {
>                     get2000Props(stream, metadata, xhtml);
>                 }
>                 break;
>             case "AC1018":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipToPropertyInfoSection(stream, header)) {
>                     get2004Props(stream, metadata, xhtml);
>                 }
>                 break;
>             case "AC1021":
>             case "AC1024":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipToPropertyInfoSection(stream, header)) {
>                     get2007and2010Props(stream, metadata, xhtml);
>                 }
>                 break;
>             default:
>                 throw new TikaException("Unsupported AutoCAD drawing version: " + version);
>         } {code}
> Looks like the case statement just needs extending and for examples files to 
> be created for AC1027/AC1032. 
> Current versions of auto cad can be found here:
> https://knowledge.autodesk.com/support/autocad/learn-explore/caas/sfdcarticles/sfdcarticles/drawing-version-codes-for-autocad.html
>  





[jira] [Updated] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files

2022-04-25 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3731:
---
Attachment: AutoCAD 2018 format (1).dwg

> Tika CAD DWG reader not pulling meta data from new cad files
> 
>
> Key: TIKA-3731
> URL: https://issues.apache.org/jira/browse/TIKA-3731
> Project: Tika
>  Issue Type: Bug
>  Components: metadata
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Major
> Attachments: AutoCAD 2018 format (1).dwg, testDWG-AC1027.dwg
>
>
>  
> The tika DWG reader is only pulling meta data from up to drawing format 
> AC1024  (see code snippet) where it looks to be AC1027 & AC1032 can also be 
> read from the same get2007and2010Props meta data extractor.
> {code:java}
>  switch (version) {
>             case "AC1015":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipTo2000PropertyInfoSection(stream, header)) {
>                     get2000Props(stream, metadata, xhtml);
>                 }
>                 break;
>             case "AC1018":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipToPropertyInfoSection(stream, header)) {
>                     get2004Props(stream, metadata, xhtml);
>                 }
>                 break;
>             case "AC1021":
>             case "AC1024":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipToPropertyInfoSection(stream, header)) {
>                     get2007and2010Props(stream, metadata, xhtml);
>                 }
>                 break;
>             default:
>                 throw new TikaException("Unsupported AutoCAD drawing version: " + version);
>         } {code}
> Looks like the case statement just needs extending and for examples files to 
> be created for AC1027/AC1032. 
> Current versions of auto cad can be found here:
> https://knowledge.autodesk.com/support/autocad/learn-explore/caas/sfdcarticles/sfdcarticles/drawing-version-codes-for-autocad.html
>  





[jira] [Commented] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files

2022-04-25 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527787#comment-17527787
 ] 

Dan Coldrick commented on TIKA-3731:


I've attached an AC1027 and an AC1032 DWG to extend the tests.

> Tika CAD DWG reader not pulling meta data from new cad files
> 
>
> Key: TIKA-3731
> URL: https://issues.apache.org/jira/browse/TIKA-3731
> Project: Tika
>  Issue Type: Bug
>  Components: metadata
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Major
> Attachments: AutoCAD 2018 format (1).dwg, testDWG-AC1027.dwg
>
>
>  
> The tika DWG reader is only pulling meta data from up to drawing format 
> AC1024  (see code snippet) where it looks to be AC1027 & AC1032 can also be 
> read from the same get2007and2010Props meta data extractor.
> {code:java}
>  switch (version) {
>             case "AC1015":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipTo2000PropertyInfoSection(stream, header)) {
>                     get2000Props(stream, metadata, xhtml);
>                 }
>                 break;
>             case "AC1018":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipToPropertyInfoSection(stream, header)) {
>                     get2004Props(stream, metadata, xhtml);
>                 }
>                 break;
>             case "AC1021":
>             case "AC1024":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipToPropertyInfoSection(stream, header)) {
>                     get2007and2010Props(stream, metadata, xhtml);
>                 }
>                 break;
>             default:
>                 throw new TikaException("Unsupported AutoCAD drawing version: " + version);
>         } {code}
> Looks like the case statement just needs extending and for examples files to 
> be created for AC1027/AC1032. 
> Current versions of auto cad can be found here:
> https://knowledge.autodesk.com/support/autocad/learn-explore/caas/sfdcarticles/sfdcarticles/drawing-version-codes-for-autocad.html
>  





[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-26 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528201#comment-17528201
 ] 

Dan Coldrick commented on TIKA-3719:


[~tallison]  https://issues.apache.org/jira/browse/TIKA-3737

 

> Tika Server Ability to Run HTTPs
> 
>
> Key: TIKA-3719
> URL: https://issues.apache.org/jira/browse/TIKA-3719
> Project: Tika
>  Issue Type: Wish
>  Components: tika-server
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Assignee: Tim Allison
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: image-2022-04-21-18-52-50-706.png, localhost.jks
>
>
> We need the ability to run TIKA server as a https end point, I can't see 
> anything in the config that allows for this. 
> Looks like I'm not the only one:
> [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https]
>  
> If anyone can point to some documentation on how it might be possible it 
> would be really appreciated.
>  
> Thanks





[jira] [Created] (TIKA-3737) Update Main Apache Tika Website with link to Tika confluence

2022-04-26 Thread Dan Coldrick (Jira)
Dan Coldrick created TIKA-3737:
--

 Summary: Update Main Apache Tika Website with link to Tika 
confluence
 Key: TIKA-3737
 URL: https://issues.apache.org/jira/browse/TIKA-3737
 Project: Tika
  Issue Type: Task
  Components: site
Affects Versions: 2.3.0
Reporter: Dan Coldrick


Update Main Apache Tika Website with link to Tika confluence, there is more 
detail on the confluence pages than on the main site. 





[jira] [Commented] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files

2022-04-26 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528205#comment-17528205
 ] 

Dan Coldrick commented on TIKA-3731:


[~tallison]  Fine by me re the custom metadata keys.

> Tika CAD DWG reader not pulling meta data from new cad files
> 
>
> Key: TIKA-3731
> URL: https://issues.apache.org/jira/browse/TIKA-3731
> Project: Tika
>  Issue Type: Improvement
>  Components: metadata
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: AutoCAD 2018 format (1).dwg, testDWG-AC1027.dwg
>
>
>  
> The tika DWG reader is only pulling meta data from up to drawing format 
> AC1024  (see code snippet) where it looks to be AC1027 & AC1032 can also be 
> read from the same get2007and2010Props meta data extractor.
> {code:java}
>  switch (version) {
>             case "AC1015":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipTo2000PropertyInfoSection(stream, header)) {
>                     get2000Props(stream, metadata, xhtml);
>                 }
>                 break;
>             case "AC1018":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipToPropertyInfoSection(stream, header)) {
>                     get2004Props(stream, metadata, xhtml);
>                 }
>                 break;
>             case "AC1021":
>             case "AC1024":
>                 metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
>                 if (skipToPropertyInfoSection(stream, header)) {
>                     get2007and2010Props(stream, metadata, xhtml);
>                 }
>                 break;
>             default:
>                 throw new TikaException("Unsupported AutoCAD drawing version: " + version);
>         }
> {code}
> It looks like the case statement just needs extending, and example files need to 
> be created for AC1027/AC1032. 
> The current AutoCAD drawing version codes can be found here:
> https://knowledge.autodesk.com/support/autocad/learn-explore/caas/sfdcarticles/sfdcarticles/drawing-version-codes-for-autocad.html
>  
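
The proposed change to the switch can be sketched in isolation. The class below is a hypothetical stand-alone illustration, not the actual Tika DWG parser: `DwgVersionDispatch`, `PropsReader` and `readerFor` are names made up for this example, and the only thing it demonstrates is AC1027 and AC1032 falling through to the same branch as AC1021/AC1024, as the description suggests. Whether the 2007/2010 property layout really applies to those newer formats would still need checking against sample files.

```java
import java.util.Optional;

/** Hypothetical illustration of the extended version switch; not Tika code. */
final class DwgVersionDispatch {

    /** Stand-ins for get2000Props / get2004Props / get2007and2010Props. */
    enum PropsReader { PROPS_2000, PROPS_2004, PROPS_2007_AND_2010 }

    /** Maps a DWG version code to the property reader that would handle it. */
    static Optional<PropsReader> readerFor(String version) {
        switch (version) {
            case "AC1015":
                return Optional.of(PropsReader.PROPS_2000);
            case "AC1018":
                return Optional.of(PropsReader.PROPS_2004);
            case "AC1021":
            case "AC1024":
            case "AC1027": // proposed addition
            case "AC1032": // proposed addition
                return Optional.of(PropsReader.PROPS_2007_AND_2010);
            default:
                // The real parser throws a TikaException here.
                return Optional.empty();
        }
    }
}
```

In the real parser the two extra `case` labels would sit directly above the existing `get2007and2010Props` branch, so no new extraction code is involved.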





[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-26 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528203#comment-17528203
 ] 

Dan Coldrick commented on TIKA-3719:


Thanks for the Confluence updates as well :)

> Tika Server Ability to Run HTTPs
> 
>
> Key: TIKA-3719
> URL: https://issues.apache.org/jira/browse/TIKA-3719
> Project: Tika
>  Issue Type: Wish
>  Components: tika-server
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Assignee: Tim Allison
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: image-2022-04-21-18-52-50-706.png, localhost.jks
>
>
> We need the ability to run TIKA server as a https end point, I can't see 
> anything in the config that allows for this. 
> Looks like I'm not the only one:
> [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https]
>  
> If anyone can point to some documentation on how it might be possible it 
> would be really appreciated.
>  
> Thanks





[jira] [Commented] (TIKA-3721) DGN parser

2022-04-26 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528242#comment-17528242
 ] 

Dan Coldrick commented on TIKA-3721:


[~tallison]  Can you check that the XHTML comes out of my code correctly? I wasn't 
sure how that works. We probably also want to add a test to check it works. Like 
I said in my comment, I was really just asking for feedback lol :)

Can I have a quick check of the process for updating Tika's Git repo? Is it roughly:
 * Create a Jira 
 * Create a branch named after the Jira number
 * Create a pull request from the new branch
 * Someone who knows what they are doing either approves the code, rejects the 
code, or fixes the code so it can be merged into main

> DGN parser
> --
>
> Key: TIKA-3721
> URL: https://issues.apache.org/jira/browse/TIKA-3721
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: Screenshot from 2022-04-22 16-03-44.png, 
> dgn8s-dumped.txt, image-2022-04-22-20-00-45-704.png, 
> image-2022-04-22-20-01-09-564.png, image-2022-04-22-20-02-24-180.png
>
>
> Does anyone have any experience with the DGN file format by MicroStation? I 
> see TIKA doesn't have a parser so would it be possible to create one? 
> https://docs.fileformat.com/cad/dgn/





[jira] [Commented] (TIKA-3721) DGN parser

2022-04-26 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528262#comment-17528262
 ] 

Dan Coldrick commented on TIKA-3721:


Thanks Tim






[jira] [Created] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-27 Thread Dan Coldrick (Jira)
Dan Coldrick created TIKA-3742:
--

 Summary: Advice around DGN7 parser and whether to add to TIKA
 Key: TIKA-3742
 URL: https://issues.apache.org/jira/browse/TIKA-3742
 Project: Tika
  Issue Type: Task
  Components: parser
Reporter: Dan Coldrick


Hi [~tallison] & whoever else. 

I managed to compile the C/C++ library ([http://dgnlib.maptools.org/)] for DGN7, 
which produces a dgndump.exe that dumps all the data from a DGN file. From 
my initial testing it looks pretty good. 

Do you think it would be worth adding this, or should it stay a custom 
parser outside the main source code? It's under the MIT license. I've 
attached the exe (zipped), a copy of the output from the dump, and my very rough 
test code that calls the exe (I was only interested in the strings, so for now 
the code only pulls those into a string array to check it's extracting 
the correct data).





[jira] [Updated] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-27 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3742:
---
Attachment: DGN.zip

> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: DGN.zip
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library ([http://dgnlib.maptools.org/)] for 
> DGN7 which produces an dgndump.exe which will dump all the data from the DGN. 
> From my initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).





[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-27 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529023#comment-17529023
 ] 

Dan Coldrick commented on TIKA-3742:


 
{code:java}
package org.apache.tika.parser.dgn;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Set;

import org.apache.commons.compress.utils.IOUtils;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.ParseContext;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public class DGN7Parser extends AbstractParser {

    private static final long serialVersionUID = 7609445358323296566L;

    Set SUPPORTED_TYPES =
            Collections.singleton(MediaType.image("vnd.dgn; version=7"));

    @Override
    public Set getSupportedTypes(ParseContext context) {
        return SUPPORTED_TYPES;
    }

    @Override
    public void parse(InputStream stream, ContentHandler handler, Metadata metadata,
                      ParseContext context)
            throws IOException, TikaException, SAXException {
        File file = new File("G:/temp/Drawing.dgn");
        try (OutputStream outputStream = new FileOutputStream(file)) {
            IOUtils.copy(stream, outputStream);
        }
        Runtime rt = Runtime.getRuntime();
        String[] commands = {"C:\\Users\\monkm\\DGN\\dgndump.exe", "-r", "1",
                "G:\\temp\\Drawing.dgn"};
        Process proc = rt.exec(commands);

        BufferedReader stdInput = new BufferedReader(
                new InputStreamReader(proc.getInputStream()));
        BufferedReader stdError = new BufferedReader(
                new InputStreamReader(proc.getErrorStream()));

        ArrayList ar = new ArrayList();

        String s = null;
        while ((s = stdInput.readLine()) != null) {
            if (s.startsWith("  string = \"")) {
                ar.add(s.substring(12, s.length() - 1).trim());
            }
            System.out.println(s);
        }
        System.out.println(ar);
        while ((s = stdError.readLine()) != null) {
            System.out.println(s);
        }
    }
}
  {code}
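
The test code above writes the incoming stream to a fixed path (`G:/temp/Drawing.dgn`) before handing it to the exe. A minimal sketch of the same step using a temporary file from the JDK is shown here; `TempFileSpool`, `spoolToTempFile` and the `tika-dgn-` prefix are hypothetical names chosen for this example. Tika also ships its own temp-file helpers (`TikaInputStream` together with `TemporaryResources`), which would probably be the better fit inside a real parser; this sketch sticks to the standard library so it runs on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: spools a stream to a temp file for an external tool. */
final class TempFileSpool {

    /**
     * Copies the whole stream into a newly created temporary file and
     * returns its path. The caller is responsible for deleting the file.
     */
    static Path spoolToTempFile(InputStream stream, String suffix) throws IOException {
        Path tmp = Files.createTempFile("tika-dgn-", suffix);
        // createTempFile already created an empty file, so overwrite it.
        Files.copy(stream, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }
}
```

The returned path can then be passed to the external command via `tmp.toString()` and removed with `Files.deleteIfExists(tmp)` in a `finally` block once the process has finished.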
 

 






[jira] [Updated] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-27 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3742:
---
Attachment: ExampleOutput.txt

> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library ([http://dgnlib.maptools.org/)] for 
> DGN7 which produces an dgndump.exe which will dump all the data from the DGN. 
> From my initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).





[jira] [Updated] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-27 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3742:
---
Description: 
Hi [~tallison] & Whoever else. 

I managed to compile the C/C++ library 
([http://dgnlib.maptools.org/|http://dgnlib.maptools.org/)] for DGN7 which 
produces an dgndump.exe which will dump all the data from the DGN. From my 
initial testing it looks pretty good. 

Would you guys think it was worth adding this or just keep it as a custom 
parser rather than in the main source code? It's under MIT license. I've 
attached the exe (zipped), a copy of the output from the dump and my very dirty 
testing calling the exe (my code I was only interested in the Strings so am 
only pulling those into a string array at the moment to check it's pulling out 
the correct data).

  was:
Hi [~tallison] & Whoever else. 

I managed to compile the C/C++ library ([http://dgnlib.maptools.org/)] for DGN7 
which produces an dgndump.exe which will dump all the data from the DGN. From 
my initial testing it looks pretty good. 

Would you guys think it was worth adding this or just keep it as a custom 
parser rather than in the main source code? It's under MIT license. I've 
attached the exe (zipped), a copy of the output from the dump and my very dirty 
testing calling the exe (my code I was only interested in the Strings so am 
only pulling those into a string array at the moment to check it's pulling out 
the correct data).


> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library 
> ([http://dgnlib.maptools.org/|http://dgnlib.maptools.org/)] for DGN7 which 
> produces an dgndump.exe which will dump all the data from the DGN. From my 
> initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).





[jira] [Updated] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-27 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3742:
---
Description: 
Hi [~tallison] & Whoever else. 

I managed to compile the C/C++ library 
[http://dgnlib.maptools.org|http://dgnlib.maptools.org/)] for DGN7 which 
produces an dgndump.exe which will dump all the data from the DGN. From my 
initial testing it looks pretty good. 

Would you guys think it was worth adding this or just keep it as a custom 
parser rather than in the main source code? It's under MIT license. I've 
attached the exe (zipped), a copy of the output from the dump and my very dirty 
testing calling the exe (my code I was only interested in the Strings so am 
only pulling those into a string array at the moment to check it's pulling out 
the correct data).

  was:
Hi [~tallison] & Whoever else. 

I managed to compile the C/C++ library 
([http://dgnlib.maptools.org|http://dgnlib.maptools.org/)] for DGN7 which 
produces an dgndump.exe which will dump all the data from the DGN. From my 
initial testing it looks pretty good. 

Would you guys think it was worth adding this or just keep it as a custom 
parser rather than in the main source code? It's under MIT license. I've 
attached the exe (zipped), a copy of the output from the dump and my very dirty 
testing calling the exe (my code I was only interested in the Strings so am 
only pulling those into a string array at the moment to check it's pulling out 
the correct data).


> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library 
> [http://dgnlib.maptools.org|http://dgnlib.maptools.org/)] for DGN7 which 
> produces an dgndump.exe which will dump all the data from the DGN. From my 
> initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).





[jira] [Updated] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-27 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3742:
---
Description: 
Hi [~tallison] & Whoever else. 

I managed to compile the C/C++ library 
([http://dgnlib.maptools.org|http://dgnlib.maptools.org/)] for DGN7 which 
produces an dgndump.exe which will dump all the data from the DGN. From my 
initial testing it looks pretty good. 

Would you guys think it was worth adding this or just keep it as a custom 
parser rather than in the main source code? It's under MIT license. I've 
attached the exe (zipped), a copy of the output from the dump and my very dirty 
testing calling the exe (my code I was only interested in the Strings so am 
only pulling those into a string array at the moment to check it's pulling out 
the correct data).

  was:
Hi [~tallison] & Whoever else. 

I managed to compile the C/C++ library 
([http://dgnlib.maptools.org/|http://dgnlib.maptools.org/)] for DGN7 which 
produces an dgndump.exe which will dump all the data from the DGN. From my 
initial testing it looks pretty good. 

Would you guys think it was worth adding this or just keep it as a custom 
parser rather than in the main source code? It's under MIT license. I've 
attached the exe (zipped), a copy of the output from the dump and my very dirty 
testing calling the exe (my code I was only interested in the Strings so am 
only pulling those into a string array at the moment to check it's pulling out 
the correct data).


> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library 
> ([http://dgnlib.maptools.org|http://dgnlib.maptools.org/)] for DGN7 which 
> produces an dgndump.exe which will dump all the data from the DGN. From my 
> initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).





[jira] [Updated] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-27 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3742:
---
Description: 
Hi [~tallison] & Whoever else. 

I managed to compile the C/C++ library [http://dgnlib.maptools.org/]  for DGN7 
which produces an dgndump.exe which will dump all the data from the DGN. From 
my initial testing it looks pretty good. 

Would you guys think it was worth adding this or just keep it as a custom 
parser rather than in the main source code? It's under MIT license. I've 
attached the exe (zipped), a copy of the output from the dump and my very dirty 
testing calling the exe (my code I was only interested in the Strings so am 
only pulling those into a string array at the moment to check it's pulling out 
the correct data).

  was:
Hi [~tallison] & Whoever else. 

I managed to compile the C/C++ library 
[http://dgnlib.maptools.org|http://dgnlib.maptools.org/)] for DGN7 which 
produces an dgndump.exe which will dump all the data from the DGN. From my 
initial testing it looks pretty good. 

Would you guys think it was worth adding this or just keep it as a custom 
parser rather than in the main source code? It's under MIT license. I've 
attached the exe (zipped), a copy of the output from the dump and my very dirty 
testing calling the exe (my code I was only interested in the Strings so am 
only pulling those into a string array at the moment to check it's pulling out 
the correct data).


> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library [http://dgnlib.maptools.org/]  for 
> DGN7 which produces an dgndump.exe which will dump all the data from the DGN. 
> From my initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-27 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529023#comment-17529023
 ] 

Dan Coldrick edited comment on TIKA-3742 at 4/27/22 8:09 PM:
-

 
{code:java}
package org.apache.tika.parser.dgn;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Set;

import org.apache.commons.compress.utils.IOUtils;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.ParseContext;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public class DGN7Parser extends AbstractParser {

    private static final long serialVersionUID = 7609445358323296566L;

    private static final Set<MediaType> SUPPORTED_TYPES =
            Collections.singleton(MediaType.image("vnd.dgn; version=7"));

    @Override
    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return SUPPORTED_TYPES;
    }

    @Override
    public void parse(InputStream stream, ContentHandler handler, Metadata metadata,
                      ParseContext context)
            throws IOException, TikaException, SAXException {
        // Test code only: all paths are hard-coded.
        // Copy the incoming stream to disk so dgndump.exe can read it.
        File file = new File("G:/temp/Drawing.dgn");
        try (OutputStream outputStream = new FileOutputStream(file)) {
            IOUtils.copy(stream, outputStream);
        }
        Runtime rt = Runtime.getRuntime();
        String[] commands = {"C:\\Users\\monkm\\DGN\\dgndump.exe", "-r", "1",
                "G:\\temp\\Drawing.dgn"};
        Process proc = rt.exec(commands);

        // Collect only the text strings from the dump output.
        ArrayList<String> ar = new ArrayList<>();
        try (BufferedReader stdInput = new BufferedReader(
                     new InputStreamReader(proc.getInputStream()));
             BufferedReader stdError = new BufferedReader(
                     new InputStreamReader(proc.getErrorStream()))) {
            String s;
            while ((s = stdInput.readLine()) != null) {
                if (s.startsWith("  string = \"")) {
                    // Drop the 12-character prefix and the closing quote.
                    ar.add(s.substring(12, s.length() - 1).trim());
                }
                System.out.println(s);
            }
            System.out.println(ar);
            // Echo anything the tool wrote to stderr.
            while ((s = stdError.readLine()) != null) {
                System.out.println(s);
            }
        }
    }
}
  {code}
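
The string-collecting loop in the code above can be pulled out into a small pure function, which makes it easy to test without running the exe. This is only a sketch: `DgnDumpStrings` and `extract` are hypothetical names, and the `  string = "` line prefix is taken from the test code above rather than from any dgndump documentation, so treat the exact output format as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: pulls text strings out of dgndump-style output lines. */
final class DgnDumpStrings {

    // Prefix assumed from the test code above: two spaces, then string = "
    private static final String PREFIX = "  string = \"";

    /** Returns the trimmed contents of every line of the form: string = "..." */
    static List<String> extract(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            // Require the closing quote so a bare prefix cannot cause an
            // out-of-range substring.
            if (line.startsWith(PREFIX)
                    && line.length() > PREFIX.length()
                    && line.endsWith("\"")) {
                out.add(line.substring(PREFIX.length(), line.length() - 1).trim());
            }
        }
        return out;
    }
}
```

In the parser, each extracted string could then be written to the XHTML handler instead of being printed to the console.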
 

 


was (Author: monkmachine):
 
{code:java}
package org.apache.tika.parser.dgn;import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Set;import org.apache.commons.compress.utils.IOUtils;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.ParseContext;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;public 

class DGN7Parser extends AbstractParser {    

private static final long serialVersionUID = 7609445358323296566L;    

Set SUPPORTED_TYPES = 
Collections.singleton(MediaType.image("vnd.dgn; version=7"));    

@Override
    public Set getSupportedTypes(ParseContext context) {
        return SUPPORTED_TYPES;
    }    @Override
    public void parse(InputStream stream, ContentHandler handler, Metadata 
metadata, ParseContext context)
            throws IOException, TikaException, SAXException {
        File file = new File("G:/temp/Drawing.dgn");
        try (OutputStream outputStream = new FileOutputStream(file)) {
            IOUtils.copy(stream, outputStream);
        }
        Runtime rt = Runtime.getRuntime();
        String[] commands = {"C:\\Users\\monkm\\DGN\\dgndump.exe","-r","1", 
"G:\\temp\\Drawing.dgn"};
        Process proc = rt.exec(commands);        

BufferedReader stdInput = new BufferedReader(new 
             InputStreamReader(proc.getInputStream()));        
BufferedReader stdError = new BufferedReader(new 
             InputStreamReader(proc.getErrorStream()));
        
        ArrayList ar = new ArrayList();

        String s = null;
        while ((s = stdInput.readLine()) != null) {
            if(s.startsWith("  string = \"")) {
                ar.add(s.substring(12, s.length()-1).trim());
            }
            System.out.println(s);
        }
            System.out.println(ar);
        while ((s = stdError.readLine()) != null) {
            System.out.println(s);
        }
    }}
  {code}
 

 

> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues

[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-27 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529033#comment-17529033
 ] 

Dan Coldrick commented on TIKA-3742:


[~nick] 

Apologies, I'm new to all this. Can you point me at some documentation? I assume 
external parsers don't live in the main Tika Git repo and that there is a separate 
repo just for that parser which users can add in? Or does it work differently?






[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-27 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529037#comment-17529037
 ] 

Dan Coldrick commented on TIKA-3742:


[~tallison]  I struggle to get out of bed in the morning, let alone read C/C++ 
and convert it to Java. I can make out what it's doing, but I have no idea how it 
does the byte reading, which is really how the underlying bits work. I can 
see that the file contains the element types, but again I have no idea how they 
are mapped to the bytes; I've never had any dealings with C/C++.

I was happy that after an hour pissing about on Google I managed to get it to 
compile (on Windows) :D






[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-27 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529042#comment-17529042
 ] 

Dan Coldrick commented on TIKA-3742:


[~tallison]  got a link to that or an example?




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-27 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529044#comment-17529044
 ] 

Dan Coldrick commented on TIKA-3742:


lol, you posted before I responded

> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library [http://dgnlib.maptools.org/]  for 
> DGN7 which produces an dgndump.exe which will dump all the data from the DGN. 
> From my initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-28 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529409#comment-17529409
 ] 

Dan Coldrick commented on TIKA-3742:


[~nick] I can have a go, although I can't get the following line to compile in 
Eclipse:

byte[] str = is.readNBytes(len);
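
A possible reason, offered as a guess since the compiler error is not quoted 
here: the single-argument InputStream.readNBytes(int) only exists from Java 11 
onwards, so the line will not compile if the Eclipse project uses an older 
compliance level such as Java 8. Below is a minimal Java 8 compatible sketch. 
The names ReadExactlySketch and readExactly are hypothetical, not Tika code. 
Unlike readNBytes, this helper throws at a premature end of stream instead of 
returning a shorter array.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class for illustration only.
class ReadExactlySketch {

    // Reads exactly len bytes from the stream, looping because a single
    // read() call is allowed to return fewer bytes than requested.
    static byte[] readExactly(InputStream is, int len) throws IOException {
        byte[] buf = new byte[len];
        int off = 0;
        while (off < len) {
            int n = is.read(buf, off, len - off);
            if (n < 0) {
                throw new EOFException("wanted " + len + " bytes, got " + off);
            }
            off += n;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        InputStream is = new ByteArrayInputStream(new byte[] {72, 105, 33, 0});
        byte[] str = readExactly(is, 3);
        System.out.println(str.length);
    }
}
```

In the parser the call would then read byte[] str = readExactly(is, len);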

 

> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library [http://dgnlib.maptools.org/]  for 
> DGN7 which produces an dgndump.exe which will dump all the data from the DGN. 
> From my initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-28 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529667#comment-17529667
 ] 

Dan Coldrick commented on TIKA-3742:


[~nick] I've made a start today, which I can share at some point tomorrow (been 
to the pub tonight lol, so it will have to wait till tomorrow). Are you OK if I 
lean on you two for help? I'd rather write something myself that you can rip 
apart so I can learn something. I've learnt a lot in the last week or so 
already :)

 

I also think there is some metadata in there somewhere which we should be able 
to pull out :)

> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library [http://dgnlib.maptools.org/]  for 
> DGN7 which produces an dgndump.exe which will dump all the data from the DGN. 
> From my initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-28 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529667#comment-17529667
 ] 

Dan Coldrick edited comment on TIKA-3742 at 4/29/22 6:08 AM:
-

[~nick] I've made a start today, which I can share at some point tomorrow. Are 
you OK if I lean on you two for help? I'd rather write something myself that you 
can rip apart so I can learn something. I've learnt a lot in the last week or 
so already :)

 

I also think there is some metadata in there somewhere which we should be able 
to pull out :)


was (Author: monkmachine):
[~nick]  I've made a start today which I can share at some point tomorrow (been 
to the pub tonight lol so will have to wait till tomorrow ), are you ok if I 
lean on you 2 for help? I'd rather write something myself which you can rip 
apart so I can learn something. I've learnt a lot in the last week or so 
already :)

 

I also think there is some meta data in there somewhere which we should be able 
to pull out :)

> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library [http://dgnlib.maptools.org/]  for 
> DGN7 which produces an dgndump.exe which will dump all the data from the DGN. 
> From my initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-30 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530448#comment-17530448
 ] 

Dan Coldrick commented on TIKA-3742:


Hi [~nick] 

I'm struggling. I can see there are deleted elements which I want to exclude 
from the parser output, but I can't work out how to detect them in Java.

I can see them come out in the DGN dump with a DELETED attribute:

 
{code:java}
Element:Text         Level:27 id:19707  (DELETED) 
  offset=1959730  size=74 bytes
  graphic_group:0   color:0 weight:0 style:0
  properties=1536,MODIFIED,NEW
  origin=(963453.83000,96730.11000), rotation=272.763292
  font=1, just=2, length_mult=119.99, height_mult=119.99
  string = "HARVARD     RD" {code}
 From the core element structure, the deleted flag should be in the header:

 

 
{code:java}
The first 18 words of an element in the design file are its fixed header -- 
 containing the element type, level, words to follow, and range 
 information. The C declaration for this header is as follows
 
   typedef struct
      {
      unsigned          level:6     ;   /* level element is on */
      unsigned          :1          ;   /* reserved */
      unsigned          complex:1   ;   /* component of complex elem. */
      unsigned          type:7      ;   /* type of element */
      unsigned          deleted:1   ;   /* set if element is deleted */
      unsigned short    words       ;   /* words to follow in element */
      unsigned long     xlow        ;   /* element range - low */
      unsigned long     ylow        ;
      unsigned long     zlow        ;
      unsigned long     xhigh       ;   /* element range - high */
      unsigned long     yhigh       ;
      unsigned long     zhigh       ;
      } Elm_hdr         {code}
 

 

You get the type out like this (which I think comes from the same header structure):
{code:java}
int h2 = tstream.read() ;
int type = h2 & 0x7f; {code}
How do I get the deleted attribute out so I can remove deleted elements from 
the parsed content? Also, you mentioned type 37; I don't have any examples 
containing type 37 elements.
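
One way to read the flag, sketched under the assumption that the bit-fields in 
Elm_hdr are packed low bit first into a little-endian 16-bit word, so that the 
first byte carries level and complex and the second byte (h2 above) carries 
type and deleted. The class and method names here are hypothetical, not part of 
the fork.

```java
// Hypothetical helper: decodes the first two bytes of a DGN7 element header,
// assuming low-bit-first packing of the Elm_hdr bit-fields.
class DgnHeaderBits {

    // level:6 occupies bits 0-5 of the first byte.
    static int level(int h1) {
        return h1 & 0x3f;
    }

    // complex:1 is bit 7 of the first byte (bit 6 is reserved).
    static boolean isComplex(int h1) {
        return (h1 & 0x80) != 0;
    }

    // type:7 occupies bits 0-6 of the second byte.
    static int type(int h2) {
        return h2 & 0x7f;
    }

    // deleted:1 is bit 7 of the second byte.
    static boolean isDeleted(int h2) {
        return (h2 & 0x80) != 0;
    }

    public static void main(String[] args) {
        int h1 = 0x1b; // level 27, not complex
        int h2 = 0x91; // deleted bit set on top of type 17
        System.out.println(level(h1) + " " + isComplex(h1) + " "
                + type(h2) + " " + isDeleted(h2));
    }
}
```

With this layout the existing h2 & 0x7f line and the deleted check read the 
same byte, so no extra read from the stream is needed.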

 

I've created a fork with some rough code to test in:

[https://github.com/monkmachine/tika/tree/TIKA-3742/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-cad-module/src/main/java/org/apache/tika/parser/dgn]

 

 

> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library [http://dgnlib.maptools.org/]  for 
> DGN7 which produces an dgndump.exe which will dump all the data from the DGN. 
> From my initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-30 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530457#comment-17530457
 ] 

Dan Coldrick commented on TIKA-3742:


Is this correct for working out the deletion? If it is, I might actually 
understand how it's working a bit more!
{code:java}
boolean isdeleted = BigInteger.valueOf(h2).testBit(7); {code}
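
For what it's worth, testBit(7) on a value in the range 0..255 gives the same 
answer as a plain bit mask with 0x80, so the BigInteger is not strictly needed. 
A small sketch to check that follows; DeletedFlagCheck is a hypothetical name. 
One thing to watch in either form: InputStream.read() returns -1 at end of 
stream, and both checks report true for -1, so h2 < 0 is worth testing first.

```java
import java.math.BigInteger;

// Hypothetical comparison of the two ways to test bit 7 of a header byte.
class DeletedFlagCheck {

    static boolean deletedViaBigInteger(int h2) {
        return BigInteger.valueOf(h2).testBit(7);
    }

    static boolean deletedViaMask(int h2) {
        return (h2 & 0x80) != 0;
    }

    public static void main(String[] args) {
        // Compare the two checks over every possible unsigned byte value.
        for (int h2 = 0; h2 <= 255; h2++) {
            if (deletedViaBigInteger(h2) != deletedViaMask(h2)) {
                throw new IllegalStateException("mismatch at " + h2);
            }
        }
        System.out.println("checks agree for 0..255");
        // -1 is what read() returns at end of stream, not a real header byte.
        System.out.println("mask check on -1: " + deletedViaMask(-1));
    }
}
```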

> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library [http://dgnlib.maptools.org/]  for 
> DGN7 which produces an dgndump.exe which will dump all the data from the DGN. 
> From my initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-30 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530457#comment-17530457
 ] 

Dan Coldrick edited comment on TIKA-3742 at 4/30/22 9:32 PM:
-

Is this correct for working out the deletion? If it is, I might actually 
understand how it's working a bit more!
{code:java}
boolean isdeleted = BigInteger.valueOf(h2).testBit(7); {code}
Edit:

That looks to have done the trick. The next issue is that I'm getting different 
values out compared to dgndump (and to what I see when I open the file in 
MicroStation).

An example is in 1264t.dgn (I've attached it to JIRA). I get out

INDIAN     TRAIL     RD]

whereas it should just be

INDIAN     TRAIL     RD


was (Author: monkmachine):
Is this correct for working out the deletion? If it is I might actually 
understand how its working a bit more!
{code:java}
boolean isdeleted = BigInteger.valueOf(h2).testBit(7); {code}

> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: 1264t.dgn, DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library [http://dgnlib.maptools.org/]  for 
> DGN7 which produces an dgndump.exe which will dump all the data from the DGN. 
> From my initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-30 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530457#comment-17530457
 ] 

Dan Coldrick edited comment on TIKA-3742 at 4/30/22 9:32 PM:
-

Is this correct for working out the deletion? If it is, I might actually 
understand how it's working a bit more!
{code:java}
boolean isdeleted = BigInteger.valueOf(h2).testBit(7); {code}
Edit:

That looks to have done the trick. The next issue is that I'm getting different 
values out compared to dgndump (and to what I see when I open the file in 
MicroStation).

An example is in 1264t.dgn (I've attached it to JIRA). I get out

INDIAN     TRAIL     RD]

whereas it should just be

INDIAN     TRAIL     RD

I've got multiple examples of these.
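
One possible cause, not confirmed anywhere in this thread: DGN7 elements are 
sized in 16-bit words, so a string with an odd number of characters may be 
followed by a padding byte, and reading to the end of the element rather than 
to the declared character count would pick that byte up. The sketch below only 
illustrates cutting the text at a declared count. The class name, the 
parameters and the ISO-8859-1 charset are all assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes at most declaredCount bytes of text from an
// element payload, ignoring any padding that follows the string.
class DgnTextSlice {

    static String decodeText(byte[] payload, int offset, int declaredCount) {
        int available = Math.max(0, payload.length - offset);
        int n = Math.min(declaredCount, available);
        // Charset is an assumption; the real encoding depends on the file.
        return new String(payload, offset, n, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] payload = "INDIAN TRAIL RD]".getBytes(StandardCharsets.ISO_8859_1);
        // 15 is the declared character count in this made-up example.
        System.out.println(decodeText(payload, 0, 15));
    }
}
```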


was (Author: monkmachine):
Is this correct for working out the deletion? If it is I might actually 
understand how its working a bit more!
{code:java}
boolean isdeleted = BigInteger.valueOf(h2).testBit(7); {code}
Edit:

That looks to have done the trick. Next issue is I'm getting different values 
out compared to dgn dump (and when I look at the file using MicroStation) 

An example is in 

I get out 1264t.dgn (I've attached to JIRA)

INDIAN     TRAIL     RD]

where as it should just be

INDIAN     TRAIL     RD

> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: 1264t.dgn, DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library [http://dgnlib.maptools.org/]  for 
> DGN7 which produces an dgndump.exe which will dump all the data from the DGN. 
> From my initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-04-30 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3742:
---
Attachment: 1264t.dgn

> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: 1264t.dgn, DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & Whoever else. 
> I managed to compile the C/C++ library [http://dgnlib.maptools.org/]  for 
> DGN7 which produces an dgndump.exe which will dump all the data from the DGN. 
> From my initial testing it looks pretty good. 
> Would you guys think it was worth adding this or just keep it as a custom 
> parser rather than in the main source code? It's under MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty testing calling the exe (my code I was only interested in the Strings 
> so am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (TIKA-1570) Seeking a stop method for better use with Apache Commons Daemon

2022-05-03 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531396#comment-17531396
 ] 

Dan Coldrick commented on TIKA-1570:


[~tallison] 

Can we bring this PR back to life? I think it would be great to create a 
Windows installer for TIKA server.

> Seeking a stop method for better use with Apache Commons Daemon
> ---
>
> Key: TIKA-1570
> URL: https://issues.apache.org/jira/browse/TIKA-1570
> Project: Tika
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 1.7
>Reporter: Jason Borg
>Priority: Minor
>
> I've got tika-server-1.7.jar from http://tika.apache.org/download.html
> I've downloaded v1.0.15 of the Windows binaries for Apache Commons Daemon 
> from http://commons.apache.org/proper/commons-daemon/binaries.html
> I can get Tika started as a service, but I can't determine what to use for a 
> stop method.
> prunsrv.exe //IS//tika-daemon --DisplayName "Tika Daemon" --Classpath 
> "C:\Tika Service\tika-server-1.7.jar" --StartClass 
> "org.apache.tika.server.TikaServerCli" --StopClass 
> "org.apache.tika.server.TikaServerCli" --StartMethod main --StopMethod main 
> --Description "Tika Daemon Windows Service" --StartMode java --StopMode java
> This starts, and works as I'd hope, but when trying to stop the service it 
> doesn't respond. Obviously org.apache.tika.server.TikaServerCli.main(string[] 
> args) isn't a suitable stop method, but I'm lost for alternatives.
> Using Daemon in exe mode works for start, but gives inconsistent results for 
> stop. Adding a stop method to Tika would be ideal.
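
On the stop-method question in the quoted description: when Commons Daemon's 
procrun runs a service in jvm mode it calls a static start method and later a 
static stop method, and the start method is expected not to return until stop 
has been called. A minimal sketch of that pattern is below. 
ServiceLauncherSketch is a hypothetical wrapper, not a class in Tika, and the 
places where a real server would be started and shut down are only marked with 
comments. In a real project it would be a public class in its own file.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical start/stop wrapper in the shape procrun's jvm mode expects.
class ServiceLauncherSketch {

    private static final CountDownLatch STOP_SIGNAL = new CountDownLatch(1);

    // Would be named as the start method; blocks until stop(...) is called.
    public static void start(String[] args) throws InterruptedException {
        // Assumption: the real server would be started here.
        STOP_SIGNAL.await();
        // Assumption: the real server would be shut down here.
    }

    // Would be named as the stop method; releases the blocked start(...).
    public static void stop(String[] args) {
        STOP_SIGNAL.countDown();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread service = new Thread(() -> {
            try {
                start(new String[0]);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        service.start();
        stop(new String[0]);
        service.join();
        System.out.println("stopped");
    }
}
```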



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (TIKA-3753) Move Parsers pages in Confluence under the home page

2022-05-09 Thread Dan Coldrick (Jira)
Dan Coldrick created TIKA-3753:
--

 Summary: Move Parsers pages in Confluence under the home page
 Key: TIKA-3753
 URL: https://issues.apache.org/jira/browse/TIKA-3753
 Project: Tika
  Issue Type: Improvement
  Components: documentation
Reporter: Dan Coldrick


Move Parsers pages in Confluence under the home page. At the moment it's quite 
hard to find the parsers info as they don't show on the homepage in the tree. 
Suggest they are moved under the home tree under a new page?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (TIKA-3753) Move Parsers pages in Confluence under the home page

2022-05-09 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3753:
---
Attachment: screenshot-1.png

> Move Parsers pages in Confluence under the home page
> 
>
> Key: TIKA-3753
> URL: https://issues.apache.org/jira/browse/TIKA-3753
> Project: Tika
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> Move Parsers pages in Confluence under the home page, at the moment it's 
> quite hard to find the parsers info as the don't show on the homepage in the 
> tree. Suggest they are moved under the home tree under a new page?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (TIKA-3753) Move Parsers pages in Confluence under the home page

2022-05-09 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3753:
---
Attachment: screenshot-2.png

> Move Parsers pages in Confluence under the home page
> 
>
> Key: TIKA-3753
> URL: https://issues.apache.org/jira/browse/TIKA-3753
> Project: Tika
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> Move Parsers pages in Confluence under the home page, at the moment it's 
> quite hard to find the parsers info as the don't show on the homepage in the 
> tree. Suggest they are moved under the home tree under a new page?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (TIKA-3753) Move Parsers pages in Confluence under the home page

2022-05-09 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533968#comment-17533968
 ] 

Dan Coldrick commented on TIKA-3753:


This is where you come in to Confluence via Google or the website:

!screenshot-2.png!

 

As you can see there is no way to see these pages:

!screenshot-1.png!

> Move Parsers pages in Confluence under the home page
> 
>
> Key: TIKA-3753
> URL: https://issues.apache.org/jira/browse/TIKA-3753
> Project: Tika
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> Move Parsers pages in Confluence under the home page, at the moment it's 
> quite hard to find the parsers info as the don't show on the homepage in the 
> tree. Suggest they are moved under the home tree under a new page?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Comment Edited] (TIKA-3753) Move Parsers pages in Confluence under the home page

2022-05-09 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533968#comment-17533968
 ] 

Dan Coldrick edited comment on TIKA-3753 at 5/9/22 7:09 PM:


This is where you come in to Confluence via Google or the website. I suggest a 
parent page called Parsers (or something similar) is put under the home page, 
and the child pages detailing the parsers are put under there.

!screenshot-2.png!

 

As you can see there is no way to see these pages:

!screenshot-1.png!


was (Author: monkmachine):
This is where you come in to confluence via google or the website:

!screenshot-2.png!

 

As you can see there is no way to see these pages:

!screenshot-1.png!

> Move Parsers pages in Confluence under the home page
> 
>
> Key: TIKA-3753
> URL: https://issues.apache.org/jira/browse/TIKA-3753
> Project: Tika
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> Move Parsers pages in Confluence under the home page, at the moment it's 
> quite hard to find the parsers info as the don't show on the homepage in the 
> tree. Suggest they are moved under the home tree under a new page?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (TIKA-3753) Move Parsers pages in Confluence under the home page

2022-05-09 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533981#comment-17533981
 ] 

Dan Coldrick commented on TIKA-3753:


[~tallison] Are you OK if I do it and you give it a quick review?

> Move Parsers pages in Confluence under the home page
> 
>
> Key: TIKA-3753
> URL: https://issues.apache.org/jira/browse/TIKA-3753
> Project: Tika
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> Move Parsers pages in Confluence under the home page, at the moment it's 
> quite hard to find the parsers info as the don't show on the homepage in the 
> tree. Suggest they are moved under the home tree under a new page?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (TIKA-3753) Move Parsers pages in Confluence under the home page

2022-05-09 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533999#comment-17533999
 ] 

Dan Coldrick commented on TIKA-3753:


[~tallison] New page created with a page tree underneath. I've added what I 
think should be in there, but you may have a different opinion or may want to 
add other pages?

https://cwiki.apache.org/confluence/display/TIKA/Parsers

> Move Parsers pages in Confluence under the home page
> 
>
> Key: TIKA-3753
> URL: https://issues.apache.org/jira/browse/TIKA-3753
> Project: Tika
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> Move Parsers pages in Confluence under the home page, at the moment it's 
> quite hard to find the parsers info as the don't show on the homepage in the 
> tree. Suggest they are moved under the home tree under a new page?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027

2022-05-11 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535102#comment-17535102
 ] 

Dan Coldrick commented on TIKA-1735:


Hi [~tallison] 

Sorry to bother you. I've created a fork of this and managed to pull all of 
the text out of the JSON. Could you have a quick review to see if I'm going 
along the right lines? I've tried to make it so you can pass the config via the 
parser section of tika-config.xml. I think there is still a lot to do to finish 
it off, such as making it resilient (e.g. checking the exe is there), sorting 
out the formatting (objects of type MTEXT carry formatting codes, see 
[https://www.cadforum.cz/en/text-formatting-codes-in-mtext-objects-tip8640], 
of which I've handled a few) and adding proper test cases. I think I also need 
to add a timeout around the creation of the .json by dwgread (i.e. kill the 
spawned process after a configurable amount of time).
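
A rough sketch of the two resilience pieces mentioned above, checking that the 
executable exists and killing the spawned process after a timeout, using only 
the standard Java 8 process API. ExternalToolRunner, its method names and the 
idea of sending the tool's output to a log file are assumptions for 
illustration. The actual dwgread command line is left to the caller and is not 
shown here.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical runner for an external tool such as dwgread.
class ExternalToolRunner {

    // True only if the path points at an existing, executable regular file.
    static boolean isRunnable(Path exe) {
        return Files.isRegularFile(exe) && Files.isExecutable(exe);
    }

    // Runs the command, sending stdout and stderr to the log file so the
    // child cannot block on a full pipe, and kills it if it overruns.
    static int run(List<String> command, File log, long timeoutMillis)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);
        pb.redirectOutput(log);
        Process process = pb.start();
        if (!process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            process.destroyForcibly();
            process.waitFor(); // wait for the kill to take effect
            throw new IOException("timed out after " + timeoutMillis + " ms");
        }
        return process.exitValue();
    }
}
```

The timeout value could then come from the parser's config, with isRunnable 
checked once up front so a missing exe gives a clear error instead of a failed 
process start.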

I'd really appreciate some feedback as, like I said before, this is all quite 
new to me:

[https://github.com/monkmachine/tika/tree/DWGRead]

If you don't have time please let me know and I'll go find someone else to 
annoy ;)

 

Thanks

 

> Unsupported AutoCAD drawing version: AC1027
> ---
>
> Key: TIKA-1735
> URL: https://issues.apache.org/jira/browse/TIKA-1735
> Project: Tika
>  Issue Type: Bug
>Reporter: Luca Perico
>Priority: Major
> Attachments: testDWG-AC1027.dwg
>
>
> Trying to index .dwg file (version AC1027) I get 500 error response. 
> "
> 
> 500 name=""QTime"">3 AutoCAD drawing version: AC1027 name=""trace"">org.apache.solr.common.SolrException: 
> org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing version: 
> AC1027
>   at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
>   at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>   at org.eclipse.jetty.server.Server.handle(Server.java:497)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>   at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tika.exception.TikaException: Unsupported AutoCAD 
> drawing version: AC1027
>   at org.apache.tika.parser.dwg.DWGParser.parse(DWGParser.java:131)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>   at 
> org.apache.solr.handler.ext

[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027

2022-05-12 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535912#comment-17535912
 ] 

Dan Coldrick commented on TIKA-1735:


[~tallison] 

Hi, I've opened [https://github.com/apache/tika/pull/558]

If you could have a quick scan and review I'd appreciate it , like I say not 
finished but want to check I'm going down the right route.

> Unsupported AutoCAD drawing version: AC1027
> ---
>
> Key: TIKA-1735
> URL: https://issues.apache.org/jira/browse/TIKA-1735
> Project: Tika
>  Issue Type: Bug
>Reporter: Luca Perico
>Priority: Major
> Attachments: testDWG-AC1027.dwg
>
>
> Trying to index .dwg file (version AC1027) I get 500 error response. 
> "
> 
> 500 name=""QTime"">3 AutoCAD drawing version: AC1027 name=""trace"">org.apache.solr.common.SolrException: 
> org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing version: 
> AC1027
>   at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
>   at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>   at org.eclipse.jetty.server.Server.handle(Server.java:497)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>   at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tika.exception.TikaException: Unsupported AutoCAD 
> drawing version: AC1027
>   at org.apache.tika.parser.dwg.DWGParser.parse(DWGParser.java:131)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>   at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:221)
>   ... 27 more
> 500
> "



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)

2022-05-12 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536356#comment-17536356
 ] 

Dan Coldrick commented on TIKA-3725:


Hi [~tallison] 

I see you've had some responses :)

What are the disadvantages of adding Spring? What would the advantages be? 
I assume it brings a load of benefits but also adds quite a lot of 
complication?

Would it be possible to drive Tika Server forward with Spring by allowing more 
configuration (installation as a service, SSL, authorization, whitelists, etc.)? 
To me the REST APIs offer so much as a generic service.
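
For the Basic auth starting point this ticket suggests, the server-side check 
boils down to decoding the Authorization header and comparing credentials. 
The following is only a minimal sketch using the JDK; the class and method 
names are hypothetical and are not part of Tika Server, and where the expected 
user and password come from (config file, environment) is left open.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Optional;

// Hypothetical helper, not a Tika class: parses and checks an HTTP Basic header.
final class BasicAuthHeader {

    // Returns {user, password} if the header is a well-formed "Basic <base64>" value.
    static Optional<String[]> parse(String header) {
        if (header == null || !header.regionMatches(true, 0, "Basic ", 0, 6)) {
            return Optional.empty();
        }
        String decoded;
        try {
            byte[] raw = Base64.getDecoder().decode(header.substring(6).trim());
            decoded = new String(raw, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // payload was not valid Base64
        }
        int colon = decoded.indexOf(':');
        if (colon < 0) {
            return Optional.empty();
        }
        // The user name cannot contain ':'; everything after the first ':' is the password.
        return Optional.of(new String[] {decoded.substring(0, colon), decoded.substring(colon + 1)});
    }

    // Compares the supplied credentials with the expected ones.
    static boolean matches(String header, String expectedUser, String expectedPassword) {
        Optional<String[]> creds = parse(header);
        if (!creds.isPresent()) {
            return false;
        }
        boolean userOk = MessageDigest.isEqual(
                creds.get()[0].getBytes(StandardCharsets.UTF_8),
                expectedUser.getBytes(StandardCharsets.UTF_8));
        boolean passOk = MessageDigest.isEqual(
                creds.get()[1].getBytes(StandardCharsets.UTF_8),
                expectedPassword.getBytes(StandardCharsets.UTF_8));
        return userOk & passOk;
    }
}
```

Basic auth sends the credentials merely Base64-encoded, so it only makes sense 
together with HTTPS.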

> Add Authorization to Tika Server (Suggest Basic to start off with)
> --
>
> Key: TIKA-3725
> URL: https://issues.apache.org/jira/browse/TIKA-3725
> Project: Tika
>  Issue Type: New Feature
>  Components: tika-server
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
>
> It would be good to get some Authentication/Authorization added to Tika Server 
> to add another layer of security around the Tika Server REST 
> service.
> This could become a rabbit hole given the number of options available around 
> Authentication/Authorization (OAuth, OpenID, etc.), so I suggest adding Basic 
> Auth as a starter. 
> As for how to store users/passwords, I suggest looking at how other Apache 
> products do the same.





[jira] [Commented] (TIKA-1570) Seeking a stop method for better use with Apache Commons Daemon

2022-05-13 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536684#comment-17536684
 ] 

Dan Coldrick commented on TIKA-1570:


[~tallison] 

Many thanks, will try to look at this next week :)

> Seeking a stop method for better use with Apache Commons Daemon
> ---
>
> Key: TIKA-1570
> URL: https://issues.apache.org/jira/browse/TIKA-1570
> Project: Tika
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 1.7
>Reporter: Jason Borg
>Priority: Minor
> Fix For: 2.4.1
>
>
> I've got tika-server-1.7.jar from http://tika.apache.org/download.html
> I've downloaded v1.0.15 of the Windows binaries for Apache Commons Daemon 
> from http://commons.apache.org/proper/commons-daemon/binaries.html
> I can get Tika started as a service, but I can't determine what to use for a 
> stop method.
> prunsrv.exe //IS//tika-daemon --DisplayName "Tika Daemon" --Classpath 
> "C:\Tika Service\tika-server-1.7.jar" --StartClass 
> "org.apache.tika.server.TikaServerCli" --StopClass 
> "org.apache.tika.server.TikaServerCli" --StartMethod main --StopMethod main 
> --Description "Tika Daemon Windows Service" --StartMode java --StopMode java
> This starts, and works as I'd hope, but when trying to stop the service it 
> doesn't respond. Obviously org.apache.tika.server.TikaServerCli.main(String[] 
> args) isn't a suitable stop method, but I'm lost for alternatives.
> Using Daemon in exe mode works for start, but gives inconsistent results for 
> stop. Adding a stop method to Tika would be ideal.





[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027

2022-05-17 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538400#comment-17538400
 ] 

Dan Coldrick commented on TIKA-1735:


Apologies [~tallison] 

I've not been able to look at the regexes properly today. I have run a load of 
documents through (100+) and found a few more formatting tags:

\pxqc; = centered
\pxqr; = right
\pxql; = left

 

I also wasted 2 hours trying to work out why the JSON was coming out invalid 
for some files. It turns out some idiot (me) had been replacing "nan" with "". 
I think I should replace NaN with null instead of 0?
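
To make the two clean-up steps concrete, here is a rough sketch of what they 
could look like. It assumes the alignment tags are simply dropped from the 
extracted text and that NaN shows up as a bare JSON value; the class name and 
both regexes are illustrative only and are not taken from the pull request.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not the code in the pull request.
final class DwgTextCleanup {

    // Matches the paragraph alignment tags listed above: \pxqc; \pxqr; \pxql;
    private static final Pattern ALIGN_TAG = Pattern.compile("\\\\pxq[crl];");

    // Matches a bare NaN/nan token that follows ':', '[' or ',' and is followed by
    // ',', '}' or ']'. Simplistic: it does not check whether it is inside a string.
    private static final Pattern BARE_NAN =
            Pattern.compile("([:\\[,]\\s*)(?i:nan)(?=\\s*[,}\\]])");

    // Removes the alignment tags and leaves the surrounding text untouched.
    static String stripAlignmentTags(String text) {
        return ALIGN_TAG.matcher(text).replaceAll("");
    }

    // Replaces bare NaN values with null so the result stays valid JSON.
    static String nanToNull(String json) {
        return BARE_NAN.matcher(json).replaceAll("$1null");
    }
}
```

Replacing the token with an empty string leaves things like "x": , behind, 
which is why the JSON stops parsing; null keeps the document valid without 
inventing a number.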



[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA

2022-05-18 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539019#comment-17539019
 ] 

Dan Coldrick commented on TIKA-3742:


[~nick] any advice? I'm stuck on the random chars at the moment with this one 
so any help would be appreciated :)

> Advice around DGN7 parser and whether to add to TIKA
> 
>
> Key: TIKA-3742
> URL: https://issues.apache.org/jira/browse/TIKA-3742
> Project: Tika
>  Issue Type: Task
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: 1264t.dgn, DGN.zip, ExampleOutput.txt
>
>
> Hi [~tallison] & whoever else. 
> I managed to compile the C/C++ library [http://dgnlib.maptools.org/] for 
> DGN7, which produces a dgndump.exe that will dump all the data from the DGN. 
> From my initial testing it looks pretty good. 
> Do you think it would be worth adding this to Tika, or should it stay a custom 
> parser outside the main source code? It's under the MIT license. I've 
> attached the exe (zipped), a copy of the output from the dump and my very 
> dirty test code that calls the exe (I was only interested in the strings, 
> so I am only pulling those into a string array at the moment to check it's 
> pulling out the correct data).





[jira] [Commented] (TIKA-1570) Seeking a stop method for better use with Apache Commons Daemon

2022-05-19 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539809#comment-17539809
 ] 

Dan Coldrick commented on TIKA-1570:


[~tallison] I've tested it and it works. I've created a WIP page in Confluence 
on how I got it to install as a Windows service.

I needed a break from DWGs so picked this up instead :) Feel free to butcher 
my Confluence page:

[https://cwiki.apache.org/confluence/display/TIKA/TikaServer+Windows+Service+-+WIP]
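
For anyone reading along, this is roughly the shape of start/stop entry point 
that Commons Daemon (procrun) can call when running in jvm mode, where both 
methods are static and take a String[]. It is only a sketch: 
ServiceEntryPoint is a made-up class, not the Tika Server class or the stop 
method that was actually added.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical entry point, not a Tika class.
final class ServiceEntryPoint {

    // Released once by stop() so that start() can return.
    private static final CountDownLatch STOP = new CountDownLatch(1);
    private static volatile boolean running;

    // Would be named as the start method; blocks until stop() is called.
    public static void start(String[] args) throws InterruptedException {
        running = true;
        // A real service would start its server here.
        STOP.await();
        // ...and shut the server down here.
        running = false;
    }

    // Would be named as the stop method; called from another thread.
    public static void stop(String[] args) {
        STOP.countDown();
    }

    static boolean isRunning() {
        return running;
    }
}
```

Pointing the stop method at main, as in the original report, cannot work 
because main only knows how to start the server; the stop call needs a separate 
method that signals the running instance.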

 



[jira] [Commented] (TIKA-3523) A replacement for enableFileUrl or Support for Google Cloud

2022-05-20 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539958#comment-17539958
 ] 

Dan Coldrick commented on TIKA-3523:


[~tallison] 

That error is probably because the command needs to be in quotes? It is 
probably looking for C:\Program Files\xx, where "Program Files" contains a 
space, so the parameter is being split into "C:\Program" and "Files\xx".

"C:\Program Files\"

I'm not involved in this ticket but happened to see the error.
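
To illustrate the splitting problem, here is a small sketch. The path and the 
-v flag are placeholders and CommandArgs is a made-up class; the point is only 
that splitting a command line on whitespace breaks a path containing a space, 
whereas passing each argument as its own list element to ProcessBuilder keeps 
it intact.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
final class CommandArgs {

    // Naive whitespace splitting: a path with a space ends up as two arguments.
    static List<String> naiveSplit(String commandLine) {
        return Arrays.asList(commandLine.trim().split("\\s+"));
    }

    // Each argument stays a separate list element, so no quoting is needed.
    static ProcessBuilder build(String executable, String... args) {
        List<String> command = new ArrayList<>();
        command.add(executable);
        command.addAll(Arrays.asList(args));
        return new ProcessBuilder(command);
    }
}
```

With the naive split, "C:\Program Files\xx\tool.exe -v" becomes three 
arguments starting with "C:\Program", which matches the kind of error seen 
here.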

> A replacement for enableFileUrl or Support for Google Cloud
> ---
>
> Key: TIKA-3523
> URL: https://issues.apache.org/jira/browse/TIKA-3523
> Project: Tika
>  Issue Type: Wish
>  Components: tika-server
>Affects Versions: 2.0.0
>Reporter: Fatih Pazarbasi
>Priority: Minor
>
> Hello,
> I have a setup where users upload their files to a cloud bucket and I forward 
> the fileUrl to run OCR on them in a serverless cloud instance. I do it this 
> way so that the users never contact the Tika Server directly and I have a copy 
> of what they've sent for processing. They also have nothing to do with the 
> unprocessed response.
> Now that you've removed enableFileUrl, I have to download the files to 
> the backend instance from the cloud bucket they were uploaded to 
> and put them to the /tika server again.
> I tried the following config.xml to work around the situation, but it was in 
> vain.
>   For the made-up URL: 
> [https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf|https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/]
> {code:java}
>  
>   
>
>fsf 
>
> https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o
>  
>
>   
>  
>  
>   
>
>fse 
>gs://abcd-efgh.appspot.com/users 
>
>   
>  
>  
>   
>   true 
>   
>  
>  
>   
>   /path/to/tika-config.xml 
>   
> {code}
> {code:java}
> headers: { 
> Accept: 'text/plain', 
> 'User-Agent': 'Firebase Functions', 
> fetcherName: 'fsf', 
> fetchKey: 'somefilethatdoesnotexist.pdf',   
> },{code}
> It doesn't support the gs:// Google Storage bucket either. I have all the 
> necessary permissions but that didn't help. I'm using a dockerized version of 
> Tika Server, so the file system does not seem to be the issue...
>   
>  In the golden times of 1.2x I was simply using:
>   
> {code:java}
> headers: {   
> Accept: 'text/plain',   
> 'User-Agent': 'Firebase Functions',   
> fileUrl: 
> 'https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf',
>  
> },{code}
>  
>   
>  Am I missing something? If not, my wish is that you make it so 
> that fetcherName is the definitive first part of the old fileUrl and fetchKey 
> is the specific pointer to a file.
> This way I would have some control over the URLs sent to Tika Server, 
> unlike with enableFileUrl, and could also have my cake and eat it by not 
> creating extra traffic on the backend from downloading from the bucket and 
> uploading to Tika. 
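
The behaviour the reporter asks for (fetcherName selects a fixed URL prefix, 
fetchKey selects a file beneath it) can be sketched as below. This is not how 
Tika's fetchers are implemented; PrefixResolver and its map of base URLs are 
assumptions made for illustration, and each base URL is assumed to end with a 
slash.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical resolver, not part of Tika's fetcher API.
final class PrefixResolver {

    // Maps a fetcher name to its fixed base URL (must end with '/').
    private final Map<String, URI> basesByName;

    PrefixResolver(Map<String, URI> basesByName) {
        this.basesByName = basesByName;
    }

    // Joins the base URL and the key, rejecting anything that escapes the base.
    URI resolve(String fetcherName, String fetchKey) {
        URI base = basesByName.get(fetcherName);
        if (base == null) {
            throw new IllegalArgumentException("unknown fetcher: " + fetcherName);
        }
        URI resolved = base.resolve(fetchKey).normalize();
        if (!resolved.toString().startsWith(base.toString())) {
            throw new IllegalArgumentException("key escapes the configured base: " + fetchKey);
        }
        return resolved;
    }
}
```

The prefix check is what gives the control the reporter mentions: a key such 
as ../other or a full URL to another host is rejected instead of fetched.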





[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027

2022-05-24 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541833#comment-17541833
 ] 

Dan Coldrick commented on TIKA-1735:


[~tallison] Just so you know this isn't dead, we're still testing dwgread over 
the next week or so, I'm on holiday from tomorrow so will pick this back up 
when I'm back.



[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027

2022-10-04 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612781#comment-17612781
 ] 

Dan Coldrick commented on TIKA-1735:


[~tallison] Apologies, I've been missing for a few months; I got pulled onto 
another project which has consumed pretty much all my time. We've got some 
testing to do this week, but I think we're really close to being able to say 
you can merge this pull request. Is there anything you need, given I've done 
lots of commits and experimenting? I'm unsure how you take all those commits 
and clean them up into one nice commit.

I think sometime next week, once we've finished testing, I could give the 
go-ahead to merge.

Let me know what you need.

Thanks

Dan




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027

2022-10-04 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612801#comment-17612801
 ] 

Dan Coldrick commented on TIKA-1735:


[~tallison]  Happy with that, can you give me until tomorrow night (UK time)? 
I've got some testing I can do with my test team and come back to you?

> Unsupported AutoCAD drawing version: AC1027
> ---
>
> Key: TIKA-1735
> URL: https://issues.apache.org/jira/browse/TIKA-1735
> Project: Tika
>  Issue Type: New Feature
>Reporter: Luca Perico
>Priority: Major
> Attachments: testDWG-AC1027.dwg
>
>


[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027

2022-10-05 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613098#comment-17613098
 ] 

Dan Coldrick commented on TIKA-1735:


[~tallison] All good, happy for it to be merged. I've only tested on Windows. 
I've added some tests for the config that check whether DWGRead is installed 
(if DWGRead can't be found, the tests are abandoned). I'm not sure whether you 
want to install it on your build server so that the tests run? I'll leave that 
up to you.
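
As a rough idea of what such a check can look like, here is a sketch that 
searches a PATH-style string for an executable. It is not the code in the pull 
request; ExecutableLookup is a made-up name, and the lookup only tests that 
the file exists, not that it can actually be run.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper, not the check used in the pull request.
final class ExecutableLookup {

    // Returns true if any candidate file name exists in a directory of pathValue.
    static boolean isOnPath(String pathValue, String... candidateNames) {
        if (pathValue == null || pathValue.isEmpty()) {
            return false;
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String name : candidateNames) {
                try {
                    if (Files.isRegularFile(Paths.get(dir, name))) {
                        return true;
                    }
                } catch (InvalidPathException e) {
                    // Ignore malformed PATH entries and keep looking.
                }
            }
        }
        return false;
    }
}
```

In a JUnit 5 test this could feed Assumptions.assumeTrue, for example with 
isOnPath(System.getenv("PATH"), "dwgread", "dwgread.exe"), so that the test is 
reported as skipped rather than failed on machines without the tool.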

> Unsupported AutoCAD drawing version: AC1027
> ---
>
> Key: TIKA-1735
> URL: https://issues.apache.org/jira/browse/TIKA-1735
> Project: Tika
>  Issue Type: New Feature
>Reporter: Luca Perico
>Priority: Major
> Attachments: testDWG-AC1027.dwg
>
>
> Trying to index .dwg file (version AC1027) I get 500 error response. 
> "
> 
> 500 name=""QTime"">3 AutoCAD drawing version: AC1027 name=""trace"">org.apache.solr.common.SolrException: 
> org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing version: 
> AC1027
>   at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
>   at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>   at org.eclipse.jetty.server.Server.handle(Server.java:497)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>   at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tika.exception.TikaException: Unsupported AutoCAD 
> drawing version: AC1027
>   at org.apache.tika.parser.dwg.DWGParser.parse(DWGParser.java:131)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>   at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:221)
>   ... 27 more
> 500
> "
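
As background to the error quoted above: modern DWG files start with a 
six-byte ASCII version tag such as AC1027, which is what the exception message 
reports. A minimal sketch of reading that tag is below. The class and method 
names are hypothetical and this is not how DWGParser itself is implemented; it 
only shows where the "AC1027" string comes from.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not part of Tika: extracts the version tag from the
// first bytes of a DWG file.
public class DwgVersionSniffer {

    /** Returns the six-character tag (e.g. "AC1027") if the header looks like a DWG header. */
    public static Optional<String> versionTag(byte[] header) {
        if (header == null || header.length < 6) {
            return Optional.empty();
        }
        String tag = new String(header, 0, 6, StandardCharsets.US_ASCII);
        // Assumption for this sketch: only tags beginning with "AC" are treated as DWG.
        return tag.startsWith("AC") ? Optional.of(tag) : Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: DwgVersionSniffer <file.dwg>");
            return;
        }
        byte[] header = new byte[6];
        int read;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            read = in.readNBytes(header, 0, header.length);
        }
        Optional<String> tag = read == header.length ? versionTag(header) : Optional.empty();
        System.out.println(tag.map(t -> "DWG version tag: " + t).orElse("No DWG version tag found"));
    }
}
```

Running it against the attached testDWG-AC1027.dwg would be one way to confirm 
which version a failing file declares before deciding which parser to route it 
to.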





[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027

2022-10-05 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613100#comment-17613100
 ] 

Dan Coldrick commented on TIKA-1735:


[~tallison] I should also say thank you for your help :) We've had some really 
good success with this.






[jira] [Created] (TIKA-3883) Fixes for Parsing DWG files using DWG read Parser

2022-10-17 Thread Dan Coldrick (Jira)
Dan Coldrick created TIKA-3883:
--

 Summary: Fixes for Parsing DWG files using DWG read Parser
 Key: TIKA-3883
 URL: https://issues.apache.org/jira/browse/TIKA-3883
 Project: Tika
  Issue Type: Bug
  Components: parser
Reporter: Dan Coldrick


We have identified a couple of problems with parsing the JSON produced by DWG 
Read. This Jira is to fix those issues.





[jira] [Updated] (TIKA-3883) Fixes for Parsing DWG files using DWG read Parser

2022-10-17 Thread Dan Coldrick (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Coldrick updated TIKA-3883:
---
Description: We have identified a couple of problems with parsing the JSON 
produced by DWG Read. This Jira is to fix those issues  (was: We have 
identified a couple of problems with parsing the JSON produced by DWG Read. 
This ticket is to Jira is to fix those issues)

> Fixes for Parsing DWG files using DWG read Parser
> -
>
> Key: TIKA-3883
> URL: https://issues.apache.org/jira/browse/TIKA-3883
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Reporter: Dan Coldrick
>Priority: Minor
>
> We have identified a couple of problems with parsing the JSON produced by DWG 
> Read. This Jira is to fix those issues





[jira] [Commented] (TIKA-3883) Fixes for Parsing DWG files using DWG read Parser

2022-10-17 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619052#comment-17619052
 ] 

Dan Coldrick commented on TIKA-3883:


[~tallison] Unfortunately not. None of our team has access to create DWGs, and 
I can't use the ones we used in our testing. We had 3 DWG files fail out of 
circa 12k and this code change fixed those. Not sure if that's acceptable to 
you?






[jira] [Commented] (TIKA-3883) Fixes for Parsing DWG files using DWG read Parser

2022-10-17 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619060#comment-17619060
 ] 

Dan Coldrick commented on TIKA-3883:


Thanks [~tallison] 



