[jira] [Created] (TIKA-3721) DGN parser
Dan Coldrick created TIKA-3721:

Summary: DGN parser
Key: TIKA-3721
URL: https://issues.apache.org/jira/browse/TIKA-3721
Project: Tika
Issue Type: New Feature
Components: parser
Affects Versions: 2.3.0
Reporter: Dan Coldrick

Does anyone have any experience with the DGN file format by MicroStation? I see TIKA doesn't have a parser so would it be possible to create one?

https://docs.fileformat.com/cad/dgn/

--
This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524588#comment-17524588 ]

Dan Coldrick commented on TIKA-3719:

Hi [~tallison] I'm far from being a Java developer, so I'm not sure how much further I can help, but how about adding some parameters to the XML config file? Something like:
{code:java}
true JKS 1 c:/temp/keystore.jks JKS 1 c:/temp/keystore.jks
{code}
Also, holding keystore passwords in clear text doesn't feel right to me, so we might have to do something around encrypting them somehow.

The next step would also be to add some Authorization (Basic Auth would be a good start :) ) to the server, but maybe that would be a separate feature? Would that be worthwhile raising?

> Tika Server Ability to Run HTTPs
>
> Key: TIKA-3719
> URL: https://issues.apache.org/jira/browse/TIKA-3719
> Project: Tika
> Issue Type: Wish
> Components: tika-server
> Affects Versions: 2.3.0
> Reporter: Dan Coldrick
> Priority: Minor
>
> We need the ability to run TIKA server as a https end point, I can't see anything in the config that allows for this.
> Looks like I'm not the only one:
> [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https]
>
> If anyone can point to some documentation on how it might be possible it would be really appreciated.
>
> Thanks
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525225#comment-17525225 ]

Dan Coldrick commented on TIKA-3719:

Hi [~tallison] Yes, that sounds like a great idea. I think it should be optional, where it's either hard-coded in the config or can be read from an environment variable.

Any thoughts about adding Authorization? Separate JIRA? Personally, it would be really good to get to a stage where Tika server can be a fully hosted REST service with auth and some security around it.
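The "config value or environment variable" idea in the comment above could look roughly like the following. This is only a sketch in plain Java: it assumes the value being discussed is the keystore password, and the class name and the `TIKA_KEYSTORE_PASSWORD` variable name are hypothetical, not existing Tika settings.

```java
import java.util.Map;
import java.util.Optional;

/** Sketch: take a keystore password from the config first, then from the environment. */
public final class KeystorePasswordResolver {

    // Hypothetical variable name, chosen for illustration only.
    public static final String ENV_NAME = "TIKA_KEYSTORE_PASSWORD";

    private KeystorePasswordResolver() {}

    /**
     * Returns the configured value when it is non-blank, otherwise the value of
     * the environment variable, otherwise an empty Optional.
     */
    public static Optional<char[]> resolve(String configured, Map<String, String> env) {
        if (configured != null && !configured.trim().isEmpty()) {
            return Optional.of(configured.toCharArray());
        }
        String fromEnv = env.get(ENV_NAME);
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return Optional.of(fromEnv.toCharArray());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // In real use the map is System.getenv(); nothing is configured here.
        Optional<char[]> password = resolve(null, System.getenv());
        System.out.println(password.isPresent() ? "password found" : "no password configured");
    }
}
```

Passing the environment in as a map keeps the lookup easy to test; which source should win when both are set is a design choice, and the config value is given priority here only as an example.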
[jira] [Created] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)
Dan Coldrick created TIKA-3725:

Summary: Add Authorization to Tika Server (Suggest Basic to start off with)
Key: TIKA-3725
URL: https://issues.apache.org/jira/browse/TIKA-3725
Project: Tika
Issue Type: New Feature
Components: tika-server
Affects Versions: 2.3.0
Reporter: Dan Coldrick

It would be good to get some Authentication/Authorization added to Tika server, to add another layer of security around the Tika Server REST service.

This could become a rabbit hole given the number of options available around Authentication/Authorization (OAuth, OpenID etc.), so I suggest basic auth is added as a starter.

As for how to store user(s)/passwords, I suggest looking at how other Apache products do the same.
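For reference, the Basic scheme suggested above boils down to a Base64-encoded `user:password` pair in the `Authorization` header. The sketch below uses only the JDK and is not Tika or CXF code; the class and method names are made up for illustration, and storing or hashing the credentials is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

/** Sketch of building and checking an HTTP Basic Authorization header. */
public final class BasicAuthSketch {

    private BasicAuthSketch() {}

    /** Builds the header value a client would send: "Basic " + base64(user:password). */
    public static String encode(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    /** Returns true only if the header carries exactly the expected credentials. */
    public static boolean matches(String authorizationHeader, String user, String password) {
        // The scheme name is compared case-insensitively.
        if (authorizationHeader == null
                || !authorizationHeader.regionMatches(true, 0, "Basic ", 0, 6)) {
            return false;
        }
        byte[] decoded;
        try {
            decoded = Base64.getDecoder().decode(authorizationHeader.substring(6).trim());
        } catch (IllegalArgumentException e) {
            return false; // payload is not valid Base64
        }
        byte[] expected = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual avoids an early exit on the first differing byte.
        return MessageDigest.isEqual(expected, decoded);
    }
}
```

Because the credentials are only encoded, not encrypted, Basic auth is normally paired with HTTPS, which ties this request back to TIKA-3719.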
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525254#comment-17525254 ]

Dan Coldrick commented on TIKA-3719:

Hi [~tallison] Unfortunately I probably can't share our full use case :( What I can say is that it won't be public facing, but our security guys will pick up that Tika server isn't HTTPS and doesn't have any authorization around it.

We have been using Tika server for the last 2 years with a .NET wrapper around it (basically the .NET wrapper spawns Tika server and passes it files to process, with some extra logic around DWG files).
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525286#comment-17525286 ]

Dan Coldrick commented on TIKA-3719:

[~tallison] I don't but is it maybe something like this which would have to be taken in combination with what I posted earlier?
{code:java}
KeyStoreType keystore = new KeyStoreType();
keystore.setType("JKS");
keystore.setPassword("1");
keystore.setResource("keystore.jks");
TrustManagersType tmt = new TrustManagersType();
tmt.setKeyStore(keystore);
TLSServerParameters parameters = new TLSServerParameters();
parameters.setTrustManagers(TLSParameterJaxBUtils.getTrustManagers(tmt,true));
{code}
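The snippet above only sets trust managers, while a TLS server also needs key material (its own certificate and private key). As a rough illustration of that half, here is a sketch using only the JDK's `KeyStore`, `KeyManagerFactory` and `SSLContext`. It is not the CXF wiring that Tika server uses, and the class and method names are hypothetical.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

/** Sketch: build a server-side SSLContext from a keystore with plain JDK classes. */
public final class TlsContextSketch {

    private TlsContextSketch() {}

    /**
     * Loads a keystore of the given type ("JKS", "PKCS12", ...). A null path
     * yields an empty in-memory store, which is handy for tests.
     */
    public static KeyStore loadKeyStore(Path file, String type, char[] password) {
        try {
            KeyStore store = KeyStore.getInstance(type);
            if (file == null) {
                store.load(null, password);
            } else {
                try (InputStream in = Files.newInputStream(file)) {
                    store.load(in, password);
                }
            }
            return store;
        } catch (Exception e) {
            throw new IllegalStateException("Could not load keystore " + file, e);
        }
    }

    /** Builds an SSLContext whose key material comes from the given store. */
    public static SSLContext serverContext(KeyStore keyStore, char[] keyPassword) {
        try {
            KeyManagerFactory kmf =
                    KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
            kmf.init(keyStore, keyPassword);
            SSLContext context = SSLContext.getInstance("TLS");
            // Null trust managers and SecureRandom select the JDK defaults.
            context.init(kmf.getKeyManagers(), null, null);
            return context;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build SSLContext", e);
        }
    }
}
```

A call such as `serverContext(loadKeyStore(path, "JKS", pw), pw)` would then give a context to hand to whatever server is in use; how that maps onto `TLSServerParameters` is not shown here.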
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525293#comment-17525293 ]

Dan Coldrick commented on TIKA-3719:

Epic stuff [~tallison] I'll pull it down tomorrow and check it out (it's very late in the UK now). Thank you :)
[jira] [Updated] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Coldrick updated TIKA-3719:

Attachment: image-2022-04-21-18-52-50-706.png
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525914#comment-17525914 ]

Dan Coldrick commented on TIKA-3719:

Hi [~tallison] I don't think the trust store code I provided works; as soon as I set a trust store it errors.

INFO [main] 18:50:17,420 org.apache.tika.server.core.TikaServerProcess Starting Apache Tika server
INFO [main] 18:50:17,496 org.apache.tika.server.core.TikaServerProcess Using custom config: G:\git\tika-server-config-default.xml
ERROR [main] 18:50:23,995 org.apache.cxf.configuration.jsse.TLSParameterJaxBUtils Could not load keystore resource C:/Program Files/Java/jdk1.8.0_301/jre/lib/security/cacerts
ERROR [main] 18:50:23,995 org.apache.tika.server.core.TikaServerProcess Can't start:
java.io.IOException: Could not load keystore resource C:/Program Files/Java/jdk1.8.0_301/jre/lib/security/cacerts
    at org.apache.cxf.configuration.jsse.TLSParameterJaxBUtils.getKeyStore(TLSParameterJaxBUtils.java:161) ~[cxf-core-3.5.1.jar:3.5.1]
    at org.apache.cxf.configuration.jsse.TLSParameterJaxBUtils.getTrustManagers(TLSParameterJaxBUtils.java:395) ~[cxf-core-3.5.1.jar:3.5.1]
    at org.apache.tika.server.core.TikaServerProcess.getTlsParams(TikaServerProcess.java:304) ~[classes/:?]
    at org.apache.tika.server.core.TikaServerProcess.initServer(TikaServerProcess.java:265) ~[classes/:?]
    at org.apache.tika.server.core.TikaServerProcess.main(TikaServerProcess.java:133) ~[classes/:?]

If I ignore the trust manager the server starts. I'll do some digging around about the trust store, but I'm not 100% sure I'll be able to solve it.
[jira] [Comment Edited] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525947#comment-17525947 ]

Dan Coldrick edited comment on TIKA-3719 at 4/21/22 6:27 PM:

[~tallison] I switched to trustKeyStore.setFile and now it looks like the truststore works (well, it loads anyway).
{code:java}
trustKeyStore.setFile(tlsConfig.getTrustStoreFile());
{code}

was (Author: monkmachine):
[~tallison] trustKeyStore.setFile and now it looks like the truststore works (well it loads anyway).
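One plausible reading of the earlier "Could not load keystore resource" error, given that switching to setFile fixed it, is that an absolute file system path was being looked up as a classpath resource. The plain-Java sketch below shows the difference between the two lookups; it is an illustration under that assumption, not CXF's actual logic, and the class name is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

/** Sketch: report whether a keystore location resolves as a file or a classpath resource. */
public final class KeystoreLocationSketch {

    public enum Kind { FILE, CLASSPATH_RESOURCE, MISSING }

    private KeystoreLocationSketch() {}

    public static Kind classify(String location) {
        // A path such as c:/temp/keystore.jks is found on the file system...
        if (Files.isRegularFile(Paths.get(location))) {
            return Kind.FILE;
        }
        // ...while a name such as keystore.jks may only exist on the classpath.
        ClassLoader loader = KeystoreLocationSketch.class.getClassLoader();
        if (loader != null && loader.getResource(location) != null) {
            return Kind.CLASSPATH_RESOURCE;
        }
        return Kind.MISSING;
    }

    public static void main(String[] args) {
        for (String location : args) {
            System.out.println(location + " -> " + classify(location));
        }
    }
}
```

Running it with the configured trust store path as an argument shows which kind of lookup can succeed for that value.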
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525947#comment-17525947 ]

Dan Coldrick commented on TIKA-3719:

[~tallison] trustKeyStore.setFile and now it looks like the truststore works (well it loads anyway).
[jira] [Updated] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Coldrick updated TIKA-3719:

Attachment: localhost.jks
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525950#comment-17525950 ]

Dan Coldrick commented on TIKA-3719:

Tim, I've attached my noddy localhost keystore, which contains a localhost private cert, in case it's of any use to you. The password is "1", i.e. the number 1.
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525960#comment-17525960 ]

Dan Coldrick commented on TIKA-3719:

I could get it working with my .pfx, which is a PKCS12. I used PowerShell to generate it; again, if the command is of any use, here it is:

New-SelfSignedCertificate -CertStoreLocation Cert:\LocalMachine\My -DnsName "localhost" -FriendlyName "localhost" -NotAfter (Get-Date).AddYears(10)

I could then export it from the Windows certificate store to put in the JKS I attached.
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525982#comment-17525982 ]

Dan Coldrick commented on TIKA-3719:

Super(*)
[jira] [Commented] (TIKA-3721) DGN parser
[ https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526006#comment-17526006 ]

Dan Coldrick commented on TIKA-3721:

So would it be possible to do it with Bentley viewer? Convert to PDF then parse with the PDF parser?

[https://stackoverflow.com/questions/2560706/convert-dgn-to-pdf]
[https://communities.bentley.com/products/microstation/microstation_printing/f/printing-and-plotting-forum/104978/bentley-view-v8i-how-to-get-pdf-from-dgn-using-command-line]
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526010#comment-17526010 ]

Dan Coldrick commented on TIKA-3719:

OK Tim, I'll get it checked over and let you know in the next day or so, if that's OK. Many thanks for your effort, very much appreciated. I'll open a separate ticket for the keystore passwords as well :) Thanks again :) (*)
[jira] [Commented] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)
[ https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526098#comment-17526098 ]

Dan Coldrick commented on TIKA-3725:

[~tallison] [~nick] I definitely think Basic authorization is a good starting point. At least Tika server would have some security around it, which from a consumer's point of view would make it possible to host Tika server in a more secure way than at present.

[~nick] [~tallison] is it possible to reach out to the CXF devs in your Apache capacity to review the way Tika server is currently set up? Almost like a code review for best practice, so that it would be possible to use the CXF configuration files? While having a go with the SSL stuff, I did notice that if you drop a CXF.xml in the resources folder it appeared to spawn a separate Jetty server, but I don't have any idea how it works.

[https://cxf.apache.org/docs/secure-jax-rs-services.html]
[jira] [Commented] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)
[ https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526106#comment-17526106 ]

Dan Coldrick commented on TIKA-3725:

[~tallison] as per lots of previous comments you're a super(*) :)
[jira] [Commented] (TIKA-3721) DGN parser
[ https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526113#comment-17526113 ]

Dan Coldrick commented on TIKA-3721:

Would it be possible to add the DGN detector? It looks like it's Apache-licensed?

https://mvnrepository.com/artifact/com.github.peeveen/tika-dgn-detector/0.4
https://github.com/peeveen/tika-dgn-detector
[jira] [Commented] (TIKA-3721) DGN parser
[ https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526618#comment-17526618 ]

Dan Coldrick commented on TIKA-3721:

I looked at using Bentley viewer to convert via PDF, and it looks like a no-go due to licensing :(

Well done Tim on getting the detector working.
[jira] [Commented] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)
[ https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526620#comment-17526620 ]

Dan Coldrick commented on TIKA-3725:

I thought basic Auth was a good start, JWT will require a bit more configuration than just basic.
[jira] [Updated] (TIKA-3721) DGN parser
[ https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Coldrick updated TIKA-3721:

Attachment: image-2022-04-22-20-00-45-704.png
[jira] [Updated] (TIKA-3721) DGN parser
[ https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Coldrick updated TIKA-3721:

Attachment: image-2022-04-22-20-01-09-564.png
[jira] [Updated] (TIKA-3721) DGN parser
[ https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Coldrick updated TIKA-3721:

Attachment: image-2022-04-22-20-02-24-180.png
[jira] [Commented] (TIKA-3721) DGN parser
[ https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526633#comment-17526633 ]

Dan Coldrick commented on TIKA-3721:

Yeah, I think our company already owns the Aspose .NET libraries, but we've found them to be quite hit and miss in how well they work. I was hoping to get something in Tika, but it doesn't look possible. I think even adding the detector is a good start.

Tim, I'm assuming we still won't be able to pull the metadata out of the file without a proper parser? I'm not 100% sure about this, so am I right in thinking (is this how other metadata parsers work?) that these are the properties that could be pulled out of the file:

!image-2022-04-22-20-00-45-704.png!

If so, I've set a couple:

!image-2022-04-22-20-01-09-564.png!

Then if I unzip the DGN (V8 version) I can see them in the unzipped documentsummaryfile:

!image-2022-04-22-20-02-24-180.png!

Do you think that would be along the right lines to be able to get them out?
[jira] [Commented] (TIKA-3721) DGN parser
[ https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526644#comment-17526644 ] Dan Coldrick commented on TIKA-3721: [~tallison] You're above my ability with this lol. POI is the library to pull stuff out of Microsoft docx, xlsx etc? Looks like these might be similar in that they are zipped, but I'm not 100% sure how this all works; I can just see the patterns. In Peeveen's GitHub repo he has loads of examples. V7 doesn't look to work the same as V8 (where V8 is a zip, a bit like docx/xlsx etc): [https://github.com/peeveen/tika-dgn-detector/tree/master/src/test/resources/dgn/dgn8] [https://github.com/peeveen/tika-dgn-detector/tree/master/src/test/resources/dgn/dgn7] > DGN parser > -- > > Key: TIKA-3721 > URL: https://issues.apache.org/jira/browse/TIKA-3721 > Project: Tika > Issue Type: New Feature > Components: parser >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Priority: Minor > Attachments: image-2022-04-22-20-00-45-704.png, > image-2022-04-22-20-01-09-564.png, image-2022-04-22-20-02-24-180.png > > > Does anyone have any experience with the DGN file format by MicroStation? I > see TIKA doesn't have a parser so would it be possible to create one? > https://docs.fileformat.com/cad/dgn/ -- This message was sent by Atlassian Jira (v8.20.7#820007)
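One way to check the "V8 is a container, V7 is not" pattern without opening a hex editor is to look at the first bytes of the file. The sketch below is only an illustration, not Tika's detector: it assumes a V8 file starts with either the standard OLE2 compound-file signature (the container format POI's POIFS classes read) or the ZIP local-file signature, and treats anything else as "other". The class and method names are made up for this example.

```java
// Hypothetical helper, not part of Tika or POI.
final class ContainerSniffer {

    enum Kind { OLE2, ZIP, OTHER }

    // Standard 8-byte OLE2 compound-file signature.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Standard ZIP local-file-header signature: "PK\003\004".
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies a file by the first bytes of its header. */
    static Kind classify(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return Kind.ZIP;
        }
        return Kind.OTHER;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // In-memory sample headers; a real check would read the first 8 bytes of a file.
        byte[] ole2 = { (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1, 0, 0 };
        byte[] zip = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00 };
        System.out.println(classify(ole2));
        System.out.println(classify(zip));
        System.out.println(classify(new byte[] { 0x08, 0x09 }));
    }
}
```

If the V8 samples come back as OLE2 rather than ZIP, that would explain why they open in tools that understand Office-style containers even though they are not ZIP archives in the docx/xlsx sense.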
[jira] [Commented] (TIKA-3721) DGN parser
[ https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526651#comment-17526651 ] Dan Coldrick commented on TIKA-3721: Thanks Tim, I might have a play at the weekend if I'm bored. Have a nice weekend. > DGN parser > -- > > Key: TIKA-3721 > URL: https://issues.apache.org/jira/browse/TIKA-3721 > Project: Tika > Issue Type: New Feature > Components: parser >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Priority: Minor > Attachments: image-2022-04-22-20-00-45-704.png, > image-2022-04-22-20-01-09-564.png, image-2022-04-22-20-02-24-180.png > > > Does anyone have any experience with the DGN file format by MicroStation? I > see TIKA doesn't have a parser so would it be possible to create one? > https://docs.fileformat.com/cad/dgn/ -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3721) DGN parser
[ https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526803#comment-17526803 ] Dan Coldrick commented on TIKA-3721: Is this along the right lines?
{code:java}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.tika.parser.dgn;

import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Set;

import org.apache.commons.io.IOUtils;
import org.apache.commons.io.input.CloseShieldInputStream;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.microsoft.SummaryExtractor;
import org.apache.tika.sax.XHTMLContentHandler;

/**
 * DGN (CAD Drawing) parser. This is a very basic parser, which just looks for
 * bits of the headers.
 */
public class DGNParser extends AbstractParser {

    private static final long serialVersionUID = 311571157668507304L;

    private static final MediaType TYPE = MediaType.image("vnd.dgn");

    @Override
    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return Collections.singleton(TYPE);
    }

    @Override
    public void parse(InputStream stream, ContentHandler handler, Metadata metadata,
                      ParseContext context) throws IOException, SAXException, TikaException {
        XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
        xhtml.startDocument();

        SummaryExtractor summaryExtractor = new SummaryExtractor(metadata);
        final DirectoryNode root;
        TikaInputStream tstream = TikaInputStream.cast(stream);
        POIFSFileSystem mustCloseFs = null;
        try {
            if (tstream == null) {
                // Plain stream: we own the file system and must close it ourselves
                mustCloseFs = new POIFSFileSystem(new CloseShieldInputStream(stream));
                root = mustCloseFs.getRoot();
            } else {
                // Reuse a container that an earlier stage (e.g. the detector) already opened
                final Object container = tstream.getOpenContainer();
                if (container instanceof POIFSFileSystem) {
                    root = ((POIFSFileSystem) container).getRoot();
                } else if (container instanceof DirectoryNode) {
                    root = (DirectoryNode) container;
                } else {
                    POIFSFileSystem fs = null;
                    if (tstream.hasFile()) {
                        fs = new POIFSFileSystem(tstream.getFile(), true);
                    } else {
                        fs = new POIFSFileSystem(new CloseShieldInputStream(tstream));
                    }
                    // tstream will close the fs, no need to close this below
                    tstream.setOpenContainer(fs);
                    root = fs.getRoot();
                }
            }
            // Copy the summary information properties into the metadata
            summaryExtractor.parseSummaries(root);
        } finally {
            IOUtils.closeQuietly(mustCloseFs);
        }
        xhtml.endDocument();
    }
}
{code}
I know I'm not handling V7s yet, but it does appear to output V8 metadata at least. If we have it in its own parser, there is the option to extend it for V7s? Again, I'm not really a proper Java developer and can just hack my way around to get stuff working, so any feedback would be good.
> DGN parser > -- > > Key: TIKA-3721 > URL: https://issues.apache.org/jira/browse/TIKA-3721 > Project: Tika > Issue Type: New Feature > Components: parser >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Priority: Minor > Attachments: Screenshot from 2022-04-22 16-03-44.png, > dgn8s-dumped.txt, image-2022-04-22-20-00-45-704.png, > image-2022-04-22-20-01-09-564.png, image-2022-04-22-20-02-24-180.png > > > Does anyone have any experience with the DGN file format by MicroStation? I > see TIKA doesn't have a parser so would it be possible to create one? > https://doc
[jira] [Commented] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)
[ https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527706#comment-17527706 ] Dan Coldrick commented on TIKA-3725: [~tallison] I see you've got some responses from the CXF guys :) Great news > Add Authorization to Tika Server (Suggest Basic to start off with) > -- > > Key: TIKA-3725 > URL: https://issues.apache.org/jira/browse/TIKA-3725 > Project: Tika > Issue Type: New Feature > Components: tika-server >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Priority: Minor > > It would be good to get some Authentication/Authorization added to Tika server > to add another layer of security around the Tika Server REST > service. > This could become a rabbit hole given the number of options available around > Authentication/Authorization (OAuth, OpenID etc), so I suggest that, as a starter, > Basic Auth is added. > For how to store user(s)/passwords, I suggest looking at how other Apache products do > the same. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)
[ https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527706#comment-17527706 ] Dan Coldrick edited comment on TIKA-3725 at 4/25/22 6:34 PM: - [~tallison] I see you've got some responses from the CXF guys :) Great news. Quick question: is that thread only for Apache people, i.e. not open to the public? was (Author: monkmachine): [~tallison] I see you've got some responses from the CXF guys :) Great news > Add Authorization to Tika Server (Suggest Basic to start off with) > -- > > Key: TIKA-3725 > URL: https://issues.apache.org/jira/browse/TIKA-3725 > Project: Tika > Issue Type: New Feature > Components: tika-server >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Priority: Minor > > It would be good to get some Authentication/Authorization added to Tika server > to add another layer of security around the Tika Server REST > service. > This could become a rabbit hole given the number of options available around > Authentication/Authorization (OAuth, OpenID etc), so I suggest that, as a starter, > Basic Auth is added. > For how to store user(s)/passwords, I suggest looking at how other Apache products do > the same. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527711#comment-17527711 ] Dan Coldrick commented on TIKA-3719: [~tallison] Just stick something in confluence, that's where I get all my info (as user) from about tika server. > Tika Server Ability to Run HTTPs > > > Key: TIKA-3719 > URL: https://issues.apache.org/jira/browse/TIKA-3719 > Project: Tika > Issue Type: Wish > Components: tika-server >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Assignee: Tim Allison >Priority: Minor > Fix For: 2.4.0 > > Attachments: image-2022-04-21-18-52-50-706.png, localhost.jks > > > We need the ability to run TIKA server as a https end point, I can't see > anything in the config that allows for this. > Looks like I'm not the only one: > [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https] > > If anyone can point to some documentation on how it might be possible it > would be really appreciated. > > Thanks -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527713#comment-17527713 ] Dan Coldrick commented on TIKA-3719: Would also say if you want help with documenting stuff in confluence I'd be happy to help > Tika Server Ability to Run HTTPs > > > Key: TIKA-3719 > URL: https://issues.apache.org/jira/browse/TIKA-3719 > Project: Tika > Issue Type: Wish > Components: tika-server >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Assignee: Tim Allison >Priority: Minor > Fix For: 2.4.0 > > Attachments: image-2022-04-21-18-52-50-706.png, localhost.jks > > > We need the ability to run TIKA server as a https end point, I can't see > anything in the config that allows for this. > Looks like I'm not the only one: > [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https] > > If anyone can point to some documentation on how it might be possible it > would be really appreciated. > > Thanks -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527783#comment-17527783 ] Dan Coldrick commented on TIKA-3719: Hi [~tallison] Yes, happy with beta. It would be really good if the CXF guys can review it (which it looks like they are going to) and extend it to take cxf.xml files with all that entails. Honestly can't thank you enough for the help you've provided. :) My Confluence name is Dan Coldrick > Tika Server Ability to Run HTTPs > > > Key: TIKA-3719 > URL: https://issues.apache.org/jira/browse/TIKA-3719 > Project: Tika > Issue Type: Wish > Components: tika-server >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Assignee: Tim Allison >Priority: Minor > Fix For: 2.4.0 > > Attachments: image-2022-04-21-18-52-50-706.png, localhost.jks > > > We need the ability to run TIKA server as a https end point, I can't see > anything in the config that allows for this. > Looks like I'm not the only one: > [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https] > > If anyone can point to some documentation on how it might be possible it > would be really appreciated. > > Thanks -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527784#comment-17527784 ] Dan Coldrick commented on TIKA-3719: Also is it possible to link to confluence from the main tika page and make it stand out more? Confluence has a lot more detail than the main tika page which I've always found to be more useful (might also help I'm a massive fan of confluence) > Tika Server Ability to Run HTTPs > > > Key: TIKA-3719 > URL: https://issues.apache.org/jira/browse/TIKA-3719 > Project: Tika > Issue Type: Wish > Components: tika-server >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Assignee: Tim Allison >Priority: Minor > Fix For: 2.4.0 > > Attachments: image-2022-04-21-18-52-50-706.png, localhost.jks > > > We need the ability to run TIKA server as a https end point, I can't see > anything in the config that allows for this. > Looks like I'm not the only one: > [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https] > > If anyone can point to some documentation on how it might be possible it > would be really appreciated. > > Thanks -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files
Dan Coldrick created TIKA-3731: -- Summary: Tika CAD DWG reader not pulling meta data from new cad files Key: TIKA-3731 URL: https://issues.apache.org/jira/browse/TIKA-3731 Project: Tika Issue Type: Bug Components: metadata Affects Versions: 2.3.0 Reporter: Dan Coldrick The Tika DWG reader only pulls metadata from drawing formats up to AC1024 (see code snippet), but it looks like AC1027 and AC1032 can also be read by the same get2007and2010Props metadata extractor.
{code:java}
switch (version) {
    case "AC1015":
        metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
        if (skipTo2000PropertyInfoSection(stream, header)) {
            get2000Props(stream, metadata, xhtml);
        }
        break;
    case "AC1018":
        metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
        if (skipToPropertyInfoSection(stream, header)) {
            get2004Props(stream, metadata, xhtml);
        }
        break;
    case "AC1021":
    case "AC1024":
        metadata.set(Metadata.CONTENT_TYPE, TYPE.toString());
        if (skipToPropertyInfoSection(stream, header)) {
            get2007and2010Props(stream, metadata, xhtml);
        }
        break;
    default:
        throw new TikaException("Unsupported AutoCAD drawing version: " + version);
}
{code}
Looks like the case statement just needs extending, and example files need to be created for AC1027/AC1032. The current AutoCAD drawing version codes can be found here: https://knowledge.autodesk.com/support/autocad/learn-explore/caas/sfdcarticles/sfdcarticles/drawing-version-codes-for-autocad.html -- This message was sent by Atlassian Jira (v8.20.7#820007)
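To make the proposed change concrete, here is a minimal standalone sketch of what "extending the case statement" could look like. It is not the Tika source: the class and method names are invented, and it only returns the name of the extractor that each version code would be routed to. Routing AC1027 and AC1032 to get2007and2010Props is the assumption made in the issue above and would need confirming against real sample files.

```java
// Hypothetical stand-in for the version dispatch in the DWG parser.
final class DwgVersionDispatch {

    /** Returns the name of the property extractor a version code would use. */
    static String extractorFor(String version) {
        switch (version) {
            case "AC1015":
                return "get2000Props";
            case "AC1018":
                return "get2004Props";
            case "AC1021":
            case "AC1024":
            case "AC1027": // assumption: same property layout as AC1021/AC1024
            case "AC1032": // assumption: same property layout as AC1021/AC1024
                return "get2007and2010Props";
            default:
                throw new IllegalArgumentException(
                        "Unsupported AutoCAD drawing version: " + version);
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] {"AC1015", "AC1018", "AC1024", "AC1027", "AC1032"}) {
            System.out.println(v + " -> " + extractorFor(v));
        }
    }
}
```

In the real parser the two new labels would simply fall through into the existing AC1021/AC1024 branch, so no new extraction code would be needed if the assumption holds.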
[jira] [Commented] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files
[ https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527786#comment-17527786 ] Dan Coldrick commented on TIKA-3731: related to https://issues.apache.org/jira/browse/TIKA-1735 but that looked to also try to include a parser so thought it would be good to split the two issues and get the bug fixed. > Tika CAD DWG reader not pulling meta data from new cad files > > > Key: TIKA-3731 > URL: https://issues.apache.org/jira/browse/TIKA-3731 > Project: Tika > Issue Type: Bug > Components: metadata >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Priority: Major > > > The tika DWG reader is only pulling meta data from up to drawing format > AC1024 (see code snippet) where it looks to be AC1027 & AC1032 can also be > read from the same get2007and2010Props meta data extractor. > {code:java} > switch (version) { > case "AC1015": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipTo2000PropertyInfoSection(stream, header)) { > get2000Props(stream, metadata, xhtml); > } > break; > case "AC1018": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipToPropertyInfoSection(stream, header)) { > get2004Props(stream, metadata, xhtml); > } > break; > case "AC1021": > case "AC1024": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipToPropertyInfoSection(stream, header)) { > get2007and2010Props(stream, metadata, xhtml); > } > break; > default: > throw new TikaException("Unsupported AutoCAD drawing version: > " + version); > } {code} > Looks like the case statement just needs extending and for examples files to > be created for AC1027/AC1032. > Current versions of auto cad can be found here: > https://knowledge.autodesk.com/support/autocad/learn-explore/caas/sfdcarticles/sfdcarticles/drawing-version-codes-for-autocad.html > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files
[ https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Coldrick updated TIKA-3731: --- Attachment: testDWG-AC1027.dwg > Tika CAD DWG reader not pulling meta data from new cad files > > > Key: TIKA-3731 > URL: https://issues.apache.org/jira/browse/TIKA-3731 > Project: Tika > Issue Type: Bug > Components: metadata >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Priority: Major > Attachments: AutoCAD 2018 format (1).dwg, testDWG-AC1027.dwg > > > > The tika DWG reader is only pulling meta data from up to drawing format > AC1024 (see code snippet) where it looks to be AC1027 & AC1032 can also be > read from the same get2007and2010Props meta data extractor. > {code:java} > switch (version) { > case "AC1015": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipTo2000PropertyInfoSection(stream, header)) { > get2000Props(stream, metadata, xhtml); > } > break; > case "AC1018": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipToPropertyInfoSection(stream, header)) { > get2004Props(stream, metadata, xhtml); > } > break; > case "AC1021": > case "AC1024": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipToPropertyInfoSection(stream, header)) { > get2007and2010Props(stream, metadata, xhtml); > } > break; > default: > throw new TikaException("Unsupported AutoCAD drawing version: > " + version); > } {code} > Looks like the case statement just needs extending and for examples files to > be created for AC1027/AC1032. > Current versions of auto cad can be found here: > https://knowledge.autodesk.com/support/autocad/learn-explore/caas/sfdcarticles/sfdcarticles/drawing-version-codes-for-autocad.html > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files
[ https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Coldrick updated TIKA-3731: --- Attachment: AutoCAD 2018 format (1).dwg > Tika CAD DWG reader not pulling meta data from new cad files > > > Key: TIKA-3731 > URL: https://issues.apache.org/jira/browse/TIKA-3731 > Project: Tika > Issue Type: Bug > Components: metadata >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Priority: Major > Attachments: AutoCAD 2018 format (1).dwg, testDWG-AC1027.dwg > > > > The tika DWG reader is only pulling meta data from up to drawing format > AC1024 (see code snippet) where it looks to be AC1027 & AC1032 can also be > read from the same get2007and2010Props meta data extractor. > {code:java} > switch (version) { > case "AC1015": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipTo2000PropertyInfoSection(stream, header)) { > get2000Props(stream, metadata, xhtml); > } > break; > case "AC1018": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipToPropertyInfoSection(stream, header)) { > get2004Props(stream, metadata, xhtml); > } > break; > case "AC1021": > case "AC1024": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipToPropertyInfoSection(stream, header)) { > get2007and2010Props(stream, metadata, xhtml); > } > break; > default: > throw new TikaException("Unsupported AutoCAD drawing version: > " + version); > } {code} > Looks like the case statement just needs extending and for examples files to > be created for AC1027/AC1032. > Current versions of auto cad can be found here: > https://knowledge.autodesk.com/support/autocad/learn-explore/caas/sfdcarticles/sfdcarticles/drawing-version-codes-for-autocad.html > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files
[ https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527787#comment-17527787 ] Dan Coldrick commented on TIKA-3731: I've attached a AC1027 and AC1032 dwg to extend the tests. > Tika CAD DWG reader not pulling meta data from new cad files > > > Key: TIKA-3731 > URL: https://issues.apache.org/jira/browse/TIKA-3731 > Project: Tika > Issue Type: Bug > Components: metadata >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Priority: Major > Attachments: AutoCAD 2018 format (1).dwg, testDWG-AC1027.dwg > > > > The tika DWG reader is only pulling meta data from up to drawing format > AC1024 (see code snippet) where it looks to be AC1027 & AC1032 can also be > read from the same get2007and2010Props meta data extractor. > {code:java} > switch (version) { > case "AC1015": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipTo2000PropertyInfoSection(stream, header)) { > get2000Props(stream, metadata, xhtml); > } > break; > case "AC1018": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipToPropertyInfoSection(stream, header)) { > get2004Props(stream, metadata, xhtml); > } > break; > case "AC1021": > case "AC1024": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipToPropertyInfoSection(stream, header)) { > get2007and2010Props(stream, metadata, xhtml); > } > break; > default: > throw new TikaException("Unsupported AutoCAD drawing version: > " + version); > } {code} > Looks like the case statement just needs extending and for examples files to > be created for AC1027/AC1032. > Current versions of auto cad can be found here: > https://knowledge.autodesk.com/support/autocad/learn-explore/caas/sfdcarticles/sfdcarticles/drawing-version-codes-for-autocad.html > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528201#comment-17528201 ] Dan Coldrick commented on TIKA-3719: [~tallison] https://issues.apache.org/jira/browse/TIKA-3737 > Tika Server Ability to Run HTTPs > > > Key: TIKA-3719 > URL: https://issues.apache.org/jira/browse/TIKA-3719 > Project: Tika > Issue Type: Wish > Components: tika-server >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Assignee: Tim Allison >Priority: Minor > Fix For: 2.4.0 > > Attachments: image-2022-04-21-18-52-50-706.png, localhost.jks > > > We need the ability to run TIKA server as a https end point, I can't see > anything in the config that allows for this. > Looks like I'm not the only one: > [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https] > > If anyone can point to some documentation on how it might be possible it > would be really appreciated. > > Thanks -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (TIKA-3737) Update Main Apache Tika Website with link to Tika confluence
Dan Coldrick created TIKA-3737: -- Summary: Update Main Apache Tika Website with link to Tika confluence Key: TIKA-3737 URL: https://issues.apache.org/jira/browse/TIKA-3737 Project: Tika Issue Type: Task Components: site Affects Versions: 2.3.0 Reporter: Dan Coldrick Update Main Apache Tika Website with link to Tika confluence, there is more detail on the confluence pages than on the main site. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3731) Tika CAD DWG reader not pulling meta data from new cad files
[ https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528205#comment-17528205 ] Dan Coldrick commented on TIKA-3731: [~tallison] Fine by me ref the custom metadata keys > Tika CAD DWG reader not pulling meta data from new cad files > > > Key: TIKA-3731 > URL: https://issues.apache.org/jira/browse/TIKA-3731 > Project: Tika > Issue Type: Improvement > Components: metadata >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Priority: Minor > Fix For: 2.4.0 > > Attachments: AutoCAD 2018 format (1).dwg, testDWG-AC1027.dwg > > > > The tika DWG reader is only pulling meta data from up to drawing format > AC1024 (see code snippet) where it looks to be AC1027 & AC1032 can also be > read from the same get2007and2010Props meta data extractor. > {code:java} > switch (version) { > case "AC1015": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipTo2000PropertyInfoSection(stream, header)) { > get2000Props(stream, metadata, xhtml); > } > break; > case "AC1018": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipToPropertyInfoSection(stream, header)) { > get2004Props(stream, metadata, xhtml); > } > break; > case "AC1021": > case "AC1024": > metadata.set(Metadata.CONTENT_TYPE, TYPE.toString()); > if (skipToPropertyInfoSection(stream, header)) { > get2007and2010Props(stream, metadata, xhtml); > } > break; > default: > throw new TikaException("Unsupported AutoCAD drawing version: > " + version); > } {code} > Looks like the case statement just needs extending and for examples files to > be created for AC1027/AC1032. > Current versions of auto cad can be found here: > https://knowledge.autodesk.com/support/autocad/learn-explore/caas/sfdcarticles/sfdcarticles/drawing-version-codes-for-autocad.html > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs
[ https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528203#comment-17528203 ] Dan Coldrick commented on TIKA-3719: Thanks for the confluence update writes as well :) > Tika Server Ability to Run HTTPs > > > Key: TIKA-3719 > URL: https://issues.apache.org/jira/browse/TIKA-3719 > Project: Tika > Issue Type: Wish > Components: tika-server >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Assignee: Tim Allison >Priority: Minor > Fix For: 2.4.0 > > Attachments: image-2022-04-21-18-52-50-706.png, localhost.jks > > > We need the ability to run TIKA server as a https end point, I can't see > anything in the config that allows for this. > Looks like I'm not the only one: > [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https] > > If anyone can point to some documentation on how it might be possible it > would be really appreciated. > > Thanks -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3721) DGN parser
[ https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528242#comment-17528242 ] Dan Coldrick commented on TIKA-3721: [~tallison] can you just check that the XHTML comes out of my code? I wasn't sure how that worked. We probably also want to add a test to check it works? Like I said in my comment, I was really asking for feedback lol :) Can I have a quick check of the process for updating Tika's Git repo? Is it roughly: * Create a Jira * Create a branch using the Jira number * Create a pull request using the new branch * Someone who knows what they are doing either approves the code, rejects the code or fixes the code so it can be merged into main > DGN parser > -- > > Key: TIKA-3721 > URL: https://issues.apache.org/jira/browse/TIKA-3721 > Project: Tika > Issue Type: New Feature > Components: parser >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Priority: Minor > Fix For: 2.4.0 > > Attachments: Screenshot from 2022-04-22 16-03-44.png, > dgn8s-dumped.txt, image-2022-04-22-20-00-45-704.png, > image-2022-04-22-20-01-09-564.png, image-2022-04-22-20-02-24-180.png > > > Does anyone have any experience with the DGN file format by MicroStation? I > see TIKA doesn't have a parser so would it be possible to create one? > https://docs.fileformat.com/cad/dgn/ -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3721) DGN parser
[ https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528262#comment-17528262 ] Dan Coldrick commented on TIKA-3721: Thanks Tim > DGN parser > -- > > Key: TIKA-3721 > URL: https://issues.apache.org/jira/browse/TIKA-3721 > Project: Tika > Issue Type: New Feature > Components: parser >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Priority: Minor > Fix For: 2.4.0 > > Attachments: Screenshot from 2022-04-22 16-03-44.png, > dgn8s-dumped.txt, image-2022-04-22-20-00-45-704.png, > image-2022-04-22-20-01-09-564.png, image-2022-04-22-20-02-24-180.png > > > Does anyone have any experience with the DGN file format by MicroStation? I > see TIKA doesn't have a parser so would it be possible to create one? > https://docs.fileformat.com/cad/dgn/ -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
Dan Coldrick created TIKA-3742: -- Summary: Advice around DGN7 parser and whether to add to TIKA Key: TIKA-3742 URL: https://issues.apache.org/jira/browse/TIKA-3742 Project: Tika Issue Type: Task Components: parser Reporter: Dan Coldrick Hi [~tallison] & whoever else. I managed to compile the C/C++ library ([http://dgnlib.maptools.org/]) for DGN7, which produces a dgndump.exe that will dump all the data from the DGN. From my initial testing it looks pretty good. Do you think it would be worth adding this, or should it just be kept as a custom parser rather than in the main source code? It's under the MIT license. I've attached the exe (zipped), a copy of the output from the dump and my very dirty test code calling the exe (I was only interested in the strings, so I am only pulling those into a string array at the moment to check it's pulling out the correct data). -- This message was sent by Atlassian Jira (v8.20.7#820007)
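The "only pull out the strings" step described above can be tried on its own, without running the exe. The sketch below assumes the dump prints each text value on its own line in the form `string = "..."` with some leading indentation; the sample dump text is invented for illustration and is not real dgndump output, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for pulling text values out of a dgndump-style text dump.
final class DgnDumpStrings {

    // Matches lines such as:   string = "some text"
    private static final Pattern STRING_LINE =
            Pattern.compile("^\\s*string = \"(.*)\"\\s*$");

    /** Returns the trimmed contents of every string line in the dump. */
    static List<String> extract(String dumpOutput) {
        List<String> result = new ArrayList<>();
        for (String line : dumpOutput.split("\\R")) {
            Matcher m = STRING_LINE.matcher(line);
            if (m.matches()) {
                result.add(m.group(1).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample; the real dump layout may differ.
        String sample = "Element: text\n"
                + "  origin = (0,0)\n"
                + "  string = \"PUMP HOUSE\"\n"
                + "Element: text\n"
                + "  string = \" Rev A \"\n";
        System.out.println(extract(sample));
    }
}
```

Matching on a pattern rather than a fixed character offset means the extraction keeps working if the indentation in the dump changes, which seems a safer default while the output format is still being explored.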
[jira] [Updated] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Coldrick updated TIKA-3742: --- Attachment: DGN.zip > Advice around DGN7 parser and whether to add to TIKA > > > Key: TIKA-3742 > URL: https://issues.apache.org/jira/browse/TIKA-3742 > Project: Tika > Issue Type: Task > Components: parser >Reporter: Dan Coldrick >Priority: Minor > Attachments: DGN.zip > > > Hi [~tallison] & Whoever else. > I managed to compile the C/C++ library ([http://dgnlib.maptools.org/)] for > DGN7 which produces an dgndump.exe which will dump all the data from the DGN. > From my initial testing it looks pretty good. > Would you guys think it was worth adding this or just keep it as a custom > parser rather than in the main source code? It's under MIT license. I've > attached the exe (zipped), a copy of the output from the dump and my very > dirty testing calling the exe (my code I was only interested in the Strings > so am only pulling those into a string array at the moment to check it's > pulling out the correct data). -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529023#comment-17529023 ] Dan Coldrick commented on TIKA-3742:
{code:java}
package org.apache.tika.parser.dgn;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.apache.commons.compress.utils.IOUtils;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.ParseContext;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public class DGN7Parser extends AbstractParser {

    private static final long serialVersionUID = 7609445358323296566L;

    private static final Set<MediaType> SUPPORTED_TYPES =
            Collections.singleton(MediaType.image("vnd.dgn; version=7"));

    @Override
    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return SUPPORTED_TYPES;
    }

    @Override
    public void parse(InputStream stream, ContentHandler handler, Metadata metadata,
                      ParseContext context) throws IOException, TikaException, SAXException {
        // Quick test only: hard-coded paths, and results go to stdout, not to the handler.
        File file = new File("G:/temp/Drawing.dgn");
        try (OutputStream outputStream = new FileOutputStream(file)) {
            IOUtils.copy(stream, outputStream);
        }

        Runtime rt = Runtime.getRuntime();
        String[] commands = {"C:\\Users\\monkm\\DGN\\dgndump.exe", "-r", "1", "G:\\temp\\Drawing.dgn"};
        Process proc = rt.exec(commands);

        BufferedReader stdInput = new BufferedReader(new InputStreamReader(proc.getInputStream()));
        BufferedReader stdError = new BufferedReader(new InputStreamReader(proc.getErrorStream()));

        // Collect the text of every string = "..." line in the dump,
        // regardless of how far the line is indented.
        String prefix = "string = \"";
        List<String> ar = new ArrayList<>();
        String s = null;
        while ((s = stdInput.readLine()) != null) {
            String t = s.trim();
            if (t.length() > prefix.length() && t.startsWith(prefix) && t.endsWith("\"")) {
                ar.add(t.substring(prefix.length(), t.length() - 1).trim());
            }
            System.out.println(s);
        }
        System.out.println(ar);
        while ((s = stdError.readLine()) != null) {
            System.out.println(s);
        }
    }
}
{code}
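The string-scraping step in the test parser above can also be pulled out into a small method that is testable without the exe. This is only a sketch, not part of the posted code: the class name `DgnDumpStrings` and both method names are made up for illustration, the `-r 1` arguments are simply copied from the comment above, and ISO-8859-1 is an assumed encoding for the dump output. `ProcessBuilder.redirectErrorStream(true)` merges stderr into stdout so that only one stream has to be drained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class DgnDumpStrings {

    // Prefix of the dump lines that carry text element content.
    private static final String PREFIX = "string = \"";

    /** Returns the quoted text of every string = "..." line in the dump. */
    static List<String> extract(BufferedReader dump) throws IOException {
        List<String> out = new ArrayList<>();
        String line;
        while ((line = dump.readLine()) != null) {
            String t = line.trim(); // ignore indentation
            if (t.length() > PREFIX.length() && t.startsWith(PREFIX) && t.endsWith("\"")) {
                out.add(t.substring(PREFIX.length(), t.length() - 1).trim());
            }
        }
        return out;
    }

    /** Runs the dump tool on one file (paths are supplied by the caller). */
    static List<String> run(String exePath, String dgnPath)
            throws IOException, InterruptedException {
        Process proc = new ProcessBuilder(exePath, "-r", "1", dgnPath)
                .redirectErrorStream(true) // stderr is merged into stdout
                .start();
        // Assumption: the dump is single-byte text; adjust the charset if not.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(proc.getInputStream(), StandardCharsets.ISO_8859_1))) {
            List<String> strings = extract(reader);
            proc.waitFor();
            return strings;
        }
    }
}
```

Keeping the parsing separate from the process handling means the extraction can be checked against a saved dump such as ExampleOutput.txt.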
[jira] [Updated] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Coldrick updated TIKA-3742: --- Attachment: ExampleOutput.txt
[jira] [Comment Edited] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529023#comment-17529023 ] Dan Coldrick edited comment on TIKA-3742 at 4/27/22 8:09 PM:
{code:java}
package org.apache.tika.parser.dgn;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.apache.commons.compress.utils.IOUtils;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.ParseContext;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public class DGN7Parser extends AbstractParser {

    private static final long serialVersionUID = 7609445358323296566L;

    private static final Set<MediaType> SUPPORTED_TYPES =
            Collections.singleton(MediaType.image("vnd.dgn; version=7"));

    @Override
    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return SUPPORTED_TYPES;
    }

    @Override
    public void parse(InputStream stream, ContentHandler handler, Metadata metadata,
                      ParseContext context) throws IOException, TikaException, SAXException {
        // Quick test only: hard-coded paths, and results go to stdout, not to the handler.
        File file = new File("G:/temp/Drawing.dgn");
        try (OutputStream outputStream = new FileOutputStream(file)) {
            IOUtils.copy(stream, outputStream);
        }

        Runtime rt = Runtime.getRuntime();
        String[] commands = {"C:\\Users\\monkm\\DGN\\dgndump.exe", "-r", "1", "G:\\temp\\Drawing.dgn"};
        Process proc = rt.exec(commands);

        BufferedReader stdInput = new BufferedReader(new InputStreamReader(proc.getInputStream()));
        BufferedReader stdError = new BufferedReader(new InputStreamReader(proc.getErrorStream()));

        // Collect the text of every string = "..." line in the dump,
        // regardless of how far the line is indented.
        String prefix = "string = \"";
        List<String> ar = new ArrayList<>();
        String s = null;
        while ((s = stdInput.readLine()) != null) {
            String t = s.trim();
            if (t.length() > prefix.length() && t.startsWith(prefix) && t.endsWith("\"")) {
                ar.add(t.substring(prefix.length(), t.length() - 1).trim());
            }
            System.out.println(s);
        }
        System.out.println(ar);
        while ((s = stdError.readLine()) != null) {
            System.out.println(s);
        }
    }
}
{code}
[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529033#comment-17529033 ] Dan Coldrick commented on TIKA-3742: [~nick] Apologies, I'm new to all this. Can you point me at some documentation? With external parsers, I assume they don't live in the main Tika Git repository and that there is a separate repo just for that parser, which users can add in? Or does it work differently?
[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529037#comment-17529037 ] Dan Coldrick commented on TIKA-3742: [~tallison] I struggle to get out of bed in the morning, let alone read C/C++ and convert it to Java. I can make out what it's doing, but I have no idea how it does the byte reading, which is really how the underlying bits work. I can see the element types in the file, but again I have no idea how they are mapped to the bytes; I've never had any dealings with C/C++. I was just happy that, after an hour messing about on Google, I managed to get it to compile (on Windows) :D
[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529042#comment-17529042 ] Dan Coldrick commented on TIKA-3742: [~tallison] got a link to that or an example?
[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529044#comment-17529044 ] Dan Coldrick commented on TIKA-3742: lol, you posted before I responded
[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529409#comment-17529409 ] Dan Coldrick commented on TIKA-3742: [~nick] I can have a go, although I can't get the following line to compile in Eclipse: byte[] str = is.readNBytes(len);
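One possible cause of the compile error above, offered as an assumption since the thread does not show the Eclipse settings: `InputStream.readNBytes(int)` only exists from Java 11 onwards, so it will not compile if the project is built at Java 8 compliance. A Java 8 compatible stand-in can be sketched with `DataInputStream.readFully`; the class and method names here (`ReadExactly`, `readExactly`) are hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadExactly {

    /**
     * Reads exactly len bytes from the stream.
     * Unlike readNBytes(int), which returns a shorter array at end of stream,
     * readFully throws java.io.EOFException if fewer than len bytes remain.
     */
    static byte[] readExactly(InputStream is, int len) throws IOException {
        byte[] buf = new byte[len];
        // DataInputStream does no buffering of its own, so wrapping the
        // stream here does not consume bytes beyond the ones requested.
        new DataInputStream(is).readFully(buf);
        return buf;
    }
}
```

The difference in end-of-stream behaviour matters for truncated files: this version fails loudly instead of returning a short string.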
[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529667#comment-17529667 ] Dan Coldrick commented on TIKA-3742: [~nick] I've made a start today, which I can share at some point tomorrow (been to the pub tonight lol, so it will have to wait till tomorrow). Are you OK if I lean on you two for help? I'd rather write something myself which you can rip apart so I can learn something. I've learnt a lot in the last week or so already :) I also think there is some metadata in there somewhere which we should be able to pull out :)
[jira] [Comment Edited] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529667#comment-17529667 ] Dan Coldrick edited comment on TIKA-3742 at 4/29/22 6:08 AM: [~nick] I've made a start today, which I can share at some point tomorrow. Are you OK if I lean on you two for help? I'd rather write something myself which you can rip apart so I can learn something. I've learnt a lot in the last week or so already :) I also think there is some metadata in there somewhere which we should be able to pull out :)
[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530448#comment-17530448 ] Dan Coldrick commented on TIKA-3742:

Hi [~nick], I'm struggling. I can see there are deleted elements, which I want to exclude from the parser output, but I can't work out how to detect them in Java. I can see them come out of dgndump with a DELETED attribute:
{code:java}
Element:Text Level:27 id:19707 (DELETED)
  offset=1959730 size=74 bytes
  graphic_group:0 color:0 weight:0 style:0
  properties=1536,MODIFIED,NEW
  origin=(963453.83000,96730.11000), rotation=272.763292
  font=1, just=2, length_mult=119.99, height_mult=119.99
  string = "HARVARD RD"
{code}
I can see from the core element structure that the flag should be there:
{code:java}
The first 18 words of an element in the design file are its fixed header --
containing the element type, level, words to follow, and range information.
The C declaration for this header is as follows

typedef struct {
    unsigned level:6 ;        /* level element is on */
    unsigned :1 ;             /* reserved */
    unsigned complex:1 ;      /* component of complex elem.*/
    unsigned type:7 ;         /* type of element */
    unsigned deleted:1 ;      /* set if element is deleted */
    unsigned short words ;    /* words to follow in element */
    unsigned long xlow ;      /* element range - low */
    unsigned long ylow ;
    unsigned long zlow ;
    unsigned long xhigh ;     /* element range - high */
    unsigned long yhigh ;
    unsigned long zhigh ;
} Elm_hdr
{code}
You get the type out (which I think is from the same header structure):
{code:java}
int h2 = tstream.read();
int type = h2 & 0x7f;
{code}
How do I get the deleted attribute out so I can remove deleted elements from the parsed content?

Also, you mentioned type 37; I don't have any examples that contain type 37 elements.

I've created a fork with some dirty code to test in: [https://github.com/monkmachine/tika/tree/TIKA-3742/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-cad-module/src/main/java/org/apache/tika/parser/dgn]
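The bit fields in the `Elm_hdr` declaration quoted above can be unpacked in Java with masks on the first four bytes of an element. The sketch below is illustrative rather than the parser's actual code: the class name `DgnElementHeader` is invented, and it assumes the header words are stored low byte first, with `level`/`complex` in the first byte and `type`/`deleted` in the second (which matches the `h2 & 0x7f` type extraction in the comment). The total-size formula (4 header bytes plus two bytes per following word) is likewise an assumption, although it agrees with the `size=74 bytes` in the dump above for 35 following words.

```java
final class DgnElementHeader {
    final int level;          // bits 0-5 of the first byte
    final boolean complex;    // bit 7 of the first byte
    final int type;           // bits 0-6 of the second byte
    final boolean deleted;    // bit 7 of the second byte
    final int wordsToFollow;  // 16-bit count, low byte first (assumed)

    private DgnElementHeader(int level, boolean complex, int type,
                             boolean deleted, int wordsToFollow) {
        this.level = level;
        this.complex = complex;
        this.type = type;
        this.deleted = deleted;
        this.wordsToFollow = wordsToFollow;
    }

    /** b0..b3 are the first four bytes of the element, each in 0..255. */
    static DgnElementHeader parse(int b0, int b1, int b2, int b3) {
        return new DgnElementHeader(
                b0 & 0x3f,
                (b0 & 0x80) != 0,
                b1 & 0x7f,
                (b1 & 0x80) != 0,
                b2 | (b3 << 8));
    }

    /** Assumed total element size: 4 header bytes plus 2 bytes per word. */
    int sizeInBytes() {
        return 4 + 2 * wordsToFollow;
    }
}
```

With this in place, skipping deleted elements is a matter of checking `deleted` before handing the element's text to the content handler.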
[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530457#comment-17530457 ] Dan Coldrick commented on TIKA-3742: Is this correct for working out the deletion? If it is I might actually understand how it's working a bit more! {code:java} boolean isdeleted = BigInteger.valueOf(h2).testBit(7); {code}
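For reference, the same check can be written with a plain bit mask. This is only a minimal sketch, assuming (as in the snippets quoted in this thread) that `h2` is the second header byte as returned by `InputStream.read()`, so that its low seven bits hold the element type and its top bit is the deleted flag. The class and method names are hypothetical and are not taken from the fork.

```java
/** Hypothetical helper for decoding the second byte of a DGN v7 element header. */
public final class DgnHeaderBits {

    private DgnHeaderBits() {
    }

    /** Low seven bits of the second header byte: the element type. */
    public static int elementType(int h2) {
        return h2 & 0x7f;
    }

    /** Top bit of the second header byte: set when the element is deleted. */
    public static boolean isDeleted(int h2) {
        // InputStream.read() returns -1 at end of stream; report "not deleted"
        // there instead of letting the sign bits of -1 look like a set flag.
        return h2 >= 0 && (h2 & 0x80) != 0;
    }
}
```

For byte values in the 0 to 255 range this gives the same answer as `BigInteger.valueOf(h2).testBit(7)`, without allocating a `BigInteger` for every element.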
[jira] [Comment Edited] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530457#comment-17530457 ] Dan Coldrick edited comment on TIKA-3742 at 4/30/22 9:32 PM: - Is this correct for working out the deletion? If it is I might actually understand how it's working a bit more! {code:java} boolean isdeleted = BigInteger.valueOf(h2).testBit(7); {code} Edit: That looks to have done the trick. The next issue is that I'm getting different values out compared to dgn dump (and to what I see when I look at the file in MicroStation). An example from 1264t.dgn (which I've attached to JIRA): I get out "INDIAN TRAIL RD]" whereas it should just be "INDIAN TRAIL RD".
[jira] [Comment Edited] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530457#comment-17530457 ] Dan Coldrick edited comment on TIKA-3742 at 4/30/22 9:32 PM: - Is this correct for working out the deletion? If it is I might actually understand how it's working a bit more! {code:java} boolean isdeleted = BigInteger.valueOf(h2).testBit(7); {code} Edit: That looks to have done the trick. The next issue is that I'm getting different values out compared to dgn dump (and to what I see when I look at the file in MicroStation). An example from 1264t.dgn (which I've attached to JIRA): I get out "INDIAN TRAIL RD]" whereas it should just be "INDIAN TRAIL RD". I've got multiple examples of these.
[jira] [Updated] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Coldrick updated TIKA-3742: --- Attachment: 1264t.dgn
[jira] [Commented] (TIKA-1570) Seeking a stop method for better use with Apache Commons Daemon
[ https://issues.apache.org/jira/browse/TIKA-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531396#comment-17531396 ] Dan Coldrick commented on TIKA-1570: [~tallison] Can we bring this PR back to life? I think it would be great to create a windows installer for TIKA server? > Seeking a stop method for better use with Apache Commons Daemon > --- > > Key: TIKA-1570 > URL: https://issues.apache.org/jira/browse/TIKA-1570 > Project: Tika > Issue Type: Improvement > Components: server >Affects Versions: 1.7 >Reporter: Jason Borg >Priority: Minor > > I've got tika-server-1.7.jar from http://tika.apache.org/download.html > I've downloaded v1.0.15 of the Windows binaries for Apache Commons Daemon > from http://commons.apache.org/proper/commons-daemon/binaries.html > I can get Tika started as a service, but I can't determine what to use for a > stop method. > prunsrv.exe //IS//tika-daemon --DisplayName "Tika Daemon" --Classpath > "C:\Tika Service\tika-server-1.7.jar" --StartClass > "org.apache.tika.server.TikaServerCli" --StopClass > "org.apache.tika.server.TikaServerCli" --StartMethod main --StopMethod main > --Description "Tika Daemon Windows Service" --StartMode java --StopMode java > This starts, and works as I'd hope, but when trying to stop the service it > doesn't respond. Obviously org.apache.tika.server.TikaServerCli.main(string[] > args) isn't a suitable stop method, but I'm lost for alternatives. > Using Daemon in exe mode works for start, but gives inconsistent results for > stop. Adding a stop method to Tika would be ideal. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (TIKA-3753) Move Parsers pages in Confluence under the home page
Dan Coldrick created TIKA-3753: -- Summary: Move Parsers pages in Confluence under the home page Key: TIKA-3753 URL: https://issues.apache.org/jira/browse/TIKA-3753 Project: Tika Issue Type: Improvement Components: documentation Reporter: Dan Coldrick Move the Parsers pages in Confluence under the home page. At the moment it's quite hard to find the parsers info as they don't show on the homepage in the tree. Suggest they are moved under the home tree under a new page? -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (TIKA-3753) Move Parsers pages in Confluence under the home page
[ https://issues.apache.org/jira/browse/TIKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Coldrick updated TIKA-3753: --- Attachment: screenshot-1.png
[jira] [Updated] (TIKA-3753) Move Parsers pages in Confluence under the home page
[ https://issues.apache.org/jira/browse/TIKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Coldrick updated TIKA-3753: --- Attachment: screenshot-2.png
[jira] [Commented] (TIKA-3753) Move Parsers pages in Confluence under the home page
[ https://issues.apache.org/jira/browse/TIKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533968#comment-17533968 ] Dan Coldrick commented on TIKA-3753: This is where you come in to Confluence via Google or the website: !screenshot-2.png! As you can see there is no way to see these pages: !screenshot-1.png!
[jira] [Comment Edited] (TIKA-3753) Move Parsers pages in Confluence under the home page
[ https://issues.apache.org/jira/browse/TIKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533968#comment-17533968 ] Dan Coldrick edited comment on TIKA-3753 at 5/9/22 7:09 PM: This is where you come in to Confluence via Google or the website. I suggest a parent page called Parsers (or something similar) is put under the home page, with the child pages detailing the parsers put under it. !screenshot-2.png! As you can see there is no way to see these pages: !screenshot-1.png!
[jira] [Commented] (TIKA-3753) Move Parsers pages in Confluence under the home page
[ https://issues.apache.org/jira/browse/TIKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533981#comment-17533981 ] Dan Coldrick commented on TIKA-3753: [~tallison] you ok if I do it and you can have a quick review?
[jira] [Commented] (TIKA-3753) Move Parsers pages in Confluence under the home page
[ https://issues.apache.org/jira/browse/TIKA-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533999#comment-17533999 ] Dan Coldrick commented on TIKA-3753: [~tallison] New Page created with page tree underneath, I've added what I think should be in there but you may have a different opinion/may want to add other pages? https://cwiki.apache.org/confluence/display/TIKA/Parsers
[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027
[ https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535102#comment-17535102 ] Dan Coldrick commented on TIKA-1735: Hi [~tallison] Sorry to bother you. I've created a fork of this and managed to pull out all of the text from the JSON. Can you have a quick review to see if I'm going along the right lines? I think I've made it so you can pass the config via the parser section of tika-config.xml. There is still a lot to do to finish it off, such as: making it resilient (i.e. checking the exe is there); sorting the formatting (objects of type mtext have formatting codes, see [https://www.cadforum.cz/en/text-formatting-codes-in-mtext-objects-tip8640], of which I've handled a few); adding proper test cases; and adding a timeout around the creation of the .json by dwgread (i.e. kill the spawned process after x amount of time, probably set via config). I'd really appreciate some feedback as, like I said before, this is all quite new to me: [https://github.com/monkmachine/tika/tree/DWGRead] If you don't have time please let me know and I'll go find someone else to annoy ;) Thanks > Unsupported AutoCAD drawing version: AC1027 > --- > > Key: TIKA-1735 > URL: https://issues.apache.org/jira/browse/TIKA-1735 > Project: Tika > Issue Type: Bug >Reporter: Luca Perico >Priority: Major > Attachments: testDWG-AC1027.dwg > > > Trying to index .dwg file (version AC1027) I get 500 error response. 
> " > > 500 name=""QTime"">3 AutoCAD drawing version: AC1027 name=""trace"">org.apache.solr.common.SolrException: > org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing version: > AC1027 > at > org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) > at > 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) > at org.eclipse.jetty.server.Server.handle(Server.java:497) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) > at > org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.tika.exception.TikaException: Unsupported AutoCAD > drawing version: AC1027 > at org.apache.tika.parser.dwg.DWGParser.parse(DWGParser.java:131) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256) > at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) > at > org.apache.solr.handler.ext
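Two of the to-do items in the comment above, checking that the exe is there and killing the spawned process after a timeout, can be sketched with the standard `ProcessBuilder` API. This is a rough illustration only: the class and method names are hypothetical, the command line is whatever the caller supplies (no dwgread flags are assumed here), and `ProcessBuilder.Redirect.DISCARD` assumes Java 9 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper; not part of the fork or of Tika. */
public final class ExternalToolRunner {

    private ExternalToolRunner() {
    }

    /** True if the configured path points at a regular file the JVM may execute. */
    public static boolean isExecutable(Path exe) {
        return Files.isRegularFile(exe) && Files.isExecutable(exe);
    }

    /**
     * Runs the command and returns its exit code, or kills the process and
     * throws TimeoutException if it has not finished within timeoutMillis.
     */
    public static int runWithTimeout(List<String> command, long timeoutMillis)
            throws IOException, InterruptedException, TimeoutException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                // Discard console output so a chatty tool cannot block on a full pipe.
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        if (!process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            process.destroyForcibly();
            throw new TimeoutException("Timed out after " + timeoutMillis + " ms: " + command);
        }
        return process.exitValue();
    }
}
```

A parser could call `isExecutable` once on the configured path and fail early with a clear message, then pass the timeout from its config to `runWithTimeout` for each file.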
[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027
[ https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535912#comment-17535912 ] Dan Coldrick commented on TIKA-1735: [~tallison] Hi I've opened [https://github.com/apache/tika/pull/558] If you could have a quick scan and review I'd appreciate it , like I say not finished but want to check I'm going down the right route. > Unsupported AutoCAD drawing version: AC1027 > --- > > Key: TIKA-1735 > URL: https://issues.apache.org/jira/browse/TIKA-1735 > Project: Tika > Issue Type: Bug >Reporter: Luca Perico >Priority: Major > Attachments: testDWG-AC1027.dwg > > > Trying to index .dwg file (version AC1027) I get 500 error response. > " > > 500 name=""QTime"">3 AutoCAD drawing version: AC1027 name=""trace"">org.apache.solr.common.SolrException: > org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing version: > AC1027 > at > org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) > at > 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) > at org.eclipse.jetty.server.Server.handle(Server.java:497) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) > at > org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.tika.exception.TikaException: Unsupported AutoCAD > drawing version: AC1027 > at org.apache.tika.parser.dwg.DWGParser.parse(DWGParser.java:131) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256) > at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) > at > org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:221) > ... 27 more > 500 > " -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)
[ https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536356#comment-17536356 ] Dan Coldrick commented on TIKA-3725: Hi [~tallison] Seen you've had some responses :) What are the disadvantages of adding spring? What would the advantages be? Assume it adds quite a lot of complication but brings a load of benefits(but maybe complications)? Would it be possible to drive Tika Server forward with spring by allowing more configuration (installation as a service, SSL, Authorization, whitelists etc)? To me the Rest Api's offer so much as a generic service. > Add Authorization to Tika Server (Suggest Basic to start off with) > -- > > Key: TIKA-3725 > URL: https://issues.apache.org/jira/browse/TIKA-3725 > Project: Tika > Issue Type: New Feature > Components: tika-server >Affects Versions: 2.3.0 >Reporter: Dan Coldrick >Priority: Minor > > I would be good to get some Authentication/Authorization added to TIKA server > to be able to add another layer of security around the Tika Server Rest > service. > This could become a rabbit hole with the number of options available around > Authentication/Authorization (Oauth, OpenId etc) so suggest as a starter > basic Auth is added. > How to store user(s)/password suggest looking at how other apache products do > the same? -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-1570) Seeking a stop method for better use with Apache Commons Daemon
[ https://issues.apache.org/jira/browse/TIKA-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536684#comment-17536684 ] Dan Coldrick commented on TIKA-1570: [~tallison] Many thanks, will try to look at this next week :) > Seeking a stop method for better use with Apache Commons Daemon > --- > > Key: TIKA-1570 > URL: https://issues.apache.org/jira/browse/TIKA-1570 > Project: Tika > Issue Type: Improvement > Components: server >Affects Versions: 1.7 >Reporter: Jason Borg >Priority: Minor > Fix For: 2.4.1 > > > I've got tika-server-1.7.jar from http://tika.apache.org/download.html > I've downloaded v1.0.15 of the Windows binaries for Apache Commons Daemon > from http://commons.apache.org/proper/commons-daemon/binaries.html > I can get Tika started as a service, but I can't determine what to use for a > stop method. > prunsrv.exe //IS//tika-daemon --DisplayName "Tika Daemon" --Classpath > "C:\Tika Service\tika-server-1.7.jar" --StartClass > "org.apache.tika.server.TikaServerCli" --StopClass > "org.apache.tika.server.TikaServerCli" --StartMethod main --StopMethod main > --Description "Tika Daemon Windows Service" --StartMode java --StopMode java > This starts, and works as I'd hope, but when trying to stop the service it > doesn't respond. Obviously org.apache.tika.server.TikaServerCli.main(string[] > args) isn't a suitable stop method, but I'm lost for alternatives. > Using Daemon in exe mode works for start, but gives inconsistent results for > stop. Adding a stop method to Tika would be ideal. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027
[ https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538400#comment-17538400 ] Dan Coldrick commented on TIKA-1735: Apologies [~tallison], I've not been able to look at the regexes properly today. I have run a load of documents through (100+) and found a few more formatting tags: \pxqc; = centered, \pxqr; = right, \pxql; = left. I also wasted 2 hours trying to work out why the JSON was coming out invalid for some files; it turns out some idiot (me) had been replacing "nan" with "". I think I should replace NaN with null instead of 0?
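Both points in the comment above can be illustrated with a couple of regular expressions. This is a hedged sketch rather than the code in the fork: the class and method names are hypothetical, the alignment pattern covers only the three codes listed in the comment, and the NaN pattern assumes the tool writes a bare, unquoted `nan` where a number is expected. A regex pass like this could in principle also touch a `nan` inside a string value, so it is a stopgap before proper JSON parsing, not a replacement for it.

```java
import java.util.regex.Pattern;

/** Hypothetical clean-up helpers for dwgread's JSON output and the MTEXT strings in it. */
public final class DwgTextCleanup {

    // The three paragraph-alignment codes from the comment: \pxqc; \pxqr; \pxql;
    private static final Pattern ALIGNMENT_CODE = Pattern.compile("\\\\pxq[lcr];");

    // A bare nan or -nan token in value position: after ':', '[' or ',' and
    // before ',', ']' or '}'. Assumption: the token is never quoted.
    private static final Pattern BARE_NAN =
            Pattern.compile("([:\\[,]\\s*)-?nan(?=\\s*[,\\]}])", Pattern.CASE_INSENSITIVE);

    private DwgTextCleanup() {
    }

    /** Removes the alignment codes from an already JSON-decoded MTEXT string. */
    public static String stripAlignmentCodes(String mtext) {
        return ALIGNMENT_CODE.matcher(mtext).replaceAll("");
    }

    /** Rewrites bare nan tokens as null so the text becomes valid JSON. */
    public static String replaceNanWithNull(String rawJson) {
        return BARE_NAN.matcher(rawJson).replaceAll("$1null");
    }
}
```

Using `null` rather than `0` keeps "no valid value" distinguishable from a genuine zero, and JSON itself has no NaN literal, so some substitution is needed before a standard parser will accept the file.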
[jira] [Commented] (TIKA-3742) Advice around DGN7 parser and whether to add to TIKA
[ https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539019#comment-17539019 ] Dan Coldrick commented on TIKA-3742: [~nick] any advice? I'm stuck on the random chars at the moment with this one so any help would be appreciated :)
[jira] [Commented] (TIKA-1570) Seeking a stop method for better use with Apache Commons Daemon
[ https://issues.apache.org/jira/browse/TIKA-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539809#comment-17539809 ] Dan Coldrick commented on TIKA-1570: [~tallison] I've tested it and it works. I've created a WIP page in Confluence on how I got it to install as a Windows service. I needed a break from DWGs so picked this up instead :) Feel free to butcher my Confluence page: [https://cwiki.apache.org/confluence/display/TIKA/TikaServer+Windows+Service+-+WIP] > Seeking a stop method for better use with Apache Commons Daemon > --- > > Key: TIKA-1570 > URL: https://issues.apache.org/jira/browse/TIKA-1570 > Project: Tika > Issue Type: Improvement > Components: server >Affects Versions: 1.7 >Reporter: Jason Borg >Priority: Minor > Fix For: 2.4.1 > > > I've got tika-server-1.7.jar from http://tika.apache.org/download.html > I've downloaded v1.0.15 of the Windows binaries for Apache Commons Daemon > from http://commons.apache.org/proper/commons-daemon/binaries.html > I can get Tika started as a service, but I can't determine what to use for a > stop method. > prunsrv.exe //IS//tika-daemon --DisplayName "Tika Daemon" --Classpath > "C:\Tika Service\tika-server-1.7.jar" --StartClass > "org.apache.tika.server.TikaServerCli" --StopClass > "org.apache.tika.server.TikaServerCli" --StartMethod main --StopMethod main > --Description "Tika Daemon Windows Service" --StartMode java --StopMode java > This starts, and works as I'd hope, but when trying to stop the service it > doesn't respond. Obviously org.apache.tika.server.TikaServerCli.main(String[] > args) isn't a suitable stop method, but I'm lost for alternatives. > Using Daemon in exe mode works for start, but gives inconsistent results for > stop. Adding a stop method to Tika would be ideal. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3523) A replacement for enableFileUrl or Support for Google Cloud
[ https://issues.apache.org/jira/browse/TIKA-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539958#comment-17539958 ] Dan Coldrick commented on TIKA-3523: [~tallison] that error is probably because the command needs to be in quotes. It is probably looking for C:\Program Files\xx, where "Program Files" contains a space, so the parameter is being split into "C:\Program" and "Files\xx" rather than being kept together as "C:\Program Files\". I'm not involved in this ticket but happened to see the error. > A replacement for enableFileUrl or Support for Google Cloud > --- > > Key: TIKA-3523 > URL: https://issues.apache.org/jira/browse/TIKA-3523 > Project: Tika > Issue Type: Wish > Components: tika-server >Affects Versions: 2.0.0 >Reporter: Fatih Pazarbasi >Priority: Minor > > Hello, > I have a setup where users upload their files to a cloud bucket and I forward > the fileUrl to run OCR on them in a serverless cloud instance. I do it this > way so the users have no contact with the Tika Server and I have a copy of > what they've sent to process it. Also they have nothing to do with the > unprocessed response. > Now that you've removed the enableFileUrl... I have to download the files to > the backend instance from the cloud bucket they have uploaded their files to, > and put them to /tika server back again... > I tried the following config.xml to work around the situation but it was in > vain... 
> For the made-up URL: > [https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf|https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/] > {code:java} > > > >fsf > > https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o > > > > > > > >fse >gs://abcd-efgh.appspot.com/users > > > > > > true > > > > > /path/to/tika-config.xml > > {code} > {code:java} > headers: { > Accept: 'text/plain', > 'User-Agent': 'Firebase Functions', > fetcherName: 'fsf', > fetchKey: 'somefilethatdoesnotexist.pdf', > },{code} > It doesn't support the gs:// Google Storage bucket either. I have all the > necessary permissions but it didn't help. I'm using a dockerized version of > tika server, so the file system does not seem to be my concern... > > In the golden times of 1.2x I was simply using: > > {code:java} > headers: { > Accept: 'text/plain', > 'User-Agent': 'Firebase Functions', > fileUrl: > 'https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf', > > },{code} > > > Am I missing something? If not, my wish is that you please make it so > that fetcherName is the definitive first part of the old fileUrl and fetchKey > is the specific pointer to a file. > This way I have control over the URLs that are sent to tika server to some > extent, unlike enableFileUrl, and can also eat my cake without creating extra > traffic on the backend by downloading from the bucket and uploading to tika. -- This message was sent by Atlassian Jira (v8.20.7#820007)
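The quoting problem suggested in the comment above can be sketched in Python. This is a minimal illustration, not code from the ticket: the path C:\Program Files\xx\tool.exe and the argument input.pdf are hypothetical, and shlex in non-POSIX mode only approximates how Windows itself splits a command line.

```python
import shlex

# Hypothetical executable path and argument, used only for illustration.
EXE = r"C:\Program Files\xx\tool.exe"
ARG = "input.pdf"


def split_on_whitespace(command: str) -> list[str]:
    """Split a command line the naive way, ignoring quotes entirely."""
    return command.split()


def split_respecting_quotes(command: str) -> list[str]:
    """Split a command line but keep double-quoted spans together.

    posix=False stops backslashes being treated as escapes; the quote
    characters themselves stay in the returned tokens.
    """
    return shlex.split(command, posix=False)


unquoted = f"{EXE} {ARG}"
quoted = f'"{EXE}" {ARG}'

# Unquoted: the path breaks apart at the space in "Program Files".
print(split_on_whitespace(unquoted))
# Quoted: the path survives as a single token.
print(split_respecting_quotes(quoted))
```

When a process is launched from code, passing the arguments as a list (for example to subprocess.run) avoids the splitting question altogether, because each list element is handed over as one argument.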
[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027
[ https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541833#comment-17541833 ] Dan Coldrick commented on TIKA-1735: [~tallison] Just so you know this isn't dead, we're still testing dwgread over the next week or so, I'm on holiday from tomorrow so will pick this back up when I'm back. > Unsupported AutoCAD drawing version: AC1027 > --- > > Key: TIKA-1735 > URL: https://issues.apache.org/jira/browse/TIKA-1735 > Project: Tika > Issue Type: Bug >Reporter: Luca Perico >Priority: Major > Attachments: testDWG-AC1027.dwg > > > Trying to index .dwg file (version AC1027) I get 500 error response. > " > > 500 name=""QTime"">3 AutoCAD drawing version: AC1027 name=""trace"">org.apache.solr.common.SolrException: > org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing version: > AC1027 > at > org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) > at > 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) > at org.eclipse.jetty.server.Server.handle(Server.java:497) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) > at > org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.tika.exception.TikaException: Unsupported AutoCAD > drawing version: AC1027 > at org.apache.tika.parser.dwg.DWGParser.parse(DWGParser.java:131) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256) > at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256) > at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) > at > org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:221) > ... 27 more > 500 > " -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027
[ https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612781#comment-17612781 ] Dan Coldrick commented on TIKA-1735: [~tallison] Apologies, I've been missing for a few months; I got pulled onto another project which has consumed pretty much all my time. We've got some testing to do this week but think we're really close to being able to say you can merge this pull request. Is there anything you need from me, as I've done lots of commits and playing about? I'm unsure how you take all those commits and clean them up into one nice commit. I think sometime next week, once we've finished testing, I could give the go-ahead to merge. Let me know what you need. Thanks Dan > Unsupported AutoCAD drawing version: AC1027 > --- > > Key: TIKA-1735 > URL: https://issues.apache.org/jira/browse/TIKA-1735 > Project: Tika > Issue Type: Bug >Reporter: Luca Perico >Priority: Major > Attachments: testDWG-AC1027.dwg > > > Trying to index .dwg file (version AC1027) I get 500 error response. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027
[ https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612801#comment-17612801 ] Dan Coldrick commented on TIKA-1735: [~tallison] Happy with that, can you give me until tomorrow night (UK time)? I've got some testing I can do with my test team and come back to you? > Unsupported AutoCAD drawing version: AC1027 > --- > > Key: TIKA-1735 > URL: https://issues.apache.org/jira/browse/TIKA-1735 > Project: Tika > Issue Type: New Feature >Reporter: Luca Perico >Priority: Major > Attachments: testDWG-AC1027.dwg > > > Trying to index .dwg file (version AC1027) I get 500 error response. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027
[ https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613098#comment-17613098 ] Dan Coldrick commented on TIKA-1735: [~tallison] All good, happy for it to be merged. I've only tested on Windows. I've added some tests for the config that only run if DWGRead is installed (the tests check whether DWGRead can be found and, if it can't, they are abandoned). I'm not sure whether you want to install it on your build server and have the tests run; I'll leave that up to you. > Unsupported AutoCAD drawing version: AC1027 > --- > > Key: TIKA-1735 > URL: https://issues.apache.org/jira/browse/TIKA-1735 > Project: Tika > Issue Type: New Feature >Reporter: Luca Perico >Priority: Major > Attachments: testDWG-AC1027.dwg > > > Trying to index .dwg file (version AC1027) I get 500 error response. -- This message was sent by Atlassian Jira (v8.20.10#820010)
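The skip-if-missing behaviour described in the comment above could look roughly like the following Python sketch. It is not the Tika test code (which is Java): the executable name dwgread, the PATH lookup and the test class are all assumptions made for illustration.

```python
import shutil
import unittest

# Assumed executable name; the real tests may locate the tool differently.
TOOL_NAME = "dwgread"


def tool_available(name: str = TOOL_NAME) -> bool:
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


class DwgReadTest(unittest.TestCase):
    def setUp(self) -> None:
        # Abandon (skip) the test rather than fail when the tool is absent.
        if not tool_available():
            self.skipTest(f"{TOOL_NAME} not found on PATH")

    def test_tool_can_be_located(self) -> None:
        # Only reached when the tool is installed.
        self.assertIsNotNone(shutil.which(TOOL_NAME))
```

Skipping rather than failing keeps a build green on machines without the external tool, while still exercising the parser wherever it is installed.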
[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027
[ https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613100#comment-17613100 ] Dan Coldrick commented on TIKA-1735: [~tallison] should also say thank you for your help :) We've had some real good success with this. > Unsupported AutoCAD drawing version: AC1027 > --- > > Key: TIKA-1735 > URL: https://issues.apache.org/jira/browse/TIKA-1735 > Project: Tika > Issue Type: New Feature >Reporter: Luca Perico >Priority: Major > Attachments: testDWG-AC1027.dwg > > > Trying to index .dwg file (version AC1027) I get 500 error response. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (TIKA-3883) Fixes for Parsing DWG files using DWG read Parser
Dan Coldrick created TIKA-3883: -- Summary: Fixes for Parsing DWG files using DWG read Parser Key: TIKA-3883 URL: https://issues.apache.org/jira/browse/TIKA-3883 Project: Tika Issue Type: Bug Components: parser Reporter: Dan Coldrick We have identified a couple of problems with parsing the JSON produced by DWG Read. This ticket is to Jira is to fix those issues -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (TIKA-3883) Fixes for Parsing DWG files using DWG read Parser
[ https://issues.apache.org/jira/browse/TIKA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Coldrick updated TIKA-3883: --- Description: We have identified a couple of problems with parsing the JSON produced by DWG Read. This Jira is to fix those issues (was: We have identified a couple of problems with parsing the JSON produced by DWG Read. This ticket is to Jira is to fix those issues) > Fixes for Parsing DWG files using DWG read Parser > - > > Key: TIKA-3883 > URL: https://issues.apache.org/jira/browse/TIKA-3883 > Project: Tika > Issue Type: Bug > Components: parser >Reporter: Dan Coldrick >Priority: Minor > > We have identified a couple of problems with parsing the JSON produced by DWG > Read. This Jira is to fix those issues -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-3883) Fixes for Parsing DWG files using DWG read Parser
[ https://issues.apache.org/jira/browse/TIKA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619052#comment-17619052 ] Dan Coldrick commented on TIKA-3883: [~tallison] unfortunately not, none of our team have access to create DWGs and I can't use the ones we have used in our testing. We had 3 DWG files fail out of circa 12k and this code change fixed those, not sure if that's acceptable to you? > Fixes for Parsing DWG files using DWG read Parser > - > > Key: TIKA-3883 > URL: https://issues.apache.org/jira/browse/TIKA-3883 > Project: Tika > Issue Type: Bug > Components: parser >Reporter: Dan Coldrick >Priority: Minor > > We have identified a couple of problems with parsing the JSON produced by DWG > Read. This Jira is to fix those issues -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (TIKA-3883) Fixes for Parsing DWG files using DWG read Parser
[ https://issues.apache.org/jira/browse/TIKA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619060#comment-17619060 ] Dan Coldrick commented on TIKA-3883: Thanks [~tallison] > Fixes for Parsing DWG files using DWG read Parser > - > > Key: TIKA-3883 > URL: https://issues.apache.org/jira/browse/TIKA-3883 > Project: Tika > Issue Type: Bug > Components: parser >Reporter: Dan Coldrick >Priority: Minor > > We have identified a couple of problems with parsing the JSON produced by DWG > Read. This Jira is to fix those issues -- This message was sent by Atlassian Jira (v8.20.10#820010)