[ https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537730#comment-17537730 ]
ASF GitHub Bot commented on TIKA-1735:
--------------------------------------

monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1128022252

> > @nddipiazza @tballison This looks messy, can you advise a way to clean it up? A better way of doing it? Still think it's worth having the comments there?
>
> OMG, what a mess. The output, not you.
>
> What I've done before is a regex pattern+matcher that captures the escape sequence first OR then the controls ~/(\)|([A-Z0-9]{1,5})/, capture group(2) (and skip it), append group 1 to tail.
>
> That's a rough answer and probably wrong, but see what you can do.
>
> The braces...hmmmm... Maybe take a second pass and do the same thing? You can't just add this in the OR ~/{[^}]{0,50}}/ because that'll not correctly process escaped } within the brackets.
>
> I threw together a somewhat working example. I think there are still some things I'm missing: https://github.com/tballison/tika-addons/blob/main/DWGReadDev/src/test/java/TestRegexCleaners.java
>
> Obv, we'll want to make the patterns static, etc.

Will take a look @tballison, thanks for your help. I've been cleaning up the code to match the checkstyle (which I only learnt about today) and testing my janky regexes (in their current form) against some documents I have. Like I said, I managed to build Tika Server and check the config was working correctly, so it's been a successful few hours today :)

Will take a look at your example tomorrow and hopefully at some point this week find some time to check the stop method on the other pull request. We can then look to create a guide/script on how to install Tika Server as a Windows service using Daemon.
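For anyone following the thread, here is a minimal sketch of the "escape first OR control" idea quoted above. It is not the code from the PR or from TestRegexCleaners.java. The class name, method name and exact pattern are assumptions made for illustration: the regex in the quoted comment appears to have lost some backslashes, so the pattern below is one guess at the intent, and treating `\{` and `\}` as escapes alongside `\\` is a further assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class MTextCleanerSketch {

    // Alternative 1 (group 1): an escaped backslash or brace, kept as a literal.
    // Alternative 2: a backslash followed by 1-5 uppercase letters or digits,
    // treated as a control code and dropped.
    // The escape alternative comes first so that "\\P" is read as an escaped
    // backslash followed by a plain "P", not as the control "\P".
    private static final Pattern ESCAPE_OR_CONTROL =
            Pattern.compile("\\\\([\\\\{}])|\\\\[A-Z0-9]{1,5}");

    static String clean(String raw) {
        Matcher m = ESCAPE_OR_CONTROL.matcher(raw);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the plain text between the previous match and this one.
            out.append(raw, last, m.start());
            if (m.group(1) != null) {
                // Escape sequence: keep only the escaped character.
                out.append(m.group(1));
            }
            // Control code: append nothing, which skips it.
            last = m.end();
        }
        out.append(raw, last, raw.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("First\\P second \\\\ third"));
    }
}
```

The {1,5} range is taken from the quoted comment and will over-match when a control code is immediately followed by uppercase text, so this is a starting point rather than a finished cleaner. Brace groups are left alone here; as the comment notes, they likely need a second pass.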
> Unsupported AutoCAD drawing version: AC1027
> -------------------------------------------
>
>                 Key: TIKA-1735
>                 URL: https://issues.apache.org/jira/browse/TIKA-1735
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Luca Perico
>            Priority: Major
>         Attachments: testDWG-AC1027.dwg
>
>
> Trying to index .dwg file (version AC1027) I get 500 error response.
> "<?xml version=""1.0"" encoding=""UTF-8""?>
> <response>
> <lst name=""responseHeader""><int name=""status"">500</int><int name=""QTime"">3</int></lst><lst name=""error""><str A1:F378 Unsupported AutoCAD drawing version: AC1027</str><str name=""trace"">org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing version: AC1027
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
>     at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>     at org.eclipse.jetty.server.Server.handle(Server.java:497)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>     at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing version: AC1027
>     at org.apache.tika.parser.dwg.DWGParser.parse(DWGParser.java:131)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:221)
>     ... 27 more
> </str><int name=""code"">500</int></lst>
> </response>"

--
This message was sent by Atlassian Jira
(v8.20.7#820007)