[ 
https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537730#comment-17537730
 ] 

ASF GitHub Bot commented on TIKA-1735:
--------------------------------------

monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1128022252

   > > > @nddipiazza @tballison This looks messy, can you advise a way to clean 
it up? A better way of doing it? Still think it's worth having the comments 
there?
   > > 
   > > 
   > > OMG, what a mess. The output, not you.
   > > What I've done before is a regex pattern+matcher that tries the escape 
sequence first OR, failing that, the controls ~/(\)|([A-Z0-9]{1,5})/: if 
group(2) was captured, skip it; otherwise append group 1 to the tail.
   > > That's a rough answer and probably wrong, but see what you can do.
   > > The braces...hmmmm... Maybe take a second pass and do the same thing? 
You can't just add this to the OR ~/{[^}]{0,50}}/ because that won't correctly 
process an escaped } within the braces.
   > 
   > I threw together a somewhat working example. I think there are still some 
things I'm missing: 
https://github.com/tballison/tika-addons/blob/main/DWGReadDev/src/test/java/TestRegexCleaners.java
   > 
   > Obv, we'll want to make the patterns static, etc.
   
   Will take a look @tballison, thanks for your help. I've been cleaning up 
the code to match the checkstyle (which I only learnt about today) and 
testing my janky regexes (in their current form) against some documents I have. 
Like I said, I managed to build Tika Server and check the config was working 
correctly, so it's been a successful few hours today :) Will take a look at your 
example tomorrow and hopefully at some point this week find some time to check 
the stop method on the other pull request. We can then look to create a 
guide/script on how to install Tika Server as a Windows service using Daemon.
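
For reference, the single-pass "escape OR control" idea quoted above could be sketched roughly as below. This is only an illustration, not the code from the linked TestRegexCleaners.java: the class name `ControlCodeStripper`, the method `clean`, and the exact pattern are assumptions. The pattern assumes an escaped backslash is written as two backslashes and a control code is a backslash followed by one to five uppercase letters or digits. Brace groups are deliberately not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
final class ControlCodeStripper {

    // Assumed pattern: group 1 = the backslash from an escaped "\\",
    // group 2 = a control code such as P in "\P" (1 to 5 chars of A-Z or 0-9).
    private static final Pattern ESCAPE_OR_CONTROL =
            Pattern.compile("\\\\(\\\\)|\\\\([A-Z0-9]{1,5})");

    static String clean(String in) {
        Matcher m = ESCAPE_OR_CONTROL.matcher(in);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the plain text between the previous match and this one.
            out.append(in, last, m.start());
            if (m.group(1) != null) {
                // Escaped backslash: keep a single literal backslash.
                out.append(m.group(1));
            }
            // Otherwise group 2 matched a control code, which is dropped.
            last = m.end();
        }
        // Copy whatever follows the final match.
        out.append(in, last, in.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("line one\\Pline two")); // line oneline two
        System.out.println(clean("C:\\\\temp"));          // C:\temp
        System.out.println(clean("a\\\\Pb"));             // a\Pb
    }
}
```

Handling both alternatives in one pass means an escaped backslash is consumed whole before the character after it can be mistaken for the start of a control code, which is what the third call shows. The sketch inherits the weakness of the rough pattern: a control code immediately followed by uppercase text (for example `\PSECOND`) would swallow some of that text, so the real patterns probably need per-code handling.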




> Unsupported AutoCAD drawing version: AC1027
> -------------------------------------------
>
>                 Key: TIKA-1735
>                 URL: https://issues.apache.org/jira/browse/TIKA-1735
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Luca Perico
>            Priority: Major
>         Attachments: testDWG-AC1027.dwg
>
>
> Trying to index .dwg file (version AC1027) I get 500 error response. 
> "<?xml version=""1.0"" encoding=""UTF-8""?>
> <response>
> <lst name=""responseHeader""><int name=""status"">500</int><int 
> name=""QTime"">3</int></lst><lst name=""error""><str A1:F378 Unsupported 
> AutoCAD drawing version: AC1027</str><str 
> name=""trace"">org.apache.solr.common.SolrException: 
> org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing version: 
> AC1027
>       at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
>       at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>       at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
>       at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>       at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
>       at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>       at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>       at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>       at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>       at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>       at org.eclipse.jetty.server.Server.handle(Server.java:497)
>       at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>       at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>       at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tika.exception.TikaException: Unsupported AutoCAD 
> drawing version: AC1027
>       at org.apache.tika.parser.dwg.DWGParser.parse(DWGParser.java:131)
>       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>       at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:221)
>       ... 27 more
> </str><int name=""code"">500</int></lst>
> </response>"



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
