[jira] [Comment Edited] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop
[ https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617924#comment-16617924 ] Tim Allison edited comment on TIKA-2727 at 9/17/18 7:16 PM:

{quote}Does version 1.19 solve this issue more delicately?{quote}
Somewhat. If the user sets the above option, we respect that. Otherwise, we set the limit to 20 expansions for our XML parsers.

{quote}I'm afraid that 1.19 will bring the same issue back to us.{quote}
Y. That wouldn't surprise me. If you're able to help us figure out what's going on, we can try to fix it. :D

> Parsing and detect mime type of XML file stuck in infinite loop
> ---
>
> Key: TIKA-2727
> URL: https://issues.apache.org/jira/browse/TIKA-2727
> Project: Tika
> Issue Type: Bug
> Components: detector, parser
> Affects Versions: 1.17
> Reporter: Slava G
> Assignee: Tim Allison
> Priority: Major
> Fix For: 1.19, 2.0.0
>
> Attachments: 1_e3e13f0e-7085-4000-a558-5d255ed7a944.xml
>
> Hi,
> I'm trying to parse (or even just detect the MIME type of) an XML file that is not large, but is kind of tricky, and my process hangs on:
> XMLStringBuffer.append(char[], int, int) line: not available
> XMLStringBuffer.append(XMLString) line: not available
> XMLNSDocumentScannerImpl(XMLScanner).scanAttributeValue(XMLString, XMLString, String, boolean, String) line: not available
> XMLNSDocumentScannerImpl.scanAttribute(XMLAttributesImpl) line: not available
> XMLNSDocumentScannerImpl.scanStartElement() line: not available
> XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook() line: not available
> XMLNSDocumentScannerImpl$NSContentDispatcher(XMLDocumentFragmentScannerImpl$FragmentContentDispatcher).dispatch(boolean) line: not available
> XMLNSDocumentScannerImpl(XMLDocumentFragmentScannerImpl).scanDocument(boolean) line: not available
> XIncludeAwareParserConfiguration(XML11Configuration).parse(boolean) line: not available
> XIncludeAwareParserConfiguration(XML11Configuration).parse(XMLInputSource) line: not available
> SAXParserImpl$JAXPSAXParser(XMLParser).parse(XMLInputSource) line: not available
> SAXParserImpl$JAXPSAXParser(AbstractSAXParser).parse(InputSource) line: not available
> SAXParserImpl$JAXPSAXParser.parse(InputSource) line: not available
> SAXParserImpl.parse(InputSource, DefaultHandler) line: not available
> SAXParserImpl(SAXParser).parse(InputStream, DefaultHandler) line: 195
> XmlRootExtractor.extractRootElement(InputStream) line: 62
> XmlRootExtractor.extractRootElement(byte[]) line: 42
> MimeTypes.getMimeType(byte[]) line: 212
> MimeTypes.detect(InputStream, Metadata) line: 494
> DefaultDetector(CompositeDetector).detect(InputStream, Metadata) line: 84
>
> Please see the attached XML file.
> Please advise.
> Thanks

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
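The trace shows that even MIME detection runs a full SAX parse: `MimeTypes.detect` calls `XmlRootExtractor.extractRootElement`, which hands the bytes to the JDK's `SAXParser`. The sketch below is a hypothetical `RootSniffer` (not Tika's `XmlRootExtractor` source), assuming only the JDK's built-in JAXP parser; it illustrates the general technique of reading just the root element by throwing a marker exception from `startElement`. Note that the hang in the trace sits inside `scanAttribute`, which runs before `startElement` fires, so stopping at the root alone does not protect against runaway entity expansion in the root's attributes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.namespace.QName;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of Tika's API.
class RootSniffer {
    /** Thrown from the handler to stop parsing as soon as the root element is seen. */
    private static final class RootFound extends SAXException {
        private static final long serialVersionUID = 1L;
        final QName root;

        RootFound(QName root) {
            super("root element found");
            this.root = root;
        }
    }

    /** Returns the root element's name, or null if the bytes are not well-formed XML up to the root. */
    static QName sniffRoot(byte[] head) {
        SAXParser parser;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            parser = factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("could not create SAX parser", e);
        }
        DefaultHandler stopAtRoot = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs)
                    throws SAXException {
                // By now the root's start tag, including every attribute value
                // (and any entity references in them), has already been scanned.
                throw new RootFound(new QName(uri, localName));
            }
        };
        try {
            parser.parse(new ByteArrayInputStream(head), stopAtRoot);
            return null; // not reached for well-formed input, which always has a root
        } catch (RootFound found) {
            return found.root;
        } catch (SAXException notXml) {
            return null; // e.g. "Content is not allowed in prolog."
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>"
                .getBytes(StandardCharsets.UTF_8);
        // QName.toString() renders "{namespace}localPart"
        System.out.println(sniffRoot(doc)); // prints {http://www.w3.org/1999/xhtml}html
    }
}
```

Because parsing stops at the root start tag, a truncated prefix of a file is enough for this kind of sniffing, as long as the prefix contains the complete root start tag.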
[jira] [Commented] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop
[ https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617928#comment-16617928 ] Slava G commented on TIKA-2727:
---
I will definitely work on providing as much information as possible to help solve this.
Thanks
[jira] [Commented] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop
[ https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617924#comment-16617924 ] Tim Allison commented on TIKA-2727:
---
{quote}Does version 1.19 solve this issue more delicately?{quote}
Somewhat. If the user sets the above option, we respect that. Otherwise, we set the limit to 20 expansions for our XML parsers.

{quote}I'm afraid that 1.19 will bring the same issue back to us.{quote}
Y. That wouldn't surprise me. If you're able to help us figure out what's going on, we can fix it. :D
[jira] [Comment Edited] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop
[ https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617833#comment-16617833 ] Slava G edited comment on TIKA-2727 at 9/17/18 5:23 PM:

I'm using TIKA directly in my code. Does version 1.19 solve this issue more delicately?
Also, we're using 1.17 because when we switched to 1.18, parsing failed for many customers' data due to a very strange error: https://issues.apache.org/jira/browse/TIKA-2676
I'm afraid that 1.19 will bring the same issue back to us.
Re: 1.19?
Hi, folks.

During building with openjdk10 I found some potential issues and/or places for future improvement:
- minor javadoc issues (I'll fix them, and as they are trivial I don't see them as a blocker for rc1->release promotion): usage of a bare `>` sign instead of `&gt;`, references to unimported or absent methods/classes, etc.;
- the new javadoc will switch to HTML5 instead of HTML4 and recommends choosing one explicitly;
- some bnd (OSGi bundler) warnings related to:
  - versioning (the `aQute.bnd.annotation.Version` annotation is deprecated since bnd 3.2);
  - exporting one package from different bundles (like `o.a.tika.language.translate` from both tika-core and tika-translate);
  - using an OSGi activator from another bundle;
  - some private references; these could be an issue but need additional research;
- forbiddenapis warnings, because checks are enabled for some types/classes not present on the classpath (like commons-io in tika-core), but that's just noise and not even a minor issue.

On Sat, Sep 15, 2018 at 2:29 PM Tim Allison wrote:
> I found some areas for improvement, but no surprises. I'm going to
> cut 1.19-rc1 now.
> On Thu, Sep 13, 2018 at 9:04 PM Tim Allison wrote:
> >
> > Reports are here:
> > http://162.242.228.174/reports/tika-1.18V1.19-pre-rc1.tgz
> >
> > There are a few things I want to look into...tomorrow. Let me know if
> > you see anything surprising.
> > On Tue, Sep 11, 2018 at 6:17 AM Tim Allison wrote:
> > >
> > > Unless there are objections, I’ll kick off the regression tests today.
> > >
> > > On Thu, Sep 6, 2018 at 10:34 AM Tim Allison wrote:
> > >>
> > >> All,
> > >>
> > >> POI 4.0.0 is available. I'm integrating that now, and I plan to
> > >> kick off the full regression tests shortly. Are there any other
> > >> blockers on 1.19? Anything else we want to get in?
> > >>
> > >> If you've made commits against master but not branch_1x, those
> > >> changes won't make it into 1.19. I've cherry-picked a few, but might
> > >> not have gotten all of them.
> > >>
> > >> Cheers,
> > >>
> > >> Tim

--
Best regards,
Konstantin Gribov
[jira] [Commented] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop
[ https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617833#comment-16617833 ] Slava G commented on TIKA-2727:
---
I'm using TIKA directly in my code. Does version 1.19 solve this issue more delicately?
Also, we're using 1.17 because when we switched to 1.18, parsing failed for many customers' data due to a very strange error that was not discovered by our QA. So I'm afraid that 1.19 will bring the same issue back to us.
[jira] [Commented] (TIKA-2552) Upgrade to POI 4.0.0 when available
[ https://issues.apache.org/jira/browse/TIKA-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617788#comment-16617788 ] Tim Allison commented on TIKA-2552: --- [~TigerC10], y, give it a try: [https://lists.apache.org/thread.html/f078df60365f496b369d97fdf51f565047f15447ca454239579508aa@%3Cdev.tika.apache.org%3E] > Upgrade to POI 4.0.0 when available > --- > > Key: TIKA-2552 > URL: https://issues.apache.org/jira/browse/TIKA-2552 > Project: Tika > Issue Type: Improvement >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Blocker > Fix For: 1.19, 2.0.0 > > Attachments: TIKA-2552_--_first_draft.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop
[ https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617783#comment-16617783 ] Tim Allison commented on TIKA-2727: --- If you're using Tika directly within your code (DON'T DO THIS!), that'll affect everything in your jvm that is paying attention to it. :D If you're running tika in batch mode, you can limit it to the child process with, e.g. {{-JDjdk.xml.entityExpansionLimit=10}} Or, if you are using the new robust tika-server feature available in Tika 1.19, you can specify that for the child jvm, too. Or, if you're using the ForkParser, you can specify it there.
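Setting `jdk.xml.entityExpansionLimit` as a system property applies to every JAXP parser in the JVM, which is why the comment recommends scoping it to a child process. Inside a single JVM, a narrower option, assuming the JDK's built-in parser (the `SAXParserImpl`/Xerces classes in the trace), is the JDK-specific per-parser property `http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit`. The sketch below sets it on one `SAXParser` only, using 20, the value this thread mentions for Tika 1.19's own XML parsers; the class name and the nested-entity document are hypothetical and are not the attached file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; assumes the JDK's built-in JAXP implementation.
class PerParserEntityLimit {
    // JDK-specific per-parser counterpart of -Djdk.xml.entityExpansionLimit=...
    static final String ENTITY_EXPANSION_LIMIT =
            "http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit";

    // Three levels of 10x fan-out: about 1 (c) + 10 (b) + 100 (a) = 111 expansions,
    // far below the JDK default limit but well above 20.
    static final String NESTED_ENTITIES =
            "<?xml version=\"1.0\"?>\n"
          + "<!DOCTYPE r [\n"
          + "  <!ENTITY a \"x\">\n"
          + "  <!ENTITY b \"&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;\">\n"
          + "  <!ENTITY c \"&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;\">\n"
          + "]>\n"
          + "<r>&c;</r>";

    /** Returns true if the document parses, false if the parser rejects it. */
    static boolean parses(String xml, Integer expansionLimit) {
        SAXParser parser;
        try {
            parser = SAXParserFactory.newInstance().newSAXParser();
            if (expansionLimit != null) {
                // Only this parser instance gets the lower limit; the rest of the JVM keeps its default.
                parser.setProperty(ENTITY_EXPANSION_LIMIT, String.valueOf(expansionLimit));
            }
        } catch (ParserConfigurationException | SAXException e) {
            // e.g. SAXNotRecognizedException if a non-JDK parser (third-party Xerces) is on the classpath
            throw new IllegalStateException("could not configure parser", e);
        }
        try {
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), new DefaultHandler());
            return true;
        } catch (SAXException rejected) {
            return false; // fatal error, e.g. too many entity expansions
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default limit accepts: " + parses(NESTED_ENTITIES, null));
        System.out.println("limit of 20 accepts:   " + parses(NESTED_ENTITIES, 20));
    }
}
```

Failing fast with a parse error is the point of the lower limit: the input is rejected after a bounded amount of work instead of spending a long time appending expanded text.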
[jira] [Commented] (TIKA-2552) Upgrade to POI 4.0.0 when available
[ https://issues.apache.org/jira/browse/TIKA-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617718#comment-16617718 ] Konstantin Gribov commented on TIKA-2552: - [~TigerC10], Tim rolled RC1 this weekend, so, hopefully this week.
Re: [VOTE] Release Apache Tika 1.19 Candidate #1
Tim, thanks for staging the new release.

All LGTM: builds with all tests with OpenJDK 8u181 & 10.0.2+13 on ArchLinux (without tesseract/ocr). All checksums are correct, gpg signatures are valid.

[x] +1 Release this package as Apache Tika 1.19
[ ] -1 Do not release this package because...

--
Best regards,
Konstantin Gribov
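The votes report verifying the release candidate's checksum against the SHA-512 value published in the candidate announcement. A minimal sketch of that check in Java, streaming the archive through `java.security.MessageDigest` and comparing hex strings; the `Sha512Check` name and its command-line usage are illustrative assumptions, and GPG signature verification is not covered.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for checking a downloaded archive against a published SHA-512.
class Sha512Check {
    /** Streams the input through SHA-512 and returns the digest as lowercase hex. */
    static String sha512Hex(InputStream in) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 not available", e);
        }
        byte[] buf = new byte[8192];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b)); // %x on a Byte prints its unsigned value
        }
        return hex.toString();
    }

    /** Compares a computed digest with a published one, ignoring case and surrounding whitespace. */
    static boolean matches(String computedHex, String publishedHex) {
        return computedHex.equalsIgnoreCase(publishedHex.trim());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Sha512Check <archive> <published-sha512-hex>");
            System.exit(2);
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            boolean ok = matches(sha512Hex(in), args[1]);
            System.out.println(ok ? "checksum OK" : "checksum MISMATCH");
            System.exit(ok ? 0 : 1);
        }
    }
}
```

When copying the value from the announcement, leave out the trailing period that ends its sentence.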
Re: [VOTE] Release Apache Tika 1.19 Candidate #1
Hi Tim,
thanks!

[INFO] Apache Tika parent ............................. SUCCESS [ 5.138 s]
[INFO] Apache Tika core ............................... SUCCESS [ 58.722 s]
[INFO] Apache Tika parsers ............................ SUCCESS [04:20 min]
[INFO] Apache Tika XMP ................................ SUCCESS [ 10.705 s]
[INFO] Apache Tika serialization ...................... SUCCESS [ 6.820 s]
[INFO] Apache Tika batch .............................. SUCCESS [02:32 min]
[INFO] Apache Tika language detection ................. SUCCESS [ 5.612 s]
[INFO] Apache Tika application ........................ SUCCESS [01:27 min]
[INFO] Apache Tika OSGi bundle ........................ SUCCESS [ 47.224 s]
[INFO] Apache Tika translate .......................... SUCCESS [ 5.712 s]
[INFO] Apache Tika server ............................. SUCCESS [01:23 min]
[INFO] Apache Tika examples ........................... SUCCESS [ 24.945 s]
[INFO] Apache Tika Java-7 Components .................. SUCCESS [ 6.356 s]
[INFO] Apache Tika eval ............................... SUCCESS [ 51.488 s]
[INFO] Apache Tika Deep Learning (powered by DL4J) .... SUCCESS [05:41 min]
[INFO] Apache Tika Natural Language Processing ........ SUCCESS [ 56.145 s]
[INFO] Apache Tika .................................... SUCCESS [ 0.088 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 20:09 min
[INFO] Finished at: 2018-09-17T17:47:18+03:00
[INFO] Final Memory: 187M/1674M

+1 To release.

Did only basic stuff, centOS 7.4

Oleg

On Sat, Sep 15, 2018 at 2:42 PM Tim Allison wrote:
> A candidate for the Tika 1.19 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
> https://github.com/apache/tika/tree/1.19-rc1/
>
> The SHA-512 checksum of the archive is
>
> b0ec5f1746ceb002e3f33d2a55680952dad63ec9421f5245d28e33398d077547b88a6f521a4b76563f38bf887aa33b8a07de318c5c546039623be3ae65d34eec.
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1036/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 1.19.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.19
> [ ] -1 Do not release this package because...
>
> Here's my +1.
>
> Cheers,
>
> Tim
[jira] [Commented] (TIKA-2552) Upgrade to POI 4.0.0 when available
[ https://issues.apache.org/jira/browse/TIKA-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617586#comment-16617586 ] Ian Cervantez commented on TIKA-2552: - Is there an ETA on 1.19 yet?
[jira] [Resolved] (TIKA-2728) Update poi dependency from 3.17 to 4.0.0 (For Java 9/10 compatibility)
[ https://issues.apache.org/jira/browse/TIKA-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-2728. -- Resolution: Duplicate > Update poi dependency from 3.17 to 4.0.0 (For Java 9/10 compatibility) > -- > > Key: TIKA-2728 > URL: https://issues.apache.org/jira/browse/TIKA-2728 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.18 > Environment: Java 9 / Java 10 >Reporter: Ian Cervantez >Priority: Major > Labels: dependency-upgrade > > The poi project has a bug in version 3.17 when running on Java 9/10: > {code:none} > WARNING: Illegal reflective access by org.apache.poi.util.DocumentHelper > (file:/usr/local/tomcat/webapps/ROOT/WEB-INF/lib/poi-ooxml-3.17.jar) to > method > com.sun.org.apache.xerces.internal.util.SecurityManager.setEntityExpansionLimit(int) > WARNING: Please consider reporting this to the maintainers of > org.apache.poi.util.DocumentHelper > WARNING: Use --illegal-access=warn to enable warnings of further illegal > reflective access operations > WARNING: All illegal access operations will be denied in a future > release{code} > See [https://bz.apache.org/bugzilla/show_bug.cgi?id=61564] for more details. > > There was never a 3.17.1 release, but there has been a 4.0.0 release. > Suggest updating. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (TIKA-2728) Update poi dependency from 3.17 to 4.0.0 (For Java 9/10 compatibility)
[ https://issues.apache.org/jira/browse/TIKA-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ian Cervantez updated TIKA-2728: Description: The poi project has a bug in version 3.17 when running on Java 9/10: {code:none} WARNING: Illegal reflective access by org.apache.poi.util.DocumentHelper (file:/usr/local/tomcat/webapps/ROOT/WEB-INF/lib/poi-ooxml-3.17.jar) to method com.sun.org.apache.xerces.internal.util.SecurityManager.setEntityExpansionLimit(int) WARNING: Please consider reporting this to the maintainers of org.apache.poi.util.DocumentHelper WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future release{code} See [https://bz.apache.org/bugzilla/show_bug.cgi?id=61564] for more details. There was never a 3.17.1 release, but there has been a 4.0.0 release. Suggest updating. was: The poi project has a bug in version 3.17 when running on Java 9/10: {code:java} WARNING: Illegal reflective access by org.apache.poi.util.DocumentHelper (file:/usr/local/tomcat/webapps/ROOT/WEB-INF/lib/poi-ooxml-3.17.jar) to method com.sun.org.apache.xerces.internal.util.SecurityManager.setEntityExpansionLimit(int) WARNING: Please consider reporting this to the maintainers of org.apache.poi.util.DocumentHelper WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future release{code} See [https://bz.apache.org/bugzilla/show_bug.cgi?id=61564] for more details. There was never a 3.17 release, but there has been a 4.0.0 release. Suggest updating. 
[jira] [Created] (TIKA-2728) Update poi dependency from 3.17 to 4.0.0 (For Java 9/10 compatibility)
Ian Cervantez created TIKA-2728: --- Summary: Update poi dependency from 3.17 to 4.0.0 (For Java 9/10 compatibility) Key: TIKA-2728 URL: https://issues.apache.org/jira/browse/TIKA-2728 Project: Tika Issue Type: Bug Components: parser Affects Versions: 1.18 Environment: Java 9 / Java 10 Reporter: Ian Cervantez The poi project has a bug in version 3.17 when running on Java 9/10: {code:java} WARNING: Illegal reflective access by org.apache.poi.util.DocumentHelper (file:/usr/local/tomcat/webapps/ROOT/WEB-INF/lib/poi-ooxml-3.17.jar) to method com.sun.org.apache.xerces.internal.util.SecurityManager.setEntityExpansionLimit(int) WARNING: Please consider reporting this to the maintainers of org.apache.poi.util.DocumentHelper WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future release{code} See [https://bz.apache.org/bugzilla/show_bug.cgi?id=61564] for more details. There was never a 3.17 release, but there has been a 4.0.0 release. Suggest updating. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop
[ https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617516#comment-16617516 ] Slava G edited comment on TIKA-2727 at 9/17/18 1:22 PM: Great!!! Thanks. Is jdk.xml.entityExpansionLimit relevant only to TIKA, or can it affect anything else XML-related? Also, does 1.19 use the same fix, or is it more TIKA-specific? was (Author: slavago): Great!!! Thanks. Is jdk.xml.entityExpansionLimit relevant only to TIKA, or can it affect anything else XML-related? > Parsing and detect mime type of XML file stuck in infinite loop > --- > > Key: TIKA-2727 > URL: https://issues.apache.org/jira/browse/TIKA-2727 > Project: Tika > Issue Type: Bug > Components: detector, parser >Affects Versions: 1.17 >Reporter: Slava G >Assignee: Tim Allison >Priority: Major > Fix For: 1.19, 2.0.0 > > Attachments: 1_e3e13f0e-7085-4000-a558-5d255ed7a944.xml > > > Hi, > I'm trying to parse (even mime type detect) some XML file that it's not > large, but kinda tricky and my process hangs on : > XMLStringBuffer.append(char[], int, int) line: not available > XMLStringBuffer.append(XMLString) line: not available > XMLNSDocumentScannerImpl(XMLScanner).scanAttributeValue(XMLString, XMLString, > String, boolean, String) line: not available > XMLNSDocumentScannerImpl.scanAttribute(XMLAttributesImpl) line: not available > XMLNSDocumentScannerImpl.scanStartElement() line: not available > XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook() line: not > available > XMLNSDocumentScannerImpl$NSContentDispatcher(XMLDocumentFragmentScannerImpl$FragmentContentDispatcher).dispatch(boolean) > line: not available > XMLNSDocumentScannerImpl(XMLDocumentFragmentScannerImpl).scanDocument(boolean) > line: not available > XIncludeAwareParserConfiguration(XML11Configuration).parse(boolean) line: not > available > XIncludeAwareParserConfiguration(XML11Configuration).parse(XMLInputSource) > line: not available > 
SAXParserImpl$JAXPSAXParser(XMLParser).parse(XMLInputSource) line: not > available > SAXParserImpl$JAXPSAXParser(AbstractSAXParser).parse(InputSource) line: not > available > SAXParserImpl$JAXPSAXParser.parse(InputSource) line: not available > SAXParserImpl.parse(InputSource, DefaultHandler) line: not available > SAXParserImpl(SAXParser).parse(InputStream, DefaultHandler) line: 195 > XmlRootExtractor.extractRootElement(InputStream) line: 62 > XmlRootExtractor.extractRootElement(byte[]) line: 42 > MimeTypes.getMimeType(byte[]) line: 212 > MimeTypes.detect(InputStream, Metadata) line: 494 > DefaultDetector(CompositeDetector).detect(InputStream, Metadata) line: 84 > > Please see attached XML file. > Please advise. > Thanks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
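On the scope question: `jdk.xml.entityExpansionLimit` is a JVM-wide system property read by the JDK's built-in JAXP parsers, so passing it on the command line is likely to affect every XML parse in that JVM that goes through those parsers, not only Tika's. The sketch below shows one way to cap expansions on a single `SAXParser` instead. It assumes the JDK's built-in parser (no third-party Xerces on the classpath) and its legacy API property name `http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit`; the class and method names are hypothetical, and this is not presented as Tika's own fix.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class PerParserExpansionLimit {

    // Legacy JAXP API property understood by the JDK's built-in parser
    // (assumption: a third-party parser on the classpath may not recognize it).
    static final String EXPANSION_LIMIT =
            "http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit";

    // Hypothetical helper: a tiny document that references the internal entity &a; `refs` times.
    static String docWithExpansions(int refs) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE r [<!ENTITY a \"x\">]><r>");
        for (int i = 0; i < refs; i++) {
            sb.append("&a;");
        }
        return sb.append("</r>").toString();
    }

    // Creates a parser; a non-null `limit` caps entity expansions for this parser only,
    // leaving other parsers in the JVM at their default limit.
    static SAXParser newParser(String limit) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            if (limit != null) {
                parser.setProperty(EXPANSION_LIMIT, limit);
            }
            return parser;
        } catch (ParserConfigurationException | SAXException e) {
            // Also covers SAXNotRecognizedException if the property is unsupported.
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the parse completes, false if the parser reports a fatal error
    // (an exceeded expansion limit surfaces as a SAXParseException).
    static boolean parses(SAXParser parser, String xml) {
        try {
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String doc = docWithExpansions(20);
        System.out.println("default parser: " + parses(newParser(null), doc));
        System.out.println("limited parser: " + parses(newParser("10"), doc));
    }
}
```

Scoping the limit to the parser that handles untrusted input avoids changing behavior for unrelated XML code in the same JVM.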
[jira] [Commented] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop
[ https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617516#comment-16617516 ] Slava G commented on TIKA-2727: --- Great!!! Thanks. Is jdk.xml.entityExpansionLimit relevant only to TIKA, or can it affect anything else XML-related? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop
[ https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617480#comment-16617480 ] Tim Allison commented on TIKA-2727: --- Until you can upgrade to 1.19, you should be able to limit entity expansion via the command line, e.g.: {{-Djdk.xml.entityExpansionLimit=10}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
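A minimal Java sketch of what the `-Djdk.xml.entityExpansionLimit=10` workaround does, assuming the JDK's built-in JAXP SAX parser is in use (no third-party Xerces on the classpath). The class and helper names are hypothetical. Setting the system property in code before any parser is created should behave like passing it on the command line: a document with 5 entity expansions still parses, while one with 20 is rejected with a `SAXException` instead of being expanded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EntityExpansionLimitDemo {

    // Hypothetical helper: a tiny document that references the internal entity &a; `refs` times.
    static String docWithExpansions(int refs) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE r [<!ENTITY a \"x\">]><r>");
        for (int i = 0; i < refs; i++) {
            sb.append("&a;");
        }
        return sb.append("</r>").toString();
    }

    // Returns true if the parse completes, false if the parser reports a fatal error
    // (an exceeded expansion limit surfaces as a SAXParseException).
    static boolean parses(String xml) {
        SAXParser parser;
        try {
            parser = SAXParserFactory.newInstance().newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("could not create a SAX parser", e);
        }
        try {
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-code equivalent of -Djdk.xml.entityExpansionLimit=10. Set it before creating
        // parsers: the JDK parser reads the property when a parser instance is created.
        System.setProperty("jdk.xml.entityExpansionLimit", "10");
        System.out.println("5 expansions parsed:  " + parses(docWithExpansions(5)));
        System.out.println("20 expansions parsed: " + parses(docWithExpansions(20)));
    }
}
```

Because the property is JVM-wide, it caps every parser created afterwards through the JDK's JAXP implementation, not just the one Tika uses.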
[jira] [Resolved] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop
[ https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2727. --- Resolution: Fixed Fix Version/s: 2.0.0 1.19 Thank you for sharing this problem with us. We've already fixed this in 1.19, which is currently in the voting process for release and should be out in a few days, unless there are surprises... -- This message was sent by Atlassian JIRA (v7.6.3#76005)