[jira] [Comment Edited] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop

2018-09-17 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617924#comment-16617924
 ] 

Tim Allison edited comment on TIKA-2727 at 9/17/18 7:16 PM:


{quote}Does version 1.19 solve this issue more delicately?
{quote}
Somewhat. If the user sets the above option, we respect it.  Otherwise, we 
set the limit to 20 expansions for our XML parsers.
{quote}I'm afraid that 1.19 will bring the same issue back to us. 
{quote}
Y.  That wouldn't surprise me.  If you're able to help us figure out what's 
going on, we can try to fix it. :D


was (Author: talli...@mitre.org):
{quote}Does version 1.19 solve this issue more delicately?
{quote}
Somewhat. If the user sets the above option, we respect it.  Otherwise, we 
set the limit to 20 expansions for our XML parsers.
{quote}I'm afraid that 1.19 will bring the same issue back to us. 
{quote}
Y.  That wouldn't surprise me.  If you're able to help us figure out what's 
going on, we can fix it. :D
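For readers wondering what a per-parser expansion limit looks like, here is a minimal sketch. It assumes the JDK's built-in SAX parser, which accepts the JAXP property `http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit`. It is not Tika's code: the class `LimitedRootExtractor`, its `extractRoot` method, and the sample document are hypothetical. Like `XmlRootExtractor` in the reported stack trace, it stops at the root element, and the nested entities sit in a root attribute because that is where the reported hang occurs (`scanAttributeValue` under `scanRootElementHook`).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: find the root element name under a per-parser entity expansion cap. */
public class LimitedRootExtractor {

    /** JAXP property honored by the JDK's built-in (Xerces-derived) parser. */
    static final String EXPANSION_LIMIT =
            "http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit";

    /** Nested entities inside a root attribute: 1 + 10 + 100 = 111 expansions. */
    static final String NESTED_ENTITIES =
            "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE r [\n"
            + "<!ENTITY a \"aaaaaaaaaa\">\n"
            + "<!ENTITY b \"&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;\">\n"
            + "<!ENTITY c \"&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;\">\n"
            + "]>\n"
            + "<r x=\"&c;\"/>";

    /** Thrown from the handler to stop parsing as soon as the root element is reported. */
    private static final class RootFound extends SAXException {
        final String name;

        RootFound(String name) {
            super("root element found");
            this.name = name;
        }
    }

    /** Returns the root element's local name, or null if the parser rejected the document. */
    static String extractRoot(byte[] xml, int expansionLimit) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            // Per-parser cap: applies to this parser instance only, not to the whole JVM.
            parser.setProperty(EXPANSION_LIMIT, String.valueOf(expansionLimit));
            parser.parse(new ByteArrayInputStream(xml), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName,
                        Attributes atts) throws SAXException {
                    throw new RootFound(localName);
                }
            });
            return null; // parse finished without reporting an element
        } catch (RootFound found) {
            return found.name;
        } catch (SAXParseException rejected) {
            // e.g. the expansion limit was exceeded while scanning the root's attributes
            return null;
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = NESTED_ENTITIES.getBytes(StandardCharsets.UTF_8);
        System.out.println(extractRoot(xml, 1000)); // generous cap: root name returned
        System.out.println(extractRoot(xml, 20));   // tight cap: document rejected
    }
}
```

With a cap of 20, the 111 expansions in the sample exceed the limit and the parse is rejected instead of growing the attribute buffer; with 1000, the root name `r` comes back. Because the cap is set on the parser instance, nothing else in the JVM is affected.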

> Parsing and detect mime type of XML file stuck in infinite loop
> ---
>
> Key: TIKA-2727
> URL: https://issues.apache.org/jira/browse/TIKA-2727
> Project: Tika
>  Issue Type: Bug
>  Components: detector, parser
>Affects Versions: 1.17
>Reporter: Slava G
>Assignee: Tim Allison
>Priority: Major
> Fix For: 1.19, 2.0.0
>
> Attachments: 1_e3e13f0e-7085-4000-a558-5d255ed7a944.xml
>
>
> Hi,
> I'm trying to parse (or even just detect the MIME type of) an XML file that is not 
> large, but kind of tricky, and my process hangs on:
> XMLStringBuffer.append(char[], int, int) line: not available
> XMLStringBuffer.append(XMLString) line: not available
> XMLNSDocumentScannerImpl(XMLScanner).scanAttributeValue(XMLString, XMLString, String, boolean, String) line: not available
> XMLNSDocumentScannerImpl.scanAttribute(XMLAttributesImpl) line: not available
> XMLNSDocumentScannerImpl.scanStartElement() line: not available
> XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook() line: not available
> XMLNSDocumentScannerImpl$NSContentDispatcher(XMLDocumentFragmentScannerImpl$FragmentContentDispatcher).dispatch(boolean) line: not available
> XMLNSDocumentScannerImpl(XMLDocumentFragmentScannerImpl).scanDocument(boolean) line: not available
> XIncludeAwareParserConfiguration(XML11Configuration).parse(boolean) line: not available
> XIncludeAwareParserConfiguration(XML11Configuration).parse(XMLInputSource) line: not available
> SAXParserImpl$JAXPSAXParser(XMLParser).parse(XMLInputSource) line: not available
> SAXParserImpl$JAXPSAXParser(AbstractSAXParser).parse(InputSource) line: not available
> SAXParserImpl$JAXPSAXParser.parse(InputSource) line: not available
> SAXParserImpl.parse(InputSource, DefaultHandler) line: not available
> SAXParserImpl(SAXParser).parse(InputStream, DefaultHandler) line: 195
> XmlRootExtractor.extractRootElement(InputStream) line: 62
> XmlRootExtractor.extractRootElement(byte[]) line: 42
> MimeTypes.getMimeType(byte[]) line: 212
> MimeTypes.detect(InputStream, Metadata) line: 494
> DefaultDetector(CompositeDetector).detect(InputStream, Metadata) line: 84
>  
> Please see attached XML file.
> Please advise.
> Thanks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop

2018-09-17 Thread Slava G (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617928#comment-16617928
 ] 

Slava G commented on TIKA-2727:
---

I will definitely work to provide as much information as possible to solve this.

Thanks 



[jira] [Commented] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop

2018-09-17 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617924#comment-16617924
 ] 

Tim Allison commented on TIKA-2727:
---

{quote}Does version 1.19 solve this issue more delicately?
{quote}
Somewhat. If the user sets the above option, we respect it.  Otherwise, we 
set the limit to 20 expansions for our XML parsers.
{quote}I'm afraid that 1.19 will bring the same issue back to us. 
{quote}
Y.  That wouldn't surprise me.  If you're able to help us figure out what's 
going on, we can fix it. :D



[jira] [Comment Edited] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop

2018-09-17 Thread Slava G (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617833#comment-16617833
 ] 

Slava G edited comment on TIKA-2727 at 9/17/18 5:23 PM:


I'm using TIKA directly in my code.

Does version 1.19 solve this issue more delicately?

Also, we're using 1.17 because, when we switched to 1.18, parsing failed for many 
customers' data due to a very strange error:
https://issues.apache.org/jira/browse/TIKA-2676

I'm afraid that 1.19 will bring the same issue back to us. 

 


was (Author: slavago):
I'm using TIKA directly in my code.

Does version 1.19 solve this issue more delicately?

Also, we're using 1.17 because, when we switched to 1.18, parsing failed for many 
customers' data due to a very strange error that was not discovered by our 
QA. So I'm afraid that 1.19 will bring the same issue back to us. 

 



Re: 1.19?

2018-09-17 Thread Konstantin Gribov
Hi, folks.

While building with OpenJDK 10, I found some potential issues and/or places
for future improvement:

   - minor javadoc issues (I'll fix them, and as they are trivial I don't
   see them as a blocker for rc1->release promotion): usage of a bare `>` sign
   instead of `&gt;`, references to unimported or absent methods/classes, etc.;
   - the new javadoc will switch to HTML5 instead of HTML4 and recommends
   choosing one explicitly;
   - some bnd (OSGi bundler) warnings related to:
      - versioning (the `aQute.bnd.annotation.Version` annotation is deprecated
      since bnd 3.2);
      - exporting one package from different bundles (like
      `o.a.tika.language.translate` from both tika-core and tika-translate);
      - using an OSGi activator from another bundle;
      - some private references, which could be an issue but need additional
      research;
   - forbiddenapis warnings, because checks are enabled for some
   types/classes not present on the classpath (like commons-io in tika-core), but
   it's just noise and not even a minor issue.
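As an illustration of the first item: javadoc's doclint can flag a bare `>` in HTML text. A small sketch of two common escapes, the `&gt;` entity and the `{@code}`/`{@literal}` inline tags; the class and method are hypothetical and not taken from Tika:

```java
/** Hypothetical class, used only to show javadoc-friendly ways to write a comparison. */
public class JavadocEscapes {

    /**
     * Returns {@code true} when {@code value > threshold}.
     * <p>
     * In running text, write the sign as an HTML entity (value &gt; threshold)
     * or as {@literal >}; doclint may report a bare angle bracket as bad HTML.
     */
    public static boolean exceeds(int value, int threshold) {
        return value > threshold;
    }
}
```

Inside `{@code ...}` and `{@literal ...}` the sign is rendered literally, so no entity is needed there.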


On Sat, Sep 15, 2018 at 2:29 PM Tim Allison  wrote:

> I found some areas for improvement, but no surprises.  I'm going to
> cut 1.19-rc1 now.
> On Thu, Sep 13, 2018 at 9:04 PM Tim Allison  wrote:
> >
> > Reports are here:
> > http://162.242.228.174/reports/tika-1.18V1.19-pre-rc1.tgz
> >
> > There are a few things I want to look into...tomorrow.  Let me know if
> > you see anything surprising.
> > On Tue, Sep 11, 2018 at 6:17 AM Tim Allison  wrote:
> > >
> > > Unless there are objections, I’ll kick off the regression tests today.
> > >
> > > On Thu, Sep 6, 2018 at 10:34 AM Tim Allison 
> wrote:
> > >>
> > >> All,
> > >>
> > >>   POI 4.0.0 is available.  I'm integrating that now, and I plan to
> > >> kick off the full regression tests shortly.  Are there any other
> > >> blockers on 1.19?  Anything else we want to get in?
> > >>
> > >>   If you've made commits against master but not branch_1x, those
> > >> changes won't make it into 1.19.  I've cherry-picked a few, but might
> > >> not have gotten all of them.
> > >>
> > >>Cheers,
> > >>
> > >>Tim
>
-- 

Best regards,
Konstantin Gribov


[jira] [Commented] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop

2018-09-17 Thread Slava G (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617833#comment-16617833
 ] 

Slava G commented on TIKA-2727:
---

I'm using TIKA directly in my code.

Does version 1.19 solve this issue more delicately?

Also, we're using 1.17 because, when we switched to 1.18, parsing failed for many 
customers' data due to a very strange error that was not discovered by our 
QA. So I'm afraid that 1.19 will bring the same issue back to us. 

 



[jira] [Commented] (TIKA-2552) Upgrade to POI 4.0.0 when available

2018-09-17 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617788#comment-16617788
 ] 

Tim Allison commented on TIKA-2552:
---

[~TigerC10], y, give it a try: 
[https://lists.apache.org/thread.html/f078df60365f496b369d97fdf51f565047f15447ca454239579508aa@%3Cdev.tika.apache.org%3E]

 

> Upgrade to POI 4.0.0 when available
> ---
>
> Key: TIKA-2552
> URL: https://issues.apache.org/jira/browse/TIKA-2552
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Blocker
> Fix For: 1.19, 2.0.0
>
> Attachments: TIKA-2552_--_first_draft.patch
>
>






[jira] [Commented] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop

2018-09-17 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617783#comment-16617783
 ] 

Tim Allison commented on TIKA-2727:
---

If you're using Tika directly within your code (DON'T DO THIS!), setting that 
property will affect everything in your JVM that pays attention to it. :D 

 

If you're running Tika in batch mode, you can limit it to the child process 
with, e.g., {{-JDjdk.xml.entityExpansionLimit=10}}.  Or, if you are using the new 
robust tika-server feature available in Tika 1.19, you can specify it for the 
child JVM, too.  Or, if you're using the ForkParser, you can specify it there.
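The common thread in these options is that the `-D` flag goes on a child JVM's command line, so the parent JVM is untouched. A minimal sketch of that idea with plain `ProcessBuilder`; it does not use Tika's batch, tika-server, or ForkParser APIs, and `ChildJvmLauncher` is a hypothetical name. The `main` method assumes the OpenJDK launcher option `-XshowSettings:properties` to let the child report its system properties.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical launcher: the XML limit is set only on the child JVM's command line. */
public class ChildJvmLauncher {

    /** Builds "java -Djdk.xml.entityExpansionLimit=N <childArgs...>" for the current Java install. */
    static List<String> buildCommand(int expansionLimit, List<String> childArgs) {
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Djdk.xml.entityExpansionLimit=" + expansionLimit);
        cmd.addAll(childArgs);
        return cmd;
    }

    /** Runs the command with stderr merged into stdout and returns everything it printed. */
    static String run(List<String> cmd) {
        try {
            Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (InputStream in = p.getInputStream()) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
            p.waitFor();
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The child prints its system properties (on stderr, merged into the captured output).
        String childOutput = run(buildCommand(10,
                Arrays.asList("-XshowSettings:properties", "-version")));
        System.out.println("child sees limit: "
                + childOutput.contains("jdk.xml.entityExpansionLimit = 10"));
        // The parent JVM's own property is not touched by launching the child.
        System.out.println("parent property: " + System.getProperty("jdk.xml.entityExpansionLimit"));
    }
}
```

In a real deployment, the child's arguments would be whatever runs the parsing work, so a runaway document can exhaust or hang only that process.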



[jira] [Commented] (TIKA-2552) Upgrade to POI 4.0.0 when available

2018-09-17 Thread Konstantin Gribov (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617718#comment-16617718
 ] 

Konstantin Gribov commented on TIKA-2552:
-

[~TigerC10], Tim rolled RC1 this weekend, so hopefully this week.



Re: [VOTE] Release Apache Tika 1.19 Candidate #1

2018-09-17 Thread Konstantin Gribov
Tim, thanks for staging the new release.

All LGTM: builds with all tests with OpenJDK 8u181 & 10.0.2+13 on Arch Linux
(without Tesseract/OCR). All checksums are correct, and the GPG signatures are
valid.

[x] +1 Release this package as Apache Tika 1.19
[ ] -1 Do not release this package because...
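Checking the published SHA-512 can be done with any standard tool. As a Java sketch using only `java.security.MessageDigest` (the `Sha512Check` class is hypothetical, and the case-insensitive, whitespace-tolerant comparison is an assumption about how the checksum gets pasted from the vote email):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for comparing a downloaded archive with a published SHA-512 value. */
public class Sha512Check {

    private static MessageDigest newDigest() {
        try {
            return MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    /** Lower-case hex, two characters per byte. */
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    static String sha512Hex(byte[] data) {
        return toHex(newDigest().digest(data));
    }

    /** Streams the file through the digest so a large archive need not fit in memory. */
    static String sha512Hex(Path file) throws IOException {
        MessageDigest md = newDigest();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    /** Case-insensitive comparison, ignoring whitespace around the pasted value. */
    static boolean matches(String actualHex, String expectedHex) {
        return actualHex.equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Sha512Check <archive> <expected-sha512-hex>
        String actual = sha512Hex(Paths.get(args[0]));
        System.out.println(matches(actual, args[1]) ? "OK" : "MISMATCH, got " + actual);
    }
}
```

GPG signature verification is a separate step and is not covered by this sketch.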


-- 

Best regards,
Konstantin Gribov


Re: [VOTE] Release Apache Tika 1.19 Candidate #1

2018-09-17 Thread Oleg Tikhonov
Hi Tim,
thanks !

[INFO] Apache Tika parent . SUCCESS [ 5.138 s]
[INFO] Apache Tika core ... SUCCESS [ 58.722 s]
[INFO] Apache Tika parsers  SUCCESS [04:20 min]
[INFO] Apache Tika XMP  SUCCESS [ 10.705 s]
[INFO] Apache Tika serialization .. SUCCESS [ 6.820 s]
[INFO] Apache Tika batch .. SUCCESS [02:32 min]
[INFO] Apache Tika language detection . SUCCESS [ 5.612 s]
[INFO] Apache Tika application  SUCCESS [01:27 min]
[INFO] Apache Tika OSGi bundle  SUCCESS [ 47.224 s]
[INFO] Apache Tika translate .. SUCCESS [ 5.712 s]
[INFO] Apache Tika server . SUCCESS [01:23 min]
[INFO] Apache Tika examples ... SUCCESS [ 24.945 s]
[INFO] Apache Tika Java-7 Components .. SUCCESS [ 6.356 s]
[INFO] Apache Tika eval ... SUCCESS [ 51.488 s]
[INFO] Apache Tika Deep Learning (powered by DL4J)  SUCCESS [05:41 min]
[INFO] Apache Tika Natural Language Processing  SUCCESS [ 56.145 s]
[INFO] Apache Tika  SUCCESS [ 0.088 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 20:09 min
[INFO] Finished at: 2018-09-17T17:47:18+03:00
[INFO] Final Memory: 187M/1674M
+1 To release.

Did only basic stuff, on CentOS 7.4.

Oleg

On Sat, Sep 15, 2018 at 2:42 PM Tim Allison  wrote:

> A candidate for the Tika 1.19 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>   https://github.com/apache/tika/tree/1.19-rc1/
>
> The SHA-512 checksum of the archive is
>
> b0ec5f1746ceb002e3f33d2a55680952dad63ec9421f5245d28e33398d077547b88a6f521a4b76563f38bf887aa33b8a07de318c5c546039623be3ae65d34eec.
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1036/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 1.19.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.19
> [ ] -1 Do not release this package because...
>
> Here's my +1.
>
> Cheers,
>
>   Tim
>


[jira] [Commented] (TIKA-2552) Upgrade to POI 4.0.0 when available

2018-09-17 Thread Ian Cervantez (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617586#comment-16617586
 ] 

Ian Cervantez commented on TIKA-2552:
-

Is there an ETA on 1.19 yet?



[jira] [Resolved] (TIKA-2728) Update poi dependency from 3.17 to 4.0.0 (For Java 9/10 compatibility)

2018-09-17 Thread Nick Burch (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Burch resolved TIKA-2728.
--
Resolution: Duplicate

> Update poi dependency from 3.17 to 4.0.0 (For Java 9/10 compatibility)
> --
>
> Key: TIKA-2728
> URL: https://issues.apache.org/jira/browse/TIKA-2728
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.18
> Environment: Java 9 / Java 10
>Reporter: Ian Cervantez
>Priority: Major
>  Labels: dependency-upgrade
>
> The poi project has a bug in version 3.17 when running on Java 9/10:
> {code:none}
> WARNING: Illegal reflective access by org.apache.poi.util.DocumentHelper 
> (file:/usr/local/tomcat/webapps/ROOT/WEB-INF/lib/poi-ooxml-3.17.jar) to 
> method 
> com.sun.org.apache.xerces.internal.util.SecurityManager.setEntityExpansionLimit(int)
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.poi.util.DocumentHelper
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future 
> release{code}
> See [https://bz.apache.org/bugzilla/show_bug.cgi?id=61564] for more details.
>  
> There was never a 3.17.1 release, but there has been a 4.0.0 release.  
> Suggest updating.





[jira] [Updated] (TIKA-2728) Update poi dependency from 3.17 to 4.0.0 (For Java 9/10 compatibility)

2018-09-17 Thread Ian Cervantez (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Cervantez updated TIKA-2728:

Description: 
The poi project has a bug in version 3.17 when running on Java 9/10:
{code:none}
WARNING: Illegal reflective access by org.apache.poi.util.DocumentHelper 
(file:/usr/local/tomcat/webapps/ROOT/WEB-INF/lib/poi-ooxml-3.17.jar) to method 
com.sun.org.apache.xerces.internal.util.SecurityManager.setEntityExpansionLimit(int)
WARNING: Please consider reporting this to the maintainers of 
org.apache.poi.util.DocumentHelper
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release{code}
See [https://bz.apache.org/bugzilla/show_bug.cgi?id=61564] for more details.

 

There was never a 3.17.1 release, but there has been a 4.0.0 release.  Suggest 
updating.

  was:
The poi project has a bug in version 3.17 when running on Java 9/10:
{code:java}
WARNING: Illegal reflective access by org.apache.poi.util.DocumentHelper 
(file:/usr/local/tomcat/webapps/ROOT/WEB-INF/lib/poi-ooxml-3.17.jar) to method 
com.sun.org.apache.xerces.internal.util.SecurityManager.setEntityExpansionLimit(int)
WARNING: Please consider reporting this to the maintainers of 
org.apache.poi.util.DocumentHelper
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release{code}
See [https://bz.apache.org/bugzilla/show_bug.cgi?id=61564] for more details.

 

There was never a 3.17 release, but there has been a 4.0.0 release.  Suggest 
updating.




[jira] [Created] (TIKA-2728) Update poi dependency from 3.17 to 4.0.0 (For Java 9/10 compatibility)

2018-09-17 Thread Ian Cervantez (JIRA)
Ian Cervantez created TIKA-2728:
---

 Summary: Update poi dependency from 3.17 to 4.0.0 (For Java 9/10 
compatibility)
 Key: TIKA-2728
 URL: https://issues.apache.org/jira/browse/TIKA-2728
 Project: Tika
  Issue Type: Bug
  Components: parser
Affects Versions: 1.18
 Environment: Java 9 / Java 10
Reporter: Ian Cervantez


The poi project has a bug in version 3.17 when running on Java 9/10:
{code:java}
WARNING: Illegal reflective access by org.apache.poi.util.DocumentHelper 
(file:/usr/local/tomcat/webapps/ROOT/WEB-INF/lib/poi-ooxml-3.17.jar) to method 
com.sun.org.apache.xerces.internal.util.SecurityManager.setEntityExpansionLimit(int)
WARNING: Please consider reporting this to the maintainers of 
org.apache.poi.util.DocumentHelper
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release{code}
See [https://bz.apache.org/bugzilla/show_bug.cgi?id=61564] for more details.

 

There was never a 3.17 release, but there has been a 4.0.0 release.  Suggest 
updating.





[jira] [Comment Edited] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop

2018-09-17 Thread Slava G (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617516#comment-16617516
 ] 

Slava G edited comment on TIKA-2727 at 9/17/18 1:22 PM:


Great!!! Thanks.

Is jdk.xml.entityExpansionLimit relevant only to TIKA, or can it affect anything 
else XML-related?

Also, does 1.19 use the same fix, or is its fix more TIKA-specific?


was (Author: slavago):
Great!!! Thanks.

Is jdk.xml.entityExpansionLimit relevant only to TIKA, or can it affect anything 
else XML-related?

 

> Parsing and detect mime type of XML file stuck in infinite loop
> ---
>
> Key: TIKA-2727
> URL: https://issues.apache.org/jira/browse/TIKA-2727
> Project: Tika
>  Issue Type: Bug
>  Components: detector, parser
>Affects Versions: 1.17
>Reporter: Slava G
>Assignee: Tim Allison
>Priority: Major
> Fix For: 1.19, 2.0.0
>
> Attachments: 1_e3e13f0e-7085-4000-a558-5d255ed7a944.xml
>
>
> Hi,
> I'm trying to parse (or even just detect the MIME type of) an XML file that is 
> not large but is kind of tricky, and my process hangs at:
> XMLStringBuffer.append(char[], int, int) line: not available
> XMLStringBuffer.append(XMLString) line: not available
> XMLNSDocumentScannerImpl(XMLScanner).scanAttributeValue(XMLString, XMLString, String, boolean, String) line: not available
> XMLNSDocumentScannerImpl.scanAttribute(XMLAttributesImpl) line: not available
> XMLNSDocumentScannerImpl.scanStartElement() line: not available
> XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook() line: not available
> XMLNSDocumentScannerImpl$NSContentDispatcher(XMLDocumentFragmentScannerImpl$FragmentContentDispatcher).dispatch(boolean) line: not available
> XMLNSDocumentScannerImpl(XMLDocumentFragmentScannerImpl).scanDocument(boolean) line: not available
> XIncludeAwareParserConfiguration(XML11Configuration).parse(boolean) line: not available
> XIncludeAwareParserConfiguration(XML11Configuration).parse(XMLInputSource) line: not available
> SAXParserImpl$JAXPSAXParser(XMLParser).parse(XMLInputSource) line: not available
> SAXParserImpl$JAXPSAXParser(AbstractSAXParser).parse(InputSource) line: not available
> SAXParserImpl$JAXPSAXParser.parse(InputSource) line: not available
> SAXParserImpl.parse(InputSource, DefaultHandler) line: not available
> SAXParserImpl(SAXParser).parse(InputStream, DefaultHandler) line: 195
> XmlRootExtractor.extractRootElement(InputStream) line: 62
> XmlRootExtractor.extractRootElement(byte[]) line: 42
> MimeTypes.getMimeType(byte[]) line: 212
> MimeTypes.detect(InputStream, Metadata) line: 494
> DefaultDetector(CompositeDetector).detect(InputStream, Metadata) line: 84
>  
> Please see the attached XML file.
> Please advise.
> Thanks





[jira] [Commented] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop

2018-09-17 Thread Slava G (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617516#comment-16617516
 ] 

Slava G commented on TIKA-2727:
---

Great!!! Thanks.

Is jdk.xml.entityExpansionLimit relevant only to TIKA, or can it affect anything 
else XML-related?

 



[jira] [Commented] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop

2018-09-17 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617480#comment-16617480
 ] 

Tim Allison commented on TIKA-2727:
---

Until you can upgrade to 1.19, you should be able to limit entity expansion via 
the command line, e.g.:

{{-Djdk.xml.entityExpansionLimit=10}}
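A minimal sketch of what that flag does, assuming the JDK's built-in JAXP parser is the one returned by `SAXParserFactory.newInstance()` (a separate Xerces jar on the classpath may ignore `jdk.xml.*` properties). The class name `EntityLimitDemo` and both sample documents are hypothetical, not taken from Tika or the attached file. The static block stands in for `-Djdk.xml.entityExpansionLimit=10`; it has to run before any parser is created, because the JDK reads the property when it builds each parser. The nested-entity document needs 1 + 10 + 100 = 111 expansions, so it should be rejected, while the entity-free document should still parse.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of Tika.
class EntityLimitDemo {

    // Programmatic stand-in for -Djdk.xml.entityExpansionLimit=10.
    // Runs once at class initialization, before any parser here is created.
    static {
        System.setProperty("jdk.xml.entityExpansionLimit", "10");
    }

    // No entity references at all: should parse normally.
    static final String SAFE_XML = "<root attr=\"plain\">no entities here</root>";

    // Nested entities: &c; -> 10 x &b; -> 100 x &a;, i.e. 111 expansions.
    static final String NESTED_ENTITY_XML =
            "<!DOCTYPE root [\n"
            + "<!ENTITY a \"aaaaaaaaaa\">\n"
            + "<!ENTITY b \"&a;&a;&a;&a;&a;&a;&a;&a;&a;&a;\">\n"
            + "<!ENTITY c \"&b;&b;&b;&b;&b;&b;&b;&b;&b;&b;\">\n"
            + "]>\n"
            + "<root>&c;</root>";

    /** Returns true if the document parses, false if the parser rejects it. */
    static boolean parses(String xml) {
        SAXParser parser;
        try {
            parser = SAXParserFactory.newInstance().newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("could not create a SAX parser", e);
        }
        try {
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Exceeding the expansion limit surfaces as a fatal parse error.
            System.out.println("Rejected: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("entity-free document parses: " + parses(SAFE_XML));
        System.out.println("nested-entity document parses: " + parses(NESTED_ENTITY_XML));
    }
}
```

Since this is a JVM-wide system property, it should affect every parser that the JDK's built-in implementation creates in that process, not only the ones Tika uses.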



[jira] [Resolved] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop

2018-09-17 Thread Tim Allison (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-2727.
---
   Resolution: Fixed
Fix Version/s: 2.0.0
   1.19

Thank you for sharing this problem with us.  We fixed this already in 1.19, 
which is currently in the voting process for release and should be out in a few 
days, unless there are surprises...

 
