[jira] [Updated] (ANY23-522) EmbeddedJSONLDExtractorTest taking > 7s

2021-10-20 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated ANY23-522:
---
Description: 
When running against the master branch, I observe the 
[EmbeddedJSONLDExtractorTest|https://github.com/apache/any23/blob/master/core/src/test/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractorTest.java]
 taking a significant amount of time to execute.

{code:bash}
---
Test set: org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
---
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.692 s - in 
org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
{code}

The 
[JSONLDExtractorTest|https://github.com/apache/any23/blob/master/core/src/test/java/org/apache/any23/extractor/rdf/JSONLDExtractorTest.java]
 does not pose any issues:

{code:bash}
---
Test set: org.apache.any23.extractor.rdf.JSONLDExtractorTest
---
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.036 s - in 
org.apache.any23.extractor.rdf.JSONLDExtractorTest
{code}

  was:
When running against master branch, I observe the 
[EmbeddedJSONLDTest|https://github.com/apache/any23/blob/master/core/src/test/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractorTest.java]
 taking significant time to execute.

{code:text}
---
Test set: org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
---
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.692 s - in 
org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
{code}

The 
[JSONLDExtractorTest|https://github.com/apache/any23/blob/master/core/src/test/java/org/apache/any23/extractor/rdf/JSONLDExtractorTest.java]
 does not pose any issues

{code:text}
---
Test set: org.apache.any23.extractor.rdf.JSONLDExtractorTest
---
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.036 s - in 
org.apache.any23.extractor.rdf.JSONLDExtractorTest
{code}


> EmbeddedJSONLDExtractorTest taking > 7s
> ---
>
>          Key: ANY23-522
>          URL: https://issues.apache.org/jira/browse/ANY23-522
>      Project: Apache Any23
>   Issue Type: Improvement
>   Components: extractors, json-ld
>     Reporter: Lewis John McGibbney
>     Assignee: Lewis John McGibbney
>     Priority: Major
>      Fix For: 2.6
>
>
> When running against the master branch, I observe the 
> [EmbeddedJSONLDExtractorTest|https://github.com/apache/any23/blob/master/core/src/test/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractorTest.java]
>  taking a significant amount of time to execute.
> {code:bash}
> ---
> Test set: org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
> ---
> Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.692 s - in 
> org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
> {code}
> The 
> [JSONLDExtractorTest|https://github.com/apache/any23/blob/master/core/src/test/java/org/apache/any23/extractor/rdf/JSONLDExtractorTest.java]
>  does not pose any issues:
> {code:bash}
> ---
> Test set: org.apache.any23.extractor.rdf.JSONLDExtractorTest
> ---
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.036 s - in 
> org.apache.any23.extractor.rdf.JSONLDExtractorTest
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ANY23-522) EmbeddedJSONLDExtractorTest taking > 7s

2021-10-20 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/ANY23-522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17431365#comment-17431365
 ] 

Lewis John McGibbney commented on ANY23-522:


I just observed the following
{code:text}
---
Test set: org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
---
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.241 s - in 
org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
{code}

> EmbeddedJSONLDExtractorTest taking > 7s
> ---
>
>          Key: ANY23-522
>          URL: https://issues.apache.org/jira/browse/ANY23-522
>      Project: Apache Any23
>   Issue Type: Improvement
>   Components: extractors, json-ld
>     Reporter: Lewis John McGibbney
>     Assignee: Lewis John McGibbney
>     Priority: Major
>      Fix For: 2.6
>
>
> When running against master branch, I observe the 
> [EmbeddedJSONLDTest|https://github.com/apache/any23/blob/master/core/src/test/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractorTest.java]
>  taking significant time to execute.
> {code:text}
> ---
> Test set: org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
> ---
> Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.692 s - in 
> org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
> {code}
> The 
> [JSONLDExtractorTest|https://github.com/apache/any23/blob/master/core/src/test/java/org/apache/any23/extractor/rdf/JSONLDExtractorTest.java]
>  does not pose any issues
> {code:text}
> ---
> Test set: org.apache.any23.extractor.rdf.JSONLDExtractorTest
> ---
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.036 s - in 
> org.apache.any23.extractor.rdf.JSONLDExtractorTest
> {code}





[jira] [Comment Edited] (ANY23-522) EmbeddedJSONLDExtractorTest taking > 7s

2021-10-20 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/ANY23-522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17431365#comment-17431365
 ] 

Lewis John McGibbney edited comment on ANY23-522 at 10/20/21, 5:02 PM:
---

I just observed the following
{code:bash}
---
Test set: org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
---
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.241 s - in 
org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
{code}


was (Author: lewismc):
I just observed the following
{code:text}
---
Test set: org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
---
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.241 s - in 
org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
{code}

> EmbeddedJSONLDExtractorTest taking > 7s
> ---
>
>          Key: ANY23-522
>          URL: https://issues.apache.org/jira/browse/ANY23-522
>      Project: Apache Any23
>   Issue Type: Improvement
>   Components: extractors, json-ld
>     Reporter: Lewis John McGibbney
>     Assignee: Lewis John McGibbney
>     Priority: Major
>      Fix For: 2.6
>
>
> When running against master branch, I observe the 
> [EmbeddedJSONLDTest|https://github.com/apache/any23/blob/master/core/src/test/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractorTest.java]
>  taking significant time to execute.
> {code:text}
> ---
> Test set: org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
> ---
> Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.692 s - in 
> org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
> {code}
> The 
> [JSONLDExtractorTest|https://github.com/apache/any23/blob/master/core/src/test/java/org/apache/any23/extractor/rdf/JSONLDExtractorTest.java]
>  does not pose any issues
> {code:text}
> ---
> Test set: org.apache.any23.extractor.rdf.JSONLDExtractorTest
> ---
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.036 s - in 
> org.apache.any23.extractor.rdf.JSONLDExtractorTest
> {code}





[jira] [Created] (ANY23-522) EmbeddedJSONLDExtractorTest taking > 7s

2021-10-20 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-522:
--

     Summary: EmbeddedJSONLDExtractorTest taking > 7s
         Key: ANY23-522
         URL: https://issues.apache.org/jira/browse/ANY23-522
     Project: Apache Any23
  Issue Type: Improvement
  Components: extractors, json-ld
    Reporter: Lewis John McGibbney
    Assignee: Lewis John McGibbney
     Fix For: 2.6


When running against master branch, I observe the 
[EmbeddedJSONLDTest|https://github.com/apache/any23/blob/master/core/src/test/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractorTest.java]
 taking significant time to execute.

{code:text}
---
Test set: org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
---
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.692 s - in 
org.apache.any23.extractor.html.EmbeddedJSONLDExtractorTest
{code}

The 
[JSONLDExtractorTest|https://github.com/apache/any23/blob/master/core/src/test/java/org/apache/any23/extractor/rdf/JSONLDExtractorTest.java]
 does not pose any issues

{code:text}
---
Test set: org.apache.any23.extractor.rdf.JSONLDExtractorTest
---
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.036 s - in 
org.apache.any23.extractor.rdf.JSONLDExtractorTest
{code}





[jira] [Updated] (ANY23-504) XML-based parsers should not load external DTDs by default

2021-10-20 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated ANY23-504:
---
Summary: XML-based parsers should not load external DTDs by default  (was: 
Optionally disable remote HTTP connections when resolving XML entities)
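
For illustration, a minimal sketch of what "not loading external DTDs by default" could look like at the JAXP level. This is not the Any23 or RDF4J implementation; the class and method names (NoExternalDtdSketch, newOfflineReader, parses) are hypothetical, and the DTD URL is a deliberately unreachable placeholder. The feature URIs are the Xerces "load-external-dtd" feature and the standard SAX external-entity features.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch only; not the Any23 or RDF4J implementation.
class NoExternalDtdSketch {

    /** Builds a SAX reader configured not to fetch external DTDs or external entities. */
    static XMLReader newOfflineReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // Xerces feature: skip the external DTD subset when not validating.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            // Standard SAX features: do not resolve external general/parameter entities.
            factory.setFeature(
                "http://xml.org/sax/features/external-general-entities", false);
            factory.setFeature(
                "http://xml.org/sax/features/external-parameter-entities", false);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("Could not configure SAX parser", e);
        }
    }

    /** Parses the XML text and reports whether parsing completed without error. */
    static boolean parses(String xml) {
        try {
            XMLReader reader = newOfflineReader();
            reader.setContentHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder DOCTYPE pointing at an unreachable DTD location (assumption for the demo).
        String xml = "<!DOCTYPE root SYSTEM \"http://localhost:1/unreachable.dtd\"><root/>";
        System.out.println("parsed without fetching the DTD: " + parses(xml));
    }
}
```

Whether such settings belong in Any23 itself or in the underlying RDF4J parsers is exactly the open question on this issue; the sketch only shows the parser-level switches.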

> XML-based parsers should not load external DTDs by default
> --
>
>          Key: ANY23-504
>          URL: https://issues.apache.org/jira/browse/ANY23-504
>      Project: Apache Any23
>   Issue Type: Improvement
>     Reporter: Sebastian Nagel
>     Assignee: Lewis John McGibbney
>     Priority: Major
>      Fix For: 2.6
>
>
> The Any23 parser should optionally avoid opening HTTP connections when 
> parsing XML.
> While testing Nutch's Any23 plugin with 2.5 (NUTCH-2892) on the file 
> "BBC_News_Scotland.htm", the parser hung for about two minutes with an 
> open HTTP connection to "hans-moleman.w3.org" and the following stack:
> {noformat}
> "parse-0" #19 daemon prio=5 os_prio=0 cpu=1432.93ms elapsed=15.85s 
> tid=0x7efc713bd800 nid=0x16ff4 runnable  [0x7efc29f2d000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(java.base@11.0.11/Native 
> Method)
> at 
> java.net.SocketInputStream.socketRead(java.base@11.0.11/SocketInputStream.java:115)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:168)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:140)
> at 
> java.io.BufferedInputStream.fill(java.base@11.0.11/BufferedInputStream.java:252)
> at 
> java.io.BufferedInputStream.read1(java.base@11.0.11/BufferedInputStream.java:292)
> at 
> java.io.BufferedInputStream.read(java.base@11.0.11/BufferedInputStream.java:351)
> - locked <0x00071be1bb68> (a java.io.BufferedInputStream)
> at 
> sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.11/HttpClient.java:754)
> at 
> sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.11/HttpClient.java:689)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.11/HttpURLConnection.java:1615)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.11/HttpURLConnection.java:1520)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
> at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown 
> Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown 
> Source)
> at 
> org.eclipse.rdf4j.common.xml.SimpleSAXParser.parse(SimpleSAXParser.java:197)
> - locked <0x00071bfe6f28> (a 
> org.eclipse.rdf4j.common.xml.SimpleSAXParser)
> at org.eclipse.rdf4j.rio.trix.TriXParser.parse(TriXParser.java:177)
> at org.eclipse.rdf4j.rio.trix.TriXParser.parse(TriXParser.java:134)
> at 
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:86)
> at 
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:39)
> at 
> org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:523)
> at 
> org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:265)
> at org.apache.any23.Any23.extract(Any23.java:315)
> at org.apache.any23.Any23.extract(Any23.java:483)
> at org.apache.any23.Any23.extract(Any23.java:345)
> at 
> org.apache.nutch.any23.Any23ParseFilter$Any23Parser.parse(Any23ParseFilter.java:106)
> at 
> 

[jira] [Created] (ANY23-521) Bump jsoup from 1.14.2 to 1.14.3

2021-10-19 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-521:
--

     Summary: Bump jsoup from 1.14.2 to 1.14.3
         Key: ANY23-521
         URL: https://issues.apache.org/jira/browse/ANY23-521
     Project: Apache Any23
  Issue Type: Improvement
  Components: build
    Reporter: Lewis John McGibbney
     Fix For: 2.6


https://github.com/apache/any23/pull/213





[jira] [Resolved] (ANY23-521) Bump jsoup from 1.14.2 to 1.14.3

2021-10-19 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-521.

Resolution: Fixed

> Bump jsoup from 1.14.2 to 1.14.3
> 
>
>          Key: ANY23-521
>          URL: https://issues.apache.org/jira/browse/ANY23-521
>      Project: Apache Any23
>   Issue Type: Improvement
>   Components: build
>     Reporter: Lewis John McGibbney
>     Priority: Major
>      Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/213





[jira] [Commented] (ANY23-504) Optionally disable remote HTTP connections when resolving XML entities

2021-10-19 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/ANY23-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430629#comment-17430629
 ] 

Lewis John McGibbney commented on ANY23-504:


{quote}Nevertheless - is it the expected behavior that the parser reads a 
remote DTD? - even if it is called because of a misconfiguration.{quote}

[~snagel], no (I don't think so anyway). [I am in discussions with the RDF4J 
Team|https://github.com/eclipse/rdf4j/issues/3347] on that point.

> Optionally disable remote HTTP connections when resolving XML entities
> --
>
>          Key: ANY23-504
>          URL: https://issues.apache.org/jira/browse/ANY23-504
>      Project: Apache Any23
>   Issue Type: Improvement
>     Reporter: Sebastian Nagel
>     Assignee: Lewis John McGibbney
>     Priority: Major
>      Fix For: 2.6
>
>
> The Any23 parser should optionally avoid opening HTTP connections when 
> parsing XML.
> While testing Nutch's Any23 plugin with 2.5 (NUTCH-2892) on the file 
> "BBC_News_Scotland.htm", the parser hung for about two minutes with an 
> open HTTP connection to "hans-moleman.w3.org" and the following stack:
> {noformat}
> "parse-0" #19 daemon prio=5 os_prio=0 cpu=1432.93ms elapsed=15.85s 
> tid=0x7efc713bd800 nid=0x16ff4 runnable  [0x7efc29f2d000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(java.base@11.0.11/Native 
> Method)
> at 
> java.net.SocketInputStream.socketRead(java.base@11.0.11/SocketInputStream.java:115)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:168)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:140)
> at 
> java.io.BufferedInputStream.fill(java.base@11.0.11/BufferedInputStream.java:252)
> at 
> java.io.BufferedInputStream.read1(java.base@11.0.11/BufferedInputStream.java:292)
> at 
> java.io.BufferedInputStream.read(java.base@11.0.11/BufferedInputStream.java:351)
> - locked <0x00071be1bb68> (a java.io.BufferedInputStream)
> at 
> sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.11/HttpClient.java:754)
> at 
> sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.11/HttpClient.java:689)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.11/HttpURLConnection.java:1615)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.11/HttpURLConnection.java:1520)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
> at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown 
> Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown 
> Source)
> at 
> org.eclipse.rdf4j.common.xml.SimpleSAXParser.parse(SimpleSAXParser.java:197)
> - locked <0x00071bfe6f28> (a 
> org.eclipse.rdf4j.common.xml.SimpleSAXParser)
> at org.eclipse.rdf4j.rio.trix.TriXParser.parse(TriXParser.java:177)
> at org.eclipse.rdf4j.rio.trix.TriXParser.parse(TriXParser.java:134)
> at 
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:86)
> at 
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:39)
> at 
> org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:523)
> at 
> org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:265)
> at org.apache.any23.Any23.extract(Any23.java:315)
> at org.apache.any23.Any23.extract(Any2

[jira] [Updated] (NUTCH-2898) IDE Setup for nutch with Intellij IDEA is not well documented

2021-10-19 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-2898:

Fix Version/s: 1.19

> IDE Setup for nutch with Intellij IDEA is not well documented
> -
>
>          Key: NUTCH-2898
>          URL: https://issues.apache.org/jira/browse/NUTCH-2898
>      Project: Nutch
>   Issue Type: Improvement
>   Components: documentation
>     Reporter: Abu Sufian Milon
>     Assignee: Abu Sufian Milon
>     Priority: Minor
>      Fix For: 1.19
>
>
> I think the title makes the issue fairly clear. 
> I'm preparing a README.md pull request with a tested solution. 
> I'm also willing to contribute to the wiki documentation. I've already created an 
> account at [https://cwiki.apache.org/|https://cwiki.apache.org/]. 
> Username: "liam.logan.web"
> Can anyone please tell me whom I should contact to apply for 
> documentation editing privileges?
> Thanks.





[jira] [Resolved] (NUTCH-2898) IDE Setup for nutch with Intellij IDEA is not well documented

2021-10-19 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-2898.
-
Resolution: Fixed

> IDE Setup for nutch with Intellij IDEA is not well documented
> -
>
>          Key: NUTCH-2898
>          URL: https://issues.apache.org/jira/browse/NUTCH-2898
>      Project: Nutch
>   Issue Type: Improvement
>   Components: documentation
>     Reporter: Abu Sufian Milon
>     Assignee: Abu Sufian Milon
>     Priority: Minor
>      Fix For: 1.19
>
>
> I think the title makes the issue fairly clear. 
> I'm preparing a README.md pull request with a tested solution. 
> I'm also willing to contribute to the wiki documentation. I've already created an 
> account at [https://cwiki.apache.org/|https://cwiki.apache.org/]. 
> Username: "liam.logan.web"
> Can anyone please tell me whom I should contact to apply for 
> documentation editing privileges?
> Thanks.





[jira] [Assigned] (NUTCH-2898) IDE Setup for nutch with Intellij IDEA is not well documented

2021-10-19 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney reassigned NUTCH-2898:
---

Assignee: Abu Sufian Milon

> IDE Setup for nutch with Intellij IDEA is not well documented
> -
>
>          Key: NUTCH-2898
>          URL: https://issues.apache.org/jira/browse/NUTCH-2898
>      Project: Nutch
>   Issue Type: Improvement
>   Components: documentation
>     Reporter: Abu Sufian Milon
>     Assignee: Abu Sufian Milon
>     Priority: Minor
>
> I think the title makes the issue fairly clear. 
> I'm preparing a README.md pull request with a tested solution. 
> I'm also willing to contribute to the wiki documentation. I've already created an 
> account at [https://cwiki.apache.org/|https://cwiki.apache.org/]. 
> Username: "liam.logan.web"
> Can anyone please tell me whom I should contact to apply for 
> documentation editing privileges?
> Thanks.





[jira] [Resolved] (ANY23-520) Augment any23 extractor CLI to print all mimetypes for a given extractor

2021-10-19 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-520.

Resolution: Fixed

> Augment any23 extractor CLI to print all mimetypes for a given extractor
> 
>
>          Key: ANY23-520
>          URL: https://issues.apache.org/jira/browse/ANY23-520
>      Project: Apache Any23
>   Issue Type: Improvement
>   Components: CLI
>     Reporter: Lewis John McGibbney
>     Assignee: Lewis John McGibbney
>     Priority: Major
>      Fix For: 2.6
>
>
> In the same way that you can [print all of the document 
> extractors|http://any23.apache.org/getting-started.html#The_ExtractorDocumentation_tool],
>  I would like to provide a convenience mechanism to print all of the 
> mimetypes which can be processed by each extractor.





[jira] [Comment Edited] (ANY23-504) Optionally disable remote HTTP connections when resolving XML entities

2021-10-14 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/ANY23-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428474#comment-17428474
 ] 

Lewis John McGibbney edited comment on ANY23-504 at 10/14/21, 5:35 PM:
---

Hi [~snagel] yes it helps a lot. My next question was to ask what any23 
extractors were activated via the 
[any23.extractors|https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L1263-L1267]
 configuration setting. As you can see, by default we only have it set to 
_html-microdata_.
The behavior you are experiencing is directly in line with what I would expect 
if I activated the _*rdf-xml*_ extractor on an HTML document. 
This is validated by the Media Types defined within the [RDFXMLExtractorFactory 
constructor 
semantics|https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/rdf/RDFXMLExtractorFactory.java#L42].

Let me state my thoughts. This is NOT a bug. It is however a problem. 

We could provide some sort of _break_ mechanism which would allow us to report 
to the client that an error has occurred as a result of the defined extractor 
being incapable of processing the input data.

Does that make sense? Thanks for sticking with me on this one...
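
To make the _break_ idea above a little more concrete, here is a rough sketch of a fail-fast check that rejects input whose detected media type is not supported by the selected extractor. Everything in it is an assumption for discussion: ExtractorGuardSketch, UnsupportedInputException and checkSupported are hypothetical names that do not exist in Any23, and the media type set is an illustrative value.

```java
import java.util.Set;

// Hypothetical sketch; none of these names exist in Any23.
class ExtractorGuardSketch {

    /** Signals that the selected extractor cannot process the detected media type. */
    static class UnsupportedInputException extends RuntimeException {
        UnsupportedInputException(String extractor, String mimeType) {
            super("Extractor '" + extractor + "' cannot process input of type '"
                + mimeType + "'");
        }
    }

    /** Fails fast when the detected MIME type is not among those the extractor supports. */
    static void checkSupported(String extractor, Set<String> supported, String detected) {
        if (!supported.contains(detected)) {
            throw new UnsupportedInputException(extractor, detected);
        }
    }

    public static void main(String[] args) {
        // Illustrative value only; the real set would come from the extractor factory.
        Set<String> rdfXmlTypes = Set.of("application/rdf+xml");

        // Matching input passes silently.
        checkSupported("rdf-xml", rdfXmlTypes, "application/rdf+xml");

        // An HTML document handed to the rdf-xml extractor is reported to the caller.
        try {
            checkSupported("rdf-xml", rdfXmlTypes, "text/html");
        } catch (UnsupportedInputException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of checking before extraction starts is that the client gets an explicit error instead of a parser that blocks on a remote DTD fetch.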


was (Author: lewismc):
Hi [~snagel] yes it helps a lot. My next question was to ask what any23 
extractors were activated via the 
[any23.extractors|https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L1263-L1267]
 configuration setting. As you can see, by default we only have it set to 
_html-microdata_.
The behavior you are experiencing is directly in line with what I would expect 
if I activated the _*rdf-xml*_ extractor on an HTML document. 
This is validated by the Media Types defined within the [RDFXMLExtractorFactory 
constructor 
semantics|https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/rdf/RDFXMLExtractorFactory.java#L42].

Let me state my thoughts. This is NOT a bug. It is, however, a problem. Further, I 
believe that we could provide some sort of _break_ mechanism which would allow 
us to report to the client that an error occurred as a result of the extractor overrides 
not being suitable as extractor implementations for the given input data.

Does that make sense? Thanks for sticking with me on this one...

> Optionally disable remote HTTP connections when resolving XML entities
> --
>
>          Key: ANY23-504
>          URL: https://issues.apache.org/jira/browse/ANY23-504
>      Project: Apache Any23
>   Issue Type: Improvement
>     Reporter: Sebastian Nagel
>     Assignee: Lewis John McGibbney
>     Priority: Major
>      Fix For: 2.6
>
>
> The Any23 parser should optionally avoid opening HTTP connections when 
> parsing XML.
> While testing Nutch's Any23 plugin with 2.5 (NUTCH-2892) on the file 
> "BBC_News_Scotland.htm", the parser hung for about two minutes with an 
> open HTTP connection to "hans-moleman.w3.org" and the following stack:
> {noformat}
> "parse-0" #19 daemon prio=5 os_prio=0 cpu=1432.93ms elapsed=15.85s 
> tid=0x7efc713bd800 nid=0x16ff4 runnable  [0x7efc29f2d000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(java.base@11.0.11/Native 
> Method)
> at 
> java.net.SocketInputStream.socketRead(java.base@11.0.11/SocketInputStream.java:115)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:168)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:140)
> at 
> java.io.BufferedInputStream.fill(java.base@11.0.11/BufferedInputStream.java:252)
> at 
> java.io.BufferedInputStream.read1(java.base@11.0.11/BufferedInputStream.java:292)
> at 
> java.io.BufferedInputStream.read(java.base@11.0.11/BufferedInputStream.java:351)
> - locked <0x00071be1bb68> (a java.io.BufferedInputStream)
> at 
> sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.11/HttpClient.java:754)
> at 
> sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.11/HttpClient.java:689)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.11/HttpURLConnection.java:1615)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.11/HttpURLConnection.java:1520)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at org.apache.xerces.impl.XMLEntityManager.setupC

[jira] [Updated] (ANY23-520) Augment any23 extractor CLI to print all mimetypes for a given extractor

2021-10-13 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated ANY23-520:
---
Summary: Augment any23 extractor CLI to print all mimetypes for a given 
extractor  (was: Augment any23 mimes CLI to print all mimetypes which can be 
processed by Any23 )

> Augment any23 extractor CLI to print all mimetypes for a given extractor
> 
>
>          Key: ANY23-520
>          URL: https://issues.apache.org/jira/browse/ANY23-520
>      Project: Apache Any23
>   Issue Type: Improvement
>   Components: CLI
>     Reporter: Lewis John McGibbney
>     Assignee: Lewis John McGibbney
>     Priority: Major
>      Fix For: 2.6
>
>
> In the same way that you can [print all of the document 
> extractors|http://any23.apache.org/getting-started.html#The_ExtractorDocumentation_tool],
>  I would like to provide a convenience mechanism to print all of the 
> mimetypes which can be processed by each extractor.





[jira] [Created] (ANY23-520) Augment any23 mimes CLI to print all mimetypes which can be processed by Any23

2021-10-13 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-520:
--

     Summary: Augment any23 mimes CLI to print all mimetypes which can be processed by Any23
         Key: ANY23-520
         URL: https://issues.apache.org/jira/browse/ANY23-520
     Project: Apache Any23
  Issue Type: Improvement
  Components: CLI
    Reporter: Lewis John McGibbney
    Assignee: Lewis John McGibbney
     Fix For: 2.6


In the same way that you can [print all of the document 
extractors|http://any23.apache.org/getting-started.html#The_ExtractorDocumentation_tool],
 I would like to provide a convenience mechanism to print all of the mimetypes 
which can be processed by each extractor.
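
As a rough idea of the output such a convenience mechanism might produce, here is a small sketch that prints one line per extractor with its MIME types. It does not use the Any23 API: MimeTypeListingSketch and render are hypothetical names, the registry is a plain map standing in for whatever the extractor registry exposes, and the MIME type values are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the map stands in for the real extractor registry.
class MimeTypeListingSketch {

    /** Renders one "extractor: type, type" line per extractor, sorted by extractor name. */
    static String render(Map<String, List<String>> registry) {
        StringBuilder out = new StringBuilder();
        // TreeMap gives a stable, alphabetical listing regardless of input order.
        for (Map.Entry<String, List<String>> entry : new TreeMap<>(registry).entrySet()) {
            out.append(entry.getKey())
               .append(": ")
               .append(String.join(", ", entry.getValue()))
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative values only; not taken from the Any23 extractor factories.
        Map<String, List<String>> registry = Map.of(
            "rdf-xml", List.of("application/rdf+xml"),
            "html-microdata", List.of("text/html", "application/xhtml+xml"));
        System.out.print(render(registry));
    }
}
```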





[jira] [Commented] (ANY23-504) Optionally disable remote HTTP connections when resolving XML entities

2021-10-13 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/ANY23-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428474#comment-17428474
 ] 

Lewis John McGibbney commented on ANY23-504:


Hi [~snagel] yes it helps a lot. My next question was to ask what any23 
extractors were activated via the 
[any23.extractors|https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L1263-L1267]
 configuration setting. As you can see, by default we only have it set to 
_html-microdata_.
The behavior you are experiencing is directly in line with what I would expect 
if I activated the _*rdf-xml*_ extractor on an HTML document. 
This is validated by the Media Types defined within the [RDFXMLExtractorFactory 
constructor 
semantics|https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/rdf/RDFXMLExtractorFactory.java#L42].

Let me state my thoughts. This is NOT a bug. It is, however, a problem. Further, I 
believe that we could provide some sort of _break_ mechanism which would allow 
us to report to the client that an error occurred as a result of the extractor overrides 
not being suitable as extractor implementations for the given input data.

Does that make sense? Thanks for sticking with me on this one...

> Optionally disable remote HTTP connections when resolving XML entities
> --
>
>          Key: ANY23-504
>          URL: https://issues.apache.org/jira/browse/ANY23-504
>      Project: Apache Any23
>   Issue Type: Improvement
>     Reporter: Sebastian Nagel
>     Assignee: Lewis John McGibbney
>     Priority: Major
>      Fix For: 2.6
>
>
> The Any23 parser should optionally avoid opening HTTP connections when 
> parsing XML.
> While testing Nutch's Any23 plugin with 2.5 (NUTCH-2892) on the file 
> "BBC_News_Scotland.htm", the parser hung for about two minutes with an 
> open HTTP connection to "hans-moleman.w3.org" and the following stack:
> {noformat}
> "parse-0" #19 daemon prio=5 os_prio=0 cpu=1432.93ms elapsed=15.85s 
> tid=0x7efc713bd800 nid=0x16ff4 runnable  [0x7efc29f2d000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(java.base@11.0.11/Native 
> Method)
> at 
> java.net.SocketInputStream.socketRead(java.base@11.0.11/SocketInputStream.java:115)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:168)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:140)
> at 
> java.io.BufferedInputStream.fill(java.base@11.0.11/BufferedInputStream.java:252)
> at 
> java.io.BufferedInputStream.read1(java.base@11.0.11/BufferedInputStream.java:292)
> at 
> java.io.BufferedInputStream.read(java.base@11.0.11/BufferedInputStream.java:351)
> - locked <0x00071be1bb68> (a java.io.BufferedInputStream)
> at 
> sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.11/HttpClient.java:754)
> at 
> sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.11/HttpClient.java:689)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.11/HttpURLConnection.java:1615)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.11/HttpURLConnection.java:1520)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
> at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown 
> Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown 
> Source)
> at 
> org.eclipse.rdf4j.common.xml.SimpleSAXParser.parse(SimpleSAXParser.java:197)
> - locked <0x00071bfe6f28> (a 
>

[jira] [Commented] (ANY23-504) Optionally disable remote HTTP connections when resolving XML entities

2021-10-13 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/ANY23-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428383#comment-17428383
 ] 

Lewis John McGibbney commented on ANY23-504:


Hi [~snagel], the associated pull request DOESN'T fix a bug, nor does it 
verify the presence of one. Instead, it demonstrates that the TriXParser 
should NOT be called when processing the test file BBC_News_Scotland.html.
Are you able to reproduce this test result again?

> Optionally disable remote HTTP connections when resolving XML entities
> --
>
> Key: ANY23-504
> URL: https://issues.apache.org/jira/browse/ANY23-504
> Project: Apache Any23
>  Issue Type: Improvement
>Reporter: Sebastian Nagel
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> The Any23 parser should optionally avoid opening HTTP connections when 
> parsing XML.
> While testing Nutch's Any23 plugin with 2.5 (NUTCH-2892) on the file 
> "BBC_News_Scotland.htm", the parser hung for about two minutes with an 
> open HTTP connection to "hans-moleman.w3.org" and the following stack:
> {noformat}
> "parse-0" #19 daemon prio=5 os_prio=0 cpu=1432.93ms elapsed=15.85s 
> tid=0x7efc713bd800 nid=0x16ff4 runnable  [0x7efc29f2d000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(java.base@11.0.11/Native 
> Method)
> at 
> java.net.SocketInputStream.socketRead(java.base@11.0.11/SocketInputStream.java:115)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:168)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:140)
> at 
> java.io.BufferedInputStream.fill(java.base@11.0.11/BufferedInputStream.java:252)
> at 
> java.io.BufferedInputStream.read1(java.base@11.0.11/BufferedInputStream.java:292)
> at 
> java.io.BufferedInputStream.read(java.base@11.0.11/BufferedInputStream.java:351)
> - locked <0x00071be1bb68> (a java.io.BufferedInputStream)
> at 
> sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.11/HttpClient.java:754)
> at 
> sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.11/HttpClient.java:689)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.11/HttpURLConnection.java:1615)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.11/HttpURLConnection.java:1520)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
> at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown 
> Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown 
> Source)
> at 
> org.eclipse.rdf4j.common.xml.SimpleSAXParser.parse(SimpleSAXParser.java:197)
> - locked <0x00071bfe6f28> (a 
> org.eclipse.rdf4j.common.xml.SimpleSAXParser)
> at org.eclipse.rdf4j.rio.trix.TriXParser.parse(TriXParser.java:177)
> at org.eclipse.rdf4j.rio.trix.TriXParser.parse(TriXParser.java:134)
> at 
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:86)
> at 
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:39)
> at 
> org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:523)
> at 
> org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:265)
> at org.apache.any23.Any23.extract(Any23.java:315)
> at org.apache.any23.Any23.extract(Any23.java:483)
>
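
The stack above shows Xerces opening an HTTP connection while starting a DTD entity. A minimal Java sketch of one generic way to keep a SAX parse offline follows. It is not the fix adopted for this issue, and {{OfflineXmlParse}} and {{parseOffline}} are hypothetical names. It relies only on the standard SAX {{EntityResolver}} hook: returning an empty {{InputSource}} makes the parser read an empty DTD instead of fetching the remote one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper, not part of Any23 or RDF4J. */
final class OfflineXmlParse {

    /**
     * Parses the given XML and returns the system IDs the parser asked to
     * resolve. Every request is answered with an empty entity, so no
     * network connection is opened for external DTDs.
     */
    static List<String> parseOffline(String xml) throws Exception {
        List<String> requested = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            // Empty replacement entity: nothing is fetched over HTTP.
            return new InputSource(new StringReader(""));
        });
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
        return requested;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html><body/></html>";
        System.out.println("Entities requested but not fetched: " + parseOffline(xml));
    }
}
```

One trade-off of this approach: named entities declared only in the skipped DTD (for example {{&nbsp;}}) are no longer defined, so documents using them fail to parse.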

[jira] [Resolved] (ANY23-519) Bump maven-enforcer-plugin from 3.0.0-M2 to 3.0.0

2021-10-12 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-519.

Resolution: Fixed

> Bump maven-enforcer-plugin from 3.0.0-M2 to 3.0.0
> -
>
> Key: ANY23-519
> URL: https://issues.apache.org/jira/browse/ANY23-519
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/211



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ANY23-519) Bump maven-enforcer-plugin from 3.0.0-M2 to 3.0.0

2021-10-12 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-519:
--

 Summary: Bump maven-enforcer-plugin from 3.0.0-M2 to 3.0.0
 Key: ANY23-519
 URL: https://issues.apache.org/jira/browse/ANY23-519
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/211





[jira] [Commented] (ANY23-504) Optionally disable remote HTTP connections when resolving XML entities

2021-10-11 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/ANY23-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427219#comment-17427219
 ] 

Lewis John McGibbney commented on ANY23-504:


[~snagel] OK, so although I know literally nothing about the TriX data format 
or media type, I have been able to discover that the 
*org.eclipse.rdf4j.rio.trix.TriXParser* [should NOT be activated as part of the 
single document 
extraction|https://github.com/eclipse/rdf4j/issues/3347#issuecomment-939203533].
 
I'm therefore investigating the mimetype detection in Any23 to see if that's 
where the issue lies. 
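
To illustrate what content-based media type detection does, here is a small Java sketch. It does not use Any23's detector; it uses the JDK's {{URLConnection.guessContentTypeFromStream}} purely as a stand-in, and {{SniffMediaType}} and {{sniff}} are hypothetical names. The point is that an HTML page and an XML document can be told apart from their leading bytes, which is the kind of decision that should keep an XML-only parser away from an HTML file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; stands in for a real detector. */
final class SniffMediaType {

    /** Guesses a media type from the first bytes of the content; may return null. */
    static String sniff(String content) throws IOException {
        // ByteArrayInputStream supports mark/reset, which the JDK method requires.
        ByteArrayInputStream in =
                new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        return URLConnection.guessContentTypeFromStream(in);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sniff("<html><head><title>News</title></head></html>"));
        System.out.println(sniff("<?xml version=\"1.0\"?><TriX/>"));
    }
}
```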

> Optionally disable remote HTTP connections when resolving XML entities
> --
>
> Key: ANY23-504
> URL: https://issues.apache.org/jira/browse/ANY23-504
> Project: Apache Any23
>  Issue Type: Improvement
>Reporter: Sebastian Nagel
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> The Any23 parser should optionally avoid opening HTTP connections when 
> parsing XML.
> While testing Nutch's Any23 plugin with 2.5 (NUTCH-2892) on the file 
> "BBC_News_Scotland.htm", the parser hung for about two minutes with an 
> open HTTP connection to "hans-moleman.w3.org" and the following stack:
> {noformat}
> "parse-0" #19 daemon prio=5 os_prio=0 cpu=1432.93ms elapsed=15.85s 
> tid=0x7efc713bd800 nid=0x16ff4 runnable  [0x7efc29f2d000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(java.base@11.0.11/Native 
> Method)
> at 
> java.net.SocketInputStream.socketRead(java.base@11.0.11/SocketInputStream.java:115)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:168)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:140)
> at 
> java.io.BufferedInputStream.fill(java.base@11.0.11/BufferedInputStream.java:252)
> at 
> java.io.BufferedInputStream.read1(java.base@11.0.11/BufferedInputStream.java:292)
> at 
> java.io.BufferedInputStream.read(java.base@11.0.11/BufferedInputStream.java:351)
> - locked <0x00071be1bb68> (a java.io.BufferedInputStream)
> at 
> sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.11/HttpClient.java:754)
> at 
> sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.11/HttpClient.java:689)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.11/HttpURLConnection.java:1615)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.11/HttpURLConnection.java:1520)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
> at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown 
> Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown 
> Source)
> at 
> org.eclipse.rdf4j.common.xml.SimpleSAXParser.parse(SimpleSAXParser.java:197)
> - locked <0x00071bfe6f28> (a 
> org.eclipse.rdf4j.common.xml.SimpleSAXParser)
> at org.eclipse.rdf4j.rio.trix.TriXParser.parse(TriXParser.java:177)
> at org.eclipse.rdf4j.rio.trix.TriXParser.parse(TriXParser.java:134)
> at 
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:86)
> at 
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:39)
> at 
> org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:523)
> at 
> org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocument

[jira] [Updated] (TIKA-3453) SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA and 2.1.0

2021-10-08 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated TIKA-3453:
---
Summary: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" 
Defaulting to no-operation (NOP) logger implementation for tika-docker 
2.0.0-BETA and 2.1.0  (was: SLF4J: Failed to load class 
"org.slf4j.impl.StaticLoggerBinder" Defaulting to no-operation (NOP) logger 
implementation for tika-docker 2.0.0-BETA)

> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to 
> no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA and 2.1.0
> ---
>
> Key: TIKA-3453
> URL: https://issues.apache.org/jira/browse/TIKA-3453
> Project: Tika
>  Issue Type: Bug
>      Components: docker, server
>    Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.1.1
>
>
> It looks like logging libraries are not being interpreted correctly from Java 
> classpath.
> We need logging turned on so we can intercept anomalies.
> Investigating...





[jira] [Commented] (TIKA-3453) SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA

2021-10-08 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426383#comment-17426383
 ] 

Lewis John McGibbney commented on TIKA-3453:


The problem lies in the *_<scope>_* of the [logging 
dependencies|https://github.com/apache/tika/blob/main/tika-server/tika-server-core/pom.xml#L129-L139].


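{code:xml}
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-core</artifactId>
  <version>${log4j2.version}</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-slf4j-impl</artifactId>
  <version>${log4j2.version}</version>
  <scope>test</scope>
</dependency>
{code}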
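
Dependencies with Maven's test scope are not placed on the runtime classpath, which would explain why SLF4J cannot find a binding when the server jar is started. A quick way to confirm this from inside a running JVM is sketched below in Java. {{BinderCheck}} and {{isOnClasspath}} are hypothetical names, and the class name checked is the one quoted in the SLF4J warning.

```java
/** Hypothetical diagnostic helper, not part of Tika. */
final class BinderCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // 'false' avoids running static initialisers; we only test presence.
            Class.forName(className, false, BinderCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String binder = "org.slf4j.impl.StaticLoggerBinder";
        System.out.println(binder + " on classpath: " + isOnClasspath(binder));
    }
}
```

Running this with the same classpath as the server should show whether the binding jar is present at runtime.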


> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to 
> no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA
> -
>
> Key: TIKA-3453
> URL: https://issues.apache.org/jira/browse/TIKA-3453
> Project: Tika
>  Issue Type: Bug
>  Components: docker, server
>    Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.1.1
>
>
> It looks like logging libraries are not being interpreted correctly from Java 
> classpath.
> We need logging turned on so we can intercept anomalies.
> Investigating...





[jira] [Commented] (TIKA-3453) SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA

2021-10-08 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426378#comment-17426378
 ] 

Lewis John McGibbney commented on TIKA-3453:


OK, so the problem lies in the tika-server source, NOT with tika-docker or 
tika-helm. Using the main branch I can replicate it as well:

% java -jar tika-server-core-2.1.1-SNAPSHOT.jar
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
Oct 08, 2021 2:05:20 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://localhost:9998/

> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to 
> no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA
> -
>
> Key: TIKA-3453
> URL: https://issues.apache.org/jira/browse/TIKA-3453
> Project: Tika
>  Issue Type: Bug
>      Components: docker, server
>    Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.1.1
>
>
> It looks like logging libraries are not being interpreted correctly from Java 
> classpath.
> We need logging turned on so we can intercept anomalies.
> Investigating...





[jira] [Commented] (TIKA-3453) SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA

2021-10-08 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426372#comment-17426372
 ] 

Lewis John McGibbney commented on TIKA-3453:


[~davemeikle] can you also reproduce?

> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to 
> no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA
> -
>
> Key: TIKA-3453
> URL: https://issues.apache.org/jira/browse/TIKA-3453
> Project: Tika
>  Issue Type: Bug
>  Components: docker, server
>    Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.1.1
>
>
> It looks like logging libraries are not being interpreted correctly from Java 
> classpath.
> We need logging turned on so we can intercept anomalies.
> Investigating...





[jira] [Comment Edited] (TIKA-3453) SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA

2021-10-08 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426369#comment-17426369
 ] 

Lewis John McGibbney edited comment on TIKA-3453 at 10/8/21, 8:47 PM:
--

Yep I can reproduce this [~scottbessler]. Using tika-docker 2.1.0-full via 
tika-helm

{code:bash}
% kubectl logs tika-66796c96-psl2b -n tika -f
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
{code}

[~tallison] was this not supposed to be sorted out? What digging did we do 
before on this?
 Oct 08, 2021 8:40:28 PM org.apache.cxf.endpoint.ServerImpl initDestination
 INFO: Setting the server's publish address to be [http://0.0.0.0:9998/]


was (Author: lewismc):
Yep I can reproduce this [~scottbessler]. Using tika-docker 2.1.0-full via 
tika-helm

{{% kubectl logs tika-66796c96-psl2b -n tika -f
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.}}

[~tallison] was this not supposed to be sorted out? What digging did we do 
before on this?
Oct 08, 2021 8:40:28 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://0.0.0.0:9998/

> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to 
> no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA
> -
>
> Key: TIKA-3453
> URL: https://issues.apache.org/jira/browse/TIKA-3453
> Project: Tika
>  Issue Type: Bug
>      Components: docker, server
>    Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.1.1
>
>
> It looks like logging libraries are not being interpreted correctly from Java 
> classpath.
> We need logging turned on so we can intercept anomalies.
> Investigating...





[jira] [Comment Edited] (TIKA-3453) SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA

2021-10-08 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426369#comment-17426369
 ] 

Lewis John McGibbney edited comment on TIKA-3453 at 10/8/21, 8:46 PM:
--

Yep I can reproduce this [~scottbessler]. Using tika-docker 2.1.0-full via 
tika-helm

{{% kubectl logs tika-66796c96-psl2b -n tika -f
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.}}

[~tallison] was this not supposed to be sorted out? What digging did we do 
before on this?
Oct 08, 2021 8:40:28 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://0.0.0.0:9998/


was (Author: lewismc):
Yep I can reproduce this [~scottbessler]. Using tika-docker 2.1.0-full via 
tika-helm
{{% kubectl logs tika-66796c96-psl2b -n tika -f
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.}}

[~tallison] was this not supposed to be sorted out? What digging did we do 
before on this?
Oct 08, 2021 8:40:28 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://0.0.0.0:9998/

> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to 
> no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA
> -
>
> Key: TIKA-3453
> URL: https://issues.apache.org/jira/browse/TIKA-3453
> Project: Tika
>  Issue Type: Bug
>      Components: docker, server
>    Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.1.1
>
>
> It looks like logging libraries are not being interpreted correctly from Java 
> classpath.
> We need logging turned on so we can intercept anomalies.
> Investigating...





[jira] [Commented] (TIKA-3453) SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA

2021-10-08 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426369#comment-17426369
 ] 

Lewis John McGibbney commented on TIKA-3453:


Yep I can reproduce this [~scottbessler]. Using tika-docker 2.1.0-full via 
tika-helm
{{% kubectl logs tika-66796c96-psl2b -n tika -f
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.}}

[~tallison] was this not supposed to be sorted out? What digging did we do 
before on this?
Oct 08, 2021 8:40:28 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://0.0.0.0:9998/

> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to 
> no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA
> -
>
> Key: TIKA-3453
> URL: https://issues.apache.org/jira/browse/TIKA-3453
> Project: Tika
>  Issue Type: Bug
>      Components: docker, server
>    Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.1.1
>
>
> It looks like logging libraries are not being interpreted correctly from Java 
> classpath.
> We need logging turned on so we can intercept anomalies.
> Investigating...





[jira] [Updated] (TIKA-3453) SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA

2021-10-08 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated TIKA-3453:
---
Fix Version/s: (was: 2.0.0)
   2.1.1

> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to 
> no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA
> -
>
> Key: TIKA-3453
> URL: https://issues.apache.org/jira/browse/TIKA-3453
> Project: Tika
>  Issue Type: Bug
>  Components: docker, server
>    Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.1.1
>
>
> It looks like logging libraries are not being interpreted correctly from Java 
> classpath.
> We need logging turned on so we can intercept anomalies.
> Investigating...





[jira] [Created] (TIKA-3566) Upgrade tika-helm to 2.1.0

2021-10-08 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created TIKA-3566:
--

 Summary: Upgrade tika-helm to 2.1.0
 Key: TIKA-3566
 URL: https://issues.apache.org/jira/browse/TIKA-3566
 Project: Tika
  Issue Type: Improvement
  Components: helm
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.1.0


Simple upgrade to [tika-docker 
2.1.0|https://hub.docker.com/layers/apache/tika/2.1.0/images/sha256-5bb52afa9726cf2ca022441cc75ef357de9f8deb41a88a9b2964780e934d11e7?context=explore].





[jira] [Created] (ANY23-518) Bump jacoco-maven-plugin from 0.8.4 to 0.8.7

2021-10-08 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-518:
--

 Summary: Bump jacoco-maven-plugin from 0.8.4 to 0.8.7
 Key: ANY23-518
 URL: https://issues.apache.org/jira/browse/ANY23-518
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/208





[jira] [Resolved] (ANY23-518) Bump jacoco-maven-plugin from 0.8.4 to 0.8.7

2021-10-08 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-518.

Resolution: Fixed

> Bump jacoco-maven-plugin from 0.8.4 to 0.8.7
> 
>
> Key: ANY23-518
> URL: https://issues.apache.org/jira/browse/ANY23-518
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/208





[jira] [Created] (ANY23-517) Bump maven-javadoc-plugin from 3.2.0 to 3.3.1

2021-10-08 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-517:
--

 Summary: Bump maven-javadoc-plugin from 3.2.0 to 3.3.1
 Key: ANY23-517
 URL: https://issues.apache.org/jira/browse/ANY23-517
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/209





[jira] [Resolved] (ANY23-517) Bump maven-javadoc-plugin from 3.2.0 to 3.3.1

2021-10-08 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-517.

Resolution: Fixed

> Bump maven-javadoc-plugin from 3.2.0 to 3.3.1
> -
>
> Key: ANY23-517
> URL: https://issues.apache.org/jira/browse/ANY23-517
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/209



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ANY23-516) Bump appassembler-booter from 1.10 to 2.1.0

2021-10-08 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-516:
--

 Summary: Bump appassembler-booter from 1.10 to 2.1.0
 Key: ANY23-516
 URL: https://issues.apache.org/jira/browse/ANY23-516
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/210





[jira] [Commented] (TIKA-3453) SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA

2021-10-07 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425772#comment-17425772
 ] 

Lewis John McGibbney commented on TIKA-3453:


I can investigate. Thanks [~scottbessler], we didn't upgrade internally yet but I 
will force that now and report back.

> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder" Defaulting to 
> no-operation (NOP) logger implementation for tika-docker 2.0.0-BETA
> -
>
> Key: TIKA-3453
> URL: https://issues.apache.org/jira/browse/TIKA-3453
> Project: Tika
>  Issue Type: Bug
>  Components: docker, server
>    Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.0.0
>
>
> It looks like logging libraries are not being interpreted correctly from Java 
> classpath.
> We need logging turned on so we can intercept anomalies.
> Investigating...





[jira] [Commented] (GORA-405) Create Gora OpenAPI specification

2021-10-06 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/GORA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425234#comment-17425234
 ] 

Lewis John McGibbney commented on GORA-405:
---

Hi Kevin, please ping me if you would like input on this. Thanks

> Create Gora OpenAPI specification
> -
>
> Key: GORA-405
> URL: https://issues.apache.org/jira/browse/GORA-405
> Project: Apache Gora
>  Issue Type: Improvement
>  Components: gora-compiler, gora-core
>Reporter: Udesh Liyanaarachchi
>Assignee: Udesh Liyanaarachchi
>Priority: Minor
>  Labels: gsoc2019, outreachy2021
> Fix For: 1.0
>
>
> As to the discussion [~lewismc]  initiated in the 
> [mail list 
> thread|https://www.mail-archive.com/dev@gora.apache.org/msg05444.html] we 
> need to implement a REST API for GORA.
> This is the initial proposal documentation for the [GORA REST 
> API|http://docs.apachegoraapi.apiary.io/].
> The plan will be to implement the API with  [Apache CXF 
> |http://cxf.apache.org] using [CXF's JAXRS 
> |http://cxf.apache.org/docs/jax-rs.html] implementation.





[jira] [Commented] (GORA-405) Create Gora REST API

2021-10-06 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/GORA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425186#comment-17425186
 ] 

Lewis John McGibbney commented on GORA-405:
---

Hi Kevin,
This can be renamed to “Create Gora OpenAPI specification”

On Wed, Oct 6, 2021 at 09:12 Kevin Ratnasekera (Jira) 

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


> Create Gora REST API
> 
>
> Key: GORA-405
> URL: https://issues.apache.org/jira/browse/GORA-405
> Project: Apache Gora
>  Issue Type: Improvement
>  Components: gora-compiler, gora-core
>Reporter: Udesh Liyanaarachchi
>Assignee: Udesh Liyanaarachchi
>Priority: Minor
>  Labels: gsoc2019, outreachy2021
> Fix For: 1.0
>
>
> Following the discussion [~lewismc] initiated in the 
> [mail list 
> thread|https://www.mail-archive.com/dev@gora.apache.org/msg05444.html], we 
> need to implement a REST API for GORA.
> This is the initial proposal documentation for the [GORA REST 
> API|http://docs.apachegoraapi.apiary.io/].
> The plan is to implement the API with [Apache CXF|http://cxf.apache.org] 
> using [CXF's JAX-RS|http://cxf.apache.org/docs/jax-rs.html] implementation.





[jira] [Created] (ANY23-515) Bump commons-lang3 from 3.10 to 3.12.0

2021-10-06 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-515:
--

 Summary: Bump commons-lang3 from 3.10 to 3.12.0
 Key: ANY23-515
 URL: https://issues.apache.org/jira/browse/ANY23-515
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/207





[jira] [Resolved] (ANY23-515) Bump commons-lang3 from 3.10 to 3.12.0

2021-10-06 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-515.

Resolution: Fixed

> Bump commons-lang3 from 3.10 to 3.12.0
> --
>
> Key: ANY23-515
> URL: https://issues.apache.org/jira/browse/ANY23-515
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/207





[jira] [Resolved] (ANY23-514) Bump maven-scm-provider-gitexe from 1.9 to 1.12.0

2021-10-06 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-514.

Resolution: Fixed

> Bump maven-scm-provider-gitexe from 1.9 to 1.12.0
> -
>
> Key: ANY23-514
> URL: https://issues.apache.org/jira/browse/ANY23-514
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/206





[jira] [Created] (ANY23-514) Bump maven-scm-provider-gitexe from 1.9 to 1.12.0

2021-10-06 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-514:
--

 Summary: Bump maven-scm-provider-gitexe from 1.9 to 1.12.0
 Key: ANY23-514
 URL: https://issues.apache.org/jira/browse/ANY23-514
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/206





[jira] [Resolved] (ANY23-513) Bump formatter-maven-plugin from 2.14.0 to 2.16.0

2021-10-04 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-513.

Resolution: Fixed

> Bump formatter-maven-plugin from 2.14.0 to 2.16.0
> -
>
> Key: ANY23-513
> URL: https://issues.apache.org/jira/browse/ANY23-513
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/204





[jira] [Created] (ANY23-513) Bump formatter-maven-plugin from 2.14.0 to 2.16.0

2021-10-04 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-513:
--

 Summary: Bump formatter-maven-plugin from 2.14.0 to 2.16.0
 Key: ANY23-513
 URL: https://issues.apache.org/jira/browse/ANY23-513
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/204





[jira] [Resolved] (ANY23-512) Bump maven-jxr-plugin from 3.0.0 to 3.1.1

2021-09-29 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-512.

  Assignee: Lewis John McGibbney
Resolution: Fixed

> Bump maven-jxr-plugin from 3.0.0 to 3.1.1
> -
>
> Key: ANY23-512
> URL: https://issues.apache.org/jira/browse/ANY23-512
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/202





[jira] [Created] (ANY23-512) Bump maven-jxr-plugin from 3.0.0 to 3.1.1

2021-09-29 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-512:
--

 Summary: Bump maven-jxr-plugin from 3.0.0 to 3.1.1
 Key: ANY23-512
 URL: https://issues.apache.org/jira/browse/ANY23-512
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/202





[jira] [Resolved] (ANY23-511) Bump snakeyaml from 1.26 to 1.29

2021-09-29 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-511.

Resolution: Fixed

> Bump snakeyaml from 1.26 to 1.29
> 
>
> Key: ANY23-511
> URL: https://issues.apache.org/jira/browse/ANY23-511
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/201





[jira] [Created] (ANY23-511) Bump snakeyaml from 1.26 to 1.29

2021-09-29 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-511:
--

 Summary: Bump snakeyaml from 1.26 to 1.29
 Key: ANY23-511
 URL: https://issues.apache.org/jira/browse/ANY23-511
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/201





[jira] [Created] (NUTCH-2897) Do not suppress deprecated API warnings

2021-09-28 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-2897:
---

 Summary: Do not suppress deprecated API warnings
 Key: NUTCH-2897
 URL: https://issues.apache.org/jira/browse/NUTCH-2897
 Project: Nutch
  Issue Type: Improvement
  Components: documentation
Affects Versions: 1.18
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 1.19


We suppress deprecation warnings in three places:
# [Plugin.java#L92-L96|https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/plugin/Plugin.java#L92-L96]
# [NutchJob.java#L35-L38|https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/util/NutchJob.java#L35-L38], and
# [TikaParser.java#L92-L95|https://github.com/apache/nutch/blob/master/src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/TikaParser.java#L92-L95]

Instead of suppressing the warnings, we should use the correct 
*@Deprecated* annotation and *@deprecated* Javadoc. This is not difficult to do 
and should have been done the first time around.
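
As a minimal sketch of the pattern proposed here: a deprecated method carries both the annotation and the Javadoc tag, and the tag points callers at the replacement. The class and method names below are hypothetical and are not taken from the Nutch code base.

```java
/** Hypothetical class, used only to illustrate the annotation and the Javadoc tag. */
final class DeprecationExample {

    /**
     * Returns a default greeting.
     *
     * @deprecated use {@link #greet(String)} instead, which takes the name explicitly.
     */
    @Deprecated
    static String greet() {
        // The old entry point simply delegates to its replacement.
        return greet("world");
    }

    /** Returns a greeting for the given name. */
    static String greet(String name) {
        return "Hello, " + name;
    }
}
```

With this in place the compiler reports a deprecation warning at each call site of `greet()`, which is the signal that a blanket `@SuppressWarnings` would hide.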





[jira] [Resolved] (ANY23-510) Bump maven-site-plugin from 3.7.1 to 3.9.1

2021-09-26 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-510.

Resolution: Fixed

> Bump maven-site-plugin from 3.7.1 to 3.9.1
> --
>
> Key: ANY23-510
> URL: https://issues.apache.org/jira/browse/ANY23-510
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/200





[jira] [Resolved] (ANY23-509) Bump velocity from 1.5 to 1.7

2021-09-26 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-509.

Resolution: Fixed

> Bump velocity from 1.5 to 1.7
> -
>
> Key: ANY23-509
> URL: https://issues.apache.org/jira/browse/ANY23-509
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/199





[jira] [Created] (ANY23-510) Bump maven-site-plugin from 3.7.1 to 3.9.1

2021-09-26 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-510:
--

 Summary: Bump maven-site-plugin from 3.7.1 to 3.9.1
 Key: ANY23-510
 URL: https://issues.apache.org/jira/browse/ANY23-510
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/200





[jira] [Resolved] (ANY23-508) Bump maven-project-info-reports-plugin from 3.0.0 to 3.1.2

2021-09-26 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-508.

Resolution: Fixed

> Bump maven-project-info-reports-plugin from 3.0.0 to 3.1.2
> --
>
> Key: ANY23-508
> URL: https://issues.apache.org/jira/browse/ANY23-508
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/198





[jira] [Created] (ANY23-509) Bump velocity from 1.5 to 1.7

2021-09-26 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-509:
--

 Summary: Bump velocity from 1.5 to 1.7
 Key: ANY23-509
 URL: https://issues.apache.org/jira/browse/ANY23-509
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/199





[jira] [Created] (ANY23-508) Bump maven-project-info-reports-plugin from 3.0.0 to 3.1.2

2021-09-26 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-508:
--

 Summary: Bump maven-project-info-reports-plugin from 3.0.0 to 3.1.2
 Key: ANY23-508
 URL: https://issues.apache.org/jira/browse/ANY23-508
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/198





[jira] [Created] (ANY23-507) Bump commons-csv from 1.8 to 1.9.0

2021-09-26 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-507:
--

 Summary: Bump commons-csv from 1.8 to 1.9.0
 Key: ANY23-507
 URL: https://issues.apache.org/jira/browse/ANY23-507
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/197





[jira] [Resolved] (ANY23-507) Bump commons-csv from 1.8 to 1.9.0

2021-09-26 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-507.

Resolution: Fixed

> Bump commons-csv from 1.8 to 1.9.0
> --
>
> Key: ANY23-507
> URL: https://issues.apache.org/jira/browse/ANY23-507
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/197





[jira] [Updated] (ANY23-504) Optionally disable remote HTTP connections when resolving XML entities

2021-09-23 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated ANY23-504:
---
Fix Version/s: 2.6

> Optionally disable remote HTTP connections when resolving XML entities
> --
>
> Key: ANY23-504
> URL: https://issues.apache.org/jira/browse/ANY23-504
> Project: Apache Any23
>  Issue Type: Improvement
>Reporter: Sebastian Nagel
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> The Any23 parser should optionally avoid opening HTTP connections when 
> parsing XML.
> While testing Nutch's Any23 plugin with 2.5 (NUTCH-2892) on the file 
> "BBC_News_Scotland.htm", the parser hung for about two minutes with an 
> open HTTP connection to "hans-moleman.w3.org" and the following stack:
> {noformat}
> "parse-0" #19 daemon prio=5 os_prio=0 cpu=1432.93ms elapsed=15.85s 
> tid=0x7efc713bd800 nid=0x16ff4 runnable  [0x7efc29f2d000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(java.base@11.0.11/Native 
> Method)
> at 
> java.net.SocketInputStream.socketRead(java.base@11.0.11/SocketInputStream.java:115)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:168)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:140)
> at 
> java.io.BufferedInputStream.fill(java.base@11.0.11/BufferedInputStream.java:252)
> at 
> java.io.BufferedInputStream.read1(java.base@11.0.11/BufferedInputStream.java:292)
> at 
> java.io.BufferedInputStream.read(java.base@11.0.11/BufferedInputStream.java:351)
> - locked <0x00071be1bb68> (a java.io.BufferedInputStream)
> at 
> sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.11/HttpClient.java:754)
> at 
> sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.11/HttpClient.java:689)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.11/HttpURLConnection.java:1615)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.11/HttpURLConnection.java:1520)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
> at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown 
> Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown 
> Source)
> at 
> org.eclipse.rdf4j.common.xml.SimpleSAXParser.parse(SimpleSAXParser.java:197)
> - locked <0x00071bfe6f28> (a 
> org.eclipse.rdf4j.common.xml.SimpleSAXParser)
> at org.eclipse.rdf4j.rio.trix.TriXParser.parse(TriXParser.java:177)
> at org.eclipse.rdf4j.rio.trix.TriXParser.parse(TriXParser.java:134)
> at 
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:86)
> at 
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:39)
> at 
> org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:523)
> at 
> org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:265)
> at org.apache.any23.Any23.extract(Any23.java:315)
> at org.apache.any23.Any23.extract(Any23.java:483)
> at org.apache.any23.Any23.extract(Any23.java:345)
> at 
> org.apache.nutch.any23.Any23ParseFilter$Any23Parser.parse(Any23ParseFilter.java:106)
> at 
> org.apache.nutch.any23.Any23ParseFilter$Any23Parser.<init>(Any23ParseFilter.java:81)
> at 

[jira] [Commented] (ANY23-504) Optionally disable remote HTTP connections when resolving XML entities

2021-09-22 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/ANY23-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418823#comment-17418823
 ] 

Lewis John McGibbney commented on ANY23-504:


In this case the extractor was an [instanceOf 
ContentExtractor|https://github.com/apache/any23/blob/2a5b41acccfbb04de62824f1375d74c490071343/api/src/main/java/org/apache/any23/extractor/Extractor.java#L44]; see
https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/SingleDocumentExtraction.java#L520-L524

We could augment the application-level configuration in 
[default-configuration.properties|https://github.com/apache/any23/blob/master/api/src/main/resources/default-configuration.properties].
 I've not yet pinned down exactly how this could be implemented seeing as the 
underlying parse implementation is 
[org.eclipse.rdf4j.rio.trix.TriXParser|https://rdf4j.org/javadoc/latest/index.html?org/eclipse/rdf4j/rio/trix/TriXParser.html].

Any23 RDF Parsers are configured in 
[RDFParserFactory.configureParser(...)|https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/rdf/RDFParserFactory.java#L278-L305].
 

The other observation... should the TriXParser really have been activated here?
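
For illustration only, here is a minimal sketch of how remote entity resolution can be switched off at the plain JAXP/SAX level, by installing an EntityResolver that answers every external entity request with an empty document. This is an assumption about one possible mechanism, not how Any23 or RDF4J configure their parsers today. The class and method names are hypothetical, and how such a resolver (or an equivalent setting) would be handed to the TriXParser is still open.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: parses XML without fetching external DTDs or entities. */
final class OfflineXmlParsing {

    /** Parses the document and returns how many external entity requests were intercepted. */
    static int parseWithoutNetwork(String xml) {
        int[] intercepted = {0};
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Answer every external entity (for example the DTD named in a DOCTYPE)
            // with empty content instead of letting the parser open a connection.
            reader.setEntityResolver((publicId, systemId) -> {
                intercepted[0]++;
                return new InputSource(new StringReader(""));
            });
            reader.setContentHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return intercepted[0];
    }
}
```

A document whose DOCTYPE points at a w3.org DTD, like the one in the reported stack, would then be parsed without the SocketInputStream read. The trade-off is that entities declared only in the skipped DTD are no longer defined.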

> Optionally disable remote HTTP connections when resolving XML entities
> --
>
> Key: ANY23-504
> URL: https://issues.apache.org/jira/browse/ANY23-504
> Project: Apache Any23
>  Issue Type: Improvement
>Reporter: Sebastian Nagel
>    Assignee: Lewis John McGibbney
>Priority: Major
>
> The Any23 parser should optionally avoid opening HTTP connections when 
> parsing XML.
> While testing Nutch's Any23 plugin with 2.5 (NUTCH-2892) on the file 
> "BBC_News_Scotland.htm", the parser hung for about two minutes with an 
> open HTTP connection to "hans-moleman.w3.org" and the following stack:
> {noformat}
> "parse-0" #19 daemon prio=5 os_prio=0 cpu=1432.93ms elapsed=15.85s 
> tid=0x7efc713bd800 nid=0x16ff4 runnable  [0x7efc29f2d000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(java.base@11.0.11/Native 
> Method)
> at 
> java.net.SocketInputStream.socketRead(java.base@11.0.11/SocketInputStream.java:115)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:168)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:140)
> at 
> java.io.BufferedInputStream.fill(java.base@11.0.11/BufferedInputStream.java:252)
> at 
> java.io.BufferedInputStream.read1(java.base@11.0.11/BufferedInputStream.java:292)
> at 
> java.io.BufferedInputStream.read(java.base@11.0.11/BufferedInputStream.java:351)
> - locked <0x00071be1bb68> (a java.io.BufferedInputStream)
> at 
> sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.11/HttpClient.java:754)
> at 
> sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.11/HttpClient.java:689)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.11/HttpURLConnection.java:1615)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.11/HttpURLConnection.java:1520)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
> at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown 
> Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown 
> Source)
> at 
> org.eclipse.rdf4j.common.xml.SimpleSAXParser.parse(SimpleSAXParser.java:197)
> - locked <0x00071bfe6f28> (a 
> org.eclipse.rdf4j.common.xml.S

[jira] [Resolved] (ANY23-506) Bump jcommander from 1.78 to 1.81

2021-09-22 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-506.

Resolution: Fixed

> Bump jcommander from 1.78 to 1.81
> -
>
> Key: ANY23-506
> URL: https://issues.apache.org/jira/browse/ANY23-506
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/195





[jira] [Created] (ANY23-506) Bump jcommander from 1.78 to 1.81

2021-09-22 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-506:
--

 Summary: Bump jcommander from 1.78 to 1.81
 Key: ANY23-506
 URL: https://issues.apache.org/jira/browse/ANY23-506
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/195





[jira] [Resolved] (ANY23-505) Bump maven-scm-publish-plugin from 1.0-beta-2 to 3.1.0

2021-09-22 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-505.

Resolution: Fixed

> Bump maven-scm-publish-plugin from 1.0-beta-2 to 3.1.0
> --
>
> Key: ANY23-505
> URL: https://issues.apache.org/jira/browse/ANY23-505
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/196





[jira] [Created] (ANY23-505) Bump maven-scm-publish-plugin from 1.0-beta-2 to 3.1.0

2021-09-22 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-505:
--

 Summary: Bump maven-scm-publish-plugin from 1.0-beta-2 to 3.1.0
 Key: ANY23-505
 URL: https://issues.apache.org/jira/browse/ANY23-505
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/196





[jira] [Commented] (NUTCH-2895) Allow to add plugin dependency jars by wildcard

2021-09-22 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418680#comment-17418680
 ] 

Lewis John McGibbney commented on NUTCH-2895:
-

I agree with your motivation here Seb. I need to study this some...

> Allow to add plugin dependency jars by wildcard
> ---
>
> Key: NUTCH-2895
> URL: https://issues.apache.org/jira/browse/NUTCH-2895
> Project: Nutch
>  Issue Type: Improvement
>  Components: plugin, runtime
>Affects Versions: 1.18
>Reporter: Sebastian Nagel
>Priority: Major
> Fix For: 1.19
>
>
> The plugin descriptors (plugin.xml) require listing all dependent jar files 
> one by one as "library". This makes upgrading plugins which include a longer 
> list of dependencies (parse-tika, any23, protocol-okhttp, Selenium-based 
> protocols) a non-trivial task. Maybe we could add a "wildcard" rule that 
> allows adding all (remaining) jars in the plugin folder to the classpath.
> Admittedly, this would make the Nutch plugin system differ from the [original 
> Eclipse plugin 
> architecture|http://www.eclipse.org/articles/Article-Plug-in-architecture/plugin_architecture.html].
>  In recent Eclipse plugins, dependencies are in the lib/ 
> folder and listed in the file "MANIFEST.MF".
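
To make the "wildcard" idea concrete, a rough sketch of the expansion step could look like the following: collect every jar in a plugin folder and turn it into a classpath URL. This is not Nutch's plugin loading code. The class name, the method name, and the choice to sort the result for a stable classpath order are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: expands a "*.jar" wildcard for one plugin folder. */
final class PluginJarWildcard {

    /** Returns the URLs of all jar files directly inside the folder, sorted by URL. */
    static List<URL> jarUrls(Path pluginDir) {
        List<URL> urls = new ArrayList<>();
        // The glob is matched against file names, so plugin.xml and other files are skipped.
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(pluginDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Directory listing order is unspecified, so sort for a reproducible classpath.
        urls.sort(Comparator.comparing(URL::toString));
        return urls;
    }
}
```

The resulting list could be passed to a `java.net.URLClassLoader`. Jars already listed explicitly as "library" entries would need to be filtered out first if the rule is meant to cover only the remaining ones.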





[jira] [Commented] (ANY23-504) Optionally disable remote HTTP connections when resolving XML entities

2021-09-22 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/ANY23-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418677#comment-17418677
 ] 

Lewis John McGibbney commented on ANY23-504:


I'll see if I can reproduce. I've never seen this before. 

> Optionally disable remote HTTP connections when resolving XML entities
> --
>
> Key: ANY23-504
> URL: https://issues.apache.org/jira/browse/ANY23-504
> Project: Apache Any23
>  Issue Type: Improvement
>Reporter: Sebastian Nagel
>    Assignee: Lewis John McGibbney
>Priority: Major
>
> The Any23 parser should optionally avoid opening HTTP connections when 
> parsing XML.
> While testing Nutch's Any23 plugin with 2.5 (NUTCH-2892) on the file 
> "BBC_News_Scotland.htm", the parser hung for about two minutes with an 
> open HTTP connection to "hans-moleman.w3.org" and the following stack:
> {noformat}
> "parse-0" #19 daemon prio=5 os_prio=0 cpu=1432.93ms elapsed=15.85s 
> tid=0x7efc713bd800 nid=0x16ff4 runnable  [0x7efc29f2d000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(java.base@11.0.11/Native 
> Method)
> at 
> java.net.SocketInputStream.socketRead(java.base@11.0.11/SocketInputStream.java:115)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:168)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:140)
> at 
> java.io.BufferedInputStream.fill(java.base@11.0.11/BufferedInputStream.java:252)
> at 
> java.io.BufferedInputStream.read1(java.base@11.0.11/BufferedInputStream.java:292)
> at 
> java.io.BufferedInputStream.read(java.base@11.0.11/BufferedInputStream.java:351)
> - locked <0x00071be1bb68> (a java.io.BufferedInputStream)
> at 
> sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.11/HttpClient.java:754)
> at 
> sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.11/HttpClient.java:689)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.11/HttpURLConnection.java:1615)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.11/HttpURLConnection.java:1520)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
> at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown 
> Source)
> at 
> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown 
> Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown 
> Source)
> at 
> org.eclipse.rdf4j.common.xml.SimpleSAXParser.parse(SimpleSAXParser.java:197)
> - locked <0x00071bfe6f28> (a 
> org.eclipse.rdf4j.common.xml.SimpleSAXParser)
> at org.eclipse.rdf4j.rio.trix.TriXParser.parse(TriXParser.java:177)
> at org.eclipse.rdf4j.rio.trix.TriXParser.parse(TriXParser.java:134)
> at 
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:86)
> at 
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:39)
> at 
> org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:523)
> at 
> org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:265)
> at org.apache.any23.Any23.extract(Any23.java:315)
> at org.apache.any23.Any23.extract(Any23.java:483)
> at org.apache.any23.Any23.extract(Any23.java:345)
> at 
> org.apache.nutch.any23.Any23ParseFilter$Any23Parser.parse(Any23ParseFilter.java:106)
> at 
> org.apache.nutch.any23.Any

[jira] [Assigned] (ANY23-504) Optionally disable remote HTTP connections when resolving XML entities

2021-09-22 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney reassigned ANY23-504:
--

Assignee: Lewis John McGibbney

> Optionally disable remote HTTP connections when resolving XML entities
> --
>
> Key: ANY23-504
> URL: https://issues.apache.org/jira/browse/ANY23-504
> Project: Apache Any23
>  Issue Type: Improvement
>Reporter: Sebastian Nagel
>    Assignee: Lewis John McGibbney
>Priority: Major
>
> The Any23 parser should optionally avoid opening HTTP connections when 
> parsing XML.
> While testing Nutch's Any23 plugin with 2.5 (NUTCH-2892) on the file 
> "BBC_News_Scotland.htm", the parser hung for about two minutes with an 
> open HTTP connection to "hans-moleman.w3.org" and the following stack:
> {noformat}
> "parse-0" #19 daemon prio=5 os_prio=0 cpu=1432.93ms elapsed=15.85s tid=0x7efc713bd800 nid=0x16ff4 runnable  [0x7efc29f2d000]
>    java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(java.base@11.0.11/Native Method)
> at java.net.SocketInputStream.socketRead(java.base@11.0.11/SocketInputStream.java:115)
> at java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:168)
> at java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:140)
> at java.io.BufferedInputStream.fill(java.base@11.0.11/BufferedInputStream.java:252)
> at java.io.BufferedInputStream.read1(java.base@11.0.11/BufferedInputStream.java:292)
> at java.io.BufferedInputStream.read(java.base@11.0.11/BufferedInputStream.java:351)
> - locked <0x00071be1bb68> (a java.io.BufferedInputStream)
> at sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.11/HttpClient.java:754)
> at sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.11/HttpClient.java:689)
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.11/HttpURLConnection.java:1615)
> - locked <0x00071be11040> (a sun.net.www.protocol.http.HttpURLConnection)
> at sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.11/HttpURLConnection.java:1520)
> - locked <0x00071be11040> (a sun.net.www.protocol.http.HttpURLConnection)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
> at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
> at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
> at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
> at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
> at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
> at org.eclipse.rdf4j.common.xml.SimpleSAXParser.parse(SimpleSAXParser.java:197)
> - locked <0x00071bfe6f28> (a org.eclipse.rdf4j.common.xml.SimpleSAXParser)
> at org.eclipse.rdf4j.rio.trix.TriXParser.parse(TriXParser.java:177)
> at org.eclipse.rdf4j.rio.trix.TriXParser.parse(TriXParser.java:134)
> at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:86)
> at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:39)
> at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:523)
> at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:265)
> at org.apache.any23.Any23.extract(Any23.java:315)
> at org.apache.any23.Any23.extract(Any23.java:483)
> at org.apache.any23.Any23.extract(Any23.java:345)
> at org.apache.nutch.any23.Any23ParseFilter$Any23Parser.parse(Any23ParseFilter.java:106)
> at org.apache.nutch.any23.Any23ParseFilter$Any23Parser.<init>(Any23ParseFilter.java:81)
> at 
> org
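
For illustration only: the stack above shows the XML parser fetching an external DTD over HTTP (XMLEntityManager.startDTDEntity down to HttpURLConnection). A minimal sketch of one way to keep a SAX parser offline is given below, using only standard JAXP/SAX calls. The class name OfflineXmlParsing and its helper methods are hypothetical, and this is not necessarily how ANY23-504 was or will be addressed in Any23 or RDF4J.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Any23, Nutch or RDF4J.
final class OfflineXmlParsing {

    /** Builds a SAX reader that does not dereference external DTDs or entities. */
    static XMLReader newOfflineReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // Standard SAX2 features: do not include external general/parameter entities.
        reader.setFeature("http://xml.org/sax/features/external-general-entities", false);
        reader.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        // Any external identifier that is still looked up (for example the DOCTYPE's
        // system ID) is answered with an empty document instead of a URL fetch.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return reader;
    }

    /** Parses the given XML offline and returns the number of elements seen. */
    static int countElements(String xml) throws Exception {
        int[] count = {0};
        XMLReader reader = newOfflineReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // The DOCTYPE points at a remote DTD; the resolver above keeps the parser from fetching it.
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html><body/></html>";
        System.out.println(countElements(xml));
    }
}
```

The trade-off of this approach is that anything declared only in the skipped DTD (named entities, default attribute values) is no longer available to the parser, which is presumably why the issue asks for the behaviour to be optional.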

lewismc resign as mentor for Apache SDAP Incubating

2021-09-18 Thread lewis john mcgibbney
Hi SDAP PPMC,

I’ve not been active on SDAP for quite some time. I would like to remove
myself as mentor for the project.
I think with a few releases SDAP could graduate pretty soon. Best of luck
with that.

IPMC,
It may be the case that SDAP requires a bit more mentorship in order to
make the first incubating release; I am not sure. The project is in
reasonable health, so there are no blockers.

Thanks
lewismc
-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[jira] [Resolved] (NUTCH-2893) fireant upgrade dependency elasticsearch-rest-high-level-client in src/plugin/indexer-elastic/ivy.xml from 7.11.1 to 7.13.2

2021-09-17 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-2893.
-
Resolution: Fixed

> fireant upgrade dependency elasticsearch-rest-high-level-client in 
> src/plugin/indexer-elastic/ivy.xml from 7.11.1 to 7.13.2
> ---
>
> Key: NUTCH-2893
> URL: https://issues.apache.org/jira/browse/NUTCH-2893
> Project: Nutch
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.19
>
>
> https://github.com/apache/nutch/pull/688



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (NUTCH-2893) fireant upgrade dependency elasticsearch-rest-high-level-client in src/plugin/indexer-elastic/ivy.xml from 7.11.1 to 7.13.2

2021-09-17 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-2893:
---

 Summary: fireant upgrade dependency 
elasticsearch-rest-high-level-client in src/plugin/indexer-elastic/ivy.xml from 
7.11.1 to 7.13.2
 Key: NUTCH-2893
 URL: https://issues.apache.org/jira/browse/NUTCH-2893
 Project: Nutch
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 1.19


https://github.com/apache/nutch/pull/688





[jira] [Created] (NUTCH-2892) Upgrade to Any23 2.5

2021-09-17 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-2892:
---

 Summary: Upgrade to Any23 2.5
 Key: NUTCH-2892
 URL: https://issues.apache.org/jira/browse/NUTCH-2892
 Project: Nutch
  Issue Type: Improvement
  Components: build
Affects Versions: 1.18
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 1.19


I recently released Any23 2.5, which includes some important fixes. I'll go ahead 
and upgrade.





[jira] [Resolved] (ANY23-503) Bump apache from 21 to 24

2021-09-16 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-503.

Resolution: Fixed

> Bump apache from 21 to 24
> -
>
> Key: ANY23-503
> URL: https://issues.apache.org/jira/browse/ANY23-503
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> Original issue created at https://github.com/apache/any23/pull/193
> The PR required more work, so I created a different PR





[jira] [Created] (ANY23-503) Bump apache from 21 to 24

2021-09-16 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-503:
--

 Summary: Bump apache from 21 to 24
 Key: ANY23-503
 URL: https://issues.apache.org/jira/browse/ANY23-503
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


Original issue created at https://github.com/apache/any23/pull/193
The PR required more work, so I created a different PR





[jira] [Created] (ANY23-502) Bump maven-surefire-plugin from 3.0.0-M3 to 3.0.0-M5

2021-09-16 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-502:
--

 Summary: Bump maven-surefire-plugin from 3.0.0-M3 to 3.0.0-M5
 Key: ANY23-502
 URL: https://issues.apache.org/jira/browse/ANY23-502
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/191





[jira] [Resolved] (ANY23-502) Bump maven-surefire-plugin from 3.0.0-M3 to 3.0.0-M5

2021-09-16 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-502.

Resolution: Fixed

> Bump maven-surefire-plugin from 3.0.0-M3 to 3.0.0-M5
> 
>
> Key: ANY23-502
> URL: https://issues.apache.org/jira/browse/ANY23-502
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/191





[jira] [Created] (ANY23-501) Bump maven-invoker-plugin from 3.2.1 to 3.2.2

2021-09-16 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-501:
--

 Summary: Bump maven-invoker-plugin from 3.2.1 to 3.2.2
 Key: ANY23-501
 URL: https://issues.apache.org/jira/browse/ANY23-501
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/192





[jira] [Resolved] (ANY23-501) Bump maven-invoker-plugin from 3.2.1 to 3.2.2

2021-09-16 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-501.

Resolution: Fixed

> Bump maven-invoker-plugin from 3.2.1 to 3.2.2
> -
>
> Key: ANY23-501
> URL: https://issues.apache.org/jira/browse/ANY23-501
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/192





[jira] [Resolved] (ANY23-500) Bump maven-assembly-plugin from 3.1.1 to 3.3.0

2021-09-16 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-500.

Resolution: Fixed

> Bump maven-assembly-plugin from 3.1.1 to 3.3.0
> --
>
> Key: ANY23-500
> URL: https://issues.apache.org/jira/browse/ANY23-500
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/189





[jira] [Resolved] (ANY23-499) Bump spotbugs-maven-plugin from 4.1.3 to 4.3.0

2021-09-16 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-499.

Resolution: Fixed

> Bump spotbugs-maven-plugin from 4.1.3 to 4.3.0
> --
>
> Key: ANY23-499
> URL: https://issues.apache.org/jira/browse/ANY23-499
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/190





[jira] [Created] (ANY23-499) Bump spotbugs-maven-plugin from 4.1.3 to 4.3.0

2021-09-16 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-499:
--

 Summary: Bump spotbugs-maven-plugin from 4.1.3 to 4.3.0
 Key: ANY23-499
 URL: https://issues.apache.org/jira/browse/ANY23-499
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/190





[jira] [Created] (ANY23-500) Bump maven-assembly-plugin from 3.1.1 to 3.3.0

2021-09-16 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-500:
--

 Summary: Bump maven-assembly-plugin from 3.1.1 to 3.3.0
 Key: ANY23-500
 URL: https://issues.apache.org/jira/browse/ANY23-500
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/189





[jira] [Resolved] (ANY23-498) Bump httpcore from 4.4.13 to 4.4.14

2021-09-16 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-498.

Resolution: Fixed

> Bump httpcore from 4.4.13 to 4.4.14
> ---
>
> Key: ANY23-498
> URL: https://issues.apache.org/jira/browse/ANY23-498
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/188





[jira] [Created] (ANY23-498) Bump httpcore from 4.4.13 to 4.4.14

2021-09-16 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-498:
--

 Summary: Bump httpcore from 4.4.13 to 4.4.14
 Key: ANY23-498
 URL: https://issues.apache.org/jira/browse/ANY23-498
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/188





[jira] [Resolved] (ANY23-497) Bump commons-codec from 1.14 to 1.15

2021-09-16 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-497.

Resolution: Fixed

> Bump commons-codec from 1.14 to 1.15
> 
>
> Key: ANY23-497
> URL: https://issues.apache.org/jira/browse/ANY23-497
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/187





[jira] [Created] (ANY23-497) Bump commons-codec from 1.14 to 1.15

2021-09-16 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-497:
--

 Summary: Bump commons-codec from 1.14 to 1.15
 Key: ANY23-497
 URL: https://issues.apache.org/jira/browse/ANY23-497
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/187





[jira] [Resolved] (ANY23-496) Bump tika.version from 1.27 to 2.1.0

2021-09-14 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-496.

Resolution: Fixed

> Bump tika.version from 1.27 to 2.1.0 
> -
>
> Key: ANY23-496
> URL: https://issues.apache.org/jira/browse/ANY23-496
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> The original PR at https://github.com/apache/any23/pull/183 was not 
> sufficient. There was more work required. I will submit a separate PR to 
> cover this work. 





[jira] [Created] (ANY23-496) Bump tika.version from 1.27 to 2.1.0

2021-09-14 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-496:
--

 Summary: Bump tika.version from 1.27 to 2.1.0 
 Key: ANY23-496
 URL: https://issues.apache.org/jira/browse/ANY23-496
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


The original PR at https://github.com/apache/any23/pull/183 was not sufficient. 
There was more work required. I will submit a separate PR to cover this work. 





[jira] [Resolved] (ANY23-494) Bump maven-gpg-plugin from 1.6 to 3.0.1

2021-09-14 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-494.

Resolution: Fixed

> Bump maven-gpg-plugin from 1.6 to 3.0.1
> ---
>
> Key: ANY23-494
> URL: https://issues.apache.org/jira/browse/ANY23-494
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/185





[jira] [Created] (ANY23-495) Bump junit from 4.13 to 4.13.2

2021-09-14 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-495:
--

 Summary: Bump junit from 4.13 to 4.13.2
 Key: ANY23-495
 URL: https://issues.apache.org/jira/browse/ANY23-495
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Affects Versions: 2.6
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney


https://github.com/apache/any23/pull/184





[jira] [Created] (ANY23-494) Bump maven-gpg-plugin from 1.6 to 3.0.1

2021-09-14 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-494:
--

 Summary: Bump maven-gpg-plugin from 1.6 to 3.0.1
 Key: ANY23-494
 URL: https://issues.apache.org/jira/browse/ANY23-494
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/185





[jira] [Resolved] (ANY23-493) Bump commons-io from 2.6 to 2.7

2021-09-13 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-493.

Resolution: Fixed

> Bump commons-io from 2.6 to 2.7
> ---
>
> Key: ANY23-493
> URL: https://issues.apache.org/jira/browse/ANY23-493
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/182





[jira] [Created] (ANY23-493) Bump commons-io from 2.6 to 2.7

2021-09-13 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-493:
--

 Summary: Bump commons-io from 2.6 to 2.7
 Key: ANY23-493
 URL: https://issues.apache.org/jira/browse/ANY23-493
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/182





[jira] [Resolved] (ANY23-492) Bump poi.version from 4.1.2 to 5.0.0

2021-09-13 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-492.

Resolution: Fixed

> Bump poi.version from 4.1.2 to 5.0.0
> 
>
> Key: ANY23-492
> URL: https://issues.apache.org/jira/browse/ANY23-492
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/181





[jira] [Created] (ANY23-492) Bump poi.version from 4.1.2 to 5.0.0

2021-09-13 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-492:
--

 Summary: Bump poi.version from 4.1.2 to 5.0.0
 Key: ANY23-492
 URL: https://issues.apache.org/jira/browse/ANY23-492
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/181





[jira] [Created] (ANY23-491) Bump tika.version from 1.24 to 1.27

2021-09-13 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-491:
--

 Summary: Bump tika.version from 1.24 to 1.27
 Key: ANY23-491
 URL: https://issues.apache.org/jira/browse/ANY23-491
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/179





[jira] [Resolved] (ANY23-490) Bump httpclient.version from 4.5.12 to 4.5.13

2021-09-13 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-490.

Resolution: Fixed

> Bump httpclient.version from 4.5.12 to 4.5.13
> -
>
> Key: ANY23-490
> URL: https://issues.apache.org/jira/browse/ANY23-490
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/178





[jira] [Created] (ANY23-490) Bump httpclient.version from 4.5.12 to 4.5.13

2021-09-13 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-490:
--

 Summary: Bump httpclient.version from 4.5.12 to 4.5.13
 Key: ANY23-490
 URL: https://issues.apache.org/jira/browse/ANY23-490
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/178





[jira] [Resolved] (ANY23-489) Bump slf4j.logger.version from 1.7.30 to 1.7.32

2021-09-13 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-489.

Resolution: Fixed

> Bump slf4j.logger.version from 1.7.30 to 1.7.32
> ---
>
> Key: ANY23-489
> URL: https://issues.apache.org/jira/browse/ANY23-489
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/177





[jira] [Resolved] (ANY23-488) Bump jsonld-java from 0.13.2 to 0.13.3

2021-09-13 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-488.

Resolution: Fixed

> Bump jsonld-java from 0.13.2 to 0.13.3
> --
>
> Key: ANY23-488
> URL: https://issues.apache.org/jira/browse/ANY23-488
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/176





[jira] [Created] (ANY23-489) Bump slf4j.logger.version from 1.7.30 to 1.7.32

2021-09-13 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-489:
--

 Summary: Bump slf4j.logger.version from 1.7.30 to 1.7.32
 Key: ANY23-489
 URL: https://issues.apache.org/jira/browse/ANY23-489
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/177





[jira] [Created] (ANY23-488) Bump jsonld-java from 0.13.2 to 0.13.3

2021-09-13 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-488:
--

 Summary: Bump jsonld-java from 0.13.2 to 0.13.3
 Key: ANY23-488
 URL: https://issues.apache.org/jira/browse/ANY23-488
 Project: Apache Any23
  Issue Type: Improvement
  Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.6


https://github.com/apache/any23/pull/176





[jira] [Commented] (ANY23-487) Bump jsoup from 1.13.1 to 1.14.2

2021-09-13 Thread Lewis John McGibbney (Jira)


[ 
https://issues.apache.org/jira/browse/ANY23-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414253#comment-17414253
 ] 

Lewis John McGibbney commented on ANY23-487:


https://github.com/apache/any23/pull/176

> Bump jsoup from 1.13.1 to 1.14.2
> 
>
> Key: ANY23-487
> URL: https://issues.apache.org/jira/browse/ANY23-487
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build, dependency
>    Reporter: Lewis John McGibbney
>    Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/175




