[GitHub] [any23] dependabot[bot] opened a new pull request #207: Bump commons-lang3 from 3.10 to 3.12.0

2021-10-04 Thread GitBox


dependabot[bot] opened a new pull request #207:
URL: https://github.com/apache/any23/pull/207


   Bumps commons-lang3 from 3.10 to 3.12.0.
   
   
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@any23.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [any23] dependabot[bot] opened a new pull request #206: Bump maven-scm-provider-gitexe from 1.9 to 1.12.0

2021-10-04 Thread GitBox


dependabot[bot] opened a new pull request #206:
URL: https://github.com/apache/any23/pull/206


   Bumps maven-scm-provider-gitexe from 1.9 to 1.12.0.
   
   
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   
   
   






[jira] [Commented] (ANY23-504) Optionally disable remote HTTP connections when resolving XML entities

2021-10-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/ANY23-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424176#comment-17424176
 ] 

ASF GitHub Bot commented on ANY23-504:
--

lewismc opened a new pull request #205:
URL: https://github.com/apache/any23/pull/205


   *Context*
   This PR is a WIP.
   The unit test attempts to perform a simple document extraction using the BBC 
Scotland HTML as input.
   
   *How to debug*
   One can inspect the `TriXExtractor` issues by setting a breakpoint at 
[org/apache/any23/extractor/SingleDocumentExtraction.java#L543](https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/SingleDocumentExtraction.java#L543).
 You can then evaluate the following expression
   
   ```
   extractionResult.getIssues().toArray()[1]
   ```
   
   This reports the following issue:
   
   ```
   FATAL:   'org.eclipse.rdf4j.rio.RDFParseException: The attribute name 
must be specified in the attribute-list declaration for element "charset". 
[line 181, column 45]
at 
org.eclipse.rdf4j.rio.helpers.RDFParserHelper.reportFatalError(RDFParserHelper.java:333)
at 
org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.reportFatalError(AbstractRDFParser.java:724)
at 
org.eclipse.rdf4j.rio.trix.TriXParser.reportFatalError(TriXParser.java:253)
at org.eclipse.rdf4j.rio.trix.TriXParser.fatalError(TriXParser.java:419)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
at org.apache.xerces.impl.XMLDTDScannerImpl.scanAttlistDecl(Unknown 
Source)
at org.apache.xerces.impl.XMLDTDScannerImpl.scanDecls(Unknown Source)
at org.apa...'  (-1,-1)
   ```




> Optionally disable remote HTTP connections when resolving XML entities
> --
>
>        Key: ANY23-504
>        URL: https://issues.apache.org/jira/browse/ANY23-504
>    Project: Apache Any23
> Issue Type: Improvement
>   Reporter: Sebastian Nagel
>   Assignee: Lewis John McGibbney
>   Priority: Major
>    Fix For: 2.6
>
>
> The Any23 parser should optionally avoid opening HTTP connections when 
> parsing XML.
> While testing Nutch's Any23 plugin with 2.5 (NUTCH-2892) on the file 
> "BBC_News_Scotland.htm", the parser hung for about two minutes with an 
> open HTTP connection to "hans-moleman.w3.org" and the following stack:
> {noformat}
> "parse-0" #19 daemon prio=5 os_prio=0 cpu=1432.93ms elapsed=15.85s 
> tid=0x7efc713bd800 nid=0x16ff4 runnable  [0x7efc29f2d000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(java.base@11.0.11/Native 
> Method)
> at 
> java.net.SocketInputStream.socketRead(java.base@11.0.11/SocketInputStream.java:115)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:168)
> at 
> java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:140)
> at 
> java.io.BufferedInputStream.fill(java.base@11.0.11/BufferedInputStream.java:252)
> at 
> java.io.BufferedInputStream.read1(java.base@11.0.11/BufferedInputStream.java:292)
> at 
> java.io.BufferedInputStream.read(java.base@11.0.11/BufferedInputStream.java:351)
> - locked <0x00071be1bb68> (a java.io.BufferedInputStream)
> at 
> sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.11/HttpClient.java:754)
> at 
> sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.11/HttpClient.java:689)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.11/HttpURLConnection.java:1615)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.11/HttpURLConnection.java:1520)
> - locked <0x00071be11040> (a 
> sun.net.www.protocol.http.HttpURLConnection)
> at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
> Source)
> at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
> at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown 
> Source)
> at 
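
The stack above shows Xerces opening an `HttpURLConnection` from `XMLEntityManager.startDTDEntity`, that is, while fetching an external DTD. As a minimal sketch of one way to avoid that, assuming plain JAXP and not the Any23 code path, the example below installs an `EntityResolver` that answers every external entity with an empty document, so the parser never dereferences the DTD URL. The class name `OfflineXmlParsing` and the method `parseOffline` are hypothetical and exist only for illustration; the fix eventually adopted for ANY23-504 may look quite different.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of Any23.
public class OfflineXmlParsing {

    /** Parses the XML string and returns the local name of the root element. */
    public static String parseOffline(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Answer every external entity (including the DTD named in DOCTYPE)
        // with an empty document instead of letting the parser open the URL.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getLocalName();
    }

    public static void main(String[] args) throws Exception {
        // The DOCTYPE points at a remote DTD, as many HTML/XHTML pages do.
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";
        System.out.println(parseOffline(xml));
    }
}
```

A trade-off of this approach is that named entities declared only in the skipped DTD (for example `&nbsp;` in XHTML) are no longer defined, so documents that use them may fail to parse. Xerces also exposes the feature `http://apache.org/xml/features/nonvalidating/load-external-dtd`, which can be set to `false` on the factory for a similar effect when validation is off.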

[GitHub] [any23] lewismc opened a new pull request #205: ANY23-504 Optionally disable remote HTTP connections when resolving XML entities

2021-10-04 Thread GitBox


lewismc opened a new pull request #205:
URL: https://github.com/apache/any23/pull/205


   *Context*
   This PR is a WIP.
   The unit test attempts to perform a simple document extraction using the BBC 
Scotland HTML as input.
   
   *How to debug*
   One can inspect the `TriXExtractor` issues by setting a breakpoint at 
[org/apache/any23/extractor/SingleDocumentExtraction.java#L543](https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/SingleDocumentExtraction.java#L543).
 You can then evaluate the following expression
   
   ```
   extractionResult.getIssues().toArray()[1]
   ```
   
   This reports the following issue:
   
   ```
   FATAL:   'org.eclipse.rdf4j.rio.RDFParseException: The attribute name 
must be specified in the attribute-list declaration for element "charset". 
[line 181, column 45]
at 
org.eclipse.rdf4j.rio.helpers.RDFParserHelper.reportFatalError(RDFParserHelper.java:333)
at 
org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.reportFatalError(AbstractRDFParser.java:724)
at 
org.eclipse.rdf4j.rio.trix.TriXParser.reportFatalError(TriXParser.java:253)
at org.eclipse.rdf4j.rio.trix.TriXParser.fatalError(TriXParser.java:419)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
at org.apache.xerces.impl.XMLDTDScannerImpl.scanAttlistDecl(Unknown 
Source)
at org.apache.xerces.impl.XMLDTDScannerImpl.scanDecls(Unknown Source)
at org.apa...'  (-1,-1)
   ```






[jira] [Resolved] (ANY23-513) Bump formatter-maven-plugin from 2.14.0 to 2.16.0

2021-10-04 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/ANY23-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-513.

Resolution: Fixed

> Bump formatter-maven-plugin from 2.14.0 to 2.16.0
> -
>
>        Key: ANY23-513
>        URL: https://issues.apache.org/jira/browse/ANY23-513
>    Project: Apache Any23
> Issue Type: Improvement
> Components: build
>   Reporter: Lewis John McGibbney
>   Assignee: Lewis John McGibbney
>   Priority: Major
>    Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/204



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ANY23-513) Bump formatter-maven-plugin from 2.14.0 to 2.16.0

2021-10-04 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created ANY23-513:
--

   Summary: Bump formatter-maven-plugin from 2.14.0 to 2.16.0
       Key: ANY23-513
       URL: https://issues.apache.org/jira/browse/ANY23-513
   Project: Apache Any23
Issue Type: Improvement
Components: build
  Reporter: Lewis John McGibbney
  Assignee: Lewis John McGibbney
   Fix For: 2.6


https://github.com/apache/any23/pull/204





[GitHub] [any23] lewismc merged pull request #204: Bump formatter-maven-plugin from 2.14.0 to 2.16.0

2021-10-04 Thread GitBox


lewismc merged pull request #204:
URL: https://github.com/apache/any23/pull/204


   

