[GitHub] [any23] dependabot[bot] opened a new pull request #207: Bump commons-lang3 from 3.10 to 3.12.0
dependabot[bot] opened a new pull request #207:
URL: https://github.com/apache/any23/pull/207

Bumps commons-lang3 from 3.10 to 3.12.0.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

---

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

-- This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@any23.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [any23] dependabot[bot] opened a new pull request #206: Bump maven-scm-provider-gitexe from 1.9 to 1.12.0
dependabot[bot] opened a new pull request #206:
URL: https://github.com/apache/any23/pull/206

Bumps maven-scm-provider-gitexe from 1.9 to 1.12.0.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
[jira] [Commented] (ANY23-504) Optionally disable remote HTTP connections when resolving XML entities
[ https://issues.apache.org/jira/browse/ANY23-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424176#comment-17424176 ]

ASF GitHub Bot commented on ANY23-504:
--

lewismc opened a new pull request #205:
URL: https://github.com/apache/any23/pull/205

*Context*
This PR is a WIP. The unit test attempts to perform a simple document extraction using the BBC Scotland HTML as input.

*How to debug*
One can inspect the `TriXExtractor` issues by setting a breakpoint at [org/apache/any23/extractor/SingleDocumentExtraction.java#L543](https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/SingleDocumentExtraction.java#L543). You can then evaluate the following expression:
```
extractionResult.getIssues().toArray()[1]
```
This shows the following issue:
```
FATAL: 'org.eclipse.rdf4j.rio.RDFParseException: The attribute name must be specified in the attribute-list declaration for element "charset". [line 181, column 45]
    at org.eclipse.rdf4j.rio.helpers.RDFParserHelper.reportFatalError(RDFParserHelper.java:333)
    at org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.reportFatalError(AbstractRDFParser.java:724)
    at org.eclipse.rdf4j.rio.trix.TriXParser.reportFatalError(TriXParser.java:253)
    at org.eclipse.rdf4j.rio.trix.TriXParser.fatalError(TriXParser.java:419)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
    at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
    at org.apache.xerces.impl.XMLDTDScannerImpl.scanAttlistDecl(Unknown Source)
    at org.apache.xerces.impl.XMLDTDScannerImpl.scanDecls(Unknown Source)
    at org.apa...' (-1,-1)
```

> Optionally disable remote HTTP connections when resolving XML entities
> --
>
> Key: ANY23-504
> URL: https://issues.apache.org/jira/browse/ANY23-504
> Project: Apache Any23
> Issue Type: Improvement
> Reporter: Sebastian Nagel
> Assignee: Lewis John McGibbney
> Priority: Major
> Fix For: 2.6
>
>
> The Any23 parser should optionally avoid opening HTTP connections when
> parsing XML.
> While testing Nutch's Any23 plugin with 2.5 (NUTCH-2892) on the file
> "BBC_News_Scotland.htm", the parser hung for about two minutes with an
> open HTTP connection to "hans-moleman.w3.org" and the following stack:
> {noformat}
> "parse-0" #19 daemon prio=5 os_prio=0 cpu=1432.93ms elapsed=15.85s tid=0x7efc713bd800 nid=0x16ff4 runnable [0x7efc29f2d000]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(java.base@11.0.11/Native Method)
>         at java.net.SocketInputStream.socketRead(java.base@11.0.11/SocketInputStream.java:115)
>         at java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:168)
>         at java.net.SocketInputStream.read(java.base@11.0.11/SocketInputStream.java:140)
>         at java.io.BufferedInputStream.fill(java.base@11.0.11/BufferedInputStream.java:252)
>         at java.io.BufferedInputStream.read1(java.base@11.0.11/BufferedInputStream.java:292)
>         at java.io.BufferedInputStream.read(java.base@11.0.11/BufferedInputStream.java:351)
>         - locked <0x00071be1bb68> (a java.io.BufferedInputStream)
>         at sun.net.www.http.HttpClient.parseHTTPHeader(java.base@11.0.11/HttpClient.java:754)
>         at sun.net.www.http.HttpClient.parseHTTP(java.base@11.0.11/HttpClient.java:689)
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@11.0.11/HttpURLConnection.java:1615)
>         - locked <0x00071be11040> (a sun.net.www.protocol.http.HttpURLConnection)
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@11.0.11/HttpURLConnection.java:1520)
>         - locked <0x00071be11040> (a sun.net.www.protocol.http.HttpURLConnection)
>         at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
>         at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
>         at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
>         at
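The hang reported in ANY23-504 happens while Xerces opens an external DTD over HTTP (the `XMLEntityManager.startDTDEntity` frame in the stack above). Below is a minimal sketch, assuming a plain JAXP `SAXParserFactory` rather than Any23's own parser setup; it is not the change made in PR #205. The class `OfflineXmlParsing`, its `GuardedHandler`, and the `allowRemote` flag are hypothetical names. With `allowRemote` set to false, it turns off the Xerces `load-external-dtd` feature and the standard SAX2 external-entity features, and its entity resolver hands back an empty entity instead of opening the URL.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class OfflineXmlParsing {

    /** Hypothetical handler: counts elements and decides how external entities are resolved. */
    static final class GuardedHandler extends DefaultHandler {
        private final boolean allowRemote;
        int elements = 0;
        int entityRequests = 0;

        GuardedHandler(boolean allowRemote) {
            this.allowRemote = allowRemote;
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            entityRequests++;
            if (allowRemote) {
                return null; // null means: let the parser open systemId itself
            }
            // Never open the URL; substitute an empty entity instead.
            return new InputSource(new StringReader(""));
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    /** Parses xml; with allowRemote == false no external DTD or entity is fetched. */
    static GuardedHandler parse(String xml, boolean allowRemote) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        if (!allowRemote) {
            // Xerces feature (also recognised by the JDK's bundled Xerces): skip the external DTD subset.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            // Standard SAX2 features: do not include external general or parameter entities.
            factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
            factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        }
        SAXParser parser = factory.newSAXParser();
        GuardedHandler handler = new GuardedHandler(allowRemote);
        // SAXParser.parse also installs the DefaultHandler as the EntityResolver,
        // so resolveEntity above acts as a second line of defence.
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html><body>hello</body></html>";
        GuardedHandler h = parse(xhtml, false);
        System.out.println("elements=" + h.elements + ", entityRequests=" + h.entityRequests);
    }
}
```

One trade-off to expect: once the external DTD is skipped, entity references declared only in that DTD (for example `&nbsp;`) can no longer be expanded, which is why the issue asks for this behaviour to be optional rather than always on.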
[GitHub] [any23] lewismc opened a new pull request #205: ANY23-504 Optionally disable remote HTTP connections when resolving XML entities
lewismc opened a new pull request #205:
URL: https://github.com/apache/any23/pull/205
[jira] [Resolved] (ANY23-513) Bump formatter-maven-plugin from 2.14.0 to 2.16.0
[ https://issues.apache.org/jira/browse/ANY23-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney resolved ANY23-513.
Resolution: Fixed

> Bump formatter-maven-plugin from 2.14.0 to 2.16.0
> -
>
> Key: ANY23-513
> URL: https://issues.apache.org/jira/browse/ANY23-513
> Project: Apache Any23
> Issue Type: Improvement
> Components: build
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Major
> Fix For: 2.6
>
>
> https://github.com/apache/any23/pull/204

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ANY23-513) Bump formatter-maven-plugin from 2.14.0 to 2.16.0
Lewis John McGibbney created ANY23-513:
--

Summary: Bump formatter-maven-plugin from 2.14.0 to 2.16.0
Key: ANY23-513
URL: https://issues.apache.org/jira/browse/ANY23-513
Project: Apache Any23
Issue Type: Improvement
Components: build
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Fix For: 2.6

https://github.com/apache/any23/pull/204
[GitHub] [any23] lewismc merged pull request #204: Bump formatter-maven-plugin from 2.14.0 to 2.16.0
lewismc merged pull request #204:
URL: https://github.com/apache/any23/pull/204