[GitHub] any23 pull request #111: Any23 295 librdfa module
Github user JulioCCBUcuenca closed the pull request at: https://github.com/apache/any23/pull/111 ---
[GitHub] any23 issue #111: Any23 295 librdfa module
Github user lewismc commented on the issue: https://github.com/apache/any23/pull/111 Which of the PRs are you attempting to use? I think this PR is not suitable for merging. Please close it and resubmit a clean PR. Thank you, @JulioCCBUcuenca ---
[jira] [Commented] (ANY23-381) JsonParseException: Illegal unquoted character
[ https://issues.apache.org/jira/browse/ANY23-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567568#comment-16567568 ]

Hudson commented on ANY23-381:
------------------------------

SUCCESS: Integrated in Jenkins build Any23-trunk #1609 (See [https://builds.apache.org/job/Any23-trunk/1609/])
ANY23-381 fix illegal unescaped characters in JSON-LD (hans: rev 817e744af90d8f3c9bf419e5c395c421e0c3924a)
* (edit) core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java
* (edit) core/src/test/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractorTest.java
* (add) test-resources/src/test/resources/html/html-jsonld-unescaped-characters.html

> JsonParseException: Illegal unquoted character
> ----------------------------------------------
>
> Key: ANY23-381
> URL: https://issues.apache.org/jira/browse/ANY23-381
> Project: Apache Any23
> Issue Type: Bug
> Components: extractors
> Affects Versions: 2.3
> Reporter: Hans Brende
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
> While perusing the site http://losangeles.eventful.com/events I stumbled across the following exception:
> {noformat}
> org.eclipse.rdf4j.rio.RDFParseException: Could not parse JSONLD
>     at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:77)
>     at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:196)
>     ... 36 more
> Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value
>  at [Source: (BufferedReader); line: 1, column: 147]
>     at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
>     at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663)
>     at com.fasterxml.jackson.core.base.ParserMinimalBase._throwUnquotedSpace(ParserMinimalBase.java:627)
>     at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2045)
>     at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2016)
>     at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:278)
>     at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:672)
>     at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:527)
>     at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:364)
>     at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:29)
>     at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:3972)
>     at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2264)
>     at com.fasterxml.jackson.core.JsonParser.readValueAs(JsonParser.java:1729)
>     at com.github.jsonldjava.utils.JsonUtils.fromJsonParser(JsonUtils.java:196)
>     at com.github.jsonldjava.utils.JsonUtils.fromReader(JsonUtils.java:173)
>     at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:154)
>     at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:111)
>     at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:71)
>     ... 37 more
> {noformat}
> caused by the {{description}} field in the following json spanning multiple unescaped newlines:
> {noformat}
> {
>   "@context": "http://schema.org",
>   "@type": "Event",
>   "name": "#1 Magic Show in L.A.",
>   "description": "#1 MAGIC SHOW IN L.A.
> The current WINNER of the CW’s Penn & Teller’s FOOL US, Illusionist
> extraordinaire Ivan Amodei is on a national tour with his show INTIMATE
> ILLUSIONS.
> Currently, on an ei...",
>   "startDate": "Saturday, August 11, 2018 4:00 PM",
>   "image": "//d1marr3m5x4iac.cloudfront.net/images/perspectivecrop373by249/I0-001/040/358/185-9.png_/1-magic-show-la-85.png",
>   "location": {
>     "@type": "Place",
>     "name": "Beverly Wilshire Hotel",
>     "url": "//losangeles.eventful.com/venues/beverly-wilshire-hotel-/V0-001-003541383-4",
>     "address": {
>       "streetAddress": "9500 Wilshire Boulevard",
>       "addressLocality": "Beverly Hills",
>       "addressRegion": "California",
>       "postalCode": "90212"
>     }
>   },
>   "offers": {
>     "@type": "Offer",
>     "url": "//losangeles.eventful.com/events/1-magic-show-la-/E0-001-114704991-1/tickets",
>     "availability": "http://schema.org/InStock"
>   },
>   "performer": [{
>     "@type": "Person",
>     "name": "Ivan Amodei"
>   }]
> }
> {noformat}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
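The Jackson error in ANY23-381 is triggered by a raw line feed (code 10) inside a JSON string literal. As a rough illustration of one way such input could be repaired before parsing, the sketch below escapes control characters that occur inside string literals. It is not the change committed to `BaseRDFExtractor` (that diff is not shown in this thread); the class and method names are hypothetical.

```java
final class JsonControlCharEscaper {

    /**
     * Escapes raw control characters (below U+0020) found inside JSON
     * string literals. Characters outside string literals are copied as-is.
     * Hypothetical helper, for illustration only.
     */
    static String escapeControlCharsInStrings(String json) {
        StringBuilder out = new StringBuilder(json.length() + 16);
        boolean inString = false; // currently inside a "..." literal
        boolean escaped = false;  // previous character was a backslash in a literal
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (!inString) {
                if (c == '"') {
                    inString = true;
                }
                out.append(c);
            } else if (escaped) {
                out.append(c);
                escaped = false;
            } else if (c == '\\') {
                out.append(c);
                escaped = true;
            } else if (c == '"') {
                out.append(c);
                inString = false;
            } else if (c < 0x20) {
                switch (c) {
                    case '\n': out.append("\\n"); break;
                    case '\r': out.append("\\r"); break;
                    case '\t': out.append("\\t"); break;
                    default: out.append(String.format("\\u%04x", (int) c));
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A shortened version of the failing input: a newline inside "description".
        String broken = "{\"description\": \"#1 MAGIC SHOW IN L.A.\nThe current WINNER\"}";
        System.out.println(escapeControlCharsInStrings(broken));
    }
}
```

Jackson also has a lenient parser setting, `JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS`, which accepts such input directly; whether Any23 can reach that setting through the RDF4J JSON-LD parser is not stated in this thread.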
[jira] [Assigned] (ANY23-237) Fix RDFa test 0087: stylesheet reserved word is stripped out
[ https://issues.apache.org/jira/browse/ANY23-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hans Brende reassigned ANY23-237:
---------------------------------

    Assignee: Hans Brende

> Fix RDFa test 0087: stylesheet reserved word is stripped out
> ------------------------------------------------------------
>
> Key: ANY23-237
> URL: https://issues.apache.org/jira/browse/ANY23-237
> Project: Apache Any23
> Issue Type: Bug
> Reporter: stephane corlosquet
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
> We have pretty much 100% green results on the official RDFa test suite at http://rdfa.info/test-suite/. There is only one fail remaining: test 0087.
> For some reason, any23 isn't able to extract a triple out of this markup:
> {code}
> href="http://example.org/stylesheet">stylesheet
> {code}
> when it can extract the right triple for all the other elements in the test such as
> {code}
> http://example.org/alternate">alternate
> {code}
> I'm going to need some help to figure this out, as I have no idea what part of any23 is causing this. I checked the same test on semargl (our RDFa parser) and it is passing no problem.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ANY23-169) Incorrect interpretation of relative and absolute paths with Microdata
[ https://issues.apache.org/jira/browse/ANY23-169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hans Brende reassigned ANY23-169:
---------------------------------

    Assignee: Hans Brende

> Incorrect interpretation of relative and absolute paths with Microdata
> ----------------------------------------------------------------------
>
> Key: ANY23-169
> URL: https://issues.apache.org/jira/browse/ANY23-169
> Project: Apache Any23
> Issue Type: Bug
> Components: microdata
> Reporter: Ruben Verborgh
> Assignee: Hans Brende
> Priority: Major
> Labels: microdata, url, urls
> Fix For: 2.3
>
> Parsing the following fragment located at http://ruben.verborgh.org/tmp/slash-test.html
> Homepage
> Other
> results in the URIs
> http://ruben.verborgh.org/tmp/slash-test.html//
> http://ruben.verborgh.org/tmp/slash-test.html/other.html
> instead of the correct
> http://ruben.verborgh.org/tmp/
> http://ruben.verborgh.org/tmp/other.html
> Note that there is no trailing slash in the original.
> Test case:
> http://ruben.verborgh.org/tmp/slash-test.html
> http://any23.org/any23/?format=best&uri=http%3A%2F%2Fruben.verborgh.org%2Ftmp%2Fslash-test.html&validation-mode=none

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
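The URIs that ANY23-169 calls correct are what standard relative-reference resolution produces: the last path segment of the base (`slash-test.html`) is dropped before the reference is appended. A minimal sketch with `java.net.URI` follows. The `href` values `./` and `other.html` are assumptions, because the original markup was stripped from this message, and the class name is hypothetical.

```java
import java.net.URI;

final class RelativeUrlResolution {

    /** Resolves href against base the way a browser would for a link on that page. */
    static String resolve(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://ruben.verborgh.org/tmp/slash-test.html";
        // Assumed href values; the real ones are not visible in the report.
        System.out.println(resolve(base, "./"));
        System.out.println(resolve(base, "other.html"));
    }
}
```

The incorrect results quoted in the report look like plain string concatenation of the page URL, a slash, and the reference, which is the behaviour this kind of resolution avoids.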
[jira] [Assigned] (ANY23-331) Tool service implementations declared in wrong module?
[ https://issues.apache.org/jira/browse/ANY23-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hans Brende reassigned ANY23-331:
---------------------------------

    Assignee: Hans Brende

> Tool service implementations declared in wrong module?
> ------------------------------------------------------
>
> Key: ANY23-331
> URL: https://issues.apache.org/jira/browse/ANY23-331
> Project: Apache Any23
> Issue Type: Bug
> Components: CLI, core
> Affects Versions: 2.1
> Reporter: Hans Brende
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
> It appears that all the {{org.apache.any23.cli.Tool}}-implementing classes are declared in the *core*/src/main/resources/META-INF/services directory instead of the *cli*/src/main/resources/META-INF/services directory (even though all of the referenced Tool implementations occur in the *cli* module, *not* the core module). I'm not sure what effect (if any) this would have on service loading, but this feels wrong. Comments anyone?

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ANY23-378) JsonParseException caused by trailing commas in JSON-LD
[ https://issues.apache.org/jira/browse/ANY23-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hans Brende updated ANY23-378:
------------------------------

    Summary: JsonParseException caused by trailing commas in JSON-LD (was: JsonParseException)

> JsonParseException caused by trailing commas in JSON-LD
> -------------------------------------------------------
>
> Key: ANY23-378
> URL: https://issues.apache.org/jira/browse/ANY23-378
> Project: Apache Any23
> Issue Type: Bug
> Components: extractors
> Affects Versions: 2.3
> Reporter: Hans Brende
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
> On the page http://golfavisen.dk/golfavisen-award-2018/ I'm getting a JsonParseException in the EmbeddedJSONLDExtractor:
> {noformat}
> org.apache.any23.extractor.ExtractionException: Error while parsing RDF document.
>     at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:175)
>     at org.apache.any23.extractor.html.EmbeddedJSONLDExtractor.extractJSONLDScript(EmbeddedJSONLDExtractor.java:149)
>     at org.apache.any23.extractor.html.EmbeddedJSONLDExtractor.run(EmbeddedJSONLDExtractor.java:83)
>     at org.apache.any23.extractor.html.EmbeddedJSONLDExtractor.run(EmbeddedJSONLDExtractor.java:54)
>     at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:480)
>     at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:259)
>     at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:323)
>     at org.apache.any23.extractor.html.AbstractExtractorTestCase.extract(AbstractExtractorTestCase.java:189)
>     at org.apache.any23.extractor.html.AbstractExtractorTestCase.assertExtract(AbstractExtractorTestCase.java:204)
>     ... 28 more
> Caused by: org.eclipse.rdf4j.rio.RDFParseException: Could not parse JSONLD
>     at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:77)
>     at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:171)
>     ... 36 more
> Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('}' (code 125)): was expecting double-quote to start field name
>  at [Source: (BufferedReader); line: 9, column: 10]
>     at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
>     at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663)
>     at com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:561)
>     at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddName(ReaderBasedJsonParser.java:1757)
>     at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextFieldName(ReaderBasedJsonParser.java:907)
>     at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:512)
>     at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:364)
>     at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:29)
>     at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:3972)
>     at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2264)
>     at com.fasterxml.jackson.core.JsonParser.readValueAs(JsonParser.java:1729)
>     at com.github.jsonldjava.utils.JsonUtils.fromJsonParser(JsonUtils.java:196)
>     at com.github.jsonldjava.utils.JsonUtils.fromReader(JsonUtils.java:173)
>     at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:154)
>     at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:111)
>     at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:71)
>     ... 37 more
> {noformat}
> caused by the following json:
> {noformat}
> { "@context": "http://schema.org",
>   "@type": "Event",
>   "name": "PINNACLE BANK CHAMPIONSHIP",
>   "startDate": "2018-7-19T00-00-00-00",
>   "endDate": "2018-7-19T23-23-59-00",
>   "image":"http://golfavisen.dk/wp-content/uploads/2017/03/WEB.png",
>   "description":"PINNACLE BANK CHAMPIONSHIP",
> }
> {noformat}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
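In ANY23-378 the parser stops at the closing `}` because the comma after the last member makes it expect another field name. The sketch below shows one possible lenient pre-processing step that drops a comma when only whitespace separates it from a closing `}` or `]`, while leaving commas inside string literals alone. The class and method names are hypothetical, and this thread does not say how the issue was actually resolved in Any23.

```java
final class TrailingCommaStripper {

    /**
     * Removes commas that are followed only by whitespace and then '}' or ']'.
     * Commas inside string literals are never touched.
     * Hypothetical helper, for illustration only.
     */
    static String stripTrailingCommas(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false; // currently inside a "..." literal
        boolean escaped = false;  // previous character was a backslash in a literal
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
                out.append(c);
            } else if (c == ',') {
                // Look ahead past whitespace to see what follows the comma.
                int j = i + 1;
                while (j < json.length() && Character.isWhitespace(json.charAt(j))) {
                    j++;
                }
                boolean trailing = j < json.length()
                        && (json.charAt(j) == '}' || json.charAt(j) == ']');
                if (!trailing) {
                    out.append(c);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String broken = "{ \"name\": \"PINNACLE BANK CHAMPIONSHIP\",\n}";
        System.out.println(stripTrailingCommas(broken));
    }
}
```

Jackson itself offers `JsonParser.Feature.ALLOW_TRAILING_COMMA` (since 2.9) for the same purpose; a hand-rolled pass like this one is only relevant when the parser configuration is out of reach.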
[jira] [Commented] (ANY23-381) JsonParseException: Illegal unquoted character
[ https://issues.apache.org/jira/browse/ANY23-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567523#comment-16567523 ]

ASF GitHub Bot commented on ANY23-381:
--------------------------------------

Github user asfgit closed the pull request at: https://github.com/apache/any23/pull/112

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] any23 pull request #112: ANY23-381 escape illegal characters in JSON-LD stri...
Github user asfgit closed the pull request at: https://github.com/apache/any23/pull/112 ---
[jira] [Resolved] (ANY23-381) JsonParseException: Illegal unquoted character
[ https://issues.apache.org/jira/browse/ANY23-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hans Brende resolved ANY23-381.
-------------------------------

    Resolution: Fixed

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ANY23-381) JsonParseException: Illegal unquoted character
[ https://issues.apache.org/jira/browse/ANY23-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567522#comment-16567522 ]

ASF GitHub Bot commented on ANY23-381:
--------------------------------------

GitHub user HansBrende opened a pull request: https://github.com/apache/any23/pull/112

ANY23-381 escape illegal characters in JSON-LD strings

mvn clean test -> all tests passed

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HansBrende/any23 ANY23-381

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/any23/pull/112.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #112

commit 817e744af90d8f3c9bf419e5c395c421e0c3924a
Author: Hans
Date: 2018-08-02T21:33:36Z

    ANY23-381 fix illegal unescaped characters in JSON-LD
[GitHub] any23 pull request #112: ANY23-381 escape illegal characters in JSON-LD stri...
GitHub user HansBrende opened a pull request: https://github.com/apache/any23/pull/112

ANY23-381 escape illegal characters in JSON-LD strings

mvn clean test -> all tests passed

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HansBrende/any23 ANY23-381

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/any23/pull/112.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #112

commit 817e744af90d8f3c9bf419e5c395c421e0c3924a
Author: Hans
Date: 2018-08-02T21:33:36Z

    ANY23-381 fix illegal unescaped characters in JSON-LD

---
[GitHub] any23 issue #111: Any23 295 librdfa module
Github user JulioCCBUcuenca commented on the issue: https://github.com/apache/any23/pull/111 @lewismc Could you please create a separate branch for the implementation of librdfa-rdf4j? ---
[GitHub] any23 pull request #111: Any23 295 librdfa module
GitHub user JulioCCBUcuenca opened a pull request: https://github.com/apache/any23/pull/111

Any23 295 librdfa module

# librdfa-rdf4j

Implementation of librdfa bridge. This implementation has a RDF4J Parser along with the bridge between librdfa (C) and RDF4J (Java).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JulioCCBUcuenca/any23 ANY23-295_librdfa

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/any23/pull/111.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #111

commit a9f37b2293fb371eda431b4385e26cf99fbff365
Author: Julio Caguano
Date: 2018-06-29T03:03:56Z

    Add extractors for librdfa.

    Signed-off-by: Julio Caguano

commit eec861c65f66bd9cbcb45bcabe7d9ca1834ae3d2
Author: Julio Caguano
Date: 2018-06-29T03:05:29Z

    Merge branch 'master' of https://github.com/apache/any23 into ANY23-295

commit ccf33eafca699dde016f0551f664edb584a6e5ba
Author: Julio Caguano
Date: 2018-07-13T03:30:50Z

    Merge branch 'master' of https://github.com/apache/any23 into ANY23-295

commit 64082692139f69df16d2985b6e9591d000e6457b
Author: Julio Caguano
Date: 2018-07-16T15:18:09Z

    add librdfa extractor

commit 2fe500f5283a96a4d2428fad30b0b671384d6a73
Author: Julio Caguano
Date: 2018-07-16T15:19:04Z

    Merge branch 'master' of https://github.com/apache/any23 into ANY23-295

commit 68f0d8078fb2adc3d11d9e8ebf83c6e7e58aa9b3
Author: Julio Caguano
Date: 2018-07-19T03:14:56Z

    ignore basic test

commit bd70dfc1abc4864fcd3857291d009e2a45d9b556
Author: Julio Caguano
Date: 2018-07-26T03:40:57Z

    Make libdrfa configurable.

commit a271a21760ca4caee0ec9359008cd446fe8b950a
Author: Julio Caguano
Date: 2018-07-30T00:39:19Z

    solve integration test. Librdfa is loaded with SPI.

commit 5dbc86c7601f3e9abd1aec08cb5e41716c2e7cab
Author: Julio Caguano
Date: 2018-07-30T03:18:25Z

    Add test suite

commit c4b5dccbbd004e480494f38f57238936d3e8942d
Author: Julio Caguano
Date: 2018-07-30T03:18:52Z

    add last version of librdfa-rdf4j

commit 85e0c7e13df92e457d8144c9cf66f67fe85bd9e5
Author: Julio Caguano
Date: 2018-07-30T03:20:52Z

    Add lang tag. lang tag is used to identify language in HTML pages, and xml:lang is used to identify in xml files.

commit b0a21ff13fa9e03c7734be54f1bdb2a8b85ad307
Author: Julio Caguano
Date: 2018-08-02T19:31:24Z

    Merge branch 'master' of https://github.com/apache/any23 into ANY23-295

commit d0e5f8319cf2f52c4d4ba3d7cfc9f789cb225847
Author: Julio Caguano
Date: 2018-08-02T20:12:32Z

    librdfa module, bridge.

---
[jira] [Commented] (ANY23-380) RDFa SAXParseException: attribute was already specified
[ https://issues.apache.org/jira/browse/ANY23-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567392#comment-16567392 ]

Hudson commented on ANY23-380:
------------------------------

SUCCESS: Integrated in Jenkins build Any23-trunk #1608 (See [https://builds.apache.org/job/Any23-trunk/1608/])
ANY23-380 disallow duplicate attribute keys (hans: rev 4e3011a4d80545f04563f427687f4fa74e17103f)
* (add) test-resources/src/test/resources/html/rdfa/attribute-already-specified.html
* (edit) core/src/test/java/org/apache/any23/extractor/rdfa/RDFa11ExtractorTest.java
* (edit) core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java

> RDFa SAXParseException: attribute was already specified
> -------------------------------------------------------
>
> Key: ANY23-380
> URL: https://issues.apache.org/jira/browse/ANY23-380
> Project: Apache Any23
> Issue Type: Bug
> Components: extractors
> Affects Versions: 2.3
> Reporter: Hans Brende
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
> When browsing the page https://www.lokalkompass.de/bilder/kirche.html I came upon the following exception:
>
> {noformat}
> org.eclipse.rdf4j.rio.RDFParseException: org.xml.sax.SAXParseException; lineNumber: 235; columnNumber: 511; Attribute "dort..." was already specified for element "a".
>     at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:111)
>     at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:95)
>     at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:178)
>     ... 34 more
> Caused by: org.semarglproject.rdf.ParseException: org.xml.sax.SAXParseException; lineNumber: 235; columnNumber: 511; Attribute "dort..." was already specified for element "a".
>     at org.semarglproject.rdf.rdfa.RdfaParser.processException(RdfaParser.java:1141)
>     at org.semarglproject.source.XmlSource.process(XmlSource.java:50)
>     at org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87)
>     at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167)
>     at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154)
>     at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:109)
>     ... 36 more
> Caused by: org.xml.sax.SAXParseException; lineNumber: 235; columnNumber: 511; Attribute "dort..." was already specified for element "a".
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>     at org.semarglproject.source.XmlSource.process(XmlSource.java:48)
>     ... 40 more
> {noformat}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
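The root cause in ANY23-380 is an XML well-formedness rule: an element may not carry the same attribute name twice, so a SAX parser treats the duplicate as a fatal error. The sketch below reproduces that class of failure with the SAX parser bundled in the JDK. The markup is a made-up stand-in for the real page, the class name is hypothetical, and the exact message wording may differ from the Xerces build in the trace above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class DuplicateAttributeDemo {

    /** Returns the parser's error message, or null if the markup is well-formed. */
    static String parseError(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // DefaultHandler rethrows fatal errors such as duplicate attributes.
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input with the attribute "href" specified twice.
        System.out.println(parseError("<a href=\"x\" href=\"y\">link</a>"));
    }
}
```

The commit summary in the message above ("disallow duplicate attribute keys") suggests the input is cleaned up before it reaches the RDFa parser, though the diff itself is not part of this thread.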
[jira] [Updated] (ANY23-381) JsonParseException: Illegal unquoted character
[ https://issues.apache.org/jira/browse/ANY23-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hans Brende updated ANY23-381: -- Description: While perusing the site http://losangeles.eventful.com/events I stumbled across the following exception: {noformat} org.eclipse.rdf4j.rio.RDFParseException: Could not parse JSONLD at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:77) at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:196) ... 36 more Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value at [Source: (BufferedReader); line: 1, column: 147] at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804) at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663) at com.fasterxml.jackson.core.base.ParserMinimalBase._throwUnquotedSpace(ParserMinimalBase.java:627) at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2045) at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2016) at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:278) at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:672) at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:527) at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:364) at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:29) at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:3972) at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2264) at com.fasterxml.jackson.core.JsonParser.readValueAs(JsonParser.java:1729) at 
com.github.jsonldjava.utils.JsonUtils.fromJsonParser(JsonUtils.java:196) at com.github.jsonldjava.utils.JsonUtils.fromReader(JsonUtils.java:173) at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:154) at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:111) at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:71) ... 37 more {noformat} caused by the {{description}} field in the following json spanning multiple unescaped newlines: {noformat} { "@context": "http://schema.org", "@type": "Event", "name": "#1 Magic Show in L.A.", "description": "#1 MAGIC SHOW IN L.A. The current WINNER of the CW’s Penn & Teller’s FOOL US, Illusionist extraordinaire Ivan Amodei is on a national tour with his show INTIMATE ILLUSIONS. Currently, on an ei...", "startDate": "Saturday, August 11, 2018 4:00 PM", "image": "//d1marr3m5x4iac.cloudfront.net/images/perspectivecrop373by249/I0-001/040/358/185-9.png_/1-magic-show-la-85.png", "location": { "@type": "Place", "name": "Beverly Wilshire Hotel", "url": "//losangeles.eventful.com/venues/beverly-wilshire-hotel-/V0-001-003541383-4", "address": { "streetAddress": "9500 Wilshire Boulevard", "addressLocality": "Beverly Hills", "addressRegion": "California", "postalCode": "90212" } }, "offers": { "@type": "Offer", "url": "//losangeles.eventful.com/events/1-magic-show-la-/E0-001-114704991-1/tickets", "availability": "http://schema.org/InStock" }, "performer": [{ "@type": "Person", "name": "Ivan Amodei" }] } {noformat} was: While perusing the site http://losangeles.eventful.com/events I stumbled across the following exception: {noformat} org.eclipse.rdf4j.rio.RDFParseException: Could not parse JSONLD at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:77) at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:196) ... 
36 more Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value at [Source: (BufferedReader); line: 1, column: 147] at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804) at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663) at com.fasterxml.jackson.core.base.ParserMinimalBase._throwUnquotedSpace(ParserMinimalBase.java:627) at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2045) at com.fasterxml.jackson.core.json
[jira] [Created] (ANY23-381) JsonParseException: Illegal unquoted character
Hans Brende created ANY23-381: - Summary: JsonParseException: Illegal unquoted character Key: ANY23-381 URL: https://issues.apache.org/jira/browse/ANY23-381 Project: Apache Any23 Issue Type: Bug Components: extractors Affects Versions: 2.3 Reporter: Hans Brende Assignee: Hans Brende Fix For: 2.3 While perusing the site http://losangeles.eventful.com/events I stumbled across the following exception: {noformat} org.eclipse.rdf4j.rio.RDFParseException: Could not parse JSONLD at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:77) at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:196) ... 36 more Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value at [Source: (BufferedReader); line: 1, column: 147] at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804) at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663) at com.fasterxml.jackson.core.base.ParserMinimalBase._throwUnquotedSpace(ParserMinimalBase.java:627) at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2045) at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2016) at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:278) at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:672) at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:527) at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:364) at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:29) at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:3972) at 
com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2264) at com.fasterxml.jackson.core.JsonParser.readValueAs(JsonParser.java:1729) at com.github.jsonldjava.utils.JsonUtils.fromJsonParser(JsonUtils.java:196) at com.github.jsonldjava.utils.JsonUtils.fromReader(JsonUtils.java:173) at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:154) at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:111) at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:71) ... 37 more {noformat}
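The trace above reports a raw line feed (CTRL-CHAR, code 10) inside a JSON string value. As a minimal sketch of one possible workaround, and not the change actually made in BaseRDFExtractor (which this thread does not show), the following stdlib-only Java helper escapes raw control characters that occur inside string literals before the text is handed to a JSON parser. The class name `JsonControlCharEscaper` and the method `escape` are hypothetical.

```java
public final class JsonControlCharEscaper {

    private JsonControlCharEscaper() {}

    /**
     * Returns a copy of the input in which raw control characters
     * (code points below 0x20) that appear inside JSON string literals
     * are replaced by their backslash escapes. Text outside string
     * literals, including whitespace between tokens, is left untouched.
     */
    public static String escape(String json) {
        StringBuilder out = new StringBuilder(json.length() + 16);
        boolean inString = false; // currently inside a "..." literal
        boolean escaped = false;  // previous char was a backslash inside a literal
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (!inString) {
                if (c == '"') {
                    inString = true;
                }
                out.append(c);
            } else if (escaped) {
                // Character following a backslash: copy it through unchanged.
                escaped = false;
                out.append(c);
            } else if (c == '\\') {
                escaped = true;
                out.append(c);
            } else if (c == '"') {
                inString = false;
                out.append(c);
            } else if (c < 0x20) {
                switch (c) {
                    case '\n': out.append("\\n"); break;
                    case '\r': out.append("\\r"); break;
                    case '\t': out.append("\\t"); break;
                    // Any other control character becomes a six-character hex escape.
                    default: out.append(String.format("\\u%04x", (int) c));
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Calling `JsonControlCharEscaper.escape(rawJsonLd)` on a document like the one from the eventful.com page would turn each line break inside the {{description}} value into a two-character `\n` escape while leaving line breaks between tokens alone. The sketch assumes the input is otherwise well-formed JSON; it does not repair a backslash that is directly followed by a raw control character.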
[jira] [Resolved] (ANY23-380) RDFa SAXParseException: attribute was already specified
[ https://issues.apache.org/jira/browse/ANY23-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hans Brende resolved ANY23-380. --- Resolution: Fixed Assignee: Hans Brende > RDFa SAXParseException: attribute was already specified > --- > > Key: ANY23-380 > URL: https://issues.apache.org/jira/browse/ANY23-380 > Project: Apache Any23 > Issue Type: Bug > Components: extractors >Affects Versions: 2.3 >Reporter: Hans Brende >Assignee: Hans Brende >Priority: Major > Fix For: 2.3 > > > When browsing the page https://www.lokalkompass.de/bilder/kirche.html I came > upon the following exception: > > {noformat} > org.eclipse.rdf4j.rio.RDFParseException: org.xml.sax.SAXParseException; > lineNumber: 235; columnNumber: 511; Attribute "dort..." was already specified > for element "a". > at > org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:111) > at > org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:95) > at > org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:178) > ... 34 more > Caused by: org.semarglproject.rdf.ParseException: > org.xml.sax.SAXParseException; lineNumber: 235; columnNumber: 511; Attribute > "dort..." was already specified for element "a". > at > org.semarglproject.rdf.rdfa.RdfaParser.processException(RdfaParser.java:1141) > at org.semarglproject.source.XmlSource.process(XmlSource.java:50) > at > org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87) > at > org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167) > at > org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154) > at > org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:109) > ... 36 more > Caused by: org.xml.sax.SAXParseException; lineNumber: 235; columnNumber: 511; > Attribute "dort..." was already specified for element "a". 
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) > at org.semarglproject.source.XmlSource.process(XmlSource.java:48) > ... 40 more > {noformat}
[jira] [Commented] (ANY23-380) RDFa SAXParseException: attribute was already specified
[ https://issues.apache.org/jira/browse/ANY23-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567311#comment-16567311 ] ASF GitHub Bot commented on ANY23-380: -- Github user asfgit closed the pull request at: https://github.com/apache/any23/pull/110 > RDFa SAXParseException: attribute was already specified > --- > > Key: ANY23-380 > URL: https://issues.apache.org/jira/browse/ANY23-380 > Project: Apache Any23 > Issue Type: Bug > Components: extractors >Affects Versions: 2.3 >Reporter: Hans Brende >Priority: Major > Fix For: 2.3 > > > When browsing the page https://www.lokalkompass.de/bilder/kirche.html I came > upon the following exception: > > {noformat} > org.eclipse.rdf4j.rio.RDFParseException: org.xml.sax.SAXParseException; > lineNumber: 235; columnNumber: 511; Attribute "dort..." was already specified > for element "a". > at > org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:111) > at > org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:95) > at > org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:178) > ... 34 more > Caused by: org.semarglproject.rdf.ParseException: > org.xml.sax.SAXParseException; lineNumber: 235; columnNumber: 511; Attribute > "dort..." was already specified for element "a". > at > org.semarglproject.rdf.rdfa.RdfaParser.processException(RdfaParser.java:1141) > at org.semarglproject.source.XmlSource.process(XmlSource.java:50) > at > org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87) > at > org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167) > at > org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154) > at > org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:109) > ... 36 more > Caused by: org.xml.sax.SAXParseException; lineNumber: 235; columnNumber: 511; > Attribute "dort..." 
was already specified for element "a". > at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) > at org.semarglproject.source.XmlSource.process(XmlSource.java:48) > ... 40 more > {noformat}
[GitHub] any23 pull request #110: ANY23-380 disallow duplicate attribute keys
Github user asfgit closed the pull request at: https://github.com/apache/any23/pull/110 ---
[jira] [Commented] (ANY23-380) RDFa SAXParseException: attribute was already specified
[ https://issues.apache.org/jira/browse/ANY23-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567307#comment-16567307 ] ASF GitHub Bot commented on ANY23-380: -- GitHub user HansBrende opened a pull request: https://github.com/apache/any23/pull/110 ANY23-380 disallow duplicate attribute keys I disallowed duplicate attribute keys in html to avoid `org.xml.sax.SAXParseException`s. Along the way, I also cleaned up some annoying or unnecessary logging/console output produced by our massive suite of test cases. Also cleaned up some javadoc/miscellaneous items. mvn clean test -> all tests passed You can merge this pull request into a Git repository by running: $ git pull https://github.com/HansBrende/any23 ANY23-380 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/any23/pull/110.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #110 commit 4e3011a4d80545f04563f427687f4fa74e17103f Author: Hans Date: 2018-08-01T21:06:55Z ANY23-380 disallow duplicate attribute keys commit 159aeb489473f600213142a746d39a49e3d3548b Author: Hans Date: 2018-08-02T17:46:44Z cleaned up annoying logging/console output commit 0291f588d04859053ef4eb8845686bad824b4461 Author: Hans Date: 2018-08-02T18:01:19Z added license and javadoc > RDFa SAXParseException: attribute was already specified > --- > > Key: ANY23-380 > URL: https://issues.apache.org/jira/browse/ANY23-380 > Project: Apache Any23 > Issue Type: Bug > Components: extractors >Affects Versions: 2.3 >Reporter: Hans Brende >Priority: Major > Fix For: 2.3 > > > When browsing the page https://www.lokalkompass.de/bilder/kirche.html I came > upon the following exception: > > {noformat} > org.eclipse.rdf4j.rio.RDFParseException: org.xml.sax.SAXParseException; > lineNumber: 235; columnNumber: 511; Attribute "dort..." was already specified > for element "a". 
> at > org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:111) > at > org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:95) > at > org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:178) > ... 34 more > Caused by: org.semarglproject.rdf.ParseException: > org.xml.sax.SAXParseException; lineNumber: 235; columnNumber: 511; Attribute > "dort..." was already specified for element "a". > at > org.semarglproject.rdf.rdfa.RdfaParser.processException(RdfaParser.java:1141) > at org.semarglproject.source.XmlSource.process(XmlSource.java:50) > at > org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87) > at > org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167) > at > org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154) > at > org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:109) > ... 36 more > Caused by: org.xml.sax.SAXParseException; lineNumber: 235; columnNumber: 511; > Attribute "dort..." was already specified for element "a". > at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source) > at org.semarglproject.source.XmlSource.process(XmlSource.java:48) > ... 40 more > {noformat}
[GitHub] any23 pull request #110: ANY23-380 disallow duplicate attribute keys
GitHub user HansBrende opened a pull request: https://github.com/apache/any23/pull/110 ANY23-380 disallow duplicate attribute keys I disallowed duplicate attribute keys in html to avoid `org.xml.sax.SAXParseException`s. Along the way, I also cleaned up some annoying or unnecessary logging/console output produced by our massive suite of test cases. Also cleaned up some javadoc/miscellaneous items. mvn clean test -> all tests passed You can merge this pull request into a Git repository by running: $ git pull https://github.com/HansBrende/any23 ANY23-380 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/any23/pull/110.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #110 commit 4e3011a4d80545f04563f427687f4fa74e17103f Author: Hans Date: 2018-08-01T21:06:55Z ANY23-380 disallow duplicate attribute keys commit 159aeb489473f600213142a746d39a49e3d3548b Author: Hans Date: 2018-08-02T17:46:44Z cleaned up annoying logging/console output commit 0291f588d04859053ef4eb8845686bad824b4461 Author: Hans Date: 2018-08-02T18:01:19Z added license and javadoc ---
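The pull request text above describes disallowing duplicate attribute keys so that the SAX parser never sees the same attribute twice on one element. A minimal sketch of that idea follows, assuming the first occurrence of a name should win (which is how HTML parsers treat repeated attributes). The class name `DuplicateAttributeFilter`, the name/value pair representation, and the lower-casing of names are illustrative assumptions, not the code in the pull request.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public final class DuplicateAttributeFilter {

    private DuplicateAttributeFilter() {}

    /**
     * Collapses a list of {name, value} pairs into a map with one entry
     * per attribute name. Names are compared case-insensitively, the
     * first occurrence wins, and document order is preserved.
     * Values are assumed to be non-null.
     */
    public static Map<String, String> dedupe(List<String[]> attributes) {
        Map<String, String> unique = new LinkedHashMap<>();
        for (String[] attr : attributes) {
            String name = attr[0].toLowerCase(Locale.ROOT);
            // putIfAbsent keeps the earlier value when the name repeats.
            unique.putIfAbsent(name, attr[1]);
        }
        return unique;
    }
}
```

For an element written as `<a href="/a" class="x" HREF="/b">`, `dedupe` would yield `href=/a` and `class=x`, so a downstream XML parser receives each attribute name only once.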