[GitHub] any23 pull request #111: Any23 295 librdfa module

2018-08-02 Thread JulioCCBUcuenca
Github user JulioCCBUcuenca closed the pull request at:

https://github.com/apache/any23/pull/111


---


[GitHub] any23 issue #111: Any23 295 librdfa module

2018-08-02 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/111
  
Which of the PRs are you attempting to use?
I think this PR is not suitable for merging. Please close it and 
resubmit a clean PR. Thank you, @JulioCCBUcuenca.


---


[jira] [Commented] (ANY23-381) JsonParseException: Illegal unquoted character

2018-08-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567568#comment-16567568
 ] 

Hudson commented on ANY23-381:
--

SUCCESS: Integrated in Jenkins build Any23-trunk #1609 (See [https://builds.apache.org/job/Any23-trunk/1609/])
ANY23-381 fix illegal unescaped characters in JSON-LD (hans: rev 817e744af90d8f3c9bf419e5c395c421e0c3924a)
* (edit) core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java
* (edit) core/src/test/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractorTest.java
* (add) test-resources/src/test/resources/html/html-jsonld-unescaped-characters.html


> JsonParseException: Illegal unquoted character
> --
>
> Key: ANY23-381
> URL: https://issues.apache.org/jira/browse/ANY23-381
> Project: Apache Any23
> Issue Type: Bug
> Components: extractors
> Affects Versions: 2.3
> Reporter: Hans Brende
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
>
> While perusing the site http://losangeles.eventful.com/events I stumbled 
> across the following exception:
> {noformat}
> org.eclipse.rdf4j.rio.RDFParseException: Could not parse JSONLD
>   at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:77)
>   at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:196)
>   ... 36 more
> Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value
>  at [Source: (BufferedReader); line: 1, column: 147]
>   at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
>   at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663)
>   at com.fasterxml.jackson.core.base.ParserMinimalBase._throwUnquotedSpace(ParserMinimalBase.java:627)
>   at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2045)
>   at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2016)
>   at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:278)
>   at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:672)
>   at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:527)
>   at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:364)
>   at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:29)
>   at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:3972)
>   at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2264)
>   at com.fasterxml.jackson.core.JsonParser.readValueAs(JsonParser.java:1729)
>   at com.github.jsonldjava.utils.JsonUtils.fromJsonParser(JsonUtils.java:196)
>   at com.github.jsonldjava.utils.JsonUtils.fromReader(JsonUtils.java:173)
>   at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:154)
>   at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:111)
>   at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:71)
>   ... 37 more
> {noformat}
> caused by the {{description}} field in the following json spanning multiple 
> unescaped newlines: 
> {noformat}
>   {
> "@context": "http://schema.org",
> "@type": "Event",
> "name": "#1 Magic Show in L.A.",
> "description": "#1 MAGIC SHOW IN L.A.
> The current WINNER of the CW’s Penn & Teller’s FOOL US, Illusionist 
> extraordinaire Ivan Amodei is on a national tour with his show INTIMATE 
> ILLUSIONS. 
> Currently, on an ei...",
> "startDate": "Saturday, August 11, 2018  4:00 PM",
> "image": 
> "//d1marr3m5x4iac.cloudfront.net/images/perspectivecrop373by249/I0-001/040/358/185-9.png_/1-magic-show-la-85.png",
> "location": {
>   "@type": "Place",
>   "name": "Beverly Wilshire Hotel",
>   "url": 
> "//losangeles.eventful.com/venues/beverly-wilshire-hotel-/V0-001-003541383-4",
>   "address": {
> "streetAddress": "9500 Wilshire Boulevard",
> "addressLocality": "Beverly Hills",
> "addressRegion": "California",
> "postalCode": "90212"
>   }
> },
> "offers": {
>   "@type": "Offer",
>   "url": 
> "//losangeles.eventful.com/events/1-magic-show-la-/E0-001-114704991-1/tickets",
>   "availability": "http://schema.org/InStock"
> },
> "performer": [{
>   "@type": "Person",
>   "name": "Ivan Amodei"
>   

[jira] [Assigned] (ANY23-237) Fix RDFa test 0087: stylesheet reserved word is stripped out

2018-08-02 Thread Hans Brende (JIRA)


 [ 
https://issues.apache.org/jira/browse/ANY23-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende reassigned ANY23-237:
-

Assignee: Hans Brende

> Fix RDFa test 0087: stylesheet reserved word is stripped out
> 
>
> Key: ANY23-237
> URL: https://issues.apache.org/jira/browse/ANY23-237
> Project: Apache Any23
> Issue Type: Bug
> Reporter: stephane corlosquet
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
>
> We have pretty much 100% green results on the official RDFa test suite at 
> http://rdfa.info/test-suite/. There is only one fail remaining: test 0087.
> For some reason, any23 isn't able to extract a triple out of this markup:
> {code}
>   href="http://example.org/stylesheet">stylesheet
> {code}
> when it can extract the right triple for all the other elements in the test 
> such as 
> {code}
>   href="http://example.org/alternate">alternate
> {code}
> I'm going to need some help to figure this out, as I have no idea what part 
> of any23 is causing this. I checked the same test on semargl (our RDFa 
> parser) and it is passing no problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ANY23-169) Incorrect interpretation of relative and absolute paths with Microdata

2018-08-02 Thread Hans Brende (JIRA)


 [ 
https://issues.apache.org/jira/browse/ANY23-169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende reassigned ANY23-169:
-

Assignee: Hans Brende

> Incorrect interpretation of relative and absolute paths with Microdata
> --
>
> Key: ANY23-169
> URL: https://issues.apache.org/jira/browse/ANY23-169
> Project: Apache Any23
> Issue Type: Bug
> Components: microdata
> Reporter: Ruben Verborgh
> Assignee: Hans Brende
> Priority: Major
> Labels: microdata, url, urls
> Fix For: 2.3
>
>
> Parsing the following fragment located at 
> http://ruben.verborgh.org/tmp/slash-test.html
> Homepage
> Other
> results in the URIs
> http://ruben.verborgh.org/tmp/slash-test.html//
> http://ruben.verborgh.org/tmp/slash-test.html/other.html
> instead of the correct
> http://ruben.verborgh.org/tmp/
> http://ruben.verborgh.org/tmp/other.html
> Note that there is no trailing slash in the original.
> Test case:
> http://ruben.verborgh.org/tmp/slash-test.html
> http://any23.org/any23/?format=best&uri=http%3A%2F%2Fruben.verborgh.org%2Ftmp%2Fslash-test.html&validation-mode=none
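The expected URIs in this report match standard reference resolution against the document URL. The following is a minimal sketch using only `java.net.URI`, not Any23's own code. The link markup was lost from the report above, so the `href` values `./` and `other.html` are assumptions, as is the class name `RelativeUrlDemo`.

```java
import java.net.URI;

final class RelativeUrlDemo {

    /** Resolves a (possibly relative) href against the URL of the containing document. */
    static String resolve(String documentUrl, String href) {
        return URI.create(documentUrl).resolve(href).toString();
    }

    /** Naive concatenation, which treats the document URL as if it were a directory. */
    static String naiveJoin(String documentUrl, String href) {
        return documentUrl + "/" + href;
    }

    public static void main(String[] args) {
        String base = "http://ruben.verborgh.org/tmp/slash-test.html";
        // The last path segment (slash-test.html) is dropped before merging.
        System.out.println(resolve(base, "./"));
        System.out.println(resolve(base, "other.html"));
        // Appending to the full document URL gives the kind of URI reported in this issue.
        System.out.println(naiveJoin(base, "other.html"));
    }
}
```

The difference between the two helpers is only whether the final path segment of the document URL is removed before the relative reference is merged.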





[jira] [Assigned] (ANY23-331) Tool service implementations declared in wrong module?

2018-08-02 Thread Hans Brende (JIRA)


 [ 
https://issues.apache.org/jira/browse/ANY23-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende reassigned ANY23-331:
-

Assignee: Hans Brende

> Tool service implementations declared in wrong module?
> --
>
> Key: ANY23-331
> URL: https://issues.apache.org/jira/browse/ANY23-331
> Project: Apache Any23
> Issue Type: Bug
> Components: CLI, core
> Affects Versions: 2.1
> Reporter: Hans Brende
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
>
> It appears that all the {{org.apache.any23.cli.Tool}}-implementing classes 
> are declared in the *core*/src/main/resources/META-INF/services directory 
> instead of the *cli*/src/main/resources/META-INF/services directory (even 
> though all of the referenced Tool implementations occur in the *cli* module, 
> *not* the core module). I'm not sure what effect (if any) this would have on 
> service loading, but this feels wrong. Comments anyone?
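For context on the service-loading question: `java.util.ServiceLoader` finds provider-configuration files named `META-INF/services/<service type name>` through the class loader, so a file packaged in one jar is visible whenever that jar is on the classpath, and the classes it lists are only instantiated as the loader is iterated. The sketch below illustrates the lookup with a hypothetical `DemoTool` interface; it is not the real `org.apache.any23.cli.Tool`, and no provider file is assumed to exist for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical stand-in for a tool interface; not the real org.apache.any23.cli.Tool. */
interface DemoTool {
    String name();
}

final class ServiceLookupDemo {

    /** Collects the names of all DemoTool providers visible to the current class loader. */
    static List<String> discoverToolNames() {
        List<String> names = new ArrayList<>();
        // Providers are instantiated lazily here; a listed class that cannot be
        // loaded surfaces as a ServiceConfigurationError during iteration.
        for (DemoTool tool : ServiceLoader.load(DemoTool.class)) {
            names.add(tool.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // With no META-INF/services/DemoTool file on the classpath, nothing is found.
        System.out.println(discoverToolNames());
    }
}
```

Under these assumptions, a provider file shipped in the core jar would still be read when both jars are present, but it would name classes that are missing whenever core is used without cli.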





[jira] [Updated] (ANY23-378) JsonParseException caused by trailing commas in JSON-LD

2018-08-02 Thread Hans Brende (JIRA)


 [ 
https://issues.apache.org/jira/browse/ANY23-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende updated ANY23-378:
--
Summary: JsonParseException caused by trailing commas in JSON-LD  (was: 
JsonParseException)

> JsonParseException caused by trailing commas in JSON-LD
> ---
>
> Key: ANY23-378
> URL: https://issues.apache.org/jira/browse/ANY23-378
> Project: Apache Any23
> Issue Type: Bug
> Components: extractors
> Affects Versions: 2.3
> Reporter: Hans Brende
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
>
> On the page http://golfavisen.dk/golfavisen-award-2018/ I'm getting a 
> JsonParseException in the EmbeddedJSONLDExtractor:
> {noformat}
> org.apache.any23.extractor.ExtractionException: Error while parsing RDF document.
>   at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:175)
>   at org.apache.any23.extractor.html.EmbeddedJSONLDExtractor.extractJSONLDScript(EmbeddedJSONLDExtractor.java:149)
>   at org.apache.any23.extractor.html.EmbeddedJSONLDExtractor.run(EmbeddedJSONLDExtractor.java:83)
>   at org.apache.any23.extractor.html.EmbeddedJSONLDExtractor.run(EmbeddedJSONLDExtractor.java:54)
>   at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:480)
>   at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:259)
>   at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:323)
>   at org.apache.any23.extractor.html.AbstractExtractorTestCase.extract(AbstractExtractorTestCase.java:189)
>   at org.apache.any23.extractor.html.AbstractExtractorTestCase.assertExtract(AbstractExtractorTestCase.java:204)
>   ... 28 more
> Caused by: org.eclipse.rdf4j.rio.RDFParseException: Could not parse JSONLD
>   at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:77)
>   at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:171)
>   ... 36 more
> Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('}' (code 125)): was expecting double-quote to start field name
>  at [Source: (BufferedReader); line: 9, column: 10]
>   at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
>   at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663)
>   at com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:561)
>   at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddName(ReaderBasedJsonParser.java:1757)
>   at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextFieldName(ReaderBasedJsonParser.java:907)
>   at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:512)
>   at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:364)
>   at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:29)
>   at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:3972)
>   at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2264)
>   at com.fasterxml.jackson.core.JsonParser.readValueAs(JsonParser.java:1729)
>   at com.github.jsonldjava.utils.JsonUtils.fromJsonParser(JsonUtils.java:196)
>   at com.github.jsonldjava.utils.JsonUtils.fromReader(JsonUtils.java:173)
>   at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:154)
>   at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:111)
>   at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:71)
>   ... 37 more
> {noformat}
> caused by the following json:
> {noformat}
> { "@context": "http://schema.org",
>   "@type": "Event",
>   "name": "PINNACLE BANK CHAMPIONSHIP",
>   "startDate": "2018-7-19T00-00-00-00",
>   "endDate": "2018-7-19T23-23-59-00",
>   "image":"http://golfavisen.dk/wp-content/uploads/2017/03/WEB.png",
>   "description":"PINNACLE BANK CHAMPIONSHIP",
> }
> {noformat}
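The JSON above is rejected because of the comma after the last member. One possible tolerant pre-processing step, sketched here with plain JDK code, drops a comma that is followed only by whitespace and a closing `}` or `]`, while leaving commas inside string literals untouched. The class and method names are hypothetical, and this is not necessarily how Any23 handles the issue.

```java
final class TrailingCommaStripper {

    /** Removes commas that directly precede a closing brace or bracket (ignoring whitespace). */
    static String stripTrailingCommas(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false; // inside a double-quoted string literal
        boolean escaped = false;  // previous character in the string was a backslash
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if (c == ',') {
                // Look ahead past whitespace to see what follows the comma.
                int j = i + 1;
                while (j < json.length() && Character.isWhitespace(json.charAt(j))) {
                    j++;
                }
                boolean trailing = j < json.length()
                        && (json.charAt(j) == '}' || json.charAt(j) == ']');
                if (!trailing) {
                    out.append(c);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String broken = "{ \"description\":\"PINNACLE BANK CHAMPIONSHIP\",\n}";
        System.out.println(stripTrailingCommas(broken));
    }
}
```

Tracking the string state is the important part of the design: a plain regular-expression replacement could also rewrite text such as `",}"` that appears inside a value.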





[jira] [Commented] (ANY23-381) JsonParseException: Illegal unquoted character

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567523#comment-16567523
 ] 

ASF GitHub Bot commented on ANY23-381:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/112


> JsonParseException: Illegal unquoted character
> --
>
> Key: ANY23-381
> URL: https://issues.apache.org/jira/browse/ANY23-381
> Project: Apache Any23
> Issue Type: Bug
> Components: extractors
> Affects Versions: 2.3
> Reporter: Hans Brende
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
>
> While perusing the site http://losangeles.eventful.com/events I stumbled 
> across the following exception:
> {noformat}
> org.eclipse.rdf4j.rio.RDFParseException: Could not parse JSONLD
>   at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:77)
>   at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:196)
>   ... 36 more
> Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value
>  at [Source: (BufferedReader); line: 1, column: 147]
>   at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
>   at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663)
>   at com.fasterxml.jackson.core.base.ParserMinimalBase._throwUnquotedSpace(ParserMinimalBase.java:627)
>   at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2045)
>   at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2016)
>   at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:278)
>   at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:672)
>   at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:527)
>   at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:364)
>   at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:29)
>   at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:3972)
>   at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2264)
>   at com.fasterxml.jackson.core.JsonParser.readValueAs(JsonParser.java:1729)
>   at com.github.jsonldjava.utils.JsonUtils.fromJsonParser(JsonUtils.java:196)
>   at com.github.jsonldjava.utils.JsonUtils.fromReader(JsonUtils.java:173)
>   at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:154)
>   at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:111)
>   at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:71)
>   ... 37 more
> {noformat}
> caused by the {{description}} field in the following json spanning multiple 
> unescaped newlines: 
> {noformat}
>   {
> "@context": "http://schema.org",
> "@type": "Event",
> "name": "#1 Magic Show in L.A.",
> "description": "#1 MAGIC SHOW IN L.A.
> The current WINNER of the CW’s Penn & Teller’s FOOL US, Illusionist 
> extraordinaire Ivan Amodei is on a national tour with his show INTIMATE 
> ILLUSIONS. 
> Currently, on an ei...",
> "startDate": "Saturday, August 11, 2018  4:00 PM",
> "image": 
> "//d1marr3m5x4iac.cloudfront.net/images/perspectivecrop373by249/I0-001/040/358/185-9.png_/1-magic-show-la-85.png",
> "location": {
>   "@type": "Place",
>   "name": "Beverly Wilshire Hotel",
>   "url": 
> "//losangeles.eventful.com/venues/beverly-wilshire-hotel-/V0-001-003541383-4",
>   "address": {
> "streetAddress": "9500 Wilshire Boulevard",
> "addressLocality": "Beverly Hills",
> "addressRegion": "California",
> "postalCode": "90212"
>   }
> },
> "offers": {
>   "@type": "Offer",
>   "url": 
> "//losangeles.eventful.com/events/1-magic-show-la-/E0-001-114704991-1/tickets",
>   "availability": "http://schema.org/InStock"
> },
> "performer": [{
>   "@type": "Person",
>   "name": "Ivan Amodei"
> }]
>   }
> {noformat}
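The parser error reported here ("CTRL-CHAR, code 10") refers to a raw line feed inside the `description` string, which JSON does not allow unescaped. The sketch below shows one way such input could be repaired before parsing, by escaping control characters only while inside a string literal. It uses plain JDK code with hypothetical names and is an illustration of the idea, not a copy of the change made in pull request #112.

```java
final class JsonControlCharEscaper {

    /** Escapes raw control characters (below U+0020) found inside JSON string literals. */
    static String escapeControlCharsInStrings(String json) {
        StringBuilder out = new StringBuilder(json.length() + 16);
        boolean inString = false; // between an opening and a closing double quote
        boolean escaped = false;  // previous character in the string was a backslash
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (!inString) {
                if (c == '"') {
                    inString = true;
                }
                out.append(c); // whitespace between tokens is left untouched
            } else if (escaped) {
                escaped = false;
                out.append(c);
            } else if (c == '\\') {
                escaped = true;
                out.append(c);
            } else if (c == '"') {
                inString = false;
                out.append(c);
            } else if (c < 0x20) {
                switch (c) {
                    case '\n': out.append("\\n"); break;
                    case '\r': out.append("\\r"); break;
                    case '\t': out.append("\\t"); break;
                    default: out.append(String.format("\\u%04x", (int) c));
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String broken = "{\"description\": \"#1 MAGIC SHOW IN L.A.\nThe current WINNER\"}";
        System.out.println(escapeControlCharsInStrings(broken));
    }
}
```

Line breaks outside string literals are ordinary JSON whitespace, so the sketch leaves them alone and only rewrites the ones a strict parser would reject.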





[GitHub] any23 pull request #112: ANY23-381 escape illegal characters in JSON-LD stri...

2018-08-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/112


---


[jira] [Resolved] (ANY23-381) JsonParseException: Illegal unquoted character

2018-08-02 Thread Hans Brende (JIRA)


 [ 
https://issues.apache.org/jira/browse/ANY23-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende resolved ANY23-381.
---
Resolution: Fixed



[jira] [Commented] (ANY23-381) JsonParseException: Illegal unquoted character

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567522#comment-16567522
 ] 

ASF GitHub Bot commented on ANY23-381:
--

GitHub user HansBrende opened a pull request:

https://github.com/apache/any23/pull/112

ANY23-381 escape illegal characters in JSON-LD strings

mvn clean test -> all tests passed

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HansBrende/any23 ANY23-381

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/112.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #112


commit 817e744af90d8f3c9bf419e5c395c421e0c3924a
Author: Hans 
Date:   2018-08-02T21:33:36Z

ANY23-381 fix illegal unescaped characters in JSON-LD





[GitHub] any23 pull request #112: ANY23-381 escape illegal characters in JSON-LD stri...

2018-08-02 Thread HansBrende
GitHub user HansBrende opened a pull request:

https://github.com/apache/any23/pull/112

ANY23-381 escape illegal characters in JSON-LD strings

mvn clean test -> all tests passed

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HansBrende/any23 ANY23-381

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/112.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #112


commit 817e744af90d8f3c9bf419e5c395c421e0c3924a
Author: Hans 
Date:   2018-08-02T21:33:36Z

ANY23-381 fix illegal unescaped characters in JSON-LD




---


[GitHub] any23 issue #111: Any23 295 librdfa module

2018-08-02 Thread JulioCCBUcuenca
Github user JulioCCBUcuenca commented on the issue:

https://github.com/apache/any23/pull/111
  
@lewismc Could you please create a separate branch for the implementation 
of librdfa-rdf4j?


---


[GitHub] any23 pull request #111: Any23 295 librdfa module

2018-08-02 Thread JulioCCBUcuenca
GitHub user JulioCCBUcuenca opened a pull request:

https://github.com/apache/any23/pull/111

Any23 295 librdfa module

# librdfa-rdf4j
Implementation of the librdfa bridge. It provides an RDF4J parser 
along with the bridge between librdfa (C) and RDF4J (Java). 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JulioCCBUcuenca/any23 ANY23-295_librdfa

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/111.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #111


commit a9f37b2293fb371eda431b4385e26cf99fbff365
Author: Julio Caguano 
Date:   2018-06-29T03:03:56Z

Add extractors for librdfa.

Signed-off-by: Julio Caguano

commit eec861c65f66bd9cbcb45bcabe7d9ca1834ae3d2
Author: Julio Caguano 
Date:   2018-06-29T03:05:29Z

Merge branch 'master' of https://github.com/apache/any23 into ANY23-295

commit ccf33eafca699dde016f0551f664edb584a6e5ba
Author: Julio Caguano 
Date:   2018-07-13T03:30:50Z

Merge branch 'master' of https://github.com/apache/any23 into ANY23-295

commit 64082692139f69df16d2985b6e9591d000e6457b
Author: Julio Caguano 
Date:   2018-07-16T15:18:09Z

add librdfa extractor

commit 2fe500f5283a96a4d2428fad30b0b671384d6a73
Author: Julio Caguano 
Date:   2018-07-16T15:19:04Z

Merge branch 'master' of https://github.com/apache/any23 into ANY23-295

commit 68f0d8078fb2adc3d11d9e8ebf83c6e7e58aa9b3
Author: Julio Caguano 
Date:   2018-07-19T03:14:56Z

ignore basic test

commit bd70dfc1abc4864fcd3857291d009e2a45d9b556
Author: Julio Caguano 
Date:   2018-07-26T03:40:57Z

Make libdrfa configurable.

commit a271a21760ca4caee0ec9359008cd446fe8b950a
Author: Julio Caguano 
Date:   2018-07-30T00:39:19Z

solve integration test. Librdfa is loaded with SPI.

commit 5dbc86c7601f3e9abd1aec08cb5e41716c2e7cab
Author: Julio Caguano 
Date:   2018-07-30T03:18:25Z

Add test suite

commit c4b5dccbbd004e480494f38f57238936d3e8942d
Author: Julio Caguano 
Date:   2018-07-30T03:18:52Z

add last version of librdfa-rdf4j

commit 85e0c7e13df92e457d8144c9cf66f67fe85bd9e5
Author: Julio Caguano 
Date:   2018-07-30T03:20:52Z

Add lang tag. lang tag is used to identify language in HTML pages, and 
xml:lang is used to identify in xml files.

commit b0a21ff13fa9e03c7734be54f1bdb2a8b85ad307
Author: Julio Caguano 
Date:   2018-08-02T19:31:24Z

Merge branch 'master' of https://github.com/apache/any23 into ANY23-295

commit d0e5f8319cf2f52c4d4ba3d7cfc9f789cb225847
Author: Julio Caguano 
Date:   2018-08-02T20:12:32Z

librdfa module, bridge.




---


[jira] [Commented] (ANY23-380) RDFa SAXParseException: attribute was already specified

2018-08-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567392#comment-16567392
 ] 

Hudson commented on ANY23-380:
--

SUCCESS: Integrated in Jenkins build Any23-trunk #1608 (See [https://builds.apache.org/job/Any23-trunk/1608/])
ANY23-380 disallow duplicate attribute keys (hans: rev 4e3011a4d80545f04563f427687f4fa74e17103f)
* (add) test-resources/src/test/resources/html/rdfa/attribute-already-specified.html
* (edit) core/src/test/java/org/apache/any23/extractor/rdfa/RDFa11ExtractorTest.java
* (edit) core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java


> RDFa SAXParseException: attribute was already specified
> ---
>
> Key: ANY23-380
> URL: https://issues.apache.org/jira/browse/ANY23-380
> Project: Apache Any23
> Issue Type: Bug
> Components: extractors
> Affects Versions: 2.3
> Reporter: Hans Brende
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
>
> When browsing the page https://www.lokalkompass.de/bilder/kirche.html I came 
> upon the following exception:
>  
> {noformat}
> org.eclipse.rdf4j.rio.RDFParseException: org.xml.sax.SAXParseException; lineNumber: 235; columnNumber: 511; Attribute "dort..." was already specified for element "a".
>   at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:111)
>   at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:95)
>   at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:178)
>   ... 34 more
> Caused by: org.semarglproject.rdf.ParseException: org.xml.sax.SAXParseException; lineNumber: 235; columnNumber: 511; Attribute "dort..." was already specified for element "a".
>   at org.semarglproject.rdf.rdfa.RdfaParser.processException(RdfaParser.java:1141)
>   at org.semarglproject.source.XmlSource.process(XmlSource.java:50)
>   at org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87)
>   at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167)
>   at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154)
>   at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:109)
>   ... 36 more
> Caused by: org.xml.sax.SAXParseException; lineNumber: 235; columnNumber: 511; Attribute "dort..." was already specified for element "a".
>   at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>   at org.semarglproject.source.XmlSource.process(XmlSource.java:48)
>   ... 40 more
> {noformat}
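
The root cause in the trace above is an XML well-formedness rule: an element may not carry the same attribute name twice, so a SAX parser reports a fatal error. A minimal sketch that reproduces this class of failure with the JDK's built-in SAX parser follows. The class name `DuplicateAttributeDemo` and the sample markup are made up for illustration and are not taken from Any23 or from the page named in the report.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Any23.
final class DuplicateAttributeDemo {

    // Returns true if the SAX parser rejects the markup with a SAXParseException.
    static boolean failsToParse(String xml) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    // DefaultHandler rethrows fatal errors as SAXParseException.
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return false;
        } catch (SAXParseException e) {
            return true;
        } catch (Exception e) {
            // Parser configuration or I/O problems are unrelated to this demo.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same attribute name twice on one element: not well-formed XML.
        System.out.println(failsToParse("<a href=\"x\" href=\"y\"/>"));
        // A single occurrence parses without error.
        System.out.println(failsToParse("<a href=\"x\"/>"));
    }
}
```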



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ANY23-381) JsonParseException: Illegal unquoted character

2018-08-02 Thread Hans Brende (JIRA)


 [ 
https://issues.apache.org/jira/browse/ANY23-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende updated ANY23-381:
--
Description: 
While perusing the site http://losangeles.eventful.com/events I stumbled across 
the following exception:

{noformat}
org.eclipse.rdf4j.rio.RDFParseException: Could not parse JSONLD
    at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:77)
    at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:196)
    ... 36 more
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value
 at [Source: (BufferedReader); line: 1, column: 147]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._throwUnquotedSpace(ParserMinimalBase.java:627)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2045)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2016)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:278)
    at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:672)
    at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:527)
    at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:364)
    at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:29)
    at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:3972)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2264)
    at com.fasterxml.jackson.core.JsonParser.readValueAs(JsonParser.java:1729)
    at com.github.jsonldjava.utils.JsonUtils.fromJsonParser(JsonUtils.java:196)
    at com.github.jsonldjava.utils.JsonUtils.fromReader(JsonUtils.java:173)
    at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:154)
    at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:111)
    at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:71)
    ... 37 more
{noformat}

This is caused by the {{description}} field in the following JSON, whose string 
value contains multiple unescaped newlines: 

{noformat}
  {
    "@context": "http://schema.org",
    "@type": "Event",
    "name": "#1 Magic Show in L.A.",
    "description": "#1 MAGIC SHOW IN L.A.
The current WINNER of the CW’s Penn & Teller’s FOOL US, Illusionist 
extraordinaire Ivan Amodei is on a national tour with his show INTIMATE 
ILLUSIONS. 

Currently, on an ei...",
    "startDate": "Saturday, August 11, 2018  4:00 PM",
    "image": "//d1marr3m5x4iac.cloudfront.net/images/perspectivecrop373by249/I0-001/040/358/185-9.png_/1-magic-show-la-85.png",
    "location": {
      "@type": "Place",
      "name": "Beverly Wilshire Hotel",
      "url": "//losangeles.eventful.com/venues/beverly-wilshire-hotel-/V0-001-003541383-4",
      "address": {
        "streetAddress": "9500 Wilshire Boulevard",
        "addressLocality": "Beverly Hills",
        "addressRegion": "California",
        "postalCode": "90212"
      }
    },
    "offers": {
      "@type": "Offer",
      "url": "//losangeles.eventful.com/events/1-magic-show-la-/E0-001-114704991-1/tickets",
      "availability": "http://schema.org/InStock"
    },
    "performer": [{
      "@type": "Person",
      "name": "Ivan Amodei"
    }]
  }
{noformat}
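
JSON does not allow raw control characters (code points below 0x20) inside a string literal, which is why Jackson rejects the newline (code 10) in the {{description}} value above. One possible workaround, sketched below under assumptions, is to pre-process the text and escape such characters only while inside a string literal. The class `JsonControlCharEscaper` and its method are hypothetical names for illustration; this thread does not show how the actual fix in BaseRDFExtractor works, so this should not be read as the Any23 implementation.

```java
import java.util.Locale;

// Hypothetical helper, not the Any23 implementation.
final class JsonControlCharEscaper {

    // Escapes raw control characters that occur inside JSON string literals.
    // Whitespace between tokens is left untouched.
    static String escapeControlCharsInStrings(String json) {
        StringBuilder out = new StringBuilder(json.length() + 16);
        boolean inString = false; // between an opening and a closing double quote
        boolean escaped = false;  // previous character was a backslash inside a string
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (!inString) {
                if (c == '"') {
                    inString = true;
                }
                out.append(c);
            } else if (escaped) {
                // Character following a backslash: copy as-is (simplification).
                out.append(c);
                escaped = false;
            } else if (c == '\\') {
                out.append(c);
                escaped = true;
            } else if (c == '"') {
                out.append(c);
                inString = false;
            } else if (c < 0x20) {
                switch (c) {
                    case '\n': out.append("\\n"); break;
                    case '\r': out.append("\\r"); break;
                    case '\t': out.append("\\t"); break;
                    default: out.append(String.format(Locale.ROOT, "\\u%04x", (int) c));
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A string value containing a raw newline, as in the report above.
        String broken = "{\"description\": \"#1 MAGIC SHOW IN L.A.\nThe current WINNER\"}";
        System.out.println(escapeControlCharsInStrings(broken));
    }
}
```

The sketch tracks only string boundaries and backslash escapes, so it does not validate the JSON and does not repair other kinds of malformed input.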


[jira] [Created] (ANY23-381) JsonParseException: Illegal unquoted character

2018-08-02 Thread Hans Brende (JIRA)
Hans Brende created ANY23-381:
-

 Summary: JsonParseException: Illegal unquoted character
 Key: ANY23-381
 URL: https://issues.apache.org/jira/browse/ANY23-381
 Project: Apache Any23
  Issue Type: Bug
  Components: extractors
Affects Versions: 2.3
Reporter: Hans Brende
Assignee: Hans Brende
 Fix For: 2.3


While perusing the site http://losangeles.eventful.com/events I stumbled across 
the following exception:

{noformat}
org.eclipse.rdf4j.rio.RDFParseException: Could not parse JSONLD
    at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:77)
    at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:196)
    ... 36 more
Caused by: com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value
 at [Source: (BufferedReader); line: 1, column: 147]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._throwUnquotedSpace(ParserMinimalBase.java:627)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2045)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2016)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:278)
    at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:672)
    at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:527)
    at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:364)
    at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:29)
    at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:3972)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2264)
    at com.fasterxml.jackson.core.JsonParser.readValueAs(JsonParser.java:1729)
    at com.github.jsonldjava.utils.JsonUtils.fromJsonParser(JsonUtils.java:196)
    at com.github.jsonldjava.utils.JsonUtils.fromReader(JsonUtils.java:173)
    at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:154)
    at com.github.jsonldjava.utils.JsonUtils.fromInputStream(JsonUtils.java:111)
    at org.eclipse.rdf4j.rio.jsonld.JSONLDParser.parse(JSONLDParser.java:71)
    ... 37 more
{noformat}





[jira] [Resolved] (ANY23-380) RDFa SAXParseException: attribute was already specified

2018-08-02 Thread Hans Brende (JIRA)


 [ 
https://issues.apache.org/jira/browse/ANY23-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende resolved ANY23-380.
---
Resolution: Fixed
  Assignee: Hans Brende



[jira] [Commented] (ANY23-380) RDFa SAXParseException: attribute was already specified

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567311#comment-16567311
 ] 

ASF GitHub Bot commented on ANY23-380:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/110




[GitHub] any23 pull request #110: ANY23-380 disallow duplicate attribute keys

2018-08-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/110


---


[jira] [Commented] (ANY23-380) RDFa SAXParseException: attribute was already specified

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ANY23-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567307#comment-16567307
 ] 

ASF GitHub Bot commented on ANY23-380:
--

GitHub user HansBrende opened a pull request:

https://github.com/apache/any23/pull/110

ANY23-380 disallow duplicate attribute keys

I disallowed duplicate attribute keys in html to avoid 
`org.xml.sax.SAXParseException`s.
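
To illustrate the idea of disallowing duplicate attribute keys, here is a minimal sketch that keeps only the first occurrence of each attribute name, which matches how HTML parsers treat repeated attributes. The class `DuplicateAttributeFilter` and the `String[]{name, value}` pair representation are assumptions made for this example; the code changed by this pull request is not shown in this thread and may work quite differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not the code from this pull request.
final class DuplicateAttributeFilter {

    // Each input element is a {name, value} pair in document order.
    // Returns the attributes with later duplicates of a name dropped.
    static Map<String, String> keepFirst(List<String[]> attributes) {
        Map<String, String> unique = new LinkedHashMap<>();
        for (String[] attr : attributes) {
            // HTML attribute names are ASCII case-insensitive.
            String name = attr[0].toLowerCase(Locale.ROOT);
            if (!unique.containsKey(name)) {
                unique.put(name, attr[1]);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        Map<String, String> result = keepFirst(Arrays.asList(
                new String[] {"href", "/a"},
                new String[] {"class", "x"},
                new String[] {"HREF", "/b"}));
        // The second "href" is dropped; insertion order is preserved.
        System.out.println(result);
    }
}
```

With unique attribute names, the markup handed to an XML-based RDFa parser no longer violates the well-formedness constraint behind the `SAXParseException`.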

Along the way, I also cleaned up some annoying or unnecessary 
logging/console output produced by our massive suite of test cases.

Also cleaned up some javadoc/miscellaneous items.

mvn clean test -> all tests passed

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HansBrende/any23 ANY23-380

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/110.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #110


commit 4e3011a4d80545f04563f427687f4fa74e17103f
Author: Hans 
Date:   2018-08-01T21:06:55Z

ANY23-380 disallow duplicate attribute keys

commit 159aeb489473f600213142a746d39a49e3d3548b
Author: Hans 
Date:   2018-08-02T17:46:44Z

cleaned up annoying logging/console output

commit 0291f588d04859053ef4eb8845686bad824b4461
Author: Hans 
Date:   2018-08-02T18:01:19Z

added license and javadoc






[GitHub] any23 pull request #110: ANY23-380 disallow duplicate attribute keys

2018-08-02 Thread HansBrende
GitHub user HansBrende opened a pull request:

https://github.com/apache/any23/pull/110

ANY23-380 disallow duplicate attribute keys

I disallowed duplicate attribute keys in html to avoid 
`org.xml.sax.SAXParseException`s.

Along the way, I also cleaned up some annoying or unnecessary 
logging/console output produced by our massive suite of test cases.

Also cleaned up some javadoc/miscellaneous items.

mvn clean test -> all tests passed

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HansBrende/any23 ANY23-380

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/110.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #110


commit 4e3011a4d80545f04563f427687f4fa74e17103f
Author: Hans 
Date:   2018-08-01T21:06:55Z

ANY23-380 disallow duplicate attribute keys

commit 159aeb489473f600213142a746d39a49e3d3548b
Author: Hans 
Date:   2018-08-02T17:46:44Z

cleaned up annoying logging/console output

commit 0291f588d04859053ef4eb8845686bad824b4461
Author: Hans 
Date:   2018-08-02T18:01:19Z

added license and javadoc




---