[ 
https://issues.apache.org/jira/browse/JENA-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183125#comment-14183125
 ] 

Andy Seaborne edited comment on JENA-806 at 10/24/14 5:51 PM:
--------------------------------------------------------------

{{\"}} escape sequences are not allowed in URIs; only {{\u}} and {{\U}} ({{UCHAR}}) escapes are.

{noformat}
[8]     IRIREF  ::=     '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
[10]    UCHAR   ::=     '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
{noformat}

{{ECHAR}} is only allowed in {{STRING_LITERAL_QUOTE}}.
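
As a rough illustration of the two grammar rules quoted above, the following Python sketch transcribes {{IRIREF}} and {{UCHAR}} into a regular expression and checks a token against it. This is not how Jena's tokenizer is implemented; the names {{UCHAR}}, {{IRIREF}} and {{is_valid_iriref}} are made up for this example, and the regex checks only the N-Triples token syntax, not whether the result is a well-formed IRI.

```python
import re

# [10] UCHAR ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
UCHAR = r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}"

# [8] IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
# A backslash is only accepted as the start of a UCHAR escape.
IRIREF = re.compile(r'<(?:[^\x00-\x20<>"{}|^`\\]|' + UCHAR + r")*>")


def is_valid_iriref(token: str) -> bool:
    """Return True if the whole token matches the IRIREF production (hypothetical helper)."""
    return IRIREF.fullmatch(token) is not None


# The object term from the reported line: it contains \" inside <...>.
reported = r'''<http://commons.wikimedia.org/wiki/File:\"Retrat_de_l'escriptor_Juan_Carlos_Onetti_(1909-1994)\".png>'''

print(is_valid_iriref(reported))                                   # False
print(is_valid_iriref("<http://www.wikidata.org/entity/Q16873>"))  # True
print(is_valid_iriref(r"<http://example.org/caf\u00E9>"))          # True
```

The reported term fails because the backslash before the quote does not begin a {{\u}} or {{\U}} escape, and a bare backslash is excluded by the character class.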

(The "" in the file name is somewhat bizarre!)


> illegal escape sequence value exception on legal characters
> -----------------------------------------------------------
>
>                 Key: JENA-806
>                 URL: https://issues.apache.org/jira/browse/JENA-806
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Cmd line tools
>    Affects Versions: Jena 2.12.1
>         Environment: Ubuntu 14.04, Java 8
>            Reporter: Nick Lothian
>
> When loading the Wikidata data dump using tdbloader2, I received the 
> following error:
> {{ERROR [line: 142128, col: 121] illegal escape sequence value: " (0x22)
> org.apache.jena.riot.RiotException: [line: 142128, col: 121] illegal escape sequence value: " (0x22)
>         at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:136)
>         at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:163)
>         at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:106)
>         at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:67)
>         at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:54)
>         at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:42)
>         at org.apache.jena.riot.RiotReader.parse(RiotReader.java:119)
>         at org.apache.jena.riot.RiotReader.parse(RiotReader.java:96)
>         at org.apache.jena.riot.RiotReader.parse(RiotReader.java:69)
>         at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.exec(CmdNodeTableBuilder.java:162)
>         at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
>         at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
>         at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
>         at com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.main(CmdNodeTableBuilder.java:80)
> }}
> Looking at that line:
> {{sed '142128!d' uncompressed/wikidata-simple-statements.nt}}
> {{<http://www.wikidata.org/entity/Q16873> <http://www.wikidata.org/entity/P18c> <http://commons.wikimedia.org/wiki/File:\"Retrat_de_l'escriptor_Juan_Carlos_Onetti_(1909-1994)\".png> .}}
> Column 121 is the "R" after the ". 
> Looking at http://www.w3.org/TR/n-triples/#n-triples-grammar, it appears that 
> the " character is allowed.
> Should tdbloader2 load this or am I missing something?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
