Jan Martin Keil created JENA-2188:
-------------------------------------

             Summary: Escape % in TokenizerText#fatal
                 Key: JENA-2188
                 URL: https://issues.apache.org/jira/browse/JENA-2188
             Project: Apache Jena
          Issue Type: Bug
          Components: RIOT
    Affects Versions: Jena 4.2.0
            Reporter: Jan Martin Keil


A "%" near a syntax error can cause _TokenizerText#fatal_ to throw an 
_UnknownFormatConversionException_ instead of a _RiotParseException_. 
This happens because input text ends up in the format string passed to 
_String#format_ without "%" being escaped. The following example reproduces 
the problem:
{code:java}
import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.ByteArrayInputStream;

import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParserBuilder;
import org.junit.jupiter.api.Test;

public class TokenizerTextTest {
  @Test
  public void fatal() {
    // Malformed N-Quads: the stray quote after "de-DE" is the syntax error,
    // and the text that follows it contains "%D8".
    String nquads = "<http://example.org/s> <http://example.org/p> "
        + "\"example\"@de-DE\" <http://example.org/%D8-graph>";
    RDFParserBuilder.create()
        .source(new ByteArrayInputStream(nquads.getBytes(UTF_8)))
        .lang(Lang.NQUADS)
        .parse(ModelFactory.createDefaultModel());
  }
}
{code}
This causes:
{code}
java.util.UnknownFormatConversionException: Conversion = 'D'
        at java.base/java.util.Formatter$FormatSpecifier.conversion(Formatter.java:2839)
        at java.base/java.util.Formatter$FormatSpecifier.<init>(Formatter.java:2865)
        at java.base/java.util.Formatter.parse(Formatter.java:2713)
        at java.base/java.util.Formatter.format(Formatter.java:2655)
        at java.base/java.util.Formatter.format(Formatter.java:2609)
        at java.base/java.lang.String.format(String.java:2897)
        at org.apache.jena.riot.tokens.TokenizerText.fatal(TokenizerText.java:1347)
        at org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:773)
        at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238)
        at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89)
        at org.apache.jena.atlas.iterator.PeekIterator.fill(PeekIterator.java:50)
        at org.apache.jena.atlas.iterator.PeekIterator.next(PeekIterator.java:92)
        at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:98)
        at org.apache.jena.riot.lang.LangNQuads.parseOne(LangNQuads.java:78)
        at org.apache.jena.riot.lang.LangNQuads.runParser(LangNQuads.java:53)
        at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
        at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:181)
        at org.apache.jena.riot.RDFParser.read(RDFParser.java:358)
        at org.apache.jena.riot.RDFParser.parseNotUri(RDFParser.java:348)
        at org.apache.jena.riot.RDFParser.parse(RDFParser.java:295)
        at org.apache.jena.riot.RDFParser.parse(RDFParser.java:241)
        at org.apache.jena.riot.RDFParser.parse(RDFParser.java:250)
        at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:574)
        at TokenizerTextTest.fatal(TokenizerTextTest.java:17)
{code}
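
The failure can be illustrated without Jena. The following is a minimal sketch, not the actual _TokenizerText_ code: the class name, method names, message layout and the line/column values are all hypothetical. It shows two possible ways to avoid the problem: doubling "%" before the text is concatenated into the format string, or keeping the format string constant and passing the text as a "%s" argument.

```java
import java.util.UnknownFormatConversionException;

// Hypothetical stand-in for an error reporter; not Jena's implementation.
class FormatEscapeSketch {

    // Doubles every '%' so that String.format treats it as a literal percent sign.
    static String escapePercent(String text) {
        return text.replace("%", "%%");
    }

    // Mimics the failing pattern: input text becomes part of the format string.
    static String unsafeMessage(String fragment, long line, long col) {
        return String.format("[line: %d, col: %d] Broken token: " + fragment, line, col);
    }

    // Option 1: escape the input text before it is concatenated into the format string.
    static String escapedMessage(String fragment, long line, long col) {
        return String.format("[line: %d, col: %d] Broken token: " + escapePercent(fragment), line, col);
    }

    // Option 2: keep the format string constant and pass the input text as an argument.
    static String argumentMessage(String fragment, long line, long col) {
        return String.format("[line: %d, col: %d] Broken token: %s", line, col, fragment);
    }

    public static void main(String[] args) {
        String fragment = "<http://example.org/%D8-graph>";
        try {
            unsafeMessage(fragment, 1, 48); // line and column values are made up
        } catch (UnknownFormatConversionException e) {
            System.out.println("unsafe variant failed: " + e);
        }
        System.out.println(escapedMessage(fragment, 1, 48));
        System.out.println(argumentMessage(fragment, 1, 48));
    }
}
```

Both options produce the same message text. The second one avoids having to remember the escaping step at every call site, at the cost of changing how callers build their messages.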



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
