[ 
https://issues.apache.org/jira/browse/AVRO-3560?focusedWorklogId=786974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-786974
 ]

ASF GitHub Bot logged work on AVRO-3560:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Jul/22 07:42
            Start Date: 01/Jul/22 07:42
    Worklog Time Spent: 10m 
      Work Description: KalleOlaviNiemitalo commented on PR #1748:
URL: https://github.com/apache/avro/pull/1748#issuecomment-1172042027

   Could you add tests for trailing content in a UTF-8 file parsed using 
Schema.parse(File file)? [JsonFactory.createParser(File 
f)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonFactory.java#L1025-L1031)
 apparently creates an InputStream for that, and 
[JsonFactory._createParser(InputStream in, IOContext 
ctxt)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonFactory.java#L1653-L1668)
 calls 
[ByteSourceJsonBootstrapper.constructParser](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/json/ByteSourceJsonBootstrapper.java#L251-L271),
 which creates a UTF8StreamJsonParser in that case. UTF8StreamJsonParser is a 
"byte-based" parser and inherits [JsonParser.releaseBuffered(Writer 
w)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonParser.java#L836),
 which just returns -1, but [UTF8StreamJsonParser.releaseBuffered(OutputStream 
out)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/json/UTF8StreamJsonParser.java#L224-L236)
 can write something to the OutputStream. I think this means that, to correctly 
detect trailing content in a UTF-8 file, Schema.parse would have to call not 
only JsonParser.releaseBuffered(Writer) but also 
JsonParser.releaseBuffered(OutputStream), or first call 
JsonParser.getInputSource() and then use the type of the result to guess 
whether the parser is byte-based or char-based.
   
   If parse(InputStream) parses a stream that has trailing content, and 
JsonParser buffers that content, it would be best to return that content to the 
InputStream so that the caller can then read it. However, I don't see how to do 
that.
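
   One possible direction, offered only as a minimal sketch: if the caller 
supplied a java.io.PushbackInputStream, bytes read past the end of the schema 
could be pushed back with unread(). The class PushbackSketch and its method 
readFirstObject below are hypothetical and merely stand in for a parser that 
over-reads; this is not how Schema.parse or Jackson behave today, and the 
sketch assumes the first '}' ends the schema and that one read fills the buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class PushbackSketch {
    /**
     * Hypothetical stand-in for an over-reading parser: pulls up to bufSize
     * bytes in one read, keeps everything up to and including the first '}',
     * and pushes the remaining bytes back onto the stream.
     */
    static String readFirstObject(PushbackInputStream in, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        int n = in.read(buf, 0, bufSize);
        if (n <= 0) {
            return "";
        }
        int end = n; // if no '}' is found, treat everything as consumed
        for (int i = 0; i < n; i++) {
            if (buf[i] == '}') {
                end = i + 1;
                break;
            }
        }
        // Return the unconsumed tail to the stream so the caller can read it.
        in.unread(buf, end, n - end);
        return new String(buf, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "{\"type\": \"string\"}; DROP TABLE STUDENTS".getBytes(StandardCharsets.UTF_8);
        // The pushback capacity must be at least as large as the read buffer.
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data), 64);
        System.out.println(readFirstObject(in, 64));
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

   This only helps when the caller already holds a PushbackInputStream with 
enough capacity; for an arbitrary InputStream the problem described above 
remains.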
   
   Is it possible that JsonParser buffers some content from a Reader, and the 
buffered content is all space characters and thus ignored by Avro.Schema, but 
the Reader has more content that JsonParser did not even read because its 
buffer filled up? In which case, Avro.Schema would have to check the Reader as 
well.
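
   A minimal sketch of such a combined check, assuming the parser's buffered 
leftovers have already been obtained as a String (for example via 
releaseBuffered(Writer) into a StringWriter). The class TrailingContentCheck 
and its methods are hypothetical names for illustration, not part of Avro or 
Jackson:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class TrailingContentCheck {
    /** JSON insignificant whitespace: space, tab, line feed, carriage return. */
    static boolean isJsonWhitespace(int c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    /**
     * Returns true if either the parser's buffered leftovers or the part of
     * the Reader that the parser never read contains a non-whitespace char.
     */
    static boolean hasTrailingContent(String bufferedLeftover, Reader rest) throws IOException {
        for (int i = 0; i < bufferedLeftover.length(); i++) {
            if (!isJsonWhitespace(bufferedLeftover.charAt(i))) {
                return true;
            }
        }
        // The buffer held only whitespace, so the Reader itself must be drained too.
        int c;
        while ((c = rest.read()) != -1) {
            if (!isJsonWhitespace(c)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Buffered part is all spaces, but the Reader still holds real content.
        System.out.println(hasTrailingContent("   ", new StringReader("; DROP TABLE STUDENTS")));
    }
}
```

   Draining the Reader consumes it, so a check like this would only fit a 
parse method that is documented to read its input to the end.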




Issue Time Tracking
-------------------

    Worklog Id:     (was: 786974)
    Time Spent: 50m  (was: 40m)

> avro ignores input after end of avsc json
> -----------------------------------------
>
>                 Key: AVRO-3560
>                 URL: https://issues.apache.org/jira/browse/AVRO-3560
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.11.0
>            Reporter: Radai Rosenblatt
>            Assignee: Radai Rosenblatt
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> try the following unit test:
> {code}
> @Test
> public void littleBobbySchemas() throws Exception {
>     Schema.Parser parser = new Schema.Parser();
>     parser.setValidate(true);
>     parser.setValidateDefaults(true);
>     Schema schema = parser.parse("{\"type\": \"string\"}; DROP TABLE STUDENTS");
>     Assert.assertNotNull(schema);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
