[ 
https://issues.apache.org/jira/browse/AVRO-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790536#action_12790536
 ] 

Philip Zeyliger commented on AVRO-245:
--------------------------------------

bq. i think ant uses glob-like patterns, not regex. instead i just named the 
file AvroTestUtil.java.

Cool.  I intended [A-Z]* to be glob, not regex, and I thought it worked 
locally, but perhaps I was deluding myself.

bq. // FIXME: re-create encoder to avoid extra spaces (Jackson bug?)

I think this comment isn't quite enough to figure out what the fix-me is 
implying.  What's your TODO/FIXME convention?  (Perhaps that comment was 
intended for yourself, and intended to never be committed.)

I wandered into the Jackson code when I ran into this, and it's reasonably set 
up to write one JSON object, and we're writing many.  So I don't think they'd 
say it was a bug: I think they'd say we should use a different JsonGenerator 
for every bit of JSON we write.

{noformat}
      while (true) {
        try {
          datum = reader.read(null, decoder);
        } catch (AvroRuntimeException e) {            // FIXME: at EOF
{noformat}
It bugs me that this works.  The example it ought to fail on is the JSON data 
"1 2 3\n" (note: no newlines between records) against schema "int".  This 
ought to throw an error.

The core issue is that we've got two different things going on: we're both 
line-oriented and JSON-oriented.  We should check that the JSON on every line 
is well-formed, and the code fails to do so.  (My original code was broken 
too: when I wrote the test, it didn't throw an error for the malformed data; 
it just read one entry and went on.  Also, StringInputStream was from ant, 
which shouldn't even be on avroj's classpath.)
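
To make the per-line check concrete, here is a minimal sketch for the schema 
"int" case only.  StrictIntLines is a hypothetical helper, not something in 
the patch, and Integer.parseInt is a stand-in for a real JSON parse of the 
line; the point is just that a line holding more than one value is rejected 
instead of being silently truncated to its first entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the patch): reads line-oriented input
// for schema "int" and insists on exactly one value per line.
class StrictIntLines {
  static List<Integer> parse(String input) {
    List<Integer> values = new ArrayList<>();
    String[] lines = input.split("\n");
    for (int i = 0; i < lines.length; i++) {
      String line = lines[i].trim();
      if (line.isEmpty()) {
        continue; // tolerate blank lines, e.g. a trailing newline
      }
      try {
        // Assumption: parseInt stands in for parsing one JSON value and
        // then verifying that nothing else follows it on the line.
        values.add(Integer.parseInt(line));
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException(
            "line " + (i + 1) + " is not a single int: \"" + line + "\"", e);
      }
    }
    return values;
  }
}
```

With this, "1\n2\n3\n" yields three values, while "1 2 3\n" throws on line 1.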

One way to avoid this mess is to require that the input file be a JSON array.  
So "[1, 2, 3]" (with arbitrary whitespace).  I think this makes it harder to 
use line-oriented unix tools with this, but it does solve both problems.  What 
do you think?  

It also worries me every time JsonDecoder calls "in.nextToken();" without 
checking that the value it got was expected (typically "null" or possibly 
END_ARRAY or END_OBJECT).  It doesn't seem that using the ValidatingDecoder 
makes it check that, but I could be wrong.
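
The kind of check I have in mind looks roughly like the sketch below.  It 
does not use Jackson's types: Token is a stand-in enum and expectNext is a 
hypothetical helper, with null standing for end of input.  The idea is only 
that every advance of the token stream states what it expects and fails 
loudly otherwise.

```java
import java.util.Iterator;

// Illustrative only: Token and expectNext are made-up names, not JsonDecoder
// or Jackson API.
class TokenCheck {
  enum Token { START_ARRAY, END_ARRAY, START_OBJECT, END_OBJECT, VALUE_NUMBER_INT }

  // Advances the stream and throws unless the next token is the expected one.
  // Pass null as 'expected' to require end of input.
  static Token expectNext(Iterator<Token> in, Token expected) {
    Token actual = in.hasNext() ? in.next() : null; // null means end of input
    if (actual != expected) {
      throw new IllegalStateException(
          "expected " + expected + " but got " + actual);
    }
    return actual;
  }
}
```

A decoder written this way would report trailing or misplaced tokens instead 
of skipping over them.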



> Commandline utility for converting to and from Avro's binary format.
> --------------------------------------------------------------------
>
>                 Key: AVRO-245
>                 URL: https://issues.apache.org/jira/browse/AVRO-245
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Philip Zeyliger
>            Assignee: Philip Zeyliger
>            Priority: Minor
>         Attachments: AVRO-245.patch, AVRO-245.patch.txt, AVRO-245.patch.txt, 
> AVRO-245.patch.txt, AVRO-245.patch.txt
>
>
> A utility for avrotool that can convert between Avro binary data and the JSON 
> textual form.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.