[ 
https://issues.apache.org/jira/browse/AVRO-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959537#comment-16959537
 ] 

Ryan Skraba edited comment on AVRO-2561 at 10/25/19 8:16 AM:
-------------------------------------------------------------

Thanks for the reports!

Placing the avro-tools-1.9.1.jar, twitter.avsc and twitter.json files in a 
single directory, I can reproduce the error with:
{code:bash}
docker run -it --rm --volume $(pwd):/home/docker-user:rw \
    --workdir /home/docker-user adoptopenjdk/openjdk13:jdk-13_33 \
    bash -c "java -jar avro-tools-1.9.1.jar fromjson --schema-file twitter.avsc twitter.json > twitter.avro"
{code}
 

And see the 2-character java.version causing the problem:
{code:bash}
docker run -it --rm adoptopenjdk/openjdk13:jdk-13_33 \
    java -XshowSettings:properties -version
{code}
 

For extra fun, the bug does *not* exist if you replace the image with 
*openjdk:13* or *azul/zulu-openjdk:13* (with a reported java.version of 
13.0.1).  The generated Avro file looks correct.  Using a JDK from another 
provider for avro-tools might be a quick workaround.

I imagine the Hadoop libraries are included so that Avro files can be read and written directly to a Hadoop FileSystem.  I can think of a couple of possible fixes:
 # Remove the ability to read/write from Hadoop file systems, and drop those libraries from the CLI tool.
 # Skip the Hadoop file system layer when reading/writing from a local file system.
 # Bump the included Hadoop version to one that works with the new Java versioning scheme.  Version 2.7.4, for example, doesn't have the suspicious line.

The last alternative would probably be the most viable for a 1.9.2 fix.  What 
do you think?
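For reference, a parse that tolerates both the legacy {{1.8.0_x}} style and the new {{13}} / {{13.0.1}} style looks roughly like this (a hypothetical sketch, not the actual Hadoop code, and ignoring suffixes like {{-ea}}):
{code:java}
public class JavaMajorVersion {

    /** Returns the major Java version: 8 for "1.8.0_222", 13 for "13" or "13.0.1". */
    static int majorVersion(String javaVersion) {
        // Legacy scheme: drop the "1." prefix, so "1.8.0_222" becomes "8.0_222".
        String v = javaVersion.startsWith("1.") ? javaVersion.substring(2) : javaVersion;
        // Keep only the leading number, whether or not anything follows the first dot.
        int dot = v.indexOf('.');
        return Integer.parseInt(dot == -1 ? v : v.substring(0, dot));
    }

    public static void main(String[] args) {
        System.out.println(majorVersion("1.8.0_222")); // 8
        System.out.println(majorVersion("13"));        // 13
        System.out.println(majorVersion("13.0.1"));    // 13
    }
}
{code}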



> java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
> -------------------------------------------------------------------
>
>                 Key: AVRO-2561
>                 URL: https://issues.apache.org/jira/browse/AVRO-2561
>             Project: Apache Avro
>          Issue Type: Bug
>            Reporter: Leo
>            Priority: Major
>
> I had the following exception when running {{avro-tools}}.
> These are the steps to reproduce:
> {noformat}
> brew install avro-tools
> avro-tools fromjson --schema-file twitter.avsc twitter.json
> {noformat}
> {noformat}
> Exception in thread "main" java.lang.ExceptionInInitializerError
>  at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
>  at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2823)
>  at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2818)
>  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2684)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:357)
>  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>  at org.apache.avro.tool.Util.openFromFS(Util.java:88)
>  at org.apache.avro.tool.Util.parseSchemaFromFS(Util.java:166)
>  at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:75)
>  at org.apache.avro.tool.Main.run(Main.java:66)
>  at org.apache.avro.tool.Main.main(Main.java:55)
> Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
>  at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3410)
>  at java.base/java.lang.String.substring(String.java:1883)
>  at org.apache.hadoop.util.Shell.<clinit>(Shell.java:52)
>  ... 13 more
> {noformat}
> *twitter.avsc*
> {code:json}
> {
>  "type" : "record",
>  "name" : "twitter_schema",
>  "namespace" : "com.miguno.avro",
>  "fields" : [ {
>  "name" : "username",
>  "type" : "string",
>  "doc" : "Name of the user account on Twitter.com"
>  }, {
>  "name" : "tweet",
>  "type" : "string",
>  "doc" : "The content of the user's Twitter message"
>  }, {
>  "name" : "timestamp",
>  "type" : "long",
>  "doc" : "Unix epoch time in seconds"
>  } ],
>  "doc:" : "A basic schema for storing Twitter messages"
> }
> {code}
> *twitter.json*
> {code:json}
> {"username":"miguno","tweet":"Rock: Nerf paper, scissors is fine.","timestamp": 1366150681 }
> {"username":"BlizzardCS","tweet":"Works as intended. Terran is IMBA.","timestamp": 1366154481 }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
