[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787030#comment-17787030 ]

Jiashen Zhang edited comment on PARQUET-2378 at 11/17/23 5:42 AM:
------------------------------------------------------------------

!image-2023-11-16-21-40-07-628.png|width=831,height=202!

This error is expected: some field names in the schema contain *'-'* (for example, {{original-instruction}}), and a hyphen is not a legal character in an Avro name. The exception is thrown from *validateName* in 
[https://github.com/apache/avro/blob/branch-1.11/lang/java/avro/src/main/java/org/apache/avro/Schema.java].
Please double-check.
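For reference, here is a minimal sketch of the naming rule being enforced, assuming the rule from the Avro specification: a name starts with a letter or underscore and then contains only letters, digits, or underscores. This is a standalone re-implementation for illustration, not Avro's actual *validateName*; the Java implementation may treat non-ASCII letters more permissively, but a hyphen is rejected either way. The class name {{AvroNameCheck}} and the sample names other than {{original-instruction}} are hypothetical.

```java
import java.util.regex.Pattern;

/**
 * Illustrative sketch of the Avro specification's naming rule:
 * first character is an ASCII letter or underscore, the rest are
 * ASCII letters, digits, or underscores. Not Avro's own validateName.
 */
public class AvroNameCheck {
    // Whole-string match: [A-Za-z_] followed by zero or more [A-Za-z0-9_].
    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    public static boolean isValidAvroName(String name) {
        return name != null && AVRO_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // "original-instruction" is the name from the stack trace; the others are hypothetical.
        String[] names = {"instruction", "original-instruction", "_id", "2nd_field"};
        for (String name : names) {
            System.out.println(name + " -> " + (isValidAvroName(name) ? "valid" : "invalid"));
        }
    }
}
```

Running it prints {{original-instruction -> invalid}}, which is the same column that *validateName* rejects when the Parquet schema is converted to Avro.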


was (Author: JIRAUSER280855):
!image-2023-11-16-21-40-07-628.png|width=831,height=202!

This error is expected because some names using *'-'* in schema are invalid 
([validateName|https://github.com/apache/avro/blob/branch-1.11/lang/java/avro/src/main/java/org/apache/avro/Schema.java]), please double check?

> Problem with a cat
> ------------------
>
>                 Key: PARQUET-2378
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2378
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Rémy Léone
>            Priority: Major
>         Attachments: image-2023-11-16-21-40-07-628.png
>
>
> *$* parquet cat train-00000-of-00001-15a05aeec7726f9d.parquet
> Unknown error
> shaded.parquet.org.apache.avro.SchemaParseException: Illegal character in: original-instruction
>  at shaded.parquet.org.apache.avro.Schema.validateName(Schema.java:1607)
>  at shaded.parquet.org.apache.avro.Schema.access$400(Schema.java:92)
>  at shaded.parquet.org.apache.avro.Schema$Field.<init>(Schema.java:556)
>  at shaded.parquet.org.apache.avro.Schema$Field.<init>(Schema.java:595)
>  at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:295)
>  at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:279)
>  at org.apache.parquet.cli.util.Schemas.fromParquet(Schemas.java:89)
>  at org.apache.parquet.cli.BaseCommand.getAvroSchema(BaseCommand.java:405)
>  at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:66)
>  at org.apache.parquet.cli.Main.run(Main.java:163)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  at org.apache.parquet.cli.Main.main(Main.java:193)
>
> The data set in question is: 
> [https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en/tree/main/data]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)