Rémy Léone created PARQUET-2377:
-----------------------------------

Summary: Issues when using the head on a parquet file
Key: PARQUET-2377
URL: https://issues.apache.org/jira/browse/PARQUET-2377
Project: Parquet
Issue Type: Bug
Reporter: Rémy Léone
```
$ parquet head train-00000-of-00001-15a05aeec7726f9d.parquet
Unknown error
shaded.parquet.org.apache.avro.SchemaParseException: Illegal character in: original-instruction
	at shaded.parquet.org.apache.avro.Schema.validateName(Schema.java:1607)
	at shaded.parquet.org.apache.avro.Schema.access$400(Schema.java:92)
	at shaded.parquet.org.apache.avro.Schema$Field.<init>(Schema.java:556)
	at shaded.parquet.org.apache.avro.Schema$Field.<init>(Schema.java:595)
	at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:295)
	at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:279)
	at org.apache.parquet.cli.util.Schemas.fromParquet(Schemas.java:89)
	at org.apache.parquet.cli.BaseCommand.getAvroSchema(BaseCommand.java:405)
	at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:66)
	at org.apache.parquet.cli.Main.run(Main.java:163)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.parquet.cli.Main.main(Main.java:193)
```

This is the dataset in question:
https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en/tree/main/data

--
This message was sent by Atlassian Jira (v8.20.10#820010)
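
Note on the trace above: the exception is raised in Avro's `Schema.validateName` while the Parquet schema is being converted to an Avro schema, and the name it rejects is `original-instruction`. The Avro specification requires names to start with `[A-Za-z_]` and to contain only `[A-Za-z0-9_]` after that, so the hyphen is the likely culprit. The snippet below is only a minimal sketch of that rule for checking column names. The helper `invalid_avro_names` and the second column name are hypothetical, and the snippet does not read the Parquet file or call any parquet-cli code.

```python
import re

# Naming rule from the Avro specification:
# first character [A-Za-z_], remaining characters [A-Za-z0-9_].
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def invalid_avro_names(column_names):
    """Return the column names that do not satisfy the Avro naming rule.

    Hypothetical helper for illustration; not part of parquet-cli or Avro.
    """
    return [name for name in column_names if not AVRO_NAME.fullmatch(name)]


# "original-instruction" is the name reported in the stack trace.
# "some_column" is made up for illustration and is not taken from the dataset.
columns = ["original-instruction", "some_column"]
print(invalid_avro_names(columns))
```

Under these assumptions, the hyphenated name is the only one flagged, which matches the "Illegal character in: original-instruction" message.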