[ https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787046#comment-17787046 ]

Jiashen Zhang edited comment on PARQUET-2378 at 11/17/23 6:28 AM:
------------------------------------------------------------------

What if we print the content directly, given a parquet file? Below is a code sample:
{code:java}
  // Placeholder: path to the parquet file to print.
  String input = "<parquet file>";

  PrintWriter writer = new PrintWriter(Main.out, true);
  // try-with-resources closes the reader even if reading fails.
  try (ParquetReader<SimpleRecord> reader =
           ParquetReader.builder(new SimpleReadSupport(), new Path(input)).build()) {
    // Read records one at a time and pretty-print each, followed by a blank line.
    for (SimpleRecord value = reader.read(); value != null; value = reader.read()) {
      value.prettyPrint(writer);
      writer.println();
    }
  }
{code}
Output sample:
{code:java}
.......   


id = 15012
category = open_qa
original-instruction = What is the difference between a road bike and a 
mountain bike?
original-context = 
original-response = Road bikes are built to be ridden on asphalt and cement 
surfaces and have thin tires, whereas mountain bikes are built to be ridden on 
dirt and have wider tires. Road bikes also have more aerodynamic handle bars 
while mountain bike handle bars a built for less responsive steering while 
bouncing around off the road.
new-instruction:
.user_id:
..list:
.value:
..list:
...item = What is the difference between a road bike and a mountain bike?
.status:
..list:
...item = submitted
new-context:
.user_id:
..list:
.value:
..list:
...item = 
.status:
..list:
...item = submitted
new-response:
.user_id:
..list:
.value:
..list:
...item = Road bikes are built to be ridden on asphalt and cement surfaces and 
have thin tires, whereas mountain bikes are built to be ridden on dirt and have 
wider tires. Road bikes also have more aerodynamic handle bars while mountain 
bike handle bars a built for less responsive steering while bouncing around off 
the road.
.status:
..list:
...item = submitted


id = 15013
category = general_qa
original-instruction = How does GIS help in the real estate investment industry?
original-context = 
original-response = Real estate investors depend on precise, accurate location 
intelligence for competitive insights about the markets and locations where 
they do business. Real estate investment teams use GIS to bring together 
location-specific data, mapping, and visualization technology. This enables 
them to provide the latest insights about real estate markets and their 
investments, now and in the future. Using thousands of global datasets, 
investors can quickly understand how their real estate investments are 
performing across town or around the world, quickly access precise local data 
about real estate assets, on any device, anywhere, anytime, including 
information on occupancy, building maintenance, property valuation, and 
more.Real estate companies and investors use GIS to research markets, identify 
new opportunities for growth and expansion, and manage their investments at the 
market and neighborhood levels. They can also use GIS to create professional 
digital and printed materials—such as 3D renderings and virtual 
walk-throughs—to help market investments across platforms. Real estate 
investors can use mobile data collection tools to gather property information 
directly from the field and analyze and share insights across their 
organizations in real time. Investors can leverage precise local knowledge 
about their assets across geographies. GIS maps and dashboards help investors 
see, in real-time, relevant data that can affect properties, and streamline 
investment management with access to all relevant data about every asset in any 
portfolio.
new-instruction:
.user_id:
..list:
.value:
..list:
...item = How does GIS help in the real estate investment industry?
.status:
..list:
...item = submitted
new-context:
.user_id:
..list:
.value:
..list:
...item = 
.status:
..list:
...item = submitted
new-response:
.user_id:
..list:
.value:
..list:
...item = Real estate investors depend on precise, accurate location 
intelligence for competitive insights about the markets and locations where 
they do business. Real estate investment teams use GIS to bring together 
location-specific data, mapping, and visualization technology. This enables 
them to provide the latest insights about real estate markets and their 
investments, now and in the future. Using thousands of global datasets, 
investors can quickly understand how their real estate investments are 
performing across town or around the world, quickly access precise local data 
about real estate assets, on any device, anywhere, anytime, including 
information on occupancy, building maintenance, property valuation, and 
more.Real estate companies and investors use GIS to research markets, identify 
new opportunities for growth and expansion, and manage their investments at the 
market and neighborhood levels. They can also use GIS to create professional 
digital and printed materials—such as 3D renderings and virtual 
walk-throughs—to help market investments across platforms. Real estate 
investors can use mobile data collection tools to gather property information 
directly from the field and analyze and share insights across their 
organizations in real time. Investors can leverage precise local knowledge 
about their assets across geographies. GIS maps and dashboards help investors 
see, in real-time, relevant data that can affect properties, and streamline 
investment management with access to all relevant data about every asset in any 
portfolio.
.status:
..list:
...item = submitted


id = 15014
category = general_qa
original-instruction = What is the Masters?
original-context = 
original-response = The Masters Tournament is a golf tournament held annually 
in the first week of April at Augusta National Golf Club in Augusta, Georgia.  
The Masters is one of four Major golf tournaments and the only one to be played 
at the same course every year.  The course is renowned for its iconic holes, 
impeccable groundskeeping, and colorful flowers that are typically in bloom.  
The winner earns a coveted Green Jacket and a lifetime invitation back to 
compete.  Many players and fans consider The Masters to be their favorite 
tournament given these traditions and the historical moments that have occurred 
in past tournaments.
new-instruction:
.user_id:
..list:
.value:
..list:
...item = What is the Masters?
.status:
..list:
...item = submitted
new-context:
.user_id:
..list:
.value:
..list:
...item = 
.status:
..list:
...item = submitted
new-response:
.user_id:
..list:
.value:
..list:
...item = The Masters Tournament is a golf tournament held annually in the 
first week of April at Augusta National Golf Club in Augusta, Georgia.  The 
Masters is one of four Major golf tournaments and the only one to be played at 
the same course every year.  The course is renowned for its iconic holes, 
impeccable groundskeeping, and colorful flowers that are typically in bloom.  
The winner earns a coveted Green Jacket and a lifetime invitation back to 
compete.  Many players and fans consider The Masters to be their favorite 
tournament given these traditions and the historical moments that have occurred 
in past tournaments.
.status:
..list:
...item = submitted{code}



> Problem with a cat
> ------------------
>
>                 Key: PARQUET-2378
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2378
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Rémy Léone
>            Priority: Major
>         Attachments: image-2023-11-16-21-40-07-628.png
>
>
> *$* parquet cat train-00000-of-00001-15a05aeec7726f9d.parquet
> Unknown error
> shaded.parquet.org.apache.avro.SchemaParseException: Illegal character in: original-instruction
>  at shaded.parquet.org.apache.avro.Schema.validateName(Schema.java:1607)
>  at shaded.parquet.org.apache.avro.Schema.access$400(Schema.java:92)
>  at shaded.parquet.org.apache.avro.Schema$Field.<init>(Schema.java:556)
>  at shaded.parquet.org.apache.avro.Schema$Field.<init>(Schema.java:595)
>  at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:295)
>  at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:279)
>  at org.apache.parquet.cli.util.Schemas.fromParquet(Schemas.java:89)
>  at org.apache.parquet.cli.BaseCommand.getAvroSchema(BaseCommand.java:405)
>  at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:66)
>  at org.apache.parquet.cli.Main.run(Main.java:163)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>  at org.apache.parquet.cli.Main.main(Main.java:193)
> the data set in question is: 
> [https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en/tree/main/data]
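
For reference, the exception in the trace above is consistent with the naming rule in the Avro specification: a name must start with a letter or underscore and may then contain only letters, digits, and underscores, so a column such as original-instruction is rejected because of the hyphen. The sketch below is only an illustration of that rule using a plain regex. The class AvroNameCheck and the method isValidAvroName are hypothetical names, and this is not the code in Schema.validateName.

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: a regex mirror of the Avro specification's name rule,
// not the actual Schema.validateName implementation.
class AvroNameCheck {
    // Avro spec: names start with [A-Za-z_] and continue with [A-Za-z0-9_].
    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Hypothetical helper: true if the name satisfies the spec's rule.
    static boolean isValidAvroName(String name) {
        return name != null && AVRO_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Column names taken from the output sample in this thread.
        List<String> columns = List.of("id", "category", "original-instruction", "new-response");
        for (String column : columns) {
            String verdict = isValidAvroName(column) ? "valid" : "invalid";
            System.out.println(column + " -> " + verdict + " Avro name");
        }
    }
}
```

Running this against the columns shown in the output sample flags every hyphenated name, which matches the field named in the error message.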



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
