[
https://issues.apache.org/jira/browse/PARQUET-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787046#comment-17787046
]
Jiashen Zhang edited comment on PARQUET-2378 at 11/17/23 6:26 AM:
------------------------------------------------------------------
What if we print the content of the Parquet file directly? Here is a code
sample:
{code:java}
String input = "<parquet file>"; // placeholder: path to the Parquet file
ParquetReader<SimpleRecord> reader = null;
try {
  PrintWriter writer = new PrintWriter(Main.out, true);
  reader = ParquetReader.builder(new SimpleReadSupport(), new Path(input)).build();
  // Only needed for JSON output; unused in this pretty-print path.
  ParquetMetadata metadata = ParquetFileReader.readFooter(new Configuration(), new Path(input));
  JsonRecordFormatter.JsonGroupFormatter formatter =
      JsonRecordFormatter.fromSchema(metadata.getFileMetaData().getSchema());
  // Read records one at a time until the reader returns null.
  for (SimpleRecord value = reader.read(); value != null; value = reader.read()) {
    value.prettyPrint(writer);
    writer.println();
  }
} finally {
  if (reader != null) {
    try {
      reader.close();
    } catch (Exception ex) {
      // Ignore failures while closing the reader.
    }
  }
}
{code}
Output sample:
{code:java}
.......
id = 15013
category = general_qa
original-instruction = How does GIS help in the real estate investment industry?
original-context =
original-response = Real estate investors depend on precise, accurate location
intelligence for competitive insights about the markets and locations where
they do business. Real estate investment teams use GIS to bring together
location-specific data, mapping, and visualization technology. This enables
them to provide the latest insights about real estate markets and their
investments, now and in the future. Using thousands of global datasets,
investors can quickly understand how their real estate investments are
performing across town or around the world, quickly access precise local data
about real estate assets, on any device, anywhere, anytime, including
information on occupancy, building maintenance, property valuation, and
more.Real estate companies and investors use GIS to research markets, identify
new opportunities for growth and expansion, and manage their investments at the
market and neighborhood levels. They can also use GIS to create professional
digital and printed materials—such as 3D renderings and virtual
walk-throughs—to help market investments across platforms. Real estate
investors can use mobile data collection tools to gather property information
directly from the field and analyze and share insights across their
organizations in real time. Investors can leverage precise local knowledge
about their assets across geographies. GIS maps and dashboards help investors
see, in real-time, relevant data that can affect properties, and streamline
investment management with access to all relevant data about every asset in any
portfolio.
new-instruction:
.user_id:
..list:
.value:
..list:
...item = How does GIS help in the real estate investment industry?
.status:
..list:
...item = submitted
new-context:
.user_id:
..list:
.value:
..list:
...item =
.status:
..list:
...item = submitted
new-response:
.user_id:
..list:
.value:
..list:
...item = Real estate investors depend on precise, accurate location
intelligence for competitive insights about the markets and locations where
they do business. Real estate investment teams use GIS to bring together
location-specific data, mapping, and visualization technology. This enables
them to provide the latest insights about real estate markets and their
investments, now and in the future. Using thousands of global datasets,
investors can quickly understand how their real estate investments are
performing across town or around the world, quickly access precise local data
about real estate assets, on any device, anywhere, anytime, including
information on occupancy, building maintenance, property valuation, and
more.Real estate companies and investors use GIS to research markets, identify
new opportunities for growth and expansion, and manage their investments at the
market and neighborhood levels. They can also use GIS to create professional
digital and printed materials—such as 3D renderings and virtual
walk-throughs—to help market investments across platforms. Real estate
investors can use mobile data collection tools to gather property information
directly from the field and analyze and share insights across their
organizations in real time. Investors can leverage precise local knowledge
about their assets across geographies. GIS maps and dashboards help investors
see, in real-time, relevant data that can affect properties, and streamline
investment management with access to all relevant data about every asset in any
portfolio.
.status:
..list:
...item = submitted
id = 15014
category = general_qa
original-instruction = What is the Masters?
original-context =
original-response = The Masters Tournament is a golf tournament held annually
in the first week of April at Augusta National Golf Club in Augusta, Georgia.
The Masters is one of four Major golf tournaments and the only one to be played
at the same course every year. The course is renowned for its iconic holes,
impeccable groundskeeping, and colorful flowers that are typically in bloom.
The winner earns a coveted Green Jacket and a lifetime invitation back to
compete. Many players and fans consider The Masters to be their favorite
tournament given these traditions and the historical moments that have occurred
in past tournaments.
new-instruction:
.user_id:
..list:
.value:
..list:
...item = What is the Masters?
.status:
..list:
...item = submitted
new-context:
.user_id:
..list:
.value:
..list:
...item =
.status:
..list:
...item = submitted
new-response:
.user_id:
..list:
.value:
..list:
...item = The Masters Tournament is a golf tournament held annually in the
first week of April at Augusta National Golf Club in Augusta, Georgia. The
Masters is one of four Major golf tournaments and the only one to be played at
the same course every year. The course is renowned for its iconic holes,
impeccable groundskeeping, and colorful flowers that are typically in bloom.
The winner earns a coveted Green Jacket and a lifetime invitation back to
compete. Many players and fans consider The Masters to be their favorite
tournament given these traditions and the historical moments that have occurred
in past tournaments.
.status:
..list:
...item = submitted
{code}
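For context on why printing directly sidesteps the error: the stack trace in the issue description fails in Avro's name validation while the Parquet schema is converted to an Avro schema, and the Avro specification only allows names that start with a letter or underscore and continue with letters, digits, or underscores. The sketch below is only an illustration of that rule applied to some column names from the output sample; the class name AvroNameCheck and its helper are hypothetical and are not part of Parquet or Avro.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Parquet or Avro: it mirrors the naming
// rule from the Avro specification so the failing column is easy to spot.
public class AvroNameCheck {
    // Avro spec: a name starts with [A-Za-z_] and continues with [A-Za-z0-9_].
    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    public static boolean isValidAvroName(String name) {
        return AVRO_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Column names taken from the output sample.
        List<String> columns = List.of(
            "id", "category", "original-instruction", "original-context",
            "original-response", "new-instruction", "new-context", "new-response");
        for (String column : columns) {
            String verdict = isValidAvroName(column) ? "valid" : "invalid";
            System.out.println(column + " is " + verdict + " as an Avro name");
        }
    }
}
```

Under this rule the hyphenated columns such as original-instruction are rejected, which matches the "Illegal character in: original-instruction" message, while the code sample above does not go through the Avro conversion at all.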
was (Author: JIRAUSER280855):
What if we print the content of the Parquet file directly? Here is a code
sample:
{code:java}
String input = "<parquet file>"; // placeholder: path to the Parquet file
ParquetReader<SimpleRecord> reader = null;
try {
  PrintWriter writer = new PrintWriter(Main.out, true);
  reader = ParquetReader.builder(new SimpleReadSupport(), new Path(input)).build();
  ParquetMetadata metadata = ParquetFileReader.readFooter(new Configuration(), new Path(input));
  JsonRecordFormatter.JsonGroupFormatter formatter =
      JsonRecordFormatter.fromSchema(metadata.getFileMetaData().getSchema());
  // Read records one at a time until the reader returns null.
  for (SimpleRecord value = reader.read(); value != null; value = reader.read()) {
    // 'options' is the parsed command line from the surrounding command.
    if (options.hasOption('j')) {
      writer.write(formatter.formatRecord(value));
    } else {
      value.prettyPrint(writer);
    }
    writer.println();
  }
} finally {
  if (reader != null) {
    try {
      reader.close();
    } catch (Exception ex) {
      // Ignore failures while closing the reader.
    }
  }
}
{code}
> Problem with a cat
> ------------------
>
> Key: PARQUET-2378
> URL: https://issues.apache.org/jira/browse/PARQUET-2378
> Project: Parquet
> Issue Type: Bug
> Reporter: Rémy Léone
> Priority: Major
> Attachments: image-2023-11-16-21-40-07-628.png
>
>
> *$* parquet cat train-00000-of-00001-15a05aeec7726f9d.parquet
>
> Unknown error
> shaded.parquet.org.apache.avro.SchemaParseException: Illegal character in: original-instruction
> at shaded.parquet.org.apache.avro.Schema.validateName(Schema.java:1607)
> at shaded.parquet.org.apache.avro.Schema.access$400(Schema.java:92)
> at shaded.parquet.org.apache.avro.Schema$Field.<init>(Schema.java:556)
> at shaded.parquet.org.apache.avro.Schema$Field.<init>(Schema.java:595)
> at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:295)
> at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:279)
> at org.apache.parquet.cli.util.Schemas.fromParquet(Schemas.java:89)
> at org.apache.parquet.cli.BaseCommand.getAvroSchema(BaseCommand.java:405)
> at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:66)
> at org.apache.parquet.cli.Main.run(Main.java:163)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.parquet.cli.Main.main(Main.java:193)
> The data set in question is:
> [https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en/tree/main/data]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)