Hi Mark,

To answer your first question, you may want to take a look at the rewrite command in parquet-cli [1], which concatenates a set of Parquet files with the same schema into a larger one.

For converting nginx access logs to Parquet, I don't know of any existing solution. We do have a convert-csv command in the CLI, which could work if converting the nginx logs to CSV is easy.

Hope it helps.

[1] https://github.com/apache/parquet-mr/tree/master/parquet-cli

Best,
Gang

On Wed, Mar 6, 2024 at 4:16 AM Mark Lybarger <[email protected]> wrote:

> i'm fairly new to parquet format. my team uses this to submit data to be
> loaded to our enterprise data warehouse/data lake. i have two questions.
>
> can i generally concatenate many parquet formatted files together to make
> one larger file? i get millions of small xml data files from mobile
> devices and want to convert each to parquet via an aws lambda to an s3
> bucket. then i can sweep on a cadence and concatenate the files and submit
> to be loaded to the data lake. they don't like millions of submissions per
> day, or i would submit each individual file.
>
> secondly, i have several nginx access logs that i want to convert to
> parquet for loading to the same data lake. are there tools for easily
> converting these logs to parquet format?
>
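
P.S. On the nginx-log-to-CSV step: here is a rough sketch of what that conversion might look like, using only the Python standard library. It assumes your logs use nginx's default "combined" log format; if you have a custom log_format, the pattern will need adjusting. The function name nginx_to_csv and the column names are just my own picks for illustration, not anything provided by parquet-cli.

```python
import csv
import io
import re

# Assumption: nginx's default "combined" log_format:
#   $remote_addr - $remote_user [$time_local] "$request"
#   $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

# Column order for the CSV header (names chosen for illustration).
FIELDS = [
    "remote_addr", "remote_user", "time_local", "request",
    "status", "body_bytes_sent", "http_referer", "http_user_agent",
]


def nginx_to_csv(lines, out):
    """Write parsed log lines to `out` as CSV; return the number of skipped lines."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    skipped = 0
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            # Line does not follow the assumed format; count it and move on.
            skipped += 1
            continue
        writer.writerow(match.groupdict())
    return skipped
```

You would call it with an open log file and an output file opened with newline="", e.g. nginx_to_csv(open("access.log"), open("access.csv", "w", newline="")), and then hand the resulting CSV to the convert-csv command.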
