Hi Mark,

To answer the first question, you may want to take a look at the rewrite
command in parquet-cli [1], which concatenates a set of Parquet files
with the same schema into a larger one.

For converting nginx access logs to Parquet, I don't know of any
existing solution. We do have a convert-csv command in the CLI, which
would help if converting the nginx logs to CSV is easy.
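
As a rough illustration of the nginx-log-to-CSV step, here is a minimal
Python sketch. It assumes the logs use nginx's default "combined" format;
the function name nginx_log_to_csv and the column names are made up for
this example, so adjust the pattern if your log_format differs.

```python
import csv
import re

# Column names follow the variables in nginx's default "combined" log_format.
FIELDS = [
    "remote_addr", "remote_user", "time_local", "request",
    "status", "body_bytes_sent", "http_referer", "http_user_agent",
]

# Assumption: lines look like
# 1.2.3.4 - user [06/Mar/2024:04:16:00 +0000] "GET / HTTP/1.1" 200 612 "-" "agent"
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def nginx_log_to_csv(lines, out):
    """Write matching log lines to `out` as CSV; return the number of rows written.

    `lines` is any iterable of strings (e.g. an open log file) and `out` is a
    writable text file object. Lines that do not match the pattern are skipped.
    """
    writer = csv.writer(out)
    writer.writerow(FIELDS)  # header row
    written = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # not in the assumed format; skip rather than guess
        writer.writerow([match.group(name) for name in FIELDS])
        written += 1
    return written
```

To use it, open the access log for reading and the output file for
writing with newline="" (as the csv module recommends), then call
nginx_log_to_csv(log_file, csv_file). The resulting CSV could then be
passed to the convert-csv command mentioned above.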

Hope it helps.

[1] https://github.com/apache/parquet-mr/tree/master/parquet-cli

Best,
Gang

On Wed, Mar 6, 2024 at 4:16 AM Mark Lybarger <[email protected]> wrote:

> i'm fairly new to parquet format.  my team uses this to submit data to be
> loaded to our enterprise data warehouse/data lake.  i have two questions.
>
> can i generally concatenate many parquet formatted files together to make
> one larger file?  i get millions of small xml data files from mobile
> devices and want to convert each to parquet via an aws lambda to an s3
> bucket.  then i can sweep on a cadence and concatenate the files and submit
> to be loaded to the data lake.  they don't like millions of submissions per
> day, or i would submit each individual file.
>
> secondly, i have several nginx access logs that i want to convert to
> parquet for loading to the same data lake.  are there tools for easily
> converting these logs to parquet format?
>
