Theoretically yes, we would want to support this in Confluent's S3
connector. One of the stumbling blocks is that the Parquet writer code is
apparently still somewhat tied to HDFS, which causes problems when you're
not going through HDFS's S3 connectivity. See, e.g.,
https://github.com/confluentinc/kafka-connect-storage-cloud/issues/26 on
the cloud storage/S3 connector.
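
To make the coupling concrete, here's a rough sketch of what writing a
record with the standard parquet-avro API looks like. The AvroParquetWriter,
Path, and CompressionCodecName classes are the real parquet-avro/Hadoop ones;
the schema and the /tmp output path are just placeholders, not anything from
the connector.

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.fs.Path;  // the writer API is expressed in Hadoop types
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;
    import org.apache.parquet.hadoop.metadata.CompressionCodecName;

    public class ParquetWriteSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder Avro schema for illustration only.
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[" +
                "{\"name\":\"id\",\"type\":\"long\"}," +
                "{\"name\":\"payload\",\"type\":\"string\"}]}");

            // The builder takes a Hadoop Path, which is resolved through
            // Hadoop's FileSystem layer (file://, hdfs://, s3a://, ...) rather
            // than a plain OutputStream the connector already controls. That
            // is the HDFS coupling mentioned above.
            try (ParquetWriter<GenericRecord> writer =
                     AvroParquetWriter.<GenericRecord>builder(
                             new Path("/tmp/events.parquet"))  // placeholder path
                         .withSchema(schema)
                         .withCompressionCodec(CompressionCodecName.SNAPPY)
                         .build()) {
                GenericRecord record = new GenericData.Record(schema);
                record.put("id", 1L);
                record.put("payload", "hello");
                writer.write(record);
            }
        }
    }

Note that even compiling and running this pulls in the Hadoop client jars
for Path and the FileSystem resolution, which is essentially the dependency
problem the issue linked above describes.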

On Wed, May 31, 2017 at 11:17 AM, Colin McCabe <[email protected]> wrote:

> Hi Clayton,
>
> It seems like an interesting improvement.  Given that Parquet is
> columnar, you would expect some space savings.  I guess the big question
> is, would each batch of records become a single parquet file?  And how
> does this integrate with the existing logic, which might assume that
> each record can be serialized on its own?
>
> best,
> Colin
>
>
> On Sun, May 7, 2017, at 02:36, Clayton Wohl wrote:
> > With the Kafka Connect S3 sink, I can choose Avro or JSON output format.
> > Is there any chance that Parquet will be supported?
> >
> > For record-at-a-time processing, Parquet isn't a good fit. But for
> > reading/writing batches of records, which is what the Kafka Connect Sink
> > is writing, Parquet is generally better than Avro.
> >
> > Would it be worthwhile to attempt writing support for this?
>
