For Python Parquet support, ideally cross-language pipelines would solve
this so we only need to implement it once. If it turns out to be really
popular, implementing it more than once may be worthwhile.

Would the point of Arrow be to treat it as an IO connector similar to
ParquetIO or JdbcIO? (I was wondering what the purpose of the Arrow
integration is.)

Every C library makes it harder for users to test their pipelines locally
unless the library has been cross-compiled for several distributions.
Using C libraries increases the need for a container runtime such as
Docker for execution.


On Wed, May 30, 2018 at 1:56 PM Austin Bennett <whatwouldausti...@gmail.com>
wrote:

> I can see great use cases with s3/Parquet - so that's a great addition
> (which JB is addressing, for Java)!
>
> It would be even more ideal for the use cases I find myself around for
> there to be Python Parquet support, so perhaps for this next release:
> Would it make sense to be exploring: https://arrow.apache.org ?  I'd be
> happy to explore proper procedure for design/feature proposal and
> documentation for Beam, how to scope and develop it.
>
> Also, from the little I've looked at actual implementation, it appears
> that (py)arrow relies on underlying C binaries, which was listed as a
> problem or at least a point against choice of package with the developing
> python/kafka source.  How big an issue is that -- what else should I be
> considering?  Guidance absolutely welcomed!
>