Hi, I'm working with a medical data model that uses arrays of simple types to represent things like the drug exposures and conditions associated with a patient.
Using this model, each patient's data is co-located and is consequently processed more efficiently by Spark. The data is stored in Parquet format.

To improve processing time, we have experimented with adding support for simple arrays to the Parquet vectorized reader. This change gives us significant performance improvements, more than 4x faster for some operations.

I was wondering whether any enhancements like this have been considered, or whether this work is something that could be useful to the wider community.

Regards,
Mick Davies