Are you comparing read speed on a Hadoop cluster, or locally on a
single machine? In a micro-benchmark like this, using Hadoop local mode for
Parquet but not for Avro could introduce a lot of overhead. Just curious
how you're doing the comparison.

On Thu, May 7, 2015 at 1:06 PM, Robert Synnott <[email protected]> wrote:

> Hi,
> I just started trying out Parquet, and ran into a performance issue. I
> was using the Avro support to try working with a test schema, using
> the 'standalone' approach from here:
>
> http://blog.cloudera.com/blog/2014/05/how-to-convert-existing-data-into-parquet/
>
> I took an existing Avro schema, consisting of a few columns each
> containing a map, and wrote, then read back, about 40MB of data using
> both Avro's own serialisation and Parquet's. Parquet's ended up being
> about five times slower. This ratio was maintained when I moved to
> using ~1GB of data. I'd expect it to be a little slower, as I was reading
> back all columns, but five times seems high. Is there anything simple
> I might be missing?
> Thanks
> Rob
>



-- 
Alex Levenson
@THISWILLWORK
