Re: How does extending an existing parquet with columns affect impala/spark performance?

2018-04-03 Thread naresh Goud
From a Spark point of view it shouldn't have any effect. It's possible to add
columns to new Parquet files; it won't affect performance, and you won't need
to change the Spark application code.



On Tue, Apr 3, 2018 at 9:14 AM Vitaliy Pisarev wrote:

> This is not strictly a Spark question, but I'll give it a shot:
>
> I have an existing setup of Parquet files that are queried from Impala
> and from Spark.
>
> I intend to add some 30 relatively 'heavy' columns to the Parquet files. Each
> column would store an array of structs. Each struct can have from 5 to 20
> fields. An array may have a couple of thousand structs.
>
> Theoretically, since Parquet is a columnar storage format, extending it with
> columns should not affect the performance of *existing* queries (since they
> do not touch these columns).
>
>    - Is this premise correct?
>    - What should I watch out for when making this move?
>    - In general, what are the considerations when deciding on the "width"
>    (i.e. the number of columns) of a Parquet file?
>

--
Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/


How does extending an existing parquet with columns affect impala/spark performance?

2018-04-03 Thread Vitaliy Pisarev
This is not strictly a Spark question, but I'll give it a shot:

I have an existing setup of Parquet files that are queried from Impala
and from Spark.

I intend to add some 30 relatively 'heavy' columns to the Parquet files. Each
column would store an array of structs. Each struct can have from 5 to 20
fields. An array may have a couple of thousand structs.

Theoretically, since Parquet is a columnar storage format, extending it with
columns should not affect the performance of *existing* queries (since they
do not touch these columns).

   - Is this premise correct?
   - What should I watch out for when making this move?
   - In general, what are the considerations when deciding on the "width"
   (i.e. the number of columns) of a Parquet file?
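
To put rough numbers on the proposed change, here is a minimal Python sketch.
It assumes every struct field is a primitive type, in which case Parquet
stores each field as its own leaf column, and it takes "a couple of thousand"
to mean 2,000 structs per array. The function names are hypothetical and not
part of any library.

```python
# Back-of-the-envelope sizing for the proposed schema change.
# Assumption: every struct field is a primitive, so Parquet stores each
# field of array<struct<...>> as a separate leaf column.

NEW_COLUMNS = 30            # from the question
FIELDS_MIN, FIELDS_MAX = 5, 20  # fields per struct, from the question
STRUCTS_PER_ARRAY = 2000    # assumption for "a couple of thousand"


def leaf_columns(new_columns: int, fields_per_struct: int) -> int:
    """Number of leaf columns added to the Parquet schema."""
    return new_columns * fields_per_struct


def values_per_row(new_columns: int, fields_per_struct: int,
                   structs_per_array: int) -> int:
    """Number of primitive values one row adds when every array is full."""
    return leaf_columns(new_columns, fields_per_struct) * structs_per_array


if __name__ == "__main__":
    for fields in (FIELDS_MIN, FIELDS_MAX):
        print(
            f"{fields} fields/struct: "
            f"{leaf_columns(NEW_COLUMNS, fields)} leaf columns, "
            f"{values_per_row(NEW_COLUMNS, fields, STRUCTS_PER_ARRAY):,} "
            f"values per row"
        )
```

Under these assumptions the files would gain between 150 and 600 leaf columns.
Queries that do not select the new columns should not need to read their data
pages, but the footer metadata that readers load to plan a scan holds an entry
per leaf column per row group, so it grows with the width of the file.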