Hi Manish,

Thanks for proposing configurable columns for the min/max cache. This will help
customers who have large data but use only a few columns in filter
conditions.
+1 for solution 1.


Regards,
Kanaka

On Fri, Jun 22, 2018 at 11:39 AM, manishgupta88 <tomanishgupt...@gmail.com>
wrote:

> Thanks Ravi for the feedback. I completely agree with you that we need to
> develop the second solution ASAP. Please find my responses to your
> queries below.
>
> 1. What if the query comes on non-cached columns? Will the driver start
> reading min/max from disk?
> - If the query is on a non-cached column, all the blocks will be selected
> and min/max pruning will be done in each executor. There will not be any
> read on the driver side, because the driver is a single process and the
> pruning time would increase if min/max values were read from disk for
> every query. So I feel it is better to read them in a distributed way
> using the executors.
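>
> For illustration only, the driver/executor split could be sketched roughly
> as below (the class and method names are placeholders, not CarbonData's
> actual code):
>
>   // Conceptual sketch, not CarbonData's real classes or API.
>   case class BlockIndex(path: String, min: Long, max: Long)
>
>   // Driver side: if the filter column's min/max is cached, prune here;
>   // otherwise hand every block to the executors, which read the index
>   // from disk in a distributed way and prune locally.
>   def driverPrune(filterColumn: String,
>                   filterValue: Long,
>                   cachedColumns: Set[String],
>                   blocks: Seq[BlockIndex]): Seq[BlockIndex] =
>     if (cachedColumns.contains(filterColumn))
>       blocks.filter(b => filterValue >= b.min && filterValue <= b.max)
>     else
>       blocks
>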
>
> 2. Are we planning to cache blocklet-level information or block-level
> information on the driver side for cached columns?
> - We will provide an option to the user to cache at Block or Blocklet
> level. It will be configurable at the table level and the default caching
> will be at Block level. I will cover this part in detail in the design
> document.
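>
> As a rough illustration, such a table-level setting might look like the
> following; the property names 'MINMAX_CACHE_COLUMNS' and
> 'MINMAX_CACHE_LEVEL' are placeholders only, and the actual names will be
> finalized in the design document:
>
>   // Hypothetical example; 'spark' is an existing CarbonData-enabled
>   // SparkSession, and the property names are placeholders.
>   spark.sql(
>     """CREATE TABLE sales (id INT, city STRING, amount DOUBLE)
>       |STORED BY 'carbondata'
>       |TBLPROPERTIES (
>       |  'MINMAX_CACHE_COLUMNS'='id,city',
>       |  'MINMAX_CACHE_LEVEL'='BLOCK')""".stripMargin)
>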
>
> 3. What is the impact if we automatically choose cached columns from the
> user query instead of letting the user configure them?
> - Every query can have different filter columns, so if we choose them
> automatically then for every new filter column the min/max values will be
> read from disk and loaded into the cache. This can be more cumbersome and
> query time can vary unexpectedly, which may not be justifiable. So I feel
> it is better to let the user decide which columns should be cached.
>
> Let me know if you need any more clarification.
>
> Regards
> Manish Gupta
>
>
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>
