Hi Manish,

Thanks for proposing configurable columns for the min/max cache. This will help customers who have large data but use only a few columns in filter conditions. +1 for solution 1.
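For users, I imagine the configuration could look roughly like the sketch below (spark-shell style; the property name COLUMN_META_CACHE and the table are only placeholders, the actual DDL should come from the design document):

    // Hypothetical sketch. Assumes a CarbonSession named `carbon` already exists, e.g.
    //   import org.apache.spark.sql.CarbonSession._
    //   val carbon = SparkSession.builder().getOrCreateCarbonSession("/path/to/store")
    // COLUMN_META_CACHE is an illustrative property name: cache min/max only for the
    // columns that are commonly used in filters, instead of all columns of the table.
    carbon.sql("""
      CREATE TABLE sales (
        id INT,
        country STRING,
        amount DOUBLE,
        updated_time TIMESTAMP
      )
      STORED BY 'carbondata'
      TBLPROPERTIES ('COLUMN_META_CACHE' = 'country,updated_time')
    """)

With something like this, only country and updated_time would have their min/max kept in the driver cache, which keeps the cache small for wide tables.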
Regards,
Kanaka

On Fri, Jun 22, 2018 at 11:39 AM, manishgupta88 <tomanishgupt...@gmail.com> wrote:
> Thanks Ravi for the feedback. I completely agree with you that we need to
> develop the second solution ASAP. Please find my responses to your queries
> below.
>
> 1. What if the query comes on non-cached columns; will it start reading min/max
> from disk on the driver side?
> - If the query is on a non-cached column, then all the blocks will be selected
> and min/max pruning will be done in each executor. There will not be any read
> on the driver side, as it is a single process and reading min/max values from
> disk for every query would increase the pruning time. So I feel it is better to
> read in a distributed way using the executors.
>
> 2. Are we planning to cache blocklet-level or block-level information on the
> driver side for cached columns?
> - We will provide an option for the user to cache at the Block or Blocklet
> level. It will be configurable at the table level, and the default caching will
> be at the Block level. I will cover this part in detail in the design document.
>
> 3. What is the impact if we automatically choose cached columns from the
> user query instead of letting the user configure them?
> - Every query can have different filter columns. So if we choose them
> automatically, then for every new filter column the min/max values will be read
> from disk and loaded into the cache. This can be more cumbersome, and query
> time can vary unexpectedly, which may not be justifiable. So I feel it is better
> to let the user decide which columns should be cached.
>
> Let me know if you need any more clarifications.
>
> Regards
> Manish Gupta
>
>
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
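One more thought on point 2 above: to make the Block vs Blocklet choice concrete, a table-level switch could look roughly like the sketch below (again, the property name CACHE_LEVEL is only illustrative, and whether it can be changed via ALTER TABLE is an assumption until the design document is published):

    // Hypothetical sketch: choose the granularity of the driver-side cache per table.
    // BLOCK (the proposed default) keeps the cache smaller; BLOCKLET gives finer
    // pruning at the cost of more driver memory. Property name is illustrative only.
    carbon.sql("""
      ALTER TABLE sales SET TBLPROPERTIES ('CACHE_LEVEL' = 'BLOCKLET')
    """)

Looking forward to the design document covering the exact DDL and defaults.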