Hi Dev, I am currently working on a min/max optimization wherein, for string/varchar data type columns, we will decide internally whether to write min/max or not.
*Background*
Currently we store min/max for all columns: page-level min/max and blocklet min/max in the file footer, and all the blocklet metadata entries in the shard. Consider the case where each column's data size exceeds 10000 characters. If we write min/max in this case, it is written 3 times for each column, which increases the store size and degrades query performance.

*Design proposal*
1. We will introduce a configurable system-level property for the maximum character count, *"carbon.string.allowed.character.count"*. If the data crosses this limit, min/max will not be stored for that column.
2. If a page does not contain min/max for a column, the blocklet min/max will also not contain an entry for that column.
3. The thrift file will be modified to introduce an optional Boolean flag, which will be used during query to identify whether min/max is stored for the filter column.
4. For now this will be supported only for dimensions of string/varchar type. It can be extended to bigDecimal-type measures in the future if required.
5. The block and blocklet dataMap cache will also store the min/max Boolean flag for dimension columns, based on which filter pruning will be done. If min/max is not written for a column, isScanRequired will return true during driver pruning.
6. In the executor, page- and blocklet-level min/max will again be checked for the filter column. If min/max is not written, the complete page data will be scanned.

*Backward compatibility*
1. For stores prior to 1.5.0, the min/max flag for all columns will be set to true while loading the dataMap in the query flow.

Please feel free to share your inputs and suggestions.

Regards
Manish Gupta
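To make the proposal concrete, here is a minimal sketch of the write-side decision in point 1. The class name, method names, and the default limit are illustrative assumptions, not the actual CarbonData API; only the property name comes from the proposal.

```java
// Hypothetical sketch: skip writing min/max for a string/varchar column
// when either boundary value exceeds the configured character limit.
public final class MinMaxWriteDecider {

    // Property name taken from the proposal; the default value is an assumption.
    public static final String ALLOWED_CHAR_COUNT_PROP =
        "carbon.string.allowed.character.count";
    public static final int DEFAULT_ALLOWED_CHAR_COUNT = 200;

    /**
     * Returns true if min/max should be written for this page, i.e. both the
     * encoded min and max values fit within the configured limit. If this
     * returns false for any page, the blocklet-level entry for the column is
     * skipped as well (point 2 of the proposal).
     */
    public static boolean shouldWriteMinMax(byte[] min, byte[] max, int limit) {
        return min.length <= limit && max.length <= limit;
    }

    public static void main(String[] args) {
        // Read the limit from a system property, falling back to the default.
        int limit = Integer.getInteger(ALLOWED_CHAR_COUNT_PROP,
                                       DEFAULT_ALLOWED_CHAR_COUNT);
        byte[] shortValue = new byte[50];     // fits within the limit
        byte[] longValue  = new byte[10001];  // exceeds the limit
        System.out.println(shouldWriteMinMax(shortValue, shortValue, limit)); // true
        System.out.println(shouldWriteMinMax(shortValue, longValue, limit));  // false
    }
}
```

On the read side, the absence of the flag would force isScanRequired to return true, so pruning degrades gracefully to a full page scan rather than returning wrong results.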