Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1471#discussion_r151827453

    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -687,16 +689,17 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // get tokens for all the required FileSystem for table path
         TokenCache.obtainTokensForNamenodes(job.getCredentials(),
             new Path[] { new Path(absoluteTableIdentifier.getTablePath()) }, job.getConfiguration());
    -
    -    TableDataMap blockletMap = DataMapStoreManager.getInstance()
    -        .getDataMap(absoluteTableIdentifier, BlockletDataMap.NAME,
    -            BlockletDataMapFactory.class.getName());
    +    boolean distributedCG = Boolean.parseBoolean(CarbonProperties.getInstance()
    +        .getProperty(CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP,
    +            CarbonCommonConstants.USE_DISTRIBUTED_DATAMAP_DEFAULT));
    +    TableDataMap blockletMap =
    +        DataMapStoreManager.getInstance().chooseDataMap(absoluteTableIdentifier);
         DataMapJob dataMapJob = getDataMapJob(job.getConfiguration());
         List<ExtendedBlocklet> prunedBlocklets;
    -    if (dataMapJob != null) {
    +    if (distributedCG || blockletMap.getDataMapFactory().getDataMapType() == DataMapType.FG) {
    --- End diff --

    It seems distributedCG and FG behave the same, right? Earlier I thought the FG datamap would be called in ScanRDD.compute, but it seems not? If we collect all pruned blocklets in the driver, will that be too many for the driver?
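For readers following the discussion: the condition under review routes pruning to a distributed datamap job either when the `USE_DISTRIBUTED_DATAMAP` property is enabled (for CG datamaps) or whenever the datamap is fine-grained. A minimal standalone sketch of that dispatch logic is below; the enum, property key, and method names here are simplified stand-ins for illustration, not the real CarbonData classes.

```java
import java.util.Properties;

public class PruningDispatchSketch {
  // Stand-in for CarbonData's DataMapType: coarse-grained vs fine-grained.
  enum DataMapType { CG, FG }

  // Mirrors the diff's condition: distributedCG || dataMapType == FG
  // decides whether pruning runs as a distributed job or in the driver.
  // "carbon.use.distributed.datamap" is a hypothetical placeholder for the
  // real USE_DISTRIBUTED_DATAMAP constant.
  static String choosePruningPath(Properties props, DataMapType type) {
    boolean distributedCG = Boolean.parseBoolean(
        props.getProperty("carbon.use.distributed.datamap", "false"));
    if (distributedCG || type == DataMapType.FG) {
      return "distributed"; // pruning submitted as a DataMapJob to executors
    }
    return "driver";        // pruning executed locally in the driver
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // CG with the flag off stays in the driver; FG always goes distributed.
    System.out.println(choosePruningPath(props, DataMapType.CG));
    System.out.println(choosePruningPath(props, DataMapType.FG));
    props.setProperty("carbon.use.distributed.datamap", "true");
    System.out.println(choosePruningPath(props, DataMapType.CG));
  }
}
```

This also frames the reviewer's question: on both branches of the `if`, the pruned blocklet list is ultimately collected back into the driver, which is where the memory concern arises.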
---