Hi All,

I am writing a custom storage plugin to read and query non-static JSON files stored on remote services, and I wanted to use something similar to Drill's partition pruning to optimise my queries.
The files are looked up dynamically within the plugin via an external service, based on the table-id and, optionally, on 'age', one of the attributes in the JSON files. In other words, the lookup service API resembles:

    List<FileLocations> getDataSources(String tableId)
    List<FileLocations> getDataSources(String tableId, long ageStart, long ageEnd)

So a query like

    SELECT * FROM pluginName.tableId WHERE age > 10 AND age < 20

has the potential to be optimised to scan only the files for the matching ages, rather than all the data sources for all ages.

From my understanding of Drill's documentation so far, this would be hard to do because:

a) The remote JSON files are non-static, meaning the external service keeps changing them, so generating static Parquet files and using Parquet metadata for pruning is not going to help, or the Parquet files would need to be regenerated for every query. (Also, CTAS operations are not allowed on my system.)

b) Drill's pushdown capability is apparently limited to subqueries of the form 'SELECT col FROM (SELECT * FROM tableid)', so it would not be applicable to generic SELECT queries.

I just wanted to confirm that my understanding is correct and that I have not overlooked some aspect of Drill that enables this kind of pruning.

Thanks,
Lokendra
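
For illustration, the pruning described above could be sketched roughly as follows. This is a minimal sketch that does not use any Drill planner API. `LookupService`, `AgePredicate` and `resolveFiles` are hypothetical names, file locations are simplified to plain strings, and the sketch assumes that ages are whole numbers and that `ageStart` and `ageEnd` are inclusive bounds (the post does not say either way).

```java
import java.util.Collections;
import java.util.List;

public class AgePruningSketch {

    /** Hypothetical stand-in for the external lookup service in the post. */
    interface LookupService {
        List<String> getDataSources(String tableId);
        List<String> getDataSources(String tableId, long ageStart, long ageEnd);
    }

    /** One simple comparison on the 'age' column, e.g. age > 10. */
    static final class AgePredicate {
        final String op;   // one of ">", ">=", "<", "<=", "="
        final long value;

        AgePredicate(String op, long value) {
            this.op = op;
            this.value = value;
        }
    }

    /**
     * Folds AND-ed comparisons on 'age' into one inclusive range and picks
     * the matching lookup overload. With no age predicates, every data
     * source of the table is returned.
     */
    static List<String> resolveFiles(LookupService service, String tableId,
                                     List<AgePredicate> conjuncts) {
        if (conjuncts.isEmpty()) {
            return service.getDataSources(tableId);
        }
        long lo = Long.MIN_VALUE;
        long hi = Long.MAX_VALUE;
        for (AgePredicate p : conjuncts) {
            switch (p.op) {
                case ">":
                    // Nothing is greater than Long.MAX_VALUE; also avoids overflow.
                    if (p.value == Long.MAX_VALUE) return Collections.emptyList();
                    lo = Math.max(lo, p.value + 1);
                    break;
                case ">=":
                    lo = Math.max(lo, p.value);
                    break;
                case "<":
                    // Nothing is less than Long.MIN_VALUE; also avoids overflow.
                    if (p.value == Long.MIN_VALUE) return Collections.emptyList();
                    hi = Math.min(hi, p.value - 1);
                    break;
                case "<=":
                    hi = Math.min(hi, p.value);
                    break;
                case "=":
                    lo = Math.max(lo, p.value);
                    hi = Math.min(hi, p.value);
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported operator: " + p.op);
            }
        }
        if (lo > hi) {
            // Contradictory filter such as age > 20 AND age < 10: nothing to scan.
            return Collections.emptyList();
        }
        return service.getDataSources(tableId, lo, hi);
    }
}
```

With these assumptions, `age > 10 AND age < 20` turns into a call to `getDataSources(tableId, 11, 19)`. A one-sided filter such as `age > 10` leaves the other bound at `Long.MAX_VALUE`, which assumes the service accepts an open-ended range. Unless the service guarantees that the returned files contain only matching rows, the original filter would still have to be applied to the scanned data.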