Hi Ryan,

There is an obscure but very handy feature of Drill called table functions. [1] These allow you to set parameters of your format plugin as part of a query.
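Here is a minimal Java sketch of a per-query plugin parameter. It assumes a hypothetical regex format plugin like the one described below, whose configuration has two simple string properties: regex and columns. Because both are plain strings, a table function can set them. The names RegexFormatSketch, RegexFormatConfig and parseLine are illustrative only. A real Drill format config would be a Jackson-serializable FormatPluginConfig; this sketch leaves that out so it runs on the JDK alone.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFormatSketch {

  // Hypothetical format config. In a real Drill plugin this would be a
  // Jackson-serializable FormatPluginConfig; here it is a plain class.
  // Both properties are simple strings, so a table function can set them.
  public static final class RegexFormatConfig {
    public final String regex;   // pattern applied to each line of the file
    public final String columns; // comma-delimited names (DRILL-6169 workaround)

    public RegexFormatConfig(String regex, String columns) {
      this.regex = regex;
      this.columns = columns;
    }

    // Turn "date, level, message" into [date, level, message].
    public List<String> columnNames() {
      List<String> names = new ArrayList<>();
      for (String name : columns.split(",")) {
        String trimmed = name.trim();
        if (!trimmed.isEmpty()) {
          names.add(trimmed);
        }
      }
      return names;
    }
  }

  // Parse one line of text: capture group i becomes column i.
  // Returns an empty row when the line does not match the pattern.
  public static Map<String, String> parseLine(RegexFormatConfig config, String line) {
    Matcher matcher = Pattern.compile(config.regex).matcher(line);
    Map<String, String> row = new LinkedHashMap<>();
    if (!matcher.matches()) {
      return row;
    }
    List<String> names = config.columnNames();
    int count = Math.min(names.size(), matcher.groupCount());
    for (int i = 0; i < count; i++) {
      row.put(names.get(i), matcher.group(i + 1));
    }
    return row;
  }

  public static void main(String[] args) {
    RegexFormatConfig config = new RegexFormatConfig(
        "(\\d{4}-\\d{2}-\\d{2}) (\\w+) (.*)", "date, level, message");
    // Prints {date=2018-04-07, level=INFO, message=plugin started}
    System.out.println(parseLine(config, "2018-04-07 INFO plugin started"));
  }
}
```

In a query, such a plugin would then be configured per query through Drill's table function syntax. The call would look roughly like select * from table(dfs.`app.log`(type => 'regex', regex => '(\d{4}-\d{2}-\d{2}) (\w+) (.*)', columns => 'date,level,message')). Here the type name 'regex' and the property names are those of the hypothetical plugin. Every property is passed explicitly, which fits the DRILL-6168 limitation described below.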
You mentioned a storage plugin. I've not tried a table function with a storage plugin, but I have tested table functions with a format plugin. Your format or storage plugin has a Jackson-serializable Java configuration class. Normally you set the properties for your plugin in the Drill web console, but these can also be set in the table function.

I had a use case something like yours. I defined an example "regex" plugin where the user can specify a regular expression to apply to a text file to parse columns. The user can then provide a list of column names. Using the table function, I could specify the regex and column names per query.

This exercise did, however, point out two current limitations of table functions. First, they work only with simple data types (strings, ints) (DRILL-6169). So my list of columns has to be a single string containing a comma-delimited list of columns; I could not use the more natural list of strings. Second, table functions do not retain the configured values of parameters: you have to include all parameters in the function, not just the ones you want to change (DRILL-6168).

Yet another option is to set a session option. However, unless you do a bit of clever coding, format plugins don't have visibility into session options (DRILL-5181).

Perhaps your use case provides a compelling reason to fix some of these limitations...

Thanks,
- Paul

[1] https://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters, see the section "Using the Formats Attributes as Table Function Parameters".

On Saturday, April 7, 2018, 10:37:05 PM PDT, Aman Sinha <[email protected]> wrote:

A better option would be to have a user-defined function that takes 2 parameters and evaluates to a boolean value.
e.g. select * from myTable where MyUDF(notColumn, 'value') IS TRUE;

The storage plugin that you are developing would need to implement a pushdown rule that looks at the filter condition and, if it contains MyUDF(), pushes it down to the scan/reader corresponding to your plugin.

On Sat, Apr 7, 2018 at 6:58 PM, Hanumath Rao Maduri <[email protected]> wrote:

> Hello Ryan,
>
> Thank you for trying out Drill. Drill/Calcite expects "notColumn" to be
> supplied by the underlying scan.
> However, I expect that this column will be present in the scan but not past
> the filter (notColumn = 'value') in the plan.
> In that case you may need to push down the filter to the groupScan and then
> remove the column projections from your custom groupscan.
>
> It would be easier for us to guess what the issue could be if you can post
> the logical and physical query plans for this query.
>
> Hope this helps. Please do let us know if you have any further issues.
>
> Thanks,
>
>
> On Sat, Apr 7, 2018 at 2:08 PM, Ryan Shanks <[email protected]>
> wrote:
>
> > Hi Drill Dev Team!
> >
> > I am writing a custom storage plugin and I am curious if it is possible in
> > Drill to pass a filter value, in the form of a where clause, that is not
> > related to a column. What I would like to accomplish is something like:
> >
> > select * from myTable where notColumn = 'value';
> >
> > In the example, notColumn is not a column in myTable, or any other table;
> > it is just a specific parameter that the storage plugin will use in the
> > filtering process. Additionally, notColumn would not be returned as a
> > column, so Drill needs to not expect it as a part of the 'select *'. I
> > created a rule that will push down and remove these non-column filter
> > calls, but I need to somehow tell Drill/Calcite that the filter name is
> > valid, without actually registering it as a column.
> > The following error occurs prior to submitting any rules:
> >
> > org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR:
> > From line 1, column 35 to line 1, column 39: Column 'notColumn' not found
> > in any table
> >
> > Alternatively, can I manipulate star queries to only return a subset of
> > all the columns for a table?
> >
> > Any insight would be greatly appreciated!
> >
> > Thanks,
> > Ryan
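The Java sketch below models the pushdown approach described in the replies above. It is a simplified model, not Drill's or Calcite's API. The hypothetical Call class stands in for one ANDed conjunct of the WHERE clause. A real rule would match a Filter over your scan and inspect Calcite RexNode expressions instead. FilterPushdownSketch, Call, Split, pushDown and starColumns are all illustrative names. The recognizer passed to pushDown decides what moves into the group scan. It can match calls to a marker UDF such as MyUDF (Aman's suggestion) or equality tests on a pseudo-column such as notColumn (Ryan's rule). starColumns models Hanumath's point: the scan may expose the pseudo-column but drop it from the projection. The sketch covers only what happens after validation. It does not by itself avoid the "Column 'notColumn' not found" error.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class FilterPushdownSketch {

  // Hypothetical stand-in for one conjunct of a WHERE clause, e.g.
  // =(notColumn, 'value') or MyUDF(notColumn, 'value').
  // A real Drill rule would inspect Calcite RexNode expressions instead.
  public static final class Call {
    public final String operator;
    public final String operand;
    public final String value;

    public Call(String operator, String operand, String value) {
      this.operator = operator;
      this.operand = operand;
      this.value = value;
    }

    @Override
    public String toString() {
      return operator + "(" + operand + ", '" + value + "')";
    }
  }

  // What the rule hands to the group scan, and what stays in the Filter.
  public static final class Split {
    public final Map<String, String> scanParams = new LinkedHashMap<>();
    public final List<Call> residual = new ArrayList<>();
  }

  // Move every conjunct the plugin recognizes into scanParams; keep the rest
  // so Drill still evaluates them after the scan.
  public static Split pushDown(List<Call> conjuncts, Predicate<Call> recognized) {
    Split split = new Split();
    for (Call call : conjuncts) {
      if (recognized.test(call)) {
        split.scanParams.put(call.operand, call.value);
      } else {
        split.residual.add(call);
      }
    }
    return split;
  }

  // For "select *", keep only real columns, dropping pseudo-columns that
  // exist purely as scan parameters.
  public static List<String> starColumns(List<String> schema, Set<String> pseudoColumns) {
    List<String> out = new ArrayList<>();
    for (String column : schema) {
      if (!pseudoColumns.contains(column)) {
        out.add(column);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Call> where = List.of(
        new Call("MyUDF", "notColumn", "value"),
        new Call("=", "id", "42"));
    // Marker-UDF variant: recognize calls to MyUDF.
    Split split = pushDown(where, c -> c.operator.equals("MyUDF"));
    System.out.println(split.scanParams); // {notColumn=value}
    System.out.println(split.residual);   // [=(id, '42')]
    System.out.println(starColumns(List.of("id", "name", "notColumn"), Set.of("notColumn")));
    // [id, name]
  }
}
```

Keeping the recognizer as a parameter lets the same split logic serve either style of marker. Whichever style you pick, the residual conjuncts stay in the plan so Drill still applies them.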
