A few more thoughts related to this:

- For options that have numeric settings (integer, float, etc.) we should have the notion of a 'default value', 'minimum value' and 'maximum value' allowed. It may be that most integer-valued options have Integer.MAX_VALUE as the maximum value allowed, but there may be some where we would want a smaller upper bound.
- Certain options would require a server restart to take effect. For example, even if one increases or decreases the overall memory available to the Drill server while it is running, the change won't take effect until the next restart. So an additional field attribute associated with each option would indicate whether it can be changed while the server is running.
- I am not sure where we are in terms of user 'privileges': is any user allowed to change any of the parameters? Clearly, we would need to constrain that. We should probably have a discussion/design around this.

Aman

On Sun, Jan 12, 2014 at 1:35 PM, Jason Altekruse <[email protected]> wrote:

> I believe eventually we want to support data sources that will allow
> updates. While our current formats and development focus are on HDFS
> archival storage, we have long-term goals to support arbitrary storage
> engines like MySQL and HBase. These systems will be able to accept updates,
> and with our goal of full SQL 2003 compliance I assume that includes
> updates.
>
> Again, I assume this would be far out, but as long as the discussion of
> options was open I would bring it up as another level of granularity.
>
> -Jason
>
> On Sun, Jan 12, 2014 at 3:17 PM, Timothy Chen <[email protected]> wrote:
>
> > Why do we even need transactions when we perform no updates?
> >
> > Tim
> >
> > On Sun, Jan 12, 2014 at 1:03 PM, Jason Altekruse <[email protected]> wrote:
> >
> > > Hi Aman,
> > >
> > > Thanks for the feedback. I will be defining both global and session-local
> > > options in my implementation. We will be assuming that session-local
> > > settings are tied to the node a client connection is associated with; a
> > > loss of the connection will lose the session and all its settings. Global
> > > settings will be added to the distributed cache to persist and apply
> > > throughout all instances of Drill in the cluster.
> > >
> > > Another level of granularity that seems to be supported is
> > > transaction-local settings; at least for SQL Server, all settings within a
> > > transaction are reset to their defaults when the transaction has finished.
> > > I do not know the standard well, but I would assume that transactions are
> > > included. In a widely distributed system like Drill, I'm uncertain how we
> > > are going to be able to manage the locks necessary to keep transactions
> > > consistent. Still, for full compliance I assume we will need to visit the
> > > issue eventually, and might just look at the right place to store those
> > > settings when we have a concept of transactions in Drill.
> > >
> > > - Jason
> > >
> > > On Fri, Jan 10, 2014 at 5:28 PM, Aman Sinha <[email protected]> wrote:
> > >
> > > > Hi Jason,
> > > > so I suppose there are 3 types of option settings that you might be
> > > > considering:
> > > >   1. Something like 'SET <option_name> TO <value>' which would be done
> > > >      before running a query and would be valid for all subsequent
> > > >      queries run in that session.
> > > >   2. A configuration file (e.g. drill.conf) that would contain the
> > > >      <key, value> pairs of options and their values; these would be
> > > >      parsed and applicable for *ALL* sessions.
> > > >   3. I suppose another requirement is to support EXPLAIN commands with
> > > >      either 'physical' or 'logical' options.
> > > >
> > > > I would vote for limiting the types of options supported in Drill. In
> > > > general, the database vendors have various proprietary sets of options.
> > > > I would think SQL Server is too flexible in terms of supporting the
> > > > options... maybe we could look at Postgres (or MySQL) instead, which has
> > > > a more standard single-token option name and (for the most part)
> > > > single-token values too.
> > > >
> > > > As another point of reference, Oracle supports optimizer 'hints' where
> > > > you can specify enabling or disabling specific options (example, enable
> > > > hash join, merge join etc.) for a particular subquery within a query,
> > > > but I don't think at this stage we should worry about that.
> > > >
> > > > Aman
> > > >
> > > > On Fri, Jan 10, 2014 at 2:28 PM, Jason Altekruse <[email protected]> wrote:
> > > >
> > > > > Hello Drill Team,
> > > > >
> > > > > I have been working the past few days on implementing the SQL options
> > > > > statements within the optiq parser and Drill and unfortunately have
> > > > > hit a few major decision points.
> > > > >
> > > > > I was hoping to make the changes to optiq as minimal as possible, and
> > > > > design a simple parsing rule that would work for all future Drill
> > > > > options. Right now I have it implemented with a single identifier as
> > > > > an option name, and an identifier or literal as the value (using
> > > > > various types for options needing int, decimal or datetime literals).
> > > > > I did need to add an additional rule to parse the reserved word ON, as
> > > > > it is used to set most of the options I have found and is not parsed
> > > > > as a valid identifier.
> > > > >
> > > > > I have looked at a number of the options available in various SQL
> > > > > implementations, but have found no reference to any ANSI standard
> > > > > options. I am assuming that they are all vendor specific, but any
> > > > > documentation on a set of standard options would be greatly
> > > > > appreciated. For anyone who knows their way around the full standard,
> > > > > I found an old draft available here (I did not realize that the
> > > > > current one was behind a paywall). I did not have any luck yet finding
> > > > > information on options, but will continue searching.
> > > > >
> > > > > http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
> > > > >
> > > > > http://technet.microsoft.com/en-us/library/ms190356.aspx
> > > > > http://technet.microsoft.com/en-us/library/ms173763(v=sql.105).aspx
> > > > >
> > > > > Looking at the option list for SQL Server, I notice the unfortunate
> > > > > presence of multi-token identifiers for options (example in first
> > > > > link: STATISTICS IO), as well as multi-token option values (example in
> > > > > second link: READ UNCOMMITTED). This creates an unfortunate ambiguity
> > > > > that would seem to be best resolved by a more robust parsing rule.
> > > > > Unfortunately, as we need to support new options with either of these
> > > > > features we would have to change them in the optiq parser.
> > > > >
> > > > > The other option is that we handle the ambiguity in Drill and amend
> > > > > optiq to simply hand back a list of the string tokens found after the
> > > > > word SET at the beginning of a statement.
> > > > >
> > > > > So my question is in two parts. If there are standard options that
> > > > > require us to handle this ambiguity, we obviously need to deal with
> > > > > it. In that case we have to decide if it belongs in optiq. If the
> > > > > standard specifies no required options, would we be interested in
> > > > > limiting the types of options supported by Drill? A reasonable limit
> > > > > might be single-token option names, and we could then easily allow
> > > > > multi-token values.
> > > > >
> > > > > As a side note, I was thinking it might be worth including a parsing
> > > > > rule for setting an option with a list of values. I cannot find any
> > > > > uses of such a feature in current implementations, but if anyone has
> > > > > an idea of where they might be useful I'd be happy to include the
> > > > > feature (this can always be done in the future as a hack where
> > > > > comma-separated lists are placed in quoted strings and then parsed on
> > > > > the Drill side).
> > > > >
> > > > > -Jason Altekruse
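
A minimal sketch of the second approach from the original message, where the parser only hands back the tokens found after SET and the Drill side splits them into a single-token option name and a possibly multi-token value. The class name `SetStatementSketch`, the optional `TO` keyword (borrowed from the 'SET <option_name> TO <value>' form suggested in the thread) and the plain whitespace tokenization are assumptions for illustration only; this is not optiq's or Drill's actual parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: single-token option name, one or more value tokens. */
final class SetStatementSketch {
    final String name;              // single-token option name
    final List<String> valueTokens; // one or more tokens, e.g. [READ, UNCOMMITTED]

    private SetStatementSketch(String name, List<String> valueTokens) {
        this.name = name;
        this.valueTokens = valueTokens;
    }

    /**
     * Splits on whitespace only; quoted strings containing spaces are
     * NOT handled in this sketch.
     */
    static SetStatementSketch parse(String sql) {
        String[] tokens = sql.trim().split("\\s+");
        if (tokens.length < 3 || !tokens[0].equalsIgnoreCase("SET")) {
            throw new IllegalArgumentException(
                "expected: SET <option_name> [TO] <value>");
        }
        // Treat an optional TO after the name as a separator, not a value.
        int valueStart = tokens[2].equalsIgnoreCase("TO") ? 3 : 2;
        if (valueStart >= tokens.length) {
            throw new IllegalArgumentException("missing option value");
        }
        List<String> values = new ArrayList<>(
            Arrays.asList(tokens).subList(valueStart, tokens.length));
        return new SetStatementSketch(tokens[1], values);
    }
}
```

Because the name is always exactly one token, everything after it can be taken as the value without look-ahead, which is why limiting names to a single token removes the ambiguity while still allowing values such as READ UNCOMMITTED. A multi-token name such as STATISTICS IO would be misread by this sketch as name STATISTICS with value IO.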

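
The numeric bounds and the restart attribute raised at the top of this message could be carried on each option definition roughly as follows. This is only a sketch under stated assumptions: `IntOptionSketch`, its field names and the `serverRunning` flag are hypothetical, not an existing Drill API, and user privileges are left out because that design is still open.

```java
/** Hypothetical sketch of an integer option with default, bounds and a restart flag. */
final class IntOptionSketch {
    final String name;
    final int defaultValue;
    final int minValue;
    final int maxValue;            // often Integer.MAX_VALUE, sometimes smaller
    final boolean requiresRestart; // true if changes only apply after a restart

    IntOptionSketch(String name, int defaultValue, int minValue, int maxValue,
                    boolean requiresRestart) {
        if (minValue > maxValue
                || defaultValue < minValue || defaultValue > maxValue) {
            throw new IllegalArgumentException(
                "inconsistent bounds for option " + name);
        }
        this.name = name;
        this.defaultValue = defaultValue;
        this.minValue = minValue;
        this.maxValue = maxValue;
        this.requiresRestart = requiresRestart;
    }

    /** Returns the candidate if it may be applied now; otherwise throws. */
    int validate(int candidate, boolean serverRunning) {
        if (requiresRestart && serverRunning) {
            throw new IllegalStateException(
                name + " cannot be changed while the server is running");
        }
        if (candidate < minValue || candidate > maxValue) {
            throw new IllegalArgumentException(
                name + " must be between " + minValue + " and " + maxValue);
        }
        return candidate;
    }
}
```

For example, a made-up option `new IntOptionSketch("example.batch_size", 4, 1, 64, false)` would accept 8 at runtime and reject 65, while one declared with `requiresRestart` set to true would reject any change while the server is running.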