I believe we will eventually want to support data sources that allow updates. While our current formats and development focus are on HDFS archival storage, we have long-term goals to support arbitrary storage engines like MySQL and HBase. These systems will be able to accept updates, and I assume our goal of full SQL 2003 compliance includes updates as well.
Again, I assume this would be far out, but as long as the discussion of options was open I wanted to bring it up as another level of granularity.

-Jason

On Sun, Jan 12, 2014 at 3:17 PM, Timothy Chen <[email protected]> wrote:

> Why do we even need transactions when we perform no updates?
>
> Tim
>
> On Sun, Jan 12, 2014 at 1:03 PM, Jason Altekruse <[email protected]> wrote:
>
> > Hi Aman,
> >
> > Thanks for the feedback. I will be defining both global and session
> > local options in my implementation. We will be assuming that session
> > local settings will be tied to the node a client connection is
> > associated with; a loss of the connection will lose the session and
> > all of its settings. Global settings will be added to the distributed
> > cache to persist and apply throughout all instances of Drill in the
> > cluster.
> >
> > Another level of granularity that seems to be supported is
> > transaction local settings; at least for SQL Server, all settings
> > within a transaction are reset to their defaults when the transaction
> > has finished. I do not know the standard well, but I would assume
> > that transactions are included. In a widely distributed system like
> > Drill, I'm uncertain how we are going to be able to manage the locks
> > necessary to keep transactions consistent. Still, for full compliance
> > I assume we will need to visit the issue eventually, and might just
> > look at the right place to store those settings when we have a
> > concept of transactions in Drill.
> >
> > - Jason
> >
> > On Fri, Jan 10, 2014 at 5:28 PM, Aman Sinha <[email protected]> wrote:
> >
> > > Hi Jason,
> > > so I suppose there are 3 types of option settings that you might be
> > > considering:
> > >   1. Something like 'SET <option_name> TO <value>' which would be
> > >      done before running a query and would be valid for all
> > >      subsequent queries run in that session.
> > >   2. A configuration file (e.g. drill.conf) that would contain the
> > >      <key, value> pairs of options and their values, and these
> > >      would be parsed and applicable for *ALL* sessions.
> > >   3. I suppose another requirement is to support EXPLAIN commands
> > >      with either 'physical' or 'logical' options.
> > >
> > > I would vote for limiting the types of options supported in Drill.
> > > In general, the database vendors have various proprietary sets of
> > > options. I would think SQL Server is too flexible in terms of
> > > supporting the options... maybe we could look at Postgres (or
> > > MySQL) instead, which has a more standard single token option name
> > > and (for the most part) single token values too.
> > >
> > > As another point of reference, Oracle supports optimizer 'hints'
> > > where you can specify enabling or disabling specific options
> > > (for example, enable hash join, merge join etc.) for a particular
> > > subquery within a query, but I don't think at this stage we should
> > > worry about that.
> > >
> > > Aman
> > >
> > > On Fri, Jan 10, 2014 at 2:28 PM, Jason Altekruse <[email protected]> wrote:
> > >
> > > > Hello Drill Team,
> > > >
> > > > I have been working the past few days on implementing the SQL
> > > > options statements within the optiq parser and Drill and
> > > > unfortunately have hit a few major decision points.
> > > >
> > > > I was hoping to make the changes to optiq as minimal as possible,
> > > > and design a simple parsing rule that would work for all future
> > > > Drill options. Right now I have it implemented with a single
> > > > identifier as an option name, and an identifier or literal as the
> > > > value (using various types for options needing int, decimal or
> > > > datetime literals). I did need to add an additional rule to parse
> > > > the reserved word ON, as it is used to set most of the options I
> > > > have found and is not parsed as a valid identifier.
> > > >
> > > > I have looked at a number of the options available in various SQL
> > > > implementations, but have found no reference to any ANSI standard
> > > > options. I am assuming that they are all vendor specific, but any
> > > > documentation on a set of standard options would be greatly
> > > > appreciated. For anyone who knows their way around the full
> > > > standard, I found an old draft available here (I did not realize
> > > > that the current one was behind a paywall). I did not have any
> > > > luck yet finding information on options, but will continue
> > > > searching.
> > > > http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
> > > >
> > > > http://technet.microsoft.com/en-us/library/ms190356.aspx
> > > > http://technet.microsoft.com/en-us/library/ms173763(v=sql.105).aspx
> > > >
> > > > Looking at the option list for SQL Server, I notice the
> > > > unfortunate presence of multi-token identifiers for options
> > > > (example in first link: STATISTICS IO), as well as multi-token
> > > > option values (example in second link: READ UNCOMMITTED). This
> > > > creates an unfortunate ambiguity that would seem to be best
> > > > resolved by a more robust parsing rule. Unfortunately, as we need
> > > > to support new options with either of these features we would
> > > > have to change them in the optiq parser.
> > > >
> > > > The other option is that we handle the ambiguity in Drill and
> > > > make the amendment to optiq simply hand back a list of string
> > > > tokens found after the word SET at the beginning of a statement.
> > > >
> > > > So my question is in two parts. If there are standard options
> > > > that require us to handle this ambiguity, we obviously need to
> > > > deal with it. In that case we have to decide if it belongs in
> > > > optiq. If the standard specifies no required options, would we be
> > > > interested in limiting the types of options supported by Drill?
> > > > A reasonable limit might be single token option names, and we
> > > > could then easily allow multi-token values.
> > > >
> > > > As a side note, I was thinking it might be worth including a
> > > > parsing rule for setting an option with a list of values. I
> > > > cannot find any uses of such a feature in current
> > > > implementations, but if anyone has an idea of where they might be
> > > > useful I'd be happy to include the feature (this can always be
> > > > done in the future as a hack where comma separated lists are
> > > > placed in quoted strings and then parsed on the Drill side).
> > > >
> > > > -Jason Altekruse
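
As a rough illustration of the proposal in the original message (single token option names, multi-token values, with the parser simply handing back the tokens found after SET), here is a minimal sketch in Python. It is not Drill or optiq code: the function names, the optional TO / = separator handling, and the option names used in the usage note are all assumptions made for illustration.

```python
import shlex


def parse_set_statement(statement):
    """Split 'SET <name> [TO|=] <value tokens...>' into (name, value_tokens).

    Hypothetical helper: the option name is always exactly one token, and
    every remaining token is treated as part of the value.
    """
    # shlex keeps a quoted string together as a single token.
    tokens = shlex.split(statement.strip().rstrip(";"))
    if len(tokens) < 3 or tokens[0].upper() != "SET":
        raise ValueError("expected: SET <option_name> <value>")
    name = tokens[1]
    rest = tokens[2:]
    # Assumed: an optional TO or = separator between name and value.
    if rest[0].upper() == "TO" or rest[0] == "=":
        rest = rest[1:]
    if not rest:
        raise ValueError("missing option value")
    return name, rest


def split_list_value(value):
    """Parse a comma separated list that was passed as one quoted string."""
    return [item.strip() for item in value.split(",")]
```

With this rule a statement such as `SET isolation_level TO READ UNCOMMITTED` yields the name `isolation_level` and the value tokens `READ`, `UNCOMMITTED`, while a SQL Server style `SET STATISTICS IO ON` would be read as option `STATISTICS` with value `IO ON`, which is exactly the limitation that restricting names to a single token accepts. A quoted value such as `'a, b,c'` arrives as one token and can be split afterwards with `split_list_value`.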
