Hi Doug, I am not too crazy about removing the DELETE command. In my mind, HQL syntax needs to be heavily biased towards existing SQL users as opposed to being the cleanest syntax for Hypertable power users (if you're a power user, you really shouldn't be using HQL for anything other than sanity checks and administrative tasks anyway :) ). Correct me if I'm wrong, but I think one of the main goals of HQL is to be similar to SQL. I would lean towards leaving the existing DELETE in place for now and overhauling it sometime in the future.
I think it does make sense to augment the INSERT command. It sounds like your changes won't break the existing insert syntax, but add the ability insert delete records. This makes complete sense to and the curious user and look through the documentation to figure out that deletes are actually inserts. (On a side note, is there a clean way to add functionality to the new insert to delete the last 'n' versions (without knowing the timestamps) ?) I'm also not sure if making the DELETE command support all the predicates that SELECT does is worth the effort at this point. SELECT is quite involved as it is, and I'm not sure its worth jumping through to all the hoops required to implement that. -Sanjit On Thu, Aug 9, 2012 at 10:39 AM, Doug Judd <[email protected]> wrote: > Hi Chris, > > Thanks for the feedback. Next time be sure to CC the lists so that > everyone can benefit from the discussion. > > There are a few reasons why I propose removing the DELETE command in favor > of a more general INSERT command. First, I've always found the DELETE > command to be a source of confusion. It belies the fact that deletes are > handled by *inserting* tombstone records. Second, the current DELETE > command is not very expressive and is also confusing. It only allows you > to delete data from a single row. Also, the syntax for doing a "CELL" > delete vs. a "CELL VERSION" delete is subtly confusing. The syntax looks > very similar: > > TIMESTAMP <timestamp> > VERSION <timestamp> > > Both clauses supply a timestamp, but the VERSION form has the additional > effect of changing the flags field to FLAG_DELETE_CELL_VERSION. This > doesn't feel like good API design. > > Lastly, we've had a number of requests for the ability to insert and > delete data in a single HQL command. Currently the only way to do that via > HQL is through two separate API calls which can lead to transient > inconsistencies. > > Having said all of that, I'm open to a DELETE command, but it really needs > a complete overhaul. We could introduce a new DELETE command with syntax > that is modelled after the SELECT syntax. However, that's a pretty big > project. We'll have to figure out how to cleanly express CELL deletes and > CELL VERSION deletes. Also, a query like the following: > > DELETE * FROM <table> WHERE "bar" < ROW <= "foo"; > > would turn into two separate commands under-the-hood. First a query to > figure out what rows are to be deleted, followed by a (potentially > enormous) set of tombstone inserts. To handle this in a somewhat > consistent way, it should probably be implemented as some sort of > transaction where a single revision number is chosen and supplied > consistently to both sub-operations. > > My inclination is to remove the DELETE command for now, but keep the > documentation and change it to something along the lines of: > > "Deletes are handled by inserting tombstone records that describe what > data is to be deleted. Cells deleted by a tombstone are not physically > deleted immediately, but become invisible once the tombstone has been > inserted. Tombstones and the cells in which they delete are physically > removed at a later time through garbage collection. See the INSERT command > for how to delete data via tombstone insertion." > > This will force everyone to shift away from the current broken DELETE > command and get the pain of API change out of the way now. Then when we > have time to revisit DELETE and make it fully expressive (including support > for queries like the one shown above), we can then unveil the new command > when it's ready without causing disruption. Thoughts? > > - Doug > > On Wed, Aug 8, 2012 at 11:13 PM, Christoph Rupp <[email protected]>wrote: > >> Hi Doug, >> >> a few comments: >> >> 1. the DELETE keyword is self-explanatory, whereas it's not really >> intuitive that you have to insert something in order to delete it. That >> could turn out to be a problem for evaluators and newbies. I would suggest >> at least to leave DELETE in the parser, but translate it to an INSERT >> internally. >> >> 2. We have one issue requesting more flexibility in DELETE (and this was >> asked on the mailing list and in stack overflow a couple of times). DELETE >> should support all the predicates that are also supported by SELECT. If you >> remove the DELETE keyword then this change will become very difficult. >> http://code.google.com/p/hypertable/issues/detail?id=141 >> >> bye >> Christoph >> >> >> 2012/8/9 Doug Judd <[email protected]> >> >>> I'm in the process of re-writing the HQL parser using the latest >>> version of Boost Spirit. As part of this process, I'd like to clean up >>> HQL. I propose that we drop the DELETE command in favor of a more >>> flexible INSERT command. The basic format of the INSERT command will >>> remain the same: >>> >>> INSERT INTO <table> VALUES '(' <value-tuple> ')' [ '(' <value-tuple> ')' >>> ... ] >>> >>> But the format of <value-tuple> would change to be more flexible, >>> allowing for *deletes* as well as *inserts*. >>> >>> *Timestamps* - As is the case now, timestamps will be optional in the >>> <value-tuple>. However, I propose moving the timestamp to the end and >>> not requiring quotes around the timestamp. Here are examples of the >>> supported formats for the <timestamp> field: >>> >>> auto # auto-assign timestamp >>> min # absolute minimum timestamp >>> 2012 # date (year-only) >>> 2012-08 # date (year+month) >>> 2012-08-08 # date >>> 2012-08-08 12:02:00 # date+time >>> 2012-08-08 12:02:00:823432455 # date+time with nanoseconds >>> 1234343454s # seconds since the epoch >>> 1234343454782639847ns # nanoseconds since the epoch >>> >>> *Inserts* - The following are <value-tuple> examples for inserts: >>> >>> ("row", "c:q", "value") >>> ("row", "c:q", "value", <timestamp>) >>> (INS, "row", "c:q", "value") >>> (INS, "row", "c:q", "value", <timestamp>) >>> >>> *Delete Row* - The following are <value-tuple> examples for deleting >>> rows: >>> >>> (DEL, "row") >>> (DEL, "row", <timestamp>) >>> >>> *Delete Column Family* - The following are <value-tuple> examples for >>> deleting column families: >>> >>> (DEL, "row", "c") >>> (DEL, "row", "c", <timestamp>) >>> >>> *Delete Cell* - The following are <value-tuple> examples for deleting >>> cells (e.g. delete all cell versions with this timestamp or less): >>> >>> (DEL, "row", "c:q") >>> (DEL, "row", "c:q", <timestamp>) >>> >>> *Delete Cell Version* - The following are <value-tuple> examples for >>> deleting a specific cell version (e.g. delete the cell version with this >>> exact timestamp): >>> >>> (DEL_VER, "row", "c:q") >>> (DEL_VER, "row", "c:q", <timestamp>) >>> >>> The INSERT command will accept a mix of any of the above <value-tuple> >>> formats, which will obviate the need for the DELETE command (which was kind >>> of wacky anyway). Here are a couple of DELETE examples along with how they >>> would be formulated with the new INSERT syntax: >>> >>> DELETE c FROM foo WHERE ROW = "row1; >>> INSERT INTO foo VALUES (DEL, "row1", "c"); >>> >>> DELETE * FROM foo WHERE ROW = "row1 TIMESTAMP "2012-02-13"; >>> INSERT INTO foo VALUES (DEL, "row1", 2012-02-13); >>> >>> If you have any feedback, please let me know. I'd like make these >>> changes for the upcoming 0.9.6.1 release. >>> >>> - Doug >>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "Hypertable Development" group. >>> To post to this group, send email to [email protected]. >>> To unsubscribe from this group, send email to >>> [email protected]. >>> For more options, visit this group at >>> http://groups.google.com/group/hypertable-dev?hl=en. >>> >> >> > > > -- > Doug Judd > CEO, Hypertable Inc. > > -- > You received this message because you are subscribed to the Google Groups > "Hypertable Development" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]. > For more options, visit this group at > http://groups.google.com/group/hypertable-dev?hl=en. > -- You received this message because you are subscribed to the Google Groups "Hypertable Development" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/hypertable-dev?hl=en.
