Re: [DISCUSS] get back update status after invoking executeUpdate(...)

Kasper Sørensen Mon, 12 Aug 2013 06:48:37 -0700

2013/8/12 Rohit <[email protected]>
>
> Thanks for this.
> I've 2 queries :
>
> 1. Instead of having 3 methods why can't we define an Enum (Delete, Update, 
> Insert) and use only one method
> which accepts this enum and returns the number of affected rows corresponding 
> to that operation ?


We can do that, but what would the advantage be? This would be a bit
like java.util.Calendar's get and set methods ... I don't think many
people think it's more convenient to say

foo.get(SomeEnum.BAR);

instead of

foo.getBar();

>
> 2. How are we counting the affected rows ? Do we have collection of affected 
> objects to do so (for update, delete and insert) ?

This depends on the backing datastore of course. Some will not be able
to inform us, and we will return -1 in those cases. For instance...

JDBC provides this number after executing the individual statements,
so will can simply sum those numbers together, of course based on our
own model's knowledge about whether the statement was an insert,
update or delete operation.

For CSV and Excel spreadsheets we are performing the updates ourselves
(a big streaming read+write operation) and in those cases we must
increment the numbers everytime we hit a record to update etc.

For MongoDB and CouchDB I think the numbers are also sent back to us
via the drivers, but gotta check. If not, then we'll return -1 in
those cases.

>
> Yes we can discuss the async operation(s) in another Jira issue. Thanks again.
>
> Cheers,
> -RS
>
>
> On 12-Aug-2013, at 1:09 PM, Kasper Sørensen <[email protected]> 
> wrote:
>
> > Let's watch out not to meddle the two features we're talking about.
> >
> > I have been bringing this story forward because I have seen usages where we
> > would like to get an indication of the number of affected rows by some
> > update script. This would still be in a traditional blocking mode. And it
> > is in line with proposing the three methods (or four, if a total is also to
> > be provided) mentioned above.
> >
> > The thing you're proposing is something else in my opinion. I would dub
> > that 'asynchronuous/non-blocking/reactive update execution' or something
> > like that, and I would prefer offering such a feature in a different method
> > since .executeUpdate already has a semantic to it which is blocking.
> > Offering a non-blocking variant of the method impacts everything about the
> > method signature - the return type (a future-like object), the arguments (a
> > listener, both for success and failure) and the exceptions thrown (none,
> > since the execution is happening asynchronuously). I am all open for taking
> > on this kind of update method as well, but maybe we should take that in a
> > separate email thread / JIRA issue (if we agree that it is a entirely
> > different method)?
> >
> > Kasper
> >
> >
> > 2013/8/12 rohit sharma <[email protected]>
> >
> >> Agreed ..for returning the list of affected rows. I think then we need not
> >> require following methods:
> >> public int getInsertedRows();
> >> public int getUpdatedRows();
> >> public int getDeletedRows();
> >>
> >> As per below code an UpdateStatus object is communicating the status of
> >> UpdateScript's run method, not individual insert, delete or update
> >> operations.
> >>
> >> UpdateStatus status = dataContext.executeUpdate(new UpdateScript() {
> >>>  public void run(UpdateCallback cb) {
> >>>    cb.insertInto(table).value(...).execute();
> >>>    cb.insertInto(table).value(...).execute();
> >>>    cb.update(table).where(...).value(...).execute();
> >>>    cb.deleteFrom(table).where(...).execute();
> >>>  }
> >>> });
> >>
> >> I think we need following methods :
> >> 1. Method to state if UpdateScript executed. However, this will not
> >> indicate any failure or success.
> >> 2. Method(s) to indicate is execution was successful or failure.
> >> 3. Method to communicate custom data/info/any results from run method.
> >> I think this is becoming like Future.
> >>
> >>
> >> On Sun, Aug 11, 2013 at 3:30 PM, Kasper Sørensen <
> >> [email protected]> wrote:
> >>
> >>> The main development branch is 'master'. The .executeUpdate(...) method
> >> is
> >>> in the subinterface UpdateableDataContext.
> >>>
> >>> My point is that you can provide an update script with many operations in
> >>> one go. Take this example:
> >>>
> >>> UpdateStatus status = dataContext.executeUpdate(new UpdateScript() {
> >>>>  public void run(UpdateCallback cb) {
> >>>>    cb.insertInto(table).value(...).execute();
> >>>>    cb.insertInto(table).value(...).execute();
> >>>>    cb.update(table).where(...).value(...).execute();
> >>>>    cb.deleteFrom(table).where(...).execute();
> >>>>  }
> >>>> });
> >>>
> >>>
> >>> In this example we know that 2 records have been inserted. Some amount of
> >>> records have been updated (that may or may not be reported by the backing
> >>> datastore) and some amount of records have been deleted (that may or may
> >>> not have been reported by the backing datastore).
> >>>
> >>> Getting a list of affected records would be very complicated IMO, and
> >> would
> >>> only be consistent if done at the time of the transaction. That means
> >> that
> >>> if we even figure out how to do it, we also have to do it eagerly, which
> >> is
> >>> obviously not a good thing to do since I think getting that list is a
> >>> feature that would not apply to most use cases.
> >>>
> >>> The idea about doing this in a multithreaded/nonblocking fashion is
> >>> interesting. It would definately have to be an option in my opinion,
> >> since
> >>> I think the interface right now indicates that the executeUpdate method
> >>> will return succesfully when the update is done (ie. it throws those
> >>> exceptions that may arise during the update, and an unblocking variant
> >>> would not be able to do that).
> >>>
> >>>
> >>> 2013/8/11 Rohit <[email protected]>
> >>>
> >>>>
> >>>> My point is why do we need 3 methods to provide the status ? We can use
> >>>> one method to provide the status (Returns a boolean if update
> >> operation a
> >>>> done or not) and other to provide a list of affected rows/objects keys,
> >>>> this would be less memory intensive.
> >>>> And in case of deletion we can throw a custom exception stating " since
> >>>> this is a delete operation we cannot provide affected rows".
> >>>>>>>>> public interface UpdateStatus {
> >>>>>>>>>  public <List or Collection> getAffectedRowsKeys(); or use
> >>>> getRowsKeys(); or getKeys() throws MetaModelException;
> >>>>                   public Boolean isDone();
> >>>>>>>>> }
> >>>>
> >>>>
> >>>> I see following remote branches in git repo:
> >>>>  remotes/origin/HEAD -> origin/master
> >>>>  remotes/origin/hbase-module
> >>>>  remotes/origin/master
> >>>>
> >>>> Which is the current development branch as I couldn't find
> >> executeUpdate
> >>>> in DataContext ?
> >>>>
> >>>> On 11-Aug-2013, at 1:04 PM, Kasper Sørensen <
> >>>> [email protected]> wrote:
> >>>>
> >>>>> I agree on the reasoning here, but getting a list of affected rows
> >> may
> >>>>> prove to be both quite difficult (from consistency point of view it
> >>>>> needs to be created immediately after some update - how can that be
> >>>>> implemented in an effecient manner (lazy, since you wouldn't always
> >>>>> want to get this list, but then consistency is gone), and how to
> >>>>> guarantee that those rows are even available (e.g. after a DELETE
> >>>>> statement in a JDBC database, you cannot get the rows anymore)), and
> >>>>> too memory intensive (updates can be anything from a single record to
> >>>>> millions of records).
> >>>>>
> >>>>> That's why I think the three methods (maybe with an additional fourth
> >>>>> method, returning the total num. of affected rows) is the best
> >> initial
> >>>>> set of methods.
> >>>>>
> >>>>> 2013/8/11 Rohit <[email protected]>:
> >>>>>> Having an abstraction for communicating status would be a cleaner
> >>>> approach. As it allows us to extend the functionality if required in
> >>> future.
> >>>>>> For Example:
> >>>>>> Lets say we are executing executeUpdate in multithreaded env. then
> >> we
> >>>> can provide methods to check the status of the update. (running or
> >> done).
> >>>>>>
> >>>>>> Also rather than having methods returning int I think we should
> >> return
> >>>> a collection of Affected Rows.
> >>>>>> And rather than having multiple methods for insert/delete/update
> >> just
> >>>> have a single method returning the affected rows.
> >>>>>>
> >>>>>> I think making so will make the API simpler yet functional.
> >>>>>>
> >>>>>>>>> public interface UpdateStatus {
> >>>>>>>>> public <List or Collection> getAffectedRows(); or just use
> >>> getRows()
> >>>>>>>>> }
> >>>>>>
> >>>>>>
> >>>>>> -RS
> >>>>>>
> >>>>>> On 10-Aug-2013, at 10:09 PM, Kasper Sørensen <
> >>>> [email protected]> wrote:
> >>>>>>
> >>>>>>> My experience with returning something like an
> >>>>>>> int/String/other-simple-type is that it seems to fit the purpose at
> >>>>>>> first, but then someone asks "couldn't we also get the status of
> >>>>>>> [something], too?" ... and then you regret not encapsulating the
> >>>>>>> return value so that you could just add another getter on it.
> >>>>>>>
> >>>>>>> 2013/8/10 Henry Saputra <[email protected]>:
> >>>>>>>> We could just make the executeUpdate just return number of
> >> elements
> >>>>>>>> affected similar to relational databases.
> >>>>>>>>
> >>>>>>>> It would be simpler and serve similar purpose.
> >>>>>>>>
> >>>>>>>> - Henry
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Aug 9, 2013 at 12:15 AM, Kasper Sørensen <
> >>>>>>>> [email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Allow me to also bump this issue.
> >>>>>>>>>
> >>>>>>>>> Since this would be a change that would not be compile-time
> >>>>>>>>> incompatible with previous versions, I suggest we introduce at
> >>> least
> >>>>>>>>> the return type of the method now. Since everyone moving the
> >> Apache
> >>>>>>>>> MetaModel from the old metamodel would anyways have to recompile,
> >>>> they
> >>>>>>>>> would not feel the runtime incompatibility of void vs.
> >>>> SomeReturnType.
> >>>>>>>>>
> >>>>>>>>> By now I suggest we just keep a very simple returned interface
> >>> which
> >>>>>>>>> we can always choose to expand (implemented internally, not by
> >>>>>>>>> consumers). For instance something like:
> >>>>>>>>>
> >>>>>>>>> public interface UpdateStatus {
> >>>>>>>>> public int getInsertedRows();
> >>>>>>>>> public int getUpdatedRows();
> >>>>>>>>> public int getDeletedRows();
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> This should be quite easy to implement once and reuse for most of
> >>> the
> >>>>>>>>> modules. For some datastores it might not be information that we
> >>> can
> >>>>>>>>> retrieve, so the interface should also specify an extraordinary
> >>> rule;
> >>>>>>>>> e.g. "if -1 is returned it means that it is not known how many
> >> rows
> >>>>>>>>> where inserted/updated/deleted"
> >>>>>>>>>
> >>>>>>>>> Kasper
> >>>>>>>>>
> >>>>>>>>> 2013/7/24 Kasper Sørensen <[email protected]>:
> >>>>>>>>>> In the current API design of MetaModel, the
> >>>>>>>>> DataContext.executeUpdate(...)
> >>>>>>>>>> method is a void method. This was initially chosen because not
> >> all
> >>>>>>>>>> implementations have the capability to report anything about a
> >>>> particular
> >>>>>>>>>> update. But some do, for instance the no. of inserted, updated
> >> or
> >>>> deleted
> >>>>>>>>>> records from a JDBC call. It would be nice to expose this
> >>>> information
> >>>>>>>>> when
> >>>>>>>>>> available.
> >>>>>>>>>>
> >>>>>>>>>> My suggestion for this would be to let the
> >>>> DataContext.executeUpdate(...)
> >>>>>>>>>> method return an object with this information. All methods on
> >> the
> >>>> new
> >>>>>>>>> object
> >>>>>>>>>> type would be optionally returning null, if no information is
> >>>> available.
> >>>>>>>>> But
> >>>>>>>>>> when available, we can at least expose it this way.
> >>>>>>>>>>
> >>>>>>>>>> The change wouldn't have a major impact, since any project using
> >>>>>>>>> MetaModel
> >>>>>>>>>> would already need to recompile because of the namespace change
> >> to
> >>>>>>>>>> org.apache.metamodel. And the change would be compile-time
> >>>> compatible
> >>>>>>>>> with
> >>>>>>>>>> having a void return type.
> >>>>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>
>

Re: [DISCUSS] get back update status after invoking executeUpdate(...)

Reply via email to