On Mar 10, 2014, at 15:12, Sanne Grinovero <sa...@infinispan.org> wrote:
> Ok, you make some good points, and I've no doubt about it being useful.
>
> My only concern is that this could slow us down significantly in
> providing other features which might be even more useful or pressing.
> You have to pick your battles and be wise about where to spend energy
> first.
>
> Considering that it's easier to add methods than to remove them, what
> would you think of marking this as experimental for now?
> I'd prefer to see the non-indexed query engine delivered first; this
> sounds like a stone on the critical path, so it might be wise to
> have the option to drop the requirement from a first implementation.
> You're definitely right that we should then implement "some" COUNT
> strategy; I'm just not comfortable committing to this one yet.

I can imagine a lot of users emulating this by simply iterating over the
entries in the result set. Even if we do just that and document it as
slow, I think it's still worth exposing this somewhere.

>
> Now, on a general-purpose COUNT: we surely need one, but it's a
> Pandora's box you're opening. In a sense there is a conceptual
> parallel with my concerns about the API contract we provide for the
> clear() method. To keep it short, since we're changing subject:
> I don't think we'll ever be able to guarantee a fully reliable value.
> Indexes are not yet updated transactionally, and M/R does cross the
> boundaries of nodes and datacontainer/cachestore without taking a
> consistent read snapshot. We should document any such API as
> providing a best-effort estimate.
>
> On 10 March 2014 13:16, Adrian Nistor <anis...@redhat.com> wrote:
>> I'd vote for keeping it, and executing it lazily in environments where it is
>> costly to compute it upfront.
>>
>> And of course, document this properly so users will be aware it can incur a
>> second execution, with significant performance impact and also possibly a
>> data visibility/consistency impact.
>> I'd do this because the API is meant to
>> be first of all user-friendly and useful, not just machine-friendly and
>> efficient.
>>
>> There's another reason for having it. Say we remove it: how will users be
>> able to know the total number of matching results? Our DSL does not
>> currently have a 'count' function. Maybe we should add such a thing first,
>> and then think about removing Query.getResultSize().
>>
>> But if we implement a proper 'count', getResultSize() could be trivially
>> implemented as some kind of syntactic sugar on top of it, so I would still
>> consider it worth keeping in the API.
>>
>> And then it all boils down to the question: should the DSL provide a count
>> function? (+1 from me)
>>
>> Cheers
>>
>> On 03/10/2014 02:23 PM, Sanne Grinovero wrote:
>>
>> Hi all,
>> we are exposing a nice feature inherited from the Search engine via
>> the "simple" DSL version, the one which is also available via Hot Rod:
>>
>> org.infinispan.query.dsl.Query.getResultSize()
>>
>> To be fair, I hadn't noticed we expose this; I only noticed after a
>> recent PR review, and I found it surprising.
>>
>> This method returns the size of the full result set, disregarding
>> pagination options; you can imagine it fitting situations like:
>>
>> "found 6 million matches, these are the top 20: "
>>
>> A peculiarity of Hibernate Search is that the total number of matches
>> is extremely cheap to figure out, as it's generally a side effect of
>> finding the 20 results. Essentially we're just exposing an int value
>> which was already computed: very cheap, and it happens to be useful in
>> practice.
>>
>> This is not the case with a SQL statement: there you'd have to
>> craft 2 different SQL statements, often incurring the cost of 2 round
>> trips to the database. So getResultSize() is not available on the
>> Hibernate ORM Query, only on our FullTextQuery extension.
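To make the "side effect" point concrete, here is a minimal sketch of a single pass that keeps the best k matches in a bounded min-heap while counting every match along the way. It is not Hibernate Search or Lucene code: the class `TopKWithCount`, its `search` method, and the use of a plain list of integer scores in place of an index are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntPredicate;

// Hypothetical illustration: not Hibernate Search or Lucene code.
final class TopKWithCount {

    // Result of one pass: the best k matches plus the total number of matches.
    static final class Page {
        final List<Integer> topK;
        final int totalMatches;

        Page(List<Integer> topK, int totalMatches) {
            this.topK = topK;
            this.totalMatches = totalMatches;
        }
    }

    // Visits every candidate once. Each match bumps the counter and competes
    // for a slot in a bounded min-heap, so the total comes with the top k.
    static Page search(List<Integer> scores, IntPredicate matches, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // smallest kept score on top
        int total = 0;
        for (int score : scores) {
            if (!matches.test(score)) {
                continue;
            }
            total++;
            if (heap.size() < k) {
                heap.add(score);
            } else if (k > 0 && score > heap.peek()) {
                heap.poll(); // evict the weakest of the current top k
                heap.add(score);
            }
        }
        List<Integer> top = new ArrayList<>(heap);
        top.sort(Comparator.reverseOrder()); // highest score first
        return new Page(top, total);
    }

    public static void main(String[] args) {
        List<Integer> scores = List.of(5, 42, 17, 8, 99, 23, 61, 3);
        Page page = search(scores, s -> s >= 10, 3);
        System.out.println("found " + page.totalMatches + " matches, these are the top "
                + page.topK.size() + ": " + page.topK);
    }
}
```

Running `main` prints `found 5 matches, these are the top 3: [99, 61, 42]`. The count costs one increment per match because the scan already visits every match to rank it; an engine whose plan can stop after the first k rows gets no such free total.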
>>
>> Now my doubt is whether it is indeed wise to expose this method on
>> the simplified DSL. Of course some people might find it useful; still,
>> I'm wondering how much we'll be swearing at having to maintain this
>> feature, compared to its usefulness, once we implement alternative
>> execution engines to run queries, not least Map/Reduce-based filtering,
>> and ultimately hybrid strategies.
>>
>> In the case of Map/Reduce I think we'll need to keep track of possible
>> de-duplication of results; in the case of a Teiid integration it might
>> need a second expensive query. So in these cases I'd expect this method
>> to be lazily evaluated.
>>
>> Should we rather remove this functionality?
>>
>> Sanne
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
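The lazy evaluation suggested in this thread can be sketched as a small wrapper that runs a count strategy only when `getResultSize()` is first called and then caches the value, with a count-by-iteration fallback of the kind described above for engines that offer nothing cheaper. This is a hypothetical sketch, not Infinispan's `Query` API: `LazyResultSize`, `countByIteration` and `evaluations` are invented names, and a list of strings stands in for cache entries.

```java
import java.util.List;
import java.util.function.IntSupplier;
import java.util.function.Predicate;

// Hypothetical sketch: not Infinispan's Query API.
final class LazyResultSize {

    private final IntSupplier countStrategy; // e.g. a second query, or a full scan
    private boolean computed;
    private int cachedSize;
    private int evaluations; // tracked only to show the strategy runs at most once

    LazyResultSize(IntSupplier countStrategy) {
        this.countStrategy = countStrategy;
    }

    // Runs the (possibly expensive) count only on first request, then reuses it.
    // The cached value is a best-effort snapshot: later writes are not reflected.
    synchronized int getResultSize() {
        if (!computed) {
            cachedSize = countStrategy.getAsInt();
            evaluations++;
            computed = true;
        }
        return cachedSize;
    }

    synchronized int evaluations() {
        return evaluations;
    }

    // Fallback "count by iteration": what users would otherwise do by hand.
    static <T> IntSupplier countByIteration(Iterable<T> entries, Predicate<? super T> filter) {
        return () -> {
            int n = 0;
            for (T entry : entries) {
                if (filter.test(entry)) {
                    n++;
                }
            }
            return n;
        };
    }

    public static void main(String[] args) {
        List<String> entries = List.of("apple", "avocado", "banana", "apricot");
        LazyResultSize size = new LazyResultSize(countByIteration(entries, s -> s.startsWith("a")));
        System.out.println(size.getResultSize()); // 3
        System.out.println(size.getResultSize()); // 3, served from the cached value
        System.out.println(size.evaluations());   // 1
    }
}
```

Caching the first answer makes repeated calls free, but the value is only a best-effort snapshot, consistent with the point above that no fully reliable count can be guaranteed.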