Hey Michael,

On Oct 12, 2011, at 9:43 AM, Starch, Michael D (388L) wrote:
>
> Here is another related question:
>
> In our branch we have three catalog functions that have very similar database
> back ends, Query, ComplexQuery, and PagedQuery. Unfortunately, complexQuery
> performs its work by first running "query" and then running individual
> metadata requests for each id returned. This is inefficient from a database
> perspective as you are running many many queries, when a single query would
> suffice (and that single query was run once to get the list to begin with).

All of those functions perform that way, right? IOW, don't query and
pagedQuery also work that way? But yes, I believe we could optimize it by
reducing the number of times we have to query.

>
> According to our DBA we will see big gains if we eliminate this loop, and the
> complexities of sorting the metadata have been solved (that was what yielded
> my previous question).

It would be really awesome to do some metrics on this, because what's
interesting is that the WHERE clause fields on the subsequent queries should
be over productIDs, which themselves are indexed and thus should not be too
computationally expensive. They certainly involve computation, but I wonder
how much optimization you'll gain at the cost of trying to engineer around
this, and whether the gain is negligible.

> Unfortunately, that means moving some complex query code into the catalog,
> and thus needing to return 2 completely different types from one method
> "query".

I think the longer term solution would be to make complexQuery itself a
paged method, and maybe even to get rid of complexQuery, and evolve
pagedQuery to take a ComplexQuery object (right now it takes a Query, but
ComplexQuery extends Query, right?). Yes, this would involve making the
other catalogs support this, but it's probably more architecturally sound in
the end. On the other hand, it doesn't make your life a whole lot easier, so
I could understand if your answer was: "Don't have time at this point."
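
To make the proposal above concrete, here is a minimal sketch of a paged
method that accepts the base query type, so a complex query can be passed in
without a new interface method. Every class below (Query, ComplexQuery, Page,
the sortDescending flag, and the in-memory list of matching IDs) is a
hypothetical stand-in invented for illustration, not the file manager's
actual API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins; these are NOT the real filemgr structs classes.
class PagedQuerySketch {

    static class Query {
        final List<String> criteria = new ArrayList<>();
    }

    // Assumption taken from the thread: ComplexQuery extends Query.
    static class ComplexQuery extends Query {
        boolean sortDescending; // illustrative extra option a plain Query lacks
    }

    static class Page {
        final int pageNum;
        final int totalPages;
        final List<String> productIds;

        Page(int pageNum, int totalPages, List<String> productIds) {
            this.pageNum = pageNum;
            this.totalPages = totalPages;
            this.productIds = productIds;
        }
    }

    // Takes the base type, so existing callers that pass a Query keep working,
    // while a ComplexQuery gets its extra options honored and is still paged.
    // matchingIds stands in for whatever the catalog's single query returned.
    static Page pagedQuery(Query query, List<String> matchingIds,
                           int pageNum, int pageSize) {
        List<String> ids = new ArrayList<>(matchingIds);
        if (query instanceof ComplexQuery
                && ((ComplexQuery) query).sortDescending) {
            ids.sort(Comparator.reverseOrder());
        }
        int totalPages = (ids.size() + pageSize - 1) / pageSize;
        // Pages are 1-based; clamp so an out-of-range page is simply empty.
        int from = Math.min((pageNum - 1) * pageSize, ids.size());
        int to = Math.min(from + pageSize, ids.size());
        return new Page(pageNum, totalPages, new ArrayList<>(ids.subList(from, to)));
    }

    public static void main(String[] args) {
        ComplexQuery q = new ComplexQuery();
        q.sortDescending = true;
        Page p = pagedQuery(q, List.of("a", "b", "c", "d", "e"), 1, 2);
        System.out.println(p.productIds + " of " + p.totalPages + " pages");
    }
}
```

The point of the sketch is only the method signature: because the parameter
is the base type, catalogs that do not understand the extra options can
ignore them, which is why this route avoids breaking the older interface.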

> My first instinct is to add a complexQuery method to the catalog interface
> (bad as it breaks older interfaces),

Yep, I wouldn't be in support of that at the Catalog level.

> or sub-interface the catalog interface and add this method (better because
> old catalogs would work as they do now), but seeing as you would like me to
> move this feature up to apache (assuming we can properly page it), perhaps
> you have a better solution that will keep our branch more compatible with
> apache, so I have less work to do to migrate my changes.

What do you think of my proposal above? To evolve pagedQuery to understand
ComplexQuery (and thus to get the advantage of having complex queries be
paged, which we're currently missing).

Cheers,
Chris

>
> On 11.10.2011, at 21:00, Chris A Mattmann wrote:
>
>> Hi All,
>>
>> On Oct 11, 2011, at 7:40 PM, Brian Foster wrote:
>>
>>> the problem with implementing an equals and hashCode function for the
>>> Product object is that it is not always created from db data... many of the
>>> objects in the structs package are 'fill what I know at the moment'... no
>>> guarantee that any one member variable in the object will always be set...
>>
>> I totally agree with Brian on this. The lifecycle of any one of the FM
>> objects in the o.a.oodt.cas.filemgr.structs (and furthermore in any
>> o.a.oodt.*.structs package) is that any of the fields of the object may (or
>> may not) be filled at any point in time. It really depends on the lifecycle
>> of the object, and the downstream use of them in a service, in the core, or
>> in some extension point. The objects are meant to be light-weight, and not
>> representative of the *full* set of information at any point in time unless
>> absolutely necessary (thereby lowering the total system footprint, etc.),
>> making it more light-weight, etc.
>>
>>> for instance when a Product is created on the client side for an ingest the
>>> productId is not set until after ingestion...
>>> also the current trunk
>>> filemgr's Product object doesn't have an ingested or received time attached
>>> to it... at least the last time I checked it didn't... lol...
>>
>> +1, you are right, it's still that way, for the above stated reasons.
>>
>>> so an equals method which say just checked against productId and
>>> productName could give a false positive in some cases... for example making
>>> two sequential calls to getProductById() then calling equals (assuming we
>>> implemented it) on the 2 Product objects returned would return true... but
>>> if the Product was updated between the 2 calls, equals really should return
>>> false because the first Product object is out of date...
>>
>> +1
>>
>>> and doing a deep equals on the Product object would make the operation
>>> expensive... the Product object is more meant to be an information
>>> carrier... I would recommend storing your Products in a Map<String,Product>
>>> where the String key is ProductId
>>
>> +1, agreed. Using a Map<String, Product> structure is a good way to obviate
>> this, and then to define some locally unique key function inside of that
>> map (or accept the uniqueness of the product ID which *should be* unique at
>> least within a single FM catalog).
>>
>> Cheers,
>> Chris
>>
>>> On Oct 11, 2011, at 4:46 PM, "Starch, Michael D (388L)"
>>> <[email protected]> wrote:
>>>
>>>> Chris et al.,
>>>>
>>>> Do you see any problems overriding the default equals, and hashCode
>>>> methods in the Product class (checking by memory address/reference) to
>>>> something that checks to see if the products logically represent the same
>>>> thing (same id, name, etc)?
>>>>
>>>> My issue is the following, I receive data back from the database, with
>>>> multiple lines representing a single product (this is a database thing,
>>>> and the desired behavior).
>>>> Thus if I iterate across the results, I will
>>>> get multiple Product objects that represent one real Product (and contain
>>>> equivalent member variables). In essence they are the same "Product". I
>>>> can write cleaner, faster, code to combine the results, if I can test them
>>>> for equality and hash them directly, without first pulling out the
>>>> productName, or Id.
>>>>
>>>> This will be a problem if there is some code that expects two "Products"
>>>> that have identical member variables to fail the equality test if they are
>>>> distinct objects.
>>>>
>>>> Thanks,
>>>>
>>>> -Michael
>>>>
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: [email protected]
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
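
For reference, the Map<String, Product> approach recommended in the quoted
thread could look roughly like the sketch below: rows from a single joined
query are folded into one object per product ID, with no need for equals or
hashCode on Product itself. The Product and Row classes and their fields are
hypothetical simplifications made up for this example, not the real filemgr
structs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; these are NOT the real filemgr structs classes.
class ProductRowMerger {

    // Light-weight information carrier, keyed externally by its ID.
    static class Product {
        final String productId;
        final String productName;
        final Map<String, List<String>> metadata = new LinkedHashMap<>();

        Product(String productId, String productName) {
            this.productId = productId;
            this.productName = productName;
        }
    }

    // One database row: the product columns repeated once per metadata value.
    static class Row {
        final String productId;
        final String productName;
        final String key;
        final String value;

        Row(String productId, String productName, String key, String value) {
            this.productId = productId;
            this.productName = productName;
            this.key = key;
            this.value = value;
        }
    }

    // Folds many rows into one Product per ID. Uniqueness comes from the
    // String key, so Product keeps its default reference equality.
    // LinkedHashMap preserves the order in which products first appear.
    static Map<String, Product> merge(List<Row> rows) {
        Map<String, Product> byId = new LinkedHashMap<>();
        for (Row row : rows) {
            Product p = byId.computeIfAbsent(
                    row.productId, id -> new Product(id, row.productName));
            p.metadata.computeIfAbsent(row.key, k -> new ArrayList<>())
                      .add(row.value);
        }
        return byId;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("p1", "A", "Filename", "a.dat"),
                new Row("p1", "A", "Size", "10"),
                new Row("p2", "B", "Filename", "b.dat"));
        for (Product p : merge(rows).values()) {
            System.out.println(p.productId + " " + p.metadata);
        }
    }
}
```

This assumes the product ID is unique within a single catalog, as noted in
the thread; if it is not, the map key would need to be some other locally
unique value.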
