Actually, it's expected that every element will be matched by at least one query. This is a classification application, and the intent is that every element of interest will be classified. Many, if not most, of the queries depend on word-search features, e.g., stemmed matches, case insensitivity, etc.
I’m new to this project, so there may be a better way to approach the problem in general. This is the system as currently implemented. My overall charge is to improve throughput, so my first task is to understand what the performance bottlenecks are and then identify possible solutions. It seems unlikely that we’ve done something silly in our queries or ML configuration, but I want to eliminate the easy-to-fix problems before exploring more complicated options.

Cheers,

Eliot
--
Eliot Kimber
http://contrext.com

On 5/1/17, 12:10 PM, "Jason Hunter" <[email protected] on behalf of [email protected]> wrote:

> The processing is, for each document to be processed, examine on the order of 10-20 elements to see if they match the reverse query by getting the node to be looked up and then doing:

Maybe you can reverse query on the document as a whole instead of running 20 reverse queries per document. Only bother with the enumeration of the 20 if there's a proven hit within the document. (I assume the vast majority of the time there's not going to be hits. If that's true then why not prove that in one pop instead of 20 pops.)

-jh-

_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
