Actually, it's expected that every element will be matched by at least one 
query. This is a classification application, and the intent of the application 
is that every element of interest will be classified. Many, if not most, of the 
queries depend on word-search features, e.g., stemmed matches, case 
insensitivity, etc. 
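To make that concrete, here is a minimal Python sketch of the kind of stemmed, 
case-insensitive matching the queries rely on. The stem function, query table, 
and classify helper are hypothetical stand-ins for illustration only, not the 
engine's actual stemmer or query API:

```python
import re

# Hypothetical stand-in for stemmed, case-insensitive word matching.
# A real system would use the search engine's stemmer; here we crudely
# strip common suffixes just to illustrate the idea.
def stem(word):
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def matches(query_terms, text):
    # Case-insensitive, stemmed containment check.
    stems = {stem(w) for w in re.findall(r"\w+", text)}
    return all(stem(t) in stems for t in query_terms)

# Each classification is a query: here, simply a set of required terms.
queries = {"safety": ["warn", "hazard"], "install": ["install", "step"]}

def classify(element_text):
    return [label for label, terms in queries.items()
            if matches(terms, element_text)]
```

So "Installing in three steps" would classify as "install" even though no 
query term appears verbatim, which is the behavior the word-search features 
buy us.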

I’m new to this project so it may be that there is a better way to approach the 
problem in general. This is the system as currently implemented.

My overall charge is to improve throughput performance, so my first task is 
to understand what the performance bottlenecks are and then identify possible 
solutions.

It seems unlikely that we’ve done something silly in our queries or ML 
configuration, but I want to eliminate the easy-to-fix possibilities before 
exploring more complicated options. 
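For context, the per-element flow described in the quoted message below has 
roughly this shape. This is a Python sketch with hypothetical names; simple 
substring tests stand in for the actual reverse-query lookups against the 
stored queries:

```python
# Rough shape of the current per-element classification loop
# (hypothetical names; membership tests stand in for the actual
# reverse-query lookups against the stored queries).

def candidate_elements(doc):
    # On the order of 10-20 elements of interest per document.
    return doc["elements"]

def matching_queries(element_text, stored_queries):
    # One reverse-query lookup per element: which stored queries
    # match this node?
    return [name for name, term in stored_queries.items()
            if term in element_text.lower()]

def classify_document(doc, stored_queries):
    # Every element of interest is expected to match at least one query.
    return {i: matching_queries(el, stored_queries)
            for i, el in enumerate(candidate_elements(doc))}
```

The point being that the lookup cost scales with the number of candidate 
elements per document, which is why I want to measure where the time 
actually goes before restructuring anything.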

Cheers,

Eliot

--
Eliot Kimber
http://contrext.com
 


On 5/1/17, 12:10 PM, "Jason Hunter" <[email protected] on 
behalf of [email protected]> wrote:

    > The processing is, for each document to be processed, examine on the
    > order of 10-20 elements to see if they match the reverse query by
    > getting the node to be looked up and then doing:
    
    Maybe you can reverse query on the document as a whole instead of
    running 20 reverse queries per document.  Only bother with the
    enumeration of the 20 if there's a proven hit within the document.
    
    (I assume the vast majority of the time there aren't going to be
    hits.  If that's true, then why not prove that in one pop instead
    of 20 pops.)
    
    -jh-
    
    _______________________________________________
    General mailing list
    [email protected]
    Manage your subscription at: 
    http://developer.marklogic.com/mailman/listinfo/general
    
    

