Re: statistics about word distances in solr
Hello Jens,

Jens Fischer wrote:

> I was wondering if there's an option to return statistics about distances from the query terms to the most frequent terms in the result documents.

> The additional information I'm looking for is the average distance between these terms and my search term. So let's say I have two docs
>
>   the house is red
>   I live in a red house
>
> The search for house should also return the info
>
>   the:1 is:1 red:1.5 I:5 live:4

Could you explain what the distance here is? Something like edit distance?

Ah, I see: you want the textual distance between the search term and other terms in the document, and then you want that averaged, i.e. the cumulative distance divided by the number of occurrences.

I don't know whether that functionality is available. However, the sort of calculation you want to perform requires the engine not only to collect all the terms to present as facets (much improved in 1.4, as I've just learned), but also to analyze each document (if I'm not mistaken) to determine the distance of each facet term from your primary query term (or terms). The number of lookup operations is likely to scale as the product of the number of your primary search results, the number of your search terms, and the number of your facets. I assume this is an expensive operation.

Michael Ludwig
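To make the averaging described in this reply concrete, here is a minimal sketch in plain Python, outside Solr. It assumes naive whitespace tokenization, case-sensitive matching, and that each term occurrence is measured against the nearest occurrence of the query term; none of these choices come from the thread, and the function name is made up for illustration.

```python
from collections import defaultdict


def average_distances(docs, query):
    """Average token distance from each term to the nearest `query` occurrence."""
    totals = defaultdict(int)  # term -> cumulative distance
    counts = defaultdict(int)  # term -> number of occurrences measured
    for doc in docs:
        tokens = doc.split()  # assumption: whitespace tokenization
        query_positions = [i for i, t in enumerate(tokens) if t == query]
        if not query_positions:
            continue  # the document does not contain the query term
        for i, term in enumerate(tokens):
            if term == query:
                continue
            # assumption: distance to the nearest occurrence of the query term
            totals[term] += min(abs(i - q) for q in query_positions)
            counts[term] += 1
    # cumulative distance divided by the number of occurrences
    return {term: totals[term] / counts[term] for term in totals}


docs = ["the house is red", "I live in a red house"]
result = average_distances(docs, "house")
```

For the two example documents this yields the:1, is:1, red:1.5, I:5 and live:4, matching the figures in the question. The nested loops also show where the cost comes from: every term occurrence in every matching document is compared against the query term positions.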
statistics about word distances in solr
Hi,

I was wondering if there's an option to return statistics about distances from the query terms to the most frequent terms in the result documents.

At present I return the most frequent terms using facetSearch, which returns, for each word in the result documents, the number of occurrences (within the results). The additional information I'm looking for is the average distance between these terms and my search term. So let's say I have two docs

  the house is red
  I live in a red house

The search for house should also return the info

  the:1 is:1 red:1.5 I:5 live:4

and so on...

As I wasn't able to find such a function, I thought about two solutions for the problem:

1) Use facetSearch and implement a different facet.method which calculates the average distance of a word to the given search term. Solr doesn't seem to provide an interface to implement a different method, so I think this solution would be a bit dodgy and would lead to problems with the next Solr upgrade.

2) Use the TermVectorComponent, which returns the position of each word within a document, and calculate the distance based on this data in the application. But TermVectorComponent returns information per document, which means I would need to return all documents of the result set, which is probably not recommended.

My questions are:

a) Did I miss a function of Solr that already does what I'm looking for?

b) Is solution 2) feasible even if I always have to return all docs of the result set (the content doesn't need to be returned though, just the statistics)?

c) Are there interfaces to amend facetSearch in the way I described which I might have missed?

Thanks
Jens
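As a rough illustration of solution 2), the application-side calculation could be done incrementally, one document at a time, so that only running sums are kept rather than the whole result set. The sketch below assumes the per-document positions have already been extracted into a plain dict of term to list of token positions; that shape, the class name `DistanceStats`, and the nearest-occurrence rule are assumptions for illustration, not the actual TermVectorComponent response format.

```python
from collections import defaultdict


class DistanceStats:
    """Running average of token distances to a query term (hypothetical helper)."""

    def __init__(self, query):
        self.query = query
        self.totals = defaultdict(int)  # term -> cumulative distance
        self.counts = defaultdict(int)  # term -> occurrences measured

    def add_document(self, positions):
        """positions: dict mapping term -> list of token positions in one document."""
        query_positions = positions.get(self.query)
        if not query_positions:
            return  # query term absent from this document
        for term, term_positions in positions.items():
            if term == self.query:
                continue
            for p in term_positions:
                # assumption: measure against the nearest query occurrence
                self.totals[term] += min(abs(p - q) for q in query_positions)
                self.counts[term] += 1

    def averages(self):
        return {t: self.totals[t] / self.counts[t] for t in self.totals}


stats = DistanceStats("house")
# Positions for "the house is red" and "I live in a red house"
stats.add_document({"the": [0], "house": [1], "is": [2], "red": [3]})
stats.add_document({"I": [0], "live": [1], "in": [2], "a": [3], "red": [4], "house": [5]})
averages = stats.averages()
```

Because each document is folded into the totals as it arrives, the result set could be fetched page by page and the positions discarded after processing; whether that is fast enough for large result sets is a separate question.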