On 23 June 2014 at 08:54:57, Anže Starič ([email protected]) wrote:
On Mon, Jun 23, 2014 at 2:12 AM, Antonia Horincar <[email protected]> wrote:

>> How about the case when the admin switches from one search backend to
>> another, shouldn't the appropriate index be populated with all existing
>> resources in BH? This is mainly why I was thinking I need to implement my
>> own admin commands.
>
> This can also be done using the bhsearch admin command. When the admin
> decides to switch search backends, they need to modify the search_backend
> setting in trac.ini and run trac-admin bhsearch upgrade. This looks like a
> reasonable workflow to me. We could extend the environment_needs_upgrade
> method in BloodhoundSearchApi to monitor for a backend change and request an
> environment upgrade when one occurs, but I do not think that this is a
> priority.

I didn't know that we could achieve this using bhsearch upgrade; that's why I thought I needed to implement another admin command. But that worked perfectly.

>> I am currently working on displaying the retrieved Solr results in the
>> interface. The results are currently shown in the interface, but I am
>> working on applying highlighting and faceting.
>>
>> Also, I have a question regarding the meta keyword parsers. How are the
>> DocTypeMetaKeywordParser and the other keyword parsers from
>> bhsearch.query_parser used?
>
> MetaKeywordParsers are just match_and_replace rules for words beginning with
> a $. They are used in MetaKeywordPlugin, which could be summarized as: find
> all words that begin with a $ using a regexp match, and pass each word to the
> MetaKeywordParsers; if any of them knows the keyword, it will return some
> text, which you use to replace the keyword string in the original query.
>
>> I understood in general what the DefaultQueryParser does, however I'm not
>> sure I get how parser plugins are used in Whoosh. I would like to
>> understand the query_parser module better, because I used the
>> DefaultQueryParser for parsing the query. I'm not sure if this is a good
>> idea because basically it uses Whoosh for parsing the query, but it was
>> easier for the moment. Should I try to implement my own query parser for
>> Solr?
>
> If I understand correctly, Solr expects the query as a string, which it then
> parses internally. If it is not too hard to reconstruct the query from
> Whoosh, I would use the existing query parser, so you can reuse the existing
> security processing and meta keyword parsing.

It's actually easier to reconstruct the query from Whoosh (by accessing attributes of the query objects created by Whoosh), because otherwise I would have to implement a parser to correctly parse a raw query, which in my opinion is much more difficult to achieve.

> If you want to know more about how Whoosh parses queries, here is a short
> description. Whoosh parses queries with a bunch of match and filter plugins.
> Match plugins try to match a word against a predefined regular expression
> and emit a node class upon a match. Filters then modify the generated list
> of nodes to group nodes based on operator priority, manage terms without
> defined fields, etc. MetaKeywordPlugin is both a matcher and a filter. It
> matches all words starting with a $ and passes them to the
> MetaKeywordParsers. If a MetaKeywordParser understands a keyword, it expands
> it into a new string ($ticket -> type:ticket), which is again parsed by
> Whoosh. The parsed representation of the expanded meta keyword is stored
> inside a MetaKeywordNode. In the filter phase, MetaKeywordPlugin "flattens"
> the meta keywords (replaces the meta keyword nodes with the parsed
> representation of the expanded text).

Thanks for the description, this really helped me understand more about the Whoosh query parser.

> Anze
>
> [1] https://github.com/apache/bloodhound/blob/trunk/bloodhound_search/bhsearch/api.py#L402

I am currently working on adding a "More like this" feature. At first, I was thinking of automatically displaying similar query results when a query has been made.
But due to some limitations of Sunburnt, this would mean making two separate requests to Solr: one for getting the query results, and one for getting the results that are similar to the initially retrieved ones. Would it be better to have a "More like this" button next to the query results, so that a new request is made to Solr only when a user chooses to see similar results? I began implementing the ITemplateStreamFilter interface for adding a button on the search results page, but haven't successfully finished yet.

Also, you might have noticed that I called the paginate(rows=20000) method on the query chain (in the bhsolr.solr_backend query() method). By default, Solr fetches only 10 documents when a query is made, and there is no way to fetch all query results [1]. I was thinking of a reasonable solution, but I would like your opinion on this matter.

One solution would be to specify a maximum number of results to retrieve, as close as possible to the number of documents stored in the index. I could keep track of how many documents were added to the index and update the 'rows' parameter every time the number changes. Another solution would be to make multiple smaller queries (with the 'rows' parameter set to the maximum number of results per page) until all results have been fetched. How should I proceed in implementing this?

Thanks,
Antonia

[1] https://issues.apache.org/jira/browse/SOLR-534
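The "multiple smaller queries" option above could be sketched roughly as follows. This is only an illustration of the paging loop, not the actual bhsolr code: the `search` callable stands in for whatever the Sunburnt query chain exposes, and the `start`/`rows` parameter names mirror Solr's query parameters, which the real wrapper may spell differently.

```python
# Sketch: page through results with start/rows until a page comes back
# short, instead of guessing one huge 'rows' value up front.

def fetch_all(search, page_size=100):
    """Yield every matching document, one page of `page_size` at a time."""
    start = 0
    while True:
        page = search(start=start, rows=page_size)
        for doc in page:
            yield doc
        if len(page) < page_size:  # a short page means no more results
            break
        start += page_size

# Toy backend standing in for Solr, just to demonstrate the loop:
def make_fake_backend(docs):
    def search(start, rows):
        return docs[start:start + rows]
    return search

if __name__ == "__main__":
    backend = make_fake_backend(list(range(250)))
    results = list(fetch_all(backend, page_size=100))
    print(len(results))  # 250, fetched as pages of 100 + 100 + 50
```

One caveat: when the result count is an exact multiple of the page size, this loop issues one extra request that returns a full final page followed by an empty one; comparing against a total-hits count from the response, if the backend exposes it, would avoid that.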

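As a postscript, Anže's description of the MetaKeywordParsers above can be illustrated with a minimal match-and-replace sketch on a raw query string. The rule table and function names here are invented for illustration; the real bhsearch plugin operates on Whoosh parser nodes (MetaKeywordNode) rather than on strings, and re-parses each expansion.

```python
import re

# Illustration only: $keyword expansion in the spirit of the
# MetaKeywordParsers described above ($ticket -> type:ticket).
META_KEYWORDS = {
    "ticket": "type:ticket",
    "wiki": "type:wiki",
}

_META_RE = re.compile(r"\$(\w+)")

def expand_meta_keywords(query):
    """Replace each $keyword the table knows; leave unknown ones alone."""
    def replace(match):
        expansion = META_KEYWORDS.get(match.group(1))
        return expansion if expansion is not None else match.group(0)
    return _META_RE.sub(replace, query)

print(expand_meta_keywords("$ticket component:search"))
# -> type:ticket component:search
```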