Thanks for the answer. That was helpful. I was sooo wrong.
On 7/7/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Briggs wrote:
> > Please keep this thread going as I am also curious to know why this
> > has been 'forked'. I am sure that most of this lies within the
> > original OPIC filter but I still can't understand why straight forward
> > lucene queries have not been used within the application.
>
> No, this has actually almost nothing to do with the scoring filters
> (which were added much later).
>
> The decision to use a different query syntax than the one from Lucene
> was motivated by a few reasons:
>
> * to avoid the need to support low-level index and searcher operations,
>   which the Lucene API would require us to implement.
>
> * to keep the Nutch core largely independent of Lucene, so that it's
>   possible to use Nutch with different back-end searcher implementations.
>   This started to materialize only now, with the ongoing effort to use
>   Solr as a possible backend.
>
> * to limit the query syntax to those queries that provide best tradeoff
>   between functionality and performance, in a large-scale search engine.
>
> >
> > On 7/6/07, Kai_testing Middleton <[EMAIL PROTECTED]> wrote:
>
> >> Ok, so I guess what I don't understand is what is the "Nutch query
> >> syntax"?
>
> Query syntax is defined in an informal way on the Help page in
> nutch.war, or here:
>
> http://wiki.apache.org/nutch/Features
>
> Formal syntax definition can be gleaned from
> org.apache.nutch.analysis.NutchAnalysis.jj.
>
> >>
> >> The main discussion I found on nutch-user is this:
> >> http://osdir.com/ml/search.nutch.devel/2004-02/msg00007.html
> >> I was wondering why the query syntax is so limited.
> >> There are no OR queries, there are no fielded queries,
> >> or fuzzy, or approximate... Why? The underlying index
> >> supports all these operations.
>
> Actually, it's possible to configure Nutch to allow raw field queries -
> you need to add a raw field query plugin for this. Please see
> RawFieldQueryFilter class, and existing plugins that use fielded
> queries: query-site, and query-more. Query-more / DateQueryFilter is
> especially interesting, because it shows how to use raw token values
> from a parsed query to build complex Lucene queries.
>
> >>
> >> I notice by looking at the or.patch file
> >> (https://issues.apache.org/jira/secure/attachment/12360659/or.patch)
> >> that one of the programs under consideration is:
> >> nutch/searcher/Query.java
> >> The code for this is distinct from
> >> lucene/search/Query.java
>
> See above - they are completely different classes, with completely
> different purpose. The use of the same class name is unfortunate and
> misleading.
>
> Nutch Query class is intended to express queries entered by search
> engine users, in a tokenized and parsed way, so that the rest of Nutch
> may deal with Clauses, Terms and Phrases instead of plain String-s.
>
> On the other hand, Lucene Query is intended to express arbitrarily
> complex Lucene queries - many of these queries would be prohibitively
> expensive for a large search engine (e.g. wildcard queries).
>
> >>
> >> It looks like this is an architecture issue that I don't understand.
> >> If nutch is an "extension" of lucene, why does it define a different
> >> Query class?
>
> Nutch is NOT an extension of Lucene. It's an application that uses
> Lucene as a library.
>
> >> Why don't we just use the Lucene code to query the
> >> indexes? Does this have something to do with the nutch webapp
> >> (nutch.war)? What is the historical genesis of this issue (or is that
> >> even relevant)?
>
> Nutch webapp doesn't have anything to do with it. The limitations in the
> query syntax have different roots (see above).
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>

--
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
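
The distinction drawn in the quoted reply (a user-entered query held as parsed clauses of terms and phrases, rather than as an arbitrary Lucene query) can be pictured with a small Java sketch. Everything in it is hypothetical: the class UserQuerySketch, its Clause type and the parse method are invented for illustration and are not the org.apache.nutch.searcher.Query API. The syntax it accepts (bare words, double-quoted phrases, a leading '-' to exclude a clause) is likewise an assumption about a deliberately restricted search syntax, not a statement of what NutchAnalysis.jj defines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; NOT the real Nutch or Lucene Query classes.
final class UserQuerySketch {

    /** One parsed clause: a single term, or a phrase made of several words. */
    static final class Clause {
        final List<String> words;   // one word = term, several words = phrase
        final boolean prohibited;   // true when the clause was prefixed with '-'

        Clause(List<String> words, boolean prohibited) {
            this.words = words;
            this.prohibited = prohibited;
        }

        boolean isPhrase() {
            return words.size() > 1;
        }

        @Override
        public String toString() {
            return (prohibited ? "-" : "") + (isPhrase() ? "phrase" : "term") + words;
        }
    }

    /** Splits a user-entered string into clauses (assumed syntax, see lead-in). */
    static List<Clause> parse(String input) {
        List<Clause> clauses = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            boolean prohibited = false;
            if (c == '-') {
                prohibited = true;
                i++;
                // A '-' with nothing attached to it is simply dropped.
                if (i >= input.length() || Character.isWhitespace(input.charAt(i))) {
                    continue;
                }
                c = input.charAt(i);
            }
            List<String> words = new ArrayList<>();
            if (c == '"') {
                int end = input.indexOf('"', i + 1);
                if (end < 0) {
                    end = input.length();   // unterminated quote: take the rest
                }
                for (String w : input.substring(i + 1, end).trim().split("\\s+")) {
                    if (!w.isEmpty()) {
                        words.add(w.toLowerCase(Locale.ROOT));
                    }
                }
                i = Math.min(end + 1, input.length());
            } else {
                int end = i;
                while (end < input.length() && !Character.isWhitespace(input.charAt(end))) {
                    end++;
                }
                words.add(input.substring(i, end).toLowerCase(Locale.ROOT));
                i = end;
            }
            if (!words.isEmpty()) {
                clauses.add(new Clause(words, prohibited));
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        for (Clause clause : parse("apache \"search engine\" -lucene")) {
            System.out.println(clause);
        }
    }
}
```

Code downstream of such a parser only ever sees Clause objects, so a separate translation step could map each clause onto whatever the back end needs. Because the parser never produces wildcard or fuzzy clauses, those expensive query types cannot reach the index by accident, which is roughly the tradeoff the reply describes.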
