Thanks for the answer. That was helpful.

I was sooo wrong.

On 7/7/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Briggs wrote:
> > Please keep this thread going as I am also curious to know why this
> > has been 'forked'. I am sure that most of this lies within the
> > original OPIC filter but I still can't understand why straightforward
> > Lucene queries have not been used within the application.
>
> No, this has actually almost nothing to do with the scoring filters
> (which were added much later).
>
> The decision to use a different query syntax than the one from Lucene
> was motivated by a few reasons:
>
> * to avoid the need to support low-level index and searcher operations,
> which the Lucene API would require us to implement.
>
> * to keep the Nutch core largely independent of Lucene, so that it's
> possible to use Nutch with different back-end searcher implementations.
> This has only now started to materialize, with the ongoing effort to
> use Solr as a possible backend.
>
> * to limit the query syntax to those queries that provide the best
> tradeoff between functionality and performance in a large-scale
> search engine.
>
>
> > On 7/6/07, Kai_testing Middleton <[EMAIL PROTECTED]> wrote:
>
> >> Ok, so I guess what I don't understand is what is the "Nutch query
> >> syntax"?
>
> The query syntax is defined informally on the Help page in
> nutch.war, or here:
>
> http://wiki.apache.org/nutch/Features
>
> The formal syntax definition can be gleaned from
> org.apache.nutch.analysis.NutchAnalysis.jj.
>
>
>
> >>
> >> The main discussion I found on nutch-user is this:
> >> http://osdir.com/ml/search.nutch.devel/2004-02/msg00007.html
> >>     I was wondering why the query syntax is so limited.
> >>     There are no OR queries, there are no fielded queries,
> >>     or fuzzy, or approximate... Why? The underlying index
> >>     supports all these operations.
>
>
> Actually, it's possible to configure Nutch to allow raw field queries -
> you need to add a raw field query plugin for this. Please see the
> RawFieldQueryFilter class, and the existing plugins that use fielded
> queries: query-site and query-more. Query-more / DateQueryFilter is
> especially interesting, because it shows how to use raw token values
> from a parsed query to build complex Lucene queries.
>
>
> >>
> >> I notice by looking at the or.patch file
> >> (https://issues.apache.org/jira/secure/attachment/12360659/or.patch)
> >> that one of the programs under consideration is:
> >> nutch/searcher/Query.java
> >> The code for this is distinct from
> >> lucene/search/Query.java
>
>
> See above - they are completely different classes, with completely
> different purposes. The use of the same class name is unfortunate and
> misleading.
>
> The Nutch Query class is intended to express queries entered by search
> engine users, in a tokenized and parsed form, so that the rest of Nutch
> can deal with Clauses, Terms and Phrases instead of plain Strings.
>
> The Lucene Query class, on the other hand, is intended to express
> arbitrarily complex Lucene queries - many of these queries would be
> prohibitively expensive for a large search engine (e.g. wildcard
> queries).
>
>
> >>
> >> It looks like this is an architecture issue that I don't understand.
> >> If nutch is an "extension" of lucene, why does it define a different
> >> Query class?
>
> Nutch is NOT an extension of Lucene. It's an application that uses
> Lucene as a library.
>
>
> >>  Why don't we just use the Lucene code to query the
> >> indexes?  Does this have something to do with the nutch webapp
> >> (nutch.war)?  What is the historical genesis of this issue (or is that
> >> even relevant)?
>
> The Nutch webapp doesn't have anything to do with it. The limitations
> in the query syntax have different roots (see above).
>
> --
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
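
To make the separation described above concrete, here is a minimal
sketch of the idea. It is an illustration only: it is written in Python
rather than Nutch's Java, and every name in it (Clause,
parse_user_query, LuceneStyleBackend, the token pattern) is hypothetical
and is not taken from the Nutch or Lucene code. It shows a small
user-facing query model made of terms and phrases, which a separate
backend then translates into its own query form.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical token pattern: optional leading "-", then a "quoted phrase"
# or a bare word. This is an assumption, not the NutchAnalysis.jj grammar.
_TOKEN = re.compile(r'(-?)(?:"([^"]*)"|(\S+))')


@dataclass(frozen=True)
class Clause:
    """One parsed unit of a user query: a term (one word) or a phrase (several)."""
    words: Tuple[str, ...]
    prohibited: bool = False

    @property
    def is_phrase(self) -> bool:
        return len(self.words) > 1


def parse_user_query(text: str) -> List[Clause]:
    """Turn raw user input into clauses; this layer knows nothing about any backend."""
    clauses = []
    for match in _TOKEN.finditer(text):
        minus, phrase, word = match.groups()
        raw = phrase if phrase is not None else word
        words = tuple(raw.lower().split())
        if words:  # skip empty phrases such as ""
            clauses.append(Clause(words, prohibited=(minus == "-")))
    return clauses


class LuceneStyleBackend:
    """One possible backend: renders clauses as a +required / -prohibited string."""

    def translate(self, clauses: List[Clause]) -> str:
        parts = []
        for clause in clauses:
            text = " ".join(clause.words)
            body = f'"{text}"' if clause.is_phrase else text
            parts.append(("-" if clause.prohibited else "+") + body)
        return " ".join(parts)


if __name__ == "__main__":
    parsed = parse_user_query('nutch "query syntax" -lucene')
    print(LuceneStyleBackend().translate(parsed))
```

In this sketch the front-end model can only express terms, phrases and
exclusions, so a backend is never asked to run a wildcard or fuzzy
query, and a different backend could be swapped in by writing another
class with the same translate method.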


-- 
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general