Confusion about query languages
-------------------------------

         Key: NUTCH-316
         URL: http://issues.apache.org/jira/browse/NUTCH-316
     Project: Nutch
        Type: Bug

  Components: web gui  
    Versions: 0.8-dev    
 Environment: n/a
    Reporter: KuroSaka TeruHiko


In the 2006-6-16 nightly source code, src/web/jsp/search.jsp has these lines:

  String queryLang = request.getParameter("lang");
  if (queryLang == null) { queryLang = ""; }
  Query query = Query.parse(queryString, queryLang, nutchConf);

Judging from the URLs shown in the browser, the lang parameter reflects the
language of the GUI (the language in which GUI elements are labeled). It
changes when the user clicks on one of the two-letter codes near the bottom
of each Nutch GUI screen.

The Java API Doc on Query is not clear about what queryLang means.  Is this
the language of the query (how the query should be lemmatized, if supported by
the analyzer, and which stop word list should be applied), or is this the
language of the documents to be searched?

Although the two concepts above are closely related, they are not tied to the 
GUI language at all.

I, as a Japanese user, might prefer to see all GUIs in Japanese, but I would
still need to search English documents for English words.  The current
implementation of search.jsp seems to tie the search to the GUI language in
one of two ways: either by treating the query terms as being in the GUI
language, or by restricting the search domain to documents in the GUI
language.

Ideally, there would be a drop-down list from which the language of the query
analyzer is selected, and a set of check boxes from which the document
languages can be selected, in addition to the existing line of two-letter
language codes from which the GUI language is chosen.

But that would clutter the screen too much.

Google uses a separate configuration screen to let the user choose the set of
languages of the documents to be searched.  That might be a good
middle-of-the-road approach.  Because Google does not apply language
processing to search terms, it does not need to know the language of the
query.  The Nutch GUI might want to have a drop-down list from which the
language of the query can be chosen, with the GUI language pre-selected.



