Hi,

Now that searching works and the hits show the content type, etc., I have
run into a new problem: my web pages, encoded in ISO-8859-1, are not
indexed "properly". Their summaries and titles are missing the non-ASCII
(accented) characters.

I tried setting the following property:
*******************************************************************
<property>
    <name>parser.character.encoding.default</name>
    <value>ISO-8859-1</value>
</property>
*******************************************************************
but it made no difference.
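
To illustrate the kind of symptom I mean, here is a minimal standalone
sketch. It is not Nutch code: the class and method names (EncodingDemo,
decode, sampleBytes) are made up for illustration, and it only assumes
that the page bytes end up being decoded as UTF-8 somewhere along the way,
which I have not confirmed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Nutch.
public class EncodingDemo {

    // Decode raw page bytes using the given charset.
    public static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Sample text as an ISO-8859-1 page would store it:
    // the accented letter (U+00E0) is the single byte 0xE0.
    public static byte[] sampleBytes() {
        return "Catal\u00e0 CRUE".getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = sampleBytes();

        // Decoded with the charset the page was written in:
        // the accented letter survives.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1));

        // Decoded as UTF-8: the lone byte 0xE0 is not a valid sequence,
        // so Java substitutes the replacement character U+FFFD.
        System.out.println(decode(raw, StandardCharsets.UTF_8));
    }
}
```

If something like this is happening during parsing, it would explain why
only the accented characters are affected while plain ASCII text comes
through fine.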

On a related note, the "language-identifier" plugin has identified my
documents correctly, and I can see the "lang" detail on the hits. However,
when I try to limit a search to the documents in one given language, the
query does not pick up the language I specify.
I tried the same syntax used to search the documents from one site
("site:my.site.com criteria"), replacing "site" with lang, language,
Language... but nothing works, and I can see in the logs:
**************
061211 145357 10 query: lang:ca CRUE
061211 145357 10 Language: null <<<<<<< here it should read ca??
061211 145357 10 searching for 20 raw hits
**************
I browsed the documentation and searched the web, but I could not find
explicit information on how to build a query that makes use of that
field, even though I know the documents are properly indexed.

Any hints on either of these issues?

Thanks in advance,
D.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general