Thanks Vishal.
I got it working last night, following exactly the steps you suggested.
I was using CentOS, Tomcat, and Nutch 0.9. After I got the basic tutorial
working, I tried crawling a Tamil site. The crawl worked fine, but I
couldn't search it. That's where I needed help.
Here are the small roadblocks I had to get past:
1. Tamil fonts were not installed on my CentOS machine, so I couldn't search
from the command line or the browser.
"yum install tamil-fonts" fixed that problem.
2. After installing the fonts, I was able to search from the command line.
3. To get Tamil search working from the Nutch WAR application, I had to set
the URI encoding (URIEncoding="UTF-8") in the Tomcat Connector:
http://wiki.apache.org/nutch/GettingNutchRunningWithUtf8
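Why step 3 matters can be shown with a small Java sketch. It is not taken from Nutch or Tomcat, and the class name QueryDecodingDemo is hypothetical. A browser sends a Tamil query as percent-encoded UTF-8 bytes; Tomcat releases of this era decoded the query string as ISO-8859-1 unless the Connector's URIEncoding attribute said otherwise, so each byte turned into a separate Latin-1 character and the search term no longer matched the UTF-8 index. The sketch decodes the same query both ways with java.net.URLDecoder (the Charset overload used here assumes Java 10 or later).

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how the charset used to decode a
// percent-encoded query changes the resulting search term.
public class QueryDecodingDemo {
    // The Tamil word for "Tamil" (U+0BA4 U+0BAE U+0BBF U+0BB4 U+0BCD),
    // percent-encoded as UTF-8 bytes, the way a browser sends it.
    public static final String TAMIL_QUERY =
            "%E0%AE%A4%E0%AE%AE%E0%AE%BF%E0%AE%B4%E0%AF%8D";

    // Decodes the query with the given charset, roughly what a servlet
    // container does when it parses parameters from the request URI.
    public static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String utf8 = decode(TAMIL_QUERY, StandardCharsets.UTF_8);
        String latin1 = decode(TAMIL_QUERY, StandardCharsets.ISO_8859_1);
        // UTF-8 recovers the 5 Tamil code points; ISO-8859-1 yields
        // 15 unrelated characters, one per encoded byte.
        System.out.println(utf8.length() + " vs " + latin1.length()); // prints "5 vs 15"
    }
}
```

Setting URIEncoding="UTF-8" on the Connector makes Tomcat take the first path, which is why the query then matches what was indexed.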
Now the basics are working.
Thanks Vishal and Saran for the help.
-Surya
vishal vachhani wrote:
>
> If your documents are in UTF-8, then you can directly crawl and index them
> using Nutch (a tutorial is given in the Nutch wiki); otherwise, you need to
> first convert the documents into UTF-8 and then index them. After indexing
> is over, first try to search using Nutch's command-line search APIs, and
> then modify the GUI (the Nutch JSP page) so that it can also search from
> the GUI.
> To verify your index, you can also use Luke, the Lucene index tool.
>
>
>
> On Mon, Jan 26, 2009 at 4:01 AM, suryas wrote:
>
>>
>> Hi,
>> I want to index & search Tamil (an Indian language) pages using Nutch. I
>> have some knowledge of Lucene and just got the "Nutch Basic Tutorial"
>> working.
>>
>> Where do I look for indexing Tamil or any other Indian language pages?
>>
>> I'm looking for:
>> *"step-by-step" documentation for indexing and searching foreign-language
>> pages, particularly Indian languages
>> *some examples, samples, or tutorials would be nice
>>
>> Or if you could just point me in the right direction, that'll be fine
>> too.
>>
>> I saw some postings from "saran" & "saravana kumar" talking about this same
>> thing. Guys, did you figure this out? Could you please help?
>>
>> Could someone help?
>>
>> thanks,
>> Surya
>> --
>> View this message in context:
>> http://www.nabble.com/How-to-index-and-search-Indian-languages-tp21657719p21657719.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>>
>
>
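Vishal's advice in the quoted reply (index UTF-8 documents directly, convert anything else to UTF-8 first) can be sketched in Java as below. This is a hypothetical helper, not part of Nutch: it checks whether a document's bytes are well-formed UTF-8 using a strict CharsetDecoder, and otherwise re-decodes them with a source charset the caller must supply. The ISO-8859-1 example is only illustrative; legacy Tamil font encodings such as TSCII may not be available as a Java Charset at all and would need their own mapping. Note that passing the UTF-8 check does not prove the text was meant as UTF-8; plain ASCII passes too, which is harmless because ASCII is already valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical pre-indexing helper: normalizes document bytes to UTF-8.
public class Utf8Normalizer {
    // Returns true if the bytes decode as well-formed UTF-8.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Leaves UTF-8 input unchanged; otherwise decodes it with the
    // assumed legacy charset and re-encodes the text as UTF-8.
    public static byte[] toUtf8(byte[] bytes, Charset assumedSource) {
        if (isValidUtf8(bytes)) {
            return bytes;
        }
        String text = new String(bytes, assumedSource);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, stored as ISO-8859-1: 63 61 66 E9.
        byte[] latin1 = "caf\u00E9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] fixed = toUtf8(latin1, StandardCharsets.ISO_8859_1);
        // The e-acute becomes the two UTF-8 bytes C3 A9.
        System.out.println(fixed.length); // prints 5
    }
}
```

In a real crawl the source charset would have to come from the page's HTTP headers or meta tags, or from knowing which legacy encoding a site uses.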