Hello Everyone,
I am using Lucene & Nutch in my project for searching content in the webpages.
For a webpage or any other document, Lucene indexes all the words on the page
and returns that page as a result when a search matches them.
Let's say I have two webpages, as shown below:
Webpage1
----------------------------------------------------------------------
This is the course page of Computer Science Department
Subject: Operating System I
Professor: Qi Li
Details:
The course operating system I deals with the basics of the operating system.
Mainly the three topics dealt with are process management, storage management &
memory management. etc............................................
..................................................................
----------------------------------------------------------------------
Webpage2
----------------------------------------------------------------------
This is the home page of Computer Science Department
The computer science department offers courses at undergraduate level and
graduate level. The core courses for the graduate students are Mathematical
Foundations of Computer Science, Compilers, Advanced Database, Analysis of
Algorithms and Operating Systems. etc............................
..................................................................
----------------------------------------------------------------------
Now if I search for "operating system", the results show both webpages
(webpage1 & webpage2), since "operating system" appears in the text of both
of them.
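To make the current behaviour concrete, here is a toy sketch in Python. It is
not Lucene or Nutch code: the page texts are shortened, and the names
(PAGES, tokenize, build_index, search) and the crude plural stripping are my
own assumptions for illustration only. It indexes every word of each page and
shows that both pages match.

```python
import re

# Shortened stand-ins for the two pages (illustrative data only).
PAGES = {
    "webpage1": (
        "This is the course page of Computer Science Department\n"
        "Subject: Operating System I\n"
        "Professor: Qi Li\n"
        "Details: The course operating system I deals with the basics "
        "of the operating system."
    ),
    "webpage2": (
        "This is the home page of Computer Science Department\n"
        "The core courses for the graduate students are Compilers, "
        "Advanced Database, Analysis of Algorithms and Operating Systems."
    ),
}


def tokenize(text):
    """Lowercase, split on non-letters, and strip a trailing 's' from longer
    words (a crude stand-in for whatever analysis the real indexer does)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]


def build_index(pages):
    """Map each term to the set of pages whose full text contains it."""
    index = {}
    for name, text in pages.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(name)
    return index


def search(index, query):
    """Return the pages that contain every term of the query."""
    term_hits = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*term_hits) if term_hits else set()


INDEX = build_index(PAGES)
RESULT = sorted(search(INDEX, "operating system"))
print(RESULT)
```

Because the whole page text goes into one index, the query cannot tell a
match in the subject line from a match anywhere else on the page.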
But my requirement is different. I want to search for "Operating System" only
where it appears in the subject field, i.e., as in webpage1, so the result
should show only webpage1. How can I achieve this?
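What I have in mind is roughly the following toy sketch in Python. Again,
this is not Lucene or Nutch code: extract_subject, to_document and search are
hypothetical helpers, and the "Subject:" line parsing is an assumption about
how my pages look. In Lucene terms I imagine this corresponds to indexing the
subject as its own field and querying it with something like
subject:"operating system", but I do not know how to set that up through
Nutch.

```python
import re

# Shortened stand-ins for the two pages (illustrative data only).
PAGES = {
    "webpage1": (
        "This is the course page of Computer Science Department\n"
        "Subject: Operating System I\n"
        "Professor: Qi Li\n"
        "Details: The course operating system I deals with the basics "
        "of the operating system."
    ),
    "webpage2": (
        "This is the home page of Computer Science Department\n"
        "The core courses for the graduate students are Compilers, "
        "Advanced Database, Analysis of Algorithms and Operating Systems."
    ),
}


def normalize(text):
    """Lowercase, split on non-letters, strip a trailing 's' from longer
    words (a crude stand-in for real text analysis)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]


def extract_subject(page_text):
    """Hypothetical parser: return the text after 'Subject:' on its line,
    or an empty string when the page has no such line."""
    match = re.search(r"^Subject:[ \t]*(.+)$", page_text, flags=re.MULTILINE)
    return match.group(1) if match else ""


def to_document(page_text):
    """Keep the subject terms separate from the full-text terms."""
    return {
        "content": normalize(page_text),
        "subject": normalize(extract_subject(page_text)),
    }


def search(docs, field, query):
    """Return pages whose given field contains every term of the query."""
    terms = normalize(query)
    return sorted(
        name for name, doc in docs.items()
        if terms and all(term in doc[field] for term in terms)
    )


DOCS = {name: to_document(text) for name, text in PAGES.items()}
SUBJECT_HITS = search(DOCS, "subject", "Operating System")
CONTENT_HITS = search(DOCS, "content", "Operating System")
print(SUBJECT_HITS, CONTENT_HITS)
```

Searching the "subject" field returns only webpage1, while searching the
"content" field still returns both pages, which is the distinction I am after.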
Please help me in this regard.
Thanks & Regards,
Kunal Gosar