Oops, my previous post should read "I have NOT explicitly activated those
plugins".
On 1/31/07, Nes Yarug <[EMAIL PROTECTED]> wrote:
I have explicitly activated those plugins. Could you show me how to do
that with an example? I looked through conf/nutch-default.xml and
couldn't find any reference to them. I'm using 0.8.1, by the way. I guess
they are enabled in the build by default, as default.properties lists them:
#
# Indexing Filter Plugins
#
plugins.index=\
org.apache.nutch.indexer.basic*:\
org.apache.nutch.indexer.more*
#
# Query Filter Plugins
#
plugins.query=\
org.apache.nutch.searcher.basic*:\
org.apache.nutch.searcher.more*:\
org.apache.nutch.searcher.site*:\
org.apache.nutch.searcher.url*
Many thanks,
Nes
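For reference, here is a minimal sketch of the change Zaheed describes. As far as I know, plugins are switched on at runtime by the plugin.includes property, so search conf/nutch-default.xml for that property name rather than for the plugin names. The plugins.* entries in default.properties appear to be used only by the build. You override plugin.includes by repeating it in conf/nutch-site.xml. The Python helper below (set_property is a hypothetical name, not part of Nutch) only shows the resulting XML, and editing the file by hand works just as well. BASE_PLUGINS is an assumption modelled on a 0.8-era default, so copy the real value from your own nutch-default.xml and append index-more|query-more to it. Renaud's pointer to LanguageIndexingFilter/LanguageQueryFilter further down suggests the language-identifier plugin may also matter for the lang field; check that against your build.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: a 0.8-era default; copy the real value from conf/nutch-default.xml.
BASE_PLUGINS = ("protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic"
                "|query-(basic|site|url)|summary-basic|scoring-opic")
# The two plugins suggested in this thread.
EXTRA_PLUGINS = "index-more|query-more"


def set_property(config_xml: str, name: str, value: str) -> str:
    """Return config_xml with property `name` set to `value`, adding it if absent.

    Hypothetical helper for Hadoop-style <configuration> files such as
    conf/nutch-site.xml. Comments and processing instructions are not kept.
    """
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing override: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    empty_site = "<configuration></configuration>"  # an otherwise empty nutch-site.xml
    print(set_property(empty_site, "plugin.includes",
                       BASE_PLUGINS + "|" + EXTRA_PLUGINS))
```

ElementTree drops XML comments and processing instructions (such as the xml-stylesheet line), so review its output before using it to replace an existing nutch-site.xml.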
On 1/31/07, Zaheed Haque <[EMAIL PROTECTED]> wrote:
>
> Unless you have already done so, you need to activate the index-more and
> query-more plugins in nutch-site.xml.
>
> You can also check the "explain" link on the search results page:
> "lang" will be missing if you haven't activated the index-more
> and query-more plugins.
>
> Cheers
>
> On 1/31/07, Nes Yarug <[EMAIL PROTECTED]> wrote:
> > Thank you everyone for your replies.
> >
> > I have set up the recrawl script from
> > http://wiki.apache.org/nutch/IntranetRecrawl, and it has been running for
> > over 12 hours, so I expect it to index many more pages.
> >
> > That leaves the question of language-specific search. I have tried
> > adding a lang: clause to my search query by appending lang:en, but that
> > returns no results (as if lang:en became part of the actual query).
> > The URL then looks like this:
> > search.jsp?query=help+lang%3Aen&hitsPerPage=10&lang=en
> >
> > Has anyone used a language-specific search before? Do I need to add a
> > new (hidden) input field to the search form to specify the language
> > instead of appending it to the query? That would be my preference anyway,
> > as I want the language-specific search to be transparent to the user.
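One way to keep the restriction transparent, sketched in Python under assumptions: instead of a separate hidden field (which, as far as I can tell, search.jsp would ignore unless it is changed to read it), whatever builds the search request could append a lang: clause for the UI language to the query parameter. build_search_url is a hypothetical helper, not part of Nutch. The sketch also shows that the URL above is encoded correctly (%3A is simply the colon), so if lang:en returns nothing, the missing plugins are a more likely cause than the URL.

```python
from urllib.parse import urlencode


def build_search_url(user_query: str, ui_lang: str, hits_per_page: int = 10) -> str:
    """Append a lang: clause for the UI language so the user never types it.

    Hypothetical helper: Nutch's search.jsp does not do this by itself.
    The extra `lang` request parameter is kept as in the URL from this thread.
    """
    params = {
        "query": f"{user_query} lang:{ui_lang}",
        "hitsPerPage": hits_per_page,
        "lang": ui_lang,
    }
    # urlencode uses quote_plus: spaces become '+', ':' becomes '%3A'.
    return "search.jsp?" + urlencode(params)


print(build_search_url("help", "en"))
# search.jsp?query=help+lang%3Aen&hitsPerPage=10&lang=en
```

Whether the clause actually filters anything still depends on the lang field being indexed and recognised as a query field.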
> >
> > Again, many thanks for any replies,
> > Nes
> >
> > On 1/30/07, Renaud Richardet <[EMAIL PROTECTED]> wrote:
> > >
> > > Nes Yarug wrote:
> > > > Hi all,
> > > >
> > > > I'm new to Nutch and have a few questions that I hope to get some
> > > > answers on. Thanks in advance for any replies.
> > > >
> > > > I want to use Nutch to index a web site I'm maintaining. I've followed
> > > > the tutorial for intranet crawling and used a list of links from my
> > > > site (17420 links to 8710 pages, each page having two unique links)
> > > > for the initial crawl.
> > > Actually, you don't need to provide a full list of links to Nutch. You
> > > can let it discover links as it crawls your site, and constrain them
> > > using crawl-urlfilter.txt and regex-urlfilter.txt.
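A toy Python model of how those filter files are read, assuming the rule format described in their own comments: each non-comment line is + (include) or - (exclude) followed by a regular expression, the first matching rule decides, and a URL that matches no rule is dropped. RULES and example.com are hypothetical stand-ins for your own site. Nutch's RegexURLFilter itself is Java; this sketch counts a match anywhere in the URL (Python's re.search), which is why the site rule is anchored with ^.

```python
import re

# Hypothetical rules in the +/- format of crawl-urlfilter.txt / regex-urlfilter.txt.
RULES = r"""
# skip common binary files
-\.(gif|GIF|jpg|JPG|png|PNG|zip|gz)$
# accept pages on the (hypothetical) site example.com
+^http://([a-z0-9]*\.)*example\.com/
# reject everything else
-.
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """First matching rule wins; a URL that matches no rule is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


rules = parse_rules(RULES)
print(accepts("http://www.example.com/help.html", rules))  # True
print(accepts("http://www.example.com/logo.gif", rules))   # False
```

Because the first match wins, the order of the lines matters: the exclusion for image files has to come before the broad site rule.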
> > > > The command I used was:
> > > >
> > > > bin/nutch crawl urls -dir crawl -depth 20 -topN 100
> > > >
> > > > The crawl completed, but when I tested the search I was sure it had
> > > > not indexed a lot of pages. From the following statistics, I
> > > > understand that it only indexed 1527 of 21378 pages:
> > > >
> > > > CrawlDb statistics start: crawl/crawldb
> > > > Statistics for CrawlDb: crawl/crawldb
> > > > TOTAL urls: 21378
> > > > retry 0: 20878
> > > > retry 1: 487
> > > > retry 2: 10
> > > > retry 3: 3
> > > > min score: 0.014
> > > > avg score: 84.405266
> > > > max score: 37106.03
> > > > status 1 (DB_unfetched): 19848
> > > > status 2 (DB_fetched): 1527
> > > > status 3 (DB_gone): 3
> > > > CrawlDb statistics: done
> > > >
> > > >
> > > > Now my questions:
> > > >
> > > > 1) Will Nutch automatically continue to index the rest of the URLs
> > > > even though the initial crawl has finished (through some internal
> > > > scheduler of some sort)?
> > > You will need to refetch or, better, increase the depth until "all
> > > your pages" are fetched.
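Renaud's advice can be checked against the statistics Nes posted. As far as I understand the one-shot crawl command, each of the -depth rounds generates a segment of at most -topN URLs, so "-depth 20 -topN 100" can fetch at most 20 × 100 = 2000 pages. The 1527 fetched pages fit under that cap, while 19848 known URLs remain unfetched. The Python sketch below (parse_counts and max_fetches are illustrative names) parses the integer counts, confirms that the status and retry breakdowns both add up to the 21378 total, and computes the cap. It ignores rounds that end early and fetches that fail.

```python
import re

# Lines copied from the CrawlDb statistics quoted above; "min score" is
# included to show that non-integer lines are skipped.
STATS = """\
TOTAL urls: 21378
retry 0: 20878
retry 1: 487
retry 2: 10
retry 3: 3
min score: 0.014
status 1 (DB_unfetched): 19848
status 2 (DB_fetched): 1527
status 3 (DB_gone): 3
"""


def parse_counts(text):
    """Map each 'label: <integer>' line to its count; other lines are skipped."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def max_fetches(depth, top_n):
    """Upper bound for a one-shot crawl: depth rounds of at most top_n URLs each."""
    return depth * top_n


counts = parse_counts(STATS)
total = counts["TOTAL urls"]
by_status = sum(v for k, v in counts.items() if k.startswith("status"))
by_retry = sum(v for k, v in counts.items() if k.startswith("retry"))
fetched = counts["status 2 (DB_fetched)"]

print(total, by_status, by_retry)                 # 21378 21378 21378
print(fetched, max_fetches(depth=20, top_n=100))  # 1527 2000
```

Raising -topN and -depth, or running further generate/fetch/updatedb rounds as the recrawl script does, lifts that cap.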
> > > >
> > > > 2) At the moment, all of my site's pages are available in two
> > > > languages (each page exists in exactly two languages, and the lang
> > > > attribute on the html tag of each page contains the language
> > > > identifier). When searching, is there a way to return only pages in a
> > > > specific language? I know the Nutch UI is localised, but it will still
> > > > return pages in English if my UI language is German, for example. I
> > > > want it to return only German pages (<html lang="de">) when searching
> > > > through the German UI. Is that possible?
> > > Try using "lang:" in your query; I'm not sure it works, though...
> > > According to the javadoc, LanguageQueryFilter.java should handle "lang:"
> > > query clauses, causing them to search the "lang" field indexed by
> > > LanguageIdentifier (see also LanguageIndexingFilter.java).
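To illustrate why appending lang:en can return nothing, here is a toy model in Python, not Nutch's code. My understanding is that Nutch only treats name:value as a field clause when an active query-filter plugin declares that field, and otherwise tokenizes it into ordinary terms, all of which must match. Under that assumption, lang:en without the plugin becomes the extra terms "lang" and "en", which would match what Nes saw. parse_query, search and DOCS are hypothetical.

```python
import re


def parse_query(query, known_fields):
    """Split a query into plain terms and field:value clauses.

    Toy model: 'name:value' is a field clause only if `name` is a declared
    query field; otherwise it is broken into ordinary terms.
    """
    terms, clauses = [], {}
    for token in query.split():
        name, sep, value = token.partition(":")
        if sep and value and name in known_fields:
            clauses[name] = value
        else:
            terms.extend(t for t in re.split(r"\W+", token.lower()) if t)
    return terms, clauses


def search(docs, query, known_fields):
    """Return ids of docs containing every term and matching every field clause."""
    terms, clauses = parse_query(query, known_fields)
    hits = []
    for doc in docs:
        words = set(re.findall(r"\w+", doc["text"].lower()))
        if all(t in words for t in terms) and all(
                doc.get(field) == value for field, value in clauses.items()):
            hits.append(doc["id"])
    return hits


DOCS = [
    {"id": "help-en", "text": "Help and support", "lang": "en"},
    {"id": "help-de", "text": "Help und Hilfe", "lang": "de"},
]

print(search(DOCS, "help lang:de", known_fields={"lang"}))  # ['help-de']
print(search(DOCS, "help lang:de", known_fields=set()))     # []
```

The same model suggests that the query side alone is not enough: documents also need a lang field at index time, so the pages presumably have to be re-indexed after the indexing plugin is enabled.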
> > >
> > > HTH,
> > > Renaud
> > >
> > >
> > > --
> > > renaud richardet +1 617 230 9112
> > > renaud <at> oslutions.com http://www.oslutions.com
> > >
> > >
> >
> >
>
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general