Re: Nutch ScoringFilter plugin problems

2009-01-26 Thread Pau
Hello,
I still have the same problem. I have the following piece of code

  if (linkdb == null) {
    System.out.println("Null linkdb");
  } else {
    System.out.println("LinkDB not null");
  }
  Inlinks inlinks = linkdb.getInlinks(url);
  System.out.println("a");

On the output I can see it always prints "LinkDB not null", so linkdb is not
null. But "a" never gets printed, so I guess there is some error at:
  Inlinks inlinks = linkdb.getInlinks(url);
Maybe the getInlinks function throws an IOException?
I do catch IOException, but the catch block is never executed either.
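
One possible explanation, offered only as a guess, is that the call fails with an unchecked exception (for example a NullPointerException) or an Error, which a catch (IOException e) block does not intercept. The self-contained Java sketch below illustrates the effect. HiddenExceptionDemo, lookup, narrowCatch and broadCatch are hypothetical names, and lookup is only a stand-in for linkdb.getInlinks(url), not the Nutch method.

```java
import java.io.IOException;

class HiddenExceptionDemo {
    /** Hypothetical stand-in for linkdb.getInlinks(url); fails with an unchecked exception. */
    static String lookup(String url) throws IOException {
        throw new IllegalStateException("simulated failure for " + url);
    }

    /** Catches only IOException, so the unchecked exception escapes to the caller. */
    static String narrowCatch(String url) {
        try {
            lookup(url);
            return "a";
        } catch (IOException e) {
            return "caught IOException";
        }
    }

    /** Catches Throwable (for debugging only) to report what was actually thrown. */
    static String broadCatch(String url) {
        try {
            lookup(url);
            return "a";
        } catch (Throwable t) {
            return "caught " + t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(broadCatch("http://example.com/"));
        try {
            System.out.println(narrowCatch("http://example.com/"));
        } catch (RuntimeException e) {
            System.out.println("escaped narrowCatch: " + e.getClass().getSimpleName());
        }
    }
}
```

Temporarily catching Throwable around the suspect line and printing its stack trace should show what is really being thrown, after which the broad catch can be removed again.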

One question: how should I create the LinkDbReader? I do it the following
way:
 linkdb = new LinkDbReader(getConf(), new Path("crawl/linkdb"));
Is that right? Thanks.




Re: Nutch ScoringFilter plugin problems

2009-01-21 Thread Pau
Ok, I think you are right, maybe inlinks is null. I will try it now. Thank
you!
I have no information about the exception. It seems that the program simply
skips this part of the code... maybe a ScoringFilterException is thrown?

On Wed, Jan 21, 2009 at 9:47 AM, Doğacan Güney doga...@gmail.com wrote:

> On Tue, Jan 20, 2009 at 7:18 PM, Pau pau...@gmail.com wrote:
> > Hello,
> > I want to create a new ScoringFilter plugin. In order to evaluate how
> > interesting a web page is, I need information about the link structure in
> > the LinkDB.
> > In the method updateDBScore, I have the following lines (among others):
> >
> >  88  linkdb = new LinkDbReader(getConf(), new Path("crawl/linkdb"));
> >      ...
> >  99  System.out.println("Inlinks to " + url);
> > 100  Inlinks inlinks = linkdb.getInlinks(url);
> > 101  System.out.println("a");
> > 102  Iterator<Inlink> iIt = inlinks.iterator();
> > 103  System.out.println("b");
> >
> > a always gets printed, but b rarely gets printed, so it seems that in
> > line 102 an error happens and an exception is raised. Do you know why
> > this is happening? What am I doing wrong? Thanks.
>
> Maybe there are no inlinks to that page so inlinks is null? What is
> the exception exactly?
>
> --
> Doğacan Güney



Nutch ScoringFilter plugin problems

2009-01-20 Thread Pau
Hello,
I want to create a new ScoringFilter plugin. In order to evaluate how
interesting a web page is, I need information about the link structure in
the LinkDB.
In the method updateDBScore, I have the following lines (among others):

 88  linkdb = new LinkDbReader(getConf(), new Path("crawl/linkdb"));
     ...
 99  System.out.println("Inlinks to " + url);
100  Inlinks inlinks = linkdb.getInlinks(url);
101  System.out.println("a");
102  Iterator<Inlink> iIt = inlinks.iterator();
103  System.out.println("b");

a always gets printed, but b rarely gets printed, so it seems that in
line 102 an error happens and an exception is raised. Do you know why this
is happening? What am I doing wrong? Thanks.
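
The reply on this thread suggests that inlinks may be null when a page has no inlinks, in which case calling inlinks.iterator() on line 102 would throw a NullPointerException. Below is a minimal sketch of guarding against that, using a plain Map as a hypothetical stand-in for the LinkDB. NullInlinksGuard and its methods are illustrative names, not Nutch API, and the null-for-unknown-URL behaviour is an assumption taken from that reply.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class NullInlinksGuard {
    // Hypothetical stand-in for the LinkDB: page URL -> URLs linking to it.
    private final Map<String, List<String>> db = new HashMap<>();

    NullInlinksGuard() {
        db.put("http://example.com/a", Arrays.asList("http://example.com/b"));
    }

    /** Assumed to behave like the lookup under discussion: null for unknown URLs. */
    List<String> getInlinks(String url) {
        return db.get(url);
    }

    /** Counts inlinks, treating a null result as "no inlinks" instead of dereferencing it. */
    int countInlinks(String url) {
        List<String> inlinks = getInlinks(url);
        if (inlinks == null) {
            return 0;
        }
        int n = 0;
        Iterator<String> it = inlinks.iterator();
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        NullInlinksGuard guard = new NullInlinksGuard();
        System.out.println(guard.countInlinks("http://example.com/a"));       // prints 1
        System.out.println(guard.countInlinks("http://example.com/unknown")); // prints 0
    }
}
```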


Troubles while creating a plugin

2008-11-26 Thread Pau
Hello,
I am creating a plugin for Nutch that extends the QueryFilter.
I get a successful compilation with ant and ant war, but when I do a
search, I get the following exception:

26/11/2008 18:50:07 org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.NoClassDefFoundError: org/apache/commons/codec/DecoderException
    at org.apache.tika.mime.MimeTypesReader.readMatch(MimeTypesReader.java:272)
    at org.apache.tika.mime.MimeTypesReader.readMatches(MimeTypesReader.java:221)
    at org.apache.tika.mime.MimeTypesReader.readMagic(MimeTypesReader.java:201)
    at org.apache.tika.mime.MimeTypesReader.readMimeType(MimeTypesReader.java:164)
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:138)
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:121)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:56)
    at org.apache.nutch.util.MimeUtil.<init>(MimeUtil.java:62)
    at org.apache.nutch.protocol.Content.<init>(Content.java:85)
    at org.apache.nutch.personalizedsearch.searcher.context.ContextQueryFilter.filter(ContextQueryFilter.java:55)
    at org.apache.nutch.searcher.QueryFilters.filter(QueryFilters.java:111)
    at org.apache.nutch.searcher.IndexSearcher.search(IndexSearcher.java:96)
    at org.apache.nutch.searcher.NutchBean.search(NutchBean.java:251)
    at org.apache.jsp.search_jsp._jspService(search_jsp.java:284)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)

The DecoderException class is in commons-codec-1.3.jar, so I added the jar
file to my plugin.xml:
   <runtime>
      <!-- As defined in build.xml this plugin will end up bundled as
           recommended.jar -->
      <library name="personalized-search.jar">
         <export name="*"/>
      </library>
      <library name="commons-codec-1.3.jar"/>
   </runtime>

But the same error appears. Any idea on what I may be doing wrong?
Thanks.
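
A NoClassDefFoundError generally means the class was not visible at run time to the class loader that needed it, even if it was available at compile time. As a diagnostic sketch (not a fix), the Java snippet below checks whether a class name can be resolved by a given class loader. ClassVisibilityCheck and isVisible are hypothetical names. Which class loader is the relevant one inside the Nutch web application is an assumption left open here.

```java
class ClassVisibilityCheck {
    /** Returns true if the named class can be resolved by the given class loader. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassVisibilityCheck.class.getClassLoader();
        // The result of the second check depends on the classpath in use.
        System.out.println(isVisible("java.lang.String", loader));
        System.out.println(isVisible("org.apache.commons.codec.DecoderException", loader));
    }
}
```

Calling such a check from the code path that fails (here, from inside the query filter) with that code's own class loader would indicate whether the jar is actually reachable from there.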


Retrieving text content from html files

2008-11-17 Thread Pau
Hello,
I am developing a Nutch plugin that needs to read the text content from some
URLs. I think the parse-html plugin contains the necessary code to do so,
but I don't know which methods to use or how to use them. What should I do?
Thanks.


Re: Writing a plugin

2008-05-12 Thread Pau
Hi,
I have added my plugin (called recommended) to nutch-site.xml but it seems
that Nutch is not using it.
I say this because when I search for "recom" I get no results, even though
there is a page that has the meta tag:
<meta name="recommended" content="recom"/>

I have attached my nutch-site.xml and nutch-default.xml files, maybe you see
something wrong.
Apart from that, my plugin compiles ok, but when I run ant test I get
errors. I have also attached the output for ant test.

On Sun, May 11, 2008 at 8:08 PM, [EMAIL PROTECTED] wrote:

> Hi,
>
> Yes, you have to add your plugin to nutch-site.xml, along with other
> plugins you probably already have defined there.  If you don't have them in
> nutch-site.xml, look at nutch-default.xml
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ----- Original Message ----
> > From: Pau [EMAIL PROTECTED]
> > To: nutch-dev@lucene.apache.org
> > Sent: Sunday, May 11, 2008 8:28:53 AM
> > Subject: Writing a plugin
> >
> > Hello,
> > I am following the WritingPluginExample-0.9 and I am a bit confused about
> > how to get nutch to use my plugin.
> > In the section called "Getting Ant to Compile Your Plugin" it says:
> > "The next time you run a crawl your parser and index filter should get
> > used."
> > But at the end of the document, there is another section called "Getting
> > Nutch to Use Your Plugin".
> > Do I have to edit the nutch-site.xml file as "Getting Nutch to Use Your
> > Plugin" says? Or is it not necessary?
> > Thank you.


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>http.agent.name</name>
  <value>PauSpider</value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty -
  please set this to a single word uniquely related to your organization.

  NOTE: You should also check other related properties:

	http.robots.agents
	http.agent.description
	http.agent.url
	http.agent.email
	http.agent.version

  and set their values appropriately.

  </description>
</property>

<property>
  <name>http.agent.description</name>
  <value>Nutch Crawler</value>
  <description>Further description of our bot- this text is used in
  the User-Agent header.  It appears in parenthesis after the agent name.
  </description>
</property>

<property>
  <name>http.agent.email</name>
  <value>[EMAIL PROTECTED]</value>
  <description>Description</description>
</property>

<property>
  <name>plugin.includes</name>
  <value>recommended|protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin id names to
  include.  Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins.
  </description>
</property>
</configuration>
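
The description above says that plugin.includes is a regular expression and that any plugin id not matching it is excluded. One quick way to sanity-check a value is to test candidate ids against it in plain Java. This sketch assumes that the whole plugin id must match the expression and that the id declared in the plugin's own plugin.xml is exactly "recommended". PluginIncludesCheck and isIncluded are hypothetical names, not part of Nutch.

```java
import java.util.regex.Pattern;

class PluginIncludesCheck {
    // Value copied from the plugin.includes property in the nutch-site.xml above.
    static final String INCLUDES =
        "recommended|protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic"
        + "|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)";

    /** Assumption: a plugin is included only if its whole id matches the expression. */
    static boolean isIncluded(String pluginId) {
        return Pattern.compile(INCLUDES).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIncluded("recommended")); // prints true
        System.out.println(isIncluded("parse-html"));  // prints true
        System.out.println(isIncluded("parse-pdf"));   // prints false
    }
}
```

If the id in plugin.xml differs from the name used in plugin.includes (for example in case or spelling), a check like this would return false for it.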
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->
<!-- Do not modify this file directly.  Instead, copy entries that you -->
<!-- wish to modify from this file into nutch-site.xml and change them -->
<!-- there.  If nutch-site.xml does not already exist, create it.  -->

<configuration>

<!-- file properties -->

<property>
  <name>file.content.limit</name>
  <value>65536</value>
  <description>The length limit for downloaded content, in bytes.
  If this value is nonnegative (>=0), content longer than it will be truncated;
  otherwise, no truncation at all.
  </description>
</property>

<property>
  <name>file.content.ignored</name>
  <value>true</value>
  <description>If true, no file content will be saved during fetch.
  And it is probably what we want to set most of time, since file:// URLs
  are meant to be local and we can always use them directly at parsing
  and indexing stages. Otherwise file contents will be saved.
  !! NO IMPLEMENTED YET !!
  </description>
</property>

<!-- HTTP properties -->

<property>
  <name>http.agent.name</name>
  <value></value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty -
  please set this to a single word uniquely related to your organization.

  NOTE: You should also check other related properties:

	http.robots.agents

Re: Problem compiling plugins

2008-05-10 Thread Pau
Thank you very much! It worked.
I just downloaded ant-trax.jar and added this file into the ant home's lib
directory. Then ant war was successful.

On Fri, May 9, 2008 at 7:00 PM, [EMAIL PROTECTED] wrote:

> Hi,
>
> You are missing some ant jars.  I'm not sure which ones, but it looks like
> the class that cannot be found is TraXLiaison, so once you google you'll
> find which optional ant jar this is in.  Get that jar, put it in your ant
> home's lib dir and try again.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



Problem compiling plugins

2008-05-09 Thread Pau
Hello,
I have to implement a plugin for Nutch 0.9, so I have followed the
WritingPluginExample-0.9.
When I try to compile the plugins I get warnings about
nutch-extensionpoints.jar:
  [jar] Warning: skipping jar archive
/home/pau/Pau/Master/Tesis/nutch-0.9/build/nutch-extensionpoints/nutch-extensionpoints.jar
because no files were in
Why do I get this warning?

Furthermore, when I try to compile the .war file with the command 'ant war',
I get the following error:
generate-locale:
 [echo] Generating docs for locale=ca
 [xslt] java.lang.ClassNotFoundException: org.apache.tools.ant.taskdefs.optional.TraXLiaison
 [xslt] at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
 [xslt] at java.security.AccessController.doPrivileged(Native Method)
 [xslt] at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
 [xslt] at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 [xslt] at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
 [xslt] at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
 [xslt] at java.lang.Class.forName0(Native Method)
 [xslt] at java.lang.Class.forName(Class.java:169)
 [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.loadClass(XSLTProcess.java:548)
 [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.resolveProcessor(XSLTProcess.java:533)
 [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.getLiaison(XSLTProcess.java:785)
 [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.execute(XSLTProcess.java:300)
 [xslt] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
 [xslt] at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
 [xslt] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 [xslt] at java.lang.reflect.Method.invoke(Method.java:597)
 [xslt] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
 [xslt] at org.apache.tools.ant.Task.perform(Task.java:348)
 [xslt] at org.apache.tools.ant.Target.execute(Target.java:357)
 [xslt] at org.apache.tools.ant.Target.performTasks(Target.java:385)
 [xslt] at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
 [xslt] at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
 [xslt] at org.apache.tools.ant.Project.executeTargets(Project.java:1181)
 [xslt] at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:416)
 [xslt] at org.apache.tools.ant.taskdefs.CallTarget.execute(CallTarget.java:105)
 [xslt] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
 [xslt] at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
 [xslt] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 [xslt] at java.lang.reflect.Method.invoke(Method.java:597)
 [xslt] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
 [xslt] at org.apache.tools.ant.Task.perform(Task.java:348)
 [xslt] at org.apache.tools.ant.Target.execute(Target.java:357)
 [xslt] at org.apache.tools.ant.Target.performTasks(Target.java:385)
 [xslt] at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
 [xslt] at org.apache.tools.ant.Project.executeTarget(Project.java:1298)
 [xslt] at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
 [xslt] at org.apache.tools.ant.Project.executeTargets(Project.java:1181)
 [xslt] at org.apache.tools.ant.Main.runBuild(Main.java:698)
 [xslt] at org.apache.tools.ant.Main.startAnt(Main.java:199)
 [xslt] at org.apache.tools.ant.launch.Launcher.run(Launcher.java:257)
 [xslt] at org.apache.tools.ant.launch.Launcher.main(Launcher.java:104)

BUILD FAILED
/home/pau/Pau/Master/Tesis/nutch-0.9/build.xml:442: The following error occurred while executing this line:
/home/pau/Pau/Master/Tesis/nutch-0.9/build.xml:408: java.lang.ClassNotFoundException: org.apache.tools.ant.taskdefs.optional.TraXLiaison

Could you please help me with it?
Thank you very much.