Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by DanielLopez:
http://wiki.apache.org/nutch/FAQ

The comment on the change is:
Added an entry on how to find out more information about hits

------------------------------------------------------------------------------
  
  Results as RSS (XML) rather than HTML are easier for programmatic clients to 
parse: such clients will query against 
[http://lucene.apache.org/nutch/apidocs/org/apache/nutch/searcher/OpenSearchServlet.html
 OpenSearchServlet] rather than search.jsp.  Results as XML can also be 
transformed using XSL stylesheets, the likely direction of UI development going 
forward.
  
+ ==== How can I find out/display the size and mime type of the hits that a 
search returns? ====
+ To make this information available, add the {{{index-more}}} indexing filter 
to the standard {{{plugin.includes}}} property in your Nutch configuration 
file.
+ {{{
+ <property>
+   <name>plugin.includes</name>
+   <value>...|index-more|...</value>
+   ...
+ </property>
+ }}}
+ After that, you should be able to retrieve the MIME type and content length 
from the HitDetails class (via the fields "primaryType", "subType" and 
"contentLength"), the same way you normally retrieve the title and URL of a hit.
+       (Note by DanielLopez) Thanks to Doğacan Güney for the tip.
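
As a rough illustration of reading these fields, here is a minimal Java sketch. It assumes the fields are read by name through a {{{getValue(String)}}} style accessor, as is commonly done for "title" and "url"; the {{{Details}}} interface and the {{{HitTypeInfo}}} class are hypothetical stand-ins so that the example runs without a Nutch index, and they are not part of Nutch itself.

```java
import java.util.HashMap;
import java.util.Map;

public class HitTypeInfo {

    /** Hypothetical stand-in for the by-name field lookup on a hit's details. */
    public interface Details {
        String getValue(String field);
    }

    /** Builds a short description from the three fields named in the FAQ entry. */
    public static String describe(Details details) {
        String primary = details.getValue("primaryType");
        String sub = details.getValue("subType");
        String length = details.getValue("contentLength");

        // The fields may be missing, e.g. for pages indexed before index-more was enabled.
        String mime = (primary == null || sub == null) ? "unknown type" : primary + "/" + sub;
        String size = (length == null) ? "unknown" : length;
        return mime + " (contentLength: " + size + ")";
    }

    public static void main(String[] args) {
        // Illustrative values only; a real hit would supply these from the index.
        Map<String, String> fields = new HashMap<>();
        fields.put("primaryType", "text");
        fields.put("subType", "html");
        fields.put("contentLength", "1024");

        System.out.println(describe(fields::get)); // prints "text/html (contentLength: 1024)"
    }
}
```

With a real index, the same three field names would be looked up on the details object of each hit instead of on the map used here.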
+ 
  === Crawling ===
  
  ==== Java.io.IOException: No input directories specified in: NutchConf: 
nutch-default.xml , mapred-default.xml ====
