I recently figured out how to get OSCache working with Nutch, so I thought it
might be useful to share that information with the list and save others some
trouble in the future.
For anyone who doesn't know what OSCache is, here is the description from the
website:
"OSCache is a caching solution that includes a JSP tag library and set of
classes to perform fine grained dynamic caching of JSP content, servlet
responses or arbitrary objects. It provides both in memory and persistent on
disk caches, and can allow your site to have graceful error tolerance (eg if an
error occurs like your db goes down, you can serve the cached content so people
can still surf the site almost without knowing)."
OSCache offers two ways to cache data. With the first, you place a specific tag
around the code in your JSP source that you wish to cache. The second is a
servlet filter called "CacheFilter" that caches the entire JSP output.
When using Nutch, the second method is the only one that works and is most
likely what you want anyway. Here is a step-by-step installation for Nutch:
1. Download the package from
http://www.opensymphony.com/oscache/download.action.
2. Extract the package and place the "oscache-*.jar" file in your Nutch package
(ROOT.war) under the "/WEB-INF/lib" directory.
3. From the OSCache package, edit the oscache.properties file with your
specific settings and place it in your Nutch package under the
"/WEB-INF/classes" directory.
4. Now we need to add OSCache-specific entries to our Nutch "/WEB-INF/web.xml"
file. The minimum required entries are below, with comments:
<filter>
    <filter-name>CacheFilter</filter-name>
    <filter-class>com.opensymphony.oscache.web.filter.CacheFilter</filter-class>
    <!-- This loads the OSCache CacheFilter upon deployment. -->
    <init-param>
        <param-name>time</param-name>
        <param-value>600</param-value>
    </init-param>
    <!-- This is the maximum time, in seconds, that any page will be cached. -->
</filter>
<filter-mapping>
    <filter-name>CacheFilter</filter-name>
    <url-pattern>*.jsp</url-pattern>
</filter-mapping>
<!-- This defines which pages the CacheFilter should cache. With this setting we
     are telling it to cache all JSP files. You can change this to, for example,
     /search.jsp to cache only search results and nothing else. -->
5. All done. Now you just need to redeploy your application with the changes.
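As a sketch of the narrower mapping mentioned in the step 4 comments, the entry
below limits the filter to the search page only. It assumes the search page is
served at /search.jsp in the root of the web application, so adjust the pattern
if your deployment differs. The <filter> entry from step 4 stays unchanged; only
the <filter-mapping> is replaced.

```xml
<!-- Assumption: the search page lives at /search.jsp in the webapp root.
     An exact-match url-pattern must begin with a slash. -->
<filter-mapping>
    <filter-name>CacheFilter</filter-name>
    <url-pattern>/search.jsp</url-pattern>
</filter-mapping>
```

Requests that do not match the pattern bypass the CacheFilter and are generated
on every request.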
I don't have any benchmark information available, but this can be a useful way
to speed up searches if you operate a semi-busy search engine.
As for myself, I have Google Ads on all my search result pages. The way Google
Ads operates is that each requested page is reported to Google via client-side
JavaScript, and Google quickly sends its own request for that exact page to
analyze it for relevancy. This means every page is requested at least twice
within about a second. OSCache will always serve the second (and any later)
request from its cache. This reduces load and frees up query capacity for other
users at the same time.
OSCache's memory usage will depend on the maximum number of pages you wish to
store in memory. It can also store them on disk. I hope this helps anyone
wishing to do an OSCache setup in the future.
Enjoy!
----- Original Message ----
From: Sean Dean <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, December 27, 2006 1:25:54 AM
Subject: Nutch and OSCache
I'm wondering if anyone is running OSCache with Nutch?
I've followed their tutorial, and it seems there is an issue when wrapping any
custom tag around a flushed include; according to the JSP specification, this is
not allowed. I guess there is one in the Nutch JSP code stopping me?
Looking at the log output, Nutch runs fine and the point of error is the page
generation.
2006-12-27 00:59:34,427 INFO NutchBean - query: http
2006-12-27 00:59:34,427 INFO NutchBean - lang:
2006-12-27 00:59:34,453 INFO NutchBean - searching for 20 raw hits
2006-12-27 00:59:49,837 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2006-12-27 00:59:50,941 INFO NutchBean - total hits: 5116128
2006-12-27 00:59:50,951 WARN [jsp] - Servlet.service() for servlet jsp threw exception
java.io.IOException: Illegal to flush within a custom tag
    at javax.servlet.jsp.tagext.BodyContent.flush(BodyContent.java:79)
    at org.apache.jsp.search_jsp._jspService(search_jsp.java:416)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:334)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at com.opensymphony.oscache.web.filter.CacheFilter.doFilter(CacheFilter.java:161)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:199)
    at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:282)
    at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:767)
    at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:697)
    at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:889)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Thread.java:595)
I know this isn't really a Nutch issue per se, but if anyone is running it
without problems, any tips would be greatly appreciated.
Thanks,
Sean
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general