Hi,

  I was wondering if anyone knows of a resource, or could concisely
explain, how the JavaCC-generated default Nutch analyzer goes about
tokenizing text.  What I'm really looking for is a plain, nuts'n'bolts
explanation of what gets tokenized and what doesn't.  I searched the
web for a while but found no good resource.  (I'm not looking for the
Javadocs.)

  Unfortunately, NutchAnalysis.jj and NutchAnalysis.java are somewhat
opaque to me, and the documentation in these files is minimal.

  The files are located here (I'm using v0.8.1):
  nutch-0.8.1/src/java/org/apache/nutch/analysis/

  Any input will be greatly appreciated!

          joe

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
