Re: Site update
Hey Otis,

Weird, must have been caused when I checked out SVN and updated the site last time? Anyways, I ran chmod -R g+w from the top level nutch site checkout, so give it a try now...

Cheers,
Chris

On 1/6/09 8:30 AM, Otis Gospodnetic ogjunk-nu...@yahoo.com wrote:

Hm, permission problem. I *think* I need Chris' help with this:

[o...@minotaur /www/lucene.apache.org/nutch]$ svn up
svn: Can't open file 'skin/translations/.svn/lock': Permission denied

[o...@minotaur /www/lucene.apache.org/nutch]$ groups
otis apcvs jakarta incubator lucene

[o...@minotaur /www/lucene.apache.org/nutch]$ umask
0022

[o...@minotaur /www/lucene.apache.org/nutch]$ ls -al skin/translations/.svn/
total 18
drwxr-xr-x  6 mattmann  lucene  512 Apr  6  2007 .    <== lucene group, but not writable; Chris owns it
drwxr-xr-x  3 mattmann  lucene  512 Apr  6  2007 ..
-r--r--r--  1 mattmann  lucene  109 Apr  6  2007 all-wcprops
-r--r--r--  1 mattmann  lucene  260 Apr  6  2007 entries
-r--r--r--  1 mattmann  lucene    2 Apr  6  2007 format
drwxr-xr-x  2 mattmann  lucene  512 Apr  6  2007 prop-base
drwxr-xr-x  2 mattmann  lucene  512 Apr  6  2007 props
drwxr-xr-x  2 mattmann  lucene  512 Apr  6  2007 text-base
drwxr-xr-x  5 mattmann  lucene  512 Apr  6  2007 tmp

It looks like the problem is that I don't have write permissions there:

[o...@minotaur /www/lucene.apache.org/nutch]$ touch skin/translations/.svn/foo
touch: skin/translations/.svn/foo: Permission denied

[o...@minotaur /www/lucene.apache.org/nutch]$ chmod g+w skin/translations/.svn
chmod: skin/translations/.svn: Operation not permitted

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Otis Gospodnetic ogjunk-nu...@yahoo.com
To: nutch-dev@lucene.apache.org
Sent: Monday, January 5, 2009 5:24:39 PM
Subject: Re: Site update

One more thing.
Forrest 0.8 wouldn't generate site files without me making the following change (so I'll commit this, too, unless somebody thinks this is bad):

$ svn diff src/site
Index: src/site/forrest.properties
===================================================================
--- src/site/forrest.properties (revision 729973)
+++ src/site/forrest.properties (working copy)
@@ -73,6 +73,7 @@
 #forrest.validate.stylesheets=${forrest.validate}
 #forrest.validate.skins=${forrest.validate}
 #forrest.validate.skins.stylesheets=${forrest.validate.skins}
+forrest.validate.sitemap=false
 
 # *.failonerror=(true|false) - stop when an XML file is invalid
 #forrest.validate.failonerror=true

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Otis Gospodnetic
To: Nutch Developer List
Sent: Monday, January 5, 2009 5:21:04 PM
Subject: Site update

Hello,

Quick heads up - I'm about to regenerate the files (HTML + PDF) for the site and update it tomorrow according to the instructions on http://wiki.apache.org/nutch/Website_Update_HOWTO .

I have Forrest 0.8, and the site files were last generated with Forrest 0.7, so there will be some changes that are the result of this version increase. Locally, all HTML and PDF files generated with 0.8 look fine.

I haven't done this before for Nutch, so if there is something I should pay attention to, please let me know.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
Disclaimer: The opinions presented within are my own and do not reflect those of either NASA, JPL, or the California Institute of Technology.
[jira] Created: (NUTCH-677) Segment merge filering based on segment content
Segment merge filering based on segment content
-----------------------------------------------

Key: NUTCH-677
URL: https://issues.apache.org/jira/browse/NUTCH-677
Project: Nutch
Issue Type: Improvement
Affects Versions: 0.9.0
Reporter: Marcin Okraszewski
Fix For: 0.9.0

I needed segment filtering based on metadata detected during the parse phase. Unfortunately, the current URL-based filtering does not allow for this. So I have created a new SegmentMergeFilter extension, which receives the segment entry being merged and decides whether it should be included. Even though I needed only ParseData for my purpose, I made it more general-purpose, so the filter receives all merged data.

The attached patch is for version 0.9, which I use. Unfortunately, I didn't have time to check how it fits the trunk version. Sorry :(

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (NUTCH-677) Segment merge filering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:
------------------------------------

Attachment: MergeFilter.patch

The patch for 0.9

Attachments: MergeFilter.patch, SegmentMergeFilter.java
[jira] Updated: (NUTCH-677) Segment merge filering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:
------------------------------------

Attachment: SegmentMergeFilter.java

The filter interface (referred by the patch).

Attachments: MergeFilter.patch, SegmentMergeFilter.java
[jira] Updated: (NUTCH-677) Segment merge filering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:
------------------------------------

Attachment: SegmentMergeFilters.java

Merge filter aggregation which hides extension point, etc. It is referred by the patch.

Attachments: MergeFilter.patch, SegmentMergeFilter.java, SegmentMergeFilters.java
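
Editor's illustration: the NUTCH-677 mails describe a filter that sees a segment entry during merge and decides whether to keep it, plus an aggregation class that wraps the configured filters. The attachments themselves are not reproduced in this thread, so the following is only a minimal sketch of that idea under stated assumptions. The SegmentEntry type, the single-argument filter(...) signature, the "lang" and "noindex" metadata keys, and the merge(...) helper are all hypothetical stand-ins; this is not the code from MergeFilter.patch and it does not use the Nutch API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SegmentMergeFilterSketch {

    /** Hypothetical stand-in for one merged segment entry: a URL plus parse metadata. */
    static final class SegmentEntry {
        final String url;
        final Map<String, String> parseMeta;

        SegmentEntry(String url, String... keyValuePairs) {
            this.url = url;
            Map<String, String> meta = new HashMap<>();
            for (int i = 0; i + 1 < keyValuePairs.length; i += 2) {
                meta.put(keyValuePairs[i], keyValuePairs[i + 1]);
            }
            this.parseMeta = Collections.unmodifiableMap(meta);
        }
    }

    /** Hypothetical filter contract: return true to keep the entry in the merged segment. */
    interface SegmentMergeFilter {
        boolean filter(SegmentEntry entry);
    }

    /** Aggregates several filters; an entry is kept only if every filter accepts it. */
    static final class SegmentMergeFilters {
        private final List<SegmentMergeFilter> filters;

        SegmentMergeFilters(List<SegmentMergeFilter> filters) {
            this.filters = new ArrayList<>(filters);
        }

        boolean filter(SegmentEntry entry) {
            for (SegmentMergeFilter f : filters) {
                if (!f.filter(entry)) {
                    return false; // the first rejection drops the entry
                }
            }
            return true;
        }
    }

    /** Toy "merge": returns the URLs of the entries that pass all filters, in input order. */
    static List<String> merge(List<SegmentEntry> entries, SegmentMergeFilters filters) {
        List<String> kept = new ArrayList<>();
        for (SegmentEntry e : entries) {
            if (filters.filter(e)) {
                kept.add(e.url);
            }
        }
        return kept;
    }

    /** Sample data with made-up metadata keys ("lang", "noindex"). */
    static List<SegmentEntry> sampleEntries() {
        return Arrays.asList(
            new SegmentEntry("http://example.org/a", "lang", "en"),
            new SegmentEntry("http://example.org/b", "lang", "de"),
            new SegmentEntry("http://example.org/c", "lang", "en"),
            new SegmentEntry("http://example.org/d", "lang", "en", "noindex", "true"));
    }

    /** Two content-based filters that a URL-only filter could not express. */
    static SegmentMergeFilters sampleFilters() {
        SegmentMergeFilter englishOnly = entry -> "en".equals(entry.parseMeta.get("lang"));
        SegmentMergeFilter indexableOnly = entry -> !"true".equals(entry.parseMeta.get("noindex"));
        return new SegmentMergeFilters(Arrays.asList(englishOnly, indexableOnly));
    }

    public static void main(String[] args) {
        System.out.println(merge(sampleEntries(), sampleFilters()));
    }
}
```

In this sketch the aggregation class is the only thing the merge loop talks to, which mirrors the "hides extension point" remark above: how the individual filters are discovered (in Nutch, presumably through the plugin system) stays out of the merge code.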