Re: Site update

2009-01-08 Thread Mattmann, Chris A
Hey Otis,

Weird, must have been caused when I checked out SVN and updated the site
last time?

Anyways, I ran chmod -R g+w from the top level nutch site checkout, so give
it a try now...

Cheers,
Chris
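
For anyone who hits the same "Permission denied" on svn up: the manual check
in the quoted message below can be approximated with a small program that
walks a checkout and reports directories lacking the group-write bit. This is
only an illustrative sketch, not something that was run on minotaur. The names
GroupWriteCheck and notGroupWritable are made up for the example, and it
assumes a POSIX file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of Nutch or Subversion.
class GroupWriteCheck {

    // Returns every directory under root (root included) that lacks the
    // group-write bit. Assumes a POSIX file system.
    static List<Path> notGroupWritable(Path root) throws IOException {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            Iterator<Path> it = paths.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                if (!Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS)) {
                    continue; // only directories matter for creating lock files
                }
                if (!Files.getPosixFilePermissions(p, LinkOption.NOFOLLOW_LINKS)
                        .contains(PosixFilePermission.GROUP_WRITE)) {
                    result.add(p);
                }
            }
        }
        return result;
    }
}
```

Pointed at the top of the site checkout, it would list the directories whose
owner still needs to run chmod g+w on them.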



On 1/6/09 8:30 AM, Otis Gospodnetic ogjunk-nu...@yahoo.com wrote:

 Hm, permission problem.  I *think* I need Chris' help with this:

 [o...@minotaur /www/lucene.apache.org/nutch]$ svn up
 svn: Can't open file 'skin/translations/.svn/lock': Permission denied

 [o...@minotaur /www/lucene.apache.org/nutch]$ groups
 otis apcvs jakarta incubator lucene

 [o...@minotaur /www/lucene.apache.org/nutch]$ umask
 0022

 [o...@minotaur /www/lucene.apache.org/nutch]$ ls -al skin/translations/.svn/
 total 18
 drwxr-xr-x  6 mattmann  lucene  512 Apr  6  2007 .   <== lucene group, but not writable; Chris owns it
 drwxr-xr-x  3 mattmann  lucene  512 Apr  6  2007 ..
 -r--r--r--  1 mattmann  lucene  109 Apr  6  2007 all-wcprops
 -r--r--r--  1 mattmann  lucene  260 Apr  6  2007 entries
 -r--r--r--  1 mattmann  lucene    2 Apr  6  2007 format
 drwxr-xr-x  2 mattmann  lucene  512 Apr  6  2007 prop-base
 drwxr-xr-x  2 mattmann  lucene  512 Apr  6  2007 props
 drwxr-xr-x  2 mattmann  lucene  512 Apr  6  2007 text-base
 drwxr-xr-x  5 mattmann  lucene  512 Apr  6  2007 tmp



 It looks like the problem is that I don't have write permissions there:

 [o...@minotaur /www/lucene.apache.org/nutch]$ touch skin/translations/.svn/foo
 touch: skin/translations/.svn/foo: Permission denied

 [o...@minotaur /www/lucene.apache.org/nutch]$ chmod g+w skin/translations/.svn
 chmod: skin/translations/.svn: Operation not permitted

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 ----- Original Message ----
 From: Otis Gospodnetic ogjunk-nu...@yahoo.com
 To: nutch-dev@lucene.apache.org
 Sent: Monday, January 5, 2009 5:24:39 PM
 Subject: Re: Site update

 One more thing.  Forrest 0.8 wouldn't generate site files without me making the
 following change (so I'll commit this, too, unless somebody thinks this is bad):

 $ svn diff src/site
 Index: src/site/forrest.properties
 ===================================================================
 --- src/site/forrest.properties (revision 729973)
 +++ src/site/forrest.properties (working copy)
 @@ -73,6 +73,7 @@
 #forrest.validate.stylesheets=${forrest.validate}
 #forrest.validate.skins=${forrest.validate}
 #forrest.validate.skins.stylesheets=${forrest.validate.skins}
 +forrest.validate.sitemap=false

 # *.failonerror=(true|false) - stop when an XML file is invalid
 #forrest.validate.failonerror=true


 Otis



 ----- Original Message ----
 From: Otis Gospodnetic
 To: Nutch Developer List
 Sent: Monday, January 5, 2009 5:21:04 PM
 Subject: Site update

 Hello,

 Quick heads up - I'm about to regenerate the files (HTML + PDF) for the site and
 update it tomorrow according to the instructions on
 http://wiki.apache.org/nutch/Website_Update_HOWTO .  I have Forrest 0.8, and the
 site files were last generated with Forrest 0.7, so there will be some changes
 that are the result of this version increase.  Locally, all HTML and PDF files
 generated with 0.8 look fine.  I haven't done this before for Nutch, so if there
 is something I should pay attention to, please let me know.


 Otis



++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.




[jira] Created: (NUTCH-677) Segment merge filtering based on segment content

2009-01-08 Thread Marcin Okraszewski (JIRA)
Segment merge filtering based on segment content
------------------------------------------------

                 Key: NUTCH-677
                 URL: https://issues.apache.org/jira/browse/NUTCH-677
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 0.9.0
            Reporter: Marcin Okraszewski
             Fix For: 0.9.0


I needed segment filtering based on metadata detected during the parse phase.
Unfortunately, the current URL-based filtering does not allow this. So I have
created a new SegmentMergeFilter extension that receives each segment entry
being merged and decides whether it should be included. Although I needed only
ParseData for my purpose, I made it more general, so the filter receives all
of the merged data.

The attached patch is for version 0.9, which I use. Unfortunately, I didn't have
time to check how it fits the trunk version. Sorry :(
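
As a rough illustration of the idea described above, a filter of this kind
might look like the sketch below. It is not the attached
SegmentMergeFilter.java: the SegmentEntry stand-in type, the filter(...)
signature, the MetadataMergeFilter class and the "lang" metadata key used in
the usage note are all assumptions made up for the example, and the real Nutch
types (such as ParseData) are replaced by a plain map so that the code
compiles on its own.

```java
import java.util.Map;

// Stand-in for one merged segment entry. A real entry would carry the
// Nutch data objects; only parse metadata is modelled here.
final class SegmentEntry {
    final String url;
    final Map<String, String> parseMeta;

    SegmentEntry(String url, Map<String, String> parseMeta) {
        this.url = url;
        this.parseMeta = parseMeta;
    }
}

// Assumed shape of the extension point: return true to keep the entry
// in the merged segment, false to drop it.
interface SegmentMergeFilter {
    boolean filter(SegmentEntry entry);
}

// Example filter: keep only entries whose parse metadata holds the
// wanted value under the given key.
final class MetadataMergeFilter implements SegmentMergeFilter {
    private final String key;
    private final String wanted;

    MetadataMergeFilter(String key, String wanted) {
        this.key = key;
        this.wanted = wanted;
    }

    @Override
    public boolean filter(SegmentEntry entry) {
        // A missing key yields null, which never equals the wanted value.
        return wanted.equals(entry.parseMeta.get(key));
    }
}
```

With new MetadataMergeFilter("lang", "en"), an entry whose metadata maps
"lang" to "en" would be kept and any other entry dropped, which is the kind of
decision URL-based filtering cannot make.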

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content

2009-01-08 Thread Marcin Okraszewski (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcin Okraszewski updated NUTCH-677:
-

Attachment: MergeFilter.patch

The patch for 0.9

 Segment merge filtering based on segment content
 ------------------------------------------------

                  Key: NUTCH-677
                  URL: https://issues.apache.org/jira/browse/NUTCH-677
              Project: Nutch
           Issue Type: Improvement
     Affects Versions: 0.9.0
             Reporter: Marcin Okraszewski
              Fix For: 0.9.0

          Attachments: MergeFilter.patch, SegmentMergeFilter.java





[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content

2009-01-08 Thread Marcin Okraszewski (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcin Okraszewski updated NUTCH-677:
-

Attachment: SegmentMergeFilter.java

The filter interface (referred to by the patch).

 Segment merge filtering based on segment content
 ------------------------------------------------

                  Key: NUTCH-677
                  URL: https://issues.apache.org/jira/browse/NUTCH-677
              Project: Nutch
           Issue Type: Improvement
     Affects Versions: 0.9.0
             Reporter: Marcin Okraszewski
              Fix For: 0.9.0

          Attachments: MergeFilter.patch, SegmentMergeFilter.java





[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content

2009-01-08 Thread Marcin Okraszewski (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcin Okraszewski updated NUTCH-677:
-

Attachment: SegmentMergeFilters.java

Merge filter aggregation, which hides the extension point, etc. It is referred
to by the patch.

 Segment merge filtering based on segment content
 ------------------------------------------------

                  Key: NUTCH-677
                  URL: https://issues.apache.org/jira/browse/NUTCH-677
              Project: Nutch
           Issue Type: Improvement
     Affects Versions: 0.9.0
             Reporter: Marcin Okraszewski
              Fix For: 0.9.0

          Attachments: MergeFilter.patch, SegmentMergeFilter.java, SegmentMergeFilters.java

