[jira] Commented: (NUTCH-586) Add option to run compiled classes w/o job file

2007-12-04 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548198
 ] 

Enis Soztutar commented on NUTCH-586:
-

Can someone review this?

 Add option to run compiled classes w/o job file
 ---

 Key: NUTCH-586
 URL: https://issues.apache.org/jira/browse/NUTCH-586
 Project: Nutch
  Issue Type: New Feature
Affects Versions: 1.0.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 1.0.0

 Attachments: run-core_v1.patch


 bin/nutch adds nutch-*.job files under the build and base directories to the 
 classpath. However, building the job file takes a long time. We have a target 
 compile-core which builds only the core classes w/o plugins, but we need a 
 way to run the compiled core class files. An option to bin/nutch to run the 
 classes compiled with ant compile-core seems enough. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed

2007-12-04 Thread Andrea Spinelli (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548217
 ] 

Andrea Spinelli commented on NUTCH-585:
---

I absolutely agree that a more general solution is needed; however, I think 
that some current Nutch users might benefit from a quick fix.

If there is no opposition, I could submit a patch (less than 20 lines).

On the other hand, does anybody think that blocking selected portions of text 
could pose serious architectural or stability risks?

About the more general solution, do you think there is a viable path from here 
to there?

-- andrea


 [PARSE-HTML plugin] Block certain parts of HTML code from being indexed
 ---

 Key: NUTCH-585
 URL: https://issues.apache.org/jira/browse/NUTCH-585
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 0.9.0
 Environment: All operating systems
Reporter: Andrea Spinelli
Priority: Minor

 We are using nutch to index our own web sites; we would like not to index 
 certain parts of our pages, because we know they are not relevant (for 
 instance, there are several links to change the background color) and 
 generate spurious matches.
 We have modified the plugin so that it ignores HTML code between certain HTML 
 comments, like
 <!-- START-IGNORE -->
 ... ignored part ...
 <!-- STOP-IGNORE -->
 We feel this might be useful to someone else, perhaps with the comment 
 strings factored out as properties in the configuration files (say 
 parser.html.ignore.start and parser.html.ignore.stop in nutch-site.xml).
 We are almost ready to contribute our code snippet. We look forward to any 
 expression of interest, or to an explanation of why what we are doing is 
 plain wrong!
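
For illustration only, here is a minimal Java sketch of the marker-based stripping described above. It is not the reporter's patch. The class name IgnoreBlockStripper is hypothetical, the markers are assumed to be complete HTML comments, and a real plugin would presumably read them from the proposed parser.html.ignore.start and parser.html.ignore.stop properties instead of hard-coding them.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Nutch): removes text between two literal markers. */
class IgnoreBlockStripper {
    private final Pattern block;

    IgnoreBlockStripper(String startMarker, String stopMarker) {
        // Pattern.quote matches the markers literally; DOTALL lets '.' span newlines.
        // The reluctant ".*?" stops at the first stop marker, so separate blocks stay separate.
        this.block = Pattern.compile(
                Pattern.quote(startMarker) + ".*?" + Pattern.quote(stopMarker),
                Pattern.DOTALL);
    }

    /** Returns the input with every start..stop block (markers included) removed. */
    String strip(String html) {
        return block.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        IgnoreBlockStripper stripper =
                new IgnoreBlockStripper("<!-- START-IGNORE -->", "<!-- STOP-IGNORE -->");
        String page = "<p>keep</p><!-- START-IGNORE --><a href=\"#\">red</a>"
                + "<!-- STOP-IGNORE --><p>also keep</p>";
        System.out.println(stripper.strip(page)); // prints <p>keep</p><p>also keep</p>
    }
}
```

A start marker with no matching stop marker is left untouched by this sketch; whether that is the desired behaviour is a design decision the issue does not settle.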




[jira] Commented: (NUTCH-586) Add option to run compiled classes w/o job file

2007-12-04 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548233
 ] 

Andrzej Bialecki  commented on NUTCH-586:
-

+1. I think you also need to put a comment, which clarifies that this works 
only in the local Hadoop mode.




[jira] Commented: (NUTCH-581) DistributedSearch does not update search servers added to search-servers.txt on the fly

2007-12-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/NUTCH-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548252
 ] 

Doğacan Güney commented on NUTCH-581:
-

This patch conflicts with my patch in NUTCH-442 (which I really want to commit 
sometime) but that's my problem :). So +1 from me.



 DistributedSearch does not update search servers added to search-servers.txt 
 on the fly
 ---

 Key: NUTCH-581
 URL: https://issues.apache.org/jira/browse/NUTCH-581
 Project: Nutch
  Issue Type: Improvement
  Components: searcher
Affects Versions: 0.9.0
Reporter: Rohan Mehta
Priority: Minor
 Fix For: 0.9.0

 Attachments: NUTCH-581-2.patch, UpdateSearch.patch


 The DistributedSearch client does not pick up search servers added to the 
 search-servers.txt file on the fly. 
 This patch updates the search servers on the fly, so the client does not 
 need a restart.




[jira] Updated: (NUTCH-586) Add option to run compiled classes w/o job file

2007-12-04 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated NUTCH-586:


Attachment: run-core_v2.patch

bq. I think you also need to put a comment, which clarifies that this works 
only in the local Hadoop mode.
Agreed. This patch addresses that.

 Add option to run compiled classes w/o job file
 ---

 Key: NUTCH-586
 URL: https://issues.apache.org/jira/browse/NUTCH-586
 Project: Nutch
  Issue Type: New Feature
Affects Versions: 1.0.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 1.0.0

 Attachments: run-core_v1.patch, run-core_v2.patch


 bin/nutch adds nutch-*.job files under the build and base directories to the 
 classpath. However, building the job file takes a long time. We have a target 
 compile-core which builds only the core classes w/o plugins, but we need a 
 way to run the compiled core class files. An option to bin/nutch to run the 
 classes compiled with ant compile-core seems enough. 




[jira] Created: (NUTCH-588) Help Need

2007-12-04 Thread Teccon Ingenieros (JIRA)
Help Need
-

 Key: NUTCH-588
 URL: https://issues.apache.org/jira/browse/NUTCH-588
 Project: Nutch
  Issue Type: Task
  Components: indexer
Affects Versions: 0.7.2
 Environment: Linux
Reporter: Teccon Ingenieros


Hello,

We are trying to index a Word file. If we use the static URL 
(/servlet/jsp/documento.doc) it works, but if we try the same with a 
dynamic URL that generates the file (/servlet/jsp/leerFichero.jsp?id=112) it 
doesn't work: our URL is not indexed.
What can we do?

Regards,





[jira] Resolved: (NUTCH-581) DistributedSearch does not update search servers added to search-servers.txt on the fly

2007-12-04 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes resolved NUTCH-581.


Resolution: Fixed

Patch committed.  This patch checks the modified time on the search-servers.txt 
file and automatically reloads it if changed.  This allows adding and removing 
search servers on the fly.  Thanks Rohan.
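
As a rough illustration of the check-and-reload idea described in this comment (not the committed patch itself), the Java sketch below re-reads a server list whenever the file's modification time differs from the one seen at the last load. The class name ReloadingServerList is hypothetical, lines are treated as opaque strings rather than parsed, and java.nio.file is used for brevity even though it postdates this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: caches the lines of a file and reloads them when its mtime changes. */
class ReloadingServerList {
    private final Path file;
    private FileTime lastLoaded;                       // null until the first load
    private List<String> servers = new ArrayList<>();

    ReloadingServerList(Path file) {
        this.file = file;
    }

    /** Returns the current entries, re-reading the file only if its modified time changed. */
    synchronized List<String> get() throws IOException {
        FileTime modified = Files.getLastModifiedTime(file);
        if (!modified.equals(lastLoaded)) {
            List<String> fresh = new ArrayList<>();
            for (String line : Files.readAllLines(file)) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    fresh.add(trimmed);                // keep non-blank lines as-is
                }
            }
            servers = fresh;
            lastLoaded = modified;
        }
        return servers;
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass the path of a server list file as the first argument.
        ReloadingServerList list = new ReloadingServerList(Path.of(args[0]));
        System.out.println(list.get());
    }
}
```

Checking the timestamp on each call keeps the sketch simple; a client that is queried very often might instead poll on a timer so that searches do not touch the file system.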




[jira] Commented: (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed

2007-12-04 Thread Matt Kangas (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548420
 ] 

Matt Kangas commented on NUTCH-585:
---

Simplest path forward... that I can think of:

1) Add a new indexing plugin extension-point for filtering page content.
2) Put your a priori marked-up content exclusion logic into a plugin.
3) Someone else figures out a more general-purpose solution later, and swaps 
out your plugin at that time.

Ergo, you generalize the interface, and lazy-load the more general 
implementation. :-)
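
One way to picture the three steps above, purely as a sketch: a narrow filtering interface, an interim marker-based implementation behind it, and a chain that applies whatever filters are registered. None of these names (PageContentFilter, MarkerContentFilter, ContentFilterChain) exist in Nutch; they are assumptions made for the example, and the sketch does not use Nutch's real plugin machinery.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical extension point; not an existing Nutch interface. */
interface PageContentFilter {
    /** Returns the portion of the content that should remain indexable. */
    String filter(String content);
}

/** Interim implementation: drops everything between two literal markers. */
class MarkerContentFilter implements PageContentFilter {
    private final String start;
    private final String stop;

    MarkerContentFilter(String start, String stop) {
        this.start = start;
        this.stop = stop;
    }

    @Override
    public String filter(String content) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int s = content.indexOf(start, pos);
            if (s < 0) break;
            int e = content.indexOf(stop, s + start.length());
            if (e < 0) break;                 // unterminated block: keep the rest unchanged
            out.append(content, pos, s);      // text before the ignored block
            pos = e + stop.length();          // resume after the stop marker
        }
        out.append(content, pos, content.length());
        return out.toString();
    }
}

/** Applies the registered filters in order; a more general filter can replace the interim one. */
class ContentFilterChain {
    private final List<PageContentFilter> filters;

    ContentFilterChain(PageContentFilter... filters) {
        this.filters = Arrays.asList(filters);
    }

    String apply(String content) {
        for (PageContentFilter f : filters) {
            content = f.filter(content);
        }
        return content;
    }
}
```

Because callers only see the interface, swapping the marker-based filter for a more general one later would not require changes on the calling side, which is the point of step 3.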





[jira] Created: (NUTCH-589) Hierarchical Classloaders

2007-12-04 Thread Ryan Levering (JIRA)
Hierarchical Classloaders
-

 Key: NUTCH-589
 URL: https://issues.apache.org/jira/browse/NUTCH-589
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 0.9.0
Reporter: Ryan Levering
Priority: Minor


Currently the Nutch plugin classloader flattens all the jars from a plugin's 
dependencies and instantiates a new classloader for each plugin.  I think it 
would be better to create a hierarchical classloader chain.  Currently plugins 
can't pass objects from a common plugin to one another because the objects are 
created using different classloaders.  Nutch currently avoids this by only 
using interfaces from a common classloader to pass objects between plugins, but 
I can't see the harm in improving the plugin classloader.  It would require a 
change to PluginDescription and PluginClassLoader in order to override 
ClassLoader to maintain the export filter functionality that currently exists.
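
To make the proposal concrete, here is a small Java sketch of a parent-chained loader arrangement using the standard java.net.URLClassLoader. It is not Nutch's PluginClassLoader: the class and method names are hypothetical, it assumes each plugin has a single parent, and it does not model the export filter mentioned above.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical sketch of a hierarchical loader chain; not Nutch's PluginClassLoader. */
class PluginLoaderChain {

    /** Loader for a plugin whose classes other plugins need to share. */
    static URLClassLoader forCommonPlugin(URL[] commonJars, ClassLoader core) {
        return new URLClassLoader(commonJars, core);
    }

    /** Loader for a dependent plugin: holds only its own jars and delegates upward. */
    static URLClassLoader forDependentPlugin(URL[] ownJars, ClassLoader common) {
        return new URLClassLoader(ownJars, common);
    }

    /** True if both loaders resolve the name to the very same Class object. */
    static boolean sameClass(ClassLoader a, ClassLoader b, String name) {
        try {
            return a.loadClass(name) == b.loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader core = PluginLoaderChain.class.getClassLoader();
        // Empty jar lists keep the demo self-contained; real plugins would pass their jar URLs.
        URLClassLoader common = forCommonPlugin(new URL[0], core);
        URLClassLoader pluginA = forDependentPlugin(new URL[0], common);
        URLClassLoader pluginB = forDependentPlugin(new URL[0], common);
        System.out.println(sameClass(pluginA, pluginB, PluginLoaderChain.class.getName()));
    }
}
```

Since both dependent loaders delegate to the same parent before looking in their own jars, a class found by the parent is one Class object for both of them, which is what lets plugins exchange instances of it. A plugin that depends on several other plugins would need something beyond a single parent chain.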




Nutch\nutch-0.9\build.xml:61: Specify at least one source--a file or resource collection.

2007-12-04 Thread quxy
I want to develop a simple plugin, but when I run ant I get the error 
"build.xml:61: Specify at least one source--a file or resource collection."

Could anyone tell me how to fix it? Thank you very much.




quxy
2007-12-05


[jira] Commented: (NUTCH-581) DistributedSearch does not update search servers added to search-servers.txt on the fly

2007-12-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548541
 ] 

Hudson commented on NUTCH-581:
--

Integrated in Nutch-Nightly #285 (See 
[http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/285/])
