[jira] Commented: (NUTCH-563) Include custom fields in BasicQueryFilter

2008-11-26 Thread Jasper Kamperman (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651208#action_12651208
 ] 

Jasper Kamperman commented on NUTCH-563:


Hi Davide,

My laptop, which has nutch-0.9 on it, is in the shop, so I can't verify where that 
file is, but I think it is entirely possible that nutch-0.8 doesn't yet have 
a BasicQueryFilter.java file.

Sorry I can't be of more help. I'm CC'ing the original author of the patch, but 
he just became a father, so it might be a while until you hear from him :-).

Jasper



> Include custom fields in BasicQueryFilter
> -
>
> Key: NUTCH-563
> URL: https://issues.apache.org/jira/browse/NUTCH-563
> Project: Nutch
>  Issue Type: New Feature
>  Components: searcher
>Reporter: julien nioche
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: diff.BasicQueryFilter.dynamicFields.txt
>
>
> This patch allows additional fields to be included in the BasicQueryFilter by 
> specifying runtime parameters.  Any parameter matching the regular expression 
> ("query\\.basic\\.(.+)\\.boost") will be added to the list of fields to be 
> used by the BQF, and the specified float value will be used as the boost.
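
As a rough illustration of the parameter convention in the issue description (this is not the code from the attached patch), the Java sketch below scans a set of properties for keys matching the pattern and collects each field name with its boost. The class name BoostParams, the method parseBoosts, and the use of java.util.Properties in place of Nutch's own configuration object are assumptions made for the example.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not taken from the NUTCH-563 patch.
class BoostParams {
    // Pattern from the issue description; capture group 1 is the field name.
    private static final Pattern BOOST_KEY =
        Pattern.compile("query\\.basic\\.(.+)\\.boost");

    /** Collects field -> boost for every property whose key matches the pattern. */
    static Map<String, Float> parseBoosts(Properties props) {
        Map<String, Float> boosts = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            Matcher m = BOOST_KEY.matcher(key);
            if (m.matches()) {
                // Float.parseFloat throws NumberFormatException on a malformed value.
                boosts.put(m.group(1), Float.parseFloat(props.getProperty(key)));
            }
        }
        return boosts;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("query.basic.title.boost", "1.5");
        props.setProperty("query.basic.author.boost", "2.0");
        props.setProperty("http.agent.name", "example"); // ignored: does not match
        System.out.println(parseBoosts(props));
    }
}
```

Because the field name is whatever the capture group matches, any field can be added at runtime without code changes, which seems to be the point of the patch.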

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Pending Commits for Nutch Issues

2008-11-26 Thread Dennis Kubes
If nobody has a problem with them I would like to commit the following 
issues in the next day or two:


NUTCH-663: Upgrade Nutch to the most recent Hadoop version (0.19)
NUTCH-662: Upgrade Nutch to the most recent Lucene version (2.4)
NUTCH-647: Resolve URLs tool
NUTCH-665: Search Load Testing Tool
NUTCH-667: Input Format for working with Content in Hadoop Streaming

And I would like to commit these in < a week:

NUTCH-635: LinkAnalysis Tool for Nutch
NUTCH-646: New Indexing framework for Nutch
NUTCH-594: Serve Nutch search results in XML and JSON
NUTCH-666: Analysis plugins and new language identifier.

There are others too but these are the ones I am trying to get moved 
into trunk right now.


Dennis


[jira] Updated: (NUTCH-646) New Indexing Framework for Nutch

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-646:
---

Attachment: NUTCH-646-2-20081126.patch

Updated indexing patch.

> New Indexing Framework for Nutch
> 
>
> Key: NUTCH-646
> URL: https://issues.apache.org/jira/browse/NUTCH-646
> Project: Nutch
>  Issue Type: New Feature
>  Components: indexer
>Affects Versions: 0.9.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
> Fix For: 0.9.0, 1.0.0
>
> Attachments: arity-1.3.2.jar, NUTCH-646-1-20080818.patch, 
> NUTCH-646-2-20081126.patch
>
>
> New indexing framework for Nutch that provides a more generic field 
> abstraction consistent with Lucene index semantics.  Allows multiple MR jobs 
> to be created for different fields and those fields to be aggregated and 
> indexed in the end.  Overcomes limitations of the current indexer that limits 
> what databases are passed into the indexer.  Creates a new extension point as 
> well for field-filters for manipulation of fields during the indexing process.




[jira] Commented: (NUTCH-563) Include custom fields in BasicQueryFilter

2008-11-26 Thread Davide (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651152#action_12651152
 ] 

Davide commented on NUTCH-563:
--

Hi Jasper,

Could you explain to me how to apply it? I can't find the right file to apply the 
diff to.

Thank you a lot!

> Include custom fields in BasicQueryFilter
> -
>
> Key: NUTCH-563
> URL: https://issues.apache.org/jira/browse/NUTCH-563
> Project: Nutch
>  Issue Type: New Feature
>  Components: searcher
>Reporter: julien nioche
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: diff.BasicQueryFilter.dynamicFields.txt
>
>
> This patch allows additional fields to be included in the BasicQueryFilter by 
> specifying runtime parameters.  Any parameter matching the regular expression 
> ("query\\.basic\\.(.+)\\.boost") will be added to the list of fields to be 
> used by the BQF, and the specified float value will be used as the boost.




[Nutch Wiki] Update of "PluginCentral" by johnroman

2008-11-26 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by johnroman:
http://wiki.apache.org/nutch/PluginCentral

--
   * WritingPluginExample - A step-by-step example of how to write a plugin for 
the 0.7 branch. - updated by LucasBoullosa
   * [http://wiki.media-style.com/display/nutchDocu/Write+a+plugin Writing 
Plugins] - by Stefan
  
- == Plugins that Come with Nutch (0.7) ==
+ == Plugins that Come with Nutch (0.9) ==
  
  In order to get Nutch to use any of these plugins, you just need to edit your 
conf/nutch-site.xml file and add the name of the plugin to the list of 
plugin.includes.
  
@@ -24, +24 @@

   * '''parse-html''' - Parses HTML documents
   * '''parse-js''' - Parses Java``Script
   * '''parse-mp3''' - Parses MP3s
+  * '''parse-zip''' - Parses ZIP archives
+  * '''parse-mspowerpoint''' - Parses Microsoft Powerpoint files
   * '''parse-msword''' - Parses MS Word documents
+  * '''parse-msexcel''' - Parses MS Excel documents
   * '''parse-pdf''' - Parses PDFs
   * '''parse-rss''' - Parses RSS feeds
+  * '''parse-oo''' - Parses OpenOffice files
+  * '''parse-swf''' - Parses Shockwave Flash
   * '''parse-rtf''' - Parses RTF files
   * '''parse-text''' - Parses text documents
   * '''protocol-file''' - Retrieves documents from the filesystem
@@ -47, +52 @@

   * '''lib-commons-httpclient'''
   * '''lib-http'''
   * '''lib-jakarta-poi'''
-  * '''lib-log4j'''
+  * '''lib-log4j''' 
-  * '''lib-lucene-analyzers'''
+  * '''lib-lucene-analyzers''' - Lucene analyzers
-  * '''lib-nekohtml'''
-  * '''lib-parsems'''
+  * '''lib-nekohtml''' - automatic tag balancer 
+  * '''lib-parsems''' - parse ms documents framework
   * '''parse-msexcel''' - Parses MS Excel documents
   * '''parse-mspowerpoint''' - Parses MS Powerpoint documents
   * '''parse-oo''' - Parses Open Office and Star Office documents 
(Extensions: ODT, OTT, ODH, ODM, ODS, OTS, ODP, OTP, SXW, STW, SXC, STC, SXI, 
STI)
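
The plugin.includes setting mentioned on the wiki page is a regular expression over plugin names. The Java sketch below illustrates how such an expression could select plugins, assuming a plugin is enabled only when its whole name matches. The class PluginIncludes, the method enabled, and the sample expression are hypothetical and are not a recommended configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of matching plugin names against plugin.includes.
class PluginIncludes {
    /** Returns the plugin names that fully match the includes expression. */
    static List<String> enabled(String includesRegex, List<String> pluginNames) {
        Pattern includes = Pattern.compile(includesRegex);
        List<String> result = new ArrayList<>();
        for (String name : pluginNames) {
            // matches() requires the whole name to match, not just a prefix.
            if (includes.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String includes = "protocol-http|parse-(text|html|zip)"; // sample value only
        List<String> available =
            Arrays.asList("parse-zip", "parse-pdf", "parse-html", "protocol-http");
        System.out.println(enabled(includes, available));
    }
}
```

Under this assumption, adding a plugin such as parse-zip means extending the alternation in the expression rather than adding a separate entry.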


[jira] Commented: (NUTCH-563) Include custom fields in BasicQueryFilter

2008-11-26 Thread Jasper Kamperman (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651093#action_12651093
 ] 

Jasper Kamperman commented on NUTCH-563:


Hi Davide,

I never tried to apply it to 0.8, sorry.

Jasper



> Include custom fields in BasicQueryFilter
> -
>
> Key: NUTCH-563
> URL: https://issues.apache.org/jira/browse/NUTCH-563
> Project: Nutch
>  Issue Type: New Feature
>  Components: searcher
>Reporter: julien nioche
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: diff.BasicQueryFilter.dynamicFields.txt
>
>
> This patch allows additional fields to be included in the BasicQueryFilter by 
> specifying runtime parameters.  Any parameter matching the regular expression 
> ("query\\.basic\\.(.+)\\.boost") will be added to the list of fields to be 
> used by the BQF, and the specified float value will be used as the boost.




Troubles while creating a plugin

2008-11-26 Thread Pau
Hello,
I am creating a plugin for Nutch that extends the QueryFilter.
I get a successful compilation with "ant" and "ant war", but when I do a
search, I get the following exception:

26/11/2008 18:50:07 org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.NoClassDefFoundError: org/apache/commons/codec/DecoderException
    at org.apache.tika.mime.MimeTypesReader.readMatch(MimeTypesReader.java:272)
    at org.apache.tika.mime.MimeTypesReader.readMatches(MimeTypesReader.java:221)
    at org.apache.tika.mime.MimeTypesReader.readMagic(MimeTypesReader.java:201)
    at org.apache.tika.mime.MimeTypesReader.readMimeType(MimeTypesReader.java:164)
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:138)
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:121)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:56)
    at org.apache.nutch.util.MimeUtil.<init>(MimeUtil.java:62)
    at org.apache.nutch.protocol.Content.<init>(Content.java:85)
    at org.apache.nutch.personalizedsearch.searcher.context.ContextQueryFilter.filter(ContextQueryFilter.java:55)
    at org.apache.nutch.searcher.QueryFilters.filter(QueryFilters.java:111)
    at org.apache.nutch.searcher.IndexSearcher.search(IndexSearcher.java:96)
    at org.apache.nutch.searcher.NutchBean.search(NutchBean.java:251)
    at org.apache.jsp.search_jsp._jspService(search_jsp.java:284)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)

The DecoderException class is in commons-codec-1.3.jar, so I added the jar
file to my plugin.xml:
   
  
  
 
  
  
   

But the same error appears. Any idea on what I may be doing wrong?
Thanks.
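
The trace shows the error being raised inside Tika's MimeTypesReader, reached through the Nutch Content constructor, which is code outside the plugin itself. One possibility, offered as an assumption rather than a diagnosis, is that those classes are loaded by a class loader that cannot see a jar declared only in the plugin's plugin.xml. The small Java check below reports whether a given class loader can locate a class. The class name ClassVisibility and the method isVisible are hypothetical.

```java
// Hypothetical diagnostic helper; not part of Nutch.
class ClassVisibility {
    /** Returns true if the given class loader can locate the named class. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.commons.codec.DecoderException";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.println(name + " visible: " + isVisible(name, loader));
    }
}
```

Calling isVisible from inside the query filter, once with the filter's own class loader and once with the class loader of org.apache.nutch.protocol.Content, would show whether the two see different sets of jars.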


[jira] Updated: (NUTCH-667) Input Format for working with Content in Hadoop Streaming

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-667:
---

Summary: Input Format for working with Content in Hadoop Streaming  (was: 
Input Forma for working with Content in Hadoop Streaming)

> Input Format for working with Content in Hadoop Streaming
> -
>
> Key: NUTCH-667
> URL: https://issues.apache.org/jira/browse/NUTCH-667
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
>Priority: Minor
> Fix For: 1.0.0
>
> Attachments: NUTCH-667-1-20081126.patch
>
>
> This is a ContextAsText input format that replaces line endings with spaces, 
> which allows Nutch content to be used more effectively inside Hadoop 
> streaming jobs.  Streaming allows MapReduce jobs to be written in any language 
> that can communicate over stdin and stdout.
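
Hadoop streaming hands records to the external program as lines of text, so content that contains line breaks has to be flattened first. The Java sketch below shows only that flattening step, under the assumption that each run of line-ending characters becomes a single space. The class ContentLines and the method toSingleLine are hypothetical and are not taken from the attached patch.

```java
// Hypothetical illustration of flattening content to one line per record.
class ContentLines {
    /** Replaces each run of CR/LF characters with a single space. */
    static String toSingleLine(String content) {
        return content.replaceAll("[\\r\\n]+", " ");
    }

    public static void main(String[] args) {
        String page = "<html>\n<body>Hello</body>\r\n</html>";
        // The flattened page can be written as one line-oriented record.
        System.out.println(toSingleLine(page));
    }
}
```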




[jira] Updated: (NUTCH-667) Input Forma for working with Content in Hadoop Streaming

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-667:
---

Attachment: NUTCH-667-1-20081126.patch

Input format for working with hadoop streaming.

> Input Forma for working with Content in Hadoop Streaming
> 
>
> Key: NUTCH-667
> URL: https://issues.apache.org/jira/browse/NUTCH-667
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
>Priority: Minor
> Fix For: 1.0.0
>
> Attachments: NUTCH-667-1-20081126.patch
>
>
> This is a ContextAsText input format that replaces line endings with spaces, 
> which allows Nutch content to be used more effectively inside Hadoop 
> streaming jobs.  Streaming allows MapReduce jobs to be written in any language 
> that can communicate over stdin and stdout.




[jira] Created: (NUTCH-667) Input Forma for working with Content in Hadoop Streaming

2008-11-26 Thread Dennis Kubes (JIRA)
Input Forma for working with Content in Hadoop Streaming


 Key: NUTCH-667
 URL: https://issues.apache.org/jira/browse/NUTCH-667
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Priority: Minor
 Fix For: 1.0.0


This is a ContextAsText input format that replaces line endings with spaces, 
which allows Nutch content to be used more effectively inside Hadoop streaming 
jobs.  Streaming allows MapReduce jobs to be written in any language that can 
communicate over stdin and stdout.




[jira] Updated: (NUTCH-635) LinkAnalysis Tool for Nutch

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-635:
---

Attachment: (was: NUTCH-635-8-20080818.patch)

> LinkAnalysis Tool for Nutch
> ---
>
> Key: NUTCH-635
> URL: https://issues.apache.org/jira/browse/NUTCH-635
> Project: Nutch
>  Issue Type: New Feature
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
> Fix For: 1.0.0
>
> Attachments: NUTCH-635-1-20080612.patch, NUTCH-635-2-20080613.patch, 
> NUTCH-635-3-20080614.patch, NUTCH-635-4-20080615.patch, 
> NUTCH-635-5-20080620.patch, NUTCH-635-6-20080725.patch, 
> NUTCH-635-7-20080808.patch, NUTCH-635-9-20081126.patch
>
>
> This is a basic PageRank-type link analysis tool for Nutch which simulates a 
> sparse matrix using inlinks and outlinks and converges after a given number 
> of iterations.  This tool is meant to replace the current scoring system in 
> Nutch with a system that converges instead of exponentially increasing 
> scores.  Also includes a tool to create an outlinkdb.
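
For readers unfamiliar with the idea, the Java sketch below runs a textbook PageRank-style iteration over an in-memory outlink map for a fixed number of iterations. It is not the MapReduce implementation from the attached patch. The class LinkRankSketch, the method rank, the damping parameter, and the handling of pages without outlinks are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory sketch; the real tool works on Nutch link databases.
class LinkRankSketch {
    /** Runs a fixed number of PageRank-style iterations over the outlink map. */
    static Map<String, Double> rank(Map<String, List<String>> outlinks,
                                    int iterations, double damping) {
        // Nodes are all link sources plus all link targets.
        Set<String> nodes = new HashSet<>(outlinks.keySet());
        for (List<String> targets : outlinks.values()) {
            nodes.addAll(targets);
        }

        Map<String, Double> scores = new HashMap<>();
        for (String node : nodes) {
            scores.put(node, 1.0);
        }

        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            for (String node : nodes) {
                next.put(node, 1.0 - damping);
            }
            for (String source : nodes) {
                List<String> targets =
                    outlinks.getOrDefault(source, Collections.emptyList());
                if (targets.isEmpty()) {
                    continue; // in this sketch a page without outlinks passes nothing on
                }
                // Each outlink receives an equal share of the source's damped score.
                double share = damping * scores.get(source) / targets.size();
                for (String target : targets) {
                    next.merge(target, share, Double::sum);
                }
            }
            scores = next;
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("c"));
        graph.put("b", Arrays.asList("c"));
        graph.put("c", Arrays.asList("a"));
        System.out.println(rank(graph, 10, 0.85));
    }
}
```

Because each page only redistributes a damped fraction of its current score, the total cannot grow without bound from one iteration to the next, in contrast to the exponentially increasing scores the issue describes.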




[jira] Updated: (NUTCH-635) LinkAnalysis Tool for Nutch

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-635:
---

Attachment: NUTCH-635-9-20081126.patch

Updated final patch for new link analysis framework.  I am also going to write 
up some documentation on the wiki for how this new process works.

> LinkAnalysis Tool for Nutch
> ---
>
> Key: NUTCH-635
> URL: https://issues.apache.org/jira/browse/NUTCH-635
> Project: Nutch
>  Issue Type: New Feature
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
> Fix For: 1.0.0
>
> Attachments: NUTCH-635-1-20080612.patch, NUTCH-635-2-20080613.patch, 
> NUTCH-635-3-20080614.patch, NUTCH-635-4-20080615.patch, 
> NUTCH-635-5-20080620.patch, NUTCH-635-6-20080725.patch, 
> NUTCH-635-7-20080808.patch, NUTCH-635-9-20081126.patch
>
>
> This is a basic PageRank-type link analysis tool for Nutch which simulates a 
> sparse matrix using inlinks and outlinks and converges after a given number 
> of iterations.  This tool is meant to replace the current scoring system in 
> Nutch with a system that converges instead of exponentially increasing 
> scores.  Also includes a tool to create an outlinkdb.




[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.19

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-663:
---

Attachment: (was: NUTCH-663-1-20081126.patch)

> Upgrade Nutch to use Hadoop 0.19
> 
>
> Key: NUTCH-663
> URL: https://issues.apache.org/jira/browse/NUTCH-663
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
> Fix For: 1.0.0
>
> Attachments: hadoop-0.19-native.tar.gz, hadoop-0.19.0-core.jar, 
> NUTCH-663-1-20081126.patch
>
>
> Upgrade Nutch to use a newer hadoop, version 0.18.2.  This includes 
> performance improvements, bug fixes, and new functionality.  Changes some 
> current APIs.




[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.19

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-663:
---

Attachment: NUTCH-663-1-20081126.patch

Updated patch to include API changes in Nutch classes.

> Upgrade Nutch to use Hadoop 0.19
> 
>
> Key: NUTCH-663
> URL: https://issues.apache.org/jira/browse/NUTCH-663
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
> Fix For: 1.0.0
>
> Attachments: hadoop-0.19-native.tar.gz, hadoop-0.19.0-core.jar, 
> NUTCH-663-1-20081126.patch
>
>
> Upgrade Nutch to use a newer hadoop, version 0.18.2.  This includes 
> performance improvements, bug fixes, and new functionality.  Changes some 
> current APIs.




[jira] Updated: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-666:
---

Attachment: NUTCH-666-1-20081126.patch

Fixed patch.  Now includes the changes to AnalyzerFactory to allow multiple 
languages per plugin.

> Analysis plugins for multiple language and new Language Identifier Tool
> ---
>
> Key: NUTCH-666
> URL: https://issues.apache.org/jira/browse/NUTCH-666
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
> Fix For: 1.0.0
>
> Attachments: NUTCH-666-1-20081126.patch
>
>
> Add analysis plugins for Czech, Greek, Japanese, Chinese, Korean, Dutch, 
> Russian, and Thai.  Also includes a new Language Identifier tool that uses 
> the new indexing framework in NUTCH-646.




[jira] Updated: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-666:
---

Attachment: (was: NUTCH-666-1-20081126.patch)

> Analysis plugins for multiple language and new Language Identifier Tool
> ---
>
> Key: NUTCH-666
> URL: https://issues.apache.org/jira/browse/NUTCH-666
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
> Fix For: 1.0.0
>
> Attachments: NUTCH-666-1-20081126.patch
>
>
> Add analysis plugins for Czech, Greek, Japanese, Chinese, Korean, Dutch, 
> Russian, and Thai.  Also includes a new Language Identifier tool that uses 
> the new indexing framework in NUTCH-646.




[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.19

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-663:
---

Summary: Upgrade Nutch to use Hadoop 0.19  (was: Upgrade Nutch to use 
Hadoop 0.18.2)

change to 0.19 instead of 0.18.2

> Upgrade Nutch to use Hadoop 0.19
> 
>
> Key: NUTCH-663
> URL: https://issues.apache.org/jira/browse/NUTCH-663
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
> Fix For: 1.0.0
>
> Attachments: hadoop-0.19-native.tar.gz, hadoop-0.19.0-core.jar, 
> NUTCH-663-1-20081126.patch
>
>
> Upgrade Nutch to use a newer hadoop, version 0.18.2.  This includes 
> performance improvements, bug fixes, and new functionality.  Changes some 
> current APIs.




[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.18.2

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-663:
---

Attachment: hadoop-0.19.0-core.jar

Hadoop core jar

> Upgrade Nutch to use Hadoop 0.18.2
> --
>
> Key: NUTCH-663
> URL: https://issues.apache.org/jira/browse/NUTCH-663
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
> Fix For: 1.0.0
>
> Attachments: hadoop-0.19-native.tar.gz, hadoop-0.19.0-core.jar, 
> NUTCH-663-1-20081126.patch
>
>
> Upgrade Nutch to use a newer hadoop, version 0.18.2.  This includes 
> performance improvements, bug fixes, and new functionality.  Changes some 
> current APIs.




[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.18.2

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-663:
---

Attachment: NUTCH-663-1-20081126.patch

Updates jar and native files

> Upgrade Nutch to use Hadoop 0.18.2
> --
>
> Key: NUTCH-663
> URL: https://issues.apache.org/jira/browse/NUTCH-663
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
> Fix For: 1.0.0
>
> Attachments: hadoop-0.19-native.tar.gz, NUTCH-663-1-20081126.patch
>
>
> Upgrade Nutch to use a newer hadoop, version 0.18.2.  This includes 
> performance improvements, bug fixes, and new functionality.  Changes some 
> current APIs.




[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.18.2

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-663:
---

Attachment: hadoop-0.19-native.tar.gz

Native files

> Upgrade Nutch to use Hadoop 0.18.2
> --
>
> Key: NUTCH-663
> URL: https://issues.apache.org/jira/browse/NUTCH-663
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
> Fix For: 1.0.0
>
> Attachments: hadoop-0.19-native.tar.gz, NUTCH-663-1-20081126.patch
>
>
> Upgrade Nutch to use a newer hadoop, version 0.18.2.  This includes 
> performance improvements, bug fixes, and new functionality.  Changes some 
> current APIs.




[jira] Updated: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-666:
---

Attachment: NUTCH-666-1-20081126.patch

Part one of patch.  This includes the new analyzers for different languages.  
Part two will include the new language identifier tool.

> Analysis plugins for multiple language and new Language Identifier Tool
> ---
>
> Key: NUTCH-666
> URL: https://issues.apache.org/jira/browse/NUTCH-666
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
> Fix For: 1.0.0
>
> Attachments: NUTCH-666-1-20081126.patch
>
>
> Add analysis plugins for Czech, Greek, Japanese, Chinese, Korean, Dutch, 
> Russian, and Thai.  Also includes a new Language Identifier tool that uses 
> the new indexing framework in NUTCH-646.




[jira] Created: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2008-11-26 Thread Dennis Kubes (JIRA)
Analysis plugins for multiple language and new Language Identifier Tool
---

 Key: NUTCH-666
 URL: https://issues.apache.org/jira/browse/NUTCH-666
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.0.0


Add analysis plugins for Czech, Greek, Japanese, Chinese, Korean, Dutch, 
Russian, and Thai.  Also includes a new Language Identifier tool that uses the 
new indexing framework in NUTCH-646.




[jira] Updated: (NUTCH-647) Resolve URLs tool

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-647:
---

Attachment: NUTCH-647-2-20081126.patch

Updated patch.

> Resolve URLs tool
> -
>
> Key: NUTCH-647
> URL: https://issues.apache.org/jira/browse/NUTCH-647
> Project: Nutch
>  Issue Type: New Feature
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
> Attachments: NUTCH-647-1-20080818.patch, NUTCH-647-2-20081126.patch
>
>
> A tool that takes a listing of urls and attempts to resolve their IP 
> addresses.  Useful for running after the fetcher has run to determine if DNS 
> problems exist.
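
The core step of such a tool can be sketched in Java as follows: extract the host from each URL and ask the resolver for its address, treating failures as "unresolved". This is only an illustration using the standard java.net classes, not the code from the attached patch. The class ResolveUrlsSketch and the method resolve are hypothetical names.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.UnknownHostException;
import java.util.Optional;

// Hypothetical single-threaded sketch of resolving URL hosts to IP addresses.
class ResolveUrlsSketch {
    /** Returns the IP address of the URL's host, or empty if it cannot be resolved. */
    static Optional<String> resolve(String url) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return Optional.empty(); // no host component in this URL
            }
            return Optional.of(InetAddress.getByName(host).getHostAddress());
        } catch (URISyntaxException | UnknownHostException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Pass URLs as command-line arguments.
        for (String url : args) {
            System.out.println(url + " -> " + resolve(url).orElse("unresolved"));
        }
    }
}
```

Hosts that come back unresolved after a fetch are candidates for the DNS problems the description mentions.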




[jira] Updated: (NUTCH-665) Search Load Testing Tool

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-665:
---

Attachment: NUTCH-665-20081126-1.patch

Search load testing tool.

> Search Load Testing Tool
> 
>
> Key: NUTCH-665
> URL: https://issues.apache.org/jira/browse/NUTCH-665
> Project: Nutch
>  Issue Type: New Feature
>  Components: searcher
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
>Priority: Minor
> Fix For: 1.0.0
>
> Attachments: NUTCH-665-20081126-1.patch
>
>
> A tool which spawns a number of threads and executes searches against 
> configured search servers.  This is used for light load testing of search 
> servers.
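
The general shape of such a load tester can be sketched in Java with a fixed thread pool in which every worker runs the same search action repeatedly. The sketch is an assumption about the approach, not the attached patch. The class SearchLoadSketch, the method run, and the use of a plain Runnable in place of a real search request are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; a real tool would issue requests to the search servers.
class SearchLoadSketch {
    /** Runs the search action on several threads and returns how many searches completed. */
    static long run(int threads, int queriesPerThread, Runnable search) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong completed = new AtomicLong();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int q = 0; q < queriesPerThread; q++) {
                    search.run();
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long done = run(4, 100, () -> { /* placeholder for a real search request */ });
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " searches in " + millis + " ms");
    }
}
```

Passing the search as a Runnable keeps the threading logic separate from how a query is actually sent, so the same harness can be pointed at different servers.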




[jira] Created: (NUTCH-665) Search Load Testing Tool

2008-11-26 Thread Dennis Kubes (JIRA)
Search Load Testing Tool


 Key: NUTCH-665
 URL: https://issues.apache.org/jira/browse/NUTCH-665
 Project: Nutch
  Issue Type: New Feature
  Components: searcher
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Priority: Minor
 Fix For: 1.0.0


A tool which spawns a number of threads and executes searches against configured 
search servers.  This is used for light load testing of search servers.




[jira] Commented: (NUTCH-563) Include custom fields in BasicQueryFilter

2008-11-26 Thread Davide (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651009#action_12651009
 ] 

Davide commented on NUTCH-563:
--

Hi,
Is it possible to apply this code to Nutch 0.8.1 as well? Can you explain to me how?

Thanks

> Include custom fields in BasicQueryFilter
> -
>
> Key: NUTCH-563
> URL: https://issues.apache.org/jira/browse/NUTCH-563
> Project: Nutch
>  Issue Type: New Feature
>  Components: searcher
>Reporter: julien nioche
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: diff.BasicQueryFilter.dynamicFields.txt
>
>
> This patch allows additional fields to be included in the BasicQueryFilter by 
> specifying runtime parameters.  Any parameter matching the regular expression 
> ("query\\.basic\\.(.+)\\.boost") will be added to the list of fields to be 
> used by the BQF, and the specified float value will be used as the boost.




[jira] Commented: (NUTCH-663) Upgrade Nutch to use Hadoop 0.18.2

2008-11-26 Thread Dennis Kubes (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650982#action_12650982
 ] 

Dennis Kubes commented on NUTCH-663:


Hadoop 0.19 was released.  I am integrating it and should have a patch 
shortly.

> Upgrade Nutch to use Hadoop 0.18.2
> --
>
> Key: NUTCH-663
> URL: https://issues.apache.org/jira/browse/NUTCH-663
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.0.0
> Environment: All
>Reporter: Dennis Kubes
>Assignee: Dennis Kubes
> Fix For: 1.0.0
>
>
> Upgrade Nutch to use a newer hadoop, version 0.18.2.  This includes 
> performance improvements, bug fixes, and new functionality.  Changes some 
> current APIs.




[jira] Commented: (NUTCH-664) Possibility to update already stored documents.

2008-11-26 Thread Sergey Khilkov (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650912#action_12650912
 ] 

Sergey Khilkov commented on NUTCH-664:
--

Yes, it would be great to have a changeDocument() method on the IndexWriter class. 
I hope it's possible :)

> Possibility to update already stored documents.
> ---
>
> Key: NUTCH-664
> URL: https://issues.apache.org/jira/browse/NUTCH-664
> Project: Nutch
>  Issue Type: Wish
>Reporter: Sergey Khilkov
>Priority: Minor
>
> We have a huge index of stored documents. It is a costly procedure to fetch 
> a page and merge indexes every time we update some information about the page. 
> The information can change 1-3 times per day. At the moment we have to store 
> the changed info in a database, but in that case we have lots of problems with 
> sorting, search restrictions and so on. Lucene itself allows deleting a single 
> document and adding a new one to an existing index. But there is a problem with 
> Hadoop... As I understand it, the Hadoop filesystem cannot write at 
> random positions. But it would be a great feature if Nutch were able to 
> update an existing index.




[jira] Updated: (NUTCH-664) Possibility to update already stored documents.

2008-11-26 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated NUTCH-664:


  Priority: Minor  (was: Major)
Issue Type: Wish  (was: New Feature)

There is no proposed design, so this is a Wish.

> Possibility to update already stored documents.
> ---
>
> Key: NUTCH-664
> URL: https://issues.apache.org/jira/browse/NUTCH-664
> Project: Nutch
>  Issue Type: Wish
>Reporter: Sergey Khilkov
>Priority: Minor
>
> We have a huge index of stored documents. It is a costly procedure to fetch 
> a page and merge indexes every time we update some information about the page. 
> The information can change 1-3 times per day. At the moment we have to store 
> the changed info in a database, but in that case we have lots of problems with 
> sorting, search restrictions and so on. Lucene itself allows deleting a single 
> document and adding a new one to an existing index. But there is a problem with 
> Hadoop... As I understand it, the Hadoop filesystem cannot write at 
> random positions. But it would be a great feature if Nutch were able to 
> update an existing index.
