From: Oscar [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: subscribe
This is how you are trying to subscribe, and it is incorrect. To subscribe to
the mailing list, you should send a mail to the following email address:
[EMAIL PROTECTED]
Regards,
Susam Pal
http://susam.in/
On 6/30/07, Oscar
Could it be that the interface 'org.apache.nutch.net.URLFilter' was compiled
earlier with JDK 1.5? I have seen this problem occur with a beta version of
JDK 1.6.
Are you using the latest version, JDK 1.6 Update 2?
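One way to check which JDK a class file targets is to read the major version from its class-file header: 49 means JDK 1.5 and 50 means JDK 1.6. The following is a minimal POSIX shell sketch. The function name `class_major_version` is hypothetical, and the class file has to be extracted from the Nutch jar first.

```sh
# Hypothetical helper: print the major version stored in a Java .class file.
# Usage: class_major_version FILE.class
class_major_version() {
  # Bytes 0-3 must be the class-file magic number 0xCAFEBABE.
  magic=$(od -An -N4 -tx1 "$1" | tr -d ' \n')
  if [ "$magic" != "cafebabe" ]; then
    echo "not a class file: $1" >&2
    return 1
  fi
  # Bytes 4-5 are the minor version; bytes 6-7 hold the major version
  # as a big-endian unsigned 16-bit number.
  set -- $(od -An -j6 -N2 -tu1 "$1")
  echo $(( $1 * 256 + $2 ))
}
```

For example, extract the interface with `unzip -p` (giving the jar path and the entry `org/apache/nutch/net/URLFilter.class`) and pass the resulting file to the function. Note that the number reflects the `-target` the class was compiled for, not necessarily the JDK that compiled it. `javap -verbose` also prints a `major version` line.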
Regards,
Susam Pal
http://susam.in/
On 9/11/07, Doğacan Güney [EMAIL PROTECTED
suggestions?
Regards,
Susam Pal
http://susam.in/
you tried parse-pdf?
Regards,
Susam Pal
http://susam.in/
mailing list.
Regards,
Susam Pal
http://susam.in/
On 10/10/07, Christopher Bader [EMAIL PROTECTED] wrote:
I ran Nutch on a subset of Wikipedia, and it works. But for each search it
always gives exactly two choices.
How do I configure it so that it gives (a) N choices, for arbitrary N, or
(b
it is not a bug in Nutch 0.9. This
looks like a configuration problem at your end. Please discuss this
in [EMAIL PROTECTED] instead of submitting it as a
bug in Nutch.
Regards,
Susam Pal
On Jan 8, 2008 7:16 AM, sudarat (JIRA) [EMAIL PROTECTED] wrote:
nutch crawl and index problem
I wanted to send this as a private reply but sent it to the list
instead. Sorry for the inconvenience.
On Jan 8, 2008 10:21 AM, Susam Pal [EMAIL PROTECTED] wrote:
I replied to this query of yours yesterday in
[EMAIL PROTECTED]. If you haven't received the reply,
probably you have
doesn't occur on Linux. I am not well
acquainted with the Hadoop code yet. Could someone throw light on what
might be going wrong?
Regards,
Susam Pal
On 2/7/08, DS jha [EMAIL PROTECTED] wrote:
Hi -
Looks like the latest trunk version of Nutch is failing with the following
exception when trying
this failed with the same error.
Right now I don't have a Windows system with me. I will try setting it
to /cygdrive/d/tmp/ tomorrow, when I have access to a Windows system
again, and then update the mailing list with my observations.
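This assumes the setting under discussion is hadoop.tmp.dir. Below is a minimal sketch of overriding it. It also assumes that the Hadoop release bundled with Nutch reads site overrides from conf/hadoop-site.xml, as releases of this period did. The helper name `write_hadoop_tmp_dir` is hypothetical, and the helper overwrites the target file.

```sh
# Hypothetical helper: write a minimal Hadoop override file that sets
# hadoop.tmp.dir. It overwrites CONF_FILE, so merge by hand if the file
# already holds other properties.
# Usage: write_hadoop_tmp_dir CONF_FILE TMP_DIR
write_hadoop_tmp_dir() {
  conf_file=$1
  tmp_dir=$2
  cat > "$conf_file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$tmp_dir</value>
  </property>
</configuration>
EOF
}
```

For the Cygwin experiment, the call would be `write_hadoop_tmp_dir conf/hadoop-site.xml /cygdrive/d/tmp/`. Whether Java resolves such a Cygwin path is exactly what the test is meant to find out.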
Thanks for the suggestion.
Regards,
Susam Pal
On Thu, Feb
)
at org.apache.hadoop.mapred.Task.saveTaskOutput(Task.java:426)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:165)
Regards,
Susam Pal
On Thu, Feb 14, 2008 at 10:07 PM, Susam Pal [EMAIL PROTECTED] wrote:
What I did try was setting hadoop.tmp.dir to /opt/tmp. I found the
behavior
I still can't see any DEBUG logs in your log file. Did you go through
my earlier mail?
Regards,
Susam Pal
On Wed, Mar 12, 2008 at 9:39 PM, [EMAIL PROTECTED] wrote:
Hi All,
I am facing a problem running Nutch where proxy authentication is
required to crawl the site (e.g. google.com
valuable
work can be done. What do you say?
Regards,
Susam Pal
I agree with John too. Probably you meant $0.02, since 0.02 cents is too
little. It is usually 2 cents. :-P
Regards,
Susam Pal
On Tue, Dec 2, 2008 at 6:09 PM, John Martyniak [EMAIL PROTECTED] wrote:
Is NUTCH-442 going to be part of the 1.0 release? I hope so, Nutch/Solr
integration would
'conf/crawl-tool.xml'?
Regards,
Susam Pal
On Tue, Apr 7, 2009 at 1:07 AM, Susam Pal susam@gmail.com wrote:
The inline documentation of 'conf/crawl-tool.xml' mentions:
<!-- Do not modify this file directly. Instead, copy entries that you -->
!-- wish to modify from this file into nutch-site.xml and change them
are available at:
http://lucene.apache.org/nutch/version_control.html
Regards,
Susam Pal
included it in CC.
This feature is not present in Nutch. We have recorded a summary of
some old discussions regarding this here:
http://wiki.apache.org/nutch/HttpPostAuthentication
However, it was never implemented.
Regards,
Susam Pal
[
https://issues.apache.org/jira/browse/NUTCH-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-44:
---
Attachment: NUTCH-44.patch
Attached a patch.
To apply:-
patch -p0 < NUTCH-44.patch
ant war
cp build
[
https://issues.apache.org/jira/browse/NUTCH-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-44:
---
Attachment: (was: NUTCH-44.patch)
too many search results
---
Key
[
https://issues.apache.org/jira/browse/NUTCH-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-44:
---
Attachment: NUTCH-44.patch
Updated my previous patch to fix the issue in opensearch too.
To apply:-
patch
[
https://issues.apache.org/jira/browse/NUTCH-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-281:
Attachment: NUTCH-281.patch
Uploading a patch.
Put the base tag outside comments and now the relative
Type: Improvement
Components: fetcher
Affects Versions: 1.0.0
Reporter: Susam Pal
'protocol-http11' is a protocol plugin which supports retrieving documents via
the HTTP 1.0, HTTP 1.1 and HTTPS protocols, optionally with Basic, Digest and
NTLM authentication schemes
[
https://issues.apache.org/jira/browse/NUTCH-557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-557:
Attachment: protocol-http11v0.1.patch
I have generated this patch against Nutch trunk.
To apply:-
patch
[
https://issues.apache.org/jira/browse/NUTCH-557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-557:
Priority: Minor (was: Major)
protocol-http11 for HTTP 1.1, HTTPS, NTLM, Basic and Digest Authentication
[
https://issues.apache.org/jira/browse/NUTCH-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528854
]
Susam Pal commented on NUTCH-557:
-
No, there isn't any significant difference in performance. Here's a list
Components: fetcher
Affects Versions: 1.0.0
Reporter: Susam Pal
Priority: Minor
Added basic, digest and NTLM authentication schemes to protocol-httpclient. The
authentication schemes can be configured for proxy servers as well as for web
servers of a domain
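Below is a minimal sketch of the proxy side of such a configuration. It assumes the property names http.proxy.host, http.proxy.port, http.proxy.username and http.proxy.password as they appear in nutch-default.xml of later Nutch releases, so check the nutch-default.xml of your version. It also assumes that protocol-httpclient has replaced protocol-http in plugin.includes. The helper names are hypothetical, and the values in the usage line are placeholders.

```sh
# Hypothetical helper: print one Nutch-style <property> element.
# Values are NOT XML-escaped; a password containing & or < must be
# escaped by hand.
nutch_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Hypothetical helper: print the proxy properties to paste inside
# <configuration> in conf/nutch-site.xml.
# Usage: nutch_proxy_properties HOST PORT USERNAME PASSWORD
nutch_proxy_properties() {
  nutch_property http.proxy.host "$1"
  nutch_property http.proxy.port "$2"
  nutch_property http.proxy.username "$3"
  nutch_property http.proxy.password "$4"
}
```

For example, `nutch_proxy_properties proxy.example.com 8080 alice secret` prints four `<property>` elements. The helpers only print text, so an existing nutch-site.xml is never modified behind your back.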
[
https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-559:
Attachment: NUTCH-559v0.1.patch
I have generated this patch against Nutch trunk. It will add support
[
https://issues.apache.org/jira/browse/NUTCH-557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal closed NUTCH-557.
---
Resolution: Won't Fix
As per the discussion, 'protocol-http11' has been turned into a patch for
'protocol
[
https://issues.apache.org/jira/browse/NUTCH-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530175
]
susam edited comment on NUTCH-539 at 9/25/07 10:54 AM:
---
1. There is a bug in the patch. The
[
https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-559:
Priority: Major (was: Minor)
Apart from adding the authentication features, this patch would fix three
[
https://issues.apache.org/jira/browse/NUTCH-560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530519
]
Susam Pal commented on NUTCH-560:
-
I analysed 'protocol-http' and it behaves almost in the same manner. While
[
https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-559:
Attachment: NUTCH-559v0.4.patch
Uploading a revised (v0.4) patch that has all authentication configuration
[
https://issues.apache.org/jira/browse/NUTCH-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-559:
Attachment: NUTCH-559v0.5.patch
Uploading a revised (v0.5) patch with some test cases. Added a 'scheme
[
https://issues.apache.org/jira/browse/NUTCH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-601:
Attachment: NUTCH-601v0.2.patch
Attached a revised patch (NUTCH-601v0.2.patch), which removes the old
[
https://issues.apache.org/jira/browse/NUTCH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-601:
Attachment: NUTCH-601v0.1.patch
Patch attached.
Recrawling on existing crawl directory using force option
Versions: 1.0.0
Reporter: Susam Pal
Priority: Minor
Added a '-force' option to the 'bin/nutch crawl' command line. With this
option, one can crawl and recrawl in the following manner:
{code}
bin/nutch crawl urls -dir crawl -depth 2 -topN 10 -threads 5
bin/nutch crawl urls -dir
[
https://issues.apache.org/jira/browse/NUTCH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12565848#action_12565848
]
Susam Pal commented on NUTCH-601:
-
The 'if (newIndex != index)' condition is just a check
[
https://issues.apache.org/jira/browse/NUTCH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-601:
Attachment: NUTCH-601v1.0.patch
Attached another patch (NUTCH-601v1.0.patch) that always deletes the old
[
https://issues.apache.org/jira/browse/NUTCH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-601:
Attachment: NUTCH-601v0.3.patch
Attached a revised patch (NUTCH-601v0.3.patch) that makes the code simpler
[
https://issues.apache.org/jira/browse/NUTCH-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-612:
Attachment: NUTCH-612v0.1.patch
Attached patch to fix the bug. This modifies Crawl.java and Generator.java
Components: generator
Affects Versions: 1.0.0
Reporter: Susam Pal
Fix For: 1.0.0
When a crawl is done using the 'bin/nutch crawl' command, no filtering is done
in Generator even if 'crawl.generate.filter' is set to true in the
configuration file.
The problem
Issue Type: Bug
Components: web gui
Affects Versions: 1.0.0
Reporter: Susam Pal
Priority: Minor
The inline documentation of 'conf/crawl-tool.xml' mentions:
{code:xml}
<!-- Do not modify this file directly. Instead, copy entries that you -->
!-- wish
[
https://issues.apache.org/jira/browse/NUTCH-735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susam Pal updated NUTCH-735:
Attachment: NUTCH-735v0.1.patch
Attached patch.
crawl-tool.xml must be read before nutch-site.xml when