[
http://issues.apache.org/jira/browse/NUTCH-65?page=comments#action_12315010 ]
Lutischán Ferenc commented on NUTCH-65:
---
Dear Developers,
I have a final solution (I am behind a firewall, so I can't make a patch with svn). I
suggest you please commit
Cache.jsp some times generate NullPointerException
--
Key: NUTCH-123
URL: http://issues.apache.org/jira/browse/NUTCH-123
Project: Nutch
Type: Bug
Components: web gui
Environment: All systems
Reporter
[
http://issues.apache.org/jira/browse/NUTCH-133?page=comments#action_12359564 ]
Lutischán Ferenc commented on NUTCH-133:
Dear Stephan,
Please see http://issues.apache.org/jira/browse/NUTCH-123.
This problem also occurs in cached.jsp.
Regards
Problem encountered with ant during compilation
---
Key: NUTCH-174
URL: http://issues.apache.org/jira/browse/NUTCH-174
Project: Nutch
Type: Bug
Versions: 0.7.1
Environment: SUSE Linux 9.3
Reporter: Matthias
Using -dir: creates an error, when the directory already exists
---
Key: NUTCH-176
URL: http://issues.apache.org/jira/browse/NUTCH-176
Project: Nutch
Type: Bug
Versions: 0.7.1
Environment: SUSE
Default installation seems to produce working entity of nutch
-
Key: NUTCH-177
URL: http://issues.apache.org/jira/browse/NUTCH-177
Project: Nutch
Type: Bug
Versions: 0.7.1
Environment: Linux SUSE
[ http://issues.apache.org/jira/browse/NUTCH-177?page=all ]
Matthias Günter updated NUTCH-177:
--
Attachment: crawl-urlfilter.txt
The crawl-filter with a change for apache.org
Default installation seems to produce working entity of nutch
[ http://issues.apache.org/jira/browse/NUTCH-177?page=all ]
Matthias Günter updated NUTCH-177:
--
Attachment: urllist.txt
URL-List used..
Default installation seems to produce working entity of nutch
http: proxy exception list:
Key: NUTCH-208
URL: http://issues.apache.org/jira/browse/NUTCH-208
Project: Nutch
Type: New Feature
Components: fetcher
Versions: 0.8-dev
Reporter: Matthias Günter
Priority: Minor
I
[ http://issues.apache.org/jira/browse/NUTCH-208?page=all ]
Matthias Günter updated NUTCH-208:
--
Attachment: patch.txt
A preliminary patch!!
http: proxy exception list:
---
Key: NUTCH-208
URL: http
[
http://issues.apache.org/jira/browse/NUTCH-339?page=comments#action_12433354 ]
Doğacan Güney commented on NUTCH-339:
-
I have made a few changes to Andrzej's latest patch. The biggest change is that
BLOCKED_ADDR_QUEUE is now a priority
[ http://issues.apache.org/jira/browse/NUTCH-339?page=all ]
Doğacan Güney updated NUTCH-339:
Attachment: patch3.txt
Refactor nutch to allow fetcher improvements
Key: NUTCH-339
porting clustering-carrot2 plugin to carrot2 v2.0
-
Key: NUTCH-397
URL: http://issues.apache.org/jira/browse/NUTCH-397
Project: Nutch
Issue Type: Improvement
Reporter: Doğacan
Metadata tries to write null values
---
Key: NUTCH-406
URL: http://issues.apache.org/jira/browse/NUTCH-406
Project: Nutch
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Doğacan Güney
[ http://issues.apache.org/jira/browse/NUTCH-406?page=all ]
Doğacan Güney updated NUTCH-406:
Attachment: NUTCH-406.patch
A simple patch that writes nulls as empty strings.
Metadata tries to write null values
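The idea described in this attachment note (write nulls as empty strings) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the contents of NUTCH-406.patch: the class NullSafeWriteSketch and its methods are hypothetical, and plain java.io streams stand in for whatever serialization the Metadata class actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not the code from NUTCH-406.patch.
class NullSafeWriteSketch {

  // DataOutputStream.writeUTF(null) throws a NullPointerException,
  // so an empty string is substituted before writing.
  static void writeValue(DataOutputStream out, String value) throws IOException {
    out.writeUTF(value == null ? "" : value);
  }

  // Serializes the values and reads them back, to show what a reader sees.
  static String[] roundTrip(String... values) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        out.writeInt(values.length);
        for (String v : values) {
          writeValue(out, v);
        }
      }
      String[] result = new String[values.length];
      try (DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
          result[i] = in.readUTF();
        }
      }
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String[] back = roundTrip("text/html", null);
    System.out.println("[" + back[0] + "] [" + back[1] + "]");
  }
}
```

One consequence of this approach is that a null value and an empty string can no longer be told apart after reading the data back.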
[ http://issues.apache.org/jira/browse/NUTCH-406?page=all ]
Doğacan Güney updated NUTCH-406:
Attachment: NUTCH-406.patch
How about something like this then?
Metadata tries to write null values
---
Key
[
http://issues.apache.org/jira/browse/NUTCH-92?page=comments#action_12453682 ]
Dogacan Güney commented on NUTCH-92:
Here is my second attempt at this. Now DistributedSearch$Client keeps a mapping
from addresses to numDocs, and in search
[ http://issues.apache.org/jira/browse/NUTCH-92?page=all ]
Dogacan Güney updated NUTCH-92:
---
Attachment: distributed-idf-v2.patch
DistributedSearch incorrectly scores results
Key: NUTCH-92
[ http://issues.apache.org/jira/browse/NUTCH-411?page=all ]
Dogacan Güney updated NUTCH-411:
Attachment: parse-redirect.patch
Parse ignores meta refresh redirection
--
Key: NUTCH-411
[
http://issues.apache.org/jira/browse/NUTCH-413?page=comments#action_12456832 ]
Dogacan Güney commented on NUTCH-413:
-
Are you sure about this? Running the fetcher (latest trunk) with -noParsing
option does not create any parse segments
[
http://issues.apache.org/jira/browse/NUTCH-413?page=comments#action_12456967 ]
Dogacan Güney commented on NUTCH-413:
-
About command-line options: that is not what I meant (I am not a native
speaker). I meant that I also set fetcher.parse
[
http://issues.apache.org/jira/browse/NUTCH-417?page=comments#action_12458794 ]
Dogacan Güney commented on NUTCH-417:
-
Patch for indexer. Instead of using the FileSystem coming from getRecordWriter,
use FileSystem.get(job) to get the file
[ http://issues.apache.org/jira/browse/NUTCH-417?page=all ]
Dogacan Güney updated NUTCH-417:
Attachment: index.patch
After upgrade to hadoop-0.9.1, parsing and indexing doesn't work
[
http://issues.apache.org/jira/browse/NUTCH-417?page=comments#action_12458811 ]
Dogacan Güney commented on NUTCH-417:
-
Setting speculative execution to false also fixes my problem with parser. Thank
you for the quick answer. I guess you
DeleteDuplicates.HashPartitioner depends on the order of IndexDocs
--
Key: NUTCH-420
URL: http://issues.apache.org/jira/browse/NUTCH-420
Project: Nutch
Issue Type: Bug
[ http://issues.apache.org/jira/browse/NUTCH-420?page=all ]
Dogacan Güney updated NUTCH-420:
Attachment: dedup.patch
Patch for the problem. This patch also slightly refactors the code.
DeleteDuplicates.HashPartitioner depends on the order of IndexDocs
[
https://issues.apache.org/jira/browse/NUTCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462173
]
Dogacan Güney commented on NUTCH-420:
-
I realized that my last patch if's some irrelevant LOG.debug code
[
https://issues.apache.org/jira/browse/NUTCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dogacan Güney updated NUTCH-420:
Attachment: dedup-v2.patch
DeleteDuplicates.HashPartitioner depends on the order of IndexDocs
[
https://issues.apache.org/jira/browse/NUTCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463056
]
Dogacan Güney commented on NUTCH-420:
-
I thought I would attach an index which exhibits this bug. If you run
[
https://issues.apache.org/jira/browse/NUTCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dogacan Güney updated NUTCH-420:
Attachment: index.tar.gz
DeleteDuplicates.HashPartitioner depends on the order of IndexDocs
[
https://issues.apache.org/jira/browse/NUTCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463214
]
Dogacan Güney commented on NUTCH-420:
-
Attaching the patch with a testcase (I hope that I got it right, but I am
[
https://issues.apache.org/jira/browse/NUTCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dogacan Güney updated NUTCH-420:
Attachment: dedup-v3.patch
DeleteDuplicates.HashPartitioner depends on the order of IndexDocs
Add -noAdditions to updatedb
Key: NUTCH-438
URL: https://issues.apache.org/jira/browse/NUTCH-438
Project: Nutch
Issue Type: Improvement
Affects Versions: 0.8.1, 0.8
Reporter: Nicolás Lichtmaier
[
https://issues.apache.org/jira/browse/NUTCH-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nicolás Lichtmaier updated NUTCH-438:
-
Attachment: noAdditions-backport.diff
I've backported revision 450799 to the 0.8.x branch
Command line utilities should exit with an error message when given wrong
arguments
---
Key: NUTCH-440
URL: https://issues.apache.org/jira/browse/NUTCH-440
Project
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dogacan Güney updated NUTCH-443:
Attachment: parse-map-core-untested.patch
allow parsers to return multiple Parse object
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471260
]
Dogacan Güney commented on NUTCH-443:
-
Ok, this is the second attempt (sorry that I am sending patches
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dogacan Güney updated NUTCH-443:
Attachment: parse-map-core-draft-v1.patch
allow parsers to return multiple Parse object
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471620
]
Dogacan Güney commented on NUTCH-443:
-
This is pretty much the merge of our work (except parse-rss, it kept
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dogacan Güney updated NUTCH-443:
Attachment: NUTCH-443-draft-v1.patch
allow parsers to return multiple Parse object
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dogacan Güney updated NUTCH-443:
Attachment: NUTCH-443-draft-v2.patch
Small update to the patch. Now all core junit tests pass.
Now
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dogacan Güney updated NUTCH-443:
Attachment: NUTCH-443-draft-v3.patch
new patch, contains a possible fix for CrawlDbReducer problem
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471857
]
Dogacan Güney commented on NUTCH-443:
-
nutch.newbie:
I fail to see what the problem is. If feedparser doesn't
[
https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dogacan Güney updated NUTCH-444:
Attachment: parse-feed.tar.bz2
OK, here is my feedparsing plugin using rome. Note that this plugin
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dogacan Güney updated NUTCH-443:
Attachment: NUTCH-443-draft-v5.patch
New version. Now indexing also works but has a catch. Many
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dogacan Güney updated NUTCH-443:
Attachment: NUTCH-443-draft-v6.patch
Oops... I forgot to merge Renaud Richardet's work
[
https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dogacan Güney updated NUTCH-444:
Attachment: parse-feed-v2.tar.bz2
Updated parse-feed plugin. Still not ready for any serious use
[
https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472581
]
Doğacan Güney commented on NUTCH-444:
-
Hi nutch.newbie,
Can you mail me a list of the failing atom urls
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473129
]
Doğacan Güney commented on NUTCH-443:
-
Andrzej:
Thanks for taking the time to review this.
The contract
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473184
]
Doğacan Güney commented on NUTCH-443:
-
Andrzej:
Why does fetcher need to synchronize? Why does the order fetcher
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-443:
Attachment: NUTCH-443-draft-v7.patch
allow parsers to return multiple Parse object
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473383
]
Doğacan Güney commented on NUTCH-443:
-
Regarding the ObjectWritable: since in this case all data is composed
RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt
-
Key: NUTCH-446
URL: https://issues.apache.org/jira/browse/NUTCH-446
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-446:
Attachment: crawl-delay.patch
RobotRulesParser should ignore Crawl-delay values of other bots
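The behaviour the issue title asks for can be sketched as below. This is a minimal illustration, not Nutch's RobotRulesParser and not the attached crawl-delay.patch. The class CrawlDelaySketch, the method crawlDelayFor, the substring matching of agent names, and the fallback to a "User-agent: *" group are all assumptions made for the example.

```java
import java.util.Locale;

// Hypothetical sketch; not Nutch's RobotRulesParser and not crawl-delay.patch.
class CrawlDelaySketch {

  // Returns the Crawl-delay in milliseconds that applies to agentName, or -1
  // if there is none. Delays in groups addressed to other bots are ignored.
  // A delay under "User-agent: *" is used only when no group names the agent.
  static long crawlDelayFor(String robotsTxt, String agentName) {
    String agent = agentName.toLowerCase(Locale.ROOT);
    long specific = -1;
    long wildcard = -1;
    boolean matchesAgent = false;
    boolean matchesWildcard = false;
    boolean lastWasUserAgent = false;

    for (String raw : robotsTxt.split("\\r?\\n")) {
      // Strip comments and surrounding whitespace.
      int hash = raw.indexOf('#');
      String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
      int colon = line.indexOf(':');
      if (colon < 0) {
        continue; // blank line or not a "field: value" line
      }
      String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
      String value = line.substring(colon + 1).trim();

      if (field.equals("user-agent")) {
        if (!lastWasUserAgent) {
          // A User-agent line after a rule line starts a new group.
          matchesAgent = false;
          matchesWildcard = false;
        }
        String name = value.toLowerCase(Locale.ROOT);
        if (name.equals("*")) {
          matchesWildcard = true;
        } else if (!name.isEmpty() && agent.contains(name)) {
          matchesAgent = true; // assumption: substring match on the agent name
        }
        lastWasUserAgent = true;
        continue;
      }

      lastWasUserAgent = false;
      if (field.equals("crawl-delay")) {
        long millis;
        try {
          millis = Math.round(Double.parseDouble(value) * 1000);
        } catch (NumberFormatException e) {
          continue; // unparsable delay, skip it
        }
        if (matchesAgent && specific < 0) {
          specific = millis;
        }
        if (matchesWildcard && wildcard < 0) {
          wildcard = millis;
        }
      }
    }
    return specific >= 0 ? specific : wildcard;
  }
}
```

For example, given a robots.txt with "Crawl-delay: 100" under "User-agent: otherbot" and "Crawl-delay: 2" under "User-agent: *", calling crawlDelayFor(robotsTxt, "Nutch") skips the first group and takes the delay from the wildcard group.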
[
https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473885
]
Doğacan Güney commented on NUTCH-247:
-
+1 for this approach.
Fetcher should check if agent-name is set
[
https://issues.apache.org/jira/browse/NUTCH-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-434:
Attachment: NUTCH-434.patch
This patch adds two new classes: GenericWritableConfigurable which
[
https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476212
]
Doğacan Güney commented on NUTCH-445:
-
Has anyone looked at this? Google seems to do site: searches like this too
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476611
]
Doğacan Güney commented on NUTCH-443:
-
* you create the fake CrawlDatum-s in ParseOutputFormat, and then set
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-443:
Attachment: NUTCH-443.02282007-v2.patch
Yet another patch.
ParseResult.filter is out and Nutch
[
https://issues.apache.org/jira/browse/NUTCH-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ricardo J. Méndez updated NUTCH-460:
Attachment: rubyspider-rdf.zip
Code for the aforementioned plugins, to be included under
[
https://issues.apache.org/jira/browse/NUTCH-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482793
]
Ricardo J. Méndez commented on NUTCH-460:
-
Two requirements I hadn't added explicitly:
Apache Jena:
http
[
https://issues.apache.org/jira/browse/NUTCH-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nicolás Lichtmaier updated NUTCH-438:
-
Description: It would be great for me to have -noAdditions support (which
is implemented
Scoring filter should distribute score to all outlinks at once
--
Key: NUTCH-468
URL: https://issues.apache.org/jira/browse/NUTCH-468
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-468:
Attachment: scoring.patch
Patch for the issue. It doesn't change the way scoring-opic works
[
https://issues.apache.org/jira/browse/NUTCH-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-468:
Attachment: scoring-v2.patch
That makes sense, patch with the suggested change.
Scoring filter
Fetcher2 sets server-delay and blocking checks incorrectly
--
Key: NUTCH-474
URL: https://issues.apache.org/jira/browse/NUTCH-474
Project: Nutch
Issue Type: Bug
Components
[
https://issues.apache.org/jira/browse/NUTCH-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-474:
Attachment: fetcher2.patch
Fetcher2 sets server-delay and blocking checks incorrectly
Adaptive crawl delay
Key: NUTCH-475
URL: https://issues.apache.org/jira/browse/NUTCH-475
Project: Nutch
Issue Type: Improvement
Components: fetcher
Reporter: Doğacan Güney
Fix
[
https://issues.apache.org/jira/browse/NUTCH-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-475:
Attachment: adaptive-delay_draft.patch
Patch with a simple adaptive algorithm. It measures the last
[
https://issues.apache.org/jira/browse/NUTCH-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-446:
Attachment: crawl-delay_test.patch
Test case for crawl delay rules. Nutch fails the test case
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-443:
Attachment: NUTCH-443.08052007.patch
Patch updated to latest trunk.
allow parsers to return
[
https://issues.apache.org/jira/browse/NUTCH-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494496
]
Ronny Næss commented on NUTCH-470:
--
Hi, Trond.
Optional meaning does that mean?
I would like more Lucene based
[
https://issues.apache.org/jira/browse/NUTCH-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494734
]
Doğacan Güney commented on NUTCH-446:
-
So, does anyone have objections to this? It fixes an annoying (albeit rare
[
https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494987
]
Doğacan Güney commented on NUTCH-444:
-
Hi Chris,
Well I must say, with all the discussion that's gone on w.r.t
[
https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495350
]
Doğacan Güney commented on NUTCH-485:
-
You probably should not add put(String/Text key, Parse parse) methods
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495357
]
Doğacan Güney commented on NUTCH-443:
-
Well... That's embarrassing. It seems I forgot to include the necessary
[
https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-444:
Attachment: NUTCH-444.patch
feed.tar.bz2
First version of feed plugin featuring
[
https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495410
]
Doğacan Güney commented on NUTCH-485:
-
I have two more minor nits:
1) ParseResult.isSuccess returns true only
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495696
]
Doğacan Güney commented on NUTCH-443:
-
I am not sure I follow you Andrzej. My patch already does a very similar
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-443:
Attachment: redirect_and_index_v2.patch
New version. Moves parsing code into (content != null
[
https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-25:
---
Attachment: NUTCH-25_draft.patch
Well, something like this should work...
+ Adds a new configurable
[
https://issues.apache.org/jira/browse/NUTCH-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12497770
]
Doğacan Güney commented on NUTCH-489:
-
This is obviously useful but:
* Your patches both in this issue
[
https://issues.apache.org/jira/browse/NUTCH-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498113
]
Doğacan Güney commented on NUTCH-489:
-
Hmm.. Won't it now cause Nutch to filter on path on a line like
dedup fails with ArrayIndexOutOfBoundsException
---
Key: NUTCH-491
URL: https://issues.apache.org/jira/browse/NUTCH-491
Project: Nutch
Issue Type: Bug
Affects Versions: 0.9.0
[
https://issues.apache.org/jira/browse/NUTCH-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498613
]
Doğacan Güney commented on NUTCH-491:
-
Can you retry with the latest trunk? I think a fix related to your issue
java.lang.OutOfMemoryError while indexing.
--
Key: NUTCH-492
URL: https://issues.apache.org/jira/browse/NUTCH-492
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions
[
https://issues.apache.org/jira/browse/NUTCH-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499777
]
Doğacan Güney commented on NUTCH-489:
-
Please ignore my last comment. I don't know what I was on when I wrote
FindBugs: CrawlDbReader and DeleteDuplicates
Key: NUTCH-494
URL: https://issues.apache.org/jira/browse/NUTCH-494
Project: Nutch
Issue Type: Bug
Affects Versions: 1.0.0
Reporter
[
https://issues.apache.org/jira/browse/NUTCH-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-494:
Attachment: findbugs_.patch
Patch for CrawlDbReader and DeleteDuplicates.
FindBugs: CrawlDbReader
Unnecessary delays in Fetcher2
--
Key: NUTCH-495
URL: https://issues.apache.org/jira/browse/NUTCH-495
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 1.0.0
Reporter
[
https://issues.apache.org/jira/browse/NUTCH-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doğacan Güney updated NUTCH-495:
Attachment: fetcher2_robots.patch
Unnecessary delays in Fetcher2
[
https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500459
]
Doğacan Güney commented on NUTCH-466:
-
I skimmed through it and it looks awesome. I will try to test it better
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500603
]
Doğacan Güney commented on NUTCH-392:
-
From what I understand of MapFile.Writer code in hadoop, if you give
[
https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500935
]
Doğacan Güney commented on NUTCH-392:
-
Perhaps we can allow a user to configure this on a per-structure basis
[
https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501921
]
Doğacan Güney edited comment on NUTCH-466 at 6/6/07 6:08 AM:
-
I still haven't tested
[
https://issues.apache.org/jira/browse/NUTCH-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502846
]
Doğacan Güney commented on NUTCH-356:
-
This problem exists with nutch's latest version as evidenced
[
https://issues.apache.org/jira/browse/NUTCH-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505197
]
Doğacan Güney commented on NUTCH-498:
-
Why can't we just set combiner class as LinkDb? AFAICS, you are not doing
[
https://issues.apache.org/jira/browse/NUTCH-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505249
]
Doğacan Güney commented on NUTCH-498:
-
After examining the code better, I am a bit confused. We have
[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505448
]
Doğacan Güney commented on NUTCH-443:
-
Chris, did you get a chance to look at this? If you are busy, I can assign