Look at the section entitled Java Heap Size problem at
http://wiki.apache.org/nutch/RunNutchInEclipse1.0
Unfortunately, the GPL license prevents us from integrating this work with
Nutch.
I could change the licensing... would the Apache license be better?
Yes, definitely - that would be very nice.
I just changed it to Apache License 2.0.
Frank
My class ran out of time before we could integrate our project into
Nutch, but our Sitemap parser is available for anyone who would like
to integrate it with Nutch:
Java Sitemap Parser
http://sourceforge.net/projects/sitemap-parser/
--
Frank McCown, Ph.D.
Assistant Professor of Computer Science
- Preferences - Java - Installed JREs - edit -
Default VM arguments
I've set mine to -Xms5m -Xmx150m because I have about 200 MB of RAM left
after running all my apps.
-Xms (initial, i.e. minimum, Java heap size)
-Xmx (maximum Java heap size)
It should help.
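To confirm that the VM arguments above actually reach the JVM, a small check like the following can help. This is only a sketch: the class name HeapCheck and its helper methods are made up for illustration and are not part of Nutch or Eclipse. It relies only on the standard java.lang.Runtime methods.

```java
// Hypothetical helper (not part of Nutch): prints the heap limits the JVM is running with.
public class HeapCheck {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    // Converts a byte count to whole megabytes, rounding down.
    public static long toMegabytes(long bytes) {
        return bytes / BYTES_PER_MB;
    }

    // Maximum heap the JVM will try to use; this roughly tracks -Xmx.
    public static long maxHeapMegabytes() {
        return toMegabytes(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MB):     " + maxHeapMegabytes());
        System.out.println("current heap (MB): " + toMegabytes(rt.totalMemory()));
        System.out.println("free in heap (MB): " + toMegabytes(rt.freeMemory()));
    }
}
```

Run it with the same arguments, e.g. java -Xms5m -Xmx150m HeapCheck, or from Eclipse after setting the Default VM arguments. The reported maximum may be somewhat lower than the -Xmx value, depending on the JVM.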
Thanks,
Bartosz
Frank McCown wrote
any changes
different configurations (different than crawl-urlfilter - adding your
domain).
Thanks,
Bartosz
Frank McCown wrote:
Adding cygwin to my PATH solved my problem with whoami. But now I'm
getting an exception when running the crawler:
Injector: Converting injected urls to crawl
with eclipse on windows with no problems.
Thanks,
Bartosz
--
Frank McCown, Ph.D.
Assistant Professor of Computer Science
Harding University
http://www.harding.edu/fmccown/
[ https://issues.apache.org/jira/browse/NUTCH-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McCown resolved NUTCH-720.
Resolution: Fixed
Fix Version/s: 1.0.0
This issue can be solved by changing
: Frank McCown
Priority: Minor
Google, Yahoo, and Live list all pages they have indexed for the
site:www.example.com query. But Nutch returns 0 results unless a query
term is also supplied (e.g., site:www.example.com term). It would be helpful
to make Nutch support the site
Google, Yahoo, and Live list all pages they have indexed for the
site:www.example.com query. But Nutch returns 0 results unless
a query term is also supplied (e.g., site:www.example.com term).
Would it be better for Nutch to respond in the same manner that other
search engines do? This is a
Support for rel=canonical attribute
-
Key: NUTCH-710
URL: https://issues.apache.org/jira/browse/NUTCH-710
Project: Nutch
Issue Type: New Feature
Affects Versions: 1.1
Reporter: Frank
this new attribute as well.
After I get some feedback, I'll submit a request to JIRA. I was
wondering though, would it be better to submit it as an issue for 0.9,
1.0, or 1.1?
Thanks,
Frank
--
Frank McCown, Ph.D.
Assistant Professor of Computer Science
Harding University
http://www.harding.edu
project suggestions
there.
http://www-scf.usc.edu/~csci572/
Good luck!
Cheers,
Chris
On 1/2/08 2:44 PM, Frank McCown [EMAIL PROTECTED] wrote:
Greetings. I'm teaching a class on search engine development this
semester, and I am considering having my students use Nutch
implementation tasks you guys think would
be appropriate for a small group of undergrad, upperclass CS students?
I'm looking for ideas for improving Nutch that they could accomplish
in a few weeks' time.
Thanks,
--
Frank McCown, Ph.D.
Assistant Professor of Computer Science
Harding University
http