Dennis, +1
On 6/25/07 4:42 PM, Dennis Kubes [EMAIL PROTECTED] wrote:
If no one has any objections, I will go ahead and commit this.
Dennis Kubes
Dennis Kubes (JIRA) wrote:
[
https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugi
On 6/25/07 8:34 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
Author: kubes
Date: Mon Jun 25 20:33:59 2007
New Revision: 550669
URL: http://svn.apache.org/viewvc?view=rev&rev=550669
Log:
NUTCH-497: Fixes problems relating to StackOverflow errors
and extreme nested tags. Adds general
No problemo!
Thanks!
Cheers,
Chris
On 6/25/07 9:45 PM, Dennis Kubes [EMAIL PROTECTED] wrote:
ooops...gotta remember to do that. Done.
Dennis
Chris Mattmann wrote:
On 6/25/07 8:34 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
Author: kubes
Date: Mon Jun 25 20:33:59 2007
New
Doğacan,
This is strange indeed. I noticed this during my testing of parse-feed,
however, thought it was an anomaly. I got this same strange cryptic unit
test error message, and then after some frustration figuring it out, I did
ant clean, then ant compile-core test, and miraculously the error
On 6/20/07 7:17 AM, Doğacan Güney [EMAIL PROTECTED] wrote:
It never passes for me (not even when I do it in src/plugin/feed). If
you
check the output, parseResult only contains a single entry which
is
rsstest.rss.
Okay, please tell me I'm not crazy here. I'm on Mac OS X 10.4, Java
version:
On 6/20/07 8:17 AM, Doğacan Güney [EMAIL PROTECTED] wrote:
Since you are doing compile-core, no plugins get compiled
(say,
urlfilter-prefix), then when you do a ant test in feed
only
protocol-file gets compiled. So, no urlfilter-prefix, no problem :).
I
have to say that I am certain that I
Hi Folks,
I'd just like to throw out my +1 for Doğacan Güney's committer status. I've
been impressed by several of his contributions and the guy just keeps them
coming and coming. I'm not a member of the Lucene PMC, so I don't have
official voting rights, however, I would like to express my
announcing
the completion of the release.
Thanks!
Cheers,
Chris
On 4/4/07 7:21 PM, Chris Mattmann [EMAIL PROTECTED] wrote:
Hi Guys,
I've just moved forward with step 13 in the release process (waiting for
release to propagate to mirrors). Should I just go ahead and do the other
Hi Folks,
After some hard work from all folks involved, we've managed to push out
Apache Nutch, release 0.9. This is the second release of Nutch based
entirely on the underlying Hadoop platform. This release includes several
critical bug fixes, as well as key speedups described in more detail at
wrapped up tonight! :-)
Cheers,
Chris
On 4/4/07 8:04 AM, Sami Siren [EMAIL PROTECTED] wrote:
Chris Mattmann wrote:
Hi Folks,
I have posted a candidate for the Apache Nutch 0.9 release at
http://people.apache.org/~mattmann/nutch_0.9/rc2/
Please vote on releasing these packages as Apache
Hi Guys,
I've just moved forward with step 13 in the release process (waiting for
release to propagate to mirrors). Should I just go ahead and do the other
steps (update Nutch site, update Lucene site, Update javadoc, create version
in JIRA, etc.)? It seems that I could do these without the
Hi Guys,
I think we're discussing the same thing (improving the process), I
just don't think 0.9 is out yet :)
But to wrap it up for me:
+1 for creating 0.9 branch after fixing the bug (and removing the tag),
creating new rc
and starting a vote.
+1.
+1.
So, that's 3
Hi Dennis,
Thanks for taking care of this. :-) Could you update CHANGES.txt as well?
Once you take care of that, in about 2 hrs (when I get home), I'll begin the
release process again.
Thanks!
Cheers,
Chris
On 4/2/07 2:40 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
Author: kubes
to it sooner.
Dennis Kubes
Chris Mattmann wrote:
Hi Dennis,
Thanks for taking care of this. :-) Could you update CHANGES.txt as well?
Once you take care of that, in about 2 hrs (when I get home), I'll begin the
release process again.
Thanks!
Cheers,
Chris
On 4/2/07 2:40
Hi Folks,
I have posted a candidate for the Apache Nutch 0.9 release at
http://people.apache.org/~mattmann/nutch_0.9/rc2/
See the included CHANGES-0.9.txt file for details on release
contents and latest changes. The release was made from the 0.9-dev trunk,
including the recent patch applied
Folks,
As an FYI, here is a link to the log of the steps that I followed to get to
this point in the release:
http://people.apache.org/~mattmann/NUTCH_0.9_release_log_v2.doc
Cheers,
Chris
On 4/2/07 10:52 PM, Chris Mattmann [EMAIL PROTECTED] wrote:
Hi Folks,
I have posted a candidate
My +1 for 1.0.0. I already changed it to 0.10.0, but this can be easily
reverted, and was probably something that I should have brought to the
attention of the dev list before I did that (sorry about that). In any case,
I think 1.0.0 makes a lot of sense, politically, and software wise. Nutch is
Hi Sami,
A very limited acid test shows that I can do crawling and searching
through web app so that part is ok.
Great! Similar tests of my own showed the same.
About signatures: I can't find your public gpg key anywhere (to verify
the signature), not in KEYS file nor in keyservers I
/, using the same
convention as the others. To get the header, I did a gpg --list-keys.
Thanks!
Cheers,
Chris
On 3/27/07 8:14 AM, Chris Mattmann [EMAIL PROTECTED] wrote:
Hi Sami,
A very limited acid test shows that I can do crawling and searching
through web app so that part is ok.
Great
Hey Doug,
Do you think we should do this in Nutch too? I'm in favor of doing this --
how does everyone else feel?
Thanks!
Cheers,
Chris
__
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data
Hi Dennis,
Not to nit-pick, but the place where you inserted your change isn't at the
end (where they typically should be placed). You inserted in the middle of
the file, throwing off the numbering (there are now 2 sets of 18, and 19 in
the unreleased changes section). Could you please append
Dennis,
No probs. Thanks, a lot!
Cheers,
Chris
On 3/10/07 5:35 PM, Dennis Kubes [EMAIL PROTECTED] wrote:
Chris Mattmann wrote:
Hi Dennis,
Not to nit-pick, but the place where you inserted your change isn't at the
end (where they typically should be placed). You inserted
On 3/8/07 1:55 PM, Andrzej Bialecki [EMAIL PROTECTED] wrote:
Chris Mattmann wrote:
Hi Andrzej,
Yep, +1. I also want to make a small update, where instead of creating a
new NutchConf object, to just pass it through (maybe via the protocol
layer?). Does this make sense?
I'm not sure
Hi Folks,
As suggested by Sami, I'm moving this discussion to the nutch-dev list.
Seems like I am the guy that is going to do the Nutch 0.9 release :-)
However, it seems also that there are some issues that need to be sorted out
first. I'd like to follow up to Andrzej's email about loose ends
,
Chris
-- Forwarded Message
From: Chris Mattmann [EMAIL PROTECTED]
Date: Mon, 05 Mar 2007 21:25:30 -0800
To: Piotr Kosiorowski [EMAIL PROTECTED]
Cc: Chris Mattmann [EMAIL PROTECTED], Andrzej Bialecki
[EMAIL PROTECTED]
Conversation: Nutch release process help
Subject: Nutch release process help
Hi Guys,
Blocker
* NUTCH-400 (Update add missing license headers) - I believe this is
fixed and should be closed
+1, thanks to Sami for closing it.
* NUTCH-353 (pages that serverside forwards will be refetched every
time) - this was partially fixed in NUTCH-273, but a more
Jérôme Charron wrote:
Hi Chris,
The JIRA issue is the 309 : https://issues.apache.org/jira/browse/NUTCH-309
Thanks for your help.
Jérôme
On 2/13/07, Chris Mattmann [EMAIL PROTECTED] wrote:
Hi Doug, and Jerome,
Ah, yes, the log guard conversation. I remember this from a while back
Dennis,
I take my coffee black: with a single creamer ;) Okay, okay, sorry: I
thought we were talking about *real* hazing ;)
Cheers,
Chris
On 2/28/07 12:31 PM, Dennis Kubes [EMAIL PROTECTED] wrote:
Hi All,
Thank you Andrzej for your kind words. I am looking forward to working
Hi Doug, and Jerome,
Ah, yes, the log guard conversation. I remember this from a while back.
Hmmm, do you guys know what issue this was recorded as in JIRA? I have some
free time recently, so I will be able to add this to my list of Nutch stuff
to work on, and would be happy to take the lead
, and contacting the
folks who've begun work on this issue.
Thanks!
Cheers,
Chris
On 2/7/07 1:31 PM, Doug Cutting [EMAIL PROTECTED] wrote:
Chris Mattmann wrote:
Got it. So, the logic behind this is, why bother waiting until the
following fetch to parse (and create ParseData objects from
Guys,
Sorry to be so thick-headed, but could someone explain to me in really
simple language what this change is requesting that is different from the
current Nutch API? I still don't get it, sorry...
Cheers,
Chris
On 2/7/07 9:58 AM, Doug Cutting [EMAIL PROTECTED] wrote:
Renaud Richardet
PROTECTED] wrote:
Chris Mattmann wrote:
Sorry to be so thick-headed, but could someone explain to me in really
simple language what this change is requesting that is different from the
current Nutch API? I still don't get it, sorry...
A Content would no longer generate a single Parse. Instead
Hi Doug,
Since the target of the link must still be indexed separately from the
item itself, how much use is all this? If the RSS document is
considered a single page that changes frequently, and item's links are
considered ordinary outlinks, isn't much the same effect achieved?
IMHO, yes.
Hi there,
I could most likely be of assistance, if you gave me some more information.
For instance: I'm wondering if the use case you describe below is already
supported by the current RSS parse plugin?
The current RSS parser, parse-rss, does in fact index individual items that
are pointed to
you mention asynchronous above,
are you talking about the protocol for fetching the different RSS documents?
Thanks!
Cheers,
Chris
Thanks
-Original Message-
From: Chris Mattmann [EMAIL PROTECTED]
Date: Tue, 30 Jan 2007 18:16:44
To:nutch-dev@lucene.apache.org
Subject: Re
://lucene.apache.org/nutch</link>
<category>news
</category>
<author>kauu</author>
On 1/31/07, Chris
Mattmann [EMAIL PROTECTED] wrote:
Hi there,
I could most
likely be of assistance, if you gave me some more
information.
For
instance: I'm wondering if the use case you describe below
It's at least out-of-date and perhaps obsolete. A quick read of
Fetcher.java looks like there might be a case where a fatal error is
logged but the fetcher doesn't exit, in FetcherThread#output().
So this raises an interesting question:
People (such as Scott G.) out there -- are you folks
Hi Doug,
So, does this render the patch that I wrote obsolete?
Cheers,
Chris
On 1/25/07 10:08 AM, Doug Cutting [EMAIL PROTECTED] wrote:
Scott Ganyo (JIRA) wrote:
... since Hadoop hijacks and reassigns all log formatters (also a bad
practice!) in the org.apache.hadoop.util.LogFormatter
Before doubling (or after 0.9.0 tripling?) the maintenance/development work
please consider the following:
One option would be refactoring the code in a way that the parts that are
usable to other projects like protocols?, parsers (this actually was
proposed by
Jukka Zitting some time
Hi Dennis,
On 1/21/07 11:47 AM, Dennis Kubes [EMAIL PROTECTED] wrote:
All,
I am working on a How to Become a Nutch Developer document for the
wiki and I need some input.
I need an overview of how the process for JIRA works? If I am a
developer new to Nutch and just starting to look at
Folks,
When would you like to make the release? I've been working on NUTCH-185,
but got a bit bogged down with other work. If there is interest in having
NUTCH-185 included in the release, I could make a push to get out a patch by
week's end...
As for the rest, my +1 for NUTCH-61 being
Hi Sami,
On 12/9/06 2:27 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
Author: siren
Date: Sat Dec 9 14:27:07 2006
New Revision: 485076
URL: http://svn.apache.org/viewvc?view=rev&rev=485076
Log:
Optimize SpellCheckedMetadata further by taking into account the fact that it
is used only
in org.apache.nutch.metadata that aggregates all
the met key fields from HttpHeaders, and it would be the place that the met
key fields for FileHeaders, etc. could go into.
Let me know what you think, and thanks!
Cheers,
Chris
On 12/9/06 3:53 PM, Sami Siren [EMAIL PROTECTED] wrote:
Chris Mattmann wrote:
Hi Sami
Hi Sami,
On 11/23/06 9:45 AM, Sami Siren [EMAIL PROTECTED] wrote:
Couple of points:
1. You used tabs
I just installed a new version of Eclipse, and forgot to change the default
preference for using tabs versus just whitespaces. I've gone ahead and
changed this in my Eclipse and will commit
Thanks, Andrzej, thanks to the rest of the folks who voted me in! I really
appreciate the honor and pledge to help maintain the high quality of the
Nutch source code.
Best wishes and happy holidays to all the folks on the list!
Cheers,
Chris
On 11/23/06 4:10 AM, Andrzej Bialecki [EMAIL
Hi Folks,
I noticed that Nutch now requires JDK 5 in order to compile, due to recent
changes to the PluginRepository and some other classes. I think that this is
a good move, however, I wasn't sure that I had seen any official
announcement that Nutch now requires 1.5...
Cheers,
Chris
The switch to 1.5 format was also logged on jira issue
http://issues.apache.org/jira/browse/NUTCH-360
--
Sami Siren
Ahh, I didn't see this. Way to go Sami, I love it when people actually keep
records of changes! ;)
Cheers,
Chris
__
Chris A.
the email address for JIRA to not use the Apache
incubator one anymore, and to use the Lucene one.
Sound good? If so, could someone with permissions please take care of it?
:-)
Cheers,
Chris
On 10/3/06 9:04 AM, Sami Siren [EMAIL PROTECTED] wrote:
Andrzej Bialecki wrote:
Chris Mattmann wrote
Hi Doug and Andrzej,
+1. I think that workflow makes a lot of sense. Currently users in the
nutch-developers group can close and resolve issues. In the Hadoop workflow,
would this continue to be the case?
Cheers,
Chris
On 8/30/06 3:14 PM, Andrzej Bialecki [EMAIL PROTECTED] wrote:
Doug
Hi Chris,
It seems from your email message that your plugin is located in
$NUTCH_HOME/build/custom-meta? Is this where your plugin * code * is
currently stored? If so, this is the wrong location and the most likely
reason that your plugin isn't being loaded.
Plugin code should live in
Hi Jukka,
Thanks for your email. Indeed, there was discussion on the Lucene PMC email
list, about the Tika project. It was decided by the powers that be to
discuss it more on the Nutch mailing list before moving forward with any
vote on making Tika a sub-project of Apache Lucene. With regards to
the following list of candidate committers
who have expressed interest in our proposed project. The leader of the
Tika project would be Chris Mattmann. Chris works at NASA's Jet Propulsion
Laboratory as a Member of the Technical Staff in the Modeling and Data
Management Systems Section. Chris has
Hi Steven,
On 8/16/06 7:36 AM, steven shingler [EMAIL PROTECTED] wrote:
(This thread moved from the User List.)
OK Lukas, let's open it up to the dev list! :)
Particularly, does the group feel moving to Maven would be _a good thing_ ?
+1
I suggested this (however did not make any
Hi Guys,
I've seen on the Hadoop mailing list recently that there was a new status
added for issues in JIRA called Patch Available to let committers know
that a patch is ready for review to commit. How about we add this to the
Nutch jira instance as well? I tried doing this, but I don't think I
Hi Marko,
Thanks for your question. Basically it was set up as a sort of last
resort for getting at least * some * information from the PDF file, albeit
littered with garbage. If indeed the parse-text does not really make sense
in terms of a backup parser to handle PDF files and get at least
Hey Andrzej,
On 8/3/06 8:19 AM, Andrzej Bialecki [EMAIL PROTECTED] wrote:
Chris Mattmann wrote:
Hi Marko,
Thanks for your question. Basically it was set up as a sort of last
resort for getting at least * some * information from the PDF file, albeit
littered with garbage. If indeed
Hi Jukka,
Thanks for your email. Jerome Charron and I proposed a project with a
similar goal in mind that we wanted to dub Tika. Tika would effectively be
a Lucene sub-project, and would factor out some of the capabilities you
mention below from Nutch, incl:
1. MimeType repository
2. Parser
Folks,
Before I (or someone else) reopens the issue, I think it's important to
understand the implications:
1) Having a *side-effect* of the entire system stop processing after merely
logging a message at a certain event level is a poor practice.
I'm not sure that the Fetcher quitting is a *
Hi Andrzej,
The main problem, as Scott observed, is that the static flag affects all
instances of the task executing inside the same JVM. If there are
several Fetcher tasks (or any other tasks that check for SEVERE flag!),
belonging to different jobs, all of them will quit. This is
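The shared-flag behaviour described above can be illustrated with a minimal sketch. The class and method names here (StaticFlagDemo, Task, logSevere, shouldQuit) are hypothetical stand-ins, not the actual Nutch or Hadoop code; the only assumption is that the flag is a static field checked by every task running in the same JVM.

```java
public class StaticFlagDemo {

    // Hypothetical stand-in for a task that polls a JVM-wide "severe" flag.
    public static class Task {
        // Static: one copy per JVM, shared by every Task instance of every job.
        private static volatile boolean severeLogged = false;

        private final String job;

        public Task(String job) {
            this.job = job;
        }

        public String getJob() {
            return job;
        }

        // Simulates one task logging at SEVERE level.
        public void logSevere() {
            severeLogged = true;
        }

        // Every task consults the same static flag.
        public boolean shouldQuit() {
            return severeLogged;
        }

        // Clears the flag so the demo can be re-run in one JVM.
        public static void reset() {
            severeLogged = false;
        }
    }

    public static void main(String[] args) {
        Task a = new Task("job-A");
        Task b = new Task("job-B");
        a.logSevere(); // only job-A hit a problem
        System.out.println(b.getJob() + " quits too: " + b.shouldQuit());
    }
}
```

An instance field, or a flag keyed by job, would keep one job's failure from stopping the others; which of these fits the real code is a separate question.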
Hi Alex,
I also noticed this issue a while back. It's described here:
http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200510.mbox/%3c435
[EMAIL PROTECTED]
Cheers,
Chris
On 4/25/06 2:41 PM, Alex [EMAIL PROTECTED] wrote:
Hi there,
I'm fairly new to nutch and in working on the
Hi Stefan,
The DTD actually does allow for custom attributes: Jerome factored them
out of the form:
<implementation
your_attr_name="your_attr_value"
your_attr2_name="your_attr2_value"
.
Into the form:
<implementation
...
<parameter name="your_attr_name" value="your_attr_value"/>
+1
On 4/7/06 10:20 AM, Doug Cutting [EMAIL PROTECTED] wrote:
Chris Mattmann wrote:
+1 for a release sooner rather than later.
I think this is a good plan. There's no reason we can't do another
release in a month. If it is back-compatible we can call it 0.8.x and
if it's incompatible
Hi Andrzej,
On 4/7/06 12:18 PM, Andrzej Bialecki [EMAIL PROTECTED] wrote:
Do you guys have any additional insights / suggestions whether NUTCH-240
and/or NUTCH-61 should be included in this release?
Looking at the JIRA popular issues pane for Nutch (
+1 for a release sooner rather than later. Several interesting features
contributed since the 0.7 branch I believe are now tested and
production-worthy, at least in my environment. Hats off to the folks who
were able to split the MapReduce and NDFS into Hadoop -- I'm going to be
experimenting with
Hi Folks,
I updated to the latest SVN revision (385691) today, and I am now seeing a
Null Pointer exception in the AnalyzerFactory.java class. It seems that in
some cases, the method:
private Extension getExtension(String lang) { Extension extension =
(Extension)
Thanks Jerome! :-)
Cheers,
Chris
On 3/13/06 4:02 PM, Jérôme Charron [EMAIL PROTECTED] wrote:
I updated to the latest SVN revision (385691) today, and I am now seeing
a
Null Pointer exception in the AnalyzerFactory.java class.
Fixed (r385702). Thanks Chris.
NOTE: not sure if
Hi Stefan,
after a short time I already had these lines 1602 times in my
tasktracker log files.
060307 022707 task_m_2bu9o4 found resource parse-plugins.xml at
file:/home/joa/nutch/conf/parse-plugins.xml
Sounds like this file is loaded 1602 (after lets say 3 minutes) I
guess that wasn't
RuntimeException("x point " + Parser.X_POINT_ID + " not
found.");
Cheers,
Chris
Cheers,
Stefan
Am 07.03.2006 um 04:38 schrieb Chris Mattmann:
Hi Stefan,
after a short time I already had these lines 1602 times in my
tasktracker log files.
060307 022707 task_m_2bu9o4 found resource parse
) {
throw new RuntimeException("x point " + Parser.X_POINT_ID + " not
found.");
-Original Message-
From: Chris Mattmann [mailto:[EMAIL PROTECTED]
Sent: Monday, March 06, 2006 7:51 PM
To: 'nutch-dev@lucene.apache.org'
Subject: RE: found resource parse-plugins.xm?
Hi Stefan,
Hi Chris
Hey Doug,
I think that, at least in the case of parse-rss, parse-pdf, and the nutch
core, there's probably some utility in having lib-xxx plugins (or at least
putting these jars in the $NUTCH_HOME/lib) for:
commons-httpclient
log4j
xerces
Then, protocol-httpclient, parse-pdf and the rest of
Hi Andrzej,
commons-httpclient-3.0-beta1.jar src/plugin/parse-rss/lib
commons-httpclient-3.0.jar src/plugin/protocol-httpclient/lib
Not sure what was the reason to use the beta1, perhaps no reason except
that it was the latest available at the moment...
Yup, I think that was
and .classpath
+1
Am 08.02.2006 um 06:16 schrieb Chris Mattmann:
Hi Folks,
Just wondering if someone could add to the svn:ignore property for
Nutch
the files:
.classpath
.project
I happen to use eclipse to do Nutch development and always ignore
these
files in my other
Hi Folks,
Just wondering if someone could add to the svn:ignore property for Nutch
the files:
.classpath
.project
I happen to use eclipse to do Nutch development and always ignore these
files in my other eclipse projects as well.
Cheers,
Chris
Hi Gail,
Check out:
http://wiki.apache.org/nutch/ParserFactoryImprovementProposal/
That's the way that the parser factory currently works. Also added, but not
described in that proposal is the ability to call a parser by its id, which
is a method present in ParseUtil.java.
G'luck!
Cheers,
Hi Folks,
Jerome and I have been thinking a bit about the whole issue of static
NutchConf, versus removing it and making it a constructor parameter, etc. I
personally think that a lot of this issue stems from the fact that the
actual source code for nutch, and the what I would call source
Hi Folks,
I've tried removing the 5 copies of the comment, however I can't find a
button on JIRA to remove comments. Maybe an administrator for Nutch can do
it? Anyways, the dang thing is running so slow right now, it may just have
to wait until the server stops returning the 503 service
Guys,
My apologies for the spamming comments -- I tried to submit my comment
through JIRA one time and it kept giving me service unavailable. So I
resubmitted like 5 times, on the fifth time it finally went through -- but I
guess the other comments went through too. I'll try and remove them
Hi Folks,
Anybody been experiencing problems building the parse-rtf plugin? I just
noticed while working on NUTCH-139 that there's a line at the end of
RTFParser.java in parse-rtf that returns a new ParseImpl, however, the
constructor for ParseData uses the old ParseData constructor (pre
Hi Folks,
I was just thinking about the ParseData java.util.Properties metadata object
and thinking about the way that we store names in there. Currently, people
are free to name their string-based properties anything that they want, such
as having names of Content-type, content-TyPe,
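As a rough illustration of the naming concern above, here is a minimal sketch of a metadata map whose keys ignore case. It uses a plain java.util.TreeMap with String.CASE_INSENSITIVE_ORDER; the class name CaseInsensitiveNames and the helper newMetadata are hypothetical, and this is not how ParseData stores its properties.

```java
import java.util.Map;
import java.util.TreeMap;

public class CaseInsensitiveNames {

    // Hypothetical helper: a map whose keys compare without regard to case.
    public static Map<String, String> newMetadata() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        Map<String, String> meta = newMetadata();
        meta.put("Content-type", "text/html");
        // A lookup with different capitalisation reaches the same entry.
        System.out.println(meta.get("content-TyPe"));
        // Re-inserting under another spelling replaces the value; no second key is added.
        meta.put("CONTENT-TYPE", "text/plain");
        System.out.println(meta.size());
    }
}
```

With a case-sensitive container such as java.util.Properties, the three spellings would be stored as three separate entries.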
insensitive?
Stefan
Am 13.12.2005 um 18:07 schrieb Chris Mattmann:
Hi Folks,
I was just thinking about the ParseData java.util.Properties
metadata object
and thinking about the way that we store names in there. Currently,
people
are free to name their string-based properties anything
Hi Folks,
Jerome and I have been talking about an idea to address the current issue
raised by Stefan G. about having a mapping of mimeType->list of pluginIds
rather than mimeType->list of extensionIds in the parse-plugins.xml file.
We've come up with the following proposed update that would
Hi Guys,
Okay, that makes sense then. I will create an issue in JIRA later today
describing the update, and then begin working on this over the next few
days.
Thanks for your responses and reviews.
Cheers,
Chris
On 12/13/05 12:45 PM, Jérôme Charron [EMAIL PROTECTED] wrote:
I agree, too.
Hi James,
You can submit your patch via JIRA
(http://issues.apache.org/jira/browse/NUTCH). You can create an issue there
and then attach your patch to that issue.
G'luck,
Chris
__
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and
Hi Guys,
Just wondering if any of the committers checked out
http://issues.apache.org/jira/browse/NUTCH-112. Turns out the link to the
cached.jsp page to the cached content contains an absolute link which makes
the link mess up when you don't deploy the nutch webapp in the root context.
I've
Jerome,
I think that this is a great idea and ensures that there isn't replication
of so-called management information across the system. It could be easily
implemented as a utility method because we have utility java classes that
represent the ParsePluginList, that you could get the mimeTypes
Hi Doug,
Chris Mattmann wrote:
In principle, the mimeType system should give us some guidance on
determining the appropriate mimeType for the content, regardless of
whether
it ends in .foo, .bar or the like.
Right, but the URL filters run long before we know the mime type
Hi Jerome,
Yes, the fetcher can't rely on the document mime-type.
The only thing we can use for filtering is the document's URL.
So, another alternative, could be to exclude only files extensions that
are
registered in the mime-type repository
(some well known file extensions) but for which
generic forms of XML markup content.
Cheers,
Chris Mattmann
Am 24.11.2005 um 00:01 schrieb Jérôme Charron:
Hi,
We (Chris Mattmann, François Martelet, Sébastien Le Callonnec and
me) just
added a new proposal on the nutch Wiki:
http://wiki.apache.org/nutch
Hi Stefan, and Jerome,
A mail archive is an amazing source of information, isn't it?! :-)
To answer your question, just ask yourself how many pages per second
you plan to fetch and parse and how many queries per second a lucene
index is able to handle - and you can deliver in the ui.
I
Hi Doug,
I just noticed this comment from your original email:
First, the ParserFactory sometimes uses LOG.severe() which causes the
Fetcher to exit. Is there a reason this cannot be LOG.warning()?
LOG.severe() should only be used if you intend the application to exit.
This configuration
Hi Doug,
Thanks, that worked.
Cheers,
Chris
On 10/17/05 11:56 AM, Doug Cutting [EMAIL PROTECTED] wrote:
Chris Mattmann wrote:
So, my question to you then
is, what type of QueryFilter should I develop in order to get my query for
contactemail:email address to work as a standalone query
Hi,
I'm not an XML expert by any means, but wouldn't it be simpler to just wrap
any text where illegal chars are possible with a <![CDATA[ ]]> tag? That
way, the offending characters won't be dropped and the process won't be
lossy, no?
If the CDATA method won't work, and there's no other way
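A minimal sketch of the CDATA idea, assuming a hypothetical helper named CdataWrapper.wrap (not part of Nutch). One caveat: a CDATA section only spares markup characters such as < and & from escaping; characters outside the XML 1.0 character range (most control characters, for instance) are still not allowed inside it, so those would need separate handling.

```java
public class CdataWrapper {

    // Hypothetical helper: wraps text in a CDATA section.
    public static String wrap(String text) {
        // The sequence "]]>" would close the section early, so it is split
        // across two adjacent CDATA sections.
        String safe = text.replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(wrap("a < b && c"));
        System.out.println(wrap("x]]>y"));
    }
}
```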
Hi Otis,
Point taken. In actuality since both convey the same information I think
that it's okay to support both, but by default say we could code the initial
plugins specified in parse-plugins.xml without the order= attribute. Fair
enough?
Cheers,
Chris
On 9/15/05 3:23 PM, [EMAIL
Hi Jack,
Wow, that's a weird error. I'm not exactly sure what's causing that, let me
look at the stack trace you provided and get back to you at some point on
that. As for your 2nd question:
My question is that can parse-rss support application/xml or more
content-type?
The answer to that is
Hi Jerome,
I may have some time to work on this over the next few days if no one else
does. So, if you're taking the lead on this, I volunteer my help if you'd
like it.
Thanks,
Chris
On 9/8/05 2:06 AM, Jerome Charron (JIRA) [EMAIL PROTECTED] wrote:
Enhance ParserFactory plugin selection
73005.
The patch and source distro are zipped up in the file: parse-rss-73005.zip.
Here is a direct link:
http://issues.apache.org/jira/secure/attachment/12311475/parse-rss-73005.zip
Thanks!
Cheers,
Chris Mattmann
__
Chris A. Mattmann
[EMAIL PROTECTED
They have my votes. Great job so far guys.
Jérôme Charron: +1
Piotr Kosiorowski: +1
Cheers,
Chris Mattmann
On 6/8/05 1:09 PM, Doug Cutting [EMAIL PROTECTED] wrote:
I propose that we add Jérôme Charron and Piotr Kosiorowski as Nutch
committers. Both Jérôme and Piotr have contributed many
recently. Does anyone have any clue as to what
causes it? I can attach the full crawl log if necessary. Please let me know.
Thanks very much,
Chris Mattmann
__
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data
,
Chris Mattmann
-Original Message-
From: Marco PV [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 20, 2005 7:24 PM
To: nutch-dev@incubator.apache.org
Subject: parse-rss fetch problems
Hi,
I'm using /nutch-nightly from April 18th.
I've downloaded and uploaded the last src/plugin/parse
. Thanks for
trying to help Marco out, but his problem regarding compiling had to do with
having the old (0.6) version of Nutch.
Thanks,
Chris
__
-Original Message-
From: Chris Mattmann [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 20, 2005 8