Again on the imap-URL-URI Problem:
I wrote my own URLNormalizer and switched to URI in the ProtocolFactory. The one problem in the URLNormalizer is the check whether the URL points to a file. I don't know where this is necessary, so I left it out for now, just to see whether the rest works.
I'm not concerned with the real IMAP URL syntax right now. I just want to see my imap protocol plugin being called.
But it isn't.
Injecting imap://mymailserver.org/INBOX writes it to the webdb.
As far as I can see, the FetchListTool accepts it too, but when the FetcherThread goes through the fetchlist, it stops at:
if (fetchList.next(fle) == null)
So Nutch never gets as far as choosing the protocol.
I don't really understand what happens with the fetchlist and the FetchListEntry. No errors are reported; it just doesn't work.
Any ideas why fetching the next FetchListEntry returns null?
Konstantin
Doug Cutting wrote:
Konstantin Ott wrote:
The protocol plugins seem to be the right starting point. But here and in other places, such as the Fetcher, I see that pages basically need java.net.URL, though actually only for splitting the URL into host, port, path, and so on. So we only need the URLStreamHandler in the protocol plugins.
Using something like javax.mail.URLName would leave the necessary StreamHandler in the protocol plugins, and it would be easy to write plugins for whatever protocol I want to implement.
Any ideas for building protocol plugins without using java.net.URL?
You are correct that java.net.URL is used only to parse URLs into protocol, host, port, file, etc. So we could indeed use a different class that only supports this. Two questions:
1. Why should we replace it? What is the problem with java.net.URL? Does it reject unknown protocols? If so, that would be a good reason.
2. What should we replace it with? I would opt for java.net.URI over javax.mail.URLName. It seems well suited to our purposes and is included in the base JVM. However, I recall trying to use it when initially writing Nutch's URLNormalizer and finding it deficient. But if URL does not permit us to use arbitrary protocol names, then perhaps we should revisit this and work around these deficiencies.
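To make the first question concrete, here is a minimal sketch that checks whether java.net.URL accepts a scheme and prints the parts java.net.URI extracts from the same string. The class name UrlVsUri and the helpers urlAccepts and describe are made up for illustration and are not part of Nutch. The sketch assumes a stock JVM with no URLStreamHandler registered for imap.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper class, for illustration only (not Nutch code).
class UrlVsUri {

    // True if java.net.URL can be constructed from the string.
    // The constructor throws MalformedURLException when no stream
    // handler is registered for the scheme.
    @SuppressWarnings("deprecation") // URL(String) is deprecated in recent JDKs
    static boolean urlAccepts(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // Parses the string with java.net.URI, which needs no handler,
    // and returns "scheme host port path" (port is -1 if absent).
    static String describe(String s) throws URISyntaxException {
        URI u = new URI(s);
        return u.getScheme() + " " + u.getHost() + " " + u.getPort() + " " + u.getPath();
    }

    public static void main(String[] args) throws URISyntaxException {
        String imap = "imap://mymailserver.org/INBOX";
        System.out.println("URL accepts http: " + urlAccepts("http://mymailserver.org/"));
        System.out.println("URL accepts imap: " + urlAccepts(imap));
        System.out.println("URI components:   " + describe(imap));
    }
}
```

Under that assumption, URL should reject the imap string while URI still splits it into scheme, host, port and path, which is all the parsing discussed above requires.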
Would someone like to try replacing URL with URI globally, and seeing what works and what fails?
Doug
Nutch-developers mailing list https://lists.sourceforge.net/lists/listinfo/nutch-developers
