[
https://issues.apache.org/jira/browse/DROIDS-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721121#action_12721121
]
Mingfai Ma commented on DROIDS-48:
----------------------------------
Let me submit another patch. I habitually use my IDE's formatter, but I
haven't configured it for this project's coding style, so... :-P
P.S. This issue could be handled just by adding an integer weight field, but I
feel it would be most flexible if the LinkTask could hold arbitrary data.
The simplest way is to make it extend a Map implementation such as HashMap:
{code}
public class LinkTask extends HashMap<String, Serializable> { // other interfaces are skipped
  protected final String id; // whatever data type for ID
  protected final URI uri;   // refer to DROIDS-52, this may cause problems for URI
  // all the other data are optional
}
{code}
Use cases:
- When submitting a link, we may want to associate cookie/HTTP header
information with it, so the fetcher can use the cookie info when fetching.
- Any optional field, such as weight, can be used.
- Any component, such as a filter or parser, can mark a link with arbitrary
tags. For example, a parser/factory may read a "parser"/"contentType" value to
decide how the data should be parsed (so the parser doesn't depend on
HttpEntity in its interface), or the outlinks could be attached directly to a
LinkTask.
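The use cases above could look roughly like the following. This is only a
sketch of the idea, not the actual Droids class: the constructor, the
getWeight() helper, the key names ("cookie", "weight", "contentType") and the
default weight of 0 are all assumptions made for illustration.

```java
import java.io.Serializable;
import java.net.URI;
import java.util.HashMap;

// Hypothetical sketch of the proposed LinkTask; not the real Droids class.
class LinkTask extends HashMap<String, Serializable> {
    private static final long serialVersionUID = 1L;

    protected final String id;
    protected final URI uri;

    LinkTask(String id, URI uri) {
        this.id = id;
        this.uri = uri;
    }

    // Reads the optional "weight" entry; falls back to 0 when it is absent
    // or not an Integer. Key name and default are assumptions.
    int getWeight() {
        Serializable w = get("weight");
        return (w instanceof Integer) ? (Integer) w : 0;
    }
}

class LinkTaskDemo {
    public static void main(String[] args) {
        LinkTask task = new LinkTask("1", URI.create("http://www.dmoz.org/"));
        task.put("cookie", "session=abc123");  // could be read by a fetcher
        task.put("weight", 5);                 // could be read by a prioritizing queue
        task.put("contentType", "text/html");  // could be read by a parser factory

        System.out.println(task.getWeight());
        System.out.println(task.get("contentType"));
    }
}
```

Components that don't care about a given key simply ignore it, which is the
flexibility argued for above. The cost is that values are untyped, so each
reader has to check the type of what it gets back.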
I'm throwing the initial idea out here to see if anyone has comments. More
details on the implementation can be provided.
> Support prioritizing in the TaskQueue
> -------------------------------------
>
> Key: DROIDS-48
> URL: https://issues.apache.org/jira/browse/DROIDS-48
> Project: Droids
> Issue Type: New Feature
> Components: core
> Affects Versions: 0.01
> Reporter: Mingfai Ma
> Attachments: DROIDS-48d.patch, DROIDS-48d2.patch
>
>
> Use case:
> - When looping over a directory (imagine someone doesn't know the dmoz
> database can be downloaded and tries to crawl it with Droids), we collect
> a lot of links that will be handled later. Assume the requirement is
> to fetch dmoz directory +1 link outside dmoz.org. In the original mechanism,
> it will keep adding new links to the TaskQueue. Ideally, there should be a
> mechanism to give a higher priority to the non-dmoz.org links, so when
> non-dmoz links are added, they are processed first and removed from the
> TaskQueue ASAP.
> With the patch in DROIDS-47, a constructor is added to the SimpleTaskQueue to
> support a custom Queue. This issue suggests changing the SimpleTaskQueue to
> use a PriorityBlockingQueue by default, and adding a getWeight method to the
> Task interface.
> I'm also thinking about a more complex TaskQueue, to be discussed on the
> mailing list later.
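
The suggestion in the description (a PriorityBlockingQueue ordered by a
getWeight method) could be sketched as below. The WeightedTask interface, the
SimpleWeightedTask class and the "higher weight is polled first" ordering are
assumptions for illustration only; the real Droids Task and SimpleTaskQueue
are not reproduced here.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical stand-in for a Task interface with the proposed getWeight().
interface WeightedTask {
    String getId();
    int getWeight();
}

// Hypothetical minimal task implementation used only for this sketch.
class SimpleWeightedTask implements WeightedTask {
    private final String id;
    private final int weight;

    SimpleWeightedTask(String id, int weight) {
        this.id = id;
        this.weight = weight;
    }

    public String getId() { return id; }
    public int getWeight() { return weight; }
}

class PriorityQueueDemo {
    // Builds a queue whose head is the task with the highest weight
    // (assumed ordering; the issue does not say which direction wins).
    static PriorityBlockingQueue<WeightedTask> newQueue() {
        Comparator<WeightedTask> byWeightDesc =
                (a, b) -> Integer.compare(b.getWeight(), a.getWeight());
        return new PriorityBlockingQueue<>(11, byWeightDesc);
    }

    public static void main(String[] args) {
        PriorityBlockingQueue<WeightedTask> queue = newQueue();
        queue.add(new SimpleWeightedTask("dmoz-1", 0));
        queue.add(new SimpleWeightedTask("external-1", 10)); // non-dmoz link
        queue.add(new SimpleWeightedTask("dmoz-2", 0));

        // The external link is polled before either dmoz link.
        System.out.println(queue.poll().getId());
    }
}
```

Note that PriorityBlockingQueue does not guarantee any particular order among
tasks of equal weight, so links with the same weight are not necessarily
processed in insertion order.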