Make LinkTask support arbitrary data by extending HashMap, and consider
refactoring Task, Link, and LinkTask
-----------------------------------------------------------------------------------------------------------
Key: DROIDS-54
URL: https://issues.apache.org/jira/browse/DROIDS-54
Project: Droids
Issue Type: New Feature
Components: core
Affects Versions: 0.01
Reporter: Mingfai Ma
Refer to the initial idea at:
https://issues.apache.org/jira/browse/DROIDS-48?focusedCommentId=12721121&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12721121
The current implementation of LinkTask:
{code}
public class LinkTask implements Link, Serializable
{
  private Date started;
  private final int depth;
  private final URI uri;
  private final Link from;
  private Date lastModifedDate;
  private Collection<URI> linksTo;
  private String anchorText;
  private int weight;
  // ... rest of the class omitted
}
{code}
Suggested change:
{code}
public class LinkTask extends HashMap<String, Serializable>
// or
public class LinkTask extends HashMap<String, Serializable> implements Link
{code}
The minimum required attributes are:
- final ? id
  - mainly to have a minimum-size value to use as a hash key and to store in a
    memory/data grid for lookup, e.g. as a history to avoid duplicate fetching.
    Refer to DROIDS-53.
- final String url
  - the original String representation of the URL (preferred), or a
    java.net.URI representation with the encoded string (which does not seem
    like a good option).
  - the url is the original one provided by the user at construction. Two
    different url strings may refer to the same resource, e.g.
    http://www.apache.org and http://www.apache.org/; it's up to the user to
    decide whether they should be normalized (they could use the
    URL/LinkNormalizer in DROIDS-45).
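As a rough sketch of the two required attributes, assuming the id is a hex SHA-1 digest of the url string (the issue leaves the id type open) and using the hypothetical names MapLinkTask and FetchHistory, a map-backed task could look something like this:

```java
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical map-backed task: id and url are fixed, all other data lives in the map. */
class MapLinkTask extends HashMap<String, Serializable> {
    private static final long serialVersionUID = 1L;

    private final String id;   // assumption: hex SHA-1 of the url; the id type is undecided
    private final String url;  // kept exactly as the caller supplied it, never normalized here

    MapLinkTask(String url) {
        this.url = url;
        this.id = sha1Hex(url);
    }

    String getId() { return id; }

    String getUrl() { return url; }

    /** Fixed-size digest of the url, so the id is 40 characters however long the url is. */
    static String sha1Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}

/** Hypothetical in-memory history keyed by task id, used to skip urls already seen. */
class FetchHistory {
    private final Set<String> seenIds = new HashSet<>();

    /** Returns true the first time a task id is seen, false for duplicates. */
    boolean markIfNew(MapLinkTask task) {
        return seenIds.add(task.getId());
    }
}
```

With this sketch, http://www.apache.org and http://www.apache.org/ get different ids, so both would be fetched unless the caller normalizes the url before constructing the task. The history keys on the id rather than on the task object because the equals() inherited from HashMap compares map entries only and ignores the id and url fields.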
The other fields are basically optional:
- started/taskDate is useful if the queue uses it for sorting; otherwise, it's
  just for logging.
- "weight" is another example of a field that not all implementations may need.
- "linksTo", a.k.a. outLinks, is also optional to attach to the LinkTask. An
  implementation may extract the outlinks and put them in the queue directly
  without storing them in the LinkTask.
- "from", a.k.a. referrer, should not store the Link reference, because holding
  it keeps the referring Link (and its own referrers) reachable and so
  prevents them from being garbage collected.
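To illustrate how the optional fields might be handled, here is a minimal sketch that uses a plain Map<String, Serializable> as a stand-in for the proposed LinkTask. The class name OptionalTaskFields and the key names ("taskDate", "weight", "from", "url") are assumptions made for the example, not something the issue fixes.

```java
import java.io.Serializable;
import java.util.Comparator;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helpers for optional attributes stored in a map-backed task. */
final class OptionalTaskFields {
    // Key names are illustrative only.
    static final String TASK_DATE = "taskDate";
    static final String WEIGHT = "weight";
    static final String REFERRER = "from";

    private OptionalTaskFields() {}

    /** Builds a task map; the referrer is recorded as a url string, not an object reference. */
    static Map<String, Serializable> newTask(String url, String referrerUrl, Date taskDate) {
        Map<String, Serializable> task = new HashMap<>();
        task.put("url", url);
        if (referrerUrl != null) task.put(REFERRER, referrerUrl);
        if (taskDate != null) task.put(TASK_DATE, taskDate);
        return task;
    }

    /** Reads the optional weight, falling back to a default when the key is absent. */
    static int weightOf(Map<String, Serializable> task, int defaultWeight) {
        Serializable w = task.get(WEIGHT);
        return (w instanceof Integer) ? (Integer) w : defaultWeight;
    }

    /** Orders tasks by taskDate, oldest first; tasks without a date sort last. */
    static Comparator<Map<String, Serializable>> byTaskDate() {
        return Comparator.comparing(
                (Map<String, Serializable> t) -> (Date) t.get(TASK_DATE),
                Comparator.nullsLast(Comparator.naturalOrder()));
    }

    /** A queue that only relies on taskDate when the implementation chooses to sort by it. */
    static PriorityQueue<Map<String, Serializable>> newQueue() {
        return new PriorityQueue<>(byTaskDate());
    }
}
```

Because the referrer is stored as a String, a finished parent task is no longer reachable from its children and can be collected. An implementation that does not sort by date or use weights simply never sets those keys.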
By the way, should we also simplify Link, Task, and LinkTask? If we use a Map,
it's already very generic. Link and Task could be different concepts if we need
to use them separately.
--
This message is automatically generated by JIRA.