On Sun, 2010-06-27 at 23:25 +0300, Eugen Paraschiv wrote:
> Hi, I'm working with Apache Droids and have been using it successfully
> in a personal project.

Welcome Eugen, nice to hear that it worked for you. 

>  While I was using and extending the project, I can across some
> questions and made some notes that are now stacking up. 

Please subscribe and post to droids-dev
<[email protected]> and not droids-dev-owner (that are
only the moderators).

> Some of them may already be in the roadmap, and some others may be due
> to the fact that I'm still a new user of the framework. I will try to
> list them here, one by one. Here goes: 
> 
> 1. Why are things such as FileRenameDroid, ExceptionReportHandler,
> ReportHandler, FileRenameDroid  located in the test packages of the
> project? Should the test package contain only tests, or is there a
> reason these classes are located here?

The FileRename I started there as a test/example but yeah I guess there
should be moved. Can you prepare a patch and attach it to the issue
tracker?

> 
> 2. Why does Save (handler) define static utility methods? Should there
> be a utility class for these kinds of methods? 

Yeah (createFile) could be probably refactored 

> 
> 3. Should Save be in fact named SaveHandler?

About the naming we still have some open discussions. We need to pick up
again the naming/concept discussion we had on this list with Ming Fai.

> 
> 4. LinkTask has a two methods called: setLastModifedDate and
> getLastModifedDate (wrong spelling) and one called getLastModifiedDate
> (correct spelling), working with the same field. This is probably not
> right - they need to be renamed and cleaned up
> 

+1

> 5. SaveCrawlingDroid - if the Save handler is created within, the
> client has no control over it - no way of setting properties on it and
> no way to inject another one. A possible solution is to inject the
> handler into the droid; this way, the handler can be configured and
> then injected

I agree partly since the package org.apache.droids.impl is in the test
package meaning it is just for testing the save, sysout, ...

> 
> 6. Why is SimpleTaskQueue also a TaskValidator? It already contains a
> TaskValidator, so what is the advantage of wrapping this and
> delegating the calls

Not 100% sure.

> 7. Proposal: the Save handler has an outputDirectory property; that
> only allows the files to be stored in a single place, with no
> additional rules applied to this process; the process of choosing the
> path of the file can be more flexible, based on client logic; this can
> be done via a simple object that would decide the path based on that
> logic - PathDecider; also this would get rid of the path related logic
> from this handler for a clear separation of responsibilities between
> deciding the path where data is saved and actually saving the data
> 

I recommend to create a new save handler for this, since the save
handler is just for writing a file to disk. However I totally agree that
we can rename the "save" to "simpleSave" and the new save could allow to
write to jcr, svn, ftp, ...



> 8. Proposal: the validation process needs more flexibility - a chain
> of validators that are applied one after the other; when one would
> consider the entity invalid, the process would stop (voters); this can
> be achieved via a ChainValidator that holds a ordered set of
> validators and executes them in order, on validate. This way the
> separation of each validation rule can be done more cleanly than
> mixing all the rules in a single validator

There have been some discussion around validators, but nobody has
provided new patches around the issue.

> 
> 9. Proposal: also, more predefined validators are needed:
> SimpleTaskValidator can be renamed to MaxDepthTaskValidator, as that
> is it's only responsability,

+1

>  and other validators can be added: 
>       * AllowedDomainsTaskValidator - to validate only tasks that are
>         from one or several specified domains, 
>       * RegexTaskValidator - for validating a Task based on some
>         specified regex rules

Actually we have that in the droids-spring package:
 <!-- Filter -->
  <bean

name="org.apache.droids.api.URLFilter/org.apache.droids.net.RegexURLFilter"
    class="org.apache.droids.net.RegexURLFilter">
    <property name="file" value="${droids.filter.regex}"/>
  </bean>

>       * TypeTaskValidator - for validating based on the resource type
> etc

ATM a task is more or less a link to crawl but in the future if we have
various task implementations I agree. 

> 
> 10. Why do the factories not initialize their own state (the Map)? Is
> it to provide more flexibility? It seems that a Factory could just as
> easily instantiate it's own map, because that is it's main state and
> it cannot really be used without an instantiated map. 

That has historical reasons, the first version of droids was one big
package driven by spring. So the maps we stored in the spring conf
rather then in the class.

> 
> 11. Why is the validation process is based on throwing exceptions? As
> a practice, an exception is an exceptional situation; a validator
> declaring a task as invalid is a normal situation, not an exceptional
> one, so perhaps a boolean result would suffice here. 

I am not 100% sure anymore but I remember there have been situations
where the exception was used to stop the process.

..but maybe Ryan or Javier can remember (I think one of them brought the
issue up). 

> 
> 12. A possible bug is the following situation: when a worker merges
> some tasks into the queue, if one of them throws an exception,
> signaling that it is not valid, then the whole merge process is
> stopped and the other tasks are lost. I can go into more detail on
> this if needed. 

You mean the ones coming after the one that threw the exception, yeah,
but that is not a bug. e.g. the MaxDepthTaskValidator have to reject
every task if max depth is reached. The calling unit has to catch
InvalidTaskException and implement the business logic to react to it
(e.g. saving the task to disk and later on display them to the user).


> 
> 13. Is there any set of rules the project follows? I'm thinking of
> things like a set of Sonar rules - an xml file containing the rules
> and priorities. 

You mean http://www.sonarsource.org/features/, no not that I am aware
of.

> 
> 14. There is no null checking in many parts of the code. Is that
> because these checks should happen in client code, or is it something
> that will be present in a future release?
> 

It is hard to say in general, I guess some should happen in client code
other maybe better implemented as a prior check.

> 
> OK, these are just some quick notes and questions I gathered and I
> wanted to put them out there. Some may not make sense, hopefully not
> too many. As for the implementation proposals, I have worked on them
> and can submit patches to be integrated into the project when needed.
> I hope this is the proper way of putting this kinds of questions

Perfect. :)

...and yes we are always welcoming patches and new helping hands.

salu2

>  forward. 
> Thanks, 
> Eugen. 
> 

-- 
Thorsten Scherler <thorsten.at.apache.org>
Open Source Java <consulting, training and solutions>

Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to