RAT: IHeaderMatcher Design

Robert Burrell Donkin Fri, 12 Jul 2013 12:27:34 -0700

Rat spends a lot of effort parsing textual documents, looking for headers and boilerplate text. There's an extension point (of sorts) for the searches that can be performed, provided by IHeaderMatcher[1].

This interface has a few TODOs in it. It's used by pushing the text in one line at a time, after doing some pre-processing. As the TODO indicates, this may not be the most elegant design.
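To make the calling pattern concrete, here is a minimal sketch of a line-at-a-time driver. It is not Rat's actual code: LineMatcher is a hypothetical, simplified stand-in for IHeaderMatcher (the Document parameter and RatHeaderAnalysisException are left out), and PhraseMatcher, LineDriver and the normalize step are assumptions made up for illustration.

```java
// Hypothetical, simplified stand-in for IHeaderMatcher: the Document
// parameter and RatHeaderAnalysisException are omitted for brevity.
interface LineMatcher {
    void reset();
    boolean match(String line);
}

// Hypothetical matcher: accumulates pre-processed lines and reports a
// match once the accumulated text contains the wanted phrase.
final class PhraseMatcher implements LineMatcher {
    private final String phrase;
    private final StringBuilder seen = new StringBuilder();

    PhraseMatcher(String phrase) {
        this.phrase = normalize(phrase);
    }

    // Assumed pre-processing: keep letters and digits only, lower-cased,
    // so comment markers and line breaks do not get in the way.
    static String normalize(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                out.append(Character.toLowerCase(c));
            }
        }
        return out.toString();
    }

    @Override
    public void reset() {
        seen.setLength(0);
    }

    @Override
    public boolean match(String line) {
        seen.append(normalize(line));
        return seen.indexOf(phrase) >= 0;
    }
}

final class LineDriver {
    // Pushes the text into the matcher one line at a time and stops at
    // the first match.
    static boolean scan(String text, LineMatcher matcher) {
        matcher.reset();
        for (String line : text.split("\\R")) {
            if (matcher.match(line)) {
                return true;
            }
        }
        return false;
    }
}
```

Note that the matcher has to keep its own buffer and do its own normalising, so part of the parsing lives inside each implementation.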

As an extension point, IHeaderMatcher has the advantage of flexibility. It would be possible to plug in radically different implementations. It turns out, though, that few clever new implementations have emerged. All the implementations seem to do is check for license headers.

One disadvantage of this arrangement is that it pushes some of the parsing outwards toward supposedly pluggable implementations. This means that adding a new license means adding a partial parser.

I wonder whether it might be more intuitive (as well as opening potential for faster parsing) to use immutable domain objects for licenses and so on, making them data rather than processors.
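A rough sketch of what "licenses as data" might look like, purely as an assumption to anchor the discussion: LicenseInfo and HeaderScanner are hypothetical names, not existing Rat classes, and the substring test stands in for whatever matching strategy would really be used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical immutable value object: a license is described by data only.
final class LicenseInfo {
    private final String id;
    private final String marker; // characteristic text, stored normalised

    LicenseInfo(String id, String markerText) {
        this.id = id;
        this.marker = HeaderScanner.normalize(markerText);
    }

    String id() {
        return id;
    }

    String marker() {
        return marker;
    }
}

// Hypothetical scanner: the only place where text is parsed.
final class HeaderScanner {
    private final List<LicenseInfo> licenses;

    HeaderScanner(List<LicenseInfo> licenses) {
        // Defensive copy so the scanner itself stays immutable.
        this.licenses = Collections.unmodifiableList(new ArrayList<>(licenses));
    }

    // Assumed pre-processing: letters and digits only, lower-cased.
    static String normalize(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                out.append(Character.toLowerCase(c));
            }
        }
        return out.toString();
    }

    // Normalises the header once, then compares it against every license.
    Optional<LicenseInfo> identify(String header) {
        String text = normalize(header);
        for (LicenseInfo license : licenses) {
            if (text.contains(license.marker())) {
                return Optional.of(license);
            }
        }
        return Optional.empty();
    }
}
```

With this shape, adding a license would mean adding one more LicenseInfo entry rather than another partial parser, and the header text is normalised once per document instead of once per matcher.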


Opinions...? Alternatives...?

Robert

[1]
/**
* Resets this matcher.
* Subsequent calls to {@link #match} will accumulate new text.
*/
public void reset();

/**
* Matches the text accumulated to licenses.
* TODO probably a poor design choice - hope to fix later
* @param subject TODO
* @param line next line of text, not null
* @return TODO
*/

public boolean match(Document subject, String line) throws RatHeaderAnalysisException;


http://svn.apache.org/viewvc/creadur/rat/trunk/apache-rat-core/src/main/java/org/apache/rat/analysis/IHeaderMatcher.java?revision=1396305&view=markup
