On the 0x216 day of Apache Harmony Alexei Fedotov wrote:
> Egor,
> 
> Thank you for your interest. 

We definitely need to improve our documentation. Necessity is not a
real interest :)

> Here is an algorithm:
> 
> 1. Create a list of words from HTML files.
> 2. Merge a dictionary of all words used in documentation.
> 3. Remove the most frequently used half of the words from the dictionary;
> I believe they do not add much meaning.
> 4. Remove misspelled words (including identifiers) from the dictionary.
> 5. Give a page +1 for each rare, correctly spelled word according to
> the dictionary.
> 6. Divide by the total number of words on the page.
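A minimal Python sketch of the six steps above, under a few assumptions.
Pages are given as a {filename: html} mapping. `is_spelled_correctly` is a
hypothetical stand-in for whatever spell checker and identifier filter
step 4 uses. Step 5 is read as counting each distinct rare word once per
page. None of these names come from the actual script.

```python
import re
from collections import Counter
from html.parser import HTMLParser

WORD_RE = re.compile(r"[A-Za-z]+")


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML page; tags are dropped
    (script/style bodies are not filtered out)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_words(html):
    """Step 1: lowercase words from the text of one HTML page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return [w.lower() for w in WORD_RE.findall(" ".join(parser.chunks))]


def build_rare_dictionary(pages, is_spelled_correctly):
    """Steps 2-4: merge all words, drop the most frequent half, drop misspellings.

    is_spelled_correctly is a hypothetical callable (word -> bool).
    """
    counts = Counter()
    for html in pages.values():
        counts.update(html_words(html))
    # most_common() orders words by descending frequency.
    ranked = [word for word, _ in counts.most_common()]
    rare = ranked[len(ranked) // 2:]
    return {word for word in rare if is_spelled_correctly(word)}


def page_score(html, rare_words):
    """Steps 5-6: distinct rare words on the page divided by its word count."""
    words = html_words(html)
    if not words:
        return 0.0
    hits = {word for word in words if word in rare_words}
    return len(hits) / len(words)
```

Counting distinct words rather than every occurrence is a judgment call;
switching to occurrences would only change the `hits` line.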

Hm, a strange heuristic. It rewards pages with more unique, correctly
spelled words, but that gives no clue about the overall quality of the
documentation, which is rather confusing.

I was thinking of something more natural: the number of documented items
vs. the number of undocumented ones, plus a penalty based on the relative
number of misspelled words.
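Read as a formula, that alternative could be
score = documented / (documented + undocumented) - w * misspelled / total_words.
The sketch below assumes that shape. The weight `w` (`penalty_weight`)
and the `DocStats` record are hypothetical, and how the counts are
gathered is left open.

```python
from dataclasses import dataclass


@dataclass
class DocStats:
    documented: int    # items (classes, methods, ...) that have documentation
    undocumented: int  # items that have none
    words: int         # total words in the documentation text
    misspelled: int    # words rejected by the spell checker


def coverage_score(stats, penalty_weight=1.0):
    """Documented-item ratio minus a weighted misspelling ratio (hypothetical weight)."""
    items = stats.documented + stats.undocumented
    coverage = stats.documented / items if items else 0.0
    misspelled_ratio = stats.misspelled / stats.words if stats.words else 0.0
    return coverage - penalty_weight * misspelled_ratio
```

For example, 8 documented and 2 undocumented items with 10 misspellings
in 200 words would score 0.8 - 0.05 = 0.75 with the default weight.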

> I've collected nice RFEs from your letter. Most of them make me think
> and I like them.
> a. Update an ASF block comment
> b. Improve readability. Some things are really easy, like removing
> awk and rewriting most things in perl. Others are a bit more complex:
> I targeted script performance when I created the auto-generated perl
> script. Also, the initial algorithm was a bit more complex: different
> words had a different cost based on their popularity.
> c. Use the JUnit test output format to integrate with
> http://harmonytest.org. I believe I need a feature request for that
> site as well: we need some way to import performance-like rankings to
> the site.

Yes, I thought of an RFE for harmonytest. At the very least, put the doc
items on a separate page from the build items.
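For item c, one possible shape for the output is the common JUnit XML
report layout: a `testsuite` element with `testcase` children and an
optional `failure` child, as written by Ant's `junit` task. The sketch
assumes each doc page becomes one test case and that a page "fails"
below a hypothetical score threshold. What harmonytest.org would actually
accept for performance-like rankings is not known here.

```python
import xml.etree.ElementTree as ET


def scores_to_junit_xml(scores, threshold=0.5, suite_name="doc-quality"):
    """scores: {page_name: score}. Pages below the (hypothetical) threshold fail."""
    suite = ET.Element("testsuite", name=suite_name,
                       tests=str(len(scores)), errors="0")
    failures = 0
    for page, score in sorted(scores.items()):
        case = ET.SubElement(suite, "testcase", classname=suite_name,
                             name=page, time="0")
        if score < threshold:
            failures += 1
            ET.SubElement(case, "failure",
                          message=f"score {score:.3f} < {threshold}")
    suite.set("failures", str(failures))
    return ET.tostring(suite, encoding="unicode")
```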

> d. I will think about parsing sources. But I don't think we need to
> maintain both scripts. The general rule is simple: improve your .h
> and .java files; .cpp files don't count. I'd suggest linking .html
> files to contributors instead.

Can you compute a list of the relevant filenames from a doc page? Give
each filename +1 for each documented item and -1 for each undocumented
one, then divide by the number of items. Is that easy to implement?
Maybe Doxygen has some features to help with this?
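A sketch of that per-file score, assuming the doc page has already been
reduced to (source filename, is_documented) pairs, one per item, and that
"the number of items" means the items belonging to that file. Extracting
those pairs (perhaps from Doxygen's XML output, if it proves usable) is
not shown. Each score falls in [-1, 1]: +1 if every item of the file is
documented, -1 if none is.

```python
from collections import defaultdict


def file_scores(items):
    """items: iterable of (filename, is_documented) pairs, one per documented item.

    Returns {filename: (documented - undocumented) / items_in_file}.
    """
    totals = defaultdict(int)
    balance = defaultdict(int)
    for filename, documented in items:
        totals[filename] += 1
        balance[filename] += 1 if documented else -1
    return {name: balance[name] / totals[name] for name in totals}
```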

> Thank you for the ideas. I will certainly update the script. I just want
> to wait a bit; many scripts die just because people are not
> interested in running them a second time. Also, if anyone suggests any
> changes to the algorithm or any other RFEs, I want to implement them all
> at once.
> 
> Nadya, could you please point us to a good documentation file so we can
> use it as a pattern?

-- 
Egor Pasko
