On the 0x216 day of Apache Harmony Alexei Fedotov wrote:
> Egor,
>
> Thank you for your interest.
> We definitely need to improve our documentation.

Necessity is not a real interest :)

> Here is an algorithm:
>
> 1. Create a list of words from HTML files.
> 2. Merge a dictionary of all words used in documentation.
> 3. Remove the most frequently used half of the words from the dictionary
> - I believe they do not add much meaning.
> 4. Remove misspelled words (including identifiers) from the dictionary.
> 5. Give a page +1 for each rare, correctly spelled word according to
> the dictionary.
> 6. Divide by the total number of words on the page.

Hm, strange heuristic. So more unique, correctly spelled words are
beneficial. That does not give a clue about the overall quality of the
documentation, which is rather confusing.

I thought of something more natural: the number of documented items vs.
the number of undocumented ones, plus a penalty for the relative number
of misspelled words.

> I've collected nice RFEs from your letter. Most of them make me think
> and I like them.
> a. Update an ASF block comment
> b. Improve readability. Some things are really easy - like removing
> awk and rewriting most things in perl. Others are a bit more complex -
> I targeted script performance when I created the auto-generated perl
> script. Also, the initial algorithm was a bit more complex - different
> words had a different cost based on their popularity.
> c. Use junit test output format to integrate with
> http://harmonytest.org. I believe I need a feature request for that
> site as well - we need some way to import performance-like rankings to
> the site.

Yes, I thought of the RFE to harmonytest. At the least, put the doc
items on a separate page from the build items.

> d. I will think of parsing sources. But I don't think we need to
> maintain both scripts. The generic rule is simple - improve your .h
> and .java files - .cpp files don't count. I would rather suggest
> linking .html files to contributors.

Can you calculate a list of relevant filenames from a doc page?
Give a filename +1 for each documented item and -1 for each undocumented
one, then divide by the number of items. Is it easy to implement? Maybe
doxygen has some features to assist with this?

> Thank you for ideas. I will certainly update the script. I just want
> to wait a bit - many scripts die just because people are not
> interested in running them a second time. Also, if anyone suggests any
> changes in the algorithm or any other RFEs, I want to implement them
> all at once.
>
> Nadya, could you please point us to a good documentation file so we
> can use it as a pattern?

-- 
Egor Pasko
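
For comparison, here is a minimal sketch of the two scoring heuristics discussed in this message. It is written in Python for brevity and is not the actual perl script. All names (`rare_word_scores`, `documented_score`, `known_words`, `misspelled_ratio`) are hypothetical. It also makes some assumptions: a plain set of known words stands in for a spell checker, every occurrence of a rare word earns +1, and the misspelling penalty is subtracted with a weight of 1.

```python
from collections import Counter


def rare_word_scores(pages, known_words):
    """Score each page by its share of rare, correctly spelled words.

    pages: dict mapping a page name to the list of words extracted from it.
    known_words: set of correctly spelled words (stands in for a spell checker).
    """
    # Steps 1-2: merge one frequency dictionary over all pages.
    freq = Counter(word for words in pages.values() for word in words)

    # Step 3: drop the most frequently used half of the dictionary.
    # Ties are broken alphabetically to keep the result deterministic.
    by_popularity = sorted(freq, key=lambda w: (-freq[w], w))
    rare = set(by_popularity[len(by_popularity) // 2:])

    # Step 4: drop misspelled words (and identifiers) from the dictionary.
    rare &= set(known_words)

    # Steps 5-6: +1 per occurrence of a rare word, divided by page length.
    scores = {}
    for name, words in pages.items():
        hits = sum(1 for word in words if word in rare)
        scores[name] = hits / len(words) if words else 0.0
    return scores


def documented_score(documented, undocumented, misspelled_ratio=0.0):
    """Alternative: +1 per documented item, -1 per undocumented item,
    divided by the number of items, minus a misspelling penalty.

    The penalty weight of 1 is an assumption made for this sketch.
    """
    total = documented + undocumented
    if total == 0:
        return 0.0
    return (documented - undocumented) / total - misspelled_ratio


if __name__ == "__main__":
    pages = {
        "a.html": ["the", "the", "vm", "jit", "teh"],
        "b.html": ["the", "vm", "the", "the"],
    }
    print(rare_word_scores(pages, {"the", "vm", "jit"}))
    print(documented_score(3, 1))
```

The second score ranges from -1 (nothing documented) to +1 (everything documented) before the penalty, so it says directly how much of a file is covered. The first score only says how unusual a page's vocabulary is.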