Re: RFR [15] 8239487: Better links generation for system properties found in HTML files

Jonathan Gibbons Mon, 09 Mar 2020 19:12:24 -0700

Meta comment:

This review is getting out of hand; I think it was a mistake for me tohave ordered my comments according to your phases, and I apologize; Iwon't do it again.

---

Ther's some naming stuff and minor feedback, but I think you're gettingclose.


-- Jon


On 03/09/2020 10:48 AM, Pavel Rappo wrote:

Jon, the replies are inline and the updated webrev is here:

   http://cr.openjdk.java.net/~prappo/8239487/webrev.01/

On 7 Mar 2020, at 01:53, Jonathan Gibbons <[email protected]> wrote:

phase-1

In AbstractIndexWriter,
the use of "anyOf" with a singleton list seems a bit weird.

I've changed methods' names to "ofCategories" and "containsOfCategories".
I didn't change them to just "of" and "contains" because of the possibility
of having a richer selection of those in the near future. For instance,
ofModule(...), ofPackage(...), havingFirstLetter(...), etc.

"containsOfCategories" doesn't make grammatical sense. Maybe you shouldgo back to "anyOf" naming, which allows for the possibility of"allOf". It's just that both containsAnyOf and containsAllOf collapseto just 'contains' for a single argument.

In SearchIndexItem:
The word "index" is heavily overloaded in javadoc. Would it be better in this case to use 
"INDEX_TAG"? (Just asking).
If nothing else, add doc-comments on the members of Category.

I've added comments instead of renaming that constant.

OK, at least for now. ( I may change my mind in the future :-) )

Folding "boolean systemProperty" into the Category seems like a good idea.

In SearchIndexItems:   ugh for having to put the apiNote after the scary 
comment. Nothing we can do, except maybe one day think about removing the scary 
comments, which have maybe outlived their usefulness. Maybe we could limit them 
to package-info.java fies.

  120      * <p> The returned stream consists of all items {@code i} for which 
there's
It's often considered better to avoid contractions (like "there's") in formal 
writing.

  120      * <p> The returned stream consists of all items {@code i} for which 
there's
  121      * a category {@code c}, from the specified categories, such that
  122      * {@code i.getCategory().equals(c)}. The stream may be empty, if 
there are
  123      * no such items.
The commas are questionable around "from the specified categories". Don't use 
commas when the enclosed text is important to the understanding of the sentence.

Grammar has been fixed. Thanks.

`concat` seems overkill.  For an internal method, I'd simplify the 
anyOfCategories, and either use overloads or a single varargs and just verify 
the length is not zero.  By inspection, it seems to only ever use 1 or 2 args. 
Use overloads!

I still think this method has a value, and I do hope to reuse it really soon.
The reason is, I hold compile-time checks in high regard.

It's not the argument pattern (first, rest) as much as theimplementation. I'm unconvinced of the need to detect duplicates, andconcat is the wrong name if you do want to suppress duplicates.


Quick search [*] through the codebase shows there are other places where this
pattern is used. Many of those ultimately push arguments down to either of

     java.util.EnumSet.of(E first, E... rest)
     java.nio.file.FileSystem.getPath(String first, String... more)

with a couple of exceptions. For example, these methods delegate to neither
of the above:

     java.net.http.WebSocket.Builder.subprotocols(String mostPreferred,
                                                  String... lesserPreferred)
     com.sun.tools.javac.util.List.of(A x1, A x2, A x3, A... rest)

Among the things I found are these:

     javax.tools.StandardJavaFileManager.PathFactory.getPath
     (ahem) jdk.javadoc.internal.doclets.formats.html.markup.HtmlTree.UL

If you think of alternatives like overloads, as you suggested, or plain vararg,

they have their own problems. Both will contain roughly the same code addressing
duplicates and nulls, albeit a bit more simple. Simple, but not trivial. If so,
we might as well provide this code once. It's not a performance-sensitive
place by any means.

For overloads like

     ofCategories(Category category)
     ofCategories(Category firstCategory, Category secondCategory)
     ofCategories(Category firstCategory, Category secondCategory, Category 
thirdCategory)
     ...

the former method also has a naming problem: should "category" be plural or singular?


After I posted the initial webrev, I realized that there's a potential problem
(but not yet a bug) in that `concat` method of mine. Namely, ordering. Set.of()
is ordering-unfriendly by design. So a stream from that set would be of an
unknown encounter order. I fixed in a straightforward way:

     @SafeVarargs
     @SuppressWarnings("varargs")
     private static <T> Stream<T> concat(T required, T... optional) {
         Set<T> set = new LinkedHashSet<>(optional.length + 1, 1.0f);
         set.add(Objects.requireNonNull(required));
         for (T t : optional) {
             if (!set.add(Objects.requireNonNull(t)))
                 throw new IllegalArgumentException();
         }
         return set.stream();
     }

phase-2

AbstractIndexWriter
line 511,512: horrible non-standard formatting: was this suggested by an IDE?

No, it was done by hand. Default IDE's style is not to blame. Fixed.

keyCharacter: when is this method called with an empty string?  It looks like 
it is related to the SearchIndexItem having en empty label ... perhaps we 
should check/fix the problem there.
Be careful of using Character.toUpperCase/toLowerCase ... they are 
locale-dependent.
That being said, maybe we do want the locale-dependent version here, which is 
unusual.

One more of these things can be found here:

     jdk.javadoc.internal.doclets.toolkit.util.IndexBuilder#keyCharacter

This is all preexisting code. That said, we should definitely investigate it.
Do you want me to do it in the context of this task though?

No

While it is reasonable to get the first character of a Java identifier, I 
wonder if it is reasonable
to get the first character of an arbitrary string, i.e. from an {@index} tag.

various: it's not wrong, but I personally don't like the style of importing 
static methods, like groupingBy and toList. Keep it if you want, but if I were 
writing the code, I would use Collectors.methodName

I share your dislike in this case. I guess I'm still trying to cram code into
80-character-long lines. Old habits die hard.

The convention for this code is more like 100-characters or so. The daysof punched cards and video display units with 80x24 character screensare long gone.

phase-3

HtmlDocletWriter: this class is one instance per page ... do you really mean to 
put it here, as compared to Configuration (more global) or some more specific 
kind of page?  (I see it moved in phase 4!)

SystemPropertiesWriter:80
I don't know if this is the right or wrong place for the check, but for reference I would 
compare with other "optional" pages, like Constant Values and Serialized Form.

If in doubt, at least be consistent with what there is at the moment. That makes
sense. As far as I understand you are pointing me to the "builders" 
infrastructure,
right?

SystemPropertyWriter has never had a builder. If we are to use it, may I suggest
we address this in a follow-up issue?

Uugh, builders, uugh. That's a big can of worms that we don't want toopen right now.

SystemPropertiesWriter:155
I like the intention, but I would have thought the link factory would have 
returned a link in code font. I guess I would check other places where links 
are generated.

Jon, I found

     jdk.javadoc.internal.doclets.formats.html.HtmlDocletWriter#plainOrCode

and a family of overloaded createLink methods consuming some enigmatic

     @param strong     whether to wrap the {@code label} in a SPAN element

Is there anything in particular I should be looking for?

I guess I was thinking to go look at other places where references tojava elements are created. If you chase down code starting fromAbstractMemberWriter.Signature, via methods like 'addParam', you end upat HtmlDocletWriter.getLink, which you can call with a suitablyconfigured LinkInfoInfo.

TagletWriterImpl:457
The comment is unclear: does "those" refer back to DocFileELement?

I've refined that comment.

I'm also curious why you think using identity equals and hashCode is 
inefficient.

It's not that those methods are inefficient, it's that this *cache* of
DocletElement instances *might* be inefficient. The worst-case scenario is where
titles for the same HTML document represented by different DocletElement objects
are repeatedly calculated and stored. The comment is just about that. I was lazy
and didn't investigate where those instances come from, I also saw that those
types do not override equals & hashCode, hence this comment.

OK, so I guess you're concerned about space-inefficient, nottime-inefficient

because identity .equals and .hashCode are fast ;-)

Utils:  uuugh ... I keep trying to get stuff *out* of the Utils dumping ground, 
and here you are adding to it. This is OK for now, but there may be a different 
utility class waiting to be written for processing DocTrees. This is not the 
only method to be working on DocTrees (although other work may be elsewhere, 
down in jdk.compiler module.)

Please propose a more appropriate place.

It's OK for now.

test: the new convention is to generate files on the fly ... which will be even 
easier when we have text blocks. The ToolBox library class has code to help 
write out smal files like these ... and using text strings avoids the heavy 
copyright overhead.

Not that I'm opposing it, just asking: are there any other benefits or using
on-the-fly file generation beyond simply avoiding "the heavy copyright overhead"
in this case?

In all cases where I've had to create such sets of files, I find it'sgenerally easier to visualize the overall pattern of the files when theyare co-located.


Both of the said techniques have their advantages and disadvantages. On-the-fly
generation is convenient for example, when the output is small or structurally
flat, or when it is parametric. Here it is neither. Pre-baked files can benefit
from IDE support, which is quite convenient. I wouldn't like to give it up this
lightly.

OK.

phase-4

In SystemPropertiesWriter, there is no need to use a WeakHashMap compared to a 
standard HashMap.

Right. Using WeakHashMap is more of a declaration: it's cache, the whole cache,
and nothing but the cache. I like this extra bit of information, but I can swap
it to HashMap if you want.

It's just an unnecessary overhead to be creating all those weak references.
But, your point is taken.

Elements created by javac, which is the dominant collection of elements, will 
be permanent until the end of the javadoc run. And moreover, 
SystemPropertiesWriter should have a short lifetime, and no elements will be 
garbage-collectible while it is alive.

Right. (For the sake of the argument, those DocletElement instances are
created by javadoc.)

Line 172: I see the comment moved here from TagletWriterImpl. Same comments 
apply that I made for phase-3.

In SearchIndexItem, if I understand it correctly, it seems strange to add a 
field of type DocletElement that is specific to certain types of search index 
item.  And, there is a cognizant disparity between the name of the type 
(DocletElement) and the name of the method (getElement). Generally, I would 
expect a method named `getElement` to return an Element.

Understood and fixed.

It seems like all other serach-index items are associated with n element anyway!

Not sure what can be done here. The signature of `getElement` doesn't require
any argument or `int` value for that matter, which could be mistaken for
some sort of index/position such as in List.get(int).

I'm not sure I understand what you're saying here. I was trying to saythat since allsearch index items seems to be abstractly associate with some elementrepresentingthe "target" of the search item, then it was reasonable to have ageneral getter/setterfor the associate element, and that seems to be what you've done,although maybe not

so far as to set it in all cases for all SearchIndexItems.


   ***

What about feedback to the user in case no <title> in the document is found?
Do you have any suggestions?

Which user? the one running javadoc, or the one reading the generated files.
The base name of the file (e.g. net-properties.html)


-Pavel

-------------------------------------------------------------------------------
[*] You can repeat that search using Java-flavored regex:

     (\w+)\s+\w+\s*,\s*\1\s*\.\.\.

There should be more civilized ways of doing that :-) I tried to use the "Search
Structurally" feature of my IDE, but couldn't make it work. The problem is that
it gives me a lot of noise because I haven't found a way to make it tell `[]`
and varargs apart.

One has to be careful when looking for the "(T required, T optional...)"
pattern. Not all the results of the above search have the required semantics.
Some happen to use the same type for semantically different entities. Usually
it's a name and aliases, a base and extensions, a value and args, a delimiter
and elements, an actual and a list of expected, etc. We are specifically
interested in method signatures where these parameters are used to pass a
*uniform* collection of things, but are nevertheless separate, solely for the
sake of friendlier APIs.

Re: RFR [15] 8239487: Better links generation for system properties found in HTML files

Reply via email to