Re: RFR [15] 8239487: Better links generation for system properties found in HTML files

Pavel Rappo Tue, 10 Mar 2020 09:05:29 -0700

Oops. Here is a link to the forgotten updated webrev:

  http://cr.openjdk.java.net/~prappo/8239487/webrev.02/


> On 10 Mar 2020, at 15:45, Pavel Rappo <[email protected]> wrote:
> 
> The replies are inline.
> 
>> On 10 Mar 2020, at 02:11, Jonathan Gibbons <[email protected]> 
>> wrote:
>> 
>> Meta comment:
>> This review is getting out of hand; I think it was a mistake for me to have 
>> ordered my comments according to your phases, and I apologize; I won't do it 
>> again.
> 
> Sorry about that.
> 
>> ---
>> 
>> Ther's some naming stuff and minor feedback, but I think you're getting 
>> close.
>> 
>> -- Jon
>> 
>> 
>> On 03/09/2020 10:48 AM, Pavel Rappo wrote:
>>> Jon, the replies are inline and the updated webrev is here:
>>> 
>>>  http://cr.openjdk.java.net/~prappo/8239487/webrev.01/
>>> 
>>>> On 7 Mar 2020, at 01:53, Jonathan Gibbons <[email protected]> 
>>>> wrote:
>>>> 
>>>> phase-1
>>>> 
>>>> In AbstractIndexWriter,
>>>> the use of "anyOf" with a singleton list seems a bit weird.
>>> I've changed methods' names to "ofCategories" and "containsOfCategories".
>>> I didn't change them to just "of" and "contains" because of the possibility
>>> of having a richer selection of those in the near future. For instance,
>>> ofModule(...), ofPackage(...), havingFirstLetter(...), etc.
>> 
>> "containsOfCategories" doesn't make grammatical sense. Maybe you should go 
>> back to "anyOf" naming, which allows for the possibility of "allOf".  
> 
> I know that naming is hard. On the one hand, all code, both internal and 
> external,
> should be of the same good quality. On the other hand, it's an internal API
> inaccessible from the outside. How much effort do we want to put into it?
> 
> Here's another perspective. On the one hand, the name should be a good 
> mnemonic
> and ideally should be sufficient to see what's going on without having to 
> reach
> for documentation. On the other hand, it is an idealistic and probably
> unattainable goal. Sigh.
> 
> I think that the "contains" method should have "any" in its name. The reason 
> is
> that I can easily imagine reading the code and being puzzled by "Does this 
> mean
> that true is returned when ALL of the listed categories or ANY of them are
> found?" Methods that return Stream<SearchIndexItem> are different in that
> respect. One glance at Category resolves any ambiguities. Indeed, an item may 
> be
> of only one category. Thus, no item can be of many categories, thus the 
> implicit
> semantics is "any/or" rather than "all/and".
> 
>> It's just that both containsAnyOf and containsAllOf collapse to just 
>> 'contains' for a single argument.
> 
> I can only say that this is not a unique case. 
> java.util.concurrent.CompletableFuture's
> allOf and anyOf come to mind.
> 
>>>> In SearchIndexItem:
>>>> The word "index" is heavily overloaded in javadoc. Would it be better in 
>>>> this case to use "INDEX_TAG"? (Just asking).
>>>> If nothing else, add doc-comments on the members of Category.
>>> I've added comments instead of renaming that constant.
>> OK, at least for now. ( I may change my mind in the future :-) )
>> 
>>> 
>>>> Folding "boolean systemProperty" into the Category seems like a good idea.
>>>> 
>>>> In SearchIndexItems:   ugh for having to put the apiNote after the scary 
>>>> comment. Nothing we can do, except maybe one day think about removing the 
>>>> scary comments, which have maybe outlived their usefulness. Maybe we could 
>>>> limit them to package-info.java fies.
>>>> 
>>>> 120      * <p> The returned stream consists of all items {@code i} for 
>>>> which there's
>>>> It's often considered better to avoid contractions (like "there's") in 
>>>> formal writing.
>>>> 
>>>> 120      * <p> The returned stream consists of all items {@code i} for 
>>>> which there's
>>>> 121      * a category {@code c}, from the specified categories, such that
>>>> 122      * {@code i.getCategory().equals(c)}. The stream may be empty, if 
>>>> there are
>>>> 123      * no such items.
>>>> The commas are questionable around "from the specified categories". Don't 
>>>> use commas when the enclosed text is important to the understanding of the 
>>>> sentence.
>>> Grammar has been fixed. Thanks.
>>> 
>>>> `concat` seems overkill.  For an internal method, I'd simplify the 
>>>> anyOfCategories, and either use overloads or a single varargs and just 
>>>> verify the length is not zero.  By inspection, it seems to only ever use 1 
>>>> or 2 args. Use overloads!
>>> I still think this method has a value, and I do hope to reuse it really 
>>> soon.
>>> The reason is, I hold compile-time checks in high regard.
>> 
>> It's not the argument pattern (first, rest) as much as the implementation. 
>> I'm unconvinced of the need to detect duplicates, and concat is the wrong 
>> name if you do want to suppress duplicates.
> 
> I see. I don't have a strong opinion about nulls and duplicates. However, I'm
> convinced that those extra runtime checks have their value.
> 
> That said, I don't want this thread to rathole into a discussion that has been
> done many times in many places. It really is not that essential in this 
> context.
> So I hope you'll welcome this "skimmed milk" version of the "concat" 
> functionality:
> 
>    /**
>     * Returns a concatenated stream of elements.
>     *
>     * <p> The elements of the returned stream are encountered in the 
> following order:
>     * {@code first, remaining[0], remaining[1], ..., 
> remaining[remaining.length - 1]}.
>     *
>     * @param first
>     *         the first element
>     * @param remaining
>     *         the remaining elements, if any
>     * @param <T>
>     *         the type of elements
>     *
>     * @return the stream of elements
>     *
>     * @throws NullPointerException
>     *         if {@code remaining} is {@code null}
>     */
>    private static <T> Stream<T> concatenatedStreamOf(T first, T[] remaining) {
>        return Stream.concat(Stream.of(first), Stream.of(remaining));
>    }
> 
> Just bare-bones functionality of concatenation for internal consumption.
> Duplicates and nulls are dealt with on the client side, if required.
> 
>>> 
>>> Quick search [*] through the codebase shows there are other places where 
>>> this
>>> pattern is used. Many of those ultimately push arguments down to either of
>>> 
>>>    java.util.EnumSet.of(E first, E... rest)
>>>    java.nio.file.FileSystem.getPath(String first, String... more)
>>> 
>>> with a couple of exceptions. For example, these methods delegate to neither
>>> of the above:
>>> 
>>>    java.net.http.WebSocket.Builder.subprotocols(String mostPreferred,
>>>                                                 String... lesserPreferred)
>>>    com.sun.tools.javac.util.List.of(A x1, A x2, A x3, A... rest)
>>> 
>>> Among the things I found are these:
>>> 
>>>    javax.tools.StandardJavaFileManager.PathFactory.getPath
>>>    (ahem) jdk.javadoc.internal.doclets.formats.html.markup.HtmlTree.UL
>>>    If you think of alternatives like overloads, as you suggested, or plain 
>>> vararg,
>>> they have their own problems. Both will contain roughly the same code 
>>> addressing
>>> duplicates and nulls, albeit a bit more simple. Simple, but not trivial. If 
>>> so,
>>> we might as well provide this code once. It's not a performance-sensitive
>>> place by any means.
>>> 
>>> For overloads like
>>> 
>>>    ofCategories(Category category)
>>>    ofCategories(Category firstCategory, Category secondCategory)
>>>    ofCategories(Category firstCategory, Category secondCategory, Category 
>>> thirdCategory)
>>>    ...
>>>    the former method also has a naming problem: should "category" be plural 
>>> or singular?
>>> 
>>> After I posted the initial webrev, I realized that there's a potential 
>>> problem
>>> (but not yet a bug) in that `concat` method of mine. Namely, ordering. 
>>> Set.of()
>>> is ordering-unfriendly by design. So a stream from that set would be of an
>>> unknown encounter order. I fixed in a straightforward way:
>>> 
>>>    @SafeVarargs
>>>    @SuppressWarnings("varargs")
>>>    private static <T> Stream<T> concat(T required, T... optional) {
>>>        Set<T> set = new LinkedHashSet<>(optional.length + 1, 1.0f);
>>>        set.add(Objects.requireNonNull(required));
>>>        for (T t : optional) {
>>>            if (!set.add(Objects.requireNonNull(t)))
>>>                throw new IllegalArgumentException();
>>>        }
>>>        return set.stream();
>>>    }
>>> 
>>>> phase-2
>>>> 
>>>> AbstractIndexWriter
>>>> line 511,512: horrible non-standard formatting: was this suggested by an 
>>>> IDE?
>>> No, it was done by hand. Default IDE's style is not to blame. Fixed.
>>> 
>>>> keyCharacter: when is this method called with an empty string?  It looks 
>>>> like it is related to the SearchIndexItem having en empty label ... 
>>>> perhaps we should check/fix the problem there.
>>>> Be careful of using Character.toUpperCase/toLowerCase ... they are 
>>>> locale-dependent.
>>>> That being said, maybe we do want the locale-dependent version here, which 
>>>> is unusual.
>>> One more of these things can be found here:
>>> 
>>>    jdk.javadoc.internal.doclets.toolkit.util.IndexBuilder#keyCharacter
>>> 
>>> This is all preexisting code. That said, we should definitely investigate 
>>> it.
>>> Do you want me to do it in the context of this task though?
>> No
>>> 
>>>> While it is reasonable to get the first character of a Java identifier, I 
>>>> wonder if it is reasonable
>>>> to get the first character of an arbitrary string, i.e. from an {@index} 
>>>> tag.
>>>> 
>>>> various: it's not wrong, but I personally don't like the style of 
>>>> importing static methods, like groupingBy and toList. Keep it if you want, 
>>>> but if I were writing the code, I would use Collectors.methodName
>>> I share your dislike in this case. I guess I'm still trying to cram code 
>>> into
>>> 80-character-long lines. Old habits die hard.
>> The convention for this code is more like 100-characters or so. The days of 
>> punched cards and video display units with 80x24 character screens are long 
>> gone.
> 
> ok, from now on I'll try to use that extra 25% for the benefit of the 
> jdk.javadoc codebase.
> 
>>> 
>>>> phase-3
>>>> 
>>>> HtmlDocletWriter: this class is one instance per page ... do you really 
>>>> mean to put it here, as compared to Configuration (more global) or some 
>>>> more specific kind of page? (I see it moved in phase 4!)
>>>> 
>>>> SystemPropertiesWriter:80
>>>> I don't know if this is the right or wrong place for the check, but for 
>>>> reference I would compare with other "optional" pages, like Constant 
>>>> Values and Serialized Form.
>>> If in doubt, at least be consistent with what there is at the moment. That 
>>> makes
>>> sense. As far as I understand you are pointing me to the "builders" 
>>> infrastructure,
>>> right?
>>> 
>>> SystemPropertyWriter has never had a builder. If we are to use it, may I 
>>> suggest
>>> we address this in a follow-up issue?
>> 
>> Uugh, builders, uugh.  That's a big can of worms that we don't want to open 
>> right now.
> 
> Right. I have put an extra comment there, if you don't mind.
> 
>>> 
>>>> SystemPropertiesWriter:155
>>>> I like the intention, but I would have thought the link factory would have 
>>>> returned a link in code font. I guess I would check other places where 
>>>> links are generated.
>>> Jon, I found
>>> 
>>>    jdk.javadoc.internal.doclets.formats.html.HtmlDocletWriter#plainOrCode
>>> 
>>> and a family of overloaded createLink methods consuming some enigmatic
>>> 
>>>    @param strong     whether to wrap the {@code label} in a SPAN element
>>> 
>>> Is there anything in particular I should be looking for?
>> I guess I was thinking to go look at other places where references to java 
>> elements are created. If you chase down code starting from 
>> AbstractMemberWriter.Signature, via methods like 'addParam', you end up at 
>> HtmlDocletWriter.getLink, which you can call with a suitably configured 
>> LinkInfoInfo.
> 
> I'm looking into it and will update you soon. Thanks.
> 
>>>> TagletWriterImpl:457
>>>> The comment is unclear: does "those" refer back to DocFileELement?
>>> I've refined that comment.
>>> 
>>>> I'm also curious why you think using identity equals and hashCode is 
>>>> inefficient.
>>> It's not that those methods are inefficient, it's that this *cache* of
>>> DocletElement instances *might* be inefficient. The worst-case scenario is 
>>> where
>>> titles for the same HTML document represented by different DocletElement 
>>> objects
>>> are repeatedly calculated and stored. The comment is just about that. I was 
>>> lazy
>>> and didn't investigate where those instances come from, I also saw that 
>>> those
>>> types do not override equals & hashCode, hence this comment.
>> OK, so I guess you're concerned about space-inefficient, not time-inefficient
>> because identity .equals and .hashCode are fast ;-)
> 
> Well, kind of. It's when the cache behaves as if it weren't there and the 
> title
> is parsed out of HTML each and every time.
> 
>>>> Utils:  uuugh ... I keep trying to get stuff *out* of the Utils dumping 
>>>> ground, and here you are adding to it. This is OK for now, but there may 
>>>> be a different utility class waiting to be written for processing 
>>>> DocTrees. This is not the only method to be working on DocTrees (although 
>>>> other work may be elsewhere, down in jdk.compiler module.)
>>> Please propose a more appropriate place.
>> It's OK for now.
>> 
>>> 
>>>> test: the new convention is to generate files on the fly ... which will be 
>>>> even easier when we have text blocks. The ToolBox library class has code 
>>>> to help write out smal files like these ... and using text strings avoids 
>>>> the heavy copyright overhead.
>>> Not that I'm opposing it, just asking: are there any other benefits or using
>>> on-the-fly file generation beyond simply avoiding "the heavy copyright 
>>> overhead"
>>> in this case?
>> 
>> In all cases where I've had to create such sets of files, I find it's 
>> generally easier to visualize the overall pattern of the files when they are 
>> co-located.
>> 
>>> 
>>> Both of the said techniques have their advantages and disadvantages. 
>>> On-the-fly
>>> generation is convenient for example, when the output is small or 
>>> structurally
>>> flat, or when it is parametric. Here it is neither. Pre-baked files can 
>>> benefit
>>> from IDE support, which is quite convenient. I wouldn't like to give it up 
>>> this
>>> lightly.
>> OK.
>>> 
>>>> phase-4
>>>> 
>>>> In SystemPropertiesWriter, there is no need to use a WeakHashMap compared 
>>>> to a standard HashMap.
>>> Right. Using WeakHashMap is more of a declaration: it's cache, the whole 
>>> cache,
>>> and nothing but the cache. I like this extra bit of information, but I can 
>>> swap
>>> it to HashMap if you want.
>> It's just an unnecessary overhead to be creating all those weak references.
>> But, your point is taken.
>>> 
>>>> Elements created by javac, which is the dominant collection of elements, 
>>>> will be permanent until the end of the javadoc run. And moreover, 
>>>> SystemPropertiesWriter should have a short lifetime, and no elements will 
>>>> be garbage-collectible while it is alive.
>>> Right. (For the sake of the argument, those DocletElement instances are
>>> created by javadoc.)
>>> 
>>>> Line 172: I see the comment moved here from TagletWriterImpl. Same 
>>>> comments apply that I made for phase-3.
>>>> 
>>>> In SearchIndexItem, if I understand it correctly, it seems strange to add 
>>>> a field of type DocletElement that is specific to certain types of search 
>>>> index item.  And, there is a cognizant disparity between the name of the 
>>>> type (DocletElement) and the name of the method (getElement). Generally, I 
>>>> would expect a method named `getElement` to return an Element.
>>> Understood and fixed.
>>> 
>>>> It seems like all other serach-index items are associated with n element 
>>>> anyway!
>>> Not sure what can be done here. The signature of `getElement` doesn't 
>>> require
>>> any argument or `int` value for that matter, which could be mistaken for
>>> some sort of index/position such as in List.get(int).
>> I'm not sure I understand what you're saying here. I was trying to say that 
>> since all
>> search index items seems to be abstractly associate with some element 
>> representing
>> the "target" of the search item, then it was reasonable to have a general 
>> getter/setter
>> for the associate element, and that seems to be what you've done, although 
>> maybe not
>> so far as to set it in all cases for all SearchIndexItems.
> 
> I see.
> 
>>>  ***
>>> 
>>> What about feedback to the user in case no <title> in the document is found?
>>> Do you have any suggestions?
>> Which user? the one running javadoc, or the one reading the generated files.
> 
> Bob the javadoc-builder. That is, the one running the javadoc tool.
> 
>> The base name of the file (e.g. net-properties.html)
> 
> This is how the fallback is performed in this change.
> 
> -Pavel

Re: RFR [15] 8239487: Better links generation for system properties found in HTML files

Reply via email to