Hi s'marks,

You probably don't need to absolutify paths. And you can easily handle multiple args.

(just for fun!) Checks for javadoc comment; handles popular html entities; handles multiple lines; handles both tt and code:

#!/bin/bash
find "$@" -name '*.java' | \
  xargs -r perl -p0777i -e \
  'do {} while s~^ *\*.*\K<(tt|code)>((?:[^<>{}\&\@]|&(?:lt|gt|amp);)*)</\1>~$_=$2; s/&lt;/</g; s/&gt;/>/g; s/&amp;/&/g; "{\@code $_}"~mgie'

On Wed, Sep 30, 2015 at 6:16 PM, Stuart Marks <stuart.ma...@oracle.com> wrote:

> Hi Alexander, Martin,
>
> The challenge of Perl file slurping and Emacs one-liners was too much to
> bear.
>
> This is Java, so one-liners are hardly possible. Still, there are a bunch
> of improvements that can be made to the Java version. (OK, and I'm showing
> off a bit.)
>
> Take a look at this:
>
> http://cr.openjdk.java.net/~smarks/misc/SimpleTagEditorSmarks1.java
>
> I haven't studied the output exhaustively, but it seems to do a reasonably
> good job for the common cases. I ran it over java.lang and I noticed a few
> cases where there is markup embedded within <code></code> text, which
> should be looked at more closely.
>
> I don't particularly care if you use my version, but there are some
> techniques that I'd strongly recommend that you consider using in any such
> tool. In particular:
>
> - Pattern.DOTALL to do multi-line matches
> - Pattern.CASE_INSENSITIVE
> - try-with-resources to ensure that files are closed properly
> - NIO instead of old java.io APIs, particularly Files.walk() and streams
> - use Scanner to deal with input file buffering
> - Scanner's stream support (I recently added this to JDK 9)
>
> Enjoy,
>
> s'marks
>
>
> On 9/29/15 2:23 PM, Martin Buchholz wrote:
>
>> Hi Alexander,
>>
>> your change looks good. It's OK to have manual corrections for automated
>> mega-changes like this, as long as they all revert changes.
>>
>> Random comments:
>>
>> Should you publish your specdiff? I guess not - it would be empty!
>>
>> while((s = br.readLine()) != null) {
>>
>> by matching only one line at a time, you lose the ability to make
>> replacements that span lines. Perlers like to "slurp" in the entire file
>> as a single string.
>>
>> s = s.replace( "<CODE>", tag1);
>> s = s.replace( "<Code>", tag1);
>> s = s.replace("</CODE>", tag2);
>> s = s.replace("</Code>", tag2);
>>
>> Why not use case-insensitive regex?
>>
>> Here's an emacs-lisp one-liner I've been known to use:
>>
>> (defun tt-code ()
>>   (interactive)
>>   (query-replace-regexp "<\\(tt\\|code\\)>\\([^&<>\\\\]+\\)</\\1>"
>>                         "{@code \\2}"))
>>
>> With more work, one can automate transformation of embedded things like
>> &lt;
>>
>> But of course, it's not even possible to transform ALL uses of <code> to
>> {@code, if there was imaginative use of nested html tags.
>>
>>
>> On Tue, Sep 29, 2015 at 3:21 AM, Alexander Stepanov <
>> alexander.v.stepa...@oracle.com> wrote:
>>
>>> Updated: a few manual corrections were made (as @linkplain tags display
>>> nested {@code } literally):
>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch
>>> - checked with specdiff (which of course does not cover documentation for
>>> internal packages), no unexpected diffs detected.
>>>
>>> Regards,
>>> Alexander
>>>
>>>
>>> On 9/27/2015 4:52 PM, Alexander Stepanov wrote:
>>>
>>>> Hello Martin,
>>>>
>>>> Here is a simple app to replace <code></code> tags with the new-style
>>>> {@code } one (which is definitely not as elegant as the Perl
>>>> one-liners):
>>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/SimpleTagEditor.java
>>>>
>>>> Corresponding patch for jdk and replacement log (~62k of the tag
>>>> changes):
>>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/jdk.patch
>>>> http://cr.openjdk.java.net/~avstepan/tmp/codeTags/replace.log
>>>> (sorry, I still have to check the correctness of the patch with specdiff,
>>>> so this is rather a demo at the moment).
>>>>
>>>> Don't know if these changes (cosmetic by nature) are desired for now or
>>>> not. Moreover, probably some part of them should go to another repo
>>>> (e.g., awt, swing -> "client" instead of "dev").
>>>>
>>>> Regards,
>>>> Alexander
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: alexander.v.stepa...@oracle.com
>>>> To: marti...@google.com
>>>> Cc: core-libs-dev@openjdk.java.net
>>>> Sent: Thursday, September 24, 2015 16:06:56 GMT +03:00 Moscow,
>>>> St. Petersburg, Volgograd
>>>> Subject: Re: RFR [9] 8133651: replace some <tt> tags (obsolete in html5) in
>>>> core-libs docs
>>>>
>>>> Hello Martin,
>>>>
>>>> Thank you for the review and for the notes!
>>>>
>>>> > I'm biased of course, but I like the approach I took with
>>>> > blessed-modifier-order:
>>>> > - make the change completely automated
>>>> > - leave "human editing" for a separate change
>>>> > - publish the code used to make the automated change (in my case,
>>>> > typically a perl one-liner)
>>>>
>>>> Automated replacement has an obvious advantage: it is fast and massive.
>>>> But there are some disadvantages at the same time (just IMHO).
>>>>
>>>> Using a script it is quite easy to miss some not very trivial cases, e.g.:
>>>> - remove unnecessary linebreaks, like
>>>> * <tt>someCode
>>>> * </tt>
>>>> (which would be better to replace with single-line {@code someCode});
>>>> - joining of successive terms, like "<tt>ONE</tt>, <tt>TWO</tt>,
>>>> <tt>THREE</tt>" -> "{@code ONE, TWO, THREE}";
>>>> - errors like extra or missing "<" or ">": * <tt>Collection
>>>> <T></tt>", - there were a lot of them;
>>>> - some cases when <tt></tt> should be replaced with <code></code>, not
>>>> {@code } (e.g. because of unicode characters inside of code etc.);
>>>> - extra tags inside of <tt> or <code> which should be moved outside of
>>>> {@code }, like <tt><i>someCode</i></tt> or <tt><b>someCode</b></tt>;
>>>> - simple removing of needless tags, like "<tt>{@link ...}</tt>" ->
>>>> "{@link ...}";
>>>> - replace HTML codes with symbols ('<', '>', '@', ...)
>>>> - etc.
>>>> - plus some other formatting changes and fixes for misprints which would
>>>> be omitted during the automated replacement (and wouldn't be done in
>>>> future manually because there is no motivation for repeated processing).
>>>>
>>>> So sometimes it may be difficult to say where the border is between
>>>> "trivial" and "human-editing" cases (and the portion of "non-trivial
>>>> cases" is definitely not minor); moreover, even the automated
>>>> replacement requires subsequent careful review before publishing of
>>>> the webrev (as well as by reviewers who probably wouldn't be happy to review
>>>> hundreds of files at the same time) and iterative checks/corrections.
>>>> specdiff is very useful for this task but also cannot fully cover the
>>>> diffs (as some changes are situated in the internal com/... sun/...
>>>> packages).
>>>>
>>>> Moreover, I'm sure that some reviewers would be annoyed with the fact
>>>> that some (quite simple) changes were postponed because they are "not
>>>> too trivial to be fixed just now" (because they will suspect they would
>>>> be postponed forever). So the patch creator would (probably) receive
>>>> some advice during the review like "please also fix this and that"
>>>> (which is normal, of course).
>>>>
>>>> So my preference was to make the changes package by package (in some
>>>> reasonable amount of files) not postponing part of the changes for the
>>>> future (sorry for these boring repeating review requests). Please note
>>>> that all the above mentioned is *rather explanation of my motivation
>>>> than objection* :) (and of course I used some text editor replace
>>>> automation which is surely not so advanced as Perl).
>>>>
>>>> > It's probably correct, but I would have left it out of this change
>>>> Yes, I see. Reverted (please update the web page):
>>>> http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/index.html
>>>>
>>>> Thanks,
>>>> Alexander
>>>>
>>>> P.S. The <tt> replacement job is mostly (I guess, ~80%) complete. But
>>>> probably this approach should be used if some similar replacement task
>>>> for, e.g., <code></code> tags would be planned in future (there are
>>>> thousands of them).
>>>>
>>>>
>>>> On 9/24/2015 6:10 AM, Martin Buchholz wrote:
>>>>
>>>>> On Sat, Sep 19, 2015 at 6:58 AM, Alexander Stepanov
>>>>> <alexander.v.stepa...@oracle.com> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> Could you please review the following fix
>>>>> http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/
>>>>> http://cr.openjdk.java.net/~avstepan/8133651/jaxws.00/index.html
>>>>> for
>>>>> https://bugs.openjdk.java.net/browse/JDK-8133651
>>>>>
>>>>> Just another portion of deprecated <tt> (and <xmp>) tags replaced
>>>>> with {@code }. Some misprints were also fixed.
>>>>>
>>>>>
>>>>> I'm biased of course, but I like the approach I took with
>>>>> blessed-modifier-order:
>>>>> - make the change completely automated
>>>>> - leave "human editing" for a separate change
>>>>> - publish the code used to make the automated change (in my case,
>>>>> typically a perl one-liner)
>>>>>
>>>>>
>>>>> The following (expected) changes were detected by specdiff:
>>>>> - removed needless dashes in java.util.Locale,
>>>>> - removed needless curly brace in
>>>>> xml.bind.annotation.XmlElementRef
>>>>>
>>>>>
>>>>> I would do a separate automated "removed needless dashes" changeset.
>>>>>
>>>>>
>>>>> Please let me know if the following changes are desirable or not:
>>>>>
>>>>> http://cr.openjdk.java.net/~avstepan/8133651/jdk.00/src/jdk.jconsole/share/classes/sun/tools/jconsole/Formatter.java.udiff.html
>>>>>
>>>>>
>>>>> This is an actual change to the behavior of this code - the
>>>>> maintainers of jconsole need to approve it. It's probably correct,
>>>>> but I would have left it out of this change. If you remove it, then I
>>>>> approve this change.
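
For anyone following the thread, the suggestions above (slurp the whole file so a match can span lines, match the tags case-insensitively, decode the common entities, walk the tree with NIO and try-with-resources) might be combined in Java roughly as below. This is only a minimal sketch under stated assumptions, not the code from SimpleTagEditor.java or SimpleTagEditorSmarks1.java: the class name CodeTagSketch and its methods are hypothetical, the regex is a loose port of the Perl one-liner (Java has no \K, so the line prefix is captured and re-emitted), and Files.readString/writeString assume Java 11 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical sketch; names are illustrative, not taken from the tools in this thread.
final class CodeTagSketch {
    // Group 1: start of a javadoc line (" * ...") up to the tag; group 2: tt or code;
    // group 3: tag body with no markup, braces or '@', apart from &lt; &gt; &amp;.
    private static final Pattern TAG = Pattern.compile(
            "(^ *\\*.*?)<(tt|code)>((?:[^<>{}&@]|&(?:lt|gt|amp);)*)</\\2>",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Decode the three entities the pattern lets through; &amp; goes last.
    static String decode(String body) {
        return body.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    // Rewrite the text of a whole file. Because the pattern is anchored at the
    // start of a line, one pass converts at most one tag per line, so the
    // replacement is repeated until the text stops changing.
    static String rewrite(String source) {
        String previous;
        String current = source;
        do {
            previous = current;
            current = TAG.matcher(previous).replaceAll(m -> Matcher.quoteReplacement(
                    m.group(1) + "{@code " + decode(m.group(3)) + "}"));
        } while (!current.equals(previous));
        return current;
    }

    // Walks each argument and rewrites *.java files in place.
    public static void main(String[] args) throws IOException {
        for (String root : args) {
            // Files.walk holds open directory handles, hence try-with-resources.
            try (Stream<Path> paths = Files.walk(Paths.get(root))) {
                paths.filter(Files::isRegularFile)
                     .filter(p -> p.toString().endsWith(".java"))
                     .forEach(CodeTagSketch::rewriteFile);
            }
        }
    }

    private static void rewriteFile(Path file) {
        try {
            String before = Files.readString(file);   // slurp the entire file
            String after = rewrite(before);
            if (!after.equals(before)) {
                Files.writeString(file, after);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch uses Pattern.MULTILINE rather than Pattern.DOTALL: the negated character class already matches line terminators, so a tag body may span lines, while "." in the prefix stays on one line. Tags that contain nested markup are left untouched for manual editing, as discussed above.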