branch: externals/idlwave commit af15d2b4307642051970e7df0b5a372d74436cb9 Author: jdsmith <jdsmith> Commit: jdsmith <jdsmith>
- Initial checkin for IDLv5.6 rinfo scanner --- get_html_rinfo | 1942 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1942 insertions(+) diff --git a/get_html_rinfo b/get_html_rinfo new file mode 100755 index 0000000000..f5cd1779da --- /dev/null +++ b/get_html_rinfo @@ -0,0 +1,1942 @@ +#!/usr/bin/perl +# +# Program to extract the information from the HTML version of the IDL +# manuals (v5.6 and on) and IDL itself, to support IDLWAVE. +# +# (c) 1999, 2000 Carsten Dominik <domi...@astro.uva.nl> +# (c) 2001-2003 J.D. Smith <jdsm...@as.arizona.edu> +# +# Requires the HTML documentation files distributed with IDL 5.6 or +# later, decompiled from HTMLHelp idl.chm with Microsoft's HTML Help +# Workshop. +# +# Talks to the local version of IDL in order to get additional information. +# +# Call this program from the command line like this: +# +# get_html_rinfo -path path/to/htmlfiles/ --idl /path/to/idl/executable +# +# This will scan the HTML, extract routine information, talk to IDL +# and write the following file, needed by IDLWAVE: +# +# idlw-rinfo.el: Routine information for completion etc. + +# The full manpage of this program is available with "perldoc get_html_rinfo". + +# Commentary: +# =========== +# +# IDL currently contains more than 1400 functions, procedures and +# object methods with more than 6500 keywords. In order to support +# writing IDL programs, the IDLWAVE mode uses a list of routines and +# keywords. RSI does not provide such a list in machine-readable +# form. Therefore this program embarks on the task of extracting the +# necessary information from available sources - the IDL manuals which +# are supplied by RSI in HTMLHelp (and thence HTML) format. +# get_html_rinfo works by looking for the "Syntax" sections and +# extracting information from there. +# +# The program extracts information from manuals written by humans (the +# documentation department at RSI). Naturally, such documents are not +# always complete, consistent or free of typos. As a result of this, +# the list of routines and keywords extracted from the manuals will +# not be perfect. In order to account for incorrect or inconsistent +# "Syntax" entries in the manual, get_html_rinfo contains a number of +# special matchers which detect specific entries and corrects them +# manually. See the definition of %specials in the BEGIN block. When +# a new version of IDL is released, the actions of these special +# matchers needs to be checked, because the involved syntax entries +# may have changed. +# +# Please contact the maintainer if you find any inconsistencies between +# the routine information supplied by IDLWAVE, and the IDL documentation. +# +# Full source-guided online help, via the decompiled HTML help files, +# is available for use in many different browsers and help +# systems. See the IDLWAVE documentation for details. +# +# Acknowledgement: +# ================ +# +# Without Perl, the task of reverse-engineering thousands of pages of +# documentation would have been impossible. With Perl, it only takes +# a small (ok, medium-sized) program like this. Thanks to Larry Wall +# and the Perl community. +# +# Thanks to Mark Goosman & Stephanie Staley from RSI for granting me +# permission to extract and distribute routine information from the +# IDL manual, and to Doug Dirks for suggesting the use of decompiled +# HTML help instead of the (infinitely more complicated) PDF help used +# previously, and for patiently fixing the documentation typos this +# routine finds. +# +# Maintainer Information: +# ======================= +# When a new version of IDL is published, the following things might need +# work in order to make the result of this program as good as possible. +# To find the corresponding places in the file, search for "UPDATE". +# +# 1. UPDATE: Manual sections +# ----------------------- +# Each Routine description has many sections. The liast can be +# corrected when necessary. Right now, the Syntax and Keywords +# sections are being parsed. This relies explicitly on the HTML +# "heading" syntax, which should be verified to continue to be +# operational. +# +# 2. UPDATE: Special matchers +# ------------------------ +# The %specials hash contains all the special matchers which fix +# unusual or incorrect routine descriptions. They change the entry +# so that the parser does the right thing, or they add entries to +# special arrays. The first thing to do with a new version is to +# run get_html_rinfo with the -debug flag. Part of the output will +# be a list of all special matchers, with additional info how often +# the matcher matches, and if it did its actions correctly. +# Matchers which no longer match should be checked. Matching, but +# not acting may be due to the fact that RSI fixed that particular +# entry. It is also possible that new special matchers have to be +# written for new entries - you need to check what's new with the +# IDL version and if it is processed correctly. Read the +# documentation just before the definition of the %specials hash to +# find out how such a matcher must behave. +# +# 3. UPDATE: Special sections matchers & parsers +# ------------------------------------------- +# Certain keywords are linked to a separate, special section, apart +# from the routines which use those keywords. Examples include +# Graphics Keywords, Multi-Threading Keywords, Device Keywords, and +# system variables. The %special_sections hash names these +# sections as keys, and two routines as values: one to detect +# whether a given html file is this section (mathcer), and the +# other parse its keywords (parser). These special sections are +# scanned for keywords, and linked to from other routines for the +# with the relevant keywords. +# +# 4. UPDATE: Statement regexp +# ------------------------ +# This is also a special matcher which makes sure that no routine +# info is produced by the syntax entries for IDL language +# constructs. As RSI adds new statements (as COMPILE_OPT in +# version 5.3 or SWITCH/BREAK/CONTINUE in version 5.4), this regexp +# must be extended to match the new statements as well. +# +# 5. UPDATE: Special help topics +# --------------------------- +# Some words which can show up in IDL source code have special help +# topics (not routine names) associated with them. The +# %special_topics hash links downcase versions of these words to +# the appropriate topics. Even if word and topic are the same, it +# must be mentioned here in order to trigger help on this word. +# +# For more detailed info, run `get_html_rinfo' with the `-debug' flag +# and check the resulting files get_html_rinfo.cpl, +# get_html_rinfo.rej, and get_html_rinfo.log. The .cpl "complaint" +# file can be sent to the RSI documentation group for fixing problems +# detected. +# +#============================================================================ + +require 5.004; +use Data::Dumper; +# Parse command line options and make file names +use Getopt::Long; +GetOptions("-debug" => \$debug, + "-path=s" => \$path, + "-idl=s" => \$idl, + "-xname=s" => \$ignore_name_re, + "-xclass=s" => \$ignore_class_re) + or usage(); + +if (@ARGV) { + # Something on the command line + print STDERR "Unrecognized command line args: @ARGV\n"; + exit(1); +} + +$idl = $idl || "idl"; + +%compress = (".gz" => "gunzip -c", + ".bz2" => "bunzip2 -c", + ".Z" => "zcat"); + +# Establish default output file names +$rinfofile = "idlw-rinfo.el"; +$topicsfile = "idlw-help-topics.el"; + +# Check the path +$path ||= "."; +die "Invalid path $path\n" unless !$path || -d $path; + +opendir(DIR,$path) or die "can't opendir $path: $!"; +@files=grep {-r "$path/$_" && /\.html$/} readdir(DIR); +closedir(DIR); + +%routines = (); + +# Open the REJECT and LOG files file for debugging information +if ($debug) { + open REJECT, ">get_html_rinfo.rej" or + die "Cannot write to get_html_rinfo.rej\n"; + open COMPLAIN,">get_html_rinfo.cpl" or + die "Cannot write to get_html_rinfo.cpl\n"; + open LOG, ">get_html_rinfo.log" or + die "Cannot write to get_html_rinfo.log\n"; +} + +# Open the lisp file for output +open RINFO,">$rinfofile" or die "Cannot open $rinfofile for writing: $!"; + +# Scan the files +FILE: +foreach $file (@files) { + open(FILE,"$path/$file"); + local $/=undef; #Slurp mode + $file_contents=<FILE>; + + # Title + if ($file_contents=~m|<title>\s*([^<]+?)\s*</title>|) { + $title=$1; + } else { + $title=""; + } + + # Check for special topic sections, which have no keywords (like + # if...then...else) + foreach (keys %special_topics) { + if (!defined $special_topics{$_}{file} && + $title=~m|^\Q$_\E$|i) { + $special_topics{$_}{file}=$file; + next FILE; + } + } + + # Save files with upcase titles, for tracking classes + $files{uc $title}=$file; + + # Check if file contains one of the various special keyword + # sections, and parse those keywords + foreach (keys %special_sections ) { + if (&{$special_sections{$_}{matcher}}) { #Is it a special section? + &{$special_sections{$_}{parser}}; #Parse the keywords for it + $special_sections{$_}{file}=$file; + %{$special_sections{$_}{kwds}}=%txt_kwds; + next FILE; + } + } + + # Look for system variable definitions + if ($title=~/System Variables$/) { + my ($sysvar,%fake_sysvars,$link); + @parts=split(/(?=<h3\s*class="Heading2">)/,$file_contents); + shift @parts; + + foreach (@parts) { + parse_keywords($_,1,'<h3\s*class="Heading2">'); + next unless %txt_kwds; + %fake_sysvars=%txt_kwds; + parse_keywords($_,1); # Get the tags if any + foreach (keys %fake_sysvars) { + $sysvar=uc $_; + $sysvar=~s/^\!//; + print "GOT SYSVAR: $sysvar\n" if $debug; + %{$sysvars{$sysvar}{tags}}=%txt_kwds; + $sysvars{$sysvar}{main}=$fake_sysvars{$_}? + "$file#$fake_sysvars{$_}":$file; + } + } + next FILE; + } + + # Look for executive commands + if ($title=~/^\.[A-Z]+$/) { + $executive_commands{$title}=$file; + } + + # Look for a version + if(!$idlversion && + $file_contents=~m|<a[^>]*>New Features in IDL ([0-9.]+)\s*</a>|) { + $idlversion=$1; + } + + # Split into component parts + @parts=split(/<h3\s+class="Heading2">\s+/,$file_contents); + shift @parts; + + # Extract the Syntax and Keyword parts + $syntax=$kwds=""; + foreach (@parts) { + # Require a proper heading + next unless m|^\s*<a\s+name="[0-9]+">\s+</a>\s*([^\n\r]+)\s+|i; + my $type=$1; + if (!$syntax && $type=~/^Syntax/) { + $syntax=$_; + } elsif ($type=~/^((?:[A-Z][a-z0-9_]+\s+){0,2}Keywords| + Keywords:\s*(?:[A-Z][a-z0-9_]+\s*){1,2})\s*$/x) { + $kwds.=$_ unless $1=~/Thread Pool Keywords/; # Nothing useful in TPool + } + } + next FILE unless $syntax; + + # Clear a few variables which are used by deeper routines to return stuff + @rejects = @complaints = @add_keywords = @enter = (); + clean_up_syntax(); + + diag("$file($title):>>>>>>>>>>>>>>>>>>\n"); + diag("GOT KWDS SECTION:\n+++++++\n$kwds\n---------\n") if $kwds; + + # Apply the special syntax matchers + &try_specials(); + diag("$syntax\n"); + + # See if there are reasons to reject or complain about this entry + if (@rejects) { + # Reject + $n_rejections += scalar(@rejects); + reject($syntax,$file,@rejects); + next FILE; + } + + # Try to parse out the name, class, calling sequence, and syntax keywords + unless (parse_syntax($syntax)) { + # Make a note that this section could not be parsed and move on + # to the next file. + reject($syntax,$file,("Could not be parsed")); + next FILE; + } + + # Parse the text keywords, and compare to syntax keywords. + @kwds=make_unique(@kwds); + parse_keywords($kwds); + + # Complain strenuously + if (@complaints) { + # Keep, but complain + $n_complaints += scalar(@complaints); + complain($syntax,$file,@complaints); + } + + diag(" SYNTAX KWDS FOUND: \n ".join("\n ",@kwds)); + diag("\n$file($title):<<<<<<<<<<<<<<<<<<\n\n\n"); + + # Normalize the class/routine cases + if ($class) { + $class = case_name("class",$class); + $name = case_name("method",$name); + } else { + $name = case_name("routine",$name); + } + + my $fname=make_full_name($class,$name); + + # If the special matchers have not set @enter, do it here + $enter[0] = [$name,$type,$class,$call] unless @enter; + + # Write a message for debugging output + diag(sprintf("%-20s %-15s %-20s with %3d keywords\n", + $type,$class,$name,scalar(keys %txt_kwds))); + + # Store the stuff. + foreach $a (@enter) { + my $name = $$a[0]; + my $class = $$a[2]; + my $type = $$a[1]; + if ($ignore_name_re && $name =~ /$ignore_name_re/o) { + diag("Ignoring name $name because of -xname option\n"); + $ignore_name_cnt++; + next; + } + if ($ignore_class_re && $class =~ /$ignore_class_re/o) { + diag("Ignoring name $name because of -xclass option\n"); + $ignore_class_cnt++; + next; + } + $e{$class}{$type}{$name}{file}=$file; + %{$e{$class}{$type}{$name}{kwds}}=%txt_kwds if %txt_kwds; + push @{$e{$class}{$type}{$name}{Get}},@getkwds if @getkwds; + push @{$e{$class}{$type}{$name}{Set}},@setkwds if @setkwds; + $e{$class}{$type}{$name}{call} = $$a[3]; + + # Add keywords from other routines + push @{$e{$class}{$type}{$name}{add_kwds}},@add_keywords if + @add_keywords; + + # Special section links + my @special_section_has=grep {$has_special_section_keywords{$_} eq "has"} + (keys %has_special_section_keywords); + push @{$e{$class}{$type}{$name}{has_special_secs_kwds}}, + @special_section_has if @special_section_has; + + my @special_section_needs=grep {$has_special_section_keywords{$_} eq + "needs"} + (keys %has_special_section_keywords); + push @{$e{$class}{$type}{$name}{needs_special_secs_kwds}}, + @special_section_needs if @special_section_needs; + + } + close(FILE); +} + +# We have all the information now in one huge hash. A few things +# still need to be fixed... + +# 1. The IDL manual for object methods GetProperty and SetProperty +# include only an incomplete list of keywords. Additional keywords +# are documented under "Init" with a "Get" or "Set" (or both) +# marker. These were collected during the scan - here we put them +# into the right space. + +foreach $class (keys %e) { + next unless $class; + $iname = case_name("method","Init"); + next unless defined $e{$class}{fun}{$iname}; + $ifile = $e{$class}{fun}{$iname}{file}; + foreach $getset ("Get","Set") { + $pname = case_name("method",$getset."Property"); + if (defined $e{$class}{fun}{$iname} && + @{$e{$class}{fun}{$iname}{$getset}}) { + foreach (@{$e{$class}{fun}{$iname}{$getset}}) { + $e{$class}{pro}{$pname}{extra}{$ifile}{kwds}{$_}= + $e{$class}{fun}{$iname}{kwds}{$_}; + $e{$class}{pro}{$pname}{extra}{$ifile}{special}="$class::$iname" + } + } + } +} + +# 2. The special syntax matchers may have indicated that some +# routine's keywords are linked to special sections (like Graphics +# Keywords), or that keywords from special sections weren't +# mentioned in the syntax, and need to be added outright (like +# MultiThreading). Do this now. +foreach $class (keys %e) { + foreach $type (keys %{$e{$class}}) { + foreach $name (keys %{$e{$class}{$type}}) { + # Needs the keywords added outright + if (defined($e{$class}{$type}{$name}{needs_special_secs_kwds})) { + foreach my $s (@{$e{$class}{$type}{$name}{needs_special_secs_kwds}}) { + unless (defined($special_sections{$s})) { + diag("No such special section: $s."); + next; + } + my $sfile=$special_sections{$s}{file}; + foreach (keys %{$special_sections{$s}{kwds}}) { + $e{$class}{$type}{$name}{extra}{$sfile}{kwds}{$_}= + $special_sections{$s}{kwds}{$_}; + } + $e{$class}{$type}{$name}{extra}{$sfile}{special}=$s; + } + } + # Already mentions keywords, just needs links + if (defined($e{$class}{$type}{$name}{has_special_secs_kwds})) { + foreach my $s (@{$e{$class}{$type}{$name}{has_special_secs_kwds}}) { + unless (defined($special_sections{$s})) { + diag("No such special section: $sec."); + next; + } + my $sfile=$special_sections{$s}{file}; + foreach (keys %{$special_sections{$s}{kwds}}) { + # Skip if keyword not defined in routine, or already linked + next if(!defined($e{$class}{$type}{$name}{kwds}{$_}) or + $e{$class}{$type}{$name}{kwds}{$_}); + delete $e{$class}{$type}{$name}{kwds}{$_}; #Move to extra kwds + $e{$class}{$type}{$name}{extra}{$sfile}{kwds}{$_}= + $special_sections{$s}{kwds}{$_}; + $e{$class}{$type}{$name}{extra}{$sfile}{special}=$s; + } + } + } + } + } +} + +# 3. Some special matchers may have put in a request to add or link +# keywords of one more routines (or maybe just special section +# keywords from a routine) to another. Since now we know all +# keywords of all routines and their origins (special sections, +# etc.), we can do this. + +foreach $class (keys %e) { + foreach $type ("fun","pro") { + foreach $name (keys %{$e{$class}{$type}}) { + $n_routines_total++; + + next unless defined($e{$class}{$type}{$name}{add_kwds}); + + foreach my $add (@{$e{$class}{$type}{$name}{add_kwds}}) { + my (%nokeys,$file_to_add,$keys_to_add,$special_to_add); + my ($has,$aname,$atype,$aclass,$nokeys,$special_sec)=@$add; + # Does it already have the keywords, just requiring a link + $has=$has eq "has"; + # Is it a real entry being asked for? + next unless defined($e{$aclass}{$atype}{$aname}); + foreach (@$nokeys) {$nokeys{$_}++;} + + my $afile=$e{$aclass}{$atype}{$aname}{file}; + + + # Look for special section keywords only (e.g. "graphics + # keywords accepted by PLOT") + if($special_sec and defined($e{$aclass}{$atype}{$aname}{extra})) { + foreach $file (keys %{$e{$aclass}{$atype}{$aname}{extra}}) { + if($e{$aclass}{$atype}{$aname}{extra}{$file}{special} =~ + /$special_sec/) { + $keys_to_add=$e{$aclass}{$atype}{$aname}{extra}{$file}{kwds}; + $file_to_add=$file; + $special_to_add= + $e{$aclass}{$atype}{$aname}{extra}{$file}{special}; + last; + } + } + } + # Otherwise, add from the set of regular keywords + $keys_to_add=$keys_to_add || $e{$aclass}{$atype}{$aname}{kwds}; + $file_to_add=$file_to_add || $afile; + + # Actually add the keys (moving unlinked ones to extra if necessary) + foreach (keys %$keys_to_add) { + next if $nokeys{$_}; + if($has) { #exists in kwds already, move and link it + next if !defined($e{$class}{$type}{$name}{kwds}{$_}) or + $e{$class}{$type}{$name}{kwds}{$_}; + delete $e{$class}{$type}{$name}{kwds}{$_}; # Move to extra keywords + } + $e{$class}{$type}{$name}{extra}{$file_to_add}{kwds}{$_}= + $$keys_to_add{$_}; + $e{$class}{$type}{$name}{extra}{$file_to_add}{special}= + $special_to_add if $special_to_add; + } + } + } continue { + $n_keywords_total += scalar(keys %{$e{$class}{$type}{$name}{kwds}}) + if defined $e{$class}{$type}{$name}{kwds}; + if(defined($e{$class}{$type}{$name}{extra})) { + foreach (keys %{$e{$class}{$type}{$name}{extra}}) { + $n_keywords_total += + scalar(keys %{$e{$class}{$type}{$name}{extra}{$_}{kwds}}); + } + } + } + } +} + +#print Dumper(%e),"\n"; +#print Dumper(%special_sections),"\n"; +#print "SYSVARS:\n",Dumper(%sysvars),"\n"; + +# Print debug information about how often each special matcher matched. +# Will only be visible in debugging mode. +diag("SPECIAL MATCHERS-------------------------------matched success failed\n"); +foreach $key (sort ignoring_case keys %specials) { + diag(sprintf("%-50s %3d %3d %3d\n", + $key,$special_matcnt{$key},$special_actcnt{$key}, + $special_matcnt{$key}-$special_actcnt{$key})); +} +diag(sprintf("\nProblematic entries: %d rejected, %d complains.\n", + $n_rejections,$n_complaints)); + + +# Write the lisp file +# Also write some statistics to STDERR +write_rinfo_header(); +print RINFO "(defconst idlwave-system-routines\n"; +print RINFO " '(\n"; +printf STDERR "\n Nr Class Npro Nfun Ntot Nkwd\n"; +printf STDERR "-----------------------------------------------\n"; +$classcnt = -1; +foreach $class (sort ignoring_case keys %e) { + $npro = scalar(keys %{$e{$class}{"pro"}}); + $nfun = scalar(keys %{$e{$class}{"fun"}}); + $nkwd = 0; + foreach $type ("pro","fun") { + foreach $name (keys %{$e{$class}{$type}}) { + $nkwd += scalar keys %{$e{$class}{$type}{$name}{kwds}} + if defined($e{$class}{$type}{$name}{kwds}); + if(defined($e{$class}{$type}{$name}{extra})) { + foreach $file (keys %{$e{$class}{$type}{$name}{extra}}) { + $nkwd+= scalar keys %{$e{$class}{$type}{$name}{extra}{$file}{kwds}}; + } + } + } + } + $nprotot += $npro; + $nfuntot += $nfun; + $nclass++; + printf STDERR "%3d %-21s %4d %4d %4d %5d\n", + ++$classcnt,$class,$npro,$nfun,$npro+$nfun,$nkwd; + foreach $type ("pro","fun") { + foreach $name (sort ignoring_case keys %{$e{$class}{$type}}) { + print RINFO " " . make_lisp_reader_string($class,$type,$name) . "\n"; + } + } +} + +print STDERR "-" x 47,"\n"; +printf STDERR "Total %4d %4d %5d %5d\n", + $nprotot,$nfuntot,$nprotot+$nfuntot,$n_keywords_total; +printf STDERR "Routines ignored due to -xname: %4d\n",$ignore_name_cnt + if $ignore_name_re; +printf STDERR "Routines ignored due to -xclass: %4d\n",$ignore_class_cnt + if $ignore_class_re; +print RINFO <<EOF; + ) + "$n_routines_total builtin routines with $n_keywords_total keywords for IDL version $idlversion.") +EOF + +# Gather sysvar and class info, and write the info +talk_to_idl(); +write_sysvar_info(); + +# Attach classes to the files documenting them. +foreach $class (keys %e) { + if (defined $files{uc $class} && !defined($classes{$class}{file})) { + $classes{$class}{file}=$files{uc $class}; + } +} +write_classtag_info(); + + +write_executive_commands(); + + +print RINFO <<EOF; +;; Special words with associated help topic files +(defconst idlwave-help-special-topic-words + '( +EOF + +foreach (sort ignoring_case keys %special_topics) { + foreach $word (sort ignoring_case @{$special_topics{$_}{words}}) { + print RINFO " (\"$word\" . \"$special_topics{$_}{file}\")\n"; + } +} +print RINFO <<EOF; +) + "Association list of help files for special context words.") + +EOF + +write_rinfo_footer(); + +close RINFO; + +printf STDERR "Wrote file $rinfofile (%4d kBytes)\n", (-s $rinfofile)/1024.; + + + +# ========================================================================== +# ========================================================================== +# SUB-ROUTINES + +###--- LOGS AND MESSAGES ---### + +sub usage { + # Print usage information + print STDERR <<EOF; +usage: get_html_rinfo [-debug] [-path DIR] [-idl /path/to/idl] + [-xname REGEXP] [-xclass REGEXP] +EOF + exit(1); +} + +sub diag { + # Write diagnosis to STDERR and to the LOG file if we are debugging. + my $msg = @_[0]; + if ($debug) { + #print STDERR $msg; + print $msg; + print LOG $msg; + } +} + +sub reject { + # Write a message to the reject file. + my($string,$file,@reasons) = @_; + if ($debug) { + print REJECT (("-" x 80) . "\n") x 2; + print REJECT "File: $file\n"; + print REJECT "Syntax: $string\n"; + foreach $reason (@reasons) { + print REJECT "Reason: $reason\n"; + } + } +} + +sub complain { + # Write a message to the reject file. + my($string,$file,@reasons) = @_; + if ($debug) { + print COMPLAIN (("-" x 80) . "\n") x 2; + print COMPLAIN "File: $file\n"; + print COMPLAIN "Syntax: $string\n"; + foreach $reason (@reasons) { + print COMPLAIN "Reason: $reason\n"; + } + } +} + +###--- SYNTAX ---### + +sub clean_up_syntax { + # Clean up Syntax + $syntax=~s|<a\s+name="[0-9]+">\s+</a>Syntax\s+</h3>||; #Entro Syntax + $syntax=~s|^<p\s+class="Note">.*?^</p>\s*||msg; # Remove notes + $syntax=~s|^<h5\s+class="Heading4">.*?^</h5>\s*||msg; # And headings + $syntax=~s/and then,.*//s; + $syntax=~s/<br>/\n/g; + $syntax=~s/<.*?>//g; + $syntax=~s/ / /g; + $syntax=~s/—/ - /g; + $syntax=~s/^\s+//; + $syntax=~s/\s+$//; + $syntax=~s/&([gl])t;/$1=="g"?">":"<"/eg; +} + +sub parse_syntax { + my $txt = $_[0]; + + # Initialize a few variables. + # Note that these are global vars which are interpreted by the caller. + @args = @kwds = @getkwds = @setkwds = (); + $name = $type = $class = $call = ""; + + # and any initial explanation string, as in for OPEN + $txt =~ s/^.*?\w:(?!:)\s*//; + + # Determine type (function or procedure) + if ($txt =~ /^\s*[a-z][a-z0-9_]+\s*=/si) { + $type = "fun"; + } else { + $type = "pro"; + } + + # Determine Class (ignore the name - we grab it later) + if ($override_class) { + $class=$override_class; + $override_class + =""; + } else { + if ($txt =~ /->\s*\[([a-z][a-z0-9_]+)::?\]([a-z][a-z0-9_]+)/i) { + $class = $1; + } + } + + # Clean out the remarks about BLABLA keywords + $txt =~ s/^.*?\bkeywords\b.*?://gmi; + # Clean out the "only in..." stuff + $txt =~ s/\(only\s*in[^\)]*\)//gi; + + + # Collapse each chunk of whitespace into a single SPACE + $txt =~ s/[ \t\n\r]+/ /gm; + + # Remove various other bits of detritus. + $txt =~ s/\bor [a-z0-9 ]+,/or/gi; + $txt =~ s/\bFor [a-z0-9 ]+:/or/gi; + + # Get all keywords. Keywords are things with `/' before it or with `=' + # after it. + while ($txt =~m!(\{([/XYZ |]*)\})? + (\/)? + \b + (_?[A-Z][A-Z0-9_]*) + (\ *\{\s*(Get|Set|Get\s*,\s*Set)\s*\})? + \s* + (=)?!gx + ) { + { + next unless $` =~ /,/; + } # ignores the "result=" in functions + ($xyz,$slash,$identifier,$getset,$equal) = ($2,$3,$4,$6,$7); + next unless $slash || $equal || ($xyz && $xyz=~m|/|); + + # Everything before the first keyword is part of the calling sequence + $call = $` unless $call; + + # Some keywords have a {X|Y|Z} in front which must be expanded + if ($xyz) { + @ids = map {tr|/||d; $_ . $identifier} (split(/ *\| */,$xyz)); + } else { + @ids = ($identifier); + } + + # Save the recognized keywords away + push @kwds,@ids; + + # When the keyword has a Get or Set flag, add it to these lists, + # so it can be appended to the Get|SetProperty methods' keyword + # lists later on. + push @getkwds,@ids if $getset =~ /get/i; + push @setkwds,@ids if $getset =~ /set/i; + } + + # If there was no keyword, the whole text must be calling sequence + $call = ($call || $txt); + + # Final whitespace and parens are killed. + $call =~ s/[ \n\t\[\]\{\}\|,]*$//; #] + # We may have killed needed closing parenthesis - fix this. + $call = close_open_parens($call); + + # Find the name of the routine in the calling sequence. + if ($override_name) { + $name=$override_name; + $override_name=""; + } else { + if ($class) { + if ($call =~ /->\s*\[[^\]]+\]([a-z][a-z0-9$_]*)/i) { + $name = $1; + } + } else { + if ($type eq "fun") { + if ($call =~ /=\s*([a-z][a-z0-9\$_]*)/i) { + $name = $1; + } + } elsif ($type eq "pro") { + if ($call =~ /^\s*([a-z][a-z0-9\$_]*)/si) { + $name = $1; + } + } + } + } + + # Return the name to indicate success + return $name; +} + +# Parse the keywords section, gathering links, and comparing to the +# syntax-derived list @kwds (unless no_syntax_kwds is true) +sub parse_keywords { + my ($txt,$no_syntax_kwds,$heading) = @_; + my ($kwd,$key_base,@missing_from_syntax,@missing_from_text, + %syntax_kwds,$link); + @special_keywords=(); + %txt_kwds=(); + $heading=$heading || '<h4\s*class="Heading3">'; + + while ($txt =~ m{$heading\s* + <a\s*name="([0-9]+)">\s*</a> # Embedded Link + \s*([^\n]+)\n # Everything to end of line + }gxs) { + $kwd=$2; + if ($kwd=~/Keyword/) { + push @special_keywords,$kwd; + next; + } + + $link=$1; + next unless + $kwd =~ m{( + (?:(\[XYZ?\])?(\!?[A-Z0-9-_]+)# opt. [XYZ] ff. by caps keybase + (?![a-z]) # Not ff. by lowercase lets + (?:\s*,\s*)? # With an optional comma + )+ # One or more of the above + )}gxs; + ($kwd,$xyz,$key_base)=($1,$2,$3); + if ($xyz) { + $xyz=~tr/XYZ//cd; + map {$txt_kwds{$_ . $key_base}=$link} split(/(?=[XYZ])/,$xyz); + } else { + foreach (split(/,\s*/,$kwd)) { + $txt_kwds{$_}=$link; + } + } + } + + unless($no_syntax_kwds) { + foreach $kwd (@kwds) { + $syntax_kwds{$kwd}++; + unless (defined $txt_kwds{$kwd}) { + push @missing_from_text,$kwd; + } + } + + push @complaints,"Syntax keywords missing from text: " . + join(",",@missing_from_text) if @missing_from_text; + + foreach $kwd (keys %txt_kwds) { + push @missing_from_syntax,$kwd unless $syntax_kwds{$kwd}; + } + push @complaints,"Text keywords missing from syntax: " . + join(",",@missing_from_syntax) if @missing_from_syntax; + + # No link found, but add the missing keywords from Syntax anyway. + map {$txt_kwds{$_}=""} @missing_from_text; + } +} + +sub fix_keywords { + # Fix the case of keywords, sort and uniquify the list. + my @a; + @a = map { case_name("keyword",$_) } @_; + @a = make_unique(@a); + @a = sort ignoring_case @a; + @a; +} + +sub case_name { + # Return $name in a unique case. + # The first time this function gets called, the default casification + # is determined and stored. Later calls with same or different case only + # return this initial value. + my ($type,$name) = @_; + my $lcname = lc($name); + if ($debug) { + $origcase{$type}{$lcname} = $name unless defined $origcase{$type}{$lcname}; + if (defined $origcase{$type}{$lcname} && + $name ne $origcase{$type}{$lcname}) { + diag("CASECHANGE $type: old $origcase{$type}{$lcname}, new $name\n"); + } + } + + if (defined $casedname{$type}{$lcname}) { + return $casedname{$type}{$lcname}; + } else { + my $casedname; + if ($type eq "class" || $type eq "method") { + $casedname = $name; + } else { + $casedname = uc($name) + } + return $casedname{$type}{$lcname} = $casedname; + } +} + +sub make_full_name { + my ($class,$name) = @_; + ($class ? "$class" . "::" : "") . $name +} + +sub balanced_parens { + # Check if all parenthesis are balanced + my $string = $_[0]; + $string =~ tr/()[]{}//cd; + 1 while $string =~ s/\(\)|\[\]|\{\}//g; + $string eq ""; +} + +sub close_open_parens { + # Add all necessary closing parenthesis to the end if a string + my $rtn = $_[0]; + my $string = $_[0]; + $string =~ tr/()[]{}//cd; + 1 while $string =~ s/\(\)|\[\]|\{\}//g; + my $close = reverse($string); + $close =~ tr/([{/)]}/; + $rtn . $close; +} + +sub make_lisp_reader_string { + # Make a string which will be correctly interpreted by the Emacs Lisp reader + # as a lisp list containing all needed information about a routine. + my ($class,$type,$name) = @_; + my $call=$e{$class}{$type}{$name}{call}; + my ($entry,$kwds); + + # In the calling sequence we want `%s' instead of name and class. + # The calling sequence will later be used as format string to make + # a calling sequence with the correct version of class and name. + $call =~ s/^result/Result/; + if ($class) { + $call =~ s/(\[)($class)(:+)(\])($name)\b/$1%s$3$4%s/i; + } else { + $call =~ s/\b$name\b/%s/i; + } + + # Now we make the string which can be parsed by the Lisp reader + # It looks like this: + # ("NAME" TYPE "CLASS" (system) "CALLING SEQUENCE" + # ("FILE1.html" "KWD1" LINK1 "KWD2" LINK2 ...)) + $entry = sprintf('(%s %3s %s', "\"$name\"", $type, + ($class eq "" ? "nil" : "\"$class\"")); + $entry .= " (system) \"$call\""; + $kwds = " (\"$e{$class}{$type}{$name}{file}\""; + + if (defined $e{$class}{$type}{$name}{kwds}) { + $kwds.=" "; + $kwds.=join(" ",map {"(\"$_\"" . + ($e{$class}{$type}{$name}{kwds}{$_}? + " . $e{$class}{$type}{$name}{kwds}{$_}":"") . ")"} + (sort ignoring_case keys %{$e{$class}{$type}{$name}{kwds}})); + } + $kwds.= ')'; + if (defined $e{$class}{$type}{$name}{extra}) { + foreach $file (sort keys %{$e{$class}{$type}{$name}{extra}}) { + $kwds .= " (\"$file\" " . + join(" ",map {"(\"$_\"" . + ($e{$class}{$type}{$name}{extra}{$file}{kwds}{$_}? + " . $e{$class}{$type}{$name}{extra}{$file}{kwds}{$_}": + "") .")"} + (sort ignoring_case + keys %{$e{$class}{$type}{$name}{extra}{$file}{kwds}})) . ')'; + } + } + $kwds=$kwds || " nil"; + + $entry .= $kwds . ')'; + + # Return the entry + $entry; +} + +###--- TALKING TO IDL & WRITING RINFO FILE ---### + +sub write_rinfo_header { + my ($part_of) = $emacs ? + $lisp_part_of_gnu_emacs : $lisp_not_part_of_gnu_emacs; + $date = localtime; + chomp $date; + ($progname = $0) =~ s|.*/||; + chomp $progname; + $n_files_scanned = @files; + print RINFO <<EOF; +;;; idlw-rinfo.el --- Routine Information for IDLWAVE +;; Copyright (c) 1999 Carsten Dominik +;; Copyright (c) 1999, 2000, 2001, 2002, 2003 Free Software Foundation + +;; Author: J.D. Smith <jdsmith\@as.arizona.edu> +;; Version: 5.1 +;; Keywords: languages +$part_of +;;; Commentary: + +;; The constants which contain information about IDLs builtin +;; functions, procedures, system variables and class tags. This +;; information is extracted automatically from the IDL documentation +;; and by talking to IDL. +;; +;; Created by $progname on $date +;; IDL version: $idlversion +;; Number of files scanned: $n_files_scanned +;; Number of routines found: $n_routines_total +;; Number of keywords found: $n_keywords_total +;; +;; New versions of IDLWAVE, documentation, and more information +;; available from: +;; http://idlwave.org +;; +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +;;; Code: +EOF +} + +sub write_rinfo_footer { + print RINFO <<EOF; +(provide 'idlw-rinfo) +(provide 'idlwave-rinfo) + +;;; idlw-rinfo.el ends here +EOF +} #' + +sub talk_to_idl { + # Run IDL and get some important information + my ($code,$class); + open IDLWAVE_EL,"idlwave.el" or die "Can't open idlwave.el"; + while ($_=<IDLWAVE_EL>) { + $code .= $_ if /^;; START OF IDLWAVE SUPPORT ROUTINES/.. + /^;; END OF IDLWAVE SUPPORT ROUTINES/; + } + close IDLWAVE_EL; + $code =~ s/([^\\])\\\"/$1\"/g; + open TMP, ">tmp.pro" or die "Can't write to tmp.pro\n"; + print TMP $code; + close TMP; + + use IPC::Open2; + # Run IDL in order to get the names of system variables + print STDERR "\nTalking to $idl...\n"; + $pid = open2(\*RDR, \*WTR, $idl . ' 2>/dev/null'); + print WTR ".run tmp.pro\n"; + print WTR "idlwave_get_sysvars\n"; + my $fmt = "idlwave_get_class_tags, '%s'\n"; + foreach $class (sort ignoring_case keys %e) { + next unless $class; + printf WTR $fmt,$class; + } + print WTR "exit\n"; + + # Collect the output + while ($_=<RDR>) { + diag("Read: $_\n"); + if (/^IDLWAVE-SYSVAR: (.*[A-Z0-9\$])/) { + push @sv_entries,$1; + } elsif (/^IDLWAVE-CLASS-TAGS: (.*[A-Z0-9\$])/) { + @F=split " ",$1; + $class = shift @F; + $class =~ s/^!//; + $nclass++; + $ntags += scalar(@F); + @{$classes{$class}{tags}} = @F; + } + } + printf STDERR + "%6d System Variables\n%6d Class Structures\n", + scalar(@sv_entries),scalar(keys %classes); +} + +sub write_sysvar_info { + # Print the definition of the sysvar alist to the lisp file. + my $main; + print RINFO "\n(setq idlwave-system-variables-alist\n '("; + foreach (@sv_entries) { + @F=split; + $main = shift @F; + $main =~ s/^!//; + print RINFO "\n (\"$main\""; + $main=uc($main); + if (@F) { + print RINFO " (tags"; + foreach (@F) { + if($sysvars{$main}{tags}{$_}) { + print RINFO " (\"$_\" . $sysvars{$main}{tags}{$_})"; + } else { + print RINFO " (\"$_\")"; + } + } + print RINFO ")"; + } + + my $link=$sysvars{$main}{main}; + if ($link) { + print RINFO " (link \"$link\")"; + } + + print RINFO ")"; + } + print RINFO "))\n\n"; +} + +sub write_executive_commands { + print RINFO "\n(setq idlwave-executive-commands-alist '(\n"; + foreach (keys %executive_commands) { + print RINFO " (\"$_\" . \"$executive_commands{$_}\")\n"; + } + print RINFO "))\n\n"; +} + +sub write_classtag_info { + # print the class tags to the lisp file. + my ($class,$class1,$nclass,$ntags,$ntags1); + my @classes; + my @F; + print RINFO "\n(setq idlwave-system-class-info\n '("; + @classes = sort {$#{$classes{$b}{tags}} <=> $#{$classes{$a}{tags}}} + keys %classes; + CLASS: + for ($i=0;$i<$#classes;$i++) { + $c1 = $classes[$i]; + @c1tags = @{$classes{$c1}{tags}}; + SUPERCLASS_CANDIDATE: + for ($j=$i+1;$j<=$#classes;$j++) { + $c2 = $classes[$j]; + @c2tags = @{$classes{$c2}{tags}}; + next SUPERCLASS_CANDIDATE unless @c2tags; + if (is_subset(\@c2tags,\@c1tags)) { + @c1tags=@{$classes{$c1}{tags}}= + remove_tags($classes{$c1}{tags},@c2tags); + push @{$inherits{$c1}}, $c2; + $n_inherits++; + next SUPERCLASS_CANDIDATE; + } else { + next SUPERCLASS_CANDIDATE; + } + } + } + + foreach $class (@classes) { + print RINFO "\n (\"$class\""; + if (@{$classes{$class}{tags}}) { + print RINFO " (tags"; + foreach (@{$classes{$class}{tags}}) { + print RINFO " \"$_\"";$ntags1++; + } + print RINFO ")"; + } + if (@{$inherits{$class}}) { + print RINFO " (inherits"; + foreach $class1 (@{$inherits{$class}}) { + print RINFO " \"$class1\""; + } + print RINFO ")"; + } + if ($classes{$class}{file}) { + print RINFO " (link \"$classes{$class}{file}\")"; + print RINFO ")"; + } + } + print RINFO "))\n\n"; + print STDERR "$ntags tags, $nclass classes, $n_inherits inheritance cases deduced -> $ntags1 native tags\n" if $debug; +} + +###--- UTILITY ---### + +sub max {$_[0] > $_[1] ? $_[0] : $_[1]} + +sub is_subset { + # Accepts two array references as arguments. + # Checks if the first array is a subset of the second. + my ($ref2,$ref1) = @_; + my @tags1 = @$ref1; + my @tags2 = @$ref2; + my %taghash; + foreach (@tags1) { + $taghash{$_}=1; + } + foreach (@tags2) { + return 0 unless defined($taghash{$_}); + } + return 1; +} + +sub remove_tags { + # Remove entries in the array in the first ref + # the entries to remove are the remaining arguments. + my $ref1 = shift @_; + my @tags = @$ref1; + my @rmtags = @_; + my %taghash; + my $cnt = 0; + foreach (@tags) { + $taghash{$_}=++$cnt; + } + foreach (@rmtags) { + delete $taghash{$_}; + } + sort {$taghash{$a} <=> $taghash{$b}} keys %taghash; +} + +sub make_unique { + # make an array unique. + my @new; + my %seen; + foreach (@_) { + push @new,$_ unless $seen{$_}++; + } + @new; +} + +# for case-insensitive sorting.. +sub ignoring_case {lc($a) cmp lc($b)} + + +###--- FINDING AND FIXING SPECIAL CASES OF THE SYNTAX ---### + +sub try_specials { + # Try if any of the special matchers in %specials matches the current + # entry. + my($key,$sub); + foreach (keys %special_sections) { + $has_special_section_keywords{$_} = 0; + } + SPECIAL: + foreach $key (sort keys %specials) { + $sub = $specials{$key}; + $act = 0; + $rtn = &$sub(); + if ($rtn) { + $special_matcnt{$key}++; + $special_actcnt{$key}++ if $act; + diag("Special $key matched. Count is $special_matcnt{$key}, $special_actcnt{$key}\n"); + next SPECIAL if $rtn == 1; + return 1; + } + } + return 0; +} + +# UPDATE: Special matchers +# ------------------------ +# The BEGIN block sets the array which contains special matchers. The +# hash keys are just strings identifying the matcher. The values are +# anonymous subroutines which work on the global variable $syntax +# which contains the full syntax entry. To fix a problem, the matcher +# may change the $syntax variable. +# The return value of a matcher must be 0, 1, or 2 +# 0: The matcher did not match the entry. +# 1: The matcher matched, and further matchers should be tried. +# 2: The matcher did match and no further matchers should be tried. + +# Matching should mean that the matcher found the entry it was supposed to +# find. When the entry is successfully changed, the matcher should also set +# the variable $act. As typos get fixed by RSI, some matchers will still +# match but not act - a sign that they can be removed from the %specials list. + +# Matchers of buggy entries should also push onto the @complaints array +# an explanation why the matched entry is buggy. The result can later +# be mailed to RSI to make them fix the problem. + +# Some entries describe actually several routines, for example PRINT/PRINTF. +# The matcher must then place entries into the @enter array for each routine. + +# Finally, some routines do not list all the keywords, but just say +# "accepts all keywords accepted by PLOT" or so. See the SURFACE +# matcher for an example how to tell the system to add the +# corresponding keywords to the routine description as well, using the +# @add_keywords array, with format: +# +# [ ("needs"|"has"),name,type,class,no_keys_reference,specials_only ] +# +# "needs" or "has": depending on whether the routine being scanned +# "needs" keywords added, or already "has" them, but just requires +# linking to the proper section. +# name: The name of the add routine +# type: The type of the add routine +# class: The class (if any) of the add routine +# no_keys_reference: A reference to a list of keywords *not* to add. +# specials_only: A string mentioning one of the special keyword +# section names (see %special_sections below), which will limit +# added keywords to those with the named special section origin +# (e.g. "Graphics"). + +# When this program is being run with the -d flag, it will list which +# matcher was used how often at the end. +BEGIN { + + # The following function prefixes the names of special matchers with + # numbers, to make sure they are called in the same sequence as + # specified here. + $cnt = 0; + sub sname {sprintf("%03d %s",++$cnt,$_[0])} + + # The hash with the matchers + %specials = + ( + # ------------------------------------------------------------------ + # Some typos which occur globally in many entries ------------------ + # ------------------------------------------------------------------ + + sname("UNBANACED PARENTHESIS") => + sub { + if (! balanced_parens($syntax)) { + # A parenthesis is missing in the Syntax entry + push @complaints, "Unbalanced parenthesis"; + 1; + } else { + 0; + } + }, + + # ------------------------------------------------------------------ + # The explanation of what the "Syntax" entry is must be rejected --- + # ------------------------------------------------------------------ + + sname("SYNTAX") => + sub { + if ($syntax =~ /The\s*"?syntax"?\s*section\s*shows/si) { + # Ignore when Syntax is used to explain itself + push @rejects, "Rejecting SYNTAX section explanation."; + $syntax = ""; + $act = 2; + } else { + 0; + } + }, + + # ------------------------------------------------------------------ + # C Functions ------------------------------------------------------ + # ------------------------------------------------------------------ + + sname("C FUNCTION") => + sub { + if ($syntax =~ /\A\s*Syntax\s* + (int | + char | + void | + IDL_MEMINT | + IDL_FUN_RET | + IDL_VPTR | + Client + )\b + /xsi) { + push @rejects, "Seems to be a C routine"; + $syntax = ""; + $act = 2; + } else { + 0; + } + }, + + # ------------------------------------------------------------------ + # IDL statements --------------------------------------------------- + # ------------------------------------------------------------------ + + # UPDATE: Statement regexp + sname("IDL STATEMENT") => + sub { + if ($syntax =~ /\A\s* + (BEGIN \s+ statements | + CASE \s+ expression | + SWITCH \s+ expression | + BREAK \s+ | + CONTINUE \s+ | + COMMON \s+ Block_name | + COMPILE_OPT \s+ | + FORWARD_FUNCTION \s+ | + FOR \s+ variable | + FUNCTION \s+ Function_Name | + GOTO \s* , \s+ label | + IF \s+ expression | + PRO \s+ Procedure_Name | + REPEAT \s+ statement | + WHILE \s+ expression) + /ix) { + # Ignore + $syntax = ""; + push @rejects, "IDL control stucture."; + $act = 2; + } else { + 0; + } + }, + + # ------------------------------------------------------------------ + # Executive statements --------------------------------------------- + # ------------------------------------------------------------------ + + sname("IDL EXECUTIVE COMMAND") => + sub { + if ($syntax =~ /\A\.[A-Z]+/) { + # Command starts with a dot. Ignore it. + $syntax = ""; + push @rejects, "IDL executive command."; + $act = 2; + } else { + 0; + } + }, + + # ------------------------------------------------------------------ + # Entries which must produce several rinfo list entries ------------ + # ------------------------------------------------------------------ + + sname("CALL_METHOD") => + sub { + if ($syntax =~ /CALL_METHOD.*\bor\b.*CALL_METHOD/s) { + # Can be called as function or method - make 2 entries. + $enter[0] = ["CALL_METHOD","pro","","CALL_METHOD, Name, ObjRef, [, P1, ..., Pn]"]; + $enter[1] = ["CALL_METHOD","fun","","Result = CALL_METHOD, Name, ObjRef, [, P1, ..., Pn]"]; + $act = 1; + } else { + 0; + } + }, + + sname("OPEN") => + sub { + if ($syntax =~ /OPENR.*OPENW.*OPENU/s) { + # Make this 3 separate entries + $enter[0] = ["OPENR","pro","","OPENR, Unit, File"]; + $enter[1] = ["OPENW","pro","","OPENW, Unit, File"]; + $enter[2] = ["OPENU","pro","","OPENU, Unit, File"]; + $override_name="OPEN"; + $act = 1; + } else { + 0; + } + }, + + sname("READ") => + sub { + if ($syntax =~ /^\s*READ, \[Prompt/si) { + # Make this 2 separate entries + $syntax =~ s/^[ \t]*READ,.*?\n//m; + $syntax =~ s/^[ \t]*READF,.*?\n/m/; + $enter[0] = ["READ","pro","","READ, [Prompt,] Var1, ..., Varn"]; + $enter[1] = ["READF","pro","","READF, [Prompt,] Unit, Var1, ..., Varn"]; + $act = 1; + } else { + 0; + } + }, + + sname("PRINT") => + sub { + if ($syntax =~ /^\s*PRINT \[, Expr/si) { + # Make this two separate entries + $syntax =~ s/^[ \t]*PRINT .*?\n//m; + $syntax =~ s/^[ \t]*PRINTF .*?\n/m/; + $enter[0] = ["PRINT","pro","","PRINT [, Expr1, ..., Exprn]"]; + $enter[1] = ["PRINTF","pro","","PRINTF [, Unit, Expr1, ..., Exprn]"]; + $act = 1; + } else { + 0; + } + }, + + # ------------------------------------------------------------------ + # INIT and CLEANUP ------------------------------------------------- + # ------------------------------------------------------------------ + + sname("IDLcomIDispatch::Init") => + sub { + if ($syntax =~ /obj_new\('IDLcomIDispatch[^\']*'\)/si) { + $enter[0] = ["Init","fun","IDLcomIDispatchCLSID","Result = Obj -> [IDLcomIDispatchCLSID::]Init()"]; + $enter[1] = ["Init","fun","IDLcomIDispatchPROGID","Result = Obj -> [IDLcomIDispatchPROGID::]Init()"]; + $act=1; + 1; + } else { + 0; + } + }, + + sname("INIT & OBJ_NEW") => + sub { + if ($syntax =~ /.*Init.*\bor\b.*OBJ_NEW\b/s) { + # Mention only "Init", not OBJ_NEW in calling sequence + $act = ($syntax =~ s/\bor\b.*OBJ_NEW.*subclass.*?\)//s); + push @complaints,"Most syntax entries for INIT methods list OBJ_NEW before INIT"; + 1; + } else { + 0; + } + }, + + sname("OBJ_NEW & INIT") => + sub { + if ($syntax =~ /\s*OBJ_NEW\(\s*[\`\'][^\`\']+[\'\`]\s*(.*?)\s*\bor\s*(Result\b.*Init\()[^\)]*\)(.*)/si) { + # Mention only "Init", not OBJ_NEW in calling sequence + $syntax = "$2$1$3"; + $syntax =~ s/\([^\)]*in\s+a\s+subclass[^\)]*Init[^\)]*\)\s*//i; + $syntax =~ s/\n[ \t\n]*\n/\n/; + $act = 1; + } else { + 0; + } + }, + + sname("INIT MISSING") => + sub { + if ($syntax =~ /OBJ_NEW\(\s*[\`\']([a-zA-Z0-9_]+)[\`\'](.*?)(\(Only\s+in\s+a\s+sub|\))/s) { #`) + my $class = $1; + my $args = $2; + $args =~ s/^(\s*\[\s*),/$1/; + $syntax = "Result = Obj -> [${class}::]Init($args)"; + push @complaints,"INIT calling sequence missing"; + $act = 1; + } else { + 0; + } + }, + + sname("OBJ_DESTROY & CLEANUP") => + sub { + if ($syntax =~ /^\s*OBJ_DESTROY,\s*Obj\s+or\s*(.*)/si) { + # Mention only "Cleanup" in calling sequence, not OBJ_DESTROY + $syntax = $1; + $syntax =~ s/\(.*?in\s+[^)]*subclass[^)]*Cleanu[^)]*\)//i; + $act = 1; + } else { + 0; + } + }, + + sname("CLEANUP & OBJ_DESTROY") => + sub { + if ($syntax =~ /.*Cleanup.*\bor\b.*OBJ_DESTROY\b/si) { + # Mention only "Cleanup" in calling sequence, not OBJ_DESTROY + push @complaints,"Most syntax entries for CLEANUP methods list OBJ_DESTROY before CLEANUP"; + $act = ($syntax =~ s/\bor\b.*OBJ_DESTROY.*subclass.*?\)//s); + 1; + } else { + 0; + } + }, + + # ------------------------------------------------------------------ + # Cleaning up some calling sequences ------------------------------- + # ------------------------------------------------------------------ + + sname("ATAN") => + sub { + if ($syntax =~ /ATAN.*ATAN/s) { + # Write a simpler calling sequence + $syntax = "Result = ATAN([Y,] X)"; + $act = 1; + } else { + 0; + } + }, + sname("INTERPOL") => + sub { + if ($syntax =~ /For regular grids.*INTERPOL/i) { + # Simplify calling sequence + $act = ($syntax =~ s/^[ \t]*for (ir)?regular grids:\s*//gmi); + 1; + } else { + 0; + } + }, + sname("WRITE_JPEG") => + sub { + if ($syntax =~ /^\s*WRITE_JPEG\b/si) { + # Remove the UNIT thing from the syntax + $act = ($syntax =~ s/\[,\s*Filename\s*\|\s*,\s*UNIT=lun\s*\](.*)/[, Filename]$1 [, UNIT=lun]/si); + $syntax =~ s/\{.*?\}//sg; + 1; + } else { + 0; + } + }, + + # ------------------------------------------------------------------ + # Attach or link keywords from another procedure ------------------- + # ------------------------------------------------------------------ + sname("SURFACE,CONTOUR,PLOT_3DBOX (ADD PLOT KEYWORDS)") => + sub { + if ($syntax =~ /^\s*(SURFACE|CONTOUR|PLOT_3DBOX)/si) { + if ($syntax =~ + /Graphics Keywords: Accepts.*PLOT.*except([^\.]*)/s) { + # Most PLOT keywords need to be added later. + $act = 1; + push @complaints, "Keyword list not complete (PLOT kwds omitted)"; + my $nokeys = $1; + $nokeys =~ s/\{XYZ\}([A-Z0-9_]+)/X$1, Y$1, Z$1/g; + my @nokeys = (split(/[^A-Z0-9_]+/,$nokeys)); + shift @nokeys unless $nokeys[0]; # Useless material + push @add_keywords,["needs","PLOT","pro","",\@nokeys,"Graphics"]; + } + 1; + } else { + 0; + } + }, + + sname("POLAR_CONTOUR (LINK CONTOUR KEYWORDS)") => + sub { + if ($syntax =~ /^\s*POLAR_CONTOUR/si) { + if ($kwds =~ /accepts all of the keywords.*CONTOUR routine except([^\.]*)\./s) { + # Most CONTOUR keywords need to be linked later. + $act = 1; + push @complaints, "Keyword list not complete (CONTOUR kwds omitted)"; + my $nokeys = $1; + $nokeys =~ s/\{XYZ\}([A-Z0-9_]+)/X$1, Y$1, Z$1/g; + $nokeys =~ s/and//; + my @nokeys = (split(/[^A-Z0-9_]+/,$nokeys)); + shift @nokeys unless $nokeys[0]; # Useless material + push @add_keywords,["has","CONTOUR","pro","",\@nokeys]; + } + 1; + } else { + 0; + } + }, + + sname("TVSCL (ADD TV)") => + sub { + if ($syntax =~ /^\s*TVSCL/si) { + # Most TV keywords need to be added later. + $act = 1; + push @complaints, "Keyword list not complete (TV kwds omitted)"; + push @add_keywords,["needs","TV","pro"]; + 1; + } else { + 0; + } + }, + + sname("LINK SURFACE KEYWORDS") => + sub { + if ($kwds =~ m|<a\s+name="[0-9]+">\s+</a>SURFACE Keywords|) { + # All SURFACE keywords listed in syntax need to be added later. + $act = 1; + push @complaints, + "Keyword list not complete (SURFACE kwds omitted)"; + push @add_keywords,["has","SURFACE","pro",""]; + 1; + } else { + 0; + } + }, + + # ------------------------------------------------------------------ + # Attach or link keywords from a special section ------------------- + # ------------------------------------------------------------------ + + sname("GRAPHICS KEYWORDS") => + sub { + if ($syntax =~ /Graphics +Keywords *:/i) { + $act = 1; + $has_special_section_keywords{Graphics} = "has"; + 1; + } else { + 0; + } + }, + + sname("THREAD POOL KEYWORDS") => + sub { + if ($kwds =~ m|</a>Thread Pool Keywords\s*$|mi) { + $act = 1; + $has_special_section_keywords{MultiThreading} = "needs"; + 1; + } else { + 0; + } + }, + + sname("DEVICE KEYWORDS") => + sub { + if ($syntax =~ /^\s*DEVICE/si) { + $act = 1; + $has_special_section_keywords{Device} = "has"; + 1; + } else { + 0; + } + }, + + # ------------------------------------------------------------------ + # Typos ------------------------------------------------------------ + # ------------------------------------------------------------------ + + sname("SLASH MISSING") => + sub { + if ($syntax =~ /Keywords|\[ *, *(\/|[A-Z][A-Z0-9_]* *=)/g) { + my $left = $` . $&; + my $right = $'; + my $n=0; + while ($right =~ s/((\[|\|)[ \n]*,[ \n]*)([A-Z][A-Z0-9_]*)( *(\||\]))/$1\/$3$4/s) { + $n++; + push @complaints, "Slash before keyword $3 missing"; + print "Fixing keyword $n $3\n" if $debug; + } + if ($n) { + $syntax = $left . $right; + $act = 1; + 1; + } else { + 0; + } + } + }, + + # ------------------------------------------------------------------ + # Keyword omissions ------------------------------------------------ + # ------------------------------------------------------------------ + + sname("SLASH SEPARATED FROM KEYWORD") => + sub { + if ($syntax =~ /\/\s+[A-Z][A-Z0-9_]+/) { + $act = ($syntax =~ s|/\s+([A-Z][A-Z0-9_]+)|/$1|); + push @complaints,"space between slash and KEYWORD $1"; + 1; + } else { + 0; + } + }, + + # ------------------------------------------------------------------ + # Miscellaneous ---------------------------------------------------- + # ------------------------------------------------------------------ + + sname("Class missing")=> + sub { + if ($syntax =~ /->\s*[a-zA-Z0-9_]+/) { + return 1 unless $title; + ($class)=split(/::/,$title); + $act = ($syntax =~ s/(->\s*)/$1\[${class}::\]/); + push @complaints,"Class missing from syntax entry"; + 1; + } else { + 0; + } + }, + + sname("Obj-> & CLASS BRACKETS") => + sub { + if ($syntax =~ /=\s*[a-zA-Z0-9_]+::Init/) { + $act = + ($syntax =~ s/(=\s*)([a-zA-Z0-9_]+::)Init/$1 Obj ->\[$2\]Init/); + push @complaints,"'Obj->' and class brackets missing"; + 1; + } else { + 0; + } + } + ); + + # UPDATE: Special keyword sections + # -------------------------------- + # Some keywords are present only in special, separate sections, + # either for brevity, or because many routine refer to them (e.g., + # Graphics Keywords). This hash helps locate these sections, and + # parse the keywords from it, so that routines relevant keywords can + # be linked to it correctly. The format is a hash with keywords + # identifying the section (e.g. as used with + # %has_special_section_keywords in the syntax special matchers, and + # matching the file <title> of the given section), and the value is + # a reference to a hash with two subroutines. The first,"matcher", + # identifies whether a given file corresponds to one of the special + # sections (it can use $title or $file_contents to help it), and the + # second, "parser", parses the keywords in the special section into + # %txt_kwds. + + # The hash with the matchers + %special_sections = + ( + Graphics=> + { matcher=> sub { + $title=~"Graphics Keywords"; + }, + parser=> sub { + $file_contents=~s|.*<h3\s*class="Heading2">\s*<a name="[0-9]+">\s*</a>Graphics? Keywords List||s; + parse_keywords($file_contents,1,'<h3\s*class="Heading2">'); + } + }, + MultiThreading=> + { matcher=>sub { + $title=~"Thread Pool Keywords"; + }, + parser=>sub { + $file_contents=~s|.*<h3\s*class="Heading2">\s*<a name="[0-9]+">\s*</a>Thread Pool Keywords||s; + parse_keywords($file_contents,1); + } + }, + Device=> + { matcher=>sub { + $title=~"Keywords Accepted by the IDL Devices"; + }, + parser=>sub { + parse_keywords($file_contents,1); + } + }); + + + # UPDATE: Special help topics + # --------------------------- + + # Some words which can show up in IDL source code have special + # help topics associated with them. This hash links downcase + # versions of these special sections (which must match the title + # of the file to be searched), to the appropriate words. Even if + # word and topic are the same, it must be mentioned here in order + # to trigger help on this word to visit that special section. + %special_topics = + ( + ("begin...end"=> {words=>["begin"]}), + (case=> {words=>["case","endcase","of"]}), + (common=> {words=>["common"]}), + (compile_opt=> {words=>["compile_opt"]}), + (continue=> {words=>["continue"]}), + ("repeat...until"=> {words=>["endrep","repeat","until"]}), + ("while...do"=> {words=>["while","do","endwhile"]}), + ("if...then...else"=> {words=>["if","then","endif","else","endelse"]}), + (pro=> {words=>["pro","end"]}), + (for=> {words=>["for","endfor"]}), + (switch=> {words=>["switch","endswitch"]}), + (return=> {words=>["return"]}) + ); + + # The standard Emacs file header notice. + $part_of_gnu_emacs = <<EOF; + +This file is part of GNU Emacs. + +GNU Emacs is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2, or (at your option) +any later version. + +GNU Emacs is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Emacs; see the file COPYING. If not, write to the +Free Software Foundation, Inc., 59 Temple Place - Suite 330, +Boston, MA 02111-1307, USA. +EOF + + # Make the Lisp versions, with the comment chars. + ($lisp_part_of_gnu_emacs = $part_of_gnu_emacs) =~ s/^.*?\S/;; $&/mg; +} + + +__END__ + +=head1 NAME + +get_html_rinfo - Extract information from IDL HTML documentation for +use with IDLWAVE + +=head1 SYNOPSIS + +B<get_html_rinfo> [-path path/to/html/files] + [-debug] + [-idl path/to/idl] + [-xname regexp] + [-xclass regexp] + +=head1 DESCRIPTION + +I<get_html_rinfo> processes the HTML documentation provided with IDL +(the Interactive Data Language, a proprietary language owned by +Research Systems Inc.). It also talks to the IDL program itself for +additional info. It produces the files needed by the Emacs IDLWAVE +mode to support completion, short help (routine info display) and +extensive online HTML help. It produces the file: + +=over 4 + +=item F<idlw-rinfo.el> + +Contains the routine information needed for completion and compact +routine info display, along with URL links into the HTML help system +for routines, keywords, system variables, and special statement. This +file must be installed on the Emacs load path and is part of the +standard IDLWAVE distribution. + +=back + +=head1 OPTIONS + +=over 5 + +=item B<-path> B<path/to/html/files> + +Specify the path to the HTML documentation files, as converted from +IDL's HTMLHelp docs using MS HTMLHelp Workshop. Default is the +current directory. + +=item B<-idl> B</name/or/path/of/idl/executable> + +Specify the name of the IDL executable to talk to for additional +information. The default is F<idl>. + +=item B<-debug> + +Debugging. Writes diagnostics to STDERR and creates these +files: + +F<get_html_rinfo.log>: A copy of the messages printed to STDERR. + +F<get_html_rinfo.rej>: All syntax entries which could not be parsed +correctly, together with an explanation. + +F<get_html_rinfo.cpl>: Syntax entries which contain typos or other +inconsistencies. I<get_html_rinfo> actually fixes most of these problems, +but you should feel free to complain about them to RSI. + +=item B<-xname> B<regexp> + +Exclude routine names matched by the regular expression REGEXP. +For example setting this to "^CDF_" will not store information +about routines starting with the "CDF_" prefix. + +=item B<-xclass> B<regexp> + +Exclude object classes matched by the regular expression REGEXP. + +Options may be abbreviated to uniqueness +=head1 FILES + +F<get_html_rinfo.log>, F<get_html_rinfo.cpl>, F<get_html_rinfo.rej> + +=head1 AUTHOR + +J.D. Smith <jdsm...@as.arizona.edu> + +=cut