These extensions to FixCRLF.java provide the following functionality:
Allow CR as a line ending
Allow for any tab interval from 2 to 80
Allow for preservation of TAB chars in string and character literals
Corresponding changes to fixcrlf.html.
Note that the patchfile contains both sets of changes even though they
are in different directories. How should this be handled?
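
For anyone reviewing, here is a minimal sketch of the arithmetic behind the first two items. It is not code from the patch: the class name WhitespaceSketch and both method names are hypothetical. nextTabStop uses the same "column + tablength - column % tablength" expression that the patched FixCRLF.java uses, which works for any interval rather than only powers of 2. normalizeEol only illustrates the documented rule that CR, LF, CRLF and the special case CR-CR-LF each count as one EOL; the patch itself does this line by line.

```java
// Illustrative only: hypothetical helper, not part of the patch.
final class WhitespaceSketch {
    private WhitespaceSketch() {}

    /**
     * Returns the 0-based column of the next tab stop after the given column.
     * Works for any interval in the 2..80 range accepted by the patch.
     */
    static int nextTabStop(int column, int tabLength) {
        if (tabLength < 2 || tabLength > 80) {
            throw new IllegalArgumentException("tablength must be between 2 and 80");
        }
        return column + tabLength - column % tabLength;
    }

    /**
     * Replaces every EOL in text with the given eol string.
     * CR-CR-LF, CR-LF, a lone CR and a lone LF each count as one EOL.
     */
    static String normalizeEol(String text, String eol) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        int n = text.length();
        while (i < n) {
            char c = text.charAt(i);
            if (c == '\r') {
                if (text.startsWith("\r\r\n", i)) {
                    i += 3;            // special case: CR-CR-LF is one EOL
                } else if (text.startsWith("\r\n", i)) {
                    i += 2;            // ordinary CRLF pair
                } else {
                    i += 1;            // lone CR
                }
                out.append(eol);
            } else if (c == '\n') {
                i += 1;                // lone LF
                out.append(eol);
            } else {
                out.append(c);
                i += 1;
            }
        }
        return out.toString();
    }
}
```

With a tab width of 3, a TAB at column 4 advances to column 6, which the old power-of-2 bit masking could not express.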
Peter
--
Peter B. West [EMAIL PROTECTED] http://powerup.com.au/~pbwest
"Lord, to whom shall we go?"
Index: fixcrlf.html
===================================================================
RCS file: /home/cvspublic/jakarta-ant/docs/manual/CoreTasks/fixcrlf.html,v
retrieving revision 1.2
diff -u -u -r1.2 fixcrlf.html
--- fixcrlf.html 2001/02/13 12:31:51 1.2
+++ fixcrlf.html 2001/06/23 03:35:17
@@ -1,168 +1,309 @@
-<html>
+ <html>
+
+ <head>
+ <meta http-equiv="Content-Language" content="en-us">
+ <title>Ant User Manual</title>
+ </head>
-<head>
-<meta http-equiv="Content-Language" content="en-us">
-<title>Ant User Manual</title>
-</head>
-
-<body>
-
-<h2><a name="fixcrlf">FixCRLF</a></h2>
-<h3>Description</h3>
-<p>Adjusts a text file to local.</p>
-<p>It is possible to refine the set of files that are being adjusted. This can be
-done with the <i>includes</i>, <i>includesfile</i>, <i>excludes</i>, <i>excludesfile</i> and <i>defaultexcludes</i>
-attributes. With the <i>includes</i> or <i>includesfile</i> attribute you specify the files you want to
-have included by using patterns. The <i>exclude</i> or <i>excludesfile</i> attribute is used to specify
-the files you want to have excluded. This is also done with patterns. And
-finally with the <i>defaultexcludes</i> attribute, you can specify whether you
-want to use default exclusions or not. See the section on <a
-href="../dirtasks.html#directorybasedtasks">directory based tasks</a>, on how the
-inclusion/exclusion of files works, and how to write patterns.</p>
-<p>This task forms an implicit <a href="../CoreTypes/fileset.html">FileSet</a> and
-supports all attributes of <code>&lt;fileset&gt;</code>
-(<code>dir</code> becomes <code>srcdir</code>) as well as the nested
-<code>&lt;include&gt;</code>, <code>&lt;exclude&gt;</code> and
-<code>&lt;patternset&gt;</code> elements.</p>
-<h3>Parameters</h3>
-<table border="1" cellpadding="2" cellspacing="0">
- <tr>
- <td valign="top"><b>Attribute</b></td>
- <td valign="top"><b>Description</b></td>
- <td align="center" valign="top"><b>Required</b></td>
- </tr>
- <tr>
- <td valign="top">srcDir</td>
- <td valign="top">Where to find the files to be fixed up.</td>
- <td valign="top" align="center">Yes</td>
- </tr>
- <tr>
- <td valign="top">destDir</td>
- <td valign="top">Where to place the corrected files. Defaults to
- srcDir (replacing the original file)</td>
- <td valign="top" align="center">No</td>
- </tr>
- <tr>
- <td valign="top">includes</td>
- <td valign="top">comma separated list of patterns of files that must be
- included. All files are included when omitted.</td>
- <td valign="top" align="center">No</td>
- </tr>
- <tr>
- <td valign="top">includesfile</td>
- <td valign="top">the name of a file. Each line of this file is
- taken to be an include pattern</td>
- <td valign="top" align="center">No</td>
- </tr>
- <tr>
- <td valign="top">excludes</td>
- <td valign="top">comma separated list of patterns of files that must be
- excluded. No files (except default excludes) are excluded when omitted.</td>
- <td valign="top" align="center">No</td>
- </tr>
- <tr>
- <td valign="top">excludesfile</td>
- <td valign="top">the name of a file. Each line of this file is
- taken to be an exclude pattern</td>
- <td valign="top" align="center">No</td>
- </tr>
- <tr>
- <td valign="top">defaultexcludes</td>
- <td valign="top">indicates whether default excludes should be used or not
- ("yes"/"no"). Default excludes are used when omitted.</td>
- <td valign="top" align="center">No</td>
- </tr>
- <tr>
- <td valign="top">cr</td>
- <td valign="top">Specifies how carriage return (CR) characters are to
- be handled. Valid values for this property are:
- <ul>
- <li>add: ensure that there is a CR before every LF</li>
- <li>asis: leave CR characters alone</li>
- <li>remove: remove all CR characters</li>
- </ul>
- Default is based on the platform on which you are running this task.
- For Unix platforms, the default is remove. For DOS based systems
- (including Windows), the default is add.
- <p>
- Note: Unless this property is specified as "asis", extra CR characters
- which do not precede a LF will be removed.</p>
- </td>
- <td valign="top" align="center">No</td>
- </tr>
- <tr>
- <td valign="top">tab</td>
- <td valign="top">Specifies how tab characters are to be handled. Valid
- values for this property are:
- <ul>
- <li>add: convert sequences of spaces which span a tab stop to tabs</li>
- <li>asis: leave tab and space characters alone</li>
- <li>remove: convert tabs to spaces</li>
- </ul>
- Default for this parameter is "asis".
- <p>
- Note: Unless this property is specified as "asis", extra spaces and
- tabs after the last non-whitespace character on the line will be removed.</p>
- </td>
- <td valign="top" align="center">No</td>
- </tr>
- <tr>
- <td valign="top">tablength</td>
- <td valign="top">The number of characters a TAB stop corresponds to.
- Must be a positive power of 2, default for this parameter is 8.</td>
- <td valign="top" align="center">No</td>
- </tr>
- <tr>
- <td valign="top">eof</td>
- <td valign="top">Specifies how DOS end of file (control-Z) characters are
- to be handled. Valid values for this property are:
- <ul>
- <li>add: ensure that there is an EOF character at the end of the file</li>
- <li>asis: leave EOF characters alone</li>
- <li>remove: remove any EOF character found at the end</li>
- </ul>
- Default is based on the platform on which you are running this task.
- For Unix platforms, the default is remove. For DOS based systems
- (including Windows), the default is asis.
- </td>
- <td valign="top" align="center">No</td>
- </tr>
-</table>
-<h3>Examples</h3>
-<pre> <fixcrlf srcdir="${src}"
- cr="remove" eof="remove"
+ <body>
+
+ <h2><a name="fixcrlf">FixCRLF</a></h2>
+
+ <h3>Description</h3>
+
+ <p>
+ Adjusts a text file to local conventions.
+ </p>
+
+ <p>
+ The set of files to be adjusted can be refined with the
+ <i>includes</i>, <i>includesfile</i>, <i>excludes</i>,
+ <i>excludesfile</i> and <i>defaultexcludes</i>
+ attributes. Patterns provided through the <i>includes</i> or
+ <i>includesfile</i> attributes specify files to be
+ included. Patterns provided through the <i>excludes</i> or
+ <i>excludesfile</i> attributes specify files to be
+ excluded. Additionally, default exclusions can be specified with
+ the <i>defaultexcludes</i> attribute. See the section on <a
+ href="../dirtasks.html#directorybasedtasks">directory based
+ tasks</a>, for details of file inclusion/exclusion patterns
+ and their usage.
+ </p>
+
+ <p>
+ This task forms an implicit <a href =
+ "../CoreTypes/fileset.html" >FileSet</a> and supports all
+ attributes of <code>&lt;fileset&gt;</code> (<code>dir</code>
+ becomes <code>srcdir</code>) as well as the nested
+ <code>&lt;include&gt;</code>, <code>&lt;exclude&gt;</code> and
+ <code>&lt;patternset&gt;</code> elements.
+ </p>
+
+ <p>
+ The output file is only written if it is a new file, or if it
+ differs from the existing file. This prevents spurious
+ rebuilds based on unchanged files which have been regenerated
+ by this task.
+ </p>
+
+ <h3>Parameters</h3>
+
+ <table border="1" cellpadding="2" cellspacing="0">
+ <tr>
+ <td valign="top"><b>Attribute</b></td>
+ <td valign="top"><b>Description</b></td>
+ <td align="center" valign="top"><b>Required</b></td>
+ </tr>
+ <tr>
+ <td valign="top">srcDir</td>
+ <td valign="top">Where to find the files to be fixed up.</td>
+ <td valign="top" align="center">Yes</td>
+ </tr>
+ <tr>
+ <td valign="top">destDir</td>
+ <td valign="top">Where to place the modified files. Defaults to
+ srcDir (replacing the original file)</td>
+ <td valign="top" align="center">No</td>
+ </tr>
+ <tr>
+ <td valign="top">includes</td>
+ <td valign="top">
+ Comma separated list of patterns of files that must be
+ included. All files are included when omitted.
+ </td>
+ <td valign="top" align="center">No</td>
+ </tr>
+ <tr>
+ <td valign="top">includesfile</td>
+ <td valign="top">
+ The name of a file. Each line of this file is taken to be
+ an include pattern
+ </td>
+ <td valign="top" align="center">No</td>
+ </tr>
+ <tr>
+ <td valign="top">excludes</td>
+ <td valign="top">
+ Comma separated list of patterns of files that must be
+ excluded. No files (except default excludes) are excluded
+ when omitted.
+ </td>
+ <td valign="top" align="center">No</td>
+ </tr>
+ <tr>
+ <td valign="top">excludesfile</td>
+ <td valign="top">
+ The name of a file. Each line of this file is taken to be
+ an exclude pattern
+ </td>
+ <td valign="top" align="center">No</td>
+ </tr>
+ <tr>
+ <td valign="top">defaultexcludes</td>
+ <td valign="top">
+ Indicates whether default excludes should be used or not
+ ("yes"/"no"). Default excludes are
+ used when omitted.
+ </td>
+ <td valign="top" align="center">No</td>
+ </tr>
+ <tr>
+ <td valign="top">eol</td>
+ <td valign="top">
+ Specifies how end-of-line (EOL) characters are to be
+ handled. The EOL characters are CR, LF and the pair CRLF.
+ Valid values for this property are:
+ <ul>
+ <li>asis: leave EOL characters alone</li>
+ <li>cr: convert all EOLs to a single CR</li>
+ <li>lf: convert all EOLs to a single LF</li>
+ <li>crlf: convert all EOLs to the pair CRLF</li>
+ </ul>
+ Default is based on the platform on which you are running
+ this task. For Unix platforms, the default is "lf".
+ For DOS based systems (including Windows), the default is
+ "crlf". Defaults may be added later for other systems,
+ e.g. Mac OS.
+ <p>
+ This is the preferred method for specifying EOL. The
+ "<i><b>addcr</b></i>" attribute (see below) is
+ now deprecated. If both are specified, "eol"
+ takes precedence.
+ </p>
+ <p>
+ <i>N.B.</i>: One special case is recognized. The three
+ characters CR-CR-LF are regarded as a single EOL.
+ Unless this property is specified as "asis",
+ this sequence will be converted into the specified EOL
+ type.
+ </p>
+ </td>
+ <td valign="top" align="center">No</td>
+ </tr>
+ <tr>
+ <td valign="top">addcr</td>
+ <td valign="top">
+ <i><b>Deprecated.</b></i> Specifies how CR characters are
+ to be handled at end-of-line (EOL). Valid values for this
+ property are:
+ <ul>
+ <li>asis: leave EOL characters alone.</li>
+ <li>
+ add: add a CR before any single LF characters. The
+ intent is to convert all EOLs to the pair CRLF.
+ </li>
+ <li>
+ remove: remove all CRs from the file. The intent is
+ to convert all EOLs to a single LF.
+ </li>
+ </ul>
+ Default is based on the platform on which you are running
+ this task. For Unix platforms, the default is "remove".
+ For DOS based systems (including Windows), the default is
+ "add".
+ <p>
+ <i>N.B.</i>: One special case is recognized. The three
+ characters CR-CR-LF are regarded as a single EOL.
+ Unless this property is specified as "asis",
+ this sequence will be converted into the specified EOL
+ type.
+ </p>
+ </td>
+ <td valign="top" align="center">No</td>
+ </tr>
+ <tr>
+ <td valign="top">javafiles</td>
+ <td valign="top">
+ Used only in association with the
+ "<i><b>tab</b></i>" attribute (see below), this
+ boolean attribute indicates whether the fileset is a set
+ of java source files
+ ("yes"/"no"). Defaults to
+ "no". See notes in section on "tab".
+ </td>
+ <td valign="top" align="center">No</td>
+ </tr>
+ <tr>
+ <td valign="top">tab</td>
+ <td valign="top">
+ Specifies how tab characters are to be handled. Valid
+ values for this property are:
+ <ul>
+ <li>
+ add: convert sequences of spaces which span a tab stop
+ to tabs
+ </li>
+ <li>asis: leave tab and space characters alone</li>
+ <li>remove: convert tabs to spaces</li>
+ </ul>
+ Default for this parameter is "asis".
+ <p>
+ <i>N.B.</i>: When the attribute
+ "<i><b>javafiles</b></i>" (see above) is
+ "true", literal TAB characters occurring
+ within Java string or character constants are never
+ modified. This functionality also requires the
+ recognition of Java-style comments.
+ </p>
+ </td>
+ <td valign="top" align="center">No</td>
+ </tr>
+ <tr>
+ <td valign="top">tablength</td>
+ <td valign="top">
+ TAB character interval. Valid values are between 2
+ and 80 inclusive. The default for this parameter is 8.
+ </td>
+ <td valign="top" align="center">No</td>
+ </tr>
+ <tr>
+ <td valign="top">eof</td>
+ <td valign="top">
+ Specifies how DOS end of file (control-Z) characters are
+ to be handled. Valid values for this property are:
+ <ul>
+ <li>
+ add: ensure that there is an EOF character at the end
+ of the file
+ </li>
+ <li>asis: leave EOF characters alone</li>
+ <li>
+ remove: remove any EOF character(s) found at the end
+ of the file
+ </li>
+ </ul>
+ Default is based on the platform on which you are running
+ this task. For Unix platforms, the default is remove.
+ For DOS based systems (including Windows), the default is
+ asis.
+ </td>
+ <td valign="top" align="center">No</td>
+ </tr>
+ </table>
+
+ <h3>Examples</h3>
+
+ <pre> <fixcrlf srcdir="${src}"
+ eol="lf" eof="remove"
includes="**/*.sh"
/></pre>
-<p>Removes carriage return and eof characters from the shell scripts. Tabs and
-spaces are left as is.</p>
-<pre> <fixcrlf srcdir="${src}"
- cr="add"
+ <p>
+ Replaces EOLs with LF characters and removes eof characters
+ from the shell scripts. Tabs and spaces are left as is.
+ </p>
+ <pre> <fixcrlf srcdir="${src}"
+ eol="crlf"
includes="**/*.bat"
/></pre>
-<p>Ensures that there are carriage return characters prior to evey line feed.
-Tabs and spaces are left as is.
-EOF characters are left alone if run on
-DOS systems, and are removed if run on Unix systems.</p>
-<pre> <fixcrlf srcdir="${src}"
+ <p>
+ Replaces all EOLs with cr-lf pairs in the batch files. Tabs
+ and spaces are left as is. EOF characters are left alone if
+ run on DOS systems, and are removed if run on Unix systems.
+ </p>
+ <pre> <fixcrlf srcdir="${src}"
tabs="add"
includes="**/Makefile"
/></pre>
-<p>Adds or removes CR characters to match local OS conventions, and
-converts spaces to tabs when appropriate. EOF characters are left alone if
-run on DOS systems, and are removed if run on Unix systems.
-Many versions of make require tabs prior to commands.</p>
-<pre> <fixcrlf srcdir="${src}"
+ <p>
+ Sets EOLs according to local OS conventions, and converts
+ sequences of spaces and tabs to the minimal set of spaces and
+ tabs which will maintain spacing within the line. Tabs are
+ set at 8 character intervals. EOF characters are left alone
+ if run on DOS systems, and are removed if run on Unix systems.
+ Many versions of make require tabs prior to commands.
+ </p>
+ <pre> <fixcrlf srcdir="${src}"
tabs="remove"
+ tablength="3"
+ eol="lf"
+ javafiles="yes"
+ includes="**/*.java"
+ /></pre>
+ <p>
+ Converts all EOLs in the included Java source files to a
+ single LF. Replaces all TAB characters except those in string
+ or character constants with spaces, assuming a tab width of 3.
+ If run on a Unix system, any CTRL-Z EOF characters at the end
+ of the file are removed. On DOS/Windows, any such EOF
+ characters will be left untouched.
+ </p>
+ <pre> <fixcrlf srcdir="${src}"
+ tabs="remove"
includes="**/README*"
/></pre>
-<p>Adds or removes CR characters to match local OS conventions, and
-converts all tabs to spaces. EOF characters are left alone if run on
-DOS systems, and are removed if run on Unix systems.
-You never know what editor a user will use to browse README's.</p>
-<hr>
-<p align="center">Copyright © 2000,2001 Apache Software Foundation. All rights
-Reserved.</p>
+ <p>
+ Sets EOLs according to local OS conventions, and converts all
+ tabs to spaces, assuming a tab width of 8. EOF characters are
+ left alone if run on DOS systems, and are removed if run on
+ Unix systems. You never know what editor a user will use to
+ browse README's.
+ </p>
+ <hr>
+ <p align="center">
+ Copyright © 2000,2001 Apache Software Foundation. All rights
+ Reserved.
+ </p>
-</body>
-</html>
+ </body>
+ </html>
Index: FixCRLF.java
===================================================================
RCS file: /home/cvspublic/jakarta-ant/src/main/org/apache/tools/ant/taskdefs/FixCRLF.java,v
retrieving revision 1.14
diff -u -u -r1.14 FixCRLF.java
--- FixCRLF.java 2001/04/03 11:26:26 1.14
+++ FixCRLF.java 2001/06/23 03:39:24
@@ -53,17 +53,20 @@
*/
package org.apache.tools.ant.taskdefs;
-
+import org.apache.tools.ant.Project;
import org.apache.tools.ant.BuildException;
import org.apache.tools.ant.DirectoryScanner;
-import org.apache.tools.ant.Project;
import org.apache.tools.ant.types.EnumeratedAttribute;
import java.io.*;
import java.util.*;
-import java.text.*;
/**
+ * FixWhiteSpace.java
+ *
+ * Based on FixCR.java
+ * by Sam Ruby <a href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a>.
+ *
* Task to convert text source files to local OS formatting conventions, as
* well as repair text files damaged by misconfigured or misguided editors or
* file transfer programs.
@@ -74,7 +77,7 @@
* <li>destdir
* <li>include
* <li>exclude
- * <li>cr
+ * <li>eol
* <li>tab
* <li>eof
* </ul>
@@ -82,42 +85,101 @@
* <p>
* When this task executes, it will scan the srcdir based on the include
* and exclude properties.
+ * <p>
+ * This version generalises the handling of EOL characters, and allows
+ * for CR-only line endings (which I suspect is the standard on Macs.)
+ * Tab handling has also been generalised to accommodate any tabwidth
+ * from 2 to 80, inclusive. Importantly, it will leave untouched any
+ * literal TAB characters embedded within string or character constants.
* <p>
- * <em>Warning:</em> do not run on binary or carefully formatted files.
- * this may sound obvious, but if you don't specify asis, presume that
- * your files are going to be modified. If you want tabs to be fixed,
+ * <em>Warning:</em> do not run on binary files.
+ * <em>Caution:</em> run with care on carefully formatted files.
+ * This may sound obvious, but if you don't specify asis, presume that
+ * your files are going to be modified. If "tabs" is "add" or "remove",
* whitespace characters may be added or removed as necessary. Similarly,
- * for CR's - in fact cr="add" can result in cr characters being removed.
- * (to handle cases where other programs have converted CRLF into CRCRLF).
+ * for CR's - in fact eol="crlf" can result in CR characters being removed
+ * in one special case carried over from the original version
+ * of this program (FixCRLF): CRCRLF is regarded as a single EOL,
+ * to handle cases where other programs have converted CRLF into CRCRLF.
*
+ * Created: Wed Jun 6 23:38:54 2001
+ *
* @author Sam Ruby <a href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a>
+ * @author <a href="mailto:[EMAIL PROTECTED]">Peter B. West</a>
+ * @version $Revision: 1.1 $ $Name: $
*/
-
-public class FixCRLF extends MatchingTask {
- private int addcr; // cr: -1 => remove, 0 => asis, +1 => add
- private int addtab; // tab: -1 => remove, 0 => asis, +1 => add
- private int ctrlz; // eof: -1 => remove, 0 => asis, +1 => add
- private int tablength = 8; // length of tab in spaces
+public class FixWhiteSpace extends MatchingTask
+{
+ private static final int UNDEF = -1;
+ private static final int NOTJAVA = 0;
+ private static final int LOOKING = 1;
+ private static final int IN_CHAR_CONST = 2;
+ private static final int IN_STR_CONST = 3;
+ private static final int IN_SINGLE_COMMENT = 4;
+ private static final int IN_MULTI_COMMENT = 5;
+
+ private static final int ASIS = 0;
+ private static final int CR = 1;
+ private static final int LF = 2;
+ private static final int CRLF = 3;
+ private static final int ADD = 1;
+ private static final int REMOVE = -1;
+ private static final int SPACES = -1;
+ private static final int TABS = 1;
+
+ private int tablength = 8;
+ private StringBuffer spaces = new StringBuffer(8);
+ private StringBuffer linebuf = new StringBuffer(1024);
+ private StringBuffer linebuf2 = new StringBuffer(1024);
+ private int eol;
+ private String eolstr;
+ private int ctrlz;
+ private String ctrlzstr;
+ private int tabs;
+ private boolean javafiles = false;
+ private int defaultState = NOTJAVA;
private File srcDir;
private File destDir = null;
+ public FixWhiteSpace () {
+ spaces.append(" ");
+ if (System.getProperty("line.separator").equals("\r")) {
+ eol = CR;
+ eolstr = new String("\r");
+ } else if (System.getProperty("line.separator").equals("\n")) {
+ eol = LF;
+ eolstr = new String("\n");
+ } else {
+ eol = CRLF;
+ eolstr = new String("\r\n");
+ }
+ if (System.getProperty("path.separator").equals(":")) {
+ ctrlz = REMOVE;
+ ctrlzstr = new String("");
+ }
+ else {
+ ctrlz = ASIS;
+ }
+
+ }
+
/**
- * Defaults the properties based on the system type.
- * <ul><li>Unix: cr="remove" tab="asis" eof="remove"
- * <li>DOS: cr="add" tab="asis" eof="asis"</ul>
- */
- public FixCRLF() {
- if (System.getProperty("path.separator").equals(":")) {
- addcr = -1; // remove
- ctrlz = -1; // remove
- } else {
- addcr = +1; // add
- ctrlz = 0; // asis
+ * Enumerated attribute with the values "asis", "add" and "remove".
+ */
+ public static class AddAsisRemove extends EnumeratedAttribute {
+ public String[] getValues() {
+ return new String[] {"add", "asis", "remove"};
}
}
+ public static class CrLf extends EnumeratedAttribute {
+ public String[] getValues() {
+ return new String[] {"asis", "cr", "lf", "crlf"};
+ }
+ }
+
/**
* Set the source dir to find the source text files.
*/
@@ -133,30 +195,46 @@
this.destDir = destDir;
}
+ /**
+ * Fixing Java source files?
+ */
+ public void setJavafiles(boolean javafiles) {
+ this.javafiles = javafiles;
+ defaultState = javafiles ? LOOKING : NOTJAVA;
+ }
+
+
/**
- * Specify how carriage return (CR) charaters are to be handled
+ * Specify how EndOfLine characters are to be handled
*
* @param option valid values:
* <ul>
- * <li>add: ensure that there is a CR before every LF
- * <li>asis: leave CR characters alone
- * <li>remove: remove all CR characters
+ * <li>asis: leave line endings alone
+ * <li>cr: convert line endings to CR
+ * <li>lf: convert line endings to LF
+ * <li>crlf: convert line endings to CRLF
* </ul>
*/
- public void setCr(AddAsisRemove attr) {
+ public void setEol(CrLf attr) {
String option = attr.getValue();
- if (option.equals("remove")) {
- addcr = -1;
- } else if (option.equals("asis")) {
- addcr = 0;
+ if (option.equals("asis")) {
+ eol = ASIS;
+ } else if (option.equals("cr")) {
+ eol = CR;
+ eolstr = new String("\r");
+ } else if (option.equals("lf")) {
+ eol = LF;
+ eolstr = new String("\n");
} else {
- // must be "add"
- addcr = +1;
+ // Must be "crlf"
+ eol = CRLF;
+ eolstr = new String("\r\n");
}
}
/**
- * Specify how tab charaters are to be handled
+ * Specify how tab characters outside string constants are to be handled.
+ * Note that this includes tabs within comments.
*
* @param option valid values:
* <ul>
@@ -168,26 +246,30 @@
public void setTab(AddAsisRemove attr) {
String option = attr.getValue();
if (option.equals("remove")) {
- addtab = -1;
+ tabs = SPACES;
} else if (option.equals("asis")) {
- addtab = 0;
+ tabs = ASIS;
} else {
// must be "add"
- addtab = +1;
+ tabs = TABS;
}
}
/**
* Specify tab length in characters
*
- * @param tlength specify the length of tab in spaces, has to be a power of 2
+ * @param tlength specify the length of tab in spaces, must be
+ * between 2 and 80 inclusive
*/
public void setTablength(int tlength) throws BuildException {
- if (tlength < 2 || (tlength & (tlength-1)) != 0) {
- throw new BuildException("tablength must be a positive power of 2",
- location);
+ if (tlength < 2 || tlength > 80) {
+ throw new BuildException("tablength must be between 2 and 80",
+ location);
}
tablength = tlength;
+ for (int i = 0; i < tablength; i++) {
+ spaces.append(' ');
+ }
}
/**
@@ -203,15 +285,16 @@
public void setEof(AddAsisRemove attr) {
String option = attr.getValue();
if (option.equals("remove")) {
- ctrlz = -1;
+ ctrlz = REMOVE;
+ ctrlzstr = new String("");
} else if (option.equals("asis")) {
- ctrlz = 0;
+ ctrlz = ASIS;
} else {
// must be "add"
- ctrlz = +1;
+ ctrlz = ADD;
+ ctrlzstr = new String("\u001A");
}
}
-
/**
* Executes the task.
*/
@@ -238,9 +321,10 @@
// log options used
log("options:" +
- " cr=" + (addcr==1 ? "add" : addcr==0 ? "asis" : "remove") +
- " tab=" + (addtab==1 ? "add" : addtab==0 ? "asis" : "remove") +
- " eof=" + (ctrlz==1 ? "add" : ctrlz==0 ? "asis" : "remove") +
+ " eol=" +
+ (eol==ASIS ? "asis" : eol==CR ? "cr" : eol==LF ? "lf" : "crlf") +
+ " tab=" + (tabs==TABS ? "add" : tabs==ASIS ? "asis" : "remove") +
+ " eof=" + (ctrlz==ADD ? "add" : ctrlz==ASIS ? "asis" : "remove") +
" tablength=" + tablength,
Project.MSG_VERBOSE);
@@ -248,182 +332,702 @@
String[] files = ds.getIncludedFiles();
for (int i = 0; i < files.length; i++) {
- File srcFile = new File(srcDir, files[i]);
+ processFile(files[i]);
+ }
- // read the contents of the file
- int count = (int)srcFile.length();
- byte indata[] = new byte[count];
+ }
+
+ private void processFile(String file)
+ throws BuildException
+ {
+ File srcFile = new File(srcDir, file);
+ File destFile = srcFile;
+ BufferedWriter outWriter;
+ BufferLine line;
+
+ // read the contents of the file
+ OneLiner lines = new OneLiner((int)srcFile.length());
+
+ try {
+ FileReader inReader = new FileReader(srcFile);
+ inReader.read(lines.getBuffer());
+ inReader.close();
+ } catch (IOException e) {
+ throw new BuildException(e);
+ }
+
+ // Set up the output Writer
+ try {
+ if (destDir != null) destFile = new File(destDir, file);
+ FileWriter writer = new FileWriter(destFile);
+ outWriter = new BufferedWriter(writer);
+ } catch (IOException e) {
+ throw new BuildException(e);
+ }
+
+ // Having read the whole buffer, process it now one line at a time
+ // processing TAB characters.
+ // Initialize OneLiner data
+ lines.setDatalen();
+
+ while (lines.hasNext()) {
+ // In-line states
+ int endComment;
+
try {
- FileInputStream inStream = new FileInputStream(srcFile);
- inStream.read(indata);
- inStream.close();
- } catch (IOException e) {
+ line = (BufferLine)lines.next();
+ } catch (NoSuchElementException e) {
throw new BuildException(e);
}
-
- // count the number of cr, lf, and tab characters
- int cr = 0;
- int lf = 0;
- int tab = 0;
-
- for (int k=0; k<count; k++) {
- byte c = indata[k];
- if (c == '\r') cr++;
- if (c == '\n') lf++;
- if (c == '\t') tab++;
- }
-
- // check for trailing eof
- boolean eof = ((count>0) && (indata[count-1] == 0x1A));
-
- // log stats (before fixes)
- log(srcFile + ": size=" + count + " cr=" + cr +
- " lf=" + lf + " tab=" + tab + " eof=" + eof,
- Project.MSG_VERBOSE);
-
- // determine the output buffer size (slightly pessimisticly)
- int outsize = count;
- if (addcr != 0) outsize-=cr;
- if (addcr == +1) outsize+=lf;
- if (addtab == -1) outsize+=tab*(tablength-1);
- if (ctrlz == +1) outsize+=1;
-
- // copy the data
- byte outdata[] = new byte[outsize];
- int o = 0; // output offset
- int line = o; // beginning of line
- int col = 0; // desired column
-
- for (int k=0; k<count; k++) {
- switch (indata[k]) {
- case (byte)' ':
- // advance column
- if (addtab == 0) outdata[o++]=(byte)' ';
- col++;
+
+ String lineString = line.getLineString();
+ int linelen = line.length();
+
+ // Note - all of the following processing NOT done for tabs ASIS
+
+ if (tabs == ASIS) {
+ // Just copy the body of the line across
+ try {
+ outWriter.write(lineString);
+ } catch (IOException e) {
+ throw new BuildException(e);
+ } // end of try-catch
+ }
+ else { // (tabs != ASIS)
+ int ptr;
+
+ while ((ptr = line.getNext()) < linelen) {
+
+ switch (lines.getState()) {
+
+ case NOTJAVA:
+ notInConstant(line, line.getEoline(), outWriter);
break;
-
- case (byte)'\t':
- if (addtab == 0) {
- // treat like any other character
- outdata[o++]=(byte)'\t';
- col++;
- } else {
- // advance column to next tab stop
- col = (col|(tablength-1))+1;
+
+ case IN_MULTI_COMMENT:
+ if ((endComment =
+ lineString.indexOf("*/", line.getNext())
+ ) >= 0)
+ {
+ // End of multiLineComment on this line
+ endComment += 2; // Include the end token
+ lines.setState(LOOKING);
+ }
+ else {
+ endComment = linelen;
}
- break;
-
- case (byte)'\r':
- if (addcr == 0) {
- // treat like any other character
- outdata[o++]=(byte)'\r';
- col++;
+
+ notInConstant(line, endComment, outWriter);
+ break;
+
+ case IN_SINGLE_COMMENT:
+ notInConstant(line, line.getEoline(), outWriter);
+ lines.setState(LOOKING);
+ break;
+
+ case IN_CHAR_CONST:
+ case IN_STR_CONST:
+ // Got here from LOOKING by finding an opening "\'"
+ // next points to that quote character.
+ // Find the end of the constant. Watch out for
+ // backslashes. Literal tabs are left unchanged, and
+ // the column is adjusted accordingly.
+
+ int begin = line.getNext();
+ char terminator = (lines.getState() == IN_STR_CONST
+ ? '\"'
+ : '\'');
+ endOfCharConst(line, terminator);
+ while (line.getNext() < line.getLookahead()) {
+ if (line.getNextCharInc() == '\t') {
+ line.setColumn(
+ line.getColumn() +
+ tablength -
+ line.getColumn() % tablength);
+ }
+ else {
+ line.incColumn();
+ }
}
+
+ // Now output the substring
+ try {
+ outWriter.write(
+ line.substring(begin, line.getNext()));
+ } catch (IOException e) {
+ throw new BuildException(e);
+ } // end of try-catch
+
+ lines.setState(LOOKING);
+
+ break;
+
+
+ case LOOKING:
+ nextStateChange(line);
+ notInConstant(line, line.getLookahead(), outWriter);
break;
+
+ } // end of switch (state)
+
+ } // end of while (line.getNext() < linelen)
+
+ } // end of else (tabs != ASIS)
+
+ // Handle end of line now
+ if (eol == ASIS) {
+ eolstr = line.getEol();
+ }
- case (byte)'\n':
- // start a new line (optional CR followed by LF)
- if (addcr == +1) outdata[o++]=(byte)'\r';
- outdata[o++]=(byte)'\n';
- line=o;
- col=0;
- break;
+ try {
+ outWriter.write(eolstr);
+ } catch (IOException e) {
+ throw new BuildException(e);
+ } // end of try-catch
+
+ } // end of while (lines.hasNext())
+ // Handle CTRLZ
+ try {
+ outWriter.write(ctrlzstr);
+ outWriter.close();
+ } catch (IOException e) {
+ throw new BuildException(e);
+ } // end of try-catch
+
+ }
- default:
- // add tabs if two or more spaces are required
- if (addtab>0 && o+1<line+col) {
- // determine logical column
- int diff=o-line;
-
- // add tabs until this column would be passed
- // note: the start of line is adjusted to match
- while ((diff|(tablength-1))<col) {
- outdata[o++]=(byte)'\t';
- line-=(tablength-1)-(diff&(tablength-1));
- diff=o-line;
- };
- };
-
- // space out to desired column
- while (o<line+col) outdata[o++]=(byte)' ';
-
- // append desired character
- outdata[o++]=indata[k];
- col++;
- }
- }
- // add or remove an eof character as required
- if (ctrlz == +1) {
- if (outdata[o-1]!=0x1A) outdata[o++]=0x1A;
- } else if (ctrlz == -1) {
- if (o>2 && outdata[o-1]==0x0A && outdata[o-2]==0x1A) o--;
- if (o>1 && outdata[o-1]==0x1A) o--;
- }
- // output the data
- try {
- // Determine whether it should be written,
- // that is if it is different than the potentially already existing file
- boolean write = false;
- byte[] existingdata = indata;
- File destFile = srcFile;
- if (destDir != null) {
- destFile = new File(destDir, files[i]);
- if(destFile.isFile()) {
- int len = (int)destFile.length();
- if(len != o) {
- write = true;
- } else {
- existingdata = new byte[len];
- try {
- FileInputStream in = new FileInputStream(destFile);
- in.read(existingdata);
- in.close();
- } catch (IOException e) {
- throw new BuildException(e);
- }
- }
- } else {
- write = true;
+ /**
+ * Scan a BufferLine for the next state changing token: the beginning
+ * of a single or multi-line comment, a character or a string constant.
+ *
+ * As a side-effect, sets the buffer state to the next state, and sets
+ * field lookahead to the first character of the state-changing token, or
+ * to the next eol character.
+ *
+ * @param bufline BufferLine containing the string
+ * to be processed
+ * @exception org.apache.tools.ant.BuildException
+ * Thrown when end of line is reached
+ * before the terminator is found.
+ */
+ private void nextStateChange(BufferLine bufline)
+ throws BuildException
+ {
+ int eol = bufline.getEoline();
+ int ptr = bufline.getNext();
+
+
+ // Look for next single or double quote, double slash or slash star
+ while (ptr < eol) {
+ switch (bufline.getChar(ptr++)) {
+ case '\'':
+ bufline.parent.setState(IN_CHAR_CONST);
+ bufline.setLookahead(--ptr);
+ return;
+ case '\"':
+ bufline.parent.setState(IN_STR_CONST);
+ bufline.setLookahead(--ptr);
+ return;
+ case '/':
+ if (ptr < eol) {
+ if (bufline.getChar(ptr) == '*') {
+ bufline.parent.setState(IN_MULTI_COMMENT);
+ bufline.setLookahead(--ptr);
+ return;
+ }
+ else if (bufline.getChar(ptr) == '/') {
+ bufline.parent.setState(IN_SINGLE_COMMENT);
+ bufline.setLookahead(--ptr);
+ return;
}
}
+ break;
+ } // end of switch (bufline.getChar(ptr++))
+
+ } // end of while (ptr < eol)
+ // Eol is the next token
+ bufline.setLookahead(ptr);
+ }
- if(!write) {
- if(existingdata.length != o) {
- write = true;
- } else {
- for(int j = 0; j < o; ++j) {
- if(existingdata[j] != outdata[j]) {
- write = true;
- break;
- }
- }
- }
+
+ /**
+ * Scan a BufferLine forward from the 'next' pointer
+ * for the end of a character constant. Set 'lookahead' pointer to the
+ * character following the terminating quote.
+ *
+ * @param bufline BufferLine containing the string
+ * to be processed
+ * @param terminator The constant terminator
+ *
+ * @exception org.apache.tools.ant.BuildException
+ * Thrown when end of line is reached
+ * before the terminator is found.
+ */
+ private void endOfCharConst(BufferLine bufline, char terminator)
+ throws BuildException
+ {
+ int ptr = bufline.getNext();
+ int eol = bufline.getEoline();
+ char c;
+ ptr++; // skip past initial quote
+ while (ptr < eol) {
+ if ((c = bufline.getChar(ptr++)) == '\\') {
+ ptr++;
+ }
+ else {
+ if (c == terminator) {
+ bufline.setLookahead(ptr);
+ return;
}
+ }
+ } // end of while (ptr < eol)
+ // Must have fallen through to the end of the line
+ throw new BuildException("endOfCharConst: unterminated char constant");
+ }
+
- if(write) {
- log(destFile + " is being written", Project.MSG_VERBOSE);
- FileOutputStream outStream = new FileOutputStream(destFile);
- outStream.write(outdata,0,o);
- outStream.close();
- } else {
- log(destFile + " is not written, as the contents are identical",
- Project.MSG_VERBOSE);
+ /**
+ * Process a BufferLine string which is not part of a string constant.
+ * The start position of the string is given by the 'next' field.
+ * Sets the 'next' and 'column' fields in the BufferLine.
+ *
+ * @param bufline BufferLine containing the string
+ * to be processed
+ * @param end Index just past the end of the
+ * string
+ * @param outWriter Sink for the processed string
+ */
+ private void notInConstant(BufferLine bufline, int end,
+ BufferedWriter outWriter)
+ {
+ // N.B. both column and string index are zero-based
+ // Process a string not part of a constant;
+ // i.e. convert tabs<->spaces as required
+ // This is NOT called for ASIS tab handling
+ int nextTab;
+ int nextStop;
+ int tabspaces;
+ String line = bufline.substring(bufline.getNext(), end);
+ int place = 0; // Zero-based
+ int col = bufline.getColumn(); // Zero-based
+
+ // process sequences of white space
+ // first convert all tabs to spaces
+ linebuf.setLength(0);
+ while ((nextTab = line.indexOf((int) '\t', place)) >= 0) {
+ linebuf.append(line.substring(place, nextTab)); // copy to the TAB
+ col += nextTab - place;
+ tabspaces = tablength - (col % tablength);
+ linebuf.append(spaces.substring(0, tabspaces));
+ col += tabspaces;
+ place = nextTab + 1;
+ } // end of while
+ linebuf.append(line.substring(place, line.length()));
+ // if converting to spaces, all finished
+ String linestring = linebuf.toString();
+ if (tabs == REMOVE) {
+ try {
+ outWriter.write(linestring);
+ } catch (IOException e) {
+ throw new BuildException(e);
+ } // end of try-catch
+ }
+ else { // tabs == ADD
+ int tabCol;
+ linebuf2.setLength(0);
+ place = 0;
+ col = bufline.getColumn();
+ int placediff = col; // column corresponding to string index 0
+ // for the length of the string, cycle through the tab stop
+ // positions, checking for a space preceded by at least one
+ // other space at the tab stop. if so replace the longest possible
+ // preceding sequence of spaces with a tab.
+ nextStop = col + (tablength - col % tablength);
+ if (nextStop - col < 2) {
+ linebuf2.append(linestring.substring(
+ place, nextStop - placediff));
+ place = nextStop - placediff;
+ nextStop += tablength;
+ }
+
+ for ( ; nextStop - placediff <= linestring.length()
+ ; nextStop += tablength)
+ {
+ for (tabCol = nextStop;
+ --tabCol - placediff >= place
+ && linestring.charAt(tabCol - placediff) == ' '
+ ;)
+ {
+ ; // Loop for the side-effects
+ }
+ // tabCol is column index of the last non-space character
+ // before the next tab stop
+ if (nextStop - tabCol > 2) {
+ linebuf2.append(linestring.substring(
+ place, ++tabCol - placediff));
+ linebuf2.append('\t');
}
+ else {
+ linebuf2.append(linestring.substring(
+ place, nextStop - placediff));
+ } // end of else
+
+ place = nextStop - placediff;
+ } // end of for (nextStop ... )
+
+ // pick up that last bit, if any
+ linebuf2.append(linestring.substring(place, linestring.length()));
+
+ try {
+ outWriter.write(linebuf2.toString());
} catch (IOException e) {
throw new BuildException(e);
+ } // end of try-catch
+
+ } // end of else tabs == ADD
+
+ // Set column position as modified by this method
+ bufline.setColumn(bufline.getColumn() + linestring.length());
+ bufline.setNext(end);
+
+ }
+
+
+ class OneLiner implements Iterator
+ {
+ private static final char CTRLZ = '\u001A';
+
+ private char[] buffer;
+ private int state = defaultState;
+
+ private int next = UNDEF;
+ private int datalen = UNDEF;
+ private String ctrlzstr;
+
+ public OneLiner(int size)
+ throws BuildException
+ {
+ if (size < 0) {
+ throw new BuildException("OneLiner: negative buffer size");
+ }
+
+ buffer = new char[size];
+ }
+
+ public void setDatalen()
+ throws BuildException
+ {
+ if (buffer == null) {
+ throw new BuildException("setDatalen: null buffer");
}
+ next = 0;
+ // Isolate any trailing Ctrl-Z characters
+ datalen = buffer.length;
+ while (datalen > 0 && buffer[datalen - 1] == CTRLZ) {
+ datalen--;
+ }
+ ctrlzstr = new String(buffer, datalen, buffer.length - datalen);
+ }
+
+ public int getNext() {
+ return next;
+ }
+
+ public int getDatalen() {
+ return datalen;
+ }
+
+ public String getCtrlzStr() {
+ return ctrlzstr;
+ }
+
+ public char[] getBuffer() {
+ return buffer;
+ }
+
+ public int getState() {
+ return state;
+ }
+
+ public void setState(int state) {
+ this.state = state;
+ }
- } /* end for */
+ public boolean hasNext()
+ {
+ return next < datalen - 1;
+ }
+
+ public Object next()
+ throws NoSuchElementException
+ {
+ if (! hasNext()) {
+ throw new NoSuchElementException("OneLiner");
+ }
+
+ // Find the next line in the buffer
+ int first = next;
+ int startNext = UNDEF;
+ // startNext marks the beginning of the following line
+ while (next < datalen && startNext < next) {
+ switch (buffer[next]) {
+ case '\r':
+ // Check for \r, \r\n and \r\r\n
+ // Regard \r\r not followed by \n as two lines
+ startNext = next + 1;
+ if (startNext < datalen) {
+ switch (buffer[startNext]) {
+ case '\r':
+ if ((startNext + 1) < datalen
+ && buffer[startNext + 1] == '\n') {
+ startNext += 2;
+ }
+ break;
+
+ case '\n':
+ ++startNext;
+ break;
+
+ } // end of switch (buffer[startNext])
+
+ } // end of if (startNext < datalen)
+ break;
+
+ case '\n':
+ startNext = next + 1;
+ break;
+ default:
+ next++;
+ break;
+ }
+
+ } // end of while (next < datalen && startNext < next)
+ if (startNext < next) { // Ran out of buffer without an EOL
+ startNext = next;
+ }
+ int eolstr = next;
+ next = startNext;
+ return new BufferLine(this, first, eolstr, next);
+ }
+
+ public void remove()
+ throws UnsupportedOperationException
+ {
+ throw new UnsupportedOperationException("OneLiner");
+ }
+
}
- /**
- * Enumerated attribute with the values "asis", "add" and "remove".
- */
- public static class AddAsisRemove extends EnumeratedAttribute {
- public String[] getValues() {
- return new String[] {"add", "asis", "remove"};
+
+
+ class BufferLine {
+ private OneLiner parent;
+ private int start = UNDEF;
+ private int eolStart = UNDEF;
+ private int startNext = UNDEF;
+ private int next = 0;
+ private int eoline = UNDEF; // offset within line of first eol char
+ private int column = 0;
+ private int lookahead = UNDEF;
+
+ public BufferLine(OneLiner linebuf)
+ throws BuildException
+ {
+ if (linebuf == null) {
+ throw new BuildException("BufferLine: null parent");
+ }
+ if (linebuf.buffer == null) {
+ throw new BuildException("BufferLine: null char buffer");
+ }
+
+ parent = linebuf;
+ }
+
+ public BufferLine(OneLiner lbuf, int start,
+ int eolStart, int startNext)
+ throws BuildException
+ {
+ this(lbuf);
+ if (start < 0 || eolStart < 0 || startNext < 0 ||
+ startNext > lbuf.buffer.length || eolStart > startNext ||
+ start > eolStart) {
+ throw new BuildException("BufferLine: indices out of range");
+ }
+ this.start = start;
+ this.eolStart = eolStart;
+ this.startNext = startNext;
+ this.eoline = eolStart - start;
+ this.next = 0;
+ this.column = 0;
+ }
+
+ public char[] getCharBuffer() {
+ return parent.getBuffer();
+ }
+
+ public int getEoline() {
+ return eoline;
+ }
+
+ public int getNext() {
+ return next;
+ }
+
+ public void setNext(int next) {
+ this.next = next;
+ }
+
+ public int getLookahead() {
+ return lookahead;
+ }
+
+ public void setLookahead(int lookahead) {
+ this.lookahead = lookahead;
}
+
+ public char getChar(int i)
+ throws BuildException
+ {
+ if (start + i < eolStart) {
+ return parent.getBuffer()[start + i];
+ }
+ else {
+ throw new BuildException
+ ("Pointer overflow: getChar");
+ }
+ }
+
+ public char getNextChar()
+ throws BuildException
+ {
+ if (next < eoline) {
+ return parent.getBuffer()[start + next];
+ }
+ else {
+ throw new BuildException
+ ("Pointer overflow: getNextChar");
+ }
+ }
+
+ public char getNextCharInc()
+ throws BuildException
+ {
+ if (next < eoline) {
+ return parent.getBuffer()[start + next++];
+ }
+ else {
+ throw new BuildException
+ ("Pointer overflow: getNextCharInc");
+ }
+ }
+
+ public void setChar(char c, int i)
+ throws BuildException
+ {
+ if (i < eoline) {
+ parent.getBuffer()[start + i] = c;
+ }
+ else {
+ throw new BuildException
+ ("Line overflow: setChar");
+ }
+ }
+
+ public void setNextChar(char c)
+ throws BuildException
+ {
+ if (next < eoline) {
+ parent.getBuffer()[start + next] = c;
+ }
+ else {
+ throw new BuildException
+ ("Line overflow: setNextChar");
+ }
+ }
+
+ public void setNextCharInc(char c)
+ throws BuildException
+ {
+ if (next < eoline) {
+ parent.getBuffer()[start + next++] = c;
+ }
+ else {
+ throw new BuildException
+ ("Line overflow: setNextCharInc");
+ }
+ }
+
+ public int getColumn() {
+ return column;
+ }
+
+ public void setColumn(int col) {
+ column = col;
+ }
+
+ public int incColumn() {
+ return column++;
+ }
+
+ public int length() {
+ return eolStart - start;
+ }
+
+ public int getEolLength() {
+ return startNext - eolStart;
+ }
+
+ public String getLineString()
+ throws BuildException
+ {
+ if (start >= 0) {
+ return new String(parent.getBuffer(), start, eolStart - start);
+ }
+ else {
+ throw new BuildException(
+ "Not initialized: getLineString");
+ }
+ }
+
+ public String getEol()
+ throws BuildException
+ {
+ if (start >= 0) {
+ return new String
+ (parent.getBuffer(), eolStart, startNext - eolStart);
+ } else {
+ throw new BuildException("Not initialized: getEol");
+ }
+ }
+
+ public String substring(int begin) {
+ return this.substring(begin, eoline);
+ }
+
+ public String substring(int begin, int end)
+ throws BuildException
+ {
+ if (begin >= 0) {
+ if (begin < eoline && begin < end) {
+ return new String(
+ parent.getBuffer(),
+ start + begin,
+ end < eoline ? end - begin : eoline - begin);
+ }
+ else {
+ throw new BuildException
+ ("Begin index too big: substring");
+ }
+ }
+ else {
+ throw new BuildException(
+ "Not initialized: substring");
+ }
+
+ }
+
}
+
}
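
For reviewers who want to check the two core rules in the patch without
applying it, here is a minimal standalone sketch. It is not part of the
patch; the class name FixCrlfSketch and the methods expandTabs and eolLength
are hypothetical names chosen for illustration. expandTabs follows the
column arithmetic used in notInConstant (pad to the next multiple of the tab
interval, counting from a zero-based starting column), and eolLength follows
the end-of-line rules in OneLiner.next (\n, \r, \r\n and \r\r\n are single
line endings; \r\r not followed by \n counts as two lines).

```java
public class FixCrlfSketch {

    /**
     * Expand every TAB in line to spaces. startCol is the zero-based column
     * at which the first character of line sits; tabLength is the tab interval.
     */
    static String expandTabs(String line, int startCol, int tabLength) {
        StringBuilder out = new StringBuilder();
        int col = startCol;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\t') {
                // Same arithmetic as the patch: distance to the next tab stop
                int pad = tabLength - (col % tabLength);
                for (int k = 0; k < pad; k++) {
                    out.append(' ');
                }
                col += pad;
            } else {
                out.append(c);
                col++;
            }
        }
        return out.toString();
    }

    /**
     * Length of the line ending that starts at buf[pos], or 0 if buf[pos]
     * is not an end-of-line character.
     */
    static int eolLength(char[] buf, int pos) {
        if (buf[pos] == '\n') {
            return 1;
        }
        if (buf[pos] != '\r') {
            return 0;
        }
        if (pos + 1 < buf.length) {
            if (buf[pos + 1] == '\n') {
                return 2;                       // \r\n
            }
            if (buf[pos + 1] == '\r'
                && pos + 2 < buf.length
                && buf[pos + 2] == '\n') {
                return 3;                       // \r\r\n
            }
        }
        return 1;                               // lone \r
    }

    public static void main(String[] args) {
        // Brackets make the padding visible in the output
        System.out.println("[" + expandTabs("a\tb", 0, 8) + "]");
        System.out.println(eolLength("x\r\r\ny".toCharArray(), 1));
    }
}
```

The sketch works on a String and a plain char array rather than on the
BufferLine and OneLiner classes, so it says nothing about the state handling
for comments and literals.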