Hi there Brett
brett.t.stew...@exxonmobil.com wrote:
Both textscan() and textread() allow the specification of headerlines like
this:
textscan(fid,'%s','headerlines',N)
in which N is the number of lines to skip. However, with the current
version, if you try to specify N, it always uses N = 2.
Thanks for the report.
You sent it to the wrong forum, but I guess you didn't know.
octave-dev is for octave-forge packages, i.e. add-on packages that are
not part of "core Octave" (they are maintained not by the Octave
developers sensu stricto but by other folks).
As both textread and textscan are in core Octave, you'd better ask for
help on help-oct...@octave.org (the Help mailing list).
Actually, even that is not quite right; the folks there would rather
you add an entry to the bug tracker.
I'll do that for you (later tonight). I have already found and fixed the
bug (the same one in both functions), and over the last few months I have
also fixed a couple of other bugs in textread and friends.
You can help me by swapping the attached strread.m, textread.m and
textscan.m into the io package in place of the old versions (first
rename the old ones to _textread.m and _textscan.m).
Type "which textscan.m" (without quotes) in Octave to find out where
they are located.
Please report back whether or not the attached versions work OK.
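A quick way to check the attached versions is a small script that writes a file with a known number of header lines and reads it back. The sketch below assumes a textscan that honours the "headerlines" option (the attached version, or the built-in one in a current Octave); the file contents and the N values are made up for the check. Using N values other than 2 is deliberate, since the report says N = 2 was always used.

```octave
## Hypothetical check: write N header lines followed by the values 1, 2, 3,
## then read them back with textscan and the "headerlines" option.
got = cell (1, 3);
for N = [1, 3]                        # deliberately avoid N = 2
  fname = tempname ();
  fid = fopen (fname, "w");
  fprintf (fid, "header %d\n", 1:N);  # N header lines
  fprintf (fid, "%d\n", 1:3);         # data lines
  fclose (fid);

  fid = fopen (fname, "r");
  C = textscan (fid, "%f", "headerlines", N);
  fclose (fid);
  delete (fname);

  got{N} = C{1};
  printf ("N = %d: read %s\n", N, mat2str (C{1}.'));
endfor
```

If header skipping works, both lines should report [1 2 3]; skipping too few lines leaves a "header" line in front of the data, skipping too many drops leading values.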
Philip
## Copyright (C) 2009-2011 Eric Chassande-Mottin, CNRS (France)
##
## This file is part of Octave.
##
## Octave is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or (at
## your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING. If not, see
## <http://www.gnu.org/licenses/>.
## -*- texinfo -*-
## @deftypefn {Function File} {[@var{a}, @dots{}] =} strread (@var{str})
## @deftypefnx {Function File} {[@var{a}, @dots{}] =} strread (@var{str}, @var{format})
## @deftypefnx {Function File} {[@var{a}, @dots{}] =} strread (@var{str}, @var{format}, @var{prop1}, @var{value1}, @dots{})
## Read data from a string.
##
## The string @var{str} is split into words that are repeatedly matched to the
## specifiers in @var{format}. The first word is matched to the first
## specifier, the second to the second specifier and so forth. If there are
## more words than specifiers, the process is repeated until all words have
## been processed.
##
## The string @var{format} describes how the words in @var{str} should be
## parsed. It may contain any combination of the following specifiers:
## @table @code
## @item %s
## The word is parsed as a string.
##
## @item %d
## @itemx %f
## The word is parsed as a number.
##
## @item %*
## The word is skipped.
## @end table
##
## Parsed words corresponding to the first specifier are returned in the first
## output argument, and likewise for the rest of the specifiers.
##
## By default, @var{format} is @t{"%f"}, meaning that numbers are read from
## @var{str}.
##
## For example, the string
##
## @example
## @group
## @var{str} = "\
## Bunny Bugs 5.5\n\
## Duck Daffy -7.5e-5\n\
## Penguin Tux 6"
## @end group
## @end example
##
## @noindent
## can be read using
##
## @example
## [@var{a}, @var{b}, @var{c}] = strread (@var{str}, "%s %s %f");
## @end example
##
## The behavior of @code{strread} can be changed via property-value
## pairs. The following properties are recognized:
##
## @table @asis
## @item "commentstyle"
## Parts of @var{str} are considered comments and will be skipped.
## @var{value} is the comment style and can be any of the following.
## @itemize
## @item "shell"
## Everything from @code{#} characters to the nearest end-line is skipped.
##
## @item "c"
## Everything between @code{/*} and @code{*/} is skipped.
##
## @item "c++"
## Everything from @code{//} characters to the nearest end-line is skipped.
##
## @item "matlab"
## Everything from @code{%} characters to the nearest end-line is skipped.
## @end itemize
##
## @item "delimiter"
## Any character in @var{value} will be used to split @var{str} into words
## (default: the "whitespace" characters, or "\n" if no whitespace is
## defined).
##
## @item "whitespace"
## Any character in @var{value} will be interpreted as whitespace and
## trimmed; the string defining whitespace must be enclosed in double
## quotes for proper processing of special characters like \t.
##
## @item "emptyvalue"
## Numeric fields of the output for which no word is available are filled
## with @var{value} (default: 0).
## @end table
##
## @seealso{textread, load, dlmread, fscanf}
## @end deftypefn
function varargout = strread (str, format = "%f", varargin)
## Check input
if (nargin < 1)
print_usage ();
endif
if (!ischar (str) || !ischar (format))
error ("strread: STR and FORMAT arguments must be strings");
endif
## Parse options
comment_flag = false;
numeric_fill_value = 0;
white_spaces = " \n\r\t\b";
delimiter_str = "";
for n = 1:2:length (varargin)
switch (lower (varargin {n}))
case "commentstyle"
comment_flag = true;
switch (lower (varargin {n+1}))
case "c"
comment_specif = {"/*", "*/"};
case "c++"
comment_specif = {"//", "\n"};
case "shell"
comment_specif = {"#", "\n"};
case "matlab"
comment_specif = {"%", "\n"};
otherwise
comment_flag = false;
warning ("strread: unknown comment style '%s'", varargin{n+1});
endswitch
case "delimiter"
delimiter_str = varargin {n+1};
case "emptyvalue"
numeric_fill_value = varargin {n+1};
case "bufsize"
## XXX: We could synthesize this, but that just seems weird...
warning ("strread: property \"bufsize\" is not implemented");
case "whitespace"
white_spaces = varargin {n+1};
case "expchars"
warning ("strread: property \"expchars\" is not implemented");
otherwise
warning ("strread: unknown property \"%s\"", varargin {n});
endswitch
endfor
if (isempty (delimiter_str))
if (~isempty (white_spaces))
delimiter_str = white_spaces;
else
## Default delimiter = newline
delimiter_str = "\n";
endif
endif
## Parse format string
idx = strfind (format, "%")';
specif = format ([idx, idx+1]);
nspecif = length (idx);
idx_star = strfind (format, "%*");
nfields = length (idx) - length (idx_star);
if (max (nargout, 1) != nfields)
error ("strread: the number of output variables must match that specified by FORMAT");
endif
## Remove comments
if (comment_flag)
cstart = strfind (str, comment_specif{1});
cstop = strfind (str, comment_specif{2});
if (length (cstart) > 0)
## Ignore nested openers.
[idx, cidx] = unique (lookup (cstop, cstart), "first");
if (idx(end) == length (cstop))
cidx(end) = []; # Drop the last one if orphaned.
endif
cstart = cstart(cidx);
endif
if (length (cstop) > 0)
## Ignore nested closers.
[idx, cidx] = unique (lookup (cstart, cstop), "first");
if (idx(1) == 0)
cidx(1) = []; # Drop the first one if orphaned.
endif
cstop = cstop(cidx);
endif
len = length (str);
c2len = length (comment_specif{2});
str = cellslices (str, [1, cstop + c2len], [cstart - 1, len]);
str = [str{:}];
endif
## Determine the number of words per line
format = strrep (format, "%", " %");
[~, ~, ~, fmt_words] = regexp (format, "[^ ]+");
num_words_per_line = numel (fmt_words);
for m = 1:numel(fmt_words)
## Convert formats such as "%Ns" to "%s" (see the FIXME below)
if (length (fmt_words{m}) > 2)
if (strcmp (fmt_words{m}(1:2), "%*"))
fmt_words{m} = "%*";
elseif (fmt_words{m}(1) == "%")
fmt_words{m} = fmt_words{m}([1, end]);
endif
endif
endfor
if (~isempty (white_spaces))
## Check for overlapping whitespaces and delimiters & trim whitespace
[ovlp, iw, ~] = intersect (white_spaces, delimiter_str);
if (~isempty (ovlp))
## Remove delimiter chars from white_spaces
white_spaces = cell2mat (strsplit (white_spaces, white_spaces(iw)));
endif
endif
if (~isempty (white_spaces))
## Remove repeated white_space chars. First find white_spaces positions
idx = strchr (str, white_spaces);
## Find repeated white_spaces
idx2 = ~(idx(2:end) - idx(1:end-1) - 1);
## Set all whitespace chars to spaces
## FIXME: this implies real spaces are always part of white_spaces
str(idx(find (idx))) = ' ';
## Set all repeated white_space to \0
str(idx(find (idx2))) = "\0";
str = strsplit (str, "\0");
## Reconstruct trimmed str
str = cell2mat (str);
endif
## Split 'str' into words
words = split_by (str, delimiter_str);
if (~isempty (white_spaces))
## Trim leading and trailing white_spaces
words = strtrim (words);
endif
num_words = numel (words);
num_lines = ceil (num_words / num_words_per_line);
## For each specifier
k = 1;
for m = 1:num_words_per_line
data = words (m:num_words_per_line:end);
## Map to format
## FIXME - add support for formats like "%4s" or "<%s>", "%[a-zA-Z]"
## Someone with regexp experience is needed.
switch fmt_words{m}
case "%s"
data (end+1:num_lines) = {""};
varargout {k} = data';
k++;
case {"%d", "%f"}
n = cellfun (@isempty, data);
data = str2double (data);
data(n) = numeric_fill_value;
data (end+1:num_lines) = numeric_fill_value;
varargout {k} = data.';
k++;
case {"%*", "%*s"}
## skip the word
otherwise
## Ensure descriptive content is consistent
if (numel (unique (data)) > 1
|| ! strcmpi (unique (data), fmt_words{m}))
error ("strread: FORMAT does not match data");
endif
endswitch
endfor
endfunction
function out = split_by (text, sep)
sep = union (sep, "\n"); # Why would newline always have to be a separator?
pat = sprintf ("[^%s]+", sep);
[~, ~, ~, out] = regexp (text, pat);
out(cellfun (@isempty, out)) = {""};
endfunction
%!test
%! [a, b] = strread ("1 2", "%f%f");
%! assert (a == 1 && b == 2);
%!test
%! str = "# comment\n# comment\n1 2 3";
%! [a, b] = strread (str, '%d %s', 'commentstyle', 'shell');
%! assert (a, [1; 3]);
%! assert (b, {"2"; ""});
%!test
%! str = '';
%! a = rand (10, 1);
%! b = char (round (65 + 20 * rand (10, 1)));
%! for k = 1:10
%! str = sprintf ('%s %.6f %s\n', str, a (k), b (k));
%! endfor
%! [aa, bb] = strread (str, '%f %s');
%! assert (a, aa, 1e-5);
%! assert (cellstr (b), bb);
%!test
%! str = '';
%! a = rand (10, 1);
%! b = char (round (65 + 20 * rand (10, 1)));
%! for k = 1:10
%! str = sprintf ('%s %.6f %s\n', str, a (k), b (k));
%! endfor
%! aa = strread (str, '%f %*s');
%! assert (a, aa, 1e-5);
%!test
%! str = sprintf ('/* this is\nacomment*/ 1 2 3');
%! a = strread (str, '%f', 'commentstyle', 'c');
%! assert (a, [1; 2; 3]);
%!test
%! str = sprintf ("Tom 100 miles/hr\nDick 90 miles/hr\nHarry 80 miles/hr");
%! fmt = "%s %f miles/hr";
%! c = cell (1, 2);
%! [c{:}] = strread (str, fmt);
%! assert (c{1}, {"Tom"; "Dick"; "Harry"})
%! assert (c{2}, [100; 90; 80])
%!test
%! a = strread ("a b c, d e, , f", "%s", "delimiter", ",");
%! assert (a, {"a b c"; "d e"; ""; "f"});
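Stripped of comment handling, delimiters and whitespace trimming, the core of strread is a round-robin split: words m, m + nfmt, m + 2*nfmt, ... go to specifier m. The sketch below reproduces only that step. roundrobin_parse is a hypothetical name, it splits on whitespace alone, and it pads missing numeric values with 0 (the default "emptyvalue" in strread above).

```octave
1;  # script marker, so this file may define functions

## Hypothetical helper: distribute whitespace-separated words of STR over
## the specifiers in FMT_WORDS ("%s", "%f"/"%d" or "%*"), round-robin.
function varargout = roundrobin_parse (str, fmt_words)
  words = regexp (str, '\S+', 'match');   # split into words
  nfmt = numel (fmt_words);
  nlines = ceil (numel (words) / nfmt);
  k = 1;
  for m = 1:nfmt
    data = words(m:nfmt:end);             # every nfmt-th word, from word m
    switch (fmt_words{m})
      case "%s"
        data(end+1:nlines) = {""};        # pad a short last line
        varargout{k} = data(:);
        k = k + 1;
      case {"%f", "%d"}
        vals = str2double (data);
        vals(end+1:nlines) = 0;           # same default fill as strread
        varargout{k} = vals(:);
        k = k + 1;
      case "%*"
        ## skip this column entirely
    endswitch
  endfor
endfunction

str = "Bunny Bugs 5.5\nDuck Daffy -7.5e-5\nPenguin Tux 6";
[a, b, c] = roundrobin_parse (str, {"%s", "%s", "%f"});
```

With the string from the strread help text, a and b come back as 3x1 cell arrays of strings and c as a 3x1 numeric column. A short last line is padded, so roundrobin_parse ("1 2 3", {"%f", "%s"}) gives [1; 3] and {"2"; ""}, matching the shell-comment test above.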
## Copyright (C) 2009-2011 Eric Chassande-Mottin, CNRS (France)
##
## This file is part of Octave.
##
## Octave is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or (at
## your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING. If not, see
## <http://www.gnu.org/licenses/>.
## -*- texinfo -*-
## @deftypefn {Function File} {[@var{a}, @dots{}] =} textread (@var{filename})
## @deftypefnx {Function File} {[@var{a}, @dots{}] =} textread (@var{filename}, @var{format})
## @deftypefnx {Function File} {[@var{a}, @dots{}] =} textread (@var{filename}, @var{format}, @var{prop1}, @var{value1}, @dots{})
## Read data from a text file.
##
## The file @var{filename} is read and parsed according to @var{format}. The
## function behaves like @code{strread} except it works by parsing a file
## instead of a string. See the documentation of @code{strread} for details.
## In addition to the options supported by @code{strread}, this function
## supports one more:
## @itemize
## @item "headerlines":
## The first @var{value} lines of @var{filename} are skipped.
## @end itemize
## @seealso{strread, load, dlmread, fscanf}
## @end deftypefn
## Updates:
## Philip Nienhuis <prnienh...@users.sf.net>
## 2011-03-18 Fix default whitespace setting to same as ML
## 2011-04-08 Fix headerline processing
function varargout = textread (filename, format = "%f", varargin)
## Check input
if (nargin < 1)
print_usage ();
endif
if (!ischar (filename) || !ischar (format))
error ("textread: first and second input arguments must be strings");
endif
## Read file
fid = fopen (filename, "r");
if (fid == -1)
error ("textread: could not open '%s' for reading", filename);
endif
## Maybe skip header lines. Only the first occurrence of the keyword is used
headerlines = find (strcmpi (varargin, "headerlines"), 1);
if (! isempty (headerlines))
h_lines = varargin{headerlines + 1};
## Beware of (possibly computed) zero value for headerline
if (h_lines > 0), fskipl (fid, h_lines); endif
varargin(headerlines:headerlines+1) = [];
endif
str = fread (fid, "char=>char").';
fclose (fid);
## If needed, set up default whitespace param value
if (! any (strcmpi (varargin, "whitespace")))
nargs = numel (varargin);
varargin(nargs+1:nargs+2) = {'whitespace', " \b\t"};
endif
## Call strread to make it do the real work
[varargout{1:max (nargout, 1)}] = strread (str, format, varargin {:});
endfunction
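textread itself only opens the file, skips the requested header lines and hands the remaining text to strread. A minimal sketch of the header step, using a hypothetical helper read_after_header, is below. Unlike the attached code it skips lines with an fgetl loop instead of fskipl, so N = 0 simply means "skip nothing" and needs no special case.

```octave
1;  # script marker, so this file may define functions

## Hypothetical helper: skip NHEADER lines of FILENAME and return the
## rest of the file as a single char row vector.
function str = read_after_header (filename, nheader)
  fid = fopen (filename, "r");
  if (fid == -1)
    error ("read_after_header: could not open '%s' for reading", filename);
  endif
  for k = 1:nheader              # no iterations at all when nheader == 0
    if (! ischar (fgetl (fid)))  # fgetl returns -1 at end of file
      break;
    endif
  endfor
  str = fread (fid, "char=>char").';
  fclose (fid);
endfunction

## Usage: two header lines followed by a 2x2 block of numbers
fname = tempname ();
fid = fopen (fname, "w");
fprintf (fid, "title\nunits\n1 2\n3 4\n");
fclose (fid);
s2 = read_after_header (fname, 2);   # "1 2\n3 4\n"
s0 = read_after_header (fname, 0);   # the whole file
vals = sscanf (s2, "%f");            # [1; 2; 3; 4]
delete (fname);
```

The returned text is exactly what a parser such as strread would then see; here sscanf stands in for it.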
## Copyright (C) 2010-2011 Ben Abbott <bpabb...@mac.com>
##
## This file is part of Octave.
##
## Octave is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or (at
## your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING. If not, see
## <http://www.gnu.org/licenses/>.
## -*- texinfo -*-
## @deftypefn {Function File} {@var{C} =} textscan (@var{fid}, @var{format})
## @deftypefnx {Function File} {@var{C} =} textscan (@var{fid}, @var{format}, @
## @var{n})
## @deftypefnx {Function File} {@var{C} =} textscan (@var{fid}, @var{format}, @
## @var{param}, @var{value}, @dots{})
## @deftypefnx {Function File} {@var{C} =} textscan (@var{fid}, @var{format}, @
## @var{n}, @var{param}, @var{value}, @dots{})
## @deftypefnx {Function File} {@var{C} =} textscan (@var{str}, @dots{})
## @deftypefnx {Function File} {[@var{C}, @var{position}] =} textscan (@dots{})
## Read data from a text file.
##
## The file associated with @var{fid} is read and parsed according to
## @var{format}. The function behaves like @code{strread} except it works by
## parsing a file instead of a string. See the documentation of
## @code{strread} for details. In addition to the options supported by
## @code{strread}, this function supports one more:
## @itemize
## @item "headerlines":
## The first @var{value} lines of the file are skipped.
## @end itemize
##
## The optional input, @var{n}, specifies the number of lines to be read from
## the file associated with @var{fid}.
##
## The output, @var{C}, is a cell array whose length is given by the number
## of format specifiers.
##
## The second output, @var{position}, provides the position, in characters,
## from the beginning of the file.
##
## @seealso{dlmread, fscanf, load, strread, textread}
## @end deftypefn
## Updates:
## Philip Nienhuis <prnienh...@users.sf.net>
## 2011-04-08 Fix headerline arg processing bug
function [C, p] = textscan (fid, format, varargin)
## Check input
if (nargin < 1)
print_usage ();
elseif (nargin == 1 || isempty (format))
format = "%f";
endif
if (nargin > 2 && isnumeric (varargin{1}))
nlines = varargin{1};
args = varargin(2:end);
else
nlines = Inf;
args = varargin;
endif
if (! any (strcmpi (args, "emptyvalue")))
## Matlab returns NaNs for missing values
args{end+1} = "emptyvalue";
args{end+1} = NaN;
endif
if (isa (fid, "double") && fid > 0 || ischar (fid))
if (ischar (format))
if (ischar (fid))
if (nargout == 2)
error ("textscan: cannot provide position information for character input");
endif
str = fid;
else
## Maybe skip header lines
headerlines = find (strcmpi (args, "headerlines"), 1);
if (! isempty (headerlines))
h_lines = args{headerlines + 1};
## Beware of zero headerline value, fskipl will count lines to EOF then
if (h_lines > 0), fskipl (fid, h_lines); endif
args(headerlines:headerlines+1) = [];
endif
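if (isfinite (nlines))
str = "";
for n = 1:nlines
## fgets returns -1 at end of file; stop there instead of appending it
txt = fgets (fid);
if (! ischar (txt))
break;
endif
## Plain concatenation keeps the trailing newline of each line
str = [str, txt];
endfor
else
str = fread (fid, "char=>char").';
endif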
endif
## Determine the number of data fields
num_fields = numel (strfind (format, "%")) - ...
numel (idx_star = strfind (format, "%*"));
## Call strread to make it do the real work
C = cell (1, num_fields);
[C{:}] = strread (str, format, args{:});
if (ischar (fid) && isfinite (nlines))
C = cellfun (@(x) x(1:nlines), C, "uniformoutput", false);
endif
if (nargout == 2)
p = ftell (fid);
endif
else
error ("textscan: FORMAT must be a valid specification");
endif
else
error ("textscan: first argument must be a file id or character string");
endif
endfunction
%!test
%! str = "1, 2, 3, 4\n 5, , , 8\n 9, 10, 11, 12";
%! fmtstr = "%f %d %f %s";
%! c = textscan (str, fmtstr, 2, "delimiter", ",", "emptyvalue", -Inf);
%! assert (isequal (c{1}, [1;5]))
%! assert (length (c{1}), 2);
%! assert (iscellstr (c{4}))
%! assert (isequal (c{3}, [3; -Inf]))
%!test
%! b = [10:10:100];
%! b = [b; 8*b/5];
%! str = sprintf ("%g miles/hr = %g kilometers/hr\n", b);
%! fmt = "%f miles/hr = %f kilometers/hr";
%! c = textscan (str, fmt);
%! assert (b(1,:)', c{1})
%! assert (b(2,:)', c{2})
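The optional N argument of textscan limits how many lines are read from the file. A hedged sketch of that step uses a hypothetical helper read_n_lines: it stops early at end of file, where fgets returns -1 instead of a string, and it joins lines with plain concatenation so every newline survives for the parser.

```octave
1;  # script marker, so this file may define functions

## Hypothetical helper: read at most N lines from FID, newlines included.
function str = read_n_lines (fid, n)
  str = "";
  for k = 1:n
    txt = fgets (fid);      # fgets keeps the trailing newline
    if (! ischar (txt))     # -1 signals end of file
      break;
    endif
    str = [str, txt];
  endfor
endfunction

## Usage: a three-line file read as two lines, then "the rest"
fname = tempname ();
fid = fopen (fname, "w");
fprintf (fid, "a 1\nb 2\nc 3\n");
fclose (fid);
fid = fopen (fname, "r");
first2 = read_n_lines (fid, 2);   # "a 1\nb 2\n"
rest = read_n_lines (fid, 10);    # only "c 3\n" is left
fclose (fid);
delete (fname);
```

Asking for more lines than remain simply returns what is left, rather than failing at end of file.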
_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev