[newbie] UNIX INTRO: Shell Processing

Richard Myers Fri, 30 Jul 1999 03:01:40 -0700

If you feel the material I've been sending out has been too advanced,
please send me email (it doesn't need to go to the mailing list, unless
you want it to...) at:

  [EMAIL PROTECTED]

...to let me know. We can do an intro to the intro.

Some of the material in this lesson is *intermediate* level.

If you are a beginner, don't feel bad if this material seems confusing.
I'm hoping it may be challenging (perhaps even stimulating) for those
who've been doing UNIX for a while.

--------------
         Shell
    Processing
--------------
--------------
        Digest
--------------

The kernel is the software "brain" of the computer system. The kernel
performs low-level and system-level functions, such as managing memory,
processes, and devices. Programs communicate with the kernel through an
interface that is complex and highly technical.

The shell is built around the kernel. The shell is that part of a
UNIX/Linux system that protects you, the user, from the complexity of the
kernel, and protects the kernel from any inappropriate input by the user.

In this lesson we will explore how the shell processes the command line.

We have already discussed the ways in which UNIX utilities can be combined
to create powerful tools. Such techniques can be applied in scripts, or
can be typed in directly at the command line. In fact, the shell processes
each line of a script in a very similar fashion to processing the command
line.

Each command typed at the command prompt is first evaluated by the
shell. The shell performs some checks and, if all goes well, processes a
(possibly modified) command string, passing data to or collecting data
from any utilities which are called.

     [ Quickie definitions: a command is how you tell the system 
     to do something. Specifically, a command is a series of 
     characters that you type. A command may call a utility, 
     which is simply a program on the system which may be used
     for a particular purpose. Utility programs are sometimes 
     referred to as tools. A command may also call a script,
     or execute some other shell function. ]

  o The shell launches programs, maintains variables, handles redirection
and pipes, expands wildcards, and interfaces with the filesystem.

  o Data may be channeled to or collected from files via redirection.

  o Utilities operate on data according to their specific function, taking
input as command line arguments, or as standard input (for example, data
passed from the shell via a pipe).

  o The shell and the utilities work together to process information. In
other words, the shell does some things, and the utilities do some things.
By experimentation, it is possible to discover which does what.

--------------
      In-depth
--------------

--------------
   Redirection
--------------

Here is a clue to an earlier lesson's mystery: a redirection may appear
anywhere on the command line. We can echo characters to a file:

  $ echo xyz > xyzfile
  $ cat xyzfile
  xyz
  $

If we insert a space between the characters, the space is preserved:

  $ echo x y z > xyzfile
  $ cat xyzfile
  x y z
  $

This is because x, y, and z become three separate arguments, and echo
writes its arguments separated by a single space.

If we insert a lot of spaces, they still come out as one space, because
the extra unquoted spaces only separate arguments and are not passed along:

  $ echo x     y     z     >    xyzfile
  $ cat xyzfile
  x y z
  $
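To see what the shell actually hands to a command, here is a small sketch. It assumes a POSIX-compatible sh, and count_args is a hypothetical function written just for this example: it reports how many arguments it received, then prints them the way echo would.

```shell
#!/bin/sh
# count_args is a hypothetical helper: print the argument count,
# then the arguments themselves (echo separates them with one space).
count_args() {
    echo "$# argument(s):" "$@"
}

count_args x y z              # 3 argument(s): x y z
count_args x     y     z      # 3 argument(s): x y z  (extra spaces only separate words)
count_args "x     y     z"    # 1 argument(s): x     y     z  (quotes keep the spaces)
```

The quoted line is included only for contrast; quoting comes up again later in the lesson.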

Now let's put the redirection before the string to be redirected. First
(for test purposes) we will ensure that the file is empty:

  $ > xyzfile
  $ cat xyzfile
  $

...and then:

  $ echo > xyzfile xyz
  $ cat xyzfile
  xyz
  $

...and the redirection still works.

We can even put the redirection in the middle of the string. Zero the
file again:

  $ > xyzfile
  $ cat xyzfile
  $

...and now our test:

  $ echo x > xyzfile y z
  $ cat xyzfile
  x y z
  $

The redirection consists of the redirection character and the file it
points to. In this case, the redirection consists of:

  > xyzfile

It doesn't matter where the redirection appears on the command line.
The shell will extract it, process it, and pass only the remaining words
to the command as arguments.

OK, confession time. Zeroing the file really isn't necessary in the above
examples. Simple redirection overwrites whatever was in the file. I just
wanted to make the point that nothing was left over from before.
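The experiments above can be replayed as a script. This is a sketch assuming a POSIX-style sh and a mktemp that supports -d (common on Linux and BSD systems, but not guaranteed everywhere); the file names first, second, third, and fourth are made up for the example.

```shell
#!/bin/sh
# Work in a scratch directory that is removed when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

echo x y z > first           # redirection at the end
echo > second x y z          # redirection before the arguments
echo x > third y z           # redirection in the middle

cat first second third       # each file contains "x y z"

echo old contents > fourth
echo new > fourth            # ">" overwrites: nothing of "old contents" survives
cat fourth                   # new
```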

--------------------
         Introducing
     standard error,
  and the difference 
        between echo 
             and cat
--------------------

We have been dealing with output redirection. Now let's take a look at
input redirection and standard error as we explore the difference between
echo and cat.

Standard error is (normally) written to the screen, just like standard
output.

Study this series of commands for a moment:

  $ echo This is the contents of xyzfile > xyzfile
  $

  $ cat xyzfile
  This is the contents of xyzfile
  $

  $ echo < xyzfile x y z
  x y z
  $

  $ cat < xyzfile
  This is the contents of xyzfile
  $

  $ cat < xyzfile x y z
  ksh: cat: x: cannot open [No such file or directory]
  ksh: cat: y: cannot open [No such file or directory]
  ksh: cat: z: cannot open [No such file or directory]
  $

What exactly has happened here?

  o The echo command writes arguments to the standard output. 

  o The cat command reads the contents of files in sequence, then writes
them to the standard output.

  o Remember that the standard output normally means the screen.
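Here is a sketch that captures both behaviors in variables so they can be compared side by side. It assumes a POSIX-style sh and mktemp -d; from_echo and from_cat are just illustrative variable names.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

echo This is the contents of xyzfile > xyzfile

from_echo=$(echo < xyzfile x y z)   # stdin is connected, but echo never reads it
from_cat=$(cat < xyzfile)           # no filename arguments, so cat reads stdin

echo "echo printed: $from_echo"     # echo printed: x y z
echo "cat printed:  $from_cat"      # cat printed:  This is the contents of xyzfile
```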

                 --------------------------------------
Here's a quote from the text _UNIX System V, A Practical Guide, Third
Edition_, by Mark Sobell:

"Using cat with input redirection from a file yields the same result as
giving a cat command with the filename as an argument. The cat utility is
a member of a class of UNIX utilities that function in this manner. Other
members of this class of utilities are lp, sort, and grep." -page 100
                 --------------------------------------

So we might expect cat and echo to work somewhat differently, even
though both write text to the screen.

In this command sequence:

  $ echo < xyzfile x y z
  x y z
  $

...the shell processes the input redirection, connecting file xyzfile to
echo's standard input

     [ but echo never reads its standard input, so nothing is done
     with the file's contents; they certainly don't display!
     <grin> See errata, end of lesson. ]

...then echo prints the three remaining arguments on the screen.

Catting xyzfile is straightforward, and doing that here verifies what is
in file xyzfile:

  $ cat xyzfile
  This is the contents of xyzfile
  $

We can also use input redirection to accomplish the same thing: 

  $ cat < xyzfile
  This is the contents of xyzfile
  $

But what is this next sequence?

  $ cat < xyzfile x y z
  ksh: cat: x: cannot open [No such file or directory]
  ksh: cat: y: cannot open [No such file or directory]
  ksh: cat: z: cannot open [No such file or directory]
  $

The cat utility is looking for files x, y, and z. We don't have files by
those names.

Why didn't cat print the contents of xyzfile to the screen?

Here is a similar example.

  $ echo This is the contents of abcfile > abcfile
  $

We have created abcfile. Then:

  $ cat < abcfile xyzfile
  This is the contents of xyzfile
  $

The cat utility prints only the contents of xyzfile, the one filename
left in its argument list once the shell has removed the redirection.

The cat utility is certainly capable of printing the contents of both
files using just one command: 

  $ cat abcfile xyzfile
  This is the contents of abcfile
  This is the contents of xyzfile
  $

We can get a clue if we examine the order in which things happen. There is
no file in our directory with the name "nofile". See what happens:

  $ cat abcfile xyzfile nofile
  -ksh: cat: nofile: cannot open [No such file or directory]
  This is the contents of abcfile
  This is the contents of xyzfile
  $

Notice! The nofile filename is last in the argument list, yet we get the
error message first! 

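Why is the error printed first? Notice that the message still begins
with "cat:". It is cat, not the shell, that opens each filename argument
and reports the missing file. The most likely explanation for the order
is buffering: error messages sent to standard error are written out
immediately, while ordinary output sent to standard output may be held
in a buffer and written a little later. The two streams can therefore
reach the screen out of order.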

Let's run the same command a second time:

  $ cat abcfile xyzfile nofile
  -ksh: cat: nofile: cannot open [No such file or directory]
  This is the contents of abcfile
  This is the contents of xyzfile
  $

And we see the error reported first in this run too.

We still haven't solved the mystery of what happened to the contents of
abcfile in this example:

  $ cat < abcfile xyzfile
  This is the contents of xyzfile
  $

If anyone has an answer to this, tell us!
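One experiment that may help: POSIX describes cat as reading standard input only when it is given no file operands, or at the point where an operand is a single hyphen (-). The sketch below, assuming a POSIX-style sh and cat plus mktemp -d, tests that description by adding a - to the command.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

echo This is the contents of abcfile > abcfile
echo This is the contents of xyzfile > xyzfile

cat < abcfile xyzfile        # abcfile is on stdin, but cat only reads xyzfile
echo ---
cat < abcfile - xyzfile      # "-" asks cat to read stdin (abcfile) at that point
```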


----------------
ADVANCED CONCEPT
----------------

What exactly is the difference between:

  $ cat < filename

...and:

  $ cat filename

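??? As a practical matter for beginning UNIXers, there is little
difference; most examples will do exactly the same thing.

[ Technically, the difference is who opens the file. With "cat filename",
cat opens the file itself and knows its name. With "cat < filename", the
shell opens the file and connects it to cat's standard input, so cat never
sees the name. A script may also use input redirection to keep a program
from reading anything else from the script's own input. ]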


-------------
    Important
      Concept
-------------

Here is an extremely important concept that you will explore in greater
detail later: 

In UNIX you have standard input, standard output, and standard error. 
Like standard output, standard error is normally printed to the screen. 
However, these two important output streams are handled separately, and
they may be redirected. 
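As a small preview of that concept, here is a sketch that sends the two streams to different files. It assumes a POSIX-style sh and mktemp -d; 2> is the standard form for redirecting standard error (file descriptor 2), and out.txt and err.txt are arbitrary names. The exact wording of cat's error message varies between systems.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

echo This is the contents of abcfile > abcfile

# nofile does not exist, so cat writes a complaint to standard error.
# cat then exits with a failure status; "|| true" ignores that here.
cat abcfile nofile > out.txt 2> err.txt || true

echo "standard output went to out.txt:"
cat out.txt
echo "standard error went to err.txt:"
cat err.txt
```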

----------------
    Redirection, 
      Pipes, and
  UNIX Utilities
----------------

Consider the output of the who utility:

  $ who
  rmyers     term/r0E     May 16 23:40
  iceberg    term/r0G     May 16 23:33
  spirkle    pts001       May 16 23:40
  eshephar   pts002       May 16 23:43
  jadams     pts003       May 16 23:49
  $

This output could be much "friendlier". What if it displayed a header
that identified each column of output? Imagine output like this:

  $ who
  user ID    DEVICE      LOGIN DATE/TIME
  ---------  ---------   ---------------
  rmyers     term/r0E     May 16 23:40
  iceberg    term/r0G     May 16 23:33
  spirkle    pts001       May 16 23:40
  eshephar   pts002       May 16 23:43
  jadams     pts003       May 16 23:49
  wjones     pts004       May 16 23:22
  $

But what if we then wanted a sorted list, using a command such as this?

  $ who | sort
  ---------  ---------   ---------------
  eshephar   pts002       May 16 23:43
  iceberg    term/r0G     May 16 23:33
  jadams     pts003       May 16 23:49
  rmyers     term/r0E     May 16 23:40
  spirkle    pts001       May 16 23:40
  user ID    DEVICE      LOGIN DATE/TIME
  wjones     pts004       May 16 23:22
  $

Our friendly header from the who utility suddenly looks more like a
footer. Not good.

Here's a nice trick-- using the word count command wc (page 744) to count
users: 

  $ who
  rmyers     term/r0E     May 16 23:40
  spirkle    term/r0G     May 16 23:33
  $ who | wc -l
         2
  $

There are two users logged in. The -l option (this is not a "one", this
is "l" for (l)ine) tells wc to count the number of lines in its input.
Since who prints one line per login session, the count equals the number
of users, except that a user logged in on two terminals is counted twice.

But what if we were getting a header from the who utility?

  $ who
  user ID    DEVICE      LOGIN DATE/TIME
  ---------  ---------   ---------------
  rmyers     term/r0E     May 16 23:40
  spirkle    term/r0G     May 16 23:33
  $ who | wc -l
         4
  $

Again there are two users logged in, but our command tells us there are
four. 

The message here: with plain output from our utilities, piping works.
With fancy trappings such as column headers, our counts would be
incorrect.

Utilitarianism is more important than user-friendliness.  All of UNIX
(well, almost all of UNIX) makes use of this philosophy. 
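If you do have to count lines from output that carries a header, one common approach is to strip the header before counting. The sketch below assumes a POSIX-style sh with sed, awk, sort, and wc. Because real who output differs from system to system, it uses a hypothetical function, sample_who_H, that prints text adapted from the who -H sample in the errata. The last count shows the difference between login sessions and distinct users.

```shell
#!/bin/sh
# sample_who_H is a hypothetical stand-in for real "who -H" output.
sample_who_H() {
    cat <<'EOF'
NAME       LINE         TIME
root       console      Jul 28 18:40
rtmyers    pts/0        Jul 29 00:12
rtmyers    pts/1        Jul 29 02:39
EOF
}

# tr -d ' ' strips the padding some versions of wc put before the number.
with_header=$(sample_who_H | wc -l | tr -d ' ')
without_header=$(sample_who_H | sed 1d | wc -l | tr -d ' ')   # sed 1d deletes line 1

# Count distinct user names instead of login sessions (rtmyers appears twice).
distinct_users=$(sample_who_H | sed 1d | awk '{ print $1 }' | sort -u | wc -l | tr -d ' ')

echo "lines with header:    $with_header"      # 4
echo "lines without header: $without_header"   # 3
echo "distinct users:       $distinct_users"   # 2
```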


Let us revisit the wc command. wc -l counts lines in a file. Suppose that
we have a file called tenlines which has (surprise!) 10 lines.

  $ cat tenlines
  1
  2
  3
  4
  5
  6
  7
  8
  9
  10
  $

Let's find out how many lines our file has:

  $ wc -l tenlines
        10 tenlines
  $

Notice that the wc command displayed a count, then our filename. Now let's
try almost the same thing, except that we are using input redirection to
send the tenlines file to the standard input of wc:

  $ wc -l < tenlines
        10
  $

This time wc still counts the lines, but the wc utility does not know what
filename to display.

Why?

In the first case,

  $ wc -l tenlines

..."tenlines" was a filename argument to wc -l. This means that the shell
passed the text string "tenlines" to wc, and wc used that text string to
find a file called tenlines in the working directory.

In the second case when the shell interpreted the line it discovered the
input redirection character "<". 

  $ wc -l < tenlines

In this second case the input redirection character occurred on the
command line, and the shell processes the redirection before passing any
filename arguments to the wc command. Therefore the wc utility was passed
zero filename arguments.

Following the instruction given by the input redirection character, the
shell opened the tenlines file itself and connected it to the wc
utility's standard input. The wc utility simply read the *contents* of
the file from its standard input, so it cannot display a filename that
it doesn't even know about.
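Here is a sketch of the same experiment as a script, assuming a POSIX-style sh and mktemp -d. It builds a ten-line file like tenlines and runs wc both ways. Some versions of wc pad the count with leading spaces, so compare only the presence or absence of the filename.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Build a ten-line file holding the numbers 1 through 10.
i=1
while [ "$i" -le 10 ]; do
    echo "$i"
    i=$((i + 1))
done > tenlines

wc -l tenlines       # count followed by the filename: wc opened the file
wc -l < tenlines     # count only: the shell opened the file for wc
```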

In fact, the wc utility doesn't even know if the input comes from a file. 
If we leave out the redirection and the filename, we can input some lines
from standard input (that is, from the keyboard) to accomplish something
similar: 

  $ wc -l
  1
  2
  3
  4
  <Control-D>
         4
  $

The 1, 2, 3, 4, and <Control-D> were typed in directly from the keyboard.
<Control-D> signals the end of input.

Let us examine the filename argument vs. file redirection issue from a
slightly different perspective: what happens if there is no file for wc to
act upon?

  $ wc -l nofile
  ksh: wc: nofile: cannot open [No such file or directory]
  $

In this first case we see that the KornShell (ksh) was running the wc
utility, and the wc utility reports that there is no file with the
filename "nofile".

And with redirection...

  $ wc -l < nofile
  ksh: nofile: cannot open [No such file or directory]
  $

...the shell searched for the file and found a problem before wc was even
launched. The wc utility therefore doesn't appear in the error message. 
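One more way to see that the shell handles redirection before the command starts: if the redirection fails, the command never runs at all. This sketch assumes a POSIX-style sh, where a failed redirection on an ordinary utility such as touch is reported by the shell and the command is skipped; proof_it_ran is a made-up file name.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# nofile does not exist. The shell reports the error, and touch never runs.
touch proof_it_ran < nofile || echo "the shell reported a redirection error"

if [ -e proof_it_ran ]; then
    echo "touch ran"
else
    echo "touch never ran: the shell gave up at the redirection"
fi
```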

What about the option -l? Is it passed to wc before the missing file is
detected? We can check this by passing a bad option:

  $ wc -z nofile
  ksh: wc: -z: unknown option
  Usage: wc [-lw] [-c]|-m]] [file...]
  $

The option (first argument on the command line) was apparently evaluated
before the filename was tested. 

[ Note the line:

  Usage: wc [-lw] [-c]|-m]] [file...]

This line is displayed by the wc utility. The wc utility recognizes that
the -z is not an allowed option, and it therefore explains what valid
options may be used. When you obtain this type of response from a utility
such as wc, a next logical step is to execute "man wc" to learn more about
the utility's usage. ]

And once again, with redirection:

  $ wc -z < nofile
  ksh: nofile: cannot open [No such file or directory]
  $

The shell acted first; in this case the shell tested the filename before
the wc utility was able to detect the bad option.

   ------------------------------------------------------------ 
    Note: 
    The wc behavior shown here (reporting the filename when it 
    is an argument, but not when redirection connects the file 
    to standard input) holds even though the wc command that we 
    normally execute in ksh is what we call a "shell builtin". 
    That is, typing just "wc" does not really execute the /bin/wc 
    utility; it executes a KornShell version of that utility. 
   ------------------------------------------------------------ 

If we repeat our test with the absolute pathname /bin/wc, the argument
test behaves the same way; only the wording of the message differs
(UX: instead of ksh:):

  $ /bin/wc -l nofile
  UX:wc: ERROR: Cannot open nofile: No such file or directory
  $

So here we have a very slight difference between the wc utility and the
wc shell builtin: the form of the error message.

The redirection (handled by the shell) appears the same:

  $ /bin/wc -l < nofile
  ksh: nofile: cannot open [No such file or directory] 
  $

--------------
       Pattern
      Matching
--------------
 
You may have heard that close only counts in horseshoes and hand grenades. 
Now you know that close also counts in UNIX. Suppose that you wanted to
search for a file called "online", but you couldn't remember if you'd
called it "online" or "on-line". Well, try a wildcard.
 
The UNIX shell finds both online and on-line if you search for on*line. 
And surprise! It will also find one-line, once-upon-a-line, and
one.toke.over.the.line (if those files exist).  

  $ ls
  frontline
  on-line
  once-upon-a-line
  one-line
  one-liner
  one.more.that.does.not.match
  one.toke.over.the.line
  $ 

  $ ls on*line
  on-line
  once-upon-a-line
  one-line
  one.toke.over.the.line
  $ 

But it won't match frontline (no leading "fr"), and it won't match
one-liner (no trailing "r").
 
The "*" matches any string of characters, including zero characters. It is
sometimes called a wildcard (like the deuces-are-wild version of poker),
and sometimes it is called a meta-character. Another meta-character is the
question mark ?, which matches any single character. Other characters are
used for more advanced pattern matching (future lesson).
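You can experiment with these patterns without creating any files. The shell's case statement uses the same pattern-matching rules as filename expansion (except that in filename expansion a leading dot and the slash must be matched explicitly). This sketch assumes a POSIX-style sh; matches is a hypothetical helper written for the example.

```shell
#!/bin/sh
# matches PATTERN NAME: report whether NAME fits PATTERN.
# $1 is left unquoted in the case label so * and ? act as wildcards.
matches() {
    case $2 in
        $1) echo "$2 matches $1" ;;
        *)  echo "$2 does not match $1" ;;
    esac
}

matches 'on*line' online                  # matches: * can match zero characters
matches 'on*line' on-line                 # matches
matches 'on*line' one.toke.over.the.line  # matches
matches 'on*line' frontline               # does not match: no leading "on"
matches 'on*line' one-liner               # does not match: trailing "r"
matches 'th??' this                       # matches: exactly two characters after "th"
matches 'th??' the                        # does not match: only one character after "th"
```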

---------------
       Filename
  Substitutions
---------------

Just as the shell evaluates redirection on the command line before it
launches the command, the shell also checks for wildcards in the filename
text string. However, the shell's behavior is a bit different, as we shall
see.

  $ cat this
  this
  $ cat that
  that

We have text files this and that (each of which contains its own name).

  $ wc -l this that
         1 this
         1 that
         2 total
  $

We explicitly pass a wildcarded filename as an argument:

  $ wc -l th*
         1 that
         1 this
         2 total
  $

We observe that the wc utility can handle multiple files.

Trust me on this: the shell expanded th* to "that this" (the matching
names come out sorted) and passed both names to wc.

Now let's try a wildcard with redirection:

  $ wc -l < th*
  ksh: th*: Ambiguous
  $

Guess that doesn't work. 

Notice that it is the KornShell (and not the wc utility) telling us that
it doesn't work.

How about passing two explicit filenames using redirection?

  $ wc -l < this that
         1 that
  $

Hmmm, our result is wrong. Guess that doesn't work either. 

Where does the "that" come from in the output?

Obviously "that" is being passed as a filename argument, rather than as
redirection to standard input. (Remember our redirection experience from
above?)

The basic message here is that our simple redirection doesn't work right
with multiple files. 
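If the goal is to feed several files to wc's standard input, one common approach is to let cat join them and pipe the result. A sketch, assuming a POSIX-style sh and mktemp -d:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

echo this > this
echo that > that

wc -l < this that     # "that" is still an ordinary filename argument here
cat th* | wc -l       # cat reads both files; wc counts 2 lines and shows no names
```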

We recall that redirection does work with a single filename...

  $ wc -l < this
         1
  $

...even though we have lost our filename because of the redirection
(remember, the shell connects the file to standard input, so wc never
sees the filename!)

OK, enough of the quickie review. 

Let's forget about redirection for a bit, and try some basic filename
wildcard expansion.

We start with our same two files...

  $ ls this that
  that  this
  $

We display the filenames passed to the ls utility as arguments, and...

  $ ls th*
  that  this
  $

...we can display them with a wildcard. Once again, this is the shell
expanding the th*, and finding the two files in the directory that match. 

What happens if we expand a wildcard but find no match?

  $ ls x*
  UX:ls: ERROR: Cannot access x*: No such file or directory
  $

We don't have any files in the directory that begin with an "x".

But what about the error message? 

Doesn't this indicate that the ls utility is giving us this error? 

If the shell is doing the file processing due to the wildcard, shouldn't
that be a ksh error like before? 

IDEA!!! Why don't we intentionally generate an ls-handled error?

If we pass a nonexistent filename explicitly to ls (so that the shell
keeps its cotton-pickin' wildcard processing out of this), we should be
able to compare errors:

  $ ls x
  UX:ls: ERROR: Cannot access x: No such file or directory

Now compare that to our error from expanding x*

  UX:ls: ERROR: Cannot access x*: No such file or directory

What do you think? Looks about the same, doesn't it?

In fact, the ls utility is generating the error in both cases.

So if the shell is processing the wildcard string, expanding it into
all matching filenames in the directory, but there are no matches, what
exactly is the shell passing to the ls utility?

We can use a very important trick to find out: set -x
             ---- --------- -----

The command "set -x" enables trace mode. With trace on, we can see exactly
what the shell is passing to the ls utility.

The line that begins with the + is our "trace" line:

  $ set -x
  $ ls x*
  + ls 'x*'
  UX:ls: ERROR: Cannot access x*: No such file or directory
  $

The trace line shows us the results of the hidden processing.

Turn it off with set +x.

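If the shell doesn't find any filenames to match the wildcard string, it
passes the wildcard string on to the utility unchanged, as if it were an
ordinary filename argument. The quotes in the trace line ('x*') are just
the trace's way of displaying a string that contains a special character;
ls receives the plain characters x*, without any quotes.

Thus, the given utility can either handle it or generate an error, as
appropriate.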

Just for grins, let's see what trace looks like if we do a wildcard
expansion on existing files:

  $ ls th*
  + ls that this
  that  this
  $

Just what we might have expected.
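set -x shows the arguments as a trace line. Another way, sketched below, is a tiny function that prints each argument it receives in brackets. show_args is a hypothetical name, and the sketch assumes a POSIX-style sh with default settings (some shells, such as zsh, or bash with its nullglob or failglob options set, treat unmatched patterns differently).

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

echo this > this
echo that > that

# show_args: print the argument count, then each argument in brackets.
show_args() {
    printf '%s argument(s):' "$#"
    printf ' [%s]' "$@"
    printf '\n'
}

show_args th*    # 2 argument(s): [that] [this]
show_args x*     # 1 argument(s): [x*]   (no match: the literal string, no quotes)
```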


------------------
      Interpreting
  the Command Line
       and Command
      Substitution
------------------
 
The shell interprets the command line to process important information,
and to remove unnecessary information. For example, unquoted whitespace
in a command statement will be removed before the arguments are passed to
the command. Observe:

                -------------------------

  $ echo one           two             three
  one two three

Whitespace is removed. We saw this before.


  $ echo "one            two            three"
  one            two            three

Whitespace is preserved by double quotes.


  $ echo 'one            two            three'
  one            two            three


Whitespace is preserved by single quotes.


  $ echo `one            two            three`
  -ksh: one: not found

  $

The command substitution fails, because there is no command named
"one". The important lesson here is that the "back quotes" provide a
different functionality than single or double quotes. The back quote,
also known as the "grave accent", has a special purpose. Here is a
similar example:


  $ echo echo this
  echo this
  $

The "echo" command echoes to the screen whatever follows it on the command
line.


  $ echo 'echo this'
  echo this
  $

The single quotes aren't necessary in this case, but they make our printed
text string explicit.


  $ echo "echo this"
  echo this
  $

The double quotes (in this case) work the same way.


  $ echo `echo this`
  this
  $

Whoa! Where did our echo'd echo go?

The second word echo is no longer echo'd by the first echo command. In
fact, the back quotes tell the shell to:

  o Evaluate the command that is between the back quotes
  o Substitute the output of that command into the command line

Therefore, in the command

  $ echo `echo this`

the first echo does its work only AFTER the `echo this` is evaluated. And,
the output of `echo this` is "this".

This is called COMMAND SUBSTITUTION.
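A short script version of the idea, assuming a POSIX-style sh. inner and cmd are illustrative variable names; the last line runs whatever command name the substitution produced, here date, so its output will vary.

```shell
#!/bin/sh
inner=`echo this`            # the inner echo runs first; its output is "this"
echo "inner produced: $inner"

echo `echo this`             # same as typing: echo this

cmd=`echo date`              # cmd now holds the text "date"
$cmd                         # runs date, just like typing `echo date` by itself
```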

Let's try the command substitution without the first echo command to see
what happens:

  $ `echo this`
  ksh: this: cannot execute [Permission denied]
  $

We are echoing "this" to the command line; in other words, the shell is
taking everything that is between the back quotes, evaluating what it
finds there, and then replacing the entire expression (back quotes and
all) with the text string "this".

The equivalent would be simply typing "this" on the command line:

  $ this
  ksh: this: cannot execute [Permission denied]
  $

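Note that we get Permission denied because there is a file named "this"
in our directory, and it is not executable. (The shell found it there
because the current directory is evidently in our PATH.) If we remove
file "this":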

  $ rm this
  $

...and try our command again:

  $ `echo this`
  this: not found
  $ 

The shell does not know what to do with "this", entered as a command.

SOOOOO, what happens if we echo a legitimate command with command
substitution?

  $ `echo date`
  Sun Jun 22 21:43:40 MDT 1997
  $


Ahhh! We get the same result as if we had typed "date" on the command
line:

  $ date
  Sun Jun 22 21:45:31 MDT 1997
  $

...whereas if we type echo date without the back quotes, we get
something entirely different:

  $ echo date
  date
  $

Want to see the shell do its command substitution?

  $ set -x
  $
 
  $ `echo date`
  + echo date
  + date
  Fri Jul 30 02:20:28 MDT 1999
  $ 

The "date" command replaces "`echo date`", and then date is executed.

Now that you know what the back quote does, be aware that it is
considered old-fashioned. It comes from the Bourne Shell, and the Korn
Shell and later shells still honor it, but the Korn Shell introduced a
newer technique for command substitution, which later shells and the
POSIX standard adopted. It looks like this:

  $ echo $(echo this)
  this
  $

In the first line above, the first $ is your prompt, and the second $,
together with the (), provides the syntax for command substitution.
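One practical reason for preferring $(...) is nesting. This sketch, assuming a POSIX-style sh, builds the same nested command substitution both ways; with back quotes, the inner pair must be escaped with backslashes.

```shell
#!/bin/sh
old=`echo this`
new=$(echo this)
echo "$old $new"                          # this this

# With $(...) the inner substitution needs no escaping.
nested=$(echo outer $(echo inner))
echo "$nested"                            # outer inner

# With back quotes, the inner back quotes must be written as \`.
nested_old=`echo outer \`echo inner\``
echo "$nested_old"                        # outer inner
```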


  _______________________________ end ______________________________   


Errata: Headers can be turned on for the who command, typically using a
formulation such as who -H. Output might appear like this:

$ who -H
NAME       LINE         TIME
root       console      Jul 28 18:40
rtmyers    pts/0        Jul 29 00:12    (cbgw2.lucent.com)
rtmyers    pts/1        Jul 29 02:39    (cbgw2.lucent.com)
$ 

Using the built-in header switch would in fact break the user count
from the lesson. Observe:

$ who
root       console      Jul 28 18:40
rtmyers    pts/0        Jul 29 00:12    (cbgw2.lucent.com)
rtmyers    pts/1        Jul 29 02:39    (cbgw2.lucent.com)
$ who | wc -l
       3
$ who -H | wc -l
       4
$ who
root       console      Jul 28 18:40
rtmyers    pts/0        Jul 29 00:12    (cbgw2.lucent.com)
rtmyers    pts/1        Jul 29 02:39    (cbgw2.lucent.com)
$ 

Default operation is to NOT print headers.

      ---------------------------------------------------

Here is a bit of playing with cat < file(s) and echo < file(s):

First example:

$ ls
abcfile
xyzfile
$ 

$ cat *
abc
xyz
$ 

$ set -x       
$ 

$ cat < abcfile xyzfile
+ cat xyzfile
+ 0< abcfile
xyz
$ 

Second example:

$ echo one two three > xyzfile
$ echo < xyzfile
 
$

$ set -x
$ echo < xyzfile
+ echo
+ 0< xyzfile
 
$ 

Note the blank lines: they are the output of the previous two echo
commands (echo with no arguments prints just a newline).

And for comparison:

$ set -x
$ echo x
+ echo x
x
$

x prints to the screen. And,

$ echo x > /dev/null
+ echo x
+ 1> /dev/null
$ 

Where does the x go? Why is there no blank line?

      ---------------------------------------------------


best wishes,


richard myers