Re: [classlib] BufferedReader and FileInputStream.available()

Regis Mon, 09 Nov 2009 20:36:53 -0800

enh wrote:

On Fri, Nov 6, 2009 at 02:28, Mark Hindess <[email protected]> wrote:

In message <[email protected]>, Tim Ellison writes:

On 06/Nov/2009 03:06, Regis wrote:

Mark Hindess wrote:

In message <[email protected]>,
enh writes:

On Wed, Nov 4, 2009 at 23:07, Regis <[email protected]> wrote:

Mark Hindess wrote:

In message <[email protected]>, Regis writes:


Regis, these are really two different issues so lets treat them
as such.  I actually think fixing the /proc reading is better
done in the java code.  I've attached a patch that just catches
and ignores the IOException from the available call.  I think
this is reasonable since if the exception was "real" then the
read will fail if a similar exception.  In fact, since the user
is doing a read call, having the exception thrown by the read is
more in keeping with the expectations of the user than having an
exception thrown by the available call.

Thanks Mark, I ran luni tests with your patch, no regression
found, I'll leave patch to you to apply :)

invoking in.available before read is a little overhead, in the
worst case, one read need four system calls, three seek and one
read. Actually the condition

if ((in.available() = 0) && (out.position() > offset))

it's only for tty file, for normal file it's not necessary. It's
better if we can remove the check for normal file, but I have no
idea how to do this.

isatty(3). for the price of one JNI call at construction and a
boolean field to record the result, you could avoid any later
calls.

This might be a good idea, but I'm getting confused...

At the moment there seems to be a special case in
FileInputStream.available() like this:

  // stdin requires special handling
  if (fd == FileDescriptor.in) {
    return (int) fileSystem.ttyAvailable();
  }
  ... perform seek based available check ...

but I find this confusing because the comment and check implies
that stdin is special but the fileSystem method name implies
that it is being a tty that is the distinction.  This is
confusing because *any* file descriptor can be a tty - e.g. new
FileReader("/dev/tty") - and stdin doesn't have to be a tty -
"java </dev/zero".

Good question, I suppose here should be like this:
if (istty(fd)) {
  return (int) fileSystem.ttyAvailable(fd);
}

With apologies for being pedantic, but IMHO it should be fd.istty()
since there are likely a few places where this could be useful, and we
should cache it in the right place.

Agreed.  I think that is what elliott was suggesting earlier in this
thread.  Definitely seems like we should fix the semantics of these tty
checks so they aren't just about stdin.


yes and no. i was restricting myself to the question "how should we
query the number of available bytes?". on Unix, the answer is with the
FIONREAD ioctl. so my claim was really:

* there should be a "static native int available(FileDescriptor fd)"
method generally available
* the Unix implementation should use the FIONREAD ioctl.
* any platform-specific hacks (such as, say, if Windows really needs
the three-seek hack) should be in the native code for the broken
platforms.
* any Java should use the intention-revealing
"available(FileDescriptor)" method and not concern itself with how
that works on any given platform.

we've done that for Android. (if Windows forces you to treat files and
ttys completely differently, your life will be more difficult, but i
still think that should be in the platform-specific code. as you
pointed out with the "/dev/tty" example, on Unix it's not just
inefficient, it's wrong.)

for the bigger question of BufferedReader.readLine, i don't think we
should be calling "available" at all here. i think somewhere there
must be the equivalent of a "readFully" rather than a "read", and
that's the mistake that needs fixing. fix that, and the "available"
hack can go completely.

I just found a way can remove the in.available() check here. Instead of checkingin.available, I check whether the last char read from stream is '\n'.


Index: modules/luni/src/main/java/java/io/InputStreamReader.java
=====================================================================
--- modules/luni/src/main/java/java/io/InputStreamReader.java
+++ modules/luni/src/main/java/java/io/InputStreamReader.java
@@ -246,14 +246,9 @@ public class InputStreamReader extends Reader {
             while (out.hasRemaining()) {
                 // fill the buffer if needed
                 if (needInput) {
-                    try {
-                        if ((in.available() == 0)
-                            && (out.position() > offset)) {
-                            // we could return the result without blocking read
-                            break;
-                        }
-                    } catch (IOException e) {
-                        // available didn't work so just try the read

+ if ((out.position() > offset) && out.get(out.position() -1) == '\n') {

+                        // we could return the result without blocking read
+                        break;
                     }

                     int to_read = bytes.capacity() - bytes.limit();


--
Best Regards,
Regis.


try this:

#include <unistd.h>
int main() {
  ssize_t n;
  char buf[1024];
  while ((n = read(0, buf, sizeof(buf))) > 0) {
    write(2, ">", 1);
    write(2, buf, n);
  }
  return 0;
}

if you run that and type "hello" then return, then "world" then
return, you'll see (mixing your input and its output):

hello

hello

world

world


because it's the terminal's job to make sure you get data at the end
of a line. (or not, if so configured, but if someone has configured
their terminal that way, that's their decision, they should expect
programs' behavior to change, and we certainly shouldn't go around
reconfiguring the terminal.)

by contrast, if you run this (where my program above has been built as
"read") like this:

echo -e "hello\nworld\n" | ./read

you see:

hello

world

i.e. both lines arrive as a single chunk of data.

(you didn't say exactly _how_ the test fails, so i'm just guessing
it's related to this.)

 --elliott

Still need to look at the available natives too since elliott suggested
that we should replace the 3*seek with a single ioctl for all cases
(though this might not be as portable).

-Mark.

Regis, can you explain a little more about why the check is needed
in the available call?  And why the available check is needed in the
FileInputStream read method?

After I removed the check, everything is fine, except following test is
failed:

  public static void main(String args[])
  {
    BufferedReader reader = new BufferedReader(new
InputStreamReader(System.in));
    String line = null;
    try {
      line = reader.readLine();
    } catch (IOException e) {
      e.printStackTrace();
    }
    System.out.println(line);
  }

In each case, is it really stdin that is special or any file descriptor
representing a tty?

I'd like to understand this as there are a whole bunch of ttyRead with
a similar check that only affects stdin not any tty descriptor.

in the meantime, i wondered also whether it's worth swapping the two
conjuncts in the if, since the latter is essentially free while the
former is expensive. (i've no idea how often the cheap conjunct is
false, though.)

I thought about this but held of doing it for the same reason.  However,
I guess it is almost certainly a win.

Regards,
 Mark.



--
Best Regards,
Regis.

Re: [classlib] BufferedReader and FileInputStream.available()

Reply via email to