Re: [Pharo-dev] Changed #atEnd primitive - #atEnd vs #next returning nil

Denis Kudriashov Wed, 11 Apr 2018 01:03:38 -0700

2018-04-11 8:32 GMT+02:00 Alistair Grant <akgrant0...@gmail.com>:

> >>> Where is it being said that #next and/or #atEnd should be blocking or non-blocking ?
> >>
> >> There is existing code that assumes that #atEnd is non-blocking and
> >> that #next is allowed to block.  I believe that we should keep those
> >> conditions.
> >
> > I fail to see where that is written down, either way. Can you point me to comments stating that, I would really like to know ?
>
> I'm not aware of it being written down, just that every existing
> implementation I'm aware of behaves this way.
>
> On the other hand, making #atEnd blocking breaks Eliot's REPL sample
> (in Squeak).
>


Could you post that example here, please?


>
>
>
> >>> How is this related to how EOF is signalled ?
> >>
> >> Because, combined with terminal EOF not being known until the user
> >> explicitly flags it (with Ctrl-D) it means that #atEnd can't be used
> >> for iterating over input from stdin connected to a terminal.
> >
> > This seems to me like an exception that only holds for one particular stream in one particular scenario (interactive stdin). I might be wrong.
> >
> >>> It seems to me that there are quite a few classes of streams that are 'special' in the sense that #next could be blocking and/or #atEnd could be unclear - socket/network streams, serial streams, maybe stdio (interactive or not). Without a message like #isDataAvailable you cannot handle those without blocking.
> >>
> >> Right.  I think this is a distraction (I was trying to explain some
> >> details, but it's causing more confusion instead of helping).
> >>
> >> The important point is that #atEnd doesn't work for iterating over
> >> streams with terminal input
> >
> > Maybe you should also point to the actual code that fails. I mean you showed a partial stack trace, but not how you got there, precisely. What does the application reading from an interactive stdin do to get into trouble ?
>
> Included below.
>
>
> >>> Reading from stdin seems like a very rare case for a Smalltalk system (not that it should not be possible).
> >>
> >> There's been quite a bit of discussion and several projects recently
> >> related to using Pharo for scripting, so it may become more common.
> >> E.g.
> >>
> >> https://www.quora.com/Can-Smalltalk-be-a-batch-file-scripting-language/answer/Philippe-Back-1?share=c19bfc95
> >> https://github.com/rajula96reddy/pharo-cli
> >
> > Still, it is not common at all.
> >
> >>> I have a feeling that too much functionality is being pushed into too small an API.
> >>
> >> This is just about how Zinc streams should iterate over the
> >> underlying streams.  You didn't like checking the result of #next for
> >> nil since it isn't general, correctly pointing out that nil is a valid
> >> value for non-byte oriented streams.  But #atEnd doesn't work for
> >> stdin from a terminal.
> >>
> >>
> >> At this point I think there are three options:
> >>
> >> 1. Modify Zinc to check the return value of #next instead of using #atEnd.
> >>
> >> This is what all existing character / byte oriented streams in Squeak
> >> and Pharo do.  At that point the Zinc streams can be used on all file
> >> / stdio input and output.
> >
> > I agree that such code exists in many places, but there is lots of stream reading that does not check for nils.
>
> Right.  Streams can be categorised in many ways, but for this
> discussion I think streams are broken into two types:
>
> 1) Byte / Character oriented
> 2) All others
>
> For historical reasons, byte / character oriented streams need to
> check for EOF by using "stream next == nil" and all other streams
> should use #atEnd.
>
> This avoids the "nil being part of the domain" issue that was
> discussed earlier in the thread.
>
>
> >> 2. Modify all streams to signal EOF in some other way, i.e. a sentinel
> >> or notification / exception.
> >>
> >> This is what we were discussing below.  But it is a decent chunk of
> >> work with significant impact on the existing code base.
> >
> > Agreed. This would be a future extension.
> >
> >> 3. Require anyone who wants to read from stdin to code around Zinc's
> >> inability to handle terminal input.
> >>
> >> I'd prefer to avoid this option if possible.
> >
> > See above for my request for a more concrete usage example.
>
>
> testAtEnd.st
> --
> | ch stream string stdin |
>
> 'stdio.cs' asFileReference fileIn.
> "stdin := FileStream stdin."
> stdin := ZnCharacterReadStream on:
>     (ZnBufferedReadStream on:
>         Stdio stdin).
> stream := (String new: 100) writeStream.
> ch := stdin next.
> [ ch == nil ] whileFalse: [
>     stream nextPut: ch.
>     ch := stdin next. ].
> string := stream contents.
> FileStream stdout
>     nextPutAll: string; lf;
>     nextPutAll: 'Characters read: ';
>     nextPutAll: string size asString;
>     lf.
> Smalltalk snapshot: false andQuit: true.
> --
>
> Execute with:
>
> ./pharo --headless Pharo7.0-64bit-e76f1a2.image testAtEnd.st
>
> and typing Ctrl-D gives:
>
>
> 'Errors in script loaded from testAtEnd.st'
> MessageNotUnderstood: receiver of "<" is nil
> UndefinedObject(Object)>>doesNotUnderstand: #<
> ZnUTF8Encoder>>nextCodePointFromStream:
> ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream:
> ZnCharacterReadStream>>nextElement
> ZnCharacterReadStream(ZnEncodedReadStream)>>next
> UndefinedObject>>DoIt
> OpalCompiler>>evaluate
>
>
> Using #atEnd to control the loop instead of "stdin next == nil"
> produces the same result.
>
> Replacing stdin with FileStream stdin makes the script work.
>
> stdio.cs fixes a bug in StdioStream which really isn't part of this
> discussion (PR to be submitted).
>
> Cheers,
> Alistair
>
>
>
>
> >> Does that clarify the situation?
> >
> > Yes, it helps. Thanks. But questions remain.
> >
> >> Thanks,
> >> Alistair
> >>
> >>
> >>
> >>>> On 10 Apr 2018, at 18:30, Alistair Grant <akgrant0...@gmail.com> wrote:
> >>>>
> >>>> First a quick update:
> >>>>
> >>>> After doing some work on primitiveFileAtEnd, #atEnd now answers
> >>>> correctly for files that don't report their size correctly, e.g.
> >>>> /dev/urandom and /proc/cpuinfo, whether the files are opened directly or
> >>>> redirected through stdin.
> >>>>
> >>>> However determining whether stdin from a terminal has reached the end of
> >>>> file can't be done without making #atEnd blocking since we have to wait
> >>>> for the user to flag the end of file, e.g. by typing Ctrl-D.  And #atEnd
> >>>> is assumed to be non-blocking.
> >>>>
> >>>> So currently using ZnCharacterReadStream with stdin from a terminal will
> >>>> result in a stack dump similar to:
> >>>>
> >>>> MessageNotUnderstood: receiver of "<" is nil
> >>>> UndefinedObject(Object)>>doesNotUnderstand: #<
> >>>> ZnUTF8Encoder>>nextCodePointFromStream:
> >>>> ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream:
> >>>> ZnCharacterReadStream>>nextElement
> >>>> ZnCharacterReadStream(ZnEncodedReadStream)>>next
> >>>> UndefinedObject>>DoIt
> >>>>
> >>>>
> >>>> Going back through the various suggestions that have been made regarding
> >>>> using a sentinel object vs. raising a notification / exception, my
> >>>> (still to be polished) suggestion is to:
> >>>>
> >>>> 1. Add an endOfStream instance variable
> >>>> 2. When the end of the stream is reached answer the value of the
> >>>>  instance variable (i.e. the result of sending #value to the variable).
> >>>> 3. The initial default value would be a block that raises a Deprecation
> >>>>  warning and then returns nil.  This would allow existing code to
> >>>>  function for a changeover period.
> >>>> 4. At the end of the deprecation period the default value would be
> >>>>  changed to a unique sentinel object which would answer itself as its
> >>>>  #value.
> >>>>
> >>>> At any time users of the stream can set their own sentinel, including a
> >>>> block that raises an exception.
> >>>>
> >>>>
> >>>> Cheers,
> >>>> Alistair
> >>>>
> >>>>
> >>>> On 4 April 2018 at 19:24, Stephane Ducasse <stepharo.s...@gmail.com> wrote:
> >>>>> Thanks for this discussion.
> >>>>>
> >>>>> On Wed, Apr 4, 2018 at 1:37 PM, Sven Van Caekenberghe <s...@stfx.eu> wrote:
> >>>>>> Alistair,
> >>>>>>
> >>>>>> First off, thanks for the discussions and your contributions, I really appreciate them.
> >>>>>>
> >>>>>> But I want to have a discussion at the high level of the definition and semantics of the stream API in Pharo.
> >>>>>>
> >>>>>>> On 4 Apr 2018, at 13:20, Alistair Grant <akgrant0...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> On 4 April 2018 at 12:56, Sven Van Caekenberghe <s...@stfx.eu> wrote:
> >>>>>>>> Playing a bit devil's advocate, the idea is that, in general,
> >>>>>>>>
> >>>>>>>> [ stream atEnd] whileFalse: [ stream next. "..." ].
> >>>>>>>>
> >>>>>>>> is no longer allowed ?
> >>>>>>>
> >>>>>>> It hasn't been allowed "forever" [1].  It's just been misused for
> >>>>>>> almost as long.
> >>>>>>>
> >>>>>>> [1] Time began when stdio stream support was introduced. :-)
> >>>>>>
> >>>>>> I am still not convinced. Another way to put it would be that the old #atEnd or #upToEnd do not make sense for these streams and some new loop is needed, based on a new test (it exists for socket streams already).
> >>>>>>
> >>>>>> [ stream isDataAvailable ] whileTrue: [ stream next ]
> >>>>>>
> >>>>>>>> And you want to replace it with
> >>>>>>>>
> >>>>>>>> [ stream next ifNil: [ false ] ifNotNil: [ :x | "..." true ] ] whileTrue.
> >>>>>>>>
> >>>>>>>> That is a pretty big change, no ?
> >>>>>>>
> >>>>>>> That's the way quite a bit of code already operates.
> >>>>>>>
> >>>>>>> As Denis pointed out, it's obviously problematic in the general sense,
> >>>>>>> since nil can be embedded in non-byte oriented streams.  I suspect
> >>>>>>> that in practice not many people write code that reads from
> >>>>>>> both byte oriented and non-byte oriented streams.
> >>>>>>
> >>>>>> Maybe yes, maybe no. As Denis' example shows there is a clear definition problem.
> >>>>>>
> >>>>>> And I do use streams of byte arrays or strings all the time, this is really important. I want my parsers to work on all kinds of streams.
> >>>>>>
> >>>>>>>> I think/feel like a proper EOF exception would be better, more correct.
> >>>>>>>>
> >>>>>>>> [ [ stream next. "..." true ] on: EOF do: [ false ] ] whileTrue.
> >>>>>>>
> >>>>>>> I agree, but the email thread Nicolas pointed to raises some
> >>>>>>> performance questions about this approach.  It should be
> >>>>>>> straightforward to do a basic performance comparison which I'll get
> >>>>>>> around to if other objections aren't raised.
> >>>>>>
> >>>>>> Reading in bigger blocks, using #readInto:startingAt:count: (which is basically Unix's read(2) system call), would solve performance problems, I think.
> >>>>>>
> >>>>>>>> Will we throw away #atEnd then ? Do we need it if we cannot use it ?
> >>>>>>>
> >>>>>>> Unix file i/o returns EOF if the end of file has been reached OR if an
> >>>>>>> error occurs.  You should still check #atEnd after reading past the
> >>>>>>> end of the file to make sure no error occurred.  Another part of the
> >>>>>>> primitive change I'm proposing is to return additional information
> >>>>>>> about what went wrong in the event of an error.
> >>>>>>
> >>>>>> I am sorry, but this kind of semantics (the OR) is way too complex at the general image level, it is too specific and based on certain underlying implementation details.
> >>>>>>
> >>>>>> Sven
> >>>>>>
> >>>>>>> We could modify the read primitive so that it fails if an error has
> >>>>>>> occurred, and then #atEnd wouldn't be required.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Alistair
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>> On 4 Apr 2018, at 12:41, Alistair Grant <akgrant0...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Nicolas,
> >>>>>>>>>
> >>>>>>>>> On 4 April 2018 at 12:36, Nicolas Cellier
> >>>>>>>>> <nicolas.cellier.aka.n...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 2018-04-04 12:18 GMT+02:00 Alistair Grant <akgrant0...@gmail.com>:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Sven,
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Apr 04, 2018 at 11:32:02AM +0200, Sven Van Caekenberghe wrote:
> >>>>>>>>>>>> Somehow, somewhere there was a change to the implementation of the
> >>>>>>>>>>>> primitive called by some streams' #atEnd.
> >>>>>>>>>>>
> >>>>>>>>>>> That's a proposed change by me, but it hasn't been integrated yet.  So
> >>>>>>>>>>> the discussion below should apply to the current stable VM (from August
> >>>>>>>>>>> last year).
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> IIRC, someone said it is implemented as 'remaining size being zero'
> >>>>>>>>>>>> and some virtual unix files like /dev/random are zero sized.
> >>>>>>>>>>>
> >>>>>>>>>>> Currently, for files other than stdio (stdout, stderr, stdin) it is
> >>>>>>>>>>> effectively defined as:
> >>>>>>>>>>>
> >>>>>>>>>>> atEnd := stream position >= stream size
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> And, as you say, plenty of virtual unix files report size 0.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> Now, all kinds of changes are being done image side to work around this.
> >>>>>>>>>>>
> >>>>>>>>>>> I would phrase this slightly differently :-)
> >>>>>>>>>>>
> >>>>>>>>>>> Some code does the right thing, while other code doesn't.  E.g.:
> >>>>>>>>>>>
> >>>>>>>>>>> MultiByteFileStream>>upToEnd is good, while
> >>>>>>>>>>> FileStream>>contents is incorrect
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> I am a strong believer in simple, real (i.e. infinite) streams, but I
> >>>>>>>>>>>> am not sure we are doing the right thing here.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Point is, I am not sure #next returning nil is official and universal.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Consider the comments:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Stream>>#next
> >>>>>>>>>>>> "Answer the next object accessible by the receiver."
> >>>>>>>>>>>>
> >>>>>>>>>>>> ReadStream>>#next
> >>>>>>>>>>>> "Primitive. Answer the next object in the Stream represented by the
> >>>>>>>>>>>> receiver. Fail if the collection of this stream is not an Array or a
> >>>>>>>>>>>> String. Fail if the stream is positioned at its end, or if the position
> >>>>>>>>>>>> is out of bounds in the collection. Optional. See Object documentation
> >>>>>>>>>>>> whatIsAPrimitive."
> >>>>>>>>>>>>
> >>>>>>>>>>>> Note how there is no talk about returning nil !
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think we should discuss this first.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Was the low level change really correct and the right thing to do ?
> >>>>>>>>>>>
> >>>>>>>>>>> The primitive change proposed doesn't affect this discussion.  It will
> >>>>>>>>>>> mean that #atEnd returns false (correctly) sometimes, while currently it
> >>>>>>>>>>> returns true (incorrectly).  The end result is still incorrect, e.g.
> >>>>>>>>>>> #contents returns an empty string for /proc/cpuinfo.
> >>>>>>>>>>>
> >>>>>>>>>>> You're correct about no mention of nil, but we have:
> >>>>>>>>>>>
> >>>>>>>>>>> FileStream>>next
> >>>>>>>>>>>
> >>>>>>>>>>>     (position >= readLimit and: [self atEnd])
> >>>>>>>>>>>             ifTrue: [^nil]
> >>>>>>>>>>>             ifFalse: [^collection at: (position := position + 1)]
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> which has been around for a long time (I suspect, before Pharo existed).
> >>>>>>>>>>>
> >>>>>>>>>>> Having said that, I think that raising an exception is a better
> >>>>>>>>>>> solution, but it is a much, much bigger change than the one I proposed
> >>>>>>>>>>> in https://github.com/pharo-project/pharo/pull/1180.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>> Alistair
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>> yes, if you are after universal behavior encompassing Unix streams, the
> >>>>>>>>>> Exception might be the best way.
> >>>>>>>>>> Because on special streams you can't always say in advance, you have to try.
> >>>>>>>>>> That's the solution adopted by authors of Xtreams.
> >>>>>>>>>> But there is a runtime penalty associated to it.
> >>>>>>>>>>
> >>>>>>>>>> The penalty once was so high that my proposal to generalize EndOfStream
> >>>>>>>>>> usage was rejected a few years ago by Andreas Raab.
> >>>>>>>>>> http://forum.world.st/EndOfStream-unused-td68806.html
> >>>>>>>>>
> >>>>>>>>> Thanks for this, I'll definitely take a look.
> >>>>>>>>>
> >>>>>>>>> Do you have a sense of how Denis' suggestion of using an EndOfStream
> >>>>>>>>> object would compare?
> >>>>>>>>>
> >>>>>>>>> It would keep the same coding style, but avoid the problems with nil.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Alistair
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> I have regularly benchmarked Xtreams, but stopped a few years ago.
> >>>>>>>>>> Maybe I can dig the benchmarks up and run them on a newer VM.
> >>>>>>>>>>
> >>>>>>>>>> In the meantime, I had experimented with a programmable end of stream behavior
> >>>>>>>>>> (via a block, or any other valuable)
> >>>>>>>>>> http://www.squeaksource.com/XTream.htm
> >>>>>>>>>> so as to reconcile performance and universality, but it was a source of
> >>>>>>>>>> added complexity on the implementation side.
> >>>>>>>>>>
> >>>>>>>>>> Nicolas
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> Note also that Guille introduced something new, #closed, which is
> >>>>>>>>>>>> related to the difference between having no more elements (maybe right now,
> >>>>>>>>>>>> like an open network stream) and never ever being able to produce more data.
> >>>>>>>>>>>> Sven
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >> <stdio.cs>
> >
> >
>
