I have trouble understanding your problem analysis and how your proposed solution would solve it.
Where is it being said that #next and/or #atEnd should be blocking or non-blocking ? How is this related to how EOF is signalled ?

It seems to me that there are quite a few classes of streams that are 'special' in the sense that #next could be blocking and/or #atEnd could be unclear - socket/network streams, serial streams, maybe stdio (interactive or not). Without a message like #isDataAvailable you cannot handle those without blocking.

Reading from stdin seems like a very rare case for a Smalltalk system (not that it should not be possible).

I have a feeling that too much functionality is being pushed into too small an API.

> On 10 Apr 2018, at 18:30, Alistair Grant <akgrant0...@gmail.com> wrote:
>
> First a quick update:
>
> After doing some work on primitiveFileAtEnd, #atEnd now answers
> correctly for files that don't report their size correctly, e.g.
> /dev/urandom and /proc/cpuinfo, whether the files are opened directly or
> redirected through stdin.
>
> However determining whether stdin from a terminal has reached the end of
> file can't be done without making #atEnd blocking since we have to wait
> for the user to flag the end of file, e.g. by typing Ctrl-D. And #atEnd
> is assumed to be non-blocking.
>
> So currently using ZnCharacterReadStream with stdin from a terminal will
> result in a stack dump similar to:
>
> MessageNotUnderstood: receiver of "<" is nil
> UndefinedObject(Object)>>doesNotUnderstand: #<
> ZnUTF8Encoder>>nextCodePointFromStream:
> ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream:
> ZnCharacterReadStream>>nextElement
> ZnCharacterReadStream(ZnEncodedReadStream)>>next
> UndefinedObject>>DoIt
>
>
> Going back through the various suggestions that have been made regarding
> using a sentinel object vs. raising a notification / exception, my
> (still to be polished) suggestion is to:
>
> 1. Add an endOfStream instance variable
> 2. When the end of the stream is reached answer the value of the
>    instance variable (i.e. the result of sending #value to the variable).
> 3. The initial default value would be a block that raises a Deprecation
>    warning and then returns nil. This would allow existing code to
>    function for a changeover period.
> 4. At the end of the deprecation period the default value would be
>    changed to a unique sentinel object which would answer itself as its
>    #value.
>
> At any time users of the stream can set their own sentinel, including a
> block that raises an exception.
>
>
> Cheers,
> Alistair
>
>
> On 4 April 2018 at 19:24, Stephane Ducasse <stepharo.s...@gmail.com> wrote:
>> Thanks for this discussion.
>>
>> On Wed, Apr 4, 2018 at 1:37 PM, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>>> Alistair,
>>>
>>> First off, thanks for the discussions and your contributions, I really
>>> appreciate them.
>>>
>>> But I want to have a discussion at the high level of the definition and
>>> semantics of the stream API in Pharo.
>>>
>>>> On 4 Apr 2018, at 13:20, Alistair Grant <akgrant0...@gmail.com> wrote:
>>>>
>>>> On 4 April 2018 at 12:56, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>>>>> Playing a bit devil's advocate, the idea is that, in general,
>>>>>
>>>>> [ stream atEnd ] whileFalse: [ stream next. "..." ].
>>>>>
>>>>> is no longer allowed ?
>>>>
>>>> It hasn't been allowed "forever" [1]. It's just been misused for
>>>> almost as long.
>>>>
>>>> [1] Time began when stdio stream support was introduced. :-)
>>>
>>> I am still not convinced. Another way to put it would be that the old
>>> #atEnd or #upToEnd do not make sense for these streams and some new loop is
>>> needed, based on a new test (it exists for socket streams already).
>>>
>>> [ stream isDataAvailable ] whileTrue: [ stream next ]
>>>
>>>>> And you want to replace it with
>>>>>
>>>>> [ stream next ifNil: [ false ] ifNotNil: [ :x | "..." true ] ] whileTrue.
>>>>>
>>>>> That is a pretty big change, no ?
>>>>
>>>> That's the way quite a bit of code already operates.
>>>>
>>>> As Denis pointed out, it's obviously problematic in the general sense,
>>>> since nil can be embedded in non-byte oriented streams. I suspect
>>>> that in practice not many people write code that reads streams from
>>>> both byte oriented and non-byte oriented streams.
>>>
>>> Maybe yes, maybe no. As Denis' example shows there is a clear definition
>>> problem.
>>>
>>> And I do use streams of byte arrays or strings all the time, this is really
>>> important. I want my parsers to work on all kinds of streams.
>>>
>>>>> I think/feel like a proper EOF exception would be better, more correct.
>>>>>
>>>>> [ [ stream next. "..." true ] on: EOF do: [ false ] ] whileTrue.
>>>>
>>>> I agree, but the email thread Nicolas pointed to raises some
>>>> performance questions about this approach. It should be
>>>> straightforward to do a basic performance comparison which I'll get
>>>> around to if other objections aren't raised.
>>>
>>> Reading in bigger blocks, using #readInto:startingAt:count: (which is
>>> basically Unix's read(2) system call), would solve performance problems, I
>>> think.
>>>
>>>>> Will we throw away #atEnd then ? Do we need it if we cannot use it ?
>>>>
>>>> Unix file i/o returns EOF if the end of file has been reached OR if an
>>>> error occurs. You should still check #atEnd after reading past the
>>>> end of the file to make sure no error occurred. Another part of the
>>>> primitive change I'm proposing is to return additional information
>>>> about what went wrong in the event of an error.
>>>
>>> I am sorry, but this kind of semantics (the OR) is way too complex at the
>>> general image level, it is too specific and based on certain underlying
>>> implementation details.
>>>
>>> Sven
>>>
>>>> We could modify the read primitive so that it fails if an error has
>>>> occurred, and then #atEnd wouldn't be required.
>>>>
>>>> Cheers,
>>>> Alistair
>>>>
>>>>
>>>>
>>>>>> On 4 Apr 2018, at 12:41, Alistair Grant <akgrant0...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Nicolas,
>>>>>>
>>>>>> On 4 April 2018 at 12:36, Nicolas Cellier
>>>>>> <nicolas.cellier.aka.n...@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> 2018-04-04 12:18 GMT+02:00 Alistair Grant <akgrant0...@gmail.com>:
>>>>>>>>
>>>>>>>> Hi Sven,
>>>>>>>>
>>>>>>>> On Wed, Apr 04, 2018 at 11:32:02AM +0200, Sven Van Caekenberghe wrote:
>>>>>>>>> Somehow, somewhere there was a change to the implementation of the
>>>>>>>>> primitive called by some streams' #atEnd.
>>>>>>>>
>>>>>>>> That's a proposed change by me, but it hasn't been integrated yet. So
>>>>>>>> the discussion below should apply to the current stable vm (from August
>>>>>>>> last year).
>>>>>>>>
>>>>>>>>
>>>>>>>>> IIRC, someone said it is implemented as 'remaining size being zero'
>>>>>>>>> and some virtual unix files like /dev/random are zero sized.
>>>>>>>>
>>>>>>>> Currently, for files other than stdio (stdout, stderr, stdin) it is
>>>>>>>> effectively defined as:
>>>>>>>>
>>>>>>>> atEnd := stream position >= stream size
>>>>>>>>
>>>>>>>>
>>>>>>>> And, as you say, plenty of virtual unix files report size 0.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Now, all kinds of changes are being done image side to work around
>>>>>>>>> this.
>>>>>>>>
>>>>>>>> I would phrase this slightly differently :-)
>>>>>>>>
>>>>>>>> Some code does the right thing, while other code doesn't. E.g.:
>>>>>>>>
>>>>>>>> MultiByteFileStream>>upToEnd is good, while
>>>>>>>> FileStream>>contents is incorrect
>>>>>>>>
>>>>>>>>
>>>>>>>>> I am a strong believer in simple, real (i.e. infinite) streams, but I
>>>>>>>>> am not sure we are doing the right thing here.
>>>>>>>>>
>>>>>>>>> Point is, I am not sure #next returning nil is official and universal.
>>>>>>>>>
>>>>>>>>> Consider the comments:
>>>>>>>>>
>>>>>>>>> Stream>>#next
>>>>>>>>> "Answer the next object accessible by the receiver."
>>>>>>>>>
>>>>>>>>> ReadStream>>#next
>>>>>>>>> "Primitive. Answer the next object in the Stream represented by the
>>>>>>>>> receiver. Fail if the collection of this stream is not an Array or a
>>>>>>>>> String. Fail if the stream is positioned at its end, or if the
>>>>>>>>> position is out of bounds in the collection. Optional. See Object
>>>>>>>>> documentation whatIsAPrimitive."
>>>>>>>>>
>>>>>>>>> Note how there is no talk about returning nil !
>>>>>>>>>
>>>>>>>>> I think we should discuss this first.
>>>>>>>>>
>>>>>>>>> Was the low level change really correct and the right thing to do ?
>>>>>>>>
>>>>>>>> The primitive change proposed doesn't affect this discussion. It will
>>>>>>>> mean that #atEnd returns false (correctly) sometimes, while currently
>>>>>>>> it returns true (incorrectly). The end result is still incorrect, e.g.
>>>>>>>> #contents returns an empty string for /proc/cpuinfo.
>>>>>>>>
>>>>>>>> You're correct about no mention of nil, but we have:
>>>>>>>>
>>>>>>>> FileStream>>next
>>>>>>>>
>>>>>>>>     (position >= readLimit and: [self atEnd])
>>>>>>>>         ifTrue: [^nil]
>>>>>>>>         ifFalse: [^collection at: (position := position + 1)]
>>>>>>>>
>>>>>>>>
>>>>>>>> which has been around for a long time (I suspect, before Pharo
>>>>>>>> existed).
>>>>>>>>
>>>>>>>> Having said that, I think that raising an exception is a better
>>>>>>>> solution, but it is a much, much bigger change than the one I proposed
>>>>>>>> in https://github.com/pharo-project/pharo/pull/1180.
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Alistair
>>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>> yes, if you are after universal behavior encompassing Unix streams, the
>>>>>>> Exception might be the best way.
>>>>>>> Because on special streams you can't always say in advance, you have to
>>>>>>> try.
>>>>>>> That's the solution adopted by the authors of Xtreams.
>>>>>>> But there is a runtime penalty associated with it.
>>>>>>>
>>>>>>> The penalty once was so high that my proposal to generalize EndOfStream
>>>>>>> usage was rejected a few years ago by Andreas Raab.
>>>>>>> http://forum.world.st/EndOfStream-unused-td68806.html
>>>>>>
>>>>>> Thanks for this, I'll definitely take a look.
>>>>>>
>>>>>> Do you have a sense of how Denis' suggestion of using an EndOfStream
>>>>>> object would compare?
>>>>>>
>>>>>> It would keep the same coding style, but avoid the problems with nil.
>>>>>>
>>>>>> Thanks,
>>>>>> Alistair
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I have regularly benched Xtreams, but stopped a few years ago.
>>>>>>> Maybe I can excavate and pass on newer VM.
>>>>>>>
>>>>>>> In the meantime, I had experimented with a programmable end of stream
>>>>>>> behavior (via a block, or any other valuable)
>>>>>>> http://www.squeaksource.com/XTream.htm
>>>>>>> so as to reconcile performance and universality, but it was a source of
>>>>>>> added complexity on the implementation side.
>>>>>>>
>>>>>>> Nicolas
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Note also that Guille introduced something new, #closed, which is
>>>>>>>>> related to the difference between having no more elements (maybe
>>>>>>>>> right now, like an open network stream) and never ever being able to
>>>>>>>>> produce more data.
>>>>>>>>>
>>>>>>>>> Sven
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
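
For reference, the programmable end-of-stream idea quoted above (an endOfStream variable holding any object that understands #value) can be sketched in a playground. This is only an illustration under assumptions: the temporaries data, position, sentinel, endOfStream and next are hypothetical stand-ins, not existing stream API, and a real implementation would presumably live inside the stream classes themselves.

```smalltalk
"Hypothetical playground sketch, not existing Pharo stream API."
| data position sentinel endOfStream next item seen |
data := { 1. nil. 3 }.          "nil is a legitimate element here"
position := 0.
sentinel := Object new.         "assumed unique end-of-stream marker"
endOfStream := [ sentinel ].    "any object that understands #value would do"

"Stand-in for #next: answer the next element, or the value of endOfStream."
next := [ position < data size
	ifTrue: [ position := position + 1. data at: position ]
	ifFalse: [ endOfStream value ] ].

seen := OrderedCollection new.
"Loop until the sentinel comes back; compare by identity, not against nil."
[ (item := next value) == sentinel ] whileFalse: [ seen add: item ].
seen
```

Because the loop tests identity with the sentinel, the nil element inside data is read like any other object, so seen ends up holding 1, nil and 3. Setting endOfStream to a block such as [ Error new signal: 'end of stream' ] would give the exception-raising variant with the same #next code.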