2018-04-11 8:32 GMT+02:00 Alistair Grant <akgrant0...@gmail.com>: > >>> Where is it being said that #next and/or #atEnd should be blocking or > non-blocking ? > >> > >> There is existing code that assumes that #atEnd is non-blocking and > >> that #next is allowed to block. I believe that we should keep those > >> conditions. > > > > I fail to see where that is written down, either way. Can you point me > to comments stating that, I would really like to know ? > > I'm not aware of it being written down, just that every existing > implementation I'm aware of behaves this way. > > On the other hand, making #atEnd blocking breaks Eliot's REPL sample > (in Squeak). >
Could you post that example here, please? > > > > >>> How is this related to how EOF is signalled ? > >> > >> Because, combined with terminal EOF not being known until the user > >> explicitly flags it (with Ctrl-D) it means that #atEnd can't be used > >> for iterating over input from stdin connected to a terminal. > > > > This seems to me like an exception that only holds for one particular > stream in one particular scenario (interactive stdin). I might be wrong. > > > >>> It seems to me that there are quite a few classes of streams that are > 'special' in the sense that #next could be blocking and/or #atEnd could be > unclear - socket/network streams, serial streams, maybe stdio (interactive > or not). Without a message like #isDataAvailable you cannot handle those > without blocking. > >> > >> Right. I think this is a distraction (I was trying to explain some > >> details, but it's causing more confusion instead of helping). > >> > >> The important point is that #atEnd doesn't work for iterating over > >> streams with terminal input. > > > > Maybe you should also point to the actual code that fails. I mean you > showed a partial stack trace, but not how you got there, precisely. What > does the application reading from an interactive stdin do to get into > trouble ? > > Included below. > > > >>> Reading from stdin seems like a very rare case for a Smalltalk system > (not that it should not be possible). > >> > >> There's been quite a bit of discussion and several projects recently > >> related to using Pharo for scripting, so it may become more common. > >> E.g. > >> > >> https://www.quora.com/Can-Smalltalk-be-a-batch-file-scripting-language/answer/Philippe-Back-1?share=c19bfc95 > >> https://github.com/rajula96reddy/pharo-cli > > > > Still, it is not common at all. > > > >>> I have a feeling that too much functionality is being pushed into too > small an API. > >> > >> This is just about how Zinc streams should be iterating over the > >> underlying streams.
You didn't like checking the result of #next for > >> nil since it isn't general, correctly pointing out that nil is a valid > >> value for non-byte oriented streams. But #atEnd doesn't work for > >> stdin from a terminal. > >> > >> > >> At this point I think there are three options: > >> > >> 1. Modify Zinc to check the return value of #next instead of using > #atEnd. > >> > >> This is what all existing character / byte oriented streams in Squeak > >> and Pharo do. At that point the Zinc streams can be used on all file > >> / stdio input and output. > > > > I agree that such code exists in many places, but there is lots of > stream reading that does not check for nils. > > Right. Streams can be categorised in many ways, but for this > discussion I think streams are broken in to two types: > > 1) Byte / Character oriented > 2) All others > > For historical reasons, byte / character oriented streams need to > check for EOF by using "stream next == nil" and all other streams > should use #atEnd. > > This avoids the "nil being part of the domain" issue that was > discussed earlier in the thread. > > > >> 2. Modify all streams to signal EOF in some other way, i.e. a sentinel > >> or notification / exception. > >> > >> This is what we were discussing below. But it is a decent chunk of > >> work with significant impact on the existing code base. > > > > Agreed. This would be a future extension. > > > >> 3. Require anyone who wants to read from stdin to code around Zinc's > >> inability to handle terminal input. > >> > >> I'd prefer to avoid this option if possible. > > > > See higher for a more concrete usage example request. > > > testAtEnd.st > -- > | ch stream string stdin | > > 'stdio.cs' asFileReference fileIn. > "stdin := FileStream stdin." > stdin := ZnCharacterReadStream on: > (ZnBufferedReadStream on: > Stdio stdin). > stream := (String new: 100) writeStream. > ch := stdin next. > [ ch == nil ] whileFalse: [ > stream nextPut: ch. > ch := stdin next. ]. 
> string := stream contents. > FileStream stdout > nextPutAll: string; lf; > nextPutAll: 'Characters read: '; > nextPutAll: string size asString; > lf. > Smalltalk snapshot: false andQuit: true. > -- > > Execute with: > > ./pharo --headless Pharo7.0-64bit-e76f1a2.image testAtEnd.st > > and type Ctrl-D gives: > > > 'Errors in script loaded from testAtEnd.st' > MessageNotUnderstood: receiver of "<" is nil > UndefinedObject(Object)>>doesNotUnderstand: #< > ZnUTF8Encoder>>nextCodePointFromStream: > ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream: > ZnCharacterReadStream>>nextElement > ZnCharacterReadStream(ZnEncodedReadStream)>>next > UndefinedObject>>DoIt > OpalCompiler>>evaluate > > > Using #atEnd to control the loop instead of "stdin next == nil" > produces the same result. > > Replacing stdin with FileStream stdin makes the script work. > > stdio.cs fixes a bug in StdioStream which really isn't part of this > discussion (PR to be submitted). > > Cheers, > Alistair > > > > > >> Does that clarify the situation? > > > > Yes, it helps. Thanks. But questions remain. > > > >> Thanks, > >> Alistair > >> > >> > >> > >>>> On 10 Apr 2018, at 18:30, Alistair Grant <akgrant0...@gmail.com> > wrote: > >>>> > >>>> First a quick update: > >>>> > >>>> After doing some work on primitiveFileAtEnd, #atEnd now answers > >>>> correctly for files that don't report their size correctly, e.g. > >>>> /dev/urandom and /proc/cpuinfo, whether the files are opened directly > or > >>>> redirected through stdin. > >>>> > >>>> However determining whether stdin from a terminal has reached the end > of > >>>> file can't be done without making #atEnd blocking since we have to > wait > >>>> for the user to flag the end of file, e.g. by typing Ctrl-D. And > #atEnd > >>>> is assumed to be non-blocking. 
> >>>> > >>>> So currently using ZnCharacterReadStream with stdin from a terminal > will > >>>> result in a stack dump similar to: > >>>> > >>>> MessageNotUnderstood: receiver of "<" is nil > >>>> UndefinedObject(Object)>>doesNotUnderstand: #< > >>>> ZnUTF8Encoder>>nextCodePointFromStream: > >>>> ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream: > >>>> ZnCharacterReadStream>>nextElement > >>>> ZnCharacterReadStream(ZnEncodedReadStream)>>next > >>>> UndefinedObject>>DoIt > >>>> > >>>> > >>>> Going back through the various suggestions that have been made > regarding > >>>> using a sentinel object vs. raising a notification / exception, my > >>>> (still to be polished) suggestion is to: > >>>> > >>>> 1. Add an endOfStream instance variable > >>>> 2. When the end of the stream is reached answer the value of the > >>>> instance variable (i.e. the result of sending #value to the > variable). > >>>> 3. The initial default value would be a block that raises a > Deprecation > >>>> warning and then returns nil. This would allow existing code to > >>>> function for a changeover period. > >>>> 4. At the end of the deprecation period the default value would be > >>>> changed to a unique sentinel object which would answer itself as its > >>>> #value. > >>>> > >>>> At any time users of the stream can set their own sentinel, including > a > >>>> block that raises an exception. > >>>> > >>>> > >>>> Cheers, > >>>> Alistair > >>>> > >>>> > >>>> On 4 April 2018 at 19:24, Stephane Ducasse <stepharo.s...@gmail.com> > wrote: > >>>>> Thanks for this discussion. > >>>>> > >>>>> On Wed, Apr 4, 2018 at 1:37 PM, Sven Van Caekenberghe <s...@stfx.eu> > wrote: > >>>>>> Alistair, > >>>>>> > >>>>>> First off, thanks for the discussions and your contributions, I > really appreciate them. > >>>>>> > >>>>>> But I want to have a discussion at the high level of the definition > and semantics of the stream API in Pharo. 
> >>>>>> > >>>>>>> On 4 Apr 2018, at 13:20, Alistair Grant <akgrant0...@gmail.com> > wrote: > >>>>>>> > >>>>>>> On 4 April 2018 at 12:56, Sven Van Caekenberghe <s...@stfx.eu> > wrote: > >>>>>>>> Playing a bit devil's advocate, the idea is that, in general, > >>>>>>>> > >>>>>>>> [ stream atEnd ] whileFalse: [ stream next. "..." ]. > >>>>>>>> > >>>>>>>> is no longer allowed ? > >>>>>>> > >>>>>>> It hasn't been allowed "forever" [1]. It's just been misused for > >>>>>>> almost as long. > >>>>>>> > >>>>>>> [1] Time began when stdio stream support was introduced. :-) > >>>>>> > >>>>>> I am still not convinced. Another way to put it would be that the > old #atEnd or #upToEnd do not make sense for these streams and some new > loop is needed, based on a new test (it exists for socket streams already). > >>>>>> > >>>>>> [ stream isDataAvailable ] whileTrue: [ stream next ] > >>>>>> > >>>>>>>> And you want to replace it with > >>>>>>>> > >>>>>>>> [ stream next ifNil: [ false ] ifNotNil: [ :x | "..." true ] ] > whileTrue. > >>>>>>>> > >>>>>>>> That is a pretty big change, no ? > >>>>>>> > >>>>>>> That's the way quite a bit of code already operates. > >>>>>>> > >>>>>>> As Denis pointed out, it's obviously problematic in the general > sense, > >>>>>>> since nil can be embedded in non-byte oriented streams. I suspect > >>>>>>> that in practice not many people write code that reads from > >>>>>>> both byte oriented and non-byte oriented streams. > >>>>>> > >>>>>> Maybe yes, maybe no. As Denis' example shows there is a clear > definition problem. > >>>>>> > >>>>>> And I do use streams of byte arrays or strings all the time, this > is really important. I want my parsers to work on all kinds of streams. > >>>>>> > >>>>>>>> I think/feel like a proper EOF exception would be better, more > correct. > >>>>>>>> > >>>>>>>> [ [ stream next. "..." true ] on: EOF do: [ false ] ] whileTrue.
> >>>>>>> > >>>>>>> I agree, but the email thread Nicolas pointed to raises some > >>>>>>> performance questions about this approach. It should be > >>>>>>> straightforward to do a basic performance comparison which I'll get > >>>>>>> around to if other objections aren't raised. > >>>>>> > >>>>>> Reading in bigger blocks, using #readInto:startingAt:count: (which > is basically Unix's read(2) system call), would solve performance problems, I > think. > >>>>>> > >>>>>>>> Will we throw away #atEnd then ? Do we need it if we cannot use > it ? > >>>>>>> > >>>>>>> Unix file i/o returns EOF if the end of file has been reached OR if > an > >>>>>>> error occurs. You should still check #atEnd after reading past the > >>>>>>> end of the file to make sure no error occurred. Another part of > the > >>>>>>> primitive change I'm proposing is to return additional information > >>>>>>> about what went wrong in the event of an error. > >>>>>> > >>>>>> I am sorry, but this kind of semantics (the OR) is way too complex > at the general image level, it is too specific and based on certain > underlying implementation details. > >>>>>> > >>>>>> Sven > >>>>>> > >>>>>>> We could modify the read primitive so that it fails if an error has > >>>>>>> occurred, and then #atEnd wouldn't be required. > >>>>>>> > >>>>>>> Cheers, > >>>>>>> Alistair > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>>> On 4 Apr 2018, at 12:41, Alistair Grant <akgrant0...@gmail.com> > wrote: > >>>>>>>>> > >>>>>>>>> Hi Nicolas, > >>>>>>>>> > >>>>>>>>> On 4 April 2018 at 12:36, Nicolas Cellier > >>>>>>>>> <nicolas.cellier.aka.n...@gmail.com> wrote: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> 2018-04-04 12:18 GMT+02:00 Alistair Grant < > akgrant0...@gmail.com>: > >>>>>>>>>>> > >>>>>>>>>>> Hi Sven, > >>>>>>>>>>> > >>>>>>>>>>> On Wed, Apr 04, 2018 at 11:32:02AM +0200, Sven Van > Caekenberghe wrote: > >>>>>>>>>>>> Somehow, somewhere there was a change to the implementation > of the > >>>>>>>>>>>> primitive called by some streams' #atEnd.
> >>>>>>>>>>> > >>>>>>>>>>> That's a proposed change by me, but it hasn't been integrated > yet. So > >>>>>>>>>>> the discussion below should apply to the current stable VM > (from August > >>>>>>>>>>> last year). > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> IIRC, someone said it is implemented as 'remaining size being > zero' > >>>>>>>>>>>> and some virtual Unix files like /dev/random are zero sized. > >>>>>>>>>>> > >>>>>>>>>>> Currently, for files other than stdio (stdout, stderr, stdin) > it is > >>>>>>>>>>> effectively defined as: > >>>>>>>>>>> > >>>>>>>>>>> atEnd := stream position >= stream size > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> And, as you say, plenty of virtual Unix files report size 0. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> Now, all kinds of changes are being done image side to work > around this. > >>>>>>>>>>> > >>>>>>>>>>> I would phrase this slightly differently :-) > >>>>>>>>>>> > >>>>>>>>>>> Some code does the right thing, while other code doesn't. > E.g.: > >>>>>>>>>>> > >>>>>>>>>>> MultiByteFileStream>>upToEnd is good, while > >>>>>>>>>>> FileStream>>contents is incorrect > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> I am a strong believer in simple, real (i.e. infinite) > streams, but I > >>>>>>>>>>>> am not sure we are doing the right thing here. > >>>>>>>>>>>> > >>>>>>>>>>>> Point is, I am not sure #next returning nil is official and > universal. > >>>>>>>>>>>> > >>>>>>>>>>>> Consider the comments: > >>>>>>>>>>>> > >>>>>>>>>>>> Stream>>#next > >>>>>>>>>>>> "Answer the next object accessible by the receiver." > >>>>>>>>>>>> > >>>>>>>>>>>> ReadStream>>#next > >>>>>>>>>>>> "Primitive. Answer the next object in the Stream represented > by the > >>>>>>>>>>>> receiver. Fail if the collection of this stream is not an > Array or a > >>>>>>>>>>>> String. > >>>>>>>>>>>> Fail if the stream is positioned at its end, or if the > position is out > >>>>>>>>>>>> of > >>>>>>>>>>>> bounds in the collection. Optional.
See Object documentation > >>>>>>>>>>>> whatIsAPrimitive." > >>>>>>>>>>>> > >>>>>>>>>>>> Note how there is no talk about returning nil ! > >>>>>>>>>>>> > >>>>>>>>>>>> I think we should discuss this first. > >>>>>>>>>>>> > >>>>>>>>>>>> Was the low level change really correct and the right thing > to do ? > >>>>>>>>>>> > >>>>>>>>>>> The primitive change proposed doesn't affect this discussion. > It will > >>>>>>>>>>> mean that #atEnd returns false (correctly) sometimes, while > currently it > >>>>>>>>>>> returns true (incorrectly). The end result is still > incorrect, e.g. > >>>>>>>>>>> #contents returns an empty string for /proc/cpuinfo. > >>>>>>>>>>> > >>>>>>>>>>> You're correct about no mention of nil, but we have: > >>>>>>>>>>> > >>>>>>>>>>> FileStream>>next > >>>>>>>>>>> > >>>>>>>>>>> (position >= readLimit and: [self atEnd]) > >>>>>>>>>>> ifTrue: [^nil] > >>>>>>>>>>> ifFalse: [^collection at: (position := position + > 1)] > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> which has been around for a long time (I suspect, before Pharo > existed). > >>>>>>>>>>> > >>>>>>>>>>> Having said that, I think that raising an exception is a better > >>>>>>>>>>> solution, but it is a much, much bigger change than the one I > proposed > >>>>>>>>>>> in https://github.com/pharo-project/pharo/pull/1180. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Cheers, > >>>>>>>>>>> Alistair > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Hi, > >>>>>>>>>> yes, if you are after universal behavior encompassing Unix > streams, the > >>>>>>>>>> Exception might be the best way. > >>>>>>>>>> Because on special streams you can't always say in advance, you > have to try. > >>>>>>>>>> That's the solution adopted by authors of Xtreams. > >>>>>>>>>> But there is a runtime penalty associated with it. > >>>>>>>>>> > >>>>>>>>>> The penalty once was so high that my proposal to generalize > EndOfStream > >>>>>>>>>> usage was rejected a few years ago by Andreas Raab.
> >>>>>>>>>> http://forum.world.st/EndOfStream-unused-td68806.html > >>>>>>>>> > >>>>>>>>> Thanks for this, I'll definitely take a look. > >>>>>>>>> > >>>>>>>>> Do you have a sense of how Denis' suggestion of using an > EndOfStream > >>>>>>>>> object would compare? > >>>>>>>>> > >>>>>>>>> It would keep the same coding style, but avoid the problems with > nil. > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Alistair > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> I have regularly benched Xtreams, but stopped a few years ago. > >>>>>>>>>> Maybe i can excavate and pass on newer VM. > >>>>>>>>>> > >>>>>>>>>> In the mean time, i had experimented a programmable end of > stream behavior > >>>>>>>>>> (via a block, or any other valuable) > >>>>>>>>>> http://www.squeaksource.com/XTream.htm > >>>>>>>>>> so as to reconcile performance and universality, but it was a > source of > >>>>>>>>>> complexification at implementation side. > >>>>>>>>>> > >>>>>>>>>> Nicolas > >>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> Note also that a Guille introduced something new, #closed > which is > >>>>>>>>>>>> related to the difference between having no more elements > (maybe right now, > >>>>>>>>>>>> like an open network stream) and never ever being able to > produce more data. > >>>>>>>>>>>> > >>>>>>>>>>>> Sven > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>>> > >>>>> > >>>> > >>> > >>> > >> <stdio.cs> > > > > >
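
For readers following the sentinel idea discussed in this thread (an end-of-stream value that cannot be confused with nil), here is a rough playground sketch. It is only an illustration under stated assumptions: a Pharo image where ReadStream on: works over an Array and where Object>>value answers the receiver. The names endOfStream and nextOrEnd are hypothetical and are not part of any existing stream class. The helper block still relies on #atEnd, so it says nothing about the terminal stdin case; it only shows why a unique sentinel copes with nil being a legitimate element.

```smalltalk
| source endOfStream nextOrEnd item collected |

"A non-byte-oriented stream in which nil is a valid element."
source := ReadStream on: #(1 nil 3).

"Hypothetical sentinel: any unique object will do.
 Assumption: Object>>value answers the receiver."
endOfStream := Object new.

"Hypothetical helper: answer the next element, or the sentinel's
 value once the underlying stream is exhausted."
nextOrEnd := [ source atEnd
	ifTrue: [ endOfStream value ]
	ifFalse: [ source next ] ].

collected := OrderedCollection new.

"Loop on identity with the sentinel instead of comparing against nil."
[ (item := nextOrEnd value) == endOfStream ]
	whileFalse: [ collected add: item ].

collected
```

With a nil-based test the loop would stop at the embedded nil; comparing against the sentinel lets all three elements through. Swapping the sentinel for a block that signals an exception would give the exception-based variant discussed earlier in the thread without changing the reading loop's shape.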