Re: [Pharo-dev] Changed #atEnd primitive - #atEnd vs #next returning nil

Sven Van Caekenberghe Tue, 10 Apr 2018 10:37:04 -0700

I have trouble understanding your problem analysis and how your proposed 
solution would solve it.


Where is it being said that #next and/or #atEnd should be blocking or 
non-blocking ?
How is this related to how EOF is signalled ?

It seems to me that there are quite a few classes of streams that are 'special' 
in the sense that #next could be blocking and/or #atEnd could be unclear - 
socket/network streams, serial streams, maybe stdio (interactive or not). 
Without a message like #isDataAvailable you cannot handle those without 
blocking.

Reading from stdin seems like a very rare case for a Smalltalk system (not that 
it should not be possible).

I have a feeling that too much functionality is being pushed into too small an 
API.

> On 10 Apr 2018, at 18:30, Alistair Grant <akgrant0...@gmail.com> wrote:
> 
> First a quick update:
> 
> After doing some work on primitiveFileAtEnd, #atEnd now answers
> correctly for files that don't report their size correctly, e.g.
> /dev/urandom and /proc/cpuinfo, whether the files are opened directly or
> redirected through stdin.
> 
> However determining whether stdin from a terminal has reached the end of
> file can't be done without making #atEnd blocking since we have to wait
> for the user to flag the end of file, e.g. by typing Ctrl-D.  And #atEnd
> is assumed to be non-blocking.
> 
> So currently using ZnCharacterReadStream with stdin from a terminal will
> result in a stack dump similar to:
> 
> MessageNotUnderstood: receiver of "<" is nil
> UndefinedObject(Object)>>doesNotUnderstand: #<
> ZnUTF8Encoder>>nextCodePointFromStream:
> ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream:
> ZnCharacterReadStream>>nextElement
> ZnCharacterReadStream(ZnEncodedReadStream)>>next
> UndefinedObject>>DoIt
> 
> 
> Going back through the various suggestions that have been made regarding
> using a sentinel object vs. raising a notification / exception, my
> (still to be polished) suggestion is to:
> 
> 1. Add an endOfStream instance variable
> 2. When the end of the stream is reached answer the value of the
>   instance variable (i.e. the result of sending #value to the variable).
> 3. The initial default value would be a block that raises a Deprecation
>   warning and then returns nil.  This would allow existing code to
>   function for a changeover period.
> 4. At the end of the deprecation period the default value would be
>   changed to a unique sentinel object which would answer itself as its
>   #value.
> 
> At any time users of the stream can set their own sentinel, including a
> block that raises an exception.
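> 
> A minimal sketch of steps 2 and 4, assuming an in-memory ReadStream as a
> stand-in for the real stream.  The temporaries (endOfStream, nextElement,
> sentinel) are illustration only and not existing API; endOfStream plays the
> role of the proposed instance variable:
> 
> ```smalltalk
> | source sentinel endOfStream nextElement elements element outcome |
> source := ReadStream on: (Array with: 1 with: nil with: 3).
> 
> "Stands in for the proposed endOfStream instance variable.  Wrapped in a
>  block so that the sketch only relies on BlockClosure>>value."
> sentinel := Object new.
> endOfStream := [ sentinel ].
> 
> "Step 2: at the end, answer the result of sending #value to endOfStream."
> nextElement := [
>     source atEnd
>         ifTrue: [ endOfStream value ]
>         ifFalse: [ source next ] ].
> 
> elements := OrderedCollection new.
> [ (element := nextElement value) == sentinel ]
>     whileFalse: [ elements add: element ].
> "elements holds 1, nil and 3; the embedded nil did not end the loop."
> 
> "A caller can install their own behaviour, e.g. a block that raises."
> endOfStream := [ Error signal: 'end of stream' ].
> outcome := [ nextElement value ] on: Error do: [ :e | e messageText ].
> ```
> 
> The sketch uses #atEnd only because the stand-in is an in-memory
> collection; for stdin from a terminal the end would have to be detected
> by the read itself.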
> 
> 
> Cheers,
> Alistair
> 
> 
> On 4 April 2018 at 19:24, Stephane Ducasse <stepharo.s...@gmail.com> wrote:
>> Thanks for this discussion.
>> 
>> On Wed, Apr 4, 2018 at 1:37 PM, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>>> Alistair,
>>> 
>>> First off, thanks for the discussions and your contributions, I really 
>>> appreciate them.
>>> 
>>> But I want to have a discussion at the high level of the definition and 
>>> semantics of the stream API in Pharo.
>>> 
>>>> On 4 Apr 2018, at 13:20, Alistair Grant <akgrant0...@gmail.com> wrote:
>>>> 
>>>> On 4 April 2018 at 12:56, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>>>>> Playing a bit devil's advocate, the idea is that, in general,
>>>>> 
>>>>> [ stream atEnd] whileFalse: [ stream next. "..." ].
>>>>> 
>>>>> is no longer allowed ?
>>>> 
>>>> It hasn't been allowed "forever" [1].  It's just been misused for
>>>> almost as long.
>>>> 
>>>> [1] Time began when stdio stream support was introduced. :-)
>>> 
>>> I am still not convinced. Another way to put it would be that the old 
>>> #atEnd or #upToEnd do not make sense for these streams and some new loop is 
>>> needed, based on a new test (it exists for socket streams already).
>>> 
>>> [ stream isDataAvailable ] whileTrue: [ stream next ]
>>> 
>>>>> And you want to replace it with
>>>>> 
>>>>> [ stream next ifNil: [ false ] ifNotNil: [ :x | "..." true ] ] whileTrue.
>>>>> 
>>>>> That is a pretty big change, no ?
>>>> 
>>>> That's the way quite a bit of code already operates.
>>>> 
>>>> As Denis pointed out, it's obviously problematic in the general sense,
>>>> since nil can be embedded in non-byte-oriented streams.  I suspect
>>>> that in practice not many people write code that reads from both
>>>> byte-oriented and non-byte-oriented streams.
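>>>> 
>>>> A minimal sketch of the ambiguity, assuming a plain ReadStream over an
>>>> Array that legitimately contains nil (the variable names are only for
>>>> illustration):
>>>> 
>>>> ```smalltalk
>>>> | stream atEndCount nilCount |
>>>> stream := ReadStream on: (Array with: 1 with: nil with: 3).
>>>> 
>>>> "Loop driven by #atEnd: visits all three elements."
>>>> atEndCount := 0.
>>>> [ stream atEnd ] whileFalse: [
>>>>     stream next.
>>>>     atEndCount := atEndCount + 1 ].
>>>> 
>>>> "Loop driven by #next answering nil: stops at the embedded nil."
>>>> stream reset.
>>>> nilCount := 0.
>>>> [ stream next notNil ] whileTrue: [ nilCount := nilCount + 1 ].
>>>> 
>>>> "atEndCount = 3, nilCount = 1"
>>>> ```
>>>> 
>>>> On a byte or character stream the two loops agree, because nil can
>>>> never be an element there.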
>>> 
>>> Maybe yes, maybe no. As Denis' example shows there is a clear definition 
>>> problem.
>>> 
>>> And I do use streams of byte arrays or strings all the time, this is really 
>>> important. I want my parsers to work on all kinds of streams.
>>> 
>>>>> I think/feel like a proper EOF exception would be better, more correct.
>>>>> 
>>>>> [ [ stream next. "..." true ] on: EOF do: [ false ] ] whileTrue.
>>>> 
>>>> I agree, but the email thread Nicolas pointed to raises some
>>>> performance questions about this approach.  It should be
>>>> straightforward to do a basic performance comparison which I'll get
>>>> around to if other objections aren't raised.
>>> 
>>> Reading in bigger blocks, using #readInto:startingAt:count: (which is 
>>> basically Unix's read(2) system call), would solve performance problems, I 
>>> think.
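>>> 
>>> A minimal sketch of such a chunked loop, assuming an in-memory ReadStream
>>> over a ByteArray as a stand-in for a file or socket stream.  The buffer
>>> size of 4096 is arbitrary:
>>> 
>>> ```smalltalk
>>> | stream buffer read total |
>>> stream := ReadStream on: (ByteArray new: 10000 withAll: 7).
>>> buffer := ByteArray new: 4096.
>>> total := 0.
>>> 
>>> "#readInto:startingAt:count: answers how many elements were read;
>>>  zero is taken here to mean that nothing more is available."
>>> [ (read := stream
>>>       readInto: buffer
>>>       startingAt: 1
>>>       count: buffer size) > 0 ]
>>>     whileTrue: [ total := total + read ].
>>> 
>>> "total = 10000"
>>> ```
>>> 
>>> The end-of-stream test then runs once per block instead of once per
>>> element, so its cost matters much less.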
>>> 
>>>>> Will we throw away #atEnd then ? Do we need it if we cannot use it ?
>>>> 
>>>> Unix file i/o returns EOF if the end of file has been reached OR if an
>>>> error occurs.  You should still check #atEnd after reading past the
>>>> end of the file to make sure no error occurred.  Another part of the
>>>> primitive change I'm proposing is to return additional information
>>>> about what went wrong in the event of an error.
>>> 
>>> I am sorry, but this kind of semantics (the OR) is way too complex at the 
>>> general image level, it is too specific and based on certain underlying 
>>> implementation details.
>>> 
>>> Sven
>>> 
>>>> We could modify the read primitive so that it fails if an error has
>>>> occurred, and then #atEnd wouldn't be required.
>>>> 
>>>> Cheers,
>>>> Alistair
>>>> 
>>>> 
>>>> 
>>>>>> On 4 Apr 2018, at 12:41, Alistair Grant <akgrant0...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi Nicolas,
>>>>>> 
>>>>>> On 4 April 2018 at 12:36, Nicolas Cellier
>>>>>> <nicolas.cellier.aka.n...@gmail.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 2018-04-04 12:18 GMT+02:00 Alistair Grant <akgrant0...@gmail.com>:
>>>>>>>> 
>>>>>>>> Hi Sven,
>>>>>>>> 
>>>>>>>> On Wed, Apr 04, 2018 at 11:32:02AM +0200, Sven Van Caekenberghe wrote:
>>>>>>>>> Somehow, somewhere there was a change to the implementation of the
>>>>>>>>> primitive called by some streams' #atEnd.
>>>>>>>> 
>>>>>>>> That's a proposed change by me, but it hasn't been integrated yet.  So
>>>>>>>> the discussion below should apply to the current stable vm (from August
>>>>>>>> last year).
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> IIRC, someone said it is implemented as 'remaining size being zero'
>>>>>>>>> and some virtual unix files like /dev/random are zero sized.
>>>>>>>> 
>>>>>>>> Currently, for files other than stdio (stdout, stderr, stdin) it is
>>>>>>>> effectively defined as:
>>>>>>>> 
>>>>>>>> atEnd := stream position >= stream size
>>>>>>>> 
>>>>>>>> 
>>>>>>>> And, as you say, plenty of virtual unix files report size 0.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Now, all kinds of changes are being done image side to work around
>>>>>>>>> this.
>>>>>>>> 
>>>>>>>> I would phrase this slightly differently :-)
>>>>>>>> 
>>>>>>>> Some code does the right thing, while other code doesn't.  E.g.:
>>>>>>>> 
>>>>>>>> MultiByteFileStream>>upToEnd is good, while
>>>>>>>> FileStream>>contents is incorrect
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> I am a strong believer in simple, real (i.e. infinite) streams, but I
>>>>>>>>> am not sure we are doing the right thing here.
>>>>>>>>> 
>>>>>>>>> Point is, I am not sure #next returning nil is official and universal.
>>>>>>>>> 
>>>>>>>>> Consider the comments:
>>>>>>>>> 
>>>>>>>>> Stream>>#next
>>>>>>>>> "Answer the next object accessible by the receiver."
>>>>>>>>> 
>>>>>>>>> ReadStream>>#next
>>>>>>>>> "Primitive. Answer the next object in the Stream represented by the
>>>>>>>>> receiver. Fail if the collection of this stream is not an Array or a
>>>>>>>>> String.
>>>>>>>>> Fail if the stream is positioned at its end, or if the position is out
>>>>>>>>> of bounds in the collection. Optional. See Object documentation
>>>>>>>>> whatIsAPrimitive."
>>>>>>>>> 
>>>>>>>>> Note how there is no talk about returning nil !
>>>>>>>>> 
>>>>>>>>> I think we should discuss this first.
>>>>>>>>> 
>>>>>>>>> Was the low level change really correct and the right thing to do ?
>>>>>>>> 
>>>>>>>> The primitive change proposed doesn't affect this discussion.  It will
>>>>>>>> mean that #atEnd returns false (correctly) sometimes, while currently
>>>>>>>> it returns true (incorrectly).  The end result is still incorrect, e.g.
>>>>>>>> #contents returns an empty string for /proc/cpuinfo.
>>>>>>>> 
>>>>>>>> You're correct about no mention of nil, but we have:
>>>>>>>> 
>>>>>>>> FileStream>>next
>>>>>>>> 
>>>>>>>>      (position >= readLimit and: [self atEnd])
>>>>>>>>              ifTrue: [^nil]
>>>>>>>>              ifFalse: [^collection at: (position := position + 1)]
>>>>>>>> 
>>>>>>>> 
>>>>>>>> which has been around for a long time (I suspect, since before Pharo
>>>>>>>> existed).
>>>>>>>> 
>>>>>>>> Having said that, I think that raising an exception is a better
>>>>>>>> solution, but it is a much, much bigger change than the one I proposed
>>>>>>>> in https://github.com/pharo-project/pharo/pull/1180.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Alistair
>>>>>>>> 
>>>>>>> 
>>>>>>> Hi,
>>>>>>> yes, if you are after universal behavior encompassing Unix streams, the
>>>>>>> Exception might be the best way.
>>>>>>> Because on special streams you can't always say in advance, you have to
>>>>>>> try.
>>>>>>> That's the solution adopted by authors of Xtreams.
>>>>>>> But there is a runtime penalty associated to it.
>>>>>>> 
>>>>>>> The penalty once was so high that my proposal to generalize EndOfStream
>>>>>>> usage was rejected a few years ago by Andreas Raab.
>>>>>>> http://forum.world.st/EndOfStream-unused-td68806.html
>>>>>> 
>>>>>> Thanks for this, I'll definitely take a look.
>>>>>> 
>>>>>> Do you have a sense of how Denis' suggestion of using an EndOfStream
>>>>>> object would compare?
>>>>>> 
>>>>>> It would keep the same coding style, but avoid the problems with nil.
>>>>>> 
>>>>>> Thanks,
>>>>>> Alistair
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> I used to benchmark Xtreams regularly, but stopped a few years ago.
>>>>>>> Maybe I can dig the benchmarks out and run them on a newer VM.
>>>>>>> 
>>>>>>> In the meantime, I had experimented with a programmable end of stream
>>>>>>> behavior (via a block, or any other valuable)
>>>>>>> http://www.squeaksource.com/XTream.htm
>>>>>>> so as to reconcile performance and universality, but it was a source of
>>>>>>> added complexity on the implementation side.
>>>>>>> 
>>>>>>> Nicolas
>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Note also that Guille introduced something new, #closed, which is
>>>>>>>>> related to the difference between having no more elements (maybe
>>>>>>>>> right now, like an open network stream) and never ever being able to
>>>>>>>>> produce more data.
>>>>>>>>> 
>>>>>>>>> Sven
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>
