On Fri, Nov 30, 2018 at 10:28 PM Lukasz Cwik <[email protected]> wrote:
>
> On Fri, Nov 30, 2018 at 12:47 PM Robert Bradshaw <[email protected]> wrote:
>>
>> On Fri, Nov 30, 2018 at 7:10 PM Lukasz Cwik <[email protected]> wrote:
>> >
>> > Uh, I'm not sure what your asking.
>>
>> I'm asking why we wanted a markDone in the first place.
>
> When looking at the byte key restriction tracker code, I found a couple of
> bugs around how ranges were being compared and how the byte key range was
> being claimed (we weren't computing the next key correctly). The usage of
> markDone seemed to be a crutch when attempting to correctly implement the
> tryClaim code. Also, all the framework code that powered SDF wasn't aware of
> markDone so it couldn't validate that the last claim failed. So I fixed the
> tryClaim code and then didn't need markDone and removed it since this was the
> only restriction tracker that had it.
>
>> > The SDF API already has a void return on processElement means that a call
>> > to tryClaim must have returned false
>>
>> We could widen this to "or finished the restriction."
>
> Yes, having markDone could be added to the API. Is it a crutch for subtle
> bugs in tryClaim though?
I'm proposing removing the requirement of having either a markDone or
a tryClaim(EndKey).
>> > while a non void return allows the caller to either return STOP (tryClaim
>> > must have returned false) or return RESUME (with a time of when to resume).
>>
>> We could also return STOP if tryClaim never returned false but the
>> restriction was finished.
>>
>> > This allows the framework code to prevent user errors by ensuring the
>> > restriction has been completed.
>>
>> I don't think the framework can ensure this. (It can enforce the above
>> constraints that on a STOP tryClaim did indeed return false on the
>> last call, but I'm fuzzy on the value this actually provides when it
>> just means the use must artificially force it to return a false value.
>> It also means we can't make it an error to try claiming values outside
>> the initial restriction. If we want to make things more explicit, we
>> could require a STOP or RESUME return rather than allow a void
>> return.)
>
> I don't think we want SDF authors to ensure that their values are in the
> initial range first before attempting to claim them as this is the purpose of
> tryClaim. The SDF code would then be checking that the range is valid twice.
>
> processElement() {
> readElement
> isElementInRange?
> if (!tryClaim) {
> return
> }
> }
> (both isElementInRange and tryClaim are now doing the same bounds checking
> which can lead to subtle bounds checking errors).
Generally code would be iterating over the range, and it would likely
be a bug to check past it, but if we want to support code that ignores
range.getEndPosition() and lets tryClaim do all the work I buy that as
a good argument to allow arbitrary claim attempts.
>> Maybe I'm just not clever enough to come up with a kind of source
>> where this could be good at catching errors?
>
> I think the value is that we expect to implement a few restriction tracker
> classes which will be re-used across many SDF implementations. In this case,
> we could point out to the SDF author that they haven't claimed all that they
> said they would process. This would be true whether markDone existed or not.
The general pattern is
processRestriction() {
for (element : source[restriction]) {
if (!tryClaim(element)) {
return STOP
} else {
emit(element)
}
}
tryClaim(everything)
return STOP // or if CONTINUE is returned, omit the above line
}
and I'm having a hard time coming up with any bugs that would be
caught if we didn't require the (seemingly boilerplate)
tryClaim(everything) line. Maybe I'm not thinking of the right source?
>> > Also, "" is the byte key range, the code could have just passed in
>> > range.getEndPosition() in to the final tryClaim, its just that "" is
>> > shorthand and would be similar to passing in Long.MAX_VALUE for the file
>> > offset range.
>>
>> Having to choose a value pass depending on the restriction tracker
>> type is something that could simply be eliminated.
>>
>> > On Fri, Nov 30, 2018 at 2:45 AM Robert Bradshaw <[email protected]>
>> > wrote:
>> >>
>> >> In looking at the SDF examples, it seems error-prone to have to
>> >> remember to write
>> >>
>> >> tryClaim([fake-end-position])
>> >>
>> >> to indicate that a restriction is finished. IIRC, this was done to
>> >> decide whether the entire restriction had been processed on return in
>> >> the case that tryClaim never returned false. It seems preferable to
>> >> encode this into the return value (with a void return meaning iff
>> >> tryClaim returned false, and a non-void return being able to indicate
>> >> any hints as to when, if ever, process should be called again).
>> >>
>> >> Can someone job my memory as to if there was a case in which this
>> >> wouldn't work?