Hi Martin, others,

On 15-12-21 01:59 AM, Martin Hosken wrote:
> Dear Behdad,
>>  buf = hb.buffer_create ()
>> +class Debugger(object):
>> +    def message (self, buf, font, msg, data, _x_what_is_this):
>> +            print(msg)
>> +            return True
>> +debugger = Debugger()
>> +hb.buffer_set_message_func (buf, debugger.message, 1, 0)
>>  hb.buffer_add_utf8 (buf, text.encode('utf-8'), 0, -1)
>>  hb.buffer_guess_segment_properties (buf)
> Yippee. At last, a debug interface :) (Behdad reminds me that I have been 
> asking this once per year for the last 4 years!). Thank you.
> OK. Now to make a great debug interface!
> There are two ways of doing a debug interface: Event driven and One shot. 
> There are probably more, but those are the only two that come to mind now. 
> One shot sends all the information needed to give all the debug information 
> for a debug point in its message. This allows the debugger not to have to 
> keep state, but just record the results and pass them on. Event driven sends, 
> well, events to the debugger and requires the debugger to keep state.
> While one shot seems more inviting and is more in line with what Graphite 
> does, I think for harfbuzz, I would recommend an event based debugger, where 
> you send debug events at the start and end of every lookup, at recursion, 
> during initial reordering and shaping, at dotted circle insertion, etc. and 
> have an enum of events and let the debugger work out what it wants to do with 
> that information.

Agreed about stateful.

> So, I would add an enum to the debug message to give a debug message event 
> type.

My current thinking is that everything is transferred as a text API in
one-line messages.  The client can transform that to an enum if desired.
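
A minimal sketch of that client-side transformation, in Python.  The message
strings "start lookup N" and "end lookup N" are assumptions made up for
illustration, since the message syntax is not documented yet; Event and
classify are hypothetical names, not part of HarfBuzz.

```python
import enum


class Event(enum.Enum):
    # The string values are ASSUMED message prefixes, not a documented syntax.
    START_LOOKUP = "start lookup"
    END_LOOKUP = "end lookup"
    UNKNOWN = "unknown"


def classify(msg):
    """Map a one-line text message to an Event by its leading words."""
    for event in (Event.START_LOOKUP, Event.END_LOOKUP):
        if msg.startswith(event.value):
            return event
    # Anything the client does not recognise collapses to UNKNOWN.
    return Event.UNKNOWN
```

A client that prefers enums can call classify(msg) inside its message
callback and switch on the result.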

> One big question that always needs to be answered in the debugger is: where 
> are we? Where in the buffer are we now processing. This is the idx field of 
> the buffer. I don't think this is exposed in the public buffer interface. So 
> it either needs to be exposed or passed as part of the debug message.

I'm unsure about this one.  We don't expose the out_buf part of the buffer, so
calling client code in the middle of a pass of transformation is harmful
currently.  Exposing all of that, on the other hand, leaks a lot of the buffer
design, which I like to avoid right now.  Indeed, we might end up changing the
buffer internals to accommodate the lookup direction proposal.

So, for now, no callbacks in the middle of a pass.  I understand that's far
from ideal, but at least we are now answering the big question: which lookup
did what.
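
As a rough sketch of how a client could answer "which lookup did what" from
messages sent between passes: snapshot the glyphs at the start of a lookup
and compare at the end.  Everything here is an assumption for illustration:
the "start lookup N" / "end lookup N" strings, the LookupDiffer class, and
the get_glyphs callable, which stands in for however the client reads the
buffer contents.

```python
class LookupDiffer(object):
    """Record the lookups that changed the glyph sequence."""

    def __init__(self, get_glyphs):
        # get_glyphs is a HYPOTHETICAL callable returning the current glyphs.
        self.get_glyphs = get_glyphs
        self.before = None
        self.changes = []  # entries are (lookup index string, before, after)

    def message(self, msg):
        # ASSUMED message syntax: "start lookup N" / "end lookup N".
        if msg.startswith("start lookup "):
            self.before = list(self.get_glyphs())
        elif msg.startswith("end lookup ") and self.before is not None:
            after = list(self.get_glyphs())
            if after != self.before:
                index = msg[len("end lookup "):]
                self.changes.append((index, self.before, after))
            self.before = None
        return True  # same return value as the callback in the patch above
```

This does not handle nested lookups: a second "start" before the matching
"end" simply overwrites the snapshot.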

> I suggest that rather than relying on a message to give the lookup number, 
> that the lookup number be passed as a separate parameter (or in a struct or 
> whatever). The lookup number can be overloaded based on event type. So we 
> could have a starting high level phase event type and use the lookup to say 
> whether that is initial shaping, GSUB, GPOS, etc. for example. Or we could 
> have different event types for each one. That's up to you.

While for regular C APIs I fully agree with you, for this, I'd rather we keep
it as a simple string.  enums and tagged-union types are a headache for
language bindings and even serialization, whereas with 5 lines of code I could
get a debugger going from Python.  We just need to document the message
syntax completely and it will include all the info that the enum-and-struct
approach does; and performance is definitely not a problem here.

Plus, with a message-based API, clients can handle unknown messages to a
certain degree (eg, printing them out).
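
To illustrate both points, here is a sketch of a client that recovers a small
struct from a message and falls back to printing whatever it does not
understand.  The "start lookup N" / "end lookup N" syntax is again an
assumption, and LookupEvent, parse_message and handle are hypothetical names.

```python
import re
from collections import namedtuple

# Client-side struct recovered from the text; not a HarfBuzz type.
LookupEvent = namedtuple("LookupEvent", "phase index")

# ASSUMED message syntax: "start lookup N" or "end lookup N".
_LOOKUP_RE = re.compile(r"^(start|end) lookup (\d+)$")


def parse_message(msg):
    """Return a LookupEvent, or None if the message is not recognised."""
    m = _LOOKUP_RE.match(msg)
    if m is None:
        return None
    return LookupEvent(phase=m.group(1), index=int(m.group(2)))


def handle(msg, log=print):
    """Parse known messages; pass unknown ones to log unchanged."""
    event = parse_message(msg)
    if event is None:
        log("unhandled: " + msg)
    return event
```

A client built this way keeps working, in a degraded form, when a newer
library starts sending messages it has never seen.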

> I think we need to send a message each shaper pause when the pause occurs.

Yes, one at the beginning, another at the end.
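
With paired begin/end messages, the state a debugger has to keep can be as
small as a stack.  A sketch, assuming every message has the form
"start <what>" or "end <what>" (an assumption, not a documented syntax);
NestingTracker is a hypothetical name.

```python
class NestingTracker(object):
    """Keep a stack of open phases and an indented trace of what started."""

    def __init__(self):
        self.stack = []  # phases that have started but not yet ended
        self.trace = []  # one indented line per "start" message

    def message(self, msg):
        # ASSUMED message syntax: "start <what>" / "end <what>".
        verb, _, what = msg.partition(" ")
        if verb == "start":
            self.trace.append("  " * len(self.stack) + what)
            self.stack.append(what)
        elif verb == "end" and self.stack and self.stack[-1] == what:
            self.stack.pop()
        return True  # same return value as the callback in the patch above
```

Feeding it "start table GSUB", "start lookup 3", "end lookup 3",
"end table GSUB" leaves the stack empty and the trace holding "table GSUB"
followed by an indented "lookup 3".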

> For GPOS we need to be passing parameters like the two points in an 
> attachment or the actual calculated offset in a pair or single adjustment. 
> When doing class-based activities, we should be passing the class values 
> involved or perhaps pointers (or offsets) to the data structures involved so 
> that a debugger can cross-reference that back to source code.

GPOS is more friendly since the buffer structure is fully exposed.  Though,
deferred attachments won't be exposed.

> What does that look like now:
> debug_message(type, buf, idx, lkupidx, void *aptr, void *bptr, msg, ...)
> where aptr and bptr are defined by type and lkupidx and may point to things 
> like an attachment point record or a lookup record in a class based 
> contextual lookup or somesuch. They may also point to debugger specific data 
> structures (perhaps for an attachment point one needs a pointer to the ap 
> record and 2 floats for the resolved x,y coordinates).

That's definitely one thing I *don't* want.

> You know, if we get this right, we should be able to drop the msg, ... since 
> debuggers really don't want to have to parse textual messages. Yes they are 
> easy for a quick trace, but not for a real debugger. But it's welcome to stay 
> to make such tracing programs' lives easier, but it shouldn't contain 
> anything that isn't in the other parameters. If it does, then we need a way 
> to pass it outside the message.

Right.  But I really don't want to add 35-and-growing different structs to
HarfBuzz, just for debugging, either.  Since debuggers can recover whatever
structs they want from the message, and this is a side API I like to keep to a
minimum in HarfBuzz, the message API wins IMO.

> And yes, while I'm trying to define what the kitchen sink is, I'm also trying 
> to keep this lightweight.
> I know the moment I hit send, I'll think of things I've forgotten!


I'm probably going to add shape_plan to the list of arguments.  After that, if
I make a release, the API is here to stay...  So, speak very loudly if you
think for whatever reason this is not workable, i.e., if there are things that
cannot be done using a message.  I can't think of any.

