Hi Martin, others,

On 15-12-21 01:59 AM, Martin Hosken wrote:
> Dear Behdad,
>
>>  buf = hb.buffer_create ()
>> +class Debugger(object):
>> +    def message (self, buf, font, msg, data, _x_what_is_this):
>> +        print(msg)
>> +        return True
>> +debugger = Debugger()
>> +hb.buffer_set_message_func (buf, debugger.message, 1, 0)
>>  hb.buffer_add_utf8 (buf, text.encode('utf-8'), 0, -1)
>>  hb.buffer_guess_segment_properties (buf)
>
> Yippee. At last, a debug interface :) (Behdad reminds me that I have been
> asking this once per year for the last 4 years!). Thank you.
>
> OK. Now to make a great debug interface!
>
> There are two ways of doing a debug interface: Event driven and One shot.
> There are probably more, but those are the only two that come to mind now.
> One shot sends all the information needed to give all the debug information
> for a debug point in its message. This allows the debugger not to have to
> keep state, but just record the results and pass them on. Event driven sends,
> well, events to the debugger and requires the debugger to keep state.
>
> While one shot seems more inviting and is more in line with what Graphite
> does. I think for harfbuzz, I would recommend an event based debugger, where
> you send debug events at the start and end of every lookup, at recursion,
> during initial reordering and shaping, at dotted circle insertion, etc. and
> have an enum of events and let the debugger work out what it wants to do with
> that information.

Agreed about stateful.

> So, I would add an enum to the debug message to give a debug message event
> type.

My current thinking is that everything is transferred as a text API in
one-line messages.  The client can transform that to an enum if desired.

> One big question that always needs to be answered in the debugger is: where
> are we? Where in the buffer are we now processing. This is the idx field of
> the buffer. I don't think this is exposed in the public buffer interface. So
> it either needs to be exposed or passed as part of the debug message.

I'm unsure about this one.  We don't expose the out_buf part of the buffer, so
calling client code in the middle of a pass of transformation is harmful
currently.  Exposing all of that, on the other hand, leaks a lot of the buffer
design, which I like to avoid right now.  Indeed, we might end up changing the
buffer internals to accommodate the lookup direction proposal.

So, for now, no callbacks in the middle of a pass.  I understand that's far
from ideal, but at least we are now answering the big question: which lookup
did what.

> I suggest that rather than relying on a message to give the lookup number,
> that the lookup number be passed as a separate parameter (or in a struct or
> whatever). The lookup number can be overloaded based on event type. So we
> could have a starting high level phase event type and use the lookup to say
> whether that is initial shaping, GSUB, GPOS, etc. for example. Or we could
> have different event types for each one. That's up to you.

While for regular C APIs I fully agree with you, for this, I'd rather we keep
it as a simple string.  Enums and tagged-union types are a headache for
language bindings and even serialization, whereas with 5 lines of code I could
get a debugger going from Python.  We just need to document the message syntax
completely and it will include all the info that the enum-and-struct approach
does; and performance is definitely not a problem here.

Plus, with a message-based API, clients can handle unknown messages to a
certain degree (eg, printing them out).

> I think we need to send a message each shaper pause when the pause occurs.

Yes, one at the beginning, another at the end.

> For GPOS we need to be passing parameters like the two points in an
> attachment or the actual calculated offset in a pair or single adjustment.
> When doing classed based activities, we should be passing the class values
> involved or perhaps pointers (or offsets) to the data structures involved so
> that a debugger can turn cross reference that back to source code.

GPOS is more friendly since the buffer structure is fully exposed.  Though,
deferred attachments won't be exposed.

> What does that look like now:
>
> debug_message(type, buf, idx, lkupidx, void *aptr, void *bptr, msg, ...)
>
> where aptr and bptr are defined by type and lkupidx and may point to things
> like an attachment point record or a lookup record in a class based
> contextual lookup or somesuch. They may also point to debugger specific data
> structures (perhaps for an attachment point one needs a pointer to the ap
> record and 2 floats for the resolved x,y coordinates).

That's definitely one thing I *don't* want.

> You know, if we get this right, we should be able to drop the msg, ... since
> debuggers really don't want to have to parse textual messages. Yes they are
> easy for a quick trace, but not for a real debugger. But it's welcome to stay
> to make such tracing programs' lives easier, but it shouldn't contain
> anything that isn't in the other parameters. If it does, then we need a way
> to pass it outside the message.

Right.  But I really don't want to add 35-and-growing different structs to
HarfBuzz, just for debugging, either.  Since debuggers can recover whatever
structs they want from the message, and this is a side API I like to keep to a
minimum in HarfBuzz, the message API wins IMO.
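
To make the "client can transform that to an enum" point concrete, here is a
rough sketch of a stateful client.  The message strings "start lookup N" and
"end lookup N" are made up for illustration, since the message syntax is not
documented yet, and the Event enum and parse_message helper are hypothetical
names, not part of HarfBuzz.  The callback signature is the one from the
snippet quoted at the top.

```python
import enum
import re


class Event(enum.Enum):
    START_LOOKUP = "start lookup"
    END_LOOKUP = "end lookup"
    UNKNOWN = "unknown"


# Assumed (hypothetical) syntax: "start lookup <index>" / "end lookup <index>".
_LOOKUP_RE = re.compile(r"^(start|end) lookup (\d+)$")


def parse_message(msg):
    """Turn a one-line text message into (Event, lookup index or None)."""
    m = _LOOKUP_RE.match(msg)
    if m is None:
        return Event.UNKNOWN, None
    kind = Event.START_LOOKUP if m.group(1) == "start" else Event.END_LOOKUP
    return kind, int(m.group(2))


class Debugger(object):
    def __init__(self):
        self.lookup_stack = []  # state lives in the client, not in the message
        self.trace = []         # (event, lookup index, nesting depth) records

    def message(self, buf, font, msg, data, _x_what_is_this):
        event, index = parse_message(msg)
        if event is Event.START_LOOKUP:
            self.lookup_stack.append(index)
        elif event is Event.END_LOOKUP:
            if self.lookup_stack and self.lookup_stack[-1] == index:
                self.lookup_stack.pop()
        else:
            # Unknown messages are still useful: just print them out.
            print(msg)
        self.trace.append((event, index, len(self.lookup_stack)))
        return True
```

An instance's message method would be passed to hb.buffer_set_message_func
exactly as in the quoted snippet; the parsing itself does not need HarfBuzz at
all, which is the attraction of keeping the API to plain strings.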

> And yes, while I'm trying to define what the kitchen sink is, I'm also trying
> to keep this lightweight.
>
> I know the moment I hit send, I'll think of things I've forgotten!

lol.

I'm probably going to add shape_plan to the list of arguments.  After that, if
I make a release, the API is here to stay...  So, speak very loudly if you
think for whatever reason this is not workable, i.e., there are things that
cannot be done using a message.  I can't think of any.

Cheers,

behdad
_______________________________________________
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/harfbuzz