[Lost touch with the list, so I'm trying to catch up here... I did notice that gardena.net is gone - but I forgot that I was using [EMAIL PROTECTED] for this list! *heh*]
> Subject: Re: [linux-audio-dev] more on XAP Virtual Voice ID system
> From: Tim Hockin (thockin_AT_hockin.org)
> Date: Fri Jan 10 2003 - 00:49:07 EET

> > > The plugin CAN use the VVID table to store flags about the
> > > voice, as you suggested. I just want to point out that this
> > > is essentially the same as the plugin communicating to the
> > > host about voices, just more passively.
> >
> > Only the host can't really make any sense of the data.
>
> If flags are standardized, it can. Int32: 0 = unused, +ve = plugin
> owned, -ve = special meaning.

Sure. I just don't see why it would be useful, or why the VVID
subsystem should be turned into some kind of synth status API.

> > > It seems useful.
> >
> > Not really, because of the latency, the polling requirement and
> > the coarse timing.
>
> When does the host allocate from the VVID list? Between blocks. As
> long as a synth flags or releases a VVID during its block, the
> host benefits from it. The host has to keep a list of which VVIDs
> it still is working with, right?

No. Only the one who *allocated* a VVID can free it - and that means
the sender. If *you* allocate a VVID, you don't want the host to
steal it back whenever the *synth* decides it doesn't need the VVID
any more. You'd just have to double-check "your" VVIDs whenever you
send events - and this just to support something that's really a
synth implementation detail that just happens to take advantage of a
host service.

> > > If the plugin can flag VVID table entries as released, the
> > > host can have a better idea of which VVIDs it can reuse.
> >
> > Why would this matter? Again, the host does *not* do physical
> > voice management.
> >
> > You can reuse a VVID at any time, because *you* know whether or
> > not you'll need it again. The synth just doesn't care, as all it
> > will
>
> right, but if you hit the end of the list and loop back to the
> start, you need to find the next VVID that is not in use by the
> HOST.

No, you just need to find the next VVID that *you're* not using, and
reassign that to a new context. (ALLOC_VVID or whatever.) You don't
really care whether or not the synth's version of a context keeps
running for some time after you stop sending events for a context;
only the synth does - and if you're not going to send any more
events for a context, there's no need to keep the VVID.

> That can include VVIDs that have ended spontaneously (again,
> hihat sample or whatever).

VVIDs can't end spontaneously. Only synth voices can, and VVIDs are
only temporary references to voices. A voice may detach itself from
"its" VVID, but the VVID is still owned by the sender, and it's
still effectively bound to the same context.

BTW, this means that synths should actually keep the voice control
state for a VVID until they actually know the context has ended.
Normally, this just means that voices don't really detach themselves
from VVIDs, but rather just go to sleep, until stolen or woken up
again.

That is, synths with "virtual voices" might actually have use for a
"DETACH_VVID" event. Without it, they basically have to keep both
real and virtual voices indefinitely. Not sure it actually matters
much, though. Performance-wise, it just means you have to deal with
voice controls and their ramping (if supported). And since a ramp is
actually two events (ramp event with "aim point" + terminator event
or new ramp event), and given that ramping across blocks (*) is not
allowed, it still means "no events, no processing."

(*) I think I've said this before, but anyway: I don't think
    making ramping across block boundaries illegal is a good
    idea. Starting a ramp is actually setting up a *state*. (The
    receiver transforms the event into a delta value that's
    applied to the value every sample frame.) It doesn't make
    sense to me to force senders to explicitly set a new state
    at the start of each block.

    Indeed, the fact that ramping events have target and duration
    arguments looks confusing, but really, it *is* an aim point;
    not a description of a ramp with a fixed duration. If this
    were a perfect world (without rounding errors...), you would
    have sent the delta value directly, but that just won't work
    in real life.

    If someone can come up with a better aim point format than
    <target, duration>, I'm all ears, because it really *is*
    confusing. It suggests that RAMP events don't behave like SET
    events, but that's just not the case. The only difference is
    that RAMP events set the internal "dvalue", while SET events
    set the "value", and zero "dvalue".
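To make that concrete, here's a rough sketch in plain C of what I
mean by the value/dvalue thing. (All names and signatures are
invented for the example; this is not actual XAP code.)

/* Sketch only; invented names, not the actual XAP API. */
typedef struct
{
        float value;    /* current control value */
        float dvalue;   /* per-frame delta; 0 when not ramping */
} CTRL_state;

/* SET: jump to the new value and kill any ramp in progress. */
static void ctrl_set(CTRL_state *c, float v)
{
        c->value = v;
        c->dvalue = 0.0f;
}

/* RAMP: <target, duration> is just an aim point. We derive the
 * per-frame delta once, and then keep applying it every frame
 * until some other event says otherwise - block boundaries or
 * not.
 */
static void ctrl_ramp(CTRL_state *c, float target, unsigned frames)
{
        if(frames)
                c->dvalue = (target - c->value) / (float)frames;
}

/* Run once per sample frame by the DSP code. */
static float ctrl_run(CTRL_state *c)
{
        c->value += c->dvalue;
        return c->value;
}

Note that ctrl_run() neither knows nor cares where the block
boundaries are - that's the whole point.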
> The host just needs to discard any currently queued events for
> that (expired) VVID. The plugin is already ignoring them.

The plugin is *not* ignoring them. It may route them to a "null
voice", but that's basically an emergency action taken only when
running out of voices. Normally, a synth would keep tracking events
per VVID until the VVID is explicitly detached (see above), or the
synth steals whatever object is used for the tracking.

Again, DETACH_VVID might still be a good idea. Synths won't be able
to steal the right voices if they can't tell passive contexts from
dead contexts...

[...]

> > A bowed string instrument is "triggered" by the bow pressure
> > and speed exceeding certain levels; not directly by the player
> > thinking
>
> Disagree. SOUND is triggered by pressure/velocity. The instrument
> is ready as soon as bow contacts the string.

Well, you don't need a *real* voice until you need to play sound, do
you? Either way, the distinction is a matter of synth
implementation, which is why I think it should be "ignored" by the
API. The API should not enforce the distinction, nor prevent synths
from making use of the distinction.

> > No, I see a host sending continuous control data to an
> > init-latched synth. This is nothing that an API can fix
> > automatically.
>
> Ok, let me make it more clear. Again, same example. The host wants
> to send 7 parameters to the Note-on. It sends 3 then VELOCITY. But
> as soon as VELOCITY is received 'init-time' is over. This is bad.

Yes - that's event ordering getting messed up. This will never
happen unless the events are *created* out of order, or mixed up by
some event processor. (Though, I can't see how an event processor
could reorder incoming events while doing something useful.
Remember: we're talking about real time events here; not events in
a sequencer database.)

> The host has to know which control ends init time.

Why? So it can "automatically" reorder events at some point?

> Thus the NOTE/VOICE control we seem to be agreeing on.

Indeed, users, senders and some event processors might need to know
which events are "triggers", and which events are latched rather
than continuous. For example, my Wacom tablet would need to know
which of X, Y, X-tilt, Y-tilt, Pressure and Distance to send last,
and as what voice control. It may not always be obvious - unless
it's always the NOTE/VOICE/GATE control.

The easiest way is to just make one event the "trigger", but I'm not
sure it's the right thing to do. What if you have more than one
control of this sort, and the "trigger" is actually a product of
both? Maybe just assume that synths will use the standardized
NOTE/VOICE/GATE control for one of these, and act as if that was
the single trigger? (It would have to latch initializers based on
that control only, even if it doesn't do anything else that way.)
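In (completely made up) code, "latch initializers based on that
control only" could look something like this - just a sketch of the
behavior, not actual XAP:

#include <string.h>

/* Sketch only - all names invented. */
#define MY_NCTRLS       8
#define MY_CTRL_VOICE   0       /* the NOTE/VOICE/GATE control */

typedef struct
{
        float   tracked[MY_NCTRLS];     /* always updated */
        float   latched[MY_NCTRLS];     /* snapshot from trigger */
        int     gate;
} MY_voicectx;

static void my_note_start(MY_voicectx *v)   { (void)v; /* start */ }
static void my_note_release(MY_voicectx *v) { (void)v; /* release */ }

static void my_voice_control(MY_voicectx *v, int ctrl, float value)
{
        v->tracked[ctrl] = value;       /* track unconditionally */
        if(ctrl != MY_CTRL_VOICE)
                return;
        if((value >= 0.5f) && !v->gate)
        {
                /* 0->1: latch initializers from tracked state */
                memcpy(v->latched, v->tracked, sizeof v->latched);
                v->gate = 1;
                my_note_start(v);
        }
        else if((value < 0.5f) && v->gate)
        {
                /* 1->0: latch "off" controls, enter release */
                v->gate = 0;
                my_note_release(v);
        }
}

That is, everything is tracked all the time, but only the VOICE
control edges ever *do* anything with the tracked values.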
[...]

> > If it has no voice controls, there will be no VVIDs. You can
> > still allocate and use one if you don't want to special case
> > this, though. Sending voice control events to channel control
> > inputs is safe, since the receiver will just ignore the 'vvid'
> > field of events.
>
> I think that if it wants to be a synth, it understands VVIDS. It
> doesn't have to DO anything with them, but it needs to be aware.

Right, but I'm not even sure there is a reason why they should be
aware of VVIDs. What would a mono synth do with VVIDs that anyone
would care about?

> And the NOTE/VOICE starter is a voice-control, so any Instrument
> MUST have that.

This is very "anti modular synth". NOTE/VOICE/GATE is a control type
hint. I see no reason to imply that it can only be used for a
certain kind of controls, since it's really just a "name" used by
users and/or hosts to match ins and outs.

Why make Channel and Voice controls more different than they have to
be?

        * Channel->Channel:
        * Voice->Voice:
                Just make the connection. These are obviously
                100% compatible.

        * Voice->Channel:
                Make the connection, and assume the user knows
                what he/she is doing, and won't send polyphonic
                data this way. (The Channel controls obviously
                ignore the extra indexing info in the VVIDs.)

        * Channel->Voice:
                This works IFF the synth ignores VVIDs.

You could have channel/voice control "mappers" and stuff, but I
don't see why they should be made more complicated than necessary,
when in most cases that make sense, they can actually just be NOPs.

About VVID management: Since mono synths won't need VVIDs, the host
shouldn't have to allocate any for them. (That would be a waste of
resources.) The last case above also indicates a handy shortcut you
can take if you *know* that VVIDs won't be considered. Thus, I'd
suggest that plugins can indicate that they won't use VVIDs.
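On the host side, the whole matrix could collapse into something
like this (sketch; names invented, and 'to_ignores_vvids' stands
for the "won't use VVIDs" indication I'm suggesting):

/* Sketch only; invented names. */
typedef enum { XK_CHANNEL, XK_VOICE } XK_kind;

/* Can output kind 'from' drive input kind 'to' directly? */
static int can_connect(XK_kind from, XK_kind to,
                int to_ignores_vvids)
{
        if(from == to)
                return 1;  /* Ch->Ch, V->V: 100% compatible */
        if(from == XK_VOICE)
                return 1;  /* V->Ch: 'vvid' field just ignored */
        /* Ch->V: works IFF the synth ignores VVIDs */
        return to_ignores_vvids;
}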
[...]

> > Why? What does "end a voice" actually mean?
>
> It means that the host wants this voice to stop. If there is a
> release phase, go to it. If not, end this voice (in a
> plugin-specific way).
> Without it, how do you enter the release phase?

Right, then we agree on that as well. What I mean is just that "end
a voice" doesn't *explicitly* kill the voice instantly.

What might be confusing things is that I don't consider "voice" and
"context" equivalent - and VVIDs refer to *contexts* rather than
voices. There will generally be either zero or one voice connected
to a context, but the same context may be used to play several
notes.

> > From the sender POV:
> > I'm done with this context, and won't send any more events
> > referring to its VVID.
>
> No. It means I want the sound on this voice to stop. It implies
> the above, too. After a VOICE_OFF, no more events will be sent
> for this VVID.

That just won't work. You don't want continuous pitch and stuff to
work except when the note is on? Stopping a note is *not*
equivalent to releasing the context in which it was played.

Another example that demonstrates why this distinction matters
would be a polyphonic synth with automatic glissando. (Something
you can hardly get right with MIDI, BTW. You have to use multiple
monophonic channels, or trust the synth to be smart enough to do
the right thing.)

Starting a new note on a VVID when a previous note is still in the
release phase would cause a glissando, while if the VVID has no
playing voice, one would be activated and started as needed to play
a new note. The sender can't reliably know which action will be
taken for each new note, so it really *has* to be left to the synth
to decide. And for this, the lifetime of VVIDs/contexts needs to
span zero or more notes, with no upper limit.
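Inside a hypothetical synth, that decision is as simple as this
(sketch; invented names, and voice stealing/allocation failure is
ignored):

/* Sketch only. A context may outlive its notes. */
typedef struct MY_voice
{
        int     releasing;  /* in release phase, still sounding */
} MY_voice;

typedef struct MY_context
{
        MY_voice *voice;    /* NULL if no voice attached */
} MY_context;

extern MY_voice *my_voice_alloc(void);
extern void my_voice_start(MY_voice *v, float pitch);
extern void my_voice_glide(MY_voice *v, float pitch);

/* What "new note in an old context" means is the synth's call: */
static void my_context_note_on(MY_context *c, float pitch)
{
        if(c->voice && c->voice->releasing)
                my_voice_glide(c->voice, pitch); /* glissando */
        else
        {
                c->voice = my_voice_alloc();     /* fresh voice */
                my_voice_start(c->voice, pitch);
        }
}

The sender just says "new note in this context"; only the synth can
tell whether that glides or restarts.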
> > From the synth POV:
> > The voice assigned to this VVID is now silent and passive,
>
> More, the VVID is done. No more events for this VVID.

Nope, not unless *both* the synth and the sender have released the
VVID.

> The reason that VVID_ALLOC is needed at voice_start is because
> the host might never have sent a VOICE_OFF.

Or maybe we can make it simpler: If the host/sender doesn't send
VOICE_OFF when needed, it's broken, just like a MIDI sequencer that
forgets to stop playing notes when you hit the stop button. And
yes, this is another reason to somehow mark the VOICE/NOTE/GATE
control as special.

> Host turns the NOTE/VOICE on.
> It can either turn the NOTE/VOICE off or DETACH it. Here your
> detach name makes more sense.

VOICE_OFF and DETACH *have* to be separate concepts. (See above.)

> A step sequencer would turn a note on, then immediately detach.

It would have to send note off as well, I think, or we'd have
another special case to make all senders compatible with all
synths. (And if "note on" is actually a change of the VOICE/GATE
control from 0 to 1, you *have* to send an "off" event as well, or
the synth won't detect any further "note on" events in that
context.)

> > assumed to be more special than it really is.
> > NOTE/VOICE_ON/VOICE_OFF is a gate control. What more do you
> > need to say about it?
>
> Only if you assume a voice lives forever, which is wasteful.

It may be wasteful to use real, active voices just to track control
changes, but voice control tracking cannot be avoided.

> Besides that, a gate that gets turned off and on and off and on
> does not restart a voice, just mutes it temporarily. Not pause,
> not restart - mute.

Who says? I think that sounds *very* much like a synth
implementation thing - but point taken; "GATE" is probably not a
good name.

[...]

> Well, you CAN change Program any time you like - it is not a
> per-voice control.

In fact, on some MIDI synths, you have to assume it is, sort of.
Sending a PC to a Roland JV-1080 has it instantly kill any notes
playing on that channel, go to sleep for a few hundredths of a
second, and then process any events that might have arrived for the
channel during the "nap". This really sucks, but that's the way it
works, and it's probably not the only synth that does this. (The
technical reason is most probably that spare "patch slots" would
have been required to do it in any other way - and as I've
discovered with Audiality, that's not as trivial to get right as it
might seem at first. You have to let the old patch see *some* of
the new events for the channel, until the old patch decides to
die.)

AWE, Live! and Audigy cards don't do it this way - but PC is
*still* not an ordinary control. Playing notes remain controlled by
the old patch until they receive their NoteOffs. PC always has to
occur *before* new notes.

Either way, MIDI doesn't have many voice controls at all, and our
channel controls are more similar to MIDI (Channel) CCs in some
ways. (Not addressed by note pitch, most importantly.) That is,
they can't be compared directly - but the concept that some
controls must be sent before they're latched to have the desired
effect is still relevant.

[...]

> Idea 2: similar to idea 1, but less explicit.
> -- INIT:
>       send SET(new_vvid, ctrl)  /* implicitly creates a voice */
>       send VOICE_ON(new_vvid)   /* start the vvid */
> -- RELEASE:
>       send SET(new_vvid, ctrl)  /* send with time X */
>       send VOICE_OFF(vvid)      /* also time X - plug 'knows' it
>                                    was for release */

I see why you don't like this. You're forgetting that it's the
*value* that is the "initializer" for the VOICE_OFF action; not the
SET event that brings it. Of course the plugin "knows" - the last
SET put a new value into the control that the VOICE_OFF action code
looks at! :-)

A synth is a state machine, and the events are just what provides
it with data and - directly or indirectly - triggers state changes.

We have two issues to deal with, basically:

        1. Tracking of voice controls.
        2. Allocation and control of physical voices.

The easy way is to assume that you use a physical voice whenever
you need to track voice controls, but that's just an assumption
that a synth author would make to make the implementation simpler.
It doesn't *have* to be done that way.

If 1 and 2 are handled as separate things by a synth, 2 becomes an
implementation issue *entirely*. Senders and hosts don't really
have a right to know anything about this - mostly because there are
so many ways of doing it that it just doesn't make sense to pretend
that anyone cares.

As to 1, that's what we're really talking about here. When do you
start and stop tracking voice controls? Simple: When you get the
first control for a "new" VVID, start tracking. When you know there
will be no more data for that VVID, or that you just don't care
anymore (voice and/or context stolen), stop tracking.

So, this is what I'm suggesting ( {X} means loop X, 0+ times ):

* Context allocation:
        // Prepare the synth to receive events for 'my_vvid'
        send(ALLOC_VVID, my_vvid)
        // (Control tracking starts here.)

{
  * Starting a note:
        // Set up any latched controls here
        send(CONTROL, <whatever>, my_vvid, <value>)
        ...
        // (Synth updates control values.)

        // Start the note!
        send(CONTROL, VOICE, my_vvid, 1)
        // (Synth latches "on" controls and (re)starts
        // voice. If control tracking is not done by
        // real voices, this is when a real voice would
        // be allocated.)

  * Stopping a note:
        send(CONTROL, <whatever>, my_vvid, <value>)
        ...
        // (Synth updates control values.)

        // Stop the note!
        send(CONTROL, VOICE, my_vvid, 0)
        // (Synth latches "off" controls and enters the
        // release phase.)

  * Controlling a note (even in release phase!):
        send(CONTROL, <whatever>, my_vvid, <value>)
        // (Synth updates control value.)
}

* Context deallocation:
        // Tell the synth we won't talk any more about 'my_vvid'
        send(DETACH_VVID, my_vvid)
        // (Control tracking stops here.)

This still contains a logic flaw, though. Continuous control synths
won't necessarily trigger on the VOICE control changes. Does it
make sense to assume that they'll latch latched controls at VOICE
control changes anyway? It seems illogical to me, but I can see why
it might seem to make sense in some cases...
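For reference, the synth end of the above could boil down to
something like this (sketch; every name is invented, and
my_context_*() would do the actual tracking and voice work):

/* Sketch only. No bounds checking or error handling. */
#define MAX_VVIDS       256
#define CTRL_VOICE      0       /* the NOTE/VOICE/GATE control */

enum { EV_ALLOC_VVID, EV_CONTROL, EV_DETACH_VVID };

typedef struct
{
        int      type;          /* EV_* */
        unsigned vvid;
        int      index;         /* control index, for EV_CONTROL */
        float    value;
} MY_event;

typedef struct MY_context MY_context;  /* per-VVID tracking state */

extern MY_context *my_context_open(void);
extern void my_context_close(MY_context *c);
extern void my_context_control(MY_context *c, int index, float v);
extern void my_context_note_on(MY_context *c);  /* latch + start */
extern void my_context_note_off(MY_context *c); /* latch + release */

static MY_context *ctx[MAX_VVIDS];

static void my_handle_event(const MY_event *e)
{
        MY_context *c = ctx[e->vvid];
        switch(e->type)
        {
          case EV_ALLOC_VVID:
                /* Control tracking starts here. */
                ctx[e->vvid] = my_context_open();
                break;
          case EV_CONTROL:
                my_context_control(c, e->index, e->value);
                if(e->index == CTRL_VOICE)
                {
                        if(e->value >= 0.5f)
                                my_context_note_on(c);
                        else
                                my_context_note_off(c);
                }
                break;
          case EV_DETACH_VVID:
                /* Control tracking stops here. A real voice may
                 * keep ringing until silent, but no more events
                 * will refer to this context.
                 */
                my_context_close(c);
                ctx[e->vvid] = NULL;
                break;
        }
}

Everything between ALLOC_VVID and DETACH_VVID is just control
tracking; whether a real voice exists at any given moment is
invisible to the sender.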
//David Olofson - Programmer, Composer, Open Source Advocate

.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`---------------------------> http://olofson.net/audiality -'
   --- http://olofson.net --- http://www.reologica.se ---