There's a thread on nupic-theory about the cochlea, also prompted by Charlie's talk. Can we agree to keep this thread about his MIDI and audio stuff, and talk about cochleas over on theory?
On Sun, Jun 14, 2015 at 2:23 PM, Fergal Byrne <[email protected]> wrote:

> BTW, I suspect you'll need the prediction-assisted SP for this to work
> well, so Felix's Comportex would be worth starting with.
>
> On Sun, Jun 14, 2015 at 2:16 PM, Fergal Byrne <[email protected]> wrote:
>
>> Great talk, Charlie, thanks for coming and sharing your passion for
>> this stuff. I'm about as unmusical as any human (which is still pretty
>> musical), so it's great to learn from someone who's been thinking
>> about this so much.
>>
>> Here's an outline for a campaign to do the MIDI analysis. Let me know
>> if this makes sense. I have some ideas about audio-to-MIDI, but they
>> need this working first.
>>
>> From both a theory and a practice point of view, we're concerned with
>> what work needs to be done subcortically (the encoder) and what can be
>> learned cortically (in the HTM). Nature will hopefully have made
>> similar decisions, based on a similar tradeoff. Encoders have to be
>> evolved (or programmed, in our case), which is very expensive compared
>> to learning in cortex (or just shovelling data into the HTM). But you
>> can't just shove raw data into the HTM, because it needs some semantic
>> encoding to extract structure from. You also aren't going to persuade
>> Nature to evolve encoders for music in anticipation of it, so there's
>> a strict upper bound on the hand-engineering of encoders.
>>
>> The analogy with vision is important. It suggests we need a hierarchy
>> (including the encoders) whose bottom level is topologically mapped to
>> the MIDI channels (which are a proxy for cochlear frequency bands). To
>> first order, let's ignore velocity values and just pretend it's a
>> binary stream.
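[The "binary stream" view above can be made concrete with a small sketch. Everything here is illustrative: the `(start_ms, duration_ms, note)` event format, the 50 ms bin width and the 128-channel range are assumptions, not part of the proposal.]

```python
# Illustrative sketch: quantise MIDI-like note events into fixed time
# bins of binary per-channel activity (velocity ignored). The event
# format, bin width and channel count here are assumptions.

BIN_MS = 50          # fixed temporal bin width (assumed)
N_CHANNELS = 128     # one channel per MIDI note number

def to_binary_stream(events, total_ms):
    """Return a list of bins, each a 128-element 0/1 activity vector."""
    n_bins = (total_ms + BIN_MS - 1) // BIN_MS
    stream = [[0] * N_CHANNELS for _ in range(n_bins)]
    for start_ms, dur_ms, note in events:
        first = start_ms // BIN_MS
        last = (start_ms + dur_ms - 1) // BIN_MS
        for b in range(first, min(last + 1, n_bins)):
            stream[b][note] = 1   # note sounding during this bin
    return stream
```

[Binning note events this way also discards the fine per-channel onset and offset differences of a live performance, leaving something closer to the composition.]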
>> To simplify further (I believe your preprocessor does this), we
>> should limit temporal resolution to fixed bins, eliminating very fine
>> differences in onsets and offsets across channels; this strips out
>> performance detail, so the stream more closely resembles the
>> composition.
>>
>> OK, now the musical information is contained in the position and
>> timing of onsets and (less importantly and usefully) offsets. Assuming
>> we're streaming, we won't know when the offsets are until they happen.
>> So a simple encoder would have a bit per channel per not-off state:
>> onset, on, offset. We usually give each of these a small width so
>> they're potentially seen by lots of columns; let's say 8 replicated
>> bits per channel per state. We can reduce this to 4 or 2 bits at the
>> extremes of the range, where channels are seldom used in real MIDI
>> files. So the encoding would be, say,
>> 64 * 3 * 8 + 32 * 3 * 4 + 32 * 3 * 2 = 2112 bits.
>>
>> You'd then put a topological HTM layer (L4) on top of this, with each
>> column seeing inputs from, say, an octave up and down. After learning,
>> the SDRs this produces would include localised representations of
>> individual chords at each stage of their existence: the columns for
>> the middle C major chord onset appear when that chord is first played,
>> followed by columns for middle C major "on", then middle C major
>> "off", and then none of the above. A temporal pooling layer (L2/3)
>> above this will show "middle C major" during the on-time of the
>> chord, and can learn the melody using sequence memory.
>>
>> To track multiple voices, you may be able to use this one region. The
>> sequences of chords in L2/3 might be predictable enough to hold
>> together in parallel, but you'll still get a union of their SDRs. Or
>> you might need another region on top to separate the voices.
>>
>> You can do key identification simply by counting up the L2/3 chords
>> (per voice) used in recent bars of the music.
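[The onset/on/offset encoder just described can be sketched as follows. The 64/32/32 split of the 8/4/2-bit widths across the note range is an assumed layout (the email only says rarely-used extremes get fewer bits), and the `{channel: state}` input shape is likewise invented for illustration.]

```python
# Sketch of the proposed encoder: a group of replicated bits per
# channel per not-off state (onset, on, offset). The placement of the
# 8-, 4- and 2-bit widths across the 128-note range is an assumption.

ONSET, ON, OFFSET = "onset", "on", "offset"

def bits_for_channel(ch):
    """Replication width per state for a given channel."""
    if 32 <= ch < 96:        # middle 64 channels
        return 8
    if 16 <= ch < 112:       # next 16 on each side (32 channels)
        return 4
    return 2                 # outermost 16 on each side (32 channels)

def encode(states):
    """Encode {channel: state} into a flat binary SDR of 2112 bits."""
    sdr = []
    for ch in range(128):
        width = bits_for_channel(ch)
        for state in (ONSET, ON, OFFSET):
            bit = 1 if states.get(ch) == state else 0
            sdr.extend([bit] * width)  # replicated so many columns see it
    return sdr

# Total width matches the arithmetic in the email:
assert len(encode({})) == 64*3*8 + 32*3*4 + 32*3*2 == 2112
```

[For example, `encode({60: ONSET})` sets only the 8 replicated bits for middle C's onset state.]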
>> The chords which occur over a short period give information about the
>> probable key. Some of the L2/3 columns will indicate this
>> automatically, because they happen to be temporally pooling over
>> exactly the set of chords for one key. A classifier, or a higher
>> region trained to predict just the key (over many melodies, each in
>> the same key), will give you the key.
>>
>> To identify a melody, you'd add another region above this analysis
>> level, which would take chord SDRs from the lower region's L2/3 into
>> its L4 and temporally pool over them to produce melody SDRs.
>>
>> To learn composer, style or mood, you'd provide the "melody
>> identifier" region with encodings of the composer, style or mood as
>> L1 input, along with the melody data from this level. You'd feed the
>> output of the higher region back in as L1 predictive input to the
>> L2/3 layer and have all regions learn together. This could give you a
>> generative model which might reproduce Bach- or Beethoven-style music.
>>
>> Regards,
>>
>> Fergal Byrne
>>
>> On Sat, Jun 13, 2015 at 8:19 PM, Matthew Lohbihler
>> <[email protected]> wrote:
>>
>>> Perhaps it's the opposite, and explains tinnitus. Being someone who
>>> has this, I can confirm that the pitch is precisely constant.
>>>
>>> On 6/13/2015 2:34 PM, Tim Boudreau wrote:
>>>
>>> Interesting idea, but for it to be correct, shouldn't we have
>>> observed cases of highly pitch-specific deafness, i.e. you lose
>>> 435-445 Hz, but 446 Hz is fine?
>>> -Tim
>>>
>>> On Sat, Jun 13, 2015 at 2:30 AM, Matthew Taylor
>>> <[email protected]> wrote:
>>>
>>>> Interesting reading:
>>>> http://hyperphysics.phy-astr.gsu.edu/hbase/sound/place.html
>>>>
>>>> Sent from my MegaPhone
>>>
>>> --
>>> http://timboudreau.com
>>
>> --
>> Fergal Byrne, Brenter IT @fergbyrne
>>
>> http://inbits.com - Better Living through Thoughtful Technology
>> http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne
>>
>> Founder of Clortex: HTM in Clojure -
>> https://github.com/nupic-community/clortex
>> Co-creator @OccupyStartups Time-Bombed Open License
>> http://occupystartups.me
>>
>> Author, Real Machine Intelligence with Clortex and NuPIC
>> Read for free or buy the book at https://leanpub.com/realsmartmachines
>>
>> e: [email protected] t: +353 83 4214179
>> Join the quest for Machine Intelligence at http://numenta.org
>> Formerly of Adnet [email protected] http://www.adnet.ie
