Re: Streaming and incremental cooccurrence

Pat Ferrel Fri, 24 Apr 2015 08:15:06 -0700

Ok, seems right.

So now to data structures. The input frequency vectors need to be paired with 
each input interaction type, and it would be nice to keep them in something 
that can be copied very fast as they get updated. Random access would also be 
nice, but iteration is not needed. Over time they will get larger as all items 
get interactions, and users will get more actions and appear in more vectors 
(with multi-interaction data). Seems like hashmaps?
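
To make the hashmap idea concrete, here is a minimal sketch in Python, assuming 
one map of item counts per interaction type. The names FrequencyVectors, record, 
frequency, and snapshot are hypothetical, and the copy is a plain shallow copy 
rather than anything tuned for speed:

```python
from collections import defaultdict


class FrequencyVectors:
    """Hypothetical holder: one item -> count map per interaction type."""

    def __init__(self):
        # interaction type -> {item id -> interaction count}
        self._counts = defaultdict(dict)

    def record(self, interaction_type, item_id, n=1):
        """Add n interactions for an item; new items and types grow the maps."""
        vec = self._counts[interaction_type]
        vec[item_id] = vec.get(item_id, 0) + n

    def frequency(self, interaction_type, item_id):
        """Random access by key; unseen items have frequency 0."""
        return self._counts.get(interaction_type, {}).get(item_id, 0)

    def snapshot(self, interaction_type):
        """Shallow copy of one vector, unaffected by later updates."""
        return dict(self._counts.get(interaction_type, {}))
```

A snapshot taken before an update keeps the old counts, which is the property 
wanted if a calculation has to read a stable vector while new interactions 
keep arriving.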

The cooccurrence matrix is more of a question to me. It needs to be updatable 
at the row and column level, and random access for both row and column would be 
nice. It needs to be expandable. To keep it small the keys should be integers, 
not full blown ID strings. There will have to be one matrix per interaction 
type. It should be simple to update the Search Engine to either mirror the 
matrix or use it directly for index updates. Each indicator update should cause 
an index update.
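
A rough sketch of those requirements, again in Python and only under stated 
assumptions: the matrix is kept as a sparse map of rows plus a mirrored map of 
columns, string IDs are translated to integers by a small dictionary, and a 
callback stands in for the Search Engine index update. IdDictionary, 
CooccurrenceMatrix, and on_row_update are hypothetical names, not an existing 
API:

```python
from collections import defaultdict


class IdDictionary:
    """Maps external ID strings to compact integer keys (hypothetical helper)."""

    def __init__(self):
        self._to_int = {}
        self._to_str = []

    def key(self, external_id):
        """Return the integer key for an ID, assigning the next one if new."""
        if external_id not in self._to_int:
            self._to_int[external_id] = len(self._to_str)
            self._to_str.append(external_id)
        return self._to_int[external_id]

    def external(self, key):
        """Translate an integer key back to the original ID string."""
        return self._to_str[key]


class CooccurrenceMatrix:
    """Sparse, expandable matrix stored by row and mirrored by column."""

    def __init__(self, on_row_update=None):
        self._rows = defaultdict(dict)  # row key -> {col key -> value}
        self._cols = defaultdict(dict)  # col key -> {row key -> value}
        # Assumed hook: called with (row key, copy of row) after each change,
        # standing in for whatever pushes the row to the search index.
        self._on_row_update = on_row_update

    def add(self, row, col, delta=1):
        """Update one cell; both the row view and the column view change."""
        value = self._rows[row].get(col, 0) + delta
        self._rows[row][col] = value
        self._cols[col][row] = value
        if self._on_row_update is not None:
            self._on_row_update(row, dict(self._rows[row]))

    def row(self, row):
        """Random access to a row as {col key -> value}."""
        return dict(self._rows.get(row, {}))

    def column(self, col):
        """Random access to a column as {row key -> value}."""
        return dict(self._cols.get(col, {}))
```

With one matrix per interaction type, this would be something like a map from 
interaction type to CooccurrenceMatrix. Mirroring the columns doubles the 
memory for the cells, which is the price paid here for column-level access.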

Putting aside speed and size issues, this sounds like a NoSQL DB table that is 
cached in memory.

On Apr 23, 2015, at 3:04 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

On Thu, Apr 23, 2015 at 8:53 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

> This seems to violate the random choice of interactions to cut but now
> that I think about it does a random choice really matter?
> 

It hasn't ever mattered such that I could see.  There is also some reason
to claim that earliest is best if items are very focussed in time.  Of
course, the opposite argument also applies.  That leaves us with empiricism
where the results are not definitive.

So I don't think that it matters, but I don't think that it does.
