Yes. Also add the fact that the nano-batches are tightly bounded in size,
both max and mean. And they are mostly filtered away anyway.

Aging is an open question. I have never seen any effect from alternative
sampling, so I would just assume “keep oldest”, which simply tosses more
samples. Then occasionally rebuild from batch if you really want aging done
right.
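
A minimal Scala sketch of that “keep oldest” policy, assuming a per-user
history capped at some maxPerUser (the cap and all names are my
illustration, not anything from the list):

import scala.collection.mutable

class KeepOldestHistory(maxPerUser: Int = 500) {
  private val histories = mutable.Map.empty[String, mutable.ArrayBuffer[String]]

  /** Returns true if the interaction should feed the cooccurrence update. */
  def offer(user: String, item: String): Boolean = {
    val h = histories.getOrElseUpdate(user, mutable.ArrayBuffer.empty[String])
    if (h.size >= maxPerUser) {
      false      // history full: keep the oldest samples, toss the new one
    } else {
      h += item  // room left: admit it and update counts downstream
      true
    }
  }
}

The occasional rebuild from batch would simply discard these histories and
recompute from the raw log, which is what actually repairs any aging bias.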

Search updates are now true realtime as well, so that works very well.

> On Apr 17, 2015, at 17:20, Pat Ferrel <p...@occamsmachete.com> wrote:
> 
> Thanks. 
> 
> This idea is based on a micro-batch of interactions per update, not
> individual ones, unless I missed something. That matches the typical input
> flow. Most interactions are filtered away by the frequency and
> number-of-interactions cuts.
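> 
> As a rough sketch of those cuts (mine, not the Mahout code), assume a
> global item-frequency cut maxItemFreq and a per-user cap maxPerUser:
> 
> def filterBatch(
>     batch: Seq[(String, String)],    // (user, item) interactions
>     itemFreq: Map[String, Long],     // running global item counts
>     maxItemFreq: Long,
>     maxPerUser: Int): Seq[(String, String)] =
>   batch
>     .filter { case (_, item) => itemFreq.getOrElse(item, 0L) <= maxItemFreq }
>     .groupBy { case (user, _) => user }
>     .valuesIterator
>     .flatMap(_.take(maxPerUser))     // cap the interactions kept per user
>     .toSeq
> 
> Most of a typical micro-batch falls to these two cuts, which keeps the
> per-update work small.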
> 
> A couple of practical issues:
> 
> In practice won’t this require aging of interactions too? So wouldn’t the
> update require removing some old interactions? I suppose this might just
> take the form of added null interactions representing the geriatric ones? I
> haven’t gone through the math in enough detail to see whether you’ve
> already accounted for this.
> 
> To use actual math (self-join, etc.) we still need to alter the geometry of
> the interaction matrices to have the same row rank as the adjusted total.
> In other words, the number of rows in all the interaction matrices must be
> the same. Over time this means either completely removing rows and columns
> or allowing empty rows in potentially all input matrices.
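> 
> One way to picture the requirement (an illustration, not the actual code):
> every input matrix draws its row indices from a single shared user
> dictionary, so a user missing from one input just leaves an empty row
> there.
> 
> class SharedDictionary {
>   private val ids = scala.collection.mutable.Map.empty[String, Int]
>   def idOf(key: String): Int = ids.getOrElseUpdate(key, ids.size)
>   def size: Int = ids.size  // the common row rank of A, B, ...
> }
> 
> Since A and B both take row indices from the same dictionary, A’B stays
> well defined as users come and go; aging then only empties rows rather than
> renumbering them.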
> 
> It might not be too bad to accumulate gaps in rows and columns. I’m not
> sure it would have a practical impact (up to some large limit) as long as
> it was done in a way that keeps the real size more or less fixed.
> 
> As to realtime, that would be under search-engine control through
> incremental indexing, and there are a couple of ways to do that; not a
> problem AFAIK. As you point out, the query always works and is realtime.
> The index update must be frequent and must not impact the engine’s
> availability for queries.
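> 
> For example (purely illustrative; the index, type, and field names are my
> assumptions), Elasticsearch’s partial-update API can replace just the
> indicator field of a changed item without a full reindex:
> 
> import java.net.{HttpURLConnection, URL}
> 
> def updateIndicators(itemId: String, indicators: Seq[String]): Unit = {
>   val url = new URL(s"http://localhost:9200/items/item/$itemId/_update")
>   val conn = url.openConnection().asInstanceOf[HttpURLConnection]
>   conn.setRequestMethod("POST")
>   conn.setDoOutput(true)
>   conn.setRequestProperty("Content-Type", "application/json")
>   val quoted = indicators.map(i => "\"" + i + "\"").mkString(",")
>   conn.getOutputStream.write(s"""{"doc": {"indicators": [$quoted]}}""".getBytes("UTF-8"))
>   conn.getOutputStream.close()
>   conn.getResponseCode  // force the request; error handling omitted
>   conn.disconnect()
> }
> 
> Queries keep hitting the live index while these small updates flow in,
> which is the availability property above.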
> 
> On Apr 17, 2015, at 2:46 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> 
> 
> When I think of real-time adaptation of indicators, I think of this:
> 
> http://www.slideshare.net/tdunning/realtime-puppies-and-ponies-evolving-indicator-recommendations-in-realtime
> 
> 
>> On Fri, Apr 17, 2015 at 6:51 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>> I’ve been thinking about streaming (continuous input) and incremental
>> cooccurrence.
>> 
>> As interactions stream in from users it is fairly simple to use something
>> like Spark Streaming to maintain a moving time window over all input, with
>> an update frequency that recalcs everything currently in the window. I’ve
>> done this with the current cooccurrence code, but though streaming, it is
>> not incremental.
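>> 
>> A minimal sketch of that windowed recalc (assuming interactions arrive as
>> tab-separated "user item" lines on a socket; all names and durations here
>> are illustrative):
>> 
>> import org.apache.spark.SparkConf
>> import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
>> 
>> val conf = new SparkConf().setAppName("windowed-cooccurrence")
>> val ssc = new StreamingContext(conf, Seconds(30))
>> 
>> val interactions = ssc.socketTextStream("localhost", 9999)
>>   .map(_.split("\t"))
>>   .map(a => (a(0), a(1)))            // (user, item)
>> 
>> interactions
>>   .window(Minutes(60), Minutes(5))   // moving time window, update frequency
>>   .foreachRDD { rdd =>
>>     // full recompute over the window: per-user item lists self-joined
>>     // into cooccurrence counts; streaming, but not incremental
>>     val pairs = rdd.groupByKey().flatMap { case (_, items) =>
>>       val v = items.toSeq
>>       for (a <- v; b <- v if a != b) yield ((a, b), 1L)
>>     }
>>     val counts = pairs.reduceByKey(_ + _)
>>     counts.take(10).foreach(println) // stand-in for the downstream steps
>>   }
>> 
>> ssc.start()
>> ssc.awaitTermination()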
>> 
>> The current data flow goes from interaction input, to geometry and
>> user-dictionary reconciliation, to A’A, A’B, etc. After the multiply, the
>> resulting cooccurrence matrices are LLR weighted, filtered, and
>> down-sampled.
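>> 
>> For reference, the LLR weighting is the usual G² test on the 2x2
>> contingency table per item pair; this follows the standard formulation (as
>> in Mahout’s LogLikelihood) rather than quoting that code:
>> 
>> def xLogX(x: Long): Double = if (x == 0L) 0.0 else x * math.log(x.toDouble)
>> 
>> def entropy(counts: Long*): Double =
>>   xLogX(counts.sum) - counts.map(xLogX).sum
>> 
>> // k11 = cooccurrences, k12/k21 = each item alone, k22 = everything else
>> def logLikelihoodRatio(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
>>   val rowEntropy = entropy(k11 + k12, k21 + k22)
>>   val colEntropy = entropy(k11 + k21, k12 + k22)
>>   val matEntropy = entropy(k11, k12, k21, k22)
>>   2.0 * (rowEntropy + colEntropy - matEntropy)
>> }
>> 
>> The filter/down-sample step then keeps only the top-scoring item pairs per
>> row.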
>> 
>> Incremental can mean all sorts of things and may imply different trade-offs. 
>> Did you have anything specific in mind?