Good day,

I have the following task: a stream of “page views” arrives on a Kafka topic. 
Each view contains a list of product IDs from the visited page. The task is to 
maintain the top N products in “real time”.

I am interested in a solution that requires a minimum of intermediate writes… 
So I need to build a sliding window for the top N products, where the product 
counters change dynamically and the window presents the top products for the 
specified period of time.

I believe there is no way to avoid maintaining all product counters in 
memory/storage. But at least I would like to do all the logic and all the 
calculation on the fly, in memory, without spilling multiple RDDs from memory 
to disk.

So I believe I see one way of doing it:
   Take each message from Kafka and line up all the elementary actions 
(increase the counter for product PID by 1).
  Each action will be implemented as a call to HTable.increment()  // or, 
easier, with incrementColumnValue()…
  After each increment I can apply my own “offer” operation, which would ensure 
that only the top N products with their counters are kept in another HBase 
table (also with atomic operations).
 But there is another stream of events: decreasing the product counters when a 
view expires, that is, when it falls outside the length of the sliding window….
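
To make the increment-on-arrival and decrement-on-expiry idea concrete, here is 
a minimal in-memory sketch in Python. It is only an illustration under stated 
assumptions, not an HBase or Spark implementation: the class SlidingTopN and 
its methods are hypothetical names, timestamps are assumed to arrive in 
non-decreasing order, and all counters are assumed to fit in one process.

```python
import heapq
from collections import Counter, deque


class SlidingTopN:
    """Hypothetical in-memory sliding-window counter (not an HBase/Spark API)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, product_ids), in arrival order
        self.counts = Counter()  # product id -> views currently in the window

    def add_view(self, timestamp, product_ids):
        """Record one page view: increase each product's counter by 1."""
        product_ids = tuple(product_ids)
        self.events.append((timestamp, product_ids))
        for pid in product_ids:
            self.counts[pid] += 1
        self.expire(timestamp)

    def expire(self, now):
        """Decrease counters for views that have fallen out of the window."""
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, product_ids = self.events.popleft()
            for pid in product_ids:
                self.counts[pid] -= 1
                if self.counts[pid] == 0:
                    del self.counts[pid]  # drop products with no live views

    def top(self, n):
        """Return the n (product_id, count) pairs with the highest counts."""
        return heapq.nlargest(
            n, self.counts.items(), key=lambda kv: (kv[1], kv[0])
        )


# Usage: a 10-second window fed with three page views.
window = SlidingTopN(window_seconds=10)
window.add_view(0, ["a", "b"])
window.add_view(5, ["a", "c"])
window.add_view(12, ["c"])   # the view at t=0 has expired by now
print(window.top(2))
```

Note that this sketch recomputes the top N from all live counters on demand 
instead of keeping a separate top-N table. Once counters can decrease, a 
product inside the top N may drop below one that was previously outside it, so 
a table holding only N entries cannot be repaired without consulting the full 
set of counters.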

So my question: does anybody know of, or have and can share, a piece of code or 
know-how on how to better implement a “sliding top N window”?
If nothing is offered, I will share what I end up doing myself.

Thank you
Alexey

