Hi,
----- Original Message ----
> From: Ian Holsman <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, October 10, 2008 11:18:19 PM
> Subject: Re: What are the business cases for collaborative filtering?
>
> Otis Gospodnetic wrote:
> >
> > Building a data-collection mechanism, storage mechanism, and figuring out
> > how to feed the data to Taste, do so quickly, frequently enough, etc.
> >
> > Otis
> >
>
> hmm. sounds like a good subproject.
> currently we are using a custom piece of code hooked up via the apache
> logwriter
> to feed the data into HDFS and then run stuff.
>
> but it would be good to have something that does it in real time too

Heh, it sounds like we are going through similar steps. I first wrote a simple "beacon servlet" for tracking purposes. Then I opted for a simpler (and more static) setup: a pixel tracker, web server (nginx) logging, and a log parser that is supposed to process that log and store it to _____ (not sure where yet, didn't get there), and then from there get it to Taste. This, of course, means more batch-oriented processes. Going with the beacon servlet approach could *presumably* allow something closer to real-time recommendations...

Ian, can you elaborate on the "feed data into HDFS" part? Do you simply store it in HDFS? Why HDFS? Why not some other FS, or an RDBMS? What happens to your data after you store it in HDFS?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
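
For illustration, here is a minimal sketch of the log-parser step in the pixel-tracker pipeline described above. It is not the code either of us is running. It assumes the web server writes the request line in the usual quoted form ("GET /path?query HTTP/1.1"), and the pixel path /pixel.gif and the query parameters u (user) and i (item) are hypothetical names. The output is plain "userID,itemID" rows; whether Taste can read that file directly or needs a preference value added is something to check against its data model.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical pixel path and parameter names; adjust to the real tracker.
PIXEL_PATH = "/pixel.gif"
USER_PARAM = "u"
ITEM_PARAM = "i"

# Matches the quoted request part of an access-log line,
# e.g. "GET /pixel.gif?u=1&i=2 HTTP/1.1", and captures the request target.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def parse_line(line):
    """Return (user_id, item_id) for a pixel hit, or None for any other line."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    url = urlsplit(match.group(1))
    if url.path != PIXEL_PATH:
        return None
    params = parse_qs(url.query)
    users = params.get(USER_PARAM)
    items = params.get(ITEM_PARAM)
    if not users or not items:
        return None
    return users[0], items[0]


def log_to_csv(lines):
    """Yield one 'userID,itemID' row per pixel hit found in the log lines."""
    for line in lines:
        hit = parse_line(line)
        if hit is not None:
            yield "%s,%s" % hit
```

Run over a rotated log file on a schedule, this gives the batch-oriented flow mentioned above; the beacon servlet variant would call something like parse_line per request instead of per file.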
