OK, I see. So you need to:
1) group the records by (date, product_id) and count each group => {date, product_id, count} (this is one map+reduce);
2) group those results by product_id and take the date for which count is highest (this is a second map+reduce).
Does this sound sensible?
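
To make the two stages concrete, here is a minimal sketch in plain Python (not Hadoop code). It assumes each log record has already been parsed into a (date, product_id) pair. The helper run_stage and the sample log are hypothetical and only simulate map, shuffle and reduce in memory:

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """Simulate one map+reduce stage in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # sorted only to make the output deterministic
        output.extend(reducer(key, groups[key]))
    return output


# Stage 1: count accesses per (date, product_id).
def map_count(record):
    date, product_id = record
    yield (date, product_id), 1


def reduce_count(key, values):
    date, product_id = key
    yield (date, product_id, sum(values))


# Stage 2: per product_id, keep the date with the highest count.
def map_by_product(record):
    date, product_id, count = record
    yield product_id, (count, date)


def reduce_max(product_id, values):
    # max() on (count, date) tuples; on equal counts the later date wins.
    count, date = max(values)
    yield (product_id, date, count)


# Hypothetical sample input: one (date, product_id) pair per access.
log = [
    ("2012-07-01", "p1"),
    ("2012-07-01", "p1"),
    ("2012-07-02", "p1"),
    ("2012-07-01", "p2"),
    ("2012-07-02", "p2"),
    ("2012-07-02", "p2"),
    ("2012-07-02", "p2"),
]

counts = run_stage(log, map_count, reduce_count)
best = run_stage(counts, map_by_product, reduce_max)
```

In a real job the mapper of stage 1 would emit the composite key (date, product_id) with value 1, and stage 2 would read stage 1's output and re-key it by product_id, which is what the two mapper functions above imitate.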
I'm not sure if this can be done efficiently with just one stage of map+reduce.

On Tue, Jul 3, 2012 at 3:36 PM, Shailesh Samudrala <shailesh2...@gmail.com> wrote:
> I want to find out how many times a product was searched during a day, and
> then select the day when this is highest.
>
> Until now, I have extracted all the required fields from the search string,
> and I am confused about what exactly I should be passing from the mapper to
> the reducer.
>
> On Tue, Jul 3, 2012 at 3:30 PM, Eugene Kirpichov <ekirpic...@gmail.com> wrote:
>
>> So you want to compute select max(date) from log group by product?
>> Can you describe how far you have advanced so far and where precisely
>> are you stuck?
>>
>> On Tue, Jul 3, 2012 at 3:23 PM, Shailesh Samudrala
>> <shailesh2...@gmail.com> wrote:
>> > I am writing a sample application to analyze some log files of webpage
>> > accesses. Basically, the log files record which products were accessed,
>> > and on what date.
>> > I want to write a MapReduce program to determine on what date was a
>> > product most accessed.
>> > Please share your ideas with me. Thanks!
>>
>>
>>
>> --
>> Eugene Kirpichov
>> http://www.linkedin.com/in/eugenekirpichov
>>

--
Eugene Kirpichov
http://www.linkedin.com/in/eugenekirpichov