Correction:

Try this

Map: emit key=product_id value=new MyMap<product_id,timestamp>()

In the reducer output you can place, for a given product_id:

                    MyMap.size() -> number of times the product was searched
                    MyMap.max() -> date with the highest count


The MyMap state should be preserved across multiple calls of the map method,
i.e. across input key-value pairs.
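
For what it's worth, here is a minimal sketch of the single-job idea (map emits
product_id -> date, the reducer counts dates per product and keeps the busiest
one). It is plain Python, not Hadoop code. The tab-separated log layout and the
names map_record, reduce_product and run_job are made up for illustration, and
run_job only imitates the shuffle step locally.

```python
from collections import Counter, defaultdict


def map_record(line):
    """Map step: parse one log line and emit (product_id, date).

    Assumes a hypothetical tab-separated layout: product_id<TAB>date.
    """
    product_id, date = line.rstrip("\n").split("\t")
    return product_id, date


def reduce_product(product_id, dates):
    """Reduce step: count accesses per date for one product, pick the busiest date."""
    counts = Counter(dates)
    # Ties are broken by the earliest date string (an arbitrary choice for this sketch).
    best_date, best_count = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return product_id, best_date, best_count


def run_job(lines):
    """Imitate the shuffle locally: group mapped values by key, then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        key, value = map_record(line)
        groups[key].append(value)
    return [reduce_product(pid, dates) for pid, dates in sorted(groups.items())]


# Made-up sample records, only to exercise the sketch.
sample_log = [
    "p1\t2012-07-01",
    "p1\t2012-07-02",
    "p1\t2012-07-02",
    "p2\t2012-07-03",
]
results = run_job(sample_log)
```

The per-product table in the reducer holds one entry per distinct date, so it
should stay small unless the log spans an unusually long period.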

On Wed, Jul 4, 2012 at 12:21 PM, Sambit Tripathy <sambi...@gmail.com> wrote:

> Try this
>
> Map: emit key=product_id value=new MyMap<product_id,timestamp>()
>
> In the reducer output you can place, for a given product_id:
>
>                     HashMap.size() -> number of times the product was searched
>                     HashMap.max() -> date with the highest count
>
>
>
> On Wed, Jul 4, 2012 at 4:48 AM, Eugene Kirpichov <ekirpic...@gmail.com> wrote:
>
>> Well, then you can simply do it like this:
>> Map: emit key=product_id value=date
>> Reduce for a particular product_id: manually count (in a hashtable)
>> dates and their counts, return the date with the highest count
>>
>> Assuming you've started selling products later than computers were
>> invented, this should be fine w.r.t. performance and memory
>> consumption :)
>>
>> On Tue, Jul 3, 2012 at 3:52 PM, Shailesh Samudrala
>> <shailesh2...@gmail.com> wrote:
>> > Yes, I think that is possible, but I'm looking for a solution that
>> > uses a single MapReduce job, if possible.
>> >
>> > On Tue, Jul 3, 2012 at 3:46 PM, Eugene Kirpichov <ekirpic...@gmail.com> wrote:
>> >
>> >> Ok, I see, so you need to 1) group by date and product_id and count
>> >> the records in each group => {date, product_id, count} (this is 1
>> >> map+reduce) 2) group this by product_id and get the date for which
>> >> count is highest (this is another 1 map+reduce).
>> >> Does this sound sensible?
>> >>
>> >> I'm not sure if this can be efficiently done with just 1 stage of
>> >> map+reduce.
>> >>
>> >> On Tue, Jul 3, 2012 at 3:36 PM, Shailesh Samudrala
>> >> <shailesh2...@gmail.com> wrote:
>> >> > I want to find out how many times a product was searched during a
>> >> > day, and then select the day when this is highest.
>> >> >
>> >> > Until now, I have extracted all the required fields from the search
>> >> > string, and I am confused about what exactly I should be passing
>> >> > from the mapper to the reducer.
>> >> >
>> >> > On Tue, Jul 3, 2012 at 3:30 PM, Eugene Kirpichov <ekirpic...@gmail.com> wrote:
>> >> >
>> >> >> So you want to compute select max(date) from log group by product?
>> >> >> Can you describe how far you have advanced so far and where
>> >> >> precisely you are stuck?
>> >> >>
>> >> >> On Tue, Jul 3, 2012 at 3:23 PM, Shailesh Samudrala
>> >> >> <shailesh2...@gmail.com> wrote:
>> >> >> > I am writing a sample application to analyze some log files of
>> >> >> > webpage accesses. Basically, the log files record which products
>> >> >> > were accessed, and on what date.
>> >> >> > I want to write a MapReduce program to determine on what date a
>> >> >> > product was most accessed.
>> >> >> > Please share your ideas with me. Thanks!
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Eugene Kirpichov
>> >> >> http://www.linkedin.com/in/eugenekirpichov
>> >> >>
>
>
