You can import the web logs into HDFS and then use Pig or Hive to analyze
them.
See:
http://hadoop.apache.org/pig/
http://hadoop.apache.org/hive/
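
To make the computation concrete, here is a rough single-machine sketch in
plain Python (not Pig or Hive code). It assumes each log record can be reduced
to a (visit_id, request_id, page) tuple, that request_id increases over time
within a visit, and that the registration page is served at "/register". The
function name, the record layout, and that URL are all made up for
illustration. The same group-by-visit, then count-identical-paths logic is
what you would express with GROUP BY in Pig or Hive.

```python
from collections import Counter, defaultdict

REGISTER_PAGE = "/register"  # hypothetical URL of the registration page


def top_routes(page_views, n=10):
    """Return the n most common page sequences that end in a registration.

    page_views: iterable of (visit_id, request_id, page) tuples. It is
    assumed that request_id increases with time inside a visit.
    """
    # Group the requests belonging to each visit.
    visits = defaultdict(list)
    for visit_id, request_id, page in page_views:
        visits[visit_id].append((request_id, page))

    routes = Counter()
    for requests in visits.values():
        # Order the pages of the visit by request ID.
        pages = [page for _, page in sorted(requests)]
        if REGISTER_PAGE in pages:
            # Keep the path up to and including the first registration hit.
            end = pages.index(REGISTER_PAGE) + 1
            routes[tuple(pages[:end])] += 1

    # Each entry is (route_tuple, number_of_visits_that_followed_it).
    return routes.most_common(n)
```

Visits that never reach the registration page are ignored, and anything a
visitor does after registering is cut off, so only the route leading up to
the registration is counted.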


On Thu, Jul 8, 2010 at 5:55 PM, Tim Jones <bogol...@ymail.com> wrote:

> Hi,
>
>
> I want to be able to discover the 10 most popular routes through our web
> site that lead a visitor to register with us.
>
> I am already logging page view data but don't seem to be able to find the
> best solution to query it. (Each Visitor has an ID, each Visitor makes
> multiple Visits, each with an ID, and each page request has an ID. I can
> join across all of these fields).
>
> The site is quite high traffic, > 3 million page views per month, so the
> solution needs to scale. That said, I envisage that for the next year or so
> the dataset would fit on a single machine.
>
> So far my research on Hadoop seems to suggest that it's generally only
> beneficial to use it if the dataset is large enough to warrant running a
> couple of Hadoop nodes.
>
> Is this problem even suited to Hadoop? How might I go about solving this
> problem?
>
>
>
> My initial thought was to use a graph database, and traverse from the
> registration page node outwards, selecting the next node as that which has
> the most links between itself and the registration node. I'm running into
> difficulties here, and wondered whether Hadoop might offer an alternative
> approach.
>
>
> Any pointers would be greatly appreciated.
>
> Thanks,
> Tim
>


-- 
Best Regards

Jeff Zhang
