Of course it won't scale, or at least not as well as your suggested
model. Chances are good that my idea is not an option for a
production system and not as useful as the less complex variant. So you
are right!

I asked because I wanted to get an idea of what should be done if a
record is too big to be processed by a single node.

Regards,
Em

On 19.07.2011 19:54, Steve Lewis wrote:
> I assumed the problem was to count the number of people visiting Moscow
> after London without considering any intermediate stops. This leads to
> a data structure which is easy to combine. The structure you propose
> adds more information and is difficult to combine. I doubt it could
> handle a billion people, and I recommend trying it with a hundred people
> visiting 5 out of 20 destinations in random order to see how bad it
> gets.
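The suggested experiment can be sketched roughly as follows. This is only a minimal sketch under assumptions: the city names, the seed and the function `run_experiment` are made up for illustration, and "order information" is interpreted as the offset of a destination after the origin (1 = the very next stop). It counts how many distinct entries each structure ends up holding.

```python
import random
from collections import Counter, defaultdict

def run_experiment(people=100, cities=20, stops=5, seed=0):
    """Compare the number of distinct entries in both structures.

    plain:   origin -> {destination: count}
    ordered: origin -> {(offset, destination): count}  (assumed interpretation)
    """
    rng = random.Random(seed)
    names = [f"city{n:02d}" for n in range(cities)]  # hypothetical city names
    plain = defaultdict(Counter)
    ordered = defaultdict(Counter)
    for _ in range(people):
        trip = rng.sample(names, stops)  # distinct cities in random order
        for i, place in enumerate(trip):
            for offset, later in enumerate(trip[i + 1:], start=1):
                plain[place][later] += 1
                ordered[place][(offset, later)] += 1
    plain_size = sum(len(c) for c in plain.values())
    ordered_size = sum(len(c) for c in ordered.values())
    return plain_size, ordered_size

plain_size, ordered_size = run_experiment()
print("plain entries:", plain_size)
print("order-aware entries:", ordered_size)
```

Under this interpretation the plain structure can never hold more than cities * (cities - 1) entries, while the order-aware one can hold up to (stops - 1) times as many, so its size depends on trip length as well as on the number of cities.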
> 
> My schema can handle billions of combinations, assuming only that the
> total destinations in any node can be handled, i.e. a billion people
> can visit any of a thousand cities in random order and in the worst case
> I need a thousand cities and a thousand counts. I doubt that the schema
> you propose, with added order information, will scale to those levels.
> 
> On Tue, Jul 19, 2011 at 10:39 AM, Em <mailformailingli...@yahoo.de> wrote:
> 
>     Thanks!
> 
>     So you invert the data and then walk through each inverted result.
>     Good point!
>     What do you think about prefixing each city-name with the index in
>     the list?
> 
>     This way you can say:
>     London: 1_Moscow:2, 1_Paris:2, 2_Moscow:1, 2_Riga:4, 2_Paris:1,
>     3_Berlin:1...
> 
>     From this list you can see that people are likely to visit Moscow right
>     after London on their first or second journey. This would maintain a
>     strong order (whether that's good or bad depends on the
>     real-world scenario).
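A minimal sketch of the prefixed variant, assuming the index means the offset of a city after the origin (1 = the very next stop); other readings of "index in the list" are possible. The function name `indexed_successor_counts` is made up for illustration, and the trips are the Joe, Bill and Sue examples quoted further down in this thread.

```python
from collections import Counter, defaultdict

def indexed_successor_counts(itinerary):
    """For each place, count later places keyed as '<offset>_<city>'."""
    counts = defaultdict(Counter)
    for i, place in enumerate(itinerary):
        for offset, later in enumerate(itinerary[i + 1:], start=1):
            counts[place][f"{offset}_{later}"] += 1
    return counts

trips = [
    ["Washington", "London", "Paris", "Moscow"],  # Joe
    ["London", "Moscow"],                         # Bill
    ["London", "Riga", "Paris"],                  # Sue
]

total = defaultdict(Counter)
for trip in trips:
    for place, followers in indexed_successor_counts(trip).items():
        total[place].update(followers)  # Counter.update adds counts

print(dict(total["London"]))
```

Combining still works by adding counts per key, but each city now appears once per offset, so the number of keys per origin grows with the length of the trips.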
> 
>     Since your ideas gave me a good starting point for implementing this
>     job (I'll practice it), we can make the problem harder, if
>     you like.
> 
>     What happens to records that are too big to be processed by one node?
>     Let's say that, from my above example of a strongly-ordered list, one
>     gets a billion combinations - far too many for one node (we assume that).
>     What possibilities does Hadoop offer to deal with such things?
> 
>     Regards and many thanks for the insights,
>     Em
> 
> 
>     On 19.07.2011 19:15, Steve Lewis wrote:
>     > Assume Joe visits Washington, London, Paris and Moscow
>     >
>     > You start with records like
>     > Joe:Washington:20-Jan-2011
>     > Joe:London:14-Feb-2011
>     > Joe:Paris:9-Mar-2011
>     >
>     > You want
>     > Joe: Washington, London, Paris and Moscow
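This first step, turning visit records into one ordered list of places per person, might look roughly like the sketch below. It is plain Python rather than an actual Hadoop job, the function name `itineraries` is made up, and the Moscow record uses an invented date because the thread does not give one.

```python
from collections import defaultdict
from datetime import datetime

records = [
    "Joe:Paris:9-Mar-2011",
    "Joe:Washington:20-Jan-2011",
    "Joe:London:14-Feb-2011",
    "Joe:Moscow:2-Apr-2011",  # hypothetical date, not given in the thread
]

def itineraries(lines):
    """Group person:place:date records by person and sort each group by date."""
    visits = defaultdict(list)
    for line in lines:
        person, place, date_text = line.split(":")
        when = datetime.strptime(date_text, "%d-%b-%Y")
        visits[person].append((when, place))
    return {
        person: [place for _, place in sorted(stops)]
        for person, stops in visits.items()
    }

print(itineraries(records))
```

In a MapReduce setting the grouping by person would be done by the shuffle, with the person as the key, and the sort by date would happen in the reducer.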
>     >
>     > For the next step the person is irrelevant;
>     > you want
>     >
>     >
>     > Washington: London:1, Paris:1, Moscow:1
>     > London: Paris:1, Moscow:1
>     > Paris: Moscow:1
>     > The first line says that after a visit to Washington there was one
>     > visit to London, one to Paris and one to Moscow
>     >
>     >
>     > Joe's structure can then be combined with those of other travelers
>     >
>     >
>     > Now suppose Bill visits London and Moscow
>     > So he generates
>     > London: Moscow:1
>     >
>     > This can be combined with the one from Joe saying
>     > London: Paris:1, Moscow:1
>     > to give
>     >
>     > London: Paris:1, Moscow:2
>     >
>     > Now suppose Sue visits London, Riga and Paris
>     > So she generates
>     > London: Paris:1, Riga:1
>     >
>     > This can be combined with London: Paris:1, Moscow:2 to give
>     >
>     > London: Paris:2, Moscow:2, Riga:1
>     >
>     > Note I can keep places in alphabetical order in the result
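The combining step described above can be sketched in a few lines of Python. This is an illustration only, not a Hadoop job: the function names `successor_counts` and `combine` are made up, and the sketch assumes nobody visits the same place twice in one trip.

```python
from collections import Counter, defaultdict

def successor_counts(itinerary):
    """For each place, count every place visited later in the same trip."""
    counts = defaultdict(Counter)
    for i, place in enumerate(itinerary):
        for later in itinerary[i + 1:]:
            counts[place][later] += 1
    return counts

def combine(a, b):
    """Merge two structures by adding the counts key by key."""
    merged = defaultdict(Counter)
    for part in (a, b):
        for place, followers in part.items():
            merged[place].update(followers)  # Counter.update adds counts
    return merged

joe = successor_counts(["Washington", "London", "Paris", "Moscow"])
bill = successor_counts(["London", "Moscow"])
sue = successor_counts(["London", "Riga", "Paris"])

total = combine(combine(joe, bill), sue)
for place in sorted(total):
    # sorted() keeps the places in alphabetical order, as noted above
    print(place, dict(sorted(total[place].items())))
```

For London this gives Moscow 2, Paris 2 and Riga 1, matching the combined result above. Because combining is just addition, the order in which the partial structures are merged does not matter, which is what makes this shape suitable for a combiner.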
>     >
>     >
>     >
>     > On Tue, Jul 19, 2011 at 9:53 AM, Em <mailformailingli...@yahoo.de> wrote:
>     >
>     >     Hi Steven,
>     >
>     >     thanks for your response! For simplicity we can make the
>     >     assumptions you made - maybe this makes it much easier to help.
>     >     Those little extras are something for after solving the "easy"
>     >     version of the task. :)
>     >
>     >     What do you mean by the following?
>     >
>     >     > The second job takes Person : list of places and, for each
>     >     > place in the list, constructs
>     >     > place : 1 | place after P : 1 | next place : 1 ...
>     >
>     >     Do you mean something like this?
>     >
>     >     Washington DC:1
>     >     New York after Washington DC:1
>     >     Miami after New York:1
>     >
>     >     I do not see how that helps with the result I would like to get.
>     >
>     >     The end result should be something like this:
>     >     Washington DC => New York, Miami, Los Angeles
>     >     New York => Chicago, Seattle, San Francisco
>     >
>     >     The point is that one can see that people who visited
>     >     Washington DC are likely to visit New York as the next place,
>     >     Miami as the second and L.A. as the third.
>     >     However, if I choose New York as my starting point, I can see that
>     >     people who start their journey in New York (and maybe weren't in
>     >     DC before) are likely to visit Chicago, Seattle and San Francisco.
>     >     Maybe Los Angeles comes in 10th position.
>     >
>     >     Regards,
>     >     Em
>     >
>     >
>     >
>     >
>     > --
>     > Steven M. Lewis PhD
>     > 4221 105th Ave NE
>     > Kirkland, WA 98033
>     > 206-384-1340 (cell)
>     > Skype lordjoe_com
>     >
>     >
> 
> 
> 
> 
> -- 
> Steven M. Lewis PhD
> 4221 105th Ave NE
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Skype lordjoe_com
> 
> 
