Of course it won't scale, or at least not as well as the model you suggested. Chances are good that my idea is not an option for a production system and is less useful than the less complex variant. So you are right!
The reason I asked was to get an idea of what should be done if a record is too big to be processed by a single node.

Regards,
Em

On 19.07.2011 19:54, Steve Lewis wrote:
> I assumed the problem was to count the number of people visiting Moscow
> after London without considering any intermediate stops. This leads to
> a data structure which is easy to combine. The structure you propose
> adds more information and is difficult to combine. I doubt it could
> handle a billion people and recommend trying with a hundred people
> visiting 5 out of 20 destinations in random order to see how bad it is
> getting.
>
> My schema can handle billions of combinations assuming only that the
> total destinations in any node can be handled - i.e. a billion people
> can visit any of a thousand cities in random order and worst case I need
> a thousand cities and a thousand counts - now I doubt that the schema
> you propose with added order information will scale to those levels
>
> On Tue, Jul 19, 2011 at 10:39 AM, Em <mailformailingli...@yahoo.de> wrote:
>
>     Thanks!
>
>     So you invert the data and then walk through each inverted result.
>     Good point!
>     What do you think about prefixing each city-name with the index in
>     the list?
>
>     This way you can say:
>     London: 1_Moscow:2, 1_Paris:2, 2_Moscow:1, 2_Riga:4, 2_Paris:1,
>     3_Berlin:1...
>
>     From this list you can see that people are likely to visit Moscow right
>     after London at their first or second journey. This would maintain a
>     strong order (whether that's good or bad depends on a
>     real-world-scenario).
>
>     Since your ideas gave me a good starting-point for realizing this job
>     (I'll practice it), we can make the problem more heavy-weight, if
>     you like?
>
>     What happens to records that are too big to be processable by one node?
>     Let's say from my above example of a strongly-ordered list one gets a
>     billion combinations - way too much for one node (we assume that).
>     What possibilities does Hadoop offer to deal with such things?
>
>     Regards and many thanks for the insights,
>     Em
>
>
>     On 19.07.2011 19:15, Steve Lewis wrote:
>     > Assume Joe visits Washington, London, Paris and Moscow
>     >
>     > You start with records like
>     > Joe:Washington:20-Jan-2011
>     > Joe:London:14-Feb-2011
>     > Joe:Paris:9-Mar-2011
>     >
>     > You want
>     > Joe: Washington, London, Paris and Moscow
>     >
>     > For the next step the person is irrelevant
>     > you want
>     >
>     > Washington: London:1, Paris:1, Moscow:1
>     > London: Paris:1, Moscow:1
>     > Paris: Moscow:1
>     >
>     > The first says that after a visit to Washington there was one visit to
>     > London, one to Paris and one to Moscow
>     >
>     > This can be combined with the one from Joe
>     >
>     > Now suppose Bill visits London and Moscow
>     > So he generates
>     > London: Moscow:1
>     >
>     > This can be combined with the one from Joe saying London: Paris:1 and
>     > Moscow:1
>     > to give
>     >
>     > London: Paris:1 and Moscow:2
>     >
>     > Now suppose Sue visits London and Riga and Paris
>     > So she generates
>     > London: Paris:1, Riga:1
>     >
>     > This can be combined with London: Paris:1 and Moscow:2 to give
>     >
>     > London: Paris:2 and Moscow:2, Riga:1
>     >
>     > Note I can keep places in alphabetical order in the result
>     >
>     >
>     > On Tue, Jul 19, 2011 at 9:53 AM, Em <mailformailingli...@yahoo.de> wrote:
>     >
>     >     Hi Steven,
>     >
>     >     thanks for your response! For the ease of use we can make those
>     >     assumptions you made - maybe this makes it much easier to help. Those
>     >     little extras are something for after solving the "easy" version of the
>     >     task. :)
>     >
>     >     What do you mean with the following?
>     >
>     >     > The second job takes Person : list of places and for each place
>     >     > in the list constructs
>     >     > place : 1 | place after P : 1 | next place : 1 ...
>     >
>     >     You mean something like that?
>     >
>     >     Washington DC:1
>     >     New York after Washington DC:1
>     >     Miami after New York:1
>     >
>     >     I do not see the benefit for the result I like to get?
>     >
>     >     The end-result should be something like that:
>     >     Washington DC => New York, Miami, Los Angeles
>     >     New York => Chicago, Seattle, San Francisco
>     >
>     >     The point is, that one can see that persons that visited Washington DC
>     >     are likely to visit New York as the next place, Miami as the second and
>     >     L.A. as the third.
>     >     However, if I choose New York as my starting point, I can see that
>     >     persons that start their journey in New York (and maybe weren't in DC
>     >     before) are likely to visit Chicago, Seattle and San Francisco. Maybe
>     >     Los Angeles comes at the 10th position.
>     >
>     >     Regards,
>     >     Em
>     >
>     > --
>     > Steven M. Lewis PhD
>     > 4221 105th Ave NE
>     > Kirkland, WA 98033
>     > 206-384-1340 (cell)
>     > Skype lordjoe_com
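
For reference, here is a minimal sketch of the scheme from the quoted thread: each traveller's ordered list of places is turned into per-place counts of the places visited later, and these partial results are merged by adding counts. It is plain Python rather than a Hadoop job; the function names `successor_counts` and `combine` are made up for illustration, and it assumes each itinerary lists a place at most once.

```python
from collections import Counter, defaultdict


def successor_counts(itinerary):
    """For one traveller, map each place to counts of the places visited later.

    Assumption: `itinerary` is already sorted by visit date and contains
    each place at most once.
    """
    result = defaultdict(Counter)
    for i, place in enumerate(itinerary):
        for later in itinerary[i + 1:]:
            result[place][later] += 1
    return result


def combine(a, b):
    """Merge two partial results by adding the counts per place.

    Addition is associative and commutative, so partial results can be
    merged in any order.
    """
    merged = defaultdict(Counter)
    for part in (a, b):
        for place, counts in part.items():
            merged[place].update(counts)
    return merged


# The three travellers from the thread.
joe = successor_counts(["Washington", "London", "Paris", "Moscow"])
bill = successor_counts(["London", "Moscow"])
sue = successor_counts(["London", "Riga", "Paris"])

total = combine(combine(joe, bill), sue)

# Places in alphabetical order, as suggested in the thread.
print(sorted(total["London"].items()))
# [('Moscow', 2), ('Paris', 2), ('Riga', 1)]
```

Because the merged value for a place never holds more entries than there are distinct destinations, its size is bounded by the number of cities and not by the number of travellers, which is the property Steve relies on.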