We don't have a partition per user; there is no need for that, in the same way that a distributed database doesn't have a partition per user. A partition is just a physical grouping of keys, so many users' keys share each partition.
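To make the point concrete, here is a minimal sketch assuming a hypothetical topic with 8 partitions. It uses CRC32 as a stand-in for the producer's hash function, because this thread does not say which hash Kafka's partitioner actually applies. Hashing many user ids with hash(key) % num_partitions spreads them over a fixed set of partitions. The number of partition logs (and so files) therefore depends on the partition count, not on the number of users.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count for one topic


def partition_for(user_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition via hash(key) % num_partitions.

    CRC32 is an assumed stand-in for the producer's real hash function.
    """
    return zlib.crc32(str(user_id).encode("utf-8")) % num_partitions


# Hash a sample of user ids; the same reasoning holds for 10 million.
assignments = Counter(partition_for(uid) for uid in range(100_000))

print(len(assignments))             # distinct partitions used, never more than 8
print(sorted(assignments.items()))  # how many users landed in each partition
```

However many users are hashed, they all fall into the same 8 buckets. Adding users grows each partition's log rather than the number of partitions.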
-Jay

On Tue, Nov 27, 2012 at 12:00 PM, S Ahmed <sahmed1...@gmail.com> wrote:

> How does that work out though? I mean, with 10 million users that is 10
> million files at least.
>
> On Mon, Nov 26, 2012 at 2:02 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > Yeah, a partition is physically implemented as a log (i.e. a sequence of
> > files containing a bunch of messages indexed by offset). So each server
> > can have lots of partitions, but each partition exists entirely on one
> > server.
> >
> > So in the "newsfeed" case, if you partition by user id, you would be
> > guaranteed that all activity relevant to that user went to a single
> > processor. In our case, yes, we serve out of a different system, which is
> > the destination after all the pre-processing.
> >
> > On Mon, Nov 26, 2012 at 9:19 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> >
> > > > Yes, your description is correct. A particular member's data would
> > > > all be in one partition.
> > >
> > > When you say "in one partition", does that also mean on the same
> > > server? Or can a partition span broker nodes?
> > >
> > > At the file level, I'm guessing it has its own physical file then (or
> > > set of files as it grows, with the file number suffix)?
> > >
> > > So at LinkedIn, is this how you present a user's dashboard inbox (your
> > > friend has a new job, they updated their profile, someone recommended
> > > them, etc.)? I guess you can further sort at the application level
> > > then, and cache to a different store?
> > >
> > > On Mon, Nov 26, 2012 at 11:53 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > >
> > > > Yes, your description is correct. A particular member's data would
> > > > all be in one partition.
> > > >
> > > > Broker partitions are just the unit of parallelism: think of each
> > > > partition as a totally ordered log you can append to and read from.
> > > > The consumption of one of these partition logs is single threaded.
> > > >
> > > > The guarantee is that all messages are added to a partition in the
> > > > order they arrive. From the point of view of a single producer
> > > > client, this will also be the order in which they are sent. These
> > > > messages are then delivered in this order to a consumer thread.
> > > >
> > > > Hope that helps.
> > > >
> > > > -Jay
> > > >
> > > > On Sun, Nov 25, 2012 at 7:54 PM, S Ahmed <sahmed1...@gmail.com> wrote:
> > > >
> > > > > The wiki states "Consider an application that would like to
> > > > > maintain an aggregation of the number of profile visitors for each
> > > > > member. It would like to send all profile visit events for a member
> > > > > to a particular partition and, hence, have all updates for a member
> > > > > to appear in the same stream for the same consumer thread."
> > > > > (http://incubator.apache.org/kafka/design.html)
> > > > >
> > > > > So say I have 5 broker servers. My producer will send a message for
> > > > > a particular profile page visit, with the default algorithm using
> > > > > hash(member_id) % num_partitions to figure out which broker server
> > > > > to send it to.
> > > > >
> > > > > So a particular member's page-view messages will all go to a single
> > > > > server then, is this the case? And therefore all the messages for a
> > > > > given user will be in the correct order also, right?
> > > > >
> > > > > So a consumer group that subscribes to the 'profile-page-view'
> > > > > topic will consume page-view related messages. Is it possible to
> > > > > subscribe to a particular broker partition also?
> > > > >
> > > > > Are broker partitions meant for cases when you want all messages to
> > > > > be saved on the same node?
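The routing and ordering guarantee discussed in this thread can be sketched with a toy in-memory model. It assumes each partition is an append-only list whose index serves as the message offset. The `Topic` class, the member names, and the CRC32 hash are hypothetical stand-ins, not Kafka's API. The sketch models partitions only: which broker hosts each partition is a separate assignment that it leaves out. It shows that keying by member id sends every event for a member to one partition, where a single consumer reads them back in the order they were sent.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy stand-in for one partition: an append-only list; list index == offset."""
    messages: list = field(default_factory=list)

    def append(self, message) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to the new message

    def read(self, offset: int = 0) -> list:
        # A consumer reads sequentially from a starting offset.
        return self.messages[offset:]


class Topic:
    """Hypothetical topic that routes messages by hash(key) % num_partitions."""

    def __init__(self, num_partitions: int):
        self.partitions = [PartitionLog() for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # CRC32 is an assumed stand-in for the producer's real hash function.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key: str, value):
        p = self.partition_for(key)
        offset = self.partitions[p].append((key, value))
        return p, offset


def views_for(topic: Topic, member: str) -> list:
    """Single-threaded consumption of the one partition that holds `member`."""
    log = topic.partitions[topic.partition_for(member)]
    return [value for key, value in log.read() if key == member]


topic = Topic(num_partitions=5)
# One producer interleaves page-view events for three members.
for seq in range(3):
    for member in ("alice", "bob", "carol"):
        topic.send(member, f"profile-view-{seq}")

for member in ("alice", "bob", "carol"):
    # Each member's views come back as profile-view-0, -1, -2, in send order.
    print(member, views_for(topic, member))
```

Two members can hash to the same partition, which is why `views_for` filters by key. What keying guarantees is that one member's events never span two partitions, so their relative order is preserved.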