Yes, exactly. Say you have N partitions and a single input. The number of containers can be anything from 1 to N, and you can change it any time.
On Tue, Dec 10, 2013 at 2:27 AM, Klaus Schaefers < [email protected]> wrote: > Hi, > > thx for the detailed answers. But one more question, what do you mean be > "over-partition"? Do you mean I would initially define lets say 100 > partitions and then just assign 2 containers? When I need to scale out I > would just add more containers and Samza would the also redistribute the > state store? What if I want to reduce the number of containers (e.g. > hosting on EC2), can Samza merge the state stores from several containers > to one? > > Cheers, > > Klaus > > > > On Mon, Dec 9, 2013 at 6:40 PM, Chris Riccomini <[email protected] > >wrote: > > > Hi Klaush, > > > > Thanks for your interest. I'm happy to answer any questions you've got. > > > > 1) In case of failure, will Samza restore the state automatically on > > another node? > > > > Yes. If you are using YARN > > (job.factory.class=org.apache.samza.job.yarn.YarnJobFactory), then YARN > > will see that a Samza container has failed. It will re-start the Samza > > container on a machine in the YARN grid. This node might be the same node > > on the grid, or it might be a new node. YARN makes that decision based on > > the available resources in the grid. When the Samza container starts up, > > Samza will restore the container's state to where it was before the > > failure occurred. Once the state has been fully restored, the Samza > > container will then start feeding your StreamTask new messages from the > > input streams. > > > > 2) If I want to scale out and increase the number of stream partitions. > How > > is the local storage handled? Is it distributed by the framework as well > > according to the partitions? > > > > > > Samza's partitioning model is currently determined by the number of > > partitions that a Samza jobs input streams have. A Samza job will always > > have the max of the partition counts from all of its input streams. For > > example, if a Samza job has two input streams: A and B, and stream A has > 4 > > partitions, and stream B has 8, then the Samza job will have 8 > partitions. > > The first four partitions (0-3) of the Samza job will receive messages > > from both stream A and Stream b, and the last 4 partitions (4-7) will > > receive messages ONLY from stream B. These partitions are run physically > > inside of Samza "containers". Samza containers are assigned partitions > > that they are responsible for processing. For example, if you had two > > Samza containers, the first container would process 4 partitions, and the > > second container would process the other four. With YARN, the number of > > containers you have is defined by the config yarn.container.count. Right > > now, you can never have more containers that input stream partitions. > > > > The state for each one of a Samza job's partitions is managed entirely > > independently. That is, each Samza partition has its own state store > > (LevelDb, if you're using samza-kv). So in the example above, if the > > StreamTask were using a single key-value store, there would be 8 LevelDb > > stores, one for each Samza partition. This allows us to move partitions > > between containers. If you were to decide you wanted 3 containers instead > > of 2, Samza would simply cease processing the partitions on the other two > > containers, restore the state for the third container, and then begin > > processing the partitions across all three containers. This model means > > that the maximum parallelism you can get when processing is 1 partition > > per container, and up to as many partitions as your input streams have. > > > > Samza does not support resizing an input stream's partition count right > > now. Once you start a Samza job, the partition counts for the input > > streams are assumed to be static. If you decide you need more > parallelism, > > you need to start a new Samza job with a different job.name, and > > re-process all the input data again. > > > > Cheers, > > Chris > > > > On 12/9/13 8:02 AM, "Klaus Schaefers" <[email protected]> > wrote: > > > > >Hi, > > > > > >I have been reading about the Samza and I like the concept behind it a > > >lot. > > >In particular the local key-value store is a good idea. However I have > > >some > > >short questions regarding the local state that I couldn't answer while > > >reading the web page. I would be very happy if someone could answer them > > >shortly. Here they are: > > > > > > > > >1) In case of failure, will Samza restore the state automatically on > > >another node? > > > > > >2) If I want to scale out and increase the number of stream partitions. > > >How > > >is the local storage handled? Is it distributed by the framework as well > > >according to the partitions? > > > > > > > > >Cheers, > > > > > >Klaus > > > > > > > > > > > > > > >-- > > > > > >-- > > > > > >Klaus Schaefers > > >Senior Optimization Manager > > > > > >Ligatus GmbH > > >Hohenstaufenring 30-32 > > >D-50674 Köln > > > > > >Tel.: +49 (0) 221 / 56939 -784 > > >Fax: +49 (0) 221 / 56 939 - 599 > > >E-Mail: [email protected] > > >Web: www.ligatus.de > > > > > >HRB Köln 56003 > > >Geschäftsführung: > > >Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann, > > >Dipl.-Wirtschaftsingenieur Arne Wolter > > > > > > > > > > > >-- > > > > > >-- > > > > > >Klaus Schaefers > > >Senior Optimization Manager > > > > > >Ligatus GmbH > > >Hohenstaufenring 30-32 > > >D-50674 Köln > > > > > >Tel.: +49 (0) 221 / 56939 -784 > > >Fax: +49 (0) 221 / 56 939 - 599 > > >E-Mail: [email protected] > > >Web: www.ligatus.de > > > > > >HRB Köln 56003 > > >Geschäftsführung: > > >Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann, > > >Dipl.-Wirtschaftsingenieur Arne Wolter > > > > > > > -- > > -- > > Klaus Schaefers > Senior Optimization Manager > > Ligatus GmbH > Hohenstaufenring 30-32 > D-50674 Köln > > Tel.: +49 (0) 221 / 56939 -784 > Fax: +49 (0) 221 / 56 939 - 599 > E-Mail: [email protected] > Web: www.ligatus.de > > HRB Köln 56003 > Geschäftsführung: > Dipl.-Kaufmann Lars Hasselbach, Dipl.-Kaufmann Klaus Ludemann, > Dipl.-Wirtschaftsingenieur Arne Wolter >
