GitHub user shanthoosh opened a pull request:

    https://github.com/apache/samza/pull/874

    SAMZA-2058: Integrate the input partition expansion aware 
SystemStreamGrouper to JobModel generation flow.

    SAMZA-1989 added a partition expansion aware SystemStreamPartitionGrouper 
in samza. This PR aims at integrating the SystemStreamGrouper with the job 
model generation workflow of samza
    and make it work for both the yarn and standalone deployment models. 
    
    **Changes:** 
    
    1. Addition of TaskPartitionAssignmentManager to store the task to 
partition assignments present in JobModel to the underlying metadata store.  
This is essential in persisting the Task to SystemStreamPartition assignments 
for the previous run of a samza job. Currently samza-yarn stores the metadata 
for a execution of a job in coordinator stream. Maximum supported kafka message 
size within LI is 1 MB. This limitation drove the decision to denormalize the 
task to SystemStreamPartition Map into individual messages and store in the 
coordinator stream. 
    2. Used the existing Coordinator stream json serde to deserialize/serialize 
the task to partition assigments to raw bytes before reading/writing into 
coordinator stream. 
    3. Changes in JobModelManager to integrate the input partition expansion 
aware SSPGrouper changes.
    4. Code/JavaDoc cleanup done in  MetadataStore utility classes.
    
    **Testing**:
    
    1. Added new unit-tests for all the newly added classes and fixed the 
existing unit-tests depending upon the changes.
    2. Standalone: Wrote few integration tests in TestZkLocalApplcationRunner 
for standalone to test input stream partition expansion.
    3. YARN: Tested this patch with a sample stream-to-table join high-level 
job from samza-hello-samza. Here're the relevant logs:  
https://gist.github.com/shanthoosh/07357bb615d9cbbfa23cc02b98c9d142, which 
verifies that the AM is restarted on partition expansion of input stream and 
correct task to partition assignments are generated.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shanthoosh/samza SEP-5_left-over

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/874.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #874
    
----
commit 7bedd46ba98ebb18bdcbf6e3feace7188ac9af20
Author: Shanthoosh Venkataraman <spvenkat@...>
Date:   2018-12-07T02:04:18Z

    Initial commit.

----


---

Reply via email to