[GitHub] [pulsar] devinbost commented on issue #4012: Adding upsert functionality

GitBox Tue, 09 Apr 2019 16:08:29 -0700

devinbost commented on issue #4012: Adding upsert functionality
URL: https://github.com/apache/pulsar/pull/4012#issuecomment-481472762
 
 
   @jerrypeng Thank you for asking these questions. I will explain my reasoning 
behind these changes and then answer your questions. 
   
   ## The Goal
   My primary goal is to improve the continuous deployment process for Pulsar 
environments operating at scale. In large projects where multiple teams 
collaborate to develop functions, sinks, and sources, we need an automated 
deployment process that can update a production Pulsar environment quickly and 
without downtime, even when hundreds or thousands of interdependent Pulsar 
functions, sinks, and sources must be deployed simultaneously. My hope is to 
let Pulsar handle more of the deployment logic, which would simplify running 
Pulsar in large, high-performance production environments. 
   
   ## The Problem
   Because our pulsar-admin commands are dockerized so that they can operate at 
scale, every pulsar-admin invocation carries performance overhead: each command 
takes approximately 3-5 seconds to execute. In a deployment with 300 Pulsar 
functions, if the commands must be executed in series (rather than in 
parallel), running 300 pulsar-admin commands to update these objects takes 
15-25 minutes. Because Pulsar is in a broken state while these commands are 
running, this deployment approach could leave a production Pulsar environment 
down for 15-25 minutes, far beyond our SLA of 300 milliseconds of downtime. 
Furthermore, if we need to execute a get-status check before each update 
command (so that we can conditionally execute a create command instead), we 
double our wait time and significantly increase the complexity of the 
deployment code that we must maintain.  
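
As a rough check of those figures, here is a minimal Python sketch of the arithmetic. The function name and parameters are illustrative only; the 300 components and the 3-5 seconds per command come from the numbers above, and the model assumes strictly serial execution with no other overhead.

```python
def serial_deploy_minutes(components, seconds_per_command, commands_per_component=1):
    """Estimate wall-clock minutes when pulsar-admin commands run one at a time."""
    return components * commands_per_component * seconds_per_command / 60


# One update command per component, at 3 s and 5 s per command.
low = serial_deploy_minutes(300, 3)
high = serial_deploy_minutes(300, 5)
print(f"update only: {low:.0f}-{high:.0f} minutes")

# A get-status check before every update means two commands per component.
low_checked = serial_deploy_minutes(300, 3, commands_per_component=2)
high_checked = serial_deploy_minutes(300, 5, commands_per_component=2)
print(f"status check + update: {low_checked:.0f}-{high_checked:.0f} minutes")
```

Under these assumptions the status check alone pushes the serial window from 15-25 minutes to 30-50 minutes.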
   
   ## The Hope
   If we can offload more of the deployment logic to Pulsar, then we could 
easily parallelize the deployment process in a way that avoids the overhead and 
complexity of repeatedly executing pulsar-admin commands and parsing their 
text output. 
   
   ## The Current Approach
   So far, we have handled deployment with SaltStack: it reads a YAML manifest 
file containing component (function, sink, and source) metadata and generates 
the pulsar-admin create commands for those components. Here is an obfuscated 
example of this YAML:
   
   ```
   - type: source
     namespace: ns1
     tenant: tenant1
     name: source1-kafka
     sourceType: kafka
     destinationTopicName: persistent://tenant1/ns1/topic1
     configs:
         bootstrapServers: kafka_bootstrap_servers
         groupId: "kafkaGroupId"
         topic: "kafkaTopic"
         consumerConfigProperties:
             security.protocol: "SASL_PLAINTEXT"
             sasl.kerberos.service.name: "kafka"
             auto.offset.reset: "latest"
             sasl.jaas.config: sasl_jaas_config
   - type: function
     namespace: ns1
     name: func1
     tenant: tenant1
     artifactFileName: tenant1-1.1-SNAPSHOT-jar-with-dependencies.jar
     className: com.path.to.className1
     inputs:
     - persistent://tenant1/ns1/topic1
     logTopic: persistent://tenant1/ns1/logTopic1
     output: persistent://tenant1/ns1/topic2
   - type: sink
     namespace: ns1
     name: sink1-redis
     tenant: tenant1
     artifactFileName: tenant1-1.1-SNAPSHOT-jar-with-dependencies.jar
     className: com.path.to.className2
     inputs:
     - persistent://tenant1/ns1/topic2
     configs:
         hostname: redis_hostname
         port: redis_port
         password: redis_pass
   ```
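
To make the generation step concrete, here is a minimal Python sketch of how one manifest entry might be turned into a pulsar-admin create command. This is not the actual SaltStack code: `build_create_command` is a hypothetical helper, the entry is written as a plain dict to avoid a YAML dependency, and the subcommand and flag names reflect the pulsar-admin CLI as commonly documented, so they should be checked against the Pulsar version in use.

```python
import json


def build_create_command(entry):
    """Translate one manifest entry (a dict) into a pulsar-admin argument list.

    Hypothetical helper; flag names are assumptions to verify against your CLI.
    """
    kind = entry["type"]
    cmd = [
        "pulsar-admin", kind + "s", "create",
        "--tenant", entry["tenant"],
        "--namespace", entry["namespace"],
        "--name", entry["name"],
    ]
    if kind == "function":
        cmd += [
            "--jar", entry["artifactFileName"],
            "--classname", entry["className"],
            "--inputs", ",".join(entry["inputs"]),
            "--output", entry["output"],
        ]
        if "logTopic" in entry:
            cmd += ["--log-topic", entry["logTopic"]]
    elif kind == "source":
        cmd += [
            "--source-type", entry["sourceType"],
            "--destination-topic-name", entry["destinationTopicName"],
            "--source-config", json.dumps(entry["configs"]),
        ]
    elif kind == "sink":
        cmd += [
            "--archive", entry["artifactFileName"],
            "--classname", entry["className"],
            "--inputs", ",".join(entry["inputs"]),
            "--sink-config", json.dumps(entry["configs"]),
        ]
    else:
        raise ValueError(f"unknown component type: {kind!r}")
    return cmd


# The function entry from the manifest above, as a dict.
function_entry = {
    "type": "function",
    "namespace": "ns1",
    "name": "func1",
    "tenant": "tenant1",
    "artifactFileName": "tenant1-1.1-SNAPSHOT-jar-with-dependencies.jar",
    "className": "com.path.to.className1",
    "inputs": ["persistent://tenant1/ns1/topic1"],
    "logTopic": "persistent://tenant1/ns1/logTopic1",
    "output": "persistent://tenant1/ns1/topic2",
}
print(" ".join(build_create_command(function_entry)))
```

Each generated command is still a separate pulsar-admin invocation, so this step by itself does nothing to reduce the per-command overhead described earlier.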
   
   ## The Idea
   If we could pass a YAML file like this to Pulsar and have Pulsar ensure that 
its state matches our YAML file, it would make large-scale continuous 
deployments seamless. The Upsert functionality is one step in this direction, 
but it's not anywhere near as important as a bigger-picture solution to this 
problem. My team wants to build changes into Pulsar to simplify deployments, 
but we need architectural guidance about where to make these changes in Pulsar 
so that we don't violate architectural expectations.  
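
One way to picture "ensure that its state matches our YAML file" is as a reconciliation step: compare the desired components with what is currently deployed and derive the actions needed. The Python sketch below is illustrative only and does not describe anything Pulsar does today. `plan_changes` and the key layout are hypothetical, and treating components that are absent from the manifest as deletions is an assumption rather than something stated above.

```python
def plan_changes(desired, existing):
    """Compare desired and existing components and return the actions needed.

    Both arguments map a key such as (tenant, namespace, type, name) to a spec
    dict. Hypothetical sketch; deleting unlisted components is an assumption.
    """
    plan = {"create": [], "update": [], "delete": [], "unchanged": []}
    for key, spec in desired.items():
        if key not in existing:
            plan["create"].append(key)      # upsert: missing, so create
        elif existing[key] != spec:
            plan["update"].append(key)      # upsert: present but different
        else:
            plan["unchanged"].append(key)
    for key in existing:
        if key not in desired:
            plan["delete"].append(key)
    for actions in plan.values():
        actions.sort()                      # deterministic ordering
    return plan


desired = {
    ("tenant1", "ns1", "function", "func1"): {"output": "persistent://tenant1/ns1/topic2"},
    ("tenant1", "ns1", "sink", "sink1-redis"): {"inputs": ["persistent://tenant1/ns1/topic2"]},
}
existing = {
    ("tenant1", "ns1", "function", "func1"): {"output": "persistent://tenant1/ns1/old-topic"},
    ("tenant1", "ns1", "function", "stale-func"): {"output": "persistent://tenant1/ns1/topic9"},
}
print(plan_changes(desired, existing))
```

In this framing, Upsert covers only the create and update branches for a single component, which is why it is one step toward the broader goal rather than the whole solution.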
   
   ## Answering your questions
   ### Regarding your Question 1
   My current understanding is that the Admin CLI creates REST calls that hit 
the Pulsar REST API. If my understanding of the behavior of the Admin CLI is 
incorrect, then Upsert would not need to be added to the REST API. 
   ### Regarding your Question 2
   I agree that it would be better to use the existing APIs and not modify 
FunctionRuntimeManager and FunctionMetaDataManager, especially because the 
current PR introduces a considerable amount of code duplication. However, I 
didn't want to create new classes to handle the new functionality without 
getting architectural guidance because I wasn't sure where to put the new 
classes. 
   
   
   What are your thoughts? If we can solve the broader deployment problem with 
a YAML-based approach, then we could create a robust solution that simplifies 
the deployment process without needing Upsert to be added to the REST API.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
