jerrypeng commented on issue #4012: Adding upsert functionality
URL: https://github.com/apache/pulsar/pull/4012#issuecomment-481809137
 
 
   @devinbost thanks for sharing your use case and the hurdles you are trying 
to overcome!
   
   In regards to:
   
   > Because our pulsar-admin commands are dockerized to allow them to operate 
at scale, there is performance overhead with every pulsar-admin command that 
must be executed.
   
   Is there a reason why you can't just submit/update functions via the REST 
endpoints instead of using the pulsar-admin CLI from docker containers?  
Submitting/updating functions with a plain HTTP REST call will be a lot 
faster than starting up a docker container every time to execute commands via 
the command line.
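
   A rough sketch of what a direct REST submission could look like from Java 
11+, without sending anything.  The path layout 
(`/admin/v3/functions/{tenant}/{namespace}/{functionName}`), the multipart 
part names (`url`, `functionConfig`), the base URL, and the sample values are 
assumptions for illustration; please check them against the admin API of the 
Pulsar version you run.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Illustrative only: builds (but does not send) a create/update request.
final class FunctionRestSketch {
    // Arbitrary multipart boundary; any token that does not occur in the payload works.
    private static final String BOUNDARY = "pulsar-fn-sketch-boundary";

    static HttpRequest buildRequest(String adminBaseUrl, String tenant, String namespace,
                                    String functionName, String functionConfigJson,
                                    String packageUrl, boolean update) {
        // Assumed path layout; names are used as-is, so they must be URL-safe.
        URI uri = URI.create(adminBaseUrl + "/admin/v3/functions/"
                + tenant + "/" + namespace + "/" + functionName);

        // Assumed part names: "url" points at the package, "functionConfig" carries JSON.
        String body = textPart("url", null, packageUrl)
                + textPart("functionConfig", "application/json", functionConfigJson)
                + "--" + BOUNDARY + "--\r\n";

        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "multipart/form-data; boundary=" + BOUNDARY)
                // POST to create, PUT to update.
                .method(update ? "PUT" : "POST",
                        HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    // Formats one text part of a multipart/form-data body.
    private static String textPart(String name, String contentType, String value) {
        StringBuilder part = new StringBuilder();
        part.append("--").append(BOUNDARY).append("\r\n")
            .append("Content-Disposition: form-data; name=\"").append(name).append("\"\r\n");
        if (contentType != null) {
            part.append("Content-Type: ").append(contentType).append("\r\n");
        }
        return part.append("\r\n").append(value).append("\r\n").toString();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("http://localhost:8080", "public", "default",
                "my-fn", "{\"parallelism\": 1}", "http://example.com/my-fn.jar", true);
        System.out.println(request.method() + " " + request.uri());
        // To submit for real, pass the request to java.net.http.HttpClient#send.
    }
}
```

   A single long-lived client reusing one connection avoids the container 
start-up cost that each dockerized pulsar-admin invocation pays.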
   
   > In a deployment with 300 Pulsar functions, if each pulsar-admin command 
must be executed in series (rather than in parallel), executing 300 
pulsar-admin commands to update these objects takes 15-25 minutes.
   
   Do you have 300 individual functions, or is there a function with 300 
instances, or a group of functions that total 300 instances?  The submission 
time will differ hugely depending on the scenario.  Submitting one 
function with 300 instances will take much less time than submitting 300 
functions with one instance each.
   
   > Because Pulsar is in a broken state while these commands are being executed
   
   What do you mean by this?  The cluster will be running as it should when 
submitting functions.
   
   > this deployment approach could result in a production Pulsar environment 
being down for 15-25 minutes, far beyond our SLA of 300 milliseconds of 
downtime.
   
   If your whole Pulsar cluster somehow goes down and all your functions 
disappear, it is unrealistic to expect the downtime to be less than 
300 milliseconds.  As you probably already know, starting up a Pulsar cluster, 
with or without functions, takes longer than that.  If you are just talking 
about resubmitting 300 functions, I am not sure it's realistic to expect all 
the JARs/packages for 300 functions to be uploaded in 300 milliseconds.  If you 
are trying to avoid downtime after a catastrophic event in your cluster, I 
would recommend redundancy: run geo-replicated clusters across multiple 
regions so you can seamlessly cut traffic over from the downed cluster to 
another cluster.
   
   If you have 300 functions, I don't think it's going to be the norm for you 
to need to update all 300 of them.  It's more likely that you will be updating 
a subset.
   
   I think the functionality you are looking for is bulk create, update, or 
upsert.  You want to bring a cluster from a potentially unknown state into a 
known, consistent state with regard to functions.  Am I understanding you 
correctly?
   
   While we can add upserts and even bulk upserts, I would suggest you first 
try creating/updating functions directly through the REST endpoint to see 
if that is good enough.
   
   I would still very much like to see features like bulk create/update/upsert 
in Pulsar functions.  I do believe we can accomplish them by just 
adding/modifying the "front end" code, i.e. the REST endpoints and 
ComponentImpl.java, to implement the bulk actions.  Please reference the code 
in registerFunction and updateFunction; we can probably just run that in a 
loop for bulk actions.
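
   To make the "run it in a loop" idea concrete, here is a minimal sketch.  
`BulkActionSketch`, `applyToAll`, and the `Consumer` standing in for 
registerFunction/updateFunction are hypothetical names, not existing Pulsar 
code.  The one design choice it shows is collecting per-function failures 
instead of aborting the whole batch on the first error.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper; the action stands in for a register/update call.
final class BulkActionSketch {

    // Applies the action to each function name in order.
    // Returns a map of function name -> error message for the ones that failed.
    static Map<String, String> applyToAll(List<String> functionNames, Consumer<String> action) {
        Map<String, String> failures = new LinkedHashMap<>();
        for (String name : functionNames) {
            try {
                action.accept(name);
            } catch (RuntimeException e) {
                // Record the failure and keep going with the rest of the batch.
                failures.put(name, String.valueOf(e.getMessage()));
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> updated = new ArrayList<>();
        Map<String, String> failures = applyToAll(List.of("fn-a", "fn-b", "fn-c"), name -> {
            if (name.equals("fn-b")) {
                throw new IllegalArgumentException("invalid config");
            }
            updated.add(name);
        });
        System.out.println("updated=" + updated + " failures=" + failures);
    }
}
```

   A bulk REST endpoint could return the failure map to the caller so that 
only the failed functions need to be retried.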
   
   With regard to this PR and implementing upserts, I think you can just do 
something like the following in ComponentImpl.java:
   
   ```
   // Update the function if it is already registered; otherwise register it.
   if (functionMetaDataManager.containsFunction(tenant, namespace, functionName)) {
      updateFunction(...);
   } else {
      registerFunction(...);
   }
   ```
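
   For anyone who wants to play with the branching outside of Pulsar, below is 
a self-contained sketch of the same idea.  The in-memory map is a stand-in for 
the function metadata manager, and the method signatures and `Outcome` enum 
are made up for illustration; the real registerFunction and updateFunction 
take different arguments.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the real component; not Pulsar's actual classes or signatures.
final class UpsertSketch {
    enum Outcome { REGISTERED, UPDATED }

    // Keyed by "tenant/namespace/functionName"; the value stands in for the function config.
    private final Map<String, String> functions = new ConcurrentHashMap<>();

    boolean containsFunction(String tenant, String namespace, String functionName) {
        return functions.containsKey(key(tenant, namespace, functionName));
    }

    void registerFunction(String tenant, String namespace, String functionName, String config) {
        if (functions.putIfAbsent(key(tenant, namespace, functionName), config) != null) {
            throw new IllegalStateException("function already exists");
        }
    }

    void updateFunction(String tenant, String namespace, String functionName, String config) {
        if (functions.replace(key(tenant, namespace, functionName), config) == null) {
            throw new IllegalStateException("function does not exist");
        }
    }

    // Upsert: update when present, register otherwise.
    // synchronized so two concurrent upserts cannot both take the register branch.
    synchronized Outcome upsertFunction(String tenant, String namespace,
                                        String functionName, String config) {
        if (containsFunction(tenant, namespace, functionName)) {
            updateFunction(tenant, namespace, functionName, config);
            return Outcome.UPDATED;
        }
        registerFunction(tenant, namespace, functionName, config);
        return Outcome.REGISTERED;
    }

    private static String key(String tenant, String namespace, String functionName) {
        return tenant + "/" + namespace + "/" + functionName;
    }

    public static void main(String[] args) {
        UpsertSketch component = new UpsertSketch();
        System.out.println(component.upsertFunction("public", "default", "my-fn", "v1"));
        System.out.println(component.upsertFunction("public", "default", "my-fn", "v2"));
    }
}
```

   The check-then-act sequence is the part to watch in a real implementation: 
if another request registers or deletes the function between the check and the 
call, the chosen branch can fail, so the caller needs either a lock like the 
one above or a retry.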
   
   
   
   
   
