devinbost commented on issue #4012: Adding upsert functionality
URL: https://github.com/apache/pulsar/pull/4012#issuecomment-481832662
 
 
   @jerrypeng Thank you for your very detailed response. I appreciate your time 
and attention to this matter. 
   
   Regarding:
   
   > Is there a reason why you can't just submit/update functions via the REST 
endpoints instead of using the pulsar-admin CLI from docker containers? 
Submitting/Updating functions by just making a HTTP REST call will be a lot 
faster . . . 
   
   I appreciate your guidance. Based on advice from @merlimat earlier today, I 
am currently working on an implementation using the REST endpoints.
   
   Regarding:
   > Do you have 300 individual functions or is there a function with 300 
instances or a group of functions that total 300 instances? There will be a 
huge submission time difference depending on which scenario. Submitting one 
function with 300 instances will take much less time than submitting 300 
functions with one instance each.
   
   Currently, all of our functions are individual because they 
represent different use cases. However, we appreciate your advice about the 
performance improvement from deploying one function with multiple instances, 
so we will examine ways to refactor to obtain those benefits. 
   
   Regarding:
   > What do you mean by this? The cluster will be running as it should when 
submitting functions.
   
   I may have been unintentionally misleading, and I apologize for that. Please 
let me clarify. When I said:
   >  Pulsar is in a broken state
   
   I didn't mean that the Pulsar cluster is not running. What I meant is that 
our end-to-end production message pipelines will be in a broken state (i.e., 
our customers will experience problems). 
   Consider a plumbing analogy. If you need to re-route pipes while water is 
flowing and you can't do it extremely quickly, water will end up leaking 
everywhere, and the people who are expecting water at a particular location 
will notice a loss of service. This doesn't mean that the water system is 
completely broken or that water is not flowing; it means that water is 
not reaching our customers. 
   In our case, we have a production data flow that processes tens of 
thousands of messages per second. When we deploy updates to functions that 
are inter-dependent, some of the already-updated functions may introduce 
breaking changes that cause data loss, or that prevent messages from reaching 
the final destination topic, until all of the updated functions are deployed. 
   Does this make more sense?
   
   Regarding:
   > I think the functionality you are looking for is bulk create, update, or 
upserts. You want to bring a cluster from a potentially unknown state into a 
known consistent state in regards to functions. Am I understanding you correctly?
   
   That is exactly right. 
   I think you're right that we likely won't need to update all 300 
functions every time we deploy updates. However, we need to ensure that Pulsar 
can quickly and seamlessly match the expected state when we deploy updates.
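   
   As a rough illustration of "matching the expected state", the sketch below 
compares a desired set of function configs against what is currently deployed 
and returns only the actions that are needed. The name `plan_reconcile` and 
the dict-of-configs shape are assumptions made for this example; they are not 
part of Pulsar or of this PR.

```python
def plan_reconcile(desired, current):
    """Compute the actions needed to move `current` to `desired`.

    Both arguments are dicts mapping a function name to its config
    (a plain dict). This shape is a hypothetical one chosen for the sketch.
    """
    # Functions that are wanted but not deployed yet.
    to_create = sorted(name for name in desired if name not in current)
    # Functions that are deployed but whose config differs from the target.
    to_update = sorted(
        name for name in desired
        if name in current and desired[name] != current[name]
    )
    # Functions that are deployed but no longer wanted.
    to_delete = sorted(name for name in current if name not in desired)
    return {"create": to_create, "update": to_update, "delete": to_delete}


if __name__ == "__main__":
    desired = {"enrich": {"parallelism": 2}, "route": {"parallelism": 1}}
    current = {"enrich": {"parallelism": 1}, "legacy": {"parallelism": 1}}
    print(plan_reconcile(desired, current))
```

   Unchanged functions do not appear in any list, so a deployment only 
touches the functions whose configuration actually differs.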
   
   Regarding:
   > While we can add upserts and even bulk upserts, I would suggest that you 
try just creating/updating functions directly using the REST endpoint first to 
see if that is good enough.
   
   I will investigate your suggestions for implementing these changes for bulk 
actions. 
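   
   A client-side upsert over REST could look roughly like the sketch below: 
check whether the function exists, then update it or create it. This is an 
illustration under assumptions, not the change made in this PR. The path 
prefix `/admin/v3/functions` and the use of GET, PUT, and POST reflect my 
understanding of the Pulsar admin API and should be verified against the 
Pulsar version in use. The `send` callable is a hypothetical transport hook; 
a real one would also need to upload the function package and config in the 
format the endpoint expects.

```python
from urllib.parse import quote

# Assumed path prefix for the functions admin API; verify for your version.
ADMIN_PATH = "/admin/v3/functions"


def function_url(base_url, tenant, namespace, name):
    """Build the per-function admin URL from its three identifiers."""
    parts = [quote(part, safe="") for part in (tenant, namespace, name)]
    return base_url.rstrip("/") + ADMIN_PATH + "/" + "/".join(parts)


def upsert_function(send, base_url, tenant, namespace, name, config):
    """Update the function if it exists, otherwise create it.

    `send(method, url, config)` is a hypothetical callable that performs
    the HTTP request and returns the status code as an int.
    Returns the method used for the write ("PUT" or "POST").
    """
    url = function_url(base_url, tenant, namespace, name)
    status = send("GET", url, None)
    if status == 200:
        method = "PUT"   # function exists: update it
    elif status == 404:
        method = "POST"  # function is missing: create it
    else:
        raise RuntimeError(f"unexpected status {status} for GET {url}")
    result = send(method, url, config)
    if not 200 <= result < 300:
        raise RuntimeError(f"{method} {url} failed with status {result}")
    return method
```

   Because the existence check and the write are two separate requests, this 
approach is not atomic; a server-side upsert would avoid that gap.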
   
   Thank you also for the guidance and example change to ComponentImpl.java for 
the Upsert functionality for this PR. 
   
   
   
   

