[google-appengine] Re: Synchronizing between different Cloud Sql tables with GooglePubSub or TaskQueue

Roxana Ioana Roman Tue, 30 May 2017 15:12:06 -0700

Thank you very much for your answers!
Yes, the database is on Cloud SQL and at the moment both services share it.
I want to split this database for two reasons. First, it serves several
hundred queries per second, so it is under some stress. Second, in a
micro-services architecture (which I want to achieve at some point),
separate services should have their own private data, accessed only
through APIs, so each needs its own database. Since the events sent
between services won't be complex, would PubSub better suit my needs?
There will be a large number of subscribers, since I will have several
instances of each service running at the same time (so many publishers
and many subscribers, I would say).


On Tuesday, 30 May 2017 23:24:43 UTC+3, Yannick (Cloud Platform Support) 
wrote:
>
> Hey Roxana, adding onto Alexey’s excellent answer: I understand that the 
> database currently shared by your services is on Cloud SQL and that both of 
> your services can currently access it independently. Could you please 
> expand on your need for either restricting access to a single service or 
> replicating the data across services and why the current situation isn’t 
> desirable? This should help determine which solution is most appropriate.
>  
> You also asked a couple of other questions which I will attempt to answer 
> here:
> 1)  Can Cloud SQL Read Replicas be configured to ignore tables?
>      At this time Cloud SQL Read Replicas cannot be configured to ignore 
> tables, though there is a MySQL flag that can be used when configuring 
> external replicas.
>  
> 2)  If I decide to synchronise the data between both databases, should I 
> use PubSub or Task Queues?
>      It really comes down to the details of your specific use case. Task 
> Queues is more closely integrated with App Engine and while it's perfectly 
> suitable for sending messages between services it is designed for executing 
> complex long-running tasks. Cloud Pub/Sub on the other hand is a networked 
> messaging service designed for broadcasting publications to a large number 
> of subscribers. You could also communicate between your services by using 
> UrlFetch but then you’d have to handle the retry mechanism yourself.
>  
> 3)  Can PubSub be modified to maintain order, somehow insert ordering 
> information in the message payload to contain previous messages sent?
>      The Cloud Pub/Sub documentation has an article that explains a few 
> ways to handle message ordering. It is possible, but strict message ordering 
> comes at the expense of performance and throughput.
>  
> 4)  How do I handle retries in PubSub?
>      Cloud Pub/Sub offers an "at-least-once delivery" guarantee of each 
> message you publish to each subscriber and automatically retries delivery 
> until the subscriber acknowledges the message.
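
A minimal sketch of how answers 3) and 4) could combine on the subscriber
side: at-least-once delivery means the same message may arrive twice, and
without ordering guarantees it may arrive early. The sketch assumes the
publisher attaches a gap-free, increasing `seq` number (starting at 1) to
every message. That field, the `OrderedApplier` class and the payload shape
are hypothetical names for illustration and are not part of the Pub/Sub API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class OrderedApplier:
    """Applies each message exactly once, in `seq` order.

    Assumption: `seq` is assigned by the publisher, starts at 1 and has
    no gaps. Pub/Sub does not add such a field for you.
    """
    apply: Callable[[dict], None]      # e.g. runs the CRUD statement
    next_seq: int = 1                  # next sequence number to apply
    pending: Dict[int, dict] = field(default_factory=dict)  # early arrivals

    def receive(self, seq: int, payload: dict) -> bool:
        """Handle one delivery; True means the caller may acknowledge it."""
        if seq < self.next_seq or seq in self.pending:
            return True                # duplicate redelivery: ack and skip
        self.pending[seq] = payload    # buffer until its turn comes
        while self.next_seq in self.pending:
            self.apply(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return True


# Usage: message 2 arrives first, then 1, then 2 is redelivered.
applied = []
applier = OrderedApplier(apply=applied.append)
applier.receive(2, {"op": "update", "id": 7})
applier.receive(1, {"op": "insert", "id": 7})
applier.receive(2, {"op": "update", "id": 7})
```

In a real service `next_seq` would presumably have to be stored in the same
database transaction as the table change, so that a crash between applying
and acknowledging does not apply a message twice.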
>
> On Tuesday, May 30, 2017 at 2:34:41 PM UTC-4, Alexey wrote:
>>
>> Roxana,
>>
>> It may be useful to look closer at the nature of your data.  You outlined 
>> 2 basic strategies: 
>>
>> 1. Replicate the data between 2 services, with one replica of that data 
>> being the one written to, which you would presumably consider the source 
>> of truth.  You have 2 variations on this theme: you would rely either on 
>> database replication or on some messaging system to do the job.
>> 2. Allow one service to call the other service for the data it needs.  
>> Your suggestion to cache the data is good, but I would propose that you 
>> cache the data on the side that's serving HTTP responses.  Let the 
>> dependent service always call the source service.  This will greatly 
>> simplify your cache management strategy and you'd be able to create other 
>> similar consumers for this system without maintaining different cache 
>> policies.
>>
>> Each of these 2 ideas is workable and each has its trade-offs.  I would 
>> say having a single replica of the data is arguably simpler and allows you 
>> to totally bypass the question of DB syncing with the downside of a 
>> service-to-service dependency.  I would say option 2 is an easier first 
>> step and option 1 is a good thing to do if you find you have a performance 
>> problem and you want to make the data queries with their required joins 
>> faster.
>>
>> Depending on the nature of your data and your queries, there may be a 
>> third option for you to consider.  Something in between these 2 
>> strategies.  Conflict-free Replicated Data Types (
>> https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type) 
>> provide some new options in situations where it becomes possible to pass 
>> messages across a distributed system in order to keep data eventually 
>> consistent without relying on message order being preserved.  It requires 
>> special care in designing your data model, but can be a good alternative to 
>> service-to-service calls and tight DB coupling.  Hope this helps,
>>
>> Alexey
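
As an illustration of the CRDT idea above, here is a minimal sketch of one
of the simplest such types: a last-writer-wins register per row. It assumes
every change carries a version of the form (logical timestamp, writer id);
that versioning scheme and the `LWWMap` name are hypothetical and would have
to be designed into the data model. Because a replica keeps only the highest
version it has seen for each key, updates can arrive in any order, or more
than once, and the replicas still converge.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical version scheme: (logical timestamp, writer id).
# Tuples compare element by element, so the writer id breaks ties.
Version = Tuple[int, str]


class LWWMap:
    """Map of last-writer-wins registers, one per row key."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[Version, Any]] = {}

    def apply(self, key: str, version: Version, value: Any) -> None:
        """Keep the update only if it is newer than what is stored."""
        current = self.rows.get(key)
        if current is None or version > current[0]:
            self.rows[key] = (version, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self.rows.get(key)
        return None if entry is None else entry[1]


# Usage: two replicas see the same updates in different orders,
# and the second one also receives a duplicate.
updates = [
    ("user:1", (1, "A"), {"name": "Ann"}),
    ("user:1", (2, "A"), {"name": "Anna"}),
    ("user:2", (1, "A"), {"name": "Bob"}),
]
r1, r2 = LWWMap(), LWWMap()
for u in updates:
    r1.apply(*u)
for u in reversed(updates + updates[:1]):
    r2.apply(*u)
```

Both replicas end up with the same rows. A delete could be modelled as an
update whose value is a tombstone marker, at the cost of keeping that marker
around.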
>>
>> On Tuesday, May 30, 2017 at 9:44:38 AM UTC-4, Roxana Ioana Roman wrote:
>>>
>>> Moving to a micro-service architecture, I have separated an AppEngine 
>>> module into two services (the big one and a smaller one, still part of the 
>>> same AppEngine project).
>>>   
>>> Next, I want to separate the database into two, allowing the small 
>>> micro-service to have only the needed tables in its own db.  The problem is 
>>> one of the tables is needed by both micro-services. The smaller service 
>>> only reads data from this table (has multiple select queries which join on 
>>> data from this table)
>>>
>>> Option 1:
>>> I can leave this table in the bigger service which uses it the most and 
>>> make the smaller service make an http request to get the data that it 
>>> needs, cache this data and be notified when this data changes by the owner 
>>> service of that table. Then, to refresh its cache, the smaller service 
>>> makes an http request again.
>>>
>>> To communicate from service A to service B, I was deciding between 
>>> Google PubSub and Task Queue, and I am not sure which one to use here. 
>>> In this case, receiving the messages in order is not important, since 
>>> the message will be a generic "table_state_changed, query for new 
>>> data", so either could be used.
>>>
>>> Option 2:
>>> I can duplicate the table in both databases (this will allow the service 
>>> to have all the necessary data closer).  When the bigger service modifies 
>>> data in the table, it will notify the smaller service to perform the same 
>>> modification on its version of the table. In this case, the order of the 
>>> messages is important, since each message specifies an exact CRUD 
>>> operation to perform on the table.
>>>
>>> Can PubSub be modified to maintain order, somehow insert ordering 
>>> information in the message payload to contain previous messages sent? 
>>> Retries are also important, we don't want to end up having 
>>> inconsistencies between the two tables.
>>>
>>> Option 3:
>>> Is there a way to create Read Replicas with ignored tables in AppEngine 
>>> (so that I include only the tables I need in one of the services and leave 
>>> the other one with the entire db as it is currently) and set a specific 
>>> service to only use that replica? This does not sound like a good idea; 
>>> however, it leaves Cloud SQL the burden of maintaining the same data in 
>>> both versions of the table.
>>>
>>> Which option is better in your opinion and, most importantly, which is 
>>> better suited for this case: PubSub or Task Queues?
>>> Also, this is only the first step to separate the monolith, there will 
>>> be other services in the future, that would probably encounter the same 
>>> problem.
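
The Option 1 flow described above (cache the data, be notified when it
changes, fetch again) can be sketched roughly as follows. This is a
single-threaded illustration under stated assumptions: `InvalidatingCache`,
`on_table_changed` and `fetch_rows` are hypothetical names, and the fetch
function stands in for the HTTP request to the owner service. Since the
notification only marks the cache stale, duplicate or reordered
"table_state_changed" messages are harmless.

```python
from typing import Any, Callable


class InvalidatingCache:
    """Caches one fetched value until a change notification arrives."""

    def __init__(self, fetch: Callable[[], Any]) -> None:
        self._fetch = fetch        # stands in for the HTTP call to the owner
        self._value: Any = None
        self._valid = False

    def on_table_changed(self) -> None:
        """Handler for the generic "table_state_changed" message."""
        self._valid = False        # idempotent: repeating it changes nothing

    def get(self) -> Any:
        """Return the cached data, refetching only if it was invalidated."""
        if not self._valid:
            self._value = self._fetch()
            self._valid = True
        return self._value


# Usage: count how often the owner service is actually called.
calls = []


def fetch_rows():
    calls.append(1)
    return ["row-%d" % len(calls)]


cache = InvalidatingCache(fetch_rows)
first = cache.get()
again = cache.get()                # served from cache, no second fetch
cache.on_table_changed()
cache.on_table_changed()           # duplicate notification
refreshed = cache.get()            # exactly one refetch
```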
>>>
>>
