Emitswang commented on issue #9628:
URL: https://github.com/apache/shardingsphere/issues/9628#issuecomment-796641932


   Hi @tristaZero 
   
   I'm glad that my suggestion has been accepted. For the first point, let me 
briefly explain the background.
   
   Previously, we implemented the sharding rules through a MyBatis interceptor at the business-application level. That is to say, the SQL that currently reaches the proxy layer is in the form of `select * from sharding_13_25_db.t_sharding_table_1 where uid = 13000251`.
   
   At the current stage, we want to take over all SQL requests with sharding-proxy. Without changing the business code, I need to create a config file for every corresponding DB so that sharding-proxy can take over all the SQL requests smoothly.
   
   For example, if I only configure a sharding rule config file for `schemaName: sharding_db` and the business application sends the SQL request `select * from sharding_13_25_db.t_sharding_table_1 where uid = 13000251`, an exception will occur: `ERROR 1049 (42000): Unknown database`.
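   To make the scenario concrete, this is roughly the kind of per-schema config file we would have to repeat for every physical database (a minimal sketch; the data source name, URL, and credentials are illustrative, and the exact keys may vary by proxy version):

   ```yaml
   # config-sharding-13-25.yaml -- one such file per physical schema
   schemaName: sharding_13_25_db
   dataSources:
     ds_13_25:
       url: jdbc:mysql://127.0.0.1:3306/sharding_13_25_db?serverTimezone=UTC
       username: root
       password: root
   ```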
   
   Therefore, we need a large number of configuration files so that the proxy can accept all SQL requests normally. That means 2000 dataSources to configure, even though in fact we only want to spread them across 20 data source instances. So we don't use the sharding function provided by the proxy; we just do read-write separation.
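   The rule we actually want is only read-write separation, something along these lines (a sketch only; the rule name and keys shown here follow the 5.x style and may differ in other proxy versions, and all host names are placeholders):

   ```yaml
   schemaName: sharding_13_25_db
   dataSources:
     primary_ds:
       url: jdbc:mysql://primary-host:3306/sharding_13_25_db?serverTimezone=UTC
     replica_ds_0:
       url: jdbc:mysql://replica-host:3306/sharding_13_25_db?serverTimezone=UTC
   rules:
   - !READWRITE_SPLITTING
     dataSources:
       rw_ds:
         writeDataSourceName: primary_ds
         readDataSourceNames:
           - replica_ds_0
   ```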
   
   You may ask why the business applications don't just modify the SQL to `select * from sharding_db.t_sharding where uid = 13000251`. This is mainly a matter of the cost of changing the business, which chiefly involves the following:
   
       1. At present, some SQL in the business cannot derive the sharding rule from the SQL itself. The application must first run association queries against a dictionary mapping table and then rewrite the sharding result into the SQL.
       2. The business has table-scanning logic that traverses by table name, scanning data one table at a time from `00_db.t_0` to `99_db.t_9`.
   
   Another risk I am worried about is that 2000 table configurations will produce some large objects. Will this cause runtime exceptions, such as frequent full GCs or OOM, or even affect proxy performance?
   
   In theory, parallelization can speed up the `build` process, and I'm preparing to make the relevant modifications to verify it. If possible, I'd like to become a contributor to the project. However, as I am new to the project, the whole process of submitting a PR is not very clear to me, so I need to learn it first, for example:
   
       1. Should I evaluate the scope of the change myself, or implement it only after your evaluation and confirmation?
       2. If the code I submit does not meet the project's requirements, will the PR wait until I finish revising it?
       3. Is it necessary to add new test cases for the parallelization, or is running the original test cases enough?
       4. Is there a deadline requirement?
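   For reference, the parallelization I have in mind for the `build` step looks roughly like this (a minimal sketch; the class and method names are hypothetical stand-ins, not the real ShardingSphere API):

   ```java
   import java.util.List;
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.stream.Collectors;
   import java.util.stream.IntStream;

   // Hypothetical sketch of parallelizing the per-schema build step.
   public final class ParallelSchemaBuild {

       // Stand-in for the real (expensive) metadata build of one schema.
       static String buildSchemaMetaData(String schemaName) {
           return "metadata:" + schemaName;
       }

       // Build all schemas on the common ForkJoinPool instead of a serial
       // loop; safe only because each schema's build is independent.
       public static Map<String, String> buildAll(List<String> schemaNames) {
           Map<String, String> result = new ConcurrentHashMap<>();
           schemaNames.parallelStream()
                   .forEach(name -> result.put(name, buildSchemaMetaData(name)));
           return result;
       }

       public static void main(String[] args) {
           List<String> names = IntStream.range(0, 2000)
                   .mapToObj(i -> String.format("sharding_%02d_db", i))
                   .collect(Collectors.toList());
           System.out.println(buildAll(names).size()); // prints 2000
       }
   }
   ```

   The key assumption is that building one schema's metadata does not depend on any other schema's; if the real `build` has shared mutable state, it would need to be made thread-safe first.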
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

