srinify commented on issue #11048:
URL: 
https://github.com/apache/incubator-superset/issues/11048#issuecomment-698562956


   Hi Abhishek, there are a few approaches you can take.
   
   1. I would read more about optimizing MySQL first. I'm definitely not the 
expert here, but making sure you have appropriate indexes 
(https://www.tutorialspoint.com/mysql/mysql-indexes.htm), shards 
(https://medium.com/pinterest-engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f),
 and dedicated compute resources for your MySQL server is the first thing I'd 
try. Using EXPLAIN will help you figure out where the bottlenecks are 
(https://dev.mysql.com/doc/refman/5.6/en/using-explain.html). There are people 
running MySQL at scale with petabytes of data!
   
   2. Try introducing an ETL layer so Superset is only querying intermediate 
data / aggregates. You can stick with MySQL and use a tool like dbt to 
build some intermediate tables (https://www.getdbt.com/).
   
   3. Druid is definitely a good choice if your data has lots of aggregations 
and you need it to be real-time: 
https://netflixtechblog.com/how-netflix-uses-druid-for-real-time-insights-to-ensure-a-high-quality-experience-19e1e8568d06
 If real-time isn't important, then Druid *could* be a little overkill 
(remember, each database makes its own set of tradeoffs).
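
   To make approaches 1 and 2 a bit more concrete, here is a minimal sketch. 
It uses Python's built-in sqlite3 module as a stand-in so it runs anywhere; 
it is not MySQL, and SQLite's EXPLAIN QUERY PLAN is only a rough analogue of 
MySQL's EXPLAIN. The table, column, and index names (events, daily_summary, 
idx_events_country) are made up for illustration and are not from your setup.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "id INTEGER PRIMARY KEY, country TEXT, day TEXT, amount REAL)"
    )
    rows = [
        ("US", "2020-09-01", 10.0),
        ("US", "2020-09-01", 5.0),
        ("IN", "2020-09-01", 7.5),
        ("IN", "2020-09-02", 2.5),
    ]
    conn.executemany(
        "INSERT INTO events (country, day, amount) VALUES (?, ?, ?)", rows
    )
    return conn


def query_plan(conn, sql):
    """Return the planner's description of how it would run the query.

    SQLite stand-in for MySQL's `EXPLAIN SELECT ...`; the human-readable
    detail is the fourth column of each returned row.
    """
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


def add_country_index(conn):
    """Approach 1: index the column the dashboard filters on."""
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_events_country ON events (country)"
    )


def build_daily_summary(conn):
    """Approach 2: materialize aggregates so dashboards skip the raw table.

    A tool like dbt would manage this as a model; here it is one plain
    CREATE TABLE ... AS SELECT statement.
    """
    conn.execute("DROP TABLE IF EXISTS daily_summary")
    conn.execute(
        "CREATE TABLE daily_summary AS "
        "SELECT country, day, SUM(amount) AS total, COUNT(*) AS n "
        "FROM events GROUP BY country, day"
    )


if __name__ == "__main__":
    conn = build_demo_db()
    sql = "SELECT SUM(amount) FROM events WHERE country = 'US'"

    print("before index:", query_plan(conn, sql))  # expect a full table scan
    add_country_index(conn)
    print("after index: ", query_plan(conn, sql))  # expect an index search

    build_daily_summary(conn)
    for row in conn.execute(
        "SELECT country, day, total, n FROM daily_summary ORDER BY country, day"
    ):
        print(row)
```

On MySQL the equivalent check would be running EXPLAIN on the slow query 
before and after adding the index, and then pointing the Superset dataset at 
the summary table instead of the raw one.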




