srinify commented on issue #11048: URL: https://github.com/apache/incubator-superset/issues/11048#issuecomment-698562956
Hi Abhishek, there are a few approaches you can take.

1. I would read more about optimizing MySQL first. I'm definitely not the expert here, but making sure you have appropriate indexes (https://www.tutorialspoint.com/mysql/mysql-indexes.htm), shards (https://medium.com/pinterest-engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f), and dedicated compute resources for your MySQL server is the first thing I'd try. Using EXPLAIN will help you figure out where the bottlenecks are (https://dev.mysql.com/doc/refman/5.6/en/using-explain.html). There are people running MySQL at scale with petabytes of data!

2. Try introducing an ETL layer so Superset is only querying intermediate data / aggregates. You can stick with MySQL and use a tool like dbt to introduce some intermediate tables (https://www.getdbt.com/).

3. Druid is definitely a good choice if your data has lots of aggregations and you need it to be real-time: https://netflixtechblog.com/how-netflix-uses-druid-for-real-time-insights-to-ensure-a-high-quality-experience-19e1e8568d06
   If real-time isn't important, then Druid *could* be a little overkill (remember, each database makes its own set of tradeoffs).
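To make approach 2 a bit more concrete, here is a minimal sketch of pre-aggregating raw rows into an intermediate table that a dashboard could query instead of the raw data. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere, and the table names (`raw_events`, `daily_summary`), the columns, and the `build_daily_summary` helper are all hypothetical. In a real setup the aggregation would run on a schedule, or be managed by a tool such as dbt.

```python
import sqlite3


def build_daily_summary(conn):
    """Rebuild a small aggregate table from the (hypothetical) raw_events table."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS daily_summary;
        CREATE TABLE daily_summary AS
        SELECT event_date,
               COUNT(*)    AS event_count,
               SUM(amount) AS total_amount
        FROM raw_events
        GROUP BY event_date;
        """
    )


# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (event_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("2020-09-01", 10.0), ("2020-09-01", 5.0), ("2020-09-02", 7.5)],
)

build_daily_summary(conn)

# A dashboard would read from the small summary table, not from raw_events.
rows = conn.execute(
    "SELECT event_date, event_count, total_amount "
    "FROM daily_summary ORDER BY event_date"
).fetchall()
print(rows)
```

The point of the design is that the expensive GROUP BY runs once per refresh rather than once per dashboard load, so the charting layer only ever scans one row per day.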
