[GitHub] [incubator-superset] srinify commented on issue #11048: Looking for suggestions on the right database

GitBox Thu, 24 Sep 2020 13:10:15 -0700


srinify commented on issue #11048:
URL: https://github.com/apache/incubator-superset/issues/11048#issuecomment-698562956

Hi Abhishek, there are a few approaches you can take.

1. I would read more about optimizing MySQL first. I'm definitely not the
expert here, but the first things I'd try are making sure you have
appropriate indexes (https://www.tutorialspoint.com/mysql/mysql-indexes.htm),
sharding where it makes sense
(https://medium.com/pinterest-engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f),
and dedicated compute resources for your MySQL server. Running EXPLAIN on
your slow queries will help you figure out where the bottlenecks are
(https://dev.mysql.com/doc/refman/5.6/en/using-explain.html). There are
people running MySQL at scale with petabytes of data!

2. Try introducing an ETL layer so Superset only queries intermediate
data / aggregates instead of the raw tables. You can stick with MySQL and
use a tool like dbt to build those intermediate tables
(https://www.getdbt.com/).

3. Druid is definitely a good choice if your data has lots of aggregations
and you need it to be real-time:
https://netflixtechblog.com/how-netflix-uses-druid-for-real-time-insights-to-ensure-a-high-quality-experience-19e1e8568d06
If real-time isn't important, then Druid *could* be a little overkill
(remember, each database makes its own set of tradeoffs).
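
To make suggestions 1 and 2 a bit more concrete, here is a minimal sketch.
It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs
anywhere, which means the plan output comes from SQLite's EXPLAIN QUERY PLAN
and not MySQL's EXPLAIN. The table, column, and index names (events,
daily_totals, idx_events_user) are made up for illustration. The idea
carries over: check the plan, add an index, check again, and point the
dashboard at a small pre-aggregated table.

```python
import sqlite3

# SQLite is only a stand-in for MySQL here; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, day TEXT, amount REAL)"
)
rows = [
    (i % 100, f"2020-09-{(i % 30) + 1:02d}", float(i % 7))
    for i in range(3000)
]
conn.executemany(
    "INSERT INTO events (user_id, day, amount) VALUES (?, ?, ?)", rows
)


def query_plan(sql, params=()):
    """Return the human-readable steps of SQLite's plan for a query."""
    cur = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    # The last column of each plan row holds the description text.
    return [row[-1] for row in cur.fetchall()]


lookup = "SELECT COUNT(*) FROM events WHERE user_id = ?"

# Suggestion 1: inspect the plan, add an index, inspect again.
plan_before = query_plan(lookup, (42,))  # full table scan, no index yet
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = query_plan(lookup, (42,))   # should now mention the index

# Suggestion 2: build an intermediate aggregate table (what an ETL job or
# a dbt model would maintain) and let the dashboard query that instead.
conn.execute(
    "CREATE TABLE daily_totals AS "
    "SELECT day, COUNT(*) AS n_events, SUM(amount) AS total_amount "
    "FROM events GROUP BY day"
)
raw_rows = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM daily_totals").fetchone()[0]

print("plan before index:", plan_before)
print("plan after index: ", plan_after)
print("raw rows:", raw_rows, "| aggregate rows:", agg_rows)
```

In MySQL you would run EXPLAIN on the actual slow query instead, and the
aggregate table would be refreshed on a schedule by whatever ETL tool you
pick.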

