Neeraj, Thanks for that, it is useful info.
The instance has a very skewed balance between data elements/catcombos and indicators, but what else is new - outside of South Africa, most countries have the same: they collect a huge number or data elements but turn very few of them into indicators (SA collect very very few data elements that are NOT part of any indicator). But that is a separate discussion. I would NOT expect the number of data elements & catcombos + the number of orgunits to have a significant impact on analytics performance, but that needs to be verified. I WOULD expect the number of indicators to have a significant impact, so if you doubled or quadrupled the number of indicators I would expect a significant impact on analytics time Using your and my numbers as a baseline, I would envision the following tests to be revealing: Baseline system 1: SSD, 4-core CPU: 100 mill values, 400 DEs, 250 indicators, 30,000 OUs, few catcombos -> 1 hour for analytics processing Baseline system 2: SSD, 12-core CPU: 500 mill values, 3,500 DEs, 15,000 OUs, 25 catcombos -> ~10 hours for analytics processing Test 1: Using system 1, reduce OUs to 15,000 - run analytics (hypothesis: no significant difference) Test 2: Using system 1, reduce indicators to 125 - run analytics, then to 0 - run analytics (hypothesis: analytics reduced to 45 min) Test 3: Using system 1, increase datavalues from 100mill to 500 mill by introducing additional attributecombos (easy) - run analytics (hypothesis: ~15 hours for analytics processing, with 4-cores instead of 12-cores) Test 4: Using system 1, increase datavalues to 1 billion - run analytics (hypothesis: 35 hours for analytics processing). Neeraj, I don't know if you have a similar sandbox server available for testing, but if you do: Test 5: Using system 2, increase number of indicators to 600 (just export the 200 you have, modify the names and uids a bit, and re-import - for the purpose of this test, it does not matter that many of the indicators have identical formulas). Hypothesis: Analytics time up at least 20% Test 6: using system 2 (with 200 indicators), shift half of the 500 mill data values forward or backward in time resulting in twice the number of analytics tables at half the size. (hypothesis: slight reduction in processing time) Test 7: Using system 2 (with 200 indicators), increase datavalues to 1 billion using a new attributecombo - run analytics (hypothesis: analytics time up to 25 hours again). As far as I've seen, the bulk of analytics processing time is taken up by indexing - which I would expect to follow a moderately quadratic curve in terms of analytics table sizes. Best regards Calle On 24 October 2016 at 10:39, Neeraj Gupta <neeraj.h...@gmail.com> wrote: > Calle, > > We have around 500 million record in database with 3666 data elements > having 26 category combinations and 201 indicators and there are 14398 > organisation units. > > Thanks, > Neeraj > > On Mon, Oct 24, 2016 at 1:50 PM, Calle Hedberg <calle.hedb...@gmail.com> > wrote: > >> Neeraj, >> >> You never stated the number of records you have in the datavalue table - >> what is it? >> >> In the same context: anybody have a rough idea of how many datavalue >> records there are in the global DATIM database - which I think currently >> might be the largest DHIS2 instance around? >> >> Given our own recent work on performance + what Neeraj has reported, I've >> been thinking of creating one test instance with let us say 500 mill >> datavalue records and another with let us say 1 billion, then use them to >> identify key bottlenecks in various processes AND use them to ensure that >> DHIS2 analytics performance is as linear as possible in terms of database >> size. Postgresql has introduced a number of new indexing algorithms in >> recent versions, and I'm not sure if DHIS2 is taking full advantage of them. >> >> Best regards >> Calle >> >> On 24 October 2016 at 07:53, Brajesh Murari <brajesh.mur...@gmail.com> >> wrote: >> >>> Congratulation Neeraj and team ...it much appreciated >>> >>> On Mon, Oct 24, 2016 at 11:08 AM, Neeraj Gupta <neeraj.h...@gmail.com> >>> wrote: >>> >>>> Dear Team, Thanks for all your suggestions. >>>> >>>> Now the time of analytic is reduced to 10 hours 41 minutes. >>>> >>>> We tried to VACUUM as Sam suggested but it didn't help then we upgraded >>>> postgres from 9.4 to 9.5.4 and as Calle and Bob suggested we made some >>>> changes in configuration file of postgres and it reduced the time. But the >>>> database size is still same. >>>> >>>> Thanks for all your help! >>>> >>>> Thanks, >>>> Neeraj >>>> >>>> On Wed, Oct 19, 2016 at 6:46 PM, Calle Hedberg <calle.hedb...@gmail.com >>>> > wrote: >>>> >>>>> Neeraj, >>>>> >>>>> It's always an element of uncertainty linked to database sizes - ref >>>>> Sam's post over. So indicating the number of records you have in the >>>>> datavalue table & key meta-data tables would be useful + indicating >>>>> whether >>>>> you are running other instances on the same server. Some comments - I've >>>>> been doing a lot of similar optimising work recently: >>>>> >>>>> 1. Upgrading to 9.5.4 is strongly recommended (and don't use 9.6 >>>>> before the worst bugs are fixed and it has stabilised). >>>>> >>>>> 2. Carefully check your postgres.conf against the recommended >>>>> settings. The guide is a bit superficial in the sense that it has >>>>> recommended "fixed" values only and no explanations around ranges below or >>>>> above those, but you can experiment a bit yourself (e.g. the recommended >>>>> "max_connections = 200" might not be sufficient for a really large system >>>>> like what you have. >>>>> >>>>> 3. If your server is running that single instance only, then 48GB or >>>>> RAM should be sufficient. Our servers are all having 128GB RAM so we >>>>> experimented quite a bit earlier this year with giving a DHIS2 instance >>>>> large amounts or RAM (up to 60-70gb), with negligible impact on >>>>> performance. According to Lars, the DHIS2 cannot really utilize more than >>>>> around 16gb RAM (at least that is how I understood his communication at >>>>> the >>>>> time). So 48GB should be sufficient for a single instance. >>>>> >>>>> 4. I've been doing performance optimizing recently on an instance with >>>>> - 4-core server with 2x 512gb ssd, 12gb allocated to DHIS2 >>>>> - 31,000 Orgunits >>>>> - 420 data elements >>>>> - 250 indicators >>>>> - around 100 mill datavalue records >>>>> - total size around 140gb with analytics tables. >>>>> >>>>> So the size is only 25% of your 500GB, but RUNNING ANALYTICS ON THAT >>>>> DATABASE INSTANCE IS TAKING JUST OVER 1 HOUR. Fundamentally, if the >>>>> analytics engine is designed well, I would expect a nearly linear >>>>> relationship between database size and the time analytics takes to run. So >>>>> running analytics on your database on our server should in theory take 4-5 >>>>> hours. >>>>> >>>>> We are obviously comparing oranges and nectarines here, in the sense >>>>> that there might be other aspects of our server and database that is >>>>> different from yours (type of CPU, no of OUs, no of DEs/Indicators, >>>>> whether >>>>> your instance have lots of tracker data, etc etc). I have not seen any >>>>> scientific/quantified comparative performance values related to specific >>>>> parameters like number of CPUs and/or number of cores, but 12 cores SHOULD >>>>> improve analytics performance quite a bit - assuming around 30% then it >>>>> means running analytics on your database/server should take around 3 >>>>> hours...... >>>>> >>>>> I tried getting comparative, quantitative data on various >>>>> configurations of hardware and software (e.g. some users prefer CentOS, >>>>> others Ubuntu) during the academy in August, but did not get much - it >>>>> seems most users/providers have found a setup that works for them for now >>>>> and nobody is doing any systematic performance testing (some of the >>>>> international NGOs/companies using DHIS2 might have, but as with >>>>> internally >>>>> developed apps they are not that keen on sharing). So it would be highly >>>>> appreciated if you would post the results on analytics time with every >>>>> upgrade / tweak you do - starting with the upgrade to Pg 9.5.4 >>>>> >>>>> Best regards >>>>> Calle >>>>> >>>>> On 19 October 2016 at 13:28, Sam Johnson <samuel.john...@qebo.co.uk> >>>>> wrote: >>>>> >>>>>> Hi Neeraj, >>>>>> >>>>>> >>>>>> >>>>>> *Using VACUUM and ANALYZE* >>>>>> >>>>>> >>>>>> >>>>>> Like Brajesh, my background is MySQL, and one database admin task >>>>>> that is often overlooked in MySQL is OPTIMIZE TABLEs. This reclaims >>>>>> unused >>>>>> space (we’ve had 100Gb databases files drop to half their size) and >>>>>> refreshes index statistics (if the shape of your data has changed over >>>>>> time, this can make indices run faster). >>>>>> >>>>>> >>>>>> >>>>>> I’m new to PostgreSQL, but the core principles are the same, and a >>>>>> quick bit of Googling shows that the equivalents in PostgreSQL are the >>>>>> VACUUM and ANALYZE commands. If your database isn’t set to automatically >>>>>> do VACUUMs (the default DHIS2 postgres config doesn’t seem to be), you >>>>>> might want to try VACUUM *FULL*, which will literally rewrite all of >>>>>> your database tables and indices into smaller, more efficient files >>>>>> (note, >>>>>> however, that on a 500Gb database this could take a *looong* time – >>>>>> perhaps test on a backup first?). The following forum post is a really >>>>>> nice, plain-English explanation of what VACUUM does: >>>>>> >>>>>> http://dba.stackexchange.com/questions/126258/what-is-table- >>>>>> bloating-in-databases >>>>>> >>>>>> >>>>>> >>>>>> As I mentioned, my background is MySQL rather than Postgres, so >>>>>> someone with more specific Postgres experience might like to also chime >>>>>> in >>>>>> here. >>>>>> >>>>>> >>>>>> >>>>>> Cheers, Sam. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> *From: *Dhis2-users <dhis2-users-bounces+samuel.johnson= >>>>>> qebo.co...@lists.launchpad.net> on behalf of Brajesh Murari < >>>>>> brajesh.mur...@gmail.com> >>>>>> *Date: *Wednesday, 19 October 2016 at 08:28 >>>>>> *To: *Knut Staring <knu...@gmail.com> >>>>>> *Cc: *DHIS 2 Users list <dhis2-users@lists.launchpad.net>, DHIS2 >>>>>> Developers <dhis2-d...@lists.launchpad.net> >>>>>> *Subject: *Re: [Dhis2-users] [Dhis2-devs] 25 hours in completing >>>>>> Analytic >>>>>> >>>>>> >>>>>> >>>>>> Dear Neeraj, >>>>>> >>>>>> >>>>>> >>>>>> The physical database size doesn't matter much, even the number of >>>>>> records don't matter. In my experience the biggest problem that one can >>>>>> going to run in to is not size, but the number of queries you can handle >>>>>> at >>>>>> a time instance specially during analytic functionality execution. Most >>>>>> probably you should going to have to move to a master/slave configuration >>>>>> of your database, so that the read queries can run against the slaves and >>>>>> the write queries run against the master. However, if you and your >>>>>> database >>>>>> management team are not ready for this than, you can tweak your indexes >>>>>> for >>>>>> the queries you are running to speed up the response times. Also there >>>>>> is a >>>>>> lot of tweaking you can do to the network stack and kernel in Linux where >>>>>> MySQL Server has been installed that will help.Perhaps, I would focus >>>>>> first >>>>>> on your indexes, then have a server admin look at your OS, and if all >>>>>> that >>>>>> doesn't help it might be time to implement a master/slave configuration. >>>>>> The most important scalability factor is RAM. If the indexes of your >>>>>> tables >>>>>> fit into memory and your queries are highly optimized in analytic >>>>>> functionality, you can serve a reasonable amount of requests with a >>>>>> average >>>>>> machine. The number of records do matter, depending of how your tables >>>>>> look >>>>>> like. It's a difference to have a lot of varchar fields or only a couple >>>>>> of >>>>>> ints or longs. The physical size of the database matters as well, think >>>>>> of >>>>>> backups, for instance. Depending on your engine, your physical db files >>>>>> on >>>>>> grow, but don't shrink, for instance with innodb. So deleting a lot of >>>>>> rows, doesn't help to shrink your physical files. Thus the database size >>>>>> does matter. If you have more than one table with more than a million >>>>>> records, then performance starts indeed to degrade. Indexig is one of the >>>>>> important stand need to take care, If you hit one million records you >>>>>> will >>>>>> get performance problems, if the indices are not set right (for example >>>>>> no >>>>>> indices for fields in "WHERE statements" or "ON conditions" in joins). If >>>>>> you hit 10 million records, you will start to get performance problems >>>>>> even >>>>>> if you have all your indices right. Hardware upgrades - adding more >>>>>> memory >>>>>> and more processor power, especially memory - often help to reduce the >>>>>> most >>>>>> severe problems by increasing the performance again, at least to a >>>>>> certain >>>>>> degree. >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Oct 19, 2016 at 12:35 PM, Knut Staring <knu...@gmail.com> >>>>>> wrote: >>>>>> >>>>>> Just a heads-up that there seems to be a JDBC issue with Postgres >>>>>> 9.6, so perhaps you should try upgrading to 9.5 first. >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Oct 19, 2016 at 8:58 AM, Lars Helge Øverland <l...@dhis2.org> >>>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>> Hi Neeraj, >>>>>> >>>>>> >>>>>> >>>>>> what usually helps to improve runtime is to improve/increase: >>>>>> >>>>>> >>>>>> >>>>>> - ssd (read and write speed) >>>>>> >>>>>> - number of CPUs >>>>>> >>>>>> - using latest postgresql (9.6 claims to have even better indexing >>>>>> performance >>>>>> <https://www.postgresql.org/docs/9.6/static/release-9-6.html> than >>>>>> 9.5) >>>>>> >>>>>> - tuning >>>>>> <https://dhis2.github.io/dhis2-docs/master/en/implementer/html/install_server_setup.html#install_postgresql_performance_tuning> >>>>>> of postgresql >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> regards, >>>>>> >>>>>> >>>>>> >>>>>> Lars >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> Lars Helge Øverland >>>>>> >>>>>> Lead developer, DHIS 2 >>>>>> >>>>>> University of Oslo >>>>>> >>>>>> Skype: larshelgeoverland >>>>>> >>>>>> l...@dhis2.org >>>>>> >>>>>> http://www.dhis2.org <https://www.dhis2.org/> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Mailing list: https://launchpad.net/~dhis2-users >>>>>> Post to : dhis2-users@lists.launchpad.net >>>>>> Unsubscribe : https://launchpad.net/~dhis2-users >>>>>> More help : https://help.launchpad.net/ListHelp >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> Knut Staring >>>>>> >>>>>> Dept. of Informatics, University of Oslo >>>>>> >>>>>> Norway: +4791880522 >>>>>> >>>>>> Skype: knutstar >>>>>> >>>>>> http://dhis2.org >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Mailing list: https://launchpad.net/~dhis2-devs >>>>>> Post to : dhis2-d...@lists.launchpad.net >>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs >>>>>> More help : https://help.launchpad.net/ListHelp >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> Best Regards, >>>>>> >>>>>> >>>>>> >>>>>> Brajesh Murari, >>>>>> >>>>>> Postgraduate, Department of Computer Science and Engineering, >>>>>> >>>>>> Chaudhary Devi Lal University, Sirsa, >>>>>> >>>>>> India. >>>>>> >>>>>> >>>>>> >>>>>> The three basic dimensions of human development: a long and healthy >>>>>> life, access to knowledge, and a decent standard of living. >>>>>> >>>>>> _______________________________________________ >>>>>> Mailing list: https://launchpad.net/~dhis2-devs >>>>>> Post to : dhis2-d...@lists.launchpad.net >>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs >>>>>> More help : https://help.launchpad.net/ListHelp >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> ******************************************* >>>>> >>>>> Calle Hedberg >>>>> >>>>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA >>>>> >>>>> Tel/fax (home): +27-21-685-6472 >>>>> >>>>> Cell: +27-82-853-5352 >>>>> >>>>> Iridium SatPhone: +8816-315-19119 >>>>> >>>>> Email: calle.hedb...@gmail.com >>>>> >>>>> Skype: calle_hedberg >>>>> >>>>> ******************************************* >>>>> >>>>> >>>>> _______________________________________________ >>>>> Mailing list: https://launchpad.net/~dhis2-users >>>>> Post to : dhis2-users@lists.launchpad.net >>>>> Unsubscribe : https://launchpad.net/~dhis2-users >>>>> More help : https://help.launchpad.net/ListHelp >>>>> >>>>> >>>> >>>> >>>> -- >>>> Thanks, >>>> Neeraj Gupta >>>> >>>> _______________________________________________ >>>> Mailing list: https://launchpad.net/~dhis2-users >>>> Post to : dhis2-users@lists.launchpad.net >>>> Unsubscribe : https://launchpad.net/~dhis2-users >>>> More help : https://help.launchpad.net/ListHelp >>>> >>>> >>> >>> >>> -- >>> Best Regards, >>> >>> Brajesh Murari, >>> Postgraduate, Dept of CSE, >>> Chaudhary Devi Lal University, Sirsa, >>> India. >>> >>> The three basic dimensions of human development: a long and healthy >>> life, access to knowledge, and a decent standard of living. >>> >> >> >> >> -- >> >> ******************************************* >> >> Calle Hedberg >> >> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA >> >> Tel/fax (home): +27-21-685-6472 >> >> Cell: +27-82-853-5352 >> >> Iridium SatPhone: +8816-315-19119 >> >> Email: calle.hedb...@gmail.com >> >> Skype: calle_hedberg >> >> ******************************************* >> >> > > > -- > Thanks, > Neeraj Gupta > -- ******************************************* Calle Hedberg 46D Alma Road, 7700 Rosebank, SOUTH AFRICA Tel/fax (home): +27-21-685-6472 Cell: +27-82-853-5352 Iridium SatPhone: +8816-315-19119 Email: calle.hedb...@gmail.com Skype: calle_hedberg *******************************************
_______________________________________________ Mailing list: https://launchpad.net/~dhis2-users Post to : dhis2-users@lists.launchpad.net Unsubscribe : https://launchpad.net/~dhis2-users More help : https://help.launchpad.net/ListHelp