Hi there, the analytics table generation is bound by db indexing speed / disk write speed. It scales almost linearly with the number of CPU cores available and disk write speed.
regards, Lars On Mon, Oct 24, 2016 at 7:33 AM, Neeraj Gupta <neeraj.h...@gmail.com> wrote: > Hi Calle, > > Thanks for all this. > > Yes, we will have a test server available with us by first week of > November, I can start test scenarios suggested by you after that. > > Thanks, > Neeraj > > On Mon, Oct 24, 2016 at 3:50 PM, Calle Hedberg <calle.hedb...@gmail.com> > wrote: > >> Neeraj, >> >> Thanks for that, it is useful info. >> >> The instance has a very skewed balance between data elements/catcombos >> and indicators, but what else is new - outside of South Africa, most >> countries have the same: they collect a huge number or data elements but >> turn very few of them into indicators (SA collect very very few data >> elements that are NOT part of any indicator). But that is a separate >> discussion. >> >> I would NOT expect the number of data elements & catcombos + the number >> of orgunits to have a significant impact on analytics performance, but that >> needs to be verified. >> >> I WOULD expect the number of indicators to have a significant impact, so >> if you doubled or quadrupled the number of indicators I would expect a >> significant impact on analytics time >> >> Using your and my numbers as a baseline, I would envision the following >> tests to be revealing: >> >> Baseline system 1: SSD, 4-core CPU: 100 mill values, 400 DEs, 250 >> indicators, 30,000 OUs, few catcombos -> 1 hour for analytics processing >> >> Baseline system 2: SSD, 12-core CPU: 500 mill values, 3,500 DEs, 15,000 >> OUs, 25 catcombos -> ~10 hours for analytics processing >> >> Test 1: Using system 1, reduce OUs to 15,000 - run analytics (hypothesis: >> no significant difference) >> Test 2: Using system 1, reduce indicators to 125 - run analytics, then to >> 0 - run analytics (hypothesis: analytics reduced to 45 min) >> Test 3: Using system 1, increase datavalues from 100mill to 500 mill by >> introducing additional attributecombos (easy) - run analytics (hypothesis: >> ~15 hours for analytics processing, with 4-cores instead of 12-cores) >> Test 4: Using system 1, increase datavalues to 1 billion - run analytics >> (hypothesis: 35 hours for analytics processing). >> >> Neeraj, I don't know if you have a similar sandbox server available for >> testing, but if you do: >> >> Test 5: Using system 2, increase number of indicators to 600 (just export >> the 200 you have, modify the names and uids a bit, and re-import - for the >> purpose of this test, it does not matter that many of the indicators have >> identical formulas). Hypothesis: Analytics time up at least 20% >> Test 6: using system 2 (with 200 indicators), shift half of the 500 mill >> data values forward or backward in time resulting in twice the number of >> analytics tables at half the size. (hypothesis: slight reduction in >> processing time) >> Test 7: Using system 2 (with 200 indicators), increase datavalues to 1 >> billion using a new attributecombo - run analytics (hypothesis: analytics >> time up to 25 hours again). >> >> As far as I've seen, the bulk of analytics processing time is taken up by >> indexing - which I would expect to follow a moderately quadratic curve in >> terms of analytics table sizes. >> >> Best regards >> Calle >> >> On 24 October 2016 at 10:39, Neeraj Gupta <neeraj.h...@gmail.com> wrote: >> >>> Calle, >>> >>> We have around 500 million record in database with 3666 data elements >>> having 26 category combinations and 201 indicators and there are 14398 >>> organisation units. >>> >>> Thanks, >>> Neeraj >>> >>> On Mon, Oct 24, 2016 at 1:50 PM, Calle Hedberg <calle.hedb...@gmail.com> >>> wrote: >>> >>>> Neeraj, >>>> >>>> You never stated the number of records you have in the datavalue table >>>> - what is it? >>>> >>>> In the same context: anybody have a rough idea of how many datavalue >>>> records there are in the global DATIM database - which I think currently >>>> might be the largest DHIS2 instance around? >>>> >>>> Given our own recent work on performance + what Neeraj has reported, >>>> I've been thinking of creating one test instance with let us say 500 mill >>>> datavalue records and another with let us say 1 billion, then use them to >>>> identify key bottlenecks in various processes AND use them to ensure that >>>> DHIS2 analytics performance is as linear as possible in terms of database >>>> size. Postgresql has introduced a number of new indexing algorithms in >>>> recent versions, and I'm not sure if DHIS2 is taking full advantage of >>>> them. >>>> >>>> Best regards >>>> Calle >>>> >>>> On 24 October 2016 at 07:53, Brajesh Murari <brajesh.mur...@gmail.com> >>>> wrote: >>>> >>>>> Congratulation Neeraj and team ...it much appreciated >>>>> >>>>> On Mon, Oct 24, 2016 at 11:08 AM, Neeraj Gupta <neeraj.h...@gmail.com> >>>>> wrote: >>>>> >>>>>> Dear Team, Thanks for all your suggestions. >>>>>> >>>>>> Now the time of analytic is reduced to 10 hours 41 minutes. >>>>>> >>>>>> We tried to VACUUM as Sam suggested but it didn't help then we >>>>>> upgraded postgres from 9.4 to 9.5.4 and as Calle and Bob suggested we >>>>>> made >>>>>> some changes in configuration file of postgres and it reduced the time. >>>>>> But >>>>>> the database size is still same. >>>>>> >>>>>> Thanks for all your help! >>>>>> >>>>>> Thanks, >>>>>> Neeraj >>>>>> >>>>>> On Wed, Oct 19, 2016 at 6:46 PM, Calle Hedberg < >>>>>> calle.hedb...@gmail.com> wrote: >>>>>> >>>>>>> Neeraj, >>>>>>> >>>>>>> It's always an element of uncertainty linked to database sizes - ref >>>>>>> Sam's post over. So indicating the number of records you have in the >>>>>>> datavalue table & key meta-data tables would be useful + indicating >>>>>>> whether >>>>>>> you are running other instances on the same server. Some comments - I've >>>>>>> been doing a lot of similar optimising work recently: >>>>>>> >>>>>>> 1. Upgrading to 9.5.4 is strongly recommended (and don't use 9.6 >>>>>>> before the worst bugs are fixed and it has stabilised). >>>>>>> >>>>>>> 2. Carefully check your postgres.conf against the recommended >>>>>>> settings. The guide is a bit superficial in the sense that it has >>>>>>> recommended "fixed" values only and no explanations around ranges below >>>>>>> or >>>>>>> above those, but you can experiment a bit yourself (e.g. the recommended >>>>>>> "max_connections = 200" might not be sufficient for a really large >>>>>>> system >>>>>>> like what you have. >>>>>>> >>>>>>> 3. If your server is running that single instance only, then 48GB or >>>>>>> RAM should be sufficient. Our servers are all having 128GB RAM so we >>>>>>> experimented quite a bit earlier this year with giving a DHIS2 instance >>>>>>> large amounts or RAM (up to 60-70gb), with negligible impact on >>>>>>> performance. According to Lars, the DHIS2 cannot really utilize more >>>>>>> than >>>>>>> around 16gb RAM (at least that is how I understood his communication at >>>>>>> the >>>>>>> time). So 48GB should be sufficient for a single instance. >>>>>>> >>>>>>> 4. I've been doing performance optimizing recently on an instance >>>>>>> with >>>>>>> - 4-core server with 2x 512gb ssd, 12gb allocated to DHIS2 >>>>>>> - 31,000 Orgunits >>>>>>> - 420 data elements >>>>>>> - 250 indicators >>>>>>> - around 100 mill datavalue records >>>>>>> - total size around 140gb with analytics tables. >>>>>>> >>>>>>> So the size is only 25% of your 500GB, but RUNNING ANALYTICS ON THAT >>>>>>> DATABASE INSTANCE IS TAKING JUST OVER 1 HOUR. Fundamentally, if the >>>>>>> analytics engine is designed well, I would expect a nearly linear >>>>>>> relationship between database size and the time analytics takes to run. >>>>>>> So >>>>>>> running analytics on your database on our server should in theory take >>>>>>> 4-5 >>>>>>> hours. >>>>>>> >>>>>>> We are obviously comparing oranges and nectarines here, in the sense >>>>>>> that there might be other aspects of our server and database that is >>>>>>> different from yours (type of CPU, no of OUs, no of DEs/Indicators, >>>>>>> whether >>>>>>> your instance have lots of tracker data, etc etc). I have not seen any >>>>>>> scientific/quantified comparative performance values related to specific >>>>>>> parameters like number of CPUs and/or number of cores, but 12 cores >>>>>>> SHOULD >>>>>>> improve analytics performance quite a bit - assuming around 30% then it >>>>>>> means running analytics on your database/server should take around 3 >>>>>>> hours...... >>>>>>> >>>>>>> I tried getting comparative, quantitative data on various >>>>>>> configurations of hardware and software (e.g. some users prefer CentOS, >>>>>>> others Ubuntu) during the academy in August, but did not get much - it >>>>>>> seems most users/providers have found a setup that works for them for >>>>>>> now >>>>>>> and nobody is doing any systematic performance testing (some of the >>>>>>> international NGOs/companies using DHIS2 might have, but as with >>>>>>> internally >>>>>>> developed apps they are not that keen on sharing). So it would be highly >>>>>>> appreciated if you would post the results on analytics time with every >>>>>>> upgrade / tweak you do - starting with the upgrade to Pg 9.5.4 >>>>>>> >>>>>>> Best regards >>>>>>> Calle >>>>>>> >>>>>>> On 19 October 2016 at 13:28, Sam Johnson <samuel.john...@qebo.co.uk> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Neeraj, >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> *Using VACUUM and ANALYZE* >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Like Brajesh, my background is MySQL, and one database admin task >>>>>>>> that is often overlooked in MySQL is OPTIMIZE TABLEs. This reclaims >>>>>>>> unused >>>>>>>> space (we’ve had 100Gb databases files drop to half their size) and >>>>>>>> refreshes index statistics (if the shape of your data has changed over >>>>>>>> time, this can make indices run faster). >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I’m new to PostgreSQL, but the core principles are the same, and a >>>>>>>> quick bit of Googling shows that the equivalents in PostgreSQL are the >>>>>>>> VACUUM and ANALYZE commands. If your database isn’t set to >>>>>>>> automatically >>>>>>>> do VACUUMs (the default DHIS2 postgres config doesn’t seem to be), you >>>>>>>> might want to try VACUUM *FULL*, which will literally rewrite all >>>>>>>> of your database tables and indices into smaller, more efficient files >>>>>>>> (note, however, that on a 500Gb database this could take a *looong* >>>>>>>> time – perhaps test on a backup first?). The following forum post is a >>>>>>>> really nice, plain-English explanation of what VACUUM does: >>>>>>>> >>>>>>>> http://dba.stackexchange.com/questions/126258/what-is-table- >>>>>>>> bloating-in-databases >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> As I mentioned, my background is MySQL rather than Postgres, so >>>>>>>> someone with more specific Postgres experience might like to also >>>>>>>> chime in >>>>>>>> here. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Cheers, Sam. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> *From: *Dhis2-users <dhis2-users-bounces+samuel.johnson= >>>>>>>> qebo.co...@lists.launchpad.net> on behalf of Brajesh Murari < >>>>>>>> brajesh.mur...@gmail.com> >>>>>>>> *Date: *Wednesday, 19 October 2016 at 08:28 >>>>>>>> *To: *Knut Staring <knu...@gmail.com> >>>>>>>> *Cc: *DHIS 2 Users list <dhis2-us...@lists.launchpad.net>, DHIS2 >>>>>>>> Developers <dhis2-devs@lists.launchpad.net> >>>>>>>> *Subject: *Re: [Dhis2-users] [Dhis2-devs] 25 hours in completing >>>>>>>> Analytic >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Dear Neeraj, >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> The physical database size doesn't matter much, even the number of >>>>>>>> records don't matter. In my experience the biggest problem that one can >>>>>>>> going to run in to is not size, but the number of queries you can >>>>>>>> handle at >>>>>>>> a time instance specially during analytic functionality execution. Most >>>>>>>> probably you should going to have to move to a master/slave >>>>>>>> configuration >>>>>>>> of your database, so that the read queries can run against the slaves >>>>>>>> and >>>>>>>> the write queries run against the master. However, if you and your >>>>>>>> database >>>>>>>> management team are not ready for this than, you can tweak your >>>>>>>> indexes for >>>>>>>> the queries you are running to speed up the response times. Also there >>>>>>>> is a >>>>>>>> lot of tweaking you can do to the network stack and kernel in Linux >>>>>>>> where >>>>>>>> MySQL Server has been installed that will help.Perhaps, I would focus >>>>>>>> first >>>>>>>> on your indexes, then have a server admin look at your OS, and if all >>>>>>>> that >>>>>>>> doesn't help it might be time to implement a master/slave >>>>>>>> configuration. >>>>>>>> The most important scalability factor is RAM. If the indexes of your >>>>>>>> tables >>>>>>>> fit into memory and your queries are highly optimized in analytic >>>>>>>> functionality, you can serve a reasonable amount of requests with a >>>>>>>> average >>>>>>>> machine. The number of records do matter, depending of how your tables >>>>>>>> look >>>>>>>> like. It's a difference to have a lot of varchar fields or only a >>>>>>>> couple of >>>>>>>> ints or longs. The physical size of the database matters as well, >>>>>>>> think of >>>>>>>> backups, for instance. Depending on your engine, your physical db >>>>>>>> files on >>>>>>>> grow, but don't shrink, for instance with innodb. So deleting a lot of >>>>>>>> rows, doesn't help to shrink your physical files. Thus the database >>>>>>>> size >>>>>>>> does matter. If you have more than one table with more than a million >>>>>>>> records, then performance starts indeed to degrade. Indexig is one of >>>>>>>> the >>>>>>>> important stand need to take care, If you hit one million records you >>>>>>>> will >>>>>>>> get performance problems, if the indices are not set right (for >>>>>>>> example no >>>>>>>> indices for fields in "WHERE statements" or "ON conditions" in joins). >>>>>>>> If >>>>>>>> you hit 10 million records, you will start to get performance problems >>>>>>>> even >>>>>>>> if you have all your indices right. Hardware upgrades - adding more >>>>>>>> memory >>>>>>>> and more processor power, especially memory - often help to reduce the >>>>>>>> most >>>>>>>> severe problems by increasing the performance again, at least to a >>>>>>>> certain >>>>>>>> degree. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Oct 19, 2016 at 12:35 PM, Knut Staring <knu...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>> Just a heads-up that there seems to be a JDBC issue with Postgres >>>>>>>> 9.6, so perhaps you should try upgrading to 9.5 first. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Oct 19, 2016 at 8:58 AM, Lars Helge Øverland < >>>>>>>> l...@dhis2.org> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Hi Neeraj, >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> what usually helps to improve runtime is to improve/increase: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> - ssd (read and write speed) >>>>>>>> >>>>>>>> - number of CPUs >>>>>>>> >>>>>>>> - using latest postgresql (9.6 claims to have even better indexing >>>>>>>> performance >>>>>>>> <https://www.postgresql.org/docs/9.6/static/release-9-6.html> than >>>>>>>> 9.5) >>>>>>>> >>>>>>>> - tuning >>>>>>>> <https://dhis2.github.io/dhis2-docs/master/en/implementer/html/install_server_setup.html#install_postgresql_performance_tuning> >>>>>>>> of postgresql >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> regards, >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Lars >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> Lars Helge Øverland >>>>>>>> >>>>>>>> Lead developer, DHIS 2 >>>>>>>> >>>>>>>> University of Oslo >>>>>>>> >>>>>>>> Skype: larshelgeoverland >>>>>>>> >>>>>>>> l...@dhis2.org >>>>>>>> >>>>>>>> http://www.dhis2.org <https://www.dhis2.org/> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Mailing list: https://launchpad.net/~dhis2-users >>>>>>>> Post to : dhis2-us...@lists.launchpad.net >>>>>>>> Unsubscribe : https://launchpad.net/~dhis2-users >>>>>>>> More help : https://help.launchpad.net/ListHelp >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> Knut Staring >>>>>>>> >>>>>>>> Dept. of Informatics, University of Oslo >>>>>>>> >>>>>>>> Norway: +4791880522 >>>>>>>> >>>>>>>> Skype: knutstar >>>>>>>> >>>>>>>> http://dhis2.org >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Mailing list: https://launchpad.net/~dhis2-devs >>>>>>>> Post to : dhis2-devs@lists.launchpad.net >>>>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs >>>>>>>> More help : https://help.launchpad.net/ListHelp >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> Best Regards, >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Brajesh Murari, >>>>>>>> >>>>>>>> Postgraduate, Department of Computer Science and Engineering, >>>>>>>> >>>>>>>> Chaudhary Devi Lal University, Sirsa, >>>>>>>> >>>>>>>> India. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> The three basic dimensions of human development: a long and healthy >>>>>>>> life, access to knowledge, and a decent standard of living. >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Mailing list: https://launchpad.net/~dhis2-devs >>>>>>>> Post to : dhis2-devs@lists.launchpad.net >>>>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs >>>>>>>> More help : https://help.launchpad.net/ListHelp >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> ******************************************* >>>>>>> >>>>>>> Calle Hedberg >>>>>>> >>>>>>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA >>>>>>> >>>>>>> Tel/fax (home): +27-21-685-6472 >>>>>>> >>>>>>> Cell: +27-82-853-5352 >>>>>>> >>>>>>> Iridium SatPhone: +8816-315-19119 >>>>>>> >>>>>>> Email: calle.hedb...@gmail.com >>>>>>> >>>>>>> Skype: calle_hedberg >>>>>>> >>>>>>> ******************************************* >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Mailing list: https://launchpad.net/~dhis2-users >>>>>>> Post to : dhis2-us...@lists.launchpad.net >>>>>>> Unsubscribe : https://launchpad.net/~dhis2-users >>>>>>> More help : https://help.launchpad.net/ListHelp >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Thanks, >>>>>> Neeraj Gupta >>>>>> >>>>>> _______________________________________________ >>>>>> Mailing list: https://launchpad.net/~dhis2-users >>>>>> Post to : dhis2-us...@lists.launchpad.net >>>>>> Unsubscribe : https://launchpad.net/~dhis2-users >>>>>> More help : https://help.launchpad.net/ListHelp >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Best Regards, >>>>> >>>>> Brajesh Murari, >>>>> Postgraduate, Dept of CSE, >>>>> Chaudhary Devi Lal University, Sirsa, >>>>> India. >>>>> >>>>> The three basic dimensions of human development: a long and healthy >>>>> life, access to knowledge, and a decent standard of living. >>>>> >>>> >>>> >>>> >>>> -- >>>> >>>> ******************************************* >>>> >>>> Calle Hedberg >>>> >>>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA >>>> >>>> Tel/fax (home): +27-21-685-6472 >>>> >>>> Cell: +27-82-853-5352 >>>> >>>> Iridium SatPhone: +8816-315-19119 >>>> >>>> Email: calle.hedb...@gmail.com >>>> >>>> Skype: calle_hedberg >>>> >>>> ******************************************* >>>> >>>> >>> >>> >>> -- >>> Thanks, >>> Neeraj Gupta >>> >> >> >> >> -- >> >> ******************************************* >> >> Calle Hedberg >> >> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA >> >> Tel/fax (home): +27-21-685-6472 >> >> Cell: +27-82-853-5352 >> >> Iridium SatPhone: +8816-315-19119 >> >> Email: calle.hedb...@gmail.com >> >> Skype: calle_hedberg >> >> ******************************************* >> >> > > > -- > Thanks, > Neeraj Gupta > > _______________________________________________ > Mailing list: https://launchpad.net/~dhis2-users > Post to : dhis2-us...@lists.launchpad.net > Unsubscribe : https://launchpad.net/~dhis2-users > More help : https://help.launchpad.net/ListHelp > > -- Lars Helge Øverland Lead developer, DHIS 2 University of Oslo Skype: larshelgeoverland l...@dhis2.org http://www.dhis2.org <https://www.dhis2.org/>
_______________________________________________ Mailing list: https://launchpad.net/~dhis2-devs Post to : dhis2-devs@lists.launchpad.net Unsubscribe : https://launchpad.net/~dhis2-devs More help : https://help.launchpad.net/ListHelp