Much appreciated Bob and Neeraj, much appreciated ... On Mon, Oct 24, 2016 at 3:27 PM, Knut Staring <knu...@gmail.com> wrote:
> It would be very interesting to know of other similarly large > installations. DATIM, Bangladesh, PSI perhaps - others? > > On Mon, Oct 24, 2016 at 10:39 AM, Neeraj Gupta <neeraj.h...@gmail.com> > wrote: > >> Calle, >> >> We have around 500 million record in database with 3666 data elements >> having 26 category combinations and 201 indicators and there are 14398 >> organisation units. >> >> Thanks, >> Neeraj >> >> On Mon, Oct 24, 2016 at 1:50 PM, Calle Hedberg <calle.hedb...@gmail.com> >> wrote: >> >>> Neeraj, >>> >>> You never stated the number of records you have in the datavalue table - >>> what is it? >>> >>> In the same context: anybody have a rough idea of how many datavalue >>> records there are in the global DATIM database - which I think currently >>> might be the largest DHIS2 instance around? >>> >>> Given our own recent work on performance + what Neeraj has reported, >>> I've been thinking of creating one test instance with let us say 500 mill >>> datavalue records and another with let us say 1 billion, then use them to >>> identify key bottlenecks in various processes AND use them to ensure that >>> DHIS2 analytics performance is as linear as possible in terms of database >>> size. Postgresql has introduced a number of new indexing algorithms in >>> recent versions, and I'm not sure if DHIS2 is taking full advantage of them. >>> >>> Best regards >>> Calle >>> >>> On 24 October 2016 at 07:53, Brajesh Murari <brajesh.mur...@gmail.com> >>> wrote: >>> >>>> Congratulation Neeraj and team ...it much appreciated >>>> >>>> On Mon, Oct 24, 2016 at 11:08 AM, Neeraj Gupta <neeraj.h...@gmail.com> >>>> wrote: >>>> >>>>> Dear Team, Thanks for all your suggestions. >>>>> >>>>> Now the time of analytic is reduced to 10 hours 41 minutes. >>>>> >>>>> We tried to VACUUM as Sam suggested but it didn't help then we >>>>> upgraded postgres from 9.4 to 9.5.4 and as Calle and Bob suggested we made >>>>> some changes in configuration file of postgres and it reduced the time. >>>>> But >>>>> the database size is still same. >>>>> >>>>> Thanks for all your help! >>>>> >>>>> Thanks, >>>>> Neeraj >>>>> >>>>> On Wed, Oct 19, 2016 at 6:46 PM, Calle Hedberg < >>>>> calle.hedb...@gmail.com> wrote: >>>>> >>>>>> Neeraj, >>>>>> >>>>>> It's always an element of uncertainty linked to database sizes - ref >>>>>> Sam's post over. So indicating the number of records you have in the >>>>>> datavalue table & key meta-data tables would be useful + indicating >>>>>> whether >>>>>> you are running other instances on the same server. Some comments - I've >>>>>> been doing a lot of similar optimising work recently: >>>>>> >>>>>> 1. Upgrading to 9.5.4 is strongly recommended (and don't use 9.6 >>>>>> before the worst bugs are fixed and it has stabilised). >>>>>> >>>>>> 2. Carefully check your postgres.conf against the recommended >>>>>> settings. The guide is a bit superficial in the sense that it has >>>>>> recommended "fixed" values only and no explanations around ranges below >>>>>> or >>>>>> above those, but you can experiment a bit yourself (e.g. the recommended >>>>>> "max_connections = 200" might not be sufficient for a really large system >>>>>> like what you have. >>>>>> >>>>>> 3. If your server is running that single instance only, then 48GB or >>>>>> RAM should be sufficient. Our servers are all having 128GB RAM so we >>>>>> experimented quite a bit earlier this year with giving a DHIS2 instance >>>>>> large amounts or RAM (up to 60-70gb), with negligible impact on >>>>>> performance. According to Lars, the DHIS2 cannot really utilize more than >>>>>> around 16gb RAM (at least that is how I understood his communication at >>>>>> the >>>>>> time). So 48GB should be sufficient for a single instance. >>>>>> >>>>>> 4. I've been doing performance optimizing recently on an instance >>>>>> with >>>>>> - 4-core server with 2x 512gb ssd, 12gb allocated to DHIS2 >>>>>> - 31,000 Orgunits >>>>>> - 420 data elements >>>>>> - 250 indicators >>>>>> - around 100 mill datavalue records >>>>>> - total size around 140gb with analytics tables. >>>>>> >>>>>> So the size is only 25% of your 500GB, but RUNNING ANALYTICS ON THAT >>>>>> DATABASE INSTANCE IS TAKING JUST OVER 1 HOUR. Fundamentally, if the >>>>>> analytics engine is designed well, I would expect a nearly linear >>>>>> relationship between database size and the time analytics takes to run. >>>>>> So >>>>>> running analytics on your database on our server should in theory take >>>>>> 4-5 >>>>>> hours. >>>>>> >>>>>> We are obviously comparing oranges and nectarines here, in the sense >>>>>> that there might be other aspects of our server and database that is >>>>>> different from yours (type of CPU, no of OUs, no of DEs/Indicators, >>>>>> whether >>>>>> your instance have lots of tracker data, etc etc). I have not seen any >>>>>> scientific/quantified comparative performance values related to specific >>>>>> parameters like number of CPUs and/or number of cores, but 12 cores >>>>>> SHOULD >>>>>> improve analytics performance quite a bit - assuming around 30% then it >>>>>> means running analytics on your database/server should take around 3 >>>>>> hours...... >>>>>> >>>>>> I tried getting comparative, quantitative data on various >>>>>> configurations of hardware and software (e.g. some users prefer CentOS, >>>>>> others Ubuntu) during the academy in August, but did not get much - it >>>>>> seems most users/providers have found a setup that works for them for now >>>>>> and nobody is doing any systematic performance testing (some of the >>>>>> international NGOs/companies using DHIS2 might have, but as with >>>>>> internally >>>>>> developed apps they are not that keen on sharing). So it would be highly >>>>>> appreciated if you would post the results on analytics time with every >>>>>> upgrade / tweak you do - starting with the upgrade to Pg 9.5.4 >>>>>> >>>>>> Best regards >>>>>> Calle >>>>>> >>>>>> On 19 October 2016 at 13:28, Sam Johnson <samuel.john...@qebo.co.uk> >>>>>> wrote: >>>>>> >>>>>>> Hi Neeraj, >>>>>>> >>>>>>> >>>>>>> >>>>>>> *Using VACUUM and ANALYZE* >>>>>>> >>>>>>> >>>>>>> >>>>>>> Like Brajesh, my background is MySQL, and one database admin task >>>>>>> that is often overlooked in MySQL is OPTIMIZE TABLEs. This reclaims >>>>>>> unused >>>>>>> space (we’ve had 100Gb databases files drop to half their size) and >>>>>>> refreshes index statistics (if the shape of your data has changed over >>>>>>> time, this can make indices run faster). >>>>>>> >>>>>>> >>>>>>> >>>>>>> I’m new to PostgreSQL, but the core principles are the same, and a >>>>>>> quick bit of Googling shows that the equivalents in PostgreSQL are the >>>>>>> VACUUM and ANALYZE commands. If your database isn’t set to >>>>>>> automatically >>>>>>> do VACUUMs (the default DHIS2 postgres config doesn’t seem to be), you >>>>>>> might want to try VACUUM *FULL*, which will literally rewrite all >>>>>>> of your database tables and indices into smaller, more efficient files >>>>>>> (note, however, that on a 500Gb database this could take a *looong* >>>>>>> time – perhaps test on a backup first?). The following forum post is a >>>>>>> really nice, plain-English explanation of what VACUUM does: >>>>>>> >>>>>>> http://dba.stackexchange.com/questions/126258/what-is-table- >>>>>>> bloating-in-databases >>>>>>> >>>>>>> >>>>>>> >>>>>>> As I mentioned, my background is MySQL rather than Postgres, so >>>>>>> someone with more specific Postgres experience might like to also chime >>>>>>> in >>>>>>> here. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Cheers, Sam. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> *From: *Dhis2-users <dhis2-users-bounces+samuel.johnson= >>>>>>> qebo.co...@lists.launchpad.net> on behalf of Brajesh Murari < >>>>>>> brajesh.mur...@gmail.com> >>>>>>> *Date: *Wednesday, 19 October 2016 at 08:28 >>>>>>> *To: *Knut Staring <knu...@gmail.com> >>>>>>> *Cc: *DHIS 2 Users list <dhis2-users@lists.launchpad.net>, DHIS2 >>>>>>> Developers <dhis2-d...@lists.launchpad.net> >>>>>>> *Subject: *Re: [Dhis2-users] [Dhis2-devs] 25 hours in completing >>>>>>> Analytic >>>>>>> >>>>>>> >>>>>>> >>>>>>> Dear Neeraj, >>>>>>> >>>>>>> >>>>>>> >>>>>>> The physical database size doesn't matter much, even the number of >>>>>>> records don't matter. In my experience the biggest problem that one can >>>>>>> going to run in to is not size, but the number of queries you can >>>>>>> handle at >>>>>>> a time instance specially during analytic functionality execution. Most >>>>>>> probably you should going to have to move to a master/slave >>>>>>> configuration >>>>>>> of your database, so that the read queries can run against the slaves >>>>>>> and >>>>>>> the write queries run against the master. However, if you and your >>>>>>> database >>>>>>> management team are not ready for this than, you can tweak your indexes >>>>>>> for >>>>>>> the queries you are running to speed up the response times. Also there >>>>>>> is a >>>>>>> lot of tweaking you can do to the network stack and kernel in Linux >>>>>>> where >>>>>>> MySQL Server has been installed that will help.Perhaps, I would focus >>>>>>> first >>>>>>> on your indexes, then have a server admin look at your OS, and if all >>>>>>> that >>>>>>> doesn't help it might be time to implement a master/slave configuration. >>>>>>> The most important scalability factor is RAM. If the indexes of your >>>>>>> tables >>>>>>> fit into memory and your queries are highly optimized in analytic >>>>>>> functionality, you can serve a reasonable amount of requests with a >>>>>>> average >>>>>>> machine. The number of records do matter, depending of how your tables >>>>>>> look >>>>>>> like. It's a difference to have a lot of varchar fields or only a >>>>>>> couple of >>>>>>> ints or longs. The physical size of the database matters as well, >>>>>>> think of >>>>>>> backups, for instance. Depending on your engine, your physical db files >>>>>>> on >>>>>>> grow, but don't shrink, for instance with innodb. So deleting a lot of >>>>>>> rows, doesn't help to shrink your physical files. Thus the database size >>>>>>> does matter. If you have more than one table with more than a million >>>>>>> records, then performance starts indeed to degrade. Indexig is one of >>>>>>> the >>>>>>> important stand need to take care, If you hit one million records you >>>>>>> will >>>>>>> get performance problems, if the indices are not set right (for example >>>>>>> no >>>>>>> indices for fields in "WHERE statements" or "ON conditions" in joins). >>>>>>> If >>>>>>> you hit 10 million records, you will start to get performance problems >>>>>>> even >>>>>>> if you have all your indices right. Hardware upgrades - adding more >>>>>>> memory >>>>>>> and more processor power, especially memory - often help to reduce the >>>>>>> most >>>>>>> severe problems by increasing the performance again, at least to a >>>>>>> certain >>>>>>> degree. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Oct 19, 2016 at 12:35 PM, Knut Staring <knu...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>> Just a heads-up that there seems to be a JDBC issue with Postgres >>>>>>> 9.6, so perhaps you should try upgrading to 9.5 first. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Oct 19, 2016 at 8:58 AM, Lars Helge Øverland <l...@dhis2.org> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> Hi Neeraj, >>>>>>> >>>>>>> >>>>>>> >>>>>>> what usually helps to improve runtime is to improve/increase: >>>>>>> >>>>>>> >>>>>>> >>>>>>> - ssd (read and write speed) >>>>>>> >>>>>>> - number of CPUs >>>>>>> >>>>>>> - using latest postgresql (9.6 claims to have even better indexing >>>>>>> performance >>>>>>> <https://www.postgresql.org/docs/9.6/static/release-9-6.html> than >>>>>>> 9.5) >>>>>>> >>>>>>> - tuning >>>>>>> <https://dhis2.github.io/dhis2-docs/master/en/implementer/html/install_server_setup.html#install_postgresql_performance_tuning> >>>>>>> of postgresql >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> regards, >>>>>>> >>>>>>> >>>>>>> >>>>>>> Lars >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Lars Helge Øverland >>>>>>> >>>>>>> Lead developer, DHIS 2 >>>>>>> >>>>>>> University of Oslo >>>>>>> >>>>>>> Skype: larshelgeoverland >>>>>>> >>>>>>> l...@dhis2.org >>>>>>> >>>>>>> http://www.dhis2.org <https://www.dhis2.org/> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Mailing list: https://launchpad.net/~dhis2-users >>>>>>> Post to : dhis2-users@lists.launchpad.net >>>>>>> Unsubscribe : https://launchpad.net/~dhis2-users >>>>>>> More help : https://help.launchpad.net/ListHelp >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Knut Staring >>>>>>> >>>>>>> Dept. of Informatics, University of Oslo >>>>>>> >>>>>>> Norway: +4791880522 >>>>>>> >>>>>>> Skype: knutstar >>>>>>> >>>>>>> http://dhis2.org >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Mailing list: https://launchpad.net/~dhis2-devs >>>>>>> Post to : dhis2-d...@lists.launchpad.net >>>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs >>>>>>> More help : https://help.launchpad.net/ListHelp >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Best Regards, >>>>>>> >>>>>>> >>>>>>> >>>>>>> Brajesh Murari, >>>>>>> >>>>>>> Postgraduate, Department of Computer Science and Engineering, >>>>>>> >>>>>>> Chaudhary Devi Lal University, Sirsa, >>>>>>> >>>>>>> India. >>>>>>> >>>>>>> >>>>>>> >>>>>>> The three basic dimensions of human development: a long and healthy >>>>>>> life, access to knowledge, and a decent standard of living. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Mailing list: https://launchpad.net/~dhis2-devs >>>>>>> Post to : dhis2-d...@lists.launchpad.net >>>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs >>>>>>> More help : https://help.launchpad.net/ListHelp >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> ******************************************* >>>>>> >>>>>> Calle Hedberg >>>>>> >>>>>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA >>>>>> >>>>>> Tel/fax (home): +27-21-685-6472 >>>>>> >>>>>> Cell: +27-82-853-5352 >>>>>> >>>>>> Iridium SatPhone: +8816-315-19119 >>>>>> >>>>>> Email: calle.hedb...@gmail.com >>>>>> >>>>>> Skype: calle_hedberg >>>>>> >>>>>> ******************************************* >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Mailing list: https://launchpad.net/~dhis2-users >>>>>> Post to : dhis2-users@lists.launchpad.net >>>>>> Unsubscribe : https://launchpad.net/~dhis2-users >>>>>> More help : https://help.launchpad.net/ListHelp >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Thanks, >>>>> Neeraj Gupta >>>>> >>>>> _______________________________________________ >>>>> Mailing list: https://launchpad.net/~dhis2-users >>>>> Post to : dhis2-users@lists.launchpad.net >>>>> Unsubscribe : https://launchpad.net/~dhis2-users >>>>> More help : https://help.launchpad.net/ListHelp >>>>> >>>>> >>>> >>>> >>>> -- >>>> Best Regards, >>>> >>>> Brajesh Murari, >>>> Postgraduate, Dept of CSE, >>>> Chaudhary Devi Lal University, Sirsa, >>>> India. >>>> >>>> The three basic dimensions of human development: a long and healthy >>>> life, access to knowledge, and a decent standard of living. >>>> >>> >>> >>> >>> -- >>> >>> ******************************************* >>> >>> Calle Hedberg >>> >>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA >>> >>> Tel/fax (home): +27-21-685-6472 >>> >>> Cell: +27-82-853-5352 >>> >>> Iridium SatPhone: +8816-315-19119 >>> >>> Email: calle.hedb...@gmail.com >>> >>> Skype: calle_hedberg >>> >>> ******************************************* >>> >>> >> >> >> -- >> Thanks, >> Neeraj Gupta >> >> _______________________________________________ >> Mailing list: https://launchpad.net/~dhis2-devs >> Post to : dhis2-d...@lists.launchpad.net >> Unsubscribe : https://launchpad.net/~dhis2-devs >> More help : https://help.launchpad.net/ListHelp >> >> > > > -- > Knut Staring > Dept. of Informatics, University of Oslo > Norway: +4791880522 > Skype: knutstar > http://dhis2.org > > _______________________________________________ > Mailing list: https://launchpad.net/~dhis2-users > Post to : dhis2-users@lists.launchpad.net > Unsubscribe : https://launchpad.net/~dhis2-users > More help : https://help.launchpad.net/ListHelp > > -- Best Regards, Brajesh Murari, Postgraduate, Dept of CSE, Chaudhary Devi Lal University, Sirsa, India. The three basic dimensions of human development: a long and healthy life, access to knowledge, and a decent standard of living.
_______________________________________________ Mailing list: https://launchpad.net/~dhis2-users Post to : dhis2-users@lists.launchpad.net Unsubscribe : https://launchpad.net/~dhis2-users More help : https://help.launchpad.net/ListHelp