Well Valerio, my expertise is in ontology and knowledge representation, so I
hope I am not giving you wrong information.

I do not have a backup, because I have not finished the first load of data,
which is 1.5 M items, around 60 M triples. At the beginning I realized that
adding threads increased the edit rate: with threads we got up to 4
items/second. Then we improved the hardware, and together with the sysadmin
made the following adjustments to our database.

For the database: we have tested both MariaDB and MySQL. We placed both on
tmpfs (temporary file storage) and applied the following settings in
mysqld.cnf:

    tmpdir = /var/lib/mysql/mysqltmp
    query_cache_limit = 0
    query_cache_size = 0
    innodb_buffer_pool_size = 8G
    innodb_flush_log_at_trx_commit = 2

With these improvements we reached our maximum edit rate of 10 items/second.
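As a sanity check, the effective values can be read back at runtime. Here is
a rough sketch (assuming the mysql-connector-python package; the host and
credentials are placeholders, not our real setup):

    # Sketch: confirm the mysqld.cnf settings above are actually in effect.
    # Assumes the mysql-connector-python package; connection details are
    # placeholders.
    import mysql.connector

    conn = mysql.connector.connect(
        host="localhost", user="wiki", password="...", database="wikidb",
    )
    cur = conn.cursor()
    cur.execute(
        "SHOW VARIABLES WHERE Variable_name IN "
        "('tmpdir', 'innodb_buffer_pool_size', 'innodb_flush_log_at_trx_commit')"
    )
    for name, value in cur:
        print(name, "=", value)
    conn.close()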
However, we do not know how to improve further, given that we must load 30 M
items, which at this rate is a very long task.

Hope this information sheds some light on our use case, and perhaps helps us
to improve our work.
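In case the code helps: below is a minimal sketch of the shape of our loader,
i.e. the API:Edit example (https://www.mediawiki.org/wiki/API:Edit#Example)
driven by a thread pool. The endpoint, login, worker count, and page payload
are illustrative placeholders rather than our real setup:

    # Minimal sketch: parallel edits through the MediaWiki action API.
    # Assumes an already-authenticated requests.Session; the endpoint and
    # work list below are illustrative placeholders.
    import requests
    from concurrent.futures import ThreadPoolExecutor

    API = "https://wikibase.example.org/w/api.php"  # hypothetical instance

    session = requests.Session()
    # ... action=login with a bot password goes here, omitted for brevity ...

    # One CSRF token can be reused for every edit in the same session.
    csrf = session.get(API, params={
        "action": "query", "meta": "tokens", "format": "json",
    }).json()["query"]["tokens"]["csrftoken"]

    def edit(page):
        """POST a single edit and return the parsed API response."""
        title, text = page
        return session.post(API, data={
            "action": "edit", "title": title, "text": text,
            "token": csrf, "format": "json", "bot": "1",
        }).json()

    # Illustrative work list; in reality this is generated from source data.
    pages = [("Sandbox/item_%d" % n, "...") for n in range(1, 1001)]

    # Note: requests.Session is not documented as thread-safe, so one
    # session per worker may be safer for a long-running load.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for result in pool.map(edit, pages):
            if "error" in result:
                print(result["error"])

The worker count is the main knob we have tuned; beyond what the current
hardware can absorb, raising it no longer raises the rate.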
Luis

> Valerio Bozzolan <[email protected]> wrote on 29 January 2020 at 08:44:
>
> Thank you for the clarification,
>
> First of all, let me clarify that on your private Wikibase instance - on
> your own hardware - you can surely do whatever you want and flood your
> APIs without asking any permission. So, if you have reached a practical
> edits-per-second limitation, you probably want to look for hardware
> bottlenecks with the help of a sysadmin.
>
> As a note, "in case of fire" you can just restore your database backup
> instead of re-running your bot another time. (You have a backup, don't
> you? :)
>
> Warm wishes
>
> On January 29, 2020 8:12:40 AM GMT+01:00, wp1080397-lsrs wp1080397-lsrs
> <[email protected]> wrote:
> > If I understand your request, I cannot provide such a discussion,
> > because I did not participate in any discussion for bot approval; our
> > administrator configured the bots on our private instance.
> >
> > Hope you can provide me some additional support. If you require
> > further information, please let me know, and I will answer ASAP.
> >
> > Best regards
> >
> > Luis Ramos
> >
> > > Valerio Bozzolan <[email protected]> wrote on 28 January 2020 at 17:43:
> > >
> > > In order to further help you, can I ask for your Wikidata bot
> > > approval discussion?
> > >
> > > https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot
> > >
> > > On Tue, 2020-01-28 at 10:19 +0100, wp1080397-lsrs wp1080397-lsrs wrote:
> > > > Dear Valerio,
> > > >
> > > > Thanks for the quick answer. If I understood correctly, we are
> > > > using an inappropriate approach to parallelising the editing
> > > > process. We are aiming to have the data available ASAP; as soon as
> > > > we have it, we will switch to another approach.
> > > >
> > > > My question was about the necessity of loading large data sets: for
> > > > private instances we need to load 20,000,000 items for private use,
> > > > and at a rate of 10 items per second, with the approach we are
> > > > following, we will require 25 days with a script writing 24 hours a
> > > > day - and speaking in big-data terms, 20 M is a small data set.
> > > >
> > > > So, I leave an open question: is there any experience where a
> > > > higher edit rate has been possible?
> > > >
> > > > Best regards
> > > >
> > > > > Valerio Bozzolan <[email protected]> wrote on 28 January 2020
> > > > > at 09:28:
> > > > >
> > > > > Please note that - AFAIK - parallel requests are not well
> > > > > accepted.
> > > > >
> > > > > https://www.mediawiki.org/wiki/API:Etiquette
> > > > >
> > > > > (You may have a bigger problem now :^)
> > > > >
> > > > > On Tue, 2020-01-28 at 08:13 +0100, wp1080397-lsrs wp1080397-lsrs
> > > > > wrote:
> > > > > > Dear friends,
> > > > > >
> > > > > > We have been working for some months on a Wikidata project, and
> > > > > > we have found an issue with editing performance. I began to
> > > > > > work with the Wikidata Java API, and when I tried to increase
> > > > > > the editing speed, the Java system held back edits and inserted
> > > > > > delays, which reduced the edit throughput as well.
> > > > > >
> > > > > > I then chose to edit with Pywikibot, but in my experience this
> > > > > > reduced the edit rate even further.
> > > > > >
> > > > > > In the end we used the procedure indicated here:
> > > > > > https://www.mediawiki.org/wiki/API:Edit#Example
> > > > > > with multithreading, and we reached a maximum of 10.6 edits per
> > > > > > second.
> > > > > >
> > > > > > My question is whether there is any experience where a higher
> > > > > > speed has been possible.
> > > > > >
> > > > > > Currently we need to write 1,500,000 items, and we would
> > > > > > require 5 working days for such a task.
> > > > > >
> > > > > > Best regards
> > > > > >
> > > > > > Luis Ramos
> > > > > > Senior Java Developer
> > > > > > (Semantic Web Developer)
> > > > > > PST.AG
> > > > > > Jena, Germany.
>
> --
> E-mail sent from the "K-9 Mail" app from F-Droid, installed on my
> LineageOS device without proprietary Google apps. I'm delivering through
> my Postfix mail server installed on Debian GNU/Linux.
>
> Have fun with software freedom!
>
> [[User:Valerio Bozzolan]]

Luis Ramos
Senior Java Developer
(Semantic Web Developer)
PST.AG
Jena, Germany.

_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
