Well Valerio, my expertise is in ontology and knowledge representation, so I 
hope I do not give you wrong information. 

I do not have a backup, because I have not finished the first load of data, 
which is about 1.5 M items, around 60 M triples. 
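
For when that first load does finish, here is a minimal backup sketch in 
Python (the database name "wikidb" and the output path are placeholders, not 
our real setup):

    import subprocess

    # Placeholder database name and output path; adjust to your instance.
    # --single-transaction gives a consistent dump of InnoDB tables
    # without locking writers out during the dump.
    with open("/backups/wikidb.sql", "w") as out:
        subprocess.run(["mysqldump", "--single-transaction", "wikidb"],
                       stdout=out, check=True)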

At the beginning I found that adding threads increased the edit rate: with 
threads we reached 4 items/second. Then we improved the hardware, and together 
with the sysadmin made the following adjustments to our database:

For the database: 

We have tested both MariaDB and MySQL. In both cases we placed the temporary 
directory on tmpfs (in-memory file storage) and applied the following settings 
in mysqld.cnf:

tmpdir                         = /var/lib/mysql/mysqltmp
query_cache_limit              = 0
query_cache_size               = 0
innodb_buffer_pool_size        = 8G
innodb_flush_log_at_trx_commit = 2
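
As a sanity check, a short Python sketch to confirm those settings are in 
effect (assumes the mysql-connector-python package; the credentials here are 
placeholders):

    import mysql.connector  # pip install mysql-connector-python

    # Placeholder credentials for a local instance; adjust to your setup.
    conn = mysql.connector.connect(host="localhost", user="root",
                                   password="secret")
    cur = conn.cursor()
    for var in ("tmpdir", "innodb_buffer_pool_size",
                "innodb_flush_log_at_trx_commit"):
        cur.execute("SHOW VARIABLES LIKE %s", (var,))
        print(cur.fetchone())
    cur.close()
    conn.close()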

With these improvements we reached our maximum edit rate of 10 items/second. 


However, we do not know how to improve further, given that we must load 30 M 
items; at 10 items/second that is roughly 35 days of continuous writing, which 
seems to be a very long task. 
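
For context, our loader follows the API:Edit approach with multithreading 
described earlier in this thread. A minimal Python sketch of that pattern (the 
endpoint URL, worker count, and item payloads are illustrative placeholders, 
not our production code):

    import json
    import threading
    import requests
    from concurrent.futures import ThreadPoolExecutor

    API = "http://localhost/w/api.php"  # placeholder: your private Wikibase
    local = threading.local()           # one session/token per worker thread

    def get_session():
        # requests.Session is not guaranteed thread-safe, so each worker
        # keeps its own session and CSRF token. A real loader would log
        # in here before fetching the token.
        if not hasattr(local, "session"):
            s = requests.Session()
            r = s.get(API, params={"action": "query", "meta": "tokens",
                                   "format": "json"})
            local.session = s
            local.token = r.json()["query"]["tokens"]["csrftoken"]
        return local.session, local.token

    def create_item(labels):
        # wbeditentity with new=item creates one fresh item per call.
        s, token = get_session()
        data = {
            "action": "wbeditentity",
            "new": "item",
            "data": json.dumps({"labels": labels}),
            "token": token,
            "format": "json",
        }
        return s.post(API, data=data).json()

    # Illustrative payloads: 100 items with English labels.
    items = [{"en": {"language": "en", "value": "Item %d" % i}}
             for i in range(100)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(create_item, items))

The worker count of 8 is arbitrary and worth tuning; past some point the 
server, not the client, becomes the bottleneck, which seems to match the 
plateau we observed.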

I hope this information sheds some light on our use case, and perhaps helps us 
improve our work. 


Luis 





> Valerio Bozzolan <[email protected]> wrote on 29 January 2020 at 08:44:
> 
> 
> Thank you for the clarification,
> 
> First of all, let me clarify that on your private Wikibase instance - on your 
> own hardware - you can surely do whatever you want and flood your APIs 
> without asking any permission. So, if you have reached a practical 
> edits-per-second limit, you probably want to find the hardware bottlenecks 
> with the help of a sysadmin.
> 
> As a note, "in case of fire" you can just restore your database backup 
> instead of re-running your bot another time. (You do have a backup, don't 
> you? :)
> 
> Warm wishes
> 
> On January 29, 2020 8:12:40 AM GMT+01:00, wp1080397-lsrs wp1080397-lsrs 
> <[email protected]> wrote:
> >If I understand your request, I cannot provide you with such a discussion, 
> >because I did not participate in any discussion for bot approval; our 
> >administrator configured the bots on our private instance.
> >
> >
> >I hope you can provide me some additional support. If you require further 
> >information, please let me know, and I will answer ASAP.
> >
> >
> >Best regards
> >
> >
> >Luis Ramos
> >
> >
> >> Valerio Bozzolan <[email protected]> wrote on 28. January 2020 at 17:43:
> >> 
> >> 
> >> In order to further help you, can I ask you for your Wikidata bot 
> >> approval discussion?
> >> 
> >> https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot
> >> 
> >> On Tue, 2020-01-28 at 10:19 +0100, wp1080397-lsrs wp1080397-lsrs wrote:
> >> > Dear Valerio, 
> >> > 
> >> > Thanks for the quick answer. If I understood correctly, we may be using 
> >> > an inappropriate approach to parallel programming in the editing 
> >> > process. For now we are aiming to have the data available ASAP; once we 
> >> > have it, we will switch to another approach. 
> >> > 
> >> > The question I asked is about the need to load large data sets: in the 
> >> > case of our private instance, we need to load 20,000,000 items, and at 
> >> > a rate of 10 items per second the approach we are following will 
> >> > require roughly 23 days, with a script writing 24 hours a day. 
> >> > Speaking in big data terms, 20 M is a small data set. 
> >> > 
> >> > So, I leave an open question:
> >> > 
> >> > Is there any experience where a higher edit rate has been achieved?
> >> > 
> >> > Best regards
> >> > 
> >> > 
> >> > > Valerio Bozzolan <[email protected]> wrote on 28 January 2020 
> >> > > at 09:28:
> >> > > 
> >> > > 
> >> > > Please note that - AFAIK - parallel requests are not well accepted.
> >> > > 
> >> > > https://www.mediawiki.org/wiki/API:Etiquette
> >> > > 
> >> > > (You may have a bigger problem now :^)
> >> > > 
> >> > > On Tue, 2020-01-28 at 08:13 +0100, wp1080397-lsrs wp1080397-lsrs 
> >> > > wrote:
> >> > > > Dear friends, 
> >> > > > We have been working for some months on a Wikidata project, and we 
> >> > > > have found an issue with editing performance. I began working with 
> >> > > > the Wikidata Java API, and when I tried to increase the editing 
> >> > > > speed, the Java system held back edits and inserted delays, which 
> >> > > > reduced the edit output as well. 
> >> > > > I then chose to edit with pywikibot, but in my experience this 
> >> > > > reduced the edit rate even more.
> >> > > > In the end we used the procedure indicated here:
> >> > > > https://www.mediawiki.org/wiki/API:Edit#Example
> >> > > > with multithreading, and we reached a maximum of 10.6 edits per 
> >> > > > second. 
> >> > > > My question is whether anyone has experience achieving a higher 
> >> > > > speed.
> >> > > > Currently we need to write 1,500,000 items, and we would require 5 
> >> > > > working days for such a task.
> >> > > > Best regards
> >> > > > Luis Ramos
> >> > > > Senior Java Developer
> >> > > > (Semantic Web Developer)
> >> > > > PST.AG
> >> > > > Jena, Germany. 
> >> > > > 
> >> > 
> >> > Luis Ramos
> >> > Senior Java Developer (Semantic Web Developer)
> >> > PST.AG
> >> > Jena, Germany.
> >> > 
> >
> >Luis Ramos
> >Senior Java Developer (Semantic Web Developer)
> >PST.AG
> >Jena, Germany.
> 
> -- 
> E-mail sent from the "K-9 Mail" app from F-Droid, installed on my LineageOS 
> device without proprietary Google apps. I'm delivering through my Postfix 
> mail server running on Debian GNU/Linux.
> 
> Have fun with software freedom!
> 
> [[User:Valerio Bozzolan]]
> 

Luis Ramos
Senior Java Developer (Semantic Web Developer)
PST.AG
Jena, Germany.

_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
