Re: [Virtuoso-users] full-text indexing after mass import (was: rdf_loader_run logs "missed delete of name id cache" messages)
Jörn,

> On 30 Nov 2015, at 15:47, Jörn Hees wrote:
>
> Hi,
>
> any ideas on this?
>
>> On 23 Nov 2015, at 04:01, Jörn Hees wrote:
>>
>> Reading the doc page, i have two remaining questions:
>>
>> After a normal `rdf_loader_run()`, would a
>> `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` be sufficient to get a complete
>> full-text index? Or do i have to run `DB.DBA.RDF_OBJ_FT_RECOVER();` in
>> those cases and will otherwise never arrive at a complete free-text index
>> (not even after the background tasks finished?)?
>>>
>>> [Hugh] The scheduler will run `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` so
>>> you can wait for it to run or run it manually yourself.
>>
>> Yes, that's what i understood, but in the
>> http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext
>> there's this remark:
>>
>>> One problem related to free-text indexing of DB.DBA.RDF_QUAD is that some
>>> applications (e.g. those that import billions of triples) may set off
>>> triggers. This will make free-text index data incomplete. Calling procedure
>>> DB.DBA.RDF_OBJ_FT_RECOVER () will insert all missing free-text index items
>>> by dropping and re-inserting every existing free-text index rule.
>>
>> So i'm asking: is `rdf_loader_run();` one of those "applications" which
>> deactivate some triggers leading to an incomplete full-text index and need
>> `DB.DBA.RDF_OBJ_FT_RECOVER();` to be called?
>
> Or is it enough to wait or call `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();`?

[Hugh] Calling `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` is all that is required
for any new data to get re-indexed. It is the same procedure that the scheduler
event calls, which fires every minute to keep the index up to date …

Regards
Hugh

> Cheers,
> Jörn

___
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users
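For reference, the sequence discussed in this reply can be sketched as isql commands. This is a minimal sketch rather than an official recipe: it assumes the files to load were already registered beforehand (for example with `ld_dir()`, which is not mentioned in this thread) and that a free-text index rule is already in place on the instance.

```sql
-- Run the bulk load; assumes the load list was filled beforehand
-- (e.g. via ld_dir(), an assumption not taken from this thread).
rdf_loader_run();

-- Make the loaded data durable before indexing.
checkpoint;

-- Update the free-text index now instead of waiting for the scheduled
-- event that, according to Hugh, fires every minute.
DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();

-- Persist the updated index as well.
checkpoint;
```

`DB.DBA.RDF_OBJ_FT_RECOVER();` is left out on purpose: going by Hugh's answer, it should not be needed after a normal `rdf_loader_run()`.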
Re: [Virtuoso-users] full-text indexing after mass import (was: rdf_loader_run logs "missed delete of name id cache" messages)
Hi,

any ideas on this?

> On 23 Nov 2015, at 04:01, Jörn Hees wrote:
>
>>> Reading the doc page, i have two remaining questions:
>>>
>>> After a normal `rdf_loader_run()`, would a
>>> `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` be sufficient to get a complete
>>> full-text index? Or do i have to run `DB.DBA.RDF_OBJ_FT_RECOVER();` in
>>> those cases and will otherwise never arrive at a complete free-text index
>>> (not even after the background tasks finished?)?
>>
>> [Hugh] The scheduler will run `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` so you
>> can wait for it to run or run it manually yourself.
>
> Yes, that's what i understood, but in the
> http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext
> there's this remark:
>
>> One problem related to free-text indexing of DB.DBA.RDF_QUAD is that some
>> applications (e.g. those that import billions of triples) may set off
>> triggers. This will make free-text index data incomplete. Calling procedure
>> DB.DBA.RDF_OBJ_FT_RECOVER () will insert all missing free-text index items
>> by dropping and re-inserting every existing free-text index rule.
>
> So i'm asking: is `rdf_loader_run();` one of those "applications" which
> deactivate some triggers leading to an incomplete full-text index and need
> `DB.DBA.RDF_OBJ_FT_RECOVER();` to be called?

Or is it enough to wait or call `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();`?

Cheers,
Jörn
Re: [Virtuoso-users] full-text indexing after mass import (was: rdf_loader_run logs "missed delete of name id cache" messages)
Hi,

> On 23 Nov 2015, at 03:36, Hugh Williams wrote:
>
>>> http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext
>>
>> I really think that this could be it, as by default there seems to be an
>> "all" index.
>
> [Hugh] If you installed the Virtuoso Faceted Browser then the FT index would
> be enabled and run as a scheduled job.

It's a "vanilla" virtuoso-server as installed from the .deb packages of the
7.2.1 sources; the only VAD package installed is the conductor.

>> Reading the doc page, i have two remaining questions:
>>
>> After a normal `rdf_loader_run()`, would a
>> `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` be sufficient to get a complete
>> full-text index? Or do i have to run `DB.DBA.RDF_OBJ_FT_RECOVER();` in those
>> cases and will otherwise never arrive at a complete free-text index (not
>> even after the background tasks finished?)?
>
> [Hugh] The scheduler will run `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` so you
> can wait for it to run or run it manually yourself.

Yes, that's what i understood, but in the
http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext
there's this remark:

> One problem related to free-text indexing of DB.DBA.RDF_QUAD is that some
> applications (e.g. those that import billions of triples) may set off
> triggers. This will make free-text index data incomplete. Calling procedure
> DB.DBA.RDF_OBJ_FT_RECOVER () will insert all missing free-text index items by
> dropping and re-inserting every existing free-text index rule.

So i'm asking: is `rdf_loader_run();` one of those "applications" which
deactivate some triggers leading to an incomplete full-text index and need
`DB.DBA.RDF_OBJ_FT_RECOVER();` to be called? Or are the only two interesting
alternatives for me waiting or calling `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();`?

>> If i have to, a mention of this around
>> http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloadinglod
>> would be nice.
>>
>> I ran `DB.DBA.RDF_OBJ_FT_RECOVER();` on a small instance with just the
>> DBpedia core (~ 430 M triples) and it seems to only use 2 - 3 CPUs with very
>> little IO. The whole importing of that dataset only took 1:30 hours, but the
>> full-text indexing is still running after 3 hours now... Is there any way to
>> go full speed at the cost of locking the whole DB or something?
>
> [Hugh] Will have to check with development as I am not aware of a param to
> control CPU usage; I would have thought it should run with full platform
> utilisation

Thanks,
Jörn
Re: [Virtuoso-users] full-text indexing after mass import (was: rdf_loader_run logs "missed delete of name id cache" messages)
Jörn,

> On 23 Nov 2015, at 01:46, Jörn Hees wrote:
>
> Hi Hugh,
>
> thanks again for your replies...
>
>> On 22 Nov 2015, at 01:46, Hugh Williams wrote:
>>
>>> What puzzles me is that after import and several checkpoints and restarts,
>>> just leaving the DB idle without any queries (see below) it seems to become
>>> busy.
>>> I guess it does some kind of "re-organization" and i'd mostly like to find
>>> out how i can tell it "do it now, take all resources you want, don't care
>>> if anyone is waiting, admin override, full speed ;)".
>>> That would allow me to then have that static state of the DB which i can
>>> back-up and replay if things go wrong or someone wants an old version,
>>> leaving us with "ready to use" backups, and not ones that first start some
>>> lengthy "re-organization after mass import".
>>>
>>> The mentioned "re-organization state" now seems to be over after leaving
>>> the DB switched on and idle for the last couple of days.
>>
>> [Hugh] Does your database have Full Text indexing enabled? That is a
>> scheduled background task that would take time to complete on a newly loaded
>> large database like yours, see:
>>
>> http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext
>
> I really think that this could be it, as by default there seems to be an
> "all" index.

[Hugh] If you installed the Virtuoso Faceted Browser then the FT index would
be enabled and run as a scheduled job.

> Reading the doc page, i have two remaining questions:
>
> After a normal `rdf_loader_run()`, would a
> `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` be sufficient to get a complete
> full-text index? Or do i have to run `DB.DBA.RDF_OBJ_FT_RECOVER();` in those
> cases and will otherwise never arrive at a complete free-text index (not even
> after the background tasks finished?)?

[Hugh] The scheduler will run `DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` so you
can wait for it to run or run it manually yourself.

> If i have to, a mention of this around
> http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloadinglod
> would be nice.
>
> I ran `DB.DBA.RDF_OBJ_FT_RECOVER();` on a small instance with just the
> DBpedia core (~ 430 M triples) and it seems to only use 2 - 3 CPUs with very
> little IO. The whole importing of that dataset only took 1:30 hours, but the
> full-text indexing is still running after 3 hours now... Is there any way to
> go full speed at the cost of locking the whole DB or something?

[Hugh] Will have to check with development as I am not aware of a param to
control CPU usage; I would have thought it should run with full platform
utilisation …

Regards
Hugh

> Cheers,
> Jörn
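Whether free-text indexing is actually enabled on an instance can be checked from isql before waiting on the scheduler. The following is a minimal sketch, assuming the rules live in the `DB.DBA.RDF_OBJ_FT_RULES` table and are added with `DB.DBA.RDF_OBJ_FT_RULE_ADD`, as described on the doc page linked above. Neither name comes from this thread, and the rule name `'All'` is only illustrative.

```sql
-- List the free-text index rules currently defined on this instance.
-- Table name is an assumption based on the linked documentation.
SELECT * FROM DB.DBA.RDF_OBJ_FT_RULES;

-- Illustrative only: add a rule covering all graphs and all predicates.
-- Needed only if the query above returns no rules.
DB.DBA.RDF_OBJ_FT_RULE_ADD (null, null, 'All');
```

If the first query returns no rows, no free-text rule is defined and the background indexing described here would have nothing to process.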
[Virtuoso-users] full-text indexing after mass import (was: rdf_loader_run logs "missed delete of name id cache" messages)
Hi Hugh,

thanks again for your replies...

> On 22 Nov 2015, at 01:46, Hugh Williams wrote:
>
>> What puzzles me is that after import and several checkpoints and restarts,
>> just leaving the DB idle without any queries (see below) it seems to become
>> busy.
>> I guess it does some kind of "re-organization" and i'd mostly like to find
>> out how i can tell it "do it now, take all resources you want, don't care if
>> anyone is waiting, admin override, full speed ;)".
>> That would allow me to then have that static state of the DB which i can
>> back-up and replay if things go wrong or someone wants an old version,
>> leaving us with "ready to use" backups, and not ones that first start some
>> lengthy "re-organization after mass import".
>>
>> The mentioned "re-organization state" now seems to be over after leaving the
>> DB switched on and idle for the last couple of days.
>
> [Hugh] Does your database have Full Text indexing enabled? That is a
> scheduled background task that would take time to complete on a newly loaded
> large database like yours, see:
>
> http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext

I really think that this could be it, as by default there seems to be an
"all" index.

Reading the doc page, i have two remaining questions:

After a normal `rdf_loader_run()`, would a
`DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ();` be sufficient to get a complete
full-text index? Or do i have to run `DB.DBA.RDF_OBJ_FT_RECOVER();` in those
cases and will otherwise never arrive at a complete free-text index (not even
after the background tasks finished?)?

If i have to, a mention of this around
http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfloadinglod
would be nice.

I ran `DB.DBA.RDF_OBJ_FT_RECOVER();` on a small instance with just the
DBpedia core (~ 430 M triples) and it seems to only use 2 - 3 CPUs with very
little IO. The whole importing of that dataset only took 1:30 hours, but the
full-text indexing is still running after 3 hours now... Is there any way to
go full speed at the cost of locking the whole DB or something?

Cheers,
Jörn