Hi Roberto,

You only need to run the ld_dir() function once, on a single Virtuoso server
instance. It records every dataset to be loaded in the "load_list" table,
together with its status (i.e. not loaded, in progress, complete). This lets
multiple clients load datasets in parallel: each one picks a dataset whose
status is "not loaded", loads it, and keeps taking items off the queue until
all of them are complete. So, to load multiple datasets in parallel, you can
use a script of the following form:

$ more btc_load.sh  
isql 1162 dba dba exec="rdf_loader_run();" & 
isql 1162 dba dba exec="rdf_loader_run();" & 
isql 1162 dba dba exec="rdf_loader_run();" & 
isql 1162 dba dba exec="rdf_loader_run();" & 
isql 1162 dba dba exec="rdf_loader_run();" & 
isql 1162 dba dba exec="rdf_loader_run();" & 
isql 1162 dba dba exec="rdf_loader_run();" & 
isql 1162 dba dba exec="rdf_loader_run();" & 
wait 
isql 1162 dba dba exec="checkpoint;" 
$ 
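The one-off registration step and the queue it creates can be sketched as two small shell helpers. This is an illustrative sketch, not part of the original script: the helper names (vsql, register_dump, show_queue), the ISQL/PORT/DBA_USER/DBA_PASS overrides and the example graph IRI are assumptions, and the defaults simply mirror btc_load.sh (port 1162, dba/dba). The ll_state codes in the comment follow the bulk loader documentation.

```shell
#!/bin/sh
# Sketch only: register a dump directory once, then inspect the load queue.
# Helper names and the ISQL/PORT/DBA_USER/DBA_PASS overrides are hypothetical;
# the defaults mirror btc_load.sh (port 1162, user dba, password dba).

# Run one SQL statement through Virtuoso's isql client.
vsql() {
  "${ISQL:-isql}" "${PORT:-1162}" "${DBA_USER:-dba}" "${DBA_PASS:-dba}" exec="$1"
}

# Run ONCE, from a single session: ld_dir(path, mask, graph) adds every
# matching file in the directory to DB.DBA.load_list for the given graph IRI.
register_dump() {
  vsql "ld_dir('$1', '$2', '$3');"
}

# Summarise the queue by state (0 = not loaded, 1 = in progress, 2 = done).
show_queue() {
  vsql "select ll_state, count(*) from DB.DBA.load_list group by ll_state;"
}
```

For example, register_dump /dbpedia-dump/3.7/en '*.nt' http://dbpedia.org would be run once before starting the loaders (the graph IRI is only an example), and show_queue can be run at any time to watch the "not loaded" count fall. Setting ISQL=echo prints the isql command lines instead of executing them.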

Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc.      //              http://www.openlinksw.com/
10 Burlington Mall Road, Suite 265, Burlington MA 01803
Weblog   -- http://www.openlinksw.com/blogs/
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter  -- http://twitter.com/OpenLink
Google+  -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers

On 29 Feb 2012, at 18:30, Roberto Mirizzi wrote:

> Hugh,
> 
> On 29/02/2012 14:59, Hugh Williams wrote:
>> 
>> Hi Roberto,
>> 
>> Run the "status('');" command from the isql prompt, which should show the
>> number of buffers allocated and used. I suspect you will find it is 2000
>> (the default) rather than 1360000, as the leading spaces on the
>> NumberOfBuffers and MaxDirtyBuffers lines need to be removed, i.e. they
>> should be:
> 
> It's exactly as you said. I'll try to modify this value and see what happens.
> 
> Thanks a lot also for the hint about rdf_loader_run().
> In this case, are you suggesting that I open several isql prompts,
> distribute the files to load across the processes with ld_add('file1'),
> ld_add('file2'), etc., and then launch rdf_loader_run() from each prompt?
> Or is the ld_add() step not necessary?
> 
> Thank you very much,
> roberto
> 
> 
>> 
>>> ;; Uncomment next two lines if there is 16 GB system memory free
>>> NumberOfBuffers          = 1360000
>>> MaxDirtyBuffers          = 1000000
>> 
>> This is a known issue, to be fixed in the next release.
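The indentation problem can be caught before restarting the server with a quick check of virtuoso.ini. A minimal sketch, assuming a POSIX shell and a grep that supports -E; the function name check_ini_indent is hypothetical. It only inspects the file; the values the running server actually uses still need to be confirmed with status('') from isql.

```shell
#!/bin/sh
# Sketch: flag buffer settings in virtuoso.ini that begin with whitespace.
# In this thread such indented lines were not picked up, so the server kept
# the default of 2000 buffers. The function name is hypothetical.

check_ini_indent() {
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 2
  fi
  # -n prints line numbers; -E enables the (A|B) alternation.
  if grep -nE '^[[:space:]]+(NumberOfBuffers|MaxDirtyBuffers)[[:space:]]*=' "$1"; then
    echo "remove the leading whitespace from the line(s) above" >&2
    return 1
  fi
  return 0
}
```

For example, check_ini_indent virtuoso.ini returns 1 and prints the offending lines with their line numbers, or returns 0 when both settings start at the beginning of the line.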
>> 
>> Then restart Virtuoso and try again. Note that if you are loading many
>> datasets, you can also run multiple rdf_loader_run() instances, one per
>> processor core, to load the data in parallel and so finish sooner.
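Following the one-loader-per-core advice above, the fixed list of eight isql lines in btc_load.sh can be written as a loop. A minimal sketch, assuming GNU coreutils nproc for the core count; the function name run_parallel_load and the ISQL/PORT/DBA_USER/DBA_PASS overrides are hypothetical, with defaults taken from btc_load.sh. Like the original script, it assumes ld_dir() has already been run once to fill the load_list queue.

```shell
#!/bin/sh
# Sketch: start N rdf_loader_run() sessions (default: one per CPU core),
# wait for all of them, then checkpoint. Function name and overrides are
# hypothetical; defaults mirror btc_load.sh.

run_parallel_load() {
  loaders="${1:-$(nproc)}"
  isql_bin="${ISQL:-isql}"
  port="${PORT:-1162}"
  user="${DBA_USER:-dba}"
  pass="${DBA_PASS:-dba}"

  i=0
  while [ "$i" -lt "$loaders" ]; do
    # Each background session keeps taking not-yet-loaded files from
    # DB.DBA.load_list until the queue is empty.
    "$isql_bin" "$port" "$user" "$pass" exec="rdf_loader_run();" &
    i=$((i + 1))
  done

  # Block until every loader has finished, then write the loaded data
  # to the database file with a checkpoint.
  wait
  "$isql_bin" "$port" "$user" "$pass" exec="checkpoint;"
}
```

For example, run_parallel_load with no argument starts one loader per core, while run_parallel_load 8 issues the same commands as btc_load.sh.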
>> 
>> Best Regards
>> Hugh Williams
>> 
>> On 29 Feb 2012, at 13:03, Roberto Mirizzi wrote:
>> 
>>> Hi Hugh,
>>> 
>>> On 29/02/2012 13:51, Hugh Williams wrote:
>>>> Hi Roberto,
>>>> 
>>>> Have you tuned your Virtuoso server for hosting large datasets, as
>>>> indicated in the bulk loader document, which directs you to:
>>>>
>>>>  http://www.openlinksw.com/dataspace/dav/wiki/Main/VirtRDFPerformanceTuning
>>>>
>>>> It sounds as if the server is under-resourced.
>>> 
>>> Thank you very much for your answer.
>>> Since our server has 32 GB of RAM, I set these values for the buffers:
>>> 
>>> ;; Uncomment next two lines if there is 16 GB system memory free
>>>       NumberOfBuffers          = 1360000
>>>       MaxDirtyBuffers          = 1000000
>>> 
>>> Then,
>>> [Database]
>>> Striping = 0
>>> 
>>> [Parameters]
>>> ServerThreads = 100
>>> DirsAllowed = path-to-the-dump
>>> 
>>> [HTTPServer]
>>> ServerThreads = 100
>>> 
>>> [SPARQL]
>>> ResultSetMaxRows                = 120000
>>> MaxQueryCostEstimationTime      = 60000 ; in seconds
>>> MaxQueryExecutionTime           = 600   ; in seconds
>>> 
>>> 
>>> The rest was not modified.
>>> Everything seems to be correctly tuned. :-(
>>> 
>>> 
>>> regards,
>>> roberto
>>> 
>>> 
>>> 
>>>> 
>>>> The LOD Cloud Cache (lod.openlinksw.com) is currently down for a major
>>>> Virtuoso Server update and a load of 50+ billion triples, as announced
>>>> on Twitter etc. It should be back online soon ...
>>>> 
>>>> Best Regards
>>>> Hugh Williams
>>>> 
>>>> On 29 Feb 2012, at 12:27, Roberto Mirizzi wrote:
>>>> 
>>>>> Hi all,
>>>>> I think one of our IP addresses has been blacklisted by the DBpedia
>>>>> servers. We use these addresses just for research purposes within my
>>>>> university. Who should I contact to kindly ask for it to be re-enabled?
>>>>>
>>>>> OK, the obvious answer to this question could be: "install a local dump
>>>>> of DBpedia and don't bother the DBpedia server".
>>>>> Well, that's what I would really like to do. We used to have the DBpedia
>>>>> 3.5 dump. Then we recently decided to install a brand-new version with
>>>>> the 3.7 dump. That's when the nightmare started. :-) Here's my story.
>>>>> 
>>>>> I successfully installed Virtuoso Open Source 6.1.4 (the latest version)
>>>>> on a 64-bit Ubuntu Linux 10.04 distribution with 32 GB of RAM.
>>>>> Then I tried several times to follow the instructions at:
>>>>> http://www.openlinksw.com/dataspace/dav/wiki/Main/VirtBulkRDFLoaderExampleDbpedia
>>>>> (I successfully did the same a couple of years ago for dump 3.5).
>>>>> Unfortunately, after an hour or two of correct execution, the
>>>>> rdf_loader_run() procedure gets stuck. The virtuoso-t process still
>>>>> appears to be alive ("ps aux|grep virtuoso" shows that it has not been
>>>>> killed), but everything else concerning Virtuoso seems to be dead: the
>>>>> web interface at http://localhost:8890 no longer responds, "top" no
>>>>> longer shows "virtuoso" (at the beginning it used 100% of the CPU), and
>>>>> "isql-v" lets me log in correctly, but then the commands do not respond.
>>>>> The virtuoso.log file does not show anything wrong.
>>>>> 
>>>>> Finally, I've observed (from the virtuoso.log file) that some of the .nt
>>>>> files of the dump contain incorrect triples. For example, I get this
>>>>> error message:
>>>>> File /dbpedia-dump/3.7/en/external_links_en.nt error 23000 SR133:  Can
>>>>> not set NULL to not nullable column 'DB.DBA.RDF_QUAD.O'
>>>>> 
>>>>> The problem is that when such an error is encountered, I think the
>>>>> loading of that file stops. In other words, I could lose
>>>>> important triples.
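A first step towards seeing which triples might be missing is to find out which files reported an error. A minimal sketch, assuming the column names documented for the bulk loader's DB.DBA.load_list table (ll_file, ll_error) and the same isql connection settings as btc_load.sh; the function name list_failed_files and the ISQL/PORT/DBA_USER/DBA_PASS overrides are hypothetical. It only lists the affected files; it does not repair or re-queue them.

```shell
#!/bin/sh
# Sketch: list files whose bulk load recorded an error in DB.DBA.load_list.
# Function name and overrides are hypothetical; defaults mirror btc_load.sh.

list_failed_files() {
  "${ISQL:-isql}" "${PORT:-1162}" "${DBA_USER:-dba}" "${DBA_PASS:-dba}" \
    exec="select ll_file, ll_error from DB.DBA.load_list where ll_error is not null;"
}
```

Running list_failed_files after the loaders finish would show, for example, whether external_links_en.nt is the only file affected.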
>>>>> 
>>>>> Does anyone have any experience, successful or not, of installing
>>>>> DBpedia dump 3.7 on Virtuoso?
>>>>> 
>>>>> P.S. Where is the beloved openlinksw endpoint?
>>>>> http://lod.openlinksw.com/sparql/
>>>>> 
>>>>> 
>>>>> Thanks in advance,
>>>>> roberto
>>>>> 
>>>>> -- 
>>>>> Roberto Mirizzi
>>>>> Politecnico of Bari
>>>>> http://sisinflab.poliba.it/mirizzi
>>>>> 
>>>>> 
>>>>> 
>>>>> ------------------------------------------------------------------------------
>>>>> Virtualization&  Cloud Management Using Capacity Planning
>>>>> Cloud computing makes use of virtualization - but cloud computing
>>>>> also focuses on allowing computing to be delivered as a service.
>>>>> http://www.accelacomm.com/jaw/sfnl/114/51521223/
>>>>> _______________________________________________
>>>>> Dbpedia-discussion mailing list
>>>>> Dbpedia-discussion@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>> 
>> 
> 
