Re: [Virtuoso-users] Loading relational data into Virtuoso

2014-09-08 Thread Medha Atre
Hi Hugh,

This is in reference to our previous conversation about using Virtuoso
as a pure DB store and using SQL commands.

Basically I want to upload my own raw integer valued 3-column table
(it's RDF data parsed and converted to integer IDs, as I don't want
Virtuoso or any system to deal with the strings or URIs/IRIs).
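
A minimal sketch of the preprocessing described above: each RDF term is mapped to an integer ID and the result is written out as a three-column integer table in CSV form. The function names `encode_triples` and `rows_to_csv`, the sample triples, and the single shared ID space for subjects, predicates, and objects are assumptions for illustration, not something stated in this thread.

```python
import csv
import io


def encode_triples(triples):
    """Map every distinct term to a dense integer ID, assigned in first-seen order."""
    ids = {}
    rows = []
    for s, p, o in triples:
        # setdefault assigns the next free ID the first time a term appears
        # and returns the existing ID on later occurrences.
        rows.append(tuple(ids.setdefault(term, len(ids) + 1) for term in (s, p, o)))
    return ids, rows


def rows_to_csv(rows):
    """Serialize the integer rows as CSV text, one S,P,O triple per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data, not taken from the thread.
sample = [
    ("<http://example.org/a>", "<http://example.org/knows>", "<http://example.org/b>"),
    ("<http://example.org/b>", "<http://example.org/knows>", "<http://example.org/a>"),
]
ids, rows = encode_triples(sample)
print(rows_to_csv(rows), end="")
```

The `ids` dictionary would need to be kept alongside the table if the integer results ever have to be translated back to terms.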

I was following the RDF performance tuning instructions given at
http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning
for Virtuoso, and I am confused about two points -

1) What is the difference between datatype IRI_ID_8 and INTEGER?

2) If I want to use INTEGER datatype in my tables (e.g. "CREATE TABLE
rdftest (S INTEGER, P INTEGER, O INTEGER, PRIMARY KEY (P,S,O)
COLUMN)"), how should the ALTER INDEX command like "ALTER INDEX rdf_s
ON rdftest PARTITION (S INT (0hex00));" be re-written? What does
"S INT (0hex00)" do? Does it depend on the datatype of the column
or does it only specify the blocksize of partition?

Could you please clarify this for me?

Thanks.
Medha



On Mon, Aug 4, 2014 at 6:45 PM, Hugh Williams  wrote:
> Hi Medha,
>
> What then is the form of the SQL data you are seeking to load? If it is
> standard ANSI SQL, it can be loaded using the "load" command of the isql
> command line tool:
>
> http://docs.openlinksw.com/virtuoso/isql.html#isqlcommands
>
> or if in CSV form it can be loaded with the CSV loader:
>
> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtCsvFileBulkLoader
>
> The following white paper details the Virtuoso RDF Quad Store implementation
> on top of its SQL engine:
>
> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSRDFWP
>
> What do make and make install report, then? I assume the compilation
> completed without errors and that you are running make install as a user that
> has access to the location you are seeking to install to? Is there a
> Virtuoso binary in ~/binsrc/virtuoso/virtuoso-t, as that would confirm the
> compilation at least completed successfully?
>
>
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software, Inc.  //  http://www.openlinksw.com/
> Weblog   -- http://www.openlinksw.com/blogs/
> LinkedIn -- http://www.linkedin.com/company/openlink-software/
> Twitter  -- http://twitter.com/OpenLink
> Google+  -- http://plus.google.com/100570109519069333827/
> Facebook -- http://www.facebook.com/OpenLinkSoftware
> Universal Data Access, Integration, and Management Technology Providers
>
> On 4 Aug 2014, at 13:07, Medha Atre  wrote:
>
> I do NOT want to map relational data to RDF!
>
> I simply want to use Virtuoso opensource as _a relational DB_! But all
> the tutorials and instruction pages have only information of handling
> RDF data. I believe Virtuoso is a native relational DB and support for
> RDF is added "on top of it", am I right? So I want to use the native
> relational DB.
>
> E.g. I want to load a relational table with say 5 columns A, B, C, D,
> E in Virtuoso and run SQL queries on that data directly by creating
> indexes etc.
>
> No RDF conversion required!
> 
> I have a couple more questions/problems. My next question is unrelated
> to the previous one.
>
> 1. If I want to load RDF data into Virtuoso opensource, are strings
> and URIs in the RDF database mapped first to hash or integer (or some
> IDs), and stored in that form? If yes, then if I run a SPARQL query,
> say "select ?s ?o where {?s ?p ?o}", can I get values of ?s and ?o in
> the hash or integer ID forms instead of string or URI forms? (This is
> for the sake of measuring the "raw" query evaluation speed of
> Virtuoso, where I do not want to account for the "ID --> String"
> conversion time.)
>
> 2. I tried to compile and setup Virtuoso-opensource 7.1 on Ubuntu
> 12.04 LTS. I ran "autogen.sh", and "./configure
> --prefix=/work/tools/virtinstall
> --program-transform-name="s/isql/isql-v", and then "make", and "make
> install", but even after all the steps completing successfully, I have
> an EMPTY "bin" directory under "/work/tools/virtinstall" (the "prefix"
> install path given to the configure script).
>
> Why is this happening? The "autogen", "configure", and "make" do not
> show any errors, and all the required dependencies as mentioned on
> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSMake#Package%20Dependencies
> are installed on my machine!
>
> Could you please let me know the answers to these questions?
>
> Thanks.
> Medha
>
>
>
> On Mon, Aug 4, 2014 at 5:46 PM, Hugh Williams 
> wrote:
>
> Hi Medha,
>
> Virtuoso open source does not have the necessary Virtual Database (VDB)
> support for mapping relational databases (transiently or persistently) to
> RDF using Virtuoso Linked Data Views [1][2], as this is a commercial-only
> feature [3]. The commercial product is available for evaluation download from [4].
>
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software, Inc.  //  

Re: [Virtuoso-users] importing Freebase RDF dump: slows down, memleak?

2014-09-08 Thread Jörn Hees

On 2 Sep 2014, at 00:01, Hugh Williams  wrote:

>>> Development indicate your suggestion is not without merit but 
>>> implementation is not as simple as it may seem, as the indexes are not all 
>>> sequential, but something like that could possibly be implemented. It is 
>>> suggested you try dropping the indexes on the RDF_QUAD table, loading the 
>>> Freebase datasets and then recreating the indexes after loading, which would 
>>> require a smaller working set that would better fit into the 32GB RAM 
>>> available. The commands for dropping the necessary indexes are:
>>> 
>>> drop index rdf_quad_pogs;
>>> drop index rdf_quad_sp;
>>> drop index rdf_quad_op;
>>> drop index rdf_quad_gs;
>>> 
>>> and the respective indexes can then be recreated as detailed at:
>>> 
>>> 
>>> http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning?#RDF%20Index%20Scheme
>>> 
>>> Note that, this being v7, you need to recreate the indexes as column-wise 
>>> indexes. Let us know how this works for you.
>> 
>> Cool, will try.
> 
> [Hugh] OK, let us know the outcome ...

After 5 days:

[Sun Sep  7 23:43:31 2014] virtuoso-t[11495]: segfault at  ip 
008c3a8e sp 7f4ec2f79d20 error 7 in virtuoso-t[40+b47000]

yay...

Performance also breaks down at some point with dropped indexes and 2 
run_rdf_loaders as suggested.
I'll append the output of `select * from DB.DBA.LOAD_LIST;` but for now I give 
up...

Cheers,
Jörn
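
To make the slowdown visible from the `LOAD_LIST` output appended to this message, the per-file load time can be computed from `ll_started` and `ll_done`. The following is only a sketch: the helper names `parse_ts` and `load_seconds` are made up for illustration, the timestamp layout is inferred from how the values are printed in the listing, and the trailing fractional field is ignored because its unit is not stated in this thread.

```python
from datetime import datetime

# Assumed layout of ll_started / ll_done as printed in the listing:
# "YYYY.M.D H:M.S <fraction>". Only the first two fields are parsed.
TS_FORMAT = "%Y.%m.%d %H:%M.%S"


def parse_ts(text):
    """Parse 'YYYY.M.D H:M.S fraction', dropping the fractional field."""
    date_part, time_part = text.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", TS_FORMAT)


def load_seconds(started, done):
    """Whole seconds between ll_started and ll_done for one file."""
    return int((parse_ts(done) - parse_ts(started)).total_seconds())


# (file suffix, ll_started, ll_done) copied from three rows of the listing.
rows = [
    ("aa", "2014.9.2 17:20.13 927132000", "2014.9.2 17:34.28 11121000"),
    ("ab", "2014.9.2 17:20.59 818941000", "2014.9.2 17:35.42 994563000"),
    ("ai", "2014.9.2 18:29.32 321036000", "2014.9.2 18:54.37 54526000"),
]
for name, started, done in rows:
    print(name, load_seconds(started, done))
```

For these three rows the sketch reports 855, 883, and 1505 seconds respectively, so the ninth file already takes noticeably longer than the first two.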


ll_file
  ll_graph
  ll_state  ll_started  ll_done  ll_host
  ll_work_time  ll_error
VARCHAR NOT NULL
  VARCHAR
  INTEGER  TIMESTAMP  TIMESTAMP  INTEGER
  INTEGER  VARCHAR
___

/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.aa.nt.gz
  http://rdf.freebase.com
  2  2014.9.2 17:20.13 927132000  2014.9.2 17:34.28 11121000  0
  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ab.nt.gz
  http://rdf.freebase.com
  2  2014.9.2 17:20.59 818941000  2014.9.2 17:35.42 994563000  0
  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ac.nt.gz
  http://rdf.freebase.com
  2  2014.9.2 17:34.28 46244000  2014.9.2 17:52.1 4205  0
  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ad.nt.gz
  http://rdf.freebase.com
  2  2014.9.2 17:35.43 10491000  2014.9.2 17:53.58 266217000  0
  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ae.nt.gz
  http://rdf.freebase.com
  2  2014.9.2 17:52.1 4551  2014.9.2 18:11.45 21522000  0
  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.af.nt.gz
  http://rdf.freebase.com
  2  2014.9.2 17:53.58 27075  2014.9.2 18:14.13 26648  0
  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ag.nt.gz
  http://rdf.freebase.com
  2  2014.9.2 18:11.45 25765000  2014.9.2 18:29.32 312824000  0
  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ah.nt.gz
  http://rdf.freebase.com
  2  2014.9.2 18:14.13 27152  2014.9.2 18:34.51 216078000  0
  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ai.nt.gz
  http://rdf.freebase.com
  2  2014.9.2 18:29.32 321036000  2014.9.2 18:54.37 54526000  0
  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.aj.nt.gz
  http://rdf.freebase.com
  2  2014.9.2 18:34.51 220487000  2014.9.2 18:54.44 130952000  0
  NULL  NULL
/usr/local/data/datasets/remote/freebase/2014-08-20/splitted/freebase-rdf-2014-08-17-00-00.ak.nt.gz
  http