Re: [Virtuoso-users] Out of curiosity, some questions about OpenSource Virtuoso settings

2016-09-12 Thread Paul Houle
The trouble with classic http is that pretty frequently you wind up making
a separate TCP connection for each API call.  In theory there was
pipelining in http/1.1 but it did not really work and was rarely
implemented in "web service" scenarios.
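To make the connection-per-call point concrete, here is a minimal sketch using only the Python standard library. It starts a throwaway local server and counts how many TCP connections it accepts for five API calls, first with a fresh connection per call and then with one reused keep-alive connection. The names `CountingServer`, `Handler` and `count_connections` are made up for this illustration and have nothing to do with Virtuoso.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    # HTTP/1.1 plus an explicit Content-Length allows keep-alive connections.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


class CountingServer(ThreadingHTTPServer):
    accepted = 0  # number of TCP connections accepted so far

    def get_request(self):
        request = super().get_request()
        self.accepted += 1
        return request


def count_connections(reuse, calls=5):
    """Make `calls` GET requests and return how many TCP connections were opened."""
    server = CountingServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        if reuse:
            # One connection, many requests.
            conn = http.client.HTTPConnection(host, port)
            for _ in range(calls):
                conn.request("GET", "/")
                conn.getresponse().read()
            conn.close()
        else:
            # The "classic" pattern: a new TCP connection for every call.
            for _ in range(calls):
                conn = http.client.HTTPConnection(host, port)
                conn.request("GET", "/")
                conn.getresponse().read()
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return server.accepted


if __name__ == "__main__":
    print("fresh connection per call:", count_connections(reuse=False))
    print("one reused connection:    ", count_connections(reuse=True))
```

On a real network each extra connection costs at least one more round trip for the TCP handshake, and more again if TLS is involved.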

http/2 has multiplexing (pipelining that actually works),  reduces the # of
round trips to do encryption,  compresses headers,  and has all sorts of
goodness.

Another issue is that http-based protocols have historically been
text-based protocols,  and if there was one useful thing I learned in grad
school it was that you can do an awful lot of FLOPS in the time that it
takes to parse a string like "7.55523135E12".
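As a small illustration of the text-versus-binary point, the sketch below encodes the same number both ways. The text form has to be scanned character by character, while the binary form is a fixed 8-byte IEEE 754 double that is copied straight into place. The helper names are hypothetical, and Python's interpreter overhead hides most of the real cost difference, so this shows the two representations rather than a benchmark.

```python
import struct

TEXT_VALUE = "7.55523135E12"


def encode_binary(value):
    """Pack a float as a little-endian IEEE 754 double (always 8 bytes)."""
    return struct.pack("<d", value)


def decode_binary(payload):
    """Unpack an 8-byte little-endian double; no digit-by-digit parsing needed."""
    return struct.unpack("<d", payload)[0]


def decode_text(text):
    """Parse a decimal string; the parser must walk every character."""
    return float(text)


if __name__ == "__main__":
    payload = encode_binary(decode_text(TEXT_VALUE))
    print(len(TEXT_VALUE), "bytes as text,", len(payload), "bytes as binary")
    print(decode_binary(payload) == decode_text(TEXT_VALUE))
```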

Prior to 2000,  it was common to see a wide range of text-based formats for
structured data such as HL7,  UN/EDIFACT,  the communication packets used
in amateur radio, etc.

In the 2000-2010 period,  XML ate most of that.  In 2010-present,  JSON has
replaced XML in a lot of places.  As people catch on to what JSON-LD really
means (you can paint on the XML semantics that are missing in JSON with a
minimum of bother),  JSON will grow.

Performance-wise you can get much better results with binary formats,  and
people today are even realizing (more slowly than with JSON-LD) that you
can implement binary formats in LD in a secure way in C,  unlike text
formats.

The interesting thing though is that "general" binary formats are nowhere
near the consolidation we've seen with XML and JSON.  Binary XML formats
have been tried and not caught on outside of niches.  A small community
speaks Binary FIX.  There are Protocol Buffers,  Cap'n Proto,  Thrift,
Avro,  MessagePack,  and who knows what else.

Binary formats might never get their XML moment because once you start
caring about performance you care about performance for *your specific
case* and thus it is impossible to pick one protocol that will make
everybody equally happy (or miserable).

Caching is another http family problem.  Once you start caching you run
into (a) the risk that cached data will be invalid,  and (b) there is no
guarantee that perceived performance will be better when you use a cache.
It is very hard for Homo Corporatus to understand that users (e.g.
customers,  employees) experience latency,  not throughput.  (It's anathema
to corporate ideology,  for instance,  that 9 women can't make a baby in 1
month -- I knew Sun was on the skids when they started talking about
"Throughput Computing")

A 5400 rpm hard drive typical in a laptop can spin around about twice in
the time it takes my (awful) DSL connection to round trip to the Azure data
center in Chicago,  and any kind of I/O storm means it can take much longer
than two spins to discover that something is not in the cache.
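Working the numbers: 5400 rpm is 90 revolutions per second, so one spin takes about 11.1 ms and two spins about 22.2 ms, which implies a round trip to the data center of roughly that size. A tiny sketch of the arithmetic (the function name is made up for illustration):

```python
def revolution_ms(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


if __name__ == "__main__":
    one_spin = revolution_ms(5400)
    print(f"one spin:  {one_spin:.1f} ms")
    print(f"two spins: {2 * one_spin:.1f} ms")
```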

It's certainly possible that http caching can improve perceived
performance,  but you can't take it for granted.

On Mon, Sep 12, 2016 at 10:49 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> Tangentially, on JDBC/ODBC vs HTTP: HTTP is slow because of the
> transport, but JDBC/ODBC has more mature models of paging over results,
> pre-fetching, etc. I feel that RDF is mature as a data format, but as a
> protocol it is not that mature.
>
> -----Original Message-----
> From: Hugh Williams [mailto:hwilli...@openlinksw.com]
> Sent: Monday, September 12, 2016 5:30 AM
> To: Lorenz Buehmann 
> Cc: virtuoso-users@lists.sourceforge.net
> Subject: Re: [Virtuoso-users] Out of curiosity, some questions about
> OpenSource Virtuoso settings
>
> Hi Lorenz,
>
> In Virtuoso 7+, vectored query execution [1] enables a single query,
> typically a complex analytical one, to be broken down and executed on
> multiple threads. The INI file parameter that controls this is
> "ThreadsPerQuery" [2], which sets the maximum number of threads that a
> single query can claim from the thread pool; the other associated
> parameters are detailed at [2].
>
> [1] http://docs.openlinksw.com/virtuoso/vexqrparl.html
> [2] http://docs.openlinksw.com/virtuoso/vexqrparlconfp/
>
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software, Inc.  //  http://www.openlinksw.com/
> Weblog   -- http://www.openlinksw.com/blogs/
> LinkedIn -- http://www.linkedin.com/company/openlink-software/
> Twitter  -- http://twitter.com/OpenLink
> Google+  -- http://plus.google.com/100570109519069333827/
> Facebook -- http://www.facebook.com/OpenLinkSoftware
> Universal Data Access, Integration, and Management Technology Providers
>
> > On 12 Sep 2016, at 07:13, Lorenz Buehmann  leipzig.de> wrote:
> >
> > Hi,
> >
> > just as a follow-up question:
> >
> > I know it supports inter-query parallelization, but does it also support
> > intra-query parallelization, i.e. using multiple threads to compute the
> > result of a single query? If yes, which parameter is used to configure this?
> >
> >
> > Cheers,
> >
> > Lorenz
> >
> > On 12.09.2016 05:57, Kingsley Idehen wrote:
> >> On 9/10/16 10:41 PM, giacom...@libero.it
> >>  wrote:
> >>
> >>> So, I have a bunch of question

Re: [Virtuoso-users] Out of curiosity, some questions about OpenSource Virtuoso settings

2016-09-12 Thread Davis, Daniel (NIH/NLM) [C]
Tangentially, on JDBC/ODBC vs HTTP: HTTP is slow because of the transport,
but JDBC/ODBC has more mature models of paging over results, pre-fetching, etc.
I feel that RDF is mature as a data format, but as a protocol it is not that
mature.

-----Original Message-----
From: Hugh Williams [mailto:hwilli...@openlinksw.com] 
Sent: Monday, September 12, 2016 5:30 AM
To: Lorenz Buehmann 
Cc: virtuoso-users@lists.sourceforge.net
Subject: Re: [Virtuoso-users] Out of curiosity, some questions about OpenSource 
Virtuoso settings

Hi Lorenz,

In Virtuoso 7+, vectored query execution [1] enables a single query, typically
a complex analytical one, to be broken down and executed on multiple threads.
The INI file parameter that controls this is "ThreadsPerQuery" [2], which sets
the maximum number of threads that a single query can claim from the thread
pool; the other associated parameters are detailed at [2].

[1] http://docs.openlinksw.com/virtuoso/vexqrparl.html
[2] http://docs.openlinksw.com/virtuoso/vexqrparlconfp/

Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc.  //  http://www.openlinksw.com/
Weblog   -- http://www.openlinksw.com/blogs/
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter  -- http://twitter.com/OpenLink
Google+  -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers

> On 12 Sep 2016, at 07:13, Lorenz Buehmann wrote:
> 
> Hi,
> 
> just as a follow-up question:
> 
> I know it supports inter-query parallelization, but does it also support 
> intra-query parallelization, i.e. using multiple threads to compute the 
> result of a single query? If yes, which parameter is used to configure this?
> 
> 
> Cheers,
> 
> Lorenz
> 
> On 12.09.2016 05:57, Kingsley Idehen wrote:
>> On 9/10/16 10:41 PM, giacom...@libero.it wrote:
>> 
>>> So, I have a bunch of questions about Virtuoso.
>>> 
>>> * Are SPARQL queries performed concurrently (using the standard 
>>> virtuoso.ini configuration)?
>>> 
>> Yes, Virtuoso is highly multi-threaded. In addition, it has
>> vectorized query execution, i.e., many query batches per thread, handled
>> concurrently.
>> 
>>> * Are Virtuoso transactions autocommittable, or are there no
>>> transactions of any kind when connecting through the ODBC driver?
>>> 
>> It depends; by default you have the Read Committed isolation level.
>> 
>>> * Does Virtuoso automatically perform triple indexing before querying
>>> data? In some systems I have to configure it manually.
>>> 
>> No, it has indexes in place.
>> 
>>> * Does the use of ODBC degrade the benchmarked performance of Virtuoso's
>>> SPARQL queries, or the time required to store triples within a given
>>> named graph?
>>> 
>> No, if anything that's faster than HTTP.
>> 
>> Links:
>> 
>> [1] http://docs.openlinksw.com/virtuoso/isolation/
>> [2] http://wikis.openlinksw.com/VirtuosoWikiWeb/ChangeVirtuosoSDefaultTransactionIsolationLevel
>> [3] http://docs.openlinksw.com/virtuoso/fn_log_enable.html
>> [4] http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning
>> [5] http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtBulkRDFLoader#Running multiple Loaders
>> 
>> 
>> Kingsley
>> 
>> 
>>> Thanks in advance for any support,
>>> 
>>> JackB
>>> 
>>> -- ___
>>> Virtuoso-users mailing list
>>> 
>>> Virtuoso-users@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users




Re: [Virtuoso-users] Timeout when listing graphs

2016-09-12 Thread Kingsley Idehen
On 9/12/16 3:50 AM, Pantelis Natsiavas wrote:
> SPARQL SELECT distinct ?graph WHERE { GRAPH ?graph { ?s ?p ?o } };

Please try:

SPARQL SELECT distinct ?graph WHERE { GRAPH ?graph { ?s a ?o } };
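For anyone running the same comparison over HTTP instead of isql, here is a minimal sketch that builds the request URL for both query variants. It assumes a local Virtuoso SPARQL endpoint at `http://localhost:8890/sparql`, so adjust that to your own installation; `graph_list_url` is a hypothetical helper. The leading `SPARQL` keyword is only needed in isql, so it is dropped here. The `?s a ?o` variant only matches `rdf:type` triples, so it has far fewer rows to scan, but it will miss any graph that contains no typed resources.

```python
from urllib.parse import urlencode

# Assumption: a local Virtuoso instance with its SPARQL endpoint here.
DEFAULT_ENDPOINT = "http://localhost:8890/sparql"


def graph_list_url(endpoint=DEFAULT_ENDPOINT, typed_only=True):
    """Build a GET URL that lists graph names.

    typed_only=True uses the cheaper `?s a ?o` pattern suggested above;
    typed_only=False uses the original `?s ?p ?o` pattern.
    """
    pattern = "?s a ?o" if typed_only else "?s ?p ?o"
    query = f"SELECT DISTINCT ?graph WHERE {{ GRAPH ?graph {{ {pattern} }} }}"
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(graph_list_url(typed_only=True))
    print(graph_list_url(typed_only=False))
```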

-- 
Regards,

Kingsley Idehen   
Founder & CEO 
OpenLink Software   (Home Page: http://www.openlinksw.com)

Medium Blog: https://medium.com/@kidehen
Blogspot Blog: http://kidehen.blogspot.com
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this






Re: [Virtuoso-users] Out of curiosity, some questions about OpenSource Virtuoso settings

2016-09-12 Thread Hugh Williams
Hi Lorenz,

In Virtuoso 7+, vectored query execution [1] enables a single query, typically
a complex analytical one, to be broken down and executed on multiple threads.
The INI file parameter that controls this is "ThreadsPerQuery" [2], which sets
the maximum number of threads that a single query can claim from the thread
pool; the other associated parameters are detailed at [2].

[1] http://docs.openlinksw.com/virtuoso/vexqrparl.html
[2] http://docs.openlinksw.com/virtuoso/vexqrparlconfp/
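A quick way to see what a given installation is using is to read the setting back out of the INI file. The sketch below parses an inline sample with Python's `configparser`. The sample values are invented, and placing `ThreadsPerQuery` in the `[Parameters]` section is my assumption, so check it against your own virtuoso.ini and the documentation at [2].

```python
import configparser

# Hypothetical fragment of a virtuoso.ini; the values are examples only.
SAMPLE_INI = """
[Parameters]
ServerThreads   = 20
ThreadsPerQuery = 4
"""


def threads_per_query(ini_text, default=None):
    """Return ThreadsPerQuery from the [Parameters] section, or `default` if unset."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case; configparser lowercases keys by default
    parser.read_string(ini_text)
    return parser.getint("Parameters", "ThreadsPerQuery", fallback=default)


if __name__ == "__main__":
    print("ThreadsPerQuery =", threads_per_query(SAMPLE_INI))
```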

Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc.  //  http://www.openlinksw.com/
Weblog   -- http://www.openlinksw.com/blogs/
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter  -- http://twitter.com/OpenLink
Google+  -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers

> On 12 Sep 2016, at 07:13, Lorenz Buehmann wrote:
> 
> Hi,
> 
> just as a follow-up question:
> 
> I know it supports inter-query parallelization, but does it also support 
> intra-query parallelization, i.e. using multiple threads to compute the 
> result of a single query? If yes, which parameter is used to configure this?
> 
> 
> Cheers,
> 
> Lorenz
> 
> On 12.09.2016 05:57, Kingsley Idehen wrote:
>> On 9/10/16 10:41 PM, giacom...@libero.it wrote:
>> 
>>> So, I have a bunch of questions about Virtuoso.
>>> 
>>> * Are SPARQL queries performed concurrently (using the standard 
>>> virtuoso.ini 
>>> configuration)?
>>> 
>> Yes, Virtuoso is highly multi-threaded. In addition, it has vectorized
>> query execution, i.e., many query batches per thread, handled concurrently.
>> 
>>> * Are Virtuoso transactions autocommittable, or are there no
>>> transactions of any kind when connecting through the ODBC driver?
>>> 
>> It depends; by default you have the Read Committed isolation level.
>> 
>>> * Does Virtuoso automatically perform triple indexing before querying
>>> data? In some systems I have to configure it manually.
>>> 
>> No, it has indexes in place.
>> 
>>> * Does the use of ODBC degrade the benchmarked performance of Virtuoso's
>>> SPARQL queries, or the time required to store triples within a given
>>> named graph?
>>> 
>> No, if anything that's faster than HTTP.
>> 
>> Links:
>> 
>> [1] http://docs.openlinksw.com/virtuoso/isolation/
>> [2] http://wikis.openlinksw.com/VirtuosoWikiWeb/ChangeVirtuosoSDefaultTransactionIsolationLevel
>> [3] http://docs.openlinksw.com/virtuoso/fn_log_enable.html
>> [4] http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning
>> [5] http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtBulkRDFLoader#Running multiple Loaders
>> 
>> 
>> Kingsley
>> 
>> 
>>> Thanks in advance for any support,
>>> 
>>> JackB
>>> 





Re: [Virtuoso-users] Timeout when listing graphs

2016-09-12 Thread Pantelis Natsiavas
Hi Hugh.

I had changed the virtuoso.ini vector parameters in the past because I was
working with large datasets. One of those changes could have caused the
problem, but I cannot say with confidence which one. I have now set the
parameters you suggested, but I still cannot get an answer to the graphs
query in a reasonable time.
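For reference, the change in question can be scripted. This is a minimal sketch that sets the two suggested values in the `[Parameters]` section of an INI text using Python's `configparser`. The sample input and the name `apply_vector_settings` are made up for illustration. Note that `configparser` drops comments when it rewrites a file, so on a real virtuoso.ini it may be safer to edit by hand and restart the server afterwards.

```python
import configparser
import io

# Hypothetical starting point; a real virtuoso.ini has many more settings.
SAMPLE_INI = """
[Parameters]
ServerThreads = 20
VectorSize    = 50000
"""


def apply_vector_settings(ini_text):
    """Return the INI text with VectorSize = 1000 and AdjustVectorSize = 0 set."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve the original key capitalisation
    parser.read_string(ini_text)
    if not parser.has_section("Parameters"):
        parser.add_section("Parameters")
    parser.set("Parameters", "VectorSize", "1000")
    parser.set("Parameters", "AdjustVectorSize", "0")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(apply_vector_settings(SAMPLE_INI))
```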

I am using Virtuoso Version: 07.20.3214 Build: Oct 14 2015, on Ubuntu 14.04
and the number of the graphs is not large (maybe 15 or 20).

Kind regards,
Pantelis Natsiavas

2016-09-08 4:54 GMT+03:00 Hugh Williams :

> Hi Pantelis,
>
> What is the version of the binary being run, i.e. 07.20.3217 or another
> build id, and how many graphs are in the database?
>
> I have seen such issues with older Virtuoso binaries, which have been
> resolved by adding the following to the "[Parameters]" section of the INI file:
>
> VectorSize   = 1000
> AdjustVectorSize = 0
>
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software, Inc.  //  http://www.openlinksw.com/
> Weblog   -- http://www.openlinksw.com/blogs/
> LinkedIn -- http://www.linkedin.com/company/openlink-software/
> Twitter  -- http://twitter.com/OpenLink
> Google+  -- http://plus.google.com/100570109519069333827/
> Facebook -- http://www.facebook.com/OpenLinkSoftware
> Universal Data Access, Integration, and Management Technology Providers
>
> > On 7 Sep 2016, at 10:25, Pantelis Natsiavas  wrote:
> >
> > Hi everybody.
> >
> > When I try to list the graphs in my Virtuoso instance, I get a huge
> > delay, and finally a timeout.
> >
> > This has been happening for the last two weeks or so, without any obvious
> > reason. The delay occurs when I try it through the Conductor web UI (Tab
> > "Linked Data" -> Tab "Graphs" -> Tab "Graphs"). I have also tried
> > retrieving the graphs through isql-v:
> >
> > SPARQL SELECT distinct ?graph WHERE { GRAPH ?graph { ?s ?p ?o } };
> >
> > and it is also very slow.
> >
> > Please note that other SPARQL queries are answered normally, without any
> > obvious delays.
> >
> > Do you have any suggestions? Should I fear that my triple store is in
> > some kind of corrupted state?
> >
> > Best regards,
> > Pantelis Natsiavas
> > 


Re: [Virtuoso-users] Ubuntu upgrading

2016-09-12 Thread Pantelis Natsiavas
Thank you Hugh.

I have removed the old virtuoso.trx file and everything seems in place.
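For anyone else who hits the "length not multiple of 2MB" startup error quoted below, here is a minimal sketch of how one might check a database file before starting the server. It assumes "2MB" means 2 * 1024 * 1024 bytes, and the helper names are hypothetical. It only reports the condition and does not repair anything.

```python
import os

# Assumption: Virtuoso's "2MB" growth segment is 2 * 1024 * 1024 bytes.
SEGMENT_BYTES = 2 * 1024 * 1024


def trailing_bytes(size_in_bytes):
    """Bytes left over after the last complete 2MB segment (0 means a clean size)."""
    return size_in_bytes % SEGMENT_BYTES


def check_db_file(path):
    """Return the number of stray trailing bytes in the database file at `path`."""
    return trailing_bytes(os.path.getsize(path))


if __name__ == "__main__":
    # Path taken from the log in this thread; change it for your own setup.
    db_path = "/media/VirtuosoDBDrive/virtuoso.db"
    if os.path.exists(db_path):
        print("stray bytes past the last 2MB segment:", check_db_file(db_path))
```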

Kind regards,
Pantelis Natsiavas

2016-09-08 4:47 GMT+03:00 Hugh Williams :

> Hi Pantelis,
>
> The "It is impossible to have a database file
> /media/VirtuosoDBDrive/virtuoso.db with a length not multiple of 2MB"
> error on startup indicates that an attempt to grow the database, which is
> done in 2MB segments, failed partway through, probably because the disk
> ran out of space or the server shut down unexpectedly.
>
> This is not related to the trx file, which would only be an issue if
> upgrading to a new engine build id, i.e. 3217, 3216, etc. If you have run
> the +checkpoint-only option, then you can remove the trx file even if it is
> not zero bytes, as it would just contain the database signature info.
>
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software, Inc.  //  http://www.openlinksw.com/
> Weblog   -- http://www.openlinksw.com/blogs/
> LinkedIn -- http://www.linkedin.com/company/openlink-software/
> Twitter  -- http://twitter.com/OpenLink
> Google+  -- http://plus.google.com/100570109519069333827/
> Facebook -- http://www.facebook.com/OpenLinkSoftware
> Universal Data Access, Integration, and Management Technology Providers
>
> > On 7 Sep 2016, at 09:13, Pantelis Natsiavas  wrote:
> >
> > Hi everybody.
> >
> > As I am trying to work on big datasets, I thought that upgrading to the
> > latest version of Virtuoso would be a good idea, as I would hopefully get
> > some performance advantages too.
> >
> > However, following the procedure described in
> > http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/UpgradingToVOS610#Upgrading from Release 7.x to a newer Release 7.x
> > I found out that my trx file is 185 bytes long and would not go to 0
> > even after a restart with the "+checkpoint-only" argument. My log
> > contains the following snippet after startup:
> >
> > 10:57:16 It is impossible to have a database file /media/VirtuosoDBDrive/virtuoso.db with a length not multiple of 2MB.
> > 10:57:16 The process must have last terminated while growing the file.
> > 10:57:16 Please contact OpenLink Customer Support
> > 10:57:16 Database version 3126
> > 
> > 10:58:00 Roll forward started
> > 10:58:00 3 transactions, 185 bytes replayed (100 %)
> > 10:58:00 Roll forward complete
> > 
> >
> > My questions are:
> > 1. If I get it right, the virtuoso.trx size implies that there are
> > transactions left uncompleted in a "dirty" state. However, this should not
> > be the case, since the log shows a normal replay of the transactions. Could
> > I just delete the virtuoso.trx? Is there something else I could do to
> > recover?
> > 2. What would be the easiest way to upgrade the Virtuoso server? I am
> > not comfortable with the instruction "install the newer v7.x binary
> > components, either atop or after removing the older v7.x binary
> > components.". Is there a more specific guideline?
> >
> > Please note that I am running Virtuoso Version: 07.20.3214 Build: Oct 14
> > 2015, on Ubuntu 14.04.
> >
> > Kind regards,
> > Pantelis Natsiavas
> > 