Re: [Virtuoso-users] Mapper Options in Conductor (Question about Sponging)

2015-11-12 Thread Haag, Jason
Hi All,

I have been trying to understand how Virtuoso's crawler (content import) and
sponging features work. I'm currently evaluating Virtuoso Open Source (VOS)
07.20.3214.

I set up three crawl jobs for three different HTML/RDFa files and received
no errors.

When I try to query the data through the SPARQL interface, it doesn't
show up.

For example, http://w3id.org/xapi/adb/verbs/ is the target URL of a crawl
job I set up in Conductor under Content Imports. I am using the xhtml/HTML5
variants cartridge with the following options:

fallback-mode=no
rdfa=yes
reify_html5md=0
reify_rdfa=1
reify_jsonld=0
reify_all_grddl=0
reify_html=0
passthrough_mode=yes
loose=yes
reify_html_misc=no
reify_turtle=no

If I go to http://54.152.125.100:8890/sparql and run the following SPARQL
query, it returns no results:

#Query all Verb IRIs
PREFIX xapi: 

SELECT DISTINCT ?Verb

WHERE {
   ?Verb a xapi:Verb .

}


However, the data does start to show up in this query if I subsequently add
http://w3id.org/xapi/adb/verbs/ as the default data set name / graph IRI in
the SPARQL interface and also select the sponging option to download all
RDF resources.

Is this sponging option in the SPARQL interface actually downloading and
adding the triples? Wouldn't that allow anyone who has access to the SPARQL
interface to add triples? The faceted search interface seems to indicate so,
as I did this with the following graph IRI, http://adlnet.gov/expapi/verbs:

http://54.152.125.100:8890/describe/?url=http%3A%2F%2Fadlnet.gov%2Fexpapi%2Fverbs=4

I tried to set up this IRI as a crawl job and it never populated Virtuoso's
data store. But as soon as I add it as a graph IRI in the SPARQL interface
with sponging enabled, it shows up. Is this the expected behavior / by
design for this SPARQL sponging option? I thought graphs and triples could
only be added with special SPARQL permissions and using INSERT.

I still don't think the crawler feature is working for HTML/RDFa. It
appears to be processing the HTML file and storing it in the
repository/locally in Virtuoso, but it doesn't seem to actually add the
graph or triples to the database.
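
One way to check whether a crawl job populated a graph is to ask the
endpoint if that graph holds any triples at all. Below is a minimal Python
sketch, assuming the endpoint accepts the standard SPARQL Protocol "query"
parameter over GET. The function name graph_check_url is hypothetical, and
the endpoint address is simply the one mentioned above. The code only builds
the request URL; it does not send anything.

```python
from urllib.parse import urlencode

# Assumption: the endpoint named in this message; substitute your own host.
ENDPOINT = "http://54.152.125.100:8890/sparql"


def graph_check_url(graph_iri, endpoint=ENDPOINT):
    """Build a GET URL asking whether graph_iri holds at least one triple."""
    # ASK returns true if at least one triple matches inside the named graph.
    query = f"ASK {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }}"
    # "query" is the parameter name defined by the SPARQL Protocol.
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    for iri in ("http://w3id.org/xapi/adb/verbs/",
                "http://adlnet.gov/expapi/verbs"):
        print(graph_check_url(iri))
```

Opening the printed URL in a browser before and after a crawl (or a
sponging request) should show whether the graph went from empty to
populated, which separates "the crawler stored the file" from "the crawler
loaded triples".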

Thanks in advance for your patience and help!

J Haag

---



On Wed, Oct 28, 2015 at 5:17 AM, Tim Haynes wrote:

>
> On 27 October 2015 at 20:49, Haag, Jason wrote:
>
>> I think I know the answer to my last two questions. I had additional html
>> files below the /verbs/ directory. I believe that is where the duplicates
>> came from. I'm guessing sponger also looks for any html files at the
>> specified path, not just the "index.html" file that was specified as a
>> target URL. Can anyone verify this?
>
>
> Hi,
>
> It's unlikely - I don't know of anything in the Sponger that implements
> directory browsing, but it may well be following e.g.
> <link rel="alternate" href="" /> to RSS/Atom feeds, etc.
>
> As Kingsley says, Faceted Browser will show you what graphs the triples
> appear in.
>
> When a page is sponged, its URL becomes 1:1 the graph IRI in which data
> from/about/in that resource is stored. Multiple graphs implies multiple
> sponging events.
>
> HTH,
>
> ~Tim
> --
> Tim Haynes
> Product Development Consultant
> OpenLink Software
>
--
___
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users


Re: [Virtuoso-users] rdf_loader_run logs "missed delete of name id cache" messages

2015-11-12 Thread Hugh Williams
Hi Jörn,

If you check the load_list table, does the ll_error column show which
dataset is causing the error? If so, you can check that file to see whether
it is valid.
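
As a rough illustration of that check, here is a minimal Python sketch that
assembles an isql command line to list the files with a logged error. The
port (1111) and the dba/dba credentials are placeholder assumptions, the
helper name isql_command is hypothetical, and on some installations the
binary has a different name (for example isql-vt), so adjust as needed.

```python
import shlex

# DB.DBA.load_list is the bulk loader's bookkeeping table; ll_error is NULL
# for files that loaded without a reported error.
FAILED_FILES_SQL = (
    "SELECT ll_file, ll_state, ll_error "
    "FROM DB.DBA.load_list "
    "WHERE ll_error IS NOT NULL;"
)


def isql_command(port=1111, user="dba", password="dba"):
    """Return an argv list that runs FAILED_FILES_SQL through isql."""
    # Port and credentials are assumptions (stock defaults); change them.
    return ["isql", str(port), user, password, f"exec={FAILED_FILES_SQL}"]


if __name__ == "__main__":
    # Print a shell-quoted command line to copy and adapt.
    print(shlex.join(isql_command()))
```

An empty result would suggest the loader did not record a file-level error,
in which case the log message is probably not tied to a malformed dump file.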

Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc.  //  http://www.openlinksw.com/
Weblog   -- http://www.openlinksw.com/blogs/
LinkedIn -- http://www.linkedin.com/company/openlink-software/
Twitter  -- http://twitter.com/OpenLink
Google+  -- http://plus.google.com/100570109519069333827/
Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers

> On 12 Nov 2015, at 15:15, Jörn Hees wrote:
> 
> Hi,
> 
> I'm trying to import the latest DBpedia dumps with 3 rdf_loader_runs on
> Virtuoso-Opensource 7.2.1.
>
> In the virtuoso.log file I find messages like this:
> 
> 15:47:39 Server online at  (pid 60790)
> 15:48:30 PL LOG: Loader started
> 15:48:57 PL LOG: Loader started
> 15:49:21 PL LOG: Loader started
> 15:52:34 missed delete of name id cache resource/III_Marine_Expeditionary_Force 0 (0x5d6ae6 )
> 
> I retried importing from scratch a couple of times, also with a single
> rdf_loader_run only, but the "missed delete of name id cache" message
> always occurs.
> I couldn't find anything online about how bad this is.
> Does anyone know?
> 
> 
> I also noticed that after all loaders finish, Virtuoso continues to show
> very high CPU load (2 cores maxed out on average) with very little I/O
> activity. Does anyone know what is happening during that time? Is it safe
> to shut down the DB?
> 
> Cheers,
> Jörn
> 
> 



[Virtuoso-users] rdf_loader_run logs "missed delete of name id cache" messages

2015-11-12 Thread Jörn Hees
Hi,

I'm trying to import the latest DBpedia dumps with 3 rdf_loader_runs on
Virtuoso-Opensource 7.2.1.

In the virtuoso.log file I find messages like this:

15:47:39 Server online at  (pid 60790)
15:48:30 PL LOG: Loader started
15:48:57 PL LOG: Loader started
15:49:21 PL LOG: Loader started
15:52:34 missed delete of name id cache resource/III_Marine_Expeditionary_Force 0 (0x5d6ae6 )
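
To see how many names are affected, the log can be scanned for these
messages. The following is a minimal Python sketch; the line shape it
expects is inferred only from the excerpt above (and assumes each message
sits on a single line in the actual log file), and the function name
missed_deletes is hypothetical.

```python
import re

# Assumed line shape, based only on the excerpt above:
#   HH:MM:SS missed delete of name id cache <name> <number> (<hex> )
PATTERN = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) missed delete of name id cache "
    r"(\S+) \d+ \(0x[0-9a-fA-F]+\s*\)"
)


def missed_deletes(lines):
    """Return (time, name) pairs for each 'missed delete' log line."""
    hits = []
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    sample = [
        "15:49:21 PL LOG: Loader started",
        "15:52:34 missed delete of name id cache "
        "resource/III_Marine_Expeditionary_Force 0 (0x5d6ae6 )",
    ]
    for time, name in missed_deletes(sample):
        print(time, name)
```

Passing the lines of virtuoso.log to missed_deletes gives a list that can
be counted or compared across import attempts.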

I retried importing from scratch a couple of times, also with a single
rdf_loader_run only, but the "missed delete of name id cache" message always
occurs.
I couldn't find anything online about how bad this is.
Does anyone know?


I also noticed that after all loaders finish, Virtuoso continues to show very
high CPU load (2 cores maxed out on average) with very little I/O activity.
Does anyone know what is happening during that time? Is it safe to shut down
the DB?

Cheers,
Jörn

