Sorry, I mistakenly sent this email to Sebastian only...

Hi!
On 16 September 2010 19:46, Sebastian Trueg <[email protected]> wrote:

> It is not recommended to use RDF containers. They cannot properly be
> queried via SPARQL, support is not guaranteed, and their semantics are
> very unclear anyway.
> Thus, I follow popular opinion in the semantic world and recommend not
> to use them.

Duly noted. We already have a high risk factor "as is", so we won't be looking for additional trouble.

> If you need to store more information then you need to go the normal RDF
> way: define the ontology constructs you need. We will be happy to help
> you with that.

Okay, I'll try to use the example I found on the ontologies, so you can correct my misunderstandings as I go on. I put it at the end, as some explanations may be of help before that.

> Thus, please think twice before trying to store anything in the
> graph/context metadata. In most cases a specific class or property might
> make more sense.

Again, whatever does the job better and with less trouble is welcome.

> Could you maybe elaborate a bit on your project?

It's a simple idea: we collect external data sources, like Wiktionary, AGROVOC, geographic names, public open source DBs etc., and put them into a common format that allows a single interface for them all. In our DSL the prepared data from a single source is called a "region". We have bots doing the collect/update job for a region, and since data is semantically tagged, a user may say "I want everything about Tai-chi, vegan food and astronomy from the following regions in English and Japanese". His machine downloads the prepared, normalized data from a list of network-sources to keep up-to-date.

While these bot-driven regions are read-only, the user can also upload material back to the same network-sources using a "community region", in order to share it. There may be any number of communities, as we expect people to develop thematic communities or simply to dislike each other's POVs, sooner or later, and communities are self-managed.
If I don't like a community, I don't link their stuff, and that's it. Anyway, this is a further development; we will have just one community to begin with.

> Could you also elaborate on the distributed rep, please. Be aware that
> Nepomuk does not provide a distributed store and it is very unlikely
> that it will in the near future - simply because creating a distributed
> store is very very hard, a lot of work, and requires expertise that we
> do not have...

We have no idea about how to make a "real" distributed store, either. The doable thing I can think of is, as I said, a list of network-sources that can import and distribute a number of "regions" to end users. But I tend to think that, since we have to upload stuff back to a "community region" anyway, two laptops could sync with each other using any network connection in place.

So, for example, if I live in an isolated village in the middle of nowhere (a common situation in the third world) and I have but one box in a school, anyone coming to visit with a laptop can update me, provided that I told him what to download when he could get a normal connection and he has enough storage space for it. Or I could remain in the village and be sent a RAM key or a DVD with updates on it. This would already be a lot for most "randomly connected" situations. Most of Africa and a lot of Asia have little choice other than this, and it's especially for them that accessing "thematic knowledge" locally is of high value.

I suppose that since DBpedia has RDF exports there should be RDF imports, and we could just use this. Once export files are available, they could be broadcast using the p2p features (which I know nothing of, I just know they should be there, sooner or later).

Since we handle multimedia content, it is especially important to be able to limit the amount of content one wants to store.
To remain with our previous example, my subscription could be: "I want everything about Tai-chi, vegan food and astronomy from the following regions in English and Japanese, excluding video and audio files, pictures included". In any case I would get pointers to remote resource UUIDs, telling me there's a video/audio file (and its tags), so that I know it's there and can order single downloads if I decide some particular material is of interest.

Now let's get to the model. "Profile" is our DSL lingo for "meaning", see http://en.wikipedia.org/wiki/Cognitive_semantics#Langacker:_profile_and_base :

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foo:  <http://foo.bar/types#> .

foo:Profile rdf:type rdfs:Class .

# A "Region" has a textual description, along with other minor
# properties, so we want it to inherit all translation capabilities
# from Profile.
foo:Region rdf:type rdfs:Class .
foo:Region rdfs:subClassOf foo:Profile .

# This is where actual content is.
foo:Content rdf:type rdfs:Class .
foo:Text rdf:type rdfs:Class .
foo:File rdf:type rdfs:Class .

foo:means rdf:type rdf:Property .
foo:means rdfs:domain foo:Content .
foo:means rdfs:range foo:Profile .

# Here I'm in trouble, as I need what DBs call an
# ENUM(expression, definition) that defines the role of a Content
# instance in a dictionary expr=def equation. You will excuse my
# "creative syntax" in the range below; probably I should have created
# a "Role" class with two instances, right?
foo:hasRole rdf:type rdf:Property .
foo:hasRole rdfs:domain foo:Content .
foo:hasRole rdfs:range foo:(expression,definition) .

# How do we avoid infinite recursions here?
foo:isTranslationOf rdf:type rdf:Property .
foo:isTranslationOf rdfs:domain foo:Content .
foo:isTranslationOf rdfs:range foo:Content .

# Do we have Booleans? Anyway, if an instance of content gets modified,
# all of its translations are marked "fuzzy" by this flag.
foo:isVerified rdf:type rdf:Property .
foo:isVerified rdfs:domain foo:Content .
foo:isVerified rdfs:range foo:Boolean .
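
Going back to foo:hasRole and foo:isVerified for a moment: in case it makes my question clearer, this is roughly how I imagine the "Role class with two instances" variant would look. It is only a sketch, not something I have tested: foo:Role, foo:expression and foo:definition are names I am making up here, and I am assuming that the standard XML Schema datatype xsd:boolean can be used as a range in Nepomuk.

```turtle
# Sketch only: foo:Role, foo:expression and foo:definition are
# hypothetical names, not part of any existing ontology.
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix foo:  <http://foo.bar/types#> .

# The "ENUM" becomes a class with two named instances.
foo:Role rdf:type rdfs:Class .
foo:expression rdf:type foo:Role .
foo:definition rdf:type foo:Role .

# hasRole then points at the class instead of the made-up list.
foo:hasRole rdf:type rdf:Property .
foo:hasRole rdfs:domain foo:Content .
foo:hasRole rdfs:range foo:Role .

# Assumption: xsd:boolean is accepted as a literal range here.
foo:isVerified rdf:type rdf:Property .
foo:isVerified rdfs:domain foo:Content .
foo:isVerified rdfs:range xsd:boolean .
```

Note that RDFS alone would not stop anybody from adding a third instance of foo:Role, so this only names the two allowed values; it does not enforce them the way a DB ENUM does.
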
# Now these two properties are on a mutex constraint: something is
# either a text or a file. Not sure whether this distinction is
# important for Nepomuk; in PostgreSQL we use it to separate things we
# can set a full-text search on from things that must be searched
# otherwise. Content is also used as a meta-level, to send out minimal
# information about remote files that aren't actually present on the
# system.
foo:hasText rdf:type rdf:Property .
foo:hasText rdfs:domain foo:Content .
foo:hasText rdfs:range foo:Text .

foo:hasFile rdf:type rdf:Property .
foo:hasFile rdfs:domain foo:Content .
foo:hasFile rdfs:range foo:File .

# This assigns content to a Region.
foo:belongsTo rdf:type rdf:Property .
foo:belongsTo rdfs:domain foo:Content .
foo:belongsTo rdfs:range foo:Region .

Now, before I write too much garbage syntax, is this readable/usable? There is much more to come, although I expect changes will be needed for the existing model to adapt to this new environment.

Bèrto

-- 
==============================
Constitution of 24 June 1793 - Article 35. - When the government violates the rights of the people, insurrection is, for the people and for each portion of the people, the most sacred of rights and the most indispensable of duties.
_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk
