> I would very much like to see discussion of use cases so we understand 
> each other's expectations and technical requirements. Minto's message 
> touches on this, about distribution and design for in-memory.

At the moment we have a Hadoop cluster processing data scraped from the Web. 
We use OpenNLP on the data, and a corporate ontology to create triples in RDF 
(IIRC we're using the jena-arq dependency for that). These triples are later 
loaded into a graph in Jena TDB via Fuseki. A data quality process then handles 
data deduplication and record linkage (via SPARQL). Finally the data is ready 
to be consumed by internal products and services.

My use case for commons-rdf would be in the Hadoop jobs: using its API and a 
simple, efficient implementation to create the triples. If another 
implementation claimed to be more efficient, I could simply replace the impl 
dependency in my pom.xml and run some jobs to test it.
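
To make that concrete, here is a rough sketch of what I mean by coding the job 
against the API only. The Triple and TripleFactory interfaces, the class names 
and the example.org predicate below are all made up for illustration; they are 
NOT the actual commons-rdf types, just simplified stand-ins:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for an RDF API (not the real commons-rdf interfaces).
interface Triple {
    String subject();
    String predicate();
    String object();
}

interface TripleFactory {
    Triple createTriple(String subject, String predicate, String object);
}

// One possible in-memory implementation. A different implementation of the
// same interfaces could replace it without touching the job code below.
final class SimpleTriple implements Triple {
    private final String subject;
    private final String predicate;
    private final String object;

    SimpleTriple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    @Override public String subject() { return subject; }
    @Override public String predicate() { return predicate; }
    @Override public String object() { return object; }
}

final class SimpleTripleFactory implements TripleFactory {
    @Override
    public Triple createTriple(String subject, String predicate, String object) {
        return new SimpleTriple(subject, predicate, object);
    }
}

// The job-side code depends only on the interfaces, never on a concrete class.
final class TripleExtractor {
    // Assumed predicate IRI, for illustration only.
    static final String MENTIONS = "http://example.org/ontology#mentions";

    private final TripleFactory factory;

    TripleExtractor(TripleFactory factory) {
        this.factory = factory;
    }

    // Creates one triple per entity mention found in a document.
    List<Triple> extract(String documentIri, List<String> mentions) {
        List<Triple> triples = new ArrayList<>();
        for (String mention : mentions) {
            triples.add(factory.createTriple(documentIri, MENTIONS, mention));
        }
        return triples;
    }
}
```

With something like this, trying another implementation would only mean passing 
a different TripleFactory to TripleExtractor (and changing the dependency in the 
pom.xml); the extraction code itself stays the same.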

Cheers,
Bruno

ps: ATM we have some custom writables, but plan to soon use Jena Elephas for 
that too :)


----- Original Message -----
> From: Andy Seaborne <a...@apache.org>
> To: dev@commons.apache.org
> Cc: 
> Sent: Saturday, January 17, 2015 8:26 AM
> Subject: Re: [RDF] Updated Commons-RDF
> 
> On 14/01/15 18:34, Reto Gmür wrote:
>>  There has been an indirect reply here:
>>  https://github.com/commons-rdf/commons-rdf/issues/43, as the issue points to
>>  this thread I thought to add a back-link but I would prefer to have a
>>  discussion here and to discuss concrete code proposals
> 
> I would very much like to see discussion of use cases so we understand 
> each other's expectations and technical requirements. Minto's message 
> touches on this, about distribution and design for in-memory.
> 
> https://mail-archives.apache.org/mod_mbox/clerezza-dev/201412.mbox/%3c54946d8b.1060...@apache.org%3E
> 
>>  According to Sergio the proposal is "a wrapper implementation instead of
>>  commons interface". As the proposal doesn't contain any wrapper, this might
>>  refer to the question on when to define classes and when to define
>>  interfaces.
>> 
>>  The API proposal has the following interfaces and classes (without .events)
>> 
>>  Interfaces:
> ...
> 
>>  Classes:
> ...
> 
>>  The reason why Language and Iri are classes rather than interfaces is
>>  because the additional work for service providers exposing the API to
>>  implement the interfaces themselves seems to outweigh the benefits of the
>>  possibility to provide an own implementation without inheriting the
>>  overhead of an additional String per instance (the classes are not final,
>>  so implementation can still provide an Iri implementation that stores all
>>  the lengthy IRIs on disk, in this case there is just an empty and unused
>>  string field for the JIT to optimize away).
> 
> Not having interfaces everywhere is painful for adding this new system 
> to existing code.  Java does not support multiple inheritance.  Existing 
> code may already have a super class.  A copy would be needed.
> 
> What had you in mind for different literal implementations?
> 
> I don't see why Literal is different to an IRI here - a literal is, by 
> definition, lexical form + language + datatype, in the same way URI is a 
> uristring.
> 
>>  The reason why BlankNode is a class and not an interface is to discourage
>>  polymorphism. If an instance is more than just a BNode user will be more
>>  likely to expect to get the very same instance back,
> 
> I hope they, for any RDFterms, make that assumption under any 
> circumstances!  For persistence, same object (i.e. java's ==) is 
> somewhere between "very hard" (= expensive to implement for no benefit) 
> and impossible (persist data > RAM size).  Interning is not practical 
> because of reference counting to keep the intern table size managed. 
> Weak references add cost (app write and execution) at a point where 
> simple costs can mount up quickly (parsing speed ... once the I/O path 
> is straightened out ... java :-( ).
> 
> 
>>  but as described in
>>  the Readme there is no such guarantee. Typically implementations will
>>  replace BlankNode objects with instances of their own subclass of BlankNode
>>  as soon as they can (i.e. as soon as originally added instance becomes
>>  eligible for garbage collection).
> 
>     Andy
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
