On 04.04.2012, at 19:18, Alessandro Adamou wrote:
> Hi Rupert, all,
>
> just telling you that I have tried the SingleTdbDatasetTcProvider on the
> field with one of my use cases which involves many small ontologies (content
> design patterns).
>
> I've created ~20 graphs totalling about 500 triples
>
> On OS X 10.6.8 (on HFS+ filesystem with journalling) the database grew from
> an initial 184MiB to 248MiB
>
> I am yet to test large graphs, so I cannot tell if the overhead is given by
> named graph indexes or the triple storage, but this is already a big leap
> from the TdbTcProvider.
>
Thx for testing.

> Did you already commit this component to rdf.jena.tdb.storage ?
>

No, not yet, but I have made some improvements and fixed some bugs since the
last patch attached to the issue. I hope I will have some time to finish this
later this week.

best
Rupert

> Best,
>
> Alessandro
>
> On 3/19/12 9:16 AM, Hasan Hasan wrote:
>> Hi all,
>>
>> I generally agree to extend Clerezza to be able to support multiple
>> requirements. Thus, I see the necessity of SingleDatasetTdbTcProvider,
>> although I am a bit unhappy that application developers have to be
>> aware of this.
>> Note that new Clerezza instances (at least my own build) no longer
>> generate 200 MB of index files for empty graphs, but merely 200K.
>>
>> Regards
>> Hasan
>>
>>
>> On Fri, Mar 16, 2012 at 2:10 PM, Rupert Westenthaler wrote:
>>
>>> Hi David, Stanbol & Clerezza community
>>>
>>> Short summary of the situation:
>>>
>>> The Ontonet component generates a lot of MGraphs using the Jena TDB
>>> provider. This causes the disk consumption and the number of open files
>>> to explode. See the quoted emails for details.
>>>
>>>
>>> @Stanbol we are already discussing how to avoid the creation of so many
>>> graphs
>>>
>>>
>>> @Clerezza the observed behavior of the TDB provider is also very dangerous
>>> (at least for typical use cases in Apache Stanbol).
>>>
>>> Although it targets a different issue, CLEREZZA-467 [1] may provide a
>>> possible solution for that, as it suggests using named graphs instead of
>>> isolated TDB instances for creating MGraphs.
>>>
>>> To be honest, this would be the optimal solution for our usages of Clerezza
>>> in Stanbol. However, I assume that for a semantic CMS it is safer to use
>>> different TDB datasets.
>>>
>>> Because of that I would like to make the following proposal that
>>> hopefully covers both the needs of Apache Stanbol and Apache Clerezza.
>>>
>>> 1. AbstractTdbTcProvider: providing most of the functionality needed to
>>> store Clerezza MGraphs in Jena TDB
>>>
>>> 2. TdbTcProvider: The same as now, but now extending the abstract one. It
>>> follows the currently used methodology to map Clerezza graphs to separate
>>> TDB datasets
>>>
>>> 3. SingleDatasetTdbTcProvider: TDB provider variant that stores all
>>> MGraphs in a single TDB dataset. This provider should also support
>>> "configurationFactory=true" (multiple instances). Each instance would use a
>>> different TDB dataset to store its MGraphs.
>>>
>>> By default the SingleDatasetTdbTcProvider would be inactive, because it
>>> requires a configuration of the directory for the TDB dataset as well as a
>>> name (that can be used in Filters). This ensures full backward
>>> compatibility.
>>>
>>> In environments - such as Stanbol - where you want to store multiple graphs
>>> in the same TDB dataset you would need to provide a configuration for the
>>> SingleDatasetTdbTcProvider. Here you have two possible usage scenarios:
>>>
>>> * if you just need a single TDB dataset that stores all MGraphs, then you
>>> can assign a high enough service.ranking to the SingleDatasetTdbTcProvider
>>> and use the TcManager as usual to create your graphs.
>>> * if you want to use several single TDB datasets, or a mix of the
>>> TdbTcProvider and SingleDatasetTdbTcProviders, you will need to use the
>>> corresponding filters.
>>>
>>>
>>> WDYT
>>> Rupert
>>>
>>>
>>> [1] https://issues.apache.org/jira/browse/CLEREZZA-467
>>>
>>> On 16.03.2012, at 10:44, Rupert Westenthaler wrote:
>>>
>>>> Hi David, all
>>>>
>>>> this could be the explanation for the failed build on the Jenkins server
>>>> when the SEO configuration for the Refactor engine was used in the default
>>>> configuration of the Full launcher
>>>>
>>>> see http://markmail.org/message/sprwklaobdjankig for details.
>>>>
>>>> To me it looks as if the RefactorEngine creates multiple Jena TDB
>>>> instances for the various MGraphs it creates. One needs to know that even
>>>> for an empty graph Jena TDB creates ~200 MByte of index files. So it is
>>>> important to map multiple MGraphs to different named graphs of the same
>>>> Jena TDB store.
>>>>
>>>> I have no idea how Clerezza manages this or how Ontonet creates MGraphs,
>>>> but I hope this can help in tracing this down.
>>>>
>>>> best
>>>> Rupert
>>>>
>>>> On 16.03.2012, at 10:30, David Riccitelli wrote:
>>>>
>>>>> Dears,
>>>>>
>>>>> As I ran into disk issues, I found that this folder:
>>>>> sling/felix/bundleXXX/data/tdb-data/mgraph
>>>>>
>>>>> where XXX is the bundle of:
>>>>> Clerezza - SCB Jena TDB Storage Provider
>>>>> org.apache.clerezza.rdf.jena.tdb.storage
>>>>>
>>>>> took almost 70 GB of disk space (until the disk space was
>>>>> exhausted).
>>>>>
>>>>> These are some of the files I found inside:
>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology889
>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology1041
>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology395
>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology363
>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology661
>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology786
>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology608
>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology213
>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology188
>>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology602
>>>>>
>>>>>
>>>>> Any clues?
>>>>>
>>>>> Thanks,
>>>>> David Riccitelli
>>>>>
>>>>> ********************************************************************************
>>>>> InsideOut10 s.r.l.
>>>>> P.IVA: IT-11381771002
>>>>> Fax: +39 0110708239
>>>>> ---
>>>>> LinkedIn: http://it.linkedin.com/in/riccitelli
>>>>> Twitter: ziodave
>>>>> ---
>>>>> Layar Partner Network
>>>>> <http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
>>>>> ********************************************************************************
>
>
> --
> M.Sc. Alessandro Adamou
>
> Alma Mater Studiorum - Università di Bologna
> Department of Computer Science
> Mura Anteo Zamboni 7, 40127 Bologna - Italy
>
> Semantic Technology Laboratory (STLab)
> Institute for Cognitive Science and Technology (ISTC)
> National Research Council (CNR)
> Via Nomentana 56, 00161 Rome - Italy
>
>
> "I will give you everything, so long as you do not demand anything."
> (Ettore Petrolini, 1930)
>
> Not sent from my iSnobTechDevice
>
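
To make the numbers in this thread easier to follow, here is a minimal Python sketch of the disk arithmetic. It is only an illustration under stated assumptions: the 193 MiB figure is taken from David's directory listing and treated as a fixed overhead per TDB dataset, triple data is ignored, and the percent-encoding of graph names is inferred from the directory names in that listing rather than from Clerezza's code. The function names `dataset_dir` and `footprint_mib` are hypothetical and not part of Clerezza or Jena.

```python
from urllib.parse import quote

# Assumption: each TDB dataset costs a fixed 193 MiB of index files,
# the size of every directory in David's listing.
PER_DATASET_OVERHEAD_MIB = 193


def dataset_dir(graph_name: str) -> str:
    """Percent-encode a graph name into a directory name.

    The scheme is inferred from names such as
    'ontonet%3A%3Ainputstream%3Aontology889'; it is an assumption.
    """
    return quote(graph_name, safe="")


def footprint_mib(graph_count: int, single_dataset: bool) -> int:
    """Estimate the fixed index overhead in MiB, ignoring triple data.

    single_dataset=False models one TDB dataset per MGraph;
    single_dataset=True models all MGraphs as named graphs in one dataset.
    """
    datasets = 1 if single_dataset else graph_count
    return datasets * PER_DATASET_OVERHEAD_MIB


if __name__ == "__main__":
    print(dataset_dir("ontonet::inputstream:ontology889"))
    for graphs in (20, 371):
        per_graph = footprint_mib(graphs, single_dataset=False)
        shared = footprint_mib(graphs, single_dataset=True)
        print(f"{graphs} graphs: {per_graph} MiB separate, {shared} MiB shared")
```

Under these assumptions, roughly 371 per-graph datasets are enough to account for the "almost 70 gbytes" David reported, whereas the fixed overhead of a single shared dataset does not grow with the number of graphs.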
