With my resolution for CLEREZZA-414, the on-disk format changed. Let me know if you have MGraphs so large that you would like a script to migrate them from the old format.

Cheers,
Reto

2011/2/15 Reto Bachmann-Gmür (JIRA) <[email protected]>

>
>     [ https://issues.apache.org/jira/browse/CLEREZZA-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Reto Bachmann-Gmür resolved CLEREZZA-414.
> -----------------------------------------
>
>     Resolution: Fixed
>
> I've committed a patch that allows the file not to be read until it is
> really needed.
>
> The format on disk is changed, data directory created with older versions
> are incompatible.
>
> Possible future improvements:
> - read datatype only if lexical form isn't needed
>
> > the performance of removing resources with externalized literals should be improved
> > -----------------------------------------------------------------------------------
> >
> >                 Key: CLEREZZA-414
> >                 URL: https://issues.apache.org/jira/browse/CLEREZZA-414
> >             Project: Clerezza
> >          Issue Type: Improvement
> >            Reporter: Hasan
> >            Assignee: Reto Bachmann-Gmür
> >
> > I compared the performance of AbstractDiscobitsHandler.remove() when
> > removing an InfoDiscoBit in the following cases:
> > a. with literal externalizer
> >    a1. the infoBit is about 3.5 MB
> >    a2. the infoBit is about 30 KB
> > b. without literal externalizer
> >    b1. the infoBit is about 3.5 MB
> >    b2. the infoBit is about 30 KB
> > Using a clerezza instance on my notebook, I obtained an average value for
> >    a1: 15 seconds
> >    a2: 110 ms
> >    b1: 500 ms
> >    b2: 5 ms
> > I examined the remove method and did some timestamping. The comments in
> > the code below tell us where most of the times are spent.
> >
> > public void remove(NonLiteral node) {
> >     MGraph mGraph = getMGraph();
> >     Iterator<Triple> properties = mGraph.filter(node, null, null);
> >     //copying properties to set, as we're modifying underlying graph
> >     Set<Triple> propertiesSet = new HashSet<Triple>();
> >     // this while loop consumes half of the total time needed for the whole
> >     // remove operation in case of using literal externalizer
> >     // if literal is NOT externalized, the time consumed by this while loop
> >     // is negligible
> >     while (properties.hasNext()) {
> >         propertiesSet.add(properties.next());
> >     }
> >     properties = propertiesSet.iterator();
> >     while (properties.hasNext()) {
> >         Triple triple = properties.next();
> >         UriRef predicate = triple.getPredicate();
> >         if (predicate.equals(DISCOBITS.contains)) {
> >             try {
> >                 GraphNode containedNode = new GraphNode((NonLiteral)triple.getObject(), mGraph);
> >                 //The following includes triple
> >                 containedNode.deleteNodeContext();
> >             } catch (ClassCastException e) {
> >                 throw new RuntimeException("The value of "+predicate+" is expected not to be a literal");
> >             }
> >             //as some other properties of node could have been in the context of the object
> >             remove(node);
> >             return;
> >         }
> >     }
> >     // In case of using literal externalizer, the code segment below consumes
> >     // half of the total time needed for the whole remove operation
> >     // if literal is NOT externalized, it consumes most of the time needed
> >     // for the whole remove operation
> >     GraphNode graphNode = new GraphNode(node, mGraph);
> >     graphNode.deleteNodeContext();
> > }
> >
> > It seems that using literal externalizer is pretty expensive.
> > We need to find a more efficient solution to deal with large literals
>
> --
> This message is automatically generated by JIRA.
> -
> For more information on JIRA, see: http://www.atlassian.com/software/jira
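
For readers of the archive: the idea described in the resolution (not reading the file until it is really needed) can be illustrated with a small, self-contained sketch. This is not the committed Clerezza patch. `LazyFileLiteral`, its methods, and the read counter are hypothetical names made up for this example, and basing `equals`/`hashCode` on the file path is an assumption that only holds if each distinct value is stored under exactly one path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Objects;

/**
 * Hypothetical literal whose lexical form is kept in an external file.
 * Illustration only; not the class used by the Clerezza literal externalizer.
 */
final class LazyFileLiteral {
    private final Path file;
    private final String dataType;
    private String lexicalForm; // stays null until first requested
    private int fileReads;      // counts actual disk reads, for demonstration

    LazyFileLiteral(Path file, String dataType) {
        this.file = Objects.requireNonNull(file);
        this.dataType = Objects.requireNonNull(dataType);
    }

    /** Available without touching the file. */
    String getDataType() {
        return dataType;
    }

    /** Reads the file on first call and caches the result afterwards. */
    synchronized String getLexicalForm() {
        if (lexicalForm == null) {
            try {
                lexicalForm = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                fileReads++;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return lexicalForm;
    }

    synchronized int getFileReads() {
        return fileReads;
    }

    // Assumption: identity is derived from the file reference and datatype,
    // so putting the literal into a HashSet does not load its content.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof LazyFileLiteral)) {
            return false;
        }
        LazyFileLiteral that = (LazyFileLiteral) other;
        return file.equals(that.file) && dataType.equals(that.dataType);
    }

    @Override
    public int hashCode() {
        return Objects.hash(file, dataType);
    }
}
```

With a class along these lines, copying objects into a `HashSet` (as the first loop of `remove()` does with triples) would not by itself trigger a read of a large literal; the file is only opened when `getLexicalForm()` is called. Whether that matches where the time actually goes in the measurements above is not something this sketch can confirm.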
