[jira] [Commented] (JENA-985) Iterate using Apache Jena ExtendedIterator on Graph with big amount of triples

Eugene Tenkaev (JIRA) Mon, 13 Jul 2015 04:40:41 -0700

    [ https://issues.apache.org/jira/browse/JENA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624550#comment-14624550 ]


Eugene Tenkaev commented on JENA-985:
-------------------------------------

Hello!
I create the persistent storage with the following code:
{code:java}
    /**
     * Creates dataset from source "*.nt" files.
     *
     * @param sourceFilePath source file path.
     * @param jenaModelPath  directory where jena model will be created.
     * @return true if the dataset was created successfully, false otherwise.
     */
    private boolean createDataset(Path sourceFilePath, Path jenaModelPath) {
        try {
            logger.info("Creating new dataset: " + jenaModelPath);

            Location location = Location.create(jenaModelPath.toString());
            DatasetGraphTDB datasetGraphTDB = TDBMaker.createDatasetGraphTDB(
                location, StoreParams.getDftStoreParams()
            );
            String fileUri = sourceFilePath.toUri().toString();
            TDBLoader.load(datasetGraphTDB, fileUri, true);
            datasetGraphTDB.close();
            TDBMaker.releaseLocation(location);

            logger.info("Creating of new dataset completed.");
        } catch (Exception e) {
            logger.fatal(e);
            return false;
        }

        return true;
    }
{code}

Then I load this dataset with the following code:
{code:java}
DatasetGraph datasetGraph = TDBFactory.createDatasetGraph(modelPath.toString());
Graph graph = datasetGraph.getDefaultGraph();
{code}

I double-checked: my code does not keep any references to {code}Triple{code} or 
{code}Node{code} objects.

So my iteration over all abstracts looks like this:
{code:java}
    private void printAllAbstracts() {
        ExtendedIterator<Triple> iter = graph.find(
            Node.ANY, NodeFactory.createURI("dbpedia-owl:abstract"), Node.ANY
        );

        while (iter.hasNext()) {
            Triple triple = iter.next();
            Node subjectNode = triple.getSubject();
            Node objectNode = triple.getObject();

            if (objectNode.isLiteral()) {
                String abstractStr = objectNode.getLiteralLexicalForm();

                if (!abstractStr.isEmpty()) {
                    System.out.println(subjectNode.getURI() + " - " + abstractStr);
                }
            }
        }
    }
{code}
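
One way to tell retained objects apart from garbage that has not been collected yet is to compare the used heap after an explicit GC request, before and after the loop. The following is a minimal sketch that uses only the JDK. The class {{HeapCheck}} and its methods are hypothetical names, and {{lazyAbstracts}} is a plain {{java.util.Iterator}} standing in for {{graph.find(...)}}, not Jena's API. Note that a GC request is only a hint to the JVM, so the numbers are indicative rather than exact.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical helper, not part of Jena: checks whether a streaming loop retains objects. */
final class HeapCheck {

    /** Requests a GC (only a hint to the JVM) and returns the used heap in bytes. */
    static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc();
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Stand-in for graph.find(...): lazily produces `count` strings, one at a time. */
    static Iterator<String> lazyAbstracts(final int count) {
        return new Iterator<String>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < count;
            }

            @Override
            public String next() {
                if (next >= count) {
                    throw new NoSuchElementException();
                }
                return "abstract #" + next++;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    /** Consumes the iterator without keeping any element; returns how many were non-empty. */
    static long consume(Iterator<String> iter) {
        long seen = 0;
        while (iter.hasNext()) {
            String abstractStr = iter.next();
            if (!abstractStr.isEmpty()) {
                seen++;
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        long before = usedHeapAfterGc();
        long seen = consume(lazyAbstracts(1000000));
        long after = usedHeapAfterGc();
        System.out.println("elements seen: " + seen);
        System.out.println("heap growth after GC (bytes): " + (after - before));
    }
}
```

If the reported growth stays small and roughly constant while the element count rises, the instances seen in the profiler are most likely garbage awaiting collection rather than references held by the loop. Steady growth proportional to the element count would point to something retaining them.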

> Iterate using Apache Jena ExtendedIterator on Graph with big amount of triples
> ------------------------------------------------------------------------------
>
>                 Key: JENA-985
>                 URL: https://issues.apache.org/jira/browse/JENA-985
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: Jena 2.13.0
>         Environment: *Hardware*
> Windows 7 64-bit
> Intel Core i7 4785T @ 2.20GHz
> RAM 16,0GB DDR3
> 465GB Samsung SSD 850 EVO 500G SCSI Disk Device (SSD)
> *Software environment*
> java version "1.7.0_75"
> Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
> *Running options*
> VM options: -Xmx14g
>            Reporter: Eugene Tenkaev
>            Priority: Minor
>
> I'm generating an Apache Jena Graph from DBpedia dumps, and now I want to iterate 
> over all "dbpedia-owl:abstract" triples.
> So I do something like this:
> {code:java}
>     ExtendedIterator<Triple> iterator = graph.find(
>         Node.ANY, NodeFactory.createURI("dbpedia-owl:abstract"), Node.ANY);
> {code}
> But when I iterate, memory consumption increases, so it looks like 
> "ExtendedIterator" stores the nodes it finds.
> Using the VisualVM profiler, I found that the count of 
> "com.hp.hpl.jena.graph.Node_URI" instances increases while I iterate.
> I tried calling "iterator.reset()", but it has no effect.
> Is this a bug or a feature? :D
> Can I iterate over all DBpedia abstracts without storing nodes and without 
> increasing memory consumption that the GC cannot free?
> Sorry for my bad English.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
