I'd appreciate it if you could get me the newer version; I am already using 2.2.0-M01.
I want to run some graph queries over my RDF data. First, I loaded it into the Virtuoso triple store (which took 2-3 hours), but I could not get results for my SPARQL queries in a reasonable time. That is why I decided to load the data into Neo4j instead, to be able to run my queries there. I am importing RDF into Neo4j only for a specific research problem: I need to extract some patterns from the RDF data, and I have to write queries that require some sort of graph traversal. I don't want to do reasoning over my RDF data. The graph structure looks simple: nodes only have a Label (Uri or Literal) and a Value, and relationships don't have any properties.

On Friday, December 12, 2014 2:41:36 AM UTC-8, Michael Hunger wrote:
>
> Your ids are UUIDs, right? So 36 chars * 90M -> 72 bytes, and Neo ids are longs w/ 8 bytes, so 80 bytes per entry.
> You should allocate about 6G of heap.
>
> Btw. importing RDF 1:1 into Neo4j is not a good idea in the first place.
> You should model a clean property graph model and import INTO that model.
>
> As for the batch-import, it's a bug that has been fixed after the milestone; I'll try to get you a newer version to try.
>
> Cheers, Michael
>
> On Fri, Dec 12, 2014 at 11:26 AM, mohsen <[email protected]> wrote:
>
>> Thanks Michael for following up on my problem. In the Groovy script, the output was still nodes. It is not feasible to use an enum for the relationship types: the types are URIs of ontology predicates coming from the CSV file, and there are many of them. However, I think the problem is that this script requires more than 10GB of heap, because it needs to keep the nodes in memory (in a map) to use them later for creating the relationships. So I guess even reducing the mmio mapping size won't solve the problem; I will try it tomorrow, though.
>>
>> Regarding the batch-import command, do you have any idea why I am getting that error?
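[Editor's note: Michael's heap estimate above can be sanity-checked with quick arithmetic. A rough sketch; the two-bytes-per-char figure assumes the JVM's internal String representation, and real HashMap overhead would add more on top.]

```python
# Back-of-the-envelope check of the UUID -> node-id cache size from the thread.
entries = 90_000_000          # ~90M map entries
key_bytes = 36 * 2            # 36-char UUID string, assuming 2 bytes per char
value_bytes = 8               # Neo4j node id stored as a long
total = entries * (key_bytes + value_bytes)
print(f"{total / 1024**3:.1f} GiB")   # ~6.7 GiB before any map overhead
```

That lines up with the "about 6G heap" figure, and explains why a 10G heap on a 16G machine is already tight once the mmio mappings are added.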
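[Editor's note: since the relationship types are arbitrary URIs, a fixed enum isn't feasible, but the same allocation saving can come from interning the type objects in a map, so each distinct type string produces one shared object instead of a new one per CSV row. A minimal sketch of the pattern, in Python for illustration; in the Groovy script the cached value would be the `DynamicRelationshipType` instance.]

```python
_type_cache = {}

def rel_type(name):
    """Return one shared object per type name instead of allocating per row."""
    t = _type_cache.get(name)
    if t is None:
        t = ("RELTYPE", name)   # stand-in for DynamicRelationshipType.withName(name)
        _type_cache[name] = t
    return t

a = rel_type("http://purl.org/ontology/echonest/beatVariance")
b = rel_type("http://purl.org/ontology/echonest/beatVariance")
print(a is b)  # True: repeated rows reuse the cached instance
```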
>>
>> On Friday, December 12, 2014 1:40:56 AM UTC-8, Michael Hunger wrote:
>>>
>>> It would have been good if you had taken a thread dump from the groovy script.
>>>
>>> But if you look at the memory:
>>>
>>> off heap = 2+2+1+1 => 6
>>> heap = 10
>>> leaves nothing for the OS
>>>
>>> Probably the heap is GC'ing heavily as well.
>>>
>>> So you have to reduce the mmio mapping size.
>>>
>>> Was the output still nodes, or already rels?
>>>
>>> Perhaps also replace DynamicRelationshipType.withName(line.Type) with an enum.
>>>
>>> You can also extend the trace to output the number of nodes and rels.
>>>
>>> Would you be able to share your csv files?
>>>
>>> Michael
>>>
>>> On Fri, Dec 12, 2014 at 10:08 AM, mohsen <[email protected]> wrote:
>>>
>>>> I could not load the data using Groovy either. I increased the groovy heap size to 10G before running the script (using JAVA_OPTS). My machine has 16G of RAM. It halts after loading 41M rows from nodes.csv:
>>>>
>>>> log:
>>>> ....
>>>> 41200000 rows 38431 ms
>>>> 41300000 rows 50988 ms
>>>> 41400000 rows 63747 ms
>>>> 41500000 rows 112758 ms
>>>> 41600000 rows 326497 ms
>>>>
>>>> After logging 41,600,000 rows, nothing happened. I waited 2 hours and there was no progress. The process was still using CPU, but there was no free memory left at that point; I guess that's the reason. I have attached my groovy script, where you can find the memory configuration. I suspect something goes wrong with memory, since it stopped when all of my system's memory was used.
>>>>
>>>> I then switched back to the batch-import tool with --stacktraces. I think the error I got last time was due to a too-small heap, because I did not get that error this time (after allocating a 10GB heap).
>>>> Anyway, I have exactly 86,983,375 nodes and it could load them this time, but I got another error:
>>>>
>>>> Nodes
>>>> [INPUT-------------|ENCODER-----------------------------------------|WRITER] 86M
>>>>
>>>> Calculate dense nodes
>>>>> Import error: InputRelationship:
>>>>>    properties: []
>>>>>    startNode: file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>>>>>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>>>>>    type: http://purl.org/ontology/echonest/beatVariance
>>>>> specified start node that hasn't been imported
>>>>> java.lang.RuntimeException: InputRelationship:
>>>>>    properties: []
>>>>>    startNode: file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>>>>>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>>>>>    type: http://purl.org/ontology/echonest/beatVariance
>>>>> specified start node that hasn't been imported
>>>>>    at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:54)
>>>>>    at org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.anyStillExecuting(PollingExecutionMonitor.java:71)
>>>>>    at org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.finishAwareSleep(PollingExecutionMonitor.java:94)
>>>>>    at org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.monitor(PollingExecutionMonitor.java:62)
>>>>>    at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:221)
>>>>>    at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:139)
>>>>>    at org.neo4j.tooling.ImportTool.main(ImportTool.java:212)
>>>>> Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: InputRelationship:
>>>>>    properties: []
>>>>>    startNode: file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>>>>>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>>>>>    type: http://purl.org/ontology/echonest/beatVariance
>>>>> specified start node that hasn't been imported
>>>>>    at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.ensureNodeFound(CalculateDenseNodesStep.java:95)
>>>>>    at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:61)
>>>>>    at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:38)
>>>>>    at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep$2.run(ExecutorServiceStep.java:81)
>>>>>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>    at java.lang.Thread.run(Thread.java:745)
>>>>>    at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:99)
>>>>
>>>> It seems that it cannot find the start node of a relationship, even though both nodes exist in nodes.csv (I did a grep to be sure). So I don't know what is going wrong. Do you have any idea? Could it be related to the id of the start node, "file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal"?
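[Editor's note: grep matches substrings, so it can report an id as "present" even when the importer's exact-match lookup fails (e.g. because of a duplicate header row, stray whitespace, or a delimiter difference). A quick exact-match consistency check is more conclusive. A sketch, assuming tab-delimited files with the node id in the first column of nodes.csv and the relationship start/end ids in the first two columns of rels.csv; adjust the delimiter and column indices to the actual layout.]

```python
import csv

def check_rel_endpoints(nodes_path, rels_path, delimiter="\t"):
    """Report relationship endpoints whose exact id never appears in the
    nodes file, noting when only surrounding whitespace differs."""
    with open(nodes_path, newline="") as f:
        raw_ids = {row[0] for row in csv.reader(f, delimiter=delimiter) if row}
    stripped_ids = {i.strip() for i in raw_ids}
    problems = []
    with open(rels_path, newline="") as f:
        for row in csv.reader(f, delimiter=delimiter):
            for endpoint in row[:2]:   # assumed: start and end ids come first
                if endpoint not in raw_ids:
                    reason = "whitespace" if endpoint.strip() in stripped_ids else "missing"
                    problems.append((endpoint, reason))
    return problems
```

Running this over the real files would show whether "...analyze-example.rdf#signal" truly matches a node id byte for byte, or only looks like it does under grep.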
>>>> On Thursday, December 11, 2014 10:02:05 PM UTC-8, Michael Hunger wrote:
>>>>>
>>>>> The groovy one should work fine too. I wanted to augment the post with one that has @CompileStatic so that it's faster.
>>>>>
>>>>> I'd also be interested in the --stacktraces output of the batch-import tool of Neo4j 2.2; perhaps you can let it run overnight or in the background.
>>>>>
>>>>> Cheers, Michael
>>>>>
>>>>> On Fri, Dec 12, 2014 at 3:34 AM, mohsen <[email protected]> wrote:
>>>>>
>>>>>> I guess the core code for both batch-import and LOAD CSV is the same, so why do you think running it from Cypher (rather than through batch-import) helps? I am trying the groovy batch-inserter <https://gist.github.com/jexp/0617412dcdd644fd520b#file-import_kaggle-groovy> now; I will post how it goes.
>>>>>>
>>>>>> On Thursday, December 11, 2014 5:44:36 AM UTC-8, Andrii Stesin wrote:
>>>>>>>
>>>>>>> I'd suggest you take a look at the last 5-7 posts in this recent thread <https://groups.google.com/forum/#!topic/neo4j/jSFtnD5OHxg>. You basically don't need any "batch import" command - I'd suggest using just the plain LOAD CSV functionality from Cypher, and you can fill your database step by step.
>>>>>>>
>>>>>>> WBR,
>>>>>>> Andrii
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google Groups "Neo4j" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>> For more options, visit https://groups.google.com/d/optout.
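[Editor's note: the LOAD CSV route Andrii suggests would look roughly like this in Cypher. A sketch only: the file path, the `Resource` label, and the column names are assumptions about the CSV layout, and PERIODIC COMMIT keeps each transaction's memory footprint bounded.]

```cypher
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/path/to/nodes.csv" AS line
CREATE (:Resource {id: line.id, label: line.label, value: line.value});
```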
