Right, that's the problem with an RDF model that only uses relationships to represent properties: you won't get the performance that you would get with a real property-graph model.
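To make the difference concrete, here is a minimal Cypher sketch; the Signal label and beatVariance property are hypothetical, borrowed from the predicate URI that appears in the import error further down the thread, and '...#signal' stands in for the full resource URI:

    // RDF 1:1: every literal is its own node, so reading one value costs a traversal
    MATCH (s {uri: '...#signal'})-[:`http://purl.org/ontology/echonest/beatVariance`]->(v)
    RETURN v.value;

    // Clean property graph: the value sits directly on the node
    MATCH (s:Signal {uri: '...#signal'})
    RETURN s.beatVariance;

On top of the extra hop per literal, the RDF-style model also inflates the node store and the caches with one extra node per value.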
I'll share the version separately.

Cheers, Michael

On Fri, Dec 12, 2014 at 12:07 PM, mohsen <[email protected]> wrote:

> I'd appreciate it if you could get me the newer version; I am already
> using 2.2.0-M01.
>
> I want to run some graph queries over my RDF data. First, I loaded the
> data into the Virtuoso triple store (which took 2-3 hours), but could not
> get results for my SPARQL queries in a reasonable time. That is why I
> decided to load the data into Neo4j, to be able to run my queries.
>
> I am importing RDF into Neo4j only for a specific research problem. I
> need to extract some patterns from the RDF data, and I have to write
> queries that require some sort of graph traversal. I don't want to do
> reasoning over the RDF data. The graph structure looks simple: nodes only
> have a Label (Uri or Literal) and a Value, and relationships don't have
> any properties.
>
> On Friday, December 12, 2014 2:41:36 AM UTC-8, Michael Hunger wrote:
>>
>> Your ids are UUIDs, right? So that's 36 chars (2 bytes each) -> 72 bytes
>> per id, and Neo4j ids are longs of 8 bytes, so ~80 bytes per entry for
>> 90M entries. You should allocate about 6G of heap.
>>
>> Btw., importing RDF 1:1 into Neo4j is not a good idea in the first place.
>>
>> You should model a clean property graph and import INTO that model.
>>
>> Re the batch-import: it's a bug that has been fixed after the milestone;
>> I'll try to get you a newer version to try.
>>
>> Cheers, Michael
>>
>> On Fri, Dec 12, 2014 at 11:26 AM, mohsen <[email protected]> wrote:
>>
>>> Thanks, Michael, for following up on my problem. With the Groovy
>>> script, the output was still at the nodes stage. It is not feasible to
>>> use an enum for the relationship types: the types are URIs of ontology
>>> predicates coming from the CSV file, and there are many of them.
>>> However, I think the problem is that this script requires more than
>>> 10GB of heap, because it needs to keep the nodes in memory (in a map)
>>> to use them later for creating the relationships. So I guess even
>>> reducing the mmio mapping size won't solve the problem; I will try it
>>> tomorrow, though.
>>>
>>> Regarding the batch-import command, do you have any idea why I am
>>> getting that error?
>>>
>>> On Friday, December 12, 2014 1:40:56 AM UTC-8, Michael Hunger wrote:
>>>>
>>>> It would have been good if you had taken a thread dump of the Groovy
>>>> script.
>>>>
>>>> But if you look at the memory:
>>>>
>>>> off-heap (mmio) = 2+2+1+1 => 6
>>>> heap = 10
>>>>
>>>> That leaves nothing for the OS.
>>>>
>>>> Probably the heap is GC'ing heavily as well.
>>>>
>>>> So you have to reduce the mmio mapping size.
>>>>
>>>> Was the output still at nodes or already at rels?
>>>>
>>>> Perhaps also replace DynamicRelationshipType.withName(line.Type) with
>>>> an enum.
>>>>
>>>> You can also extend the trace to output the number of nodes and rels.
>>>>
>>>> Would you be able to share your csv files?
>>>>
>>>> Michael
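Spelled out against the 16G machine mentioned below, that budget works out roughly as follows (a back-of-the-envelope sum, not a measurement):

    memory-mapped store files (off-heap):  2 + 2 + 1 + 1  =   6G
    JVM heap (set via JAVA_OPTS):                              10G
                                                      total:  16G
    left for the OS, page cache, and everything else:         ~0G

With nothing left over, the OS starts paging and the import appears to hang, which matches the symptoms reported below.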
>>>> On Fri, Dec 12, 2014 at 10:08 AM, mohsen <[email protected]> wrote:
>>>>
>>>>> I could not load the data using Groovy either. I increased the Groovy
>>>>> heap size to 10G before running the script (using JAVA_OPTS). My
>>>>> machine has 16G of RAM. It halts after loading 41M rows from
>>>>> nodes.csv:
>>>>>
>>>>> log:
>>>>> ....
>>>>> 41200000 rows 38431 ms
>>>>> 41300000 rows 50988 ms
>>>>> 41400000 rows 63747 ms
>>>>> 41500000 rows 112758 ms
>>>>> 41600000 rows 326497 ms
>>>>>
>>>>> After logging 41,600,000 rows, nothing happened. I waited 2 hours and
>>>>> there was no progress. The process was still using CPU, but there was
>>>>> no free memory left at that point; I guess that's the reason. I have
>>>>> attached my Groovy script, where you can find the memory
>>>>> configuration. I suspect something goes wrong with memory, since it
>>>>> stopped exactly when all of my system's memory was used up.
>>>>>
>>>>> I then switched back to the batch-import tool with --stacktraces. I
>>>>> think the error I got last time was due to a too-small heap, because I
>>>>> did not get that error this time (after allocating a 10GB heap).
>>>>> Anyway, I have exactly 86983375 nodes, and it could load the nodes
>>>>> this time, but I got another error:
>>>>>
>>>>> Nodes
>>>>> [INPUT-------------|ENCODER-----------------------------------------|WRITER] 86M
>>>>>
>>>>> Calculate dense nodes
>>>>>> Import error: InputRelationship:
>>>>>>    properties: []
>>>>>>    startNode: file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>>>>>>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>>>>>>    type: http://purl.org/ontology/echonest/beatVariance
>>>>>> specified start node that hasn't been imported
>>>>>> java.lang.RuntimeException: InputRelationship:
>>>>>>    properties: []
>>>>>>    startNode: file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>>>>>>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>>>>>>    type: http://purl.org/ontology/echonest/beatVariance
>>>>>> specified start node that hasn't been imported
>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:54)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.anyStillExecuting(PollingExecutionMonitor.java:71)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.finishAwareSleep(PollingExecutionMonitor.java:94)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.monitor(PollingExecutionMonitor.java:62)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:221)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:139)
>>>>>>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:212)
>>>>>> Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: InputRelationship:
>>>>>>    properties: []
>>>>>>    startNode: file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>>>>>>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>>>>>>    type: http://purl.org/ontology/echonest/beatVariance
>>>>>> specified start node that hasn't been imported
>>>>>>     at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.ensureNodeFound(CalculateDenseNodesStep.java:95)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:61)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:38)
>>>>>>     at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep$2.run(ExecutorServiceStep.java:81)
>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>>     at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:99)
>>>>>
>>>>> It seems that it cannot find the start and end nodes of a
>>>>> relationship. However, both nodes exist in nodes.csv (I did a grep to
>>>>> be sure), so I don't know what goes wrong. Do you have any idea? Can
>>>>> it be related to the id of the start node
>>>>> "file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal"?
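That "specified start node that hasn't been imported" error means the importer found no node whose id matches the relationship's startNode string exactly, byte for byte; the stack trace points at org.neo4j.tooling.ImportTool, which matches input purely on the raw id strings in the CSV files. A minimal sketch of the expected shape (the property column names are hypothetical; only :ID, :START_ID, :END_ID, and :TYPE are the tool's reserved headers):

    nodes.csv
    :ID,label,value
    file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal,Uri,
    82A4CB6E-7250-1634-DBB8-0297C5259BB1,Uri,

    rels.csv
    :START_ID,:END_ID,:TYPE
    file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal,82A4CB6E-7250-1634-DBB8-0297C5259BB1,http://purl.org/ontology/echonest/beatVariance

Differences a grep can hide are worth ruling out: surrounding quotes in one file but not the other, trailing whitespace or a stray \r on the id column, or one file URL-encoded (Music%20RDF) while the other contains the raw space.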
>>>>> On Thursday, December 11, 2014 10:02:05 PM UTC-8, Michael Hunger wrote:
>>>>>>
>>>>>> The Groovy one should work fine too. I wanted to augment the post
>>>>>> with a version that has @CompileStatic so that it's faster.
>>>>>>
>>>>>> I'd also be interested in the --stacktraces output of the
>>>>>> batch-import tool of Neo4j 2.2; perhaps you can let it run overnight
>>>>>> or in the background.
>>>>>>
>>>>>> Cheers, Michael
>>>>>>
>>>>>> On Fri, Dec 12, 2014 at 3:34 AM, mohsen <[email protected]> wrote:
>>>>>>
>>>>>>> I guess the core code for both batch-import and LOAD CSV is the
>>>>>>> same, so why do you think running it from Cypher (rather than
>>>>>>> through batch-import) helps? I am trying the Groovy batch-inserter
>>>>>>> <https://gist.github.com/jexp/0617412dcdd644fd520b#file-import_kaggle-groovy>
>>>>>>> now and will post how it goes.
>>>>>>>
>>>>>>> On Thursday, December 11, 2014 5:44:36 AM UTC-8, Andrii Stesin wrote:
>>>>>>>>
>>>>>>>> I'd suggest you take a look at the last 5-7 posts in this recent
>>>>>>>> thread <https://groups.google.com/forum/#!topic/neo4j/jSFtnD5OHxg>.
>>>>>>>> Basically, you don't need any "batch import" command - I'd suggest
>>>>>>>> you just use the plain LOAD CSV functionality from Cypher and fill
>>>>>>>> your database step by step.
>>>>>>>>
>>>>>>>> WBR,
>>>>>>>> Andrii
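If you do go the LOAD CSV route, here is a minimal sketch of the node pass (the file path and the Id/Label/Value column names are assumptions based on the structure described earlier in the thread). The unique constraint gives you an index, so the relationship pass can look nodes up in the store instead of holding a ~90M-entry map on the heap:

    CREATE CONSTRAINT ON (r:Resource) ASSERT r.uri IS UNIQUE;

    USING PERIODIC COMMIT 10000
    LOAD CSV WITH HEADERS FROM 'file:///path/to/nodes.csv' AS row
    CREATE (:Resource {uri: row.Id, label: row.Label, value: row.Value});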
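The relationship pass could then look like the sketch below (same caveats about the column names). Note that plain Cypher cannot take a relationship type from a column value - that is what DynamicRelationshipType.withName(line.Type) does in the Groovy script - so this sketch stores the predicate URI as a property on a generic relationship type, and queries filter on that property instead:

    USING PERIODIC COMMIT 10000
    LOAD CSV WITH HEADERS FROM 'file:///path/to/rels.csv' AS row
    MATCH (a:Resource {uri: row.Start})
    MATCH (b:Resource {uri: row.End})
    CREATE (a)-[:PREDICATE {uri: row.Type}]->(b);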
