I don't have any problem with sharing my CSV files, but I don't know how and where I can share such large files.
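If upload size is the only obstacle, gzip usually shrinks CSV text dramatically before sharing. A minimal sketch (the file names here are placeholders, not the actual files from the thread):

```python
import gzip
import shutil

def gzip_file(src, dst):
    """Compress src to dst with gzip; repetitive CSV text compresses well."""
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

# Tiny demo file; substitute the real nodes.csv / rels.csv paths.
with open("demo.csv", "w") as f:
    f.write("id,name\n1,a\n2,b\n")
gzip_file("demo.csv", "demo.csv.gz")
```

`shutil.copyfileobj` streams in chunks, so this works for multi-gigabyte files without loading them into memory.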
On Friday, December 12, 2014 2:26:17 AM UTC-8, mohsen wrote:
>
> Thanks Michael for following my problem. In the Groovy script, the output
> still showed nodes. It is not feasible to use an enum for relationship
> types: the types are URIs of ontology predicates coming from the CSV file,
> and there are many of them. However, I think the problem is that this
> script requires more than 10GB of heap, because it needs to store the
> nodes in memory (in a map) to use them later when creating relationships.
> So I guess even reducing the mmio mapping size won't solve the problem; I
> will try it tomorrow, though.
>
> Regarding the batch-import command, do you have any idea why I am getting
> that error?
>
> On Friday, December 12, 2014 1:40:56 AM UTC-8, Michael Hunger wrote:
>>
>> It would have been good if you had taken a thread dump from the Groovy
>> script.
>>
>> But if you look at the memory:
>>
>> off-heap = 2 + 2 + 1 + 1 => 6 GB
>> heap = 10 GB
>> leaves nothing for the OS
>>
>> Probably the heap GCs as well.
>>
>> So you have to reduce the mmio mapping size.
>>
>> Was the output still nodes, or already rels?
>>
>> Perhaps also replace DynamicRelationshipType.withName(line.Type) with an
>> enum.
>>
>> You can also extend the trace to output the number of nodes and rels.
>>
>> Would you be able to share your CSV files?
>>
>> Michael
>>
>> On Fri, Dec 12, 2014 at 10:08 AM, mohsen <[email protected]> wrote:
>>
>>> I could not load the data using Groovy either. I increased the Groovy
>>> heap size to 10G before running the script (using JAVA_OPTS). My
>>> machine has 16G of RAM. It halts after loading 41M rows from nodes.csv:
>>>
>>> log:
>>> ....
>>> 41200000 rows 38431 ms
>>> 41300000 rows 50988 ms
>>> 41400000 rows 63747 ms
>>> 41500000 rows 112758 ms
>>> 41600000 rows 326497 ms
>>>
>>> After logging 41,600,000 rows, nothing happened. I waited 2 hours and
>>> there was no progress. The process was still using CPU, but there was
>>> no free memory at that point. I guess that's the reason.
>>> I have attached my Groovy script, where you can find the memory
>>> configurations. I guess something goes wrong with memory, since it
>>> stopped when all of my system's memory was used.
>>>
>>> I then switched back to the batch-import tool with --stacktraces. I
>>> think the error I got last time was due to the small heap size, because
>>> I did not get that error this time (after allocating a 10GB heap).
>>> Anyway, I have exactly 86983375 nodes and it could load the nodes this
>>> time, but I got another error:
>>>
>>> Nodes
>>> [INPUT-------------|ENCODER-----------------------------------------|WRITER]
>>> 86M
>>>
>>> Calculate dense nodes
>>>> Import error: InputRelationship:
>>>>    properties: []
>>>>    startNode: file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>>>>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>>>>    type: http://purl.org/ontology/echonest/beatVariance
>>>> specified start node that hasn't been imported
>>>> java.lang.RuntimeException: InputRelationship:
>>>>    properties: []
>>>>    startNode: file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>>>>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>>>>    type: http://purl.org/ontology/echonest/beatVariance
>>>> specified start node that hasn't been imported
>>>>    at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:54)
>>>>    at org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.anyStillExecuting(PollingExecutionMonitor.java:71)
>>>>    at org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.finishAwareSleep(PollingExecutionMonitor.java:94)
>>>>    at org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.monitor(PollingExecutionMonitor.java:62)
>>>>    at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:221)
>>>>    at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:139)
>>>>    at org.neo4j.tooling.ImportTool.main(ImportTool.java:212)
>>>> Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: InputRelationship:
>>>>    properties: []
>>>>    startNode: file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>>>>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>>>>    type: http://purl.org/ontology/echonest/beatVariance
>>>> specified start node that hasn't been imported
>>>>    at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.ensureNodeFound(CalculateDenseNodesStep.java:95)
>>>>    at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:61)
>>>>    at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:38)
>>>>    at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep$2.run(ExecutorServiceStep.java:81)
>>>>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>    at java.lang.Thread.run(Thread.java:745)
>>>>    at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:99)
>>>
>>> It seems that it cannot find the start and end nodes of a relationship.
>>> However, both nodes exist in nodes.csv (I did a grep to be sure), so I
>>> don't know what goes wrong. Do you have any idea? Can it be related to
>>> the id of the start node
>>> "file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal"?
>>>
>>> On Thursday, December 11, 2014 10:02:05 PM UTC-8, Michael Hunger wrote:
>>>>
>>>> The Groovy one should work fine too.
>>>> I wanted to augment the post with one that has @CompileStatic so that
>>>> it's faster.
>>>>
>>>> I'd also be interested in the --stacktraces output of the batch-import
>>>> tool of Neo4j 2.2; perhaps you can let it run overnight or in the
>>>> background.
>>>>
>>>> Cheers, Michael
>>>>
>>>> On Fri, Dec 12, 2014 at 3:34 AM, mohsen <[email protected]> wrote:
>>>>
>>>>> I guess the core code for both batch-import and LOAD CSV is the same,
>>>>> so why do you think running it from Cypher (rather than through
>>>>> batch-import) helps? I am trying the Groovy batch-inserter
>>>>> <https://gist.github.com/jexp/0617412dcdd644fd520b#file-import_kaggle-groovy>
>>>>> now and will post how it goes.
>>>>>
>>>>> On Thursday, December 11, 2014 5:44:36 AM UTC-8, Andrii Stesin wrote:
>>>>>>
>>>>>> I'd suggest you take a look at the last 5-7 posts in this recent
>>>>>> thread <https://groups.google.com/forum/#!topic/neo4j/jSFtnD5OHxg>.
>>>>>> You basically don't need any "batch import" command - I'd suggest
>>>>>> you use just the plain LOAD CSV functionality from Cypher, and you
>>>>>> will fill your database step by step.
>>>>>>
>>>>>> WBR,
>>>>>> Andrii
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Neo4j" group.
>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>> send an email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
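Since grep found both ids in nodes.csv yet the importer reports them as missing, the mismatch is likely something grep won't reveal: trailing whitespace, quoting differences, or a BOM. A hedged pre-import check that compares relationship endpoints against node ids byte-for-byte; the column positions and file names are assumptions, not taken from the thread:

```python
import csv

def find_dangling_endpoints(nodes_path, rels_path,
                            node_id_col=0, start_col=0, end_col=1):
    """Report relationship endpoints whose id is absent from the node file.

    Ids are compared exactly (no trimming or unquoting beyond csv parsing),
    so an id that "looks" present to grep but carries trailing whitespace
    or different quoting is flagged here.
    """
    with open(nodes_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip header row
        node_ids = {row[node_id_col] for row in reader}

    dangling = []
    with open(rels_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip header row
        for lineno, row in enumerate(reader, start=2):
            for col in (start_col, end_col):
                if row[col] not in node_ids:
                    dangling.append((lineno, row[col]))
    return dangling
```

Running this over nodes.csv and rels.csv before the import, and printing `repr()` of each flagged id, should make an invisible difference (such as `'...#signal '` vs `'...#signal'`) obvious.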
