I could not load the data using Groovy either. I increased the Groovy heap 
size to 10G before running the script (using JAVA_OPTS). My machine has 16G 
of RAM. The import stalls after loading about 41M rows from nodes.csv:


log: 
....
41200000 rows 38431 ms
41300000 rows 50988 ms 
41400000 rows 63747 ms 
41500000 rows 112758 ms 
41600000 rows 326497 ms

After logging 41,600,000 rows, nothing more happened. I waited 2 hours and 
there was no progress. The process was still using CPU, but the system had 
no free memory left at that point, so I suspect it ran out of memory. I have 
attached my Groovy script, where you can find the memory configuration.

I then switched back to the batch-import tool with stack traces enabled. I 
think the error I got last time was due to a small heap size, because I did 
not get it this time (after allocating a 10GB heap). I have exactly 86983375 
nodes, and it loaded all of them this time, but I got another error:

 Nodes

[INPUT-------------|ENCODER-----------------------------------------|WRITER] 
> 86M

Calculate dense nodes
> Import error: InputRelationship:
>    properties: []
>    startNode: 
> file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>    type: http://purl.org/ontology/echonest/beatVariance specified start 
> node that hasn't been imported
> java.lang.RuntimeException: InputRelationship:
>    properties: []
>    startNode: 
> file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>    type: http://purl.org/ontology/echonest/beatVariance specified start 
> node that hasn't been imported
> at 
> org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:54)
> at 
> org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.anyStillExecuting(PollingExecutionMonitor.java:71)
> at 
> org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.finishAwareSleep(PollingExecutionMonitor.java:94)
> at 
> org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.monitor(PollingExecutionMonitor.java:62)
> at 
> org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:221)
> at 
> org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:139)
> at org.neo4j.tooling.ImportTool.main(ImportTool.java:212)
> Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: 
> InputRelationship:
>    properties: []
>    startNode: 
> file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>    type: http://purl.org/ontology/echonest/beatVariance specified start 
> node that hasn't been imported
> at 
> org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.ensureNodeFound(CalculateDenseNodesStep.java:95)
> at 
> org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:61)
> at 
> org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:38)
> at 
> org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep$2.run(ExecutorServiceStep.java:81)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:99)


It seems that it cannot find the start node of a relationship. However, both 
the start and end nodes exist in nodes.csv (I did a grep to be sure), so I 
don't know what is going wrong. Do you have any idea? Can it be related to 
the ID of the start node, 
"file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal"?
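
Because grep matches substrings, a hit in nodes.csv does not prove that the 
ID field is exactly equal to the ID used in the relationship file. A minimal 
Python sketch of an exact-match check follows. The function name 
find_missing_ids, the ID being in the first column, the comma delimiter, and 
the header row are all assumptions for illustration, not details taken from 
the attached script or from the import tool.

```python
import csv


def find_missing_ids(nodes_path, wanted_ids, id_column=0, delimiter=","):
    """Return the wanted IDs that never appear as an exact ID field.

    Assumptions (hypothetical, adjust to the real file layout):
    the ID is in column `id_column`, fields are separated by `delimiter`,
    and the first row is a header.
    """
    remaining = set(wanted_ids)
    with open(nodes_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        next(reader, None)  # skip the assumed header row
        for row in reader:
            if len(row) <= id_column:
                continue  # skip short or empty rows
            # Exact comparison of the whole field, unlike a grep substring hit
            remaining.discard(row[id_column])
            if not remaining:
                break  # every wanted ID was found, stop reading early
    return remaining
```

Calling find_missing_ids("nodes.csv", [start_id, end_id]) with the two IDs 
from the error message returns an empty set only if both appear as complete 
ID fields. The file is streamed row by row, so only the handful of wanted IDs 
is held in memory rather than all 86M node IDs.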

On Thursday, December 11, 2014 10:02:05 PM UTC-8, Michael Hunger wrote:
>
> The groovy one should work fine too. I wanted to augment the post with one 
> that has @CompileStatic so that it's faster. 
>
> I'd be also interested in the --stacktraces output of the batch-import 
> tool of Neo4j 2.2, perhaps you can let it run over night or in the 
> background.
>
> Cheers, Michael
>
> On Fri, Dec 12, 2014 at 3:34 AM, mohsen <[email protected]> wrote:
>
>> I guess the core code for both batch-import and Load CSV is the same, why 
>> do you think running it from Cypher (rather than through batch-import) 
>> helps? I am trying groovy and batch-inserter 
>> <https://gist.github.com/jexp/0617412dcdd644fd520b#file-import_kaggle-groovy>
>>  now, 
>> will post how it goes.
>>
>>
>> On Thursday, December 11, 2014 5:44:36 AM UTC-8, Andrii Stesin wrote:
>>>
>>> I'd suggest you take a look at last 5-7 posts in this recent thread 
>>> <https://groups.google.com/forum/#!topic/neo4j/jSFtnD5OHxg>. You don't 
>>> basically need any "batch import" command - I'd suggest you to use just a 
>>> plain LOAD CSV functionality from Cypher, and you will just fill your 
>>> database step by step.
>>>
>>> WBR,
>>> Andrii
>>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>


Attachment: csv2neo4j.groovy
Description: Binary data
