Your IDs are UUIDs or similar? So 36 chars (72 bytes at 2 bytes per char) *
90M, and Neo IDs are longs w/ 8 bytes, so 80 bytes per entry.
You should allocate about 6-7G heap for that map.
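
A back-of-the-envelope sketch of that estimate. It assumes 2 bytes per char for the string key and ignores JVM object headers and hash map overhead, so the real footprint will be higher; the function and parameter names are only illustrative.

```python
# Rough estimate of the heap used by an in-memory "external id -> node id" map.
# Assumptions (not measured): 2 bytes per char for the key, 8 bytes for the
# long value, no object headers and no hash map overhead.

def estimate_map_bytes(entries: int, key_chars: int = 36,
                       bytes_per_char: int = 2, value_bytes: int = 8) -> int:
    """Return the raw payload size in bytes for `entries` map entries."""
    per_entry = key_chars * bytes_per_char + value_bytes  # 72 + 8 = 80
    return entries * per_entry


total = estimate_map_bytes(90_000_000)
print(f"{total / 1e9:.1f} GB, {total / 2**30:.1f} GiB")  # 7.2 GB, 6.7 GiB
```

On a 16G machine, that map plus the memory-mapped store files leaves little room for anything else.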

Btw. importing RDF 1:1 into Neo4j is not a good idea in the first place.

You should model a clean property graph model and import INTO that model.
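
One possible way to do that, as a minimal sketch: triples with a literal object become properties on the subject node, and triples pointing at another resource become relationships. The tuple layout, the example.org predicates, and the sample values are all made up for illustration and are not taken from your data.

```python
# Sketch: fold RDF triples into a property-graph shape instead of importing
# every triple 1:1.
# Assumption: a triple is (subject, predicate, object, object_is_literal).
from collections import defaultdict


def local_name(uri: str) -> str:
    """Return the last segment of a URI after '#' or '/'."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def to_property_graph(triples):
    nodes = defaultdict(dict)  # node id -> properties
    rels = []                  # (start id, relationship type, end id)
    for subj, pred, obj, is_literal in triples:
        nodes[subj]  # touch the key so the subject node exists
        if is_literal:
            # Literal value: store it as a property on the subject node.
            nodes[subj][local_name(pred)] = obj
        else:
            # Resource value: create the target node and a relationship.
            nodes[obj]
            rels.append((subj, local_name(pred), obj))
    return dict(nodes), rels


# Hypothetical sample triples.
triples = [
    ("track:1", "http://example.org/ontology#title", "Some Song", True),
    ("track:1", "http://example.org/ontology/madeBy", "artist:7", False),
]
nodes, rels = to_property_graph(triples)
print(nodes)
print(rels)
```

Using only the local name as property key or relationship type keeps the example readable; predicates from different ontologies that share a local name would need extra handling.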

As for the batch-import, it's a bug that has been fixed after the milestone.
I'll try to get you a newer version to try.

Cheers, Michael



On Fri, Dec 12, 2014 at 11:26 AM, mohsen <[email protected]> wrote:

> Thanks Michael for following up on my problem. In the Groovy script, the
> output was still with nodes. It is not feasible to use an enum for
> relationship types: the types are URIs of ontology predicates coming from
> the CSV file, and there are many of them. However, I think the problem is
> that this script requires more than 10GB of heap, because it needs to keep
> the nodes in memory (in a map) to use them later for creating relationships.
> So I guess even reducing the mmio mapping size won't solve the problem; I
> will try it tomorrow though.
>
> Regarding the batch-import command, do you have any idea why I am getting
> that error?
>
> On Friday, December 12, 2014 1:40:56 AM UTC-8, Michael Hunger wrote:
>>
>> It would have been good if you had taken a thread dump from the groovy
>> script.
>>
>> but if you look at the memory:
>>
>> off heap = 2+2+1+1 => 6
>> heap = 10
>> leaves nothing for OS
>>
>> probably the heap gc's as well.
>>
>> So you have to reduce the mmio mapping size
>>
>> Was the output still with nodes or already rels?
>>
>> Perhaps also replace DynamicRelationshipType.withName(line.Type) with an
>> enum
>>
>> you can also extend trace to output number of nodes and rels
>>
>> Would you be able to share your csv files?
>>
>> Michael
>>
>>
>>
>> On Fri, Dec 12, 2014 at 10:08 AM, mohsen <[email protected]> wrote:
>>
>>> I could not load the data using Groovy either. I increased the Groovy heap
>>> size to 10G before running the script (using JAVA_OPTS). My machine has 16G
>>> of RAM. It halts after loading 41M rows from nodes.csv:
>>>
>>>
>>> log:
>>> ....
>>> 41200000 rows 38431 ms
>>> 41300000 rows 50988 ms
>>> 41400000 rows 63747 ms
>>> 41500000 rows 112758 ms
>>> 41600000 rows 326497 ms
>>>
>>> After logging 41,600,000 rows, nothing happened. I waited 2 hours and there
>>> was no progress. The process was still using CPU, but there was no free
>>> memory at that time, which I guess is the reason. I have attached my Groovy
>>> script, where you can find the memory configuration. I guess something goes
>>> wrong with memory, since it stopped when all my system's memory was used.
>>>
>>> I then switched back to the batch-import tool with stack traces enabled. I
>>> think the error I got last time was due to a small heap size, because I did
>>> not get that error this time (after allocating 10GB of heap). Anyway, I have
>>> exactly 86983375 nodes and it could load the nodes this time, but I got
>>> another error:
>>>
>>>  Nodes
>>>
>>> [INPUT-------------|ENCODER-----------------------------------------|WRITER]
>>>> 86M
>>>
>>> Calculate dense nodes
>>>> Import error: InputRelationship:
>>>>    properties: []
>>>>    startNode: file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>>>>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>>>>    type: http://purl.org/ontology/echonest/beatVariance specified start node that hasn't been imported
>>>> java.lang.RuntimeException: InputRelationship:
>>>>    properties: []
>>>>    startNode: file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>>>>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>>>>    type: http://purl.org/ontology/echonest/beatVariance specified start node that hasn't been imported
>>>>     at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:54)
>>>>     at org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.anyStillExecuting(PollingExecutionMonitor.java:71)
>>>>     at org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.finishAwareSleep(PollingExecutionMonitor.java:94)
>>>>     at org.neo4j.unsafe.impl.batchimport.staging.PollingExecutionMonitor.monitor(PollingExecutionMonitor.java:62)
>>>>     at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:221)
>>>>     at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:139)
>>>>     at org.neo4j.tooling.ImportTool.main(ImportTool.java:212)
>>>> Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: InputRelationship:
>>>>    properties: []
>>>>    startNode: file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal
>>>>    endNode: 82A4CB6E-7250-1634-DBB8-0297C5259BB1
>>>>    type: http://purl.org/ontology/echonest/beatVariance specified start node that hasn't been imported
>>>>     at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.ensureNodeFound(CalculateDenseNodesStep.java:95)
>>>>     at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:61)
>>>>     at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:38)
>>>>     at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep$2.run(ExecutorServiceStep.java:81)
>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>     at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:99)
>>>
>>>
>>> It seems that it cannot find the start and end nodes of a relationship.
>>> However, both nodes exist in nodes.csv (I did a grep to be sure), so I
>>> don't know what goes wrong. Do you have any idea? Can it be related to the
>>> id of the start node
>>> "file:///Users/mohsen/Desktop/Music%20RDF/echonest/analyze-example.rdf#signal"?
>>> On Thursday, December 11, 2014 10:02:05 PM UTC-8, Michael Hunger wrote:
>>>>
>>>> The groovy one should work fine too. I wanted to augment the post with
>>>> one that has @CompileStatic so that it's faster.
>>>>
>>>> I'd also be interested in the --stacktraces output of the batch-import
>>>> tool of Neo4j 2.2. Perhaps you can let it run overnight or in the
>>>> background.
>>>>
>>>> Cheers, Michael
>>>>
>>>> On Fri, Dec 12, 2014 at 3:34 AM, mohsen <[email protected]> wrote:
>>>>
>>>>> I guess the core code for both batch-import and LOAD CSV is the same, so
>>>>> why do you think running it from Cypher (rather than through batch-import)
>>>>> helps? I am trying Groovy and the batch-inserter
>>>>> <https://gist.github.com/jexp/0617412dcdd644fd520b#file-import_kaggle-groovy>
>>>>> now, and will post how it goes.
>>>>>
>>>>>
>>>>> On Thursday, December 11, 2014 5:44:36 AM UTC-8, Andrii Stesin wrote:
>>>>>>
>>>>>> I'd suggest you take a look at the last 5-7 posts in this recent thread
>>>>>> <https://groups.google.com/forum/#!topic/neo4j/jSFtnD5OHxg>. You
>>>>>> basically don't need any "batch import" command. I'd suggest you just
>>>>>> use the plain LOAD CSV functionality from Cypher and fill your database
>>>>>> step by step.
>>>>>>
>>>>>> WBR,
>>>>>> Andrii
>>>>>>
>>>>>  --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Neo4j" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>
