Rich, thanks so much for looking into it. As far as I understand, this new implementation does not yet support external IDs (i.e. those index lookups).
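The stack trace quoted below shows `Long.parseLong` failing on "owl_Thing", which is consistent with the importer reading a field it expects to be numeric. As a rough sketch of one way to pre-assign numeric IDs before importing (not part of the importer itself), the Python below maps each external key to a number and rewrites the relationship endpoints. It assumes tab-separated files with a header line, and that a node's ID is its 0-based row position in nodes.csv; both are assumptions to check against the importer's README. The helper names (`build_id_map`, `rewrite_rels`, `read_tsv`), the column names (`name`, `start`, `end`, `type`), and the sample rows other than "owl_Thing" are hypothetical.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def build_id_map(node_rows, key_column):
    """Map each node's external key to its 0-based row position.

    Assumption: the importer treats the row position as the node ID.
    """
    id_map = {}
    for position, row in enumerate(node_rows):
        key = row[key_column]
        if key in id_map:
            raise ValueError(f"duplicate external key: {key!r}")
        id_map[key] = position
    return id_map


def rewrite_rels(rel_rows, id_map, start_column="start", end_column="end"):
    """Return copies of the relationship rows with numeric endpoints.

    Raises KeyError if a relationship refers to an unknown node key.
    """
    rewritten = []
    for row in rel_rows:
        new_row = dict(row)
        new_row[start_column] = str(id_map[row[start_column]])
        new_row[end_column] = str(id_map[row[end_column]])
        rewritten.append(new_row)
    return rewritten


# Hypothetical sample data; only "owl_Thing" comes from the real files.
nodes = read_tsv("name\tlabel\nowl_Thing\tThing\nfoo_Bar\tBar\n")
rels = read_tsv("start\tend\ttype\nfoo_Bar\towl_Thing\tsubClassOf\n")

id_map = build_id_map(nodes, key_column="name")
numeric_rels = rewrite_rels(rels, id_map)
print(numeric_rels)
```

For files of the size mentioned below (100M rels), the same idea would need to stream rels.csv row by row instead of loading it into a list; the dictionary of node keys is the only part that has to stay in memory.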
I'm working in parallel on another version that integrates with new kernel-level APIs that provide similar functionality but will also support external IDs.

So right now, for that new importer, you have to provide node IDs externally.

Cheers,
Michael

On Sun, Jun 1, 2014 at 12:33 AM, Rich Morin <r...@cfcl.com> wrote:

> I've been trying out the new "Superfast Batch Importer":
>
>   https://github.com/neo4j-contrib/superfast-batch-importer
>
> It looks very promising, but I'm having a few problems. Help?
>
> Background
>
> My real files will be enormous (eg, 100M rels), so I'm using
> moderate-sized test files:
>
>   $ wc -l tmp/[nr]*
>     128299 tmp/nodes.csv
>      92661 tmp/rels.csv
>     220960 total
>
> Here is my batch.properties file, set up for a 32GB, 8-core Mac Pro,
> running OSX 10.7.5:
>
>   cache_type=none
>   use_memory_mapped_buffers=true
>   # 14 bytes per node
>   neostore.nodestore.db.mapped_memory=2G
>   # 33 bytes per relationship
>   neostore.relationshipstore.db.mapped_memory=20G
>   # 38 bytes per property
>   neostore.propertystore.db.mapped_memory=1G
>   # 60 bytes per long-string block
>   neostore.propertystore.db.strings.mapped_memory=1G
>   neostore.propertystore.db.index.keys.mapped_memory=50M
>   neostore.propertystore.db.index.mapped_memory=50M
>   # set up indexing
>   batch_import.node_index.Xhas_airport_code=exact
>   batch_import.node_index.XhasArea=exact
>   batch_import.node_index.Xhas_family_name=exact
>   batch_import.node_index.Xhas_GeoNames_Class_ID=exact
>   batch_import.node_index.Xhas_GeoNames_Entity_ID=exact
>   batch_import.node_index.Xhas_given_name=exact
>   batch_import.node_index.Xhas_gloss=exact
>   batch_import.node_index.Xhas_ISBN=exact
>   batch_import.node_index.Xhas_IMDB=exact
>   batch_import.node_index.Xhas_language_code=exact
>   batch_import.node_index.Xhas_motto=exact
>   batch_import.node_index.Xhas_official_language=exact
>   batch_import.node_index.Xhas_Synset_ID=exact
>   batch_import.node_index.Xhas_top-level_domain=exact
>   batch_import.node_index.Xhas_three-letter_language_code=exact
>   batch_import.node_index.Xis_preferred_meaning_of=exact
>   batch_import.node_index.Xlabel=exact
>   batch_import.node_index.Xns_name=exact
>   batch_import.node_index.Xpreferred_label=exact
>
>
> Behavior
>
> My Terminal output looks rather messy; perhaps some output buffering
> tweaks or newlines are needed:
>
>   $ time import.sh test.db -nodes ../nodes.csv -rels ../rels.csv
>   Neo4j Data Importer
>   Importer -db-directory <graph.db> -nodes <nodes.csv> -rels <rels.csv> -debug <debug config>
>
>   Using Existing Configuration File
>   [Current time:2014-05-31 15:11:32.258][Compile Time:Importer $ batch-import-2.1.0 $ 31/05/2014 04:12:24]
>   Node Import: [5] Property[292048] Node[128298] Relationship[0] Label[0] Disk[13 mb, 0 mb/sec] FreeMem[3173 mb]
>   [2014-05-31 15:11:41.02] Node file [nodes.csv] imported in 8 secs - [Property[292048] Node[128298] Relationship[0] Label[0]]
>   [2014-05-31 15:11:41.02]Node Import complete in 8 secs - [Property[292048] Node[128298] Relationship[0] Label[0]]java.lang.NumberFormatException: For input string: "owl_Thing"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>     at java.lang.Long.parseLong(Long.java:441)
>     at java.lang.Long.parseLong(Long.java:483)
>     at org.neo4j.batchimport.importer.structs.AbstractDataBuffer.getLong(AbstractDataBuffer.java:159)
>     at org.neo4j.unsafe.batchinsert.BatchInserterImplNew.accumulateNodeCount(BatchInserterImplNew.java:743)
>     at org.neo4j.batchimport.importer.stages.NodeStatsAccumulatorStage$2.execute(NodeStatsAccumulatorStage.java:24)
>     at org.neo4j.batchimport.importer.stages.ImportWorker.processData(ImportWorker.java:144)
>     at org.neo4j.batchimport.importer.stages.ImportWorker.run(ImportWorker.java:196)
>   Invoke stage method failed:ImportNode_Stage1:[Error in accumulateNodeCount - For input string: "owl_Thing"]:1
>   org.neo4j.kernel.api.Exceptions.BatchImportException: [Error in accumulateNodeCount - For input string: "owl_Thing"]
>     at org.neo4j.batchimport.importer.stages.ImportWorker.processData(ImportWorker.java:152)
>     at org.neo4j.batchimport.importer.stages.ImportWorker.run(ImportWorker.java:196)
>   Import worker:ImportNode_Stage1:[Error in accumulateNodeCount - For input string: "owl_Thing"]
>   Uncaught exception: java.lang.RuntimeException: [Error in accumulateNodeCount - For input string: "owl_Thing"]
>   java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>     at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:321)
>     at org.neo4j.batchimport.importer.structs.DataBufferBlockingQ.putBuffer(DataBufferBlockingQ.java:243)
>     at org.neo4j.batchimport.importer.stages.ImportWorker.writeData(ImportWorker.java:160)
>     at org.neo4j.batchimport.importer.stages.ImportWorker.run(ImportWorker.java:198)
>   java.lang.InterruptedException: sleep interruptedImport worker:ImportNode_Stage0:null
>
>     at java.lang.Thread.sleep(Native Method)Uncaught exception: java.lang.RuntimeException
>
>     at org.neo4j.batchimport.importer.stages.ImportWorker.readData(ImportWorker.java:131)
>     at org.neo4j.batchimport.importer.stages.ImportWorker.run(ImportWorker.java:194)
>   Import worker:ImportNode_Stage4:sleep interrupted
>   java.lang.InterruptedException: sleep interruptedUncaught exception: java.lang.RuntimeException: sleep interrupted
>   ...
>
> Aside from the fact that the SBI doesn't like "owl_Thing", I have no clue
> about what the problem is.
>
>
> Also, the Relationship Prescan counter (827, below) is only changing once
> a second:
>
>   [827] Property[292048] Node[128298] Relationship[0] Label[0] Disk[13 mb, 0 mb/sec] FreeMem[3088 mb]
>
> If this is counting relationships, this load is gonna take a loooong
> time...
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to neo4j+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.