[
https://issues.apache.org/jira/browse/JENA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andy Seaborne resolved JENA-2044.
---------------------------------
Fix Version/s: Jena 4.2.0
Assignee: Andy Seaborne
Resolution: Feedback Received
> tdb2.tdbloader crashses loading wikidata
> ----------------------------------------
>
> Key: JENA-2044
> URL: https://issues.apache.org/jira/browse/JENA-2044
> Project: Apache Jena
> Issue Type: Bug
> Components: Cmd line tools, Jena, TDB2
> Affects Versions: Jena 3.14.0, Jena 3.17.0
> Environment: {code:bash}
> $ java --version
> openjdk 11.0.9.1 2020-11-04
> OpenJDK Runtime Environment (build 11.0.9.1+1-post-Debian-1deb10u2)
> OpenJDK 64-Bit Server VM (build 11.0.9.1+1-post-Debian-1deb10u2, mixed mode,
> sharing)
> $ uname -r
> 4.19.0-14-amd64
> $ lsb_release -da
> No LSB modules are available.
> Distributor ID: Debian
> Description: Debian GNU/Linux 10 (buster)
> Release: 10
> Codename: buster
> {code}
> Reporter: Alexander Bigerl
> Assignee: Andy Seaborne
> Priority: Major
> Fix For: Jena 4.2.0
>
> Attachments: FusekiWikidataLoaderDockerfile,
> fuseki-wikidata-2020-11-11_(docker_8c_J16g_R256g_K256g_phased).log,
> hs_err_pid28709.log
>
>
> Apache jena crashes when loading wikidata truthy 2020-11-11 (it is not
> available any more via wikidata, but a backup can be found here:
> [https://hobbitdata.informatik.uni-leipzig.de/wikidata-20201111-truthy-BETA.nt.bz2]
> command run was:
> {code:bash}
> cgmemtime
> /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/triplestores/jena/apache-jena-3.17.0/bin/tdb2.tdbloader
> --loader=sequential --loc
> /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/databases/fuseki/wikidata-2020-11-11/
>
> /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/datasets/wikidata-2020-11-11/wikidata-20201111-truthy-BETA.nt
> 2>&1 | tee
> /upb/users/d/dice-gr/profiles/unix/cs/triplestore-benchmark/logs/load/fuseki-wikidata-2020-11-11.log
> {code}
>
> The end of the logfile is:
> {code:bash}
> 06:56:52 INFO loader :: Add: 729,000,000 SPO->OSP (Batch: 237,642 /
> Avg: 300,050)
> 06:56:57 INFO loader :: Add: 730,000,000 SPO->OSP (Batch: 234,962 /
> Avg: 299,936)
> 06:56:57 INFO loader :: Elapsed: 2,433.85 seconds [2021/02/12
> 06:56:57 CET]
> 06:57:01 INFO loader :: Add: 731,000,000 SPO->OSP (Batch: 233,863 /
> Avg: 299,820)
> 06:57:05 INFO loader :: Add: 732,000,000 SPO->OSP (Batch: 269,978 /
> Avg: 299,775)
> 06:57:08 INFO loader :: Add: 733,000,000 SPO->OSP (Batch: 281,373 /
> Avg: 299,748)
> 06:57:12 INFO loader :: Add: 734,000,000 SPO->OSP (Batch: 285,143 /
> Avg: 299,727)
> 06:57:15 INFO loader :: Add: 735,000,000 SPO->OSP (Batch: 290,023 /
> Avg: 299,714)
> 06:57:19 INFO loader :: Add: 736,000,000 SPO->OSP (Batch: 290,951 /
> Avg: 299,701)
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (malloc) failed to allocate 2097152 bytes for
> AllocateHeap
> # An error report file with more information is saved as:
> # /home/d/dice-gr/profiles/unix/cs/triplestore-benchmark/hs_err_pid28709.log
> Child user: 70961.063 s
> Child sys : 5817.787 s
> Child wall: 76243.666 s
> Child high-water RSS : 534037652 KiB
> Recursive and acc. high-water RSS+CACHE : 585081976 KiB
> {code}
> The machine has a AMD EPYC 7742 64-Core Processor, 1TB RAM and 2 TB free ssd
> storage on /home. So there should still be plenty of RAM have been available.
> The loc folder has at the time of crash 543GB.
> I also tried with -loader=parallel and -loader=phased . Same result.
> h3. update 23/Feb/2021:
> ulimit -a
> {code:bash}
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 4110480
> max locked memory (kbytes, -l) 65536
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 8192
> cpu time (seconds, -t) unlimited
> max user processes (-u) 4110480
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
> {code}
> output from prlimit --pid=<PID of tdb.loader2>:
> {code:bash}
> RESOURCE DESCRIPTION SOFT HARD UNITS
> AS address space limit unlimited unlimited bytes
> CORE max core file size 0 unlimited bytes
> CPU CPU time unlimited unlimited seconds
> DATA max data size unlimited unlimited bytes
> FSIZE max file size unlimited unlimited bytes
> LOCKS max number of file locks held unlimited unlimited locks
> MEMLOCK max locked-in-memory address space 67108864 67108864 bytes
> MSGQUEUE max bytes in POSIX mqueues 819200 819200 bytes
> NICE max nice prio allowed to raise 0 0
> NOFILE max number of open files 1048576 1048576 files
> NPROC max number of processes 4110480 4110480 processes
> RSS max resident set size unlimited unlimited bytes
> RTPRIO max real-time priority 0 0
> RTTIME timeout for real-time tasks unlimited unlimited microsecs
> SIGPENDING max number of pending signals 4110480 4110480 signals
> STACK max stack size 8388608 unlimited bytes
> {code}
> I tried several things. Nothing worked so far. Here is what I tried:
> * Setting several JVM_ARGS to avoid resource over-consumption:
> {code:bash}
> -Xmx16g -XX:MaxDirectMemorySize=16g -Djdk.nio.maxCachedBufferSize=17179869184
> -XX:ActiveProcessorCount=4
> {code}
> * Setting limits for Metaspace and CodeCache:
> {code:bash}
> -XX:MaxMetaspaceSize=4g -XX:ReservedCodeCacheSize=4g
> {code}
> * limiting kernel shared memory to (set in /etc/sysctl.conf + reboot)
> {code:bash}
> kernel.shmmax=17179869185 # 16 GB
> kernel.shmall=4194304 # *4096 pagesize = 16 GB
> {code}
> * putting jena into a dockerfile and run it with the commands below. The idea
> was to limit the resources and have a default environment. I first tried with
> 16g memory but it within a couple 100M triples dropping below 10k triples/s
> loading speed. So I increased the reseources because I didn't have that much
> time on the server. Dockerfile is provided as attachment as well as the
> output. Here the output might be interesting. Because tdb2.loader did not
> crash completely but simply logged an Exception and idled ever since.
> {code:bash}
> docker build -f FusekiWikidataLoaderDockerfile -t fuseki-wikidata_01
> empty_folder/
> dockerun --user $(id -u):$(id -g) --kernel-memory=256g --memory=256g
> --cpuset-cpus=0-7 --name fuseki_wikidata_8c_J16g_R256g_K256g_phased -v
> "$(pwd)"/databases:/databases -v "$(pwd)"/datasets:/datasets -v
> "$(pwd)"/logs:/logs fuseki-wikidata_01:latest
> {code}
> error:
> {code:bash}
> 05:27:31 INFO loader :: Add: 3,156,000,000 Index (Batch: 57,074 /
> Avg: 159,787)
> 05:27:35 ERROR Indexer :: Interrupted
> org.apache.jena.dboe.base.file.FileException: BlockMgrMapped.segmentAllocate:
> Segment = 16313 : Offset = 136843362304
> at
> org.apache.jena.dboe.base.file.BlockAccessMapped.allocSegment(BlockAccessMapped.java:218)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.base.file.BlockAccessMapped.getByteBuffer(BlockAccessMapped.java:150)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.base.file.BlockAccessMapped.allocate(BlockAccessMapped.java:80)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.base.block.BlockMgrFileAccess.allocate(BlockMgrFileAccess.java:51)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.base.block.BlockMgrBase.allocate(BlockMgrBase.java:39)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.base.block.BlockMgrWrapper.allocate(BlockMgrWrapper.java:41)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.base.block.BlockMgrFreeChain.allocate(BlockMgrFreeChain.java:44)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.base.page.PageBlockMgr.create(PageBlockMgr.java:53)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeRecordsMgr.create(BPTreeRecordsMgr.java:77)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeRecords.create(BPTreeRecords.java:236)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeRecords.insertNewPage(BPTreeRecords.java:230)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeRecords.split(BPTreeRecords.java:206)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeNode.split(BPTreeNode.java:568)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:509)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeNode.insert(BPTreeNode.java:203)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPlusTree.insertAndReturnOld(BPlusTree.java:278)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPlusTree.insert(BPlusTree.java:271)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.tdb2.store.tupletable.TupleIndexRecord.performAdd(TupleIndexRecord.java:94)
> ~[jena-tdb2-3.17.0.jar:3.17.0]
> at
> org.apache.jena.tdb2.store.tupletable.TupleIndexBase.add(TupleIndexBase.java:66)
> ~[jena-tdb2-3.17.0.jar:3.17.0]
> at
> org.apache.jena.tdb2.loader.main.Indexer.lambda$loadTuples$1(Indexer.java:134)
> ~[jena-tdb2-3.17.0.jar:3.17.0]
> at
> org.apache.jena.tdb2.loader.main.Indexer.stageIndex(Indexer.java:116)
> ~[jena-tdb2-3.17.0.jar:3.17.0]
> at
> org.apache.jena.tdb2.loader.main.Indexer.lambda$startBulk$0(Indexer.java:93)
> ~[jena-tdb2-3.17.0.jar:3.17.0]
> at java.lang.Thread.run(Thread.java:834) [?:?]
> [53484.193s][warning][os,thread] Attempt to deallocate stack guard pages
> failed (0x00007f33244ff000-0x00007f3324503000).
> 05:27:36 ERROR Indexer :: Interrupted
> org.apache.jena.dboe.base.file.FileException: BlockMgrMapped.segmentAllocate:
> Segment = 16459 : Offset = 138068099072
> at
> org.apache.jena.dboe.base.file.BlockAccessMapped.allocSegment(BlockAccessMapped.java:218)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.base.file.BlockAccessMapped.getByteBuffer(BlockAccessMapped.java:150)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.base.file.BlockAccessMapped.allocate(BlockAccessMapped.java:80)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.base.block.BlockMgrFileAccess.allocate(BlockMgrFileAccess.java:51)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.base.block.BlockMgrBase.allocate(BlockMgrBase.java:39)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.base.block.BlockMgrWrapper.allocate(BlockMgrWrapper.java:41)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.base.block.BlockMgrFreeChain.allocate(BlockMgrFreeChain.java:44)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.base.page.PageBlockMgr.create(PageBlockMgr.java:53)
> ~[jena-dboe-base-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeRecordsMgr.create(BPTreeRecordsMgr.java:77)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeRecords.create(BPTreeRecords.java:236)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeRecords.insertNewPage(BPTreeRecords.java:230)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeRecords.split(BPTreeRecords.java:206)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeNode.split(BPTreeNode.java:568)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:509)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeNode.internalInsert(BPTreeNode.java:522)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPTreeNode.insert(BPTreeNode.java:203)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPlusTree.insertAndReturnOld(BPlusTree.java:278)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.dboe.trans.bplustree.BPlusTree.insert(BPlusTree.java:271)
> ~[jena-dboe-trans-data-3.17.0.jar:3.17.0]
> at
> org.apache.jena.tdb2.store.tupletable.TupleIndexRecord.performAdd(TupleIndexRecord.java:94)
> ~[jena-tdb2-3.17.0.jar:3.17.0]
> at
> org.apache.jena.tdb2.store.tupletable.TupleIndexBase.add(TupleIndexBase.java:66)
> ~[jena-tdb2-3.17.0.jar:3.17.0]
> at
> org.apache.jena.tdb2.loader.main.Indexer.lambda$loadTuples$1(Indexer.java:134)
> ~[jena-tdb2-3.17.0.jar:3.17.0]
> at
> org.apache.jena.tdb2.loader.main.Indexer.stageIndex(Indexer.java:116)
> ~[jena-tdb2-3.17.0.jar:3.17.0]
> at
> org.apache.jena.tdb2.loader.main.Indexer.lambda$startBulk$0(Indexer.java:93)
> ~[jena-tdb2-3.17.0.jar:3.17.0]
> at java.lang.Thread.run(Thread.java:834) [?:?]
> [53485.632s][warning][os,thread] Attempt to deallocate stack guard pages
> failed (0x00007f33242fd000-0x00007f3324301000).
> {code}
> Side observation:
> * Fusekis loading performance seems to be influenced by linux' cache
> significantly. If there is 1TB available it had had a read rate of ~2.5GB/s
> (measured with htop). With only 16GB it dropped to at max 250MB/s.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)