Hello there, I have been struggling to build indexes from generic RDF, and even with the default configuration for more popular sources such as DBpedia.
I found an indexing tool online configured to index YAGO, at https://github.com/ChalithaUdara/Stanbol-Yago-Site. Everything seemed to be going well until it got into this loop:

11:17:26,546 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace ' http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
11:17:26,546 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'condition' valid , namespace ' http://www.kinjal.com/condition:' invalid -> mapping ignored!
11:17:26,576 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'wimpo' valid , namespace ' http://rdfex.org/withImports?uri=' invalid -> mapping ignored!
12:17:26,856 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'nsogi' valid , namespace ' http://prefix.cc/nsogi:' invalid -> mapping ignored!
12:17:26,918 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbc' valid , namespace ' http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
12:17:26,949 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'category' valid , namespace ' http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
12:17:26,949 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'hgnc' valid , namespace ' http://bio2rdf.org/hgnc:' invalid -> mapping ignored!
12:17:26,950 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'chebi' valid , namespace ' http://bio2rdf.org/chebi:' invalid -> mapping ignored!
12:17:26,980 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbt' valid , namespace ' http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
12:17:26,980 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'pubmed' valid , namespace ' http://bio2rdf.org/pubmed_vocabulary:' invalid -> mapping ignored!
12:17:26,980 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbptmpl' valid , namespace ' http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
12:17:26,981 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbrc' valid , namespace ' http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
12:17:26,981 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'call' valid , namespace ' http://webofcode.org/wfn/call:' invalid -> mapping ignored!
12:17:27,011 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbcat' valid , namespace ' http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
12:17:27,011 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'bgcat' valid , namespace ' http://bg.dbpedia.org/resource/?????????:' invalid -> mapping ignored!
12:17:27,012 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace ' http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
12:17:27,012 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'condition' valid , namespace ' http://www.kinjal.com/condition:' invalid -> mapping ignored!
12:17:27,042 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'wimpo' valid , namespace ' http://rdfex.org/withImports?uri=' invalid -> mapping ignored!

This happened to me before with the DBpedia index. At first I thought it was some problem with the RDF source, and since these messages are logged at WARN level, I simply ignored them.
After days, however, the indexing/tdb directory stayed the same size even though there are still files in the indexing/resources/rdfdata directory. Then I realised that these messages follow a pattern: they are logged every hour, precise to the second, which seems weird. Also, they are always the same messages. This led me to think that the indexing tool is stuck in a loop and that is why it is not making any progress.

I think it is important to say that the one-hour interval between messages is the same for the DBpedia index and for the YAGO index, even though the YAGO index is much bigger. I have been constantly running `watch du * -s` in the resources directory for days to check for size changes, and nothing has changed.

I don't know if this is some problem with the configuration, but since I didn't configure it myself, I assumed that what I got from GitHub would be a working configuration for this specific index.

I have a few questions related to this problem:

1) Is it safe to cancel the indexing tool and start again without changing what's in the rdfdata and imported directories? Could this help at all?
2) What could be causing this problem?
3) Why is it looping and logging every hour (accurate to the second)?

If there is any extra information I can provide that would help in understanding the problem, tell me what it is and I will provide it.

Regards,
Antero
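
For anyone who wants to check the hourly repetition in their own log rather than by eye, here is a minimal sketch in Python. It assumes the warnings have exactly the format quoted above (a `HH:MM:SS,mmm` timestamp followed by `Invalid Namespace Mapping: prefix '...'`). The function name `repeat_gaps` and the log file path passed on the command line are hypothetical and not part of the indexing tool.

```python
import re
import sys
from collections import defaultdict
from datetime import datetime

# Timestamp plus prefix name of each "Invalid Namespace Mapping" warning.
# Assumption: the log format matches the lines quoted in this post.
LINE_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2},\d{3}) .*?Invalid Namespace Mapping: prefix '([^']+)'"
)


def repeat_gaps(log_text):
    """Map each prefix seen more than once to the gaps (in seconds)
    between its successive warnings."""
    seen = defaultdict(list)
    for stamp, prefix in LINE_RE.findall(log_text):
        seen[prefix].append(datetime.strptime(stamp, "%H:%M:%S,%f"))

    gaps = {}
    for prefix, stamps in seen.items():
        if len(stamps) > 1:
            # The log has no dates, so a gap that crosses midnight is
            # wrapped into the 0..24h range with a modulo.
            gaps[prefix] = [
                (later - earlier).total_seconds() % 86400
                for earlier, later in zip(stamps, stamps[1:])
            ]
    return gaps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python repeat_gaps.py path/to/indexing.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for prefix, values in sorted(repeat_gaps(handle.read()).items()):
            print(prefix, [round(v, 3) for v in values])
```

If every repeated prefix shows gaps of roughly 3600 seconds, that supports the impression that the same batch of warnings is re-emitted once an hour.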