Jena's command-line scripts, tdbstats included, take their JVM options from the
JVM_ARGS environment variable (not JAVA_OPTS). It is an environment variable
rather than a flag passed to tdbstats, so you should be able to do something
like:

JVM_ARGS="-Xmx2G" tdbstats --loc=./tdb

to run with a 2GB heap, for example.
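
If you end up doing this more than once, a small wrapper saves retyping. This
is only a sketch: run_with_heap is a name I made up, and it assumes a POSIX
shell with the Jena bin directory on your PATH. It sets JAVA_OPTS as well as
JVM_ARGS, since different launcher scripts read different variables.

```shell
# Hypothetical helper (not part of Jena): run a command with a given max heap.
# Sets both JVM_ARGS (read by Jena's bin scripts, as far as I know) and
# JAVA_OPTS (read by some other Java launchers) in that command's environment.
run_with_heap() {
  heap="$1"   # e.g. 2G
  shift
  JVM_ARGS="-Xmx${heap}" JAVA_OPTS="-Xmx${heap}" "$@"
}

# Usage: run_with_heap 2G tdbstats --loc=./tdb
```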

One thing to note: if you were able to open the TDB database at all (and it
seems you were, because tdbstats didn't fail immediately), then no other
process was using the database at that moment, the indexer in particular. Only
one process can use a TDB database at a time. That does not necessarily mean
the indexer had finished its work: I don't know enough about how it works to
say, although the Stanbol devs on here would. It just means that when you ran
tdbstats, the indexer had relinquished control of the TDB database, at least
for the moment.
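
Since the database is accessible, a plain count query is another way to see
how much has been loaded. A sketch, assuming tdbquery from the same Jena
download is on your PATH; count_triples is just a name I picked. It counts
triples rather than entities, so treat it as a rough indicator, and expect it
to take a while on a big store.

```shell
# Hypothetical helper: count the triples in the default graph of a TDB store.
# Like tdbstats, it needs the store not to be in use by another process.
count_triples() {
  loc="$1"   # TDB directory, e.g. ./tdb
  tdbquery --loc="$loc" 'SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }'
}

# Usage: count_triples ./tdb
```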

---
A. Soroka
The University of Virginia Library

> On Apr 5, 2016, at 6:42 PM, Antero Duarte <a.fduar...@gmail.com> wrote:
> 
> Fair enough... I just wanted to figure out if it is actually stopped before
> stopping it, but I will do that tomorrow.
> 
> I didn't change anything with the Jena command-line tools. To be honest, I
> had never used them before, so I just googled what to do and how to do it,
> and the tdbstats command was the one that made the most sense. Is this a
> flag that is passed in to tdbstats?
> 
> Thanks,
> Antero
> 
> On Tue, 5 Apr 2016 8:12 pm A. Soroka, <aj...@virginia.edu> wrote:
> 
>> I don't know whether you will be able to restart the indexer, but it seems
>> like you aren't getting much of anywhere so you might as well stop it with
>> the proviso that you might have to start again from scratch. But YMMV.
>> 
>> As for the Jena side of things: did you adjust the heap allocation when
>> you ran tdbstats?
>> 
>> ---
>> A. Soroka
>> The University of Virginia Library
>> 
>>> On Apr 5, 2016, at 12:45 PM, Antero Duarte <a.fduar...@gmail.com> wrote:
>>> 
>>> Thanks for your reply.
>>> 
>>> I downloaded Jena to use the command-line tools. What command should I run?
>>> I tried running `tdbstats --loc=./tdb` (from the resources dir), but after a
>>> while it threw an exception because it ran out of heap space. Is there
>>> another command that would be more useful and doesn't use as much memory?
>>> On the other hand, if I can stop the indexing tool, I can free some memory
>>> to let tdbstats run.
>>> 
>>> Regards,
>>> Antero
>>> 
>>> On Tue, 5 Apr 2016 at 15:21 A. Soroka <aj...@virginia.edu> wrote:
>>> 
>>>> Looks similar to something others have seen:
>>>> 
>>>> https://issues.apache.org/jira/browse/STANBOL-1446
>>>> 
>>>> which doesn't help you much, but might be a place to centralize the answer
>>>> to this question. I wouldn't think that a WARN level message would tag a
>>>> condition so severe that indexing doesn't take place. Perhaps it is
>>>> something else.
>>>> 
>>>> Can you use Jena's command-line tools to check and see how many entities
>>>> have actually been loaded into TDB vs. how many you expect? That might give
>>>> you a clue as to where indexing is hanging up (if it actually is).
>>>> 
>>>> ---
>>>> A. Soroka
>>>> The University of Virginia Library
>>>> 
>>>>> On Apr 5, 2016, at 7:59 AM, Antero Duarte <a.fduar...@gmail.com> wrote:
>>>>> 
>>>>> Hello there,
>>>>> 
>>>>> I have been struggling with building indexes from generic rdf and even
>>>>> using default configuration for more popular sources like dbpedia.
>>>>> 
>>>>> I found an indexing tool online configured to index yago, at
>>>>> https://github.com/ChalithaUdara/Stanbol-Yago-Site.
>>>>> 
>>>>> Everything seemed to be going well until it got into this loop:
>>>>> 
>>>>> 11:17:26,546 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace '
>>>>> http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
>>>>> 11:17:26,546 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'condition' valid , namespace '
>>>>> http://www.kinjal.com/condition:' invalid -> mapping ignored!
>>>>> 11:17:26,576 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'wimpo' valid , namespace '
>>>>> http://rdfex.org/withImports?uri=' invalid -> mapping ignored!
>>>>> 12:17:26,856 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'nsogi' valid , namespace '
>>>>> http://prefix.cc/nsogi:' invalid -> mapping ignored!
>>>>> 12:17:26,918 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'dbc' valid , namespace '
>>>>> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
>>>>> 12:17:26,949 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'category' valid , namespace '
>>>>> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
>>>>> 12:17:26,949 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'hgnc' valid , namespace '
>>>>> http://bio2rdf.org/hgnc:' invalid -> mapping ignored!
>>>>> 12:17:26,950 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'chebi' valid , namespace '
>>>>> http://bio2rdf.org/chebi:' invalid -> mapping ignored!
>>>>> 12:17:26,980 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'dbt' valid , namespace '
>>>>> http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
>>>>> 12:17:26,980 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'pubmed' valid , namespace '
>>>>> http://bio2rdf.org/pubmed_vocabulary:' invalid -> mapping ignored!
>>>>> 12:17:26,980 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'dbptmpl' valid , namespace '
>>>>> http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
>>>>> 12:17:26,981 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'dbrc' valid , namespace '
>>>>> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
>>>>> 12:17:26,981 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'call' valid , namespace '
>>>>> http://webofcode.org/wfn/call:' invalid -> mapping ignored!
>>>>> 12:17:27,011 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'dbcat' valid , namespace '
>>>>> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
>>>>> 12:17:27,011 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'bgcat' valid , namespace '
>>>>> http://bg.dbpedia.org/resource/?????????:' invalid -> mapping ignored!
>>>>> 12:17:27,012 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace '
>>>>> http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
>>>>> 12:17:27,012 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'condition' valid , namespace '
>>>>> http://www.kinjal.com/condition:' invalid -> mapping ignored!
>>>>> 12:17:27,042 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
>>>>> Invalid Namespace Mapping: prefix 'wimpo' valid , namespace '
>>>>> http://rdfex.org/withImports?uri=' invalid -> mapping ignored!
>>>>> 
>>>>> It happened to me before with the dbpedia index, and at first I thought it
>>>>> was some problem with the rdf source. Since these messages are logged
>>>>> at WARN level, I simply ignored them. But after days, the indexing/tdb
>>>>> directory stayed the same size even though there are still files in the
>>>>> indexing/resources/rdfdata directory. Then I realised that these messages
>>>>> follow a pattern: they are logged every hour, to the second, which seems
>>>>> weird. Also, they are always the same messages. This led me to think that
>>>>> the indexing tool is stuck in a loop and that's why it is not moving any
>>>>> further. I think it is important to say that the one-hour time span
>>>>> between messages is the same for the dbpedia index and for the yago index,
>>>>> even though the yago index is much bigger.
>>>>> 
>>>>> I have been constantly running `watch du * -s` in the resources directory
>>>>> for days to check for size changes, and nothing has changed.
>>>>> 
>>>>> I don't know if this is some problem with the configuration, but since I
>>>>> didn't configure it myself, I assumed that what I got from github would be
>>>>> a working configuration for this specific index.
>>>>> 
>>>>> I have a few questions related to this problem:
>>>>> 
>>>>> 1) Is it safe to cancel the indexing tool and start again without changing
>>>>> what's in the rdfdata and imported directories? Could this help at all?
>>>>> 
>>>>> 2) What can possibly be causing this problem?
>>>>> 
>>>>> 3) Why is it looping and logging every hour (accurate to the second)?
>>>>> 
>>>>> If there is any extra information I can provide that would help in
>>>>> understanding what the problem is here, tell me what it is and I will
>>>>> provide it.
>>>>> 
>>>>> Regards,
>>>>> Antero
>>>> 
>>>> 
>> 
>> 
