Hi,

As an update to my last message: I increased the heap size for tdbstats
and redirected its output to a file, which has 1,005,822 lines (according
to "wc -l"). If that number relates directly to the number of entities, I
don't think the indexing finished, because the files in rdfdata/ originally
totaled around 70GB.
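For anyone repeating the run described above, here is a minimal Python sketch. It sets JAVA_OPTS as suggested further down the thread, writes the tdbstats output to a file, and counts the lines the way "wc -l" does. It assumes the tdbstats script is on the PATH. The names tdbstats_invocation and run_tdbstats are illustrative only. The result is just a line count: the sketch does not interpret the stats format, so it does not tell you how many entities were loaded.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def tdbstats_invocation(loc: str, heap: str = "2G") -> Tuple[List[str], Dict[str, str]]:
    """Build argv and env equivalent to: JAVA_OPTS="-Xmx<heap>" tdbstats --loc=<loc>"""
    env = dict(os.environ)
    # Overrides any JAVA_OPTS already set in the calling environment.
    env["JAVA_OPTS"] = f"-Xmx{heap}"
    return ["tdbstats", f"--loc={loc}"], env


def run_tdbstats(loc: str, out_path: str, heap: str = "2G") -> int:
    """Run tdbstats (assumed to be on PATH), save stdout to out_path, return its line count."""
    argv, env = tdbstats_invocation(loc, heap)
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if tdbstats exits with a non-zero status.
        subprocess.run(argv, env=env, stdout=out, check=True)
    with open(out_path) as f:
        # Same figure as `wc -l` whenever the file ends with a newline.
        return sum(1 for _ in f)
```

For example, run_tdbstats("./tdb", "stats.txt", heap="4G") repeats the run with a 4GB heap. The right heap value depends on the machine and on whether the indexer is also holding memory.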

One interesting thing, though: the first time I ran tdbstats, the size of
the tdb directory increased slightly, but only that once; it has been stable
ever since. Because I had never used tdbstats before, I don't know whether
this was the indexing tool showing that it is still running or tdbstats
itself changing something in the directory.

I restarted the indexing tool, and the size of tdb seems to be increasing
again; the tool appears to be working well. Could it be that I just needed
to clear whatever was in RAM because the index is so large?
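One way to confirm that the tdb directory really is growing, without eyeballing `du` output, is a small poller. It sums file sizes under the directory at a fixed interval and reports the change since the previous sample. This is a Python sketch; dir_size and watch_growth are hypothetical helper names. It measures apparent file size, which is roughly what GNU "du -sb" reports, so its numbers will not match plain "du" exactly.

```python
import os
import time
from typing import Iterator, Tuple


def dir_size(path: str) -> int:
    """Sum of apparent sizes (bytes) of regular files under path, similar to GNU `du -sb`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # do not follow symlinks, so nothing is counted twice
            try:
                total += os.path.getsize(full)
            except FileNotFoundError:
                pass  # file was removed between listing and stat
    return total


def watch_growth(path: str, interval_s: float, rounds: int) -> Iterator[Tuple[int, int]]:
    """Every interval_s seconds, yield (current size, change since previous sample)."""
    previous = dir_size(path)
    for _ in range(rounds):
        time.sleep(interval_s)
        current = dir_size(path)
        yield current, current - previous
        previous = current
```

For example, iterating over watch_growth("./tdb", 300, 12) samples the directory every five minutes for an hour. A long run of zero deltas while the indexer is supposedly busy is the symptom described in this thread.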

I will see how it goes from now on and if further problems arise I will ask
again.

Thank you for your help.
Antero

On Wed, 6 Apr 2016 at 20:04 A. Soroka <aj...@virginia.edu> wrote:

> TDB command-line tools should respond to ordinary usage of the JAVA_OPTS
> environment variable, so you should be able to do something like:
>
> JAVA_OPTS="-Xmx2G" tdbstats --loc=./tdb
>
> to run with a 2GB heap, for example.
>
> One thing to note is that if you were even able to access the TDB database
> at all (which it seems you were because your attempt to use tdbstats didn't
> error out instantly) it means that no other process was using the database,
> particularly that the indexer was not. Only one system process can use a
> TDB database at a time. Now, that may or may not mean that the indexer was
> actually done doing stuff with the database: I don't know enough about how
> it works to know that, although the Stanbol devs on here would. It just
> means that at the moment you tried to run tdbstats, the indexer had
> relinquished control of the TDB database, at least for the moment.
>
> ---
> A. Soroka
> The University of Virginia Library
>
> > On Apr 5, 2016, at 6:42 PM, Antero Duarte <a.fduar...@gmail.com> wrote:
> >
> > Fair enough... I just wanted to figure out whether it had actually
> > stopped before killing it, but I will do that tomorrow.
> >
> > I didn't change anything in the Jena command-line tools. To be honest, I
> > had never used them before, so I just searched for what to do and how to
> > do it, and tdbstats was the command that made the most sense. Is the heap
> > size a flag that is passed to tdbstats?
> >
> > Thanks,
> > Antero
> >
> > On Tue, 5 Apr 2016 8:12 pm A. Soroka, <aj...@virginia.edu> wrote:
> >
> >> I don't know whether you will be able to restart the indexer, but it
> >> seems like you aren't getting much of anywhere, so you might as well stop
> >> it, with the proviso that you might have to start again from scratch.
> >> But YMMV.
> >>
> >> As for the Jena side of things: did you adjust the heap allocation when
> >> you ran tdbstats?
> >>
> >>> On Apr 5, 2016, at 12:45 PM, Antero Duarte <a.fduar...@gmail.com> wrote:
> >>>
> >>> Thanks for your reply.
> >>>
> >>> I downloaded Jena to use the command-line tools. What command should I
> >>> run? I tried running `tdbstats --loc=./tdb` (from the resources dir),
> >>> but after a while it threw an exception because it ran out of heap
> >>> space. Is there another command that would be more useful and doesn't
> >>> use as much memory? On the other hand, if I can stop the indexing tool,
> >>> I can free some memory to let tdbstats run.
> >>>
> >>> Regards,
> >>> Antero
> >>>
> >>> On Tue, 5 Apr 2016 at 15:21 A. Soroka <aj...@virginia.edu> wrote:
> >>>
> >>>> Looks similar to something others have seen:
> >>>>
> >>>> https://issues.apache.org/jira/browse/STANBOL-1446
> >>>>
> >>>> which doesn't help you much, but might be a place to centralize the
> >>>> answer to this question. I wouldn't think that a WARN-level message
> >>>> would flag a condition so severe that indexing doesn't take place.
> >>>> Perhaps it is something else.
> >>>>
> >>>> Can you use Jena's command-line tools to check how many entities have
> >>>> actually been loaded into TDB vs. how many you expect? That might give
> >>>> you a clue as to where indexing is hanging up (if it actually is).
> >>>>
> >>>>> On Apr 5, 2016, at 7:59 AM, Antero Duarte <a.fduar...@gmail.com> wrote:
> >>>>>
> >>>>> Hello there,
> >>>>>
> >>>>> I have been struggling with building indexes from generic RDF, even
> >>>>> when using the default configuration for more popular sources like
> >>>>> DBpedia.
> >>>>>
> >>>>> I found an indexing tool online configured to index yago, at
> >>>>> https://github.com/ChalithaUdara/Stanbol-Yago-Site.
> >>>>>
> >>>>> Everything seemed to be going well until it got into this loop:
> >>>>>
> >>>>> 11:17:26,546 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace 'http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
> >>>>> 11:17:26,546 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'condition' valid , namespace 'http://www.kinjal.com/condition:' invalid -> mapping ignored!
> >>>>> 11:17:26,576 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'wimpo' valid , namespace 'http://rdfex.org/withImports?uri=' invalid -> mapping ignored!
> >>>>> 12:17:26,856 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'nsogi' valid , namespace 'http://prefix.cc/nsogi:' invalid -> mapping ignored!
> >>>>> 12:17:26,918 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbc' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> >>>>> 12:17:26,949 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'category' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> >>>>> 12:17:26,949 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'hgnc' valid , namespace 'http://bio2rdf.org/hgnc:' invalid -> mapping ignored!
> >>>>> 12:17:26,950 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'chebi' valid , namespace 'http://bio2rdf.org/chebi:' invalid -> mapping ignored!
> >>>>> 12:17:26,980 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbt' valid , namespace 'http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
> >>>>> 12:17:26,980 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'pubmed' valid , namespace 'http://bio2rdf.org/pubmed_vocabulary:' invalid -> mapping ignored!
> >>>>> 12:17:26,980 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbptmpl' valid , namespace 'http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
> >>>>> 12:17:26,981 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbrc' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> >>>>> 12:17:26,981 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'call' valid , namespace 'http://webofcode.org/wfn/call:' invalid -> mapping ignored!
> >>>>> 12:17:27,011 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbcat' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> >>>>> 12:17:27,011 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'bgcat' valid , namespace 'http://bg.dbpedia.org/resource/?????????:' invalid -> mapping ignored!
> >>>>> 12:17:27,012 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace 'http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
> >>>>> 12:17:27,012 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'condition' valid , namespace 'http://www.kinjal.com/condition:' invalid -> mapping ignored!
> >>>>> 12:17:27,042 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'wimpo' valid , namespace 'http://rdfex.org/withImports?uri=' invalid -> mapping ignored!
> >>>>>
> >>>>> It happened to me before with the DBpedia index. At first I thought it
> >>>>> was some problem with the RDF source, and since these messages are
> >>>>> logged at WARN level, I simply ignored them. But after days, the
> >>>>> indexing/tdb directory stayed the same size even though there are still
> >>>>> files in the indexing/resources/rdfdata directory. Then I realised that
> >>>>> these messages follow a pattern: they are logged every hour, accurate to
> >>>>> the second, which seems weird. Also, they are always the same messages.
> >>>>> This led me to think that the indexing tool is stuck in a loop and that
> >>>>> this is why it is not moving any further. It is worth mentioning that
> >>>>> the one-hour interval between messages is the same for the DBpedia
> >>>>> index and for the YAGO index, even though the YAGO index is much bigger.
> >>>>>
> >>>>> I have been running `watch du * -s` in the resources directory
> >>>>> constantly for days to check for size changes, and nothing has changed
> >>>>> in all that time.
> >>>>>
> >>>>> I don't know if this is a problem with the configuration, but since I
> >>>>> didn't configure it myself, I assumed that what I got from GitHub would
> >>>>> be a working configuration for this specific index.
> >>>>>
> >>>>> I have a few questions related to this problem:
> >>>>>
> >>>>> 1) Is it safe to cancel the indexing tool and start again without
> >>>>> changing what's in the rdfdata and imported directories? Could this
> >>>>> help at all?
> >>>>>
> >>>>> 2) What can possibly be causing this problem?
> >>>>>
> >>>>> 3) Why is it looping and logging every hour (accurate to the second)?
> >>>>>
> >>>>> If there is any extra information I can provide that would help in
> >>>>> understanding the problem, tell me what it is and I will provide it.
> >>>>>
> >>>>> Regards,
> >>>>> Antero
> >>>>
> >>>>
> >>
> >>
>
>
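The hourly repetition described in the original report can be checked mechanically rather than by eye. This Python sketch assumes each log entry occupies one line of the log file and follows the layout of the quoted entries, "HH:MM:SS,mmm [thread] WARN logger - message". The name repeat_intervals is illustrative. The function groups identical WARN messages and returns the gaps between their successive occurrences. For the two quoted 'affymetrix' entries (11:17:26,546 and 12:17:27,012), that gap is 3600.466 seconds.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List

# Matches lines like "11:17:26,546 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - ..."
LOG_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2},\d{3}) \[[^\]]+\] WARN\s+\S+ - (.*)$")


def repeat_intervals(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Map each repeated WARN message to the gaps (seconds) between its occurrences."""
    seen: Dict[str, List[datetime]] = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # not a WARN entry in the expected layout
        # Only the time of day is logged, so a gap spanning midnight comes out negative.
        seen[m.group(2)].append(datetime.strptime(m.group(1), "%H:%M:%S,%f"))
    return {
        msg: [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        for msg, times in seen.items()
        if len(times) > 1
    }
```

Passing the indexer's open log file to repeat_intervals works because file objects iterate line by line. If the gaps cluster around 3600 s for every message, that supports the "stuck in a loop" reading. The gaps do not, by themselves, say what is looping.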

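The thread asks how many entities have been loaded compared with how many are expected. If the files in rdfdata/ are N-Triples (one triple per line, optionally gzipped), a rough expected figure is the number of distinct subjects. Whether the Stanbol indexer's notion of an entity matches "distinct subject" is an assumption here. The Python sketch that follows counts triples and distinct subjects; iter_ntriples and count_triples_and_subjects are illustrative names. It does not handle Turtle or other multi-line formats. It also keeps every subject in memory, so it suits a single file or a sample better than the whole ~70GB.

```python
import gzip
from typing import Iterable, Iterator, List, Set, Tuple


def iter_ntriples(paths: List[str]) -> Iterator[str]:
    """Yield lines from plain or gzip-compressed N-Triples files."""
    for path in paths:
        if path.endswith(".gz"):
            f = gzip.open(path, "rt", encoding="utf-8")
        else:
            f = open(path, "rt", encoding="utf-8")
        with f:
            yield from f


def count_triples_and_subjects(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of triples, number of distinct subjects) for N-Triples lines."""
    subjects: Set[str] = set()
    triples = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines carry no triple
        # An N-Triples subject (<iri> or _:label) contains no whitespace,
        # so the first whitespace-separated token is the subject.
        subjects.add(line.split(None, 1)[0])
        triples += 1
    return triples, len(subjects)
```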