Fair enough... I just wanted to figure out if it is actually stopped before
stopping it, but I will do that tomorrow.

I didn't change anything with the Jena command-line tools. To be honest, I
had never used them before, so I just googled what to do and how to do it,
and the tdbstats command was the one that made the most sense. Is this a
flag that is passed to tdbstats?

Thanks,
Antero
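
Note on the heap question above: heap size is a JVM setting rather than a
tdbstats option. The sketch below assumes the Jena wrapper scripts read the
JVM_ARGS environment variable (worth confirming in the bin/tdbstats script
of the downloaded release), and the 4G value is only an illustrative guess,
not a recommendation from this thread.

```shell
# Assumption: the Jena wrapper scripts pass JVM_ARGS through to the JVM.
# 4G is an arbitrary illustrative value; pick one that fits the free RAM.
export JVM_ARGS="-Xmx4G"

# Re-run the same command as before, but only if the Jena tools are on PATH.
if command -v tdbstats >/dev/null 2>&1; then
    tdbstats --loc=./tdb
fi
```

If the indexer is still running and holding most of the memory, a larger
heap may not be available until it has been stopped.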

On Tue, 5 Apr 2016 8:12 pm A. Soroka, <aj...@virginia.edu> wrote:

> I don't know whether you will be able to restart the indexer, but it seems
> like you aren't getting much of anywhere so you might as well stop it with
> the proviso that you might have to start again from scratch. But YMMV.
>
> As for the Jena side of things: did you adjust the heap allocation when
> you ran tdbstats?
>
> ---
> A. Soroka
> The University of Virginia Library
>
> > On Apr 5, 2016, at 12:45 PM, Antero Duarte <a.fduar...@gmail.com> wrote:
> >
> > Thanks for your reply.
> >
> > I downloaded Jena to use the command-line tools. What command should I
> > run? I tried running `tdbstats --loc=./tdb` (from the resources dir), but
> > after a while it threw an exception because it ran out of heap space. Is
> > there another command that would be more useful and doesn't use as much
> > memory? On the other hand, if I can stop the indexing tool, I can free
> > some memory to let tdbstats run.
> >
> > Regards,
> > Antero
> >
> > On Tue, 5 Apr 2016 at 15:21 A. Soroka <aj...@virginia.edu> wrote:
> >
> >> Looks similar to something others have seen:
> >>
> >> https://issues.apache.org/jira/browse/STANBOL-1446
> >>
> >> which doesn't help you much, but might be a place to centralize the
> >> answer to this question. I wouldn't think that a WARN level message
> >> would tag a condition so severe that indexing doesn't take place.
> >> Perhaps it is something else.
> >>
> >> Can you use Jena's command-line tools to check and see how many entities
> >> have actually been loaded into TDB vs. how many you expect? That might
> >> give you a clue as to where indexing is hanging up (if it actually is).
> >>
> >> ---
> >> A. Soroka
> >> The University of Virginia Library
> >>
> >>> On Apr 5, 2016, at 7:59 AM, Antero Duarte <a.fduar...@gmail.com> wrote:
> >>>
> >>> Hello there,
> >>>
> >>> I have been struggling with building indexes from generic rdf and even
> >>> using default configuration for more popular sources like dbpedia.
> >>>
> >>> I found an indexing tool online configured to index yago, at
> >>> https://github.com/ChalithaUdara/Stanbol-Yago-Site.
> >>>
> >>> Everything seemed to be going well until it got into this loop:
> >>>
> >>> 11:17:26,546 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace '
> >>> http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
> >>> 11:17:26,546 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'condition' valid , namespace '
> >>> http://www.kinjal.com/condition:' invalid -> mapping ignored!
> >>> 11:17:26,576 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'wimpo' valid , namespace '
> >>> http://rdfex.org/withImports?uri=' invalid -> mapping ignored!
> >>> 12:17:26,856 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'nsogi' valid , namespace '
> >>> http://prefix.cc/nsogi:' invalid -> mapping ignored!
> >>> 12:17:26,918 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'dbc' valid , namespace '
> >>> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> >>> 12:17:26,949 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'category' valid , namespace '
> >>> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> >>> 12:17:26,949 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'hgnc' valid , namespace '
> >>> http://bio2rdf.org/hgnc:' invalid -> mapping ignored!
> >>> 12:17:26,950 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'chebi' valid , namespace '
> >>> http://bio2rdf.org/chebi:' invalid -> mapping ignored!
> >>> 12:17:26,980 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'dbt' valid , namespace '
> >>> http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
> >>> 12:17:26,980 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'pubmed' valid , namespace '
> >>> http://bio2rdf.org/pubmed_vocabulary:' invalid -> mapping ignored!
> >>> 12:17:26,980 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'dbptmpl' valid , namespace '
> >>> http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
> >>> 12:17:26,981 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'dbrc' valid , namespace '
> >>> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> >>> 12:17:26,981 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'call' valid , namespace '
> >>> http://webofcode.org/wfn/call:' invalid -> mapping ignored!
> >>> 12:17:27,011 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'dbcat' valid , namespace '
> >>> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> >>> 12:17:27,011 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'bgcat' valid , namespace '
> >>> http://bg.dbpedia.org/resource/?????????:' invalid -> mapping ignored!
> >>> 12:17:27,012 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace '
> >>> http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
> >>> 12:17:27,012 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'condition' valid , namespace '
> >>> http://www.kinjal.com/condition:' invalid -> mapping ignored!
> >>> 12:17:27,042 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
> >>> Invalid Namespace Mapping: prefix 'wimpo' valid , namespace '
> >>> http://rdfex.org/withImports?uri=' invalid -> mapping ignored!
> >>>
> >>> It happened to me before with the dbpedia index, and at first I thought
> >>> it was some problem with the rdf source. Since these messages are logged
> >>> at WARN level, I simply ignored them. But after days, the indexing/tdb
> >>> directory stayed the same size even though there are still files in the
> >>> indexing/resources/rdfdata directory. Then I realised that these
> >>> messages follow a pattern: they are logged every hour, with precision to
> >>> the second, which seems weird. Also, they are always the same messages.
> >>> This led me to think that the indexing tool is stuck in a loop and that's
> >>> why it is not moving any further. I think it is important to say that the
> >>> one-hour time span between messages is the same for the dbpedia index
> >>> and for the yago index, even though the yago index is much bigger.
> >>>
> >>> I have been constantly running `watch du * -s` in the resources
> >>> directory for days to check for size changes, and nothing is changing
> >>> and hasn't changed for days.
> >>>
> >>> I don't know if this is some problem with the configuration, but since
> >>> I didn't configure it myself, I assumed that what I got from github
> >>> would be a working configuration for this specific index.
> >>>
> >>> I have a few questions related to this problem:
> >>>
> >>> 1) Is it safe to cancel the indexing tool and start again without
> >>> changing what's in the rdfdata and imported directories? Could this help
> >>> at all?
> >>>
> >>> 2) What can possibly be causing this problem?
> >>>
> >>> 3) Why is it looping and logging every hour (accurate to the second)?
> >>>
> >>> If there is any extra information I can provide that would help in
> >>> understanding what the problem is here, tell me what it is and I will
> >>> provide it.
> >>>
> >>> Regards,
> >>> Antero
> >>
> >>
>
>
