Hi Gabriel, CCing dbpedia-developers list,

this looks like a problem with the DBpedia parser rather than with
Spotlight itself. The "key not found: 828" in your stack trace points to
the (fairly new?) Module namespace in Wikipedia [1], whose number is 828
and which the parser does not handle for all languages yet [2]. I assume
you are not working with English, French or Hungarian dumps. For all
other languages, the Module namespace is not configured yet. Which
language are you working with?
If you understand what's going on, you can add the appropriate
configuration yourself and send a pull request on GitHub to the
extraction-framework repo [3]. Otherwise, the developer community
might be able to help you.
Once the namespace is configured, install the extraction-framework in
your local Maven repo by running mvn clean install. Afterwards, do the
same for Spotlight again so that it picks up the corrected parser.
Finally, re-run the indexing.
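
To illustrate why the run dies with "key not found: 828", here is a minimal
sketch. It is not the extraction-framework code: the object name and the
contents of the map are made up for illustration, and only the namespace
numbers (0, 10, 14, 828) are the real MediaWiki ones. It shows that calling
apply on a Scala Map with a missing key throws the same
NoSuchElementException you see in your log.

```scala
// Minimal sketch, NOT the extraction-framework code. The object name and
// the map contents are hypothetical; they only mirror the situation in
// the stack trace.
object NamespaceLookupSketch {
  // Hypothetical table of namespace numbers for one language.
  // 828 (Module) is deliberately missing, as for an unconfigured language.
  val configured: Map[Int, String] = Map(
    0  -> "Main",
    10 -> "Template",
    14 -> "Category"
  )

  def main(args: Array[String]): Unit = {
    // Map.get returns an Option, so a missing number can be handled.
    println(configured.get(828)) // prints "None"

    // Map.apply throws for a missing key, as in the log.
    try {
      configured(828)
    } catch {
      case e: NoSuchElementException =>
        println(e.getMessage) // prints "key not found: 828"
    }
  }
}
```

Adding an entry for 828 to the map makes the lookup succeed, which is in
essence what configuring the Module namespace for your language does.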

Cheers,
Max

[1] http://en.wikipedia.org/wiki/Wikipedia:Namespace
[2] https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/impl/wikipedia/Namespaces.scala
[3] https://github.com/dbpedia/extraction-framework


On Wed, May 29, 2013 at 10:27 PM, Gabriel Oliveira <[email protected]> wrote:
> Hello Max,
>
> I did as you told me and I have managed to fix some problems. I am still
> learning how to use Maven and IntelliJ, therefore I have missed a few
> details and it took me a while to realize that, but now I have made some
> progress.
>
> Unlike the last attempts, though, it has now run for almost an hour
> and has saved about 10800000 occurrences. However, after these occurrences
> are saved an exception is thrown, and now I believe it is not my fault
> anymore.
> The output is as follows:
>
>  INFO 2013-05-29 16:55:58,441 main [AllOccurrenceSource$] - Processed
> 1300000 Wikipedia definition pages (average 9.73 links per page)
>  INFO 2013-05-29 16:56:23,092 main [FileOccurrenceSource$] -   saved
> 10600000 occurrences
>  INFO 2013-05-29 16:57:10,162 main [FileOccurrenceSource$] -   saved
> 10700000 occurrences
>  INFO 2013-05-29 16:58:00,311 main [FileOccurrenceSource$] -   saved
> 10800000 occurrences
> java.lang.reflect.InvocationTargetException
>
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at scala_maven_executions.MainHelper.runMain(MainHelper.java:164)
>     at
> scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
> Caused by: java.util.NoSuchElementException: key not found: 828
>     at scala.collection.MapLike$class.default(MapLike.scala:225)
>     at scala.collection.immutable.HashMap.default(HashMap.scala:38)
>     at scala.collection.MapLike$class.apply(MapLike.scala:135)
>     at scala.collection.immutable.HashMap.apply(HashMap.scala:38)
>     at
> org.dbpedia.extraction.sources.WikipediaDumpParser.readPage(WikipediaDumpParser.java:218)
>     at
> org.dbpedia.extraction.sources.WikipediaDumpParser.readPages(WikipediaDumpParser.java:179)
>     at
> org.dbpedia.extraction.sources.WikipediaDumpParser.readDump(WikipediaDumpParser.java:137)
>     at
> org.dbpedia.extraction.sources.WikipediaDumpParser.run(WikipediaDumpParser.java:108)
>     at
> org.dbpedia.extraction.sources.XMLReaderSource.foreach(XMLSource.scala:57)
>     at
> org.dbpedia.spotlight.io.AllOccurrenceSource$AllOccurrenceSource.foreach(AllOccurrenceSource.scala:80)
>     at
> org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:58)
>     at
> org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:58)
>     at
> org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:58)
>     at
> org.dbpedia.spotlight.io.FileOccurrenceSource$.writeToFile(FileOccurrenceSource.scala:57)
>     at
> org.dbpedia.spotlight.lucene.index.ExtractOccsFromWikipedia$.main(ExtractOccsFromWikipedia.scala:82)
>     at
> org.dbpedia.spotlight.lucene.index.ExtractOccsFromWikipedia.main(ExtractOccsFromWikipedia.scala)
>     ... 6 more
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Total time: 56:48.220s
> [INFO] Finished at: Wed May 29 16:58:17 BRT 2013
> [INFO] Final Memory: 11M/216M
> [INFO]
> ------------------------------------------------------------------------
>
> [ERROR] Failed to execute goal
> net.alchim31.maven:scala-maven-plugin:3.1.0:run (default-cli) on project
> index: wrap: org.apache.commons.exec.ExecuteException: Process exited with
> an error: 240(Exit value: 240) -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please
> read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>
> I will run it again with the -X switch and will send you the full output as
> soon as it finishes. Probably about an hour from now.
>
> I really appreciate your support.
>
> Cheers,
> Gabriel Oliveira

