Hi Max and Pablo,
    Thanks for your help.

    I checked the parser, and the Module namespace is not configured for
Portuguese [1, line 225].
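Below is a minimal Scala sketch of the failure mode. It assumes, as the "key not found: 828" in the trace quoted below suggests, that the dump parser looks up each page's numeric namespace in an immutable Map, and that 828 is the code of the Module namespace. The table contents and the names NamespaceLookupDemo, strictLookup and safeLookup are hypothetical and are not taken from the extraction framework:

```scala
import scala.util.Try

object NamespaceLookupDemo {
  // Hypothetical "pt" table keyed by numeric namespace code. It deliberately
  // has no entry for 828 (Module), mirroring the situation in this thread.
  val ptNamespaces: Map[Int, String] = Map(
    0  -> "",             // main/article namespace
    10 -> "Predefinição", // Template
    14 -> "Categoria"     // Category
  )

  // Map.apply throws java.util.NoSuchElementException("key not found: 828")
  // for a missing key, which matches the "Caused by" line in the trace.
  def strictLookup(code: Int): String = ptNamespaces(code)

  // Map.get returns None instead of throwing.
  def safeLookup(code: Int): Option[String] = ptNamespaces.get(code)

  def main(args: Array[String]): Unit = {
    println(Try(strictLookup(828))) // Failure(java.util.NoSuchElementException: key not found: 828)
    println(safeLookup(828))        // None
  }
}
```

Because Map.apply throws on a missing key, a single unconfigured namespace code would be enough to abort the whole run as soon as the first Module page is read, which is consistent with the crash happening only after millions of occurrences had been saved.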
    However, it isn't clear to me how to configure a new namespace or how to
run GenerateWikiConfig.scala, since we shouldn't modify the
Namespaces.scala file directly. Can you point me to a template or a wiki page
with instructions?
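For illustration only, here is a hypothetical Scala sketch of what the Portuguese configuration would need to contain: an entry mapping a local name to code 828 (the Module namespace added by MediaWiki's Scribunto extension), plus a reverse index by code. This is not the layout of Namespaces.scala or the output of GenerateWikiConfig.scala. The object name, the table, the resolve helper and the example titles are assumptions, and "Módulo" is assumed to be the Portuguese local name of the namespace:

```scala
object ModuleNamespaceSketch {
  // Hypothetical "pt" table: local namespace name -> MediaWiki namespace code.
  // Not the real Namespaces.scala layout; it only shows the entry that is missing.
  val ptNamespaces: Map[String, Int] = Map(
    "Predefinição" -> 10, // Template
    "Categoria"    -> 14, // Category
    "Módulo"       -> 828 // Module (Scribunto); assumed Portuguese local name
  )

  // Reverse index: numeric namespace code (as found in the dump) -> local name.
  val byCode: Map[Int, String] = ptNamespaces.map { case (name, code) => code -> name }

  // Split a title such as "Módulo:Exemplo" into (828, "Exemplo").
  // Titles without a configured prefix are treated as main namespace (0).
  def resolve(title: String): (Int, String) = title.indexOf(':') match {
    case -1 => (0, title)
    case i =>
      ptNamespaces.get(title.substring(0, i)) match {
        case Some(code) => (code, title.substring(i + 1))
        case None       => (0, title)
      }
  }

  def main(args: Array[String]): Unit = {
    println(byCode(828))               // Módulo
    println(resolve("Módulo:Exemplo")) // (828,Exemplo)
    println(resolve("Brasil"))         // (0,Brasil)
  }
}
```

With the 828 entry present, a lookup by code from the dump succeeds and "Módulo:" prefixes in titles resolve to the right namespace; without it, the code-based lookup fails exactly as in the trace below.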

Best,
Jairo

[1] https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/impl/wikipedia/Namespaces.scala#L225


On Tue, Jun 4, 2013 at 12:14 PM, Pablo N. Mendes <[email protected]> wrote:

> Hi Gabriel,
> Max has given you some pretty clear and direct pointers to where you
> should start looking. If you have already tried a bunch of things and still
> cannot find the problem, please describe what you have tried, ask clear and
> direct questions, and we can try to support you to the best of our
> availability.
>
> Cheers,
> Pablo
>
>
> On Tue, Jun 4, 2013 at 1:34 AM, Gabriel Oliveira <[email protected]> wrote:
>
>> Hello Max,
>>
>> I'm working with Portuguese. I'm not sure if I understand what's going on
>> and how to solve it.
>>
>> Cheers,
>> Gabriel Oliveira
>>
>>
>> 2013/5/29 Max Jakob <[email protected]>
>>
>>> Hi Gabriel, CCing dbpedia-developers list,
>>>
>>> this looks like a problem with the DBpedia parser, so it is not
>>> directly a Spotlight problem. It's related to the (fairly new?) Module
>>> namespace in Wikipedia [1] that is not handled by the parser for all
>>> languages yet [2]. I assume you are not working with English, French
>>> or Hungarian dumps. For all other languages, the Module namespace is
>>> not configured yet. Which language are you working with?
>>> If you understand what's going on, you can add the appropriate
>>> configuration yourself and send a pull request on GitHub to the
>>> extraction-framework repo [3]. Otherwise, the developer community
>>> might be able to help you.
>>> After this is corrected, install the extraction-framework in your
>>> local Maven repo by running mvn clean install. Afterwards, do the same
>>> for Spotlight again. Finally, re-attempt to run the indexing.
>>>
>>> Cheers,
>>> Max
>>>
>>> [1] http://en.wikipedia.org/wiki/Wikipedia:Namespace
>>> [2] https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/wikiparser/impl/wikipedia/Namespaces.scala
>>> [3] https://github.com/dbpedia/extraction-framework
>>>
>>>
>>> On Wed, May 29, 2013 at 10:27 PM, Gabriel Oliveira <[email protected]> wrote:
>>> > Hello Max,
>>> >
>>> > I did as you told me and I have managed to fix some problems. I am still
>>> > learning how to use Maven and IntelliJ, so I missed a few details and it
>>> > took me a while to realize that, but now I have made some progress.
>>> >
>>> > Unlike the previous attempts, this time it ran for almost an hour and
>>> > saved about 10,800,000 occurrences. However, after these occurrences are
>>> > saved, an exception is thrown, and now I believe it is not my fault
>>> > anymore.
>>> > The output is as follows:
>>> >
>>> >  INFO 2013-05-29 16:55:58,441 main [AllOccurrenceSource$] - Processed 1300000 Wikipedia definition pages (average 9.73 links per page)
>>> >  INFO 2013-05-29 16:56:23,092 main [FileOccurrenceSource$] -   saved 10600000 occurrences
>>> >  INFO 2013-05-29 16:57:10,162 main [FileOccurrenceSource$] -   saved 10700000 occurrences
>>> >  INFO 2013-05-29 16:58:00,311 main [FileOccurrenceSource$] -   saved 10800000 occurrences
>>> > java.lang.reflect.InvocationTargetException
>>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> >     at java.lang.reflect.Method.invoke(Method.java:601)
>>> >     at scala_maven_executions.MainHelper.runMain(MainHelper.java:164)
>>> >     at scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
>>> > Caused by: java.util.NoSuchElementException: key not found: 828
>>> >     at scala.collection.MapLike$class.default(MapLike.scala:225)
>>> >     at scala.collection.immutable.HashMap.default(HashMap.scala:38)
>>> >     at scala.collection.MapLike$class.apply(MapLike.scala:135)
>>> >     at scala.collection.immutable.HashMap.apply(HashMap.scala:38)
>>> >     at org.dbpedia.extraction.sources.WikipediaDumpParser.readPage(WikipediaDumpParser.java:218)
>>> >     at org.dbpedia.extraction.sources.WikipediaDumpParser.readPages(WikipediaDumpParser.java:179)
>>> >     at org.dbpedia.extraction.sources.WikipediaDumpParser.readDump(WikipediaDumpParser.java:137)
>>> >     at org.dbpedia.extraction.sources.WikipediaDumpParser.run(WikipediaDumpParser.java:108)
>>> >     at org.dbpedia.extraction.sources.XMLReaderSource.foreach(XMLSource.scala:57)
>>> >     at org.dbpedia.spotlight.io.AllOccurrenceSource$AllOccurrenceSource.foreach(AllOccurrenceSource.scala:80)
>>> >     at org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:58)
>>> >     at org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:58)
>>> >     at org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:58)
>>> >     at org.dbpedia.spotlight.io.FileOccurrenceSource$.writeToFile(FileOccurrenceSource.scala:57)
>>> >     at org.dbpedia.spotlight.lucene.index.ExtractOccsFromWikipedia$.main(ExtractOccsFromWikipedia.scala:82)
>>> >     at org.dbpedia.spotlight.lucene.index.ExtractOccsFromWikipedia.main(ExtractOccsFromWikipedia.scala)
>>> >     ... 6 more
>>> > [INFO] ------------------------------------------------------------------------
>>> > [INFO] BUILD FAILURE
>>> > [INFO] ------------------------------------------------------------------------
>>> > [INFO] Total time: 56:48.220s
>>> > [INFO] Finished at: Wed May 29 16:58:17 BRT 2013
>>> > [INFO] Final Memory: 11M/216M
>>> > [INFO] ------------------------------------------------------------------------
>>> >
>>> > [ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.0:run (default-cli) on project index: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 240(Exit value: 240) -> [Help 1]
>>> > [ERROR]
>>> > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
>>> > [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>>> > [ERROR]
>>> > [ERROR] For more information about the errors and possible solutions, please read the following articles:
>>> > [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>>> >
>>> > I will run it again with the -X switch and send you the full output as
>>> > soon as it finishes, probably in about an hour.
>>> >
>>> > I really appreciate your support.
>>> >
>>> > Cheers,
>>> > Gabriel Oliveira
>>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> How ServiceNow helps IT people transform IT departments:
>> 1. A cloud service to automate IT design, transition and operations
>> 2. Dashboards that offer high-level views of enterprise services
>> 3. A single system of record for all IT processes
>> http://p.sf.net/sfu/servicenow-d2d-j
>>
>> _______________________________________________
>> Dbpedia-developers mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>
>>
>
>
> --
>
> Pablo N. Mendes
> http://pablomendes.com
>
>
>
>
