Re: [Dbpedia-discussion] Build Failure during abstract extraction

Jona Christopher Sahnwaldt Thu, 18 Apr 2013 06:55:27 -0700

Hi Julien,

That sucks. 21 hours and then it crashes. That's a bummer.


I don't know what's going on. You could try calling api.php from the
command line using curl and see what happens. Maybe it actually takes
extremely long to render that article. Calling api.php is a bit
cumbersome, though: I think you have to copy the wikitext for the
article from the XML dump and construct a POST request. It may be
simpler to hack together a little HTML page with a form for all the
data you need (page title and content, I think) that POSTs the data
to api.php. If you do that, let us know; I'd love to add such a test
page to our MediaWiki files in the repo.
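For the curl route, here is a rough Java sketch of the same POST request, which also times the call. It is a sketch under assumptions, not DBpedia code. The endpoint `http://localhost/mediawiki/api.php` is a placeholder for wherever your local MediaWiki runs, and the class name `ApiPhpProbe` is made up. The parameters (`action=parse`, `format=xml`, `prop=text`, `title`, `text`) are the standard MediaWiki parse API ones and are only assumed to match what `AbstractExtractor` sends, so check `AbstractExtractor.scala` in your checkout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: POSTs one page's wikitext to a local api.php and times the call.
final class ApiPhpProbe {

    // Placeholder URL (assumption): point it at your own MediaWiki installation.
    static final String API_URL = "http://localhost/mediawiki/api.php";

    // Builds an application/x-www-form-urlencoded body with standard MediaWiki
    // action=parse parameters (assumed to match what the extractor sends).
    static String buildFormBody(String title, String wikitext) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "parse");
        params.put("format", "xml");
        params.put("prop", "text");
        params.put("title", title);
        params.put("text", wikitext);
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Sends the request and prints the HTTP status, response size and elapsed time.
    static void post(String title, String wikitext) throws IOException {
        byte[] body = buildFormBody(title, wikitext).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) URI.create(API_URL).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        conn.setReadTimeout(10 * 60 * 1000); // generous 10-minute read timeout
        long start = System.nanoTime();
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        int status = conn.getResponseCode();
        // getErrorStream() may return null, which try-with-resources tolerates.
        try (InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream()) {
            int size = in == null ? 0 : in.readAllBytes().length;
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("HTTP " + status + ", " + size + " bytes in " + ms + " ms");
        }
    }

    // args[0] = page title, args[1] = file containing the wikitext copied from the dump.
    public static void main(String[] args) throws IOException {
        String wikitext = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        post(args[0], wikitext);
    }
}
```

To use it, save the page's wikitext from the XML dump to a file (say `page.wiki`) and run it with the title and file name, e.g. `"Prix Ken Domon" page.wiki`. The long read timeout is deliberate: the extractor gave up with a read timeout, so the probe waits long enough to show how long rendering really takes. With curl, the `--data-urlencode` option does the same form encoding.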

@all - Is there a simpler way to test the abstract extraction for a
single page? I don't remember.

By the way, if I recall correctly, 21 hours for the French Wikipedia
sounds pretty slow. How many ms per page does the log file say? What
kind of machine do you have? I think on our reasonably (but not
extremely) fast machine with four cores it took something like 30 ms
per page. Are you sure you activated APC (the Alternative PHP Cache)?
That makes a huge difference.
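A quick back-of-the-envelope check, as a small Java sketch. The Maven summary in the quoted log reports a total time of 21:33:55.973s for the whole run. At the roughly 30 ms per page recalled above, that would be enough for about 2.6 million pages. Dividing the elapsed time by the page count in your own log gives your actual figure. The class and method names are made up for illustration, and the parser only handles the h:mm:ss.fff form shown in this log.

```java
// Hypothetical helper for rough extraction-rate arithmetic.
final class ExtractionRate {

    // Parses a Maven "Total time" value such as "21:33:55.973s" (h:mm:ss.fff) into ms.
    // Assumes the h:mm:ss form seen in this log; shorter runs are printed differently.
    static long parseMavenTotalTime(String value) {
        String hms = value.endsWith("s") ? value.substring(0, value.length() - 1) : value;
        String[] parts = hms.split(":");
        long hours = Long.parseLong(parts[0]);
        long minutes = Long.parseLong(parts[1]);
        long millis = Math.round(Double.parseDouble(parts[2]) * 1000);
        return hours * 3_600_000L + minutes * 60_000L + millis;
    }

    // Average cost per page, given total elapsed time and pages processed.
    static double msPerPage(long elapsedMs, long pages) {
        return (double) elapsedMs / pages;
    }

    // Number of whole pages a run of the given length covers at a fixed per-page cost.
    static long pagesCovered(long elapsedMs, double msPerPage) {
        return (long) (elapsedMs / msPerPage);
    }

    public static void main(String[] args) {
        long elapsed = parseMavenTotalTime("21:33:55.973s");
        System.out.println("elapsed ms: " + elapsed);
        System.out.println("pages at 30 ms/page: " + pagesCovered(elapsed, 30.0));
    }
}
```

Running it prints an elapsed time of 77635973 ms and 2587865 pages at 30 ms/page. Keep in mind that Maven's total time covers the whole run, not just the time spent inside api.php.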

Good luck,
JC
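On the memory question in the quoted message: by the usual Unix convention, the exit value 137 that Maven reports is 128 + 9. This means the forked extraction process was terminated by signal 9 (SIGKILL) rather than exiting on its own. On Linux, an unexpected SIGKILL is often the kernel's out-of-memory killer, which normally leaves a record in the kernel log (e.g. `dmesg`). It could also be a manual `kill -9`. The `Final Memory: 10M/147M` line most likely describes Maven's own JVM, not the forked process. The class name below is made up, and the signal numbers are the common Linux ones.

```java
import java.util.Map;

// Hypothetical helper: decodes a Unix-style process exit value.
final class ExitStatus {

    // Linux signal numbers for a few common cases (numbering can differ on other Unixes).
    private static final Map<Integer, String> SIGNALS = Map.of(
            2, "SIGINT", 6, "SIGABRT", 9, "SIGKILL", 11, "SIGSEGV", 15, "SIGTERM");

    // Values above 128 conventionally mean "terminated by signal (value - 128)".
    static String describe(int exitValue) {
        if (exitValue > 128) {
            int signal = exitValue - 128;
            return "killed by signal " + signal + " (" + SIGNALS.getOrDefault(signal, "unknown") + ")";
        }
        return "exited with status " + exitValue;
    }

    public static void main(String[] args) {
        System.out.println(describe(137)); // prints "killed by signal 9 (SIGKILL)"
    }
}
```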

On 18 April 2013 11:52, Julien Plu <julien....@redaction-developpez.com> wrote:
> Hi,
>
> After around 21 hours of processing, the abstract extraction was stopped by
> a "build failure":
>
> avr. 18, 2013 10:33:44 AM org.dbpedia.extraction.mappings.AbstractExtractor$$anonfun$retrievePage$1 apply$mcVI$sp
> INFO: Error retrieving abstract of title=Prix Ken Domon;ns=0/Main/;language:wiki=fr,locale=fr. Retrying...
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.read(SocketInputStream.java:150)
>     at java.net.SocketInputStream.read(SocketInputStream.java:121)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:633)
>     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:579)
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1322)
>     at org.dbpedia.extraction.mappings.AbstractExtractor$$anonfun$retrievePage$1.apply$mcVI$sp(AbstractExtractor.scala:124)
>     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78)
>     at org.dbpedia.extraction.mappings.AbstractExtractor.retrievePage(AbstractExtractor.scala:109)
>     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:66)
>     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:21)
>     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
>     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
>     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
>     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
>     at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
>     at scala.collection.immutable.List.foreach(List.scala:76)
>     at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:239)
>     at scala.collection.immutable.List.flatMap(List.scala:76)
>     at org.dbpedia.extraction.mappings.CompositeMapping.extract(CompositeMapping.scala:13)
>     at org.dbpedia.extraction.mappings.RootExtractor.apply(RootExtractor.scala:23)
>     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:29)
>     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:25)
>     at org.dbpedia.extraction.util.SimpleWorkers$$anonfun$apply$1$$anon$2.process(Workers.scala:23)
>     at org.dbpedia.extraction.util.Workers$$anonfun$1$$anon$1.run(Workers.scala:131)
>
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 21:33:55.973s
> [INFO] Finished at: Thu Apr 18 10:35:37 CEST 2013
> [INFO] Final Memory: 10M/147M
> [INFO] ------------------------------------------------------------------------
> [ERROR] Failed to execute goal org.scala-tools:maven-scala-plugin:2.15.2:run (default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 137(Exit value: 137) -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please read the following articles:
> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>
> Does anyone know why this error happened? Not enough memory?
>
> Best.
>
> Julien.
>
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
