Hi Jona,

The API responds correctly :-( My guess is that the
"SocketTimeoutException" occurs when an abstract doesn't exist, no?
The exception appeared many times during the extraction, but it isn't
blocking.
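
In case it helps anyone repeat the single-page test Jona suggested, here is
a minimal sketch in Python of the POST to api.php. The URL, the helper names
and the exact parameter set (action=parse with title and text) are my
assumptions, not taken from the extraction framework, so adjust them to your
own setup.

```python
import urllib.parse
import urllib.request

# Assumption: a local MediaWiki mirror; change this to your own installation.
API_URL = "http://localhost/mediawiki/api.php"


def build_parse_request(api_url, title, wikitext):
    """Build a POST request asking api.php to render the wikitext of one page."""
    params = {
        "action": "parse",   # MediaWiki API module that renders wikitext
        "format": "xml",
        "prop": "text",      # only return the rendered text
        "title": title,
        "text": wikitext,
    }
    data = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(api_url, data=data, method="POST")


def render_page(api_url, title, wikitext, timeout=30):
    """Send the request; a slow page shows up as a timeout error after `timeout` seconds."""
    request = build_parse_request(api_url, title, wikitext)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling render_page(API_URL, "Prix Ken Domon", wikitext) with the wikitext
copied from the XML dump should show whether that one article is really slow
to render.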

Also, I have some gzipped files with data inside in my dump directories, so
the extraction worked until the error occurred.

I am rerunning the extraction, this time with
"extraction.default.properties"; we will see if there is an improvement...

The machine is a virtual machine (VirtualBox) with 2 GB of memory and
3 cores from my computer, so it is normal for it to be this slow. But I
will try it on another machine, a real server.
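
About the exit value 137 in the Maven output: by shell convention a status
above 128 means the process was killed by signal (status - 128), so 137 would
be signal 9 (SIGKILL). With only 2 GB in the VM, the kernel's out-of-memory
killer is a plausible cause, but that is my guess and not something the log
confirms. A small sketch of the decoding (the function name is only
illustrative):

```python
import signal


def describe_exit_status(status):
    """Explain a shell-style exit status, where values above 128 mean 128 + signal number."""
    if status > 128:
        number = status - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"killed by signal {number} ({name})"
    return f"exited on its own with code {status}"


print(describe_exit_status(137))
```

If the out-of-memory killer was responsible, the kernel log on the VM
(dmesg) would normally mention it.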

Best.

Julien.


2013/4/18 Jona Christopher Sahnwaldt <j...@sahnwaldt.de>

> Hi Julien,
>
> That sucks. 21 hours and then it crashes. That's a bummer.
>
> I don't know what's going on. You could try calling api.php from the
> command line using curl and see what happens. Maybe it actually takes
> extremely long to render that article. Calling api.php is a bit
> cumbersome though - I think you have to copy the wikitext for the
> article from the xml dump and construct a POST request. It may be
> simpler to hack together a little HTML page with a form for all the
> data you need (page title and content, I think) which POSTs the data
> to api.php. If you do that, let us know, I'd love to add such a test
> page to our MediaWiki files in the repo.
>
> @all - Is there a simpler way to test the abstract extraction for a
> single page? I don't remember.
>
> By the way, 21 hours for the French Wikipedia sounds pretty slow, if I
> recall correctly. How many ms per page does the log file say? What
> kind of machine do you have? I think on our reasonably but not
> extremely fast machine with four cores it took something like 30 ms
> per page. Are you sure you activated APC? That makes a huge
> difference.
>
> Good luck,
> JC
>
> On 18 April 2013 11:52, Julien Plu <julien....@redaction-developpez.com>
> wrote:
> > Hi,
> >
> > After around 21 hours of processing, the abstract extraction was
> > stopped by a "build failure":
> >
> > avr. 18, 2013 10:33:44 AM org.dbpedia.extraction.mappings.AbstractExtractor$$anonfun$retrievePage$1 apply$mcVI$sp
> > INFO: Error retrieving abstract of title=Prix Ken Domon;ns=0/Main/;language:wiki=fr,locale=fr. Retrying...
> > java.net.SocketTimeoutException: Read timed out
> >     at java.net.SocketInputStream.socketRead0(Native Method)
> >     at java.net.SocketInputStream.read(SocketInputStream.java:150)
> >     at java.net.SocketInputStream.read(SocketInputStream.java:121)
> >     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> >     at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> >     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:633)
> >     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:579)
> >     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1322)
> >     at org.dbpedia.extraction.mappings.AbstractExtractor$$anonfun$retrievePage$1.apply$mcVI$sp(AbstractExtractor.scala:124)
> >     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78)
> >     at org.dbpedia.extraction.mappings.AbstractExtractor.retrievePage(AbstractExtractor.scala:109)
> >     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:66)
> >     at org.dbpedia.extraction.mappings.AbstractExtractor.extract(AbstractExtractor.scala:21)
> >     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
> >     at org.dbpedia.extraction.mappings.CompositeMapping$$anonfun$extract$1.apply(CompositeMapping.scala:13)
> >     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
> >     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:239)
> >     at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
> >     at scala.collection.immutable.List.foreach(List.scala:76)
> >     at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:239)
> >     at scala.collection.immutable.List.flatMap(List.scala:76)
> >     at org.dbpedia.extraction.mappings.CompositeMapping.extract(CompositeMapping.scala:13)
> >     at org.dbpedia.extraction.mappings.RootExtractor.apply(RootExtractor.scala:23)
> >     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:29)
> >     at org.dbpedia.extraction.dump.extract.ExtractionJob$$anonfun$1.apply(ExtractionJob.scala:25)
> >     at org.dbpedia.extraction.util.SimpleWorkers$$anonfun$apply$1$$anon$2.process(Workers.scala:23)
> >     at org.dbpedia.extraction.util.Workers$$anonfun$1$$anon$1.run(Workers.scala:131)
> >
> > [INFO]
> > ------------------------------------------------------------------------
> > [INFO] BUILD FAILURE
> > [INFO]
> > ------------------------------------------------------------------------
> > [INFO] Total time: 21:33:55.973s
> > [INFO] Finished at: Thu Apr 18 10:35:37 CEST 2013
> > [INFO] Final Memory: 10M/147M
> > [INFO]
> > ------------------------------------------------------------------------
> > [ERROR] Failed to execute goal org.scala-tools:maven-scala-plugin:2.15.2:run (default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 137(Exit value: 137) -> [Help 1]
> > [ERROR]
> > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> > [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> > [ERROR]
> > [ERROR] For more information about the errors and possible solutions, please read the following articles:
> > [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> >
> > Does anyone know why this error happened? Not enough memory?
> >
> > Best.
> >
> > Julien.
> >
> >
> ------------------------------------------------------------------------------
> > Precog is a next-generation analytics platform capable of advanced
> > analytics on semi-structured data. The platform includes APIs for
> building
> > apps and a phenomenal toolset for data science. Developers can use
> > our toolset for easy data analysis & visualization. Get a free account!
> > http://www2.precog.com/precogplatform/slashdotnewsletter
> > _______________________________________________
> > Dbpedia-discussion mailing list
> > Dbpedia-discussion@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
> >
>