On 11 July 2013 10:10, Andrea Di Menna <[email protected]> wrote:

> Hi all,
>
> I have a question (which might sound stupid, I know) regarding the
> performance of the extraction framework when processing Wikipedias.
> Whenever I run any of the extractors on any wikipedia, I am noticing that
> the time to process each single wikipedia page decreases as the extraction
> goes on (as per the stats produced by the framework).
> What is the reason for this?
>

I've also seen this, and I'm not sure why it happens. It might be the JIT
compiler kicking in after a while, although when one extracts multiple
languages, I think no new classes are loaded for the second and all
following languages. Or it could be the initial overhead of the bz2
decoding: to read the first few bytes, bz2 may have to decompress the
whole first block, which is several hundred kilobytes large. Or it could
be some other work that only has to be done once for the whole extraction.
Any of these would cause an initial delay that makes the extraction of the
first few pages look slower than that of the following ones. The displayed
speed is not a rolling average; it's simply the quotient of total time
spent and number of pages done, so a one-time delay keeps weighing on the
figure, less and less as more pages are processed.
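
To make the effect of that quotient concrete, here is a small sketch in
Python. It is not the framework's code: the function name and all numbers
(a 5 second one-time startup cost, a constant 2 ms per page) are made up
for illustration. It only shows that total time divided by pages done
keeps falling even when the real per-page cost never changes.

```python
def cumulative_ms_per_page(startup_ms, per_page_ms, pages_done):
    """Average time per page, computed as total elapsed time / pages done.

    startup_ms is a one-time cost (JIT warm-up, first bz2 block, ...),
    per_page_ms is the assumed constant cost of processing one page.
    """
    total_ms = startup_ms + per_page_ms * pages_done
    return total_ms / pages_done


# Hypothetical numbers, chosen only to illustrate the trend.
STARTUP_MS = 5000.0
PER_PAGE_MS = 2.0

for pages in (100, 1_000, 10_000, 100_000):
    avg = cumulative_ms_per_page(STARTUP_MS, PER_PAGE_MS, pages)
    print(f"{pages:>7} pages: {avg:.2f} ms/page")
```

The printed figure starts far above 2 ms/page and approaches it as the
page count grows, which would match the decreasing per-page time in the
stats without any page actually being processed faster.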


>
> Also, I am wondering whether there is anything that can be done to speed
> up the extraction process, apart from boosting the hardware.
> Any ideas?
>

Not really. You might want to play around with the GC settings - they often
make quite a difference - and other -XX JVM settings.

Other than that, there are certainly a lot of places where the Scala code
can be made more efficient.

JC


>
> Cheers
> Andrea
>
>
> ------------------------------------------------------------------------------
> See everything from the browser to the database with AppDynamics
> Get end-to-end visibility with application monitoring from AppDynamics
> Isolate bottlenecks and diagnose root cause in seconds.
> Start your free trial of AppDynamics Pro today!
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
> _______________________________________________
> Dbpedia-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>
>