Re: [Dbpedia-discussion] dbpedia extraction framework on windows problems

Jona Christopher Sahnwaldt Mon, 08 Jul 2013 14:25:37 -0700

Hi Adrian,

thanks for the feedback!


I am an Eclipse fan who never warmed to IntelliJ. Developing DBpedia
on Eclipse works quite well for me. Here's what I do:

I don't care much about Maven / Eclipse integration. I run Maven on
the command line and use Eclipse as a smart editor. The necessary
Eclipse config files (.classpath, .project, ...) are checked into the
git repo and should work out of the box.

If there are some changes in a pom.xml that I need to propagate to
Eclipse, I run

$ mvn eclipse:eclipse -DdownloadSources=true

and then edit .project and .classpath by hand - mostly I delete
unnecessary stuff that Maven generates.

I usually install Eclipse with as few plugins as possible, starting
with the "Platform Runtime Binary" at
http://download.eclipse.org/eclipse/downloads/ . I then add only the
plugins I need. No Maven plugin.

As for the path settings - DBpedia configuration is a mess in general.
It's a pity that one needs to change file paths in so many files,
sometimes several places in one config file. :-(
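
To get an overview of what has to change before a Windows run, something like the following could list every hard-coded Unix path. This is only a minimal sketch and not part of the framework: the prefix /home/release is taken from the pom.xml quoted below, while the file patterns (pom.xml, *.properties) and the function names are assumptions made for illustration. The helper as_seen_on_windows shows how a drive-less Unix path is rendered with backslashes under Windows path rules, much like the path in the quoted stack trace.

```python
import re
import tempfile
from pathlib import Path, PureWindowsPath

# Assumption: the Unix-style prefix to look for. The thread only shows
# /home/release/... paths, so that is used as the default.
UNIX_PREFIX = "/home/release"
# Assumption: the config files of interest are pom.xml and *.properties files.
CONFIG_PATTERNS = ("pom.xml", "*.properties")


def as_seen_on_windows(unix_path: str) -> str:
    """Render a Unix-style path the way Windows path rules display it."""
    return str(PureWindowsPath(unix_path))


def find_hardcoded_paths(root: Path, prefix: str = UNIX_PREFIX):
    """Yield (file, line number, path) for each path starting with prefix."""
    # A path ends at whitespace, an XML angle bracket, or a quote character.
    pattern = re.compile(re.escape(prefix) + r"[^\s<>\"']*")
    for glob in CONFIG_PATTERNS:
        for file in sorted(root.rglob(glob)):
            text = file.read_text(encoding="utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                for match in pattern.finditer(line):
                    yield file, number, match.group(0)


if __name__ == "__main__":
    # Demo on a throwaway directory holding one tiny, made-up pom.xml.
    with tempfile.TemporaryDirectory() as tmp:
        demo_root = Path(tmp)
        (demo_root / "pom.xml").write_text(
            "<arg>/home/release/wikipedia</arg>\n<arg>true</arg>\n",
            encoding="utf-8",
        )
        for file, number, path in find_hardcoded_paths(demo_root):
            print(f"{file.name}:{number}: {path} -> {as_seen_on_windows(path)}")
```

Pointing find_hardcoded_paths at the root of a checkout instead of the demo directory would list the lines to edit; the script only reports paths and does not rewrite anything.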

You write "20 GB is not enough" - do you mean RAM or disk space? For
most extraction steps, 4 to 6 GB RAM should easily be enough.

Regards,
Christopher

On 8 July 2013 21:30, Adrian Brasoveanu <adrian.brasove...@gmail.com> wrote:
> Hello Dimitris,
>
> I am on vacation, so I don't really have a good Internet connection.
> After reading your mail, I am successfully running an extraction with the
> download.minimal.properties file (it is slow, but I was really curious).
> However I did this under Ubuntu and it was really easy (following all the
> steps from http://wiki.dbpedia.org/ExtractionOnUbuntu?v=fnb).
> I did this because I wanted to see the difference between Windows and Linux
> when running the bash scripts...
>
> I also discovered I need a bigger virtual machine for such tasks :) (20 GB
> is not enough :) ).
>
> It took about 45 minutes to install everything under Linux (IntelliJ IDEA,
> Apache, PHP, MySQL, MediaWiki, etc.).
> Under Windows it took about 3 hours, mainly because I first tried to do
> it with Eclipse.
> After that I switched to IntelliJ IDEA and it took another hour or so.
>
> I noticed the following:
> 0) Eclipse is really bad when it comes to Maven and Scala, so I used
> IntelliJ, since that was also recommended on Ubuntu.
> I used the latest version of everything (IntelliJ Community, Maven,
> Scala, and so on).
> 1) The Maven integration did not work out of the box on Windows, even in
> IntelliJ, unlike on Linux (it took more than 30-40 minutes to pull in all
> the files and dependencies), but there were nice and clear error messages
> that tell you which jar is missing.
> 2) There were 4-5 jars missing (they are easy to find on the web anyway;
> apache.commons.compress was one of them, for example, and a certain
> version of scala-test).
> 3) All the paths need to be changed (clearly you're not using
> /home/release/downloads or something similar on Windows).
> 4) There is a need for a good tutorial on how to run bash files under
> Windows.
> 5) Most of the errors you will get will be path errors.
>
> That would be it, if you want to run it on Windows. However, I did not
> add any settings for WAMP and MediaWiki on Windows.
> Yes, I am willing to contribute to any Windows documentation or also to the
> normal documentation.
>
> As you probably guessed I am mostly interested in writing custom extractors,
> that's where all the attraction of this framework lies,
> not to mention that it's a good excuse to try Scala :).
>
> Best regards,
> Adrian
>
>
>
> On Fri, Jul 5, 2013 at 6:31 PM, Dimitris Kontokostas <jimk...@gmail.com>
> wrote:
>>
>> Hi Adrian,
>>
>> The project runs with Maven 3, and the dump pom defines various launchers.
>> For simple extraction we use the download & extraction launchers:
>>
>>
>> https://github.com/dbpedia/extraction-framework/wiki/Extraction-Instructions
>> The "run" script runs in bash, but if you take a look at it [1] you will
>> quickly find out what it needs to run the scripts on Windows.
>> You can equivalently create run configurations in IntelliJ / Eclipse.
>>
>> Once you get it running, could you take some time and create a guide for
>> Windows users?
>>
>> Cheers,
>> Dimitris
>>
>> [1] https://github.com/dbpedia/extraction-framework/blob/master/run
>>
>>
>> On Thu, Jul 4, 2013 at 11:55 AM, Adrian Brasoveanu
>> <adrian.brasove...@gmail.com> wrote:
>>>
>>> Hello all,
>>>
>>> Sorry for re-posting this. First time I got an error message because I
>>> was not subscribed to this list.
>>>
>>> I tried running the DBpedia Extraction Framework on Windows.
>>> I used these settings in the pom.xml:
>>>
>>>     <launcher>
>>>         <id>import</id>
>>>         <mainClass>org.dbpedia.extraction.dump.sql.Import</mainClass>
>>>         <jvmArgs>
>>>             <jvmArg>-server</jvmArg>
>>>         </jvmArgs>
>>>         <args>
>>>             <!-- base folder of downloaded dumps -->
>>>             <arg>/home/release/wikipedia</arg>
>>>             <!-- location of SQL file containing MediaWiki table definitions -->
>>>             <arg>/home/release/data/projects/mediawiki/core/maintenance/tables.sql</arg>
>>>             <!-- JDBC URL of MySQL server. Import creates a new database for each wiki. -->
>>>             <arg>jdbc:mysql://localhost/?characterEncoding=UTF-8</arg>
>>>             <!-- require-download-complete -->
>>>             <arg>true</arg>
>>>             <!-- file name: pages-articles.xml{,.bz2,.gz} -->
>>>             <arg>pages-articles.xml.bz2</arg>
>>>             <!-- languages and article count ranges, comma-separated, e.g. "de,en" or "@mappings" etc. -->
>>>             <arg>en</arg>
>>>         </args>
>>>     </launcher>
>>>
>>> The error I got was this:
>>>
>>> [INFO] launcher 'import' selected => org.dbpedia.extraction.dump.sql.Import
>>> java.lang.reflect.InvocationTargetException
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>> at org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
>>> at org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
>>> Caused by: java.io.FileNotFoundException: \home\release\data\projects\mediawiki\core\maintenance\tables.sql (The system cannot find the path specified)
>>> at java.io.FileInputStream.open(Native Method)
>>> at java.io.FileInputStream.<init>(FileInputStream.java:138)
>>> at scala.io.Source$.fromFile(Source.scala:91)
>>> at scala.io.Source$.fromFile(Source.scala:76)
>>> at org.dbpedia.extraction.dump.sql.Import$.main(Import.scala:32)
>>> at org.dbpedia.extraction.dump.sql.Import.main(Import.scala)
>>> ... 6 more
>>>
>>> So it appears that I need to have mediawiki even though I don't want to
>>> extract the abstracts...
>>>
>>> My questions are these:
>>> 1) Assuming that I do not want to generate the abstracts yet (I just want
>>> to see how it works and how to create custom dumps),
>>> do I still need a local copy of MediaWiki and Wikipedia?
>>> (http://wiki.dbpedia.org/Documentation,
>>> https://github.com/dbpedia/extraction-framework/wiki/Extraction-Instructions
>>> and
>>> http://wiki.dbpedia.org/Documentation/ExtractionConfiguration?v=17gm do
>>> not mention that I need a MediaWiki and Wikipedia mirror unless I want
>>> to extract abstracts.)
>>>
>>> 2) Does this process work on Windows? Do I still need to provide old
>>> dumps in order to run this framework?
>>>
>>> 3) Where can I set up the default configuration file that I will use?
>>> There is no default configuration specified in the pom file, so that when
>>> I run the Scala plugin it would automatically use that config file.
>>>
>>>
>>> Best regards,
>>> Adrian
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> This SF.net email is sponsored by Windows:
>>>
>>> Build for Windows Store.
>>>
>>> http://p.sf.net/sfu/windows-dev2dev
>>> _______________________________________________
>>> Dbpedia-discussion mailing list
>>> Dbpedia-discussion@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>>
>>
>>
>>
>> --
>> Kontokostas Dimitris
>

