Hi Gaurav,

I'm glad that it worked in the end. Abstract extraction is the most
complex part of the DBpedia extraction process. Most people would
probably have given up, but you persevered and succeeded. Good job!

I just found a few more notes I took when I ran the abstract
extraction in summer 2012. Sorry that I didn't remember them earlier.
I added them to the wiki page you created:

https://github.com/dbpedia/extraction-framework/wiki/Dbpedia-Abstract-Extraction--Step-by-step#notes-by-christopher

Thanks again for your hard work and your contributions!

Cheers,
JC

On 30 March 2013 13:01, Dimitris Kontokostas <[email protected]> wrote:
>
> Hi gaurav,
>
> By contributing back you automatically become a member
> So, you are very welcome to add it yourself ;) after all it's your work...
>
> I'd suggest to create a new page and make a link from here
> https://github.com/dbpedia/extraction-framework/wiki/Extraction-Instructions
> to the new one, but we are very open to other suggestions.
>
> Cheers,
> Dimitris
>
>
> On Sat, Mar 30, 2013 at 1:28 PM, gaurav pant <[email protected]> wrote:
>>
>> Hi Jona/Dimitris/All,
>>
>> Thanks for all the help and for being so patient with me. The setup has
>> finally been completed successfully.
>>
>> As you suggested, I am putting the whole step-by-step approach here so that
>> after review you can put it somewhere in a shared/common place. I would also
>> ask all DBpedia developers to keep it up to date in that single place. It
>> will be very helpful for everyone.
>>
>> "
>>
>> Abstract Extraction from a DBpedia dump
>>
>> Software Requirements-
>>
>> MySQL
>> PHP with xml and apc
>> Scala
>> Maven
>> MediaWiki
>>
>> Steps to be followed-
>>
>> Step 1- Download extraction framework
>>
>> Please download or pull the extraction framework using git utility.
>>
>> "git clone git://github.com/dbpedia/extraction-framework.git"
>>
>>
>> Step 2- Download DBpedia dumps if required
>>
>> If you want to download the DBpedia dump files, run:
>>
>> "cd dump;../clean-install-run download download.minimal.properties"
>>
>> Some download configuration files already ship with the repository under
>> dump; customize one according to your needs and run the above command to
>> download the dump.
>>
>> The download configuration file defines a base-dir; the command above
>> downloads the dump inside this directory with the following structure:
>>
>>
>> /path_to_download_folder/yyyymmdd/[language_code]wiki-yyyymmdd-pages-articles.xml.bz2
>>
>> NOTE- If you have already downloaded the pages-articles dump manually
>> (without using this utility), please skip step 2. But make sure that the
>> naming convention for the directory structure above has been followed. If
>> not, create this directory structure manually.
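The expected layout can be recreated for a manually downloaded dump roughly as follows (the base directory, dump date and language code below are placeholders of my own, not values from this thread):

```shell
# Sketch: recreate the directory layout the download utility would produce.
# BASE_DIR stands in for the base-dir set in the download configuration.
BASE_DIR=$(mktemp -d)
DUMP_DATE=20130301   # dump date in yyyymmdd form
LANG_CODE=en

mkdir -p "$BASE_DIR/$DUMP_DATE"

# A manually downloaded dump would then be moved into place, e.g.:
# mv enwiki-20130301-pages-articles.xml.bz2 \
#    "$BASE_DIR/$DUMP_DATE/${LANG_CODE}wiki-${DUMP_DATE}-pages-articles.xml.bz2"
echo "$BASE_DIR/$DUMP_DATE"
```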
>>
>>
>> Step 3- Install Basic Software
>>
>> You need to install MySQL, PHP, Apache and other software.
>>
>> To install and start the MySQL server, you can use
>> dump/src/main/bash/mysql.sh . If you do not want to use this script, that
>> is fine. Just make sure that all the configuration parameters specified in
>> the script have been applied to the MySQL configuration file -- my.cnf.
>>
>> Install PHP, Apache & MySQL properly. The installation of these packages
>> is out of scope here; please refer to their own documentation.
>>
>> You also need to install php-xml and php-apc, to avoid some errors and
>> performance issues which are described later in this document.
>>
>> NOTE- On some Linux/Unix distributions the package for php-apc may be
>> named php-pecl-apc. It is a PHP accelerator (opcode/object cache).
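As a quick sanity check after installation, a sketch along these lines (assuming the `php` CLI is on PATH; the package names mentioned are the usual ones, but they vary by distribution) verifies that the xml and apc modules are actually loaded:

```shell
# Check whether a given PHP module is loaded; prints a hint if not.
check_php_module() {
  if ! command -v php >/dev/null 2>&1; then
    echo "php CLI not found -- install PHP first"
    return 0
  fi
  if php -m | grep -qi "^$1$"; then
    echo "$1: loaded"
  else
    echo "$1: missing -- install the matching package (php-xml / php-apc)"
  fi
}

check_php_module xml
check_php_module apc
```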
>>
>> I also came across a script which may be used for this setup. I have not
>> tested it, but it seems like it should work:
>>
>>
>> https://github.com/saxenap/install-php-apc-mysql-amazon-linux-centos/blob/master/php-apc-mysql-script.sh
>>
>> Finally, download MediaWiki from http://www.mediawiki.org/wiki/Download .
>> Using the latest stable release is recommended. You can also clone the
>> latest code from git:
>>
>> git clone https://gerrit.wikimedia.org/r/p/mediawiki/core.git
>>
>>
>> Step 4- Trigger the import into MySQL
>>
>> In order to generate clean abstracts from Wikipedia articles, one needs to
>> render wiki templates as they would be rendered in the original Wikipedia
>> instance. So for the DBpedia abstract extractor to work, a running
>> MediaWiki instance with the Wikipedia data in a MySQL database is
>> necessary.
>>
>> To import the data, you need to run the Scala 'import' launcher. Before
>> importing, adapt its settings in dump/pom.xml as below.
>>
>>
>> "<launcher>
>> <id>import</id>
>> <mainClass>org.dbpedia.extraction.dump.sql.Import</mainClass>
>> <jvmArgs>
>> <jvmArg>-server</jvmArg>
>> </jvmArgs>
>> <args>
>>
>> <arg>path_to_download_folder</arg>
>> <arg>/path_to_wikimedia_parent_dir/mediawiki/maintenance/tables.sql</arg>
>>
>> <arg>jdbc:mysql://machine_name:mysql_port/?characterEncoding=UTF-8&amp;user=myuser&amp;password=mypass</arg>
>>
>> <arg>false</arg><!-- require-download-complete -->
>>
>> <arg>language-code</arg><!-- languages and article count ranges,
>> comma-separated -->
>> </args>
>> </launcher>
>> "
>>
>> If you have downloaded the DBpedia dump file manually, set
>> require-download-complete to false, since no marker file exists to
>> indicate a successful download.
>>
>> Now, to import the data into MySQL, run:
>>
>> ../clean-install-run import
>>
>> NOTE-
>>
>> a) If during the import you get the error "ERROR 1283: Column 'si_title'
>> cannot be part of FULLTEXT index", a collation must be specified for the
>> 'searchindex' table. In MediaWiki's tables.sql, change the closing line of
>> the searchindex table definition from
>>
>> ) ENGINE=MyISAM
>>
>> to
>>
>> ) ENGINE=MyISAM COLLATE='utf8_general_ci';
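One way to apply this fix is a sed one-liner, demonstrated here on a small stand-in file; the same substitution could be run over mediawiki/maintenance/tables.sql before importing (note it would touch every table ending in `ENGINE=MyISAM;`, not just searchindex):

```shell
# Demonstrate the collation fix on a minimal stand-in for tables.sql.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
CREATE TABLE /*_*/searchindex (
  si_page int unsigned NOT NULL,
  si_title varchar(255) NOT NULL default ''
) ENGINE=MyISAM;
EOF

# Append the collation so the FULLTEXT index on si_title can be created.
sed -i "s/) ENGINE=MyISAM;/) ENGINE=MyISAM COLLATE='utf8_general_ci';/" "$SAMPLE"
grep "ENGINE=MyISAM" "$SAMPLE"
```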
>>
>>
>>
>> Step 5- Prepare MediaWiki - Configuration and Settings
>>
>> To modify MediaWiki for DBpedia, just copy the three files from
>> https://github.com/dbpedia/extraction-framework/tree/master/dump/src/main/mediawiki
>> to the appropriate directory.
>>
>> In recent code they are already there. Still, check whether the patch
>> https://github.com/dbpedia/extraction-framework/commit/e36913dabe0715672cbf0f2e6c5d86ec424b08b3
>> has been applied to ApiParse.php.
>>
>> Now download the required MediaWiki extensions listed at the end of
>> LocalSettings.php. For downloading extensions you may refer to
>> http://www.mediawiki.org/wiki/Download_from_Git
>>
>> I first downloaded all the extensions using git, then copied the required
>> ones into /path_to_mediawiki_parent_dir/mediawiki/extensions, keeping the
>> folder structure.
>>
>> Configure your MediaWiki directory as a web directory by adding the
>> following configuration to httpd.conf:
>>
>> "
>>
>> Alias /mediawiki /path_to_mediawiki_parent_dir/mediawiki
>> <Directory /path_to_mediawiki_parent_dir/mediawiki>
>> Allow from all
>> </Directory>
>> "
>>
>>
>> Step 6- Check whether the MediaWiki and PHP configuration is correct
>>
>> Now open the following URL in your browser:
>> http://machine/mediawiki/api.php?uselang=en
>>
>> If you get some usage instructions in your browser, the MediaWiki
>> configuration has no issues. Skip everything below for this step and go to
>> the next one.
>>
>> If you do not get the usage information, resolve the errors one by one,
>> reloading the above URL after each fix, until the usage instructions
>> appear. Also keep checking the Apache error log.
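The same check can be scripted instead of using a browser; the host and path below are placeholders for your own setup. A working install returns the API help text with HTTP 200:

```shell
# Query the MediaWiki API endpoint and report the HTTP status code.
# curl prints 000 if the server is unreachable.
URL="http://localhost/mediawiki/api.php?uselang=en"
if command -v curl >/dev/null 2>&1; then
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' "$URL" || true)
else
  STATUS="curl-not-installed"
fi
echo "GET $URL -> HTTP $STATUS"
```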
>>
>> Now troubleshoot the errors displayed in the Apache error log. Below are
>> some errors & solutions which I faced.
>>
>> ---> Class 'DOMDocument' not found in LocalisationCache.php
>>
>> This error occurs because you have not installed the php-xml module
>> specified in step 3.
>>
>> ---> If it asks you to set $wgShowExceptionDetails = true; in
>> LocalSettings.php
>>
>> Simply do it. It makes MediaWiki output full debugging information.
>>
>> ---> After the above step you may get the error below.
>>
>> CACHE_ACCEL requested but no suitable object cache is present. You may
>> want to install APC.
>> Backtrace:
>> #0 [internal function]: ObjectCache::newAccelerator(Array)
>> #1
>> /mnt/ebs/framework/media_wiki/wikimedia/includes/objectcache/ObjectCache.php(85):
>> call_user_func('ObjectCache::ne...', Array)
>> #2
>> /mnt/ebs/framework/media_wiki/wikimedia/includes/objectcache/ObjectCache.php(72):
>> ObjectCache::newFromParams(Array)
>> #3
>> /mnt/ebs/framework/media_wiki/wikimedia/includes/objectcache/ObjectCache.php(44):
>> ObjectCache::newFromId(3)
>> #4
>> /mnt/ebs/framework/media_wiki/wikimedia/includes/GlobalFunctions.php(3780):
>> ObjectCache::getInstance(3)
>> #5 /mnt/ebs/framework/media_wiki/wikimedia/includes/Setup.php(464):
>> wfGetMainCache()
>> #6 /mnt/ebs/framework/media_wiki/wikimedia/includes/WebStart.php(157):
>> require_once('/mnt/ebs/framew...')
>> #7 /mnt/ebs/framework/media_wiki/wikimedia/api.php(47):
>> require('/mnt/ebs/framew...')
>> #8 {main}
>>
>> This means that you have not installed php-apc. It is an accelerator that
>> speeds up the process by around 4-5 times.
>>
>> If you really do not want to use php-apc, set
>> $wgMainCacheType = CACHE_ANYTHING. But this will have a significant impact
>> on performance. (Not recommended)
>>
>>
>> Step 7- Trigger the abstract extraction with the proper settings in the
>> extraction.abstracts.properties file
>>
>> ../clean-install-run extraction extraction.abstracts.properties
>>
>> "
>>
>> --
>> Regards
>> Gaurav Pant
>> +91-7709196607,+91-9405757794
>
>
>
>
> --
> Kontokostas Dimitris

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
