Hi Gaurav,
Please check your extraction.de.property again:
"# download and extraction target dir
dir=/mnt/ebs/perl/framework/extraction-framework/dump/wiki_dump
# Source file. If source file name ends with .gz or .bz2, it is unzipped on the fly.
# Must exist in the directory xxwiki/20121231 and have the prefix xxwiki-20121231-.
# default:
# source=pages-articles.xml
# alternatives:
source=pages-articles.xml.bz2
# source=pages-articles.xml.gz
# use only directories that contain a 'download-complete' file? Default is false.
require-download-complete=true
# unqualified extractor class names are prefixed by org.dbpedia.extraction.mappings.
# All 111 languages that as of 2012-05-25 have 10000 articles or more.
# TODO: parse wikipedias.csv and figure out from there which languages to extract.
# If no languages are given, the ones having a mapping namespace on mappings.dbpedia.org are used
languages=de
extractors=InfoboxExtractor
#ArticleCategoriesExtractor,CategoryLabelExtractor,ExternalLinksExtractor,\
#GeoExtractor,InfoboxExtractor,LabelExtractor,PageIdExtractor,PageLinksExtractor,\
#RedirectExtractor,RevisionIdExtractor,SkosCategoriesExtractor,WikiPageExtractor
extractors.de=InfoboxExtractor
#extractors.de=MappingExtractor,DisambiguationExtractor,InterLanguageLinksExtractor,RedirectExtractor,LabelExtractor
#extractors.en=MappingExtractor,DisambiguationExtractor,InterLanguageLinksExtractor,RedirectExtractor,LabelExtractor
# if ontology and mapping files are not given or do not exist, download info from mappings.dbpedia.org
ontology=../ontology.xml
mappings=../mappings
# URI policies. Allowed flags: uri, generic, xml-safe. Each flag may have one of the suffixes
# -subjects, -predicates, -objects, -datatype, -context to match only URIs in a certain position.
# Without a suffix, a flag matches all URI positions.
uri-policy.uri=uri:en; generic:en; xml-safe-predicates:*
uri-policy.iri=generic:en; xml-safe-predicates:*
# File formats. Allowed flags: n-triples, n-quads, turtle-triples, turtle-quads, trix-triples, trix-quads
# May be followed by a semicolon and a URI policy name. If format name ends with .gz or .bz2, files
# are zipped on the fly.
# NT is unreadable anyway - might as well use URIs
format.nt=n-triples;uri-policy.uri
#format.nq.gz=n-quads;uri-policy.uri
# Turtle is much more readable - use nice IRIs
format.ttl=turtle-triples;uri-policy.iri
#format.tql.gz=turtle-quads;uri-policy.iri
"
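You wrote 'dir', so there is no 'base-dir' property in your extraction configuration. The extraction expects 'base-dir', which is why it fails with "property 'base-dir' not defined".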
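As a minimal sketch, assuming the directory itself is correct and only the property name is wrong, the first setting of the file would become the following (every other line stays unchanged):

```properties
# download and extraction target dir
# renamed from 'dir': the stack trace reports 'base-dir' as the property the extraction Config requires
base-dir=/mnt/ebs/perl/framework/extraction-framework/dump/wiki_dump
```

The error message does not show whether the download configuration needs the same property name, so download.de.properties is best left alone unless it fails in the same way.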
Cheers,
Riko
________________________________
Riko Adi Prasetya
Faculty of Computer Science
Universitas Indonesia
________________________________
From: gaurav pant <golup...@gmail.com>
To: dbpedia-discussion@lists.sourceforge.net
Sent: Tuesday, 5 March 2013 12:10
Subject: [Dbpedia-discussion] extraction problem
Hi All,
Greetings for the day.
I want to extract infobox properties and abstracts from pages-articles.xml.bz2. I was able to download this file using the command "../run download config=download.de.properties", where I configured the download.de.properties file to download only the German pages-articles file.
Now, when I try to extract information from it using "../run extraction extraction.de.property", I get the error below. In extraction.de.property I have set dir to the same directory that I used in the download.de.properties file.
Please let me know what is going wrong. Does anything need to be changed in the pom.xml of the dump dir?
"
[INFO] --- maven-scala-plugin:2.15.2:testCompile (test-compile) @ dump ---
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[WARNING] No source files found.
[INFO]
[INFO] <<< maven-scala-plugin:2.15.2:run (default-cli) @ dump <<<
[INFO]
[INFO] --- maven-scala-plugin:2.15.2:run (default-cli) @ dump ---
[INFO] Checking for multiple versions of scala
[INFO] launcher 'extraction' selected => org.dbpedia.extraction.dump.extract.Extraction
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
at org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.IllegalArgumentException: property 'base-dir' not defined.
at org.dbpedia.extraction.dump.extract.ConfigParser.error(ConfigParser.scala:18)
at org.dbpedia.extraction.dump.extract.Config.<init>(Config.scala:26)
at org.dbpedia.extraction.dump.extract.Extraction$.main(Extraction.scala:26)
at org.dbpedia.extraction.dump.extract.Extraction.main(Extraction.scala)
... 6 more
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.356s
[INFO] Finished at: Tue Mar 05 04:52:35 UTC 2013
[INFO] Final Memory: 8M/140M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.scala-tools:maven-scala-plugin:2.15.2:run (default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 240(Exit value: 240) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
"
Contents of extraction.de.property:
"# download and extraction target dir
dir=/mnt/ebs/perl/framework/extraction-framework/dump/wiki_dump
# Source file. If source file name ends with .gz or .bz2, it is unzipped on the fly.
# Must exist in the directory xxwiki/20121231 and have the prefix xxwiki-20121231-.
# default:
# source=pages-articles.xml
# alternatives:
source=pages-articles.xml.bz2
# source=pages-articles.xml.gz
# use only directories that contain a 'download-complete' file? Default is false.
require-download-complete=true
# unqualified extractor class names are prefixed by org.dbpedia.extraction.mappings.
# All 111 languages that as of 2012-05-25 have 10000 articles or more.
# TODO: parse wikipedias.csv and figure out from there which languages to extract.
# If no languages are given, the ones having a mapping namespace on mappings.dbpedia.org are used
languages=de
extractors=InfoboxExtractor
#ArticleCategoriesExtractor,CategoryLabelExtractor,ExternalLinksExtractor,\
#GeoExtractor,InfoboxExtractor,LabelExtractor,PageIdExtractor,PageLinksExtractor,\
#RedirectExtractor,RevisionIdExtractor,SkosCategoriesExtractor,WikiPageExtractor
extractors.de=InfoboxExtractor
#extractors.de=MappingExtractor,DisambiguationExtractor,InterLanguageLinksExtractor,RedirectExtractor,LabelExtractor
#extractors.en=MappingExtractor,DisambiguationExtractor,InterLanguageLinksExtractor,RedirectExtractor,LabelExtractor
# if ontology and mapping files are not given or do not exist, download info from mappings.dbpedia.org
ontology=../ontology.xml
mappings=../mappings
# URI policies. Allowed flags: uri, generic, xml-safe. Each flag may have one of the suffixes
# -subjects, -predicates, -objects, -datatype, -context to match only URIs in a certain position.
# Without a suffix, a flag matches all URI positions.
uri-policy.uri=uri:en; generic:en; xml-safe-predicates:*
uri-policy.iri=generic:en; xml-safe-predicates:*
# File formats. Allowed flags: n-triples, n-quads, turtle-triples, turtle-quads, trix-triples, trix-quads
# May be followed by a semicolon and a URI policy name. If format name ends with .gz or .bz2, files
# are zipped on the fly.
# NT is unreadable anyway - might as well use URIs
format.nt=n-triples;uri-policy.uri
#format.nq.gz=n-quads;uri-policy.uri
# Turtle is much more readable - use nice IRIs
format.ttl=turtle-triples;uri-policy.iri
#format.tql.gz=turtle-quads;uri-policy.iri
"
--
Regards
Gaurav Pant
+91-7709196607,+91-9405757794
------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_feb
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion