[Dbpedia-discussion] Re: extraction problem

riko adi prasetya Mon, 04 Mar 2013 21:41:50 -0800

Hi Gaurav,

Check your extraction.de.property again:


"# download and extraction target dir
dir=/mnt/ebs/perl/framework/extraction-framework/dump/wiki_dump

# Source file. If source file name ends with .gz or .bz2, it is unzipped on the fly.
# Must exist in the directory xxwiki/20121231 and have the prefix xxwiki-20121231-.
 
# default:
# source=pages-articles.xml

# alternatives:
source=pages-articles.xml.bz2
# source=pages-articles.xml.gz

# use only directories that contain a 'download-complete' file? Default is false.
require-download-complete=true

# unqualified extractor class names are prefixed by org.dbpedia.extraction.mappings.

# All 111 languages that as of 2012-05-25 have 10000 articles or more.
# TODO: parse wikipedias.csv and figure out from there which languages to extract.
# If no languages are given, the ones having a mapping namespace on mappings.dbpedia.org are used
languages=de

extractors=InfoboxExtractor
#ArticleCategoriesExtractor,CategoryLabelExtractor,ExternalLinksExtractor,\
#GeoExtractor,InfoboxExtractor,LabelExtractor,PageIdExtractor,PageLinksExtractor,\
#RedirectExtractor,RevisionIdExtractor,SkosCategoriesExtractor,WikiPageExtractor

extractors.de=InfoboxExtractor
#extractors.de=MappingExtractor,DisambiguationExtractor,InterLanguageLinksExtractor,RedirectExtractor,LabelExtractor
#extractors.en=MappingExtractor,DisambiguationExtractor,InterLanguageLinksExtractor,RedirectExtractor,LabelExtractor

# if ontology and mapping files are not given or do not exist, download info from mappings.dbpedia.org
ontology=../ontology.xml
mappings=../mappings

# URI policies. Allowed flags: uri, generic, xml-safe. Each flag may have one of the suffixes
# -subjects, -predicates, -objects, -datatype, -context to match only URIs in a certain position.
# Without a suffix, a flag matches all URI positions.

uri-policy.uri=uri:en; generic:en; xml-safe-predicates:*
uri-policy.iri=generic:en; xml-safe-predicates:*


# File formats. Allowed flags: n-triples, n-quads, turtle-triples, turtle-quads, trix-triples, trix-quads
# May be followed by a semicolon and a URI policy name. If format name ends with .gz or .bz2, files
# are zipped on the fly.

# NT is unreadable anyway - might as well use URIs
format.nt=n-triples;uri-policy.uri
#format.nq.gz=n-quads;uri-policy.uri

# Turtle is much more readable - use nice IRIs
format.ttl=turtle-triples;uri-policy.iri
#format.tql.gz=turtle-quads;uri-policy.iri
"


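You wrote dir, but the extraction looks for a property named base-dir, so base-dir is never defined in your extraction configuration. Replacing dir with base-dir should fix the error.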
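
As a minimal sketch, the first entry of extraction.de.property could look like the fragment below. It assumes that base-dir takes the same value you gave dir and that the rest of the file stays unchanged. I have not run it against your setup.

```properties
# download and extraction target dir
# (assumption: same path as before, only the key is renamed from dir to base-dir)
base-dir=/mnt/ebs/perl/framework/extraction-framework/dump/wiki_dump
```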
 
Cheers,
Riko
 

________________________________
Riko Adi Prasetya
Faculty of Computer Science
Universitas Indonesia



________________________________
 From: gaurav pant <golup...@gmail.com>
To: dbpedia-discussion@lists.sourceforge.net 
Sent: Tuesday, 5 March 2013 12:10
Subject: [Dbpedia-discussion] extraction problem
 

Hi All,

Greetings.

I want to extract infobox properties and abstracts from 
pages-articles.xml.bz2. I was able to download this file with the command 
"../run download config=download.de.properties".

I configured the download.de.properties file to download only the German 
pages-articles file.

Now, when I try to extract information from it with "../run 
extraction extraction.de.property", I get the error below. In 
extraction.de.property I have set dir to the same value as in the 
download.de.properties file.

Please let me know what is going wrong. Does anything need to be changed 
in the pom.xml of the dump dir?

"
[INFO] --- maven-scala-plugin:2.15.2:testCompile (test-compile) @ dump ---
[INFO] Checking for multiple versions of scala
[INFO] includes = [**/*.scala,**/*.java,]
[INFO] excludes = []
[WARNING] No source files found.
[INFO] 
[INFO] <<< maven-scala-plugin:2.15.2:run (default-cli) @ dump <<<
[INFO] 
[INFO] --- maven-scala-plugin:2.15.2:run (default-cli) @ dump ---
[INFO] Checking for multiple versions of scala
[INFO] launcher 'extraction' selected => org.dbpedia.extraction.dump.extract.Extraction
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
    at org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.IllegalArgumentException: property 'base-dir' not defined.
    at org.dbpedia.extraction.dump.extract.ConfigParser.error(ConfigParser.scala:18)
    at org.dbpedia.extraction.dump.extract.Config.<init>(Config.scala:26)
    at org.dbpedia.extraction.dump.extract.Extraction$.main(Extraction.scala:26)
    at org.dbpedia.extraction.dump.extract.Extraction.main(Extraction.scala)
    ... 6 more
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.356s
[INFO] Finished at: Tue Mar 05 04:52:35 UTC 2013
[INFO] Final Memory: 8M/140M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.scala-tools:maven-scala-plugin:2.15.2:run (default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 240(Exit value: 240) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
"

Contents of extraction.de.property:

"# download and extraction target dir
dir=/mnt/ebs/perl/framework/extraction-framework/dump/wiki_dump

# Source file. If source file name ends with .gz or .bz2, it is unzipped on the fly.
# Must exist in the directory xxwiki/20121231 and have the prefix xxwiki-20121231-.
 
# default:
# source=pages-articles.xml

# alternatives:
source=pages-articles.xml.bz2
# source=pages-articles.xml.gz

# use only directories that contain a 'download-complete' file? Default is false.
require-download-complete=true

# unqualified extractor class names are prefixed by org.dbpedia.extraction.mappings.

# All 111 languages that as of 2012-05-25 have 10000 articles or more.
# TODO: parse wikipedias.csv and figure out from there which languages to extract.
# If no languages are given, the ones having a mapping namespace on mappings.dbpedia.org are used
languages=de

extractors=InfoboxExtractor
#ArticleCategoriesExtractor,CategoryLabelExtractor,ExternalLinksExtractor,\
#GeoExtractor,InfoboxExtractor,LabelExtractor,PageIdExtractor,PageLinksExtractor,\
#RedirectExtractor,RevisionIdExtractor,SkosCategoriesExtractor,WikiPageExtractor

extractors.de=InfoboxExtractor
#extractors.de=MappingExtractor,DisambiguationExtractor,InterLanguageLinksExtractor,RedirectExtractor,LabelExtractor
#extractors.en=MappingExtractor,DisambiguationExtractor,InterLanguageLinksExtractor,RedirectExtractor,LabelExtractor

# if ontology and mapping files are not given or do not exist, download info from mappings.dbpedia.org
ontology=../ontology.xml
mappings=../mappings

# URI policies. Allowed flags: uri, generic, xml-safe. Each flag may have one of the suffixes
# -subjects, -predicates, -objects, -datatype, -context to match only URIs in a certain position.
# Without a suffix, a flag matches all URI positions.

uri-policy.uri=uri:en; generic:en; xml-safe-predicates:*
uri-policy.iri=generic:en; xml-safe-predicates:*


# File formats. Allowed flags: n-triples, n-quads, turtle-triples, turtle-quads, trix-triples, trix-quads
# May be followed by a semicolon and a URI policy name. If format name ends with .gz or .bz2, files
# are zipped on the fly.

# NT is unreadable anyway - might as well use URIs
format.nt=n-triples;uri-policy.uri
#format.nq.gz=n-quads;uri-policy.uri

# Turtle is much more readable - use nice IRIs
format.ttl=turtle-triples;uri-policy.iri
#format.tql.gz=turtle-quads;uri-policy.iri
"

-- 
Regards
Gaurav Pant
+91-7709196607,+91-9405757794

------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_feb
_______________________________________________
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
