Hi Anupam,

Something's wrong with your file. enwiki-20150113-pages-articles.xml.bz2 does not exist on dumps.wikimedia.org, but enwiki-20150112-pages-articles.xml.bz2 and wikidatawiki-20150113-pages-articles.xml.bz2 do.
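
Two quick checks can help with a local dump file like this: which wiki it actually came from, and whether the bz2 stream is complete. The following is only a rough sketch in Python, not part of the extraction framework. The function names dump_site_info and bz2_is_complete are made up for illustration, and it assumes the dump starts with the usual MediaWiki <siteinfo> header (the <dbname> element may be missing in older export formats, in which case <sitename> and <base> are the ones to look at).

```python
import bz2
import re


def dump_site_info(path, header_bytes=65536):
    """Read the start of a .xml.bz2 dump and return the <siteinfo> fields.

    Hypothetical helper: returns a dict with the keys 'sitename', 'dbname'
    and 'base'; a value is None if the element is not found in the header.
    """
    with bz2.open(path, "rb") as f:
        head = f.read(header_bytes).decode("utf-8", errors="replace")
    info = {}
    for tag in ("sitename", "dbname", "base"):
        match = re.search(r"<%s>(.*?)</%s>" % (tag, tag), head, re.S)
        info[tag] = match.group(1).strip() if match else None
    return info


def bz2_is_complete(path, chunk_size=1 << 20):
    """Decompress the whole file and report whether that worked.

    Hypothetical helper: returns False if the compressed stream ends early
    (EOFError) or contains invalid data (OSError), True otherwise.
    """
    try:
        with bz2.open(path, "rb") as f:
            while f.read(chunk_size):
                pass
    except (EOFError, OSError):
        return False
    return True
```

Calling dump_site_info("enwiki-20150113-pages-articles.xml.bz2") should show a Wikidata sitename or base URL if the file is really a wikidatawiki dump, and bz2_is_complete on the same path returns False for a truncated download. The second check decompresses the entire file, so it can take a long time on a full dump.
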
Please download the enwiki dump and try again. The best way is to adapt download.minimal.properties and extraction.default.properties to your needs and then execute

../run download config=download.minimal.properties

and later

../run extraction extraction.default.properties

The warnings you sent imply that the parser is reading the Wikidata dump file, not the enwiki file. The "unexpected end of stream" error probably means that the file is corrupted.

Regards,
JC

On Tue, Feb 10, 2015 at 3:05 PM, Anupam Mishra <anupam.nihil...@gmail.com> wrote:
> Hi All,
>
> I have downloaded the DBpedia extraction framework and am trying to extract
> enwiki-20150113-pages-articles.xml.bz2 using the command mvn scala:run
> "-Dlauncher=extraction" "-DaddArgs=extraction.default.properties" but I am
> getting the following exception.
>
> WARNING: Error parsing title: found namespace 0/Main, expected 4/Project in title Wikidata:Notability/eo
> Feb 10, 2015 7:21:29 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 1198/Namespace 1198 in title Translations:Wikidata:Notability/3/eo
> Feb 10, 2015 7:22:01 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 1198/Namespace 1198 in title Translations:Wikidata:Introduction/Page display title/uk
> Feb 10, 2015 7:22:01 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 4/Project in title Wikidata:Introduction/uk
> Feb 10, 2015 7:22:01 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 1198/Namespace 1198 in title Translations:Wikidata:Introduction/1/uk
> Feb 10, 2015 7:22:01 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 1198/Namespace 1198 in title Translations:Wikidata:Introduction/2/uk
> Feb 10, 2015 7:22:01 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 1198/Namespace 1198 in title Translations:Wikidata:Introduction/3/uk
> Feb 10, 2015 7:22:01 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 1198/Namespace 1198 in title Translations:Wikidata:Introduction/4/uk
> Feb 10, 2015 7:22:01 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 1198/Namespace 1198 in title Translations:Wikidata:Introduction/5/uk
> Feb 10, 2015 7:22:01 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 1198/Namespace 1198 in title Translations:Wikidata:Glossary/23/he
> Feb 10, 2015 7:22:01 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 1198/Namespace 1198 in title Translations:Wikidata:Introduction/6/uk
> Feb 10, 2015 7:22:01 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 1198/Namespace 1198 in title Translations:Wikidata:Introduction/7/uk
> Feb 10, 2015 7:22:01 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 1198/Namespace 1198 in title Translations:Wikidata:Introduction/8/uk
> Feb 10, 2015 7:22:01 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 1198/Namespace 1198 in title Translations:Wikidata:Introduction/9/uk
> Feb 10, 2015 7:22:01 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 1198/Namespace 1198 in title Translations:Wikidata:Introduction/22/uk
> Feb 10, 2015 7:22:04 PM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
> WARNING: Error parsing title: found namespace 0/Main, expected 1198/Namespace 1198 in title Translations:Wikidata:Introduction/10/uk
> java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:483)
>     at scala_maven_executions.MainHelper.runMain(MainHelper.java:164)
>     at scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
> Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[4068918,3920]
> Message: unexpected end of stream
>     at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:596)
>     at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getElementText(XMLStreamReaderImpl.java:862)
>     at org.dbpedia.extraction.sources.WikipediaDumpParser.readString(WikipediaDumpParser.java:395)
>     at org.dbpedia.extraction.sources.WikipediaDumpParser.readRevision(WikipediaDumpParser.java:290)
>     at org.dbpedia.extraction.sources.WikipediaDumpParser.readPage(WikipediaDumpParser.java:248)
>     at org.dbpedia.extraction.sources.WikipediaDumpParser.readPages(WikipediaDumpParser.java:187)
>     at org.dbpedia.extraction.sources.WikipediaDumpParser.readDump(WikipediaDumpParser.java:145)
>     at org.dbpedia.extraction.sources.WikipediaDumpParser.run(WikipediaDumpParser.java:116)
>     at org.dbpedia.extraction.sources.XMLReaderSource.foreach(XMLSource.scala:112)
>     at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:252)
>     at org.dbpedia.extraction.sources.XMLReaderSource.flatMap(XMLSource.scala:108)
>     at org.dbpedia.extraction.mappings.Redirects$.loadFromSource(Redirects.scala:171)
>     at org.dbpedia.extraction.mappings.Redirects$.load(Redirects.scala:122)
>     at org.dbpedia.extraction.dump.extract.ConfigLoader$$anon$1.<init>(ConfigLoader.scala:101)
>     at org.dbpedia.extraction.dump.extract.ConfigLoader.org$dbpedia$extraction$dump$extract$ConfigLoader$$createExtractionJob(ConfigLoader.scala:53)
>     at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:40)
>     at org.dbpedia.extraction.dump.extract.ConfigLoader$$anonfun$getExtractionJobs$1.apply(ConfigLoader.scala:40)
>     at scala.collection.TraversableViewLike$Mapped$$anonfun$foreach$2.apply(TraversableViewLike.scala:169)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:743)
>     at scala.collection.immutable.RedBlackTree$TreeIterator.foreach(RedBlackTree.scala:468)
>     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>     at scala.collection.IterableLike$$anon$1.foreach(IterableLike.scala:310)
>     at scala.collection.TraversableViewLike$Mapped$class.foreach(TraversableViewLike.scala:168)
>     at scala.collection.IterableViewLike$$anon$3.foreach(IterableViewLike.scala:113)
>     at org.dbpedia.extraction.dump.extract.Extraction$.main(Extraction.scala:30)
>     at org.dbpedia.extraction.dump.extract.Extraction.main(Extraction.scala)
>
>
> Thanks & Regards,
> Anupam
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming. The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is your
> hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now.
> http://goparallel.sourceforge.net/
> _______________________________________________
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion