Hi Amindri,

This is valuable information.
The important thing is that for lines such as

>> 1 m.0___xpc
>> 1 m.0___xk_

"m.0___xpc" is processed as "http://rdf.freebase.com/ns/m.0___xpc".
Make sure that your "indexing/config/iditerator.properties" is
configured accordingly. If it is not, you will see the log entry noting
that indexing has started, followed by no log output for quite some
time. After that, indexing will finish without a single entity having
been indexed. The reason is that the URIs of the entities are built
incorrectly and are therefore not found in the source triple store. If
"iditerator.properties" is configured correctly, you will see a log
entry every few thousand indexed entities.

>> So the entities are not preceded with a name space. Therefore when calling
>> String prefix = NamespaceMappingUtils.getPrefix(entity);
>> (LineBasedEntityIterator.parseEntityFormLine(String line) - 425), prefix is
>> assigned with a empty String.
>> Is it correct to defined an empty name space mapping as follows in the
>> namespaceprefix.mapping?

An empty String represents the default namespace. You can provide a
mapping for the empty String in the "namespaceprefix.mapping" file. If
that does not work, it would be a bug.

best
Rupert

On Tue, Feb 10, 2015 at 3:50 AM, Amindri Udugala <amindriudug...@gmail.com> wrote:
> Hi Rupert,
>
> Sorry about the previous mail. I configured the ns-prefix-state property in
> iditerator.properties file to false he the indexing process finished
> without any error. However I'm not sure if what I did was correct.
> If it is correct, it will be quite helpful, to throw an exception if the
> prefix is empty and prefix state is set to true.
> I'm sorry again if any of the things I mentioned doesn't make any sense :)
> Thanks
>
> On 10 February 2015 at 11:53, Amindri Udugala <amindriudug...@gmail.com>
> wrote:
>
>> Hi Rupert,
>>
>> Thank for the prompt reply.
>>
>> When I checked the incoming_links.txt the final lines were as follows
>> 1 m.0___xpc
>> 1 m.0___xk_
>> 1 m.0___ttg
>> 1 m.0___t6s
>> 1 m.0___t6h
>> 1 m.0___t5v
>> 1 m.0___t5c
>> 1 m.0___rw7
>> 1 m.0___qhn
>> 1 m.0___p3v
>> 1 m.0___nm5
>> 1 m.0___n4s
>> 1 m.0___n
>> 1 m.0___jk_
>> 1 m.0___hv4
>> 1 m.0___c6k
>> 1 m.0___b4g
>> 1 m.0___8
>> 1 m.0___7yv
>> 1 m.0___2fw
>> 1 m.0____
>>
>> So the entities are not preceded with a name space. Therefore when calling
>> String prefix = NamespaceMappingUtils.getPrefix(entity);
>> (LineBasedEntityIterator.parseEntityFormLine(String line) - 425), prefix is
>> assigned with a empty String.
>> Is it correct to defined an empty name space mapping as follows in the
>> namespaceprefix.mapping?
>>
>> fb http://rdf.freebase.com/ns/
>> ns http://rdf.freebase.com/ns/
>> key http://rdf.freebase.com/key/
>> http://rdf.freebase.com/ns/
>>
>> Thanks
>>
>>
>> Regards
>> Amindri
>>
>> On 9 February 2015 at 17:52, Rupert Westenthaler <
>> rupert.westentha...@gmail.com> wrote:
>>
>>> Hi Amindri
>>>
>>> Based on the code the NPE could originate from a namespace prefix
>>> unknown to the namespace prefix service.
>>>
>>> Can you please check the data of the "incoming_links.txt" file against
>>> mappings define in the "indexing/config/namespaceprefix.mappings"
>>> file. My guess is that the "incoming_links.txt" uses a prefix that is
>>> not define in the mappings file.
>>>
>>> It is recommended to explicitly define namespace prefix mappings for
>>> all namespaces used by the indexing process (config data and rdf
>>> data). For missing mappings http://prefix.cc/ is used as a fallback.
>>>
>>> best
>>> Rupert
>>>
>>>
>>> On Mon, Feb 9, 2015 at 7:33 AM, Amindri Udugala
>>> <amindriudug...@gmail.com> wrote:
>>> > Hi All,
>>> >
>>> > I need to create an index from a Freebase data dump. So I followed the
>>> > instructions in the README file in entityhub\indexing\freebase.
>>> >
>>> > First I executed java -jar
>>> > org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar init, to
>>> > generate the folder structure. The folder structure was successfully
>>> > generated except for the following warnings
>>> >
>>> > 16:16:20,530 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'nsogi' valid , namespace 'http://prefix.cc/nsogi:' invalid -> mapping ignored!
>>> > 16:16:21,279 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'category' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
>>> > 16:16:21,435 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'chebi' valid , namespace 'http://bio2rdf.org/chebi:' invalid -> mapping ignored!
>>> > 16:16:21,435 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'hgnc' valid , namespace 'http://bio2rdf.org/hgnc:' invalid -> mapping ignored!
>>> > 16:16:21,450 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbptmpl' valid , namespace 'http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
>>> > 16:16:21,638 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'pubmed' valid , namespace 'http://bio2rdf.org/pubmed_vocabulary:' invalid -> mapping ignored!
>>> > 16:16:21,638 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbc' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
>>> > 16:16:21,638 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbt' valid , namespace 'http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
>>> > 16:16:21,638 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbrc' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
>>> > 16:16:21,809 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'call' valid , namespace 'http://webofcode.org/wfn/call:' invalid -> mapping ignored!
>>> > 16:16:21,809 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace 'http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
>>> >
>>> > Then I copied the Freebase dump (freebase-rdf-latest.gz) to the
>>> > indexing/resources/rdfdata folder
>>> > and the incoming_links.txt file, generated by fbrankings-uri.sh to
>>> > indexing/resources folder and executed the indexing process. (I used all
>>> > the default config files)
>>> >
>>> > While executing the index process I noticed the following log.
>>> >
>>> >
>>> > 16:38:40,806 [main] INFO core.IndexerFactory - - EntityDataIterable: null
>>> > 16:38:40,806 [main] INFO core.IndexerFactory - - EntityIterator: org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator@1880249c
>>> > 16:38:40,806 [main] INFO core.IndexerFactory - - EntityDataProvider: org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfIndexingSource@4e38a55
>>> > 16:38:40,806 [main] INFO core.IndexerFactory - - EntityScoreProvider: null
>>> >
>>> > Finally it threw a null pointer exception as follows
>>> >
>>> > 16:38:40,837 [Thread-3] INFO source.ResourceLoader - ... 1 files imported in 0 seconds
>>> > 16:38:40,837 [Thread-3] INFO source.ResourceLoader - Loding 0 File ...
>>> > 16:38:40,837 [Thread-3] INFO source.ResourceLoader - ... 0 files imported in 0 seconds
>>> > 16:38:42,912 [Thread-0] INFO solryard.SolrYardIndexingDestination - ...
>>> > create SolrYard
>>> > 16:38:42,959 [main] INFO impl.IndexerImpl - ... delete existing IndexedEntityId file C:\cygwin64\home\User\code\stanbol_indexing\indexing\destination\indexed-entities-ids.zip
>>> > 16:38:42,974 [main] INFO impl.IndexerImpl - Initialisation completed
>>> > 16:38:42,974 [main] INFO impl.IndexerImpl - ... initialisation completed
>>> > 16:38:42,974 [main] INFO impl.IndexerImpl - start indexing ...
>>> > 16:38:42,974 [main] INFO impl.IndexerImpl - Indexing started ...
>>> > Exception in thread "Indexing: Entity Source Reader Deamon" java.lang.NullPointerException
>>> >     at java.lang.StringBuilder.<init>(Unknown Source)
>>> >     at org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.parseEntityFormLine(LineBasedEntityIterator.java:435)
>>> >     at org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.getNext(LineBasedEntityIterator.java:379)
>>> >     at org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.hasNext(LineBasedEntityIterator.java:356)
>>> >     at org.apache.stanbol.entityhub.indexing.core.impl.EntityIdBasedIndexingDaemon.run(EntityIdBasedIndexingDaemon.java:55)
>>> >     at java.lang.Thread.run(Unknown Source)
>>> >
>>> > I'm not sure if this happens because I haven't configured an important
>>> > property in a configuration file. I'm pretty new to Stanbol and any help
>>> > would be much appreciated.
>>> >
>>> > Thanks in advance.
>>> > --
>>> > Regards
>>> > Amindri Udugala
>>>
>>>
>>>
>>> --
>>> | Rupert Westenthaler rupert.westentha...@gmail.com
>>> | Bodenlehenstraße 11 ++43-699-11108907
>>> | A-5500 Bischofshofen
>>> | REDLINK.CO
>>> ..........................................................................
>>> | http://redlink.co/
>>>
>>
>>
>>
>> --
>> Regards
>> Amindri Udugala
>>
>>
>>
>
>
> --
> Regards
> Amindri Udugala

--
| Rupert Westenthaler rupert.westentha...@gmail.com
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/
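
To illustrate the expansion discussed at the top of this mail, here is a minimal sketch of how an id without a prefix could be resolved through an empty-String (default namespace) mapping. This is not the Stanbol implementation: the class IdExpansionSketch and the method expand are hypothetical names made up for this example, and the lookup rule (everything before the first ':' is the prefix, full URIs are passed through) is an assumption. The mappings are the ones quoted from namespaceprefix.mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Apache Stanbol.
public class IdExpansionSketch {

    /**
     * Expands "prefix:local" or a bare "local" id to a full URI.
     * Returns null if no mapping exists for the prefix.
     */
    static String expand(String id, Map<String, String> mappings) {
        if (id.contains("://")) {
            return id; // assumed: full URIs are used unchanged
        }
        int sep = id.indexOf(':');
        // No ':' means no prefix, so the empty String (default namespace) is looked up.
        String prefix = sep < 0 ? "" : id.substring(0, sep);
        String local = sep < 0 ? id : id.substring(sep + 1);
        String namespace = mappings.get(prefix);
        return namespace == null ? null : namespace + local;
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("fb", "http://rdf.freebase.com/ns/");
        mappings.put("ns", "http://rdf.freebase.com/ns/");
        mappings.put("key", "http://rdf.freebase.com/key/");
        mappings.put("", "http://rdf.freebase.com/ns/"); // default namespace

        // Bare id from incoming_links.txt, resolved via the default namespace.
        System.out.println(expand("m.0___xpc", mappings));
        // Explicitly prefixed id, resolved via the "ns" mapping.
        System.out.println(expand("ns:m.0___xpc", mappings));
        // Unmapped prefix: no URI can be built.
        System.out.println(expand("unknown:m.0___xpc", mappings));
    }
}
```

With the empty-String mapping present, the bare id resolves to the same URI as its "ns:" form. Without it the lookup returns null, which would be consistent with the NullPointerException in the quoted stack trace, although the real cause is only confirmed by the Stanbol code itself.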