Hi Amindri

This is valuable information.

The important thing is that

>>       1 m.0___xpc
>>       1 m.0___xk_

"m.0___xpc" is processed as "http://rdf.freebase.com/ns/m.0___xpc".
Make sure that your "indexing/config/iditerator.properties" is
configured accordingly.
If it is not, you will see the log message noting that indexing has
started, followed by no log output for quite some time. After that,
indexing will finish without a single entity having been indexed. The
reason is that the URIs for the entities are built incorrectly and are
therefore not found in the source triple store.

If the "iditerator.properties" is correctly configured you will see
logs every few thousand indexed entities.
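
To illustrate why a wrongly built URI leads to zero indexed entities,
here is a minimal Java sketch. It is not Stanbol code: the class
EntityIdExpander and its expand method are hypothetical, and the small
set only stands in for the subjects of the source triple store.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the Stanbol API.
final class EntityIdExpander {

    /** Builds the full entity URI by prepending the namespace to the local ID. */
    static String expand(String namespace, String localId) {
        return namespace + localId;
    }

    public static void main(String[] args) {
        // Stand-in for the entity URIs present in the source triple store.
        Set<String> store = new HashSet<>(Arrays.asList(
                "http://rdf.freebase.com/ns/m.0___xpc",
                "http://rdf.freebase.com/ns/m.0___xk_"));

        // Namespace configured: the ID becomes a URI the store knows.
        String good = expand("http://rdf.freebase.com/ns/", "m.0___xpc");
        // No namespace configured: the bare ID is looked up as-is.
        String bad = expand("", "m.0___xpc");

        System.out.println(good + " found: " + store.contains(good));
        System.out.println(bad + " found: " + store.contains(bad));
    }
}
```

The first lookup succeeds and the second does not, which corresponds to
an indexing run that iterates over all IDs but never finds an entity.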

>> So the entities are not preceded by a namespace. Therefore, when calling
>> String prefix = NamespaceMappingUtils.getPrefix(entity);
>> (LineBasedEntityIterator.parseEntityFormLine(String line) - 425), prefix is
>> assigned an empty String.
>> Is it correct to define an empty namespace mapping as follows in the
>> namespaceprefix.mappings?

An empty String represents the default namespace. You can provide a
mapping for the empty String in the "namespaceprefix.mappings" file. If
that does not work, it would be a bug.
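
As a rough sketch of the idea (again hypothetical code, not the Stanbol
implementation; PrefixResolver, addMapping and resolve are made-up
names), the empty prefix can be treated like any other key in the
prefix map. The sketch assumes that IDs are either bare local names or
"prefix:local", never full URIs, and it fails with a clear message
instead of a NullPointerException when a prefix has no mapping.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch for illustration only; names are not Stanbol's.
final class PrefixResolver {
    private final Map<String, String> namespaces = new HashMap<>();

    /** Registers a namespace for a prefix; "" is the default namespace. */
    void addMapping(String prefix, String namespace) {
        namespaces.put(prefix, namespace);
    }

    /** Resolves "prefix:local" or a bare "local" (empty prefix) to a full URI. */
    String resolve(String id) {
        int colon = id.indexOf(':');
        String prefix = colon < 0 ? "" : id.substring(0, colon);
        String local = id.substring(colon + 1);
        String namespace = namespaces.get(prefix);
        if (namespace == null) {
            // Fail fast with a readable message rather than passing null on.
            throw new IllegalArgumentException(
                    "No namespace mapped for prefix '" + prefix + "' (id: " + id + ")");
        }
        return namespace + local;
    }

    public static void main(String[] args) {
        PrefixResolver resolver = new PrefixResolver();
        resolver.addMapping("", "http://rdf.freebase.com/ns/");
        resolver.addMapping("key", "http://rdf.freebase.com/key/");
        System.out.println(resolver.resolve("m.0___xpc"));
        System.out.println(resolver.resolve("key:en"));
    }
}
```

With the empty-prefix mapping in place, "m.0___xpc" resolves to
"http://rdf.freebase.com/ns/m.0___xpc"; without it, resolve throws an
IllegalArgumentException naming the missing prefix.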

best
Rupert


On Tue, Feb 10, 2015 at 3:50 AM, Amindri Udugala
<amindriudug...@gmail.com> wrote:
> Hi Rupert,
>
> Sorry about the previous mail. I set the ns-prefix-state property in the
> iditerator.properties file to false, and the indexing process finished
> without any error. However, I'm not sure if what I did was correct.
> If it is correct, it would be quite helpful to throw an exception if the
> prefix is empty and the prefix state is set to true.
> I'm sorry again if any of the things I mentioned don't make any sense :)
> Thanks
>
> On 10 February 2015 at 11:53, Amindri Udugala <amindriudug...@gmail.com>
> wrote:
>
>> Hi Rupert,
>>
>> Thanks for the prompt reply.
>>
>> When I checked the incoming_links.txt, the final lines were as follows:
>>       1 m.0___xpc
>>       1 m.0___xk_
>>       1 m.0___ttg
>>       1 m.0___t6s
>>       1 m.0___t6h
>>       1 m.0___t5v
>>       1 m.0___t5c
>>       1 m.0___rw7
>>       1 m.0___qhn
>>       1 m.0___p3v
>>       1 m.0___nm5
>>       1 m.0___n4s
>>       1 m.0___n
>>       1 m.0___jk_
>>       1 m.0___hv4
>>       1 m.0___c6k
>>       1 m.0___b4g
>>       1 m.0___8
>>       1 m.0___7yv
>>       1 m.0___2fw
>>       1 m.0____
>>
>> So the entities are not preceded by a namespace. Therefore, when calling
>> String prefix = NamespaceMappingUtils.getPrefix(entity);
>> (LineBasedEntityIterator.parseEntityFormLine(String line) - 425), prefix is
>> assigned an empty String.
>> Is it correct to define an empty namespace mapping as follows in the
>> namespaceprefix.mappings?
>>
>> fb    http://rdf.freebase.com/ns/
>> ns    http://rdf.freebase.com/ns/
>> key    http://rdf.freebase.com/key/
>>     http://rdf.freebase.com/ns/
>>
>> Thanks
>>
>>
>> Regards
>> Amindri
>>
>> On 9 February 2015 at 17:52, Rupert Westenthaler <
>> rupert.westentha...@gmail.com> wrote:
>>
>>> Hi Amindri
>>>
>>> Based on the code the NPE could originate from a namespace prefix
>>> unknown to the namespace prefix service.
>>>
>>> Can you please check the data of the "incoming_links.txt" file against the
>>> mappings defined in the "indexing/config/namespaceprefix.mappings"
>>> file. My guess is that the "incoming_links.txt" uses a prefix that is
>>> not defined in the mappings file.
>>>
>>> It is recommended to explicitly define namespace prefix mappings for
>>> all namespaces used by the indexing process (config data and rdf
>>> data). For missing mappings http://prefix.cc/ is used as a fallback.
>>>
>>> best
>>> Rupert
>>>
>>>
>>> On Mon, Feb 9, 2015 at 7:33 AM, Amindri Udugala
>>> <amindriudug...@gmail.com> wrote:
>>> > Hi All,
>>> >
>>> > I need to create an index from a Freebase data dump. So I followed the
>>> > instructions in the README file in entityhub\indexing\freebase.
>>> >
>>> > First I executed java -jar
>>> > org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar init, to
>>> > generate the folder structure. The folder structure was successfully
>>> > generated except for the following warnings
>>> >
>>> > 16:16:20,530 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'nsogi' valid , namespace 'http://prefix.cc/nsogi:' invalid -> mapping ignored!
>>> > 16:16:21,279 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'category' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
>>> > 16:16:21,435 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'chebi' valid , namespace 'http://bio2rdf.org/chebi:' invalid -> mapping ignored!
>>> > 16:16:21,435 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'hgnc' valid , namespace 'http://bio2rdf.org/hgnc:' invalid -> mapping ignored!
>>> > 16:16:21,450 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbptmpl' valid , namespace 'http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
>>> > 16:16:21,638 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'pubmed' valid , namespace 'http://bio2rdf.org/pubmed_vocabulary:' invalid -> mapping ignored!
>>> > 16:16:21,638 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbc' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
>>> > 16:16:21,638 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbt' valid , namespace 'http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
>>> > 16:16:21,638 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbrc' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
>>> > 16:16:21,809 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'call' valid , namespace 'http://webofcode.org/wfn/call:' invalid -> mapping ignored!
>>> > 16:16:21,809 [main] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace 'http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
>>> >
>>> > Then I copied the Freebase dump (freebase-rdf-latest.gz) to the
>>> > indexing/resources/rdfdata folder
>>> > and the incoming_links.txt file, generated by fbrankings-uri.sh  to
>>> > indexing/resources folder and executed the indexing process. (I used all
>>> > the default config files)
>>> >
>>> > While executing the index process I noticed the following log.
>>> >
>>> >
>>> > 16:38:40,806 [main] INFO  core.IndexerFactory -  - EntityDataIterable: null
>>> > 16:38:40,806 [main] INFO  core.IndexerFactory -  - EntityIterator: org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator@1880249c
>>> > 16:38:40,806 [main] INFO  core.IndexerFactory -  - EntityDataProvider: org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfIndexingSource@4e38a55
>>> > 16:38:40,806 [main] INFO  core.IndexerFactory -  - EntityScoreProvider: null
>>> >
>>> > Finally it threw a null pointer exception as follows
>>> >
>>> > 16:38:40,837 [Thread-3] INFO  source.ResourceLoader -  ... 1 files imported in 0 seconds
>>> > 16:38:40,837 [Thread-3] INFO  source.ResourceLoader - Loding 0 File ...
>>> > 16:38:40,837 [Thread-3] INFO  source.ResourceLoader -  ... 0 files imported in 0 seconds
>>> > 16:38:42,912 [Thread-0] INFO  solryard.SolrYardIndexingDestination - ... create SolrYard
>>> > 16:38:42,959 [main] INFO  impl.IndexerImpl -  ... delete existing IndexedEntityId file C:\cygwin64\home\User\code\stanbol_indexing\indexing\destination\indexed-entities-ids.zip
>>> > 16:38:42,974 [main] INFO  impl.IndexerImpl - Initialisation completed
>>> > 16:38:42,974 [main] INFO  impl.IndexerImpl -   ... initialisation completed
>>> > 16:38:42,974 [main] INFO  impl.IndexerImpl - start indexing ...
>>> > 16:38:42,974 [main] INFO  impl.IndexerImpl - Indexing started ...
>>> > Exception in thread "Indexing: Entity Source Reader Deamon" java.lang.NullPointerException
>>> >         at java.lang.StringBuilder.<init>(Unknown Source)
>>> >         at org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.parseEntityFormLine(LineBasedEntityIterator.java:435)
>>> >         at org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.getNext(LineBasedEntityIterator.java:379)
>>> >         at org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.hasNext(LineBasedEntityIterator.java:356)
>>> >         at org.apache.stanbol.entityhub.indexing.core.impl.EntityIdBasedIndexingDaemon.run(EntityIdBasedIndexingDaemon.java:55)
>>> >         at java.lang.Thread.run(Unknown Source)
>>> >
>>> > I'm not sure if this happens because I haven't configured an important
>>> > property in a configuration file. I'm pretty new to Stanbol and any help
>>> > would be much appreciated.
>>> >
>>> > Thanks in advance.
>>> > --
>>> > Regards
>>> > Amindri Udugala
>>>
>>>
>>>
>>> --
>>> | Rupert Westenthaler             rupert.westentha...@gmail.com
>>> | Bodenlehenstraße 11                              ++43-699-11108907
>>> | A-5500 Bischofshofen
>>> | REDLINK.CO
>>> ..........................................................................
>>> | http://redlink.co/
>>>
>>
>>
>>
>> --
>> Regards
>> Amindri Udugala
>>
>>
>>
>
>
> --
> Regards
> Amindri Udugala



-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO 
..........................................................................
| http://redlink.co/
