Re: Adding attributes to Solr fields ?

2013-10-04 Thread jimmy nguyen
Hello,

and thank you for your answer Shawn.

I tried to simplify my problem, but I realize I chose a bad example: I
don't process phone numbers; I process unstructured documents.

My GATE application might return several annotations for the same group of
words (because I'm using an ontology). For example, I will have an
Animal annotation that marks the words "cat", "catfish" and "eider" as
Animals. Depending on the ontology used, the "cat" annotation will
have 2 features (Animal.class=mammal and Animal.class=cat), the "catfish"
annotation will have 1 feature (Animal.class=fish), and the more specific
term "eider" will have 2 features (Animal.class=bird and Animal.class=duck).

I don't want to create 1 Solr "document" for each animal; I really want 1
index entry for each actual document. I'd like to be able to query my Solr
index for "bird" and get all the documents containing the term "bird" or any
subclass or instance of it (like "duck" or "eider"). Since all the words
"bird", "duck" and "eider" appearing in my documents will be tagged as Animal
with an annotation feature Animal.class=bird, it is easy to get Solr
to return the right documents.

But since I get something like :


  
hdfs://...

  cat
  cat
  catfish
  eider
  eider


  mammal
  cat
  fish
  bird
  duck


  http://.../Animal#catfish
  http://.../Animal#eider
  http://.../Animal#eider

  
  
   ...
  
  
   ...
  


... when I want to generate a snippet of the document and highlight the
terms that made Solr return the document (for example, the first
document containing "eider" when the user is searching for "bird"), I'd
like to highlight the term "eider" in the snippet, but I don't know how to
do that. Having a correspondence between my Solr "animal" and "class"
fields (for example, an id attribute that would link them: "eider" would
carry an id, and the class "bird" would carry the same id) would make it
easier to highlight my term "eider".
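
To make the idea concrete, here is a minimal Python sketch of the linkage described above, done on the client side rather than inside Solr. It assumes each annotation keeps its term and its classes together under one shared id; the names Annotation, terms_for_class and highlight are hypothetical and are not part of Solr or GATE.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    ann_id: int      # hypothetical id shared by a term and its classes
    term: str        # surface form found in the document
    classes: tuple   # ontology classes attached to that term


# Values taken from the example in this message.
ANNOTATIONS = [
    Annotation(1, "cat", ("mammal", "cat")),
    Annotation(2, "catfish", ("fish",)),
    Annotation(3, "eider", ("bird", "duck")),
]


def terms_for_class(annotations, wanted):
    """Return the surface terms whose annotation carries the wanted class."""
    return sorted({a.term for a in annotations if wanted in a.classes})


def highlight(snippet, terms):
    """Wrap whole-word occurrences of the given terms in <em> tags."""
    if not terms:
        return snippet
    pattern = r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.sub(pattern, r"<em>\1</em>", snippet)


snippet = "An eider chased a catfish while the cat slept."
print(highlight(snippet, terms_for_class(ANNOTATIONS, "bird")))
```

With a query for "bird", only "eider" is wrapped in the snippet, because it is the only term whose annotation carries that class. The whole-word match also keeps a query for the class "cat" from marking "catfish".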

What do you think ?

Thanks !
Jim


Adding attributes to Solr fields ?

2013-10-03 Thread jimmy nguyen
Hi all,

is it possible to add attributes to our Solr fields ?

 I'm indexing GATE-annotated documents into solr. The annotations produced
by my GATE application usually have several features (for example,
Person.title, Person.name, Person.phoneNumber...).
Now each of my documents may contain more than one Person annotation, and
each person might have more than one phone number... Unfortunately I don't
know how to index all the features for one annotation in one field in solr.

So instead, I would like to add an attribute "id" (or "offset") to each of
the features I'm sending to Solr in order to be able to find out, for
example, which Person.name goes with which Person.phoneNumber.

So instead of:
 1
 Jane Doe John Doe
 0123456789 1234567890
2345678901


I'd like to get something like this in Solr:

1
 Jane Doe John Doe
 0123456789 
1234567890
2345678901


This way it is easy to link the first 2 phone numbers to Jane Doe and the
last one to John Doe.
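
As a minimal sketch of one possible workaround, the id could be packed into each field value instead of being carried as an attribute, and the values regrouped on the client side. This is an assumption for illustration only: the "|" separator and the helper names encode and group_by_id are hypothetical, and the sketch does not use any Solr API.

```python
from collections import defaultdict

SEP = "|"  # hypothetical separator; it must not occur inside an id


def encode(ann_id, value):
    """Pack an annotation id and a feature value into one field value."""
    return f"{ann_id}{SEP}{value}"


def group_by_id(encoded_values):
    """Group encoded field values by the annotation id they carry."""
    groups = defaultdict(list)
    for item in encoded_values:
        ann_id, value = item.split(SEP, 1)
        groups[ann_id].append(value)
    return dict(groups)


# Values taken from the example in this message.
names = [encode(1, "Jane Doe"), encode(2, "John Doe")]
phones = [
    encode(1, "0123456789"),
    encode(1, "1234567890"),
    encode(2, "2345678901"),
]

names_by_id = group_by_id(names)
phones_by_id = group_by_id(phones)
for ann_id, name_list in names_by_id.items():
    print(name_list[0], phones_by_id.get(ann_id, []))
```

The id prefix becomes part of the stored value, so a field encoded this way would probably be kept for display and linking only, with the plain values indexed in a separate field for searching.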

Any idea ?

Thanks !
Jim


Re: Indexing several sub-fields in one solr field

2013-09-19 Thread jimmy nguyen
Hello,

thanks for the answer. Sorry, I actually meant the attribute "subFieldSuffix".

So, in order to index several features in one Solr field, should
I write a new Java class extending AbstractSubTypeFieldType? Or is
there another way to do it?

Thanks !
Jim


On Thu, Sep 19, 2013 at 4:05 PM, Jack Krupansky wrote:

> There is no such fieldType attribute as "subSuffix". Solr is just
> complaining about extraneous, junk attributes. Delete the crap.
>
> -- Jack Krupansky

Indexing several sub-fields in one solr field

2013-09-19 Thread jimmy nguyen
Hello,

I'd like to index into Solr (4.4.0) documents that I previously annotated
with GATE (7.1).
I use Behemoth to be able to run my GATE application on a corpus of
documents on Hadoop, and then Behemoth allows me to directly send my
annotated documents to solr. But my question is not about the Behemoth or
Hadoop parts.

The annotations produced by my GATE application usually have several
features (for example, annotation type Person has the following features :
Person.title, Person.firstName, Person.lastName, Person.gender).
Each of my documents may contain more than one Person annotation, which is
why I would like to index all the features for one annotation in one field
in solr.
How do I do that ?

I thought I'd add the following lines in schema.xml :


...

...

...

...


...



But as soon as I start my solr instances and try to access solr from my
browser, I get an HTTP ERROR 500 :

Problem accessing /solr/. Reason:

{msg=SolrCore 'collection1' is not available due to init failure:
Plugin Initializing failure for [schema.xml]
fieldType,trace=org.apache.solr.common.SolrException: SolrCore
'collection1' is not available due to init failure: Plugin Initializing
failure for [schema.xml] fieldType
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:287)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Plugin Initializing
failure for [schema.xml] fieldType
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:193)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:164)
at
org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
at
org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:268)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:655)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
... 1 more
Caused by: java.lang.RuntimeException: schema fieldtype
ladate(org.apache.solr.schema.StrField) invalid
arguments:{subSuffix=_ladate}
at org.apache