Re: Adding attributes to Solr fields ?
Hello, and thank you for your answer, Shawn.

I tried to simplify my problem, but I realize I chose a bad example: I don't process phone numbers, and I do process unstructured documents. My GATE application might return several annotations for the same group of words (because I'm using an ontology).

So, for example, I will have an Animal annotation, which marks the words "cat", "catfish" and "eider" as Animal(s), and (depending on the ontology used) the "cat" annotation will have 2 features: Animal.class=mammal and Animal.class=cat; "catfish" will have 1 feature, Animal.class=fish; and the more specific term "eider" will have 2 features: Animal.class=bird and Animal.class=duck.

I don't want to consider 1 Solr "document" for each animal; I really want 1 index entry for each actual document. I'd like to be able to query my Solr index for "bird" and get all the documents containing the term "bird", or any subclass or instance (like "duck" or "eider").

Since all the words "bird", "duck" and "eider" appearing in my documents will be tagged as Animal, and there will be an annotation with Animal.class=bird, it is easy to get Solr to return the right documents. But since I get something like:

hdfs://... cat cat catfish eider eider mammal cat fish bird duck http://.../Animal#catfish http://.../Animal#eider http://.../Animal#eider ... ... ...

when I want to generate a snippet of the document and highlight the terms whose appearance made Solr return the document (like the first document containing "eider" when the user is searching for "bird"), I'd like to highlight the term "eider" in the snippet, but I don't know how to do that.

Having a correspondence between my Solr "animal" and "class" fields (for example, an id attribute that would link them: eider, and the same id for the class "bird") would make it easier to highlight my term "eider".

What do you think?

Thanks!
Jim
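The correspondence described above could be approximated on the client side. The following is only a sketch under stated assumptions, not a Solr feature: it assumes each stored value in two multi-valued fields is prefixed with the id of the annotation it came from. The field names `animal` and `animal_class`, the `|` separator, the ids `a1`..`a3` and the `<em>` markup are all hypothetical. The plain, unprefixed values would presumably still need to be indexed separately for searching.

```python
import re

SEP = "|"  # hypothetical separator between annotation id and value


def split_tagged(raw):
    """Split 'a3|eider' into ('a3', 'eider')."""
    annotation_id, value = raw.split(SEP, 1)
    return annotation_id, value


def terms_for_class(doc, wanted_class):
    """Return the surface terms whose annotation carries the wanted class."""
    # Ids of the annotations that have a feature equal to the wanted class.
    ids = set()
    for raw in doc["animal_class"]:
        annotation_id, value = split_tagged(raw)
        if value == wanted_class:
            ids.add(annotation_id)
    # Surface terms that share one of those ids.
    terms = set()
    for raw in doc["animal"]:
        annotation_id, value = split_tagged(raw)
        if annotation_id in ids:
            terms.add(value)
    return terms


def highlight(text, terms):
    """Wrap whole-word occurrences of the terms in <em> tags."""
    if not terms:
        return text
    # Longest terms first so that alternation prefers the longer match.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(r"<em>\1</em>", text)


# Hypothetical stored values for one document.
doc = {
    "animal": ["a1|cat", "a2|catfish", "a3|eider"],
    "animal_class": ["a1|mammal", "a1|cat", "a2|fish", "a3|bird", "a3|duck"],
}
text = "An eider and a cat sat near a catfish."
snippet = highlight(text, terms_for_class(doc, "bird"))
```

Here a search for "bird" resolves to annotation `a3`, whose surface term is "eider", so that is the word that gets wrapped in the snippet.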
Adding attributes to Solr fields ?
Hi all,

is it possible to add attributes to our Solr fields?

I'm indexing GATE-annotated documents into Solr. The annotations produced by my GATE application usually have several features (for example, Person.title, Person.name, Person.phoneNumber...). Now, each of my documents may contain more than one Person annotation, and each person might have more than one phone number...

Unfortunately, I don't know how to index all the features for one annotation in one field in Solr. So instead, I would like to add an attribute "id" (or "offset") to each of the features I'm sending to Solr, in order to be able to find out, for example, which Person.name goes with which Person.phoneNumber.

So instead of:

1 Jane Doe John Doe 0123456789 1234567890 2345678901

I'd like to get something like this in Solr:

1 Jane Doe John Doe 0123456789 1234567890 2345678901

This way it is easy to link the first 2 phone numbers to Jane Doe and the last one to John Doe.

Any ideas?

Thanks!
Jim
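One way to picture the linking described above, without any per-value attribute in Solr, is to carry the annotation id inside each value and regroup on the client. This is a minimal sketch under assumptions: the field names `person_name` and `person_phone`, the `|` separator and the ids `p1` and `p2` are hypothetical, and the example values are the ones from the message.

```python
from collections import defaultdict

SEP = "|"  # hypothetical separator; must not occur inside an annotation id


def tag(annotation_id, value):
    """Prefix a feature value with the id of its annotation."""
    return f"{annotation_id}{SEP}{value}"


def group_by_annotation(doc, fields):
    """Regroup multi-valued fields into one record per annotation id."""
    grouped = defaultdict(lambda: defaultdict(list))
    for field in fields:
        for raw in doc.get(field, []):
            annotation_id, value = raw.split(SEP, 1)
            grouped[annotation_id][field].append(value)
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {key: dict(record) for key, record in grouped.items()}


# Hypothetical document with id-prefixed values in two multi-valued fields.
doc = {
    "id": "1",
    "person_name": [tag("p1", "Jane Doe"), tag("p2", "John Doe")],
    "person_phone": [
        tag("p1", "0123456789"),
        tag("p1", "1234567890"),
        tag("p2", "2345678901"),
    ],
}
people = group_by_annotation(doc, ["person_name", "person_phone"])
```

With these values, `people["p1"]` holds Jane Doe with the first two phone numbers and `people["p2"]` holds John Doe with the last one. The trade-off is that the prefixed strings are awkward to search directly, so the plain values would probably have to be indexed in separate fields as well.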
Re: Indexing several sub-fields in one solr field
Hello,

thanks for the answer. Sorry, I actually meant the attribute "subFieldSuffix". So, in order to be able to index several features in one Solr field, should I write a new Java class extending AbstractSubTypeFieldType? Or is there another way to do it?

Thanks!
Jim

On Thu, Sep 19, 2013 at 4:05 PM, Jack Krupansky wrote:
> There is no such fieldType attribute as "subSuffix". Solr is just
> complaining about extraneous, junk attributes. Delete the crap.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: jimmy nguyen
> Sent: Thursday, September 19, 2013 12:43 PM
> To: solr-user@lucene.apache.org
> Subject: Indexing several sub-fields in one solr field
>
> Hello,
>
> I'd like to index into Solr (4.4.0) documents that I previously annotated
> with GATE (7.1).
> I use Behemoth to be able to run my GATE application on a corpus of
> documents on Hadoop, and then Behemoth allows me to directly send my
> annotated documents to solr. But my question is not about the Behemoth or
> Hadoop parts.
>
> The annotations produced by my GATE application usually have several
> features (for example, annotation type Person has the following features :
> Person.title, Person.firstName, Person.lastName, Person.gender).
> Each of my documents may contain more than one Person annotation, which is
> why I would like to index all the features for one annotation in one field
> in solr.
> How do I do that ?
>
> I thought I'd add the following lines in schema.xml :
>
> ...
> ...
> ...
> ...
> multiValued="true" />
> stored="false" />
> ...
>
> But as soon as I start my solr instances and try to access solr from my
> browser, I get an HTTP ERROR 500 :
>
> Problem accessing /solr/.
Indexing several sub-fields in one solr field
Hello,

I'd like to index into Solr (4.4.0) documents that I previously annotated with GATE (7.1). I use Behemoth to be able to run my GATE application on a corpus of documents on Hadoop, and then Behemoth allows me to directly send my annotated documents to Solr. But my question is not about the Behemoth or Hadoop parts.

The annotations produced by my GATE application usually have several features (for example, annotation type Person has the following features: Person.title, Person.firstName, Person.lastName, Person.gender). Each of my documents may contain more than one Person annotation, which is why I would like to index all the features for one annotation in one field in Solr. How do I do that?

I thought I'd add the following lines in schema.xml:

... ... ... ... ...

But as soon as I start my Solr instances and try to access Solr from my browser, I get an HTTP ERROR 500:

Problem accessing /solr/. Reason:

{msg=SolrCore 'collection1' is not available due to init failure: Plugin Initializing failure for [schema.xml] fieldType,trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Plugin Initializing failure for [schema.xml] fieldType
    at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:287)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Plugin Initializing failure for [schema.xml] fieldType
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:193)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:164)
    at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
    at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:268)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:655)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    ... 1 more
Caused by: java.lang.RuntimeException: schema fieldtype ladate(org.apache.solr.schema.StrField) invalid arguments:{subSuffix=_ladate}
    at org.apache