Storing metadata from post parameters and XML
I'm very unclear on how to associate what I need with a Solr index entry. Based on what I've read thus far, you can extract data from text files and store that in a Solr document.

I have hundreds of thousands of documents in a database/svn type system. When I index a file, it is likely going to be local to the filesystem, and I know the location it will take on in the database. So, when I index, I want to provide a path so that the document can be found when someone else does a search.

123.xml may look like:

my title
Every foobar has its day
My caption

and the proprietary location I want it to be associated with is:

/abc/def/ghi/123.xml

So, when a user does a search for "foobar", it returns some information about 123.xml, but most importantly the location should be available.

I have yet to find (in the schema.xml or otherwise) where you can define that path to store, and how you would pass along that parameter in the indexing of that document.

Instead, from the examples I can find, including the book, you store fields from your data into the index. In the book's examples (a music database), searching for "Cherub Rock" returns a list of with their duration, track name, album name, and artist. In other words, the full text data you retrieve is the only information the search index has to offer.

Just for example, using the exampledocs post.jar, I'm envisioning something like this:

java -jar post.jar 123.xml -dblocation "/abc/def/ghi/123.xml" -othermeta1 "xxx" -othermeta2 "zzz"

Then the Solr doc would look like:

123
/abc/def/ghi/123.xml
xxx
zzz
my title
/abc/xxx.gif
Every foobar has its day My caption

This way, when a user searches for foobar, they get item 123 back, review the search result, and if they decide that's the data they want, they can use the dblocation field to retrieve the data for editing purposes (and then re-index it following their edits).

I'm guessing I just haven't found the right terms yet to look into, as I'm very new to this. Thanks for any direction you can provide. Also, if Solr appears to be the wrong tool for what I need, let me know as well!

Thank you,
Walter
Re: Storing metadata from post parameters and XML
Stefan,

You're right. I was attempting to post some quick pseudo-code, but that is pretty misleading; they should have been elements, like /abc/def/ghi/123.xml, or something to that effect.

Thanks,
Walter

On Mon, Jan 10, 2011 at 10:08 AM, Stefan Matheis <matheis.ste...@googlemail.com> wrote:
> Hey Walter,
>
> what's against just putting your db-location in a 'string' field, and using it like any other value?
> There is no special field-type for something like a path/directory/location-information, afaik.
>
> Regards
> Stefan
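To make Stefan's suggestion concrete, here is a minimal sketch of an update message that carries the location as an ordinary field. It assumes the schema declares `dblocation`, `othermeta1` and `othermeta2` as stored string fields; the `title` and `text` field names are made up for illustration, and the class name is hypothetical. It only builds the XML. Posting it (with post.jar or any HTTP client) is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SolrAddDocSketch {

    // Escape the characters that are special in XML text and attribute values.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Build an <add><doc> update message from field name/value pairs,
    // keeping the fields in insertion order.
    static String toAddXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("<field name=\"").append(escapeXml(f.getKey())).append("\">")
               .append(escapeXml(f.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "123");
        doc.put("dblocation", "/abc/def/ghi/123.xml"); // metadata known at index time
        doc.put("othermeta1", "xxx");
        doc.put("othermeta2", "zzz");
        doc.put("title", "my title");                  // assumed field name
        doc.put("text", "Every foobar has its day");   // assumed field name
        System.out.println(toAddXml(doc));
    }
}
```

Because the location is just another stored field, a search hit for "foobar" would return `dblocation` alongside the rest of the document, provided the field is marked as stored in schema.xml.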
Solr - search queries not returning results
Hello everyone, I believe I am missing something very elementary. The following query returns zero hits: http://localhost:8983/solr/core0/select/?q=testabc However, using solritas, it finds many results: http://localhost:8983/solr/core0/itas?q=testabc Do you have any idea what the issue may be? Thanks in advance!
Re: Solr - search queries not returning results
Thanks to both of you, I understand now and am now getting the expected results. Cheers!

On Wed, Jun 29, 2011 at 2:21 AM, Ahmet Arslan wrote:
> > I believe I am missing something very elementary. The following query
> > returns zero hits:
> >
> > http://localhost:8983/solr/core0/select/?q=testabc
>
> With this URL, you are hitting the RequestHandler defined as the default in your
> core0/conf/solrconfig.xml.
>
> > However, using solritas, it finds many results:
> >
> > http://localhost:8983/solr/core0/itas?q=testabc
>
> With this one, you are hitting the one registered as <requestHandler name="/itas">.
>
> > Do you have any idea what the issue may be?
>
> Probably they have different default parameters configured.
>
> For example, the (e)dismax versus the lucene query parser. The lucene query parser
> searches for testabc in your default field. dismax searches for it in all of the
> fields defined in the qf parameter.
>
> You can see the full parameter list by appending &echoParams=all to your
> search URL.
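For comparing the two handlers side by side, a small sketch along these lines builds both URLs with `echoParams=all` appended, as Ahmet suggests. The class and method names are hypothetical; only the two URLs and the parameter come from the thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EchoParamsUrlSketch {

    // Build a query URL for one handler and append echoParams=all,
    // the parameter suggested in the reply above.
    static String debugUrl(String handlerUrl, String query) {
        try {
            return handlerUrl + "?q=" + URLEncoder.encode(query, "UTF-8") + "&echoParams=all";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String core = "http://localhost:8983/solr/core0";
        System.out.println(debugUrl(core + "/select/", "testabc"));
        System.out.println(debugUrl(core + "/itas", "testabc"));
    }
}
```

Fetching both URLs and comparing the echoed parameters (for example defType and qf) should show where the two handlers differ.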
Methods for preserving text entities?
We have some text entities in fields to index (and search) like so: Solr is a really &myword; search engine! I would like to preserve/protect &myword; and not resolve it in the indexing or search results. What sort of methods have people used? I realize the results are returned in XML format, so preserving these text entities may be hard. Are people replacing the "&" character or doing something else? Thanks in advance!
Re: Methods for preserving text entities?
Ah, I think I suddenly answered my own question, but I would appreciate further insight if you have it. I converted the & in &myword; to an &amp; so it looks like this:

Solr is a really &amp;myword; search engine!
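The same conversion can be done in code before the text goes into the update XML. This is only a sketch with a hypothetical class and method name. It assumes the input is raw text that has not been escaped yet, since running it twice would turn `&amp;` into `&amp;amp;`.

```java
final class EntityPreserveSketch {

    // Replace every ampersand so an XML parser decodes the value back to
    // the literal text "&myword;" instead of trying to resolve an entity.
    static String protectEntities(String rawText) {
        return rawText.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String raw = "Solr is a really &myword; search engine!";
        System.out.println(protectEntities(raw));
    }
}
```

When Solr parses the escaped value, the stored field text contains the literal characters `&myword;` again, and the XML response writer escapes the ampersand on the way out.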
Multicore clustering setup problem
I had set up the clusteringComponent in solrconfig.xml for my first core. It has been working fine and now I want to get my next core working. I set up the second core with the clustering component so that I could use it, use solritas properly, etc. but Solr did not like the solrconfig.xml changes for the second core. I'm getting this error when Solr is started or when I hit a Solr related URL: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.clustering.ClusteringComponent' Should the clusteringComponent be set up in a shared configuration file somehow or is there something else I am doing wrong? Thanks in advance!
Re: Multicore clustering setup problem
lrResourceLoader.java:359) ... 38 more
Jun 30, 2011 8:51:23 AM org.apache.solr.servlet.SolrDispatchFilter init
INFO: user.dir=/opt/tomcat/solr/myapp

On Thu, Jun 30, 2011 at 12:57 AM, Stanislaw Osinski <stanislaw.osin...@carrotsearch.com> wrote:
> Hi,
>
> Can you post the full stack trace? I'd need to know if it's really
> org.apache.solr.handler.clustering.ClusteringComponent that's missing
> or some other class ClusteringComponent depends on.
>
> Cheers,
>
> Staszek
Re: Multicore clustering setup problem
Staszek,

That makes sense, but this has always been a multi-core setup, so the paths have not changed, and the clustering component worked fine for core0. The only new thing is that I have fine-tuned core1 (to begin implementing it). Previously its solrconfig.xml file was very basic. I replaced it with core0's solrconfig.xml and made very minor changes to it (unrelated to clustering). It's a nearly identical solrconfig.xml file, so I'm surprised it doesn't work for core1.

In other words, the paths here are the same for core0 and core1:

Again, since both cores have the clustering component, I'm wondering whether it should have a shared configuration in a different file used by both cores(?). Perhaps the duplicate clusteringComponent configuration for both cores is the problem?

Thanks for looking at this!

On Thu, Jun 30, 2011 at 1:29 PM, Stanislaw Osinski <stanislaw.osin...@carrotsearch.com> wrote:
> It looks like the whole clustering component JAR is not in the classpath. I
> remember that I once dealt with a similar issue in Solr 1.4 and the cause
> was the relative path of the <lib> tag being resolved against the core's
> instanceDir, which made the path incorrect when directly copying and pasting
> from the single core configuration. Try correcting the relative paths
> or replacing them with absolute ones, it should solve the problem.
>
> Cheers,
>
> Staszek
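To illustrate the effect Staszek describes, the sketch below resolves one relative library path against two different base directories. Every path in it is hypothetical (none comes from the thread), and it only demonstrates plain path arithmetic on a Unix-style filesystem, not Solr's actual resource loader.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class LibPathSketch {

    // Resolve a relative lib directory against a base directory, the way a
    // relative path is interpreted against a core's instanceDir.
    static Path resolveLib(String instanceDir, String relativeLibDir) {
        return Paths.get(instanceDir).resolve(relativeLibDir).normalize();
    }

    public static void main(String[] args) {
        String relative = "../../contrib/clustering/lib"; // hypothetical value

        // Single-core layout: the instanceDir is the Solr home itself.
        System.out.println(resolveLib("/opt/solr/example/solr", relative));

        // Multi-core layout: the instanceDir is one level deeper, so the
        // same relative path now points to a different directory.
        System.out.println(resolveLib("/opt/solr/example/solr/core1", relative));
    }
}
```

If the second resolved directory does not exist or does not contain the clustering JARs, the class cannot be loaded, which would match the "Error loading class" message. An absolute path avoids the dependence on the base directory.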
Solrj - when a request fails
I have a java program which sends thousands of Solr XML files up to Solr using the following code. It works fine until there is a problem with one of the Solr XML files. The code fails on the solrServer.request(up) line, but it does not throw an exception, my application therefore cannot catch it and recover, and my whole application dies.

I've fixed this individual file that made it fail, but want to better trap these so my application does not die.

Thanks for any insight you can provide. Java code and log below-

    // ... start of a loop to process each file removed ...
    try {
        String xml = read(filename);
        DirectXmlRequest up = new DirectXmlRequest( "/update", xml );
        solrServer.request( up );
        solrServer.commit();
    } catch (SolrServerException e) {
        log.warn("Exception: "+ e.toString());
        throw new MyException(e);
    } catch (IOException e) {
        log.warn("Exception: "+ e.toString());
        throw new MyException(e);
    }

DEBUG >> "[\n]" - (Wire.java:70)
DEBUG Request body sent - (EntityEnclosingMethod.java:508)
DEBUG << "HTTP/1.1 400 Bad Request[\r][\n]" - (Wire.java:70)
DEBUG << "HTTP/1.1 400 Bad Request[\r][\n]" - (Wire.java:70)
DEBUG << "Server: Apache-Coyote/1.1[\r][\n]" - (Wire.java:70)
DEBUG << "Content-Type: text/html;charset=utf-8[\r][\n]" - (Wire.java:70)
DEBUG << "Content-Length: 1271[\r][\n]" - (Wire.java:70)
DEBUG << "Date: Fri, 23 Sep 2011 12:08:05 GMT[\r][\n]" - (Wire.java:70)
DEBUG << "Connection: close[\r][\n]" - (Wire.java:70)
DEBUG << "[\r][\n]" - (Wire.java:70)
DEBUG << "Apache Tomcat/6.0.29 - Error report HTTP Status 400 - Unexpected character 'x' (code 120) in prolog; expected '<'[\n]" - (Wire.java:70)
DEBUG << " at [row,col {unknown-source}]: [3,1]type Status reportmessage Unexpected character 'x' (code 120) in prolog; expected '<'[\n]" - (Wire.java:70)
DEBUG << " at [row,col {unknown-source}]: [3,1]description " - (Wire.java:84)
DEBUG << "The request sent by the client was syntactically incorrect (Unexpected character 'x' (code 120) in prolog; expected '<'[\n]" - (Wire.java:70)
DEBUG << " at [row,col {unknown-source}]: [3,1]).Apache Tomcat/6.0.29" - (Wire.java:84)
DEBUG Should close connection in response to directive: close - (HttpMethodBase.java:1008)
Re: Solrj - when a request fails
I tried that with the same results. You would think I would get the exception back from Solr so I could trap it; instead I lose all other requests after it.

On Fri, Sep 23, 2011 at 8:33 AM, Gunther, Andrew wrote:
> All the solr methods look like they should throw those 2 exceptions.
> Have you tried the DirectXmlRequest method?
>
> up.process(solrServer);
>
> public UpdateResponse process( SolrServer server ) throws SolrServerException, IOException
> {
>    long startTime = System.currentTimeMillis();
>    UpdateResponse res = new UpdateResponse();
>    res.setResponse( server.request( this ) );
>    res.setElapsedTime( System.currentTimeMillis()-startTime );
>    return res;
> }
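One possibility the thread does not confirm is that the 400 response surfaces as an unchecked exception, which the two catch blocks for SolrServerException and IOException would not trap. Under that assumption, a sketch like the following keeps the batch alive by catching any exception per file and recording the failure. The `Uploader` interface is a hypothetical stand-in for the read-and-request step, and the fake uploader in `main` only simulates a bad file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BatchUploadSketch {

    // Hypothetical stand-in for: read the file, build the request, send it.
    interface Uploader {
        void upload(String filename) throws Exception;
    }

    // Try every file and collect the ones that failed instead of letting
    // a single bad file end the whole run.
    static List<String> uploadAll(List<String> filenames, Uploader uploader) {
        List<String> failed = new ArrayList<>();
        for (String filename : filenames) {
            try {
                uploader.upload(filename);
            } catch (Exception e) {
                // Catches checked and unchecked exceptions alike.
                System.err.println("Skipping " + filename + ": " + e);
                failed.add(filename);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Simulated uploader: one file triggers an unchecked exception.
        Uploader fake = name -> {
            if (name.equals("bad.xml")) {
                throw new IllegalStateException("400 Bad Request");
            }
        };
        List<String> failed = uploadAll(Arrays.asList("a.xml", "bad.xml", "c.xml"), fake);
        System.out.println("Failed files: " + failed);
    }
}
```

The list of failed files can then be logged or retried after the run. Whether a broad catch is acceptable depends on the application, and narrowing it to the concrete exception type is preferable once that type is known.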