Highlights not returning after upgrading from 3.3 to 4.0
I'm running some tests on Solr 4.0 before putting it into production, and I've just encountered an issue with hit highlighting. I started by placing my index from 3.3 into a Solr 4.0 install. I then edited the field definitions in the schema config to match my schema from 3.3, with the addition of the new _version_ field. I also modified the solrconfig.xml defaults to match my previous 3.3 config (only editing the default df). Once I had everything configured, I started the application, then called optimize. After optimization had completed, I executed the following query: http://server/solr/select?indent=on&version=2.2&q=omega&fq=&start=0&rows=50&fl=*%2Cscore&qt=&wt=&explainOther=&hl=on&hl.fl=contents This query returned a highlight element in the return packet, but the fragment section was empty in Solr 4, while it returned fragments in 3.3. Do I need to make any additional changes? The default field is contents, which is a text_en field.
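For reference, the same query can be issued through SolrJ to inspect the highlighting response programmatically. This is only a sketch: the server URL is an assumption, and the query term and field come from the message above. One common cause of empty fragments is a field that is not stored="true" in schema.xml, which is worth checking after a schema port.

```java
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightCheck {
    public static void main(String[] args) throws Exception {
        // URL is an assumption; point this at your Solr 4.0 core.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("omega");
        q.setRows(50);
        q.setHighlight(true);          // equivalent of hl=on
        q.addHighlightField("contents"); // equivalent of hl.fl=contents

        QueryResponse rsp = server.query(q);

        // Map of docId -> (field -> snippet fragments). Consistently empty
        // fragment lists usually point at the field not being stored.
        Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
        System.out.println(hl);
    }
}
```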
Re: Generating large datasets for Solr proof-of-concept
I've done it using SolrJ and a *lot* of parallel processes feeding dummy data into the server. On Thu, Sep 15, 2011 at 4:54 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello Everyone, I have a goal of populating Solr with a million unique products in order to create a test environment for a proof of concept. I started out by using DIH with Amazon RSS feeds, but I've quickly realized that there's no way I can glean a million products from one RSS feed. And I'd go mad if I just sat at my computer all day looking for feeds and punching them into the DIH config for Solr. Has anyone ever had to create large mock/dummy datasets for test environments or for POCs/Demos to convince folks that Solr was the wave of the future? Any tips would be greatly appreciated. I suppose it sounds a lot like crawling even though it started out as innocent DIH usage. - Pulkit
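A minimal sketch of the parallel-feeder approach described above, using threads in one JVM rather than separate processes. The server URL, thread count, batch size, and field names are all assumptions chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DummyFeeder {
    public static void main(String[] args) throws Exception {
        // One shared, thread-safe client; URL is an assumption.
        final SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        final int threads = 8;
        final int docsPerThread = 125000; // 8 x 125,000 = 1M documents

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
                        for (int i = 0; i < docsPerThread; i++) {
                            SolrInputDocument doc = new SolrInputDocument();
                            // Field names are placeholders for your schema.
                            doc.addField("id", UUID.randomUUID().toString());
                            doc.addField("name", "Dummy product " + i);
                            batch.add(doc);
                            if (batch.size() == 1000) { // batched adds are much faster
                                server.add(batch);
                                batch.clear();
                            }
                        }
                        if (!batch.isEmpty()) server.add(batch);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }
}
```

Batching documents and sharing a single client instance across threads keeps the indexing pipeline saturated without opening a connection per document.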
Re: Where the heck do you put maxAnalyzedChars?
Thanks. That seemed to do it. I was thrown by the section of documentation that said "This parameter makes sense for Highlighter only" and tried to put it in the various highlighter elements. On Wed, Aug 24, 2011 at 6:52 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: (11/08/25 5:29), Daniel Skiles wrote: I have a very large field in my index that I need to highlight. Where in the config file do I set the maxAnalyzedChars in order to make this work? Has anyone successfully done this? Placing it in your requestHandler should work. For example:

<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <int name="hl.maxAnalyzedChars">1000</int>
  </lst>
</requestHandler>

koji -- Check out Query Log Visualizer for Apache Solr http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html http://www.rondhuit.com/en/
commitWithin + SolrJ
What is the cleanest way to use the commitWithin directive with SolrJ? AbstractUpdateRequest has a setCommitWithin() method, but I don't see how to hook that into SolrServer.add(SolrInputDocument doc). Do I need to use SolrServer.request(), or do I need to use some other method? Thanks.
Re: commitWithin + SolrJ
I ended up doing this with request.process(server) on an UpdateRequest class. On Wed, Aug 24, 2011 at 2:07 PM, Daniel Skiles daniel.ski...@docfinity.com wrote: What is the cleanest way to use the commitWithin directive with SolrJ? AbstractUpdateRequest has a setCommitWithin() method, but I don't see how to hook that into SolrServer.add(SolrInputDocument doc). Do I need to use SolrServer.request(), or do I need to use some other method? Thanks.
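Concretely, the UpdateRequest route looks like this. A sketch assuming a SolrJ client; the server URL, document field, and 10-second window are arbitrary examples:

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        // URL is an assumption.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");

        // Instead of server.add(doc), build the request explicitly so
        // setCommitWithin() can be attached to it.
        UpdateRequest req = new UpdateRequest();
        req.add(doc);
        req.setCommitWithin(10000); // commit within 10 seconds
        req.process(server);        // sends the add with the commitWithin directive
    }
}
```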
Where the heck do you put maxAnalyzedChars?
I have a very large field in my index that I need to highlight. Where in the config file do I set the maxAnalyzedChars in order to make this work? Has anyone successfully done this?
Re: automatically dealing with out of memory exceptions
I've gotten around that by using the Java Service Wrapper (http://wrapper.tanukisoftware.com) from Tanuki Software to restart the entire container. On Wed, Aug 24, 2011 at 5:28 PM, Jason Toy jason...@gmail.com wrote: After running a combination of different queries, my Solr server eventually is unable to complete certain requests because it runs out of memory, which means I need to restart the server, as it's basically useless with some queries working and not others. I am moving to a distributed setting soon, but in the meantime how can I deal with automatically restarting the server when it fails on certain queries? I don't know ahead of time which queries it will die on, so I can't ping the server with certain queries to see if it's dying.
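A sketch of the relevant wrapper.conf settings for this approach, assuming the Java Service Wrapper's output-filter feature; verify the property names against the documentation for your wrapper version:

```ini
# Restart the JVM whenever an OutOfMemoryError appears in console output.
wrapper.filter.trigger.1=java.lang.OutOfMemoryError
wrapper.filter.action.1=RESTART
```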
Re: SSD experience
I haven't tried it with Solr yet, but with straight Lucene about two years ago we saw about a 40% boost in performance on our tests with no changes except the disk. On Mon, Aug 22, 2011 at 10:54 AM, Rich Cariens richcari...@gmail.com wrote: Ahoy ahoy! Does anyone have any experiences or stories they can share with the list about how SSDs impacted search performance for better or worse? I found a Lucene SSD performance benchmark doc http://wiki.apache.org/lucene-java/SSD_performance?action=AttachFile&do=view&target=combined-disk-ssd.pdf but the wiki engine is refusing to let me view the attachment (I get "You are not allowed to do AttachFile on this page."). Thanks in advance!
Re: Return records based on aggregate functions?
It's actually an analyzed String. I figured that out after the first test run. On Thu, Aug 18, 2011 at 9:00 AM, Erick Erickson erickerick...@gmail.com wrote: Side comment: Is your content field really a string value in your schema.xml? That's an un-analyzed type, and unless you're always searching for *exactly* the full contents of the field, you'll have problems. Best, Erick On Wed, Aug 17, 2011 at 2:20 PM, Daniel Skiles daniel.ski...@docfinity.com wrote: I've recently started using Solr and I'm stumped by a problem I'm currently encountering. Given that I can't really find anything close to what I'm trying to do on Google or the mailing lists, I figured I'd ask if anyone here had suggestions on how to do it. I currently have a schema that looks more or less like this: uniqueId (string) -- Unique identifier for a record documentId (string) -- Id of document represented by this record contents (string) -- contents of file represented by this record version (float) -- Numeric representation of the version of this document What I'd like to do is submit a query to the server that returns records that match against contents, but only if the record has a version field that is the largest value for all records that share the same documentId. In other words, I'd like to be able to only search the most recent version of a document in some scenarios. Is this possible with Solr? I'm at an early enough phase that I'm also able to modify my solr schema if necessary. Thank you, Daniel
Return records based on aggregate functions?
I've recently started using Solr and I'm stumped by a problem I'm currently encountering. Given that I can't really find anything close to what I'm trying to do on Google or the mailing lists, I figured I'd ask if anyone here had suggestions on how to do it. I currently have a schema that looks more or less like this: uniqueId (string) -- Unique identifier for a record documentId (string) -- Id of document represented by this record contents (string) -- contents of file represented by this record version (float) -- Numeric representation of the version of this document What I'd like to do is submit a query to the server that returns records that match against contents, but only if the record has a version field that is the largest value for all records that share the same documentId. In other words, I'd like to be able to only search the most recent version of a document in some scenarios. Is this possible with Solr? I'm at an early enough phase that I'm also able to modify my solr schema if necessary. Thank you, Daniel
Re: Return records based on aggregate functions?
Woah. That looks like exactly what I need. Thank you very much. Is there any documentation for how to do that using the SolrJ API? On Wed, Aug 17, 2011 at 2:26 PM, Dyer, James james.d...@ingrambook.com wrote: Daniel, This looks like a good use case for FieldCollapsing (see http://wiki.apache.org/solr/FieldCollapsing). Perhaps try something like: group=true&group.field=documentId&group.limit=1&group.sort=version desc James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Daniel Skiles [mailto:daniel.ski...@docfinity.com] Sent: Wednesday, August 17, 2011 1:20 PM To: solr-user@lucene.apache.org Subject: Return records based on aggregate functions? I've recently started using Solr and I'm stumped by a problem I'm currently encountering. Given that I can't really find anything close to what I'm trying to do on Google or the mailing lists, I figured I'd ask if anyone here had suggestions on how to do it. I currently have a schema that looks more or less like this: uniqueId (string) -- Unique identifier for a record documentId (string) -- Id of document represented by this record contents (string) -- contents of file represented by this record version (float) -- Numeric representation of the version of this document What I'd like to do is submit a query to the server that returns records that match against contents, but only if the record has a version field that is the largest value for all records that share the same documentId. In other words, I'd like to be able to only search the most recent version of a document in some scenarios. Is this possible with Solr? I'm at an early enough phase that I'm also able to modify my solr schema if necessary. Thank you, Daniel
Re: Return records based on aggregate functions?
For response option 1, would I add the group.main=true and group.format=simple parameters to the SolrQuery object? On Wed, Aug 17, 2011 at 3:09 PM, Dyer, James james.d...@ingrambook.com wrote: For the request end, you can just use something like: solrquery.add("group", true); ..etc.. For the response, you have 3 options: 1. Specify group.main=true&group.format=simple. (note: When I tested this on a nightly build from back in February I noticed a significant performance impact from using these params, although I imagine the version that is committed to 3.3 does not have this problem.) This will return your 1-document-per-group as if it were a regular non-grouped query, and the response will come back just like any other query. (see the wiki: http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr and the javadocs: http://lucene.apache.org/solr/api/overview-summary.html then scroll to the solrj section.) 2. Full SolrJ support was just added to the 3.x branch, so you'll have to use a nightly build (which ought to be stable, production-quality). See https://issues.apache.org/jira/browse/SOLR-2637 for more information. After building the solrj documentation, look for classes that start with "Group". 3. See this posting on how to parse the response by hand. This is for a slightly older version of Field Collapsing than what was committed, so it might not be 100% accurate. http://www.lucidimagination.com/search/document/148ba23aec5ee2d8/solrquery_api_for_adding_group_filter James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Daniel Skiles [mailto:daniel.ski...@docfinity.com] Sent: Wednesday, August 17, 2011 1:32 PM To: solr-user@lucene.apache.org Subject: Re: Return records based on aggregate functions? Woah. That looks like exactly what I need. Thank you very much. Is there any documentation for how to do that using the SolrJ API?
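Putting the pieces of option 1 together, a sketch of the grouped query in SolrJ, assuming a SolrJ client; the server URL and the search term are placeholders, and the field names come from Daniel's schema:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LatestVersionSearch {
    public static void main(String[] args) throws Exception {
        // URL is an assumption.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        // "report" is a placeholder search term.
        SolrQuery q = new SolrQuery("contents:report");
        q.set("group", true);
        q.set("group.field", "documentId");
        q.set("group.limit", 1);
        q.set("group.sort", "version desc"); // keep only the newest version per document
        // group.main=true + group.format=simple flatten the groups so the
        // response reads like an ordinary, non-grouped result set.
        q.set("group.main", true);
        q.set("group.format", "simple");

        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResults()); // one doc per documentId, highest version
    }
}
```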