Highlights not returning after upgrading from 3.3 to 4.0

2012-10-25 Thread Daniel Skiles
I'm running some tests on Solr 4.0 before putting it into production, and
I've just encountered an issue with hit highlighting.

I started by placing my index from 3.3 into a Solr 4.0 install.  I then
edited the field definition in the schema config to match my schema from
3.3, with the addition of the new _version_ field.  I also modified the
solrconfig.xml defaults to match my previous 3.3 config (only editing the
default df).

Once I had everything configured, I started the application, then called
optimize.

After optimization had completed, I executed the following query:

http://server/solr/select?indent=on&version=2.2&q=omega&fq=&start=0&rows=50&fl=*%2Cscore&qt=&wt=&explainOther=&hl=on&hl.fl=contents

This query returned a highlight element in the return packet, but the
fragment section was empty in Solr 4, while it returned fragments in 3.3.
Do I need to make any additional changes?  The default field is contents,
which is a text_en field.


Re: Generating large datasets for Solr proof-of-concept

2011-09-15 Thread Daniel Skiles
I've done it using SolrJ and a *lot* of parallel processes feeding dummy
data into the server.
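The shape of that approach, minus the actual SolrJ calls, is just a pool of workers generating batches of dummy docs. A stdlib-only sketch (field names and batch sizes are made up for illustration; in real use each batch would be POSTed to Solr's update handler or fed through SolrJ):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DummyFeeder {
    // Build one Solr XML <add> batch of dummy product docs.
    static String buildBatch(int start, int count) {
        StringBuilder sb = new StringBuilder("<add>");
        for (int i = start; i < start + count; i++) {
            sb.append("<doc>")
              .append("<field name=\"id\">prod-").append(i).append("</field>")
              .append("<field name=\"name\">Dummy product ").append(i).append("</field>")
              .append("<field name=\"price\">").append((i % 1000) + 0.99).append("</field>")
              .append("</doc>");
        }
        return sb.append("</add>").toString();
    }

    public static void main(String[] args) throws Exception {
        int threads = 4, batches = 8, batchSize = 250;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<String>> futures = new ArrayList<>();
        for (int b = 0; b < batches; b++) {
            final int start = b * batchSize;
            // Each task builds (and in real use would post) one batch in parallel.
            futures.add(pool.submit(() -> buildBatch(start, batchSize)));
        }
        int docs = 0;
        for (Future<String> f : futures)
            docs += f.get().split("<doc>", -1).length - 1;
        pool.shutdown();
        System.out.println(docs);   // total dummy docs generated: 2000
    }
}
```

Scaling the thread count and batch count up gets you to a million docs quickly; the bottleneck is usually the indexing side, not the generator.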

On Thu, Sep 15, 2011 at 4:54 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:

 Hello Everyone,

 I have a goal of populating Solr with a million unique products in
 order to create a test environment for a proof of concept. I started
 out by using DIH with Amazon RSS feeds but I've quickly realized that
 there's no way I can glean a million products from one RSS feed. And
 I'd go mad if I just sat at my computer all day looking for feeds and
 punching them into DIH config for Solr.

 Has anyone ever had to create large mock/dummy datasets for test
 environments or for POCs/Demos to convince folks that Solr was the
 wave of the future? Any tips would be greatly appreciated. I suppose
 it sounds a lot like crawling even though it started out as innocent
 DIH usage.

 - Pulkit



Re: Where the heck do you put maxAnalyzedChars?

2011-08-25 Thread Daniel Skiles
Thanks.  That seemed to do it.  I was thrown by the section of documentation
that said "This parameter makes sense for Highlighter only" and tried to put
it in the various highlighter elements.

On Wed, Aug 24, 2011 at 6:52 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

 (11/08/25 5:29), Daniel Skiles wrote:

 I have a very large field in my index that I need to highlight.  Where in
 the config file do I set the maxAnalyzedChars in order to make this work?
 Has anyone successfully done this?


 Placing it in your requestHandler should work. For example:

 <requestHandler name="search" class="solr.SearchHandler" default="true">
   <lst name="defaults">
     <int name="hl.maxAnalyzedChars">1000</int>
   </lst>
 </requestHandler>

 koji
 --
 Check out Query Log Visualizer for Apache Solr
 http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
 http://www.rondhuit.com/en/



commitWithin + SolrJ

2011-08-24 Thread Daniel Skiles
What is the cleanest way to use the commitWithin directive with SolrJ?
AbstractUpdateRequest has a setCommitWithin() method, but I don't see how to
hook that into SolrServer.add(SolrInputDocument doc).

Do I need to use SolrServer.request(), or do I need to use some other
method?

Thanks.


Re: commitWithin + SolrJ

2011-08-24 Thread Daniel Skiles
I ended up doing this with request.process(server) on an UpdateRequest
class.
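For reference, the UpdateRequest route looks roughly like this. A minimal sketch against the SolrJ 3.x API (server URL, document fields, and the 5-second window are placeholders; this needs solr-solrj on the classpath):

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");

        // Route the add through an UpdateRequest so commitWithin can be set,
        // instead of calling server.add(doc) directly.
        UpdateRequest req = new UpdateRequest();
        req.add(doc);
        req.setCommitWithin(5000);  // ask Solr to commit within 5 seconds
        req.process(server);
    }
}
```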

On Wed, Aug 24, 2011 at 2:07 PM, Daniel Skiles
 daniel.ski...@docfinity.com wrote:

 What is the cleanest way to use the commitWithin directive with SolrJ?
 AbstractUpdateRequest has a setCommitWithin() method, but I don't see how to
 hook that into SolrServer.add(SolrInputDocument doc).

 Do I need to use SolrServer.request(), or do I need to use some other
 method?

 Thanks.



Where the heck do you put maxAnalyzedChars?

2011-08-24 Thread Daniel Skiles
I have a very large field in my index that I need to highlight.  Where in
the config file do I set the maxAnalyzedChars in order to make this work?
Has anyone successfully done this?


Re: automatically dealing with out of memory exceptions

2011-08-24 Thread Daniel Skiles
I've gotten around that by using the Java Service Wrapper
(http://wrapper.tanukisoftware.com) from Tanuki Soft to restart
the entire container.
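The wrapper can watch the JVM's console output and restart the whole process when a trigger string appears. A minimal wrapper.conf fragment (property names as I recall them from the Tanuki docs; verify against your wrapper version):

```properties
# wrapper.conf: restart the whole JVM when an OutOfMemoryError
# shows up in the console output
wrapper.filter.trigger.1=java.lang.OutOfMemoryError
wrapper.filter.action.1=RESTART
```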

On Wed, Aug 24, 2011 at 5:28 PM, Jason Toy jason...@gmail.com wrote:

 After running a combination of different queries, my solr server eventually
 is unable to complete certain requests because it runs out of memory, which
 means I need to restart the server as it's basically useless with some
 queries working and not others.   I am moving to a distributed setting soon,
 but in the meantime how can I deal with automatically restarting the server
 when it fails on certain queries?   I don't know ahead of time which
 queries it will die on, so I can't ping the server with certain queries to
 see if it's dying.



Re: SSD experience

2011-08-22 Thread Daniel Skiles
I haven't tried it with Solr yet, but with straight Lucene about two years
ago we saw about a 40% boost in performance on our tests with no changes
except the disk.

On Mon, Aug 22, 2011 at 10:54 AM, Rich Cariens richcari...@gmail.com wrote:

 Ahoy ahoy!

 Does anyone have any experiences or stories they can share with the list
 about how SSDs impacted search performance for better or worse?

 I found a Lucene SSD performance benchmark doc:
 http://wiki.apache.org/lucene-java/SSD_performance?action=AttachFile&do=view&target=combined-disk-ssd.pdf
 but the wiki engine is refusing to let me view the attachment (I get "You
 are not allowed to do AttachFile on this page.").

 Thanks in advance!



Re: Return records based on aggregate functions?

2011-08-18 Thread Daniel Skiles
It's actually an analyzed String.  I figured that out after the first test
run.

On Thu, Aug 18, 2011 at 9:00 AM, Erick Erickson erickerick...@gmail.com wrote:

 Side comment: Is your content field really a string value in your
 schema.xml? that's an un-analyzed type and unless you're
 always searching for *exactly* the full contents of the field,
 you'll have problems

 Best
 Erick

 On Wed, Aug 17, 2011 at 2:20 PM, Daniel Skiles
 daniel.ski...@docfinity.com wrote:
  I've recently started using Solr and I'm stumped by a problem I'm
 currently
  encountering.  Given that I can't really find anything close to what I'm
  trying to do on Google or the mailing lists, I figured I'd ask if anyone
  here had suggestions on how to do it.
 
  I currently have a schema that looks more or less like this:
 
  uniqueId (string) -- Unique identifier for a record
  documentId (string) -- Id of document represented by this record
  contents (string) -- contents of file represented by this record
  version (float) -- Numeric representation of the version of this document
 
 
  What I'd like to do is submit a query to the server that returns records
  that match against contents, but only if the record has a version field
 that
  is the largest value for all records that share the same documentId.
 
  In other words, I'd like to be able to only search the most recent
 version
  of a document in some scenarios.
 
  Is this possible with Solr?  I'm at an early enough phase that I'm also
 able
  to modify my solr schema if necessary.
 
  Thank you,
  Daniel
 



Return records based on aggregate functions?

2011-08-17 Thread Daniel Skiles
I've recently started using Solr and I'm stumped by a problem I'm currently
encountering.  Given that I can't really find anything close to what I'm
trying to do on Google or the mailing lists, I figured I'd ask if anyone
here had suggestions on how to do it.

I currently have a schema that looks more or less like this:

uniqueId (string) -- Unique identifier for a record
documentId (string) -- Id of document represented by this record
contents (string) -- contents of file represented by this record
version (float) -- Numeric representation of the version of this document


What I'd like to do is submit a query to the server that returns records
that match against contents, but only if the record has a version field that
is the largest value for all records that share the same documentId.

In other words, I'd like to be able to only search the most recent version
of a document in some scenarios.

Is this possible with Solr?  I'm at an early enough phase that I'm also able
to modify my solr schema if necessary.

Thank you,
Daniel


Re: Return records based on aggregate functions?

2011-08-17 Thread Daniel Skiles
Woah.  That looks like exactly what I need.  Thank you very much.  Is there
any documentation for how to do that using the SolrJ API?

On Wed, Aug 17, 2011 at 2:26 PM, Dyer, James james.d...@ingrambook.com wrote:

 Daniel,

 This looks like a good usecase for FieldCollapsing (see
 http://wiki.apache.org/solr/FieldCollapsing).  Perhaps try something like:

 group=true&group.field=documentId&group.limit=1&group.sort=version desc

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Daniel Skiles [mailto:daniel.ski...@docfinity.com]
 Sent: Wednesday, August 17, 2011 1:20 PM
 To: solr-user@lucene.apache.org
 Subject: Return records based on aggregate functions?

 I've recently started using Solr and I'm stumped by a problem I'm currently
 encountering.  Given that I can't really find anything close to what I'm
 trying to do on Google or the mailing lists, I figured I'd ask if anyone
 here had suggestions on how to do it.

 I currently have a schema that looks more or less like this:

 uniqueId (string) -- Unique identifier for a record
 documentId (string) -- Id of document represented by this record
 contents (string) -- contents of file represented by this record
 version (float) -- Numeric representation of the version of this document


 What I'd like to do is submit a query to the server that returns records
 that match against contents, but only if the record has a version field
 that
 is the largest value for all records that share the same documentId.

 In other words, I'd like to be able to only search the most recent version
 of a document in some scenarios.

 Is this possible with Solr?  I'm at an early enough phase that I'm also
 able
 to modify my solr schema if necessary.

 Thank you,
 Daniel



Re: Return records based on aggregate functions?

2011-08-17 Thread Daniel Skiles
For response option 1, would I add the group.main=true and
group.format=simple parameters to the SolrQuery object?

On Wed, Aug 17, 2011 at 3:09 PM, Dyer, James james.d...@ingrambook.com wrote:

 For the request end, you can just use something like:

 solrquery.add("group", "true");
 ..etc..

 For the response, you have 3 options:

 1. specify group.main=true&group.format=simple .  (note: When I tested
 this on a nightly build from back in February I noticed a significant
 performance impact from using these params although I imagine the version
 that is committed to 3.3 does not have this problem.)

 This will return your 1-document-per-group as if it is a regular
 non-grouped query and the response will come back just like any other query.
 (see the wiki: http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr
  and the javadocs: 
 http://lucene.apache.org/solr/api/overview-summary.html, then scroll to
 the solrj section.)
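Putting James's parameters together, option 1 amounts to a request like the one below. A stdlib-only sketch that just assembles the query string (field names are from the thread; the query value and endpoint path are placeholders). On the SolrJ side, the same names would each go through solrquery.add(name, value):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupedQuery {
    // Assemble the field-collapsing parameters into a Solr query string.
    static String buildQuery(String q) throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("group", "true");
        params.put("group.field", "documentId");    // collapse on the document id
        params.put("group.limit", "1");             // one record per group
        params.put("group.sort", "version desc");   // keep only the newest version
        params.put("group.main", "true");           // flatten into a normal doc list
        params.put("group.format", "simple");
        StringBuilder sb = new StringBuilder("/select?");
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.charAt(sb.length() - 1) != '?') sb.append('&');
            sb.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildQuery("contents:omega"));
    }
}
```

With group.main=true and group.format=simple the response parses like an ordinary ungrouped result, which is why it works with the plain SolrJ response accessors.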

 2. Full SolrJ support was just added to the 3.x branch so you'll have to
 use a nightly build (which ought to be stable & production-quality).  See
 https://issues.apache.org/jira/browse/SOLR-2637 for more information.
 After building the solrj documentation, look for classes that start with
 "Group".

 3. See this posting on how to parse the response by-hand.  This is for a
 slightly older version of Field Collapsing than what was committed so it
 might not be 100% accurate.
 http://www.lucidimagination.com/search/document/148ba23aec5ee2d8/solrquery_api_for_adding_group_filter

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Daniel Skiles [mailto:daniel.ski...@docfinity.com]
 Sent: Wednesday, August 17, 2011 1:32 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Return records based on aggregate functions?

 Woah.  That looks like exactly what I need.  Thank you very much.  Is there
 any documentation for how to do that using the SolrJ API?

 On Wed, Aug 17, 2011 at 2:26 PM, Dyer, James james.d...@ingrambook.com
 wrote:

  Daniel,
 
  This looks like a good usecase for FieldCollapsing (see
  http://wiki.apache.org/solr/FieldCollapsing).  Perhaps try something
 like:
 
  group=true&group.field=documentId&group.limit=1&group.sort=version desc
 
  James Dyer
  E-Commerce Systems
  Ingram Content Group
  (615) 213-4311
 
 
  -Original Message-
  From: Daniel Skiles [mailto:daniel.ski...@docfinity.com]
  Sent: Wednesday, August 17, 2011 1:20 PM
  To: solr-user@lucene.apache.org
  Subject: Return records based on aggregate functions?
 
  I've recently started using Solr and I'm stumped by a problem I'm
 currently
  encountering.  Given that I can't really find anything close to what I'm
  trying to do on Google or the mailing lists, I figured I'd ask if anyone
  here had suggestions on how to do it.
 
  I currently have a schema that looks more or less like this:
 
  uniqueId (string) -- Unique identifier for a record
  documentId (string) -- Id of document represented by this record
  contents (string) -- contents of file represented by this record
  version (float) -- Numeric representation of the version of this document
 
 
  What I'd like to do is submit a query to the server that returns records
  that match against contents, but only if the record has a version field
  that
  is the largest value for all records that share the same documentId.
 
  In other words, I'd like to be able to only search the most recent
 version
  of a document in some scenarios.
 
  Is this possible with Solr?  I'm at an early enough phase that I'm also
  able
  to modify my solr schema if necessary.
 
  Thank you,
  Daniel