Understanding the Solr Admin page
I am expanding my Solr skills and would like to understand the Admin page better. I understand that learning Java memory management and the Java memory options will help me, and I am reading and experimenting on that front, but if there are any concise resources that are especially pertinent to Solr I would love to know about them. Everything that I've found is either a "do this" one-liner or expects Java experience which I don't have, and I don't know what I need to learn.

I notice that some of the Args presented are in black text, and others in grey. Why are they presented differently? Where would I have found this information in the fine manual?

When I start Solr with nohup, the resulting nohup.out file is _huge_. How might I start Solr such that INFO is not output, but only WARNINGs and SEVEREs are? In particular, I'd rather not log every query, even the invalid queries which also log as SEVERE. I thought that this would be easy to Google for, but it is not! If there is a concise document that examines this issue, I would love to know where on the wild wild web it exists.

Thank you.

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
solr 4.2.1 and docValues
Hi list,

I want to try docValues for my facets and sorting with Solr 4.2.1. I have already seen many papers, examples and source code about and around docValues, but there are still some questions.

The example schema.xml has these fields:

  <field name="popularity" type="int" indexed="true" stored="true" />
  <field name="manu_exact" type="string" indexed="true" stored="false" />

and it has a comment for docValues:

  <!-- Some fields such as popularity and manu_exact could be modified to leverage doc values:
  <field name="popularity" type="int" indexed="true" stored="true" docValues="true" default="0" />
  <field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" default="" />
  ... -->

For popularity with docValues, indexed and stored are true and default is 0. For manu_exact with docValues, indexed and stored are false and default is an empty string.

Questions:
- if docValues is true, will this replace indexed=true, as for field manu_exact?
- what is the advantage of having indexed=true and docValues=true?
- what if default="" also for the popularity int field?

Regards
Bernd
Re: Out of memory on some faceting queries
On Wed, Apr 3, 2013 at 8:47 PM, Shawn Heisey s...@elyograg.org wrote:

> On 4/2/2013 3:09 AM, Dotan Cohen wrote:
>> I notice that this only occurs on queries that run facets. I start Solr with the following command:
>> sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar /opt/solr-4.1.0/example/start.jar
>
> It looks like you've followed some advice that I gave previously on how to tune java. I have since learned that this advice is bad; it results in long GC pauses, even with heaps that aren't huge.

I see, thanks.

> As others have pointed out, you don't have a max heap setting, which would mean that you're using whatever Java chooses for its default, which might not be enough. If you can get Solr to successfully run for a while with queries and updates happening, the heap should eventually max out and the admin UI will show you what Java is choosing by default.
>
> Here is what I would now recommend as a starting point for your Solr startup command. You may need to increase the heap beyond 4GB, but be careful that you still have enough free memory to be able to do effective caching of your index.
>
> sudo nohup java -Xms4096M -Xmx4096M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar /opt/solr-4.1.0/example/start.jar

Thank you, I will experiment with that.

> If you are running a really old build of java (latest versions on Oracle's website are 1.6 build 43 and 1.7 build 17), you might want to leave AggressiveOpts out. Some people would argue that you should never use that option.

Great, thanks for the warning. This is what we're running; I'll see about updating it through my distro's package manager:

$ java -version
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
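A note on verifying the default heap Shawn mentions: if the JDK monitoring tools are installed, heap and GC behavior can be watched live while Solr runs. A minimal sketch, assuming the JDK's jstat is on the PATH and <solr-pid> stands in for Solr's process id:

  jstat -gcutil <solr-pid> 5000

This prints the utilization of each heap generation plus GC counts and cumulative GC time every 5 seconds, which helps confirm whether the heap actually maxes out under load before committing to an -Xmx value.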
Re: maxWarmingSearchers in Solr 4.
On Thu, Apr 4, 2013 at 10:54 PM, Shawn Heisey s...@elyograg.org wrote:

> You'll want to ensure that your autowarmCount value on Solr's caches is low enough that each commit happens quickly. If it takes 5000 milliseconds to warm the caches when you commit, then you want to be sure that you are committing less often than that, or you'll quickly reach your maxWarmingSearchers config value. If the commits are happening VERY quickly, you may need to set autowarmCount to 0, and possibly disable caches entirely.

I see. This seems to be the opposite of the approach that I was taking.

> I went poking in the code, and it seems that maxWarmingSearchers defaults to Integer.MAX_VALUE. I'm not sure whether this is a bad default or not. It does mean that a pathological setup without maxWarmingSearchers in the config will probably blow up with an OutOfMemory exception, but is that better or worse than commits that don't make new documents searchable? I can see arguments either way.

This is interesting: what you found is that the value in the stock solrconfig.xml file differs from the Solr default value. I think that this is bad practice: a single default should be decided upon, Solr should use this value when nothing is specified in solrconfig.xml, and that _same_value_ should be specified in the stock solrconfig.xml. Is it not a reasonable assumption that this would be the case?

> That was directed more at the other committers. I would argue that either a low number or a relatively high number should be the default, but not MAX_VALUE. The example config should have a commented-out section for maxWarmingSearchers that mentions the default. I'm having the same discussion about maxBooleanClauses on SOLR-4586.

Right. It's possible that this has already been discussed, and that everyone prefers that a badly configured setup will eventually have a spectacular blow-up with OutOfMemory, rather than semi-silently ignoring commits.

> A searcher object contains caches and uses a lot of memory, so having lots of them around will eventually use up the entire heap.

Silently dropping data is by far the worse choice, I agree, especially as a default setting.

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
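For reference, both of the knobs discussed above live in the <query> section of solrconfig.xml. A minimal sketch (the values are illustrative, not recommendations):

  <query>
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <maxWarmingSearchers>2</maxWarmingSearchers>
  </query>

With autowarmCount="0" a new searcher starts cold, so commits become cheap at the cost of slower first queries; maxWarmingSearchers then caps how many searchers may warm concurrently before Solr refuses to open new ones.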
Re: Filtered search term suggestions via Facet Prefixing or NGrams
Hi, can somebody help, please? Maybe you can at least answer parts of my question? I'd expect that somebody at least knows the limitations of faceting with UninvertedField?

Thank you,
Andreas

Andreas Hubold wrote on 04.04.2013 13:30:

> Hi,
>
> we've successfully implemented suggestion of search terms using facet prefixing with Solr 4.0. However, with lots of unique index terms we've encountered performance problems (long running queries) and even exceptions: "Too many values for UnInvertedField faceting on field textbody".
>
> We must provide suggestions based on a prefix entered by the user. The solution should use the terms from an indexed text field. Furthermore the suggestions must be filtered according to some specified filter queries.
>
> Do you have any performance tips for facet prefixing, or know how to avoid the above exception even in the case of many unique terms? What is causing the above exception:
> a) the total number of unique terms in the field, or
> b) the number of unique terms in the field of a single document?
> If b), is there a way to find such documents easily? Do you know how many unique terms can be handled without problems by facet prefixing?
>
> I've read the blog post http://www.searchworkings.org/blog/-/blogs/different-ways-to-make-auto-suggestions-with-solr which describes NGrams as another possible approach to implement suggestions with filtering. I would expect that this approach provides better query performance (at the cost of increased index size). However, I haven't found detailed information on how to implement it. I know how to configure a field for ngrams and how to perform a query using that field, but the results just give me the document, not the matched terms. Or am I expected to use a stored field and inspect its value?
>
> I also found this blog post where the Highlighter is used in combination with ngrams to provide suggestions: http://solr.pl/en/2013/02/25/autocomplete-on-multivalued-fields-using-highlighting/
> Can this be used to get the suggested terms from a document? What about performance? Will such an approach perform better than facet prefixing for large text fields with lots of unique terms?
>
> Any hints appreciated.
> Thank you,
> Andreas
help needed for applying patch to solr I am using
hi all

I am new to Solr and wanted to apply this patch to my Solr. How can I do this? I searched on the net but did not find anything useful. The patch is: https://issues.apache.org/jira/browse/SOLR-2585

I am using Solr 4.1.0 on Tomcat 6 on Red Hat/CentOS.

thanks
regards
rohan
edismax returns far fewer matches than the regular query
I have a simple system. I put the title of web pages into the name field and the content of the web pages into the description field. I want to search both fields and give the name a little more boost.

A search on the name field alone returns records close to hundreds:

http://localhost:8983/solr/select/?q=name:%28coldfusion^2%20cache^1%29&fq=author:[*%20TO%20*]%20AND%20-author:chinmoyp&start=0&rows=10&fl=author,score,%20id

But a search on both fields using edismax with boosts gives just 5 matches:

http://localhost:8983/solr/mindfire/?q=%28%20coldfusion^2%20cache^1%29&defType=edismax&qf=name^1.5%20description^1.0&fq=author:[*%20TO%20*]%20AND%20-author:chinmoyp&start=0&rows=10&fl=author,score,%20id

I am wondering what is wrong, because there are valid results returned by the 1st query which are ignored by edismax. I am on Solr 3.6.
Re: solr spell suggestions help
hi all

I have resolved all the issues except the 4th one. (They were related to the distance measure I was using, which was by default Levenshtein: very basic and not good. Now I am using the JaroWinkler distance measure, which is better and now gives exactly the results I was looking for.) The 4th one I think is a Solr issue, and they have also released a patch for it: https://issues.apache.org/jira/browse/SOLR-2585
I am applying this patch now and will let you know if it works correctly.

thanks
regards
Rohan

On Fri, Apr 5, 2013 at 4:44 PM, Rohan Thakur rohan.i...@gmail.com wrote:

> hi all
>
> I had some issues with Solr spell suggestions.
>
> 1) first of all, are index-based spell suggestions better in any way than the direct spell suggestions that Solr 4.1 provides?
>
> 2) is there a way I can get suggestions for words when providing only a short prefix of the word? Like when I query "sam" I should get samsung as one of the suggestions.
>
> 3) why am I not getting suggestions for words that have more than a 2-character difference? If I query for wirlpool (8 characters) I get the suggestion whirlpool (9 characters, the correct spelling), but when I query for wirlpol (7 characters) it says this is a false spelling yet does not show any suggestions. Likewise, a search for pansonic (8 chars) suggests panasonic (9 chars), but removing one more character, i.e. searching for panonic (7 chars), returns no suggestions. How can I correct this? Even a search for ipo does not return ipod as a suggestion.
>
> 4) one more thing I want to clear up: when I search for "microwave ovan" it does not flag any misspelling even though ovan is wrong; it returns the results for microwave, saying the query is correct. This is the case whenever one term in the query is correct while the others are incorrect: it does not point out the wrongly spelled term but returns results for the correct word. How can I correct this? Similarly, when I query "microvave oven" it shows the results for oven, saying the query is correct.
>
> 5) one more case: when I query plntronies (correct word: plantronics) it does not return any suggestion, but when I query plantronies it returns plantronics as a suggestion. Why is that happening?
my schema.xml is:

<fieldType name="tSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\\\[\]\(\)\-\,\/\+" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="20"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="20"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="spell" type="tSpell" indexed="true" stored="true" />
<copyField source="title" dest="spell" />

my solrconfig.xml is:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Multiple Spell Checkers can be declared and used by this component -->
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <!-- Optional, it is required when more than one spellchecker is configured. Select non-default name with spellcheck.dictionary in request handler. -->
    <str name="name">default</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <!-- Load tokens from the following field for spell checking, analyzer for the field's type as defined in schema.xml are used -->
    <str name="field">spell</str>
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.3</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">1</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result. -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of
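For anyone following the fix Rohan describes above: switching the distance measure away from the internal Levenshtein is a one-line change on the spellchecker. A sketch, using the distance class that ships with Lucene 4.x:

  <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>

replacing the <str name="distanceMeasure">internal</str> line in the config quoted above.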
solr 4.2.1 still has problems with index version and index generation
I know there was some effort to fix this, but I must report that Solr 4.2.1 still has problems with index version and index generation numbering in master/slave mode with replication.

The test was:
1. installed Solr 4.2.1 on master and built the index from scratch
2. installed Solr 4.2.1 on slave with an empty index
3. replicated master to slave; everything was fine, both in sync
4. deleted the index on master with *:* and built the index from scratch
5. replicated the index from master to slave

RESULT: the slave has a different (higher) version number and is one generation ahead :-(

Regards
Bernd
Re: help needed for applying patch to solr I am using
hi all

I just checked: this issue was already incorporated in Solr 4.0-ALPHA, and I am using Solr 4.1.0, so it must be in this as well. But still, why am I not getting suggestions for a phrase like microvave oven? It states the query to be correct and returns results based on the word oven. Why is this happening? Yet when I query it as "microvave oven" (in quotes) it provides a corrected suggestion. How do I handle this? Anyone, please help...

thanks
regards
Rohan

On Mon, Apr 8, 2013 at 1:18 PM, Rohan Thakur rohan.i...@gmail.com wrote:

> hi all
>
> I am new to Solr and wanted to apply this patch to my Solr. How can I do this? I searched on the net but did not find anything useful. The patch is: https://issues.apache.org/jira/browse/SOLR-2585
>
> I am using Solr 4.1.0 on Tomcat 6 on Red Hat/CentOS.
>
> thanks
> regards
> rohan
SOLR-4581
Hello,

I created https://issues.apache.org/jira/browse/SOLR-4581 on 14.03.2013. Can anyone help me out with this?

Thank You.

Alexander Buhr
Software Engineer

ePages GmbH
Pilatuspool 2
20355 Hamburg
Germany
+49-40-350 188-266 phone
+49-40-350 188-222 fax
a.b...@epages.com
www.epages.com
www.epages.com/blog
www.epages.com/twitter
www.epages.com/facebook

e-commerce. now plug & play.

Managing Director: Wilfried Beeck
Commercial register: Amtsgericht Hamburg HRB 120861
Registered office: Pilatuspool 2, 20355 Hamburg
Tax number: 48/718/02195
VAT ID: DE 282 947 700
Re: Understanding the Solr Admin page
Dotan,

On Monday, April 8, 2013 at 8:21 AM, Dotan Cohen wrote:

> I notice that some of the Args presented are in black text, and others in grey. Why are they presented differently? Where would I have found this information in the fine manual?

iirc there is one ticket open which is related to this. Initially that was not meant to highlight specific values .. just a simple even/odd style to make it easier to read the different lines - at least that is what i thought it would be. Looks like you're the second one being confused by them, so we'll take it out, i'd say?

> When I start Solr with nohup, the resulting nohup.out file is _huge_. How might I start Solr such that INFO is not output, but only WARNINGs and SEVEREs are?

Since you're not telling us how you get it started .. it's just a guess :)

For starters: http://wiki.apache.org/solr/LoggingInDefaultJettySetup
Otherwise, the more advanced one: http://wiki.apache.org/solr/SolrLogging

HTH
Stefan
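As a concrete example of the first wiki page Stefan links: in the stock Solr 4.x example, Jetty routes Solr's logging through java.util.logging, so a properties file along these lines (a sketch; the file name and location are up to you) cuts the output down to warnings and worse:

  handlers = java.util.logging.ConsoleHandler
  .level = WARNING
  java.util.logging.ConsoleHandler.level = WARNING

Solr is then started pointing at it:

  java -Djava.util.logging.config.file=etc/logging.properties -jar start.jar

With that in place, INFO-level entries, including the per-query log lines, no longer land in nohup.out.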
Prediction About Index Sizes of Solr
This may not be a well-detailed question but I will try to make it clear.

I am crawling web pages and will index them in SolrCloud 4.2. What I want to predict is the index size. I will have approximately 2 billion web pages, and I estimate each of them will be about 100 KB. I know that it depends on stored fields, stop words, etc. If you want to ask about details of my question I can give more explanation, but there should be some analysis to help me predict what the index size will be.

On the other hand, my other important question is how SolrCloud makes replicas of indexes: can I choose how many replicas there will be? Because I should multiply the total index size by the number of replicas.

Here I found an article related to my analysis: http://juanggrande.wordpress.com/2010/12/20/solr-index-size-analysis/

I know this question may not be detailed, but any ideas about it are welcome.
Re: Prediction About Index Sizes of Solr
Hello!

Let me answer the first part of your question. Please have a look at https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls

It should help you make an estimation of your index size.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> I am crawling web pages and will index them in SolrCloud 4.2. What I want to predict is the index size. I will have approximately 2 billion web pages, and I estimate each of them will be about 100 KB. [...]
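A rough cross-check to run alongside that spreadsheet: 2,000,000,000 pages at ~100 KB each is about 200 TB of raw content. Index-to-raw ratios vary hugely with what is stored and indexed, but even if the index (excluding stored content) came out at only 10-25% of the raw text - a common rule of thumb, not a guarantee - that is still on the order of 20-50 TB per copy. Whatever figure the estimator gives should then be multiplied by the number of copies: the replicationFactor parameter on the Collections API CREATE call controls how many replicas of each shard SolrCloud keeps.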
Re: help needed for applying patch to solr I am using
hi all

I think I have to pass the query in inverted commas; then it returns the correct results, as I needed.

thanks
regards
Rohan

On Mon, Apr 8, 2013 at 1:50 PM, Rohan Thakur rohan.i...@gmail.com wrote:

> hi all
>
> I just checked: this issue was already incorporated in Solr 4.0-ALPHA, and I am using Solr 4.1.0, so it must be in this as well. But still, why am I not getting suggestions for a phrase like microvave oven? [...]
Moving from SOLR3.6 to SOLR4.0 - Last remaining warnings
Hi. I have recently made the move from Solr 3.6 to Solr 4.0 and so far everything seems super, apart from the fact that I had a lot of warnings in my logging section on the Solr admin panel. I have tried to work through as many as possible, but there are a few that I am not able to correct. This is the hot list:

===
11:56:41 WARNING IndexSchema uniqueKey is not stored - distributed search and MoreLikeThis will not work
11:56:41 WARNING SolrCore [collection1] Solr index directory '/opt/solr/example/solr/collection1/data/index' doesn't exist. Creating new index...
11:56:42 WARNING UpdateRequestHandler Using deprecated class: XmlUpdateRequestHandler -- replace with UpdateRequestHandler
11:56:42 WARNING UpdateRequestHandler Using deprecated class: BinaryUpdateRequestHandler -- replace with UpdateRequestHandler
===

Would anyone please mind explaining to me where I am going wrong to get these warnings? It looks like content in my configs is causing them, but for the most part my configurations are pretty standard. Any help or advice would be much appreciated.

James
Re: Prediction About Index Sizes of Solr
Interesting bit, thanks Rafał!

On Mon, Apr 8, 2013 at 12:54 PM, Rafał Kuć r@solr.pl wrote:

> Hello!
>
> Let me answer the first part of your question. Please have a look at https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls
>
> It should help you make an estimation of your index size. [...]
Re: solr 4.2.1 still has problems with index version and index generation
I'm on 4.1 and I have a similar problem. Except for the version number everything else seems to be fine. Is that what other people are seeing?
Numeric fields and payload
Hi,

is it possible to store (text) payload on numeric fields (class solr.TrieDoubleField)? My goal is to store measurement units for numeric features - e.g. '1.5 cm' - and to use faceted search with these fields. But the field type doesn't allow analyzers to add the payload data. I want to avoid database access to load the units.

I'm using Solr 4.2.

Regards,
Holger
Re: Numeric fields and payload
Sounds like another new field-mutating update processor is needed - add payload.

-- Jack Krupansky

-----Original Message-----
From: Holger Rieß
Sent: Monday, April 08, 2013 8:27 AM
To: solr-user@lucene.apache.org
Subject: Numeric fields and payload

> Hi,
>
> is it possible to store (text) payload on numeric fields (class solr.TrieDoubleField)? My goal is to store measurement units for numeric features - e.g. '1.5 cm' - and to use faceted search with these fields. But the field type doesn't allow analyzers to add the payload data. I want to avoid database access to load the units.
>
> I'm using Solr 4.2.
>
> Regards,
> Holger
FileBasedSpellchecker with Frequency comparator
Hi,

I want to configure a file-based spellchecker for my application. I am taking the words from a spellcheck.txt file and building the spellcheckerFile directory index. It works fine, but it does not take the frequency of the words into consideration when giving spelling suggestions. I have duplicated the terms that are important in the spellcheck.txt file, repeating them as many times as needed, since FileBasedSpellChecker cannot take a numeric frequency. But it still does not reflect in the scoring. Is that the way to go about it? Can someone please explain clearly how Solr supports building a file-based spellcheck index from a file along with frequency? Is it doable by configuring solrconfig.xml, or do we need to write a spellcheck client explicitly?
Number of segments
I'm running Solr 4.0. I'm noticing my segment count is staying in the 30+ range, even though I have these settings:

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="segmentsPerTier">10</int>
    <int name="maxMergeAtOnce">10</int>
    <int name="maxMergeAtOnceExplicit">10</int>
  </mergePolicy>
  <useCompoundFile>false</useCompoundFile>
</indexConfig>

Can anyone give me some advice on what I should change or check?
Indexed data not searchable
Hello,

I'm very new to Solr and I have come to a point I cannot explain by myself, so I need your help.

I have indexed a huge amount of XML files with a shell script:

function solringest_rec {
  for SRCFILE in $(find $1 -type f); do
    #DESTFILE=$URL${SRCFILE/$1/}
    echo "ingest $SRCFILE"
    curl $URL -H "Content-type: text/xml" --data-binary "@$SRCFILE"
  done
}

The response I get every time is:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">116</int></lst>
</response>

Because of this I think that everything should be fine, but the queries don't work. For all operations other than the post operation I use the stuff from the example folder. Maybe I have to configure something in schema.xml or solrconfig.xml?

Hope you can help me!

Kind regards,
Max
Re: Number of segments
On Mon, Apr 8, 2013, at 02:35 PM, Michael Long wrote:

> I'm running Solr 4.0. I'm noticing my segment count is staying in the 30+ range, even though I have these settings: [...]
>
> Can anyone give me some advice on what I should change or check?

How many documents do you have? How big are the files on disk?

Note it says segments per tier; you may have multiple tiers at play, meaning you can have more than ten segments.

There are also, I believe, properties that define the maximum size on disk for a segment and the like that can prevent merges from happening.

Upayavira
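The size cap Upayavira refers to is TieredMergePolicy's maxMergedSegmentMB, which defaults to 5 GB; segments at or near the cap stop being candidates for further merging, so a multi-gigabyte index can quite legitimately sit at 30+ segments. A sketch of raising it inside the same indexConfig block (the 20 GB value is illustrative only):

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="segmentsPerTier">10</int>
    <int name="maxMergeAtOnce">10</int>
    <double name="maxMergedSegmentMB">20480</double>
  </mergePolicy>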
Re: Indexed data not searchable
What is the structure of your content? Is it formatted in the same XML structure as the example data? What URL are you posting to?

Upayavira

On Mon, Apr 8, 2013, at 02:08 PM, Max Bo wrote:

> Hello,
>
> I'm very new to Solr and I have come to a point I cannot explain by myself, so I need your help.
>
> I have indexed a huge amount of XML files with a shell script. [...]
Re: Indexed data not searchable
On 8 April 2013 18:38, Max Bo maximilian.brod...@gmail.com wrote:

> Hello,
>
> I'm very new to Solr and I have come to a point I cannot explain by myself, so I need your help.
>
> I have indexed a huge amount of XML files with a shell script.
[...]

For posting XML files to Solr directly with curl, the XML files need to be in a particular format, and you need to commit at least once at the end of the indexing. Please see http://wiki.apache.org/solr/UpdateXmlMessages

If you are following the exact commands there, and using XML files from the example/ directory, things should just work.

Regards,
Gora
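For reference, the shape that the UpdateXmlMessages page describes is roughly the following (the field names here are only an example and must match fields declared in schema.xml):

  <add>
    <doc>
      <field name="id">doc-1</field>
      <field name="author">Some Author</field>
    </doc>
  </add>

followed by a commit, either as a separate <commit/> message or via commit=true on the update URL, which Max is already passing.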
Re: Number of segments
On 04/08/2013 09:41 AM, Upayavira wrote:

> How many documents do you have? How big are the files on disk?

2,795,601, and the index dir is 50G.

> Note it says segments per tier; you may have multiple tiers at play, meaning you can have more than ten segments.

How do I determine how many tiers it has?

> There are also, I believe, properties that define the maximum size on disk for a segment and the like that can prevent merges from happening.

I just have the defaults... nothing explicitly set.
Re: Indexed data not searchable
Thanks for your help.

The URL I'm posting to is: http://localhost:8983/solr/update?commit=true

The XML files I've added contain fields like "author", so I thought they should be searchable, since that field is declared as indexed in the example schema.
Sub field indexing
Hello All,

I'd like to be able to index documents containing criteria and values. For example, I have a product A, and this product is compatible with a product B versions 1, 5, 6.

My actual schema is like this:

Name
Price
reference
features
text

How can I index values like:

compatible_engine : [productB, ProductZ]
version_compatible : [1,5,6], [45,85,96]

And after indexing, how do I search into them?

Best regards
Eric
Re: Indexed data not searchable
Are you sure your XML is formatted according to the SolrXML rules? See:
http://wiki.apache.org/solr/UpdateXmlMessages

I have to ask, because sometimes people send raw XML to Solr, not realizing that Solr accepts a particular format of XML.

-- Jack Krupansky

-----Original Message-----
From: Max Bo
Sent: Monday, April 08, 2013 9:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexed data not searchable

> Thanks for your help.
>
> The URL I'm posting to is: http://localhost:8983/solr/update?commit=true
>
> The XML files I've added contain fields like "author", so I thought they should be searchable, since that field is declared as indexed in the example schema.
Re: Indexed data not searchable
On 8 April 2013 19:26, Max Bo maximilian.brod...@gmail.com wrote:

> Thanks for your help.
>
> The URL I'm posting to is: http://localhost:8983/solr/update?commit=true
>
> The XML files I've added contain fields like "author", so I thought they should be searchable, since that field is declared as indexed in the example schema.

Please include an example of your .xml file and of Solr's schema.xml. It is difficult to keep guessing in the dark.

Regards,
Gora
RE: FileBasedSpellchecker with Frequency comparator
I do not believe that FileBasedSpellChecker takes frequency into account at all. That would be a nice enhancement, though.

To get what you want, you could index one or more documents containing the words in your file, then create a spellchecker using IndexBasedSpellChecker or DirectSolrSpellChecker. I don't remember off-hand how the spellcheckers count document frequency, i.e. whether or not multiple occurrences in the same document count (I think they do). If so, you could accomplish this with one dummy spellcheck-building document and one big indexed field. You could even create an IndexBasedSpellChecker dictionary and then delete the dummy document(s). (But be sure to lock down spellcheck.build, possibly by putting it in the invariants section of all your request handlers, so that you don't accidentally overlay it.)

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: ilay raja [mailto:ilay@gmail.com]
Sent: Monday, April 08, 2013 8:02 AM
To: solr-user@lucene.apache.org; solr-...@lucene.apache.org
Subject: FileBasedSpellchecker with Frequency comparator

> Hi,
>
> I want to configure a file-based spellchecker for my application. I am taking the words from a spellcheck.txt file and building the spellcheckerFile directory index. [...]
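A sketch of the index-based route James describes, with illustrative names (spell_src is a hypothetical indexed field holding the dictionary words from the file):

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="field">spell_src</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
    </lst>
  </searchComponent>

The sidecar dictionary is then built once with spellcheck.build=true after the dummy documents are indexed, which matches James's warning to lock that parameter down afterwards.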
Re: Indexed data not searchable
hi,

I use the data importer. The actual entity contains this:

<field column="id_product" name="id" />
<field column="quantity" name="inStock" />
<field column="reference" name="ref" />
<field column="supplier" name="brand" />
<field column="manufacturer" name="brand" />
<field column="name" name="brand" />
<field column="comptabible_model" regex="Piéce détachée pour ([\w 0-9éèêîûô]+) Modèle" sourceColName="description_short" />
<field column="version_model" regex="Modèle:([0-9a-zA-Zéèêîûô-]+),?" sourceColName="description_short" />

Data sample:

Piéce détachée pour Skimmer COFIES Modèle:Premium-Design-Omega, Zipper5
Piéce détachée pour Régulateur de niveau modèle 3150 Modèle:3150 depuis 2003

Ideal result:

name = Couvercle SK siglé - HAYWARD
manufacturer = HAYWARD
compatibility =
  [Skimmer COFIES] - [Premium-Design-Omega, Zipper5]
  [Régulateur de niveau modèle 3150] - [3150 depuis 2003]

Then I wish to be able to get all results for all products with HAYWARD as manufacturer, then retrieve the list of all compatible products, and in the end the list of available models.

schema.xml contains:

<field name="ref" type="string" indexed="true" stored="true" omitNorms="true" multiValued="false"/>
<field name="name" type="text_fr" indexed="true" stored="true" />
<field name="cat" type="text_fr" indexed="true" stored="true" multiValued="true" />
<field name="brand" type="text_fr" indexed="true" stored="true" multiValued="true" />
<field name="features" type="text_fr" indexed="true" stored="true" multiValued="true" />

where

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <!-- removes l', etc -->
    <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.StandardFilterFactory" />
    <filter class="solr.FrenchLightStemFilterFactory" />
    <filter class="solr.FrenchMinimalStemFilterFactory" />
    <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="French"/> -->
  </analyzer>
</fieldType>

<fieldType name="text_html_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.StandardTokenizerFactory" />
    <!-- removes l', etc -->
    <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball" enablePositionIncrements="true" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.FrenchLightStemFilterFactory" />
    <filter class="solr.FrenchMinimalStemFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I do not see how to organize this specification correctly with Solr.

regards
eric

On 08/04/2013 16:36, Gora Mohanty wrote:

> Please include an example of your .xml file and of Solr's schema.xml. It is difficult to keep guessing in the dark. [...]
Re: solr 4.2.1 still has problems with index version and index generation
This is the jira issue that addresses this: https://issues.apache.org/jira/browse/SOLR-4661

I'll try to find some time today to test out the patch and see how things look.

On Mon, Apr 8, 2013 at 7:18 AM, Tom Gullo springmeth...@gmail.com wrote:

> I'm on 4.1 and I have a similar problem. Except for the version number everything else seems to be fine. Is that what other people are seeing?

--
Joel Bernstein
Professional Services
LucidWorks
Re: Indexed data not searchable
On 8 April 2013 21:35, It-forum it-fo...@meseo.fr wrote: hi I use dataimporter [...] Please do not hijack threads. Instead, start a new one for your questions, or follow up in a thread that you had started. Here is why this is bad practice: http://people.apache.org/~hossman/#threadhijack Regards, Gora
Re: Moving from SOLR3.6 to SOLR4.0 - Last remaining warnings
Here's my guess.

On 4/8/13 4:04 AM, Spadez wrote:

> 11:56:41 WARNING IndexSchema uniqueKey is not stored - distributed search and MoreLikeThis will not work

I think the warning is saying the field you specify in uniqueKey does not specify stored="true". Check your schema.xml.

> 11:56:41 WARNING SolrCore [collection1] Solr index directory '/opt/solr/example/solr/collection1/data/index' doesn't exist. Creating new index...

This is normal.

> 11:56:42 WARNING UpdateRequestHandler Using deprecated class: XmlUpdateRequestHandler -- replace with UpdateRequestHandler
> 11:56:42 WARNING UpdateRequestHandler Using deprecated class: BinaryUpdateRequestHandler -- replace with UpdateRequestHandler

UpdateRequestHandler now takes care of XML, JSON, CSV and binary document formats by itself. No specialized subclasses like XmlUpdateRequestHandler should be used. All these deprecated subclasses add is output of this deprecation warning. But you can ignore these warnings for this version of Solr.

--
Kuro Kurosaka
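Concretely, the first warning goes away once the uniqueKey field carries stored="true" in schema.xml; with the stock example's id field that looks like:

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  ...
  <uniqueKey>id</uniqueKey>

(The exact field name and type are whatever the schema already declares; only the stored attribute needs to change.)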
Re: Moving from SOLR3.6 to SOLR4.0 - Last remaining warnings
> replace with UpdateRequestHandler

Just compare your solrconfig to the new one and consider updating yours and using the newer Solr update API that automatically uses the content type to internally dispatch to the proper update handler. But it's just a warning, for now.

-- Jack Krupansky

-----Original Message-----
From: Kuro Kurosaka
Sent: Monday, April 08, 2013 12:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Moving from SOLR3.6 to SOLR4.0 - Last remaining warnings

> Here's my guess. [...]
Re: Empty Solr 4.2.1 can not create Collection
The steps that I use to set up the collection are slightly different:

1) Start zk and upconfig the config set. Your approach is the same.

2) Start appservers with Solr zkHost set to the zk started in step 1.

3) Use a core admin command to spin up a new core and collection:

http://app01/solr/admin/cores?action=CREATE&name=storage-core&collection=storage&numShards=1&collection.configName=storage-conf&shard=shard1

This will spin up the new collection and initial core. I'm not using a replication factor because the following commands manually bind the replicas.

4) Spin up a replica with a core admin command:

http://app02/solr/admin/cores?action=CREATE&name=storage-core&collection=storage&shard=shard1

5) Same command as above on the 3rd server to spin up another replica.

This will spin up a new core and bind it to shard1 of the storage collection.

On Mon, Apr 8, 2013 at 9:34 AM, A.Eibner a_eib...@yahoo.de wrote:

> Hi,
>
> I have a problem with setting up my solr cloud environment (on three machines). If I want to create my collections from scratch I do the following:
>
> *) Start ZooKeeper on all machines.
>
> *) Upload the configuration (on app02) for the collection via the following command:
> zkcli.sh -cmd upconfig --zkhost app01:4181,app02:4181,app03:4181 --confdir config/solr/storage/conf/ --confname storage-conf
>
> *) Link the configuration (on app02) via the following command:
> zkcli.sh -cmd linkconfig --collection storage --confname storage-conf --zkhost app01:4181,app02:4181,app03:4181
>
> *) Start Tomcats (containing Solr) on app02, app03.
>
> *) Create the collection via:
> http://app03/solr/admin/collections?action=CREATE&name=storage&numShards=1&replicationFactor=2&collection.configName=storage-conf
>
> This creates the replication of the shard on app02 and app03, but neither of them is marked as leader; both are marked as DOWN. And afterwards I can not access the collection.
> In the browser I get:
> SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:
>
> In the log files the following error is present:
>
> SEVERE: Error from shard: app02:9985/solr
> org.apache.solr.common.SolrException: Error CREATEing SolrCore 'storage_shard1_replica1':
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:404)
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>         at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:172)
>         at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:135)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.solr.common.cloud.ZooKeeperException:
>         at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:922)
>         at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
>         at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
>         at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:479)
>         ... 19 more
> Caused by: org.apache.solr.common.SolrException: Error getting leader from zk for shard shard1
>         at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:864)
>         at org.apache.solr.cloud.ZkController.register(ZkController.java:776)
>         at org.apache.solr.cloud.ZkController.register(ZkController.java:727)
>         at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
>         ... 22 more
> Caused by: java.lang.InterruptedException: sleep interrupted
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:905)
>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:875)
>         at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:839)
>         ... 25 more
>
> I have
Re: Sub field indexing
Solr does not support querying nested data structures. If at query time you know the product you want to check compatibility for, you can use dynamic fields. Thus, if you want to find products compatible with productB, you could index:

id: productA
compatible_productB: 1, 5, 6
compatible_productZ: 45, 85, 96

Then you can search for:

q=compatible_productB:5

This will find you all documents that are compatible with productB version 5.

Upayavira

On Mon, Apr 8, 2013, at 03:04 PM, It-forum wrote:

> Hello All,
>
> I'd like to be able to index documents containing criteria and values. For example, I have a product A, and this product is compatible with a product B versions 1, 5, 6. [...]
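The pattern Upayavira shows relies on a dynamic field declaration in schema.xml; a sketch, assuming integer version numbers (the * part of the name becomes the product, e.g. compatible_productB):

  <dynamicField name="compatible_*" type="int" indexed="true" stored="true" multiValued="true"/>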
Re: Moving from SOLR3.6 to SOLR4.0 - Last remaining warnings
On Apr 8, 2013, at 19:09, Jack Krupansky j...@basetechnology.com wrote:

> Just compare your solrconfig to the new one and consider updating yours and using the newer Solr update API that automatically uses the content type to internally dispatch to the proper update handler. But it's just a warning, for now.

It would probably be a good idea to split solrconfig, so that settings specific to a Solr version would be separate from settings that mainly (only?) depend on the schema... I was quite surprised when I had to re-do my modifications to solrconfig.xml when moving from 4.0 to 4.1.
stupid collection tricks
Hi,

I've been shuffling things around in our Solr Cloud for the past few weeks, and now some cruft has accrued, probably from not having done things in the proper way.

Now I have two (obsolete) collections I want to delete, but none of the hosts that are marked as having hosted them are around anymore. Running a DELETE from the Collections API doesn't seem to do anything, probably because no hosts exist that can be considered leaders for each of the shards of those two collections. Additionally, I have a few shards that have the names of former hosts registered, and I'd like to remove them.

Is there an easy way to do this? Do I need to manually edit clusterstate.json somehow? I should mention that everything's running 4.2.1 now.

Thanks,

Michael Della Bitta

Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
www.appinions.com
Where Influence Isn't a Game
RE: Sub field indexing
It-forum [it-fo...@meseo.fr] wrote:

> In exemple I have a product A this product is compatible with a Product B version 1, 5, 6.
> How can I index values like :
> compatible_engine : [productB,ProductZ]
> version_compatible : [1,5,6],[45,85,96]

Index them as

compatible_engine: productB/1
compatible_engine: productB/5
compatible_engine: productB/6
compatible_engine: productZ/45
compatible_engine: productZ/85
compatible_engine: productZ/96

in a StrField (so that it is not tokenized).

> After indexing how to search into ?

compatible_engine:productZ/85
to get all products compatible with productZ, version 85.

compatible_engine:productZ*
to get all products compatible with any version of productZ.

- Toke Eskildsen
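The field Toke describes could be declared along these lines (in the stock schema, the string type maps to solr.StrField, so the productB/1 tokens are kept intact rather than split):

  <field name="compatible_engine" type="string" indexed="true" stored="true" multiValued="true"/>

The trailing-wildcard query compatible_engine:productZ* works precisely because the values are untokenized.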
Re: Empty Solr 4.2.1 can not create Collection
The scenario above needs to have collection1 removed from solr.xml to work. This, I believe, is the "empty Solr" scenario that you are talking about. If you don't remove collection1 from solr.xml on all the Solr instances, they will get tripped up on collection1 during these steps.

If you start up with collection1 in solr.xml, it's best to start the initial Solr instance with the bootstrap_conf parameter so Solr can properly create this collection.

On Mon, Apr 8, 2013 at 1:12 PM, Joel Bernstein joels...@gmail.com wrote:

> The steps that I use to set up the collection are slightly different:
>
> 1) Start zk and upconfig the config set. Your approach is the same.
>
> 2) Start appservers with Solr zkHost set to the zk started in step 1.
>
> 3) Use a core admin command to spin up a new core and collection:
>
> http://app01/solr/admin/cores?action=CREATE&name=storage-core&collection=storage&numShards=1&collection.configName=storage-conf&shard=shard1
>
> This will spin up the new collection and initial core. I'm not using a replication factor because the following commands manually bind the replicas.
>
> 4) Spin up a replica with a core admin command:
>
> http://app02/solr/admin/cores?action=CREATE&name=storage-core&collection=storage&shard=shard1
>
> 5) Same command as above on the 3rd server to spin up another replica.
>
> [...]
Re: solr 4.2.1 still has problems with index version and index generation
: I know there was some effort to fix this, but I must report
: that Solr 4.2.1 still has problems with index version and
: index generation numbering in master/slave mode with replication.
...
: RESULT: the slave has a different (higher) version number and is one generation ahead :-(

Can you please provide more details...

* are you using autocommit? with what settings?
* are you using openSearcher=false in any of your commits?
* where exactly are you looking when you see the master/slave out of sync?
* are you observing any actual problems, or just seeing that the gen/version are reported as different?

As Joel mentioned, there is an open Jira related purely to the *display* of information about gen/version between master and slave, because in many cases the searcher in use on the master may refer to an older commit point. But it doesn't mean there is any actual problem in replication -- the slave is still fetching/searching the latest commit from the master as intended.

https://issues.apache.org/jira/browse/SOLR-4661

-Hoss
Re: Slow qTime for distributed search
After taking a look at what I wrote earlier, I will try to rephrase more clearly. It seems that sharding my collection into many shards slowed things down unreasonably, and I'm trying to investigate why. First, I created collection1 - a 4 shards * replicationFactor=1 collection on 2 servers. Second, I created collection2 - a 48 shards * replicationFactor=2 collection on 24 servers, keeping the same config and the same number of documents per shard. Observations showed the following: 1. Total qTime for the same query set is 5 times higher in collection2 (150ms - 700ms) 2. Adding the *shards.info=true* param to queries against collection2 shows that each shard is much slower than each shard was in collection1 (about 4 times slower) 3. Querying only specific shards in collection2 (by adding the shards=shard1,shard2...shard12 param) gave me a much better qTime per shard (only 2 times higher than in collection1) 4. I have a low qps rate, thus I don't suspect the replication factor of being the major cause of this. 5. The avg. CPU load on the servers during querying was much higher in collection1 than in collection2, and I didn't catch any other bottleneck. Q: 1. Why does the number of shards affect the qTime of each shard? 2. How can I bring the qTime of each shard back down? Thanks, Manu
Re: Slow qTime for distributed search
On 4/8/2013 12:19 PM, Manuel Le Normand wrote: It seems that sharding my collection into many shards slowed things down unreasonably, and I'm trying to investigate why. First, I created collection1 - a 4 shards * replicationFactor=1 collection on 2 servers. Second, I created collection2 - a 48 shards * replicationFactor=2 collection on 24 servers, keeping the same config and the same number of documents per shard. The primary reason to use shards is index size - when your index is so big that a single index cannot give you reasonable performance. There are also sometimes performance gains when you break a smaller index into shards, but there is a limit. Going from 2 shards to 3 shards will have more of an impact than going from 8 shards to 9 shards. At some point, adding shards makes things slower, not faster, because of the extra work required for combining multiple queries into one result response. There is no reasonable way to predict when that will happen. Observations showed the following: 1. Total qTime for the same query set is 5 times higher in collection2 (150ms - 700ms) 2. Adding the *shards.info=true* param to queries against collection2 shows that each shard is much slower than each shard was in collection1 (about 4 times slower) 3. Querying only specific shards in collection2 (by adding the shards=shard1,shard2...shard12 param) gave me a much better qTime per shard (only 2 times higher than in collection1) 4. I have a low qps rate, thus I don't suspect the replication factor of being the major cause of this. 5. The avg. CPU load on the servers during querying was much higher in collection1 than in collection2, and I didn't catch any other bottleneck. A distributed query actually consists of up to two queries per shard. The first query just requests the uniqueKey field, not the entire document. If you are sorting the results, then the sort field(s) are also requested; otherwise the only additional information requested is the relevance score. The results are compiled into a set of unique keys, then a second query is sent to the proper shards requesting specific documents. Q: 1. Why does the number of shards affect the qTime of each shard? 2. How can I bring the qTime of each shard back down? With more shards, it takes longer for the first phase to compile the results, so the second phase (document retrieval) gets delayed, and the QTime goes up. One way to reduce the total time is to reduce the number of shards. You haven't said anything about how complex your queries are, your index size(s), or how much RAM you have on each server and how it is allocated. Can you provide this information? Getting good performance out of Solr requires plenty of RAM for your OS disk cache. Query times of 150 to 700 milliseconds seem very high, which could be due to query complexity or a lack of server resources (especially RAM), or possibly both. Thanks, Shawn
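One experiment that separates fan-out/merge cost from raw per-core cost (core names and host below are illustrative; adjust to your deployment): query a single core directly with distrib=false, which skips the distributed phases entirely, and compare against the full distributed query with shards.info=true:

http://app01:8983/solr/collection2_shard1_replica1/select?q=test&distrib=false

http://app01:8983/solr/collection2/select?q=test&shards.info=true

If the distrib=false time is low but the per-shard times reported by shards.info are high, the overhead is in the distributed request handling rather than in the cores themselves.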
Solr language-dependent sort
Hi, I found that in Solr we need to define a special fieldType for each language (http://wiki.apache.org/solr/UnicodeCollation), then point a field to this type. But in our application one field (like 'title') can be used by various users for their own languages (user1 uses it for English, user2 uses it for Japanese, ...), so it is difficult for us even to use dynamic fields. We would prefer to pass in a parameter like language='en' at run time, so that the Solr API could use this parameter when calling the Lucene API to sort the field. This approach would be much more flexible (we programmed this way when using Lucene directly). Thanks very much for your help, Lisheng
Re: Score after boost before
: I am using edismax and boosting certain fields using bq during query time. : : I would like to compare the effect of the boost side by side with the original score : without the boost. Is there any way I can get the original score without boosting? Using functions and DocTransformers, it's possible to get the numeric score of any arbitrary query as a pseudo-field. If you are using e/dismax and you would like to see the score a document would have gotten w/o the bq or boost boosting, you need to specify the query twice -- once for the main query with the boost, and once as part of the field list w/o the boost. Using LocalParams in which you override specific params (like bq) can make this a bit easier to express... http://localhost:8983/solr/select?q=video&qf=text+name^2&defType=edismax&bq=name:ati^200&fl=id,name,score,no_boost_score:query%28$alt%29&alt={!edismax%20bq=%27%27%20v=$q}&debug=true&debug.explain.structured=true -Hoss
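For readability, here are the same parameters with the URL escaping decoded (%28/%29 are parentheses, %27 is a single quote, %20 is a space):

q=video
qf=text name^2
defType=edismax
bq=name:ati^200
fl=id,name,score,no_boost_score:query($alt)
alt={!edismax bq='' v=$q}
debug=true
debug.explain.structured=true

The no_boost_score pseudo-field evaluates the $alt query -- the same edismax query with bq overridden to empty -- so each document comes back with both its boosted and unboosted score.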
Re: Solr metrics in Codahale metrics and Graphite?
That approach sounds great. --wunder On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote: I've been thinking about how to improve this reporting, especially now that metrics-3 (which removes all of the funky thread issues we ran into last time I tried to add it to Solr) is close to release. I think we could go about it as follows: * refactor the existing JMX reporting to use metrics-3. This would mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and adding a JmxReporter, keeping the existing config logic to determine which JMX server to use. PluginInfoHandler and SolrMBeanInfoHandler translate the metrics-3 data back into SolrMBean format to keep the reporting backwards-compatible. This seems like a lot of work for no visible benefit, but… * we can then add the ability to define other metrics reporters in solrconfig.xml. There are already reporters for Ganglia and Graphite - you just add them to the Solr lib/ directory, configure them in solrconfig, and voila - Solr can be monitored using the same devops tools you use to monitor everything else. Does this sound sane? Alan Woodward www.flax.co.uk On 6 Apr 2013, at 20:49, Walter Underwood wrote: Wow, that really doesn't help at all, since these seem to only be reported in the stats page. I don't need another non-standard app-specific set of metrics, especially one that needs polling. I need metrics delivered to the common system that we use for all our servers. This is also why SPM is not useful for us, sorry Otis. Also, there is no time period on these stats. How do you graph the 95th percentile? I know there was a lot of work on these, but they seem really useless to me. I'm picky about metrics, working at Netflix does that to you. wunder On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote: In the Jira, but not in the docs. It would be nice to have VM stats like GC, too, so we can have common monitoring and alerting on all our services. wunder On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote: It's there! :) http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood wun...@wunderwood.org wrote: That sounds great. I'll check out the bug, I didn't see anything in the docs about this. And if I can't find it with a search engine, it probably isn't there. --wunder On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote: On 3/29/2013 12:07 PM, Walter Underwood wrote: What are folks using for this? I don't know that this really answers your question, but Solr 4.1 and later includes a big chunk of codahale metrics internally for request handler statistics - see SOLR-1972. First we tried including the jar and using the API, but that created thread leak problems, so the source code was added. Thanks, Shawn -- Walter Underwood wun...@wunderwood.org
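To make the solrconfig.xml idea concrete, a reporter definition might look something like the sketch below. To be clear, this is hypothetical syntax for the proposal being discussed, not something any released Solr parses; the class and parameter names are invented for illustration:

<!-- hypothetical: a pluggable metrics reporter, per the proposal above -->
<metricsReporter class="com.example.metrics.GraphiteReporterFactory">
  <str name="host">graphite.example.com</str>
  <int name="port">2003</int>
  <int name="periodSeconds">60</int>
</metricsReporter>

The appeal is that pushing metrics on a fixed period is exactly what tools like Graphite expect, which addresses the "needs polling" complaint below.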
Re: Solr language-dependent sort
Hi Lisheng, We did something similar in Solr using a custom handler (but I think you could just build a custom QueryParser to do this). You could also do this in your application, i.e., get the language and then rewrite your query to use the language-specific fields. Come to think of it, the QueryParser would probably be sufficiently general to qualify as a patch for custom functionality. -sujit On Apr 8, 2013, at 12:28 PM, Zhang, Lisheng wrote: Hi, I found that in Solr we need to define a special fieldType for each language (http://wiki.apache.org/solr/UnicodeCollation), then point a field to this type. But in our application one field (like 'title') can be used by various users for their own languages (user1 uses it for English, user2 uses it for Japanese, ...), so it is difficult for us even to use dynamic fields. We would prefer to pass in a parameter like language='en' at run time, so that the Solr API could use this parameter when calling the Lucene API to sort the field. This approach would be much more flexible (we programmed this way when using Lucene directly). Thanks very much for your help, Lisheng
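For the archives, a sketch of the per-language field approach using the solr.CollationField type described on the UnicodeCollation wiki page linked above (the type and field names here are made up for illustration):

<fieldType name="collated_en" class="solr.CollationField" language="en" strength="primary"/>
<fieldType name="collated_ja" class="solr.CollationField" language="ja" strength="primary"/>
<dynamicField name="*_sort_en" type="collated_en" indexed="true" stored="false"/>
<dynamicField name="*_sort_ja" type="collated_ja" indexed="true" stored="false"/>

At index time the application copies title into title_sort_en or title_sort_ja depending on the document's language; at query time it maps language=en to sort=title_sort_en asc. That keeps the language decision at run time without needing a custom handler.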
RE: Solr language-dependent sort
Hi, Thanks very much for the quick help! In our case we mainly need to sort a field based on a language defined at run time, but I understand that the principle is the same. Thanks and best regards, Lisheng -Original Message- From: Sujit Pal [mailto:sujitatgt...@gmail.com] On Behalf Of SUJIT PAL Sent: Monday, April 08, 2013 1:27 PM To: solr-user@lucene.apache.org Subject: Re: Solr language-dependent sort Hi Lisheng, We did something similar in Solr using a custom handler (but I think you could just build a custom QueryParser to do this). You could also do this in your application, i.e., get the language and then rewrite your query to use the language-specific fields. Come to think of it, the QueryParser would probably be sufficiently general to qualify as a patch for custom functionality. -sujit On Apr 8, 2013, at 12:28 PM, Zhang, Lisheng wrote: Hi, I found that in Solr we need to define a special fieldType for each language (http://wiki.apache.org/solr/UnicodeCollation), then point a field to this type. But in our application one field (like 'title') can be used by various users for their own languages (user1 uses it for English, user2 uses it for Japanese, ...), so it is difficult for us even to use dynamic fields. We would prefer to pass in a parameter like language='en' at run time, so that the Solr API could use this parameter when calling the Lucene API to sort the field. This approach would be much more flexible (we programmed this way when using Lucene directly). Thanks very much for your help, Lisheng
Best practice for rebuild index in SolrCloud
We are using SolrCloud for replication and dynamic scaling but not for distribution, so we are only using a single shard. From time to time we make changes to the index schema that require rebuilding the index. Should I treat the rebuilding as just any other index operation? It seems to me it would be better if I could somehow take a node offline and rebuild the index there, then put it back online and let the new index be replicated from there. But I am not sure how to do the latter. Bill
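One pattern that may fit here, assuming your Solr version supports collection aliases (added around Solr 4.2) and you can afford a parallel collection during the rebuild: index into a fresh collection, then atomically repoint clients with CREATEALIAS. The host, alias, and collection names below are made up:

http://host:8983/solr/admin/collections?action=CREATEALIAS&name=search&collections=storage_v2

Clients that query the alias 'search' then switch to the rebuilt collection without any node juggling, and the old collection can be dropped once you're satisfied.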
Solr Admin Page Master Size
When I check my Solr Admin Page: Replication (Master) Version Gen Size Master: 1365458125729 5 18.24 MB It is one shard on one computer. What is that 18.24 MB? Does it contain just the indexes, or indexes plus highlights etc.? My Solr home folder was 512.7 KB and it has become 22860 KB, which is why I ask this question.
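You can check what that number corresponds to from the shell; the path below is illustrative, so adjust it for your core layout:

du -sh /path/to/solr/home/collection1/data/index

The replication size should correspond to the Lucene index files in that directory; it does not cover everything else under the Solr home (configs, logs, other data directories), which is why the directory total can grow faster than the reported index size.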
Re: Number of segments
On Mon, Apr 8, 2013, at 02:51 PM, Michael Long wrote: On 04/08/2013 09:41 AM, Upayavira wrote: How many documents do you have? How big are the files on disk? 2,795,601 and the index dir is 50G Note it says segments per tier, you may have multiple tiers at play meaning you can have more than ten segments. How do I determine how many tiers it has? There's also, I believe, properties that define the maximum size on disk for a segment and the like that can prevent merges from happening. I just have the defaults...nothing explicitly set What issue are you trying to solve here? Generally, the tiered merge policy works well, and if searches perform well, then having a reasonable number of segments needn't cause you any issues. Indeed, with larger indexes, having too few segments can cause issues as merging can require copying large segments, which can be time-consuming. Upayavira
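If you want to count segments yourself, one rough check -- assuming a Lucene 4.x index, where each segment writes its own .si (segment info) file that lives outside any compound file -- is to count those files (path illustrative):

ls /path/to/index/*.si | wc -l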
Re: Sub field indexing
: Subject: Sub field indexing : References: 1365426517091-4054473.p...@n3.nabble.com : In-Reply-To: 1365426517091-4054473.p...@n3.nabble.com https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. -Hoss
Re: Number of segments
: How do I determine how many tiers it has? You may find this blog post from mccandless helpful... http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html (don't ignore the videos! watching them is really helpful for understanding what he is talking about) Once you've absorbed that, then please revisit your question, specifically Upayavira's key point: what is the problem you are trying to solve? https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: Solr 4.2.1 Branch
There is also this path for the SVN guys out there: https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1 Cheers, Tim On 05/04/13 05:53 PM, Jagdish Nomula wrote: That works out. Thanks for shooting the link. On Fri, Apr 5, 2013 at 5:51 PM, Jack Krupansky j...@basetechnology.com wrote: You want the tagged branch: https://github.com/apache/lucene-solr/tree/lucene_solr_4_2_1 -- Jack Krupansky -Original Message- From: Jagdish Nomula Sent: Friday, April 05, 2013 8:36 PM To: solr-user@lucene.apache.org Subject: Solr 4.2.1 Branch Hello, I was trying to get hold of the solr 4.2.1 branch on github. I see https://github.com/apache/lucene-solr/tree/lucene_solr_4_2, but I don't see any branch for 4.2.1. Am I missing anything? Thanks in advance for your help. -- Jagdish Nomula Sr. Manager Search Simply Hired, Inc. 370 San Aleso Ave., Ste 200 Sunnyvale, CA 94085 office - 408.400.4700 cell - 408.431.2916 email - jagd...@simplyhired.com www.simplyhired.com
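Concretely, either of these should get you the 4.2.1 source, using the URLs above:

svn checkout https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1

git clone https://github.com/apache/lucene-solr.git
cd lucene-solr
git checkout lucene_solr_4_2_1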
conditional queries?
Hi, Is it possible to do a conditional query if another query has no results? For example, say I want to search against a given field for: - Search for car. If there are results, return them. - Else, search for car* . If there are results, return them. - Else, search for car~ . If there are results, return them. Is this possible in one query? Or would I need to make 3 separate queries by implementing this logic within my client? Thanks! Mark
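If you go the client-side route, here is a minimal SolrJ sketch (Solr 4.x API; the server URL and the bare query strings are illustrative -- in practice you would escape user input and set a default search field):

// Try progressively looser variants of the term until one returns results.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FallbackSearch {
  public static QueryResponse search(HttpSolrServer server, String term)
      throws SolrServerException {
    String[] variants = { term, term + "*", term + "~" };  // exact, prefix, fuzzy
    QueryResponse rsp = null;
    for (String v : variants) {
      rsp = server.query(new SolrQuery(v));
      if (rsp.getResults().getNumFound() > 0) {
        return rsp;  // first variant with hits wins
      }
    }
    return rsp;  // every variant came back empty; return the last response
  }

  public static void main(String[] args) throws SolrServerException {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    System.out.println(search(server, "car").getResults().getNumFound());
  }
}

The cost is up to three round trips on a miss; as far as I know there is no built-in single-request equivalent, so the logic has to live in the client either way.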
Field exist in schema.xml but returns
hi all, I am using SolrCloud and running some simple test queries... though I am getting an undefined field error for a field that I have in my schema.xml. The query is myField:* and the response is:
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">3</int>
<lst name="params">
<str name="wt">xml</str>
<str name="q">myField:*</str>
</lst>
</lst>
<lst name="error">
<str name="msg">undefined field myField</str>
<int name="code">400</int>
</lst>
</response>
and this is how my schema.xml looks:
..
<field name="field1" type="tint" indexed="true" stored="true"/>
<fiald name="myField" type="long" indexed="true" stored="true"/>
<field name="field3" type="tint" indexed="true" stored="true"/>
..
Any ideas what the reason could be? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Field-exist-in-schema-xml-but-returns-tp4054634.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field exist in schema.xml but returns
You have misspelt the tag name in the field definition... you have fiald instead of field. On Tue, Apr 9, 2013 at 7:43 AM, deniz denizdurmu...@gmail.com wrote: hi all, I am using SolrCloud and running some simple test queries... though I am getting an undefined field error for a field that I have in my schema.xml. The query is myField:* and the response is:
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">3</int>
<lst name="params">
<str name="wt">xml</str>
<str name="q">myField:*</str>
</lst>
</lst>
<lst name="error">
<str name="msg">undefined field myField</str>
<int name="code">400</int>
</lst>
</response>
and this is how my schema.xml looks:
..
<field name="field1" type="tint" indexed="true" stored="true"/>
<fiald name="myField" type="long" indexed="true" stored="true"/>
<field name="field3" type="tint" indexed="true" stored="true"/>
..
Any ideas what the reason could be?
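i.e., once corrected, the line should read:

<field name="myField" type="long" indexed="true" stored="true"/>

After fixing it -- and, since this is SolrCloud, re-uploading the config to ZooKeeper -- reload the collection and re-run the query.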