Re: Occasionally getting error in solr suggester component.
Lucene 5.1: I am (also) facing java.lang.IllegalStateException: suggester was not built. At the very moment no new documents seem to be added to the index/core. Will a reboot sanitize the index/core? I (still) have <str name="buildOnCommit">true</str>. How can I tell Solr to periodically update the suggestions? If not possible per configuration (in solrconfig.xml), what is the preferred approach through SolrJ? Thx Clemens

-----Original Message-----
From: Michael Sokolov [mailto:msoko...@safaribooksonline.com]
Sent: Thursday, 15 January 2015 19:52
To: solr-user@lucene.apache.org
Subject: Re: Occasionally getting error in solr suggester component.

That sounds like a good approach to me. Of course it depends how often you commit, and what your tolerance is for delay in having suggestions appear, but it sounds as if you have a good understanding of the tradeoffs there. -Mike

On 1/15/15 10:31 AM, Dhanesh Radhakrishnan wrote: Hi, From Solr 4.7 onwards, the implementation of this Suggester has changed. The old SpellChecker-based search component was replaced with a new suggester that utilizes the Lucene suggester module. The latest Solr download is preconfigured with this new suggester. I'm using Solr 4.10 and suggestions are based on the query handler /suggest instead of /spell. So what I did is change to <str name="buildOnCommit">false</str>. It's not good to rebuild the index on every commit; instead, I would like to build it on a certain time period, say 1 hour. The lookup data will be built only when requested by the URL parameter suggest.build=true (http://localhost:8983/solr/ha/suggest?suggest.build=true). So this will rebuild the index again and the changes will be reflected in the suggester. There are certain pros and cons to this. The issue is that changes will only be reflected at a certain interval, here 1 hour. The advantage is that we can avoid rebuilding the index on every commit or optimize. Is this the right way, or is there anything I missed?
Regards dhanesh s.r

On Thu, Jan 15, 2015 at 3:20 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: Did you build the spellcheck index using spellcheck.build as described here: https://cwiki.apache.org/confluence/display/solr/Spell+Checking ? -Mike

On 01/14/2015 07:19 AM, Dhanesh Radhakrishnan wrote: Hi, Thanks for the reply. As you mentioned in the previous mail, I changed buildOnCommit=false in solrconfig. After that change, suggestions are not working. Solr 4.7 introduced a new approach based on a dedicated SuggestComponent; I'm using that component to build suggestions, and the lookup implementation is AnalyzingInfixLookupFactory. Is there any workaround?

On Wed, Jan 14, 2015 at 12:47 AM, Michael Sokolov msoko...@safaribooksonline.com wrote: I think you are probably getting bitten by one of the issues addressed in LUCENE-5889. I would recommend against using buildOnCommit=true - with a large index this can be a performance-killer. Instead, build the index yourself using the Solr spellchecker support (spellcheck.build=true). -Mike

On 01/13/2015 10:41 AM, Dhanesh Radhakrishnan wrote: Hi all, I am experiencing a problem in the Solr SuggestComponent. Occasionally the suggester component throws an error like:

Solr failed: {"responseHeader":{"status":500,"QTime":1},"error":{"msg":"suggester was not built","trace":"java.lang.IllegalStateException: suggester was not built
	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:368)
	at org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.lookup(AnalyzingInfixSuggester.java:342)
	at org.apache.lucene.search.suggest.Lookup.lookup(Lookup.java:240)
	at org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:199)
	at org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:234)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
	at org.apache.catalina.core.StandardContextValve.invoke(
Synonyms within FQ
morning everyone, i'm attempting to find related documents based on a manufacturer's competitor. as such i'm querying against the 'description' field with manufacturer1's product description but running a filter query with manufacturer2's name against the 'mfgname' field. one of the ways that we help boost our document finding is with a synonym dictionary for manufacturer names. many of the larger players have multiple divisions, have absorbed smaller companies, etc. so we need all of their potential names to map to our record. i may be wrong, but from my initial testing it doesn't seem to be applying to a fq. is there any way of doing this? thanks-
Re: Synonyms within FQ
after further investigation it looks like the synonym i was testing against was only associated with one of their multiple divisions (despite being the most common name for them!). it looks like this may clear the issue up, but thanks anyway! -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713
UI Admin - and stored=false fields
Hi, I am indexing some content under the text field. In schema.xml the text field is defined as: <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/> However, when I am looking at the documents via the UI http://localhost:8983/solr/#/sec_600b/query I see the text field content in the returned documents. Am I making a mistake? Or is this behavior (i.e. non-stored fields are displayed in the admin UI) expected? thanks! Benjamin.
Re: Number of clustering labels to show
Just to clarify the initial mail: carrot.fragSize has nothing to do with the number of clusters produced. When you choose to work with field summaries (you work only on snippets from the original content, snippets produced by highlighting the query in the content), fragSize specifies the size of these fragments. From the Carrot documentation:

carrot.produceSummary - When true, the carrot.snippet https://wiki.apache.org/solr/ClusteringComponent#carrot.snippet field (if no snippet field, then the carrot.title https://wiki.apache.org/solr/ClusteringComponent#carrot.title field) will be highlighted and the highlighted text will be used for clustering. Highlighting is recommended when the snippet field contains a lot of content. Highlighting can also increase the quality of clustering because the clustered content will get an additional query-specific context.

carrot.fragSize - The fragment size to use for highlighting. Meaningful only when carrot.produceSummary https://wiki.apache.org/solr/ClusteringComponent#carrot.produceSummary is true. If not specified, the default highlighting fragsize (hl.fragsize) will be used. If that isn't specified, then 100.

Cheers

2015-06-01 2:00 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: Thank you Stanislaw for the links. Will read them up to better understand how the algorithm works. Regards, Edwin

On 29 May 2015 at 17:22, Stanislaw Osinski stanislaw.osin...@carrotsearch.com wrote: Hi, The number of clusters primarily depends on the parameters of the specific clustering algorithm. If you're using the default Lingo algorithm, the number of clusters is governed by the LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a look at the documentation ( https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings ) for some more details (the Tweaking at Query-Time section shows how to pass the specific parameters at request time).
A complete overview of the Lingo clustering algorithm parameters is here: http://doc.carrot2.org/#section.component.lingo. Stanislaw -- Stanislaw Osinski, stanislaw.osin...@carrotsearch.com http://carrotsearch.com

On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, I'm trying to increase the number of cluster results shown during the search. I tried to set carrot.fragSize=20 but only 15 cluster labels are shown. Even when I tried to set carrot.fragSize=5, there are also 15 labels shown. Is this the correct way to do this? I understand that setting it to 20 might not necessarily mean 20 labels will be shown, as the setting is a maximum. But when I set it to 5, shouldn't it reduce the number of labels to 5? I'm using Solr 5.1. Regards, Edwin -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
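To make Stanislaw's distinction concrete: the knob that governs cluster count is LingoClusteringAlgorithm.desiredClusterCountBase, passed as an ordinary request parameter at query time, while carrot.fragSize only sizes the highlight fragments used as clustering input. A minimal sketch of building such a request URL (the core name, handler path, and query below are placeholders, not from the thread):

```python
from urllib.parse import urlencode

def clustering_request_url(n_clusters=20,
                           base="http://localhost:8983/solr/collection1/clustering"):
    # desiredClusterCountBase steers how many clusters Lingo produces;
    # carrot.fragSize would only change the highlight fragment length.
    params = {
        "q": "laptop",
        "LingoClusteringAlgorithm.desiredClusterCountBase": n_clusters,
    }
    return base + "?" + urlencode(params)
```

Sending this URL to a running Solr with the clustering component enabled should change the number of labels returned, where changing carrot.fragSize would not.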
Re: Deleting Fields
On 30/05/2015 00:30, Shawn Heisey wrote: On 5/29/2015 5:08 PM, Joseph Obernberger wrote: Hi All - I have a lot of fields to delete, but noticed that once I started deleting them, I quickly ran out of heap space. Is delete-field a memory intensive operation? Should I delete one field, wait a while, then delete the next? I'm not aware of a way to delete a field. I may have a different definition of what a field is than you do, though. Solr lets you delete entire documents, but deleting a field from the entire index would involve re-indexing every document in the index, excluding that field. Can you be more specific about exactly what you are doing, what you are seeing, and what you want to see instead? Also, please be aware of this: http://people.apache.org/~hossman/#threadhijack Thanks, Shawn Here's a rather old post on how we did something similar: http://www.flax.co.uk/blog/2011/06/24/how-to-remove-a-stored-field-in-lucene/ Cheers Charlie -- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
Best strategy for logging security
It will be great if you can provide your valuable inputs on a strategy for logging and security... Thanks a lot in advance... Logging : - Is there a way to implement logging for each core separately? - What will be the best strategy to log every query's details (like source IP, search query, etc.)? At some point we will need monthly reports for analysis. Securing SOLR : - We need to implement SOLR security on the client as well as the server side... requests will be performed via a web app as well as other server-side apps, e.g. curl... Please suggest the best approach we can follow... a link to any documentation will also help. Environment : SOLR 4.7 configured on Tomcat 7 (Linux)
Re: UI Admin - and stored=false fields
That's the whole point of having a true/false option for stored. stored=true means those fields are available for display to the user in result lists; with stored=false they're not. Best, Erick
Re: Synonyms within FQ
Thanks Erick!
Re: Occasionally getting error in solr suggester component.
Attach suggest.build=true or suggest.buildAll=true to any request to the suggester to rebuild, OR add buildOnStartup, buildOnCommit, or buildOnOptimize to the definition in solrconfig. BUT: building can be a _very_ expensive operation. For document-based dictionaries, the build process reads through _all_ of the _stored_ documents in your index, and that can take many minutes, so I recommend against these options for a large index, and strongly recommend you test them with a large corpus. Best, Erick
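Since Solr has no built-in timer for suggester builds, the approach discussed in this thread is to fire suggest.build=true from an external scheduler. Below is a minimal sketch in Python rather than SolrJ (with SolrJ the same parameter can be set on the request via SolrQuery/ModifiableSolrParams); the core name "ha" and the /suggest handler path are taken from the URLs quoted in the thread and may differ in your setup:

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Core and handler names are taken from the thread's example URLs.
SOLR_BASE = "http://localhost:8983/solr/ha"

def build_request_url():
    """URL that tells the suggest handler to rebuild its lookup data."""
    return SOLR_BASE + "/suggest?" + urlencode({"suggest.build": "true"})

def rebuild_forever(interval_s=3600):
    """Issue a rebuild once per interval (e.g. hourly). Needs a running Solr."""
    while True:
        urlopen(build_request_url()).read()
        time.sleep(interval_s)
```

As Erick warns, each call re-reads all stored documents for document-based dictionaries, so the interval should be chosen with the build cost in mind.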
Re: Sorting in Solr
Solr supports sorting. https://wiki.apache.org/solr/CommonQueryParameters#sort https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter I think we may have an omission from the docs -- docValues can also be used for sorting, and may also offer a performance advantage. Thanks, Shawn
Re: Synonyms within FQ
For future reference, fq clauses are parsed just like the q clause; they can be arbitrarily complex. Best, Erick
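Following Erick's point that fq goes through the same query parsing (and therefore the same field analysis) as q, the original use case can be sketched as one request whose fq hits the synonym-analyzed manufacturer field. The core name and field values below are made-up placeholders; only the field names description and mfgname come from the thread:

```python
from urllib.parse import urlencode

def related_docs_url(description_terms, competitor_name,
                     base="http://localhost:8983/solr/products/select"):
    # The fq clause is parsed like any q clause, so a synonym filter
    # configured on the mfgname field's analysis chain applies to the
    # filter query as well as to the main query.
    params = {
        "q": "description:(%s)" % description_terms,
        "fq": 'mfgname:"%s"' % competitor_name,
    }
    return base + "?" + urlencode(params)
```

If synonyms still seem not to apply, the thing to check is whether the synonym filter sits on the query-time (or index-time) analyzer of mfgname, since that analyzer is what the fq is run through.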
Sorting in Solr
Hi everyone, I need to be able to sort in Solr. Obviously, I need to do this in a way where sorting won't cause an OOM when a result may contain 1000's of hits if not millions. Can you guide me on how I can do this? Is there a way to tell Solr to sort and return only the top N results (discarding everything else), or must such sorting be done on the client side? Thanks in advance, Steve
Re: UI Admin - and stored=false fields
Did you happen to change the field type definition without reindexing? (It requires reindexing to "unstore" fields if they were originally stored.) If you're seeing a field value in a document result (not facets; those are driven by indexed terms) when stored="false", then something is wrong, and I'd guess it's because of the field definition changing as mentioned. -- Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com
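One way to check Erik's hypothesis - that the live schema still carries the old stored=true definition - is Solr's read-only Schema API, which reports the effective definition of a single field. A sketch (the core name is taken from the admin-UI URL in the question; the actual lookup needs a running Solr):

```python
import json
from urllib.request import urlopen

def schema_field_url(field,
                     core_base="http://localhost:8983/solr/sec_600b"):
    # Read-only Schema API endpoint reporting one field's definition.
    return "%s/schema/fields/%s?wt=json" % (core_base, field)

def field_is_stored(field, core_base="http://localhost:8983/solr/sec_600b"):
    """True if the live schema marks the field as stored."""
    with urlopen(schema_field_url(field, core_base)) as resp:
        info = json.load(resp)
    # If the attribute is absent it is inherited from the field type;
    # treating that as stored=true here is a simplifying assumption.
    return info["field"].get("stored", True)
```

If the API still reports stored=true for the text field, the schema change never took effect and a reload plus reindex is needed.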
Re: UI Admin - and stored=false fields
Reset, pay attention to Erik. I didn't read it all the way through. Erick
Re: Sorting in Solr
Steve: Surprisingly, the number of hits is completely irrelevant to the memory requirements for sorting. The base memory size is, AFAIK, an array of maxDoc ints (you can find maxDoc on the admin screen). There's some additional overhead, but that's the base size. If you use DocValues, much of the overhead is kept in the MMapDirectory space IIRC. Best, Erick
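In other words, sorting happens inside Solr over the full match set, and the client only ever receives the top rows documents, so no client-side sorting is needed. A sketch of a top-N request; the sort field is a placeholder assumed to be sortable (ideally docValues-enabled, per Shawn's note):

```python
from urllib.parse import urlencode

def top_n_url(n, sort_field="price",
              base="http://localhost:8983/solr/collection1/select"):
    # Solr sorts the whole result set server-side but streams back only
    # `rows` documents, so client memory stays bounded no matter how
    # many documents match the query.
    params = {"q": "*:*", "sort": "%s desc" % sort_field, "rows": n}
    return base + "?" + urlencode(params)
```

So "top N, discard the rest" is just sort plus rows=N; the OOM concern applies to the server's sort structures, which is where Erick's maxDoc/docValues comments come in.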
Re: fq and defType
fq={!edismax}querystring The other edismax parameters on your request (qf, etc.) apply to those filter queries just like they would for the q parameter. Thanks, Shawn
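Shawn's fq={!edismax}querystring uses Solr's local-params syntax to switch the parser for that one filter clause. A sketch of assembling such a request; the qf value and query strings are placeholders:

```python
from urllib.parse import urlencode

def edismax_everywhere_url(q, fq,
                           base="http://localhost:8983/solr/collection1/select"):
    # defType=edismax covers q; the {!edismax} local-params prefix makes
    # this fq clause use the same parser, and it picks up qf etc. from
    # the rest of the request.
    params = {
        "q": q,
        "defType": "edismax",
        "qf": "title^2 body",
        "fq": "{!edismax}" + fq,
    }
    return base + "?" + urlencode(params)
```

The local params could also carry their own overrides, e.g. fq={!edismax qf=mfgname}..., if the filter should search different fields than the main query.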
fq and defType
Hello, I need to parse some complicated queries that only work properly with the edismax query parser, in both the q and fq parameters. I am testing with defType=edismax, but it seems that this clause only affects the q parameter. Is there any way to apply edismax to the fq parameter? Thank you very much, David Dávila Atienza DIT Teléfono: 915828763 Extensión: 36763
Re: Deleting Fields
Hi - we are using a 64-bit OS and 64-bit JVM. The JVM settings are currently:

-DSTOP.KEY=solrrocks -DSTOP.PORT=8100 -Dhost=helios -Djava.net.preferIPv4Stack=true -Djetty.port=9100 -DnumShards=27 -Dsolr.clustering.enabled=true -Dsolr.install.dir=/opt/solr -Dsolr.lock.type=hdfs -Dsolr.solr.home=/opt/solr/server/solr -Duser.timezone=UTC -DzkClientTimeout=15000 -DzkHost=eris.querymasters.com:2181,daphnis.querymasters.com:2181,triton.querymasters.com:2181,oberon.querymasters.com:2181,portia.querymasters.com:2181,puck.querymasters.com:2181/solr5 -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark -XX:+ParallelRefProcEnabled -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseConcMarkSweepGC -XX:+UseLargePages -XX:+UseParNewGC -XX:CMSFullGCsBeforeCompaction=1 -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:CMSTriggerPermRatio=80 -XX:ConcGCThreads=8 -XX:MaxDirectMemorySize=26g -XX:MaxTenuringThreshold=8 -XX:NewRatio=3 -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 9100 /opt/solr/server/logs -XX:ParallelGCThreads=8 -XX:PretenureSizeThreshold=64m -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -Xloggc:/opt/solr/server/logs/solr_gc.log -Xms8g -Xmx16g -Xss256k -verbose:gc

At the time of the OOM error, Xmx was set to 10g. OS limits, for the most part, are 'factory' Scientific Linux 6.6. I didn't see any messages in the log about too many open files. Thank you for the tips! -Joe

On 5/31/2015 4:24 AM, Tomasz Borek wrote: Joseph, You are doing a memory-intensive operation and perhaps an IO-intensive operation at once.
That makes your C-heap run out of memory or hit a thread limit (thus the first problem, java.lang.OutOfMemoryError: unable to create new native thread), and later you're also hitting the problem of the Java heap being full or - more precisely - GC being unable to free enough space there, even with a collection, to allocate the new object you want allocated (thus the second throw: java.lang.OutOfMemoryError: Java heap space). Important is: - whether your OS is 32-bit or 64-bit - whether your JVM is 32-bit or 64-bit - what your OS limits on thread creation are and whether you have touched or changed them (1st problem) - how you start the JVM (Xms, Xmx, Xss, dumps, direct memory, permgen size - both problems) What you can do to solve your problems differs depending on what exactly causes them, but in general: NATIVE: 1) Either your operation causes many threads to spawn and you hit your thread limit (the OS limits how many threads a process can create) - take a thread dump and see 2) Or your op causes much thread creation and the memory settings you start the JVM with, plus the 32/64 bits of OS and JVM, make it impossible for the C-heap to have that much memory, so you're hitting the OOM error - adjust settings, move to 64-bit architectures, add RAM while on 64-bit (32-bit really chokes you down: less than 4GB is available for EVERYTHING - Java heap, PermGen space AND C-heap). Usually, with such a thread-greedy operation, it's also nice to look at the code and see if perhaps one can optimize thread creation/management. OOM on Java heap: Add a heap-dump-on-OOM parameter to your JVM and walk the dominator tree or see the histogram to tell what's actually eating your heap space. MAT is a good tool for this, unless your heap is like 150GB; then NetBeans may help, or see Alexey Ragozin's work - I think he forked the NetBeans heap analyzer code and made some adjustments specially for such cases.
A light Google search and here it is: http://blog.ragozin.info/ Regards, LAFK

2015-05-30 20:48 GMT+02:00 Erick Erickson erickerick...@gmail.com: Faceting on very high cardinality fields can use up memory, no doubt about that. I think the entire delete question was a red herring, but you know that already ;) So I think you can forget about the delete stuff. Although do note that if you do re-index your old documents, the new versions won't have the field, and as segments are merged the deleted documents will have all their resources reclaimed, effectively deleting the field from the old docs. So you could gradually re-index your corpus and get this stuff out of there. Best, Erick

On Sat, May 30, 2015 at 5:18 AM, Joseph Obernberger j...@lovehorsepower.com wrote: Thank you Erick. I was thinking that it actually went through and removed the index data; thank you for the clarification. What happened was I had some bad data that created a lot of fields (some 8000). I was getting some errors adding new fields where solr could not talk to zookeeper, and I thought it may be because there are so many fields. The index size is some 420 million docs. I'm hesitant to try to re-create, as when the shards crash they leave a write.lock file in HDFS, and I need to manually
Re: fq and defType
Thank you! David

From: Shawn Heisey apa...@elyograg.org To: solr-user@lucene.apache.org Date: 01/06/2015 18:53 Subject: Re: fq and defType
Chef recipes for Solr
Anyone have Chef recipes they like for deploying Solr? I’d especially appreciate one for uploading the configs directly to a Zookeeper ensemble. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
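I don't have a recipe to hand, but the ZooKeeper-upload step such a recipe typically wraps is the zkcli.sh upconfig command that ships with Solr. A sketch of the command line a Chef template might render; all hosts, paths, and the config name below are placeholders, and the script's location varies by Solr version:

```python
def upconfig_cmd(confdir, confname,
                 zkhost="zk1:2181,zk2:2181,zk3:2181/solr",
                 zkcli="/opt/solr/server/scripts/cloud-scripts/zkcli.sh"):
    # Argument vector for subprocess.run(); `upconfig` pushes a local
    # config directory into ZooKeeper under the given config name.
    return [zkcli, "-zkhost", zkhost, "-cmd", "upconfig",
            "-confdir", confdir, "-confname", confname]
```

A recipe would template the zkhost string and confdir from node attributes and run this via an execute resource, guarded so it only re-runs when the config files change.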
Re: Best strategy for logging security
Logging : Just use logstash to parse your logs for all collections, with logstash-forwarder and lumberjack at your solr replicas in your solr cloud to ship the log events to your central logstash server, and send them back to solr (either the same or a different instance) into a separate collection. The default log4j.properties that comes with the solr dist can log the core name with each query log. Security: I suggest you go through this wiki: https://wiki.apache.org/solr/SolrSecurity *Thanks,* *Rajesh,* *(mobile) : 8328789519.*
Multiple Word Synonyms with Autophrasing
Hello everyone @ solr-user, At Wayfair, I have implemented multiple word synonyms in a clean and efficient way, in conjunction with a slightly modified version of LucidWorks' Autophrasing plugin, by also tacking on a modified version of edismax. It is not released or in use on our public website yet, but it will be very soon. While it is not ready to officially open source yet, I know some people out there are anxious to implement this type of thing. Please feel free to contact me if you are interested in learning how to theoretically accomplish this on your own. Note that while this may have some concepts in common with Named Entity Recognition implementations, I think it really is a completely different thing. I get a lot of spam, so if you please, would you write me privately with your questions, with the subject line being MWSwA, so I can easily compile everyone's questions about this. I will respond to everyone at some point soon with some beta documentation, or possibly with an invitation to a private github or something so that you can review an example. Thanks! -Chris.
Re: Best strategy for logging security
Thanks Rajesh... just trying to figure out if *logstash* is open source and free? On Mon, Jun 1, 2015 at 2:13 PM, Rajesh Hazari rajeshhaz...@gmail.com wrote: Logging: Just use logstash to parse your logs for all collections, with logstash-forwarder and lumberjack on your Solr replicas in your SolrCloud to ship the log events to your central logstash server, and then send them back to Solr (either the same or a different instance) into a different collection. The default log4j.properties that comes with the Solr dist can log the core name with each query log. Security: I suggest you go through this wiki: https://wiki.apache.org/solr/SolrSecurity *Thanks,* *Rajesh,* *(mobile) : 8328789519.* On Mon, Jun 1, 2015 at 11:20 AM, Vishal Swaroop vishal@gmail.com wrote: It will be great if you can provide your valuable inputs on a strategy for logging and security... Thanks a lot in advance... Logging: - Is there a way to implement logging for each core separately? - What will be the best strategy to log every query's details (like source IP, search query, etc.)? At some point we will need monthly reports for analysis. Securing SOLR: - We need to implement SOLR security from the client as well as the server side... requests will be performed via a web app as well as other server-side apps, e.g. curl... Please suggest the best approach we can follow... a link to any documentation will also help. Environment: SOLR 4.7 configured on Tomcat 7 (Linux)
Re: fq and defType
fq={!edismax} You are welcome. On Mon, Jun 1, 2015 at 6:44 PM, david.dav...@correo.aeat.es wrote: Hello, I need to parse some complicated queries that only work properly with the edismax query parser, in the q and fq parameters. I am testing with defType=edismax, but it seems that this clause only affects the q parameter. Is there any way to set edismax for the fq parameter? Thank you very much, David Dávila Atienza DIT Phone: 915828763 Extension: 36763 -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: solr uima and opennlp
yeah, I think you'd rather post it to d...@uima.apache.org . Regards, Tommaso 2015-05-28 15:19 GMT+02:00 hossmaa andreea.hossm...@gmail.com: Hi Tommaso Thanks for the quick reply! I have another question about using the Dictionary Annotator, but I guess it's better to post it separately. Cheers Andreea -- View this message in context: http://lucene.472066.n3.nabble.com/solr-uima-and-opennlp-tp4206873p4208348.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud 5.1 startup looking for standalone config
I followed these steps and I am unable to launch in cloud mode.

1. Created / started 3 external Zookeeper hosts: zk1, zk2, zk3
2. Installed Solr 5.1 as a service called solrsvc on two hosts: s1, s2
3. Uploaded a configset to zk1 (solr home is /volume/solr/data):
   /opt/solrsvc/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -zkhost zk1:2181 -confname mycollection_cloud_conf -solrhome /volume/solr/data -confdir /home/ec2-user/mycollection/conf
4. On s1, added these params to solr.in.sh:
   ZK_HOST=zk1:2181,zk2:2181,zk3:2181
   SOLR_HOST=s1
   ZK_CLIENT_TIMEOUT=15000
   SOLR_OPTS="$SOLR_OPTS -DnumShards=2"
5. On s1, created the core directory and file /volume/solr/data/mycollection/core.properties (name=mycollection)
6. Repeated steps 4 and 5 for s2, minus the numShards param

Starting the service on s1 gives me:

mycollection: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core mycollection: Error loading solr config from /volume/solr/data/mycollection/conf/solrconfig.xml

but aren't the config files supposed to be in Zookeeper? Tux -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-5-1-startup-looking-for-standalone-config-tp4209118.html Sent from the Solr - User mailing list archive at Nabble.com.
Derive suggestions across multiple fields
Hi, Does anyone know if we can derive suggestions across multiple fields? I tried to set something like this in the field in my suggest searchComponent in solrconfig.xml, but nothing is returned. It only works when I set a single field, not multiple fields.

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <!-- the indexed field to derive suggestions from -->
    <str name="field">Content, Summary</str>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

I'm using Solr 5.1. Regards, Edwin
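The field parameter takes a single field, so one common workaround is to copy the source fields into a combined field in schema.xml and point the suggester at that. A sketch, assuming the Content and Summary field names from the message above; the suggestContent field name and text_general type are placeholders to adapt to your schema:

```xml
<!-- schema.xml: combined field to feed the suggester (names are illustrative) -->
<field name="suggestContent" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="Content" dest="suggestContent"/>
<copyField source="Summary" dest="suggestContent"/>
```

Then set the suggester's field to suggestContent and rebuild, so suggestions draw on both sources through one indexed field.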
Re: Chef recipes for Solr
That sounds great. Someone else here will be making the recipes, so I’ll put him in touch with you. As always, this is a really helpful list. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Jun 1, 2015, at 10:20 PM, Upayavira u...@odoko.co.uk wrote: I have many. My SolrCloud code has the app push configs to zookeeper. I am afk at the mo. Feel free to bug me about it! Upayavira On Mon, Jun 1, 2015, at 07:29 PM, Walter Underwood wrote: Anyone have Chef recipes they like for deploying Solr? I’d especially appreciate one for uploading the configs directly to a Zookeeper ensemble. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
Re: SolrCloud 5.1 startup looking for standalone config
bq: but aren't the config files supposed to be in Zookeeper

Yes, but you haven't done anything to tell Solr that the nodes you've created are part of SolrCloud! You're confusing, I think, core discovery with creating collections. Basically you were pretty much OK up until step 5 (although I'm not at all sure that SOLR_HOST is doing you any good, and setting numShards in SOLR_OPTS certainly isn't a good idea; what happens if you want to create a collection with 5 shards?). You don't need to create any directories on your Solr nodes; that'll be done for you automatically by the collection creation command from the Collections API. So I'd down the nodes, nuke the directories you created by hand, and bring the nodes back up. It's probably not necessary to take the nodes down, but I tend to be paranoid about that. Then just create the collection via the Collections API CREATE command, see: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1 You can use curl or a browser to issue something like this to any active Solr node, and Solr will do the rest: http://some_solr_node:port/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&collection.configName=mycollection_cloud_conf etc. I believe it's _possible_ to carefully construct the core.properties files on all the Solr instances, but unless you know _exactly_ what's going on under the covers it'll lead to endless tail-chasing. You can control which nodes the collection ends up on with the createNodeSet parameter. Best, Erick

On Mon, Jun 1, 2015 at 4:37 PM, tuxedomoon dancolem...@yahoo.com wrote: I followed these steps and I am unable to launch in cloud mode. 1. created / started 3 external Zookeeper hosts: zk1, zk2, zk3 2. installed Solr 5.1 as a service called solrsvc on two hosts: s1, s2 3. uploaded a configset to zk1 (solr home is /volume/solr/data): /opt/solrsvc/server/scripts/cloud-scripts/zkcli.sh -cmd upconfig -zkhost zk1:2181 -confname mycollection_cloud_conf -solrhome /volume/solr/data -confdir /home/ec2-user/mycollection/conf 4. on s1, added these params to solr.in.sh: ZK_HOST=zk1:2181,zk2:2181,zk3:2181 SOLR_HOST=s1 ZK_CLIENT_TIMEOUT=15000 SOLR_OPTS="$SOLR_OPTS -DnumShards=2" 5. on s1 created core directory and file /volume/solr/data/mycollection/core.properties (name=mycollection) 6. repeated steps 4 and 5 for s2, minus the numShards param. Starting the service on s1 gives me: mycollection: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load conf for core mycollection: Error loading solr config from /volume/solr/data/mycollection/conf/solrconfig.xml -- but aren't the config files supposed to be in Zookeeper? Tux -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-5-1-startup-looking-for-standalone-config-tp4209118.html Sent from the Solr - User mailing list archive at Nabble.com.
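Spelled out with the parameter separators, the CREATE call Erick describes would look something like this; the host, port, and replicationFactor value are placeholders:

```shell
# Collections API CREATE: parameters are joined with '&'
BASE="http://some_solr_node:8983/solr/admin/collections"
URL="${BASE}?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&collection.configName=mycollection_cloud_conf"
echo "$URL"
# curl "$URL"   # run against any active Solr node in the cluster
```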
Re: Number of clustering labels to show
Thank you so much Alessandro. But I do not find any difference in the quality of the clustering results when I change the hl.fragsize, even though I've set carrot.produceSummary to true. Regards, Edwin On 1 June 2015 at 17:31, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Only to clarify the initial mail: carrot.fragSize has nothing to do with the number of clusters produced. When you choose to work with the field summary (you work only on snippets from the original content, snippets produced by highlighting the query in the content), fragSize specifies the size of these fragments. From the Carrot documentation: carrot.produceSummary When true, the carrot.snippet https://wiki.apache.org/solr/ClusteringComponent#carrot.snippet field (if no snippet field, then the carrot.title https://wiki.apache.org/solr/ClusteringComponent#carrot.title field) will be highlighted and the highlighted text will be used for clustering. Highlighting is recommended when the snippet field contains a lot of content. Highlighting can also increase the quality of clustering because the clustered content will get an additional query-specific context. carrot.fragSize The frag size to use for highlighting. Meaningful only when carrot.produceSummary https://wiki.apache.org/solr/ClusteringComponent#carrot.produceSummary is true. If not specified, the default highlighting fragsize (hl.fragsize) will be used. If that isn't specified, then 100. Cheers 2015-06-01 2:00 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: Thank you Stanislaw for the links. Will read them up to better understand how the algorithm works. Regards, Edwin On 29 May 2015 at 17:22, Stanislaw Osinski stanislaw.osin...@carrotsearch.com wrote: Hi, The number of clusters primarily depends on the parameters of the specific clustering algorithm. If you're using the default Lingo algorithm, the number of clusters is governed by the LingoClusteringAlgorithm.desiredClusterCountBase parameter.
Take a look at the documentation ( https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings ) for some more details (the Tweaking at Query-Time section shows how to pass the specific parameters at request time). A complete overview of the Lingo clustering algorithm parameters is here: http://doc.carrot2.org/#section.component.lingo. Stanislaw -- Stanislaw Osinski, stanislaw.osin...@carrotsearch.com http://carrotsearch.com On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, I'm trying to increase the number of cluster results shown during the search. I tried to set carrot.fragSize=20, but only 15 cluster labels are shown. Even when I tried to set carrot.fragSize=5, 15 labels were still shown. Is this the correct way to do this? I understand that setting it to 20 might not necessarily mean 20 labels will be shown, as the setting is for a maximum number. But when I set it to 5, shouldn't it reduce the number of labels to 5? I'm using Solr 5.1. Regards, Edwin -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
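As a sketch of the query-time tweaking Stanislaw mentions, the Lingo cluster-count parameter can be passed directly as a request parameter; the host, collection name, and query here are hypothetical placeholders:

```shell
# Pass a Carrot2 algorithm attribute as a plain Solr request parameter
PARAM="LingoClusteringAlgorithm.desiredClusterCountBase=20"
URL="http://localhost:8983/solr/collection1/clustering?q=test&rows=50&${PARAM}"
echo "$URL"
# curl "$URL"   # against a live instance with the clustering component enabled
```

Raising desiredClusterCountBase (rather than carrot.fragSize) is what actually nudges Lingo toward producing more cluster labels.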
Re: Chef recipes for Solr
I have many. My SolrCloud code has the app push configs to zookeeper. I am afk at the mo. Feel free to bug me about it! Upayavira On Mon, Jun 1, 2015, at 07:29 PM, Walter Underwood wrote: Anyone have Chef recipes they like for deploying Solr? I’d especially appreciate one for uploading the configs directly to a Zookeeper ensemble. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)