Re: SOLRJ and SOLR compatibility
Am 27.02.2014 08:04, schrieb Shawn Heisey:
> On 2/26/2014 11:22 PM, Thomas Scheffler wrote:
>> I am one developer of a repository framework. We rely on the fact, that "SolrJ generally maintains backwards compatibility, so you can use a newer SolrJ with an older Solr, or an older SolrJ with a newer Solr." [1]
>>
>> This statement is not even true for bugfix releases like 4.6.0 -> 4.6.1. (SOLRJ 4.6.1, SOLR 4.6.0)
>>
>> We use SolrInputDocument from SOLRJ to index our documents (javabin). But as framework developer we are not in a role to force our users to update their SOLR server such often. Instead with every new version we want to update just the SOLRJ library we ship with to enable latest features, if the user wishes.
>>
>> When I send a query to a request handler I can attach a "version" parameter to tell SOLR which version of the response format I expect.
>>
>> Is there such a configuration when indexing SolrInputDocuments? I did not find it so far.
>
> What problems have you seen with mixing 4.6.0 and 4.6.1? It's possible that I'm completely ignorant here, but I have not heard of any.

Actually, bug reports reach me that sound like "Unknown type 19". I am currently not able to reproduce it myself with server versions 4.5.0, 4.5.1 and 4.6.0 when using SolrJ 4.6.1. It seems to be the same issue as described here:

http://lucene.472066.n3.nabble.com/After-upgrading-indexer-to-SolrJ-4-6-1-o-a-solr-servlet-SolrDispatchFilter-Unknown-type-19-td4116152.html

The solution there was to upgrade the server to version 4.6.1. This helped here, too, but out there it is a very unpopular decision. Some users have large SOLR installs and stick to a certain (4.x) version. They want upgrades from us, but upgrading company-wide SOLR installations is out of their scope.

Is this a known SOLRJ issue that is fixed in version 4.7.0?

Kind regards,
Thomas
Re: Column ambiguously defined error in SOLR delta import
On 2/26/2014 11:42 PM, Chandan khatua wrote: > I have the bellow query in data-config.xml, but it throws an error while > running the delta query: "java.sql.SQLSyntaxErrorException: ORA-00918: > column ambiguously defined". These are the FIRST two hits that I got when I searched for your full error string on Google: http://ora-918.ora-code.com/ http://www.techonthenet.com/oracle/errors/ora00918.php This error is coming from Oracle, not Solr. You did not include your deltaImportQuery. If you do not HAVE a deltaImportQuery defined, Solr will try to guess what it should be doing based on your main query and deltaQuery. As it says in the following wiki page, this is error-prone, and is likely to be the reason it's not working. http://wiki.apache.org/solr/DataImportHandler#Schema_for_the_data_config It's always possible that the real problem here is a bug in the Oracle JDBC driver. Less likely is a bug in Oracle itself. Thanks, Shawn
Re: SOLRJ and SOLR compatibility
On 2/26/2014 11:22 PM, Thomas Scheffler wrote: > I am one developer of a repository framework. We rely on the fact, that > "SolrJ generally maintains backwards compatibility, so you can use a > newer SolrJ with an older Solr, or an older SolrJ with a newer Solr." [1] > > This statement is not even true for bugfix releases like 4.6.0 -> 4.6.1. > (SOLRJ 4.6.1, SOLR 4.6.0) > > We use SolrInputDocument from SOLRJ to index our documents (javabin). > But as framework developer we are not in a role to force our users to > update their SOLR server such often. Instead with every new version we > want to update just the SOLRJ library we ship with to enable latest > features, if the user wishes. > > When I send a query to a request handler I can attach a "version" > parameter to tell SOLR which version of the response format I expect. > > Is there such a configuration when indexing SolrInputDocuments? I did > not find it so far. What problems have you seen with mixing 4.6.0 and 4.6.1? It's possible that I'm completely ignorant here, but I have not heard of any. A full discussion of this topic could fill a short novel. This reply is a little long, but hopefully digestible. I am assuming that you have a fair amount of familiarity with SolrJ here. If there's something you don't understand or seems wrong, we'll explore further. The javabin format changed between 1.4.1 and the next version (3.1) in a way that is incompatible in either direction, so mixing those versions requires using XMLResponseWriter. The javabin format has remained unchanged since version 3.1. Because Solr 1.x is very old and has the javabin incompatibility with later releases, I will not be discussing it beyond what I wrote above. You mentioned the version parameter. SolrJ automatically handles this in the requests it makes to Solr. You don't need to worry about it. One of the first things to say is that if you are using SolrCloud with the CloudSolrServer object, the only way that you can have any assurance of success with mixed versions is if your SolrJ version is newer than your Solr version, and I would not be assured very far unless the minor version is the same between the two. SolrCloud is evolving at an incredible pace. As far as I know, *backwards* compatibility is pretty good, but I would not be surprised to learn that there are some hiccups. I don't have a lot of experience with CloudSolrServer yet. Cross-version compatibility with non-cloud setups is MUCH better. A non-cloud setup is assumed for the rest of this email. I think it's important to mention that ConcurrentUpdateSolrServer and its predecessor StreamingUpdateSolrServer are usually not a good choice, unless you don't care about error handling. These classes do NOT inform the calling application of any error that occurs when sending updates to Solr. Rather than rely on one of these methods for making requests in parallel, your application should be multithreaded and send parallel requests itself with HttpSolrServer, which is completely threadsafe. If you're mixing 3.x and 4.x versions, stick to the xml REQUEST writer. This is the default in all but the most recent versions of SolrJ, but it's actually a good idea to explicitly set the writer object so you won't be surprised by an upgrade. You can use the binary RESPONSE writer (which is the default in all versions) with no problem. If both versions are 4.x, binary is fine for both the request writer and the response writer, and for performance reasons, is the preferred choice. 
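For reference, a minimal SolrJ sketch of pinning the writers explicitly (the URL and core name are placeholders):

import org.apache.solr.client.solrj.impl.BinaryResponseParser;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.RequestWriter;

// RequestWriter is the XML request writer; BinaryRequestWriter is the javabin one.
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
server.setRequestWriter(new RequestWriter());  // XML requests: safe when mixing 3.x and 4.x
server.setParser(new BinaryResponseParser());  // javabin responses: the default, fine across versions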
In non-cloud setups, there are very few problems to be found with any combination of 4.x versions. Thanks. Shawn
Re: Know indexing time of a document
you could just add a field with default value NOW in schema.xml, for example On Wed, Feb 26, 2014 at 10:44 PM, pratpor wrote: > Is it possible to know the indexing time of a document in solr. Like there > is > a implicit field for "score" which automatically gets added to a document, > is there a field that stores value of indexing time? > > Thanks > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Know-indexing-time-of-a-document-tp4120051.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Know indexing time of a document
None that I know of, but you can easily have a date field with default set to NOW. Or you can have an UpdateRequestProcessor that adds it in: http://lucene.apache.org/solr/4_6_1/solr-core/org/apache/solr/update/processor/TimestampUpdateProcessorFactory.html Regards, Alex Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Feb 27, 2014 at 5:44 PM, pratpor wrote: > Is it possible to know the indexing time of a document in solr. Like there is > a implicit field for "score" which automatically gets added to a document, > is there a field that stores value of indexing time? > > Thanks > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Know-indexing-time-of-a-document-tp4120051.html > Sent from the Solr - User mailing list archive at Nabble.com.
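For example, a minimal schema.xml sketch of the default-value approach (the field name "index_time" is an assumption; it presumes the stock "date" field type is defined):

<field name="index_time" type="date" indexed="true" stored="true" default="NOW" />

Any document indexed without an explicit value for that field gets the wall-clock time of indexing.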
Know indexing time of a document
Is it possible to know the indexing time of a document in Solr? Like there is an implicit field for "score" which automatically gets added to a document, is there a field that stores the value of the indexing time? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Know-indexing-time-of-a-document-tp4120051.html Sent from the Solr - User mailing list archive at Nabble.com.
Column ambiguously defined error in SOLR delta import
Hi, I have the below query in data-config.xml, but it throws an error while running the delta query: "java.sql.SQLSyntaxErrorException: ORA-00918: column ambiguously defined". Full data import is running fine. Kindly suggest the changes required. Thanking you, -Chandan
Searching with special chars
Hello, We are facing some kinda weird problem. So here is the scenario: We have a frontend and a middleware which is dealing with user input search queries before posting to Solr. So when a user enters city:Frankenthal_(Pfalz) and then searches, there is no result although there are fields on some documents matching city:Frankenthal_(Pfalz). We are aware that we can escape those chars, but the middleware which is accepting queries is running on a Glassfish server, which refuses URLs with backslashes in them, hence using backslashes is not okay for posting the query. To make everyone clear about the system, it looks like: (PHP) -> Encoded JSON -> (Glassfish App - Middleware) -> Javabin -> Solr Any other ideas how to deal with queries with special chars like this one? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-with-special-chars-tp4120047.html Sent from the Solr - User mailing list archive at Nabble.com.
SOLRJ and SOLR compatibility
Hi, I am one developer of a repository framework. We rely on the fact that "SolrJ generally maintains backwards compatibility, so you can use a newer SolrJ with an older Solr, or an older SolrJ with a newer Solr." [1] This statement is not even true for bugfix releases like 4.6.0 -> 4.6.1. (SOLRJ 4.6.1, SOLR 4.6.0) We use SolrInputDocument from SOLRJ to index our documents (javabin). But as framework developers we are not in a position to force our users to update their SOLR server so often. Instead, with every new version we want to update just the SOLRJ library we ship with, to enable the latest features if the user wishes. When I send a query to a request handler I can attach a "version" parameter to tell SOLR which version of the response format I expect. Is there such a configuration when indexing SolrInputDocuments? I did not find it so far. Kind regards, Thomas [1] https://wiki.apache.org/solr/Solrj
Re: Solr cloud: Faceting issue on text field
Hi Jack, Ya, the requirement is like that. I also want to apply various filters on the field like shingle, pattern replace etc. That is why I am using the text field. (But for the above run these filters were not enabled.) The facet count is set to 10 and the unique terms can go into thousands. Regards, On Wed, Feb 26, 2014 at 6:33 PM, Jack Krupansky wrote: > Are you sure you want to be faceting on a text field, as opposed to a > string field? I mean, each term (word) from the text will be a separate > facet value. > > How many facet values do you typically returning? > > How many unique terms occur in the facet field? > > -- Jack Krupansky > > -Original Message- From: David Miller > Sent: Wednesday, February 26, 2014 2:06 PM > To: solr-user@lucene.apache.org > Subject: Solr cloud: Faceting issue on text field > > > Hi, > > I am encountering an issue where Solr nodes goes down when trying to obtain > facets on a text field. The cluster consists of a few servers and have > around 200 million documents (small to medium). I am trying the faceting > first time on this field and it gives a 502 Bad Gateway error along with > some of the nodes going down and solr getting generally slow. > > The text field can have few words to a few thousand words. The Solr version > we are using is 4.3.0 and the Zookeeper version is 3.4.5. On checking the > logs, Zookeeper was giving an EndOfStreamException > > Any hint on this will be helpful. > > Thanks & Regards, >
Re: Tracing Solr Query Execution and Performance
Thanks, Jack. I will file a Jira then. What are the generic ways to improve/tune a Solr query if we know it's expensive? Does the analysis page help with this at all? On Wed, Feb 26, 2014 at 3:39 PM, Jack Krupansky wrote: > I don't recall seeing anything related to passing the debug/debugQuery > parameters on for inter-node shard queries and then add that to the > aggregated response (if debug/debugQuery was specified.) Sounds worth a > Jira. > > -- Jack Krupansky > > -Original Message- From: KNitin > Sent: Wednesday, February 26, 2014 5:25 PM > To: solr-user@lucene.apache.org > Subject: Tracing Solr Query Execution and Performance > > > Hi there > > I have a few very expensive queries (atleast thats what the QTime tells > me) that is causing high CPU problems on a few nodes. Is there a way where > I can "trace" or do an "explain" on the solr query to see where it spends > more time? More like profiling on a per sub query basis? > > I have tried using debug=timing as a part of the query and it gives me > stage level details (parsing, highlighting) but I need much more insights > into where a query is spending time on > > > Any help is much appreciated > > Thanks > Nitin >
Re: Fetching uniqueKey and other int quickly from documentCache?
You could try forcing things to go through function queries (via pseudo-fields): fl=field(id), field(myfield) If you're not requesting any stored fields, that *might* currently skip that step. -Yonik http://heliosearch.org - native off-heap filters and fieldcache for solr On Mon, Feb 24, 2014 at 9:58 PM, Gregg Donovan wrote: > We fetch a large number of documents -- 1000+ -- for each search. Each > request fetches only the uniqueKey or the uniqueKey plus one secondary > integer key. Despite this, we find that we spent a sizable amount of time > in SolrIndexSearcher#doc(int docId, Set fields). Time is spent > fetching the two stored fields, LZ4 decoding, etc. > > I would love to be able to tell Solr to always fetch these two fields from > memory. We have them both in the fieldCache so we're already spending the > RAM. I've seen this asked previously [1], so it seems like a fairly common > need, especially for distributed search. Any ideas? > > A few possible ideas I had: > > --Check FieldCache.html#getCacheEntries() before going to stored fields. > --Give the documentCache config a list of fields it should load from the > fieldCache > > > Having an in-memory mapping from docId->uniqueKey has come up for us > before. We've used a custom SolrCache maintaining that mapping to quickly > filter over personalized collections. Maybe the uniqueKey should be more > optimized out of the box? Perhaps a custom "uniqueKey" codec that also > maintained the docId->uniqueKey mapping in memory? > > --Gregg > > [1] http://search-lucene.com/m/oCUKJ1heHUU1
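A rough SolrJ equivalent of that fl (field names as in the example above; "server" is assumed to be an existing HttpSolrServer):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrQuery q = new SolrQuery("*:*");
q.setRows(1000);
q.setFields("field(id)", "field(myfield)");  // pseudo-fields via function queries
QueryResponse rsp = server.query(q);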
Re: Autocommit, opensearchers and ingestion
On Feb 26, 2014, at 5:24 PM, Joel Cohen wrote: > he's told me that he's doing commits in his SolrJ code > every 1000 items (configurable). Does that override my Solr server settings? Yes. Even if you have configured autocommit, explicit commits happen on demand and take effect immediately. Generally, clients should not send their own commits if you are using auto commit. If clients want to control this, it's best to set up hard auto commit and have clients use commitWithin for soft commits. It generally doesn't make sense for a client to make explicit hard commits with SolrCloud. - Mark http://about.me/markrmiller
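A minimal SolrJ sketch of the commitWithin approach (the URL is a placeholder; the same add() overload exists on CloudSolrServer):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-1");
server.add(doc, 60000);  // visible within 60s via a soft commit; no explicit commit() from the client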
Re: Search score problem using bf edismax
The bf parameter adds the value of a function query to the document score. Your example did not include a bf parameter.

-- Jack Krupansky

-Original Message- From: Ing. Andrea Vettori Sent: Wednesday, February 26, 2014 12:26 PM To: solr-user@lucene.apache.org Subject: Search score problem using bf edismax

Hi, I'm new to Solr and I'm trying to understand why I don't get what I want with the bf parameter. The query debug information follows. What I don't understand is why the result of the bf parameter is so low in score compared to matched fields. Can anyone help? Thank you

[response header and the stored fields of the three matching documents elided from this stripped XML dump; the debug section follows]

iphone cavo

(+((DisjunctionMaxQuery((categoria_s:iphone | titolo:iphone^2.0 | descrizione:iphon^0.5 | marchio_s:iphone | modello_s:iphone)) DisjunctionMaxQuery((categoria_s:cavo | titolo:cavo^2.0 | descrizione:cav^0.5 | marchio_s:cavo | modello_s:cavo)))~2) FunctionQuery(1.0/(1.0E-9*float(int(rank1_8))+1.0)))/no_coord

+(((categoria_s:iphone | titolo:iphone^2.0 | descrizione:iphon^0.5 | marchio_s:iphone | modello_s:iphone) (categoria_s:cavo | titolo:cavo^2.0 | descrizione:cav^0.5 | marchio_s:cavo | modello_s:cavo))~2) 1.0/(1.0E-9*float(int(rank1_8))+1.0)

0.8545726 = (MATCH) sum of:
  0.82827055 = (MATCH) sum of:
    0.33939165 = (MATCH) max of:
      0.33939165 = (MATCH) weight(modello_s:iphone in 24160) [DefaultSimilarity], result of:
        0.33939165 = score(doc=24160,freq=1.0 = termFreq=1.0), product of:
          0.21819489 = queryWeight, product of:
            8.295743 = idf(docFreq=170, maxDocs=252056)
            0.02630203 = queryNorm
          1.5554519 = fieldWeight in 24160, product of:
            1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
            8.295743 = idf(docFreq=170, maxDocs=252056)
            0.1875 = fieldNorm
Re: Parallel queries to Solr
Just send the queries to Solr in parallel using multiple threads in your application layer. Solr can handle multiple, parallel queries as separate, parallel requests, but does not have a way to bundle multiple queries on a single request. -- Jack Krupansky -Original Message- From: solr2020 Sent: Wednesday, February 26, 2014 4:40 PM To: solr-user@lucene.apache.org Subject: Parallel queries to Solr Hi, We want to send parallel queries(2-3 queries) in the same request from client to Solr. How to send the parallel queries from client side(using Solrj). Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Parallel-queries-to-Solr-tp4119959.html Sent from the Solr - User mailing list archive at Nabble.com.
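A hedged sketch of that client-side fan-out (queries and URL are placeholders; HttpSolrServer is threadsafe, so one instance can be shared):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

final HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
ExecutorService pool = Executors.newFixedThreadPool(3);
List<Future<QueryResponse>> futures = new ArrayList<Future<QueryResponse>>();
for (String qs : new String[] { "title:foo", "title:bar", "title:baz" }) {
    final SolrQuery query = new SolrQuery(qs);
    futures.add(pool.submit(new Callable<QueryResponse>() {
        public QueryResponse call() throws Exception {
            return server.query(query);  // each query runs as its own request
        }
    }));
}
for (Future<QueryResponse> f : futures) {
    QueryResponse rsp = f.get();  // blocks until done; rethrows any per-query failure
}
pool.shutdown();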
Re: How does Solr parse schema.xml?
There is an existing Solr admin service to do that, which is what the Solr Admin UI uses to support that feature. For example: curl "http://localhost:8983/solr/analysis/field?analysis.fieldname=features&analysis.fieldvalue=Hello+World.&indent=true" There are some examples in the next (unpublished) release of my book (that's one of them.) That handler returns all token details, but if you wanted to roll your own, start there. The handler is: org.apache.solr.handler.FieldAnalysisRequestHandler -- Jack Krupansky -Original Message- From: Software Dev Sent: Wednesday, February 26, 2014 7:00 PM To: solr-user@lucene.apache.org Subject: How does Solr parse schema.xml? Can anyone point me in the right direction. I'm trying to duplicate the functionality of the analysis request handler so we can wrap a service around it to return the terms given a string of text. We would like to read the same schema.xml file to configure the analyzer,tokenizer, etc but I can't seem to find the class that actually does the parsing of that file. Thanks
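Following up on that handler, a hedged SolrJ sketch of calling it (field name and value are from the curl example; the base URL is assumed):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
FieldAnalysisRequest req = new FieldAnalysisRequest("/analysis/field");
req.addFieldName("features");      // the field whose analysis chain to run
req.setFieldValue("Hello World.");
FieldAnalysisResponse rsp = req.process(server);
// rsp.getFieldNameAnalysis("features") then exposes the tokens per analysis phase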
Re: How does Solr parse schema.xml?
Check out org.apache.solr.schema.IndexSchema#readSchema(), which uses org.apache.solr.schema.FieldTypePluginLoader to parse analyzers. On Feb 26, 2014, at 7:00 PM, Software Dev wrote: > Can anyone point me in the right direction. I'm trying to duplicate the > functionality of the analysis request handler so we can wrap a service > around it to return the terms given a string of text. We would like to read > the same schema.xml file to configure the analyzer,tokenizer, etc but I > can't seem to find the class that actually does the parsing of that file. > > Thanks
How does Solr parse schema.xml?
Can anyone point me in the right direction. I'm trying to duplicate the functionality of the analysis request handler so we can wrap a service around it to return the terms given a string of text. We would like to read the same schema.xml file to configure the analyzer,tokenizer, etc but I can't seem to find the class that actually does the parsing of that file. Thanks
Re: Tracing Solr Query Execution and Performance
I don't recall seeing anything related to passing the debug/debugQuery parameters on for inter-node shard queries and then add that to the aggregated response (if debug/debugQuery was specified.) Sounds worth a Jira. -- Jack Krupansky -Original Message- From: KNitin Sent: Wednesday, February 26, 2014 5:25 PM To: solr-user@lucene.apache.org Subject: Tracing Solr Query Execution and Performance Hi there I have a few very expensive queries (atleast thats what the QTime tells me) that is causing high CPU problems on a few nodes. Is there a way where I can "trace" or do an "explain" on the solr query to see where it spends more time? More like profiling on a per sub query basis? I have tried using debug=timing as a part of the query and it gives me stage level details (parsing, highlighting) but I need much more insights into where a query is spending time on Any help is much appreciated Thanks Nitin
Re: Solr cloud: Faceting issue on text field
Are you sure you want to be faceting on a text field, as opposed to a string field? I mean, each term (word) from the text will be a separate facet value. How many facet values are you typically returning? How many unique terms occur in the facet field? -- Jack Krupansky -Original Message- From: David Miller Sent: Wednesday, February 26, 2014 2:06 PM To: solr-user@lucene.apache.org Subject: Solr cloud: Faceting issue on text field Hi, I am encountering an issue where Solr nodes goes down when trying to obtain facets on a text field. The cluster consists of a few servers and have around 200 million documents (small to medium). I am trying the faceting first time on this field and it gives a 502 Bad Gateway error along with some of the nodes going down and solr getting generally slow. The text field can have few words to a few thousand words. The Solr version we are using is 4.3.0 and the Zookeeper version is 3.4.5. On checking the logs, Zookeeper was giving an EndOfStreamException Any hint on this will be helpful. Thanks & Regards,
Re: SolrCloud Startup
Thanks, Shawn. I will try to upgrade Solr soon. Regarding firstSearcher: I think it does nothing now. I have configured it to use ExternalFileLoader, but the external file has no contents. Most of the queries hitting the collection are expensive tail queries. What would be your recommendation for warming the first searcher/new searcher? Thanks Nitin On Tue, Feb 25, 2014 at 4:12 PM, Shawn Heisey wrote: > On 2/25/2014 4:30 PM, KNitin wrote: > >> Jeff : Thanks. I have tried reload before but it is not reliable (atleast >> in 4.3.1). A few cores get initialized and few dont (show as just >> recovering or down) and hence had to move away from it. Is it a known >> issue >> in 4.3.1? >> > > With Solr 4.3.1, you are running into this bug with reloads under > SolrCloud: > > https://issues.apache.org/jira/browse/SOLR-4805 > > The only way to recover from this bug is to restart Solr.The bug is fixed > in 4.4.0 and later. > > > Shawn,Otis,Erick >> >> Yes I have reviewed the page before and have given 1/4 of my mem to JVM >> and the rest to RAM/Os Cache. (15 Gb heap and 45 G to rest. Totally 60G >> machine). I have also reviewed the tlog file and they are in the order of >> KB (4-10 or 30). I have SSD and the reads are hardly noticable (in the >> order of 100Kb during that time frame). I have also disabled swap on all >> machines >> >> Regarding firstSearcher, It is currently set to externalFileLoader. What >> is >> the use of first searcher? I havent played around with it >> > > I don't think it's a good idea to have extensive warming queries. I do > exactly one query in firstSearcher and newSearcher: a query for all > documents with zero rows, sorted on our most common sort field. This is > designed purely to preload the sort data into the FieldCache. > > Thanks, > Shawn > >
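A minimal solrconfig.xml sketch of that single warming query (the sort field "timestamp" is an assumption standing in for whatever your most common sort field is):

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="sort">timestamp desc</str>
    </lst>
  </arr>
</listener>

The same <lst> can be added under a newSearcher listener as well.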
Filter query exclusion with SolrJ
I use a SolrJ-based client to query Solr and I have been trying to construct HTTP requests where facet name/value pairs are excluded. The web interface I am working with has a refine-further functionality, which allows excluding one or more facet values. I have 3 facet fields: domain, content type and author, and I would like to be able to handle faceting by exclusion on each of them. For example, q=Dickens AND fq=-author:Dickens, Janet will construct the following HTTP request:

/solr/solrbase/select?q=Dickens&fq=-author:Dickens%2c+Janet&wt=json&indent=true

Whereas the facet values in the XML dump will look like: Dickens, Charles / Dickens, Sarah

So far, the Java implementation I am working with does not seem to handle filter query exclusion:

private HttpSolrServer solrServer;
solrServer = new HttpSolrServer("http://localhost:8983/solr/");

private static final String CONFIG_SOLR_FACET_FIELD = "facet_field";
private String[] _facetFields = new String[] {"author"};
private static final String CONFIG_SOLR_FACETS = "facets";

Element el = myParams.getChild(CONFIG_SOLR_FACETS);
_facetUse = el.getAttributeValue("useFacets", "true");
_facetMinCount = el.getAttributeValue("minCount", String.valueOf(1));
_facetLimit = el.getAttributeValue("limit", String.valueOf(20));
List vals = el.getChildren(CONFIG_SOLR_FACET_FIELD);
if (vals.size() > 0) {
  _facetFields = new String[vals.size()];
  for (int i=0; i < vals.size(); i++) {
    _facetFields[i] = ((Element)vals.get(i)).getTextTrim();
  }
}

SolrQuery query = new SolrQuery();
query.setQuery(qs);
List facetList = doc.getRootElement().getChildren("facet");
Iterator it = facetList.iterator();
while (it.hasNext()) {
  Element el = (Element)it.next();
  String name = el.getAttributeValue("name");
  String value = el.getTextTrim();
  if (name != null && value != null) {
    facets.add(name+":"+value);
  }
}
query.setQuery(qs).
  setFacet(Boolean.parseBoolean(_facetUse)).
  setFacetMinCount(Integer.parseInt(_facetMinCount)).
  setFacetLimit(Integer.parseInt(_facetLimit));
for (int i=0; i<_facetFields.length; i++) {
  query.addFacetField(_facetFields[i]);
}
for (int i=0; i<facets.size(); i++) {
  query.addFilterQuery((String)facets.get(i));
}

--
View this message in context: http://lucene.472066.n3.nabble.com/Filter-query-exclusion-with-SolrJ-tp4119974.html
Sent from the Solr - User mailing list archive at Nabble.com.
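One hedged way to express the exclusion itself: a negative filter query, with the raw value escaped so the embedded space survives (field and value are taken from the example above):

import org.apache.solr.client.solrj.util.ClientUtils;

// prefix with '-' to exclude; escapeQueryChars escapes the space (the comma is not special)
query.addFilterQuery("-author:" + ClientUtils.escapeQueryChars("Dickens, Janet"));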
Re: Solr cloud: Faceting issue on text field
Hi Greg, Thanks for the info. But the scenario in the link is a little bit different from my requirement. Regards, On Wed, Feb 26, 2014 at 4:46 PM, Greg Walters wrote: > I don't have much experience with faceting and its best practices though > I'm sure someone else on here can pipe up to address your questions there. > In the mean time have you read > http://sbdevel.wordpress.com/2013/04/16/you-are-faceting-itwrong/? > > > On Feb 26, 2014, at 3:26 PM, David Miller wrote: > > > Hi Greg, > > > > Yes, the memory and cpu spiked for that machine. Another issue I found in > > the log was "SolrException: Too many values for UnInvertedField faceting > on > > field". > > I was using the fc method. Will changing the method/params help? > > > > One thing I don't understand is that, the query was returning only a > single > > document, but the facet still seems to be having the issue. > > > > So, it should be technically possible to get facets on text field over > > 200-300 million docs at a decent speed, right? > > > > > > Regards, > > > > > > > > > > > > > > > > > > On Wed, Feb 26, 2014 at 2:13 PM, Greg Walters >wrote: > > > >> IIRC faceting uses copious amounts of memory; have you checked for GC > >> activity while the query is running? > >> > >> Thanks, > >> Greg > >> > >> On Feb 26, 2014, at 1:06 PM, David Miller > wrote: > >> > >>> Hi, > >>> > >>> I am encountering an issue where Solr nodes goes down when trying to > >> obtain > >>> facets on a text field. The cluster consists of a few servers and have > >>> around 200 million documents (small to medium). I am trying the > faceting > >>> first time on this field and it gives a 502 Bad Gateway error along > with > >>> some of the nodes going down and solr getting generally slow. > >>> > >>> The text field can have few words to a few thousand words. The Solr > >> version > >>> we are using is 4.3.0 and the Zookeeper version is 3.4.5. On checking > the > >>> logs, Zookeeper was giving an EndOfStreamException > >>> > >>> Any hint on this will be helpful. > >>> > >>> Thanks & Regards, > >> > >> > >
Tracing Solr Query Execution and Performance
Hi there I have a few very expensive queries (at least, that's what the QTime tells me) that are causing high CPU problems on a few nodes. Is there a way I can "trace" or do an "explain" on a Solr query to see where it spends more time? More like profiling on a per-sub-query basis? I have tried using debug=timing as part of the query and it gives me stage-level details (parsing, highlighting), but I need much more insight into where a query is spending its time. Any help is much appreciated Thanks Nitin
Re: Autocommit, opensearchers and ingestion
I read that blog too! Great info. I've bumped up the commit times and turned the ingestion up a bit as well. I've upped hard commit to 5 minutes and the soft commit to 60 seconds.

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
</autoSoftCommit>

I'm still getting the same issue. After speaking to the engineer working on the ingestion code, he's told me that he's doing commits in his SolrJ code every 1000 items (configurable). Does that override my Solr server settings? On Tue, Feb 25, 2014 at 3:27 PM, Erick Erickson wrote: > Gopal: I'm glad somebody noticed that blog! > > Joel: > For bulk loads it's a Good Thing to lengthen out > your soft autocommit interval. A lot. Every second > poor Solr is trying to open up a new searcher while > you're throwing lots of documents at it. That's what's > generating the "too many searchers" problem I'd > guess. Soft commits are less expensive than hard > commits with openSearcher=true (you're not doing this, > and you shouldn't be). But soft commits aren't free. > All the top-level caches are thrown away and autowarming > is performed. > > Also, I'd probably consider just leaving off the bit about > maxDocs in your hard commit, I find it rarely does all > that much good. After all, even if you have to replay the > transaction log, you're only talking 15 seconds here. > > Best, > Erick > > > On Tue, Feb 25, 2014 at 12:08 PM, Gopal Patwa > wrote: > > > This blog by Eric will help you to understand different commit option and > > transaction logs and it does provide some recommendation for ingestion > > process. > > > > > > > http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ > > > > > > On Tue, Feb 25, 2014 at 11:40 AM, Furkan KAMACI > >wrote: > > > Hi; > > > > > > You should read here: > > > > > > > > > http://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F > > > > > > On the other hand do you have 4 Zookeeper instances as a quorum? > > > > > > Thanks; > > > Furkan KAMACI > > > > > > > > > 2014-02-25 20:31 GMT+02:00 Joel Cohen : > > > > > > > Hi all, > > > > > > > > I'm working with Solr 4.6.1 and I'm trying to tune my ingestion > > process. > > > > The ingestion runs a big DB query and then does some ETL on it and > > > inserts > > > > via SolrJ. > > > > > > > > I have a 4 node cluster with 1 shard per node running in Tomcat with > > > > -Xmx=4096M. Each node has a separate instance of Zookeeper on it, plus > > > the > > > > ingestion server has one as well. The Solr servers have 8 cores and 64 > > Gb > > > > of total RAM. The ingestion server is a VM with 8 Gb and 2 cores. > > > > > > > > My ingestion code uses a few settings to control concurrency and batch > > > > size. > > > > > > > > solr.update.batchSize=500 > > > > solr.threadCount=4 > > > > > > > > With this setup, I'm getting a lot of errors and the ingestion is > > taking > > > > much longer than it should. > > > > > > > > Every so often during the ingestion I get these errors on the Solr > > > servers: > > > > > > > > WARN shard1 - 2014-02-25 11:18:34.341; > > > > org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay > > > > > > > > > > > > > > tlog{file=/usr/local/solr_shard1/productCatalog/data/tlog/tlog.0014074 > > > > refcount=2} active=true starting pos=776774 > > > > WARN shard1 - 2014-02-25 11:18:37.275; > > > > org.apache.solr.update.UpdateLog$LogReplayer; Log replay finished.
> > > > recoveryInfo=RecoveryInfo{adds=4065 deletes=0 deleteByQuery=0 > errors=0 > > > > positionOfStart=776774} > > > > WARN shard1 - 2014-02-25 11:18:37.960; > org.apache.solr.core.SolrCore; > > > > [productCatalog] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 > > > > WARN shard1 - 2014-02-25 11:18:37.961; > org.apache.solr.core.SolrCore; > > > > [productCatalog] Error opening new searcher. exceeded limit of > > > > maxWarmingSearchers=2, try again later. > > > > WARN shard1 - 2014-02-25 11:18:37.961; > org.apache.solr.core.SolrCore; > > > > [productCatalog] Error opening new searcher. exceeded limit of > > > > maxWarmingSearchers=2, try again later. > > > > ERROR shard1 - 2014-02-25 11:18:37.961; > > > > org.apache.solr.common.SolrException; > > > org.apache.solr.common.SolrException: > > > > Error opening new searcher. exceeded limit of maxWarmingSearchers=2, > > try > > > > again later. > > > > at > > org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1575) > > > > at > > org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1346) > > > > at > > > > > > > > > > > > > > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:592) > > > > > > > > I cut threads down to 1 and batchSize down to 100 and the errors go > > away, > > > > but the upload time jumps up by a factor of 25. > > > > > > > > My solrconfig.xml has: > > > > > > > > > > > >${solr.autoCommit.maxDo
Re: concurrentlinkedhashmap 1.2 vs 1.4
Done, created under the SolrCloud component; couldn't find a more appropriate one like Server - Java or something. Hope it has all the info needed. I could contribute to it sometime next week; I'm waiting for new PC parts from Amazon to have a proper after-work dev environment. Regards, Guido. On 26/02/14 20:58, Mark Miller wrote: Thanks Guido - any chance you could file a JIRA issue for this? - Mark http://about.me/markrmiller On Feb 26, 2014, at 6:28 AM, Guido Medina wrote: I think it would need Guava v16.0.1 to benefit from the ported code. Guido. On 26/02/14 11:20, Guido Medina wrote: As notes also stated at concurrentlinkedhashmap v1.4, the performance changes were ported to Guava (don't know to what version to be honest), so, wouldn't be better to use MapMaker builder? Regards, Guido. On 26/02/14 11:15, Guido Medina wrote: Hi, I noticed Solr is using concurrentlinkedhashmap v1.2 which is for Java 5, according to notes at https://code.google.com/p/concurrentlinkedhashmap/ version 1.4 has performance improvements compared to v1.2, isn't Solr 4.x designed against Java 6+? If so, wouldn't it benefit from v1.4? Regards, Guido.
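For what it's worth, a rough sketch of the Guava-side analogue (CacheBuilder absorbed MapMaker's cache methods; the sizes here are placeholders):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

Cache<String, Object> cache = CacheBuilder.newBuilder()
    .maximumSize(10000)      // bounded, like concurrentlinkedhashmap's capacity
    .concurrencyLevel(16)    // tune to the expected number of writer threads
    .build();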
Re: Solr cloud: Faceting issue on text field
I don't have much experience with faceting and its best practices though I'm sure someone else on here can pipe up to address your questions there. In the mean time have you read http://sbdevel.wordpress.com/2013/04/16/you-are-faceting-itwrong/? On Feb 26, 2014, at 3:26 PM, David Miller wrote: > Hi Greg, > > Yes, the memory and cpu spiked for that machine. Another issue I found in > the log was "SolrException: Too many values for UnInvertedField faceting on > field". > I was using the fc method. Will changing the method/params help? > > One thing I don't understand is that, the query was returning only a single > document, but the facet still seems to be having the issue. > > So, it should be technically possible to get facets on text field over > 200-300 million docs at a decent speed, right? > > > Regards, > > > > > > > > > On Wed, Feb 26, 2014 at 2:13 PM, Greg Walters wrote: > >> IIRC faceting uses copious amounts of memory; have you checked for GC >> activity while the query is running? >> >> Thanks, >> Greg >> >> On Feb 26, 2014, at 1:06 PM, David Miller wrote: >> >>> Hi, >>> >>> I am encountering an issue where Solr nodes goes down when trying to >> obtain >>> facets on a text field. The cluster consists of a few servers and have >>> around 200 million documents (small to medium). I am trying the faceting >>> first time on this field and it gives a 502 Bad Gateway error along with >>> some of the nodes going down and solr getting generally slow. >>> >>> The text field can have few words to a few thousand words. The Solr >> version >>> we are using is 4.3.0 and the Zookeeper version is 3.4.5. On checking the >>> logs, Zookeeper was giving an EndOfStreamException >>> >>> Any hint on this will be helpful. >>> >>> Thanks & Regards, >> >>
Parallel queries to Solr
Hi, We want to send parallel queries(2-3 queries) in the same request from client to Solr. How to send the parallel queries from client side(using Solrj). Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Parallel-queries-to-Solr-tp4119959.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr cloud: Faceting issue on text field
Hi Greg, Yes, the memory and cpu spiked for that machine. Another issue I found in the log was "SolrException: Too many values for UnInvertedField faceting on field". I was using the fc method. Will changing the method/params help? One thing I don't understand is that, the query was returning only a single document, but the facet still seems to be having the issue. So, it should be technically possible to get facets on text field over 200-300 million docs at a decent speed, right? Regards, On Wed, Feb 26, 2014 at 2:13 PM, Greg Walters wrote: > IIRC faceting uses copious amounts of memory; have you checked for GC > activity while the query is running? > > Thanks, > Greg > > On Feb 26, 2014, at 1:06 PM, David Miller wrote: > > > Hi, > > > > I am encountering an issue where Solr nodes goes down when trying to > obtain > > facets on a text field. The cluster consists of a few servers and have > > around 200 million documents (small to medium). I am trying the faceting > > first time on this field and it gives a 502 Bad Gateway error along with > > some of the nodes going down and solr getting generally slow. > > > > The text field can have few words to a few thousand words. The Solr > version > > we are using is 4.3.0 and the Zookeeper version is 3.4.5. On checking the > > logs, Zookeeper was giving an EndOfStreamException > > > > Any hint on this will be helpful. > > > > Thanks & Regards, > >
Re: concurrentlinkedhashmap 1.2 vs 1.4
Thanks Guido - any chance you could file a JIRA issue for this? - Mark http://about.me/markrmiller On Feb 26, 2014, at 6:28 AM, Guido Medina wrote: > I think it would need Guava v16.0.1 to benefit from the ported code. > > Guido. > > On 26/02/14 11:20, Guido Medina wrote: >> As notes also stated at concurrentlinkedhashmap v1.4, the performance >> changes were ported to Guava (don't know to what version to be honest), so, >> wouldn't be better to use MapMaker builder? >> >> Regards, >> >> Guido. >> >> On 26/02/14 11:15, Guido Medina wrote: >>> Hi, >>> >>> I noticed Solr is using concurrentlinkedhashmap v1.2 which is for Java 5, >>> according to notes at https://code.google.com/p/concurrentlinkedhashmap/ >>> version 1.4 has performance improvements compared to v1.2, isn't Solr 4.x >>> designed against Java 6+? If so, wouldn't it benefit from v1.4? >>> >>> Regards, >>> >>> Guido. >> >
Re: Cluster state ranges are all null after reboot
Thanks Shalin, that code might be helpful... do you know if there is a reliable way to line up the ranges with the shard numbers? When the problem occurred we had 80 million documents already in the index, and could not issue even a basic 'deleteById' call. I'm tempted to assume they are just assigned linearly since our Test and Prod clusters both look to work that way now, but I can't be sure whether that is by design or just happenstance of boot order. And no, unfortunately we have not been able to reproduce this issue consistently despite trying a number of different things such as graceless stop/start and screwing with the underlying WAR file (which is what we thought puppet might be doing). The problem has occurred twice since, but always in our Test environment. The fact that Test has only a single replica per shard is the most likely culprit for me, but as mentioned, even gracelessly killing the last replica in the cluster seems to leave the range set correctly in clusterstate when we test it in isolation. In production (45 JVMs, 15 shards with 3 replicas each) we've never seen the problem, despite a similar number of rollouts for version changes etc. Ta, Greg On 26 February 2014 23:46, Shalin Shekhar Mangar wrote: > If you have 15 shards and assuming that you've never used shard > splitting, you can calculate the shard ranges by using new > CompositeIdRouter().partitionRange(15, new > CompositeIdRouter().fullRange()) > > This gives me: > [8000-9110, 9111-a221, a222-b332, > b333-c443, c444-d554, d555-e665, > e666-f776, f777-887, 888-1998, > 1999-2aa9, 2aaa-3bba, 3bbb-4ccb, > 4ccc-5ddc, 5ddd-6eed, 6eee-7fff] > > Have you done any more investigation into why this happened? Anything > strange in the logs? Are you able to reproduce this in a test > environment? > > On Wed, Feb 19, 2014 at 5:16 AM, Greg Pendlebury > wrote: > > We've got a 15 shard cluster spread across 3 hosts. This morning our > puppet > > software rebooted them all and afterwards the 'range' for each shard has > > become null in zookeeper. Is there any way to restore this value short of > > rebuilding a fresh index? > > > > I've read various questions from people with a similar problem, although > in > > those cases it is usually a single shard that has become null allowing > them > > to infer what the value should be and manually fix it in ZK. In this > case I > > have no idea what the ranges should be. This is our test cluster, and > > checking production I can see that the ranges don't appear to be > > predictable based on the shard number. > > > > I'm also not certain why it even occurred. Our test cluster only has a > > single replica per shard, so when a JVM is rebooted the cluster is > > unavailable... would that cause this? Production has 3 replicas so we can > > do rolling reboots. > > > > -- > Regards, > Shalin Shekhar Mangar. >
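For reference, a runnable sketch of Shalin's snippet against the 4.x solrj jar (whether the ranges line up with shard numbers linearly is exactly the open question above):

import java.util.List;
import org.apache.solr.common.cloud.CompositeIdRouter;
import org.apache.solr.common.cloud.DocRouter.Range;

CompositeIdRouter router = new CompositeIdRouter();
List<Range> ranges = router.partitionRange(15, router.fullRange());
for (int i = 0; i < ranges.size(); i++) {
    System.out.println("shard" + (i + 1) + " -> " + ranges.get(i));
}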
Re: Solr Permgen Exceptions when creating/removing cores
Thanks Timothy, I gave these a try and -XX:+CMSPermGenSweepingEnabled seemed to cause the error to happen more quickly. With this option on, it didn't seem to do the intermittent garbage collection that delayed the issue when the option was off. I was already using a max of 512MB, and I can reproduce it with it set this high or even higher. Right now, because of how we have this implemented, increasing it to something high just delays the problem :/ I would really appreciate anything else you could suggest. On Wed, Feb 26, 2014 at 3:19 PM, Tim Potter wrote: > Hi Josh, > > Try adding: -XX:+CMSPermGenSweepingEnabled as I think for some VM > versions, permgen collection was disabled by default. > > Also, I use: -XX:MaxPermSize=512m -XX:PermSize=256m with Solr, so 64M may > be too small. > > > Timothy Potter > Sr. Software Engineer, LucidWorks > www.lucidworks.com > > > From: Josh > Sent: Wednesday, February 26, 2014 12:27 PM > To: solr-user@lucene.apache.org > Subject: Solr Permgen Exceptions when creating/removing cores > > We are using the Bitnami version of Solr 4.6.0-1 on a 64bit windows > installation with 64bit Java 1.7U51 and we are seeing consistent issues > with PermGen exceptions. We have the permgen configured to be 512MB. > Bitnami ships with a 32bit version of Java for windows and we are replacing > it with a 64bit version. > > Passed in Java Options: > > -XX:MaxPermSize=64M > -Xms3072M > -Xmx6144M > -XX:+UseParNewGC > -XX:+UseConcMarkSweepGC > -XX:CMSInitiatingOccupancyFraction=75 > -XX:+CMSClassUnloadingEnabled > -XX:NewRatio=3 > > -XX:MaxTenuringThreshold=8 > > This is our use case: > > We have what we call a database core which remains fairly static and > contains the imported contents of a table from SQL server. We then have > user cores which contain the record ids of results from a text search > outside of Solr. We then query for the data we want from the database core > and limit the results to the content of the user core. This allows us to > combine facet data from Solr with the search results from another engine. > We are creating the user cores on demand and removing them when the user > logs out. > > Our issue is the constant creation and removal of user cores combined with > the constant importing seems to push us over our PermGen limit. The user > cores are removed at the end of every session and as a test I made an > application that would loop creating the user core, import a set of data to > it, query the database core using it as a limiter and then remove the user > core. My expectation was in this scenario that all the permgen associated > with that user cores would be freed upon it's unload and allow permgen to > reclaim that memory during a garbage collection. This was not the case, it > would constantly go up until the application would exhaust the memory. > > I also investigated whether the there was a connection between the two > cores left behind because I was joining them together in a query but even > unloading the database core after unloading all the user cores won't > prevent the limit from being hit or any memory to be garbage collected from > Solr. > > Is this a known issue with creating and unloading a large number of cores? > Could it be configuration based for the core? Is there something other than > unloading that needs to happen to free the references? > > Thanks > > Notes: I've tried using tools to determine if it's a leak within Solr such > as Plumbr and my activities turned up nothing. >
RE: Solr Permgen Exceptions when creating/removing cores
Hi Josh, Try adding: -XX:+CMSPermGenSweepingEnabled as I think for some VM versions, permgen collection was disabled by default. Also, I use: -XX:MaxPermSize=512m -XX:PermSize=256m with Solr, so 64M may be too small. Timothy Potter Sr. Software Engineer, LucidWorks www.lucidworks.com From: Josh Sent: Wednesday, February 26, 2014 12:27 PM To: solr-user@lucene.apache.org Subject: Solr Permgen Exceptions when creating/removing cores We are using the Bitnami version of Solr 4.6.0-1 on a 64bit windows installation with 64bit Java 1.7U51 and we are seeing consistent issues with PermGen exceptions. We have the permgen configured to be 512MB. Bitnami ships with a 32bit version of Java for windows and we are replacing it with a 64bit version. Passed in Java Options: -XX:MaxPermSize=64M -Xms3072M -Xmx6144M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSClassUnloadingEnabled -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 This is our use case: We have what we call a database core which remains fairly static and contains the imported contents of a table from SQL server. We then have user cores which contain the record ids of results from a text search outside of Solr. We then query for the data we want from the database core and limit the results to the content of the user core. This allows us to combine facet data from Solr with the search results from another engine. We are creating the user cores on demand and removing them when the user logs out. Our issue is the constant creation and removal of user cores combined with the constant importing seems to push us over our PermGen limit. The user cores are removed at the end of every session and as a test I made an application that would loop creating the user core, import a set of data to it, query the database core using it as a limiter and then remove the user core. My expectation was in this scenario that all the permgen associated with that user cores would be freed upon it's unload and allow permgen to reclaim that memory during a garbage collection. This was not the case, it would constantly go up until the application would exhaust the memory. I also investigated whether the there was a connection between the two cores left behind because I was joining them together in a query but even unloading the database core after unloading all the user cores won't prevent the limit from being hit or any memory to be garbage collected from Solr. Is this a known issue with creating and unloading a large number of cores? Could it be configuration based for the core? Is there something other than unloading that needs to happen to free the references? Thanks Notes: I've tried using tools to determine if it's a leak within Solr such as Plumbr and my activities turned up nothing.
CollapsingQParserPlugin is slower than standard Solr field grouping in Solr 4.6.1
I notice that in Solr 4.6.1 CollapsingQParserPlugin is slower than standard Solr field grouping. I have a Solr index of 1 docs, with a signature field which is a Solr dedup field of the doc content. The majority of the signatures are unique. With standard Solr field grouping, http://localhost:4462/solr/collection1/select?q=*:*&group.ngroups=true&group=true&group.field=signature&group.main=true&rows=1&fl=id I get an average QTime of 78 after Solr has warmed up. Using CollapsingQParserPlugin, http://localhost:4462/solr/collection1/select?q=*:*&fq={!collapse%20field=signature}&rows=1&fl=id I get an average QTime of 89.2. In fact, CollapsingQParserPlugin QTime is always slower than standard Solr field grouping. How can I get CollapsingQParserPlugin to run faster? Joe
Solr Permgen Exceptions when creating/removing cores
We are using the Bitnami version of Solr 4.6.0-1 on a 64bit windows installation with 64bit Java 1.7U51 and we are seeing consistent issues with PermGen exceptions. We have the permgen configured to be 512MB. Bitnami ships with a 32bit version of Java for windows and we are replacing it with a 64bit version. Passed in Java Options: -XX:MaxPermSize=64M -Xms3072M -Xmx6144M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSClassUnloadingEnabled -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 This is our use case: We have what we call a database core which remains fairly static and contains the imported contents of a table from SQL server. We then have user cores which contain the record ids of results from a text search outside of Solr. We then query for the data we want from the database core and limit the results to the content of the user core. This allows us to combine facet data from Solr with the search results from another engine. We are creating the user cores on demand and removing them when the user logs out. Our issue is that the constant creation and removal of user cores, combined with the constant importing, seems to push us over our PermGen limit. The user cores are removed at the end of every session, and as a test I made an application that would loop creating the user core, import a set of data to it, query the database core using it as a limiter and then remove the user core. My expectation in this scenario was that all the permgen associated with those user cores would be freed upon their unload, allowing permgen to reclaim that memory during a garbage collection. This was not the case; it would constantly go up until the application exhausted the memory. I also investigated whether there was a connection between the two cores left behind because I was joining them together in a query, but even unloading the database core after unloading all the user cores didn't prevent the limit from being hit or any memory from being garbage collected from Solr. Is this a known issue with creating and unloading a large number of cores? Could it be configuration based for the core? Is there something other than unloading that needs to happen to free the references? Thanks Note: I've tried using tools such as Plumbr to determine if it's a leak within Solr, and my activities turned up nothing.
Re: Solr cloud: Faceting issue on text field
IIRC faceting uses copious amounts of memory; have you checked for GC activity while the query is running? Thanks, Greg On Feb 26, 2014, at 1:06 PM, David Miller wrote: > Hi, > > I am encountering an issue where Solr nodes goes down when trying to obtain > facets on a text field. The cluster consists of a few servers and have > around 200 million documents (small to medium). I am trying the faceting > first time on this field and it gives a 502 Bad Gateway error along with > some of the nodes going down and solr getting generally slow. > > The text field can have few words to a few thousand words. The Solr version > we are using is 4.3.0 and the Zookeeper version is 3.4.5. On checking the > logs, Zookeeper was giving an EndOfStreamException > > Any hint on this will be helpful. > > Thanks & Regards,
Solr cloud: Faceting issue on text field
Hi, I am encountering an issue where Solr nodes go down when trying to obtain facets on a text field. The cluster consists of a few servers and has around 200 million documents (small to medium). I am trying faceting on this field for the first time, and it gives a 502 Bad Gateway error along with some of the nodes going down and Solr getting generally slow. The text field can have a few words to a few thousand words. The Solr version we are using is 4.3.0 and the Zookeeper version is 3.4.5. On checking the logs, Zookeeper was giving an EndOfStreamException. Any hint on this will be helpful. Thanks & Regards,
Re: SolrCloud: How to replicate shard of another machine for failover?
Hi; As Daniel mentioned, it is just for the "first time" and not a suggested approach. However, if you follow that way you can assign shards to machines. On the other hand, you cannot change it later with the same procedure. Thanks; Furkan KAMACI 2014-02-26 15:53 GMT+02:00 Daniel Collins : > This is only true the *first* time you start the cluster. As mentioned > earlier, the correct way to assign shards to cores is to use the collection > API. Failing that, you can start cores in a determined order, and the > cores will assign themselves a shard/replica when they first start up. > From that point on, that mapping is defined in clusterstate.json, and will > persist until you change it (delete cluster state or use collection/core > API to move/remove a core. It is a kludgy approach, that's why generally > it isn't recommended for new starters to use, but by starting the first > cores in a particular order you can get exactly the distribution you want. > > The collection API is good generally because it has some logic to > distribute shards across machines, but you can't be very specific with it, > you can't say "I want shard 1 on machine A, and its replicas on machines b, > c & d). So we use the "start order" mechanism for our production systems, > because we want to place shards on specific machines., We have 256 shards, > so we want to know exactly what set of cores & machines is required in > order to have a "full collection" of data. As long as you are aware of the > limitations of each mechanism, both work. > > > On 26 February 2014 10:26, Oliver Schrenk > wrote: > > > There is a round robin process when assigning nodes at cluster. If you > > want > > > to achieve what you want you should change your Solr start up order. > > > > Well that is just weird. To bring a cluster to a reproducible state, I > > have to bring the whole cluster down, and start it up again in a specific > > order? > > > > What order do you suggest, to have a failover mechanism? >
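For the record, a hedged example of the Collections API call being discussed (Solr 4.6-era syntax; hosts and the collection name are placeholders):

http://hostA:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=2&replicationFactor=2&createNodeSet=hostA:8983_solr,hostB:8983_solr,hostC:8983_solr,hostD:8983_solr

createNodeSet restricts which nodes receive cores, but as noted above, it cannot pin a specific shard or replica to a specific machine.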
Re: Format of the spellcheck.q used to get suggestions in current filter
I'm afraid I have to manually retrieve all docs for the suggested query in the current filter (category:Cars&q=Renau) and count them myself to get the frequency within a given filter. 2014-02-26 19:09 GMT+01:00 Hakim Benoudjit : > It seems that suggestion frequency stays the same with filter query (fq). > > > 2014-02-26 19:05 GMT+01:00 Ahmet Arslan : > > >> >> Just a guess, what happens when you use filter query? >> fq=category:Cars&q=Renau >> >> >> >> On Wednesday, February 26, 2014 7:38 PM, Hakim Benoudjit < >> h.benoud...@gmail.com> wrote: >> I mean that: I want suggestions frequency to count only document in >> current >> query (solr 'q'). My issue is even if suggestion 'word' is correct; the >> frequency is relative to all index and not only to the current query. >> Suppose that I have 'q = category:Cars', in this case, if my searched >> query >> is 'Renau' (for cars model), suggestions frequence should only count cars >> having the name 'Renault', not persons >> >> >> >> 2014-02-26 18:07 GMT+01:00 Ahmet Arslan : >> >> > Hi, >> > >> > What do you mean by "suggestions only for current category" ? Do you >> mean >> > that suggested word(s) should return non-zero hits for that category? >> > >> > Ahmet >> > >> > >> > >> > On Wednesday, February 26, 2014 6:36 PM, Hakim Benoudjit < >> > h.benoud...@gmail.com> wrote: >> > @Jack Krupansky, here is the important portion of my solrconfig.xml: >> > >> > >> > default >> > title >> > solr.DirectSolrSpellChecker >> > >> > internal >> > >> > 0.5 >> > >> > 2 >> > >> > 1 >> > >> > 5 >> > >> > 4 >> > >> > 0.01 >> > >> > >> > >> > As you guess 'title' field is the one I'm searching & the one I'm >> building >> > my suggestions from. >> > >> > @Ahmet Arsian: I understand that `spellcheck.q` doesnt resolve my >> issues, >> > cause I want to get suggestions only for current category. >> > >> > >> > >> > 2014-02-26 17:07 GMT+01:00 Ahmet Arslan : >> > >> > > Hi Hakim, >> > > >> > > According to wiki spellcheck.q is intended to use with 'spelling >> ready' >> > > query/input. >> > > 'spelling ready' means it does not contain field names, AND, OR, etc. >> > > Something like should work. spellcheck.q=value1 >> value2&q=+field1:value1 >> > > +field2:value2 >> > > >> > > Ahmet >> > > >> > > >> > > On Wednesday, February 26, 2014 5:51 PM, Hakim Benoudjit < >> > > h.benoud...@gmail.com> wrote: >> > > I have some difficulties to use `spellcheck.q` to get only suggestions >> > for >> > > current query. >> > > >> > > When I set `spellcheck.q` to lucene query format (field1:value1 AND >> > > field2:value2), it doesnt return me any result. >> > > >> > > I have supposed that the value stored in `spellcheck.q` is just the >> value >> > > of ``spellcheck` component default field, but it returns an error in >> this >> > > case. >> > > >> > > Any help please? >> > > >> > > >> > >> > >> >> >
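A sketch of an alternative that may avoid fetching the documents manually: spellcheck collation re-runs candidate corrections as real queries, filters included, so the reported hit counts respect the fq. The handler path and values below are assumptions for illustration; the parameters themselves are standard Solr 4.x spellcheck options.

  http://host:8983/solr/select?q=Renau&fq=category:Cars
      &spellcheck=true
      &spellcheck.collate=true
      &spellcheck.maxCollationTries=5
      &spellcheck.collateExtendedResults=true

With spellcheck.collateExtendedResults, each collation comes back with a hits count computed against the full query, fq and all, unlike the raw term frequencies taken from the spellcheck dictionary.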
Re: Format of the spellcheck.q used to get suggestions in current filter
It seems that suggestion frequency stays the same with filter query (fq). 2014-02-26 19:05 GMT+01:00 Ahmet Arslan : > > > Just a guess, what happens when you use filter query? > fq=category:Cars&q=Renau > > > > On Wednesday, February 26, 2014 7:38 PM, Hakim Benoudjit < > h.benoud...@gmail.com> wrote: > I mean that: I want suggestions frequency to count only document in current > query (solr 'q'). My issue is even if suggestion 'word' is correct; the > frequency is relative to all index and not only to the current query. > Suppose that I have 'q = category:Cars', in this case, if my searched query > is 'Renau' (for cars model), suggestions frequence should only count cars > having the name 'Renault', not persons > > > > 2014-02-26 18:07 GMT+01:00 Ahmet Arslan : > > > Hi, > > > > What do you mean by "suggestions only for current category" ? Do you mean > > that suggested word(s) should return non-zero hits for that category? > > > > Ahmet > > > > > > > > On Wednesday, February 26, 2014 6:36 PM, Hakim Benoudjit < > > h.benoud...@gmail.com> wrote: > > @Jack Krupansky, here is the important portion of my solrconfig.xml: > > > > > > default > > title > > solr.DirectSolrSpellChecker > > > > internal > > > > 0.5 > > > > 2 > > > > 1 > > > > 5 > > > > 4 > > > > 0.01 > > > > > > > > As you guess 'title' field is the one I'm searching & the one I'm > building > > my suggestions from. > > > > @Ahmet Arsian: I understand that `spellcheck.q` doesnt resolve my issues, > > cause I want to get suggestions only for current category. > > > > > > > > 2014-02-26 17:07 GMT+01:00 Ahmet Arslan : > > > > > Hi Hakim, > > > > > > According to wiki spellcheck.q is intended to use with 'spelling ready' > > > query/input. > > > 'spelling ready' means it does not contain field names, AND, OR, etc. > > > Something like should work. spellcheck.q=value1 value2&q=+field1:value1 > > > +field2:value2 > > > > > > Ahmet > > > > > > > > > On Wednesday, February 26, 2014 5:51 PM, Hakim Benoudjit < > > > h.benoud...@gmail.com> wrote: > > > I have some difficulties to use `spellcheck.q` to get only suggestions > > for > > > current query. > > > > > > When I set `spellcheck.q` to lucene query format (field1:value1 AND > > > field2:value2), it doesnt return me any result. > > > > > > I have supposed that the value stored in `spellcheck.q` is just the > value > > > of ``spellcheck` component default field, but it returns an error in > this > > > case. > > > > > > Any help please? > > > > > > > > > > > >
Re: Format of the spellcheck.q used to get suggestions in current filter
Just a guess, what happens when you use filter query? fq=category:Cars&q=Renau On Wednesday, February 26, 2014 7:38 PM, Hakim Benoudjit wrote: I mean that: I want suggestions frequency to count only document in current query (solr 'q'). My issue is even if suggestion 'word' is correct; the frequency is relative to all index and not only to the current query. Suppose that I have 'q = category:Cars', in this case, if my searched query is 'Renau' (for cars model), suggestions frequence should only count cars having the name 'Renault', not persons 2014-02-26 18:07 GMT+01:00 Ahmet Arslan : > Hi, > > What do you mean by "suggestions only for current category" ? Do you mean > that suggested word(s) should return non-zero hits for that category? > > Ahmet > > > > On Wednesday, February 26, 2014 6:36 PM, Hakim Benoudjit < > h.benoud...@gmail.com> wrote: > @Jack Krupansky, here is the important portion of my solrconfig.xml: > > > default > title > solr.DirectSolrSpellChecker > > internal > > 0.5 > > 2 > > 1 > > 5 > > 4 > > 0.01 > > > > As you guess 'title' field is the one I'm searching & the one I'm building > my suggestions from. > > @Ahmet Arsian: I understand that `spellcheck.q` doesnt resolve my issues, > cause I want to get suggestions only for current category. > > > > 2014-02-26 17:07 GMT+01:00 Ahmet Arslan : > > > Hi Hakim, > > > > According to wiki spellcheck.q is intended to use with 'spelling ready' > > query/input. > > 'spelling ready' means it does not contain field names, AND, OR, etc. > > Something like should work. spellcheck.q=value1 value2&q=+field1:value1 > > +field2:value2 > > > > Ahmet > > > > > > On Wednesday, February 26, 2014 5:51 PM, Hakim Benoudjit < > > h.benoud...@gmail.com> wrote: > > I have some difficulties to use `spellcheck.q` to get only suggestions > for > > current query. > > > > When I set `spellcheck.q` to lucene query format (field1:value1 AND > > field2:value2), it doesnt return me any result. > > > > I have supposed that the value stored in `spellcheck.q` is just the value > > of ``spellcheck` component default field, but it returns an error in this > > case. > > > > Any help please? > > > > > >
Re: Format of the spellcheck.q used to get suggestions in current filter
I mean that I want the suggestion frequency to count only documents in the current query (Solr 'q'). My issue is that even if the suggested word is correct, the frequency is relative to the whole index and not only to the current query. Suppose I have 'q = category:Cars'; in this case, if my searched term is 'Renau' (a car model), the suggestion frequency should only count cars named 'Renault', not persons. 2014-02-26 18:07 GMT+01:00 Ahmet Arslan : > Hi, > > What do you mean by "suggestions only for current category" ? Do you mean > that suggested word(s) should return non-zero hits for that category? > > Ahmet > > > > On Wednesday, February 26, 2014 6:36 PM, Hakim Benoudjit < > h.benoud...@gmail.com> wrote: > @Jack Krupansky, here is the important portion of my solrconfig.xml: > > > default > title > solr.DirectSolrSpellChecker > > internal > > 0.5 > > 2 > > 1 > > 5 > > 4 > > 0.01 > > > > As you guess 'title' field is the one I'm searching & the one I'm building > my suggestions from. > > @Ahmet Arsian: I understand that `spellcheck.q` doesnt resolve my issues, > cause I want to get suggestions only for current category. > > > > 2014-02-26 17:07 GMT+01:00 Ahmet Arslan : > > > Hi Hakim, > > > > According to wiki spellcheck.q is intended to use with 'spelling ready' > > query/input. > > 'spelling ready' means it does not contain field names, AND, OR, etc. > > Something like should work. spellcheck.q=value1 value2&q=+field1:value1 > > +field2:value2 > > > > Ahmet > > > > > > On Wednesday, February 26, 2014 5:51 PM, Hakim Benoudjit < > > h.benoud...@gmail.com> wrote: > > I have some difficulties to use `spellcheck.q` to get only suggestions > for > > current query. > > > > When I set `spellcheck.q` to lucene query format (field1:value1 AND > > field2:value2), it doesnt return me any result. > > > > I have supposed that the value stored in `spellcheck.q` is just the value > > of ``spellcheck` component default field, but it returns an error in this > > case. > > > > Any help please? > > > > > >
Search score problem using bf edismax
Hi, I'm new to Solr and I'm trying to understand why I don't get what I want with the bf parameter. The query debug information follows. What I don't understand is why the result of the bf parameter is so low in score compared to the matched fields. Can anyone help? Thank you

[The XML response did not survive the archive intact; it held the response header (status 0, QTime 19) and the stored fields of the three matching car-accessory documents returned for the query "iphone cavo". The recoverable debug sections follow.]

Parsed query (the bf function is attached as an extra optional clause):

  (+((DisjunctionMaxQuery((categoria_s:iphone | titolo:iphone^2.0 | descrizione:iphon^0.5 | marchio_s:iphone | modello_s:iphone)) DisjunctionMaxQuery((categoria_s:cavo | titolo:cavo^2.0 | descrizione:cav^0.5 | marchio_s:cavo | modello_s:cavo)))~2) FunctionQuery(1.0/(1.0E-9*float(int(rank1_8))+1.0)))/no_coord

Boost function:

  1.0/(1.0E-9*float(int(rank1_8))+1.0)

Score explanation (excerpt; truncated in the original message):

  0.8545726 = (MATCH) sum of:
    0.82827055 = (MATCH) sum of:
      0.33939165 = (MATCH) max of:
        0.33939165 = (MATCH) weight(modello_s:iphone in 24160) [DefaultSimilarity], result of:
          0.33939165 = score(doc=24160,freq=1.0 = termFreq=1.0), product of:
            0.21819489 = queryWeight, product of:
              8.295743 = idf(docFreq=170, maxDocs=252056)
              0.02630203 = queryNorm
            1.5554519 = fieldWeight in 24160, product of:
              1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
              8.295743 = idf(docFreq=170, maxDocs=252056)
              0.1875 = fieldNorm(doc=24160)
      0.48887888 = (MATCH) max of: 0. [the rest of the explain was truncated]
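One observation on the low bf contribution visible in the parsed query above: bf adds the function's value to the tf-idf score as just another optional clause, and a reciprocal capped at 1.0 is easily dwarfed by field matches scoring in the same range or higher. On edismax there is also the multiplicative boost parameter, which scales the whole query score instead. A hedged illustration using this thread's function (written here with recip, which is equivalent to the FunctionQuery shown):

  # additive: the function result is summed into the score (what the debug shows for bf)
  bf=recip(rank1_8,1.0e-9,1,1)

  # multiplicative: the query score is multiplied by the function result
  boost=recip(rank1_8,1.0e-9,1,1)

Whether multiplicative boosting matches the intent here is a judgment call, but it usually behaves more predictably relative to the match scores.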
Re: Format of the spellcheck.q used to get suggestions in current filter
Hi, What do you mean by "suggestions only for current category" ? Do you mean that suggested word(s) should return non-zero hits for that category? Ahmet On Wednesday, February 26, 2014 6:36 PM, Hakim Benoudjit wrote: @Jack Krupansky, here is the important portion of my solrconfig.xml: default title solr.DirectSolrSpellChecker internal 0.5 2 1 5 4 0.01 As you guess 'title' field is the one I'm searching & the one I'm building my suggestions from. @Ahmet Arsian: I understand that `spellcheck.q` doesnt resolve my issues, cause I want to get suggestions only for current category. 2014-02-26 17:07 GMT+01:00 Ahmet Arslan : > Hi Hakim, > > According to wiki spellcheck.q is intended to use with 'spelling ready' > query/input. > 'spelling ready' means it does not contain field names, AND, OR, etc. > Something like should work. spellcheck.q=value1 value2&q=+field1:value1 > +field2:value2 > > Ahmet > > > On Wednesday, February 26, 2014 5:51 PM, Hakim Benoudjit < > h.benoud...@gmail.com> wrote: > I have some difficulties to use `spellcheck.q` to get only suggestions for > current query. > > When I set `spellcheck.q` to lucene query format (field1:value1 AND > field2:value2), it doesnt return me any result. > > I have supposed that the value stored in `spellcheck.q` is just the value > of ``spellcheck` component default field, but it returns an error in this > case. > > Any help please? > >
Re: Format of the spellcheck.q used to get suggestions in current filter
@Jack Krupansky, here is the important portion of my solrconfig.xml:

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">title</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>

As you can guess, 'title' is the field I'm searching and the one I'm building my suggestions from. @Ahmet Arslan: I understand that `spellcheck.q` doesn't resolve my issue, because I want to get suggestions only for the current category. 2014-02-26 17:07 GMT+01:00 Ahmet Arslan : > Hi Hakim, > > According to wiki spellcheck.q is intended to use with 'spelling ready' > query/input. > 'spelling ready' means it does not contain field names, AND, OR, etc. > Something like should work. spellcheck.q=value1 value2&q=+field1:value1 > +field2:value2 > > Ahmet > > > On Wednesday, February 26, 2014 5:51 PM, Hakim Benoudjit < > h.benoud...@gmail.com> wrote: > I have some difficulties to use `spellcheck.q` to get only suggestions for > current query. > > When I set `spellcheck.q` to lucene query format (field1:value1 AND > field2:value2), it doesnt return me any result. > > I have supposed that the value stored in `spellcheck.q` is just the value > of ``spellcheck` component default field, but it returns an error in this > case. > > Any help please? > >
Re: Format of the spellcheck.q used to get suggestions in current filter
Hi Hakim, According to the wiki, spellcheck.q is intended to be used with 'spelling ready' query/input. 'Spelling ready' means it does not contain field names, AND, OR, etc. Something like this should work: spellcheck.q=value1 value2&q=+field1:value1 +field2:value2 Ahmet On Wednesday, February 26, 2014 5:51 PM, Hakim Benoudjit wrote: I have some difficulties to use `spellcheck.q` to get only suggestions for current query. When I set `spellcheck.q` to lucene query format (field1:value1 AND field2:value2), it doesnt return me any result. I have supposed that the value stored in `spellcheck.q` is just the value of ``spellcheck` component default field, but it returns an error in this case. Any help please?
Re: Format of the spellcheck.q used to get suggestions in current filter
Could you post the request URL and the XML/JSON Solr response? And the solrconfig for both the query request handler and the spellcheck component. Is your spell check component configured for both fields, field1 and field2? -- Jack Krupansky -Original Message- From: Hakim Benoudjit Sent: Wednesday, February 26, 2014 10:50 AM To: solr-user@lucene.apache.org Subject: Format of the spellcheck.q used to get suggestions in current filter I have some difficulties to use `spellcheck.q` to get only suggestions for current query. When I set `spellcheck.q` to lucene query format (field1:value1 AND field2:value2), it doesnt return me any result. I have supposed that the value stored in `spellcheck.q` is just the value of ``spellcheck` component default field, but it returns an error in this case. Any help please?
Format of the spellcheck.q used to get suggestions in current filter
I am having difficulty using `spellcheck.q` to get suggestions only for the current query. When I set `spellcheck.q` to Lucene query format (field1:value1 AND field2:value2), it doesn't return any results. I then supposed that the value of `spellcheck.q` should just be a value for the `spellcheck` component's default field, but that returns an error. Any help please?
[ANNOUNCE] Apache Solr 4.7.0 released.
February 2014, Apache Solr™ 4.7 available The Lucene PMC is pleased to announce the release of Apache Solr 4.7 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 4.7 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html See the CHANGES.txt file included with the release for a full list of details. Solr 4.7 Release Highlights: * A new 'migrate' collection API to split all documents with a route key into another collection. * Added support for tri-level compositeId routing. * Admin UI - Added a new "Files" conf directory browser/file viewer. * Add a QParserPlugin for Lucene's SimpleQueryParser. * Suggest improvements: a new SuggestComponent that fully utilizes the Lucene suggester module; queries can now use multiple suggesters; Lucene's FreeTextSuggester and BlendedInfixSuggester are now supported. * New 'cursorMark' request param for efficient deep paging of sorted result sets. See http://s.apache.org/cursorpagination * Add a Solr contrib that allows for building Solr indexes via Hadoop's MapReduce. * Upgrade to Spatial4j 0.4. Various new options are now exposed automatically for an RPT field type. See Spatial4j CHANGES & javadocs. https://github.com/spatial4j/spatial4j/blob/master/CHANGES.md * SSL support for SolrCloud. Solr 4.7 also includes many other new features as well as numerous optimizations and bugfixes. Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access.
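For those curious about the new cursorMark parameter, a minimal deep-paging sketch against SolrJ 4.7; the Solr URL, collection and sort field are illustrative only:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.params.CursorMarkParams;

  public class CursorPaging {
      public static void main(String[] args) throws Exception {
          HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
          SolrQuery q = new SolrQuery("*:*");
          q.setRows(100);
          // cursors require a total sort order ending on the uniqueKey field
          q.setSort("id", SolrQuery.ORDER.asc);
          String cursor = CursorMarkParams.CURSOR_MARK_START;  // the literal "*"
          while (true) {
              q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
              QueryResponse rsp = server.query(q);
              // ... process rsp.getResults() ...
              String next = rsp.getNextCursorMark();
              // the same mark returned twice means there are no more results
              if (cursor.equals(next)) break;
              cursor = next;
          }
      }
  }

Unlike start/rows paging, each page costs the same regardless of how deep into the result set it is.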
Re: SolrCloud: How to replicate shard of another machine for failover?
This is only true the *first* time you start the cluster. As mentioned earlier, the correct way to assign shards to cores is to use the collection API. Failing that, you can start cores in a determined order, and the cores will assign themselves a shard/replica when they first start up. From that point on, that mapping is defined in clusterstate.json, and will persist until you change it (delete the cluster state, or use the collection/core API to move/remove a core). It is a kludgy approach, which is why it generally isn't recommended for new starters, but by starting the first cores in a particular order you can get exactly the distribution you want. The collection API is good generally because it has some logic to distribute shards across machines, but you can't be very specific with it; you can't say "I want shard 1 on machine A, and its replicas on machines b, c & d". So we use the "start order" mechanism for our production systems, because we want to place shards on specific machines. We have 256 shards, so we want to know exactly what set of cores & machines is required in order to have a "full collection" of data. As long as you are aware of the limitations of each mechanism, both work. On 26 February 2014 10:26, Oliver Schrenk wrote: > > There is a round robin process when assigning nodes at cluster. If you > want > > to achieve what you want you should change your Solr start up order. > > Well that is just weird. To bring a cluster to a reproducible state, I > have to bring the whole cluster down, and start it up again in a specific > order? > > What order do you suggest, to have a failover mechanism?
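For completeness, a sketch of the Collections API call being discussed; the collection name and hosts are placeholders. On the 4.x API, createNodeSet restricts which nodes receive cores, and the round-robin placement then runs over just that set; there is still no per-shard pinning.

  http://host1:8983/solr/admin/collections?action=CREATE&name=mycollection
      &numShards=4&replicationFactor=2
      &createNodeSet=host1:8983_solr,host2:8983_solr,host3:8983_solr,host4:8983_solr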
Re: programmatically disable/enable solr queryResultCache...
Shalin, Great, thanks for the clear explanation. Let me try to make my scoring function part of the QueryResultKey. Thanks & Regards, Senthilnathan V On Wed, Feb 26, 2014 at 5:40 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > The problem here is that your custom scoring function (is that a > SearchComponent?) is not part of a query. The query cache is defined > as SolrCache where the QueryResultKey contains > Query, Sort, SortField[] and filters=List. So your custom > scoring function either needs to be present in the QueryResultKey or > else you need to disable the query result cache via configuration. > > On Wed, Feb 26, 2014 at 12:09 PM, Senthilnathan Vijayaraja > wrote: > > Erick, > > Thanks for the response. > > > > Kindly have a look at my sample query, > > > > select?fl=city,$score&q=*:*&fq={!lucene q.op=OR df=city > v=$cit}&cit=Chennai& > > > > *sort=$score desc& score=norm($la,value,10)& la=8 &b=1&c=2*here, > > score= norm($la,value,10), norm is a custom function > > > > *,if I change la then the $score will change.* > > first time it work fine but if I am changing la alone and firing the > query > > again the result remains in the same order as first query result.Which > > means sorting is not happening even the score is different.But If I am > > changing the cit=Chennai to cit=someCity then I am getting result in > > proper order,means sorting works fine. > > > > At any rate, queryResultCache is unlikely to impact > > much. All it is is > > *a map containing the query and the first few document IDs *(internal > > Lucene). > > > > which means query is the unique key and list of document ids are values > > mapped with that key.If I am not wrong, > > > > may I know how solr builds the unique keys based on the queries. > > > > Whether it builds the key based on only solr common query parameters or > it > > will include all the parameters supplied by user as part of query(for e.g > > la=8&b=1&c=2 ). > > > > > > any clue? > > > > > > Thanks & Regards, > > Senthilnathan V > > > > > > On Tue, Feb 25, 2014 at 8:00 PM, Erick Erickson >wrote: > > > >> This seems like an XY problem, you're asking for > >> specifics on doing something without any indication > >> _why_ you think this would help. Nor are you explaining > >> what the problem you're having is in the first place. > >> > >> At any rate, queryResultCache is unlikely to impact > >> much. All it is is a map containing the query and > >> the first few document IDs (internal Lucene). See > >> in solrconfig.xml. It is > >> quite light-weight, it does NOT store the entire > >> result set, nor even the contents of the documents. > >> > >> Best > >> Erick > >> > >> > >> On Tue, Feb 25, 2014 at 6:07 AM, Senthilnathan Vijayaraja < > >> senthilnat...@8kmiles.com> wrote: > >> > >> > is there any way programmatically disable/enable solr > queryResultCache? > >> > > >> > I am using SolrJ. > >> > > >> > > >> > Thanks & Regards, > >> > Senthilnathan V > >> > > >> > > > > -- > Regards, > Shalin Shekhar Mangar. >
Re: Cluster state ranges are all null after reboot
If you have 15 shards and assuming that you've never used shard splitting, you can calculate the shard ranges by using new CompositeIdRouter().partitionRange(15, new CompositeIdRouter().fullRange()) This gives me: [8000-9110, 9111-a221, a222-b332, b333-c443, c444-d554, d555-e665, e666-f776, f777-887, 888-1998, 1999-2aa9, 2aaa-3bba, 3bbb-4ccb, 4ccc-5ddc, 5ddd-6eed, 6eee-7fff] Have you done any more investigation into why this happened? Anything strange in the logs? Are you able to reproduce this in a test environment? On Wed, Feb 19, 2014 at 5:16 AM, Greg Pendlebury wrote: > We've got a 15 shard cluster spread across 3 hosts. This morning our puppet > software rebooted them all and afterwards the 'range' for each shard has > become null in zookeeper. Is there any way to restore this value short of > rebuilding a fresh index? > > I've read various questions from people with a similar problem, although in > those cases it is usually a single shard that has become null allowing them > to infer what the value should be and manually fix it in ZK. In this case I > have no idea what the ranges should be. This is our test cluster, and > checking production I can see that the ranges don't appear to be > predictable based on the shard number. > > I'm also not certain why it even occurred. Our test cluster only has a > single replica per shard, so when a JVM is rebooted the cluster is > unavailable... would that cause this? Production has 3 replicas so we can > do rolling reboots. -- Regards, Shalin Shekhar Mangar.
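For anyone who needs to recompute ranges the same way, a self-contained sketch against the SolrJ 4.x classes used above (the 15 matches this thread's shard count):

  import java.util.List;

  import org.apache.solr.common.cloud.CompositeIdRouter;
  import org.apache.solr.common.cloud.DocRouter;

  public class ShardRanges {
      public static void main(String[] args) {
          CompositeIdRouter router = new CompositeIdRouter();
          // partition the full 32-bit hash space into 15 contiguous ranges,
          // mirroring what collection creation does when no ranges are supplied
          List<DocRouter.Range> ranges = router.partitionRange(15, router.fullRange());
          for (DocRouter.Range range : ranges) {
              // Range prints in the hex min-max form used in clusterstate.json
              System.out.println(range);
          }
      }
  }

Each printed range can then be pasted back into the null range fields in clusterstate.json (after backing it up).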
Function Query does not work properly
Hi, I have a small problem using function queries. Following http://wiki.apache.org/solr/FunctionQuery#Date_Boosting and http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents I've tried using function queries to boost newer documents over older ones. In my case, I have documents with dates in the future, so I tried to adapt the example: all dates in the future should get a boost multiplier of 1. I therefore tried the following function: through the map function, all dates up to 100 years in the future should become 0, while all past dates pass through unchanged, so that after the recip function all future dates end up with a boost multiplier of 1 and all past dates get the normal value for their age: recip(map(product(ms(NOW,date_field),3.16e-11),-100,0,0),3.16e-11,1,1) Unfortunately, this does not seem to work - this function seems to return 1 for any date_field value. After that, I tried a workaround, emulating the recip function with the div, product and sum functions: div(1,sum(product(map(product(ms(NOW,date_field),3.16e-11),-100,0,0),3.16e-11),1)) This also did not work. Finally I checked whether the map function returns correct values by executing it alone, so that all future dates should end up with 0 for their score. This DID work: map(product(ms(NOW,date_field),3.16e-11),-100,0,0) So my question now: am I doing something wrong, or is there a bug in the recip function? I am currently using Solr 4.5.1. Thanks for your help, Jan
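One way to narrow this down without changing the scoring: Solr 4.x can return function values as pseudo-fields, so each sub-expression can be inspected per document. A sketch with this thread's function (the aliases 'mapped' and 'boosted' are arbitrary names, and the whole fl value goes on one line in a real request):

  fl=date_field,score,
     mapped:map(product(ms(NOW,date_field),3.16e-11),-100,0,0),
     boosted:recip(map(product(ms(NOW,date_field),3.16e-11),-100,0,0),3.16e-11,1,1)

If 'mapped' shows the expected zeros and positive ages while 'boosted' stays pinned at 1, the problem is isolated to how recip consumes the map output.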
Re: programmatically disable/enable solr queryResultCache...
The problem here is that your custom scoring function (is that a SearchComponent?) is not part of a query. The query cache is defined as SolrCache<QueryResultKey,DocList> where the QueryResultKey contains the Query, Sort, SortField[] and filters=List<Query>. So your custom scoring function either needs to be present in the QueryResultKey or else you need to disable the query result cache via configuration. On Wed, Feb 26, 2014 at 12:09 PM, Senthilnathan Vijayaraja wrote: > Erick, > Thanks for the response. > > Kindly have a look at my sample query, > > select?fl=city,$score&q=*:*&fq={!lucene q.op=OR df=city v=$cit}&cit=Chennai& > > *sort=$score desc& score=norm($la,value,10)& la=8 &b=1&c=2*here, > score= norm($la,value,10), norm is a custom function > > *,if I change la then the $score will change.* > first time it work fine but if I am changing la alone and firing the query > again the result remains in the same order as first query result.Which > means sorting is not happening even the score is different.But If I am > changing the cit=Chennai to cit=someCity then I am getting result in > proper order,means sorting works fine. > > At any rate, queryResultCache is unlikely to impact > much. All it is is > *a map containing the query and the first few document IDs *(internal > Lucene). > > which means query is the unique key and list of document ids are values > mapped with that key.If I am not wrong, > > may I know how solr builds the unique keys based on the queries. > > Whether it builds the key based on only solr common query parameters or it > will include all the parameters supplied by user as part of query(for e.g > la=8&b=1&c=2 ). > > > any clue? > > > Thanks & Regards, > Senthilnathan V > > > On Tue, Feb 25, 2014 at 8:00 PM, Erick Erickson > wrote: > >> This seems like an XY problem, you're asking for >> specifics on doing something without any indication >> _why_ you think this would help. Nor are you explaining >> what the problem you're having is in the first place. >> >> At any rate, queryResultCache is unlikely to impact >> much. All it is is a map containing the query and >> the first few document IDs (internal Lucene). See >> in solrconfig.xml. It is >> quite light-weight, it does NOT store the entire >> result set, nor even the contents of the documents. >> >> Best >> Erick >> >> >> On Tue, Feb 25, 2014 at 6:07 AM, Senthilnathan Vijayaraja < >> senthilnat...@8kmiles.com> wrote: >> >> > is there any way programmatically disable/enable solr queryResultCache? >> > >> > I am using SolrJ. >> > >> > >> > Thanks & Regards, >> > Senthilnathan V >> > >> -- Regards, Shalin Shekhar Mangar.
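For the configuration route, the relevant knob is the queryResultCache entry in solrconfig.xml; removing or commenting it out disables the cache for the core. The sizes shown are the stock example values, not a recommendation:

  <!-- comment out or remove to disable the query result cache
  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="0"/>
  -->

Since, as noted above, there is no per-request switch for this cache, the alternative really is to fold the custom function's inputs into the query itself (e.g. into q or an fq) so the QueryResultKey changes whenever they do.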
Re: Knowing shard value of result
Ah, I didn't know that this is possible with DocTransformers. This is also possible in Solr 4.7 (to be released soon) by using shards.info=true in the request. On Wed, Feb 26, 2014 at 2:32 PM, Ahmet Arslan wrote: > Hi, > > I think with this : https://wiki.apache.org/solr/DocTransformers#A.5Bshard.5D > > Ahmet > > > > On Wednesday, February 26, 2014 10:36 AM, search engn dev > wrote: > I have setup solr cloud of two shards and two replicas. I am using solrj for > communicating with solr. We are using CloudSolrServer for searching in solr > cloud. below is my code > > String zkHost = > "host1:2181,host1:2182,host1:2183,host1:2184,host1:2185"; > CloudSolrServer server = new CloudSolrServer(zkHost); > server.connect(); > server.setDefaultCollection(defaultCollection); > server.setIdField("Id"); > SolrQuery parameters = new SolrQuery(); > parameters.set("q","*:*"); > QueryResponse response = server.query(parameters); > System.out.println(""+response.toString()); > > I am getting correct response from solr. But how do i know the requested > solr hosts. ? because request can go to any live solr host. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Knowing-shard-value-of-result-tp4119713.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- Regards, Shalin Shekhar Mangar.
Re: concurrentlinkedhashmap 1.2 vs 1.4
I think it would need Guava v16.0.1 to benefit from the ported code. Guido. On 26/02/14 11:20, Guido Medina wrote: As notes also stated at concurrentlinkedhashmap v1.4, the performance changes were ported to Guava (don't know to what version to be honest), so, wouldn't be better to use MapMaker builder? Regards, Guido. On 26/02/14 11:15, Guido Medina wrote: Hi, I noticed Solr is using concurrentlinkedhashmap v1.2 which is for Java 5, according to notes at https://code.google.com/p/concurrentlinkedhashmap/ version 1.4 has performance improvements compared to v1.2, isn't Solr 4.x designed against Java 6+? If so, wouldn't it benefit from v1.4? Regards, Guido.
Re: concurrentlinkedhashmap 1.2 vs 1.4
As the notes also state for concurrentlinkedhashmap v1.4, the performance changes were ported to Guava (I don't know into which version, to be honest), so wouldn't it be better to use the MapMaker builder? Regards, Guido. On 26/02/14 11:15, Guido Medina wrote: Hi, I noticed Solr is using concurrentlinkedhashmap v1.2 which is for Java 5, according to notes at https://code.google.com/p/concurrentlinkedhashmap/ version 1.4 has performance improvements compared to v1.2, isn't Solr 4.x designed against Java 6+? If so, wouldn't it benefit from v1.4? Regards, Guido.
concurrentlinkedhashmap 1.2 vs 1.4
Hi, I noticed Solr is using concurrentlinkedhashmap v1.2, which targets Java 5. According to the notes at https://code.google.com/p/concurrentlinkedhashmap/, version 1.4 has performance improvements over v1.2. Isn't Solr 4.x built against Java 6+? If so, wouldn't it benefit from v1.4? Regards, Guido.
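For reference, a minimal sketch of the ported code on the Guava side: MapMaker's eviction options were deprecated and moved, so the size-bounded equivalent now lives in CacheBuilder (the class and method names are current Guava API; the sizes are arbitrary):

  import com.google.common.cache.Cache;
  import com.google.common.cache.CacheBuilder;

  public class BoundedCacheSketch {
      public static void main(String[] args) {
          // size-bounded concurrent cache, roughly the role CLHM plays as a dependency
          Cache<String, Object> cache = CacheBuilder.newBuilder()
                  .maximumSize(512)        // evicts near-LRU entries past this bound
                  .concurrencyLevel(16)    // internal segment count for concurrent writers
                  .build();

          cache.put("key", "value");
          Object hit = cache.getIfPresent("key");
          System.out.println(hit);
      }
  }

Whether Solr would actually benefit from swapping the library or version is the open question in this thread.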
Re: SolrCloud: How to replicate shard of another machine for failover?
Hi, > Don't run multiple instances of Solr on one machine. Instead, run one > instance per machine and create the collection with the maxShardsPerNode > parameter set to 2 or whatever value you need. Ok. > Yet another whole separate discussion: You need three physical nodes for > a redundant zookeeper, but I see only one host (localhost) in your > zkHost parameter. I know, but thanks for pointing it out. At the moment I'm doing a proof of concept investigating SolrCloud. Properly configuring ZooKeeper comes later. > The way you've set it up, SolrCloud just sees that you have four Solr > instances. It does not know that they are on the same machine. As far > as it is concerned, they are entirely separate. > > Something that would be a good idea is an optional config flag that > would make SolrCloud compare hostnames when building a collection and > avoid putting replicas on nodes where the hostname matches. Whether to > default this option to on or off is a whole separate discussion. That would be a great addition, because as of now I don't see a way of having a reproducible failover mechanism without additional physical machines. Or am I wrong here? Let's say I have two leaders (host1 and host2), each holding one shard of the collection. How can I make sure that host1 will run a replica of host2's shard? Thanks Oliver
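On the explicit-placement question, one approach that works on 4.x without relying on start order: the Core Admin API creates a core attached to a named collection and shard, so a replica can be pinned to a chosen host. A sketch, with the collection, shard and core names made up for illustration:

  # run against host1 to put a replica of shard2 (whose leader lives on host2) onto host1
  curl "http://host1:8983/solr/admin/cores?action=CREATE&name=mycollection_shard2_replica2&collection=mycollection&shard=shard2"

The new core registers itself with ZooKeeper, joins shard2 as a replica, and recovers its index from the shard leader.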
Re: SolrCloud: How to replicate shard of another machine for failover?
> There is a round robin process when assigning nodes at cluster. If you want > to achieve what you want you should change your Solr start up order. Well that is just weird. To bring a cluster to a reproducible state, I have to bring the whole cluster down, and start it up again in a specific order? What order do you suggest, to have a failover mechanism?
Re: Knowing shard value of result
Thanks iorixxx, SolrQuery parameters = new SolrQuery(); parameters.set("q","*:*"); parameters.set("fl","Id,STATE_NAME,[shard]"); parameters.set("distrib","true"); QueryResponse response = server.query(parameters); It's working fine now. -- View this message in context: http://lucene.472066.n3.nabble.com/Knowing-shard-value-of-result-tp4119713p4119720.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Knowing shard value of result
Hi, I think with this : https://wiki.apache.org/solr/DocTransformers#A.5Bshard.5D Ahmet On Wednesday, February 26, 2014 10:36 AM, search engn dev wrote: I have setup solr cloud of two shards and two replicas. I am using solrj for communicating with solr. We are using CloudSolrServer for searching in solr cloud. below is my code String zkHost = "host1:2181,host1:2182,host1:2183,host1:2184,host1:2185"; CloudSolrServer server = new CloudSolrServer(zkHost); server.connect(); server.setDefaultCollection(defaultCollection); server.setIdField("Id"); SolrQuery parameters = new SolrQuery(); parameters.set("q","*:*"); QueryResponse response = server.query(parameters); System.out.println(""+response.toString()); I am getting correct response from solr. But how do i know the requested solr hosts. ? because request can go to any live solr host. -- View this message in context: http://lucene.472066.n3.nabble.com/Knowing-shard-value-of-result-tp4119713.html Sent from the Solr - User mailing list archive at Nabble.com.
Knowing shard value of result
I have set up a Solr cloud with two shards and two replicas. I am using SolrJ to communicate with Solr, and CloudSolrServer for searching the cloud. Below is my code: String zkHost = "host1:2181,host1:2182,host1:2183,host1:2184,host1:2185"; CloudSolrServer server = new CloudSolrServer(zkHost); server.connect(); server.setDefaultCollection(defaultCollection); server.setIdField("Id"); SolrQuery parameters = new SolrQuery(); parameters.set("q","*:*"); QueryResponse response = server.query(parameters); System.out.println(""+response.toString()); I am getting the correct response from Solr, but how do I know which Solr host served the request? The request can go to any live Solr host. -- View this message in context: http://lucene.472066.n3.nabble.com/Knowing-shard-value-of-result-tp4119713.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need feedback: Browsing and searching solr-user list emails
Hi Dmitry, Thanks for your feedback. Couple of inline responses below. On Mon, Feb 24, 2014 at 4:43 AM, Dmitry Kan wrote: > Hello! > > Just few random points: > > 1. Interesting site. I'd say there are similar sites, but this one has > cleaner interface. How does your site compare to this one, for example, in > terms of feature set? > > http://qnalist.com/questions/4640870/luke-4-6-0-released > > At least, the user ranking seems to be different, because on your site > yours truly marked with 5800 points and on the qnalist with 59. > > Looks like a similar idea. UI seems quite different though, as you suggested - seems qnalist is removing all quoted text within emails. We preserve it as it brings "context". Imagine inline responses showing up without quoted text. Seems it is missing "crowdsource" aspect also - votes, favorites, best answers - which are very important for relevancy. Might want to compare search results as well, particularly the "Related questions" under each question. Being able to quickly navigate to similar threads (like StackExchange) is a very powerful way to access content. 2. Do you handle several users, like DmitryKan, DmitryKan-1.. as a single > user, i.e. if I'd post under different e-mail addresses. > Yes, but with administrator's intervention. We combine multiple name identities associated under same email address (may be coming from different email clients) but combining multiple emails addresses needs to be done by admin. > 3. It seems like your site is going to mostly be read only, except for > question / user voting? > Yes, in a short-term. However, one can argue that solr-user type mailing lists are Q&A anyways and SE like forum are better suited for this purpose given they "organize" content little better compared to emails. So if longer term solution for managing such community is Q&A then solution like this "gently" moves people in that direction without asking them to drastically change existing behaviors. > > To me any such site, including yours, will make sense as long as I could > find stuff faster than with Google. > That's probably the key. Even with SE, Google lands you there but once you are on SE, you navigate using its own search and recommendation engine etc. It all boils down to the quality of search ranking and associated UI :) Durgam. > > Dmitry Kan > > > > > > On Tue, Feb 11, 2014 at 7:18 AM, Durgam Vahia wrote: > > > Hi Solr-users, > > > > I wanted to get your thoughts/feedback on a potentially useful way to > > browse and search prior email conversations in > > solr-users@lucenedistribution list. > > > > http://www.signaldump.org/solr/qpod/ > > > > In a nutshell, this is a Q&A engine like StackExchange (SE) > auto-populated > > with solr-users@lucene email threads of past one year. Engine auto-tags > > email threads and creates user profile of participants with points, > badges > > etc. New emails also gets processed automatically and will be placed > under > > the relevant conversation. > > > > Here are some of the advantages that might be useful - > > > >- Like SE, users can "crowdsource" the quality of content by voting, > and > >choosing best answers. > >- You can favorite posts/threads, users, tags to personalize search. > >- Email conversations and Q&A engine work seamlessly together. One can > >use any medium and conversations are still presented in a uniform way. > >- Web UI supports mobile device aspect ratios - just click on above > link > >on your mobile device to get a feel. 
> > > > Do you think this would be useful for the solr-users community? To get a > > feel, try searching the archive before posting in the email list to see > if > > UI makes finding things little gentler. As more people search/view/vote, > > search should become more relevant and personalized. > > > > I would be happy to maintain this for the benefit of the community. > > Currently I have only seeded past one year of email but we could > > potentially go further back if people find this useful. > > > > Thanks and feedback welcome. > > > > And before someone asks - yes, our search engine is Solr .. > > > > Durgam. > > > > > > -- > Dmitry > Blog: http://dmitrykan.blogspot.com > Twitter: twitter.com/dmitrykan >
RE: Performance problem on Solr query on stemmed values
Hi Erick, thank you for the reply. Yes, I'm using the FastVectorHighlighter (Solr 4.3). Every request should deliver only 10 results. Here is my schema configuration for both fields: [the field definitions were stripped from the archived message] The content field contains on average around 5000-6000 words (only a rough estimate). Best regards Erwin -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, February 25, 2014 3:27 PM To: solr-user@lucene.apache.org Subject: Re: Performance problem on Solr query on stemmed values Right, highlighting may have to re-analyze the input in order to return the highlighted data. This will be significantly slower than the search, especially if you have a large number of rows you're returning. You can get better performance in highlighting by using FastVectorHighlighter. See: https://cwiki.apache.org/confluence/display/solr/FastVector+Highlighter 1000x is unusual, though, unless your fields are very large or you're returning a lot of documents. Best, Erick On Tue, Feb 25, 2014 at 5:23 AM, Erwin Gunadi wrote: > Hi, > > I would like to know whether anyone has experienced this kind of phenomenon. > > We are having a performance problem with queries on stemmed values. > I've documented the symptoms I'm currently facing:
>
>   Search on field content | Search on field spell | Highlighting (on content field) | Processing speed
>   active                  | active                | active                          | slow
>   active                  | not active            | active                          | fast
>   active                  | active                | not active                      | fast
>   not active              | active                | active                          | slow
>   not active              | active                | not active                      | fast
>
> *Fast means 1000x faster than "slow". > > Field content is our index field, which holds the original text, and spell is the field with the stemmed values. > According to my measurements, searching on both fields (stemmed and not stemmed) is really fast. > But when I add highlighting to the query, it takes too long to process. > > Best Regards > Erwin
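Since the schema fragment above did not survive the archive, a generic sketch of what the FastVectorHighlighter needs may help readers; the field and type names are placeholders, not Erwin's actual configuration:

  <!-- schema.xml: FVH requires term vectors with positions and offsets -->
  <field name="content" type="text_general" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

  # request side
  &hl=true&hl.fl=content&hl.useFastVectorHighlighter=true

Without all three termVector* attributes (and a reindex after adding them), Solr falls back to the standard highlighter for that field, which re-analyzes the stored text - consistent with the slow timings whenever highlighting touches the stemmed field.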