Re: how to get abortOnConfigurationError=false working
If you look at the version info I posted, it seems that it is a pretty old version embedded in ColdFusion, 1.4 by the looks of it.

Regards
Russ Michaels
www.michaels.me.uk
www.cfmldeveloper.com - Free CFML hosting for developers
www.cfsearch.com - CF search engine

On Jan 17, 2013 9:33 PM, Alexandre Rafalovitch wrote:
> Solr 4 most definitely ignores missing cores (just ran into that accidentally again myself). So, if you start Solr and the directory is missing, it will survive (but complain). The other problem is what happens when a customer deletes the account and the core directory disappears in the middle of an open searcher. I would suggest some sort of pre-delete trigger that hits the Solr admin interface and unloads that core first.
>
> Regards,
> Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>
> On Thu, Jan 17, 2013 at 4:03 PM, Yonik Seeley wrote:
>> On Thu, Jan 17, 2013 at 3:40 PM, snake wrote:
>>> OK, so is there any other way to stop this problem I am having, where any site can break Solr by deleting their collection? It seems odd that everyone would vote to remove a feature that would make Solr more stable.
>>
>> I agree. abortOnConfigurationError was more about a single core, i.e. whether the core would still be loaded if there were config errors. There *should* be a way to still load other cores if one core has an error and is not loaded. If there's not currently, then we should implement it.
-Yonik
http://lucidworks.com
Re: access matched token ids in the FacetComponent?
Hello Mikhail,

> I have some relevant experience and am ready to help...

Thanks a lot!

> ...but I cannot get the core problem. Could you please expand the description and/or provide a sample?

Sure. Among other fields not relevant to this discussion, we have two fields: a text field (Contents; a searchable tokenized and filtered field) and a unique string field, the document id (or doc id for short). We search on one field (Contents) but return facets on another (doc id). It is absolutely legitimate that Solr simply counts the number of doc ids in the result set, and since there is only one doc id per document, it sets the facet count of that doc id to 1. What we need FacetComponent to report instead is the count of special matches inside the document (one figure per document id). By special matches we mean the count of the sentences inside each document where the hits were found. By now I know how to retrieve the individual sentences using the Solr Highlighter, with the help of hl.fragmenter=regex&hl.snippets=1&hl.regex.pattern=[REGEX_PATTERN_TO_IDENTIFY_A_SENTENCE].

My thinking goes like this:
1. Implement functionality similar to that of org.apache.lucene.search.highlight.Highlighter in the FacetComponent to perform counting of matches the way described above.
2. Substitute the per-doc-id counts (1's) with the calculated sentence counts.
3. Return the updated facet results.

Regards,
Dmitry

On Tue, Jan 15, 2013 at 2:08 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:
> Dmitry,
> I have some relevant experience and am ready to help, but I cannot get the core problem. Could you please expand the description and/or provide a sample?
>
> On Tue, Jan 15, 2013 at 11:01 AM, Dmitry Kan solrexp...@gmail.com wrote:
>> Hello!
>> Is there a simple way of accessing the matched token ids in the FacetComponent? The use case is to text search on one field and facet on another. And in the facet counts we want to see the text hit counts. Can it be done via some other component / approach? Any input is greatly appreciated.
>> Dmitry

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
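[Editor's illustration] The per-document figure Dmitry asks FacetComponent to report can be pictured with a small standalone sketch (plain Java, no Solr dependency; the sentence-splitting regex and method names are assumptions for illustration, not Solr API): split the Contents text into sentences and count the sentences that contain a match, instead of the flat count of 1 per doc id.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Standalone illustration of the desired facet value: the number of
// sentences in a document that contain the query term, rather than the
// usual per-doc-id count of 1. The splitting regex is a placeholder.
public class SentenceHitCounter {
    static final Pattern SENTENCE_SPLIT = Pattern.compile("(?<=[.!?])\\s+");

    static long countMatchingSentences(String contents, String term) {
        return Arrays.stream(SENTENCE_SPLIT.split(contents))
                .filter(s -> s.toLowerCase().contains(term.toLowerCase()))
                .count();
    }

    public static void main(String[] args) {
        String doc = "Solr is fast. Lucene powers Solr. Faceting is separate.";
        // Two of the three sentences mention "solr", so the facet count
        // for this doc id would become 2 instead of 1.
        System.out.println(countMatchingSentences(doc, "solr"));
    }
}
```

In the actual proposal, this counting would happen against highlighter fragments (hl.fragmenter=regex) rather than raw strings, but the substitution into the facet counts is the same.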
Re: using PositionIncrementAttribute to increment certain term positions to large values
Hi,

For the sake of story completeness: I was able to fix the highlighter to work with the token matches that go beyond the length of the text field. The solution was to mod the matched token positions if they exceed the length of the text.

Dmitry

On Thu, Dec 27, 2012 at 10:13 AM, Dmitry Kan solrexp...@gmail.com wrote:
> Hi,
> Answering my own question for the record: the experiments show that the described functionality is achievable with a TokenFilter class implementation. The only caveat is that the Highlighter component stops working properly if the match position goes beyond the length of the text field. As for performance, no major delays compared to the original proximity search implementation have been noticed.
> Best,
> Dmitry Kan
>
> On Wed, Dec 19, 2012 at 10:53 AM, Dmitry Kan solrexp...@gmail.com wrote:
>> Dear list,
>> We are currently evaluating proximity searches ("term1 term2"~slop) for a specific use case. In particular, each document contains artificial delimiter characters (one character between each pair of sentences in the text). Our goal is to hit the sentences individually for any proximity search and avoid cross-sentence-boundary matches. We figured that by using PositionIncrementAttribute as a field in a descendant of the TokenFilter class, it is possible to set the position increment of each artificial character (which is a term in Lucene/Solr notation) to an arbitrarily large number. Thus any proximity searches with reasonably small slop values should automatically hit within the sentence boundaries. Does this sound like the right way to tackle the problem? Are there any performance costs involved?
>> Thanks in advance for any input,
>> Dmitry Kan
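[Editor's illustration] The position bookkeeping behind this thread can be shown without Lucene: a custom TokenFilter would set PositionIncrementAttribute to a large value when it sees the artificial delimiter, so the position distance between terms in different sentences always exceeds any reasonable slop. The sketch below (plain Java; the "#" delimiter token and the gap of 1000 are illustrative assumptions) just computes the positions such a filter would emit.

```java
// Standalone sketch of the position increments a delimiter-aware
// TokenFilter would produce. Not Lucene code; DELIM and GAP are
// illustrative assumptions.
public class PositionGapDemo {
    static final String DELIM = "#";   // artificial sentence delimiter
    static final int GAP = 1000;       // large position increment at DELIM

    // Mimics PositionIncrementAttribute: each ordinary token advances the
    // position by 1, the delimiter token advances it by GAP.
    static int[] assignPositions(String[] tokens) {
        int[] pos = new int[tokens.length];
        int p = -1;
        for (int i = 0; i < tokens.length; i++) {
            p += tokens[i].equals(DELIM) ? GAP : 1;
            pos[i] = p;
        }
        return pos;
    }

    public static void main(String[] args) {
        String[] tokens = {"red", "fox", DELIM, "blue", "fish"};
        int[] pos = assignPositions(tokens);
        // "fox" and "blue" are adjacent in the raw stream, but their
        // position distance is now >= GAP, so a proximity query such as
        // "fox blue"~10 cannot match across the sentence boundary.
        System.out.println(pos[3] - pos[1]);
    }
}
```

This also hints at why the Highlighter broke: the boosted positions no longer correspond to offsets within the stored text, which is what Dmitry's mod-based fix compensates for.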
Re: build CMIS compatible Solr
A colleague of mine, when I was working for Sourcesense, made a CMIS plugin for Solr. It was one-way, and we used it to index stuff out of Alfresco into Solr. I can't search for it now; let me know if you can't find it.

Upayavira

On Fri, Jan 18, 2013, at 05:35 AM, Nicholas Li wrote:
> I want to make something like Alfresco, but without that many features. And I'd like to utilise the searching ability of Solr.
>
> On Fri, Jan 18, 2013 at 4:11 PM, Gora Mohanty g...@mimirtech.com wrote:
>> On 18 January 2013 10:36, Nicholas Li nicholas...@yarris.com wrote:
>>> Hi, I am new to Solr and I would like to use Solr as my document server, plus search engine. But Solr is not CMIS compatible (though it need not be, as it is not built as a pure document management server). In that sense, I would build another layer on top of Solr so that the exposed interface would be CMIS compatible. [...]
>>
>> May I ask why? Solr is designed to be a search engine, which is a very different beast from a document repository. In the open-source world, Alfresco ( http://www.alfresco.com/ ) already exists, can index into Solr, and supports CMIS-based access.
>> Regards,
>> Gora
Re: Solr getting scores of multiple core queries?
OK, since I found out it didn't work, I have merged my two cores back into one. But now the scores still don't work on the join?
ConcurrentModificationException in Solr 3.6.1
Hi all,

I am using Solr 3.6.1 and am sending a set of requests to Solr simultaneously. When I check the log file, I notice the exception stack trace below:

SEVERE: java.util.ConcurrentModificationException
        at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:761)
        at java.util.LinkedList$ListItr.next(LinkedList.java:696)
        at org.apache.solr.highlight.SolrHighlighter.getHighlightFields(SolrHighlighter.java:106)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:369)
        at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

When I searched through the Solr issues, I found the following two URLs:

https://issues.apache.org/jira/browse/SOLR-2684
https://issues.apache.org/jira/browse/SOLR-3790

The stack trace given in the second URL coincides with the one given above, so I have applied the code change given in the link below:

http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java?r1=1229401&r2=1231606&diff_format=h

The first URL's stack trace seems to be different. I have two questions here:
1.) Why does this exception occur?
2.) Is there any other patch/solution available to overcome this exception?

Please guide me.
Re: ConcurrentModificationException in Solr 3.6.1
This should be fixed in 3.6.2, which has been available since Dec 25. From the release notes: "Fixed ConcurrentModificationException during highlighting, if all fields were requested."

André

From: mechravi25 [mechrav...@yahoo.co.in]
Sent: Friday, 18 January 2013 11:10
To: solr-user@lucene.apache.org
Subject: ConcurrentModificationException in Solr 3.6.1
Re: Large data importing getting rollback with solr
On 18 January 2013 15:04, ashimbose ashimb...@gmail.com wrote:
> Hi Gora,
> Thank you for your reply again. Joining is not possible in my case because there is no relation between the tables. Is joining possible without any relation in Solr?

No, one needs some kind of relationship to join.

> I am really going through a hard time. It is really tough for me to find an alternative process too, because I am very new to Solr. My requirement is that I need all text, varchar, or char type data from my data source sampleDB. It is a huge data source; it may come to TBs and more than 1000 tables.

This will be quite difficult to benchmark, architect, and set up, especially if you are not experienced with Solr. Again, it might be best to re-examine your assumptions. Could you describe what you are trying to do, rather than jumping to a solution? What kind of search requirements do you have that need data from a thousand tables per Solr document? How is it that the data tables have no relationships, when presumably the fields put into each Solr document will have something in common? What kind of queries are you planning to run?

Even if you are certain that you want to go down this route, it would make sense to approach it in a phased manner. Become familiar with Solr indexing using just a few tables, then extend that to more tables. Benchmark, and use the results to plan the sort of architecture you will need to build.

> Which one is the better solution:
> 1) Having multiple root entities and indexing each separately
> 2) Having multiple data-import requestHandlers
> Can you give me an example and the procedure for achieving those? What changes do I need to make? [...]

You should really try to get familiar with the basics of Solr indexing first, but here is a brief outline:

* Each Solr DIH configuration file has only one document tag, but can have multiple data sources, multiple root entities, and nested entities. In your case, you could use multiple data sources to spread the load over multiple databases, each holding a subset of the tables. It looks like you are already using multiple root entities, as all your entities are distinct. Thus, instead of importing all the entities at once, which is what /dataimport?command=full-import does, you could import them in batches, e.g., /dataimport?command=full-import&entity=CUSTOMER would import only the CUSTOMER entity, /dataimport?command=full-import&entity=CUSTOMER&entity=SHOP would import the CUSTOMER and SHOP entities, and so on.

* I have never tried this, but one can set up multiple request handlers in solrconfig.xml, one for each DIH instance that one plans to run. These can run in parallel, rather than the sequential indexing of root entities in a single DIH instance.

Regards,
Gora
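[Editor's illustration] Gora's second suggestion (one request handler per DIH instance) might look roughly like this in solrconfig.xml. The handler names and config file names below are illustrative assumptions, not anything from the thread:

```xml
<!-- Two independent DIH instances; full-import requests sent to each
     handler can run concurrently. Names and file names are examples. -->
<requestHandler name="/dataimport-customers"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-customers.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport-shops"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-shops.xml</str>
  </lst>
</requestHandler>
```

Each handler points at its own DIH configuration file covering a subset of the tables, so /dataimport-customers?command=full-import and /dataimport-shops?command=full-import can be triggered independently.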
RE: Field Collapsing - Anything in the works for multi-valued fields?
If I understand the reading, you've suggested that I index the vendor names as their own documents (currently this is a multi-valued field of each document). Each such vendor document would have just a single-valued 'name' field. Each normal product document would contain a multi-valued field that is a list of vendor document IDs, and we would use that to join the query results with the vendor documents. I presume this means that I would have some kind of dynamic field created from the join which I could use as the 'group.field' value? I didn't quite follow the last point.

-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: Friday, January 18, 2013 9:34 AM
To: solr-user@lucene.apache.org
Subject: Re: Field Collapsing - Anything in the works for multi-valued fields?

Hi,

Instead of the multi-valued fields, would a parent-child setup work for you here? See http://search-lucene.com/?q=solr+join&fc_type=wiki

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Thu, Jan 17, 2013 at 8:04 PM, David Parks davidpark...@yahoo.com wrote:
> The documents are individual products which come from 1 or more vendors. Example: a 'toy spiderman doll' is sold by 2 vendors; that is 1 document. Most fields are multi-valued (short_description from each of the 2 vendors, long_description, product_name, vendor, etc. the same). I'd like to collapse on the vendor in an attempt to ensure that vast collections of books, music, and movies, by just a few vendors, don't overwhelm the results simply because they contain every search term imaginable, due to the sheer volume of books, CDs, and DVDs relative to other product items. But in this case there are clearly 1..N vendors per document, solidly a multi-valued field. And it's hard to put a maximum on the number of vendors possible.
> Thanks, Dave
>
> -----Original Message-----
> From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
> Sent: Friday, January 18, 2013 2:32 AM
> To: solr-user
> Subject: Re: Field Collapsing - Anything in the works for multi-valued fields?
>
> David,
> What are the documents and the field? That can help to suggest a workaround.
>
> On Thu, Jan 17, 2013 at 5:51 PM, David Parks davidpark...@yahoo.com wrote:
>> I want to configure Field Collapsing, but my target field is multi-valued (e.g. the field I want to group on has a variable # of entries per document, 1-N entries). I read on the wiki (http://wiki.apache.org/solr/FieldCollapsing) that grouping doesn't support multi-valued fields yet. Anything in the works on that front, by chance? Any common work-arounds?
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer, Grid Dynamics
> http://www.griddynamics.com
> mkhlud...@griddynamics.com
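[Editor's illustration] The parent-child setup Otis points to would use Solr's query-time join rather than grouping on a multi-valued field. A hedged sketch of the request, assuming vendor docs and product docs live in the same index and assuming illustrative field names (vendor_ids on products, id and name on vendor docs):

```
q={!join from=vendor_ids to=id}product_name:spiderman
```

This returns each matching vendor document once, regardless of how many of its products matched; products for a chosen vendor could then be fetched with a second, vendor-filtered query. Since each vendor doc carries a single-valued name field, the multi-valued grouping limitation no longer applies.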
Re: Large data importing getting rollback with solr
Hi Gora,

I will have archive data in some format, like relational, provided by the client. It may have some relations, but I will not know them. I have to index that data without restoring it to my SQL DB. That means I have to read the data from the archive file directly, and this will be fully automated. Whatever data is in that archive file, I need to get all of it; "SELECT * FROM table" is my only required query. The archive file can have huge tables and data, but the data source will be only one.

Let me know if you need any other information.

Thanks & Regards,
Ashim
Indexed file size is doubled in Solr 3.6.1 slave after replication
Hi all,

I am using Solr 3.6.1 and have both a master and a slave instance. Whenever I make some configuration changes, I restart the server (both master and slave instances) to replicate the changed index (after indexing on the master) to the slave. When I perform the replication, I see that the document count between the master and the slave remains the same, but the size of the index files is doubled on the slave (400MB) compared to the master (200MB). I tried refreshing the page, but it remains the same. I also see that replication.properties gets created automatically to point to the current index folder after a few replications.

Any reason why the index file size gets doubled from master to slave? This scenario occurs only when I make a configuration change on master and slave and then replicate data from master to slave. To overcome it, I delete the index folder on the slave and then replicate; then the size remains the same on both master and slave. After that, the scenario does not recur; it only happens the first time replication is done after configuration changes.

Is there any configuration change needed to overcome this scenario, or is there any other reason for its occurrence? Please guide me.
ConcurrentModificationException in Solr 3.6.1
Hi all, I am using Solr version 3.6.1. I am sending a set of requests to Solr simultaneously. When I check the log file, I notice the exception stack trace below:

SEVERE: java.util.ConcurrentModificationException
	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:761)
	at java.util.LinkedList$ListItr.next(LinkedList.java:696)
	at org.apache.solr.highlight.SolrHighlighter.getHighlightFields(SolrHighlighter.java:106)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:369)
	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

When I searched through the Solr issues, I found the following two URLs:
https://issues.apache.org/jira/browse/SOLR-2684
https://issues.apache.org/jira/browse/SOLR-3790
The stack trace given in the second URL coincides with the one above, so I have applied the code change given in the link below:
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java?r1=1229401&r2=1231606&diff_format=h
The first URL's stack trace seems to be different. I have two questions here:
1.) Why does this exception stack trace occur?
2.) Is there any other patch/solution available to overcome this exception?
Please guide me. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/ConcurrentModificationException-in-Solr-3-6-1-tp4034520.html Sent from the Solr - User mailing list archive at Nabble.com.
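To see why this exception occurs at all: LinkedList's iterator is fail-fast, so any structural change to the list while some code path is still iterating it (in SOLR-3790, a shared field-name list used by the highlighter, mutated by a concurrent request) triggers the exception. A minimal single-threaded illustration of the same mechanism — the real Solr case involved concurrent requests, which is why it only shows up under simultaneous load:

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedList;
import java.util.List;

public class CmeDemo {
    public static void main(String[] args) {
        // Stand-in for a shared, mutable field list.
        List<String> fields = new LinkedList<>(List.of("title", "body", "author"));

        boolean threw = false;
        try {
            for (String f : fields) {
                // Structural modification through the list itself (not through the
                // iterator) invalidates the live iterator; the next call to next()
                // throws ConcurrentModificationException.
                if (f.equals("title")) {
                    fields.remove(f);
                }
            }
        } catch (ConcurrentModificationException e) {
            threw = true;
        }
        System.out.println("threw=" + threw);
    }
}
```

The fix pattern is to never let an iterator see the shared list directly (iterate over a private copy instead); upgrading to 3.6.2 picks up the highlighting fix mentioned in the reply.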
Re: ConcurrentModificationException in Solr 3.6.1
Hi there, I think André has already guided you in your earlier mail.

This should be fixed in 3.6.2, which has been available since Dec 25. From the release notes: "Fixed ConcurrentModificationException during highlighting, if all fields were requested." André

From: mechravi25 [mechrav...@yahoo.co.in]
Sent: Friday, 18 January 2013 11:10
To: solr-user@lucene.apache.org
Subject: ConcurrentModificationException in Solr 3.6.1
[original message quoted in full above]
Solr load balancer
Hi, I would like to experiment with some custom load balancers to help with query latency in the face of long GC pauses and the odd time-consuming query that we need to be able to support. At the moment setting the socket timeout via the HttpShardHandlerFactory does help, but of course it can only be set to a length of time as long as the most time-consuming query we are likely to receive. For example, a load balancer that sends multiple queries concurrently to all/some replicas and only keeps the first response might be effective. Or maybe a load balancer which takes account of the frequency of timeouts would be able to recognize zombies more effectively. To use alternative load balancer implementations cleanly, and without having to hack Solr directly, I would need to make the existing LBHttpSolrServer and HttpShardHandlerFactory more amenable to extension; I could then override the default load balancer using Solr's plugin mechanism. So my question is: if I made a patch to make the load balancer more pluggable, is this something that would be acceptable, and if so, what do I do next? Phil
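Phil's "keep the first response" idea can be sketched with java.util.concurrent alone: ExecutorService.invokeAny submits the same work to several replicas, returns the first result that completes successfully, and cancels the rest. Everything below (replica names, latencies, the queryReplica stand-in) is hypothetical illustration, not Solr API:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FirstResponseWins {
    // Hypothetical stand-in for issuing the same query to one replica.
    static String queryReplica(String replica, long latencyMs) throws InterruptedException {
        Thread.sleep(latencyMs);
        return "result-from-" + replica;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Callable<String>> tasks = List.of(
                () -> queryReplica("replica-a", 500), // e.g. stuck in a long GC pause
                () -> queryReplica("replica-b", 10),  // healthy, answers quickly
                () -> queryReplica("replica-c", 200));
        // invokeAny blocks until the first task completes successfully
        // and cancels the remaining ones.
        String result = pool.invokeAny(tasks);
        System.out.println(result);
        pool.shutdownNow();
    }
}
```

The trade-off of this "hedged request" strategy is extra load on the cluster (every query is multiplied), which is why a timeout-frequency-aware balancer is the gentler alternative Phil also mentions.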
Re: access matched token ids in the FacetComponent?
Dmitry, It definitely seems like postprocessing the highlighter's output. The other approach is: - limit the number of occurrences of a word in a sentence to 1 - play with the facet-by-function patch https://issues.apache.org/jira/browse/SOLR-1581 accomplished by the tf() function. It doesn't seem like much help, though. On Fri, Jan 18, 2013 at 12:42 PM, Dmitry Kan solrexp...@gmail.com wrote: that we actually require the count of the sentences inside each document where the hits were found. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: How to round solr score ?
On 18 January 2013 18:26, Gustav xbihy...@sharklasers.com wrote: I have to bump this... is it possible to do it (round Solr's score) with any integrated query function? I do not have a Solr index handy at the moment to check, but it should be possible to do this with function queries. Please see the rint() and query() functions at http://wiki.apache.org/solr/FunctionQuery Regards, Gora
Re: Logging wrong exception
On 18 January 2013 18:34, Muhzin R muhsinlo...@gmail.com wrote: Hi all, I'm trying to set the value of a field in my schema to null. Solr throws the following exception. [...] This is the relevant part of the error: INFO - 2013-01-18 18:13:35.409; org.apache.solr.update.processor.LogUpdateProcessor; [core0] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 3 ERROR - 2013-01-18 18:13:35.409; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: [doc=10] missing required field: countryId [...] Even though I'm trying to modify a field other than countryId. FYI, I'm trying to do a partial update. The schema for countryId is: <field name="countryId" type="int" indexed="true" stored="true" required="true"/> Why is Solr logging the wrong exception? Please show us the code that triggers this exception. It seems like you are trying to do an update without providing a value for a required field. If you are using Solr 4.0, here is how to update only a specific field in a document: http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/ Regards, Gora
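For reference, the Solr 4 atomic ("partial") update syntax from the link above changes only the named field via a modifier such as "set". A sketch of the JSON payload — the field name someOtherField and its value are made up for illustration; the doc id 10 and countryId come from the thread:

```json
[
  {
    "id": "10",
    "someOtherField": { "set": "new value" }
  }
]
```

This is POSTed to /update with Content-Type: application/json. The update works by re-reading the stored fields of doc 10 and rewriting the whole document, which is likely why a required field like countryId that cannot be recovered from the stored document triggers "missing required field" even though it was not the field being modified.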
Re: access matched token ids in the FacetComponent?
Mikhail, Are you saying that it is not possible to access the matched term positions in the FacetComponent? If that were possible (somewhere in the StandardFacetsAccumulator class, where docids are available), then, knowing the matched term positions, I could do some simple math to calculate the sentence counts per doc id. Dmitry On Fri, Jan 18, 2013 at 2:45 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Dmitry, It definitely seems like postprocessing the highlighter's output. [...] -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
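The "simple math" Dmitry has in mind could look like this: given the (sorted) token positions where each sentence starts and the token positions of the matched terms, a binary search maps each hit to its containing sentence, and the number of distinct sentences is the per-document count. This is an illustrative sketch, not Solr code; all names and numbers are made up:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SentenceCount {
    // sentenceStarts: sorted token positions where each sentence begins.
    // matchPositions: token positions of query-term hits in one document.
    static int sentencesWithHits(int[] sentenceStarts, int[] matchPositions) {
        Set<Integer> sentences = new HashSet<>();
        for (int pos : matchPositions) {
            int i = Arrays.binarySearch(sentenceStarts, pos);
            // When pos is not itself a sentence start, binarySearch returns
            // -(insertionPoint) - 1; the containing sentence is the one
            // starting just before pos.
            int sentence = i >= 0 ? i : -i - 2;
            sentences.add(sentence);
        }
        return sentences.size();
    }

    public static void main(String[] args) {
        int[] starts = {0, 12, 30, 55};  // four sentences
        int[] hits   = {3, 8, 31, 60};   // hits fall in sentences 0, 0, 2, 3
        System.out.println(sentencesWithHits(starts, hits)); // prints 3
    }
}
```

The hard part, as the thread notes, is not this arithmetic but getting the matched term positions and sentence boundaries exposed inside the FacetComponent in the first place.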
Re: How to round solr score ?
Hey Gora, thanks for the fast answer! I had tried the rint(score) function before (it would be perfect in my case) but it didn't work out; I guess it only works with indexed fields, so I got the error "sort param could not be parsed as a query, and is not a field that exists in the index: rint(score)". And with the query() function I didn't get any successful result... I'm stuck in the same scenario as squaro: if two docs have scores of 1.67989 and 1.6767, I would like to sort them by price. My sort rules are something like: sort=score desc, price asc -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-round-solr-score-tp495198p4034551.html Sent from the Solr - User mailing list archive at Nabble.com.
SOLR 4.x: multiterm phrase inside proximity searches possible?
Hello! Does SOLR 4.x support / is it going to support multi-term phrase search inside proximity searches? To illustrate, we would like the following to work: "a b" c~10, which would return hits with "a b" 10 tokens away from c, in no particular order. It looks like https://issues.apache.org/jira/browse/LUCENE-2754 implements what we need on the Lucene side. Regards, Dmitry
Re: Solr cache considerations
No, the fieldValueCache is not used for resolving queries. Only for multi-token faceting, and apparently for the stats component too. The document cache maintains in memory the stored content of the fields you are retrieving or highlighting on. It'll hit if the same document matches the query multiple times and the same fields are requested, but as Erick said, it is important for cases when multiple components in the same request need to access the same data. I think soft committing every 10 minutes is totally fine, but you should hard commit more often if you are going to be using the transaction log. openSearcher=false will essentially tell Solr not to open a new searcher after the (hard) commit, so you won't see the newly indexed data and caches won't be flushed. openSearcher=false makes sense when you are using hard commits together with soft commits; as the soft commit is dealing with opening/closing searchers, you don't need hard commits to do it. Tomás On Fri, Jan 18, 2013 at 2:20 AM, Isaac Hebsh isaac.he...@gmail.com wrote: Unfortunately, it seems ( http://lucene.472066.n3.nabble.com/Nrt-and-caching-td3993612.html ) that these caches are not per-segment. In this case, I want to (soft) commit less frequently. Am I right? Tomás, as the fieldValueCache is very similar to Lucene's FieldCache, I guess it makes a big contribution to standard (not only faceted) query time. The SolrWiki claims that it is primarily used by faceting. What does that say about complex textual queries? documentCache: Erick, after a query is processed, don't some documents stay in the documentCache? Can't I use it to accelerate queries that should retrieve stored fields of documents? In this case, a big documentCache could hold more documents. About commit frequency: HardCommit: openSearcher=false seems like a nice solution. Where can I read about this? (I found nothing but one unexplained sentence in the SolrWiki.) SoftCommit: In my case, the required index freshness is 10 minutes.
The plan to soft commit every 10 minutes is similar to storing all of the documents in a queue (outside Solr), and indexing a bulk every 10 minutes. Thanks. On Fri, Jan 18, 2013 at 2:15 AM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: I think fieldValueCache is not per segment, only fieldCache is. However, unless I'm missing something, this cache is only used for faceting on multivalued fields. On Thu, Jan 17, 2013 at 8:58 PM, Erick Erickson erickerick...@gmail.com wrote: filterCache: This is bounded by 1M * (maxDoc) / 8 * (num filters in cache). Notice the /8. This reflects the fact that the filters are represented by a bitset on the _internal_ Lucene ID. UniqueId has no bearing here whatsoever. This is, in a nutshell, why warming is required: the internal Lucene IDs may change. Note also that it's maxDoc; the internal arrays have holes for deleted documents. Note this is an _upper_ bound; if there are only a few docs that match, the size will be (num of matching docs) * sizeof(int). fieldValueCache: I don't think so, although I'm a bit fuzzy on this. It depends on whether these are per-segment caches or not. Any per-segment cache is still valid. Think of documentCache as intended to hold the stored fields while various components operate on them, thus avoiding repeatedly fetching the data from disk. It's _usually_ not too big a worry. About hard commits once a day: that's _extremely_ long. Think instead of committing more frequently with openSearcher=false. If nothing else, your transaction log will grow lots and lots and lots. I'm thinking on the order of 15 minutes, or possibly even much less. With softCommits happening more often, maybe every 15 seconds. In fact, I'd start out with soft commits every 15 seconds and hard commits (openSearcher=false) every 5 minutes. The problem with hard commits being once a day is that, if for any reason the server is interrupted, on startup Solr will try to replay the entire transaction log to assure index integrity.
Not to mention that your tlog will be huge. Not to mention that there is some memory usage for each document in the tlog. Hard commits roll over the tlog, flush the in-memory tlog pointers, close index segments, etc. Best, Erick On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi, I am going to build a big Solr (4.0?) index, which holds some dozens of millions of documents. Each document has some dozens of fields, and one big textual field. The queries on the index are non-trivial, and a little bit long (might be hundreds of terms). No query is identical to another. Now, I want to analyze the cache performance (before setting up the whole environment), in order to estimate how much RAM I will need. filterCache: In my scenario, every query has some filters. Let's say
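Erick's filterCache bound above is plain bit-set arithmetic: each cached filter is a bitset with one bit per internal Lucene docId, i.e. maxDoc/8 bytes, so the upper bound is (cache entries) * maxDoc/8. A throwaway sketch with assumed example numbers (50M docs and 512 cache entries are illustrative, not from the thread):

```java
public class FilterCacheSizing {
    public static void main(String[] args) {
        long maxDoc = 50_000_000L; // assumed index size (example figure)
        long cacheEntries = 512;   // assumed filterCache size setting

        // One bit per internal Lucene docId -> maxDoc/8 bytes per cached filter.
        long bytesPerFilter = maxDoc / 8;
        long upperBoundBytes = bytesPerFilter * cacheEntries;

        System.out.println(upperBoundBytes / (1024 * 1024) + " MB");
    }
}
```

For sparse filters Solr can store the matching docIds directly instead, which is why Erick notes the real size may be only (matching docs) * sizeof(int).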
Re: SOLR 4.x: multiterm phrase inside proximity searches possible?
There is no regular expression support across terms in queries, just within a single term, which is what LUCENE-2754 does (SpanMultiTermQueryWrapper). You can use the surround query parser to do span queries: http://wiki.apache.org/solr/SurroundQueryParser But surround does not support regex terms, just wildcards. -- Jack Krupansky -Original Message- From: Dmitry Kan Sent: Friday, January 18, 2013 8:59 AM To: solr-user@lucene.apache.org Subject: SOLR 4.x: multiterm phrase inside proximity searches possible? Hello! Does SOLR 4.x support / is it going to support multi-term phrase search inside proximity searches? To illustrate, we would like the following to work: "a b" c~10, which would return hits with "a b" 10 tokens away from c in no particular order. It looks like https://issues.apache.org/jira/browse/LUCENE-2754 implements what we need on the Lucene side. Regards, Dmitry
Re: SOLR 4.x: multiterm phrase inside proximity searches possible?
Thanks, Jack. I have been looking at SurroundQueryParser and came across an old thread [1] mentioning two drawbacks: term analysis is left out, and there is no default search operator. Do you know if this is still true? [1] http://search-lucene.com/m/94ONm1KRuAv1/ On Fri, Jan 18, 2013 at 4:38 PM, Jack Krupansky j...@basetechnology.com wrote: There is no regular expression support across terms in queries, just within a single term, which is what LUCENE-2754 does (SpanMultiTermQueryWrapper). You can use the surround query parser to do span queries: http://wiki.apache.org/solr/SurroundQueryParser But surround does not support regex terms, just wildcards. -- Jack Krupansky [...]
Re: SOLR 4.x: multiterm phrase inside proximity searches possible?
Unfortunately, yes. -- Jack Krupansky -Original Message- From: Dmitry Kan Sent: Friday, January 18, 2013 9:42 AM To: solr-user@lucene.apache.org Subject: Re: SOLR 4.x: multiterm phrase inside proximity searches possible? Thanks, Jack. I have been looking at SurroundQueryParser and came across an old thread [1] mentioning two drawbacks: term analysis is left out, and there is no default search operator. Do you know if this is still true? [1] http://search-lucene.com/m/94ONm1KRuAv1/ [...]
Re: SOLR 4.x: multiterm phrase inside proximity searches possible?
That is good to know, thanks! Taking the alternative (LUCENE-2754): somehow applying the most recent patch attached to the JIRA wasn't successful, not sure why. I'd guess, since it patches Lucene, one would still need to wire this into Solr, or am I missing something? Dmitry On Fri, Jan 18, 2013 at 4:44 PM, Jack Krupansky j...@basetechnology.com wrote: Unfortunately, yes. -- Jack Krupansky [...]
Re: SOLR 4.x: multiterm phrase inside proximity searches possible?
LUCENE-2754 is already in Lucene 4.0 - SpanMultiTermQueryWrapper. -- Jack Krupansky -Original Message- From: Dmitry Kan Sent: Friday, January 18, 2013 9:50 AM To: solr-user@lucene.apache.org Subject: Re: SOLR 4.x: multiterm phrase inside proximity searches possible? That is good to know, thanks! Taking the alternative (LUCENE-2754): somehow applying the most recent patch attached to the JIRA wasn't successful, not sure why. I'd guess, since it patches Lucene, one would still need to wire this into Solr, or am I missing something? Dmitry [...]
Re: SOLR 4.x: multiterm phrase inside proximity searches possible?
Yep, that's my issue: we still use Solr 3.4. On Fri, Jan 18, 2013 at 4:57 PM, Jack Krupansky j...@basetechnology.com wrote: LUCENE-2754 is already in Lucene 4.0 - SpanMultiTermQueryWrapper. -- Jack Krupansky [...]
Re: Questions about boosting
On 1/18/2013 12:32 AM, Mikhail Khludnev wrote: Colleagues, FWIW, bq is a DisMax parser feature. Shawn, to approach the boosting syntax with the standard parser you need something like q=foo:bar ip:sc^1000. Specifying ^1000 in bq makes no sense ever. If you show the query params and debugQuery output, it would be much easier for us to help you. PS: omitting termfreqs and positions doesn't impact query-time boosting ever. The closest caveat is that disabling norms indexing kills _index_-time boosting. Ah! As soon as I changed to my edismax handler, suddenly it started working! I was doing all my tests with /select. Now to work out what the boost factor should be. A value of 0.25 seems like it might produce good results. I was surprised; I thought it would require a higher value. Makes me think that this is not a multiplicative boost. Thanks, Shawn
SOLR-1604
Hello! Is there some activity on SOLR-1604? Can one of the contributors answer two simple questions? https://issues.apache.org/jira/browse/SOLR-1604?focusedCommentId=13557053page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13557053 Regards, Dmitry
Re: How to round solr score ?
On 18 January 2013 19:18, Gustav xbihy...@sharklasers.com wrote: [...] You have to use rint() in combination with query(). If I understand your requirements correctly, something along the lines below should work: http://localhost:8983/solr/select/?defType=func&q=rint(query({!v=text:term}))&fl=score,*&sort=score desc,price asc where one is searching for term in the field text. The score is displayed in the returned fields to demonstrate that it has been rounded off. Regards, Gora
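As a sanity check on what rounding does to Gustav's two example scores: Solr's rint() function mirrors Java's Math.rint (round to the nearest integer), so both scores collapse to the same value and the secondary sort on price takes over. A quick illustration:

```java
public class RintDemo {
    public static void main(String[] args) {
        // Gustav's two example scores from the thread.
        double scoreA = 1.67989;
        double scoreB = 1.6767;

        // Math.rint rounds to the nearest integer value (as a double).
        System.out.println(Math.rint(scoreA)); // 2.0
        System.out.println(Math.rint(scoreB)); // 2.0 -> a tie, so "price asc" decides
    }
}
```

Note that Math.rint uses round-half-even, so exact .5 values go to the nearest even integer; for Gustav's score ranges that subtlety does not matter.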
Re: Questions about boosting
On Jan 17, 2013, at 10:53 PM, Shawn Heisey wrote: On 1/17/2013 11:41 PM, Walter Underwood wrote: As I understand it, the bq parameter is a full Lucene query, but only used for ranking, not for selection. This is the complement of fq. You can use weighting: provider:fred^8 This will be affected by idf, so providers with fewer matches will have higher weight than those with more matches. This is a bother, but the idf-free approach requires Solr 4.0. I am doing my testing on Solr 4.1, so if you can give me the syntax for that, I would appreciate it. My production indexes are 3.5, but once we are confident with the 4.1 dev system, we'll upgrade. The provider field has omitTermFreqAndPositions=true defined, but the fields that typically get searched don't omit anything, so IDF probably still applies in the aggregate. On a related note, I have rather extreme length variation in my fields, so I see quite a lot of weird results due to very short metadata. Is there any way to lessen the impact of lengthNorm without eliminating it entirely? If not, is there any way to eliminate lengthNorm without also disabling index-time boosts? At this moment I am not doing index-time boosting, but business requirements may change that in the future. Thanks, Shawn I was experimenting with a boost function like this: if(exists(query(provider:fred)), 5, 1) That gives a constant boost if the term exists in the field, none if it does not. If you pass the provider in as a separate URL param, you could use parameter substitution. if(exists(query(provider:$provider)), 5, 1) For length norms, you could try a different similarity class or write your own, changing Similarity.computeNorm(). wunder -- Walter Underwood wun...@wunderwood.org
n values in one fieldType
Hi guys, I have some specific needs for an application. Each document (identified by docId) has several items of the same type (each of these items contains 6 integer values). So each Solr doc has a docId and another multiValued attribute:

<fields>
  <field name="docId" type="int"/>
  <field name="item" type="???" multiValued="true"/>
</fields>

My problem is that I don't know what fieldType I should use to implement the 'item' attribute, because every input query will have the 6 integer values I told you about before, to recover the docs that contain EXACTLY the 6 values. What do you think? Borja. -- View this message in context: http://lucene.472066.n3.nabble.com/n-values-in-one-fieldType-tp4034552.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Questions about boosting
On 1/18/2013 8:52 AM, Walter Underwood wrote: On Jan 17, 2013, at 10:53 PM, Shawn Heisey wrote: On 1/17/2013 11:41 PM, Walter Underwood wrote: As I understand it, the bq parameter is a full Lucene query, but only used for ranking, not for selection. This is the complement of fq. You can use weighting: provider:fred^8 This will be affected by idf, so providers with fewer matches will have higher weight than those with more matches. This is a bother, but the idf-free approach requires Solr 4.0. I am doing my testing on Solr 4.1, so if you can give me the syntax for that, I would appreciate it. My production indexes are 3.5, but once we are confident with the 4.1 dev system, we'll upgrade. The provider field has omitTermFreqAndPositions=true defined, but the fields that typically get searched don't omit anything, so IDF probably still applies in the aggregate. On a related note, I have rather extreme length variation in my fields, so I see quite a lot of weird results due to very short metadata. Is there any way to lessen the impact of lengthNorm without eliminating it entirely? If not, is there any way to eliminate lengthNorm without also disabling index-time boosts? At this moment I am not doing index-time boosting, but business requirements may change that in the future. Thanks, Shawn I was experimenting with a boost function like this: if(exists(query(provider:fred)), 5, 1) That gives a constant boost if the term exists in the field, none if it does not. If you pass the provider in as a separate URL param, you could use parameter substitution. if(exists(query(provider:$provider)), 5, 1) For length norms, you could try a different similarity class or write your own, changing Similarity.computeNorm(). I tried a boost= parameter on my 4.1 server and got an error message in the response with the entire value of the parameter: org.apache.solr.search.SyntaxError: Nested function query must use $param or {!v=value} forms. 
got 'if(exists(query(ip:sc)), 2, 1)' I get a different error if I change it to bf instead of boost: org.apache.solr.search.SyntaxError: Unexpected text after function: ) What am I doing wrong? Thanks, Shawn
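The parser is pointing at the two accepted nesting forms in that error message. As an untested sketch based on it (keeping Shawn's ip:sc example), the nested query would be wrapped in either form:

```
boost=if(exists(query({!v='ip:sc'})),2,1)

or, via parameter substitution:

boost=if(exists(query($ipq)),2,1)&ipq=ip:sc
```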
Re: Need 'stupid beginner' help with SolrCloud
On 1/17/2013 9:24 PM, Mark Miller wrote: There are a couple of ways you can proceed. You can preconfigure some SolrCores in solr.xml. Even if you don't, you want a solr.xml, because that is where a lot of cloud properties are defined. Or you can use the collections API or the core admin API. I guess I'd recommend the collections API. You have a couple of options for getting in config. I'd recommend using the ZkCli tool to upload each of your config sets: http://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper After that, use the collections API to create the necessary cores on each node. Another option is to set up solr.xml like you would locally, then start with -Dbootstrap_conf=true and it will duplicate your local config and collection setup in ZooKeeper. I have a cloud up and running with one config and collection, and I think I understand how to create more of both. Is the following the recommended way of choosing a collection with SolrJ's CloudSolrServer, or should I be doing something different? server.setDefaultCollection("test1"); Any SolrJ examples for cloud-related tasks would be appreciated. Thanks, Shawn
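The ZkCli upload step Mark mentions might look roughly like the following. This is an untested sketch; the classpath, ZooKeeper address, and config directory/name are illustrative and will differ per install:

```
java -classpath "example/solr-webapp/webapp/WEB-INF/lib/*" \
  org.apache.solr.cloud.ZkCLI -cmd upconfig \
  -zkhost localhost:2181 \
  -confdir example/solr/collection1/conf -confname myconf
```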
Re: Using Solr Spatial in conjunction with HBASE/Hadoop
Thanks guys! David, in general and in your opinion, would Lucene Spatial be the way to go to index hundreds of terabytes of spatial data that continually grows? Mostly point data, mostly structured; however, there could be polygons. The searches would be within or contains on a polygon. Do you have any thoughts on using a NoSQL database (like MongoDB) or something else comparable? I need response times in the seconds. My thoughts are that I need some type of distributed system. I was thinking about SolrCloud to solve this. I'm fairly new to Lucene/Solr. Most of the data is currently in HDFS/HBase. I've investigated sharding Oracle and Postgres databases, but this just doesn't seem like the ideal solution, and since all the data already exists in HDFS, I'd like to build a solution that works on top of it, but real-time or as near as I can get. Anyway, I've read some of your work in the past and appreciate your input. I don't mind putting in some development work, just not sure of the right approach. Thanks for your time. I appreciate it!
Re: Need 'stupid beginner' help with SolrCloud
On Jan 18, 2013, at 1:40 PM, Shawn Heisey s...@elyograg.org wrote: I have a cloud up and running with one config and collection, and I think I understand how to create more of both. Is the following the recommended way of choosing a collection with SolrJ's CloudSolrServer, or should I be doing something different? server.setDefaultCollection("test1"); Yeah, that's fine if you only plan on working with one collection. Otherwise just pass collection=whatever as a param to override. Any SolrJ examples for cloud-related tasks would be appreciated. I'll look at adding some to the wiki. - Mark
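Putting the two pieces together, a hedged SolrJ sketch (untested; it needs a running SolrCloud cluster plus the SolrJ jars on the classpath, and the ZooKeeper addresses and collection names here are made up):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// point at the ZooKeeper ensemble, not at any one Solr node
CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
server.setDefaultCollection("test1");   // used when no collection param is set

SolrQuery query = new SolrQuery("*:*");
query.set("collection", "test2");       // per-request override, as Mark describes
QueryResponse rsp = server.query(query);
```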
Re: Long ParNew GC pauses - even when young generation is small
product bragging alert We've got people running Solr on the Zing JVM at various places for exactly this reason. A key side effect of running on Zing is the complete elimination of GC effects, with no code changes or tuning needed. So instead of wanting pauses of half a second or less, and settling for pauses of 2 seconds or less (per your message), you can instead actually run on a JVM whose GC behavior drops noise to below 20 msec with Solr. And you can get this the day you turn Zing on, without needing to know any more about tuning, or having to make the various interplays or tradeoffs around things like ParNew sizing, heap sizing, occupancy thresholds, and the 75 other flags that may strongly affect your user experience. -- Gil. (CTO, Azul Systems).
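For context on the knobs Gil lists, a typical HotSpot tuning attempt for this kind of workload juggles flags along these lines. This is purely illustrative, not a recommendation; the values are invented and would need benchmarking against a real index:

```
-Xms4g -Xmx4g                          # heap sizing
-XX:NewSize=256m -XX:MaxNewSize=256m   # ParNew (young generation) sizing
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=75  # occupancy threshold
-XX:+UseCMSInitiatingOccupancyOnly
```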
Question on Solr Velocity Example
Hi, I am new to Solr (and Velocity), and have downloaded Solr 4.0 from http://lucene.apache.org/solr/downloads.html. I started the example solr, and indexed the XML files in the /exampledocs directory. Next, I pointed the browser to: http://localhost:8983/solr/browse and I get the results along with the search and faceted search functionality. I am interested in learning how this example works. I hope some of you can help me with the following questions: 1. In this example, we seem to be using the Velocity templates in: /example/solr/collection1/conf/velocity. The overall page at http://localhost:8983/solr/browse seems to be generated from browse.vm - which seems to include (parse) other templates. My question here is that I see things like $response.response.clusters – Where can I know what properties the “response” object has, or the “clusters” object has? Also there seem to be some methods like display_facet_query() – where is this defined. Is there some documentation for this, or some way I can find this out? I might need to modify these values, hence my question. (I am completely new to Velocity – but I think I get some idea by looking at the templates.) 2. In http://localhost:8983/solr/browse page, we have a list of Query Facets. Right now I just see two: ipod and GB? How are these values obtained? Do they come from elevate.xml?? Here I see ipod, but not GB. I would appreciate any help on these questions. If the above description is not clear please let me know. Thank you, O. O.
SolrCloud :: Distributed query processing
Hello, I'm trying to reconcile my understanding of how distributed queries are handled by SolrCloud with what I see in the server (Tomcat running Solr) logs. The setup: Solr 4.0 GA, single collection, one shard, two nodes (master and replica), standalone ZooKeeper ensemble. The client uses SolrJ CloudSolrServer to issue queries. Looking at one of the Solr instance's logs (same pattern for both master and replica) while repeatedly running a query, I sometimes see just one line:

INFO [org.apache.solr.core.SolrCore ] webapp=/solr path=/select params={start=0&q=my_query&wt=javabin&rows=10&version=2} hits=2 status=0 QTime=28

And sometimes I see the following 3 lines (for a single client request):

INFO [org.apache.solr.core.SolrCore ] webapp=/solr path=/select params={fl=user_id,score&shard.url=master_url|replica_url/&NOW=1358538200298&start=0&q=my_query&distrib=false&isShard=true&wt=javabin&fsv=true&rows=10&version=2} hits=2 status=0 QTime=14
INFO [org.apache.solr.core.SolrCore ] webapp=/solr path=/select params={shard.url=master_url|replica_url/&NOW=1358538200298&start=0&q=my_query&ids=229,118671&distrib=false&isShard=true&wt=javabin&rows=10&version=2} status=0 QTime=9
INFO [org.apache.solr.core.SolrCore ] webapp=/solr path=/select params={start=0&q=my_query&wt=javabin&rows=10&version=2} hits=2 status=0 QTime=107

I thought that the client simply picks a node (master or replica in this case) and that node will fully service the request, given that it's a single-shard setup. But apparently I'm missing something - please help me understand what. Thanks, Ernest The information contained in this message is intended only for the recipient, and may be a confidential attorney-client communication or may otherwise be privileged and confidential and protected from disclosure.
If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, please be aware that any dissemination or copying of this communication is strictly prohibited. If you have received this communication in error, please immediately notify us by replying to the message and deleting it from your computer. The McGraw-Hill Companies, Inc. reserves the right, subject to applicable local law, to monitor, review and process the content of any electronic message or information sent to or from McGraw-Hill e-mail addresses without informing the sender or recipient of the message. By sending electronic message or information to McGraw-Hill e-mail addresses you, as the sender, are consenting to McGraw-Hill processing any of your personal data therein.
Re: SolrCloud :: Distributed query processing
Hopefully the explanation here will shed some light on this: https://issues.apache.org/jira/browse/SOLR-3912 -Yonik http://lucidworks.com On Fri, Jan 18, 2013 at 2:59 PM, Mishkin, Ernest ernest_mish...@mcgraw-hill.com wrote: Hello, I'm trying to reconcile my understanding of how distributed queries are handled by SolrCloud with what I see in the server (Tomcat running Solr) logs. The setup: Solr 4.0 GA, single collection, one shard, two nodes (master and replica), standalone ZooKeeper ensemble. The client uses SolrJ CloudSolrServer to issue queries. Looking at one of the Solr instance's logs (same pattern for both master and replica) while repeatedly running a query, I sometimes see just one line:

INFO [org.apache.solr.core.SolrCore ] webapp=/solr path=/select params={start=0&q=my_query&wt=javabin&rows=10&version=2} hits=2 status=0 QTime=28

And sometimes I see the following 3 lines (for a single client request):

INFO [org.apache.solr.core.SolrCore ] webapp=/solr path=/select params={fl=user_id,score&shard.url=master_url|replica_url/&NOW=1358538200298&start=0&q=my_query&distrib=false&isShard=true&wt=javabin&fsv=true&rows=10&version=2} hits=2 status=0 QTime=14
INFO [org.apache.solr.core.SolrCore ] webapp=/solr path=/select params={shard.url=master_url|replica_url/&NOW=1358538200298&start=0&q=my_query&ids=229,118671&distrib=false&isShard=true&wt=javabin&rows=10&version=2} status=0 QTime=9
INFO [org.apache.solr.core.SolrCore ] webapp=/solr path=/select params={start=0&q=my_query&wt=javabin&rows=10&version=2} hits=2 status=0 QTime=107

I thought that the client simply picks a node (master or replica in this case) and that node will fully service the request, given that it's a single-shard setup. But apparently I'm missing something - please help me understand what. Thanks, Ernest
Solr 4.0 doesn't send qt parameter to shards
Can someone explain the logic of not sending the qt parameter down to the shards? I see from here that qt is handled as a special case for Result Grouping: http://lucidworks.lucidimagination.com/display/solr/Result+Grouping where there is a special shards.qt parameter. In 3.x, solrconfig.xml supports defining a list of SearchComponents on a handler-by-handler basis. This flexibility goes away if qt isn't passed down, or am I missing something? I'm using requestDispatcher handleSelect=true for the legacy behavior. We want to be able to have a single endpoint (e.g. http://localhost:8983/solr/select) and modify query processing by varying only the query parameters.
RE: SolrCloud :: Distributed query processing
Thanks Yonik, that issue is exactly the same as what I observed. Glad it's fixed in 4.1. -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Friday, January 18, 2013 3:10 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud :: Distributed query processing Hopefully the explanation here will shed some light on this: https://issues.apache.org/jira/browse/SOLR-3912 -Yonik http://lucidworks.com
Re: n values in one fieldType
It depends on what kind of behavior you're looking for. If, for your queries, the order of the 6 integer values doesn't matter, you could do:

<field name="item" type="tint" multiValued="true"/>

Then you could query with ORed or ANDed integer values over that field. If the order matters but you always query on the set of 6 values, then you turn your six integers into a GUID, or simply HEX-encode them into a single-valued string field. Another possibility is to HEX-encode the integers, separate them with whitespace, and whitespace-tokenize. Then you get a mixture of the two above, but you can also specify some locality constraints, e.g. using phrase queries, etc. The answer really depends on the types of queries you need to be able to respond to. M
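For the unordered variant, a minimal sketch of the schema line plus a matching query might look like the following. This is untested, the six values are invented, and note a caveat: with a flattened multiValued field, the AND only guarantees all six values occur somewhere in the document, not within one item:

```xml
<field name="item" type="tint" indexed="true" stored="true" multiValued="true"/>
```

queried as:

```
q=item:(3 AND 17 AND 42 AND 8 AND 99 AND 5)
```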
Distributed edismax query - which handler should be in the shards parameter?
I have a handler on my broker core which has defType=edismax and a shards parameter in solrconfig.xml. Currently the handler referenced on each shard in the shards parameter is a handler that does NOT have defType defined. The defType parameter is not in the URL that the client sends. It just occurred to me that perhaps the shard queries should be sent to a handler with defType=edismax ... is that right, or is my current approach OK? I've had the edismax handler defined for a long time, but it's only been in the last few days that it's actually seeing any use. Thanks, Shawn
Re: Auto completion
In the default /browse suggest, it's wired together with the 'name' field in a couple of places: head.vm: 'terms.fl': 'name', suggest.vm: #foreach($t in $response.response.terms.name) I'll aim to make this more dynamic in the future, but for now, if you're adapting from what's there now to use a different field, you'll need to hack those two spots. That second spot is simply a path navigation reference in the terms component response that puts the field name as one level in the response structure. So in your example, if you wanted to suggest from 'text', substitute 'text' for 'name' in the spots just mentioned. Erik

On Jan 11, 2013, at 01:21, anurag.jain wrote: in solrconfig.xml

<str name="defType">edismax</str>
<str name="qf">text^0.5 last_name^1.0 first_name^1.2 course_name^7.0 id^10.0 branch_name^1.1 hq_passout_year^1.4 course_type^10.0 institute_name^5.0 qualification_type^5.0 mail^2.0 state_name^1.0</str>
<str name="df">text</str>
<str name="mm">100%</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
<str name="mlt.qf">text^0.5 last_name^1.0 first_name^1.2 course_name^7.0 id^10.0 branch_name^1.1 hq_passout_year^1.4 course_type^10.0 institute_name^5.0 qualification_type^5.0 mail^2.0 state_name^1.0</str>
<str name="mlt.fl">text,last_name,first_name,course_name,id,branch_name,hq_passout_year,course_type,institute_name,qualification_type,mail,state_name</str>
<int name="mlt.count">3</int>
<str name="facet">on</str>
<str name="facet.field">is_top_institute</str>
<str name="facet.field">course_name</str>
<str name="facet.range">cgpa</str>
<int name="f.cgpa.facet.range.start">0</int>
<int name="f.cgpa.facet.range.end">10</int>
<int name="f.cgpa.facet.range.gap">2</int>

and in schema.xml

<field name="id" type="text_general" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="first_name" type="text_general" indexed="false" stored="true"/>
<field name="last_name" type="text_general" indexed="false" stored="true"/>
<field name="institute_name" type="text_general" indexed="true" stored="true"/>
...
<copyField source="first_name" dest="text"/>
<copyField source="last_name" dest="text"/>
<copyField source="institute_name" dest="text"/>
...

so please now tell me what the JavaScript (terms.fl parameter) should be in conf/velocity/head.vm, and also the 'name' reference in suggest.vm. please reply .. and thanks for previous reply .. :-)
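Following the two-spot substitution Erik describes, suggesting from the text field (which the copyFields above feed) would look roughly like this untested sketch:

```
head.vm:    'terms.fl': 'text',
suggest.vm: #foreach($t in $response.response.terms.text)
```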
Re: Velocity in Multicore
Paul - In case you haven't already sussed this one out, the likely issue is that each core is separately configured, and only the single-core example collection1 core comes with the VelocityResponseWriter wired in fully. You need these lines (paths likely need adjusting!) in your solrconfig.xml:

<lib dir="../../../contrib/velocity/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="apache-solr-velocity-\d.*\.jar" />

and

<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy"/>

I used the example-DIH, launching via (java -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar), after adding the above to the db/conf/solrconfig.xml file, and this works:

http://localhost:8983/solr/db/select?q=*:*&wt=velocity&v.template=hello&v.template.hello=Hello%20World!

Before getting VrW registered, Solr fell back to the XML response writer as you experienced. Erik

On Jan 14, 2013, at 14:05, Ramirez, Paul M (388J) wrote: Hi, I've been unable to get the velocity response writer to work in a multicore environment. Working from the examples distributed with Solr, I simply started from the multicore example and added a hello.vm into the core0/conf/velocity directory. I then updated solrconfig.xml to add a new request handler as shown below. I've tried to use v.base_dir with no success. Essentially what I always end up with is the default Solr response. Has anyone been able to get the velocity response writer to work in a multicore environment? If so, could you point me to the documentation on how to do so.

hello.vm
========
Hello World!

solrconfig.xml
==============
…
<requestHandler name="/hello" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!-- VelocityResponseWriter settings -->
    <str name="wt">velocity</str>
    <str name="v.template">hello</str>
    <!-- I've tried all the following in addition to not specifying any. -->
    <!-- <str name="v.base_dir">core0/conf/velocity</str> -->
    <!-- <str name="v.base_dir">conf/velocity</str> -->
    <!-- <str name="v.base_dir">multicore/core0/conf/velocity</str> -->
  </lst>
</requestHandler>
…
Regards, Paul Ramirez
Re: Question on Solr Velocity Example
On Jan 18, 2013, at 14:47 , O. Olson wrote: I am new to Solr (and Velocity), and have downloaded Solr 4.0 from http://lucene.apache.org/solr/downloads.html. I started the example solr, and indexed the XML files in the /exampledocs directory. Next, I pointed the browser to: http://localhost:8983/solr/browse and I get the results along with the search and faceted search functionality. I am interested in learning how this example works. I hope some of you can help me with the following questions: 1. In this example, we seem to be using the Velocity templates in: /example/solr/collection1/conf/velocity. The overall page at http://localhost:8983/solr/browse seems to be generated from browse.vm - which seems to include (parse) other templates. My question here is that I see things like $response.response.clusters – Where can I know what properties the “response” object has, or the “clusters” object has? Great question. $response is as described here http://wiki.apache.org/solr/VelocityResponseWriter#Velocity_Context You can navigate Solr's javadocs (or via IDE and the source code as I do) to trace what that object returns and then introspect as you drill in. I often just add '.class' to something in a template to have it output what kind of Java object it is, and work from there, such as $response.clusters.class Also there seem to be some methods like display_facet_query() – where is this defined. Is there some documentation for this, or some way I can find this out? I might need to modify these values, hence my question. (I am completely new to Velocity – but I think I get some idea by looking at the templates.) display_facet_query is a macro defined in velocity/VM_global_library.vm, which is the default Velocity location to put global macros that all templates can see. 2. In http://localhost:8983/solr/browse page, we have a list of Query Facets. Right now I just see two: ipod and GB? How are these values obtained? Do they come from elevate.xml?? 
Here I see ipod, but not GB. They come from the definition of the /browse handler (in other words, just arbitrary query request parameters, hard-coded for example purposes) as:

<str name="facet.query">ipod</str>
<str name="facet.query">GB</str>

The Velocity templates (facet_queries.vm in this case) dynamically generate the link and count display for all facet.query's in the request. Erik
Re: Question on Solr Velocity Example
----- Original Message ----- From: Erik Hatcher erik.hatc...@gmail.com To: solr-user@lucene.apache.org; O. Olson olson_...@yahoo.it Cc: Sent: Friday, 18 January 2013 15:20 Subject: Re: Question on Solr Velocity Example Great question. $response is as described here: http://wiki.apache.org/solr/VelocityResponseWriter#Velocity_Context You can navigate Solr's javadocs (or via IDE and the source code, as I do) to trace what that object returns and then introspect as you drill in. I often just add '.class' to something in a template to have it output what kind of Java object it is, and work from there, such as $response.clusters.class - Thank you Erik. On the page http://wiki.apache.org/solr/VelocityResponseWriter#Velocity_Context, if you click on QueryResponse you get a 404, i.e. the link to http://lucene.apache.org/solr/4_0_0/solr-core/org/apache/solr/client/solrj/response/QueryResponse.html is a 404. Thank you for throwing light on my other questions. Your responses helped. Thank you, O. O.
Re: Solr 4.0 doesn't send qt parameter to shards
Yeah, shards.qt is the way to go. I think this was basically done because of a limitation in early distrib search. If you want to do a distrib search against a bunch of shards and you put the shards to search into the request handler, you then don't want to hit that same request handler on the subsearches, or it will just keep sub-searching those nodes over and over. So usually it was recommended that you create a second search handler with the shards param. Distrib search is then set up so that sub-searches call the /select handler using the params from the custom handler you added. shards gets removed, though, and since it's not set in the /select handler you don't infinitely loop. I think with something like SolrCloud this is not a problem - you don't hardcode shards in the request handler. It's also not a problem if you specify shards on your query instead. It does make using some components awkward - it's not usually immediately clear how to add a distrib component - but you usually end up setting shards.qt so that you don't jump to the select handler. If you must hard-code shards in the config, you can try other things, like also adding the component to the /select handler. I think it would be nice to clean this up a bit somehow. Or document it better. - Mark On Jan 18, 2013, at 3:39 PM, Shawn Heisey s...@elyograg.org wrote: On 1/18/2013 1:20 PM, Mike Schultz wrote: Can someone explain the logic of not sending the qt parameter down to the shards? I see from here that qt is handled as a special case for Result Grouping: http://lucidworks.lucidimagination.com/display/solr/Result+Grouping where there is a special shards.qt parameter. In 3.x, solrconfig.xml supports defining a list of SearchComponents on a handler-by-handler basis. This flexibility goes away if qt isn't passed down, or am I missing something? I'm using requestDispatcher handleSelect=true for the legacy behavior. We want to be able to have a single endpoint (e.g.
http://localhost:8983/solr/select) and modify query processing by varying only the query parameters. Just add shards.qt with the appropriate value in each solrconfig.xml handler definition that needs it. That should work for most use cases. Thanks, Shawn
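The two-handler pattern Mark describes (a custom handler with hard-coded shards, plus a shard-level handler that the sub-requests hit) might be sketched in solrconfig.xml roughly as follows. This is an untested sketch; the handler names, hosts, and component name are illustrative:

```xml
<requestHandler name="/custom" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- hard-coded shard list; triggers the distributed search -->
    <str name="shards">host1:8983/solr,host2:8983/solr</str>
    <!-- send sub-requests to /customShard instead of /select -->
    <str name="shards.qt">/customShard</str>
  </lst>
  <arr name="last-components"><str>myComponent</str></arr>
</requestHandler>

<requestHandler name="/customShard" class="solr.SearchHandler">
  <!-- same components, but no shards default, so sub-requests don't fan out again -->
  <arr name="last-components"><str>myComponent</str></arr>
</requestHandler>
```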
Re: SolrCloud Performance for High Query Volume
Hi Otis, Thanks for the response. The primary difference in the schema and solrconfig are the settings that are needed/required for 4.0 compatibility; so things like version field, schema version number, auto commit settings etc. A quick note on our SolrCloud topology: we have 2 Shards with one replica per shard. So essentially 2 servers per shard, which makes up the 4 servers that I referred to below. (Sorry for not being specific) As for the comment about the RAM, given our SolrCloud setup we felt that we wouldn't need an equal amount of memory given the size of the shard would be roughly 50% of the entire document collection...at least that was our rationale. We might be totally off-base here. Our index will contain about 175 million documents, with each document having about 65 fields. The actual physical size of the index is estimated at about 75GB. Almost 90-95% of the queries executed against the index are filter queries, as the site is based on faceted searches. Hence I'll say that the queries will be diverse, as it's based on various user driven permutations. We're going to need to work with our infrastructure team to determine the disk IO utilization between the 3.6 and 4.0 environments. Hopefully that all makes sense. Any immediate thoughts on any of this? Thanks as usual. -Niran From: Otis Gospodnetic otis.gospodne...@gmail.com To: solr-user@lucene.apache.org; Niran Fajemisin afa...@yahoo.com Sent: Thursday, January 17, 2013 10:12 AM Subject: Re: SolrCloud Performance for High Query Volume Hello Niran, Now with the roughly the same schema and solrconfig configuration Can you be more specific about what was changed and how? * 4 Solr server instances each with 4 CPUs (each 6 cores, 2.67GHz), 8GB of RAM and 150GB HDD That's less RAM than before. Could it be that this causes more disk IO because the index is not as well cached? 
Note that you are comparing a non-real-time master-slave setup with a real-time SolrCloud setup (with an unknown number of shards, replicas, etc.) SSDs will help if there is a lot of disk IO (i.e. if indices are big, queries diverse, and free memory scarce). I'd start by looking at all system-level indicators and metrics. SPM for Solr may help: http://sematext.com/spm/solr-performance-monitoring/index.html . Maybe you can show us disk IO graphs for the old cluster vs. new cluster? Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Jan 15, 2013 at 11:54 AM, Niran Fajemisin afa...@yahoo.com wrote: Hi all, I'm currently in the process of doing some performance testing in preparations for upgrading from Solr 3.6.1 to Solr 4.0. (We're badly in need of NRT functionality) Our existing deployment is not a typical deployment for Solr, as we use it to search and facet on financial data such as accounts, positions and transactions records. To make matters worse, each request could potentially return upwards of 50,000 or more records from the index. As I said, it's not an ideal use case for Solr but this is the system that is in place and it really can't be changed at this point. With this defined use case, our current 3.6.1 deployment is able to scale to about 1500 queries per minute, with an average response time in the low 100-200ms. Note that this time includes the query time and the transport time (time to stream all the documents to the calling services). At the 50,000 document mark, we're getting about 1.6-2 sec. response time. The client is willing to live with this as these type of requests are not very frequent. Our hardware configuration on the 3.6.1 environment is as follows: * 1 Master Server for indexing with 2 CPU (each 6 cores, 2.67GHz) 4GB of RAM and 150GB HDD * 2 Slaves Servers for query only each with 2 of CPUs (each 6 cores, 2.67GHz) with 12GB of RAM each and same HDD space. 
(mechanical drives). Each of the servers is a virtual server in a VMware environment.

Now, with roughly the same schema and solrconfig configuration, the performance on Solr 4.0 is quite bad. Running just 500 queries per minute, our query performance degrades to almost 2-minute response times in some cases. The average is about 40-50 sec. response time. Note that the index at the moment is only a fraction of the size of the one in the existing environment (about 1/8th the size).

The hardware setup for the SolrCloud deployment is as follows:

* 4 Solr server instances, each with 4 CPUs (each 6 cores, 2.67GHz), 8GB of RAM and 150GB HDD
* 3 ZooKeeper server instances. We are using each Solr server instance to run 1 ZK instance, with the 4th server not running a ZK server.

We haven't observed any issues with memory utilization. Additionally, the virtual servers are co-located. We're wondering if upgrading to solid state drives would improve performance significantly? Are there any other pointers or configuration changes that we could make?
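Otis's suggestion above to compare system-level disk IO between the old and new clusters can be checked even without a monitoring product. A minimal sketch, assuming a Linux guest (it reads `/proc/diskstats`, where field 3 is the device name, field 6 is sectors read and field 10 is sectors written; the 2-second interval is arbitrary):

```shell
# Take two snapshots of per-device IO counters and print the rate of
# sectors read/written per second over the sampling interval.
snapshot() { awk '{ print $3, $6, $10 }' "${1:-/proc/diskstats}"; }

io_rates() {  # args: snapshot-file-1 snapshot-file-2 interval-seconds
  paste "$1" "$2" | awk -v t="$3" '
    { printf "%-10s read %10.1f sect/s  write %10.1f sect/s\n",
             $1, ($5 - $2) / t, ($6 - $3) / t }'
}

snapshot > /tmp/disk.before
sleep 2
snapshot > /tmp/disk.after
io_rates /tmp/disk.before /tmp/disk.after 2
```

Running this on a query node in each cluster while the load test is active would give comparable numbers; `iostat -x` from the sysstat package reports the same counters with less effort if it is installed.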
Re: Long ParNew GC pauses - even when young generation is small
On 1/18/2013 12:40 PM, giltene wrote:

"product bragging alert: We've got people running Solr on the Zing JVM at various places for exactly this reason. A key side effect of running on Zing is the complete elimination of GC effects, with no code changes or tuning needed. So instead of wanting pauses of half a second or less, and settling for pauses of 2 seconds or less (per your message), you can instead actually run on a JVM with a GC behavior that drops noise to below 20 msec with Solr. And you can get this the day you turn Zing on, without needing to know any more about tuning, or having to make the various interplays or tradeoffs around things like ParNew sizing, heap sizing, occupancy thresholds, and the 75 other flags that may strongly affect your user experience."

I don't see any info on your website about pricing, so I can't make any decisions about whether it would be right for me. Can you give me long-term pricing information? Chances are that once I inform management of the cost, it'd never fly.

Does anyone know how to get good GC pause characteristics with Solr and the latest Oracle Java 7?

Thanks,
Shawn
Re: Long ParNew GC pauses - even when young generation is small
On Jan 6, 2013, at 5:41 PM, Shawn Heisey s...@elyograg.org wrote:

"Clarification of my question and my goals: What I *want* is for all GC pauses to be half a second or less."

I'd try working with the concurrent, low-pause collector. Any of the stop-the-world collectors mixed with a large heap will likely mean pauses of a few seconds at least at some points. A well-tuned concurrent collector will never stop the world in most situations.

-XX:+UseConcMarkSweepGC

I wrote an article that might be useful a while back: http://searchhub.org/2011/03/27/garbage-collection-bootcamp-1-0/

- Mark
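For reference, the flag Mark mentions is usually combined with a few companion options. A hypothetical launch line, not a recommendation for this particular setup (the heap size and occupancy fraction here are illustrative values, and `start.jar` is the example Jetty launcher shipped with Solr of that era):

```shell
# Illustrative only: CMS with an explicit initiating occupancy, so the
# concurrent cycle starts before the old generation fills up, plus GC
# logging so pause lengths can be inspected afterwards.
java -Xms4g -Xmx4g \
     -XX:+UseParNewGC \
     -XX:+UseConcMarkSweepGC \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -XX:CMSInitiatingOccupancyFraction=75 \
     -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log \
     -jar start.jar
```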
Re: Long ParNew GC pauses - even when young generation is small
On 1/18/2013 8:37 PM, Mark Miller wrote:

"I'd try working with the concurrent, low-pause collector. Any of the stop-the-world collectors mixed with a large heap will likely mean pauses of a few seconds at least at some points. A well-tuned concurrent collector will never stop the world in most situations. -XX:+UseConcMarkSweepGC -- I wrote an article that might be useful a while back: http://searchhub.org/2011/03/27/garbage-collection-bootcamp-1-0/"

Mark, I have been using that collector. When I had a very large young generation (NewRatio=1), most of the really long collections were ParNew. When I lowered the young generation size drastically, the overall situation got slightly better. Unfortunately there are still long pauses, but now they are CMS. I wrote a handy little perl script to parse a GC log and spit out a compact listing of every line that takes longer than half a second.

On my dev 4.1 server with Java 7u11, I am using the G1 collector with a max pause target of 1500ms. I was thinking that this collector was producing long pauses too, but after reviewing the GC log with a closer eye, I see that there are lines that specifically say "pause" ... and all of THOSE lines are below half a second, except one that took 1.4 seconds. Does that mean that it's actually meeting the target, or are the other lines that show quite long time values indicative of a problem? If only the lines that explicitly say "pause" are the ones I need to worry about, then it looks like G1 is the clear winner.

My production servers are version 3.5 with Java 6u38.
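Shawn's perl script isn't shown, but the same filtering can be sketched in a few lines of awk, assuming the usual `-Xloggc`/`-XX:+PrintGCDetails` format where each event's duration appears as e.g. "0.7654321 secs]":

```shell
# Print every GC log line whose reported duration exceeds half a second.
slow_gcs() {
  awk -v limit=0.5 '{
    if (match($0, /[0-9][0-9]*\.[0-9]* secs\]/)) {
      # substr() yields e.g. "0.7654321 secs]"; adding 0 converts the
      # leading number and discards the trailing text.
      if (substr($0, RSTART, RLENGTH) + 0 > limit) print
    }
  }' "$@"
}

# usage: slow_gcs gc.log
```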
After reading your bootcamp article and consulting a few other guides, this was going to be my next step:

-Xms1024M -Xmx8192M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:NewRatio=3 -XX:MaxTenuringThreshold=8

I may try the G1 collector with Java 6 in production, since I am on the newest Oracle version. I am very interested in knowing whether I need to worry about G1 log entries that don't say "pause" in them. Below is an excerpt from a G1 log. Notice how only a few of the lines actually say "pause" ... in addition to these here that say (young), there are some pause lines that say (mixed): http://fpaste.org/U0aQ/

Thanks,
Shawn
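One caveat if G1 is tried on Java 6: G1 was still experimental in that release (available from 6u14, only officially supported from 7u4), so it must be unlocked explicitly. A hypothetical option set, with the 1500 ms target mirroring the dev-server setting mentioned above and `start.jar` standing in for whatever launcher is actually used:

```shell
# G1 on Java 6 requires the experimental-options flag; on Java 7u4+
# -XX:+UseG1GC alone is enough.
java -XX:+UnlockExperimentalVMOptions \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=1500 \
     -verbose:gc -Xloggc:gc-g1.log \
     -jar start.jar
```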