Re: Personalized Search Results or Matching Documents to Users
* How often are documents assigned to new users?
* How many documents does a user typically have?
* Do you have a 'trigger' in your app that tells you a user has been assigned a new doc?

You can use a pseudo join to implement this sort of thing: have a different core that contains the 'permissions', either a document that says "this document ID is accessible by these users" or "this user is allowed to see these document IDs". That keeps your fast-moving (authorization) data separate from your slow-moving data (the docs themselves). You can then ask "find me all documents that are accessible by user X". Upayavira -- Thanks Regards Umesh Prasad Tech Lead @ flipkart.com in.linkedin.com/pub/umesh-prasad/6/5bb/580/
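For illustration, a minimal SolrJ sketch of the pseudo-join approach described above. The core names ("docs", "permissions") and field names (id, doc_id, user_id) are assumptions; note the {!join ... fromIndex=...} parser requires both cores to live on the same Solr instance and the permissions core to be unsharded.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class PermissionJoinExample {
      public static void main(String[] args) throws Exception {
        // assumed setup: a "docs" core with the slow-moving documents and a
        // "permissions" core holding small docs of the form {doc_id: ..., user_id: ...}
        HttpSolrServer docsCore = new HttpSolrServer("http://localhost:8983/solr/docs");

        SolrQuery q = new SolrQuery("*:*");
        // cross-core pseudo join: keep only documents whose id appears in a
        // permissions document belonging to user X
        q.addFilterQuery("{!join fromIndex=permissions from=doc_id to=id}user_id:userX");

        QueryResponse rsp = docsCore.query(q);
        System.out.println("docs visible to userX: " + rsp.getResults().getNumFound());
      }
    }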
Re: Solr hangs / LRU operations are heavy on cpu
It might be because LRUCache by default will try to evict its entries on each call to put and putAll. LRUCache is built on top of Java's LinkedHashMap; check the javadoc of removeEldestEntry: http://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html#removeEldestEntry%28java.util.Map.Entry%29 Try using LFUCache with a separate cleanup thread. We have been using that for over 2 years now without any issues. For a comparison of the caches in Solr you can check this link: https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig On 20 March 2015 at 04:05, Sergey Shvets ser...@bintime.com wrote: LRUCache It -- Thanks Regards Umesh Prasad Tech Lead @ flipkart.com in.linkedin.com/pub/umesh-prasad/6/5bb/580/
Re: Grouping based on multiple filters/criterias
Solr does support date mathematics in filters / queries . So your timestamps intervals can be dynamic .. On 22 August 2014 05:51, deniz denizdurmu...@gmail.com wrote: umeshprasad wrote Grouping supports group by queries. https://cwiki.apache.org/confluence/display/solr/Result+Grouping However you will need to form the group queries before hand. Thanks Regards Umesh Prasad Search Lead@ in.linkedin.com/pub/umesh-prasad/6/5bb/580/ have seen this page before but it is not providing the functionality that I need, because the timestamp interval would be seriously tricky, as it is supposed to be dynamic... though i have found another solution to handle this out of Solr :) - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-based-on-multiple-filters-criterias-tp4153462p4154343.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks Regards Umesh Prasad Search l...@flipkart.com in.linkedin.com/pub/umesh-prasad/6/5bb/580/
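For reference, a small SolrJ sketch of using date math in a filter query so the interval stays dynamic; the field name (timestamp) and core URL are assumptions.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class DateMathFilterExample {
      public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("*:*");
        // date math keeps the interval dynamic: "the last hour", rounded down to the hour
        q.addFilterQuery("timestamp:[NOW/HOUR-1HOUR TO NOW/HOUR]");
        // other intervals work the same way, e.g. timestamp:[NOW-7DAYS TO NOW]
        System.out.println(solr.query(q).getResults().getNumFound());
      }
    }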
Re: Dynamically loaded core.properties file
The core discovery process is dependent on presence of core.properties file in the particular directory. You can have a script, which will traverse the directory structure of core base directory and depending on env/host name, will either restore core.properties or rename it to a different file. The script will have to run before solr starts. So solr will see the directory structures, but core.properties will be missing from directories which you do not want to load (renamed as core.properties.bkp) We are already using this approach to control core discovery in prod (we have 40 plus cores and we co-host only a couple of them on a single server. ) On 21 August 2014 04:41, Erick Erickson erickerick...@gmail.com wrote: OK, not quite sure if this would work, but In each core.properties file, put in a line similar to what Chris suggested: properties=${env}/custom.properties You might be able to now define your sys var like -Drelative_or_absolute_path_to_dev_custom.proerties file. or -Drelative_or_absolute_path_to_prod_custom.proerties file. on Solr startup. Then in the custom.properties file you have whatever you need to define to make the prod/dev distinction you need. WARNING: I'm not entirely sure that relative pathing works here, which just means I haven't tried it. Best, Erick On Wed, Aug 20, 2014 at 3:11 PM, Ryan Josal ry...@pointinside.com wrote: Thanks Erick, that mirrors my thoughts exactly. If core.properties had property expansion it would work for this, but I agree with not supporting that for the complexities it introduces, and I'm not sure it's the right way to solve it anyway. So, it doesn't really handle my problem. I think because the properties file I want to load is not actually related to any core, it makes it easier to solve. So if solr.xml is no longer rewritten then it seems like a global properties file could safely be specified there using property expansion. Or maybe there is some way to write some code that could get executed before schema and solrconfig are parsed, although I'm not sure how that would work given how you need solrconfig to load the libraries and define plugins. Ryan On 08/20/2014 01:07 PM, Erick Erickson wrote: Hmmm, I was going to make a code change to do this, but Chris Hostetter saved me from the madness that ensues. Here's his comment on the JIRA that I did open (but then closed), does this handle your problem? I don't think we want to make the name of core.properties be variable ... that way leads to madness and confusion. the request on the user list was about being able to dynamically load a property file with diff values between dev production like you could do in the old style solr.xml – that doesn't mean core.properties needs to have a configurable name, it just means there needs to be a configurable way to load properties. we already have a properties option which can be specified in core.properties to point to an additional external file that should also be loaded ... if variable substitution was in play when parsing core.properties then you could have something like properties=custom.${env}.properties in core.properties ... but introducing variable substitution into thecore.properties (which solr both reads writes based on CoreAdmin calls) brings back the host of complexities involved when we had persistence of solr.xml as a feature, with the questions about persisting the original values with variables in them, vs the values after evaluating variables. 
Best, Erick On Wed, Aug 20, 2014 at 11:36 AM, Ryan Josal ry...@pointinside.com wrote: Hi all, I have a question about dynamically loading a core properties file with the new core discovery method of defining cores. The concept is that I can have a dev.properties file and a prod.properties file, and specify which one to load with -Dsolr.env=dev. This way I can have one file which specifies a bunch of runtime properties like external servers a plugin might use, etc. Previously I was able to do this in solr.xml because it can do system property substitution when defining which properties file to use for a core. Now I'm not sure how to do this with core discovery, since the core is discovered based on this file, and now the file needs to contain things that are specific to that core, like name, which previously were defined in the xml definition. Is there a way I can plugin some code that gets run before any schema or solrconfigs are parsed? That way I could write a property loader that adds properties from ${solr.env}.properties to the JVM system properties. Thanks! Ryan -- Thanks Regards Umesh Prasad Search l...@flipkart.com in.linkedin.com/pub/umesh-prasad/6/5bb/580/
Re: logging in solr
Or you could use system properties to control that. For example if you are using logbak, then JAVA_OPTS=$JAVA_OPTS -Dlogback.configurationFile=$CATALINA_BASE/conf/logback.xml will do it On 20 August 2014 03:15, Aman Tandon amantandon...@gmail.com wrote: As you are using tomcat you can configure the log file name, folder,etc. by configuring the server.xml present in the Conf directory of tomcat. On Aug 19, 2014 4:17 AM, Shawn Heisey s...@elyograg.org wrote: On 8/18/2014 2:43 PM, M, Arjun (NSN - IN/Bangalore) wrote: Currently in my component Solr is logging to catalina.out. What is the configuration needed to redirect those logs to some custom logfile eg: Solr.log. Solr uses the slf4j library for logging. Simply change your program to use slf4j, and very likely the logs will go to the same place the Solr logs do. http://www.slf4j.org/manual.html See also the wiki page on logging jars and Solr: http://wiki.apache.org/solr/SolrLogging Thanks, Shawn -- Thanks Regards Umesh Prasad Search l...@flipkart.com in.linkedin.com/pub/umesh-prasad/6/5bb/580/
Re: Substring and Case In sensitive Search
The performance of wild card queries and specially prefix wild card query can be quite slow. http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/WildcardQuery.html Also, you won't be able to time them out. Take a look at ReversedWildcardFilter http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html The blog post describes it nicely .. http://solr.pl/en/2011/10/10/%E2%80%9Ccar-sale-application%E2%80%9D-%E2%80%93-solr-reversedwildcardfilter-%E2%80%93-lets-optimize-wildcard-queries-part-8/ On 19 August 2014 22:19, Jack Krupansky j...@basetechnology.com wrote: Substring search a string field using wildcard, *, at beginning and end of query term. Case-insensitive match on string field is not supported. Instead, copy the string field to a text field, use the keyword tokenizer, and then apply the lower case filter. But... review your use case to confirm whether you really need to use string as opposed to text field. -- Jack Krupansky -Original Message- From: Nishanth S Sent: Tuesday, August 19, 2014 12:03 PM To: solr-user@lucene.apache.org Subject: Substring and Case In sensitive Search Hi, I am very new to solr.How can I allow solr search on a string field case insensitive and substring?. Thanks, Nishanth -- Thanks Regards Umesh Prasad Search l...@flipkart.com in.linkedin.com/pub/umesh-prasad/6/5bb/580/
Re: Grouping based on multiple filters/criterias
Grouping supports group by queries. https://cwiki.apache.org/confluence/display/solr/Result+Grouping However you will need to form the group queries before hand. On 18 August 2014 12:47, deniz denizdurmu...@gmail.com wrote: is it possible to have multiple filters/criterias on grouping? I am trying to do something like those, and I am assuming that from the statuses of the tickets, it doesnt seem possible? https://issues.apache.org/jira/browse/SOLR-2553 https://issues.apache.org/jira/browse/SOLR-2526 https://issues.apache.org/jira/browse/LUCENE-3257 To make everything clear, here is details which I am planning to do with Solr... so there is an activity feed of a site and it is basically working like facebook or linkedin newsfeed, though there is no relationship between users, it doesnt matter if i am following someone or not, as long as their settings allows me to see their posts and they hit my search filter, i will see their posts. the part related with grouping is tricky... so lets assume that you are able to see my posts, and I have posted 8 activities in the last one hour, those activities should appear different than other posts, as it would be a combined view of the posts... i.e deniz activity one activity two . activity eight /deniz other user 1 single activity /other user 1 another user 1 single activity /another user 1 other user 2 activity one activity two /other user 2 So here the results should be grouped depending on their post times... on solr (4.7.2), i am indexing activities as documents, and each document has bunch of fields including timestamp and source_user etc etc. is it possible to do this on current solr? (in case the details are not clear, please feel free to ask for more details :) ) - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-based-on-multiple-filters-criterias-tp4153462.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks Regards Umesh Prasad Search l...@flipkart.com in.linkedin.com/pub/umesh-prasad/6/5bb/580/
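A hedged SolrJ sketch of forming the group queries up front, roughly matching the activity-feed example above; the field names (source_user, timestamp) and bucket definitions are assumptions.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class GroupQueryExample {
      public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("*:*");
        q.set("group", "true");
        q.set("group.limit", "10");
        // each group.query is built beforehand by the application, one per bucket
        q.add("group.query", "source_user:42 AND timestamp:[NOW-1HOUR TO NOW]"); // recent posts by one user
        q.add("group.query", "timestamp:[NOW-1DAY TO NOW-1HOUR]");               // everything older

        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getGroupResponse().getValues().size() + " groups returned");
      }
    }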
Re: Selectively setting the number of returned SOLR rows per field based on field value
Field Collapsing has a limitation. Currently it will not allow you to get different number of results from a each group. You can plug a custom AnalyticQuery, which can do exactly what you want with after seeing a matching document. https://cwiki.apache.org/confluence/display/solr/AnalyticsQuery+API On 18 August 2014 04:32, Erick Erickson erickerick...@gmail.com wrote: Aurélien is correct, for the exact behavior you're looking for you'd need to run w queries. But you might be able to make do with field collapsing. You'd probably have to copyField from title to title_grouping which would be un-analyzed (string type or KeywordTokenizer), then group on _that_ field. You'd get back the top N matches grouped by title and your app could display that info however it made sense. Grouping sometimes goes by field collapsing FWIW. Erick On Sun, Aug 17, 2014 at 2:16 PM, talt mikaelsaltz...@gmail.com wrote: I have a field in my SOLR index, let's call it book_title. A query returns 15 rows with book_title:The Kite Runner, 13 rows with book_title:The Stranger, and 8 rows with book_title:The Ruby Way. Is there a way to return only the first row of The Kite Runner and The Stranger, but all of the The Ruby Way rows from the previous query result? This would result in 10 rows altogether. Is this possible at all, using a single query? -- View this message in context: http://lucene.472066.n3.nabble.com/Selectively-setting-the-number-of-returned-SOLR-rows-per-field-based-on-field-value-tp4153441.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks Regards Umesh Prasad Search l...@flipkart.com in.linkedin.com/pub/umesh-prasad/6/5bb/580/
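A rough sketch of the custom AnalyticsQuery idea above, assuming the Solr 4.7+ AnalyticsQuery/DelegatingCollector API referenced in that link. The class name, the title-lookup helper and the "keep all Ruby Way rows, only one of everything else" rule are made up for illustration, and the wiring through a QParserPlugin is omitted.

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.search.AnalyticsQuery;
    import org.apache.solr.search.DelegatingCollector;

    public class PerTitleLimitQuery extends AnalyticsQuery {
      @Override
      public DelegatingCollector getAnalyticsCollector(ResponseBuilder rb, IndexSearcher searcher) {
        return new DelegatingCollector() {
          private final Map<String, Integer> seenPerTitle = new HashMap<String, Integer>();

          @Override
          public void collect(int doc) throws IOException {
            String title = lookupTitle(doc);           // hypothetical helper, e.g. a DocValues lookup on book_title
            Integer seen = seenPerTitle.get(title);
            int count = (seen == null) ? 0 : seen;
            // keep every "The Ruby Way" row, but only the first row of any other title
            if ("The Ruby Way".equals(title) || count < 1) {
              seenPerTitle.put(title, count + 1);
              super.collect(doc);                       // pass the doc down to the normal result collector
            }
          }
        };
      }

      private String lookupTitle(int doc) {
        return null; // placeholder: a real implementation would read per-segment DocValues here
      }
    }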
Re: indexing comments with Apache Solr
griddynamics blog is useful. It has 4 parts which covers block join quite well .. http://blog.griddynamics.com/2012/08/block-join-query-performs.html http://blog.griddynamics.com/2013/09/solr-block-join-support.html http://blog.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html http://blog.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html The github repo is https://gist.github.com/mkhludnev On 6 August 2014 19:05, Ali Nazemian alinazem...@gmail.com wrote: Dear Alexandre, Hi, Thank you very much. I think nested document is what I need. Do you have more information about how can I define such thing in solr schema? Your mentioned blog post was all about retrieving nested docs. Best regards. On Wed, Aug 6, 2014 at 5:16 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: You can index comments as child records. The structure of the Solr document should be able to incorporate both parents and children fields and you need to index them all together. Then, just search for JOIN syntax for nested documents. Also, latest Solr (4.9) has some extra functionality that allows you to find all parent pages and then expand children pages to match. E.g.: http://heliosearch.org/expand-block-join/ seems relevant Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Wed, Aug 6, 2014 at 11:18 AM, Ali Nazemian alinazem...@gmail.com wrote: Dear Gora, I think you misunderstood my problem. Actually I used nutch for crawling websites and my problem is in index side and not crawl side. Suppose page is fetch and parsed by Nutch and all comments and the date and source of comments are identified by parsing. Now what can I do for indexing these comments? What is the document granularity? Best regards. On Wed, Aug 6, 2014 at 1:29 PM, Gora Mohanty g...@mimirtech.com wrote: On 6 August 2014 14:13, Ali Nazemian alinazem...@gmail.com wrote: Dear all, Hi, I was wondering how can I mange to index comments in solr? suppose I am going to index a web page that has a content of news and some comments that are presented by people at the end of this page. How can I index these comments in solr? consider the fact that I am going to do some analysis on these comments. For example I want to have such query flexibility for retrieving all comments that are presented between 24 June 2014 to 24 July 2014! or all the comments that are presented by specific person. Therefore defining these comment as multi-value field would not be the solution since in this case such query flexibility is not feasible. So what is you suggestion about document granularity in this case? Can I consider all of these comments as a new document inside main document (tree based structure). What is your suggestion for this case? I think it is a common case of indexing webpages these days so probably I am not the only one thinking about this situation. Please share you though and perhaps your experiences in this condition with me. Thank you very much. Parsing a web page, and breaking up parts up for indexing into different fields is out of the scope of Solr. You might want to look at Apache Nutch which can index into Solr, and/or other web crawlers/scrapers. Regards, Gora -- A.Nazemian -- A.Nazemian -- Thanks Regards Umesh Prasad Search l...@flipkart.com in.linkedin.com/pub/umesh-prasad/6/5bb/580/
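To make the child-document idea concrete, a small SolrJ sketch (Solr 4.5+). Field names (doc_type, comment_date, etc.) are assumptions; the schema must cover the fields of both parents and children, and a parent and its children have to be indexed together in one call.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class NestedCommentExample {
      public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrInputDocument page = new SolrInputDocument();
        page.addField("id", "page-1");
        page.addField("doc_type", "page");
        page.addField("content", "news article text ...");

        SolrInputDocument comment = new SolrInputDocument();
        comment.addField("id", "page-1-comment-1");
        comment.addField("doc_type", "comment");
        comment.addField("comment_author", "alice");
        comment.addField("comment_date", "2014-07-01T10:00:00Z");
        page.addChildDocument(comment);   // children are indexed in the same block as the parent

        solr.add(page);
        solr.commit();

        // block join: pages having at least one comment in the given date range
        SolrQuery q = new SolrQuery(
            "{!parent which=\"doc_type:page\"}comment_date:[2014-06-24T00:00:00Z TO 2014-07-24T00:00:00Z]");
        System.out.println(solr.query(q).getResults().getNumFound());
      }
    }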
Re: Modify/add/remove params at search component
Use ModifiableSolrParams:

    // wrap the request's (read-only) params in a modifiable copy
    SolrParams params = rb.req.getParams();
    ModifiableSolrParams modifiableSolrParams = new ModifiableSolrParams(params);
    modifiableSolrParams.set("paramName", "paramValue");
    // make the rest of the component chain see the new params
    rb.req.setParams(modifiableSolrParams);

On 4 August 2014 12:47, Lee Chunki lck7...@coupang.com wrote: Hi, I am building a new search component and it runs after QueryComponent. What I want to do is set params like start, rows, query and so on at the new search component. I could set/get the query by using setQueryString() http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/handler/component/ResponseBuilder.html#setQueryString(java.lang.String) and getQueryString() http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/handler/component/ResponseBuilder.html#getQueryString() and get params by using rb.req.getParams(), but how can I set params at a search component? Thanks, Chunki. -- Thanks Regards Umesh Prasad Search l...@flipkart.com in.linkedin.com/pub/umesh-prasad/6/5bb/580/
Re: Query on Facet
using the below query to get facets for the combination of language and binding. But now I'm getting only the selected facet (and its count) in the facet list of each field. E.g. in the language facets the query is returning only English and its count; instead I need to get the other language facets which satisfy the binding type of paperback:

    http://localhost:8080/solr/collection1/select?q=software%20testing&fq=language%3A(%22English%22)&fq=Binding%3A(%22paperback%22)&facet=true&facet.mincount=1&facet.field=Language&facet.field=latestArrivals&facet.field=Binding&wt=json&indent=true&defType=edismax&json.nl=map

Please provide me your inputs. Thanks Regards, Smitha -- --- Thanks Regards Umesh Prasad
Re: Solr gives the same fieldnorm for two different-size fields
What you really need is a covering type match. I feel your use case fits this ordering: Score(Exact match in order) > Score(Exact match without order) > Score(Non-exact match).

Example query: a b c
Example docs: d1: a b c, d2: a c b, d3: c a b, d4: a b c d, d5: a b c d e

Use case 1: Only an exact match counts (so only d1 is a match).
Use case 2: Only in-order matches count, so d2 and d3 aren't matches. Scores are d1 > d4 > d5.
Use case 3: Only in-order matches count, and only one extra term is allowed, so d2, d3 and d5 aren't matches. Scores are d1 > d4.
Use case 4: All are matches, and d1 > d2 > d3 > d4 > d5.

All of these use cases can be satisfied by using SpanQueries, which track the positions at which terms match. For a covering match, you will need to add start and end sentinel terms during indexing. There is an excellent post by Mark Miller about span queries: http://searchhub.org/2009/07/18/the-spanquery/ Solr's Surround query parser allows you to create SpanQueries: http://wiki.apache.org/solr/SurroundQueryParser Or you can plug your own query parser into Solr to do the same. Some more links you can get here: http://search-lucene.com/?q=span+queries&fc_project=Lucene&fc_project=Solr

On 1 August 2014 00:24, Erick Erickson erickerick...@gmail.com wrote: You can consider, say, a copyField directive and copy the field into a string type (or perhaps keyworTokenizer followed by lowerCaseFilter) and then match or boost on an exact match rather than trying to make scoring fill this role. In any case, I'm thinking of normalizing the sensitive fields and indexing them as a single token (i.e. the string type or keywordtokenizer) to disambiguate these cases. Because otherwise I fear you'll get one situation to work, then fail on the next case. In your example, you're trying to use length normalization to influence scoring to get the doc with the shorter field to sort above the doc with the longer field. But what are you going to do when your target is university of california berkley research? Rely on matching all the terms? And so on... Best, Erick On Thu, Jul 31, 2014 at 10:26 AM, gorjida a...@sciencescape.net wrote: Thanks so much for your reply... In my case, it really matters because I am going to find the correct institution match for an affiliation string... For example, if an author belongs to the university of Toronto, his/her affiliation should be normalized against the solr... In this case, University of California Berkley Research is a different place to university of california berkeley... I see top-matches are tied in the score for this specific example... I can break the tie using other techniques... However, I am keen to see if this is a common problem in solr? Regards, Ali -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-gives-the-same-fieldnorm-for-two-different-size-fields-tp4150418p4150430.html Sent from the Solr - User mailing list archive at Nabble.com. -- --- Thanks Regards Umesh Prasad
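For illustration, a hedged Lucene sketch of use case 2 above (only in-order matches count) expressed as a SpanNearQuery; the field name is an assumption.

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    public class InOrderSpanExample {
      public static Query build() {
        SpanQuery[] clauses = new SpanQuery[] {
            new SpanTermQuery(new Term("name", "a")),
            new SpanTermQuery(new Term("name", "b")),
            new SpanTermQuery(new Term("name", "c"))
        };
        // slop 0 + inOrder=true: "a b c" must occur adjacent and in this order,
        // so d1 (a b c), d4 (a b c d) and d5 (a b c d e) match, d2/d3 do not
        return new SpanNearQuery(clauses, 0, true);
      }
    }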
Re: Searching words with spaces for word without spaces in solr
While using shingles in the query analyzer, the query "ice cube" creates three tokens: ice, cube, icecube. Only ice and cube are searched, but not icecube, i.e. the pair is not working even though I am using the shingle filter. Here's the schema config:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms_text_prime_index.txt" ignoreCase="true" expand="true"/>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
        <filter class="solr.WordDelimiterFilterFactory" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1" generateWordParts="1" generateNumberParts="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
        <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

Any help is appreciated. -- --- Thanks Regards Umesh Prasad
Re: Bloom filter
+1 to Guava's BloomFilter implementation. You can actually hook into UpdateProcessor chain and have the logic of updating bloom filter / checking there. We had a somewhat similar use case. We were using DIH and it was possible that same solr input document (meaning same content) will be coming lots of times and it was leading to a lot of unnecessary updates in index. I introduced a DuplicateDetector using update processor chain which kept a map of Unique ID -- solr doc hash code and will drop the document if it was a duplicate. There is a nice video of other usage of Update chain https://www.youtube.com/watch?v=qoq2QEPHefo On 30 July 2014 23:05, Shalin Shekhar Mangar shalinman...@gmail.com wrote: You're right. I misunderstood. I thought that you wanted to optimize the finding by id path which is typically done for comparing versions during inserts in Solr. Yes, it won't help with the case where the ID does not exist. On Wed, Jul 30, 2014 at 6:14 PM, Per Steffensen st...@designware.dk wrote: Hi I am not sure exactly what LUCENE-5675 does, but reading the description it seems to me that it would help finding out that there is no document (having an id-field) where version-field is less than some-version. As far as I can see this will not help finding out if a document with id=some-id exists. We want to ask does a document with id some-id exist, without knowing the value of its version-field (if it actually exists). You do not know if it ever existed, either. Please elaborate. Thanks! Regarding The only other choice today is bloom filters, which use up huge amounts of memory, I guess a bloom filter only takes as much space (disk or memory) as you want it to. The more space you allow it to use the more it gives you a false positive (saying this doc might exist in cases where the doc actually does not exist). So the space you need to use for the bloom filter depends on how frequently you can live with false positives (where you have to actually look it up in the real index). Regards, Per Steffensen On 30/07/14 10:05, Shalin Shekhar Mangar wrote: Hi Per, There's LUCENE-5675 which has added a new postings format for IDs. Trying it out in Solr is in my todo list but maybe you can get to it before me. https://issues.apache.org/jira/browse/LUCENE-5675 On Wed, Jul 30, 2014 at 12:57 PM, Per Steffensen st...@designware.dk wrote: On 30/07/14 08:55, jim ferenczi wrote: Hi Per, First of all the BloomFilter implementation in Lucene is not exactly a bloom filter. It uses only one hash function and you cannot set the false positive ratio beforehand. ElasticSearch has its own bloom filter implementation (using guava like BloomFilter), you should take a look at their implementation if you really need this feature. Yes, I am looking into what Lucene can do and how to use it through Solr. If it does not fit our needs I will enhance it - potentially with inspiration from ES implementation. Thanks What is your use-case ? If your index fits in RAM the bloom filter won't help (and it may have a negative impact if you have a lot of segments). In fact the only use case where the bloom filter can help is when your term dictionary does not fit in RAM which is rarely the case. We have so many documents that it will never fit in memory. We use optimistic locking (our own implementation) to do correct concurrent assembly of documents and to do duplicate control. 
This require a lot of finding docs from their id, and most of the time the document is not there, but to be sure we need to check both transactionlog and the actual index (UpdateLog). We would like to use Bloom Filter to quickly tell that a document with a particular id is NOT present. Regards, Jim Regards, Per Steffensen -- Regards, Shalin Shekhar Mangar. -- --- Thanks Regards Umesh Prasad
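For reference, a bare-bones sketch of the DuplicateDetector update processor described earlier in this thread: drop a document whose content hash has not changed since the last time its unique key was seen. The class name, the "id" field and the in-memory map are illustrative; a Guava BloomFilter could stand in for the map when memory matters.

    import java.io.IOException;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class DuplicateDetectorFactory extends UpdateRequestProcessorFactory {
      // uniqueKey -> hash of the last version of the document we let through
      private final Map<String, Integer> lastSeen = new ConcurrentHashMap<String, Integer>();

      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                                UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            String id = (String) doc.getFieldValue("id");
            Integer hash = doc.toString().hashCode();   // crude content hash, for illustration only
            if (hash.equals(lastSeen.get(id))) {
              return;                                   // unchanged duplicate: drop it silently
            }
            lastSeen.put(id, hash);
            super.processAdd(cmd);                      // changed or new: pass it down the chain
          }
        };
      }
    }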
Re: Shuffle results a little
What you are look for is a distribution of search results. One way would be a two phase search Phase 1 : Search (with rows =0, No scoring, no grouping) 1. Find the groups (unique combinations) using pivot facets (won't work in distributed env yet) 2. Transform those groups as group.queries .. Phase 2 : Actual search ( with group.queries ) Pros : Readily available and well tested. Cons : It will give you exact same number of results for each group, which may not be desired. Specifically with pagination. And of course, you are making two searches. 2nd Approach would be to have this logic of distributing along different dimensions as your own custom component. Solr's PostFilter/delegating collector can be used for same. Basically TopDocCollector just maintains a PriorityQueue for matching documents. You can plugin your own collector, so that it sees all matching documents. Identifies which groups they belong to (if groups/pivots have been already identified) , maintains the priority queue for each of them and then finally merges them. Quite a bit of customization if you ask me, but can be done and it would be most powerful. PS : We use the 2nd approach. On 30 July 2014 05:56, babenis babe...@gmail.com wrote: despite the fact that I upgrade to 4.9.0 - grouping doesn't seem to work on multi valued field, ie i was going to try to group by tags + brand (where tags is a multi-valued field) and spread results apart or select unique combinations only -- View this message in context: http://lucene.472066.n3.nabble.com/Shuffle-results-a-little-tp1891206p4149973.html Sent from the Solr - User mailing list archive at Nabble.com. -- --- Thanks Regards Umesh Prasad
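For reference, a rough SolrJ sketch of the first (two-phase) approach described above. The field names (brand, tags) and the query are assumptions, and facet.pivot in this Solr version only works against a single, non-distributed index.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.PivotField;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class SpreadResultsExample {
      public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // phase 1: rows=0, just discover the (brand, tag) combinations present in the match set
        SolrQuery phase1 = new SolrQuery("shoes");
        phase1.setRows(0);
        phase1.setFacet(true);
        phase1.add("facet.pivot", "brand,tags");
        QueryResponse r1 = solr.query(phase1);

        // phase 2: one group.query per combination so every group contributes results
        SolrQuery phase2 = new SolrQuery("shoes");
        phase2.set("group", "true");
        phase2.set("group.limit", "3");
        for (PivotField brand : r1.getFacetPivot().getVal(0)) {
          for (PivotField tag : brand.getPivot()) {
            phase2.add("group.query",
                "brand:\"" + brand.getValue() + "\" AND tags:\"" + tag.getValue() + "\"");
          }
        }
        QueryResponse r2 = solr.query(phase2);
        System.out.println(r2.getGroupResponse().getValues().size() + " groups");
      }
    }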
Re: To warm the whole cache of Solr other than the only autowarmcount
@Eric : As you said, each use-case is different. We actually autowarm our caches to 80% and we have a 99% hit ratio on filter cache. For query cache, hit ratios are like 25% but given that cache hit saves us about 10X, we strive to increase cache hit ratio. @Yang : You can't do a direct copy of values. Values are related to lucene's internal document id and they can change during an index update. The change can happen because of document being deleted, segments being merged or new segments being created. Solr's caches refer to global doc id which are even more prone to change (because of index merges). On 28 July 2014 21:32, Erick Erickson erickerick...@gmail.com wrote: bq: autowarmcount=1024... That's the point, this is quite a high number in my experience. I've rarely seen numbers above 128 show much of any improvement. I've seen a large number of installations use much smaller autowarm numbers, as in the 16-32 range and be quite content. I _really_ recommend you try to use much smaller numbers then _measure_ whether the first few queries after a commit show unacceptable response times before trying to make things better. This really feels like premature optimization. Of course you know your problem space better than I do, it's just that I've spent too much of my professional life fixing the wrong problem; I've become something of a measure first curmudgeon. FWIW, Erick On Sun, Jul 27, 2014 at 10:48 PM, YouPeng Yang yypvsxf19870...@gmail.com wrote: Hi Erick We do the DIH job from the DB and committed frequently.It takes a long time to autowarm the filterCaches after commit or soft commit happened when setting the autowarmcount=1024,which I do think is small enough. So It comes up an idea that whether it could directly pass the reference of the caches over to the new caches so that the autowarm processing will take much fewer time . 2014-07-28 2:30 GMT+08:00 Erick Erickson erickerick...@gmail.com: Why do you think you _need_ to autowarm the entire cache? It is, after all, an LRU cache, the theory being that the most recent queries are most likely to be reused. Personally I'd run some tests on using small autowarm counts before getting at all mixed up in some complex scheme that may not be useful at all. Say an autowarm count of 16. Then measure using that, then say 32 then... Insure you have a real problem before worrying about a solution! ;) Best, Erick On Fri, Jul 25, 2014 at 6:45 AM, Shawn Heisey s...@elyograg.org wrote: On 7/24/2014 8:45 PM, YouPeng Yang wrote: To Matt Thank you,your opinion is very valuable ,So I have checked the source codes about how the cache warming up. It seems to just put items of the old caches into the new caches. I will pull Mark Miller into this discussion.He is the one of the developer of the Solr whom I had contacted with. To Mark Miller Would you please check out what we are discussing in the last two posts.I need your help. Matt is completely right. Any commit can drastically change the Lucene document id numbers. It would be too expensive to determine which numbers haven't changed. That means Solr must throw away all cache information on commit. Two of Solr's caches support autowarming. Those caches use queries as keys and results as values. Autowarming works by re-executing the top N queries (keys) in the old cache to obtain fresh Lucene document id numbers (values). The cache code does take *keys* from the old cache for the new cache, but not *values*. I'm very sure about this, as I wrote the current (and not terribly good) LFUCache. 
Thanks, Shawn -- --- Thanks Regards Umesh Prasad
Re: Implementing custom analyzer for multi-language stemming
Also, take a look at the Lucid revolution talk Typed Index https://www.youtube.com/watch?v=X93DaRfi790 *Published on 25 Nov 2013* Presented by Christoph Goller, Chief Scientist, IntraFind Software AG If you want to search in a multilingual environment with high-quality language-specific word-normalization, if you want to handle mixed-language documents, if you want to add phonetic search for names if you need a semantic search which distinguishes between a search for the color brown and a person with the second name brown, in all these cases you have to deal with different types of terms. I will show why it makes much more sense to attach types (prefixes) to Lucene terms instead of relying on different fields or even different indexes for different kinds of terms. Furthermore I will show how queries to such a typed index look and why e.g. SpanQueries are needed to correctly treat compound words and phrases or realize a reasonable phonetic search. The Analyzers and the QueryParser described are available as plugins for Lucene, Solr, and elasticsearch. On 31 July 2014 00:34, Sujit Pal sujit@comcast.net wrote: Hi Eugene, In a system we built couple of years ago, we had a corpus of English and French mixed (and Spanish on the way but that was implemented by client after we handed off). We had different fields for each language. So (title, body) for English docs was (title_en, body_en), for French (title_fr, body_fr) and for Spanish (title_es, body_es) - each of these were associated with a different Analyzer (that was associated with the field types in schema.xml, in case of Lucene you can use PerFieldAnalyzerWrapper). Our pipeline used Google translate to detect the language and write the contents into the appropriate field set for the language. Our analyzers were custom - but Lucene/Solr provides analyzer chains for many major languages. You can find a list here: https://wiki.apache.org/solr/LanguageAnalysis -sujit On Wed, Jul 30, 2014 at 10:52 AM, Chris Morley ch...@depahelix.com wrote: I know BasisTech.com has a plugin for elasticsearch that extends stemming/lemmatization to work across 40 natural languages. I'm not sure what they have for Solr, but I think something like that may exist as well. Cheers, -Chris. From: Eugene beyondcomp...@gmail.com Sent: Wednesday, July 30, 2014 1:48 PM To: solr-user@lucene.apache.org Subject: Implementing custom analyzer for multi-language stemming Hello, fellow Solr and Lucene users and developers! In our project we receive text from users in different languages. We detect language automatically and use Google Translate APIs a lot (so having arbitrary number of languages in our system doesn't concern us). However we need to be able to search using stemming. Having nearly hundred of fields (several fields for each language with language-specific stemmers) listed in our search query is not an option. So we need a way to have a single index which has stemmed tokens for different languages. I have two questions: 1. Are there already (third-party) custom multi-language stemming analyzers? (I doubt that no one else ran into this issue) 2. If I'm going to implement such analyzer myself, could you please suggest a better way to 'pass' detected language value into such analyzer? 
Detecting language in analyzer itself is not an option, because: a) we already detect it in other place b) we do it based on combined values of many fields ('name', 'topic', 'description', etc.), while current field can be to short for reliable detection c) sometimes we just want to specify language explicitly. The obvious hack would be to prepend ISO 639-1 code to field value. But I'd like to believe that Solr allows for cleaner solution. I could think about either: a) custom query parameter (but I guess, it will require modifying request handlers, etc. which is highly undesirable) b) getting value from other field (we obviously have 'language' field and we do not have mixed-language records). If it is possible, could you please describe the mechanism for doing this or point to relevant code examples? Thank you very much and have a good day! -- --- Thanks Regards Umesh Prasad
Re: Identify specific document insert error inside a solrj batch request
) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:370) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:960) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1021) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:957) -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Wednesday, July 30, 2014 5:53 PM To: solr-user@lucene.apache.org Subject: Re: Identify specific document insert error inside a solrj batch request Agreed that this is a problem with Solr. If it was merely bad input, Solr should be returning a 4xx error. I don't know if we already have a Jira for this. If not, one should be filed. There are two issues: 1. The status code should be 4xx with an appropriate message about bad input. 2. The offset of the offending document should be reported so that the app can locate the problem to resolve it. Give us the actual server stack trace so we can verify whether this was simply user error or some defect in Solr itself. -- Jack Krupansky -Original Message- From: Liram Vardi Sent: Wednesday, July 30, 2014 9:25 AM To: solr-user@lucene.apache.org Subject: Identify specific document insert error inside a solrj batch request Hi All, I have a question regarding the use of HttpSolrServer (SolrJ). I have a collection of SolrInputDocuments I want to send to Solr as a batch. Now, let's assume that one of the docs inside this collection is corrupted (missing some required field). When I send the batch of docs to solr using HttpSolrServer.add(Collection SolrInputDocument docs) I am getting the following general exception: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://172.23.3.91:8210/solr/template returned non ok status:500, message:Server Error When I check Solr log, I can identify exactly which is the corrupted document. My question: Is it possible to identify the problematic document at the client side? (for recovery purposes) Thanks, Liram Email secured by Check Point -- --- Thanks Regards Umesh Prasad
Re: Mixing ordinary and nested documents
    // build the child-doc -> parent-doc id mapping once per searcher (cachable till the next commit);
    // runs against a SolrIndexSearcher, assuming children are indexed in a block before their parent
    Query parentFilterQuery = new TermQuery(new Term("document_type", "parent"));
    int[] childToParentDocMapping = new int[searcher.maxDoc()];
    DocSet allParentDocSet = searcher.getDocSet(parentFilterQuery);
    DocIterator iter = allParentDocSet.iterator();
    int child = 0;
    while (iter.hasNext()) {
      int parent = iter.nextDoc();
      // every doc id up to and including the parent belongs to this parent's block
      while (child <= parent) {
        childToParentDocMapping[child] = parent;
        child++;
      }
    }

On 22 July 2014 16:28, Bjørn Axelsen bjorn.axel...@fagkommunikation.dk wrote: Thanks, Umesh You can get the parent bitset by running a the parent doc type query on the solr indexsearcher. Then child bitset by runnning the child doc type query. Then use these together to create a int[] where int[i] = parent of i. Can you kindly add an example? I am not quite sure how to put this into a query? I can easily make the join from child to parent, but what I want to achieve is to get the parent document added to the result if it exists but maintain the scoring fromt the child as well as the full child document. Is this possible? Cheers, Bjørn 2014-07-18 19:00 GMT+02:00 Umesh Prasad umesh.i...@gmail.com: Comments inline On 16 July 2014 20:31, Bjørn Axelsen bjorn.axel...@fagkommunikation.dk wrote: Hi Solr users I would appreciate your inputs on how to handle a *mix *of *simple *and *nested *documents in the most easy and flexible way. I need to handle: - simple documens: webpages, short articles etc. (approx. 90% of the content) - nested documents: books containing chapters etc. (approx 10% of the content) For simple documents I just want to present straightforward search results without any grouping etc. For the nested documents I want to group by book and show book title, book price etc. AND the individual results within the book. Lets say there is a hit on Chapters 1 and Chapter 7 within Book 1 and a hit on Article 1, I would like to present this: *Book 1 title* Book 1 published date Book 1 description - *Chapter 1 title* Chapter 1 snippet - *Chapter 7 title* CHapter 7 snippet *Article 1 title* Article 1 published date Article 1 description Article 1 snippet It looks like it is pretty straightforward to use the CollapsingQParser to collapse the book results into one result and not to collapse the other results. But how about showing the information about the book (the parent document of the chapters)? You can map the child document to parent doc id space and extract the information from parent doc id. First you need to generate child doc to parent doc id mapping one time. You can get the parent bitset by running a the parent doc type query on the solr indexsearcher. Then child bitset by runnning the child doc type query. Then use these together to create a int[] where int[i] = parent of i. This result is cachable till next commit. I am doing that for computing facets from fields in parent docs and sorting on values from parent docs (while getting child docs as output). 1) Is there a way to do an* optional block join* to a *parent *document and return it together *with *the *child *document - but not to require a parent document? - or - 2) Do I need to require parent-child documents for everything? This is really not my preferred strategy as only a small part of the documents is in a real parent-child relationship. This would mean a lot of dummy child documents. - or - 3) Should I just denormalize data and include the book information within each chapter document? - or - 4) ... or is there a smarter way? Your help is very much appreciated.
Cheers, Bjørn Axelsen -- --- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad
Re: Mixing ordinary and nested documents
public static DocSet mapChildDocsToParentOnly(DocSet childDocSet) { DocSet mappedParentDocSet = new BitDocSet(); DocIterator childIterator = childDocSet.iterator(); while (childIterator.hasNext()) { int childDoc = childIterator.nextDoc(); int parentDoc = childToParentDocMapping[childDoc]; mappedParentDocSet.addUnique(parentDoc); } int[] matches = new int[mappedParentDocSet.size()]; DocIterator parentIter = mappedParentDocSet.iterator(); for (int i = 0; parentIter.hasNext(); i++) { matches[i] = parentIter.nextDoc(); } return new SortedIntDocSet(matches); // you will need SortedIntDocSet impl else docset interaction in some facet queries fails later. } On 22 July 2014 19:59, Umesh Prasad umesh.i...@gmail.com wrote: Query parentFilterQuery = new TermQuery(new Term(document_type, parent)); int[] childToParentDocMapping = new int[searcher.maxDoc()]; DocSet allParentDocSet = searcher.getDocSet(parentFilterQuery); DocIterator iter = allParentDocSet.iterator(); int child = 0; while (iter.hasNext()) { int parent = iter.nextDoc(); while (child = parent) { childToParentDocMapping[child] = parent; child++; } } On 22 July 2014 16:28, Bjørn Axelsen bjorn.axel...@fagkommunikation.dk wrote: Thanks, Umesh You can get the parent bitset by running a the parent doc type query on the solr indexsearcher. Then child bitset by runnning the child doc type query. Then use these together to create a int[] where int[i] = parent of i. Can you kindly add an example? I am not quite sure how to put this into a query? I can easily make the join from child to parent, but what I want to achieve is to get the parent document added to the result if it exists but maintain the scoring fromt the child as well as the full child document. Is this possible? Cheers, Bjørn 2014-07-18 19:00 GMT+02:00 Umesh Prasad umesh.i...@gmail.com: Comments inline On 16 July 2014 20:31, Bjørn Axelsen bjorn.axel...@fagkommunikation.dk wrote: Hi Solr users I would appreciate your inputs on how to handle a *mix *of *simple *and *nested *documents in the most easy and flexible way. I need to handle: - simple documens: webpages, short articles etc. (approx. 90% of the content) - nested documents: books containing chapters etc. (approx 10% of the content) For simple documents I just want to present straightforward search results without any grouping etc. For the nested documents I want to group by book and show book title, book price etc. AND the individual results within the book. Lets say there is a hit on Chapters 1 and Chapter 7 within Book 1 and a hit on Article 1, I would like to present this: *Book 1 title* Book 1 published date Book 1 description - *Chapter 1 title* Chapter 1 snippet - *Chapter 7 title* CHapter 7 snippet *Article 1 title* Article 1 published date Article 1 description Article 1 snippet It looks like it is pretty straightforward to use the CollapsingQParser to collapse the book results into one result and not to collapse the other results. But how about showing the information about the book (the parent document of the chapters)? You can map the child document to parent doc id space and extract the information from parent doc id. First you need to generate child doc to parent doc id mapping one time. You can get the parent bitset by running a the parent doc type query on the solr indexsearcher. Then child bitset by runnning the child doc type query. Then use these together to create a int[] where int[i] = parent of i. This result is cachable till next commit. 
I am doing that for computing facets from fields in parent docs and sorting on values from parent docs (while getting child docs as output). 1) Is there a way to do an* optional block join* to a *parent *document and return it together *with *the *child *document - but not to require a parent document? - or - 2) Do I need to require parent-child documents for everything? This is really not my preferred strategy as only a small part of the documents is in a real parent-child relationship. This would mean a lot of dummy child documents. - or - 3) Should I just denormalize data and include the book information within each chapter document? - or - 4) ... or is there a smarter way? Your help is very much appreciated. Cheers, Bjørn Axelsen -- --- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad -- --- Thanks
Re: Match query string within indexed field?
Please ignore my earlier answer .. I had missed that you wanted a match spotting .. So that all the indexed terms must be present in the query ... One way, I can think of is SpanQueries .. But it won't be efficient and won't scale to multiple fields .. My suggestion would be to keep the mapping of keyword -- field name, count mapping in some key value store and use it at query time to find field name for query terms .. On 19 July 2014 02:34, prashantc88 prashant.chau...@searshc.com wrote: Hi, Thanks for the reply. Is there a better way to do it if the scenario is the following: Indexed values: abc def Query String:xy abc def z So basically the query string has to match all the words present in the indexed data to give a MATCH. -- View this message in context: http://lucene.472066.n3.nabble.com/Match-indexed-data-within-query-string-tp4147896p4147958.html Sent from the Solr - User mailing list archive at Nabble.com. -- --- Thanks Regards Umesh Prasad
Re: Match query string within indexed field?
*Span Queries for illustration:* During analysis, inject a startSentinel and endSentinel into your indexed field, so after analysis your field will look like: start abc def end. Now at query time you can expand your query clause programmatically and create queries which will look like (start xyz end) OR (start abc end) OR ... basically all unigrams, then (start xyz abc end) OR (start abc def end) OR ... bigrams, and so on. Then for each of your clauses you will need to generate a SpanQuery. The flexible query parser can help you here; you will need to plug a custom query builder in there. However, as you can see, ngram-based queries will result in a lot of clauses - n^2, exactly - for just one field, and if you are searching across multiple fields then it will go to m * n^2. On 20 July 2014 10:31, Umesh Prasad umesh.i...@gmail.com wrote: Please ignore my earlier answer .. I had missed that you wanted a match spotting .. So that all the indexed terms must be present in the query ... One way, I can think of is SpanQueries .. But it won't be efficient and won't scale to multiple fields .. My suggestion would be to keep the mapping of keyword -- field name, count mapping in some key value store and use it at query time to find field name for query terms .. On 19 July 2014 02:34, prashantc88 prashant.chau...@searshc.com wrote: Hi, Thanks for the reply. Is there a better way to do it if the scenario is the following: Indexed values: abc def Query String:xy abc def z So basically the query string has to match all the words present in the indexed data to give a MATCH. -- View this message in context: http://lucene.472066.n3.nabble.com/Match-indexed-data-within-query-string-tp4147896p4147958.html Sent from the Solr - User mailing list archive at Nabble.com. -- --- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad
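For illustration, a hedged Lucene sketch of one such expanded clause: the covering-match clause for the indexed value "abc def", assuming sentinel tokens named xstartx / xendx were injected during index-time analysis (the sentinel names and the field name are placeholders).

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    public class CoveringMatchExample {
      public static Query coveringMatch() {
        SpanQuery[] clauses = new SpanQuery[] {
            new SpanTermQuery(new Term("f", "xstartx")),   // start sentinel injected during analysis
            new SpanTermQuery(new Term("f", "abc")),
            new SpanTermQuery(new Term("f", "def")),
            new SpanTermQuery(new Term("f", "xendx"))      // end sentinel
        };
        // slop 0, in order: the span must run from the start sentinel to the end sentinel,
        // so this clause only matches documents whose entire field value is "abc def"
        return new SpanNearQuery(clauses, 0, true);
      }
    }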
Re: solr boosting any perticular URL
Or you can give a huge boost to the URL at query time. If you are using dismax then you can use bq, e.g. bq=myfield:url1^50. That will always bring up url1 as the first result. On 18 July 2014 15:27, benjelloun anass@gmail.com wrote: hello, before indexing the URL to a field in Solr, you can use the Java API (SolrJ) and do a test if(URL==) index on field1 else index on field2, then use edismax to boost a specific field:

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">10</int>
        <str name="defType">edismax</str>
        <str name="qf">field1^5.0 field2^1.0</str>
      </lst>
    </requestHandler>

-- View this message in context: http://lucene.472066.n3.nabble.com/solr-boosting-any-perticular-URL-tp4147657p4147864.html Sent from the Solr - User mailing list archive at Nabble.com. -- --- Thanks Regards Umesh Prasad
Re: solr boosting any perticular URL
PS : You can give huge boosts to url at query time on a per request basis. Don't specify the bqs on solrconfig.xml .. Always determine add bqs for the query at run time.. On 18 July 2014 15:49, Umesh Prasad umesh.i...@gmail.com wrote: Or you can give huge boosts to url at query time. If you are using dismax then you can use bq like bq = myfield:url1 ^ 50 .. That will bring up url1 as the first result always. On 18 July 2014 15:27, benjelloun anass@gmail.com wrote: hello, before index the URL to a field in Solr, you can use java api(Solrj) and do a test if(URL==) index on field1 else index on field2 then use edismax to boost a specific field: requestHandler name=/select class=solr.SearchHandler lst name=defaults str name=echoParamsexplicit/str int name=rows10/int str name=defTypeedismax/str str name=qf field1^5.0 field2^1.0 /str /requestHandler -- View this message in context: http://lucene.472066.n3.nabble.com/solr-boosting-any-perticular-URL-tp4147657p4147864.html Sent from the Solr - User mailing list archive at Nabble.com. -- --- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad
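A tiny SolrJ sketch of adding the boost query per request rather than hard-coding it in solrconfig.xml, as suggested above; the field names and URL value are placeholders.

    import org.apache.solr.client.solrj.SolrQuery;

    public class RuntimeBoostExample {
      public static SolrQuery build(String userQuery, String urlToPromote) {
        SolrQuery q = new SolrQuery(userQuery);
        q.set("defType", "edismax");
        q.set("qf", "title^2.0 content");
        // decided per request: push this URL's document to the top
        q.set("bq", "url:\"" + urlToPromote + "\"^50");
        return q;
      }
    }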
Re: Match query string within indexed field?
You are looking for wildcard queries but they can be quite costly and you will need to benchmark performance .. Specially Suffix wild card queries (of type *abc) are quite costly .. You can convert a suffix query into a prefix query by using a ReverseTokenFilter during index time analysis. A search on older mails will be useful .. http://search-lucene.com/?q=wild+card+performance Uwe's mail explains why performance optimization of Suffix wild card queries is difficult .. http://search-lucene.com/m/w1CAyxDpbC1/wild+card+performancesubj=Wild+Card+Query+Performance On 18 July 2014 20:38, prashantc88 prashant.chau...@searshc.com wrote: Hi, My requirement is to give a match whenever a string is found within the indexed data of a field irrespective of where it is found. For example, if I have a field which is indexed with the data abc. Now any of the following query string must give a match: xyzabc,xyabc, abcxyz .. I am using *solr.KeywordTokenizerFactory* as the tokenizer class and *solr.LowerCaseFilterFactory* filter as index time in *schema.xml*. Could anyone help me out as to how I can achieve the functionality. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Match-query-string-within-indexed-field-tp4147896.html Sent from the Solr - User mailing list archive at Nabble.com. -- --- Thanks Regards Umesh Prasad
Re: Mixing ordinary and nested documents
Comments inline On 16 July 2014 20:31, Bjørn Axelsen bjorn.axel...@fagkommunikation.dk wrote: Hi Solr users I would appreciate your inputs on how to handle a *mix *of *simple *and *nested *documents in the most easy and flexible way. I need to handle: - simple documens: webpages, short articles etc. (approx. 90% of the content) - nested documents: books containing chapters etc. (approx 10% of the content) For simple documents I just want to present straightforward search results without any grouping etc. For the nested documents I want to group by book and show book title, book price etc. AND the individual results within the book. Lets say there is a hit on Chapters 1 and Chapter 7 within Book 1 and a hit on Article 1, I would like to present this: *Book 1 title* Book 1 published date Book 1 description - *Chapter 1 title* Chapter 1 snippet - *Chapter 7 title* CHapter 7 snippet *Article 1 title* Article 1 published date Article 1 description Article 1 snippet It looks like it is pretty straightforward to use the CollapsingQParser to collapse the book results into one result and not to collapse the other results. But how about showing the information about the book (the parent document of the chapters)? You can map the child document to parent doc id space and extract the information from parent doc id. First you need to generate child doc to parent doc id mapping one time. You can get the parent bitset by running a the parent doc type query on the solr indexsearcher. Then child bitset by runnning the child doc type query. Then use these together to create a int[] where int[i] = parent of i. This result is cachable till next commit. I am doing that for computing facets from fields in parent docs and sorting on values from parent docs (while getting child docs as output). 1) Is there a way to do an* optional block join* to a *parent *document and return it together *with *the *child *document - but not to require a parent document? - or - 2) Do I need to require parent-child documents for everything? This is really not my preferred strategy as only a small part of the documents is in a real parent-child relationship. This would mean a lot of dummy child documents. - or - 3) Should I just denormalize data and include the book information within each chapter document? - or - 4) ... or is there a smarter way? Your help is very much appreciated. Cheers, Bjørn Axelsen -- --- Thanks Regards Umesh Prasad
Re: How do I get faceting to work with Solr JOINs
Hi Vinay, You can customize the FacetsComponent. Basically FacetComponent uses SimpleFacets to compute the facet count. It passes matched docset present in responsebuilder to SimpleFacets's constructor. 1. Build a mapping between parent space and auxiliary document space in (say an int array) and cache it in your own custom cache in SolrIndexSearcher. You will need to rebuild this mapping on every commit have to define a CacheRegenerator for that. 2. You can map the matched docset (which is in parent space) to auxiliary document space. The catch is that facets from non matching auxililary docs also would be counted. 3. You can then pass on this mapped auxiliary document to SimpleFacets for faceting. I have doing something similar for our needs .. Basically, we have a parent document with text attributes and changes very less. And we have child documents with inventory attributes which changes extremely fast. The search results requires child documents but faceting has to be done on text attributes which belong to parents. So we do this mapping by customizing the FacetComponent. On 18 July 2014 04:11, Vinay B, vybe3...@gmail.com wrote: Some Background info : In our application, we have a requirement to update large number of records often. I investigated solr child documents but it requires updating both the child and the parent document . Therefore, I'm investigating adding frequently updated information in an auxillary document with a custom defined parent-id field that can be used to join with the static parent document. - basically rolling my own child document functionality. This approach has satisfied all my requirements, except one. How can I facet upon a field present in the auxillary document? First, here's a gist dump of my test core index (4 docs + 4 aux docs) https://gist.github.com/anonymous/2774b54e667778c71492 Next, here's a simple facet query only on the aux . While this works, it only returns auxillary documents https://gist.github.com/anonymous/a58b87576b895e467c68 Finally, I tweak the query using a SOLR join ( https://wiki.apache.org/solr/Join ) to return the main documents (which it does), but the faceting returns no results. This is what I'm hoping someone on this list can answer . Here is the gist of that query https://gist.github.com/anonymous/f3a287ab726f35b142cf Any answers, suggestions ? Thanks -- --- Thanks Regards Umesh Prasad
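For the setup in the last paragraph (child documents are the search results, facets come from parent fields), the mapping is many-to-one and fits an int[]. A hedged sketch against Solr 4.x classes, where childToParent is a hypothetical cached array built along the lines of the previous thread's sketch:

    import java.util.Arrays;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.request.SimpleFacets;
    import org.apache.solr.search.DocIterator;
    import org.apache.solr.search.DocSet;
    import org.apache.solr.search.SortedIntDocSet;

    // Called from a customised FacetComponent.process(rb): facet over the parents of the matched child docs.
    void facetOnParents(ResponseBuilder rb, int[] childToParent) throws java.io.IOException {
        DocSet matchedChildren = rb.getResults().docSet;

        int[] parents = new int[matchedChildren.size()];
        int n = 0;
        for (DocIterator it = matchedChildren.iterator(); it.hasNext(); ) {
            int p = childToParent[it.nextDoc()];
            if (p >= 0) parents[n++] = p;                 // -1 means "no parent" and is skipped
        }
        Arrays.sort(parents, 0, n);
        int uniq = 0;                                     // SortedIntDocSet needs sorted, distinct docids
        for (int i = 0; i < n; i++) {
            if (uniq == 0 || parents[i] != parents[uniq - 1]) parents[uniq++] = parents[i];
        }

        DocSet parentSet = new SortedIntDocSet(Arrays.copyOf(parents, uniq));
        SimpleFacets facets = new SimpleFacets(rb.req, parentSet, rb.req.getParams(), rb);
        rb.rsp.add("parent_facet_counts", facets.getFacetCounts());
    }

For Vinay's direction (matched parents, facets on auxiliary documents) the same idea applies, with the caveat from point 2 that non-matching auxiliary siblings may also get counted.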
Re: Memory leak for debugQuery?
Histogram by itself isn't sufficient to root cause the JVM heap issue. We have found JVM heap memory issues multiple times in our system and each time it was due to a different reasons. I would recommend taking heap dumps at regular interval (using jmap/visual vm) and analyze those heap dumps. That will give a definite answer to memory issues. I have regularly analyzed heap dump of size 32 GB with eclipse memory analyzer. The linux version comes with a command line script ParseHeapDump.sh inside mat directory. # Usage: ParseHeapDump.sh path/to/dump.hprof [report]* # # The leak report has the id org.eclipse.mat.api:suspects # The top component report has the id org.eclipse.mat.api:top_components Increase the memory by setting Xmx and Xms param in MemoryAnalyzer.ini (in same directory). The leak suspect report is quite good. For checking detailed allocation pattern etc , you can copy the index files generated from parsing and open it in GUI. On 17 July 2014 05:36, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: Also, is this trunk? Solr 4.x? Single shard, right? On Wed, Jul 16, 2014 at 2:24 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Tom - You could maybe isolate it a little further by seeing using the “debug parameter with values of timing|query|results Erik On May 15, 2014, at 5:50 PM, Tom Burton-West tburt...@umich.edu wrote: Hello all, I'm trying to get relevance scoring information for each of 1,000 docs returned for each of 250 queries.If I run the query (appended below) without debugQuery=on, I have no problem with getting all the results with under 4GB of memory use. If I add the parameter debugQuery=on, memory use goes up continuously and after about 20 queries (with 1,000 results each), memory use reaches about 29.1 GB and the garbage collector gives up: org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded I've attached a jmap -histo, exgerpt below. Is this a known issue with debugQuery? Tom query: q=Abraham+Lincolnfl=id,scoreindent=onwt=jsonstart=0rows=1000version=2.2debugQuery=on without debugQuery=on: q=Abraham+Lincolnfl=id,scoreindent=onwt=jsonstart=0rows=1000version=2.2 num #instances#bytes Class description -- 1: 585,559 10,292,067,456 byte[] 2: 743,639 18,874,349,592 char[] 3: 53,821 91,936,328 long[] 4: 70,430 69,234,400 int[] 5: 51,348 27,111,744 org.apache.lucene.util.fst.FST$Arc[] 6: 286,357 20,617,704 org.apache.lucene.util.fst.FST$Arc 7: 715,364 17,168,736 java.lang.String 8: 79,561 12,547,792 * ConstMethodKlass 9: 18,909 11,404,696 short[] 10: 345,854 11,067,328 java.util.HashMap$Entry 11: 8,823 10,351,024 * ConstantPoolKlass 12: 79,561 10,193,328 * MethodKlass 13: 228,587 9,143,480 org.apache.lucene.document.FieldType 14: 228,584 9,143,360 org.apache.lucene.document.Field 15: 368,423 8,842,152 org.apache.lucene.util.BytesRef 16: 210,342 8,413,680 java.util.TreeMap$Entry 17: 81,576 8,204,648 java.util.HashMap$Entry[] 18: 107,921 7,770,312 org.apache.lucene.util.fst.FST$Arc 19: 13,020 6,874,560 org.apache.lucene.util.fst.FST$Arc[] debugQuery_jmap.txt -- --- Thanks Regards Umesh Prasad
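If running jmap from outside the process is inconvenient, dumps at a regular interval can also be triggered from inside the JVM. A hedged sketch using the HotSpot diagnostic MXBean (dump path and interval are arbitrary); the resulting .hprof files are then fed to ParseHeapDump.sh as above:

    import java.lang.management.ManagementFactory;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import com.sun.management.HotSpotDiagnosticMXBean;

    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(new Runnable() {
        public void run() {
            try {
                HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                        ManagementFactory.getPlatformMBeanServer(),
                        "com.sun.management:type=HotSpotDiagnostic",
                        HotSpotDiagnosticMXBean.class);
                String file = "/tmp/solr-heap-" + System.currentTimeMillis() + ".hprof";  // arbitrary path
                bean.dumpHeap(file, true);     // true = dump only live objects, keeps the file smaller
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }, 0, 30, TimeUnit.MINUTES);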
Re: SOLR Performance benchmarking
Hi Rashi, Also, checkout http://searchhub.org/2010/01/21/the-seven-deadly-sins-of-solr/ .. It would help if you can share your solrconfig.xml and schema.xml .. Some problems are evident from there itself. From our experience we have found 1. JVM Heap size (check for young gen size and new/old ratio. Default is very low for Prod setups) 2. Solr cache tuning as Siegfried pointed out. There are 4 cache queryCache, filterCache , documentCache and FieldValueCache. Make sure that you have the caches populated to by defining a newSearcher and autowarmCount is properly configured. 3. About long running queries, solr core logs are your friend, analyze the QTime percentiles. The list of reasons here is big. Two that we have found killers for performance are a) . A query time analyzer chain of synonym filter -- stemmer -- synonym filter had resulted in like 50 * 50 = 2500 terms for a single term for us b) ngroups and groups.truncate are quite costly, specially if you have large cardinality for field. And these aren't cached. c) Faceting/filtering on timestamp fields (with arbitrary accuracy) d) Deep paging On 13 July 2014 14:48, Siegfried Goeschl sgoes...@gmx.at wrote: Hi Rashi, abnormal behaviour depends on your data, system and work load - I have seen abnormal behaviour at customers sites and it turned out to be a miracle that they the customer had no serious problems before :-) * running out of sockets - you might need to check if you have enough sockets (system quota) and that the sockets are closed properly (mostly a Windows/networking issue - CLOSED_WAIT) * understand your test setup - usually a test box is much smaller in terms of CPU/memory than you production box ** you might be forced to tweak your test configuration (e.g. production SOLR cache configuration can overwhelm a small server) * understand your work-load ** if you have long-running queries within your performance tests they tend to bring down your server under high-load and your “abnormal” condition looks very normal at hindsight ** spot your long-running queries, optimise them, re-run your tests ** check your cache warming and how fast you start your load injector threads Cheers, Siegfried Goeschl On 13 Jul 2014, at 09:53, rashi gandhi gandhirash...@gmail.com wrote: Hi, I am using SolrMeter for load/stress testing solr performance. Tomcat is configured with default maxThreads (i.e. 200). I set Intended Request per min in SolrMeter to 1500 and performed testing. I found that sometimes it works with this much load on solr but sometimes it gives error Sever Refused Connection in solr. On getting this error, i increased maxThreads to some higher value, and then it works again. I would like to know why solr is behaving abnormally, initially when it was working with maxThreads=200. Please provide me some pointers. -- --- Thanks Regards Umesh Prasad
Re: Group only top 50 results not All results.
Another way is to extend the existing Facets component. FacetsComponent uses SimpleFacets to compute facets where it passes the matching docset (rb.getResults.docSet) as an argument in constructor. Instead you can pass it the ranked docList by passing (rb.getResults.docList). Basically 3 steps 1. Develop your custom facet component. For reference you can look at source cod of FacetsComponent. https://github.com/apache/lucene-solr/blob/d49f297a4c7ab2c518717fa5a6ceeeda222349c3/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java (line 79 - 82) 2. Register the Extended FacetComponent as custom component in solrconfig.xml It will look something like searchComponent name=myfacet class=com.flipkart.solr.handler.component.MyFacetComponent / 3. Call that as part of your custom request handler pipeline. arr name=last-components strmyfacet/str You can check http://sujitpal.blogspot.in/2011/04/custom-solr-search-components-2-dev.html for a sample. On 13 July 2014 00:02, Joel Bernstein joels...@gmail.com wrote: I agree with Alex a PostFilter would work. But it would be a somewhat tricky PostFilter to write. You would need to collect the top 50 documents using a priority queue in the DelegatingCollector.collect() method. Then in the DelegatingCollector.finish() method you would send the top documents to the lower collectors. Grouping supports PostFilters so this should work with Grouping or you could use the CollapsingQParserPlugin. Joel Bernstein Search Engineer at Heliosearch On Sat, Jul 12, 2014 at 1:31 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: I don't think either grouping or faceting work as postfilter. Otherwise, that would be one way. Have a custom post-filter that only allows top 50 documents and have grouping run as an even-higher-cost postfilter after that. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Sat, Jul 12, 2014 at 11:49 PM, Erick Erickson erickerick...@gmail.com wrote: You could also return the top 50 groups. That will certainly contain the top 50 responses. The app layer could then do some local sorting to figure out what was correct. Maybe you'd be returning 3 docs in each or something... I'd probably only go there if Michael's approach didn't work out though. On Fri, Jul 11, 2014 at 10:52 AM, Michael Ryan mr...@moreover.com wrote: I suggest doing this in two queries. In the first query, retrieve the unique ids of the top 50 documents. In the second query, just query for those ids (e.g., q=ids:(2 13 55 62 81)), and add the facet parameters on that query. -Michael -Original Message- From: Aaron Gibbons [mailto:agibb...@synergydatasystems.com] Sent: Friday, July 11, 2014 1:46 PM To: solr-user@lucene.apache.org Subject: Group only top 50 results not All results. I'm trying to figure out how I can query solr for the top X results THEN group and count only those top 50 by their owner. I can run a query to get the top 50 results that I want. solr/select?q=(current_position_title%3a(TEST))rows=50 I've tried Faceting but I get all results faceted not just the top 50: solr/select?q=(current_position_title%3a(TEST))start=0rows=50facet=truefacet.field=recruiterkeyidfacet.limit=-1facet.mincount=1facet.sort=true I've tried Grouping and get all results again grouped not just the top 50. 
solr/select?q=(current_position_title%3a(TEST))rows=50group=truegroup.field=recruiterkeyidgroup.limit=1group.format=groupedversion=2.2 I could also run one search to get the top X record Id's then run a second Grouped query on those but I was hoping there was a less expensive way run the search. So what I need to get back are the distinct recruiterkeyid's from the top X query and the count of how many there are only in the top X results. I'll ultimately want to query the results for each of individual recruiterkeyid as well. I'm using SolrNet to build the query. Thank you for your help, Aaron -- --- Thanks Regards Umesh Prasad
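A hedged Solr 4.x sketch of step 1 from Umesh's reply above: a component extending FacetComponent that hands SimpleFacets the ranked page (rb.getResults().docList) instead of the full match set, so the counts cover only the top rows=N documents. The class name and the output-section name are made up:

    import java.io.IOException;
    import java.util.Arrays;
    import org.apache.solr.handler.component.FacetComponent;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.request.SimpleFacets;
    import org.apache.solr.search.DocIterator;
    import org.apache.solr.search.DocList;
    import org.apache.solr.search.DocSet;
    import org.apache.solr.search.SortedIntDocSet;

    public class TopDocsFacetComponent extends FacetComponent {

        @Override
        public void process(ResponseBuilder rb) throws IOException {
            if (!rb.doFacets || rb.getResults() == null) return;

            DocList top = rb.getResults().docList;          // only the ranked page (e.g. rows=50)
            int[] ids = new int[top.size()];
            int n = 0;
            for (DocIterator it = top.iterator(); it.hasNext(); ) {
                ids[n++] = it.nextDoc();
            }
            Arrays.sort(ids);                               // SortedIntDocSet expects ascending docids

            DocSet topSet = new SortedIntDocSet(Arrays.copyOf(ids, n));
            SimpleFacets facets = new SimpleFacets(rb.req, topSet, rb.req.getParams(), rb);
            rb.rsp.add("top_facet_counts", facets.getFacetCounts());
        }
    }

Register it under a name such as myfacet and wire it in via last-components exactly as in steps 2 and 3; facet.field and the other facet parameters are read from the normal request.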
Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication
Must Mention here. This Atomic Update will only work if you all your fields are stored. It eases out work on your part, but the stored fields will bloat the index. On 12 July 2014 22:06, Erick Erickson erickerick...@gmail.com wrote: bq: But does performance remain same in this situation No. Some documents will require two calls to be indexed. And you'll be sending one document at a time rather than batching them up. Of course it'll be slower. But will it still be fast enough? Only you can answer that. If it's _really_ a problem, you could consider using a custom update processor plugin that does all this on the server side. This would not require you to change Solr code, just write a relatively small bit of code and use the plugin infrastructure. Best, Erick On Thu, Jul 10, 2014 at 1:56 PM, Ali Nazemian alinazem...@gmail.com wrote: Thank you very much. Now I understand what was the idea. It is better than changing Solr. But does performance remain same in this situation? On Tue, Jul 8, 2014 at 10:43 PM, Chris Hostetter hossman_luc...@fucit.org wrote: I think you are missunderstanding what Himanshu is suggesting to you. You don't need to make lots of big changes ot the internals of solr's code to get what you want -- instead you can leverage the Atomic Updates Optimistic Concurrency features of Solr to get the existing internal Solr to reject any attempts to add a duplicate documentunless the client code sending the document specifies it should be an update. This means your client code needs to be a bit more sophisticated, but the benefit is that you don't have to try to make complex changes to the internals of Solr that may be impossible and/or difficult to support/upgrade later. More details... https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency Simplest possible idea based on the basic info you have given so far... 1) send every doc using _version_=-1 2a) if doc update fails with error 409, that means a version of this doc already exists 2b) resend just the field changes (using set atomic operation) and specify _version_=1 : Dear Himanshu, : Hi, : You misunderstood what I meant. I am not going to update some field. I am : going to change what Solr do on duplication of uniquekey field. I dont want : to solr overwrite Whole document I just want to overwrite some parts of : document. This situation does not come from user side this is what solr do : to documents with duplicated uniquekey. : Regards. : : : On Tue, Jul 8, 2014 at 12:29 PM, Himanshu Mehrotra : himanshu.mehro...@snapdeal.com wrote: : : Please look at https://wiki.apache.org/solr/Atomic_Updates : : This does what you want just update relevant fields. : : Thanks, : Himanshu : : : On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian alinazem...@gmail.com : wrote: : : Dears, : Hi, : According to my requirement I need to change the default behavior of Solr : for overwriting the whole document on unique-key duplication. I am going : to : change that the overwrite just part of document (some fields) and other : parts of document (other fields) remain unchanged. First of all I need to : know such changing in Solr behavior is possible? Second, I really : appreciate if you can guide me through what class/classes should I : consider : for changing that? : Best regards. : : -- : A.Nazemian : : : : : : -- : A.Nazemian : -Hoss http://www.lucidworks.com/ -- A.Nazemian -- --- Thanks Regards Umesh Prasad
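For reference, a hedged SolrJ 4.x sketch of the flow Hoss outlines (send with _version_=-1; on a 409 conflict resend only the changed fields as an atomic set with _version_=1). The core URL and field names are hypothetical, and as noted above the atomic path requires all fields to be stored:

    import java.util.Collections;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrException;
    import org.apache.solr.common.SolrInputDocument;

    void upsert(HttpSolrServer solr) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "DOC-42");
        doc.addField("title", "first version");
        doc.addField("_version_", -1L);            // -1: the add must NOT find an existing doc with this id
        try {
            solr.add(doc);
        } catch (SolrException e) {
            if (e.code() != 409) throw e;          // 409 = version conflict, i.e. the doc already exists
            SolrInputDocument partial = new SolrInputDocument();
            partial.addField("id", "DOC-42");
            partial.addField("title", Collections.singletonMap("set", "updated title"));  // atomic set
            partial.addField("_version_", 1L);     // 1: the doc MUST already exist
            solr.add(partial);
        }
    }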
Re: SOLR-6143 Bad facet counts from CollapsingQParserPlugin
Hi Joel, Actually I also have seen this. The counts given by groups.truncate and collapsingQParserPlugin differ.. We have a golden query framework for our product APIs and there we have seen differences in facet count given. One request uses groups.truncate and another collapsingQParser plugin and we have seen counts differ (By a small margin) I haven't been able to isolate the issue to a unit test level, so I haven't raised a bug. On 12 July 2014 08:57, Joel Bernstein joels...@gmail.com wrote: The CollapsingQParserPlugin currently supports facet counts that match group.truncate. This works great for some use cases. There are use cases though where group.facets counts are preferred. No timetable yet on adding this feature for the CollapsingQParserPlugin. Joel Bernstein Search Engineer at Heliosearch On Thu, Jul 10, 2014 at 7:20 PM, shamik sham...@gmail.com wrote: Are there any plans to release this feature anytime soon ? I think this is pretty important as a lot of search use case are dependent on the facet count being returned by the search result. This issue renders renders the CollapsingQParserPlugin pretty much unusable. I'm now reverting back to the old group query (painfully slow) since I can't use the facet count anymore. -- View this message in context: http://lucene.472066.n3.nabble.com/RE-SOLR-6143-Bad-facet-counts-from-CollapsingQParserPlugin-tp4140455p4146645.html Sent from the Solr - User mailing list archive at Nabble.com. -- --- Thanks Regards Umesh Prasad
Re: CollapsingQParserPlugin throws Exception when useFilterForSortedQuery=true
Created the jira .. https://issues.apache.org/jira/browse/SOLR-6222 On 30 June 2014 23:53, Joel Bernstein joels...@gmail.com wrote: Sure, go ahead create the ticket. I think there is more we can here as well. I suspect we can get the CollapsingQParserPlugin to work with useFilterForSortedQuery=true if scoring is not needed for the collapse. I'll take a closer look at this. Joel Bernstein Search Engineer at Heliosearch On Mon, Jun 30, 2014 at 1:43 AM, Umesh Prasad umesh.i...@gmail.com wrote: Hi Joel, Thanks a lot for clarification .. An error message would indeed be a good thing .. Should I open a jira item for same ? On 28 June 2014 19:08, Joel Bernstein joels...@gmail.com wrote: OK, I see the problem. When you use useFilterForSortedQuery true /useFilterForSortedQuery Solr builds a docSet in a way that seems to be incompatible with the CollapsingQParserPlugin. With useFilterForSortedQuery true /useFilterForSortedQuery, Solr doesn't run the main query again when collecting the DocSet. The getDocSetScore() method is expecting the main query to present, because the CollapsingQParserPlugin may need the scores generated from the main query, to select the group head. I think trying to make useFilterForSortedQuery true /useFilterForSortedQuery compatible with CollapsingQParsePlugin is probably not possible. So, a nice error message would be a good thing. Joel Bernstein Search Engineer at Heliosearch On Tue, Jun 24, 2014 at 3:31 AM, Umesh Prasad umesh.i...@gmail.com wrote: Hi , Found another bug with CollapsignQParserPlugin. Not a critical one. It throws an exception when used with useFilterForSortedQuery true /useFilterForSortedQuery Patch attached (against 4.8.1 but reproducible in other branches also) 518 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null params={q=*%3A*fq=%7B%21collapse+field%3Dgroup_s%7DdefType=edismaxbf=field%28test_ti%29} hits=2 status=0 QTime=99 4557 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null params={q=*%3A*fq=%7B%21collapse+field%3Dgroup_s+nullPolicy%3Dexpand+min%3Dtest_tf%7DdefType=edismaxbf=field%28test_ti%29sort=} hits=4 status=0 QTime=15 4587 T11 C0 oasc.SolrException.log ERROR java.lang.UnsupportedOperationException: Query does not implement createWeight at org.apache.lucene.search.Query.createWeight(Query.java:80) at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297) at org.apache.solr.search.SolrIndexSearcher.getDocSetScore(SolrIndexSearcher.java:879) at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:902) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1381) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:478) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:461) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at org.apache.solr.util.TestHarness.query(TestHarness.java:295) at org.apache.solr.util.TestHarness.query(TestHarness.java:278) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:676) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:669) at org.apache.solr.search.TestCollapseQParserPlugin.testCollapseQueries(TestCollapseQParserPlugin.java:106) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53
Re: how to log ngroups
Hi Aman, You can implement and register a last-component which extracts the ngroups from response and adds it to response. You can checkout tutorial about SearchComponent here http://sujitpal.blogspot.in/2011/04/custom-solr-search-components-2-dev.html .. On 29 June 2014 20:31, Aman Tandon amantandon...@gmail.com wrote: Any help here? With Regards Aman Tandon On Thu, Jun 26, 2014 at 7:32 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, I am grouping in my results and also applying the group limit. Is there is any way to log the ngroups as well along with hits. -- --- Thanks Regards Umesh Prasad
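A hedged sketch of such a last-component for Solr 4.x on a single shard (class name and log format are made up): it pulls ngroups out of the grouped section that QueryComponent has already added to the response, so it only has something to log when group.ngroups=true is set on the request:

    import java.io.IOException;
    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class LogNGroupsComponent extends SearchComponent {
        private static final Logger log = LoggerFactory.getLogger(LogNGroupsComponent.class);

        @Override
        public void prepare(ResponseBuilder rb) throws IOException { }

        @Override
        public void process(ResponseBuilder rb) throws IOException {
            NamedList<?> values = rb.rsp.getValues();
            NamedList<?> grouped = (NamedList<?>) values.get("grouped");
            if (grouped == null) return;                       // not a grouped request
            for (int i = 0; i < grouped.size(); i++) {
                NamedList<?> perField = (NamedList<?>) grouped.getVal(i);
                Object ngroups = perField.get("ngroups");      // present only when group.ngroups=true
                log.info("grouped field={} matches={} ngroups={}",
                         grouped.getName(i), perField.get("matches"), ngroups);
            }
        }

        @Override
        public String getDescription() { return "Logs ngroups for grouped queries"; }

        @Override
        public String getSource() { return null; }
    }

Register it in solrconfig.xml and list it under last-components on your handler so it runs after the query component has produced the grouped response.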
Re: CollapsingQParserPlugin throws Exception when useFilterForSortedQuery=true
Hi Joel, Thanks a lot for clarification .. An error message would indeed be a good thing .. Should I open a jira item for same ? On 28 June 2014 19:08, Joel Bernstein joels...@gmail.com wrote: OK, I see the problem. When you use useFilterForSortedQuery true /useFilterForSortedQuery Solr builds a docSet in a way that seems to be incompatible with the CollapsingQParserPlugin. With useFilterForSortedQuery true /useFilterForSortedQuery, Solr doesn't run the main query again when collecting the DocSet. The getDocSetScore() method is expecting the main query to present, because the CollapsingQParserPlugin may need the scores generated from the main query, to select the group head. I think trying to make useFilterForSortedQuery true /useFilterForSortedQuery compatible with CollapsingQParsePlugin is probably not possible. So, a nice error message would be a good thing. Joel Bernstein Search Engineer at Heliosearch On Tue, Jun 24, 2014 at 3:31 AM, Umesh Prasad umesh.i...@gmail.com wrote: Hi , Found another bug with CollapsignQParserPlugin. Not a critical one. It throws an exception when used with useFilterForSortedQuery true /useFilterForSortedQuery Patch attached (against 4.8.1 but reproducible in other branches also) 518 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null params={q=*%3A*fq=%7B%21collapse+field%3Dgroup_s%7DdefType=edismaxbf=field%28test_ti%29} hits=2 status=0 QTime=99 4557 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null params={q=*%3A*fq=%7B%21collapse+field%3Dgroup_s+nullPolicy%3Dexpand+min%3Dtest_tf%7DdefType=edismaxbf=field%28test_ti%29sort=} hits=4 status=0 QTime=15 4587 T11 C0 oasc.SolrException.log ERROR java.lang.UnsupportedOperationException: Query does not implement createWeight at org.apache.lucene.search.Query.createWeight(Query.java:80) at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297) at org.apache.solr.search.SolrIndexSearcher.getDocSetScore(SolrIndexSearcher.java:879) at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:902) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1381) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:478) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:461) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at org.apache.solr.util.TestHarness.query(TestHarness.java:295) at org.apache.solr.util.TestHarness.query(TestHarness.java:278) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:676) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:669) at org.apache.solr.search.TestCollapseQParserPlugin.testCollapseQueries(TestCollapseQParserPlugin.java:106) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48
Re: Bug in Collapsing QParserPlugin : Sort by 3 or more fields is broken
Hi Joel, Had missed this email .. Some issue with my gmail setting. The reason CollapsignQParserPlugin is more performant than regular grouping is because 1. QParser refers to global ords for group.field and avoids storing strings in a set. This has two advantage. a) Terms of memory (storing millions of ints vs strings, results in major savings). b) No binary search / look up is necessary when segment changes. Resulting in huge computation savings. 2. The cost CollapsingFieldValue has to maintain score/field value for each unique ord. Memory requirement = number of ords * size of 1 field value. The basic types byte, int, float , long etc will consume reasonable memory. String/Text value can be stored as ords and will consume only 4 bytes. The memory requirement is because arrays are dense and it is per request. Taking an example : Index Size = 100 million documents Unique ords = 10 million Sort field = 4 ( 1 int field + 1 long field + 2 string/text field) Memory requirement = 40 MB for int field + 80 MB for long field + 80 MB for string ords = 200 MB I agree 200 MB per request just for collapsing the search results is huge but at least it increases linearly with number of sort fields.. For my use case, I am willing to pay the linear cost specially when I can't combine the sort fields intelligently into a sort function. Plus it allows me to sort by String/Text fields also which is a big win. PS : 1. We can store long/string fields also as byte/short ords ..For sort fields, where number of unique values are smaller ( example sort by date , sales rank etc), this will result into significant memory savings. On 19 June 2014 19:40, Joel Bernstein joels...@gmail.com wrote: Umesh, this is a good summary. So, the question is what is the cost (performance and memory) of having the CollapsingQParserPlugin choose the group head by using the Solr sort criteria? Keep in mind that the CollapsingQParserPlugin's main design goal is to provide fast performance when collapsing on a high cardinality field. How you choose the group head can have a big impact here, both on memory consumption performance. The function query collapse criteria was added to allow you to come up with custom formulas for selecting the group head, with little or no impact on performance and memory. Using Solr's recip() function query it seems like you could come up with some nice scenarios where two variables could be used to select the group head. For example: fq={!collapse field=a max='sub(prod(cscore(),1000), recip(field(x),1, 1000, 1000))'} This seems like it would basically give you two sort critea: cscore(), which returns the score, would be the primary criteria. The recip of field x would be the secondary criteria. Joel Bernstein Search Engineer at Heliosearch On Thu, Jun 19, 2014 at 2:18 AM, Umesh Prasad umesh.i...@gmail.com wrote: Continuing the discussion on mailing list from Jira. An Example *id group f1 f2*1 g1 5 10 2 g1 5 1000 3 g1 5 1000 4 g1 10 100 5 g2 5 10 6 g2 5 1000 7 g2 5 1000 8 g210 100 sort= f1 asc, f2 desc , id desc *Without collapse will give : * (7,g2), (6,g2), (3,g1), (2,g1), (5,g2), (1,g1), (8,g2), (4,g1) *On collapsing by group_s expected output is : * (7,g2), (3,g1) solr standard collapsing does give this output with group=on,group.field=group_s,group.main=true * Collapsing with CollapsingQParserPlugin* fq={!collapse field=group_s} : (5,g2), (1,g1) * Summarizing Jira Discussion :* 1. CollapsingQParserPlugin picks up the group heads from matching results and passes those further. 
So in essence filtering some of the matching documents, so that subsequent collectors never see them. It can also pass on score to subsequent collectors using a dummy scorer. 2. TopDocCollector comes later in hierarchy and it will sort on the collapsed set. That works fine. The issue is with step 1. Collapsing is done by a single comparator which can take its value from a field or function. It defaults to score. Function queries do allow us to combine multiple fields / value sources, however it would be difficult to construct a function for given sort fields. Primarily because a) The range of values for a given sort field is not known in advance. It is possible for one sort field to unbounded, but other to be bounded within a small range. b) The sort field can itself hold custom logic. Because of (a) the group head selected by CollapsingQParserPlugin will be incorrect and subsequent sorting will break. On 14 June
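For reference, a hedged SolrJ fragment sending Joel's two-criteria collapse (collection URL and field names are hypothetical); the score is scaled up so it dominates, and the recip of field x acts as the tie-breaker:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrQuery q = new SolrQuery("car and toys");
    q.addFilterQuery("{!collapse field=group_s "
            + "max='sub(prod(cscore(),1000),recip(field(x),1,1000,1000))'}");
    q.setRows(10);
    // solr.query(q) returns one head document per group_s value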
CollapsingQParserPlugin throws Exception when useFilterForSortedQuery=true
(ThreadLeakControl.java:360) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360) at java.lang.Thread.run(Thread.java:745) --- Thanks Regards Umesh Prasad
Re: Bug in Collapsing QParserPlugin : Sort by 3 or more fields is broken
Continuing the discussion on the mailing list from Jira. An example:

    id  group  f1  f2
    1   g1      5    10
    2   g1      5  1000
    3   g1      5  1000
    4   g1     10   100
    5   g2      5    10
    6   g2      5  1000
    7   g2      5  1000
    8   g2     10   100

sort = f1 asc, f2 desc, id desc

*Without collapse this gives:* (7,g2), (6,g2), (3,g1), (2,g1), (5,g2), (1,g1), (8,g2), (4,g1)

*On collapsing by group_s the expected output is:* (7,g2), (3,g1). Solr's standard collapsing does give this output with group=on, group.field=group_s, group.main=true.

*Collapsing with CollapsingQParserPlugin*, fq={!collapse field=group_s}, gives: (5,g2), (1,g1)

*Summarizing the Jira discussion:*
1. CollapsingQParserPlugin picks the group heads from the matching results and passes only those further, in essence filtering out some of the matching documents so that subsequent collectors never see them. It can also pass the score on to subsequent collectors using a dummy scorer.
2. TopDocCollector comes later in the hierarchy and sorts the collapsed set. That part works fine.

The issue is with step 1. Collapsing is done by a single comparator which can take its value from a field or a function; it defaults to score. Function queries do allow us to combine multiple fields / value sources, however it would be difficult to construct such a function for given sort fields, primarily because a) the range of values for a given sort field is not known in advance (one sort field may be unbounded while another is bounded within a small range), and b) a sort field can itself hold custom logic. Because of (a) the group head selected by CollapsingQParserPlugin will be incorrect and the subsequent sorting will break.

On 14 June 2014 12:38, Umesh Prasad umesh.i...@gmail.com wrote: Thanks Joel for the quick response. I have opened a new jira ticket. https://issues.apache.org/jira/browse/SOLR-6168 On 13 June 2014 17:45, Joel Bernstein joels...@gmail.com wrote: Let's open a new ticket. Joel Bernstein Search Engineer at Heliosearch On Fri, Jun 13, 2014 at 8:08 AM, Umesh Prasad umesh.i...@gmail.com wrote: The patch in SOLR-5408 fixes the issue with sorting only for two sort fields. Sorting still breaks when 3 or more sort fields are used. I have attached a test case, which demonstrates the broken behavior when 3 sort fields are used. The failing test case patch is against Lucene/Solr 4.7 revision number 1602388. Can someone apply and verify the bug? Also, should I re-open SOLR-5408 or open a new ticket? --- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad
Re: Bug in Collapsing QParserPlugin : Sort by 3 or more fields is broken
Thanks Joel for the quick response. I have opened a new jira ticket. https://issues.apache.org/jira/browse/SOLR-6168 On 13 June 2014 17:45, Joel Bernstein joels...@gmail.com wrote: Let's open a new ticket. Joel Bernstein Search Engineer at Heliosearch On Fri, Jun 13, 2014 at 8:08 AM, Umesh Prasad umesh.i...@gmail.com wrote: The patch in SOLR-5408 fixes the issue with sorting only for two sort fields. Sorting still breaks when 3 or more sort fields are used. I have attached a test case, which demonstrates the broken behavior when 3 sort fields are used. The failing test case patch is against Lucene/Solr 4.7 revision number 1602388 Can someone apply and verify the bug ? Also, should I re-open SOLR-5408 or open a new ticket ? --- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad
Bug in Collapsing QParserPlugin : Sort by 3 or more fields is broken
The patch in SOLR-5408 fixes the issue with sorting only for two sort fields. Sorting still breaks when 3 or more sort fields are used. I have attached a test case, which demonstrates the broken behavior when 3 sort fields are used. The failing test case patch is against Lucene/Solr 4.7 revision number 1602388 Can someone apply and verify the bug ? Also, should I re-open SOLR-5408 or open a new ticket ? --- Thanks Regards Umesh Prasad
Re: CollapsingQParserPlugin scores incorrectly in Solr 4.6.0 when multiple sort criteria are used
Thanks a lot Joel .. For now I have taken it from trunk and verified the patched code works fine .. On Thu, Dec 12, 2013 at 9:21 PM, Joel Bernstein joels...@gmail.com wrote: Hi, This is a known issue resolved in SOLR-5408. It's fixed in trunk and 4x and if there is a 4.6.1 it will be in there. If not it will be Solr 4.7. https://issues.apache.org/jira/browse/SOLR-5408 Joel On Wed, Dec 11, 2013 at 11:36 PM, Umesh Prasad umesh.i...@gmail.com wrote: Issue occurs in Single Segment index also .. sort: score desc,floSalesRank asc response: { - numFound: 21461, - start: 0, - maxScore: 4.4415073, - docs: [ - { - floSalesRank: 0, - score: 0.123750895, - [docid]: 9208 - On Thu, Dec 12, 2013 at 9:50 AM, Umesh Prasad umesh.i...@gmail.com wrote: Hi All, I am using new CollapsingQParserPlugin for Grouping and found that it works incorrectly when I use multiple sort criteria. http://localhost:8080/solr/toys/select/?q=car%20and%20toysversion=2.2start=0rows=10indent=onsort=score%20desc,floSalesRank%20ascfacet=onfacet.field=store_pathfacet.mincount=1bq=store_path:%22mgl/ksc/gcv%22 ^10wt=jsonfl=score,floSalesRank,[docid]bq=id:STFDCHZM3552AHXE^1000fq={!collapse%20field=item_id} - sort: score desc,floSalesRank asc, - fl: score,floSalesRank,[docid], - start: 0, - q: car and toys, - facet.field: store_path, - fq: {!collapse field=item_id} response: { - numFound: 21461, - start: 0, - maxScore: 4.447499, - docs: [ - { - floSalesRank: 0, - score: 0.12396862, - [docid]: 9703 }, - { - I found a bug opened for same https://issues.apache.org/jira/browse/SOLR-5408 .. The bug is closed but I am not really sure that it works specially for Multiple segment parts .. I am using Solr 4.6.0 and my index contains 4 segments .. Have anyone else faced the same issue ? --- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad -- Joel Bernstein Search Engineer at Heliosearch -- --- Thanks Regards Umesh Prasad
CollapsingQParserPlugin scores incorrectly in Solr 4.6.0 when multiple sort criteria are used
Hi All, I am using new CollapsingQParserPlugin for Grouping and found that it works incorrectly when I use multiple sort criteria. http://localhost:8080/solr/toys/select/?q=car%20and%20toysversion=2.2start=0rows=10indent=onsort=score%20desc,floSalesRank%20ascfacet=onfacet.field=store_pathfacet.mincount=1bq=store_path:%22mgl/ksc/gcv%22 ^10wt=jsonfl=score,floSalesRank,[docid]bq=id:STFDCHZM3552AHXE^1000fq={!collapse%20field=item_id} - sort: score desc,floSalesRank asc, - fl: score,floSalesRank,[docid], - start: 0, - q: car and toys, - facet.field: store_path, - fq: {!collapse field=item_id} response: { - numFound: 21461, - start: 0, - maxScore: 4.447499, - docs: [ - { - floSalesRank: 0, - score: 0.12396862, - [docid]: 9703 }, - { - I found a bug opened for same https://issues.apache.org/jira/browse/SOLR-5408 .. The bug is closed but I am not really sure that it works specially for Multiple segment parts .. I am using Solr 4.6.0 and my index contains 4 segments .. Have anyone else faced the same issue ? --- Thanks Regards Umesh Prasad
Re: CollapsingQParserPlugin scores incorrectly in Solr 4.6.0 when multiple sort criteria are used
Issue occurs in Single Segment index also .. sort: score desc,floSalesRank asc response: { - numFound: 21461, - start: 0, - maxScore: 4.4415073, - docs: [ - { - floSalesRank: 0, - score: 0.123750895, - [docid]: 9208 - On Thu, Dec 12, 2013 at 9:50 AM, Umesh Prasad umesh.i...@gmail.com wrote: Hi All, I am using new CollapsingQParserPlugin for Grouping and found that it works incorrectly when I use multiple sort criteria. http://localhost:8080/solr/toys/select/?q=car%20and%20toysversion=2.2start=0rows=10indent=onsort=score%20desc,floSalesRank%20ascfacet=onfacet.field=store_pathfacet.mincount=1bq=store_path:%22mgl/ksc/gcv%22 ^10wt=jsonfl=score,floSalesRank,[docid]bq=id:STFDCHZM3552AHXE^1000fq={!collapse%20field=item_id} - sort: score desc,floSalesRank asc, - fl: score,floSalesRank,[docid], - start: 0, - q: car and toys, - facet.field: store_path, - fq: {!collapse field=item_id} response: { - numFound: 21461, - start: 0, - maxScore: 4.447499, - docs: [ - { - floSalesRank: 0, - score: 0.12396862, - [docid]: 9703 }, - { - I found a bug opened for same https://issues.apache.org/jira/browse/SOLR-5408 .. The bug is closed but I am not really sure that it works specially for Multiple segment parts .. I am using Solr 4.6.0 and my index contains 4 segments .. Have anyone else faced the same issue ? --- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad
Re: Solr Core Reload causing JVM Memory Leak through FieldCache/LRUCache/LFUCache
Mailing list by default removes attachments. So uploaded it to google drive .. https://drive.google.com/file/d/0B-RnB4e-vaJhX280NVllMUdHYWs/edit?usp=sharing On Fri, Nov 15, 2013 at 2:28 PM, Umesh Prasad umesh.i...@gmail.com wrote: Hi All, We are seeing memory leaks in our Search application whenever core reload happens after replication. We are using Solr 3.6.2 and I have observed this consistently on all servers. The leak suspect analysis from MAT is attached with the mail. #1425afb4a706064b_ Problem Suspect 1 One instance of *org.apache.lucene.search.FieldCacheImpl*loaded by *org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30* occupies *8,726,099,312 (35.49%)* bytes. The memory is accumulated in one instance of*java.util.HashMap$Entry[]* loaded by *system class loader*. *Keywords* org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30 java.util.HashMap$Entry[] org.apache.lucene.search.FieldCacheImpl Problem Suspect 2 69 instances of *org.apache.solr.util.ConcurrentLRUCache*, loaded by *org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30* occupy *6,309,187,392 (25.66%)* bytes. Biggest instances: - org.apache.solr.util.ConcurrentLRUCache @ 0x7f7fe74ef120 - 755,575,672 (3.07%) bytes. - org.apache.solr.util.ConcurrentLRUCache @ 0x7f7e74b7a068 - 728,731,344 (2.96%) bytes. - org.apache.solr.util.ConcurrentLRUCache @ 0x7f7d0a6bd1b8 - 711,828,392 (2.90%) bytes. - org.apache.solr.util.ConcurrentLRUCache @ 0x7f7c6c12e800 - 708,657,624 (2.88%) bytes. - org.apache.solr.util.ConcurrentLRUCache @ 0x7f7fcb092058 - 568,473,352 (2.31%) bytes. - org.apache.solr.util.ConcurrentLRUCache @ 0x7f7f268cb2f0 - 568,400,040 (2.31%) bytes. - org.apache.solr.util.ConcurrentLRUCache @ 0x7f7e31b60c58 - 544,078,600 (2.21%) bytes. - org.apache.solr.util.ConcurrentLRUCache @ 0x7f7e65c2b2d8 - 489,578,480 (1.99%) bytes. - org.apache.solr.util.ConcurrentLRUCache @ 0x7f7d81ea8538 - 467,833,720 (1.90%) bytes. - org.apache.solr.util.ConcurrentLRUCache @ 0x7f7f31996508 - 444,383,992 (1.81%) bytes. *Keywords* org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30 org.apache.solr.util.ConcurrentLRUCache Details » http://pages/24.html 194 instances of *org.apache.solr.util.ConcurrentLFUCache*, loaded by *org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30* occupy *4,583,727,104 (18.64%)* bytes. Biggest instances: - org.apache.solr.util.ConcurrentLFUCache @ 0x7f7cdd4735a0 - 410,628,176 (1.67%) bytes. - org.apache.solr.util.ConcurrentLFUCache @ 0x7f7c7d48e180 - 390,690,864 (1.59%) bytes. - org.apache.solr.util.ConcurrentLFUCache @ 0x7f7f1edfd008 - 348,193,312 (1.42%) bytes. - org.apache.solr.util.ConcurrentLFUCache @ 0x7f7f37b01990 - 340,595,920 (1.39%) bytes. - org.apache.solr.util.ConcurrentLFUCache @ 0x7f7fe02d8dd8 - 274,611,632 (1.12%) bytes. - org.apache.solr.util.ConcurrentLFUCache @ 0x7f7fa9dcfb20 - 253,848,232 (1.03%) bytes. *Keywords* org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30 org.apache.solr.util.ConcurrentLFUCache --- Thanks Regards Umesh Prasad SDE @ Flipkart : The Online Megastore at your doorstep .. -- --- Thanks Regards Umesh Prasad
Re: Hard Commit giving OOM Error on Index Writer in Solr 4.2.1
Hi Shawn, Thanks for the advice :). The JVM heap Size usage on indexer machine has been consistency about 95% (both total and old gen) for past 3 days. It might have nothing to do with Solr 3.6 Vs solr 4.2 .. Because Solr 3.6 indexer gets restarted once in 2-3 days. Will investigate why memory usage is so high on indexer. On Wed, May 22, 2013 at 10:03 AM, Shawn Heisey s...@elyograg.org wrote: On 5/21/2013 9:22 PM, Umesh Prasad wrote: This is our own implementation of data source (canon name com.flipkart.w3.solr.MultiSPCMSProductsDataSource) , which pulls the data from out downstream service and it doesn't cache data in RAM. It fetches the data in batches of 200 and iterates over it when DIH asks for it. I will check the possibility of leak, but unlikely. Can OOM issue be because during analysis, IndexWriter finds the document to be too large to fit in 100 MB memory and can't flush to disk ? Our analyzer chain doesn't make easy (specially with a field like) (does a cross product of synonyms terms) If your documents are really large (hundreds of KB, or a few MB), you might need a bigger ramBufferSizeMB value ... but if that were causing problems, I would expect it to show up during import, not at commit time. How much of your 32GB heap is in use during indexing? Would you be able to try with the heap at 31GB instead of 32GB? One of Java's default optimizations (UseCompressedOops) gets turned off with a heap size of 32GB because it doesn't work any more, and that might lead to strange things happening. Do you have the ability to try 4.3 instead of 4.2.1? Thanks, Shawn -- --- Thanks Regards Umesh Prasad
Re: Hard Commit giving OOM Error on Index Writer in Solr 4.2.1
We have sufficient RAM on machine ..64 GB and we have given JVM 32 GB of memory. The machine runs Indexing primarily. The JVM doesn't run out of memory. It is the particular IndexWriterSolrCore which has .. May be we have specified too low a memory for IndexWriter .. We index mainly product data and use DIH to pull data from downstream services. Autocommiit is off. The commit is infrequent for legacy reasons.. 1 commit in 2-3 hrs. It it makes a difference, then, a Core can have more than10 lakh documents uncommitted at a time. IndexWriter has a memory of 100 MB We ran with same config on Solr 3.5 and we never ran out of Memory. But then, I hadn't tried hard commits on Solr 3.5. Data-Source Entry : dataConfig dataSource name=products type=MultiSPCMSProductsDataSource spCmsHost=$config.spCmsHost spCmsPort=$config.spCmsPort spCmsTimeout=3 cmsBatchSize=200 psURL=$config.psUrl autoCommit=false/ document name=products entity name=item pk=id transformer=w3.solr.transformers.GenericProductsTransformer dataSource=products /entity /document /dataConfig IndexConfig. ramBufferSizeMB100/ramBufferSizeMB maxMergeDocs2147483647/maxMergeDocs maxFieldLength5/maxFieldLength writeLockTimeout1000/writeLockTimeout commitLockTimeout1/commitLockTimeout On Tue, May 21, 2013 at 7:07 PM, Jack Krupansky j...@basetechnology.comwrote: Try again on a machine with more memory. Or did you do that already? -- Jack Krupansky -Original Message- From: Umesh Prasad Sent: Tuesday, May 21, 2013 1:57 AM To: solr-user@lucene.apache.org Subject: Hard Commit giving OOM Error on Index Writer in Solr 4.2.1 Hi All, I am hitting an OOM error while trying to do an hard commit on one of the cores. Transaction log dir is Empty and DIH shows indexing going on for 13 hrs.. *Indexing since 13h 22m 22s* Requests: 5,211,392 (108/s), Fetched: 1,902,792 (40/s), Skipped: 106,853, Processed: 1,016,696 (21/s) Started: about 13 hours ago response lst name=responseHeaderint name=status500/intint name=QTime4/int/lstlst name=errorstr name=msgthis writer hit an OutOfMemoryError; cannot commit/strstr name=tracejava.lang.**IllegalStateException: this writer hit an OutOfMemoryError; cannot commit at org.apache.lucene.index.**IndexWriter.**prepareCommitInternal(** IndexWriter.java:2661) at org.apache.lucene.index.**IndexWriter.commitInternal(** IndexWriter.java:2827) at org.apache.lucene.index.**IndexWriter.commit(** IndexWriter.java:2807) at org.apache.solr.update.**DirectUpdateHandler2.commit(** DirectUpdateHandler2.java:536) at org.apache.solr.update.**processor.RunUpdateProcessor.**processCommit(** RunUpdateProcessorFactory.**java:95) at org.apache.solr.update.**processor.**UpdateRequestProcessor.** processCommit(**UpdateRequestProcessor.java:**64) at org.apache.solr.update.**processor.**DistributedUpdateProcessor.** processCommit(**DistributedUpdateProcessor.**java:1055) at org.apache.solr.update.**processor.LogUpdateProcessor.**processCommit(** LogUpdateProcessorFactory.**java:157) at org.apache.solr.handler.**RequestHandlerUtils.**handleCommit(** RequestHandlerUtils.java:69) at org.apache.solr.handler.**ContentStreamHandlerBase.**handleRequestBody(** ContentStreamHandlerBase.java:**68) at org.apache.solr.handler.**RequestHandlerBase.**handleRequest(** RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.**execute(SolrCore.java:1817) at org.apache.solr.servlet.**SolrDispatchFilter.execute(** SolrDispatchFilter.java:639) at org.apache.solr.servlet.**SolrDispatchFilter.doFilter(** SolrDispatchFilter.java:345) at 
org.apache.solr.servlet.**SolrDispatchFilter.doFilter(** SolrDispatchFilter.java:141) at org.apache.catalina.core.**ApplicationFilterChain.**internalDoFilter(** ApplicationFilterChain.java:**235) at org.apache.catalina.core.**ApplicationFilterChain.**doFilter(** ApplicationFilterChain.java:**206) at org.apache.catalina.core.**StandardWrapperValve.invoke(** StandardWrapperValve.java:233) at org.apache.catalina.core.**StandardContextValve.invoke(** StandardContextValve.java:191) at org.apache.catalina.core.**StandardHostValve.invoke(** StandardHostValve.java:127) at org.apache.catalina.valves.**ErrorReportValve.invoke(** ErrorReportValve.java:102) at org.apache.catalina.core.**StandardEngineValve.invoke(** StandardEngineValve.java:109) at org.apache.catalina.valves.**AccessLogValve.invoke(** AccessLogValve.java:554) at org.apache.catalina.connector.**CoyoteAdapter.service(** CoyoteAdapter.java:298) at org.apache.coyote.http11.**Http11Processor.process(** Http11Processor.java:859) at org.apache.coyote.http11.**Http11Protocol$**Http11ConnectionHandler.** process(Http11Protocol.java:**588) at org.apache.tomcat.util.net.**JIoEndpoint$Worker.run(** JIoEndpoint.java:489) at java.lang.Thread.run
Re: Hard Commit giving OOM Error on Index Writer in Solr 4.2.1
Hi Shawn, This is our own implementation of data source (canon name com.flipkart.w3.solr.MultiSPCMSProductsDataSource) , which pulls the data from out downstream service and it doesn't cache data in RAM. It fetches the data in batches of 200 and iterates over it when DIH asks for it. I will check the possibility of leak, but unlikely. Can OOM issue be because during analysis, IndexWriter finds the document to be too large to fit in 100 MB memory and can't flush to disk ? Our analyzer chain doesn't make easy (specially with a field like) (does a cross product of synonyms terms) fieldType name=textStemmed class=solr.TextField indexed=true stored=false multiValued=true positionIncrementGap=100 omitNorms=true analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.*SynonymFilterFactory* synonyms=* synonyms_index.txt* ignoreCase=true expand=*true*/ filter class=solr.KStemFilterFactory / filter class=solr.EnglishMinimalStemFilterFactory/ filter class=solr.*SynonymFilterFactory* synonyms=* synonyms_index.txt* ignoreCase=true expand=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.SynonymFilterFactory synonyms=synonyms_index.txt ignoreCase=true expand=true/ filter class=solr.KStemFilterFactory / filter class=solr.EnglishMinimalStemFilterFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms_index.txt ignoreCase=true expand=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ /analyzer /fieldType On Wed, May 22, 2013 at 5:03 AM, Shawn Heisey s...@elyograg.org wrote: On 5/21/2013 5:14 PM, Umesh Prasad wrote: We have sufficient RAM on machine ..64 GB and we have given JVM 32 GB of memory. The machine runs Indexing primarily. The JVM doesn't run out of memory. It is the particular IndexWriterSolrCore which has .. May be we have specified too low a memory for IndexWriter .. We index mainly product data and use DIH to pull data from downstream services. Autocommiit is off. The commit is infrequent for legacy reasons.. 1 commit in 2-3 hrs. It it makes a difference, then, a Core can have more than10 lakh documents uncommitted at a time. IndexWriter has a memory of 100 MB We ran with same config on Solr 3.5 and we never ran out of Memory. But then, I hadn't tried hard commits on Solr 3.5. Hard commits are the only kind of commits that Solr 3.x has. It's soft commits that are new with 4.x. Data-Source Entry : dataConfig dataSource name=products type=**MultiSPCMSProductsDataSource This appears to be using a custom data source, not one of the well-known types. If it had been JDBC, I would be saying that your JDBC driver is trying to cache the entire result set in RAM. With a MySQL data source, a batchSize of -1 fixes this problem, by internally changing the JDBC fetchSize to Integer.MIN_VALUE. Other databases have different mechanisms. With this data source, I have no idea at all how to make sure that it doesn't cache all results in RAM. 
It might be that the combination of the new Solr and this custom data source causes a memory leak, something that doesn't happen with the old Solr version. You said that the transaction log directory is empty. That rules out one possibility, which would be solved by the autoCommit settings on this page: http://wiki.apache.org/solr/**SolrPerformanceProblems#Slow_**startuphttp://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup Aside from the memory leak idea, or possibly having your entire source data cached in RAM, I have no idea what's happening here. Thanks, Shawn -- --- Thanks Regards Umesh Prasad
Hard Commit giving OOM Error on Index Writer in Solr 4.2.1
Hi All, I am hitting an OOM error while trying to do an hard commit on one of the cores. Transaction log dir is Empty and DIH shows indexing going on for 13 hrs.. *Indexing since 13h 22m 22s* Requests: 5,211,392 (108/s), Fetched: 1,902,792 (40/s), Skipped: 106,853, Processed: 1,016,696 (21/s) Started: about 13 hours ago response lst name=responseHeaderint name=status500/intint name=QTime4/int/lstlst name=errorstr name=msgthis writer hit an OutOfMemoryError; cannot commit/strstr name=tracejava.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2661) at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95) at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1055) at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157) at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) -- --- Thanks Regards Umesh Prasad
Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start
Sorry for the late reply. I was trying to change our indexing pipeline and do explicit intermediate commits for each core. That turned out to be a bit more work than I have time for. So, I do want to explore hard commits. I tried solr-host:port/solr/core/update?commit=true, but there is no impact on the txn log size, so I feel it must be getting ignored. Can someone tell me how to do the hard commits? @Shawn: openSearcher=false is not an option. On each commit, the index will be replicated to slaves, which will have a searcher on it immediately and can see the intermediate state. The longer-term and better solution is changing the indexing pipeline and doing explicit commits, but I can't implement that right now. On 18 Apr 2013 00:35, Shawn Heisey s...@elyograg.org wrote: On 4/17/2013 11:56 AM, Mark Miller wrote: There is one additional caveat - when you disable the updateLog, you have to switch to MMapDirectoryFactory instead of NRTCachingDirectoryFactory. The NRT directory implementation will cache a portion of a commit (including hard commits) into RAM instead of onto disk. On the next commit, the previous one is persisted completely to disk. Without a transaction log, you can lose data. I don't think this is true? NRTCachingDirectoryFactory should not cache hard commits and should be as safe as MMapDirectoryFactory is - neither of which is as safe as using a tran log. This is based on observations of what happens with my segment files when I do a full-import, using autoCommit with openSearcher disabled. I see that each autoCommit results in a full segment being written, plus part of another segment. On the next autoCommit, the rest of the files for the last segment are written, another full segment is written, and I get another partial segment. I asked about this on the list some time ago, and what I just told Umesh is a rehash of what I understood from Yonik's response. If I'm wrong, I hope someone who knows for sure can correct me. Thanks, Shawn
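[Note for readers of this thread: a minimal SolrJ 4.x sketch of issuing an explicit hard commit from client code. The host, port and core name are placeholders, not the poster's actual deployment, and the openSearcher parameter is shown only to illustrate the setting discussed above.]

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class HardCommitExample {
    public static void main(String[] args) throws Exception {
        // Placeholder host/core; substitute your own.
        SolrServer server = new HttpSolrServer("http://solr-host:8983/solr/books");
        try {
            // Plain hard commit: flushes segments to disk and truncates the tlog.
            server.commit(true, true);

            // Same thing sent as an explicit update request, with the
            // openSearcher request parameter set (an assumption here; verify
            // your update handler honors it the way you expect).
            UpdateRequest req = new UpdateRequest();
            req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
            req.setParam("openSearcher", "false");
            req.process(server);
        } finally {
            server.shutdown();
        }
    }
}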
Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start
Thanks Erick. A couple of questions: Our transaction logs are huge as we have disabled auto commit. The biggest one is 6.1 GB.

567M  autosuggest/data/tlog
22M   avmediaCore/data/tlog
388M  booksCore/data/tlog
4.9G  books/data/tlog
6.1G  mp3-downloads/data/tlog  (150% of index size)
1.5G  next-5/data/tlog
690M  queries/data/tlog  (25% of index size)
207M  queryProduct/data/tlog  (100% of index size)

Btw, I am surprised by the size of the transaction logs, because that is a significant fraction of the index size itself:

2.6G  autosuggest/data/index
992M  avmediaCore/data/index
12G   booksCore/data/index
4.2G  mp3-downloads-new/data/index
45G   next-5/data/index
2.9G  queries/data/index
220M  queryProduct/data/index

We use DIH and have turned off auto commit because we sometimes have to build the index from scratch (clean=true) and we do not want intermediate commits to become visible. Our master server sees a lot of restarts, sometimes 2-3 times a day. It polls quite a few other data sources for updates. The master maintains the last committed version and can handle uncommitted changes. Given the frequent restarts, we can't really afford a huge startup time at this point. In the worst case, does Solr allow disabling the transaction log? Here is our index config:

<indexConfig>
  <!-- Values here affect all index writers and act as a default unless overridden. -->
  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>10</mergeFactor>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>5</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <lockType>single</lockType>
  <!-- options specific to the main on-disk lucene index -->
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>32</ramBufferSizeMB>
  <mergeFactor>5</mergeFactor>
  <!-- Deprecated -->
  <!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>5</maxFieldLength>
  <unlockOnStartup>false</unlockOnStartup>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <!-- The number of commit points to be kept -->
    <str name="maxCommitsToKeep">5</str>
    <!-- The number of optimized commit points to be kept -->
    <str name="maxOptimizedCommitsToKeep">0</str>
    <str name="maxCommitAge">2HOUR</str>
  </deletionPolicy>
</indexConfig>

Thanks Regards Umesh Prasad On Wed, Apr 17, 2013 at 4:57 PM, Erick Erickson erickerick...@gmail.com wrote: How big are your transaction logs? They can be replayed on startup. They are truncated and a new one started when you do a hard commit (openSearcher true or false doesn't matter). So a quick test of this theory would be to just stop your indexing process, issue a hard commit on all your cores and _then_ try to restart. If it comes up immediately, you've identified your problem. Best Erick On Tue, Apr 16, 2013 at 8:33 AM, Umesh Prasad umesh.i...@gmail.com wrote: Hi, We are migrating to Solr 4.2 from Solr 3.6 and Solr 4.2 is throwing an exception on restart. What is more, it takes a lot of time (more than one hour) to get up and running. The exception after restart: = Apr 16, 2013 4:47:31 PM org.apache.solr.update.UpdateLog$RecentUpdates update WARNING: Unexpected log entry or corrupt log.
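[Note for readers: a minimal SolrJ 4.x sketch of Erick's quick test - stop indexing, issue a hard commit on every core so the transaction logs are truncated, then restart. The host/port and the core names (taken from the directory listing above) are placeholders; adjust them for the real deployment.]

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class CommitAllCores {
    // Placeholder core names; substitute the cores actually hosted on the master.
    private static final String[] CORES = {
        "autosuggest", "avmediaCore", "booksCore", "books",
        "mp3-downloads", "next-5", "queries", "queryProduct"
    };

    public static void main(String[] args) throws Exception {
        for (String core : CORES) {
            SolrServer server = new HttpSolrServer("http://solr-master:8983/solr/" + core);
            try {
                // Hard commit: flush in-memory segments and start a new, empty tlog.
                server.commit(true, true);
                System.out.println("Committed core " + core);
            } finally {
                server.shutdown();
            }
        }
    }
}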
Entry=11 java.lang.ClassCastException: java.lang.Long cannot be cast to java.util.List
    at org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:929)
    at org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:863)
    at org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1014)
    at org.apache.solr.update.UpdateLog.init(UpdateLog.java:253)
    at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:82)
    at org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:137)
    at org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:123)
    at org.apache.solr.update.DirectUpdateHandler2.init(DirectUpdateHandler2.java:95)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525)
    at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:806)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:618)
    at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051
Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start
org.apache.solr.handler.SnapPuller fetchLatestIndex
SEVERE: Master at: http://localhost:25280/solr/cameras is not available. Index fetch failed. Exception: org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://localhost:25280/solr/cameras

Before the restart, the server was running incremental indexing (triggered by a cron). The cron triggers every 5 mins for each of about 40 cores. This was the same with Solr 3.5 also, but we never faced any issues. -- --- Thanks Regards Umesh Prasad
Re: Downloaded Solr 4.2.1 Source: Build Failing
java version 1.6.0_43
Java(TM) SE Runtime Environment (build 1.6.0_43-b01-447-11M4203)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01-447, mixed mode)
Mac OS X: Version 10.7.5
-- Umesh On Sat, Apr 13, 2013 at 12:08 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : /Users/umeshprasad/Downloads/solr-4.2.1/solr/core/src/java/org/apache/solr/handler/c : omponent/QueryComponent.java:765: cannot find symbol : [javac] symbol : class ShardFieldSortedHitQueue : [javac] location: class org.apache.solr.handler.component.QueryComponent : [javac] ShardFieldSortedHitQueue queue; Weird ... can you provide us more details about the java compiler you are using? ShardFieldSortedHitQueue is a package protected class declared in ShardDoc.java (in the same package as QueryComponent). That isn't exactly a best practice, but it shouldn't be causing a compilation failure. -Hoss -- --- Thanks Regards Umesh Prasad
Re: Downloaded Solr 4.2.1 Source: Build Failing
A further update on the same. The build on branch http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1 succeeds fine. The build fails only for the source code downloaded from http://apache.techartifact.com/mirror/lucene/solr/4.2.1/solr-4.2.1-src.tgz On Sun, Apr 14, 2013 at 1:05 PM, Umesh Prasad umesh.i...@gmail.com wrote: java version 1.6.0_43 Java(TM) SE Runtime Environment (build 1.6.0_43-b01-447-11M4203) Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01-447, mixed mode) Mac OS X: Version 10.7.5 -- Umesh On Sat, Apr 13, 2013 at 12:08 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : /Users/umeshprasad/Downloads/solr-4.2.1/solr/core/src/java/org/apache/solr/handler/c : omponent/QueryComponent.java:765: cannot find symbol : [javac] symbol : class ShardFieldSortedHitQueue : [javac] location: class org.apache.solr.handler.component.QueryComponent : [javac] ShardFieldSortedHitQueue queue; Weird ... can you provide us more details about the java compiler you are using? ShardFieldSortedHitQueue is a package protected class declared in ShardDoc.java (in the same package as QueryComponent). That isn't exactly a best practice, but it shouldn't be causing a compilation failure. -Hoss -- --- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad
Re: Not able to replicate the solr 3.5 indexes to solr 4.2 indexes
Hi Erick, I have already created a JIRA and also attached a patch, but no unit tests. My local build is failing (building from the solr 4.2.1 source jar). Please see https://issues.apache.org/jira/browse/SOLR-4703 . -- Umesh On Sat, Apr 13, 2013 at 7:24 PM, Erick Erickson erickerick...@gmail.com wrote: Please make a JIRA and attach as a patch if there aren't any JIRAs for this yet. Best Erick On Fri, Apr 12, 2013 at 1:58 AM, Montu v Boda montu.b...@highqsolutions.com wrote: hi thanks for your reply. is anyone going to fix this issue in a new solr version? because there are so many guys facing the same problem while upgrading the solr index 3.5.0 to solr 4.2 Thanks Regards Montu v Boda -- View this message in context: http://lucene.472066.n3.nabble.com/Not-able-to-replicate-the-solr-3-5-indexes-to-solr-4-2-indexes-tp4055313p4055477.html Sent from the Solr - User mailing list archive at Nabble.com. -- --- Thanks Regards Umesh Prasad
Downloaded Solr 4.2.1 Source: Build Failing
common.compile-core:
    [javac] Compiling 337 source files to /Users/umeshprasad/Downloads/solr-4.2.1/solr/build/solr-core/classes/java
    [javac] /Users/umeshprasad/Downloads/solr-4.2.1/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java:765: cannot find symbol
    [javac] symbol  : class ShardFieldSortedHitQueue
    [javac] location: class org.apache.solr.handler.component.QueryComponent
    [javac]     ShardFieldSortedHitQueue queue;
    [javac]     ^
    [javac] /Users/umeshprasad/Downloads/solr-4.2.1/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java:766: cannot find symbol
    [javac] symbol  : class ShardFieldSortedHitQueue
    [javac] location: class org.apache.solr.handler.component.QueryComponent
    [javac]     queue = new ShardFieldSortedHitQueue(sortFields, ss.getOffset() + ss.getCount());
    [javac]     ^
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 2 errors
-- --- Thanks Regards Umesh Prasad
Re: Index Replication Failing in Solr 4.2.1
Created JIRA issue https://issues.apache.org/jira/browse/SOLR-4703 and attached the patch. No unit tests yet. On Fri, Apr 12, 2013 at 12:59 AM, Mark Miller markrmil...@gmail.com wrote: I was looking for this msg the other day and couldn't find it offhand… +1, please add this to JIRA so someone can look into it and it does not get lost! - Mark On Apr 11, 2013, at 11:17 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Umesh, The attachment didn't make it through. Could you please add it to JIRA? http://wiki.apache.org/solr/HowToContribute Thanks, Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Apr 10, 2013 at 9:43 PM, Umesh Prasad umesh.i...@gmail.com wrote: Root caused the issue to a code bug / contract violation in SnapPuller in Solr 4.2.1 (impacts trunk as well) and fixed it by patching the SnapPuller locally. The fetchfilelist API expects indexversion to be specified as a param, so the call to the master should be of the form: /solr/phcare/replication?command=filelist&gen=108213&wt=json&indexversion=1323961125908 Instead, the slave calls the master as: /solr/phcare/replication?command=filelist&gen=108213&wt=json The code bug lies in SnapPuller.fetchFileList(long gen), which gets called by SnapPuller.fetchLatestIndex(final SolrCore core, boolean forceReplication). The fix is to pass along the version to fetchFileList and populate it. A patch is attached for trunk. Thanks Regards Umesh Prasad Search Engineer @ Flipkart : India's Online Megastore - Empowering Consumers Find Products .. On Tue, Apr 9, 2013 at 9:28 PM, Umesh Prasad umesh.i...@gmail.com wrote: Hi All, I am migrating from Solr 3.5.0 to Solr 4.2.1. Everything is running fine and set to go, except the master-slave replication. We use master-slave replication with multiple cores (1 master, 10 slaves and 20 plus cores). My configuration is: Master: Solr 3.5.0, has an existing index, and delta import running using DIH. Slave: Solr 4.2.1, has no startup index.

Apr 9, 2013 9:18:40 PM org.apache.solr.core.SolrCore execute
INFO: [phcare] webapp= path=/replication params={command=fetchindex&_=1365522520521&wt=json} status=0 QTime=1
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's generation: 107876
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's generation: 79248
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchFileList
SEVERE: No files to download for index generation: 107876
Apr 9, 2013 9:18:40 PM org.apache.solr.core.SolrCore execute
INFO: [phcare] webapp= path=/replication params={command=details&_=1365522520556&wt=json} status=0 QTime=7

In both the master and the slave, the file list for the replicable version is correct.

On Slave:
{ masterDetails: { indexSize: 4.31 MB, indexPath: /var/lib/fk-w3-sherlock/cores/phcare/data/index.20130124235012, commits: [ [ indexVersion, 1323961124638, generation, 107856, filelist, [ _45e1.tii, _45e1.nrm, ..

On Master:
[ indexVersion, 1323961124638, generation, 107856, filelist, [ _45e1.tii, _45e1.nrm, _45e2_1.del, _45e2.frq, _45e1_3.del, _45e1.tis, ..

Can someone help? Our whole migration to Solr 4.2 is blocked on this replication issue. --- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad
Re: Index Replication Failing in Solr 4.2.1
Root caused the issue to a code bug / contract violation in SnapPuller in Solr 4.2.1 (impacts trunk as well) and fixed it by patching the SnapPuller locally. The fetchfilelist API expects indexversion to be specified as a param, so the call to the master should be of the form: /solr/phcare/replication?command=filelist&gen=108213&wt=json&indexversion=1323961125908 Instead, the slave calls the master as: /solr/phcare/replication?command=filelist&gen=108213&wt=json The code bug lies in SnapPuller.fetchFileList(long gen), which gets called by SnapPuller.fetchLatestIndex(final SolrCore core, boolean forceReplication). The fix is to pass along the version to fetchFileList and populate it. A patch is attached for trunk. Thanks Regards Umesh Prasad Search Engineer @ Flipkart : India's Online Megastore - Empowering Consumers Find Products .. On Tue, Apr 9, 2013 at 9:28 PM, Umesh Prasad umesh.i...@gmail.com wrote: Hi All, I am migrating from Solr 3.5.0 to Solr 4.2.1. Everything is running fine and set to go, except the master-slave replication. We use master-slave replication with multiple cores (1 master, 10 slaves and 20 plus cores). My configuration is: Master: Solr 3.5.0, has an existing index, and delta import running using DIH. Slave: Solr 4.2.1, has no startup index.

Apr 9, 2013 9:18:40 PM org.apache.solr.core.SolrCore execute
INFO: [phcare] webapp= path=/replication params={command=fetchindex&_=1365522520521&wt=json} status=0 QTime=1
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's generation: 107876
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's generation: 79248
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchFileList
SEVERE: No files to download for index generation: 107876
Apr 9, 2013 9:18:40 PM org.apache.solr.core.SolrCore execute
INFO: [phcare] webapp= path=/replication params={command=details&_=1365522520556&wt=json} status=0 QTime=7

In both the master and the slave, the file list for the replicable version is correct.

On Slave:
{ masterDetails: { indexSize: 4.31 MB, indexPath: /var/lib/fk-w3-sherlock/cores/phcare/data/index.20130124235012, commits: [ [ indexVersion, 1323961124638, generation, 107856, filelist, [ _45e1.tii, _45e1.nrm, ..

On Master:
[ indexVersion, 1323961124638, generation, 107856, filelist, [ _45e1.tii, _45e1.nrm, _45e2_1.del, _45e2.frq, _45e1_3.del, _45e1.tis, ..

Can someone help? Our whole migration to Solr 4.2 is blocked on this replication issue. --- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad
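[Note for readers: a minimal SolrJ 4.x sketch of checking the master's file list manually, including the indexversion parameter that the patched SnapPuller passes along. The host, core name, and the generation/version values are placeholders, and the parameter names are copied from the URLs quoted in this thread rather than verified against the replication handler's source.]

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class ReplicationFileListCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder master URL and core name from the thread.
        SolrServer master = new HttpSolrServer("http://master-host:25280/solr/phcare");
        try {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("command", "filelist");
            params.set("gen", "108213");                  // generation the slave wants to fetch
            params.set("indexversion", "1323961125908");  // version the slave should pass along
            params.set("wt", "json");

            // Send the request to the /replication handler instead of /select.
            QueryRequest req = new QueryRequest(params);
            req.setPath("/replication");
            NamedList<Object> response = master.request(req);
            System.out.println(response);
        } finally {
            master.shutdown();
        }
    }
}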
Index Replication Failing in Solr 4.2.1
Hi All, I am migrating from Solr 3.5.0 to Solr 4.2.1. Everything is running fine and set to go, except the master-slave replication. We use master-slave replication with multiple cores (1 master, 10 slaves and 20 plus cores). My configuration is: Master: Solr 3.5.0, has an existing index, and delta import running using DIH. Slave: Solr 4.2.1, has no startup index.

Apr 9, 2013 9:18:40 PM org.apache.solr.core.SolrCore execute
INFO: [phcare] webapp= path=/replication params={command=fetchindex&_=1365522520521&wt=json} status=0 QTime=1
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's generation: 107876
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's generation: 79248
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchFileList
SEVERE: No files to download for index generation: 107876
Apr 9, 2013 9:18:40 PM org.apache.solr.core.SolrCore execute
INFO: [phcare] webapp= path=/replication params={command=details&_=1365522520556&wt=json} status=0 QTime=7

In both the master and the slave, the file list for the replicable version is correct.

On Slave:
{ masterDetails: { indexSize: 4.31 MB, indexPath: /var/lib/fk-w3-sherlock/cores/phcare/data/index.20130124235012, commits: [ [ indexVersion, 1323961124638, generation, 107856, filelist, [ _45e1.tii, _45e1.nrm, ..

On Master:
[ indexVersion, 1323961124638, generation, 107856, filelist, [ _45e1.tii, _45e1.nrm, _45e2_1.del, _45e2.frq, _45e1_3.del, _45e1.tis, ..

Can someone help? Our whole migration to Solr 4.2 is blocked on this replication issue. --- Thanks Regards Umesh Prasad
Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
[] ASF Mirrors (linked in our release announcements or via the Lucene website) [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [x] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) On Fri, Jan 21, 2011 at 10:01 PM, mike anderson saidthero...@gmail.com wrote: [x] ASF Mirrors (linked in our release announcements or via the Lucene website) [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [x] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) -- --- Thanks Regards Umesh Prasad