RE: Best way to anchor solr searches?
Thanks for the replies. I did look at caching, but our commit time is 90 seconds, so it's definitely possible for someone to run a search, move to the next page, and get wonky results. What about having Solr autowarm the x most recent searches in the queryResultCache? That could hopefully reduce the issues, though even then the results can be out of date. A per-user application cache would certainly solve such issues, but I'd like to avoid one if possible. Definitely an interesting problem... -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-anchor-solr-searches-tp3282576p3284674.html Sent from the Solr - User mailing list archive at Nabble.com.
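The autowarming idea can be modeled roughly as follows. This is a conceptual sketch and not Solr's cache code. Solr's own setting for this is the autowarmCount attribute on the queryResultCache in solrconfig.xml. The sketch uses an access-ordered LRU map: when a new searcher opens, its n most recently used keys are re-executed against the new index. Because warming re-runs the queries instead of copying the old results, page 2 still reflects the new index. Warming makes that page faster, but it does not anchor it. WarmableLruCache and the recompute function are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU cache that can autowarm a fresh copy of itself. */
final class WarmableLruCache<K, V> {
    private final int capacity;
    private final LinkedHashMap<K, V> map;

    WarmableLruCache(int capacity) {
        this.capacity = capacity;
        // accessOrder = true: iteration runs from least to most recently used
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > WarmableLruCache.this.capacity;
            }
        };
    }

    V get(K key) {
        return map.get(key);
    }

    void put(K key, V value) {
        map.put(key, value);
    }

    /** Up to n keys, most recently used first. */
    List<K> mostRecentKeys(int n) {
        List<K> keys = new ArrayList<>(map.keySet());
        Collections.reverse(keys);
        return new ArrayList<>(keys.subList(0, Math.min(n, keys.size())));
    }

    /**
     * Builds the cache for a new searcher: the n most recent queries are
     * recomputed (i.e. re-executed against the new index), not copied.
     */
    WarmableLruCache<K, V> warm(int n, Function<K, V> recompute) {
        WarmableLruCache<K, V> fresh = new WarmableLruCache<>(capacity);
        List<K> keys = mostRecentKeys(n);
        // insert the oldest first so the fresh cache keeps the same recency order
        for (int i = keys.size() - 1; i >= 0; i--) {
            K key = keys.get(i);
            fresh.put(key, recompute.apply(key));
        }
        return fresh;
    }
}
```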
Best way to anchor solr searches?
If I'm searching for users sorted by last login time, and I search once, then go to the second page with a new offset, I could potentially see the same users again on page 2 if the index has changed in between. What is the best way to anchor the search so I avoid this? -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-anchor-solr-searches-tp3282576p3282576.html
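Keyset (or "seek") pagination is one common way to anchor paging. It is a swapped-in technique and was not suggested in this thread. Each page does not use a growing offset. It filters on the sort values of the last document already shown. The sketch makes these assumptions, all hypothetical:

- a numeric field last_login_ms (epoch milliseconds), sorted descending;
- a unique numeric id field as the tiebreaker.

The sketch only builds the sort and fq strings. A user who logs in between requests jumps above the anchor, so they are not repeated on the next page. They are skipped instead. Later Solr releases added a cursorMark parameter that is built on the same idea.

```java
/**
 * Keyset ("seek") pagination sketch. Field names last_login_ms and id
 * are hypothetical; both are assumed to be indexed numeric fields.
 */
final class KeysetPager {

    /** Sort the filter below depends on: newest login first, id breaks ties. */
    static String sortParam() {
        return "last_login_ms desc,id asc";
    }

    /**
     * fq matching only documents that sort strictly after the last one
     * shown: an older login, or the same login time with a larger id.
     */
    static String nextPageFilter(long lastLoginMs, long lastId) {
        return "last_login_ms:{* TO " + lastLoginMs + "}"
                + " OR (last_login_ms:" + lastLoginMs
                + " AND id:{" + lastId + " TO *})";
    }

    public static void main(String[] args) {
        // every page uses start=0; pages after the first add this fq
        System.out.println("sort=" + sortParam());
        System.out.println("fq=" + nextPageFilter(1314266400000L, 42L));
    }
}
```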
Re: Problems generating war distribution using ant
Stupid me. The output file was named something else. I really need to make a proper servlet mapping. Works now :D -- View this message in context: http://lucene.472066.n3.nabble.com/Problems-generating-war-distribution-using-ant-tp3260070p3260843.html
Problems generating war distribution using ant
The way I generate war files now is by running 'ant dist' in the solr folder. It generates the war fine and the build succeeds. Then I deploy it to Tomcat, and once again the logs show it was successful (from the looks of it). However, when I go to 'myip:8080/solr/admin' I get HTTP status 404. It does work when I take a war from the nightly build, expand it, drop in some new class files I need, and repackage it. The Solr I have checked out seems fine, though, and I can't find any differences between the war I'm generating and the nightly one. -- View this message in context: http://lucene.472066.n3.nabble.com/Problems-generating-war-distribution-using-ant-tp3260070p3260070.html
Re: Problems generating war distribution using ant
Interesting. I can use this as an option and create a custom 'war' target if need be, but I'd like to avoid that. I'd rather do a full build from the source code I have checked out from SVN. Is there any reason why 'ant dist' doesn't produce a working war file? -- View this message in context: http://lucene.472066.n3.nabble.com/Problems-generating-war-distribution-using-ant-tp3260070p3260122.html
RE: Hudson build issues
I downloaded the official build (4.0) and have been customizing it for my needs. I'm not really sure how to use these scripts. Is there somewhere in Hudson where I can apply them? -- View this message in context: http://lucene.472066.n3.nabble.com/Hudson-build-issues-tp3244563p3246645.html
Re: Cache replication
Thanks for the advice, Paul, but post-processing is a must for me given the nature of my application. I haven't had problems yet, though. -- View this message in context: http://lucene.472066.n3.nabble.com/Cache-replication-tp3240708p3244202.html
Hudson build issues
Whenever I try to build this on our Hudson server, it says it can't find org.apache.lucene:lucene-xercesImpl:jar:4.0-SNAPSHOT. Is the Apache repo missing this artifact? -- View this message in context: http://lucene.472066.n3.nabble.com/Hudson-build-issues-tp3244563p3244563.html
Cache replication
I'm wondering if the caches (such as the queryResultCache) are replicated across all the slaves. That is to say, if I hit one of my slaves and cache a result, and I later make a search that happens to hit a different slave, will that first cached result be available for use? This is pretty important because I'm going to have a lot of slaves. If this isn't done, there's a high chance of running a lot of uncached queries. Thanks :) -- View this message in context: http://lucene.472066.n3.nabble.com/Cache-replication-tp3240708p3240708.html
Re: Cache replication
Thanks for the informative response. I'll consider using 'sticky' addressing as you suggested. Caching is so important for me because I'm doing more processing after the query component to come up with my query result, and I want to avoid that processing as much as possible. Thanks a lot! -- View this message in context: http://lucene.472066.n3.nabble.com/Cache-replication-tp3240708p3240853.html
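The 'sticky' addressing mentioned above can be sketched as a stable hash of the request onto the list of slaves. Repeating the same query, or the same user's queries, then lands on the same slave and its queryResultCache. The slave URLs and the choice of routing key are hypothetical. The sketch does not handle slaves going down. A consistent-hashing ring would fit that case better, because it remaps fewer keys when a slave disappears.

```java
import java.util.Arrays;
import java.util.List;

/** Routes each request key to the same slave every time (hypothetical). */
final class StickyRouter {
    private final List<String> slaves;

    StickyRouter(List<String> slaves) {
        if (slaves.isEmpty()) {
            throw new IllegalArgumentException("need at least one slave");
        }
        this.slaves = slaves;
    }

    /**
     * key might be the normalized query string, or a user id if one user's
     * pages should share a cache. String.hashCode() is specified and stable
     * across JVMs; floorMod keeps the index non-negative for negative hashes.
     */
    String slaveFor(String key) {
        return slaves.get(Math.floorMod(key.hashCode(), slaves.size()));
    }

    public static void main(String[] args) {
        StickyRouter router = new StickyRouter(Arrays.asList(
                "http://slave1:8080/solr", "http://slave2:8080/solr",
                "http://slave3:8080/solr"));
        System.out.println(router.slaveFor("q=name:bob&sort=last_login desc"));
    }
}
```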
Any way to get the value if sorting by function?
Let's say my sort is something like sort=sum(indexedField, constant). If I have a component that runs right after the QueryComponent, is it possible to know what this value was for each document IF the field is only indexed, not stored? I scoured the code and it didn't look like this was possible. -- View this message in context: http://lucene.472066.n3.nabble.com/Any-way-to-get-the-value-if-sorting-by-function-tp3148864p3148864.html
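One possible approach works if the component can reach the same per-document values the sort read. For an indexed int field, Lucene's FieldCache can supply an array of values indexed by internal doc id. The sort key can then be recomputed for the returned ids. The sketch takes that array as given. How to obtain it, and whether it is per-segment, depends on the Lucene/Solr version. The class name is hypothetical. The sketch widens to long, whereas Solr's sum() works in floating point.

```java
/**
 * Recomputes sort=sum(indexedField, constant) for returned documents.
 * fieldValues is assumed to be a per-document value array for the
 * indexed field (e.g. from a field cache), indexed by internal doc id.
 */
final class FunctionSortValues {

    static long[] sumValues(int[] docIds, int[] fieldValues, int constant) {
        long[] out = new long[docIds.length];
        for (int i = 0; i < docIds.length; i++) {
            // widen to long so large field values cannot overflow the sum
            out[i] = (long) fieldValues[docIds[i]] + constant;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] fieldValues = {5, 10, 15, 20};   // values for doc ids 0..3
        int[] returnedDocs = {3, 1};           // ids in the DocList, in order
        long[] keys = sumValues(returnedDocs, fieldValues, 100);
        System.out.println(keys[0] + " " + keys[1]); // prints "120 110"
    }
}
```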
Re: After the query component has the results, can I do more filtering on them?
Sorry for being vague. Okay, so these scores exist on an external server, and they change fairly often. The score for each returned user actually depends on the user doing the searching: if I make a request and you make the same request, the scores are different. So in my component I get a bunch of scores from the external server and aggregate them with the scores Solr gave. Here's the flow (all numbers are arbitrary):

1) Get 10,000 results from Solr via the query component.
2) Get a list of scores and ids back from the external server (it'll return a lot of them).
3) Out of these results, take the top 3500 docs after aggregating the external server's scores and netcons scores.

The problem is that the score for each doc is specific to the user making the request, and the algorithm for computing these scores is quite complex. I can't simply re-index with new scores. That's why I've written this component, which runs after the QueryComponent and does the magic of filtering.

I've come up with a solution, but it involved changing a lot of Solr code:

- I've made the queryResultCache public and developed a small API for accessing and changing it.
- I've changed QueryResultKey to include a Long userId in its hashCode and equals methods.

When a search is made, the QueryComponent caches its results. Then my custom component goes into that cache, gets my superset, filters it using the scores from my external server, and puts it back into the cache. Of course, none of this happens if my custom-scored results are already cached, so it's actually decent.

If you have any suggestions or improvements, I'd greatly appreciate them. Sorry for the long response... I didn't want to be an XY problem again :D -- View this message in context: http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3141652.html
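Step 3 of that flow combines each document's Solr score with the per-user external score, then keeps the top K. It can be sketched in plain Java with a bounded min-heap. The linear weighting is a placeholder for the poster's own, more complex, algorithm. The names solrWeight and externalWeight are hypothetical. Documents without an external score count as 0.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class ScoreAggregator {

    /** A document id with its combined score. */
    static final class Scored {
        final String id;
        final double score;

        Scored(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    /**
     * Combines Solr and external scores and returns the best k ids,
     * highest combined score first. Runs in O(n log k).
     */
    static List<String> topK(Map<String, Double> solrScores,
                             Map<String, Double> externalScores,
                             double solrWeight, double externalWeight, int k) {
        // min-heap on score: the root is the weakest of the current top k
        PriorityQueue<Scored> heap =
                new PriorityQueue<>((a, b) -> Double.compare(a.score, b.score));
        for (Map.Entry<String, Double> e : solrScores.entrySet()) {
            double external = externalScores.getOrDefault(e.getKey(), 0.0);
            double combined = solrWeight * e.getValue() + externalWeight * external;
            heap.add(new Scored(e.getKey(), combined));
            if (heap.size() > k) {
                heap.poll(); // drop the current weakest
            }
        }
        List<String> ids = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) {
            ids.add(heap.poll().id); // ascending score
        }
        Collections.reverse(ids);    // descending score
        return ids;
    }
}
```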
Re: Custom Cache cleared after a commit?
Sorry for my ignorance, but can you point me to where in the code I should look for this? Also, I'd still need a way of finding out how long an entry has been in the cache, because I don't want it to regenerate every time. I'd want it to regenerate only if it's been in the cache for less than 6 hours (or some other time frame I deem good). Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-Cache-cleared-after-a-commit-tp3136345p3141673.html
Re: what s the optimum size of SOLR indexes
It depends on how many queries you'll be making per second. For us, I have tiers of index sizes. The first machine, which gets hit most often, has an index of about 2.5 GB. Most queries only ever need to hit this index. Then I have bigger indexes of about 5-10 GB each. These are slower, but they aren't queried as often, so I can afford for them to be a little slower (hence the bigger indexes). -- View this message in context: http://lucene.472066.n3.nabble.com/what-s-the-optimum-size-of-SOLR-indexes-tp3137314p3142309.html
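The tiered layout described above puts a small, frequently hit index in front of larger, slower ones. One plausible way to drive it is a fallback loop that stops as soon as enough hits are collected. The post does not say how its tiers are chosen, so this is only one reading. The Tier interface is hypothetical and stands in for whatever client call queries each index. The sketch also assumes the tiers hold disjoint documents, so it does no de-duplication.

```java
import java.util.ArrayList;
import java.util.List;

final class TieredSearch {

    /** Hypothetical handle on one index tier (e.g. a Solr core or server). */
    interface Tier {
        List<String> search(String query, int rows);
    }

    /**
     * Queries tiers in order (smallest/fastest first) and only falls
     * through to the next, larger tier while fewer than rows hits exist.
     */
    static List<String> search(List<Tier> tiers, String query, int rows) {
        List<String> hits = new ArrayList<>();
        for (Tier tier : tiers) {
            if (hits.size() >= rows) {
                break; // the fast tier was enough; skip the slow ones
            }
            hits.addAll(tier.search(query, rows - hits.size()));
        }
        // a tier may return more than asked for, so trim to rows
        return hits.size() > rows ? new ArrayList<>(hits.subList(0, rows)) : hits;
    }
}
```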
Re: Custom Cache cleared after a commit?
I guess I'll have to use something other than SolrCache to get what I want, then. Or I could use SolrCache and just change the code (I've already done so much of that anyway...). Anyway, thanks for the reply. -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-Cache-cleared-after-a-commit-tp3136345p3136580.html
Custom Cache cleared after a commit?
I know the queryResultCache and the other built-in caches only live until the next commit, but I'm wondering if custom caches behave the same way. I'd actually rather have a custom cache that is never cleared. I want to give its entries a 6-hour TTL (or some similar time frame), but I never want it to be cleared on a commit. Is this possible using SolrCache? -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-Cache-cleared-after-a-commit-tp3136345p3136345.html
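A cache whose entries expire after a fixed TTL, independent of commits, can be sketched as a map from each key to its value plus insertion time. This is a standalone illustration and not SolrCache. It has no tie to searchers, so it survives commits only if it lives outside the searcher, for example in a component's own field. The 6-hour TTL comes from the post. The LongSupplier clock is an added assumption so that expiry can be tested. Checking the entry's age is also how one could decide whether to regenerate it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical commit-independent cache with a fixed time-to-live. */
final class TtlCache<K, V> {

    private static final class Entry<T> {
        final T value;
        final long insertedAtMillis;

        Entry(T value, long insertedAtMillis) {
            this.value = value;
            this.insertedAtMillis = insertedAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    /** e.g. new TtlCache<>(6, TimeUnit.HOURS, System::currentTimeMillis) */
    TtlCache(long ttl, TimeUnit unit, LongSupplier clock) {
        this.ttlMillis = unit.toMillis(ttl);
        this.clock = clock;
    }

    void put(K key, V value) {
        map.put(key, new Entry<>(value, clock.getAsLong()));
    }

    /** Returns the value, or null if absent or at least ttl old. */
    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        if (clock.getAsLong() - e.insertedAtMillis >= ttlMillis) {
            map.remove(key, e); // expired: drop it lazily
            return null;
        }
        return e.value;
    }
}
```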
QueryResultCache question
So it seems the entries in the queryResultCache have no TTL, and I'm curious how it works if I reindex something with new info. I'm going to be reindexing things often (I'd sort by last login, and this changes fast). I've been stepping through the code. Of course, if the same query comes in, it simply gets the results for that key from the result cache. But if I make the same query over and over again, when will I ever get different results? I'm a little confused as to how the 'correct' results are shown if it just uses the QueryResultKey to get the results from the cache. I imagine a new Searcher with a fresh cache is created after every reindex, or something like that? If I'm reindexing very often, how useful is the QueryResultCache? -- View this message in context: http://lucene.472066.n3.nabble.com/QueryResultCache-question-tp3130135p3130135.html
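Roughly, this is how Solr handles it. Each searcher owns its own caches. A commit opens a new searcher with new, optionally autowarmed, caches. An old cached result is therefore never served against the new index. The toy model below mimics that with a searcher that sees one frozen snapshot and owns a per-query cache. The class names and the list-of-strings "index" are hypothetical simplifications, not Solr's SolrIndexSearcher API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model: a searcher sees one index snapshot and owns its cache. */
final class ToySearcher {
    private final List<String> snapshot;   // docs visible to this searcher
    private final Map<String, List<String>> resultCache = new HashMap<>();
    private int cacheMisses = 0;

    ToySearcher(List<String> docs) {
        this.snapshot = Collections.unmodifiableList(new ArrayList<>(docs));
    }

    /** Returns docs containing the query text, caching per query string. */
    List<String> search(String query) {
        List<String> cached = resultCache.get(query);
        if (cached != null) {
            return cached;
        }
        cacheMisses++;
        List<String> hits = new ArrayList<>();
        for (String doc : snapshot) {
            if (doc.contains(query)) {
                hits.add(doc);
            }
        }
        List<String> result = Collections.unmodifiableList(hits);
        resultCache.put(query, result);
        return result;
    }

    int misses() {
        return cacheMisses;
    }

    /** A "commit": a new searcher over the new docs, with an empty cache. */
    static ToySearcher commit(List<String> newDocs) {
        return new ToySearcher(newDocs);
    }
}
```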
Re: QueryResultCache question
Thanks for the quick reply! I see there's no way to access the result cache. I actually want to access it from a new component I have that runs after the query component, but it seems this is impossible. I guess I'm just going to change the code to make it public or something, as I need the result cache. -- View this message in context: http://lucene.472066.n3.nabble.com/QueryResultCache-question-tp3130135p3130603.html
Re: After the query component has the results, can I do more filtering on them?
Unfortunately, userIdsToScore updates very often. I'd get more ids with almost every single query, which is why I made the new component. But I see the problem of not being able to score the whole result set, and now that I think about it, I'd actually need to do that. I want to get a whole whack of users (let's say 10,000), score them using my system, and then 'remember' the top 3500 of them in the result cache or something. How would I go about operating on the whole result set rather than just the 'rows' I set? I wonder if I can do this:

- set rows to be really large;
- score them in the component and remember all of these results in the result cache;
- then dynamically change rows in my component so that not all 3500 (or whatever number I choose) are returned.

-- View this message in context: http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3127560.html
Re: After the query component has the results, can I do more filtering on them?
Sorry for the double post, but in this case, is it possible for me to access the queryResultCache in my component and play with it? Ideally, what I want is this:

1) I have some large number of total results (the exact number is arbitrary).
2) In my component, I access all of these results, score them, take the top 3500 (a random smaller number), and drop the rest.
3) The 3500 I have now go into the queryResultCache, essentially replacing the original entry.
4) The number returned to the user is then rows, and subsequent identical queries just get their results from my new cache entry.

I'm pretty new to all of this, so I'm hoping someone can help. -- View this message in context: http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3127581.html
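Steps 3 and 4 can be sketched outside Solr's internals. The sketch is a cache keyed on the query plus the requesting user (the same idea as adding a userId to QueryResultKey). It holds the re-ranked id list, and each request returns only its start/rows slice of that list. PerUserResultCache and its key are hypothetical. A real version would also need eviction, and it would have to be discarded or regenerated on commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class PerUserResultCache {

    /** Cache key: the query string plus the user making the request. */
    static final class Key {
        final String query;
        final long userId;

        Key(String query, long userId) {
            this.query = query;
            this.userId = userId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return userId == other.userId && query.equals(other.query);
        }

        @Override
        public int hashCode() {
            return Objects.hash(query, userId);
        }
    }

    private final Map<Key, List<String>> ranked = new ConcurrentHashMap<>();

    /**
     * Returns ids[start, start+rows) from the cached re-ranked list. The
     * (expensive) rerank, e.g. scoring and keeping the top 3500, runs only
     * on a cache miss and must not return null.
     */
    List<String> page(String query, long userId, int start, int rows,
                      Supplier<List<String>> rerank) {
        List<String> ids = ranked.computeIfAbsent(
                new Key(query, userId), k -> rerank.get());
        int from = Math.min(Math.max(start, 0), ids.size());
        int to = (int) Math.min((long) from + Math.max(rows, 0), ids.size());
        return new ArrayList<>(ids.subList(from, to));
    }
}
```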
Re: After the query component has the results, can I do more filtering on them?
bump -- View this message in context: http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3123502.html
After the query component has the results, can I do more filtering on them?
So I made a custom search component that runs right after the query component. This custom component updates the score of each document based on some external criteria (and no, I definitely can't use the existing components). I didn't see any easy way to just update the score, so what I currently do is something like this:

    DocList docList = rb.getResults().docList;
    float[] scores = new float[docList.size()];
    int[] docs = new int[docList.size()];
    int docCounter = 0;
    float maxScore = 0f;
    // get the iterator once; calling docList.iterator() in the loop test restarts it every time
    DocIterator it = docList.iterator();
    while (it.hasNext()) {
        // nextDoc() returns Lucene's internal doc id, so userIdsToScore must be
        // keyed by that id (or the id must first be mapped to a user id)
        int docId = it.nextDoc();
        Float newScore = userIdsToScore.get(docId);
        float score = (newScore != null) ? newScore : 0f;
        scores[docCounter] = score;
        docs[docCounter] = docId;
        docCounter++;
        if (score > maxScore) {
            maxScore = score;
        }
    }
    // keep the original total hit count and store the new list back on the response
    rb.getResults().docList =
        new DocSlice(0, docCounter, docs, scores, docList.matches(), maxScore);

My userIdsToScore hashtable is how I determine the new score. There are a few other things I'm doing, but this is the gist. I'm also not sure how to go about sorting this... Basically, my question is: is this how I should be updating the scores of the documents? -- View this message in context: http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3114775.html
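On the sorting question, the parallel docs and scores arrays can be put in descending score order before the DocSlice is built. The sketch below is plain Java with no Solr classes, and ScoreSort is a hypothetical name. It sorts a permutation of indexes, so each doc stays paired with its score.

```java
import java.util.Arrays;
import java.util.Comparator;

final class ScoreSort {

    /**
     * Sorts the first n entries of docs/scores together, highest score
     * first. Ties keep their original relative order.
     */
    static void sortByScoreDesc(int[] docs, float[] scores, int n) {
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Arrays.sort on an object array is stable, so equal scores keep their order
        Arrays.sort(order,
                Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        int[] sortedDocs = new int[n];
        float[] sortedScores = new float[n];
        for (int i = 0; i < n; i++) {
            sortedDocs[i] = docs[order[i]];
            sortedScores[i] = scores[order[i]];
        }
        System.arraycopy(sortedDocs, 0, docs, 0, n);
        System.arraycopy(sortedScores, 0, scores, 0, n);
    }
}
```

In the component, this would be called as sortByScoreDesc(docs, scores, docCounter) just before the new DocSlice is created.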
Re: Caching queries.
Thanks, this is exactly what I'm looking for! -- View this message in context: http://lucene.472066.n3.nabble.com/Caching-queries-tp3078271p3087497.html
Caching queries.
I'm wondering if something like this is possible. Let's say I query 5000 objects, all pertaining to a specific search, and I want to return the top 100 or so and cache the rest on my Solr server. The next time I get the same query with a new offset (let's say starting from 101), does it have to run the query again, or can it go to the cache and get the next 100? -- View this message in context: http://lucene.472066.n3.nabble.com/Caching-queries-tp3078271p3078271.html
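Solr has a solrconfig.xml setting aimed at this case: queryResultWindowSize. When a query misses the cache, Solr collects a larger window of ids than requested and caches that window. The next page of the same query, with the same sort and filters, can then come from the queryResultCache. The arithmetic is roughly as below. This is a simplified model for illustration only. The real code also caps the window with queryResultMaxDocsCached. The method names are hypothetical.

```java
/** Simplified model of window-rounded result caching (not Solr's code). */
final class ResultWindow {

    /**
     * Number of ids collected and cached for a request: start+rows rounded
     * up to a whole multiple of windowSize, counted from the top result.
     */
    static int cachedDocs(int start, int rows, int windowSize) {
        int needed = start + rows;
        return ((needed + windowSize - 1) / windowSize) * windowSize;
    }

    /** A later page is served from cache if it fits inside what was cached. */
    static boolean servedFromCache(int cachedDocs, int start, int rows) {
        return start + rows <= cachedDocs;
    }

    public static void main(String[] args) {
        int cached = cachedDocs(0, 100, 5000);        // first page, start=0
        System.out.println(cached);                    // prints 5000
        // Solr's start is 0-based, so the second page of 100 is start=100
        System.out.println(servedFromCache(cached, 100, 100)); // prints true
    }
}
```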
Is there a way to get all the hits and score them later?
Basically, I don't want the hits and the scores at the same time. I want to get a list of hits but score them myself externally (there is a dedicated server that will do the scoring given a list of IDs). Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3016424.html
Re: Is there a way to get all the hits and score them later?
To clarify: I want to do this all underneath Solr. I don't want to get a bunch of hits from Solr in my app and then go to my server and score them again. I'd like to score them myself underneath Solr before I return the results to my app. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3016592.html
Re: Is there a way to get all the hits and score them later?
Actually, I was thinking I wanted to do something before the sharding (for example, in the layer where faceting happens). I want to hack a plugin into the middle that goes to my server once I have a bunch of hits. I'm just not sure where to do this... I've decided, though, that I can do scoring in Solr (a preliminary scoring to narrow down the results) and then, in the middle, send those hits to my server for additional scoring. I think I can't bolt it on at the end, because by then the sharding has already happened. I'm just not sure where to look right now. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3017401.html
Re: Is there a way to get all the hits and score them later?
Hmm, looks like I can inherit from the Similarity class and do my own thing there. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-get-all-the-hits-and-score-them-later-tp3016424p3018001.html
Re: Custom Scoring relying on another server.
bump -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-Scoring-relying-on-another-server-tp2994546p3006873.html
Custom Scoring relying on another server.
I know this question has been asked before, but I think my situation is a little different. Basically, I need custom scores that the traditional function queries simply won't allow. I actually need to call another server from Java, passing in a bunch of parameters that mostly describe how to score the results. So I want to extend the current scorer and add what I need for the scoring: make a trip to the scoring server with a bunch of parameters, and come back with the scores. Can someone point me in the right direction? Exactly where does document scoring happen in Solr? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-Scoring-relying-on-another-server-tp2994546p2994546.html
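Whichever hook is used, the expensive part is the round trip to the scoring server. Making one call per document inside a Scorer would therefore be very slow. The batched alternative collects the candidate ids first, then scores them in a few large requests. ExternalScorer is a hypothetical interface standing in for the poster's server client, and batchSize is an arbitrary tuning knob.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BatchedScoring {

    /** Hypothetical client for the external scoring server. */
    interface ExternalScorer {
        Map<String, Float> score(List<String> ids, Map<String, String> params);
    }

    /** Scores all ids using ceil(ids.size() / batchSize) remote calls. */
    static Map<String, Float> scoreAll(List<String> ids, Map<String, String> params,
                                       ExternalScorer scorer, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, Float> scores = new HashMap<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            scores.putAll(scorer.score(ids.subList(from, to), params));
        }
        return scores;
    }
}
```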
Re: Field collapsing on multiple fields and/or ranges?
bump -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958029.html
Re: Field collapsing on multiple fields and/or ranges?
Thanks for the reply! How exactly do I open an issue? -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958277.html
Re: Field collapsing on multiple fields and/or ranges?
https://issues.apache.org/jira/browse/SOLR-2526 modules/grouping was not a valid component so I just put it in search. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958408.html
Re: Field collapsing on multiple fields and/or ranges?
Ah, my mistake. Thanks a lot, this would be a really cool feature :) For now I'm resorting to making more than one query and cross-referencing the two separate result sets. -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2959439.html
Field collapsing on multiple fields and/or ranges?
I'm wondering if there is a way to get field collapsing to collapse on multiple things. For example, can it collapse on a field (let's say 'domain') but ALSO on something else (maybe time)? To visualize, something like this:

Group1 has common field 'www.forum1.com' and ALSO the posts are all from May 11
Group2 has common field 'www.forum2.com' and ALSO the posts are all from May 11
. . .
GroupX has common field 'www.forum1.com' and ALSO the posts are from May 12

So it's still sorted by date. But it won't put 'www.forum1.com' documents in the same group if they are from different dates. It'll group by common date AND common domain field. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2929793.html
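One workaround for grouping on two things at once requires access to the indexing code. At index time, add an extra field that already combines both values, for example the domain plus the post's day, and set group.field to that field. The field name domain_day and the key format below are hypothetical. The sketch shows how such a key could be derived, and that it separates the same domain on different days. Groups are kept in first-seen order.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CompositeGroupKey {

    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    /** Value for a hypothetical domain_day field, e.g. "www.forum1.com|2011-05-11". */
    static String domainDay(String domain, Instant postedAt) {
        return domain + "|" + DAY.format(postedAt);
    }

    /** Groups document positions by their combined key, in first-seen order. */
    static Map<String, List<Integer>> group(List<String> domains, List<Instant> times) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < domains.size(); i++) {
            groups.computeIfAbsent(domainDay(domains.get(i), times.get(i)),
                    k -> new ArrayList<>()).add(i);
        }
        return groups;
    }
}
```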
RE: SolrQuery API for adding group filter
I'm actually using PHP, but I get what you're saying. I think I understand what I need to do. Thanks a lot, man! -- View this message in context: http://lucene.472066.n3.nabble.com/SolrQuery-API-for-adding-group-filter-tp2921539p2923701.html
RE: SolrQuery API for adding group filter
I actually have another question unrelated to this (but related to grouping). I'm wondering if I can do a more complex grouping, such as grouping by a field while also requiring the items to match some other criterion (such as date). For example, it might currently group 5 items by some field, but the 5th item is from a much older date, and I don't want it grouped with the more recent items. Basically, I want it to look like this:

Group1 all has common field 'x' and ALSO its items are from today
Group2 all has common field 'x' again, but now its items are from yesterday, etc...

I'm having trouble figuring out how that would work; any help would be appreciated! -- View this message in context: http://lucene.472066.n3.nabble.com/SolrQuery-API-for-adding-group-filter-tp2921539p2924232.html
SolrQuery API for adding group filter
There doesn't seem to be an API for adding grouping (like group.field or group=true). I'm very new to this, so I'm wondering how I'd go about adding a group query, much like how I use 'addFilterQuery' to add an fq. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrQuery-API-for-adding-group-filter-tp2921539p2921539.html
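Grouping is driven by plain request parameters (group=true, group.field=...). Any client that can add arbitrary parameters can therefore request it. In SolrJ, SolrQuery inherits a generic set(name, value) from ModifiableSolrParams, which should cover this. The sketch below instead builds the raw query string, which is also what a PHP client ends up sending. The field names domain and type:post are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class GroupParams {

    /** Joins name=value pairs into a URL-encoded query string. */
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return joiner.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("fq", "type:post");        // the same thing addFilterQuery adds
        params.put("group", "true");
        params.put("group.field", "domain");
        System.out.println("/select?" + toQueryString(params));
    }
}
```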