RE: Facet sorting seems weird
This is indeed an interesting idea, but I think it's a bit too manual for our use case. I do see that it would solve the problem though, so thank you for sharing it with the community! :) -Henrik

-----Original Message-----
From: James Thomas [mailto:jtho...@camstar.com]
Sent: 15. juli 2013 17:08
To: solr-user@lucene.apache.org
Subject: RE: Facet sorting seems weird

Hi Henrik,

We did something related to this that I'll share. I'm rather new to Solr, so take this idea cautiously :-)

Our requirement was to show exact values but have case-insensitive sorting and facet filtering (prefix filtering). We created an index field (type=string) for creating facets, so that the values are indexed as-is. The values we indexed were given the format "lowercase value|exact value". So for example, given the value "bObles", we would index the string "bobles|bObles". When displaying the facet we split the facet value from Solr in half and display the second half to the user. Of course the caveat is that you could have two facets that differ only in case, but to me that's a data cleansing issue.

James

-----Original Message-----
From: Henrik Ossipoff Hansen [mailto:h...@entertainment-trading.com]
Sent: Monday, July 15, 2013 10:57 AM
To: solr-user@lucene.apache.org
Subject: RE: Facet sorting seems weird

Hello, thank you for the quick reply! But given that facet.sort=index just sorts by the indexed facet values (and I don't want the facet itself to be in lower case), would that really work?

Regards,
Henrik Ossipoff

-----Original Message-----
From: David Quarterman [mailto:da...@corexe.com]
Sent: 15. juli 2013 16:46
To: solr-user@lucene.apache.org
Subject: RE: Facet sorting seems weird

Hi Henrik,

Try setting up a copyField in your schema and set the copied field to use something like 'text_ws', which includes a LowerCaseFilterFactory. Then sort on the copyField.

Regards,
DQ

-----Original Message-----
From: Henrik Ossipoff Hansen [mailto:h...@entertainment-trading.com]
Sent: 15 July 2013 15:08
To: solr-user@lucene.apache.org
Subject: Facet sorting seems weird

Hello, first time writing to the list. I am a developer for a company where we recently switched all of our search cores from Sphinx to Solr, with very good results. In general we've been very happy with the switch, and everything seems to work just as we want it to.

Today however we've run into a bit of an issue regarding faceted sorting. For example, we have a field called "brand" in our core, defined as the text_en datatype from the example Solr core. This field is copied into "facet_brand" with the datatype string (since we don't really need to do much with it except show it for faceted navigation).

Now, given these two entries in the field on different documents, "LEGO" and "bObles", and given facet.sort=index, it appears that LEGO sorts before bObles. I assume this is because of the casing difference. My question then is: how do we define a decent datatype in our schema where the casing is kept exact, but we are able to sort without casing mattering?

Thank you :)

Best regards,
Henrik Ossipoff
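James's "lowercase|exact" trick can be sketched in a few lines. This is an illustrative Python sketch, not part of the original exchange; the helper names are made up:

```python
# Sketch of the "lowercase|exact" facet encoding described above.
# Index-time: prefix each exact value with its lowercase form so that
# facet.sort=index orders values case-insensitively.

def encode_facet_value(exact):
    """Index-time: 'bObles' -> 'bobles|bObles'."""
    return exact.lower() + "|" + exact

def decode_facet_value(indexed):
    """Display-time: split on the first '|' and show the exact half."""
    return indexed.split("|", 1)[1]

brands = ["LEGO", "bObles"]
indexed = sorted(encode_facet_value(b) for b in brands)
display = [decode_facet_value(v) for v in indexed]
# The lowercase prefix drives the ordering, so "bObles" now sorts
# before "LEGO" while the displayed value keeps its exact casing.
```

The caveat James mentions still applies: two values differing only in case collapse to the same sort key.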
RE: Facet sorting seems weird
Hi Alex,

Yes, this makes sense. My Java is a bit dusty, but depending on how much we end up needing this feature, it's definitely something we will look into creating, and if successful we will definitely submit a patch. Thank you for your time and the detailed answer!

Best regards,
Henrik Ossipoff

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: 15. juli 2013 17:16
To: solr-user@lucene.apache.org
Subject: Re: Facet sorting seems weird

Hi Henrik,

If I understand the question correctly (case-insensitive sorting of the facet values), then this is a limitation of the current facet component. You can see the full implementation at:
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java#L818

If you are comfortable with Java code, the easiest thing might be to copy/fix the component and use your own one for faceting. The components are defined in solrconfig.xml, and FacetComponent is in the default chain. See:
https://github.com/apache/lucene-solr/blob/trunk/solr/example/solr/collection1/conf/solrconfig.xml#L1194

If you do manage to do this (I would recommend doing it as an extra option), it would be nice to have it contributed back to Solr. I think you are not the only one with this requirement.

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Mon, Jul 15, 2013 at 10:08 AM, Henrik Ossipoff Hansen h...@entertainment-trading.com wrote:

Hello, first time writing to the list. I am a developer for a company where we recently switched all of our search cores from Sphinx to Solr, with very good results. In general we've been very happy with the switch, and everything seems to work just as we want it to.

Today however we've run into a bit of an issue regarding faceted sorting. For example, we have a field called "brand" in our core, defined as the text_en datatype from the example Solr core. This field is copied into "facet_brand" with the datatype string (since we don't really need to do much with it except show it for faceted navigation).

Now, given these two entries in the field on different documents, "LEGO" and "bObles", and given facet.sort=index, it appears that LEGO sorts before bObles. I assume this is because of the casing difference. My question then is: how do we define a decent datatype in our schema where the casing is kept exact, but we are able to sort without casing mattering?

Thank you :)

Best regards,
Henrik Ossipoff
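The fix Alex describes (a case-insensitive comparator inside the facet component) boils down to changing the sort key. A minimal Python sketch of the difference, purely illustrative:

```python
# Byte-order sorting (what facet.sort=index does on a string field)
# versus the case-insensitive sort a patched component would apply.

facets = ["LEGO", "bObles"]

byte_order = sorted(facets)                   # uppercase letters sort first
case_insensitive = sorted(facets, key=str.lower)

# byte_order       -> ["LEGO", "bObles"]
# case_insensitive -> ["bObles", "LEGO"]
```

In the Java component the equivalent change would be comparing lowercased term values rather than raw index order.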
Re: Clearing old nodes from zookeper without restarting solrcloud cluster
Hi,

You should use the CoreAdmin API (or the Solr admin page) and UNLOAD the unneeded cores. This will unregister them from ZooKeeper (the cluster state will be updated), so they won't be used for querying any longer. A SolrCloud restart is not needed in this case.

Regards.

On 16 July 2013 06:18, Ali, Saqib docbook@gmail.com wrote:

Hello Luis, I don't think that is possible. If you delete clusterstate.json from ZooKeeper, you will need to restart the nodes.. I could be very wrong about this.

Saqib

On Mon, Jul 15, 2013 at 8:50 PM, Luis Carlos Guerrero Covo lcguerreroc...@gmail.com wrote:

I know that you can clear ZooKeeper's data directory using the CLI with the clear command; I just want to know if it's possible to update the cluster's state without wiping everything out. Anyone have any ideas/suggestions?

On Mon, Jul 15, 2013 at 11:21 AM, Luis Carlos Guerrero Covo lcguerreroc...@gmail.com wrote:

Hi, is there an easy way to clear ZooKeeper of all offline Solr nodes without restarting the cluster? We are having some stability issues and we think it may be due to the leader querying old offline nodes.

thank you,
Luis Guerrero

--
Luis Carlos Guerrero Covo
M.S. Computer Engineering
(57) 3183542047
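For reference, the UNLOAD call is a plain HTTP request to the CoreAdmin handler. A sketch that just builds the URL (host and core name are placeholders, not from the thread):

```python
# Build a CoreAdmin UNLOAD request URL; the host and core name below
# are illustrative placeholders.
from urllib.parse import urlencode

def unload_core_url(host, core):
    params = urlencode({"action": "UNLOAD", "core": core, "wt": "json"})
    return "http://%s/solr/admin/cores?%s" % (host, params)

url = unload_core_url("localhost:8983", "collection1_shard2_replica1")
# Issuing a GET to this URL unloads the core; with SolrCloud the core is
# also unregistered from ZooKeeper, as described above.
```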
select in clause in solr
I am using Solr 4.3 and have 2 collections, coll1 and coll2. After searching in coll1 I get field1 values, which is a comma-separated list of strings like: val1, val2, val3, ... valN.

How can I use that list to match field2 in coll2 against those values, separated by an OR clause? So I want to return all documents in coll2 with field2=val1 or field2=val2 or field2=val3 ... or field2=valN.

In short, I'm looking for a "select ... in" type clause in Solr. Any pointers will be much appreciated.

-Manasi

--
View this message in context: http://lucene.472066.n3.nabble.com/select-in-clause-in-solr-tp4078255.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: select in clause in solr
Hello Manasi,

Have a look at Solr pseudo-joins: http://wiki.apache.org/solr/Join

Regards

On Jul 16, 2013 9:54 AM, smanad sma...@gmail.com wrote:

I am using Solr 4.3 and have 2 collections, coll1 and coll2. After searching in coll1 I get field1 values, which is a comma-separated list of strings like: val1, val2, val3, ... valN. How can I use that list to match field2 in coll2 against those values, separated by an OR clause? So I want to return all documents in coll2 with field2=val1 or field2=val2 or field2=val3 ... or field2=valN. In short, I'm looking for a "select ... in" type clause in Solr. Any pointers will be much appreciated.

-Manasi

--
View this message in context: http://lucene.472066.n3.nabble.com/select-in-clause-in-solr-tp4078255.html
Sent from the Solr - User mailing list archive at Nabble.com.
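There are two common ways to express this "IN" query; a sketch of both as query-string builders (the field and collection names come from the thread; the escaping is minimal and assumes plain token values):

```python
# Two ways to express field2 IN (val1, val2, ...) against Solr.

def in_clause(field, values):
    """Boolean-OR form: field2:(val1 OR val2 OR val3)."""
    return "%s:(%s)" % (field, " OR ".join(values))

def join_query(from_field, to_field, from_index, inner_q):
    """Join form: {!join from=... to=... fromIndex=...}query
    runs inner_q against from_index and matches to_field in the
    collection being queried."""
    return "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_index, inner_q)

vals = "val1, val2, val3".split(", ")
q1 = in_clause("field2", vals)
q2 = join_query("field1", "field2", "coll1", "*:*")
```

The OR form needs the client to fetch the values first; the join form lets Solr do it in one request, with the caveats on the Join wiki page (e.g. scores from the "from" side are not carried over).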
Range query on a substring.
Hi,

I have a problem (and wonder if it is possible to solve at all) with the following query. There are documents with a field which contains a text and a number in brackets, e.g.:

myfield: this is a text (number)

There might be some other documents with the same text but a different number in brackets. I'd like to find documents with the given text, say "this is a text", and a number between A and B. Is it possible in Solr? Any ideas?

Kind regards.
Re: Range query on a substring.
IMHO the number(s) should be extracted and stored in separate fields in Solr at indexing time.

-- Oleg

On Tue, Jul 16, 2013 at 10:12 AM, Marcin Rzewucki mrzewu...@gmail.com wrote:

Hi, I have a problem (and wonder if it is possible to solve at all) with the following query. There are documents with a field which contains a text and a number in brackets, e.g. myfield: this is a text (number). There might be some other documents with the same text but a different number in brackets. I'd like to find documents with the given text, say "this is a text", and a number between A and B. Is it possible in Solr? Any ideas?

Kind regards.
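The index-time extraction Oleg suggests can be sketched with a regular expression; this is an illustrative helper, not an existing Solr feature (in practice it would run in the indexing pipeline or an update processor):

```python
# Pull the trailing "(number)" out of values like "this is a text (42)"
# so the number can be indexed in a separate numeric field and queried
# with an ordinary range query.
import re

PATTERN = re.compile(r"^(.*?)\s*\((\d+)\)\s*$")

def split_text_and_number(value):
    m = PATTERN.match(value)
    if not m:
        return value, None          # no trailing number: keep value as-is
    return m.group(1), int(m.group(2))

text, number = split_text_and_number("this is a text (42)")
# -> ("this is a text", 42); a range query then becomes number:[A TO B]
```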
Re: Range query on a substring.
Hi Oleg,

It's a multivalued field, and it won't be easier to query if I split this field into text and numbers; I may get wrong results.

Regards.

On 16 July 2013 09:35, Oleg Burlaca oburl...@gmail.com wrote:

IMHO the number(s) should be extracted and stored in separate fields in Solr at indexing time.

-- Oleg

On Tue, Jul 16, 2013 at 10:12 AM, Marcin Rzewucki mrzewu...@gmail.com wrote:

Hi, I have a problem (and wonder if it is possible to solve at all) with the following query. There are documents with a field which contains a text and a number in brackets, e.g. myfield: this is a text (number). There might be some other documents with the same text but a different number in brackets. I'd like to find documents with the given text, say "this is a text", and a number between A and B. Is it possible in Solr? Any ideas?

Kind regards.
Re: Book contest idea - feedback requested
Alex,

I am a beginner and I find it a really good idea. A new forum dedicated to understanding the features, rather than the missing ones, would allow newcomers to post questions without cluttering the solr-user list, where people are already expert practitioners and prefer to see more targeted topics. Let us know about the follow-up.

Andrea

On Mon, Jul 15, 2013 at 8:11 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

Hello,

Packt Publishing has kindly agreed to let me run a contest with e-copies of my book as prizes: http://www.packtpub.com/apache-solr-for-indexing-data/book

Since my book is about learning Solr and targeted at beginners and early intermediates, here is what I would like to do. I am asking for feedback on whether people on the mailing list like the idea or have specific objections to it.

1) The basic idea is to get Solr users to write and vote on what they find hard with Solr, especially in understanding the features (as contrasted with just missing ones).
2) I'll probably set it up as a User Voice forum, which has all the mechanisms for suggesting and voting on ideas, with an easier interface than JIRA.
3) The top N voted ideas will get the books as prizes, and I will try to fix/document/create JIRAs for those issues.
4) I am hoping to specifically reach out to the communities where Solr is a component and where people don't necessarily hang out on our mailing list. I am thinking SolrNet, Drupal, project Blacklight, Cloudera, CrafterCMS, SiteCore, Typo3, SunSpot, Nutch. Obviously, anybody and everybody from this list would be absolutely welcome to participate as well.

Yes? No? Suggestions?

Also, if you are a maintainer of one of the products/services/libraries that has Solr in it and want to reach out to your community yourself, I think it would be a lot better than if I did it. Contact me directly and I will let you know what template/FAQ I want you to include in the announcement message when it is ready.

Thank you all in advance for the comments and suggestions.

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: Range query on a substring.
Ah, you mean something like this:

record:
Id=10, text = this is a text N1 (X), another text N2 (Y), text N3 (Z)
Id=11, text = this is a text N1 (W), another text N2 (Q), third text (M)

and you need to search for: "text N1" and A < X < B?

How big is the core? The first thing that comes to my mind, again at indexing level, is to split the text into pieces and index it in Solr like this:

record_id | text    | value
10        | text N1 | X
10        | text N2 | Y
10        | text N3 | Z

Does it help?

On Tue, Jul 16, 2013 at 10:51 AM, Marcin Rzewucki mrzewu...@gmail.com wrote:

Hi Oleg, it's a multivalued field, and it won't be easier to query if I split this field into text and numbers; I may get wrong results. Regards.

On 16 July 2013 09:35, Oleg Burlaca oburl...@gmail.com wrote:

IMHO the number(s) should be extracted and stored in separate fields in Solr at indexing time. -- Oleg

On Tue, Jul 16, 2013 at 10:12 AM, Marcin Rzewucki mrzewu...@gmail.com wrote:

Hi, I have a problem (and wonder if it is possible to solve at all) with the following query. There are documents with a field which contains a text and a number in brackets, e.g. myfield: this is a text (number). There might be some other documents with the same text but a different number in brackets. I'd like to find documents with the given text, say "this is a text", and a number between A and B. Is it possible in Solr? Any ideas?

Kind regards.
AW: About Suggestions
Hi Eric and everybody else!

Thanks for trying to help. Here is the example:

.../terms?terms.regex.flag=case_insensitive&terms.fl=suggest&terms=true&terms.limit=20&terms.sort=index&terms.prefix=1n1187

returns

<int name="1n1187">1</int>
<int name="1n1187a">1</int>
<int name="1n1187r">1</int>
<int name="1n1187ra">1</int>

This list contains 3 complete part numbers, but the third item (1n1187r) is not a complete part number. Is there a way to make the terms component tell whether a term represents a complete value? (My guess is that this information gets lost after the n-gramming, but I'm still hoping something can be done.)

More config details:

<field name="suggest" type="text_parts" indexed="true" stored="true" required="false" multiValued="true"/>

and

<fieldType name="text_parts" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Thanks,
Alexander

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, 13. July 2013 19:58
To: solr-user@lucene.apache.org
Subject: Re: About Suggestions

Not quite sure what you mean here; a couple of examples would help. But since the term is using the keyword tokenizer, each thing you get back is a complete term by definition. So I'm not quite sure what you're asking.

Best,
Erick

On Fri, Jul 12, 2013 at 4:48 AM, Lochschmied, Alexander alexander.lochschm...@vishay.com wrote:

Hi Solr people!

We need to suggest part numbers in alphabetical order, adding up to four characters to the already entered part number prefix. That works quite well with the terms component acting on a multivalued field with a keyword tokenizer and an edge n-gram filter. I am mentioning part numbers to indicate that each item in the multivalued field is a string without whitespace, where special characters like dashes cannot be seen as separators.

Is there a way to know if the term (the suggestion) represents such a complete part number (without doing another query for each suggestion)? Since we are using SolrJ, what we would need is something like:

boolean Term.isRepresentingCompleteFieldValue()

Thanks,
Alexander
Re: Range query on a substring.
By multivalued I meant an array of values. For example:

<arr name="myfield">
  <str>text1 (X)</str>
  <str>text2 (Y)</str>
</arr>

I'd like to avoid splitting it as you propose. I have a 2.3 million document collection with pretty large records (a few hundred fields and more per record). Duplicating them would impact performance.

Regards.

On 16 July 2013 10:26, Oleg Burlaca oburl...@gmail.com wrote:

Ah, you mean something like this: record: Id=10, text = this is a text N1 (X), another text N2 (Y), text N3 (Z); Id=11, text = this is a text N1 (W), another text N2 (Q), third text (M), and you need to search for "text N1" and A < X < B? How big is the core? The first thing that comes to my mind, again at indexing level, is to split the text into pieces and index it in Solr like this: record_id | text | value; 10 | text N1 | X; 10 | text N2 | Y; 10 | text N3 | Z. Does it help?

On Tue, Jul 16, 2013 at 10:51 AM, Marcin Rzewucki mrzewu...@gmail.com wrote:

Hi Oleg, it's a multivalued field, and it won't be easier to query if I split this field into text and numbers; I may get wrong results. Regards.

On 16 July 2013 09:35, Oleg Burlaca oburl...@gmail.com wrote:

IMHO the number(s) should be extracted and stored in separate fields in Solr at indexing time. -- Oleg

On Tue, Jul 16, 2013 at 10:12 AM, Marcin Rzewucki mrzewu...@gmail.com wrote:

Hi, I have a problem (and wonder if it is possible to solve at all) with the following query. There are documents with a field which contains a text and a number in brackets, e.g. myfield: this is a text (number). There might be some other documents with the same text but a different number in brackets. I'd like to find documents with the given text, say "this is a text", and a number between A and B. Is it possible in Solr? Any ideas?

Kind regards.
Re: How to change extracted directory
As I said, if I change it in context.xml it works... but the question is how to do it from the command line, without modifying config files.

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-change-extracted-directory-tp4078024p4078284.html
Sent from the Solr - User mailing list archive at Nabble.com.
[solr 3.4.1] collections: meaning and necessity
Hello list,

Following the answer by Jayendra here: http://stackoverflow.com/questions/14516279/how-to-add-collections-to-solr-core
Re: [solr 3.4.1] collections: meaning and necessity
Sorry, hit send too fast.. picking up: from the answer by Jayendra at the link, collections and cores are the same thing. The same is suggested by the config:

<cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:15000}">
  <core name="collection1" instanceDir="."/>
</cores>

We basically define cores. We have a plain {frontend_solr, shards} setup with Solr 3.4 and were thinking of starting off with it initially in Solr 4. In Solr 4: can one get by without using collections (= cores)? We also don't plan on using SolrCloud at the moment. So from our standpoint the Solr 4 configuration looks more complicated than that of Solr 3.4. Are there any benefits of such a setup for non-SolrCloud users?

Thanks,
Dmitry

On Tue, Jul 16, 2013 at 2:24 PM, Dmitry Kan solrexp...@gmail.com wrote:

Hello list,

Following the answer by Jayendra here: http://stackoverflow.com/questions/14516279/how-to-add-collections-to-solr-core
Re: Apache Solr 4 - after 1st commit the index does not grow
First, when switching subjects please start a new thread. It gets confusing to have multiple topics in one thread; it's called thread hijacking.

Second, I have no clue why your Nutch output contains invalid characters. It sounds like either (1) your custom plugin is doing something weird, or (2) something you could configure in Nutch. So I'd recommend asking on the Nutch board.

Best,
Erick

On Mon, Jul 15, 2013 at 11:40 AM, glumet jan.bouch...@gmail.com wrote:

As I can see, this is the same problem as one from older posts - http://lucene.472066.n3.nabble.com/strange-utf-8-problem-td3094473.html ...but it went without any response.

--
View this message in context: http://lucene.472066.n3.nabble.com/Apache-Solr-4-after-1st-commit-the-index-does-not-grow-tp4077913p4078079.html
Sent from the Solr - User mailing list archive at Nabble.com.
Live reload
I used the reload command to apply changes in synonyms.txt, for example, but with the new mechanism https://wiki.apache.org/solr/CoreAdmin#LiveReload this will not work anymore. Is there another way to reload config files, short of restarting Solr?

--
View this message in context: http://lucene.472066.n3.nabble.com/Live-reload-tp4078318.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: ACL implementation: Pseudo-join performance Atomic Updates
Roman:

Did this ever make it into a JIRA? Somehow I missed it if it did, and this would be pretty cool.

Erick

On Mon, Jul 15, 2013 at 6:52 PM, Roman Chyla roman.ch...@gmail.com wrote:

On Sun, Jul 14, 2013 at 1:45 PM, Oleg Burlaca oburl...@gmail.com wrote:

Hello Erick,

"Join performance is most sensitive to the number of values in the field being joined on. So if you have lots and lots of distinct values in the corpus, join performance will be affected."

Yep, we have a list of unique IDs that we get by first searching for records where loggedInUser IS IN (userIDs). This corpus is stored in memory, I suppose (not a problem), and then the bottleneck is to match this huge set against the core where I'm searching? Somewhere in the mailing list archive people were talking about an external list of Solr unique IDs, but I didn't find whether there is a solution. Back in 2010 Yonik posted a comment: http://find.searchhub.org/document/363a4952446b3cd#363a4952446b3cd

Sorry, I haven't read the previous thread in its entirety, but a few weeks back that proposal of Yonik's got implemented, it seems ;)
http://search-lucene.com/m/Fa3Dg14mqoj/bitsetsubj=Re+Solr+large+boolean+filter

You could use this to send a very large bitset filter (which can be translated into any integers, if you can come up with a mapping function).

roman

bq: I suppose the delete/reindex approach will not change soon

"There is ongoing work (search the JIRA for Stacked Segments)"

Ah, ok, I had a feeling it affects the architecture; so for now the only hope is pseudo-joins ))

"One way to deal with this is to implement a post filter, sometimes called a 'no cache' filter."

Thanks, will have a look, but as you describe it, it's not the best option. Does the "too many documents, man. Please refine your query. Partial results below" approach mean faceting will not work correctly? ...

I have in mind a hybrid approach, comments welcome: most of the time users are not searching but browsing content, so our virtual filesystem stored in Solr will use only the index with the ID of the file and the list of users that have access to it, i.e. not touching the fulltext index at all. Files may have metadata (EXIF info for images, for example) that we'd like to filter by and calculate facets on. Meta will be stored in both indexes.

In case of a fulltext query:
1. search the FT index (the fulltext index), get only the number of search results; call it Rf
2. search the DAC index (the index with permissions), get the number of search results; call it Rd

Let maxR be the maximum size of the corpus for the pseudo-join. *That was actually my question: what is a reasonable number? 10, 100, 1000?*

If (Rf < maxR) or (Rd < maxR), use the smaller corpus to join onto the other one. This happens when (only a few documents contain the search query) OR (the user has access to a small number of files). In case neither of these holds, we can use the "too many documents, man. Please refine your query. Partial results below" approach, but searching the FT index first, because we want relevant results first.

What do you think?

Regards,
Oleg

On Sun, Jul 14, 2013 at 7:42 PM, Erick Erickson erickerick...@gmail.com wrote:

Join performance is most sensitive to the number of values in the field being joined on. So if you have lots and lots of distinct values in the corpus, join performance will be affected.

bq: I suppose the delete/reindex approach will not change soon

There is ongoing work (search the JIRA for Stacked Segments) on actually doing something about this, but it's been under consideration for at least 3 years, so your guess is as good as mine.

bq: notice that the worst situation is when everyone has access to all the files, it means the first filter will be the full index.

One way to deal with this is to implement a post filter, sometimes called a "no cache" filter. The distinction here is that (1) it is not cached (duh!), (2) it is only called for documents that have made it through all the other lower-cost filters (and the main query, of course), and (3) "lower cost" means the filter is either a standard cached filter, or a no-cache filter with a cost (explicitly stated in the query) lower than this one's. Critically, and unlike normal filter queries, the result set is NOT calculated for all documents ahead of time.

You _still_ have to deal with the sysadmin doing a *:* query, as you are well aware. But one can mitigate that by having the post-filter fail all documents after some arbitrary N, and display a message in the app like "too many documents, man. Please refine your query. Partial results below." Of course this may not be acceptable, but HTH.

Erick

On Sun, Jul 14, 2013 at 12:05 PM, Jack Krupansky j...@basetechnology.com wrote:

Take a look at LucidWorks Search and its access control: http://docs.lucidworks.com/display/help/Search+Filters+for+Access+Control

Role-based
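The cost-ordered post-filter behavior Erick describes can be modeled in a few lines. This is a toy Python simulation, purely illustrative; real Solr post filters implement the Java PostFilter interface:

```python
# Toy model of post filtering: filters run cheapest-first, and the
# expensive ACL check only sees documents that survived the cheaper ones.

def run_filters(docs, filters):
    """filters: list of (cost, predicate); applied in ascending cost order."""
    for _, pred in sorted(filters, key=lambda f: f[0]):
        docs = [d for d in docs if pred(d)]
    return docs

docs = [{"id": i, "owner": "alice" if i % 2 else "bob"} for i in range(6)]
checked = []

def acl(doc):
    # Expensive per-document check; record how often it actually runs.
    checked.append(doc["id"])
    return doc["owner"] == "alice"

result = run_filters(docs, [(1, lambda d: d["id"] > 1), (100, acl)])
# The ACL predicate only ran for the 4 docs that passed the cheap filter,
# which is exactly the saving a no-cache post filter buys you.
```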
Re: About Suggestions
Garbage in, garbage out <G>.

Your indexing analysis chain is breaking up the tokens via the EdgeNGram filter and _putting those values in the index_. Then the TermsComponent is looking _only_ at the tokens in the index and giving you back exactly what you're asking for. So no, there's no way with that analysis chain to get only complete terms; at that level, the fact that a term was part of a larger input token has been lost. In fact, if you were to enter something like terms.prefix=1n1, you'd likely see all your 3-grams that start with 1n1, etc.

So use a copyField and put these in a separate field that has only whole tokens, or just take the EdgeNGramFilterFactory out of your current definition. If the latter, blow away your index and re-index from scratch.

Best,
Erick

On Tue, Jul 16, 2013 at 4:48 AM, Lochschmied, Alexander alexander.lochschm...@vishay.com wrote:

Hi Eric and everybody else! Thanks for trying to help. Here is the example:

.../terms?terms.regex.flag=case_insensitive&terms.fl=suggest&terms=true&terms.limit=20&terms.sort=index&terms.prefix=1n1187

returns

<int name="1n1187">1</int>
<int name="1n1187a">1</int>
<int name="1n1187r">1</int>
<int name="1n1187ra">1</int>

This list contains 3 complete part numbers, but the third item (1n1187r) is not a complete part number. Is there a way to make the terms component tell whether a term represents a complete value? (My guess is that this information gets lost after the n-gramming, but I'm still hoping something can be done.)

More config details:

<field name="suggest" type="text_parts" indexed="true" stored="true" required="false" multiValued="true"/>

and

<fieldType name="text_parts" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Thanks,
Alexander

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, 13. July 2013 19:58
To: solr-user@lucene.apache.org
Subject: Re: About Suggestions

Not quite sure what you mean here; a couple of examples would help. But since the term is using the keyword tokenizer, each thing you get back is a complete term by definition. So I'm not quite sure what you're asking.

Best,
Erick

On Fri, Jul 12, 2013 at 4:48 AM, Lochschmied, Alexander alexander.lochschm...@vishay.com wrote:

Hi Solr people!

We need to suggest part numbers in alphabetical order, adding up to four characters to the already entered part number prefix. That works quite well with the terms component acting on a multivalued field with a keyword tokenizer and an edge n-gram filter. I am mentioning part numbers to indicate that each item in the multivalued field is a string without whitespace, where special characters like dashes cannot be seen as separators.

Is there a way to know if the term (the suggestion) represents such a complete part number (without doing another query for each suggestion)? Since we are using SolrJ, what we would need is something like:

boolean Term.isRepresentingCompleteFieldValue()

Thanks,
Alexander
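Erick's point, that the index alone cannot distinguish a complete part number from a prefix once edge n-grams are in it, can be simulated directly. An illustrative Python sketch using the part numbers from the thread:

```python
# Simulate edge n-gramming: after indexing, "1n1187r" looks exactly like
# a real term, and only a second source of whole values (the copyField
# Erick suggests) can mark which suggestions are complete part numbers.

def edge_ngrams(term, lo=1, hi=20):
    return {term[:n] for n in range(lo, min(hi, len(term)) + 1)}

parts = {"1n1187", "1n1187a", "1n1187ra"}
index = set()
for p in parts:
    index |= edge_ngrams(p)

suggestions = sorted(t for t in index if t.startswith("1n1187"))
complete = [t for t in suggestions if t in parts]  # needs the whole-value set
# "1n1187r" shows up as a suggestion but is only a prefix of "1n1187ra".
```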
Re: Different 'fl' for first X results
You could also use a DocTransformer. But really, unless these fields are quite long, it seems overkill to do anything but ignore them when returned for docs you don't care about.

Best,
Erick

On Mon, Jul 15, 2013 at 7:05 PM, Jack Krupansky j...@basetechnology.com wrote:

SOLR-5005 - JavaScriptRequestHandler: https://issues.apache.org/jira/browse/SOLR-5005

-- Jack Krupansky

-----Original Message-----
From: Alexandre Rafalovitch
Sent: Monday, July 15, 2013 6:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Different 'fl' for first X results

Is there a JIRA number for the last one?

Regards,
Alex

On 15 Jul 2013 17:21, Jack Krupansky j...@basetechnology.com wrote:

1. Request all fields needed for all results and simply ignore the extra field(s) (which can be empty or missing and will automatically be ignored by Solr anyway).
2. Two separate query requests.
3. A custom search component.
4. Wait for the new scripted query request handler that gives you full control in a custom script.

-- Jack Krupansky

-----Original Message-----
From: Weber
Sent: Monday, July 15, 2013 4:58 PM
To: solr-user@lucene.apache.org
Subject: Different 'fl' for first X results

How to get a different field list for the first X results? For example, in the first 5 results I want fields A, B, and C, and in the remaining results I need only fields A and B.

--
View this message in context: http://lucene.472066.n3.nabble.com/Different-fl-for-first-X-results-tp4078178.html
Sent from the Solr - User mailing list archive at Nabble.com.
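Jack's option 1 (request the superset of fields and ignore the extras) is trivial to do client-side. An illustrative sketch, using the A/B/C placeholder field names from the question:

```python
# Client-side projection: fetch all fields once, then keep the full set
# for the first head_n results and the smaller set for the rest.

def project(results, head_fields, tail_fields, head_n=5):
    out = []
    for i, doc in enumerate(results):
        keep = head_fields if i < head_n else tail_fields
        out.append({k: v for k, v in doc.items() if k in keep})
    return out

docs = [{"A": 1, "B": 2, "C": 3} for _ in range(7)]
trimmed = project(docs, {"A", "B", "C"}, {"A", "B"})
# docs 0-4 keep A, B, C; docs 5-6 keep only A and B.
```

As Erick notes, unless field C is large, the projection step is usually not worth it at all; just ignore the field.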
Re: ACL implementation: Pseudo-join performance Atomic Updates
Is that this one: https://issues.apache.org/jira/browse/SOLR-1913 ? Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Jul 16, 2013 at 8:01 AM, Erick Erickson erickerick...@gmail.comwrote: Roman: Did this ever make into a JIRA? Somehow I missed it if it did, and this would be pretty cool Erick On Mon, Jul 15, 2013 at 6:52 PM, Roman Chyla roman.ch...@gmail.com wrote: On Sun, Jul 14, 2013 at 1:45 PM, Oleg Burlaca oburl...@gmail.com wrote: Hello Erick, Join performance is most sensitive to the number of values in the field being joined on. So if you have lots and lots of distinct values in the corpus, join performance will be affected. Yep, we have a list of unique Id's that we get by first searching for records where loggedInUser IS IN (userIDs) This corpus is stored in memory I suppose? (not a problem) and then the bottleneck is to match this huge set with the core where I'm searching? Somewhere in maillist archive people were talking about external list of Solr unique IDs but didn't find if there is a solution. Back in 2010 Yonik posted a comment: http://find.searchhub.org/document/363a4952446b3cd#363a4952446b3cd sorry, haven't the previous thread in its entirety, but few weeks back that Yonik's proposal got implemented, it seems ;) http://search-lucene.com/m/Fa3Dg14mqoj/bitsetsubj=Re+Solr+large+boolean+filter You could use this to send very large bitset filter (which can be translated into any integers, if you can come up with a mapping function). 
roman bq: I suppose the delete/reindex approach will not change soon There is ongoing work (search the JIRA for Stacked Segments) Ah, ok, I was feeling it affects the architecture; ok, now the only hope is Pseudo-Joins )) One way to deal with this is to implement a post filter, sometimes called a no cache filter. thanks, will have a look, but as you describe it, it's not the best option. The approach "too many documents, man. Please refine your query. Partial results below" means faceting will not work correctly? ... I have in mind a hybrid approach, comments welcome: Most of the time users are not searching, but browsing content, so our virtual filesystem stored in SOLR will use only the index with the Id of the file and the list of users that have access to it, i.e. not touching the fulltext index at all. Files may have metadata (EXIF info for images for example) that we'd like to filter by and calculate facets on. Meta will be stored in both indexes. In case of a fulltext query: 1. search the FT index (the fulltext index), get only the number of search results, let it be Rf 2. search the DAC index (the index with permissions), get the number of search results, let it be Rd Let maxR be the maximum size of the corpus for the pseudo-join. That was actually my question: what is a reasonable number? 10, 100, 1000? If (Rf < maxR) or (Rd < maxR), then use the smaller corpus to join onto the second one. This happens when (only a few documents contain the search query) OR (the user has access to a small number of files). In case neither of these happens, we can use the "too many documents, man. Please refine your query. Partial results below" approach, but first searching the FT index, because we want relevant results first. What do you think? Regards, Oleg On Sun, Jul 14, 2013 at 7:42 PM, Erick Erickson erickerick...@gmail.com wrote: Join performance is most sensitive to the number of values in the field being joined on.
So if you have lots and lots of distinct values in the corpus, join performance will be affected. bq: I suppose the delete/reindex approach will not change soon There is ongoing work (search the JIRA for Stacked Segments) on actually doing something about this, but it's been under consideration for at least 3 years, so your guess is as good as mine. bq: notice that the worst situation is when everyone has access to all the files, it means the first filter will be the full index. One way to deal with this is to implement a post filter, sometimes called a no cache filter. The distinction here is that 1) it is not cached (duh!), 2) it is only called for documents that have made it through all the other lower-cost filters (and the main query of course), and 3) "lower cost" means the standard cached filters plus any no-cache filters with a cost (explicitly stated in the query) lower than this one's. Critically, and unlike normal filter queries, the result set is NOT calculated for all documents ahead of time. You _still_ have to deal with the sysadmin
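The hybrid decision rule Oleg sketches (count both corpora first, then join from the smaller one, or bail out) can be written down as plain code. A sketch; max_r is a tuning knob, not a number anyone in the thread has measured:

```python
# Hybrid plan from the thread: Rf = hit count on the fulltext (FT)
# index, Rd = hit count on the permissions (DAC) index. Both counts
# are cheap (rows=0 queries). Join from the smaller side when either
# fits under max_r, otherwise ask the user to refine.

def plan_join(rf, rd, max_r):
    """Decide how to combine the FT and DAC result sets."""
    if rf < max_r or rd < max_r:
        # join from the smaller corpus onto the other one
        return "join_from_ft" if rf <= rd else "join_from_dac"
    return "refine_query"  # the "too many documents" fallback

plan = plan_join(rf=500, rd=1_000_000, max_r=1000)
```

The interesting open question from the thread remains what max_r should be for pseudo-join performance; this sketch only shows where that threshold plugs in.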
Re: How to use joins in solr 4.3.1
Not quite sure what the problem is with the second, but the first is: q=: That just isn't legal, try q=*:* As for the second, are there any other errors in the solr log? Sometimes what's returned in the response packet does not include the true source of the problem. Best Erick On Mon, Jul 15, 2013 at 7:40 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: I have also tried these queries (as per this SO answer: http://stackoverflow.com/questions/12665797/is-solr-4-0-capable-of-using-join-for-multiple-core ) 1. http://_server_.com:8983/solr/location/select?q=:&fq={!join from=merchantId to=merchantId fromIndex=merchant}walgreens And I get this: { responseHeader:{ status:400, QTime:1, params:{ indent:true, q::, wt:json, fq:{!join from=merchantId to=merchantId fromIndex=merchant}walgreens}}, error:{ msg:org.apache.solr.search.SyntaxError: Cannot parse ':': Encountered \ \:\ \: \\ at line 1, column 0.\nWas expecting one of:\nNOT ...\n\+\ ...\n\-\ ...\nBAREOPER ...\n \(\ ...\n\*\ ...\nQUOTED ...\nTERM ...\n PREFIXTERM ...\nWILDTERM ...\nREGEXPTERM ...\n\[\ ...\n\{\ ...\nLPARAMS ...\nNUMBER ...\nTERM ...\n\*\ ...\n, code:400}} And this: 2. http://_server_.com:8983/solr/location/select?q=walgreens&fq={!join from=merchantId to=merchantId fromIndex=merchant} { responseHeader:{ status:500, QTime:5, params:{ indent:true, q:walgreens, wt:json, fq:{!join from=merchantId to=merchantId fromIndex=merchant}}}, error:{ msg:Server at http://_SERVER_:8983/solr/location returned non ok status:500, message:Server Error, trace:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://_SERVER_:8983/solr/location returned non ok status:500, message:Server Error\n\tat org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)\n\tat org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)\n\tat org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:156)\n\tat
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)\n\tat java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:138)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)\n\tat java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:138)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)\n\tat java.lang.Thread.run(Thread.java:662)\n, code:500}} Thanks, -Utkarsh On Mon, Jul 15, 2013 at 4:27 PM, Utkarsh Sengar utkarsh2...@gmail.comwrote: Hello, I am trying to join data between two cores: merchant and location This is my query: http://_server_.com:8983/solr/location/select?q={!join from=merchantId to=merchantId fromIndex=merchant}walgreens Ref: http://wiki.apache.org/solr/Join Merchants core has documents for the query: walgreens with an merchantId 1 A simple query: http://_server_.com:8983/solr/location/select?q=walgreens returns documents called walgreens with merchantId=1 Location core has documents with merchantId=1 too. But my join query returns no documents. This is the response I get: { responseHeader:{ status:0, QTime:5, params:{ debugQuery:true, indent:true, q:{!join from=merchantId to=merchantId fromIndex=merchant}walgreens, wt:json}}, response:{numFound:0,start:0,maxScore:0.0,docs:[] }, debug:{ rawquerystring:{!join from=merchantId to=merchantId fromIndex=merchant}walgreens, querystring:{!join from=merchantId to=merchantId fromIndex=merchant}walgreens, parsedquery:JoinQuery({!join from=merchantId to=merchantId fromIndex=merchant}allText:walgreens), parsedquery_toString:{!join from=merchantId to=merchantId fromIndex=merchant}allText:walgreens, QParser:, explain:{}}} Any suggestions? -- Thanks, -Utkarsh -- Thanks, -Utkarsh
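The 400 above is the q=: problem Erick points out. A sketch of building the request from code with q=*:* and the join as a filter query, which also takes care of URL-encoding the {!join ...} local params; the host and core names are stand-ins for the redacted _server_ in the thread:

```python
# Build a legal join request: q=*:* matches everything, and the
# {!join ...} local-params syntax goes in fq. urlencode escapes
# the braces, bang, and equals signs that would otherwise need
# hand-escaping in a raw URL.
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "fq": "{!join from=merchantId to=merchantId fromIndex=merchant}walgreens",
    "wt": "json",
}
query_string = urlencode(params)
url = "http://localhost:8983/solr/location/select?" + query_string
```

With the query fixed this way, a remaining 500 would point at a server-side problem (check the solr log, as suggested above) rather than query syntax.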
Re: Range query on a substring.
Sorry, but you are basically misusing Solr (and multivalued fields), trying to take a shortcut to avoid a proper data model. To properly use Solr, you need to put each of these multivalued field values in a separate Solr document, with a text field and a value field. Then, you can query: text:"some text" AND value:[min-value TO max-value] Exactly how you should restructure your data model is dependent on all of your other requirements. You may be able to simply flatten your data. You may be able to use a simple join operation. Or, maybe you need to do a multi-step query operation if your data is sufficiently complex. If you want to keep your multivalued field in its current form for display purposes or keyword search, or exact match search, fine, but your stated goal is inconsistent with the semantics of Solr and Lucene. To be crystal clear, there is no such thing as a range query on a substring in Solr or Lucene. -- Jack Krupansky -Original Message- From: Marcin Rzewucki Sent: Tuesday, July 16, 2013 5:13 AM To: solr-user@lucene.apache.org Subject: Re: Range query on a substring. By multivalued I meant an array of values. For example: <arr name="myfield"> <str>text1 (X)</str> <str>text2 (Y)</str> </arr> I'd like to avoid splitting it as you propose. I have a 2.3mn collection with pretty large records (a few hundred fields and more per record). Duplicating them would impact performance. Regards. On 16 July 2013 10:26, Oleg Burlaca oburl...@gmail.com wrote: Ah, you mean something like this: record: Id=10, text = this is a text N1 (X), another text N2 (Y), text N3 (Z) Id=11, text = this is a text N1 (W), another text N2 (Q), third text (M) and you need to search for: text N1 and X B ? How big is the core? the first thing that comes to my mind, again, at indexing level, split the text into pieces and index it in solr like this: record_id | text | value 10 | text N1 | X 10 | text N2 | Y 10 | text N3 | Z does it help?
On Tue, Jul 16, 2013 at 10:51 AM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi Oleg, It's a multivalued field and it won't be easier to query when I split this field into text and numbers. I may get wrong results. Regards. On 16 July 2013 09:35, Oleg Burlaca oburl...@gmail.com wrote: IMHO the number(s) should be extracted and stored in separate columns in SOLR at indexing time. -- Oleg On Tue, Jul 16, 2013 at 10:12 AM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi, I have a problem (wonder if it is possible to solve it at all) with the following query. There are documents with a field which contains a text and a number in brackets, eg. myfield: this is a text (number) There might be some other documents with the same text but different number in brackets. I'd like to find documents with the given text say this is a text and number between A and B. Is it possible in Solr ? Any ideas ? Kind regards.
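Oleg's index-time split can be sketched in a few lines. The regex and field names are illustrative and assume the number is always in trailing brackets, as in the examples above:

```python
# Flatten one multivalued "text (number)" field into child records
# with the number in its own field, so an ordinary numeric range
# filter (value:[A TO B]) works on the flattened docs.
import re

VALUE_RE = re.compile(r"^(.*)\((\d+)\)\s*$")

def flatten(doc_id, values):
    rows = []
    for v in values:
        m = VALUE_RE.match(v)
        if m:
            rows.append({"record_id": doc_id,
                         "text": m.group(1).strip(),
                         "value": int(m.group(2))})
    return rows

rows = flatten(10, ["text N1 (7)", "text N2 (15)"])
```

Whether the extra document count is acceptable for a 2.3mn-record collection is the trade-off Marcin raises; nothing here changes that, it only shows the mechanics.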
AW: About Suggestions
Thanks Erick, that is what I suspected. We are very happy with the four suggestions in the example (and all the others), but we would like to know which of them represents a full part number. Can you elaborate a little more on how that could be achieved? Best regards, Alexander -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, 16 July 2013 14:09 To: solr-user@lucene.apache.org Subject: Re: About Suggestions Garbage in, garbage out G Your indexing analysis chain is breaking up the tokens via the EdgeNGram filter and _putting those values in the index_. Then the TermsComponent is looking _only_ at the tokens in the index and giving you back exactly what you're asking for. So no, there's no way with that analysis chain to get only complete terms; at that level the fact that a term was part of a larger input token has been lost. In fact, if you were to enter something like terms.prefix=1n1 you'd likely see all your 3-grams that start with 1n1 etc. So use a copyField and put these in a separate field that has only whole tokens, or just take the EdgeNGram filter out of your current definition. If the latter, blow away your index and re-index from scratch. Best Erick On Tue, Jul 16, 2013 at 4:48 AM, Lochschmied, Alexander alexander.lochschm...@vishay.com wrote: Hi Erick and everybody else! Thanks for trying to help. Here is the example: .../terms?terms.regex.flag=case_insensitive&terms.fl=suggest&terms=true&terms.limit=20&terms.sort=index&terms.prefix=1n1187 returns <int name="1n1187">1</int> <int name="1n1187a">1</int> <int name="1n1187r">1</int> <int name="1n1187ra">1</int> This list contains 3 complete part numbers but the third item (1n1187r) is not a complete part number. Is there a way to make terms tell if a term represents a complete value? (My guess is that this gets lost after ngram but I'm still hoping something can be done.)
More config details: <field name="suggest" type="text_parts" indexed="true" stored="true" required="false" multiValued="true"/> and <fieldType name="text_parts" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> </fieldType> Thanks, Alexander -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Saturday, 13 July 2013 19:58 To: solr-user@lucene.apache.org Subject: Re: About Suggestions Not quite sure what you mean here, a couple of examples would help. But since the term is using keyword tokenizer, then each thing you get back is a complete term, by definition. So I'm not quite sure what you're asking here. Best Erick On Fri, Jul 12, 2013 at 4:48 AM, Lochschmied, Alexander alexander.lochschm...@vishay.com wrote: Hi Solr people! We need to suggest part numbers in alphabetical order, adding up to four characters to the already entered part number prefix. That works quite well with the terms component acting on a multivalued field with keyword tokenizer and edge nGram filter. I am mentioning part numbers to indicate that each item in the multivalued field is a string without whitespace and where special characters like dashes cannot be seen as separators. Is there a way to know if the term (the suggestion) represents such a complete part number (without doing another query for each suggestion)? Since we are using SolrJ, what we would need is something like boolean Term.isRepresentingCompleteFieldValue() Thanks, Alexander
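Erick's copyField suggestion can also be exploited from the client side: fetch the terms of a second, un-ngrammed field once, and flag each EdgeNGram suggestion that is also a whole part number. A sketch using the part numbers from the example above (assuming, per the thread, that 1n1187r is the only incomplete one):

```python
# complete_parts stands in for the terms of a whole-token copyField
# (keyword tokenizer, no EdgeNGram); suggestions are what the
# TermsComponent returns from the ngrammed field. A suggestion is a
# full part number exactly when it appears in the whole-token set.

complete_parts = {"1n1187", "1n1187a", "1n1187ra"}  # whole-token field terms
suggestions = ["1n1187", "1n1187a", "1n1187r", "1n1187ra"]

flagged = [(s, s in complete_parts) for s in suggestions]
```

This does cost one extra terms request (or a cached copy of the whole-token terms), but not one query per suggestion, which was the constraint in the original question.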
Re: Clearing old nodes from zookeper without restarting solrcloud cluster
Thanks, I was actually asking about deleting nodes from the cluster state not cores, unless you can unload cores specific to an already offline node from zookeeper. On Tue, Jul 16, 2013 at 1:55 AM, Marcin Rzewucki mrzewu...@gmail.comwrote: Hi, You should use CoreAdmin API (or Solr Admin page) and UNLOAD unneeded cores. This will unregister them from the zookeeper (cluster state will be updated), so they won't be used for querying any longer. Solrcloud restart is not needed in this case. Regards. On 16 July 2013 06:18, Ali, Saqib docbook@gmail.com wrote: Hello Luis, I don't think that is possible. If you delete clusterstate.json from zookeeper, you will need to restart the nodes.. I could be very wrong about this Saqib On Mon, Jul 15, 2013 at 8:50 PM, Luis Carlos Guerrero Covo lcguerreroc...@gmail.com wrote: I know that you can clear zookeeper's data directoy using the CLI with the clear command, I just want to know if its possible to update the cluster's state without wiping everything out. Anyone have any ideas/suggestions? On Mon, Jul 15, 2013 at 11:21 AM, Luis Carlos Guerrero Covo lcguerreroc...@gmail.com wrote: Hi, Is there an easy way to clear zookeeper of all offline solr nodes without restarting the cluster? We are having some stability issues and we think it maybe due to the leader querying old offline nodes. thank you, Luis Guerrero -- Luis Carlos Guerrero Covo M.S. Computer Engineering (57) 3183542047 -- Luis Carlos Guerrero Covo M.S. Computer Engineering (57) 3183542047
Are analysers applied to each value in a multi-valued field separately?
I'm guessing the answer is yes, but here's the background. We index 2 separate fields, headline and body text for a document, and then we want to identify the top of the story, which is the headline + N words of the body (we want to weight that in scoring). So to do that: <copyField source="headline" dest="top"/> <copyField source="body" dest="top"/> And the top field has a LimitTokenCountFilterFactory appended to it to do the limiting: <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="N"/> I realised that top needs to be multi-valued, which got me thinking: is that N tokens PER VALUE of top or N tokens in total within the top field... The field is indexed but not stored, so it's hard to determine exactly which is being done. Logically, I presume each value in the field is independent (and Solr then just matches searches against each one), so that would suggest N is per value? Cheers, Daniel
Re: [solr 3.4.1] collections: meaning and necessity
If you only have one collection and no Solr cloud, then don't use solr.xml at all. It will automatically assume 'collection1' as a name. If you do want to have some control (shards, etc), do not include the optional parameters you do not need. See example here: http://my.safaribooksonline.com/book/databases/9781782164845/1dot-instant-apache-solr-for-indexing-data-how-to/ch01s02_html You don't even need the defaultCoreName attribute, if you are happy to always include the core name in the URL. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Jul 16, 2013 at 7:28 AM, Dmitry Kan solrexp...@gmail.com wrote: Sorry, hit send too fast.. picking up: from the answer by Jayendra on the link, collections and cores are the same thing. Same is seconded by the config: <cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:15000}"> <core name="collection1" instanceDir="." /> </cores> we basically define cores. We have a plain {frontend_solr, shards} setup with solr 3.4 and were thinking of starting off with it initially in solr 4. In solr 4: can one get by without using collections = cores? We also don't plan on using SolrCloud at the moment. So from our standpoint the solr4 configuration looks more complicated than that of solr 3.4. Are there any benefits of such a setup for non SolrCloud users? Thanks, Dmitry On Tue, Jul 16, 2013 at 2:24 PM, Dmitry Kan solrexp...@gmail.com wrote: Hello list, Following the answer by Jayendra here: http://stackoverflow.com/questions/14516279/how-to-add-collections-to-solr-core
solr 4.3.1 Installation
Hi, We have been using solr 3.6.1. Recently we downloaded the solr 4.3.1 version and installed it as a multicore setup as follows. Folder structure: solr.war, solr/conf, core0, core1, solr.xml. Created the context fragment xml file in tomcat/conf/Catalina/localhost which refers to the solr.war file and the solr home folder, and copied the multicore conf folder without the zoo.cfg file. I get the following error and the admin page does not load: 16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart 16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr_4.3.1] startup failed due to previous errors 16 Jul, 2013 11:36:39 PM org.apache.catalina.startup.HostConfig checkResources INFO: Undeploying context [/solr_4.3.1] 16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart 16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr_4.3.1] startup failed due to previous errors Please let me know what I am missing, and if I need to install this with the default multicore setup without the cloud. Thanks Regards Sujatha
Re: Doc's FunctionQuery result field in my custom SearchComponent class ?
Basically, the evaluation of function queries in the fl parameter occurs when the response writer is composing the document results. That's AFTER all of the search components are done. SolrReturnFields.getTransformer() gets the DocTransformer, which is really a DocTransformers, and then a call to DocTransformers.transform() in each response writer will evaluate the embedded function queries and insert their values in the results as they are being written. -- Jack Krupansky -Original Message- From: Tony Mullins Sent: Tuesday, July 16, 2013 1:37 AM To: solr-user@lucene.apache.org Subject: Re: Doc's FunctionQuery result field in my custom SearchComponent class ? No sorry, I am still not getting the termfreq() field in my 'doc' object. I do get the _version_ field in my 'doc' object which I think is realValue=StoredField. At which point does termfreq() or any other FunctionQuery field become part of the doc object in Solr ? And at that point can I perform some custom logic and append the response ? Thanks. Tony On Tue, Jul 16, 2013 at 1:34 AM, Patanachai Tangchaisin patanachai.tangchai...@wizecommerce.com wrote: Hi, I think the process of retrieving a stored field (through fl) happens after SearchComponent. One solution: If you wrap the q param with a function, your score will be a result of the function. For example, http://localhost:8080/solr/collection2/demoendpoint?q=termfreq%28product,%27spider%27%29&wt=xml&indent=true&fl=*,score Now your score is going to be a result of termfreq(product,'spider') -- Patanachai Tangchaisin On 07/15/2013 12:01 PM, Tony Mullins wrote: any help plz !!! On Mon, Jul 15, 2013 at 4:13 PM, Tony Mullins tonymullins...@gmail.com wrote: Please any help on how to get the value of 'freq' field in my custom SearchComponent ?
http://localhost:8080/solr/collection2/demoendpoint?q=spider&wt=xml&indent=true&fl=*,freq:termfreq%28product,%27spider%27%29 <doc> <str name="id">11</str> <str name="type">Video Games</str> <str name="format">xbox 360</str> <str name="product">The Amazing Spider-Man</str> <int name="popularity">11</int> <long name="_version_">1439994081345273856</long> <int name="freq">1</int> </doc> Here is my code: DocList docs = rb.getResults().docList; DocIterator iterator = docs.iterator(); int sumFreq = 0; String id = null; for (int i = 0; i < docs.size(); i++) { try { int docId = iterator.nextDoc(); // Document doc = searcher.doc(docId, fieldSet); Document doc = searcher.doc(docId); In the doc object I can see the schema fields like 'id', 'type', 'format' etc. but I cannot find the field 'freq' which I needed. Is there any way to get the FunctionQuery fields in the doc object ? Thanks, Tony On Mon, Jul 15, 2013 at 1:16 PM, Tony Mullins tonymullins...@gmail.com wrote: Hi, I have extended Solr's SearchComponent class and I am iterating through all the docs in ResponseBuilder in the overridden Process() method. Here I want to get the value of the FunctionQuery result but in the Document object I am only seeing the standard fields of the document, not the FunctionQuery result. This is my query http://localhost:8080/solr/collection2/demoendpoint?q=spider&wt=xml&indent=true&fl=*,freq:termfreq%28product,%27spider%27%29 Result of the above query in the browser shows me that 'freq' is part of the doc but it's not there in the Document object in my overridden Process() method. How can I get the value of the FunctionQuery result in my custom SearchComponent ?
Thanks, Tony
SolrCloud softcommit problem
Hi I'm using solr version 4.3.1. I have a core with only one shard and three replicas, say server1, server2 and server3. Suppose server1 is currently the leader. If I send an update to the leader everything works fine: wget -O - --header='Content-type: text/xml' --post-data='<add><doc><field name="sku">16910</field><field name="name" update="set">yy</field></doc></add>' 'server1:8080/solr/mycore/update?softCommit=true' Querying server1, server2 and server3 I see the right answer, always yy. If instead I send an update to a replica, say server2: wget -O - --header='Content-type: text/xml' --post-data='<add><doc><field name="sku">16910</field><field name="name" update="set">z</field></doc></add>' 'server2:8080/solr/mycore/update?softCommit=true' I see on server1 (the leader) and server3 the correct value 'z', but server2 continues to show the wrong value, yy, until I send a commit. Am I using the update api correctly? Thanks Giovanni
Re: Are analysers applied to each value in a multi-valued field separately?
Yes, each input value is analyzed separately. Solr passes each input value to Lucene and then Lucene analyzes each. You could use LimitTokenPositionFilterFactory, which uses the absolute token position - each successive analyzed value would have an incremented position, plus the positionIncrementGap (typically 100 for text.) -- Jack Krupansky -Original Message- From: Daniel Collins Sent: Tuesday, July 16, 2013 8:46 AM To: solr-user@lucene.apache.org Subject: Are analysers applied to each value in a multi-valued field separately? I'm guessing the answer is yes, but here's the background. We index 2 separate fields, headline and body text for a document, and then we want to identify the top of the story, which is the headline + N words of the body (we want to weight that in scoring). So to do that: <copyField source="headline" dest="top"/> <copyField source="body" dest="top"/> And the top field has a LimitTokenCountFilterFactory appended to it to do the limiting: <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="N"/> I realised that top needs to be multi-valued, which got me thinking: is that N tokens PER VALUE of top or N tokens in total within the top field... The field is indexed but not stored, so it's hard to determine exactly which is being done. Logically, I presume each value in the field is independent (and Solr then just matches searches against each one), so that would suggest N is per value? Cheers, Daniel
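The difference Jack describes can be shown with a toy simulation (this is illustrative Python, not Lucene code): LimitTokenCountFilter caps tokens per analyzed value, while a position-based limit works on the absolute position, which jumps by the positionIncrementGap between successive values.

```python
# Two value strings for a multivalued field; whitespace tokenization.
# positions() mimics how absolute token positions grow across values
# because of the positionIncrementGap (100 in the usual text types).

GAP = 100

def positions(values):
    pos, out = 0, []
    for i, value in enumerate(values):
        if i > 0:
            pos += GAP  # gap inserted between successive values
        for tok in value.split():
            pos += 1
            out.append((tok, pos))
    return out

def limit_per_value(values, n):
    """LimitTokenCountFilter-style: first n tokens of EACH value."""
    return [tok for v in values for tok in v.split()[:n]]

def limit_by_position(values, max_pos):
    """Position-based: keep tokens whose absolute position <= max_pos."""
    return [tok for tok, p in positions(values) if p <= max_pos]

vals = ["breaking news headline", "body starts here and runs on"]
```

Note how any position limit below GAP drops the whole second value, which is exactly why Daniel's follow-up about adding the gap to N matters.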
Need advice on performing 300 queries per second on solr index
Hi I need to create a solr cluster that contains geospatial information and provides the ability to perform a few hundred queries per second, where each query should retrieve around 100k results. The data is around 100k documents, around 300GB total. I started with a 2-shard cluster (replicationFactor 1) and a portion of the data - 20GB. I ran some load tests and saw that when 100 requests are sent in one second, the average qTime is around 4 seconds, but the average total response time (measured from sending the request to solr until getting a response) reaches 20-25 seconds, which is very bad. Currently I load-balance myself between the 2 solr servers (each request is sent to another server). Any advice on which resources I need and how my solr cluster should look? More shards? more replicas? another webserver? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Need-advice-on-performing-300-queries-per-second-on-solr-index-tp4078353.html Sent from the Solr - User mailing list archive at Nabble.com.
Config changes in solr.DirectSolrSpellCheck after index is built?
Hi All, Can you change the configuration of a spellchecker using solr.DirectSolrSpellChecker after you've built an index? I know that this spellchecker doesn't build an index off to the side like the IndexBasedSpellChecker, so I'm wondering what's happening internally to create a spellchecking dictionary. Thanks Brendan -- Brendan Grainger www.kuripai.com
Re: Need advice on performing 300 queries per second on solr index
Have you looked at cache utilization? Have you checked the IO and CPU load to see what the bottlenecks are? Are you sure things like your heap and servlet container threads are tuned? After you look at those issues, I'd probably think about adding http caching and more replicas. Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Tue, Jul 16, 2013 at 10:42 AM, adfel70 adfe...@gmail.com wrote: Hi I need to create a solr cluster that contains geospatial information and provides the ability to perform a few hundreds queries per second, each query should retrieve around 100k results. The data is around 100k documents, around 300gb total. I started with 2 shard cluster (replicationFactor 1) and a portion of the data - 20 gb. I run some load-tests and see that when 100 requests are sent in one second, the average qTime is around 4 seconds, but the average total response time (measuring from sending the request to solr untill getting a response ) reaches 20-25 seconds which is very bad. Currently I load-balance myself between the 2 solr servers (each request is sent to another server) Any advice on which resources do I need and how my solr cluster should look like? More shards? more replicas? another webserver? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Need-advice-on-performing-300-queries-per-second-on-solr-index-tp4078353.html Sent from the Solr - User mailing list archive at Nabble.com.
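One cheap first step before adding replicas: compare qTime to wall-clock latency on recorded samples. If total latency grows far beyond qTime under load, as in the numbers reported above, requests are queueing (servlet threads, GC, saturation) rather than searching. A sketch of the arithmetic; the sample numbers are made up:

```python
# Compare search time (qTime) against end-to-end latency to separate
# "Solr is slow" from "requests are waiting in a queue".

def latency_stats(samples_ms):
    ordered = sorted(samples_ms)
    avg = sum(ordered) / len(ordered)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    return avg, p95

qtimes = [4000] * 100                            # flat ~4s search time
totals = [4000 + 200 * i for i in range(100)]    # growing queue delay
avg_q, _ = latency_stats(qtimes)
avg_t, p95_t = latency_stats(totals)
queue_delay = avg_t - avg_q                      # time spent waiting, not searching
```

A large and growing queue_delay argues for more replicas, more container threads, or HTTP caching as suggested above; a large avg_q alone would instead point at the index/shard layout.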
Re: SolrCloud softcommit problem
I think this is SOLR-4923 https://issues.apache.org/jira/browse/SOLR-4923, should be fixed in 4.4 (when it comes out) or grab the branch_4x branch from svn. On 16 July 2013 14:12, giovanni.bricc...@banzai.it giovanni.bricc...@banzai.it wrote: Hi I'm using solr version 4.3.1. I have a core with only one shard and three replicas, say server1, server2 and server3. Suppose server1 is currently the leader. If I send an update to the leader everything works fine: wget -O - --header='Content-type: text/xml' --post-data='<add><doc><field name="sku">16910</field><field name="name" update="set">yy</field></doc></add>' 'server1:8080/solr/mycore/update?softCommit=true' Querying server1, server2 and server3 I see the right answer, always yy. If instead I send an update to a replica, say server2: wget -O - --header='Content-type: text/xml' --post-data='<add><doc><field name="sku">16910</field><field name="name" update="set">z</field></doc></add>' 'server2:8080/solr/mycore/update?softCommit=true' I see on server1 (the leader) and server3 the correct value 'z', but server2 continues to show the wrong value, yy, until I send a commit. Am I using the update api correctly? Thanks Giovanni
Re: Are analysers applied to each value in a multi-valued field separately?
Thanks Jack. There seems to be a never-ending set of FilterFactories, I keep hearing about new ones all the time :) Ok, I get it, so our existing code is the first N tokens of each value, and using LimitTokenPositionFilterFactory with the same number would give us the first N of the combined set of tokens, that's good to know. On 16 July 2013 14:15, Jack Krupansky j...@basetechnology.com wrote: Yes, each input value is analyzed separately. Solr passes each input value to Lucene and then Lucene analyzes each. You could use LimitTokenPositionFilterFactory, which uses the absolute token position - each successive analyzed value would have an incremented position, plus the positionIncrementGap (typically 100 for text.) -- Jack Krupansky -Original Message- From: Daniel Collins Sent: Tuesday, July 16, 2013 8:46 AM To: solr-user@lucene.apache.org Subject: Are analysers applied to each value in a multi-valued field separately? I'm guessing the answer is yes, but here's the background. We index 2 separate fields, headline and body text for a document, and then we want to identify the top of the story, which is the headline + N words of the body (we want to weight that in scoring). So to do that: <copyField source="headline" dest="top"/> <copyField source="body" dest="top"/> And the top field has a LimitTokenCountFilterFactory appended to it to do the limiting: <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="N"/> I realised that top needs to be multi-valued, which got me thinking: is that N tokens PER VALUE of top or N tokens in total within the top field... The field is indexed but not stored, so it's hard to determine exactly which is being done. Logically, I presume each value in the field is independent (and Solr then just matches searches against each one), so that would suggest N is per value? Cheers, Daniel
Re: Are analysers applied to each value in a multi-valued field separately?
Self-correction, we'd need to set LimitTokenPositionFilterFactory to PI + N to give the results above, because of the increment gap between values.

On 16 July 2013 17:16, Daniel Collins danwcoll...@gmail.com wrote: [...]
Re: Need advice on performing 300 queries per second on solr index
You say you have a 20GB collection, but is that per machine or for the total collection (i.e. 10GB per machine)? What memory do you have available on those 2 machines - is it enough to get the collection into the disk cache? What OS is it (Linux/Windows, etc.)? What heap size does your JVM have? Is it a static collection, or are you updating it as well?

4s for a query versus 25s end-to-end seems a large disparity to me; I'd be curious where the time is going. SolrCloud will distribute the initial query out to the shards (but with fl=uniquekey,score), then it sends a second request once it has the list of documents, with fl=whatever you asked for, to get the stored fields. It might be interesting to see: if the query takes 4s, how long does the stored-field request take? (If it's long, you might want to consider docValues - or ask for less!) If you are using SolrCloud, you should be able to see the distributed requests (we see 3 per user request: distributed (on each shard), stored fields (on each shard that returned something), and then the user request on the machine you sent the request to); see if that gives you any indication where the time is going.

On 16 July 2013 16:12, Michael Della Bitta michael.della.bi...@appinions.com wrote: Have you looked at cache utilization? Have you checked the IO and CPU load to see what the bottlenecks are? Are you sure things like your heap and servlet container threads are tuned? After you look at those issues, I'd probably think about adding HTTP caching and more replicas. Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/

On Tue, Jul 16, 2013 at 10:42 AM, adfel70 adfe...@gmail.com wrote: Hi, I need to create a Solr cluster that contains geospatial information and provides the ability to perform a few hundred queries per second, where each query should retrieve around 100k results. The data is around 100k documents, around 300GB total. I started with a 2-shard cluster (replicationFactor 1) and a portion of the data - 20GB. I ran some load tests and saw that when 100 requests are sent in one second, the average qTime is around 4 seconds, but the average total response time (measured from sending the request to Solr until getting a response) reaches 20-25 seconds, which is very bad. Currently I load-balance between the 2 Solr servers myself (each request is sent to another server). Any advice on what resources I need and how my Solr cluster should look? More shards? More replicas? Another web server? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Need-advice-on-performing-300-queries-per-second-on-solr-index-tp4078353.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need advice on performing 300 queries per second on solr index
Are you requesting all 100K results in one request? If so, that is pretty fast. If you are doing that, don't do that. Page the results. wunder

On Jul 16, 2013, at 9:30 AM, Daniel Collins wrote: [...]

-- Walter Underwood wun...@wunderwood.org
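On the "page the results" suggestion: the usual approach is to issue repeated queries with a fixed rows and an increasing start parameter. A tiny illustrative sketch (the parameter names match Solr's start/rows; everything else is made up):

```python
def pages(total_hits, rows):
    """Yield (start, rows) parameter pairs for paging through a result
    set instead of requesting all of it in a single query."""
    for start in range(0, total_hits, rows):
        yield start, min(rows, total_hits - start)

# Paging through 100k hits, 1000 at a time, takes 100 requests.
params = list(pages(100_000, 1000))
```

Note that very deep paging with large start values has its own cost in Solr, so in practice you would rarely walk all the way to the end.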
Re: Live reload
: I used the reload command to apply changes in synonyms.txt for example, but : with the new mechanism https://wiki.apache.org/solr/CoreAdmin#LiveReload : this will not work anymore.

The Live reload doesn't affect schema.xml settings and analyzers (like changing stopwords or synonyms) ... when you reload, you should see your new synonyms.txt file loaded. If you don't think you are seeing that behavior, then you need to provide a lot more details about what version you are using, what steps you are trying, and what behavior you *are* seeing, so that we can understand what problem you might be having... https://wiki.apache.org/solr/UsingMailingLists

I just did a simple sanity test on the 4x branch where I ran some stuff through the analyzer UI screen, then changed the synonyms file and did a reload, and saw the changes I expected when I reloaded the analysis page. -Hoss
Re: Are analysers applied to each value in a multi-valued field separately?
Actually, I appear to be wrong on the position limit filter - it appears to be relative to the string being analyzed, not to the full sequence of values analyzed for the field. Given this field and type:

<fieldType name="text_limit_position4" class="solr.TextField" positionIncrementGap="10">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LimitTokenPositionFilterFactory" maxTokenPosition="23"/>
  </analyzer>
</fieldType>
<field name="text_limit4" type="text_limit_position4" indexed="true" stored="true" multiValued="true"/>

And this document:

curl "http://localhost:8983/solr/update?commit=true" \
-H 'Content-type:application/json' -d '
[{"id": "doc-1", "title": "Hello World", "text_limit4": ["a1 a2 a3 a4", "b1 b2 b3 b4", "c1 c2 c3 c4", "d1 d2 d3 d4", "e1 e2 e3 e4", "f1 f2 f3 f4"]}]'

The hope was that the indexed sequence of terms would stop at c4, but the full values are indexed. These queries succeed:

curl "http://localhost:8983/solr/select/?q=text_limit4:d1"
curl "http://localhost:8983/solr/select/?q=text_limit4:f4"

And this query fails:

curl "http://localhost:8983/solr/select/?q=text_limit4:%22a4+f1%22~65"

While this query succeeds:

curl "http://localhost:8983/solr/select/?q=text_limit4:%22a4+f1%22~66"

Indicating that the position gaps of 10 are there between each value, but the token position limit filter doesn't trigger. This document:

curl "http://localhost:8983/solr/update?commit=true" \
-H 'Content-type:application/json' -d '
[{"id": "doc-1", "title": "Hello World", "text_limit4": "a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15 a16 a17 a18 a19 a20 a21 a22 a23 a24 a25 a26"}]'

Fails on this query:

curl "http://localhost:8983/solr/select/?q=text_limit4:a24"

But succeeds on this query:

curl "http://localhost:8983/solr/select/?q=text_limit4:a23"

Indicating that the token position limit filter does work, but only on the relative position, making it not much more useful than the token count limit filter. Oh well.
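As a cross-check of the slop numbers above (~65 fails, ~66 succeeds), here is a small Python sketch that assigns positions the way the experiment implies, with positionIncrementGap extra positions inserted before the first token of each subsequent value. The exact gap arithmetic here is my inference from the observed results, not taken from the Lucene source:

```python
def token_positions(values, gap):
    """Assign Lucene-style positions to whitespace tokens across a
    multi-valued field, inserting `gap` extra positions between values."""
    positions = {}
    pos = -1
    for value in values:
        for i, tok in enumerate(value.split()):
            # First token of a later value: normal increment of 1 plus the gap.
            pos += (gap + 1) if (i == 0 and positions) else 1
            positions[tok] = pos
    return positions

values = ["a1 a2 a3 a4", "b1 b2 b3 b4", "c1 c2 c3 c4",
          "d1 d2 d3 d4", "e1 e2 e3 e4", "f1 f2 f3 f4"]
pos = token_positions(values, gap=10)

# Minimum slop for the phrase "a4 f1" is the position distance minus 1.
slop = pos["f1"] - pos["a4"] - 1
```

With gap=10 this gives a required slop of 66, matching the experiment (slop 65 fails, slop 66 matches).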
-- Jack Krupansky

-Original Message- From: Daniel Collins Sent: Tuesday, July 16, 2013 12:18 PM To: solr-user@lucene.apache.org Subject: Re: Are analysers applied to each value in a multi-valued field separately? [...]
Re: Doc's FunctionQuery result field in my custom SearchComponent class ?
OK, so that's why I cannot see the FunctionQuery fields in my SearchComponent class. So then the question would be: how can I apply my custom processing/logic to these FunctionQuery results? What's the extension point in Solr for such scenarios? Basically I want to call termfreq() for each document, sum all the docs' termfreq() results, and show one aggregated term-frequency field in my query response. Thanks, Tony

On Tue, Jul 16, 2013 at 6:01 PM, Jack Krupansky j...@basetechnology.com wrote: Basically, the evaluation of function queries in the fl parameter occurs when the response writer is composing the document results. That's AFTER all of the search components are done. SolrReturnFields.getTransformer() gets the DocTransformer, which is really a DocTransformers, and then a call to DocTransformers.transform() in each response writer will evaluate the embedded function queries and insert their values in the results as they are being written. -- Jack Krupansky

-Original Message- From: Tony Mullins Sent: Tuesday, July 16, 2013 1:37 AM To: solr-user@lucene.apache.org Subject: Re: Doc's FunctionQuery result field in my custom SearchComponent class?

No sorry, I am still not getting the termfreq() field in my 'doc' object. I do get the _version_ field in my 'doc' object, which I think is realValue=StoredField. At what point does termfreq() or any other FunctionQuery field become part of the doc object in Solr? And at that point, can I perform some custom logic and append it to the response? Thanks, Tony

On Tue, Jul 16, 2013 at 1:34 AM, Patanachai Tangchaisin patanachai.tangchai...@wizecommerce.com wrote: Hi, I think the process of retrieving a stored field (through fl) happens after SearchComponent. One solution: if you wrap the q param in a function, your score will be the result of the function. For example:

http://localhost:8080/solr/collection2/demoendpoint?q=termfreq%28product,%27spider%27%29&wt=xml&indent=true&fl=*,score

Now your score is going to be the result of termfreq(product,'spider'). -- Patanachai Tangchaisin

On 07/15/2013 12:01 PM, Tony Mullins wrote: any help plz!!! On Mon, Jul 15, 2013 at 4:13 PM, Tony Mullins tonymullins...@gmail.com wrote: Please, any help on how to get the value of the 'freq' field in my custom SearchComponent?

http://localhost:8080/solr/collection2/demoendpoint?q=spider&wt=xml&indent=true&fl=*,freq:termfreq%28product,%27spider%27%29

<doc><str name="id">11</str><str name="type">Video Games</str><str name="format">xbox 360</str><str name="product">The Amazing Spider-Man</str><int name="popularity">11</int><long name="_version_">1439994081345273856</long><int name="freq">1</int></doc>

Here is my code:

DocList docs = rb.getResults().docList;
DocIterator iterator = docs.iterator();
int sumFreq = 0;
String id = null;
for (int i = 0; i < docs.size(); i++) {
    try {
        int docId = iterator.nextDoc();
        // Document doc = searcher.doc(docId, fieldSet);
        Document doc = searcher.doc(docId);

In the doc object I can see the schema fields like 'id', 'type', 'format' etc., but I cannot find the 'freq' field which I need. Is there any way to get the FunctionQuery fields in the doc object?

Thanks, Tony

On Mon, Jul 15, 2013 at 1:16 PM, Tony Mullins tonymullins...@gmail.com wrote: Hi, I have extended Solr's SearchComponent class and I am iterating through all the docs in ResponseBuilder in the @Override process() method. Here I want to get the value of the FunctionQuery result, but in the Document object I am only seeing the standard fields of the document, not the FunctionQuery result. This is my query:

http://localhost:8080/solr/collection2/demoendpoint?q=spider&wt=xml&indent=true&fl=*,freq:termfreq%28product,%27spider%27%29

The result of the above query in the browser shows me that 'freq' is part of the doc, but it's not there in the Document object in
Highlighting externally stored text
Does anyone know if issue SOLR-1397 (It should be possible to highlight external text) is actively being worked on, by chance? It looks like the last update was May 2012. https://issues.apache.org/jira/browse/SOLR-1397 I'm trying to find the best way to highlight search results even though those results are not stored in my index. Has anyone been successful in reusing the Solr highlighting logic on non-stored data? Does anyone know if there are any other third-party libraries that can do this for me until SOLR-1397 is formally released? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-externally-stored-text-tp4078387.html
Re: How to change extracted directory
On 7/16/2013 2:02 AM, wolbi wrote: As I said, if I change it in context.xml it works... but the question is how to do it from the command line, without modifying config files. Thanks

Take it out of the config file.

Thanks, Shawn
Re: solr 4.3.1 Installation
This problem looks to me like it's because of Solr logging ... see the detailed description below (taken from one of the other mail threads):

Solr 4.3.0 and later does not have ANY slf4j jarfiles in the .war file, so you need to put them in your classpath. Jarfiles are included in the example, in example/lib/ext, and those jarfiles set up logging to use log4j, a much more flexible logging framework than JDK logging. JDK logging is typically set up with a file called logging.properties, which I think you must use a system property to configure. You aren't using JDK logging, you are using log4j, which uses a file called log4j.properties. http://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty

On Tue, Jul 16, 2013 at 6:28 PM, Sujatha Arun suja.a...@gmail.com wrote: Hi, We have been using Solr 3.6.1. We recently downloaded Solr 4.3.1 and installed it as a multicore setup as follows.

Folder structure: solr.war, solr/conf, core0, core1, solr.xml

We created the context fragment xml file in tomcat/conf/catalina/localhost which refers to the solr.war file and the Solr home folder, and copied the multicore conf folder without the zoo.cfg file. I get the following error and the admin page does not load:

16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart
16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr_4.3.1] startup failed due to previous errors
16 Jul, 2013 11:36:39 PM org.apache.catalina.startup.HostConfig checkResources INFO: Undeploying context [/solr_4.3.1]
16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start SEVERE: Error filterStart
16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start SEVERE: Context [/solr_4.3.1] startup failed due to previous errors

Please let me know what I am missing, if I need to install this with the default multicore setup without the cloud. Thanks, Regards, Sujatha
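For reference, a minimal log4j.properties along the lines the linked wiki page describes might look like this. The path, file size, and levels here are illustrative, not taken from the thread:

```properties
# Minimal sketch of a log4j 1.2 configuration for Solr (illustrative values)
log4j.rootLogger=INFO, file

log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n
```

This file (plus the slf4j/log4j jars from example/lib/ext) needs to be on the webapp's classpath, per the wiki instructions above.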
Re: ACL implementation: Pseudo-join performance Atomic Updates
Erick, I wasn't sure whether this issue was important, so I wanted to solicit some feedback first. You and Otis expressed interest, and I could create the JIRA - however, as Alexandre points out, SOLR-1913 seems similar (actually, closer to Otis's request to have the elasticsearch-style named filter). But SOLR-1913 was created in 2010 and is not integrated yet, so I am wondering whether this new feature (somewhat overlapping with, but still different from, SOLR-1913) is something people would really want, and whether the effort on the JIRA would be well spent. What's your view? Thanks, roman

On Tue, Jul 16, 2013 at 8:23 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Is that this one: https://issues.apache.org/jira/browse/SOLR-1913 ? Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Tue, Jul 16, 2013 at 8:01 AM, Erick Erickson erickerick...@gmail.com wrote: Roman: Did this ever make it into a JIRA? Somehow I missed it if it did, and this would be pretty cool. Erick

On Mon, Jul 15, 2013 at 6:52 PM, Roman Chyla roman.ch...@gmail.com wrote: On Sun, Jul 14, 2013 at 1:45 PM, Oleg Burlaca oburl...@gmail.com wrote: Hello Erick,

"Join performance is most sensitive to the number of values in the field being joined on. So if you have lots and lots of distinct values in the corpus, join performance will be affected." Yep, we have a list of unique IDs that we get by first searching for records where loggedInUser IS IN (userIDs). This corpus is stored in memory, I suppose? (Not a problem.) And then the bottleneck is to match this huge set against the core where I'm searching? Somewhere in the mailing list archive people were talking about an external list of Solr unique IDs, but I didn't find whether there is a solution. Back in 2010 Yonik posted a comment: http://find.searchhub.org/document/363a4952446b3cd#363a4952446b3cd

Sorry, I don't have the previous thread in its entirety, but a few weeks back Yonik's proposal got implemented, it seems ;) http://search-lucene.com/m/Fa3Dg14mqoj/bitsetsubj=Re+Solr+large+boolean+filter You could use this to send a very large bitset filter (which can be translated into any integers, if you can come up with a mapping function). roman

bq: I suppose the delete/reindex approach will not change soon There is ongoing work (search the JIRA for Stacked Segments) Ah, ok, I was feeling it affects the architecture; ok, so for now the only hope is pseudo-joins ))

"One way to deal with this is to implement a post filter, sometimes called a no cache filter." Thanks, will have a look, but as you describe it, it's not the best option. The "too many documents, man. Please refine your query. Partial results below" approach means faceting will not work correctly? ...

I have in mind a hybrid approach, comments welcome: Most of the time users are not searching but browsing content, so our virtual filesystem stored in Solr will use only the index with the ID of the file and the list of users that have access to it, i.e. not touching the fulltext index at all. Files may have metadata (EXIF info for images, for example) that we'd like to filter by and calculate facets on; the metadata will be stored in both indexes. In case of a fulltext query:

1. search the FT index (the fulltext index), get only the number of search results, let it be Rf
2. search the DAC index (the index with permissions), get the number of search results, let it be Rd

Let maxR be the maximum size of the corpus for the pseudo-join. (That was actually my question: what is a reasonable number? 10, 100, 1000?) If (Rf < maxR) or (Rd < maxR), then use the smaller corpus to join onto the second one. This happens when (only a few documents contain the search query) OR (the user has access to a small number of files). In case neither of these happens, we can use the "too many documents, man. Please refine your query. Partial results below" approach, but searching the FT index first, because we want relevant results first. What do you think? Regards, Oleg

On Sun, Jul 14, 2013 at 7:42 PM, Erick Erickson erickerick...@gmail.com wrote: Join performance is most sensitive to the number of values in the field being joined on. So if you have lots and lots of distinct values in the corpus, join performance will be affected. bq: I suppose the delete/reindex approach will not change soon There is ongoing work (search the JIRA for Stacked Segments) on actually doing something about this, but it's been under consideration for at least 3 years, so your guess is as good as
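Oleg's branching logic above can be summarized in a few lines. A hypothetical sketch, where the function name, the return labels, and the maxR threshold are all illustrative:

```python
def join_plan(rf, rd, max_r):
    """Decide which index drives the pseudo-join, per the hybrid approach:
    rf = hit count in the fulltext (FT) index,
    rd = hit count in the permissions (DAC) index,
    max_r = largest corpus we are willing to feed into the join."""
    if rf <= max_r or rd <= max_r:
        # Join from the smaller side onto the larger one.
        return "join_from_fulltext" if rf <= rd else "join_from_acl"
    # Neither side is small enough: fall back to partial results.
    return "refuse_and_ask_to_refine"
```

The interesting open question from the thread remains what a reasonable max_r is in practice.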
Re: Live reload
Are you using synonyms during indexing or during query only? If during indexing, the reload by itself will not change what was stored - you need to fully reindex as well. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Tue, Jul 16, 2013 at 7:46 AM, O. Klein kl...@octoweb.nl wrote: I used the reload command to apply changes in synonyms.txt, for example, but with the new mechanism https://wiki.apache.org/solr/CoreAdmin#LiveReload this will not work anymore. Is there another way to reload config files instead of restarting Solr? -- View this message in context: http://lucene.472066.n3.nabble.com/Live-reload-tp4078318.html
Usage of luceneMatchVersion when upgrading from solr 3.6 to solr 4.3
Hi, We are upgrading Solr from 3.6 to 4.3, but we have a large amount of indexed data and cannot afford to reindex it all at once. We wish Solr 4.3 could do the following: 1/ still be able to search on data indexed by Solr 3.6, 2/ whenever indexing a new document, convert it to the 4.3 format (this may not happen all at once). In this case, should we use LUCENE_36 or LUCENE_43 for luceneMatchVersion? (It is suggested that we should reindex all data if using LUCENE_43, so I think we should use LUCENE_36, since we cannot reindex all at once - true?) Thanks very much for your help, Lisheng
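For reference, luceneMatchVersion is set in solrconfig.xml; pinning it to the old behavior would look like this (a config sketch, not a recommendation either way):

```xml
<luceneMatchVersion>LUCENE_36</luceneMatchVersion>
```

As far as I know, luceneMatchVersion mainly controls backwards-compatible behavior of analysis components; the index file format is handled separately (Lucene 4.x can read 3.x indexes, and segments get rewritten in the new format as they are merged).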
Re: Live reload
My bad. I did some more testing as well and could not replicate the behavior. Reloading synonyms works fine with a core reload.

Chris Hostetter-3 wrote: [...]

-- View this message in context: http://lucene.472066.n3.nabble.com/Live-reload-tp4078318p4078400.html
Re: Range query on a substring.
Well, I think this is slightly too categorical - a range query on a substring can be thought of as a simple range query. So, for example the following query: lucene 1* becomes behind the scenes: lucene (10|11|12|13|14|1abcd) the issue there is that it is a string range, but it is a range query - it just has to be indexed in a clever way So, Marcin, you still have quite a few options besides the strict boolean query model 1. have a special tokenizer chain which creates one token out of these groups (eg. some text prefix_1) and search for some text prefix_* [and do some post-filtering if necessary] 2. another version, using regex /some text (1|2|3...)/ - you got the idea 3. construct the lucene multi-term range query automatically, in your qparser - to produce a phrase query lucene (10|11|12|13|14) 4. use payloads to index your integer at the position of some text and then retrieve only some text where the payload is in range x-y - an example is here, look at getPayloadQuery() https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java- but this is more complex situation and if you google, you will find a better description 5. use a qparser that is able to handle nested search and analysis at the same time - eg. your query is: field:some text NEAR1 field:[0 TO 10] - i know about a parser that can handle this and i invite others to check it out (yeah, JIRA tickets need reviewers ;-)) https://issues.apache.org/jira/browse/LUCENE-5014 there might be others i forgot, but it is certainly doable; but as Jack points out, you may want to stop for a moment to reflect whether it is necessary HTH, roman On Tue, Jul 16, 2013 at 8:35 AM, Jack Krupansky j...@basetechnology.comwrote: Sorry, but you are basically misusing Solr (and multivalued fields), trying to take a shortcut to avoid a proper data model. 
To properly use Solr, you need to put each of these multivalued field values in a separate Solr document, with a text field and a value field. Then, you can query: text:some text AND value:[min-value TO max-value] Exactly how you should restructure your data model is dependent on all of your other requirements. You may be able to simply flatten your data. You may be able to use a simple join operation. Or, maybe you need to do a multi-step query operation if you data is sufficiently complex. If you want to keep your multivalued field in its current form for display purposes or keyword search, or exact match search, fine, but your stated goal is inconsistent with the Semantics of Solr and Lucene. To be crystal clear, there is no such thing as a range query on a substring in Solr or Lucene. -- Jack Krupansky -Original Message- From: Marcin Rzewucki Sent: Tuesday, July 16, 2013 5:13 AM To: solr-user@lucene.apache.org Subject: Re: Range query on a substring. By multivalued I meant an array of values. For example: arr name=myfield strtext1 (X)/str strtext2 (Y)/str /arr I'd like to avoid spliting it as you propose. I have 2.3mn collection with pretty large records (few hundreds fields and more per record). Duplicating them would impact performance. Regards. On 16 July 2013 10:26, Oleg Burlaca oburl...@gmail.com wrote: Ah, you mean something like this: record: Id=10, text = this is a text N1 (X), another text N2 (Y), text N3 (Z) Id=11, text = this is a text N1 (W), another text N2 (Q), third text (M) and you need to search for: text N1 and X B ? How big is the core? the first thing that comes to my mind, again, at indexing level, split the text into pieces and index it in solr like this: record_id | text | value 10 | text N1 | X 10 | text N2 | Y 10 | text N3 | Z does it help? On Tue, Jul 16, 2013 at 10:51 AM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi Oleg, It's a multivalued field and it won't be easier to query when I split this field into text and numbers. 
I may get wrong results. Regards. On 16 July 2013 09:35, Oleg Burlaca oburl...@gmail.com wrote: IMHO the number(s) should be extracted and stored in separate columns in SOLR at indexing time. -- Oleg On Tue, Jul 16, 2013 at 10:12 AM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi, I have a problem (and wonder if it is possible to solve it at all) with the following query. There are documents with a field which contains text and a number in brackets, e.g. myfield: "this is a text (number)". There might be other documents with the same text but a different number in brackets. I'd like to find documents with the given text, say "this is a text", and a number between A and B. Is it possible in Solr? Any ideas? Kind regards.
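The flattening that Jack and Oleg describe can be sketched as a small pre-indexing transform. This is a minimal illustration, not anything from the thread: the child-document field names (id, parent_id, text, value) are assumptions, and it assumes every value follows the "some text (number)" pattern:

```python
import re

def flatten(doc_id, myfield_values):
    """Split each 'some text (number)' value into its own child document
    so text and number become separately queryable fields in Solr."""
    docs = []
    for i, value in enumerate(myfield_values):
        m = re.match(r"^(.*)\((\d+)\)\s*$", value)
        if not m:
            continue  # skip values that don't follow the pattern
        docs.append({
            "id": f"{doc_id}_{i}",       # synthetic child id
            "parent_id": doc_id,          # link back to the original record
            "text": m.group(1).strip(),   # the text part, keyword-searchable
            "value": int(m.group(2)),     # the number, range-queryable
        })
    return docs
```

Once indexed this way, the query Jack gives - text:"some text" AND value:[min TO max] - works against the child documents, at the cost of the duplication Marcin wants to avoid.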
Re: Solr is not responding on deployment in tomcat
Thanks Erick, I've configured both to use 8080 (for Wicket this is standard :-)). Do I have to assign a different port to Solr if I use both webapps in the same container? Btw. the context path for my Wicket app is /* - could that be a problem too? Per

Am 15.07.2013 17:12, schrieb Erick Erickson: Sounds like Wicket and Solr are using the same port(s)... If you start Wicket first then look at the Solr logs, you might see some message about "port already in use" or some such. If this is SolrCloud, there are also the ZooKeeper ports to wonder about. Best, Erick

On Mon, Jul 15, 2013 at 6:49 AM, Per Newgro per.new...@gmx.ch wrote: Hi, maybe someone here can help me with my solr-4.3.1 issue. I've successfully deployed the solr.war on a Tomcat 7 instance. Starting the Tomcat with only the solr.war deployed works nicely: I can see the admin interface and the logs are clean. If I deploy my wicket-spring-data-solr based app (using HttpSolrServer) after the Solr app, without restarting the Tomcat, all is fine too. I've implemented a ping to see if the server is up:

    private void waitUntilSolrIsAvailable(int i) {
        if (i == 0) {
            logger.info("Check solr state...");
        }
        if (i > 5) {
            throw new RuntimeException("Solr is not available after more than 25 secs. Going down now.");
        }
        if (i > 0) {
            try {
                logger.info("Wait for solr to get alive.");
                Thread.sleep(5000); // plain delay; wait() would need a held monitor
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        try {
            i++;
            SolrPingResponse r = solrServer.ping();
            if (r.getStatus() > 0) {
                waitUntilSolrIsAvailable(i);
            }
            logger.info("Solr is alive.");
        } catch (SolrServerException | IOException e) {
            throw new RuntimeException(e);
        }
    }

Here I can see the log:

54295 [localhost-startStop-2] INFO org.apache.wicket.Application – [wicket.project] init: Wicket extensions initializer
INFO - 2013-07-15 12:07:45.261; de.company.service.SolrServerInitializationService; Check solr state...
54505 [localhost-startStop-2] INFO de.company.service.SolrServerInitializationService – Check solr state...
INFO - 2013-07-15 12:07:45.768; org.apache.solr.core.SolrCore; [collection1] webapp=/solr path=/admin/ping params={wt=javabin&version=2} hits=0 status=0 QTime=20
55012 [http-bio-8080-exec-1] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/admin/ping params={wt=javabin&version=2} hits=0 status=0 QTime=20
INFO - 2013-07-15 12:07:45.770; org.apache.solr.core.SolrCore; [collection1] webapp=/solr path=/admin/ping params={wt=javabin&version=2} status=0 QTime=22
55014 [http-bio-8080-exec-1] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/admin/ping params={wt=javabin&version=2} status=0 QTime=22
INFO - 2013-07-15 12:07:45.854; de.company.service.SolrServerInitializationService; Solr is alive.
55098 [localhost-startStop-2] INFO de.company.service.SolrServerInitializationService – Solr is alive.

But if I restart the Tomcat with both webapps (Solr and Wicket), Solr is not responding to the ping request:

INFO - 2013-07-15 12:02:27.634; org.apache.wicket.Application; [wicket.project] init: Wicket extensions initializer
11932 [localhost-startStop-1] INFO org.apache.wicket.Application – [wicket.project] init: Wicket extensions initializer
INFO - 2013-07-15 12:02:27.787; de.company.service.SolrServerInitializationService; Check solr state...
12085 [localhost-startStop-1] INFO de.company.service.SolrServerInitializationService – Check solr state...

What could that be, or how can I get info on where this is stopping? Thanks for your support. Per
Re: solr 4.3.1 Installation
Thanks Sandeep, that fixed it. Regards, Sujatha

On Tue, Jul 16, 2013 at 10:41 PM, Sandeep Gupta gupta...@gmail.com wrote: This problem looks to me to be because of Solr logging... see the detailed description below (taken from one of the mail threads): Solr 4.3.0 and later does not have ANY slf4j jarfiles in the .war file, so you need to put them in your classpath. Jarfiles are included in the example, in example/lib/ext, and those jarfiles set up logging to use log4j, a much more flexible logging framework than JDK logging. JDK logging is typically set up with a file called logging.properties, which I think you must use a system property to configure. You aren't using JDK logging, you are using log4j, which uses a file called log4j.properties. http://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty

On Tue, Jul 16, 2013 at 6:28 PM, Sujatha Arun suja.a...@gmail.com wrote: Hi, We have been using solr 3.6.1. Recently we downloaded solr 4.3.1 and installed it as a multicore setup as follows.

Folder structure:
solr.war
solr
conf
core0
core1
solr.xml

Created the context fragment xml file in tomcat/conf/catalina/localhost which refers to the solr.war file and the solr home folder, and copied the multicore conf folder without the zoo.cfg file. I get the following error and the admin page does not load:

16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start
SEVERE: Error filterStart
16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start
SEVERE: Context [/solr_4.3.1] startup failed due to previous errors
16 Jul, 2013 11:36:39 PM org.apache.catalina.startup.HostConfig checkResources
INFO: Undeploying context [/solr_4.3.1]
16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start
SEVERE: Error filterStart
16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start
SEVERE: Context [/solr_4.3.1] startup failed due to previous errors

Please let me know what I am missing if I need to install this with the default multicore setup without the cloud. Thanks. Regards, Sujatha
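For reference, the log4j setup that the linked wiki page describes boils down to a small log4j.properties on the webapp classpath. A minimal sketch - the log path and rotation sizes below are placeholders, not values from the thread:

```properties
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{1} - %m%n
```

Together with the slf4j/log4j jars from example/lib/ext copied into the container's classpath (e.g. tomcat/lib), this replaces the JDK logging that Solr 4.3+ no longer bundles.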
Re: [solr 3.4.1] collections: meaning and necessity
Thanks Alexandre, Well, the initial question was whether it is possible to altogether avoid dealing with collections (extra layer, longer URL). But it seems this is an internal new feature of the Solr 4 generation. In Solr 3 it was just a core, which could be avoided if no solr.xml was found. With this release my Solr terminology has transformed into having some ambiguous words (collection and core) referring to the same thing. I'm not even sure what a shard is nowadays :)

On Tue, Jul 16, 2013 at 3:57 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: If you only have one collection and no SolrCloud, then don't use solr.xml at all. It will automatically assume 'collection1' as a name. If you do want to have some control (shards, etc.), do not include the optional parameters you do not need. See the example here: http://my.safaribooksonline.com/book/databases/9781782164845/1dot-instant-apache-solr-for-indexing-data-how-to/ch01s02_html You don't even need the defaultCoreName attribute, if you are happy to always include the core name in the URL. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Tue, Jul 16, 2013 at 7:28 AM, Dmitry Kan solrexp...@gmail.com wrote: Sorry, hit send too fast.. picking up: from the answer by Jayendra on the link, collections and cores are the same thing. The same is seconded by the config:

<cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:15000}">
  <core name="collection1" instanceDir="." />
</cores>

we basically define cores. We have a plain {frontend_solr, shards} setup with solr 3.4 and were thinking of starting off with it initially in solr 4. In solr 4, can one get by without using collections = cores? We also don't plan on using SolrCloud at the moment. So from our standpoint the solr4 configuration looks more complicated than that of solr 3.4. Are there any benefits of such a setup for non-SolrCloud users? Thanks, Dmitry

On Tue, Jul 16, 2013 at 2:24 PM, Dmitry Kan solrexp...@gmail.com wrote: Hello list, Following the answer by Jayendra here: http://stackoverflow.com/questions/14516279/how-to-add-collections-to-solr-core
Re: How to use joins in solr 4.3.1
Looks like the JoinQParserPlugin is throwing an NPE. Query: localhost:8983/solr/location/select?q=*:*&fq={!join from=key to=merchantId fromIndex=merchant}

84343345 [qtp2012387303-16] ERROR org.apache.solr.core.SolrCore – java.lang.NullPointerException
at org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:580)
at org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:50)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1274)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:662)

84343350 [qtp2012387303-16] INFO org.apache.solr.core.SolrCore – [location] webapp=/solr path=/select params={distrib=false&wt=javabin&version=2&rows=10&df=allText&fl=key,score&shard.url=x:8983/solr/location/&NOW=1373999694930&start=0&q=*:*&_=1373999505886&isShard=true&fq={!join+from%3Dkey+to%3DmerchantId+fromIndex%3Dmerchant}&fsv=true} status=500 QTime=6
84343351 [qtp2012387303-16] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.NullPointerException
at org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:580)
at org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:50)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1274)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
Re: [solr 3.4.1] collections: meaning and necessity
Search this mailing list and you will find a very long discussion about the terminology and the confusion around it. My contribution to that was a crude picture trying to explain it: http://bit.ly/1aqohUf - maybe it will help. If you don't want the longer URL, do use solr.xml and use the @adminPath and @defaultCoreName parameters. But you don't need the rest. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Tue, Jul 16, 2013 at 2:30 PM, Dmitry Kan solrexp...@gmail.com wrote: Thanks Alexandre, Well, the initial question was whether it is possible to altogether avoid dealing with collections (extra layer, longer URL). But it seems this is an internal new feature of the Solr 4 generation. In Solr 3 it was just a core, which could be avoided if no solr.xml was found. With this release my Solr terminology has transformed into having some ambiguous words (collection and core) referring to the same thing. I'm not even sure what a shard is nowadays :)

On Tue, Jul 16, 2013 at 3:57 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: If you only have one collection and no SolrCloud, then don't use solr.xml at all. It will automatically assume 'collection1' as a name. If you do want to have some control (shards, etc.), do not include the optional parameters you do not need. See the example here: http://my.safaribooksonline.com/book/databases/9781782164845/1dot-instant-apache-solr-for-indexing-data-how-to/ch01s02_html You don't even need the defaultCoreName attribute, if you are happy to always include the core name in the URL. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working.
(Anonymous - via GTD book) On Tue, Jul 16, 2013 at 7:28 AM, Dmitry Kan solrexp...@gmail.com wrote: Sorry, hit send too fast.. picking up: from the answer by Jayendra on the link, collections and cores are the same thing. Same is seconded by the config: cores adminPath=/admin/cores defaultCoreName=collection1 host=${host:} hostPort=${jetty.port:8983} hostContext=${hostContext:solr} zkClientTimeout=${zkClientTimeout:15000} core name=collection1 instanceDir=. / /cores we basically define cores. We have a plain {frontend_solr, shards} setup with solr 3.4 and were thinking of starting off with it initially in solr 4. In solr 4: can one get by without using collections = cores? We also don't plan on using SolrCloud at the moment. So from our standpoint the solr4 configuration looks more complicated, than that of solr 3.4. Are there any benefits of such a setup for non SolrCloud users? Thanks, Dmitry On Tue, Jul 16, 2013 at 2:24 PM, Dmitry Kan solrexp...@gmail.com wrote: Hello list, Following the answer by Jaendra here: http://stackoverflow.com/questions/14516279/how-to-add-collections-to-solr-core
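A minimal non-SolrCloud solr.xml along the lines Alexandre describes might look like this. This is a sketch, not from the thread: the persistent flag and the instanceDir value depend on your layout:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```

With defaultCoreName set, requests that omit the core name should be routed to that core, so the shorter Solr 3 style URLs keep working.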
Re: How to use joins in solr 4.3.1
Found this post: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201302.mbox/%3CCAB_8Yd82aqq=oY6dBRmVjG7gvBBewmkZGF9V=fpne4xgkbu...@mail.gmail.com%3E And based on the answer, I modified my query: localhost:8983/solr/location/select?fq={!join from=key to=merchantId fromIndex=merchant}*:* I don't see any errors, but my original problem still persists: no documents are returned. The two fields on which I am trying to join are:

Merchant: <field name="merchantId" type="string" indexed="true" stored="true" multiValued="false" />
Location: <field name="merchantId" type="string" indexed="false" stored="true" multiValued="false" />

Thanks, -Utkarsh

On Tue, Jul 16, 2013 at 11:39 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Looks like the JoinQParserPlugin is throwing an NPE. Query: localhost:8983/solr/location/select?q=*:*&fq={!join from=key to=merchantId fromIndex=merchant} 84343345 [qtp2012387303-16] ERROR org.apache.solr.core.SolrCore – java.lang.NullPointerException at org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:580) at org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:50) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1274) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:365) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:662) 
84343350 [qtp2012387303-16] INFO org.apache.solr.core.SolrCore – [location] webapp=/solr path=/select params={distrib=falsewt=javabinversion=2rows=10df=allTextfl=key,scoreshard.url=x:8983/solr/location/NOW=1373999694930start=0q=*:*_=1373999505886isShard=truefq={!join+from%3Dkey+to%3DmerchantId+fromIndex%3Dmerchant}fsv=true} status=500 QTime=6 84343351 [qtp2012387303-16] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.NullPointerException at org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:580) at org.apache.solr.search.QueryResultKey.init(QueryResultKey.java:50) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1274) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457) at
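When composing a join request like the ones in this thread programmatically, it can help to let a URL library handle the parameter separators and the escaping of the {!join} local params. A sketch using Python's urllib - the host, core, and field names are taken from the thread, the wt parameter is an added assumption:

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",
    # join filter: keep location docs whose merchantId appears as "key"
    # in the "merchant" index; the trailing *:* is the join's inner query
    "fq": "{!join from=key to=merchantId fromIndex=merchant}*:*",
    "wt": "json",
}
url = "http://localhost:8983/solr/location/select?" + urlencode(params)
```

urlencode percent-encodes the braces, bang, and = signs inside the local-params block, which is easy to get wrong when pasting URLs by hand.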
Re: [solr 3.4.1] collections: meaning and necessity
On 7/16/2013 12:41 PM, Alexandre Rafalovitch wrote: Search this mailing list and you will find a very long discussion about the terminology and confusion around it.My contribution to that was the crude picture trying to explain it: http://bit.ly/1aqohUf . Maybe it will help. If you don't want longer URL, do use solr.xml and use @adminPath and @defaultCoreName parameters. But you don't need the rest. I'm relatively sure that defaultCoreName isn't there if you use the new core discovery mode that is default in the 4.4 example. This new mode will be the only option in 5.0. The old mode will continue to be supported throughout all 4.x versions. I think getting rid of defaultCoreName is the right move - Solr has been multicore in the standard example for quite some time. Accessing Solr without a corename in the URL is a source of confusion for users when they venture outside the collection1 core that comes with the default example. IMHO, the additional capability and confusion inherent with SolrCloud makes it even more important that the user include a collection/core name when making their request. Thanks, Shawn
SolrCloud Zookeeper SaslClient
Hi, Is there any documentation on how to configure the SolrCloud ZooKeeper connection to use SASL (on JBoss 5)? When I start SolrCloud on JBoss 5 I see this WARN:

2013-07-16 21:38:17,425 INFO [org.apache.solr.common.cloud.ConnectionManager:157] (main) Waiting for client to connect to ZooKeeper
2013-07-16 21:38:17,437 WARN [org.apache.zookeeper.client.ZooKeeperSaslClient:437] (main-SendThread(localhost:2181)) Could not login: the client is being asked for a password, but the Zookeeper client code does not currently support obtaining a password from the user. Make sure that the client is configured to use a ticket cache (using the JAAS configuration setting 'useTicketCache=true)' and restart the client. If you still get this message after that, the TGT in the ticket cache has expired and must be manually refreshed. To do so, first determine if you are using a password or a keytab. If the former, run kinit in a Unix shell in the environment of the user who is running this Zookeeper client using the command 'kinit <princ>' (where <princ> is the name of the client's Kerberos principal). If the latter, do 'kinit -k -t <keytab> <princ>' (where <princ> is the name of the Kerberos principal, and <keytab> is the location of the keytab file). After manually refreshing your cache, restart this client. If you continue to see this message after manually refreshing your cache, ensure that your KDC host's clock is in sync with this host's clock.
2013-07-16 21:38:17,438 WARN [org.apache.zookeeper.ClientCnxn:949] (main-SendThread(localhost:2181)) SASL configuration failed: javax.security.auth.login.FailedLoginException: Password Incorrect/Password Required. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.

Any example or tutorial? I'd like to configure it to be secured :-) Kowish -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Zookeeper-SaslClient-tp4078447.html Sent from the Solr - User mailing list archive at Nabble.com.
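If full Kerberos is not a requirement, one option worth noting is ZooKeeper's simpler digest-based SASL mechanism. On the client side it is wired up through a JAAS file passed to the JVM with -Djava.security.auth.login.config=/path/to/jaas.conf. A hedged sketch - the credentials below are placeholders and must match what the ZooKeeper server is configured to accept:

```
Client {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  username="zkclient"
  password="changeme";
};
```

The warning in the log above is the Kerberos path failing; with a Client section like this present, the ZooKeeper client should attempt digest SASL instead.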
JVM Crashed - SOLR deployed in Tomcat
Hello Everyone, We are using solrcloud with Tomcat in our production environment. Here is our configuration. solr-4.0.0 JVM 1.6.0_25 The JVM keeps crashing everyday with the following error. I think it is happening while we try index the data with solrj APIs. INFO: [aq-core] webapp=/solr path=/update params={distrib.from=http://solr03-prod:8080/solr/aq-core/update.distrib=TOLEADERwt=javabinversion=2} status=0 QTime=1 # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0xfd7ffadac771, pid=2411, tid=33662 # # JRE version: 6.0_25-b06 # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.0-b11 mixed mode solaris-amd64 compressed oops) # Problematic frame: # J org.apache.lucene.codecs.PostingsConsumer.merge(Lorg/apache/lucene/index/MergeState;Lorg/apache/lucene/index/DocsEnum;Lorg/apache/lucene/util/FixedBitSet;)Lorg/apache/lucene/codecs/TermStats; # # An error report file with more information is saved as: # /opt/tomcat/hs_err_pid2411.log Jul 16, 2013 6:27:07 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false} # # If you would like to submit a bug report, please visit: # http://java.sun.com/webapps/bugreport/crash.jsp Instructions: (pc=0xfd7ffadac771) 0xfd7ffadac751: 89 4c 24 30 4c 89 44 24 28 4c 89 54 24 18 44 89 0xfd7ffadac761: 5c 24 20 4c 8b 57 10 4d 63 d9 49 8b ca 49 03 cb 0xfd7ffadac771: 44 0f be 01 45 8b d9 41 ff c3 44 89 5f 18 45 85 0xfd7ffadac781: c0 0f 8c b0 05 00 00 45 8b d0 45 8b da 41 d1 eb Register to memory mapping: RAX=0x14008cf2 is an unknown value RBX= [error occurred during error reporting (printing register info), id 0xb] Stack: [0xfd7de4eff000,0xfd7de4fff000], sp=0xfd7de4ffe140, free space=1020k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) J 
org.apache.lucene.codecs.PostingsConsumer.merge(Lorg/apache/lucene/index/MergeState;Lorg/apache/lucene/index/DocsEnum;Lorg/apache/lucene/util/FixedBitSet;)Lorg/apache/lucene/codecs/TermStats; Please let me know if anyone has seen this before. Any input is appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/JVM-Crashed-SOLR-deployed-in-Tomcat-tp4078439.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [solr 3.4.1] collections: meaning and necessity
Thanks Alexandre, I think I have followed that discussion, there was another one AFAIR on the dev list. On your diagram, am I guessing it correctly, that shard1 and shard2 inside a collection would at least share the same schema? On Tue, Jul 16, 2013 at 9:41 PM, Alexandre Rafalovitch arafa...@gmail.comwrote: Search this mailing list and you will find a very long discussion about the terminology and confusion around it.My contribution to that was the crude picture trying to explain it: http://bit.ly/1aqohUf . Maybe it will help. If you don't want longer URL, do use solr.xml and use @adminPath and @defaultCoreName parameters. But you don't need the rest. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Jul 16, 2013 at 2:30 PM, Dmitry Kan solrexp...@gmail.com wrote: Thanks Alexandre, Well, the initial question was, whether it is possible to altogether avoid dealing with collections (extra layer, longer url). But it seems this is an internal new feature of solr 4 generation. In solr 3 it was just a core, which could be avoided if no solr.xml was found. With this release my solr terminology has transformed into having some ambiguous words (collection and core) referring to the same thing. I'm not even sure, what shard is nowadays :) On Tue, Jul 16, 2013 at 3:57 PM, Alexandre Rafalovitch arafa...@gmail.comwrote: If you only have one collection and no Solr cloud, then don't use solr.xml at all. It will automatically assume 'collection1' as a name. If you do want to have some control (shards, etc), do not include the optional parameters you do not need. 
See example here: http://my.safaribooksonline.com/book/databases/9781782164845/1dot-instant-apache-solr-for-indexing-data-how-to/ch01s02_html You don't even need defaultCoreName attribute, if you are happy to always include core name in the URL. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Jul 16, 2013 at 7:28 AM, Dmitry Kan solrexp...@gmail.com wrote: Sorry, hit send too fast.. picking up: from the answer by Jayendra on the link, collections and cores are the same thing. Same is seconded by the config: cores adminPath=/admin/cores defaultCoreName=collection1 host=${host:} hostPort=${jetty.port:8983} hostContext=${hostContext:solr} zkClientTimeout=${zkClientTimeout:15000} core name=collection1 instanceDir=. / /cores we basically define cores. We have a plain {frontend_solr, shards} setup with solr 3.4 and were thinking of starting off with it initially in solr 4. In solr 4: can one get by without using collections = cores? We also don't plan on using SolrCloud at the moment. So from our standpoint the solr4 configuration looks more complicated, than that of solr 3.4. Are there any benefits of such a setup for non SolrCloud users? Thanks, Dmitry On Tue, Jul 16, 2013 at 2:24 PM, Dmitry Kan solrexp...@gmail.com wrote: Hello list, Following the answer by Jaendra here: http://stackoverflow.com/questions/14516279/how-to-add-collections-to-solr-core
Re: Partial Matching in both query and field
I figured it out, for anyone finding this thread: I had to add the following to my solrconfig.xml:

<luceneMatchVersion>LUCENE_31</luceneMatchVersion>

http://www.searchspring.net/ James Bathgate, Sr. Developer, 888.643.9043 ext. 610, http://www.linkedin.com/in/bathgate

On Thu, Jul 11, 2013 at 2:47 PM, James Bathgate ja...@b7interactive.com wrote: 1. My general process for a schema change (I know it's overkill) is: delete the data directory, reload, index data, reload again. 2. I'm using schema version 1.5 on Solr 3.6.2: <schema name="SearchSpringDefault" version="1.5"> 3. LuceneQParser, but I've also tried dismax and edismax. Here's my solrQueryParser field in my schema; I think OR is correct for this: <solrQueryParser defaultOperator="OR"/> James

James Bathgate | Sr. Developer Toll Free (888) 643-9043 x610 - Fax (719) 358-2027 4291 Austin Bluffs Pkwy #206 | Colorado Springs, CO 80918 www.searchspring.net

On Thu, Jul 11, 2013 at 2:29 PM, Jack Krupansky j...@basetechnology.com wrote: A couple of possibilities: 1. Make sure to reload the core. 2. Check that the Solr schema version is new enough to recognize autoGeneratePhraseQueries. 3. What query parser are you using? -- Jack Krupansky

-Original Message- From: James Bathgate Sent: Thursday, July 11, 2013 5:26 PM To: solr-user@lucene.apache.org Subject: Re: Partial Matching in both query and field

I just noticed I pasted the wrong fieldType, with the extra tokenizer not commented out.
<fieldType name="ngram" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="0" replacement="o" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="1|l" replacement="i" replace="all"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="16"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="4" maxGramSize="16"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^A-Za-z0-9]+" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="0" replacement="o" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="1|l" replacement="i" replace="all"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

James Bathgate | Sr. Developer Toll Free (888) 643-9043 x610 - Fax (719) 358-2027 4291 Austin Bluffs Pkwy #206 | Colorado Springs, CO 80918 www.searchspring.net On Thu, Jul 11, 2013 at 2:15 PM, James Bathgate ja...@b7interactive.com wrote: Jack, This still isn't working. I just upgraded to 3.6.2 to verify that wasn't the issue.
Here's the query information:

<lst name="params">
  <str name="debugQuery">on</str>
  <str name="indent">on</str>
  <str name="start">0</str>
  <str name="q">0_extrafield1_n:20454</str>
  <str name="rows">10</str>
  <str name="version">2.2</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="debug">
  <str name="rawquerystring">0_extrafield1_n:20454</str>
  <str name="querystring">0_extrafield1_n:20454</str>
  <str name="parsedquery">PhraseQuery(0_extrafield1_n:"2o45 o454 2o454")</str>
  <str name="parsedquery_toString">0_extrafield1_n:"2o45 o454 2o454"</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>

Here's the applicable part of schema.xml:

<fieldType name="ngram" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
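A side note on the debug output above: the tokens 2o45, o454 and 2o454 in the parsedquery are exactly what the index-side chain produces from 20454 - the PatternReplace filters fold 0 into o and 1/l into i, and the 4-16 NGram stage then emits the grams. A quick Python sketch of that folding (purely illustrative, not Solr's analyzer code):

```python
import re

def normalize(token):
    """Mimic the chain's character folding: lowercase, 0 -> o, 1/l -> i."""
    token = token.lower().replace("0", "o")
    return re.sub("[1l]", "i", token)

def ngrams(token, lo=4, hi=16):
    """Mimic NGramFilterFactory: emit every substring of length lo..hi."""
    return [token[i:i + n]
            for n in range(lo, hi + 1)
            for i in range(len(token) - n + 1)]

grams = ngrams(normalize("20454"))
print(grams)  # ['2o45', 'o454', '2o454'] -- the same tokens shown in parsedquery
```

The query-side chain (NGramTokenizer first, then the same replacements) yields the same three grams for this input, so the zero hits come from how those grams are assembled into a PhraseQuery, which fits the eventual luceneMatchVersion fix rather than the filter chain.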
Re: [solr 3.4.1] collections: meaning and necessity
Hi Shawn, Thanks for your input. Having spent some time today figuring out the path to upgrade, I concluded that we have been using what is (and was in solr 3 and possibly earlier) called a core. A group of two cores (with different schemas) we (probably mistakenly) referred to as a shard. That is, the shard was a larger semantic unit or chunk of data that would repeat itself in configuration along the time axis. Each shard would hold data from a particular time period. What's a bit confusing is that, at least in my vocabulary, a collection is similar to what the word group means. The confusion stems from the fact that in a core config one defines a collection. But, if we imagine a series of cores created with the same schema, they could be united into a group or a collection. Although to me, as a user (if the above explanation holds, of course), a collection is an internal implementation detail. On Tue, Jul 16, 2013 at 9:59 PM, Shawn Heisey s...@elyograg.org wrote: On 7/16/2013 12:41 PM, Alexandre Rafalovitch wrote: Search this mailing list and you will find a very long discussion about the terminology and confusion around it. My contribution to that was the crude picture trying to explain it: http://bit.ly/1aqohUf . Maybe it will help. If you don't want a longer URL, do use solr.xml and use the @adminPath and @defaultCoreName parameters. But you don't need the rest. I'm relatively sure that defaultCoreName isn't there if you use the new core discovery mode that is default in the 4.4 example. This new mode will be the only option in 5.0. The old mode will continue to be supported throughout all 4.x versions. I think getting rid of defaultCoreName is the right move - Solr has been multicore in the standard example for quite some time. Accessing Solr without a corename in the URL is a source of confusion for users when they venture outside the collection1 core that comes with the default example.
IMHO, the additional capability and confusion inherent with SolrCloud make it even more important that the user include a collection/core name when making their request. Thanks, Shawn
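For readers hitting the defaultCoreName question: in the core-discovery mode Shawn describes (the default in the 4.4 example), the <cores> element and its defaultCoreName attribute go away entirely. A rough sketch of that layout (attribute names follow the 4.x example; treat this as illustrative, not a canonical file):

```xml
<!-- solr.xml, core-discovery style: no <cores> element, no defaultCoreName -->
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
  </solrcloud>
</solr>
```

Each core then announces itself with a core.properties file (e.g. a single line, name=collection1) in its instance directory, and the core name must always appear in request URLs.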
Re: JVM Crashed - SOLR deployed in Tomcat
I don't know about JVM crashes specifically, but it is known that the Java 6 JVM had various problems running Solr, including the _20-_30 series of builds. A lot of people use the final JVM release (I think 6u30). On 07/16/2013 12:25 PM, neoman wrote: Hello Everyone, We are using SolrCloud with Tomcat in our production environment. Here is our configuration: solr-4.0.0, JVM 1.6.0_25. The JVM keeps crashing every day with the following error. I think it is happening while we try to index the data with the SolrJ APIs.

INFO: [aq-core] webapp=/solr path=/update params={distrib.from=http://solr03-prod:8080/solr/aq-core/&update.distrib=TOLEADER&wt=javabin&version=2} status=0 QTime=1
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0xfd7ffadac771, pid=2411, tid=33662
#
# JRE version: 6.0_25-b06
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.0-b11 mixed mode solaris-amd64 compressed oops)
# Problematic frame:
# J org.apache.lucene.codecs.PostingsConsumer.merge(Lorg/apache/lucene/index/MergeState;Lorg/apache/lucene/index/DocsEnum;Lorg/apache/lucene/util/FixedBitSet;)Lorg/apache/lucene/codecs/TermStats;
#
# An error report file with more information is saved as:
# /opt/tomcat/hs_err_pid2411.log

Jul 16, 2013 6:27:07 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}

# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp

Instructions: (pc=0xfd7ffadac771)
0xfd7ffadac751: 89 4c 24 30 4c 89 44 24 28 4c 89 54 24 18 44 89
0xfd7ffadac761: 5c 24 20 4c 8b 57 10 4d 63 d9 49 8b ca 49 03 cb
0xfd7ffadac771: 44 0f be 01 45 8b d9 41 ff c3 44 89 5f 18 45 85
0xfd7ffadac781: c0 0f 8c b0 05 00 00 45 8b d0 45 8b da 41 d1 eb

Register to memory mapping: RAX=0x14008cf2 is an unknown value RBX= [error occurred during error reporting (printing register info), id 0xb] Stack: [0xfd7de4eff000,0xfd7de4fff000],
sp=0xfd7de4ffe140, free space=1020k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) J org.apache.lucene.codecs.PostingsConsumer.merge(Lorg/apache/lucene/index/MergeState;Lorg/apache/lucene/index/DocsEnum;Lorg/apache/lucene/util/FixedBitSet;)Lorg/apache/lucene/codecs/TermStats; Please let me know if anyone has seen this before. Any input is appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/JVM-Crashed-SOLR-deployed-in-Tomcat-tp4078439.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Highlighting externally stored text
I'm trying to find a way to best highlight search results even though those results are not stored in my index. Has anyone been successful in reusing the SOLR highlighting logic on non-stored data? I was able to do this by slightly modifying the FastVectorHighlighter so that it returned before computing snippets, instead returning the term match offsets in the FieldPhraseList class. Of course you need to make sure that your files are encoded in such a way that a character always has the same byte width. -- Bryan
Re: Range query on a substring.
Hi guys, First of all, thanks for your response. Jack: Data structure was created some time ago and this is a new requirement in my project. I'm trying to find a solution. I wouldn't like to split multivalued field into N similar records varying in this particular field only. That could impact performance and imply more changes in backend architecture as well. I'd prefer to create yet another collection and use pseudo-joins... Roman: Your ideas seem to be much closer to what I'm looking for. However, the following syntax: text (1|2|3) does not work for me. Are you sure it works like OR inside a regexp ? By the way: Honestly, I have one more requirement for which I would have to extend Solr query syntax. Basically, it should be possible to do some math on few fields and do range query on the result (without indexing it, because a combination of different fields is allowed). I'd like to spend some time on ANTLR and the new way of parsing you mentioned. I will let you know if it was useful for me. Thanks. Kind regards. On 16 July 2013 20:07, Roman Chyla roman.ch...@gmail.com wrote: Well, I think this is slightly too categorical - a range query on a substring can be thought of as a simple range query. So, for example the following query: lucene 1* becomes behind the scenes: lucene (10|11|12|13|14|1abcd) the issue there is that it is a string range, but it is a range query - it just has to be indexed in a clever way So, Marcin, you still have quite a few options besides the strict boolean query model 1. have a special tokenizer chain which creates one token out of these groups (eg. some text prefix_1) and search for some text prefix_* [and do some post-filtering if necessary] 2. another version, using regex /some text (1|2|3...)/ - you got the idea 3. construct the lucene multi-term range query automatically, in your qparser - to produce a phrase query lucene (10|11|12|13|14) 4. 
use payloads to index your integer at the position of some text and then retrieve only some text where the payload is in range x-y - an example is here, look at getPayloadQuery() https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java- but this is more complex situation and if you google, you will find a better description 5. use a qparser that is able to handle nested search and analysis at the same time - eg. your query is: field:some text NEAR1 field:[0 TO 10] - i know about a parser that can handle this and i invite others to check it out (yeah, JIRA tickets need reviewers ;-)) https://issues.apache.org/jira/browse/LUCENE-5014 there might be others i forgot, but it is certainly doable; but as Jack points out, you may want to stop for a moment to reflect whether it is necessary HTH, roman On Tue, Jul 16, 2013 at 8:35 AM, Jack Krupansky j...@basetechnology.com wrote: Sorry, but you are basically misusing Solr (and multivalued fields), trying to take a shortcut to avoid a proper data model. To properly use Solr, you need to put each of these multivalued field values in a separate Solr document, with a text field and a value field. Then, you can query: text:some text AND value:[min-value TO max-value] Exactly how you should restructure your data model is dependent on all of your other requirements. You may be able to simply flatten your data. You may be able to use a simple join operation. Or, maybe you need to do a multi-step query operation if you data is sufficiently complex. If you want to keep your multivalued field in its current form for display purposes or keyword search, or exact match search, fine, but your stated goal is inconsistent with the Semantics of Solr and Lucene. To be crystal clear, there is no such thing as a range query on a substring in Solr or Lucene. 
-- Jack Krupansky -Original Message- From: Marcin Rzewucki Sent: Tuesday, July 16, 2013 5:13 AM To: solr-user@lucene.apache.org Subject: Re: Range query on a substring. By multivalued I meant an array of values. For example: <arr name="myfield"> <str>text1 (X)</str> <str>text2 (Y)</str> </arr> I'd like to avoid splitting it as you propose. I have a 2.3mn collection with pretty large records (a few hundred fields and more per record). Duplicating them would impact performance. Regards. On 16 July 2013 10:26, Oleg Burlaca oburl...@gmail.com wrote: Ah, you mean something like this: record: Id=10, text = this is a text N1 (X), another text N2 (Y), text N3 (Z) Id=11, text = this is a text N1 (W), another text N2 (Q), third text (M) and you need to search for: text N1 and X B ? How big is the core? The first thing that comes to my mind, again, at indexing level: split the text into pieces and index it in Solr like this: record_id | text | value 10 | text N1 | X 10 | text N2 | Y 10 | text
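Jack's flattening advice can be made concrete with a small pre-indexing transform. A Python sketch, assuming the values look like the arr example above, i.e. "some text (VALUE)"; the record_id/text/value field names are hypothetical:

```python
import re

# Assumed value shape: "some text (VALUE)" -- e.g. "text1 (X)"
VALUE_RE = re.compile(r"^(.*)\s+\((.*)\)$")

def flatten(doc):
    """Turn one multivalued record into child docs with separate text/value fields."""
    out = []
    for raw in doc["myfield"]:
        m = VALUE_RE.match(raw)
        if m:
            out.append({"record_id": doc["id"],
                        "text": m.group(1),
                        "value": m.group(2)})
    return out

docs = flatten({"id": 10, "myfield": ["text1 (X)", "text2 (Y)"]})
# -> two child docs: ("text1", "X") and ("text2", "Y")
```

With the values split out this way, the query Jack suggests becomes possible directly (text:"text1" AND value:[min TO max]), optionally joining back to the parent record on record_id.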
Re: Clearing old nodes from zookeper without restarting solrcloud cluster
Unloading a core is the known way to unregister a Solr node in ZooKeeper (so it is not used for further querying). It works for me. If you don't do it this way, unused nodes may remain in the cluster state and Solr may try to use them without success. I'd suggest starting a machine with the old name, running Solr, joining the cluster for a while, unloading a core to unregister it from the cluster, and shutting the host down at the end. This way you get a clean cluster state. On 16 July 2013 14:41, Luis Carlos Guerrero Covo lcguerreroc...@gmail.com wrote: Thanks, I was actually asking about deleting nodes from the cluster state, not cores, unless you can unload cores specific to an already offline node from zookeeper. On Tue, Jul 16, 2013 at 1:55 AM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi, You should use the CoreAdmin API (or the Solr Admin page) and UNLOAD unneeded cores. This will unregister them from ZooKeeper (the cluster state will be updated), so they won't be used for querying any longer. A SolrCloud restart is not needed in this case. Regards. On 16 July 2013 06:18, Ali, Saqib docbook@gmail.com wrote: Hello Luis, I don't think that is possible. If you delete clusterstate.json from zookeeper, you will need to restart the nodes.. I could be very wrong about this Saqib On Mon, Jul 15, 2013 at 8:50 PM, Luis Carlos Guerrero Covo lcguerreroc...@gmail.com wrote: I know that you can clear zookeeper's data directory using the CLI with the clear command, I just want to know if it's possible to update the cluster's state without wiping everything out. Anyone have any ideas/suggestions? On Mon, Jul 15, 2013 at 11:21 AM, Luis Carlos Guerrero Covo lcguerreroc...@gmail.com wrote: Hi, Is there an easy way to clear zookeeper of all offline solr nodes without restarting the cluster? We are having some stability issues and we think it may be due to the leader querying old offline nodes. thank you, Luis Guerrero -- Luis Carlos Guerrero Covo M.S.
Computer Engineering (57) 3183542047 -- Luis Carlos Guerrero Covo M.S. Computer Engineering (57) 3183542047
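For scripting the cleanup described above: UNLOAD is an ordinary CoreAdmin HTTP call. A minimal Python helper for building it (the host and core name are placeholders):

```python
from urllib.parse import urlencode

def unload_core_url(host_port, core):
    """Build the CoreAdmin UNLOAD request that unregisters a core from the cluster state."""
    return "http://%s/solr/admin/cores?%s" % (
        host_port, urlencode({"action": "UNLOAD", "core": core}))

url = unload_core_url("oldnode:8983", "collection1")
print(url)  # http://oldnode:8983/solr/admin/cores?action=UNLOAD&core=collection1
```

Issuing that URL against the node that still hosts the core performs the unregister step Marcin describes; no cluster restart is needed.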
Searching w/explicit Multi-Word Synonym Expansion
Hi Everyone, I'm using Solr (version 4.3) for the first time and through much research I got into writing a custom search handler using edismax to do relevancy searches. Of course, the client I'm preparing the search for also has synonyms (both bidirectional and explicit). After much research, I have managed to get the bidirectional synonyms to work, but we have one scenario that isn't behaving as expected. To simplify the example, imagine that my collection has 2 fields: Sku: string, Title: string. Using CopyFields, I copy these to 2 more fields, SkuSearch and TitleSearch, which have a type that corresponds to the following field type in the schema file: As you can see, the bidirectional synonyms (ones that look like the following: ipod, i-pod, iPod) are expanded and stored in the index (the synonyms.txt file) as per the best practices from the wiki. One unique thing I've seen is that we have a bunch of shortcut terms where a user wants to type in lp and it will bring up one of 5 skus. So I created a shortcuts.txt file that has only the explicit synonym mappings (like so: lp => 12345, 98765, 11010). My thought in including only these in the query analyzer portion is that since explicit synonyms are not expanded (since the sku values are already indexed in the field as they should be) and expand=true is useless for explicit synonyms (based on my reading), I can just use the explicit synonym to expand the query term to its mapped skus and find documents containing them, but it's not working like it does in my head :) I'll paste my handler below; here's the issue: for use cases like the one above it's working. It's when I have an entry in shortcuts.txt that looks like this: (hot dog => 12345, 67890, 10232) that I don't get anything back if I put in hot dog, but I do get results when I use hot dog with quotes. Is there any way to get the results without quotes? Am I doing something wrong altogether? Are there any other suggestions?
my search handler looks as follows: Thanks for any help that can be offered. --Dave -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-w-explicit-Multi-Word-Synonym-Expansion-tp4078469.html Sent from the Solr - User mailing list archive at Nabble.com.
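For reference, the explicit (one-way) mappings Dave describes would look like this in a SynonymFilterFactory file; note the => arrow, which the list archive tends to strip (values taken from the post):

```
# shortcuts.txt - explicit query-time mappings
lp => 12345, 98765, 11010
hot dog => 12345, 67890, 10232
```

The multi-word left-hand side is the likely culprit: the Lucene/edismax query parsers split the query on whitespace before analysis, so hot and dog reach the SynonymFilter as separate tokens and the rule never fires; quoting the phrase keeps the tokens together, which matches the behavior Dave sees.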
Re: How to optimize a search?
I've solved it by removing <filter class="solr.DoubleMetaphoneFilterFactory" inject="true"/>. But now I have a problem. If I search for Rocket Bananaa (with a double 'a'), the result doesn't appear first. Any ideas how to fix it? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-optimize-a-search-tp4077531p4078468.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to optimize a search?
Rocket Banana (Single) should be first because it's the closest to Rocket Banana. How can I get an ideal ranking that returns the closest words in the first positions? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-optimize-a-search-tp4077531p4078470.html Sent from the Solr - User mailing list archive at Nabble.com.
Where to specify numShards when startup up a cloud setup
I want to script the creation of N solr cloud instances (on ec2). But it's not clear to me where I would specify the numShards setting. From the documentation, I see you can specify it on the first node you start up, OR alternatively, use the collections API to create a new collection - but in that case you first need at least one running SOLR instance. I want to push all solr instances with a similar configuration onto N instances and just run them with some number of shards pre-set somehow. Where can I put the numShards configuration setting? What I want to do: 1) push solr configuration to the zookeeper ensemble using the zkCli command-line tool. 2) create N instances of SOLR running on Ec2, pointing to the same zookeeper 3) start all SOLR instances, which will become a cloud setup with M shards (where M<N) and N-M replicas. Currently everything starts up with 1 shard and N replicas. I already have one single collection pre-configured.
Re: Range query on a substring.
On Tue, Jul 16, 2013 at 5:08 PM, Marcin Rzewucki mrzewu...@gmail.com wrote: Hi guys, First of all, thanks for your response. Jack: Data structure was created some time ago and this is a new requirement in my project. I'm trying to find a solution. I wouldn't like to split multivalued field into N similar records varying in this particular field only. That could impact performance and imply more changes in backend architecture as well. I'd prefer to create yet another collection and use pseudo-joins... Roman: Your ideas seem to be much closer to what I'm looking for. However, the following syntax: text (1|2|3) does not work for me. Are you sure it works like OR inside a regexp? I wasn't clear, sorry: the text (1|2|3) is a result of the term expansion - you can see something like that when you look at debugQuery=true output after you send the phrase quer* - lucene will search for the variants by enumerating the possible alternatives, hence phrase (token|token|token). It is possible to construct such a query manually; it depends on your application. One more thing: the term expansion depends on the type of the field (i.e. expanding a string field is different from the int field type), yet you could very easily write a small processor that looks at the range values and treats them as numbers (*after* they were parsed by the qparser, but *before* they were built into a query - hmmm, now when I think of it... your values will be indexed as strings, so you have to search/expand into string byterefs - it's doable, just wanted to point out this detail - in normal situations, SOLR will be building query tokens using the string/text field, because your field will be of that type). roman By the way: Honestly, I have one more requirement for which I would have to extend Solr query syntax. Basically, it should be possible to do some math on a few fields and do a range query on the result (without indexing it, because a combination of different fields is allowed).
I'd like to spend some time on ANTLR and the new way of parsing you mentioned. I will let you know if it was useful for me. Thanks. Kind regards. On 16 July 2013 20:07, Roman Chyla roman.ch...@gmail.com wrote: Well, I think this is slightly too categorical - a range query on a substring can be thought of as a simple range query. So, for example the following query: lucene 1* becomes behind the scenes: lucene (10|11|12|13|14|1abcd) the issue there is that it is a string range, but it is a range query - it just has to be indexed in a clever way So, Marcin, you still have quite a few options besides the strict boolean query model 1. have a special tokenizer chain which creates one token out of these groups (eg. some text prefix_1) and search for some text prefix_* [and do some post-filtering if necessary] 2. another version, using regex /some text (1|2|3...)/ - you got the idea 3. construct the lucene multi-term range query automatically, in your qparser - to produce a phrase query lucene (10|11|12|13|14) 4. use payloads to index your integer at the position of some text and then retrieve only some text where the payload is in range x-y - an example is here, look at getPayloadQuery() https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java- but this is more complex situation and if you google, you will find a better description 5. use a qparser that is able to handle nested search and analysis at the same time - eg. 
your query is: field:some text NEAR1 field:[0 TO 10] - i know about a parser that can handle this and i invite others to check it out (yeah, JIRA tickets need reviewers ;-)) https://issues.apache.org/jira/browse/LUCENE-5014 there might be others i forgot, but it is certainly doable; but as Jack points out, you may want to stop for a moment to reflect whether it is necessary HTH, roman On Tue, Jul 16, 2013 at 8:35 AM, Jack Krupansky j...@basetechnology.com wrote: Sorry, but you are basically misusing Solr (and multivalued fields), trying to take a shortcut to avoid a proper data model. To properly use Solr, you need to put each of these multivalued field values in a separate Solr document, with a text field and a value field. Then, you can query: text:some text AND value:[min-value TO max-value] Exactly how you should restructure your data model is dependent on all of your other requirements. You may be able to simply flatten your data. You may be able to use a simple join operation. Or, maybe you need to do a multi-step query operation if you data is sufficiently complex. If you want to keep your multivalued field in its current form for display purposes or keyword search, or exact match search, fine, but your stated goal is inconsistent with the Semantics of Solr and Lucene. To be crystal clear,
Re: Searching w/explicit Multi-Word Synonym Expansion
In case you were unaware, generalized multi-word synonym expansion is an unsolved problem in Lucene/Solr. Sure, some of the tools are there and you can sometimes make it work for some situations, but not for the general case. Some work has been in progress, but no near-term solution is at hand. -- Jack Krupansky -Original Message- From: dmarini Sent: Tuesday, July 16, 2013 5:23 PM To: solr-user@lucene.apache.org Subject: Searching w/explicit Multi-Word Synonym Expansion Hi Everyone, I'm using Solr (version 4.3) for the first time and through much research I got into writing a custom search handler using edismax to do relevancy searches. Of course, the client I'm preparing the search for also has synonyms (both bidirectional and explicit). After much research, I have managed to get the bidirectional synonyms to work, but we have one scenario that isn't behaving as expected. To simplify the example, imagine that my collection has 2 fields: Sku: String Title String Using CopyFields, I copy these to 2 more fields, SkuSearch and TitleSearch which have a type that corresponds to the following field type in the schema file: As you can see, the bidirectional synonyms (ones that look like the following: ipod, i-pod, iPod) are expanded and stored in the index (the synonyms.txt file) as per the best practices from the wiki. One unique thing I've seen is that we have a bunch of shortcut terms where a user wants to type in lp and it will bring up one of 5 skus. So I created a shortcuts.txt file that has only the explicit synonym mappings (like so: lp = 12345, 98765, 11010). 
My thought to including only these in the query analyzer portion is that since explicit synonyms are not expanded (since the sku values are already indexed in the field as they should be) and the expand=true is useless for explicit synonyms (based on my reading), I can just use the explicit synonym expand the query term to it's mapped skus and just find documents containing them, but it's not working like it does in my head :) I'll paste my handler below, here's the issue. for use cases like the one above it's working. It's when I have an entry in shortcuts.txt that looks like this: (hot dog = 12345, 67890, 10232) that I don't get anything back if I put in hot dog but I do get results when I use hot dog with quotes. Is there any way to get the results without quotes? am I doing something wrong altogether? are there any other suggestions? my search handler looks as follows: Thanks for any help that can be offered. --Dave -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-w-explicit-Multi-Word-Synonym-Expansion-tp4078469.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to optimize a search?
Use fuzzy search instead of phonetic search. Phonetic search is a poor match to most queries. At Netflix, we dropped phonetic search and started using fuzzy. There was a clear improvement in the A/B test. wunder On Jul 16, 2013, at 2:25 PM, padcoe wrote: Rocket Banana (Single) should be first because its the closest to Rocket Banana. How can i get a ideal rank to return closests words in firsts position?
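For context on Walter's suggestion: a fuzzy query (e.g. bananaa~1 in Solr 4's edit-distance syntax; Solr 3.x used a similarity float like bananaa~0.8) matches terms within a bounded Levenshtein distance, which is exactly the Rocket Bananaa case. A sketch of the distance being bounded (illustrative only; Lucene's implementation uses automata, not this DP):

```python
def edit_distance(a, b):
    """Classic Levenshtein distance: minimum insertions/deletions/substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

print(edit_distance("banana", "bananaa"))  # 1 -- one extra 'a', so bananaa~1 still matches
```

Phonetic filters like DoubleMetaphone collapse many spellings to one code and cannot rank by closeness, whereas edit distance degrades gracefully with each typo, which is consistent with the A/B improvement Walter reports.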
Re: Where to specify numShards when startup up a cloud setup
What does the solr.xml look like on the nodes? On Tue, Jul 16, 2013 at 2:36 PM, Robert Stewart robert_stew...@epam.com wrote: I want to script the creation of N solr cloud instances (on ec2). But it's not clear to me where I would specify the numShards setting. From the documentation, I see you can specify it on the first node you start up, OR alternatively, use the collections API to create a new collection - but in that case you first need at least one running SOLR instance. I want to push all solr instances with a similar configuration onto N instances and just run them with some number of shards pre-set somehow. Where can I put the numShards configuration setting? What I want to do: 1) push solr configuration to the zookeeper ensemble using the zkCli command-line tool. 2) create N instances of SOLR running on Ec2, pointing to the same zookeeper 3) start all SOLR instances, which will become a cloud setup with M shards (where M<N) and N-M replicas. Currently everything starts up with 1 shard and N replicas. I already have one single collection pre-configured.
Re: Range query on a substring.
Hi Marcin, Maybe you can use https://issues.apache.org/jira/browse/SOLR-1604 . ComplexPhraseQueryParser supports ranges inside phrases. From: Marcin Rzewucki mrzewu...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, July 17, 2013 12:08 AM Subject: Re: Range query on a substring. Hi guys, First of all, thanks for your response. Jack: Data structure was created some time ago and this is a new requirement in my project. I'm trying to find a solution. I wouldn't like to split multivalued field into N similar records varying in this particular field only. That could impact performance and imply more changes in backend architecture as well. I'd prefer to create yet another collection and use pseudo-joins... Roman: Your ideas seem to be much closer to what I'm looking for. However, the following syntax: text (1|2|3) does not work for me. Are you sure it works like OR inside a regexp? By the way: Honestly, I have one more requirement for which I would have to extend Solr query syntax. Basically, it should be possible to do some math on few fields and do range query on the result (without indexing it, because a combination of different fields is allowed). I'd like to spend some time on ANTLR and the new way of parsing you mentioned. I will let you know if it was useful for me. Thanks. Kind regards. On 16 July 2013 20:07, Roman Chyla roman.ch...@gmail.com wrote: Well, I think this is slightly too categorical - a range query on a substring can be thought of as a simple range query. So, for example the following query: lucene 1* becomes behind the scenes: lucene (10|11|12|13|14|1abcd) the issue there is that it is a string range, but it is a range query - it just has to be indexed in a clever way So, Marcin, you still have quite a few options besides the strict boolean query model 1. have a special tokenizer chain which creates one token out of these groups (eg. some text prefix_1) and search for some text prefix_* [and do some post-filtering if necessary] 2.
another version, using regex /some text (1|2|3...)/ - you got the idea 3. construct the lucene multi-term range query automatically, in your qparser - to produce a phrase query lucene (10|11|12|13|14) 4. use payloads to index your integer at the position of some text and then retrieve only some text where the payload is in range x-y - an example is here, look at getPayloadQuery() https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java- but this is more complex situation and if you google, you will find a better description 5. use a qparser that is able to handle nested search and analysis at the same time - eg. your query is: field:some text NEAR1 field:[0 TO 10] - i know about a parser that can handle this and i invite others to check it out (yeah, JIRA tickets need reviewers ;-)) https://issues.apache.org/jira/browse/LUCENE-5014 there might be others i forgot, but it is certainly doable; but as Jack points out, you may want to stop for a moment to reflect whether it is necessary HTH, roman On Tue, Jul 16, 2013 at 8:35 AM, Jack Krupansky j...@basetechnology.com wrote: Sorry, but you are basically misusing Solr (and multivalued fields), trying to take a shortcut to avoid a proper data model. To properly use Solr, you need to put each of these multivalued field values in a separate Solr document, with a text field and a value field. Then, you can query: text:some text AND value:[min-value TO max-value] Exactly how you should restructure your data model is dependent on all of your other requirements. You may be able to simply flatten your data. You may be able to use a simple join operation. Or, maybe you need to do a multi-step query operation if you data is sufficiently complex. If you want to keep your multivalued field in its current form for display purposes or keyword search, or exact match search, fine, but your stated goal is inconsistent with the Semantics of Solr and Lucene. 
To be crystal clear, there is no such thing as a range query on a substring in Solr or Lucene. -- Jack Krupansky -Original Message- From: Marcin Rzewucki Sent: Tuesday, July 16, 2013 5:13 AM To: solr-user@lucene.apache.org Subject: Re: Range query on a substring. By multivalued I meant an array of values. For example: <arr name="myfield"><str>text1 (X)</str><str>text2 (Y)</str></arr> I'd like to avoid splitting it as you propose. I have a 2.3mn collection with pretty large records (a few hundred fields or more per record). Duplicating them would impact performance. Regards. On 16 July 2013 10:26, Oleg Burlaca oburl...@gmail.com wrote: Ah, you mean something like this: record: Id=10, text = this is a text N1 (X), another text N2 (Y), text N3 (Z) Id=11, text = this is a text N1 (W), another text N2 (Q),
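Jack's flattening advice can be sketched in plain code: each entry of the multivalued field becomes its own flat (text, value) row, i.e. one future Solr document. This is only a sketch, assuming entries of the hypothetical form "some text (5)" with a numeric value in parentheses (numbers substituted for the thread's (X)/(Y) placeholders, since the goal is numeric range queries); `Flatten`, `parentId`, `text` and `value` are made-up names:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Flatten {
    // Hypothetical entry format: "some text (VALUE)" with a numeric VALUE.
    static final Pattern ENTRY = Pattern.compile("^(.*)\\((\\d+)\\)\\s*$");

    // Turn one source record's multivalued field into flat (text, value) rows,
    // one per future Solr document, so value can be a real numeric field
    // that supports range queries.
    static List<Map<String, String>> flatten(String id, List<String> entries) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (String e : entries) {
            Matcher m = ENTRY.matcher(e);
            if (!m.matches()) continue; // skip entries without a numeric value
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("parentId", id); // link back to the original record
            doc.put("text", m.group(1).trim());
            doc.put("value", m.group(2));
            docs.add(doc);
        }
        return docs;
    }
}
```

With the data flattened this way, the search becomes Jack's `text:some text AND value:[min TO max]` query, at exactly the cost Marcin objects to: more documents.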
Re: Where to specify numShards when starting up a cloud setup
On 7/16/2013 3:36 PM, Robert Stewart wrote: I want to script the creation of N solr cloud instances (on ec2). But it's not clear to me where I would specify the numShards setting. From the documentation, I see you can specify it on the first node you start up, OR alternatively, use the collections API to create a new collection - but in that case you first need at least one running SOLR instance. I want to push all solr instances with similar configuration onto N instances and just run them with some number of shards pre-set somehow. Where can I put the numShards configuration setting? What I want to do: 1) push solr configuration to the zookeeper ensemble using the zkCli command-line tool. 2) create N instances of SOLR running on Ec2, pointing to the same zookeeper 3) start all SOLR instances, which will become a cloud setup with M shards (where M < N), and N-M replicas. A minimal redundant SolrCloud cluster consists of two larger machines that run Solr and zookeeper, plus a third smaller machine that runs just zookeeper. This is just the minimum requirement; you can use additional and more powerful servers. The general way that you should set up a brand new SolrCloud is as follows. If anyone spots a problem with this, please don't hesitate to mention it: 1) Set up three hosts running standalone zookeeper, configured as a fully redundant ensemble. This is outside the scope of Solr documentation, please consult the zookeeper site: http://zookeeper.apache.org 2) Construct a zkHost parameter for your ZK ensemble. An example is below using the default zookeeper port of 2181. You'd need to use the proper port numbers, names, etc. The /chroot part is optional, but highly recommended. Use a name that has meaning for your SolrCloud cluster rather than chroot: -DzkHost=server1:2181,server2:2181,server3:2181/chroot By using the /chroot syntax, you can run more than one SolrCloud cluster on your zookeeper ensemble. Just use a different value for each cluster.
3) Start Solr with the same zkHost parameter on every Solr host, referring to the three zookeeper hosts already set up. You can use the same hosts for Solr as you did for zookeeper. 4) Use the zkcli script in example/cloud-scripts to upload a configuration set to zookeeper using the upconfig command. If you aren't using the Solr example or a custom install based on the example, then you'll need to examine the script to figure out how to run the java command manually and have it find the solr and zookeeper jars. 5) Use the Collections API to create a collection, referencing the uploaded config set and including additional parameters like numShards. If you have four Solr hosts, the following API call would work perfectly: http://server:port/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&collection.configName=mycfg Thanks, Shawn
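Steps 2 and 5 above are just string assembly; a minimal sketch in Java (the class and method names are made up for illustration, not part of any Solr API):

```java
import java.util.List;

public class CloudSetup {
    // Step 2: build the zkHost parameter value (hosts and chroot are placeholders).
    static String zkHost(List<String> zkServers, String chroot) {
        return String.join(",", zkServers) + "/" + chroot;
    }

    // Step 5: build the Collections API CREATE call.
    static String createCollectionUrl(String solrBase, String name, int numShards,
                                      int replicationFactor, String configName) {
        return solrBase + "/admin/collections?action=CREATE"
                + "&name=" + name
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor
                + "&collection.configName=" + configName;
    }
}
```

A script that generates these two strings can be pushed to every EC2 node unchanged, which is exactly the "same configuration on N instances" property Robert is after.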
Re: Range query on a substring.
Yeah, I was thinking about that. But... will it properly order 10 as being greater than 9? Usually, we used trie or sorted field types to assure numeric order, but a text field doesn't have that feature. Although I did think that maybe you could have a token filter that mapped numeric values to a fixed number of digits with leading zeros, and then they would be properly ordered. But, I don't think we have a token filter that can do that, although I imagine that a new one could be proposed. -- Jack Krupansky -Original Message- From: Ahmet Arslan Sent: Tuesday, July 16, 2013 6:33 PM To: solr-user@lucene.apache.org Subject: Re: Range query on a substring.
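Jack's leading-zeros idea from earlier in this thread is easy to prototype outside Solr: pad every digit run in a token to a fixed width, and lexicographic order then agrees with numeric order. A sketch only (as Jack notes, no such token filter ships with Solr; the names here are made up):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberPad {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Left-pad every digit run in a token to a fixed width so that plain
    // string comparison orders "9" before "10" - what a hypothetical
    // leading-zeros token filter would do at both index and query time.
    static String pad(String token, int width) {
        Matcher m = DIGITS.matcher(token);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(token, last, m.start()); // text before the digits
            for (int i = m.group().length(); i < width; i++) out.append('0');
            out.append(m.group());
            last = m.end();
        }
        out.append(token.substring(last)); // trailing text
        return out.toString();
    }
}
```

Applied at index and query time alike, a string range such as [000009 TO 000010] then behaves like a numeric range, which is the property Jack's "will it order 10 above 9?" question is about.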
Re: Searching w/explicit Multi-Word Synonym Expansion
Hi dmarini, Did you consider using http://wiki.apache.org/solr/QueryElevationComponent ? From: Jack Krupansky j...@basetechnology.com To: solr-user@lucene.apache.org Sent: Wednesday, July 17, 2013 12:53 AM Subject: Re: Searching w/explicit Multi-Word Synonym Expansion In case you were unaware, generalized multi-word synonym expansion is an unsolved problem in Lucene/Solr. Sure, some of the tools are there and you can sometimes make it work for some situations, but not for the general case. Some work has been in progress, but no near-term solution is at hand. -- Jack Krupansky -Original Message- From: dmarini Sent: Tuesday, July 16, 2013 5:23 PM To: solr-user@lucene.apache.org Subject: Searching w/explicit Multi-Word Synonym Expansion Hi Everyone, I'm using Solr (version 4.3) for the first time and through much research I got into writing a custom search handler using edismax to do relevancy searches. Of course, the client I'm preparing the search for also has synonyms (both bidirectional and explicit). After much research, I have managed to get the bidirectional synonyms to work, but we have one scenario that isn't behaving as expected. To simplify the example, imagine that my collection has 2 fields: Sku: String, Title: String. Using CopyFields, I copy these to 2 more fields, SkuSearch and TitleSearch, which have a type that corresponds to the following field type in the schema file: As you can see, the bidirectional synonyms (ones that look like the following: ipod, i-pod, iPod) are expanded and stored in the index (the synonyms.txt file) as per the best practices from the wiki. One unique thing I've seen is that we have a bunch of shortcut terms where a user wants to type in lp and it will bring up one of 5 skus. So I created a shortcuts.txt file that has only the explicit synonym mappings (like so: lp = 12345, 98765, 11010).
My thought in including only these in the query analyzer portion is that since explicit synonyms are not expanded (since the sku values are already indexed in the field as they should be) and expand=true is useless for explicit synonyms (based on my reading), I can just use the explicit synonym to expand the query term to its mapped skus and just find documents containing them, but it's not working like it does in my head :) I'll paste my handler below; here's the issue: for use cases like the one above it's working. It's when I have an entry in shortcuts.txt that looks like this: (hot dog = 12345, 67890, 10232) that I don't get anything back if I put in hot dog, but I do get results when I use "hot dog" with quotes. Is there any way to get the results without quotes? Am I doing something wrong altogether? Are there any other suggestions? My search handler looks as follows: Thanks for any help that can be offered. --Dave -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-w-explicit-Multi-Word-Synonym-Expansion-tp4078469.html Sent from the Solr - User mailing list archive at Nabble.com.
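For what it's worth, one way to see Dave's quoting problem: the synonym filter only receives tokens after the WhitespaceTokenizerFactory has already split "hot dog" into two tokens, so a multi-word shortcut key can never match at query time. One client-side workaround is to expand shortcuts before the query ever reaches Solr. This is only a sketch of that idea (`ShortcutExpander` and the OR-joining are illustrative, not Solr behavior):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ShortcutExpander {
    // Stand-in for shortcuts.txt; multi-word keys like "hot dog" are exactly
    // what the whitespace tokenizer splits apart before the synonym filter runs.
    static final Map<String, List<String>> SHORTCUTS = Map.of(
            "lp", List.of("12345", "98765", "11010"),
            "hot dog", List.of("12345", "67890", "10232"));

    // Expand the raw query string before it is sent to Solr, longest key first
    // so "hot dog" wins over any shorter overlapping shortcut.
    static String expand(String query) {
        String padded = " " + query.toLowerCase(Locale.ROOT) + " ";
        List<String> keys = new ArrayList<>(SHORTCUTS.keySet());
        keys.sort((a, b) -> b.length() - a.length());
        for (String k : keys) {
            if (padded.contains(" " + k + " ")) {
                return String.join(" OR ", SHORTCUTS.get(k));
            }
        }
        return query; // no shortcut matched; send the query unchanged
    }
}
```

Because the whole phrase is matched before tokenization, the user no longer needs to quote "hot dog" for the shortcut to fire.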
Re: Solr is not responding on deployment in tomcat
Yes, you need to use a different port for Solr. As for the contextpath, I have no idea. Best Erick On Tue, Jul 16, 2013 at 2:02 PM, Per Newgro per.new...@gmx.ch wrote: Thanks Erick, i've configured both to use 8080 (for Wicket this is standard :-)). Do i have to assign a different port to solr if i use both webapps in the same container? Btw. the contextpath for my wicket app is /* Could that be a problem too? Per Am 15.07.2013 17:12, schrieb Erick Erickson: Sounds like Wicket and Solr are using the same port(s)... If you start Wicket first then look at the Solr logs, you might see some message about the port already being in use or some such. If this is SolrCloud, there are also the ZooKeeper ports to wonder about. Best Erick On Mon, Jul 15, 2013 at 6:49 AM, Per Newgro per.new...@gmx.ch wrote: Hi, maybe someone here can help me with my solr-4.3.1 issue. I've successfully deployed the solr.war on a tomcat7 instance. Starting the tomcat with only the solr.war deployed works nicely. I can see the admin interface and the logs are clean. If i deploy my wicket-spring-data-solr based app (using the HttpSolrServer) after the solr app without restarting the tomcat, all is fine too. I've implemented a ping to see if the server is up. code
private void waitUntilSolrIsAvailable(int i) {
    if (i == 0) {
        logger.info("Check solr state...");
    }
    if (i > 5) {
        throw new RuntimeException("Solr is not available after more than 25 secs. Going down now.");
    }
    if (i > 0) {
        try {
            logger.info("Wait for solr to get alive.");
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
    try {
        i++;
        SolrPingResponse r = solrServer.ping();
        if (r.getStatus() != 0) {
            waitUntilSolrIsAvailable(i);
        }
        logger.info("Solr is alive.");
    } catch (SolrServerException | IOException e) {
        throw new RuntimeException(e);
    }
}
/code Here i can see this log: log 54295 [localhost-startStop-2] INFO org.apache.wicket.Application – [wicket.project] init: Wicket extensions initializer INFO - 2013-07-15 12:07:45.261; de.company.service.SolrServerInitializationService; Check solr state... 54505 [localhost-startStop-2] INFO de.company.service.SolrServerInitializationService – Check solr state... INFO - 2013-07-15 12:07:45.768; org.apache.solr.core.SolrCore; [collection1] webapp=/solr path=/admin/ping params={wt=javabin&version=2} hits=0 status=0 QTime=20 55012 [http-bio-8080-exec-1] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/admin/ping params={wt=javabin&version=2} hits=0 status=0 QTime=20 INFO - 2013-07-15 12:07:45.770; org.apache.solr.core.SolrCore; [collection1] webapp=/solr path=/admin/ping params={wt=javabin&version=2} status=0 QTime=22 55014 [http-bio-8080-exec-1] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/admin/ping params={wt=javabin&version=2} status=0 QTime=22 INFO - 2013-07-15 12:07:45.854; de.company.service.SolrServerInitializationService; Solr is alive. 55098 [localhost-startStop-2] INFO de.company.service.SolrServerInitializationService – Solr is alive. /log But if i restart the tomcat with both webapps (solr and wicket) the solr is not responding on the ping request.
log INFO - 2013-07-15 12:02:27.634; org.apache.wicket.Application; [wicket.project] init: Wicket extensions initializer 11932 [localhost-startStop-1] INFO org.apache.wicket.Application – [wicket.project] init: Wicket extensions initializer INFO - 2013-07-15 12:02:27.787; de.company.service.SolrServerInitializationService; Check solr state... 12085 [localhost-startStop-1] INFO de.company.service.SolrServerInitializationService – Check solr state... /log What could that be or how can i get infos where this is stopping? Thanks for your support Per
Re: How to use joins in solr 4.3.1
You can only join on indexed fields; your Location merchantId field is not indexed. Best Erick On Tue, Jul 16, 2013 at 2:48 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Found this post: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201302.mbox/%3CCAB_8Yd82aqq=oY6dBRmVjG7gvBBewmkZGF9V=fpne4xgkbu...@mail.gmail.com%3E And based on the answer, I modified my query: localhost:8983/solr/location/select?fq={!join from=key to=merchantId fromIndex=merchant}*:* I don't see any errors, but my original problem still persists: no documents are returned. The two fields on which I am trying to join are: Merchant: <field name="merchantId" type="string" indexed="true" stored="true" multiValued="false" /> Location: <field name="merchantId" type="string" indexed="false" stored="true" multiValued="false" /> Thanks, -Utkarsh On Tue, Jul 16, 2013 at 11:39 AM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Looks like the JoinQParserPlugin is throwing an NPE. Query: localhost:8983/solr/location/select?q=*:*&fq={!join from=key to=merchantId fromIndex=merchant} 84343345 [qtp2012387303-16] ERROR org.apache.solr.core.SolrCore – java.lang.NullPointerException at org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:580) at org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:50) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1274) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359) at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:365) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:662) 84343350 [qtp2012387303-16] INFO org.apache.solr.core.SolrCore – [location] webapp=/solr path=/select params={distrib=false&wt=javabin&version=2&rows=10&df=allText&fl=key,score&shard.url=x:8983/solr/location/&NOW=1373999694930&start=0&q=*:*&_=1373999505886&isShard=true&fq={!join+from%3Dkey+to%3DmerchantId+fromIndex%3Dmerchant}&fsv=true} status=500 QTime=6 84343351 [qtp2012387303-16] ERROR org.apache.solr.servlet.SolrDispatchFilter – null:java.lang.NullPointerException at org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:580) at org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:50) at
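Erick's diagnosis reduces to a one-attribute schema change plus a reindex. A sketch, assuming the field names from this thread (the "to" field of a {!join} must be indexed in the core being queried):

```xml
<!-- Location core: merchantId must be indexed for {!join ... to=merchantId} to match -->
<field name="merchantId" type="string" indexed="true" stored="true" multiValued="false" />
<!-- after reindexing, the original query should then work:
     /solr/location/select?q=*:*&fq={!join from=key to=merchantId fromIndex=merchant} -->
```

Note the schema change only takes effect for documents indexed after it; the existing Location documents have to be reindexed.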
Re: SolrCloud: Collection API question and problem with core loading
All of the core loading stuff is on the server side, so CloudSolrServer isn't really germane (I don't think, anyway). This is in a bit of flux, so try having one core that's loaded on startup even if it's just a dummy core. There's currently ongoing work to play nicer with no cores being defined at startup, but that's not in 4.3. Take a look at: http://wiki.apache.org/solr/CoreAdmin#CREATE where it talks about optional parameters. NOTE: 4.4 (release imminent) has substantial fixes for the whole persistence situation. Also note that solr.xml is going away as a place to store core information and core discovery will be supported only from 5.x on. Good Luck! Erick On Mon, Jul 15, 2013 at 9:05 PM, Patrick Mi patrick...@touchpointgroup.com wrote: Hi there, I run 2 solr instances (Tomcat 7, Solr 4.3.0, one shard), one external Zookeeper instance, and have lots of cores. I use the collection API to create new cores dynamically after the configuration for the core is uploaded to Zookeeper, and it all works fine. As there are so many cores it takes a very long time to load them at startup; I would like to start up the server quickly and load the cores on demand. When a core is created via the collection API it is created with the default parameter loadOnStartup=true (this can be seen in solr.xml). Question: is there a way to specify this parameter so it can be set to 'false' via the collection API? Problem: If I manually set loadOnStartup=false for the core, I got the exception below when I used CloudSolrServer to query the core: Error: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request Seems to me that CloudSolrServer will not trigger the core to be loaded. Is it possible to get the core loaded using CloudSolrServer? Regards, Patrick
Re: About Suggestions
Maybe it was lost, I tend to babble on... But use a copyField directive that doesn't have the EdgeNGramTokenizerFactory in the chain and get your suggestions from _that_ field rather than the one you do use currently. You can still search etc. on the one you now have, just get your suggestions from the copied field. Best Erick On Tue, Jul 16, 2013 at 8:39 AM, Lochschmied, Alexander alexander.lochschm...@vishay.com wrote: Thanks Erick, that is what I suspected. We are very happy with the four suggestions in the example (and all the others), but we would like to know which of them represents a full part number. Can you elaborate a little more on how that could be achieved? Best regards, Alexander -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, 16 July 2013 14:09 To: solr-user@lucene.apache.org Subject: Re: About Suggestions Garbage in, garbage out G Your indexing analysis chain is breaking up the tokens via the EdgeNgramTokenizer and _putting those values in the index_. Then the TermsComponent is looking _only_ at the tokens in the index and giving you back exactly what you're asking for. So no, there's no way with that analysis chain to get only complete terms; at that level the fact that a term was part of a larger input token has been lost. In fact, if you were to enter something like terms.prefix=1n1 you'd likely see all your 3-grams that start with 1n1, etc. So use a copyField and put these in a separate field that has only whole tokens, or just take the EdgeNgramTokenizer out of your current definition. If the latter, blow away your index and re-index from scratch. Best Erick On Tue, Jul 16, 2013 at 4:48 AM, Lochschmied, Alexander alexander.lochschm...@vishay.com wrote: Hi Erick and everybody else! Thanks for trying to help.
Here is the example: .../terms?terms.regex.flag=case_insensitive&terms.fl=suggest&terms=true&terms.limit=20&terms.sort=index&terms.prefix=1n1187 returns

<int name="1n1187">1</int>
<int name="1n1187a">1</int>
<int name="1n1187r">1</int>
<int name="1n1187ra">1</int>

This list contains 3 complete part numbers, but the third item (1n1187r) is not a complete part number. Is there a way to make terms tell if a term represents a complete value? (My guess is that this gets lost after ngram, but I'm still hoping something can be done.) More config details:

<field name="suggest" type="text_parts" indexed="true" stored="true" required="false" multiValued="true"/>

and

<fieldType name="text_parts" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Thanks, Alexander -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Saturday, 13 July 2013 19:58 To: solr-user@lucene.apache.org Subject: Re: About Suggestions Not quite sure what you mean here, a couple of examples would help. But since the term is using the keyword tokenizer, then each thing you get back is a complete term, by definition. So I'm not quite sure what you're asking here. Best Erick On Fri, Jul 12, 2013 at 4:48 AM, Lochschmied, Alexander alexander.lochschm...@vishay.com wrote: Hi Solr people! We need to suggest part numbers in alphabetical order, adding up to four characters to the already entered part number prefix.
That works quite well with the terms component acting on a multivalued field with a keyword tokenizer and edge nGram filter. I am mentioning part numbers to indicate that each item in the multivalued field is a string without whitespace, and where special characters like dashes cannot be seen as separators. Is there a way to know if the term (the suggestion) represents such a complete part number (without doing another query for each suggestion)? Since we are using SolrJ, what we would need is something like boolean Term.isRepresentingCompleteFieldValue() Thanks, Alexander
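Erick's copyField suggestion earlier in this thread can be sketched as a schema fragment. This assumes a hypothetical text_parts_whole field type, identical to text_parts but without the EdgeNGramFilterFactory in its index analyzer; the field names are illustrative:

```xml
<!-- holds only whole part numbers (no edge n-grams) -->
<field name="suggest_full" type="text_parts_whole" indexed="true" stored="false" multiValued="true"/>
<copyField source="suggest" dest="suggest_full"/>
```

A suggestion from the n-grammed field is then a complete part number exactly when it also exists as a term in suggest_full, which can be checked with one extra terms request against suggest_full rather than one query per suggestion.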
Re: Need advice on performing 300 queries per second on solr index
Hello, 1. It depends on your query types / data (complexity, featureset, paging) - geospatial could be something with calculation inside solr? 2. It depends massively on the document size / field selection (loading a hundred 100MB documents can take some time) 3. It depends especially on your disc IO / RAM utilization - are these dedicated machines? 4. It depends on how often you change your documents (cache warm-ups!!!, disc IO)! 5. What is the bottleneck? CPU? RAM? Disc? You should be able to give some more information about this. 6. It depends on the number of cores (more cores are not necessarily better - CPU caching, OS management overhead...) 7. Force a higher cache hit-rate - that means: control the type of queries, cluster them and send them to A or B - to have a higher chance of a cache hit. Maybe you can give some more details about the points I mentioned. Ralf On 07/16/2013 04:42 PM, adfel70 wrote: Hi, I need to create a solr cluster that contains geospatial information and provides the ability to perform a few hundred queries per second; each query should retrieve around 100k results. The data is around 100k documents, around 300gb total. I started with a 2-shard cluster (replicationFactor 1) and a portion of the data - 20 gb. I ran some load-tests and saw that when 100 requests are sent in one second, the average qTime is around 4 seconds, but the average total response time (measuring from sending the request to solr until getting a response) reaches 20-25 seconds, which is very bad. Currently I load-balance myself between the 2 solr servers (each request is sent to another server). Any advice on which resources I need and how my solr cluster should look? More shards? More replicas? Another webserver? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Need-advice-on-performing-300-queries-per-second-on-solr-index-tp4078353.html Sent from the Solr - User mailing list archive at Nabble.com.
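Ralf's point 7 (cluster queries and pin each cluster to one server so that server's caches can actually hit) can be done client-side with nothing more than a stable hash. A minimal sketch; QueryRouter and the server list are placeholders, not part of any Solr API:

```java
public class QueryRouter {
    // Pin a query (or query-cluster key) to one server so that server's
    // queryResultCache and filterCache see the same queries repeatedly,
    // instead of round-robin spreading identical queries across replicas.
    static String route(String queryKey, String[] servers) {
        int idx = Math.floorMod(queryKey.hashCode(), servers.length);
        return servers[idx];
    }
}
```

This replaces the strict round-robin the poster describes ("each request is sent to another server"), which is roughly the worst case for cache hit-rate when the same queries repeat.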