Re: How to retrieve the index of a string within a field?
: I have a field. The field has a sentence. If the user types in a word
: or a phrase, how can I return the index of this word or the index of
: the first word of the phrase?
: I tried to use bf=ord..., but it does not work as i expected.

For basic queries (term, phrase, etc.) position information is not available for the matched documents. You can use highlighting to re-compute where matches occurred, but the accuracy of that information depends a lot on what your field type, query, and highlighter options look like. I don't believe we have any Highlighter options that will just give you back the position information, but one could be added.

For *true* positional matching info, there are the Span family of queries, which can actually return the exact information. However, there is no native query parser support for Span queries in Solr, so you would need to customize your QParser to get that information.

-Hoss
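One way to re-compute a token position on the client, once the matching sentence has been returned, is sketched below. This is plain Java and does not use the Solr highlighter or the Lucene Span API; the class and method names (`TokenPosition`, `tokenize`, `positionOf`) are made up for illustration, and the tokenization (split on whitespace, then lowercase) is only an assumption that roughly mirrors a whitespace tokenizer followed by a lowercase filter.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical client-side helper: token position of a word or phrase in a sentence. */
final class TokenPosition {

    /** Splits on whitespace and lowercases; punctuation stays attached to its token. */
    static List<String> tokenize(String text) {
        String normalized = text.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(normalized.split("\\s+"));
    }

    /** Returns the zero-based position of the phrase's first token, or -1 if the phrase is absent. */
    static int positionOf(String sentence, String phrase) {
        List<String> tokens = tokenize(sentence);
        List<String> wanted = tokenize(phrase);
        if (wanted.isEmpty()) {
            return -1; // an empty phrase has no meaningful position
        }
        // Finds the first place where the phrase tokens appear consecutively.
        return Collections.indexOfSubList(tokens, wanted);
    }

    public static void main(String[] args) {
        System.out.println(positionOf("Can you get what you want?", "get what you"));
    }
}
```

Positions are counted from zero here, so 'get' in "Can you get what you want?" is at position 2. Because punctuation is not stripped, the last token is "want?" and a search for "want" would not match; a different normalization would be needed if that matters.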
Re: How to retrieve the index of a string within a field?
Hi Elaine,

Since you get only the sentences that contain the phrase when you wrap it in double quotes, the 'text' field type is fine.

Frankly speaking, I don't know whether Solrj's HTTP call will hang if you try to fetch 100 thousand records at a time; I have never tried that. But I guess you can't display more than 1000 records at a time anyway. The best thing I can suggest is pagination. You can use the 'start' and 'rows' parameters to get the results in slices, say 1000 records at a time (start=0&rows=1000, then start=1000&rows=1000, and so on). You can easily achieve this using Solrj. In some scenarios I fetched 10k records at a time and did not have any problem. If you get heap space errors, try increasing the heap size with JVM parameters.

Thanks,
Sandeep

Elaine Li wrote:
> Sandeep,
>
> When I submit a query, I make sure the searched phrase is wrapped in double quotes. When I do that, it returns only sentences containing 'get what you'. Without the double quotes, it returns all the sentences described in your email, because it becomes a 'get OR what OR you' query.
>
> I don't know much about the concepts behind search; I just use whatever works for me. Do you think I am still OK using text as my sentence field type? If the query returns 100 thousand results, will Solrj's HTTP call hang on it?
>
> Thanks a lot.
> Elaine

--
View this message in context: http://www.nabble.com/How-to-retrieve-the-index-of-a-string-within-a-field--tp25771821p25816222.html
Sent from the Solr - User mailing list archive at Nabble.com.
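The slicing described above can be sketched as follows. In Solr, 'start' is a zero-based offset into the result list and 'rows' is the number of results per request, so consecutive pages advance 'start' by the page size while 'rows' stays fixed. The helper below only builds the parameter strings; the class name `PageParams` and the total of 2500 hits are assumptions for illustration, and the actual request (via Solrj or plain HTTP) is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the start/rows pairs needed to page through a result set. */
final class PageParams {

    /** Builds one "start=..&rows=.." string per page, covering totalHits results. */
    static List<String> pages(long totalHits, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> params = new ArrayList<>();
        // start is an offset, so it advances by pageSize; rows stays constant.
        for (long start = 0; start < totalHits; start += pageSize) {
            params.add("start=" + start + "&rows=" + pageSize);
        }
        return params;
    }

    public static void main(String[] args) {
        // Assumed total of 2500 hits, fetched 1000 at a time.
        for (String p : pages(2500, 1000)) {
            System.out.println(p);
        }
    }
}
```

The last page simply returns fewer than 'rows' results, so no special handling is needed for a partial final slice.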
Re: How to retrieve the index of a string within a field?
Sandeep,

When I submit a query, I make sure the searched phrase is wrapped in double quotes. When I do that, it returns only sentences containing 'get what you'. Without the double quotes, it returns all the sentences described in your email, because it becomes a 'get OR what OR you' query.

I don't know much about the concepts behind search; I just use whatever works for me. Do you think I am still OK using text as my sentence field type? If the query returns 100 thousand results, will Solrj's HTTP call hang on it?

Thanks a lot.
Elaine

On Thu, Oct 8, 2009 at 1:31 AM, Sandeep Tagore <sandeep.tag...@gmail.com> wrote:
> Elaine,
>
> The field type text contains <tokenizer class="solr.WhitespaceTokenizerFactory"/> in its definition, so all sentences that are indexed or queried are split into words. When you search for 'get what you', you will get sentences containing get, what, you, get what, get you, what you, or get what you. So when you try to find the indexOf of the keyword in a returned sentence, you will not always find it.
>
> Solrj can give you the results in one shot, but it uses an HTTP call; you can't avoid that. You don't need to query multiple times with Solrj. Query once, get the results, store them in Java beans, process them, and display the results.
>
> Regards,
> Sandeep
>
> Elaine Li wrote:
>> Sandeep, I do get results when I search for get what you, not 0 results. What in my schema makes this difference?
>>
>> I need to learn Solrj. I am currently using JavaScript as a client and invoking HTTP calls to get results to display in the browser. Can Solrj get all the results in one shot without the HTTP call? I need to do some post-processing against all the results and then display the processed data. Submitting multiple HTTP queries and post-processing after each query does not seem to be the right way.
Re: How to retrieve the index of a string within a field?
Hi Sandeep,

Say the field is <field name="sentence">Can you get what you want?</field> and the field type is Text. My query contains 'sentence:get what you'. Is it possible to get the number 2 directly from a query, since the word 'get' is the 2nd token in the sentence (counting from zero)?

Thanks.
Elaine

On Wed, Oct 7, 2009 at 8:12 AM, Sandeep Tagore <sandeep.tag...@gmail.com> wrote:
> Hi Elaine,
>
> What do you mean by the index of this word? Do you want the first occurrence of the word in that sentence, or the document id? Also, which type of field is it: Text or String? If it is of type Text, you can't achieve that because the sentence will be tokenized.
>
> Sandeep
>
> Elaine Li wrote:
>> I have a field. The field has a sentence. If the user types in a word or a phrase, how can I return the index of this word or the index of the first word of the phrase? I tried to use bf=ord..., but it does not work as I expected.

--
View this message in context: http://www.nabble.com/How-to-retrieve-the-index-of-a-string-within-a-field--tp25771821p25783936.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to retrieve the index of a string within a field?
Hi Elaine,

You can achieve that with some modifications to the Solr configuration files. Generally, text will be configured as:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

When a field is declared as text (with the above configuration), it will be tokenized. For example, your sentence "Can you get what you want?" will be tokenized into can, you, get, what, you, want. So when you search for 'sentence:get what you' you will get 0 results. To achieve your objective, you can remove the tokenizers from the text configuration.

The best way, I suggest, is to declare the field as type string. Search the string with a wildcard query like 'sentence:*get what you*' using the Solrj client, and when you get the records (results), save the output of sentence.indexOf(keyword) in your Java bean. Here sentence is a variable declared in the Java bean. For more details you need to read up on the usage of Solrj.

If you have any issues modifying the configuration, post the configuration you have for the fieldtype text and I will modify it for you.

Regards,
Sandeep Team

Elaine Li wrote:
> Say the field is <field name="sentence">Can you get what you want?</field> and the field type is Text. My query contains 'sentence:get what you'. Is it possible to get the number 2 directly from a query, since the word 'get' is the 2nd token in the sentence?

--
View this message in context: http://www.nabble.com/How-to-retrieve-the-index-of-a-string-within-a-field--tp25771821p25788406.html
Sent from the Solr - User mailing list archive at Nabble.com.
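The post-processing step suggested above (store sentence.indexOf(keyword) in a bean) might look roughly like this. It is a sketch only: the bean name `SentenceHit` and its fields are hypothetical, the Solrj query that would supply the sentences is omitted, and the lowercasing is an added assumption so that the match is case-insensitive for plain ASCII text.

```java
import java.util.Locale;

/** Hypothetical bean holding one returned sentence and the character index of the keyword in it. */
final class SentenceHit {
    private final String sentence;
    private final int keywordIndex;

    SentenceHit(String sentence, String keyword) {
        this.sentence = sentence;
        // indexOf returns the zero-based character offset of the first match, or -1 if there is none.
        this.keywordIndex = sentence.toLowerCase(Locale.ROOT)
                .indexOf(keyword.toLowerCase(Locale.ROOT));
    }

    String getSentence() {
        return sentence;
    }

    int getKeywordIndex() {
        return keywordIndex;
    }

    public static void main(String[] args) {
        SentenceHit hit = new SentenceHit("Can you get what you want?", "get what you");
        System.out.println(hit.getKeywordIndex());
    }
}
```

Note that indexOf gives a character offset (8 for this sentence), not a token position, and it returns -1 for any returned sentence that does not literally contain the keyword.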
Re: How to retrieve the index of a string within a field?
Sandeep,

I do get results when I search for get what you, not 0 results. What in my schema makes this difference?

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
             enablePositionIncrements=true ensures that a 'gap' is left to
             allow for accurate phrase queries. -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <!--<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>-->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

I need to learn Solrj. I am currently using JavaScript as a client and invoking HTTP calls to get results to display in the browser. Can Solrj get all the results in one shot without the HTTP call? I need to do some post-processing against all the results and then display the processed data. Submitting multiple HTTP queries and post-processing after each query does not seem to be the right way.

Thanks.
Elaine

On Wed, Oct 7, 2009 at 11:06 AM, Sandeep Tagore <sandeep.tag...@gmail.com> wrote:
> Hi Elaine,
>
> You can achieve that with some modifications to the Solr configuration files. Generally, text will be configured as:
>
>     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
> When a field is declared as text (with the above configuration), it will be tokenized. For example, your sentence "Can you get what you want?" will be tokenized into can, you, get, what, you, want. So when you search for 'sentence:get what you' you will get 0 results. To achieve your objective, you can remove the tokenizers from the text configuration.
>
> The best way, I suggest, is to declare the field as type string. Search the string with a wildcard query like 'sentence:*get what you*' using the Solrj client, and when you get the records (results), save the output of sentence.indexOf(keyword) in your Java bean. Here sentence is a variable declared in the Java bean. For more details you need to read up on the usage of Solrj.
>
> If you have any issues modifying the configuration, post the configuration you have for the fieldtype text and I will modify it for you.
>
> Regards,
> Sandeep Team
>
> Elaine Li wrote:
>> Say the field is <field name="sentence">Can you get what you want?</field> and the field type is Text. My query contains 'sentence:get what you'. Is it possible to get the number 2 directly from a query, since the word 'get' is the 2nd token in the sentence?
Re: How to retrieve the index of a string within a field?
Elaine,

The field type text contains <tokenizer class="solr.WhitespaceTokenizerFactory"/> in its definition, so all sentences that are indexed or queried are split into words. When you search for 'get what you', you will get sentences containing get, what, you, get what, get you, what you, or get what you. So when you try to find the indexOf of the keyword in a returned sentence, you will not always find it.

Solrj can give you the results in one shot, but it uses an HTTP call; you can't avoid that. You don't need to query multiple times with Solrj. Query once, get the results, store them in Java beans, process them, and display the results.

Regards,
Sandeep

Elaine Li wrote:
> Sandeep, I do get results when I search for get what you, not 0 results. What in my schema makes this difference?
>
> I need to learn Solrj. I am currently using JavaScript as a client and invoking HTTP calls to get results to display in the browser. Can Solrj get all the results in one shot without the HTTP call? I need to do some post-processing against all the results and then display the processed data. Submitting multiple HTTP queries and post-processing after each query does not seem to be the right way.

--
View this message in context: http://www.nabble.com/How-to-retrieve-the-index-of-a-string-within-a-field--tp25771821p25798586.html
Sent from the Solr - User mailing list archive at Nabble.com.
How to retrieve the index of a string within a field?
Hi,

I have a field. The field has a sentence. If the user types in a word or a phrase, how can I return the index of this word or the index of the first word of the phrase? I tried to use bf=ord..., but it does not work as I expected.

Thanks.
Elaine