Re: How to retrieve the index of a string within a field?

2009-10-09 Thread Chris Hostetter

: I have a field. The field has a sentence. If the user types in a word
: or a phrase, how can I return the index of this word or the index of
: the first word of the phrase?
: I tried to use bf=ord..., but it does not work as i expected.

for basic queries (term, phrase, etc...) position information is not 
available for the matched documents ... you can use highlighting to 
re-compute where matches occurred, but the accuracy of that information 
depends a lot on what your field type, query, and highlighter options look 
like.  i don't believe we have any Highlighter options that will just give 
you back the position information -- but one could be added.
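
A rough sketch of the highlighting route, in Python for illustration only. It assumes the highlighter wraps each match in <em>...</em> markers and returns the whole field value rather than a fragment (for example with hl.fragsize=0). The function name match_offsets and the sample snippet are made up for this example. How a phrase match is actually marked up depends on the highlighter options.

```python
import re

def match_offsets(highlighted, pre="<em>", post="</em>"):
    """Return (start, end) character offsets, measured in the text with the
    markers stripped, of every span wrapped in pre/post markers."""
    pattern = re.compile(re.escape(pre) + "(.*?)" + re.escape(post), re.S)
    offsets = []
    removed = 0  # number of marker characters seen before the current match
    for m in pattern.finditer(highlighted):
        start = m.start() - removed
        end = start + len(m.group(1))
        offsets.append((start, end))
        removed += len(pre) + len(post)
    return offsets

# Hypothetical highlighted value for the sentence "Can you get what you want?"
snippet = "Can you <em>get</em> <em>what</em> <em>you</em> want?"
offsets = match_offsets(snippet)
print(offsets)
```

These are character offsets, not token positions. Counting the whitespace-separated words before the first offset would give an approximate token position.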

for *true* positional matching info, there are the Span family of 
queries, which can actually return the exact information -- but there is 
no native query parser support for Span queries in Solr, so you would need 
to customize your QParser to get that information.



-Hoss



Re: How to retrieve the index of a string within a field?

2009-10-09 Thread Sandeep Tagore

Hi Elaine,
Since you are able to get the sentences which contain that phrase (when you use
double quotes), the 'text' field type is fine.
Frankly speaking, I don't know whether Solrj's HTTP call will hang or not if
you try to get 100 thousand records at a time. I never tried that. But I
guess you can't display more than 1000 records at a time.
The best thing I can suggest is pagination. You can use the 'start' and
'rows' parameters to get the results in slices, say 1000 records at a
time (start=0&rows=1000, then start=1000&rows=1000, and so on). You can
easily achieve this using Solrj. In some scenarios, I tried to get 10k
records at a time and didn't have any problem. If you get any heap space
errors, try to increase the heap size with JVM parameters.
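
A minimal sketch of the slicing arithmetic, in Python for illustration only. 'start' is a zero-based offset and 'rows' is the page size, so it stays constant while 'start' advances. The helper name page_params and the total of 2500 hits are assumptions made up for this example.

```python
def page_params(total, rows=1000):
    """Yield start/rows pairs that together cover `total` hits."""
    for start in range(0, total, rows):
        yield {"start": start, "rows": rows}

# Hypothetical result set of 2500 hits, fetched 1000 at a time.
pages = list(page_params(2500, rows=1000))
for p in pages:
    print("start=%(start)d&rows=%(rows)d" % p)
```

The last slice still asks for 1000 rows. The server simply returns the 500 that remain.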

Thanks,
Sandeep



-- 
View this message in context: 
http://www.nabble.com/How-to-retrieve-the-index-of-a-string-within-a-field--tp25771821p25816222.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to retrieve the index of a string within a field?

2009-10-08 Thread Elaine Li
Sandeep,

When I submit the query, I actually make sure the searched phrase is
wrapped in double quotes. When I do that, it will only return
sentences with 'get what you'. If it does not have double quotes, it
will return all the sentences as described in your email because
without double quotes, it is a 'get OR what OR you' query. I don't
know too much about the concepts behind search. I just make use of
whatever works for me. Do you think I am still ok using text as my
sentence field type?

If the query returns 100 thousand results, will Solrj's HTTP call hang
on it?

Thanks a lot.

Elaine





Re: How to retrieve the index of a string within a field?

2009-10-07 Thread Elaine Li
Hi Sandeep,

Say the field is <field name="sentence">Can you get what you
want?</field>, and the field type is Text.

My query contains 'sentence:get what you'. Is it possible to get
number 2 directly from a query, since the word 'get' is at token
position 2 (counting from 0) in the sentence?

Thanks.

Elaine

On Wed, Oct 7, 2009 at 8:12 AM, Sandeep Tagore sandeep.tag...@gmail.com wrote:

 Hi Elaine,
 What do you mean by the index of this word? Do you want to return the first
 occurrence of the word in that sentence, or the document id?
 Also, which type of field is it: Text or String? If it is of type
 Text, you can't achieve that because the sentence will be tokenized.

 Sandeep






Re: How to retrieve the index of a string within a field?

2009-10-07 Thread Sandeep Tagore

Hi Elaine,
You can achieve that with some modifications in the Solr configuration files.
Generally text will be configured as 
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

When a field is declared as text (with the above conf.) it will be tokenized.
Say, for example, your sentence
"Can you get what you want?" will be tokenized like can, you, get,
what, you, want. So when you search for 'sentence:get what you' you will
get 0 results.

To achieve your objective you can remove the tokenizers in the text
configuration. The best way I suggest is to declare the field as type string.
Search the string with wildcards like 'sentence:*get what you*' using the
Solrj client, and when you get the records (results), save the output of
sentence.indexOf(keyword) in your Java bean. Here sentence is a variable
declared in the Java bean.
For more details you need to read up on the usage of Solrj. If you have any
issues in modifying the configuration, post the configuration you have for the
fieldtype text and I will modify it for you.
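
A small client-side sketch of that post-processing step, in Python rather than Solrj purely for illustration. It contrasts the character offset (what indexOf returns) with the token position the original question asked about. The function names char_index and token_index are hypothetical. The token split is plain whitespace plus lowercasing, which only approximates what the analyzer does.

```python
def char_index(sentence, phrase):
    """Character offset of `phrase`, like Java's String.indexOf but
    case-insensitive; -1 if absent."""
    return sentence.lower().find(phrase.lower())

def token_index(sentence, phrase):
    """Zero-based position of the phrase's first word among the
    whitespace-separated, lowercased tokens; -1 if absent."""
    tokens = sentence.lower().split()
    words = phrase.lower().split()
    if not words:
        return -1
    for i in range(len(tokens) - len(words) + 1):
        if tokens[i:i + len(words)] == words:
            return i
    return -1

sentence = "Can you get what you want?"
print(char_index(sentence, "get what you"))   # character offset
print(token_index(sentence, "get what you"))  # token position
```

Because punctuation stays attached to tokens in this sketch, a search for 'want' would not match the token 'want?'. Real analysis chains may handle that differently.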

Regards,
Sandeep Team



-- 
View this message in context: 
http://www.nabble.com/How-to-retrieve-the-index-of-a-string-within-a-field--tp25771821p25788406.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to retrieve the index of a string within a field?

2009-10-07 Thread Elaine Li
Sandeep, I do get results when I search for get what you, not 0 results.

What in my schema makes this difference?

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         enablePositionIncrements=true ensures that a 'gap' is left to
         allow for accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--<filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!--<filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/> -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--<filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I need to learn Solrj. I am currently using JavaScript as a client and
invoking HTTP calls to get results to display in the browser. Can Solrj
get all the results in one shot without the HTTP call? I need to do some
post-processing against all the results and then display the processed
data. Submitting multiple HTTP queries and post-processing after each
query does not seem to be the right way.

Thanks.

Elaine





Re: How to retrieve the index of a string within a field?

2009-10-07 Thread Sandeep Tagore

Elaine,
The field type text contains <tokenizer class="solr.WhitespaceTokenizerFactory"/>
in its definition. So all the sentences that are indexed / queried will be
split into words. So when you search for 'get what you', you will get
sentences containing get, what, you, get what, get you, what you, or
get what you. So when you try to find the indexOf of the keyword in such a
sentence (from the results), you may not find it every time.
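
To make the difference concrete, here is a toy sketch in Python (illustration only, not Solr code) of the two matching behaviours: an unquoted query treated as an OR over its words, versus a quoted phrase whose words must appear consecutively. The tokenize helper only lowercases and splits on whitespace, which ignores the other filters in the real analyzer chain. All names and sample sentences are made up for this example.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a rough stand-in for the
    tokenizer plus lowercase filter)."""
    return text.lower().split()

def matches_any(sentence, query):
    """OR semantics: the sentence shares at least one token with the query."""
    return bool(set(tokenize(sentence)) & set(tokenize(query)))

def matches_phrase(sentence, query):
    """Phrase semantics: the query tokens appear consecutively, in order."""
    tokens, words = tokenize(sentence), tokenize(query)
    return any(tokens[i:i + len(words)] == words
               for i in range(len(tokens) - len(words) + 1))

sentences = [
    "Can you get what you want?",
    "What did you say?",
    "Nothing here",
]
or_hits = [s for s in sentences if matches_any(s, "get what you")]
phrase_hits = [s for s in sentences if matches_phrase(s, "get what you")]
print(or_hits)
print(phrase_hits)
```

Only the phrase hits are guaranteed to contain the full keyword, so indexOf is only safe on those.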

Solrj can give the results in one shot, but it uses an HTTP call. You can't
avoid it. You don't need to query multiple times with Solrj. Query once, get
the results, store them in Java beans, process them and display the results.

Regards,
Sandeep


-- 
View this message in context: 
http://www.nabble.com/How-to-retrieve-the-index-of-a-string-within-a-field--tp25771821p25798586.html
Sent from the Solr - User mailing list archive at Nabble.com.



How to retrieve the index of a string within a field?

2009-10-06 Thread Elaine Li
Hi,

I have a field. The field has a sentence. If the user types in a word
or a phrase, how can I return the index of this word or the index of
the first word of the phrase?
I tried to use bf=ord..., but it does not work as I expected.

Thanks.

Elaine