Re: What are stopwords and protwords ???

2008-05-21 Thread Grant Ingersoll
Stopwords are commonly occurring words that don't add _much_ value to
search, such as "the", "an", and "a"; they are usually removed during
analysis. Protwords (protected words) are words that the English Porter
stemmer would otherwise stem but that you want left unstemmed.
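
To make the two definitions concrete, here is a minimal sketch of an analysis step that drops stopwords and skips stemming for protected words. It is not Solr's implementation: the class name, the word lists, and the "strip a trailing s" stemmer are all hypothetical stand-ins (the real Porter stemmer is far more involved), and in Solr the lists would normally come from the stopwords.txt and protwords.txt files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalysisSketch {
    // Hypothetical word lists, hard-coded for illustration only.
    static final Set<String> STOPWORDS = Set.of("the", "an", "a");
    static final Set<String> PROTWORDS = Set.of("news");

    // Toy stand-in for a stemmer: strips one trailing "s". NOT the Porter algorithm.
    static String toyStem(String token) {
        return token.length() > 3 && token.endsWith("s")
                ? token.substring(0, token.length() - 1)
                : token;
    }

    // Lowercase, split on non-word characters, drop stopwords,
    // and stem every remaining token that is not protected.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (raw.isEmpty() || STOPWORDS.contains(raw)) {
                continue; // stopword (or empty split artifact): never reaches the index
            }
            tokens.add(PROTWORDS.contains(raw) ? raw : toyStem(raw));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [news, about, engine]: "the" is dropped, "engines" is stemmed,
        // and "news" is protected (the toy stemmer would otherwise yield "new").
        System.out.println(analyze("The news about the engines"));
    }
}
```

Running the same `analyze` step on a query that consists only of stopwords yields no tokens at all, which is why removed stopwords can no longer be searched for.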


In the end, removing stopwords may keep your index smaller and can
keep some queries from taking a long time, but it also means you
can't query for those words.  As for protwords, that is something you
would do if you felt the results for those tokens were off.


Many people remove stopwords, many don't.  Personally, I don't think
removing them is the right thing to do, as there isn't always a way to
recover them and they do provide meaning; otherwise, why would they be
needed in the language?  Often, the best thing to do is to keep
stopwords but handle them intelligently on the query side (in
phrases, etc.).  However, since you're a beginner, it probably makes
sense to just throw out stopwords for now.
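
One possible reading of "handle them intelligently on the query side" is sketched below: the index keeps every word, and at query time stopwords are dropped only from bare terms while quoted phrases pass through untouched. This is an illustrative assumption, not something the thread or Solr prescribes; the class name, the word list, and the simplified quote handling are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuerySideStopwords {
    // Hypothetical stopword list, hard-coded for illustration only.
    static final Set<String> STOPWORDS = Set.of("the", "an", "a");

    // Matches either a double-quoted phrase or a single whitespace-delimited term.
    static final Pattern CLAUSE = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    // Returns the query clauses to send on: phrases are kept verbatim,
    // bare terms are kept only if they are not stopwords.
    static List<String> rewrite(String query) {
        List<String> clauses = new ArrayList<>();
        Matcher m = CLAUSE.matcher(query);
        while (m.find()) {
            if (m.group(1) != null) {
                clauses.add("\"" + m.group(1) + "\""); // phrase: stopwords carry meaning here
            } else if (!STOPWORDS.contains(m.group(2).toLowerCase(Locale.ROOT))) {
                clauses.add(m.group(2)); // ordinary term
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // The bare "a" is dropped; the "the" inside the phrase survives.
        System.out.println(rewrite("a song by \"the who\""));
    }
}
```

The trade-off is that the index stays larger, but a phrase made up mostly of stopwords remains searchable.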


-Grant

On May 21, 2008, at 1:50 AM, Akeel wrote:


Hi,

I am a beginner with Solr, and I have successfully indexed my database
in Solr. I want to know what stopwords and protwords are, and how much
effect they have on my search results.



Thanks in advance.



--

Akeel





Re: What are stopwords and protwords ???

2008-05-21 Thread Akeel
Thank you very much for such a detailed reply. Can you please tell me how
I can interact with Solr from within my Java/JSP application? I mean, how do
I query the Solr instance running at localhost and get the results back in
the application? Do I have to change something in solrconfig.xml? Please
help me in this regard.

Thanks in advance

--
Akeel

-- 
Thanks and Regards,
Akeel ur Rehman Faridee
http://riseofpakistan.blogspot.com
cell: 0321-4714151

When there is injustice in society, then everyone will go to politics
Except the two kinds: those who are timid and those who are materialist
(Aristotle)



Re: What are stopwords and protwords ???

2008-05-21 Thread Shalin Shekhar Mangar
Hi Akeel,

Take a look at SolrJ, which is a Java client library for Solr. It is
packaged with the Solr nightly binary downloads. Your Java/JSP
application can use it to add documents or query Solr. No changes
to any config files are needed.
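
As a rough illustration of why no config change is needed: Solr answers plain HTTP requests, and a client library such as SolrJ wraps requests of this kind. The sketch below is not SolrJ; it uses only the JDK's own HTTP client. The base URL (the example server's default port 8983 and a /solr/select path, which newer installs extend with a core name), the `title` field, and the class and method names are all assumptions made for the example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class SolrHttpQuerySketch {
    // Assumed location of a locally running Solr; adjust to your setup.
    static final String BASE = "http://localhost:8983/solr";

    // Builds the search URL, URL-encoding the user's query string.
    static URI selectUri(String base, String query, int rows) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(base + "/select?q=" + q + "&rows=" + rows);
    }

    // Sends the GET request and returns the raw response body as text.
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // "title" is a hypothetical field name; this call needs a running Solr.
        System.out.println(fetch(selectUri(BASE, "title:solr", 10)));
    }
}
```

In a real application a client library saves you from parsing the response by hand, which is the main reason to prefer SolrJ over raw HTTP.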

-- 
Regards,
Shalin Shekhar Mangar.


Re: What are stopwords and protwords ???

2008-05-21 Thread Shalin Shekhar Mangar
Here's the link to the wiki documentation on SolrJ:

http://wiki.apache.org/solr/Solrj

-- 
Regards,
Shalin Shekhar Mangar.


Re: What are stopwords and protwords ???

2008-05-21 Thread Akeel
Thanks, everyone.

On Thu, May 22, 2008 at 7:18 AM, Grant Ingersoll [EMAIL PROTECTED]
wrote:

 See http://lucene.apache.org/solr/tutorial.html.  You can also see the
 wiki for a whole bunch of docs, including links to tutorials, etc.

 Also, just for future reference, please separate out questions so that they
 can be addressed separately, and more easily found by others in the future.

 -Grant


 --
 Grant Ingersoll

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ








