I don't know if this mail got lost in between or no one noticed it!

On Thu, 2010-12-23 at 11:05 +0530, Sushant Sinha wrote:
Just a reminder that this patch is discussing how to break URLs, emails, etc. into their components.

On Mon, Oct 4, 2010 at 3:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:
[ sorry for not responding on this sooner, it's been hectic the last couple weeks ]

Sushant Sinha sushant...@gmail.com writes:
For the headline generation to work properly, email/file/url/host need to become skip tokens. Updating the patch with that change.

I looked at this patch a bit. I'm fairly unhappy that it seems to be inventing a brand new mechanism to do something the ts parser can already do. Why didn't you code the url-part mechanism using the existing support for compound words?
On Wed, Sep 29, 2010 at 1:29 AM, Sushant Sinha sushant...@gmail.com wrote:
Any updates on this?
Tom Lane t...@sss.pgh.pa.us wrote:
Why didn't you code the url-part mechanism using the existing support for compound words?

I am not familiar with the compound word implementation.
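(For reference: the "compound word support" referred to here appears to be the default parser's handling of hyphenated words, which already emits both the whole token and its parts. A quick way to see it, with expected output sketched in comments:)

    -- The parser emits the hyphenated word and each part as separate tokens.
    SELECT alias, token FROM ts_debug('english', 'foo-bar');
    -- alias           | token
    -- ----------------+---------
    -- asciihword      | foo-bar
    -- hword_asciipart | foo
    -- blank           | -
    -- hword_asciipart | bar

Tom's suggestion, as I read it, is to reuse this whole-token-plus-parts mechanism for url/email/host/file tokens instead of adding a parallel one.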
On Tue, Sep 21, 2010 at 10:47 PM, Sushant Sinha sushant...@gmail.com wrote:
For the headline generation to work properly, email/file/url/host need
to become skip tokens. Updating the patch with that change.
-Sushant.
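(Presumably the whole tokens must be skipped so that the headline generator does not highlight the same stretch of text twice, once for the whole token and once for its parts. For context, a minimal headline call against stock PostgreSQL; the output line is what I'd expect with the default <b>/</b> delimiters, not output from the patch:)

    SELECT ts_headline('english',
                       'See wikipedia.org for details',
                       to_tsquery('english', 'wikipedia.org'));
    -- expected: See <b>wikipedia.org</b> for details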
On Sat, 2010-09-04 at 13:25 +0530, Sushant Sinha wrote:
Updating the patch with emitting parttoken and registering it with
snowball config.
-Sushant.
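(At the SQL level, "registering it with snowball config" would presumably look something like the following. The token type parttoken comes from the patch; stock PostgreSQL's default parser has no such type, so this is only meaningful with the patch applied:)

    -- Hypothetical: map the patch's new token type to the Snowball stemmer.
    ALTER TEXT SEARCH CONFIGURATION english
        ADD MAPPING FOR parttoken WITH english_stem;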
On Fri, 2010-09-03 at 09:44 -0400, Robert Haas wrote:

On Wed, Sep 1, 2010 at 2:42 AM, Sushant Sinha sushant...@gmail.com wrote:
I have attached a patch that emits parts of a host token, a url token, an email token, and a file token. Further, it makes sure that a host/url/email/file token and the first part-token are at the same position in tsvector.

The two major changes are:
1. Tokenization changes: The patch exploits …
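(To illustrate the stated goal, whole token and first part at the same tsvector position, the patched parser would presumably index something like the following. The lexemes and positions are an assumption based on the description above, not verified output from the patch:)

    SELECT to_tsvector('english', 'wikipedia.org');
    -- Assumed output with the patch applied:
    --   'org':2 'wikipedia':1 'wikipedia.org':1
    -- i.e. the host token and its first part share position 1, and the
    -- remaining parts occupy the following positions.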
Hi,

On 08/01/2010 08:04 PM, Sushant Sinha wrote:
1. We do not have separate tokens wikipedia and org.
2. If we have the two tokens, we should have them at adjacent positions so that a phrase search for "wikipedia org" works.

This would needlessly increase the number of tokens. Instead you'd better make it work like compound word support, having just wikipedia and org as tokens.
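(Read literally, the suggestion is to index only the parts, at adjacent positions; an assumed sketch of what that would look like, for contrast with the patch's behavior above:)

    SELECT to_tsvector('english', 'wikipedia.org');
    -- Assumed output under the suggested compound-word-style treatment:
    --   'org':2 'wikipedia':1
    -- Adjacent positions are exactly what a phrase search for
    -- "wikipedia org" would need.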
Hi,

On 08/02/2010 03:12 PM, Sushant Sinha wrote:
The current text parser already returns url and url_path. That already increases the number of unique tokens.

Well, I think I simply turned that off to be able to search for plain words. It still works for complete URLs; those are just treated …
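(What "turned that off" means is not spelled out here; one built-in way to stop indexing URL-ish token types is to drop their dictionary mappings on a copy of the configuration. The name noturl below is made up for illustration:)

    CREATE TEXT SEARCH CONFIGURATION noturl ( COPY = pg_catalog.english );
    ALTER TEXT SEARCH CONFIGURATION noturl
        DROP MAPPING FOR url, url_path, host;
    -- Token types with no mapping produce no lexemes, so URLs are simply
    -- not indexed under this configuration. Note the parser still consumes
    -- the whole URL as one token, so this alone does not make the words
    -- inside a URL searchable.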
On Mon, Aug 2, 2010 at 9:12 AM, Sushant Sinha sushant...@gmail.com wrote:
The current text parser already returns url and url_path. That already increases the number of unique tokens. I am only asking for adding normal English words as well, so that if someone types only wikipedia he gets a match.
Sushant Sinha sushant...@gmail.com wrote:
Yes, that's what I am planning to do. I just wanted to see if anyone can help me in estimating whether this is doable in the current parser or whether I need to write a new one. If possible, then some ideas on how to go about implementing it?

The current tsearch …
Currently the english parser in text search does not support multiple words at the same position. Consider the word wikipedia.org. The text search parser returns a single token, wikipedia.org. However, if someone searches for "wikipedia org" there will not be a match. There are two problems here:
1. We do not have separate tokens wikipedia and org.
2. If we have the two tokens, we should have them at adjacent positions so that a phrase search for "wikipedia org" works.
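(The behavior is easy to reproduce against a stock installation; the commented lines are the output I'd expect from the default english configuration:)

    SELECT to_tsvector('english', 'wikipedia.org');
    --     to_tsvector
    -- -------------------
    --  'wikipedia.org':1

    -- The single host lexeme matches neither term on its own:
    SELECT to_tsvector('english', 'wikipedia.org')
           @@ to_tsquery('english', 'wikipedia & org');
    -- f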