Re: Considering intermediary solution before Lucene question

2004-11-17 Thread jeichels
I thank you both.  I already have it partly implemented here.  It seems easy.

At least this should carry my product through until I can really get to use 
Lucene.  I am not sure how far I can take MySQL with stemmed, indexed 
keywords, but it should give me maybe 6 months at least of something useful as 
opposed to impossible searching.  I need time and this might just do the trick.

I always fight for simplicity, but it is hard when you have 2 databases that 
have to be kept in sync.  If accuracy is important (people paying money), then 
handling all of the edge cases (such as the question that was just asked about 
what happens if the machine goes down) is so important.  I understand this is 
beyond the scope of Lucene.

Thank you for the help.  This really is an interesting project.

JohnE





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Considering intermediary solution before Lucene question

2004-11-17 Thread Chris Lamprecht
John,

It actually should be pretty easy to use just the parts of Lucene you
want (the analyzers, etc) without using the rest.  See the example of
the PorterStemmer from this article:

http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html?page=2

You could feed a Reader to the tokenStream() method of
PorterStemAnalyzer, and get back a TokenStream, from which you pull
the tokens using the next() method.
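
For anyone who wants to see the shape of that loop without Lucene on the classpath, here is a minimal stand-in sketch. It is not Lucene code: KeywordStream and its methods are hypothetical names made up for illustration, the stop list is a tiny sample, and stem() is crude suffix stripping rather than the Porter algorithm. It only mirrors the pattern described above, where you hand in a Reader and pull tokens with next() until it returns null.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for an analyzer's token stream; not a Lucene class. */
class KeywordStream {
    // Tiny illustrative stop list; a real one would be much longer.
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "and", "the", "of", "to", "in", "is", "for"));

    private final Reader reader;

    KeywordStream(Reader reader) {
        this.reader = reader;
    }

    /** Returns the next stemmed non-stop word, or null when input is exhausted. */
    String next() throws IOException {
        while (true) {
            String word = readWord();
            if (word == null) {
                return null;
            }
            if (!STOP_WORDS.contains(word)) {
                return stem(word);
            }
        }
    }

    /** Reads the next run of letters or digits, lowercased; null at end of input. */
    private String readWord() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isLetterOrDigit(c)) {
                sb.append(Character.toLowerCase((char) c));
            } else if (sb.length() > 0) {
                break;
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }

    /** Crude suffix stripping for illustration only; NOT the Porter algorithm. */
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    /** Drains the stream into a list using the pull loop described above. */
    static List<String> keywords(String phrase) {
        KeywordStream stream = new KeywordStream(new StringReader(phrase));
        List<String> result = new ArrayList<>();
        try {
            for (String t = stream.next(); t != null; t = stream.next()) {
                result.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

With this sketch, KeywordStream.keywords("Searching the indexed records") yields [search, index, record]. A real stemmer handles far more cases, so treat the output as an illustration of the flow only.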






Re: Considering intermediary solution before Lucene question

2004-11-17 Thread jeichels
This is so cool, Otis.  I was just about to write this based on something in 
the FAQ, but this is better than what I was doing.

This rocks!!!  Thank you.

JohnE

P.S.:  I am assuming you use org.apache.lucene.analysis.Token?  There are 
three Token classes under Lucene.






Re: Considering intermediary solution before Lucene question

2004-11-17 Thread Otis Gospodnetic
Yes, you can use just the Analysis part.  For instance, I use this for
http://www.simpy.com and I believe we also have this in the Lucene book
as part of the source code package:

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

/**
 * Gets Tokens extracted from the given text, using the specified Analyzer.
 *
 * @param analyzer the Analyzer to use
 * @param text the text to analyze
 * @param field the field to pass to the Analyzer for tokenization
 * @return an array of Tokens
 * @exception IOException if an error occurs
 */
public static Token[] getTokens(Analyzer analyzer, String text, String field)
    throws IOException
{
    TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
    ArrayList tokenList = new ArrayList();
    while (true) {
        // next() returns null once the stream is exhausted
        Token token = stream.next();
        if (token == null)
            break;
        tokenList.add(token);
    }
    return (Token[]) tokenList.toArray(new Token[0]);
}

Otis




Considering intermediary solution before Lucene question

2004-11-17 Thread jeichels

Is there a way to use Lucene stemming and stop word removal without using the 
rest of the tool?  I am downloading the code now, but I imagine the answer 
might be deeply buried.  I would like to be able to send in a phrase and get 
back a collection of keywords if possible.

I am thinking of using an intermediary solution before moving fully to Lucene.  
I don't have time to spend a month making a carefully tested, administrable 
Lucene solution for my site yet, but I intend to do so over time.  Funny thing 
is the Lucene code would likely only take up a couple hundred lines, but 
integration and administration would take me much more time.

In the meantime, I am thinking I could perhaps use Lucene stemming and parsing 
of words, then stick each search word along with the associated primary key in 
an indexed MySQL table.  Each record I would need to do this for is small, with 
maybe only 15 useful words on average.  I would be able to have an in-database 
solution, though ranking, etc. would not exist.  This is better than the exact 
word searching I have currently, which is really bad.
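
To make the idea concrete, here is a rough in-memory sketch of such a keyword table. Everything in it is an assumption for illustration: KeywordTable and its methods are hypothetical names, and the map stands in for a MySQL table with one (keyword, primary key) row per word and an index on the keyword column. It takes keywords that have already been stemmed and stop-filtered, and search() returns the records that contain all of the given keywords, with no ranking.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory model of a (keyword, primary key) table indexed on keyword. */
class KeywordTable {
    // keyword -> primary keys of the records that contain it
    private final Map<String, Set<Long>> rows = new HashMap<>();

    /** Stores one (keyword, primaryKey) row per keyword; duplicate rows collapse. */
    void index(long primaryKey, Collection<String> keywords) {
        for (String keyword : keywords) {
            rows.computeIfAbsent(keyword, k -> new TreeSet<>()).add(primaryKey);
        }
    }

    /** Drops every row for a record, e.g. before re-indexing an edited record. */
    void remove(long primaryKey) {
        rows.values().forEach(ids -> ids.remove(primaryKey));
        rows.values().removeIf(Set::isEmpty);
    }

    /** Primary keys of records matching ALL of the keywords; results are unranked. */
    Set<Long> search(Collection<String> keywords) {
        Set<Long> result = null;
        for (String keyword : keywords) {
            Set<Long> ids = rows.getOrDefault(keyword, Collections.<Long>emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);   // first keyword seeds the result
            } else {
                result.retainAll(ids);         // later keywords narrow it
            }
        }
        return result == null ? Collections.<Long>emptySet() : result;
    }
}
```

The remove() step matters for the use case here: when a record is edited, its old keyword rows have to go before the new ones are stored, otherwise stale words keep matching.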

By the way, MySQL 4.1.1 has some Lucene-type handling, but it too does not have 
stemming and I am sure it is very slow compared to Lucene.  cPanel is still 
stuck on MySQL 4.0.*, so many people would not have access to even this basic 
ability in production systems for some time yet.

JohnE


