Re: Email Filter using Lucene 3.0

Otis Gospodnetic Fri, 29 Jan 2010 04:50:05 -0800

Hi Jamie,

Could you say more about how it's not working?  Not compiling? Run-time 
exceptions?  Doesn't work as expected after you run a unit test for it?



Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: Jamie <ja...@stimulussoft.com>
> To: java-user@lucene.apache.org
> Sent: Fri, January 29, 2010 7:29:13 AM
> Subject: Email Filter using Lucene 3.0
> 
> Hi there,
> 
> In the absence of documentation, I am trying to convert an EmailFilter class 
> to Lucene 3.0. It's not working! Obviously, my understanding of the new token 
> filter mechanism is misguided.
> Can someone in the know help me out for a sec and let me know where I am 
> going wrong? Thanks.
> 
> import org.apache.commons.logging.*;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.TokenFilter;
> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.tokenattributes.TermAttribute;
> 
> import java.io.IOException;
> import java.io.Serializable;
> import java.util.ArrayList;
> import java.util.Stack;
> 
> /* Many thanks to Michael J. Prichard for his
> * original email filter code. It has been rewritten. */
> 
> public class EmailFilter extends TokenFilter  implements Serializable {
> 
>     public EmailFilter(TokenStream in) {
>         super(in);
>     }
> 
>     public final boolean incrementToken() throws java.io.IOException {
> 
>         if (!input.incrementToken()) {
>             return false;
>         }
> 
> 
>         TermAttribute termAtt = (TermAttribute) input.getAttribute(TermAttribute.class);
> 
>         char[] buffer = termAtt.termBuffer();
>         final int bufferLength = termAtt.termLength();
>         String emailAddress = new String(buffer, 0,bufferLength);
>         emailAddress = emailAddress.replaceAll("<", "");
>         emailAddress = emailAddress.replaceAll(">", "");
>         emailAddress = emailAddress.replaceAll("\"", "");
> 
>         String [] parts = extractEmailParts(emailAddress);
>         clearAttributes();
>         for (int i = 0; i < parts.length; i++) {
>             if (parts[i]!=null) {
>                 TermAttribute newTermAttribute = addAttribute(TermAttribute.class);
>                 newTermAttribute.setTermBuffer(parts[i]);
>                 newTermAttribute.setTermLength(parts[i].length());
>             }
>         }
>         return true;
>     }
> 
>     private String[] extractWhitespaceParts(String email) {
>         String[] whitespaceParts = email.split(" ");
>         ArrayList<String> partsList = new ArrayList<String>();
>         for (int i=0; i < whitespaceParts.length; i++) {
>             partsList.add(whitespaceParts[i]);
>         }
>         return whitespaceParts;
>     }
> 
>     private String[] extractEmailParts(String email) {
> 
>         if (email.indexOf('@')==-1)
>             return extractWhitespaceParts(email);
> 
>         ArrayList<String> partsList = new ArrayList<String>();
> 
>         String[] whitespaceParts = extractWhitespaceParts(email);
> 
>          for (int w=0; w < whitespaceParts.length; w++) {
> 
>              if (whitespaceParts[w].indexOf('@')==-1)
>                  partsList.add(whitespaceParts[w]);
>              else {
>                  partsList.add(whitespaceParts[w]);
>                  String[] splitOnAmpersand = whitespaceParts[w].split("@");
>                  try {
>                      partsList.add(splitOnAmpersand[0]);
>                      partsList.add(splitOnAmpersand[1]);
>                  } catch (ArrayIndexOutOfBoundsException ae) {}
> 
>                 if (splitOnAmpersand.length > 0) {
>                     String[] splitOnDot = splitOnAmpersand[0].split("\\.");
>                      for (int i=0; i < splitOnDot.length; i++) {
>                          partsList.add(splitOnDot[i]);
>                      }
>                 }
>                 if (splitOnAmpersand.length > 1) {
>                     String[] splitOnDot = splitOnAmpersand[1].split("\\.");
>                     for (int i=0; i < splitOnDot.length; i++) {
>                         partsList.add(splitOnDot[i]);
>                     }
> 
>                     if (splitOnDot.length > 2) {
>                         String domain = splitOnDot[splitOnDot.length-2] + "." + splitOnDot[splitOnDot.length-1];
>                         partsList.add(domain);
>                     }
>                 }
>              }
>          }
>         return partsList.toArray(new String[0]);
>     }
> 
> }
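
One thing that stands out in the quoted incrementToken(): addAttribute(TermAttribute.class) 
hands back the same attribute instance on every call, so the loop overwrites the 
term each time and only the last part survives. Each call to incrementToken() is 
expected to produce a single token. A minimal sketch of the usual workaround, 
queueing the extra parts and returning one per call, follows. It is plain Java 
with no Lucene dependency; EmailPartEmitter, next(), extract() and addIfPresent() 
are hypothetical names for illustration, and next() only stands in for 
incrementToken().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a token filter; this is not a Lucene class.
final class EmailPartEmitter {

    // Upstream tokens, playing the role of the wrapped TokenStream.
    private final Iterator<String> input;
    // Parts extracted from the current token that have not been emitted yet.
    private final Deque<String> pending = new ArrayDeque<String>();

    EmailPartEmitter(Iterator<String> input) {
        this.input = input;
    }

    // Plays the role of incrementToken(): one part per call, null when exhausted.
    String next() {
        while (pending.isEmpty()) {
            if (!input.hasNext()) {
                return null;
            }
            pending.addAll(extract(input.next()));
        }
        return pending.poll();
    }

    // Splits a token into the whole word, local part, domain, and dot-separated pieces.
    static List<String> extract(String token) {
        List<String> parts = new ArrayList<String>();
        String cleaned = token.replace("<", "").replace(">", "").replace("\"", "");
        for (String word : cleaned.split(" ")) {
            if (word.isEmpty()) {
                continue;
            }
            parts.add(word);
            int at = word.indexOf('@');
            if (at < 0) {
                continue; // not an address: keep the word only
            }
            String local = word.substring(0, at);
            String domain = word.substring(at + 1);
            addIfPresent(parts, local);
            addIfPresent(parts, domain);
            for (String piece : local.split("\\.")) {
                addIfPresent(parts, piece);
            }
            String[] labels = domain.split("\\.");
            for (String label : labels) {
                addIfPresent(parts, label);
            }
            if (labels.length > 2) {
                // Last two labels joined, e.g. "example.com" from "mail.example.com".
                parts.add(labels[labels.length - 2] + "." + labels[labels.length - 1]);
            }
        }
        return parts;
    }

    private static void addIfPresent(List<String> parts, String s) {
        if (!s.isEmpty()) {
            parts.add(s);
        }
    }

    public static void main(String[] args) {
        EmailPartEmitter emitter = new EmailPartEmitter(
                Arrays.asList("hello", "<john.doe@mail.example.com>").iterator());
        for (String part = emitter.next(); part != null; part = emitter.next()) {
            System.out.println(part);
        }
    }
}
```

In a real TokenFilter the queue would presumably live in a field, and 
incrementToken() would copy each polled part into the term attribute, probably 
with a position increment of 0 for the extra parts. I have not tested that 
against 3.0, so treat it as a starting point only.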


