RE: [codec] Soundex issue with accented character.

2004-06-02 Thread Edelson, Justin
The only better solution I can think of is to map the characters into their 
non-accented equivalent. While I think it's important to state that the default 
Soundex implementation is for English words, it would be nice to accommodate words 
with accented characters.
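A minimal sketch of the "map to the non-accented equivalent" idea. It assumes Java 6 or later, because java.text.Normalizer did not exist in the JDKs of 2004. AccentFolder and fold() are hypothetical names, not part of [codec]. The input is decomposed into base letters plus combining marks (Unicode NFD), and the marks are dropped:

```java
import java.text.Normalizer;

// Hypothetical helper (not part of commons-codec): folds accented letters
// to their base letters before a String is passed to a language encoder.
final class AccentFolder {
    private AccentFolder() {
    }

    static String fold(String input) {
        // NFD splits a precomposed letter such as "é" into "e" followed by a
        // combining mark (here U+0301, COMBINING ACUTE ACCENT).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches combining marks; removing them leaves the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

For example, fold("Müller") yields "Muller", which Soundex can encode. Letters such as ß, ø or æ have no canonical decomposition and pass through unchanged, so an encoder would still need a policy for them.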

My bigger concern is that the behavior is inconsistent among Soundex, Metaphone, and
DoubleMetaphone. Soundex will now throw an IllegalArgumentException, whereas Metaphone
passes through the bad character. DoubleMetaphone has support for two accented
characters, C with cedilla (Ç) and N with tilde (Ñ).

Because I think the language codecs should be swappable components, it's a
good idea for their support to be consistent. To that end, a given String passed to the
codecs should cause either all of them to throw an exception or none of them.

Just my 2 cents.
 

-Original Message-
From: Gary Gregory [mailto:[EMAIL PROTECTED] 
Sent: Sunday, May 23, 2004 8:37 PM
To: Jakarta Commons Developers List
Subject: [codec] Soundex issue with accented character.


http://nagoya.apache.org/bugzilla/show_bug.cgi?id=29080

Currently, ö or é in a String causes Soundex to throw an 
ArrayIndexOutOfBoundsException.

We can either:

(1) Throw a better exception, such as an IllegalArgumentException with a message like
"Only 'plain' letters are allowed."

Or:

(2) Ignore unmapped characters. This would work for ö and é, since vowels are
ignored, but it could produce bad encoding values for other characters such as ç.
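The two options can be compared in a simplified Soundex sketch. SketchSoundex is hypothetical, not the [codec] class, and it omits the H/W rule of American Soundex. An unguarded table lookup of the form mapping[ch - 'A'] fails with exactly an ArrayIndexOutOfBoundsException for ö or é; here the lookup is guarded, and the constructor flag selects option (1) or option (2):

```java
import java.util.Locale;

// Hypothetical, simplified Soundex (no H/W rule); not the commons-codec class.
final class SketchSoundex {
    // Codes for 'A'..'Z'; '0' marks letters that are not coded (A E H I O U W Y).
    private static final char[] MAPPING = "01230120022455012623010202".toCharArray();

    private final boolean skipUnmapped;

    SketchSoundex(boolean skipUnmapped) {
        this.skipUnmapped = skipUnmapped;
    }

    private char code(char ch) {
        if (ch < 'A' || ch > 'Z') {
            if (!skipUnmapped) {
                // Option (1): a descriptive exception instead of the
                // ArrayIndexOutOfBoundsException from MAPPING[ch - 'A'].
                throw new IllegalArgumentException("Character '" + ch + "' is not mapped");
            }
            return '0'; // Option (2): treat it like an uncoded vowel
        }
        return MAPPING[ch - 'A'];
    }

    String encode(String input) {
        String s = input.toUpperCase(Locale.ENGLISH);
        StringBuilder out = new StringBuilder(4);
        char last = '0';
        for (int i = 0; i < s.length() && out.length() < 4; i++) {
            char ch = s.charAt(i);
            char c = code(ch);
            if (out.length() == 0) {
                if (ch >= 'A' && ch <= 'Z') {
                    out.append(ch); // the first letter is kept as a letter
                }
            } else if (c != '0' && c != last) {
                out.append(c); // skip uncoded letters and repeated codes
            }
            last = c;
        }
        while (out.length() > 0 && out.length() < 4) {
            out.append('0'); // pad to the usual four characters
        }
        return out.toString();
    }
}
```

With option (2), new SketchSoundex(true).encode("Müller") gives M460, the same as "Mueller", but "façade" gives F300 while "facade" gives F230, which is the ç problem described above. With option (1), "Müller" is rejected with an IllegalArgumentException.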

AFAIK, you cannot ask if a character is a vowel or not.

Thoughts?

Gary


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





RE: [codec] Soundex issue with accented character.

2004-06-02 Thread Edelson, Justin
That's not the behavior in either the latest [codec] release or HEAD.
Can you clarify where this 'standard' behavior you describe is
documented? Neither the National Archives documentation nor the NIST
source code contains this behavior.

 -Original Message-
 From: C. Scott Ananian [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, June 02, 2004 11:02 AM
 To: Jakarta Commons Developers List
 Subject: RE: [codec] Soundex issue with accented character.
 
 
 On Wed, 2 Jun 2004, Edelson, Justin wrote:
 
  The only better solution I can think of is to map the characters
  into their non-accented equivalent. While I think it's important to
  state that the default Soundex implementation is for English words, it
  would be nice to accommodate words with accented characters.
 
 I believe the 'standard' behavior is just to drop the
 accented character from the soundex encoding.  The soundex
 algorithm typically already does this for other 'quiet' 
 characters. (Note that two words with accented characters 
 will still match correctly even if the accented characters 
 are dropped.)  --scott
 
 
 
 




RE: [lang] Equalator?

2004-05-18 Thread Edelson, Justin
The one thing I can do with an Equalator that I don't see how to do (in
a meaningful way) with Comparators is chain them. I've implemented a
ChainedOrEqualator that contains a list of Equalators. If one returns
true, then the ChainedOrEqualator returns true. Likewise, there's a
ChainedAndEqualator.

Am I missing a way to do this with Comparators?
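A minimal sketch of the chained Equalators described above. The Equalator interface and both chain classes are reconstructed from this description only; they are hypothetical and not part of [lang]:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface discussed in this thread; not part of commons-lang.
interface Equalator {
    boolean equals(Object a, Object b);
}

// Equal if ANY wrapped Equalator says the arguments are equal.
final class ChainedOrEqualator implements Equalator {
    private final List<Equalator> chain;

    ChainedOrEqualator(Equalator... chain) {
        this.chain = Arrays.asList(chain);
    }

    @Override
    public boolean equals(Object a, Object b) {
        for (Equalator e : chain) {
            if (e.equals(a, b)) {
                return true; // stop at the first match
            }
        }
        return false;
    }
}

// Equal only if ALL wrapped Equalators say the arguments are equal.
final class ChainedAndEqualator implements Equalator {
    private final List<Equalator> chain;

    ChainedAndEqualator(Equalator... chain) {
        this.chain = Arrays.asList(chain);
    }

    @Override
    public boolean equals(Object a, Object b) {
        for (Equalator e : chain) {
            if (!e.equals(a, b)) {
                return false; // stop at the first mismatch
            }
        }
        return true;
    }
}
```

An empty ChainedOrEqualator returns false and an empty ChainedAndEqualator returns true, mirroring || and &&. A Comparator could be plugged into either chain by wrapping it so that compare(a, b) == 0 counts as equal.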

-Original Message-
From: Chuck Daniels [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 11, 2004 8:55 PM
To: Jakarta Commons Developers List
Subject: RE: [lang] Equalator?


I suggest you simply implement the Comparator interface since it is a
superset of your suggested Equalator interface.  Therefore, I would
implement your MetaphoneEqualator as EncodingComparator.  The class name
prefix is changed from Metaphone to Encoding since you are not actually
comparing Metaphones, but rather two encodings produced by a single
Metaphone.  More generally, you are actually comparing two encodings
produced by a single Encoder:

...





[email] is email in bugzilla?

2004-05-11 Thread Edelson, Justin
I'm trying to find where to submit a patch for email. It doesn't seem to
be in bugzilla.

In any case, I'd like to add a setHostPort(int) method to the
o.a.c.m.Email class. Here's the patch:
Index: Email.java
===================================================================
retrieving revision 1.15
diff -u -r1.15 Email.java
--- Email.java  19 Feb 2004 22:38:09 -0000  1.15
+++ Email.java  11 May 2004 16:11:03 -0000
@@ -58,6 +58,7 @@
     public static final String CONTENT_TYPE = "content.type";
 
     public static final String MAIL_HOST = "mail.host";
+    public static final String MAIL_PORT = "mail.smtp.port";
     public static final String MAIL_SMTP_FROM = "mail.smtp.from";
     public static final String MAIL_SMTP_AUTH = "mail.smtp.auth";
     public static final String MAIL_TRANSPORT_PROTOCOL = "mail.transport.protocol";
@@ -110,6 +111,11 @@
      * to get property from system.properties. If still null, quit
      */
     private String hostName = null;
+
+    /**
+     * The port of the mail server with which to connect.
+     */
+    private int hostPort = 25;
 
     /** List of to email adresses */
     private ArrayList toList = null;
@@ -258,6 +264,7 @@
 
         properties.setProperty(MAIL_HOST, hostName);
         properties.setProperty(MAIL_DEBUG,new Boolean(this.debug).toString());
+        properties.setProperty(MAIL_PORT, "" + hostPort);
 
         if (this.authenticator!= null)
         {
@@ -753,5 +760,15 @@
             (InternetAddress[]) aList.toArray(new InternetAddress[0]);
 
         return ia;
+    }
+
+    /**
+     * Set the port of the outgoing mail server.
+     *
+     * @param   aHostPort the SMTP port to connect to
+     */
+    public void setHostPort(int aHostPort)
+    {
+        this.hostPort = aHostPort;
     }
 }
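For reference, here is roughly what the patched code ends up doing with java.util.Properties, as a standalone sketch. SmtpProperties and build() are hypothetical names, and the port-range check is an added assumption that the patch itself does not make. The int has to be converted because Properties.setProperty accepts only Strings:

```java
import java.util.Properties;

// Hypothetical helper mirroring the patch; not part of commons-email.
final class SmtpProperties {
    private SmtpProperties() {
    }

    static Properties build(String hostName, int hostPort) {
        // Assumption (not in the patch): reject values that cannot be TCP ports.
        if (hostPort < 1 || hostPort > 65535) {
            throw new IllegalArgumentException("Invalid port: " + hostPort);
        }
        Properties properties = new Properties();
        properties.setProperty("mail.host", hostName);
        // setProperty(String, String): the port must be converted to a String.
        properties.setProperty("mail.smtp.port", String.valueOf(hostPort));
        return properties;
    }
}
```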




[lang] Equalator?

2004-05-11 Thread Edelson, Justin
I'm writing a few classes that currently implement Comparator, but I
really don't care about comparisons - I just want to use an object to
test equality, ergo Equalator. Does such an interface exist somewhere in
[lang]? I can't find anything similar.




RE: [lang] Equalator?

2004-05-11 Thread Edelson, Justin
That could be one implementation of the Equalator interface. Another
trivial implementation might simply return (a == b) and be called
IdentityEqualator.
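The two trivial implementations mentioned so far might look like this. The names are hypothetical, and the null check is an addition: calling a.equals(b) directly throws a NullPointerException when a is null.

```java
// Hypothetical interface from this thread; not part of commons-lang.
interface Equalator {
    boolean equals(Object a, Object b);
}

// Reference identity: equal only if both arguments are the same object.
final class IdentityEqualator implements Equalator {
    @Override
    public boolean equals(Object a, Object b) {
        return a == b;
    }
}

// Delegates to Object.equals, but tolerates null arguments.
final class ObjectEqualsEqualator implements Equalator {
    @Override
    public boolean equals(Object a, Object b) {
        return a == null ? b == null : a.equals(b);
    }
}
```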

The specific case I'm working on uses Metaphone (from [codec]). The
implementation of MetaphoneEqualator looks like this:

...
private Metaphone mEncoder = new Metaphone();
...
public boolean equals(Object a, Object b) {
    try {
        Object encoded0 = mEncoder.encode(a);
        Object encoded1 = mEncoder.encode(b);
        return encoded0.equals(encoded1);
    } catch (EncoderException exception) {
        // Metaphone.encode(Object) rejects non-String arguments,
        // so anything that cannot be encoded is treated as unequal
        return false;
    }
}

-Original Message-
From: Hookom, Jacob [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 11, 2004 2:47 PM
To: 'Jakarta Commons Developers List'
Subject: RE: [lang] Equalator?


There's this method on Object called equals... I suppose you could
write a single object called Equalator that does:

public boolean equals(Object a, Object b) {
    return a.equals(b);
}

-Original Message-
From: Edelson, Justin [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 11, 2004 1:25 PM
To: Jakarta Commons Developers List
Subject: [lang] Equalator?

I'm writing a few classes that currently implement Comparator, but I
really don't care about comparisons - I just want to use an object to
test equality, ergo Equalator. Does such an interface exist somewhere in
lang (I can't find anything similar).




RE: [digester] mixed content update

2004-03-30 Thread Edelson, Justin
Simon - Sorry I haven't had a chance to respond to your email. I was
actually concentrating more on answering your question about use-cases,
but it sounds like I don't need to sell this need as much as I thought I
did (at least for now).

 If someone (eg Justin) is keen to work on this now, we could
potentially get it in the next release. Otherwise I suggest this could
go on the to-do list for post-1.6.

I don't know if "keen" is the right word, but I'm committed to
digesting mixed content XML for an application now, i.e. I need to
solve this problem one way or another (or use some other XML ingesting
mechanism, which I don't want to do for purely selfish reasons). At this
point, I'm planning on an implementation as a subclass. Whether this
subclass is accepted and put into the Digester release is dependent upon
a variety of factors, but my intention is to develop a solution either
way. I have sign-off on contributing modifications back to Apache, so
that's not an issue.

Of course, I have a high level of respect for the members of this list
(not blowing smoke, I swear) so I'm very interested in crafting the
mixed content solution based on any feedback Commons developers may
have.

Just to be clear, is there a timeframe for 1.6?

 Justin, if you have any arguments to back your original design, please
speak up! Or if you are willing to try implementing some other approach
that doesn't involve @text patterns, please speak up too.

Let's separate the two issues in my original design: using a special
text designator at all, and the specific designator used. To be honest, @text
was really just a placeholder on my end. The only requirement I have for
this designator is that it be an illegal XML element name, so as to
ensure that there's no conflict (i.e. if the designator were just "text",
that would pose an issue if you had an element named text). My core
argument in favor of using a specific designator is that it explicitly
indicates that the pattern (i.e. /element/@text) uses different
functionality than the traditional Digester method. This is a pretty
weak argument, I'll admit. I also feel (but can't prove yet) that this
method performs better, because the extra iterations run over a
smaller list of rules.
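The "illegal XML element name" requirement can be checked with the JDK's own SAX parser. This hypothetical helper wraps a candidate name in an empty element and reports whether the resulting document is well-formed; "text" passes, while "@text" fails because '@' is not allowed in XML names:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: is <name/> a well-formed XML document?
final class XmlNameCheck {
    private XmlNameCheck() {
    }

    static boolean isLegalElementName(String name) {
        String doc = "<" + name + "/>";
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(doc)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // SAXParseException (a subclass) signals a well-formedness error
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```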

As indicated above, I'm very willing to try alternate implementations,
including the interface solution you suggested.

 How do people feel about my initial proposed solution to this (as
follows)?

My only concern about the interface solution (for lack of a better name)
is when you wrote 'for each rule matched by the last call to
startElement' - In my original subclassed-version, I had a call to
super.startElement(), but in order to do what you've described, I think
you'd need to replicate all the code in Digester.startElement() in the
subclass's startElement() method. Otherwise, the overridden
startElement() in the subclass would have to make an extra call to
Rules.match(). I was originally worried about having to maintain the
subclass's startElement method to reflect changes in the Digester
implementation, thus the call to super.startElement(). Is this too
dogmatic? I'm not looking to rehash the cut-and-paste vs. eating our
own dog food discussion recently seen in the context of [lang].

I'll have some time later in the week to take a crack at implementing
the interface solution.

Yet a third implementation that I've been thinking about would be to
take the new interface and create some additional interfaces around it:
MixedContentRules and MixedContentRuleSet. These objects would basically
parallel the Rule/Rules/RuleSet interfaces. Within the
MixedContentDigester subclass, there'd be a new instance variable called
mixedContentRules. In short, the concept is that the classes that
implement MixedContentRule would be segregated from the traditional
Digester rules. The core reason for this is that I'm concerned about the
performance impact of both my original solution and Simon's. Segregating
the rules ensures that a match() call to a MixedContentRules object
only searches within the mixed-content rules, which should lead to
better performance.
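A rough sketch of the segregated-registry idea, using plain Java stand-ins rather than Digester's Rule and Rules types. TextRule and MixedContentRules here are hypothetical; the lookup is exact-match only, whereas Digester's Rules implementations also handle wildcard patterns:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a MixedContentRule: receives one text segment.
interface TextRule {
    void text(String pattern, String segment);
}

// Hypothetical registry kept separate from the ordinary Digester rules, so a
// lookup only ever considers mixed-content rules. Exact-match patterns only.
final class MixedContentRules {
    private final Map<String, List<TextRule>> rulesByPattern = new HashMap<>();

    void add(String pattern, TextRule rule) {
        rulesByPattern.computeIfAbsent(pattern, p -> new ArrayList<>()).add(rule);
    }

    List<TextRule> match(String pattern) {
        List<TextRule> rules = rulesByPattern.get(pattern);
        if (rules == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(rules);
    }
}
```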

 And if you think there are other features that could be added to
digester using @-style patterns then that would also be good to
mention.
I had worked up a use-case for @comment that would allow for comments to
be digested (imagine a JavaDoc/XDoclet-style application that read
comments out of a struts config XML file). But then I remembered that
SAX's ContentHandler doesn't report comments (they are only available
through the optional LexicalHandler extension), so this is a bigger can of worms.

I've gone ahead and submitted my test case to Bugzilla (#28068). I can
never remember how Bugzilla reacts to XML submitted in its forms, so I
kept that out. I assume that we're all on the same page as to what
mixed content means, but I can easily add an example.

Thanks for the interest. I was a bit surprised at first that some were
so willing to write off Digester as just for configuration files.

Justin


RE: [collections] ListOrderedMap vs LinkedMap

2004-03-29 Thread Edelson, Justin
ListOrderedMap cannot be instantiated directly. Per the Javadocs, it's a
decorator, not a Map implementation. You get an instance of
ListOrderedMap by passing an existing Map to
ListOrderedMap.decorate(Map).

LinkedMap, however, can be instantiated directly and has similar
constructors to java.util.HashMap.
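To illustrate the decorator distinction, here is a minimal, hypothetical decorator that records key insertion order in a List alongside whatever Map it wraps. It is not the commons-collections class and does not implement java.util.Map; it only shows the shape of the idea:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical decorator (not the commons-collections class): adds insertion
// order to any Map by keeping the keys in a List alongside it.
final class OrderedMapDecorator<K, V> {
    private final Map<K, V> map;
    private final List<K> insertOrder = new ArrayList<>();

    private OrderedMapDecorator(Map<K, V> map) {
        this.map = map;
        // keys already in the wrapped map keep that map's iteration order
        insertOrder.addAll(map.keySet());
    }

    static <K, V> OrderedMapDecorator<K, V> decorate(Map<K, V> map) {
        return new OrderedMapDecorator<>(map);
    }

    V put(K key, V value) {
        if (!map.containsKey(key)) {
            insertOrder.add(key); // only new keys change the order
        }
        return map.put(key, value);
    }

    V get(Object key) {
        return map.get(key);
    }

    V remove(Object key) {
        if (map.containsKey(key)) {
            insertOrder.remove(key);
        }
        return map.remove(key);
    }

    List<K> keyList() {
        return Collections.unmodifiableList(insertOrder);
    }
}
```

LinkedMap, by contrast, builds the ordering into the map implementation itself, much like java.util.LinkedHashMap.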

Hope this helps.

Justin

-Original Message-
From: Torsten Curdt [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 29, 2004 7:21 AM
To: [EMAIL PROTECTED]
Subject: [collections] ListOrderedMap vs LinkedMap


Can someone please explain the difference between the ListOrderedMap and
the LinkedMap? I mean: they basically provide the same functionality,
right?

cheers
--
Torsten






RE: [digester] mixed content update

2004-03-23 Thread Edelson, Justin
I created a subclass of Digester (MixedContentDigester) to do this. Along with a new 
rule (see below), it passes my simple test case. Would this be useful code to add? I 
figured creating the subclass would make this easier to integrate as it doesn't break 
anything.
 
In short, I created a new abstract class called TextRule and a concrete class
AddTextRule. When startElement() on the MixedContentDigester is called, before the
rules are invoked, a search is done for rules matching the current pattern plus "/@text"
(was it you, Scott, who threw this out on the 2002 thread? I forget, but it looked good
to me). The AddTextRule's body() method gets called and the bodyText StringBuffer is
emptied. Something similar happens on the call to endElement().
 
This is probably not the best explanation, but it does work.
 
The idea of the abstract TextRule class was that you could have different TextRules,
not all of which did something like adding. Perhaps it should just be called
CallTextMethodRule (a lot of the rule code is closely related to that in
CallMethodRule). I did, however, create another rule called AddTrimmedTextRule and was
thinking about AddNormalizedTextRule (using JDOM's definition of normalized)...
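The flush-on-element-boundary idea can be illustrated with plain SAX, without Digester. This hypothetical handler buffers character data and emits it as a separate, trimmed segment whenever a child element starts or the current element ends, which is roughly when the MixedContentDigester described above fires its text rules. It does not track patterns; it only shows how the text gets split:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical plain-SAX sketch (not Digester) of splitting mixed content.
final class MixedContentSplitter extends DefaultHandler {
    private final StringBuilder bodyText = new StringBuilder();
    private final List<String> segments = new ArrayList<>();

    private void flush() {
        String text = bodyText.toString().trim(); // trimmed, like AddTrimmedTextRule
        if (!text.isEmpty()) {
            segments.add(text);
        }
        bodyText.setLength(0); // empty the buffer for the next segment
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        flush(); // text before a child element is its own segment
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush(); // text before a closing tag is its own segment
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        bodyText.append(ch, start, length); // SAX may deliver text in chunks
    }

    static List<String> split(String xml) {
        MixedContentSplitter handler = new MixedContentSplitter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.segments;
    }
}
```

Applied to <c>beginning text <d attr="foo"/> ending text</c>, it yields the two segments "beginning text" and "ending text".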
 
Justin

-Original Message- 
From: Scott Sanders [mailto:[EMAIL PROTECTED] 
Sent: Tue 3/23/2004 2:23 PM 
To: 'Jakarta Commons Developers List' 
Cc: 
Subject: RE: [digester] mixed content update



Justin,

Digester is not set up to handle mixed content.  I would use something else,
or modify Digester to do what you want.

Scott (originator of the mixed content thread)

 -Original Message-
 From: Edelson, Justin [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, March 23, 2004 9:18 AM
 To: Jakarta Commons Developers List
 Subject: [digester] mixed content update

 I'm trying to figure out the best way to digest some XML with mixed
 content, i.e.

 <a>
   <b>
     <c>beginning text <d attr="foo"/> ending text</c>
   </b>
 </a>

 Where it's important for "beginning text" and "ending text" to be
 treated separately.

 I looked through the mailing list archives and found a discussion from
 early 2002 on this subject. It looks like the net result of that
 discussion was that, in my example above, the content "beginning text
 ending text" is made available by using a CallMethodRule.

 Has there been any subsequent discussion? I got the sense that the
 decision really was that mixed content wasn't for Digester in the
 sense that Digester is targeted to loading configuration files that
 tend to be either all-attributes or all-body-content
 (http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]
 .apache.orgmsgId=72369).

 I'll happily give up using Digester to accomplish my mixed-content
 project and switch to JDOM (or even look at the Avalon Configuration
 stuff someone mentioned), but I wanted to check with the list before
 giving up.

 Thanks,
 Justin





Request for getLog(Object) method

2004-02-13 Thread Edelson, Justin
As part of implementing JCL, we need to be able to obtain Log objects based on the referring object, basically using custom Log and LogFactory implementations that inspect an object to determine the appropriate name to use for logging. I can clarify this further if necessary.

In order to do this, I'm proposing these changes:
* a new overloaded version of the static method getLog that accepts an Object be added to LogFactory
* a new overloaded version of the instance method getInstance that accepts an Object be added to LogFactory
* a new default implementation of getInstance(Object) be added to LogFactoryImpl that uses the object's class name as the log name.
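A minimal sketch of the proposed default behavior, using hypothetical stand-ins (NamedLog, SketchLogFactory, SimpleLogFactory) rather than the real commons-logging classes. A custom factory could override getInstance(Object) to inspect the object; the default falls back to the object's class name:

```java
// Hypothetical stand-ins, not the commons-logging API.
interface NamedLog {
    String getName();
}

abstract class SketchLogFactory {
    // Existing style of lookup: by explicit log name.
    public abstract NamedLog getInstance(String name);

    // Proposed overload: derive the log from the referring object. A custom
    // factory could override this to inspect the object; the default simply
    // uses the object's class name.
    public NamedLog getInstance(Object referrer) {
        return getInstance(referrer.getClass().getName());
    }
}

final class SimpleLogFactory extends SketchLogFactory {
    @Override
    public NamedLog getInstance(String name) {
        return () -> name; // a log that only knows its name
    }
}
```

One thing to watch with this overload: Java chooses overloads at compile time, so getInstance(someString) still resolves to getInstance(String) and treats the string as a log name, and a Class argument would resolve to the existing getInstance(Class) in LogFactory. Only arguments of other static types reach getInstance(Object).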

Attached is a diff file representing these changes. This is my first time doing this, so please let me know (in a nice way) if I've done something incorrect.

I did not post this directly to bugzilla under the theory that there may need to be some discussion of this request.


Justin Edelson
Software Developer
MTVi


 object_method.diff 


