RE: [codec] Soundex issue with accented character.
The only better solution I can think of is to map the characters into their non-accented equivalent. While I think it's important to state that the default Soundex implementation is for English words, it would be nice to accommodate words with accented characters.

My bigger concern is that the behavior is inconsistent across Soundex, Metaphone, and DoubleMetaphone. Soundex will now throw an IllegalArgumentException, whereas Metaphone passes the bad character through. DoubleMetaphone supports two accented characters, C with cedilla (Ç) and N with tilde (Ñ). To the extent that I think the language codecs should be swappable components, it's a good idea for their support to be consistent. To that end, a String containing such a character should cause an exception in either all of the codecs or none of them.

Just my 2 cents.

-----Original Message-----
From: Gary Gregory [mailto:[EMAIL PROTECTED]
Sent: Sunday, May 23, 2004 8:37 PM
To: Jakarta Commons Developers List
Subject: [codec] Soundex issue with accented character.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=29080

Currently, ö or é in a String causes Soundex to throw an ArrayIndexOutOfBoundsException. We can either:

(1) Throw a better exception, like IllegalArgumentException: "Only 'plain' letters are allowed."

Or:

(2) Ignore unmapped characters. This would work for ö and é, since vowels are ignored, but it could produce bad encoding values for other characters like ç. AFAIK, you cannot ask whether a character is a vowel or not.

Thoughts?
Gary

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
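A minimal sketch of how Justin's suggestion (fold accented characters to their base letters) and Gary's option (2) (skip anything that is still unmapped) could be combined. This is not the [codec] Soundex class: the class name AccentFoldingSoundex is hypothetical, it applies the common American Soundex rules (H and W do not separate letters that share a code), and it assumes java.text.Normalizer, which only arrived in Java 6.

```java
import java.text.Normalizer;
import java.util.Locale;

/**
 * Hypothetical sketch, not the [codec] Soundex class: folds accented letters
 * to their base letters, then silently skips anything still outside A-Z.
 */
public final class AccentFoldingSoundex {

    // Soundex digit for each letter A..Z; '0' marks a vowel (a separator),
    // '-' marks H and W, which are skipped without separating equal codes.
    private static final String CODES = "0123012-02245501262301-202";

    private AccentFoldingSoundex() {
    }

    public static String encode(String input) {
        // Decompose (e.g. "ö" -> "o" + combining diaeresis), then drop the marks.
        String folded = Normalizer.normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "")
                .toUpperCase(Locale.ENGLISH);

        StringBuilder out = new StringBuilder(4);
        char lastCode = 0;
        for (int i = 0; i < folded.length() && out.length() < 4; i++) {
            char c = folded.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // option (2): ignore characters that are still unmapped
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
            } else if (code == '-') {
                continue; // H/W: keep lastCode so equal codes around it collapse
            } else if (code != '0' && code != lastCode) {
                out.append(code);
            }
            lastCode = code;
        }
        if (out.length() == 0) {
            return ""; // no encodable letters at all
        }
        while (out.length() < 4) {
            out.append('0'); // pad to the usual four characters
        }
        return out.toString();
    }
}
```

Because folding happens before the A-Z check, ç becomes c and is coded as 2 instead of being dropped, which addresses Gary's concern about option (2) used on its own. Letters with no Unicode decomposition, such as ø, are still silently skipped.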
RE: [codec] Soundex issue with accented character.
That's not the behavior in either the latest [codec] release or HEAD. Can you clarify where this 'standard' behavior you describe is documented? Neither the National Archives documentation nor the NIST source code contains this behavior.

-----Original Message-----
From: C. Scott Ananian [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 02, 2004 11:02 AM
To: Jakarta Commons Developers List
Subject: RE: [codec] Soundex issue with accented character.

On Wed, 2 Jun 2004, Edelson, Justin wrote:
> The only better solution I can think of is to map the characters into their non-accented equivalent. While I think it's important to state that the default Soundex implementation is for English words, it would be nice to accommodate words with accented characters.

I believe the 'standard' behavior is just to drop the accented character from the soundex encoding. The soundex algorithm typically already does this for other 'quiet' characters. (Note that two words with accented characters will still match correctly even if the accented characters are dropped.)

--scott

blowfish Rijndael Philadelphia MI6 operation Washington SSBN 731 UKUSA spy chemical agent Pakistan Bush Waihopai Minister domestic disruption ( http://cscott.net/ )
RE: [lang] Equalator?
The one thing I can do with an Equalator that I don't see how to do (in a meaningful way) with Comparators is chain them. I've implemented a ChainedOrEqualator that contains a list of Equalators: if any one of them returns true, the ChainedOrEqualator returns true. Likewise, there's a ChainedAndEqualator. Am I missing a way to do this with Comparators?

-----Original Message-----
From: Chuck Daniels [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 11, 2004 8:55 PM
To: Jakarta Commons Developers List
Subject: RE: [lang] Equalator?

I suggest you simply implement the Comparator interface, since it is a superset of your suggested Equalator interface. Therefore, I would implement your MetaphoneEqualator as EncodingComparator. The class name prefix is changed from Metaphone to Encoding because you are not actually comparing Metaphones, but rather two encodings produced by a single Metaphone. More generally, you are actually comparing two encodings produced by a single Encoder:

...
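A sketch of the chaining described above. Equalator, ChainedOrEqualator and ChainedAndEqualator are the names proposed in this thread, not an existing [lang] API, and the code is an illustration rather than the implementation mentioned in the email; the usage example assumes Java 8 lambdas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of the chaining idea; Equalator is hypothetical, not part of [lang]. */
public final class Equalators {

    /** Hypothetical interface: decides equality only, never ordering. */
    public interface Equalator {
        boolean equals(Object a, Object b);
    }

    /** True if ANY member Equalator considers a and b equal. */
    public static final class ChainedOrEqualator implements Equalator {
        private final List<Equalator> chain;

        public ChainedOrEqualator(Equalator... chain) {
            this.chain = new ArrayList<>(Arrays.asList(chain));
        }

        @Override
        public boolean equals(Object a, Object b) {
            for (Equalator e : chain) {
                if (e.equals(a, b)) {
                    return true; // first match wins
                }
            }
            return false;
        }
    }

    /** True only if EVERY member Equalator considers a and b equal. */
    public static final class ChainedAndEqualator implements Equalator {
        private final List<Equalator> chain;

        public ChainedAndEqualator(Equalator... chain) {
            this.chain = new ArrayList<>(Arrays.asList(chain));
        }

        @Override
        public boolean equals(Object a, Object b) {
            for (Equalator e : chain) {
                if (!e.equals(a, b)) {
                    return false; // first mismatch wins
                }
            }
            return true; // an empty AND chain accepts everything
        }
    }

    private Equalators() {
    }
}
```

One caveat: an OR chain is a predicate, not necessarily an equivalence relation, because it need not be transitive. On Java 8 and later, java.util.function.BiPredicate offers and() and or() default methods that cover the same ground.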
[email] is email in bugzilla?
I'm trying to find where to submit a patch for email. It doesn't seem to be in Bugzilla. In any case, I'd like to add a setHostPort(int) method to the o.a.c.m.Email class. Here's the patch:

Index: Email.java
===================================================================
retrieving revision 1.15
diff -u -r1.15 Email.java
--- Email.java 19 Feb 2004 22:38:09 - 1.15
+++ Email.java 11 May 2004 16:11:03 -
@@ -58,6 +58,7 @@
     public static final String CONTENT_TYPE = "content.type";
     public static final String MAIL_HOST = "mail.host";
+    public static final String MAIL_PORT = "mail.smtp.port";
     public static final String MAIL_SMTP_FROM = "mail.smtp.from";
     public static final String MAIL_SMTP_AUTH = "mail.smtp.auth";
     public static final String MAIL_TRANSPORT_PROTOCOL = "mail.transport.protocol";
@@ -110,6 +111,11 @@
      * to get property from system.properties. If still null, quit
      */
     private String hostName = null;
+
+    /**
+     * The port of the mail server with which to connect.
+     */
+    private int hostPort = 25;
     /** List of to email adresses */
     private ArrayList toList = null;
@@ -258,6 +264,7 @@
         properties.setProperty(MAIL_HOST, hostName);
         properties.setProperty(MAIL_DEBUG,new Boolean(this.debug).toString());
+        properties.setProperty(MAIL_PORT, "" + hostPort);
         if (this.authenticator!= null)
         {
@@ -753,5 +760,15 @@
             (InternetAddress[]) aList.toArray(new InternetAddress[0]);
         return ia;
+    }
+
+    /**
+     * Set the port of the outgoing mail server
+     *
+     * @param aHostPort
+     */
+    public void setHostPort(int aHostPort)
+    {
+        this.hostPort = aHostPort;
     }
 }
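A small stand-alone illustration of what the patch changes when the mail session is configured: the int port has to be converted to a String before it can go into java.util.Properties. MailSessionProperties and build() are hypothetical names, and the range check is an addition that the patch itself does not make.

```java
import java.util.Properties;

/** Hypothetical helper mirroring what the patch does when the Session is built. */
public final class MailSessionProperties {

    // Property names taken from the constants in the patch.
    static final String MAIL_HOST = "mail.host";
    static final String MAIL_PORT = "mail.smtp.port";

    private MailSessionProperties() {
    }

    /** Builds the JavaMail properties for the given host and port. */
    public static Properties build(String hostName, int hostPort) {
        // Not in the patch: reject ports outside the valid TCP range.
        if (hostPort <= 0 || hostPort > 65535) {
            throw new IllegalArgumentException("Invalid port: " + hostPort);
        }
        Properties properties = new Properties();
        properties.setProperty(MAIL_HOST, hostName);
        // Properties values must be Strings; the patch writes "" + hostPort,
        // String.valueOf(hostPort) produces the same text.
        properties.setProperty(MAIL_PORT, String.valueOf(hostPort));
        return properties;
    }
}
```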
[lang] Equalator?
I'm writing a few classes that currently implement Comparator, but I really don't care about comparisons - I just want to use an object to test equality, ergo Equalator. Does such an interface exist somewhere in lang? I can't find anything similar.
RE: [lang] Equalator?
That could be one implementation of the Equalator interface. Another trivial implementation might do

    return (a == b);

and be called IdentityEqualator.

The specific case I'm working on is using Metaphone (from codec). The implementation of MetaphoneEqualator looks like this:

    ...
    private Metaphone mEncoder = new Metaphone();
    ...
    public boolean equals(Object a, Object b) {
        try {
            Object encoded0 = mEncoder.encode(a);
            Object encoded1 = mEncoder.encode(b);
            return encoded0.equals(encoded1);
        } catch (EncoderException exception) {
            // an input the encoder cannot handle is treated as "not equal"
            return false;
        }
    }

-----Original Message-----
From: Hookom, Jacob [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 11, 2004 2:47 PM
To: 'Jakarta Commons Developers List'
Subject: RE: [lang] Equalator?

There's this method on Object called equals... I suppose you could write a single object called Equalator that does:

    public boolean equals(Object a, Object b) {
        return a.equals(b);
    }

-----Original Message-----
From: Edelson, Justin [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 11, 2004 1:25 PM
To: Jakarta Commons Developers List
Subject: [lang] Equalator?

I'm writing a few classes that currently implement Comparator, but I really don't care about comparisons - I just want to use an object to test equality, ergo Equalator. Does such an interface exist somewhere in lang (I can't find anything similar).
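A sketch that generalises MetaphoneEqualator so that any String encoder can be plugged in. EncodingEqualator is a hypothetical name, and java.util.function.Function stands in for [codec]'s Encoder interface; with [codec] on the classpath, something like new Metaphone()::metaphone could be passed in. As in the email, a failed encoding (here an IllegalArgumentException) is treated as "not equal".

```java
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical generalisation of MetaphoneEqualator: two values are "equal"
 * when the supplied encoder maps their string forms to the same key.
 */
public final class EncodingEqualator {

    private final Function<String, String> encoder;

    public EncodingEqualator(Function<String, String> encoder) {
        this.encoder = Objects.requireNonNull(encoder, "encoder");
    }

    public boolean equals(Object a, Object b) {
        if (a == null || b == null) {
            return a == b; // only null equals null
        }
        try {
            return Objects.equals(encoder.apply(a.toString()), encoder.apply(b.toString()));
        } catch (IllegalArgumentException e) {
            return false; // as in the email: an encoding failure means "not equal"
        }
    }

    /** The trivial IdentityEqualator from the thread, as a static helper. */
    public static boolean identityEquals(Object a, Object b) {
        return a == b;
    }
}
```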
RE: [digester] mixed content update
Simon -

Sorry I haven't had a chance to respond to your email. I was concentrating more on answering your question about use cases, but it sounds like I don't need to sell this need as much as I thought I did (at least for now).

> If someone (eg Justin) is keen to work on this now, we could potentially get it in the next release. Otherwise I suggest this could go on the to-do list for post-1.6.

I don't know if keen is the right word, but I'm committed to digesting mixed-content XML for an application now, i.e. I need to solve this problem one way or another (or use some other XML ingestion mechanism, which I don't want to do for purely selfish reasons). At this point, I'm planning on an implementation as a subclass. Whether this subclass is accepted and put into the Digester release depends on a variety of factors, but my intention is to develop a solution either way. I have sign-off on contributing modifications back to Apache, so that's not an issue. Of course, I have a high level of respect for the members of this list (not blowing smoke, I swear), so I'm very interested in crafting the mixed-content solution based on any feedback Commons developers may have. Just to be clear, is there a timeframe for 1.6?

> Justin, if you have any arguments to back your original design, please speak up! Or if you are willing to try implementing some other approach that doesn't involve @text patterns, please speak up too.

Let's separate the two issues in my original design: using a special text designator, and the specific designator used. To be honest, @text was really just a placeholder on my end. The only requirement I have for this designator is that it be an illegal XML element name, so that there's no conflict (i.e. if the designator were just text, that would pose an issue if you had an element named text). My core argument in favor of using a specific designator is that it explicitly indicates that the pattern (i.e. /element/@text) uses different functionality than the traditional Digester method. This is a pretty weak argument, I'll admit. I also feel (and can't prove yet) that this method is better performance-wise, because the extra iterations over the list of rules are over a smaller list. As indicated above, I'm very willing to try alternate implementations, including the interface solution you suggested.

> How do people feel about my initial proposed solution to this (as follows)?

My only concern about the interface solution (for lack of a better name) is where you wrote 'for each rule matched by the last call to startElement'. In my original subclassed version, I had a call to super.startElement(), but in order to do what you've described, I think you'd need to replicate all the code in Digester.startElement() in the subclass's startElement() method. Otherwise, the overridden startElement() in the subclass would have to make an extra call to Rules.match(). I was originally worried about having to maintain the subclass's startElement() method to reflect changes in the Digester implementation, hence the call to super.startElement(). Is this too dogmatic? I'm not looking to rehash the cut-and-paste vs. eating-our-own-dog-food discussion recently seen in the context of [lang]. I'll have some time later in the week to take a crack at implementing the interface solution.

Yet a third implementation that I've been thinking about would be to take the new interface and create some additional interfaces around it: MixedContentRules and MixedContentRuleSet. These objects would basically parallel the Rule/Rules/RuleSet interfaces. Within the MixedContentDigester subclass, there'd be a new instance variable called mixedContentRules. In short, the concept is that the classes that implement MixedContentRule would be segregated from the traditional Digester rules. The core reason for this is that I'm concerned about the performance impact of both my original solution and Simon's. With the rules segregated, a match() call on a MixedContentRules object only searches within the mixed-content rules, which should lead to better performance.

> And if you think there are other features that could be added to digester using @-style patterns then that would also be good to mention.

I had worked up a use case for @comment that would allow comments to be digested (imagine a JavaDoc/XDoclet-style application that reads comments out of a Struts config XML file). But then I remembered that SAX ignores comments, so this is a bigger can of worms.

I've gone ahead and submitted my test case to Bugzilla (#28068). I can never remember how Bugzilla reacts to XML submitted in its forms, so I kept that out. I assume that we're all on the same page as to what mixed content means, but I can easily add an example.

Thanks for the interest. I was a bit surprised at first that some were so willing to write off Digester as just for configuration files.

Justin
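A sketch of the third approach, assuming exact-match patterns only (Digester's own Rules implementations also support wildcard patterns). MixedContentRule and MixedContentRules are the names floated in the email, but the methods below are hypothetical. The point is that match() consults only the mixed-content registry, so the ordinary rule list is never scanned for text callbacks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the "segregated rules" idea: mixed-content rules live
 * in their own registry, so match() never scans the ordinary Digester rules.
 * Only exact patterns are supported here (no Digester-style wildcards).
 */
public final class MixedContentRules {

    /** Hypothetical rule type: receives one run of text between child elements. */
    public interface MixedContentRule {
        void text(String pattern, String text);
    }

    private final Map<String, List<MixedContentRule>> rulesByPattern = new HashMap<>();

    public void add(String pattern, MixedContentRule rule) {
        rulesByPattern.computeIfAbsent(pattern, k -> new ArrayList<>()).add(rule);
    }

    /** Only the mixed-content rules are consulted; the lookup is a single hash probe. */
    public List<MixedContentRule> match(String pattern) {
        List<MixedContentRule> matches = rulesByPattern.get(pattern);
        return matches == null
                ? Collections.<MixedContentRule>emptyList()
                : Collections.unmodifiableList(matches);
    }
}
```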
RE: [collections] ListOrderedMap vs LinkedMap
ListOrderedMap cannot be instantiated directly. Per the Javadocs, it's a decorator, not a Map implementation. You get an instance of ListOrderedMap by passing an existing Map to ListOrderedMap.decorate(Map). LinkedMap, however, can be instantiated directly and has constructors similar to those of java.util.HashMap.

Hope this helps.
Justin

-----Original Message-----
From: Torsten Curdt [mailto:[EMAIL PROTECTED]
Sent: Monday, March 29, 2004 7:21 AM
To: [EMAIL PROTECTED]
Subject: [collections] ListOrderedMap vs LinkedMap

Can someone please explain the difference between ListOrderedMap and LinkedMap? I mean: they basically provide the same functionality, right?

cheers
--
Torsten
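A simplified illustration of the decorator distinction: the wrapper below takes any existing Map through a static decorate() factory and only adds key-order bookkeeping on top of it. OrderedMapDecorator and keyList() are hypothetical names, this is not the Commons Collections source, and a real decorator would implement the full Map interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration of the decorator idea behind ListOrderedMap:
 * it wraps an existing Map and records key insertion order in a List.
 */
public final class OrderedMapDecorator<K, V> {

    private final Map<K, V> decorated;
    private final List<K> insertOrder = new ArrayList<>();

    private OrderedMapDecorator(Map<K, V> map) {
        this.decorated = map;
        // Keys already present keep the wrapped map's iteration order.
        insertOrder.addAll(map.keySet());
    }

    /** Factory in the style of decorate(Map); the constructor stays private. */
    public static <K, V> OrderedMapDecorator<K, V> decorate(Map<K, V> map) {
        return new OrderedMapDecorator<>(map);
    }

    public V put(K key, V value) {
        if (!decorated.containsKey(key)) {
            insertOrder.add(key); // only new keys change the order
        }
        return decorated.put(key, value);
    }

    public V get(Object key) {
        return decorated.get(key);
    }

    public V remove(Object key) {
        if (decorated.containsKey(key)) {
            insertOrder.remove(key); // List.remove(Object), not remove(int)
        }
        return decorated.remove(key);
    }

    /** Keys in insertion order, regardless of the wrapped map's own ordering. */
    public List<K> keyList() {
        return Collections.unmodifiableList(insertOrder);
    }
}
```

LinkedMap, by contrast, is a complete Map implementation that you construct directly, so there is no separate backing map to supply.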
RE: [digester] mixed content update
I created a subclass of Digester (MixedContentDigester) to do this. Along with a new rule (see below), it passes my simple test case. Would this be useful code to add? I figured creating a subclass would make this easier to integrate, as it doesn't break anything.

In short, I created a new abstract class called TextRule and a concrete class AddTextRule. When startElement() on the MixedContentDigester is called, before the rules are invoked, a search is done for rules matching match + "/@text" (was it you, Scott, who threw this out on the 2002 thread? I forget, but it looked good to me). The AddTextRule's body() method gets called and the bodyText StringBuffer is emptied. Something similar happens on the call to endElement(). This is probably not the best explanation, but it does work.

The idea of the abstract TextRule class was that you could have different TextRules, not all of which did something like adding. Perhaps it should just be called CallTextMethodRule (a lot of the rule code is closely related to that in CallMethodRule). I did, however, create another rule called AddTrimmedTextRule and was thinking about AddNormalizedTextRule (using JDOM's definition of normalized)...

Justin

-----Original Message-----
From: Scott Sanders [mailto:[EMAIL PROTECTED]
Sent: Tue 3/23/2004 2:23 PM
To: 'Jakarta Commons Developers List'
Cc:
Subject: RE: [digester] mixed content update

Justin,

Digester is not set up to handle mixed content. I would use something else, or modify Digester to do what you want.

Scott (originator of the mixed content thread)

-----Original Message-----
From: Edelson, Justin [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 23, 2004 9:18 AM
To: Jakarta Commons Developers List
Subject: [digester] mixed content update

I'm trying to figure out the best way to digest some XML with mixed content, i.e.

<a>
  <b>
    <c>beginning text <d attr="foo"/> ending text</c>
  </b>
</a>

where it's important for "beginning text" and "ending text" to be treated separately.
I looked through the mailing list archives and found a discussion from early 2002 on this subject. It looks like the net result of that discussion was that, in my example above, the content "beginning text ending text" is made available by using a CallMethodRule. Has there been any subsequent discussion? I got the sense that the decision really was that mixed content wasn't for Digester, in the sense that Digester is targeted at loading configuration files that tend to be either all-attributes or all-body-content (http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED] .apache.orgmsgId=72369). I'll happily give up using Digester to accomplish my mixed-content project and switch to JDOM (or even look at the Avalon Configuration stuff someone mentioned), but I wanted to check with the list before giving up.

Thanks,
Justin
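The flushing behavior described in this thread (empty the accumulated body text at every startElement() and endElement()) can be sketched with the JDK's SAX parser alone. MixedContentSplitter is a hypothetical class, not MixedContentDigester: instead of firing an AddTextRule, it records each run of text under its element path plus "/@text", and it trims and discards whitespace-only runs, which Digester itself does not necessarily do.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical sketch, not Digester code: flushing the body text at every
 * element boundary keeps text before and after a child element separate.
 */
public final class MixedContentSplitter extends DefaultHandler {

    private final StringBuilder bodyText = new StringBuilder();
    private final List<String> path = new ArrayList<>();
    /** Each entry is "pattern/@text=text" for one trimmed, non-blank run of text. */
    private final List<String> segments = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush(); // text seen so far belongs to the parent, before this child
        path.add(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush(); // text after the last child element
        path.remove(path.size() - 1);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        bodyText.append(ch, start, length);
    }

    private void flush() {
        String text = bodyText.toString().trim();
        if (!text.isEmpty()) {
            segments.add(String.join("/", path) + "/@text=" + text);
        }
        bodyText.setLength(0); // empty the buffer, as described in the email
    }

    public static List<String> split(String xml) {
        MixedContentSplitter handler = new MixedContentSplitter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.segments;
    }
}
```

For the example document, split() returns two separate entries for a/b/c/@text, one for "beginning text" and one for "ending text", instead of the single concatenated body that CallMethodRule sees.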
Request for getLog(Object) method
As part of implementing JCL, we need to be able to obtain Log objects based on the referring object, basically using custom Log and Factory implementations that inspect an object to determine the appropriate name to use for logging. I can clarify this further if necessary. In order to do this, I'm proposing these changes:

* add to LogFactory a new overloaded version of the static method getLog that accepts an Object
* add to LogFactory a new overloaded version of the instance method getInstance that accepts an Object
* add to LogFactoryImpl a default implementation of getInstance(Object) that uses the object's class name as the log name

Attached is a diff file representing these changes. This is my first time doing this, so please let me know (in a nice way) if I've done something incorrect. I did not post this directly to Bugzilla, on the theory that there may need to be some discussion of this request.

Justin Edelson
Software Developer
MTVi

object_method.diff
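A sketch of the proposed default naming strategy, using java.util.logging.Logger as a stand-in for a JCL Log so that it runs without commons-logging on the classpath. ObjectLogFactory is a hypothetical name; a custom factory would replace the getClass().getName() step with whatever inspection of the referring object it needs.

```java
import java.util.logging.Logger;

/**
 * Hypothetical sketch with java.util.logging standing in for JCL:
 * getLog(Object) derives the log name from the referring object's class.
 */
public final class ObjectLogFactory {

    private ObjectLogFactory() {
    }

    /** Existing-style lookup by explicit name. */
    public static Logger getLog(String name) {
        return Logger.getLogger(name);
    }

    /** Proposed overload: default naming strategy is the referrer's class name. */
    public static Logger getLog(Object referrer) {
        if (referrer == null) {
            throw new IllegalArgumentException("referrer must not be null");
        }
        return getLog(referrer.getClass().getName());
    }
}
```

Overload resolution happens at compile time, so a String argument still binds to getLog(String) and is used as the log name; only a referrer whose static type is Object (or another non-String type) reaches the new overload. JCL's existing getLog(Class) overload would likewise keep winning for Class arguments.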