[jira] [Resolved] (CODEC-199) Bug in HW rule in Soundex
[ https://issues.apache.org/jira/browse/CODEC-199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb resolved CODEC-199. Resolution: Fixed Fixed by: URL: http://svn.apache.org/viewvc?rev=1789764=rev Log: CODEC-199 Bug in HW rule in Soundex Revert to a fix which does not entail change to public API Modified: commons/proper/codec/trunk/src/changes/changes.xml commons/proper/codec/trunk/src/main/java/org/apache/commons/codec/language/Soundex.java > Bug in HW rule in Soundex > - > > Key: CODEC-199 > URL: https://issues.apache.org/jira/browse/CODEC-199 > Project: Commons Codec > Issue Type: Bug >Affects Versions: 1.10 >Reporter: Yossi Tamari > Fix For: 1.11 > > Attachments: better.patch, soundex.patch > > > The Soundex algorithm says that if two characters that map to the same code > are separated by H or W, the second one is not encoded. > However, in the implementation (in Soundex.getMappingCode() line 191), a > character that is preceded by two characters that are either H or W, is not > encoded, regardless of what the last consonant was. > Source: http://en.wikipedia.org/wiki/Soundex#American_Soundex -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CODEC-199) Bug in HW rule in Soundex
[ https://issues.apache.org/jira/browse/CODEC-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951828#comment-15951828 ]
Sebb commented on CODEC-199:
I think it makes sense to split this issue into two parts.
1) Fixing the bug in the American Soundex algorithm implementation. That is the scope of this issue, CODEC-199.
2) Enhancing the class to provide support for other variants, i.e. Simplified Soundex and the Genealogy variant (whatever its name is). That will now be dealt with under CODEC-233.
[jira] [Created] (CODEC-233) Soundex should support more algorithm variants
Sebb created CODEC-233:
Summary: Soundex should support more algorithm variants
Key: CODEC-233
URL: https://issues.apache.org/jira/browse/CODEC-233
Project: Commons Codec
Issue Type: New Feature
Reporter: Sebb

The existing Soundex class was designed around the American Soundex algorithm. Whilst it offers some flexibility in the mapping of letters to Soundex numbers, the list of 'silent' letters, H and W, is built into the code. There is no provision for changing the set of silent (ignored) letters. There is also no way to change the designation of HW from silent to consonant separator (i.e. code 0), because code 0 is how HW are already encoded in the public API.

To fix this, the mapping can be enhanced to support an extra code for 'silent' letters. A mapping which includes such a code did not previously have defined behaviour, so it can be treated differently: there is no need to assume HW are silent. This allows alternative silent letters to be defined. It can also be used to map HW as code '0', as long as the mapping contains at least one 'silent' code. If there are no actual silent letters in the algorithm variant, then the code can be appended to the end of the mapping. This will not affect processing, as only letters A-Z are passed to the method.

An alternative would be to introduce yet another code as an alias for '0', and only treat HW as silent if they have code '0'.
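The proposal above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the Commons Codec implementation: the class name `MappedSoundexSketch`, the `'#'` marker (borrowed from a remark elsewhere in the CODEC-199 discussion) and the "marker appended after Z" convention are all hypothetical. A mapping without any marker keeps the old behaviour (H and W silent); a mapping with a marker treats only the marked letters as silent.

```java
import java.util.Locale;

/** Illustrative only: not the Commons Codec Soundex class. */
final class MappedSoundexSketch {
    /** Hypothetical marker for letters that are ignored entirely. */
    static final char SILENT = '#';

    private final String mapping;       // codes for A..Z, optionally followed by a marker
    private final boolean legacyHwRule; // no marker present: keep H and W silent as before

    MappedSoundexSketch(String mapping) {
        if (mapping.length() < 26) {
            throw new IllegalArgumentException("mapping must cover A-Z");
        }
        this.mapping = mapping;
        this.legacyHwRule = mapping.indexOf(SILENT) < 0;
    }

    private boolean isSilent(char letter) {
        if (legacyHwRule) {
            return letter == 'H' || letter == 'W';
        }
        return mapping.charAt(letter - 'A') == SILENT;
    }

    /** Assumes the name starts with an ASCII letter. */
    String encode(String name) {
        String upper = name.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder().append(upper.charAt(0));
        char last = mapping.charAt(upper.charAt(0) - 'A');
        for (int i = 1; i < upper.length() && out.length() < 4; i++) {
            char letter = upper.charAt(i);
            if (letter < 'A' || letter > 'Z' || isSilent(letter)) {
                continue; // ignored letters never separate equal codes
            }
            char code = mapping.charAt(letter - 'A');
            if (code != '0' && code != last) {
                out.append(code);
            }
            last = code; // a '0' here acts as a separator
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes to letter + three digits
        }
        return out.toString();
    }
}
```

With the usual mapping `"01230120022455012623010202"` the sketch encodes ASHCRAFT as A261 and Lippmann as L155. Appending a marker (`"01230120022455012623010202#"`) turns H and W into separators, giving A226 for ASHCRAFT, and marking the vowels, H, W and Y as silent (`"#123#12##22455#12623#1#2#2"`) gives L150 for Lippmann.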
[jira] [Commented] (LANG-1167) Add null filter to ReflectionToStringBuilder
[ https://issues.apache.org/jira/browse/LANG-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951546#comment-15951546 ]
ASF GitHub Bot commented on LANG-1167:
Github user chtompki commented on the issue: https://github.com/apache/commons-lang/pull/259
@PascalSchumacher - Do you think we should take this approach, or the approach you went after in [LANG-1164](https://issues.apache.org/jira/browse/LANG-1164)?
> Add null filter to ReflectionToStringBuilder
>
> Key: LANG-1167
> URL: https://issues.apache.org/jira/browse/LANG-1167
> Project: Commons Lang
> Issue Type: Improvement
> Reporter: Gregory Bonk
> Assignee: Rob Tompkins
>
> I know I can filter out class level fields with accept but it would be nice if there could be an additional configuration where if a field's value is null then it would be skipped.
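To make the requested behaviour concrete, here is a minimal sketch of a reflective toString that can skip null-valued fields. It deliberately uses only `java.lang.reflect` rather than the Commons Lang API; the class `NullSkippingToString`, the method `describe`, the `skipNulls` flag and the `Person` example are all made up for illustration and say nothing about how the pull request implements the feature.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

/** Illustrative only: a hand-rolled reflective toString with a null filter. */
final class NullSkippingToString {

    /** Renders the declared instance fields of target, optionally omitting nulls. */
    static String describe(Object target, boolean skipNulls) {
        Class<?> type = target.getClass();
        StringJoiner joined = new StringJoiner(",", type.getSimpleName() + "[", "]");
        for (Field field : type.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // only real instance fields are of interest
            }
            field.setAccessible(true);
            Object value;
            try {
                value = field.get(target);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (skipNulls && value == null) {
                continue; // the "null filter": drop fields whose value is null
            }
            joined.add(field.getName() + "=" + value);
        }
        return joined.toString();
    }

    /** Hypothetical sample type with one populated and one null field. */
    static final class Person {
        String name = "Ada";
        String nickname; // left null on purpose
    }
}
```

Calling `NullSkippingToString.describe(new NullSkippingToString.Person(), true)` yields `Person[name=Ada]`; with `skipNulls` set to `false` the `nickname=null` entry is included as well.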
[GitHub] commons-lang issue #259: LANG-1167: Add null filter to ReflectionToStringBui...
Github user chtompki commented on the issue: https://github.com/apache/commons-lang/pull/259
@PascalSchumacher - Do you think we should take this approach, or the approach you went after in [LANG-1164](https://issues.apache.org/jira/browse/LANG-1164)?
[jira] [Commented] (CODEC-199) Bug in HW rule in Soundex
[ https://issues.apache.org/jira/browse/CODEC-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951090#comment-15951090 ]
Sebb commented on CODEC-199:
The penwith URL is very useful. Thanks. It states that HW were treated as vowels in the original 'Simplified Soundex'. This difference is not described in the Wikipedia article, and is regarded as erroneous on the thoughtco page. The thoughtco page is unhelpful in other ways, e.g. it uses SUTTON as an example of a name starting with a double letter! [A name like LLOYD would be OK.]
I think there is a solution which will allow for the 'Simplified Soundex' variant as well as the current American Soundex, without compromising existing behaviour or needing to change the public constant. I hope to update the code in the next few days once it has been tested further.
==
It's wasteful to implement features that are not going to be used, and they increase the maintenance burden. More importantly, without usage examples, creating valid test cases is error-prone. That is why I have been stressing the need for use cases.
[jira] [Commented] (LANG-1164) allow ToStringStyle to omitNulls
[ https://issues.apache.org/jira/browse/LANG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951037#comment-15951037 ]
Rob Tompkins commented on LANG-1164:
Hey Pascal -- We've got an open PR for 1167, which is the more global configuration. Do you think it is better to handle this at this level and then use the ToStringStyle changes to effectively manage 1167, or simply to go with the global configuration?
> allow ToStringStyle to omitNulls
>
> Key: LANG-1164
> URL: https://issues.apache.org/jira/browse/LANG-1164
> Project: Commons Lang
> Issue Type: Improvement
> Components: lang.builder.*
> Reporter: Shaun A Elliott
[jira] [Updated] (LANG-1167) Add null filter to ReflectionToStringBuilder
[ https://issues.apache.org/jira/browse/LANG-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rob Tompkins updated LANG-1167:
Assignee: Rob Tompkins
[jira] [Commented] (NUMBERS-16) Set tolerances in ComplexTest to zero?
[ https://issues.apache.org/jira/browse/NUMBERS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950762#comment-15950762 ]
Gilles commented on NUMBERS-16:
bq. Any reason not to set these tolerances to zero?
Assuming that the internal representation consists of the real and imaginary parts, I'd answer "no".
> Set tolerances in ComplexTest to zero?
>
> Key: NUMBERS-16
> URL: https://issues.apache.org/jira/browse/NUMBERS-16
> Project: Commons Numbers
> Issue Type: Improvement
> Reporter: Eric Barnhill
> Priority: Minor
>
> Considering the following JUnit test:
> public void testConstructor() {
>     Complex z = new Complex(3.0, 4.0);
>     Assert.assertEquals(3.0, z.getReal(), 1.0e-5);
>     Assert.assertEquals(4.0, z.getImaginary(), 1.0e-5);
> }
> That tolerance seems pretty high to me -- I sure would not want to work with a method that was not more stable than that, nor can I see a reason that the numbers would not be exactly the same to the last place.
> Any reason not to set these tolerances to zero?
[jira] [Created] (NUMBERS-16) Set tolerances in ComplexTest to zero?
Eric Barnhill created NUMBERS-16:
Summary: Set tolerances in ComplexTest to zero?
Key: NUMBERS-16
URL: https://issues.apache.org/jira/browse/NUMBERS-16
Project: Commons Numbers
Issue Type: Improvement
Reporter: Eric Barnhill
Priority: Minor

Considering the following JUnit test:
public void testConstructor() {
    Complex z = new Complex(3.0, 4.0);
    Assert.assertEquals(3.0, z.getReal(), 1.0e-5);
    Assert.assertEquals(4.0, z.getImaginary(), 1.0e-5);
}
That tolerance seems pretty high to me -- I sure would not want to work with a method that was not more stable than that, nor can I see a reason that the numbers would not be exactly the same to the last place.
Any reason not to set these tolerances to zero?
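The reasoning behind a zero tolerance can be checked with a small stand-alone sketch. It assumes, as the discussion does, that the class simply stores the two doubles it is given; the nested `Complex` below is a hypothetical stand-in, not the Commons Numbers class, and `roundTripsExactly` is a made-up helper.

```java
/** Illustrative only: shows that storing and returning a double is exact. */
final class ExactRoundTrip {

    /** Hypothetical stand-in that keeps the real and imaginary parts as plain fields. */
    static final class Complex {
        private final double real;
        private final double imaginary;

        Complex(double real, double imaginary) {
            this.real = real;
            this.imaginary = imaginary;
        }

        double getReal() {
            return real;
        }

        double getImaginary() {
            return imaginary;
        }
    }

    /** True if both getters return bit-for-bit the values passed to the constructor. */
    static boolean roundTripsExactly(double re, double im) {
        Complex z = new Complex(re, im);
        return Double.compare(z.getReal(), re) == 0
            && Double.compare(z.getImaginary(), im) == 0;
    }
}
```

Because no arithmetic happens between the constructor and the getters, the check holds even for values such as 0.1 that have no exact binary representation, so a delta of `0.0` in the assertions would be sufficient under this assumption.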
[jira] [Closed] (NUMBERS-14) call to hashCode() in Complex
[ https://issues.apache.org/jira/browse/NUMBERS-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Barnhill closed NUMBERS-14.
Resolution: Fixed
> call to hashCode() in Complex
>
> Key: NUMBERS-14
> URL: https://issues.apache.org/jira/browse/NUMBERS-14
> Project: Commons Numbers
> Issue Type: Improvement
> Affects Versions: 1.0
> Reporter: Eric Barnhill
> Priority: Trivial
> Labels: easyfix
>
> There is a call in Complex() to hashCode(). This calls a method Precision.hash(). Looks like Precision.hash() is gone. I want to confirm that this method is now gone from commons-numbers Precision() and consequently I should remove this method.
[jira] [Commented] (NUMBERS-14) call to hashCode() in Complex
[ https://issues.apache.org/jira/browse/NUMBERS-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950746#comment-15950746 ]
Eric Barnhill commented on NUMBERS-14:
All right, I'll add the private method then, as per the original suggestion.
[jira] [Commented] (NUMBERS-14) call to hashCode() in Complex
[ https://issues.apache.org/jira/browse/NUMBERS-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950685#comment-15950685 ]
Gilles commented on NUMBERS-14:
It's not the same method. To use the one you refer to, one needs to instantiate a {{Double}}. However small, there will be a performance penalty (that's the reason a static method was defined in the CM utilities).
[jira] [Commented] (NUMBERS-14) call to hashCode() in Complex
[ https://issues.apache.org/jira/browse/NUMBERS-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950673#comment-15950673 ]
Eric Barnhill commented on NUMBERS-14:
I think I see it here in [Java 7|https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html#hashCode()].
[jira] [Commented] (NUMBERS-14) call to hashCode() in Complex
[ https://issues.apache.org/jira/browse/NUMBERS-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950669#comment-15950669 ]
Gilles commented on NUMBERS-14:
I guess you mean [Double.hashCode(double)|https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#hashCode-double-], which exists only since Java 8. If so, we should just be sure that {{Commons Numbers}} will not target earlier versions of Java; something to be acknowledged on the "dev" ML.
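The three options being weighed here can be put side by side. The sketch below is only an illustration: the class `DoubleHashDemo` and its method names are hypothetical, and the hand-written variant simply follows the formula documented for `Double.hashCode()`, so it also compiles on Java versions before 8.

```java
/** Illustrative only: three ways to hash a primitive double. */
final class DoubleHashDemo {

    /** Works on any Java version; no Double instance is created. */
    static int manualHash(double value) {
        long bits = Double.doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }

    /** Instance method, available before Java 8, but needs a Double object. */
    static int boxedHash(double value) {
        return Double.valueOf(value).hashCode();
    }

    /** Static method added in Java 8; no boxing involved. */
    static int staticHash(double value) {
        return Double.hashCode(value);
    }
}
```

All three return the same value for a given input, so the choice only affects the minimum Java version and whether a `Double` object is allocated.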
[jira] [Commented] (NUMBERS-14) call to hashCode() in Complex
[ https://issues.apache.org/jira/browse/NUMBERS-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950657#comment-15950657 ]
Eric Barnhill commented on NUMBERS-14:
I'll just call Double.hashCode() instead. Barring any objections I will close the ticket.
[jira] [Commented] (CODEC-199) Bug in HW rule in Soundex
[ https://issues.apache.org/jira/browse/CODEC-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950537#comment-15950537 ]
Yossi Tamari commented on CODEC-199:
Examples:
[https://www.thoughtco.com/soundex-explained-us-census-1421773]
{quote}
For a period of time, especially between for the 1880, 1900 and 1910 census, Soundex coders sometimes erroneously treated H and W as separators, like the vowels, and assigned a code to both the S and C. This would make the code for ASHCRAFT A226, instead of A261. Basically, any surname with the letter H or W as a separator between adjacent letters having the same code should be coded both ways.
{quote}
[http://west-penwith.org.uk/misc/soundex.htm]
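The two codings of ASHCRAFT quoted above can be reproduced with a compact sketch. It is not the Commons Codec code: the class `HwRuleDemo` and the `hwSeparates` flag are hypothetical, and the digit table is the usual American Soundex assignment with '0' for letters that receive no digit.

```java
import java.util.Locale;

/** Illustrative only: American Soundex with a switch for the H/W treatment. */
final class HwRuleDemo {
    // Digits for A..Z; '0' marks letters that are never written to the output.
    private static final String CODES = "01230120022455012623010202";

    /**
     * Encodes a name that starts with an ASCII letter.
     * hwSeparates = false: H and W are skipped, so equal codes around them merge.
     * hwSeparates = true:  H and W behave like vowels and split equal codes.
     */
    static String encode(String name, boolean hwSeparates) {
        String upper = name.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder().append(upper.charAt(0));
        char last = CODES.charAt(upper.charAt(0) - 'A');
        for (int i = 1; i < upper.length() && out.length() < 4; i++) {
            char letter = upper.charAt(i);
            if (letter < 'A' || letter > 'Z') {
                continue;
            }
            if (!hwSeparates && (letter == 'H' || letter == 'W')) {
                continue; // silent: the previous code stays in effect
            }
            char code = CODES.charAt(letter - 'A');
            if (code != '0' && code != last) {
                out.append(code);
            }
            last = code;
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }
}
```

`HwRuleDemo.encode("ASHCRAFT", false)` gives A261 (S and C merge across the H), while `HwRuleDemo.encode("ASHCRAFT", true)` gives A226, matching the two codes mentioned in the quote.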
[jira] [Commented] (CODEC-199) Bug in HW rule in Soundex
[ https://issues.apache.org/jira/browse/CODEC-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950496#comment-15950496 ]
Sebb commented on CODEC-199:
bq. this is exactly a use case for where H and W are treated as vowels
No, it's not. Vowels have the code '0' and are used to separate consonants with the same non-zero code. In this case, vowels are completely ignored, i.e. are treated like HW. Try the following test:
{code}
Assert.assertEquals("L150", s.encode("Lippmann"));
{code}
This fails with the current code (which generates "L155") unless you set A to behave like HW, i.e. vowels need to be set to '#' (silent). Try it and see.
Step 3 of the Wikipedia definition says "two letters with the same number separated by 'h' or 'w' are coded as a single number, whereas such letters separated by a vowel are coded twice". However, the Genealogy definition implies that such letters are coded as a single number for HW *and* the vowels. The output from the Wikipedia definition allows repeated digits. The Genealogy definition explicitly does not. That means it does not have any Wiki-style vowels; for the Genealogy definition, vowels + HW are all silent.
I have yet to see a definition that requires HW to be treated as a vowel rather than silent (or a consonant). If you find any examples, please provide links (and test cases if possible).