[jira] [Resolved] (CODEC-199) Bug in HW rule in Soundex

2017-03-31 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/CODEC-199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb resolved CODEC-199.

Resolution: Fixed

Fixed by:

URL: http://svn.apache.org/viewvc?rev=1789764&view=rev
Log:
CODEC-199 Bug in HW rule in Soundex
Revert to a fix which does not entail change to public API

Modified:
commons/proper/codec/trunk/src/changes/changes.xml

commons/proper/codec/trunk/src/main/java/org/apache/commons/codec/language/Soundex.java


> Bug in HW rule in Soundex
> -
>
> Key: CODEC-199
> URL: https://issues.apache.org/jira/browse/CODEC-199
> Project: Commons Codec
>  Issue Type: Bug
>Affects Versions: 1.10
>Reporter: Yossi Tamari
> Fix For: 1.11
>
> Attachments: better.patch, soundex.patch
>
>
> The Soundex algorithm says that if two characters that map to the same code 
> are separated by H or W, the second one is not encoded.
> However, in the implementation (in Soundex.getMappingCode() line 191), a 
> character that is preceded by two characters that are either H or W, is not 
> encoded, regardless of what the last consonant was.
> Source: http://en.wikipedia.org/wiki/Soundex#American_Soundex
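
For reference, the rule quoted above can be sketched in a few lines of Java.
This is a minimal illustration only, not the Commons Codec implementation: the
class name AmericanSoundexSketch and its encode method are invented for this
example, and the letter-to-digit table is the usual American Soundex one.

```java
import java.util.Locale;

/** Minimal American Soundex sketch; hypothetical, not the Commons Codec code. */
final class AmericanSoundexSketch {

    // Codes for A..Z; '0' marks letters that carry no digit (vowels, H, W, Y).
    private static final String MAPPING = "01230120022455012623010202";

    static String encode(String name) {
        String s = name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder(4).append(s.charAt(0));
        // Code of the most recent letter that was not H or W.
        char last = MAPPING.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char ch = s.charAt(i);
            if (ch == 'H' || ch == 'W') {
                continue; // H and W are transparent: 'last' is left untouched
            }
            char code = MAPPING.charAt(ch - 'A');
            if (code != '0' && code != last) {
                out.append(code);
            }
            // A vowel sets 'last' to '0', so the same code after a vowel is kept.
            last = code;
        }
        while (out.length() < 4) {
            out.append('0'); // pad short results with zeros
        }
        return out.toString();
    }
}
```

With this sketch, S and C in "Ashcraft" collapse into one digit because only H
lies between them, while the D in "yhwdyt" is still encoded even though it
follows two H/W letters, since no earlier consonant shares its code.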



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CODEC-199) Bug in HW rule in Soundex

2017-03-31 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951828#comment-15951828
 ] 

Sebb commented on CODEC-199:


I think it makes sense to split this issue into two parts.

1) fixing the bug in the American Soundex algorithm implementation
That is the scope of this issue, CODEC-199

2) enhancing the class to provide support for other variants, i.e. Simplified 
Soundex and the Genealogy variant (whatever its name is).
That will now be dealt with under CODEC-233.



[jira] [Created] (CODEC-233) Soundex should support more algorithm variants

2017-03-31 Thread Sebb (JIRA)
Sebb created CODEC-233:
--

 Summary: Soundex should support more algorithm variants
 Key: CODEC-233
 URL: https://issues.apache.org/jira/browse/CODEC-233
 Project: Commons Codec
  Issue Type: New Feature
Reporter: Sebb


The existing Soundex class was designed around the American Soundex algorithm.

Whilst it offers some flexibility with the mapping of letters to Soundex 
numbers, the list of the 'silent' letters H and W is built into the code. 
There is no provision for changing the set of silent (ignored) letters.

There is also no way to change the designation of HW from silent into consonant 
separator - i.e. code 0 - because that is how HW are currently encoded in the 
public API.

To fix this, the mapping can be enhanced to support an extra code for 'silent' 
letters.

A mapping which includes such a code did not have defined behaviour previously, 
so can be treated differently - there is no need to assume HW are silent.

This allows for the definition of alternative silent letters.

It can also be used to map HW as code '0' - as long as there is at least one 
'silent' code. 

If there are no actual silent letters in the algorithm variant, then the code 
can be appended to the end of the mapping. This will not affect processing as 
only letters A-Z are passed to the method. 

An alternative would be to introduce yet another code as an alias for '0', and 
only treat HW as silent if they have code '0'.
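
A rough sketch of the first proposal (an extra mapping code for 'silent'
letters) might look like the following. Everything here is an assumption made
for illustration: the class ConfigurableSoundexSketch, the choice of '-' as the
silent marker, and the three mapping strings are not taken from the existing
code. A mapping without the marker keeps today's behaviour (H and W silent); a
mapping with at least one marker treats only the marked letters as silent.

```java
import java.util.Locale;

/** Hypothetical mapping-driven Soundex sketch with an explicit 'silent' code. */
final class ConfigurableSoundexSketch {

    static final char SILENT = '-'; // assumed marker; any unused character would do

    // No marker: legacy behaviour, H and W are implicitly silent.
    static final String AMERICAN = "01230120022455012623010202";
    // Marker appended after the 26 letters: nothing is silent, H and W act as code 0.
    static final String HW_AS_SEPARATORS = "01230120022455012623010202-";
    // Vowels, H, W and Y all marked silent.
    static final String VOWELS_SILENT = "-123-12--22455-12623-1-2-2";

    private final String mapping;
    private final boolean legacyHwSilent;

    ConfigurableSoundexSketch(String mapping) {
        if (mapping.length() < 26) {
            throw new IllegalArgumentException("mapping must cover A-Z");
        }
        this.mapping = mapping;
        this.legacyHwSilent = mapping.indexOf(SILENT) < 0;
    }

    private boolean isSilent(char ch) {
        return legacyHwSilent ? (ch == 'H' || ch == 'W')
                              : mapping.charAt(ch - 'A') == SILENT;
    }

    String encode(String name) {
        String s = name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        char first = s.charAt(0);
        StringBuilder out = new StringBuilder(4).append(first);
        char last = isSilent(first) ? '0' : mapping.charAt(first - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char ch = s.charAt(i);
            if (isSilent(ch)) {
                continue; // silent letters never separate equal codes
            }
            char code = mapping.charAt(ch - 'A');
            if (code != '0' && code != last) {
                out.append(code);
            }
            last = code; // code '0' acts as a separator
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }
}
```

Because only letters A-Z are ever looked up, the 27th character in
HW_AS_SEPARATORS is never used as a code; it only signals that the mapping
opts out of the built-in H/W handling.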





[jira] [Commented] (LANG-1167) Add null filter to ReflectionToStringBuilder

2017-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LANG-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951546#comment-15951546
 ] 

ASF GitHub Bot commented on LANG-1167:
--

Github user chtompki commented on the issue:

https://github.com/apache/commons-lang/pull/259
  
@PascalSchumacher - Do you think we should take this approach, or the 
approach you went after in 
[LANG-1164](https://issues.apache.org/jira/browse/LANG-1164)?


> Add null filter to ReflectionToStringBuilder
> 
>
> Key: LANG-1167
> URL: https://issues.apache.org/jira/browse/LANG-1167
> Project: Commons Lang
>  Issue Type: Improvement
>Reporter: Gregory Bonk
>Assignee: Rob Tompkins
>
> I know I can filter out class level fields with accept but it would be nice 
> if there could be an additional configuration where if a field's value is 
> null then it would be skipped.
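
To make the request concrete, here is a small stand-alone sketch of the idea
using plain java.lang.reflect. It does not use or reproduce the Commons Lang
API: NullSkippingToString, its excludeNulls flag and the Person class are all
hypothetical names chosen for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

/** Hypothetical reflective toString that can leave out null-valued fields. */
final class NullSkippingToString {

    static String toString(Object obj, boolean excludeNulls) {
        StringJoiner joiner =
                new StringJoiner(",", obj.getClass().getSimpleName() + "[", "]");
        for (Field field : obj.getClass().getDeclaredFields()) {
            // Skip compiler-generated and static fields; only instance state is printed.
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            try {
                field.setAccessible(true);
                Object value = field.get(obj);
                if (excludeNulls && value == null) {
                    continue; // the requested null filter
                }
                joiner.add(field.getName() + "=" + value);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return joiner.toString();
    }

    /** Sample type with one null field. */
    static final class Person {
        private final String name = "Ada";
        private final String nickname = null;
        private final int age = 36;
    }
}
```

Calling NullSkippingToString.toString(new NullSkippingToString.Person(), true)
leaves the nickname field out of the result, whereas passing false prints it
as nickname=null.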





[GitHub] commons-lang issue #259: LANG-1167: Add null filter to ReflectionToStringBui...

2017-03-31 Thread chtompki
Github user chtompki commented on the issue:

https://github.com/apache/commons-lang/pull/259
  
@PascalSchumacher - Do you think we should take this approach, or the 
approach you went after in 
[LANG-1164](https://issues.apache.org/jira/browse/LANG-1164)?




[jira] [Commented] (CODEC-199) Bug in HW rule in Soundex

2017-03-31 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951090#comment-15951090
 ] 

Sebb commented on CODEC-199:


The penwith URL is very useful. Thanks.
It states that HW were treated as vowels in the original 'Simplified Soundex'.

This difference is not described in the Wikipedia article, and is regarded as 
erroneous in the thoughtco page.
The thoughtco page is unhelpful in other ways, e.g. it uses SUTTON as an 
example of a name starting with a double letter! [A name like LLOYD would be OK]

I think there is a solution which will allow for the 'Simplified Soundex' 
variant as well as the current American Soundex - without compromising existing 
behaviour or needing to change the public constant. I hope to update the code 
in the next few days once it has been tested further.

==

Implementing features that are not going to be used is wasteful, and it 
increases the maintenance burden.
More importantly, without usage examples, creating valid test cases is 
error-prone.
That is why I have been stressing the need for use cases.



[jira] [Commented] (LANG-1164) allow ToStringStyle to omitNulls

2017-03-31 Thread Rob Tompkins (JIRA)

[ 
https://issues.apache.org/jira/browse/LANG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951037#comment-15951037
 ] 

Rob Tompkins commented on LANG-1164:


Hey Pascal -- We've got an open PR for 1167, which is the more global 
configuration. Do you think it is better to handle this at the ToStringStyle 
level and then use those changes to effectively manage 1167, or simply to go 
with the global configuration?

> allow ToStringStyle to omitNulls
> 
>
> Key: LANG-1164
> URL: https://issues.apache.org/jira/browse/LANG-1164
> Project: Commons Lang
>  Issue Type: Improvement
>  Components: lang.builder.*
>Reporter: Shaun A Elliott
>






[jira] [Updated] (LANG-1167) Add null filter to ReflectionToStringBuilder

2017-03-31 Thread Rob Tompkins (JIRA)

 [ 
https://issues.apache.org/jira/browse/LANG-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Tompkins updated LANG-1167:
---
Assignee: Rob Tompkins



[jira] [Commented] (NUMBERS-16) Set tolerances in ComplexTest to zero?

2017-03-31 Thread Gilles (JIRA)

[ 
https://issues.apache.org/jira/browse/NUMBERS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950762#comment-15950762
 ] 

Gilles commented on NUMBERS-16:
---

bq. Any reason not to set these tolerances to zero?

Assuming that the internal representation is comprised of the real and 
imaginary parts, I'd answer "no".


> Set tolerances in ComplexTest to zero?
> --
>
> Key: NUMBERS-16
> URL: https://issues.apache.org/jira/browse/NUMBERS-16
> Project: Commons Numbers
>  Issue Type: Improvement
>Reporter: Eric Barnhill
>Priority: Minor
>
> Considering the following JUnit test:
> public void testConstructor() {
>     Complex z = new Complex(3.0, 4.0);
>     Assert.assertEquals(3.0, z.getReal(), 1.0e-5);
>     Assert.assertEquals(4.0, z.getImaginary(), 1.0e-5);
> }
> That tolerance seems pretty high to me -- I sure would not want to work with 
> a method that was less stable than that, nor can I see a reason that the 
> numbers would not be exactly the same to the last place.
> Any reason not to set these tolerances to zero?





[jira] [Created] (NUMBERS-16) Set tolerances in ComplexTest to zero?

2017-03-31 Thread Eric Barnhill (JIRA)
Eric Barnhill created NUMBERS-16:


 Summary: Set tolerances in ComplexTest to zero?
 Key: NUMBERS-16
 URL: https://issues.apache.org/jira/browse/NUMBERS-16
 Project: Commons Numbers
  Issue Type: Improvement
Reporter: Eric Barnhill
Priority: Minor


Considering the following JUnit test:

public void testConstructor() {
    Complex z = new Complex(3.0, 4.0);
    Assert.assertEquals(3.0, z.getReal(), 1.0e-5);
    Assert.assertEquals(4.0, z.getImaginary(), 1.0e-5);
}

That tolerance seems pretty high to me -- I sure would not want to work with a 
method that was less stable than that, nor can I see a reason that the numbers 
would not be exactly the same to the last place.

Any reason not to set these tolerances to zero?
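
As a quick illustration of why a zero tolerance is safe for a plain
constructor/getter round trip but not for computed values, consider the sketch
below. SimpleComplex is a hypothetical stand-in that stores its two parts
unchanged (it is not the Commons Numbers class), and equalsWithin is only an
approximation of how a delta-based double comparison behaves.

```java
/** Hypothetical sketch: exact round trip of stored doubles vs. computed values. */
final class ZeroToleranceSketch {

    /** Stand-in for a complex type that stores its parts without modification. */
    static final class SimpleComplex {
        private final double real;
        private final double imaginary;

        SimpleComplex(double real, double imaginary) {
            this.real = real;
            this.imaginary = imaginary;
        }

        double getReal() {
            return real;
        }

        double getImaginary() {
            return imaginary;
        }
    }

    /** True if the values are identical or differ by at most delta. */
    static boolean equalsWithin(double expected, double actual, double delta) {
        return Double.compare(expected, actual) == 0
                || Math.abs(expected - actual) <= delta;
    }

    public static void main(String[] args) {
        SimpleComplex z = new SimpleComplex(3.0, 4.0);
        // Stored values come back bit-for-bit, so a zero delta passes.
        System.out.println(equalsWithin(3.0, z.getReal(), 0.0));
        System.out.println(equalsWithin(4.0, z.getImaginary(), 0.0));
        // A computed value can differ in the last place, so a zero delta fails here.
        System.out.println(equalsWithin(0.3, 0.1 + 0.2, 0.0));
    }
}
```

The first two comparisons succeed with a delta of zero because nothing is
computed; the third does not, which is the situation where a non-zero
tolerance is actually needed.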





[jira] [Closed] (NUMBERS-14) call to hashCode() in Complex

2017-03-31 Thread Eric Barnhill (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUMBERS-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Barnhill closed NUMBERS-14.

Resolution: Fixed

> call to hashCode() in Complex
> -
>
> Key: NUMBERS-14
> URL: https://issues.apache.org/jira/browse/NUMBERS-14
> Project: Commons Numbers
>  Issue Type: Improvement
>Affects Versions: 1.0
>Reporter: Eric Barnhill
>Priority: Trivial
>  Labels: easyfix
>
> There is a call in Complex() to hashCode(). This calls a method 
> Precision.hash(). It looks like Precision.hash() is gone. I want to confirm 
> that this method is now gone from commons-numbers Precision() and 
> consequently I should remove this method.





[jira] [Commented] (NUMBERS-14) call to hashCode() in Complex

2017-03-31 Thread Eric Barnhill (JIRA)

[ 
https://issues.apache.org/jira/browse/NUMBERS-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950746#comment-15950746
 ] 

Eric Barnhill commented on NUMBERS-14:
--

All right, I will add the private method then, as per the original suggestion.



[jira] [Commented] (NUMBERS-14) call to hashCode() in Complex

2017-03-31 Thread Gilles (JIRA)

[ 
https://issues.apache.org/jira/browse/NUMBERS-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950685#comment-15950685
 ] 

Gilles commented on NUMBERS-14:
---

It's not the same method.  To use the one you refer to, one needs to 
instantiate a {{Double}}.  However small, there will be a performance penalty 
(that's the reason a static method was defined in the CM utilities).
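
For illustration, a static helper of the kind described could look like the
sketch below. DoubleHashSketch and its hash method are hypothetical names, not
the CM utility itself; the formula is the one documented for Double.hashCode(),
so the result matches the boxed call without allocating a Double.

```java
/** Hypothetical static hash for a primitive double, avoiding a Double instance. */
final class DoubleHashSketch {

    /** Returns the same value as Double.valueOf(value).hashCode(). */
    static int hash(double value) {
        long bits = Double.doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        double[] samples = {0.0, -0.0, 3.5, Double.NaN, Double.POSITIVE_INFINITY};
        for (double d : samples) {
            // Both columns print the same number for every sample.
            System.out.println(hash(d) + " " + Double.valueOf(d).hashCode());
        }
    }
}
```

On Java 8 and later the static Double.hashCode(double) gives the same result,
so a private helper like this is only needed when targeting older versions.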



[jira] [Commented] (NUMBERS-14) call to hashCode() in Complex

2017-03-31 Thread Eric Barnhill (JIRA)

[ 
https://issues.apache.org/jira/browse/NUMBERS-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950673#comment-15950673
 ] 

Eric Barnhill commented on NUMBERS-14:
--

I think I see it here in [Java 
7|https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html#hashCode()].



[jira] [Commented] (NUMBERS-14) call to hashCode() in Complex

2017-03-31 Thread Gilles (JIRA)

[ 
https://issues.apache.org/jira/browse/NUMBERS-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950669#comment-15950669
 ] 

Gilles commented on NUMBERS-14:
---

I guess you mean 
[Double.hashCode(double)|https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#hashCode-double-],
 which exists only since Java 8.
If so, we should just be sure that {{Commons Numbers}} will not target earlier 
versions of Java; something to be acknowledged on the "dev" ML.



[jira] [Commented] (NUMBERS-14) call to hashCode() in Complex

2017-03-31 Thread Eric Barnhill (JIRA)

[ 
https://issues.apache.org/jira/browse/NUMBERS-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950657#comment-15950657
 ] 

Eric Barnhill commented on NUMBERS-14:
--

I just call Double.hashCode() instead. Barring any objections I will close the 
ticket.



[jira] [Commented] (CODEC-199) Bug in HW rule in Soundex

2017-03-31 Thread Yossi Tamari (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950537#comment-15950537
 ] 

Yossi Tamari commented on CODEC-199:


Examples:
[https://www.thoughtco.com/soundex-explained-us-census-1421773]
{quote}
For a period of time, especially for the 1880, 1900 and 1910 census, 
Soundex coders sometimes erroneously treated H and W as separators, like the 
vowels, and assigned a code to both the S and C. This would make the code for 
ASHCRAFT A226, instead of A261. Basically, any surname with the letter H or W 
as a separator between adjacent letters having the same code should be coded 
both ways.
{quote}

[http://west-penwith.org.uk/misc/soundex.htm]




[jira] [Commented] (CODEC-199) Bug in HW rule in Soundex

2017-03-31 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950496#comment-15950496
 ] 

Sebb commented on CODEC-199:


bq. this is exactly a use case for where H and W are treated as vowels

No, it's not. 
Vowels have the code '0' and are used to separate consonants with the same 
non-zero code.
In this case, vowels are completely ignored, i.e. are treated like HW.

Try the following test:

{code}
Assert.assertEquals("L150", s.encode("Lippmann"));
{code}

This fails with the current code (generates "L155") unless you set A to behave 
like HW, i.e. vowels need to be set to '#' (silent).
Try it and see.

Step 3 of the Wikipedia definition says "two letters with the same number 
separated by 'h' or 'w' are coded as a single number, whereas such letters 
separated by a vowel are coded twice".
However, the Genealogy definition implies that such letters are coded as a 
single number for HW *and* the vowels.

The output from the Wikipedia definition allows repeated digits.
The Genealogy definition explicitly does not. That means it does not have any 
Wiki-style vowels; for the Genealogy definition vowels + HW are all silent.

I have yet to see a definition that requires HW to be treated as a vowel rather 
than silent (or a consonant).
If you find any examples, please provide links (and test cases if possible).
