I've created a JIRA issue, https://issues.apache.org/jira/browse/LANG-1266, and
a pull request for this: https://github.com/apache/commons-lang/pull/188
Regards,
Eyal
On Wednesday, September 7, 2016 5:27 PM, Eyal Allweil
<[email protected]> wrote:
Hi Simo,
I'm not sure I understood how BitSets would be used in this case. For instance,
an example with chars might look like this:
AlphabetConverter ac = new AlphabetConverter(new char[] {'a','b','c','d'},
new char[] {'a','e','f','g'}, new char[] {'a'}); // 'a' is not encoded
The mapping would become a -> a, b -> e, c -> f, d -> g,
so encode("abc") would return "aef".
Ints can be used instead of chars to support Unicode code points that don't fit
in a single char (which was our case; if that seems like overkill, the chars
implementation is much more direct).
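To make the mapping above concrete, here is a minimal sketch of a one-to-one converter that works on code points. It is not the code from the pull request: the class name AlphabetConverterSketch and the helper codePointsOf are hypothetical, Lists are used instead of Sets so that the pairing order is explicit, and it assumes the encoding alphabet has enough symbols for every code point that gets encoded.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch; not the implementation from the pull request.
final class AlphabetConverterSketch {
    private final Map<Integer, Integer> encodeMap = new HashMap<>();
    private final Map<Integer, Integer> decodeMap = new HashMap<>();

    // Assumption: original and encoding are ordered lists of code points.
    AlphabetConverterSketch(List<Integer> original, List<Integer> encoding,
                            List<Integer> doNotEncode) {
        // Encoding symbols that are still free for code points we do encode.
        Iterator<Integer> free = encoding.stream()
                .filter(cp -> !doNotEncode.contains(cp))
                .iterator();
        for (Integer cp : original) {
            Integer target;
            if (doNotEncode.contains(cp)) {
                target = cp; // left in its original state
            } else if (free.hasNext()) {
                target = free.next();
            } else {
                throw new IllegalArgumentException("encoding alphabet is too small");
            }
            encodeMap.put(cp, target);
            decodeMap.put(target, cp);
        }
    }

    String encode(String original) {
        return translate(original, encodeMap);
    }

    String decode(String encoded) {
        return translate(encoded, decodeMap);
    }

    // Walks the string by code point, so supplementary characters stay intact.
    private static String translate(String input, Map<Integer, Integer> mapping) {
        StringBuilder sb = new StringBuilder();
        input.codePoints().forEach(cp -> {
            Integer mapped = mapping.get(cp);
            if (mapped == null) {
                throw new IllegalArgumentException("no mapping for code point " + cp);
            }
            sb.appendCodePoint(mapped);
        });
        return sb.toString();
    }

    // Hypothetical helper: turns a string into an ordered list of code points.
    static List<Integer> codePointsOf(String s) {
        return s.codePoints().boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        AlphabetConverterSketch ac = new AlphabetConverterSketch(
                codePointsOf("abcd"), codePointsOf("aefg"), codePointsOf("a"));
        System.out.println(ac.encode("abc")); // prints "aef"
        System.out.println(ac.decode("aef")); // prints "abc"
    }
}
```

Iterating with codePoints() and appending with appendCodePoint() is what lets the same code handle both plain chars and code points outside the Basic Multilingual Plane.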
How did you mean the BitSet to be used?
Regards,
Eyal
On Thursday, September 1, 2016 12:26 PM, Simone Tripodi
<[email protected]> wrote:
Hi,
I personally think it would be a very "nice to have" feature. I had to face
similar issues in the past and, had that feature been available, it would have
saved me development time.
I just have a small request/suggestion: since int and char can be cast to each
other, I would use BitSets rather than Sets.
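One possible reading of this suggestion, sketched under assumptions: each alphabet is held as a java.util.BitSet indexed by code point, so membership checks need no boxing. The class and method names below are hypothetical and are not part of any proposed API.

```java
import java.util.BitSet;

// Hypothetical sketch of representing an alphabet as a BitSet of code points.
final class CodePointBitSetSketch {

    // Sets one bit per distinct code point in the given string.
    static BitSet toBitSet(String alphabet) {
        BitSet bits = new BitSet();
        alphabet.codePoints().forEach(cp -> bits.set(cp));
        return bits;
    }

    // True if every code point of text belongs to the alphabet.
    static boolean containsAll(BitSet alphabet, String text) {
        return text.codePoints().allMatch(cp -> alphabet.get(cp));
    }

    public static void main(String[] args) {
        BitSet original = toBitSet("abcd");
        System.out.println(original.get('b'));             // prints "true"
        System.out.println(containsAll(original, "abz"));  // prints "false"
    }
}
```

A BitSet records membership only and iterates in ascending index order, so with this representation any pairing between the original and encoding alphabets would have to follow numeric code point order or be stored separately.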
Good luck!
-Simo
http://people.apache.org/~simonetripodi/
http://twitter.com/simonetripodi
On Thu, Sep 1, 2016 at 10:53 AM, Eyal Allweil <[email protected]>
wrote:
Hi guys,
Would you be interested in adding a utility class that creates alphabet
converters, perhaps using a helper method available from StringUtils? It
doesn't have to stay the way it is now, but the API for the class -
AlphabetConverter - is currently:
/**
 * The input is integers representing code points, but we can make it accept
 * chars as well.
 *
 * doNotEncode represents chars we want to leave in the original state (not
 * to encode them using the chars in encoding).
 */
public AlphabetConverter(Set<Integer> original, Set<Integer> encoding,
        Set<Integer> doNotEncode);
public String encode(String original);
public String decode(String encoded);
In StringUtils, we could add:
public AlphabetConverter getAlphabetConverter(Set<Integer> original,
        Set<Integer> encoding, Set<Integer> doNotEncode);
I used it to convert from Unicode to Latin letters, without using any chars I
wanted as delimiters, and preserving the English alphabet as is for
readability. If you'd like to add it, I'll clean up the code and prepare it for
a pull request so you can review it.
It makes sense to me to add a method that returns the HashMaps used internally
for the mappings so they can be serialized (and deserialized) for preserving
the mapping.
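As a rough illustration of that idea, assuming the internal mapping is a Map<Integer, Integer> from original to encoded code points, the map could be round-tripped with standard Java serialization. The class and method names are hypothetical, and other formats (JSON, properties files) would work equally well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of persisting a code point mapping.
final class MappingSerializationSketch {

    // Copies into a HashMap first so the written object is known to be Serializable.
    static byte[] serialize(Map<Integer, Integer> mapping) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new HashMap<>(mapping));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static Map<Integer, Integer> deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Map<Integer, Integer>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> mapping = new HashMap<>();
        mapping.put((int) 'a', (int) 'a');
        mapping.put((int) 'b', (int) 'e');
        Map<Integer, Integer> restored = deserialize(serialize(mapping));
        System.out.println(restored.equals(mapping)); // prints "true"
    }
}
```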
Regards,
Eyal Allweil (PayPal)