[algogeeks] Re: check string A and B isPermutation ?
Padmanabhan, can you elaborate a bit on your suggestion? Padmanabhan Natarajan [EMAIL PROTECTED] writes: how about a randomized approach if the string is truly big. Generate *relatively* prime numbers for each character in string and compute the product. Repeat it N times to minimize the chances of error. It must easy to observe that as you increase N, your chances of error decreases. On 4/14/06, Timak [EMAIL PROTECTED] wrote: Gene, If in reality space is no consern, I'd argue that just comparing the histograms (what BigYan suggested, except that he tried to optimize it) is the best approach. It is O(2N) time and O(w) space, while your proposal is O(2wN) time and O(N) space. Copmaring the histograms is also the simplest. Timak Gene [EMAIL PROTECTED] writes: Let's be real. I can't imagine a application where it could make much of a difference. Can you? Almost certainly it is simplicity and beauty and angels dancing on pins we're talking about. By the way, you can add about 16 characters to my function above and it becomes a true sort (not a sort into 1-hamming or gray code order). --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
Timak wrote: Hey, I think it's cool too :-) BTW, how exactly do you generalize it? Well, I'm going to let that as a puzzle for Algorithm Geeks! It's equally cool. And I've never seen it published anywhere. There are a few other things to think about however. First, simplest depends on environment. In perl you can compare two strings $a and $b with if (join('', sort(split '', $a)) eq join('', sort(split '', $b)) { # a is a permutation of b! } Histogram code would be more complex and would probably run slower. Why, can be pretty simple too. Something like: bool is_perm(char* a, char* b) { while (*a) hist[*(a++)]++; while (*b) if (--hist[*(b++)] 0) return true; return false; } Ah, but in the first place I was talking about Perl. Histogramming with arrays in Perl is really awkward. My whole point is that different environments often lead to different best solutions. You also have omitted zeroing the histogram, which isn't fair because your function won't work without it. Initializing arrays often crops up as the most expensive operation in interesting algorithms. So I'll point out that there is a cool data structure that you can tack on to an array. This data structure needs only constant time per array access to maintain. Yet it allows you to treat the array as though it's initialized to zero even though it really is not. The down side is that this data structure needs O(V) additional storage, where V is the number of array entries eventually set to values different from zero. Again I'll let the details as a puzzle for the news group. Unfortunately, your code above doesn't work. For example it will return false on is_perm(a, a) . Did you mean to have the true and false in the other order? Even if you reverse them it doesn't work. Try is_perm(aa, a) . Your code will miss the fact that hist['a'] is 1 at the end. So your algorithm should really be called is_superset(a, b) However, please note that you don't need to compare the histograms (although those were my words). You just need to accumulate the histogram of the first string, and then just decrement the counts when scanning the second one; since you would know that the lengths are the same, a trial to decrement a zero would mean the strings are not permutations. All this is O(N). Sorry this is not correct. To make your algorithm work, you'd need to verify that the incremented/decremented histogram is all zeros at the end (see my example above). This final verification pass is O(2^w). Another approach that doesn't need the verification pass is to use is_superset(a, b) is_superset(b, a). But now you need to zero the histogram twice! IMHO if you have big characters, the histogram is best kept in a hash table where the keys are letters and the values are counts. But this again is more complex. Great discussion! Thanks. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
Yeah, I mistakenly swaped the true and false. My code works, but it assumes that you know that the lengths of the strings are the same. BTW, the lengths could be calculated on the fly. Also the hist doesn't have to be a vector, it could be a hash, as you mentioned. Here's a full c++ implementation using an STL map (which is most often implemented using a red-black tree). #include iostream #include map std::mapchar, int hist; bool is_perm(char* a, char* b) { int cnt = 0; while (*a) { hist[*(a++)]++; cnt++; } while (*b) { if (--hist[*(b++)] 0) return false; cnt--; } return !cnt; } main() { std::cout (is_perm(aab, abab) ? y : n) std::endl; } I think STL calls the default constructors so no need make sure the counters are zero. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
Gene, If in reality space is no consern, I'd argue that just comparing the histograms (what BigYan suggested, except that he tried to optimize it) is the best approach. It is O(2N) time and O(w) space, while your proposal is O(2wN) time and O(N) space. Copmaring the histograms is also the simplest. Timak Gene [EMAIL PROTECTED] writes: Let's be real. I can't imagine a application where it could make much of a difference. Can you? Almost certainly it is simplicity and beauty and angels dancing on pins we're talking about. By the way, you can add about 16 characters to my function above and it becomes a true sort (not a sort into 1-hamming or gray code order). --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
I agree with much of what you say! I just think character sorting into hamming-1 order is cool, especially since it generalizes to a true sort with just a slight modification. There are a few other things to think about however. First, simplest depends on environment. In perl you can compare two strings $a and $b with if (join('', sort(split '', $a)) eq join('', sort(split '', $b)) { # a is a permutation of b! } Histogram code would be more complex and would probably run slower. Another thing is that it's meaningless to talk about O(2N) because O(2N)=O(N). Finally, for characters of w bits, the histogram requires O(2^w log N) space in the general case, not O(w) as you say. Moreover, run time is also O(2^w) to compare those histograms. For example, if you are working in Unicode where characters might have up to 4 bytes, you'd need 4 billion histogram entries (perhaps 16 gigabytes), and it would take a while to compare these. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
how about a randomized approach if the string is truly big. Generate *relatively* prime numbers for each character in string and compute the product. Repeat it N times to minimize the chances of error. It must easy to observe that as you increase N, your chances of error decreases. On 4/14/06, Timak [EMAIL PROTECTED] wrote: Gene,If in reality space is no consern, I'd argue that just comparingthe histograms (what BigYan suggested, except that he tried tooptimize it) is the best approach.It is O(2N) time and O(w) space, while your proposal is O(2wN) time and O(N) space.Copmaring thehistograms is also the simplest.TimakGene [EMAIL PROTECTED] writes: Let's be real.I can't imagine a application where it could make much of a difference.Can you? Almost certainly it is simplicity and beauty and angels dancing on pins we're talking about. By the way, you can add about 16 characters to my function above and it becomes a true sort (not a sort into 1-hamming or gray code order). --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
Let's be real. I can't imagine a application where it could make much of a difference. Can you? Almost certainly it is simplicity and beauty and angels dancing on pins we're talking about. By the way, you can add about 16 characters to my function above and it becomes a true sort (not a sort into 1-hamming or gray code order). --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
I had written.. So, using a sort the complexity really is O(2n log 2), Isn't it? Oh yeah.. That's terribly bad logic! @Gene Use radix sort to make it O(N W) where W is the bit-width of the characters in the string, hence O(N). I don't really get how a radix pass is going to help here either.. Wouldn't it just order the strings, while we really want to order the characters in the strings? --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
i think we just construct a string C that repeate string A twice, for example, string A is 'abcd', and string C is 'abcdabcd', then check whether string B is a substring of string C. by the way, the string C can be saved by utilizing string A and modular function, for example, the char at position i in string C is exactly the char at i%len(A) in string A. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
Assuming that you cannot have any letter repeated more than 15 times. Create a struct of 26, 4 bit bitfields. This is going to take 26x4 bits or 13 bytes of memory. Use this structure to store the character count of each character of the first string. Now proceed to the second string and for each character, decrease it's corresponding character count by 1 if it is greater than 0. If it is 0 then the second string is not a permutation of the first string. At the end of processin the second string check whether each caracter count is 0. If they all are 0 then the strings are permutations of one another. Of course you can modify the algo to accomodate higher maximum character counts. In any case the time required is O(n) and space required is 26 x ceiling ( log ( maximum_individual_character_count, base=2) ) bits. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
How about the prime number approach I mentioned above. Any comments?On 4/11/06, Dhyanesh [EMAIL PROTECTED] wrote:I also particularly dont like radix sort. Because it is kind of a hack. It is not a comparison based sort. It will work with charactersas you have here. But what if you have not two strings of characters,but two arrays of objects which you can only compare. You would not be able to apply radix sort then.-DhyaneshOn 4/11/06, Kevin [EMAIL PROTECTED] wrote: Talking about radix sorting, my experience is that many job interviewers (who are coders/developer themselvef) do not like it. Each time when I mentioned it, they usually did not get happy with it. What could be the reason? Any bad things in practical for radix sorting? --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
With the prime number approach you would need to somehow facilitate large numbers. Also, you need to store those primes which would take a lot of space, although admittedly you only need to take that space once. In addition, multiplication is a much heavier opperation than increment. BiGYaN's approach is nice but does not scale for large strings. If the strings are enourmous then an array of links to the counts with the ability to increase the number of bits in a counter on the fly might work, although that's getting complex and would take space. If space is the main consern, wouldn't some sort of radix sort work here? You could sort the bytes of each string according to one bit (say the least significant of the relevant bits), that would partition the strings into two parts. The sort could be done by starting both from left and right ends and exchanging elements that are not in the appropriate partition (left or right). Once done with the first bit, check if the partitioning position is different, if it is then A is no permitation of B and stop. Otherwise continue with the next relevant bit. This is O(N * w) where w is the number of bits need to be checked before returning FALSE or the total number of relevant bits in a character in the worst case. No additional space required. --Timak Padmanabhan Natarajan [EMAIL PROTECTED] writes: How about the prime number approach I mentioned above. Any comments? On 4/11/06, Dhyanesh [EMAIL PROTECTED] wrote: I also particularly dont like radix sort. Because it is kind of a hack. It is not a comparison based sort. It will work with characters as you have here. But what if you have not two strings of characters, but two arrays of objects which you can only compare. You would not be able to apply radix sort then. -Dhyanesh On 4/11/06, Kevin [EMAIL PROTECTED] wrote: Talking about radix sorting, my experience is that many job interviewers (who are coders/developer themselvef) do not like it. Each time when I mentioned it, they usually did not get happy with it. What could be the reason? Any bad things in practical for radix sorting? --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
[EMAIL PROTECTED] writes: So this is no better than just having an array of the counts per character, in terms of space and worse in terms of speed. :-/ Actually, what was I thinking, the radix approach would be better, spacewise. In terms of space it would require O(w) space where w is the number of bits used for the determination. The array of sums approach (what Pramod Patangay suggested earlier) requires O(K) space where K is the number of characters. Since w is logK or less, the radix approach is better. Speedwise, the array of sums would be O(2N), while the radix is O(2N*w). Both are O(N) since w is a constant relatively small, in case of huge N. [boy, I just replied to my own message for a second time :-)] --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
Something on the lines of counting sort will do. Make an array Count[26] where Count[1] will have the number of a's etc. While reading string A, keep note of the count of the characters in Count and while reading B, just check if the same number of characters occur. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
HI, how do you intend to use hash ( as in which hash function) to check if the hash keys generated for both A and B are equal. Thanks, Shishir --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
But the problem is: there are much more than 26 possible characters in a string, even ASCII have 100+ characters I think, not to mention those unicode, those which use 2 or even more bytes as a character. So the Count[] will be too large to be practical useful. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
How about this?Assign a prime number to each of the the characters. Multiply each of the numbers in each string and compare the product. If they are the same then it is a permutation.Cheers,Nat On 4/10/06, Kevin [EMAIL PROTECTED] wrote: But the problem is:there are much more than 26 possible characters in a string, even ASCIIhave 100+ characters I think, not to mention those unicode, those whichuse 2 or even more bytes as a character. So the Count[] will be too large to be practical useful. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
But multiply prime numbers will run out of range of integer quickly (as I remember multiply the first 10-20 prime will get a too large number to fit into an integer). --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
OK OK! Use the modulus operator then. Assign prime numbers mod p where p is a distinct large prime to characters and multiply.On 4/10/06, Kevin [EMAIL PROTECTED] wrote:But multiply prime numbers will run out of range of integer quickly (as I remember multiply the first 10-20 prime will get a too large numberto fit into an integer). --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
Kevin wrote: Given two strings A and B, how to quickly check whether they are permutation or not? For example, abcd and bcda are permutation: same chars, just different order. Using hash: hash A and check B will use time O(N), at the cost of extra N space. How about do not use that much of extra space? One naive method: sort A and B by their chars, and do a compare will take O(NlogN) time. Any faster method? Use radix sort to make it O(N W) where W is the bit-width of the characters in the string, hence O(N). --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---
[algogeeks] Re: check string A and B isPermutation ?
Just of the top of my head. You're dealing with *2* strings here. So the *naive* method you quoted wont be that bad. Use a sort. The basic operation here being string compare, which takes up O(n) time NOTE 'n' is the length of the strings. So, using a sort the complexity really is O(2n log 2), Isn't it? Which is O(n) really, n being length of the strings. NO extra space either. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~--~~~~--~~--~--~---