On Sat, Dec 6, 2014 at 12:14 PM, David Koslicki <dmkosli...@gmail.com> wrote:
>> Implementing your own Bloom filter really shouldn't be too hard.
>> Alternatively, it might not be too hard to file some issues against John's
>> package and get it into better working state. If you mention me in an
>> issue, I can also take a look at things. Having the BloomFilters package
>> working would probably be a good thing.
>>
>
> If you look at the link I provided, you'll see that I've previously done
> exactly as you're suggesting.
>

Great. Thanks for helping out!

Exactly, and that's why I'm using my own implementation.

>
> I have thought about using a single string to store all the smaller
> strings, but I think it's even more computationally difficult to come up
> with a single (shortest) string that contains all my specified substrings
> (and no more). Correct me if I am wrong on this point though, as that
> would be great!
>

I meant to just use a single BioSeq array and concatenate all the sequences end to end, without trying to make them overlap. This only reduces the per-sequence storage overhead; it's not a compression technique.
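To illustrate the "concatenate instead of overlap" idea, here is a minimal sketch. It assumes a simple layout: one contiguous buffer holding every sequence back to back, plus an offsets table recording where each sequence starts. It is written in Python, not Julia, and does not use BioSeq. The class name `ConcatenatedSequences` is hypothetical. No shortest-common-superstring search is attempted. Sequences are simply appended, so the buffer is exactly as long as the sum of the inputs.

```python
from typing import List, Sequence


class ConcatenatedSequences:
    """Hypothetical container: many short sequences stored in one buffer.

    Sequences are appended end to end with no overlap, so this saves
    per-object overhead but does not compress anything.
    """

    def __init__(self, seqs: Sequence[str]) -> None:
        # offsets[i] is where sequence i starts; offsets[-1] is the total length.
        self.offsets: List[int] = [0]
        parts = []
        for s in seqs:
            parts.append(s.encode("ascii"))
            self.offsets.append(self.offsets[-1] + len(s))
        # One contiguous byte buffer instead of one object per sequence.
        self.buffer: bytes = b"".join(parts)

    def __len__(self) -> int:
        return len(self.offsets) - 1

    def __getitem__(self, i: int) -> str:
        # Raising IndexError past the end also lets `for s in seqs` terminate.
        if not 0 <= i < len(self):
            raise IndexError(i)
        start, end = self.offsets[i], self.offsets[i + 1]
        return self.buffer[start:end].decode("ascii")


seqs = ConcatenatedSequences(["ACGT", "GGA", "TTTTC"])
print(len(seqs))         # 3
print(seqs[1])           # GGA
print(len(seqs.buffer))  # 12
```

Looking up a sequence costs two offset reads and a slice. Storage is the total sequence length plus one integer per sequence. In the Julia setting, the same layout would mean one large sequence array plus an index vector.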