<._2855392203 + 2^32
1439575093
The upper bits of the CRC-32 should be discarded:
fourbminus1 =. (26 b.) 32 (33 b.) _32 (33 b.) _1
fourbminus1 (17 b.) f 'assiduously avoid any and all asinine alliterations'
1439575093
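An equivalent way to drop the upper bits is residue modulo 2^32. This is only a minimal sketch, assuming f is the CRC verb used in this thread (f =: 128!:3) and a 64-bit J, where 4294967296 is an exact integer; crc32u is a hypothetical name, not a library verb.

```j
f =: 128!:3                     NB. assumption: f as used in this thread
crc32u =: 4294967296&|@:f       NB. hypothetical name: keep only the low 32 bits
crc32u 'assiduously avoid any and all asinine alliterations'
4294967296 | _2855392203        NB. same arithmetic applied to the raw value reported in this thread
```

By the arithmetic above, the last line is _2855392203 + 2^32, i.e. 1439575093, so it should agree with the masked result.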
Henry Rich
On 7/21/2015 3:06 PM, Vijay Lulla wrote:
Out of curiosity, I'm getting a different value for the example listed
under 128!:3. Shouldn't it be the same as listed on the page?
Below is from my J session:
f '123456789'
_873187034
f 'assiduously avoid any and all asinine alliterations' NB. Different from the listed example
_2855392203
JVERSION
Engine: j803/2014-10-19-11:11:11
Library: 8.04.06
Qt IDE: 1.4.3/5.4.2
Platform: Win 64
Installer: J804 install
InstallPath: h:/utilities/j64-804
On Tue, Jul 21, 2015 at 11:48 AM, Raul Miller wrote:
You can't have an inverse crc, because crc is a lossy transformation.
You are basically relying on statistics to avoid collisions (different
strings with the same crc).
So actual use would look something like:
step one: get the distinct crcs which are in use.
step two: go over the data again and for each string find its crc, and
check that some other relevant string isn't producing the same crc.
(If one is, you'll need further work to untangle them.)
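The two steps could be sketched roughly as follows. This is an illustration only, not tested against large inputs; the names s, w, c and d are chosen here for the example, and s is the sample string from this thread.

```j
s =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT'   NB. sample string from this thread
w =: 10 ]\ s                  NB. every 10-character window, one per row
c =: 10 (128!:3)\ s           NB. step one: the CRC of each window
d =: c (#@~.)/. w             NB. step two: for each distinct CRC, count the distinct windows
+./ 1 < d                     NB. 1 if two different windows share a CRC, else 0
```

Any entry of d greater than 1 marks a CRC shared by different windows; those groups are the ones that would need untangling.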
--
Raul
On Tue, Jul 21, 2015 at 10:34 AM, Mike Day wrote:
That's neat, but it's a bit messy retrieving the actual
substrings rather than their encoded forms.
This does it,
10(]{~i.@[+/~((I.@:(1<#/.~))@:( (128!:3)\ ]))) s
AAAAACCCCC
CCCCCAAAAA
but it would be much better with an inverse CRC;
however that doesn't seem to be supported in J.
Is there a maximum window size for this approach?
Thanks,
Mike
On 21/07/2015 14:37, Henry Rich wrote:
For longer subsequences consider using
(10 (128!:3)\ ])
to reduce the size of the intermediate array.
Henry Rich
On 7/21/2015 12:49 AM, Vijay Lulla wrote:
Using slightly less space
(~. #~ 1 < #/.~)@(10 ]\ ]) s
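To see how the pieces of that fork fit together, here is a small worked example on a short string (the string 'abcab' is made up for illustration): the nub, the count of each nub item via Key, and the final selection of items that occur more than once.

```j
   ~. 'abcab'
abc
   #/.~ 'abcab'
2 2 1
   (~. #~ 1 < #/.~) 'abcab'
ab
```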
On Mon, Jul 20, 2015 at 11:59 PM, Tikkanz wrote:
(i.~ ~: i:~) will find duplicates so how about:
~.@(#~ i.~ ~: i:~)@(10 ]\ ]) s
AAAAACCCCC
CCCCCAAAAA
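A small worked example may make the idiom clearer (the string 'abcab' is made up for illustration): an item is a duplicate exactly when the index of its first occurrence differs from the index of its last occurrence.

```j
   i.~ 'abcab'
0 1 2 0 1
   i:~ 'abcab'
3 4 2 3 4
   (i.~ ~: i:~) 'abcab'
1 1 0 1 1
   ~.@(#~ i.~ ~: i:~) 'abcab'
ab
```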
On Tue, Jul 21, 2015 at 3:51 PM, Jon Hough wrote:
This is a problem from leetcode.com (similar to Project Euler)
https://leetcode.com/problems/repeated-dna-sequences/
The problem is to find all repeated 10-letter subsequences in a DNA
string (made of the characters C, G, A and T).
My solution:
func =: (I.@:(1&<)@:>@:(1&{)@:(~. ,: <"0@:(#/.~)) { ])@:(<"1@:(10&(]\)))
e.g. s =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT' NB. see the link for this definition
func s
┌──────────┬──────────┐
│AAAAACCCCC│CCCCCAAAAA│
└──────────┴──────────┘
It is not very pretty. Can anyone improve on it?
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm