Re: [Jprogramming] Substring sequences of a string

Henry Rich Tue, 21 Jul 2015 13:13:59 -0700

   <._2855392203 + 2^32
1439575093

The upper bits of the CRC-32 should be discarded:


fourbminus1 =. (26 b.) 32 (33 b.) _32 (33 b.) _1

fourbminus1 (17 b.) f 'assiduously avoid any and all asinine alliterations'

1439575093
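
As a sketch of the same masking with a literal constant (assuming a 64-bit J, where 4294967295 is held as an integer; the constant form is only an illustration and is not taken from the 128!:3 page), keeping the low 32 bits maps the negative result back to the expected value:

    4294967295 (17 b.) _2855392203   NB. bitwise AND with 2^32 - 1
1439575093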

Henry Rich

On 7/21/2015 3:06 PM, Vijay Lulla wrote:

Out of curiosity, I'm getting a different value for the example listed
under 128!:3.  Shouldn't it be the same as listed on the page?

Below is from my J session

    f '123456789'
_873187034
    f 'assiduously avoid any and all asinine alliterations'  NB. Different from the listed example
_2855392203
    JVERSION
Engine: j803/2014-10-19-11:11:11
Library: 8.04.06
Qt IDE: 1.4.3/5.4.2
Platform: Win 64
Installer: J804 install
InstallPath: h:/utilities/j64-804


On Tue, Jul 21, 2015 at 11:48 AM, Raul Miller <[email protected]> wrote:

You can't have an inverse crc, because crc is a lossy transformation.
You are basically relying on statistics to avoid collisions (different
strings with the same crc).

So actual use would look something like:

step one: get the distinct crcs which are in use.

step two: go over the data again and for each string find its crc, and
check that no other relevant string is producing the same crc.
(If one is, you'll need further work to untangle them.)
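
A rough sketch of those two steps for the 10-character windows discussed in this thread. The names w, c and ok are made up for illustration, and s is the example string from the original post:

    s  =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT'
    w  =: 10 ]\ s                  NB. every 10-character window, one per row
    c  =: 10 (128!:3)\ s           NB. step one: the crc of each window
    ok =: *./ 1 = c (#@~.)/. w     NB. step two: 1 if every crc maps to a single distinct window

If ok is 0, at least two different windows share a crc, and those groups have to be compared as strings.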

--
Raul

On Tue, Jul 21, 2015 at 10:34 AM, Mike Day <[email protected]> wrote:

That's neat,  but it's a bit messy retrieving the actual
substrings rather than their encoded forms.

This does it,
    10(]{~i.@[+/~((I.@:(1<#/.~))@:( (128!:3)\ ]))) s

AAAAACCCCC
CCCCCAAAAA


but it would be much better with an inverse CRC;
however that doesn't seem to be supported in J.
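
An unrolled sketch of the same idea, which needs no inverse: the CRCs are only compared with each other, and the positions of the repeats are used to index back into s. The names c, dup and starts are hypothetical, and the sketch assumes no two different windows share a CRC. The mask here is taken over all windows rather than over the nub of the CRCs, so I. yields positions in s directly.

    s      =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT'
    c      =: 10 (128!:3)\ s         NB. one CRC per 10-character window
    dup    =: (i.~ ~: i:~) c         NB. 1 where a window's CRC occurs more than once
    starts =: I. dup                 NB. start positions of those windows in s
    ~. s {~ starts +/ i. 10          NB. rebuild only the repeated windows, then take the nub

Only the repeated windows are materialized as characters, so the full table produced by 10 ]\ s is never built.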


Is there a maximum window size for this approach?

Thanks,

Mike


On 21/07/2015 14:37, Henry Rich wrote:


For longer subsequences consider using

(10 (128!:3)\ ])

to reduce the size of the intermediate array.
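
To illustrate the difference in size, here are the shapes of the two intermediate results for the example string from the original post (32 characters, hence 23 windows):

    s =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT'
    $ 10 ]\ s              NB. table of windows: one row of 10 characters each
23 10
    $ 10 (128!:3)\ s       NB. a single integer CRC per window
23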

Henry Rich

On 7/21/2015 12:49 AM, Vijay Lulla wrote:


Using slightly less space

(~. #~ 1 < #/.~)@(10 ]\ ]) s
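
For readers unpicking the left-hand train, a small illustration on a short string (chosen here only as an example): #/.~ counts the occurrences of each distinct item, and the resulting mask selects from the nub.

    #/.~ 'abca'              NB. counts for the distinct items a, b, c
2 1 1
    (~. #~ 1 < #/.~) 'abca'  NB. keep the distinct items that occur more than once
a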

On Mon, Jul 20, 2015 at 11:59 PM, Tikkanz <[email protected]> wrote:


(i.~ ~: i:~) will find duplicates so how about:

      ~.@(#~ i.~ ~: i:~)@(10 ]\ ]) s

AAAAACCCCC
CCCCCAAAAA
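
A small illustration of why (i.~ ~: i:~) marks duplicates, using a short string picked only as an example: an item is repeated exactly when its first and last positions differ.

    i.~ 'abca'               NB. index of the first occurrence of each item
0 1 2 0
    i:~ 'abca'               NB. index of the last occurrence of each item
3 1 2 3
    (i.~ ~: i:~) 'abca'      NB. 1 where the two differ
1 0 0 1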



On Tue, Jul 21, 2015 at 3:51 PM, Jon Hough <[email protected]> wrote:

This is a problem from leetcode.com (similar to Project Euler)
https://leetcode.com/problems/repeated-dna-sequences/
The problem is to find all 10 letter repeated subsequences from a DNA
string (made of C,G,A,T characters).
My solution:
func =: (I.@:(1&<)@:>@:(1&{)@:(~. ,: <"0@:(#/.~)) { ])@:(<"1@:(10&(]\)))
e.g. s =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT' NB. see the link for this definition
func s
┌──────────┬──────────┐
│AAAAACCCCC│CCCCCAAAAA│
└──────────┴──────────┘



It is not very pretty. Can anyone improve on it?






----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

