Re: [Jprogramming] Substring sequences of a string

Mike Day Tue, 21 Jul 2015 07:36:07 -0700

That's neat, but it's a bit messy retrieving the actual
substrings rather than their CRC-encoded forms.


This does it:
   10(]{~i.@[+/~((I.@:(1<#/.~))@:( (128!:3)\ ]))) s

AAAAACCCCC
CCCCCAAAAA


but it would be much better with an inverse CRC;
however that doesn't seem to be supported in J.
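A sketch of one way to get the substrings back without inverting the CRC: keep the start position of the first occurrence of each repeated checksum, then read the characters straight out of s. It assumes that equal CRC-32 values mean equal windows, which a 32-bit checksum does not strictly guarantee, and the names firstRepeats and repeatedWindows are made up for the example. Note that I.@:(1<#/.~) in the line above gives positions in the nub of the CRC list; these coincide with start positions in s here only because the first ten windows of s are all distinct.

```j
NB. Sketch with hypothetical names; assumes equal CRC-32 values mean equal windows.
NB. 1 at the first occurrence of every item that occurs more than once.
firstRepeats =: ~: *. i.~ ~: i:~

NB. x = window width, y = string.
repeatedWindows =: 4 : 0
starts =. I. firstRepeats x (128!:3)\ y   NB. start positions of the repeated windows
(starts +/ i. x) { y                      NB. x consecutive indices per start, then pick characters
)

s =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT'
10 repeatedWindows s
```

Because the characters are read back from s itself, no inverse CRC is needed.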


Is there a maximum window size for this approach?

Thanks,

Mike

On 21/07/2015 14:37, Henry Rich wrote:

For longer subsequences consider using

(10 (128!:3)\ ])

to reduce the size of the intermediate array.
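Henry's suggestion in concrete terms, as a sketch on Jon's example string: ]\ keeps every window as a row of characters, whereas (128!:3)\ keeps one CRC-32 integer per window, so the saving grows with the window width. Equal windows always give equal checksums, but a matching checksum does not by itself prove the windows are equal.

```j
s =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT'
$ 10 ]\ s            NB. 23 10 : 23 windows, each a row of 10 characters
$ 10 (128!:3)\ s     NB. 23    : one integer per window
```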

Henry Rich

On 7/21/2015 12:49 AM, Vijay Lulla wrote:

Using slightly less space

(~. #~ 1 < #/.~)@(10 ]\ ]) s
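How (~. #~ 1 < #/.~) works, shown on a made-up numeric list v: #/.~ counts each distinct item in the same order in which ~. lists them, so the boolean 1 < #/.~ can be used to copy directly from the nub.

```j
v =: 3 1 4 1 5
~. v                  NB. distinct items in order of first appearance: 3 1 4 5
#/.~ v                NB. count of each distinct item, same order:      1 2 1 1
(~. #~ 1 < #/.~) v    NB. distinct items that occur more than once:      1
```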

On Mon, Jul 20, 2015 at 11:59 PM, Tikkanz wrote:

(i.~ ~: i:~) will find duplicates, so how about:

     ~.@(#~ i.~ ~: i:~)@(10 ]\ ]) s

AAAAACCCCC
CCCCCAAAAA
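Why (i.~ ~: i:~) flags duplicates, shown on a made-up list v: i.~ gives the index of each item's first occurrence and i:~ the index of its last, and the two differ exactly when the item occurs more than once. Copying with that mask keeps every copy of a repeated item, which is why the ~. at the front is needed.

```j
v =: 3 1 4 1 5
i.~ v                   NB. first-occurrence index of each item: 0 1 2 1 4
i:~ v                   NB. last-occurrence index of each item:  0 3 2 3 4
(i.~ ~: i:~) v          NB. 1 where the item occurs more than once: 0 1 0 1 0
~.@(#~ i.~ ~: i:~) v    NB. the repeated items, once each: 1
```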



On Tue, Jul 21, 2015 at 3:51 PM, Jon Hough wrote:

This is a problem from leetcode.com (similar to Project Euler)
https://leetcode.com/problems/repeated-dna-sequences/
The problem is to find all 10-letter substrings that occur more than once
in a DNA string (made of A, C, G and T characters).
My solution:

func =: (I.@:(1&<)@:>@:(1&{)@:(~. ,: <"0@:(#/.~)) {])@:(<"1@:(10&(]\)))

e.g. s =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT' NB. see the link for this definition
func s
┌──────────┬──────────┐
│AAAAACCCCC│CCCCCAAAAA│
└──────────┴──────────┘



It is not very pretty. Can anyone improve on it?
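A sketch that splits func into named steps; windows, counts and func2 are hypothetical names. One thing worth checking in the original: I.@:(1&<) yields indices into the nub of the boxed windows, while { ] selects from the full list of windows. The two agree for s because its first ten windows are all distinct, but not in general, as the toy string 'ABAZZ' (single characters standing in for windows) shows.

```j
NB. Toy example: I. 1 < #/.~ gives indices into the nub, not into the original list.
t =: 'ABAZZ'
I. 1 < #/.~ t            NB. 0 2 : the nub is 'ABZ' with counts 2 1 2
(I. 1 < #/.~ t) { t      NB. AA  : indexing the original list, as func does
(I. 1 < #/.~ t) { ~. t   NB. AZ  : indexing the nub gives the repeated items

NB. func rewritten to select from the nub (hypothetical names).
windows =: <"1@:(10&(]\))      NB. box each 10-character window
counts  =: #/.~                NB. occurrences of each distinct window, in nub order
func2   =: (I.@:(1&<)@:counts { ~.)@:windows

s =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT'
func2 s
```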



