Sorry, I don't know regex, but I like to solve problems in simple J. So here
are some steps that answer your problem. There are many ways to condense
this and many ways to present the result.
However, answering the question itself always seems like a good start to me.
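DNA=: 'CGATTGACTAGTCGATTGCTGATGCTCTAGTCGTGATGCTATACTAGTGCGTCGATGCTAGCGCTAGTCGCATTTGA'
$DNA               NB. length of the sequence: 77
]S=:'CTAG'         NB. the pattern to look for
]T=:i.($DNA)-3     NB. every possible start index, 0..73
]U=:|:T+"0/i.4     NB. row k holds the index of character k of each window
]V=:U{DNA          NB. the characters at those indices (4 rows, 74 columns)
]W=:S="1 2 V       NB. compare character k of S with row k of V
]X=:4=+/W          NB. 1 where all four characters match
X,.>:i.#X          NB. pair each flag with its 1-based start position
+/X                NB. number of matches (5 here)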
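
One possible way to condense the steps above, offered only as a sketch. It
assumes DNA and S are defined as above, and it takes every length-4 window
with infix (\) instead of building the index table by hand. The names
windows and hits are made up for illustration.

```j
NB. Sketch: assumes DNA and S=:'CTAG' are already defined as above.
windows =: (#S) ]\ DNA      NB. one row per start position, each a 4-character window
hits    =: S -:"1 windows   NB. 1 where a whole window matches S
I. hits                     NB. 0-based start positions of the matches
+/ hits                     NB. number of matches, the same count as +/X
```

Note that I. reports 0-based positions, while the table X,.>:i.74 above
numbers them from 1.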
Linda
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Jon Hough
Sent: Sunday, August 16, 2015 2:10 AM
To: [email protected]
Subject: [Jprogramming] Regex vs I./E. for pattern matching
I recently went through the regex lab, and would like to know whether it is
more idiomatic for J users to use regex when matching simple patterns in a
string, or to use E. and similar verbs?
For example, suppose I have an (imaginary) DNA sequence string:
DNA=: 'CGATTGACTAGTCGATTGCTGATGCTCTAGTCGTGATGCTATACTAGTGCGTCGATGCTAGCGCTAGTCGCATTTGA'
I want to find where the sequence 'CTAG' occurs in this string. Using regex,
'CTAG' rxmatches DNA will give the 5 matches (start index and length of
each) where the CTAG pattern is found.
But I could equally do,
I. 'CTAG' E. DNA
which will give me the same indices. And it seems the non-regex way is more
efficient (in time and space):
timespacex '( I. ''CTAG'' E. DNA)'
gives 1.5e_5 3008
timespacex '( ''CTAG'' rxmatches DNA)'
gives 0.001103 6720
Granted, the regex expression is as simple as possible, and regex can do
more complicated matching than E. can. Possibly rxmatches also gains
efficiency over E. for much longer DNA strings. But it seems that for simple
matches E. is the better choice.
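
A rough sketch of how the two approaches could be checked against each
other and timed more steadily. It assumes DNA is defined as above and that
rxmatches returns one (start, length) row per match and subpattern, as
described in the regex lab. The names m and starts are made up for
illustration.

```j
require 'regex'                       NB. make sure rxmatches is loaded
m =: 'CTAG' rxmatches DNA             NB. one (start,length) row per match and subpattern
starts =: {."1 {."2 m                 NB. keep only the start index of each whole match
starts -: I. 'CTAG' E. DNA            NB. 1 if both approaches report the same positions
100 timespacex 'I. ''CTAG'' E. DNA'   NB. time averaged over 100 runs, then space
100 timespacex '''CTAG'' rxmatches DNA'
```

Giving timespacex a left argument averages the time over that many runs,
which smooths out the noise in a single measurement of such a short
expression.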
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm