Note that there is an even faster version:
10 timespacex '( I. ''CTAG'' E. DNA)'
2.656e_6 6016
10 timespacex '( ''CTAG'' I.@:E. DNA)'
2.144e_6 6400
For Henry's example of
'CTAG*ACTA'
E. gives some extra flexibility, because each pattern can be searched for separately:
('CTAG';'ACTA') I.@:E.each < DNA
One way to use the starting indices is to get all of the "overlapping matches":
rangei =: [ + >:@] i.@- [
('CTAG';'ACTA') (] {~each a:-.~ [: ''"_`rangei@.</each [: ,@:(,.each/"0 1&>/) I.@:E.each) < DNA
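To make the role of rangei easier to see, here is a minimal sketch on a short made-up string (s, start and stop are names I am introducing for illustration; they are not from the thread). It only shows how one pair of start indices turns into a span of text; it does not reproduce the full boxed expression above.

```j
NB. Hypothetical short input, not the DNA string from the thread.
s =: 'xxCTAGyyACTAzz'

NB. x rangei y  gives the integers x..y inclusive (x and y are atoms here).
rangei =: [ + >:@] i.@- [

start =: {. 'CTAG' I.@:E. s     NB. first start of CTAG: 2
stop  =: {. 'ACTA' I.@:E. s     NB. first start of ACTA: 8

3 rangei 7                      NB. 3 4 5 6 7
(start rangei stop) { s         NB. CTAGyyA    (stops at the first letter of ACTA)
(start rangei stop + 3) { s     NB. CTAGyyACTA (adding 3 covers all of ACTA)
```

Note that {. is needed because I. returns a list, and rangei as written expects atoms on both sides.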
----- Original Message -----
From: Jon Hough <[email protected]>
To: "[email protected]" <[email protected]>
Cc:
Sent: Sunday, August 16, 2015 2:09 AM
Subject: [Jprogramming] Regex vs I./E. for pattern matching
I recently went through the regex lab, and would like to know whether it is
more idiomatic for J users to use regex when matching simple patterns in a
string, or to use E. and similar verbs?
For example, suppose I have an (imaginary) DNA sequence string:
DNA=: 'CGATTGACTAGTCGATTGCTGATGCTCTAGTCGTGATGCTATACTAGTGCGTCGATGCTAGCGCTAGTCGCATTTGA'
I want to find where 'CTAG' sequences exist in this string. Using regex,
'CTAG' rxmatches DNA
will give the 5 indices where the CTAG pattern is found.
But I could equally do,
I. 'CTAG' E. DNA
which will give me the same indices. And it seems the non-regex way is more
efficient (in time and space):
timespacex '( I. ''CTAG'' E. DNA)'
gives 1.5e_5 3008
timespacex '( ''CTAG'' rxmatches DNA)'
gives 0.001103 6720
Granted, the regex expression is as simple as possible, regex can do more
complicated matching than E. can, and possibly rxmatches gains efficiency
over E. for much longer DNA strings. But it seems that for simple matches E. is the
better choice.
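As a small illustration of the non-regex approach, here is a sketch on a made-up string (s is a hypothetical name, chosen so the results are easy to check by hand). It shows that E. returns a boolean mask marking where each match starts, and that I. turns that mask into indices.

```j
NB. Hypothetical short input, not the DNA string above.
s =: 'ABCABAB'

'AB' E. s           NB. 1 0 0 1 0 1 0   (1 wherever a match starts)
I. 'AB' E. s        NB. 0 3 5           (indices of the 1s)
'AB' I.@:E. s       NB. 0 3 5           (same result, written as one composed verb)
'AA' E. 'AAAA'      NB. 1 1 1 0         (overlapping matches are all reported)
```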
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm