Hi Keshav,

I think findMatches(), from the S4Vectors/IRanges packages (Bioconductor),
does what you want:

  library(IRanges)
  y <- c(16L, -3L, -2L, 15L, 15L, 0L, 8L, 15L, -2L)
  x <- c(unique(y), 999L)
  hits <- findMatches(x, y)

Then:

  > hits
  Hits object with 9 hits and 0 metadata columns:
        queryHits subjectHits
        <integer>   <integer>
    [1]         1           1
    [2]         2           2
    [3]         3           3
    [4]         3           9
    [5]         4           4
    [6]         4           5
    [7]         4           8
    [8]         5           6
    [9]         6           7
    -------
    queryLength: 7
    subjectLength: 9

The Hits object can be turned into a list with one element per value
in x, each holding the positions of that value's matches in y:

  > as.list(hits)
  [[1]]
  [1] 1

  [[2]]
  [1] 2

  [[3]]
  [1] 3 9

  [[4]]
  [1] 4 5 8

  [[5]]
  [1] 6

  [[6]]
  [1] 7

  [[7]]
  integer(0)
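
If installing Bioconductor packages is not an option, the same "index
once, look up many times" idea can be sketched in base R. This is only a
sketch, not what findMatches() does internally. It assumes y holds
integers (or strings), since split() keys the index by the character form
of each value, and the names index and positions are made up for the
illustration.

```r
# Build the index once: one element per distinct value of y, holding
# the positions where that value occurs (names are the values as strings).
y <- c(16L, -3L, -2L, 15L, 15L, 0L, 8L, 15L, -2L)
index <- split(seq_along(y), y)

# Look up any number of query values against the prebuilt index.
x <- c(unique(y), 999L)
positions <- unname(index[as.character(x)])

# Values absent from y come back as NULL; normalise them to integer(0).
positions[vapply(positions, is.null, logical(1))] <- list(integer(0))
```

positions then has the same content as as.list(hits) above.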

H.

> sessionInfo()
R version 3.2.0 beta (2015-04-05 r68151)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] IRanges_2.1.43       S4Vectors_0.5.22     BiocGenerics_0.13.11

loaded via a namespace (and not attached):
[1] tools_3.2.0

On 04/06/2015 01:56 PM, Keshav Dhandhania wrote:
Hi,

I know that one can find all occurrences of x in a vector v by doing
which(x == v).

However, if I need to do this again and again while v stays the same,
this is quite inefficient. In my particular case, I need to do this
millions of times, and length(v) is 100 million.

Does anyone have a suggestion on how to go about it?
I know of fmatch (from the fastmatch package), which does this for the
match function, but it doesn't handle multiple matches.
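
To illustrate the limitation with base match(), which fmatch is meant to
speed up, here is a small made-up vector v. The names first_only and
all_hits are just for the illustration:

```r
v <- c(16L, -3L, 15L, 15L, 0L, 15L)

# match() reports only the position of the first occurrence.
first_only <- match(15L, v)

# which() reports every occurrence, but rescans all of v on each call.
all_hits <- which(v == 15L)
```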

Thanks

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
