Dear Tim,

Many thanks - this was illuminating and helped me understand the bottleneck. But it seems I did not give enough information in my question. In fact, I also want to access A[ idx ] for general index vectors idx that do not necessarily belong to the L array. For example, I may be given a set of vectors in an array R of size d x K, and want to access A[ L[:, n] + R[:, m] ], then loop over n, m and do some calculations.
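A minimal sketch of the access pattern described above, in case it helps to make the question concrete. The names A, L and R follow the text; the sizes (d = 3, a 10^3 grid, K = 4), the helper getA and the function total are assumptions made up for illustration, and the "calculation" is just a sum.

```julia
# Hypothetical sizes, chosen only for illustration.
d, N, K = 3, 10, 4

# Data array, large enough that every shifted index stays in bounds.
A = rand(2N, 2N, 2N)

# L: d x N^3 array whose columns are the grid points (i, j, k).
L = reduce(hcat, vec([[i, j, k] for i in 1:N, j in 1:N, k in 1:N]))

# R: d x K array of integer shift vectors with entries in 0:N.
R = rand(0:N, d, K)

# Hypothetical helper: look up A at the index stored in the vector idx.
getA(A, idx) = A[idx[1], idx[2], idx[3]]

# Loop over all columns n of L and m of R and accumulate A at L[:, n] + R[:, m].
function total(A, L, R)
    s = 0.0
    for m in 1:size(R, 2), n in 1:size(L, 2)
        idx = L[:, n] + R[:, m]   # d-dimensional index vector
        s += getA(A, idx)
    end
    return s
end

println(total(A, L, R))
```

The helper is there because indexing a 3-dimensional array directly with a length-3 vector does not select a single entry, so the vector has to be unpacked into three scalar indices.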
These sorts of calculations are at the core of the code, and I want the syntax to be as simple and intuitive as possible to avoid making mistakes.

Can I confirm something: is it true that when I do A[ L[:, Lcol] ], the expression L[:, Lcol] creates a new d-dimensional array and copies the data into it? Can this be avoided at all? (I believe this is the bottleneck in what I do.)

I've now tried this again in the form

    module damod
    export darray3
    immutable darray3
        data
    end
    export getindex
    getindex(a::darray3, idx) = a.data[idx[1], idx[2], idx[3]]
    end

This makes only a small difference:

    elapsed time: 0.163860853 seconds (104000080 bytes allocated, 28.46% gc time)

versus some variant of your code:

    elapsed time: 0.003819349 seconds (96 bytes allocated)

What strikes me as odd here is that *104,000,080 bytes* are being allocated. If L is an Int16 array, then it should be no more than 100^3 x 3 x 2 = 6,000,000 bytes?

Any further thoughts will be much appreciated.

Christoph