On Sunday, May 11, 2014 12:55:47 PM UTC-5, Stefan Karpinski wrote: > > I find this kind of amusing: > > You may have heard before that R is a vectorized language, but what do we >> mean by that? One way to read that is to say that many functions in R can >> operate efficiently on vectors (in addition to singletons). > >
> That's certainly one way to spin it. Following it up with two blog posts > giving four different ways to work around this language "feature" suggests > something a bit less sunny, however. A more cynical take is that a > vectorized language is one that is bad at both iteration and recursion, and > thus forces its users to vectorize their code so that all the hard work can > be done in a faster language. Perhaps a more even-handed characterization > is that vectorized languages provide many vectorized functions and writing > vectorized code is idiomatic – which may make Julia a vectorized language, > but not in the sense of forcing you to doing everything vectorized. > > I must admit that was my reaction too when I saw those posts. If I wanted to come up with an example of a problem showing why forced vectorization (R does not have scalars so any arithmetic operation written in R is vectorized) I would look for a calculation where each step in the iteration is conditional on the current value. For example, this calculation. I don't know the person who wrote these postings. It's possible that he or she doesn't have a lot of experience. I am surprised that others are Revolution Analytics didn't pick up on the fact that this example doesn't show R in a good light. > This simple Collatz computation is clearly a case where vectorization > fails pretty badly and the straightforward iterative version is clear and > easy – and in a language that allows efficient iteration, fast. Here's some > related code for calculating all the Collatz numbers from 1 up to n, > reusing work (adapted from the solution to Euler problem > 14<https://github.com/JuliaLang/julia/blob/458d27216f02d61e3e1249b8cd0f0b446c31e719/test/euler.jl#L219-L238> > ): > > function collatz_up_to(m) > c = fill(-1,m) > c[1] = 0 > for n::Int64 = 2:m > nʹ, d = n, 0 > while nʹ > length(c) || c[nʹ] < 0 > nʹ = iseven(nʹ) ? nʹ>>1 : 3nʹ+1 > d += 1 > end > d += c[nʹ] > while n > length(c) || c[n] < 0 > n <= length(c) && (c[n] = d) > n = iseven(n) ? n>>1 : 3n+1 > d -= 1 > end > end > return c > end > > > I spent the weekend thinking on and off about memoizing this calculation but didn't come up with anything as clear as this. You obviously have a more devious mind than I do. I think there is a possible enhancement in that once you know c[n] you can fill in c[n]*2^k until that product is > m. > On my system, calling collatz_up_to(10^6) is 11x faster than [ collatz(k) > for k=1:10^6 ] and 14x faster than map(collatz, 1:10^6). The speedup > factor does not seem to grow as the problem gets larger – if anything, it > shrinks a little. Maybe the fraction of shared work stays fixed as the > number of Collatz iterations one wants to compute grows larger. Still, an > 11x speedup isn't bad. This also seems like a computation that's > essentially impossible to vectorize. > > I'm kind of thinking of making a new benchmark suite, grouping benchmarks > by whether they test iteration, recursion, array swapping, parsing, etc. > I'd like to head off all the squabbling about the unfairness of not using > vectorized implementations by picking problems like Collatz computation, > where vectorization isn't effective. Of course, I'm sure there will then be > complaints about how the choice of problems is unfair. Perhaps we should > include simpler problems where vectorization is viable and compare both > vectorized and devectorized versions in various languages – kind of like > John Myles White's blog > post<http://www.johnmyleswhite.com/notebook/2013/12/22/the-relationship-between-vectorized-and-devectorized-code/> > on > vectorization vs. devectorization. Making a new benchmark suite would be a > lot of work, but unlike when we wrote all the microbenchmark > implementations back in 2011, today we may be able to distribute the work > and have people who are experts in each language to write the > implementations. > > On Fri, May 9, 2014 at 6:26 PM, Douglas Bates <dmb...@gmail.com<javascript:> > > wrote: > >> On Friday, May 9, 2014 4:45:00 PM UTC-5, Ted Fujimoto wrote: >>> >>> Hi all, >>> >>> In the first few minutes of this video (https://www.youtube.com/ >>> watch?v=v9Io-p_iymI) Prof. Doug Bates skims through why one should >>> invest in Julia over R. However, I felt it wasn't detailed enough. Why >>> should statisticians care about classes instead of "tags"? Do these >>> practices really make a difference? Could anyone share some more insight >>> into his reasoning? Other reasons other than speed (say programming >>> practices, simplicity, etc.) would also be appreciated. >>> >>> Thanks. >>> >> >> Bear in mind that the video you mention is an old video, by the standards >> of the Julia language. >> >> You may want to take a look at some slides from a more recent >> presentation.some >> slides<http://www.stat.wisc.edu/~bates/2014-04-09-Evanston.html> (type >> "w" to get a widescreen view) >> >> The big difference for me between using R and using Julia is that I >> almost always end up writing some C or C++ code when I am developing in R >> whereas in Julia I almost never write C or C++ code for a project. For >> clarity and flexibility being able to write in the high-level REPL-based >> language and attain reasonable or very good performance cannot be >> overemphasized. >> >> A recent >> posting<http://blog.revolutionanalytics.com/2014/05/r-and-the-collatz-conjecture-part-2.html> >> on >> the Revolution Analytics blog (see also the previous >> posting<http://blog.revolutionanalytics.com/2014/04/a-look-a-r-vectorization-through-the-collatz-conjecture.html> >> on >> the topic) showed various ways of evaluating the Collatz stopping time for >> a vector of integers. I was rather surprised at these postings because, in >> my opinion, they don't show the R language to advantage. In Julia I would >> write the evaluation of the stopping time for an integer as >> >> function collatz(n::Integer) >> count = 0 >> while n > 1 >> n = isodd(n) ? 3n+1 : n >> 1 >> count += 1 >> end >> count >> end >> >> which I think is pretty straightforward. (Well, I do use the expression >> n >> 1 to divide n by 2 but you have to have some fun.) >> >> On my machine this outperforms all of the R versions. >> >> julia> @time map(collatz, 1:1000000) >> elapsed time: 0.520773287 seconds (39983824 bytes allocated) >> 1000000-element Array{Int64,1}: >> 0 >> 1 >> 7 >> 2 >> 5 >> 8 >> 16 >> 3 >> 19 >> 6 >> ⋮ >> 165 >> 113 >> 165 >> 113 >> 258 >> 113 >> 113 >> 258 >> 258 >> 152 >> >> And, as a bonus, you can parallelize the mapping of this function with >> very little effort. >> >> Regarding my comment in the video about "tags", I think it is important. >> There are now at least three systems for defining classes and dispatching >> calls to generics in R but the one that is the most widely used, known as >> "S3", doesn't define classes. In R >> >> class(x) <- "foo" >> >> doesn't check anything about the structure of 'x', which is a recipe for >> hard-to-detect bugs. >> >> Anyway, enough of my comments. Perhaps others can contribute. >> > >