Re: [julia-users] Re: Article on `@simd`

2014-09-17 Thread gael . mcdon
Slightly OT, but since I won't talk about it myself I don't feel this will harm 
the current thread ...


I don't know if it can be of any help/use/interest for any of you but some 
people (some at Intel) are actively working on SIMD use with LLVM:

https://ispc.github.io/index.html

But I really don't have the skills to tell you whether they "just" wrote a new 
C-like language that autovectorizes well or whether they do some even smarter 
stuff to get maximum performance.


[julia-users] How does Julia decide where to truncate displayed values in the REPL?

2014-09-16 Thread gael . mcdon
Oh great, thanks! I just read some documentation about it. I didn't know that 
representing floating-point numbers as strings was that tricky.


[julia-users] How does Julia decide where to truncate displayed values in the REPL?

2014-09-16 Thread gael . mcdon
Hi,

I was experimenting with @simd and was a bit surprised about some results 
on different implementations of a plain summing function:

julia> f(a)
399921.25f0

julia> g(a)
399916.2f0

julia> sum(a)
399922.25f0

julia> sum_kbn(a)
399920.66f0

julia> @printf("%.6f",g(a))
399916.187500

I don't understand the behavior here: most of the time (if not always when 
using the REPL), I get 8 digits printed for 32-bit floats. The only 
exception is when there are fewer than 8 significant digits. But that's not 
the case here: why isn't g(a) displayed with 8 digits?

As an additional question: is there advanced string formatting available in 
Julia? Like a format function or a judicious overloading of rem (%)? 

Cheers,
Gaël







a = rand(Float32,80)

function f{T}(a::Array{T,1})
    s = zero(T)
    @simd for i in 1:length(a)
        @inbounds s = s + a[i]
    end
    return s
end

function g{T}(a::Array{T,1})
    s = zero(T)
    for i in 1:length(a)
        @inbounds s = s + a[i]
    end
    return s
end
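On the string-formatting question above: Julia ships C-style formatting via `@printf`/`@sprintf`. In current Julia these macros live in the `Printf` standard library (in 2014 they were in Base), so a sketch looks like this — the value is just an illustrative Float32:

```julia
using Printf  # in 2014-era Julia these macros lived in Base

x = 399916.1875f0            # an illustrative Float32 value

s = @sprintf("%.6f", x)      # fixed six decimals
w = @sprintf("%12.2f", x)    # width 12, two decimals

println(s)
println(w)
```

The format specifiers follow C's `printf`, so `%e`, `%g`, padding, and width all work as expected.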



[julia-users] Re: constants and containers

2014-09-15 Thread gael . mcdon
But this is not the same thing. AFAIK, this is not possible yet with a type 
from Base. But I'm far from knowing everything.

I've seen discussions about providing a way to inherit default methods from 
a field of a composite type but I can't find them. One problem of the 
method is type preservation. But whatever, you could still do that and set 
the methods appropriately. This would be tedious but should work: all you 
have to do is to not set a setindex! method for your new type.

Or you may want instead to inherit from AbstractArray and define everything 
missing from there and, again, set setindex! to something else for your new 
type. I've never done it, so I don't know what it would take to make that 
work.
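A sketch of that AbstractArray route, in current `struct` syntax (the type name is mine, not from Base): define only the read path and simply never define `setindex!`, so writes throw.

```julia
# Hypothetical read-only wrapper: reads are forwarded to the inner
# vector; setindex! is deliberately never defined, so `r[i] = v` throws.
struct ReadOnlyVector{T} <: AbstractVector{T}
    data::Vector{T}
end

Base.size(r::ReadOnlyVector) = size(r.data)
Base.getindex(r::ReadOnlyVector, i::Int) = r.data[i]

r = ReadOnlyVector([1, 2, 3])
println(r[2])          # reading works
# r[2] = 10            # would throw: no setindex! for ReadOnlyVector
```

The `AbstractVector` fallbacks provide iteration, `length`, `sum`, etc. for free once `size` and `getindex` exist.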




While I would love to see immutability by default on variables which would 
avoid a whole class of bugs and inefficiencies for a modest cost (except 
maybe in the global scope for the REPL? or maybe that would solve the 
global scope efficiency problem?), I'm more sceptical about immutable 
arrays/dicts. Not because it is not sensible, but because if someone is 
tempted to change one of your array values, nothing can
prevent them from doing so (in Julia and in many other languages).

My take on this would be to go for the composite type and prefix the 
private things with "private" or "_" or whatever and add a comment stating 
that this should never be accessed directly.


Re: [julia-users] Re: "self" dot product of all columns in a matrix: Julia vs. Octave

2014-09-14 Thread gael . mcdon

>
> I wonder if we should provide access to DSFMT's random array generation, 
> so that one can use an array generator. The requirements are that one has 
> to generate at least 384 random numbers at a time or more, and the size of 
> the array must necessarily be even. 
>

> We should not allow this with the global seed, and it can be through a 
> randarray!() function. We can even avoid exporting this function by 
> default, since there are lots of conditions it needs, but it gives really 
> high performance.
>
Are the conditions needed limited to n>384 and n even?

Why not provide it by default then, with a single if statement to check 
for the n>384 condition? The n-even condition is not really a problem, as 
Julia does not allocate the exact amount of data needed. Even for a 
fixed-size array, adding 1 extra element (not user accessible) does not 
seem to be much of a drawback.

 

>
> -viral
>


[julia-users] Re: constants and containers

2014-09-14 Thread gael . mcdon
I may have missed something but wouldn't
immutable t
   x
   y
end

type u
x
y
end
work?

julia> myvar = t(1,2)
julia> myvar.x=5
ERROR: type t is immutable
julia> v = u(t(1,2), t(3,4))
u(t(1,2),t(3,4))
julia> v.x
t(1,2)
julia> v.x = t(5,6)
t(5,6)

julia> v.x.x = 42
ERROR: type t is immutable

If you really want to guarantee constant fields, you have to annotate them 
with an immutable type.


[julia-users] Re: Julia strange behaviour with GC

2014-09-14 Thread gael . mcdon
Interesting. Then I'll restate things and try to answer your comments 
correctly. Sorry, that will be a long post again :/ but I really need that 
to show that the CLT partly fails in this case.
 

*1- Restating things: the function run a single time *

I know this is not what you are interested in. But I hope I'll be clearer 
there.

Some functions can be used rarely but can take a substantial time in the 
whole calculation. From what I see, people just repeat the calculation *n* 
times, take the arithmetic mean and choose whichever gives the smallest 
number.

*Question:* which function of *f1* and *f2* is the fastest for a single 
calculation given that set of inputs on *that* computer?

*Method 1:* put @time in front of each of them and get timing results.
*Problem 1:* this is not the answer to the question above: 

   - Instead of trying to answer to: *which function would always be faster 
   on my computer with that set of input?*
   - he answered: *which function was faster that time?*

*Problem 2: *Chances are high that "always" is too strong. "Generally" 
could be used instead.


*Method 2:*

> if you perform an operation n times, it will take about nμ seconds. If you 
> want your code to run as fast as possible that's probably what matters.
>
So, if I repeat the measurement *n* times and divide the resulting 
execution time by *n*, I should get a good estimate of the expected 
execution time.

*Problem:* "g*enerally*" is a word so vague that it is not enough! What 
does it mean? Pick your poison:

   - Is *f1* faster than* f2* in more than *X*% of the cases? (hint: the 
   arithmetic mean tells us *nothing* about that)
   - What is the most probable execution time? (hint: this is usually *not* 
   the arithmetic mean, e.g. the geometric mean for the lognormal distribution)
   - What's the range including *X*% of the execution times for* f1*/*f2*? 
   (hint: its center is usually *not* the arithmetic mean)
   - etc.

So not everyone wants the same thing. Plus, if you calculate the arithmetic 
mean of some repetitions, you take into account effects that may slow down 
(GC, OS over time, etc.) or speed up (cache, buffers, etc.) the 
calculation and give a perfectly wrong estimate. The second thing is that 
the arithmetic mean is almost never what you want and is never enough on 
its own (at least provide the variance if you are sure the distribution is 
Gaussian). And as John emphasized, the mean or any other location estimator 
tells us nothing about the distribution spread.

*Method 3:* plot histograms/PDF and let other people pick their own poison.
*Problem: *not a single number, but a complete distribution.
*Advantages:* everything is included and distribution metrics can be 
calculated from that; the spread is known; normality can actually be 
checked; if you observe so, you can tell whether some function is *always* 
slower than another one, etc.
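A sketch of that approach in current Julia (the workload is a stand-in; `@elapsed` times one call): collect per-run timings and report several quantiles instead of a single mean, so the spread is visible.

```julia
using Statistics

work(n) = sum(rand(n))        # stand-in for the function under test

t = Float64[]
for _ in 1:500
    push!(t, @elapsed work(100_000))
end

# A point estimate hides the spread; a few quantiles describe it.
println("min    = ", minimum(t))
println("median = ", median(t))
println("mean   = ", mean(t))
println("p95    = ", quantile(t, 0.95))
```

From the same vector `t` one can also plot a histogram or an empirical CDF, which is exactly Method 3.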




*2- How the CLT may sometimes partly fail *

Ok, let's state it now: the CLT is working quite well, it's just that some 
of our assumptions about it aren't. :)

In the world of technical computing, we are mostly interested in running 
> the same computationally intensive functions over and over, and we usually 
> only care about how long that will take.
>
comes down to:
*Question:* which function of *f1* and *f2* is the best suited to put 
inside of a tight loop?

You appropriately and correctly invoked the CLT:

> the central limit theorem guarantees that, given a function that takes a 
> mean time μ to run, the amount of time it takes to run that function n 
> times is approximately normally distributed with mean nμ for sufficiently 
> large n regardless of the distribution of the individual function 
> evaluation times as long as it has finite variance.
>

But let's split that a little bit and check it :)

*2.1- The central limit theorem*

The CLT is all about sums. If you draw *n* samples from the distribution *D* 
and sum them, you will get one number, *N1*. If you draw *n* new samples 
from the same distribution *D* and sum them, you will get a 
new number, *N2*, etc. Let's say you performed that operation *m* times. 
Then you got *m* sums of *n* elements from distribution *D*: *N1...Nm*.

The central limit theorem states that the distribution of the numbers *N1...Nm* 
should converge to a normal distribution as *n* goes to infinity. It can 
be justified by convolution: the distribution of the *N1...Nm* corresponds 
to the self-convolution of the initial distribution *D*, *n* times. And it 
so happens (and has been proven) that any distribution with finite variance 
converges to a Gaussian as *n* tends to infinity.

Even more interesting, if you divide these sums *N1...Nm* by *n*, you get a 
mean whose variance is *always* smaller than the variance of the initial 
distribution *D*.

These two behaviors are the roots of the CLT.

*2.2- Wash, rinse, repeat, n times*

You are proposing to calculate *N1* only, *not* *N1

Re: [julia-users] Re: Julia strange behaviour with GC

2014-09-12 Thread gael . mcdon

>
> In general, it's important to account for uncertainty. This is the biggest 
> failing of Benchmark.jl. If I were to rewrite that package today, I would 
> place much more emphasis on reporting confidence intervals and I might not 
> even provide point estimates at all.
>

Amen 


[julia-users] Re: Julia strange behaviour with GC

2014-09-12 Thread gael . mcdon

>
> You're right that microbenchmarks often do not reflect real-world 
> performance, but in defense of using the sample mean for benchmarking, it's 
> a good estimator of the population mean (provided the distribution has 
> finite variance), and if you perform an operation n times, it will take 
> about nμ seconds. If you want your code to run as fast as possible that's 
> probably what matters.
>
I don't agree. In general, whole programs with I/O have approximately 
log-normal distributions (from my experience). In which case the geometric 
mean is a far better estimator of the location and the mode of the 
distribution.
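A quick sketch of the difference on synthetic log-normal "timings" (the parameters are arbitrary): the arithmetic mean is pulled up by the right tail, while the geometric mean recovers the distribution's median.

```julia
using Statistics

# Log-normal samples: exp of normal draws with mu = 1, sigma = 0.5.
t = exp.(1.0 .+ 0.5 .* randn(100_000))

am = mean(t)               # arithmetic mean, inflated by the right tail
gm = exp(mean(log.(t)))    # geometric mean ~ exp(mu), the median here

println("arithmetic = ", am)
println("geometric  = ", gm)
```

For a log-normal with parameters μ and σ, the arithmetic mean is exp(μ + σ²/2) while the geometric mean is exp(μ), so the gap grows with the spread σ.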

If you want your code to run as fast as possible, you should profile it. 

If you want to profile code snippets, to get the best local performance, 
then, and this is my whole point, you should at least plot the distributions.

Means can be deceptive. A quick example:


Imagine the experimental data are obtained from function `f1` with a mean 
of 550 s (asymmetric distribution). 
Imagine the normal data are obtained from function `f2` with a mean of 490 
s.

You would choose `f2` without any doubt. Now, Stefan, who has the same 
computer as you but invested in an SSD, found that you made a poor choice in 
choosing function `f2` because on his computer, function `f1` is normal and 
gives a mean of 460 s vs. 490 s for `f2`. Who's correct?

(We could prove that the tail was produced by inefficient IO)

 

> As long as you run an operation enough times, the mean will accurately 
> reflect garbage collection overhead. (That isn't just by chance.) 
>
What was by chance is the almost linear trend which I proved wrong by 
sampling values a while longer.



Re: [julia-users] Values vs. Bindings: making sense of things

2014-09-12 Thread gael . mcdon

>
> For the behavior of === you'll want to google "Henry Baker EGAL." 
> Essentially two values are === if and only if there is program that can 
> distinguish them. There's no way to distinguish two "instances" of 1 since 
> integers are immutable. I put instances in quotes because it's not even 
> really well-defined what that means for immutable values like 1. 
>

Wasn't it you who explained that
for i in 1
   print(i)
end
is correct Julia because single values must be considered as zero-dimensional arrays? :)

Actually,
a = 1
a[1]
is also valid.

But 
a = 1
a[1] = 2
is not: is `1` an array or not?

I guess this is consistent, in a sense ... But still surprising.

At least,
a = 1
b = 1
a === b
true
seems logical now.



Re: [julia-users] Comprehension Produces Any

2014-09-12 Thread gael . mcdon

>
> @time(begin
> for n = 1:10
> d1 = Float64[ a1[a,b,i,j] .* b1[a,i,j] .* c1[b,i,j] for a = 1:10, 
> b = 1:10, i=1:100,j=1:100 ]
> end
> end)
>

For the sake of completeness,
begin
...
end 
blocks are not local.

I thought let blocks would but it appears they don't.


[julia-users] Julia strange behaviour with GC

2014-09-12 Thread gael . mcdon
A few days ago, Ján Dolinský posted some microbenchmarks 
about Julia being slower than Octave for squaring quite large matrices. At 
that time, I wanted to answer with the traditional *microbenchmarks are 
evil*™ type of answer.

I wanted to emphasize that:

   1. The mean is hardly a good distribution location estimator in the 
   general (non-normal) case.
   2. Neglecting that, what he calculated was a typical execution time for 
   the squaring of *one and only one* random matrix filled with 7000x7000 
   random Float64 numbers with his computer.
   3. Assuming the 7000x7000 matrix size represents the typical matrix size 
   for his work, that means that Julia is likely *fast enough*: it's below 
   1 s, and the difference between Julia and Octave is not noticeable by 
   anything except a computer.


But from his later concerns, I guess this was not really what he had in 
mind (though I may have misinterpreted his words). I supposed that what he 
really wanted was to calculate *many* such squared matrices, so many of 
them that the observed difference in execution times would matter a lot.


What would be a not-too-bad benchmark adapted to this use case? I guess 
repeating the calculation is required. But instead of measuring time 
outside of the loop, it might be very interesting to get it inside of the 
loop: for each one of the squarings. These individual measurements, if not 
too short, would allow us:

   - to plot the complete time distribution instead of assuming normality 
   and providing only one distribution parameter;
   - to get the cumulative distribution to represent the case of multiple 
   consecutive matrix squarings;
   - to follow the execution time evolution through time or iterations.

So here is the snippet of code that I used:

t = Float64[]
for i = 1:2020
    X = rand(1000,1000)
    tic()
    XX = X.^2
    b = toq()
    push!(t, b)
end

The matrix generation is in-loop to avoid any smart caching.


*1- Distributions*

So here are the distributions (mind the Y axis is log):



Octave execution times are small and very close to each other. The 
behavior of the two Julia repetitions of the same calculation is much 
weirder: there is not just one peak, but 3 or 4. That's a strong hint 
*against* mean calculations. Furthermore, complete distributions contain 
much more information. I really think that calculating such estimations of 
the distributions is the bare minimum when benchmarking.

Ooops, I just said it: *estimations*. The methods used to estimate 
distributions usually need arbitrary parameters to be set (here, the 
Gaussian kernel and its bandwidth). So the peaks above are of almost 
arbitrary width. How do I know whether these peaks represent something or 
are just produced by a poor bandwidth choice? Repeating the whole process 
is one good way to test that. But here, I'll take another approach: 
plotting execution time vs. the iteration numbers.

*2- Execution times vs. iteration number*

So let's look at these execution times:



Ohoh! (Not so) Surprising! While Octave seems to behave as expected (random 
distribution around a single value), Julia exhibits a completely different 
profile: there are 4 different layers. Having done other repetitions, 
Julia's baseline seems to always be more or less the same but there are 
regular execution time bumps now and then. And if you look very carefully, 
you'll see that there is a bump around iteration #1550 in the "Julia 
(rep.)" case. More on that later.

*3- Periodogram*

So I just wanted to have a look at these bumps more carefully. Are they 
always occurring with a given frequency? A good way to answer this question 
is to plot the periodogram:



Julia's periodogram shows that the bumps mostly occur every 3 iterations. 
This kind of frequency dependency is not exhibited by Octave.

From there, I don't really know how to go a lot further: profiling may 
help, and since @time has the capability to distinguish between 
calculation- and GC-related execution times, I guess it wouldn't be 
so hard to look into this from a GC perspective. This behaviour can also be 
clearly seen in some parts of the Time(iteration #) graph:


How is that important? Things like periodicity can give very strong hints 
about what is actually happening. Raw results are not enough and this is 
one reason why most benchmarks are useless. The second reason is that there 
is a discrepancy between the

Re: [julia-users] Comprehension Produces Any

2014-09-11 Thread gael . mcdon
Wouldn't it be enough to put it in a local scope (a let block or a function)?

For more information, you can ask or look at the Performance tips part of the 
manual.
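A simplified sketch of the function route (2-D instead of the original 4-D arrays; names are illustrative): with untyped globals, a top-level comprehension can end up with element type `Any`, while inside a function the compiler sees concrete argument types.

```julia
function combine(a1, b1, c1)
    # Inside a function the compiler knows the argument types,
    # so the comprehension gets a concrete element type.
    return [a1[i, j] * b1[i, j] * c1[i, j] for i in 1:size(a1, 1), j in 1:size(a1, 2)]
end

a1, b1, c1 = rand(4, 5), rand(4, 5), rand(4, 5)
d1 = combine(a1, b1, c1)
println(eltype(d1))    # a concrete type, not Any
```

The same effect can be had by annotating the comprehension (`Float64[...]`), as in the quoted snippet, but the function version needs no annotation at all.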


[julia-users] Values vs. Bindings: making sense of things

2014-09-11 Thread gael . mcdon
> The binding of the argument to the function can never change, though the 
> values of that binding might.

It would be more correct to say that a method cannot change the binding of its 
arguments. You can change bindings, you just can't do it inside of a method 
because of scoping rules. It's just as you said in your second point.

> Is this how julia handles values and bindings, or it there more to this 
> picture?

You seem to have understood it correctly.


Your surprise may actually come from the fact that you read
`[1,2,3] == [1,2,3]`
while it was actually
`[1,2,3] === [1,2,3]`

The three equal signs represent a strong equality: the mathematical equality is 
not sufficient; the objects actually need to be the same, they should 
correspond to exactly the same data in memory, the same address.

This is precisely what you demonstrated.



I'm a bit more surprised by the first statement: 1 === 1. I guess Julia 
wouldn't allocate two different objects in memory for that; maybe LLVM just 
replaces it with "true" directly. But I find it surprising because this kind 
of equality is only meaningful if you are comparing variables, not values.

I'm also wondering if `==` is required by `===`: would two views of the same 
array be ===-equal? Even if the first one sees the data as Int64 while the 
second one sees it as Float64?

I guess I'll have to spend 20 seconds on that tomorrow.


Re: [julia-users] Re: What wrong , help ;0

2014-09-11 Thread gael . mcdon
BTW, if what you want to achieve is the concatenation of a few arrays, you may 
want to do it from the start instead of putting them in different variables 
first to concatenate them afterwards.


[julia-users] What wrong , help ;0

2014-09-11 Thread gael . mcdon
Why are you using metaprogramming stuff instead of a dict or an array?

It's not what creates the problem, but at least with a dict, it should be 
easier for you to keep everything perfectly clear in your head.

So instead of generating symbols on the fly to @eval them, create a dict with 
"p2" and the likes as keys.


Re: [julia-users] For Loop over set of vector names

2014-09-08 Thread gael . mcdon
By the way, if your "actions" on these variables are independant and in-place, 
the array method would have the extra advantage that it can be made parallel at 
no cost and that it whould let you access the updated values through the array 
but also through their original names.


Re: [julia-users] For Loop over set of vector names

2014-09-08 Thread gael . mcdon
Hi Alex,

As John said, a dict would be far better.

If you feel the need to access the bounding names through a template like this, 
then you'd better build a dict from the start or just before using your trick.

If your elements are well known, you could even directly use an array : if you 
need both the content and the name, you can 'zip' them or 'enumerate' them.

On the fly bounding name generation (even just reading its content) is 
considered a bad practice because these names should only be relevant to the 
programmer, not to the program : you are really trying to achieve a functional 
equivalent to a dict.
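For instance (illustrative names), pairing contents with their names via `zip` looks like this:

```julia
# Keep the vectors in one array; recover name/content pairs with zip.
labels = ["v1", "v2", "v3"]
vecs   = [rand(4) for _ in 1:3]

for (label, v) in zip(labels, vecs)
    println(label, " => ", sum(v))   # act on the content, use the name for display
end
```

`enumerate(vecs)` gives `(index, vector)` pairs instead, which is enough when positional names like "v1" are all you need.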


[julia-users] Why are images generated by IJulia in low resolution?

2014-09-07 Thread gael . mcdon
Hi, this is matplotlib-related (the python lib behind PyPlot). You have to 
change the figsize or the dpi either this way:

http://stackoverflow.com/questions/17230797/how-to-set-the-matplotlib-figure-default-size-in-ipython-notebook

Or that way:

http://stackoverflow.com/questions/332289/how-do-you-change-the-size-of-figures-drawn-with-matplotlib

I don't know if this is wrapped directly in PyPlot though. Someone may 
give a more accurate answer.
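If PyPlot.jl does expose it (I have not checked every version), matplotlib's keyword arguments pass straight through, so something along these lines should work — treat the exact calls as an assumption:

```julia
using PyPlot  # assumes the PyPlot.jl wrapper is installed

# Session-wide defaults through matplotlib's rc system:
rc("figure", figsize = (8, 5), dpi = 150)

# Or per figure, passing matplotlib keywords through:
figure(figsize = (8, 5), dpi = 150)
plot(rand(10))
```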

Just as a reminder though, if you want to reuse those figures, you might want a 
more suitable format such as SVG (vector graphics).

BTW, you can switch to vector graphics following the indications in this thread 
(you just have to change Julia's Jupyter profile instead of creating a new one):

http://stackoverflow.com/questions/17582137/ipython-notebook-svg-figures-by-default


Re: [julia-users] Re: Can this magic square function be further optimized?

2014-08-26 Thread gael . mcdon
>I'm a little surprised that you have found the performance implications 
>unconvincing in discussions where the OGs advocate devectorization.

A 10x speed-up on a 10 ms calculation is generally considered unconvincing. An 
unknown gain on unprofiled code in an undefined context is definitely 
unconvincing, especially if the vectorized form is considerably clearer. 

I'm not talking about this specific case for which I find explicit loops at 
least as clear. But I, too, sometimes find the devectorization advice not 
convincing. But fortunately, this is usually for small scripts, not for actual 
Julia libs.


Re: [julia-users] why sum(abs(A)) is very slow

2014-08-23 Thread gael . mcdon
(I was also thinking about element-wise operations)


Re: [julia-users] why sum(abs(A)) is very slow

2014-08-23 Thread gael . mcdon
>To do any of that justice, you end up with a language that looks basically 
>like Haskell. So why not just use Haskell?

Because I don't know anything about it (yet), except the name and the fact that 
you often associated it with lazy evaluation.

Because (#2), this could be a way to make sumabs and the likes obsolete in 
*Julia*. :)


Re: [julia-users] why sum(abs(A)) is very slow

2014-08-22 Thread gael . mcdon
I'm not familiar with lazy evaluation (I've not used any language implementing 
it). But I was wondering...

Why not have a 'calculate_now' function to let the programmer choose when a 
result is guaranteed to be calculated? Otherwise, resort to lazy 
representations.

There could be some heuristic also: if at least one of the original objects is 
freed by the GC, perform all the calculations depending on it.

That could also be simpler: defer actual calculations until the end of the 
current block.


[julia-users] Re: machinefile format

2014-08-20 Thread gael . mcdon
To be more precise...

SSH is only used to launch Julia on the worker nodes. Afterwards, Julia sets up 
a socket on both nodes and passes messages through that. If the sockets can't 
be bound, because of a restrictive firewall for instance, Julia would try for 
60 seconds and then time out.

At first, I was wondering if Julia would erroneously try to resolve any machine 
name it finds. But since you tried to add the hostname and the IP in 
/etc/hosts, without any improvement, that certainly means that the resolution 
is OK (not sure though: dig does not check /etc/hosts).

But whatever, by accessing the node, you can see if a socket has been set up by 
Julia.


[julia-users] Re: machinefile format

2014-08-20 Thread gael . mcdon
Oh! I misread what you wrote (I should be sleeping already).

I suggest you try to connect on the remote node to check whether Julia is 
started and trying to connect to the master node (with ps to spot julia and 
netstat or ss to spot the socket usage).

Maybe the worker can't connect back to the master because of firewall rules or 
something.


[julia-users] Re: machinefile format

2014-08-20 Thread gael . mcdon
I still can't check, but I'm wondering if Julia is always trying to resolve what 
it is given in a machinefile.

Did you try with "127.0.0.1" (assuming you have ssh set up on that machine)?


[julia-users] machinefile format

2014-08-20 Thread gael . mcdon
I can't test right now. Did you set up passwordless ssh?


Re: [julia-users] Proposal for inargs

2014-08-14 Thread gael . mcdon
>One nit: can you really support your assertion that C++ and Fortran are the 
>two major languages of scientific computing? In my world, Matlab is used by 
>about 20x (maybe 50x) as many people as C++. I suspect there's a major divide 
>depending on field. Science is a big place.

It's just as you said: it depends heavily on the field. In my experience, 
fields requiring heavy calculations (like CFD or ab initio computational 
chemistry) tend to use C++ and Fortran (actually mostly Fortran unless the 
project was started within the last 10 years). Unfortunately, I can't find a 
ready-to-read list for CFD, but for CC it was easier:

http://en.m.wikipedia.org/wiki/List_of_quantum_chemistry_and_solid-state_physics_software

The explanation is simple: with BLAS and Fortran and 100-1000 CPU cores, some 
calculations already require weeks to complete.


Re: [julia-users] How to divide DArray vertically and not horizontally?

2014-08-10 Thread gael . mcdon
Sorry, I obviously meant *column* major (as in Fortran, not C), which makes 
horizontal splitting sensible as the data is naturally contiguous in memory, 
while vertical splitting would require at least two extra copies of the data.


[julia-users] How to divide DArray vertically and not horizontally?

2014-08-10 Thread gael . mcdon
From my understanding, Julia being row major, it makes little sense to split 
your arrays vertically.

If your algorithm makes use of a complete row instead of a complete column, 
you'd better either implement an alternative row-major array object or (far far 
more simply), work with the transpose of your data (as a real in-memory 
transpose, not as a view).
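In current Julia, `transpose(A)` is a lazy view, so the "real in-memory transpose" mentioned above is made with `permutedims` (or `copy(transpose(A))`) — a small sketch:

```julia
A  = rand(3, 4)
At = permutedims(A)     # materialized copy: rows of A become columns of At

println(size(At))
# Each row of A is now contiguous in memory as a column of At,
# so row-wise algorithms on A become column-wise (fast) on At.
```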


Re: [julia-users] git SSL problems with METADATA.jl

2014-08-01 Thread gael . mcdon
I didn't exactly want to launch a discussion about security in general. I did 
not even intend to *actually* mean that your sysadmin just wants to look at what 
you are doing (that thing called irony ...).

Given more time, seeing your message, I would have come back and cleared your 
worries much in the way Ivar did. Exactly as he said: you are already allowed 
to download, compile and use random code grabbed from the internet. 
Untrusted download of nearly untrusted code isn't exactly a problem. Actually, 
my wild guess is that since the Heartbleed bug was found, GitHub changed its 
certificates and used a new CA (as the CAs themselves had to change theirs). But 
your system being frozen (the great curse of HPC), its CA database is outdated. 
So the signature you are getting is legit, but you are not checking it against 
the right, up-to-date CA public certificate.

But disregarding the recent events, it was much more thrilling to assume that 
the certificates are replaced on the fly by an evil admin who just wants to 
inject slight code modifications into what you downloaded to transform it into 
a bitcoin miner.

The ironic tone came because I find it unacceptable, from a sysadmin, to give 
such crappy, misleading and idiotic advice without at least a tiny piece of 
explanation about why, in this specific case, as a temporary workaround, this 
is not all that crappy, misleading and idiotic. (He might be a very nice guy 
doing a very good job usually, but that's irrelevant).

I'm jealous! Instead of being paid a misery to do science, I should have become 
a sysadmin, put up some idiotic firewall to "protect" the internal network and 
its users, MITMed all the HTTPS connections and taken advantage of the fools who 
dared to try paying online from within the firewall. I can imagine it: "no, 
there's absolutely no problem, you can safely disregard those warnings. If it's 
really a problem, try to use HTTP whenever you can. Your problem is solved? 
You're very welcome". Well, I made crappy career choices. Oh, irony, again...


More seriously, that was ironic but not a joke. This is a bad idea even if it 
can be considered sensible. Ask your sysadmin to update the CA certificate 
database. If it's not working, he has to help you determine why there is still 
a problem with SSL.

@Stephan: for downloads, as you said, there is checksumming to prevent MITM data 
injection. Finding a collision is already very hard; finding working, 
malicious code with a collision is nearly impossible. SSL makes it even more 
impossible.

I would add that even if the chain of trust is not complete, the Linux codebase 
as well as Julia's can be trusted with a high degree of confidence. If absolute 
confidence were required before bothering with SSL, then why not pay online 
using plain HTTP?


Re: [julia-users] git SSL problems with METADATA.jl

2014-08-01 Thread gael . mcdon
Sounds like a bad idea. If the SSL cert is not correct in your configuration 
(whereas it is in the outside world), it becomes clear that your admin just 
wants to know what you are downloading.

Basically, he told you "Please let us perform a MITM attack on your connection. 
To make our job easier, please deactivate all the SSL checks so that our cert 
(and maybe others) is accepted".

The extra downside is that now anyone can alter the data you are downloading 
and you won't have the slightest idea this is happening (if that happens).


Re: [julia-users] Re: build-in function to find inverse of a matrix

2014-07-16 Thread gael . mcdon
I agree. What I meant is that since inv is avoided, it's neither used nor well 
referenced in search engines or on this list. :)


[julia-users] Re: build-in function to find inverse of a matrix

2014-07-16 Thread gael . mcdon
I think that one of the main reasons it is so difficult to find is that inv 
should generally be avoided.

As Mauro wrote, if you can, rephrase your problem so that you can use x = A\b.


[julia-users] Re: efficient iterative updating for dataframe subsets

2014-07-10 Thread gael . mcdon
If you want to gain more performance, first identify hot spots by profiling 
your code. This is fairly easy in Julia: 
http://julia.readthedocs.org/en/latest/stdlib/profile/

Making your code less readable to gain a 0.01% speed increase in your whole 
program isn't worth the pain.

Once bottlenecks are identified, there are plenty of ways to get more speed. 
Explicit or implicit devectorization can be used 
(https://github.com/lindahua/Devectorize.jl), sometimes BLAS can be called 
directly with little or no modification in the program structure, etc. You also 
have to be careful about types and memory layout in these parts.

If you need even more speed you can still leverage the SIMD instructions of 
your CPU  and of course, multicore/multinode parallelism 
(http://julia.readthedocs.org/en/latest/manual/performance-tips/).

But as I said, before optimizing: finish your program so that you will still be 
able to understand it perfectly 6 months from now, profile it properly, and if 
it is too slow for you, optimize the real bottlenecks (not the fantasized ones).
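A minimal sketch of that workflow (the function below is a made-up hot spot; in 
0.3 the Profile module ships in Base):

```julia
# a made-up function standing in for the real workload
function work(n)
    s = 0.0
    for i in 1:n
        s += sin(i) * cos(i)
    end
    return s
end

work(1)              # run once first so compilation time is not profiled
@profile work(10^7)  # sample the call while it runs
Profile.print()      # lines with high sample counts are the real hot spots
```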


[julia-users] Re: GSOC 3D Visualizations plotting API - Make a wish!

2014-06-26 Thread gael . mcdon
Your progress is impressive! Thanks for your continuous work.


[julia-users] Re: SSH API in Julia

2014-06-20 Thread gael . mcdon

>
> Julia shells out for all ssh functionality.  

Oh, interesting. That would explain why I couldn't find any SSH.jl with 
libssh bindings nor libssh references in Julia's sources.
 

> supporting libgit in base would bring in libssh so maybe this should be 
> reconsidered in the future.

Good to know.

I'll also shell out then. It should still be less cumbersome than Python's 
Popen.
Thanks.
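For the record, shelling out from Julia is short; the host name and paths below 
are placeholders, and readall is the 0.3-era name for capturing output:

```julia
host = "user@cluster"   # placeholder

# run a remote command and capture its output
listing = readall(`ssh $host ls /scratch/results`)

# fetch a file back over scp
run(`scp $host:/scratch/results/run1.h5 .`)
```

Backtick commands are not passed through a shell, so the interpolated $host 
cannot be shell-injected, which makes this less fragile than Python's Popen.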
 


[julia-users] SSH API in Julia

2014-06-20 Thread gael . mcdon
Hi, I'm starting to replace some of my Python with Julia for actual work 
(post-processing actually). My files are on a remote cluster with an HTTP 
proxy between us. With Python, I have to Popen ssh/sftp because Paramiko 
has a bug with proxies in Python 3.

AFAIK, Julia is leveraging SSH for internode communications. So, I guess 
you either reimplemented the protocol or you used libssh... 

But either way, you've got a working implementation of SSH: is there a way 
to access it, to SFTP files or SSH into remote locations from within Julia (as 
a language)? (I don't care about the proxy yet)

Cheers,
Gaël


[julia-users] Re: parallel processing in chunks

2014-06-11 Thread gael . mcdon

>
>
> i = 1
>> @sync @parallel for i in  i:i+stride
>>   dowork(i)
>>  i += 1
>> end
>>
>
Also, don't assume this kind of trick would work in general (i = 
i:i+stride). Here this is working because you declared i in the global 
scope (before entering the loop). But if you didn't, you would get:
julia> for k in 1:3
   println(k)
   end
1
2
3

julia> k
ERROR: k not defined

By the way, I'm not sure that the global-scope i should actually be modified by 
the in-scope i. But changing that behavior might be tricky or not wanted. 


[julia-users] Re: parallel processing in chunks

2014-06-11 Thread gael . mcdon
Many things are not right.


> L = 1:1000
> stride = 5
>
> function dowork(i)
> if i < 1000:   ← we're not in Python: no colon
> sleep(5)
> else
> break   ← you are assuming your code will be inlined, but it won't
> end   ← an extra end is missing here 
>

That specific piece seems weird; I'm not sure it's legal, but at the very 
least, it's confusing: 

> i = 1
> @sync @parallel for i in  i:i+stride
>   dowork(i)
>  i += 1
> end
>

I would propose:
j=1
...
@parallel for i... j:j+stride
...
j += stride + 1
...
to make things clear again. I'm not sure you'll get what you want though.

If you want dowork to break the loop, you should have it return something, 
like a boolean.
if dowork(i) break end

I've not spent much time coding in parallel yet; however, I'm really not 
sure @sync behaves as you expect.
 
From what I've understood, @sync acts like a barrier: the @sync block will 
surrender the control to the main task once every @async jobs are run (but 
I may well be totally wrong). In your case, if the job for i = 1500 
finishes before i = 1499, your loop will be broken without dowork(1499) 
being calculated. These are the ugliest bugs one can encounter: the 
ones that occur at random.
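For what it's worth, a small sketch of the two behaviors, with dowork as a 
stand-in for the real work:

```julia
dowork(i) = i^2   # stand-in for the real work

# with a reduction operator, @parallel waits for and combines partial results
total = @parallel (+) for i in 1:1000
    dowork(i)
end

# without a reducer it returns immediately; @sync turns it into a barrier,
# but it cannot make the completion order deterministic
@sync @parallel for i in 1:1000
    dowork(i)
end
```

Here total is 333833500 regardless of which worker finishes first; letting 
@parallel split the range itself avoids the manual striding entirely.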


Re: [julia-users] Faster sub-strings

2014-06-11 Thread gael . mcdon

>
> We're planning a major revamp of string and I/O functionality in 0.4, so 
> this will all very likely change in the near future. Strings will be much 
> more lightweight and efficient, and SubStrings will be the same as Strings.
>
Are there plans to merge SubArrays with Arrays too?  


Re: [julia-users] Faster sub-strings

2014-06-11 Thread gael . mcdon
 Oh ok, I understand now, your ImmutableString are already SubString:
s= SubString(randstring(size), 1)
to make it work right now and hopefully what I posted previously in the 
future. :)


Re: [julia-users] Faster sub-strings

2014-06-11 Thread gael . mcdon

>
> to make it work right now and hopefully what I posted previously in the 
> future. :)
>
Oh my! Does that even make sense?! 

I meant to say:
 s= SubString(randstring(size), 1)
can be used right as what you described as your ImmutableStrings.

While, with very little work in the source code, you could write:
 s= SubString(randstring(size))  
and get the exact behavior you described.




Re: [julia-users] Faster sub-strings

2014-06-11 Thread gael . mcdon

>
>
> One thought, it might be nice to have something a little more automatic. 
> Specifically, I was thinking if there was an 'ImmutableString' class, you 
> would always know that you should make a view rather than a copy. It would 
> also be in keeping with the general design where specifying a type allows 
> the compiler to do nice optimizations.
>
 
I guess that won't be the case, to keep types stable: a slice of an 
ImmutableString should remain an ImmutableString.

However, SubString (and SubArray) could be changed so that the indices are 
made optional. You could write
s = SubString(randstring(size))
and handle it with views.



Re: [julia-users] Faster sub-strings

2014-06-11 Thread gael . mcdon

>
> Hi Abe,
>
> Looks like you just reimplemented SubString.
>
> julia> x = "Hi there!"
> "Hi there!"
>
> julia> SubString(x, 2, 4)
> "i t"
>
> julia> typeof(ans)
> SubString{ASCIIString} (constructor with 1 method)
>
>
> Which is totally understandable, as there seems to be almost zero 
> documentation about them.  Would you mind opening an issue about that?
>

Hi both of you.

Wouldn't it be logical also to have "sub" handle Strings as it does for 
arrays?

julia> sub([1,2,3,4,5,6], 1:2)
2-element SubArray{Int64,1,Array{Int64,1},(UnitRange{Int64},)}:
 1
 2

and maybe call it "view" instead of or in addition to "sub"?



Re: [julia-users] Reading .tar(.gz) archives in Julia

2014-06-11 Thread gael . mcdon
Just for you to know, there are HDF5 bindings in almost every language I know, 
C++ included.

Don't take what I said as a suggestion though: it's perfectly fine as you did 
it. But others might read this thread and say "HDF5? why not, indeed".


[julia-users] Re: modules, include and export

2014-06-10 Thread gael . mcdon
And as we've already discussed, "using" statements are ugly and, in my opinion, 
only make sense in the REPL. But well, that's my opinion and I'm not a language 
designer.

Just try to "using" a module redefining "sin": you won't try twice.

I'm also waiting for the first big Julia projects and users begging "would it 
be possible to provide developer documentation on module Foo.jl? It's so big 
and intricate that I can't figure out what is used and when."

Wouldn't it make sense to allow only imports of exported functions? What's the 
rationale of the consenting-adults approach? Wouldn't it be possible to do 
things à la Python with the double underscore: mymod.__foo would not be directly 
accessible, one would have to go through a mangled name such as 
mymod._mymod__foo. Exported objects could be accessible as they are, and 
non-exported objects could be accessible through mymod.NotExported, for instance.
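A small example of the difference (module and function names are made up):

```julia
module Foo
export bar
bar() = "exported"
secret() = "not exported"
end

using Foo        # silently pulls in every exported name
bar()            # works: bar was exported

import Foo.bar   # the explicit alternative: only bar enters the namespace
Foo.secret()     # and yet non-exported functions remain reachable anyway
```

So today the export list is only a politeness convention, not an access barrier.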


Re: [julia-users] Reading .tar(.gz) archives in Julia

2014-06-10 Thread gael . mcdon
What would *also* be pretty useful is allowing different compression filters in 
HDF5.jl.

HDF5 compression capabilities are not limited to Deflate. Blosc, for one, has 
allowed fast and efficient compression/decompression of data in my case.

Your program would have to be changed to save data in an HDF5 file, but you'll 
get ASCII -> binary "compression" for free, a file format pretty good for 
random access and compression filters that allow even further size reduction.


[julia-users] Re: modules, include and export

2014-06-09 Thread gael . mcdon
Which is asking for troubles...

I always thought that non exported objects could not be imported. 


Re: [julia-users] negative power throws error

2014-06-09 Thread gael . mcdon
I'm just wondering: is it that hard to put a floating point somewhere in the 
expression to get all inputs promoted to floating points?


Re: [julia-users] packages with executable scripts

2014-06-09 Thread gael . mcdon
Being a Linux user, I find this behavior weird, which explains why I'm mainly 
using packages distributed by my distribution of choice.

Python is not really a reference here since Python folks themselves still 
consider distribution an open problem. Two of the main concerns regarding 
install-time scripts are security and consistency (and all the bugs that may 
arise).

Again from a Linux centric point of view, the install script should install the 
executable in a valid path which requires root to be done consistently (or to 
be done at all system-wide).

Julia could add ~/.julia/bin to PATH at install time and install scripts could 
use that. But the day I want ~/.julia in ~/.local/julia, nothing would work 
anymore. The day a package wants to install something in ~/.julia on a Windows 
computer nothing would work anymore.

Other "new" languages usually just consider that only libs should be 
installable through their package management system.


Re: [julia-users] How to create a copy of a function but with less arguments?

2014-06-04 Thread gael . mcdon
*Arrays*

function f1(a, b, c)
return (a.^2)*sum(a).*c.+exp(-b)
end

function f2(a, b, c=3)
return (a.^2).*sum(a).*c.+exp(-b)
end

f3(a, b) = f1(a, b, 3)

f4 = (a, b) -> f1(a, b, 3)

a = rand(1000);
b = rand(1000);

julia> @time for i=1:1 f1(a, b, 3) end
elapsed time: 0.813792743 seconds (48944 bytes allocated)

julia> @time for i=1:1 f2(a, b) end
elapsed time: 0.81631136 seconds (48944 bytes allocated)

julia> @time for i=1:1 f3(a, b) end
elapsed time: 0.794549163 seconds (48944 bytes allocated)

julia> @time for i=1:1 f4(a, b) end
elapsed time: 0.817011379 seconds (489375324 bytes allocated)

The difference is not significant (Kolmogorov-Smirnov test on individual 
values, p-value=0.4)


*Scalars*

julia> @time for i=1:1 f1(1, 2, 3) end
elapsed time: 0.000512844 seconds (0 bytes allocated)

julia> @time for i=1:1 f2(1, 2) end
elapsed time: 0.00053715 seconds (0 bytes allocated)

julia> @time for i=1:1 f3(1, 2) end
elapsed time: 0.000518782 seconds (0 bytes allocated)

julia> @time for i=1:1 f4(1, 2) end
elapsed time: 0.00165468 seconds (16 bytes allocated)

No significant difference in the first 3. The lambda function is slower 
though.

So Tamas, it seems like it is a fact at least in the scalar case.


*Extra fun*

function f5(c)
return (a,b) -> (a.^2).*sum(a).*c.+exp(-b)
end

f6 = f5(3)

function f7(c)
function f8(a,b)
return (a.^2).*sum(a).*c.+exp(-b)
end
return f8
end

f8 = f7(3)

julia> @time for i=1:1 f2(a, b) end
elapsed time: 0.790140152 seconds (48944 bytes allocated)

julia> @time for i=1:1 f6(a, b) end
elapsed time: 0.905023068 seconds (48944 bytes allocated)

julia> @time for i=1:1 f8(a, b) end
elapsed time: 0.84605453 seconds (48928 bytes allocated)

and

julia> @time for i=1:1 f2(1, 2) end
elapsed time: 0.000512635 seconds (0 bytes allocated)

julia> @time for i=1:1 f6(1, 2) end
elapsed time: 0.00284275 seconds (32 bytes allocated)

julia> @time for i=1:1 f8(1, 2) end
elapsed time: 0.000820845 seconds (16 bytes allocated)

This time, all the elapsed times seem statistically different, even with 
arrays (same test, p-value < 0.01).

Cheers


Re: [julia-users] Accuracy of timing macros

2014-06-03 Thread gael . mcdon
In recent tests, the best distribution estimator was a geometric mean because 
the elapsed-time distribution was log-normal.

If each iteration is not too short (to avoid timing inaccuracies) and not too 
long (to avoid spending a day benchmarking), you'd better get each elapsed-time 
individually and perform proper statistics on them.

What you most certainly want is the mode of the distribution of these values.
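A sketch of what I mean, with a made-up function under test:

```julia
f(x) = sum(x.^2)   # made-up function to benchmark
a = rand(1000)
f(a)               # warm up: compile before timing

times = Float64[]
for k in 1:200
    t0 = time_ns()
    f(a)
    push!(times, (time_ns() - t0) / 1e9)   # seconds per call
end

# geometric mean: the natural location estimator for log-normal samples
geo = exp(sum(map(log, times)) / length(times))
```

From there, a histogram of log(times) makes the mode easy to read off as well.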


Re: [julia-users] Type inheritance in Julia

2014-05-30 Thread gael . mcdon
But I removed the first one within 5 minutes! I'm sorry about that, and even 
more because the post is long.


Re: [julia-users] Inherit behaviour of concrete types in Julia

2014-05-30 Thread gael . mcdon
Le vendredi 30 mai 2014 15:46:55 UTC+2, Tim Holy a écrit :
>
> Are you sure you don't have a type problem somewhere? 
>

Nope, same timings as yours. Your computer and mine should be quite 
similar.

But try this:

import SIUnits: Meter
raw = rand(5000);
units = raw*Meter;
sum(raw);
sum(units);

@time (for i = 1:1000; sum(raw); end)
@time (for i = 1:1000; sum(units); end)

julia> @time (for i = 1:1000; sum(raw); end)
elapsed time: 0.003861036 seconds (16000 bytes allocated)

julia> @time (for i = 1:1000; sum(units); end)
elapsed time: 0.308518733 seconds (79728000 bytes allocated)

There's obviously an allocation problem that might actually explain it all.




Re: [julia-users] Inherit behaviour of concrete types in Julia

2014-05-30 Thread gael . mcdon
It is behaving very well and is far smarter than what I was thinking of: 
this is really impressive!

The only drawback I see is that, for now, units are 100 times slower than 
raw calculations on floats and 15 times slower than calculations on a plain 
5000-element Array. I really hope this will improve over time!

Le vendredi 30 mai 2014 13:13:22 UTC+2, gael@gmail.com a écrit :
>
> Both, I reposted something clearer. I was not aware of these efforts, I'm 
> gonna look at that. Thanks.
>
> Le vendredi 30 mai 2014 12:57:26 UTC+2, Tim Holy a écrit :
>>
>> I'm not sure if you're interested in this as an example of inheritance, 
>> or if 
>> you're interested in this specific application. If the latter, are you 
>> aware of 
>> existing efforts in this arena? You may want to check out SIUnits.jl. 
>>
>> --Tim 
>>
>> On Friday, May 30, 2014 03:37:47 AM gael@gmail.com wrote: 
>> > Hi everyone. 
>> > 
>> > I was thinking for some time about dimensionful arrays or variables in 
>> > general. (I really *really* *really* don't know whether the idea is 
>> sane. 
>> > Actually it tends to create more problem than it solves for what I've 
>> > tried.) 
>> > Dimensionful variables would be a variable which refer to data and a 
>> > physical unit. This can easily be implemented either as: 
>> > 
>> > type DimensionfulArray{T<:Number} 
>> > data::AbstractArray{T} 
>> > unit::String 
>> > end 
>> > 
>> > There are, however, many problems raised by this approach. For 
>> instance: 
>> > 
>> >1. DimensionfulArray does not inherit from AbstractArray so that 
>> every 
>> >functions should be redefined as Foo(x::DimensionfulArray) = 
>> Foo(x.data). 
>> > 2. The approach used in point 1. has the huge default to involve a lot 
>> of 
>> > boilerplate/repeated code and a new definition has to be written each 
>> time 
>> > a new function is used on DimensionfulArray. 
>> >3. Most of the time, one does want type invariance: 
>> > Foo(x::DimensionfulArray) = Foo(x.data) would return a plain array 
>> > instead of a DimensionfulArray instance (wanted most of the time). 
>> >4. Even if we are consenting adults there, letting the "unit" 
>> attribute 
>> >directly accessible is not a really good practice. 
>> >5. With this approach a lot of "If"s will be put everywhere to check 
>> >what the "unit" attribute is. 
>> > 
>> > 
>> http://grollchristian.wordpress.com/2014/01/22/julia-inheriting-behavior/ 
>> > had the exact same problem and I'm not sure his solution is the most 
>> > maintainable one. 
>> > https://groups.google.com/forum/#!topic/julia-users/jlJrMTo_L1M was 
>> about 
>> > subtyping concrete types but Stefan redirected to 
>> > 
>> https://groups.google.com/forum/#!msg/julia-dev/p9bV6rMSmvY/cExeSb-cMBYJ 
>> > which is about inheritance in general and delegation in particular. 
>> > 
>> > In https://groups.google.com/forum/#!topic/julia-users/Wwn3KHmmm9I, 
>> John 
>> > crafted a first implementation of what would become the @delegate macro 
>> > which is now in a RFC process in 
>> > https://github.com/JuliaLang/julia/pull/3292. 
>> > 
>> > The 
>> > comment 
>> https://github.com/JuliaLang/julia/pull/3292#issuecomment-34450100 
>> > in particular was of a great interest for me as it would allow almost 
>> > painless delegation: the default is to delegate except if methods are 
>> > explicitly declared with DimensionfulArray as an input type. 
>> > 
>> > I would really like to be able to write: 
>> > 
>> > type DimensionfulAndRankfulArray{T<:Number} 
>> > @delegate data::AbstractArray{T} 
>> > @delegate rank::Int for <, >, ==, !=, <=, >= 
>> > unit::String 
>> > end 
>> > 
>> > But as stated 
>> > in https://github.com/JuliaLang/julia/pull/3292#issuecomment-34551204, 
>> a 
>> > macro is limited in what it can do. To me, but I have not your skills 
>> and 
>> > your experience, there are at least 3 ways to solve the problem: 
>> > 
>> >- Either let it as it is now. The down side is that any method 
>> >declaration occurring after the type declaration and acting on 
>> >AbstractArray won't be taken into account even though the developer 
>> wrote 
>> > that he was delegating by default to the data field. This makes the 
>> code 
>> > harder to grasp, raises strange errors and forbid any code sharing : 
>> from 
>> > the type declaration point, methods will have either to be defined for 
>> > AbstractArray OR DimensionfulAndRankfulArray, but not both of them at 
>> once. 
>> > - Either include delegate in the language and perform delegation at 
>> > runtime, not at type declaration time which can be heavy and complex. - 
>> Or 
>> > tell the compiler at type declaration time that you are sure that any 
>> > function accepting type AbstractArray is gonna work with your 
>> RankfulArray 
>> > because you delegated (trait-like approach with delegation :) ): 
>> > 
>> > 
>> > type RankfulArray{T<:Number} <: AbstractArray 
>> > delegate data::AbstractArray{T} 
>> >

Re: [julia-users] Inherit behaviour of concrete types in Julia

2014-05-30 Thread gael . mcdon
Both, I reposted something clearer. I was not aware of these efforts, I'm 
gonna look at that. Thanks.

Le vendredi 30 mai 2014 12:57:26 UTC+2, Tim Holy a écrit :
>
> I'm not sure if you're interested in this as an example of inheritance, or 
> if 
> you're interested in this specific application. If the latter, are you 
> aware of 
> existing efforts in this arena? You may want to check out SIUnits.jl. 
>
> --Tim 
>
> On Friday, May 30, 2014 03:37:47 AM gael@gmail.com  
> wrote: 
> > Hi everyone. 
> > 
> > I was thinking for some time about dimensionful arrays or variables in 
> > general. (I really *really* *really* don't know whether the idea is 
> sane. 
> > Actually it tends to create more problem than it solves for what I've 
> > tried.) 
> > Dimensionful variables would be a variable which refer to data and a 
> > physical unit. This can easily be implemented either as: 
> > 
> > type DimensionfulArray{T<:Number} 
> > data::AbstractArray{T} 
> > unit::String 
> > end 
> > 
> > There are, however, many problems raised by this approach. For instance: 
> > 
> >1. DimensionfulArray does not inherit from AbstractArray so that 
> every 
> >functions should be redefined as Foo(x::DimensionfulArray) = 
> Foo(x.data). 
> > 2. The approach used in point 1. has the huge default to involve a lot 
> of 
> > boilerplate/repeated code and a new definition has to be written each 
> time 
> > a new function is used on DimensionfulArray. 
> >3. Most of the time, one does want type invariance: 
> > Foo(x::DimensionfulArray) = Foo(x.data) would return a plain array 
> > instead of a DimensionfulArray instance (wanted most of the time). 
> >4. Even if we are consenting adults there, letting the "unit" 
> attribute 
> >directly accessible is not a really good practice. 
> >5. With this approach a lot of "If"s will be put everywhere to check 
> >what the "unit" attribute is. 
> > 
> > 
> http://grollchristian.wordpress.com/2014/01/22/julia-inheriting-behavior/ 
> > had the exact same problem and I'm not sure his solution is the most 
> > maintainable one. 
> > https://groups.google.com/forum/#!topic/julia-users/jlJrMTo_L1M was 
> about 
> > subtyping concrete types but Stefan redirected to 
> > https://groups.google.com/forum/#!msg/julia-dev/p9bV6rMSmvY/cExeSb-cMBYJ 
> > which is about inheritance in general and delegation in particular. 
> > 
> > In https://groups.google.com/forum/#!topic/julia-users/Wwn3KHmmm9I, 
> John 
> > crafted a first implementation of what would become the @delegate macro 
> > which is now in a RFC process in 
> > https://github.com/JuliaLang/julia/pull/3292. 
> > 
> > The 
> > comment 
> https://github.com/JuliaLang/julia/pull/3292#issuecomment-34450100 
> > in particular was of a great interest for me as it would allow almost 
> > painless delegation: the default is to delegate except if methods are 
> > explicitly declared with DimensionfulArray as an input type. 
> > 
> > I would really like to be able to write: 
> > 
> > type DimensionfulAndRankfulArray{T<:Number} 
> > @delegate data::AbstractArray{T} 
> > @delegate rank::Int for <, >, ==, !=, <=, >= 
> > unit::String 
> > end 
> > 
> > But as stated 
> > in https://github.com/JuliaLang/julia/pull/3292#issuecomment-34551204, 
> a 
> > macro is limited in what it can do. To me, but I have not your skills 
> and 
> > your experience, there are at least 3 ways to solve the problem: 
> > 
> >- Either let it as it is now. The down side is that any method 
> >declaration occurring after the type declaration and acting on 
> >AbstractArray won't be taken into account even though the developer 
> wrote 
> > that he was delegating by default to the data field. This makes the code 
> > harder to grasp, raises strange errors and forbid any code sharing : 
> from 
> > the type declaration point, methods will have either to be defined for 
> > AbstractArray OR DimensionfulAndRankfulArray, but not both of them at 
> once. 
> > - Either include delegate in the language and perform delegation at 
> > runtime, not at type declaration time which can be heavy and complex. - 
> Or 
> > tell the compiler at type declaration time that you are sure that any 
> > function accepting type AbstractArray is gonna work with your 
> RankfulArray 
> > because you delegated (trait-like approach with delegation :) ): 
> > 
> > 
> > type RankfulArray{T<:Number} <: AbstractArray 
> > delegate data::AbstractArray{T} 
> > delegate rank::Int for <, >, ==, !=, <=, >= 
> > end 
> > 
> > This could be further simplified as: 
> > 
> > type RankfulArray{T<:Number} 
> > delegate data::AbstractArray{T} 
> > rank::Int 
> > end 
> > if there is one and only one delegation in the type definition. 
> > 
> > This would look a lot like concrete type inheritance though with all the 
> > drawbacks that I don't know yet. 
> > 
> > As a last word, being able to preserve types in some functions would be 
> > great

[julia-users] Type inheritance in Julia

2014-05-30 Thread gael . mcdon
(Repost, the first one was not clear)

Hi everyone.

I was thinking for some time about dimensionful arrays or variables in 
general. (I really *really* *really* don't know whether the idea is sane. 
Actually it tends to create more problems than it solves for what I've 
tried.)

Dimensionful variables would be a variable which refer to data and a 
physical unit. This can easily be implemented using composite types:
type DimensionfulArray{T<:Number}
data::AbstractArray{T}
unit::String
end

This approach is not great:

   1. DimensionfulArray does not inherit from AbstractArray, so every 
   function has to be redefined as Foo(x::DimensionfulArray) = Foo(x.data).
   2. Foo(x::DimensionfulArray) = Foo(x.data) would return a plain array 
   while type stability is often wanted.
   3. Letting the "unit" attribute directly accessible is not a really good 
   practice. But to my knowledge, nothing can be done about that.
   4. With this approach a lot of *if*s have to be used in functions to 
   test for unit content.

As an alternative, there is the plain inheritance approach that would be:
type DimensionfulArray{T<:Number} <: AbstractArray{T}
but is not working as wanted because, well, behavior can only be inherited 
from abstract types, not from concrete types. Otherwise it would be great, as 
this would let me declare one subtype <: AbstractArray per unit: the "unit" 
field would be avoided altogether, along with the need for *if*s, as one 
subtype would correspond to one unit.

http://grollchristian.wordpress.com/2014/01/22/julia-inheriting-behavior/ had 
the exact same problem for DataFrames and solved this by using composite 
types and boilerplate code.

On the other hand, the discussion 
https://groups.google.com/forum/#!topic/julia-users/jlJrMTo_L1M was about 
subtyping concrete types but Stefan redirected to 
https://groups.google.com/forum/#!msg/julia-dev/p9bV6rMSmvY/cExeSb-cMBYJ which 
is about inheritance in general and delegation in particular as a solution 
to this problem. I liked the idea, but it's just a pain to implement.

Then I came to know 
https://groups.google.com/forum/#!topic/julia-users/Wwn3KHmmm9I, where John 
crafted a first implementation of what would become the @delegate macro 
which is now in a RFC process in
https://github.com/JuliaLang/julia/pull/3292. This is a great solution. The 
comment https://github.com/JuliaLang/julia/pull/3292#issuecomment-34450100 in 
particular was of a great interest for me as it would allow automated 
delegation.
With a little bit of extra work and imagination, I might be able to write a 
brand new composite type like:

type DimensionfulAndRankfulArray{T<:Number}
@delegate data::AbstractArray{T}
@delegate rank::Int for <, >, ==, !=, <=, >=
unit::String
end

But as stated in 
https://github.com/JuliaLang/julia/pull/3292#issuecomment-34551204, the 
macro system is limited in what it can do: for instance, methods declared 
after the type declaration would not work with the @delegate-built type, as 
the macro has no way to know that a new usable method is 
declared afterwards. With my limited knowledge of the problem, I see at 
least 3 ways to solve the problem:

   - Either leave it as it is now, with its inconsistencies.
   - Either include delegate in the language and perform delegation at type 
   declaration time and at method declaration time (as a compilation hook).
   - Or *tell* the compiler at type declaration time that, in our case, any 
   method written for AbstractArray is gonna work with your RankfulArray 
   because of delegation (trait-like approach with delegation :))

type RankfulArray{T<:Number} <: AbstractArray
delegate data::AbstractArray{T}
delegate rank::Int for <, >, ==, !=, <=, >=
end

This could be further simplified as:

type RankfulArray{T<:Number}
delegate data::AbstractArray{T}
rank::Int 
end
if there is one and only one delegation in the type definition: in 
practice, because of the delegation, there is no obvious way to distinguish 
AbstractArray behavior from RankfulArray behavior.

This would look a lot like concrete type inheritance though.





As an extra and almost unrelated enhancement, being able to declare type 
preservation for some methods would be awesome, something like:
type in_eV{T<:Number}
delegate data::AbstractArray{T} preserve +, -
end
which would mean that in_eV([1,2,3]).^2 would return Array([1,4,9]) but 
in_eV([1,2,3]) + 2 would return in_eV([3,4,5]).

This could be extended to preserve the type for any method:
type in_eV{T<:Number}
delegate data::AbstractArray{T} preserve
end


[julia-users] Inherit behaviour of concrete types in Julia

2014-05-30 Thread gael . mcdon
Hi everyone.

I was thinking for some time about dimensionful arrays or variables in 
general. (I really *really* *really* don't know whether the idea is sane. 
Actually it tends to create more problems than it solves for what I've 
tried.)
Dimensionful variables would be a variable which refer to data and a 
physical unit. This can easily be implemented either as:

type DimensionfulArray{T<:Number}
data::AbstractArray{T}
unit::String
end

There are, however, many problems raised by this approach. For instance:

   1. DimensionfulArray does not inherit from AbstractArray, so every 
   function has to be redefined as Foo(x::DimensionfulArray) = Foo(x.data).
   2. The approach used in point 1 has the huge drawback of involving a lot 
   of boilerplate/repeated code: a new definition has to be written each 
   time a new function is used on DimensionfulArray.
   3. Most of the time, one does want type invariance: 
Foo(x::DimensionfulArray) = Foo(x.data) would return a plain array instead 
   of a DimensionfulArray instance (wanted most of the time).
   4. Even if we are consenting adults there, letting the "unit" attribute 
   directly accessible is not a really good practice.
   5. With this approach a lot of "If"s will be put everywhere to check 
   what the "unit" attribute is.

http://grollchristian.wordpress.com/2014/01/22/julia-inheriting-behavior/ 
had the exact same problem and I'm not sure his solution is the most 
maintainable one.
https://groups.google.com/forum/#!topic/julia-users/jlJrMTo_L1M was about 
subtyping concrete types but Stefan redirected to 
https://groups.google.com/forum/#!msg/julia-dev/p9bV6rMSmvY/cExeSb-cMBYJ 
which is about inheritance in general and delegation in particular.

In https://groups.google.com/forum/#!topic/julia-users/Wwn3KHmmm9I, John 
crafted a first implementation of what would become the @delegate macro 
which is now in a RFC process in
https://github.com/JuliaLang/julia/pull/3292.

The 
comment https://github.com/JuliaLang/julia/pull/3292#issuecomment-34450100 
in particular was of a great interest for me as it would allow almost 
painless delegation: the default is to delegate except if methods are 
explicitly declared with DimensionfulArray as an input type. 

I would really like to be able to write:

type DimensionfulAndRankfulArray{T<:Number}
@delegate data::AbstractArray{T}
@delegate rank::Int for <, >, ==, !=, <=, >=
unit::String
end

But as stated 
in https://github.com/JuliaLang/julia/pull/3292#issuecomment-34551204, a 
macro is limited in what it can do. To me, but I have not your skills and 
your experience, there are at least 3 ways to solve the problem:

   - Either leave it as it is now. The downside is that any method 
   declaration occurring after the type declaration and acting on 
   AbstractArray won't be taken into account, even though the developer wrote 
   that he was delegating by default to the data field. This makes the code 
   harder to grasp, raises strange errors and forbids any code sharing: from 
   the type declaration point, methods will have to be defined for either 
   AbstractArray OR DimensionfulAndRankfulArray, but not both of them at once.
   - Either include delegate in the language and perform delegation at 
   runtime, not at type declaration time, which can be heavy and complex.
   - Or  tell the compiler at type declaration time that you are sure that 
   any function accepting type AbstractArray is gonna work with your 
   RankfulArray because you delegated (trait-like approach with delegation :) 
   ):
   

type RankfulArray{T<:Number} <: AbstractArray
delegate data::AbstractArray{T}
delegate rank::Int for <, >, ==, !=, <=, >=
end

This could be further simplified as:

type RankfulArray{T<:Number}
delegate data::AbstractArray{T}
rank::Int 
end
if there is one and only one delegation in the type definition.

This would look a lot like concrete type inheritance though with all the 
drawbacks that I don't know yet.

As a last word, being able to preserve types in some functions would be 
great, something like:

type in_eV{T<:Number}
delegate data::AbstractArray{T} keep +, -
end
which would mean that in_eV([1,2,3]).^2 would return Array([1,4,9]) 
but in_eV([1,2,3]) + 2 would return in_eV([3,4,5]).

type in_eV{T<:Number}
delegate data::AbstractArray{T} keep
end
would keep the type for everything.


I'm really not sure that this does make sense at all.
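
For what it's worth, here is roughly what such a `delegate` declaration would 
have to expand to today, written by hand (a sketch in 2014-era syntax; the 
forwarded methods are only illustrative):

```julia
import Base: getindex, setindex!, length, isless, ==

type RankfulArray{T<:Number}
    data::Array{T,1}   # container behaviour delegated to this field
    rank::Int          # comparisons delegated to this field
end

# delegation to `data`, one forwarding method at a time
getindex(a::RankfulArray, i)     = a.data[i]
setindex!(a::RankfulArray, v, i) = (a.data[i] = v)
length(a::RankfulArray)          = length(a.data)

# delegation of < and == (and, via isless, sorting) to `rank`
isless(a::RankfulArray, b::RankfulArray) = isless(a.rank, b.rank)
==(a::RankfulArray, b::RankfulArray)     = a.rank == b.rank
```

The proposed keyword would simply generate this boilerplate, which is exactly 
what a macro struggles to do for methods defined after the type declaration.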


[julia-users] Re: Easy way to declare function works on array of floats

2014-05-28 Thread gael . mcdon
Shouldn't it behave the opposite way?

Array{<:FloatingPoint}

representing an array composed of elements which are subtypes of 
FloatingPoint and

Array{FloatingPoint}

representing an array composed of elements of the exact same type which is 
a subtype of FloatingPoint?

Before implementing this great idea, a "<:Type" definition should be found, 
as JeffBezanson stressed on GitHub.

Le mardi 27 mai 2014 09:16:02 UTC+2, Toivo Henningsson a écrit :
>
> Great to hear such positive response :)
> I've created an issue for the feature request: 
> https://github.com/JuliaLang/julia/issues/6984
>
> On Monday, 26 May 2014 22:25:37 UTC+2, Adam Smith wrote:
>>
>> +1 to Toivo's idea. I LOVE that suggestion. Combine that with this, and 
>> function declarations can fit on one line again:
>> typealias Float FloatingPoint
>>
>>
>>
>> On Monday, May 26, 2014 3:33:38 PM UTC-4, Toivo Henningsson wrote:
>>>
>>> Or you can use
>>>
>>> typealias FPArray{T<:FloatingPoint} Array{T}
>>>
>>> foo(a::FPArray, b::FPArray) = a+b
>>>
>>> to get the same effect (foo will still apply when the element types of a and 
>>> b are different).
>>>
>>> Perhaps we could introduce a syntax to create such a covariant typealias 
>>> on the fly, e.g.
>>>
>>> const FPArray2 = Array{<:FloatingPoint}
>>>
>>> would work the same as FPArray above (though with an anonymous/hidden 
>>> type parameter).
>>> Then the example could be written
>>>
>>> foo(a::Array{<:FloatingPoint}, b::Array{<:FloatingPoint}) = a+b
>>>
>>> if you don't want to define the typealias first.
>>>
>>> On Sunday, 25 May 2014 17:44:26 UTC+2, Pierre-Yves Gérardy wrote:

 On Sunday, May 25, 2014 5:10:49 PM UTC+2, James Crist wrote:
>
> Yeah, that's what I've been using. My issue with it is that the 
> declarations get long for functions with more than 2 arrays. Was hoping 
> there was a more concise way.
>

 You can use  typealias Fp FloatingPoint , then

 function foo{T1<:Fp, T2<:Fp}(a::Array{T1}, b::Array{T2}) 



[julia-users] Re: Easy way to declare function works on array of floats

2014-05-26 Thread gael . mcdon
Are there some real-life examples out there showing such a use? If so, couldn't 
they be replaced by types or collections of objects?

Are such arrays still contiguous memory segments?

As others said, this is not consistent from a Julia user's perspective (while it 
could be from a Julia developer's one). 

Although, as said, duck typing is just enough in most cases, the 
perception of inconsistency remains.


[julia-users] Re: Easy way to declare function works on array of floats

2014-05-25 Thread gael . mcdon


Le dimanche 25 mai 2014 08:29:02 UTC+2, James Crist a écrit :
>
> I've been struggling with this for a while, and haven't found a way to do 
> it that I'm happy with. I'm sure there is one though. Basically, I want to 
> declare a function that works on an array of floats. Doesn't matter what 
> kind of float. Doesn't matter if there are 2 different kinds of float 
> arrays.
>
> For example, if `a` is an array of Float64, and `b` is an array of 
> Float32, the function foobar(a, b) should work, just as well as f(a, a) or 
> f(b, b). I've gotten a couple ways to work, but am not sure what's the most 
> Julian, and have to believe there is a better way:
>
> 1.) Parametric definition. This gets really long if there are multiple 
> different Matrices in the call.
> function foobar{T<:FloatingPoint, S<:FloatingPoint}(a::Matrix{T}, b::
> Matrix{S})
>
There seems to be an inconsistency with type declarations:
function foo(a::FloatingPoint, b::FloatingPoint)
return a+b
end
or with explicit promotion:
function foo(a::FloatingPoint, b::FloatingPoint)
a, b = promote(a, b)
return a+b
end
both work well. 

As well,
x = convert(Array{Float16}, rand(2))
x + rand(2)
is working ok (explicit promotion can also be done).

This would gracefully solve your problem.

But,
function foo(a::Array{FloatingPoint}, b::Array{FloatingPoint})
a + b
end 
is unfortunately not working. I don't know whether this is a bug or a 
choice.


 

> 2.) Type union definition. Not sure if this is optimal or not. It also 
> will only work with the initial defined floats, so anything that subtypes 
> FloatingPoint later on will not be valid
> FloatArrays = Union(Matrix{Float16}, Matrix{Float32}, Matrix{Float64})
>
> function foobar(a::FloatArrays, b::FloatArrays)
>
> Am I making this problem more complicated than it needs to be? What is the 
> correct way to do this?
>


[julia-users] Re: Easy way to declare function works on array of floats

2014-05-25 Thread gael . mcdon
Le dimanche 25 mai 2014 08:29:02 UTC+2, James Crist a écrit :
>
> I've been struggling with this for a while, and haven't found a way to do 
> it that I'm happy with. I'm sure there is one though. Basically, I want to 
> declare a function that works on an array of floats. Doesn't matter what 
> kind of float. Doesn't matter if there are 2 different kinds of float 
> arrays.
>
> For example, if `a` is an array of Float64, and `b` is an array of 
> Float32, the function foobar(a, b) should work, just as well as f(a, a) or 
> f(b, b). I've gotten a couple ways to work, but am not sure what's the most 
> Julian, and have to believe there is a better way:
>
> 1.) Parametric definition. This gets really long if there are multiple 
> different Matrices in the call.
> function foobar{T<:FloatingPoint, S<:FloatingPoint}(a::Matrix{T}, b::
> Matrix{S})
>
>
There seems to be an inconsistency with type declarations:
function foo(a::FloatingPoint, b::FloatingPoint)
return a+b
end
or with explicit promotion:
function foo(a::FloatingPoint, b::FloatingPoint)
a, b = promote(a, b)
return a+b
end
both work well. 

As well,
x = convert(Array{Float16}, rand(2))
x + rand(2)
is working ok (explicit promotion can also be done).

This would gracefully solve your problem.

But,
function foo(a::Array{FloatingPoint}, b::Array{FloatingPoint})
a + b
end 
is unfortunately not working. I don't know whether this is a bug or a 
choice.
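
For what it's worth, the behaviour above seems to be a consequence of 
parametric types being invariant rather than covariant (my reading of the 
manual, not an authoritative statement):

```julia
# Element subtyping does NOT lift to the container type:
Float64 <: FloatingPoint                # true
Array{Float64} <: Array{FloatingPoint} # false -- parametric invariance

# so rand(2), an Array{Float64}, never matches a ::Array{FloatingPoint}
# argument; only an array whose element type is exactly FloatingPoint does
```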



If you don't care about the specific type of input data, you can directly 
specify a::Matrix{FloatingPoint} and b::Matrix{FloatingPoint}
 

> 2.) Type union definition. Not sure if this is optimal or not. It also 
> will only work with the initial defined floats, so anything that subtypes 
> FloatingPoint later on will not be valid
> FloatArrays = Union(Matrix{Float16}, Matrix{Float32}, Matrix{Float64})
>
> function foobar(a::FloatArrays, b::FloatArrays)
>
> Am I making this problem more complicated than it needs to be? What is the 
> correct way to do this?
>

Just keep in mind that type promotion is not that cheap. This might be 
perfectly fine in your workflow or even done on purpose though so don't pay 
too much attention to this comment. 


Re: [julia-users] Re: how to apply vector on diagonal of matrix ?

2014-05-22 Thread gael . mcdon


Le jeudi 22 mai 2014 16:12:12 UTC+2, Tobias Knopp a écrit :
>
> Sure I just made a very quick benchmark and it should not be taken too 
> seriously. I just thought we should not speculate too much on what Matlab 
> does but *better measure it*.
>

That's how I understood it. (But I can't keep myself from shamelessly 
writing posts too long to read. So I put the emphasis there. :) 


[julia-users] Re: A 'constant' tuple is used by an inner loop function. How can I best improve performance?

2014-05-22 Thread gael . mcdon


Le jeudi 22 mai 2014 15:23:53 UTC+2, sam cooper a écrit :
>
> Hi,
>
> I have an inner loop function which uses a 'constant' tuple:
>
> fhandle = (expdat,obsv,expdev,t)
>
> with types
>
> fhandle::(Matrix{Float64},Vector{Int64},Float64,Vector{Int64})
>
> Currently I am passing fhandle to the function each time it's called and 
> then reallocating a set of variables i.e.
>
> function sqrerror(fhandle::(Matrix{Float64},Vector{Int64},Float64,Vector{
> Int64}),p::Vector{Float64})
>
> (expdat,obsv,expdev,tsample) = fhandle
> (obs,error_flag) = odesolve(p,tsample,obsv)
>
> if error_flag
> return(1e16)
> end
> error_val = sum((expdat.-obs).^2,1)
> error_val = sum(error_val./(expdev.^2))
> return(error_val)
> end
>
>
> Only 'p' is changed each time the function is called but fhandle is 
> constant but needs to be defined in another file really.
>
> Can I speed this up? I was thinking about using a module with const global 
> variables but the documentation suggested global variables are slower.
>
Did you profile your code? 

From my experience (which you might not share), the slowest part of the code 
is most certainly "odesolve", in which case fhandle is not your problem. 
Please first profile your code and then let's see what we can do for you.
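
For example, a minimal profiling session might look like this (a sketch; 
fhandle and p as defined above, iteration count arbitrary):

```julia
# Sample the call stack while running the function many times,
# then print where the time is actually spent
@profile for k in 1:10000
    sqrerror(fhandle, p)
end
Profile.print()   # if odesolve dominates the counts, fhandle is not the issue
Profile.clear()   # reset before the next measurement
```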
 

>
> Thankyou in advance for any help and advice
>
> Best,
> Sam
>
>

Re: [julia-users] Re: how to apply vector on diagonal of matrix ?

2014-05-21 Thread gael . mcdon
Just to add one thing: Julia should definitely not use "fast for loops" as 
a selling point. Once one has access to unboxed values, it's done. No hard 
work. This is a selling point in comparison to MATLAB, Python and R, but 
not in general.

A very smart automatic devectorizer is much, much harder to get (and to 
define; smart is magic), and I would actually *love* to see exactly that as 
the main Julia selling point within a couple of years.
This is not something anyone using LLVM can provide within a few weeks for 
a new language. 

The perspectives would be *tremendous*! For code readability but also for 
automated parallelism.


Re: [julia-users] Re: how to apply vector on diagonal of matrix ?

2014-05-21 Thread gael . mcdon
Le mercredi 21 mai 2014 14:55:34 UTC+2, Tim Holy a écrit :
>
> I agree that vectorizing is very nice in some circumstances, and it will 
> be 
> great if Julia can get to the point of not introducing so many 
> temporaries. 
> (Like Tobi, I am not convinced this is an easy problem to solve.) 
>
> But there are plenty of cases where loops are clearer even when 
> vectorization 
> is both possible and efficient. For example, surely you can't tell me that 
>
> L = similar(A) 
> m,n = size(A) 
> L[2:m-1,2:n-1] = A[3:m,2:n-1] + A[1:m-2,2:n-1] + A[2:m-1,3:n] + 
> A[2:m-1,1:n-2] - 4A[2:m-1, 2:n-1] 
>
> is easier to read and understand than 
>
> L = similar(A) 
> for j=2:size(A,2)-1, i=2:size(A,1)-1 
> L[i,j] = A[i+1,j] + A[i-1,j] + A[i,j+1] + A[i,j-1] - 4A[i,j] 
> end 
>
> I completely agree. When I was talking about vectorized code, I meant what 
Andreas showed.

 

> Not only does the loops-version have fewer keystrokes, it's a heck of a 
> lot 
> clearer that what you're doing is computing the 2d discrete laplacian. 
>
> --Tim 
>
> On Wednesday, May 21, 2014 05:09:57 AM Andreas Lobinger wrote: 
> > Hello colleague, 
> > 
> > On Wednesday, May 21, 2014 7:33:01 AM UTC+2, Andreas Noack Jensen wrote: 
> > > Please consider b += Diagonal(c) instead of diagm. Diagonal(c) only 
> stores 
> > > the diagonal elements but works like diagm(c) for matrix arithmetic 
> > > 
> > > Vectorized code is always easier to understand 
> > > 
> > > 
> > > That is a strong statement. I have vectorised MATLAB code with repmat 
> and 
> > > meshgrids that is completely unreadable, but would be fairly easy to 
> > > follow 
> > > if written as loops. I really enjoy that I can just write a loop in 
> Julia 
> > > without slow execution. 
>
Me too :) And you're right about my statement: I completely overlooked 
complicated slicing and the like. But I also said that part of what makes 
explicit loops harder to understand is the explicit indices. The same goes for 
"vectorized" code.

My whole point is that, in general, one should divide the problem 
into understandable pieces to write clear code. Vectorized expressions are one 
way to get that. For instance,

A -= Diagonal(B)

is very easy to understand, if not obvious. It almost reads as plain 
English, and that's a strength: no need to *actually* understand the 
code to grasp what it does.

for i=1:length(B)
A[i, i] -= B[i]
end
is not that hard to understand because it's really short, but there the 
only way to get the programmer's intent is to browse the whole 
code and understand the whole thing.

To get back to Tim's example: one could think of creating a 
DiscreteLaplacian2D function taking a matrix A as an input. This is actually 
what I would do to keep the complexity of any given piece of code very low. 
Let's solve the heat equation iteratively; for that we need to calculate:

DiscreteDerivative(A, 1) - alpha.*DiscreteLaplacian2D(A, [2,3])

or even better, knowing the shape of A:

TimeDerivative(A) - alpha.*SpaceLaplacian2D(A)

are very easy to grasp and understand. I've not directly used loops: this 
is plain vectorized code. It's obvious and easy to maintain.
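
As a sketch, such a helper (hypothetical name) would simply wrap Tim's loop 
from above:

```julia
# Hypothetical SpaceLaplacian2D: the 2d discrete laplacian on interior points,
# hiding the index bookkeeping behind a readable name
function SpaceLaplacian2D(A::Matrix)
    L = zeros(eltype(A), size(A))
    for j = 2:size(A,2)-1, i = 2:size(A,1)-1
        L[i,j] = A[i+1,j] + A[i-1,j] + A[i,j+1] + A[i,j-1] - 4A[i,j]
    end
    return L
end
```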

I'm sure you will agree. But what's the problem then? This is not *really* 
what you meant about vectorized code.

Well, let's handle the same problem following the current trend: this is not as 
fast as Julia could be; it might be fast enough, but it is definitely not the 
fastest. It's also using more RAM than it should, so why not *rewrite it 
into loops*? Yes, of course, why not! The only problem is that the only way 
to achieve something useful would be to solve the *whole* equation in one 
fat bunch of loops. Which means:

   - Duplicated code all over the place. You'll have to write/(copy/paste) 
   your devectorized laplacian for *each* single equation that needs it.
   - Unreadable code.
   - Longer code.
   - More places to make mistakes.
   - More places to search for to correct copy/paste mistakes.
   - Bigger mental map for your project.
   - Fewer external contributors.
   - more "end"s ;)
   

> 
> > If you write code in Matlab (or similar) that's using repmat (nowadays 
> > bsxfun) and meshgrids (and reshape) you are operating on wrong side of 
> > vectorising your code. Most likely (and there might be good reasons for 
> > that) your data is not in the 'right' shape, so you reorder more than 
> you 
> > calculate. For problems like this explicit loops are often the better 
> way 
> > (so straightforward in julia) OR you take some thinking, why the data 
> needs 
> > to be reordered for certain operations. 
> > 
> > The right side of vectorized code is the SIMD thinking which does 
> > operations (some times linalg) on Matrix/Vector forms, and that in the 
> best 
> > sense leads to one-liners like 
> > 
> > M += ((A*B) + (d*C) + (E .* M0)) 
> > 
> > which are easy to read/write/understand. 
> > 
> > Expressions like this seems to create in julia an incredible a

Re: [julia-users] Re: how to apply vector on diagonal of matrix ?

2014-05-20 Thread gael . mcdon
While I cannot not agree with this ;), I'd like to state that:
1) High-level functions might leverage clever algorithms faster than plain 
loops (best example coming to mind: dot).
2) Vectorized code is always easier to understand, write, fix and maintain 
because the intent is clear from the start. Your equation is written just as it 
was on paper and not buried within nested loops among many explicit indices.
Moreover, Julia will get better at devectorizing and at avoiding temporaries.

Therefore I would recommend using explicit loops only when *proved* to provide 
a necessary speedup or a memory gain.

In this case diagm works just as intended, and saving 20 ns on matrix 
construction just seems silly. I perfectly understand your point, though: 
explicit loops have downsides too.

Profile first, optimize later... Comment now!  ;)


[julia-users] Re: GSOC 3D Visualizations plotting API - Make a wish!

2014-05-19 Thread gael . mcdon
I don't want to be pushy, so this is the last time I talk about that, but VTK 
*is* fast and already provides volume rendering, among other things. 

An OpenGL binding would be great though, but if I'm given the choice between a 
fully featured, highly optimized, highly parallel lib and redoing it all 
from OpenGL, I would really prefer the first option. Even worse, I would 
rather not use something as low-level as OpenGL to display my data; having to 
reimplement everything each time I want a new feature is just not sensible (and 
not fast).

Even as a basic and "fast" package, you'll end up being slower for even 
medium-sized datasets, because it would require years of optimization to reach 
the speed of VTK.


[julia-users] Re: GSOC 3D Visualizations plotting API - Make a wish!

2014-05-18 Thread gael . mcdon

I actually said *not* to implement direct JPEG export. I thought you were 
targeting stuff like 3D graphs, in which case JPEG export generally does more 
harm than good: most people don't know which image format to choose, and they 
use what they are most used to seeing, JPEG, even though it makes 
absolutely no sense for graphs.

Just a comment: if you really want to implement something from scratch, 
please do. But did you consider using libraries like VTK? (I'm sure I've read 
something about it on one of Julia's lists a few days ago.)

VTK is not only about providing a way to visualize data; it's also a complete 
framework for choosing how and what to render from a given dataset, using a 
pipeline of filters which alter data or representations of data. This is what 
is used by ParaView and, in Python, by Mayavi.

I would love to see something inspired by Mayavi in Julia!

Again, if you want to implement something using OpenGL, that's fine, but you 
might also consider leveraging such a powerful tool.


[julia-users] GSOC 3D Visualizations plotting API - Make a wish!

2014-05-18 Thread gael . mcdon
High quality and low complexity vector graphics.

By high quality I mean clear, nice and customizable.

By low complexity I mean direct generation of SVG/PDF primitives: opengl2ps is 
horrible for that purpose; for instance, OpenGL spheres are approximated by 
thousands of triangles instead of just being represented as an SVG/PDF circle 
primitive.

And vector graphics is just the way to go in almost any case.

As a bonus, please do not provide direct JPEG export. 


[julia-users] Re: Slicing in julia compared to numpy

2014-05-09 Thread gael . mcdon

I don't understand this bit: "Numpy's notation has the advantage to make 
the most expensive operations harder to write, it also easily reminds us 
that arrays in numpy are row-major." How does this make the more expensive 
operation harder to write?

Just by writing this (for a a 2D array):
for i in a.T:
   do_something(i)
or even worse:
for i in range(a.shape[1]):
   do_something(a[:,i])
I already know that I'm doing something slower than just:
for i in a:
do_something(i)

In the first case, I'm transposing (actually, numpy does nothing but 
return another view of the data, so the transpose itself is cheap). 
Since numpy does not actually copy the data elsewhere, iterating over a.T 
is expensive because you are doing it column-wise in a row-major layout. So 
the numpy user might get the right impression for the wrong reason 
(the transpose), but the result is the same.

In the second case, it's just not the Python's way of doing it.

In both cases, beginners are usually surprised that one axis is blessed. 
They complain, and they are told that there is a perfectly good reason to do 
so. They learn and they adapt. That's what I'm doing here, for another 
language which is not Python: I want to learn why it was done that way in 
Julia. :)


> To me (but I didn't design Julia so what do I know) have a[i] refer to a 
> specific element makes sense because it makes it easy to write a loop that 
> does something to each element without having to worry about the dimensions 
> of the array. 
>
To me, having a[i] is redundant: almost all operators (inplace operators 
included) have a per-element counterpart, which makes writing explicit 
loops superfluous in most cases. 


[julia-users] Re: Slicing in julia compared to numpy

2014-05-09 Thread gael . mcdon
I don't understand this bit: "Numpy's notation has the advantage to make 
the most expensive operations harder to write, it also easily reminds us 
that arrays in numpy are row-major." How does this make the more expensive 
operation harder to write?

Just by writing this (for a a 2D array):
for i in a.T:
   do_something(i)
or even worse:
for i in range(a.shape[1]):
   do_something(a[:,i])
I already know that I'm doing something slower than:
for i in a:
    do_something(i)






> To me (but I didn't design Julia so what do I know) have a[i] refer to a 
> specific element makes sense because it makes it easy to write a loop that 
> does something to each element without having to worry about the dimensions 
> of the array. 
>
To me, all dotted operators and all the basic functions already do that. 
If you want to perform operations inplace, inplace operators can also be 
dotted, which makes writing explicit loops over all the elements of a 
multidimensional array a pretty rare need.


[julia-users] Slicing in julia compared to numpy

2014-05-09 Thread gael . mcdon
Hi, my mind might be completely fried by intensive use of numpy arrays, but 
I find slicing in Julia quite odd and I'd like to understand why. I'm not 
saying Julia *should* behave like numpy but I find numpy's behaviour really 
convenient. Let me explain...

Let's say I created a random 2D array in numpy and in Julia:
a = 0.618734  0.858034  0.261089
0.988064  0.323639  0.379946
0.571054  0.998853  0.453861
0.965824  0.629747  0.210426

In numpy, a[0] would give:
0.618734  0.858034  0.261089
and in Julia, a[1] would give:
0.618734

In numpy, a[:] would give the whole matrix, in Julia, the flatten matrix.

From what I've understood, these differences come down to the fact that 
numpy arrays are considered as containers. The fact that default operators 
act element-wise emphasizes that, while Julia's arrays are not considered 
only as containers but as algebraic objects. They are well defined in 1D 
(vector) and 2D (matrix) but not so much for higher-dimensional stuff 
(tensors were evoked, with a generalisation of the dot product discussed 
here).

In short, the differences come down to this (in 2D): a[0] in numpy maps to 
a[1, :] in Julia, and a[1] in Julia maps to a[0, 0] in numpy. 

Numpy's notation has the advantage to make the most expensive operations 
harder to write, it also easily reminds us that arrays in numpy are 
row-major.

Julia's notation has the advantage of not doing stuff behind your back 
(possibly saving a few CPU cycles), but at the expense of increased 
verbosity. This is even more obvious when a 3D container should be 
considered as a series of matrices: AFAIK, there is no way to cleanly 
iterate over them; I'll have to rely on explicit indexing.

Python:
ref = rand(3,4) # matrix of 3 rows by 4 columns
a = rand(5,4,3) # 5 matrices of 4 rows by 3 columns
for i in a:
print(i.dot(ref))
or, still in Python:
ref = rand(3,4)
a = rand(5,4,3)
print(a.dot(ref))
which is way faster than the first one.
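
Concretely, here is a small numpy sketch of the equivalence (shapes as in the 
snippet above):

```python
import numpy as np

ref = np.random.rand(3, 4)   # the fixed matrix
a = np.random.rand(5, 4, 3)  # 5 stacked 4x3 matrices

# One call: dot contracts the last axis of `a` with the first axis of `ref`,
# producing one 4x4 product per leading matrix
stacked = a.dot(ref)                        # shape (5, 4, 4)

# The explicit loop computes the same products one by one
looped = np.array([m.dot(ref) for m in a])  # same values, same shape
```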


In Julia in comparison, this is not allowed and one has to rely on explicit 
indexing:
ref = rand(3,4) # matrix of 3 rows by 4 columns
a = rand(4,3,5) # matrix of 4 rows by 3 columns, 5 times
#a * ref# Not working (yet)
for i = 1:size(a)[3]
println(a[:, :, i] * ref)
end

The Python version is less verbose, but more importantly, you don't have to 
think about row-major or column-major: if you need to take a transpose 
somewhere, that means slower code. Whereas in Julia, println(a[:, :, i] * 
ref) could be transformed into println(a[i, :, :] * ref) without realising 
this could lead to inefficiencies. Of course, this can be written in Python 
too, but that would also mean slower code.

I'm not saying Julia should change, and I understand that once a generalised 
* operator lands, verbosity and inefficient constructs won't be a problem 
anymore. But considering all of that, numpy's behavior still looks more 
convenient (and this example runs twice as fast in Python as in Julia, 
which indicates that the performance penalty of returning a matrix the 
numpy way is negligible).

Why does Julia make the choice to return only one number when the ambiguous 
a[1] is executed?
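
For the record, what a[1] and a[:] return follows Julia's column-major linear 
indexing, e.g. on a small matrix:

```julia
a = [1 2 3;
     4 5 6]
a[1]   # 1 -- the first stored element (column-major order)
a[2]   # 4 -- the next element in memory, i.e. down the first column
a[:]   # [1,4,2,5,3,6] -- the flattened, column-major copy
```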