Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-09 Thread Páll Haraldsson
On Friday, May 1, 2015 at 5:23:40 PM UTC, Steven G. Johnson wrote:

 On Friday, May 1, 2015 at 1:12:00 PM UTC-4, Steven Sagaert wrote: 

 That wasn't what I was saying. I like the philosophy behind julia. But in 
 practice (as of now) even in julia you still have to code in a certain 
 style if you want very good performance and that's no different than in any 
 other language.


 The goal of Julia is not to be a language in which it is *impossible* to 
 write slow code, or a language in which all programming styles are equally 
 fast.   The goal (or at least, one of the goals) is to be an expressive, 
 high-level dynamic language, in which it is also *possible* to write 
 performance-critical inner-loop code.


*Summary*

Thanks (all) for answering. I agree that making it *possible* to write fast 
code is a goal. I believe that has been achieved. Nobody commented much on my 
list of concerns...

Yes, of course, *impossible* to write slow code is a very high bar... I just 
thought Python - an interpreted language - wasn't a high bar :) I'm just 
using that as a comparison. I would like (newbie) Julia code not to be beaten 
by (core language) Python - or at least not by much (a constant factor). Has 
that been achieved? I noticed the yes/no answer on Any. Are globals no 
longer a problem? Yes, globals get you slow code, but compared to Python? 
Are Tuples/Dicts now as fast? [I just noticed the named tuples thread.]

Then there are, of course, Python libraries that are faster than existing 
Julia ones... My hope is that through PyCall you can use them all (I 
understand that to be the case) - without a speed penalty. We may still have 
the two/N-language problem for a while, for functionality reasons but not 
speed reasons... The dual Julia/Python problem is much preferred, I think, 
to Julia/C or Python/C... and it gets you all the batteries included you 
would want (speaking as a non-math user).

Great to see that strings are being worked on; I never wanted this thread 
to be just about one thing. I can now see how refcounting in Python helps 
strings... I'm also looking into how to beat Python there...


 

That *is* different from other high-level languages, in which it is 
 typically *not* possible to write performance-critical inner-loop code 
 without dropping down to a lower-level language (C, Fortran, Cython...).   
 If you are coding exclusively in Python or R, and there isn't an optimized 
 function appropriate for the innermost loops of your task at hand, you are 
 out of luck.
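As a minimal sketch of that "performance-critical inner loop" point (the function name `sumsq` is just illustrative, not from the thread):

```julia
# A plain Julia loop: with concrete element types, this compiles to tight
# machine code, with no need to drop down to C, Fortran, or Cython.
function sumsq(xs::Vector{Float64})
    s = 0.0
    @inbounds for x in xs
        s += x * x
    end
    return s
end

sumsq([1.0, 2.0, 3.0])  # 14.0
```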



Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-04 Thread Scott Jones
I wasn't trying to say that it was specific to strings, I was saying that 
it is not specific to I/O, which the name would seem to indicate...
and it keeps getting brought up as something that should be used for basic 
mutable string operations.

On Sunday, May 3, 2015 at 3:20:43 PM UTC-4, Tamas Papp wrote:

 consider 

 let io = IOBuffer() 
   write(io,rand(10)) 
   takebuf_array(io) 
 end 

 IOBuffer() is not specific to strings at all. 

 Best, 

 Tamas 

 On Sun, May 03 2015, Scott Jones scott.pa...@gmail.com wrote: 

  Because you can have binary strings and text strings... there is even a 
  special literal for binary strings... 
  b"\xffThis is a binary\x01 string" 
  "This is a \u307 text string" 
  
  Calling it an IOBuffer makes it sound like it is specific to I/O, not 
 just 
  strings (binary or text) that you might never do I/O on... 
  
  On Sunday, May 3, 2015 at 2:43:14 PM UTC-4, Kristoffer Carlsson wrote: 
  
  Why should it be called StringBuffer when another common use of it is 
 to 
  write raw binary data? 



Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-04 Thread Scott Jones


On Sunday, May 3, 2015 at 6:10:00 PM UTC-4, Kevin Squire wrote:

 One thing I was confused about when I first started using Julia was that 
 things that are done with strings in other languages are often done 
 directly with IO objects in Julia.

 For example, consider that, in Python, most classes define `__str__()` and 
 `__repr__()`, which create string representations of objects of this class 
 (the first more meant for human consumption, the second for parsing 
 (usually)).  

 In Julia, the implicit assumption is that most strings are meant for 
 output in some way, so why not skip the extra memory allocation and write 
 the string representation directly to output.  For this, types define 
 `show(io::IO, x::MyType)`.  If you really want to manipulate such strings, 
 you can (as pointed out in this thread) go through an IOBuffer object 
 first.  (There is also `repr(x::SomeType)`, but it's not emphasized as 
 much.)


Problem is, with what I'm doing, the strings are almost never written to 
output... they are analyzed, modified, stored in and retrieved from a 
database... and you want all the normal string operations... you might be 
doing regex search/replace, for example... and for performance reasons, you 
don't want to be converting to an immutable string all the time.

This was a design decision made early on.  I personally found (and still 
 find) it somewhat awkward at times, but for many things, it works fine, and 
 (seemingly) it lets most string output allocate less memory by default.

 Now, it certainly is the case that mutable strings may be very useful in 
 some contexts.  The BioSeq.jl package implements mutable DNA and protein 
 sequences, which are very useful there, and would be represented by mutable 
 strings in many other languages.  The best way to test that would probably 
 be to create a package (say, MutableStrings.jl), and define useful types 
 and functions there.


There are a few things I'd like to add to Julia wrt strings: validated 
strings (right now, it is a bit of a mishmash as to whether or not convert 
functions will accept invalid Unicode data), and mutable strings... Somebody 
already did create a MutableStrings.jl; however, it is broken, it doesn't 
look like it has been updated in over a year, and it only covers ASCII and 
UTF-8 - it doesn't have UTF-16 or UTF-32 mutable strings...
(I also want mutable 8-bit (ANSI Latin-1) strings and UCS-2 strings 
(i.e. UTF-16 with no surrogates) [so that they would be a 
DirectIndexString, to get O(1) instead of O(n) for some operations].)
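In the absence of such a type, one workaround available today is a plain byte vector, converting to an immutable String only at the boundaries - a rough ASCII-only sketch, using present-day Julia names (the 2015-era equivalent of `occursin` was `ismatch`):

```julia
# Mutate bytes in place; convert to an immutable String only when a real
# string operation (regex, etc.) is needed.
buf = Vector{UInt8}("hello world")   # mutable copy of the code units
buf[1] = UInt8('H')                  # in-place edit, no new string allocated
s = String(copy(buf))                # immutable snapshot for string APIs
occursin(r"^Hello", s)               # regex works on the snapshot
```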


 


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-04 Thread Tamas Papp
I think you misunderstand: IOBuffer is suggested not for mutable string
operations in general, but only for efficient concatenation of many
strings.

Best,

Tamas

On Mon, May 04 2015, Scott Jones scott.paul.jo...@gmail.com wrote:

 I wasn't trying to say that it was specific to strings, I was saying that
 it is not specific to I/O, which the name would seem to indicate...
 and it keeps getting brought up as something that should be used for basic
 mutable string operations.

 On Sunday, May 3, 2015 at 3:20:43 PM UTC-4, Tamas Papp wrote:

 consider

 let io = IOBuffer()
   write(io,rand(10))
   takebuf_array(io)
 end

 IOBuffer() is not specific to strings at all.

 Best,

 Tamas

 On Sun, May 03 2015, Scott Jones scott.pa...@gmail.com wrote:

  Because you can have binary strings and text strings... there is even a
  special literal for binary strings...
  b"\xffThis is a binary\x01 string"
  "This is a \u307 text string"
 
  Calling it an IOBuffer makes it sound like it is specific to I/O, not
 just
  strings (binary or text) that you might never do I/O on...
 
  On Sunday, May 3, 2015 at 2:43:14 PM UTC-4, Kristoffer Carlsson wrote:
 
  Why should it be called StringBuffer when another common use of it is
 to
  write raw binary data?



Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-04 Thread Scott Jones

 On May 4, 2015, at 3:21 AM, Tamas Papp tkp...@gmail.com wrote:
 
 I think you misunderstand: IOBuffer is suggested not for mutable string
 operations in general, but only for efficient concatenation of many
 strings.
 
 Best,
 
 Tamas

I don’t think that I misunderstood - it’s that using IOBuffer is the only 
solution that has been given here… and it doesn’t handle what I need to do 
efficiently...
If you have a better solution, please let me know…

Scott

 On Mon, May 04 2015, Scott Jones scott.paul.jo...@gmail.com 
 mailto:scott.paul.jo...@gmail.com wrote:
 
 I wasn't trying to say that it was specific to strings, I was saying that
 it is not specific to I/O, which the name would seem to indicate...
 and it keeps getting brought up as something that should be used for basic
 mutable string operations.
 
 On Sunday, May 3, 2015 at 3:20:43 PM UTC-4, Tamas Papp wrote:
 
 consider
 
 let io = IOBuffer()
  write(io,rand(10))
  takebuf_array(io)
 end
 
 IOBuffer() is not specific to strings at all.
 
 Best,
 
 Tamas
 
 On Sun, May 03 2015, Scott Jones scott.pa...@gmail.com wrote:
 
 Because you can have binary strings and text strings... there is even a
 special literal for binary strings...
 b"\xffThis is a binary\x01 string"
 "This is a \u307 text string"
 
 Calling it an IOBuffer makes it sound like it is specific to I/O, not
 just
 strings (binary or text) that you might never do I/O on...
 
 On Sunday, May 3, 2015 at 2:43:14 PM UTC-4, Kristoffer Carlsson wrote:
 
 Why should it be called StringBuffer when another common use of it is
 to
 write raw binary data?



Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-04 Thread Tamas Papp
On Mon, May 04 2015, Scott Jones scott.paul.jo...@gmail.com wrote:

 On May 4, 2015, at 3:21 AM, Tamas Papp tkp...@gmail.com wrote:
 
 I think you misunderstand: IOBuffer is suggested not for mutable string
 operations in general, but only for efficient concatenation of many
 strings.
 
 Best,
 
 Tamas

 I don’t think that I misunderstood - it’s that using IOBuffer is the only 
 solution that has been given here… and it doesn’t handle what I need to do 
 efficiently...
 If you have a better solution, please let me know…

1. Can you share the benchmarks (and simplified, self-contained code)
for your problem using IOBuffer? I have always found it very fast, but
maybe what you are working on is different.

2. Do you have a specific algorithm in mind that would be more
efficient?

Best,

Tamas
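For reference, a self-contained sketch of the two styles under discussion (illustrative function names; `String(take!(io))` is the modern spelling of the thread's `takebuf_string`):

```julia
# Repeated concatenation allocates a fresh string every iteration: O(n^2) total.
function concat_naive(n)
    s = ""
    for _ in 1:n
        s *= "x"
    end
    return s
end

# IOBuffer appends into one growable byte buffer, then extracts once.
function concat_iobuffer(n)
    io = IOBuffer()
    for _ in 1:n
        write(io, "x")
    end
    return String(take!(io))
end

concat_naive(1000) == concat_iobuffer(1000)  # same result; timings differ greatly
```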


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-03 Thread Scott Jones
I should be clear, I didn't mean that all strings should be immutable, but 
rather that I also want to have mutable strings available... There is a package 
for that, but 1) I think it's incomplete (I may need to contribute to it), and 
2) I think that they do belong in the base language...
CLU had both, which was very nice...
For many things, IOBuffer is exactly the right way of doing things (the name is 
misleading though... Maybe it should have been StringBuffer...), but there are 
use cases where you are constantly modifying the string while performing other 
string operations on it...

Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-03 Thread Steven Sagaert
You really should ask the language designers about this for a definitive 
answer, but one of the reasons strings are immutable in Julia (and in Java 
and others) is that it makes them good keys for Dicts.

On Saturday, May 2, 2015 at 7:16:24 PM UTC+2, Jameson wrote:

 IOBuffer does not inherit from string, nor does it implement any of the 
 methods expected of a mutable string (length, endof, insert! / splice! / 
 append!). If you want strings that support all of those operations, then 
 you will need something different from an IOBuffer. If you just wanted a 
 fast string builder, then IOBuffer is the right abstraction (ending with a 
 call to `takebuf_string`). This dichotomy helps to give a clear 
 distinction in the code between the construction phase and usage phase.

 On Sat, May 2, 2015 at 12:49 PM Páll Haraldsson pall.ha...@gmail.com wrote:

 2015-05-01 16:42 GMT+00:00 Steven G. Johnson steve...@gmail.com:


 In Julia, Ruby, Java, Go, and many other languages, concatenation 
 allocates a new string and hence building a string by repeated 
 concatenation is O(n^2).   That doesn't mean that those other languages 
 lose on string processing to Python, it just means that you have to do 
 things slightly differently (e.g. write to an IOBuffer in Julia).

 You can't always expect the *same code* (translated as literally as 
 possible) to be the optimal approach in different languages, and it is 
 inflammatory to compare languages according to this standard.

 A fairer question is whether it is *much harder* to get good performance 
 in one language vs. another for a certain task.   There will certainly be 
 tasks where Python is still superior in this sense simply because there are 
 many cases where Python calls highly tuned C libraries for operations that 
 have not been as optimized in Julia.  Julia will tend to shine the further 
 you stray from built-in operations in your performance-critical code.


 What I would like to know is: do you need to make your own string type to 
 make Julia as fast (within a constant factor) as, say, Python? In another 
 answer, IOBuffer was said to be not good enough.


 -- 
 Palli.



Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-03 Thread Kevin Squire
One thing I was confused about when I first started using Julia was that
things that are done with strings in other languages are often done
directly with IO objects in Julia.

For example, consider that, in Python, most classes define `__str__()` and
`__repr__()`, which create string representations of objects of this class
(the first more meant for human consumption, the second for parsing
(usually)).

In Julia, the implicit assumption is that most strings are meant for output
in some way, so why not skip the extra memory allocation and write the
string representation directly to output.  For this, types define
`show(io::IO, x::MyType)`.  If you really want to manipulate such strings,
you can (as pointed out in this thread) go through an IOBuffer object
first.  (There is also `repr(x::SomeType)`, but it's not emphasized as
much.)
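A small sketch of that convention, in modern syntax (the `Point` type here is hypothetical):

```julia
# A type that writes its human-readable form straight to an IO object,
# following the `show(io::IO, x)` convention described above.
struct Point
    x::Int
    y::Int
end

Base.show(io::IO, p::Point) = print(io, "Point(", p.x, ", ", p.y, ")")

p = Point(1, 2)
sprint(show, p)   # captures via an IOBuffer internally: "Point(1, 2)"
repr(p)           # allocates the same text as a String
```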

This was a design decision made early on.  I personally found (and still
find) it somewhat awkward at times, but for many things, it works fine, and
(seemingly) it lets most string output allocate less memory by default.

Now, it certainly is the case that mutable strings may be very useful in
some contexts.  The BioSeq.jl package implements mutable DNA and protein
sequences, which are very useful there, and would be represented by mutable
strings in many other languages.  The best way to test that would probably
be to create a package (say, MutableStrings.jl), and define useful types
and functions there.

Cheers,
   Kevin



On Sun, May 3, 2015 at 12:20 PM, Tamas Papp tkp...@gmail.com wrote:

 consider

 let io = IOBuffer()
   write(io,rand(10))
   takebuf_array(io)
 end

 IOBuffer() is not specific to strings at all.

 Best,

 Tamas

 On Sun, May 03 2015, Scott Jones scott.paul.jo...@gmail.com wrote:

  Because you can have binary strings and text strings... there is even a
  special literal for binary strings...
  b"\xffThis is a binary\x01 string"
  "This is a \u307 text string"
 
  Calling it an IOBuffer makes it sound like it is specific to I/O, not
 just
  strings (binary or text) that you might never do I/O on...
 
  On Sunday, May 3, 2015 at 2:43:14 PM UTC-4, Kristoffer Carlsson wrote:
 
  Why should it be called StringBuffer when another common use of it is to
  write raw binary data?



Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-03 Thread Kristoffer Carlsson
Why should it be called StringBuffer when another common use of it is to write 
raw binary data?

Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-03 Thread Tamas Papp
consider

let io = IOBuffer()
  write(io,rand(10))
  takebuf_array(io)
end

IOBuffer() is not specific to strings at all.

Best,

Tamas

On Sun, May 03 2015, Scott Jones scott.paul.jo...@gmail.com wrote:

 Because you can have binary strings and text strings... there is even a
 special literal for binary strings...
 b"\xffThis is a binary\x01 string"
 "This is a \u307 text string"

 Calling it an IOBuffer makes it sound like it is specific to I/O, not just
 strings (binary or text) that you might never do I/O on...

 On Sunday, May 3, 2015 at 2:43:14 PM UTC-4, Kristoffer Carlsson wrote:

 Why should it be called StringBuffer when another common use of it is to
 write raw binary data?


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-03 Thread Scott Jones
Because you can have binary strings and text strings... there is even a 
special literal for binary strings...
b"\xffThis is a binary\x01 string"
"This is a \u307 text string"

Calling it an IOBuffer makes it sound like it is specific to I/O, not just 
strings (binary or text) that you might never do I/O on...

On Sunday, May 3, 2015 at 2:43:14 PM UTC-4, Kristoffer Carlsson wrote:

 Why should it be called StringBuffer when another common use of it is to 
 write raw binary data?



Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-02 Thread Jameson Nash
IOBuffer does not inherit from string, nor does it implement any of the
methods expected of a mutable string (length, endof, insert! / splice! /
append!). If you want strings that support all of those operations, then
you will need something different from an IOBuffer. If you just wanted a
fast string builder, then IOBuffer is the right abstraction (ending with a
call to `takebuf_string`). This dichotomy helps to give a clear
distinction in the code between the construction phase and usage phase.
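The two phases Jameson describes look roughly like this (modern Julia spells the extraction `String(take!(io))` rather than `takebuf_string`):

```julia
# Construction phase: append pieces into the buffer.
io = IOBuffer()
for word in ["fast", "string", "builder"]
    print(io, word, ' ')
end

# Usage phase: extract once, then use ordinary string operations.
s = String(take!(io))
length(s)        # string APIs apply only after extraction
```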

On Sat, May 2, 2015 at 12:49 PM Páll Haraldsson pall.haralds...@gmail.com
wrote:

 2015-05-01 16:42 GMT+00:00 Steven G. Johnson stevenj@gmail.com:


 In Julia, Ruby, Java, Go, and many other languages, concatenation
 allocates a new string and hence building a string by repeated
 concatenation is O(n^2).   That doesn't mean that those other languages
 lose on string processing to Python, it just means that you have to do
 things slightly differently (e.g. write to an IOBuffer in Julia).

 You can't always expect the *same code* (translated as literally as
 possible) to be the optimal approach in different languages, and it is
 inflammatory to compare languages according to this standard.

 A fairer question is whether it is *much harder* to get good performance
 in one language vs. another for a certain task.   There will certainly be
 tasks where Python is still superior in this sense simply because there are
 many cases where Python calls highly tuned C libraries for operations that
 have not been as optimized in Julia.  Julia will tend to shine the further
 you stray from built-in operations in your performance-critical code.


 What I would like to know is: do you need to make your own string type to
 make Julia as fast (within a constant factor) as, say, Python? In another
 answer, IOBuffer was said to be not good enough.


 --
 Palli.



Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-02 Thread Páll Haraldsson
2015-05-01 16:42 GMT+00:00 Steven G. Johnson stevenj@gmail.com:


 In Julia, Ruby, Java, Go, and many other languages, concatenation
 allocates a new string and hence building a string by repeated
 concatenation is O(n^2).   That doesn't mean that those other languages
 lose on string processing to Python, it just means that you have to do
 things slightly differently (e.g. write to an IOBuffer in Julia).

 You can't always expect the *same code* (translated as literally as
 possible) to be the optimal approach in different languages, and it is
 inflammatory to compare languages according to this standard.

 A fairer question is whether it is *much harder* to get good performance
 in one language vs. another for a certain task.   There will certainly be
 tasks where Python is still superior in this sense simply because there are
 many cases where Python calls highly tuned C libraries for operations that
 have not been as optimized in Julia.  Julia will tend to shine the further
 you stray from built-in operations in your performance-critical code.


What I would like to know is: do you need to make your own string type to
make Julia as fast (within a constant factor) as, say, Python? In another
answer, IOBuffer was said to be not good enough.

-- 
Palli.


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread elextr


  If you are coding exclusively in Python or R, and there isn't an 
 optimized function appropriate for the innermost loops of your task at 
 hand, you are out of luck.



This is the important take-home message: Julia is intended to allow both 
"quick and simple and interactive and dynamic" code and "optimised and fast" 
code to be written in one language.

I think Stefan announced Julia as "we want it all" :) 


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Scott Jones


On Friday, May 1, 2015 at 9:16:43 AM UTC-4, Tim Holy wrote:

 On Friday, May 01, 2015 03:19:03 AM Scott Jones wrote: 
  As the string grows, Julia's internals end up having to reallocate the 
  memory and sometimes copy it to a new location, hence the O(n^2) nature 
  of the code. 

 Small correction: push! is not O(n^2), it's O(n log n). Internally, the 
 storage array grows by factors of 2 [1]; after one allocation of size 2n 
 you can add n more elements without reallocating. 


Good to know. I hate to say it, but the performance looked so bad to me that 
I didn't bother to check whether it even had that optimization (which is 
exactly what I did for strings in the language I used to develop).

Does it always grow by factors of 2?  That might not be so good... we 
found that after a certain point, it was better to increase in chunks, say 
of 64K or 1M, because increasing the size that way for large LOBs could 
make you run out of memory fairly quickly...

 

 That said, O(n log n) can be pretty easily beat by O(2n): make one pass 
 through and count how many you'll need, allocate the whole thing, and then 
 stuff in elements. As you seem to be planning to do. 


Yes, and I have very nice performance improvements to show for it (most were 
around 4-10x faster; go look at what I put in my gist), and that's even 
with my pure Julia version... :-)
 


 --Tim 

 [1] Last I looked, that is; there was some discussion about switching it to 
 something like 1.5 because of various discussions of memory fragmentation 
 and reuse. 


Still, same issue as I described above... probably better to increase by 2x 
up to a point, and then by chunk sizes, where the chunk sizes might slowly 
get larger...
 


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Tim Holy
On Friday, May 01, 2015 08:03:31 AM Scott Jones wrote:
 Still, same issue as I described above... probably better to increase by 2x 
 up to a point, and then by chunk sizes, where the chunk sizes might slowly
 get larger...

I see your point, but it will also break the O(nlogn) scaling. We couldn't 
hard-code the cutoff, because some people run julia on machines with 4GB of RAM 
and others with 1TB of RAM. So, we could query the amount of RAM available and 
switch based on that result, but since all this would only make a difference 
for operations that consume between 0.5x and 1x the user's RAM (which to me 
seems like a very narrow window, on the log scale), is it really worth the 
trouble?

--Tim
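A back-of-the-envelope simulation of the trade-off being discussed (this models the two policies abstractly; it is not Julia's actual allocator):

```julia
# Count elements copied during reallocations for n appends under each policy.
function copies_doubling(n)
    cap, copied = 1, 0
    while cap < n
        copied += cap   # each grow copies the current contents
        cap *= 2
    end
    return copied
end

function copies_chunked(n; chunk = 64)
    cap, copied = chunk, 0
    while cap < n
        copied += cap
        cap += chunk
    end
    return copied
end

copies_doubling(10^6)   # < 2n total copies: amortized O(1) per append
copies_chunked(10^6)    # ~n^2 / (2*chunk): quadratic total cost
```

Doubling keeps total copying linear at the price of up to 2x over-allocation near the top, which is exactly the large-object concern raised above.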



Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Jeff Bezanson
Steven -- I agree and I find it very refreshing that you're willing to
judge a language by more than just performance. Any given language can
always be optimized better, so ideally you want to compare them by
more robust criteria.

Obviously a particular system might have a well-tuned library routine
that's faster than our equivalent. But think about it: is having a
slow interpreter, and relying on code to spend all its time in
pre-baked library kernels the *right* way to get performance? That's
just the same boring design that has been used over and over again, in
matlab, IDL, octave, R, etc. In those cases the language isn't
bringing much to the table, except a pile of rules about how important
code must still be written in C/Fortran, and how your code must be
vectorized or shame on you.

On Fri, May 1, 2015 at 11:48 AM, Tim Holy tim.h...@gmail.com wrote:
 On Friday, May 01, 2015 08:03:31 AM Scott Jones wrote:
 Still, same issue as I described above... probably better to increase by 2x
 up to a point, and then by chunk sizes, where the chunk sizes might slowly
 get larger...

 I see your point, but it will also break the O(nlogn) scaling. We couldn't
 hard-code the cutoff, because some people run julia on machines with 4GB of 
 RAM
 and others with 1TB of RAM. So, we could query the amount of RAM available and
 switch based on that result, but since all this would only make a difference
 for operations that consume between 0.5x and 1x the user's RAM (which to me
 seems like a very narrow window, on the log scale), is it really worth the
 trouble?

 --Tim



Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Jameson Nash
The threshold would likely be most beneficial if it were based on page size
(which is constant relative to RAM size). For small allocations (less than
several megabytes), a modern malloc implementation typically uses a pool,
so growing an allocation (except by a small amount) will probably result in
a copy anyway, and no memory reuse. Once malloc switches to direct mmap
calls, it probably makes sense to add pages at a more gradual rate.

On Fri, May 1, 2015 at 11:48 AM Tim Holy tim.h...@gmail.com wrote:

 On Friday, May 01, 2015 08:03:31 AM Scott Jones wrote:
  Still, same issue as I described above... probably better to increase by
 2x
  up to a point, and then by chunk sizes, where the chunk sizes might
 slowly
  get larger...

 I see your point, but it will also break the O(nlogn) scaling. We couldn't
 hard-code the cutoff, because some people run julia on machines with 4GB
 of RAM
 and others with 1TB of RAM. So, we could query the amount of RAM available
 and
 switch based on that result, but since all this would only make a
 difference
 for operations that consume between 0.5x and 1x the user's RAM (which to me
 seems like a very narrow window, on the log scale), is it really worth the
 trouble?

 --Tim




Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Steven Sagaert
Of course I'm not saying loops should not be benchmarked, and I do use loops 
in julia too. I'm just saying that when doing performance comparisons, one 
should try to write the programs in each language in their most optimal 
style, rather than a similar style which is optimal for one language but 
very suboptimal in another.
Ah I didn't know the article was rebutted by Stefan. I read that article 
before that happened and just looked it up again now as an example.

I guess the conclusion is that cross-language performance benchmarks are 
very tricky which was kinda my point :)

On Friday, May 1, 2015 at 3:13:24 PM UTC+2, Tim Holy wrote:

 Hi Steven, 

 I understand your point---you're saying you'd be unlikely to write those 
 algorithms in that manner, if your goal were to do those particular 
 computations. But the important point to keep in mind is that those 
 benchmarks are simply toys for the purpose of testing performance of 
 various language constructs. If you think it's irrelevant to benchmark 
 loops for scientific code, then you do very, very different stuff than me. 
 Not all algorithms reduce to BLAS calls. I use julia to write all kinds of 
 algorithms that I used to write MEX functions for, back in my Matlab days. 
 If all you need is A*b, then of course basically any scientific language 
 will be just fine, with minimal differences in performance. 

 Moreover, that R benchmark on cumsum is simply not credible. I'm not sure 
 what was happening (and that article doesn't post its code or procedures 
 used to test), but julia's cumsum reduces to efficient machine code 
 (basically, a bunch of addition operations). If they were computing cumsum 
 across a specific dimension, then this PR: 
 https://github.com/JuliaLang/julia/pull/7359 
 changed things. But more likely, someone forgot to run the code twice (so 
 it got JIT-compiled), had a type-instability in the code they were 
 testing, or some other mistake. It's too bad one can make mistakes, of 
 course, but then it becomes a comparison of different programmers rather 
 than different programming languages. 

 Indeed, if you read the comments in that post, Stefan already rebutted 
 that benchmark, with a 4x advantage for Julia: 

 https://matloff.wordpress.com/2014/05/21/r-beats-python-r-beats-julia-anyone-else-wanna-challenge-r/comment-page-1/#comment-89 

 --Tim 



 On Friday, May 01, 2015 01:25:50 AM Steven Sagaert wrote: 
  I think the performance comparisons between Julia & Python are flawed. 
  They seem to be between standard Python & Julia, but since Julia is all 
  about scientific programming it really should be between SciPy & Julia. 
  Since SciPy uses much of the same underlying libs in Fortran/C, the 
  performance gap will be much smaller, and to be really fair it should be 
  between numba-compiled SciPy code & julia. I suspect the performance will 
  be very close then (and close to C performance). 
  
  Similarly, the standard benchmark (on the opening page of the julia 
  website) between R & julia is also flawed, because it takes the best case 
  scenario for julia (loops & mutable datastructures) & the worst case 
  scenario for R. When the same R program is rewritten in vectorised style, 
  it beat julia; see 
  https://matloff.wordpress.com/2014/05/21/r-beats-python-r-beats-julia-anyone-else-wanna-challenge-r/. 
  
  So my interest in julia isn't because it is the fastest scientific high 
  level language (because clearly at this stage you can't really claim 
  that), but because it's a clean, interesting language (still needs work 
  for some rough edges of course) with clean(er) & clear(er) libraries, and 
  that gives reasonable performance out of the box without much tweaking. 
  
  On Friday, May 1, 2015 at 12:10:58 AM UTC+2, Scott Jones wrote: 
   Yes... Python will win on string processing... esp. with Python 3... I 
   quickly ran into things that were 800x faster in Python... 
   (I hope to help change that though!) 
   
   Scott 
   
   On Thursday, April 30, 2015 at 6:01:45 PM UTC-4, Páll Haraldsson wrote: 
   I wouldn't expect a difference in Julia for code like that (didn't 
   check). But I guess what we are often seeing is someone comparing tuned 
   Python code to newbie Julia code. I still want it faster than that 
   code.. (assuming the same algorithm; note the row- vs. column-major 
   caveat). 
   
   The main point of mine: *should* Python at any time win? 
   
   2015-04-30 21:36 GMT+00:00 Sisyphuss zhengw...@gmail.com: 
   This post interests me. I'll write something here to follow this post. 
   
   The performance gap between normal code in Python and badly-written 
   code in Julia is something I'd like to know too. 
   As far as I know, the Python interpreter does some mysterious 
   optimizations. For example `(x**2)**2` is 100x faster than `x**4`. 
   
   On Thursday, April 30, 2015 at 9:58:35 PM UTC+2, Páll Haraldsson 
   wrote: 
   Hi, 
   
   [As a best 
Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Stefan Karpinski
I'll quote one of my comments on this StackOverflow question
http://stackoverflow.com/questions/9968578/speeding-up-julias-poorly-written-r-examples
:

That all depends on what you are trying to measure. Personally, I'm not at
 all interested in how fast one can compute Fibonacci numbers. Yet that is
 one of our benchmarks. Why? Because I am very interested in how well
 languages support recursion – and the doubly recursive algorithm happens to
 be a great test of recursion, precisely because it is such a terrible way
 to compute Fibonacci numbers. So what would be learned by comparing an
 intentionally slow, excessively recursive algorithm in C and Julia against
 a tricky, clever, vectorized algorithm in R? Nothing at all.
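The benchmark in question is the intentionally naive, doubly recursive Fibonacci from the Julia micro-benchmarks; a minimal sketch:

```julia
# Doubly recursive Fibonacci: a deliberately terrible way to compute
# Fibonacci numbers, used precisely because it stresses recursion.
fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

fib(20)  # 6765
```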


On Fri, May 1, 2015 at 12:58 PM, Steven Sagaert steven.saga...@gmail.com
wrote:

 Of course I'm not saying loops should not be benchmarked, and I do use
 loops in julia also. I'm just saying that when doing performance comparisons
 one should try to write the programs in each language in their most optimal
 style, rather than in a similar style which is optimal for one language but
 very suboptimal in another language.
 Ah, I didn't know the article was rebutted by Stefan. I read that article
 before that happened and just looked it up again now as an example.

 I guess the conclusion is that cross-language performance benchmarks are
 very tricky, which was kinda my point :)


 On Friday, May 1, 2015 at 3:13:24 PM UTC+2, Tim Holy wrote:

 Hi Steven,

 I understand your point---you're saying you'd be unlikely to write those
 algorithms in that manner, if your goal were to do those particular
 computations. But the important point to keep in mind is that those
 benchmarks are simply toys for the purpose of testing performance of
 various language constructs. If you think it's irrelevant to benchmark
 loops for scientific code, then you do very, very different stuff than me.
 Not all algorithms reduce to BLAS calls. I use julia to write all kinds of
 algorithms that I used to write MEX functions for, back in my Matlab days.
 If all you need is A*b, then of course basically any scientific language
 will be just fine, with minimal differences in performance.

 Moreover, that R benchmark on cumsum is simply not credible. I'm not sure
 what was happening (and that article doesn't post its code or procedures
 used to test), but julia's cumsum reduces to efficient machine code
 (basically, a bunch of addition operations). If they were computing cumsum
 across a specific dimension, then this PR:
 https://github.com/JuliaLang/julia/pull/7359
 changed things. But more likely, someone forgot to run the code twice (so
 it got JIT-compiled), had a type instability in the code they were testing,
 or some other mistake. It's too bad one can make mistakes, of course, but
 then it becomes a comparison of different programmers rather than different
 programming languages.

 Indeed, if you read the comments in that post, Stefan already rebutted that
 benchmark, with a 4x advantage for Julia:

 https://matloff.wordpress.com/2014/05/21/r-beats-python-r-beats-julia-anyone-else-wanna-challenge-r/comment-page-1/#comment-89

 --Tim
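The two mistakes Tim lists, timing un-warmed code and type instability, are easy to reproduce; a minimal sketch (not the article's code, which was never posted):

```julia
# Pitfall 1: the first call to a function includes JIT compilation time,
# so a benchmark must run the code once before timing it.

# Pitfall 2: type instability. Here `s` starts as an Int and may become a
# Float64 inside the loop, preventing the compiler from emitting tight code:
function mysum_unstable(v)
    s = 0                  # Int, regardless of eltype(v)
    for x in v
        s += x             # s switches to Float64 mid-loop for float input
    end
    return s
end

# Stable version: initialize the accumulator with the element type of v,
# so `s` keeps a single concrete type throughout the loop.
function mysum_stable(v)
    s = zero(eltype(v))
    for x in v
        s += x
    end
    return s
end
```

Both return the same answer; only the generated code differs.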




Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Steven G. Johnson


On Thursday, April 30, 2015 at 6:10:58 PM UTC-4, Scott Jones wrote:

 Yes... Python will win on string processing... esp. with Python 3... I 
 quickly ran into things that were > 800x faster in Python...
 (I hope to help change that though!)


The 800x faster example that you've referred to several times, if I 
recall correctly, is one where you repeatedly concatenate strings.  In 
CPython, under certain circumstances, this is optimized to mutating one of 
the strings in-place and is consequently O(n) where n is the final length, 
although this is not guaranteed by the language itself.  In Julia, Ruby, 
Java, Go, and many other languages, concatenation allocates a new string 
and hence building a string by repeated concatenation is O(n^2).   That 
doesn't mean that those other languages lose on string processing to 
Python, it just means that you have to do things slightly differently (e.g. 
write to an IOBuffer in Julia).
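The difference can be sketched in modern Julia syntax (the era's `takebuf_string` is now spelled `String(take!(io))`):

```julia
# O(n^2): each *= allocates a brand-new string of the current length,
# because Julia strings are immutable.
function concat_loop(n)
    s = ""
    for i in 1:n
        s *= "x"
    end
    return s
end

# O(n): append to a growable in-memory buffer and materialize the final
# string once at the end.
function concat_buffer(n)
    io = IOBuffer()
    for i in 1:n
        print(io, "x")
    end
    return String(take!(io))
end
```

Both produce the same string; only the allocation behavior differs as n grows.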

You can't always expect the *same code* (translated as literally as 
possible) to be the optimal approach in different languages, and it is 
inflammatory to compare languages according to this standard.

A fairer question is whether it is *much harder* to get good performance in 
one language vs. another for a certain task.   There will certainly be 
tasks where Python is still superior in this sense simply because there are 
many cases where Python calls highly tuned C libraries for operations that 
have not been as optimized in Julia.  Julia will tend to shine the further 
you stray from built-in operations in your performance-critical code.


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Steven G. Johnson
On Friday, May 1, 2015 at 1:12:00 PM UTC-4, Steven Sagaert wrote: 

 That wasn't what I was saying. I like the philosophy behind julia. But in 
 practice (as of now) even in julia you still have to code in a certain 
 style if you want very good performance and that's no different than in any 
 other language.


The goal of Julia is not to be a language in which it is *impossible* to 
write slow code, or a language in which all programming styles are equally 
fast.   The goal (or at least, one of the goals) is to be an expressive, 
high-level dynamic language, in which it is also *possible* to write 
performance-critical inner-loop code.

That *is* different from other high-level languages, in which it is 
typically *not* possible to write performance-critical inner-loop code 
without dropping down to a lower-level language (C, Fortran, Cython...).   
If you are coding exclusively in Python or R, and there isn't an optimized 
function appropriate for the innermost loops of your task at hand, you are 
out of luck.


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Steven Sagaert




 Obviously a particular system might have a well-tuned library routine 
 that's faster than our equivalent. But think about it: is having a 
 slow interpreter, and relying on code to spend all its time in 
 pre-baked library kernels the *right* way to get performance? That's 
 just the same boring design that has been used over and over again, in 
 matlab, IDL, octave, R, etc. In those cases the language isn't 
 bringing much to the table, except a pile of rules about how important 
 code must still be written in C/Fortran, and how your code must be 
 vectorized or shame on you.


That wasn't what I was saying. I like the philosophy behind julia. But in 
practice (as of now) even in julia you still have to code in a certain 
style if you want very good performance, and that's no different than in any 
other language. Ideally, of course, the compiler should be able to optimize 
the code so that different styles (e.g. functional/vectorized style vs. 
imperative/loop style) give the same performance and the programmer 
doesn't have to think about it. Maybe one day it will be like that in 
julia, but we're not quite there yet AFAIK.
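For a concrete instance of the two styles under discussion, a minimal sketch of the same computation written both ways:

```julia
# Vectorized/functional style: concise, but x .^ 2 allocates a temporary
# array before the reduction.
sumsq_vec(x) = sum(x .^ 2)

# Imperative/loop style: explicit accumulation, no temporary arrays.
function sumsq_loop(x)
    s = zero(eltype(x))
    for xi in x
        s += xi^2
    end
    return s
end
```

Both are compiled to native code in Julia; the debate was about how close their performance should be without the programmer having to choose.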

Having said that, I like Julia and hopefully it will keep on getting 
better/faster. So good job and keep up the good work.






Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Scott Jones


On Friday, May 1, 2015 at 1:23:40 PM UTC-4, Steven G. Johnson wrote:

 On Friday, May 1, 2015 at 1:12:00 PM UTC-4, Steven Sagaert wrote: 

 That wasn't what I was saying. I like the philosophy behind julia. But in 
 practice (as of now) even in julia you still have to code in a certain 
 style if you want very good performance and that's no different than in any 
 other language.


 The goal of Julia is not to be a language in which it is *impossible* to 
 write slow code, or a language in which all programming styles are equally 
 fast.   The goal (or at least, one of the goals) is to be an expressive, 
 high-level dynamic language, in which it is also *possible* to write 
 performance-critical inner-loop code.


Yep, totally agree!  I had to deal with people (smart people too, who went 
to MIT also ;-) ) who expected the compiler/interpreter to magically 
improve their O(n^2) code!
 

 That *is* different from other high-level languages, in which it is 
 typically *not* possible to write performance-critical inner-loop code 
 without dropping down to a lower-level language (C, Fortran, Cython...).   
 If you are coding exclusively in Python or R, and there isn't an optimized 
 function appropriate for the innermost loops of your task at hand, you are 
 out of luck.


Also, very true...  I do hope that any issues that make my C version of UTF 
conversion routines faster than my equivalent Julia versions will be 
addressed before too long.
(and I don't even think it is that far off, or hard for any particular 
reason) 


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Scott Jones


On Friday, May 1, 2015 at 11:48:21 AM UTC-4, Tim Holy wrote:

 On Friday, May 01, 2015 08:03:31 AM Scott Jones wrote: 
  Still, same issue as I described above... probably better to increase by
  2x up to a point, and then by chunk sizes, where the chunk sizes might
  slowly get larger... 

 I see your point, but it will also break the O(nlogn) scaling. We couldn't 
 hard-code the cutoff, because some people run julia on machines with 4GB of 
 RAM and others with 1TB of RAM. So, we could query the amount of RAM 
 available and switch based on that result, but since all this would only 
 make a difference for operations that consume between 0.5x and 1x the 
 user's RAM (which to me seems like a very narrow window, on the log scale), 
 is it really worth the trouble? 

 --Tim 


For what I was doing, yes, it was definitely worth the trouble, because 
you'd have systems with 10s of thousands of processes (the limit was 64K on 
a single node), and you had to be very careful about not using up too much 
memory, and ending up thrashing...
Very different than when you maybe have a process for each core, and you 
have lots of memory for each one...
Different usage... different performance issues... 


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Scott Jones


On Friday, May 1, 2015 at 12:42:57 PM UTC-4, Steven G. Johnson wrote:



 On Thursday, April 30, 2015 at 6:10:58 PM UTC-4, Scott Jones wrote:

  Yes... Python will win on string processing... esp. with Python 3... I 
  quickly ran into things that were > 800x faster in Python...
 (I hope to help change that though!)


 The 800x faster example that you've referred to several times, if I 
 recall correctly, is one where you repeatedly concatenate strings.  In 
 CPython, under certain circumstances, this is optimized to mutating one of 
 the strings in-place and is consequently O(n) where n is the final length, 
 although this is not guaranteed by the language itself.  In Julia, Ruby, 
 Java, Go, and many other languages, concatenation allocates a new string 
 and hence building a string by repeated concatenation is O(n^2).   That 
 doesn't mean that those other languages lose on string processing to 
 Python, it just means that you have to do things slightly differently (e.g. 
 write to an IOBuffer in Julia).


I just don't think that IOBuffers are a very good way to do that...  what I 
really need are mutable strings... and I know there is a package, and I 
need to investigate that further...
it's something that would be nice to have as part of the core of the 
language, instead of having to use either Vectors or IOBuffers...
As a new user, I would think: if I'm not doing IO, why should I be using an 
IOBuffer...
 

 You can't always expect the *same code* (translated as literally as 
 possible) to be the optimal approach in different languages, and it is 
 inflammatory to compare languages according to this standard.


I was not intending to be inflammatory, just relating what my first 
experience was, which led me to investigate much more deeply the good 
and bad issues in Julia wrt performance (more good than bad, by a long 
shot).
 

 A fairer question is whether it is *much harder* to get good performance 
 in one language vs. another for a certain task.   There will certainly be 
 tasks where Python is still superior in this sense simply because there are 
 many cases where Python calls highly tuned C libraries for operations that 
 have not been as optimized in Julia.  Julia will tend to shine the further 
 you stray from built-in operations in your performance-critical code.


Yes, that is true... and that is why I'm betting on Julia in the long run 
(the other option for a lot of the code would have been Python or C++11, 
and I've already found Julia easier to deal with than either of them, even 
in its pre-1.0 state) 


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Scott Jones


On Friday, May 1, 2015 at 12:38:33 PM UTC-4, Jeff Bezanson wrote:

 Steven -- I agree and I find it very refreshing that you're willing to 
 judge a language by more than just performance. Any given language can 
 always be optimized better, so ideally you want to compare them by 
 more robust criteria. 

 Obviously a particular system might have a well-tuned library routine 
 that's faster than our equivalent. But think about it: is having a 
 slow interpreter, and relying on code to spend all its time in 
 pre-baked library kernels the *right* way to get performance? That's 
 just the same boring design that has been used over and over again, in 
 matlab, IDL, octave, R, etc. In those cases the language isn't 
 bringing much to the table, except a pile of rules about how important 
 code must still be written in C/Fortran, and how your code must be 
 vectorized or shame on you. 


That's a very good point... and is one of the things I like a lot about 
Julia...
Even with my initial surprise about a single performance issue (the 
building up a string by concatenation), I did NOT judge Julia by that alone,
and have been quite happy with it overall [and I've been converting all of 
the developers at the startup where I'm consulting to Julia fans].
I also have faith, from what I've seen so far, that performance issues 
*will* be addressed, as well as possible considering the architecture and 
goals of the language,
by a number of pretty smart people, both in and outside of the core team.

Scott


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Steven Sagaert


On Friday, May 1, 2015 at 7:23:40 PM UTC+2, Steven G. Johnson wrote:

 On Friday, May 1, 2015 at 1:12:00 PM UTC-4, Steven Sagaert wrote: 

 That wasn't what I was saying. I like the philosophy behind julia. But in 
 practice (as of now) even in julia you still have to code in a certain 
 style if you want very good performance and that's no different than in any 
 other language.


 The goal of Julia is not to be a language in which it is *impossible* to 
 write slow code, or a language in which all programming styles are equally 
 fast. 


I didn't say that was a goal of Julia, but it sure would be nice to have :) 
Probably an impossible dream, though.
 

   The goal (or at least, one of the goals) is to be an expressive, 
 high-level dynamic language, in which it is also *possible* to write 
 performance-critical inner-loop code.

 That *is* different from other high-level languages, in which it is 
 typically *not* possible to write performance-critical inner-loop code 
 without dropping down to a lower-level language (C, Fortran, Cython...).   
 If you are coding exclusively in Python or R, and there isn't an optimized 
 function appropriate for the innermost loops of your task at hand, you are 
 out of luck.

Like I said: I like Julia and I am rooting for it, but just to play devil's 
advocate: I believe it's also a goal (& possibility) of numba to write 
C-level efficient code in Python. All you have to do is add an annotation here 
and there. 


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Jameson Nash
I believe that both are actually very similar in that manner. I think the
main difference comes from the fact that Julia is an attempt to design the
core library to support and use the efficient constructs, while Numba and
other related projects are, for better or worse, inheriting the default
python semantics and built-in libraries.

Sometimes a new language is better than an old language simply because it
can drop compatibility concerns. For example, Java is known for providing
far more consistent multi-threading support than C, since it is a language
construct and not an add-on feature. It was possible in both, one just made
it easier for the programmer to access. Similarly, Node made it feasible to
write programs without any concept of a blocking operation. Again, this was
already possible in languages like Python and C, but Node (with its legacy
in Javascript) made it a feature of the language and designed all of the
core APIs to deal with it.


On Fri, May 1, 2015 at 2:27 PM Steven G. Johnson stevenj@gmail.com
wrote:



 On Friday, May 1, 2015 at 2:04:44 PM UTC-4, Steven Sagaert wrote:

  like I said: I like Julia and I am rooting for it but just to play
  devil's advocate: I believe it's also a goal (& possibility) of numba to
  write C-level efficient code in Python. All you have to do is add an
  annotation here and there.


 Numba is arguably a 2nd lower-level language that happens to be embedded
 in Python — it is telling that Numba's documentation explicitly states that
 it can only get good performance when it is able to JIT the inner loops in
 nopython mode — basically, code that doesn't stray outside a small set of
 types.



Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Steven G. Johnson


On Friday, May 1, 2015 at 2:04:44 PM UTC-4, Steven Sagaert wrote:

 like I said: I like Julia and I am rooting for it but just to play devil's 
 advocate: I believe it's also a goal (& possibility) of numba to write 
 C-level efficient code in Python. All you have to do is add an annotation 
 here and there. 


Numba is arguably a 2nd lower-level language that happens to be embedded in 
Python — it is telling that Numba's documentation explicitly states that it 
can only get good performance when it is able to JIT the inner loops in 
nopython mode — basically, code that doesn't stray outside a small set of 
types.


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Scott Jones


On Friday, May 1, 2015 at 1:25:41 AM UTC-4, Jeff Bezanson wrote:

 It is true that we have not yet done enough to optimize the worst and 
 worse performance cases. The bright side of that is that we have room 
 to improve; it's not that we've run out of ideas and techniques. 

 Tim is right that the complexity of our dispatch system makes julia 
 potentially slower than python. But in dispatch-heavy code I've seen 
 cases where we are faster or slower; it depends. 

 Python's string and dictionary operations, in particular, are really 
 fast. This is not surprising considering what the language was 
 designed for, and that they have a big library of well-tuned C code 
 for these things. 

 I still maintain that it is misleading to describe an *asymptotic* 
 slowdown as 800x slower. If you name a constant factor, it sounds 
 like you're talking about a constant factor slowdown. But the number 
 is arbitrary, because it depends on data size. In theory, of course, 
 an asymptotic slowdown is *much worse* than a constant factor 
 slowdown. However in the systems world constant factors are often more 
 important, and are often what we talk about. 


No, that was just my very first test comparing Julia  Python, using a size 
that matched the record sizes I'd typically seen from way too many years of
benchmarking (database / string processing operations)
 

 You say a lot of the algorithms are O(n) instead of O(1). Are there 
 any examples other than length()? 


Actually, it's worse than that... length, finding a particular character by 
character position, and getting a substring by character position (some of 
the most frequent operations for what I deal with) are O(n) instead of 
O(1), and things like conversions are O(n^2), not O(n) [and the conversions 
are much more complex, due to the string representation in Julia, unlike 
Python 3].
The conversions I am fixing, so that they are not O(n^2) but rather O(n) 
[slower than Python, again because of the representation, but not 
asymptotically worse].
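The O(n) cost comes from UTF-8's variable-width encoding: byte positions are cheap, character positions require a scan. A minimal illustration in modern Julia:

```julia
# Julia strings are UTF-8, so byte-level queries are O(1) while
# character-level queries must walk the string.
s = "αβγδ"                  # 4 characters, 8 bytes (2 bytes per Greek letter)

sizeof(s)                    # 8  -- byte count, O(1)
length(s)                    # 4  -- character count, requires an O(n) scan

# Finding the byte index of the 3rd character also walks the string:
i = nextind(s, 0, 3)         # byte index 5
s[i]                         # 'γ'
```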
The reason they are O(n^2), like the string concatenation problem I ran 
into right when I first started to evaluate Julia, is the way the 
conversion functions are written: initially creating a 0-length array, then 
doing push! to successively add characters to the array, and finally 
calling UTF8String, UTF16String, or UTF32String to convert the 
Vector{UInt8}, Vector{UInt16}, or Vector{Char}, respectively, into an 
immutable string.
As the string grows, Julia's internals end up having to reallocate the 
memory and sometimes copy it to a new location, hence the O(n^2) nature of 
the code.

My changes, which hopefully will be accepted (after I check in my next 
round of pure Julia optimizations), solve that by first validating the 
input UTF-8, UTF-16, or UTF-32 string while also calculating how many 
characters of the different ranges are present, so that the memory can be 
allocated once, at exactly the size needed, and also frequently allowing 
dispatch to simpler conversion code, when it is known that all of the 
characters in the string just need to be widened (zero-extended) or 
narrowed.
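The general shape of that fix (measure first, allocate exactly once) can be sketched in modern Julia syntax; `widen_to_utf32` here is a hypothetical stand-in for illustration, not code from the actual conversion routines:

```julia
# Two-pass pattern: avoids the O(n^2) cost of growing an array with push!
# by allocating the output buffer once, at exactly the needed size.
function widen_to_utf32(s::String)
    # Pass 1: count the characters (a real converter would also validate
    # the input and classify character ranges here).
    n = 0
    for _ in s
        n += 1
    end
    # Pass 2: fill a buffer allocated once at exactly the right size.
    out = Vector{Char}(undef, n)
    i = 1
    for c in s
        out[i] = c
        i += 1
    end
    return out               # fixed-width (UTF-32-style) representation
end
```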

 I disagree that UTF-8 has no space savings over UTF-32 when using the 
 full range of unicode. The reason is that strings often have only a 
 small percentage of non-BMP characters, with lots of spaces and 
 newlines etc. You don't want your whole file to use 4x the space just 
 to use one emoji. 


Please read my statement more carefully...

 UTF-8 *can* take up to 50% more storage than UTF-16 if you are just 
 dealing with BMP characters.
 If you have some field that needs to hold *a certain number of Unicode 
 characters*, for the full range of Unicode,
 you need to allocate 4 bytes for every character, so no savings compared 
 to UTF-16 or UTF-32.


My point was that if you have to allocate a buffer to hold a certain # of 
characters, say because you have a CHAR, NCHAR, or WCHAR, or VARCHAR, etc. 
field from a DBMS,
for UTF-8, you need to allocate at least 4 bytes per character, so no 
savings over UTF-16 or UTF-32 for those operations...

I spent over two years going back and forth to Japan, when I designed (and 
was the main implementor) for the Unicode support of a database system / 
language, and spent a lot of time looking at the just how much storage 
space different representations would take... Note, at that time, Unicode 
2.0 was not out, so the choice was between UCS-2 (no surrogates then), 
UTF-8, some combination thereof, or some new encoding.

My first version, released finally in 1997, used either 8-bit (ANSI Latin 
1) or UCS-2 to store data...  The next release, I came up with a new 
encoding for Unicode, that was much more compact (at the insistence of the 
Japanese customers, who didn't want their storage requirements to increase 
because of moving from SJIS and EUC to Unicode).
In memory, all strings were UCS-2 (or really UTF-16, but like Java, because 
I designed it 

Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Steven Sagaert
I think the performance comparisons between Julia & Python are flawed. They 
seem to be between standard Python & Julia, but since Julia is all about 
scientific programming it really should be between SciPy & Julia. Since 
SciPy uses much of the same underlying libs in Fortran/C, the performance 
gap will be much smaller, and to be really fair it should be between numba 
compiled SciPy code & julia. I suspect the performance will be very close 
then (and close to C performance).

Similarly, the standard benchmark (on the opening page of the julia website) 
between R & julia is also flawed because it takes the best case scenario 
for julia (loops & mutable data structures) & the worst case scenario for R. 
When the same R program is rewritten in vectorised style it beats julia; 
see 
https://matloff.wordpress.com/2014/05/21/r-beats-python-r-beats-julia-anyone-else-wanna-challenge-r/.

So my interest in julia isn't because it is the fastest scientific high 
level language (because clearly at this stage you can't really claim that) 
but because it's a clean, interesting language (still needs work for some 
rough edges, of course) with clean(er) & clear(er) libraries, & that gives 
reasonable performance out of the box without much tweaking. 

On Friday, May 1, 2015 at 12:10:58 AM UTC+2, Scott Jones wrote:

 Yes... Python will win on string processing... esp. with Python 3... I 
 quickly ran into things that were > 800x faster in Python...
 (I hope to help change that though!)

 Scott

 On Thursday, April 30, 2015 at 6:01:45 PM UTC-4, Páll Haraldsson wrote:

 I wouldn't expect a difference in Julia for code like that (didn't 
 check). But I guess what we are often seeing is someone comparing a tuned 
 Python code to newbie Julia code. I still want it faster than that code.. 
 (assuming same algorithm, note row vs. column major caveat).

 The main point of mine, *should* Python at any time win?

 2015-04-30 21:36 GMT+00:00 Sisyphuss zhengw...@gmail.com:

 This post interests me. I'll write something here to follow this post.

  The performance gap between normal code in Python and badly-written code 
  in Julia is something I'd like to know too.
  As far as I know, the Python interpreter does some mysterious optimizations. 
  For example `(x**2)**2` is 100x faster than `x**4`.




 On Thursday, April 30, 2015 at 9:58:35 PM UTC+2, Páll Haraldsson wrote:


 Hi,

 [As a best language is subjective, I'll put that aside for a moment.]

 Part I.

 The goal, as I understand, for Julia is at least within a factor of two 
 of C and already matching it mostly and long term beating that (and C++). 
 [What other goals are there? How about 0.4 now or even 1.0..?]

 While that is the goal as a language, you can write slow code in any 
 language and Julia makes that easier. :) [If I recall, Bezanson mentioned 
 it (the global problem) as a feature, any change there?]


 I've been following this forum for months and newbies hit the same 
 issues. But almost always without fail, Julia can be speed up (easily as 
 Tim Holy says). I'm thinking about the exceptions to that - are there any 
 left? And about the first code slowness (see Part II).

 Just recently the last two flaws of Julia that I could see where fixed: 
 Decimal floating point is in (I'll look into the 100x slowness, that is 
 probably to be expected of any language, still I think may be a 
 misunderstanding and/or I can do much better). And I understand the tuple 
 slowness has been fixed (that was really the only core language defect). 
 The former wasn't a performance problem (mostly a non existence problem 
 and 
 correctness one (where needed)..).


 Still we see threads like this one recent one:

 https://groups.google.com/forum/#!topic/julia-users/-bx9xIfsHHw
 It seems changing the order of nested loops also helps

 Obviously Julia can't beat assembly but really C/Fortran is already 
 close enough (within a small factor). The above row vs. column major 
 (caching effects in general) can kill performance in all languages. 
 Putting 
  that newbie mistake aside, is there any reason Julia can't be within a small 
  factor of assembly (or C) in all cases already?


 Part II.

 Except for caching issues, I still want the most newbie code or 
 intentionally brain-damaged code to run faster than at least 
 Python/scripting/interpreted languages.

 Potential problems (that I think are solved or at least not problems in 
 theory):

 1. I know Any kills performance. Still, isn't that the default in 
 Python (and Ruby, Perl?)? Is there a good reason Julia can't be faster 
 than 
 at least all the so-called scripting languages in all cases (excluding 
 small startup overhead, see below)?

 2. The global issue, not sure if that slows other languages down, say 
 Python. Even if it doesn't, should Julia be slower than Python because of 
 global?

 3. Garbage collection. I do not see that as a problem, incorrect? 
 Mostly performance variability ([3D] games - subject for another post, 
 as 
 I'm not sure 
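
The `Any`/global pitfalls in points 1 and 2 both come down to type inference; a minimal sketch (the names are mine, and the exact speed gap varies by version and hardware):

```julia
# A non-const global has no fixed type, so the compiler must assume Any
# and box every intermediate value in the loop.
total = 0.0
function slow_sum(v)
    global total
    for x in v
        total += x        # type of `total` unknown: dynamic dispatch each step
    end
    return total
end

# Keeping state in a local variable restores type inference: `s` is
# concretely Float64 and the loop compiles to a tight machine loop.
function fast_sum(v)
    s = 0.0
    for x in v
        s += x
    end
    return s
end

v = rand(10^6)
fast_sum(v)   # typically an order of magnitude or more faster than slow_sum(v)
```

Note that `slow_sum` also accumulates into the global across calls, which is a second reason to prefer the local-state version.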

Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Scott Jones


On Friday, May 1, 2015 at 4:25:50 AM UTC-4, Steven Sagaert wrote:

 I think the performance comparisons between Julia & Python are flawed. 
 They seem to be between standard Python & Julia, but since Julia is all 
 about scientific programming it really should be between SciPy & Julia. 
 Since SciPy uses much of the same underlying libs in Fortran/C, the 
 performance gap will be much smaller, and to be really fair it should be 
 between numba-compiled SciPy code & Julia. I suspect the performance will 
 be very close then (and close to C performance).


Why should Julia be limited to scientific programming?
I think it can be a great language for general programming; for the most 
part, I think it already is. (It could use some changes for string handling 
[I'd like to work on that ;-)], decimal floating-point support [that is 
currently being addressed, kudos to Steven G. Johnson], maybe some better 
language constructs to allow better software-engineering practices [that is 
being hotly debated!], and definitely a real debugger [I think Keno is 
working on that].)

Comparing Julia to Python for general computing is totally valid and 
interesting.
Comparing Julia to SciPy for scientific computing is also totally valid and 
interesting.

Similarly, the standard benchmark (on the opening page of the julia website) 
 between R & julia is also flawed, because it takes the best-case scenario 
 for julia (loops & mutable datastructures) & the worst-case scenario for R. 
 When the same R program is rewritten in vectorised style it beats julia; see 
 https://matloff.wordpress.com/2014/05/21/r-beats-python-r-beats-julia-anyone-else-wanna-challenge-r/
 .

 So my interest in julia isn't because it is the fastest scientific 
 high-level language (because clearly at this stage you can't really claim 
 that) but because it's a clean, interesting language (still needs work for 
 some rough edges of course) with clean(er) & clear(er) libraries & that 
 gives reasonable performance out of the box without much tweaking. 



Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Tim Holy
Don't apologize; instead, tell us more about what Go does, and how you think 
things can be better. Those of us who don't know Go will thank you for it.

Best,
--Tim

On Thursday, April 30, 2015 09:42:47 PM Harry B wrote:
 Sorry my comment wasn't well thought out and a bit off topic. On
 exceptions/errors my issue is this
 https://github.com/JuliaLang/julia/issues/7026
 On profiling, I was comparing to Go, but again off topic and I take my
 comment back. I don't have any intelligent remarks to add (yet!) :)
 Thank you for the all the work you are doing.
 
 On Thursday, April 30, 2015 at 7:00:01 PM UTC-7, Tim Holy wrote:
  Harry, I'm curious about 2 of your 3 last points:
  
  On Thursday, April 30, 2015 05:50:15 PM Harry B wrote:
   (exceptions?, debugging, profiling tools)
  
  We have exceptions. What aspect are you referring to?
  Debugger: yes, that's missing, and it's a huge gap.
  Profiling tools: in my view we're doing OK (better than Matlab, in my
  opinion),
  but what do you see as missing?
  
  --Tim
  
   Thanks
   
 It seemed to me tuples were slow because of Any being used. I understand
 tuples have been fixed; I'm not sure how.

 I do not remember the post/all the details. Yes, tuples were slow/er than
 Python. Maybe it was Dict - isn't that kind of a tuple? Now we have Pair in
 0.4. I do not have 0.4; maybe I should bite the bullet and install.. I'm
 not doing anything production related, just trying things out and using
 0.3[.5] to avoid stability problems.. Then I can't judge the speed..

 Another potential issue I saw with tuples (maybe that is not a problem in
 general, and I do not know what languages do this) is that they can take a
 lot of memory (to copy around). I was thinking maybe they should do
 similar to databases: only use a fixed amount of memory (a page) with a
 pointer to overflow data..

 2015-04-30 22:13 GMT+00:00 Ali Rezaee arv@gmail.com:
 They were interesting questions.
 I would also like to know why poorly written Julia code
 sometimes performs worse than similar python code, especially when tuples
 are involved. Did you say it was fixed?


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Scott Jones
I just read through all of that very interesting thread on exceptions... it 
seems that Stefan was trying to reinvent the wheel, without knowing it.

Everybody interested in exception handling should go look up CLU... Julia 
seems to have gotten a lot of ideas from CLU (possibly rather indirectly,
through C++, Java, Lua...).
CLU had this well handled 40 years ago ;-)

Scott

On Friday, May 1, 2015 at 12:42:47 AM UTC-4, Harry B wrote:

 Sorry my comment wasn't well thought out and a bit off topic. On 
 exceptions/errors my issue is this 
 https://github.com/JuliaLang/julia/issues/7026
 On profiling, I was comparing to Go, but again off topic and I take my 
 comment back. I don't have any intelligent remarks to add (yet!) :)
 Thank you for the all the work you are doing. 

 On Thursday, April 30, 2015 at 7:00:01 PM UTC-7, Tim Holy wrote:

 Harry, I'm curious about 2 of your 3 last points: 

 On Thursday, April 30, 2015 05:50:15 PM Harry B wrote: 
  (exceptions?, debugging, profiling tools) 

 We have exceptions. What aspect are you referring to? 
 Debugger: yes, that's missing, and it's a huge gap. 
 Profiling tools: in my view we're doing OK (better than Matlab, in my 
 opinion), 
 but what do you see as missing? 

 --Tim 

  
  Thanks 
  -- 
  Harry 
  

Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Steven Sagaert


On Friday, May 1, 2015 at 12:26:54 PM UTC+2, Scott Jones wrote:



 On Friday, May 1, 2015 at 4:25:50 AM UTC-4, Steven Sagaert wrote:

  I think the performance comparisons between Julia & Python are flawed. 
  They seem to be between standard Python & Julia, but since Julia is all 
  about scientific programming it really should be between SciPy & Julia. 
  Since SciPy uses much of the same underlying libs in Fortran/C, the 
  performance gap will be much smaller, and to be really fair it should be 
  between numba-compiled SciPy code & Julia. I suspect the performance will 
  be very close then (and close to C performance).


 Why should Julia be limited to scientific programming?
 I think it can be a great language for general programming, 


I agree, but for now & the short-term future I think the core domain of 
julia is scientific computing/data science, and so to have fair comparisons 
one should not just compare julia to vanilla Python but especially to SciPy 
& numba.
 

 for the most part, I think it already is (it can use some changes for 
 string handling [I'd like to work on that ;-)], decimal floating point 
 support [that is currently being addressed, kudos to Steven G. Johnson], 
 maybe some better language constructs to allow better software engineering 
 practices [that is being hotly debated!], and definitely a real debugger [I 
 think keno is working on that]).


 Comparing Julia to Python for general computing is totally valid and 
 interesting.
 Comparing Julia to SciPy for scientific computing is also totally valid 
 and interesting.

 Similarly, the standard benchmark (on the opening page of the julia 
  website) between R & julia is also flawed, because it takes the best-case 
  scenario for julia (loops & mutable datastructures) & the worst-case 
  scenario for R. When the same R program is rewritten in vectorised style 
  it beats julia; see 
  https://matloff.wordpress.com/2014/05/21/r-beats-python-r-beats-julia-anyone-else-wanna-challenge-r/
  .

  So my interest in julia isn't because it is the fastest scientific 
  high-level language (because clearly at this stage you can't really claim 
  that) but because it's a clean, interesting language (still needs work for 
  some rough edges of course) with clean(er) & clear(er) libraries & that 
  gives reasonable performance out of the box without much tweaking. 



Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Patrick O'Leary
On Friday, May 1, 2015 at 3:25:50 AM UTC-5, Steven Sagaert wrote:

 I think the performance comparisons between Julia & Python are flawed. 
 They seem to be between standard Python & Julia, but since Julia is all 
 about scientific programming it really should be between SciPy & Julia. 
 Since SciPy uses much of the same underlying libs in Fortran/C, the 
 performance gap will be much smaller, and to be really fair it should be 
 between numba-compiled SciPy code & Julia. I suspect the performance will 
 be very close then (and close to C performance).

 Similarly, the standard benchmark (on the opening page of the julia 
 website) between R & julia is also flawed, because it takes the best-case 
 scenario for julia (loops & mutable datastructures) & the worst-case 
 scenario for R. When the same R program is rewritten in vectorised style 
 it beats julia; see 
 https://matloff.wordpress.com/2014/05/21/r-beats-python-r-beats-julia-anyone-else-wanna-challenge-r/
 .


All benchmarks are flawed in that sense--a single benchmark can't tell you 
everything. The Julia performance benchmarks test algorithms 
expressed in the languages themselves. It is not a test of foreign-function 
interfaces and BLAS implementations, so the benchmarks don't test that. 
This has been discussed at length--as one example, see 
https://github.com/JuliaLang/julia/issues/2412.


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Tim Holy
Hi Steven,

I understand your point---you're saying you'd be unlikely to write those 
algorithms in that manner, if your goal were to do those particular 
computations. But the important point to keep in mind is that those benchmarks 
are simply toys for the purpose of testing performance of various language 
constructs. If you think it's irrelevant to benchmark loops for scientific 
code, then you do very, very different stuff than me. Not all algorithms reduce 
to BLAS calls. I use julia to write all kinds of algorithms that I used to 
write MEX functions for, back in my Matlab days. If all you need is A*b, then 
of course basically any scientific language will be just fine, with minimal 
differences in performance.

Moreover, that R benchmark on cumsum is simply not credible. I'm not sure what 
was happening (and that article doesn't post its code or procedures used to 
test), but julia's cumsum reduces to efficient machine code (basically, a bunch 
of addition operations). If they were computing cumsum across a specific 
dimension, then this PR:
https://github.com/JuliaLang/julia/pull/7359
changed things. But more likely, someone forgot to run the code twice (so it 
got JIT-compiled), had a type-instability in the code they were testing, or 
some other mistake. It's too bad one can make mistakes, of course, but then it 
becomes a comparison of different programmers rather than different programming 
languages.
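
Tim's warm-up point is easy to demonstrate: the first call to a Julia function pays the JIT-compilation cost, so cross-language timings should use a second call. A minimal sketch (the helper name is mine, not from the thread):

```julia
# A simple hand-written cumulative sum - the kind of plain loop
# that compiles down to efficient machine code when types are stable.
function cumsum_demo(v)
    out = similar(v)
    s = zero(eltype(v))
    for i in eachindex(v)
        s += v[i]
        out[i] = s
    end
    return out
end

v = rand(10^6)
@time cumsum_demo(v)   # first call: includes JIT compilation time
@time cumsum_demo(v)   # second call: the number to compare across languages
```

Forgetting the warm-up call, or introducing a type instability (e.g. accumulating into a non-const global), is exactly the kind of mistake that turns a language benchmark into a programmer benchmark.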

Indeed, if you read the comments in that post, Stefan already rebutted that 
benchmark, with a 4x advantage for Julia:
https://matloff.wordpress.com/2014/05/21/r-beats-python-r-beats-julia-anyone-else-wanna-challenge-r/comment-page-1/#comment-89

--Tim



On Friday, May 01, 2015 01:25:50 AM Steven Sagaert wrote:
 I think the performance comparisons between Julia & Python are flawed. They
 seem to be between standard Python & Julia, but since Julia is all about
 scientific programming it really should be between SciPy & Julia. Since
 SciPy uses much of the same underlying libs in Fortran/C, the performance
 gap will be much smaller, and to be really fair it should be between
 numba-compiled SciPy code & Julia. I suspect the performance will be very
 close then (and close to C performance).
 
 Similarly, the standard benchmark (on the opening page of the julia
 website) between R & julia is also flawed, because it takes the best-case
 scenario for julia (loops & mutable datastructures) & the worst-case
 scenario for R. When the same R program is rewritten in vectorised style it
 beats julia; see
 https://matloff.wordpress.com/2014/05/21/r-beats-python-r-beats-julia-anyone-else-wanna-challenge-r/.
 
 So my interest in julia isn't because it is the fastest scientific
 high-level language (because clearly at this stage you can't really claim
 that) but because it's a clean, interesting language (still needs work for
 some rough edges of course) with clean(er) & clear(er) libraries & that
 gives reasonable performance out of the box without much tweaking.
 
 On Friday, May 1, 2015 at 12:10:58 AM UTC+2, Scott Jones wrote:
  Yes... Python will win on string processing... esp. with Python 3... I
  quickly ran into things that were >800x faster in Python...
  (I hope to help change that though!)
  
  Scott
  
  On Thursday, April 30, 2015 at 6:01:45 PM UTC-4, Páll Haraldsson wrote:
  I wouldn't expect a difference in Julia for code like that (didn't
  check). But I guess what we are often seeing is someone comparing a tuned
  Python code to newbie Julia code. I still want it faster than that code..
  (assuming same algorithm, note row vs. column major caveat).
  
  The main point of mine, *should* Python at any time win?
  
  2015-04-30 21:36 GMT+00:00 Sisyphuss zhengw...@gmail.com:
  This post interests me. I'll write something here to follow this post.
  
  The performance gap between normal code in Python and badly-written code
  in Julia is something I'd like to know too.
  As far as I know, the Python interpreter does some mysterious optimizations.
  For example `(x**2)**2` is 100x faster than `x**4`.
  

Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-05-01 Thread Tim Holy
On Friday, May 01, 2015 03:19:03 AM Scott Jones wrote:
 As the string grows, Julia's internals end up having to reallocate the 
 memory and sometimes copy it to a new location, hence the O(n^2) nature of 
 the code.

Small correction: push! is not O(n^2); with geometric growth it is amortized 
O(n) overall. Internally, the storage 
array grows by factors of 2 [1]; after one allocation of size 2n you can add n 
more elements without reallocating, so the total copying work is linear.

That said, even that can be pretty easily beat by a flat two passes (O(2n)): 
make one pass through 
and count how many you'll need, allocate the whole thing, and then stuff in 
elements. As you seem to be planning to do.

--Tim

[1] Last I looked, that is; there was some discussion about switching it to 
something like 1.5 because of various discussions of memory fragmentation and 
reuse.
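
The two-pass strategy Tim describes (count, allocate once, fill) can be sketched as follows; the functions are illustrative, not from the thread, and use current Julia syntax (in 2015-era Julia the allocation would be written `Array(Int, n)`):

```julia
# Growing with push!: each overflow triggers a reallocation plus copy.
function evens_push(v)
    out = Int[]
    for x in v
        iseven(x) && push!(out, x)
    end
    return out
end

# Two passes: count first, allocate exactly once, then fill.
function evens_prealloc(v)
    n = count(iseven, v)             # pass 1: how many will we need?
    out = Vector{Int}(undef, n)      # single allocation of the final size
    k = 0
    for x in v                       # pass 2: stuff in elements
        if iseven(x)
            k += 1
            out[k] = x
        end
    end
    return out
end
```

When the count pass is cheap relative to the element work, the preallocating version avoids all intermediate reallocation and copying; `sizehint!` is a middle ground when the size is only approximately known.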



Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-04-30 Thread Harry B
Sorry my comment wasn't well thought out and a bit off topic. On 
exceptions/errors my issue is this 
https://github.com/JuliaLang/julia/issues/7026
On profiling, I was comparing to Go, but again off topic and I take my 
comment back. I don't have any intelligent remarks to add (yet!) :)
Thank you for the all the work you are doing. 

On Thursday, April 30, 2015 at 7:00:01 PM UTC-7, Tim Holy wrote:

 Harry, I'm curious about 2 of your 3 last points: 

 On Thursday, April 30, 2015 05:50:15 PM Harry B wrote: 
  (exceptions?, debugging, profiling tools) 

 We have exceptions. What aspect are you referring to? 
 Debugger: yes, that's missing, and it's a huge gap. 
 Profiling tools: in my view we're doing OK (better than Matlab, in my 
 opinion), 
 but what do you see as missing? 

 --Tim 

  
  Thanks 
  -- 
  Harry 
  

Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-04-30 Thread Jeff Bezanson
It is true that we have not yet done enough to optimize the worst and
worse performance cases. The bright side of that is that we have room
to improve; it's not that we've run out of ideas and techniques.

Tim is right that the complexity of our dispatch system makes julia
potentially slower than python. But in dispatch-heavy code I've seen
cases where we are faster or slower; it depends.

Python's string and dictionary operations, in particular, are really
fast. This is not surprising considering what the language was
designed for, and that they have a big library of well-tuned C code
for these things.

I still maintain that it is misleading to describe an *asymptotic*
slowdown as 800x slower. If you name a constant factor, it sounds
like you're talking about a constant factor slowdown. But the number
is arbitrary, because it depends on data size. In theory, of course,
an asymptotic slowdown is *much worse* than a constant factor
slowdown. However in the systems world constant factors are often more
important, and are often what we talk about.

You say a lot of the algorithms are O(n) instead of O(1). Are there
any examples other than length()?

I disagree that UTF-8 has no space savings over UTF-32 when using the
full range of unicode. The reason is that strings often have only a
small percentage of non-BMP characters, with lots of spaces and
newlines etc. You don't want your whole file to use 4x the space just
to use one emoji.
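
Jeff's point about encodings is easy to check: `sizeof` reports the UTF-8 byte length, while `length` counts characters (which for UTF-8 is an O(n) scan). A small illustration:

```julia
s = "hello, world"            # pure ASCII: one byte per character in UTF-8
@assert sizeof(s) == 12 && length(s) == 12

t = "hello, \u4e16\u754c"     # 7 ASCII characters plus two CJK characters
@assert sizeof(t) == 13       # 7 bytes + 2 x 3 bytes in UTF-8
@assert length(t) == 9        # a UTF-32 encoding would need 9 x 4 = 36 bytes
```

This is the space trade-off in miniature: mostly-ASCII text with occasional non-BMP characters stays near one byte per character in UTF-8, versus a flat four in UTF-32.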


On Fri, May 1, 2015 at 12:42 AM, Harry B harrysun...@gmail.com wrote:
 Sorry my comment wasn't well thought out and a bit off topic. On
 exceptions/errors my issue is this
 https://github.com/JuliaLang/julia/issues/7026
 On profiling, I was comparing to Go, but again off topic and I take my
 comment back. I don't have any intelligent remarks to add (yet!) :)
 Thank you for the all the work you are doing.

 On Thursday, April 30, 2015 at 7:00:01 PM UTC-7, Tim Holy wrote:

 Harry, I'm curious about 2 of your 3 last points:

 On Thursday, April 30, 2015 05:50:15 PM Harry B wrote:
  (exceptions?, debugging, profiling tools)

 We have exceptions. What aspect are you referring to?
 Debugger: yes, that's missing, and it's a huge gap.
 Profiling tools: in my view we're doing OK (better than Matlab, in my
 opinion),
 but what do you see as missing?

 --Tim

 
  Thanks
  --
  Harry
 

Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-04-30 Thread Páll Haraldsson
I wouldn't expect a difference in Julia for code like that (I didn't check).
But I guess what we are often seeing is someone comparing tuned Python
code to newbie Julia code. I still want Julia faster than that code..
(assuming the same algorithm; note the row- vs. column-major caveat).

My main point: *should* Python ever win?

2015-04-30 21:36 GMT+00:00 Sisyphuss zhengwend...@gmail.com:

 This post interests me. I'll write something here to follow this post.

 The performance gap between normal code in Python and badly-written code
 in Julia is something I'd like to know too.
 As far as I know, the Python interpreter does some mysterious optimizations. For
 example `(x**2)**2` is 100x faster than `x**4`.
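If anyone wants to check that claim, a quick `timeit` sketch (results will vary by CPython version and by whether the base is an int or a float; this only verifies the two expressions agree and measures them, it doesn't assume which one wins):

```python
import timeit

x = 1234.5  # a float base; integer bases take a different code path

t_pow4 = timeit.timeit("x ** 4", globals={"x": x}, number=1_000_000)
t_sq_sq = timeit.timeit("(x ** 2) ** 2", globals={"x": x}, number=1_000_000)

# Both expressions compute the same value (up to rounding) for this base
assert abs((x ** 2) ** 2 - x ** 4) <= 1e-9 * abs(x ** 4)

print(f"x**4:      {t_pow4:.3f}s")
print(f"(x**2)**2: {t_sq_sq:.3f}s")
```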




 On Thursday, April 30, 2015 at 9:58:35 PM UTC+2, Páll Haraldsson wrote:


 Hi,

 [As a best language is subjective, I'll put that aside for a moment.]

 Part I.

 The goal for Julia, as I understand it, is to be at least within a factor of
 two of C (it already mostly matches C), and in the long term to beat it (and
 C++). [What other goals are there? How about for 0.4 now, or even 1.0..?]

 While that is the goal as a language, you can write slow code in any
 language, and Julia makes that easier. :) [If I recall, Bezanson mentioned
 it (the global problem) as a feature; any change there?]


 I've been following this forum for months, and newbies hit the same
 issues. But almost always, without fail, the Julia code can be sped up
 (easily, as Tim Holy says). I'm thinking about the exceptions to that: are
 there any left? And about the first-code slowness (see Part II).

 Just recently the last two flaws of Julia that I could see were fixed:
 Decimal floating point is in (I'll look into the 100x slowness; that is
 probably to be expected of any language, though I still think it may be a
 misunderstanding and/or I can do much better). And I understand the tuple
 slowness has been fixed (that was really the only core-language defect).
 The former wasn't a performance problem (mostly a nonexistence problem, and
 a correctness one (where needed)..).


 Still, we see threads like this recent one:

 https://groups.google.com/forum/#!topic/julia-users/-bx9xIfsHHw
 It seems changing the order of nested loops also helps

 Obviously Julia can't beat assembly, but really C/Fortran is already close
 enough (within a small factor). The above row- vs. column-major issue
 (caching effects in general) can kill performance in all languages. Putting
 that newbie mistake aside, is there any reason Julia can't be within a small
 factor of assembly (or C) in all cases already?
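To make the loop-order point concrete, here is a pure-Python sketch of traversing a flat row-major 2-D buffer in memory order vs. strided order (Python chosen since the thread compares against it; in Julia, arrays are column-major, so the fast order is the opposite, and the gap is much larger in compiled code than in an interpreter, where dispatch overhead dominates). The function names are just for illustration:

```python
import time

N = 2000
a = [float(i) for i in range(N * N)]  # flat row-major 2-D buffer: a[i*N + j]

def sum_contiguous(buf, n):
    # inner index j walks adjacent memory: cache-friendly
    s = 0.0
    for i in range(n):
        base = i * n
        for j in range(n):
            s += buf[base + j]
    return s

def sum_strided(buf, n):
    # inner index i jumps n elements per step: cache-hostile
    s = 0.0
    for j in range(n):
        for i in range(n):
            s += buf[i * n + j]
    return s

t0 = time.perf_counter(); s1 = sum_contiguous(a, N); t1 = time.perf_counter()
s2 = sum_strided(a, N);   t2 = time.perf_counter()
# Same values are summed either way; only rounding order differs
assert abs(s1 - s2) <= 1e-9 * abs(s1)
print(f"contiguous {t1 - t0:.2f}s, strided {t2 - t1:.2f}s")
```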


 Part II.

 Except for caching issues, I still want the most newbie code or
 intentionally brain-damaged code to run faster than at least
 Python/scripting/interpreted languages.

 Potential problems (that I think are solved or at least not problems in
 theory):

 1. I know Any kills performance. Still, isn't that the default in Python
 (and Ruby, Perl?)? Is there a good reason Julia can't be faster than at
 least all the so-called scripting languages in all cases (excluding small
 startup overhead, see below)?

 2. The global issue, not sure if that slows other languages down, say
 Python. Even if it doesn't, should Julia be slower than Python because of
 global?

 3. Garbage collection. I do not see that as a problem; am I incorrect? Mostly
 performance variability ([3D] games are a subject for another post, as I'm
 not sure that is even a problem in theory..). Should reference counting
 (Python) be faster? On the contrary, I think RC and even manual memory
 management could be slower.

 4. Concurrency, see nr. 3. GC may or may not have an issue with it. It
 can be a problem; what about in Julia? There are concurrent and/or real-time
 GC algorithms (just not in Julia). Other than GC, is there any big
 (potential) problem for concurrent/parallel code? I know about the threads
 work and the new GC in 0.4.

 5. Subarrays (array slicing?). Not really what I consider a problem,
 compared to say C (and Python?). I know 0.4 did optimize it, but what
 languages do similar stuff? Functional ones?

 6. In theory, pure functional languages should be faster. Are they in
 practice, in many or any cases? Julia has immutable state if needed, but
 maybe not as powerful? This seems a double-edged sword. I think the Julia
 designers intentionally chose mutable state to conserve memory. Pros and
 cons? Mostly pros for Julia?

 7. Startup time. Python is faster, and for, say, web use, or compared to
 PHP, this could be an issue, but it would be solved by not doing CGI-style
 web. How good/fast are Julia/the libraries right now for, say, web use? At
 least for long-running programs (the intended target of Julia) startup time
 is not an issue.

 8. MPI: I do not know enough about it, or about parallel in general; it
 seems you are doing a good job. I at least think there is no inherent
 limitation. At least Python is not in any way better for parallel/concurrent
 work?

 9. Autoparallel. Julia doesn't try to be, but could (be an addon?). Is
 anyone doing really good and 

Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-04-30 Thread Páll Haraldsson
It seemed to me tuples were slow because Any was used. I understand tuples
have been fixed; I'm not sure how.

I do not remember the post/all the details. Yes, tuples were slow/slower than
Python. Maybe it was Dict; isn't that kind of a tuple? Now we have Pair in
0.4. I do not have 0.4; maybe I should bite the bullet and install.. I'm
not doing anything production-related, just trying things out, and using
0.3[.5] to avoid stability problems.. So I can't judge the speed..

Another potential issue I saw with tuples (maybe it is not a problem in
general, and I do not know whether other languages do this) is that they can
take a lot of memory (to copy around). I was thinking maybe they should do
something similar to databases: only use a fixed amount of memory (a page)
with a pointer to overflow data..

2015-04-30 22:13 GMT+00:00 Ali Rezaee arv.ka...@gmail.com:

 They were interesting questions.
 I would also like to know why poorly written Julia code sometimes performs
 worse than similar Python code, especially when tuples are involved. Did
 you say it was fixed?


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-04-30 Thread Scott Jones


On Thursday, April 30, 2015 at 6:34:23 PM UTC-4, Páll Haraldsson wrote:

 Interesting.. does that mean it is Unicode, then, that is especially faster,
 or something else?

 800x faster is way worse than I thought, and there is no good reason for it..


That particular case is because CPython (the standard C implementation of
Python, and what most people mean when they say Python) has optimized the
case of

var += string

which appends a string to a variable in place.

Although strings *are* immutable in Python, as in Julia, Python detects
that you are replacing a string with that string concatenated with another;
if nobody else has a reference to the string in that variable, it can simply
update the string in place, and otherwise it makes a new string big enough
for the result and sets the variable to that new string.
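A rough way to see the effect Scott describes (a sketch; the in-place resize only kicks in when the interpreter sees a single reference to the target, so timings vary across CPython versions and do not carry over to PyPy; the function names are just for illustration):

```python
import timeit

def concat_inplace(n):
    # CPython can often resize s in place, because its refcount is 1
    s = ""
    for _ in range(n):
        s += "x"
    return s

def concat_join(n):
    # the traditionally recommended O(n) approach
    return "".join("x" for _ in range(n))

n = 100_000
assert concat_inplace(n) == concat_join(n)
print("+= loop:", timeit.timeit(lambda: concat_inplace(n), number=10))
print("join:   ", timeit.timeit(lambda: concat_join(n), number=10))
```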
 

 I'm really intrigued by what is this slow; it can't be simple things like,
 say, just string concatenation?!

 You can get similar speed using PyCall.jl :)


I'm not so sure... I don't really think so, because you still have to move
the string from Julia (which uses either ASCII or UTF-8 for strings by
default; you have to specifically convert them to get UTF-16 or UTF-32...)
to Python, and then back... and Julia's string conversions are rather
slow... O(n^2) in most cases...
(I'm working on improving that; I hope I can get my changes accepted into
Julia's Base)

 For some obscure function like Levenshtein distance I might expect this (or
 not implemented yet in Julia) as Python would use tuned C code, or in any
 function where you need to do non-trivial work per function call.


 I failed to add regex to the list as an example as I was pretty sure it 
 was as fast (or faster, because of macros) as Perl as it is using the same 
 library.

 Similarly for all Unicode/UTF-8 stuff I was not expecting slowness. I know 
 the work on that in Python2/3 and expected Julia could/did similar.


No, a lot of the algorithms are O(n) instead of O(1), because of the
decision to use UTF-8...
I'd like to convince the core team to change Julia to do what Python 3 does.
UTF-8 is pretty bad to use for the internal string representation (where it
shines is as an interchange format).
UTF-8 can take up to 50% more storage than UTF-16 if you are just dealing 
with BMP characters.
If you have some field that needs to hold a certain number of Unicode 
characters, for the full range of Unicode,
you need to allocate 4 bytes for every character, so no savings compared to 
UTF-16 or UTF-32.

Python 3 internally stores strings as one of: 7-bit (ASCII), 8-bit (ANSI
Latin-1, only characters < 0x100 present), 16-bit (UCS-2, i.e. no non-BMP
characters present), or 32-bit (UTF-32).  You might wonder why there is a
special distinction between 7-bit ASCII and 8-bit ANSI Latin-1... they are
both Unicode subsets, but 7-bit ASCII can also be used directly, without
conversion, as UTF-8.
All internal formats are directly addressable (unlike Julia's UTF8String
and UTF16String), and the conversions between the 4 internal types are very
fast: simple widening (or a no-op, as in the case of ASCII to Latin-1) when
going from smaller to larger.

Julia also has a big problem with always wanting a terminating \0 byte or
word, which means that you can't take a substring or slice of another string
without making a copy, to be able to add that terminating \0 (so lots of
extra memory allocation and garbage collection for common algorithms).
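For comparison, the copy-free slicing being alluded to is what Python's `memoryview` gives you over a byte buffer; a sketch of the idea (not how Julia's strings actually work, just an illustration of slicing that shares the underlying buffer instead of copying):

```python
data = bytearray(b"hello, julia-users")

view = memoryview(data)
slice_ = view[7:12]                  # no copy: shares the underlying buffer
assert slice_.tobytes() == b"julia"

data[7:12] = b"JULIA"                # mutate the buffer...
assert slice_.tobytes() == b"JULIA"  # ...and the slice sees the change

# bytes/str slicing, by contrast, copies (and a C-style string would also
# need a terminating \0, which is the copy being discussed above)
copied = bytes(data)[7:12]
```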

I hope that makes things a bit clearer!

Scott


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-04-30 Thread Tim Holy
Harry, I'm curious about 2 of your 3 last points:

On Thursday, April 30, 2015 05:50:15 PM Harry B wrote:
 (exceptions?, debugging, profiling tools)

We have exceptions. What aspect are you referring to?
Debugger: yes, that's missing, and it's a huge gap.
Profiling tools: in my view we're doing OK (better than Matlab),
but what do you see as missing?

--Tim

 
 Thanks
 --
 Harry
 

Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-04-30 Thread Harry B
a newbie comment: if it can be made a bit easier to write code that
uses all the cores (I am comparing to Go with its channels), it probably
doesn't need to be faster than Python.

From an outsider's perspective, @everywhere is inconvenient. pmap etc.
doesn't cover nearly as many cases as Go channels. Maybe it is a
documentation problem.

I wouldn't think it would be good to try to extract every last bit of speed
when you are at 0.4.. there are so many things to clean up/build in the
language and standard library (exceptions?, debugging, profiling tools)

Thanks
--
Harry


Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-04-30 Thread Tim Holy
Strings have long been a performance sore spot in Julia, so we're glad Scott
is hammering on that topic.

For interpreted code (including Julia with Any types), it's very possible
that Python is and will remain faster. For one thing, Python is single-
dispatch, which means that when the interpreter has to go look up the function
corresponding to your next expression, typically the list of applicable
methods is quite short. In contrast, Julia sometimes has to sort through huge
method tables to determine the appropriate one to dispatch to. Multiple
dispatch adds a lot of power to the language, and there's no performance cost
for code that has been compiled, but it does make interpreted code slower.
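Tim's point can be sketched in Python itself: single dispatch is one lookup on the receiver's type, while a toy multiple-dispatch table has to key on the whole tuple of argument types, so it grows multiplicatively with arity (a hypothetical illustration with made-up names, not how Julia's lookup actually works; Julia also matches on subtype relations, which is what makes the search expensive):

```python
# Single dispatch: one lookup on type(self), which is what CPython does
class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return 3.14159 * self.r ** 2

assert abs(Circle(1.0).area() - 3.14159) < 1e-9

# Toy multiple dispatch: the "method table" is keyed by the full tuple of
# argument types, so the table to search grows with every new signature
_methods = {}

def defmethod(name, sig, fn):
    _methods[(name, sig)] = fn

def dispatch(name, *args):
    fn = _methods.get((name, tuple(type(a) for a in args)))
    if fn is None:
        raise TypeError(f"no method {name} for {args!r}")
    return fn(*args)

defmethod("collide", (int, int), lambda a, b: "int/int")
defmethod("collide", (int, float), lambda a, b: "int/float")
defmethod("collide", (float, float), lambda a, b: "float/float")

assert dispatch("collide", 1, 2) == "int/int"
assert dispatch("collide", 1, 2.0) == "int/float"
```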

Best,
--Tim

On Thursday, April 30, 2015 10:34:20 PM Páll Haraldsson wrote:
 Interesting.. does that mean it is Unicode, then, that is especially faster,
 or something else?
 
 800x faster is way worse than I thought, and there is no good reason for it..
 
 I'm really intrigued by what is this slow; it can't be simple things like,
 say, just string concatenation?!
 
 You can get similar speed using PyCall.jl :)
 
 For some obscure function like Levenshtein distance I might expect this (or
 not implemented yet in Julia) as Python would use tuned C code or in any
 function where you need to do non-trivial work per function-call.
 
 
 I failed to add regex to the list as an example as I was pretty sure it was
 as fast (or faster, because of macros) as Perl as it is using the same
 library.
 
 Similarly for all Unicode/UTF-8 stuff I was not expecting slowness. I know
 the work on that in Python2/3 and expected Julia could/did similar.
 
 2015-04-30 22:10 GMT+00:00 Scott Jones scott.paul.jo...@gmail.com:
  Yes... Python will win on string processing... esp. with Python 3... I
  quickly ran into things that were > 800x faster in Python...
  (I hope to help change that though!)
  
  Scott
  
  

Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-04-30 Thread Scott Jones

 On Apr 30, 2015, at 9:58 PM, Tim Holy tim.h...@gmail.com wrote:
 
 Strings have long been a performance sore-spot in julia, so we're glad Scott 
 is hammering on that topic.

Thanks, Tim!  I was beginning to think I’d be banned from all Julia forums, for 
being a thorn in the side of
the Julia developers…
(I do want to say again… if I didn’t think what all of you had created was 
incredibly great, I wouldn’t be so interested
in making it even greater, in the particular areas I know a little about…
Also, the issues I’ve found are not because the developers aren’t brilliant 
[I’ve been super impressed, and I don’t impress
that easily!], but rather, either it’s outside of their area of expertise [as 
the numerical computing stuff is outside mine], or they
are incredibly busy making great strides in the areas that they are more 
interested in…)

 For interpreted code (including Julia with Any types), it's very possible 
 that Python is and will remain faster. For one thing, Python is single-
 dispatch, which means that when the interpreter has to go look up the 
 function 
 corresponding to your next expression, typically the list of applicable 
 methods is quite short. In contrast, julia sometimes has to sort through huge 
 method tables to determine the appropriate one to dispatch to. Multiple 
 dispatch adds a lot of power to the language, and there's no performance cost 
 for code that has been compiled, but it does make interpreted code slower.

Good point…

Scott

Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-04-30 Thread Páll Haraldsson
Interesting.. does that mean it's Unicode handling in particular that is
faster, or something else?

800x faster is way worse than I thought, and there's no good reason for it..

I'm really intrigued: what is this slow? It can't be the simple things like,
say, just string concatenation?!

You can get similar speed using PyCall.jl :)

For some specialized function like Levenshtein distance I might expect this
(or it simply not being implemented yet in Julia), since Python would use tuned
C code; the same goes for any function that does non-trivial work per call.
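(For reference, a minimal pure-Python sketch of the Levenshtein distance mentioned above - the classic dynamic-programming version - to show the kind of per-call work involved. The function name and wrapping are just illustrative; tuned C-backed implementations are much faster:)

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    if len(a) < len(b):
        a, b = b, a                      # keep the rolling row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # prints 3
```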


I failed to add regex to the list as an example, as I was pretty sure it was
as fast as Perl (or faster, because of macros), since it uses the same
library.

Similarly, for all the Unicode/UTF-8 stuff I was not expecting slowness. I know
about the work on that in Python 2/3 and expected Julia could do/did something similar.


2015-04-30 22:10 GMT+00:00 Scott Jones scott.paul.jo...@gmail.com:

 Yes... Python will win on string processing... esp. with Python 3... I
 quickly ran into things that were > 800x faster in Python...
 (I hope to help change that though!)

 Scott

 On Thursday, April 30, 2015 at 6:01:45 PM UTC-4, Páll Haraldsson wrote:

 I wouldn't expect a difference in Julia for code like that (didn't
 check). But I guess what we are often seeing is someone comparing a tuned
 Python code to newbie Julia code. I still want it faster than that code..
 (assuming same algorithm, note row vs. column major caveat).

 The main point of mine, *should* Python at any time win?

 2015-04-30 21:36 GMT+00:00 Sisyphuss zhengw...@gmail.com:

 This post interests me. I'll write something here to follow this post.

 The performance gap between normal code in Python and badly-written code
 in Julia is something I'd like to know too.
 As far as I know, the Python interpreter does some mysterious optimizations.
 For example `(x**2)**2` is 100x faster than `x**4`.




 On Thursday, April 30, 2015 at 9:58:35 PM UTC+2, Páll Haraldsson wrote:


 Hi,

 [As a best language is subjective, I'll put that aside for a moment.]

 Part I.

 The goal for Julia, as I understand it, is to be at least within a factor of two
 of C; it already mostly matches that, and long term the aim is to beat it (and C++).
 [What other goals are there? How about 0.4 now, or even 1.0..?]

 While that is the goal as a language, you can write slow code in any
 language and Julia makes that easier. :) [If I recall, Bezanson mentioned
 it (the global problem) as a feature, any change there?]


 I've been following this forum for months and newbies hit the same
 issues. But almost always, without fail, the Julia code can be sped up (easily, as
 Tim Holy says). I'm thinking about the exceptions to that - are there any
 left? And about the slowness of one's first code (see Part II).

 Just recently the last two flaws of Julia that I could see were fixed:
 Decimal floating point is in (I'll look into the 100x slowness; that is
 probably to be expected of any language, though I still think it may be a
 misunderstanding and/or I can do much better). And I understand the tuple
 slowness has been fixed (that was really the only core-language defect).
 The former wasn't a performance problem (mostly a non-existence problem and a
 correctness one (where needed)..).


 Still, we see threads like this recent one:

 https://groups.google.com/forum/#!topic/julia-users/-bx9xIfsHHw
 It seems changing the order of nested loops also helps

 Obviously Julia can't beat assembly, but really C/Fortran is already
 close enough (within a small factor). The above row- vs. column-major issue
 (caching effects in general) can kill performance in all languages. Putting
 that newbie mistake aside, is there any reason Julia can't be within a small
 factor of assembly (or C) in all cases already?


 Part II.

 Except for caching issues, I still want the most newbie code or
 intentionally brain-damaged code to run faster than at least
 Python/scripting/interpreted languages.

 Potential problems (that I think are solved or at least not problems in
 theory):

 1. I know Any kills performance. Still, isn't that the default in
 Python (and Ruby, Perl?)? Is there a good reason Julia can't be faster than
 at least all the so-called scripting languages in all cases (excluding
 small startup overhead, see below)?

 2. The global issue, not sure if that slows other languages down, say
 Python. Even if it doesn't, should Julia be slower than Python because of
 global?

 3. Garbage collection. I do not see that as a problem - am I incorrect?
 Mostly performance variability ([3D] games - subject for another post, as
 I'm not sure that is even a problem in theory..). Should reference counting
 (Python) be faster? On the contrary, I think RC and even manual memory
 management could be slower.

 4. Concurrency, see nr. 3. GC may or may not have an issue with it. It
 can be a problem, what about in Julia? There are concurrent GC algorithms
 and/or real-time (just not in Julia). Other than GC is there any big
 (potential) problem for concurrent/parallel? I know about the threads work
 

Re: [julia-users] Re: Performance variability - can we expect Julia to be the fastest (best) language?

2015-04-30 Thread Scott Jones
Yes... Python will win on string processing... esp. with Python 3... I 
quickly ran into things that were > 800x faster in Python...
(I hope to help change that though!)

Scott

On Thursday, April 30, 2015 at 6:01:45 PM UTC-4, Páll Haraldsson wrote:

 I wouldn't expect a difference in Julia for code like that (didn't check). 
 But I guess what we are often seeing is someone comparing a tuned Python 
 code to newbie Julia code. I still want it faster than that code.. 
 (assuming same algorithm, note row vs. column major caveat).

 The main point of mine, *should* Python at any time win?

 2015-04-30 21:36 GMT+00:00 Sisyphuss zhengw...@gmail.com javascript::

 This post interests me. I'll write something here to follow this post.

 The performance gap between normal code in Python and badly-written code 
 in Julia is something I'd like to know too.
 As far as I know, the Python interpreter does some mysterious optimizations. 
 For example `(x**2)**2` is 100x faster than `x**4`.
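(Whether that particular optimization holds varies by CPython version and by whether x is an int or a float, so rather than trusting the folklore number, here is a hedged sketch of how one might measure it; the variable values and names are assumptions, not from the thread:)

```python
import timeit

# Illustrative micro-benchmark: time both spellings and compare for yourself.
x = 3.14159
n = 200_000
t_pow4 = timeit.timeit("x ** 4", globals={"x": x}, number=n)
t_sq2 = timeit.timeit("(x ** 2) ** 2", globals={"x": x}, number=n)
print(f"x**4: {t_pow4:.4f}s   (x**2)**2: {t_sq2:.4f}s")
```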




 On Thursday, April 30, 2015 at 9:58:35 PM UTC+2, Páll Haraldsson wrote:


 Hi,

 [As a best language is subjective, I'll put that aside for a moment.]

 Part I.

 The goal for Julia, as I understand it, is to be at least within a factor of two 
 of C; it already mostly matches that, and long term the aim is to beat it (and C++). 
 [What other goals are there? How about 0.4 now, or even 1.0..?]

 While that is the goal as a language, you can write slow code in any 
 language and Julia makes that easier. :) [If I recall, Bezanson mentioned 
 it (the global problem) as a feature, any change there?]


 I've been following this forum for months and newbies hit the same 
 issues. But almost always, without fail, the Julia code can be sped up (easily, as 
 Tim Holy says). I'm thinking about the exceptions to that - are there any 
 left? And about the slowness of one's first code (see Part II).

 Just recently the last two flaws of Julia that I could see were fixed: 
 Decimal floating point is in (I'll look into the 100x slowness; that is 
 probably to be expected of any language, though I still think it may be a 
 misunderstanding and/or I can do much better). And I understand the tuple 
 slowness has been fixed (that was really the only core-language defect). 
 The former wasn't a performance problem (mostly a non-existence problem and a 
 correctness one (where needed)..).


 Still, we see threads like this recent one:

 https://groups.google.com/forum/#!topic/julia-users/-bx9xIfsHHw
 It seems changing the order of nested loops also helps

 Obviously Julia can't beat assembly, but really C/Fortran is already 
 close enough (within a small factor). The above row- vs. column-major issue 
 (caching effects in general) can kill performance in all languages. Putting 
 that newbie mistake aside, is there any reason Julia can't be within a small 
 factor of assembly (or C) in all cases already?
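(The loop-order point can be illustrated even from Python: both traversal orders compute the same answer, but the second touches memory non-contiguously, which is exactly what kills performance in compiled, column-major languages like Julia. The gap is modest in pure Python, and the function names here are illustrative:)

```python
import timeit

n = 200
A = [[i * n + j for j in range(n)] for i in range(n)]  # rows are contiguous lists

def sum_row_major(m):
    s = 0
    for row in m:                # inner loop walks one contiguous row
        for v in row:
            s += v
    return s

def sum_col_major(m):
    s = 0
    for j in range(len(m[0])):   # inner loop hops between rows: strided access
        for i in range(len(m)):
            s += m[i][j]
    return s

assert sum_row_major(A) == sum_col_major(A)  # same answer either way
t_row = timeit.timeit(lambda: sum_row_major(A), number=20)
t_col = timeit.timeit(lambda: sum_col_major(A), number=20)
print(f"row-order: {t_row:.4f}s   column-order: {t_col:.4f}s")
```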


 Part II.

 Except for caching issues, I still want the most newbie code or 
 intentionally brain-damaged code to run faster than at least 
 Python/scripting/interpreted languages.

 Potential problems (that I think are solved or at least not problems in 
 theory):

 1. I know Any kills performance. Still, isn't that the default in Python 
 (and Ruby, Perl?)? Is there a good reason Julia can't be faster than at 
 least all the so-called scripting languages in all cases (excluding small 
 startup overhead, see below)?

 2. The global issue, not sure if that slows other languages down, say 
 Python. Even if it doesn't, should Julia be slower than Python because of 
 global?
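(On the global question: CPython also pays for global access - a name lookup on each use versus an indexed local slot, though newer versions cache it - so locals are typically faster there too; the penalty is just far smaller than Julia's untyped-global cost. A hypothetical micro-benchmark sketch, with invented names:)

```python
import timeit

x = 1.0  # module-level ("global") variable

def with_global(n=50_000):
    s = 0.0
    for _ in range(n):
        s += x           # global name: looked up on every iteration
    return s

def with_local(n=50_000):
    lx = x               # bind once to a local
    s = 0.0
    for _ in range(n):
        s += lx          # local access: indexed slot, typically faster
    return s

assert with_global() == with_local()
t_g = timeit.timeit(with_global, number=20)
t_l = timeit.timeit(with_local, number=20)
print(f"global: {t_g:.4f}s   local: {t_l:.4f}s")
```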

 3. Garbage collection. I do not see that as a problem - am I incorrect? Mostly 
 performance variability ([3D] games - subject for another post, as I'm 
 not sure that is even a problem in theory..). Should reference counting 
 (Python) be faster? On the contrary, I think RC and even manual memory 
 management could be slower.

 4. Concurrency, see nr. 3. GC may or may not have an issue with it. It 
 can be a problem, what about in Julia? There are concurrent GC algorithms 
 and/or real-time (just not in Julia). Other than GC is there any big 
 (potential) problem for concurrent/parallel? I know about the threads work 
 and new GC in 0.4.

 5. Subarrays (array slicing?). Not really what I consider a problem, 
 compared to say C (and Python?). I know 0.4 did optimize it, but what 
 languages do similar stuff? Functional ones?

 6. In theory, pure functional languages should be faster. Are they in 
 practice, in many or any cases? Julia has immutable state if needed, but 
 maybe it's not as powerful? This seems a double-edged sword. I think the Julia 
 designers intentionally chose mutable state to conserve memory. Pros and 
 cons? Mostly pros for Julia?

 7. Startup time. Python is faster, and for, say, web use, or compared to 
 PHP, it could be an issue, but that would be solved by not doing CGI-style web. How 
 good/fast are Julia/the libraries right now for, say, web use? At least for 
 long-running programs (the intended target of Julia) startup time is not an 
 issue.

 8.