Try running with --track-allocation=user and see if it's allocating memory on 
that line. If so, you have a type problem.
http://docs.julialang.org/en/latest/manual/performance-tips/
(2nd and 3rd sections)
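
A minimal sketch of what such a type problem can look like, assuming the increment comes from a non-constant global. The names STEP, CSTEP, count_unstable and count_stable are hypothetical and not taken from crossJoinFilter.jl. Run it as a script (optionally with --track-allocation=user) and compare the two numbers printed at the end; the first loop would be expected to allocate, the second should not.

```julia
# Hypothetical example; none of these names come from the original code.

STEP = 1          # non-constant global: its type may change at any time
const CSTEP = 1   # constant global: its type is known to the compiler

function count_unstable(n)
    i = 0
    k = 0
    while k < n
        i += STEP   # STEP is not const, so i cannot be inferred as Int
        k += 1
    end
    return i
end

function count_stable(n)
    i = 0
    k = 0
    while k < n
        i += CSTEP  # CSTEP is const, so i stays an Int throughout
        k += 1
    end
    return i
end

# Warm up first so that compilation is not counted.
count_unstable(10)
count_stable(10)

# Bytes allocated by each call; the exact figures depend on the Julia version.
println(@allocated count_unstable(10^6))
println(@allocated count_stable(10^6))
```

Both functions return the same count; only the inferred type of i differs, which is what the allocation numbers are meant to expose.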

--Tim


On Thursday, September 18, 2014 10:44:58 AM G. Patrick Mauroy wrote:
> No change.
> I over-typed everything to avoid such type mismatches, particularly when
> experimenting with other integer types.  So unless I missed something
> somewhere, it should not be the case.
> I suspect the compiler does not recognize that the incrementing
> variables should be kept in registers.  Unless it is the inherent speed of
> incrementing, but I doubt it; I had some faster runs at some points...
> 
> On Thursday, September 18, 2014 12:58:12 PM UTC-4, John Myles White wrote:
> > 1 has type Int. If you add it to something with a different type, you
> > might be causing type instability. What happens if you replace the literal
> > 1 with one(T) for the type you're working with?
> > 
> >   -- John
> > 
> > On Sep 18, 2014, at 9:56 AM, G. Patrick Mauroy <gpma...@gmail.com> wrote:
> > 
> > Profiling shows incrementing integers by 1 (i += 1) to be the bottleneck.
> > 
> > Within the same loop are other statements that take much less time.
> > 
> > In my performance-optimizing zeal, I over-typed the hell out of everything
> > to attempt squeezing out performance to the last ounce.
> > Some of this zeal did help in other parts of the code, but now I am
> > struggling to make sense of spending most of the time incrementing by 1.
> > I suspect the problem is over-typing zeal because I seem to recall having
> > a version not so strongly typed that ran consistently 2-3 times faster for
> > default Int (but not for other Int types).  It was late at night so I
> > don't recall the details!
> > 
> > I am pretty confident the increment variables are typed so there should
> > not be any undue cast.
> > 
> > Any idea?
> > 
> > Here is what my code conceptually looks like:
> > 
> >> # Global static type declaration ahead seems to have helped (as opposed
> >> # to deriving from eltype of underlying array at the beginning of the
> >> # function being profiled).
> >> IdType = Int # Int64
> >> DType = Int
> >> function my_fct(dt1, dt2)
> >>   # Convert is for sure unnecessary for default Int types but more
> >>   # rigorous and necessary in some parts of code when experimenting with
> >>   # other IdType & DType types.
> >>   # oneIdType is used to make sure I increment with a value of the proper
> >>   # type, again useless with IdType = Int.
> >>   const oneIdType = convert(IdType, 1)
> >>   const zeroIdType = convert(IdType, 0)
> >>   i::IdType = zeroIdType; i2Match::IdType = zeroIdType
> >>   i2Lower::IdType = zeroIdType; i2Upper::IdType = oneIdType
> >>   ...
> >>     # Critical loop.
> >>     i2Match = i2Lower
> >>     while i2Match < i2Upper
> >>       @inbounds i2MatchD2 = dt2D2[i2Match]
> >>       if i1D <= i2MatchD2
> >>         i += oneIdType # SLOW!
> >>         @inbounds i2MatchD1 = dt2D1[i2Match]
> >>         @inbounds resid1[i] = i1id1
> >>         ...
> >>       end
> >>       i2Match += oneIdType # SLOW!
> >>     end
> >>   ...
> >> end
> > 
> > The undeclared types are 1-dim arrays of the appropriate type -- basically
> > all Int in this configuration.
> > 
> > Enclosed is the full stand-alone code if anyone cares to try.
> > On my machines, one function call takes in the range of 0.05 to 0.1 sec,
> > depending heavily on garbage collection, so profiling with 100 runs is
> > done in about 10 sec.
> > 
> > Thanks.
> > 
> > Patrick
> > 
> > <crossJoinFilter.jl>
