Yoav,
 
Thanks for the comments.  See attempt at interspersed responses below.
 
Hola,

>Yes, it would be good to maintain acceptable html in javadoc. Yet, I'd
>like to point out that javadoc isn't Java code. While we would like to
>maintain lots of it to help our users understand it, the library works
>just fine without it.

[Yoav] But if you do have it, it'd be nice if it were in a human-friendly
browsing format, given that it's intended for humans and that most of
them use the HTML JavaDocs ;)

[phil] I agree.  We still have a good bit of work to do here.  Patches welcome :-)
One thing that we do have beyond the package, class and method javadocs is the user
guide, which is nearing completion.

>>5) Is double suitable for these calculations? Should the strictfp flag
>>be used? (I have no idea as to the answer, but I have to ask)
>>
>Neither do I. Can anyone enlighten us?

[Yoav] You probably want strictfp: http://www.jguru.com/faq/view.jsp?EID=17544.

[Phil] I am not sure that we want this, but I am by no means a JVM expert.  From what
I understand, the decision comes down to strict consistency of results across
platforms (mostly involving NaN and other boundary conditions) vs. performance.  In
most practical applications, I would personally be more interested in performance.  It
would be a major PITA (given the way things have to be declared), but I suppose that
in theory we could support both.  I am open to discussion on this, but my vote at this
point would be to release without strictfp support for 1.0.
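
A minimal sketch (hypothetical class name, not actual commons-math code) of what
the declarations would look like -- the modifier has to appear on every class or
method whose results need to be bit-for-bit reproducible across JVMs:

    // Declaring the class strictfp makes every method in it strictfp as well.
    public strictfp class StrictUnivariateStats {

        // Inside a strictfp context, all float/double expressions are
        // evaluated with strict IEEE 754 semantics (no extended-precision
        // intermediates), which is where the potential performance cost
        // on some platforms comes from.
        public double mean(double[] values) {
            double sum = 0.0;
            for (int i = 0; i < values.length; i++) {
                sum += values[i];
            }
            return sum / values.length;
        }
    }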

 
[Yoav] Out of curiosity, why read each url/file twice?

[Phil] Because the implementation is primitive ;-)  The load method of 
EmpiricalDistribution needs to 1) compute basic univariate statistics for the whole 
file and 2) divide the range of values in the file into a predetermined number of 
"bins" and compute univariate statistics for the values in each bin.  The simplest way 
to do this is to pass the data once to do 1), then use the min and max discovered in 
1) to set up the bins and  compute the bin stats in the second pass.  Since the files 
may be large, it is not a good idea to try to load the data into memory during the 
first pass.  A single pass algorithm would have to either dynamically adjust the bins 
(and bin stats) as new extreme values are discovered or take extrema as arguments.  I 
would prefer not to require the extrema to be specified in advance.  The dynamic bin 
adjustment would be hard to do efficiently (at least it seems hard to me -- bright
ideas / patches welcome :-)
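
Roughly, the two-pass idea looks like the sketch below (illustrative code only --
the names and the simple per-bin counters stand in for what EmpiricalDistribution
actually computes):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class TwoPassBinning {
        public static void main(String[] args) throws IOException {
            String file = args[0];

            // Pass 1: stream the file once to get n, min and max (and the
            // other whole-sample statistics) without holding the data in memory.
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            long n = 0;
            BufferedReader in = new BufferedReader(new FileReader(file));
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                double v = Double.parseDouble(line);
                if (v < min) min = v;
                if (v > max) max = v;
                n++;
            }
            in.close();

            // Pass 2: the range is now known, so set up a fixed number of
            // equal-width bins and stream the file again to fill them.
            int binCount = 1000;                   // predetermined bin count
            double binWidth = (max - min) / binCount;
            long[] binCounts = new long[binCount]; // stand-in for per-bin stats
            in = new BufferedReader(new FileReader(file));
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                double v = Double.parseDouble(line);
                int bin = Math.min((int) ((v - min) / binWidth), binCount - 1);
                binCounts[bin]++;
            }
            in.close();
        }
    }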

Phil

 
