Re: More proposed changes across code
Ugh, yeah, thinking about that a couple more seconds would have been useful. That is completely right. On Mon, Oct 20, 2008 at 3:33 PM, Ted Dunning <[EMAIL PROTECTED]> wrote: >> Double.longBitsToDouble(Random.nextLong()) > > > NO!!
Re: More proposed changes across code
On Mon, Oct 20, 2008 at 7:00 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > .. > If you're looking for a good value to initialize some kind of "max" > variable to, try {Float,Double}.NEGATIVE_INFINITY. It is smaller than > every actual float/double. Try using the first element that you are comparing to! Anything else is pretty dangerous. > To generate a truly random float or double, try > > Double.longBitsToDouble(Random.nextLong()) NO!! This will not be the uniformly distributed number that you are probably wanting. It will have a very strange distribution that includes denormalized numbers, NAN's, and infinities of various kinds. Try gen.nextDouble() where gen is a math.Random or better random number generator. Even (double) Random.nextLong() / Long.MAX_VALUE would be better. -- ted
Re: More proposed changes across code
PS Looks like Hadoop does have a DoubleWritable now -- I'll switch to that where approrpriate as part of this pretty-big change list I am preparing. Also, while scouring the code I spotted what I believe is a common misconception that has led to a bug or two: {Double,Float}.MIN_VALUE is *not* the most negative value that a double/float can take on -- it is the smallest *positive* value it can take on. If you're looking for a good value to initialize some kind of "max" variable to, try {Float,Double}.NEGATIVE_INFINITY. It is smaller than every actual float/double. If you really need the most negative value, I think you want the negative of MAX_VALUE, but not 100% sure on that. Actually be careful there because in some cases I saw code trying to compute: (Float.MAX_VALUE - Float.MIN_VALUE) If you fix this to (Float.MAX_VALUE - (-Float.MAX_VALUE)) well you see the overflow problem. To generate a truly random float or double, try Double.longBitsToDouble(Random.nextLong()) though once again I would have to think about whether this actually generates NaN or INFINITY in some cases!
Re: More proposed changes across code
OK proceeding with these changes but will hold back submitting a bit for more comments. On Mon, Oct 20, 2008 at 12:06 AM, Ted Dunning <[EMAIL PROTECTED]> wrote: > No cases that I know of where floats actually help in our code and there are > bound to be places where they hurt. I agree, I think this is best. If the reason was just that Hadoop, shockingly, doesn't have a DoubleWritable, we can write one (and submit to Hadoop if desired) -- but we can also use FloatWritable too! you lose precision, yes, but just at the end. It doesn't mean floats should be used everywhere. > Furthermore, I try to follow the Spring philosophy that if the intermediate > caller doesn't have much of a chance of fixing the problem, then it should > be an unchecked exception. > > Thus, index out of bounds, numerical instability, bad argument and bad > encoding cases should be runtime exceptions. Any code where you find people > repacking exceptions into runtime exceptions to meet external API > requirements or where exceptions have to be declared but nobody ever catches > them except the framework are prime candidates for this treatment. This is a tangent -- I am not going to modify any exceptions -- but I suppose I follow a somewhat different rule. I have always understood RuntimeException to be for situations that should not happen when the an API is called correctly, in a debugged program. So, NullPointerException, IllegalArgumentException, etc. are not checked. It is unreasonable to force the caller to tell you what happens if it's calling an API wrong. However anything else, any other situation that can come up in a correctly-programmed bit of code, should be a checked exception. So I think stuff like an IOException of SQLException are properly checked exceptions. You can totally correctly call an HTTP API and still fail to get a web page if the internet is not available. You *should* be asked to say what happens if your method invocation can't complete normally -- or else why have exceptions in the first place? So in that sense I differ with the Spring/Hibernate theory. ... and that said I do not see any RuntimeExceptions which I think should be checked exceptions! just a tangent.
Re: More proposed changes across code
--- En date de : Dim 19.10.08, Grant Ingersoll <[EMAIL PROTECTED]> a écrit : > De: Grant Ingersoll <[EMAIL PROTECTED]> > Objet: Re: More proposed changes across code > À: mahout-dev@lucene.apache.org > Date: Dimanche 19 Octobre 2008, 18h30 > On Oct 19, 2008, at 11:16 AM, Sean Owen wrote: > > > On Sun, Oct 19, 2008 at 4:07 PM, Grant Ingersoll > > <[EMAIL PROTECTED]> wrote: > >> Doesn't the javadoc tool used @inherit to fill > in the inherited > >> docs when > >> viewing? > > > > Yes... I suppose I find that redundant. The subclass > method gets > > documented exactly as the superclass does. It looks > like the subclass > > had been explicitly documented, when it hadn't > been. I think its > > intent is to copy in documentation and add to it; I am > thinking only > > of cases where the javadoc only has a single element, > [EMAIL PROTECTED] > > > > > >>> 3. UpdatableFloat/Long -- just use Float[1] / > Long[1]? these classes > >>> don't seem to be used. > >> > >> Hmmm, they were used, but sure that works too. > > > > I can't find any usages of these classes, where > are they? > > Right, they aren't used any longer. Feel free to > remove. > > > > > > > > >>> 5. BruteForceTravellingSalesman says > "copyright Daniel Dwyer" -- can > >>> this be replaced by the standard copyright > header? > >> > >> No, this is in fact his code, licensed under the > ASL. I believe > >> the current > >> way we are handling it is correct. The original > code is his, and > >> the mods > >> are ours. > > > > Roger that, will leave it. But two notes then... > > - what about all the other code that game from > watchmaker? all the > > classes in the package say they came from watchmaker > > - I was told that for my stuff, yeah, I still own the > code/copyright > > but am licensing a copy to this project, and so it all > just gets > > licensed within Mahout according to the boilerplate > which says > > "Licensed to the ASF..." > > > > I'm not a lawyer and don't want to pick nits > but I do want to take > > extra care to get licensing right. > > Right. I believe the difference is you donated your code > to the ASF, > Daniel has merely published his code under the ASL, but has > not > donated to the ASF. It's a subtle distinction, I > suppose.Any of > the classes that came from watchmaker should say that, > although I know > many were developed by Deneche for the Watchmaker API. We > can go > review them again. In the case of the travellingSalesman example, I modified the original code to use Mahout when needed. My own modifications are a couple of lines in two or three classes, I included a readme.txt that describes the modified code and links to the original one. I replaced all the copyright headers with the standard one (I forgot BruteForceTravellingSalesman.java), and added a link to the original code in the class comments. I've been reading the Apache License 2.0, I'm not a lawyer and if I'm not mistaken, the "travellingSalesman" code included with Mahout is a "Derivative Work" of the original code, so we need to : . Point "in" the modified files that they have been changed, this files are: StrategyPanel.java, TravellingSalesman.java and EvolutionaryTravellingSalesman.java. . because the Watchmaker library contains a NOTICE.TXT file, Mahout must include a readable copy of the attribution notices contained within Watchmaker's NOTICE file. __ Do You Yahoo!? En finir avec le spam? Yahoo! Mail vous offre la meilleure protection possible contre les messages non sollicités http://mail.yahoo.fr Yahoo! Mail
Re : More proposed changes across code
> 5. BruteForceTravellingSalesman says "copyright Daniel > Dwyer" -- can > this be replaced by the standard copyright header? Oups, I tought I changed them all ! Yes you can replace it. __ Do You Yahoo!? En finir avec le spam? Yahoo! Mail vous offre la meilleure protection possible contre les messages non sollicités http://mail.yahoo.fr Yahoo! Mail
Re: More proposed changes across code
On Sun, Oct 19, 2008 at 5:58 AM, Sean Owen <[EMAIL PROTECTED]> wrote: > 1. serialVersionUID should not be used. +1 > 2. Javadoc that just read (non-javadoc) with a @see tag, or just has > [EMAIL PROTECTED], adds no value and can be removed. +1 I read Grant's comments, but I would rather have an indication that the code has no javadoc. > 3. UpdatableFloat/Long -- just use Float[1] / Long[1]? these classes > don't seem to be used. I think that these came from code I had for counting things. long[1] is probably a bit better in terms of storage, but slightly more opaque and long-winded. +1 on the removal. > 4. floats to double? are there cases where storage is important enough > to sacrifice precision and CPU time? +1 No cases that I know of where floats actually help in our code and there are bound to be places where they hurt. > 6. I never declare RuntimeException subclasses in a method signature. +1 Furthermore, I try to follow the Spring philosophy that if the intermediate caller doesn't have much of a chance of fixing the problem, then it should be an unchecked exception. Thus, index out of bounds, numerical instability, bad argument and bad encoding cases should be runtime exceptions. Any code where you find people repacking exceptions into runtime exceptions to meet external API requirements or where exceptions have to be declared but nobody ever catches them except the framework are prime candidates for this treatment. -- ted
Re: More proposed changes across code
On Oct 19, 2008, at 11:16 AM, Sean Owen wrote: On Sun, Oct 19, 2008 at 4:07 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: Doesn't the javadoc tool used @inherit to fill in the inherited docs when viewing? Yes... I suppose I find that redundant. The subclass method gets documented exactly as the superclass does. It looks like the subclass had been explicitly documented, when it hadn't been. I think its intent is to copy in documentation and add to it; I am thinking only of cases where the javadoc only has a single element, [EMAIL PROTECTED] 3. UpdatableFloat/Long -- just use Float[1] / Long[1]? these classes don't seem to be used. Hmmm, they were used, but sure that works too. I can't find any usages of these classes, where are they? Right, they aren't used any longer. Feel free to remove. 5. BruteForceTravellingSalesman says "copyright Daniel Dwyer" -- can this be replaced by the standard copyright header? No, this is in fact his code, licensed under the ASL. I believe the current way we are handling it is correct. The original code is his, and the mods are ours. Roger that, will leave it. But two notes then... - what about all the other code that game from watchmaker? all the classes in the package say they came from watchmaker - I was told that for my stuff, yeah, I still own the code/copyright but am licensing a copy to this project, and so it all just gets licensed within Mahout according to the boilerplate which says "Licensed to the ASF..." I'm not a lawyer and don't want to pick nits but I do want to take extra care to get licensing right. Right. I believe the difference is you donated your code to the ASF, Daniel has merely published his code under the ASL, but has not donated to the ASF. It's a subtle distinction, I suppose.Any of the classes that came from watchmaker should say that, although I know many were developed by Deneche for the Watchmaker API. We can go review them again. -Grant
Re: More proposed changes across code
On Sun, Oct 19, 2008 at 4:07 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Doesn't the javadoc tool used @inherit to fill in the inherited docs when > viewing? Yes... I suppose I find that redundant. The subclass method gets documented exactly as the superclass does. It looks like the subclass had been explicitly documented, when it hadn't been. I think its intent is to copy in documentation and add to it; I am thinking only of cases where the javadoc only has a single element, [EMAIL PROTECTED] >> 3. UpdatableFloat/Long -- just use Float[1] / Long[1]? these classes >> don't seem to be used. > > Hmmm, they were used, but sure that works too. I can't find any usages of these classes, where are they? >> 5. BruteForceTravellingSalesman says "copyright Daniel Dwyer" -- can >> this be replaced by the standard copyright header? > > No, this is in fact his code, licensed under the ASL. I believe the current > way we are handling it is correct. The original code is his, and the mods > are ours. Roger that, will leave it. But two notes then... - what about all the other code that game from watchmaker? all the classes in the package say they came from watchmaker - I was told that for my stuff, yeah, I still own the code/copyright but am licensing a copy to this project, and so it all just gets licensed within Mahout according to the boilerplate which says "Licensed to the ASF..." I'm not a lawyer and don't want to pick nits but I do want to take extra care to get licensing right.
Re: More proposed changes across code
On Oct 19, 2008, at 8:58 AM, Sean Owen wrote: Since I'm back on this topic, here are some items I'd propose to change. Yeah I know you can say they're small, but they get harder to fix as time goes on. But nevertheless what do you think of... 1. serialVersionUID should not be used. The default serialization mechanism errs on the side of considering differing class versions serialization-incompatble. This is almost always completely fine. Using serialVersionUID -- and forgetting to update it -- introduces subtler runtime bugs. I prefer to never use it. 2. Javadoc that just read (non-javadoc) with a @see tag, or just has [EMAIL PROTECTED], adds no value and can be removed. Doesn't the javadoc tool used @inherit to fill in the inherited docs when viewing? 3. UpdatableFloat/Long -- just use Float[1] / Long[1]? these classes don't seem to be used. Hmmm, they were used, but sure that works too. 4. floats to double? are there cases where storage is important enough to sacrifice precision and CPU time? 5. BruteForceTravellingSalesman says "copyright Daniel Dwyer" -- can this be replaced by the standard copyright header? No, this is in fact his code, licensed under the ASL. I believe the current way we are handling it is correct. The original code is his, and the mods are ours. 6. I never declare RuntimeException subclasses in a method signature. They are not required and do not tell the compiler anything new. They can be javadoc'ed of course. I don't feel strongly about it but do slightly prefer to only declare checked exceptions in a signature for clarity.