Re: More proposed changes across code

2008-10-20 Thread Sean Owen
Ugh, yeah, thinking about that a couple more seconds would have been
useful. That is completely right.

On Mon, Oct 20, 2008 at 3:33 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>> Double.longBitsToDouble(Random.nextLong())
>
>
> NO!!


Re: More proposed changes across code

2008-10-20 Thread Ted Dunning
On Mon, Oct 20, 2008 at 7:00 AM, Sean Owen <[EMAIL PROTECTED]> wrote:

> ..
> If you're looking for a good value to initialize some kind of "max"
> variable to, try {Float,Double}.NEGATIVE_INFINITY. It is smaller than
> every actual float/double.


Try using the first element that you are comparing to!

Anything else is pretty dangerous.


> To generate a truly random float or double, try
>
> Double.longBitsToDouble(Random.nextLong())


NO!!

This will not be the uniformly distributed number that you are probably
wanting.  It will have a very strange distribution that includes
denormalized numbers, NAN's, and infinities of various kinds.

Try gen.nextDouble() where gen is a math.Random or better random number
generator.

Even (double) Random.nextLong() / Long.MAX_VALUE would be better.


-- 
ted


Re: More proposed changes across code

2008-10-20 Thread Sean Owen
PS Looks like Hadoop does have a DoubleWritable now -- I'll switch to
that where approrpriate as part of this pretty-big change list I am
preparing.

Also, while scouring the code I spotted what I believe is a common
misconception that has led to a bug or two: {Double,Float}.MIN_VALUE
is *not* the most negative value that a double/float can take on -- it
is the smallest *positive* value it can take on.

If you're looking for a good value to initialize some kind of "max"
variable to, try {Float,Double}.NEGATIVE_INFINITY. It is smaller than
every actual float/double.

If you really need the most negative value, I think you want the
negative of MAX_VALUE, but not 100% sure on that. Actually be careful
there because in some cases I saw code trying to compute:

(Float.MAX_VALUE - Float.MIN_VALUE)

If you fix this to

(Float.MAX_VALUE - (-Float.MAX_VALUE))

well you see the overflow problem.

To generate a truly random float or double, try

Double.longBitsToDouble(Random.nextLong())

though once again I would have to think about whether this actually
generates NaN or INFINITY in some cases!


Re: More proposed changes across code

2008-10-20 Thread Sean Owen
OK proceeding with these changes but will hold back submitting a bit
for more comments.

On Mon, Oct 20, 2008 at 12:06 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> No cases that I know of where floats actually help in our code and there are
> bound to be places where they hurt.

I agree, I think this is best.

If the reason was just that Hadoop, shockingly, doesn't have a
DoubleWritable, we can write one (and submit to Hadoop if desired) --
but we can also use FloatWritable too! you lose precision, yes, but
just at the end. It doesn't mean floats should be used everywhere.


> Furthermore, I try to follow the Spring philosophy that if the intermediate
> caller doesn't have much of a chance of fixing the problem, then it should
> be an unchecked exception.
>
> Thus, index out of bounds, numerical instability, bad argument and bad
> encoding cases should be runtime exceptions.  Any code where you find people
> repacking exceptions into runtime exceptions to meet external API
> requirements or where exceptions have to be declared but nobody ever catches
> them except the framework are prime candidates for this treatment.

This is a tangent -- I am not going to modify any exceptions -- but I
suppose I follow a somewhat different rule.

I have always understood RuntimeException to be for situations that
should not happen when the an API is called correctly, in a debugged
program. So, NullPointerException, IllegalArgumentException, etc. are
not checked. It is unreasonable to force the caller to tell you what
happens if it's calling an API wrong.

However anything else, any other situation that can come up in a
correctly-programmed bit of code, should be a checked exception. So I
think stuff like an IOException of SQLException are properly checked
exceptions. You can totally correctly call an HTTP API and still fail
to get a web page if the internet is not available. You *should* be
asked to say what happens if your method invocation can't complete
normally -- or else why have exceptions in the first place? So in that
sense I differ with the Spring/Hibernate theory.

... and that said I do not see any RuntimeExceptions which I think
should be checked exceptions! just a tangent.


Re: More proposed changes across code

2008-10-20 Thread deneche abdelhakim



--- En date de : Dim 19.10.08, Grant Ingersoll <[EMAIL PROTECTED]> a écrit :

> De: Grant Ingersoll <[EMAIL PROTECTED]>
> Objet: Re: More proposed changes across code
> À: mahout-dev@lucene.apache.org
> Date: Dimanche 19 Octobre 2008, 18h30
> On Oct 19, 2008, at 11:16 AM, Sean Owen wrote:
> 
> > On Sun, Oct 19, 2008 at 4:07 PM, Grant Ingersoll  
> > <[EMAIL PROTECTED]> wrote:
> >> Doesn't the javadoc tool used @inherit to fill
> in the inherited  
> >> docs when
> >> viewing?
> >
> > Yes... I suppose I find that redundant. The subclass
> method gets
> > documented exactly as the superclass does. It looks
> like the subclass
> > had been explicitly documented, when it hadn't
> been. I think its
> > intent is to copy in documentation and add to it; I am
> thinking only
> > of cases where the javadoc only has a single element,
> [EMAIL PROTECTED]
> >
> >
> >>> 3. UpdatableFloat/Long -- just use Float[1] /
> Long[1]? these classes
> >>> don't seem to be used.
> >>
> >> Hmmm, they were used, but sure that works too.
> >
> > I can't find any usages of these classes, where
> are they?
> 
> Right, they aren't used any longer.  Feel free to
> remove.
> 
> >
> >
> >
> >>> 5. BruteForceTravellingSalesman says
> "copyright Daniel Dwyer" -- can
> >>> this be replaced by the standard copyright
> header?
> >>
> >> No, this is in fact his code, licensed under the
> ASL.  I believe  
> >> the current
> >> way we are handling it is correct.  The original
> code is his, and  
> >> the mods
> >> are ours.
> >
> > Roger that, will leave it. But two notes then...
> > - what about all the other code that game from
> watchmaker? all the
> > classes in the package say they came from watchmaker
> > - I was told that for my stuff, yeah, I still own the
> code/copyright
> > but am licensing a copy to this project, and so it all
> just gets
> > licensed within Mahout according to the boilerplate
> which says
> > "Licensed to the ASF..."
> >
> > I'm not a lawyer and don't want to pick nits
> but I do want to take
> > extra care to get licensing right.
> 
> Right.  I believe the difference is you donated your code
> to the ASF,  
> Daniel has merely published his code under the ASL, but has
> not  
> donated to the ASF.  It's a subtle distinction, I
> suppose.Any of  
> the classes that came from watchmaker should say that,
> although I know  
> many were developed by Deneche for the Watchmaker API.  We
> can go  
> review them again.

In the case of the travellingSalesman example, I modified the original code to 
use Mahout when needed. My own modifications are a couple of lines in two or 
three classes, I included a readme.txt that describes the modified code and 
links to the original one. I replaced all the copyright headers with the 
standard one (I forgot BruteForceTravellingSalesman.java), and added a link to 
the original code in the class comments.
I've been reading the Apache License 2.0, I'm not a lawyer and if I'm not 
mistaken, the "travellingSalesman" code included with Mahout is a "Derivative 
Work" of the original code, so we need to :
. Point "in" the modified files that they have been changed, this files are: 
StrategyPanel.java, TravellingSalesman.java and 
EvolutionaryTravellingSalesman.java.
. because the Watchmaker library contains a NOTICE.TXT file, Mahout must 
include a readable copy of the attribution notices contained within 
Watchmaker's NOTICE file.

__
Do You Yahoo!?
En finir avec le spam? Yahoo! Mail vous offre la meilleure protection possible 
contre les messages non sollicités 
http://mail.yahoo.fr Yahoo! Mail


Re : More proposed changes across code

2008-10-19 Thread deneche abdelhakim
> 5. BruteForceTravellingSalesman says "copyright Daniel
> Dwyer" -- can
> this be replaced by the standard copyright header?

Oups, I tought I changed them all ! Yes you can replace it.

__
Do You Yahoo!?
En finir avec le spam? Yahoo! Mail vous offre la meilleure protection possible 
contre les messages non sollicités 
http://mail.yahoo.fr Yahoo! Mail


Re: More proposed changes across code

2008-10-19 Thread Ted Dunning
On Sun, Oct 19, 2008 at 5:58 AM, Sean Owen <[EMAIL PROTECTED]> wrote:

> 1. serialVersionUID should not be used.


+1


> 2. Javadoc that just read (non-javadoc) with a @see tag, or just has
> [EMAIL PROTECTED], adds no value and can be removed.


+1

I read Grant's comments, but I would rather have an indication that the code
has no javadoc.


> 3. UpdatableFloat/Long -- just use Float[1] / Long[1]? these classes
> don't seem to be used.


I think that these came from code I had for counting things.  long[1] is
probably a bit better in terms of storage, but slightly more opaque and
long-winded.

+1 on the removal.


> 4. floats to double? are there cases where storage is important enough
> to sacrifice precision and CPU time?


+1

No cases that I know of where floats actually help in our code and there are
bound to be places where they hurt.


> 6. I never declare RuntimeException subclasses in a method signature.


+1

Furthermore, I try to follow the Spring philosophy that if the intermediate
caller doesn't have much of a chance of fixing the problem, then it should
be an unchecked exception.

Thus, index out of bounds, numerical instability, bad argument and bad
encoding cases should be runtime exceptions.  Any code where you find people
repacking exceptions into runtime exceptions to meet external API
requirements or where exceptions have to be declared but nobody ever catches
them except the framework are prime candidates for this treatment.


-- 
ted


Re: More proposed changes across code

2008-10-19 Thread Grant Ingersoll


On Oct 19, 2008, at 11:16 AM, Sean Owen wrote:

On Sun, Oct 19, 2008 at 4:07 PM, Grant Ingersoll  
<[EMAIL PROTECTED]> wrote:
Doesn't the javadoc tool used @inherit to fill in the inherited  
docs when

viewing?


Yes... I suppose I find that redundant. The subclass method gets
documented exactly as the superclass does. It looks like the subclass
had been explicitly documented, when it hadn't been. I think its
intent is to copy in documentation and add to it; I am thinking only
of cases where the javadoc only has a single element, [EMAIL PROTECTED]



3. UpdatableFloat/Long -- just use Float[1] / Long[1]? these classes
don't seem to be used.


Hmmm, they were used, but sure that works too.


I can't find any usages of these classes, where are they?


Right, they aren't used any longer.  Feel free to remove.






5. BruteForceTravellingSalesman says "copyright Daniel Dwyer" -- can
this be replaced by the standard copyright header?


No, this is in fact his code, licensed under the ASL.  I believe  
the current
way we are handling it is correct.  The original code is his, and  
the mods

are ours.


Roger that, will leave it. But two notes then...
- what about all the other code that game from watchmaker? all the
classes in the package say they came from watchmaker
- I was told that for my stuff, yeah, I still own the code/copyright
but am licensing a copy to this project, and so it all just gets
licensed within Mahout according to the boilerplate which says
"Licensed to the ASF..."

I'm not a lawyer and don't want to pick nits but I do want to take
extra care to get licensing right.


Right.  I believe the difference is you donated your code to the ASF,  
Daniel has merely published his code under the ASL, but has not  
donated to the ASF.  It's a subtle distinction, I suppose.Any of  
the classes that came from watchmaker should say that, although I know  
many were developed by Deneche for the Watchmaker API.  We can go  
review them again.


-Grant


Re: More proposed changes across code

2008-10-19 Thread Sean Owen
On Sun, Oct 19, 2008 at 4:07 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Doesn't the javadoc tool used @inherit to fill in the inherited docs when
> viewing?

Yes... I suppose I find that redundant. The subclass method gets
documented exactly as the superclass does. It looks like the subclass
had been explicitly documented, when it hadn't been. I think its
intent is to copy in documentation and add to it; I am thinking only
of cases where the javadoc only has a single element, [EMAIL PROTECTED]


>> 3. UpdatableFloat/Long -- just use Float[1] / Long[1]? these classes
>> don't seem to be used.
>
> Hmmm, they were used, but sure that works too.

I can't find any usages of these classes, where are they?


>> 5. BruteForceTravellingSalesman says "copyright Daniel Dwyer" -- can
>> this be replaced by the standard copyright header?
>
> No, this is in fact his code, licensed under the ASL.  I believe the current
> way we are handling it is correct.  The original code is his, and the mods
> are ours.

Roger that, will leave it. But two notes then...
- what about all the other code that game from watchmaker? all the
classes in the package say they came from watchmaker
- I was told that for my stuff, yeah, I still own the code/copyright
but am licensing a copy to this project, and so it all just gets
licensed within Mahout according to the boilerplate which says
"Licensed to the ASF..."

I'm not a lawyer and don't want to pick nits but I do want to take
extra care to get licensing right.


Re: More proposed changes across code

2008-10-19 Thread Grant Ingersoll


On Oct 19, 2008, at 8:58 AM, Sean Owen wrote:


Since I'm back on this topic, here are some items I'd propose to
change. Yeah I know you can say they're small, but they get harder to
fix as time goes on. But nevertheless what do you think of...


1. serialVersionUID should not be used. The default serialization
mechanism errs on the side of considering differing class versions
serialization-incompatble. This is almost always completely fine.
Using serialVersionUID -- and forgetting to update it -- introduces
subtler runtime bugs. I prefer to never use it.

2. Javadoc that just read (non-javadoc) with a @see tag, or just has
[EMAIL PROTECTED], adds no value and can be removed.


Doesn't the javadoc tool used @inherit to fill in the inherited docs  
when viewing?





3. UpdatableFloat/Long -- just use Float[1] / Long[1]? these classes
don't seem to be used.


Hmmm, they were used, but sure that works too.




4. floats to double? are there cases where storage is important enough
to sacrifice precision and CPU time?

5. BruteForceTravellingSalesman says "copyright Daniel Dwyer" -- can
this be replaced by the standard copyright header?


No, this is in fact his code, licensed under the ASL.  I believe the  
current way we are handling it is correct.  The original code is his,  
and the mods are ours.






6. I never declare RuntimeException subclasses in a method signature.
They are not required and do not tell the compiler anything new. They
can be javadoc'ed of course. I don't feel strongly about it but do
slightly prefer to only declare checked exceptions in a signature for
clarity.





More proposed changes across code

2008-10-19 Thread Sean Owen
Since I'm back on this topic, here are some items I'd propose to
change. Yeah I know you can say they're small, but they get harder to
fix as time goes on. But nevertheless what do you think of...


1. serialVersionUID should not be used. The default serialization
mechanism errs on the side of considering differing class versions
serialization-incompatble. This is almost always completely fine.
Using serialVersionUID -- and forgetting to update it -- introduces
subtler runtime bugs. I prefer to never use it.

2. Javadoc that just read (non-javadoc) with a @see tag, or just has
[EMAIL PROTECTED], adds no value and can be removed.

3. UpdatableFloat/Long -- just use Float[1] / Long[1]? these classes
don't seem to be used.

4. floats to double? are there cases where storage is important enough
to sacrifice precision and CPU time?

5. BruteForceTravellingSalesman says "copyright Daniel Dwyer" -- can
this be replaced by the standard copyright header?

6. I never declare RuntimeException subclasses in a method signature.
They are not required and do not tell the compiler anything new. They
can be javadoc'ed of course. I don't feel strongly about it but do
slightly prefer to only declare checked exceptions in a signature for
clarity.