Re: Using email addresses in exception messages

2019-05-13 Thread sebb
On Mon, 13 May 2019 at 18:42, Bill Igoe  wrote:
>
> In a prop shop long ago, I also emailed all exception messages. It is a
> great way to correct logic errors and capture potentially sloppy code.
> Graphing the errors resulted in an exponential decline in errors.
>
> I also developed a handy performance tracker for critical algorithms and
> saved those results to a log file.
>
> Helps a ton.

A 'ton' is the appropriate word - there have been over 100 emails to
the dev@commons list complaining about the same problem.
(They have not been moderated through, as that would not help.)

It's only appropriate to email errors if they are sent to the people
who can do something about the problem.
In this case, that is the website developers who are using Collections
incorrectly.
Furthermore, they have exposed the Exception message to end users,
which means we get the emails direct from end users.

> Cheers to all
>
>
>
>
> On Sun, May 12, 2019 at 3:22 PM Emmanuel Bourg  wrote:
>
> > On 12/05/2019 at 14:25, Gary Gregory wrote:
> > > +1 to removing email addresses from exception messages. We should do a
> > > pass over all of Commons.
> >
> > +1, makes sense.
> >
> > Emmanuel Bourg
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >




Re: Using email addresses in exception messages

2019-05-13 Thread Gary Gregory
You could do that with Log4j today ;-)

Gary

On Mon, May 13, 2019, 13:42 Bill Igoe  wrote:

> In a prop shop long ago, I also emailed all exception messages. It is a
> great way to correct logic errors and capture potentially sloppy code.
> Graphing the errors resulted in an exponential decline in errors.
>
> I also developed a handy performance tracker for critical algorithms and
> saved those results to a log file.
>
> Helps a ton.
>
> Cheers to all
>
>
>
>
> On Sun, May 12, 2019 at 3:22 PM Emmanuel Bourg  wrote:
>
> > On 12/05/2019 at 14:25, Gary Gregory wrote:
> > > +1 to removing email addresses from exception messages. We should do a
> > > pass over all of Commons.
> >
> > +1, makes sense.
> >
> > Emmanuel Bourg
> >
> >
> >
>


Re: Using email addresses in exception messages

2019-05-13 Thread Bill Igoe
In a prop shop long ago, I also emailed all exception messages. It is a
great way to correct logic errors and capture potentially sloppy code.
Graphing the errors resulted in an exponential decline in errors.

I also developed a handy performance tracker for critical algorithms and
saved those results to a log file.

Helps a ton.

Cheers to all




On Sun, May 12, 2019 at 3:22 PM Emmanuel Bourg  wrote:

> On 12/05/2019 at 14:25, Gary Gregory wrote:
> > +1 to removing email addresses from exception messages. We should do a
> > pass over all of Commons.
>
> +1, makes sense.
>
> Emmanuel Bourg
>
>
>


Re: [rng] RNG-101 new MarsagliaTsangWang discrete probability sampler

2019-05-13 Thread Alex Herbert


> On 11 May 2019, at 23:25, Alex Herbert  wrote:
> 
> 
> 
>> On 11 May 2019, at 22:58, Gilles Sadowski wrote:
>> 
>> On Sat, 11 May 2019 at 23:32, Alex Herbert wrote:
>>> 
>>> 
>>> 
>>>> On 10 May 2019, at 15:07, Gilles Sadowski wrote:
>>>> 
>>>> Hi.
>>>> 
>>>> On Fri, 10 May 2019 at 15:53, Alex Herbert wrote:
>>>>> 
>>>>> 
>>>>> On 10/05/2019 14:27, Gilles Sadowski wrote:
>>>>>> Hi Alex.
>>>>>> 
>>>>>> On Fri, 10 May 2019 at 13:57, Alex Herbert wrote:
>>>>>>> Can I get a review of the PR for RNG-101 please.
>>>>>> Thanks for this work!
>>>>>> 
>>>>>> I didn't go into the details; however, I see many fields and methods like
>>>>>> table1 ... table5
>>>>>> fillTable1 ... fillTable5
>>>>>> getTable1 ... getTable5
>>>>>> Wouldn't it be possible to use a 2D table:
>>>>>> table[5][];
>>>>>> so that e.g. only one "fillTable(int tableIndex, /* other args */)" method
>>>>>> is necessary (where "tableIndex" runs from 0 to 4)?
>>>>> 
>>>>> Yes. The design is based around using 5 tables as per the example code.
>>>>> 
>>>>> The sample() method knows which table it needs so it can directly jump
>>>>> to the table in question. I'd have to look at the difference in speed
>>>>> when using a 2D table as you are adding another array access but
>>>>> reducing the number of possible method calls (although you still need a
>>>>> method call). Maybe this will be optimised out by the JVM.
>>>>> 
>>>>> If the speed is not a factor then I'll rewrite it. Otherwise it's
>>>>> probably better done for speed as this is the entire point of the
>>>>> sampler given it disregards any probability under 2^-31 (i.e. it's not a
>>>>> perfectly fair sampler).
>>>>> 
>>>>> Note that 5 tables are needed for 5 digits (base 2^6). The paper
>>>>> states that using 3 tables of base 2^10 gives a speed increase
>>>>> (roughly 1.16x) at the cost of storage (roughly 9x). Changing to 2
>>>>> tables of base 2^15 does not make it much faster again.
>>>>> 
>>>>> I'll have a rethink to see if I can make the design work for different
>>>>> base sizes.
>>>> 
>>>> That could be an extension made easier with the 2D table, but
>>>> I quite agree that given the relatively minor speed improvement
>>>> to be expected, it is not the main reason; the rationale was just to
>>>> make the code more compact and a little easier to grasp (IMHO).
>>>> 
>>>> Gilles
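
For readers following the 2D-table suggestion above, here is a minimal sketch of what a single fillTable(int tableIndex, ...) plus a loop-based lookup might look like. It is not the RNG-101 code. The class name TwoDimTableSampler, the lookup() method, and the use of java.util.Random in place of UniformRandomProvider are assumptions made to keep the example self-contained, and the probabilities are assumed to arrive already converted to 30-bit integers that sum to exactly 2^30.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of the 2D-table layout; names are illustrative only. */
final class TwoDimTableSampler {
    private static final int DIGITS = 5;             // five base-2^6 digits = 30 bits
    private static final int BITS = 6;
    private static final int MASK = (1 << BITS) - 1;

    private final int[][] table = new int[DIGITS][]; // replaces table1 ... table5
    private final int[] limit = new int[DIGITS];     // cumulative upper bound per table

    /** @param prob30 probabilities as 30-bit integers summing to exactly 2^30 (assumed input form) */
    TwoDimTableSampler(int[] prob30) {
        long sum = 0;
        for (int p : prob30) {
            if (p < 0) {
                throw new IllegalArgumentException("negative probability");
            }
            sum += p;
        }
        if (sum != 1L << 30) {
            throw new IllegalArgumentException("probabilities must sum to 2^30");
        }
        int offset = 0;
        for (int k = 0; k < DIGITS; k++) {
            table[k] = fillTable(k, prob30);
            offset += table[k].length << shift(k);
            limit[k] = offset;
        }
    }

    /** Bit shift that isolates the digit used by the given table: 24, 18, 12, 6, 0. */
    private static int shift(int tableIndex) {
        return BITS * (DIGITS - 1 - tableIndex);
    }

    /** Base-2^6 digit of p for the given table; the top digit is left unmasked so p = 2^30 works. */
    private static int digit(int p, int tableIndex) {
        int d = p >>> shift(tableIndex);
        return tableIndex == 0 ? d : d & MASK;
    }

    /** One method for all tables: outcome i is repeated digit(prob30[i]) times. */
    private static int[] fillTable(int tableIndex, int[] prob30) {
        int size = 0;
        for (int p : prob30) {
            size += digit(p, tableIndex);
        }
        int[] t = new int[size];
        int pos = 0;
        for (int i = 0; i < prob30.length; i++) {
            int d = digit(prob30[i], tableIndex);
            Arrays.fill(t, pos, pos + d, i);
            pos += d;
        }
        return t;
    }

    /** Maps a 30-bit integer j to an outcome index. */
    int lookup(int j) {
        if (j < 0 || j >= 1 << 30) {
            throw new IllegalArgumentException("j must be in [0, 2^30)");
        }
        int lower = 0;
        for (int k = 0; k < DIGITS; k++) {
            if (j < limit[k]) {
                return table[k][(j - lower) >>> shift(k)];
            }
            lower = limit[k];
        }
        throw new IllegalStateException("unreachable: the limits end at 2^30");
    }

    int sample(Random rng) {
        return lookup(rng.nextInt() >>> 2);          // keep the top 30 bits
    }
}
```

The loop in lookup() adds an extra array access per table compared with five dedicated fields. That extra access is the cost weighed in this thread, where the measurements reported later found the 2D layout slower, so the sketch is about compactness rather than speed.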
>>> 
>>> I’ve taken a more extensive look at the implications of changing the 
>>> implementation of the algorithm. This tested: 1D or 2D tables; 
>>> interfaced storage to dynamic table types; base 2^6 or base 2^10 for the 
>>> algorithm; and allowing the base to be chosen. Results are in the Jira 
>>> ticket. Basically, 2D arrays are slower, and supporting choices for the 
>>> backing storage or base of the algorithm is slower.
>>> 
>>> Supporting the Poisson and Binomial samplers requires only 16-bit storage. 
>>> So a dedicated sampler using base 2^6 and short for the tables will be the 
>>> best compromise between storage space and speed. The base 2^10 sampler is 
>>> faster but takes about 9-10x more space in memory.
>>> 
>>> Note I originally wrote the sampler to use only 16-bit storage. I then 
>>> modified it to use dynamic storage without measuring performance. And so I 
>>> made it slightly slower.
>>> 
>>> The question is does the library even need to have a 32-bit storage 
>>> implementation? This would be used for a probability distribution with more 
>>> than 2^16 different possible samples. I think this would be an edge case. 
>>> Here the memory requirements will be in the tens of MB at a minimum but may 
>>> balloon to become much larger. In this case a different algorithm such as 
>>> the Alias method or a guide table is more memory stable as it only requires 
>>> 12 bytes of storage per index, irrespective of the shape of the probability 
>>> distribution.
>>> 
>>> If different implementations (of this algorithm) are added to the library 
>>> then the effect of using a sampler that dynamically chooses the storage 
>>> space and/or base for the algorithm is noticeable in the performance. In 
>>> this case these would be better served using a factory:
>>> 
>>> class DiscreteProbabilitySamplerFactory {
>>>   DiscreteSampler createDiscreteProbabilitySampler(UniformRandomProvider, 
>>> double[])
>>> }
>>> 
>>> But if specifically targeting this algorithm it could be:
>>> 
>>> class MarsagliaTsangWangDiscreteProbabilitySamplerFactory {
>>>   DiscreteSampler createDiscreteProbabilitySampler(UniformRandomProvider, 
>>> double[], boolean useBase10)
>>> }
>>> 
>>> Or something similar. The user can then choose to use a base 2^10 algorithm 
>>> if memory is not a concern.
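
As an illustration of the storage choice such a factory might make, here is a minimal sketch that picks 16-bit or 32-bit backing storage from the number of possible samples. None of these names (IndexStorage, ShortStorage, IntStorage, StorageFactory) come from the PR; they are hypothetical, and the 2^16 threshold is taken from the discussion above.

```java
/** Hypothetical storage abstraction; not part of the RNG-101 PR. */
interface IndexStorage {
    int get(int position);
    void set(int position, int value);
}

/** 16-bit storage: enough for up to 2^16 distinct sample values. */
final class ShortStorage implements IndexStorage {
    private final short[] data;

    ShortStorage(int size) {
        data = new short[size];
    }

    @Override
    public int get(int position) {
        return data[position] & 0xffff;   // read back as an unsigned 16-bit value
    }

    @Override
    public void set(int position, int value) {
        data[position] = (short) value;
    }
}

/** 32-bit storage for distributions with more than 2^16 possible samples. */
final class IntStorage implements IndexStorage {
    private final int[] data;

    IntStorage(int size) {
        data = new int[size];
    }

    @Override
    public int get(int position) {
        return data[position];
    }

    @Override
    public void set(int position, int value) {
        data[position] = value;
    }
}

final class StorageFactory {
    private StorageFactory() {}

    /** Chooses the narrowest storage that can hold every sample index. */
    static IndexStorage create(int outcomes, int tableSize) {
        if (outcomes <= 0 || tableSize < 0) {
            throw new IllegalArgumentException("invalid size");
        }
        return outcomes <= (1 << 16) ? new ShortStorage(tableSize) : new IntStorage(tableSize);
    }
}
```

Making the choice once, at construction time, keeps the decision out of the sampling loop. Every table read still goes through the interface, though, which is the kind of indirection the measurements above found to be slower than a dedicated 16-bit sampler.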
>>> 
>>> I am wary of making this too complicated for just this sampler. So I would 
>>> vote for ignoring the base 2^10 version and sticking to the interfaced 
>>> storage implementation in the current PR or reverting back to the 16-bit 
>>> storage and not