Chris, can you review the patch I just pushed up that adjusts how much logging is produced?
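
For reference, here is a rough Java sketch of the behaviour discussed in the thread below: warn with the document id for a problem document, cap the number of warnings, report the accepted ratio, and throw once a configurable share of documents has been rejected. It is only an illustration under assumptions, not the code in the patch. The class `SkippingReader`, the `loader` callback (which returns null for a problem document), and the `maxSkippedRatio` and `maxWarnings` parameters are all hypothetical names and are not part of Mahout's LuceneIterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical helper, not Mahout code: skips problem documents, logs a
// bounded number of warnings, and fails if too many documents are skipped.
final class SkippingReader {
  private static final Logger LOG = Logger.getLogger(SkippingReader.class.getName());

  /** Accepted items plus the number of documents that were skipped. */
  public static final class Result<T> {
    public final List<T> accepted;
    public final int skipped;

    Result(List<T> accepted, int skipped) {
      this.accepted = accepted;
      this.skipped = skipped;
    }

    /** Fraction of input documents that were accepted (1.0 for empty input). */
    public double acceptedRatio() {
      int total = accepted.size() + skipped;
      return total == 0 ? 1.0 : (double) accepted.size() / total;
    }
  }

  /**
   * Loads every id with the (assumed) loader. A null result marks a problem
   * document: it is counted, and the first maxWarnings of them are logged.
   */
  public static <T> Result<T> readAll(List<String> ids, Function<String, T> loader,
                                      double maxSkippedRatio, int maxWarnings) {
    List<T> accepted = new ArrayList<>();
    int skipped = 0;
    for (String id : ids) {
      T value = loader.apply(id);
      if (value != null) {
        accepted.add(value);
        continue;
      }
      skipped++;
      if (skipped <= maxWarnings) {
        LOG.warning("Skipping problem document, id=" + id);
      }
    }
    // Fail only after the whole input is seen, so the ratio is meaningful.
    if (skipped > maxSkippedRatio * ids.size()) {
      throw new IllegalStateException(
          skipped + " of " + ids.size() + " documents were skipped");
    }
    return new Result<>(accepted, skipped);
  }
}
```

A caller would pass something like `readAll(ids, loader, 0.05, 100)` and could write `acceptedRatio()` to the job output, so the exception rate is visible even when nobody reads the log.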

On Tue, Apr 19, 2011 at 6:25 AM, Christopher Jordan <[email protected]> wrote:

> Just to further the point, logging is quite important. While you obviously
> will not review every log, in a production environment, you certainly will
> have monitoring scripts check them for ERROR and WARN entries. As well, if
> you do not want to see the WARN entries from a specific class, you can
> configure your logger to skip over them.
>
> On Apr 19, 2011, at 12:07 AM, Ted Dunning wrote:
>
> > I disagree. You should document that you are discarding documents. It is
> > reasonable to not document every lost document and good to throw an
> > exception when too many failures occur.
> >
> > It is almost inevitable with large data that some inputs are malformed.
> > These can't stop the show, but you have to know what your exception rate is
> > so you can detect catastrophic failures.
> >
> > On Mon, Apr 18, 2011 at 6:00 PM, Lance Norskog <[email protected]> wrote:
> >
> >> Please don't log it. Nobody reads logs.
> >> Right is right and wrong is wrong. Either throw an exception or ignore it.
> >> You can include a ratio of accepted vectors as an output.
> >>
> >> On Mon, Apr 18, 2011 at 5:52 PM, Christopher Jordan <[email protected]>
> >> wrote:
> >>> I have incorporated this requested change in a new patch that I attached
> >>> to ticket https://issues.apache.org/jira/browse/MAHOUT-675.
> >>>
> >>> It appears that the previous patch has already been applied. Should I
> >>> repull the repo, make a new ticket, and create a new patch?
> >>>
> >>> Thanks,
> >>>
> >>> Chris
> >>>
> >>> On Apr 18, 2011, at 1:54 PM, Ted Dunning wrote:
> >>>
> >>> That sounds right to me.
> >>>
> >>> It might be plausible to blow an exception if a (configurable) large
> >>> percentage of all documents have to be rejected. That is a minor
> >>> improvement, though.
> >>>
> >>> On Mon, Apr 18, 2011 at 10:52 AM, Christopher Jordan <[email protected]>
> >>> wrote:
> >>> I believe, at least in my situation, a better approach is for the
> >>> LuceneIterator to log a warning with the idField when it encounters a
> >>> problem document and move onto the next one.
> >>
> >> --
> >> Lance Norskog
> >> [email protected]
