Thanks, Ted. I found a minor bug in the logging logic and posted a revised patch to the ticket.
The next step is to update the LuceneIterable and the Driver so that these new attributes can be configured. I will wait for the patch in MAHOUT-671 to be committed first, though.

Chris

On Apr 19, 2011, at 4:55 PM, Ted Dunning wrote:

Chris,

Can you review the patch I just pushed up that adjusts how much logging is produced.

On Tue, Apr 19, 2011 at 6:25 AM, Christopher Jordan <[email protected]> wrote:

Just to further the point, logging is quite important. While you obviously will not review every log, in a production environment, you certainly will have monitoring scripts check them for ERROR and WARN entries. As well, if you do not want to see the WARN entries from a specific class, you can configure your logger to skip over them.

On Apr 19, 2011, at 12:07 AM, Ted Dunning wrote:

> I disagree. You should document that you are discarding documents. It is
> reasonable to not document every lost document and good to throw an
> exception when too many failures occur.
>
> It is almost inevitable with large data that some inputs are malformed.
> These can't stop the show, but you have to know what your exception rate is
> so you can detect catastrophic failures.
>
> On Mon, Apr 18, 2011 at 6:00 PM, Lance Norskog <[email protected]> wrote:
>
>> Please don't log it. Nobody reads logs.
>> Right is right and wrong is wrong. Either throw an exception or ignore it.
>> You can include a ratio of accepted vectors as an output.
>>
>> On Mon, Apr 18, 2011 at 5:52 PM, Christopher Jordan <[email protected]> wrote:
>>> I have incorporated this requested change in a new patch that I attached to ticket https://issues.apache.org/jira/browse/MAHOUT-675.
>>>
>>> It appears that the previous patch has already been applied. Should I repull the repo, make a new ticket, and create a new patch?
>>>
>>> Thanks,
>>>
>>> Chris
>>>
>>> On Apr 18, 2011, at 1:54 PM, Ted Dunning wrote:
>>>
>>> That sounds right to me.
>>>
>>> It might be plausible to blow an exception if a (configurable) large percentage of all documents have to be rejected. That is a minor improvement, though.
>>>
>>> On Mon, Apr 18, 2011 at 10:52 AM, Christopher Jordan <[email protected]> wrote:
>>> I believe, at least in my situation, a better approach is for the LuceneIterator to log a warning with the idField when it encounters a problem document and move onto the next one.
>>
>> --
>> Lance Norskog
>> [email protected]
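The behavior the thread converges on (warn with the document's id field and skip it, throw once a configurable share of documents has been rejected, and report the accepted ratio) can be sketched as a generic iterator wrapper. This is a minimal sketch under stated assumptions, not the MAHOUT-675 patch: `TolerantIterator`, `maxErrorRate`, `minDocsBeforeCheck`, and `acceptedRatio()` are hypothetical names; converting a Lucene document into a vector is abstracted as a `Function<D, V>` that throws on malformed input; and it logs through `java.util.logging` only to stay dependency-free, where a real patch would use the project's existing logger.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.logging.Logger;

/**
 * Hypothetical sketch: converts documents one at a time, logs and skips the
 * ones that fail, and aborts once the rejected fraction exceeds a limit.
 */
final class TolerantIterator<D, V> implements Iterator<V> {
  private static final Logger LOG = Logger.getLogger(TolerantIterator.class.getName());

  private final Iterator<D> source;
  private final Function<D, V> converter;  // throws RuntimeException on a malformed document
  private final Function<D, String> idOf;  // reads the id field; used only for the warning
  private final double maxErrorRate;       // e.g. 0.05 tolerates up to 5% rejected documents
  private final long minDocsBeforeCheck;   // don't judge the rate on a tiny sample

  private long seen;
  private long rejected;
  private V pending;
  private boolean hasPending;

  TolerantIterator(Iterator<D> source, Function<D, V> converter, Function<D, String> idOf,
                   double maxErrorRate, long minDocsBeforeCheck) {
    this.source = source;
    this.converter = converter;
    this.idOf = idOf;
    this.maxErrorRate = maxErrorRate;
    this.minDocsBeforeCheck = minDocsBeforeCheck;
  }

  @Override
  public boolean hasNext() {
    // Advance past rejected documents until one converts or the source runs out.
    while (!hasPending && source.hasNext()) {
      D doc = source.next();
      seen++;
      try {
        pending = converter.apply(doc);
        hasPending = true;
      } catch (RuntimeException e) {
        rejected++;
        LOG.warning("Skipping document " + idOf.apply(doc) + ": " + e.getMessage());
      }
      // Checked after every document so the limit applies once enough docs are seen.
      checkErrorRate();
    }
    return hasPending;
  }

  @Override
  public V next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    hasPending = false;
    V result = pending;
    pending = null;
    return result;
  }

  /** Fraction of documents converted so far; could be reported as a job output. */
  double acceptedRatio() {
    return seen == 0 ? 1.0 : (double) (seen - rejected) / seen;
  }

  private void checkErrorRate() {
    if (seen >= minDocsBeforeCheck && (double) rejected / seen > maxErrorRate) {
      throw new IllegalStateException(String.format(
          "Rejected %d of %d documents, above the %.1f%% limit",
          rejected, seen, maxErrorRate * 100));
    }
  }
}
```

The minimum sample size keeps a single bad document at the start of an index from aborting the whole run, while the per-document check means a high failure rate is still caught later. Lance's suggestion of exposing the accepted ratio maps onto `acceptedRatio()`, so a caller can log or report it alongside the generated vectors.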
