I disagree. You should document that you are discarding documents. It is reasonable to not document every lost document and good to throw an exception when too many failures occur.
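To make the "too many failures" idea concrete, here is a minimal sketch of a counter that tolerates individual bad documents but throws once the rejected fraction passes a configurable limit. The class name `RejectionTracker`, its parameters, and the choice of `IllegalStateException` are all assumptions for illustration; none of this is existing Mahout or Lucene API.

```java
/**
 * Hypothetical helper (not part of Mahout or Lucene): counts accepted and
 * rejected documents and fails fast when too large a share is rejected.
 */
final class RejectionTracker {
    private final double maxRejectedFraction; // e.g. 0.05 tolerates 5% bad documents
    private final long minSeen;               // no threshold check before this many documents
    private long seen;
    private long rejected;

    RejectionTracker(double maxRejectedFraction, long minSeen) {
        if (maxRejectedFraction < 0.0 || maxRejectedFraction > 1.0) {
            throw new IllegalArgumentException("maxRejectedFraction must be in [0, 1]");
        }
        this.maxRejectedFraction = maxRejectedFraction;
        this.minSeen = minSeen;
    }

    /** Record a document that was converted successfully. */
    void accept() {
        seen++;
    }

    /**
     * Record a discarded document. Throws once enough documents have been
     * seen and the rejected fraction is strictly above the configured limit.
     */
    void reject(String docId, Exception cause) {
        seen++;
        rejected++;
        if (seen >= minSeen && rejectedFraction() > maxRejectedFraction) {
            throw new IllegalStateException(
                "Rejected " + rejected + " of " + seen
                    + " documents; last rejected id: " + docId, cause);
        }
    }

    /** Fraction of documents rejected so far; suitable for reporting as job output. */
    double rejectedFraction() {
        return seen == 0 ? 0.0 : (double) rejected / seen;
    }

    long getSeen() {
        return seen;
    }

    long getRejected() {
        return rejected;
    }
}
```

An iterator could call `accept()` for each good document and `reject(id, e)` from its catch block, then report `rejectedFraction()` at the end of the run. The `minSeen` guard is there so a single bad document at the start of a run does not trip the threshold.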
It is almost inevitable with large data that some inputs are malformed. These can't stop the show, but you have to know what your exception rate is so you can detect catastrophic failures.

On Mon, Apr 18, 2011 at 6:00 PM, Lance Norskog <[email protected]> wrote:

> Please don't log it. Nobody reads logs.
> Right is right and wrong is wrong. Either throw an exception or ignore it.
> You can include a ratio of accepted vectors as an output.
>
> On Mon, Apr 18, 2011 at 5:52 PM, Christopher Jordan <[email protected]> wrote:
> > I have incorporated this requested change in a new patch that I attached
> > to ticket https://issues.apache.org/jira/browse/MAHOUT-675.
> >
> > It appears that the previous patch has already been applied. Should I
> > repull the repo, make a new ticket, and create a new patch?
> >
> > Thanks,
> >
> > Chris
> >
> > On Apr 18, 2011, at 1:54 PM, Ted Dunning wrote:
> >
> > That sounds right to me.
> >
> > It might be plausible to blow an exception if a (configurable) large
> > percentage of all documents have to be rejected. That is a minor
> > improvement, though.
> >
> > On Mon, Apr 18, 2011 at 10:52 AM, Christopher Jordan <[email protected]> wrote:
> > I believe, at least in my situation, a better approach is for the
> > LuceneIterator to log a warning with the idField when it encounters a
> > problem document and move onto the next one.
>
> --
> Lance Norskog
> [email protected]
