Thanks Ted,

I found a minor bug in the logging logic and posted a revised patch to the 
ticket.

I think the next step here is to update the LuceneIterable and the Driver to 
allow for configuration of these new attributes. I will wait for the patch in 
MAHOUT-671 to be committed first, though.

Chris

On Apr 19, 2011, at 4:55 PM, Ted Dunning wrote:

Chris,

Can you review the patch I just pushed up that adjusts how much logging is 
produced?

On Tue, Apr 19, 2011 at 6:25 AM, Christopher Jordan 
<[email protected]<mailto:[email protected]>> wrote:
Just to further the point, logging is quite important. While you obviously will 
not review every log, in a production environment you will certainly have 
monitoring scripts that check them for ERROR and WARN entries. Also, if you do 
not want to see the WARN entries from a specific class, you can configure your 
logger to skip them.
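
For reference, here is a minimal sketch of the behaviour discussed in this 
thread: warn with the document id when a document has to be skipped, keep a 
count, and fail once the rejected fraction passes a configurable limit. It is 
not the code in the patch. The class name SkippingConverter, its parameters, 
and the convention that the converter returns null for a problem document are 
all assumptions made for illustration, and it uses java.util.logging and 
Java 8 lambdas rather than the project's own setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.logging.Logger;

/** Hypothetical helper (not Mahout code): skip bad documents, log them, fail past a limit. */
class SkippingConverter<D, V> {
  private static final Logger LOG = Logger.getLogger(SkippingConverter.class.getName());

  private final Function<D, V> convert;    // assumed to return null for a problem document
  private final Function<D, String> idOf;  // stands in for reading the idField
  private final double maxErrorRate;       // e.g. 0.05 fails once more than 5% are rejected
  private final int minSeen;               // do not judge the rate on a tiny sample
  private int seen;
  private int skipped;

  SkippingConverter(Function<D, V> convert, Function<D, String> idOf,
                    double maxErrorRate, int minSeen) {
    this.convert = convert;
    this.idOf = idOf;
    this.maxErrorRate = maxErrorRate;
    this.minSeen = minSeen;
  }

  /** Converts every usable document; throws if too large a fraction is rejected. */
  List<V> convertAll(Iterable<D> docs) {
    List<V> out = new ArrayList<>();
    for (D doc : docs) {
      seen++;
      V value = convert.apply(doc);
      if (value == null) {
        skipped++;
        // One WARN per skipped document, carrying the id so it can be traced later.
        LOG.warning("Skipping document " + idOf.apply(doc) + ": no usable vector");
        if (seen >= minSeen && (double) skipped / seen > maxErrorRate) {
          throw new IllegalStateException(
              skipped + " of " + seen + " documents rejected, limit is " + maxErrorRate);
        }
        continue;
      }
      out.add(value);
    }
    return out;
  }

  int getSeen() {
    return seen;
  }

  int getSkipped() {
    return skipped;
  }
}
```

The counters are exposed so that a caller can also report the ratio of 
accepted documents as an output, which covers Lance's suggestion as well.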

On Apr 19, 2011, at 12:07 AM, Ted Dunning wrote:

> I disagree.  You should document that you are discarding documents.  It is
> reasonable to not document every lost document and good to throw an
> exception when too many failures occur.
>
> It is almost inevitable with large data that some inputs are malformed.
> These can't stop the show, but you have to know what your exception rate is
> so you can detect catastrophic failures.
>
> On Mon, Apr 18, 2011 at 6:00 PM, Lance Norskog 
> <[email protected]<mailto:[email protected]>> wrote:
>
>> Please don't log it. Nobody reads logs.
>> Right is right and wrong is wrong. Either throw an exception or ignore it.
>> You can include a ratio of accepted vectors as an output.
>>
>> On Mon, Apr 18, 2011 at 5:52 PM, Christopher Jordan 
>> <[email protected]<mailto:[email protected]>>
>> wrote:
>>> I have incorporated this requested change in a new patch that I attached
>> to ticket https://issues.apache.org/jira/browse/MAHOUT-675.
>>>
>>> It appears that the previous patch has already been applied. Should I
>> repull the repo, make a new ticket, and create a new patch?
>>>
>>> Thanks,
>>>
>>> Chris
>>>
>>> On Apr 18, 2011, at 1:54 PM, Ted Dunning wrote:
>>>
>>> That sounds right to me.
>>>
>>> It might be plausible to blow an exception if a (configurable) large
>> percentage of all documents have to be rejected.  That is a minor
>> improvement, though.
>>>
>>> On Mon, Apr 18, 2011 at 10:52 AM, Christopher Jordan 
>>> <[email protected]<mailto:[email protected]>> wrote:
>>> I believe, at least in my situation, a better approach is for the
>> LuceneIterator to log a warning with the idField when it encounters a
>> problem document and move on to the next one.
>>>
>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> [email protected]<mailto:[email protected]>
>>


