[ 
https://issues.apache.org/jira/browse/MAHOUT-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated MAHOUT-583:
-----------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Looks like someone has already made this change in SequenceFilesFromDirectory.

> Loss some data when create sequence files from directory
> --------------------------------------------------------
>
>                 Key: MAHOUT-583
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-583
>             Project: Mahout
>          Issue Type: Bug
>          Components: Utils
>    Affects Versions: 0.5
>         Environment: All situation
>            Reporter: yumeng
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: abcd.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Some data is lost when creating sequence files from a directory. This happens
> when more than one output chunk file is needed: chunk-0 is created twice, and
> the first chunk-0 file is overwritten by the second. The cause is that the
> name of the second chunk file starts from 0 rather than 1.
> For example, files are created in the sequence chunk-0, chunk-0, chunk-1,
> chunk-2, chunk-3, chunk-*. So the first chunk-0 file is lost whenever more
> than one chunk file is created.
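
The off-by-one described above can be illustrated with a small, self-contained
sketch. This is not Mahout's actual code: the class name ChunkNaming, the
methods buggyNames and fixedNames, and the counter currentChunkId are all
hypothetical, and the sketch only assumes that the first chunk is opened with
index 0 and that each later chunk takes its name from a counter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the chunk-numbering bug; not Mahout's actual code. */
public class ChunkNaming {

    /** Buggy variant: the first rollover reuses index 0 because of the post-increment. */
    static List<String> buggyNames(int chunks) {
        List<String> names = new ArrayList<>();
        int currentChunkId = 0;
        names.add("chunk-" + currentChunkId);        // initial chunk file
        for (int i = 1; i < chunks; i++) {
            names.add("chunk-" + currentChunkId++);  // uses the old value, then increments
        }
        return names;
    }

    /** Fixed variant: the counter is advanced before the next chunk is named. */
    static List<String> fixedNames(int chunks) {
        List<String> names = new ArrayList<>();
        int currentChunkId = 0;
        names.add("chunk-" + currentChunkId);        // initial chunk file
        for (int i = 1; i < chunks; i++) {
            names.add("chunk-" + ++currentChunkId);  // increments first, then uses the new value
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(buggyNames(4));  // prints [chunk-0, chunk-0, chunk-1, chunk-2]
        System.out.println(fixedNames(4));  // prints [chunk-0, chunk-1, chunk-2, chunk-3]
    }
}
```

In the buggy variant, four chunks produce only three distinct file names, so
the contents of the first chunk-0 are overwritten, which matches the data loss
reported in the issue.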

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
