You can try this patch. Reusing the existing data structures and
reapplying the updates turned out to be a lot trickier than I expected.
If it works for you, it still seems worthwhile to add this change.
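
Roughly, the idea looks like the sketch below. This is not the patch
itself and it does not use Mahout's API: the class and method names are
made up for illustration, it assumes comma-separated
userID,itemID,rating lines, and it treats a line with an empty rating as
a removal. It also assumes that replaying the update files on top of the
data already in memory is safe, which only holds if update files are
added or appended to, never edited to take entries back out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not Mahout's FileDataModel. */
class IncrementalRatings {
    // userID -> (itemID -> rating)
    private final Map<Long, Map<Long, Float>> ratings = new HashMap<>();
    private final Path mainFile;
    private long mainLastModified = -1L;
    private int fullLoads = 0;

    IncrementalRatings(Path mainFile) {
        this.mainFile = mainFile;
    }

    /** Reloads the main file only if its timestamp changed, then replays the update files. */
    void refresh(List<Path> updateFiles) throws IOException {
        long modified = Files.getLastModifiedTime(mainFile).toMillis();
        if (modified != mainLastModified) {
            // Main file is new or changed: start over from scratch.
            ratings.clear();
            apply(mainFile);
            mainLastModified = modified;
            fullLoads++;
        }
        // Replaying an update is idempotent: put and remove give the same result each time.
        for (Path update : updateFiles) {
            apply(update);
        }
    }

    private void apply(Path file) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                String[] fields = line.split(",");
                long user = Long.parseLong(fields[0].trim());
                long item = Long.parseLong(fields[1].trim());
                Map<Long, Float> prefs = ratings.computeIfAbsent(user, k -> new HashMap<>());
                if (fields.length < 3 || fields[2].trim().isEmpty()) {
                    prefs.remove(item); // empty rating means "delete this preference"
                } else {
                    prefs.put(item, Float.parseFloat(fields[2].trim()));
                }
            }
        }
    }

    /** Returns the stored rating, or null if there is none. */
    Float rating(long user, long item) {
        Map<Long, Float> prefs = ratings.get(user);
        return prefs == null ? null : prefs.get(item);
    }

    /** Number of times the main file was actually re-read. */
    int fullLoads() {
        return fullLoads;
    }
}
```

With something like this, calling refresh a second time after dropping
in a small update file only costs reading that small file, as long as
the main file's timestamp has not moved.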

Sean

On Thu, Feb 4, 2010 at 2:42 PM, Sean Owen <[email protected]> wrote:
> Yeah, the idea is to save time transferring the file around, rather
> than to save reload time. But I think it's possible to change the code
> to be smarter here: don't reload the main data file if it hasn't
> changed, and just assume that replaying the diffs will be fine. I'll
> try to commit something later this evening.
>
> On Thu, Feb 4, 2010 at 2:22 PM, Vinicius Carvalho
> <[email protected]> wrote:
>> Hello there! I'm following the samples in Mahout in Action. I created the
>> sample based on the MovieLens 10M dataset using FileDataModel and
>> GenericItemBasedRecommender.
>>
>> I then added a new file named ratings.1.dat to the directory, containing a
>> new user, and called refresh(null) as instructed in the book. On the machine
>> I'm testing on, the initial file loading and user processing takes less than
>> 10 seconds.
>>
>> The refresh took quite a long time: the new file had only 3 entries, yet the
>> whole process took around 25 seconds, twice as long as before.
>>
>> I know this is one huge file :) but the book says that only new entries
>> are reprocessed. According to this log, though:
>>
>> [04/02/10 12:18:21:021 BRST]  INFO common.RefreshHelper: Added refreshable:
>> FileDataModel[dataFile:/home/vinicius/Documents/logs/ratings.dat]
>> [04/02/10 12:18:24:024 BRST] DEBUG file.FileDataModel: File has changed;
>> reloading...
>> [04/02/10 12:18:24:024 BRST]  INFO file.FileDataModel: Reading file info...
>> [04/02/10 12:18:26:026 BRST]  INFO file.FileDataModel: Processed 1000000
>> lines
>> [04/02/10 12:18:28:028 BRST]  INFO file.FileDataModel: Processed 2000000
>> lines
>> [04/02/10 12:18:29:029 BRST]  INFO file.FileDataModel: Processed 3000000
>> lines
>> [04/02/10 12:18:30:030 BRST]  INFO file.FileDataModel: Processed 4000000
>> lines
>> [04/02/10 12:18:31:031 BRST]  INFO file.FileDataModel: Processed 5000000
>> lines
>> [04/02/10 12:18:33:033 BRST]  INFO file.FileDataModel: Processed 6000000
>> lines
>> [04/02/10 12:18:34:034 BRST]  INFO file.FileDataModel: Processed 7000000
>> lines
>> [04/02/10 12:18:35:035 BRST]  INFO file.FileDataModel: Processed 8000000
>> lines
>> [04/02/10 12:18:37:037 BRST]  INFO file.FileDataModel: Processed 9000000
>> lines
>> [04/02/10 12:18:39:039 BRST]  INFO file.FileDataModel: Processed 10000000
>> lines
>> [04/02/10 12:18:39:039 BRST]  INFO file.FileDataModel: Read lines: 10000054
>> [04/02/10 12:18:39:039 BRST]  INFO file.FileDataModel: Reading file info...
>> [04/02/10 12:18:39:039 BRST]  INFO file.FileDataModel: Read lines: 3
>> [04/02/10 12:18:39:039 BRST]  INFO model.GenericDataModel: Processed 10000
>> users
>> [04/02/10 12:18:40:040 BRST]  INFO model.GenericDataModel: Processed 20000
>> users
>> [04/02/10 12:18:40:040 BRST]  INFO model.GenericDataModel: Processed 30000
>> users
>> [04/02/10 12:18:41:041 BRST]  INFO model.GenericDataModel: Processed 40000
>> users
>> [04/02/10 12:18:42:042 BRST]  INFO model.GenericDataModel: Processed 50000
>> users
>> [04/02/10 12:18:45:045 BRST]  INFO model.GenericDataModel: Processed 60000
>> users
>> [04/02/10 12:18:45:045 BRST]  INFO model.GenericDataModel: Processed 69879
>> users
>> [04/02/10 12:18:46:046 BRST]  INFO common.RefreshHelper: Refreshed:
>> [FileDataModel[dataFile:/home/vinicius/Documents/logs/ratings.dat]]
>>
>>
>> It seems that the whole ratings.dat is read again.
>>
>> Does GenericItemBasedRecommender need to reload the entire file? Is it
>> possible to speed things up?
>>
>> Regards
>>
>>
>> --
>> The intuitive mind is a sacred gift and the
>> rational mind is a faithful servant. We have
>> created a society that honors the servant and
>> has forgotten the gift.
>>
>
