Hi, I'd like to use the Netflix challenge data and just can't figure out how to efficiently "scan" the files. https://www.kaggle.com/netflix-inc/netflix-prize-data
The files have two types of row: either an ID line, e.g. "1:", "2:", etc., or a line of 3 values associated with the preceding ID. The format is as follows:

1:
value1,value2, value3
value1,value2, value3
value1,value2, value3
value1,value2, value3
2:
value1,value2, value3
value1,value2, value3
3:
value1,value2, value3
value1,value2, value3
value1,value2, value3
4:
etc ...

I want to create a matrix where each row is of the form:

ID value1, value2, value3

So the ID needs to be duplicated on every row that belongs to it. I could write a Perl script to convert this format to CSV, but I'm sure there's a simple R trick.

Thanks for suggestions!
Emmanuel

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
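
One possible way to do the conversion in base R is sketched below; it is an illustration under stated assumptions, not a tested solution for the full Netflix files. It reads the lines, flags the ID rows with a regular expression, and uses cumsum() on that flag to carry each ID forward onto the value rows beneath it. The function name parse_id_blocks and the column names value1, value2, value3 are hypothetical. The sketch assumes that the input starts with an ID line, that IDs are integers followed by a colon, and that there is at least one value row.

```r
# Hypothetical helper (not from the original post): convert the
# "ID:" / value-row layout into a data frame with the ID repeated per row.
parse_id_blocks <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]             # drop blank lines
  is_id <- grepl("^[0-9]+:$", lines)        # TRUE for ID rows such as "1:"
  ids <- as.integer(sub(":$", "", lines[is_id]))
  # cumsum(is_id) counts the ID rows seen so far, so it indexes the
  # ID that is current for every line (assumes line 1 is an ID row).
  id_per_line <- ids[cumsum(is_id)]
  # Parse the remaining rows as comma-separated values.
  values <- read.csv(text = lines[!is_id], header = FALSE,
                     strip.white = TRUE, stringsAsFactors = FALSE,
                     col.names = c("value1", "value2", "value3"))
  cbind(ID = id_per_line[!is_id], values)
}

# Made-up example input in the layout described above.
example_lines <- c("1:", "10,20, 30", "11,21, 31", "2:", "12,22, 32")
result <- parse_id_blocks(example_lines)
print(result)
```

For a real file, the lines could come from readLines("path/to/file.txt") (placeholder path). Note that readLines() loads the whole file into memory, so for very large files it may be worth processing them one at a time.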