Hi,

I'd like to use the Netflix challenge data and just can't figure out how to
efficiently "scan" the files.
https://www.kaggle.com/netflix-inc/netflix-prize-data

The files have two types of rows: either an *ID* line (e.g., "1:", "2:", etc.)
or a line of 3 values associated with the preceding ID:

The format is as follows:
*1:*
value1,value2, value3
value1,value2, value3
value1,value2, value3
value1,value2, value3
*2:*
value1,value2, value3
value1,value2, value3
*3:*
value1,value2, value3
value1,value2, value3
value1,value2, value3
*4:*
etc ...

I want to create a matrix where each row has the form:

ID value1, value2, value3

So "ID" needs to be repeated on every row. I could write a Perl script to
convert this format to CSV, but I'm sure there's a simple R trick.
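A minimal base-R sketch of that conversion might look like the following. It assumes that each file starts with an ID line, that ID lines end in ":", and that every other line holds exactly three comma-separated values. The sample lines and the column names `value1`..`value3` are made up for illustration. The trick is that `cumsum()` over a logical "is this an ID line?" vector gives each line the index of the most recent ID, so the ID can be repeated without an explicit loop.

```r
# Hypothetical sample lines in the layout described above (made-up values).
# For a real file you would use something like: lines <- readLines("file.txt")
lines <- c("1:", "101,3,7", "102,5,8", "2:", "103,4,9")

# TRUE for ID header lines such as "1:" (assumes IDs always end in ":")
is_id <- grepl(":$", lines)

# Strip the trailing ":" to get the bare IDs, in file order
ids <- sub(":$", "", lines[is_id])

# cumsum(is_id) is, for every line, how many ID lines have been seen so far,
# i.e. the position of the current ID in 'ids' (assumes the file starts with an ID)
id_for_line <- ids[cumsum(is_id)]

# Parse only the data lines as CSV; strip.white drops stray spaces after commas
vals <- read.csv(text = lines[!is_id], header = FALSE,
                 col.names = c("value1", "value2", "value3"),
                 strip.white = TRUE)

# One row per data line, with its ID repeated in the first column
result <- data.frame(ID = as.integer(id_for_line[!is_id]), vals)
print(result)
```

Because `grepl()`, `cumsum()` and a single `read.csv()` call are all vectorised, this avoids growing a data frame line by line, which is usually the slow part of a naive loop.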

Thanks for suggestions!

Emmanuel


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
