Here is one way to parse your file in R (I used R-3.1.2 on Windows in a US English locale). I downloaded the file from Google Docs in tab-separated format. I could not get read.table() to do the job, though I don't completely understand its encoding/fileEncoding arguments.

  > file <- "exampX.xlsx - examp.tsv" # the name Google Docs suggested
  > lines <- readLines(file, encoding="UTF-8")
  Warning message:
  In readLines(file, encoding = "UTF-8") :
    incomplete final line found on 'exampX.xlsx - examp.tsv'
  > fields <- strsplit(lines, "\t")
  > txt <- vapply(fields, function(x)x[2], "") # 2nd field of each line
  > nmbrs <- regmatches(txt, gregexpr("[[:digit:]]+(\\*[[:digit:]]+)*", txt))
  > lines[16:20]
  [1] "1.97\tл.а. 11 35*46 27*46" "1.61\tсамбо 9 31*36 29*45"
  [3] "1.17\tс.п. 4 37*29 39*30"  "1.54\tушу 9 31*39 30*38"
  [5] "1.73\tсамбо 6 32*39 29*39"
  > nmbrs[16:20]
  [[1]]
  [1] "11"    "35*46" "27*46"

  [[2]]
  [1] "9"     "31*36" "29*45"

  [[3]]
  [1] "4"     "37*29" "39*30"

  [[4]]
  [1] "9"     "31*39" "30*38"

  [[5]]
  [1] "6"     "32*39" "29*39"

If you want to split those "x*y" into "x" and "y" you can use the pattern "[[:digit:]]+" instead of the one I used.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jan 21, 2015 at 12:31 PM, Dr Polanski <n.polyans...@gmail.com> wrote:

> Hi all!
>
> Sorry to bother you, I am trying to learn some R via coursera courses and
> other internet sources yet haven’t managed to go far
>
> And now I need to do some, I hope, not too difficult things, which I think
> R can do, yet have no idea how to make it do so
>
> I have a big set of data (empirical) which was obtained by my colleagues
> and store at not convenient way - all of the data in two cells of an excel
> table
> an example of the data is in the attached file (the link)
>
>
> https://drive.google.com/file/d/0B64YMbf_hh5BS2tzVE9WVmV3bFU/view?usp=sharing
>
> so the first column has a number and the second has a whole vector (I
> guess it is) which looks like
> «some words in Cyrillic(the length varies)» and then the set of numbers
> «12*23 34*45» (another problem that some times it is «12*23, 34*56»
>
> And the number of raws is about 3000 so it is impossible to do manually
>
> what I need to have at the end is to have it separately in different excel
> cells
> - what is written in words -
> | 12 | 23 | 34 | 45 |
>
> Do you think it is possible to do so using R (or something else?)
>
> Thank you very much in advance and sorry for asking for help and so stupid
> question, the problem is - I am trying and yet haven’t even managed to
> install openSUSE onto my laptop - only Ubuntu! :)
>
>
> Thank you very much!
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
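
For the layout asked about in the original question (the words in one cell and each number in its own cell), the "[[:digit:]]+" pattern mentioned in the reply can be combined with a step that pulls out the leading words. The following is only a rough sketch, not part of the original reply. It assumes the label is everything before the first digit, the third sample row (with a comma) is made up to show that variant, and the column names n1, n2, ... are hypothetical.

```r
# Sample second-field values: the first two are copied from the session
# output in the reply; the third is a hypothetical row with a comma.
txt <- c("л.а. 11 35*46 27*46",
         "самбо 9 31*36 29*45",
         "ушу 9 31*39, 30*38")

# Assumption: the label is whatever precedes the first digit.
# Drop the first digit, everything after it, and the whitespace before it.
words <- sub("[[:space:]]*[[:digit:]].*$", "", txt)

# Every run of digits becomes its own value, so "35*46" yields "35" and "46".
nums <- regmatches(txt, gregexpr("[[:digit:]]+", txt))

# Rows may hold different counts of numbers, so pad each row with NA
# up to the longest one before binding them into a matrix.
width <- max(vapply(nums, length, 0L))
padded <- do.call(rbind,
                  lapply(nums, function(x) as.integer(x)[seq_len(width)]))
colnames(padded) <- paste0("n", seq_len(width))  # hypothetical column names

# One row per input line: the words, then one column per number.
result <- data.frame(words = words, padded, stringsAsFactors = FALSE)
print(result)
```

The resulting data frame could then be saved with write.table() or write.csv() and opened in Excel. Whether the Cyrillic text survives that round trip depends on the encoding settings, which I have not tested here.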