I think R is quite capable of doing this. You would have to learn a comparable number of fiddly bits to accomplish this in R, Python or Perl.

That is not to say that learning Perl or Python is a bad idea... but in terms of "shortest path" I think they are of comparable complexity. All three languages support regular expressions, which would be the key bit of knowledge to acquire regardless of which tool you use.
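
As a rough illustration of what a regular expression buys you here, the sketch below pulls every run of digits out of two made-up strings. The strings are stand-ins for your cells, not taken from the attached file, and the sketch assumes each cell holds exactly four numbers.

```r
# Made-up sample cells; "text" stands in for the Cyrillic words.
x <- c("text 12*23 34*45", "more text 12*23, 34*56")

# gregexpr() locates every run of digits; regmatches() pulls them out.
nums <- regmatches(x, gregexpr("[0-9]+", x))

# One row per input string, converted from character to numeric.
# (Assumes each string yields exactly four numbers.)
m <- t(sapply(nums, as.numeric))
```

If some cells held a different count of numbers, sapply() would return a list rather than a matrix, which is one reason the approach further down matches the whole line instead.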

Other fiddly bits might involve handling the Cyrillic strings as data, though you did not convey a desire to retain that information.

One way (not extracting Cyrillic text):

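library(XLConnect)
DF <- readWorksheetFromFile( "exampX.xlsx", sheet="examp" )
# four capture groups, one per number; the lazy ".*?" keeps the leading
# text from swallowing the first digits of the first number
pattern <- "^.*?(\\d+) *\\* *(\\d+)[^\\d]*(\\d+) *\\* *(\\d+).*$"
# perl=TRUE selects the PCRE engine, which handles both the lazy
# quantifier and \d inside brackets
idx <- grep( pattern, DF[[2]], perl=TRUE )   # rows of column 2 that match
dta <- sub( pattern, "\\1,\\2,\\3,\\4", DF[[2]][idx], perl=TRUE )
# split on the commas and convert each column to numeric
dtamatrix <- apply( do.call( rbind
                           , strsplit( dta, "," ) )
                  , 2
                  , as.numeric
                  )
extracted <- data.frame( V1=DF[[1]][idx], dtamatrix )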
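
If the leading text should be kept after all, one possible variation is to add a capture group for it. This is only a sketch: the vectors cells and ids are hypothetical stand-ins for the two worksheet columns, with English placeholder words where the Cyrillic text would be, and the fields are joined with a tab on the assumption that the text contains no tab characters.

```r
# Hypothetical stand-in for column 2 of the worksheet.
cells <- c("some words 12*23 34*45",
           "other words 12*23, 34*56",
           "no numbers here")
ids <- c(101, 102, 103)   # hypothetical stand-in for column 1

# Group 1: leading text (lazy, so it stops before the first number).
# Groups 2-5: the four numbers around the two '*' signs.
pattern <- "^(.*?)\\s*(\\d+) *\\* *(\\d+)\\D*(\\d+) *\\* *(\\d+).*$"

idx <- grep(pattern, cells, perl = TRUE)      # rows that match
# A tab separates the fields because the text may itself contain commas.
dta <- sub(pattern, "\\1\t\\2\t\\3\t\\4\t\\5", cells[idx], perl = TRUE)
parts <- do.call(rbind, strsplit(dta, "\t", fixed = TRUE))

extracted <- data.frame(id = ids[idx],
                        text = parts[, 1],
                        matrix(as.numeric(parts[, -1]), ncol = 4),
                        stringsAsFactors = FALSE)
```

The third sample cell has no numbers and is simply dropped by grep(). The result could then be written out with write.csv( extracted, "extracted.csv", row.names=FALSE ) and opened in Excel; the file name is arbitrary.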


On Wed, 21 Jan 2015, Collin Lynch wrote:

Dr. Polanski, I would recommend something else.  Given the messy nature of
your data I would suggest using a language like Python or Perl to extract
it to an appropriate format.  Python has good regular expression support
and unicode support.  If you can save your data as a csv file or even text
line by line then it would be possible to write some code to read the file,
match the lines with a simple regular expression, and then spit them back
out as a csv file which you could read into R.

I realize that this means learning a new language or finding someone with
the requisite skills, but I would recommend that over attempting to use R's
text processing.

   Collin.

On Wed, Jan 21, 2015 at 3:31 PM, Dr Polanski <n.polyans...@gmail.com> wrote:

Hi all!

Sorry to bother you. I am trying to learn some R via Coursera courses and
other internet sources, yet haven't managed to get far.

And now I need to do some, I hope, not too difficult things, which I think
R can do, yet have no idea how to make it do so

I have a big set of (empirical) data which was obtained by my colleagues
and stored in an inconvenient way: all of the data is in two cells of an
Excel table.
An example of the data is in the attached file (the link):


https://drive.google.com/file/d/0B64YMbf_hh5BS2tzVE9WVmV3bFU/view?usp=sharing

So the first column has a number and the second has a whole vector (I
guess it is) which looks like
"some words in Cyrillic (the length varies)" and then the set of numbers
"12*23 34*45" (another problem is that sometimes it is "12*23, 34*56").

And the number of rows is about 3000, so it is impossible to do manually.

what I need to have at the end is to have it separately in different excel
cells
- what is written in words - |  12  | 23 | 34 | 45 |

Do you think it is possible to do so using R (or something else?)

Thank you very much in advance, and sorry for asking for help with such a
stupid question. The problem is that I am trying, and yet haven't even
managed to install openSUSE onto my laptop - only Ubuntu! :)


Thank you very much!
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
