Re: [R] Novice question about getting data into R

2012-06-22 Thread Petr PIKAL
Hi
> 
> Dear Petr, 
> 
> Thank you very much for your reply. You cannot read the Chinese characters, maybe

Yes, and I cannot install it either.

> because you don't have this language installed. Do you have any idea how
> to solve this problem? Or who can help me? Should I install Linux? 

What problem? To read Chinese characters?

I tried to search CRAN with 

read chinese character

and I got many answers you could go through. However, at first glance
it seems to me that this issue is not trivial.

Regards
Petr



Re: [R] Novice question about getting data into R

2012-06-21 Thread schmaltz
Dear Petr, 
  
Thank you very much for your reply. Maybe you cannot read the Chinese characters
because you don't have this language installed. Do you have any idea how to solve
this problem? Or who can help me? Should I install Linux? 
  
Best regards, 
Ms. Márcia Schmaltz 修安琪
Departamento de Português / Department of Portuguese
Faculdade de Ciências Sociais e Humanas / Faculty of Social Science and Humanities - UM
澳门大学社会科学及人文学院葡语系
http://www.umac.mo/fsh/ciela/staff/Marcia_Schmaltz.html
(+853) 6231-2114 and 8397-8902

-"Petr Pikal-3 [via R]" wrote: -

To: schmaltz
From: "Petr Pikal-3 [via R]"
Date: 06/21/2012 09:27PM
Subject: Re: Novice question about getting data into R

Re: [R] Novice question about getting data into R

2012-06-21 Thread Petr PIKAL
Hi

I can read the example you provided without much problem.

dput(head(test))
structure(list(n = 0:5, X = c(NA, NA, NA, NA, NA, NA), start = c(11185L, 
39530L, 40544L, 109684L, 114629L, 118841L), X.1 = c(NA, NA, NA, 
NA, NA, NA), dur = c(1L, 2L, 1L, 1L, 0L, 1L), X.2 = c(NA, NA, 
NA, NA, NA, NA), pause = c(28344L, 1012L, 69139L, 4944L, 4212L, 
2558L), X.3 = c(NA, NA, NA, NA, NA, NA), par = c(0, 100, 100, 
100, 0, 100), X.4 = c(NA, NA, NA, NA, NA, NA), ins = c(2L, 3L, 
2L, 2L, 1L, 2L), X.5 = c(NA, NA, NA, NA, NA, NA), del = c(0L, 
0L, 0L, 0L, 0L, 0L), X.6 = c(NA, NA, NA, NA, NA, NA), sid = 
structure(c(10L, 
13L, 16L, 1L, 11L, 12L), .Label = c(" -1", " -1+11+13+15", " -1+110", 
" -1+16", " -1+26+29", " -1+27+30", " -1+32", " -1+4+5", " -1+48", 
" 1", " 17", " 18+19", " 2", " 20", " 28", " 3", " 36", " 37", 
" 38", " 42", " 43", " 45", " 49", " 50", " 53", " 54", " 58", 
" 59", " 61+64"), class = "factor"), X.7 = c(NA, NA, NA, NA, 
NA, NA), tid = structure(c(1L, 6L, 20L, 30L, 38L, 39L), .Label = c(" 1", 
" 10+11+12", " 13+14", " 15+16+17", " 18+19", " 2+3", " 20", 
" 21", " 22", " 23", " 24+25", " 26", " 27+28+29", " 30+31+32", 
" 33+34", " 35", " 36+37", " 38", " 39", " 4", " 40", " 41", 
" 42", " 43", " 44+45", " 46", " 47", " 48", " 49", " 5", " 50", 
" 51", " 52+93", " 53", " 54", " 55", " 56", " 6", " 7", " 8", 
" 9"), class = "factor"), X.8 = c(NA, NA, NA, NA, NA, NA), str = 
structure(c(5L, 
6L, 5L, 5L, 4L, 5L), .Label = c(" ,", " ,_", " .", " ・", " ・・", 
" ・・・", " ・・・.", " ", " ・"), class = "factor")), .Names = c("n", 
"X", "start", "X.1", "dur", "X.2", "pause", "X.3", "par", "X.4", 
"ins", "X.5", "del", "X.6", "sid", "X.7", "tid", "X.8", "str"
), row.names = c(NA, 6L), class = "data.frame")

Only the Chinese characters are missing, and some extra columns appear.
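The extra `X`, `X.1`, ..., `X.8` columns in the `str()` output below contain only `NA`. One plausible cause, not confirmed in this thread, is that fields in the file are separated by more than one delimiter (for example two TABs), so R sees an empty field between each pair of real ones. A minimal sketch that drops such all-`NA` columns after reading; `drop_empty_columns` is a hypothetical helper name, and the small data frame only mimics the shape of `test`:

```r
# Hypothetical helper: keep only columns that contain at least one non-NA value,
# dropping padding columns such as X, X.1, ... created by empty fields.
drop_empty_columns <- function(df) {
  keep <- vapply(df, function(col) !all(is.na(col)), logical(1))
  df[, keep, drop = FALSE]
}

# Small made-up data frame with the same pattern as the str() output
test <- data.frame(n = 0:2, X = NA, start = c(11185L, 39530L, 40544L), X.1 = NA)
cleaned <- drop_empty_columns(test)
names(cleaned)
```

If the cause really is doubled separators, fixing the file (or the program that writes it) would avoid the padding columns in the first place.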

> str(test)
'data.frame':   41 obs. of  19 variables:
 $ n    : int  0 1 2 3 4 5 6 7 8 9 ...
 $ X    : logi  NA NA NA NA NA NA ...
 $ start: int  11185 39530 40544 109684 114629 118841 121400 128201 129793 131852 ...
 $ X.1  : logi  NA NA NA NA NA NA ...
 $ dur  : int  1 2 1 1 0 1 1 1 436 608 ...
 $ X.2  : logi  NA NA NA NA NA NA ...
 $ pause: int  28344 1012 69139 4944 4212 2558 6800 1591 1623 3573 ...
 $ X.3  : logi  NA NA NA NA NA NA ...
 $ par  : num  0 100 100 100 0 100 100 100 0 100 ...
 $ X.4  : logi  NA NA NA NA NA NA ...
 $ ins  : int  2 3 2 2 1 2 2 2 3 3 ...
 $ X.5  : logi  NA NA NA NA NA NA ...
 $ del  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ X.6  : logi  NA NA NA NA NA NA ...
 $ sid  : Factor w/ 29 levels " -1"," -1+11+13+15",..: 10 13 16 1 11 12 1 1 2 4 ...
 $ X.7  : logi  NA NA NA NA NA NA ...
 $ tid  : Factor w/ 41 levels " 1"," 10+11+12",..: 1 6 20 30 38 39 40 41 2 3 ...
 $ X.8  : logi  NA NA NA NA NA NA ...
 $ str  : Factor w/ 9 levels " ,"," ,_"," .",..: 5 6 5 5 4 5 5 5 6 6 ...

> sessionInfo()
R Under development (unstable) (2012-03-03 r58569)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Czech_Czech Republic.1250  LC_CTYPE=Czech_Czech Republic.1250
[3] LC_MONETARY=Czech_Czech Republic.1250 LC_NUMERIC=C
[5] LC_TIME=Czech_Czech Republic.1250

Regards
Petr


Re: [R] Novice question about getting data into R

2012-06-20 Thread schmaltz
Dear Professor Dalgaard,

I am beginning to participate in a research project on statistical modelling of
translators' activity data. I recently installed R and tried to generate a
Translation Progress Graph, as my colleagues do (with success), but on my
Windows platform I got the error below. According to the R FAQ, it seems to be
a very common error. As I am not familiar with either R or ProGra, could you
help me, please?

Note: the Translation Progress Graph is composed of quintuple data {S, T, A,
F, K}: Source Text, Target Text, Alignment, Fixation and Keyboard data,
respectively. 
 

>ReadData("C:/Users/schmaltz/Dropbox/EN-CH/proGra/EN-ZH_P2_T4_T2")
Reading Fixation Units:
C:/Users/schmaltz/Dropbox/EN-CH/proGra/EN-ZH_P2_T4_T2 .fu
Reading Production Units:
C:/Users/schmaltz/Dropbox/EN-CH/proGra/EN-ZH_P2_T4_T2 .pu
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : 
line 38 did not have 10 elements

Note: We tried deleting line 38, and the program then reported an error on
another line. Even after deleting all the lines after it, the same error occurs.
I think it is not an encoding error, given that my colleague uses Linux and I
use Windows.
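Before deleting lines by hand, `count.fields()` can show which lines R splits into a number of fields other than 10. A minimal sketch, assuming the `.pu` file is TAB separated; the temporary file and its three data rows (with ASCII placeholders in `str`) only stand in for the real file, and one row is deliberately short. `quote = ""` stops quote characters in the text column from being taken as the start of a quoted string, which could otherwise swallow separators and change the count:

```r
# Write a small stand-in for the .pu file; in practice, point pu_file at the real path.
pu_file <- tempfile(fileext = ".pu")
lines <- c(
  "n\tstart\tdur\tpause\tpar\tins\tdel\tsid\ttid\tstr",
  "0\t11185\t1\t28344\t0\t2\t0\t1\t1\tA",
  "1\t39530\t2\t1012\t100.00\t3\t0\t2\t2+3",   # this row omits the str field
  "2\t40544\t1\t69139\t100.00\t2\t0\t3\t4\tC"
)
writeLines(lines, pu_file)

# Count TAB-separated fields on every line, treating quote and '#' characters literally
n_fields <- count.fields(pu_file, sep = "\t", quote = "", comment.char = "")

# Line numbers (the header is line 1) that do not have the expected 10 fields
bad_lines <- which(n_fields != 10)
bad_lines
```

Printing `readLines(pu_file)[bad_lines]` would then show the offending lines themselves.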

Sample of the file above:
n   start   dur   pause  par     ins  del  sid          tid       str
0   11185   1     28344  0       2    0    1            1         尽管
1   39530   2     1012   100.00  3    0    2            2+3       发展中
2   40544   1     69139  100.00  2    0    3            4         国家
3   109684  1     4944   100.00  2    0    -1           5         关于
4   114629  0     4212   0       1    0    17           6         为
5   118841  1     2558   100.00  2    0    18+19        7         贫困
6   121400  1     6800   100.00  2    0    -1           8         人民
7   128201  1     1591   100.00  2    0    -1           9         争取
8   129793  436   1623   0       3    0    -1+11+13+15  10+11+12  更好的
9   131852  608   3573   100.00  3    0    -1+16        13+14     生活的
10  136033  1202  1309   100.00  5    0    -1+4+5       15+16+17  说辞是可以
11  138544  468   3682   100.00  3    0    -1           18+19     理解的
12  142694  359   10811  0       2    0    20           20        ,_
13  153864  0     2121   0       1    0    -1           21        但
14  155985  1     2838   100.00  2    0    -1           22        其实
15  158824  1     1435   100.00  2    0    -1           23        保护
16  160260  421   3619   87.65   3    0    -1           24+25     环境和
17  164300  1     1075   100.00  2    0    28           26        经济
18  165376  1108  1030   100.00  4    0    -1+26+29     27+28+29  发展是不
19  167514  1466  8440   54.98   4    0    -1+27+30     30+31+32  冲突的.
20  177420  906   4023   100.00  4    0    -1+32        33+34     我们必须
21  182349  1     1622   100.00  2    0    36           35        鼓励
22  183972  2     1573   100.00  3    0    37           36+37     发展中
23  185547  1     15381  100.00  2    0    38           38        国家
24  200929  1     1934   100.00  2    0    42           39        扩展
25  202864  1     5864   100.00  2    0    43           40        绿色
26  208729  1     4383   100.00  2    0    -1           41        植被
27  213113  0     1497   0       1    0    45           42        ,
28  214610  1     2963   100.00  2    0    -1           43        发展
29  217574  906   5085   100.00  4    0    -1+48        44+45     节能科技
30  223565  0     1575   0       1    0    49           46        ,
31  225140  1     2683   100.00  2    0    50           47        并且
32  227824  1     2136   100.00  2    0    53           48        帮助
33  229961  1     6613   100.00  2    0    -1           49        它们
34  236575  1     6068   100.00  2    0    54           50        减少
35  242644  1     2635   100.00  2    0    -1           51        环境
36  245280  343   8315   100.00  3    0    -1+110       52+93     污染和
37  253938  1     1653   100.00  2    0    -1           53        破坏
38  255592  0     25381  0       1    0    58           54        .
39  280973  1     1809   100.00  2    0    59           55        一些
40  282783  1     16878  100.00  2    0    61+64        56        国家
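Once every line has the expected number of fields, the file can be read with its encoding stated explicitly instead of relying on the session locale (Petr's `sessionInfo()` shows a Czech code page 1250 locale, which has no Chinese characters). A minimal sketch, assuming the file is TAB separated and saved as UTF-8; whether ProGra's own `ReadData()` accepts such arguments is not known, so this reads the file directly, and the temporary file holds only the first two rows of the sample above:

```r
# Stand-in for the real .pu file: first two sample rows, written as raw UTF-8 bytes
# (the \u escapes are the Chinese tokens from the str column).
pu_file <- tempfile(fileext = ".pu")
lines <- c(
  "n\tstart\tdur\tpause\tpar\tins\tdel\tsid\ttid\tstr",
  "0\t11185\t1\t28344\t0\t2\t0\t1\t1\t\u5c3d\u7ba1",
  "1\t39530\t2\t1012\t100.00\t3\t0\t2\t2+3\t\u53d1\u5c55\u4e2d"
)
writeLines(enc2utf8(lines), pu_file, useBytes = TRUE)

# encoding = "UTF-8" marks the strings as UTF-8 rather than interpreting the bytes
# in the native code page; quote = "" keeps any quote characters as plain text.
pu <- read.delim(pu_file, encoding = "UTF-8", quote = "",
                 stringsAsFactors = FALSE)
str(pu)
```

The alternative `fileEncoding = "UTF-8"` re-encodes the text into the native encoding, which on Windows R before 4.2.0 with a non-Chinese code page cannot hold these characters; R 4.2.0 and later use UTF-8 as the native encoding on recent Windows versions.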

Thank you very much!

Marcia

--
View this message in context: 
http://r.789695.n4.nabble.com/Novice-question-about-getting-data-into-R-tp866806p4633954.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Novice question about getting data into R

2008-09-19 Thread Ted Byers

Thanks one and all.

Actually, I used OpenOffice's spreadsheet to create the csv file, but I have
been using it long enough to know how to specify what I want, and sometimes,
when that proves annoying, I'll use Perl to finesse it the way I want it.

It seems my principal error was to assume that it would treat each double-quoted
character string as a single field and determine fields based on the commas. 
Silvia's remarks about empty cells and blanks in the middle of column names
were right on the mark.
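That assumption does hold once the separator is set: with `sep = ","` (the default for `read.csv()`), a double-quoted string is read as one field even if it contains commas or spaces, whereas `read.table()` itself defaults to `sep = ""`, which splits on white space. A small illustration with made-up data:

```r
# Made-up comma-separated text: the first field of each row is quoted
# and the first one contains both a comma and a space.
csv_text <- '"Week 18, early",0.10\n"Week 19",0.20'

# With sep = ",", each quoted string stays intact as a single field.
df <- read.table(text = csv_text, sep = ",", stringsAsFactors = FALSE)
df$V1
df$V2
```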

Tom, I appreciate the caveats you mention.  I am aware of the complications
of i18n, but they don't affect me much, as my stuff is run exclusively in
Canada (pretty much the same norms as the US).  In a sense they don't affect
me because I have manipulated data around such issues using Perl to satisfy
the peculiarities of the software used on one project or another; I deal with
it almost as a matter of course, as long as I already know the peculiarities
of the software I am working with.  I also have plenty of experience moving
data between spreadsheets, RDBMSs (MS SQL, PostgreSQL, MySQL) and XML files,
and have had to resort to unusual delimiters in the past because of
peculiarities in the data feed.  While I have tonnes of experience developing
software (C++, Java, FORTRAN, Perl), I only started playing with R a few
months ago, and this is the first time I have had to import real data into it.
While the tutorials I found were useful, the key tidbits of information I need
seem to be scattered through the documentation, and I am finding it
challenging to pin down the peculiarities of R.

Thanks again one and all.

Ted




-- 
View this message in context: 
http://www.nabble.com/Novice-question-about-getting-data-into-R-tp19576065p19577763.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Novice question about getting data into R

2008-09-19 Thread Tom Backer Johnsen

Silvia Lomascolo wrote:



>> refdata = read.table("K:\\MerchantData\\RiskModel\\refund_distribution.csv", header = TRUE)
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
>   line 1 did not have 42 elements
>
>> refdata = read.table("K:\\MerchantData\\RiskModel\\refund_distribution.csv")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
>   line 2 did not have 42 elements
>
> R interprets that you have 42 columns from the variable names. Do you? See
> if removing spaces within column names helps (e.g., "week.1" instead of
> "week 1").  Also, because yours is a csv file, fields are separated by
> commas.  You can either use the "read.csv" command instead of
> "read.table" (see ?read.table for details), or add the argument sep="," to
> tell R that fields are separated by commas.  You might also need to specify,
> if you have empty cells, what to do with them (e.g., na.strings="").


You are of course right about the NA's (missing values, empty cells) as 
well as the possible blanks in the column names.  It might nevertheless 
be a good idea for him to at least submit a few of the lines at the top 
of the file.  A .csv file as generated by Excel on Windows is not 
necessarily comma-separated.  That depends on the "list separator" 
setting under "Regional Language Settings" found in the Control Panel. 
On my machine, the list separator is a semicolon for a .csv file.  The 
reason is simple: in Norway, the standard decimal separator is a comma, 
and you do not want to confuse the system too much.  So, that particular 
point depends on the settings for his locale (language, country).
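For files written with such regional settings, R provides `read.csv2()`, whose defaults are `sep = ";"` and `dec = ","`. A minimal sketch with made-up data in that format:

```r
# Made-up data in the style described: ';' separates fields and ','
# is the decimal mark, as in a csv export under a Norwegian locale.
csv2_text <- 'weeks;"Week 18";"Week 19"\n1;0,10;0,20\n2;0,05;0,15'

# read.csv2() defaults to sep = ";" and dec = ","
refdata <- read.csv2(text = csv2_text)
str(refdata)
```

If the separator is unknown, looking at the first few lines with `readLines(file, n = 3)` is a quick way to check before choosing between `read.csv()` and `read.csv2()`.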


Tom








--
Tom Backer Johnsen, Psychometrics Unit, Faculty of Psychology
University of Bergen, Christies gt. 12, N-5015 Bergen, NORWAY
Tel: +47-5558-9185  Fax: +47-5558-9879
Email: [EMAIL PROTECTED]  URL: http://www.galton.uib.no/



Re: [R] Novice question about getting data into R

2008-09-19 Thread Silvia Lomascolo


> refdata = read.table("K:\\MerchantData\\RiskModel\\refund_distribution.csv", header = TRUE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 1 did not have 42 elements
> refdata = read.table("K:\\MerchantData\\RiskModel\\refund_distribution.csv")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 2 did not have 42 elements
>

R interprets that you have 42 columns from the variable names. Do you? See
if removing spaces within column names helps (e.g., "week.1" instead of
"week 1").  Also, because yours is a csv file, fields are separated by
commas.  You can either use the "read.csv" command instead of
"read.table" (see ?read.table for details), or add the argument sep="," to
tell R that fields are separated by commas.  You might also need to specify,
if you have empty cells, what to do with them (e.g., na.strings="").
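Putting these suggestions together for a file shaped like Ted's description (quoted column names containing spaces, and columns that get shorter from week to week), here is a minimal sketch with made-up values. `read.csv()` turns names such as `"Week 18"` into syntactic names like `Week.18` (via `make.names()`, since `check.names = TRUE` by default), and empty cells become `NA`; for numeric columns that happens even without `na.strings = ""`, which matters mainly for character columns:

```r
# Made-up excerpt shaped like the described file: quoted names with spaces,
# and empty cells where later weeks have fewer values.
csv_text <- '"weeks","Week 18","Week 19","Week 20"
1,0.10,0.20,0.30
2,0.05,0.15,
3,0.02,,'

refdata <- read.csv(text = csv_text, na.strings = "")
names(refdata)            # spaces in names are replaced by dots
colSums(!is.na(refdata))  # number of non-missing values per column
```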




-- 
View this message in context: 
http://www.nabble.com/Novice-question-about-getting-data-into-R-tp19576065p19576350.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Novice question about getting data into R

2008-09-19 Thread John Kane
Try read.csv("K:\\MerchantData\\RiskModel\\refund_distribution.csv", header = TRUE)


--- On Fri, 9/19/08, Ted Byers <[EMAIL PROTECTED]> wrote:

> From: Ted Byers <[EMAIL PROTECTED]>
> Subject: [R]  Novice question about getting data into R
> To: r-help@r-project.org
> Received: Friday, September 19, 2008, 1:01 PM
> 
> -- 
> View this message in context:
> http://www.nabble.com/Novice-question-about-getting-data-into-R-tp19576065p19576065.html
> Sent from the R help mailing list archive at Nabble.com.
> 



Re: [R] Novice question about getting data into R

2008-09-19 Thread Peter Dalgaard
Ted Byers wrote:
> I found it easy to use R when typing data manually into it.  Now I need to
> read data from a file, and I get the following errors:
>
>> refdata = read.table("K:\\MerchantData\\RiskModel\\refund_distribution.csv", header = TRUE)
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
>   line 1 did not have 42 elements
>> refdata = read.table("K:\\MerchantData\\RiskModel\\refund_distribution.csv")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
>   line 2 did not have 42 elements
>
> (I'd tried the first version above because the first record has column
> names.)
>
> First, I don't know why R expects 42 elements in a record.  
>   
Hard to tell. One guess is that you have 42 header names. Spaces inside
any of them? Is this really a CSV file (as in Comma Separated Values)?
If so, you at least need to set the sep= argument, but how about
read.csv()? Or, if it is TAB separated, read.delim().
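For the TAB-separated case, `read.delim()` is a wrapper around `read.table()` whose defaults include `header = TRUE` and `sep = "\t"`. A small sketch with made-up data showing that the two calls give the same result here:

```r
# Made-up TAB-separated text with a header row; "Week 18" contains a space
# but is still one field because the separator is TAB.
tsv_text <- "weeks\tWeek 18\tWeek 19\n1\t0.10\t0.20\n2\t0.05\t0.15"

a <- read.delim(text = tsv_text)
b <- read.table(text = tsv_text, header = TRUE, sep = "\t")
names(a)
```

The functions differ in other defaults (`quote`, `fill`, `comment.char`), so on messier files the results can diverge.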
> There is one column for a time variable (weeks since a given week of samples
> were taken) and one for each week of sampling in the data file (Week 18
> through Week 37 inclusive).  And there are only 19 rows.
> The samples represented by the columns are independent, and the numbers in
> the columns are the fraction of events sampled that result in an event of
> another kind in the week since the sample was taken.
>
> The samples are not the same size, and starting with week 20, the number of
> values progressively gets smaller since there have been fewer than 37  weeks
> since the samples were taken.
>
> I can show you the contents of the data file if you wish.  It is
> unremarkable, csv, with strings used for column names enclosed in double
> quotes.
>   
You might well have to. One man's "unremarkable" can be remarkably
different from others'...
> I don't have to manually separate the samples into their own files, do I?  I
> was hoping to write a function that estimates the density function that best
> fits each sample individually, and then iterate over the columns, applying
> that function to each in turn.
>
> What is the best way to handle this?
>
> Thanks
>
> Ted
>
>
>   
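On the last question, separate files should not be needed: a data frame is a list of columns, so a function can be applied to each sample column with `lapply()`, after dropping the `NA` padding of the shorter columns. A minimal sketch, assuming the first column holds the time variable; `density()` (a kernel estimate) is only a stand-in for whatever fitting function is chosen (a parametric fit such as `MASS::fitdistr()` could be passed instead), and `fit_each_sample` is a hypothetical name:

```r
# Hypothetical helper: apply `fit` to every sample column except the first
# (assumed to be the time variable), ignoring NA cells in shorter columns.
fit_each_sample <- function(df, fit = stats::density) {
  lapply(df[-1], function(col) fit(col[!is.na(col)]))
}

# Made-up data: later weeks have fewer observations, padded with NA.
refdata <- data.frame(
  weeks   = 1:4,
  Week.18 = c(0.10, 0.05, 0.02, 0.01),
  Week.19 = c(0.20, 0.15, 0.04, NA),
  Week.20 = c(0.30, 0.12, NA, NA)
)

fits <- fit_each_sample(refdata)
names(fits)
```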


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] Novice question about getting data into R

2008-09-19 Thread Duncan Murdoch

On 9/19/2008 1:01 PM, Ted Byers wrote:

> I found it easy to use R when typing data manually into it.  Now I need to
> read data from a file, and I get the following errors:
>
>> refdata = read.table("K:\\MerchantData\\RiskModel\\refund_distribution.csv", header = TRUE)


If your file is really a comma separated file, use read.csv, not 
read.table (which defaults to white space separators).


Duncan Murdoch

> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
>   line 1 did not have 42 elements
>
>> refdata = read.table("K:\\MerchantData\\RiskModel\\refund_distribution.csv")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
>   line 2 did not have 42 elements
>
> (I'd tried the first version above because the first record has column
> names.)
>
> First, I don't know why R expects 42 elements in a record.
> There is one column for a time variable (weeks since a given week of samples
> were taken) and one for each week of sampling in the data file (Week 18
> through Week 37 inclusive).  And there are only 19 rows.
> The samples represented by the columns are independent, and the numbers in
> the columns are the fraction of events sampled that result in an event of
> another kind in the week since the sample was taken.
>
> The samples are not the same size, and starting with week 20, the number of
> values progressively gets smaller since there have been fewer than 37 weeks
> since the samples were taken.
>
> I can show you the contents of the data file if you wish.  It is
> unremarkable, csv, with strings used for column names enclosed in double
> quotes.
>
> I don't have to manually separate the samples into their own files, do I?  I
> was hoping to write a function that estimates the density function that best
> fits each sample individually, and then iterate over the columns, applying
> that function to each in turn.
>
> What is the best way to handle this?
>
> Thanks
>
> Ted



