Re: [R] Odd result

avi.e.gross Sun, 24 Sep 2023 08:34:24 -0700

David,

You have choices depending on your situation and plans.


Obviously the ideal solution is to make sure any CSV you save your EXCEL data 
in has exactly what you want. So if your original EXCEL file contains things 
like a blank character down around row 973, get rid of it, or else all the 
lines down to there may be picked up and filled with NA. I suggest deleting all 
the extra lines as a first try.

The other method to try is simply to read in the file and only keep complete 
cases. But your data shows you can have an NA in some columns, such as for 
7/25, so complete.cases() would also throw away rows you want to keep and is 
not a good choice.
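
To illustrate why, here is a small sketch using three rows shaped like your 
printout. The column names V1 and V2 are made up for the example and only two 
of the numeric columns are shown.

```r
# Three rows modelled on the quoted printout; V1 and V2 are hypothetical names.
DF <- data.frame(
  Date = c("5/31/22", "6/28/22", "7/25/22"),
  V1   = c(1.75525, 0.67950, NA),
  V2   = c(7.00, NA, 7.80)
)

# complete.cases() is TRUE only for rows with no NA in any column.
keep <- complete.cases(DF)

# Only the 5/31/22 row survives; the two partly filled rows are lost.
DF[keep, ]
```

Rows with a valid date and a few missing measurements are dropped along with 
the empty ones, which is probably not what you want.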

So since your first column (or maybe second) seems to be a date, and I think 
that is not optional, simply filter your data.frame to remove all rows where 
is.na(DF$COL) is TRUE, or where DF$COL is an empty string if read.csv() kept 
the blank dates as "" rather than NA. A similar stratagem is to check whether 
all columns are NA.
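
A minimal sketch of both filters follows. The inline CSV and its column names 
are invented stand-ins for your file, and I am assuming that empty fields 
should count as missing, which is why na.strings includes "".

```r
# Hypothetical miniature of the problem: the last two lines hold only commas.
csv_text <- "Date,Time,Flow,TP
1/12/23,1:00 PM,36.26,0.75
2/14/23,1:00 PM,20.71,0.05
7/25/22,,11.9,NA
,,,
,,,
"

# Treat empty fields as NA so that blank dates become missing values.
DF <- read.csv(text = csv_text, na.strings = c("", "NA"))

# Option 1: keep only the rows that have a date.
kept_by_date <- DF[!is.na(DF$Date), ]

# Option 2: drop the rows in which every column is NA.
all_na <- rowSums(!is.na(DF)) == 0
kept_not_empty <- DF[!all_na, ]
```

Both options keep the 7/25/22 row, which has a date but is missing some other 
values, and drop only the two empty rows.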

My guess is you may have re-used an EXCEL file and put new shorter data in it, 
or that the file has been edited and something was left where it should not be, 
perhaps something non-numeric. 

Another idea is to NOT use the CSV route at all and instead use one of the many 
packages that read the data from a native EXCEL format such as an XLSX file, 
where you can specify which tab you want and where on the page you want to read 
from. You can point it at the precise rectangular area you want.
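
As one possibility, the readxl package has read_excel(), which takes a sheet 
and a cell range. This is only a sketch: the file name, the sheet name and the 
range A1:H47 are guesses on my part, and it assumes readxl is installed.

```r
# Hypothetical file name; replace it with the path to your own workbook.
path <- "KurtzData.xlsx"

# Run only if readxl is installed and the file is actually there.
if (requireNamespace("readxl", quietly = TRUE) && file.exists(path)) {
  # sheet and range are assumptions: one header row plus 46 data rows,
  # eight columns wide, on a tab called "Sheet1".
  KurtzData <- readxl::read_excel(path, sheet = "Sheet1", range = "A1:H47")
  str(KurtzData)
}
```

Because only the named rectangle is read, anything stray further down the 
sheet never reaches R.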

And, of course, there is an assortment of cut/paste ways to get the data into 
your R program, although these are less useful if the data can change and you 
need to run the analysis again. Here is an example making use of the fact that 
on Windows, the copied text is tab separated.

# \t stands for the tab that separates the columns in the copied text
text="A\tB
1\t0
2\t1
3\t2
4\t3
5\t4
6\t5
7\t6
8\t7
9\t8
10\t9
"
df=read.csv(text=text, sep="\t")
df

    A B
1   1 0
2   2 1
3   3 2
4   4 3
5   5 4
6   6 5
7   7 6
8   8 7
9   9 8
10 10 9

-----Original Message-----
From: R-help <[email protected]> On Behalf Of Parkhurst, David
Sent: Saturday, September 23, 2023 6:55 PM
To: [email protected]
Subject: [R] Odd result

With help from several people, I used file.choose() to get my file name, and 
read.csv() to read in the file as KurtzData.  Then when I print KurtzData, the 
last several lines look like this:
39   5/31/22              16.0      341    1.75525 0.0201 0.0214   7.00
40   6/28/22  2:00 PM      0.0      215    0.67950 0.0156 0.0294     NA
41   7/25/22 11:00 AM      11.9   1943.5        NA     NA 0.0500   7.80
42   8/31/22                  0    220.5        NA     NA 0.0700  30.50
43   9/28/22              0.067     10.9        NA     NA 0.0700  10.20
44  10/26/22              0.086      237        NA     NA 0.1550  45.00
45   1/12/23  1:00 PM     36.26    24196        NA     NA 0.7500 283.50
46   2/14/23  1:00 PM     20.71       55        NA     NA 0.0500   2.40
47                                              NA     NA     NA     NA
48                                              NA     NA     NA     NA
49                                              NA     NA     NA     NA

Then the NA's go down to one numbered 973.  Where did those extras likely come 
from, and how do I get rid of them?  I assume I need to get rid of all the 
lines after #46,  to do calculations and graphics, no?

David

        [[alternative HTML version deleted]]



