Cool, looks like that'd do it, almost as if converting an entire record to a
character string and comparing strings.
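A minimal sketch of that string-comparison idea, assuming both objects are data frames with the same columns in the same order. The helper row_key and the "\r" separator are illustrative choices, not anything from the thread; the toy data mirrors the example quoted further down.

```r
# Toy data; x1 and x2 stand in for the two objects read from the RDS files.
x1 <- data.frame(X = 1:5, Y = rep(c("A", "B"), c(3, 2)),
                 stringsAsFactors = FALSE)
x2 <- data.frame(X = c(1, 2, -3, -4, 5), Y = rep(c("A", "B"), c(2, 3)),
                 stringsAsFactors = FALSE)

# Paste each row into one string. "\r" is an arbitrary separator assumed not
# to occur in the data, so that c("ab", "c") and c("a", "bc") stay distinct.
# unname() avoids a clash if a column happens to be called "sep".
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# Row numbers whose string forms differ.
mismatched <- which(row_key(x1) != row_key(x2))
mismatched
```

Note that pasting formats doubles to roughly 15 significant digits and turns NA into the text "NA", so very small numeric differences and NA-versus-"NA" differences would go unnoticed with this approach.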
-- M. B. Hardy, statistician
work: Applied Research Associates, S. E. Div.
8537 Six Forks Rd., # 6000 / Raleigh, NC 27615-2963
(919) 582-3329, fax: 582-3301
home: 1020 W. South St. / Raleigh, NC 27603-2162
(919) 834-1245
________________________________________
From: William Dunlap [[email protected]]
Sent: Saturday, January 27, 2018 4:57 PM
To: Marsh Hardy ARA/RISK
Cc: Ulrik Stervbo; Eric Berger; [email protected]
Subject: Re: [R] Newbie wants to compare 2 huge RDSs row by row.
If your two objects have class "data.frame" (look at class(objectName)) and they
both have the same number of columns and the same order of columns and the
column types match closely enough (use all.equal(x1, x2) for that), then you
can try
which( rowSums( x1 != x2 ) > 0)
E.g.,
> x1 <- data.frame(X=1:5, Y=rep(c("A","B"),c(3,2)))
> x2 <- data.frame(X=c(1,2,-3,-4,5), Y=rep(c("A","B"),c(2,3)))
> x1
  X Y
1 1 A
2 2 A
3 3 A
4 4 B
5 5 B
> x2
   X Y
1  1 A
2  2 A
3 -3 B
4 -4 B
5  5 B
> which( rowSums( x1 != x2 ) > 0)
[1] 3 4
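One thing the one-liner does not cover is missing values: x1 != x2 is NA wherever either side is NA, and rowSums() then returns NA for that row. A possible sketch that first checks the preconditions listed above and then treats "both NA" as a match and "one NA" as a mismatch (that NA policy is an assumption, not something stated in this thread):

```r
x1 <- data.frame(X = c(1, NA, 3), Y = c("A", "B", NA), stringsAsFactors = FALSE)
x2 <- data.frame(X = c(1, NA, 4), Y = c("A", NA, NA), stringsAsFactors = FALSE)

# Preconditions: both data frames, same shape, same column order.
stopifnot(is.data.frame(x1), is.data.frame(x2),
          identical(dim(x1), dim(x2)),
          identical(names(x1), names(x2)))

# Element-wise mismatch matrix; NA wherever either side is NA.
neq <- as.matrix(x1 != x2)

# Where an NA is involved, call it a mismatch only if exactly one side is NA.
na1 <- as.matrix(is.na(x1))
na2 <- as.matrix(is.na(x2))
either_na <- na1 | na2
neq[either_na] <- (na1 != na2)[either_na]

bad_rows <- which(rowSums(neq) > 0)
bad_rows
```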
If you want to allow small numeric differences but require exact character
matches, you will have to get a bit fancier. Splitting the data.frames into
character and numeric parts and comparing each part separately works well.
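A rough sketch of that split, assuming the column types of x2 match those of x1 and there are no missing values. The tolerance tol and its value are hypothetical and should be chosen to suit the data.

```r
x1 <- data.frame(X = c(1, 2, 3), Y = c("A", "B", "C"),
                 stringsAsFactors = FALSE)
x2 <- data.frame(X = c(1, 2 + 1e-12, 3.5), Y = c("A", "B", "D"),
                 stringsAsFactors = FALSE)

tol <- 1e-8  # hypothetical absolute tolerance

# Decide the split from x1; x2 is assumed to have matching column types.
is_num <- vapply(x1, is.numeric, logical(1))

# Numeric part: mismatch when the absolute difference exceeds tol.
num_bad <- abs(as.matrix(x1[is_num]) - as.matrix(x2[is_num])) > tol

# Non-numeric part: exact comparison.
chr_bad <- as.matrix(x1[!is_num]) != as.matrix(x2[!is_num])

# A row is flagged if either part has at least one mismatch.
bad_rows <- which(rowSums(num_bad) + rowSums(chr_bad) > 0)
bad_rows
```

Here row 2 differs only by 1e-12 in X and is not flagged, while row 3 differs in both columns and is.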
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Sat, Jan 27, 2018 at 1:18 PM, Marsh Hardy ARA/RISK
<[email protected]> wrote:
Hi Guys, I apologize for my rank & utter newness at R.
I used summary() and found about 95 variables, both character and numeric, all
with "Length:368842", which I assume is the # of records.
I'd like to know the record number (row #?) of any record where the data
doesn't match in the 2 files of what should be the same output.
Thanks in advance, M.
//
________________________________________
From: Ulrik Stervbo [[email protected]]
Sent: Saturday, January 27, 2018 10:00 AM
To: Eric Berger
Cc: Marsh Hardy ARA/RISK; [email protected]
Subject: Re: [R] Newbie wants to compare 2 huge RDSs row by row.
Also, it will be easier to provide helpful information if you'd describe what
in your data you want to compare and what you hope to get out of the comparison.
Best wishes,
Ulrik
Eric Berger <[email protected]> wrote on Sat, 27 Jan 2018, 08:18:
Hi Marsh,
An RDS is not a data structure such as a data.frame. It can be anything.
For example if I want to save my objects a, b, c I could do:
> saveRDS( list(a,b,c), file="tmp.RDS")
Then read them back later with
> myList <- readRDS( "tmp.RDS" )
Do you have additional information about your "RDSs" ?
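One way to answer that question is to read a file back and inspect what it holds. A small sketch; the temporary file and its contents are made up so the example is self-contained, and in practice path would point at one of the real RDS files.

```r
# Hypothetical file written to a temporary location for illustration only.
path <- tempfile(fileext = ".RDS")
saveRDS(data.frame(X = 1:3, Y = c("A", "B", "C")), file = path)

obj <- readRDS(path)
class(obj)          # "data.frame" here; could be a list, matrix, model fit, ...
is.data.frame(obj)  # TRUE only if row-by-row comparison makes sense as-is
dim(obj)            # number of rows and columns, if the object is rectangular
str(obj)            # compact overview of each column's type
```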
Eric
On Sat, Jan 27, 2018 at 6:54 AM, Marsh Hardy ARA/RISK
<[email protected]>
wrote:
> Each RDS is 40 MBs. What's a slick code to compare them row by row, IDing
> row numbers with mismatches?
>
> Thanks in advance.
>
> //
>
> ______________________________________________
> [email protected]
> mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>