Re: [R] Create a Data Frame from an XML

2013-01-24 Thread Adam Gabbert
Hi Ben,

Thanks for the info. That is definitely a viable solution for the example I
provided.  More commonly, though, I have larger files with more edits to
make.

The main reason I went with a data frame methodology is because of the file
types I have.  Essentially, I have an XML file with 50+ rows by 10 columns of
data in a similar form to the provided example.  Separately, I have an
.xlsx file that contains 3 to 6 columns of data that must replace a
particular column of the data in the XML file.  So my approach was
to have one XML data frame, one .xlsx data frame, combine them as
necessary, and output a final XML file with the updated data. Apparently, it
wasn't quite as trivial a problem as I was hoping.
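
The last step of that workflow, writing an edited data frame back out as z:row elements, might be sketched roughly as follows. This uses plain base R string formatting rather than the XML package's tree-building functions, so it is a swapped-in technique and not what was suggested in the thread. The helper names escape_attr and df_to_rows are made up for illustration, the sample values are copied from the desired output, and the real header (which would declare the z: prefix) is not shown in the thread, so it is left out here.

```r
# Edited data frame; values copied from the first rows of the desired output.
df <- data.frame(
  BRAND = c("DODGE", "TOYOTA", "DODGE"),
  NUM   = c(3, 4, 3),
  YEAR  = c(1999, 2000, 2001),
  VALUE = c("1", "12000", "12500"),
  stringsAsFactors = FALSE
)

# Escape characters that may not appear inside a double-quoted XML attribute.
# (Hypothetical helper, not part of any package.)
escape_attr <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  gsub("\"", "&quot;", x, fixed = TRUE)
}

# Build one '<z:row NAME="value" ... />' line per data frame row.
# (Hypothetical helper; the tag name is an argument so the prefix can change.)
df_to_rows <- function(d, tag = "z:row") {
  parts <- lapply(names(d), function(nm) {
    sprintf('%s="%s"', nm, escape_attr(as.character(d[[nm]])))
  })
  attrs <- do.call(paste, parts)  # joins the columns row by row with spaces
  sprintf("<%s %s />", tag, attrs)
}

xml_lines <- c("<data>", df_to_rows(df), "</data>")
writeLines(xml_lines)  # pass a file name as the second argument to write a file
```

Because the rows are built as text, the result is only well-formed XML once it sits under a header that declares the z: namespace.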


On Wed, Jan 23, 2013 at 8:09 PM, Ben Tupper btup...@bigelow.org wrote:

 Hi Adam,

  On Jan 23, 2013, at 11:36 AM, Adam Gabbert wrote:

  Hello Gentlemen,

 I mistakenly sent the message twice, because the first time I didn't
 receive a notification message so I was unsure if it went through properly.

 Your solutions worked great. Thank you!  I felt like I was fairly close
 just couldn't quite get the final step.

 Now, I'm trying to reverse the process and account for my header.

 In other words I have my data frame in R:

 BRAND   NUM  YEAR  VALUE
 GMC     1    1999  1
 FORD    2    2000  12000
 GMC     1    2001  12500
  etc
 and I make some edits.
 BRAND   NUM  YEAR  VALUE
 DODGE   3    1999  1
 TOYOTA  4    2000  12000
 DODGE   3    2001  12500


 You needn't transform to a data frame if all you need to do is tweak the
 values of some of the attributes.  You can always set the attributes of
 each row node directly.

  library(XML)

  s <- c("<data>",
         "<row BRAND=\"GMC\" NUM=\"1\" YEAR=\"1999\" VALUE=\"1\" />",
         "<row BRAND=\"FORD\" NUM=\"1\" YEAR=\"2000\" VALUE=\"12000\" />",
         "<row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2001\" VALUE=\"12500\" />",
         "<row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2008\" VALUE=\"22000\" />",
         "</data>")

  x <- xmlRoot(xmlTreeParse(s, asText = TRUE, useInternalNodes = TRUE))

  # take the first <row> node, show it, overwrite its attributes, show it again
  node <- x["row"][[1]]
  node
  xmlAttrs(node) <- c(BRAND = "BUICK", NUM = 3, YEAR = 2000, VALUE = 0)
  node
  x



  So now I would need to output an XML file in the same format, accounting
 for my header (essentially, add "z:" in front of "row").



 I think that what you're describing is a namespace identifier.  Check the
 XML package help for ?xmlNamespace.  In particular, check this example on the
 help page.

   node <- xmlNode("arg", xmlNode("name", "foo"), namespace = "R")
   xmlNamespace(node)



 Cheers,
 Ben


Re: [R] Create a Data Frame from an XML

2013-01-23 Thread Adam Gabbert
Hello Gentlemen,

I mistakenly sent the message twice, because the first time I didn't
receive a notification message so I was unsure if it went through properly.

Your solutions worked great. Thank you!  I felt like I was fairly close
but just couldn't quite get the final step.

Now, I'm trying to reverse the process and account for my header.

In other words I have my data frame in R:

BRAND   NUM  YEAR  VALUE
GMC     1    1999  1
FORD    2    2000  12000
GMC     1    2001  12500
 etc
and I make some edits.
BRAND   NUM  YEAR  VALUE
DODGE   3    1999  1
TOYOTA  4    2000  12000
DODGE   3    2001  12500
So now I would need to output an XML file in the same format, accounting for
my header (essentially, add "z:" in front of "row").

(What I want to output)
   <data>
   <z:row BRAND="DODGE" NUM="3" YEAR="1999" VALUE="1" />
   <z:row BRAND="TOYOTA" NUM="4" YEAR="2000" VALUE="12000" />
   <z:row BRAND="DODGE" NUM="3" YEAR="2001" VALUE="12500" />
   <z:row BRAND="TOYOTA" NUM="4" YEAR="2002" VALUE="13000" />
   <z:row BRAND="DODGE" NUM="3" YEAR="2003" VALUE="14000" />
   <z:row BRAND="TOYOTA" NUM="4" YEAR="2004" VALUE="17000" />
   <z:row BRAND="DODGE" NUM="3" YEAR="2005" VALUE="15000" />
   <z:row BRAND="DODGE" NUM="3" YEAR="1967" VALUE="PRICELESS" />
   <z:row BRAND="TOYOTA" NUM="4" YEAR="2007" VALUE="17500" />
   <z:row BRAND="DODGE" NUM="3" YEAR="2008" VALUE="22000" />
   </data>
Thus far from the help I've found online I was trying to set up an xmlTree
xml <- xmlTree()

and use xml$addTag to create nodes and put in the data from my data frame.
I feel like I'm not really even close to a solution so I'm starting to
believe that this might not be the best path to go down.

Once again, any help is much appreciated.

AG


On Tue, Jan 22, 2013 at 6:04 PM, Duncan Temple Lang dtemplel...@ucdavis.edu
 wrote:


 Hi Adam

  [You seem to have sent the same message twice to the mailing list.]

 There are various strategies/approaches to creating the data frame
 from the XML.

 Perhaps the approach that most closely follows your approach is

   xmlRoot(doc)[ "row" ]

 which returns a list of XML nodes whose node name is "row" that are
 children of the root node <data>.

 So
   sapply(xmlRoot(doc) [ "row" ], xmlAttrs)

 yields a matrix with as many columns as there are <row> nodes
 and with 4 rows - one for each of the BRAND, NUM, YEAR and VALUE attributes.

 So

   d = t( sapply(xmlRoot(doc) [ "row" ], xmlAttrs) )

 gives you a matrix with the correct rows and column orientation
 and now you can turn that into a data frame, converting the
 columns into numbers, etc. as you want with regular R commands
 (i.e. independently of the XML).
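
 A minimal end-to-end sketch of that recipe is given below, assuming the XML
 package is installed. It parses a three-row excerpt of the sample data from a
 string instead of reading Sample2.xml, and it selects the nodes with
 getNodeSet() and an XPath expression rather than the [ "row" ] subsetting
 shown above; both choices are assumptions made to keep the example
 self-contained.

```r
library(XML)

# Three rows taken from the sample data in the original post.
xml_text <- '<data>
  <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="1" />
  <row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000" />
  <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" />
</data>'

doc <- xmlParse(xml_text, asText = TRUE)

# All <row> children of the root <data> node.
rows <- getNodeSet(doc, "/data/row")

# sapply() gives one column per node and one row per attribute,
# so transpose to get one row per node.
m <- t(sapply(rows, xmlAttrs))

d <- as.data.frame(m, stringsAsFactors = FALSE)

# Attributes always arrive as character strings; convert the numeric columns.
d$NUM  <- as.integer(d$NUM)
d$YEAR <- as.integer(d$YEAR)
# VALUE is left as character because it contains a non-numeric entry.

str(d)
```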


  D.


[R] Creating a Data Frame from an XML

2013-01-22 Thread Adam Gabbert
Hello,

I'm attempting to read information from an XML into a data frame in R using
the XML package. I am unable to get the data into a data frame as I would
like.  I have some sample code below.

*XML Code:*

Header...

Data I want in a data frame:

   <data>
  <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="1" />
  <row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000" />
  <row BRAND="GMC" NUM="1" YEAR="2001" VALUE="12500" />
  <row BRAND="FORD" NUM="1" YEAR="2002" VALUE="13000" />
  <row BRAND="GMC" NUM="1" YEAR="2003" VALUE="14000" />
  <row BRAND="FORD" NUM="1" YEAR="2004" VALUE="17000" />
  <row BRAND="GMC" NUM="1" YEAR="2005" VALUE="15000" />
  <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" />
  <row BRAND="FORD" NUM="1" YEAR="2007" VALUE="17500" />
  <row BRAND="GMC" NUM="1" YEAR="2008" VALUE="22000" />
  </data>

*R Code:*

library(XML)
doc <- xmlInternalTreeParse("Sample2.xml")
top <- xmlRoot(doc)
xmlName(top)
names(top)
art <- top[["row"]]
art

*Output:*

> art
<row BRAND="GMC" NUM="1" YEAR="1999" VALUE="1"/>




This is where I am having difficulties.  I am unable to access additional
rows (e.g.  <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" /> )

and I am unable to access the individual entries to actually create the
data frame.  The data frame I would like is as follows:

BRAND   NUM  YEAR  VALUE
GMC     1    1999  1
FORD    2    2000  12000
GMC     1    2001  12500
etc

Any help or suggestions would be appreciated.  Conversely, my eventual goal
would be to take a data frame and write it into an XML file in the previously
shown format.

Thank you

AG

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Select Original and Duplicates

2012-09-28 Thread Adam Gabbert
I would like to select all the duplicated rows of a data frame, including
the original.  Any help would be much appreciated.  This is where I'm at so
far. Thanks.

#Sample data frame:
df <- read.table(header=TRUE, con <- textConnection('
 label value
 A 4
 B 3
 C 6
 B 3
 B 1
 A 2
 A 4
 A 4
'))
close(con)

# Duplicate entries
df[duplicated(df),]

# label value
# B 3
# A 4
# A 4

#I want to select all the rows that are duplicated including the original
#This is the output I want
# label value
# B 3
# B 3
# A 4
# A 4
# A 4


Re: [R] Select Original and Duplicates

2012-09-28 Thread Adam Gabbert
That works. Thank you!

On Fri, Sep 28, 2012 at 4:22 PM, Rui Barradas ruipbarra...@sapo.pt wrote:

 Hello,

 Try the following.


 idx <- duplicated(df) | duplicated(df, fromLast = TRUE)
 df[idx, ]

 Note that they are returned in their original order in the df.
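
 To see why the two calls are combined: duplicated() on its own flags only the
 second and later copies of a row, while fromLast = TRUE flags every copy
 except the last one, so the logical OR covers all copies including the
 original. A self-contained sketch on the sample data follows; it uses the
 text= argument of read.table in place of textConnection, and the final
 ordering step is an optional addition, not part of the suggestion above.

```r
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
 label value
 A 4
 B 3
 C 6
 B 3
 B 1
 A 2
 A 4
 A 4
")

later   <- duplicated(df)                   # TRUE for 2nd, 3rd, ... copies
earlier <- duplicated(df, fromLast = TRUE)  # TRUE for all copies but the last
idx     <- later | earlier                  # TRUE for every row that has a twin

dups <- df[idx, ]                           # kept in original row order

# Optional: group identical rows together.
dups_sorted <- dups[order(dups$label, dups$value), ]
print(dups_sorted)
```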

 Hope this helps,

 Rui Barradas


[R] Error Bars ggplot2

2012-07-26 Thread Adam Gabbert
Hello,

I'm attempting to plot error bars side by side rather than stacked on top
of each other with ggplot2.  Here is the sample code I am using:

#Code

#Data
spd <- c("s","f","f","s","f","s","s","s","f","f","s","f")
r <- c(4.9,3.2,2.1,.2,3.8,6.4,7.5,1.7,3.4,4.1,2.2,5)

#Turn spd into a factor
spd.f <- factor(spd)

#Place data into a data frame
data <- data.frame(cbind(spd.f,r))

#Load ggplot2
library(ggplot2)

#Create plot
pd <- position_dodge(.2)
myplot <- ggplot(data,aes(x=spd,y=r,colour=spd))+
  geom_errorbar(aes(ymin=3,ymax=5),width=.1)+
  geom_point()+
  geom_errorbar(aes(ymin=1,ymax=6),width=.1,colour="black",position=pd)
#Display plot
myplot

I have attached a plot that my sample code produces.  As you can see the
error bars are stacked.  How can I get them to plot side by side?

Thanks

AG

Re: [R] Error Bars ggplot2

2012-07-26 Thread Adam Gabbert
Hi Dennis,

Part of my problem could be that I'm unsure how to nest another variable
within spd.f.  Perhaps if I give a better explanation of my goal things will
make more sense.

My intent is to calculate two sets of confidence intervals to show the
benefits of a DOE approach versus a non-DOE approach.  For example, I want
to calculate the confidence intervals for "f" and "s" in two ways.  First,
using a DOE approach, by taking the mean of the "f" or "s" values
plus/minus the t-value for a 95% CI with 10 df multiplied by the standard error
[mean(f-values) +- tval(.95ci,10df)*std.err].  Second, using a bin-specific
approach which only looks at the 6 "f" or "s" values [i.e.,
mean(f-values) +- tval(.95ci,5df)*std.err.mean].

This will leave me with two confidence intervals, that I want to plot side
by side to show that the DOE approach confidence interval will be smaller
for most cases.  I have attached a sample plot that shows the plot layout
I'm trying to get.
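
As a rough sketch of the two calculations, under some assumptions: the "DOE"
standard error is taken here to come from the residual variance pooled over
both groups (12 - 2 = 10 df), the bin-specific one from each group's own
standard deviation (6 - 1 = 5 df), and the 95% interval is taken to be
two-sided. The variable names are made up for illustration.

```r
spd <- c("s","f","f","s","f","s","s","s","f","f","s","f")
r   <- c(4.9, 3.2, 2.1, 0.2, 3.8, 6.4, 7.5, 1.7, 3.4, 4.1, 2.2, 5)

grp_mean <- tapply(r, spd, mean)
grp_n    <- tapply(r, spd, length)
grp_sd   <- tapply(r, spd, sd)

# Pooled interval: residual variance around the group means, N - k df.
df_pooled   <- length(r) - length(grp_mean)
s_pooled    <- sqrt(sum((r - grp_mean[spd])^2) / df_pooled)
half_pooled <- qt(0.975, df_pooled) * s_pooled / sqrt(grp_n)

# Bin-specific interval: each group's own sd, n - 1 df.
half_bin <- qt(0.975, grp_n - 1) * grp_sd / sqrt(grp_n)

# Long format: one row per (spd, method) pair, ready for plotting.
ci <- data.frame(
  spd    = rep(names(grp_mean), times = 2),
  method = rep(c("pooled", "bin"), each = length(grp_mean)),
  mean   = rep(as.numeric(grp_mean), times = 2),
  half   = c(as.numeric(half_pooled), as.numeric(half_bin))
)
ci$lower <- ci$mean - ci$half
ci$upper <- ci$mean + ci$half
print(ci)
```

The long layout, with a method column next to spd, is what gives each spd
level two intervals to place side by side.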

Thanks

AG



Hi:

Your code makes no sense because the variable by which you want to
dodge is the same as the one you're using on the x-axis. Dodging by
subgroups is an application of visualizing nested data, which you
don't have (at least in the state that you posted). For your data,
this would work:

ggplot(data, aes(x = spd, y = r, colour = spd)) +
   geom_errorbar(aes(ymin = 3, ymax = 5), width = 0.1) +
   geom_point()

I don't understand the point of the second geom_errorbar() call, so
I'm just avoiding it.

In order to dodge (appose groups in factor B side by side within each
level of factor A), you need a third variable whose values are nested
within levels of spd.f.
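
A minimal sketch of such a layout, assuming a third variable called method
that distinguishes two intervals per level of spd. The interval endpoints
below are made-up numbers used only to show the layout; they are not computed
from the posted data.

```r
library(ggplot2)

# Made-up endpoints, for layout only: two intervals per level of spd.
ci <- data.frame(
  spd    = rep(c("f", "s"), times = 2),
  method = rep(c("DOE", "bin"), each = 2),
  mean   = c(3.6, 3.8, 3.6, 3.8),
  lower  = c(2.0, 2.2, 1.5, 1.0),
  upper  = c(5.2, 5.4, 5.7, 6.6)
)

# Mapping colour to method creates the groups; the same dodge is applied to
# both layers so each point stays centred on its own error bar.
pd <- position_dodge(width = 0.3)
p <- ggplot(ci, aes(x = spd, y = mean, colour = method)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.1, position = pd) +
  geom_point(position = pd)

# print(p) draws the plot on the active graphics device.
```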

HTH,
Dennis
