Re: [R] Create a Data Frame from an XML
Hi Ben,

Thanks for the info. That is definitely a viable solution for the example I provided. More commonly, though, I have larger files with more edits to make. The main reason I went with a data frame approach is because of the file types I have. Essentially, I have an XML file with 50+ rows by 10 columns of data in a form similar to the provided example. Separately, I have an .xlsx file that contains 3 to 6 columns of data that must replace a particular column of the data in the XML file. So my approach was to have one XML data frame and one .xlsx data frame, combine them as necessary, and output a final XML file with the updated data. Apparently, it wasn't quite as trivial a problem as I was hoping.

On Wed, Jan 23, 2013 at 8:09 PM, Ben Tupper btup...@bigelow.org wrote:

> Hi Adam,
>
> On Jan 23, 2013, at 11:36 AM, Adam Gabbert wrote:
>
>> Hello Gentlemen,
>>
>> I mistakenly sent the message twice, because the first time I didn't
>> receive a notification message so I was unsure if it went through properly.
>>
>> Your solutions worked great. Thank you! I felt like I was fairly close,
>> I just couldn't quite get the final step.
>>
>> Now I'm trying to reverse the process and account for my header.
>>
>> In other words, I have my data frame in R:
>>
>>   BRAND  NUM  YEAR  VALUE
>>   GMC    1    1999  1
>>   FORD   2    2000  12000
>>   GMC    1    2001  12500
>>   etc.
>>
>> and I make some edits:
>>
>>   BRAND   NUM  YEAR  VALUE
>>   DODGE   3    1999  1
>>   TOYOTA  4    2000  12000
>>   DODGE   3    2001  12500
>
> You needn't transform to a data frame if all you need to do is tweak the
> values of some of the attributes. You can always set the attributes of each
> row node directly.
>
>   s <- c("<data>",
>     "<row BRAND=\"GMC\" NUM=\"1\" YEAR=\"1999\" VALUE=\"1\" />",
>     "<row BRAND=\"FORD\" NUM=\"1\" YEAR=\"2000\" VALUE=\"12000\" />",
>     "<row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2001\" VALUE=\"12500\" />",
>     "<row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2008\" VALUE=\"22000\" />",
>     "</data>")
>
>   x <- xmlRoot(xmlTreeParse(s, asText = TRUE, useInternalNodes = TRUE))
>
>   node <- x["row"][[1]]
>   node
>   xmlAttrs(node) <- c(BRAND = "BUICK", NUM = 3, YEAR = 2000, VALUE = 0)
>   node
>   x
>
>> So now I would need to output an XML file in the same format, accounting
>> for my header (essentially, add z: in front of row).
>
> I think that what you're describing is a namespace identifier. Check the
> XML package help for ?xmlNamespace. In particular, check this example on
> the help page:
>
>   node <- xmlNode("arg", xmlNode("name", "foo"), namespace = "R")
>   xmlNamespace(node)
>
> Cheers,
> Ben
>
>> (What I want to output)
>>
>>   <data>
>>     <z:row BRAND="DODGE" NUM="3" YEAR="1999" VALUE="1" />
>>     <z:row BRAND="TOYOTA" NUM="4" YEAR="2000" VALUE="12000" />
>>     <z:row BRAND="DODGE" NUM="3" YEAR="2001" VALUE="12500" />
>>     <z:row BRAND="TOYOTA" NUM="4" YEAR="2002" VALUE="13000" />
>>     <z:row BRAND="DODGE" NUM="3" YEAR="2003" VALUE="14000" />
>>     <z:row BRAND="TOYOTA" NUM="4" YEAR="2004" VALUE="17000" />
>>     <z:row BRAND="DODGE" NUM="3" YEAR="2005" VALUE="15000" />
>>     <z:row BRAND="DODGE" NUM="3" YEAR="1967" VALUE="PRICELESS" />
>>     <z:row BRAND="TOYOTA" NUM="4" YEAR="2007" VALUE="17500" />
>>     <z:row BRAND="DODGE" NUM="3" YEAR="2008" VALUE="22000" />
>>   </data>
>>
>> Thus far, from the help I've found online, I was trying to set up an
>> xmlTree
>>
>>   xml <- xmlTree()
>>
>> and use xml$addTag to create nodes and put in the data from my data frame.
>> I feel like I'm not really even close to a solution, so I'm starting to
>> believe that this might not be the best path to go down.
>>
>> Once again, any help is much appreciated.
>>
>> AG
>>
>> On Tue, Jan 22, 2013 at 6:04 PM, Duncan Temple Lang
>> dtemplel...@ucdavis.edu wrote:
>>
>>> Hi Adam
>>>
>>> [You seem to have sent the same message twice to the mailing list.]
>>>
>>> There are various strategies/approaches to creating the data frame
>>> from the XML.
>>>
>>> Perhaps the approach that most closely follows your approach is
>>>
>>>   xmlRoot(doc)[ "row" ]
>>>
>>> which returns a list of XML nodes whose node name is "row" that are
>>> children of the root node <data>.
>>>
>>> So
>>>
>>>   sapply(xmlRoot(doc) [ "row" ], xmlAttrs)
>>>
>>> yields a matrix with as many columns as there are <row> nodes and with
>>> 3 rows - one for each of the BRAND, YEAR and VALUE attributes.
>>>
>>> So
>>>
>>>   d = t( sapply(xmlRoot(doc) [ "row" ], xmlAttrs) )
>>>
>>> gives you a matrix with the correct rows and column orientation, and
>>> now you can turn that into a data frame, converting the columns into
>>> numbers, etc. as you want with regular R commands (i.e. independently
>>> of the XML).
>>>
>>> D.
>>>
>>> On 1/22/13 1:43 PM, Adam Gabbert wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm attempting to read information from an XML into a data frame in R
>>>> using the XML package. I am unable to get the data into a data frame
>>>> as I would like. I have some sample code below.
>>>>
>>>> *XML Code:*
>>>>
>>>> Header...
>>>>
>>>> Data I want in a data frame:
>>>>
>>>>   <data>
>>>>     <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="1" />
>>>>     <row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000" />
>>>>     <row BRAND="GMC" NUM="1" YEAR="2001" VALUE="12500" />
>>>>     <row BRAND="FORD" NUM="1" YEAR="2002" VALUE="13000" />
>>>>     <row BRAND="GMC" NUM="1" YEAR="2003" VALUE="14000" />
>>>>     <row BRAND
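For the write-out step discussed above, here is a minimal sketch in base R. It does not use the XML package's tree or namespace functions; it simply formats each data frame row as a `<z:row ... />` string, which is a swapped-in shortcut rather than what Ben or Duncan proposed. The helper name `row_to_xml` is hypothetical, the sketch assumes the header that declares the `z` prefix is written separately, and it does not escape special characters such as `&` or `"` in the values.

```r
# Edited data frame, as in the example above (VALUE kept as text because
# the full data contains a non-numeric entry such as "PRICELESS").
df <- data.frame(
  BRAND = c("DODGE", "TOYOTA", "DODGE"),
  NUM   = c(3, 4, 3),
  YEAR  = c(1999, 2000, 2001),
  VALUE = c("1", "12000", "12500"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of any package): builds one
# '<z:row NAME="value" ... />' string per data frame row.
row_to_xml <- function(d, tag = "z:row") {
  # One character vector of NAME="value" pieces per column
  pieces <- Map(function(nm, col) sprintf('%s="%s"', nm, as.character(col)),
                names(d), d)
  # Paste the pieces together row by row
  attrs <- do.call(paste, unname(pieces))
  sprintf("<%s %s />", tag, attrs)
}

xml_lines <- c("<data>", row_to_xml(df), "</data>")
writeLines(xml_lines)                      # print to the console
# writeLines(xml_lines, "edited.xml")      # or write to a file (name is an assumption)
```

Because the rows are ordinary strings, replacing a column from another data frame (for example one read from the .xlsx file) is just a normal column assignment on `df` before calling the helper.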
Re: [R] Create a Data Frame from an XML
Hello Gentlemen,

I mistakenly sent the message twice, because the first time I didn't receive a notification message so I was unsure if it went through properly.

Your solutions worked great. Thank you! I felt like I was fairly close, I just couldn't quite get the final step.

Now I'm trying to reverse the process and account for my header.

In other words, I have my data frame in R:

  BRAND  NUM  YEAR  VALUE
  GMC    1    1999  1
  FORD   2    2000  12000
  GMC    1    2001  12500
  etc.

and I make some edits:

  BRAND   NUM  YEAR  VALUE
  DODGE   3    1999  1
  TOYOTA  4    2000  12000
  DODGE   3    2001  12500

So now I would need to output an XML file in the same format, accounting for my header (essentially, add z: in front of row).

(What I want to output)

  <data>
    <z:row BRAND="DODGE" NUM="3" YEAR="1999" VALUE="1" />
    <z:row BRAND="TOYOTA" NUM="4" YEAR="2000" VALUE="12000" />
    <z:row BRAND="DODGE" NUM="3" YEAR="2001" VALUE="12500" />
    <z:row BRAND="TOYOTA" NUM="4" YEAR="2002" VALUE="13000" />
    <z:row BRAND="DODGE" NUM="3" YEAR="2003" VALUE="14000" />
    <z:row BRAND="TOYOTA" NUM="4" YEAR="2004" VALUE="17000" />
    <z:row BRAND="DODGE" NUM="3" YEAR="2005" VALUE="15000" />
    <z:row BRAND="DODGE" NUM="3" YEAR="1967" VALUE="PRICELESS" />
    <z:row BRAND="TOYOTA" NUM="4" YEAR="2007" VALUE="17500" />
    <z:row BRAND="DODGE" NUM="3" YEAR="2008" VALUE="22000" />
  </data>

Thus far, from the help I've found online, I was trying to set up an xmlTree

  xml <- xmlTree()

and use xml$addTag to create nodes and put in the data from my data frame. I feel like I'm not really even close to a solution, so I'm starting to believe that this might not be the best path to go down.

Once again, any help is much appreciated.

AG

On Tue, Jan 22, 2013 at 6:04 PM, Duncan Temple Lang dtemplel...@ucdavis.edu wrote:

> Hi Adam
>
> [You seem to have sent the same message twice to the mailing list.]
>
> There are various strategies/approaches to creating the data frame
> from the XML.
>
> Perhaps the approach that most closely follows your approach is
>
>   xmlRoot(doc)[ "row" ]
>
> which returns a list of XML nodes whose node name is "row" that are
> children of the root node <data>.
>
> So
>
>   sapply(xmlRoot(doc) [ "row" ], xmlAttrs)
>
> yields a matrix with as many columns as there are <row> nodes and with
> 3 rows - one for each of the BRAND, YEAR and VALUE attributes.
>
> So
>
>   d = t( sapply(xmlRoot(doc) [ "row" ], xmlAttrs) )
>
> gives you a matrix with the correct rows and column orientation, and
> now you can turn that into a data frame, converting the columns into
> numbers, etc. as you want with regular R commands (i.e. independently
> of the XML).
>
> D.
>
> On 1/22/13 1:43 PM, Adam Gabbert wrote:
>
>> Hello,
>>
>> I'm attempting to read information from an XML into a data frame in R
>> using the XML package. I am unable to get the data into a data frame
>> as I would like. I have some sample code below.
>>
>> *XML Code:*
>>
>> Header...
>>
>> Data I want in a data frame:
>>
>>   <data>
>>     <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="1" />
>>     <row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000" />
>>     <row BRAND="GMC" NUM="1" YEAR="2001" VALUE="12500" />
>>     <row BRAND="FORD" NUM="1" YEAR="2002" VALUE="13000" />
>>     <row BRAND="GMC" NUM="1" YEAR="2003" VALUE="14000" />
>>     <row BRAND="FORD" NUM="1" YEAR="2004" VALUE="17000" />
>>     <row BRAND="GMC" NUM="1" YEAR="2005" VALUE="15000" />
>>     <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" />
>>     <row BRAND="FORD" NUM="1" YEAR="2007" VALUE="17500" />
>>     <row BRAND="GMC" NUM="1" YEAR="2008" VALUE="22000" />
>>   </data>
>>
>> *R Code:*
>>
>>   doc <- xmlInternalTreeParse("Sample2.xml")
>>   top <- xmlRoot(doc)
>>   xmlName(top)
>>   names(top)
>>   art <- top[["row"]]
>>   art
>>
>> *Output:*
>>
>>   > art
>>   <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="1"/>
>>
>> This is where I am having difficulties. I am unable to access
>> additional rows (i.e. <row BRAND="GMC" NUM="1" YEAR="1967"
>> VALUE="PRICLESS" />) and I am unable to access the individual entries
>> to actually create the data frame. The data frame I would like is as
>> follows:
>>
>>   BRAND  NUM  YEAR  VALUE
>>   GMC    1    1999  1
>>   FORD   2    2000  12000
>>   GMC    1    2001  12500
>>   etc.
>>
>> Any help or suggestions would be appreciated. Conversely, my eventual
>> goal would be to take a data frame and write it into an XML in the
>> previously shown format.
>>
>> Thank you
>>
>> AG

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Creating a Data Frame from an XML
Hello,

I'm attempting to read information from an XML into a data frame in R using the XML package. I am unable to get the data into a data frame as I would like. I have some sample code below.

*XML Code:*

Header...

Data I want in a data frame:

  <data>
    <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="1" />
    <row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000" />
    <row BRAND="GMC" NUM="1" YEAR="2001" VALUE="12500" />
    <row BRAND="FORD" NUM="1" YEAR="2002" VALUE="13000" />
    <row BRAND="GMC" NUM="1" YEAR="2003" VALUE="14000" />
    <row BRAND="FORD" NUM="1" YEAR="2004" VALUE="17000" />
    <row BRAND="GMC" NUM="1" YEAR="2005" VALUE="15000" />
    <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" />
    <row BRAND="FORD" NUM="1" YEAR="2007" VALUE="17500" />
    <row BRAND="GMC" NUM="1" YEAR="2008" VALUE="22000" />
  </data>

*R Code:*

  doc <- xmlInternalTreeParse("Sample2.xml")
  top <- xmlRoot(doc)
  xmlName(top)
  names(top)
  art <- top[["row"]]
  art

*Output:*

  > art
  <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="1"/>

This is where I am having difficulties. I am unable to access additional rows (i.e. <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" />) and I am unable to access the individual entries to actually create the data frame. The data frame I would like is as follows:

  BRAND  NUM  YEAR  VALUE
  GMC    1    1999  1
  FORD   2    2000  12000
  GMC    1    2001  12500
  etc.

Any help or suggestions would be appreciated. Conversely, my eventual goal would be to take a data frame and write it into an XML in the previously shown format.

Thank you

AG
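A minimal sketch of one way to get all the rows into a data frame, assuming the XML package is installed and using a shortened inline copy of the sample data instead of Sample2.xml. Double brackets (`top[["row"]]`) return only the first matching child, whereas single brackets with the node name return every `<row>` child, and `xmlAttrs` can then be applied to each. The column name `VALUE_NUM` is an assumption introduced only for illustration.

```r
library(XML)

# Shortened inline copy of the sample data (an assumption; the real file
# also has a header that is not shown in the post).
txt <- '<data>
  <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="1" />
  <row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000" />
  <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" />
</data>'

doc  <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)
rows <- xmlRoot(doc)["row"]            # all <row> children, not just the first

# One column per node from sapply(); transpose so each node becomes a row
m <- t(sapply(rows, xmlAttrs))
rownames(m) <- NULL                    # drop the repeated "row" labels

d <- as.data.frame(m, stringsAsFactors = FALSE)

# Attributes arrive as text; convert the columns that should be numbers
d$NUM  <- as.integer(d$NUM)
d$YEAR <- as.integer(d$YEAR)
# VALUE holds a non-numeric entry, so keep the text and add a numeric copy
# in which that entry becomes NA
d$VALUE_NUM <- suppressWarnings(as.numeric(d$VALUE))

print(d)
```

With the four attributes in this sample (BRAND, NUM, YEAR, VALUE), the untransposed matrix has four rows and one column per `<row>` node.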
[R] Select Original and Duplicates
I would like to select all the duplicate rows of a data frame, including the original. Any help would be much appreciated. This is where I'm at so far. Thanks.

  #Sample data frame:
  df <- read.table(header=T, con <- textConnection('
   label value
   A 4
   B 3
   C 6
   B 3
   B 1
   A 2
   A 4
   A 4
  '))
  close(con)

  # Duplicate entries
  df[duplicated(df),]

  # label value
  # B 3
  # A 4
  # A 4

  #I want to select all the rows that are duplicated including the original
  #This is the output I want
  # label value
  # B 3
  # B 3
  # A 4
  # A 4
  # A 4
Re: [R] Select Original and Duplicates
That works. Thank you!

On Fri, Sep 28, 2012 at 4:22 PM, Rui Barradas ruipbarra...@sapo.pt wrote:

> Hello,
>
> Try the following.
>
>   idx <- duplicated(df) | duplicated(df, fromLast = TRUE)
>   df[idx, ]
>
> Note that they are returned in their original order in the df.
>
> Hope this helps,
>
> Rui Barradas
>
> On 28-09-2012 21:11, Adam Gabbert wrote:
>
>> I would like to select all the duplicate rows of a data frame,
>> including the original. Any help would be much appreciated. This is
>> where I'm at so far. Thanks.
>>
>>   #Sample data frame:
>>   df <- read.table(header=T, con <- textConnection('
>>    label value
>>    A 4
>>    B 3
>>    C 6
>>    B 3
>>    B 1
>>    A 2
>>    A 4
>>    A 4
>>   '))
>>   close(con)
>>
>>   # Duplicate entries
>>   df[duplicated(df),]
>>
>>   # label value
>>   # B 3
>>   # A 4
>>   # A 4
>>
>>   #I want to select all the rows that are duplicated including the original
>>   #This is the output I want
>>   # label value
>>   # B 3
>>   # B 3
>>   # A 4
>>   # A 4
>>   # A 4
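To see why the two-sided test picks up the originals as well, here is a small self-contained sketch using the sample data from the thread. `duplicated(df)` flags every repeat after the first occurrence, and `fromLast = TRUE` flags every repeat before the last occurrence, so their union covers all members of each duplicated group. The final sort step is an addition for illustration (the variable names `dups` and `dups_grouped` are assumptions); it groups identical rows together, although the groups come out in sorted order rather than in the order shown in the question.

```r
df <- read.table(header = TRUE, text = "
label value
A 4
B 3
C 6
B 3
B 1
A 2
A 4
A 4
")

# TRUE for any row that has an identical twin elsewhere in the data frame
idx  <- duplicated(df) | duplicated(df, fromLast = TRUE)
dups <- df[idx, ]            # rows 1, 2, 4, 7, 8, in their original order
print(dups)

# Optional: sort so that identical rows sit next to each other
dups_grouped <- dups[order(dups$label, dups$value), ]
print(dups_grouped)
```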
[R] Error Bars ggplot2
Hello,

I'm attempting to plot error bars side by side rather than stacked on top of each other with ggplot2.

Here is the sample code I am using:

  #Code
  #Data
  spd <- c("s","f","f","s","f","s","s","s","f","f","s","f")
  r <- c(4.9,3.2,2.1,.2,3.8,6.4,7.5,1.7,3.4,4.1,2.2,5)

  #Turn spd into a factor
  spd.f <- factor(spd)

  #Place data into a data frame
  data <- data.frame(cbind(spd.f,r))

  #Load ggplot2
  library(ggplot2)

  #Create plot
  pd <- position_dodge(.2)
  myplot <- ggplot(data,aes(x=spd,y=r,colour=spd))+
    geom_errorbar(aes(ymin=3,ymax=5),width=.1)+
    geom_point()+
    geom_errorbar(aes(ymin=1,ymax=6),width=.1,colour="black",position=pd)

  #Display plot
  myplot

I have attached a plot that my sample code produces. As you can see, the error bars are stacked. How can I get them to plot side by side?

Thanks

AG
Re: [R] Error Bars ggplot2
Hi Dennis,

Part of my problem could be that I'm unsure how to nest another variable within spd.f. Perhaps if I give a better explanation of my goal, things will make more sense.

My intent is to calculate two sets of confidence intervals to show the benefits of a DOE approach versus a non-DOE approach.

For example, I want to calculate the confidence intervals for f and s in two ways. First, using a DOE approach, by taking the mean of the f or s values plus/minus the t-value at a 95% CI with 10 df multiplied by the standard error [mean(f-values) +- tval(.95ci,10df)*std.err]. Second, using a bin-specific approach, which only looks at the 6 f or s values (i.e., mean(f-values) +- tval(.95ci, 5df)*std.err.mean).

This will leave me with two confidence intervals that I want to plot side by side to show that the DOE approach confidence interval will be smaller for most cases.

I have attached a sample plot that shows the plot layout I'm trying to get.

Thanks

AG

> Hi:
>
> Your code makes no sense because the variable by which you want to
> dodge is the same as the one you're using on the x-axis. Dodging by
> subgroups is an application of visualizing nested data, which you don't
> have (at least in the state that you posted). For your data, this would
> work:
>
>   ggplot(data, aes(x = spd, y = r, colour = spd)) +
>     geom_errorbar(aes(ymin = 3, ymax = 5), width = 0.1) +
>     geom_point()
>
> I don't understand the point of the second geom_errorbar() call, so I'm
> just avoiding it.
>
> In order to dodge (appose groups in factor B side by side within each
> level of factor A), you need a third variable whose values are nested
> within levels of spd.f.
>
> HTH,
> Dennis
>
> On Thu, Jul 26, 2012 at 6:03 AM, Adam Gabbert adamjgabb...@gmail.com wrote:
>
>> Hello,
>>
>> I'm attempting to plot error bars side by side rather than stacked on
>> top of each other with ggplot2.
>>
>> Here is the sample code I am using:
>>
>>   #Code
>>   #Data
>>   spd <- c("s","f","f","s","f","s","s","s","f","f","s","f")
>>   r <- c(4.9,3.2,2.1,.2,3.8,6.4,7.5,1.7,3.4,4.1,2.2,5)
>>
>>   #Turn spd into a factor
>>   spd.f <- factor(spd)
>>
>>   #Place data into a data frame
>>   data <- data.frame(cbind(spd.f,r))
>>
>>   #Load ggplot2
>>   library(ggplot2)
>>
>>   #Create plot
>>   pd <- position_dodge(.2)
>>   myplot <- ggplot(data,aes(x=spd,y=r,colour=spd))+
>>     geom_errorbar(aes(ymin=3,ymax=5),width=.1)+
>>     geom_point()+
>>     geom_errorbar(aes(ymin=1,ymax=6),width=.1,colour="black",position=pd)
>>
>>   #Display plot
>>   myplot
>>
>> I have attached a plot that my sample code produces. As you can see,
>> the error bars are stacked. How can I get them to plot side by side?
>>
>> Thanks
>>
>> AG
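The two intervals described above can be put into one summary data frame, with the calculation method as the third variable that Dennis mentions, and then dodged on that variable. The following is only a sketch under stated assumptions: "95%" is read as a two-sided interval (so the quantile is `qt(0.975, df)`), the "DOE" standard error is taken from the pooled within-group variance of a one-way layout (10 df), and the bin-specific one from each group's own standard deviation (5 df). The names `ci`, `method`, `half`, and the method labels are hypothetical, and the plot is drawn only if ggplot2 is installed.

```r
spd <- c("s","f","f","s","f","s","s","s","f","f","s","f")
r   <- c(4.9,3.2,2.1,.2,3.8,6.4,7.5,1.7,3.4,4.1,2.2,5)
dat <- data.frame(spd = factor(spd), r = r)

# Per-group size, mean, and standard deviation
n_g    <- tapply(dat$r, dat$spd, length)
mean_g <- tapply(dat$r, dat$spd, mean)
sd_g   <- tapply(dat$r, dat$spd, sd)

# Pooled interval (assumed reading of the "DOE" approach): 12 - 2 = 10 df
df_pool   <- nrow(dat) - nlevels(dat$spd)
s2_pool   <- sum((n_g - 1) * sd_g^2) / df_pool
half_pool <- qt(0.975, df_pool) * sqrt(s2_pool / n_g)

# Per-group interval: 6 - 1 = 5 df in each group
half_bin <- qt(0.975, n_g - 1) * sd_g / sqrt(n_g)

# One row per (group, method); "method" is the variable to dodge by
ci <- data.frame(
  spd    = rep(names(mean_g), times = 2),
  method = rep(c("pooled (10 df)", "per group (5 df)"), each = length(mean_g)),
  mean   = rep(as.numeric(mean_g), times = 2),
  half   = as.numeric(c(half_pool, half_bin))
)
ci$ymin <- ci$mean - ci$half
ci$ymax <- ci$mean + ci$half
print(ci)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  pd <- position_dodge(width = 0.4)
  p <- ggplot(ci, aes(x = spd, y = mean, colour = method)) +
    geom_errorbar(aes(ymin = ymin, ymax = ymax), width = 0.1, position = pd) +
    geom_point(position = pd)
  print(p)
}
```

Mapping `colour = method` gives ggplot2 a grouping variable within each level of `spd`, which is what lets `position_dodge()` place the two bars side by side.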