Re: [R] Calculating Betweenness - Efficiency problem
It would seem that you can output the initial file from EXCEL, read it into R with 'read.csv' and then use 'factor' to convert the characters for City1 and City2 to the numbers that you want to use. Have you tried this approach?

On Fri, Jul 18, 2008 at 3:51 PM, Senthil Purushothaman <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I am calculating 'Betweenness' of a large network using R. Currently, I have
> the node-node information (City1-City2) in an excel file, present in two
> columns where column A has City1 and column B has City2 that city1 is
> connected to. These are the steps that I go through to calculate betweenness
> of my network.
>
> a) Convert the City1-City2 (text) into Number1-Number2 in the excel file
>    where every unique city has a unique number.
> b) Paste all the city-city information separated by comma into c(...) in the
>    R GUI to obtain the corresponding vectors. As you can imagine this copy-paste
>    operation takes a long time. Example: c(1,3,1,5,2,4,2,5). Just fyi, I have a
>    text file that contains all nodes separated by comma based on the appropriate
>    link information.
> c) Then, I create a graph file with the above vector.
> d) I use the graph file to calculate betweenness of my network.
>
> I am sure there must be a better, more efficient way to calculate
> betweenness. Ideally, I would like to just have the City1 - City2 (link)
> information in two columns in an excel file and calculate the betweenness
> from that file directly.
>
> Please provide an optimal solution for this problem. I appreciate your time
> and help.
>
> Thanks,
> Senthil

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
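The read.csv-plus-factor route suggested above could be sketched roughly as follows. The inline sample data and the names csv_text, cities, from and to are made up for illustration. The one design point is that both columns share a single set of levels, so that a city receives the same number whichever column it appears in.

```r
# Hypothetical sample standing in for the sheet exported from Excel.
csv_text <- "City1,City2
Boston,Chicago
Boston,Denver
Austin,Chicago
Austin,Denver"

tab <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Build one set of levels from both columns so that a city gets the
# same number wherever it appears.
cities <- sort(unique(c(tab$City1, tab$City2)))
from <- as.integer(factor(tab$City1, levels = cities))
to   <- as.integer(factor(tab$City2, levels = cities))

# Lookup table (number -> city) and the numbered links.
print(data.frame(id = seq_along(cities), city = cities))
print(data.frame(from = from, to = to))
```

Calling factor() on each column separately would number the cities independently per column, which is probably not what is wanted here.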
Re: [R] Calculating Betweenness - Efficiency problem
Given that you have the data, exactly what are you expecting the charts to look like? You mention vectors, but are you trying to create a directed graph? It would be useful to show a small subset of the data and then how you would expect to process it and what type of graphs you would like to see.

Factor will basically assign a unique number associating a value to a character string, and it may have no relationship to the way you have things stored in your matrix. A little more information is required so that we understand the problem you are trying to solve.

On Sat, Jul 19, 2008 at 5:59 PM, Senthil Purushothaman <[EMAIL PROTECTED]> wrote:
> Hi Jim,
> Thank you for the response. Your suggestion will help me avoid the whole
> text to number conversion process that I perform using LookUp in excel. I
> will definitely give it a shot. But it still doesn't address the vector
> conversion since a graph file is drawn only using the vectors. Assuming that
> I use 'factor' to convert the characters to numbers, how do I convert these
> numbers into vectors?
>
> Thanks,
> Senthil
>
> -----Original Message-----
> From: jim holtman [mailto:[EMAIL PROTECTED]
> Sent: Sat 7/19/2008 4:49 AM
> To: Senthil Purushothaman
> Cc: r-help@r-project.org
> Subject: Re: [R] Calculating Betweenness - Efficiency problem
>
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
Re: [R] Calculating Betweenness - Efficiency problem
Hi Jim,

Thank you for the response. Your suggestion will help me avoid the whole text to number conversion process that I perform using LookUp in excel. I will definitely give it a shot. But it still doesn't address the vector conversion since a graph file is drawn only using the vectors. Assuming that I use 'factor' to convert the characters to numbers, how do I convert these numbers into vectors?

Thanks,
Senthil

-----Original Message-----
From: jim holtman [mailto:[EMAIL PROTECTED]
Sent: Sat 7/19/2008 4:49 AM
To: Senthil Purushothaman
Cc: r-help@r-project.org
Subject: Re: [R] Calculating Betweenness - Efficiency problem

[...]
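On the question of turning the numbers into a vector: one possible approach, assuming the ids already sit in two parallel vectors (called from and to here, both hypothetical names), is to interleave them so that each consecutive pair is one link. That reproduces the c(1,3,1,5,2,4,2,5) layout from the original post without any copy and paste.

```r
# Hypothetical numeric ids for four links: 1-3, 1-5, 2-4, 2-5.
from <- c(1, 1, 2, 2)
to   <- c(3, 5, 4, 5)

# rbind() stacks the two vectors as rows; as.vector() then reads the
# matrix column by column, which interleaves from and to.
edge_vec <- as.vector(rbind(from, to))
print(edge_vec)
```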
Re: [R] Calculating Betweenness - Efficiency problem
Senthil,

you can try the 'igraph' package. Export your two-column Excel file as a .csv, use 'read.csv' to read that into R, then 'graph.data.frame' to create an igraph graph from it. Finally, call 'betweenness' on the graph. It is really just three/four lines, something like this:

tab <- read.csv(...)
g <- graph.data.frame(tab)
bet <- betweenness(g)
bet <- data.frame(city=V(g)$name, betweenness=bet)

The last line creates a two column data frame with the betweenness score of each city.

Best,
Gabor

On Sat, Jul 19, 2008 at 02:59:07PM -0700, Senthil Purushothaman wrote:
[...]

--
Csardi Gabor <[EMAIL PROTECTED]>    UNIL DGM
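For readers on a recent igraph, the same steps might look like the sketch below. It assumes the igraph package is installed and uses graph_from_data_frame(), the current name of graph.data.frame(). The three-link sample data frame is invented and stands in for the result of read.csv() on the real file.

```r
library(igraph)

# Hypothetical sample in place of read.csv() on the exported file.
tab <- data.frame(
  City1 = c("A", "B", "B"),
  City2 = c("B", "C", "D"),
  stringsAsFactors = FALSE
)

# Each row becomes one directed link; city names become vertex names.
g <- graph_from_data_frame(tab, directed = TRUE)

bet <- betweenness(g)
result <- data.frame(city = V(g)$name, betweenness = bet)

# Most central cities first.
result <- result[order(-result$betweenness), ]
print(result)
```

If the links have no direction, passing directed = FALSE when building the graph is the relevant switch.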
Re: [R] Calculating Betweenness - Efficiency problem
Dear Gabor,

Thank you very much for the insights. I have been using the igraph package for my computations. But I did not know about graph.data.frame(). Thanks again for that. So I did run my data using the steps you had provided. Weirdly, even though the .csv file has approximately 300,000 records (remember that the file gets truncated to 65536 rows when opened in Excel 2003), not all of them are pulled in during the operation and the final betweenness list contains only ~1000+ records but it should be tens of thousands.

I know that you are a busy person. This problem seems to be a very different challenge. I am attaching the Test.csv file for your experiments. Thank you very much again.

Best regards,
Senthil
(909) 267-0799

-----Original Message-----
From: Gabor Csardi [mailto:[EMAIL PROTECTED]
Sent: Monday, July 21, 2008 1:57 AM
To: Senthil Purushothaman
Cc: jim holtman; r-help@r-project.org
Subject: Re: [R] Calculating Betweenness - Efficiency problem

[...]
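A quick way to check whether rows are being lost at the read.csv stage is to compare the number of lines in the file with the number of rows that come back. The sketch below builds a small hypothetical file containing one stray double quote, which is only one possible kind of problem character and is an assumption here, not something the thread confirms. Re-reading with quote = "" tells read.csv to treat quote characters as ordinary text.

```r
# Hypothetical 20-row file with a stray double quote in row 11.
rows <- sprintf("City%d,City%d", 1:20, 2:21)
rows[11] <- "City11,\"City12"
path <- tempfile(fileext = ".csv")
writeLines(c("City1,City2", rows), path)

# Lines in the file (minus the header) versus rows returned by read.csv.
n_expected <- length(readLines(path)) - 1L
tab_default <- suppressWarnings(read.csv(path, stringsAsFactors = FALSE))

# Same file, with quote characters treated as ordinary text.
tab_literal <- read.csv(path, quote = "", stringsAsFactors = FALSE)

cat("data lines in file: ", n_expected, "\n")
cat("rows, default read: ", nrow(tab_default), "\n")
cat("rows, quote = \"\":   ", nrow(tab_literal), "\n")
```

If the counts differ on the real file, the first row where the two data frames disagree is a reasonable place to start looking for the offending character.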
Re: [R] Calculating Betweenness - Efficiency problem
Senthil,

sending a 12Mb file to the list is not a good idea.

I've run the code in my previous email without any problem, so you need to be a bit more specific about what went wrong for you. This is what I get:

> library(igraph)
> tab <- read.csv("/tmp/Test.csv")
> dim(tab)
[1] 304711      2
> length(unique(tab))
[1] 2
> g <- graph.data.frame(tab)
> summary(g)
Vertices: 48072
Edges: 304711
Directed: TRUE
No graph attributes.
Vertex attributes: name.
No edge attributes.
> system.time(bet <- betweenness(g))
   user  system elapsed
661.180   0.098 661.716
> length(bet)
[1] 48072
> bet <- data.frame(city=V(g)$name, betweenness=bet)
> dim(bet)
[1] 48072     2

Best,
Gabor

On Tue, Jul 22, 2008 at 11:58:37AM -0700, Senthil Purushothaman wrote:
[...]

--
Csardi Gabor <[EMAIL PROTECTED]>    UNIL DGM
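A side note on the transcript above: length() of a data frame is its number of columns, which is why length(unique(tab)) prints 2 regardless of how many rows there are. A small sketch with invented data shows how the distinct links and distinct cities could be counted instead; the variable names are hypothetical.

```r
# Hypothetical four-link sample.
tab <- data.frame(
  City1 = c("A", "A", "B", "B"),
  City2 = c("B", "C", "C", "D"),
  stringsAsFactors = FALSE
)

# length() of a data frame counts columns, not rows.
n_cols <- length(unique(tab))

# Distinct link rows, and distinct city names across both columns.
n_links  <- nrow(unique(tab))
n_cities <- length(unique(c(tab$City1, tab$City2)))

cat("columns:", n_cols, " links:", n_links, " cities:", n_cities, "\n")
```

The city count computed this way is the number one would expect to match the vertex count reported for the graph.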
Re: [R] Calculating Betweenness - Efficiency problem
Dear Gabor,

I am really sorry about the file attachment. As you might have figured out, I am quite new to the forum interaction techniques. I will keep your suggestion in mind. Thanks for taking the time to test the data I sent you.

I found out where exactly the problem is. The surprising part is that you should have had the same issue unless you were able to handle it. Many rows in the data sheet I sent you have funky characters, and when R reads them to create a graph it fails at that particular row. Only the data before the row with the unidentified character is used to create the graph. There are two issues at this point in time.

a) How were you able to pull in all the information with the above mentioned problem still in place?

b) I have the number-number equivalent of the same data set and I used it to test whether the betweenness works. I know that there are a total of 50251 nodes in the sheet. This time, given that it is all numbers, the read.csv goes through successfully. I then used graph.data.frame to draw the graph and the summary information was startling. There were only 50245 vertices. I re-ran all my numbers and still they did not tally. A sample set of the data that I input looks like this.

1-4455
1-34545
2-4657
...
...
50251-87
50251-11

I have no idea how 50251 nodes from the data sheet shrink to 50245 nodes in the R graph. I tried creating the vectors (the earlier methodology, which takes a lot of copy/paste time) using c() and then drew a graph from that, and the number of nodes shows up as 50251, which indicates that there are 50251 vertices. I would really appreciate it if you could take a look at this issue since I am not sure if this is a data input issue or an igraph issue. I will send you the number-number information sheet in a separate email.

Thank you very much. I respect your time and effort in helping me resolve this interesting challenge.

Best regards,
Senthil
(909) 267-0799

-----Original Message-----
From: Gabor Csardi [mailto:[EMAIL PROTECTED]
Sent: Wed 7/23/2008 3:43 AM
To: Senthil Purushothaman
Cc: jim holtman; r-help@r-project.org
Subject: Re: [R] Calculating Betweenness - Efficiency problem

[...]
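One possible explanation for the 50251 versus 50245 difference, which the thread does not confirm: a graph built from a two-column link table only contains ids that occur in at least one link, whereas building from a plain numeric vector may create every id from 1 up to the largest one, including ids with no links at all. Under that assumption, the ids that never appear in the sheet can be listed with base R. The vectors from and to below are hypothetical stand-ins for the two columns.

```r
# Hypothetical numbered links; ids 3, 4 and 6 never occur.
from <- c(1, 1, 2, 5)
to   <- c(2, 5, 5, 7)

# Ids that occur in at least one link.
seen <- sort(unique(c(from, to)))

# Ids between 1 and the largest id that occur in no link.
missing_ids <- setdiff(seq_len(max(seen)), seen)

cat("ids seen in links:", length(seen), "\n")
cat("largest id:       ", max(seen), "\n")
cat("ids with no links:", missing_ids, "\n")
```

If the real sheet yields six such ids, that would account for the gap. If it yields none, the cause lies elsewhere, for example in how the file is read.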