[R] exporting clustering results to table
Hello list,

the following approach did not work:

    clustersA <- pam(distances, nkA, diss=TRUE);
    gc();
    filenameclu = paste("filenameclu", ".txt");
    write.table(clustersA, file=filenameclu, sep=",");

although it worked with

    clustersA <- hclust(distances, method="ward");

followed by

    kclassA <- cutree(clustersA, k=nkA);
    filename = paste("clusters", ".txt");
    write.table(kclassA, file=filename, sep=",", col.names=TRUE, row.names=TRUE);

Is there a generic method to export a cluster object? I know that pam is different (a cluster object plus some more data). How can I extract and export the clustering into a table with two columns, ID = dissimilarity matrix row, and cluster = number of the cluster? I was using sink to get the data, but for large matrices that involves a huge amount of manual formatting afterwards, say in Excel.

Many thanks
Martin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] exporting clustering results to table
Hello Haris,

no, that is not the problem, but thank you anyway. I had figured out that paste has this funny behavior. The object resulting from pam is complex, though, and cannot easily be cast into a table...

Charilaos Skiadas wrote:
> On Nov 27, 2007, at 7:41 AM, Martin Tomko wrote:
>
>> filename = paste("clusters", ".txt");
>
> Don't know if this relates to your problem, but because "paste" adds
> spaces by default (since sep=" ") this would result in a file named
> "clusters .txt", not "clusters.txt".
>
> Haris Skiadas
> Department of Mathematics and Computer Science
> Hanover College
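One possible way to get the two-column table asked for in this thread, as a minimal sketch rather than a definitive answer: objects returned by pam() in the cluster package carry the assignment in their clustering component, which can be put into a data frame and written out. The toy data, the names cl, ids and clusterTable, and the output location in tempdir() are assumptions made for illustration; distances and nkA follow the names used in the question.

```r
library(cluster)  # pam() lives in the 'cluster' package

# Hypothetical toy data standing in for the poster's 'distances' object
set.seed(1)
pts <- matrix(rnorm(40), ncol = 2)
rownames(pts) <- paste("obs", 1:20, sep = "")
distances <- dist(pts)
nkA <- 3

clustersA <- pam(distances, nkA, diss = TRUE)

# The cluster assignment is stored in the 'clustering' component
cl <- clustersA$clustering
# Use the labels if present, otherwise fall back to the row position
ids <- if (is.null(names(cl))) seq_along(cl) else names(cl)

clusterTable <- data.frame(ID = ids, cluster = as.vector(cl))

# paste() inserts a space by default, so set sep = "" to get "clusters.txt"
filenameclu <- file.path(tempdir(), paste("clusters", ".txt", sep = ""))
write.table(clusterTable, file = filenameclu, sep = ",",
            row.names = FALSE, col.names = TRUE)
```

The same data frame layout would also work for the hclust/cutree result, since cutree() returns a plain vector of cluster numbers.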
[R] median of binned values
Dear list,

I have a vector (array, table row, whatever is best) of frequency values for categories (or bins), and I need to find the median category. This is trivial to do by hand, but I was wondering whether there is an elegant way to do it in R.

The obvious median(vector) returns the median frequency of the bins, and that is not what I want, i.e.:

        freq
cat1       1
cat2      10
cat3     100
cat4    1000
cat5       1

I want it to return cat5, instead of cat3.

Thanks a lot
Martin
Re: [R] median of binned values
Thank you, Chuck,

would you mind commenting a bit on the code? It is not all clear to me... How would you go about retrieving only the numeric value (not the category name)? I am just starting with R, and the functionality of rep and levels is not quite clear. I tried the documentation, but am none the wiser. What if I had a vector v <- c(1,10,100,1000,1) and wanted to perform it on that?

Thanks a lot
Martin

Chuck Cleland wrote:
> Martin Tomko wrote:
>> I want it to return cat5, instead of cat3.
>
> df <- data.frame(binname = as.factor(paste("cat", 1:5, sep="")),
>                  freq = c(1,10,100,1000,1))
>
> df
>   binname freq
> 1    cat1    1
> 2    cat2   10
> 3    cat3  100
> 4    cat4 1000
> 5    cat5    1
>
> with(df, levels(binname)[median(rep(as.numeric(binname), freq))])
> [1] "cat5"

--
Martin Tomko
Postdoctoral Research Assistant
Geographic Information Systems Division
Department of Geography
University of Zurich - Irchel
Winterthurerstr. 190
CH-8057 Zurich, Switzerland
email: [EMAIL PROTECTED]
site: http://www.geo.uzh.ch/~mtomko
mob: +41-788 629 558
tel: +41-44-6355256
fax: +41-44-6356848
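A commented sketch of the idea behind the one-liner quoted in this thread, working on a plain numeric vector as asked. It shows two routes: expanding the bin indices with rep() and taking their median, and a cumulative-count alternative that avoids the expansion. The names freq, expanded, medianIndexExpanded and medianIndex are assumptions for illustration. Note that with the frequencies exactly as they appear in this archive, both routes point to the fourth bin; the "cat5" shown in the thread would fit a much larger last frequency, so the archived copy may have lost digits there.

```r
# Frequencies as they appear in the thread; the names are the bin labels
freq <- c(cat1 = 1, cat2 = 10, cat3 = 100, cat4 = 1000, cat5 = 1)

# Route 1: repeat each bin index as many times as its frequency,
# then take the ordinary median of that long vector of indices
expanded <- rep(seq_along(freq), times = freq)
medianIndexExpanded <- median(expanded)

# Route 2: the first bin whose cumulative count reaches half of the total
medianIndex <- which(cumsum(freq) >= sum(freq) / 2)[1]

# Numeric position of the median bin, and its label
unname(medianIndex)
names(freq)[medianIndex]
```

When the total count is even and the two middle observations fall into different bins, route 1 returns a value ending in .5 while route 2 returns the lower bin, so the two can differ in that edge case.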
[R] plotting activity time intervals
Dear All,

I have interval data (for Mon-Sun, 00-24h) of an activity and would like to plot them in a matrix-like plot, where color A would be assigned to the activity, and color X to unspecified time usage. Note that the activities are not in standardised units (hours or so), but run from startTime to endTime (in hrs:mins).

In principle it is a bar plot where multiple bars can be stacked one on top of another without gaps between the bars, with, say, the x axis representing time in a day and the y axis the day of the week.

Can anyone please suggest a way to plot these?

Thanks
Martin
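One way this kind of plot could be approached in base graphics, offered as a rough sketch under assumptions rather than a recommended solution: convert the start and end times to decimal hours and draw one rectangle per interval with rect(), on top of a background rectangle per day for unspecified time. The data frame layout, the column names, the helper names toHours and drawWeek, and the colors are all made up for illustration.

```r
# Hypothetical interval data: day of week (1 = Mon ... 7 = Sun), times as "HH:MM"
activity <- data.frame(
  day   = c(1, 1, 3, 6),
  start = c("08:15", "13:30", "22:00", "00:00"),
  end   = c("12:00", "17:45", "23:59", "06:30"),
  stringsAsFactors = FALSE
)

# Convert "HH:MM" strings to decimal hours
toHours <- function(hhmm) {
  parts <- strsplit(hhmm, ":", fixed = TRUE)
  sapply(parts, function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60)
}
activity$startH <- toHours(activity$start)
activity$endH   <- toHours(activity$end)

# Draw one row per day: background = unspecified time, colored boxes = activity
drawWeek <- function(activity, colActivity = "steelblue", colOther = "grey90") {
  days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
  plot(0, 0, type = "n", xlim = c(0, 24), ylim = c(7.5, 0.5),
       xaxs = "i", yaxs = "i", axes = FALSE,
       xlab = "Hour of day", ylab = "")
  rect(0, 1:7 - 0.5, 24, 1:7 + 0.5, col = colOther, border = NA)
  rect(activity$startH, activity$day - 0.5,
       activity$endH, activity$day + 0.5,
       col = colActivity, border = NA)
  axis(1, at = seq(0, 24, by = 4))
  axis(2, at = 1:7, labels = days, las = 1)
  box()
}
```

Calling drawWeek(activity) on an open graphics device draws the week; the reversed ylim puts Monday at the top.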
[R] "pattern matching" across multiple matrices
Hi all,

I have a set of patterns which can occur in a series of (3) matrices. I want to identify those and create a fourth one with the identifiers of the cases. Something like:

    for (i in 1:l) {
      for (j in 1:w) {
        A[A[i,j]==1 & D[i,j]==1 & P[i,j]==1] <- Case1;
        A[A[i,j]==-1 & D[i,j]==-1 & P[i,j]==-1] <- Case2;
        etc
      }
    }

The code seems to run, but is very slow. Could anyone please suggest a better approach? I was thinking that the 3 matrices could be stacked in a cube, and the column of the cube searched for a pattern, but am not sure how to do that...

Thanks
Martin
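A sketch of how the loop might be avoided, assuming the three matrices have identical dimensions and hold values such as -1, 0 and 1: element-wise comparisons on whole matrices give a logical mask per case, and the "cube" idea can be done with array() and apply(). The example matrices, the integer case codes 1 and 2, and the names result, cube, key, patterns and result2 are illustrative assumptions, not taken from the original post.

```r
# Hypothetical 3 x 3 example matrices holding -1, 0 or 1
A <- matrix(c(1, -1, 0,  1, 1, -1,  0, -1, 1), nrow = 3)
D <- matrix(c(1, -1, 1,  1, 0, -1,  0, -1, 1), nrow = 3)
P <- matrix(c(1, -1, 0,  0, 1, -1,  1, -1, 1), nrow = 3)

# Vectorized version: one logical mask per case, no explicit loops
result <- matrix(NA_integer_, nrow = nrow(A), ncol = ncol(A))
result[A == 1 & D == 1 & P == 1] <- 1L      # Case1
result[A == -1 & D == -1 & P == -1] <- 2L   # Case2

# "Cube" version: stack the matrices and look up each cell's pattern
cube <- array(c(A, D, P), dim = c(dim(A), 3))
key <- apply(cube, c(1, 2), paste, collapse = ",")
patterns <- c("1,1,1" = 1L, "-1,-1,-1" = 2L)
result2 <- matrix(unname(patterns[as.vector(key)]), nrow = nrow(A))
```

Writing into a separate result matrix, rather than back into A as in the loop, keeps later cases from matching against values already overwritten. Cells that match no pattern stay NA in both versions.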
[R] install.package("TinnR") - there is no package called 'TinnR'
I am having trouble making TinnR 2.2.0.2 work. It seems to depend on the package TinnR, which cannot be found (I also tried manual downloads, but I cannot find the package anywhere on any CRAN mirror). I even set a default CRAN mirror in the Rprofile.site file, so that the later command can find it:

    # check necesary packages
    necessary = c('TinnR', 'svSocket')
    installed = necessary %in% installed.packages()[, 'Package']
    if (length(necessary[!installed]) >=1)
      install.packages(necessary[!installed], dep=T)

No luck. Even manually issuing the command in Rterm fails: package ‘TinnR’ is not available. Any idea how I could make my TinnR work? I googled extensively, but without luck...

Thanks
Martin
Re: [R] install.package("TinnR") - there is no package called 'TinnR'
David,

that is not helpful. I KNOW that TinnR is a standalone editor. If you had a look at the Rprofile.site required by TinnR, you would notice the part of the code I sent earlier:

    # check necesary packages
    necessary = c('TinnR', 'svSocket')
    installed = necessary %in% installed.packages()[, 'Package']
    if (length(necessary[!installed]) >=1)
      install.packages(necessary[!installed], dep=T)

These lines are executed by the TinnR editor upon start. The TinnR package MUST therefore exist, and is required. I hope that someone else can REALLY help.

Martin

David Winsemius wrote:
> Tinn-R is not an R package. It is a standalone text editor:
> http://www.lmgtfy.com/?q=tinn-r
>
> --
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
> On Mar 24, 2009, at 7:05 AM, Martin Tomko wrote:
>> I have troubles make TinnR 2.2.0.2 work, it seems that the dependency
>> on the package TinnR that cannot be found
Re: [R] install.package("TinnR") - there is no package called 'TinnR'
Patrick, Romain,

thank you very much for your help! I have found the site of the package at http://r-forge.r-project.org/projects/tinnr/ , as Romain suggests, but when you try to download, you find that the package has not actually been contributed to the repository; it is just a placeholder for it!

Patrick's http://cran.r-project.org/web/packages/TinnR/index.html is better, and I will give it a try... I was just not sure about the versions, but it seems to be a relatively recent one, so hopefully it will work.

Thanks again!
Martin

Richardson, Patrick wrote:
> Hi Martin,
>
> If all else fails, you could download the package from
> http://cran.r-project.org/web/packages/TinnR/index.html and install as a
> .zip file from within R. That is bizarre that R cannot find the package.
> I've had no problems downloading and installing.
>
> Best regards,
> Patrick
Re: [R] install.package("TinnR") - there is no package called 'TinnR' (RESOLVED)
Hi Duncan,

it is Win XP with R version 2.6.0. Sorry, I should have included this before.

Installing the TinnR package manually from a local zip file downloaded from CRAN helped. I am still not sure why the package was not picked up from the repositories. Can anyone please check whether the package is visible to others under install packages in any repository?

Thanks
Martin

Duncan Murdoch wrote:
> On 3/24/2009 7:05 AM, Martin Tomko wrote:
>> I have troubles make TinnR 2.2.0.2 work, it seems that the dependency
>> on the package TinnR that cannot be found (I tried also manual
>> downloads, but I cannot find the package anywhere on any CRAN mirror).
>
> What R version are you using, on what platform? I have no trouble with
> an automatic install of the TinnR package into 2.8.1 on Windows.
>
> Duncan Murdoch
Re: [R] install.package("TinnR") - there is no package called 'TinnR' (RESOLVED)
Yes, will do. I just had no need to upgrade as everything worked... And R is not my primary development environment, i.e., I need it a couple of times a year. But you are right, it is good to be on an updated version.

Cheers
Martin

Duncan Murdoch wrote:
> On 3/24/2009 8:55 AM, Martin Tomko wrote:
>> Hi Duncan, it is Win Xp with the 2.6.0 R-project version.
>> I am still not sure why the package was not picked in the repositories.
>
> Your version of R is too old. TinnR was last updated in February this
> year and claims to support 2.6.0, but CRAN no longer builds binaries for
> 2.6.x. (Version 2.6.0 became obsolete in November 2007 when 2.6.1 was
> released, and binaries for the 2.6.x series stopped being built sometime
> last year.)
>
> If you are set up for installing from source, you could try downloading
> the source package
>
>   http://cran.r-project.org/src/contrib/TinnR_1.0.3.tar.gz
>
> and running
>
>   Rcmd INSTALL TinnR_1.0.3.tar.gz
>
> but it is probably easier to update your R to the current release.
>
> Duncan Murdoch
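For readers who hit the same "package is not available" message, a small sketch of how the situation in this thread could be diagnosed from within R. It repeats the check from Rprofile.site, adds a helper that asks a repository whether it offers the packages for the running R version, and shows in comments the two manual installs discussed above. The helper name isAvailable, the variable missingPkgs, the repository URL default and the zip path are assumptions for illustration; the tarball name is the one given in the thread.

```r
# Same check as in the Rprofile.site snippet from the thread
necessary <- c("TinnR", "svSocket")
installed <- necessary %in% rownames(installed.packages())
missingPkgs <- necessary[!installed]

# Hypothetical helper: does the repository list these packages for this
# R version and platform? (Needs network access when called.)
isAvailable <- function(pkgs, repos = "https://cran.r-project.org") {
  avail <- available.packages(contriburl = contrib.url(repos))
  pkgs %in% rownames(avail)
}

# Which R version is running (binaries are built per R release series)
runningVersion <- R.version.string

# Manual installs discussed in the thread, left commented out:
# install.packages("path/to/downloaded/TinnR.zip", repos = NULL)   # local zip (path is a placeholder)
# install.packages("TinnR_1.0.3.tar.gz", repos = NULL, type = "source")
```

If isAvailable(missingPkgs) comes back FALSE for a package that is visible on the CRAN web pages, an outdated R release is one plausible explanation, as pointed out in the thread.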
[R] strange behavior when reading csv - line wraps
Dear All,

I am observing a strange behavior, and searching the archives and help pages didn't help much. I have a csv with a variable number of fields in each line. I use

    dataPoints <- read.csv(inputFile, head=FALSE, sep=";", fill=TRUE);

to read it in, and it works. But - some lines are long and 'wrap', or split and continue on the next line. So when I check the dim of the frame, they are not correct, and I can see when I do a printout that the line is split into two in the frame. I checked the input file and all is good.

An example of the input is:

37;2175168475;13;8.522729;47.19537;16366...@n00;30;sculpture;bird;tourism;animal;statue;canon;eos;rebel;schweiz;switzerland;eagle;swiss;adler;skulptur;zug;1750;28;tamron;f28;canton;tourismus;vogel;baar;kanton;xti;tamron1750;1750mm;tamron1750mm;400d;rabbitriotnet;

where the last value occurs on the next line in the data frame. It does not have to be the last value: in the following example, the word "kempten" starts the next line:

39;167757703;12;10.309295;47.724545;21903...@n00;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavaria;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kuppel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europaeischeunion;germanio;

What could be the reason? I was thinking about solving the issue by using a different separator for the first 7 fields and concatenating all of the remaining values into a single string value, but could not figure out how to do such a substitution in R. Unfortunately, on my system I cannot specify a range for sed...

Thanks for any help/pointers
Martin
Re: [R] strange behavior when reading csv - line wraps
Jim,

the two lines I put in are the actual problematic input lines. In these examples, there are no quotes or # signs, although I have no means of making sure they do not occur in the inputs (any hints how I could deal with that?).

I am trying to avoid as much pre-processing outside R as possible, and I have to process about 500 files with up to 3000 records each, so I need a more or less automated/batch solution - any string substitution will have to occur in R. But for the moment, I do not see a reason for substitution, and the wrapping still occurs.

Cheers
Martin

jim holtman wrote:
> You need to supply the actual input line so we can see what is
> happening. Are you sure you do not have unbalanced quotes in your input
> (try quote='') or do you have comment characters ("#") in your input?
>
> On Fri, May 29, 2009 at 3:15 PM, Martin Tomko wrote:
>> I have a csv with a variable number of fields in each line. I use
>> dataPoints <- read.csv(inputFile, head=FALSE, sep=";",fill =TRUE);
>> to read it in, and it works. But - some lines are long and 'wrap', or
>> split and continue on the next line.
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
Re: [R] strange behavior when reading csv - line wraps
Dear Jim,

with the help of Ted, we diagnosed that the cause is the extreme variability in line length when reading in. As the number of table columns is apparently determined from the first five lines, whatever exceeds this length automatically ends up on the next line.

I am now trying to find a way to read in the data despite this. I have no control over the table extent; the only thing that would make sense for my data would be to read in a fixed number of columns and merge all remaining columns into one long string in the last one. No idea how to do this, though.

Thanks
Martin

jim holtman wrote:
> It is still not clear to me exactly how you want to read the lines in.
> If the lines have a variable number of fields, and some of the lines
> might be wrapped, is there some way to determine where the start of each
> line is? If you are reading them in with read.csv, then the system is
> assuming that each line starts a new row. If this is not the case, then
> you will have to state the rules that determine where the lines start.
>
> You can always read the data in with 'scan' to separate each line and
> then do whatever processing is required to put together the rows in a
> data frame that you want. In one of your examples, you indicated that
> the line was split starting at the word "kempten"; if this is in the
> middle of the line, then you would have to create the break after
> reading the line in with 'scan' and then creating the rows in the
> dataframe. All of this can be done in R if you can state what the
> criteria is.
>
> On Sat, May 30, 2009 at 4:32 AM, Martin Tomko wrote:
>> Jim, the two lines I put in are the actual problematic input lines.
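Two possible ways around the wrapping described in this thread, sketched under assumptions rather than offered as the definitive fix. The first sizes the data frame from the widest line using count.fields(); the second does what the message above asks for, keeping a fixed number of leading fields and merging the rest into one string. The sample file, the user identifiers in it, the choice of 7 fixed fields (taken from the first message of this thread), and the names nFields, wide, nFixed, fixedPart and tags are illustrative. Setting quote = "" and comment.char = "" guards against stray quotes and # characters in the tags.

```r
# Hypothetical sample file: 7 fixed fields followed by a variable number of tags
inputFile <- tempfile(fileext = ".csv")
writeLines(c(
  "37;2175168475;13;8.522729;47.19537;userA;30;sculpture;bird",
  "38;2175168476;11;8.5;47.2;userB;12;lake",
  "39;167757703;12;10.309295;47.724545;userC;36;white;building;tower;clock;clouds"
), inputFile)

# Option 1: let the widest line decide the number of columns
nFields <- count.fields(inputFile, sep = ";", quote = "", comment.char = "")
maxFields <- max(nFields)
wide <- read.table(inputFile, header = FALSE, sep = ";", fill = TRUE,
                   quote = "", comment.char = "",
                   col.names = paste("V", seq_len(maxFields), sep = ""),
                   stringsAsFactors = FALSE)

# Option 2: keep the first 7 fields, merge everything else into one string
nFixed <- 7
rawLines <- readLines(inputFile)
parts <- strsplit(rawLines, ";", fixed = TRUE)
fixedPart <- t(sapply(parts, function(p) p[seq_len(nFixed)]))
tags <- sapply(parts, function(p) {
  if (length(p) > nFixed) paste(p[-seq_len(nFixed)], collapse = " ") else ""
})
dataPoints <- data.frame(fixedPart, tags = tags, stringsAsFactors = FALSE)
```

In option 2 the fixed columns come back as character and would still need converting with as.numeric() where appropriate. For a batch of files, either option could be wrapped in a function and applied over list.files().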
Re: [R] strange behavior when reading csv - line wraps
Big thanks to Ted and Jim for all the help.

Martin

(Ted Harding) wrote:
> Ah!!! It was count.fields() which we had overlooked!
>
> We discovered a work-round which involved using
>
>   Data0 <- readLines(file)
>
> to create a vector of strings, one for each line of the input file, and
> then using
>
>   NF <- unlist(lapply(Data0, function(x)
>           length(unlist(gregexpr(";", x, fixed=TRUE, useBytes=TRUE)))))
>
> to count the number of occurrences of ";" (the separator) in each line.
> (NF+1) produces the same result as count.fields(file,sep=";").
>
> Thanks for pointing out the existence of count.fields()!
> Ted.
>
> On 31-May-09 15:04:23, jim holtman wrote:
>> You can do something like this: count the number of fields in each
>> line of the file and use the max to determine the number of columns
>> for read.table:
>>
>>   file <- '/tempxx.txt'
>>   maxFields <- max(count.fields(file))  # max
>>   # now setup read.table for max number
>>   input <- read.table(file, colClasses=rep(NA, maxFields), fill=TRUE,
>>                       col.names=paste("V", seq(maxFields), sep=''))

On Sun, May 31, 2009 at 6:06 AM, Martin Tomko wrote: Dear Jim, with the help of Ted, we diagnosed that the cause is the extreme variability in line length when reading in. As the number of table columns is apparently determined from the first five lines, whatever exceeds this length automatically ends up on the next line. I am now trying to find a way to read in the data despite this. I have no control over the table extent; the only thing that would make sense for my data would be to read in a fixed number of columns and merge all remaining columns into one long string in the last one. No idea how to do this, though. Thanks Martin jim holtman wrote: It is still not clear to me exactly how you want to read the lines in. If the lines have a variable number of fields, and some of the lines might be wrapped, is there some way to determine where the start of each line is? If you are reading them in with read.csv, then the system is assuming that each line starts a new row.
If this is not the case, then you will have to state the rules that determine where the lines start. You can always read the data in with 'scan' to separate each line and then do whatever processing is required to put together the rows in a data frame that you want. In one of your examples, you indicated that the line was split starting at the word "kempten"; if this is in the middle of the line, then you would have to create the break after reading the line in with 'scan' and then creating the rows in the dataframe. All of this can be done in R if you can state what the criteria is. On Sat, May 30, 2009 at 4:32 AM, Martin Tomko > wrote: Jim, the two lines I put in are the actual problematic input lines. In these examples, there are no quotes nor # signs, although I have no means to make sure they do not occur in the inputs (any hints how I could deal with that?). I am trying to avoid as much pre-processing outside R as possible, and I have to process about 500 files with up to 3000 records each, so I need a more or less automated/batch solution. - so any string substitution will have to occur in R. But for the moment, I do not see a reaason for substitution, and the wrapping still occurs. Cheers Martin jim holtman wrote: You need to supply the actual input line so we can see what is happening. Are you sure you do not have unbalanced quotes in your input (try quote='') or do you have comment characters ("#") in your input? On Fri, May 29, 2009 at 3:15 PM, Martin Tomko mailto:martin.to...@geo.uzh.ch> <mailto:martin.to...@geo.uzh.ch <mailto:martin.to...@geo.uzh.ch>>> wrote: Dear All, I am observing a strange behavior and searching the archives and help pages didn't help much. I have a csv with a variable number of fields in each line. I use dataPoints <- read.csv(inputFile, head=FALSE, sep=";",fill =TRUE); to read it in, and it works. But - some lines are long and 'wrap', or split and continue on the next line. 
So when I check the dim of the frame, they are not correct and I can see when I do a printout that the lines is split into two in the frame. I checked the input file and all is good. an example of the input is: 37;2175168475;13;8.522729;47.19537;16366...@n00 ;30;sculpture;bird;tourism;animal;statue;canon;eos;rebel;schweiz;switz erland;eagle;swiss;adler;skulptur;zug;1750;28;tamron;f28;canton;touris mus;vogel;baar;kanton;xti;tamron1750;1750mm;tamron1750mm;400d;rabbitri otnet; where the last values occurs on